[01:57:45] I can't reach github from stat1006 for some reason
[01:57:48] i can reach google though
[01:58:03] wait
[01:58:07] I can't reach google either
[02:03:53] i changed http proxies and it seems better
[06:13:38] Analytics, Operations, ops-eqiad: Degraded RAID on dbstore1003 - https://phabricator.wikimedia.org/T239217 (Marostegui) It looks like disk #4: ` root@dbstore1003:~# megacli -LDPDInfo -aAll Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Target Id: 0) Name : RAID Level...
[07:56:40] Hi team
[07:59:00] (PS4) Joal: Fix sqoop after new tables added [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127)
[08:00:06] (CR) Joal: [V: +2 C: +2] "Merge for next week deploy" [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127) (owner: Joal)
[08:09:59] bonjour :)
[08:29:14] Analytics, MediaWiki-Cache, Research: Creating a wikipedia CDN caching trace - https://phabricator.wikimedia.org/T239885 (Addshore)
[08:36:37] groceryheist: o/ you should use https://wikitech.wikimedia.org/wiki/HTTP_proxy, let me know if you still have issues
[08:42:11] joal: o/
[08:42:32] not sure if you have followed https://phabricator.wikimedia.org/T236180
[08:42:49] but airflow needs a special setting for mariadb (more info in the task)
[08:43:00] I have already applied it in test and didn't see anything exploding
[08:43:04] I have not followed, no
[08:43:20] but if you could triple check and tell me your thoughts I would be happier
[08:43:25] before applying it to prod :)
[08:43:55] (that needs a mariadb restart so I'll have to temporarily stop camus etc.. hive/oozie will probably need a restart)
[08:44:15] elukey: no need for camus to stop, but hive/oozie yes
[08:44:35] joal: yes but if camus runs then hive is used, this was my point
[08:44:44] safer to drain a little bit before doing the restarts
[08:46:47] elukey: I don't know enough of how mariadb is used by our systems to provide any value here - A quick read of the doc makes me think this setting is actually good, and the task says this will also become default in next versions, so let's go for it :)
[08:46:59] elukey: got it for camus, makes sense
[08:50:02] joal: yep I am in the same position as you are, but it doesn't seem a problem.. just wanted your opinion :)
[08:50:09] all right so I am going to stop DER CAMUS
[09:05:04] joal: after a chat with Marcel, we found another kerberos thing to fix (low priority though)
[09:05:54] since presto will be kerberized, superset will need to be able to authenticate to it to visualize the data quality dashboards
[09:06:07] but of course there doesn't seem to be support yet
[09:06:13] Ah - makes sense - so superset will have access to a keytab I guess
[09:06:32] in theory yes, we'll have to follow up with superset upstream probably :(
[09:06:50] !log temporarily stop timers on an-coord1001 to ease the restart of mariadb on an-coord1001
[09:06:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:09:29] elukey: you now are part of the superset dev team I guess ;)
[09:10:53] joal: after a long issue on gh explaining in depth the problem and pointing to the line of code that was triggering it, I got two things
[09:11:13] 1) a fix was made but nobody pinged me on the issue, I discovered it just checking related things
[09:11:38] 2) I tested the fix and reported that it worked, got a thumbs up to my message and the issue was closed
[09:11:45] not even a trace of "thank you"
[09:12:03] sometimes it is still frustrating to deal with them, even if they improved
[09:12:07] this is not the apache way
[09:12:09] for sure
[09:17:16] :(
[09:24:44] going to wait for the cassandra load to finish or ok to restart?
[09:30:11] elukey: thinking
[09:30:28] elukey: let me pause it, then we can restart and re-enable if you don't mind
[09:30:58] actually elukey, let's restart without pausing - worst case: one loading fails, I restart it, done
[09:31:50] ack
[09:34:18] !log stop oozie/hive-*; restart mariadb; restart oozie/hive-* on an-coord1001 to pick up explicit_defaults_for_timestamp - T236180
[09:34:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:34:21] T236180: Deploy search platform airflow service - https://phabricator.wikimedia.org/T236180
[09:34:27] joal: all done
[09:34:38] \o/
[09:34:53] elukey: this means there is an available airflow we can play with?
[09:34:59] or if not now, soon?
[09:35:07] !log enable timers on an-coord1001 after maintenance
[09:35:08] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:35:24] joal: yes check https://phabricator.wikimedia.org/T236180
[09:35:26] we are close
[09:36:06] Erik has to init the db and then it should be ready to go
[09:36:23] This is awesome :)
[09:36:44] Thanks a lot ebernhardson and elukey for setting this up!
[09:39:56] it would be nice to have something to talk about in SF
[09:40:02] next steps experiences etc..
[10:02:20] Analytics, Analytics-Kanban, Wikidata, User-Addshore, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞): Sqoop wikidata terms tables into hadoop - https://phabricator.wikimedia.org/T239471 (Addshore) Open→Resolved
[10:42:06] Analytics, Research-Backlog, Wikidata: Copy Wikidata dumps to HDFS - https://phabricator.wikimedia.org/T209655 (GoranSMilovanovic) @JAllemandou Thank you - as ever!
[10:51:59] Analytics, Discovery, Event-Platform, Wikidata, and 2 others: Log Wikidata Query Service queries to the event gate infrastructure - https://phabricator.wikimedia.org/T101013 (dcausse) Open→Resolved
[11:02:20] * elukey bbiab
[11:15:59] joal o/
[11:16:12] I guess I have another query that is having issues due to joins and partitioning ?
[11:16:18] maybe....
[11:16:46] https://www.irccloud.com/pastebin/b0pMmORC/
[11:17:16] ^^ I would expect that to result in around 2 billion rows, as wikibase_wbt_item_terms has around 2 billion, instead I end up with less than 1 billion...?
[11:26:49] Hi addshore - This doesn't seem related to partitioning
[11:27:15] any ideas? :)
[11:29:27] nope, not really
[11:29:46] I guess going for joins one after the other would help, but it's cumbersome
[11:29:47] hmmm :(
[11:30:00] Yes, Maybe I will have to try that
[11:35:35] So, it is in fact the first join that seems to reduce the number of rows
[11:35:40] https://www.irccloud.com/pastebin/2ELCD2er/
[11:44:39] https://github.com/dropbox/PyHive/issues/288
[11:44:40] sigh
[11:45:13] so 0.6.2, that contains support for presto and kerberos, has not been released, no response from upstream in months
[11:45:17] (dropbox)
[11:49:04] joal: my debugging has led me to believe that this will work.... https://www.irccloud.com/pastebin/k4BwK6vD/
[11:49:08] Now to run it and see
[11:50:11] hm - This feels wrong, but maybe :)
[11:56:24] Ah! addshore you're actually right, I get it now :)
[11:56:27] I guess those joins should actually be wmf_raw.wikibase_wbt_text.wiki_db = wmf_raw.wikibase_wbt_item_terms.wiki_db for example
[11:56:39] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (elukey)
[11:57:22] sigh
[11:57:27] addshore: yes, it would be better this way - reason for the issue: you do a left join, therefore many rows will have null right-table values, and the filter applies after the join, therefore removing those nulls
[11:57:40] Tricky!
[11:57:53] Good catch addshore :)
[11:58:00] Yes, that's right! now that I think about it it makes sense, just struggled to notice it while just staring at the query
[11:58:20] disappearing for a bit, lunch + errands
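For context, a minimal Spark sketch of the pitfall joal describes above. The pasted queries are irccloud pastebins and aren't visible here, so the tiny tables below are hypothetical stand-ins, not the real wikibase_wbt_* schemas: with a LEFT JOIN, a WHERE predicate on right-table columns runs after the join and silently drops the rows whose right side came back NULL, while moving the same predicate into the ON clause keeps every left-side row.

```scala
import org.apache.spark.sql.SparkSession

object LeftJoinFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("left-join-filter-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny stand-ins: three "item term" rows, only one of which has a matching "text" row.
    Seq((1L, "enwiki"), (2L, "enwiki"), (3L, "enwiki"))
      .toDF("term_id", "wiki_db").createOrReplaceTempView("item_terms")
    Seq((1L, "enwiki", "some text"))
      .toDF("term_id", "wiki_db", "text").createOrReplaceTempView("text")

    // WHERE filter on the right table: it runs after the join, so the two rows whose
    // right side is NULL are discarded -> 1 row instead of 3 (the shrinking addshore saw).
    spark.sql(
      """SELECT * FROM item_terms t
        |LEFT JOIN text x ON t.term_id = x.term_id
        |WHERE x.wiki_db = 'enwiki'""".stripMargin).show()

    // Same predicate moved into the ON clause (joining on wiki_db, as suggested above):
    // all three left-side rows survive, with NULLs where there is no match.
    spark.sql(
      """SELECT * FROM item_terms t
        |LEFT JOIN text x ON t.term_id = x.term_id AND x.wiki_db = t.wiki_db""".stripMargin).show()

    spark.stop()
  }
}
```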
[15:05:39] hey yall
[15:17:32] o/
[15:28:05] ottomata: the only weird thing about the refine failures is that there's no _REFINE_FAILED flag in those directories, and the data looks the same as other hours. The partitions for that hour aren't on any of the tables, but the data is there
[15:29:45] according to the docs, to rerun I'd have to first write the _REFINE_FAILED flag, so I'm reading some code
[15:30:12] (and none of the hours have a _REFINED flag, I checked a few days - that's also inconsistent with the docs)
[15:30:34] milimetric: there is an ignore flag iirc that will force the execution, but in theory refine should be ok now
[15:30:49] since we didn't get any alarm from the monitor stuff
[15:30:53] like data missing etc..
[15:31:06] refine's ok, it didn't send failures for the other hours and the other hours are added as partitions to the tables (before and after the hour with the meta failure)
[15:31:08] in theory the next execution of refine should have picked up the prev one
[15:31:24] yeah, so there's two inconsistencies
[15:31:37] 1. none of the hours have a _REFINED flag, and the docs suggest they all should
[15:32:03] milimetric: there is or was probably partial data there
[15:32:05] 2. the partition for the failed hour is not on the tables
[15:32:33] if there is no _REFINED or _REFINE_FAILED flag
[15:32:35] hm, doubtful there was partial data since it failed to fetch the schema, I think it does that at the beginning
[15:32:46] then refine will attempt to refine the data
[15:32:51] overwriting whatever is there with what it gets from raw
[15:33:03] hm, yeah depends on the timing i guess
[15:33:03] ottomata: but there's no _REFINED flag on *any* hour
[15:33:10] usually a delay of 2 hours will have refine get everything.
[15:33:13] so yeah maybe not
[15:33:21] and yeah, there would have been a flag
[15:33:22] you are right
[15:33:31] why data but no flag?
[15:33:32] that is weird
[15:33:46] data and partitions are there, but when I say any hour I really mean like going back months
[15:33:47] milimetric: on any of the 'failed' hours you mean?
[15:33:52] no *ANY*
[15:33:53] ????
[15:33:55] ok looking
[15:34:02] is it possible that since it failed while fetching from meta, it just didn't write any REFINE_FAILED and the next run picked it up?
[15:34:12] like hdfs dfs -ls /wmf/data/raw/eventlogging/eventlogging_MobileWikiAppiOSFeed/hourly/2019/10/01/20
[15:34:25] elukey: no, like in September, October, etc.
[15:34:27] no _REFINED flag
[15:34:38] nono I meant for REFINE FAILED, not REFINED
[15:34:54] no idea if it should be there or not
[15:34:55] :)
[15:34:59] I think Andrew's right that there's no _REFINE_FAILED because it probably re-refined it automatically
[15:35:33] lemme check other schemas, maybe it's just this one
[15:35:46] milimetric: there is? i see them
[15:35:48] oh
[15:35:54] milimetric: _REFINED flag goes in the output data di
[15:35:55] dir
[15:35:56] not the source
[15:36:00] not in raw
[15:36:14] sl
[15:36:33] hdfs dfs -ls /wmf/data/event/MobileWikiAppiOSFeed/year=2019/month=12/day=4/hour=20
[15:36:36] has _REFINE_FAILED
[15:36:37] as expected
[15:36:45] so you just need to re-refine with the --ignore_failed_flag=true
[15:36:57] ok, I'll clear up the docs on that
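As an aside, a minimal sketch (not refinery code) of the point Andrew makes here: the _REFINED / _REFINE_FAILED flags are written next to the refined output partition under /wmf/data/event/..., not next to the raw input under /wmf/data/raw/... . The refined path is the one quoted above; the raw path is an illustrative guess following the raw layout shown earlier in the conversation.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object CheckRefineFlags {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Raw input dir: no flags are expected here (hypothetical path, same layout as above).
    val rawDir = new Path(
      "/wmf/data/raw/eventlogging/eventlogging_MobileWikiAppiOSFeed/hourly/2019/12/04/20")
    // Refined output partition: this is where Refine writes the flags.
    val refinedDir = new Path(
      "/wmf/data/event/MobileWikiAppiOSFeed/year=2019/month=12/day=4/hour=20")

    Seq(rawDir, refinedDir).foreach { dir =>
      val refined = fs.exists(new Path(dir, "_REFINED"))
      val failed  = fs.exists(new Path(dir, "_REFINE_FAILED"))
      println(s"$dir  _REFINED=$refined  _REFINE_FAILED=$failed")
    }
  }
}
```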
[15:38:04] ottomata: if we don't see for some reason the email from refine failure, shouldn't we get also an alarm from the monitor refine timer? Or am I remembering incorrectly?
[15:38:07] it is easy to miss emails
[15:38:26] we did get the email from refine failure
[15:38:45] refine monitor will alert if e.g. a refine job hasn't run by the expected time
[15:38:54] if the job fails, it has run, refine monitor won't check
[15:39:07] refine monitor uses the same logic as refine to find directories that NEED refinement
[15:39:19] if the _REFINE_FAILED flag exists, the directory does not need refinement,
[15:39:43] often _REFINE_FAILED happens because refine is not possible. this happened more often when we inferred schemas from json data instead of from the schemas
[15:39:51] so there is no reason to try and re-refine something that will never refine
[15:42:36] maybe I am misremembering but when an email for refine was sent in the past we had also a corresponding icinga alert for monitor_etc.. firing
[15:42:53] but in this case, as you said, there is nothing wrong from its point of view
[15:43:26] my point is that say we don't receive the email about refine failures for $reasons
[15:43:40] (spam, problems with emails in prod, etc..)
[15:43:54] how can we know that something failed refine?
[15:44:22] heh, our only alerting is email, so if we don't receive the email we won't know
[15:45:32] elukey: i think if RefineMonitor sends an alert, it does exit(1)
[15:45:38] which causes the timer to fail, which ends up in icinga
[15:46:14] that doesn't work for Refine, since it runs in YARN
[15:46:15] Analytics, Performance-Team, Research, Security-Team, and 2 others: A Large-scale Study of Wikipedia Users' Quality of Experience: data release - https://phabricator.wikimedia.org/T217318 (JFishback_WMF) a:JFishback_WMF→Gilles Assigning back to @Gilles but let me know if there's anything...
[15:46:53] and, we don't really want to alert just because of _REFINE_FAILED
[15:46:59] ideally we'd have an interface like oozie's
[15:47:04] where we could see the status of all the refines
[15:47:23] switching to airflow or whatever would probably give us that
[15:50:44] ottomata: not upon every refine failure, but something that after a day sweeps dirs on hdfs and calls out _REFINE_FAILED might be good as an icinga alert
[15:51:03] it would be a safety net in my opinion
[15:51:31] an icinga alert can't be missed for long, an email definitely can
[15:51:31] that would be good
[15:52:11] elukey: dynamic icinga alerts aren't really possible though, we can't easily have alerts for each day or hour or whatever
[15:52:14] we could have a rolling one though
[15:52:30] like, > 0 _REFINE_FAILED in last hour, day, week, whatever
[15:53:00] yes yes this is my idea as well
[15:53:03] ya that would be cool
[15:53:56] Analytics, Datasets-Archiving, Research-Backlog: Make HTML dumps available - https://phabricator.wikimedia.org/T182351 (ArielGlenn) >>! In T182351#5710173, @ArielGlenn wrote: ... > No, there is a ticket for dumping parsed wikitext as it is stored in RESTBase but that's not a full page view with ski...
[15:58:34] ottomata: could be a good exercise for me with pyspark? what do you think?
[16:33:23] elukey: hey! airflow db initialized correctly now. if you have a moment i typo'd one of the airflow directories in puppet: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/554909/
[16:35:12] heh, i've also been having similar thoughts about failure emails :) An email isn't enough to ensure the failure doesn't get missed
[16:35:32] going to merge it in a bit :)
[16:35:38] thanks!
[16:35:49] re emails: yes we now rely almost exclusively on icinga alerts for systemd timers
[16:36:00] I completely agree
[16:40:14] ebernhardson: merged and ran puppet on an-airflow1001
[16:40:32] elukey: might not really need spark for it..., might be just an hdfs api thing
[16:40:41] you might be able to reuse some of the RefineTarget find logic though
[16:40:58] ottomata: sure it is an clear excuse for me to work with spark :D
[16:41:05] *a clear
[16:48:04] ya but i think it might not really be a spark thing....
[16:48:08] spark won't really help you much
[16:48:18] it'd be more like the tmp cleaner thing i just did
[16:48:28] basically implementing a find in hdfs
[16:48:36] hmmm
[16:48:39] but
[16:48:39] maybe
[16:48:42] using RefineTarget might help
[16:48:46] https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-job/src/main/scala/org/wikimedia/analytics/refinery/job/refine/RefineTarget.scala#L476-L581
[16:49:06] not sure if that is the right approach for this though
[16:49:21] you might just want to look for any _REFINE_FAILED flags found
[16:49:36] RefineTarget considers both input and output
[16:49:53] uff you ruined my happiness about spark! :P
[16:50:05] now I need to find something else
[16:50:06] :D :D :D
[16:50:20] hmm actually
[16:50:21] you could use that
[16:50:29] targets = RefineTarget.find(...)
[16:51:22] failedTargets = targets.filter(_.failureFlagExists())
[16:51:43] yeah luca that would do it
[16:53:09] is that two lines of spark? :D
[16:53:41] anyway joking, opening a task with all the info
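A rough stand-alone sketch of the sweep elukey is proposing: find any _REFINE_FAILED flags written in the last day and exit non-zero so the wrapping systemd timer fails and Icinga notices. Inside refinery-source the same thing would fall out of the RefineTarget.find(...).filter(_.failureFlagExists()) approach Andrew sketches above (arguments elided there and not filled in here); this version only uses the plain Hadoop FileSystem API, and the glob pattern is a hypothetical partition layout to adjust to the real one.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

object FindRecentRefineFailures {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())
    val cutoff = System.currentTimeMillis() - 24L * 60 * 60 * 1000  // last 24 hours

    // Hypothetical glob over hourly refined partitions; adjust to the real layout.
    val flagGlob = new Path("/wmf/data/event/*/year=*/month=*/day=*/hour=*/_REFINE_FAILED")

    val recentFailures = Option(fs.globStatus(flagGlob)).getOrElse(Array.empty[FileStatus])
      .filter(_.getModificationTime >= cutoff)
      .map(_.getPath.getParent)

    recentFailures.foreach(p => println(s"_REFINE_FAILED found: $p"))

    // A non-zero exit makes the wrapping systemd unit fail, which is what surfaces in Icinga.
    if (recentFailures.nonEmpty) sys.exit(1)
  }
}
```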
[16:56:26] elukey: ! wanna see if it works together real quick in hangout?
[16:56:29] before standup?
[16:58:58] ottomata: I am in!
[17:07:34] bearloga: o/ - when you have time, I sent two code reviews for you :)
[17:09:51] elukey: thank you! Looking at them now
[17:10:48] (PS1) Milimetric: Normalize coordinator properties to bundle props [analytics/refinery] - https://gerrit.wikimedia.org/r/554915
[17:15:06] (CR) Milimetric: "moved all the refinery_hive_jar versions to 0.0.100 because one of the coordinators had 0.0.100 and that means the bundle should have that" [analytics/refinery] - https://gerrit.wikimedia.org/r/554915 (owner: Milimetric)
[17:17:19] bearloga: fixed your comment thanks :)
[17:18:33] (main.sh I meant)
[17:19:18] elukey: https://gist.github.com/ottomata/6841ee47f8b4b86081856a1ff636d38f
[17:20:21] ottomata: nice!
[17:28:38] bearloga: also thanks for puppet, pebkac :)
[17:29:41] :)
[17:55:44] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (mforns) We could ping dropbox, to see if they want to upgrade pypi? We could also ping superset if they want to change lib? <-- Maybe better bet.
[17:56:04] Analytics, Analytics-Kanban, User-Elukey: Kerberize Superset to allow Presto queries - https://phabricator.wikimedia.org/T239903 (mforns) p:Triage→High
[17:57:17] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped at 2019-11-24Z00:57 and again at 2019-12-01Z22:45 - https://phabricator.wikimedia.org/T239121 (Jdforrester-WMF) And again. @Ottomata, can you kick webperf1001 again?
[17:57:57] Analytics, MediaWiki-Cache, Research: Creating a wikipedia CDN caching trace - https://phabricator.wikimedia.org/T239885 (mforns) Does this data set answer your needs? Not sure if you're asking for that. https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Caching
[17:58:42] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped due to statsv falling over (?) on webperf1001 - https://phabricator.wikimedia.org/T239121 (Jdforrester-WMF)
[17:59:16] Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query to a superset dashboard - https://phabricator.wikimedia.org/T239852 (mforns) We should ingest that data into druid, so that is queryable from Superset.
[17:59:39] Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query to a superset dashboard - https://phabricator.wikimedia.org/T239852 (mforns) p:Triage→High
[18:00:18] Analytics, Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (mforns) p:Triage→High
[18:01:26] Analytics, Analytics-Kanban: Delay cassandra mediarequest-per-file daily job one hour so that it doesn't colide with pageview-per-article - https://phabricator.wikimedia.org/T239848 (mforns) Open→Resolved
[18:01:48] Analytics: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (mforns) p:Triage→High
[18:02:54] Analytics: Change sqoop project list config so that content sqoop doesn't fail - https://phabricator.wikimedia.org/T239589 (mforns) p:Triage→High
[18:03:47] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (mforns) p:Triage→Normal
[18:04:52] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (mforns) a:Milimetric
[18:05:36] Analytics: Check home leftovers of maxsem - https://phabricator.wikimedia.org/T239047 (elukey) Pinging also @MaxSem :)
[18:06:07] Analytics: Check home leftovers of dfoy - https://phabricator.wikimedia.org/T239571 (elukey) ` ====== stat1004 ====== total 1152 -rw-r--r-- 1 2318 wikidev 12564 Mar 24 2017 angola24.txt -rw-r--r-- 1 2318 wikidev 12836 Mar 27 2017 angola26.txt -rw-r--r-- 1 2318 wikidev 12555 Mar 29 2017 angola28.txt -rw...
[18:06:10] Analytics, Analytics-Kanban, Product-Analytics, SDC General, Wikidata: Create reportupdater reports that execute SDC requests - https://phabricator.wikimedia.org/T239565 (mforns) p:Triage→High
[18:09:26] Analytics, WMDE-Analytics-Engineering, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (mforns) I think @JFishback_WMF can help you with this task.
[18:09:49] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (WDoranWMF)
[18:10:17] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (mforns) p:Triage→High
[18:10:46] Analytics: Revise wiki scoop list from labs once a quarter - https://phabricator.wikimedia.org/T239136 (mforns) p:Triage→Low
[18:12:35] Analytics: Superset getting slower as usage increases - https://phabricator.wikimedia.org/T239130 (mforns) It could maybe be Druid as well. Let's troubleshoot and determine the cause.
[18:12:43] Analytics: Superset getting slower as usage increases - https://phabricator.wikimedia.org/T239130 (mforns) a:Nuria
[18:17:18] Analytics, Analytics-Kanban: Import slots/slots_roles and wikibase.wbc_entity_usage through scoop - https://phabricator.wikimedia.org/T239127 (mforns) Open→Resolved
[18:17:24] Analytics, Product-Analytics, SDC General, Wikidata: Data about how many file pages on Commons contain at least one structured data element - https://phabricator.wikimedia.org/T238878 (mforns)
[18:18:05] Analytics, WMDE-Analytics-Engineering, Privacy, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (JFishback_WMF)
[18:19:11] Analytics, WMDE-Analytics-Engineering, Privacy, User-GoranSMilovanovic: Public data set review for T237728 - https://phabricator.wikimedia.org/T239393 (JFishback_WMF) p:Triage→Normal
[18:19:28] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) Grosking: We can try to see if it works with CDH5, or we could deprecate Hue, and use sth else?
[18:19:56] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:Triage→Normal
[18:20:20] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:Normal→High
[18:21:01] Analytics: Test if Hue can run with Python3 - https://phabricator.wikimedia.org/T233073 (mforns) p:High→Normal
[18:27:57] Analytics, Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (mforns) a:Nuria
[18:28:22] Analytics, Wikimedia Design Style Guide: Analytics: Some pages/page requests are not reflected in statistics - https://phabricator.wikimedia.org/T239685 (mforns) a:Nuria→Milimetric
[18:28:42] Analytics, Editing-team, observability, Performance-Team (Radar): VE edit data stopped due to statsv falling over (?) on webperf1001 - https://phabricator.wikimedia.org/T239121 (Ottomata) Done. BTW same errors were present in recent logs.
[18:29:12] Analytics, Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (Milimetric) a:Nuria
[18:31:52] Analytics, Product-Analytics: Many special pages missing from pageview_hourly dataset starting on July 23, 2019 - https://phabricator.wikimedia.org/T239672 (Nuria) This is correct, Special:Page pages other than search should have not been included (there was a long standing bug on the pageview definition...
[18:34:03] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata)
[18:56:09] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (WDoranWMF) @mforns Adding @cicalese the PM for this project as well to make sure she has space to correct me. But the timehorizon for this is quite far, and I beli...
[19:04:38] I am rebooting an-worker1089 for hw maintenance
[19:04:55] if any job fails it might be due to me
[19:11:11] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (CCicalese_WMF) My understanding was that the fields may be dropped from the replicas earlier, but they would not be dropped from the master until the switchover.
[19:16:01] Analytics, Core Platform Team: Update mediawiki-history to use new Multi-Content-Revision tables - https://phabricator.wikimedia.org/T239591 (Nuria) Let's move this item to late Q2 given timelines.
[19:18:16] Analytics, Better Use Of Data, Product-Infrastructure-Team-Backlog, Wikimedia-Logstash, and 3 others: Client side error logging production launch - https://phabricator.wikimedia.org/T226986 (Ottomata) Ah, I won't be able to make tomorrow's meeting, I'm not working and I have a grocery coop shift!...
[19:22:45] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - events-.wikimedia.org/v1/events - -events.wikimedia.org/v1/events - intake-.wikim...
[19:25:24] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) - logging-sink & analytics-sink (or sink-*) ?
[19:25:37] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (elukey) Open→Resolved Disk swapped, raid status ok. Thanks!
[19:27:22] * elukey off!
[19:27:31] (an-worker1089 back working)
[19:50:30] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (mpopov) >>! In T236386#5716552, @Ottomata wrote: > > - events-.wikimedia.org/v1/events > - -events.wiki...
[19:56:07] Analytics, Research: Parse wikidumps and extract redirect information for 1 small wiki, romanian - https://phabricator.wikimedia.org/T232123 (leila) Open→Resolved
[20:02:16] Analytics, Operations, ops-eqiad: Degraded RAID on an-worker1089 - https://phabricator.wikimedia.org/T239365 (Jclark-ctr) Replaced failed drive
[20:25:20] Analytics, Code-Stewardship-Reviews, Operations, Tools, Wikimedia-IRC-RC-Server: IRC RecentChanges feed: code stewardship request - https://phabricator.wikimedia.org/T185319 (Dzahn) Is this really replacing the IRCd from T134271 ?
[20:26:04] Analytics, SDC General, Wikimedia-Stream: Verify that EventStreams work with WikiBase MediaInfo - https://phabricator.wikimedia.org/T210702 (Abbe98) Open→Resolved a:Abbe98
[20:30:51] Analytics-Kanban, Better Use Of Data, Event-Platform, Operations, and 8 others: Set up eventgate-logging-external in production - https://phabricator.wikimedia.org/T236386 (Ottomata) > Is /v1/events plural with the intention that eventually EventGate will support batches of events in the same req...
[21:31:36] Analytics, Better Use Of Data, Performance-Team, Product-Infrastructure-Team-Backlog, Product-Analytics (Kanban): Switch mw.user.sessionId back to session-cookie persistence - https://phabricator.wikimedia.org/T223931 (mpopov) Okay, if `localStorage` is out of the discussion (due to lack of e...
[21:53:19] ottomata: The risk assessment for the dataset has been completed, are you free to move the files from my user dir to the release location?
[22:27:27] (CR) Nuria: "So I understand, * I think* that without these changes the jobs should still work, correct?" [analytics/refinery] - https://gerrit.wikimedia.org/r/554915 (owner: Milimetric)
[22:32:17] lexnasser: I think if you move the data under /srv/published-datasets/caching/2019
[22:32:26] lexnasser: it will automagically appear as released
[22:32:45] nuria: Do i have the proper user permissions to do that?
[22:32:56] lexnasser: i think you should, please do try
[22:43:16] nuria: moved successfully to that location, I'd assume it would appear somewhere in this dir, but doesn't appear yet: https://analytics.wikimedia.org/published/datasets/ . How often does it update?
[22:47:25] nuria: Just appeared. Success! Thanks for your help, will follow up on ticket and update docs right now
[22:48:22] lexnasser: nice
[22:48:30] lexnasser: ACHIEVEMENT UNLOCKED
[22:49:51] Analytics, Analytics-Kanban: Request for a large request data set for caching research and tuning - https://phabricator.wikimedia.org/T225538 (lexnasser) The data has been released! URL: https://analytics.wikimedia.org/published/datasets/caching/2019/ Wikitech will be updated soon. I hope you find this...
[22:52:42] (CR) Nuria: Fix sqoop after new tables added (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/554255 (https://phabricator.wikimedia.org/T239127) (owner: Joal)