[00:56:40] Analytics, ChangeProp, EventBus, MediaWiki-JobQueue, and 3 others: Make Kafka JobQueue use Special:RunSingleJob - https://phabricator.wikimedia.org/T182372 (mobrovac)
[00:59:22] Analytics, EventBus, Growth-Team, MediaWiki-Watchlist, and 5 others: Clear watchlist on enwiki only removes 50 items at a time - https://phabricator.wikimedia.org/T207329 (mobrovac)
[01:22:18] Analytics, Product-Analytics: `rev_parent_id` and `rev_content_changed` are missing in event.mediawiki_revision_tags_change - https://phabricator.wikimedia.org/T218274 (chelsyx)
[06:42:22] Analytics, EventBus, Operations, Prod-Kubernetes, and 2 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (akosiaris) Do we have logs of this happening?
[08:16:25] Thanks a lot mforns for reviews and restarting oozie job :)
[08:18:17] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) @notconfusing I think you are the WHGI person; can you please weigh in?
[08:19:28] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (JAllemandou) >>! In T216160#5020236, @ArielGlenn wrote: > By Friday I'll have done that; by next Wednesday let's make a decision,...
[08:26:46] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) Mail sent to Rosie Stephenson-Goodknight of the Women in Red group.
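To make the difference behind T216160 concrete, here is a minimal Python sketch comparing the two schedules named later in this log: every other Monday versus the 1st and 15th of each month. The anchor date 2019-01-07 is a hypothetical first run chosen for illustration, not the real dump schedule, and the function names are made up.

```python
import calendar
from datetime import date
from typing import List, Sequence


def every_other_monday(anchor: date, year: int, month: int) -> List[date]:
    """Run dates in (year, month) for a biweekly schedule starting at `anchor`.

    `anchor` is a hypothetical first run date and must fall on a Monday.
    """
    if anchor.weekday() != 0:
        raise ValueError("anchor must be a Monday")
    _, ndays = calendar.monthrange(year, month)
    days = (date(year, month, d) for d in range(1, ndays + 1))
    # A day is a run day if it is on or after the anchor and a whole
    # number of 14-day periods away from it.
    return [d for d in days if d >= anchor and (d - anchor).days % 14 == 0]


def fixed_days_of_month(year: int, month: int, days: Sequence[int] = (1, 15)) -> List[date]:
    """Run dates in (year, month) for a fixed day-of-month schedule."""
    return [date(year, month, d) for d in days]


if __name__ == "__main__":
    anchor = date(2019, 1, 7)  # hypothetical: a Monday in January 2019
    for month in range(1, 13):
        biweekly = every_other_monday(anchor, 2019, month)
        fixed = fixed_days_of_month(2019, month)
        print(f"2019-{month:02d}: every other Monday -> {len(biweekly)} runs, "
              f"1st/15th -> {len(fixed)} runs")
```

With that anchor, April and September 2019 each get three biweekly runs while the other months get two, and the run days drift from month to month. A fixed day-of-month schedule always gives two runs on the same dates, which is easier for downstream jobs that run a fixed offset after the dump to follow.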
[08:53:44] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) Summarizing an IRC conversation with @JAllemandou : - Analytics ingests full revision history xml dumps from the ru...
[09:15:15] Analytics, Analytics-Kanban, Operations, ops-eqiad: confirm gpu form factor in stat1005 - https://phabricator.wikimedia.org/T216528 (elukey) @Cmjohnson do you have time today/tomorrow to answer Rob's question? It would unblock us to order the new GPU :) (sorry for the hassle with stat1005, we hop...
[09:38:35] joal: o/
[09:38:39] did you see https://www.apachecon.com/aceu19/index.html?
[09:39:16] Analytics, Dumps-Generation, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) Posted on Wikidata directly: https://www.wikidata.org/wiki/Wikidata:Project_chat#Schedule_of_Wikidata_entity_dumps_ge...
[09:47:11] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) The WikiCite project apparently uses these dumps, so adding them too. @Mvolz (hope you're the right per...
[09:54:44] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) Email sent to wikitech-l at https://lists.wikimedia.org/pipermail/wikitech-l/2019-March/091705.html I...
[10:02:30] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (MoritzMuehlenhoff) >>! In T215775#5019705, @leila wrote: > @MoritzMuehlenhoff do I recall correctly that you were working with 1+ of the researchers to figure out how to keep their data? I helped Mi...
[10:27:56] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Lea_Lacroix_WMDE) @ArielGlenn I already mentioned it in the Wikidata newsletter a few weeks ago, will put a reminde...
[11:23:19] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (ArielGlenn) @Lea_Lacroix_WMDE Thanks, because absolutely no one showed up, and I would rather we get people sorted...
[12:37:10] elukey: Berlin in October sounds fun :)
[12:53:07] elukey: shall we deploy AQS on node10 when you're back from lunch?
[12:58:41] joal: mforns wanted to participate, is it ok if we do it after standup?
[12:58:50] it sure is :)
[12:59:05] so my idea is the following
[12:59:37] add the apt config on the hosts and upgrade nodejs via apt. This should not cause any aqs service restart, that will keep using nodejs 6
[12:59:47] then we deploy, causing a restart, that will pick up nodejs 10
[12:59:50] first on canary
[12:59:53] then to the rest
[13:00:07] one by one sounds great :)
[13:00:39] is it one by one or canary + the rest?
[13:00:58] elukey: it's one by one
[13:01:16] elukey: One group per machine
[13:01:42] all right even better then
[13:02:14] I can then apt-get install only on one node at a time
[13:02:19] so rollback should be easier
[13:21:24] sounds great
[13:30:02] * elukey lunch!
[13:30:39] as FYI the hadoop testing cluster is not working atm, I am testing the TLS config.. any email etc.. that might come should not be something to worry about :)
[13:34:47] ack elukey :)
[13:34:49] Thanks ;)
[13:38:05] Analytics, EventBus, Operations, Prod-Kubernetes, and 2 others: eventgate-analytics k8s pods occasionally can't produce to kafka - https://phabricator.wikimedia.org/T218268 (Ottomata) https://logstash.wikimedia.org/app/kibana#/doc/logstash-*/logstash-2019.03.13/eventgate?id=AWl4sxguNBo9dX1kfcii&_...
[13:39:02] Analytics, EventBus, Core Platform Team Kanban (Doing), Services (doing): Decrease timeout for EventBus extension for analytics events - https://phabricator.wikimedia.org/T218260 (Ottomata) I wonder if we should also use ?hasty=true mode for mediawiki 'analytics' events? This would use a non-ACK...
[13:42:45] Analytics, EventBus, Core Platform Team Kanban (Doing), Services (doing): Decrease timeout for EventBus extension for analytics events - https://phabricator.wikimedia.org/T218260 (Pchelolo) Makes sense for analytics events IMHO
[13:44:46] Analytics, MediaWiki-Vagrant: Vagrant initial provision fails on NodeJS version mismatch - https://phabricator.wikimedia.org/T218238 (Ottomata)
[14:24:25] joal: I was going to take a stab at the goals verbiage for smart tools
[14:24:35] I can ping you when I have something or we can work on it together
[14:24:49] ack milimetric - Let me know if you want me to help from now on :)
[14:25:16] ok, I'll do a rough draft and we can look together then
[14:39:32] oh wow, this is really hard
[14:45:46] (PS3) Joal: Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550)
[14:58:11] Analytics, EventBus, Services (watching), cloud-services-team (Kanban): EventGate wikimedia implementation should emit rdkafka stats - https://phabricator.wikimedia.org/T218305 (Ottomata)
[15:01:25] (CR) Nuria: [C: -1] Update refinery sqoop to use dedicated labsdb host (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550) (owner: Joal)
[15:03:26] ok joal, the three sentences I spent the most time on in my life, line 83 of https://etherpad.wikimedia.org/p/analytics-goals
[15:05:36] milimetric: :) To the essence
[15:05:57] I donno! I wrote so many versions I have no idea what these words even mean
[15:05:59] what are words?!
[15:06:13] milimetric: I wonder if objective is to be super concise, or if we want a bit more - not sure
[15:10:35] joal: I looked at the other years for reference, and that's why I panicked and revised so much
[15:10:40] they almost never use any technical terms
[15:10:47] and they are usually two or three sentences
[15:23:45] k milimetric :)
[15:24:16] nuria: I'm gonna split the patch you -1ed in two: small improvements and mandatory changes, and 2-pool change
[15:24:21] as we discussed
[15:50:20] a-team will miss grooming, have better use of data meeting.
[15:53:50] ahhh the joy of production
[15:53:50] 2019-03-14 15:52:59,420 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
[15:53:53] java.io.IOException: !JsseListener: java.lang.NullPointerException
[15:54:00] (this is hadoop test)
[15:54:02] sigh
[16:01:16] ping ottomata
[16:01:38] ottomata, joal: coming to groskin? we need to talk about goals and fun stuff like that
[16:02:20] ottomata: we need to talk about goals, can you make it?
[16:02:39] nuria: ah have better use of data meeting
[16:02:47] i'll see if i can duck out of it
[16:02:50] ottomata: k
[16:03:22] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Rosiestep) @ArielGlenn The proposed change from every other Monday to the 1st and 15th of each month will be fine f...
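The AQS rollout plan discussed above (upgrade nodejs via apt without restarting the service, then deploy so that each restart picks up nodejs 10, canary first and then one host at a time) is a serial rolling deploy. A minimal Python sketch of that ordering follows. `deploy` and `healthy` are hypothetical callables standing in for the real restart and health check, the host names are made up, and this is not how scap itself is implemented.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def rolling_deploy(
    canary: str,
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Deploy to `canary` first, then to the remaining hosts one at a time.

    Returns (upgraded, failed_host). The rollout stops at the first failed
    health check, so only the hosts in `upgraded` need rolling back and
    every other host is still on the old version.
    """
    order = [canary] + [h for h in hosts if h != canary]
    upgraded: List[str] = []
    for host in order:
        deploy(host)  # e.g. restart the service so it picks up the new runtime
        upgraded.append(host)
        if not healthy(host):
            return upgraded, host
    return upgraded, None


if __name__ == "__main__":
    touched: List[str] = []
    upgraded, failed = rolling_deploy(
        canary="aqs1",  # hypothetical host names
        hosts=["aqs1", "aqs2", "aqs3", "aqs4"],
        deploy=touched.append,
        healthy=lambda host: host != "aqs3",  # simulate a bad restart on aqs3
    )
    print("upgraded:", upgraded, "failed:", failed)
```

In the simulated run, aqs4 is never touched, so rolling back means reverting only aqs1 through aqs3. Going one host at a time trades a slower rollout for a smaller blast radius, which is the point made in the log about rollback being easier.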
[16:24:09] Analytics: Set up a Kerberos KDC service in production with minimal puppet automation - https://phabricator.wikimedia.org/T212257 (Milimetric) p:High→Triage
[16:24:43] Analytics: Set up a Kerberos KDC service in production with minimal puppet automation - https://phabricator.wikimedia.org/T212257 (Milimetric) p:Triage→High
[16:40:55] Analytics, Analytics-Kanban, Better Use Of Data, Product-Analytics: "Edit" equivalent of pageviews daily available to use in Turnilo and Superset - https://phabricator.wikimedia.org/T211173 (Milimetric)
[16:46:28] joining a-team
[16:55:00] Analytics, Product-Analytics: automatic ingestion from annotations on schemas into druid - https://phabricator.wikimedia.org/T218319 (Nuria)
[16:59:38] joal: ok, sounds good, i think the second round of changes we should park until they are needed.
[16:59:46] works for me :)
[17:29:12] elukey, pingbak
[17:29:20] *pingback
[17:29:30] back to batcave :]
[17:39:25] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (notconfusing) Normally whgi.wmflabs.org tries to run the day after the dumps are created to make it's data as real...
[17:42:38] Analytics, Product-Analytics, Patch-For-Review, SEO: Make various auth libraries available on stat* machines - https://phabricator.wikimedia.org/T197896 (mpopov) Open→Resolved Managed with others! :) I'll reopen if needed. Thanks for checking in!
[17:43:09] !log Deploying AQS using scap (node10 upgrade)
[17:43:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:07:08] aqs migrated to nodejs 10!
[18:07:41] Analytics, Analytics-Kanban, Patch-For-Review: Move AQS to nodejs 10 - https://phabricator.wikimedia.org/T210706 (elukey) AQS migrated to nodejs 10!
[18:07:48] Analytics, Analytics-Kanban, Patch-For-Review: Move AQS to nodejs 10 - https://phabricator.wikimedia.org/T210706 (elukey)
[18:20:34] elukey, joal, everything seems fine! I see though that the 99th percentile is a bit higher after deployment
[18:20:49] 75 seems stable
[18:20:57] looks good then :)
[18:21:09] cool
[18:21:16] if it breaks I'll call you tomorrow :P
[18:22:22] elukey, of course :D
[18:37:40] ottomata: TLS deployed for HDFS on the hadoop testing cluster!
[18:37:46] all working fine afaics
[18:38:02] awesooome!
[18:38:24] you guys shouldn't get any email alarm etc.. but in case it is all wip
[18:39:09] tomorrow I'll carefully test all ports/traffic/etc..
[18:39:57] * elukey off!
[18:45:52] Analytics, Dumps-Generation, WikiCite, Wikidata: Update wikidata-entities dump generation to fixed day-of-month instead of fixed weekday - https://phabricator.wikimedia.org/T216160 (Melderick) @ArielGlenn , Actually, I came here because I saw @Lea_Lacroix_WMDE message on the newsletter :)
[18:52:44] Analytics, Analytics-Kanban, Patch-For-Review: Move AQS to nodejs 10 - https://phabricator.wikimedia.org/T210706 (mobrovac) \o/
[18:53:57] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (Nuria)
[18:54:50] Analytics, Operations, Research-management, Patch-For-Review, User-Elukey: Remove computational bottlenecks in stats machine via adding a GPU that can be used to train ML models - https://phabricator.wikimedia.org/T148843 (Nuria) Stalled on https://phabricator.wikimedia.org/T216528
[19:05:14] Analytics: Upgrade analytics cluster to cloudera distro CDH 5.16 - https://phabricator.wikimedia.org/T218343 (Nuria)
[19:08:00] Analytics, Research: Check home leftovers of ISI researchers - https://phabricator.wikimedia.org/T215775 (leila) @MoritzMuehlenhoff understood. @elukey I sent email to the 4 researchers and asked them to get back to me by 2019-03-30 or the data will be deleted. I'll let you know if there is any request...
[19:13:22] (PS4) Joal: Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550)
[19:16:43] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Deploy instance of EventGate service that processes events from kafka main - https://phabricator.wikimedia.org/T218346 (Nuria) p:Triage→High
[19:17:14] Analytics, Analytics-EventLogging, EventBus, Core Platform Team Backlog (Watching / External), Services (watching): Modern Event Platform: Deploy instance of EventGate service that produces events to kafka main - https://phabricator.wikimedia.org/T218346 (Ottomata)
[19:17:39] (PS5) Joal: Update refinery sqoop to use dedicated labsdb host [analytics/refinery] - https://gerrit.wikimedia.org/r/495266 (https://phabricator.wikimedia.org/T215550)
[19:18:23] ottomata: i have added some tests and goals to MEP program please update as needed: https://www.mediawiki.org/wiki/Wikimedia_Technology/Annual_Plans/FY2019/TEC2:_Modern_Event_Platform/Goals#Goal(s)
[19:21:48] Analytics: Ingest cirrusserachrequest data into druid - https://phabricator.wikimedia.org/T218347 (Nuria)
[19:25:00] Analytics, Core Platform Team: Ingest api data (for posts) into druid - https://phabricator.wikimedia.org/T218348 (Nuria)
[19:26:00] Analytics, Core Platform Team: Ingest api data (for posts) into druid - https://phabricator.wikimedia.org/T218348 (Nuria) blocked on http://phabricator.wikimedia.org/T215442
[19:26:43] Analytics: Ingest cirrusserachrequest data into druid - https://phabricator.wikimedia.org/T218347 (Nuria) Blocked on http://phabricator.wikimedia.org/T215442
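The post-deploy check above (p99 a bit higher, p75 stable) is the usual signature of a handful of requests getting slower while the bulk of traffic does not. A minimal Python sketch on made-up latency samples shows the effect. It uses the nearest-rank percentile definition, which is only one of several in use; monitoring dashboards may interpolate differently.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Made-up latencies in ms: the same bulk of requests before and after,
# but two slower outliers after the deploy.
before = [10.0] * 75 + [20.0] * 24 + [50.0]
after = [10.0] * 75 + [20.0] * 23 + [80.0, 90.0]

if __name__ == "__main__":
    for p in (75, 99):
        print(f"p{p}: before={percentile(before, p)} ms, after={percentile(after, p)} ms")
```

Here p75 stays at 10 ms while p99 moves from 20 ms to 80 ms, because only the slowest 2% of samples changed.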
[19:26:58] Analytics, Discovery: Ingest cirrusserachrequest data into druid - https://phabricator.wikimedia.org/T218347 (Nuria)
[19:28:26] (PS1) Joal: Update refinery sqoop parallel execution [analytics/refinery] - https://gerrit.wikimedia.org/r/496560 (https://phabricator.wikimedia.org/T215550)
[19:28:35] nuria: patches split :)
[19:29:04] Analytics, Core Platform Team: Ingest api data (for posts) into druid - https://phabricator.wikimedia.org/T218348 (Nuria)
[19:31:46] Analytics, Product-Analytics, Patch-For-Review: Standardize datetimes/timestamps in the Data Lake - https://phabricator.wikimedia.org/T212529 (Neil_P._Quinn_WMF) I actually asked my team about their opinions on these two formats today (without telling them which one I liked), and I've included what I...
[19:39:24] (PS7) Joal: Update mediawiki-reconstruction with log info [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493012
[19:53:46] (PS9) Joal: Refactor mediawiki-page-history computation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493390 (https://phabricator.wikimedia.org/T190434)
[19:58:33] (CR) jerkins-bot: [V: -1] Refactor mediawiki-page-history computation [analytics/refinery/source] - https://gerrit.wikimedia.org/r/493390 (https://phabricator.wikimedia.org/T190434) (owner: Joal)
[20:43:05] Analytics, MediaWiki-Vagrant: Vagrant initial provision fails on NodeJS version mismatch - https://phabricator.wikimedia.org/T218238 (marcella) I'm having the same error. Thanks for taking a look!
[21:07:21] (CR) Milimetric: "Joseph I was trying to look at this code before Marcel beat me to it, and I had this note written down. Maybe you can walk me through it " (2 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/485710 (https://phabricator.wikimedia.org/T213603) (owner: Joal)
[21:16:42] milimetric: will you work tomorrow?
[21:16:48] yeah joal
[21:17:08] k - I invite you to a talk around that patch when you wish :)
[21:17:33] joal: I read the whole patch now and realize I was stuck on that one line when the rest of the patch was super straightforward
[21:17:49] I'm sorry I didn't just finish the review, I thought for some reason I was stupid and couldn't understand that one change
[21:18:01] I'll ping you tomorrow
[21:18:16] sounds good - Let's make sure the change I made was NOT stupid :)
[21:18:29] Gone for tonight team
[21:34:06] what's the "probably works" way to select date ranges from hourly partitioned tables in spark with partition predicate pushdown? Naively making something like `(year >= 2019 and month >= 2 and day >= 27) and (year <= 2019 and month <= 3 and day < 3)` will not select the 7 days between feb 27 and march 3. I tried to do `row_date = unix_timestamp(concat(year, '-', month, '-', day, ' ',
[21:34:12] hour, ':00:00'))` along with `row_date >= ts_start and row_date < ts_end` but this isn't pushing down the partition predicate.
[21:36:38] ebernhardson: i feel like someone has a script or shortcut to generate the proper where statement
[21:36:47] i'm not sure who or where though
[21:36:50] :)
[21:52:57] spark-20331 shipped with 2.3.0 seems to claim to support this use case, hmm
[21:54:02] hmm, maybe not
[22:20:15] sigh, the problem isn't spark (at least not directly). The problem is spark can only do that pushdown on parquet, for some reason the avro table doesn't want to pushdown :S
[22:37:09] ebernhardson: on hive you can use datediff like datediff(string enddate, string startdate) < 10; can you use this udf in spark?
[23:22:14] nuria: seems to do the trick, thanks!
[23:22:22] * ebernhardson isn't sure why one works and not the other, but not that important
[23:26:00] ebernhardson: i share that approach