[00:43:29] (PS6) Lex Nasser: Create pageviews 'top-per-country' endpoint with tests [analytics/aqs] - https://gerrit.wikimedia.org/r/657228 (https://phabricator.wikimedia.org/T207171)
[00:56:58] PROBLEM - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[02:10:43] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ladsgroup) >>! In T120242#6914982, @Ottomata wrote: > >> I'd like to see a better explanati...
[03:17:21] !log rebalance kafka partitions for webrequest_upload partition 19
[03:17:23] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[08:04:41] Good morning
[08:14:28] (CR) Joal: [C: +1] "LGTM :)" [analytics/refinery] - https://gerrit.wikimedia.org/r/672541 (https://phabricator.wikimedia.org/T277512) (owner: Mforns)
[08:16:13] (CR) Joal: [C: +2] "Merging for deploy this week" [analytics/aqs] - https://gerrit.wikimedia.org/r/657228 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[08:16:25] RECOVERY - Check the last execution of monitor_refine_eventlogging_legacy_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_eventlogging_legacy_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[08:17:43] (Merged) jenkins-bot: Create pageviews 'top-per-country' endpoint with tests [analytics/aqs] - https://gerrit.wikimedia.org/r/657228 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[08:51:51] (CR) Silvan Heintze: "> There is this note (but not an AC) in the task description that says: "We count only the edits an editor makes in a specific namespace. " [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/671195 (https://phabricator.wikimedia.org/T275999) (owner: Silvan Heintze)
[09:09:42] hello hello
[09:09:52] I have a first draft of the capacity scheduler puppet code review
[09:10:01] now we can discuss settings and then test them on hadoop test :)
[09:10:12] \o/
[09:10:31] elukey: how do you wish us to discuss? here? batcave?
[09:12:40] joal: bc is fine, lemme get a coffee first :D
[09:13:30] elukey: on the phone now, will ping you when ready
[09:20:13] sure! I am now, anytime is fine :)
[09:33:10] (PS9) Phuedx: universalLanguageSelector: Add new properties [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766)
[09:43:39] elukey: Hi!
[09:43:48] elukey: I'm sorry, long phone call :S
[09:44:14] nono please :)
[09:44:28] joining the cave
[10:32:28] elukey: Hi again!
[10:32:54] elukey: I have time - is now ok? (I assume you may have started something else)
[10:33:53] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:36:21] joal: I have time yes
[10:36:46] ack elukey - to the cave
[10:45:03] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:56:14] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[11:29:24] Analytics-Radar, Discovery, Wikidata, Wikidata-Query-Service: Add pageviews total counts to WDQS - https://phabricator.wikimedia.org/T174981 (Sascha) Perhaps the QRank signal might be helpful here? The signal is computed in the Wikimedia cloud infrastructure (Toolforge) and gets periodically refr...
[12:00:17] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:13:45] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:45:18] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[12:58:52] PROBLEM - Check the last execution of produce_canary_events on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:15:55] Analytics, Machine-Learning-Team: Configure the Hadoop cluster to use the GPUs available on some workers - https://phabricator.wikimedia.org/T276791 (elukey) @fkaelin after a chat with Miriam this morning I realized that I haven't really provided a good view of my current thoughts/aim, lemme add more det...
[13:30:42] Analytics, Event-Platform, Research: TranslationRecommendation* Schemas Event Platform Migration - https://phabricator.wikimedia.org/T271163 (Ottomata) I just manually POSTed your event from the CLI, and all went well: curl -v -H 'Content-Type: text/plain' -d@/tmp/tr.json 'https://intake-analytics...
[13:32:22] Analytics, Research: Webrequest.isWMFDomain should return true for .wmflabs.org domains. - https://phabricator.wikimedia.org/T277536 (Ottomata)
[13:32:34] RECOVERY - Check the last execution of produce_canary_events on an-launcher1002 is OK: OK: Status of the systemd unit produce_canary_events https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[13:56:40] thanks joal for the prompt review, today I'll run the full oozie job to compare the production dashboard with the new-code dashboard, and if all charts look identical I'll merge and deploy
[13:57:50] (CR) Tonina Zhelyazkova: [C: +1] "LGTM, just one minor comment." (1 comment) [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/671195 (https://phabricator.wikimedia.org/T275999) (owner: Silvan Heintze)
[14:43:47] !log rebalance kafka partitions for webrequest_upload partition 20
[14:43:49] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[14:45:24] razzi: I was wondering one thing - we have discussed working on two partitions at a time, but never really followed up since we wanted to know how kafka behaved with important topics like webrequest. Do you think that we should revisit?
[14:46:04] it seems safe enough to test with say one upload and one text at the same time
[14:46:26] (brb)
[14:50:16] elukey: they have to be the same topic, but we could try doing 2 partitions at a time from webrequest_upload and webrequest_text, would cut down the manual steps in half
[14:58:22] razzi: ah okok perfect, yes if you think it is safe I am +1
[15:06:46] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Marostegui) >>! In T120242#6914507, @Ottomata wrote: >> Debezium requires binlog_format=ROW,...
[15:08:03] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Hm, @ladsgroup I'm certainly not suggesting that we should ever bypass MediaWiki a...
[15:23:23] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: analytics1066's BBU might need to be replaced - https://phabricator.wikimedia.org/T277005 (Cmjohnson) Open→Resolved swapped the bbu, the server is back up and handed back to @elukey
[15:24:10] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: analytics1066's BBU might need to be replaced - https://phabricator.wikimedia.org/T277005 (elukey) ` elukey@analytics1066:~$ sudo megacli -LDInfo -Lall -aALL | grep "Cache Policy" Default Cache Policy: WriteBack, ReadAdaptive, Direct, Write Cache OK if...
[15:33:30] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > clouddb1021 is owned by Analytics so we can set up ROW there if that's Cool soun...
[15:36:13] (CR) Mforns: MobileWikiAppiOSFeed Whitelist Request (4 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/666227 (owner: Erin Yener)
[15:36:19] (CR) Mforns: WikipediaPortal schema whitelist request (6 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/666223 (owner: Erin Yener)
[15:39:06] (CR) Mforns: MobileWikiAppFeed Whitelist Request (3 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/666229 (owner: Erin Yener)
[15:44:49] (CR) Mforns: WikipediaPortal schema whitelist request (1 comment) [analytics/refinery] - https://gerrit.wikimedia.org/r/666223 (owner: Erin Yener)
[15:48:57] (PS1) Phuedx: universalLanguageSelector: Add timeToChangeLanguage property [schemas/event/secondary] - https://gerrit.wikimedia.org/r/672740 (https://phabricator.wikimedia.org/T275794)
[16:08:29] elukey: email will be safer :)
[16:20:05] Analytics, Traffic: varnishkafka / ATSkafka should support setting the kafka message timestamp - https://phabricator.wikimedia.org/T277553 (Ottomata)
[16:25:53] Analytics-Radar, Growth-Scaling, Product-Analytics, Growth-Team (Current Sprint): Growth: shorten welcome survey retention to 90 days - https://phabricator.wikimedia.org/T275171 (Rileych) a:Tgr
[16:41:54] (CR) Milimetric: "One more comment from Marcel, and then go ahead and merge." [analytics/refinery] - https://gerrit.wikimedia.org/r/658348 (https://phabricator.wikimedia.org/T265732) (owner: Fdans)
[16:51:55] Analytics-Radar, Cassandra, ContentTranslation, Event-Platform, and 10 others: Rebuild all blubber build docker images running on kubernetes - https://phabricator.wikimedia.org/T274262 (Eevans)
[17:05:57] Analytics-EventLogging, Analytics-Radar, Front-end-Standards-Group, MediaWiki-extensions-WikimediaEvents, and 4 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (awight) >>! In T210106#6914801, @Krinkle wrote: > This is what...
[17:18:48] Analytics-EventLogging, Analytics-Radar, Front-end-Standards-Group, MediaWiki-extensions-WikimediaEvents, and 4 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (awight) >>! In T210106#6914789, @Krinkle wrote: > What is the...
[17:41:35] (CR) Awight: "I'll expand the scope of this a bit, and will try to "modernize" the packing files to the best of my understanding." (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/649296 (owner: Awight)
[17:55:49] (CR) Fdans: Add monthly pageview complete job (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/658348 (https://phabricator.wikimedia.org/T265732) (owner: Fdans)
[18:08:48] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Joe) >>! In T120242#6917670, @Ottomata wrote: >> clouddb1021 is owned by Analytics so we can...
[18:26:52] (PS2) Milimetric: Update mysql resolver to work with cloud replicas [analytics/refinery] - https://gerrit.wikimedia.org/r/666209 (https://phabricator.wikimedia.org/T274690)
[18:27:01] (CR) Milimetric: "Working ok up to the connection, gotta troubleshoot some more with Razzi. But @Joal let me know what you think about that comment." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/666209 (https://phabricator.wikimedia.org/T274690) (owner: Milimetric)
[19:10:30] * elukey off!
[19:14:19] ah before leaving
[19:14:39] razzi: we'd need to remember to reboot an-conf100x
[19:14:52] can we do it tomorrow when you log in?
[19:15:09] elukey: that's zookeeper? Yeah can do so tomorrow
[19:15:30] razzi: yes those are the three nodes running zookeeper
[19:15:41] we can reboot it via reboot-single
[19:16:01] the cookbook may be good but we have to limit it to only our cluster, all the other ones are shared
[19:16:33] and it needs to be clear in the code in case that a pre-requisite to use it is that the cluster doesn't share anything
[19:16:55] (so I am supportive but let's be very specific in case, otherwise there is the risk of potential damages)
[19:17:18] ok cool, can start it out with only 1 cluster option and a big warning in the code
[19:43:29] milimetric: Heya - I haz a patch! https://github.com/jobar/gobblin/tree/wmf_versions_update
[19:43:44] \o/ I was already tracking, pulling now
[19:44:03] milimetric: I force-pushed - you need to delete/recreate the branch
[19:44:06] got it, thanks!! I'm playing with the Gson streamer and will add it in there
[19:44:08] milimetric: sorry :(
[19:44:17] milimetric: great :)
[19:44:23] no worries, I didn't make any changes, git automatically forces if that's the case
[19:44:33] milimetric: I have a perf question for you to test if you're ok
[19:45:04] joal: of course, if I get my stuff working
[19:45:56] joal: q: why a list of possible timestamps for a given stream? Anticipating schema evolution where the timestamp field is renamed?
[19:45:59] milimetric: I wonder if fully parsing the json as I currently do is really slower than having to instantiate a stream-reader and stream-read only after interesting fields
[19:46:38] joal: of course, that's the important question, it should be faster but not sure how much. Only one way to find out :)
[19:46:41] milimetric: for genericity - We could reuse the same job for multiple topics, among which some use an old timestamp name and some a new one
[19:47:37] joal: oh I don't love that, would rather do that with templates and keep the json-parsing code simpler. It would get complicated with lists regardless of whole-or-stream decision
[19:47:40] indeed milimetric - unit-testing with a loop should do
[19:47:51] works for me milimetric
[19:48:15] milimetric: let's keep it simple
[19:48:19] not sure about unit testing, can be tricky with caching and generating random enough structures
[19:49:15] milimetric: I suggest getting a 1k-lines revision-create json files from the cluster, and loop over those as an example
[19:49:19] for example :)
[19:49:53] yeah, that's what I was thinking, some real data, maybe 10k records
[19:50:12] sure milimetric - we can even loop over 10k 10 times, should be fine :)
[19:50:20] I think it matters, because I want to be as in control of the performance on this new pipeline as we can be.
[19:52:42] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) > I think they will need further discussion (in the tech forum? with the interest...
[19:53:36] milimetric: IMO we'll only be interested if the perf difference is important - if it's small, the easiest to maintain is better IMO
[19:53:43] anyhow - let's try :)
[19:54:21] joal: hm, https://stackoverflow.com/a/58313599/180664 looks like Jackson streaming is about 3x faster than Gson, but simple Jackson is fast enough
[19:54:24] what do you think?
[19:55:09] I remember there being lots of problems with some older version of Jackson?
[19:55:30] milimetric: I'd give it a try - I have used Jackson streaming for XML, and was happy with performance indeed
[19:56:04] k
[19:57:27] milimetric: awesome - I'll test my code tomorrow, making sure it works functionally with MR for some stream, and help you as needed
[19:57:41] ok, sounds good
[20:04:06] razzi: forgot another thing, just came up to mind - we changed the ip address for clouddb1021, and it is not in the VLAN firewall
[20:04:48] got it elukey - will look into that
[20:08:01] razzi: https://gerrit.wikimedia.org/r/c/operations/homer/public/+/672797 is the patch
[20:08:49] I'll not mention why it needs to be done in this way and how to deploy it, let's find some time tomorrow to discuss it ok?
[20:09:32] since no one answered my email... who is the best person to talk to others about Superset and some WMF internal discussion about it across teams?
[20:11:01] tltaylor: ah snap there were multiple replies but you didn't get into the reply list :D
[20:11:43] milimetric: we use jackson elsewhere
[20:11:49] might be good to keep using that
[20:11:52] tltaylor: done, you should have an email in the inbox
[20:11:57] the only issues with it are no different than other java dep issues
[20:12:04] hard to synchronize sometimes which versions are used
[20:12:08] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Joe) The best practices I am talking about are, basically: - **Don't use the database as an...
[20:12:09] ok ottomata, will do
[20:14:15] * elukey afk again :)
[20:15:37] * elukey afk!
[20:15:58] (wrong irssi command :P)
[20:18:08] thank you!
[20:48:43] (CR) Mforns: [V: +2 C: +2] "Tested this by creating a parallel Superset dashboard and comparing it with the production one. All seemed identical, so merging!" [analytics/refinery] - https://gerrit.wikimedia.org/r/672541 (https://phabricator.wikimedia.org/T277512) (owner: Mforns)
[20:56:54] milimetric: joal have done a bunch of work on data platform doc
[20:57:03] got any time to look it over with me?
[20:57:40] yeah, let's look ottomata
[22:13:08] Analytics, Machine-Learning-Team, ORES, Toolforge, and 2 others: Generate dump of scored-revisions from 2018-2020 for English Wikipedia - https://phabricator.wikimedia.org/T277609 (Halfak)
[22:13:21] Analytics, Machine-Learning-Team, ORES, Toolforge, and 2 others: Generate dump of scored-revisions from 2018-2020 for English Wikipedia - https://phabricator.wikimedia.org/T277609 (Halfak)
[22:13:41] Analytics, Machine-Learning-Team, ORES, Toolforge, and 2 others: Generate dump of scored-revisions from 2018-2020 for English Wikipedia - https://phabricator.wikimedia.org/T277609 (Halfak)
[22:16:45] Analytics, DBA, Event-Platform, WMF-Architecture-Team, Services (later): Reliable (atomic) MediaWiki event production / MediaWiki events as source of truth - https://phabricator.wikimedia.org/T120242 (Ottomata) Thanks for responses, I want to respond more in full too, but here's a quick thoug...
[22:19:44] Analytics, Data-Services, Machine-Learning-Team, ORES, and 2 others: Generate dump of scored-revisions from 2018-2020 for English Wikipedia - https://phabricator.wikimedia.org/T277609 (JJMC89)