[00:00:17] Analytics, Event-Platform, Product-Analytics, Product-Data-Infrastructure: [MEP] [BUG] Timestamp format changed in migrated client-side EventLogging schemas - https://phabricator.wikimedia.org/T277253 (nettrom_WMF)
[01:48:23] PROBLEM - Check the last execution of monitor_refine_event_failure_flags on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:01:02] Analytics, Machine-Learning-Team: Configure the Hadoop cluster to use the GPUs available on some workers - https://phabricator.wikimedia.org/T276791 (elukey) Thanks a lot for the list of use cases! I created a couple of subtasks to deal with the Hadoop config, once done we should be able to restart fro...
[07:21:20] !log re-run monitor_refine_event_failure_flags
[07:21:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:25:19] RECOVERY - Check the last execution of monitor_refine_event_failure_flags on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event_failure_flags https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[09:34:28] so the capacity scheduler is a bit more complicated, but I really like it
[10:11:14] Analytics-Clusters: Review the Yarn Capacity scheduler and see if we can move to it - https://phabricator.wikimedia.org/T277062 (elukey) Still trying to figure out the best way to set this in puppet, but I have created this first draft to kick off a discussion: ` yarn-site.xml (listed as properties for clar...
[11:52:31] spent 2h in a puppet refactoring rabbit hole
[11:52:49] result -> more ideas, more mess
[11:53:03] going to take a lunch break and will get back to it with a fresh mind :D
[13:56:11] hello teammm
[13:58:51] holaaa
[14:01:42] (PS7) Phuedx: universalLanguageSelector: Add new properties [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766)
[14:04:37] (PS8) Phuedx: universalLanguageSelector: Add new properties [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766)
[14:04:48] (CR) Phuedx: universalLanguageSelector: Add new properties (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[14:09:31] (CR) Phuedx: universalLanguageSelector: Add new properties (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[14:42:03] o/
[14:42:38] hola hola
[14:44:39] yoyo
[14:56:17] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: analytics1066's BBU might need to be replaced - https://phabricator.wikimedia.org/T277005 (Cmjohnson) @elukey @Ottomata I would like to do this Monday morning my time around 11am local. 1600UTC
[15:02:14] Analytics-Clusters, DC-Ops, SRE, ops-eqiad: analytics1066's BBU might need to be replaced - https://phabricator.wikimedia.org/T277005 (elukey) @Cmjohnson perfect, @razzi might be around as well, in case we'll let you to sync and do the work :)
[15:09:07] (CR) Joal: [C: +1] "Let's have this merged 😊 Thanks a lot @lexnasser" [analytics/aqs] - https://gerrit.wikimedia.org/r/657228 (https://phabricator.wikimedia.org/T207171) (owner: Lex Nasser)
[15:18:44] Analytics-Clusters, Technical-blog-posts: Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop - https://phabricator.wikimedia.org/T277133 (srodlund) @elukey Hey hey! This sounds like a great idea for a blog post (or two)! It is perfectly fine to plan for more than one...
[15:48:32] hnowlan: I added you to https://gerrit.wikimedia.org/r/c/operations/puppet/+/671172 if you want to get exposure to our codebase, but feel free to just skip it
[15:48:49] I spent some time trying to clean up common.yaml from our default configs
[15:49:15] basically we have a profile, profile::hadoop::common, that is responsible to deploy a lot of xmls with common configs that all nodes need to have
[15:49:45] nice, thanks! I'll give it a look
[15:52:43] it is not the prettiest thing that we have to offer, but it is a compromise :)
[15:52:51] from time to time I try to make it better
[16:10:52] Analytics-Radar, Cassandra, observability, Puppet, and 2 others: Upgrade prometheus-jmx-exporter on all services using it - https://phabricator.wikimedia.org/T192948 (colewhite)
[16:10:56] Analytics-Clusters, CirrusSearch, SRE, Wikidata, and 3 others: Upgrade prometheus-jmx-exporter - https://phabricator.wikimedia.org/T276595 (colewhite) Open→Resolved a:colewhite prometheus-jmx-exporter 0.15.0 is deployed to our apt repo.
[16:26:14] (CR) Mforns: [C: +1] "LGTM!" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670269 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[16:39:35] ottomata: o/
[16:39:37] re: https://gerrit.wikimedia.org/r/c/operations/puppet/+/671172/
[16:39:46] I only added the properties that are common to both clustrs
[16:39:51] *clusters
[16:39:58] nothing needs to be overridden
[16:40:14] oh.
[16:40:19] and in case, it is possible via common.yaml, that will take the precedence
[16:40:28] yeah I just wanted to clean up the defaults
[16:40:35] the memory settings are the same?
[16:40:56] the map/reduce ones etc... yes, we didn't really set them up differently
[16:41:07] the ones that are different stay on common.yaml
[16:41:17] elukey: i guess if the settings that are the same are also ones that would be the same if we were to e.g. set up a ML hadoop or a public data lake hadoop
[16:42:05] ottomata: yes exactly this was the idea
[16:42:14] ok sorry didn't get that
[16:42:17] carry on +1 :)
[16:42:31] nono sorry I should've explained in a better way :(
[16:42:47] I had the same idea that you had about a specific profile config for each cluster
[16:42:52] i guess if we run into something that isn't a good default we can always move it back to hiera
[16:42:58] but I ended up in a little mess so I decided to take a little step
[16:43:13] yes yes all that it is not a good default goes in hiera
[16:43:42] I added in all kerberos etc.. things too that we don't really change
[16:44:55] ottomata: I basically started from https://phabricator.wikimedia.org/T277062#6907915, wondering where that blurb could go
[16:45:22] common.yaml is an easy one, maybe a separate profile::hadoop::yarn::scheduler.pp could be better
[16:45:36] (the config is not final, but it is a little more verbose than fair)
[16:47:19] for example, a separate profile that renders capacity-scheduler.xml could be added only to the masters
[16:47:27] ah yes this seems better
[16:53:49] (PS1) Silvan Heintze: Track editor numbers split by namespace [analytics/wmde/scripts] - https://gerrit.wikimedia.org/r/671195 (https://phabricator.wikimedia.org/T275999)
[16:58:26] (CR) Mforns: "LGTM! Left 2 nit-picky comments, but will not insist if you want to skip those." (2 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/658348 (https://phabricator.wikimedia.org/T265732) (owner: Fdans)
[17:01:24] (CR) Mforns: "> Patch Set 2:" (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/649296 (owner: Awight)
[17:01:29] (CR) Mforns: [C: +1] Glue for pure-setup.cfg project [analytics/reportupdater] - https://gerrit.wikimedia.org/r/649296 (owner: Awight)
[17:06:32] joal, mforns - did any of you restart mediawiki-wikitext-history?
[17:06:42] elukey: that was a question I had for you :)
[17:06:44] elukey: not me
[17:06:47] ahahahhaha
[17:06:51] ok - very weird :)
[17:07:48] ottomata: was it you?
[17:12:13] no ok this is very weird
[17:12:18] so the job failed this EU morning
[17:12:45] and the current history-wikitext was submitted, according to hue, March 12, 2021 9:22 AM
[17:13:24] yup elukey - I saw that - I don't understand
[17:15:03] so there is a failed one in
[17:15:04] https://hue-next.wikimedia.org/hue/jobbrowser/#!id=0023575-210222192802983-oozie-oozi-W
[17:15:09] I found it via FAILED workflows
[17:16:03] ahhh https://hue-next.wikimedia.org/hue/jobbrowser/#!id=0001955-201103154415936-oozie-oozi-C
[17:16:06] joal: --^
[17:16:11] there are two coords active!
[17:16:38] I am going to kill this one
[17:17:13] WUT?
[17:17:30] elukey: makes no sense! there is only one visible in
[17:17:33] old hue
[17:17:57] I know
[17:18:06] /o\
[17:18:17] even hue-next, I am trying to find it now
[17:20:38] !log kill duplicate mediawiki-wikitext-history coordinator failing and sending emails to alerts@
[17:20:40] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[17:22:07] ok so the oozie id was 0001955-201103154415936-oozie-oozi-C and I checked via CLI, it seems killed
[17:25:07] (CR) Mforns: "Thanks for this patch! Left a comment." (1 comment) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/667159 (https://phabricator.wikimedia.org/T193174) (owner: Awight)
[17:27:22] elukey: separate profile makes sense!
[17:27:31] you could probably put that in the bigtop module
[17:27:55] elukey: i did not restart anything!
[17:33:30] ottomata: yes yes we solved the mystery, hue was hiding stuff from us!
[17:37:04] (CR) Mforns: "LGTM overall! Left a couple comments." (2 comments) [analytics/reportupdater] - https://gerrit.wikimedia.org/r/667192 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[17:38:09] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/666933 (https://phabricator.wikimedia.org/T273454) (owner: Awight)
[17:39:28] going afk, have a good weekend folks!
[17:39:34] see you on monday
[17:39:35] (PS2) Mforns: Fix typo: no "performer" field [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/667565 (https://phabricator.wikimedia.org/T272569) (owner: Awight)
[17:39:36] :)
[17:39:46] (CR) Mforns: [V: +2 C: +2] "LGTM" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/667565 (https://phabricator.wikimedia.org/T272569) (owner: Awight)
[17:49:25] (PS3) Mforns: Filter bot traffic out of metrics [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/668032 (https://phabricator.wikimedia.org/T276308) (owner: Awight)
[17:50:27] (CR) Mforns: [V: +2 C: +2] "LGTM! Rebased, and merging." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/668032 (https://phabricator.wikimedia.org/T276308) (owner: Awight)
[17:51:27] (CR) Mforns: [V: +2 C: +2] "LGTM" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/668039 (https://phabricator.wikimedia.org/T271902) (owner: Awight)
[17:58:20] (CR) Mforns: "LGTM! I think some changes didn't make it to patch set 3, though." (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[18:31:40] Analytics, Event-Platform, Product-Analytics, Product-Data-Infrastructure: [MEP] [BUG] Timestamp format changed in migrated client-side EventLogging schemas - https://phabricator.wikimedia.org/T277253 (nettrom_WMF) At the moment it's unclear in the Product Analytics team whether the millisecond t...
[18:42:22] Analytics, Event-Platform, Product-Analytics, Product-Data-Infrastructure: [MEP] [BUG] dt field in migrated client-side EventLogging schemas is not set to meta.dt - https://phabricator.wikimedia.org/T277330 (nettrom_WMF)
[21:20:44] (PS10) Sharvaniharan: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244
[21:22:44] (CR) Sharvaniharan: "Changes done. Thank you @Ottomata and @MHolloway" (6 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[21:25:57] (CR) Sharvaniharan: "One tiny clarification... I am assuming it doesn't matter if the field names are different between the old platform and MEP. At this point" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[21:26:30] Analytics, Analytics-EventLogging, Better Use Of Data, Event-Platform, and 4 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (SBisson) @Ottomata what is the `hasty=true` param? Should we always have it in the intake URL?
[21:32:43] Analytics, Event-Platform, Inuka-Team (Kanban): KaiOSAppFeedback Event Platform Migration - https://phabricator.wikimedia.org/T267345 (SBisson) a:SBisson
[21:37:37] (PS11) Sharvaniharan: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244
[21:42:16] (PS12) Sharvaniharan: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244