[00:13:51] Is the list of who is all in the Analytics Engineering team available publicly anywhere? Pages will say things like "then add someone from Analytics team as a reviewer" but there's nothing on https://wikitech.wikimedia.org/wiki/Analytics_Engineering or https://wikitech.wikimedia.org/wiki/Analytics/Team [00:51:37] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10mpopov) [00:55:01] 10Analytics-Radar, 10Event-Platform, 10Product-Data-Infrastructure, 10Product-Analytics (Kanban): Draft of full process for instrumentation using new client libraries - https://phabricator.wikimedia.org/T275694 (10mpopov) [04:45:46] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10razzi) @Marostegui clouddb1021 has the analytics_multiinstance role applied, is configured to expect data on s1 and s3 only (https://gerrit.wikimedia.org/r/c/... [04:47:13] bearloga: you can find the team on office wiki at https://office.wikimedia.org/wiki/Contact_list#Analytics [05:14:50] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) Thanks @razzi. I have ack'ed the alerts on icinga and I will start working with these sections. If I successfully get them up and running today I... [05:55:05] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) s1 and s3 are now replicating on clouddb1021 (pending enabling GTID - will do it once replication is in sync). Host added to tendril and to zarcil... 
[05:58:23] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) Changed clouddb1021 from planned to active on netbox. [06:44:17] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) I have adjusted a bit the buffer pool sizes. They might need further changing but we'll only know once the sqoops run. [06:58:42] good morning [07:01:10] !log drain + reimage an-worker109[4,5] to Buster [07:01:12] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [07:10:36] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1094.eqiad.wmnet', 'an-worker1095.eqiad.wmnet'] ` The log can be found in... [07:56:40] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1094.eqiad.wmnet', 'an-worker1095.eqiad.wmnet'] ` and were **ALL** successful. [08:04:57] 10Analytics-Radar, 10SRE, 10Patch-For-Review, 10Services (watching), 10User-herron: Replace and expand kafka main hosts (kafka[12]00[123]) with kafka-main[12]00[12345] - https://phabricator.wikimedia.org/T225005 (10elukey) @herron getting back to this so we can add an OKR for Q4 :) We could do the follo... [08:15:06] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10elukey) Remaining nodes: an-worker[1080,1085,1087,1089-1093,1102-1103,1111-1112].eqiad.wmnet,analytics[1072,1076-1077].eqiad.wmnet ` an-worker1080.eqiad.wmnet: /eqiad/A/4 an-worker1103.e... 
[08:16:56] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) s2 and s7 are up and running [08:33:01] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1092.eqiad.wmnet', 'an-worker1093.eqiad.wmnet'] ` The log can be found in... [08:52:55] * elukey bbiab [09:11:58] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1092.eqiad.wmnet', 'an-worker1093.eqiad.wmnet'] ` and were **ALL** successful. [09:13:55] 10Analytics-EventLogging, 10Analytics-Radar, 10Front-end-Standards-Group, 10MediaWiki-extensions-WikimediaEvents, and 4 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (10awight) a:03awight Making that change, to move the server-si... [09:14:44] !log drain + reimage analytics1076 and an-worker1112 to Buster [09:14:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [09:24:57] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1076.eqiad.wmnet', 'an-worker1112.eqiad.wmnet'] ` The log can be found in... [09:52:00] so I found a way to tell log4j to use gzip when rolling over logs for the namenode [09:52:13] but it requires an extra jar, log4j extra [09:52:13] sigh [09:53:54] ahhh and there is a package about it!! 
[09:53:56] yessss [09:59:06] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1076.eqiad.wmnet', 'an-worker1112.eqiad.wmnet'] ` and were **ALL** successful. [10:03:00] (03PS4) 10Phuedx: Add new properties to UniversalLanguageSelector schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) [10:03:51] (03PS5) 10Phuedx: Add new properties to UniversalLanguageSelector schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) [10:13:31] 10Analytics-Radar, 10Cassandra, 10ContentTranslation, 10Event-Platform, and 9 others: Rebuild all blubber build docker images running on kubernetes - https://phabricator.wikimedia.org/T274262 (10JMeybohm) [10:13:39] 10Analytics: Configure the HDFS Namenodes to use the log4j rolling gzip appender - https://phabricator.wikimedia.org/T276906 (10elukey) [10:15:00] didn't manage to make it work but opened --^ [10:59:42] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts: ` ['aqs1010.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202103091058_hno... 
[11:12:52] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['aqs1010.eqiad.wmnet'] ` Of which those **FAILED**: ` ['aqs1010.eqiad.wmnet'] ` [11:40:15] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts: ` ['aqs1010.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202103091139_hno... [12:02:48] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['aqs1010.eqiad.wmnet'] ` and were **ALL** successful. [12:05:19] 10Analytics-EventLogging, 10Analytics-Radar, 10Front-end-Standards-Group, 10MediaWiki-extensions-WikimediaEvents, and 4 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (10awight) a:05awight→03None [12:25:57] Hi team - going through some complicated time at home - Internet was gone, Lino is sick [12:26:05] I'll be off the rest of the day mostly I assume [12:28:41] ack joal, please take care! [12:43:25] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts: ` ['aqs1011.eqiad.wmnet', 'aqs1012.eqiad.wmnet', 'aqs1013.eqiad.wmnet'] ` The log can be found... [12:52:17] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1103.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20... 
[12:59:55] !log drain + reimage an-worker1103 to Buster [12:59:57] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:17:02] mforns: o/ what time should I use to start the new session length job? (I am checking https://etherpad.wikimedia.org/p/analytics-weekly-train for later on) [13:17:13] a-team: please update https://etherpad.wikimedia.org/p/analytics-weekly-train if needed :) [13:24:23] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1103.eqiad.wmnet'] ` and were **ALL** successful. [13:26:58] !log reimage an-worker1102 and an-worker1080 (hdfs journal node) to Buster [13:26:59] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [13:30:35] 10Analytics-Radar: Presto error in Superest - only when grouping - https://phabricator.wikimedia.org/T270503 (10EYener) Thanks again @JAllemandou . This was confusing for me because, like you said, I was only aware of query optimization and it doesn't fit with my understanding of SQL - so I appreciate the inform... [13:34:40] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1102.eqiad.wmnet', 'an-worker1080.eqiad.wmnet'] ` The log can be found in... 
[13:44:16] 10Analytics-Clusters, 10DBA, 10Patch-For-Review: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (10Marostegui) s4 and s6 are now replicating [13:51:44] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['aqs1012.eqiad.wmnet'] ` Of which those **FAILED**: ` ['aqs1012.eqiad.wmnet'] ` [14:05:00] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) @mpopov all that stuff sounds really great! Buuut, this task is meant to track the adaptation of the existing system so that it w... [14:06:54] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1102.eqiad.wmnet', 'an-worker1080.eqiad.wmnet'] ` and were **ALL** successful. [14:11:37] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by hnowlan on cumin1001.eqiad.wmnet for hosts: ` ['aqs1012.eqiad.wmnet', 'aqs1014.eqiad.wmnet', 'aqs1015.eqiad.wmnet'] ` The log can be found... [14:19:57] mforns: o/ want to deploy a sampling rate change? [14:29:43] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1090.eqiad.wmnet', 'an-worker1089.eqiad.wmnet'] ` The log can be found in... 
[14:29:51] !log drain + reimage an-worker1090/89 to Buster [14:29:53] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [14:29:58] 10Analytics, 10Editing-team, 10Event-Platform, 10Patch-For-Review: EditAttemptStep Event Platform Migration - https://phabricator.wikimedia.org/T267343 (10Ottomata) [14:30:09] 10Analytics, 10Editing-team, 10Event-Platform, 10MW-1.36-notes (1.36.0-wmf.34; 2021-03-09), 10Patch-For-Review: VisualEditorFeatureUse Event Platform Migration - https://phabricator.wikimedia.org/T267353 (10Ottomata) [14:39:07] 10Analytics, 10Cassandra: Cassandra3 migration for Analytics AQS - https://phabricator.wikimedia.org/T249755 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['aqs1012.eqiad.wmnet', 'aqs1014.eqiad.wmnet', 'aqs1015.eqiad.wmnet'] ` and were **ALL** successful. [14:52:12] (03PS1) 10Ottomata: Migrate KaiOS schemas from metawiki [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/670186 (https://phabricator.wikimedia.org/T267344) [14:53:50] (03CR) 10Ottomata: [C: 03+2] Migrate KaiOS schemas from metawiki [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/670186 (https://phabricator.wikimedia.org/T267344) (owner: 10Ottomata) [14:58:40] (03PS1) 10Ottomata: Add client_ip to inukapageview and kaiosappfirstrun 1.0.0 [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/670187 (https://phabricator.wikimedia.org/T267344) [14:59:24] (03CR) 10Ottomata: [C: 03+2] Add client_ip to inukapageview and kaiosappfirstrun 1.0.0 [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/670187 (https://phabricator.wikimedia.org/T267344) (owner: 10Ottomata) [15:02:28] ottomata: yes! :D [15:03:22] u got patch? 
[15:08:02] ottomata: I got patch: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/668553 [15:12:16] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1090.eqiad.wmnet', 'an-worker1089.eqiad.wmnet'] ` and were **ALL** successful. [15:15:06] only 6 worker nodes left on stretch [15:15:08] !!! [15:18:49] !log reimage analytics1072 (hadoop hdfs journal node) to buster [15:18:50] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:19:22] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics, 10Product-Data-Infrastructure: [MEP] Determine how stream configuration is authored and deployed - https://phabricator.wikimedia.org/T269774 (10Ottomata) Another reason for moving this at least out of InitialiseSettings.php: Right... [15:20:08] mforns: syncing now [15:20:23] ottomata: cool! [15:21:35] done mforns! i guess it will take some time for configs to reach JS clients [15:21:44] aha, makes sense [15:22:06] will be following kafka per topic in grafana [15:22:10] thanks a lot! [15:22:16] ya can also follow [15:22:16] https://grafana.wikimedia.org/goto/-m811hUMz [15:30:54] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1072.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20... [15:35:42] Hiya team [15:36:03] hello! 
[15:36:14] !log rebalance kafka partitions for webrequest_upload partition 13 [15:36:15] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [15:40:15] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 5 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (10Ottomata) Ok @SBisson we are ready to go. Steps 1-6 done. You should be able to produce events to https://intake-analytics.wiki... [15:40:43] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 5 others: Migrate legacy metawiki schemas to Event Platform - https://phabricator.wikimedia.org/T259163 (10Ottomata) [16:02:11] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1072.eqiad.wmnet'] ` and were **ALL** successful. [16:19:39] mforns: re sanitization [16:19:43] is keepAll not used anywhere currently? [16:19:45] hey yes [16:19:50] mmmm [16:19:59] ottomata: no, not used anywhere [16:20:13] ok, if you don't mind i'm going to lower/camel case that setting [16:20:15] rather than keepAll [16:20:35] yes, no problem, I like keepall better now [16:20:51] or keep_all [16:20:56] whatever you prefer [16:21:07] k [16:21:53] oof, sorry I just realized we put wikistats on the train etherpad and it's poor Luca's week. Luca, if you have any trouble, we owe you that task to improve the deployment, feel free to pass it to me, it only takes me a few minutes [16:22:47] ottomata: session_tick throughput has stabilized at +- 1700 evts/sec [16:22:51] nice [16:22:51] ! [16:22:53] :) [16:24:12] peak should still come in a couple hours, but volume looks expected [16:26:04] milimetric: can I grab your brain for a session length question (given jo-al is not here)? [16:34:24] milimetric: no problem! 
The last time I tried to execute the procedure but ended up with https://gerrit.wikimedia.org/r/c/analytics/wikistats2/+/661968 [16:34:35] see the git+ssh entries in package-lock.json etc.. [16:34:42] no idea why it happened [16:34:49] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10mpopov) [16:37:15] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10elukey) ` (5) an-worker[1085,1087,1091,1111].eqiad.wmnet,analytics1077.eqiad.wmnet ----- OUTPUT of 'cat /etc/debian_version' -----... [16:38:00] 10Analytics, 10Better Use Of Data, 10Event-Platform: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization - https://phabricator.wikimedia.org/T276955 (10mpopov) [16:38:10] 10Analytics, 10Better Use Of Data, 10Event-Platform, 10Product-Analytics: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization - https://phabricator.wikimedia.org/T276955 (10mpopov) [16:38:29] mforns: ps ok? Atlaas cranky [16:38:46] milimetric: ofc [16:40:04] !log reimage analytics1077 to buster [16:40:06] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [16:40:50] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10mpopov) @Ottomata good point, thanks for clarifying the original intention! I've adjusted the scope and factored those ideas out into T276955 [16:45:11] mforns: hola :) did you see my earlier ping? [16:45:30] no elukey looking [16:45:33] I am wondering at what date/time I should start the new session len coord [16:45:34] hello! [16:45:43] I was reading https://etherpad.wikimedia.org/p/analytics-weekly-train [16:45:49] oh! 
elukey, please don't start it [16:45:57] ah perfect :D [16:46:17] today, data is still half at 1% sample rate, and half 10% sample rate [16:46:26] I'll start it tomorrow clean at 10% [16:46:52] mforns: ok so I simply deploy refinery and you'll start it when it is the time, ack? [16:47:02] yes, elukey, thanks! [16:47:03] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['analytics1077.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20... [16:48:33] as far as I can see no source deployment right? [16:48:42] only a quick refinery one without restarts [16:48:51] well I can wait for standup to ask, so we are sure [16:59:45] elukey: i got nuthin for ya :) [17:04:58] 10Analytics, 10Analytics-Kanban, 10Event-Platform: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (10Ottomata) Thank you! :) [17:17:32] (03PS1) 10Mforns: Preemptively add weight field to session length intermediate table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/670239 (https://phabricator.wikimedia.org/T273116) [17:18:48] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['analytics1077.eqiad.wmnet'] ` and were **ALL** successful. [17:19:42] 10Analytics-Kanban, 10Patch-For-Review: Update sqoop to work with multi-instance clouddb1021 mariadb host - https://phabricator.wikimedia.org/T274690 (10Ottomata) [17:24:55] (03CR) 10Milimetric: "Makes sense, both the temporary 10% hardcoded and the plan after that." 
[analytics/refinery] - 10https://gerrit.wikimedia.org/r/670239 (https://phabricator.wikimedia.org/T273116) (owner: 10Mforns) [17:25:55] 10Analytics-Clusters, 10observability, 10User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (10Ottomata) a:03razzi Talked about this today, bare minimum of what we should do now: - Set up a paging schedule for analytics SREs in Splunk OnCall. - make sure... [17:26:13] 10Analytics-Clusters, 10observability, 10User-fgiunchedi: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (10Ottomata) p:05Triage→03Medium [17:27:13] (03CR) 10Mforns: "> Patch Set 1:" [analytics/refinery] - 10https://gerrit.wikimedia.org/r/670239 (https://phabricator.wikimedia.org/T273116) (owner: 10Mforns) [17:28:54] 10Analytics-Clusters: /wmf/data/raw should be readable by analytics-privatedata-users - https://phabricator.wikimedia.org/T275396 (10Ottomata) a:03Ottomata [17:29:29] 10Analytics-Clusters, 10Analytics-Kanban: /wmf/data/raw should be readable by analytics-privatedata-users - https://phabricator.wikimedia.org/T275396 (10Ottomata) p:05Triage→03High [17:29:48] 10Analytics, 10Analytics-EventLogging, 10Analytics-Kanban, 10Better Use Of Data, and 5 others: KaiOS / Inuka Event Platform client - https://phabricator.wikimedia.org/T273219 (10SBisson) @Ottomata the vast majority of users are getting auto updates. For the others, we'll see how their numbers decline once... [17:31:33] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban): InukaPageView Event Platform Migration - https://phabricator.wikimedia.org/T267344 (10SBisson) Putting on the Inuka kanban board for the client-side migration. [17:32:01] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban): KaiOSAppFeedback Event Platform Migration - https://phabricator.wikimedia.org/T267345 (10SBisson) Putting on the Inuka kanban board for the client-side migration. 
[17:32:14] 10Analytics, 10Event-Platform, 10Inuka-Team (Kanban): KaiOSAppFirstRun Event Platform Migration - https://phabricator.wikimedia.org/T267346 (10SBisson) Putting on the Inuka kanban board for the client-side migration. [17:33:33] (03Abandoned) 10Mforns: Preemptively add weight field to session length intermediate table [analytics/refinery] - 10https://gerrit.wikimedia.org/r/670239 (https://phabricator.wikimedia.org/T273116) (owner: 10Mforns) [17:34:14] 10Analytics-Clusters: Add analytics-presto.eqiad.wmnet CNAME for Presto coordinator failover - https://phabricator.wikimedia.org/T273642 (10Ottomata) a:03razzi [17:37:16] 10Analytics-Clusters: Upgrade to Superset 1.0 - https://phabricator.wikimedia.org/T272390 (10razzi) [17:39:27] 10Analytics-Clusters, 10Data-Persistence-Backup: Evaluate the need to generate and maintain zookeeper backups - https://phabricator.wikimedia.org/T274808 (10Ottomata) Hello! Why not eh? I don't think this is high priority for us, but it is certainly a good idea. Feel free to reach out to any Data/Analytics E... [17:39:51] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1085.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20... 
[17:41:26] 10Analytics-Clusters, 10Analytics-Kanban: Upgrade to Superset 1.0 - https://phabricator.wikimedia.org/T272390 (10Ottomata) [17:41:52] 10Analytics-Clusters, 10Analytics-Kanban, 10Patch-For-Review: Add 6 worker nodes to the HDFS Namenode config of the Analytics Hadoop cluster - https://phabricator.wikimedia.org/T275767 (10Ottomata) [17:42:19] 10Analytics-Clusters, 10Patch-For-Review: Add superset-next.wikimedia.org domain for superset staging - https://phabricator.wikimedia.org/T275575 (10Ottomata) p:05Triage→03Medium [17:46:35] 10Analytics-Clusters, 10Product-Analytics: Can't re-run failed Oozie workflows in Hue/Hue-Next (as non-admin) - https://phabricator.wikimedia.org/T275212 (10Ottomata) p:05Triage→03High [17:53:58] silly question, do we have any Kafka listeners in Python already deployed in our infrastructure? was thinking about writing one to listen to the NEL eventgate streams to export some stats [17:55:50] cdanis: do you mean python library or daemon running somewhere? [17:56:05] elukey: daemon :) [17:56:33] but recommendations on libraries are appreciated also -- do we use the official one? a wrapped librdkafka? something else? 
[18:00:58] cdanis: we use confluent-kafka-python (that wraps librdkafka) and kafka-python for eventlogging (that we are deprecating) [18:01:03] the latter is a pure python client [18:01:35] for daemons I don't think we have much, kafkatee is usually the one that we use to dump from kafka if we need [18:01:37] cdanis: the performance team does this [18:01:44] I'm happy with whatever is prepackaged in production already and is preferred by you all :) sounds like confluent-kafka-python is the way to go [18:01:59] yeah I had been using kafkatee but it's a bit tricky because I need to merge a few different streams into one [18:02:02] https://gerrit.wikimedia.org/r/plugins/gitiles/performance/navtiming/+/refs/heads/master [18:02:24] cdanis: this is what I want to be able to give you: https://phabricator.wikimedia.org/T262942 [18:02:24] thanks ottomata! [18:02:31] elukey: re: merging streams, you may enjoy this: https://phabricator.wikimedia.org/P13393 [18:02:56] i was thinking about this exact use case today [18:03:18] i'd love to have an easily deployable event stream prometheus exporter stream processing job [18:04:42] * razzi afk for lunch [18:04:47] dcausse also, check this out [18:04:47] https://gist.github.com/ottomata/ec5cd742fc2d2e894126e31ddc34ebd3#file-spark-streaming-demo-for-isaac-L21-L39 [18:05:18] (sorry the formatting is bad... it was from a jupyter notebook) [18:05:30] https://gist.github.com/ottomata/ec5cd742fc2d2e894126e31ddc34ebd3#file-spark-streaming-demo-for-isaac-L90-L106 [18:05:32] is the interesting part [18:05:46] cdanis: I love the name :D [18:06:13] oh cdanis [18:06:15] do you know about kafkatee? [18:06:16] elukey: I was honestly stunned it didn't exist as a UNIX utility already, or in moreutils or something [18:06:20] ottomata: I do! [18:06:25] oh yes you do. 
[18:06:29] ottomata: but to reconstruct the NEL stream I had to merge streams from two different brokers [18:06:33] which kafkatee won't do on its own [18:06:38] oh [18:06:44] so I did a construction like (sigh, embarrassing things ahead) [18:06:49] cdanis: that is a problem with the way the logging cluster kafka is configured [18:07:04] eet.py <(kafkatee -B broker1 ... ) <(kafkatee -B broker2 ... ) | stats-exporter.py [18:07:06] in other clusters, datacenter prefixed topics are always mirrored to each other [18:07:07] very gross :) [18:07:20] logging is not done that way because they expect the only consumer to be logstash [18:07:26] right [18:07:29] and they do the cross DC repl with elasticsearch [18:07:35] i think that's wrong, they should replicate with kafka [18:07:44] but ¯\_(ツ)_/¯ [18:08:11] cdanis: we should probably just set up mirror maker to consume from logging clusters into kafka-jumbo [18:08:17] then you'd have both topics in one place [18:08:46] hmm, sure [18:08:59] I'd also be interested in playing with NEL data using Turnilo [18:09:00] that would also let us ingest it into hiv [18:09:01] e [18:09:02] yeah [18:09:02] hmm, cross-dc replication with elasticsearch? That's only in the paid offering [18:09:03] right [18:09:12] ebernhardson: ok i don't know how they do it then [18:09:15] but they don't use kafka [18:09:26] maybe they have multiple logstash consumers that consume cross dc? [18:09:40] the suggested deployment is multiple logstash consumers reading from kafka per-dc [18:09:51] they might be writing from one logstash to multiple clusters though [18:09:53] suggested by? 
:) [18:09:58] upstream [18:10:18] ah, i think that maybe is a good suggestion if you don't already have cross dc kafka mirroring set up [18:10:25] i could be mistaken, i luckily gave away ownership of this system 3 years ago :P [18:10:37] for all our other kafka clusters the data is already mirrored cross dc [18:10:48] so the local logstash could just use the local kafka cluster and consume both dc topics [18:11:04] haah [18:11:11] by 'all' i mean kafka-main [18:11:12] haha [18:11:41] yes, if we already mirror kafka it almost certainly makes the most sense for logstash to read the local kafka [18:12:17] yeah, that would make mirroring to jumbo easier too [18:12:27] cdanis: ... i'm going to file a task to fix kafka logging mirroring [18:12:31] actually>.>..>> [18:12:37] that can be done without changing any logstash consumers [18:12:50] it would just make the prefixed topics available in both clusters [18:12:58] oh! yeah [18:13:01] the topics are indeed prefixed [18:19:58] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1085.eqiad.wmnet'] ` and were **ALL** successful. [18:21:27] 10Analytics, 10SRE: Set up cross DC topic mirroring for Kafka logging clusters - https://phabricator.wikimedia.org/T276972 (10Ottomata) [18:21:35] 10Analytics, 10SRE: Set up cross DC topic mirroring for Kafka logging clusters - https://phabricator.wikimedia.org/T276972 (10Ottomata) Note: Recent versions of Kafka have a new robust version of MirrorMaker, [[ https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0 | MirrorMaker 2 ]]. O... 
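The merge step cdanis describes above (several `kafkatee` outputs combined into one stream before `stats-exporter.py`) can be sketched in plain Python. This is not the actual `eet.py` from the paste, which is not shown in this log; `merge_lines` is a hypothetical helper that assumes each input is an iterable of text lines and that interleaving order across inputs does not matter. It starts one reader thread per input and funnels everything through a shared queue.

```python
import queue
import threading
from typing import Iterable, Iterator

# Sentinel pushed onto the queue once per input when that input is exhausted.
_DONE = object()


def merge_lines(sources: Iterable[Iterable[str]]) -> Iterator[str]:
    """Yield lines from every source as they arrive.

    Order is preserved within a single source; order across sources
    depends on arrival time and is not guaranteed.
    """
    sources = list(sources)
    q: queue.Queue = queue.Queue()

    def pump(src: Iterable[str]) -> None:
        # Copy one input into the shared queue, then signal completion.
        try:
            for line in src:
                q.put(line)
        finally:
            q.put(_DONE)

    for src in sources:
        threading.Thread(target=pump, args=(src,), daemon=True).start()

    remaining = len(sources)
    while remaining:
        item = q.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item
```

Wrapped in a small script that opens its command-line arguments as files and writes each merged line to stdout, this could take the place of `eet.py` in the shell construction quoted above. With mirroring between the logging clusters in place, as proposed in T276972, a single consumer on one cluster would make the merge unnecessary.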
[18:22:43] 10Analytics, 10SRE, 10observability: Set up cross DC topic mirroring for Kafka logging clusters - https://phabricator.wikimedia.org/T276972 (10Ottomata) [18:23:12] cdanis: I really think that is the right solution, doesn't mean you shouldn't use a python consumer or whatever [18:23:21] but you really should not have to consume from multiple kafka clusters [18:23:46] setting up that mirroring is not hard, it just takes a little bit of puppet declaration [18:23:53] all the defines are set up to just make it work [18:24:32] ok! [18:25:29] https://github.com/wikimedia/puppet/blob/ebdc6d429f4bf78a56375a6bd56f1244f1098710/modules/profile/manifests/kafka/mirror.pp [18:26:03] !log reimage an-worker1087 to buster [18:26:05] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [18:26:14] 3 nodes left [18:33:38] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1087.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20... [18:41:38] * elukey afk! dinner [19:06:18] mforns: [19:06:21] https://www.irccloud.com/pastebin/69K8dMzn/ [19:06:26] without separators? [19:06:47] ottomata: lookin [19:07:29] ottomata: docs look good, but I can not see the code [19:07:36] 10Analytics-Clusters, 10Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1087.eqiad.wmnet'] ` and were **ALL** successful. [19:07:38] oh mforns i was just wondering why? [19:07:57] the regex replaces [19:07:58] tag.replaceAll("[-_ ]", "").toLowerCase [19:08:33] ottomata: regarding without separators or snake_case, I don't mind, you choose! you can make it more like EP conventions [19:08:53] ottomata: oh! [19:08:55] was there a reason to do that? 
were there inconsistencies?
[19:09:07] you mean changing the field names as well?
[19:09:07] keep_all vs keepall?
[19:09:11] noono
[19:09:18] mforns: i'm just wondering why that function was needed at all
[19:09:41] oh! I understand now...
[19:09:45] yes, there was a problem
[19:10:40] the include-list accepts any case, to make it more robust
[19:11:00] but when it comes to comparing the fields specified in the include-list with the actual database fields...
[19:11:13] they have to be normalized to match them correctly
[19:11:58] the code, IIUC, normalizes both the include-list field names as well as the database field names
[19:12:16] as the whitelist is parsed only once, that function is doing the normalization all at once
[19:12:20] OHHHH
[19:12:22] on the include-list side
[19:12:30] right right
[19:12:49] ok so it's not just for the allowlist, but also for the field names to normalize them with db
[19:12:55] yes
[19:12:57] got it
[19:13:38] it just prepares the include-list to be used by the transform function without issues
[19:15:18] so, I remember now, the include-list can actually have tags in any format: keepall KeepAll keepAll keep-all keep_all
[19:15:52] ok, i might remove that if you are ok, and only support keep_all
[19:15:53] not that we want to encourage that, but it is robust to such inconsistencies
[19:16:04] i'd rather the thing error than maintain those inconsistencies
[19:16:39] yes, that makes sense
[19:17:16] I think I did it this way, because it's an include-list and not an exclude-list
[19:18:04] but yea, what you say makes more sense
[19:18:32] if there's an invalid tag, error
[19:32:09] (PS1) Ottomata: Rename whitelist to allowlist for Refine sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670269 (https://phabricator.wikimedia.org/T273789)
[19:32:51] (PS2) Ottomata: Rename whitelist to allowlist for Refine sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670269
(https://phabricator.wikimedia.org/T273789)
[19:37:21] (CR) jerkins-bot: [V: -1] Rename whitelist to allowlist for Refine sanitization [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670269 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[20:08:15] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on cumin1001.eqiad.wmnet for hosts: ` ['an-worker1091.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/20...
[20:41:32] Analytics-Clusters, Patch-For-Review: Install Debian Buster on Hadoop - https://phabricator.wikimedia.org/T231067 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['an-worker1091.eqiad.wmnet'] ` and were **ALL** successful.
[20:42:26] !log reimaged an-worker1091 to buster
[20:42:29] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[21:38:45] (CR) Jdlrobson: [C: +1] Add new properties to UniversalLanguageSelector schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668743 (https://phabricator.wikimedia.org/T275766) (owner: Phuedx)
[22:00:08] !log rebalance kafka partitions for webrequest_upload partition 14
[22:00:10] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[22:32:05] Analytics-Clusters, Analytics-Kanban: Upgrade to Superset 1.0 - https://phabricator.wikimedia.org/T272390 (razzi) Found another bug in 1.0.1: Viewing https://superset.wikimedia.org/superset/dashboard/165/, the top left chart "Impression Count Pie Chart | Banners Selected | FY2021 | India Campaign" loads...
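A minimal Python sketch of the two behaviours weighed in the 19:07 to 19:18 exchange above: the lenient normalization quoted there (`tag.replaceAll("[-_ ]", "").toLowerCase`) versus the stricter option of accepting only `keep_all` and erroring on anything else. The function names and the single-entry set of allowed tags are assumptions for illustration; this is not the refinery code, which is not shown in the log.

```python
import re

# Assumption: only keep_all is listed; the real allowlist may support other tags.
ALLOWED_TAGS = {"keep_all"}


def normalize_lenient(name: str) -> str:
    """Mirror the quoted expression: drop '-', '_' and ' ', then lowercase."""
    return re.sub(r"[-_ ]", "", name).lower()


def validate_strict(tag: str) -> str:
    """Accept only the exact canonical spelling; anything else is an error."""
    if tag not in ALLOWED_TAGS:
        raise ValueError(f"invalid allowlist tag: {tag!r}")
    return tag


# Lenient: every spelling mentioned in the log collapses to the same key.
variants = ["keepall", "KeepAll", "keepAll", "keep-all", "keep_all"]
lenient_keys = {normalize_lenient(v) for v in variants}
```

With the lenient function, all five spellings map to one key, which is the robustness described in the log; with the strict function, a misspelled tag fails loudly instead of being silently accepted.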
[23:28:34] (PS9) Mholloway: Image recommendations table for android [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[23:30:48] (PS1) Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789)
[23:31:25] (CR) Ottomata: "I haven't tested this yet, but I think the general idea will work. Ready for initial comments." [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[23:32:20] (CR) Mholloway: "I squash-merged this with the abandoned parent commit and rebased it." [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[23:36:13] (CR) jerkins-bot: [V: -1] [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[23:36:41] (PS2) Ottomata: [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789)
[23:38:43] (CR) Ottomata: "For a minute, I tried to keep EventLoggingSanitization as a wrapper around RefineSanitize with hardcoded args for EventLogging, but I thin" [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[23:41:16] (CR) jerkins-bot: [V: -1] [WIP] Refactor EventLoggingSanitization to a generic job: RefineSanitize [analytics/refinery/source] - https://gerrit.wikimedia.org/r/670321 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[23:41:35] (CR) Mholloway: [C: -1] "Mostly stylistic nitpicks inline."
(3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[23:44:40] (CR) Mholloway: [C: -1] Image recommendations table for android (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)
[23:59:38] (CR) Ottomata: "Nit on naming." (3 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/668244 (owner: Sharvaniharan)