[06:11:51] Analytics, Analytics-Wikistats: Wikistats active editors metric reporting unrealistic numbers - https://phabricator.wikimedia.org/T265322 (Tgr)
[06:14:10] Analytics, Analytics-Wikistats: Wikistats active editors metric reporting unrealistic numbers - https://phabricator.wikimedia.org/T265322 (Tgr) I am guessing this is a regression of T213800 plus actually counting users with 1+ monthly edits, not 5+ (compare https://stats.wikimedia.org/#/hu.wikipedia.org/...
[06:15:07] goof morning!
[06:15:47] *good :D
[06:19:38] Good morning - still repairing my computer :S
[06:20:18] !log decom analytics1048 from the Hadoop cluster
[06:20:20] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:20:22] joal: bonjour :(
[07:56:30] Buongiorno!
[07:56:45] !log re-imaging stat1004 to Buster
[07:56:47] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[07:57:12] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (ops-monitoring-bot) Script wmf-auto-reimage was launched by klausman on cumin1001.eqiad.wmnet for hosts: ` ['stat1004.eqiad.wmnet'] ` The log can be found...
[07:59:31] klausman: good morning :)
[08:11:46] Analytics-Clusters, Operations, Traffic: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (ema) The function `VUT_Main` is the main loop of VUT programs. The [[ https://github.com/varnishcache/varnish-cache/blob/6d4df3639725bbec6d1657b07867ec44f4ba14f8/lib/libvarnish...
[08:44:39] bbiab
[08:48:15] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['stat1004.eqiad.wmnet'] ` and were **ALL** successful.
[08:59:25] !log Regenned the jupyterhub venvs on stat1004
[08:59:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[09:16:30] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (klausman) This should be complete, finally!
[10:50:34] Analytics-Clusters, Operations, Traffic, Patch-For-Review: varnishkafka 1.1.0 CPU usage increase - https://phabricator.wikimedia.org/T264074 (ema) >>! In T264074#6537906, @ema wrote: > So varnishkafka seems to be correctly looping continuously in the do-while part of VUT_Main. Why is VSM_Status b...
[10:55:49] hey all!
[10:56:53] elukey, joal: I believe we need to add 2 more fields to netflow events. What do you think, do we want to increase the kafka topic partitions? https://gerrit.wikimedia.org/r/c/operations/puppet/+/633510
[10:57:26] I mean, before we merge this ^
[10:57:35] or merge first, and then evaluate?
[11:26:22] (CR) Mforns: "I left some comments on naming/typos, but looks good to me overall!" (8 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/633579 (https://phabricator.wikimedia.org/T257692) (owner: Nuria)
[12:01:39] \o/ I'm back with almost everything reinstalled
[12:02:06] mforns: I think the data growth of netflow with the 2 fields should not be huge - let's apply and watch
[12:02:14] elukey: can you confirm --^ please?
[12:27:00] joal: sounds good to me
[12:32:50] joal: I agree yes
[12:33:07] great :)
[12:37:37] Analytics-Clusters, Patch-For-Review: Review an-coord1001's usage and failover plans - https://phabricator.wikimedia.org/T257412 (elukey) Created https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Mysql_Meta#Restore_a_Backup
[12:37:49] Analytics, Analytics-Kanban, Product-Analytics, Patch-For-Review: Add dimensions to editors_daily dataset - https://phabricator.wikimedia.org/T256050 (JAllemandou) Hi @cchen - I have commented on your CR last week and wanted to be sure you noticed :)
[12:38:20] !log drop /srv/backup/mysql from an-master1002 (not used anymore)
[12:38:22] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[12:44:04] joal: about https://issues.apache.org/jira/browse/CALCITE-4034 - should we set a spike to see if it can be used for sqooping? It is really valuable in the context of the labsdb1012 reimage/refactor, since during the next months we'll have to move it to multi-instance
[12:44:50] elukey: happy to do so!
[12:45:42] elukey: we should sync with persistence team (will ping Manu) and see if we can get access to innodb files
[12:46:08] elukey: Let's create a task :)
[12:48:27] yep let's do it
[12:49:51] the backup infra should already have all that we need, the difficult part is surely how to read those files
[12:50:04] agreed elukey
[12:57:07] elukey: I can easily imagine a system with the backup being pushed both on current backup storage + hdfs, then converted on Hadoop to parquet using calcite - That'd be so neat !
[12:58:32] wow that sounds pretty cool
[13:00:24] Hey! Hi ottomata :) How are the legs?
[13:00:54] hellooo ottomata :]
[13:01:23] thanks elukey and joal :] will ask ayounsi to merge.
[13:01:33] hellloooooo
[13:01:35] legs are good!
[13:01:40] joal: I agree, but that would mean adding kerberos + hdfs + etc.. to the backup infra, and I am not sure that everybody would be happy :D
[13:01:46] hello ottomata !
[13:01:56] makes sense elukey
[13:01:56] stronger for sure, but almost back to normal
[13:02:07] :)
[13:02:27] And the eyes? Have you seen beautiful places?
[13:02:31] elukey: perhaps we can just restore/copy the files from back up infra to a host in the an cluster
[13:02:37] omg joal yes
[13:02:41] still processing pictures
[13:02:47] it kinda feels like a dream
[13:02:50] i don't think utah is a real place
[13:03:17] :)
[13:04:17] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (mforns) @ayounsi > Let me know when it's fine to merge the relevant change (for src_net + dst_net at least). Please, merge...
[13:04:48] ottomata: sure it is a possibility, but those backups are huge so we'd need a host with a lot of disk space to copy/upload etc..
[13:07:33] elukey: reducing the number of replicas of that thing seems a good idea I assume - Could we have dedicated read-only hosts whose job is to sync with prod and write files to HDFS?
[13:07:51] elukey: also - I'm ready to talk about druid if you have a minute now
[13:08:26] joal: no idea about the read-only hosts, but it would be definitely great
[13:09:56] joal: sure for druid, bc?
[13:10:06] OMW elukey
[13:20:46] (PS2) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[13:29:20] (PS3) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[13:34:02] (CR) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest (1 comment) [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[13:40:56] Analytics, Analytics-Kanban, Operations, netops, Patch-For-Review: Add more dimensions in the netflow/pmacct/Druid pipeline - https://phabricator.wikimedia.org/T254332 (ayounsi) Merged, note that it's not in a CIDR notation, so `src_mask` + `dst_mask` would be needed to generate the CIDR form.
[13:55:54] Starting build #4 for job wikimedia-event-utilities-maven-release-docker
[13:56:52] Project wikimedia-event-utilities-maven-release-docker build #4: SUCCESS in 58 sec: https://integration.wikimedia.org/ci/job/wikimedia-event-utilities-maven-release-docker/4/
[13:58:30] (PS4) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[14:01:09] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Review and improve Oozie authorization permissions - https://phabricator.wikimedia.org/T262660 (elukey) @razzi let's use the adminlist txt file for production (an-coord1001), then we'll test later on an-test-coord1001 (test cluster) to see if it...
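Re ayounsi's 13:40:56 note that the new net fields are not in CIDR notation: a minimal sketch of how a network field and its mask could be combined into CIDR form. It assumes the mask is an integer prefix length and the network is a plain address string; the function name `to_cidr` and the sample addresses (documentation ranges) are made up for illustration and are not taken from the netflow pipeline.

```python
import ipaddress


def to_cidr(net: str, mask_len: int) -> str:
    """Combine a network address and a prefix length into CIDR notation.

    Assumption: mask_len is a prefix length (e.g. 24), not a dotted netmask.
    """
    # strict=False masks off any host bits set in `net` instead of raising.
    return str(ipaddress.ip_network(f"{net}/{mask_len}", strict=False))


if __name__ == "__main__":
    # Sample values only; these are documentation ranges, not real traffic.
    print(to_cidr("198.51.100.0", 24))
    print(to_cidr("2001:db8::", 32))
```

The same pairing would apply to the destination fields; whether the join belongs at ingestion time or at query time is a separate choice the log does not settle.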
[14:02:18] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Put 6 GPU-based Hadoop worker in service - https://phabricator.wikimedia.org/T255138 (elukey) Last step before closing is to reboot the workers that don't have yet the /dev/kfd device working.
[14:02:42] (PS5) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[14:25:32] klausman: ok if I schedule a maintenance window to reboot stat1005 and 1008?
[14:29:28] * elukey be back in a bit
[14:43:01] Aye
[14:43:24] and sure go ahead re: maint' window
[14:46:38] hey ottomata :] qq: if I want to use transform functions in Refine, can I use merge_with_hive_schema_before_read?
[14:47:11] related question, does the schema evolution happen pre-transform-functions or post-transform-functions?
[14:51:01] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (nshahquinn-wmf) >>! In T255028#6538152, @klausman wrote: > This should be complete, finally! Thanks @klausman! 🏆
[14:55:22] mforns: hiyaa
[14:55:27] not sure i understand the first q
[14:55:29] hey :]
[14:55:30] but for the second; after.
[14:55:37] ok ok,
[14:55:45] we really would like to get rid of merge_with_hive_schema_before_read
[14:55:59] the first q is: can I use transform functions with merge_with_hive_schema_before_read=true?
[14:56:04] aha
[14:56:06] https://phabricator.wikimedia.org/T255818
[14:56:16] mforns: yes that should be fine, we do that now in prod
[14:56:24] ok ok, thanks!
[14:56:46] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Move the stat1004-6-7 hosts to Debian Buster - https://phabricator.wikimedia.org/T255028 (Isaac) Agreed! Thanks all for these updates (and I'm on the wrong task but thank you for the RAM upgrades too)! Exciting (and very useful) to see the machi...
[15:04:42] very interesting:
[15:04:43] https://grafana.wikimedia.org/d/XhFPDdMGz/cluster-overview?viewPanel=547&orgId=1&var-site=eqiad&var-cluster=analytics&var-instance=an-worker1085&var-datasource=thanos&from=now-1h&to=now
[15:05:07] sometimes datanode disks are saturated for *minutes*
[15:07:15] I am wondering if the workers with more disks (22) are better on this side (in the long term when they'll have the same load)
[15:07:40] IOPS spread among multiple disks, maybe we could have researched this before buying the new nodes
[15:27:25] (PS2) Nuria: [WIP] Adding quality alarms for mobile app data [analytics/refinery] - https://gerrit.wikimedia.org/r/633579 (https://phabricator.wikimedia.org/T257692)
[15:30:38] (CR) Nuria: "Still testing." (7 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/633579 (https://phabricator.wikimedia.org/T257692) (owner: Nuria)
[15:44:29] (CR) Mforns: [C: +1] "LGTM!" (8 comments) [analytics/refinery] - https://gerrit.wikimedia.org/r/633579 (https://phabricator.wikimedia.org/T257692) (owner: Nuria)
[15:47:52] (PS6) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[15:51:19] (PS7) Ottomata: Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609)
[15:55:22] nuria: hola! so should we do a single standup this week?
[15:55:39] elukey: let's do that so andrew can see everyone, including lex
[15:55:40] yes please!
[15:58:06] sure
[16:01:58] ottomata: coming sorry
[16:02:54] ping fdans standupp
[16:27:10] razzi: re max-mind, I +1ed your change on friday, you are free to go :)
[16:53:54] Analytics, Analytics-Wikistats: pagecounts-ez uploads stopped after 9/24 - https://phabricator.wikimedia.org/T265378 (S1magreene)
[17:00:44] (CR) Ottomata: [V: +2 C: +2] Add option to use Wikimedia EventStreamConfig to get kafka topics to ingest [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/628447 (https://phabricator.wikimedia.org/T251609) (owner: Ottomata)
[17:08:21] ottomata: I'll deploy https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/630262/ - you don't mind?
[17:09:12] (PS1) Ottomata: EtlInputFormat - Fix test case where empty strings from csv config values are removed [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/633788
[17:09:22] Pchelolo: please do!
[17:10:02] (CR) Ottomata: [V: +2 C: +2] EtlInputFormat - Fix test case where empty strings from csv config values are removed [analytics/camus] (wmf) - https://gerrit.wikimedia.org/r/633788 (owner: Ottomata)
[17:13:21] oh, it's not converted to new helmfile.d yet..
[18:04:01] (PS1) Ottomata: Add camus-wmf-0.1.0-wmf11.jar with EventStreamConfig support [analytics/refinery] - https://gerrit.wikimedia.org/r/633800 (https://phabricator.wikimedia.org/T251609)
[18:12:25] ottomata: hiiii I am logging off but lemme know tomorrow what you think about the two an-coord solution :)
[18:12:47] (https://phabricator.wikimedia.org/T257412#65283640
[18:12:48] )
[18:13:08] no sorry https://phabricator.wikimedia.org/T257412#6528364 :D
[18:13:28] * elukey off!
[18:34:03] Analytics, Analytics-Kanban, Patch-For-Review: Fix Maxmind geoip database archive - https://phabricator.wikimedia.org/T264152 (razzi) Deployed the first part of this, to stop backing up maxmind files locally, putting them directly on hdfs. Today is Tuesday, so the weekly backup already ran earlier to...
[19:16:37] Analytics-Radar, Product-Analytics, Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (jwang) Understand. There are 5 events for entire 2020, which has the same `before` and `after`. {F32383627} SQL used: `...
[19:35:30] Analytics-Clusters: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (Ottomata) a:razzi
[19:52:57] hey ottomata :] I have 1 question about netflow Refine: I'm adding some dimensions to the data set by adding a new dedicated transform function
[19:53:43] I guess 2 questions. I created a new file (not TransformFunctions.scala) because I thought that file was more for generic stuff that can be reused.
[19:54:17] So I created a file under refinery.job (outside refine dir) named NetflowRefineTransformFunctions, what do you think?
[19:55:27] And, q2: To calculate one of the new fields, the transform function needs a map from network-prefix to DC-location (eqiad, ulsfo, etc.)
[19:55:48] The map is in puppet under: https://github.com/wikimedia/puppet/blob/production/modules/network/data/data.yaml#L32
[19:55:50] q1: sounds good
[19:56:21] oh ho
[19:56:23] but I don't know if there's a way to pass it to the transform function, or we'll have to hardcode it
[19:56:32] i had to do something similar for canary events
[19:56:37] lemme see
[19:56:52] "q1: sounds good" cool!
[19:57:50] mforns: here's how I got the data out of puppet into a config file
[19:57:51] it could be in a file under refinery/static_data
[19:57:51] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/analytics/refinery/event_service_config.pp
[19:57:57] aha
[19:58:00] and then
[19:58:54] this would be stored in the an-launcher machine right?
[19:59:03] yes
[19:59:20] then the trans-func parses it
[19:59:41] then i got it into a map like this
[19:59:41] https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/+/refs/heads/master/eventutilities/src/main/java/org/wikimedia/eventutilities/core/event/EventStreamConfig.java#154
[20:00:03] awesome ottomata this will help a lot :] thanks
[20:00:18] ya! :)
[20:38:38] (PS2) Ottomata: Use camus + EventStreamConfig integration in CamusPartitionChecker [analytics/refinery/source] - https://gerrit.wikimedia.org/r/629377 (https://phabricator.wikimedia.org/T251609)
[20:57:56] * razzi taking computer break
[21:16:36] Analytics: Check home/HDFS leftovers of leila - https://phabricator.wikimedia.org/T264994 (leila) @elukey yes, please.
[21:17:05] Analytics-Radar, Product-Analytics, Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (jwang) @Nuria , Does any one in analytics team possibly know the answer to @dbarratt 's question? > On my local machine...
[21:47:22] Analytics-Radar, Product-Analytics, Anti-Harassment (The Letter Song): Capture special mute events in Prefupdate table [4 hour spike] - https://phabricator.wikimedia.org/T261461 (Nuria) @jwang I think @Mholloway might be able to help given that this seems to be an instrumentation issue.
[22:20:29] Analytics, Analytics-Wikistats: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859 (Nuria) This is scheduled to be added to wikistats Q2 2020 (Sep to Dec)
[22:21:16] Analytics, Analytics-Wikistats: Wikistats 2.0: Add statistics for the geographical origin of the contributors - https://phabricator.wikimedia.org/T188859 (Nuria) a:fdans
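Re the 19:55 to 19:59 thread on mapping a network prefix to a DC location inside a transform function: a rough sketch of the lookup step, assuming the map has already been parsed out of a config file into a plain dict. The prefixes below are documentation ranges and the site assignments are invented; the real map lives in the puppet data.yaml linked above. The names `PREFIX_TO_SITE`, `build_index` and `site_for_ip` are hypothetical, and this is Python for illustration only, not the Scala code of the Refine job.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

# Hypothetical prefix -> site map (documentation ranges, invented sites).
PREFIX_TO_SITE: Dict[str, str] = {
    "192.0.2.0/24": "eqiad",
    "198.51.100.0/24": "ulsfo",
    "2001:db8::/32": "eqiad",
    "2001:db8:1::/48": "ulsfo",
}


def build_index(prefix_map: Dict[str, str]) -> List[Tuple[object, str]]:
    """Parse the prefixes once and sort them most-specific first."""
    nets = [(ipaddress.ip_network(p), site) for p, site in prefix_map.items()]
    return sorted(nets, key=lambda pair: pair[0].prefixlen, reverse=True)


def site_for_ip(ip: str, index: List[Tuple[object, str]]) -> Optional[str]:
    """Return the site of the longest matching prefix, or None if no match."""
    addr = ipaddress.ip_address(ip)
    for net, site in index:
        # Only compare addresses and networks of the same IP version.
        if net.version == addr.version and addr in net:
            return site
    return None


if __name__ == "__main__":
    index = build_index(PREFIX_TO_SITE)
    for ip in ("192.0.2.10", "2001:db8:1::5", "203.0.113.9"):
        print(ip, site_for_ip(ip, index))
```

Sorting most-specific first means a more specific prefix wins over a covering one. A linear scan is fine for a few dozen prefixes; a larger map would want a trie or a per-length hash lookup.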