[05:02:24] Analytics-Clusters, Analytics-Kanban, DBA, Data-Services, and 2 others: Convert labsdb1012 from multi-source to multi-instance - https://phabricator.wikimedia.org/T269211 (Marostegui) We can start with that and then check how it goes.
[05:40:44] Analytics: Mysql partition on an-coord1001 sudden change in growth rate since Apr 14th - https://phabricator.wikimedia.org/T280367 (elukey) Let's leave this open for a day more, but the progression of the daily binlog files looks shrinking, so we should be good.
[05:44:36] good morning!
[05:45:07] !log cleanup Lex's jupyter notebooks on stat1007 to allow puppet to clean up
[05:45:09] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:50:29] !log move /var/lib/hadoop/name partition under /srv/hadoop/name on an-master1001 - T265126
[06:50:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:50:33] T265126: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126
[06:59:51] all right an-master1001 and 1002 have brand new /srv partitions!
[07:00:04] more standardized, rather than separate /var/lib/hadoop/name etc..
[07:56:51] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Configure the HDFS Namenodes to use the log4j rolling gzip appender - https://phabricator.wikimedia.org/T276906 (elukey) Currently testing the new config on an-test-master1001 and an-test-worker1001, from the first results it looks good.
[07:57:02] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Configure the HDFS Namenodes to use the log4j rolling gzip appender - https://phabricator.wikimedia.org/T276906 (elukey) p:Triage→Medium
[07:59:29] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126 (elukey) @razzi @Ottomata given the good results in T276906 (.gz files are way more little in size) we may want to stop this task here, and avoid to move the logs und...
[08:06:59] Analytics-Clusters, Analytics-Kanban: Upgrade the Hadoop masters to Debian Buster - https://phabricator.wikimedia.org/T278423 (elukey) @razzi in T265126 I reshaped the an-master100* partitions, now they look like this: ` elukey@an-master1001:~$ sudo lsblk -i NAME MAJ:MIN RM SIZ...
[08:08:13] * elukey bbiab
[08:30:45] Good morning
[08:31:14] bonjour!
[08:31:27] o/ How are you elukey?
[08:35:37] good! and you??
[08:35:52] great!
[08:40:34] elukey: anything special I should start with? I'm looking at emails and there are some )
[08:44:22] joal: not really! I am going to prep the puppet change for the capacity scheduler, then we can think about rolling it out.. if you have time later on to send the comments updates etc.. it would be great (so we can close that part too)
[08:44:26] but really anytime
[08:44:28] even tomorrow
[08:44:48] Right elukey - I'm sorry I didn't do it before leaving
[08:44:58] joal: there is no rush :)
[08:45:23] ah and at some point let's talk about the blogpost :)
[08:48:44] Guess who woke up to no electricity in half of his apartment :(
[08:49:37] Stove/oven: yes; Fridge: no; Bedroom: yes; office lights: yes; office wall sockets: no --- etc.
[08:49:41] lovely way to start the week!
[08:49:48] fridge no -> ouch :(
[08:50:01] I guess I'll have to make a lot of pizza this week, lest the dough goes bad.
[08:50:27] mwarf klausman :( Have you pinpointed the problem?
[08:51:32] No.
It's not a fuse problem (none have tripped, none are warm). I suspect it's a junction box somewhere. I've already mailed my landlord to send an electrician.
[08:51:51] :(
[08:51:53] I could probably fix it myself, but it's not worth the hassle, and if something goes wrong, it could go seriously wrong.
[08:54:38] indeed klausman - good luck with the pizzarathon
[08:56:57] I am sure Luca approves :)
[09:03:23] yes definitely! :)
[10:36:12] * elukey lunch
[11:01:03] (CR) Joal: "Some small comments." (4 comments) [analytics/refinery/source] - https://gerrit.wikimedia.org/r/678293 (https://phabricator.wikimedia.org/T210106) (owner: Awight)
[11:47:30] elukey: hey, how are firewall rules from the workers to AQS hosts configured? I'm guessing homer but I haven't touched it much
[12:26:32] (PS1) Kosta Harlan: [WIP] Create structuredtask/article/edit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[12:28:07] (CR) jerkins-bot: [V: -1] [WIP] Create structuredtask/article/edit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[12:33:45] (PS2) Kosta Harlan: [WIP] Create structuredtask/article/edit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[12:46:42] hi team
[12:48:51] hola hola
[12:49:21] hnowlan: hey! Yes homer, sending a patch now :)
[12:51:13] elukey: I found the section, already have one in!
[12:52:13] hnowlan: ack, it is in the firewall file
[12:56:56] hnowlan: do we need only the instance hostnames or should we also add the aqs10{10..15} ones?
[12:57:03] (brb)
[13:00:19] elukey: in theory only the -a and -b hostnames, cassandra doesn't listen on 9042 on the main hostname
[13:00:55] hnowlan: yeah I am thinking if there is another use case, but in theory no
[13:01:26] do you know how these firewalls are deployed/where/etc..?
[13:01:31] otherwise I can add some words
[13:05:34] hello team!
[13:06:33] hola Marcel
[13:06:43] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126 (Ottomata) I'm fine with this either way! Really, if we were to do this, we should do it for all Hadoop daemons, just to be consistent. So, perhaps we can just not...
[13:06:55] (CR) Gergő Tisza: "I imagine we'll want to connect this with the suggested edit session." (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[13:07:20] hello!
[13:11:21] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126 (elukey) >>! In T265126#7013940, @Ottomata wrote: > I'm fine with this either way! Really, if we were to do this, we should do it for all Hadoop daemons, just to be...
[13:14:45] (CR) Ottomata: [WIP] Create structuredtask/article/edit schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[13:15:13] just a heads-up, I'll be missing standup today as I have a conflict I can't miss
[13:16:19] ack!
[13:17:22] Analytics, Analytics-Kanban, Event-Platform, Growth-Team, and 3 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (Ottomata)
[13:20:27] o/
[13:20:48] I've gotta run out this morning to figure out my car situation (lease expiring)
[13:22:03] I'll miss standup but if joal has time later I hope to sync about gobblin
[13:22:32] Analytics: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (mforns) I don't see that happening in the event_sanitized base directory. Is refine_sanitize not going to do that as well? If it did, we'd h...
[13:24:24] (PS3) Kosta Harlan: [WIP] Create structuredtask/article/edit schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177)
[13:25:44] (CR) Kosta Harlan: [WIP] Create structuredtask/article/edit schema (2 comments) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[13:32:43] milimetric: yessir! I'll have time to catch up on gobblin
[13:33:00] (CR) Kosta Harlan: [WIP] Create structuredtask/article/edit schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/681052 (https://phabricator.wikimedia.org/T278177) (owner: Kosta Harlan)
[13:35:04] (CR) Mforns: [V: +2 C: +2] "LGTM!" [analytics/reportupdater] - https://gerrit.wikimedia.org/r/677511 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[13:36:40] (CR) Mforns: "I just merged the DATE_FORMAT change to the executor.py." [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/676299 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[13:38:55] (CR) Mforns: [V: +2 C: +2] "LGTM!"
[analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/680270 (https://phabricator.wikimedia.org/T279046) (owner: Awight)
[13:44:23] Analytics: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (Ottomata) > Is refine_sanitize not going to do that as well? Hm it will, but it hasn't been applied yet, we are still using old jar with Eve...
[13:44:48] Analytics: Delete UpperCased eventlogging legacy directories in /wmf/data/event 90 days from 2021-04-15 (after 2021-07-14) - https://phabricator.wikimedia.org/T280293 (Ottomata) I'll add this as a TODO for {T273789}
[13:47:05] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata)
[13:53:03] Analytics, Analytics-Kanban, Event-Platform, Patch-For-Review: Sanitize and ingest all event tables into the event_sanitized database - https://phabricator.wikimedia.org/T273789 (Ottomata)
[14:01:32] Analytics-Radar, User-bd808: Reduce partition granularity of hive tables - https://phabricator.wikimedia.org/T273310 (JAllemandou) something to note: Hive separate table metadata from storage. When using external tables in Hive, dropping the tables only deletes the metadata, not the data itself: ` hdfs d...
[14:13:57] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Ottomata)
[14:20:02] Analytics, Product-Analytics: Hive: create table statement failure - https://phabricator.wikimedia.org/T280168 (JAllemandou) I managed to have this working in my personal database. Can we sync on this via IRC @nettrom_WMF ?
[14:20:40] joal: i was talking with nettrom about that.
i think it was a file permissions problem in his data files?
[14:20:51] Ah ok ottomata
[14:26:42] (PS7) Jason Linehan: [WIP] Metrics Platform schema [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379)
[14:29:31] (CR) Ottomata: "What is a metric platform event? 😊" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:32:02] Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (JAllemandou) @kzimmerman Hi - Is this task something your team could look at? I have triple checked and confirm that at least a few of the listed pages show unnatural patterns: * F5 Networks: https://pageviews.toolforge.org/pageviews/?...
[14:32:05] (CR) Ottomata: [WIP] Metrics Platform schema (1 comment) [schemas/event/secondary] - https://gerrit.wikimedia.org/r/676392 (https://phabricator.wikimedia.org/T276379) (owner: Jason Linehan)
[14:37:51] (CR) Joal: "When adding databases to sqoop-list, we should add them to both labs and production. And actually, given we keep those two files in sync, " [analytics/refinery] - https://gerrit.wikimedia.org/r/680414 (https://phabricator.wikimedia.org/T279564) (owner: Razzi)
[14:38:55] Analytics, Event-Platform: Schema tests should validate examples - https://phabricator.wikimedia.org/T275143 (Ottomata) [[ https://github.com/wikimedia/jsonschema-tools/blob/master/lib/tests/robustness.js#L121-L144 | This exists ]]. The only problem is that in schemas/event/secondary repository, jsonsch...
[14:40:18] (PS1) Ottomata: event_sanitized_allowlist - mediawiki_page_delete: keep_all [analytics/refinery] - https://gerrit.wikimedia.org/r/681099 (https://phabricator.wikimedia.org/T273789)
[14:53:26] (CR) Ottomata: [C: +2] SanitizeTransformation - Just some simple logging improvements.
[analytics/refinery/source] - https://gerrit.wikimedia.org/r/680382 (owner: Ottomata)
[14:54:23] mforns: FYI this is only going to be used in the test cluster for now
[14:54:24] https://gerrit.wikimedia.org/r/c/analytics/refinery/+/681099
[14:54:26] ok w you?
[14:54:31] Analytics, WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (JAllemandou) Hi @Urbanecm , thank you for pinging us on this :) The usual pattern for data publication is to ask for an approval through a security review. I h...
[15:06:56] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (JFishback_WMF) p:Triage→Medium
[15:07:26] Analytics, WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (Urbanecm) Thanks a lot for your comment, this is nice to hear! I'm happy to use `user` from Hive instead, I didn't know it is sanitized already. Do let me kno...
[15:10:31] Analytics, Data-release, Privacy Engineering, Research, Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (JFishback_WMF) p:Triage→Medium
[15:10:50] Analytics-Radar, User-bd808: Reduce partition granularity of hive tables - https://phabricator.wikimedia.org/T273310 (bd808) >>! In T273310#7014252, @JAllemandou wrote: > @bd808 I'm dropping the data to match the table-drop. Ack, and thank you.
[15:23:55] Analytics-Clusters, Analytics-Kanban: Balance Kafka topic partitions on Kafka Jumbo to take advantage of the new brokers - https://phabricator.wikimedia.org/T255973 (Ottomata) Prometheus doesn't seem to like long range queries, so I can't show more than 30 days back, but we can see the topic data differe...
[15:32:38] Analytics: Consolidate labs / production sqoop lists to a single list - https://phabricator.wikimedia.org/T280549 (razzi)
[15:37:19] Analytics-Radar, Article-Recommendation: Generate article recommendations in Hadoop for use in production - https://phabricator.wikimedia.org/T210844 (Ottomata)
[15:39:12] Analytics-Clusters: Prevent notebooks on spark to launch 2 pyspark instances instead of 1 - https://phabricator.wikimedia.org/T152522 (Ottomata) Open→Declined
[15:39:46] Analytics-Clusters: Make it (just a bit) easier to spin up Hadoop cluster in Cloud VMs - https://phabricator.wikimedia.org/T223389 (Ottomata) Open→Declined
[15:40:12] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (sbassett) Some potential concerns: # There's a property within the revision-tags-cha...
[15:43:04] Analytics-Clusters, Product-Analytics: Improve Hue user management - https://phabricator.wikimedia.org/T127850 (Ottomata) Open→Resolved a:Ottomata
[15:43:33] Analytics-Clusters, SRE, User-Elukey: Manage Hue via systemd unit - https://phabricator.wikimedia.org/T206484 (Ottomata) Open→Resolved a:Ottomata
[15:44:36] Analytics, Dumps-Generation: Temp files left around in wikistats_1/ ? - https://phabricator.wikimedia.org/T280311 (fdans) @ArielGlenn thank you for noticing, please delete!
[15:44:48] Analytics-Radar, Dumps-Generation: Temp files left around in wikistats_1/ ? - https://phabricator.wikimedia.org/T280311 (fdans)
[15:45:10] Analytics-Clusters: hue.wikimedia.org throws an exception when trying to log in with a non-ASCII username - https://phabricator.wikimedia.org/T260929 (Ottomata) Open→Declined Hue login is now handled by CAS.
[15:48:34] Analytics-Clusters: Disk filling up on `/` on an-coord1001 - https://phabricator.wikimedia.org/T279304 (Ottomata) a:razzi
[15:49:04] Analytics-Clusters: Disk filling up on `/` on an-coord1001 - https://phabricator.wikimedia.org/T279304 (Ottomata) p:Triage→Medium
[15:50:07] Analytics: Superset annotation text overlaps illegibly - https://phabricator.wikimedia.org/T279738 (Ottomata) p:Triage→Low
[15:50:47] Analytics-Clusters, Analytics-Kanban: AQS Cassandra storage: Investigate incorrect storage report on Grafana - https://phabricator.wikimedia.org/T278234 (Ottomata) Open→Declined Should be fixed in Cassandra 3
[15:53:44] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (fdans) a:Milimetric→None
[15:55:44] Analytics-Clusters, Analytics-Kanban, Patch-For-Review: Improve logging for HDFS Namenodes - https://phabricator.wikimedia.org/T265126 (Ottomata) Open→Declined Declining for now, let's re-open if we decide to do this later.
[15:56:11] Analytics-Radar, Dumps-Generation: Temp files left around in wikistats_1/ ? - https://phabricator.wikimedia.org/T280311 (ArielGlenn) >>! In T280311#7014801, @fdans wrote: > @ArielGlenn thank you for noticing, please delete! I will delete the temp files immediately, thanks! Do we need both of wikistats_...
[15:57:09] Analytics, Analytics-Dashiki, Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (fdans) p:Triage→High
[15:57:49] Analytics-Kanban: Analytics Hardware for Fiscal Year 2020/2021 - https://phabricator.wikimedia.org/T255145 (Ottomata)
[15:57:51] Analytics-Clusters: Put 24 Hadoop worker nodes in service (cluster expansion) - https://phabricator.wikimedia.org/T255146 (Ottomata) Open→Resolved All but 6 have been racked, those 6 will be tracked in {T275767}.
[15:58:41] Analytics-Radar, SRE, ops-eqiad: Try to move some new analytics worker nodes to different racks - https://phabricator.wikimedia.org/T276239 (Ottomata) Hiya, @Cmjohnson any news on this? {T275767} is blocked on this task.
[16:06:23] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Ottomata) Interesting. For 1., is the only potential issue the chronology_id? I can'...
[16:06:53] Analytics: Improve Sonar job for analytics-refinery-source - https://phabricator.wikimedia.org/T279841 (fdans) p:Triage→Low @awight do you feel like taking this one?
[16:07:02] !log run kafka preferred-replica-election on jumbo cluster (kafka-jumbo1002)
[16:07:11] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:07:34] Analytics, Event-Platform, SRE, serviceops, Patch-For-Review: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (akosiaris) Hi! Adopting the new functionality in networkpolicy resources has indeed created some tech debt. It's a tech debt we created on purp...
[16:07:39] Analytics, WMCZ-Stats: Review request: New datasets for WMCZ published under analytics.wikimedia.org - https://phabricator.wikimedia.org/T279567 (JAllemandou) Follow up questions after having talked to the team: * How frequent does the job need to be run, and new data released ? * If not a one-off, woul...
[16:08:13] Analytics, Event-Platform, SRE, serviceops, Patch-For-Review: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (Ottomata) <3
[16:15:37] Analytics, FR-Tech-Analytics: event.WikipediaPortal referer modification - https://phabricator.wikimedia.org/T279952 (fdans) p:Triage→Medium
[16:20:24] Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (JAllemandou)
[16:21:08] Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (fdans)
[16:21:22] Analytics: Top read repeats - https://phabricator.wikimedia.org/T280011 (fdans) p:Triage→Medium
[16:22:04] Analytics: Improve pageview automated traffic detection heuristics - https://phabricator.wikimedia.org/T280565 (fdans) p:Triage→Medium
[16:22:47] Analytics, Event-Platform, SRE, serviceops, Patch-For-Review: DRY kafka broker declaration in helmfiles - https://phabricator.wikimedia.org/T253058 (akosiaris) a:akosiaris
[16:35:29] Analytics: Easy dimensional data visualization - https://phabricator.wikimedia.org/T280029 (fdans) p:Triage→Medium
[16:37:04] Analytics, Analytics-Kanban, Machine-Learning-Team, ORES, and 2 others: Generate dump of scored-revisions from 2018-2020 for Wikis except English Wikipedia - https://phabricator.wikimedia.org/T280107 (fdans) a:JAllemandou
[16:38:31] Analytics: Easy dimensional data visualization - https://phabricator.wikimedia.org/T280029 (Milimetric) We talked this over, some quick notes: * leaving this as-is for now, keeping an
eye on how it scales. If we see problems we can split the data into "the last 6 months" and "all time but simplified schema...
[16:45:24] !log make RefineMonitor use analytics keytab - this should be a no-op
[16:45:27] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[16:46:31] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (sbassett) >>! In T280538#7014971, @Ottomata wrote: > For 1., is the only potential iss...
[16:52:59] Analytics-Radar, Product-Analytics, wmfdata-python: wmfdata cannot recover from a crashed Spark session - https://phabricator.wikimedia.org/T245713 (nshahquinn-wmf) p:Medium→Low a:nshahquinn-wmf→None
[16:53:32] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Ottomata) > if chronology_id behaves the same way, then I'd imagine it would. Just loo...
[16:56:25] Analytics-Radar, Contributors-Team, Product-Analytics: Consider scrapping Schema:PageContentSaveComplete and Schema:NewEditorEdit, given we have Schema:Edit - https://phabricator.wikimedia.org/T123958 (ldelench_wmf) Open→Declined
[16:56:47] Analytics-Radar, Contributors-Team, Product-Analytics: Consider scrapping Schema:PageContentSaveComplete and Schema:NewEditorEdit, given we have Schema:Edit - https://phabricator.wikimedia.org/T123958 (kzimmerman) These schemas will probably be retired when schema migrate to MEP
[16:58:41] (CR) Awight: "Thanks for asking, and for all the merges!
There is actually a blocker, which is that I've disabled these jobs because of corruption caus" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/676299 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[16:59:49] mforns: Hi, I see that I'm creating an increasingly large mess in your workshop! Please let me know if I can do anything to help, or if we should chat to clarify, etc.
[17:04:09] (PS4) Awight: Validate the native "hive" report type [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/676299 (https://phabricator.wikimedia.org/T193169)
[17:04:37] (CR) Awight: "PS 4: manual rebase (but previous comments still apply)" [analytics/reportupdater-queries] - https://gerrit.wikimedia.org/r/676299 (https://phabricator.wikimedia.org/T193169) (owner: Awight)
[17:07:35] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (sbassett) >>! In T280538#7015331, @Ottomata wrote: > Just looking at both revision-cre...
[17:16:25] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Pchelolo) chronology_id was initially added by Stas for Wikidata query service. Given...
[17:21:11] Gone for today team - see you tomorrow
[17:25:49] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (dcausse) >>! In T280538#7015434, @Pchelolo wrote: > chronology_id was initially added...
[17:27:14] razzi: o/ I left some comments in various tasks about Buster upgrades, I think that we should prioritize those
[17:28:47] I also left a note for the flerovium/furud recipe
[17:28:56] Sounds good elukey, I'll come up with a plan for "Upgrade the Hadoop masters to Debian Buster" and we can try the procedure later this week
[17:29:28] razzi: let's also try to speed up flerovium/furud, we should be able to do them this week
[17:29:46] or should we do flerovium first, since it's lower risk?
[17:29:56] yes yes flerovium first
[17:30:06] so you'll get some experience with the partman recipes
[17:30:13] those will be needed for the masters as well
[17:30:26] Analytics-EventLogging, Analytics-Radar, Front-end-Standards-Group, MediaWiki-extensions-WikimediaEvents, and 4 others: Provide a reusable getEditCountBucket function for analytics purposes - https://phabricator.wikimedia.org/T210106 (LGoto) a:Jdrewniak
[18:06:38] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Ottomata) @Pchelolo putting aside the privacy question for the moment, could we add an...
[18:07:25] Analytics, Event-Platform, Privacy Engineering, Product-Analytics, Privacy: Capture rev_is_revert event data in a stream different than mediawiki.revision-create - https://phabricator.wikimedia.org/T280538 (Pchelolo) >>! In T280538#7015653, @Ottomata wrote: > @Pchelolo putting aside the priva...
[18:12:51] elukey: should I take https://phabricator.wikimedia.org/T280096 for ops week?
[18:13:45] ottomata: yes please! The remaining part is to decide what to do with the hdfs files, the stat100x ones were moved to Fran's home
[18:14:01] ok, i'm going to make a tarball of it all in one file, ok?
[18:14:20] i'll make hdfs, and hostname/ dirs for them
[18:14:24] and try to keep paths the same
[18:14:50] sure
[18:19:08] elukey: can you log into aqs-test1001?
[18:19:15] i can get into hosts in deployment-prep
[18:19:20] but not in analytics cloud vps project
[18:19:30] * razzi lunch
[18:19:55] ottomata: only the hdfs files are left, already taken care of the rest
[18:20:19] oh ok
[18:20:29] where are the aqs-test ones?
[18:22:19] on stat1007, in Fran's dir
[18:22:35] https://phabricator.wikimedia.org/T280096#7008131
[18:22:37] ottomata: --^
[18:33:04] * elukey afk!
[18:49:28] OH! i missed the aqs stuff, thank you!
[18:49:29] got it
[18:50:41] (CR) Mforns: "Is this allow-list going to host mediawiki schemas as well as analytics schemas?" [analytics/refinery] - https://gerrit.wikimedia.org/r/681099 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[19:03:49] (PS1) Milimetric: Update to gulp 4 and fix vulnerabilities [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176
[19:05:52] (CR) jerkins-bot: [V: -1] Update to gulp 4 and fix vulnerabilities [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176 (owner: Milimetric)
[19:28:44] Analytics-Radar, observability, Graphite, Patch-For-Review, and 2 others: Broken reportupdater queries: edit count bucket label contains illegal characters - https://phabricator.wikimedia.org/T279046 (awight)
[19:54:07] Analytics: Move lexnasser's files before user deletion - https://phabricator.wikimedia.org/T280096 (Ottomata) @lexnasser double checking, are you sure you need to keep lex.db/referer_test, wrdata and wrrefined? Together those are 1.5 TB.
[19:59:09] Analytics: Move lexnasser's files before user deletion - https://phabricator.wikimedia.org/T280096 (Ottomata) a:elukey→Ottomata
[20:02:41] Analytics: Move lexnasser's files before user deletion - https://phabricator.wikimedia.org/T280096 (Ottomata) I'm collecting the data to archive on an-launcher1002 in /srv/lexnasser-backup-2021-04. I've got Lex's hdfs user dir files, but the hive tables are pretty large.
[20:04:39] (CR) Ottomata: "> Patch Set 1:" [analytics/refinery] - https://gerrit.wikimedia.org/r/681099 (https://phabricator.wikimedia.org/T273789) (owner: Ottomata)
[20:09:20] Analytics: Move lexnasser's files before user deletion - https://phabricator.wikimedia.org/T280096 (lexnasser) @Ottomata Thanks for pointing out the huge sizes of those tables. I was mainly keeping them for reference, but it seems that any future utility of those tables is dwarfed by their sizes. Feel free t...
[20:12:30] (PS2) Milimetric: Clean up repository and release 0.6.0 [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176
[20:13:03] (CR) jerkins-bot: [V: -1] Clean up repository and release 0.6.0 [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176 (owner: Milimetric)
[20:16:55] this thing is gonna be the death of me
[20:19:15] like... pretending we're all following semver has got to be one of the larger cases of mass-hypnosis ever
[20:20:58] (PS3) Milimetric: Clean up repository and release 0.6.0 [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176
[20:24:06] who's following semver?!
[20:24:47] event schemas are but major changes are not allowed :p
[20:24:51] restricted semver
[20:26:01] :P yeah, in this case node claims to follow it but there are ALWAYS breaking changes between minor versions
[20:26:29] (CR) Milimetric: [C: +2] "let's see if this works to release a new version on the githubs" [analytics/mediawiki-storage] - https://gerrit.wikimedia.org/r/681176 (owner: Milimetric)
[20:27:26] haha
[20:48:03] huh... someone hacked my IRC account and changed my password... lame
[20:48:28] milimetric: on freenode?
[20:48:33] yea
[20:48:59] milimetric: are you in any restricted channels? Do you need to alert sec team?
[20:49:08] Freenode need to be told for sure
[20:49:38] I'll send a message to security, I just reset it to something pretty gnarly so I doubt they can do it again...
[20:49:44] and it wasn't shared
[20:50:23] I don't obviously know what access you have but please be careful
[20:50:39] And tell freenode as they might be able to block whoever did it
[20:52:20] yep, I sent security a message
[21:06:17] Can you tell freenode too?
[21:06:27] Kline is on /stats p currently
[21:08:58] milimetric: ^
[21:36:52] (done)
[22:11:37] Analytics, Analytics-Dashiki, Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (Milimetric) I found that dashiki doesn't really build on...
[22:12:37] Analytics, Analytics-Dashiki, Analytics-Kanban: npm install gives Verification failed while extracting mediawiki-storage@https://github.com/wikimedia/analytics-mediawiki-storage/archive/master.tar.gz - https://phabricator.wikimedia.org/T278982 (Milimetric) a:Milimetric This "Done" makes me feel v...