[00:29:16] 10Analytics, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (10AntiCompositeNumber) [00:29:19] 10Analytics, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (10AntiCompositeNumber) [00:32:11] 10Analytics, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (10AntiCompositeNumber) [00:32:22] 10Analytics, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (10AntiCompositeNumber) 05Open→03Stalled [00:49:22] 10Analytics, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (10Milimetric) There's a lot in this task I can't catch up with, but it looks more than ready, happy to help you pitch it to all the teams... [00:50:37] 10Analytics, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (10AntiCompositeNumber) >>! In T284944#7455674, @JJMC89 wrote: >> Is this data visible on the wikis? > > Al... [00:55:03] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Improve Refine bad data handling - https://phabricator.wikimedia.org/T289003 (10Milimetric) Yep, that's what I meant, that uri_host created here is passed through unchanged. The steps are: * the UDF: https://gerrit.wikimedia.... 
[04:23:42] PROBLEM - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is CRITICAL: CRITICAL: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [05:06:29] 10Analytics, 10Platform Engineering, 10Patch-For-Review: Add log entry details to page and user events in EventBus - https://phabricator.wikimedia.org/T263055 (10odimitrijevic) Removing from Kanban since there hasn't been any activity since Jan 21. [05:08:28] 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, and 2 others: Setup Analytics team in VO/splunk oncall - https://phabricator.wikimedia.org/T273064 (10odimitrijevic) [05:09:43] 10Analytics-Clusters, 10Analytics-Kanban, 10Cassandra, 10Data-Engineering, and 2 others: Set up a testing environment for the AQS Cassandra 3 migration - https://phabricator.wikimedia.org/T257572 (10odimitrijevic) [05:11:40] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: dbstore1007 is swapping heavilly, potentially soon killing mysql services due to OOM error - https://phabricator.wikimedia.org/T290841 (10odimitrijevic) [05:11:43] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Retroactively edit Airflow decision document to include discarded schedulers - https://phabricator.wikimedia.org/T285691 (10odimitrijevic) [05:11:45] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Write a job entirely in Airflow with spark and/or sparkSQL - https://phabricator.wikimedia.org/T285692 (10odimitrijevic) [05:11:47] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Switch off skipTrash for some data purging - https://phabricator.wikimedia.org/T270431 (10odimitrijevic) [05:11:49] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Purge any Kerberos keytab 
files that are not managed by puppet - https://phabricator.wikimedia.org/T294124 (10odimitrijevic) [05:11:53] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (10odimitrijevic) [05:11:54] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Trash, 10Upstream: --- DISCUSSED BELOW --- - https://phabricator.wikimedia.org/T114124 (10odimitrijevic) [05:11:57] 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, and 2 others: Improve user experience for Kerberos by creating automatic token renewal service - https://phabricator.wikimedia.org/T268985 (10odimitrijevic) [05:19:47] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: [Airflow] Create repository for Airflow DAGs - https://phabricator.wikimedia.org/T294026 (10odimitrijevic) [05:21:42] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Automate kerberos credential creation and management to ease the creation of testing infrastructure - https://phabricator.wikimedia.org/T292389 (10odimitrijevic) [05:26:05] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, and 3 others: wmfdata.mariadb relies on analytics-mysql being available - https://phabricator.wikimedia.org/T292479 (10odimitrijevic) [05:26:46] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Add a presto query logger - https://phabricator.wikimedia.org/T269832 (10odimitrijevic) [05:30:02] 10Analytics, 10Traffic: Review use of realloc in varnishkafka - https://phabricator.wikimedia.org/T287561 (10odimitrijevic) 05Open→03Declined Thanks @BBlack I'll go ahead and close the issue as declined for now and we can reopen as new information comes in. 
[05:31:05] 10Analytics, 10Patch-For-Review: AQS is not OpenAPI 3 compliant - https://phabricator.wikimedia.org/T240995 (10odimitrijevic) [05:32:16] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Move the Analytics/DE testing infrastructure to Pontoon - https://phabricator.wikimedia.org/T292388 (10odimitrijevic) [05:37:39] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation - https://phabricator.wikimedia.org/T292699 (10odimitrijevic) [05:38:37] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Standardize the stats system user uid - https://phabricator.wikimedia.org/T291384 (10odimitrijevic) [05:39:42] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10SRE: ~1 request/minute to intake-logging.wikimedia.org times out at the traffic/service interface - https://phabricator.wikimedia.org/T264021 (10odimitrijevic) [05:40:49] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Team Initiatives (Modern Event Platform (TEC2)): Allow disabling/enabling configured streams via wgEventStreams config - https://phabricator.wikimedia.org/T259712 (10odimitrijevic) Removing the kanban tag for now. 
[05:41:53] 10Analytics-Clusters: Set yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds - https://phabricator.wikimedia.org/T269616 (10odimitrijevic) [05:42:37] 10Analytics: Crunch and delete many old dumps logs - https://phabricator.wikimedia.org/T280678 (10odimitrijevic) [05:45:04] 10Analytics: Data drifts between superset_production on an-coord1001 and db1108 - https://phabricator.wikimedia.org/T279440 (10odimitrijevic) [05:45:39] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Setup Presto UI in production - https://phabricator.wikimedia.org/T292087 (10odimitrijevic) [05:46:13] 10Analytics, 10Analytics-Kanban, 10Data-Engineering: Reportupdater SQL jobs failing with Python error - https://phabricator.wikimedia.org/T284074 (10odimitrijevic) [05:46:39] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Un-fork analytics/gobblin - https://phabricator.wikimedia.org/T292396 (10odimitrijevic) [05:53:20] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Reportupdater SQL jobs failing with Python error - https://phabricator.wikimedia.org/T284074 (10odimitrijevic) a:05mforns→03None [05:57:27] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Retroactively edit Airflow decision document to include discarded schedulers - https://phabricator.wikimedia.org/T285691 (10odimitrijevic) 05Open→03Resolved [06:14:55] 10Analytics-Clusters, 10Analytics-Kanban: Move the Analytics infrastructure to Debian Buster - https://phabricator.wikimedia.org/T234629 (10elukey) 05Open→03Resolved a:03elukey Sure! [07:02:33] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10Joe) >>! In T291120#7454793, @Ottomata wrote: >> Which part of this is controversial / of wide... 
[07:07:02] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10awight) [07:30:43] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10awight) > Also, I would like to see a reference for this statement: > >> For example, we know... [08:47:44] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10Joe) @awight I fully agree with you: we need a single source of truth. But I don't think //tha... [09:12:01] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10daniel) >>! In T291120#7463166, @Milimetric wrote: > In many cases, that's ok. MW itself does... [09:15:56] (03PS12) 10DCausse: Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) [09:38:02] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10awight) >>! In T291120#7464376, @Joe wrote: > @awight I fully agree with you: we need a single... 
[09:41:12] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10daniel) I recently attended a workshop called "Software Architecture: The Hard Parts" by Mark... [12:41:30] ottomata: o/ mind re-applying Petr +2 on https://gerrit.wikimedia.org/r/c/schemas/event/primary/+/731006 the dependent patch has been merged :) [13:08:04] lol - https://twitter.com/TrungTPhan/status/1453592751446970375/photo/1 [13:22:23] 10Analytics, 10Inuka-Team, 10Product-Analytics: Superset timeouts for KaiOS dashboard - https://phabricator.wikimedia.org/T277320 (10JAllemandou) I have opened the dashboard and it loaded correctly - Some charts took some time, but no timeout. @nshahquinn-wmf Have you changed the dashboard so that it loads?... [13:27:23] (03PS1) 10Thcipriani: DNM: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735387 [13:28:57] (03Abandoned) 10Thcipriani: DNM: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735387 (owner: 10Thcipriani) [13:43:48] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10Ottomata) > It would be great to link these tasks here, @awight, those tasks are linked at th... 
[13:48:11] (03CR) 10Ottomata: Add fragment/mediawiki/revision/slot (031 comment) [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse) [13:50:05] 10Analytics, 10Product-Analytics, 10Structured-Data-Backlog, 10Wikidata-Query-Service: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (10Gehel) [14:04:43] 10Analytics-EventLogging, 10Analytics-Kanban, 10Event-Platform, 10Goal, and 2 others: Modern Event Platform - https://phabricator.wikimedia.org/T185233 (10Ottomata) I don't really feel like this is done, but perhaps we don't need the parent task anymore? [14:18:48] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) The loading of snapshot 7 completed successfully: ` progress: [/10.64.32.128]0:2131/21... [14:20:49] joal: Snapshot 7 completed loading. How long do you think we should give it before moving to snapshot 8 of 12? Do you want to wait until the compactions are complete, or start it now, or somewhere in between? [14:22:04] hm, I think we should wait for compaction completion [14:22:07] btullis [14:22:17] I know it's long, but I think it's worth it [14:23:21] Yep, no worries from me. I'm happy to wait. [14:24:20] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Snapshot and Reload cassandra2 pageview_per_article data table from all 12 instances - https://phabricator.wikimedia.org/T291472 (10BTullis) @JAllemandou and I have discussed and have decided to wait until the compactions are c... 
[14:27:34] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, and 3 others: Migrate analytics cluster alerts from Icinga to AlertManager - https://phabricator.wikimedia.org/T293399 (10fgiunchedi) Something that occurred to me while reading the list: (thank you @BTullis for putting that to... [14:28:10] RECOVERY - Check unit status of monitor_refine_event_sanitized_analytics_delayed on an-launcher1002 is OK: OK: Status of the systemd unit monitor_refine_event_sanitized_analytics_delayed https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers [14:55:31] (03PS13) 10DCausse: Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) [14:55:33] (03CR) 10DCausse: Add fragment/mediawiki/revision/slot (031 comment) [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse) [15:15:24] (03PS1) 10Milimetric: Update pageview allow list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/735403 [15:15:43] (03CR) 10Milimetric: [V: 03+2 C: 03+2] Update pageview allow list [analytics/refinery] - 10https://gerrit.wikimedia.org/r/735403 (owner: 10Milimetric) [15:28:56] (03CR) 10Ottomata: [C: 03+2] Add fragment/mediawiki/revision/slot [schemas/event/primary] - 10https://gerrit.wikimedia.org/r/731006 (https://phabricator.wikimedia.org/T293195) (owner: 10DCausse) [15:39:36] 10Analytics, 10Analytics-Kanban, 10Data-Engineering-Kanban, 10Event-Platform, and 5 others: Revisions missing from mediawiki_revision_create - https://phabricator.wikimedia.org/T215001 (10JAllemandou) I did an analysis for both 2021-09-29 and 2021-09-30 on all projects (we don't have revision data on hadoo... [15:39:47] ottomata, milimetric --^ [15:47:00] joal: yeah, sorry, I did the same last night. It's even better than what you found I think. 
On the 29th the envoy timeouts were still not fixed, so the 30th is the day to look at. And two of those missing revisions actually happened almost at midnight, I found them the next day, Oct 1st. I'm currently exporting all the rev_ids from revision and archive on the prod replicas, will import them into HDFS and join to events [15:47:26] (so we can look at all the days since the envoy fixes, I'm exporting all available data) [15:47:36] ack milimetric [15:47:45] This is very good news :) [15:49:20] 10Analytics, 10Data-Engineering: Reduce number of missing revisions from mediawiki-history - https://phabricator.wikimedia.org/T294578 (10JAllemandou) [15:52:31] 10Analytics, 10Data-release, 10Privacy Engineering, 10Research, 10Privacy: Apache Beam go prototype code for DP evaluation - https://phabricator.wikimedia.org/T280385 (10Htriedman) Hi @Milimetric! Thanks for commenting on this task — lots has happened in the last 3-4 weeks and this served as a good remin... [15:58:41] wow that's amazing [16:01:21] milimetric: actually I think you can wait for your import - regular sqoop is to be done in 3 days [16:29:17] 10Analytics, 10Data-Engineering, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10odimitrijevic) [16:29:20] 10Analytics, 10Analytics-Jupyter, 10Data-Engineering, 10Product-Analytics: conda list does not show all packages in environment - https://phabricator.wikimedia.org/T294368 (10odimitrijevic) [16:29:23] 10Analytics, 10Data-Engineering: Enforce authentication and authorization for webrequest_* topics in Kafka jumbo-eqiad cluster - https://phabricator.wikimedia.org/T294264 (10odimitrijevic) [16:29:26] 10Analytics, 10Data-Engineering, 10Wikidata, 10Wikidata-Query-Service: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (10odimitrijevic) [16:29:33] 
10Analytics, 10Data-Engineering, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Increased visibility in wiki-replicas for volunteers fighting vandals - https://phabricator.wikimedia.org/T284944 (10odimitrijevic) [16:29:36] 10Analytics, 10Data-Engineering, 10Data-Services, 10Privacy Engineering, 10cloud-services-team (Kanban): Raw IPs of logged-out users disclosed in wiki-replicas - https://phabricator.wikimedia.org/T284948 (10odimitrijevic) [16:29:42] 10Analytics, 10Data-Engineering, 10Product-Analytics, 10Structured-Data-Backlog, and 2 others: Create a Commons equivalent of the wikidata_entity table in the Data Lake - https://phabricator.wikimedia.org/T258834 (10odimitrijevic) [16:29:46] 10Analytics, 10Data-Engineering, 10Data-Services: Expose more properties to the user_properties_anon table on Wiki Replicas - https://phabricator.wikimedia.org/T226162 (10odimitrijevic) [16:29:52] 10Analytics, 10DC-Ops, 10Data-Engineering, 10SRE, 10ops-eqiad: Q2:(Need By: TBD) rack/setup/install an-worker11[42-48].eqiad.wmnet - https://phabricator.wikimedia.org/T293922 (10odimitrijevic) [16:34:23] ottomata: I remember you said schemas are deployed with eventgate and not fetched through schema.wikimedia.org, does it mean I need to wait a bit before +2ing https://gerrit.wikimedia.org/r/c/mediawiki/extensions/EventBus/+/731105 ? 
[16:37:36] 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Purge any Kerberos keytab files that are not managed by puppet - https://phabricator.wikimedia.org/T294124 (10BTullis) 05Open→03Resolved [16:38:38] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, 10Patch-For-Review: Test Alluxio as cache layer for Presto - https://phabricator.wikimedia.org/T266641 (10BTullis) 05Open→03Resolved [16:38:40] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (10BTullis) [16:39:51] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10odimitrijevic) [16:42:31] 10Analytics, 10Analytics-Jupyter, 10Data-Engineering, 10Product-Analytics: conda list does not show all packages in environment - https://phabricator.wikimedia.org/T294368 (10odimitrijevic) 05Open→03Resolved a:03odimitrijevic Closing as resolved [16:45:14] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Wikidata, 10Wikidata-Query-Service: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (10odimitrijevic) [16:50:13] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10Ottomata) yeah lets do it! @sguebo_WMF @Htriedman before we just do this, wanted to check with your team to see if there is a pro... [16:52:48] dcausse: ah yes. actually there is a bit of a deployment race problem here. 
eventgate-main does need a rebuild+deploy to get this stuff [16:53:00] but, if the stream is not declared when it starts, it will need another restart too (it requests stream config on startup) [16:53:59] ottomata: the stream is already there I think it's just the schema version that got bumped [16:54:29] but I'll wait til next week before merging, it has to go with the train anyways [17:06:04] 10Analytics-Clusters, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban, and 2 others: Improve user experience for Kerberos by creating automatic token renewal service - https://phabricator.wikimedia.org/T268985 (10BTullis) 05Open→03Resolved [17:06:22] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Jupyter notebook logs should appear in Logstash - https://phabricator.wikimedia.org/T288348 (10BTullis) 05In progress→03Resolved [17:08:15] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10Ottomata) p:05Triage→03Medium [17:20:41] OH right [17:20:42] great [17:21:08] dcausse: then yeah we can just deploy eg main to pick up the new schema version [17:21:10] dcausse: i can do that asap [17:21:26] actually dcausse , can I teach you how? 
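The restart concern raised above ("it requests stream config on startup") can be illustrated with a toy model. This is only a sketch of the general read-config-once pattern, not EventGate's actual code; the class name, the stream names, and the schema paths are all made up for illustration.

```python
from typing import Callable, Dict


class StartupConfiguredService:
    """Toy service that reads its stream config once, when it is constructed."""

    def __init__(self, fetch_stream_config: Callable[[], Dict[str, str]]) -> None:
        # Snapshot taken at startup; streams declared later are not seen.
        self._streams = dict(fetch_stream_config())

    def accepts(self, stream: str) -> bool:
        return stream in self._streams


# Hypothetical stream config source (stream name -> schema path).
declared = {"example.stream-a": "/example/a"}

service = StartupConfiguredService(lambda: declared)

# A new stream is declared only after the service has started.
declared["example.stream-b"] = "/example/b"
before_restart = service.accepts("example.stream-b")

# "Restarting" the service makes it fetch the config again.
service = StartupConfiguredService(lambda: declared)
after_restart = service.accepts("example.stream-b")
```

In this model a stream that already exists at startup needs no extra restart, which matches the point that only the schema version was bumped.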
[17:22:45] (03PS8) 10DLynch: talk_page_event schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) [17:28:37] dcausse: added some instructions https://wikitech.wikimedia.org/wiki/Event_Platform/EventGate/Administration#eventgate-wikimedia_schema_repository_change [17:28:52] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Epic: Alluxio for Improved Superset Query Performance - https://phabricator.wikimedia.org/T288252 (10BTullis) 05Open→03Resolved a:03BTullis [17:31:39] razzi: I managed to setup a max_query time for presto in superset :) [17:32:06] razzi: let me know when you have a minute so that I show you [17:32:31] ottomata: I'm in meeting now - let's test the gobblin-purge after? [17:35:04] btullis: I wanted to share some on cassandra compaction and why I'd rather have us wait [17:49:42] (03PS1) 10Thcipriani: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 [18:08:50] (03CR) 10Thcipriani: [C: 04-1] Add mediawiki/desktop/button/click (031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 (owner: 10Thcipriani) [18:11:20] (03PS2) 10Thcipriani: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 [18:13:28] (03PS3) 10Thcipriani: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 [18:13:30] (03PS1) 10Thcipriani: new patchset [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735425 [18:14:18] (03Abandoned) 10Thcipriani: new patchset [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735425 (owner: 10Thcipriani) [18:15:10] (03PS4) 10Thcipriani: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 [18:19:59] ping ottomata? [18:20:47] hm, ping razzi? 
[18:22:37] joal: there in 10 [18:22:49] ack ottomata [18:26:52] (03PS1) 10Thcipriani: Test edit readme [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/735427 [18:27:23] (03CR) 10Thcipriani: [C: 03+2] Test edit readme [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/735427 (owner: 10Thcipriani) [18:28:23] (03CR) 10Thcipriani: [V: 03+2 C: 03+2] Test edit readme [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/735427 (owner: 10Thcipriani) [18:32:27] joal o/ [18:32:32] scarfing a bit but am here and rdy [18:41:47] heya ottomata [18:41:56] yoohoo [18:42:04] sorry went to kiss the kids and it lasted longer than expected :) [18:42:11] ottomata: batcave, or here? [18:42:16] nice long Kiisssssss smooch goodnight [18:42:20] ahhh bc ya! [18:44:40] (03Abandoned) 10Thcipriani: Test edit readme [analytics/wmf-product/jobs] - 10https://gerrit.wikimedia.org/r/735427 (owner: 10Thcipriani) [18:44:50] (03Abandoned) 10Thcipriani: Add mediawiki/desktop/button/click [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735422 (owner: 10Thcipriani) [18:46:29] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10Htriedman) Hi @Ottomata! I know that this is the same theoretical attack vector as revision create, e.g. someone creates a page wi... [19:10:49] ping razzi? [19:13:00] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10Ottomata) > Potential mitigations [...] for both event streams would likely be the same, correct? Yup! > I would also like to hear... 
[19:13:11] !log re-enable hdfs-cleaner for /wmf/gobblin [19:13:13] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log [19:13:56] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Wikidata, 10Wikidata-Query-Service: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (10Ottomata) a:03Ottoma... [19:14:18] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Wikidata, 10Wikidata-Query-Service: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (10Ottomata) INNNNNTERes... [19:17:50] Hi joal, stepped out for lunch [19:18:05] but I have my computer with me, what’s up? [19:18:27] no prob razzi - we can talk tomorrow if you prefer :) [19:18:33] Lunch is important! :) [19:19:50] of course! [19:20:50] all good then razzi, I'll sign off for today - Have a good end of day :) [19:32:34] 10Analytics, 10Data-Engineering, 10Event-Platform, 10Platform Engineering, 10tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (10daniel) >>! In T291120#7464525, @awight wrote: > The theory is that event sourcing is specific... [19:40:51] 10Analytics, 10Data-Engineering, 10Data-Engineering-Kanban, 10Wikidata, 10Wikidata-Query-Service: Events missing from event.rdf_streaming_updater_fetch_failure but present in /wmf/data/raw/event/eqiad.rdf-streaming-updater.fetch-failure - https://phabricator.wikimedia.org/T294361 (10Ottomata) 3 our of yo... [20:05:12] (03PS2) 10Clare Ming: Add new schema for web UI scroll tracking. 
[schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735003 (https://phabricator.wikimedia.org/T292586) [20:26:39] (03PS1) 10Ottomata: Refine - don't remove records during deduplication if ids are null [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/735444 (https://phabricator.wikimedia.org/T294361) [20:27:00] (03PS1) 10Ebernhardson: Add user_identifier field to sparql/query [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735445 (https://phabricator.wikimedia.org/T293462) [20:30:17] (03CR) 10Ottomata: Refine - don't remove records during deduplication if ids are null (031 comment) [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/735444 (https://phabricator.wikimedia.org/T294361) (owner: 10Ottomata) [20:31:48] (03PS9) 10Bartosz Dziewoński: talk_page_edit schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [20:32:18] (03CR) 10Bartosz Dziewoński: [C: 03+2] talk_page_edit schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [20:33:00] (03Merged) 10jenkins-bot: talk_page_edit schema [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [20:33:35] (03CR) 10Clare Ming: Add new schema for web UI scroll tracking. 
(031 comment) [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/735003 (https://phabricator.wikimedia.org/T292586) (owner: 10Clare Ming) [20:33:37] (03CR) 10jenkins-bot: [V: 04-1] Refine - don't remove records during deduplication if ids are null [analytics/refinery/source] - 10https://gerrit.wikimedia.org/r/735444 (https://phabricator.wikimedia.org/T294361) (owner: 10Ottomata) [20:42:15] (03CR) 10DLynch: "My understanding is that I'll also need to get Ie0a876427c5973a7cc259349769cacb913cec791 merged to config before this'll work in productio" [schemas/event/secondary] - 10https://gerrit.wikimedia.org/r/731333 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [20:53:40] 10Analytics, 10Analytics-Kanban, 10Data-Engineering, 10Data-Engineering-Kanban: Reportupdater SQL jobs failing with Python error - https://phabricator.wikimedia.org/T284074 (10Milimetric) 05Open→03Resolved a:03Milimetric This just started working sometime in mid June of this year. There was no code... [20:53:43] 10Analytics-Radar, 10WMDE-Templates-FocusArea, 10Patch-For-Review, 10WMDE-TechWish (Sprint-2021-02-03): Add missing normalization to CodeMirror Grafana board - https://phabricator.wikimedia.org/T273748 (10Milimetric) [20:56:02] ottomata: I'm drawing an architecture diagram, and am I correct in thinking the mediawiki -> kafka bit is http? [21:03:30] 10Analytics: Running into errors while adding Hive table to Superset dataset - https://phabricator.wikimedia.org/T284604 (10Milimetric) This worked for a few datasets I added, but not the one in @cchen's screenshot. There were no meaningful differences in how the tables are set up, so maybe it's some kind of pe... [21:04:53] 10Analytics: Reportupdater should stop running a job after some fixed number of failures - https://phabricator.wikimedia.org/T284037 (10Milimetric) 05Open→03Declined Trying to clean up. Please do re-open if we are too slow with the AirFlow migration. 
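The Refine patch referenced above (https://gerrit.wikimedia.org/r/735444, "don't remove records during deduplication if ids are null") names a general idea that a small example can make concrete. The following is a minimal plain-Python sketch of that idea, not the actual Refine code in analytics/refinery/source; the record shape and the `meta_id` field name are assumptions chosen for illustration.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def deduplicate_keep_null_ids(
    records: Iterable[Record], id_field: str = "meta_id"
) -> List[Record]:
    """Drop repeats that share a non-null id; keep every record with no id.

    `id_field` is a hypothetical field name used only in this sketch.
    """
    seen = set()
    result: List[Record] = []
    for record in records:
        record_id = record.get(id_field)
        if record_id is None:
            # With no id to compare on, treating these records as duplicates
            # of each other would silently discard distinct events.
            result.append(record)
            continue
        if record_id in seen:
            continue  # Same non-null id already emitted: a true duplicate.
        seen.add(record_id)
        result.append(record)
    return result


events = [
    {"meta_id": "a", "v": 1},
    {"meta_id": "a", "v": 2},   # duplicate of the first record
    {"meta_id": None, "v": 3},  # null id: kept
    {"v": 4},                   # missing id: kept
    {"meta_id": "b", "v": 5},
]
kept = deduplicate_keep_null_ids(events)
```

A naive deduplication keyed on the id alone would collapse both id-less records into one, which is the kind of loss T294361 describes (events present in the raw data but missing from the refined table).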
[21:07:51] 10Analytics: Make hudi work with Hive - https://phabricator.wikimedia.org/T262260 (10Milimetric) 05Open→03Declined no longer looking at Hudi in favor of Apache Iceberg [21:07:54] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (10Milimetric) [21:10:25] 10Analytics, 10Analytics-Kanban, 10Patch-For-Review: [SPIKE] Prototype of incremental updates for mediawiki history for simplewiki , including reverts using apache hudi - https://phabricator.wikimedia.org/T258532 (10Milimetric) [21:10:35] 10Analytics, 10Platform Engineering, 10Patch-For-Review: Add log entry details to page and user events in EventBus - https://phabricator.wikimedia.org/T263055 (10Milimetric) 05Open→03Declined We are discussing ideas like this, but not exactly this, closing and linking from a strategy document I'm putting... [21:14:28] 10Analytics, 10Patch-For-Review: HiveExtensions.convertToSchema does not properly convert arrays of structs - https://phabricator.wikimedia.org/T259924 (10Milimetric) From what I gathered in the code reviews, this is easier in Spark 3 and we're waiting for that, correct? [21:25:48] 10Analytics: Add mediarequests dataset to druid (just some dimensions) - https://phabricator.wikimedia.org/T240613 (10Milimetric) 05Open→03Declined no activity for a while [21:34:02] 10Analytics, 10Discovery: Ingest wdqs metrics into druid - https://phabricator.wikimedia.org/T240498 (10Milimetric) 05Open→03Declined I don't think Druid panned out as our solution for this, but please re-open if this is interesting at all. [21:36:33] 10Analytics, 10Data-Engineering, 10Event-Platform, 10EventStreams: Expose mediawiki/revision/tags-change in stream.wikimedia.org - https://phabricator.wikimedia.org/T294391 (10Htriedman) Got it. 
In that case I don't see it adding any new privacy risk — I'll just make sure to bump my investigation of the fr...
[21:39:20] Analytics: Add pertinent wdqs_external_sparql_query metrics and wdqs_internal_sparql_query to a superset dashboard - https://phabricator.wikimedia.org/T239852 (Milimetric) Open→Declined pretty much same as T240498
[21:47:06] Analytics, Data-Engineering: Analytics-test-hadoop Spark3 package upgrade - https://phabricator.wikimedia.org/T291465 (Milimetric) Open→Declined In a recent discussion we decided to separate the spark upgrade from the bigtop upgrade. So I'm closing this to reflect that decision, so that just lea...
[21:47:08] Analytics, Data-Engineering: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (Milimetric)
[21:50:12] Analytics, Data-Engineering: Analytics-test-hadoop Spark3 package upgrade - https://phabricator.wikimedia.org/T291465 (Milimetric) Declined→Open
[21:50:14] Analytics, Data-Engineering: Upgrade analytics-hadoop to Spark 3 + scala 2.12 - https://phabricator.wikimedia.org/T291464 (Milimetric)
[21:54:13] Analytics, Product-Analytics, Editing-team (Tracking): Add MariaDB replicas to Superset - https://phabricator.wikimedia.org/T291195 (Milimetric) Open→Declined Declining this to reflect the discussion above. The use cases will be handled separately for now. I'm helping Megan with the one we...
[21:57:42] Analytics: Superset query timeouts for charts using Druid table - https://phabricator.wikimedia.org/T282618 (Milimetric) Open→Declined Since this has been reported, and we don't have much work we can do on our side, I'm going to close this and add it to a reference document I'm building that would al...
[22:00:31] Analytics, Data-Engineering, Data-Engineering-Kanban, Privacy Engineering: Implement Data Governance Tool - https://phabricator.wikimedia.org/T272060 (Milimetric) p:Medium→High
[22:09:39] Analytics: Modify ReportUpdater to support `YYYY-MM` dates for monthly reports - https://phabricator.wikimedia.org/T245096 (Milimetric) Open→Declined no longer maintaining reportupdater, we'll move to AirFlow and keep this in a list to consider
[22:19:11] (CR) Nray: [C: +2] "👍" [schemas/event/secondary] - https://gerrit.wikimedia.org/r/735003 (https://phabricator.wikimedia.org/T292586) (owner: Clare Ming)
[22:19:56] (Merged) jenkins-bot: Add new schema for web UI scroll tracking. [schemas/event/secondary] - https://gerrit.wikimedia.org/r/735003 (https://phabricator.wikimedia.org/T292586) (owner: Clare Ming)
[22:51:14] Analytics, Data-Engineering, Event-Platform, Platform Engineering, tech-decision-forum: MediaWiki Events as Source of Truth - Decision Statement Overview - https://phabricator.wikimedia.org/T291120 (Pchelolo) The format of "problem statement first, solutions later" is very hard to follow, sin...
[22:57:13] Analytics: Convert siteinfo dumps from json to parquet - https://phabricator.wikimedia.org/T244380 (Milimetric)
[23:05:27] Analytics, Analytics-Dashiki: Allow sorting data in the Tabular View by multiple columns - https://phabricator.wikimedia.org/T240049 (Milimetric) Open→Declined Dashiki isn't going to see updates like this anytime soon. A more comprehensive plan will be made after we get into our groove as the ne...
[23:18:46] Analytics: Generate edit totals for all wikipedias by country by year - https://phabricator.wikimedia.org/T219453 (Milimetric)
[23:18:49] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month/year - https://phabricator.wikimedia.org/T215655 (Milimetric)
[23:19:16] Analytics: Generate edit totals for all wikipedias by country by year - https://phabricator.wikimedia.org/T219453 (Milimetric)
[23:19:56] Analytics, Analytics-Kanban, Patch-For-Review: Generate edit totals by country by month/year - https://phabricator.wikimedia.org/T215655 (Milimetric)
[23:23:37] Analytics: Add user_properties mysql table data to hadoop cluster - https://phabricator.wikimedia.org/T213910 (Milimetric) Open→Declined We did end up instrumenting user preferences, this data is flowing in real time on events. We had some problems getting too much of it so the instrumentation now i...
[23:25:45] Analytics: Proposal: Make centralauth db replicate to all the analytics dbstores - https://phabricator.wikimedia.org/T219827 (Milimetric) Open→Declined Keeping track of this in some internal planning notes. I think we're moving towards getting data out of mediawiki somehow, but not in the ways we di...
[23:28:51] Analytics: Alert and halt mediawiki processing on schema changes - https://phabricator.wikimedia.org/T209888 (Milimetric) Open→Declined This never happened again in the three years since I made this task. Nice. Of course, now that I said this...
[23:34:44] Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (Milimetric) Open→Declined Closing this as we're moving to Airflow and cleaning up our backlog.
[23:36:08] Analytics: Refactor refinery/oozie/mediawiki/load to use a loop instead of repeating code - https://phabricator.wikimedia.org/T207012 (Milimetric) Open→Declined we'll take care of this when we move to Airflow
[23:39:58] Analytics: reportupdater TLC - https://phabricator.wikimedia.org/T193167 (Milimetric)
[23:40:18] Analytics, Patch-For-Review, Unplanned-Sprint-Work: [reportupdater] Add a configurable hive client - https://phabricator.wikimedia.org/T193169 (Milimetric) Open→Declined not to diminish the work already done here, just declining in favor of moving to Airflow and to clean up the logs
[23:41:09] Analytics: [Reportupdater] Support category of jobs that cannot be backfilled - https://phabricator.wikimedia.org/T280997 (Milimetric) Open→Declined
[23:47:53] Analytics: Allow "scoped" EventLogging sanitization hashes - https://phabricator.wikimedia.org/T205331 (Krinkle) I wonder if there is a long-term need for cross-schema or global hashes. I don't know this area at all and much less how people analyse the data, so maybe this isn't practical, but in my experienc...
[23:48:09] Analytics, Analytics-EventLogging, CSS: mw-indicators popups hidden under bodyContent content - https://phabricator.wikimedia.org/T262510 (Milimetric) Open→Declined no movement in a while, and EventLogging is going away