[05:05:45] PROBLEM - Check the last execution of mediawiki-raw-cu-changes-drop-month on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit mediawiki-raw-cu-changes-drop-month https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[07:17:25] morning! Need to run some errands but I'll be on-line in ~1h max
[10:11:54] RECOVERY - Check the last execution of mediawiki-raw-cu-changes-drop-month on an-coord1001 is OK: OK: Status of the systemd unit mediawiki-raw-cu-changes-drop-month https://wikitech.wikimedia.org/wiki/Analytics/Systems/Managing_systemd_timers
[10:54:04] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Dropping data from druid takes down aqs hosts - https://phabricator.wikimedia.org/T226035 (elukey) Resolved→Open Very interesting - yesterday the systemd timer dropped old snapshots and the AQS alert fired, but only for a brief m...
[10:57:16] * elukey lunch!
[12:58:04] Analytics, Discovery, Operations, Research-Backlog: Run swift-object-expirer as part of the swift cluster - https://phabricator.wikimedia.org/T229584 (CDanis) p:Triage→Normal
[14:07:21] hey teammm!
[14:09:40] o/
[14:26:17] hey yall
[14:31:00] o/
[14:41:05] hey!
[14:48:40] milimetric, do you have 20 mins to help me troubleshoot oozie-spark mess?
[14:49:07] yes mforns, omw cave, need a sec to find headphones
[14:49:22] np
[15:42:45] people I am restarting druid brokers and historicals
[15:42:49] of both clusters
[15:42:54] to pick up new metrics logging etc..
[15:43:02] if you see anything weird it is my fault, please tell me :)
[16:27:56] https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=All&panelId=57&fullscreen
[16:28:00] \o/
[16:28:16] and also historical
[16:28:17] https://grafana.wikimedia.org/d/000000538/druid?refresh=1m&orgId=1&var-datasource=eqiad%20prometheus%2Fanalytics&var-cluster=druid_analytics&var-druid_datasource=All&panelId=58&fullscreen
[16:36:59] druid roll restart completed
[16:38:14] Analytics-Kanban, Product-Analytics: Make aggregate data on editors per country per wiki publicly available - https://phabricator.wikimedia.org/T131280 (Milimetric) Quick status update. I am currently evaluating ways to release this data. This is just while we wait for our privacy framework to be finis...
[16:50:20] What's appropriate for documentation that is no longer relevant? We undeployed mediawiki avro logging, but still have https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/MediaWiki_Avro_Logging
[16:50:27] delete page? hatnote about being undeployed?
[16:54:31] ebernhardson: I guess that a big warning on the top could be a good starter, then we could delete the page in say a month?
[16:55:50] elukey: sounds reasonable, thanks
[17:02:28] thank you!
[17:03:18] * elukey off!
[17:03:18] o/
[17:09:39] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Dropping data from druid takes down aqs hosts - https://phabricator.wikimedia.org/T226035 (elukey) Today I added some changes to the prometheus-druid-exporter: - more granularity to the latency buckets (https://gerrit.wikimedia.org/r/51...
[18:44:50] \ [18:45:43] ]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]]] [18:46:15] ]]]]]]]]]]]\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ [18:49:09] ugh, sorry about the above! [19:55:08] 10Analytics: Unable to access SWAP notebooks using LDAP - https://phabricator.wikimedia.org/T230627 (10cchen)