[00:13:35] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad average message consume rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[00:14:05] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad average message produce rate in last 30m on einsteinium is CRITICAL: 0 le 0 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[05:50:06] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad average message consume rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 673.4 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[05:50:25] !log restart mirror maker on kafka10[12-23] - failures to consume after rebalance
[05:50:26] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[05:50:35] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad average message produce rate in last 30m on einsteinium is OK: (C)0 le (W)100 le 1551 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[05:54:16] PROBLEM - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is CRITICAL: 1.013e+06 gt 1e+05 https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[05:58:42] catching up --^
[06:00:46] RECOVERY - Kafka MirrorMaker main-eqiad_to_eqiad max lag in last 10 minutes on einsteinium is OK: (C)1e+05 gt (W)1e+04 gt 0
https://grafana.wikimedia.org/dashboard/db/kafka-mirrormaker?var-datasource=eqiad+prometheus/ops&var-lag_datasource=eqiad+prometheus/ops&var-mirror_name=main-eqiad_to_eqiad
[06:11:15] !log temporary point Turnilo to druid1002 to allow druid1001's reimage
[06:11:16] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:14:18] !log re-run failed webrequest-load jobs
[06:14:19] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[06:17:25] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4241906 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1001.eqiad.wmnet']...
[06:21:19] I am reimaging druid1001 to Debian Stretch
[06:22:14] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Pageviews-daily broken after move from Pivot to Turnilo - https://phabricator.wikimedia.org/T195819#4238182 (elukey)
[06:22:44] Analytics-Kanban, Patch-For-Review, User-Elukey: Refresh zookeeper nodes in eqiad - https://phabricator.wikimedia.org/T182924#3838887 (elukey)
[06:24:34] Analytics, Analytics-Kanban: Upgrade Analytics infrastructure to Debian Stretch - https://phabricator.wikimedia.org/T192642#4241915 (elukey)
[06:49:17] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4241925 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1001.eqiad.wmnet'] ``` and were **ALL** successful.
[06:51:52] druid1001 back in service, druid analytics cluster on stretch!
[07:01:49] brb
[08:44:41] Reimaging druid1004 to stretch!
[08:46:58] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4242061 (ops-monitoring-bot) Script wmf-auto-reimage was launched by elukey on neodymium.eqiad.wmnet for hosts: ``` ['druid1004.eqiad.wmnet']...
[09:11:01] Analytics, Analytics-Kanban, Patch-For-Review, User-Elukey: Upgrade Druid nodes (1001->1006) to Debian Stretch - https://phabricator.wikimedia.org/T192636#4242114 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['druid1004.eqiad.wmnet'] ``` and were **ALL** successful.
[09:13:44] druid1004 back in service
[10:02:50] Analytics, Product-Analytics, Reading-analysis: Assess impact of ua-parser update on core metrics - https://phabricator.wikimedia.org/T193578#4242326 (fdans) Hey @Tbayer it seems like until we updated the regexes, ua_parser assumed that the major version of Internet Explorer matched that of its brows...
[10:50:07] * elukey lunch!
[11:44:05] Analytics, Analytics-EventLogging, Readers-Web-Backlog: Explore an API for logging events sampled by session - https://phabricator.wikimedia.org/T168380#4242566 (Jhernandez) Link to source: https://github.com/wikimedia/mediawiki-extensions-EventLogging/blob/ab184351ede28dfaf12e3ad367d53a556be69c9d/m...
[12:37:18] as FYI I am draining analytics1029/42/70 as prep step before reboot
[12:37:25] (new cpu microcode to test)
[13:15:44] !log re-run webrequest-load-wf-upload-2018-5-30-11 - died after worker node reboots
[13:15:45] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[13:33:55] I am now rebooting analytics1002
[13:47:31] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#4242922 (Ottomata)
[13:47:37] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319960 (Ottomata)
[13:48:16] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#3319960 (Ottomata) a:Ottomata
[13:51:36] * addshore needs to make a data flow diagram
[13:52:02] anyone got any suggestions for something easy to use, open source, pretty, easy to update, ? :d
[13:52:41] ideally something I just write some text to describe the diagram and then run some sort of rendering?
I was thinking there could be some cool JS package, but can't really find one
[14:07:12] Analytics, Analytics-Kanban, EventBus, ORES, and 4 others: Emit revision-score event to EventBus and expose in EventStreams - https://phabricator.wikimedia.org/T167180#4242999 (Pchelolo)
[15:08:41] Analytics, EventBus, ORES, Scoring-platform-team, User-Ladsgroup: Numeric keys in ORES models causing downstream Hive ingestion to fail - https://phabricator.wikimedia.org/T195979#4243147 (Ottomata)
[15:35:52] Analytics, User-Elukey: Upgrade Archiva (meitnerium) to Debian Stretch - https://phabricator.wikimedia.org/T192639#4243256 (akosiaris) Looking at meitnerium resources it does not require much so yes we can create a new ganeti VM. Wanna file a task using the link in https://phabricator.wikimedia.org/proje...
[15:59:36] a-team, missing stand-up today, attending the unconscious bias meeting, sent an email, seeya!
[16:38:15] * elukey off!
[17:22:40] Analytics, Analytics-Kanban, EventBus, Services (watching): Enable multiple topics in EventStreams URL - https://phabricator.wikimedia.org/T187418#4243469 (Ottomata)
[17:49:31] !log Rerun webrequest-load-wf-misc-2018-5-30-16
[17:49:32] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[18:30:57] (CR) Mforns: "A couple inline comments :]" (3 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/436002 (https://phabricator.wikimedia.org/T185533) (owner: Sahil505)
[18:35:38] Analytics, Analytics-Kanban, Product-Analytics: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269#4243820 (JMinor) > Maybe I'd advice to also purge the OS minor version, because it starts to be identifying in some cases. Is it so important? I...
[19:50:33] Analytics, Analytics-Kanban, Wikimedia-Stream: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009#4244114 (Ottomata)
[20:03:30] Analytics, Analytics-Kanban, Wikimedia-Stream: Support timestamp based consumption in KafkaSSE and EventStreams - https://phabricator.wikimedia.org/T196009#4244166 (Ottomata) p:Triage>Low
[20:33:46] (PS4) Joal: Update mediawiki-history stats [analytics/refinery/source] - https://gerrit.wikimedia.org/r/434987 (https://phabricator.wikimedia.org/T192481)
[20:39:37] (PS2) Sahil505: Fixed accessibility/markup issues of Wikistats 2.0 [analytics/wikistats2] - https://gerrit.wikimedia.org/r/436002 (https://phabricator.wikimedia.org/T185533)
[20:43:54] (CR) Sahil505: Fixed accessibility/markup issues of Wikistats 2.0 (3 comments) [analytics/wikistats2] - https://gerrit.wikimedia.org/r/436002 (https://phabricator.wikimedia.org/T185533) (owner: Sahil505)
[21:36:05] Analytics, Analytics-Kanban, Product-Analytics: Partially purge MobileWikiAppiOSUserHistory eventlogging schema - https://phabricator.wikimedia.org/T195269#4244452 (mforns) @JMinor @chelsyx > I'd suggest we make this decision on the basis of some specific concern or evidence this is identifying. Yea...
[22:08:55] Hi Analytics! Can you help me delete the MobileWikiAppiOSSessions table again? ottomata helped delete it once because we change field types during development, but for some reason the type of `measure` field is still `double` -- it should be `integer` instead
[22:09:40] Also can you help deleting MobileWikiAppiOSUserHistory, MobileWikiAppiOSLoginAction, MobileWikiAppiOSSettingAction, MobileWikiAppiOSReadingLists as well?
[22:10:05] We're going to release a new version soon.
This is just to be safe
[22:17:27] Analytics, Commons, EventBus, MediaWiki-JobQueue, and 3 others: Make gwtoolsetUploadMediafileJob JSON-serializable - https://phabricator.wikimedia.org/T192946#4244556 (Ramsey-WMF) a:matthiasmullie Assigning to Matthias, who can ping @MaxSem for assistance and review :)
[22:31:52] Analytics, Product-Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, and 3 others: [EPIC] Reading List Sync service analytics - https://phabricator.wikimedia.org/T191859#4244588 (mpopov) p:High>Low
[22:40:17] Analytics, Product-Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, and 3 others: [EPIC] Reading List Sync service analytics - https://phabricator.wikimedia.org/T191859#4244607 (mpopov) >>! In T191859#4178146, @Tbayer wrote: >> The reason to hash `app_install_id` is because t...
[22:42:42] Analytics, Product-Analytics, Reading List Service, Reading-Infrastructure-Team-Backlog, and 3 others: [EPIC] Reading List Sync service analytics - https://phabricator.wikimedia.org/T191859#4244610 (mpopov)