[08:21:46] Analytics, Research, WMDE-Analytics-Engineering, Patch-For-Review, and 2 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (Neil_P._Quinn_WMF) >>! In T212386#4927285, @elukey wrote: > I had the chance to...
[09:25:32] Analytics, Analytics-Kanban, DBA, Patch-For-Review, User-Banyek: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5] - https://phabricator.wikimedia.org/T210478 (Marostegui) I have disabled notifications for lag checks (only for lag ones) for dbstore1002, as they are very noisy....
[09:57:32] Analytics, Analytics-Kanban, Operations, Product-Analytics, Patch-For-Review: dbstore1002 Mysql errors - https://phabricator.wikimedia.org/T213670 (Marostegui) Another crash just happened: ` InnoDB: We intentionally crash the server, because it appears to be hung. 2019-02-10 09:54:08 7fa2cfff...
[10:07:40] PROBLEM - Check the last execution of check_webrequest_partitions on an-coord1001 is CRITICAL: CRITICAL: Status of the systemd unit check_webrequest_partitions
[10:10:27] webrequest-load-wf-upload-2019-2-10-0 is still running
[10:10:28] weird
[10:16:34] https://yarn.wikimedia.org/proxy/application_1547732626747_82573/mapreduce/task/task_1547732626747_82573_m_000000
[10:16:38] 9 hrs :D
[10:17:21] but https://yarn.wikimedia.org/proxy/application_1547732626747_82573 shows only 1 mapper
[10:17:25] is it correct?
[10:37:19] so https://yarn.wikimedia.org/proxy/application_1547732626747_82573/mapreduce/job/job_1547732626747_82573 seems to have one mapper
[10:37:25] that is it already done?
[10:37:39] no that is the one running for hours
[10:37:57] but https://yarn.wikimedia.org/proxy/application_1547732626747_82573/mapreduce/task/task_1547732626747_82573_m_000000 shows 100% progress
[10:40:21] on an1062
[10:41:25] the stdout logs are pointing to https://yarn.wikimedia.org/jobhistory/job/job_1547732626747_82580
[10:41:32] that is done (contains the Hive query)
[10:42:14] and the task on 1062 keeps logging Heart beat..
[10:45:12] ok I am going to try to kill application_1547732626747_82573
[10:47:32] oozie didn't really realize that the job is killed
[10:52:36] !log killed oozie job related to webrequest-load-wf-upload-2019-2-10-0, seemed stuck in generate_sequence_statistics (not really clear why)
[10:52:37] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[10:52:54] !log re-run webrequest upload webrequest-load-wf-upload-2019-2-10-0
[10:52:55] Logged the message at https://www.mediawiki.org/wiki/Analytics/Server_Admin_Log
[11:00:38] yeah this time generate_sequence_statistics went fine
[13:16:43] I confirm webrequest-load-wf-upload-2019-2-10-0 shows success in hue - Thanks elukey for having rerun!
[15:25:47] Analytics, Product-Analytics, Research, WMDE-Analytics-Engineering, and 3 others: Provide tools for querying MediaWiki replica databases without having to specify the shard - https://phabricator.wikimedia.org/T212386 (Neil_P._Quinn_WMF)
[15:37:48] Analytics-Kanban, Contributors-Analysis, Product-Analytics, Patch-For-Review: Whitelist VisualEditorFeatureUse data stream - https://phabricator.wikimedia.org/T212588 (Neil_P._Quinn_WMF) @JAllemandou, @Ottomata, I'd like to get this deployed 😁
[15:42:41] Analytics, Product-Analytics, Reading-analysis: [EventLogging Sanitization] Update EL sanitization white-list for field renames in EL schemas - https://phabricator.wikimedia.org/T209087 (Neil_P._Quinn_WMF) >>! In T209087#4938610, @mforns wrote: > @mpopov Just a heads-up that we'll be turning on the d...
[17:13:31] thanks for checking joal! Just re-run the check webrequest partitions, all good
[17:18:07] RECOVERY - Check the last execution of check_webrequest_partitions on an-coord1001 is OK: OK: Status of the systemd unit check_webrequest_partitions
[17:19:51] \o/
[21:41:46] Analytics-Kanban, Contributors-Analysis, Product-Analytics, Patch-For-Review: Whitelist VisualEditorFeatureUse data stream - https://phabricator.wikimedia.org/T212588 (Nuria) It will be deployed with the next deployment to cluster, there is 1 per week.
[22:43:03] Analytics, Analytics-Kanban, Page-Issue-Warnings: event_pageissues Turnilo view contains no valid data from before January 5 - https://phabricator.wikimedia.org/T214136 (Petar.petkovic)
[23:11:15] Analytics, Graphite: Grafana shows zero EventLogging events for around 44 hours around January 15 - https://phabricator.wikimedia.org/T215744 (Tbayer)