[00:08:35] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2695552 (Dereckson)
[00:09:12] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2054159 (Dereckson) Flow tables have been created.
[03:47:15] PROBLEM - Puppet staleness on tools-worker-1018 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[06:39:22] PROBLEM - Puppet run on tools-webgrid-lighttpd-1206 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:56:42] andrewbogott: Let me see with team if we can delete language-lcmd instance.
[07:14:22] RECOVERY - Puppet run on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:21:43] andrewbogott: yeah go for it (re: filippo-test)
[10:29:04] (CR) Hashar: "recheck" [labs/tools/guc] - https://gerrit.wikimedia.org/r/312601 (owner: Krinkle)
[10:29:10] (CR) Hashar: "recheck" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/313760 (https://phabricator.wikimedia.org/T145574) (owner: Lokal Profil)
[11:32:26] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.67 ms
[11:36:42] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[12:28:12] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 2.59 ms
[12:43:10] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[13:34:06] Labs, Beta-Cluster-Infrastructure, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2696393 (Andrew)
[14:00:32] Labs, Operations, netops: Consider renumbering Labs to separate address spaces - https://phabricator.wikimedia.org/T122406#2696632 (faidon) We agreed on all of the above during the Barcelona offsite. We've preliminary agreed to attempt implementing them in tandem with the Neutron migration, which wou...
[15:07:54] Labs, Labs-Infrastructure, DBA, Operations: Prepare storage layer for olo.wikipedia - https://phabricator.wikimedia.org/T147302#2696868 (jcrespo) a:jcrespo Claimed, but will be done together with @Marostegui for demonstration purposes.
[15:09:24] Labs, Labs-Infrastructure, DBA, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2696878 (jcrespo)
[15:38:30] godog: now on my radar: test-prometheus2.monitoring.eqiad.wmflabs and monitoring-prometheus2.monitoring.eqiad.wmflabs. Can either of those die?
[15:38:47] (Use of mariadb classes in labs is very diverse, trying to consolidate a bit.)
[15:59:13] andrewbogott: the latter can, not the former
[16:05:43] godog: so, to confirm: I can delete monitoring-prometheus2
[16:05:49] but not delete test-prometheus2
[16:05:50] ?
[16:06:27] andrewbogott: yeah! thanks for taking care of that
[16:06:38] ok
[16:07:52] !log monitoring deleting monitoring-prometheus2
[16:07:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Monitoring/SAL, dummy
[16:10:20] godog: could test-prometheus2 include role::mariadb instead of ::mariadb?
[16:11:02] andrewbogott: it can yeah, or not at all too
[16:11:10] I'll switch it, thanks
[16:11:33] andrewbogott: np, if you see puppet failing just remove the role/class altogether
[16:12:46] godog: done, seems unharmed :)
[16:12:59] \o/
[16:14:37] Labs, Beta-Cluster-Infrastructure, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2697077 (Andrew)
[16:47:51] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[17:22:52] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:29:35] Labs, Beta-Cluster-Infrastructure, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2697241 (Andrew)
[17:55:09] PROBLEM - Host bdsync-deb-3 is DOWN: CRITICAL - Host Unreachable (10.68.21.240)
[18:16:19] PROBLEM - Host tools-exec-cyberbot is DOWN: CRITICAL - Host Unreachable (10.68.16.39)
[18:18:50] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[18:35:36] PROBLEM - Puppet run on tools-bastion-05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:38:14] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[18:50:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:52:25] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[19:02:16] RECOVERY - Puppet staleness on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [3600.0]
[19:04:17] Labs, Labs-Infrastructure, Salt, Patch-For-Review: update salt key monitoring scripts for labs to new nova api version - https://phabricator.wikimedia.org/T123607#2697506 (AlexMonk-WMF) Open>Resolved
[19:05:03] RECOVERY - Free space - all mounts on tools-worker-1018 is OK: OK: tools.tools-worker-1018.diskspace._var_lib_docker.byte_percentfree (No valid datapoints found)
[19:07:43] RECOVERY - Puppet run on tools-worker-1018 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:17:31] Labs, Beta-Cluster-Infrastructure, Patch-For-Review: Replace all class imports on Labs with role imports - https://phabricator.wikimedia.org/T147233#2697541 (Andrew)
[20:16:02] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[21:25:42] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:27:26] RECOVERY - Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:40:35] RECOVERY - Puppet run on tools-bastion-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:43:16] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[21:53:49] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:48:35] !log tools.wikibugs Updated channels.yaml to: e23d40f0cf15b1387847ed9f5972b7e0662d8803 Change project Project-Creators to Project-Admins
[22:48:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[23:26:44] valhallasw`vecto: I am always impressed with how accidentally resilient wikibugs is.