[00:01:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[00:12:25] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[00:30:47] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[00:41:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:52:21] RECOVERY - Puppet errors on deployment-logstash2 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:07:45] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:09:22] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[01:12:38] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39199 bytes in 4.829 second response time
[01:14:09] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 51061 bytes in 1.259 second response time
[03:02:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[03:40:46] (CR) Awight: [C: 1] "I think that's right, thank you!" [integration/config] - https://gerrit.wikimedia.org/r/372759 (https://phabricator.wikimedia.org/T173251) (owner: AnotherLadsgroup)
[03:42:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:45:28] Gerrit, Release-Engineering-Team (Next), Scap, ORES, and 2 others: Simplify git-fat support for pulling from both production and labs - https://phabricator.wikimedia.org/T171758#3536646 (awight) /me likes @demon's post. Awesome, let's stay in coordination about how we might be able to help with...
[04:02:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[04:16:39] Yippee, build fixed!
[04:16:39] Project selenium-MultimediaViewer » firefox,beta,Linux,BrowserTests build #492: FIXED in 20 min: https://integration.wikimedia.org/ci/job/selenium-MultimediaViewer/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/492/
[04:42:38] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:02:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:12:49] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0]
[05:14:09] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[05:35:20] PROBLEM - Puppet errors on deployment-conf03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[06:10:20] RECOVERY - Puppet errors on deployment-conf03 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:19:07] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[06:25:07] Gerrit, Release-Engineering-Team (Kanban), Regression, Upstream: Cannot log into Gerrit as of recent upgrade - https://phabricator.wikimedia.org/T152640#3536689 (Reception123) thanks
[07:15:14] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:22:30] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:23:55] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:52:48] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:55:12] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:02:29] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:03:57] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:32:50] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[08:33:36] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[08:41:12] Beta-Cluster-Infrastructure, Salt: deployment-imagescaler02 is not responding to salt - https://phabricator.wikimedia.org/T173628#3535198 (fgiunchedi) Interesting, I seem to remember seeing something like this in production too but it self healed once puppet was running on the box
[08:42:49] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[09:13:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:15:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:23:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:34:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:41:54] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[09:50:07] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[10:03:31] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:09:36] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:21:53] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:46:11] PROBLEM - Puppet errors on deployment-restbase01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[11:21:13] RECOVERY - Puppet errors on deployment-restbase01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:24:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[12:04:30] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:25:51] PROBLEM - Puppet errors on deployment-jobrunner02 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[13:00:52] RECOVERY - Puppet errors on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0]
[13:13:12] Release-Engineering-Team (Kanban), Wikimedia-Site-requests, Spanish-Sites, User-MarcoAurelio: Configuration changes for the AbuseFilter extension at es.wikibooks - https://phabricator.wikimedia.org/T31483#3537507 (MarcoAurelio)
[13:13:46] Release-Engineering-Team (Kanban), Wikimedia-Site-requests, User-MarcoAurelio: Please close the Asturianu Wikiquote - https://phabricator.wikimedia.org/T30964#3537514 (MarcoAurelio)
[13:46:14] Project selenium-VisualEditor » firefox,beta,Linux,BrowserTests build #498: FAILURE in 2 min 13 sec: https://integration.wikimedia.org/ci/job/selenium-VisualEditor/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/498/
[14:25:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[14:35:34] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:00:30] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:10:34] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:22:15] Release-Engineering-Team (Kanban), RelatedArticles, Patch-For-Review, Readers-Web-Backlog (Tracking), User-zeljkofilipin: Create Jenkins job that runs RelatedArticles Selenium tests daily - https://phabricator.wikimedia.org/T171847#3538047 (Jdlrobson) Anything I can help with?
[15:30:08] (CR) Jforrester: [C: 1] Whitelist second email of Kghbln [integration/config] - https://gerrit.wikimedia.org/r/372181 (owner: Umherirrender)
[15:37:16] (CR) Paladox: [C: 1] Whitelist second email of Kghbln [integration/config] - https://gerrit.wikimedia.org/r/372181 (owner: Umherirrender)
[15:54:56] PROBLEM - Puppet errors on deployment-apertium02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[16:01:35] PROBLEM - Puppet errors on integration-r-lang-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:03:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[16:34:56] RECOVERY - Puppet errors on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:38:33] Release-Engineering-Team (Kanban), RelatedArticles, Patch-For-Review, Readers-Web-Backlog (Tracking), User-zeljkofilipin: Create Jenkins job that runs RelatedArticles Selenium tests daily - https://phabricator.wikimedia.org/T171847#3538525 (zeljkofilipin) @Jdlrobson the job is //almost// gree...
[16:42:35] RECOVERY - Puppet errors on integration-r-lang-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:42:59] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:07:13] Release-Engineering-Team (Watching / External), Operations, Ops-Access-Requests, User-Addshore: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3538839 (RobH) The request to add Daniel to mw deployers was approved in today's Operations team meeting.
[17:09:16] Continuous-Integration-Infrastructure, Operations, Ops-Access-Requests, Patch-For-Review, User-Addshore: Requesting access to contint-admins for addshore - https://phabricator.wikimedia.org/T173233#3538863 (RobH) The request to give addshore contint-admins was approved in today's operations t...
[18:09:15] Release-Engineering-Team (Kanban), RelatedArticles, Patch-For-Review, Readers-Web-Backlog (Tracking), User-zeljkofilipin: Create Jenkins job that runs RelatedArticles Selenium tests daily - https://phabricator.wikimedia.org/T171847#3539024 (zeljkofilipin) Only one failed test! 🎉 {F9142684}
[18:15:48] Continuous-Integration-Infrastructure, Operations, Ops-Access-Requests, Patch-For-Review, User-Addshore: Requesting access to contint-admins for addshore - https://phabricator.wikimedia.org/T173233#3539046 (Dzahn) Open>Resolved a:Dzahn Hi @addshore you have been added to the group...
[18:16:06] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:17:27] Continuous-Integration-Infrastructure, Operations, Ops-Access-Requests, Patch-For-Review, User-Addshore: Requesting access to contint-admins for addshore - https://phabricator.wikimedia.org/T173233#3539052 (Dzahn) And here are the things you can do as root: ``` [contint1001:~] $ sudo cat /e...
[18:18:29] !log addshore is now a contint-admin
[18:18:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:25:47] RainbowSprinkles: / Reedy could you https://gerrit.wikimedia.org/r/#/c/372780/ ? :)
[18:26:31] PROBLEM - Puppet errors on deployment-imagescaler02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:35:07] Beta-Cluster-Infrastructure, Patch-For-Review: Disk full on deployment-jobrunner02 - https://phabricator.wikimedia.org/T173571#3539169 (Krenair) Open>Resolved a:Krenair hopefully that should stop this happening again
[18:51:08] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[19:01:32] RECOVERY - Puppet errors on deployment-imagescaler02 is OK: OK: Less than 1.00% above the threshold [0.0]
[19:12:00] Project selenium-MinervaNeue » chrome,beta,Linux,BrowserTests build #86: FAILURE in 22 min: https://integration.wikimedia.org/ci/job/selenium-MinervaNeue/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/86/
[19:19:39] Release-Engineering-Team (Watching / External), Operations, Ops-Access-Requests, User-Addshore: Make @daniel a MediaWiki deployer - https://phabricator.wikimedia.org/T173230#3539361 (herron) Open>Resolved a:herron Change 371661 has been merged and is propagating out now. Transitionin...
[19:23:51] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[19:33:06] PROBLEM - Free space - all mounts on integration-slave-jessie-android is CRITICAL: CRITICAL: integration.integration-slave-jessie-android.diskspace._mnt.byte_percentfree (No valid datapoints found)integration.integration-slave-jessie-android.diskspace.root.byte_percentfree (<100.00%)
[19:58:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]
[20:17:07] PROBLEM - Puppet errors on deployment-mathoid is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:52:04] RECOVERY - Puppet errors on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[21:12:55] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[21:52:54] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:51] Beta-Cluster-Infrastructure, Wikimedia-Logstash: deployment-logstash2 out of disk space - https://phabricator.wikimedia.org/T170521#3539902 (greg)
[23:17:53] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.30.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T170632#3539907 (greg) Open>Resolved
[23:24:49] PROBLEM - Puppet errors on deployment-urldownloader is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:28:08] Beta-Cluster-Infrastructure, Wikimedia-Logstash: deployment-logstash2 out of disk space - https://phabricator.wikimedia.org/T170521#3539925 (EBernhardson) Open>Resolved a:EBernhardson Varying definitions of done. This particular instance isn't a problem, but there is nothing preventing the sa...
[23:34:01] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:59:48] RECOVERY - Puppet errors on deployment-urldownloader is OK: OK: Less than 1.00% above the threshold [0.0]