[06:10:43] Project beta-scap-eqiad build #222986: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/222986/
[06:21:26] Project-Admins: Replace tracking bug T21719 by new project tag "HTML5" - https://phabricator.wikimedia.org/T102502 (Aklapper) I'm missing a use case why someone would like to follow only HTML-5 related tasks.
[06:24:00] Project beta-scap-eqiad build #222987: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/222987/
[06:32:35] The beta update job failure says "sudo: a password is required"
[06:38:30] Yippee, build fixed!
[06:38:30] Project beta-scap-eqiad build #222988: FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/222988/
[07:09:18] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[07:24:16] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[07:39:17] PROBLEM - Free space - all mounts on deployment-deploy01 is CRITICAL: CRITICAL: deployment-prep.deployment-deploy01.diskspace.root.byte_percentfree (<11.11%)
[07:47:09] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[08:02:00] PROBLEM - Puppet errors on deployment-elastic07 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[08:02:33] PROBLEM - Puppet errors on deployment-logstash2 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[08:12:34] PROBLEM - Puppet errors on deployment-elastic05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[09:20:51] Project beta-scap-eqiad build #222999: FAILURE in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/222999/
[09:34:02] Project beta-scap-eqiad build #223000: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223000/
[09:47:07] Release-Engineering-Team, Scap, Operations: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild - https://phabricator.wikimedia.org/T203625 (MoritzMuehlenhoff) p:Triage>Normal
[09:47:17] Project beta-scap-eqiad build #223001: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223001/
[09:49:14] zeljkof: wdio/ffmpeg I rewrote it this morning
[09:49:22] as a wdio reporter, see https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/462656/2
[09:49:30] and I left some comment on your change https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/422933/
[09:49:45] I basically copy pasted the code from wdio.conf.js to a new file
[09:49:50] and wdio.conf.js is now all about:
[09:50:04] reporters: [ require('wdio-mediawiki/VideoRecordingReporter') ],
[09:50:08] reporterOptions: {
[09:50:14] video: { videoPath: logPath }
[09:50:15] }
[09:50:29] cool!
[09:50:50] so we can add video recording to the extensions daily jobs fairly trivially (just release wdio-mediawiki 0.0.3, bump in the extension package.json and add the above reporter configuration)
[09:50:55] then we get video reporting on daily jobs
[09:51:37] for running the extension selenium tests from Quibble, I think I have a good plan. Will do my best to deliver it by end of the week and cut a new quibble version
[09:52:17] even cooler!
[10:00:38] Project beta-scap-eqiad build #223002: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223002/
[10:13:51] Project beta-scap-eqiad build #223003: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223003/
[10:17:49] Release-Engineering-Team, Wikimedia-Incident, Wikimedia-production-error: Promoting group1 to 1.32.0-wmf.22 caused a spam of web request took longer than 60 seconds and timed out - https://phabricator.wikimedia.org/T204871 (hashar)
[10:18:34] Release-Engineering-Team, Wikimedia-Incident, Wikimedia-production-error: Deployments of MediaWiki with scap cause a spam of "web request took longer than 60 seconds and timed out" - https://phabricator.wikimedia.org/T204871 (hashar)
[10:20:27] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-extensions-Other, User-zeljkofilipin: EditSubpages extension fails mediawiki/core webdriver.io tests - https://phabricator.wikimedia.org/T196436 (zeljkofilipin) @hashar this will be resolved by {T199116}?
[10:21:46] Release-Engineering-Team, Scap, Datacenter-Switchover-2018, Patch-For-Review: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (hashar)
[10:22:08] Release-Engineering-Team, Scap, Operations, Datacenter-Switchover-2018, Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (hashar)
[10:24:44] Release-Engineering-Team (Kanban), Release, Train Deployments, User-zeljkofilipin: 1.32.0-wmf.22 deployment blockers - https://phabricator.wikimedia.org/T191068 (hashar) Train report published on Wikitech: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180918-train
[10:27:20] Project beta-scap-eqiad build #223004: STILL FAILING in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223004/
[10:40:56] Project beta-scap-eqiad build #223005: ABORTED in 12 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223005/
[10:44:17] RECOVERY - Free space - all mounts on deployment-deploy01 is OK: OK: All targets OK
[10:49:18] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: On deployment-prep scap cache_git_info takes 12 minutes (that is too slow) - https://phabricator.wikimedia.org/T204762 (hashar) Even funnier, I wanted to trace the execution of `scap sync` using the python `trace` module. The git c...
[10:57:25] Yippee, build fixed!
[10:57:25] Project beta-scap-eqiad build #223006: FIXED in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223006/
[11:02:14] Project beta-scap-eqiad build #223007: FAILURE in 3 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223007/
[11:08:30] Release-Engineering-Team (Someday), MediaWiki-Core-Tests, User-zeljkofilipin: Someday/maybe Selenium framework improvements - https://phabricator.wikimedia.org/T190995 (hashar)
[11:08:40] zeljkof: I have just declined that wdio async: false task
[11:09:17] seems webdriver.io 4 runs them synchronously now. That is good enough :]
[11:10:19] it's actually a 4.x feature I think, provides a more JS syntax
[11:10:37] na 3.x ran them asynchronously, and 4.x runs them synchronously
[11:11:09] and async: true is merely for back compatibility so people don't have to rewrite all their tests when upgrading
[11:13:41] hashar: something is wrong with freenode/irccloud, my messages get rejected all the time :/
[11:16:23] Yippee, build fixed!
[11:16:23] Project beta-scap-eqiad build #223008: FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223008/
[11:26:35] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188 (zeljkofilipin)
[11:26:48] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188 (zeljkofilipin) p:High>Normal
[11:42:24] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: On deployment-prep scap cache_git_info takes 12 minutes (that is too slow) - https://phabricator.wikimedia.org/T204762 (hashar) I wanted some historical build durations. I took the IRC logs from https://wm-bot.wmflabs.org/logs/%23w...
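The idea of pulling historical build durations out of the channel log can be sketched roughly as follows. This is only an illustration, not the script used in T204762: it assumes the Jenkins bot lines keep the "Project <job> build #<n>: <STATUS> in <duration>: <url>" shape seen in this log, and the names `Build`, `BUILD_RE`, `to_seconds` and `parse_builds` are hypothetical. Durations other than "N min" / "N min M sec" / "M sec" are not handled.

```python
import re
from typing import Iterator, NamedTuple


class Build(NamedTuple):
    """One Jenkins bot notification (hypothetical record type)."""
    timestamp: str
    job: str
    number: int
    status: str
    seconds: int


# Assumed line shape, taken from the bot messages in this log. The optional
# two digits before the status tolerate leftover IRC colour codes ("04FAILURE").
BUILD_RE = re.compile(
    r"\[(?P<ts>\d{2}:\d{2}:\d{2})\]\s+"
    r"Project (?P<job>\S+) build #(?P<num>\d+): "
    r"(?:\d{2})?(?P<status>[A-Z ]+?) in "
    r"(?P<dur>(?:\d+ min)?(?: ?\d+ sec)?):"
)


def to_seconds(duration: str) -> int:
    """Convert '12 min', '3 min 59 sec' or '45 sec' to seconds."""
    minutes = re.search(r"(\d+) min", duration)
    seconds = re.search(r"(\d+) sec", duration)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


def parse_builds(log_text: str) -> Iterator[Build]:
    """Yield every build notification found in the log text."""
    for m in BUILD_RE.finditer(log_text):
        yield Build(m["ts"], m["job"], int(m["num"]),
                    m["status"], to_seconds(m["dur"]))


if __name__ == "__main__":
    sample = (
        "[11:02:14] Project beta-scap-eqiad build #223007: FAILURE in "
        "3 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223007/\n"
        "[11:16:23] Project beta-scap-eqiad build #223008: FIXED in "
        "13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223008/\n"
    )
    for build in parse_builds(sample):
        print(build.timestamp, build.number, build.status, build.seconds)
```

Because the regex is applied with `finditer` over the whole text, it should work whether the log has one message per line or several messages run together on one line.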
[11:44:39] Beta-Cluster-Infrastructure, Release-Engineering-Team (Kanban), Scap: On deployment-prep scap cache_git_info takes 12 minutes (that is too slow) - https://phabricator.wikimedia.org/T204762 (hashar) If that is from scap, that might be one of: * 35e4dc6ea8557bb0ba63ec7eacd7e7233a24cd7e - sync-wikivers...
[12:13:28] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[12:17:19] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[12:21:45] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[13:49:34] https://phabricator.wikimedia.org/tag/readers-web-backlog/https://phabricator.wikimedia.org/project/board/67/
[13:50:01] Sorry, bad message.
[13:50:37] zeljkof / zeljko-evil-twin: is it too late for a patch to make the train?
[13:50:57] enick_847: yes
[13:51:17] I've cut the branch, but for urgent things, you can always create a backport
[13:51:37] Continuous-Integration-Config, Operations, Traffic: CI jobs for authdns linting need to run on Stretch - https://phabricator.wikimedia.org/T205439 (BBlack) p:Triage>Normal
[13:55:21] Ok, thanks zeljkof.
[14:25:52] Release-Engineering-Team (Watching / External), Scap, Operations, Datacenter-Switchover-2018, Wikimedia-Incident: Scap is checking canary servers in dormant instead of active-dc - https://phabricator.wikimedia.org/T204907 (greg) (I assume SRE will do the adding to conftool and the editing/ext...
[15:04:42] (CR) Jforrester: [C: 2] "…" [tools/release] - https://gerrit.wikimedia.org/r/461733 (https://phabricator.wikimedia.org/T106067) (owner: Reedy)
[15:05:46] (Merged) jenkins-bot: Stop branching DisableAccount [tools/release] - https://gerrit.wikimedia.org/r/461733 (https://phabricator.wikimedia.org/T106067) (owner: Reedy)
[15:08:20] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:09:57] Someone needs to set jenkins-bot's gerrit account "status" to an emoji. Maybe "🤖" or "🌋"?
[15:10:05] Release-Engineering-Team (Kanban), Release, Train Deployments, User-zeljkofilipin: 1.32.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T191069 (zeljkofilipin) 1.32.0-wmf.23 at group 0: - https://tools.wmflabs.org/versions/ - https://www.mediawiki.org/w/index.php?diff=2894422&old...
[15:15:36] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, and 2 others: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Jdforrester-WMF)
[15:16:46] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, and 2 others: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Jdforrester-WMF)
[15:19:11] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, and 2 others: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Jdforrester-WMF)
[15:20:46] (PS1) Jforrester: Stop branching EducationProgram [tools/release] - https://gerrit.wikimedia.org/r/462735 (https://phabricator.wikimedia.org/T125618)
[15:39:03] PROBLEM - Puppet errors on deployment-eventlog05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[15:48:20] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:07:28] Continuous-Integration-Config: Capture failure logs from php-compile jobs (at least for luasandbox) - https://phabricator.wikimedia.org/T205453 (Anomie)
[16:45:23] !log launching new integration-slave-docker-1039/1040 bigram instances
[16:45:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:46:27] !log new instance creation delayed due to quota
[16:46:31] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[16:47:24] !log increasing executors to 7 for jenkins nodes integration-slave-docker-1033/1034
[16:47:25] :(
[16:47:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:05:05] !log taking integration-slave-docker-1030/1031 offline for replacement
[17:05:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:06:25] Hopefully I didn't break Zuul, I submitted a patch when gerrit "cherry-pick" didn't result in a merge job.
[17:09:29] !log deleting integration-slave-docker-1030/1031 instances (T205362)
[17:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:09:35] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[17:11:38] PROBLEM - Host integration-slave-docker-1030 is DOWN: CRITICAL - Host Unreachable (10.68.21.92)
[17:11:40] PROBLEM - Host integration-slave-docker-1031 is DOWN: CRITICAL - Host Unreachable (10.68.20.213)
[17:12:22] !log taking integration-slave-docker-1007/1008 offline for replacement (T205362)
[17:12:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:13:11] !log launching new integration-slave-docker-1039 bigram instance
[17:13:14] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:23:04] Project beta-scap-eqiad build #223032: FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223032/
[17:23:14] Project beta-update-databases-eqiad build #28584: FAILURE in 3 min 13 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/28584/
[17:23:54] 17:23:14 Exception: ('command: ', "echo 'wikidatawiki'; /usr/local/bin/mwscript update.php --wiki=wikidatawiki --quick", 'output: ', 'wikidatawiki\ngroups: cannot find name for group ID 50120\ngroups: cannot find name for group ID 51904\n\nWe trust you have received the usual lecture from the local System\nAdministrator. It usually boils down to these three things:\n\n #1) Respect the privacy of others.\n #2) Think before you
[17:23:54] type.\n #3) With great power comes great responsibility.\n\nsudo: no tty present and no askpass program specified\n')
[17:24:09] !log deleting instances integration-slave-docker-1007/1008 (T205362)
[17:24:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:24:16] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[17:24:19] 17:23:04 17:23:04 Last output:
[17:24:19] 17:23:04 sudo: a password is required
[17:24:19] 17:23:04 17:23:04 Last output:
[17:24:19] 17:23:04 sudo: a password is required
[17:24:31] thcipriani: marxarelli ^ Has the keyholder become disarmed?
[17:25:29] !log launching integration-slave-docker-1040 bigram instance (T205362)
[17:25:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:25:51] thcipriani: ^ do you have a sec to look? i'm juggling jenkins nodes
[17:25:55] sure
[17:26:00] PROBLEM - Host integration-slave-docker-1008 is DOWN: CRITICAL - Host Unreachable (10.68.17.85)
[17:26:24] hrm, that's not a symptom of a keyholder problem, but I have seen this a few times in the past couple of days
[17:26:39] PROBLEM - Host integration-slave-docker-1007 is DOWN: CRITICAL - Host Unreachable (10.68.19.105)
[17:26:43] seems like ldap is flapping maybe? /me files a task
[17:30:44] !log the puppet parameter for docker_lvm_volume specified in horizon was not applied correctly on the first puppet run for some reason. tearing down integration-slave-docker-1039...
[17:30:48] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:37:45] Beta-Cluster-Infrastructure, Scap: Scap in beta fails occasionally due to permissions - https://phabricator.wikimedia.org/T205463 (thcipriani)
[17:37:55] Yippee, build fixed!
[17:37:56] Project beta-scap-eqiad build #223033: FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223033/
[17:38:14] !log launching integration-slave-docker-1041 bigram instance (T205362)
[17:38:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:38:20] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[17:42:27] !log configuring new jenkins node integration-slave-docker-1040 with 7 executors (T205362)
[17:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[17:50:51] Continuous-Integration-Infrastructure, Wikimedia-production-error (Shared Build Failure): Jenkins jobs for MediaWiki failing with 'npm: shasum check failed' - https://phabricator.wikimedia.org/T203506 (Krinkle) Open>stalled This is blocked on CI upgrading to npm 5 or later. The cache instability...
[18:00:16] !log configuring new integration-slave-docker-1041 jenkins node with 7 executors (T205362)
[18:00:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:00:24] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[18:13:07] Release-Engineering-Team (Kanban), Release, Train Deployments, User-zeljkofilipin: 1.32.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T191069 (Krinkle)
[18:16:16] !log reconfiguring bigram jenkins nodes to use 6 executors. 7 were configured by mistake (T205362)
[18:16:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[18:16:22] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[18:18:41] Beta-Cluster-Infrastructure, Scap: Scap in beta fails occasionally due to permissions - https://phabricator.wikimedia.org/T205463 (Krenair) The other day I briefly observed that `sudo` had stopped functioning for me (on a random deployment-prep box, may have been deploy01) too. It resumed working soon af...
[18:20:48] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Niedzielski) @Ottomata, @fgiunchedi hello! We'...
[18:22:18] Yippee, build fixed!
[18:22:19] Project beta-update-databases-eqiad build #28585: FIXED in 2 min 18 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/28585/
[18:29:41] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Ottomata) They should be. Something is not wor...
[18:38:53] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Ottomata) BTW, I updated https://wikitech.wikim...
[18:46:07] thcipriani: Would it be OK for me to deploy a maintenance-script only cherry-pick during the (unused) American train window? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/462788/
[18:48:00] RoanKattouw: yep, sure. RelEng should probably need to revisit how SWATs are arranged during EU train-weeks. Thanks for checking :)
[18:49:42] s/need to//
[18:53:42] I'm wary of variance
[18:54:04] not only because of the pain it puts on the current copy/paste/edit wiki workflow, but also because people's memory
[18:54:18] (I guess copy/edit/paste, but whatever)
[18:54:42] but if we can A) get rid of that stupid workflow then I'm probably OK with it :)
[18:54:47] (there is no B)
[18:54:55] (yet)
[18:55:29] :)
[18:56:02] that's a fair point, I am regularly surprised by SWAT since we moved the Wednesday window
[18:57:10] SURPRISE SWAT
[19:00:11] greg-g: post is live
[19:02:03] Reedy: I want no surprise swat near my physical location :)
[19:02:05] Krinkle: yay
[19:27:51] Continuous-Integration-Config, ContentTranslation, WorkType-Maintenance: ContentTranslation is not running PHPUnit structure tests - https://phabricator.wikimedia.org/T109670 (Petar.petkovic)
[19:31:43] greg-g when is the deployment stop this year for the fundraiser?
[19:32:53] Volker_E: https://wikitech.wikimedia.org/wiki/Deployments#Upcoming
[19:34:00] greg-g: wait, there are deployments in December? Or what am I misreading?
[19:34:25] been on that page
[19:34:33] yes, as it was the last two years :P
[19:34:51] https://wikitech.wikimedia.org/wiki/Deployments/Archive/2017/12
[19:35:07] https://wikitech.wikimedia.org/wiki/Deployments/Archive/2016/12
[19:35:33] https://wikitech.wikimedia.org/wiki/Deployments/Archive/2015/12
[19:35:53] https://wikitech.wikimedia.org/wiki/Deployments/Archive/2014/12
[19:36:01] 4 years :)
[20:13:34] PROBLEM - Puppet errors on deployment-prometheus01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[20:14:06] me
[20:15:28] PROBLEM - Puppet errors on deployment-maps03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:24:48] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Krenair) (for context: `modules/prometheus/mani...
[20:28:32] RECOVERY - Puppet errors on deployment-prometheus01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:36:19] Volker_E: do we even need to stop deploys for fundraising anymore?, they are on their own cluster with their own wiki handling things now
[20:38:39] CN
[20:45:28] RECOVERY - Puppet errors on deployment-maps03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:48:18] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Krenair) With these puppet changes: ```diff --g...
[21:04:08] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: exported puppet resources are not queryable: cannot create grafana graphs of EventLogging running in beta cluster - https://phabricator.wikimedia.org/T204088 (Ottomata) @fgiunchedi Alex's change ^ should do...
[21:14:16] Beta-Cluster-Infrastructure, Analytics-Kanban, Operations, Patch-For-Review, and 2 others: Prometheus resources in deployment-prep to create grafana graphs of EventLogging - https://phabricator.wikimedia.org/T204088 (Krenair)
[22:15:17] !log taking remaining m1.medium m4executor jenkins nodes offline (T205362)
[22:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:15:24] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[22:33:16] !log deleting remaining m1.medium instances used as m4executors (T205362)
[22:33:21] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:33:22] T205362: Migrate m4executor CI nodes to bigram instances - https://phabricator.wikimedia.org/T205362
[22:36:55] PROBLEM - Host integration-slave-docker-1024 is DOWN: CRITICAL - Host Unreachable (10.68.17.114)
[22:36:59] PROBLEM - Host integration-slave-docker-1012 is DOWN: CRITICAL - Host Unreachable (10.68.16.21)
[22:37:09] PROBLEM - Host integration-slave-docker-1013 is DOWN: CRITICAL - Host Unreachable (10.68.19.155)
[22:38:33] PROBLEM - Host integration-slave-docker-1023 is DOWN: CRITICAL - Host Unreachable (10.68.16.200)
[22:38:47] PROBLEM - Host integration-slave-docker-1015 is DOWN: CRITICAL - Host Unreachable (10.68.19.76)
[22:38:55] PROBLEM - Host integration-slave-docker-1014 is DOWN: CRITICAL - Host Unreachable (10.68.19.123)
[22:39:03] PROBLEM - Host integration-slave-docker-1010 is DOWN: CRITICAL - Host Unreachable (10.68.18.61)
[22:39:11] !log launching new integration-slave-docker-1042 bigram instance
[22:39:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:39:45] PROBLEM - Host integration-slave-docker-1009 is DOWN: CRITICAL - Host Unreachable (10.68.21.208)
[22:39:49] PROBLEM - Host integration-slave-docker-1022 is DOWN: CRITICAL - Host Unreachable (10.68.19.33)
[22:39:55] PROBLEM - Host integration-slave-docker-1011 is DOWN: CRITICAL - Host Unreachable (10.68.23.221)
[23:01:20] !log replaced integration-slave-docker-1042 with new integration-slave-docker-1043 instance
[23:01:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:01:32] !log configured new jenkins node integration-slave-docker-1043 with 6 executors
[23:01:36] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[23:05:03] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Evaluate different strategy for Docker CI instances - https://phabricator.wikimedia.org/T202160 (dduvall)
[23:05:20] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Release Pipeline: Evaluate different strategy for Docker CI instances - https://phabricator.wikimedia.org/T202160 (dduvall) Open>Resolved
[23:11:20] Release-Engineering-Team (Kanban), Quibble, Patch-For-Review: Error: 1071 Specified key was too long; max key length is 767 bytes - https://phabricator.wikimedia.org/T193222 (Reedy)
[23:11:33] Release-Engineering-Team (Kanban), Education-Program-Dashboard, MediaWiki-extensions-EducationProgram, Epic, and 2 others: Deprecate and remove the EducationProgram extension from Wikimedia servers after June 30, 2018 - https://phabricator.wikimedia.org/T125618 (Reedy)
[23:50:29] Release-Engineering-Team (Kanban), Release, Train Deployments, User-zeljkofilipin: 1.32.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T191069 (Ryasmeen)