[08:25:02] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T191070 (Addshore)
[09:25:10] Continuous-Integration-Infrastructure: Add pre-commit hook that does basic checks like php -l - https://phabricator.wikimedia.org/T201778 (Simetrical) So pre-push doesn't run for git review. There's a pre-review hook, but it doesn't receive information about which commits are being pushed for review, as far...
[10:33:41] Scap (Scap3-MediaWiki-MVP), scap2, Wikimedia-Incident: Implement MediaWiki pre-promote checks - https://phabricator.wikimedia.org/T121597 (mmodell)
[10:33:48] Release-Engineering-Team (Kanban), Scap, Operations, Packaging, and 2 others: Update Debian Package for Scap to 3.8.7-1 - https://phabricator.wikimedia.org/T204383 (mmodell)
[10:44:14] Scap (Scap3-MediaWiki-MVP), scap2, Wikimedia-Incident: Implement MediaWiki pre-promote checks - https://phabricator.wikimedia.org/T121597 (fgiunchedi)
[10:44:18] Release-Engineering-Team (Kanban), Scap, Operations, Patch-For-Review: mwscript rebuildLocalisationCache.php takes 40 minutes on HHVM (rather than ~5 on PHP 5) - https://phabricator.wikimedia.org/T191921 (fgiunchedi)
[10:44:21] Release-Engineering-Team (Kanban), Scap, Operations, Packaging, and 2 others: Update Debian Package for Scap to 3.8.7-1 - https://phabricator.wikimedia.org/T204383 (fgiunchedi) Open>Resolved a:fgiunchedi All done! 3.8.7-1 is live
[10:52:51] Scap (Scap3-MediaWiki-MVP), scap2, Wikimedia-Incident: Implement MediaWiki pre-promote checks - https://phabricator.wikimedia.org/T121597 (mmodell) This is now deployed to production with scap 3.8.7-1!
scap-book
[11:22:46] Project beta-scap-eqiad build #223865: FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223865/
[11:22:57] Project beta-update-databases-eqiad build #28794: FAILURE in 2 min 57 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/28794/
[11:23:36] Project beta-scap-eqiad build #223866: STILL FAILING in 1.4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223866/
[11:24:43] there goes the sudo thing again
[11:24:47] I wonder if someone has put in a task for that
[11:39:01] Yippee, build fixed!
[11:39:01] Project beta-scap-eqiad build #223867: FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223867/
[11:56:45] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review, Services (blocked): Beta Cluster: Parsoid config request failures from the MediaWiki API - https://phabricator.wikimedia.org/T206003 (mobrovac) Resolved>Open The config still doesn't seem to be working for the edge case of `en.wp.org...
[12:22:12] Yippee, build fixed!
[12:22:12] Project beta-update-databases-eqiad build #28795: FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/28795/
[12:30:59] Continuous-Integration-Infrastructure: runtime/cgo: pthread_create failed: Resource temporarily unavailable - https://phabricator.wikimedia.org/T206215 (Nikerabbit)
[12:55:34] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review, Services (blocked): Beta Cluster: Parsoid config request failures from the MediaWiki API - https://phabricator.wikimedia.org/T206003 (Arlolra) @mobrovac The above patch is deployed. Please confirm it meets your expectations
[13:02:17] Scap, Operations, Product-Analytics, SRE-Access-Requests, and 2 others: Add Mathew.onipe(onimisionipe) to deployment group - https://phabricator.wikimedia.org/T205981 (mobrovac) >>! In T205981#4634221, @Gehel wrote: > I can confirm that @Mathew.onipe needs to be able to deploy wikidata query serv...
[13:15:00] Beta-Cluster-Infrastructure, Parsoid, Patch-For-Review, Services (blocked): Beta Cluster: Parsoid config request failures from the MediaWiki API - https://phabricator.wikimedia.org/T206003 (mobrovac) Open>Resolved Yup, that did the trick! Thank you, @Arlolra !
[14:03:41] Beta-Cluster-Infrastructure, Parsoid, Services (done): Beta Cluster: Parsoid config request failures from the MediaWiki API - https://phabricator.wikimedia.org/T206003 (mobrovac)
[15:22:44] Continuous-Integration-Config, Quibble, Regression: MediaWiki PHPUnit tests no longer have "Test report" in jenkins with quibble - https://phabricator.wikimedia.org/T206227 (Legoktm) p:Triage>High
[15:30:00] Continuous-Integration-Config, phpunit-patch-coverage: phpunit-patch-coverage HTML reports are not being saved by jenkins - https://phabricator.wikimedia.org/T206230 (Legoktm) p:Triage>High
[15:38:27] PROBLEM - SSH on integration-slave-docker-1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:43:15] RECOVERY - SSH on integration-slave-docker-1021 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0)
[15:48:56] Continuous-Integration-Infrastructure, Release-Engineering-Team, Patch-For-Review: Database busted for CiviCRM tests? - https://phabricator.wikimedia.org/T205950 (Ejegg) ...and the index data are mostly on tables that get manipulated during the tests. We just push a lot of data around to get realisti...
[15:54:25] PROBLEM - SSH on integration-slave-docker-1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:59:35] fyi - phabricator is going read-only for maintenance soon
[16:03:46] ohnoes, wikitech is down!
[16:03:57] https://wikitech.wikimedia.org/
[16:09:15] RECOVERY - SSH on integration-slave-docker-1021 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0)
[16:09:26] maintenance ongoing
[16:13:08] K wikitech is back, it was a maintenance thing there
[16:13:52] what beta cluster box should I ssh into to run mwrelp?
[16:31:40] AndyRussG: not sure
[16:34:41] twentyafterfour: ah ok thx.. I think I got it though :)
[16:35:15] Release-Engineering-Team (Kanban), User-zeljkofilipin: Find top 15 target projects that could use Selenium tests to prevent incidents - https://phabricator.wikimedia.org/T199133 (zeljkofilipin)
[17:03:41] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 3542 bytes in 6.184 second response time
[17:04:17] PROBLEM - Host deployment-snapshot01 is DOWN: CRITICAL - Host Unreachable (10.68.19.94)
[17:06:08] AndyRussG, mwrelp?
[17:06:22] Krenair: it's a mw shell
[17:06:42] Works fine on deployment-mediawiki-07
[17:07:00] mwrepl?
[17:07:53] RECOVERY - Host deployment-snapshot01 is UP: PING OK - Packet loss = 0%, RTA = 2.44 ms
[17:08:27] Krenair: ah yesss
[17:08:39] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47866 bytes in 0.847 second response time
[17:11:30] Project beta-scap-eqiad build #223889: FAILURE in 15 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223889/
[17:26:33] Yippee, build fixed!
[17:26:33] Project beta-scap-eqiad build #223890: FIXED in 14 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223890/
[17:35:56] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[18:02:14] (PS1) simetrical: Recognize MediaWikiTestCaseBase as test class [tools/codesniffer] - https://gerrit.wikimedia.org/r/464615
[18:09:25] (CR) jerkins-bot: [V: -1] Recognize MediaWikiTestCaseBase as test class [tools/codesniffer] - https://gerrit.wikimedia.org/r/464615 (owner: simetrical)
[18:16:04] PROBLEM - DPKG on contint2001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:16:23] PROBLEM - DPKG on contint1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[18:31:54] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[zuul]
[18:38:43] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[zuul]
[19:07:11] Gerrit, Release-Engineering-Team: Upgrade gerrit to 2.15.4 - https://phabricator.wikimedia.org/T205784 (Paladox) test
[19:11:24] (CR) Legoktm: [C: -1] "A regex is the best codesniffer can do. If we used phan then we could look at inheritance trees to see whether it's a test class or not bu" (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/464615 (owner: simetrical)
[19:15:24] PROBLEM - SSH on integration-slave-docker-1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:25:08] Project mediawiki-core-doxygen-docker build #1636: FAILURE in 21 min: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/1636/
[19:30:15] RECOVERY - SSH on integration-slave-docker-1021 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0)
[19:36:25] PROBLEM - SSH on integration-slave-docker-1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:39:14] Upstream have the release that GWTUI will be removed in now
[19:39:15] https://groups.google.com/forum/#!topic/repo-discuss/yON4C-hIk-o
[19:39:26] 3.0, GWTUI will be dropped and db support too
[19:44:27] maintenance-disconnect-full-disks build 8437 integration-slave-jessie-1003 (/: 96%): OFFLINE due to disk space
[19:45:08] there will be no 2.17 release as far as im aware. So 2.16 will be the last 2.x release.
[19:45:38] PROBLEM - Free space - all mounts on integration-slave-jessie-1003 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1003.diskspace.root.byte_percentfree (<33.33%)
[19:46:17] RECOVERY - SSH on integration-slave-docker-1021 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u7 (protocol 2.0)
[19:55:07] maintenance-disconnect-full-disks build 8440 integration-slave-jessie-1003: OFFLINE due to disk space
[19:59:45] (PS1) Paladox: Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643
[20:02:28] (PS2) Paladox: Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643
[20:02:57] paladox: do those change the colors?
[20:03:22] legoktm not really, they just add some emojies. The colour will come in 2.16 i think.
[20:03:29] like https://gerrit-review.googlesource.com/c/gerrit/+/198450
[20:04:25] Release-Engineering-Team (Kanban), Release, Train Deployments: 1.32.0-wmf.24 deployment blockers - https://phabricator.wikimedia.org/T191070 (dduvall) Open>Resolved
[20:12:35] Yippee, build fixed!
[20:12:35] Project mediawiki-core-doxygen-docker build #1637: FIXED in 8 min 27 sec: https://integration.wikimedia.org/ci/job/mediawiki-core-doxygen-docker/1637/
[20:20:07] maintenance-disconnect-full-disks build 8445 integration-slave-jessie-1003: OFFLINE due to disk space
[20:24:04] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Quibble: Quibble docker instance running on CI instance for 6 hours - https://phabricator.wikimedia.org/T198517 (dduvall) There are currently a number of long-running Docker processes in CI: {P7632}
[20:31:36] !log terminating long-running (> 3 hours) CI docker containers (T198517)
[20:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[20:31:42] T198517: Quibble docker instance running on CI instance for 6 hours - https://phabricator.wikimedia.org/T198517
[20:44:16] (CR) jerkins-bot: [V: -1] Use ❌ for failure and ✅ for success [integration/config] - https://gerrit.wikimedia.org/r/464643 (owner: Paladox)
[20:45:12] maintenance-disconnect-full-disks build 8450 integration-slave-jessie-1003: OFFLINE due to disk space
[21:02:45] thcipriani: ^ looks like various cache buildup in the jenkins home directory https://phabricator.wikimedia.org/P7633
[21:03:16] any idea if it's safe to just all those directories?
[21:03:25] just *delete* all
[21:04:06] hrm
[21:04:16] * thcipriani looks
[21:04:55] no idea what the jenkins agent keeps in .jenkins but most of the other large ones look ok to me
[21:04:59] FYI if you want to stop the alerts for disk-space while you work on stuff you can change the "offline reason" on jenkins and it'll stop showing up in alerts
[21:05:12] cool
[21:06:04] doesn't look like there's a settings.xml in .m2 so I think that means it's fine to delete
[21:06:50] .gradle should be fine to delete as well
[21:07:06] yeah, looks super old
[21:07:50] .npm, likewise, looks super old
[21:08:02] well
[21:08:20] some of it is from last week, I guess :\
[21:08:47] they're all just caches i believe
[21:09:01] so worst case, some jobs take a little longer than usual
[21:09:19] but at least they'll have somewhere to take longer than usual :)
[21:10:09] maintenance-disconnect-full-disks build 8455 integration-slave-jessie-1003: OFFLINE due to disk space
[21:10:23] !log deleting cache directories in /mnt/home/jenkins-deploy on integration-slave-jessie-1003 to free up disk space
[21:10:26] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:12:06] !log deleted /mnt/home/jenkins-deploy/{.m2,.gradle} on integration-slave-jessie-1003
[21:12:08] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:12:20] thcipriani: ^ should free up plenty of space for now
[21:12:24] nice
[21:13:19] !log bringing integration-slave-jessie-1003 back online
[21:13:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:16:37] (PS1) Thcipriani: ci-src-setup-simple: Add a layer of depth [integration/config] - https://gerrit.wikimedia.org/r/464719
[21:16:54] marxarelli: if you've got some time for review, would you check out ^
[21:18:01] thcipriani: sure thing!
[21:18:09] thanks :)
[21:18:16] but ftr, i'm hurt there's no shoutout :)
[21:18:26] haha, j/k
[21:20:38] RECOVERY - Free space - all mounts on integration-slave-jessie-1003 is OK: OK: All targets OK
[21:20:57] (CR) Dduvall: [C: 1] "Looks good! I left a really dumb nit. :)" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/464719 (owner: Thcipriani)
[21:22:03] heh, I'll add the Signed-off-by line :) Also, it's a good nit. I just did: dch -c and didn't modify anything.
[21:23:53] (PS2) Thcipriani: ci-src-setup-simple: Add a layer of depth [integration/config] - https://gerrit.wikimedia.org/r/464719
[21:24:14] Project beta-scap-eqiad build #223905: FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223905/
[21:28:33] (CR) Dduvall: [C: 2] ci-src-setup-simple: Add a layer of depth [integration/config] - https://gerrit.wikimedia.org/r/464719 (owner: Thcipriani)
[21:29:03] oh boy, now I've got to remember how to deploy those :P
[21:29:30] heheh, yeah
[21:29:49] isn't there a periodic job that runs on contint1001 or something similar?
[21:30:17] (Merged) jenkins-bot: ci-src-setup-simple: Add a layer of depth [integration/config] - https://gerrit.wikimedia.org/r/464719 (owner: Thcipriani)
[21:30:42] unsure about that. Last I remember there was a fab task
[21:30:59] fab deploy_docker
[21:31:54] but that information may be woefully out of date
[21:31:57] * thcipriani searches docs
[21:38:51] Yippee, build fixed!
[21:38:51] Project beta-scap-eqiad build #223906: FIXED in 13 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/223906/
[21:42:20] hrm, don't see any updates to the docs, don't see a cron or periodic job in puppet or integration/config (although that's a good idea), looks like running deploy_docker is still the right thing
[21:42:22] * thcipriani does
[21:45:36] !log Updating docker-pkg files on contint1001 for https://gerrit.wikimedia.org/r/#/c/integration/config/+/464719/
[21:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[21:46:33] * marxarelli nods
[21:52:24] (PS1) Thcipriani: ci-src-setup-simple docker image version to 0.2.0 [integration/config] - https://gerrit.wikimedia.org/r/464725
[22:02:40] only 103 jobs
[22:04:58] (CR) Thcipriani: [C: 2] ci-src-setup-simple docker image version to 0.2.0 [integration/config] - https://gerrit.wikimedia.org/r/464725 (owner: Thcipriani)
[22:08:45] (Merged) jenkins-bot: ci-src-setup-simple docker image version to 0.2.0 [integration/config] - https://gerrit.wikimedia.org/r/464725 (owner: Thcipriani)
[22:14:13] thcipriani: yes, we're still using a fabfile
[22:14:51] legoktm: cool, thanks for confirmation :)
[22:19:10] !log updating 103 jenkins jobs that use ci-src-setup-simple
[22:19:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:23:01] !log updated jenkins jobs: https://phabricator.wikimedia.org/P7634
[22:23:03] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:27:32] Gerrit, Developer-Advocacy, Documentation, Google-Code-in-2018: Gerrit's test instance gerrit.git.wmflabs.org is not quite visible in the docs; no clear instructions how to use it - https://phabricator.wikimedia.org/T193788 (srishakatux)
[23:04:51] Continuous-Integration-Infrastructure: runtime/cgo: pthread_create failed: Resource temporarily unavailable - https://phabricator.wikimedia.org/T206215 (thcipriani) The `pthread_create` failure sounds like we ran out of memory on that machine. @dduvall killed a whole bunch of left-over containers today that...