[00:16:12] MediaWiki-Releasing: Ready-to-use Docker package for MediaWiki - https://phabricator.wikimedia.org/T92826#1892064 (GWicke) After some more testing on Jessie labs VMs I have now created a simple shell script to start the vms. This drops the dependency on docker-compose, making it even easier to get started....
[00:55:23] 00:47:33 Building remotely on integration-slave-trusty-1016 (phpflavor-hhvm contintLabsSlave UbuntuTrusty) in workspace /mnt/jenkins-workspace/workspace/mediawiki-core-qunit
[00:55:28] 00:47:39 chmod: changing permissions of ‘/mnt/home/jenkins-deploy/tmpfs/jenkins-2’: Operation not permitted
[00:56:07] 00:42:47 Building remotely on integration-slave-trusty-1012 (phpflavor-hhvm contintLabsSlave UbuntuTrusty) in workspace /mnt/jenkins-workspace/workspace/mediawiki-extensions-hhvm
[00:56:17] 00:43:09 chmod: changing permissions of ‘/mnt/home/jenkins-deploy/tmpfs/jenkins-0’: Operation not permitted
[00:57:57] yuck. legoktm ^^ another tmpfs breakage
[00:58:10] :/
[00:58:16] I've just been marking slaves offline when that happens
[00:58:33] there's allegedly a salt command that also works
[00:58:47] marxarelli / hashar have figured out the root cause, but I don't think it's been patched yet
[00:58:51] there are many alleged salt commands that work
[00:59:37] https://phabricator.wikimedia.org/T120824 that one
[00:59:38] Reedy: just mark trusty-1016 offline for now and see if you can get a build on a non-broken node I guess
[00:59:50] and 1012 too?
[01:01:09] I suppose so
[01:01:16] 1016 offline
[01:02:26] !log releng integration-slave-trusty-101[26] marked as offline due to chmod related errors
[01:02:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:05:10] I'll try the magic salt command
[01:06:03] thanks
[01:07:03] yay, more green ticks
[01:09:07] !log Cleared tmpfs on integration-slave-trusty-101[26] with variant of salt command from T120824 and marked as online again
[01:09:13] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:10:49] 01:09:27 chmod: changing permissions of ‘/mnt/home/jenkins-deploy/tmpfs/jenkins-1’: Operation not permitted
[01:11:00] bd808: mind doing 17 too please? Should I offline it?
[01:11:59] Reedy: nuke applied!
[01:12:00] though, all these dependencies might be getting in a bit of a mess too for some of the other errors..
[01:14:01] Plenty of CR needed now :D
[01:15:31] It seems I'm breaking everything atm
[01:16:28] Same on 14 now
[01:16:45] 12 seems to be doing it again
[01:16:47] Ughhh
[01:18:17] mediawiki-phpunit-hhvm FAILURE in 7s
[01:18:17] mediawiki-core-qunit FAILURE in 5s
[01:21:00] The failures after recheck are almost random traffic lights
[01:21:01] it's amusing
[01:21:37] Reedy: do you have sudo on the integration hosts?
[01:21:45] * bd808 could hook you up
[01:21:56] C+2 V+2 is much easier ;)
[01:22:18] I have access to gallium... So you'd presume I'd have access to integration hosts
[01:23:32] !log Added Reedy as a projectadmin in the integration project
[01:23:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[01:26:20] Reedy: the command I've been running is -- sudo salt --show-timeout '*slave-trusty-1017*' cmd.run 'rm -fR /mnt/home/jenkins-deploy/tmpfs/jenkins-?/*'
[01:26:30] with the 1017 changed obviously
[01:26:53] ,,,,
[01:26:55] ...
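The salt invocation quoted at 01:26:20 is meant to be run once per broken slave ("with the 1017 changed obviously") rather than with the '*slave*' target from the Phabricator task, which, as noted at 01:27:11, would probably clear tmpfs under jobs still running on healthy hosts. A minimal dry-run sketch in bash that only prints the per-host commands for review; the helper name print_salt_cmds is hypothetical, and the host numbers are simply the ones mentioned in this log:

```bash
#!/usr/bin/env bash
# Dry-run sketch: print one salt command per broken slave instead of using
# the '*slave*' target, which could wipe tmpfs under jobs running elsewhere.
set -euo pipefail

# Path and glob copied verbatim from the command quoted at 01:26:20.
readonly TMPFS_GLOB='/mnt/home/jenkins-deploy/tmpfs/jenkins-?/*'

# print_salt_cmds N...  (hypothetical helper)
# Prints the salt invocation for each integration-slave-trusty-N; runs nothing.
print_salt_cmds() {
  local n
  for n in "$@"; do
    printf "sudo salt --show-timeout '*slave-trusty-%s*' cmd.run 'rm -fR %s'\n" \
      "$n" "$TMPFS_GLOB"
  done
}

# Hosts named in this log; review the output before running any of it.
print_salt_cmds 1012 1016 1017
```

Printing instead of executing keeps a human check between picking the hosts and running a recursive delete on them.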
[01:26:57] mmmm
[01:27:11] The command in the phab task is '*slave*' but that would probably nuke things on running jobs
[01:28:28] I'll have a poke at it later
[01:28:41] * bd808 wanders away to eat pizza and drink beer
[01:28:47] Why does composer move everything to the bottom in installed.json when you update something?
[01:29:02] the lock file updates in place, which makes sense
[01:29:23] I think installed.json is a dump of an unsorted php array
[01:30:21] https://github.com/composer/composer/issues/4410
[01:30:27] "I don't think you can or should really rely on this file's ordering"
[01:31:11] When we started using it for SpecialVersion I added a sort
[01:31:55] Beta-Cluster-Infrastructure, Flow, Collaboration-Team-Current: Run Flow External Store migration in dry-run mode on Beta - https://phabricator.wikimedia.org/T119567#1892334 (Catrope) p:Triage>High
[01:34:40] bd808: Plenty of vendor commits to review after you've had some beers <3
[01:50:52] RECOVERY - Host integration-t102459 is UP: PING OK - Packet loss = 0%, RTA = 0.72 ms
[03:13:17] Project beta-scap-eqiad build #82824: FAILURE in 8 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82824/
[03:21:38] Yippee, build fixed!
[03:21:39] Project beta-scap-eqiad build #82825: FIXED in 6 min 24 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82825/
[03:33:14] (PS1) JanZerebecki: Workaround for tmpdir being created with a different user [integration/jenkins] - https://gerrit.wikimedia.org/r/260174 (https://phabricator.wikimedia.org/T120824)
[03:35:48] Reedy, legoktm: I think this would fix it https://gerrit.wikimedia.org/r/#/c/260174/ should I deploy it, and then go offline?
[03:38:08] Beta-Cluster-Infrastructure, Wikimedia-Site-Requests: Check status of beta interwiki json/cdb files - https://phabricator.wikimedia.org/T120427#1892436 (Liuxinyu970226)
[03:39:20] ok not deploying it now.
[04:29:09] Deployment-Systems, MediaWiki-Internationalization, Performance-Team, Performance: Experiment with plain .php files for l10n cache instead of CDB - https://phabricator.wikimedia.org/T99740#1892455 (EBernhardson) Since this is resolved, what was the result of the experiment?
[06:05:50] PROBLEM - Host integration-t102459 is DOWN: CRITICAL - Host Unreachable (10.68.16.67)
[07:20:04] Project beta-update-databases-eqiad build #5161: FAILURE in 3.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/5161/
[08:35:52] Yippee, build fixed!
[08:35:52] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #818: FIXED in 25 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/818/
[09:31:38] jenkins seems to be broken
[09:32:02] https://integration.wikimedia.org/ci/job/mwext-testextension-zend/17249/console
[09:32:02] cssjanus/cssjanus: 1.1.2 installed, 1.1.1 required.
[09:32:02] 09:27:50 Error: your composer.lock file is not up to date. Run "composer update" to install newer dependencies
[09:33:55] Continuous-Integration-Infrastructure, Patch-For-Review: Dozens of jobs failing on integration-slave-trusty-1012 because chmod fails for /tmp/jenkins-2 - https://phabricator.wikimedia.org/T120824#1892556 (hashar)
[09:34:48] Continuous-Integration-Infrastructure, Patch-For-Review: Dozens of jobs failing on integration-slave-trusty-1012 because chmod fails for /tmp/jenkins-2 - https://phabricator.wikimedia.org/T120824#1862091 (hashar) I have adjusted the task description for magic command to delete www-data files: salt -...
[10:23:24] Yippee, build fixed!
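The chmod failures above are consistent with a basic POSIX rule: only a file's owner (or root) may change its mode. The workaround patch title ("tmpdir being created with a different user") and hashar's note about deleting www-data files point the same way. A hypothetical diagnostic sketch, assuming the jobs run as jenkins-deploy (inferred only from the tmpfs path), that lists entries under the tmpfs directory owned by anyone else:

```bash
#!/usr/bin/env bash
# Hypothetical diagnostic: list entries below a directory that are NOT owned
# by the expected build user. A non-root chmod on such an entry fails with
# "Operation not permitted", as in the Jenkins console output in this log.
set -euo pipefail

# list_foreign_entries DIR [OWNER]
# OWNER defaults to "jenkins-deploy" (an assumption taken from the tmpfs path);
# find(1) also accepts a numeric UID here.
list_foreign_entries() {
  local dir="$1" owner="${2:-jenkins-deploy}"
  # -mindepth 1 skips DIR itself; ls -ld shows each offending entry's owner.
  find "$dir" -mindepth 1 ! -user "$owner" -exec ls -ld {} +
}

# Example, run on an affected slave:
#   list_foreign_entries /mnt/home/jenkins-deploy/tmpfs
```

Empty output means every entry is owned by the expected user, so ownership is unlikely to be the cause of a chmod failure there.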
[10:23:24] Project beta-update-databases-eqiad build #5164: FIXED in 3 min 22 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/5164/
[11:17:09] Project beta-scap-eqiad build #82869: FAILURE in 2 min 40 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82869/
[11:25:26] Deployment-Systems, MediaWiki-Internationalization, Performance-Team, Performance: Experiment with plain .php files for l10n cache instead of CDB - https://phabricator.wikimedia.org/T99740#1892621 (Reedy) >>! In T99740#1892455, @EBernhardson wrote: > Since this is resolved, what was the result of t...
[11:25:56] Yippee, build fixed!
[11:25:56] Project beta-scap-eqiad build #82870: FIXED in 7 min 18 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/82870/
[11:27:24] (CR) Reedy: [C: 1] Workaround for tmpdir being created with a different user [integration/jenkins] - https://gerrit.wikimedia.org/r/260174 (https://phabricator.wikimedia.org/T120824) (owner: JanZerebecki)
[13:38:48] (CR) Hashar: [C: -1] "Watchout, bash wildcard does not expand files beginning with a dot. Need to shopt -s dotglob, see inline comment." (1 comment) [integration/jenkins] - https://gerrit.wikimedia.org/r/260174 (https://phabricator.wikimedia.org/T120824) (owner: JanZerebecki)
[18:48:50] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[22:04:09] !log sudo rm -rf /mnt/home/jenkins-deploy/tmpfs/jenkins* on integration-slave-precise-1011
[22:04:11] !log sudo rm -rf /mnt/home/jenkins-deploy/tmpfs/jenkins* on integration-slave-precise-1013
[22:04:17] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:04:22] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:18:18] !log sudo rm -rf /mnt/home/jenkins-deploy/tmpfs/jenkins* on integration-slave-precise-1012
[22:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[22:18:49] !log sudo rm -rf /mnt/home/jenkins-deploy/tmpfs/jenkins* on integration-slave-precise-1014
[22:18:55] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[23:40:56] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:40:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:40:57] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:40:57] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:44:53] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39427 bytes in 0.702 second response time
[23:45:22] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 39408 bytes in 0.648 second response time
[23:45:32] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39758 bytes in 0.748 second response time
[23:45:33] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39420 bytes in 0.696 second response time
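Hashar's -1 at 13:38:48 rests on a bash globbing rule: `*` does not match names that start with `.` unless the `dotglob` shell option is set, so `rm -rf dir/*` silently leaves dotfiles behind. The same caveat would seem to apply to the `jenkins-?/*` glob in the salt command, assuming a shell expands it. A minimal sketch demonstrating the behaviour, plus a clearing helper; `clear_dir_contents` is a hypothetical name and is not taken from the patch, whose contents are not shown in this log:

```bash
#!/usr/bin/env bash
# In bash, "*" skips dotfiles unless dotglob is on, so "rm -rf dir/*"
# leaves hidden files behind.
set -euo pipefail

# clear_dir_contents DIR  (hypothetical helper)
# Empties DIR, including dotfiles, but keeps DIR itself.
clear_dir_contents() {
  local dir="$1"
  (
    # Subshell so the shopt change does not leak into the caller.
    # dotglob: "*" also matches .foo (never "." or "..").
    # nullglob: an empty DIR expands to nothing instead of a literal "*".
    shopt -s dotglob nullglob
    rm -rf -- "$dir"/*
  )
}

demo="$(mktemp -d)"
touch "$demo/visible" "$demo/.hidden"

# Default behaviour: the glob only expands to "visible".
( shopt -u dotglob; rm -rf -- "$demo"/* )
ls -A "$demo"            # prints: .hidden

clear_dir_contents "$demo"
ls -A "$demo"            # prints nothing
rmdir "$demo"
```

The 22:04 `!log` commands take a different route: `rm -rf .../jenkins*` removes the `jenkins-N` directories themselves, so any dotfiles inside them go too and the glob rule never comes into play.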