[00:37:34] (PS1) Legoktm: Set up composer test jobs for AhoCorasick repo [integration/config] - https://gerrit.wikimedia.org/r/216600
[00:38:00] (PS2) Legoktm: Set up composer test jobs for AhoCorasick repo [integration/config] - https://gerrit.wikimedia.org/r/216600
[00:38:07] (CR) Legoktm: [C: 2] Set up composer test jobs for AhoCorasick repo [integration/config] - https://gerrit.wikimedia.org/r/216600 (owner: Legoktm)
[00:39:46] (Merged) jenkins-bot: Set up composer test jobs for AhoCorasick repo [integration/config] - https://gerrit.wikimedia.org/r/216600 (owner: Legoktm)
[00:40:19] !log deploying https://gerrit.wikimedia.org/r/216600
[00:40:22] Logged the message, Master
[02:07:13] PROBLEM - Puppet staleness on deployment-elastic07 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[02:35:43] Yippee, build fixed!
[02:35:44] Project browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #596: FIXED in 2 min 42 sec: https://integration.wikimedia.org/ci/job/browsertests-WikiLove-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/596/
[03:23:29] Yippee, build fixed!
[03:23:29] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #712: FIXED in 41 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/712/
[04:07:45] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<40.00%)
[04:24:51] (PS1) Legoktm: Simplify PHPUnit boostrap, require usage of composer for running tests [tools/codesniffer] - https://gerrit.wikimedia.org/r/216610
[04:25:03] Project beta-scap-eqiad build #56282: FAILURE in 1 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/56282/
[04:28:07] (PS2) Legoktm: Add job to run MW-CS against mediawiki/core on patchset proposals [integration/config] - https://gerrit.wikimedia.org/r/216520 (https://phabricator.wikimedia.org/T100966)
[04:28:39] (CR) Legoktm: [C: 2] Add job to run MW-CS against mediawiki/core on patchset proposals [integration/config] - https://gerrit.wikimedia.org/r/216520 (https://phabricator.wikimedia.org/T100966) (owner: Legoktm)
[04:30:30] (Merged) jenkins-bot: Add job to run MW-CS against mediawiki/core on patchset proposals [integration/config] - https://gerrit.wikimedia.org/r/216520 (https://phabricator.wikimedia.org/T100966) (owner: Legoktm)
[04:30:55] !log deploying https://gerrit.wikimedia.org/r/216520
[04:30:58] Logged the message, Master
[04:31:16] Yippee, build fixed!
[04:31:16] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce build #463: FIXED in 39 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-9-sauce/463/
[04:31:25] (CR) Legoktm: "recheck" [tools/codesniffer] - https://gerrit.wikimedia.org/r/216517 (https://phabricator.wikimedia.org/T101623) (owner: Legoktm)
[04:34:59] Yippee, build fixed!
[04:34:59] Project beta-scap-eqiad build #56283: FIXED in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/56283/
[04:40:00] Yippee, build fixed!
[04:40:00] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce build #468: FIXED in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8.1-internet_explorer-11-sauce/468/
[05:11:56] Beta-Cluster, ContentTranslation-cxserver, Patch-For-Review: CXServer on beta is writing Logs to NFS - https://phabricator.wikimedia.org/T101240#1344739 (KartikMistry) a:KartikMistry
[05:34:51] Yippee, build fixed!
[05:34:52] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #443: FIXED in 32 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/443/
[05:41:11] Yippee, build fixed!
[05:41:11] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce build #94: FIXED in 25 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-chrome-sauce/94/
[06:35:15] (CR) Polybuildr: "GlobalFunctions.php has a wf prefix error. Of all the files it could be in..." [tools/codesniffer] - https://gerrit.wikimedia.org/r/216517 (https://phabricator.wikimedia.org/T101623) (owner: Legoktm)
[06:42:46] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[07:00:41] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[07:31:32] Beta-Cluster, Graphoid, Services: git deploy sync on beta-cluster never finishes fetch (2/4) - https://phabricator.wikimedia.org/T101633#1344858 (mobrovac) Cool, thnx @thcipriani.
[07:42:19] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce build #61: FAILURE in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce/61/
[07:50:46] good morning
[08:02:49] grrr
[08:18:37] PROBLEM - Puppet staleness on deployment-videoscaler01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[08:29:18] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 33.33% of data above the critical threshold [0.0]
[08:48:23] !log rebooting integration-slave-trusty-1012 (stalled can't login)
[08:48:26] Logged the message, Master
[08:55:59] Continuous-Integration-Infrastructure: integration-slave-trusty-1012, -1013, and -1015 unresponsive - https://phabricator.wikimedia.org/T101658#1345073 (hashar) Open>Resolved a:hashar Seems it was a transient labs issue. I have rebooted the three instances and repooled them: integration-slave...
[08:56:28] !log rebooted trusty-1013 trusty-1015 ( https://phabricator.wikimedia.org/T101658 ) and repooled them in Jenkins
[08:56:31] Logged the message, Master
[08:59:17] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0]
[09:03:17] Continuous-Integration-Infrastructure: integration-slave-trusty-1012, -1013, and -1015 unresponsive - https://phabricator.wikimedia.org/T101658#1345108 (hashar)
[09:03:18] Continuous-Integration-Infrastructure: MediaWiki Jenkins jobs stuck for 20 minutes - https://phabricator.wikimedia.org/T101653#1345107 (hashar)
[09:03:32] Continuous-Integration-Infrastructure: MediaWiki Jenkins jobs stuck for 20 minutes - https://phabricator.wikimedia.org/T101653#1344382 (hashar) Was due to three slaves being broken for some reason ( T101653 )
[09:03:44] RECOVERY - Puppet staleness on integration-slave-trusty-1013 is OK Less than 1.00% above the threshold [3600.0]
[09:05:42] RECOVERY - Puppet staleness on integration-slave-trusty-1012 is OK Less than 1.00% above the threshold [3600.0]
[09:08:07] Beta-Cluster, Puppet: Puppet failures on deployment-mx: can't find puppet://private/dkim/wikimedia.org-wiki-mail.key - https://phabricator.wikimedia.org/T87848#1345113 (hashar)
[09:13:22] Beta-Cluster, Puppet: Puppet failures on deployment-mx: can't find puppet://private/dkim/wikimedia.org-wiki-mail.key - https://phabricator.wikimedia.org/T87848#1345116 (hashar) Puppet can't find puppet://private/dkim/wikimedia.org-wiki-mail.key . The instance has the puppet class `role::mail::mx` and I g...
[09:38:22] (CR) Addshore: [C: 2] Don't require "wf" prefix on functions that are namespaced [tools/codesniffer] - https://gerrit.wikimedia.org/r/216517 (https://phabricator.wikimedia.org/T101623) (owner: Legoktm)
[09:38:36] (Merged) jenkins-bot: Don't require "wf" prefix on functions that are namespaced [tools/codesniffer] - https://gerrit.wikimedia.org/r/216517 (https://phabricator.wikimedia.org/T101623) (owner: Legoktm)
[10:02:42] Browser-Tests: refactor "upload file" step to mediawiki_selenium gem - https://phabricator.wikimedia.org/T64888#1345192 (hashar)
[10:02:53] Browser-Tests: refactor "upload file" step to mediawiki_selenium gem - https://phabricator.wikimedia.org/T64888#677987 (hashar) That is solely for #Browser-Tests
[10:16:13] (PS2) Hashar: JJB: move single use macro in the job-template [integration/config] - https://gerrit.wikimedia.org/r/216090
[10:18:23] (CR) Hashar: [C: 2] "Thanks!" [integration/config] - https://gerrit.wikimedia.org/r/216090 (owner: Hashar)
[10:18:51] Project browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce build #61: ABORTED in 2 min 50 sec: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-en.wikipedia.beta.wmflabs.org-windows_8-internet_explorer-10-sauce/61/
[10:20:19] (Merged) jenkins-bot: JJB: move single use macro in the job-template [integration/config] - https://gerrit.wikimedia.org/r/216090 (owner: Hashar)
[10:52:24] (PS1) Paladox: Add Wikidata dependance on scribunto [integration/config] - https://gerrit.wikimedia.org/r/216630
[10:52:40] (PS2) Paladox: Add Wikidata dependance on scribunto [integration/config] - https://gerrit.wikimedia.org/r/216630
[10:53:41] (PS2) Hashar: fix mwext-PhpTagsFunctions [integration/config] - https://gerrit.wikimedia.org/r/207754 (owner: Pastakhov)
[10:57:07] (PS3) Paladox: Add Wikidata dependance on scribunto [integration/config] - https://gerrit.wikimedia.org/r/216630
[10:57:21] (PS4) Paladox: Add Wikidata dependance on scribunto [integration/config] - https://gerrit.wikimedia.org/r/216630
[11:00:36] (CR) Hashar: [C: 2] "Indeed that is a mistake. Sorry it took so long." [integration/config] - https://gerrit.wikimedia.org/r/207754 (owner: Pastakhov)
[11:02:28] (Merged) jenkins-bot: fix mwext-PhpTagsFunctions [integration/config] - https://gerrit.wikimedia.org/r/207754 (owner: Pastakhov)
[11:06:35] (CR) Hashar: "Tests passed using a dummy change https://gerrit.wikimedia.org/r/216632" [integration/config] - https://gerrit.wikimedia.org/r/207754 (owner: Pastakhov)
[11:30:17] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » nn,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 19 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=nn,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[11:45:50] (PS1) Hashar: Move -tabs jobs to Trusty and shallow clone [integration/config] - https://gerrit.wikimedia.org/r/216657
[11:46:23] Continuous-Integration-Infrastructure, Patch-For-Review: Migrate all jobs to labs slaves - https://phabricator.wikimedia.org/T86659#1345303 (hashar)
[11:49:11] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » eu,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 38 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=eu,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[11:52:00] (CR) Hashar: [C: 2] Move -tabs jobs to Trusty and shallow clone [integration/config] - https://gerrit.wikimedia.org/r/216657 (owner: Hashar)
[11:53:46] (Merged) jenkins-bot: Move -tabs jobs to Trusty and shallow clone [integration/config] - https://gerrit.wikimedia.org/r/216657 (owner: Hashar)
[12:08:19] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » sk,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 57 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=sk,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[12:27:36] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » ml,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 1 hr 16 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=ml,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[12:31:04] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[12:46:18] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » fo,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 1 hr 35 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=fo,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[12:56:04] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0]
[12:56:24] Browser-Tests: Browser-Tests Workboard meeting notes - https://phabricator.wikimedia.org/T101700#1345457 (zeljkofilipin) NEW
[12:57:22] Browser-Tests: Browser-Tests Workboard meeting notes - https://phabricator.wikimedia.org/T101700#1345457 (zeljkofilipin) The log from the first meeting (and the only so far): ``` 5:09 PM #startmeeting Browser test meeting triage 5:09 PM Meeting started Tue Jun 2 15:09:00 201...
[13:02:08] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » sq,contintLabsSlave && UbuntuTrusty build #56: FAILURE in 1 hr 51 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=sq,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[13:03:57] Yippee, build fixed!
[13:03:58] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #676: FIXED in 31 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/676/
[13:18:17] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » mr,contintLabsSlave && UbuntuTrusty build #56: FAILURE in 2 hr 7 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=mr,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[13:36:46] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » tl,contintLabsSlave && UbuntuTrusty build #56: SUCCESS in 2 hr 25 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=tl,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[13:48:38] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds
[13:48:41] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[13:48:42] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL - Socket timeout after 10 seconds
[13:48:43] legoktm: What kind of scripts (re: NFS in CI)
[13:48:43] I assume in /home, right?
[13:48:44] Project beta-scap-eqiad build #56339: FAILURE in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/56339/
[13:48:44] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[13:48:44] We've got /srv/deployment/integration/slave-scripts though
[13:48:44] For anything not temporary
[13:48:46] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31255 bytes in 0.604 second response time
[14:00:13] PROBLEM - App Server bits response on deployment-mediawiki01 is CRITICAL - Socket timeout after 10 seconds
[14:00:17] hashar: ping
[14:00:17] PROBLEM - HHVM Queue Size on deployment-mediawiki01 is CRITICAL 50.00% of data above the critical threshold [80.0]
[14:00:17] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[14:00:20] Krinkle: good morning
[14:00:22] Project browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox » yi,contintLabsSlave && UbuntuTrusty build #56: FAILURE in 2 hr 48 min: https://integration.wikimedia.org/ci/job/browsertests-VisualEditor-language-screenshot-os_x_10.10-firefox/LANGUAGE_SCREENSHOT_CODE=yi,label=contintLabsSlave%20&&%20UbuntuTrusty/56/
[14:00:22] RECOVERY - App Server bits response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 3896 bytes in 0.004 second response time
[14:01:14] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48829 bytes in 1.081 second response time
[14:06:10] hashar: I rebooted the instances yesterday already
[14:06:10] and that did not help
[14:06:10] Krinkle: seems they were back this morning
[14:06:10] https://phabricator.wikimedia.org/T101658
[14:06:10] hashar: I'm doing an 'echo 1' via dsh from my laptop to all slaves.
[14:06:13] I assumed a transient error on labs infra
[14:06:25] Some are responding, but still some are not
[14:06:32] Also, puppet is broken on a bunch of instances for a long time now
[14:06:43] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<40.00%)
[14:06:46] I tried to deploy a fix for apc.ini, but it's not getting deployed
[14:06:55] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 48528 bytes in 1.100 second response time
[14:06:55] 'ls -l /etc/php5/conf.d/disable-html_errors.ini || ls -l /etc/php5/mods-available/disable-html_errors.ini || echo "Not found"'
[14:07:03] almost all instances still have it
[14:07:28] well if you remove it from puppet
[14:07:33] you still have to do the manual cleanup
[14:07:41] hashar: I didn't "remove" it
[14:07:44] we removed it 3 months ago
[14:10:33] or need file { 'something.ini': ensure => absent }
[14:10:33] I added ensure absent
[14:10:33] I know
[14:10:33] maybe it is part of the base image?
[14:10:33] No, it isn't and never was
[14:10:33] We added that line
[14:10:33] to fix a bug
[14:10:33] it's linked from the comment in the file itself
[14:10:33] https://phabricator.wikimedia.org/T99413#1329623
[14:10:33] "Disable html_errors per T97040"
[14:10:33] It was shortly deployed
[14:10:33] RECOVERY - HHVM Queue Size on deployment-mediawiki01 is OK Less than 30.00% above the threshold [10.0]
[14:10:33] and then undeployed, but left the file behind
[14:10:33] anyway, I tried removing it by ensuring absent, but like I said, puppet isn't working properly
[14:10:33] have you applied the patch on integration-master ?
[14:10:33] what does puppet agent report?
[14:10:34] Continuous-Integration-Infrastructure: Fix "PHP Warning: Module 'apc' already loaded" on zend slaves - https://phabricator.wikimedia.org/T99413#1345597 (Krinkle) Per T97040, this was https://gerrit.wikimedia.org/r/206143. Which was deployed on integration slaves for a short time while debugging. That file w...
[14:10:44] oh the puppet auto updater
[14:10:46] it removed my patch
[14:10:52] puppet seems to run fine on all instances http://shinken.wmflabs.org/problems?global_search=hg%3Aintegration#
[14:11:22] !log clearing disk space on trusty 1011 and 1012
[14:11:55] precise1011 is not responding to ssh
[14:12:00] qa-morebots: hey
[14:12:02] qa-morebots: ping
[14:12:06] anyway, I have to get back to other work.
[14:12:13] Logged the message, Master
[14:12:14] I am a logbot running on tools-exec-1206.
[14:12:14] Messages are logged to https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL.
[14:12:14] To log a message, type !log .
[14:12:14] I am a logbot running on tools-exec-1206.
[14:12:14] Messages are logged to https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL.
[14:12:14] To log a message, type !log .
[14:12:18] Maybe try again with ensure=>absent?
[14:12:18] !log clearing disk space on trusty 1011 and 1012
[14:12:21] Logged the message, Master
[14:12:26] Krinkle: don't bother with puppet
[14:12:29] Will need different directory for precise and trusty, see the other phabricator task mentioned in my comment
[14:12:31] Krinkle: just delete them from instances
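
The manual cleanup suggested at the end of this exchange (deleting the leftover disable-html_errors.ini instead of waiting on puppet) could be sketched roughly as the shell function below. This is only a sketch under assumptions: the function name `remove_stale_ini` and its optional root-prefix argument are hypothetical and were added so it can be tried against a scratch directory. Only the two paths quoted in the `ls -l` check above come from the log, and the log does not say which path applies to precise and which to trusty, so both are tried.

```shell
#!/bin/sh
# Hypothetical helper, not taken from the log: remove the leftover
# disable-html_errors.ini from both directories named in the chat.
# The optional first argument is a root prefix (an assumption made for
# safe dry runs); with no argument the real /etc/php5 paths are used.
remove_stale_ini() {
    root="${1:-}"
    name="disable-html_errors.ini"
    for dir in /etc/php5/conf.d /etc/php5/mods-available; do
        path="${root}${dir}/${name}"
        # -L also catches a dangling symlink, which -e would miss.
        if [ -e "$path" ] || [ -L "$path" ]; then
            rm -f -- "$path" && echo "removed ${path}"
        else
            echo "not found: ${path}"
        fi
    done
}
```

Calling `remove_stale_ini` with no argument on an instance would act on the real paths and would need sufficient privileges. Running it across all slaves (for example via dsh, as mentioned earlier in the log) is left out because the exact invocation is not given in the conversation.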