[05:35:36] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #474: FAILURE in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/474/
[06:38:22] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[06:54:30] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[06:59:35] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 90.00% of data above the critical threshold [0.0]
[06:59:38] Deployment-Systems: [scap] Add a file recording deploy information to all scap/sync-* calls - https://phabricator.wikimedia.org/T72477#1440178 (mmodell)
[07:15:03] Deployment-Systems, HHVM: [scap] Compile HHVM bytecode cache as deployment step - https://phabricator.wikimedia.org/T66272#1440189 (mmodell) @thcipriani and I did some testing of herd/horde, it does work fairly well. It's just a simple wrapper around bittornado and it uses ssh to coordinate the whole swar...
[07:19:34] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[08:21:22] Yippee, build fixed!
[08:21:22] Project browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #637: FIXED in 1 min 20 sec: https://integration.wikimedia.org/ci/job/browsertests-CirrusSearch-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/637/
[08:53:30] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Reenable resolv.conf ndots:2 option on CI instances - https://phabricator.wikimedia.org/T105297#1440502 (hashar) NEW a:coren
[08:54:07] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1106873 (hashar) We still have that the Gerrit patch applied on the `integration` labs project. F...
[08:56:58] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce build #655: SUCCESS in 46 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-os_x_10.9-safari-sauce/655/
[08:57:14] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1440518 (hashar) >>! In T92351#1439472, @scfc wrote: > Too bad for the readers Google will bring...
[08:59:14] Browser-Tests, MW-1.26-release, Patch-For-Review, WMF-deploy-2015-07-14_(1.26wmf14): unknown environment `default` (MediawikiSelenium::ConfigurationError) - https://phabricator.wikimedia.org/T105174#1440520 (zeljkofilipin) Open>Resolved
[09:17:29] Continuous-Integration-Infrastructure: MediaWiki phpunit jobs should collect php errors from installer - https://phabricator.wikimedia.org/T104909#1440535 (hashar) p:Triage>Normal
[09:22:25] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0]
[09:27:13] Continuous-Integration-Infrastructure: MediaWiki phpunit jobs should collect php errors from installer - https://phabricator.wikimedia.org/T104909#1440555 (hashar) The job does have the archiver configured: ``` lang=xml log/* false
Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #541: FAILURE in 7 min 4 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/541/
[09:59:54] PROBLEM - Puppet staleness on deployment-logstash2 is CRITICAL 50.00% of data above the critical threshold [43200.0]
[10:29:17] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #491: STILL FAILING in 1 hr 8 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/491/
[10:37:35] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440683 (hashar) I needed a way to look at the cloud init log file. So I have booted an instance again, this time with systemd enabled. Once the instance booted I created an image out...
[10:41:20] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440684 (hashar) List of installed packages from /home/hashar/mount/etc/dib-manifests/dib-manifest-dpkg-ci-dib-jessie-wikimedia-1436435961 ``` lang=python {"packages": [ {"package": "a...
[10:49:51] (PS8) Paladox: Add jsonlint test for MaintenanceShell [integration/config] - https://gerrit.wikimedia.org/r/222592
[10:54:25] (PS3) Zfilipin: Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178)
[10:54:41] (CR) Zfilipin: Run RuboCop when Gemfile.lock changes (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[10:55:21] (PS4) Zfilipin: Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178)
[10:55:42] (CR) Zfilipin: "Patch set 3 fixed the problem noticed by hashar." [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[10:55:56] (CR) Zfilipin: "Patch set 4 is a rebase." [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[11:56:50] Continuous-Integration-Isolation: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440861 (hashar) cloud-init uses ifconfig to gather network configuration, but it is not installed. We need the package `net-tools` in the disk image. debootstrap is run with`--variant...
[12:02:09] Continuous-Integration-Isolation, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440876 (hashar) I proposed to upstream https://review.openstack.org/200030 which adds the packages `isc-dhcp-client` and `net-tools` to the element `debian`.
[12:02:37] Continuous-Integration-Isolation, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440880 (hashar) p:Triage>High a:hashar
[12:03:37] Continuous-Integration-Isolation, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1437480 (hashar)
[12:03:39] Continuous-Integration-Isolation: Create a Jessie image with diskimage-builder suitable for nodepool - https://phabricator.wikimedia.org/T102878#1440885 (hashar)
[12:12:02] Continuous-Integration-Isolation, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1440908 (hashar) Adding: DIB_DEBOOTSTRAP_EXTRA_ARGS: '--include isc-dhcp-client,net-tools' # T105152 Then disk image uses: ``` sudo sh -c 'http_proxy= debootstrap --...
[12:38:07] Continuous-Integration-Isolation, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1441021 (hashar) The second --include override the first. Thus the image fails because it lacks `python`.
[13:06:46] Yippee, build fixed!
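The 12:38 comment explains why adding `--include isc-dhcp-client,net-tools` through DIB_DEBOOTSTRAP_EXTRA_ARGS broke the image: debootstrap honours only the last `--include` it sees, so the packages requested by an earlier `--include` (presumably including `python`) were dropped. A minimal Python sketch of one way around that, folding every `--include` value into a single comma-separated option, is shown below. The helper `merge_includes` is hypothetical; it is not part of diskimage-builder, and the fix actually adopted later in this log was a separate `wikimedia-networking` element.

```python
from typing import List


def merge_includes(args: List[str]) -> List[str]:
    """Collapse every --include option into a single --include=a,b,c.

    Hypothetical helper. Per the 12:38 comment, a later --include overrides
    an earlier one, so all package lists are merged before debootstrap runs.
    Both "--include=x,y" and "--include x,y" spellings are accepted.
    """
    packages: List[str] = []
    rest: List[str] = []
    i = 0
    while i < len(args):
        arg = args[i]
        if arg.startswith("--include="):
            value = arg[len("--include="):]
            i += 1
        elif arg == "--include" and i + 1 < len(args):
            value = args[i + 1]
            i += 2
        else:
            rest.append(arg)
            i += 1
            continue
        # Keep first-seen order and drop duplicate package names.
        for pkg in value.split(","):
            if pkg and pkg not in packages:
                packages.append(pkg)
    if packages:
        rest.append("--include=" + ",".join(packages))
    return rest


if __name__ == "__main__":
    # Illustrative arguments only, not the real diskimage-builder command line.
    argv = ["--verbose", "--include=python", "--include", "isc-dhcp-client,net-tools"]
    print(merge_includes(argv))
    # ['--verbose', '--include=python,isc-dhcp-client,net-tools']
```

Merging at the argument level keeps the earlier package list intact instead of relying on whichever `--include` happens to come last.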
[13:06:47] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #711: FIXED in 34 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/711/
[13:35:15] Beta-Cluster: http://deployment.wikimedia.beta.wmflabs.org/wiki/en: redirected to a non-exist domain - https://phabricator.wikimedia.org/T105333#1441184 (Bugreporter) NEW
[14:01:19] Continuous-Integration-Isolation, Patch-For-Review, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1441263 (hashar) Fixed by introducing a new element `wikimedia-networking` which installs the dhcp client and ifconfig commands ( https://gerrit.wikime...
[14:20:30] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<20.00%)
[14:23:27] Browser-Tests: Run PhantomJS across set of Wikimedia wiki pages to ensure sane JavaScript - https://phabricator.wikimedia.org/T71519#1441344 (He7d3r)
[14:40:54] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1441395 (scfc) I don't mind if ops wouldn't have investigated the issue. But doing so, claiming...
[14:45:59] Browser-Tests: Run PhantomJS across set of Wikimedia wiki pages to ensure sane JavaScript - https://phabricator.wikimedia.org/T71519#1441414 (Krinkle)
[14:48:55] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure, operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1441425 (coren) >>! In T92351#1441395, @scfc wrote: > But doing so, claiming that `dnsmasq` is at...
[15:07:21] Continuous-Integration-Infrastructure, Labs, Labs-Infrastructure: Reenable resolv.conf ndots:2 option on CI instances - https://phabricator.wikimedia.org/T105297#1441491 (Dzahn) I have abandoned https://gerrit.wikimedia.org/r/#/c/196731/ because Coren said "This was the wrong workaround for a problem...
[15:23:27] PROBLEM - Puppet failure on deployment-cache-bits01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[16:04:18] Deployment-Systems, HHVM: [scap] Compile HHVM bytecode cache as deployment step - https://phabricator.wikimedia.org/T66272#1441672 (mmodell) This: https://github.com/TMG-nl/p2ptracker > The tracker uses the knowledge we have about our network topology to build two-tier swarms of bittorrent clients. It...
[16:13:52] Browser-Tests: Run PhantomJS across set of Wikimedia wiki pages to ensure sane JavaScript - https://phabricator.wikimedia.org/T71519#1441731 (Krinkle) The webperf project has an asset-check deamon (see ["Webperf" on Wikitech](https://wikitech.wikimedia.org/wiki/Webperf#asset-check)) that frequently (every 5...
[16:28:36] Beta-Cluster, HHVM, Wikimedia-log-errors: Investigate HHVM Aggregator functions that writes errors to a database. - https://phabricator.wikimedia.org/T102144#1441815 (demon) p:Triage>Low
[16:40:00] Beta-Cluster, Patch-For-Review, Reading-Web, Wikimedia-log-errors: Visiting sign up form shows 500 - https://phabricator.wikimedia.org/T103107#1441893 (demon) p:Triage>Normal
[16:41:59] Deployment-Systems, Wikimedia-log-errors: Cant unserialize( ) - https://phabricator.wikimedia.org/T103744#1441930 (demon) p:Triage>Normal
[18:28:27] Continuous-Integration-Isolation: Create a Jessie image with diskimage-builder suitable for nodepool - https://phabricator.wikimedia.org/T102878#1442414 (hashar) The bulk of the disk image config is in nodepool.yaml puppetized at https://gerrit.wikimedia.org/r/#/c/201728/16/modules/nodepool/templates/nodepool...
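T105297 is about the `ndots` option in resolv.conf. As described in resolv.conf(5), a name containing at least `ndots` dots is first tried as an absolute name and only then with the `search` domains appended; a name with fewer dots goes through the search list first. Raising `ndots` to 2 therefore sends more lookups to names under the search domains, many of which do not exist, which is where the way dnsmasq answers such queries (T92351) comes into play. The Python sketch below only computes that candidate order; the function name and the `integration.example` search domain are hypothetical, and it does not model retries or what a resolver does after a SERVFAIL.

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int = 1) -> List[str]:
    """Order in which a stub resolver tries names, per resolv.conf(5).

    Simplified sketch: ignores options such as no-tld-query, HOSTALIASES,
    and any retry or error handling (e.g. SERVFAIL vs NXDOMAIN).
    """
    if name.endswith("."):
        # A trailing dot marks the name as absolute: the search list is skipped.
        return [name.rstrip(".")]
    dots = name.count(".")
    expanded = [f"{name}.{domain}" for domain in search]
    if dots >= ndots:
        # Enough dots: try the name as-is first, then the search list.
        return [name] + expanded
    # Too few dots: search list first, the bare name last.
    return expanded + [name]


if __name__ == "__main__":
    search = ["integration.example"]  # hypothetical search list
    print(candidate_names("github.com", search, ndots=1))
    # ['github.com', 'github.com.integration.example']
    print(candidate_names("github.com", search, ndots=2))
    # ['github.com.integration.example', 'github.com']
```

With `ndots:2`, a two-label name such as `github.com` is first looked up under the search domain, so the answer for that non-existent name decides how quickly the resolver moves on.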
[18:42:41] Beta-Cluster: http://deployment.wikimedia.beta.wmflabs.org/wiki/en: redirected to a non-exist domain - https://phabricator.wikimedia.org/T105333#1442469 (hashar)
[18:42:43] Beta-Cluster, Patch-For-Review: Beta should not use productions interwiki.cdb - https://phabricator.wikimedia.org/T69931#1442470 (hashar)
[20:35:46] (CR) Hashar: [C: 2] Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[20:36:34] * hashar whistles
[20:36:47] marxarelli: I got disk image to produce a working Jessie image
[20:37:01] the issue I hit was that the image missed /sbin/ifconfig and a dhcp client
[20:37:11] once I added the relevant package ... IT booted and acquired network !
[20:37:20] hashar: great!
[20:37:23] the Ubuntu images use a totally different receipe
[20:37:36] i figured out how to fix my new vagrant setup as well
[20:37:38] (Merged) jenkins-bot: Run RuboCop when Gemfile.lock changes [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[20:37:41] will try to port the disk image element that would let us run puppet
[20:37:43] ah
[20:37:46] progress!!!!!!!!!!
[20:37:55] :)
[20:38:27] hashar: re https://gerrit.wikimedia.org/r/#/c/223691/
[20:38:44] i'm wondering if operatingsystem should be added to the hiera hierarchy
[20:38:58] hehe
[20:39:04] you can suggest it to ops list
[20:39:07] or giuseppe
[20:39:13] or who knows
[20:39:36] in this case you can just amend the else
[20:39:45] hashar: yeah, i'll do a quick grep to see how much impact it would have
[20:39:52] that's what i'll do for this patch i think
[20:39:54] to be an if os_version( 'Ubuntu >= Trusty' )
[20:39:56] or something like that
[20:47:26] (CR) Zfilipin: "Yeah! ;)" [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[20:48:53] (PS1) Hashar: Adjust mw/core rubocop filter to be an alias [integration/config] - https://gerrit.wikimedia.org/r/223958 (https://phabricator.wikimedia.org/T105178)
[20:49:08] (CR) Hashar: "That should work for all repositories." [integration/config] - https://gerrit.wikimedia.org/r/223574 (https://phabricator.wikimedia.org/T105178) (owner: Zfilipin)
[20:50:38] zeljkof: did not work for mediawiki/core :-D
[20:51:10] hashar: wait, what?
[20:51:13] * zeljkof is looking
[20:52:34] (CR) Zfilipin: [C: 2] Adjust mw/core rubocop filter to be an alias [integration/config] - https://gerrit.wikimedia.org/r/223958 (https://phabricator.wikimedia.org/T105178) (owner: Hashar)
[20:52:57] ok, found it and +2d
[20:53:55] zeljkof: have you looked at the zuul diff ?
[20:53:56] https://integration.wikimedia.org/ci/job/integration-zuul-layoutdiff/4650/console
[20:54:03] yaml aliases are magic :-)
[20:54:33] (Merged) jenkins-bot: Adjust mw/core rubocop filter to be an alias [integration/config] - https://gerrit.wikimedia.org/r/223958 (https://phabricator.wikimedia.org/T105178) (owner: Hashar)
[20:54:35] zeljkof: also legoktm created a fabric file at the root of interation/config
[20:54:41] make it easy to deploy a conf change
[20:54:45] hashar: no, looking now, do not see anything strange
[20:54:45] not sure whether you are aware about it
[20:54:57] yeah
[20:54:58] hashar: no, I was not aware
[20:55:04] so most of the time, I look at the zuul diff job output
[20:55:06] to see what happens
[20:55:19] in this case it nicely show the mediawiki core rubocop jobs now honors Gemfile.lock :-}
[20:55:26] so gives total confidence the change can be +2 / deployed
[20:55:39] cool
[20:56:04] look at /fabfile.py
[20:56:07] it has the instructions
[20:56:10] ie
[20:56:17] pip install fabric
[20:56:20] fab deploy_zuul
[20:56:23] PROFIT!!!!!!!!!
[20:56:36] * hashar donates to "legoktm Foundation Inc."
[20:56:41] o.O
[20:57:00] legoktm: I think thcipriani|afk started looking at fabric :}
[20:57:51] woot
[20:58:25] and ansible
[20:58:26] and salt
[20:58:30] and custom script
[20:58:43] fab is definitely a good use case for us though
[20:58:47] rather simple
[21:10:16] I wanted to use fab for the scap rewrite. I ran into some political issues that should be moot now
[21:16:17] bd808: I thought you dismissed fab because it would not match our use case
[21:16:29] also I think some are looking at ansible
[21:17:17] no, I wanted to use it for the command and control aspects of scap but at the time Faidon objected because he wanted me to use Trebuchet (salt).
[21:17:33] Faidon and Chris wanted to see ssh die
[21:18:09] Continuous-Integration-Isolation: Create a Jessie image with diskimage-builder suitable for nodepool - https://phabricator.wikimedia.org/T102878#1443091 (hashar)
[21:18:10] Ori's shared ssh key and auth sock hander have basically made that problem go away
[21:18:11] Continuous-Integration-Isolation, Patch-For-Review, Upstream: diskimage cloud-init does not bring up network - https://phabricator.wikimedia.org/T105152#1443089 (hashar) Open>Resolved One less problem!
[21:18:27] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:18:27] (Which Faidon actually designed for us)
[21:26:16] PROBLEM - Puppet failure on deployment-cache-text03 is CRITICAL 100.00% of data above the critical threshold [0.0]
[21:27:34] bd808: all great! thanks for the background!
[21:27:55] bd808: on other news, marxarelli will probably have some Vagrant related question for you while you all are in Mexico
[21:28:01] blame me :}
[21:28:14] cool. I like messing with Vagrant
[21:28:52] I'd like to find some more time to play with https://gerrit.wikimedia.org/r/#/c/212294/
[21:28:54] bd808: yes, questions with answers like 'wtf. why?!'
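The Puppet alerts throughout this log read "N% of data above the critical threshold [T]" and recover once "Less than 1.00%" of the data is above it. A simplified Python reading of that phrasing, assuming the check looks at a window of numeric datapoints and skips nulls, is sketched below. The helper `percent_above` is hypothetical, the percentage at which the real check turns CRITICAL is not shown in this log, and treating "Puppet failure" as a failure count and "Puppet staleness" as seconds since the last run (43200 s being 12 hours) are assumptions.

```python
from typing import Optional, Sequence


def percent_above(datapoints: Sequence[Optional[float]], threshold: float) -> float:
    """Percentage of non-null datapoints strictly above threshold.

    Hypothetical helper; None samples are skipped, as gaps in a time
    series would be.
    """
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


if __name__ == "__main__":
    # Illustrative samples only, not real data from these hosts.
    failures = [0, 1, 1, 0, 1]       # assumed failure counts, threshold 0.0
    staleness = [3600.0, 50000.0]    # assumed seconds since last run, threshold 43200.0
    print(f"{percent_above(failures, 0.0):.2f}% of data above [0.0]")
    # 60.00% of data above [0.0]
    print(f"{percent_above(staleness, 43200.0):.2f}% of data above [43200.0]")
    # 50.00% of data above [43200.0]
```

Read this way, a threshold of [0.0] means any non-zero sample in the window counts against the host, which fits the many short-lived PROBLEM/RECOVERY pairs above.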
[21:29:34] heh. those are often the most fun questions
[21:33:47] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #695: FAILURE in 1 hr 7 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/695/
[22:00:43] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1443220 (hashar) NEW
[22:04:18] Continuous-Integration-Isolation, Nodepool: puppet yields: Could not find declared class ::nodepool at /etc/puppet/manifests/role/nodepool.pp:25 - https://phabricator.wikimedia.org/T105406#1443228 (hashar) I have spawned [[ https://horizon.wikimedia.org/project/instances/33a5ce13-8d55-47a3-9307-405eaf766c6...
[22:16:03] !log integration: pulled labs/private.git : dbef45d..d41010d
[22:16:06] Logged the message, Master
[22:18:17] PROBLEM - Puppet failure on nodepool-t105406 is CRITICAL 100.00% of data above the critical threshold [0.0]
[22:19:23] PROBLEM - Puppet failure on integration-lightslave-jessie-1002 is CRITICAL 55.56% of data above the critical threshold [0.0]
[22:24:58] well I am done
[22:25:02] I HATE PUPPET
[22:25:08] (you can quote / cite / bash me)
[22:25:17] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL 40.00% of data above the critical threshold [0.0]
[22:27:05] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL 66.67% of data above the critical threshold [0.0]
[22:31:55] the puppet failures are just lagged out
[22:32:19] though on Precise
[22:32:21] that is E: Unable to locate package chromium-chromedriver
[22:32:21] :D
[22:32:28] guess marxarelli is working on it
[22:32:35] sleep sleep
[22:35:42] PROBLEM - Puppet failure on integration-slave-precise-1011 is CRITICAL 60.00% of data above the critical threshold [0.0]
[22:35:56] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL 30.00% of data above the critical threshold [0.0]
[22:39:21] RECOVERY - Puppet failure on integration-lightslave-jessie-1002 is OK Less than 1.00% above the threshold [0.0]
[22:48:15] RECOVERY - Puppet failure on nodepool-t105406 is OK Less than 1.00% above the threshold [0.0]
[22:51:58] (PS1) Dduvall: Include chromedriver directory in PATH [integration/jenkins] - https://gerrit.wikimedia.org/r/223977
[22:52:06] RECOVERY - Puppet failure on integration-slave-precise-1012 is OK Less than 1.00% above the threshold [0.0]
[22:55:13] RECOVERY - Puppet failure on integration-slave-precise-1014 is OK Less than 1.00% above the threshold [0.0]
[22:55:43] RECOVERY - Puppet failure on integration-slave-precise-1011 is OK Less than 1.00% above the threshold [0.0]
[23:00:55] RECOVERY - Puppet failure on integration-slave-precise-1013 is OK Less than 1.00% above the threshold [0.0]
[23:18:52] (CR) Dduvall: [C: 2] Include chromedriver directory in PATH [integration/jenkins] - https://gerrit.wikimedia.org/r/223977 (owner: Dduvall)
[23:19:26] (Merged) jenkins-bot: Include chromedriver directory in PATH [integration/jenkins] - https://gerrit.wikimedia.org/r/223977 (owner: Dduvall)
[23:20:11] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 100.00% of data above the critical threshold [0.0]
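Dduvall's change "Include chromedriver directory in PATH" (Gerrit 223977) presumably deals with a chromedriver binary that is installed outside the directories on PATH. The Python sketch below shows the general technique with the standard library: prepend a directory to PATH in a copy of the environment, then resolve the executable with `shutil.which`. The helper names are hypothetical, and the directory in the usage block, `/usr/lib/chromium-browser`, is an assumption about where Ubuntu's chromium-chromedriver package places the binary. The actual integration/jenkins change is not reproduced here.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def with_dir_on_path(env: Mapping[str, str], directory: str) -> Dict[str, str]:
    """Return a copy of env with directory prepended to PATH (no duplicates)."""
    new_env = dict(env)
    parts = [p for p in new_env.get("PATH", "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    new_env["PATH"] = os.pathsep.join(parts)
    return new_env


def find_chromedriver(env: Mapping[str, str]) -> Optional[str]:
    """Locate an executable named chromedriver using only the PATH in env."""
    return shutil.which("chromedriver", path=env.get("PATH"))


if __name__ == "__main__":
    # Assumed install location for Ubuntu's chromium-chromedriver package.
    env = with_dir_on_path(os.environ, "/usr/lib/chromium-browser")
    print(find_chromedriver(env))  # a path, or None if the driver is not installed
```

Working on a copy of the environment keeps the change scoped to the child process (for example a browser test run) rather than the whole job.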