[00:13:25] thcipriani: yeah that works now but something new pops up: Error: Execution of '/usr/bin/salt-call --log-level=quiet --out=json grains.append deployment_target wdqs/wdqs' returned 2: Minion failed to authenticate with the master, has the minion key been accepted?
[00:14:02] yeah, doesn't look like salt-key is accepted on staging-palladium
[00:14:21] should work now: staging-palladium sudo salt-key -A
[00:14:32] thcipriani: aha, thanks
[00:15:14] thcipriani: yes, looks fine now
[00:15:28] SMalyshev: cool, glad to hear it
[00:18:18] Release-Engineering, operations, Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1469515 (Legoktm) I see usage of the `ExternalSt...
[00:21:32] Release-Engineering, operations, Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1469519 (Jdforrester-WMF)
[00:21:35] Release-Engineering, operations, Database: Re-compress External Storage in production using trackBlobs.php and recompressTracked.php - https://phabricator.wikimedia.org/T106387#1469518 (Jdforrester-WMF)
[00:43:45] PROBLEM - Puppet failure on deployment-salt is CRITICAL 30.00% of data above the critical threshold [0.0]
[00:56:42] Deployment-Systems, operations, Patch-For-Review: Trebuchet doesn't like when a deployer server is also a minion, a edge case for scap - https://phabricator.wikimedia.org/T67549#1469568 (thcipriani) @fgiunchedi works as expected on deployment-prep deploying the test repo. LGTM.
[00:58:42] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0]
[01:17:57] PROBLEM - Puppet failure on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[02:17:32] Deployment-Systems: l10nupdate fails since 2015-07-14 - https://phabricator.wikimedia.org/T106460#1469669 (greg) NEW
[03:15:23] Deployment-Systems: l10nupdate fails since 2015-07-14 - https://phabricator.wikimedia.org/T106460#1469737 (Legoktm) Did someone just run it manually? [19:37:46] !log LocalisationUpdate completed (1.26wmf14) at 2015-07-22 02:37:45+00:00 [20:10:25] !log LocalisationUpdate completed (1....
[03:18:05] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 66.67% of data above the critical threshold [0.0]
[03:53:03] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[05:35:07] Project browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce build #487: FAILURE in 33 min: https://integration.wikimedia.org/ci/job/browsertests-MultimediaViewer-en.wikipedia.beta.wmflabs.org-windows_7-internet_explorer-11-sauce/487/
[06:36:14] http://githubengineering.com/deploying-branches-to-github-com/ is .. very interesting
[06:41:35] I read that when it came out.
[06:42:29] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK
[08:05:54] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 30.00% of data above the critical threshold [0.0]
[08:06:18] PROBLEM - Puppet failure on deployment-elastic07 is CRITICAL 33.33% of data above the critical threshold [0.0]
[08:09:31] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:17:07] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 55.56% of data above the critical threshold [0.0]
[08:18:51] Beta-Cluster, Labs, operations, Monitoring: Setup (simple) catchpoint monitoring and metrics for enwiki betacluster just like production - https://phabricator.wikimedia.org/T97865#1469958 (hashar) Open>declined a:hashar From a reply I made to ops-l: > I thought Catchpoint to be super cheap...
[08:23:40] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:26:09] Looking at ^^ already
[08:26:21] tldr: Prod puppet change is a little wonky since we lack ganglia.
[08:33:36] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:34:36] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:34:36] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 60.00% of data above the critical threshold [0.0]
[08:46:29] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 50.00% of data above the critical threshold [0.0]
[08:48:13] (CR) Hashar: "Sent some patch to upstream https://review.openstack.org/#/c/204499/" [integration/config] - https://gerrit.wikimedia.org/r/226220 (owner: Hashar)
[08:52:03] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0]
[09:09:34] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0]
[09:13:49] Continuous-Integration-Infrastructure: CR +2 events not triggering gate-and-submit pipeline in zuul - https://phabricator.wikimedia.org/T106436#1470052 (hashar) Thanks @chasemp for the revert . I should have been more careful in doing this upgrade. That is an oddity in Zuul BaseFilter which uses a lower cas...
[09:14:36] Continuous-Integration-Infrastructure: CR +2 events not triggering gate-and-submit pipeline in zuul - https://phabricator.wikimedia.org/T106436#1470053 (hashar) I have sent a documentation change to Zuul upstream at https://review.openstack.org/#/c/204499/ Needs a bit more work.
[09:20:32] (PS1) Hashar: Workaround Zuul normalizing approval fields [integration/config] - https://gerrit.wikimedia.org/r/226274 (https://phabricator.wikimedia.org/T106436)
[09:26:29] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0]
[09:27:52] (PS2) Hashar: Workaround Zuul normalizing approval fields [integration/config] - https://gerrit.wikimedia.org/r/226274 (https://phabricator.wikimedia.org/T106436)
[09:28:43] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0]
[09:30:01] (CR) Hashar: [C: 2] Workaround Zuul normalizing approval fields [integration/config] - https://gerrit.wikimedia.org/r/226274 (https://phabricator.wikimedia.org/T106436) (owner: Hashar)
[09:31:27] (Merged) jenkins-bot: Workaround Zuul normalizing approval fields [integration/config] - https://gerrit.wikimedia.org/r/226274 (https://phabricator.wikimedia.org/T106436) (owner: Hashar)
[09:32:35] !log Reupgrading Zuul to zuul_2.0.0-327-g3ebedde-wmf2precise1_amd64.deb with an approval fix ( https://gerrit.wikimedia.org/r/#/c/226274/ ) for gate-and-submit no more matching Code-Review+2 events ( https://phabricator.wikimedia.org/T106436 )
[09:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:37:00] (PS1) Hashar: Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275
[09:37:42] (CR) jenkins-bot: [V: -1] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:37:55] (CR) Hashar: [C: 2] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:39:38] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0]
[09:43:36] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0]
[09:45:00] (CR) Hashar: [C: 2] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:45:24] (CR) Hashar: [V: 2] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:45:37] (CR) Hashar: [C: 2] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:45:51] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0]
[09:46:20] (CR) jenkins-bot: [V: -1] Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:46:33] (CR) Hashar: Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:46:37] (Abandoned) Hashar: Invalid testing change [integration/config] - https://gerrit.wikimedia.org/r/226275 (owner: Hashar)
[09:48:11] Continuous-Integration-Infrastructure, Patch-For-Review: CR +2 events not triggering gate-and-submit pipeline in zuul - https://phabricator.wikimedia.org/T106436#1470184 (hashar) I deployed the layout change https://gerrit.wikimedia.org/r/226274 and reupgraded Zuul to zuul_2.0.0-327-g3ebedde-wmf2precise1...
[09:48:27] Continuous-Integration-Infrastructure, Patch-For-Review: CR +2 events not triggering gate-and-submit pipeline in zuul - https://phabricator.wikimedia.org/T106436#1470185 (hashar) Open>Resolved Resolved by https://gerrit.wikimedia.org/r/226274
[09:48:44] I should not upgrade during my evening
[09:48:50] I should not upgrade during my evenings
[09:48:53] I should not upgrade during my evenings
[09:48:54] ...
[09:49:33] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0]
[09:54:49] Continuous-Integration-Infrastructure: CI run complains about missing javac - https://phabricator.wikimedia.org/T106446#1470204 (hashar) a:hashar The job is solely tied to the Jenkins label `contintLabsSlave`. That causes it to roam on integration-lightslave-jessie-1002 which is missing most of the packa...
[09:59:03] (PS1) Hashar: Tie puppet validate jobs to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226277
[10:04:44] (PS2) Hashar: Tie puppet validate jobs to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226277 (https://phabricator.wikimedia.org/T106446)
[10:04:53] (CR) Hashar: [C: 2] Tie puppet validate jobs to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226277 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:05:28] Continuous-Integration-Infrastructure, Patch-For-Review: CI run complains about missing javac - https://phabricator.wikimedia.org/T106446#1470245 (hashar) Open>Resolved The job is now running on Trusty slaves that are complete and have javac.
[10:05:59] (CR) Hashar: [C: -2 V: -1] Tie puppet validate jobs to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226277 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:06:23] (CR) jenkins-bot: [V: -1] Tie puppet validate jobs to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226277 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:06:42] (PS1) Hashar: Tie wikidata-query-rdf to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226280 (https://phabricator.wikimedia.org/T106446)
[10:07:04] Continuous-Integration-Infrastructure, Patch-For-Review: CI run complains about missing javac - https://phabricator.wikimedia.org/T106446#1470249 (hashar) >>! In T106446#1470241, @gerritbot wrote: > Change 226277 had a related patch set uploaded (by Hashar): > Tie puppet validate jobs to Trusty > > [[htt...
[10:07:19] (CR) Hashar: [C: 2] Tie wikidata-query-rdf to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226280 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:09:02] (PS3) Hashar: Tie puppet validate Jenkins jobs to Precise [integration/config] - https://gerrit.wikimedia.org/r/226277
[10:14:43] (Abandoned) Hashar: Tie puppet validate Jenkins jobs to Precise [integration/config] - https://gerrit.wikimedia.org/r/226277 (owner: Hashar)
[10:16:17] RECOVERY - Puppet failure on deployment-elastic07 is OK Less than 1.00% above the threshold [0.0]
[10:19:24] (CR) Hashar: [C: 2] Tie wikidata-query-rdf to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226280 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:21:25] (Merged) jenkins-bot: Tie wikidata-query-rdf to Trusty [integration/config] - https://gerrit.wikimedia.org/r/226280 (https://phabricator.wikimedia.org/T106446) (owner: Hashar)
[10:24:43] !log Upgrading Zuul on Jenkins Precise slaves to zuul_2.0.0-327-g3ebedde-wmf2precise1_amd64.deb
[10:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[10:32:10] Continuous-Integration-Infrastructure, operations: Upload new Zuul .deb package on apt.wikimedia.org for precise-wikimedia - https://phabricator.wikimedia.org/T106499#1470255 (hashar) NEW
[10:42:27] Release-Engineering, operations, Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1470286 (PleaseStand) >>! In T106388#1469515, @L...
[11:58:34] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<50.00%)
[12:07:18] PROBLEM - Puppet failure on integration-slave-jessie-1001 is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:13:09] (PS6) Paladox: Update TwitterLogin tests [integration/config] - https://gerrit.wikimedia.org/r/225712
[12:40:24] Continuous-Integration-Isolation: Create a Jessie image with diskimage-builder suitable for nodepool - https://phabricator.wikimedia.org/T102878#1470578 (hashar) Open>Resolved Bulk of the work is completed and merged in operations/puppet.git.
[12:51:58] Continuous-Integration-Infrastructure, Zuul: Bump python-gear package to 0.5.7 - https://phabricator.wikimedia.org/T98294#1470599 (hashar) The package sources on Alioth can't build anymore https://bugs.launchpad.net/pbr/+bug/1256138 :-(
[13:00:09] Yippee, build fixed!
[13:00:10] Project browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #724: FIXED in 28 min: https://integration.wikimedia.org/ci/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/724/
[13:01:09] Continuous-Integration-Infrastructure: Jenkins/Zuul something stuck - https://phabricator.wikimedia.org/T71045#1470625 (hashar)
[13:01:11] Continuous-Integration-Infrastructure, Upstream, Zuul: [upstream] Jobs are sometime no more being triggered by Zuul / Jenkins - https://phabricator.wikimedia.org/T65760#1470623 (hashar) Open>Resolved I havent seen this one in ages. Since I got python-gear and the Gearman Jenkins plugin upgraded,...
[13:02:09] Continuous-Integration-Infrastructure: Run zuul-clear-refs.py daily on all our repositories to reclaim Zuul references - https://phabricator.wikimedia.org/T103528#1470629 (hashar) a:hashar>None
[13:02:19] Continuous-Integration-Infrastructure: Package / puppetize zuul-clear-refs.py - https://phabricator.wikimedia.org/T103529#1470630 (hashar) a:hashar>None
[13:04:49] Deployment-Systems, operations, HHVM: HHVM lock-ups - https://phabricator.wikimedia.org/T89912#1470645 (fgiunchedi) did this get fixed upstream? afaik we're not experiencing hhvm lockups now in production even on big deploys and there was work around statcache
[13:05:15] Continuous-Integration-Infrastructure, Release-Engineering, Jenkins, Upstream: [upstream] Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#1470648 (hashar) Open>...
[13:14:31] Continuous-Integration-Isolation, Patch-For-Review: Instances created by Nodepool cant run puppet due to missing certificate - https://phabricator.wikimedia.org/T96670#1470660 (hashar) Open>Resolved a:hashar @Andrew fixed labs so we can boot instances from the OpenStack API and I confirmed it work...
[13:15:07] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, operations, Nodepool, Patch-For-Review: Use systemd for Nodepool - https://phabricator.wikimedia.org/T96867#1470664 (hashar)
[13:15:10] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1470663 (hashar)
[13:15:27] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1470666 (hashar)
[13:15:29] Continuous-Integration-Isolation, hardware-requests, operations: eqiad: 2 hardware access request for CI isolation on labsnet - https://phabricator.wikimedia.org/T93076#1470667 (hashar)
[13:16:06] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1179075 (hashar) The hardware has been allocated for Nodepool, so I removed the blocking task {T93706} Most of the puppet patches have been merged, the one left over is the systemd confi...
[13:17:46] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[13:26:56] Continuous-Integration-Isolation, operations: install/deploy scandium as zuul merger (ci) server - https://phabricator.wikimedia.org/T95046#1470690 (hashar) scandium is going to host the Zuul mergers. On the [[ https://www.mediawiki.org/wiki/Continuous_integration/Architecture/Isolation#Architecture_overv...
[13:48:28] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1470716 (JanZerebecki) NEW
[13:52:48] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1470725 (Addshore) Of the top of my head it looks like our CI infrastructure will need a OAuth token for the api requests, I am guessing the...
[14:16:19] Continuous-Integration-Isolation, Nodepool: Nodepool Debian package should create /var/run/nodepool directory - https://phabricator.wikimedia.org/T105501#1470753 (hashar) Open>declined a:hashar The systemd will launch nodepool in interactive mode and keep track of the pid by itself. So there is no...
[14:27:56] PROBLEM - Puppet failure on integration-slave-precise-1013 is CRITICAL 60.00% of data above the critical threshold [0.0]
[14:59:34] Continuous-Integration-Isolation, operations: install/deploy labnodepool1001 - https://phabricator.wikimedia.org/T95045#1470849 (hashar)
[14:59:37] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, operations, Nodepool: Create a Debian package for NodePool on Debian Jessie - https://phabricator.wikimedia.org/T89142#1470850 (hashar)
[14:59:40] Continuous-Integration-Infrastructure, Continuous-Integration-Isolation, operations, Nodepool, Patch-For-Review: Use systemd for Nodepool - https://phabricator.wikimedia.org/T96867#1470844 (hashar) Open>Resolved Thanks to @Muehlenhoff for the final review of the systemd integration. We now...
[15:01:17] greg-g: following up on my yesterday Zuul deploy , Chase ended up reverting the upgrade due to a corner case bug
[15:01:42] so that is definitely the last time i deploy anything during SF business time / some other deploy window
[15:14:14] PROBLEM - Puppet failure on deployment-restbase02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[15:14:57] hashar: yeah, tyler and lego figured it out, just needed chase to run the sudo commands
[15:18:01] poor sudo
[15:18:09] hashar: we should pair for next deploy there—the things I don't know about zuul...are many things, is what I realized yesterday :)
[15:18:47] yeah nobody knows about it
[15:18:50] tis magic!
[15:19:14] it is in a half backed state for sure. I took note yesterday about how to update zuul
[15:19:21] need to write them down on the wiki
[15:19:49] but in short, using .deb package might not be the easiest way for us. Maybe we can go with a python virtual env and new-deploy-tool to push stuff to labs instances (where Zuul is needed)
[15:20:29] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n: l10nupdate fails since 2015-07-14 - https://phabricator.wikimedia.org/T106460#1470873 (Krenair)
[15:25:47] grrrr.... https://lists.wikimedia.org/pipermail/mobile-l/2015-July/009491.html
[15:25:54] I thought we crushed that
[15:28:34] Yay less projects in Gerrit \o/ \o/
[15:31:31] bd808: that follow up some discussion we had with mobile
[15:31:39] bd808: they need IOS CI which we can't really provide
[15:31:51] same deal for RESTBase
[15:31:56] they needed cassandra and other stuff
[15:32:36] k. more fragmentation sucks but if your folks are fine with it I'll stop bitching
[15:34:07] hasharConfcall: zuul may not be picking up changes again...https://gerrit.wikimedia.org/r/#/c/220358
[15:38:56] ostriches: Fewer. ;-)
[15:40:57] James_F: :) :)
[15:41:04] * James_F grins.
[15:43:01] thcipriani: interesting
[15:43:06] huh, well, I guess changes to mediawiki/core trigger gate-and-submit, just not changes to mediawiki/config
[15:43:09] thcipriani: can you try removing your CR+2 and vote V+2 and CR+2
[15:43:18] yup
[15:44:44] hasharConfcall: that did it \o/
[15:45:20] thcipriani: I think gate-and-submit reject the change because it lacks verified +2
[15:45:30] thcipriani: gerrit would report the change has not being ready to be merged
[15:46:06] hasharConfcall: gotcha, just a little gun-shy now, thanks for the help! :)
[15:46:43] James_F: I place fewer importance on using words correctly :)
[15:46:52] * James_F grins.
[15:51:12] Continuous-Integration-Infrastructure, Wikidata: github.com is 403ing downloads from Wikimedia CI during composer update - https://phabricator.wikimedia.org/T106519#1470948 (JanZerebecki) We should be using Satis ( https://getcomposer.org/doc/articles/handling-private-packages-with-satis.md ) to have a l...
[15:52:22] thcipriani: which seems to be a regression :-((((
[15:52:59] bummer :(
[15:57:47] thcipriani: so yeah Zuul is tightly coupled with Gerrit
[15:58:10] and whenever it does an action for a change, it would often query Gerrit for a bunch of info
[15:58:35] seems only having Verified+1 is not enough :-/
[15:59:37] Browser-Tests: mediawiki selenium gem creates a new user for every page created - https://phabricator.wikimedia.org/T106343#1470982 (dduvall) Open>Invalid We confirmed yesterday in IRC that configuration was the issue: `MEDIAWIKI_ENVIRONMENT` was not defined and so the default environment which has `us...
[15:59:49] it's atypical that jenkins-bot will Verified+1 though, right?
[16:00:03] it does V+1 when tests havent been run
[16:00:05] that is the check pipeline
[16:00:12] which run jobs for non whitelisted folks
[16:00:22] V+2 is granted when tests ran with the 'test' pipeline
[16:00:27] which unlock the change
[16:00:42] but I think I found the reason
[16:01:02] Zuul has been broken all time long until I changed the code-review / verified fields in layout.yaml to be upper cased
[16:01:14] need to write some test :(
[16:02:00] so yesterday issue of CR+2 changes not entering gate-and-submit was caused by https://gerrit.wikimedia.org/r/#/c/226220/
[16:02:04] which changes a bunch of labels
[16:02:18] so zuul can gerrit --label verified=2
[16:02:40] which does not work because Gerrit does not recnogize the label
[16:02:59] so I made them upper case which ends up crafting: gerrit --label Verified=2 (note uppercase Verified)
[16:03:41] the side effect is that the gate-and-submit require an approval of CR+2 for a change to be accepted. And the approvals are matched all lower() case ( https://gerrit.wikimedia.org/r/#/c/226274/2/zuul/layout.yaml,unified )
[16:03:43] that is a mess
[16:03:53] I should fill a task
[16:08:24] wow, that's a strange breaking change from upstream.
[16:12:12] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1471038 (hashar) NEW
[16:12:24] thcipriani: https://phabricator.wikimedia.org/T106531
[16:12:31] thcipriani: it has been broken on our setup since day 1 I believe
[16:12:53] but changing the labels case in zuul layout.yaml unbroke the feature
[16:13:42] thcipriani: openstack require a change to be all green (two CR+2, one Approval+1 and Verified+2) before it can enter the gate
[16:14:08] because their gate pipelines takes hours to process changes, they dont want it to be overloaded with patches that did not pass tests yet
[16:14:49] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1471071 (hashar) The reason for that behavior is OpenStack requires a change to be all green (two CR+2, one Approval+1 and Verified+2) before it...
[16:15:00] maybe we can allow changes regardless of their Verified score
[16:16:53] hashar: hmm, so we were relying on a bug in the Openstack code somewhere, effectively.
[16:17:17] or our file had a bug which disabled the feature
[16:17:29] I havent really deeply investigated, that is all my first impression
[16:18:15] gotcha, well your ticket makes sense from an outside perspective.
[16:18:29] ohh
[16:18:38] bouh
[16:24:43] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1471140 (hashar) OpenStack 'gate' pipeline has: ``` lang=yaml require: open: True current-patchset: True approv...
[16:24:50] added a few more thoughts
[16:28:05] (CR) Ejegg: [C: 1] "This looks necessary!" [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[16:29:16] thcipriani: I have poked engineering/wikitech about it
[16:29:28] work around is to have the change V+2 manually
[16:30:53] kk, I'll keep an eye on IRC today in case that comes up again.
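A minimal sketch of the label-case mismatch described above, assuming (this is not Zuul's actual code) that approval names coming from the Gerrit event are lower-cased before being compared with the names written in layout.yaml. Both function names are hypothetical and exist only for this illustration.

```python
def matches_lowered_event_only(event_approvals, required):
    """Lower-case only the event side; layout names are used as written."""
    lowered = {name.lower(): value for name, value in event_approvals.items()}
    return all(lowered.get(name) == value for name, value in required.items())


def matches_normalized(event_approvals, required):
    """Lower-case both sides, so the case used in the layout no longer matters."""
    lowered = {name.lower(): value for name, value in event_approvals.items()}
    return all(
        lowered.get(name.lower()) == value for name, value in required.items()
    )


event = {"Code-Review": 2}          # approval as Gerrit names it
layout_upper = {"Code-Review": 2}   # layout written with upper-cased labels
layout_lower = {"code-review": 2}   # layout written with lower-cased labels

print(matches_lowered_event_only(event, layout_upper))  # False
print(matches_lowered_event_only(event, layout_lower))  # True
print(matches_normalized(event, layout_upper))          # True
```

Under that assumption, a requirement spelled `Code-Review` never matches the lower-cased event key, which is consistent with CR+2 changes not entering gate-and-submit after the labels were upper-cased.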
[16:33:15] thcipriani: I will write a test to exercice it tomorrow
[16:33:21] for now I promised kids to be back home earlier
[16:33:34] I think having to vote V+2 is an acceptable workaround for today
[16:35:00] # IMPOSSIBLE
[16:35:02] return False
[16:35:04] when you see that
[16:35:10] you know the code must have some issue hehe
[16:40:57] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1471257 (hashar) When a change enters gate-and-submit, Zuul reset the Verified score to 0: ``` start: gerrit: Verified: 0 ``` T...
[16:41:08] thcipriani: found it :}
[16:41:14] and the root cause is the lame https://gerrit.wikimedia.org/r/#/c/226220/1/zuul/layout.yaml
[16:41:28] which I have set because gerrit review --label verified=2 is rejected by Gerrit
[16:41:32] it can't find such label
[16:41:33] bah
[16:42:03] I am calling it an end see you tomorrow
[16:58:56] Continuous-Integration-Infrastructure, Wikidata: create mirror for for our composer dependencies - https://phabricator.wikimedia.org/T106548#1471449 (JanZerebecki) NEW
[17:22:01] Browser-Tests, Reading-Web: Failed Jenkins job sets Sauce Labs job to passed - https://phabricator.wikimedia.org/T105589#1447336 (Jdlrobson)
[17:25:17] PROBLEM - Puppet failure on nodepool-t105406 is CRITICAL 100.00% of data above the critical threshold [0.0]
[17:40:12] This is mysterious... https://integration.wikimedia.org/ci/job/php-composer-validate/1646/console
[17:40:31] I just created a new mediawiki/vendor branch for fundraising deployment, is there anything special I have to do?
[17:50:50] what is going on, jenkins is not testing/merging https://gerrit.wikimedia.org/r/#/c/226314/ ?
[17:52:44] Nikerabbit: Try "recheck"
[17:55:32] Release-Engineering, operations, Database: Audit all existing code to ensure that any extension currently or previously adding blobs to ES has been registering a reference in the text table (and fix up if wrong) - https://phabricator.wikimedia.org/T106388#1471749 (Legoktm)
[17:58:21] awight: nope, it is not running the tests
[17:58:56] only 2 of 4 compared to https://gerrit.wikimedia.org/r/#/c/213232/
[18:00:38] It's also not merging, even after V+2 CR+2
[18:05:38] PROBLEM - Free space - all mounts on integration-slave-trusty-1015 is CRITICAL integration.integration-slave-trusty-1015.diskspace._mnt.byte_percentfree (<10.00%)
[18:33:11] Continuous-Integration-Infrastructure, Wikidata: create mirror for for our composer dependencies - https://phabricator.wikimedia.org/T106548#1471904 (JanZerebecki)
[18:48:24] Continuous-Integration-Infrastructure, Wikidata: create mirror for for our composer dependencies - https://phabricator.wikimedia.org/T106548#1471958 (Legoktm) Last time I played with satis, it was just a packagist.org mirror, not a mirror of the actual archives. https://toranproxy.com/ looks more like wh...
[18:48:35] Continuous-Integration-Infrastructure, Wikidata, Composer: create mirror for for our composer dependencies - https://phabricator.wikimedia.org/T106548#1471959 (Legoktm)
[18:51:19] Deployment-Systems, operations, HHVM: HHVM lock-ups - https://phabricator.wikimedia.org/T89912#1471986 (swtaarrs) I'm not aware of any fixes for this specific issue. I had the original author of StatCache take a look at @BBlack's comments and he said it shouldn't be possible without some kind of memory...
[19:21:45] Deployment-Systems: Make sync-wikiversions check that a valid localisation cache exists when syncing new versions - https://phabricator.wikimedia.org/T100573#1472140 (mmodell) We could/should also check for `wmf-config/ExtensionMessages-$v.php`
[20:10:25] (PS1) 20after4: Check for l10n cache before sync-wikiversions [tools/scap] - https://gerrit.wikimedia.org/r/226353 (https://phabricator.wikimedia.org/T100573)
[20:14:03] Deployment-Systems, MediaWiki-extensions-LocalisationUpdate, I18n: l10nupdate fails since 2015-07-14 - https://phabricator.wikimedia.org/T106460#1472475 (mmodell) @legoktm: I think I did that accidentally.
[20:23:49] Deployment-Systems, Patch-For-Review: Make sync-wikiversions check that a valid localisation cache exists when syncing new versions - https://phabricator.wikimedia.org/T100573#1472508 (Reedy) >>! In T100573#1472140, @mmodell wrote: > We could/should also check for `wmf-config/ExtensionMessages-$v.php` Go...
[20:26:23] (CR) Reedy: "Woo :D" (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/226353 (https://phabricator.wikimedia.org/T100573) (owner: 20after4)
[20:30:24] Continuous-Integration-Infrastructure, Wikimedia-Git-or-Gerrit, Zuul: gerrit review --label verified=1 causes a SQL error - https://phabricator.wikimedia.org/T106596#1472547 (hashar) NEW
[20:30:54] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1472557 (hashar)
[20:47:24] Continuous-Integration-Infrastructure, Wikimedia-Git-or-Gerrit, Zuul: gerrit review --label verified=1 causes a SQL error - https://phabricator.wikimedia.org/T106596#1472608 (hashar) I checked on OpenStack and both works: ``` $ ssh -p 29418 review.openstack.org 'gerrit review --label code-review=-1 20...
[20:49:57] Continuous-Integration-Infrastructure, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1472629 (hashar) The reason I changed the labels to be upper case is because `gerrit review --label verified=-1` causes a SQL error (T106596)....
[20:55:10] Release-Engineering: Testing: where does it hurt? - https://phabricator.wikimedia.org/T106600#1472646 (ggellerman) NEW a:dduvall
[20:57:20] Continuous-Integration-Infrastructure, Wikimedia-Git-or-Gerrit, Zuul: gerrit review --label verified=1 causes a SQL error - https://phabricator.wikimedia.org/T106596#1472659 (hashar) The OpenStack Gerrit forks at https://review.openstack.org/p/openstack-infra/gerrit.git does not seem to have anything...
[21:04:01] (PS1) Hashar: Revert "Fix passing labels to Gerrit when they are not defined in All-Projects" [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/226415 (https://phabricator.wikimedia.org/T106531)
[21:04:39] thcipriani: hey
[21:04:45] thcipriani: following this morning rant
[21:05:00] thcipriani: I will revert the fault change in Zuul. It is just one line
[21:05:10] thcipriani: and revert back the layout.yaml file to use lower case code-review and verified
[21:05:49] and I will not deploy it tonight :-}
[21:06:25] that's probably good.
[21:06:26] (PS2) Hashar: Revert "Fix passing labels to Gerrit when they are not defined in All-Projects" [integration/zuul] (patch-queue/debian/precise-wikimedia) - https://gerrit.wikimedia.org/r/226415 (https://phabricator.wikimedia.org/T106531)
[21:07:43] Continuous-Integration-Infrastructure, Patch-For-Review, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1472699 (hashar) p:Triage>Unbreak!
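A small sketch of how the `gerrit review --label` invocation quoted in the log can be assembled with an explicit label case. The helper name and the host `gerrit.example.org` are placeholders for illustration; port 29418 and the `--label NAME=VALUE` form are taken from the commands pasted above, and the `change,patchset` argument follows Gerrit's SSH command syntax.

```python
import shlex


def gerrit_review_command(host, change, patchset, labels, port=29418):
    """Build the ssh argv for a `gerrit review` call; label names are passed through unchanged."""
    parts = ["gerrit", "review"]
    for name, value in labels.items():
        parts += ["--label", f"{name}={value}"]
    parts.append(f"{change},{patchset}")
    # The remote command travels as one string, so quote each piece for the remote shell.
    remote = " ".join(shlex.quote(part) for part in parts)
    return ["ssh", "-p", str(port), host, remote]


cmd = gerrit_review_command("gerrit.example.org", 226274, 2, {"Verified": 2})
print(cmd[-1])  # gerrit review --label Verified=2 226274,2
```

Keeping the label spelling in one place like this makes it easy to see whether `Verified=2` or `verified=2` is what actually reaches Gerrit.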
[21:09:04] Continuous-Integration-Infrastructure, Patch-For-Review, Zuul: Changes voted CR+2 do not enter gate-and-submit unless Verified is +2. - https://phabricator.wikimedia.org/T106531#1471038 (hashar)
[21:09:07] Continuous-Integration-Infrastructure, operations: Upload new Zuul .deb package on apt.wikimedia.org for precise-wikimedia - https://phabricator.wikimedia.org/T106499#1472709 (hashar)
[21:09:34] hashar: wow, went on a bit of a spree, eh? https://review.openstack.org/#/c/204499/
[21:09:59] Continuous-Integration-Infrastructure, operations: Upload new Zuul .deb package on apt.wikimedia.org for precise-wikimedia - https://phabricator.wikimedia.org/T106499#1472719 (hashar) Open>stalled There is some nasty regression in it so marking T106531 as a blocker. Will craft a new package with //...
[21:20:18] (PS1) Paladox: SyntaxHighlight_GeSHi run old test for branchs REL1_23, REL1_24, REL1_25 [integration/config] - https://gerrit.wikimedia.org/r/226418
[21:20:38] (PS2) Paladox: SyntaxHighlight_GeSHi run old test for branchs REL1_23, REL1_24, REL1_25 [integration/config] - https://gerrit.wikimedia.org/r/226418
[21:38:18] (CR) Hashar: "This patch is non sense :-) Please stop copy pasting random bits and hoping it might do something useful!" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/226418 (owner: Paladox)
[21:38:32] hashar: -2 it
[21:38:49] Reedy: CI would not let it sneak in :-}
[21:39:07] Reedy: Zuul gonna complain about the nonsense!
[21:40:30] (PS3) Awight: Update vendor using composer rather than cloning the deployment repo [integration/config] - https://gerrit.wikimedia.org/r/221310
[21:42:42] thcipriani: that upstream change at https://review.openstack.org/#/c/204499/ is nonsense really :-}
[21:42:51] thcipriani: it is just to run the test suite on their side hehe
[21:43:10] it fails everything http://logs.openstack.org/99/204499/1/check/gate-zuul-python27/906a04d/testr_results.html.gz
[21:44:08] thcipriani: will revert the faulty Zuul commit that is the root cause of all the madness. Will do tomorrow and write some wiki doc :-)
[21:45:39] hashar: sounds good. Would like to tag along for the next upgrade. Steal some of your knowledge.
[21:55:32] (CR) JanZerebecki: [C: -1] "I like the intent of this patch." [integration/config] - https://gerrit.wikimedia.org/r/221310 (owner: Awight)
[21:57:02] thcipriani: definitely
[21:57:17] for now it is sleep() :-}
[22:28:10] (PS2) JanZerebecki: Change time of Wikidata browser test to run at 01:00 UTC [integration/config] - https://gerrit.wikimedia.org/r/224998 (https://phabricator.wikimedia.org/T105985)
[22:32:01] (CR) JanZerebecki: [C: 2] "Updated Jenkins jobs: (['browsertests-Wikidata-PerformanceTests-linux-firefox-sauce', 'browsertests-Wikidata-SmokeTests-linux-firefox-sauc" [integration/config] - https://gerrit.wikimedia.org/r/224998 (https://phabricator.wikimedia.org/T105985) (owner: JanZerebecki)
[22:34:12] (Merged) jenkins-bot: Change time of Wikidata browser test to run at 01:00 UTC [integration/config] - https://gerrit.wikimedia.org/r/224998 (https://phabricator.wikimedia.org/T105985) (owner: JanZerebecki)
[22:49:15] (CR) JanZerebecki: [C: 2] "Yay, tests!" [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[22:49:30] (PS8) JanZerebecki: tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[22:49:38] (CR) JanZerebecki: [C: 2] tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[22:53:07] (CR) JanZerebecki: [C: 2] tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[22:54:40] (Merged) jenkins-bot: tests: consistency of Zuul pipelines declarations [integration/config] - https://gerrit.wikimedia.org/r/224517 (owner: Hashar)
[23:11:34] (PS6) JanZerebecki: Publish php docs for MobileFrontend [integration/config] - https://gerrit.wikimedia.org/r/223527 (https://phabricator.wikimedia.org/T105134) (owner: Florianschmidtwelzow)
[23:29:46] Browser-Tests, Collaboration-Team, Echo: 'I am logged in as a new user with no notifications' doesn't always work on browser tests - https://phabricator.wikimedia.org/T102177#1473369 (Catrope) p:Triage>Normal
[23:29:51] Release-Engineering, Analytics-Backlog, Performance-Team, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1473368 (ori) To investigate this, @Krinkle and I collected 10 minutes' worth of requests to `poweredby...
[23:30:35] PROBLEM - Free space - all mounts on integration-slave-trusty-1015 is CRITICAL integration.integration-slave-trusty-1015.diskspace._mnt.byte_percentfree (<20.00%)
[23:31:45] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL 100.00% of data above the critical threshold [0.0]
[23:37:13] Release-Engineering, Analytics-Backlog, operations, Varnish: Verify traffic to static resources from past branches does indeed drain - https://phabricator.wikimedia.org/T102991#1473421 (ori)