[00:03:03] Continuous-Integration-Infrastructure: update phpunit requirement to one that doesn't use the deprecated each() - https://phabricator.wikimedia.org/T178859#3705243 (MarkAHershberger)
[02:16:17] Yeah, beta cluster is up again :)
[02:42:26] PROBLEM - Puppet errors on deployment-sca01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[03:22:25] RECOVERY - Puppet errors on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:01:00] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 2 Objective 3: Promote and surface important technical debt topics at large gatherings of Wikimedia developers (e.g., DevSummit and Hackathon(s)) - https://phabricator.wikimedia.org/T174096#3705461 (Jrbranaa) a:Jrbranaa
[05:01:06] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3: Addressing technical debt - https://phabricator.wikimedia.org/T174087#3705463 (Jrbranaa) a:Jrbranaa
[05:01:08] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 1: The amount of orphaned code that is running Wikimedia “production” services is reduced. - https://phabricator.wikimedia.org/T174088#3705465 (Jrbranaa) a:Jrbranaa
[05:01:30] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 1: The amount of orphaned code that is running Wikimedia “production” services is reduced. - https://phabricator.wikimedia.org/T174088#3705469 (Jrbranaa)
[05:01:39] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 1: The amount of orphaned code that is running Wikimedia “production” services is reduced. - https://phabricator.wikimedia.org/T174088#3550622 (Jrbranaa)
[05:01:42] Release-Engineering-Team (Kanban), Epic, RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3705472 (Jrbranaa) Resolved>Open
[05:02:02] Release-Engineering-Team (Kanban), releng-201718-q3, Epic, RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 2: Identify and find stewards for high-priority/high use code segment orphans - https://phabricator.wikimedia.org/T174091#3705473 (Jrbranaa) a:Jrbranaa
[05:02:56] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 2: Organizational technical debt is reduced. - https://phabricator.wikimedia.org/T174089#3705475 (Jrbranaa) a:Jrbranaa
[05:03:10] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 2 Objective 1: Define a “Technical Debt Project Manager” role that regularly communicates with all Foundation engineering teams regarding their technical debt - https://phabricator.wikimedia.org/T174093#3705478 (Jrbranaa) a:Jrbranaa
[05:03:35] Release-Engineering-Team (Kanban), Epic: FY2017/18 Program 3 Outcome 2 Objective 2: Define and implement a process to regularly address technical debt across the Foundation - https://phabricator.wikimedia.org/T174095#3705480 (Jrbranaa) a:Jrbranaa
[05:08:35] Release-Engineering-Team (Kanban): Identify Orphaned components/code - https://phabricator.wikimedia.org/T173349#3705485 (Jrbranaa)
[05:08:37] Release-Engineering-Team (Kanban), Epic, RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3705484 (Jrbranaa)
[05:09:56] Release-Engineering-Team (Kanban): Identify Orphaned components/code - https://phabricator.wikimedia.org/T173349#3525095 (Jrbranaa)
[05:09:58] Release-Engineering-Team (Kanban), Epic, RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 1: Define a set of code stewardship levels (from high to low expectations) - https://phabricator.wikimedia.org/T174090#3550646 (Jrbranaa)
[05:10:35] Release-Engineering-Team (Kanban): Identify Orphaned components/code - https://phabricator.wikimedia.org/T173349#3525095 (Jrbranaa)
[05:10:37] Release-Engineering-Team (Kanban), releng-201718-q3, Epic, RelEng FY201718 Q2 Goals: FY2017/18 Program 3 Outcome 1 Objective 2: Identify and find stewards for high-priority/high use code segment orphans - https://phabricator.wikimedia.org/T174091#3705488 (Jrbranaa)
[07:25:50] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3704733 (Gilles) upload.beta.wmflabs.org refuses SSL connections right now, I see that it's not on that list
[07:45:03] zeljkof: good morning. Beta cluster was down last night (the text Varnish was screwed up)
[07:45:05] Yippee, build fixed!
[07:45:05] Project selenium-Echo » chrome,beta,Linux,BrowserTests build #557: FIXED in 43 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/557/
[07:45:10] Yippee, build fixed!
[07:45:10] Project selenium-Echo » firefox,beta,Linux,BrowserTests build #557: FIXED in 49 sec: https://integration.wikimedia.org/ci/job/selenium-Echo/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/557/
[07:45:13] zeljkof: I have manually triggered the selenium builds ^^ :)
[07:45:45] Yippee, build fixed!
[07:45:45] Project selenium-CentralAuth » firefox,beta,Linux,BrowserTests build #559: FIXED in 1 min 29 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/559/
[07:46:03] Yippee, build fixed!
[07:46:03] Project selenium-PageTriage » chrome,beta,Linux,BrowserTests build #554: FIXED in 1 min 38 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/554/
[07:46:09] Yippee, build fixed!
[07:46:09] Project selenium-PageTriage » firefox,beta,Linux,BrowserTests build #554: FIXED in 1 min 44 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/554/
[07:49:56] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705637 (hashar) I guess we only fixed the text cache. Puppet fails on deployment-cache-upload04.deployment-prep.eqiad.wmflabs :( ``` Error: /Stage[main]/Nginx/Package[nginx-full]/ensure...
[07:55:51] hashar: ouch
[07:56:01] Thanks for running the jobs
[07:59:21] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705644 (hashar) I have applied a similar configuration in hiera for deployment-cache-upload04 While installing nginx-extra, the service failed to restart which blocks puppet: ``` nginx...
[08:15:07] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705665 (hashar) In `profile::cache::ssl::unified` I have commented out the `tlsproxy::localssl { 'unified': ... }` to get the Varnish conf updated eg: ``` - new cache_local = vslp.vslp...
[08:24:49] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705702 (hashar) Next error: ``` Notice: /Stage[main]/Cacheproxy::Instance_pair/Varnish::Instance[upload-backend]/Exec[retry-load-new-vcl-file]/returns Command failed with error code 106...
[08:29:44] Beta-Cluster-Infrastructure, Operations, Traffic: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705726 (hashar) ``` # dpkg -S /usr/lib/x86_64-linux-gnu/varnish/vmods/libvmod_std.so varnish: /usr/lib/x86_64-linux-gnu/varnish/vmods/libvmod_std.so # apt-cache policy varnish varnish:...
[08:35:44] !log beta: cherry pick https://gerrit.wikimedia.org/r/#/c/386077/4 "hieradata for varnish caches" - T178841
[08:35:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:35:50] T178841: Beta cluster is down - https://phabricator.wikimedia.org/T178841
[08:39:51] Beta-Cluster-Infrastructure, Operations, Traffic, Patch-For-Review: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3705759 (hashar) p:Triage>Normal **Status** https://gerrit.wikimedia.org/r/#/c/386077/4 cherry picked on the beta cluster puppetmaster Puppet and Varnish...
[08:39:58] gilles: upload.beta.wmflabs.org should work fine now
[08:40:06] RECOVERY - Puppet errors on deployment-cache-upload04 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:40:08] gilles: varnish / LE ended up being completely screwed yesterday
[08:40:24] gilles: we got the text cache fixed, but not the upload one
[08:45:14] hashar: I put some ideas about publishing Maven sites in https://gerrit.wikimedia.org/r/#/c/385984/
[08:45:57] hashar: no idea if it actually works or how to test it. I'm planning some time on Friday to dig into it a bit more, but if you have time to have a look at it, that would be great! (with a really low priority)
[08:49:50] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[09:08:48] Continuous-Integration-Infrastructure (shipyard): Document minimum required version of docker to build CI images - https://phabricator.wikimedia.org/T178821#3705816 (Addshore) > Multi-stage builds are a new feature requiring Docker 17.05 or higher on the daemon and client.
[09:11:10] Continuous-Integration-Infrastructure (shipyard): Document minimum required version of docker to build CI images - https://phabricator.wikimedia.org/T178821#3705817 (Addshore) @Legoktm looks like you are looking at the wrong package? https://github.com/docker/for-linux/issues/35 https://download.docker.com...
[10:14:18] (PS1) Aude: Bump wikidata [tools/release] - https://gerrit.wikimedia.org/r/386149
[10:14:39] (CR) Aude: [C: 2] Bump wikidata [tools/release] - https://gerrit.wikimedia.org/r/386149 (owner: Aude)
[10:15:11] (Merged) jenkins-bot: Bump wikidata [tools/release] - https://gerrit.wikimedia.org/r/386149 (owner: Aude)
[10:33:26] PROBLEM - Puppet errors on deployment-tin is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[10:34:29] PROBLEM - Puppet errors on deployment-cpjobqueue is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[10:53:55] PROBLEM - Puppet errors on deployment-mira is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[11:08:27] RECOVERY - Puppet errors on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[11:19:22] !log removed several roles mistakenly applied to puppet prefix deployment-aqs in Horizon (causing puppet failures for AQS nodes)
[11:19:27] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:30:00] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:31:47] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706058 (mobrovac)
[11:32:08] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706071 (mobrovac)
[11:33:53] RECOVERY - Puppet errors on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[11:44:05] RECOVERY - Puppet errors on deployment-aqs02 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:47:14] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706146 (mobrovac) The problem is introduced in rMSCAccea24641f77b59211ec3c8f51c94f91e344b6b9, which [sets `--jobs `](https://phabricator.wikimedia.org/source/scap/browse/master/sc...
[11:49:49] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:54:36] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706167 (mobrovac) p:High>Unbreak! I manually patched `scap/git.py` on `deployment-cpjobqueue`, which made the error go away. However, git's version in Beta is v2.1.4, while t...
[12:23:09] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:42:12] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706058 (Paladox) I think we should backport git 2.11 to wikimedia-jessie? That way all hosts that have it will install the update.
[12:43:39] (CR) Hashar: "recheck" [integration/config] - https://gerrit.wikimedia.org/r/386070 (owner: MacFan4000)
[12:49:29] (CR) Hashar: [C: 2] Rm numerous extensions [integration/config] - https://gerrit.wikimedia.org/r/386070 (owner: MacFan4000)
[12:49:34] (CR) Hashar: [C: 2] "Thanks :)" [integration/config] - https://gerrit.wikimedia.org/r/386070 (owner: MacFan4000)
[12:50:34] (Merged) jenkins-bot: Rm numerous extensions [integration/config] - https://gerrit.wikimedia.org/r/386070 (owner: MacFan4000)
[13:00:29] Beta-Cluster-Infrastructure, Browser-Tests-Infrastructure, Release-Engineering-Team (Kanban), Ruby, and 2 others: Make selenium users use botflags at beta-cluster - https://phabricator.wikimedia.org/T116027#3706366 (zeljkofilipin) Sorry, I am still not sure what to do. Let me explicitly state wha...
[13:02:05] hashar i thought with https://github.com/wikimedia/puppet/commit/b3c6968b3cb81670b58f375480296aa6813fd354#diff-bca6bd045a1cd579d6827b234cdf2f74 we moved from /mnt to /srv
[13:02:14] but puppet keeps bringing it back in /mnt
[13:02:22] it also does it in /srv
[13:16:57] ah i had to unmount it
[13:17:05] ssssss
[13:17:19] paladox: yeah you gotta manually remove /mnt from /etc/fstab as well
[13:17:26] oh
[13:17:49] else if you reboot it will fail to find the partition
[13:17:53] and the instance will not boot :(
[13:18:00] ah thanks
[13:18:01] :)
[13:19:07] paladox: you can also check whether puppet knows about /mnt by running on each instance: grep /mnt /var/lib/puppet/state/resources.txt
[13:19:17] oh
[13:19:18] you would just have : file[/mnt/nfs]
[13:19:19] thanks
[13:19:38] and double check /mnt is no more in the list of partitions mounted at boot: grep /mnt /etc/fstab
[13:20:05] thanks
[13:24:18] PROBLEM - Puppet errors on integration-slave-docker-1006 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:24:37] gehel: still around? :-) I thought highlighter required javadoc::aggregate to run https://gerrit.wikimedia.org/r/#/c/385984/3/jjb/search.yaml
[13:24:57] gehel: but maybe the goals install site site:stage handles it for us
[13:25:08] yes and no...
[13:25:09] (PS4) Hashar: Publish full maven site [integration/config] - https://gerrit.wikimedia.org/r/385984 (owner: Gehel)
[13:25:20] we can still try :]
[13:26:09] basically, I'm using a different approach. site:stage will do aggregation
[13:27:00] not exactly in the same way, but the result is similar. And if we want a single aggregated javadoc, we should do that by configuring the project (pom.xml), not the build pipeline
[13:30:03] PROBLEM - Puppet errors on integration-slave-docker-1002 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[13:33:12] PROBLEM - Puppet errors on integration-slave-docker-1005 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[13:33:42] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[13:33:51] (CR) Hashar: [C: 2] "Deployed:" [integration/config] - https://gerrit.wikimedia.org/r/385984 (owner: Gehel)
[13:35:56] (Merged) jenkins-bot: Publish full maven site [integration/config] - https://gerrit.wikimedia.org/r/385984 (owner: Gehel)
[13:37:16] PROBLEM - Puppet errors on integration-slave-docker-1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[13:41:48] hashar: is there an easy way to trigger a post merge build to test?
[13:43:01] gehel: not really
[13:43:06] gehel: but i can do it from the zuul server
[13:43:20] the changes I triggered are:
[13:43:28] https://gerrit.wikimedia.org/r/#/c/386002/ https://gerrit.wikimedia.org/r/#/c/386004/ https://gerrit.wikimedia.org/r/#/c/384508/
[13:43:31] I have a few patches which need merging on one of the projects, let's see what it does...
[13:43:37] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[13:43:39] https://gerrit.wikimedia.org/r/#/c/384567/ https://gerrit.wikimedia.org/r/#/c/331525/ https://gerrit.wikimedia.org/r/#/c/366238/
[13:43:55] and maybe we could use a magic keyword like "repostmerge" or "republish"
[13:43:58] that would trigger the run
[13:44:14] hashar: all those will be fine!
[13:44:17] anyway they are running :)
[13:44:30] https://integration.wikimedia.org/ci/job/discovery-maven-tool-configs-maven-site-publish/1/console is a success at least :]
[13:44:32] I mean all those builds you re-triggered
[13:44:40] https://doc.wikimedia.org/discovery-maven-tool-configs/ !!!
[13:44:55] https://doc.wikimedia.org/discovery-maven-tool-configs/
[13:45:03] yeah, I saw the same! Cool!
[13:45:06] and if you add some custom css you would have super nice sites out of the box (as i got it)
[13:45:07] !!
[13:45:23] Now let's see if anyone is interested in that content...
[13:45:59] * gehel hates CSS, so he is not even going to try until someone tells him that this content is indeed interesting!
[13:46:04] gehel: we have some access logs on contint1001
[13:46:23] else that is behind the misc varnish, so maybe the browsing habits end up in the analytics stack somewhere
[13:47:17] I'm more relying on my team telling me they like it than on stats. There are only a handful of people for whom those docs make sense
[13:47:47] btw, where does https://doc.wikimedia.org/ come from? It doesn't look to be auto-generated (I can't find discovery-parent-pom on it)
[13:47:58] gehel: https://gerrit.wikimedia.org/r/#/c/384567/ fails
[13:48:11] the docroot is in the repo integration/docroot.git
[13:48:17] and indeed the entries are manually added :(
[13:48:24] nobody has found the time yet to autogenerate them
[13:48:35] door ringing brb
[13:50:14] that failure is interesting. the "install" phase should have taken care of it, but maybe it needs 2 consecutive executions. I'll check
[13:52:25] hashar: yep, it looks like it actually needs `mvn clean install && mvn site site:generate`
[13:53:18] * gehel has no idea how to have 2 consecutive maven runs in the same job
[13:56:20] :((
[13:56:42] gehel: sounds like a bug :]
[13:57:14] or at least a request for enhancement...
[13:57:36] at least 50% of them worked!
[13:59:45] gehel: Jenkins archives the build artifacts, so maybe we can trigger a second job that would use those artifacts and then run the site goal
[14:00:33] nah, I should be able to get all that to run in the same maven execution
[14:01:26] gehel: and the repository-swift one complains with : 00:00:53.569 [WARNING] The repository url 'http://download.java.net/maven/2/' is invalid - Repository 'maven2-repository.dev.java.net' will be blacklisted.
[14:01:29] 00:00:56.249 [ERROR] Unable to connect to: http://maven.elasticsearch.org/releases
[14:01:48] Oh, repository-swift is mostly dead
[14:01:56] :D
[14:02:12] and has not been updated for ages, so let's just not care about it
[14:02:34] we're not using that code anymore, so we've offered the project for adoption
[14:02:34] and search/ltr has the same issue https://integration.wikimedia.org/ci/job/search-ltr-maven-site-publish/1/console
[14:03:48] Could not resolve dependencies for project org.wikimedia.search.highlighter:experimental-highlighter-lucene:jar:5.5.2.3-SNAPSHOT: Could not find artifact org.wikimedia.search.highlighter:experimental-highlighter-core:jar:5.5.2.3-SNAPSHOT
[14:03:52] that is for search/highlighter on https://gerrit.wikimedia.org/r/#/c/384567/
[14:04:48] Oops, ltr has not been moved to our parent pom yet, my mistake
[14:04:56] Release-Engineering-Team (Kanban), Readers-Web-Backlog, RelatedArticles, Browser-Tests, and 4 others: Automated browser tests cannot create pages on the Beta Cluster as anonymous user in RelatedArticles tests - https://phabricator.wikimedia.org/T176315#3706521 (zeljkofilipin) @Jdlrobson sorry, @h...
[14:07:31] gehel: anyway the jobs run after the change got merged. So it is not a blocker to development, just a distraction when the failure is reported
[14:07:37] I guess it is good enough
[14:08:00] yep, I'll have a look on my side, I should be able to fix the issues in the project itself.
[14:08:08] I'll take some time on Friday to see...
[14:08:43] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:10:44] hashar: damn, bug open since 2006: https://issues.apache.org/jira/browse/MSITE-171
[14:16:14] Gerrit, Release-Engineering-Team (Backlog), Zuul: Update zuul to upstream master - https://phabricator.wikimedia.org/T158243#3706533 (Paladox) This needs to be done so we can do T140366 and T178385
[14:21:44] presente!
[14:21:53] (lol, wrong channel)
[14:30:16] (Draft1) Paladox: Backport fix "Fix change number extraction on new enough Gerrit master" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/386194 (https://phabricator.wikimedia.org/T158243)
[14:33:36] (PS1) Paladox: Allows users to merge branches on refs/for/* [integration/zuul] (refs/meta/config) - https://gerrit.wikimedia.org/r/386201
[14:33:53] hashar I'm wondering, could you review please ^^ :)
[14:36:09] though i guess i can create a .patch to do what i want :)
[14:36:13] backporting a fix
[14:37:08] paladox: I am not going to upgrade Zuul anytime soon
[14:37:24] ok
[14:37:41] I can't afford to context switch and redo the way the package is being done. Sorry
[14:37:58] ok
[14:46:42] (CR) Paladox: "recheck" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/338620 (owner: Paladox)
[14:47:04] (CR) Paladox: "recheck" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/356181 (owner: Paladox)
[14:47:30] (Abandoned) Hashar: docker: fix pip cache permissions [integration/config] - https://gerrit.wikimedia.org/r/385993 (owner: Hashar)
[14:49:54] (Abandoned) Hashar: castor in a container [integration/config] - https://gerrit.wikimedia.org/r/386053 (owner: Hashar)
[14:50:53] (Abandoned) Paladox: Backport fix "Fix change number extraction on new enough Gerrit master" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/386194 (https://phabricator.wikimedia.org/T158243) (owner: Paladox)
[14:50:54] (PS7) Paladox: Backport fix "Fix change number extraction on new enough Gerrit master" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/356181 (https://phabricator.wikimedia.org/T158243)
[15:04:51] Ah i found the blocking change https://gerrit-review.googlesource.com/?polygerrit=0#/c/gerrit/+/95250/
[15:09:26] Scap: sync-wikiversions reporting success when all hosts failed - https://phabricator.wikimedia.org/T78024#3706776 (demon) Open>Resolved a:demon Should be fixed.
[15:13:09] i guess we could revert https://gerrit-review.googlesource.com/#/c/gerrit/+/95250/ && https://gerrit-review.googlesource.com/#/c/plugins/its-base/+/96131/ tmp?
[15:13:40] no_justification ^^. There shouldn't be any side effects as i've been looking around.
[15:14:34] Bleh.
[15:15:04] it is just correcting the behaviour so instead of it doing a string inside the code it is being an int.
[15:15:22] (PS7) Hashar: Castor for Docker jobs [integration/config] - https://gerrit.wikimedia.org/r/385390
[15:19:01] (CR) Hashar: [C: -2] "check experimental" [integration/config] - https://gerrit.wikimedia.org/r/385390 (owner: Hashar)
[15:20:35] no_justification as a 2.14 branch hasn't been created for its-base yet (i've requested one) https://gerrit-review.googlesource.com/#/c/plugins/its-base/+/135972/ https://gerrit-review.googlesource.com/#/c/plugins/its-base/+/135992/
[15:20:44] 00:00:00.160 + docker run -rm --tty
[15:20:44] 00:00:00.185 unknown shorthand flag: 'r' in -rm
[15:20:45] yes
[15:20:48] I am doomed
[15:21:04] those two changes are reverts as one of them removes draft support which we don't need to do in 2.14 so i reverted it
[15:23:07] hashar: --rm
[15:24:39] and https://gerrit-review.googlesource.com/#/c/gerrit/+/136013/
[15:24:41] for gerrit core
[15:24:48] i've put it on the stable-2.14 branch
[15:55:24] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[16:26:23] Continuous-Integration-Infrastructure, Patch-For-Review, User-Addshore: un blacklist https://integration.wikimedia.org/ci/computer/XXXX/builds - https://phabricator.wikimedia.org/T178458#3707103 (Addshore) So, we merged and then reverted. For hosts with few jobs run this change is absolutely fine. Fo...
[16:27:07] mobrovac: re: scap breakage on beta... I merged a change recently that affects the way the submodules get cloned. Which repo are you seeing problems with? I'll take a look and see what's up with it
[16:27:41] twentyafterfour: everything you need to know is in https://phabricator.wikimedia.org/T178884, all services repos are affected
[16:27:52] (dunno for MW)
[16:27:54] mobrovac: thanks looking
[16:29:13] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707111 (mmodell) we already have git 2.11 backported to wikimedia jessie, I'm not sure why it isn't in beta...
[16:29:42] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[16:31:39] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707114 (mmodell) hmm, I'm not sure how to force an upgrade but the package is upgradeable on `cpjobqueue`
[16:34:43] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707131 (mobrovac) @mmodell I should point out that `deployment-cpjobqueue` was a new instance I spun today, so the expectation should have been that the package was already installed...
[16:36:18] Release-Engineering-Team, Scap: Beta cluster scap deployment: TypeError: execv() arg 2 must contain only strings - https://phabricator.wikimedia.org/T178922#3707141 (bearND)
[16:37:48] Release-Engineering-Team, Scap: Beta cluster scap deployment: TypeError: execv() arg 2 must contain only strings - https://phabricator.wikimedia.org/T178922#3707160 (mobrovac)
[16:37:50] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707163 (mobrovac)
[16:37:52] Release-Engineering-Team, Scap: Beta cluster scap deployment: TypeError: execv() arg 2 must contain only strings - https://phabricator.wikimedia.org/T178922#3707164 (bearND)
[16:37:54] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707165 (bearND)
[16:38:17] Release-Engineering-Team, Scap: Beta cluster scap deployment: TypeError: execv() arg 2 must contain only strings - https://phabricator.wikimedia.org/T178922#3707141 (bearND) Duh, just noticed this ticket after I submitted mine.
[16:41:19] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707179 (mmodell) I expected that the same git version would be installed in prod and beta, but apparently not. I'm looking into it
[16:48:28] Continuous-Integration-Infrastructure, Patch-For-Review, User-Addshore: un blacklist https://integration.wikimedia.org/ci/computer/XXXX/builds - https://phabricator.wikimedia.org/T178458#3707210 (Paladox) I believe we may see more performance improvements with the next big lts release https://issues.j...
[16:52:06] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3706058 (demon) Also: I'm also looking at my `utils.cpus_for_jobs()` I introduced in rMSCAccea2. This returns a number of cores as an integer. This matched existing behavior, but othe...
[17:04:39] i can test with building those changes :).
[17:04:45] (gerrit)
[17:05:03] as i want to backport its-base soy change for 2.14 as it hasn't been merged yet.
[17:09:43] RECOVERY - Puppet errors on deployment-redis02 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:11:16] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707395 (mmodell) ``` twentyafterfour@deployment-parsoid09:~$ apt-cache policy git git: Installed: 1:2.11.0-2~bpo8+1 Candidate: 1:2.11.0-2~bpo8+1 Version table: 1:2.11.0-3~...
[17:14:23] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707400 (mmodell) ``` twentyafterfour@deployment-sca02:~$ apt-cache policy git git: Installed: 1:2.1.4-2.1+deb8u5 Candidate: 1:2.11.0-2~bpo8+1 Version table: 1:2.11.0-3~bpo...
[17:17:18] Release-Engineering-Team (Kanban), Readers-Web-Backlog, RelatedArticles, Browser-Tests, and 4 others: Automated browser tests cannot create pages on the Beta Cluster as anonymous user in RelatedArticles tests - https://phabricator.wikimedia.org/T176315#3707410 (Jdlrobson) Well the page is just a...
[17:21:37] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707424 (demon) Still need to sort out the git packaging issue as we're moving from a soft to hard requirement on git 2.11. But the above commit will at least fix the str/int issue.
[17:36:00] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [17:40:51] PROBLEM - Puppet errors on deployment-aqs03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:53:26] PROBLEM - Puppet errors on deployment-sca02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [17:54:55] Project beta-scap-eqiad build #179006: 04FAILURE in 1 min 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179006/ [17:55:55] [45583c62cfadcebf7f5d53b5] [no req] RuntimeException from line 494 of /srv/mediawiki-staging/php-master/includes/registration/ExtensionProcessor.php: The configuration setting 'wgContributionTrackingFundraiserMaintenance' was already set by another extension, and cannot be set again. [18:04:46] Project beta-scap-eqiad build #179007: 04STILL FAILING in 1 min 6 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179007/ [18:10:59] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:13:27] RECOVERY - Puppet errors on deployment-sca02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:14:46] Project beta-scap-eqiad build #179008: 04STILL FAILING in 1 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179008/ [18:18:40] 10Continuous-Integration-Infrastructure, 10Patch-For-Review, 10User-Addshore: un blacklist https://integration.wikimedia.org/ci/computer/XXXX/builds - https://phabricator.wikimedia.org/T178458#3707627 (10Krinkle) [18:20:09] Project beta-update-databases-eqiad build #20861: 04FAILURE in 6.8 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/20861/ [18:20:50] RECOVERY - Puppet errors on deployment-aqs03 is OK: OK: Less than 1.00% above the threshold [0.0] [18:24:39] Project beta-scap-eqiad build #179009: 04STILL FAILING in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179009/ [18:26:05] PROBLEM - Free 
space - all mounts on integration-slave-jessie-1002 is CRITICAL: CRITICAL: integration.integration-slave-jessie-1002.diskspace._mnt.byte_percentfree (No valid datapoints found) integration.integration-slave-jessie-1002.diskspace._srv.byte_percentfree (<100.00%)
[18:32:39] no_justification: can you look at scap failing in beta?
[18:32:48] I'm looking into it already
[18:34:35] well then
[18:34:40] :P
[18:35:05] Project beta-scap-eqiad build #179010: STILL FAILING in 1 min 23 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179010/
[18:36:29] Continuous-Integration-Infrastructure (shipyard): Document minimum required version of docker to build CI images - https://phabricator.wikimedia.org/T178821#3707708 (Legoktm) Thanks, I was able to build containers now! I was using the docker in the official Fedora repos, but I guess that's really outdated no...
[18:37:14] Nothing new in that codebase
[18:39:24] Whatever happened was between 17:46 and 17:54
[18:40:21] Only thing merged in that window in gerrit was https://gerrit.wikimedia.org/r/#/c/371947/
[18:40:27] But that seems unlikely
[18:44:44] Project beta-scap-eqiad build #179011: STILL FAILING in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179011/
[18:54:54] Project beta-scap-eqiad build #179012: STILL FAILING in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179012/
[18:56:28] oh ffs
[18:56:59] no_justification: it's https://gerrit.wikimedia.org/r/#/c/374334/
[18:57:12] which is doing its job, ContributionTracking is doing something broken
[18:57:25] Ah, that would do it!
[18:57:50] I put up https://gerrit.wikimedia.org/r/#/c/386251/ to unbreak it now, but there's probably a better fix
[19:00:52] someone needs to decide whether those variables should be set in ContributionTracking's extension.json or FundraiserLandingPage's extension.json
[19:00:54] not both
[19:01:05] is the core patch in today's deploy branch?
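The RuntimeException above comes down to one setting name being declared by two extensions. Below is a minimal sketch of how such clashes could be spotted before a deploy. It assumes each extension.json has already been parsed into a dict with a `config` mapping, which is a simplification. `find_duplicate_settings`, the sample manifests and their values are hypothetical and are not MediaWiki code.

```python
from collections import defaultdict


def find_duplicate_settings(manifests):
    """Return {setting name: [extensions defining it]} for names defined more than once.

    `manifests` is a hypothetical, simplified stand-in for parsed
    extension.json files: {extension name: {"config": {setting: value}}}.
    """
    owners = defaultdict(list)
    for ext_name, manifest in manifests.items():
        for setting in manifest.get("config", {}):
            owners[setting].append(ext_name)
    # Keep only the settings claimed by two or more extensions.
    return {name: exts for name, exts in owners.items() if len(exts) > 1}


# Illustrative data only: the values and the second setting are made up.
manifests = {
    "ContributionTracking": {
        "config": {"ContributionTrackingFundraiserMaintenance": False},
    },
    "FundraiserLandingPage": {
        "config": {
            "ContributionTrackingFundraiserMaintenance": False,
            "SomeUnrelatedSetting": True,
        },
    },
}

duplicates = find_duplicate_settings(manifests)
for setting, exts in sorted(duplicates.items()):
    print(f"wg{setting} is set by: {', '.join(exts)}")
```

A check like this only reports the clash. Deciding which extension should own the setting is still the human decision discussed in the channel.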
[19:01:18] I assume it's also going to blow up in prod
[19:01:58] legoktm: doesn't look like it's in anything other than master
[19:02:06] according to gerrit's included in
[19:02:13] ok, I'm filing a bug now
[19:04:49] https://phabricator.wikimedia.org/T178940
[19:04:50] Project beta-scap-eqiad build #179013: STILL FAILING in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179013/
[19:06:43] (PS8) Hashar: Castor for Docker jobs [integration/config] - https://gerrit.wikimedia.org/r/385390
[19:09:07] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3707835 (mmodell) so it seems that most beta hosts already had git 2.11, however, a few (e.g. `deployment-sca02`) did not. @madhuvishy ran `apt-get install git` for us with cumin, so...
[19:11:46] Project beta-scap-eqiad build #179014: STILL FAILING in 1 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179014/
[19:14:29] (CR) Chad: [C: 1] Backport fix "Fix change number extraction on new enough Gerrit master" [integration/zuul] (debian/jessie-wikimedia) - https://gerrit.wikimedia.org/r/356181 (https://phabricator.wikimedia.org/T158243) (owner: Paladox)
[19:14:45] Project beta-scap-eqiad build #179015: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179015/
[19:14:57] no_justification I'm not sure if hashar would do ^^.
[19:18:06] though that's the change i have been running in labs
[19:18:17] i used wget to download from the deb jenkins built
[19:18:47] It's not urgent
[19:19:38] ok
[19:20:14] Project beta-update-databases-eqiad build #20862: STILL FAILING in 14 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/20862/
[19:20:44] Uncaught exception 'RuntimeException' with message 'The configuration setting 'wgNamespacesWithSubpages' was already set by another extension, and cannot be set again.'
in /srv/mediawiki-staging/php-master/includes/registration/ExtensionProcessor.php:494\nStack trace:\n#0 /srv/mediawiki-staging/php-master/includes/registration/ExtensionProcessor.php(481): ExtensionProcessor->addConfigGlobal('wgNamespacesWit...', Array)\n#1 /srv/mediawiki-stagi
[19:20:44] ng/php-master/includes/registration/ExtensionProcessor.php(190): ExtensionProcessor->extractConfig2(Array, '/srv/mediawiki-...')\n#2 /srv/mediawiki-staging/php-master/includes/registration/ExtensionRegistry.php(246): ExtensionProcessor->extractInfo('/srv/mediawiki-...', Array, 2)\n#3 /srv/mediawiki-staging/php-master/includes/registration/ExtensionRegistry.php(148): ExtensionRegistry->readFromQueue(Array)\n#4 /srv/mediawiki-staging/php-master
[19:20:44] /includes/Setup.php(40): ExtensionRegistry->loadFromQueue()\n#5 /srv/mediawiki-staging/php-master/maintenance/doMaintenance.php(79): requi in /srv/mediawiki-staging/php-master/includes/registration/ExtensionProcessor.php on line 494\
[19:21:06] oh
[19:21:07] woops
[19:24:42] Project beta-scap-eqiad build #179016: STILL FAILING in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179016/
[19:33:35] WTF
[19:34:00] paladox: that is https://gerrit.wikimedia.org/r/#/c/374334/
[19:34:03] which is fine
[19:34:10] but that also means a bunch of stuff gotta be fixed
[19:34:44] Project beta-scap-eqiad build #179017: STILL FAILING in 1 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179017/
[19:35:17] yep
[19:39:38] paladox: mind filing a bug for that please ?
:)
[19:39:45] yep
[19:40:01] you can point to https://gerrit.wikimedia.org/r/#/c/374334/ and cc the reviewers, and maybe mention that wgNamespacesWithSubpages is in extension.json
[19:40:21] yep
[19:40:23] (apparently)
[19:40:23] thanks
[19:40:28] or whatever, I don't know what is going on really
[19:40:44] EducationProgram/extension.json: "NamespacesWithSubpages": {
[19:40:44] Html2Wiki/extension.json: "NamespacesWithSubpages": {
[19:40:44] SecurePoll/extension.json: "NamespacesWithSubpages": {
[19:42:07] paladox: I don't know whether extension registration has a way to merge settings
[19:42:19] it should
[19:42:22] i think
[19:42:28] array_merge
[19:42:31] or array_merge_2d
[19:42:41] i am not sure about the name for the 2nd one
[19:42:49] but the first one is array_merge
[19:44:45] Project beta-scap-eqiad build #179018: STILL FAILING in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179018/
[19:44:46] https://phabricator.wikimedia.org/T178944
[19:44:49] hashar legoktm ^^
[19:44:54] hasharAway ^^
[19:46:21] paladox: ah and there is another occurrence, great
[19:46:45] yep
[19:54:42] Project beta-scap-eqiad build #179019: STILL FAILING in 1 min 4 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179019/
[20:04:41] Project beta-scap-eqiad build #179020: STILL FAILING in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179020/
[20:10:24] (PS1) Legoktm: Introduce ci-src-setup-simple [integration/config] - https://gerrit.wikimedia.org/r/386259
[20:11:04] (CR) Legoktm: "Haven't tested this yet, but does the concept look OK?"
[integration/config] - https://gerrit.wikimedia.org/r/386259 (owner: Legoktm)
[20:12:26] sigh
[20:14:42] Project beta-scap-eqiad build #179021: STILL FAILING in 1 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179021/
[20:20:05] Project beta-update-databases-eqiad build #20863: STILL FAILING in 4.5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/20863/
[20:24:41] Project beta-scap-eqiad build #179022: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179022/
[20:34:38] Project beta-scap-eqiad build #179023: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179023/
[20:44:39] Project beta-scap-eqiad build #179024: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179024/
[20:54:37] Project beta-scap-eqiad build #179025: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179025/
[21:04:38] Project beta-scap-eqiad build #179026: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179026/
[21:05:55] no_justification legoktm: btw, re those multiple definitions of config variables issues, can we just unilaterally nuke whichever one we want to get beta running again? or should we revert the core change?
[21:11:33] * greg-g caught up from -core that I missed
[21:11:49] the two patches lego mentioned are merged but beta scap update is still failing...
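Earlier in the log, someone asked whether registration could merge settings instead of refusing a second definition. The following is a rough sketch of that idea under stated assumptions, not MediaWiki's actual implementation. `register_setting`, its `merge` flag and the namespace IDs are hypothetical. The merge is a plain dict union, loosely in the spirit of the array_merge suggestion.

```python
def register_setting(registry, name, value, merge=False):
    """Add `name` to `registry`; on a repeat definition merge dict values or raise.

    Hypothetical helper: `merge=True` stands in for an extension declaring
    that its value may be combined with an earlier definition.
    """
    if name not in registry:
        # Copy dict values so later merges do not mutate the caller's data.
        registry[name] = dict(value) if isinstance(value, dict) else value
        return
    existing = registry[name]
    if merge and isinstance(existing, dict) and isinstance(value, dict):
        # Later definitions win for keys that appear in both.
        registry[name] = {**existing, **value}
        return
    raise RuntimeError(
        f"The configuration setting '{name}' was already set by another "
        "extension, and cannot be set again."
    )


registry = {}
# Namespace IDs below are made up for illustration.
register_setting(registry, "wgNamespacesWithSubpages", {3000: True}, merge=True)
register_setting(registry, "wgNamespacesWithSubpages", {3002: True}, merge=True)
print(registry["wgNamespacesWithSubpages"])

try:
    # Without merge=True a second definition is rejected, as in the log.
    register_setting(registry, "wgNamespacesWithSubpages", {3004: True})
except RuntimeError as err:
    print(err)
```

Making the merge opt-in keeps the strict behaviour as the default, so an accidental clash such as the ContributionTracking one would still fail loudly.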
[21:14:39] Project beta-scap-eqiad build #179027: STILL FAILING in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179027/
[21:15:20] I said those wouldn't fix it
[21:16:32] ah, the second issue
[21:16:40] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3708247 (mobrovac) I can confirm git is up to date now, but it seems that @demon's commit missed [one more invocation of `cpus_for_jobs`](https://phabricator.wikimedia.org/source/scap...
[21:21:55] Yippee, build fixed!
[21:21:56] Project beta-update-databases-eqiad build #20864: FIXED in 1 min 55 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/20864/
[21:24:42] Project beta-scap-eqiad build #179028: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179028/
[21:25:23] odd thing, maybe already known? https://gerrit.wikimedia.org/r/#/c/386199/ was +2'd, ran gate-and-submit jobs, got V+2, but was never merged by jenkins.
[21:25:58] oh nm, i just can't read .. someone uploaded a new patch quickly after the first CR+2
[21:26:18] "quickly" = hours :P
[21:29:49] poor cindy
[21:32:30] Continuous-Integration-Infrastructure (shipyard): Document minimum required version of docker to build CI images - https://phabricator.wikimedia.org/T178821#3708298 (hashar) Sure please be bold! Though we would most probably forget to update the README.md file, that would at least give a baseline. I guess i...
[21:34:24] Release-Engineering-Team, Scap, Services (watching): Scap3 broken in Beta - https://phabricator.wikimedia.org/T178884#3708301 (demon) Pushed a fix, it should build and be live before long.
[21:34:39] Project beta-scap-eqiad build #179029: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179029/
[21:44:37] Project beta-scap-eqiad build #179030: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179030/
[21:54:39] Project beta-scap-eqiad build #179031: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179031/
[22:04:38] Project beta-scap-eqiad build #179032: STILL FAILING in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179032/
[22:14:37] Project beta-scap-eqiad build #179033: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179033/
[22:24:32] Project beta-scap-eqiad build #179034: STILL FAILING in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179034/
[22:29:28] RECOVERY - Puppet errors on deployment-cpjobqueue is OK: OK: Less than 1.00% above the threshold [0.0]
[22:34:36] Project beta-scap-eqiad build #179035: STILL FAILING in 1 min 0 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179035/
[22:44:36] Project beta-scap-eqiad build #179036: STILL FAILING in 58 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179036/
[22:54:39] Project beta-scap-eqiad build #179037: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179037/
[22:54:42] Beta-Cluster-Infrastructure, Operations, Traffic, Patch-For-Review: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3708461 (Krenair) Are we okay to close this now? Do we want to look into what caused the initial varnish upgrade?
[23:04:37] Project beta-scap-eqiad build #179038: STILL FAILING in 1 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179038/
[23:14:36] Project beta-scap-eqiad build #179039: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179039/
[23:24:31] Project beta-scap-eqiad build #179040: STILL FAILING in 55 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179040/
[23:34:38] Project beta-scap-eqiad build #179041: STILL FAILING in 1 min 1 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179041/
[23:44:35] Project beta-scap-eqiad build #179042: STILL FAILING in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179042/
[23:45:30] Beta-Cluster-Infrastructure, Operations, Traffic, Patch-For-Review: Beta cluster is down - https://phabricator.wikimedia.org/T178841#3708577 (greg) That's a good question (re what caused the varnish upgrade) so I guess we should figure that out. The timing seems oddly non-deterministic (from my u...
[23:46:09] no_justification upstream have created a 2.14 branch for its-base now heh :)
[23:46:15] i can backport my soy change now
[23:54:35] Project beta-scap-eqiad build #179043: STILL FAILING in 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/179043/