[00:30:25] 10Browser-Tests, 10Gather, 6Mobile-Web, 10MobileFrontend, 3Mobile-Web-Sprint-48-Voyage-of-the-Damned: Audit existing browser tests - https://phabricator.wikimedia.org/T101071#1332520 (10Jdlrobson) TODO: * Skipped tests shouldn't send e-mail notifications * @dduval and @jdlrobson to explore why this test... [00:49:03] 10Deployment-Systems, 6Release-Engineering: Use subrepos instead of git submodules for deployed MediaWiki extensions - https://phabricator.wikimedia.org/T98834#1332566 (10mmodell) @jdforrester-wmf: It wouldn't necessarily have to take place in the main VE repo, this could be done via an intermediate merge rep... [00:55:19] 10Deployment-Systems, 6Release-Engineering: Use subrepos instead of git submodules for deployed MediaWiki extensions - https://phabricator.wikimedia.org/T98834#1332586 (10mmodell) This is really a detail specific to the wmf release branches of mediawiki, it will really only come into play when applying commit... [01:03:09] 10Deployment-Systems: Come up with an abstract deployment model that roughly addresses the needs of existing projects - https://phabricator.wikimedia.org/T97068#1332623 (10mmodell) [01:03:14] 10Browser-Tests, 6Release-Engineering, 6Mobile-Web: Introduce @skip tag in mediawiki selenium - https://phabricator.wikimedia.org/T101062#1332626 (10kaldari) 5Open>3Resolved a:3kaldari This was implemented for Mobile-Web in https://gerrit.wikimedia.org/r/#/c/215542/ [01:44:51] 10Continuous-Integration-Infrastructure, 5Patch-For-Review: Generate code coverage reports for extensions - https://phabricator.wikimedia.org/T71685#1332687 (10Legoktm) >>! In T71685#1328430, @phuedx wrote: > @Legoktm: Can we get MobileFrontend and Gather added to the list of extensions that you're generating... [01:48:51] 10Beta-Cluster, 10MediaWiki-extensions-GettingStarted: GettingStarted on Beta Cluster periodically loses its Redis index - https://phabricator.wikimedia.org/T100515#1332691 (10Mattflaschen) >>! In T100515#1318782, @hashar wrote: > I guess on prod you are using a dedicated one or another one. Nope, we're doing... [02:20:06] (03CR) 10Krinkle: [C: 04-1] "As outlined in the past with Antoine, avoid using 'make build' for this purpose. It incurs additional complexities that aren't worth the b" [integration/config] - 10https://gerrit.wikimedia.org/r/191046 (https://phabricator.wikimedia.org/T74794) (owner: 10Hashar) [02:21:51] (03CR) 10Krinkle: "If migration is too much work, it can be bypassed by specifying the relevant shell command (e.g. grunt docs) in package.json/scripts/doc a" [integration/config] - 10https://gerrit.wikimedia.org/r/191046 (https://phabricator.wikimedia.org/T74794) (owner: 10Hashar) [02:52:47] (03PS1) 10Krinkle: Enable npm job for CategoryTree [integration/config] - 10https://gerrit.wikimedia.org/r/215571 [02:53:03] (03CR) 10Krinkle: [C: 032] Enable npm job for CategoryTree [integration/config] - 10https://gerrit.wikimedia.org/r/215571 (owner: 10Krinkle) [02:58:34] (03Merged) 10jenkins-bot: Enable npm job for CategoryTree [integration/config] - 10https://gerrit.wikimedia.org/r/215571 (owner: 10Krinkle) [02:58:39] 6Release-Engineering, 6operations: Try out hack (>! In T91590#1332329, @Legoktm wrote: > Also HHVM's linter is significantly slower than PHP5: https://github.com/JakubOnderka/PHP-Parallel-Lint/issues/47 This is a general... [03:04:53] 6Release-Engineering, 6operations: Try out hack (>! In T91590#1332763, @bd808 wrote: >>>! 
In T91590#1332329, @Legoktm wrote: >> Also HHVM's linter is significantly slower than PHP5: https://github.com/JakubOnderka/PHP-Para... [03:07:16] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/215571 [03:07:21] Logged the message, Master [04:19:20] 10Browser-Tests, 6Release-Engineering, 6Mobile-Web: Introduce @skip tag in mediawiki selenium - https://phabricator.wikimedia.org/T101062#1332911 (10Jdlrobson) 5Resolved>3Open This is for a generic component [04:23:54] 10Browser-Tests, 10MediaWiki-extensions-GuidedTour: Add Cucumber browser tests for GuidedTour - https://phabricator.wikimedia.org/T92154#1332915 (10Mattflaschen) It's not a priority for me right now since GuidedTour isn't being actively developed at the moment. [05:08:18] 10Deployment-Systems, 6Release-Engineering: Use subrepos instead of git submodules for deployed MediaWiki extensions - https://phabricator.wikimedia.org/T98834#1332933 (10Mattflaschen) [05:24:21] Project beta-scap-eqiad build #55583: FAILURE in 10 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/55583/ [06:40:51] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK [06:56:26] andre__: https://phabricator.wikimedia.org/T97642 ? [06:57:00] matanya: I fail to see the urgency and why people feel like they need to ping me all of the time, tbh [06:57:08] yes, I am aware of it as it is assigned to me. [06:57:17] yes, many other things are also assigned to me. Yes, I will get there. [06:57:45] andre__: people are eager to help :) [06:58:25] I'm not sure if approx. 1.2 people pinging me daily and making me switch to the IRC window and interrupt other stuff is a good trade-off. :P [06:58:42] So yes, I'll do that soon. And thanks for the ping :) [06:58:58] (/me not even ironic; thanks for the reminder) [06:59:39] at your service andre__ didn't know there was a backround there. [06:59:44] hehe [07:00:42] RECOVERY - Free space - all mounts on deployment-videoscaler01 is OK All targets OK [08:19:29] addshore: new release of mwcs came out yesterday :D [08:19:35] :D [08:20:11] About a minute after it came out, I used it in the composer dependencies of Extension:SmiteSpam, my GSoC project. [08:20:15] Such a fun moment. [08:20:29] :) [08:20:33] Good work! :) [08:21:19] thanks :) [08:21:39] I remember when I first took a look at it and added the tests :P [08:21:46] and always meant to come back and add more! ;p [08:44:09] (03PS1) 10Polybuildr: Update README.md code formatting [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/215580 [08:46:29] (03CR) 10Polybuildr: "Refer to https://github.com/polybuildr/mediawiki-tools-codesniffer/blob/master/README.md to see working example." 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/215580 (owner: 10Polybuildr) [08:46:41] addshore: ^ [08:53:45] (03CR) 10Addshore: [C: 032] Update README.md code formatting [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/215580 (owner: 10Polybuildr) [08:59:05] (03PS3) 10Hashar: Update Git plugin configuration [integration/config] - 10https://gerrit.wikimedia.org/r/215335 (https://phabricator.wikimedia.org/T101105) [08:59:39] addshore: jzerebecki: you want to update your JJB copy :-} [08:59:46] I have pushed some changes yesterday [08:59:57] (03CR) 10Hashar: [C: 032] "Going to refresh them" [integration/config] - 10https://gerrit.wikimedia.org/r/215335 (https://phabricator.wikimedia.org/T101105) (owner: 10Hashar) [09:04:09] (03Merged) 10jenkins-bot: Update Git plugin configuration [integration/config] - 10https://gerrit.wikimedia.org/r/215335 (https://phabricator.wikimedia.org/T101105) (owner: 10Hashar) [09:10:26] !log Refershing almost all jenkins jobs to take in account the Jenkins Git plugin upgrade https://phabricator.wikimedia.org/T101105 [09:10:30] Logged the message, Master [09:22:25] PROBLEM - Puppet failure on deployment-mx is CRITICAL 100.00% of data above the critical threshold [0.0] [09:44:50] (03CR) 10Hashar: "I have refreshed all the jobs." [integration/config] - 10https://gerrit.wikimedia.org/r/215335 (https://phabricator.wikimedia.org/T101105) (owner: 10Hashar) [09:49:46] 10Continuous-Integration-Infrastructure, 6Editing-Department, 10VisualEditor, 5Patch-For-Review, 7Regression: Submodule not being updated in Jenkins jobs - https://phabricator.wikimedia.org/T101105#1333209 (10hashar) I have refreshed all the jobs. Should be fine now. I sent a mail to the qa list to have... [10:02:45] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce build #451: FAILURE in 41 min: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-monobook-sauce/451/ [10:03:48] !log Further updated JJB fork c7231fe..f966521 [10:03:53] Logged the message, Master [10:04:44] 10Continuous-Integration-Infrastructure, 6Editing-Department, 10VisualEditor, 5Patch-For-Review, 7Regression: Submodule not being updated in Jenkins jobs - https://phabricator.wikimedia.org/T101105#1333221 (10hashar) 5Open>3Resolved [10:07:35] 10Continuous-Integration-Infrastructure, 7Regression: ERROR: Failed to notify endpoint 'HTTP:http://127.0.0.1:8001/jenkins_endpoint' - https://phabricator.wikimedia.org/T93321#1333234 (10hashar) On Monday we bumped JJB. I am now bumping it to merge commit 4135e143 which is the patch I wrote https://review.ope... [10:08:23] !log Update JJB fork again f966521..4135e14 . Will remove the http notification to zuul {{bug:T93321}}. REFRESHING ALL JOBS! [10:08:28] Logged the message, Master [10:32:47] 10Continuous-Integration-Infrastructure, 7Regression: ERROR: Failed to notify endpoint 'HTTP:http://127.0.0.1:8001/jenkins_endpoint' - https://phabricator.wikimedia.org/T93321#1333276 (10hashar) 5Open>3Resolved Jobs are still being refreshed but I confirmed the http notification is gone and jobs are proper... 
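For reference, the JJB job refresh hashar logs above is normally driven from a local jenkins-job-builder checkout against integration/config. A minimal sketch, assuming a jenkins_jobs.ini with API credentials and job definitions under config/ (both paths are illustrative, not taken from this log):

    # render the affected jobs to XML locally first, as a dry run
    jenkins-jobs --conf jenkins_jobs.ini test -o /tmp/jjb-out config/ 'mediawiki-*'
    # then push the regenerated configuration to the Jenkins master ("refreshing" the jobs)
    jenkins-jobs --conf jenkins_jobs.ini update config/ 'mediawiki-*'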
[10:42:18] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [10:57:16] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.027 second response time [11:33:11] 10Beta-Cluster, 10ContentTranslation-cxserver: CXServer on beta is writing Logs to NFS - https://phabricator.wikimedia.org/T101240#1333386 (10yuvipanda) 3NEW [11:38:16] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [11:42:42] PROBLEM - Content Translation Server on deployment-sca02 is CRITICAL: Connection refused [11:57:57] duh [11:58:19] !log Cherry-picked 213840 to test logstash [11:58:24] Logged the message, Master [11:58:37] That's good Kartik as per documentation. [12:00:50] 10Beta-Cluster: beta-scap-eqiad broken since June 3rd 5:24am UTC - https://phabricator.wikimedia.org/T101252#1333501 (10hashar) 3NEW [12:05:48] 10Beta-Cluster: beta-scap-eqiad broken since June 3rd 5:24am UTC - https://phabricator.wikimedia.org/T101252#1333510 (10hashar) That is caused by https://gerrit.wikimedia.org/r/#/c/213469/ (Add PHP error logging to Sentry extension) for T85188. It introduces a PHP dependency for `raven/raven` in composer.json w... [12:06:42] PROBLEM - Free space - all mounts on deployment-videoscaler01 is CRITICAL deployment-prep.deployment-videoscaler01.diskspace._var.byte_percentfree (<10.00%) [12:14:50] Yippee, build fixed! [12:14:51] Project beta-scap-eqiad build #55630: FIXED in 1 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/55630/ [12:15:34] 10Beta-Cluster, 5Patch-For-Review: beta-scap-eqiad broken since June 3rd 5:24am UTC - https://phabricator.wikimedia.org/T101252#1333530 (10hashar) p:5Triage>3Unbreak! [12:15:39] 10Beta-Cluster, 5Patch-For-Review: beta-scap-eqiad broken since June 3rd 5:24am UTC - https://phabricator.wikimedia.org/T101252#1333532 (10hashar) 5Open>3Resolved a:3hashar I have reverted the Sentry patch and commented about it on T85188 Triggered a build of [[ https://integration.wikimedia.org/ci/job/... [12:21:04] 10Beta-Cluster, 10ContentTranslation-cxserver, 5Patch-For-Review: CXServer on beta is writing Logs to NFS - https://phabricator.wikimedia.org/T101240#1333540 (10hashar) a:3yuvipanda [13:08:34] and off [13:08:36] see you tomorrow [13:21:06] 10Continuous-Integration-Infrastructure: Create CI slaves using Debian Jessie (tracking) - https://phabricator.wikimedia.org/T94836#1333664 (10faidon) >>! In T98003#1320527, @hashar wrote: > I created a single Jessie slave to report on package/puppet/upstart errors. Tracking is T94836. It is not a priority thou... 
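The beta-scap-eqiad breakage hashar diagnoses above (T101252) is the usual Composer-versus-mediawiki/vendor mismatch: the Sentry patch declared raven/raven in the extension's composer.json, but beta and production install libraries from the pre-built mediawiki/vendor repo, so the package was never actually present there. A rough sketch of what the reverted change effectively did, with the version left unpinned as an assumption:

    # from the extension's directory, declare the new dependency
    composer require raven/raven
    # on a composer-managed wiki this pulls the library into vendor/, but on
    # beta/production nothing fetches it until mediawiki/vendor itself is
    # updated -- hence the fatals during scap until the patch was reverted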
[13:58:16] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.023 second response time [14:02:45] RECOVERY - Content Translation Server on deployment-sca02 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.024 second response time [14:19:17] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [14:34:57] Project browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #522: FAILURE in 8 min 56 sec: https://integration.wikimedia.org/ci/job/browsertests-Echo-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/522/ [14:59:19] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.036 second response time [16:21:21] 10Continuous-Integration-Infrastructure, 6operations: Build a new version of php-luasandbox and hhvm-luasandbox, and deploy to integration hosts - https://phabricator.wikimedia.org/T101275#1334197 (10Anomie) 3NEW [16:24:34] 10Continuous-Integration-Infrastructure, 10MediaWiki-extensions-Scribunto, 7I18n: LuaStandalone timeout is sometimes reported as read error - https://phabricator.wikimedia.org/T96912#1334222 (10Anomie) 5Open>3Resolved [16:24:38] 10Continuous-Integration-Infrastructure, 10MediaWiki-extensions-Scribunto, 7I18n: LuaStandalone timeout is sometimes reported as read error - https://phabricator.wikimedia.org/T96912#1229176 (10Anomie) This was probably fixed with [[https://gerrit.wikimedia.org/r/#/c/213586/|Gerrit change 213586]]. If this s... [17:03:18] bd808: Possibly easy idea: could we mount /var/www somewhere like we do the stuff in /srv? [17:03:28] That docroot is useful for debugging and experimentation [17:03:45] (somewhere available to the host OS, that is) [17:12:02] 10Browser-Tests, 6Release-Engineering: Introduce @skip tag in mediawiki selenium - https://phabricator.wikimedia.org/T101062#1334384 (10kaldari) [17:23:23] 10Browser-Tests, 6Release-Engineering: Introduce @skip tag in mediawiki selenium - https://phabricator.wikimedia.org/T101062#1334446 (10greg) a:5kaldari>3dduvall [17:29:12] marxarelli: I assume that is right^ ? [17:34:01] 10Browser-Tests, 6Release-Engineering: Introduce @skip tag in mediawiki selenium - https://phabricator.wikimedia.org/T101062#1334484 (10dduvall) a:5dduvall>3None [17:34:19] greg-g: probably shouldn't assign it until someone is working on it [17:36:09] * greg-g nods [17:43:15] ostriches: yeah I think we could do that easily. We could move it to $VAGRANT/srv/www in the host os and update the apache config to read from there. If we did it with a hiera var then we could let people put it wherever they wanted. [17:43:34] That sounds like a plan [17:44:28] * bd808 dreams of having things setup so that he can put everything in the VM [17:58:09] 6Release-Engineering, 7user-notice: Shorten/Simplify MW train deploy cadence to Tu->W->Th - https://phabricator.wikimedia.org/T97553#1334671 (10mmodell) In the announcement email, @Greg wrote: >== Transition == >Transitions from one cadence to another are hard. Here's how we'll be >doing this transition: > >We... [18:15:47] twentyafterfour: phab generally writes to apache's error log, right? [18:16:20] Ah nope, nvm [18:16:29] phabricator_error.log [18:16:30] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334834 (10Aklapper) 5stalled>3Open Went on vacations. Got back. Found people asking for stuff. 
Created key. Signed L3. wikitech username: ak... [18:16:54] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334838 (10Aklapper) a:5Aklapper>3None [18:20:04] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334847 (10Matanya) a:3Matanya [18:21:31] PROBLEM - Puppet failure on integration-zuul-packaged is CRITICAL 100.00% of data above the critical threshold [0.0] [18:24:52] !log updating deployment-salt puppet in prep for use_dnsmasq=false [18:24:57] Logged the message, Master [18:26:24] ostriches: yeah [18:27:04] * ostriches is poking git-http-backend a tad [18:27:13] Seeing if we can at least hosting our diffusion repos as mirrors [18:28:12] thcipriani: gotta go rescue tiny plants from thunderstorm, brb [18:28:20] kk [18:29:48] ostriches: as mirrors? [18:30:31] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1334910 (10Matanya) a:5Matanya>3Dzahn [18:31:20] twentyafterfour: So you can clone the repos from diffusion. [18:31:47] The problem is opening up ssh [18:31:56] we can set up https cloning I think [18:32:13] Yeah https cloning is all I'm doing now [18:32:18] R/O :) [18:32:33] Actually I'm not sure if phabricator supports that when it's mirroring remote repos, I'd ask in #phabricator they might know [18:32:52] I think it should [18:32:59] Config/UI seems to indicate it is [18:33:07] I'm trying to debug why it won't work yet tho :) [18:33:49] hmm, maybe you have to authenticate? are the repos set to fully public? [18:33:58] https://phabricator.wikimedia.org/diffusion/UINF/edit/serve/ [18:34:19] Visible To Public (No Login Required) [18:34:38] thcipriani: I’m back, at least partially :) Ping if you run into trouble. [18:34:46] Set 1 was getting git-http-backend into $PATH (done, will have a puppet patch shortly to shore it up) [18:34:55] *step [18:35:03] Now I'm still getting 500 and nothing in the log [18:35:48] andrewbogott: uh...puppetmaster restart doesn't seem to want to come back up [18:35:59] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 20.00% of data above the critical threshold [0.0] [18:36:05] ok, on deployment-salt? [18:36:16] yeah [18:36:29] wonder if it's finishing a run, or it's just hung up. [18:36:37] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL 20.00% of data above the critical threshold [0.0] [18:36:38] twentyafterfour: I'm going to undo my testing and poke this later, I don't wanna leave it half-working [18:39:18] ok .. I think the problem is sudo settings [18:39:38] ostriches: you just triggered a bunch of sudo failure emails wrt git-http-backend [18:39:45] PROBLEM - Puppet failure on deployment-salt is CRITICAL 50.00% of data above the critical threshold [0.0] [18:40:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 70.00% of data above the critical threshold [0.0] [18:40:25] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 44.44% of data above the critical threshold [0.0] [18:40:41] andrewbogott: I'm guessing pid 1212 is just going to have to get killed [18:41:05] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 66.67% of data above the critical threshold [0.0] [18:41:28] thcipriani: yep, that seems to’ve done it. 
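When the puppetmaster service hangs on restart like this, the remedy thcipriani and andrewbogott settle on is simply killing the stale process before starting the service again; a small sketch (the pid is the one from the exchange above, the service name is the stock Debian one):

    pgrep -fl puppetmaster           # identify the stuck process (1212 in the log above)
    sudo kill 1212                   # escalate to kill -9 only if it ignores SIGTERM
    sudo service puppetmaster start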
[18:41:32] No idea why it got stuck :( [18:42:00] PROBLEM - Puppet failure on deployment-mediawiki03 is CRITICAL 60.00% of data above the critical threshold [0.0] [18:43:16] ok, rerunning on deployment-salt one time more for good measure, then we'll flip the switch [18:44:23] alright, seems to have run [18:44:24] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 30.00% of data above the critical threshold [0.0] [18:46:10] PROBLEM - Puppet failure on deployment-elastic06 is CRITICAL 66.67% of data above the critical threshold [0.0] [18:46:18] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 60.00% of data above the critical threshold [0.0] [18:47:26] just looked into the deployment-elastic08 puppet failure, to make sure no updates to the puppet repo blew anything up. Seems like these failures are a result of restarting the puppetmaster [18:48:19] andrewbogott: [x] update puppet master; [ ] use_dnsmasq: false in hiera, is that the proper next step? [18:49:00] change the puppetmaster name in ldap and hiera — that’s done already? [18:49:44] well, puppetmaster is updated in heira, should override ldap [18:49:58] yep [18:50:04] alright, here goes [18:50:17] !log change use_dnsmasq: false for deployment-prep [18:50:22] Logged the message, Master [18:51:11] running puppet on d-salt [18:53:23] alright, updated master [18:54:06] I'd say we are in good shape for the new cadence, I've never seen the fatal errors so clean. Barely any OOMs, just a bunch of mysql failures which I don't fully understand but it's a known issue [18:54:47] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0] [18:58:38] andrewbogott: looks like I have to update resolv.conf manually before each agent recognizes the new server address [19:00:03] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0] [19:00:23] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1335010 (10Dzahn) 5Open>3Resolved user has been created on bast1001 and on iridium. linked andre__ to example for ProxyCommand setup. let us k... [19:00:47] PROBLEM - Puppet failure on deployment-salt is CRITICAL 20.00% of data above the critical threshold [0.0] [19:01:13] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1335012 (10Dzahn) ``` [iridium:/etc/sudoers.d] $ sudo cat phabricator-admin # This file is managed by Puppet! %phabricator-admin ALL = NOPASSWD:... [19:01:17] thcipriani: that makes sense. I’m not sure why that didn’t happen to me; I’ll set up a new test and investigate. [19:01:38] thcipriani: still manageable, right? [19:01:51] yeah, ndb really [19:03:11] cool. Deployment-elastic08: first one I updated, went just fine. [19:05:17] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0] [19:05:44] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 20.00% of data above the critical threshold [0.0] [19:05:49] andrewbogott: here's a thought, if a puppet run happens, the resolv.conf updates, then you can _just_ update puppet.conf, that might be why it didn't happen to you? 
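To make thcipriani's ordering point concrete: each agent names the puppetmaster in /etc/puppet/puppet.conf, but it can only resolve that name once /etc/resolv.conf stops pointing at the now-disabled dnsmasq. A per-instance fix-up therefore looks roughly like this (the recursor address and master FQDN are placeholders, not values from this log):

    # point the resolver at a working labs recursor so the new master name resolves
    sudo sed -i 's/^nameserver .*/nameserver <labs-recursor-ip>/' /etc/resolv.conf
    # make sure the agent is aimed at the new puppetmaster
    sudo sed -i 's/^server = .*/server = <new-puppetmaster-fqdn>/' /etc/puppet/puppet.conf
    # the next run then rewrites both files from the updated puppet/hiera config
    sudo puppet agent --test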
[19:06:00] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0] [19:06:08] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0] [19:06:32] thcipriani: so you mean running puppet before changing puppet.conf works? [19:06:34] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:06:36] PROBLEM - Puppet failure on deployment-fluoride is CRITICAL 20.00% of data above the critical threshold [0.0] [19:06:39] It [19:06:44] It’s possible that that’s what I did. [19:07:08] maybe? [19:07:25] hm, probably if you remove use_dnsmasq and then run puppet on the clients before running on the master it works. [19:07:36] PROBLEM - Puppet failure on deployment-test is CRITICAL 30.00% of data above the critical threshold [0.0] [19:07:38] PROBLEM - Puppet failure on deployment-sca01 is CRITICAL 20.00% of data above the critical threshold [0.0] [19:08:03] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 50.00% of data above the critical threshold [0.0] [19:08:06] PROBLEM - Puppet failure on deployment-upload is CRITICAL 44.44% of data above the critical threshold [0.0] [19:08:33] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:08:37] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:09:07] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 55.56% of data above the critical threshold [0.0] [19:10:21] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 44.44% of data above the critical threshold [0.0] [19:10:27] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 50.00% of data above the critical threshold [0.0] [19:10:45] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0] [19:11:53] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:12:10] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 33.33% of data above the critical threshold [0.0] [19:12:28] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 33.33% of data above the critical threshold [0.0] [19:12:42] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:13:08] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 44.44% of data above the critical threshold [0.0] [19:14:22] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0] [19:14:24] PROBLEM - Puppet failure on deployment-stream is CRITICAL 30.00% of data above the critical threshold [0.0] [19:15:10] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 66.67% of data above the critical threshold [0.0] [19:16:14] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 22.22% of data above the critical threshold [0.0] [19:16:18] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 33.33% of data above the critical threshold [0.0] [19:16:58] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 30.00% of data above the critical threshold [0.0] [19:17:00] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 60.00% of data above the critical threshold [0.0] [19:17:06] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 22.22% of data above the critical threshold [0.0] [19:23:33] twentyafterfour: Sounds fixable, we'll figure it out later. 
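The sudo failure e-mails around git-http-backend point at the rule Phabricator's Diffusion hosting documentation asks for: the web-server user must be allowed to run git-http-backend as the daemon/VCS user. A sketch of installing such a rule, where the user names (www-data, phd) and the binary path are assumptions about this setup rather than facts from the log:

    echo 'www-data ALL=(phd) SETENV: NOPASSWD: /usr/lib/git-core/git-http-backend' \
        | sudo tee /etc/sudoers.d/diffusion-git-http
    sudo chmod 0440 /etc/sudoers.d/diffusion-git-http
    sudo visudo -c    # syntax-check before relying on it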
[19:24:41] ostriches: just needs sudoers adjustments, those are documented in the diffusion setup instructions [19:25:32] Ah yes [19:25:36] I should do that :) [19:26:21] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0] [19:27:27] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0] [19:29:53] all the puppet fail is already known, right [19:30:06] mutante: yup, just changing labs dns stuffs [19:30:15] thcipriani: ok, cool [19:32:06] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0] [19:33:30] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 33.33% of data above the critical threshold [0.0] [19:34:08] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0] [19:38:03] RECOVERY - Puppet failure on deployment-mathoid is OK Less than 1.00% above the threshold [0.0] [19:40:41] RECOVERY - Puppet failure on deployment-zookeeper01 is OK Less than 1.00% above the threshold [0.0] [19:43:06] RECOVERY - Puppet failure on deployment-upload is OK Less than 1.00% above the threshold [0.0] [19:43:26] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0] [19:43:30] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0] [19:46:40] PROBLEM - Puppet failure on deployment-zookeeper01 is CRITICAL 30.00% of data above the critical threshold [0.0] [19:47:08] RECOVERY - Puppet failure on deployment-mediawiki02 is OK Less than 1.00% above the threshold [0.0] [19:48:36] RECOVERY - Puppet failure on deployment-jobrunner01 is OK Less than 1.00% above the threshold [0.0] [19:51:22] andrewbogott: can you look at deployment-stream quickly? Complaining about something :\ [19:51:28] yep [19:51:55] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0] [19:52:41] RECOVERY - Puppet failure on deployment-pdf02 is OK Less than 1.00% above the threshold [0.0] [19:54:09] oh wait, did puppetmaster crap out? [19:54:23] I restarted it [19:54:26] probably needlessly [19:54:47] ah [19:57:28] thcipriani: I generated a new cert for deployment-stream and it seems happy now. Clearly there are 101 races in this process. [19:57:40] heh, indeed. [19:57:55] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 40.00% of data above the critical threshold [0.0] [19:58:05] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0] [19:59:29] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 44.44% of data above the critical threshold [0.0] [20:03:23] andrewbogott: huh, deployment-cache-bits01, seemingly there may be a bigger issue there. Could you poke that one to verify? [20:03:33] yep [20:08:03] thcipriani: other than some real errors in the puppet config it seems ok. [20:08:16] I ran it once, had to sign the cert on deployment-salt, ran again. 
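The cert regeneration andrewbogott describes for deployment-stream is the standard Puppet 3 dance between an agent and a self-hosted puppetmaster; roughly (the FQDN is a placeholder):

    # on the agent: discard the old certificate material
    sudo rm -rf /var/lib/puppet/ssl
    # on the puppetmaster: clear any stale cert for that host
    sudo puppet cert clean <instance-fqdn>
    # on the agent: request a new cert (the run fails until it is signed)
    sudo puppet agent --test
    # on the puppetmaster: sign it, then re-run the agent
    sudo puppet cert sign <instance-fqdn>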
[20:08:43] yeah, real errors is what I suspected :\ [20:09:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0] [20:10:07] thcipriani: init script stuff, could have to do with services configured for jessie but running on precise [20:10:08] RECOVERY - Puppet failure on deployment-apertium01 is OK Less than 1.00% above the threshold [0.0] [20:10:10] dunno [20:10:41] I'll check for phab tickets related to it once I'm done here [20:11:12] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0] [20:14:50] (03PS1) 10Dduvall: Push commits to 0.4 by default [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 [20:15:16] (03CR) 10jenkins-bot: [V: 04-1] Push commits to 0.4 by default [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 (owner: 10Dduvall) [20:15:54] (03PS2) 10Dduvall: Push 0.4 commits to remote 0.4 branch by default [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 [20:16:38] (03CR) 10jenkins-bot: [V: 04-1] Push 0.4 commits to remote 0.4 branch by default [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 (owner: 10Dduvall) [20:16:58] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0] [20:19:02] marxarelli: rspec-core is not part of the bundle. !!! [20:19:22] hashar: it's an old branch [20:20:26] RECOVERY - Puppet failure on deployment-memc02 is OK Less than 1.00% above the threshold [0.0] [20:22:25] !log deployment-bastion Jenkins slave is stalled again :-( No code update happening on beta cluster [20:22:31] Logged the message, Master [20:24:05] thcipriani: you in a good stopping place soon, or do you want to push back our 1:1? [20:24:37] greg-g: I'm close to wrapping up, I can pause for a wee bit too [20:24:45] kk [20:24:48] didn't want to interrupt [20:24:52] andrewbogott: everything seems to be fairly uneventful, thanks for your help :) [20:25:06] great! Sorry it was so much work. [20:25:14] I guess the only major project left is integration? [20:25:18] PROBLEM - Puppet staleness on deployment-restbase01 is CRITICAL 100.00% of data above the critical threshold [43200.0] [20:25:22] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0] [20:25:49] yup, integration will be the next one. I wonder if there's just a salt state that could be written :) [20:26:36] RECOVERY - Puppet failure on deployment-logstash1 is OK Less than 1.00% above the threshold [0.0] [20:27:55] RECOVERY - Puppet failure on deployment-sentry2 is OK Less than 1.00% above the threshold [0.0] [20:28:48] !log Restarting Jenkins to release a deadlock [20:28:52] Logged the message, Master [20:29:07] thcipriani: thanks a ton for handling this ! :-} [20:29:09] PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL 100.00% of data above the critical threshold [43200.0] [20:29:13] Project browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #658: ABORTED in 3 min 13 sec: https://integration.wikimedia.org/ci/job/browsertests-Flow-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/658/ [20:29:32] hashar: what does the `files:` filter do in jjb, again? [20:29:44] there is no such thing :-} [20:29:47] it is in zuul [20:29:50] does it only execute the job if files matching the pattern exist? [20:29:52] hashar: np #breaking_down_silos :) [20:29:53] yeah [20:30:00] thcipriani: #together !!! [20:30:09] thcipriani: I’m sure it could be done via salt, even just with cmd.run and sed. 
[20:30:10] thcipriani: you got it by the dns issue with staging haven't you ? [20:30:32] marxarelli: so when a new patch notif is received by zuul, that contains the list of files that have been changed [20:30:45] marxarelli: so we can prevent running a job unless some specific file is changed [20:31:05] marxarelli: an example is we validate the composer json file only if the patch actually change it [20:31:33] RECOVERY - Puppet failure on deployment-redis01 is OK Less than 1.00% above the threshold [0.0] [20:31:55] hashar: ok. maybe i'm better off excluding the rspec job for just the 0.4 branch of mediawiki-selenium [20:31:59] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0] [20:32:27] marxarelli: yeah that is doable [20:32:32] marxarelli: in zuul something like: [20:32:36] - job: whatever-rspec [20:32:51] branch: (!:^0.4$) [20:33:05] the problem is the job is probably shared by multiple repos :-/ [20:35:14] (03PS1) 10Dduvall: Don't run rspec for pre-1.0 branches of mediawiki_selenium [integration/config] - 10https://gerrit.wikimedia.org/r/215770 [20:35:19] hashar: ^ [20:36:34] oh [20:37:19] (03CR) 10Hashar: [C: 031] "GO go go !!!" [integration/config] - 10https://gerrit.wikimedia.org/r/215770 (owner: 10Dduvall) [20:37:27] marxarelli: I will let you +2 and deploy it :-} [20:37:51] there is a fabfile.py at the root of the repo for convenience (thanks legoktm ) [20:38:50] hashar: what's a fabfile? :) [20:39:16] oooh, fabric [20:39:17] neat [20:39:22] yeah a python deployment tool [20:39:24] yet another one [20:39:36] I like fabric solely because ... python :-} [20:39:50] Bryan Davis looked at it before porting scap from bash to python [20:39:54] but eventually had to dismiss it [20:40:30] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0] [20:41:20] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0] [20:45:11] Jenkis is back [20:45:28] but deployment-bastion is still deadlocked apparently (: [20:47:49] !log Reloading Zuul to deploy I96649bc92a387021a32d354c374ad844e1680db2 [20:47:53] Logged the message, Master [20:49:17] !log restarted zuul entirely to remove some stalled jobs [20:49:19] marxarelli: ^^^ [20:49:20] sorry [20:49:22] Logged the message, Master [20:49:32] PROBLEM - Puppet failure on integration-dev is CRITICAL 33.33% of data above the critical threshold [0.0] [20:50:03] hashar: doh. my changes to layout.yaml didn't take for some reason [20:50:28] maybe the regex is wrong ? [20:50:31] or you forgot to git pull [20:51:21] hmm no [20:51:26] marxarelli: I only +1ed https://gerrit.wikimedia.org/r/#/c/215770/ :} [20:51:46] note that you did a filter for mediawiki-selenium-bundle-rspec' [20:51:51] but that job is not triggered apparently [20:51:59] there are mediawiki-selenium-gembuild [20:52:04] and bundle-yard / bundle-rspec [20:53:37] the jobs have been unified [20:53:44] and are now shared by multiple repos :-( [20:54:26] marxarelli: if you plan to add rspec, the best is probably to force merge that dummy change that just update .gitreview [21:01:20] bed time sorry :/ [21:05:06] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 44.44% of data above the critical threshold [0.0] [21:05:07] (03CR) 10Polybuildr: "Ping?" 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/153399 (owner: 10Addshore) [21:05:22] PROBLEM - Puppet failure on deployment-elastic08 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:07:15] PROBLEM - Puppet failure on deployment-db1 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:07:19] PROBLEM - Puppet failure on deployment-elastic05 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:08:01] PROBLEM - Puppet failure on deployment-db2 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:08:11] PROBLEM - Puppet failure on deployment-mediawiki02 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:09:15] PROBLEM - Puppet failure on integration-vmbuilder-trusty is CRITICAL 44.44% of data above the critical threshold [0.0] [21:09:45] PROBLEM - Puppet failure on integration-slave-trusty-1016 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:10:23] PROBLEM - Puppet failure on deployment-stream is CRITICAL 60.00% of data above the critical threshold [0.0] [21:12:18] PROBLEM - Puppet failure on deployment-sca02 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:12:24] PROBLEM - Puppet failure on integration-raita is CRITICAL 20.00% of data above the critical threshold [0.0] [21:13:00] PROBLEM - Puppet failure on deployment-memc04 is CRITICAL 20.00% of data above the critical threshold [0.0] [21:14:06] PROBLEM - Puppet failure on deployment-mathoid is CRITICAL 22.22% of data above the critical threshold [0.0] [21:14:06] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 66.67% of data above the critical threshold [0.0] [21:14:34] PROBLEM - Puppet failure on deployment-memc03 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:15:07] (03Abandoned) 10Dduvall: Don't run rspec for pre-1.0 branches of mediawiki_selenium [integration/config] - 10https://gerrit.wikimedia.org/r/215770 (owner: 10Dduvall) [21:16:29] PROBLEM - Puppet failure on deployment-cxserver03 is CRITICAL 55.56% of data above the critical threshold [0.0] [21:16:45] PROBLEM - Puppet failure on deployment-salt is CRITICAL 60.00% of data above the critical threshold [0.0] [21:17:25] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL 100.00% of data above the critical threshold [43200.0] [21:17:37] PROBLEM - Puppet failure on deployment-redis01 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:18:39] PROBLEM - Puppet failure on deployment-pdf02 is CRITICAL 40.00% of data above the critical threshold [0.0] [21:18:53] PROBLEM - Puppet failure on deployment-sentry2 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:19:38] PROBLEM - Puppet failure on deployment-jobrunner01 is CRITICAL 50.00% of data above the critical threshold [0.0] [21:20:35] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 66.67% of data above the critical threshold [0.0] [21:21:09] PROBLEM - Puppet failure on deployment-apertium01 is CRITICAL 33.33% of data above the critical threshold [0.0] [21:21:29] PROBLEM - Puppet failure on deployment-memc02 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:21:31] PROBLEM - Puppet failure on integration-slave-trusty-1017 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:21:45] (03PS3) 10Dduvall: Fixup the 0.4 branch to work with CI jobs [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 [21:22:39] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL 40.00% of data above the critical threshold 
[0.0] [21:23:05] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 44.44% of data above the critical threshold [0.0] [21:23:17] PROBLEM - Puppet failure on integration-slave-precise-1014 is CRITICAL 30.00% of data above the critical threshold [0.0] [21:23:27] PROBLEM - Puppet failure on deployment-parsoidcache02 is CRITICAL 100.00% of data above the critical threshold [0.0] [21:24:07] PROBLEM - Puppet failure on deployment-upload is CRITICAL 44.44% of data above the critical threshold [0.0] [21:24:41] PROBLEM - Puppet failure on deployment-kafka02 is CRITICAL 100.00% of data above the critical threshold [0.0] [21:25:03] PROBLEM - Puppet failure on integration-publisher is CRITICAL 44.44% of data above the critical threshold [0.0] [21:26:03] PROBLEM - Puppet failure on deployment-parsoid05 is CRITICAL 100.00% of data above the critical threshold [0.0] [21:26:05] PROBLEM - Puppet failure on deployment-redis02 is CRITICAL 57.14% of data above the critical threshold [0.0] [21:26:10] RECOVERY - Puppet failure on deployment-elastic06 is OK Less than 1.00% above the threshold [0.0] [21:26:16] ummmmmmmm [21:29:46] (03CR) 10Dduvall: [C: 032] Fixup the 0.4 branch to work with CI jobs [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 (owner: 10Dduvall) [21:31:00] (03Merged) 10jenkins-bot: Fixup the 0.4 branch to work with CI jobs [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 (owner: 10Dduvall) [21:31:02] see -labs, shit is down [21:32:15] RECOVERY - Puppet failure on deployment-db1 is OK Less than 1.00% above the threshold [0.0] [21:34:10] wee [21:34:42] RECOVERY - Puppet failure on integration-slave-trusty-1016 is OK Less than 1.00% above the threshold [0.0] [21:35:22] RECOVERY - Puppet failure on deployment-elastic08 is OK Less than 1.00% above the threshold [0.0] [21:35:26] RECOVERY - Puppet failure on deployment-stream is OK Less than 1.00% above the threshold [0.0] [21:36:02] RECOVERY - Puppet failure on deployment-redis02 is OK Less than 1.00% above the threshold [0.0] [21:37:16] RECOVERY - Puppet failure on deployment-sca02 is OK Less than 1.00% above the threshold [0.0] [21:37:23] RECOVERY - Puppet failure on deployment-elastic05 is OK Less than 1.00% above the threshold [0.0] [21:37:59] RECOVERY - Puppet failure on deployment-db2 is OK Less than 1.00% above the threshold [0.0] [21:39:03] RECOVERY - Puppet failure on deployment-zotero01 is OK Less than 1.00% above the threshold [0.0] [21:39:33] RECOVERY - Puppet failure on deployment-memc03 is OK Less than 1.00% above the threshold [0.0] [21:39:36] (03PS1) 10Dduvall: Check for session ID before updating SauceLabs job [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215796 (https://phabricator.wikimedia.org/T101304) [21:41:27] RECOVERY - Puppet failure on deployment-cxserver03 is OK Less than 1.00% above the threshold [0.0] [21:41:49] RECOVERY - Puppet failure on deployment-salt is OK Less than 1.00% above the threshold [0.0] [21:42:20] RECOVERY - Puppet failure on integration-raita is OK Less than 1.00% above the threshold [0.0] [21:43:02] RECOVERY - Puppet failure on deployment-memc04 is OK Less than 1.00% above the threshold [0.0] [21:47:14] (03PS1) 10Dduvall: Releasing patch version 0.4.3 [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215799 [21:48:03] (03CR) 10Dduvall: [C: 032] Check for session ID before updating SauceLabs job [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215796 (https://phabricator.wikimedia.org/T101304) (owner: 10Dduvall) [21:48:12] (03CR) 10Dduvall: [C: 032] Releasing 
patch version 0.4.3 [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215799 (owner: 10Dduvall) [21:53:20] (03CR) 10Hashar: "Nice workaround :-}" [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215760 (owner: 10Dduvall) [21:59:19] (03Merged) 10jenkins-bot: Check for session ID before updating SauceLabs job [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215796 (https://phabricator.wikimedia.org/T101304) (owner: 10Dduvall) [21:59:21] (03Merged) 10jenkins-bot: Releasing patch version 0.4.3 [selenium] (0.4) - 10https://gerrit.wikimedia.org/r/215799 (owner: 10Dduvall) [22:00:02] So, labs is ‘fixed’ but I can’t browse to beta.wmflabs.org. Is that somehow… expected? [22:02:05] andrewbogott: redirects me to http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page [22:02:19] which loads? [22:03:00] yeah, seemingly works for me [22:03:10] andrewbogott: yes [22:03:27] then I will ignore the fact that it does not load for me [22:03:31] andrewbogott: unless you are trying https [22:03:34] probably https-everwhere messing with me [22:03:39] see the problem kaldari had [22:03:41] although it happens on chrome as well which should be http [22:03:45] andrewbogott: try http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page [22:03:46] there is that cookie [22:03:49] anyway… I’m no longer curious [22:03:59] bd808: that works [22:04:05] cookies are shared between browsers? [22:04:13] 14:59 < bd808> kaldari: look at your cookies. There is some cookie that gets set for some people some times that tries to force https to beta cluster. Once it's there you have to delete it manually or the frontend nagios/varnish will keep rediring you to the non-existent https endpoint [22:04:52] grrr... why is the beta logo not on the beta sites? [22:36:10] anyone else seeing "invalid host name" here? http://en.m.wikipedia.beta.wmflabs.org/ [22:37:00] PROBLEM - Puppet failure on integration-slave-trusty-1013 is CRITICAL 40.00% of data above the critical threshold [0.0] [22:37:32] PROBLEM - Puppet failure on integration-slave-precise-1012 is CRITICAL 50.00% of data above the critical threshold [0.0] [22:38:05] i see actual content [22:38:37] hrm [22:46:14] marxarelli: wfm [22:46:28] check yo cookies? [22:48:28] blek. http://i.imgur.com/fHgPHDl.png [22:49:21] MF browser tests are also failing as a result [22:49:22] https://integration.wikimedia.org/ci/view/BrowserTests/view/-Dashboard/job/browsertests-MobileFrontend-en.m.wikipedia.beta.wmflabs.org-linux-firefox-sauce/704/console [22:49:26] headers already sent? [22:50:31] wtf. i'm a hella confused "(Caused by : [Errno -2] Name or service not known" [22:50:37] seems like a dns issue [22:51:23] i'm also seeing funning stuff when i try to resolve wmflabs.org hosts locally [22:51:51] ldap is/was/something fubar'd in labs right now [22:51:52] ( dig +short bastion.wmflabs.org gives me nothing) [22:52:16] thcipriani: still around? [22:52:30] yup [22:52:38] ^^ [22:52:52] looking [22:53:14] not sure if you're aware of the labs general stuff going on as well [22:53:28] not sure of the summary of that (other than andrew called in faidon for help) [22:53:42] yeah, I kicked off the whole trend of "ldap doesn't work" before it was cool [22:53:42] this works: dig +short @labs-ns1.wikimedia.org. bastion.wmflabs.org [22:53:43] there are multiple issues [22:53:53] this don't: dig +short @labs-ns0.wikimedia.org. bastion.wmflabs.org [22:53:53] since LDAP was fixed [22:53:55] now: [22:54:01] ns-0 forgot stuff [22:54:04] ns-1 did not [22:54:07] got it! [22:54:17] thanks, mutante. 
i'll chill out [22:54:28] i also just reported it like a user [22:54:34] no replies so far [22:54:51] cant connect to my instance [22:57:04] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #141: FAILURE in 4.2 sec: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/141/ [23:02:31] RECOVERY - Puppet failure on integration-slave-precise-1012 is OK Less than 1.00% above the threshold [0.0] [23:06:59] RECOVERY - Puppet failure on integration-slave-trusty-1013 is OK Less than 1.00% above the threshold [0.0] [23:11:14] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host not found (en.wikipedia.beta.wmflabs.org) [23:16:28] (03CR) 10Addshore: "Pong!" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/153399 (owner: 10Addshore) [23:27:37] (03PS4) 10Krinkle: Switch Graph extension to use npm testing for lint checks [integration/config] - 10https://gerrit.wikimedia.org/r/209991 (owner: 10TheDJ) [23:28:10] (03PS5) 10Krinkle: Switch Graph extension to use npm testing for lint checks [integration/config] - 10https://gerrit.wikimedia.org/r/209991 (owner: 10TheDJ) [23:28:39] (03CR) 10Krinkle: [C: 032] Switch Graph extension to use npm testing for lint checks [integration/config] - 10https://gerrit.wikimedia.org/r/209991 (owner: 10TheDJ) [23:30:13] (03Merged) 10jenkins-bot: Switch Graph extension to use npm testing for lint checks [integration/config] - 10https://gerrit.wikimedia.org/r/209991 (owner: 10TheDJ) [23:31:14] !log Reloading Zuul to deploy https://gerrit.wikimedia.org/r/209991 [23:31:18] Logged the message, Master [23:41:17] RECOVERY - Host Generic Beta Cluster is UPING OK - Packet loss = 0%, RTA = 0.83 ms
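A quick way to reproduce the labs-ns0/labs-ns1 split that marxarelli and mutante narrow down above is to query both authoritative servers for the same record (the record name is just the one used in the log):

    for ns in labs-ns0.wikimedia.org labs-ns1.wikimedia.org; do
        echo "== $ns"; dig +short @"$ns" bastion.wmflabs.org
    done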