[00:22:05] Deployment-Systems, Services: Evaluate Ansible as a deployment tool - https://phabricator.wikimedia.org/T93433#1267624 (GWicke) FWIW, I have recently used Ansible for some simple deploy-related tasks: - [Rolling RESTBase deploy / restart on staging servers](https://github.com/gwicke/ansible-playground/bl...
[01:01:28] Continuous-Integration-Infrastructure: Error in CI job: "destination path 'src' already exists and is not an empty directory" - https://phabricator.wikimedia.org/T98426#1267680 (Mattflaschen) NEW
[01:06:28] Continuous-Integration-Infrastructure: Error in CI job: "destination path 'src' already exists and is not an empty directory" - https://phabricator.wikimedia.org/T98426#1267692 (Krinkle) Yeah, I'm getting it too from time to time. Sounds like a bootstrapping issue. In one hand (zuul-cloner) assumes `src` can...
[01:07:05] Continuous-Integration-Infrastructure: Error in CI job: "destination path 'src' already exists and is not an empty directory" - https://phabricator.wikimedia.org/T98426#1267693 (Krinkle)
[01:08:03] Continuous-Integration-Infrastructure: mwexit-qunit fails with "git fatal: 'src' already exists and is not empty" - https://phabricator.wikimedia.org/T98426#1267695 (Krinkle)
[01:26:31] Browser-Tests, Wikimania-Hackathon-2015, Wikimedia-Hackathon-2015: Workshop: write the first browsertests/Selenium test - https://phabricator.wikimedia.org/T94024#1267732 (Qgil) Are you still planning to run this session? If so, we need to know how much time you need, roughly how many participants yo...
[01:31:39] PROBLEM - Host integration-dev is DOWN: CRITICAL - Host Unreachable (10.68.16.227)
[01:32:49] PROBLEM - Host integration-publisher is DOWN: CRITICAL - Host Unreachable (10.68.16.255)
[01:34:09] RECOVERY - Host integration-dev is UP: PING OK - Packet loss = 0%, RTA = 0.57 ms
[01:36:46] RECOVERY - Host integration-publisher is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms
[01:37:38] PROBLEM - Host integration-slave-precise-1011 is DOWN: CRITICAL - Host Unreachable (10.68.17.70)
[01:50:08] would someone please tell me who does the updates for interwiki.cdb
[01:51:25] in production?
[01:51:40] or in beta?
[01:52:18] prod
[01:53:02] I don't know if it's assigned to someone specific
[01:53:06] reedy used to
[01:53:32] should i just do the occasional phab request?
[01:53:52] or is there another process?
[01:54:07] phab requests sounds good and like it would be effective
[01:54:29] k, any special project to add?
[01:54:43] reedy did everything:)
[01:55:12] yes . who forgot to nailgun his feet to the floor?
[01:55:14] wikimedia-site-requests probably sDrewth
[01:55:27] I think we have docs on this somewhere... I forget where
[01:55:49] saw MediaWiki-Sites
[01:55:53] k that I can do. it was poopoo'd for bugzilla
[01:57:25] https://www.mediawiki.org/wiki/Interwiki_cache#CDB_interwiki_cache
[01:57:26] hmm
[01:57:46] https://phabricator.wikimedia.org/T35395
[01:57:53] "MediaWiki core does not have a way to generate interwiki cache"
[01:57:54] oh really
[01:59:18] https://www.mediawiki.org/wiki/Extension:WikimediaMaintenance
[02:01:17] So it turns out I have run this script before sDrewth
[02:01:24] https://gerrit.wikimedia.org/r/#/c/194513/
[02:01:31] t98429
[02:02:53] oho, something really broke
[02:02:53] wow
[02:03:02] not screamingly urgent, just needs doing occasionally, and now and again quickly for internal updates
[02:03:14] bd808, you there?
[02:03:25] I think a sync-file thing has broken updateinterwikicache
[02:03:41] It seems to be trying to syntax-check interwiki.cdb
[02:04:12] that doesn't seem right or good
[02:04:32] * sDrewth hides for being starter of evil
[02:05:48] 02:02:35 sync-file failed: Command '/usr/bin/php -l /srv/mediawiki-staging/wmf-config/interwiki.cdb' returned non-zero exit status 255
[02:06:02] naturally including all sorts of crap like: Warning: Unexpected character in input: '' (ASCII=30) state=0 in /srv/mediawiki-staging/wmf-config/interwiki.cdb on line 3143
[02:06:05] heh. yeah that's a bug for sure
[02:06:09] or Parse error: syntax error, unexpected T_STRING in /srv/mediawiki-staging/wmf-config/interwiki.cdb on line 3143
[02:07:42] So now the change to the file is on tin but not synced or committed or anything
[02:08:09] I wonder why it would try to lint it...
[02:08:21] RECOVERY - Host integration-slave-precise-1011 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms
[02:09:26] oh for christ sakes
[02:09:37] https://github.com/wikimedia/mediawiki-tools-scap/blob/master/scap/main.py#L397
[02:09:56] PROBLEM - Host integration-slave-precise-1012 is DOWN: CRITICAL - Host Unreachable (10.68.17.174)
[02:10:01] I bet that was the uncommitted hack I stashed this morning
[02:10:08] haha
[02:12:04] jeebus. the hack was for the next line. They both need to be guarded
[02:12:05] * yuvipanda tut tuts bd808
[02:12:35] yuvipanda: no there was a dirty file that I stashed (removing the hack)
[02:12:41] aaah ok
[02:12:42] but this needs a patch
[02:13:08] Krenair: you can sync-dir wmf-config for now
[02:13:26] I'll get a patch up for this.
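Editorial sketch: the failure quoted above came from running `php -l` on a binary `.cdb` file. One way to guard such a check is to lint only files whose extension marks them as PHP source. The helper names and the extension list below are assumptions made for illustration; this is not the scap code and not the patch that was later uploaded.

```python
import os
import subprocess

# Assumed set of extensions treated as PHP source (hypothetical, not from scap).
PHP_EXTENSIONS = {'.php', '.inc', '.phtml', '.php5'}


def should_lint(path):
    """Return True only when the file extension looks like PHP source."""
    _, ext = os.path.splitext(path)
    return ext.lower() in PHP_EXTENSIONS


def lint_if_php(path, php='/usr/bin/php'):
    """Run `php -l` on PHP files; skip everything else.

    Returns the exit status of `php -l`, or None when the file was skipped.
    """
    if not should_lint(path):
        return None
    return subprocess.call([php, '-l', path])
```

With a guard like this, a sync of wmf-config/interwiki.cdb would skip the lint step instead of failing with exit status 255.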
[02:36:04] RECOVERY - Host integration-slave-precise-1012 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[02:36:07] PROBLEM - Host integration-slave-precise-1013 is DOWN: CRITICAL - Host Unreachable (10.68.17.209)
[02:43:31] thx krenair
[02:44:34] * sDrewth mutters about how to do tab name expansion on his phone
[03:01:10] RECOVERY - Host integration-slave-precise-1013 is UP: PING OK - Packet loss = 0%, RTA = 2.60 ms
[03:04:18] PROBLEM - Host integration-slave-precise-1014 is DOWN: CRITICAL - Host Unreachable (10.68.18.38)
[03:27:11] RECOVERY - Host integration-slave-precise-1014 is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms
[03:28:01] PROBLEM - Host integration-slave-trusty-1011 is DOWN: CRITICAL - Host Unreachable (10.68.17.244)
[03:29:19] PROBLEM - Puppet staleness on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[03:43:25] Continuous-Integration-Infrastructure: mwexit-qunit fails with "git fatal: 'src' already exists and is not empty" - https://phabricator.wikimedia.org/T98426#1267854 (Legoktm) a:Legoktm I think we can just move `extensions_load.txt` directly under $WORKSPACE.
[03:57:21] RECOVERY - Host integration-slave-trusty-1011 is UP: PING OK - Packet loss = 0%, RTA = 0.78 ms
[03:59:19] PROBLEM - Host integration-slave-trusty-1012 is DOWN: CRITICAL - Host Unreachable (10.68.18.2)
[03:59:42] um
[04:00:11] what's going on with the slaves?
[04:00:22] they felt they were entitled, so they're taking the night off
[04:00:40] well, they're going down and up in order so I'm guessing someone is doing something :P
[04:07:57] (PS1) BryanDavis: Better handling for php lint checks [tools/scap] - https://gerrit.wikimedia.org/r/209425
[04:10:01] (CR) Legoktm: Better handling for php lint checks (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/209425 (owner: BryanDavis)
[04:14:47] (PS1) Legoktm: Make sure src/ is empty before zuul-cloner runs [integration/config] - https://gerrit.wikimedia.org/r/209426 (https://phabricator.wikimedia.org/T98426)
[04:15:24] (PS2) BryanDavis: Better handling for php lint checks [tools/scap] - https://gerrit.wikimedia.org/r/209425
[04:15:44] (CR) BryanDavis: Better handling for php lint checks (1 comment) [tools/scap] - https://gerrit.wikimedia.org/r/209425 (owner: BryanDavis)
[04:19:41] (CR) Legoktm: [C: 1] Better handling for php lint checks [tools/scap] - https://gerrit.wikimedia.org/r/209425 (owner: BryanDavis)
[04:25:39] RECOVERY - Host integration-slave-trusty-1012 is UP: PING OK - Packet loss = 0%, RTA = 66.63 ms
[04:27:10] (CR) Legoktm: [C: 2] Make sure src/ is empty before zuul-cloner runs [integration/config] - https://gerrit.wikimedia.org/r/209426 (https://phabricator.wikimedia.org/T98426) (owner: Legoktm)
[04:27:39] PROBLEM - Host integration-slave-trusty-1015 is DOWN: CRITICAL - Host Unreachable (10.68.18.30)
[04:28:46] (Merged) jenkins-bot: Make sure src/ is empty before zuul-cloner runs [integration/config] - https://gerrit.wikimedia.org/r/209426 (https://phabricator.wikimedia.org/T98426) (owner: Legoktm)
[04:29:33] qa-morebots: link?
[04:29:33] I am a logbot running on tools-exec-1213.
[04:29:33] Messages are logged to https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL.
[04:29:33] To log a message, type !log <msg>.
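Editorial sketch: the change titled "Make sure src/ is empty before zuul-cloner runs" addresses the "'src' already exists and is not empty" failure from T98426. The idea of resetting a checkout directory before cloning into it can be illustrated as follows. `ensure_empty_dir` is a hypothetical helper written for this note; it is not taken from the integration/config change, which may do this differently.

```python
import os
import shutil


def ensure_empty_dir(path):
    """Leave `path` as an existing, empty directory.

    Hypothetical helper: any previous directory, file, or symlink at `path`
    is removed first, so a later clone into it cannot hit leftover content.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)   # real directory, possibly with leftover files
    elif os.path.lexists(path):
        os.remove(path)       # stray file or symlink
    os.makedirs(path)
```

In a job this would run on the workspace's `src` directory just before the clone step.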
[04:31:50] * legoktm grumbles about not !logging
[04:52:22] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[05:00:45] PROBLEM - Host integration-slave-trusty-1016 is DOWN: CRITICAL - Host Unreachable (10.68.18.34)
[05:01:24] legoktm: ^
[05:01:24] hmm
[05:01:41] yuvipanda: 1015 hasn't come back yet though :/
[05:02:29] RECOVERY - Host integration-slave-trusty-1015 is UP: PING OK - Packet loss = 0%, RTA = 2.05 ms
[05:02:49] !log slaves are going up/down likely due to automated labs migration script
[05:02:57] Logged the message, Master
[05:05:30] Beta-Cluster, ContentTranslation-Deployments, ContentTranslation-cxserver: Beta: cxserver fails to start with permission issue - https://phabricator.wikimedia.org/T98436#1267903 (KartikMistry) NEW
[05:07:30] Beta-Cluster, ContentTranslation-Deployments, ContentTranslation-cxserver: Beta: cxserver fails to start with permission issue - https://phabricator.wikimedia.org/T98436#1267910 (KartikMistry)
[05:17:24] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.021 second response time
[05:29:57] RECOVERY - Host integration-slave-trusty-1016 is UP: PING OK - Packet loss = 0%, RTA = 1.14 ms
[05:32:31] PROBLEM - Puppet failure on deployment-bastion is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[05:50:09] (CR) Ori.livneh: [C: 2] Better handling for php lint checks [tools/scap] - https://gerrit.wikimedia.org/r/209425 (owner: BryanDavis)
[05:50:25] (Merged) jenkins-bot: Better handling for php lint checks [tools/scap] - https://gerrit.wikimedia.org/r/209425 (owner: BryanDavis)
[06:02:32] RECOVERY - Puppet failure on deployment-bastion is OK: OK: Less than 1.00% above the threshold [0.0]
[07:02:15] PROBLEM - Puppet failure on deployment-logstash1 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:27:18] RECOVERY - Puppet failure on deployment-logstash1 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:48:11] (Abandoned) Awight: CiviCRM job can be run concurrently [integration/config] - https://gerrit.wikimedia.org/r/203187 (https://phabricator.wikimedia.org/T91895) (owner: Awight)
[08:40:29] Beta-Cluster, ContentTranslation-Deployments, ContentTranslation-cxserver: Beta: cxserver fails to start with permission issue - https://phabricator.wikimedia.org/T98436#1268110 (akosiaris) I see it's fine now. I have seen this once before and it seemed like a race condition between puppet and jenkin...
[08:44:09] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:49:01] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 46885 bytes in 0.963 second response time
[09:16:40] Release-Engineering, Phabricator: Next Phabricator upgrade on 2015-05-20 (tentative) - https://phabricator.wikimedia.org/T98451#1268189 (mmodell) NEW a:mmodell
[12:33:33] Deployment-Systems, operations, Graphite, Patch-For-Review: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1268531 (fgiunchedi) Open>Resolved so two additional issues identified, one fixed in https://gerrit.wikimedia.org/r/209462 the other w...
[13:32:39] zeljkof-meeting: one sec, running a couple minutes late, sorry
[13:32:57] AndyRussG: no problem, I am in the hangout
[13:40:37] AndyRussG: https://office.wikimedia.org/wiki/Selenium_passwords
[13:47:38] zeljkof-meeting: https://saucelabs.com/tests/01763171ff854fb6aa1128a23e5a6f32
[13:48:38] zeljkof-meeting: https://integration.wikimedia.org/ci/view/BrowserTests/view/CentralNotice/
[14:04:53] Browser-Tests, Wikimania-Hackathon-2015, Wikimedia-Hackathon-2015: Workshop: Fix broken browsertests/Selenium Jenkins jobs - https://phabricator.wikimedia.org/T94299#1268830 (Qgil) Are you still planning to run this session? If so, we need to know how much time you need, roughly how many participants...
[14:05:08] !log As of two days now, Jenkins always returns Wikimedia 503 Error page after logging in. Log in session itself is fine.
[14:05:15] Logged the message, Master
[14:05:38] !log deployment-bastion.eqiad has been stuck for 10 hours.
[14:05:48] Logged the message, Master
[14:09:32] Continuous-Integration-Infrastructure, Release-Engineering, Jenkins, Upstream: [upstream] Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#1268860 (Krinkle) You can...
[14:13:19] Release-Engineering, Engineering-Community, Team-Practices: RelEng team offsite - May 2015 - Pre Wikimedia Hackathon - https://phabricator.wikimedia.org/T89036#1268878 (Qgil)
[14:14:45] zeljkof-meeting: https://integration.wikimedia.org/ci/view/BrowserTests/view/CentralNotice/job/browsertests-CentralNotice-en.m.wikipedia.beta.wmflabs.org-linux-android-sauce/89/
[14:21:41] Release-Engineering, Wikimedia-Hackathon-2015: Release/QA tasks at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92565#1268902 (Qgil) If there are any sessions that you would prefer to schedule in advance (i.e. a training session for newcomers), please let me know. The time to book slo...
[14:25:51] Release-Engineering, Mobile-Web: reading-wmf@ list should be notified of browser test failures - https://phabricator.wikimedia.org/T98477#1268931 (Jdlrobson) NEW
[14:27:09] Release-Engineering, Mobile-Web: reading-wmf@ list should be notified of browser test failures - https://phabricator.wikimedia.org/T98477#1268940 (Krenair)
[14:35:40] Beta-Cluster, MediaWiki-extensions-GWToolset, Multimedia, HHVM, Patch-For-Review: GWToolset XML upload fails with “The file that was uploaded exceeds the upload_max_filesize and/or the post_max_size directive in php.ini” on hhvm 3.6 - https://phabricator.wikimedia.org/T97415#1268987 (JeanFred) (...
[14:37:33] Release-Engineering, Mobile-Web: reading-wmf@ list should be notified of browser test failures in Gather and mobile web - https://phabricator.wikimedia.org/T98477#1268988 (Jdlrobson)
[14:42:26] (PS1) Legoktm: Fix typo [integration/config] - https://gerrit.wikimedia.org/r/209491
[14:43:58] (CR) Legoktm: [C: 2] Fix typo [integration/config] - https://gerrit.wikimedia.org/r/209491 (owner: Legoktm)
[14:45:42] (Merged) jenkins-bot: Fix typo [integration/config] - https://gerrit.wikimedia.org/r/209491 (owner: Legoktm)
[15:35:19] Release-Engineering, Mobile-Web: reading-wmf@ list should be notified of browser test failures in Gather and mobile web - https://phabricator.wikimedia.org/T98477#1269189 (zeljkofilipin) (I think @dduvall is on vacation until the hackathon.) Yes, any change done via jenkins web interface will get overwrit...
[15:35:42] Browser-Tests, Release-Engineering, Mobile-Web: reading-wmf@ list should be notified of browser test failures in Gather and mobile web - https://phabricator.wikimedia.org/T98477#1269191 (zeljkofilipin)
[15:42:01] Continuous-Integration-Infrastructure: integration-saltmaster stalled / can not reboot due to labvirt1005 - https://phabricator.wikimedia.org/T97533#1269201 (Cmjohnson)
[15:55:43] Release-Engineering, Reading-Infrastructure-Team, Patch-For-Review, Puppet: Create basic puppet role for Sentry - https://phabricator.wikimedia.org/T84956#1269268 (Jdforrester-WMF)
[16:05:51] !log Updated scap to 5d681af (Better handling for php lint checks)
[16:05:55] Logged the message, Master
[16:22:59] Browser-Tests, Release-Engineering, Continuous-Integration-Config, Mobile-Web: reading-wmf@ list should be notified of browser test failures in Gather and mobile web - https://phabricator.wikimedia.org/T98477#1269390 (Legoktm)
[17:57:57] why is deployment-bastion.eqiad offline?
[17:58:03] legoktm: ^
[17:58:17] umm
[17:58:19] I have no idea
[17:58:25] in jenkins?
[17:58:29] Disconnected by krinkle : Stuck.
[17:58:37] ask him :P
[17:59:27] well, I brought it back online and things are running
[18:00:02] !log brought deployment-bastion.eqiad back online in Jenkins (after Krinkle disconnected it some hours ago). Jobs are processing
[18:00:11] Logged the message, Master
[18:04:51] https://gerrit.wikimedia.org/r/#/c/209418/ - that took its time...
[18:08:38] Krenair: 8s :P
[18:08:47] look at when it was merged
[18:08:54] yes I know :P
[18:12:04] Krenair: I think that's the job (15+ hours right?) that got me to look at jenkins
[18:39:20] yeah, greg-g
[19:00:26] PROBLEM - Free space - all mounts on deployment-eventlogging02 is CRITICAL: CRITICAL: deployment-prep.deployment-eventlogging02.diskspace._var.byte_percentfree (<100.00%)
[19:01:32] bearND: you're kinda spamming the channels with all the joins/parts again :) you make our bots look quiet :)
[19:05:14] greg-g: sorry, I had not noticed. I had put my laptop to sleep. So, no idea why it would join and disconnect all the time.
[19:07:34] ah, I had my wireless mouse in my pocket. That probably explains.
[19:09:54] bearND: heh :)
[19:43:15] why are there still references to common/ in the scap code, bd808?
[19:43:26] wasn't that some fenari directory structure?
[19:44:11] just no one has bothered to update them? or does it still mean something non-obvious?
[20:00:31] greg-g: legoktm: I disconnected it because it was stuck and wouldn't come back. This has been happening for a long time. Almost every day the beta job locks up Jenkins in manual intervention. It interferes with everything else, so the quick fix is to kill it and get back to work. It needs rewriting.
[20:00:43] See https://phabricator.wikimedia.org/T72597, and more actionably: https://phabricator.wikimedia.org/T96199
[20:01:20] Maybe someone in RelEng can rewrite that job to use bash or some other language to use concurrency instead of replicating/matrix jenkins jobs, which is unstable.
[20:01:25] and is what causes these deadlocks
[20:01:50] that should never have been written and enabled that way in the first place. It's poisonous and doesn't work,
[20:02:12] * and required manual intervention
[20:24:26] Krenair: The code reference in scap/tasks.py is to the rsync server location name -- https://github.com/wikimedia/operations-puppet/blob/19d39296876c2f734dfa8d632a91e624049e16e8/modules/scap/manifests/master.pp#L25-L29
[20:24:52] I think I saw something in a comment
[20:25:28] I think this was pointed to /a/common at some point in the way long ago
[20:25:49] now it roots at /srv/mediawiki-staging
[20:27:33] Beta-Cluster: Grant deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270400 (Krenair)
[20:28:02] Beta-Cluster: Grant deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270389 (Krenair) Beta cluster is not really administered directly by ops, certainly not with #ops-access-requests
[20:28:15] was that /home/wikipedia/common at some point bd808?
[20:29:48] yeah I think so. there was a farm of crazy symlinks for every path ever when I started working on prod and beta cluster. Ori and _joe_ squashed them all over the course of the hhvm rollout
[20:30:27] things like /home/wikipedia/common, /usr/local/common, /a/common
[20:30:57] I guess /usr/local/apache/common
[20:31:10] and /u/l/a/uncommon :)
[20:31:57] Basically every time the disk layout changed people just linked the old location to the new one "just in case something needs it"
[20:33:25] We still have /a/mw-log on fluorine
[20:33:33] *nod*
[20:34:02] I think that may be the only /a/* cruft left around
[20:34:02] is /a/mw-log not the canonical location of logs?
[20:34:13] it's just a weird mount point
[20:34:42] what actually reads from it automatically? fatalmonitor?
[20:34:45] I would expect that when that box is reimaged/replaced the logs will be at /srv/mw-logs or some such
[20:35:29] fatalmonitor, udp2log, log2udp, syslog, ... probably some other things
[20:39:19] Beta-Cluster: Grant deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270439 (Krenair) What's your wikitech username?
[20:39:44] Beta-Cluster: Grant deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270440 (Krenair) Oh right, we can get that via Phabricator now. Great.
[20:41:20] Beta-Cluster: Grant deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270446 (Krenair) Open>Resolved a:Krenair
[20:42:08] Beta-Cluster: Grant James Douglas deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270451 (greg)
[20:42:21] Beta-Cluster: Grant James Douglas deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270389 (greg) p:Triage>Normal
[20:42:52] Beta-Cluster: Grant James Douglas deployment access for beta cluster - https://phabricator.wikimedia.org/T98523#1270459 (Jdouglas) It's jdoug- oh, you already fixed it. Thanks!
[20:44:46] Deployment-Systems, Release-Engineering: Determine weekly triage meeting for Deployment Systems - https://phabricator.wikimedia.org/T98206#1270473 (greg) a:thcipriani Assigning to Tyler to lead this (for now at least).
[20:45:21] Beta-Cluster, Release-Engineering: Determine weekly triage meeting for Beta Cluster - https://phabricator.wikimedia.org/T98204#1270475 (greg) a:mmodell Assigning to Mukunda to lead this, for now at least.
[20:48:47] !log Updated kibana to bb9fcf6 (Merge remote-tracking branch 'upstream/kibana3')
[20:48:52] Logged the message, Master
[21:05:02] (PS1) Awight: Disable tests on deployment branches where we have removed them [integration/config] - https://gerrit.wikimedia.org/r/209629
[21:05:29] (PS2) Awight: Disable tests on deployment branches where we have removed them [integration/config] - https://gerrit.wikimedia.org/r/209629 (https://phabricator.wikimedia.org/T94586)
[21:24:21] PROBLEM - Puppet failure on deployment-videoscaler01 is CRITICAL 55.56% of data above the critical threshold [0.0]
[21:39:21] RECOVERY - Puppet failure on deployment-videoscaler01 is OK Less than 1.00% above the threshold [0.0]
[21:57:56] PROBLEM - SSH on integration-slave-trusty-1011 is CRITICAL - Socket timeout after 10 seconds
[21:58:49] um
[21:59:55] Continuous-Integration-Infrastructure, Gather, Patch-For-Review: Set up qunit Jenkins job for Extension:Gather - https://phabricator.wikimedia.org/T91708#1270700 (Jdlrobson)
[22:02:48] RECOVERY - SSH on integration-slave-trusty-1011 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2 (protocol 2.0)
[22:07:40] PROBLEM - Host integration-slave-jessie-1001 is DOWN: PING CRITICAL - Packet loss = 100%
[22:23:02] Beta-Cluster, Graphoid: Deploy Graphoid on Beta Cluster - https://phabricator.wikimedia.org/T97606#1270752 (GWicke) @yurik, IIRC you have to set up deploy rules in puppet that are specific to labs to make that work with trebuchet. It's certainly easier to check out the code manually, straight from gerri...
[22:35:32] 22:32:16 java.io.IOException: remote file operation failed: /mnt/jenkins-workspace/workspace/operations-puppet-tox-py27 at hudson.remoting.Channel@14bf2cd3:integration-slave-precise-1012: java.io.IOException: Could not fetch from any repository
[22:36:42] eh.. after simply repeating it with "recheck" it's ok again
[22:42:34] PROBLEM - Host integration-slave-jessie-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.72)
[22:43:02] mutante: there's some weird stuff going on...
[22:44:00] PROBLEM - Host integration-slave-trusty-1011 is DOWN: CRITICAL - Host Unreachable (10.68.17.244)
[22:44:42] legoktm: looks like labs issues then
[22:45:11] 15:44 < icinga-wm> PROBLEM - Host labvirt1008 is DOWN: PING CRITICAL - Packet loss = 100%
[22:45:56] legoktm: i think it was a labs migration
[22:46:00] < andrewbogott> Will send an email about labvirt1008 and migration status ..
[22:46:04] ok
[22:49:21] Continuous-Integration-Infrastructure, Patch-For-Review: mwexit-qunit fails with "git fatal: 'src' already exists and is not empty" - https://phabricator.wikimedia.org/T98426#1270851 (Mattflaschen) Open>Resolved
[23:02:38] RECOVERY - Host integration-slave-jessie-1001 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[23:02:46] RECOVERY - Host integration-slave-trusty-1011 is UP: PING OK - Packet loss = 0%, RTA = 0.65 ms
[23:21:24] (CR) 20after4: [C: 2] make-release: Add option to list all bundled extensions [tools/release] - https://gerrit.wikimedia.org/r/201247 (owner: Legoktm)
[23:26:34] (CR) 20after4: [C: 2] make-release: Unbreak the SMW bundle [tools/release] - https://gerrit.wikimedia.org/r/201248 (owner: Legoktm)
[23:27:09] (CR) 20after4: [C: 2] make-release: Don't re-list all bundled extensions for each release [tools/release] - https://gerrit.wikimedia.org/r/201246 (owner: Legoktm)
[23:27:34] thanks twentyafterfour
[23:29:29] (Merged) jenkins-bot: make-release: Don't re-list all bundled extensions for each release [tools/release] - https://gerrit.wikimedia.org/r/201246 (owner: Legoktm)
[23:29:31] (Merged) jenkins-bot: make-release: Add option to list all bundled extensions [tools/release] - https://gerrit.wikimedia.org/r/201247 (owner: Legoktm)
[23:29:33] (Merged) jenkins-bot: make-release: Unbreak the SMW bundle [tools/release] - https://gerrit.wikimedia.org/r/201248 (owner: Legoktm)
[23:31:21] PROBLEM - Puppet failure on deployment-pdf01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[23:31:25] PROBLEM - Puppet staleness on deployment-urldownloader is CRITICAL 100.00% of data above the critical threshold [43200.0]
[23:31:45] PROBLEM - Puppet failure on deployment-cache-text02 is CRITICAL 100.00% of data above the critical threshold [0.0]