[00:02:12] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2761897 (10Paladox) @Legoktm what I have done to achive what your asking is I have added a new library https://github.com/forever... [00:53:07] 10Continuous-Integration-Infrastructure (phase-out-gallium), 10releng-201617-q1, 13Patch-For-Review, 07Wikimedia-Incident: Phase out gallium.wikimedia.org - https://phabricator.wikimedia.org/T95757#2762045 (10Dzahn) IPv6 for contint1001 while at it: https://gerrit.wikimedia.org/r/#/c/316040/ https://gerri... [00:55:58] 10Beta-Cluster-Infrastructure, 06Labs: Move deployment-prep to role::puppetmaster::standalone - https://phabricator.wikimedia.org/T149620#2762053 (10AlexMonk-WMF) [01:13:54] PROBLEM - Puppet run on deployment-puppetmaster is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [01:28:56] RECOVERY - Puppet run on deployment-puppetmaster is OK: OK: Less than 1.00% above the threshold [0.0] [01:42:00] Yippee, build fixed! [01:42:00] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #197: 09FIXED in 1 min 35 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/197/ [02:27:39] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2762126 (10Paladox) @Legoktm ok restarting only the ssh connection is supported now. You will need to be in the whitelist to run... [02:59:49] Heads up im doing some whitelist coding so if grrrit-wm disconnects(ill try to make it less often) thats why! [03:10:35] Zppix: huh? [03:10:46] Zppix: are you making changes to the bot that is running? 
[03:10:55] please see -operations for that same discussion [03:10:57] Nevermind disregard [03:10:58] there is a test bot [03:11:04] and a test channel [03:11:14] But i cant find its files xD [03:11:20] that shoudl be used preferably [03:11:39] sorry, but i dont know where they are, paladox does [03:11:45] he told me it's the same tool though [03:11:47] just a copy [03:11:52] I think its diff tool [03:12:19] Its fine i can wait until midnight when dev practically stops to restart if anything [03:13:03] can you just make your own copy ? [03:13:06] if you cant find his copy [03:13:12] and make it join the test channel [03:14:06] mutante: i could but there would be confusion having files 3 diff places [03:15:49] Zppix: that's exactly why i said to work on top of the existing file [03:16:03] you can just take it though and start another instance [03:16:07] as long as the code is the same [03:16:37] I could run a second grrit-wm and comment out its gerrit stream? [03:17:10] But that will be a hellve a bitch pardon my fancy words [03:17:17] you could run a second "grrrit-wm-test" [03:17:23] and do whateever you like with it [03:17:41] i dont know why you would want to comment the gerrit stream [03:17:47] but doesnt really matter [03:18:36] Im just not gonna restart it ever imma just edit the file via shell then let paladox push it or somethign [03:19:00] i feel a bit going in circles because i keep pointing out how it's better to use gerrit and the same code basis, rather than multiple people live hacking their own private stuff, but then you say that it's a problem that everybody does their own stuff [03:19:50] No, see whitelist.js does not even exist on gerrit as far as i knoe [03:19:52] Know [03:20:50] yes.. so? 
[03:20:55] you are testing something new, right [03:21:04] so it doesnt exist yet [03:22:29] why cant you just start a copy of the bot [03:22:35] like he did [03:22:43] Not entirely, see what this whitelist.js will do is grab from wikitech the maintainers list make sure the var whitelist in relay .js matches those users then will verify added users (via cmd or shell or file change) is up to date then update its own personal db with the correct whitelist [03:22:44] PROBLEM - Puppet run on integration-slave-trusty-1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [03:23:22] personal db? [03:23:29] The bots [03:23:36] it has a db? [03:24:39] Idk but if it does im making a new one for its whitelist (i dont wanna fuck shit up) [03:25:25] it's just a simple script, i dont know of any db [03:26:00] Ok [03:26:53] Well this simple script is about to get chuck norris nah never mind bad joke anyway it will become less simple when it comes to whitelist (atleast code wise) [03:27:19] and .. this is why you should upload it to code review [03:27:57] Its easier for me (not my first whitelist) [03:27:58] i can already see some questions about that [03:28:19] like "parsing a wiki to get maintainer list" .. doesnt that mean anyone can make themselves maintainer [03:28:38] No [03:28:43] and isn't that overengineering as opposed to a simple list of nick names in the file itself (which we have now) [03:28:45] etc [03:28:46] It will only get not send [03:29:08] It will get and send to bots db [03:29:11] Thats all [03:29:28] well, afaict, "bots db" here means "list of nick names in the script" [03:30:06] let's talk about it in the actual code [03:30:07] (I plan on updating the var via cmd usually) so attempting to relieve the need for someone to shell in [03:30:43] ok [03:31:14] Needing the db from mango hell even sql would of worked if we didnt use mango [03:31:33] Zppix: you can just paste your code on a phab paste... 
or use the gerrit patch uploader [03:31:41] but let's talk about it with real code [03:32:06] Ive not started yet ive been talking to you answering questions [03:32:06] preferably in the tool made for code review [03:33:23] ok, yes, all i wanted is point you to the test instance and channel that was made for testing now [03:34:13] i gotta go, cu later [03:57:45] RECOVERY - Puppet run on integration-slave-trusty-1001 is OK: OK: Less than 1.00% above the threshold [0.0] [07:41:29] PROBLEM - Puppet staleness on integration-slave-precise-1002 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0] [08:15:14] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Gerrit: Restricted tests are not run when creating a new patchset using Gerrit's web editor - https://phabricator.wikimedia.org/T149770#2762309 (10Nikerabbit) [08:37:53] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Gerrit: Restricted tests are not run when creating a new patchset using Gerrit's web editor - https://phabricator.wikimedia.org/T149770#2762353 (10Paladox) [08:40:13] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure, 10Gerrit: Restricted tests are not run when creating a new patchset using Gerrit's web editor - https://phabricator.wikimedia.org/T149770#2762309 (10Paladox) @Nikerabbit this is related to T141329 and it is fixed in gerrit 2.12.4 but w... [09:05:47] 10MediaWiki-Releasing, 10Timeless, 10Vector, 10Wikimedia-Developer-Summit (2017): Replacing Vector as the default MediaWiki skin - https://phabricator.wikimedia.org/T149636#2762550 (10matmarex) > There are two problems here: > > # Wikimedia wikis use the skin Vector, which does not have any Wikimedia br... 
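Note on the whitelist discussion above (03:22 to 03:30): the alternative mutante describes at 03:28:43, "a simple list of nick names in the file itself", might look roughly like the sketch below. This is an illustration only, not the bot's real code. The nick entries and the function names `isWhitelisted` and `shouldRestart` are hypothetical, and the sketch assumes nicks are compared case-insensitively, as is usual on IRC networks.

```javascript
// Hypothetical nick list; the real entries are not shown in this log.
const whitelist = ['ExampleNickOne', 'ExampleNickTwo'];

// Assumption: nicks are compared case-insensitively, so store lowercased forms.
const allowed = new Set(whitelist.map((nick) => nick.toLowerCase()));

// True when the given nick appears in the in-file list.
function isWhitelisted(nick) {
  return allowed.has(String(nick).toLowerCase());
}

// True only when a whitelisted sender issues the restart command seen in this log.
function shouldRestart(sender, message) {
  return message.trim() === '!grrrit-wm-restart' && isWhitelisted(sender);
}
```

Keeping the list in the file means every change to it goes through code review, which is the point raised at 03:27:19.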
[09:34:25] !grrrit-wm-restart [09:44:27] !grrrit-wm-restart [09:46:30] RECOVERY - Puppet staleness on integration-slave-precise-1002 is OK: OK: Less than 1.00% above the threshold [3600.0] [09:49:48] (03CR) 10Hashar: [C: 032] Whitelist anirudh24seven [integration/config] - 10https://gerrit.wikimedia.org/r/318928 (owner: 10Mholloway) [09:52:51] (03Merged) 10jenkins-bot: Whitelist anirudh24seven [integration/config] - 10https://gerrit.wikimedia.org/r/318928 (owner: 10Mholloway) [10:19:55] (03PS1) 10Hashar: Drop a couple explicit voting: true [integration/config] - 10https://gerrit.wikimedia.org/r/319289 [10:28:21] (03CR) 10Hashar: [C: 032] "Noop in Zuul" [integration/config] - 10https://gerrit.wikimedia.org/r/319289 (owner: 10Hashar) [10:30:05] (03Merged) 10jenkins-bot: Drop a couple explicit voting: true [integration/config] - 10https://gerrit.wikimedia.org/r/319289 (owner: 10Hashar) [10:34:57] (03PS1) 10Hashar: Drop oojs-ui-jshint and oojs-ui-jsonlint [integration/config] - 10https://gerrit.wikimedia.org/r/319290 [10:38:15] (03CR) 10Hashar: [C: 032] Drop oojs-ui-jshint and oojs-ui-jsonlint [integration/config] - 10https://gerrit.wikimedia.org/r/319290 (owner: 10Hashar) [10:39:06] (03Merged) 10jenkins-bot: Drop oojs-ui-jshint and oojs-ui-jsonlint [integration/config] - 10https://gerrit.wikimedia.org/r/319290 (owner: 10Hashar) [10:45:32] (03PS1) 10Hashar: dib: doxygen on Jessie images [integration/config] - 10https://gerrit.wikimedia.org/r/319292 (https://phabricator.wikimedia.org/T119140) [10:46:22] (03CR) 10Hashar: [C: 032] dib: doxygen on Jessie images [integration/config] - 10https://gerrit.wikimedia.org/r/319292 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [10:46:57] (03Merged) 10jenkins-bot: dib: doxygen on Jessie images [integration/config] - 10https://gerrit.wikimedia.org/r/319292 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [10:47:39] !log Force refresh Nodepool snapshot for Jessie so it get doxygen included T119140 
[10:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [10:54:47] !log Image ci-jessie-wikimedia-1478083637 in wmflabs-eqiad is ready [10:54:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:13:44] (03PS1) 10Hashar: Migrate doxygen jobs to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319296 (https://phabricator.wikimedia.org/T119140) [11:26:45] PROBLEM - Parsoid on deployment-parsoid09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [11:28:28] bah [11:30:38] (03PS2) 10Hashar: Migrate doxygen jobs to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319296 (https://phabricator.wikimedia.org/T119140) [11:32:36] PROBLEM - Puppet run on deployment-parsoid09 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:40:06] (03PS3) 10Hashar: Migrate doxygen jobs to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319296 (https://phabricator.wikimedia.org/T119140) [11:40:23] (03CR) 10Hashar: [C: 032] Migrate doxygen jobs to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319296 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [11:41:19] (03Merged) 10jenkins-bot: Migrate doxygen jobs to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319296 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [11:47:51] PROBLEM - Puppet staleness on deployment-cache-text04 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0] [11:57:34] RECOVERY - Puppet run on deployment-parsoid09 is OK: OK: Less than 1.00% above the threshold [0.0] [12:05:52] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2763074 (10Zppix) [12:06:36] RECOVERY - Parsoid on deployment-parsoid09 is OK: HTTP OK: HTTP/1.1 200 OK - 
1514 bytes in 0.463 second response time [12:09:20] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2763086 (10Zppix) Im also currently working on a whitelist code to let certain users add people to whitelist for admin commands a... [12:12:58] (03PS1) 10Hashar: mediawiki-core-doxygen-publish to Jessie/Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/319305 (https://phabricator.wikimedia.org/T119140) [12:18:36] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2763131 (10matmarex) [12:20:52] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2757787 (10hashar) Why are you seeking to restart the bot entirely and or adding a user facing command to manually restart it ?... [12:21:42] (03CR) 10Hashar: [C: 032] "Validated against a REL1_28 change" [integration/config] - 10https://gerrit.wikimedia.org/r/319305 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [12:23:02] hashar ^^ the reason is so we can automate things from production gerrit. So when gerrit production goes down instead of us ssh in to restart the bot, we can issue a command and it does it for us. [12:23:23] (03PS1) 10Hashar: Expand mw-setup JJB macro in the sole job using it [integration/config] - 10https://gerrit.wikimedia.org/r/319311 [12:24:04] paladox: the bot should just do it automatically [12:24:13] hashar it can't [12:24:20] it runs on tools using Kubernetes [12:24:34] paladox: Gerrit restart --> the ssh is terminated --> bot catches the disconnect error -> reestablishes the connection. 
Done [12:25:05] Oh, well i found a command that ends the ssh but it just disconnects it, doesn't cause it to fail [12:27:24] (03Merged) 10jenkins-bot: mediawiki-core-doxygen-publish to Jessie/Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/319305 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [12:31:20] (03CR) 10Hashar: [C: 032] "Noop in JJB :}" [integration/config] - 10https://gerrit.wikimedia.org/r/319311 (owner: 10Hashar) [12:33:29] hashar but it isn't as easy as that. That's why an irc command is better. [12:33:50] (03Merged) 10jenkins-bot: Expand mw-setup JJB macro in the sole job using it [integration/config] - 10https://gerrit.wikimedia.org/r/319311 (owner: 10Hashar) [12:45:42] paladox: Surely it will be [12:45:46] SSH will eventually timeout [12:45:50] Might take a few minutes [12:45:52] But it will [12:45:55] And you can catch that [12:46:33] Reedy oh, maybe you could help since the docs at https://github.com/mscdex/ssh2 only say an end command [12:46:39] please? [13:56:45] 10Gerrit, 06Release-Engineering-Team, 06Operations, 13Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2763356 (10ArielGlenn) Note that since the parallel GCs and CMS use pretty much the same algorithm when doing minor... 
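Note on the reconnect flow described at 12:24:34 and 12:45:42 (Gerrit restarts, the SSH stream drops or times out, the bot catches that and reconnects): a minimal sketch of that idea follows. It is not grrrit-wm's actual code. `connect`, `supervise`, and `nextDelay` are hypothetical names, and `connect` is assumed to return an EventEmitter that emits 'ready', 'error', and 'close'; whether a particular SSH library emits exactly those events should be checked against its own documentation.

```javascript
const { EventEmitter } = require('events');

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
function nextDelay(attempt, baseMs, maxMs) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// `connect` is a hypothetical factory returning an EventEmitter for one
// connection. `schedule` defaults to setTimeout and can be swapped in tests.
function supervise(connect, { baseMs = 1000, maxMs = 60000, schedule = setTimeout } = {}) {
  let attempt = 0;
  let stopped = false;

  function start() {
    if (stopped) return;
    const conn = connect();
    let handled = false;

    const retry = () => {
      // 'error' is often followed by 'close'; schedule only one retry per connection.
      if (handled || stopped) return;
      handled = true;
      schedule(start, nextDelay(attempt++, baseMs, maxMs));
    };

    conn.on('ready', () => { attempt = 0; }); // healthy again, reset the backoff
    conn.on('error', retry);
    conn.on('close', retry);
  }

  start();
  return { stop() { stopped = true; } };
}
```

With something like this in place, a Gerrit restart would need no IRC command: the dropped stream triggers the retry on its own, and the backoff avoids hammering the server while it is still coming back up.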
[14:02:27] (03PS1) 10Hashar: php-compile jobs to Nodepool [integration/config] - 10https://gerrit.wikimedia.org/r/319324 (https://phabricator.wikimedia.org/T119140) [14:06:19] (03CR) 10Hashar: [C: 04-2] "Miss a bunch of development dependencies" [integration/config] - 10https://gerrit.wikimedia.org/r/319324 (https://phabricator.wikimedia.org/T119140) (owner: 10Hashar) [16:27:24] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2763909 (10greg) [16:27:36] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:27:47] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:27:58] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:28:11] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:28:24] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2709174 (10greg) [16:30:07] ryasmeen: sorry, looks like my connection dropped [16:30:42] zeljkof: yeah I cant join the hangout either [16:31:08] looks like it is hangouts problem then [16:31:53] ok lets do over text then ? 
[16:32:02] (03PS1) 10Hashar: pywikibot-tests-beta-cluster to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319353 [16:32:22] (03CR) 10Hashar: "Deployed/refreshed" [integration/config] - 10https://gerrit.wikimedia.org/r/319353 (owner: 10Hashar) [16:32:35] ryasmeen: it will be hard to debug problems via text :) [16:32:42] want to try bluejeans? [16:33:12] or try appear.in [16:33:55] we can, I have never tried it [16:34:36] andre__: do you know if it has screen sharing? [16:34:41] (03CR) 10Hashar: [C: 032] pywikibot-tests-beta-cluster to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319353 (owner: 10Hashar) [16:34:50] I am looking at it, but don't see it anywhere [16:35:21] oh, screensharing... probably not, sorry [16:35:43] (03Merged) 10jenkins-bot: pywikibot-tests-beta-cluster to Nodepool/Jessie [integration/config] - 10https://gerrit.wikimedia.org/r/319353 (owner: 10Hashar) [16:36:17] I'm also not sure if bluejeans supports it, but probably does [16:37:14] ryasmeen: can you try joining the hangout again? looks like it works again [16:37:34] zeljkof: okay [17:42:10] James_F: legoktm I added you to https://gerrit.wikimedia.org/r/#/c/319361/ since you were both listed on the 1.27.0-rc.0 bump patch previously and I'm cargo culting. [17:44:46] thcipriani: That's not been done already? :) [17:45:28] the 1.28.0-rc.0 tag? Not on REL1_28 at least... [17:45:41] Well, I meant the bump, at least [17:45:57] oh nope :) [17:45:59] Oh buggar [17:46:04] That reminds me what I was gonna do today [17:46:10] I've got half an hour to do it [17:46:14] * Reedy looks [17:46:32] plan from here is to tag that commit, and the run make-release and hopefully come out with a presentable tarball at the end [17:46:53] Reedy: is this something for 1.28 release? 
[17:46:57] Yup [17:47:06] legoktm wants to revert his manifest_version 2 stuff [17:47:16] I was gonna look at the extensions, and work out which ones are trivial, and which ones need more work [17:47:23] And put some effort into fixing up as appropriate [17:47:35] I'd question if we want to deal with that before a rc [17:47:39] kk, is this stuff that needs to get into rc.0? [17:47:47] the extension patches not so much [17:47:55] If we're making changes to core... I'd suggest we might do, yes [17:48:13] Core bug is https://phabricator.wikimedia.org/T149757 [17:48:25] Krinkle is suggesting we just ship it as is, and move onto version 3 [17:48:33] That way, would just be some documentation fixes [17:48:47] https://phabricator.wikimedia.org/T149759 is the affected extensions [17:48:57] There's a couple, such as Zero that I won't bother with [17:48:59] Cause no one else uses them [17:49:22] But TorBlock, TrustedXff (among others) will need dealing with in REL1_28 *IF* we revert [17:49:56] PROBLEM - Puppet run on deployment-pdfrender is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:52:53] but none of those extensions that need downgraded are bundled extensions for the release afaict (using ./make-release.py --list-bundled) [17:53:09] so reverting core would be the only thing for REL1_28, seem correct? [17:58:53] 06Release-Engineering-Team, 10Wikimedia-Developer-Summit, 15User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2764234 (10greg) [18:01:24] thcipriani: Yeah... Would be nice to get the extensions fixed in the REL1_28 branch before the final release though [18:01:27] So they work [18:01:32] Shouldn't be too much work [18:02:12] sure, agreed. 
I'm just trying to determine what should block tagging 1.28.0-rc.0 [18:02:36] PROBLEM - Host deployment-pdfrender is DOWN: CRITICAL - Host Unreachable (10.68.17.167) [18:03:10] PROBLEM - Host deployment-mediawiki05 is DOWN: CRITICAL - Host Unreachable (10.68.22.21) [18:04:10] PROBLEM - Host deployment-prometheus01 is DOWN: CRITICAL - Host Unreachable (10.68.20.247) [18:04:52] PROBLEM - Host repository is DOWN: CRITICAL - Host Unreachable (10.68.18.179) [18:04:55] PROBLEM - Host deployment-db03 is DOWN: CRITICAL - Host Unreachable (10.68.23.30) [18:06:22] PROBLEM - Host deployment-tin is DOWN: CRITICAL - Host Unreachable (10.68.21.205) [18:14:57] RECOVERY - Host deployment-db03 is UP: PING OK - Packet loss = 0%, RTA = 0.79 ms [18:15:05] RECOVERY - Host deployment-prometheus01 is UP: PING OK - Packet loss = 0%, RTA = 1.24 ms [18:15:54] RECOVERY - Host deployment-tin is UP: PING OK - Packet loss = 0%, RTA = 1.41 ms [18:17:22] RECOVERY - Host deployment-pdfrender is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms [18:17:48] RECOVERY - Host repository is UP: PING OK - Packet loss = 0%, RTA = 0.81 ms [18:18:47] RECOVERY - Host deployment-mediawiki05 is UP: PING OK - Packet loss = 0%, RTA = 0.97 ms [18:18:54] Project beta-mediawiki-config-update-eqiad build #5940: 04FAILURE in 13 min: https://integration.wikimedia.org/ci/job/beta-mediawiki-config-update-eqiad/5940/ [18:18:54] Project beta-code-update-eqiad build #128444: 04FAILURE in 15 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128444/ [18:19:27] PROBLEM - Puppet run on deployment-mediawiki05 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [18:22:27] PROBLEM - Puppet run on deployment-tin is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [18:24:22] PROBLEM - Host integration-puppetmaster is DOWN: CRITICAL - Host Unreachable (10.68.16.42) [18:25:27] PROBLEM - Host deployment-mediawiki06 is DOWN: CRITICAL - Host Unreachable (10.68.19.241) [18:26:15] 
PROBLEM - Host deployment-phab02 is DOWN: CRITICAL - Host Unreachable (10.68.19.232) [18:26:19] PROBLEM - Host deployment-kafka05 is DOWN: CRITICAL - Host Unreachable (10.68.21.106) [18:26:25] PROBLEM - Host deployment-db04 is DOWN: CRITICAL - Host Unreachable (10.68.18.35) [18:26:52] PROBLEM - Host deployment-apertium02 is DOWN: CRITICAL - Host Unreachable (10.68.22.254) [18:26:58] PROBLEM - Host integration-slave-jessie-1003 is DOWN: CRITICAL - Host Unreachable (10.68.21.145) [18:27:08] PROBLEM - Host integration-slave-jessie-android is DOWN: CRITICAL - Host Unreachable (10.68.19.239) [18:27:30] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0] [18:27:48] PROBLEM - Host deployment-jobrunner02 is DOWN: CRITICAL - Host Unreachable (10.68.19.42) [18:28:12] PROBLEM - Host integration-puppetmaster01 is DOWN: CRITICAL - Host Unreachable (10.68.22.41) [18:29:08] PROBLEM - Puppet run on integration-publisher is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:29:20] PROBLEM - Puppet run on integration-slave-trusty-1006 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [18:29:28] RECOVERY - Puppet run on deployment-mediawiki05 is OK: OK: Less than 1.00% above the threshold [0.0] [18:36:26] RECOVERY - Host deployment-db04 is UP: PING OK - Packet loss = 0%, RTA = 2.69 ms [18:36:41] RECOVERY - Host deployment-mediawiki06 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms [18:36:41] PROBLEM - Puppet run on integration-slave-trusty-1011 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [18:36:45] RECOVERY - Host deployment-apertium02 is UP: PING OK - Packet loss = 0%, RTA = 2.28 ms [18:37:31] RECOVERY - Host integration-slave-jessie-android is UP: PING OK - Packet loss = 0%, RTA = 2.61 ms [18:37:31] RECOVERY - Host integration-slave-jessie-1003 is UP: PING OK - Packet loss = 0%, RTA = 1.37 ms [18:37:32] RECOVERY - Host deployment-kafka05 is UP: PING OK - Packet loss 
= 0%, RTA = 1.93 ms [18:37:39] RECOVERY - Host integration-puppetmaster01 is UP: PING OK - Packet loss = 0%, RTA = 1.27 ms [18:38:55] PROBLEM - Puppet run on integration-saltmaster is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [18:39:44] PROBLEM - Puppet run on integration-slave-precise-1011 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [18:40:24] RECOVERY - Host deployment-phab02 is UP: PING OK - Packet loss = 0%, RTA = 1.40 ms [18:41:08] PROBLEM - Puppet run on deployment-jobrunner02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [18:41:18] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 1547 bytes in 2.218 second response time [18:42:08] PROBLEM - Puppet run on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [18:42:30] PROBLEM - Puppet run on integration-puppetmaster01 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0] [18:44:05] PROBLEM - Puppet run on deployment-apertium02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [18:44:25] PROBLEM - Puppet run on deployment-mediawiki06 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [18:45:05] PROBLEM - Puppet run on deployment-kafka05 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [18:45:17] PROBLEM - Puppet run on deployment-db04 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [18:49:02] RECOVERY - Puppet run on deployment-apertium02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:49:24] RECOVERY - Puppet run on deployment-mediawiki06 is OK: OK: Less than 1.00% above the threshold [0.0] [18:50:04] RECOVERY - Puppet run on deployment-kafka05 is OK: OK: Less than 1.00% above the threshold [0.0] [18:50:17] RECOVERY - Puppet run on deployment-db04 is OK: OK: Less than 1.00% above the threshold [0.0] 
[18:51:09] RECOVERY - Puppet run on deployment-jobrunner02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:52:11] RECOVERY - Puppet run on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [18:52:31] RECOVERY - Puppet run on integration-puppetmaster01 is OK: OK: Less than 1.00% above the threshold [0.0] [18:55:15] greg-g: u around ? [18:55:54] matanya: I have partial attention for 5 minutes, then zero for the next hour-ish :) [18:56:13] 5 should be enough for a short trivial question [18:56:51] you might know i try to review the wmf.X releases every week and report fatals and issues [18:57:46] i find i am only partially useful as i can't provide the full flow of the failing request due to insufficient rights to view logs [18:58:20] would you support a request from me to view access and error logs and the like ?or is there a way to view them without server access ? [18:59:09] greg-g: i used 3 out of the 5 :) [18:59:32] :) [18:59:44] I think so yeah, file a task! [19:00:12] greg-g: that would be flourine ? [19:00:59] yeah, flourine/logstash [19:01:20] greg-g: logstash i already have [19:02:03] ah, then yeah, fluorine [19:02:50] please request groups rather than individual server logins [19:03:20] where they exist anyway [19:03:21] there is mw-log-readers for this purpose [19:03:38] mw-log-reader it is, thanks [19:06:22] Krenair: ah, there is! 
good [19:06:33] with the 's', but yeah [19:06:46] alright, time to be afk for a bit [19:08:33] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.28.0-wmf.23 deployment blockers - https://phabricator.wikimedia.org/T147517#2764556 (10mmodell) 05Open>03Resolved [19:09:09] RECOVERY - Puppet run on integration-publisher is OK: OK: Less than 1.00% above the threshold [0.0] [19:09:19] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2764562 (10mmodell) a:05thcipriani>03mmodell [19:09:20] RECOVERY - Puppet run on integration-slave-trusty-1006 is OK: OK: Less than 1.00% above the threshold [0.0] [19:13:54] RECOVERY - Puppet run on integration-saltmaster is OK: OK: Less than 1.00% above the threshold [0.0] [19:16:41] RECOVERY - Puppet run on integration-slave-trusty-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [19:19:43] RECOVERY - Puppet run on integration-slave-precise-1011 is OK: OK: Less than 1.00% above the threshold [0.0] [19:39:38] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2764769 (10Nikerabbit) [20:04:46] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2740552 (10mmodell) [20:05:21] 06Release-Engineering-Team (Deployment-Blockers), 05Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2764940 (10Matanya) [20:11:00] 10Continuous-Integration-Config, 10Pywikibot-core: pywikibot-tests-beta-cluster job does not run any test - https://phabricator.wikimedia.org/T149842#2764959 (10hashar) [20:14:33] 10Continuous-Integration-Config, 10Pywikibot-core: pywikibot-tests-beta-cluster job does not run any test - https://phabricator.wikimedia.org/T149842#2764976 (10hashar) T100796 was for Travis setup Looks that my task is a duplicate of 
T100903 based on last comment there. [20:14:44] 10Continuous-Integration-Config, 10Pywikibot-core: pywikibot-tests-beta-cluster job does not run any test - https://phabricator.wikimedia.org/T149842#2764985 (10hashar) [20:14:47] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10Pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Run pywikibot test suite regularly on beta cluster as part of MediaWiki/Wikimedia CI - https://phabricator.wikimedia.org/T100903#1323218 (10hashar) [20:18:52] 10Continuous-Integration-Infrastructure, 10Ladies-That-FOSS-MediaWiki, 13Patch-For-Review, 07PostgreSQL: Jenkins: Set up PHPUnit testing on PostgreSQL backend - https://phabricator.wikimedia.org/T39602#2765276 (10Jdforrester-WMF) Migrating from the old tracking task to a tag for PostgreSQL-related tasks. [20:23:18] twentyafterfour: ping? [20:27:07] 10Gerrit, 06Operations, 10grrrit-wm, 13Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2766021 (10Zppix) @hashar As you may already know we are actually doing not that we are attempting (still WIP) to get ssh to gerr... [20:29:44] maybe of interest to people here: https://lists.wikimedia.org/mailman/listinfo/labs-admin [20:33:56] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10Pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Run pywikibot test suite regularly on beta cluster as part of MediaWiki/Wikimedia CI - https://phabricator.wikimedia.org/T100903#2766133 (10hashar) It does not run any test! `... [20:46:45] SMalyshev: yo [20:48:25] twentyafterfour: ah, cool! wanted to ask - do you know how wikidata is deployed? [20:48:48] PROBLEM - Puppet run on deployment-fluorine02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [20:48:49] (because you're owner of the deploy train, so maybe :) [20:49:12] twentyafterfour: i.e. 
if I want to get some patch into current deploy, what I need to do [20:49:21] It's deployed along with the train but it's got a bit of a unique build process (which I am not very familiar with) and a separate release cycle. [20:49:54] SMalyshev: I think it would go through swat like anything else but you might want to talk to the wmfde team about it [20:50:02] twentyafterfour: yeah that's the missing part for me... for other places I usually make a cherry-pick and put it on SWAT but not sure here [20:50:23] ok, will try that [20:50:53] twentyafterfour: thanks! [20:51:04] SMalyshev: no prob [20:54:07] 10Beta-Cluster-Infrastructure, 10Continuous-Integration-Infrastructure, 10Pywikibot-core, 13Patch-For-Review, 07Pywikibot-tests: Run pywikibot test suite regularly on beta cluster as part of MediaWiki/Wikimedia CI - https://phabricator.wikimedia.org/T100903#2766208 (10hashar) With `-a code=en,family=wiki... [21:13:42] 10Continuous-Integration-Config, 10Reading Web Trending service, 06Reading-Web-Backlog, 06Services: Add CI to trending-edits repo - https://phabricator.wikimedia.org/T149601#2766334 (10Jdlrobson) [21:27:04] is this vagrant patch failing because of indentation problems in a file I didn't touch? 
https://gerrit.wikimedia.org/r/317862
[21:30:53] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2766393 (Matanya)
[21:33:42] Release-Engineering-Team, Wikimedia-Developer-Summit, User-greg: Facilitate Wikidev'17 main topic "How to manage our technical debt" - https://phabricator.wikimedia.org/T147937#2766421 (greg)
[21:45:07] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.1 deployment blockers - https://phabricator.wikimedia.org/T149059#2766493 (mmodell)
[21:45:16] (PS1) Hashar: Rename parsoid jsduck publish job [integration/config] - https://gerrit.wikimedia.org/r/319423
[21:45:26] (CR) Hashar: [C: 2] Rename parsoid jsduck publish job [integration/config] - https://gerrit.wikimedia.org/r/319423 (owner: Hashar)
[21:46:57] ejegg: yuck looks like it. let me take a look at the lint failure locally and see if I can fix it
[21:47:49] (CR) jenkins-bot: [V: -1] Rename parsoid jsduck publish job [integration/config] - https://gerrit.wikimedia.org/r/319423 (owner: Hashar)
[21:47:54] thanks bd808!
[21:50:36] (CR) Hashar: [C: 2] Rename parsoid jsduck publish job [integration/config] - https://gerrit.wikimedia.org/r/319423 (owner: Hashar)
[21:52:48] (Merged) jenkins-bot: Rename parsoid jsduck publish job [integration/config] - https://gerrit.wikimedia.org/r/319423 (owner: Hashar)
[21:54:02] (CR) Arlolra: "Look like this failed :(" [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[21:56:43] (CR) Hashar: "Yeah mostly my fault :] I rebased your change and have not been careful. The job has to be suffixed with '-publish' or some env variable" [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[21:56:57] Yippee, build fixed!
[21:56:57] Project beta-code-update-eqiad build #128445: FIXED in 2 min 7 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128445/
[21:57:03] ah
[21:57:18] jenkins jobs for beta cluster are running again
[21:58:28] (CR) Hashar: "Completed! Thank you Arlolra :]" [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[21:59:38] (CR) Arlolra: "Awesome, thanks!" [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[22:03:28] done for tonight
[22:04:23] * greg-g waves
[22:06:31] oh
[22:06:36] and apparently morebots are gone for real
[22:06:43] !log hello stashbot
[22:06:46] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:06:57] * hashar waves
[22:07:07] ejegg: ok I submitted and merged https://gerrit.wikimedia.org/r/#/c/319473/ to fix the linter errors. You'll need to rebase your patch on top of it. The sneaky thing that happened here is that you edited a *.rb file so the full rake test fired and it noticed random files in the repo that had linter errors.
[22:08:03] aha, makes sense
[22:08:13] rebasing
[22:40:01] thanks bd808, got my V+2
[22:40:52] ejegg: awesome. and death to activemq :)
[22:41:10] heh, so satisfying
[23:01:30] Scap3, scap: sync-dir labs config change cached wrong version of InitialSettings.php - https://phabricator.wikimedia.org/T149618#2766718 (greg)
[23:01:50] thcipriani: I added a new REL1_28 backport (it's being SWATed now); what's the process now that RC0 is out?
[23:02:00] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2766719 (greg) >>! In T149529#2760183, @Paladox wrote: > Ive managed to create a instance on the git project. It is a small instance. So, resolved?
[23:02:48] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2766720 (Paladox) Open>Resolved Yep.
[23:03:57] James_F: I tagged RC0 in core, so it should be fine to continue to merge into REL1_28 and we'll release a .1 with a .patch file. Is that how it's been done in the past?
[23:13:02] Release-Engineering-Team, User-greg: Create RelEng FY1617Q1 QR slides (due 10/14) - https://phabricator.wikimedia.org/T146515#2766808 (greg)
[23:13:04] Release-Engineering-Team, User-greg: Update RelEng skillmatrix (due 10/18) - https://phabricator.wikimedia.org/T146516#2766807 (greg) Open>Resolved
[23:13:14] Release-Engineering-Team, User-greg: Create FY1617Q2 timespent spreadsheet - https://phabricator.wikimedia.org/T147675#2766809 (greg) Open>Resolved
[23:13:19] Release-Engineering-Team, User-greg: Create RelEng FY1617Q1 QR slides (due 10/14) - https://phabricator.wikimedia.org/T146515#2663812 (greg) Open>Resolved
[23:13:54] Release-Engineering-Team, User-greg: Once over of [[mw:dev/maint]] for obvious out of dateness - https://phabricator.wikimedia.org/T147862#2706149 (greg) Open>Resolved
[23:21:24] Release-Engineering-Team, User-greg: Reach out to WMDE re [[mw:dev/maint]] - https://phabricator.wikimedia.org/T147861#2766843 (greg) Open>Resolved
[23:28:50] thcipriani: Err. I think so? It theoretically (famous last words) shouldn't have any ability to affect anything other than itself (it's a JS fix for a widget used by UploadWizard, VisualEditor and others), but…
[23:32:11] in theory theory and practice are similar :) I'd say go ahead and cherry-pick to REL1_28 and we can figure it out.
[23:32:26] Done.
[23:32:43] If it all breaks, it's my fault, but I'm still blaming you. ;-)
[23:33:47] heh, seems fair.
[23:33:52] I'll blame chad :)
[23:34:10] in the grand tradition of buck-passing.
[23:58:46] * James_F laughs.