[00:55:15] Gerrit, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2758960 (Paladox) Oh @greg yes please for the test instance. Just needs labs to create the project for this test robot and to copy the original bot to this instance too :)
[01:18:21] Seems we have a phab spammer
[03:07:44] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[03:08:10] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
[03:08:34] RECOVERY - Puppet run on deployment-mira is OK: OK: Less than 1.00% above the threshold [0.0]
[05:34:56] Gerrit, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759181 (greg) Of what? Again, please use the names of things you want a test instance of. I'm still confused on what you need. You haven't listed anything yet other than "test instance". I *think* your last se...
[07:23:02] Gerrit, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759232 (Paladox) Oh, I will just be duplicating grrrit-wm on the instance so it should use the same gerrit account. But as it will be a test I will start it and test then stop it since we don't need two bots. A...
[07:28:26] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759234 (Paladox)
[07:28:35] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (Paladox) stalled>Open
[07:33:57] MediaWiki-Releasing, Timeless, Vector, Wikimedia-Developer-Summit (2017): Replacing Vector as the default MediaWiki skin - https://phabricator.wikimedia.org/T149636#2759237 (Paladox) As some extensions Wikipedia uses hardcode support for vector. We will need to update those extensions to also c...
[07:43:33] PROBLEM - Puppet run on integration-slave-trusty-1003 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[07:44:25] Scap3, User-mobrovac: Smart-merge checks for different environments - https://phabricator.wikimedia.org/T149668#2759242 (mobrovac)
[07:45:59] Scap3, User-mobrovac: Smart-merge checks for different environments - https://phabricator.wikimedia.org/T149668#2759256 (mobrovac)
[07:57:03] MediaWiki-Releasing, Timeless, Vector, Wikimedia-Developer-Summit (2017): Replacing Vector as the default MediaWiki skin - https://phabricator.wikimedia.org/T149636#2759281 (Nemo_bis)
[08:23:32] RECOVERY - Puppet run on integration-slave-trusty-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:35:47] Continuous-Integration-Config, Tool-Labs-tools-stewardbots, Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759303 (MarcoAurelio) Maybe we shouldn't be using the `mediawiki` queue. Is there a `labs` queue? (Sometimes the mediawiki queue...
[08:36:39] Continuous-Integration-Config, Tool-Labs-tools-stewardbots, Patch-For-Review: Implement jenkins tests on labs/tools/stewardbots - https://phabricator.wikimedia.org/T128503#2759304 (MarcoAurelio) I've merged the above change. It always fails on tox, but I guess it's because the bot code is old.
[09:19:18] Beta-Cluster-Infrastructure, Mobile-Content-Service, RESTBase, Services: Set up MCS in BetaCluster - https://phabricator.wikimedia.org/T149671#2759340 (mobrovac)
[09:20:01] Beta-Cluster-Infrastructure, Mobile-Content-Service, RESTBase, Services, User-mobrovac: Set up MCS in BetaCluster - https://phabricator.wikimedia.org/T149671#2759340 (mobrovac)
[09:51:33] PROBLEM - Puppet run on deployment-apertium01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[10:18:51] Gerrit, Operations, grrrit-wm, Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759431 (Peachey88)
[12:18:09] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759556 (Aklapper) @Paladox: Why did you re-add the #Labs team project?
[12:19:38] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759569 (Aklapper) (In general: Could people please be specific, avoid using only "this" and "it", actually be explicit what they're talking about, and take more time to phrase sentences that do not of...
[12:43:58] PROBLEM - Puppet run on repository is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[12:48:38] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759613 (Paladox) Sorry, I didn't see the labs tag had already been added and then removed. Reason why I added the tag is because labs needs to create this new labs project so I can create the instance.
[13:23:59] RECOVERY - Puppet run on repository is OK: OK: Less than 1.00% above the threshold [0.0]
[14:15:38] Gerrit, Operations, grrrit-wm, Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759780 (Zppix) >>! In T149609#2758578, @Dzahn wrote: > And how would that be triggered from gerrit, when the whole point of ne...
[14:21:32] Gerrit, Operations, grrrit-wm, Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2759801 (Zppix)
[14:33:32] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2755213 (Krenair) New project requests don't just need to be in #Labs, they also need to block T76375 - but I'm not sure this qualifies for a project of its own. Why not just an extra tool, or even a...
[14:34:49] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2759828 (Zppix) >>! In T149529#2759181, @greg wrote: > Of what? Again, please use the names of things you want a test instance of. I'm still confused on what you need. You haven't listed anything yet o...
[14:46:34] PROBLEM - Puppet run on zuul-dev-jessie is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[14:53:11] PROBLEM - Host deployment-pdf02 is DOWN: CRITICAL - Host Unreachable (10.68.16.129)
[14:54:29] PROBLEM - Host deployment-conftool is DOWN: CRITICAL - Host Unreachable (10.68.20.30)
[15:07:36] So how does one disable a phabricator user? https://phabricator.wikimedia.org/p/OakleyAlways1/ this person is just adding crap about hacking him to every task that he touches....
[15:08:34] robh, or.i disabled that one for me
[15:08:43] awesome
[15:08:50] robh, you need an admin account to do it
[15:08:56] robh, but you don't have one
[15:09:01] yeah i would have used the command line to take over as root admin ;D
[15:09:11] unless you have the password to that @admin account
[15:09:12] * robh doesnt use admin on robh since he can become root
[15:09:14] which I know you've used before
[15:09:28] but its disabled, no worries... not a very effective troll =]
[15:09:29] in which case, where is that password? two other ops couldn't find it when asked
[15:09:40] its a command line flag to give you a login url
[15:09:56] its detailed in the phab module notes, i dont recall the exact command i read the README in the module when i need it =]
[15:09:58] lemme see
[15:10:09] ew. I assumed you had a password stored away in pwstore somewhere
[15:10:10] ok
[15:10:21] yeah, its not a password but a generated string with login details in the url
[15:10:28] changes its generation each time too, so not static
[15:10:30] might be easier to ask ops to bin/accountadmin to disable people
[15:11:21] which is probably the answer to your original question now that I think about it, maybe
[15:12:00] PROBLEM - SSH on deployment-sca02 is CRITICAL: Server answer
[15:12:20] hrmm, readme says /bin/auth recover
[15:12:24] that sounds familiar
[15:15:07] nope, at least that file doesnt exist in /srv/phab/phabricator/bin
[15:15:24] Krenair: if i figure it out i'll toss a file in pwstore that says how to do it for ease of reference =]
[15:15:59] to get the @admin credentials or to just disable an account from the CLI?
[15:16:05] get the admin account
[15:16:13] either way thanks
[15:19:04] got it to work to give me a recovery url string that logs me in.
[15:19:18] i'll go ahead and copy that info directly into pwstore now for future =]
[15:28:44] disabling is as easy as pulling up the user profile and managing it after being admin, so thats easy. i love phabricator.
[15:30:41] thanks robh
[15:30:47] (re getting that info into pwstore)
[15:31:04] it was always in the phab readme in the module but that may not be the first place someone looked.
[15:31:14] I only knew from being around for the first implementation of phab
[15:31:51] * greg-g nods
[15:37:26] greg-g: we are doing 1.29 to group0 today ?
[15:39:49] PROBLEM - Puppet run on deployment-phab02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:39:59] asking cause i don't see any changelog for this release
[15:40:09] matanya: No changelog if it's not branched yet, presumably
[15:40:30] yeah, makes sense
[15:42:13] usually branch around 17:00 UTC, post notes on mediawiki.org shortly thereafter.
[15:42:23] matanya: yup: https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20161101T1900
[15:42:44] and now i see https://phabricator.wikimedia.org/T149059
[15:42:55] thanks folks, sorry for the noise
[15:44:55] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760101 (Paladox) @Krenair oh, I guess we could have it on tools. But we need the ability to permanently stop this test bot since it will just duplicate things if it is left running.
[15:51:03] PROBLEM - Puppet run on deployment-phab01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:55:12] Release-Engineering-Team (Deployment-Blockers), Release: MW-1.29.0-wmf.2 deployment blockers - https://phabricator.wikimedia.org/T149338#2760145 (greg) a:mmodell
[15:55:39] Gerrit, Release-Engineering-Team, Operations, Patch-For-Review: Investigate why gerrit slowed down on 17/10/2016 / 18/10/2016 / 21/10/2016 - https://phabricator.wikimedia.org/T148478#2760146 (ArielGlenn) Preliminary findings from the logs for about 1 week: - There were 238 full GCs or an averag...
[15:59:06] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760158 (greg) I saw complaints last night of testing related to this making noise in our production Gerrit (and Phab?); what is your testing plan and how will you ensure that you are not disruptive in...
[16:02:46] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760170 (Paladox) @greg we could test on either an instance. Or duplicate grrrit-wm on the tools labs so that the production one is always working, and we can test using the test bot under a different...
[16:07:41] Gerrit, Labs, grrrit-wm: Create grrrit-wm test instance - https://phabricator.wikimedia.org/T149529#2760183 (Paladox) Ive managed to create an instance on the git project. It is a small instance.
[16:34:55] greg-g ive managed to setup a test instance using the project git ^^
[16:39:00] paladox: a test instance of what?
[16:39:11] greg-g of grrrit-wm
[16:39:53] and what impact will it have on Gerrit or IRC?
[16:41:57] Well i will not do any testing on the production bot. I will do it on the test bot and will do it under probably a nick name like grrrit-wm-test
[16:42:02] as not to confuse
[16:59:16] Gerrit: Delete node-rdkafka-stats repo from gerrit - https://phabricator.wikimedia.org/T149712#2760441 (Ottomata)
[17:00:39] Beta-Cluster-Infrastructure, Analytics, WikimediaPageViewInfo: Deploy WikimediaPageViewInfo extension to beta cluster - https://phabricator.wikimedia.org/T129602#2110544 (Reedy) Don't want to do this until T148775 is fixed, but I'm happy to deploy this to beta and/or production afterwards as deemed a...
[18:02:09] PROBLEM - Host integration-slave-trusty-1001 is DOWN: CRITICAL - Host Unreachable (10.68.16.168)
[18:02:56] PROBLEM - Host integration-slave-trusty-1021 is DOWN: CRITICAL - Host Unreachable (10.68.17.118)
[18:03:45] PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (10.68.17.186)
[18:05:46] (PS1) Arlolra: Build docs on jessie as well [integration/config] - https://gerrit.wikimedia.org/r/319114
[18:07:51] (CR) Arlolra: "Please test this before merging. This was in best effort." [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[18:56:28] greg-g: #wikimedia-gerrit is a thing.. apparently .. just not a very popular thing
[18:56:31] 11:55 [Users #wikimedia-gerrit]
[18:56:33] 11:55 [@ChanServ] [ erikab] [ mutante]
[18:56:36] never knew
[18:56:50] Me either found it after trying some channels that were empty
[18:56:53] to test a bot
[19:08:17] well that's a dumb thing :)
[19:08:29] we don't need more channels for specific tools like that
[19:08:36] Yep
[19:08:41] I'd leave it and let it go quietly into the sunset :)
[19:16:51] ^^ that wasn't me
[19:16:57] Project beta-scap-eqiad build #126977: FAILURE in 2 min 3 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126977/
[19:26:48] Project beta-scap-eqiad build #126978: STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126978/
[19:37:10] Yippee, build fixed!
[19:37:11] Project beta-scap-eqiad build #126979: FIXED in 2 min 11 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126979/
[21:19:33] Project beta-scap-eqiad build #126989: STILL FAILING in 4 min 36 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126989/
[21:19:58] Gerrit, Operations, grrrit-wm, Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2761484 (Paladox) @Legoktm I've figured out how to do that. I found an npm library that can restart the whole script.
[21:27:22] paladox: renames should be straight-forward, simply a #site-request to make the relevant config changes
[21:27:28] legoktm: ty sir
[21:27:31] Oh
[21:27:55] greg-g ive created https://gerrit.wikimedia.org/r/#/c/319131/
[21:28:05] Project beta-scap-eqiad build #126990: STILL FAILING in 3 min 8 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126990/
[21:29:24] paladox: it should really have an associated task for the config change in WMF production, not the task that's linked right now (the one about the procedural aspects of it)
[21:30:07] greg-g oh is there a wiki page on this please? So i can create the task according to how the wiki page says this?
[21:30:57] paladox: https://www.mediawiki.org/wiki/Shell_requests
[21:31:03] Thanks
[21:34:16] Gerrit, Repository-Admins, Patch-For-Review, WMF-deploy-2016-11-01_(1.29.0-wmf.1): Rename the Semantic Forms extension to "Page Forms" - https://phabricator.wikimedia.org/T147582#2697339 (Paladox)
[21:34:20] greg-g https://phabricator.wikimedia.org/T149749
[21:42:21] (PS2) Legoktm: Build Parsoid docs on jessie [integration/config] - https://gerrit.wikimedia.org/r/319114 (owner: Arlolra)
[21:50:30] Project beta-code-update-eqiad build #128322: FAILURE in 17 min: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128322/
[21:56:54] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #204: FAILURE in 4 min 53 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/204/
[21:57:21] Yippee, build fixed!
[21:57:21] Project beta-code-update-eqiad build #128323: FIXED in 2 min 16 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/128323/
[21:58:09] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: FAILURE in 8.3 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/
[21:58:10] Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #194: FAILURE in 9.2 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/194/
[22:00:24] Project beta-scap-eqiad build #126991: STILL FAILING in 3 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126991/
[22:05:55] beta has some troubles apparently ^^^
[22:06:20] :/
[22:06:29] hm
[22:06:49] Project beta-scap-eqiad build #126992: STILL FAILING in 1 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126992/
[22:07:01] permission denied
[22:07:14] I guess the keyholder is crazy on deployment-tin
[22:07:45] probably still because all of labs is rebooting
[22:07:54] or I broke it with the puppetmaster move :/
[22:08:01] yup
[22:08:10] https://en.wikipedia.beta.wmflabs.org/w/api.php yields a varnish error
[22:08:21] so maybe the mw app instances are rebooting
[22:08:29] or the cache did not come back all fine
[22:10:04] !log Armed keyholder on deployment-tin . Instance had 20 minutes uptime and apparently keyholder does not self arm
[22:10:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[22:10:29] building on https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126993/console
[22:10:35] I already tried that hashar
[22:10:43] Oh
[22:10:51] Did you do it differently?
[22:11:03] just did ssh to deployment-tin
[22:11:07] sudo keyholder status
[22:11:16] noticed keys are not loaded and thus: sudo keyholder arm
[22:11:28] derp
[22:11:31] I armed on -mira
[22:11:36] then tried to use it on -tin
[22:12:12] maybe we can make the jenkins job to always try to arm :D
[22:12:43] there's still something wrong on -mir
[22:12:44] mira
[22:12:52] krenair@deployment-mira:~$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04
[22:12:52] Connection closed by 10.68.19.128
[22:13:06] krenair@deployment-tin:~$ SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@deployment-mediawiki04
[22:13:06] Linux deployment-mediawiki04 4.4.0-2-amd64 #1 SMP Debian 4.4.2-3+wmf6 (2016-10-18) x86_64
[22:14:43] who knows :(
[22:15:01] the lame way would be: run puppet ; reboot ; keyholder arm
[22:15:07] Yippee, build fixed!
[22:15:08] Project beta-scap-eqiad build #126993: FIXED in 4 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/126993/
[22:15:31] so at least scap works now
[22:15:43] the text cache / mw app servers is to be figured out though ( https://en.wikipedia.beta.wmflabs.org/w/api.php yields a 503 )
[22:16:04] but I am not working today and it is time to get to bed :/
[22:16:53] Nov 1 22:14:39 deployment-mediawiki04 sshd[5342]: pam_access(sshd:account): access denied for user `mwdeploy' from `deployment-mira.deployment-prep.eqiad.wmflabs'
[22:17:09] right, I'll look at that first
[22:17:24] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/role/manifests/cache/text.pp:1 on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:17:34] that is puppet on deployment-cache-text04
[22:18:14] iirc that would be due to a newer ruby version on the puppetmaster that is more strict about encoding
[22:18:41] * Krenair sighs
[22:18:42] 13 FetchError c Junk after gzip data
[22:18:49] bah
[22:20:04] Project beta-update-databases-eqiad build #12493: FAILURE in 3 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12493/
[22:20:06] 2016-11-01 22:19:55 [WBkVCwpEE4AAAANQjy4AAAAE] deployment-mediawiki04 enwiki 1.29.0-alpha exception ERROR: [WBkVCwpEE4AAAANQjy4AAAAE] /wiki/test DBConnectionError from line 748 of /srv/mediawiki/php-master/includes/libs/rdbms/database/Database.php: Cannot access the database: Can't connect to MySQL server on '10.68.18.35' (111) (10.68.18.35) {"exception_id":"WBkVCwpEE4AAAANQjy4AAAAE"}
[22:20:06] Project selenium-CentralAuth » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #196: FAILURE in 5.9 sec: https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/196/
[22:20:17] -db04
[22:22:37] hashar, okay, started mysql on the -db hosts to fix that
[22:22:51] magic
[22:23:39] clicked the job that updates the db https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12494/console
[22:24:01] api link is all back
[22:24:03] Krenair: well done :)
[22:24:23] Yippee, build fixed!
[22:24:23] Project beta-update-databases-eqiad build #12494: FIXED in 1 min 5 sec: https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/12494/
[22:24:50] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::cache::text for deployment-cache-text04.deployment-prep.eqiad.wmflabs on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:24:52] seriously.
[22:25:22] oh, it's one of those silly temporary errors I think
[22:25:32] Yippee, build fixed!
[22:25:33] Project selenium-PageTriage » chrome,beta,Linux,contintLabsSlave && UbuntuTrusty build #195: FIXED in 54 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=chrome,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/195/
[22:25:38] retriggered https://integration.wikimedia.org/ci/job/selenium-PageTriage/ and https://integration.wikimedia.org/ci/job/selenium-Core/
[22:25:41] Yippee, build fixed!
[22:25:42] Project selenium-PageTriage » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #195: FIXED in 1 min 3 sec: https://integration.wikimedia.org/ci/job/selenium-PageTriage/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/195/
[22:25:43] wait... no
[22:30:27] hashar, did you change anything on that host?
[22:30:30] -cache-text04
[22:30:43] Yippee, build fixed!
[22:30:44] Project selenium-Core » firefox,beta,Linux,contintLabsSlave && UbuntuTrusty build #205: FIXED in 6 min 12 sec: https://integration.wikimedia.org/ci/job/selenium-Core/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/205/
[22:32:01] Krenair: nothing
[22:32:05] havent touched it beside looking at puppet.log
[22:32:16] maybe puppet has been broken for a while
[22:32:29] at least starting the db on db4 fixed the jobs/beta web pages
[22:33:17] (and apparently we have lost shinken.wmflabs.org / all monitoring)
[22:33:49] sigh
[22:33:56] I am losing track of all the broken things
[22:34:02] The last Puppet run was at Tue Nov 1 18:03:20 UTC 2016 (263 minutes ago). Puppet is disabled. reason not specified
[22:34:03] FML
[22:34:14] bah
[22:34:20] (this is on shinken-01)
[22:34:35] just give up / move to other more interesting things :D
[22:35:14] okay let's go back to beta
[22:35:15] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: invalid byte sequence in US-ASCII at /etc/puppet/modules/role/manifests/cache/text.pp:1 on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:35:22] this uses the deployment-puppetmaster puppetmaster
[22:35:50] root@deployment-puppetmaster:/var/lib/git/operations/puppet# head -n1 /etc/puppet/modules/role/manifests/cache/text.pp | xxd
[22:35:50] 0000000: 636c 6173 7320 726f 6c65 3a3a 6361 6368 class role::cach
[22:35:50] 0000010: 653a 3a74 6578 7428 0a e::text(.
[22:39:10] When I run puppet, I get a different error
[22:39:13] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find class role::cache::text for deployment-cache-text04.deployment-prep.eqiad.wmflabs on node deployment-cache-text04.deployment-prep.eqiad.wmflabs
[22:39:15] the puppet message is misleading
[22:39:35] that is probably ruby dying out when parsing the .pp as a ruby
[22:39:40] and puppet assuming it is line 1
[22:40:18] Krenair: example from the past https://phabricator.wikimedia.org/T86282
[22:40:49] though that is for mw vagrant
[22:41:01] but the trick is that ruby needs some appropriate locale to be passed
[22:41:11] and whatever new puppetmaster setup we have is broken in that regard
[22:41:22] worth filing it as a bug and reach out to ops list
[22:41:31] some puppet guru will know
[22:41:46] that is all. I am asleep now :)
[22:42:38] no
[22:42:42] this is the normal puppetmaster
[22:42:46] I have not moved this node over
[22:45:42] Krenair: dont obsess too much about it though :)
[22:45:50] have a good day
[22:46:00] any idea what happened here? https://integration.wikimedia.org/ci/job/selenium-CentralAuth/196/console
[22:46:11] the selenium job failed with no output
[22:47:29] Think I got the damn thing
[22:47:53] Had python try to read the file, encode it, it told me what the invalid character was
[22:47:56] then I removed it
[22:48:00] - # topic into�many JSON based kafka topics for further
[22:48:00] + # topic into many JSON based kafka topics for further
[22:53:54] RECOVERY - Puppet run on deployment-mx is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:04] RECOVERY - Puppet run on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:08] RECOVERY - Puppet run on deployment-imagescaler01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:14] RECOVERY - Puppet run on deployment-sca01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:31] RECOVERY - Puppet run on deployment-poolcounter04 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:35] RECOVERY - Puppet run on deployment-pdf01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:35] RECOVERY - Puppet run on deployment-db03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:54:53] RECOVERY - Puppet run on deployment-mathoid is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:18] RECOVERY - Puppet run on deployment-sentry01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:55:49] tgr: there was weirdness in beta recently
[22:56:04] RECOVERY - Puppet run on deployment-kafka01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:56:04] RECOVERY - Puppet run on deployment-ms-be01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:56:08] RECOVERY - Puppet run on deployment-ircd is OK: OK: Less than 1.00% above the threshold [0.0]
[22:57:18] RECOVERY - Puppet run on deployment-elastic07 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:14] RECOVERY - Puppet run on deployment-kafka03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:19] RECOVERY - Puppet run on deployment-eventlogging03 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:58:23] RECOVERY - Puppet run on deployment-changeprop is OK: OK: Less than 1.00% above the threshold [0.0]
[22:59:49] RECOVERY - Puppet run on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:24:00] Gerrit, Operations, grrrit-wm, Patch-For-Review: Support restarting grrrit-wm automatically when we restart production gerrit - https://phabricator.wikimedia.org/T149609#2761798 (Paladox) @Legoktm ok ive fixed everything https://gerrit.wikimedia.org/r/318976 now. It's ready to be merged but needs...
[23:37:21] tgr|away, greg-g: okay beta should have returned to normal levels of weirdness now
[23:40:39] :)
[23:45:48] PROBLEM - Puppet run on deployment-cache-text04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:50:28] RECOVERY - Puppet run on deployment-tin is OK: OK: Less than 1.00% above the threshold [0.0]
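Note on the 22:47 fix: the log says Python was used to read the manifest and report the invalid character, but not how. The following is only a minimal sketch of one way that could be done, not the commands actually run. The function names `find_non_ascii` and `first_ascii_error` are hypothetical, and the sample byte `0xA0` is an assumption for illustration, since the log never states which byte was in the file.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte value) for every byte outside 7-bit ASCII.

    Line and column numbers are 1-based, so they can be compared with the
    file:line positions that tools such as puppet print.
    """
    hits = []
    for line_no, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:
                hits.append((line_no, col, byte))
    return hits


def first_ascii_error(data: bytes):
    """Return the byte offset where ASCII decoding first fails, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    # Hypothetical sample: a clean first line followed by a comment that
    # contains one stray byte (0xA0 is assumed, not taken from the log).
    sample = b"class role::cache::text(\n# topic into\xa0many\n"
    for line_no, col, byte in find_non_ascii(sample):
        print(f"line {line_no}, column {col}: byte 0x{byte:02x}")
```

For a real file, the bytes could be read with `open(path, "rb").read()`. Scanning raw bytes this way reports the line that really holds the offending character, which matters here because the puppet error pointed at line 1 while `xxd` showed that line to be plain ASCII.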