[01:09:31] !log tools killing local copy of python-requests, there seems to be a newer version in prod [01:09:38] Logged the message, Master [01:50:55] RECOVERY - Puppet failure on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [01:52:25] RECOVERY - Puppet failure on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [01:58:41] RECOVERY - Puppet failure on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [02:01:19] RECOVERY - Puppet failure on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [02:01:53] RECOVERY - Puppet failure on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [02:03:56] YESSSS [02:03:57] RECOVERY - Puppet failure on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [02:06:52] RECOVERY - Puppet failure on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [02:09:58] RECOVERY - Puppet failure on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:52] RECOVERY - Puppet failure on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [02:14:10] !log tools created tools-exec-14{01-05} [02:14:36] Logged the message, Master [02:38:21] YuviKTM: why do you have so many nicks? [02:38:49] Negative24: 'I have made a vow to Yahweh and cannot break it.' [02:39:18] ? [02:39:35] :) long story [02:40:42] I can see a bit of an explanation in -dev. Hi legopanda [02:40:48] :D [02:44:21] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [02:44:40] we know, shinken-wm. [02:44:44] it’ll be alright, don’t worry [02:46:43] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0] [02:46:47] it's always nice to know that it has your back (about 4-5 times in redundancy) [02:46:57] :) [02:46:58] yeah [02:51:27] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [02:53:19] !log tools created tools-exec-14{06-10} [02:54:01] Logged the message, Master [02:54:21] RECOVERY - Puppet failure on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [02:55:54] thank you, tools-exec-1401! [02:56:45] RECOVERY - Puppet failure on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [03:01:00] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:04:07] !log tools pooled tools-exec-1401 [03:04:12] Logged the message, Master [03:06:50] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [03:07:46] !log tools pooled tools-exec-1405 [03:07:51] Logged the message, Master [03:09:46] dungodung: I was wondering if you had a minute to look into my bot’s cloak request, please. :) [03:13:56] !log tools pooled tools-exec-1402 [03:14:02] Logged the message, Master [03:16:28] RECOVERY - Puppet failure on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [03:16:50] RECOVERY - Puppet failure on tools-exec-1404 is OK: OK: Less than 1.00% above the threshold [0.0] [03:18:23] !log tools pooled tools-exec-1403, 1404 [03:18:28] Logged the message, Master [03:21:34] !log disabled and drained continuous tasks off tools-exec-20 [03:21:38] disabled is not a valid project.
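The rejection just above happens because the logging bot treats the first word after !log as the project name, so a message that omits it, like "!log disabled and drained ...", is refused. Going by the entries the bot accepts elsewhere in this log, the expected shape is roughly:

    !log <project> <free-form message>
    !log tools disabled and drained continuous tasks off tools-exec-20

The corrected retry follows below.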
[03:24:17] !log tools disabled and drained continuous tasks off tools-exec-20 to tools-exec-24 [03:24:22] Logged the message, Master [03:25:03] I love those overlooked "fill in your details here" files [03:25:33] * Negative24 facepalms *hard* [03:26:03] RECOVERY - Puppet failure on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [03:26:41] PROBLEM - Host tools-exec-21 is DOWN: CRITICAL - Host Unreachable (10.68.17.252) [03:27:29] PROBLEM - Host tools-exec-22 is DOWN: CRITICAL - Host Unreachable (10.68.17.253) [03:27:32] !log tools deleted tools-exec-21 to 24, one task still running on tools-exec [03:28:04] Logged the message, Master [03:28:16] PROBLEM - Host tools-exec-23 is DOWN: CRITICAL - Host Unreachable (10.68.17.254) [03:29:18] years later when people look at the tools logs for today, they'll be like "who in the world was YuviKTM"??? [03:29:25] haha [03:29:26] :D [03:29:42] PROBLEM - Host tools-exec-24 is DOWN: CRITICAL - Host Unreachable (10.68.17.255) [03:30:28] !log phabricator git cloning on ssh configured and working [03:30:32] Logged the message, Master [03:31:31] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0] [03:31:34] !log depooled and deleted tools-exec-12 had nothing on it [03:31:34] depooled is not a valid project. [03:31:38] !log tools depooled and deleted tools-exec-12 had nothing on it [03:31:43] Logged the message, Master [03:32:18] legoPanda: wikibugs got restarted, wonder if it’s still working [03:32:45] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:32:53] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:33:01] YuviKTM: uhh, no idea! [03:33:02] PROBLEM - Host tools-exec-12 is DOWN: CRITICAL - Host Unreachable (10.68.17.166) [03:33:09] YuviKTM: it should have rejoined here upon restart [03:33:54] it did but I just made a comment and it didn’t say anything [03:33:54] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248181 (10yuvipanda) Alright, so I've created tools-exec-12{01-10} and tools-exec-14{01-10}. I've also pooled in tools-exec-14{01-05} and depooled almost all the old trusty nodes (except tools-exec-20, which has... [03:33:55] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:33:55] aha!
[03:33:55] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [03:33:55] it’s here \o/ [03:41:30] RECOVERY - Puppet failure on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [03:52:57] !log tools depooled tools-exec-03 / 04 [03:53:03] Logged the message, Master [03:54:30] !log tools tools-exec-03 and -04 were deleted a long time ago [03:54:35] Logged the message, Master [03:57:41] RECOVERY - Puppet failure on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [03:57:51] RECOVERY - Puppet failure on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [03:58:14] !log tools pooled tools-exec-12{02-10}, forgot to put appropriate roles on 1201, fixing now [03:58:19] Logged the message, Master [04:00:09] !log tools pooled tools-exec-1406 and 1407 [04:00:14] Logged the message, Master [04:03:42] RECOVERY - Puppet failure on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [04:04:13] !log tools pooled tools-exec-1408 and tools-exec-1409 [04:04:18] Logged the message, Master [04:08:08] !log tools depooled tools-exec-09, apt troubles [04:08:13] Logged the message, Master [04:14:33] !log tools repooled tools-exec-09, apt troubles fixed [04:14:38] Logged the message, Master [04:19:40] !log tools rejuggle jobs again in trustyland [04:19:45] Logged the message, Master [04:23:43] !log tools repooled tools-exec-1201, it's all good now [04:23:48] Logged the message, Master [04:25:01] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [04:25:42] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248203 (10yuvipanda) Ok, so tools-exec-14{01-10} are pooled now, and so are tools-exec-12{01-10} :D All old trusty instances except tools-exec-20 are deleted as well. [04:27:40] !log tools depooled tools-exec-09.eqiad.wmflabs [04:27:45] Logged the message, Master [04:28:45] RECOVERY - Puppet failure on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [04:28:49] !log tools deleted tools-exec-09 [04:28:53] Logged the message, Master [04:31:10] !log tools depooled exec-{01-05}, rejigged jobs to newer nodes [04:31:37] I think I’m going to need more nodes. [04:32:20] PROBLEM - Host tools-exec-09 is DOWN: CRITICAL - Host Unreachable (10.68.17.64) [04:33:52] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248204 (10yuvipanda) So everything in tools-exec-{01-10} has been disabled and drained of continuous jobs. [04:35:02] RECOVERY - Puppet failure on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [04:35:11] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248205 (10yuvipanda) We're going to need more nodes, I think. I'm going to add 10 more precise larges and 5 more trusty larges. Some of the nodes being decommed are xlarges too, while all of the new ones are larges.
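The pooling, depooling and draining above is gridengine administration; the exact commands are not shown in the log. On a stock gridengine install, draining and removing one node would presumably look something like the sketch below (the host name is taken from the log; the invocations are assumed from standard gridengine tooling, not confirmed by this transcript):

    # stop new jobs from being scheduled onto the node
    qmod -d '*@tools-exec-09'
    # reschedule its still-running continuous jobs onto other nodes
    qmod -rj <job-ids>
    # once the node is empty, remove it from the cluster configuration
    qconf -de tools-exec-09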
[04:39:45] !log tools delete tools-exec-10, was out of jobs [04:39:45] !log tools depooled exec-{06-10} rejigged jobs to newer nodes [04:39:46] boo morebots [04:40:16] PROBLEM - Host tools-exec-10 is DOWN: CRITICAL - Host Unreachable (10.68.17.65) [04:41:27] !log tools killed tools-dev, nobody still ssh’d in, no crontabs [04:44:52] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248211 (10yuvipanda) Created tools-exec-121{1-9}, and just ran out of quota. [04:45:25] PROBLEM - Host tools-dev is DOWN: CRITICAL - Host Unreachable (10.68.16.8) [04:49:10] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248219 (10yuvipanda) After a little more investigation, I think 5 more precise and 5 more trusty should hold good for a long time. Let me rejig appropriately. [04:54:19] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1248227 (10yuvipanda) Looks like this is happening again, I've poked the culprits from last time to see if it's them again. [05:01:05] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1248236 (10yuvipanda) 5Open>3stalled [05:01:12] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1100333 (10yuvipanda) 5stalled>3Open [05:01:35] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1100333 (10yuvipanda) Just realized @Dfko is part of 'culprits' :) Am making dump now. [05:02:02] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1248239 (10Dfko) We started it up again to get a key dump (see previous 3 comments) @yuvipanda [05:12:57] 10Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1248247 (10yuvipanda) Dump provided in private :) [05:24:31] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [05:25:37] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [05:25:59] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [05:28:26] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [05:29:56] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [05:35:17] !log tools rebooted the newly created tools-exec-121{0-9} so they pick up appropriate idmapd behavior [05:36:06] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [05:38:24] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [05:38:44] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [05:39:09] !log tools delete tools-exec-10, was out of jobs [05:39:15] Logged the message, Master [05:39:17] !log tools depooled exec-{06-10} rejigged jobs to newer nodes [05:39:21] Logged the message, Master [05:39:23] !log tools killed tools-dev, nobody still ssh’d in, no crontabs [05:39:27] Logged the message, Master [05:39:40] !log tools created new instances tools-exec-121{1-9} as precise [05:39:44] Logged the message, Master [05:39:48] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL: CRITICAL: 16.67% of data above
the critical threshold [0.0] [05:39:55] !log tools rebooted tools-exec-121{1-9} instances so they can apply gridengine-common properly [05:39:59] Logged the message, Master [05:40:12] !log tools pooled in tools-exec-121{1-9} [05:40:16] Logged the message, Master [05:42:16] !log tools disabled and drained tools-exec-1{1-5} of continuous jobs [05:42:20] Logged the message, Master [05:43:26] RECOVERY - Puppet failure on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [05:44:14] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248280 (10yuvipanda) Created tools-exec-121{1-9} and pooled them :) Also drained tools-exec-1{1-5} of continuous jobs. Things left to do: # Wait for tools-exec-xx (anything with two digits) to have no running t... [05:44:30] RECOVERY - Puppet failure on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0] [05:44:50] RECOVERY - Puppet failure on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0] [05:44:56] RECOVERY - Puppet failure on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [05:45:37] RECOVERY - Puppet failure on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0] [05:46:03] RECOVERY - Puppet failure on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0] [05:46:05] RECOVERY - Puppet failure on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [05:48:45] RECOVERY - Puppet failure on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0] [05:49:48] 10Tool-Labs, 5Patch-For-Review: Create separate partition for /tmp on toollabs exec / web nodes - https://phabricator.wikimedia.org/T97445#1248282 (10yuvipanda) 5Open>3Resolved a:3yuvipanda Resolved for all new nodes :D And the old ones will die soon! [05:50:11] 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Harmonize VMEM available on all exec hosts - https://phabricator.wikimedia.org/T95979#1248285 (10yuvipanda) All done for new exec nodes, and the old ones are going to go away really soon :D [05:53:26] RECOVERY - Puppet failure on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0] [05:57:08] hi all - what is service.manifest and why does webservice not work without it (I can't find any docs or any mention of this file) [05:59:52] hi [06:00:00] Earwig: I’m just fixing that particular bug, moment [06:00:07] alright, thanks [06:00:33] Earwig: basically, it’s the replacement for bigbrother, and is also the reason nobody’s been complaining (at least to me / publicly) about dead webservices. [06:00:39] no docs yet, I was hoping to do that this week [06:00:55] Hey! I need to create an instance for the Newsletter extension project ( https://phabricator.wikimedia.org/tag/mediawiki-extensions-newsletter/ ). But I assume the project is not created yet. I've created a subtask requesting the creation of the project ( https://phabricator.wikimedia.org/T97523 ). So, can I create an instance before the associated project is [06:00:55] created? [06:01:07] hooray, good to know (thought it sounded familiar, but wasn't expecting that to actually be in place yet) [06:01:25] Earwig: it’s been in place for 2 weeks now. [06:01:33] Earwig: so essentially you do ‘webservice start’ and then that’s it. it should stay up [06:01:44] right [06:02:17] Earwig: which tool doesn’t have one? I think I made most of them have it...
except if you just created a new tool [06:02:24] copyvios [06:02:28] er wait [06:02:29] actually [06:02:33] it wasn't that one [06:02:44] it was earwigbot, which just has a static index.html file [06:03:36] Earwig: hmm, I manually put most of them in place (took me a day) [06:03:53] Earwig: and when you do a webservice start manually it puts one in place anyway (when this bug is fixed, patch I am merging atm) [06:04:50] could I just stick "web: lighttpd" in it? [06:04:59] Earwig: no, just give me like, about 30s. [06:05:39] err, make that 30 more seconds [06:05:43] * YuviKTM waits for puppet to run [06:05:45] no rush [06:06:11] Earwig: try now [06:06:54] 10Tool-Labs: Fix oscillation between 'purged' and 'latest' for several packages on toollabs - https://phabricator.wikimedia.org/T97628#1248286 (10yuvipanda) 3NEW [06:07:07] works. thanks! [06:07:21] 10Tool-Labs: Fix oscillation between 'purged' and 'latest' for several packages on toollabs - https://phabricator.wikimedia.org/T97628#1248293 (10yuvipanda) [06:07:22] Earwig: cool :) [06:07:45] Earwig: it’s missing some of bigbrother’s functionality atm (that is, it only works for webservices and not grid jobs, and also no support for custom webservices yet) [06:07:48] so unannounced. [06:07:58] m'hm [06:08:04] Earwig: also this week we had to basically shift around *all* labs instances due to hardware issues, so that has taken up my time [06:08:19] Earwig: but we have enough redundancy in place that users barely noticed all the shifting! (At least from lack of any complaints) [06:08:26] so things are getting better :) [06:08:26] uhh, yeah [06:08:30] I was gonna mention that also [06:08:36] did you notice the shifting? [06:08:46] if so, in what ways? [06:08:59] redis would’ve had some connection failures because we don’t have a redundancy model for that yet... [06:09:11] you mean the new tools-exec-12* nodes and whatnot? [06:09:25] my bot died about an hour and a half ago, but I restarted it with the standard jsub command and it now seems unable to connect to IRC like usual [06:09:29] so I'm not sure what's going on with that [06:09:39] Earwig: oh, ugh. I see. do you have an error message? [06:09:54] I think I might know what’s happening (lack of public IP) [06:09:57] let me fix that [06:10:34] Earwig: is your bot running on exec-12* or exec-14*? [06:10:37] (precise or trusty) [06:10:50] unfortunately it just tells me the socket's closed and tries to restart itself to no avail [06:10:55] seems to be a 12* node [06:11:00] guess I never bothered to migrate [06:11:03] should do that soon [06:11:10] heh, I haven’t started a migrate pitch for people on grid yet [06:11:22] yeah, I sorta didn't realize that was happening in parallel to the webserver stuff [06:11:38] was a little annoying to recreate the virtualenvs and whatnot but after that it was fine [06:12:28] yeah. [06:13:07] !log tools allocating new floating IPs for the new instances, because IRC bots need them. [06:13:12] Logged the message, Master [06:13:19] I should consider having a separate queue for bots that need IRC at some point [06:13:44] Earwig: btw, if you’re running any python webservices, have you considered migrating them to the uwsgi-python server setup? [06:13:45] why would freenode require public IPs and not other services?
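For context on the service.manifest exchange above: it is a small per-tool file that the webservice watchdog (the bigbrother replacement mentioned at 06:00:33) reads to know what to keep alive. No docs existed at the time of this conversation; judging from the "web: lighttpd" question, a minimal hand-written manifest would presumably be just:

    # ~tool/service.manifest -- read by the webservice watchdog (format assumed from this exchange)
    web: lighttpd

though, as noted above, 'webservice start' writes the file itself once the patch being merged here lands, so editing it by hand should not normally be necessary.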
[06:14:00] yep, I've got one on that (which I did during the precise migration) [06:14:04] sweet [06:14:06] quite happy with that [06:14:17] Earwig: freenode limits total number of connections from one ip [06:14:18] seems to be more snappy overall [06:14:20] ah okay [06:14:30] Earwig: yeah, is going through less layers of proxying [06:17:09] Earwig: I’m allocating public IPs, your bot should be able to connect in a few minutes. [06:17:14] sounds good [06:17:28] I’ll poke you when I’m done with the precise cluster [06:20:41] Earwig: all done. check? [06:21:10] looks good! [06:21:17] Earwig: IRC connects? [06:21:21] indeed [06:21:45] Earwig: sweet :D [06:25:04] Earwig: thanks for reporting! [06:25:09] * YuviKTM does this for the trusty nodes too [06:25:13] no problem, good luck with the other stuff [06:25:15] Earwig: you should move your other tools to trusty too at some point [06:25:16] thanks! [06:25:33] will add that to the todo list [06:25:37] Earwig: if there’s something that you think tools should do better (other than the obvious, like reliability) please do let me know. [06:25:45] sure [06:26:48] 10Tool-Labs: Install grunt on tools-labs - https://phabricator.wikimedia.org/T97629#1248303 (10Mjbmr) 3NEW [06:27:24] 10Tool-Labs: Install grunt on tools-labs - https://phabricator.wikimedia.org/T97629#1248311 (10yuvipanda) 5Open>3declined a:3yuvipanda Please use npm locally to install nodejs packages. [06:30:29] !log tools added public IPs for all exec nodes so IRC tools continue to work. Removed all associated hostnames, let’s not do those [06:30:34] Logged the message, Master [06:32:49] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [06:33:11] 6Labs, 10Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1248318 (10yuvipanda) I forgot to give the new instances public IPs, which was causing a bunch of failures for IRC bots. That has been remedied now with a lot of clicking. When this is all done I'm going to write... [06:36:28] 10Tool-Labs: Install grunt on tools-labs - https://phabricator.wikimedia.org/T97629#1248319 (10Mjbmr) >>! In T97629#1248311, @yuvipanda wrote: > Please use npm locally to install nodejs packages. It required root access for grunt, I tried. [07:02:50] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [07:16:05] 10Tool-Labs: Install grunt on tools-labs - https://phabricator.wikimedia.org/T97629#1248348 (10yuvipanda) If you run npm install in a directory with a valid package.json file it will install the module locally. [07:59:46] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pavlo Chemist was created, changed by Pavlo Chemist link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Pavlo_Chemist edit summary: Created page with "{{Tools Access Request |Justification=To run a bot "PavloChemBot" in Ukrainian and maybe later in English Wikipedia. PavloChemBot already has bot flag in Ukrainian Wikipedia a..."
[08:11:26] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pavlo Chemist was modified, changed by Pavlo Chemist link https://wikitech.wikimedia.org/w/index.php?diff=156753 edit summary: [08:12:33] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pavlo Chemist was modified, changed by Pavlo Chemist link https://wikitech.wikimedia.org/w/index.php?diff=156754 edit summary: link to the bot [08:54:27] PROBLEM - Puppet staleness on tools-mailrelay-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0] [10:20:30] 6Labs, 10Labs-Infrastructure, 6operations, 10ops-eqiad: labvirt1005 memory errors - https://phabricator.wikimedia.org/T97521#1248596 (10hashar) @Andrew thank you for the instances migrations! [10:28:54] 10Tool-Labs, 6Engineering-Community, 6WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1248601 (10Qgil) Is someone planning to work on this task during the month of May? If so, please take it. If not, maybe it is better to lower its priority? [11:02:00] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:02:38] PROBLEM - Puppet failure on tools-webgrid-07 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:03:25] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:03:37] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:04:15] PROBLEM - Puppet failure on tools-webgrid-08 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0] [11:04:27] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:05:18] PROBLEM - Puppet failure on tools-exec-1401 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:05:32] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:06:04] PROBLEM - Puppet failure on tools-exec-1201 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:06:36] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:07:06] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:07:34] PROBLEM - Puppet failure on tools-exec-1403 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:07:44] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:08:02] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:08:41] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:08:51] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:09:41] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:09:45] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:10:31] PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:11:44] PROBLEM 
- Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:11:54] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:11:56] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:11:58] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:12:22] PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:12:30] PROBLEM - Puppet failure on tools-webgrid-05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:12:30] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:12:46] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:12:54] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:13:50] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:13:54] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:14:10] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [11:14:40] PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:14:44] PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:14:48] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:15:02] PROBLEM - Puppet failure on tools-static-01 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:15:34] PROBLEM - Puppet failure on tools-webgrid-generic-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:15:56] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:17:51] PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [11:19:55] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:19:59] PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:20:55] PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:21:13] PROBLEM - Puppet failure on tools-webgrid-generic-02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0] [11:21:37] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [11:21:51] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:21:59] PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:22:05] PROBLEM - Puppet failure on tools-services-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [11:22:59] PROBLEM - Puppet failure on tools-static-02 is CRITICAL: CRITICAL: 33.33% of data above the 
critical threshold [0.0] [11:23:14] PROBLEM - Puppet failure on tools-services-02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0] [11:24:23] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [11:25:49] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [11:30:19] RECOVERY - Puppet failure on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [11:30:29] RECOVERY - Puppet failure on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0] [11:33:02] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0] [11:33:26] RECOVERY - Puppet failure on tools-exec-1203 is OK: OK: Less than 1.00% above the threshold [0.0] [11:33:34] RECOVERY - Puppet failure on tools-webproxy-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:15] RECOVERY - Puppet failure on tools-webgrid-08 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:25] RECOVERY - Puppet failure on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0] [11:34:43] RECOVERY - Puppet failure on tools-exec-1218 is OK: OK: Less than 1.00% above the threshold [0.0] [11:35:30] RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0] [11:36:04] RECOVERY - Puppet failure on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [11:36:37] RECOVERY - Puppet failure on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:00] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:06] RECOVERY - Puppet failure on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:22] RECOVERY - Puppet failure on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:30] RECOVERY - Puppet failure on tools-exec-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:42] RECOVERY - Puppet failure on tools-exec-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [11:37:50] RECOVERY - Puppet failure on tools-webproxy-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:38:41] RECOVERY - Puppet failure on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [11:38:51] RECOVERY - Puppet failure on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:05] RECOVERY - Puppet failure on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:42] RECOVERY - Puppet failure on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:43] RECOVERY - Puppet failure on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [11:39:46] RECOVERY - Puppet failure on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [11:40:34] RECOVERY - Puppet failure on tools-webgrid-generic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:41:44] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:41:55] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0] [11:41:57] RECOVERY - Puppet failure on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0] [11:42:31] RECOVERY - Puppet failure on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [11:42:31] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% 
above the threshold [0.0] [11:42:55] RECOVERY - Puppet failure on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0] [11:43:52] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [11:43:52] RECOVERY - Puppet failure on tools-bastion-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:44:50] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:44:56] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0] [11:44:58] RECOVERY - Puppet failure on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0] [11:45:04] RECOVERY - Puppet failure on tools-static-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:45:56] RECOVERY - Puppet failure on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [11:46:12] RECOVERY - Puppet failure on tools-webgrid-generic-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:47:51] RECOVERY - Puppet failure on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [11:49:25] RECOVERY - Puppet failure on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0] [11:50:01] @q test [11:50:01] Sorry but I don't see this user in a channel [11:50:15] @q labs-morebots [11:50:20] @unq labs-morebots [11:50:49] RECOVERY - Puppet failure on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0] [11:50:57] RECOVERY - Puppet failure on tools-exec-1204 is OK: OK: Less than 1.00% above the threshold [0.0] [11:51:14] I trust: .*@wikimedia/.* (2trusted), .*@mediawiki/.* (2trusted), .*@wikimedia/Ryan-lane (2admin), .*@wikipedia/.* (2trusted), .*@nightshade.toolserver.org (2trusted), .*@wikimedia/Krinkle (2admin), .*@[Ww]ikimedia/.* (2trusted), .*@wikipedia/Cyberpower678 (2admin), .*@wirenat2\.strw\.leidenuniv\.nl (2trusted), .*@unaffiliated/valhallasw (2trusted), .*@mediawiki/yuvipanda (2admin), .*@wikipedia/Coren (2admin), [11:51:14] @trusted [11:51:35] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [11:51:51] RECOVERY - Puppet failure on tools-exec-1205 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:00] RECOVERY - Puppet failure on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:04] RECOVERY - Puppet failure on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:04] RECOVERY - Puppet failure on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:38] RECOVERY - Puppet failure on tools-webgrid-07 is OK: OK: Less than 1.00% above the threshold [0.0] [11:52:58] RECOVERY - Puppet failure on tools-static-02 is OK: OK: Less than 1.00% above the threshold [0.0] [11:53:16] RECOVERY - Puppet failure on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [12:02:53] hello [12:03:02] hi [12:03:22] though... hold on, i'll try to figure out this myself. [12:08:19] okay, i think i'm a bit lost again. i'm not sure how to ssh onto this new instance i created yesterday [12:08:52] wikitech.wikimedia.org shows me its internal IP address - i run ssh root@internal_ip and my ssh key get rejected [12:09:06] gets* [12:09:47] i created the ssh key on bastion and added it via the account preferenecs [12:09:50] preferences* [12:10:05] 10Wikibugs: wikibugs should notify on dependency changes - https://phabricator.wikimedia.org/T77006#1248774 (10Qgil) [12:10:27] i guess that this is not the way.
what should i do to tell my instance what ssh key should it let in as root? [12:48:42] 10Tool-Labs, 6Engineering-Community, 6WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1248823 (10Technical13) >>! In T87730#1248601, @Qgil wrote: > Is someone planning to work on this task during the month of May? If so, please take it. If not,... [12:54:52] d33tah: You shouldn't be using root directly. [12:55:41] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1248833 (10akosiaris) Here's an update on this. When a labs VM wants to contact a public IP it will use its local routing table to figure out where to send the packet. The routing table has 2 entries - d... [12:57:23] Coren: any other account would be fine as well, but none of them lets me in [12:57:44] i tried d (my bastion account) and D33tah (my wikilabs acc name) [12:57:57] oh [12:58:00] d let me in now [12:58:00] You need to use your /shell/ account name. [12:58:14] okay, nvm, now it works [13:42:48] 6Labs, 3Labs-Q4-Sprint-2, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Disable LDAP and enable admin puppet module on labstore100[12] - https://phabricator.wikimedia.org/T95559#1248935 (10coren) A good summary of the issue: NFS's protocol places a hard limit on the number of supplemental groups that can be sent... [13:56:56] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1248966 (10Andrew) Thank you for investigating, Alex! [13:58:22] Coren: Things seem to be flaky lately. Job 387832 mysteriously was dead yesterday, now this morning 387831, 387833, and 387835 are gone. In all cases it looks like the grid migrated them just before they disappeared. [14:00:31] ... And 387835 apparently returned from the dead when I tried to restart everything? Even weirder. [14:01:13] anomie: Yesterday's hardware issue (yeay!) caused us to have to do a lot of juggling. [14:12:23] andrewbogott: Hi, wb :) [14:18:10] Vivek: hello! I’m here but will vanish again shortly for breakfast. [14:18:24] Ok. [14:19:01] I have pmed you :) [14:31:52] Hi. [14:33:06] What do I have to do to get https://en.wikipedia.org/wiki/User:Technical_13/Scripts/OrphanStatus https://en.wikipedia.org/wiki/User:Zhaofeng_Li/reFill and https://en.wikipedia.org/wiki/Wikipedia:The_Wikipedia_Adventure edits added to the semi-automated section of https://tools.wmflabs.org/xtools-ec/?user=3gg5amp1e&project=en.wikipedia.org [14:36:23] EggSample: You'd need to ask one of that tool's maintainers. [14:38:22] How? The instructions on the bottom of the tool said to come here. [14:39:41] EggSample: That seems like odd instructions to put there for the general case, although cyberpower678 often is on IRC. [14:39:48] I didn't see anything for submitting a feature request there. [14:40:16] You may want to ask T13|inClass when he comes back; I know he's working on the xtools in general. [14:40:49] Oh, I think I know him! Is that the same as T13|needsCoffee? [14:41:20] I expect so. :-) [14:44:54] How do I get back to the help me channel for wikipedia I was in yesterday? [14:45:32] I'm not sure which help channel you mean. Perhaps #wikipedia-en-help? [14:45:32] #wikipedia-en-help ? [14:45:41] I think so [14:45:42] Either way, you need to type: /join [14:45:57] cool, thanks [14:45:58] But that can depend on what IRC client you are using. [14:46:06] https://webchat.freenode.net/?channels=#wikimedia-labs [15:01:08] I was pinged?
Just finished an exam and have only a minute [15:01:30] Hi T13. [15:01:39] What do I have to do to get https://en.wikipedia.org/wiki/User:Technical_13/Scripts/OrphanStatus https://en.wikipedia.org/wiki/User:Zhaofeng_Li/reFill and https://en.wikipedia.org/wiki/Wikipedia:The_Wikipedia_Adventure edits added to the semi-automated section of https://tools.wmflabs.org/xtools-ec/?user=3gg5amp1e&project=en.wikipedia.org [15:01:52] I didn't see anything for submitting a feature request there. [15:02:49] Click on the link for "bug" and submit it there or go to #xTools and type !newbug [15:02:59] Okay, thanks. [15:05:56] 10Tool-Labs-xTools: Add new semi-automated tools - https://phabricator.wikimedia.org/T97647#1249100 (103gg5amp1e) 3NEW [15:06:51] Good good. Break is over. [15:26:02] 6Labs: allow routing between labs instances and public labs ips - https://phabricator.wikimedia.org/T96924#1249149 (10hashar) Well done @akosiaris, you have been granted a coupon for your favorite drink. To be redeemed next time we see each others, just point to this task. ---- Regarding the use of a bridge, u... [15:38:44] Coren: note that in ^ Alex determined that fixing routing is probably not realistic, so we’re back to needing split horizon. [15:39:35] andrewbogott: Yeah, I read. And I've been looking at the pdns docs hoping that split horizon would be trivial as it is with bind... nope. But I do have an idea. [15:39:46] You're backing pdns with mysql, right? [15:39:59] Coren: yep, that’s right. [15:40:46] I'm thinking the solution may be simpler than trying to convince pdns to do split-horizon: have two pdns run listening to different addresses and getting their data from different views on the same table. :-) [15:42:18] Coren: That seems possible, although… yikes! [15:42:38] Coren: I don’t know if this is useful… that server already recurses selectively for internal IPs [15:42:50] So that’s the /switch/ we need, although not the behavior. [15:43:12] If there was a second dns server… how would clients know when to query one and not the other? [15:43:54] andrewbogott: The clients will query whatever we point them to; that just means that resolv.conf in labs instances should point at the "provides internal IPs" iface [15:44:18] Ah, I see. Hm. [15:44:43] Whereas anything coming from the outside will just hit the NS records that point to the "provides public IPs" iface [15:46:20] This is all predicated on you being able to tell designate to add an extra column to entries it writes. We write the public IP in one (when it exists) and the private IP in the other; just have each pdns server query a view picking the right one. Only one source of authority, and no chance of divergence. [15:49:24] As far as ugly hacks go, this is a fairly tame and orderly one. :-) [15:50:44] Except pdns /does/ support split horizon, doesn’t it? [15:51:00] And, doesn’t this mean that every single public entry will have to be hand-tuned and updated as instances move &c? [15:51:26] Yeah adding local hacks seems unideal [15:51:33] (Hi) [15:51:41] andrewbogott: Not according to everything I've read - though I've seen suggestions that geoip can be abused for that [15:51:51] But no - why would that need any sort of manual handling whatever? [15:52:13] Coren: ah, ok, I think I see what you mean… [15:52:29] seems possible. Where would the second pdns run?
[15:53:11] andrewbogott: The simplest solution would be to have an address in 10/8 for where they run now and have one listen on that iface, the other on the public IP [15:54:15] Ah, and I'm not the first one to think of this. There's a thread on pdns-users where: "Another approach would be to run two instances of pdns. Every instance would run on a specific ip which corresponds [15:54:15] to the subnet that you want to use." [15:55:01] Otherwise, split horizon support is described as "'maybe': you could use its Lua feature to fiddle with returning different values on a per/client basis." [15:55:13] Which seems even worse to me. [15:55:43] Would it really work to have two instances of pdns running on the same server? [15:55:59] andrewbogott: Of course it would, if they listen on different addresses. [15:56:30] init scripts, /etc/pdns... [15:56:39] I guess a second init script that points to a different config? [15:56:49] Seems like the simplest approach. [15:57:01] 'k [15:57:03] You just need to have a 'local-address=XX' [15:57:16] Where XX is either the public IP for one, or the private one for the other. [15:57:22] yeah, that makes sense. [15:57:24] oh, time for filippo’s talk [15:57:43] Bleh, that's today? My lunch just got here. :-( [16:17:26] 10Tool-Labs, 6Engineering-Community, 6WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1249301 (10yuvipanda) I don't know if just closing the rfc on meta is enough - this needs some consensus from the tech community / toollabs admins / wikimedia... [17:13:35] YuviKTM: Just FYI, when I do the switchover, I'm going to be doing the start-nfs steps by hand, not using the script, so that I have an opportunity to check at every step that everything is going to plan. [17:13:59] After all, I didn't try a switchover in 2 years or so. [17:15:00] Coren: cool. Just make sure that any missing steps get added to the script after, and !log a lot :) [17:15:18] Coren: what time is it in? I want to make sure I'm at a laptop then [17:15:39] 19h UTC [17:16:09] aka in 105 minutes [17:16:25] Ah cook [17:16:27] Cool [17:16:34] I'll be in office by then [17:45:59] * Core- wonders why cameron just kicked out his bouncer. [18:57:54] hasharAway: ping me if you’re back in the next hour or so? [18:58:02] or, let’s see… Krinkle, you there? [18:58:12] Yep [18:58:15] What's up [18:58:28] Can I shut down integration-puppetmaster and/or integration-slave-trusty-1013 briefly? [18:59:29] I can gracefully depool the slave in a few minutes [18:59:47] Krinkle: great, thanks. [18:59:57] The puppetmaster, I’m guessing no one will notice, it’ll just delay puppet runs slighty? [19:00:01] *slightly [19:00:07] Yeah, but I'd prefer to down the slave [19:00:14] ok [19:00:27] OK. It's depooled now [19:00:35] I need to move both eventually, just trying to get a head start on my pre-announced downtime next week. [19:00:47] ok, I’ll start with the slave in a moment. [19:03:31] hi [19:04:33] 23:00:55* wikibugs $ Labs: Create WikiSpy project - https://phabricator.wikimedia.org/T96512#1247078 (yuvipanda) Open>Resolved a:yuvipanda I've created the project and added you as admin :) Remember that the /data/project and /home mounts on instances are NFS, and should *not* be used for heavy lifting. [19:05:15] i noticed i have /dev/vda3 that has all my quota, but it doesn't seem to have any logical volumes created... can i just format it as a non-lvm partition?
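A sketch of the two-pdns-instances plan from the 15:50-15:57 exchange above: run the authoritative pdns server twice on the same box, each bound to a different address via local-address and each reading a different database view of the same records. local-address, launch and the gmysql-* settings are stock pdns options; the file names, addresses and view names here are illustrative assumptions, not the configuration actually deployed:

    # /etc/powerdns/pdns-internal.conf -- answers labs instances with private IPs
    local-address=10.68.16.1            # private iface (example address)
    launch=gmysql
    gmysql-dbname=pdns_internal_view    # view exposing the private-IP column

    # /etc/powerdns/pdns-public.conf -- answers the outside world with public IPs
    local-address=203.0.113.1           # public iface (example address)
    launch=gmysql
    gmysql-dbname=pdns_public_view      # view exposing the public-IP column

resolv.conf in the instances then points at the internal address, while the public NS records point at the public one, as described in the conversation above.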
[19:05:42] d33tah: you should just use lvm to allocate space [19:06:00] d33tah: there are puppet classes that will do that automatically. labs::lvm::somethingsomething [19:06:14] d33tah: in special:novainstance under configure you can find a check box named labs::lvm::srv or something [19:06:25] And you can tick that to get a big /srv partition [19:06:27] okay, i'll check that [19:06:30] um… role::labs::lvm::srv [19:06:41] * andrewbogott looked it up! [19:07:15] We should document that somewhere... [19:07:35] andrewbogott: most of the old exec nodes should be gone today. New ones already pooled up [19:07:45] I'll do webgrid right after [19:07:47] YuviKTM: great! [19:08:19] Every other thing I try to do is blocked on upgraded to Juno so I’m itching to get this migration finished [19:08:25] andrewbogott: I repooled 29 new nodes yesterday. Current ones just have tasks still executing, I'm killing them one by one [19:08:28] Hehe [19:08:32] *upgrading [19:09:15] should the changes apply automatically? [19:09:37] YuviKTM: Move aborted. I paranoidly decided to start by a clean reboot of labstore1002. Wisely. It doesn't pass POST anymore. [19:09:51] Coren: :) :( [19:09:57] d33tah: On the next puppet run. That happens every 20 mins or you can force one with ‘sudo puppet agent -tv’ [19:10:11] i see, thanks [19:10:13] It is not really paranoia if they are out to get you, Coren [19:11:18] YuviKTM: Thank god I started early to get a reboot in. It would have been a timebomb. [19:11:30] +1 [19:12:16] yeah! Lucky break. [19:12:20] Coren: so I guess we wait for someone to show up at the DC and see what is up [19:14:24] That said, that is *not* what I did a reboot for or expected. I wanted to make sure the system was in perfect puppet state and ready to take over - not test the effing hardware. [19:14:47] * Coren is now really really scared about 1001 [19:15:30] YuviKTM: Yeah, I don't think I'll gain much by continuing to glower at the blank console. [19:16:42] * Coren powers the server down. [19:21:44] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Switchover Labs NFS server to labstore1002 - https://phabricator.wikimedia.org/T97219#1250136 (10coren) [19:22:23] 6Labs, 10Tool-Labs, 3ToolLabs-Goals-Q4: Switchover Labs NFS server to labstore1002 - https://phabricator.wikimedia.org/T97219#1235954 (10coren) This was planned for today at 19H UTC but is delayed because labstore1002 seems to be having hardware issues. What //is// it with Labs hardware? [19:23:04] This also means that our safety net seems to have been untied and flat on the floor all along. [19:23:12] Krinkle: integration-slave-trusty-1013 is now back up. May I move the puppetmaster now? [19:24:29] andrewbogott: It seems Jenkins is unable to reach that instance now [19:24:32] did the IP change? [19:24:35] YuviKTM: can quarry-runner-test be deleted? Or, if not, is it ok if I cold-migrate it? [19:24:53] Krinkle: yes, probably. The current IP is 10.68.18.28 [19:24:55] andrewbogott: can’t be deleted, but let me see if it can be cold migrated. [19:25:00] it used to be 10.68.18.28 [19:25:14] Nope, still the same then [19:25:20] andrewbogott: can you give it about 5 minutes, and then cold migrate both quarry-runner-* instances? one query is actively running [19:25:27] they have a timeout of 10mins so should finish fast [19:25:37] YuviKTM: sure [19:25:47] See https://integration.wikimedia.org/ci/computer/integration-slave-trusty-1013/configure. Tests can be done from https://integration.wikimedia.org/ci/computer/integration-slave-trusty-1013/script e.g.
println "uname -a".execute().text [19:25:50] any wmf ldap user can do so [19:26:06] Okay, it has a connection now [19:26:18] Ah, probably just took time to boot then [19:26:54] Yeah, go ahead with the puppetmaster [19:27:01] ok, thanks. [19:28:06] !log integration moved integration-puppetmaster and -slave-trusty-1013 to labvirt hardware. This involved a reboot and possible IP change. [19:28:12] Logged the message, dummy [19:30:27] !log tools depooled and deleted tools-exec-01, -05, -06 and -11. [19:30:33] Do we have a bot that checks for old iw links that should be removed? Like on https://da.wikipedia.org/wiki/Analyse ? [19:30:34] boo no morebots [19:30:59] Krinkle: ok, puppetmaster move finished, thank you. [19:31:10] YuviKTM: ready for me to move quarry instances? [19:31:15] Logged the message, Master [19:31:51] PROBLEM - Host tools-exec-01 is DOWN: CRITICAL - Host Unreachable (10.68.16.30) [19:31:53] andrewbogott: still running. give it another 3 minutes? [19:31:56] is at 7mins [19:31:56] andrewbogott: I note some instances are being quite sluggish. Your doing? [19:32:15] tools-bastion is doing that thing again where I'm waiting a minute just for something like ls or nano to complete. [19:32:15] Coren: maybe? It would just be network saturation if it’s me. [19:32:19] * YuviKTM is noticing that too [19:32:32] PROBLEM - Host tools-exec-05 is DOWN: CRITICAL - Host Unreachable (10.68.16.34) [19:32:35] PROBLEM - Host tools-exec-11 is DOWN: CRITICAL - Host Unreachable (10.68.17.144) [19:32:47] Woah [19:32:55] the hosts are me [19:33:02] (I logged earlier, no morebots) [19:33:04] labs-morebots: ? [19:33:04] I am a logbot running on tools-exec-1215. [19:33:05] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [19:33:05] To log a message, type !log . [19:33:10] !log tools depooled and deleted tools-exec-01, -05, -06 and -11. [19:33:13] oh it did log [19:33:14] Logged the message, Master [19:33:15] I’m just blind [19:33:19] cpu usuage on the hosts is pretty reasonable... [19:33:48] Labstore is healthy and showing nothing huge; but there is a sudden drop in labnet1001 traffic that lasted a bit. [19:34:02] Did we just briefly lose the network? [19:34:06] Krinkle: you’re talking about bastion-01? Is it snappy now? [19:34:21] PROBLEM - Host tools-exec-06 is DOWN: CRITICAL - Host Unreachable (10.68.16.35) [19:34:31] andrewbogott: It comes and goes, it's snappy most of the time and then once or twice an hour it just stalls flat line for a minute. [19:34:32] andrewbogott: It works well for me now. [19:35:02] Seems to affect tools.wmflabs.org response as well during that same time frame, e.g. when editing php files [19:35:05] so probably NFS? [19:35:06] Krinkle: that’s probably me hogging the network during migrations. Lemme see if I can throttle the copy. [19:35:18] I’m on a slow enough connection now that I don’t notice the difference. [19:35:52] Krinkle: Nor as far as I can tell - there isn't so much as a blip visible on the server. [19:36:15] * Coren has graphs open permanently on one of his screens nowadays. [19:37:13] andrewbogott: go ahead and move them now [19:38:40] So, let’s see… network is 1gb which is 128MB which is… 131,000KB, so I should rsync --bwlimit 50000 <- Coren, check my math? [19:40:24] Krinkle: let me know if it happens again in the next few minutes. [19:40:30] Coren: I’m going to keep killing stewardbot stuff that’s running on tools-login. Can you cluebat the maintainers? 
[19:41:08] !log quarry cold-migrating quarry-runner-test to labvirt1003 [19:41:12] Logged the message, dummy [19:42:08] andrewbogott: Sounds about right [19:42:22] > # Temporary wrapper script for StewardBot by Snowolf [19:42:25] not so temporary [19:42:44] YuviKTM: Please don't. I'll do the cluebat thing but those scripts are pretty critical for the stewards. [19:42:56] Coren: they’ve been running for days. I don’t think that’s acceptable. [19:42:57] Eff. They promised they'd have someone move them to the grid weeks ago. [19:43:34] YuviKTM: They'll probably need dev help to fix their things, the current crop of stewards has almost no techies in it. [19:44:00] YuviKTM: In the meantime, we can't hobble them if we can avoid it at all. [19:44:23] Coren: I killed the script and submitted it to the grid [19:44:32] and it seems to be running fine. [19:44:59] Coren: we shouldn’t special case, I think [19:45:04] I left a note on that script. [19:45:30] Coren: I told them before as well. It's because it was migrated from toolserver, where it was okay to run on login servers. [19:45:41] I pinged the maintainers a minute ago. [19:45:44] thanks Krinkle [19:45:52] Krinkle: it seems to be running ok on the grid [19:46:15] did you move it? [19:46:19] Krinkle: yeah. [19:47:00] YuviKTM: Is there a script that starts it on the bastion? If you're doing it for them may wanna make sure whatever command they remembered to run to start it does the right thing [19:48:23] Krinkle: so I did that too. [19:49:44] Krinkle: where did you ping the maintainers? (which channel, etc?) [19:49:51] labs [19:49:52] eh [19:49:54] stewards [19:50:04] Coren just did as well [20:01:31] Yeah, I poked a bit more at labstore1002 in the hope that I could have some inkling of what's up with the damn thing but there is only so much I can do via a serial console from 1000 km away. [20:03:07] * Coren almost considered the 10h drive. [20:03:40] Coren: https://www.youtube.com/watch?v=dLk-3HPS12Q style? [20:06:05] Heh. That's clearly a spoof of some other scene. :-) [20:06:53] yeah, or maybe just ‘fuck the p…rinter!' [20:11:06] 10Tool-Labs-tools-Global-user-contributions: GUC: Russian wikis have broken url (http:/// instead of http://) - https://phabricator.wikimedia.org/T94351#1250264 (10Krinkle) [20:11:13] andrewbogott: I am around :] [20:11:32] hashar: ok — I was just messing with integration instances. I think I’m done now. [20:11:47] andrewbogott: awesome! [20:12:08] hashar: for now, at least :) [20:12:25] andrewbogott: chasemp: I guess we should get rid of our weekly checkin tomorrow since May 1st is a holiday :] [20:12:37] hashar: yeah, I figured we’d skip it. [20:12:56] it is no more in my agenda [20:12:59] so I guess we deleted it [20:13:07] hashar: are you blocked by anything other than the obvious? [20:13:09] 10Tool-Labs-tools-Global-user-contributions: GUC: Russian wikis have broken url ... T94351#1250289 (10Krinkle) ``` MariaDB [meta_p]> SELECT * FROM wiki WHERE lang LIKE '%ru%'; +-------------------+------+--------------------------+-------------+------... [20:13:18] Oh, did you and Krinkle still need a new instance flavor? [20:13:23] 10Tool-Labs-tools-Global-user-contributions: GUC: Russian wikis have broken url ... T94351#1250291 (10Krinkle) p:5Triage>3High a:3Krinkle [20:13:27] yeah krinkle was requesting a different disk size [20:13:36] got a link?
[20:14:35] https://phabricator.wikimedia.org/T96706#1230309 [20:14:44] Can you increase ci1.medium to 40GB storage? (Like m1.medium) [20:15:10] hashar: does anything currently use the old flavor? [20:15:14] I wish we could finely tune how much cpu/mem/disk we need when we create an instance [20:15:19] I can create a new one with a new name or delete the older one... [20:15:24] !log mailman allocated 1 more public_ip to project (per JohnFLewis) [20:15:27] but I don’t know what will happen if I delete it when something is using it [20:15:27] andrewbogott: Yeah, one instance, though I don't mind what happens to that one, it's not used. [20:15:29] Logged the message, Master [20:15:35] !log mailman set MX record for mailman-three.wmflabs.org to itself [20:15:38] Krinkle: can you delete it? [20:15:39] Logged the message, Master [20:15:42] We misjudged how the size would be distributed between root and /srv/. Hence need 10GB extra [20:15:46] hashar: Can you delete it? [20:15:50] ahah time to be bold and delete the flavor and see what happens to the instance ? :D [20:16:06] Krinkle: yup [20:16:10] thanks [20:16:16] let me know when you’re clear... [20:18:27] andrewbogott: I have deleted the ci1.medium instance [20:18:28] hashar: deleted? [20:18:34] great, let me resize... [20:19:57] Krinkle or hashar, try now? [20:20:34] /usr/local/bin/vagrant-lxc-wrapper: No such file or directory [20:20:35] pfff [20:20:46] andrewbogott: I just co-ordinated with JohnFLewis about shrinking the tools-exec-wmt. a few mins downtime is fine apparently. [20:20:51] breaking the space-time continuum. [20:20:58] reality is shattering [20:20:59] :D [20:21:17] hashar: are you playing with my vagrant lxc hacks? [20:21:22] YuviKTM: ok, should I shrink it right now? [20:21:32] andrewbogott: yup! [20:21:38] ok... [20:21:46] JohnFLewis: can you verify it is fully back up once it comes back? [20:21:51] hashar: That file comes from the vagrant-lxc plugin [20:21:54] bd808: na ranting at Debian [20:21:57] YuviKTM: Sure [20:22:20] bd808: I somehow had the crazy idea of investigating / trying Vagrant [20:22:33] bd808: and thought that using LXC would be nicer than virtualbox :] [20:23:15] hashar: :) I have tried it out in labs (mediawiki-vagrant + lxc) [20:23:22] it seems to work pretty well [20:23:51] I'm not sure I love the ruby script they use in the plugin to proxy all the sudo commands [20:24:01] YuviKTM: JohnFLewis: Done, shrunk 27G. Still working ok? [20:24:22] 6Labs, 10Continuous-Integration-Infrastructure: Create an instance image like m1.small with 2 CPUs and 30GB space - https://phabricator.wikimedia.org/T96706#1250343 (10hashar) 5Open>3Resolved Resized by @andrew integration-slave-trusty-1021 https://wikitech.wikimedia.org/wiki/Nova_Resource:I-00000be1.eqia... [20:24:25] andrewbogott: all good thank you [20:24:27] (updated https://etherpad.wikimedia.org/p/tools-the-great-recompress) [20:24:29] 6Labs, 10Continuous-Integration-Infrastructure: Create an instance image like m1.small with 2 CPUs and 30GB space - https://phabricator.wikimedia.org/T96706#1250346 (10Andrew) ok, I deleted and recreated with 40G. Are the other stats still correct? [20:24:36] hashar: ok! [20:24:49] bd808: I have to figure out how to set it up on my Debian [20:24:56] andrewbogott: looks like it is [20:25:08] JohnFLewis: great, thank you. [20:25:08] * hashar grab what is left of the bottle of whiskey [20:25:21] YuviKTM: I have a report that lists all instances in need of shrinkage. Let me run... 
[20:25:26] Should be mostly empty at this point [20:25:42] andrewbogott: a few tools instances left. bastions, mail and cyberbot [20:27:14] 10Tool-Labs-tools-Global-user-contributions: Global user contributions: Support wildcard in username - https://phabricator.wikimedia.org/T66499#1250352 (10Krinkle) p:5Triage>3Normal a:3Krinkle [20:29:33] Labs was clearly built on a native burial ground. [20:32:45] * hashar shakes fist at sudo [20:34:06] YuviKTM: the first quarry instance is still copying. It’s taking forever because xlarge [20:37:56] YuviKTM: remaining shrinkage candidates are here: https://dpaste.de/ZT02 Note that my script can’t tell the difference between something that needs shrinking and something that is full up. [20:38:15] so, for instance, those deployment-prep instances can’t actually shrink any more. [20:53:27] !log quarry moving quarry-runner-01 to labvirt1004 [20:53:31] Logged the message, dummy [21:45:25] Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Pavlo Chemist was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=156843 edit summary: [21:54:57] andrewbogott: did the quarry stuff all move? [21:55:39] YuviKTM: the runners but not ‘main'. [21:55:45] I can move main now if that’s ok [21:55:49] andrewbogott: you can do that too, yeah. [21:57:56] !log quarry moving quarry-main-01 to labvirt1003 [21:58:00] Logged the message, dummy [21:59:51] * YuviKTM moves floors again [21:59:53] so much moving [22:05:51] YuviKTM: ok, that’s it for quarry. [22:08:37] andrewbogott: sweet. [23:17:21] sitic: done on both requests, btw (phab and gerrit) [23:17:31] sitic: you can push your current git repo to the gerrit repo - it allows direct push for first commit [23:17:37] let me know if you want me to do that for you [23:47:40] 10Tool-Labs, 6Engineering-Community, 6WMF-Legal: Set up process / criteria for taking over abandoned tools - https://phabricator.wikimedia.org/T87730#1251068 (10Technical13) >>! In T87730#1249301, @yuvipanda wrote: > I don't know if just closing the rfc on meta is enough - this needs some > consensus from th...
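For the direct first push mentioned at 23:17:31, assuming the usual gerrit.wikimedia.org remote layout (user and project names are placeholders), the commands would be roughly:

    git remote add gerrit ssh://<user>@gerrit.wikimedia.org:29418/<project>.git
    git push gerrit HEAD:refs/heads/master

After that first push, subsequent changes normally go through review by pushing to refs/for/master instead.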