[00:12:02] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1950597 (chasemp) In looking at poor client behavior it seems prudent to poke at the server. A few weeks ago I added some instrumentation to get statistics from labstore1001 (a few general... [00:17:07] tgr: ok [00:19:59] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1950641 (chasemp) A few notes on `tools-webgrid-lighttpd-1201.tools.eqiad.wmflabs` > top - 00:17:20 up 21 days, 21:02, 0 users, load average: 168.20, 168.12, 167.9 ---- {P2503} Sam... [00:20:03] tgr: tried on test.wikidata and i'm apparently missing the token param [00:20:04] https://phabricator.wikimedia.org/P2505 [00:20:31] thanks! [00:20:37] is that pywikibot? [00:20:52] my own scripts (php) [00:21:03] is it available somewhere? [00:21:08] suppose i might have to update them [00:21:21] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1950651 (chasemp) tools-webgrid-lighttpd-1201 did not respond to soft reboot via salt and I rebooted it the hard way [00:23:57] Labs, Tool-Labs: tools-webgrid-lighttpd-1201 webservices and ssh unaccessible - https://phabricator.wikimedia.org/T122719#1950667 (chasemp) Open>Resolved https://phabricator.wikimedia.org/T124133#1950641 https://phabricator.wikimedia.org/T124133#1950651 [00:24:00] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1950669 (chasemp) [00:25:12] tgr: the code's is still WIP but here: https://github.com/filbertkm/wikibot [00:25:28] and never had a problem with the login parts [00:31:12] aude: thanks! the related bug is https://phabricator.wikimedia.org/T124252 [00:32:26] ok [02:41:54] Krenair: can I delete labs-dnsrecursor2.openstack? Or is it still doing things?
[02:42:22] you can delete it [02:43:46] thanks [02:48:51] andrewbogott: are you clearing out things with broken puppet? [02:49:19] just randomly picking things that complain about apt-get update while the kernel update script runs [02:49:26] nothing comprehensive [02:51:21] nice [02:55:52] bd808: what about the logstash project? puppet’s broken there too [02:56:10] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed when searching for node puppetmaster.logstash.eqiad.wmflabs: Failed to find puppetmaster.logstash.eqiad.wmflabs via exec: Execution of '/usr/local/bin/ldap-yaml-enc.py puppetmaster.logstash.eqiad.wmflabs' returned 1: [03:05:47] aude: can you tell me how to set up wikibot? [03:06:03] I reproduced what it seems to do in curl and that gets me logged in fine [03:06:34] there must be some subtle difference but I can't find it by reading the code [03:06:57] tgr: i can now login and edit my own mediawiki instance (master) [03:07:27] but still can't on test.wikidata? [03:07:44] andrewbogott: you can kill the current instances in the logstash project, but leave the project itself please. [03:07:47] let me try again [03:07:52] bd808: ok, thanks [03:08:02] Thank you [03:08:07] i had to fix https://gerrit.wikimedia.org/r/#/c/265449/ but think it is unrelated [03:08:28] not sure if i need to use bot passwords now or not [03:09:11] you can but don't need to for at least a month [03:21:30] YuviPanda: https://gerrit.wikimedia.org/r/#/c/265451/ [03:21:37] tgr: i added some documentation and sample configs [03:21:44] but can try again myself [03:21:50] I think that 'apt-get update’ has been timing out on pretty much all trusty instances for ages [03:22:56] andrewbogott: uh, but I did an rm with salt [03:23:11] ah, ok, that explains why it’s present some places but not all :) [03:23:15] Any objection to the patch? 
[03:23:17] :) [03:23:34] andrewbogott: I still think we should just do an rm with clush instead [03:23:37] but no objections as such ;) [03:23:42] we can remove it in a few weeks [03:23:51] ok, I’ll do it with clush [03:25:17] tgr: doesn't help that i can't reproduce on my devwiki [03:26:46] tgr: you can ply with pywikibot at https://tools.wmflabs.org/paws [03:26:50] *play [03:27:51] Don't try to use your (WMF) account though. I'm pretty sure it still has issues with usernames that have non-alphanumeric chars [03:30:04] tgr: i get something on test.wikidata about my account not existing [03:30:14] and not on my devwiki [03:31:31] (account does exist on test.wikidata and i have the right password for it) [03:35:29] that sounds like the bug anomie patched today with the id being looked up locally rather than from centralauth [03:35:44] could be [03:35:59] but test.wikidata should have that patch [03:36:25] YuviPanda: lol. I just saw the namespace id for Hiera: [03:36:59] bd808: what did I pick? [03:37:03] 666 [03:37:05] I remember I did something I thought was funny [03:37:07] right [03:37:17] It's a shame since I wanted it for Notebook: [03:37:22] and was looking for it [03:37:33] bd808: that was an oauth patch, wikibot uses the login API [03:37:34] and then 'some dickhead had taken it already!' [03:37:44] 668: neighbor of the beast [03:37:47] :D [03:37:54] 667: Talk page of teh Beast [03:38:08] tgr: pwb with PAWS uses OAuth [03:38:11] not bot passwords tho [03:38:13] but full on OAuth [03:38:16] Do we use the Project: namespace on wikitech for anything? 
[03:38:48] https://gist.github.com/filbertkm/c1e14a3fbda62fd5db88 [03:39:12] ^ these are the steps taken to login and the responses (sans my tokens ,etc) [03:40:28] YuviPanda: if https://grafana.wikimedia.org/dashboard/db/authentication-metrics?panelId=13&fullscreen can be believed there were 400 login errors per second [03:40:43] ouch [03:40:48] no way there are enough clients using OAuth or bot passwords [03:40:54] ...for that [03:41:31] PWB with OAuth wouldn't call the login API anyway [03:41:45] botpasswords doesn't even yet work for wikidata (until i put my patch in swat) [03:41:51] not for wikibase-specific things [03:42:48] aude: thanks but the trick will be somewere in the cookie handling [03:42:55] ok [03:43:06] I can login rto test.wikidata by doing those steps in curl [03:43:28] and sending back the session cookie in the second step [03:43:43] the bot must do some small detail differently [03:43:54] ok [03:44:01] i might be missing that [03:44:03] or maybe the account you use is different in some relevant way, I have no idea [03:44:18] though not necessary on my devwiki [03:44:31] i'm using my AudeBot account on testwikidata [03:55:46] aude: what command were you using to test the login on testwikidata? [03:56:56] tgr: i tried testwikidata with my non-bot account and that works [03:57:06] i'm trying to set a label [03:57:13] (let me try again with my bot account) [03:58:41] and now AudeBot works :/ [03:58:50] i don't know how [03:59:03] I tried 'app/console set-label testwikidatawiki Q100 Test 1' and that gave me a serialization error [04:00:16] nevermind, it works if I give it a base revision that actually exists [04:00:17] ./app/console set-label testwikidatawiki Q583 wzgAhEff 25580 [04:00:27] obviously i need to make the baserev part automatic :) [04:00:44] (normally have been using this stuff to debug and test things, or one off tasks) [04:01:02] probably revision 1 is the main page (e.g. 
wikitext) [04:06:06] in any case, thanks for looking into it [04:06:15] sure [04:06:22] it's strange that i can login now [04:06:43] i did end up logged in as my bot on wikidata (on the site) [04:07:02] and did a bot edit on testwikidata as Aude [04:07:36] something must have caused it to work [04:15:30] !log tools.stashbot restarted to fix https://github.com/bd808/tools-stashbot/issues/1 [04:19:26] tgr: you were saying that pywikibot already uses oauth for bots? [04:19:35] or has already adapted to the changes? [04:20:28] aude: if the bot operator has set it up, yes [04:20:38] https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth [04:20:41] ok [04:21:00] i just want to make sure the wikidata communtiy gets informed (though bot authors should already be on wikitech, etc) [04:24:37] aude: https://lists.wikimedia.org/pipermail/wikitech-l/2016-January/084501.html has the details but short story is ideally your bot should do oauth, if it can't you can go to Special:BotPasswords and set up a special username/password for which no changes in bot code are needed [04:25:16] tgr: i saw the mail, yes [04:25:35] i have some code for oauth (or could use one of the libraries for that) [04:26:09] adding to the wikidata newsletter https://www.wikidata.org/wiki/Wikidata:Status_updates/Next#Other_Noteworthy_Stuff [04:26:23] to help make sure people see this [04:28:07] legoktm: can you poke morebots to get it back in this channel? [04:28:22] * aude needs sleep now :) [04:28:29] uh, lets see [04:28:42] hope you figure out the login problem soon [04:29:56] tools.morebots@tools-bastion-01:~$ qmod -rj labs-logbot [04:29:56] Pushed rescheduling of job 228481 on host tools-exec-1219.eqiad.wmflabs [04:30:02] !log tools.morebots restarted labs-logbot [04:30:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.morebots/SAL, Master [04:30:09] bd808: ^ [04:30:16] yay! 
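The login handshake tgr reproduces with curl earlier in the log (the first request yields a token, the second repeats the credentials with that token while sending back the session cookie from the first response) can be sketched in Python. This is a minimal sketch, not aude's or tgr's code: it assumes the older `action=login` flow in which the first call answers `NeedToken`, and the `post` transport, `mw_login` and the cookie-jar shape are hypothetical stand-ins for whatever HTTP client a bot actually uses.

```python
from typing import Callable, Dict, Tuple

# Hypothetical transport: post(url, params, cookies) returns
# (decoded JSON body, cookies set by the response). Any HTTP client can be
# wrapped to fit this shape; keeping it injectable lets the flow be tested offline.
Transport = Callable[[str, Dict[str, str], Dict[str, str]], Tuple[dict, Dict[str, str]]]


def mw_login(post: Transport, api_url: str, user: str, password: str) -> Dict[str, str]:
    """Log in via action=login using the token handshake; return the cookie jar."""
    jar: Dict[str, str] = {}
    params = {"action": "login", "lgname": user,
              "lgpassword": password, "format": "json"}

    # Step 1: no lgtoken yet. The server is assumed to answer "NeedToken"
    # with a token and to set a session cookie tied to that token.
    body, set_cookies = post(api_url, params, dict(jar))
    jar.update(set_cookies)
    result = body.get("login", {})

    if result.get("result") == "NeedToken":
        # Step 2: resend the credentials plus lgtoken AND the session cookie
        # from step 1. Without that cookie the token cannot be matched to a
        # session on the server side.
        retry = dict(params, lgtoken=result["token"])
        body, set_cookies = post(api_url, retry, dict(jar))
        jar.update(set_cookies)
        result = body.get("login", {})

    if result.get("result") != "Success":
        raise RuntimeError(f"login failed: {result.get('result')!r}")
    return jar
```

Because the cookie jar is threaded explicitly through both calls, a client that forgets the cookie between the two steps fails loudly at step 2, which is the kind of subtle difference tgr suspects between the bot and the curl run.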
[04:30:25] thanks legoktm [04:30:28] np [04:31:03] oops [04:32:21] uhh [04:33:02] It did log the message before dying so probably not another session bug [04:33:59] bd808: https://tools.wmflabs.org/?tool=morebots [04:34:05] enjoy! :) [04:34:22] oh no! I asked too many questions! [04:39:12] hmmm... "Died in main event loop" [04:39:29] from a KeyboardInterrupt? [04:39:42] I think that means it used too much memory? [04:41:55] !log tools.morebots Restarted labs-morebots with ./labs.sh [04:42:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.morebots/SAL, Master [04:43:49] well it didn't die again yet which is nice [04:44:12] !log tools.bd808-test Testing labs-morebots a bit [04:44:42] labs-morebots: hello? [04:44:42] I am a logbot running on tools-exec-1207. [04:44:43] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [04:44:43] To log a message, type !log . [04:45:49] !log tools.bd808-test Testing labs-morebots take 2 [04:45:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bd808-test/SAL, Master [05:04:53] aude: did you maybe try to manually log in on testwikidata with your bot account and fail? [05:07:17] also, do you remember how many times you got the notoken error? [05:07:25] * tgr is trying to make sense of the logs [06:40:08] Labs, wikitech.wikimedia.org: "action=formedit" doesn't work any more - https://phabricator.wikimedia.org/T124248#1951237 (Luke081515) Works for me...
[07:24:00] Labs, wikitech.wikimedia.org: "action=formedit" doesn't work any more - https://phabricator.wikimedia.org/T124248#1951247 (Florian) [07:24:02] Labs, Labs-Infrastructure, Tool-Labs, MediaWiki-extensions-SemanticForms, Patch-For-Review: https://wikitech.wikimedia.org/wiki/Special:FormEdit/Tools_Access_Request down - https://phabricator.wikimedia.org/T123583#1951248 (Florian) [07:24:59] Labs, wikitech.wikimedia.org: "action=formedit" doesn't work any more - https://phabricator.wikimedia.org/T124248#1950333 (Florian) The most important comment of the duplicate task is [[ https://phabricator.wikimedia.org/T123583#1950540 | this one, posted by Reedy ]] :) [08:46:14] Labs, Phragile, TCB-Team: Unable to access Phragile WMFLabs instance - https://phabricator.wikimedia.org/T123369#1951317 (WMDE-leszek) Open>Resolved a:WMDE-leszek Thank you all for trying to help us getting access to the instance again. As we needed to update software running on the instance a... [08:46:28] Labs, Phragile, TCB-Team: Unable to access Phragile WMFLabs instance - https://phabricator.wikimedia.org/T123369#1951320 (WMDE-leszek) Resolved>declined [09:31:44] Labs, Tool-Labs: Replication lag starting a script on tools.taxonbot - https://phabricator.wikimedia.org/T124172#1951359 (jcrespo) Open>Resolved a:jcrespo All lag finally went away at 3:25 UTC. [09:35:21] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1951375 (hashar) [09:43:50] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1951388 (faidon) >>! In T50501#527689, @Krinkle wrote: > Would it be an option to flatten our subdomains? > > We'd only need b...
[14:52:51] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#1951866 (Aklapper) IMPORTANT: **If you are a community developer interested in working on this task:** The [[ https://www.mediawiki.org/wiki/Wikimedia_Hackath... [15:13:28] JOIN [15:13:50] CON LUIS CORRAL [15:15:08] MAS PUTO FUCKING ANSWER ME HELPPPPPPP! [16:35:12] Tool-Labs-tools-Other: 504 error for Autolist 2 on https://tools.wmflabs.org/autolist/ - https://phabricator.wikimedia.org/T124280#1952374 (JEumerus) Probably the former, seeing as other Tool Labs functions still work fine. [16:42:46] Labs, Labs-Infrastructure, Tool-Labs: tools-webgrid-lighttpd-1412 is not accessible by ssh - https://phabricator.wikimedia.org/T124304#1952419 (scfc) NEW [17:09:31] PROBLEM - Host tools-redis-01 is DOWN: CRITICAL - Host Unreachable (10.68.18.70) [17:10:26] Labs, Labs-Infrastructure, Tool-Labs: tools-webgrid-lighttpd-1412 is not accessible by ssh - https://phabricator.wikimedia.org/T124304#1952472 (chasemp) blargh missing host from salt and without prior console setup I'm stuck. I did grab the running jobs on this host {P2507} seems pretty small? may... [17:10:48] andrewbogott: it seems a bunch of tools are broken :/ like autolist and https://vital-signs.wmflabs.org/ (maybe it's own labs project?) [17:12:11] aude: I’m in the midst of a maintenance thing that’s going to break lots of things. Can you check back later in the day and see if there’s still breakage?
[17:12:22] ok [17:14:18] Tool-Labs-tools-Other: 504 error for Autolist 2 on https://tools.wmflabs.org/autolist/ - https://phabricator.wikimedia.org/T124280#1952477 (aude) asked in irc and told there is maintenance (across all of labs) going on and we need to check back in a bit [17:15:55] Labs, Labs-Infrastructure, Tool-Labs: tools-webgrid-lighttpd-1412 is not accessible by ssh - https://phabricator.wikimedia.org/T124304#1952480 (chasemp) back post reboot for now [17:16:25] hi [17:17:04] i have an instance in labs that shut off by its own and I can't get it to reboot. can somebody help me out? [17:17:13] Labs, Labs-Infrastructure, Tool-Labs: tools-webgrid-lighttpd-1412 is not accessible by ssh - https://phabricator.wikimedia.org/T124304#1952483 (chasemp) fwiw it seems like it should be a valid salt client root@labcontrol1001:~# salt-key -L | grep 1412 tools-webgrid-lighttpd-1412.tools.eqiad.wmflabs... [17:17:23] joakino: I’m rebooting all instances today to update kernels. What instance? [17:17:37] andrewbogott: maybe set a notice in motd? [17:17:39] andrewbogott: stack.reading-web-staging.eqiad.wmflabs [17:17:42] joakino: (this was announced on labs-l and labs-announce, I encourage you to subscribe if you are not already) [17:17:46] chasemp: isn’t it? [17:17:52] oh maybe so :) [17:17:57] oh ok andrewbogott, will do, didn't know about those lists [17:17:59] thanks [17:18:16] joakino: yes, that’s one of the ones I’m rebooting right now. It should be back in 5-10 minutes. [17:18:35] alright, thanks! sorry for the spam :D [17:18:45] joakino: I think you can subscribe here: https://lists.wikimedia.org/mailman/listinfo/labs-l [17:18:51] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1952488 (chasemp) have we had non-webgrid examples?
[17:19:08] that's what I'm doing :D [17:19:12] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1952492 (chasemp) [17:19:35] Labs, Tool-Labs, Mail: Move tools-mail to trusty - https://phabricator.wikimedia.org/T96299#1952495 (coren) a:coren>None [17:20:41] Labs, Labs-Infrastructure, Patch-For-Review: Labs: update image builders to use new PAM scheme - https://phabricator.wikimedia.org/T120710#1952504 (coren) a:coren>None [17:20:58] Labs, Labs-Sprint-115, Tool-Labs, labs-sprint-116, and 3 others: Attribute cache issue with NFS on Trusty - https://phabricator.wikimedia.org/T106170#1952506 (coren) a:coren>None [17:21:12] Labs, Labs-Infrastructure, Labs-Sprint-102, operations, ops-eqiad: Locate and assign some MD1200 shelves for proper testing of labstore1002 - https://phabricator.wikimedia.org/T101741#1952507 (coren) a:coren>None [17:23:44] instances on 1008 are reviving now. I’m going to step away while that finishes up… then wait for yuvi before I reboot anything else. (partly that gets us half an hour to verify that the new kernel isn’t a total disaster.) [17:25:36] joakino: your instance should be back now — does it look ok? [17:28:24] andrewbogott: I can ssh in now fine [17:28:32] great [17:29:33] RECOVERY - Host tools-redis-01 is UP: PING OK - Packet loss = 0%, RTA = 1.00 ms [17:39:03] Labs: tools replication is failing between labstore1001 and labstore1002 - https://phabricator.wikimedia.org/T124310#1952556 (chasemp) NEW [17:39:13] Labs: tools replication is failing between labstore1001 and labstore2001 - https://phabricator.wikimedia.org/T124310#1952564 (chasemp) [17:57:20] YuviPanda: ping me when you arrive? [18:04:12] andrewbogott: ping [18:04:18] howdy! [18:04:21] I hope waking up wasn’t too painful [18:04:54] it always is :D [18:04:58] I rebooted 1008 already. Do you have any preference about which one I do next?
[18:05:08] https://etherpad.wikimedia.org/p/tools-reboots-cve-0728 [18:05:13] let's find one that doesn't need failovering [18:05:28] which is most of 'em [18:05:31] so you can hit 1001 nex [18:05:39] ok. Need to do anything before I do? [18:06:10] andrewbogott: nope [18:06:21] ok then, here we go [18:09:24] YuviPanda: I gave 10 seconds between starts last time, with no problems. Going to try 5 this time. [18:09:38] * jimmyxu would greatly appreciate we run shutdown with like 60s wall notice in the future :p [18:10:25] andrewbogott: ok [18:10:33] jimmyxu: we emailed labs-l and labs-announce :) [18:10:38] jimmyxu: you mean, like, announce here that I’m going to reboot 1001 a minute before I do it? [18:10:47] but noted, and will do it for bastion-01 [18:11:11] I tend to assume that no one knows what host their vm is on, so figured it wouldn’t be useful [18:11:37] andrewbogott: rather than the wall message (as in, you'll get this on your terminal if you're logged in) shutdown automatically prints [18:11:53] ah I see [18:12:00] that’s because I’m not strictly speaking shutting down the VMs [18:12:01] andrewbogott: was running vim and got "the system is shutting down NOW" and did a panic save successfully :p [18:12:08] I’m shutting down the host that contains them [18:12:22] Running a shutdown command on each individual host would be… [18:12:28] oh.. 
I guess tools-dev got signalled somehow then [18:12:31] well, maybe possible, but inconsistent and a lot of trouble [18:12:33] that's more understandable [18:12:47] yeah, I’m sure that KVM sends a proper shutdown notice when it gets one from the host [18:12:56] PROBLEM - Host tools-webgrid-lighttpd-1410 is DOWN: CRITICAL - Host Unreachable (10.68.18.44) [18:12:56] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:13:19] PROBLEM - Host tools-exec-1204 is DOWN: CRITICAL - Host Unreachable (10.68.17.88) [18:13:21] PROBLEM - Host tools-exec-1408 is DOWN: CRITICAL - Host Unreachable (10.68.18.14) [18:13:30] ^ is fine [18:13:35] it'll all be ok [18:13:53] PROBLEM - Host tools-exec-1202 is DOWN: CRITICAL - Host Unreachable (10.68.16.57) [18:13:53] PROBLEM - Host tools-webgrid-generic-1404 is DOWN: CRITICAL - Host Unreachable (10.68.18.53) [18:13:59] PROBLEM - Host tools-webgrid-lighttpd-1411 is DOWN: CRITICAL - Host Unreachable (10.68.17.51) [18:14:15] YuviPanda: ok, 1002 next? [18:14:21] PROBLEM - Host tools-exec-cyberbot is DOWN: CRITICAL - Host Unreachable (10.68.16.39) [18:14:31] PROBLEM - Host tools-exec-1206 is DOWN: CRITICAL - Host Unreachable (10.68.17.105) [18:14:35] PROBLEM - Host tools-webgrid-generic-1405 is DOWN: CRITICAL - Host Unreachable (10.68.16.110) [18:14:39] once this fully comes back up, yeah [18:14:42] 5s no problem? [18:14:46] I can silence teh http://tools.wmflabs.org/ for a bit? [18:15:03] PROBLEM - Host tools-bastion-02 is DOWN: CRITICAL - Host Unreachable (10.68.16.44) [18:15:10] chasemp: yeah, that's ag ood idea! 
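andrewbogott paces the guest restarts on each virt host with a fixed gap between starts (10 seconds on labvirt1008, 5 seconds on labvirt1001). The pacing can be sketched as a small helper; this is hypothetical and not the tooling used here, `start` stands in for whatever actually boots a guest, and the motivation for the gap (presumably to keep every guest from booting at the same moment) is an assumption.

```python
import time
from typing import Callable, Iterable, List


def start_staggered(instances: Iterable[str], gap_seconds: float,
                    start: Callable[[str], None],
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Start each instance in turn, pausing gap_seconds between starts.

    `start` is a hypothetical hook around the real boot mechanism; this
    sketch only handles the pacing and returns the start order.
    """
    if gap_seconds < 0:
        raise ValueError("gap_seconds must be non-negative")
    started: List[str] = []
    for i, name in enumerate(instances):
        if i > 0:
            sleep(gap_seconds)  # no pause before the very first start
        start(name)
        started.append(name)
    return started
```

Injecting `sleep` keeps the helper testable and makes the gap easy to tune between hosts, as was done here when going from 10 s to 5 s.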
[18:15:15] PROBLEM - Host tools-exec-1201 is DOWN: CRITICAL - Host Unreachable (10.68.17.49) [18:15:16] kk [18:15:19] PROBLEM - Host tools-exec-1213 is DOWN: CRITICAL - Host Unreachable (10.68.17.252) [18:15:27] PROBLEM - Host tools-exec-1209 is DOWN: CRITICAL - Host Unreachable (10.68.17.129) [18:15:55] PROBLEM - Host tools-puppetmaster-01 is DOWN: CRITICAL - Host Unreachable (10.68.22.61) [18:16:06] YuviPanda: 1001 instances are still waking up, but so far 5s seems fine [18:16:13] PROBLEM - Host tools-exec-1217 is DOWN: CRITICAL - Host Unreachable (10.68.18.20) [18:16:19] ok [18:16:30] http://brojsimpson.com/wordpress/wp-content/uploads/2011/11/its-gonna-be-ok.jpg [18:16:30] I wanna see how the exec nodes did [18:16:33] when they come back up [18:17:14] PROBLEM - Host tools-exec-1218 is DOWN: CRITICAL - Host Unreachable (10.68.18.19) [18:17:35] btw, yuvi, you saw that 3.19 got sorted out? [18:17:41] PROBLEM - Host tools-webgrid-lighttpd-1409 is DOWN: CRITICAL - Host Unreachable (10.68.18.43) [18:17:56] tools-exec-1201 should be back up now [18:18:06] andrewbogott: yeah [18:18:14] YuviPanda, andrewbogott: instances outside tools affected too? [18:18:21] RECOVERY - Host tools-exec-1204 is UP: PING OK - Packet loss = 0%, RTA = 1.07 ms [18:18:22] Luke081515: all of labs [18:18:36] thanks. I just wodnered, why I get a 500 [18:18:47] andrewbogott: yes. thanks for doing that :) [18:18:48] YuviPanda: all exec nodes should be back up [18:18:51] RECOVERY - Host tools-exec-1202 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms [18:19:23] RECOVERY - Host tools-exec-cyberbot is UP: PING OK - Packet loss = 0%, RTA = 1.30 ms [18:19:23] ok let me check [18:19:31] andrewbogott: Can you pelase ping me, if this is fixed? 
[18:19:33] RECOVERY - Host tools-exec-1206 is UP: PING OK - Packet loss = 0%, RTA = 3.23 ms [18:19:37] RECOVERY - Host tools-webgrid-generic-1405 is UP: PING OK - Packet loss = 0%, RTA = 1.20 ms [18:19:58] Luke081515, that last outage should be over by now, but there may be others. 8 more virt hosts to reboot yet. [18:20:04] RECOVERY - Host tools-bastion-02 is UP: PING OK - Packet loss = 0%, RTA = 0.86 ms [18:20:16] RECOVERY - Host tools-exec-1201 is UP: PING OK - Packet loss = 0%, RTA = 1.46 ms [18:20:20] RECOVERY - Host tools-exec-1213 is UP: PING OK - Packet loss = 0%, RTA = 0.71 ms [18:20:27] andrewbogott: Thanks, my instance is back now ;) [18:20:28] RECOVERY - Host tools-exec-1209 is UP: PING OK - Packet loss = 0%, RTA = 2.80 ms [18:20:56] RECOVERY - Host tools-puppetmaster-01 is UP: PING OK - Packet loss = 0%, RTA = 1.55 ms [18:21:14] RECOVERY - Host tools-exec-1217 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms [18:22:00] hmmmm [18:22:09] * andrewbogott cringes [18:22:17] RECOVERY - Host tools-exec-1218 is UP: PING OK - Packet loss = 0%, RTA = 1.46 ms [18:22:22] * YuviPanda is still doing checks [18:22:43] RECOVERY - Host tools-webgrid-lighttpd-1409 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms [18:22:49] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 982995 bytes in 3.173 second response time [18:22:55] RECOVERY - Host tools-webgrid-lighttpd-1410 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [18:23:08] oookay [18:23:21] andrewbogott: so it looks like gridengine still thinks the jobs that have been running on those nodes [18:23:23] RECOVERY - Host tools-exec-1408 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms [18:23:24] are still running [18:23:26] despite evidence [18:23:28] to the contrary [18:23:30] fun [18:23:39] yay gridengine [18:23:51] RECOVERY - Host tools-webgrid-generic-1404 is UP: PING OK - Packet loss = 0%, RTA = 0.95 ms [18:23:54] the clustering setup where if it *never* loses jobs, even if it does! 
[18:24:01] RECOVERY - Host tools-webgrid-lighttpd-1411 is UP: PING OK - Packet loss = 0%, RTA = 0.70 ms [18:24:03] maybe it just needs time for things to timeout? [18:24:08] possibly [18:24:12] so I'm going to give it another minute [18:25:20] * andrewbogott peels an orange [18:26:01] hah [18:26:04] I'm out of oranges [18:26:06] unfortunately [18:28:04] I think this was a satsuma, strictly speaking [18:28:19] ah [18:28:22] 'citrus' [18:28:28] it hasn't noticed still [18:28:43] so maybe we restarted the exec nodes too soon? Would it be smarter if the nodes went down and stayed down? [18:28:57] I think [18:29:01] we'll just drain them of jobs explicitly [18:29:03] for the next time [18:29:05] and I've to do tha tnow [18:29:07] *that now [18:29:33] ok. You’re going to drain the ones from 1001 as well? [18:29:35] also fuck SGE, etc. this is like, the underlying bedrock of what a clustering setup should do [18:29:37] yeah [18:29:39] going to do that now [18:29:59] right, the whole point is for it to notice when a node goes down [18:30:21] yeah [18:30:52] hm, some of mine were restarted [18:36:27] YuviPanda: anything I can do to help? [18:36:32] I restarted them allllllll [18:36:56] ok, ready for 1002 to go down then? [18:37:00] let me look [18:37:10] andrewbogott: we should drain the exec nodes prior to restarting this time [18:37:19] oh, that’s what I thought you were doing, sorry [18:37:29] I was draining [18:37:31] 1001 [18:37:31] great, now it's gone [18:37:34] or at least [18:37:35] resetting them [18:38:13] !log tools restarted all restartable jobs in instances on labvirt1001 and deleted all non-restartable ghost jobs. these were already dead [18:38:17] ok, but you’ll drain the 1002 ones as well? Or shall I? (Confused by your use of ‘we’ :) ) [18:38:19] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [18:38:20] yeah [18:38:22] andrewbogott: am doing it [18:38:27] ok! 
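The !log entry above summarises the clean-up after labvirt1001 came back: gridengine still listed jobs on the rebooted exec nodes, so restartable jobs were rescheduled and non-restartable "ghost" jobs deleted. A sketch of that decision, assuming the job list (id, host, restartable flag) has already been extracted from `qstat`, could look like this. It only builds argument lists for `qmod -rj` and `qdel -f` and runs nothing; the `Job` type and helper name are hypothetical, and this is not the script YuviPanda wrote.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Job:
    job_id: int
    host: str
    restartable: bool  # assumption: derived from the job's queue / continuous flag


def ghost_job_commands(jobs: Iterable[Job], rebooted_hosts: Set[str]) -> List[List[str]]:
    """Build gridengine commands for jobs still listed on rebooted hosts."""
    commands: List[List[str]] = []
    for job in jobs:
        if job.host not in rebooted_hosts:
            continue  # host was not rebooted, so the job may really be running
        if job.restartable:
            # Ask the scheduler to reschedule the job onto a live node.
            commands.append(["qmod", "-rj", str(job.job_id)])
        else:
            # The process died with the host; remove the stale entry.
            # -f forces removal even if the exec daemon cannot confirm it.
            commands.append(["qdel", "-f", str(job.job_id)])
    return commands
```

Returning argument lists rather than executing them makes it easy to review the plan (or pass each list to `subprocess.run`) before touching the grid.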
[18:38:33] I'm actually making it into a tiny script [18:38:35] so hold on [18:45:05] ok [18:45:07] script done [18:46:24] !log tools drained and disabled queues on all nodes on labvirt1002 [18:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [18:46:36] cool. Ready for reboot? [18:46:41] yeah [18:46:49] I think [18:46:53] this reboot will kill wikibugs [18:46:57] since there'll be a redis connection reset [18:47:01] and we've to restart it manually [18:47:03] after [18:47:05] I can do that [18:47:07] anyway [18:47:09] go on [18:47:11] :) [18:48:25] PROBLEM - Host tools-worker-1003 is DOWN: CRITICAL - Host Unreachable (10.68.17.58) [18:48:40] Cyberpower678: alive? [18:49:08] AzaToth, no. I died yesterday. This is his ghost speaking [18:49:22] good [18:49:37] I notice https://tools.wmflabs.org/xtools-articleinfo/ is fubar Cyberpower678 [18:50:03] is it something that has been removed/moved/etc...? [18:51:32] AzaToth, it would appear someone broke it [18:51:47] I only restart the tools when needed. [18:51:57] I don't do much maintainence anymore. [18:52:07] oh [18:52:15] you are listed as "maintainer" [18:52:26] So I can restart it if I need to. [18:52:37] But I don't tinker with the code anymore. [18:53:03] I've moved on from it when I no longer felt an pleasure maintaining the coed. [18:53:17] PROBLEM - Host tools-services-01 is DOWN: CRITICAL - Host Unreachable (10.68.16.29) [18:53:59] AzaToth, I like writing my own code and maintaining that much better, or maintaining code I actively use myself. [18:54:06] np [18:54:20] PROBLEM - Host tools-redis-1001 is DOWN: CRITICAL - Host Unreachable (10.68.22.56) [18:54:46] PROBLEM - Host tools-webgrid-lighttpd-1209 is DOWN: CRITICAL - Host Unreachable (10.68.17.152) [18:54:46] AzaToth, yea, sorry. Maintaining xTools was driving me to an early wiki retirment. 
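Depooling the exec nodes on a virt host before its reboot, and repooling them afterwards, amounts to disabling and later re-enabling every queue instance on those nodes. A minimal sketch, assuming the list of exec hosts on a given labvirt is already known; `queue_commands` is a hypothetical helper, not the script mentioned above. A disabled queue stops accepting new jobs but does not move jobs already running on it, so those still need separate handling.

```python
from typing import Iterable, List

# gridengine qmod flags: -d disables queue instances, -e enables them again.
_FLAGS = {"depool": "-d", "repool": "-e"}


def queue_commands(hosts: Iterable[str], action: str) -> List[List[str]]:
    """Build qmod commands that disable or re-enable every queue on each host."""
    if action not in _FLAGS:
        raise ValueError(f"unknown action: {action!r}")
    # '*@host' selects all queue instances on that host. Passing it as an argv
    # element (e.g. to subprocess.run without shell=True) keeps a shell from
    # glob-expanding the '*'.
    return [["qmod", _FLAGS[action], f"*@{host}"] for host in hosts]
```

For example, `queue_commands(["tools-exec-1203.tools.eqiad.wmflabs"], "depool")` yields a single `qmod -d` command for that node.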
[18:54:51] hehe [18:55:08] PROBLEM - Host tools-exec-1203 is DOWN: CRITICAL - Host Unreachable (10.68.16.133) [18:55:16] just put it on YuviPanda then instead? [18:55:16] * Cyberpower678 now has his own big project. [18:55:29] InternetArchiveBot [18:55:42] hmm [18:55:44] PROBLEM - Host tools-submit is DOWN: CRITICAL - Host Unreachable (10.68.17.1) [18:55:58] in what context? [18:56:04] PROBLEM - Host tools-webgrid-generic-1403 is DOWN: CRITICAL - Host Unreachable (10.68.18.52) [18:56:04] I started this project before it was even mentioned on the community wishlist. [18:56:17] there's a community wishlist? [18:56:20] PROBLEM - Host tools-exec-1214 is DOWN: CRITICAL - Host Unreachable (10.68.17.253) [18:56:22] Yes [18:56:28] I didn't know that either [18:56:36] PROBLEM - Host tools-webgrid-lighttpd-1204 is DOWN: CRITICAL - Host Unreachable (10.68.18.49) [18:56:54] YuviPanda: 1002 instances coming up now. What’s next? [18:56:54] PROBLEM - Host tools-exec-1405 is DOWN: CRITICAL - Host Unreachable (10.68.18.3) [18:56:58] PROBLEM - Host tools-webgrid-lighttpd-1405 is DOWN: CRITICAL - Host Unreachable (10.68.17.65) [18:57:03] I was minding my own business until the WMF approached me when they saw I was pretty far in bot development already and had approval [18:57:13] maybe wherever the proxy failover is? [18:57:19] andrewbogott: yeah [18:57:22] andrewbogott: that's 1003 [18:57:25] let me do the failover [18:57:47] wait, you’re moving onto 1003 or off of? [18:57:50] PROBLEM - Host tools-webgrid-lighttpd-1401 is DOWN: CRITICAL - Host Unreachable (10.68.16.34) [18:57:54] Cyberpower678: what kind of bot? [18:58:02] andrewbogott: off [18:58:07] ok [18:58:08] PROBLEM - Host tools-exec-gift is DOWN: CRITICAL - Host Unreachable (10.68.16.40) [18:58:11] andrewbogott: to 1007 [18:58:20] PROBLEM - Host tools-exec-1210 is DOWN: CRITICAL - Host Unreachable (10.68.17.147) [18:59:35] AzaToth, what kind of bots are there. [18:59:50] I only know of one kind. 
The bot [18:59:52] :p [19:00:09] RECOVERY - Host tools-exec-1203 is UP: PING OK - Packet loss = 0%, RTA = 0.90 ms [19:00:40] what does the bot do? [19:00:45] RECOVERY - Host tools-submit is UP: PING OK - Packet loss = 0%, RTA = 0.87 ms [19:00:54] YuviPanda: exec nodes should be back up on 1002, if you need to repool [19:01:05] RECOVERY - Host tools-webgrid-generic-1403 is UP: PING OK - Packet loss = 0%, RTA = 1.01 ms [19:01:07] andrewbogott: yeah, so I'm going to failover proxy first, repool, verify, and then depool [19:01:19] RECOVERY - Host tools-exec-1214 is UP: PING OK - Packet loss = 0%, RTA = 1.11 ms [19:01:23] andrewbogott: there's a killnode.bash on my homedir on tools that kills nodes [19:01:30] AzaToth, It goes around wikipedia, actively archiving sources and fixing dead sources [19:01:31] ok, meanwhile I will try to talk less [19:01:33] RECOVERY - Host tools-worker-1003 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms [19:01:35] hehe [19:01:36] ok [19:01:37] RECOVERY - Host tools-webgrid-lighttpd-1204 is UP: PING OK - Packet loss = 0%, RTA = 0.77 ms [19:01:39] andrewbogott: should we shut up shinken-wm [19:01:44] why do you restart jobs that aren't continuous? do they have state saved? [19:01:45] sure [19:01:53] RECOVERY - Host tools-exec-1405 is UP: PING OK - Packet loss = 0%, RTA = 1.08 ms [19:01:59] RECOVERY - Host tools-webgrid-lighttpd-1405 is UP: PING OK - Packet loss = 0%, RTA = 1.28 ms [19:02:06] andrewbogott: can you do that? :D kill ircecho on shinken-01 and disable puppet to keep it from coming back up? [19:02:48] !log tools failed over tools proxy to tools-proxy-02 [19:02:51] RECOVERY - Host tools-webgrid-lighttpd-1401 is UP: PING OK - Packet loss = 0%, RTA = 2.28 ms [19:02:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:03:12] YuviPanda: done [19:03:13] ok [19:03:24] andrewbogott: can you also force a puppetrun on the DNS host? 
[19:03:28] that is also a needed step [19:03:34] since there's IP aliasing going on there [19:04:09] it’s running now [19:04:19] ok! [19:04:21] now repooling things [19:04:22] RECOVERY - Host tools-redis-1001 is UP: PING OK - Packet loss = 0%, RTA = 0.80 ms [19:05:12] dns hosts should be updated [19:05:31] YuviPanda: have you seen my question? [19:06:35] !log tools re-enabled queues on exec nodes that were on labvirt1002 [19:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:06:46] gifti2: jobs that aren't continuoes get deleted, not restarted [19:06:52] they aren't restartable jobs [19:07:28] i had one with Rq state [19:07:55] that never ever may be restarted [19:08:08] without manual preparation [19:08:15] > [19:08:17] ? [19:08:39] you should write your code such that it may be restarted at whatever time. [19:08:48] duh [19:08:49] we had to restart the exec nodes to deal with a linux kernel security issue [19:08:52] so... [19:08:57] but this is a non-continuous job [19:09:12] what else am i supposed to do?! [19:09:35] i understand why you have to reboot labs [19:09:36] if you want to keep a non-continuous job running [19:09:42] you can add a .bigbrotherrc file [19:09:48] andrewbogott: ok, looks good, let me depool things on 1003 [19:09:49] i don't want that, that's the point [19:09:57] gifti2: yeah, so if it isn't a continuous job [19:10:01] it wouldn't be restarted [19:10:04] it'd have been qdel'd [19:10:04] it was [19:10:15] can you file a bug? I can look at it [19:10:17] after this [19:10:18] (i may be mistaken) [19:10:20] ok [19:10:35] I am pretty sure [19:10:37] if it is R [19:10:41] it is in a restartable queue [19:10:45] anyway [19:11:08] ha, is the giftbot queue restartable? [19:11:31] ah [19:11:33] good question [19:11:35] I don't know [19:11:37] it probably is [19:11:45] hm! 
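The advice in this exchange, to write a job so it can be restarted at any time because a non-continuous grid job may be deleted or rescheduled when its exec node is rebooted, can be sketched as a checkpoint-and-resume loop. This is a minimal Python illustration under stated assumptions, not Tool Labs code: the function names, the JSON checkpoint format and the `stop_after` parameter (which only simulates the job being killed) are all hypothetical.

```python
import json
import os
import tempfile
from typing import Callable, Optional, Sequence


def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process (0 if no checkpoint exists yet)."""
    try:
        with open(path) as f:
            return int(json.load(f)["next_index"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Record progress atomically: write a temp file, then rename it over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem


def process_items(
    items: Sequence[str],
    checkpoint_path: str,
    handle: Callable[[str], None],
    stop_after: Optional[int] = None,  # hypothetical: simulates the job being killed
) -> int:
    """Process items from the last checkpoint onward; return the next index to process."""
    next_index = load_checkpoint(checkpoint_path)
    processed = 0
    while next_index < len(items):
        # If the job dies between handle() and save_checkpoint(), this item is
        # handled again on restart, so handle() should be safe to repeat.
        handle(items[next_index])
        next_index += 1
        save_checkpoint(checkpoint_path, next_index)
        processed += 1
        if stop_after is not None and processed >= stop_after:
            break
    return next_index
```

Resubmitting the same job after a reboot then resumes from the saved index instead of starting over.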
[19:11:55] I didn't actually do anything specifial for giftbot or any of the other special queues [19:13:01] !log tools depooled instances on labvirt1003 [19:13:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:13:38] andrewbogott: i think we're good to go now [19:13:42] ok! [19:18:34] (PS1) Ricordisamoa: Configure tox job for Jenkins [labs/tools/ptable] - https://gerrit.wikimedia.org/r/265544 [19:18:56] hm, why did we get a ‘host down for toollabs’? [19:19:12] it’s still up, must be a monitoring mistake [19:19:59] andrewbogott: might be part of rescheduling [19:25:00] (CR) Ricordisamoa: "integration/config patch at https://gerrit.wikimedia.org/r/265543" [labs/tools/ptable] - https://gerrit.wikimedia.org/r/265544 (owner: Ricordisamoa) [19:27:14] YuviPanda: you can repool 1003 now [19:27:28] andrewbogott: ok [19:29:33] !log repooled exec nodes from labvirt1003 [19:29:33] repooled is not a valid project. [19:29:57] !log tools repooled exec nodes from labvirt1003 [19:29:59] and I’m ready for 1004 when you are [19:30:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:30:20] yeah let me depool [19:30:56] problem at betacluster: https://phabricator.wikimedia.org/T124333 Special:Preferences says "Database error - Error: 145 Table './centralauth/globalnames' is marked as crashed and should be repaired (10.68.16.193)" [19:31:32] !log tools depooled exec nodes from labvirt1004 [19:31:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:31:40] andrewbogott: good to go [19:31:49] quiddity: reporting in #wikimedia-releng is probably going to be better [19:32:05] ah, k. ty :) [19:37:25] RIP my inbox [19:39:52] What's the email address for a tool? [19:41:53] a930913: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Email [19:42:38] YuviPanda: meanwhile, shall we move the proxy back to 1003?
[19:43:12] andrewbogott: yes [19:43:15] let me do that [19:47:47] YuviPanda: 1004 instances back up and running [19:48:18] ok [19:48:33] !log tools failed over proxy to tools-proxy-01 again [19:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:48:52] ok all good there [19:48:58] andrewbogott: I'm going to repool them now [19:49:10] * andrewbogott nods [19:49:19] need me to run puppet on the dns host? [19:49:44] !log tools repooled exec nodes from labvirt1004 [19:49:50] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:50:05] andrewbogott: only before we get to kill 1007 (which is where -02 is) [19:50:15] so no need unless you think we'll get there before next puppet run [19:50:54] YuviPanda: should I try restarting wikibugs now or is it going to be futile? [19:51:15] legoktm: I restarted it [19:51:27] legoktm: it's restartable job so it sohuld survive, it only died when I killed redis [19:51:30] hrm. [19:51:38] the irc part isn't connected right now... [19:51:41] ah [19:51:43] I restarted that too [19:51:45] just now [19:52:12] thanks :) [19:52:15] andrewbogott: I'm prepping for 1005 now [19:52:33] ok, me too [19:52:40] goddamit [19:52:44] doing a 'vim labvirt1005' [19:52:44] (and I did that puppet refresh just in case) [19:52:46] takes like 10s [19:52:52] probably LDAP [19:53:01] * YuviPanda doesn't have any energy to debug that now [19:53:14] let's pop stack one broken thing at a time [19:53:39] !log tools depooled exec nodes on labvirt1005 [19:53:42] Yeay kernel exploit. They are *so* much fun. [19:53:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:54:02] * Coren comiserates with Andrew and Yuvi. [19:54:08] Coren: yeah, Ops in general regarded this as unserious since not very many people have shell. 
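The per-hypervisor routine followed in this log is: fail the web proxy over if the host about to be rebooted carries the active one, run puppet on the DNS host because of the IP aliasing, depool that host's exec nodes, reboot, then repool. It can be modelled as an ordered plan. This Python sketch only records the steps. The `Cluster` class, the host-to-exec-node mapping and the node names are hypothetical; the proxy placement (tools-proxy-01 on labvirt1003, tools-proxy-02 on labvirt1007) follows the log. It also simplifies one thing: it fails the proxy over lazily, just before its host is rebooted, whereas in the log the proxy was moved back to tools-proxy-01 earlier, while labvirt1004 was down. Verification and the short wait before rebooting are not modelled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical model: hypervisor -> grid exec nodes running on it.
    exec_nodes: dict[str, list[str]]
    # Proxy instance name -> hypervisor it runs on.
    proxies: dict[str, str]
    active_proxy: str
    steps: list[str] = field(default_factory=list)

    def fail_over_proxy(self, avoid_host: str) -> None:
        """Switch the active proxy to one that is not hosted on avoid_host."""
        for name, host in self.proxies.items():
            if host != avoid_host:
                self.active_proxy = name
                self.steps.append(f"fail over proxy to {name}")
                # DNS relies on IP aliasing, so the DNS host needs a puppet run.
                self.steps.append("run puppet on DNS host")
                return
        raise RuntimeError(f"every proxy runs on {avoid_host}")


def rolling_reboot(cluster: Cluster, hosts: list[str]) -> list[str]:
    """Return the ordered maintenance steps for rebooting hosts one at a time."""
    for host in hosts:
        if cluster.proxies[cluster.active_proxy] == host:
            cluster.fail_over_proxy(avoid_host=host)
        nodes = cluster.exec_nodes.get(host, [])
        cluster.steps.extend(f"depool {node}" for node in nodes)
        cluster.steps.append(f"reboot {host}")
        cluster.steps.extend(f"repool {node}" for node in nodes)
    return cluster.steps
```

Keeping the proxy decision inside the loop means no host is ever rebooted while it carries the active proxy, which is the constraint behind "only before we get to kill 1007 (which is where -02 is)".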
[19:54:12] Matters a bit more in labs :) [19:54:18] * YuviPanda waves at Coren [19:54:25] yeah [19:54:26] YuviPanda: ready for 1005 reboot? [19:54:27] although [19:54:29] the exploit didn't work [19:54:33] on the instance I tried it on [19:54:35] so [19:54:37] andrewbogott: yes [19:54:54] bd808: ^ this takes out 2/3 of the ES nodes, let's see how they recover [19:55:14] * bd808 crosses fingers [19:56:35] * Coren returns to the difficult task of catching up with his Steam playlist. [20:00:06] stashbot is going to be sad until it locks back on to an ES server [20:00:18] * andrewbogott pretty much only uses steam to play peggle [20:01:27] I used to kinda like Steam before; but since they allow sharing within family and do Linux, I love it. : -) [20:01:47] maaaaaaaaan it takes a long time for these HPs to post [20:02:21] Most of my steam experience was running steam for windows on debian with crossover, so you can imagine my memories are mixed [20:02:49] * YuviPanda 's steam experience was buying things on sale, realizing he never plays them, and uninstalling as a way of avoiding temptation [20:03:03] now, my laptop can hardly run GNOME3 much less any games so no temptating [20:05:01] oh, actually Civ 5 is on steam, isnt’ it? So I take it back, I’ve run steam on my mac for a thousand hours. [20:05:32] bd808: es nodes should be back up [20:07:09] and, YuviPanda, you can repool the 1005 tools nodes. [20:07:11] andrewbogott: ssh tools-elastic-03 is giving me "channel 0: open failed: connect failed: No route to host" [20:07:37] tools-elastic-01 is back up [20:07:52] bd808: 03 wfm [20:08:02] maybe I just pinged you too soon [20:08:32] andrewbogott: maybe so. works for me now [20:08:41] ok. sorry to mislead [20:09:05] ok [20:09:07] repooling [20:09:09] now [20:10:01] YuviPanda: cluster came right back to green [20:10:19] hrmmm... 
why 2 bots [20:11:00] !log tools repooled exec node son 1005 [20:11:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [20:12:52] andrewbogott: ready to depool when you want [20:12:57] yep, have at [20:13:46] how's the maintenance going? [20:14:06] aude, we’re doing the 7th out of 11 nodes. [20:14:11] ok [20:14:16] !log tools depooled all exec nodes in labvirt1006 [20:14:21] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [20:14:24] people noticed that magnus' tools (e.g. https://wdq.wmflabs.org/) are not working [20:14:25] andrewbogott: give it maybe 2mins and then restart [20:14:32] ok [20:14:44] i'll tell them to be patient [20:16:21] aude: in theory his tool should be surviving this, but it’s probably not worth burning much effort [20:17:05] andrewbogott: it's its own labs instnace [20:17:17] oh, which? [20:17:22] wdq-mm-01 [20:17:24] it's up [22:35:51] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1953746 (bd808) [22:39:31] YuviPanda: yeah I'll email labs-l about the need to restart Vagrant [22:40:15] thanks bd808 [22:55:35] (PS1) MtDu: Add PLURAL support to LOGEVENT_RENAMEUSER message [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/265647 (https://phabricator.wikimedia.org/T114876) [22:57:45] Hi. I get an SSL error when trying to reach https://ru.wikipedia.org/ from mono. Do we miss the proper certificates configuration? [22:58:02] System.Net.WebException: Error writing request: The authentication or decryption has failed. [22:58:31] hey Leloiandudu [22:58:36] maybe the default mono version is too old [22:58:42] how're you running your code? [22:59:23] mono --runtime=v4.0.30319 --optimize=-inline my.exe [22:59:39] hmmm [22:59:41] not sure :( [22:59:52] I don't know enough mono to figure out what's going on :( [22:59:55] sorry!
[23:00:41] hello Leloiandudu [23:00:57] I've just run this: [23:00:59] you may need to tweak your code to ignore some ssl issues [23:01:02] mozroots --import --sync [23:01:15] and it doesn't crash anymore [23:01:17] or just catch and analyze the details of that exception to see why it happens [23:01:26] does that mean it will work on the grid too? [23:02:15] what does that command do? [23:02:22] I suppose it imports them to your tool's home, so most likely yes [23:02:37] http://linux.die.net/man/1/mozroots [23:03:40] ok, it works on the grid! [23:03:44] thanks everybody :) [23:04:39] YuviPanda: looks like it updates the local trust store with the root certs that Mozilla publishes [23:04:54] which seems like a nice and reasonable thing to do [23:05:04] hmm [23:05:09] so I guess we get the debian ca-certificates [23:05:29] I'm surprised that our wiki ssl cert doesn't work with that [23:05:35] assuming mono kows how to find them [23:05:45] *knows [23:06:06] hmm [23:06:13] https://packages.debian.org/search?keywords=ca-certificates-mono [23:06:14] https worked from links for me, so I think mono just looked in the wrong place [23:06:15] but how do you get to run mozroots without being root? [23:06:22] ah [23:06:33] bd808: but I think we have that installed too [23:06:37] maybe *that* is out of data [23:06:39] which I don't doubt [23:06:45] @YuvaPanda, it doesn't require root for some reason [23:07:25] YuviPanda: possible. The stretch package says "This package uses the hooks of the ca-certificates package to update the Mono keystore." [23:07:34] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1953883 (mmodell) a:mmodell I'll see what I can do...
[23:07:36] which would make me think it gets all the system certs [23:08:15] hmm [23:09:20] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1953893 (mmodell) it seems that phabricator only supports accessing svn repos over ssh. I'm not sure if that will be a problem? [23:10:21] hrm, maps-warper isn't happy... [23:11:28] YuviPanda: tools-login.wmflabs.org:/tmp/kernels.txt [23:11:37] I made it but I haven’t really looked at it yet [23:14:17] ok [23:14:24] chippy: what happened to it [23:14:54] YuviPanda, seems unable to start the ruby/rails application [23:16:05] there was a failure for the /mnt partition a while ago, I wonder if its related. [23:16:53] * YuviPanda hasn't really done any rails before, not sure if he could help :( [23:17:02] andrewbogott: looks like only 2 machines with 4.2 left [23:17:28] and those are the proxies [23:20:29] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1953947 (mmodell) rTSVN repo created, downloading the svn dump now. [23:20:46] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1953948 (mmodell) p:Triage>High [23:32:16] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1954012 (mmodell) ok I downloaded and extracted the toolserver-svn archive. It contains 582 repositories, not just one. I'm not sure what to...
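The diagnosis in the Mono thread is that Mono was reading a different root-certificate store from the one other clients on the host use, and that `mozroots --import --sync` filled Mono's store with the roots Mozilla publishes. As a loose illustration of checking where a TLS client looks for trusted roots, this sketch asks Python's own `ssl` module (a stand-in, not Mono) for its default OpenSSL verify paths and for how many CA certificates its default context has loaded. The function name `describe_trust_store` is hypothetical.

```python
import ssl


def describe_trust_store() -> dict:
    """Report where OpenSSL-backed Python looks for root CAs and how many it loaded."""
    paths = ssl.get_default_verify_paths()
    context = ssl.create_default_context()  # loads the system default CA certificates
    stats = context.cert_store_stats()
    return {
        # Either value may be None if that file or directory does not exist here.
        "cafile": paths.cafile,
        "capath": paths.capath,
        # Certificates in a capath directory are loaded lazily, so this count
        # can be 0 even when capath holds certificates.
        "loaded_ca_certs": stats["x509_ca"],
    }


if __name__ == "__main__":
    for key, value in describe_trust_store().items():
        print(f"{key}: {value}")
```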
[23:44:26] Labs, Patch-For-Review: Any puppet failure on a labs instance should send an email to project admins - https://phabricator.wikimedia.org/T121773#1954074 (Andrew) Open>Resolved [23:45:53] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1954081 (bd808) >>! In T60801#1954012, @mmodell wrote: > ok I downloaded and extracted the toolserver-svn archive. It contains 582 repositori... [23:46:32] Labs, Tool-Labs, Diffusion, Internet-Archive: Copy contents of https://svn.toolserver.org/ to Wikimedia Diffusion - https://phabricator.wikimedia.org/T60801#1954084 (mmodell) @bd808: I'm going to try that.