[01:02:29] !log tools.morebots morebots missing from #wikimedia-qa
[01:02:35] Logged the message, Master
[01:04:43] jeremyb_: morebots is AWOL from #wikimedia-qa.
[02:01:54] Tool-Labs: Replicate the Phabricator database to labsdb - https://phabricator.wikimedia.org/T52422#941131 (Krenair) >>! In T52422#550927, @Aklapper wrote: > Plus I don't see how to sanitize that data (legal). The same way we do it for MediaWiki databases?
[03:14:24] Coren: still up? Ready for me to build a new debian image?
[03:19:40] PROBLEM - Puppet failure on tools-exec-14 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:19:44] PROBLEM - Puppet failure on tools-uwsgi-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:19:46] PROBLEM - Puppet failure on tools-exec-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:19:58] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:00] PROBLEM - Puppet failure on tools-exec-08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:04] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:10] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:20] PROBLEM - Puppet failure on tools-shadow is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:20] PROBLEM - Puppet failure on tools-webgrid-tomcat is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:21] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:42] PROBLEM - Puppet failure on tools-webgrid-05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:20:58] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:21:02] PROBLEM - Puppet failure on tools-exec-15 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:21:06] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:21:16] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:21:18] PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:21:54] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:09] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:11] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:25] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:29] PROBLEM - Puppet failure on tools-exec-10 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[03:22:31] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:35] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:22:41] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0]
[03:23:07] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:23:07] PROBLEM - Puppet failure on tools-static is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:23:25] PROBLEM - Puppet failure on tools-webproxy is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:23:53] PROBLEM - Puppet failure on tools-exec-12 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:09] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:31] PROBLEM - Puppet failure on tools-mail is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:39] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:41] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:49] PROBLEM - Puppet failure on tools-exec-05 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [0.0]
[03:24:49] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:24:59] PROBLEM - Puppet failure on tools-exec-13 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:25:34] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[03:29:03] Oddly, ^ is the sound of the puppetmaster being fixed. Will clear up shortly.
[03:31:08] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0]
[03:32:11] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0]
[03:32:29] RECOVERY - Puppet failure on tools-exec-10 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:32:41] RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[03:34:11] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:34:49] RECOVERY - Puppet failure on tools-exec-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:35:57] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:36:15] RECOVERY - Puppet failure on tools-exec-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:36:58] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:38:05] RECOVERY - Puppet failure on tools-static is OK: OK: Less than 1.00% above the threshold [0.0]
[03:39:30] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0]
[03:40:22] RECOVERY - Puppet failure on tools-webgrid-tomcat is OK: OK: Less than 1.00% above the threshold [0.0]
[03:40:34] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0]
[03:40:46] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:41:06] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:41:16] RECOVERY - Puppet failure on tools-exec-wmt is OK: OK: Less than 1.00% above the threshold [0.0]
[03:42:26] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:42:38] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[03:43:08] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[03:43:50] RECOVERY - Puppet failure on tools-exec-12 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:40] RECOVERY - Puppet failure on tools-exec-14 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:40] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:46] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:46] RECOVERY - Puppet failure on tools-uwsgi-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:56] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:59] RECOVERY - Puppet failure on tools-exec-13 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:44:59] RECOVERY - Puppet failure on tools-exec-08 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:45:09] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:45:15] RECOVERY - Puppet failure on tools-shadow is OK: OK: Less than 1.00% above the threshold [0.0]
[03:45:19] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:47:10] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:47:30] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:48:26] RECOVERY - Puppet failure on tools-webproxy is OK: OK: Less than 1.00% above the threshold [0.0]
[03:49:38] RECOVERY - Puppet failure on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:49:48] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0]
[03:50:04] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:51:32] PROBLEM - Host tools-uwsgi-01 is DOWN: CRITICAL - Host Unreachable (10.68.16.64)
[05:56:38] wat
[05:57:13] andrewbogott: ^
[05:58:11] andrewbogott: this was also on one of the newer hosts, I think
[05:58:41] it’s in SHUTOFF state
[05:58:52] * YuviPanda goes to do a ‘hard’ reboot
[06:00:04] !log tools tools-uwsgi-01 randomly went to SHUTOFF state, rebooting from virt1000
[06:00:10] Logged the message, Master
[06:01:34] RECOVERY - Host tools-uwsgi-01 is UP: PING OK - Packet loss = 0%, RTA = 1.18 ms
[06:22:36] (PS2) Yuvipanda: Add support for python uwsgi webservices [labs/toollabs] - https://gerrit.wikimedia.org/r/181418
[06:43:14] YuviPanda: is there anything in the log for that instance explaining why it might have shut down? Do you know what time it turned off?
[06:43:52] andrewbogott: shinken-wm’s timestamp is pretty accurate, I think.
[06:44:17] So it turned off just an hour ago when no one was doing anything, huh?
[06:44:17] andrewbogott: nothing in the logs *in* that instance tho
[06:44:20] andrewbogott: yup
[06:44:36] Are there a bunch of uwsgi nodes with similar load?
[06:44:54] Is this the same one that's been SHUTOFF before or is that happening to everything on virt1010?
[06:45:12] (sorry, I have to ask all the obvious questions while I think about this)
[06:45:21] andrewbogott: nope, this one is new. created yesterday.
[06:45:32] andrewbogott: another machine shut off like this a few days ago (wdq-mm-02, I think)
[06:45:52] dang
[06:45:58] yeah
[06:47:02] 2014-12-23 05:45:57.239 5932 WARNING nova.compute.manager [-] [instance: 6f8ead40-b3c9-4abe-86a1-407a410f9843] Instance shutdown by itself. Calling the stop API.
[06:47:15] 2014-12-23 05:45:56.922 5932 INFO nova.compute.manager [-] [instance: 6f8ead40-b3c9-4abe-86a1-407a410f9843] VM Stopped (Lifecycle Event)
[06:47:20] So nova thinks that it turned itself off
[06:47:28] hmmm
[06:47:37] lemme see if that's happening anywhere else
[06:47:54] It could be that… well, it's a long shot but maybe the latest Trusty image just does that
[06:48:02] that’s possible
[06:48:11] nothing in dmesg...
[06:50:33] I don't see that same log message anywhere about anything save that one instance
[06:50:47] andrewbogott: on the other virt hosts?
[06:50:55] I looked on 1003 and 1006 and 1011
[06:51:04] https://wikitech.wikimedia.org/wiki/Nova_Resource:I-000007a1.eqiad.wmflabs was on 11
[06:51:05] hmm
[06:51:10] I don't see that message for wdq-mm-02 either
[06:51:17] andrewbogott: if it happens again should I let it be for you to investigate?
[06:51:22] (before your holidays, that is)
[06:53:32] I don't think I'll be any better at investigating than you. Would be nice to create another similar image on a different (cisco) virt host and see if there's any difference
[06:54:42] YuviPanda: so, this is the shutdown and restart, right? https://dpaste.de/mN4j
[06:55:19] Release-Engineering, Wikimedia-Labs-wikitech-interface: add [[wikitech:Release Engineering/SAL]] to [[wikitech:mediawiki:sidebar]] - https://phabricator.wikimedia.org/T73165#941281 (jeremyb)
[06:55:42] andrewbogott: you mean in terms of timing?
[06:55:45] yeah
[06:55:55] And there's a 15-minute gap in the log, probably the downtime
[06:55:58] Does that look right to you?
[06:56:12] yeah, that does.
[06:56:18] wait, doing TZ math
[06:57:18] That instance says that right now it is Tue Dec 23 06:57:09 UTC 2014
[06:58:04] yeah, that matches up
[06:58:12] So… nothing useful in that log at all.
[06:58:38] We don't have metrics of any sort, do we? To see if it was running out of memory or handles or something?
[06:58:52] (again, I am asking the obvious questions because clueless)
[06:59:04] andrewbogott: we do for memory, let me get ‘em
[06:59:18] andrewbogott: not handles or anything, just memory
[06:59:26] ok, well, we might get lucky
[06:59:33] (brb)
[06:59:51] https://tools.wmflabs.org/nagf/?project=tools
[06:59:55] Wikimedia-Labs-General, MediaWiki-extensions-OpenStackManager: adding/removing project members updates project page, changing projectadmins does not - https://phabricator.wikimedia.org/T73164#941283 (jeremyb)
[07:00:02] andrewbogott: nothing about CPU or memory overusage
[07:02:49] dang
[07:03:16] So, Nova /thinks/ that it's off and is marking it as off. If nova is mistaken about it being off, it might be shutting it down itself
[07:03:31] But I don't really know how it decides if an instance is up or down. Presumably not via IP
[07:03:36] lemme look at the virt logs
[07:04:37] 2014-12-23 05:45:53.476+0000: 64000: error : virProcessGetAffinity:433 : cannot get CPU affinity of process 2000: No such process
[07:04:38] 2014-12-23 05:45:54.460+0000: 63995: error : qemuMonitorIO:656 : internal error: End of file from monitor
[07:05:38] that doesn’t sound so good. maybe the qemu process running it crashed?
[07:06:35] qemu-system-x86_64: /build/buildd/qemu-2.0.0+dfsg/kvm-all.c:984: kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
[07:06:35] 2014-12-23 05:45:54.460+0000: shutting down
[07:06:41] That's the qemu log
[07:06:57] I can't tell if that thing about kvm_irqchip_commit_routes is related or not, it doesn't have a timestamp
[07:07:30] brbbmf
[07:07:33] sigh
[07:11:04] andrewbogott: https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=1029743 has at least the same error message, might be for a different reason
[07:11:08] related to… hotplug?
[07:11:58] what is hotplug?
[07:12:18] I’ve no idea.
[07:12:28] hotplugging in devices while the system is still running?!
[07:12:35] might be a red herring
[07:14:48] legoktm: valhallasw`cloud why is there a webservice running for wikibugs?
[07:16:29] YuviPanda: here's all I know so far: https://wikitech.wikimedia.org/wiki/HPshutdowns
[07:16:36] I guess we'll add to that page if this happens more, and see if things repeat?
[07:17:04] andrewbogott: yeah, makes sense
[07:17:27] And now I'm going to lunch, which will ensure it happens again /right now/ :)
[07:17:56] andrewbogott: hah! to increase chances, let me also go take a shower :)
[07:28:30] (PS3) Yuvipanda: Add support for python uwsgi webservices [labs/toollabs] - https://gerrit.wikimedia.org/r/181418
[07:29:45] YuviPanda: confirmed that we're running the same qemu/libvirt packages on old and new virt boxes.
[07:30:06] andrewbogott: hmm, this used to happen on the old hosts but we presumed they were just OOM
[07:30:09] but maybe they weren’t?
[07:30:20] oh! I didn't know this was ever happening before :(
[07:30:22] Anyway -- lunch!
[07:38:49] valhallasw`cloud: btw, there’s a uwsgi.ini now in www/python on g-p-u
[07:45:53] So, as I understand from the recent blog about the migration to Phabricator, for code review WMF will migrate from Gerrit to Phabricator's Diffusion ?
[07:46:50] (PS4) Yuvipanda: Add support for python uwsgi webservices [labs/toollabs] - https://gerrit.wikimedia.org/r/181418
[07:47:06] DrSkyLizard: at some point in the future, possibly. there will be discussions and evaluations about it before then.
[08:35:36] Tool-Labs: Enable OpenJDK 8 - https://phabricator.wikimedia.org/T68171#941350 (yuvipanda) We have a trusty package thanks to T78267, but it might or might not be maintained depending on whether we end up using Titan or not. I shall -2 my patch until that is determined.
[09:03:00] (CR) Yuvipanda: [C: 2] Add support for python uwsgi webservices [labs/toollabs] - https://gerrit.wikimedia.org/r/181418 (owner: Yuvipanda)
[09:08:24] Tool-Labs: Support uwsg web servers directly on toollabs - https://phabricator.wikimedia.org/T85202#941357 (yuvipanda) NEW
[09:14:51] Tool-Labs: Support uwsgi web servers directly on toollabs - https://phabricator.wikimedia.org/T85202#941363 (yuvipanda)
[09:26:27] YuviPanda: redirect...
[09:59:02] valhallasw`cloud: hmm?
[10:00:01] YuviPanda: wikibugs' lighttpd provides a redirect
[10:00:22] valhallasw`cloud: oh? To what
[10:00:45] Wikitech, i think
[10:01:09] Some outdated doc page, probably :-P
[10:08:26] Heh
[10:50:17] Tool-Labs: Make http (404, 302, 301 etc) statistics for toolserver.org - https://phabricator.wikimedia.org/T85167#941434 (TTO) I suspect this is the reason: http://wiki.openstreetmap.org/wiki/Hike_%26_Bike_Map (the connection to `hikebike` is obvious, and `hill` is possibly short for hill-shading). Probably...
[11:52:30] Labs-Team, operations, Beta-Cluster: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#941500 (fgiunchedi) p:Triage>High looks like we're close to resolution for this?
[11:54:56] Labs-Team, operations, Beta-Cluster: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#941506 (yuvipanda) More like we have a bandaid that works :) 'Solution' I would say is have some way to make the kernel *not* do core dumps at all - bsd does this, but linux I can't find a...
[12:50:15] Coren: bigbrother doesn’t seem to work with -l release=trusty?
[12:50:18] for normal services...
[12:51:06] YuviPanda, are you trying to kill bigbrother again? :-p
[12:51:12] I think that should just work
[12:51:23] valhallasw`cloud: gerrit-to-redis, I’m trying to get it to run on trusty.
[12:51:29] YuviPanda: I see.
[12:51:31] valhallasw`cloud: keeps getting scheduled on exec-01
[12:51:35] :/
[12:51:42] so either jsub -l doesn’t work, or something else is wrong?
[12:52:30] YuviPanda: use jstart -l ...
[12:52:41] specifically
[12:52:50] jstart -N -l release=trusty
[12:52:58] aka 'rtfm': https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Bigbrother :-p
[12:53:24] valhallasw`cloud: wait
[12:53:33] -l release=trusty has to be after jobname?
[12:53:33] err
[12:53:37] that shouldn’t matter
[12:53:55] valhallasw`cloud: right, so jstart is what’s on bigbrotherrc.
[12:53:55] YuviPanda: except where bigbrother parses the of the command to find the job
[12:54:01] ewwwwwww
[12:54:27] YuviPanda: jstart -N lolrrit-wm -l release=trusty should be fine?
[12:54:46] valhallasw`cloud: err, you need to give it path of something too
[12:54:52] jstart -l release=trusty -N gerrit-to-redis ~/gerrit-to-redis/src/stream-receiver.bash is what I have now
[12:55:10] YuviPanda: check .bigbrotherrc.disabled...
[12:55:11] jstart -N lolrrit-wm -l release=trusty -mem 1G node /data/project/lolrrit-wm/lolrrit-wm/src/relay.js
[12:55:23] oh, other project :-p
[12:55:27] valhallasw`cloud: oh, I’m trying to move gerrit-to-redis
[12:55:39] can move lolrrit-wm after
[12:56:02] YuviPanda: anyway, -N first, rest later
[12:57:25] continuous@tools-exec-12.eqiad.wmflabs
[12:57:26] valhallasw`cloud: works now...
[12:57:28] ta-ta-taaaa
[12:57:35] thanks
[12:58:52] valhallasw`cloud: I suppose I should make lolrrit-wm move now :)
[13:02:36] valhallasw`cloud: migrated :)
[13:02:41] yay.
[13:08:14] YuviPanda: but I think you mean labs-morebots et al, not logmsgbot
[13:08:27] valhallasw`cloud: hmm, I’ve managed to kill lolrrit-wm
[13:09:00] *slow clap*
[13:09:29] it's here, though?
[13:10:23] YuviPanda: yeah, it's the other one that's broken
[13:10:30] gerrit-to-redis
[13:10:37] running, but paramiko is not connecting
[13:10:38] yeah
[13:10:43] import _io
[13:10:43] ImportError: No module named _io
[13:10:44] wat.
[13:10:56] valhallasw`cloud: ah, venv corruption
[13:11:17] ohhhhhh
[13:11:19] I'm guessing
[13:11:25] ubuntu changed the python search path
[13:11:33] recreating...
[13:21:35] valhallasw`cloud: is back
[13:21:39] valhallasw`cloud: I left comments
[13:23:56] I love comments!
[13:28:13] Labs-Team, operations, Beta-Cluster: Core dumps fill up /var on labs instances - https://phabricator.wikimedia.org/T1259#941548 (fgiunchedi) under linux it should be enough to set the process' rlimit_core to 0 to disable core dumps, unless I'm missing something?
[14:00:27] Labs-Team, operations: Setup per-instance configurability for labs in hiera - https://phabricator.wikimedia.org/T1356#941593 (fgiunchedi)
[14:14:57] Coren: hmm, we’ve someone vainly attempting to shut down tools-login with sudo shutdown -h now...
[14:15:42] Two questions: (a) what happens when you try and (b) why? :-)
[14:16:04] But also, I've had no problem accidentally rebooting it a few days ago, I don't see what could be the issue.
[14:16:08] Coren: nothing happens, of course, because they don’t have sudo. as to (b), I’ve no idea.
[14:16:23] Coren: root just gets a sudo failure email
[14:16:29] Coren: looks like a newish account
[14:16:47] Oh, *someone*. I misread "somehow"
[14:16:53] oh, yeah. not me :D
[14:17:04] YuviPanda: probably the typical 'I'm trying to shut down my computer without noticing I'm logged in to tools-login' :-p
[14:17:14] valhallasw`cloud: two tries so far… :)
[14:17:30] but yeah, if nothing else we’ll let it be I guess
[14:17:41] YuviPanda: What valhallasw said. That would have happened to me a couple days ago if I wasn't rootlike. :-)
[14:17:47] heh
[14:18:08] Coren: you know I’ve almost rm -rf / my laptop once...
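The rlimit_core suggestion relayed at [13:28:13] can be sketched from inside a process as follows. This is a minimal Python illustration, assuming a Linux host and the ordinary core-file case; the helper name `disable_core_dumps` is hypothetical and is not taken from the Phabricator task or from any labs configuration.

```python
import resource


def disable_core_dumps():
    """Set both the soft and hard RLIMIT_CORE of this process to 0.

    Lowering the hard limit does not need root, but it cannot be raised
    again afterwards by an unprivileged process. Child processes inherit
    the limit.
    """
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    print(disable_core_dumps())
```

Because the limit is inherited, a job wrapper could call this once before launching the real workload; whether that would suit the labs instances discussed here is not something the log settles.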
[14:18:15] YuviPanda: I repliieeeeeeed on your commeeeeeeents :>
[14:18:28] valhallasw`cloud: you diiid :) I’ll look in a minute
[14:18:29] YuviPanda: --really-delete-root-yes-please-I-do-mean-that
[14:18:43] valhallasw`cloud: YES EXCEPT THAT THAT DOES NOT FUCKING EXIST ON OS X WHICH HAS BSD RM
[14:18:44] sigh
[14:18:47] I don't think I know about that specific incident, but it's part of life as a sysadmin. :-)
[14:19:07] valhallasw`cloud: thankfully I interrupted it when I realized it hadn’t errored out immediately
[14:19:34] YuviPanda: brrr.
[14:19:41] YuviPanda: windows ftw! ;D
[14:19:45] heh
[14:19:59] I don't know /what/ / is in my windows bash, but it sure isn't c:/
[14:20:26] About 10 years ago I had the "pleasure" to learn that a (then) modern Linux system /can/ be recuperated (if with some effort) so long as /sbin remains intact but with /bin and /etc all gone.
[14:20:59] heh
[14:21:28] On a production, live system no less. The attempt to fix was made because the actual DB server that lived there was still running -- but its executables were gone so we couldn't stop-and-restart. Fun times. :-)
[16:06:51] (PS1) Merlijn van Deen: + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583
[16:07:05] (CR) jenkins-bot: [V: -1] + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583 (owner: Merlijn van Deen)
[16:07:21] YuviPanda: If you get a chance, qa-morebots is missing from #wikimedia-qa
[16:08:23] (PS2) Merlijn van Deen: + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583
[16:08:35] (CR) jenkins-bot: [V: -1] + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583 (owner: Merlijn van Deen)
[16:10:48] (PS3) Merlijn van Deen: + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583
[16:13:19] (PS1) Merlijn van Deen: Make sure to reset to origin/master, and show current sha1 before doing so [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181584
[16:13:35] Tool-Labs: Replicate the Phabricator database to labsdb - https://phabricator.wikimedia.org/T52422#941897 (Aklapper) >>! In T52422#941131, @Krenair wrote: >>>! In T52422#550927, @Aklapper wrote: >> Plus I don't see how to sanitize that data (legal). > > The same way we do it for MediaWiki databases? Feel fr...
[16:59:21] (PS1) Alexandros Kosiaris: Add passwords::openldap::corp [labs/private] - https://gerrit.wikimedia.org/r/181594
[17:06:58] (PS4) Merlijn van Deen: + taxonomy script [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/181583
[17:10:06] (CR) Alexandros Kosiaris: [C: 2 V: 2] Add passwords::openldap::corp [labs/private] - https://gerrit.wikimedia.org/r/181594 (owner: Alexandros Kosiaris)
[17:29:54] Labs-Team, operations: labs VMs install with outdated packages - https://phabricator.wikimedia.org/T85024#941980 (fgiunchedi) yep golden master, https://wikitech.wikimedia.org/wiki/OpenStack#Building_new_images
[18:01:44] YuviPanda: wikibugs has a redirect to mw.o
[18:04:18] legoktm: https://github.com/gawel/irc3/commit/da5a525b029a3e852cd5d11cef2afed55a18e983
[18:10:25] valhallasw`cloud: nice!
[18:10:43] valhallasw`cloud: what about \r?
[18:10:49] legoktm: just submitted a PR for that
[18:10:56] ok
[19:20:17] Labs-Team, operations: labs VMs install with outdated packages - https://phabricator.wikimedia.org/T85024#942165 (coren) Open>declined a:coren Because of the way OpenStack manages images, making a new one obsoletes the previous ones and makes recovery in case of failure more difficult (if not impossi...
[19:31:15] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#942187 (coren) >>! In T74253#852391, @LuisV_WMF wrote: > is there a reason why these have to reside on labs? Data access? Something else? I think only the OP can answer that definitively. @Inkowik?
[19:32:21] Tool-Labs: Make bigbrother work with webservice2 - https://phabricator.wikimedia.org/T76162#942188 (coren) Open>Resolved Bigbrother now knows to invoke webservice2 as appropriate.
[19:36:41] Tool-Labs, Wikidata: Lost connection to MariaDB server during query - https://phabricator.wikimedia.org/T76699#942200 (coren) a:coren>Springle I see the problem occurring intermittently, but I am unable to reproduce it myself. @springle: do you have insight on what could be going on / has changed?
[19:38:38] Tool-Labs: Webservice restarts should be faster - https://phabricator.wikimedia.org/T85010#942206 (coren) p:Triage>Low
[19:40:57] Tool-Labs: Tools labs exec host issue - https://phabricator.wikimedia.org/T78010#942210 (coren) p:Triage>Normal Aborts are normally failed assertions in running code. In order to diagnose it, we'd need (a) any error messages and (b) a backtrace from the core dump to show where the abort() took place.
[19:41:49] Tool-Labs: PHP abort()s during execution of a script on an exec node - https://phabricator.wikimedia.org/T78010#942213 (coren)
[19:42:12] Tool-Labs: program created by proprietary compiler allowed on labs? - https://phabricator.wikimedia.org/T74253#942216 (coren) p:Triage>Low
[19:42:30] Tool-Labs: Dumps not updating again. - https://phabricator.wikimedia.org/T74154#942218 (coren) p:Triage>Normal
[19:42:42] Labs-Team, Tool-Labs: Setup postgres credentials for all tool users - https://phabricator.wikimedia.org/T76167#942219 (coren) p:Triage>Low
[19:43:54] Tool-Labs: Wikimedia Labs system admin (sysadmin) documentation sucks - https://phabricator.wikimedia.org/T57946#942223 (coren) p:High>Normal
[19:45:43] Tool-Labs: Transfer domain toolserver.org to WMF - https://phabricator.wikimedia.org/T62864#942229 (coren) p:Triage>Normal
[19:47:22] Tool-Labs: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#942237 (coren)
[19:47:24] Tool-Labs: Transfer domain toolserver.org to WMF - https://phabricator.wikimedia.org/T62864#942235 (coren) Open>stalled
[19:47:25] Tool-Labs: Set up redirect webserver for toolserver.org - https://phabricator.wikimedia.org/T62238#942236 (coren)
[19:52:42] Tool-Labs: Moving toolserver domain, mail and redirects - https://phabricator.wikimedia.org/T68113#942249 (coren) Open>Resolved This is complete. There are a few kinks leftover but those are tracked separately.
[19:52:43] Tool-Labs: Toolserver migration to Tools (tracking) - https://phabricator.wikimedia.org/T60788#942251 (coren)
[20:35:54] Tool-Labs: Replicate the Phabricator database to labsdb - https://phabricator.wikimedia.org/T52422#942309 (RobLa-WMF) >>! In T52422#941897, @Aklapper wrote: > Feel free to provide links to how it's done for MW DBs if it can be done "the same way". A couple of code links: * https://github.com/wikimedia/operat...
[20:46:04] Tool-Labs: watchlist table not available on labs - https://phabricator.wikimedia.org/T59617#942314 (coren) One of the issues is the exposure of tables that have few or no watchers, providing valuable vectors for vandalism attacks. I'm pretty sure that @luisv_WMF would be entirely okay with a version that pro...
[21:46:15] anyone around?
[21:49:10] Im getting an SSL error trying to do an SVN update :/
[21:56:49] Betacommand: Did the error say "dude! are you still using svn?" ;)
[21:57:21] bd808: I find it a hell of a lot better than GIT
[21:58:23] Betacommand: svn update of what? :-p
[21:58:34] bd808: GIT stands for Going Insane and To hell
[21:58:55] valhallasw`cloud: A private repo of my code
[21:59:21] Betacommand: If you want help, it helps if you provide relevant information.
[21:59:55] where is it located?
[22:00:09] what kind of clone did you use?
[22:00:25] or whatever that was called in the svn-sphere
[22:00:27] valhallasw`cloud: http://pastebin.com/gA95KvDp
[22:00:41] valhallasw`cloud: its been working for quite a while
[22:01:41] site had an outage recently and Im trying to update my code to the most recent \
[22:02:04] mmm.
[22:02:06] curl seems to work
[22:02:20] committed the changes on my end no issues
[22:02:37] went to update on labs and got an SSL error
[22:03:40] I'm not sure how to make svn show requests
[22:04:26] but it works on my local ubuntu
[22:04:36] weeiiiirrrd
[22:06:37] thats why Im here
[22:07:16] svn --config-option servers:global:neon-debug-mask=511 checkout https://riouxsvn.com/svn/tspywiki gives some more info, but nothing relevant
[22:08:32] * Server certificate:
[22:08:33] * subject: OU=Domain Control Validated; OU=PositiveSSL Multi-Domain; CN=sni66364.cloudflaressl.com
[22:08:36] they're doing SNI
[22:08:41] I'm guessing SVN doesn't like that
[22:08:44] I bet it has something to do with svn being v1.6.17 on tools-dev.
[22:08:56] yeah
[22:09:06] libneon is probably similarly ancient (Ubuntu LTS)
[22:09:14] try the 14.04 login host
[22:09:37] which is something with trusty and tools :-p
[22:09:41] tools-trusty
[22:10:01] yep, no issues there
[22:10:13] that's v1.8.8
[22:10:17] Betacommand: ^
[22:10:55] let me see if I can connect
[22:12:34] valhallasw`cloud: whats the remote connection host name?
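The SNI guess above rests on how the TLS handshake works: the client names the host it wants in its very first message, and a multi-domain front end uses that name to pick a certificate. The following is a minimal offline Python sketch of that first message, not a diagnosis of the svn 1.6.17 failure; the helper name `client_hello_bytes` is hypothetical, and the hostname is the one quoted in the log. No network connection is made.

```python
import ssl


def client_hello_bytes(hostname):
    """Return the raw bytes a TLS client would send first for `hostname`.

    The handshake runs against in-memory buffers, so it stops as soon as
    the client has written its ClientHello and is waiting for a reply.
    """
    context = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = context.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: there is no server on the other end
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("riouxsvn.com")
    # The server_name (SNI) extension carries the hostname in clear text.
    print(b"riouxsvn.com" in hello)
```

A client library that omits `server_hostname` sends no such extension, which is one plausible reason an older stack could be handed a certificate it does not expect.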
[22:12:45] Betacommand: tools-trusty.wmflabs.org, I think
[22:13:22] Host does not exist
[22:13:26] on a sidenote, why are you checking out a private repo on tool labs?
[22:13:31] then ssh internally to tools-trusty
[22:14:14] trusty.tools.wmflabs.org is the external name, apparently
[22:16:15] valhallasw`cloud: Its my personal code, Im the only one with access to the repo. Makes things easier
[22:17:08] Betacommand: dunno, there's also free public svn hosting I think? :P it's supposed to be open source, after all
[22:17:41] valhallasw`cloud: I dont want to have to deal with leaking personal info in a public repo
[22:18:07] then don't put it in =p
[22:19:41] valhallasw`cloud: Its a pain in the ass sometimes to deal with stuff both in repo and not, and managing private info needed in scripts
[22:19:55] I suppose.
[22:20:55] valhallasw`cloud: I can think of several remote APIs that I use that require authing (google books is the largest)
[22:21:18] Betacommand: right, so one repo with passwords and one repo with code?
[22:21:27] that would be the somewhat standard solution, at least
[22:21:50] valhallasw`cloud: Its a pain in the ass, Better solution, one repo
[22:22:15] If people want my code they can talk to me
[22:22:48] that's not really in the spirit of open source, but whatever
[22:26:07] valhallasw`cloud: It may not be. But I dont hand guns to 5 year olds just because they have a right to bear arms
[22:26:31] that's... an insane comparison
[22:28:26] valhallasw`cloud: given some of the code I have sitting around, its not
[22:29:54] Betacommand: Dude, anyone can fire up pywikipedia (or whichever framework of choice) and cause tons of damage without needing your recipes for it. :-)
[22:33:17] testing
[22:33:21] mmmm.
[22:33:43] testing
[22:34:11] last test
[22:34:34] * valhallasw`cloud hands wm-bot a prize for not being affected by IRC injection
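IRC protocol messages end with CR-LF, so a bot that relays untrusted text containing `\r` or `\n` can be tricked into sending extra commands; that is the injection being tested above and the subject of the earlier `\r` question. A minimal Python sketch of the usual guard follows. The function names are hypothetical, and this is neither the irc3 patch linked in the log nor wm-bot's code.

```python
def sanitize_irc_text(text):
    """Replace CR, LF and NUL so the text cannot end the message early."""
    for forbidden in ("\r", "\n", "\x00"):
        text = text.replace(forbidden, " ")
    return text


def privmsg(target, text):
    """Build a single PRIVMSG line with exactly one terminating CR-LF."""
    return "PRIVMSG {} :{}\r\n".format(target, sanitize_irc_text(text))


if __name__ == "__main__":
    # Without sanitizing, the second half would be sent as its own command.
    print(repr(privmsg("#wikimedia-labs", "testing\r\nQUIT :injected")))
```

Replacing the characters with spaces keeps the relayed text readable; rejecting the message outright would be an equally reasonable choice.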