[01:32:09] where be yuvi?
[01:32:28] ugh Sundays are always slow
[04:33:40] Labs, Database: Missing record in replica - https://phabricator.wikimedia.org/T89689#1543937 (Krenair)
[04:47:41] Labs, Database: Missing record in replica - https://phabricator.wikimedia.org/T89689#1543940 (Krenair) Given example shows up for me now, however others are missing
[04:57:32] Labs, Database: Missing record in replica - https://phabricator.wikimedia.org/T89689#1543941 (Krenair) ```krenair@tools-bastion-01:~$ mysql --defaults-file=replica.my.cnf -h labsdb1001.eqiad.wmnet -s -e "select count(*) from revision left join page on rev_page = page_id where page_id is null" plwikisource_...
[05:02:21] Labs, Labs-Infrastructure: Replica MySQL: Wiki ViewStats databases completely missing! - https://phabricator.wikimedia.org/T73043#1543949 (Krenair) Open>Resolved It sounds like you resolved this and the user never confirmed. They're unlikely to confirm now given the dates and the fact that they don...
[07:53:27] Negative24: on vacatiob
[08:18:37] Could you help me set up my tool for Python 3? I created a virtualenv (virtualenv -p /usr/bin/python3 venv) and activated it (source venv/bin/activate). It seems that running "python myscript.py" works just fine, but when I submitted a job to qsub, I got the following error: "/data/project/alkamidbot/scripts/venv/bin/python: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.17' not found (required by /data/project/alkamidbot/scripts/v
[08:18:38] env/bin/python)"
[08:19:20] the qsub command was "qsub -N 'afterDump.py' -b y -l h_rt=5:00:00 -l h_vmem=2500M -e $HOME/output/ -o $HOME/output/ -wd $HOME/scripts $HOME/scripts/venv/bin/python $HOME/scripts/afterDump.py >/dev/null"
[08:23:58] Labs, operations, wikitech.wikimedia.org: intermittent nutcracker failures - https://phabricator.wikimedia.org/T105131#1544092 (fgiunchedi)
[08:25:53] ^ valhallasw`cloud? You seem to know these things well (:
[08:28:13] alkamid: add -l release=trusty to the qsub?
[08:28:59] I think it might schedule your job on a precise host, which doesn't have python 3.4 and dependencies installed
[08:31:53] valhallasw`cloud, makes sense, but then I get a PermissionError: https://dpaste.de/Dnsf
[08:32:08] when trying to read dumps
[08:32:40] but it might be as well due to wrong conversion from 2to3... I don't know, what does it look like to you?
[08:36:08] valhallasw`cloud, no, it must be the host - when I run the script on tools-bastion-01, I don't get this error
[09:14:03] alkamid: yes. That's what -release changes
[09:14:11] Not sure about the permission error
[09:15:04] valhallasw`cloud, is there a simple test that I could carry out (about permissions)?
[09:20:51] I checked locally and the script works fine
[09:21:00] so it must be on the server's side
[09:33:35] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1544271 (Alkamid) NEW
[09:50:32] Alkamid: please file a bug
[13:05:57] valhallasw`cloud: around?
[13:06:04] addshore: sí
[13:06:09] who am I kidding, you're always here
[13:06:30] Is there anything special one needs to do when trying to run python2 stuff in a virtualenv when using jsub?
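A minimal dry-run sketch of the advice given at 08:28:13, assuming the fix is simply to add `-l release=trusty` to the original qsub call. `build_qsub_cmd` is a hypothetical helper, not part of any grid tooling; it only prints the command line so the flag placement can be checked before submitting anything.

```shell
#!/bin/sh
# Hypothetical helper: prints a qsub command line, does NOT submit a job.
build_qsub_cmd() {
    release="$1"   # grid "release" resource, e.g. trusty or precise
    shift
    # -l release=... asks for an exec host running the same distribution
    # the virtualenv was created on (the suggestion made in the chat).
    printf 'qsub -l release=%s' "$release"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Paths are single-quoted so $HOME is printed literally in this dry run.
build_qsub_cmd trusty -N afterDump -b y -wd '$HOME/scripts' \
    '$HOME/scripts/venv/bin/python' '$HOME/scripts/afterDump.py'
```

The printed line can be compared with the command quoted at 08:19:20; the only intended difference is the added release request.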
[13:06:49] addshore: -l release=trusty
[13:06:57] virtualenvs don't like being run on a different host python version
[13:06:57] :/
[13:07:07] ohhhhhh, that could be it
[13:07:10] maybe :p
[13:07:53] not entirely sure how we can intercept that error to clarify =p
[13:08:03] :D
[13:11:12] Labs, operations, wikitech.wikimedia.org: Figure out what to do about maintenance scripts on silver/wikitech - https://phabricator.wikimedia.org/T107547#1544831 (Krenair)
[13:27:04] I can't even find the right import to trigger the issue >_<
[13:50:28] Labs, Wikimedia-Labs-General, Database: Replication behind or missing records - https://phabricator.wikimedia.org/T74908#1545165 (Krenair)
[13:52:35] valhallasw`cloud: is that a weird mix of job and vacation?
[13:55:38] Labs, Wikimedia-Labs-General, Database: Replication behind or missing records - https://phabricator.wikimedia.org/T74908#1545207 (Krenair) Open>Resolved It looks like this got resolved at some point because all the given example rows show up and `select count(*) from user` returns the same regard...
[14:14:13] Labs, Wikimedia-Labs-General, Database: Replication behind or missing records - https://phabricator.wikimedia.org/T74908#1545311 (Milimetric) thanks @Krenair, hopefully the underlying issues that caused the missing data are fixed as well.
[14:50:07] !log tools killing remaining jobs on tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01
[14:50:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[14:52:39] andrewbogott: did you see https://phabricator.wikimedia.org/T99027#1543296 ?
[14:54:32] valhallasw`cloud: seeing it now...
[14:55:06] It’s quite possible that I didn’t rearrange the queues correctly, although it seems weird that I would’ve messed up only one of them
[14:55:44] valhallasw`cloud: does that have any implication for the reboot I’m about to do?
[14:56:10] The only thing I can think of is to maybe do a check for still-running continuous jobs/
[14:56:26] i.e. checking if they were rescheduled correctly
[14:57:04] I could run ‘killjobs.sh’ a second time and see if it finds anything to kill :)
[14:57:24] this still returns a whole set of jobs...: qhost -j -h tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01
[14:58:06] ok, will you try to repeat my work? I’ll make sure you can exec /home/andrew/killjobs.sh
[14:58:25] I ran it on the bastion, that’s right isn’t it?
[14:58:31] Since it has to be on a submit host...
[14:58:34] *nod*
[14:59:26] ok, queues were already disabled
[14:59:54] yeah, that’s the part that I feel most confident about
[15:00:09] ahhhh
[15:00:26] I can't read your killjobs.sh
[15:00:46] but my version in /home/valhallasw/restarttools has a bug -- it refers to $HOST rather than $@
[15:00:53] really? I just chmod a+r killjobs.sh
[15:01:19] no +x (or is it +r) on /home/andrew
[15:01:29] ah, sure. Um… one moment
[15:02:03] now can you cat /home/andrew/killjobs/killjobs.sh ?
[15:02:07] Ah, yeah, I see that I have the same mistake
[15:02:09] fixing
[15:02:25] ... I can of course just sudo cat it
[15:02:38] ok, look right now?
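The `$HOST` versus `$@` mix-up mentioned at 15:00:46 can be illustrated with a small sketch. Neither script is shown in the log, so the two functions below are a hypothetical reconstruction of the pattern, not the actual contents of killjobs.sh; the snippet unsets `HOST` so the behaviour is the same wherever it runs.

```shell
#!/bin/sh
# Hypothetical reconstruction; the real scripts are not shown in the log.
unset HOST   # nothing in this sketch ever sets HOST

# Buggy pattern: loops over $HOST, which is empty, so the body never runs
# and the host names given on the command line are silently ignored.
hosts_buggy() {
    for h in $HOST; do
        printf '%s\n' "$h"
    done
}

# Fixed pattern: loops over the positional parameters.
hosts_fixed() {
    for h in "$@"; do
        printf '%s\n' "$h"
    done
}

hosts_buggy tools-exec-1211 tools-exec-1212   # prints nothing
hosts_fixed tools-exec-1211 tools-exec-1212   # prints both host names
```

A script with the buggy pattern exits cleanly without touching any host, which would fit the jobs still being listed by `qhost -j` afterwards.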
[15:02:57] no, /home/andrew/ is still 700
[15:03:07] anyway, doesn't matter, it's probably that $@ issue :-)
[15:03:26] well, um… maybe sudo then :)
[15:04:46] ok, I now force-rescheduled a whole batch of jobs
[15:04:53] with sudo qmod -rj $(qhost -j -h $HOSTS | sed -e 's/^\s*//' | cut -d ' ' -f 1|egrep ^[0-9])
[15:05:07] and qhost -j -h $HOSTS is now clean
[15:05:17] ok, so… can I reboot now? :)
[15:05:20] yep!
[15:06:18] Labs, Tool-Labs: Jobs Disappearing from SGE - https://phabricator.wikimedia.org/T99027#1545581 (valhallasw) Open>Resolved After discussing, we figured out it was a bug in our management scripts. This caused jobs to not be rescheduled, so they then died during the reboot. We have now fixed the script...
[15:18:32] hi, it looks like jobs may not be working properly on tool labs? when I try `qstat` I see: "error: unable to contact qmaster using port 6444 on host "tools-master.tools.eqiad.wmflabs""
[15:18:50] I think valhallasw`cloud and andrewbogott are working on things
[15:19:13] MusikAnimal: yeah, tools-master is being rebooted. Should be back up in a few minutes.
[15:19:21] cool, thanks!
[15:20:46] error: unable to contact qmaster using port 6444 on host "tools-master.tools.eqiad.wmflabs"
[15:20:47] o_O
[15:21:14] Steinsplitter: tools-master is being restarted
[15:21:15] known, people working on it
[15:21:24] ah, thx
[15:22:10] another question, sorry to pester: still trying to get my Ruby tool up and running. I think it needs to run on trusty, and requires a Ruby-specific webserver. Anyone know how the portgrabber works?
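The job-ID extraction inside the 15:04:53 one-liner can be tried without a grid by feeding it text on stdin. The sketch below keeps the same three-stage pipeline; the sample input is made up and only roughly shaped like `qhost -j` output (job 680001 and `examplejob` are invented), so treat it as an illustration of the filtering, not of the real output format.

```shell
#!/bin/sh
# extract_job_ids: read qhost-style text on stdin, print one job ID per line.
extract_job_ids() {
    # 1. strip leading whitespace, 2. keep the first space-separated field,
    # 3. keep only fields that start with a digit (the job IDs).
    sed -e 's/^[[:space:]]*//' | cut -d ' ' -f 1 | grep -E '^[0-9]'
}

# Made-up sample input for illustration only.
extract_job_ids <<'EOF'
HOSTNAME         ARCH
tools-exec-1211  lx26-amd64
   job-ID  prior   name
   ----------------------
   679417 0.30000 defconbot
   680001 0.25000 examplejob
EOF
```

Two small departures from the original command: the pattern is quoted so the shell cannot expand `^[0-9]` as a filename glob, and `[[:space:]]` replaces `\s`, which is a GNU sed extension.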
[15:22:54] someone said portgrabber is flaky and needs to have things exported and what not in the program it runs, so my most recent attempt was this: http://pastebin.com/U357uE8Y
[15:23:16] then I tried to start the app with `jstart -l release=trusty -q webgrid-generic ./httpserver.sh -mem 500m`
[15:23:28] (the mem is just for safe measure, shouldn't need nearly that much)
[15:24:01] the webservice starts but the Ruby server, Unicorn, does not. httpserver.err is some python output that makes little sense to me
[15:33:37] !log tools re-enabling the queue on tools-exec-1211 tools-exec-1212 tools-exec-1215 tools-exec-1403 tools-exec-1406 tools-master tools-shadow tools-webgrid-generic-1402 tools-webgrid-lighttpd-1203 tools-webgrid-lighttpd-1208 tools-webgrid-lighttpd-1403 tools-webgrid-lighttpd-1404 tools-webproxy-01
[15:33:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[15:35:29] coren, valhallasw`cloud: the grid engine master doesn’t seem to have come up properly… any ideas?
[15:36:19] andrewbogott: service gridengine-master start? I have no clue otherwise ... :/
[15:37:24] valhallasw`cloud: that was it!
[15:37:34] >_<
[15:38:37] # Default-Start: 2 3 4 5 < so it's supposed to start by itself
[15:38:47] but maybe nfs wasn't up yet at that point?
[15:40:18] or maybe the issue is that we have an init.d script rather than an upstart script?
[15:41:19] dunno, I’ll make a bug.
[15:41:26] I wonder what’s supposed to be running on tools-shadow?
[15:41:50] the same, I think
[15:42:34] Labs, Tool-Labs: sge master not starting up on tools-master - https://phabricator.wikimedia.org/T109316#1545709 (Andrew) NEW a:coren
[15:44:20] valhallasw`cloud: yeah, either it needs to be running or it needs to not be running, I’m not sure which :)
[15:47:39] Coren: when you appear, please advise about what needs to happen on tools-shadow?
[15:53:27] Labs, Datasets-Archiving, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: [Bug] Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1545801 (Lydia_Pintscher)
[15:57:01] Labs, Tool-Labs: Unable to boot Ruby tool that runs on Unicorn server - https://phabricator.wikimedia.org/T109322#1545831 (MusikAnimal) NEW
[15:57:34] ^ if anyone has time, I'll be your best friend :)
[16:01:49] Labs, operations, Labs-Sprint-107, Labs-Sprint-108, and 3 others: Investigate kernel issues on labvirt** hosts - https://phabricator.wikimedia.org/T99738#1545865 (Andrew)
[16:05:57] MusikAnimal: I don’t know that I’ll be able to fix the issue, but I’m confused about the ‘unicorn’ part
[16:06:12] You’re using the ‘unicorn’ server in the ‘design’ project?
[16:06:30] design project? that I'm not familiar with
[16:06:47] ok, tell me where ‘unicorn’ is then please?
[16:06:57] I just need unicorn or a similar Ruby server because the app is interpreted, not statically served. E.g. there is no "index.html"
[16:06:59] it's a Ruby gem
[16:07:17] it lives at `/data/project/musikanimal/.gem/ruby/2.2.0/bin/unicorn`
[16:07:27] ah, unicorn is a piece of software
[16:07:29] installed with `gem install --user-install`
[16:07:33] yes, sorry, I will clarify
[16:07:50] In that case I don’t know anything :) ‘unicorn’ is also the name of an actual, um, server (like, a labs instance) used by some people here
[16:07:57] but clearly you’re not doing anything with that.
[16:08:29] maybe, it's a "rack HTTP server for fast clients and Unix", as far as I know, Ruby-specific
[16:09:11] oh you said "labs instance"
[16:09:16] yes so definitely not the same thing
[16:11:54] Labs, Tool-Labs: Unable to boot Ruby tool that runs on Unicorn server - https://phabricator.wikimedia.org/T109322#1545963 (MusikAnimal)
[16:17:43] !log tools disable queues for tools-exec-1205 tools-exec-1207 tools-exec-1208 tools-exec-140 tools-exec-1404 tools-exec-1409 tools-exec-1410 tools-exec-catscan tools-web-static-01 tools-webgrid-lighttpd-1201 tools-webgrid-lighttpd-1205 tools-webgrid lighttpd-1206 tools-webgrid-lighttpd-1406 tools-webproxy-02
[16:17:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, dummy
[16:18:32] Labs, Tool-Labs: Unable to boot Ruby app on tool labs - https://phabricator.wikimedia.org/T109322#1546007 (MusikAnimal)
[16:18:34] ^ is there a typo?
[16:19:42] my search-fu is weak today...where is the wikitech page that describes ProxyPass and setting up *.eqiad.wmflabs for ssh :
[16:20:14] Hey, remember that time NFS was totally fucked recently? What timespan was that? I need to know to diagnose a potentially related issue.
[16:20:54] hare: It was for about 2.5 hours.
[16:21:15] No, I'm talking about the catastrophic failure where we had to restore from a nine-day-old backup.
[16:21:30] oh, that one :)
[16:21:35] Let me find the incident report
[16:22:10] It's entirely possible my bot hasn't been running because I have an outdated password file.
[16:22:17] I don't sync my password file to GitHub for obvious reasons. ;)
[16:22:32] But if the password file is out of date, i.e., from before I added my second bot in there...
[16:22:38] hare: I believe it’s this one: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150617-LabsNFSOutage
[16:23:14] Okay, so June 8 to June 17 is a gap in the records. Now, let me correlate this to a special:contributions page...
[16:23:35] No, that wouldn't explain it.
[16:25:00] Oh well. Thank you!
[16:25:53] sure
[16:26:18] hare: the whole history of outages is https://wikitech.wikimedia.org/wiki/Incident_documentation
[16:26:37] but of course that includes lots of things you don’t care about
[16:42:05] ebernhardson: https://wikitech.wikimedia.org/wiki/Help:Access
[16:43:18] valhallasw`cloud: yup found it, realized ProxyPass was not the term i'm looking for, that is an apache thing not ssh :)
[17:21:49] legoktm: again about legobot. how did you install the crontab module? pip doesn't work on tools
[17:23:11] Negative24: I don't think I ever ran it. Also, use a virtualenv!
[17:23:32] legoktm: oh yes virtualenv. didn't get to that yet :\
[17:23:58] too busy setting up everything
[17:56:25] Labs, Tool-Labs: Unable to boot Ruby app on tool labs - https://phabricator.wikimedia.org/T109322#1546633 (scfc) (You cannot use a shell function (`launch`) as the program that `portgrabber` will execute. It needs to be a script/executable.) Neither `~tools.musikanimal/.rbenv/bin` nor `~tools.musikanima...
[17:57:46] Labs, Tool-Labs: sge master not starting up on tools-master - https://phabricator.wikimedia.org/T109316#1546643 (scfc)
[17:57:48] Labs, Tool-Labs: Puppetize gridengine master configuration - https://phabricator.wikimedia.org/T95747#1546642 (scfc)
[18:09:57] valhallasw`cloud: \o/ Could you work out the jobs that were affected, and if so, would it be worth emailing the tools noting thus?
[18:15:22] a930913: hm. maybe.
[18:22:25] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1546735 (Alkamid) Hang on, I didn't change the scripts, but now I cannot replicate this error. Maybe it should hang around here for a bit and if it doesn't happen again for a week or so, this should be...
[18:38:41] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1546788 (scfc) (Unrelated #1: If you directly use `qsub`, please add the arguments `-q task` so that the job is not executed on the `tools-webgrid-*` nodes.) (Unrelated #2: You use `-o`/`-e` with a dir...
[18:41:54] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1546800 (Alkamid)
[18:44:38] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1546802 (scfc) On `tools-exec-1410`, `file /public/dumps/public/plwiktionary/20150806/plwiktionary-20150806-pages-articles.xml.bz2` as `scfc` gave: ``` /public/dumps/public/plwiktionary/20150806/plwikt...
[18:47:19] Could someone elaborate on scfc's comment on Phabricator: "If you directly use qsub, please add the arguments -q task so that the job is not executed on the tools-webgrid-* nodes."? Sending a link explaining what difference it would make would be appreciated
[18:47:31] Labs, Tool-Labs: Unable to boot Ruby app on tool labs - https://phabricator.wikimedia.org/T109322#1546803 (MusikAnimal) @scfc thanks for the reply! I see now that a shell script does not, as I tried logging to file only to see it was never called. With musikbot, `jsub` from the cronjob was ran as the cur...
[19:06:09] Labs, Tool-Labs: cdnjs-packages-gen fails when Puppet is run interactively - https://phabricator.wikimedia.org/T109355#1546911 (scfc) NEW
[19:06:57] alkamid: as a parameter to your job, add -q task
[19:07:01] parameter to qsub
[19:07:13] because otherwise the job can be scheduled on other queues with unintended consequences
[19:07:46] (e.g. webgrid nodes don't have all packages installed, and jobs on continuous queues may be restarted without warning)
[19:08:08] qsub is not documented on purpose, as we expect users to use the more friendly front-end jsub instead
[19:08:58] Labs, Tool-Labs: PermissionError when trying to read a dump - https://phabricator.wikimedia.org/T109261#1546925 (scfc) Open>Resolved a:scfc I remounted the dumps directory on all of those hosts. I don't see a particular pattern, so I assume it was just a fluke. If you encounter this again, plea...
[19:28:48] i'm waiting forever for my webservice to start, are the queues full or something?
[19:31:46] andrewbogott: it seems the same issue happened again today, but on tools-exec-1212. Did we miss that one? :/
[19:33:34] ok, works with --release precise
[19:35:34] valhallasw`cloud: lemme look in my bash history
[19:35:54] it's in the list in the !log so I don't think we missed it :/
[19:36:15] valhallasw`cloud: yeah, I definitely explicitly disabled its queue
[19:36:45] So that leaves us with killjobs still not doing a proper job of killing things?
[19:37:06] the timing also seems to coincide with the qmod -rj rather than the reboot
[19:37:45] so basically it kills the job and then reschedules itself right back on the same node?
[19:37:51] Or was it something different this time?
[19:38:24] the timing suggests it's actually the rescheduling itself rather than the reboot
[19:38:43] timing => 2015-08-17 15:04:58
[19:39:15] isn’t 15:04 right when I rebooted it?
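The `-q task` advice from 19:06:57 to 19:07:46 can be sketched as a small argument filter. `with_task_queue` is a hypothetical helper invented for this illustration; it only prints an argument list and assumes that a queue is always selected with a separate `-q` argument.

```shell
#!/bin/sh
# Hypothetical helper: prints qsub arguments, adding "-q task" only when the
# caller has not already chosen a queue with -q. Nothing is submitted.
with_task_queue() {
    for arg in "$@"; do
        if [ "$arg" = "-q" ]; then
            # a queue was given explicitly, pass everything through unchanged
            printf '%s\n' "$*"
            return 0
        fi
    done
    # no queue given: default to the task queue, as suggested in the chat
    printf '%s %s\n' '-q task' "$*"
}

with_task_queue -N afterDump -b y /bin/true
with_task_queue -q continuous -N mybot -b y /bin/true
```

The first call gains `-q task`; the second is left alone because it already names a queue. As noted at 19:08:08, jsub is the expected front end, so a wrapper like this would only matter for direct qsub use.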
[19:39:39] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1547071 (valhallasw) NEW
[19:40:20] I'm not sure how good ircclouds clock is, but I think you rebooted it maybe one or two mins later
[19:40:23] ^ details there
[19:43:29] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1547097 (valhallasw) Affected tasks the 13th (T99027): ``` 2015-08-13 15:01:18 tools-exec-1214.eqiad.wmflabs tools.defconbot defconbot 679417 100 2015-08-13 15:01:18 tools-exec-1214.eqia...
[19:44:35] valhallasw`cloud: I’m not sure what to do other than watch closely tomorrow. Are you available then?
[19:44:47] Yes, I should be
[19:45:16] maybe we should do a staggered restart after all?
[19:45:37] ...there's one more thing I can think of
[19:45:49] if there's no queue available the job might not restart?
[19:45:55] but no, then it should just wait
[19:49:03] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1547145 (valhallasw)
[19:51:57] andrewbogott: I just realized you rebooted hosts on friday without rescheduling because the script didn't work on tools-master
[19:52:11] eh, thursday
[19:52:17] yeah, thursday
[20:03:49] Labs, Tool-Labs: continuous jobs killed during restart despite rescheduling - https://phabricator.wikimedia.org/T109362#1547178 (valhallasw) The list of jobs on the 13th was caused because servers were rebooted without rescheduling jobs. I'm not sure why not all jobs died, though -- apparently SGE did a f...
[20:03:55] it's definitely the qmod -rj that already causes this
[20:04:28] andrewbogott: ok, I'm going to send an e-mail to labs-l with possibly affected jobs, and then I think we should try a staggered approach tomorrow
[20:04:55] valhallasw`cloud: can you tell me what you mean by ‘a staggered approach’?
[20:05:17] andrewbogott: rescheduling a few jobs at a time (or maybe a host at a time) instead of all jobs at the same time
[20:05:41] why would that make a difference?
[20:06:08] the only reason I can think of at the moment is that the master was overwhelmed by the amount of rescheduling
[20:06:18] even though I think rescheduling a hundred jobs should be perfectly fine
[20:06:31] ....but it's SGE, so who knows, maybe it concatenates everything somewhere
[20:06:32] * andrewbogott nods
[20:06:32] ok
[20:15:55] a930913: done. thanks for the suggestion
[20:29:58] Labs, Labs-Sprint-109, Patch-For-Review: Remove reliance on ldap $::projectid from shinkengen - https://phabricator.wikimedia.org/T108625#1547283 (Andrew)
[20:29:59] Labs, Labs-Sprint-109, Patch-For-Review: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1547282 (Andrew)
[20:30:12] Labs, Patch-For-Review: Move to a new dns scheme for labs: hostname.projectname.eqiad.wmflabs - https://phabricator.wikimedia.org/T93087#1547286 (Andrew)
[20:30:13] Labs, Labs-Sprint-109, Patch-For-Review: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1547285 (Andrew) Open>Resolved
[20:31:08] Labs, Labs-Sprint-109, labs-sprint-110: Make a menu of potential new labs features, invite comments from users - https://phabricator.wikimedia.org/T101769#1547293 (Andrew)
[20:52:24] andrewbogott, around?
[21:07:48] Krenair: back now, what’s up?
[21:08:10] andrewbogott, how can you write to /public/dumps without labstore1003 root
[21:08:11] ?
[21:08:47] I’m not sure. I thought the point of dumps was to be written externally and be read-only in labs?
[21:09:16] The docs say 'This is a global, readonly share that contains data dumps that can be read for research purposes '
[21:09:19] andrewbogott, well the provided files don't seem to include enwiki-20100312-pages-meta-history.xml ?
[21:09:37] valhallasw`cloud: can you kill all my jobs. im afk and can only resrt them
[21:10:20] Krenair: ok — I think that the best thing is to open a phobia ticket requesting that that be added and assign to ariel
[21:10:28] bah, autocorrect
[21:10:30] a phab ticket
[21:10:55] * andrewbogott turns off autocorrect
[21:11:15] Krenair: does that make sense or am I misunderstanding the question?
[21:11:39] okay...
[21:11:46] I need it to see if it contains one particular revision
[21:12:05] if ariel has a full copy somewhere it might be easier for him to extract it
[21:16:06] any ops able to kill my jobs?
[21:16:29] Betacommand: eh?
[21:16:40] Betacommand: all tools.betacoomand-dev?
[21:16:52] yeah
[21:17:27] Betacommand: killed all except for lighttpd
[21:17:45] thanks
[21:32:38] i'm trying to create a new instance in wikitech, and it just tells me 'Failed to create instance'. any ideas?
[21:33:04] my only thought was perhaps bumping into quotas? but the project doesn't have too much in it (can i see the quotas?)
[21:33:46] ahha, yes its running into a # of cores quota :( i'll file a ticket
[21:34:42] Coren / andrewbogott, lighttpd jobs on trusty are not starting because of a lack of available memory
[21:34:52] I don't have time to investigate now (bedtime, etc)
[21:35:10] jobs request 4GB (?!) and this gives lots of '(-l h_vmem=4g,release=trusty) cannot run at host "tools-webgrid-lighttpd-1401.eqiad.wmflabs" because it offers only hc:h_vmem=3.142G'
[21:36:35] valhallasw`cloud: probably because there are too few nodes available?
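The staggered approach described at 20:05:17 (rescheduling a few jobs at a time instead of all at once) could look roughly like the dry-run sketch below. `stagger`, `BATCH_SIZE` and the job IDs are assumptions made for illustration; the function only prints the `qmod -rj` commands it would issue.

```shell
#!/bin/sh
# Dry-run sketch of staggered rescheduling: prints one `qmod -rj` command per
# batch of job IDs instead of a single command covering every job.
BATCH_SIZE=2   # assumed value, chosen only to keep the demo short

stagger() {
    batch=""
    count=0
    for id in "$@"; do
        batch="$batch $id"
        count=$((count + 1))
        if [ "$count" -eq "$BATCH_SIZE" ]; then
            printf 'qmod -rj%s\n' "$batch"
            batch=""
            count=0
            # a real run would pause here and check that the batch restarted
        fi
    done
    # flush the final, possibly smaller, batch
    if [ -n "$batch" ]; then
        printf 'qmod -rj%s\n' "$batch"
    fi
}

stagger 101 102 103 104 105
```

With five IDs and a batch size of two this prints three commands, the last one covering the single leftover job. Whether batching actually avoids the lost jobs is left open in the log, so this only illustrates the mechanics.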
[21:36:40] possibly
[21:37:18] but only tools-webgrid-lighttpd-1406.eqiad.wmflabs" is disabled
[21:37:38] so that means we've basically been lucky up to now
[21:38:18] Labs: Increase quota's for search project in labs - https://phabricator.wikimedia.org/T109377#1547476 (EBernhardson) NEW
[21:38:22] anyway, I have to sleep
[21:38:24] good night
[21:38:45] Labs: Increase quota's for search project in labs - https://phabricator.wikimedia.org/T109377#1547483 (EBernhardson)
[22:12:13] Labs, Multimedia, operations, wikitech.wikimedia.org, and 2 others: Some wikitech.wikimedia.org thumbnails broken (404) - https://phabricator.wikimedia.org/T93041#1547621 (Krenair) Open>Resolved Special:ListFiles is looking much better
[22:14:14] Yuvi|Vacation: how do I add a new tools-webgrid-lighttpd node to the grid?
[22:23:54] andrewbogott: phab tickets can be a phobia...