[02:20:25] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds
[02:25:17] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 777154 bytes in 2.660 second response time
[03:27:25] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds
[03:32:18] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 777147 bytes in 2.183 second response time
[05:55:11] Quarry: SQL String functions not working - https://phabricator.wikimedia.org/T100057#1304273 (Soni) NEW a:yuvipanda
[08:16:51] (PS1) Jforrester: Remove huge low-value noisy UploadWizard from -dev [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/212843
[08:28:30] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:28:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:30:10] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:30:32] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:31:20] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:31:34] PROBLEM - Puppet failure on tools-master is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:31:50] PROBLEM - Puppet failure on tools-mail is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:31:51] hmm
[08:31:54] is this the daily one or what?
[08:31:56] PROBLEM - Puppet failure on tools-mailrelay-01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:32:42] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:33:23] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:33:41] PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:34:49] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:39:19] PROBLEM - Puppet failure on tools-shadow is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:41:32] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:43:22] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:44:20] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:44:23] Coren: ^ this is all you :) the gridengine package upgrade failing
[08:44:38] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:45:08] PROBLEM - Puppet failure on tools-submit is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:46:00] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 100.00% of data above the critical threshold [0.0]
[08:51:18] (CR) MarkTraceur: [C: 2] Remove huge low-value noisy UploadWizard from -dev [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/212843 (owner: Jforrester)
[08:51:21] (Merged) jenkins-bot: Remove huge low-value noisy UploadWizard from -dev [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/212843 (owner: Jforrester)
[08:57:14] hrmmmm, labs-vagrant instance having trouble with cookies for login
[08:57:23] let's try randomly disabling hhvm :D
[09:00:36] yuvipanda: is there going to be some labs related hacking? btw if you were too busy you can also send some people in need of newbie advices to me
[09:11:48] petan: probably, yeah :) me and Joe are playing with docker on labs, and I think halfa.k is doing some labs related stuff too
[09:22:41] Hi, I've been getting a locale error recently ("locale.Error: unsupported locale setting") for pl_PL.UTF-8 on tools-bastion-01. How can this be fixed? It used to work just fine
[09:30:18] petan: ^ do you think you can look at alkamid's problem?
[09:46:38] yuvipanda: sure
[09:47:01] I will let you know when I see alkamid around here
[09:47:01] @notify alkamid
[09:47:28] You've already asked me to watch this user
[09:47:28] @notify alkamid can you please try in tools-dev if it works there?
[09:49:11] https://phabricator.wikimedia.org/T100068 <- here have a labs-vagrant bug :D
[10:02:25] hmm, maybe redis is just broken here
[10:02:40] i get an error adding jobs on a video upload
[10:05:56] hmmmm some restarting seems to have helped
[10:07:09] Hi, I've been getting a locale error recently ("locale.Error: unsupported locale setting") for pl_PL.UTF-8 on tools-bastion-01. How can this be fixed? It used to work just fine (sorry, I got disconnected and couldn't see replies if there were any)
[10:07:28] alkamid: petan was offering to look at it
[10:07:55] yuvipanda, thanks
[10:10:31] Tool-Labs: Check for error log ownership before starting webservice job - https://phabricator.wikimedia.org/T99576#1304573 (valhallasw) a:valhallasw>None
[11:32:50] Tool-Labs: Gridengine upgrade causes puppet failures - https://phabricator.wikimedia.org/T100073#1304655 (valhallasw) NEW
[11:55:13] !ping
[11:55:13] !pong
[12:05:46] Labs: Make OpenStack Horizon useful for production labs - https://phabricator.wikimedia.org/T87279#1304737 (Andrew)
[12:06:18] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Besnar b was created, changed by Besnar b link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Besnar_b edit summary: Created page with "{{Tools Access Request |Justification=Hackathon :-) |Completed=false |User Name=Besnar b }}"
[12:07:54] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Besnar b was modified, changed by Merlijn van Deen link https://wikitech.wikimedia.org/w/index.php?diff=160331 edit summary:
[12:07:56] Tool-Labs: Gridengine upgrade causes puppet failures - https://phabricator.wikimedia.org/T100073#1304746 (yuvipanda) @Coren how were the packages installed? Did you put them in labsdebrepo or were they put in the wmf repo on carbon?
[12:09:18] Tool-Labs: Gridengine upgrade causes puppet failures - https://phabricator.wikimedia.org/T100073#1304750 (yuvipanda) Ok, so I don't find it on labsdebrepo nor on wmf repo on carbon...
[12:41:04] yuvipanda: feeling helpful? I’m trying to get user tyldar a login on labs and we’re clearly making some kind of dumb mistake
[12:41:15] andrewbogott: what's erroring out?
[12:41:22] labs or toollabs?
[12:41:34] It looks like a plain old key mismatch, except I’m looking and it’s right...
[12:41:40] at the moment, just bastion1.wmflabs.org
[12:41:50] andrewbogott: newlines or linebreaks in key?
[12:42:15] nope
[12:42:29] in fact, his looks different from mine on the wikitech gui — mine wraps over several lines, his does not
[12:42:35] btw, Enguerrand == tyldar
[12:44:11] andrewbogott: hmm, not sure :|
[12:44:22] :) Want to come look over his shoulder?
[12:45:02] andrewbogott: where are you?
[12:45:11] black couch over by the foosball
[12:48:15] andrewbogott: in a few mins?
[12:48:20] np
[12:50:57] Hi, can anyone help? When I try to ssh to tool-labs I'm getting "no matching mac found: client hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96 server hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com"
[12:51:19] It worked until 2-3 days ago
[12:51:43] mutante: ^
[12:51:53] jem: what OS are you on?
[12:52:07] Linux
[12:56:53] jem: which distro / version?
[12:57:51] Let me check
[13:02:07] yuvipanda: Debian 6.0
[13:07:20] Hi?
[13:07:22] jem: we're investigating, should have a fix in place shortly...
[13:07:24] sorry about htat
[13:07:25] *that
[13:07:29] Ah
[13:07:36] Great, thanks
[13:14:58] jem: so this is due to 'more secure ssh' settings we rolled out yesterday, apparently they're having more of an impact than expected
[13:15:09] jem: should have a fix shortly, I hope
[13:18:54] Labs, Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1304891 (Andrew) this is done now, right?
[13:19:48] Labs, Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1304892 (yuvipanda) Open>Resolved Yes :)
[13:21:12] Labs, Tool-Labs: Rebuild a bunch of tools instances - https://phabricator.wikimedia.org/T97437#1304896 (valhallasw) tools-mail still has to go, but this is taking longer due to incomplete puppetization. ({T97574}). If you feel the virt hosts needs the space taken by that host, we can speed it up, but it's...
[13:24:32] Ok
[13:32:03] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Rjain was created, changed by Rjain link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Rjain edit summary: Created page with "{{Tools Access Request |Justification=Work on flow OAuth demo |Completed=false |User Name=Rjain }}"
[13:36:02] jem: so longer term solution is for you to upgrade your version of ssh, I'm afraid. but right now we're rolling it back, you should be able to access shortly.
[13:37:02] Ah, ok
[13:37:06] yuvipanda: are you doing the hiera on wikitech?
[13:37:15] valhallasw: weird, it's already removed
[13:37:48] puppet true/false weirdness?
[13:38:13] valhallasw: shouldn't be, these are straight bools
[13:40:25] If it helps I can try to upgrade now
[13:44:56] jem: yes, that would be great :)
[13:53:21] (PS1) Sitic: Fixed setup.py and added patched kombu version [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/212923 (https://phabricator.wikimedia.org/T100086)
[13:53:36] (CR) Sitic: [C: 2 V: 2] Fixed setup.py and added patched kombu version [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/212923 (https://phabricator.wikimedia.org/T100086) (owner: Sitic)
[14:04:15] Ok, a lot of upgrades were pending... 600 Mb to download and this will take an hour or so
[14:17:20] yuvipanda: multichill still can't login
[14:17:31] valhallasw: haven't changed on hiera - can you do that?
[14:17:33] sorry.
[14:17:49] valhallasw: it was a transient puppet thing, I guess - puppet was weird, it put them back in next run
[14:18:00] ah, ok
[14:23:54] yuvipanda: "puppet was weird" is the default value.
[14:29:22] yuvipanda: ok, done!
[14:29:26] yuvipanda: set on wikitech
[14:29:32] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 20.00% of data above the critical threshold [0.0]
[14:29:37] uh oh
[14:29:42] jem: logging in should be OK now
[14:29:47] eh
[14:30:01] valhallasw: it's mailrelay swapping again, for some reason
[14:30:09] wat.
[14:30:18] look at /var/log/puppet.log
[14:31:39] yuvipanda: weird. I called puppet agent --disable on mailrelay-01 now
[14:31:50] yuvipanda: I think the failure is just me ctrl-C'ing a --debug run
[14:31:55] because I needed --noop, not --debug
[14:33:19] valhallasw: ah fair enough
[14:33:23] that explains why it wasn't in puppet.log
[14:34:45] Correct, valhallasw, thanks
[14:35:24] Now let's hope the upgrade doesn't break anything :)
[14:37:09] jem: :)
[14:39:31] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0]
[14:48:13] Hi, can someone do something to fix i-0000086c.eqiad.wmflabs? Instance state is ERROR and can not connect to it via ssh: channel 0: open failed: administratively prohibited: open failed
[14:48:36] yuvipanda: ?
[14:57:12] anomie: how do I get a consumer secret for my oauth thing, it's not anywhere on mw site
[14:58:01] andrewbogott: Kelson's problem looks like something due to migration?
[14:58:14] ok, looking
[14:58:34] Kelson: can you please tell me the name and/or project and/or guid of that instance?
[14:59:04] andrewbogott: mwoffliner2 in the mwoffliner project
[14:59:13] valhallasw: redis is out of memory again http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1432393063.855&target=tools.tools-redis.redis.6379.memory.internal_view&from=-4weeks
[15:07:32] yuvipanda: andrewbogott: seems to work now. thx
[15:07:57] Sure thing. I didn’t do anything smart, just reset it to ‘active’ and rebooted.
[15:08:53] andrewbogott: ok, but AFAIK your help was mandatory, because I tried to reboot the VM by myself, but this did not help
[15:09:13] Kelson: yeah, nova won’t reboot something in an error state, it requires root to reset things.
[15:09:35] In the near future users will be able to do such things as well.
[15:09:58] andrewbogott: great :)
[15:12:26] Labs: Document PuTTY based tunneling workaround for accessing labsdb via older MySQLBench versions - https://phabricator.wikimedia.org/T99942#1305258 (Superyetkin) Open>Resolved a:Superyetkin I have got the MySQL Workbench 5.2.24 CE to work by discarding its SSH tunneling features. One needs to establ...
[15:12:39] Tool-Labs: Audit redis usage on toollabs - https://phabricator.wikimedia.org/T91979#1305261 (Sitic) Redis is out of memory again: http://graphite.wmflabs.org/render/?width=586&height=308&target=tools.tools-redis.redis.6379.memory.internal_view&from=-4weeks
[15:17:36] (PS1) Andrew Bogott: Added dummy ceph passwords [labs/private] - https://gerrit.wikimedia.org/r/212943
[15:45:23] (PS2) Andrew Bogott: Added dummy ceph passwords [labs/private] - https://gerrit.wikimedia.org/r/212943
[16:01:57] wikibugs is missing, needs a kick after the redis restart
[16:03:17] sitic: done
[16:08:41] yuvipanda: Hey :-) When you have a moment, can you take a look at https://phabricator.wikimedia.org/T99930 and the timetable at https://phabricator.wikimedia.org/T92955 ?
[16:09:00] the community bonding period ends monday, see https://phabricator.wikimedia.org/T94166
[16:10:33] sitic: oooh, will do :)
[16:11:13] yeah had a bad cold since last weekend and the time flew by :-/
[16:11:48] ouch
[16:11:49] take care
[17:04:16] [intuition] nemobis opened pull request #46: Typofix in adverb "globally" (master...patch-1) https://github.com/Krinkle/intuition/pull/46
[17:06:21] [intuition] nemobis opened pull request #47: Restore missing "unless" in about_license (master...patch-2) https://github.com/Krinkle/intuition/pull/47
[17:11:02] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 44.44% of data above the critical threshold [0.0]
[17:18:53] [intuition] Krinkle pushed 2 new commits to master: https://github.com/Krinkle/intuition/compare/fe9cdd3a08e5...24b0ed1e33b5
[17:18:53] intuition/master e4bb768 Timo Tijhof: Intuition: Abstract path resolution logic with getDomainDir()
[17:18:54] intuition/master 24b0ed1 Timo Tijhof: Intuition: Implement registerDomain() method to aid custom domains...
[17:36:04] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0]
[18:06:35] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Rjain was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=160421 edit summary:
[19:59:52] Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1305677 (scfc) ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org Dat...
[20:08:44] yuvipanda: what are the rules for bot testing on testwiki
[20:08:57] Negative24: not sure....
[20:09:08] wikitech-l might know?
[20:09:15] I think it's ok to just do it and see if someone screams :)
[20:09:18] Negative24: legoktm might know too
[20:10:23] Labs: Puppet errors on newly created instances - https://phabricator.wikimedia.org/T100108#1305689 (yuvipanda) NEW
[20:11:55] Negative24: no rules, don't break shit
[20:12:01] k
[20:12:14] can someone +bot to negative24-bottest
[20:12:20] er
[20:12:26] negative24-testbot
[20:13:46] yuvipanda: ^
[20:14:29] Negative24: legoktm is able to too, I think :)
[20:14:47] !bot test.wikipedia User:negative24-testbot
[20:14:47] http://meta.wikimedia.org/wiki/WM-Bot | troubleshooting bots -> https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting
[20:15:28] Negative24: done
[20:15:51] legoktm: thanks
[20:46:04] yuvipanda: newar4ur.py
[20:46:09] valhallasw: running
[20:46:46] valhallasw: depth 3 and 2 didn't find anything so I'm running find :)
[20:46:57] err
[20:47:00] 4 levels deep
[20:47:28] yuvipanda: yeah, I'm not sure how deep this script might be. it's pywikibot compat, so I guess /home/user/pywikibot-compat/X would make sense
[20:47:42] but that would be depth 3, I think?
[20:48:21] valhallasw: oh, I'm looking in /data/project
[20:48:29] oh, that sounds sane.
[20:48:36] I'm also going to try to query qacct somehow
[20:48:43] (PS1) Wctaiwan: Fix percent formatting [labs/tools/extreg-wos] - https://gerrit.wikimedia.org/r/213001
[20:49:51] valhallasw: bah, was running in wrong place
[20:49:52] fixing now
[20:49:57] <3
[20:50:39] valhallasw: ./shuaib-bot/pywikipedia/newar4ur.py
[20:50:50] valhallasw: /data/project
[20:52:35] (CR) Legoktm: [C: 2 V: 2] Fix percent formatting [labs/tools/extreg-wos] - https://gerrit.wikimedia.org/r/213001 (owner: Wctaiwan)
[20:59:53] valhallasw: works?
[21:00:00] yes, thanks <3
[21:00:12] valhallasw: :) yw
[21:14:16] Negative24 "python scripts/pagefromfile.py -file:pages.txt" is giving me "ImportError: No module named pywikibot"
[21:14:37] from #pywikibot no one is responding
[21:15:01] * Negative24 has never used pywikibot before
[21:16:24] Negative24: python pwb.py pagefromfile -file:pages.txt
[21:17:07] valhallasw: then why on the script page it says "pagefromfile.py [global-arguments] -start:xxxx -end:yyyy -file:Filename.txt"
[21:17:48] Negative24: because T2001 Documentation is out of date, incomplete
[21:17:59] sigh
[21:18:56] sorry :-p
[21:19:37] it shouldn't be like that, obviously, but there's a bit of a mess with older/newer versions
[23:06:39] On https://wikitech.wikimedia.org/wiki/Help:Tool_Labs#Redis, "It has been allocated a maximum of 7G of memory" are we talking about RAM or disk space?
[23:22:24] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 62.50% of data above the critical threshold [0.0]
[23:47:23] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0]
[23:48:39] Negative24: that's a bit outdated, it's actually 12 GB of RAM. The redis INFO command or http://graphite.wmflabs.org/render/?width=586&height=308&target=tools.tools-redis.redis.6379.memory.internal_view&from=-4weeks shows the memory usage
[23:50:26] sitic: ah ok
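
Editor's note: the last exchange points at the redis INFO command as the way to see memory usage. The following is only a rough sketch of reading that output, not anything that was run on Tool Labs. It assumes the INFO text has already been captured as a string, relies on the standard `used_memory` field that Redis reports in bytes, and takes the 12 GB figure from the chat as the limit. The sample text and the helper names `parse_redis_info` and `memory_usage_percent` are made up for illustration.

```python
def parse_redis_info(text):
    """Parse the 'key:value' lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Section headers look like "# Memory"; skip them and blank lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_usage_percent(info, limit_bytes):
    """Return used_memory as a percentage of the given limit."""
    return 100.0 * int(info["used_memory"]) / limit_bytes


# Hypothetical sample, shaped like the "Memory" section of INFO output.
SAMPLE_INFO = """# Memory
used_memory:6442450944
used_memory_human:6.00G
"""

# Assumption: the 12 GB limit mentioned in the chat, read as 12 GiB.
LIMIT_BYTES = 12 * 1024 ** 3

info = parse_redis_info(SAMPLE_INFO)
print("redis memory in use: %.1f%%" % memory_usage_percent(info, LIMIT_BYTES))
```

With a real server, the input string would come from running INFO through whatever client is at hand; the parsing step stays the same.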