[00:09:15] terrrydactyl: Try again?
[00:10:24] PhantomTech: That's very little information to go on - but if qstat says it's running then the job /is/ running, but it may not be giving out any output. It's hard to figure out what may have gone wrong with no information though. :-(
[00:10:58] Coren, huzzah, it seems to work
[00:11:20] * terrrydactyl throws confetti and vacuums it up
[00:11:36] any other info i can provide that can help? i've stopped the job for now
[00:17:20] PhantomTech: i find that the best debug technique in such situations is to get a blunt instrument and begin hammering on the thing. modify the tool until it alters behavior and see what change made a difference.
[00:19:04] thanks afeder, i'm sticking in a bunch of print lines to see if any get printed
[00:23:20] nope...
[00:26:08] Careful with prints - most languages will buffer the output until some amount of data is ready to flush, so it'd never hit the file.
[00:26:21] You may want to fflush() or equivalent after the prints.
[00:31:17] a job in state Rr just means it's been restarted, right?
[00:31:47] thanks for the flush tip Coren
[00:32:12] Yes, it's been restarted since its last manual start.
[01:01:11] got my bot outputting, when it queries wikipedia for the server time through pywikibot it keeps getting 21:41:59
[01:03:42] that time btw is a little after its first run on labs, anyone have any ideas what's causing the issue?
[01:14:18] PhantomTech: without seeing the code I can't give much help
[01:15:05] but look at the lookup command and see if it caches that value somehow
[01:15:56] if you're familiar with pywikibot, it's getting a site from pywikibot.Site()
[01:16:08] then using .getcurrenttime() on it
[01:16:25] PhantomTech: core or compat?
[01:16:30] core
[01:16:38] * Betacommand shudders
[01:16:43] I'll take a look
[01:16:45] lol
[01:16:51] it works fine locally
[01:17:04] so i think it might be a caching problem somewhere
[01:17:10] but i don't know what would be causing it
[01:17:47] PhantomTech: give me a sec, I've been using pywiki for almost 10 years now
[01:20:36] PhantomTech: simple thought, write a loop, and print the current time every 30 seconds
[01:20:46] see if that works on labs
[01:21:10] that's pretty much what's happening, it's printing the time it gets from the site every 45 seconds but it's giving the same time
[01:22:59] PhantomTech: let's have just a while true: print site.time, time.sleep(60)
[01:23:12] this is a basic KISS test
[01:23:38] if you can strip it down to the barest components we can confirm that it's a bug in pywiki
[01:25:18] PhantomTech: I know this sounds kind of silly, but it's key to successful troubleshooting
[01:25:54] ya i know, i was just trying an idea i had, didn't seem to work
[01:26:25] PhantomTech: reminds me about the law of halves
[01:28:13] PhantomTech: In order to apply the law of halves we need two points: a known working point and a known failure point
[01:28:52] getting point A is often as simple as going back as close as you can to the starting point
[01:29:27] (which is where my suggestion is coming from)
[01:30:43] alright, code modified, job submitted
[01:32:41] this... umm..
[01:32:58] PhantomTech: ??
[01:33:13] first time it gives is 20:13
[01:33:18] then 19:02
[01:33:30] and now it's stuck at 19:02
[01:34:25] that's minutes and seconds btw
[01:34:29] it's on the right hour
[01:34:34] still?
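A minimal version of the KISS test suggested above might look like the sketch below, assuming pywikibot core with a configured user-config.py; the target wiki and the 60-second interval are taken from the discussion, and sys.stdout.flush() applies Coren's "fflush() or equivalent" tip so the output actually reaches the grid job's log file.

```python
# Minimal sketch of the stripped-down test discussed above.
# Assumes pywikibot core; pywikibot.Site() and .getcurrenttime()
# are the calls PhantomTech mentions using.
import sys
import time

import pywikibot

site = pywikibot.Site('en', 'wikipedia')  # illustrative target wiki
while True:
    print(site.getcurrenttime())
    sys.stdout.flush()  # force buffered output into the job's log
    time.sleep(60)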
[01:34:56] ya, first time was 01:20:13
[01:35:07] the 3 after that are 01:19:02
[01:35:34] that's odd, it should be closer to 35
[01:35:35] around 01:20:00 i deleted everything in apicache to try to fix it
[01:35:56] so i'm guessing that's why it's changed from what it was giving before
[01:36:15] PhantomTech: and this is why I don't use core
[01:36:21] lol
[01:36:51] yet another example of how often it has issues
[01:36:59] PhantomTech: file a ticket
[01:37:23] oh, phabricator?
[01:37:24] suspect it's an issue with the caching "feature" of core
[01:37:28] yeah
[01:37:29] k
[01:37:42] pywiki has its project management there
[01:38:29] Since it's core, I'm not sure how to bypass the automatic caching
[01:39:48] comments say the function forces a refresh
[01:39:53] maybe i'll have to look through the code myself
[01:40:04] any idea how long a ticket would take?
[01:40:49] PhantomTech: depends on where the issue lies
[01:41:12] PhantomTech: I know I will get a lot of backlash for this, but I always recommend using compat
[01:41:36] compat tends to have far fewer issues
[01:41:57] I know there isn't any caching done there either
[01:43:05] caching is kind of required for what i'm doing, but i have it coded in myself where it's needed, so that wouldn't be an issue
[01:43:20] how hard would it be to switch over to compat?
[01:43:36] or how similar are they with their method names, i guess
[01:45:06] depends on what you're doing
[01:45:13] a lot of it is the same
[01:45:21] stalking recent changes
[01:45:41] PhantomTech: getting RC data should be trivial
[01:47:14] site.recentchanges()
[01:52:50] after that i need text from page revisions, page deletion reasons, user edit count, groups, registration time, contributions
[01:54:35] isn't core's disk cache broken if you need to run multiple instances of a bot?
[01:56:06] and there has also been a huge bug with the edit token for nearly a year; I moved to core a year ago, and it was a VERY bad idea
[01:57:05] i'm multithreading, not sure if that would be considered multiple instances
[01:57:11] it works fine locally
[02:00:17] PhantomTech: I use compat and I've had as many as 50 workers running at a time
[02:03:06] i'm thinking i might just write my own query for getting the time instead of switching
[02:08:10] PhantomTech: you're likely to run into other issues
[02:08:24] sounds exciting
[02:23:16] Betacommand: 650 untouched tasks for pywikibot core, looks like this is going to take a while
[02:52:58] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:11:37] so i'm trying to remove the pywikibot folder in my tool's directory and it's saying i don't have permission for one of the folders in it, but ls -l shows me as the owner and it's chmodded 655
[03:21:09] nevermind, silly me
[03:34:30] Labs, Phabricator: Phabricator security policy open up port 222 for regular ssh with git on port 22 - https://phabricator.wikimedia.org/T94217#1159089 (mmodell) @negative24: you are now a projectadmin ...
[03:35:08] Labs, Phabricator: Phabricator security policy open up port 222 for regular ssh with git on port 22 - https://phabricator.wikimedia.org/T94217#1159090 (Negative24) Open>Resolved, a: Negative24 @mmodell Thanks.
[03:36:07] twentyafterfour: I'll recreate phab-02 and configure it correctly with the security ext if that's ok with you
[03:36:21] sure
[03:36:48] doing...
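For the record, the RC-stalking approach Betacommand points at above looks roughly like this in pywikibot core. This is a sketch, not tested against that era's code; the exact dict keys are assumptions based on the API's list=recentchanges output.

```python
# Rough sketch of polling recent changes via site.recentchanges(),
# the generator mentioned above (pywikibot core).
import pywikibot

site = pywikibot.Site('en', 'wikipedia')
for change in site.recentchanges(total=10):  # total= caps yielded entries
    # each entry is a dict mirroring an API recentchanges record
    print(change['timestamp'], change['type'], change.get('title'))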
[04:02:45] Change on www.mediawiki.org a page Wikimedia Labs was modified, changed by Shirayuki link https://www.mediawiki.org/w/index.php?diff=1502629 edit summary: [+4]
[04:16:47] !log phabricator recreated phab-02 configured with role::phabricator::labs and alternate ssh security group
[04:16:54] Logged the message, Master
[04:28:05] twentyafterfour: is running phab_update_tag still necessary for puppet?
[04:29:07] the only error I'm seeing is that /var/run/phd doesn't exist, which can easily be fixed
[04:31:05] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Weipengyu was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=150582 edit summary:
[04:31:39] Negative24: I'm not sure about phab_update_tag
[04:31:44] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ning328 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=150585 edit summary:
[04:31:51] the /var/run/phd bug is something I had to deal with manually last time
[04:32:11] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ning28 was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=150587 edit summary:
[04:32:13] is it worth me making a review to fix it?
[04:32:57] making a commit, that is
[04:41:01] twentyafterfour: Phabricator's puppet manifest needs some changes :) but not for today. I'll be signing off.
[04:42:09] Negative24: yeah, agreed.
[04:42:26] I had attempted to fix that one but my patch was flawed
[04:42:50] I'll take a shot. For now I will just note down all of the errors I see.
[04:51:59] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:56:54] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 759901 bytes in 3.158 second response time
[06:06:58] "The URI you have requested, /xtools-articleinfo/index.php?article=Germanwings_Flight_9525&lang=en&wiki=wikipedia, is not currently serviced."
[06:07:06] could someone please kickstart the thing? ^^^^
[06:38:33] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[08:09:37] hoi, is it just me or is the performance of the disk systems not great right now?
[09:30:24] Labs, pywikibot-core: Time issue - https://phabricator.wikimedia.org/T94293#1159190 (XZise) Maybe the version on #labs is outdated? Is it possible to look in the code on labs and check if site.py does contain the same code as the current version in git? I'm not sure, but I think the _is_expired method was...
[09:39:00] Labs, pywikibot-core: Time issue - https://phabricator.wikimedia.org/T94293#1159192 (XZise) Okay, [[https://github.com/wikimedia/pywikibot-core/commit/75e5834d1a7a19940e19708e00a4f3e5e615d199|75e5834d1a7a19940e19708e00a4f3e5e615d199]] did fix something along those lines, so please check that first. I'll lowe...
[09:40:08] Labs, pywikibot-core: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159193 (XZise) p: Unbreak!>Normal, a: XZise
[11:25:47] Labs, pywikibot-core: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159238 (XZise) Do you delete the apicache and restart the script, or do you delete it while the script runs? You could also (unlike what others want to tell you) disable the apicache by setting `API...
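One way to act on XZise's suggestion above (check whether the code on labs matches current git) is to ask the running interpreter which checkout it is actually importing; a sketch, assuming pywikibot core's version helper behaves as named here.

```python
# Sketch: report which pywikibot code a labs job really imports,
# for comparison against current git as suggested in T94293.
import pywikibot
import pywikibot.version

print(pywikibot.__file__)             # path of the deployed package
print(pywikibot.version.getversion()) # revision/date info, if available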
[11:40:41] Labs, pywikibot-core: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159246 (valhallasw) With a clear apicache, the following requests are made: 1. 2015-03-28 11:33:26 threadedhttp.py, 215 in request: DEBUG ('https://en.wikipedia.org/w/api.ph...
[12:03:43] Labs, pywikibot-core: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159264 (valhallasw) After puzzling with @xzise on IRC, this is what we think happens: * Initially, the Site object has no information loaded. * site.version() calls the API with an expiry of...
[12:10:20] Labs, pywikibot-core: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159266 (valhallasw) > I'm /not/ quite sure why the version request isn't handled by the Siteinfo internal cache, though. @xzise pointed out to me this is because the comparison is the wrong...
[12:20:12] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Suriyaa Kudo was created, changed by Suriyaa Kudo link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fSuriyaa+Kudo edit summary: Created page with "{{Tools Access Request |Justification=I want to build a modern OpenSource search site for Wikimedia projects using a search software which was programmed by me. Link: https..."
[12:23:16] Labs, pywikibot-core, Patch-For-Review: On labs Siteinfo is caching time sensitive stuff - https://phabricator.wikimedia.org/T94293#1159275 (XZise) Open>Resolved Sorry for blaming labs when the comparison was the wrong way around the whole time already :/
[13:28:57] Tool-Labs-tools-Other: bring back missing-from-wikipedia - https://phabricator.wikimedia.org/T72199#1159304 (Dzahn) a: sumanah>terrrydactyl
[15:02:36] !log testlabs testing the logbot
[15:08:57] PROBLEM - Puppet failure on tools-webproxy-jessie is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:09:37] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:09:41] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[15:10:19] PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0]
[15:10:47] PROBLEM - Puppet failure on tools-webgrid-generic-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:11:09] Labs, operations: OOM on virt1000 - https://phabricator.wikimedia.org/T88256#1159347 (Andrew) This happened again last night. Something must be running amok and gobbling memory.
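The "comparison is the wrong way around" diagnosis above is easy to picture. The following is an illustrative sketch only, not pywikibot's actual _is_expired code: an inverted expiry check reports cached data as fresh exactly when it has gone stale, which matches the stuck-timestamp symptom in the log.

```python
# Illustrative only -- NOT the real pywikibot code.
from datetime import datetime, timedelta

def is_expired(cached_at, max_age=timedelta(days=1)):
    age = datetime.utcnow() - cached_at
    return age < max_age  # BUG: inverted; should be `age >= max_age`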
[15:11:20] PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[15:11:44] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[15:11:44] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:11:45] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:12:20] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:12:52] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:13:00] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:13:01] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[15:13:01] PROBLEM - Puppet failure on tools-exec-05 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:13:01] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[15:13:27] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0]
[15:13:33] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[15:14:16] PROBLEM - Puppet failure on tools-webgrid-generic-02 is CRITICAL: CRITICAL: 71.43% of data above the critical threshold [0.0]
[15:14:28] PROBLEM - Puppet failure on tools-exec-12 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[15:14:58] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[15:15:49] PROBLEM - Puppet failure on tools-webgrid-07 is CRITICAL: CRITICAL: 85.71% of data above the critical threshold [0.0]
[15:18:09] Oh oh.
[15:19:43] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:20:45] RECOVERY - Puppet failure on tools-webgrid-generic-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:21:41] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:24:21] RECOVERY - Puppet failure on tools-exec-12 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:24:47] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:24:59] RECOVERY - Puppet failure on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:26:40] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:26:53] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[15:27:57] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:27:57] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0]
[15:28:35] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[15:29:21] RECOVERY - Puppet failure on tools-webgrid-generic-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:30:51] RECOVERY - Puppet failure on tools-webgrid-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:32:22] RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]
[15:32:57] RECOVERY - Puppet failure on tools-webproxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:32:58] RECOVERY - Puppet failure on tools-exec-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:36:18] RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:38:04] RECOVERY - Puppet failure on tools-webproxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:38:34] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:56:41] Ping. grrrit-wm doesn't seem to be working.
[15:57:32] Out of curiosity, did someone misspell gerrit as grrrit?
[16:19:02] !log tools.lolrrrit-wm restarted grrrit-wm
[16:19:02] tools.lolrrrit-wm is not a valid project.
[16:19:09] !log tools.lolrrit-wm restarted grrrit-wm
[16:19:13] Logged the message, Master
[16:20:43] !log tools.lolrrit-wm seems the json parser got stuck in an error state; all kinds of " at Object.parse (native)...." errors
[16:20:47] Logged the message, Master
[16:21:29] !log tools.lolrrit-wm restart fails because tools-exec-12 is severely overloaded. WTH?
[16:21:33] Logged the message, Master
[16:22:15] Coren, tools-exec-12 has a load average of 30 and is the only trusty exec host
[16:22:40] It's not the only one, but 30 is on the high side.
[16:23:24] * Coren goes to see what's up
[16:23:32] That's odd -- grrrit-wm's job skips all other queues because they are precise, and doesn't do -12 because it's overloaded
[16:23:49] I thought Yuvi added a couple.
[16:23:51] let me completely kill the job instead of just restarting it; maybe the list of available queues is not up to date
[16:24:57] Coren: there are exec-12 and webgrid as trusty only, apparently
[16:25:03] Huh.
[16:25:06] Need moar
[16:25:26] 13-15 are all precises
[16:26:03] There's one tool that's eating a lot of slots and a load of load on its own
[16:27:15] I'm wondering whether we should have specific queues for low-cpu tasks
[16:27:36] but that probably wouldn't help, because a load of 30 suggests i/o issues
[16:28:16] Tool-Labs: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1159366 (coren) NEW, a: coren
[16:29:05] valhallasw: Well, it doesn't have to be IO, but it is at this time apparently. That job seems to be very disk-bound, and having so many in parallel makes things worse.
[16:29:42] * Coren suspends a couple
[16:31:30] That should give a chance for the first group to complete.
[16:31:37] (And lower load a bit)
[16:34:46] twentyafterfour: "Bad getter call: getCCPHIDs" with new security bug?
[16:52:01] twentyafterfour: Never mind
[16:58:30] !log tools.lolrrit-wm allowed grrrit-wm to schedule on a trusty webgrid node for now (qalter -q webgrid-generic,webgrid-lighttpd,task,continuous 9420587)
[16:58:34] Logged the message, Master
[17:59:36] hi everyone
[18:00:02] in German Wikipedia, there is a link "new articles" at the bottom of the user contributions page
[18:00:28] when i click on that link, i get to https://tools.wmflabs.org/xtools/pages/ which says "No webservice"
[18:00:39] is this the expected behavior? ^^
[18:06:56] Carbidfischer: it is not
[18:08:12] JohnFLewis: i gathered as much
[18:08:54] And the maintainers aren't around as far as I can see, so Coren / YuviPanda ^ (if you have time)
[18:09:42] xtools restarted.
[18:13:35] Coren: thx
[18:16:19] seems to work again
[19:32:07] valhallasw: hi!
[19:35:10] hey YuviPanda
[19:35:23] valhallasw: no 'cloud, eh
[19:35:24] :)
[19:36:16] YuviPanda: yeah, have the VM with irssi running on my laptop again these days :P
[19:37:05] valhallasw: heh, you're missing from some channels :)
[19:38:31] that sounds so dramatic
[19:38:37] valhallasw: :P
[19:39:05] valhallasw: anyway, your NDA is being held up atm due to some procedural issues on our side. Let's hope it clears up in the coming week
[19:40:03] np. I'll be away next week again :-)
[19:40:12] valhallasw: :) ok
[19:42:55] !log tools created tools-exec-20
[19:43:02] Logged the message, Master
[19:43:35] wow, so many exec nodes by now
[19:45:19] gifti: i skipped 16-19, just in case we need to add more precise nodes...
[19:45:23] 20+ will be trusty
[19:45:30] oh :D
[19:45:47] YuviPanda: but 12 :p
[19:45:57] 'twas an accident :)
[19:46:04] I think I'll depool twelve after doing 20-25
[19:49:15] YuviPanda: Hi
[19:49:23] hi Vivek
[19:49:33] YuviPanda: Have you reached the US?
[19:49:41] yes, just landed yesterday evening :)
[19:49:44] getting my bearings and stuff now
[19:50:04] YuviPanda: ok, nice to know that :)
[20:00:03] valhallasw: what does the cloud mean anyways?
[20:02:30] Is it a bad idea to have my private key on bastion?
[20:03:56] Negative24: yup. bad idea
[20:04:30] Ok, because I can't get agent forwarding to work and what I'm about to do can't be run through the ProxyCommand
[20:04:53] hmm, I suppose the 'fix' is to get agent forwarding to work somehow
[20:05:10] It hasn't worked ever for me
[20:05:22] Labs, operations, Monitoring, Patch-For-Review: Setup alarms for labstore* to check for network saturation - https://phabricator.wikimedia.org/T92629#1159525 (yuvipanda)
[20:05:30] :(
[20:06:24] Negative24: it's when I use a cloud-based irc client
[20:06:36] oh
[20:06:50] is that a bad thing?
[20:07:15] no, it's just so irssi and that client don't clash on user name
[20:07:27] * Negative24 is crossing fingers that sshd works so he isn't locked out of his instance
[20:07:46] that makes sense
[20:09:06] Negative24: btw, if ssh is borked one of us admins can log in as root
[20:09:07] well
[20:09:11] if sshd is borked we can't :P
[20:09:19] but even then we can execute commands via salt...
[20:09:22] * YuviPanda has borked sshd before
[20:09:32] that's good to know
[20:09:58] I'll be smoke testing as well with a second sshd
[20:10:06] Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1159528 (yuvipanda) ^ +1
[20:10:17] Negative24: ah, are you trying out Diffusion etc.?
[20:10:39] Yes. Hence all the commotion over getting port 222 unblocked
[20:10:45] heh, nice
[20:11:29] here goes...
[20:11:59] Tool-Labs: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1159530 (yuvipanda) tools-exec-20 is in the process of being set up.
[20:12:14] whoop!
[20:12:20] that worked :)
[20:12:54] well, that was just the smoke test. I can still be screwed
[20:23:59] YuviPanda: Is it possible to use the same proxy that is already linked to an instance but have it forward two ports?
[20:25:18] and the proxy list isn't even working
[20:25:37] Negative24: log out and back in?
[20:25:44] Negative24: and yes, you can do that. just needs different domain names
[20:25:49] Negative24: also it's only an http proxy
[20:26:13] so I can't have my phab-02.wmflabs.org proxy forward ssh through?
[20:26:40] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1159544 (yuvipanda)
[20:26:51] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 activities blocking - https://phabricator.wikimedia.org/T93334#1159545 (yuvipanda)
[20:26:58] Labs, Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1159547 (yuvipanda)
[20:27:14] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1159549 (yuvipanda)
[20:27:33] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1159558 (yuvipanda)
[20:28:32] Labs, Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1159560 (yuvipanda) a: yuvipanda
[20:28:41] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 activities blocking - https://phabricator.wikimedia.org/T93334#1159564 (yuvipanda) a: yuvipanda
[20:28:47] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Setup a redis slave for toollabs as backup / redundancy - https://phabricator.wikimedia.org/T91239#1159565 (yuvipanda) a: yuvipanda
[20:33:29] Labs, ToolLabs-Goals-Q4: virt1000 SPOF - https://phabricator.wikimedia.org/T90625#1159569 (yuvipanda)
[20:34:27] Labs, Wikimedia-Labs-Infrastructure: Make /tmp an lvm partition too for new labs instances - https://phabricator.wikimedia.org/T85471#1159572 (yuvipanda) Open>declined, a: yuvipanda Unified root exists now :)
[20:35:46] Labs, ToolLabs-Goals-Q4: Fix documentation & puppetization for labs NFS - https://phabricator.wikimedia.org/T88723#1159575 (yuvipanda)
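On the agent-forwarding thread above: since the labs proxy only speaks HTTP, ssh to an instance goes through the bastion, and forwarding your local agent avoids parking a private key there. A hedged sketch of the usual ~/.ssh/config setup; the user name and host patterns are assumptions to adjust for your account.

```
# ~/.ssh/config -- illustrative sketch only; adjust User and patterns.
Host bastion.wmflabs.org
    User youruser
    ForwardAgent yes                 # bastion borrows your local agent

Host *.eqiad.wmflabs
    User youruser
    ProxyCommand ssh -W %h:%p bastion.wmflabs.org   # no key on bastion
```

Git-over-ssh to the Phabricator instance would then additionally target port 222, per T94217 above.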
[20:36:11] Labs, ToolLabs-Goals-Q4: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1159576 (yuvipanda)
[20:47:45] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Make webservice2 default webservice implementation - https://phabricator.wikimedia.org/T90855#1159614 (yuvipanda)
[20:47:46] Labs, Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Implement 'webservice2 status' - https://phabricator.wikimedia.org/T93560#1159613 (yuvipanda) Open>Resolved
[20:49:52] Labs, REFLEX: Public IP and Wildcard DNS for REFLEX project - https://phabricator.wikimedia.org/T92273#1159621 (yuvipanda) @werdna how did this end up working out? Do you still need the wildcard domain?
[20:51:22] Labs, ToolLabs-Goals-Q4: dhclient overwrites /etc/resolv.conf - https://phabricator.wikimedia.org/T93691#1159622 (yuvipanda)
[20:54:44] Labs, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#1159625 (yuvipanda) @milimetric update?
[20:54:56] PROBLEM - Puppet failure on tools-exec-22 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:56:04] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1159627 (yuvipanda) I am going to set up 5 trusty nodes, 20 to 24. That should hold for a while.
[20:56:26] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1159630 (yuvipanda) a: coren>yuvipanda
[21:00:20] PROBLEM - Puppet failure on tools-exec-21 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:01:37] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Add more Trusty exec nodes - https://phabricator.wikimedia.org/T94304#1159636 (yuvipanda) I've created 5 and applied roles. Slowly they are churning through all the packages...
[21:03:46] Tool-Labs, ToolLabs-Goals-Q4, ToolLabs-Q4-Sprint-1: Explicitly define all the services that Tool Labs provides and their interfaces - https://phabricator.wikimedia.org/T93622#1159638 (yuvipanda) (I have updated the description to match) Next step would be to create tasks for each service and figure out ho...
[21:20:02] RECOVERY - Puppet failure on tools-exec-22 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:20:26] RECOVERY - Puppet failure on tools-exec-21 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:26:01] YuviPanda: this channel is unreadable because of bots again; consider moving them to a separate channel?
[21:26:29] petan: feel free to ignore them if you don't like them? also, this was me going through a lot of bugs and triaging, won't happen as much
[21:26:40] I like them but I like people too
[21:27:14] I think wikibugs shouldn't announce new project additions...
[21:27:21] that would simplify things and reduce noise a bit
[21:27:30] * YuviPanda files a bug for that
[21:28:24] Wikibugs: Wikibugs should not announce new project additions unless the new project makes it be announced in a separate channel - https://phabricator.wikimedia.org/T94318#1159654 (yuvipanda) NEW
[21:28:33] done
[21:28:35] * YuviPanda goes afk now
[21:29:10] PROBLEM - Puppet failure on tools-exec-23 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:30:49] twentyafterfour: Did you figure out that ssh is on port 222?
[21:43:59] PROBLEM - Puppet failure on tools-exec-20 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[21:47:28] * L235 oops for doing heavy processing on trusty.login
[21:47:55] YuviPanda: btw, what are these mptraid messages? I thought exec hosts are virtual o.O
[21:54:12] RECOVERY - Puppet failure on tools-exec-23 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:13:58] RECOVERY - Puppet failure on tools-exec-20 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:32:37] Labs: Make a fact for project_id on labs instances - https://phabricator.wikimedia.org/T93684#1159676 (Andrew) It is, but it comes from ldap. Would be nice if we could get it from metadata instead (but it isn't in there :( )
[22:38:38] twentyafterfour: I'm going to disable extensions and see if that fixes it
[22:41:49] and that didn't fix it
[22:47:21] is something up with tools-redis? It seems like keys get deleted automatically within a couple of seconds
[22:49:22] Wikibugs: Wikibugs should not announce new project additions unless the new project makes it be announced in a separate channel - https://phabricator.wikimedia.org/T94318#1159680 (valhallasw) I don't get what you mean with "unless the new project makes it be announced in a separate channel" -- do you mean y...
[22:52:45] sitic: are you setting an expiry time on them?
[22:54:56] PROBLEM - Puppet failure on tools-exec-24 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[22:56:31] sitic: I know there was an issue with mass amounts of keys without an expiry
[22:56:47] it OOMed the server several times
[22:57:35] I saw https://phabricator.wikimedia.org/T91979
[22:57:42] they might have added a script to remove all keys without an expiry
[23:00:28] I don't think we did that
[23:00:29] I tried it with a key with an expiry, still gets deleted before it should
[23:00:34] it might just be full?
[23:00:48] * legoktm points to YuviPanda
[23:00:52] seems like it: "used_memory_human:12.00G"
[23:01:57] sitic: max is 15g
[23:02:17] No, it's 12g
[23:02:23] I reduced it from 15
[23:02:30] Basically, if all 12g is full
[23:02:40] Then it starts evicting everything with a ttl
[23:02:42] And then just everything
[23:02:44] I get "libgcc_s.so.1 must be installed for pthread_cancel to work" when trying to run a PHP job on the tool labs grid. Any known workarounds?
[23:03:03] afeder: increase memory granted? -mem 1g as param
[23:03:20] sitic: Betacommand: basically it means that redis is full and it is trying to be unfull.
[23:03:38] I'll have to do an audit of who uses the most space and see if they can set TTLs
[23:03:46] And maybe even set up a few more instances
[23:04:33] YuviPanda: worked, thanks
[23:05:38] Yw
[23:14:55] RECOVERY - Puppet failure on tools-exec-24 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:19:43] Negative24: what's this: "This is a very early prototype of a persistent column. It is not expected to work yet, and leaving it open will activate other new features which will break things. Press "\" (backslash) on your keyboard to close it now."
[23:19:52] maybe something new from upstream
[23:25:44] Negative24: the problem goes away when logged in. you can toggle that panel with backslash. definitely upstream brokenness, probably can be fixed by pulling from upstream master
[23:39:56] twentyafterfour: I have no idea.
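Given the eviction behavior YuviPanda describes above (once the 12g cap is hit, keys with a TTL are evicted first, then everything), setting an explicit expiry on every key is the safer pattern. A sketch using redis-py; the key name and TTL are invented for illustration, and the tools-redis hostname is the one mentioned in the discussion.

```python
# Sketch: always give tools-redis keys an expiry so eviction pressure
# has something orderly to reclaim. Key name and TTL are made up.
import redis

r = redis.Redis(host='tools-redis')
r.setex('mytool:rc-cache:Example_page', 3600, 'some value')  # 1 hour TTL
print(r.ttl('mytool:rc-cache:Example_page'))  # remaining lifetime in seconds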
[23:40:12] twentyafterfour: It's not going away when I'm logged in
[23:40:40] and toggling it doesn't do it permanently, and the other visual glitches still persist
[23:43:34] I'm going to check out master and see if that does anything
[23:45:22] twentyafterfour: Heh. Going to master fixes it.
[23:45:42] along with a really weird icon theme that's not in upstream nor prod
[23:46:07] but that means that prod has this bug