[00:33:28] are there issues with tools-login.wmflabs.org ssh ?
[00:34:16] it lags saving files in nano
[00:34:18] and so on
[00:35:07] Coren - anything on?
[00:36:11] doctaxon, yes, try now
[00:36:32] yes, it's interrupting meanwhile
[00:37:26] although.... I know ldap broke which caused issues logging in, but I don't think that'd cause file writing to be affected
[00:38:11] Krenair: last two days it emptied the file to zero !!!
[00:40:09] Krenair: do you try ldap tests these days?
[00:40:34] I can log in to labs at the moment
[00:40:46] yes, now it's okay
[00:40:57] but it interrupts then and then
[00:42:44] it what?
[00:44:27] it is lagging time to time
[00:44:44] doctaxon: ‘time to time’ meaning a few times per day or a few times per hour?
[00:45:50] I noticed it three times this hour
[00:46:20] ok — please let me know if it happens again soon?
[00:46:34] sure
[00:46:50] can you tell me anything about that?
[00:47:33] We’ve had issues with labs seizing up over the last few days
[00:47:50] it’s not yet well-understood. But one possible culprit is the ldap server which was just now restarted.
[00:47:52] I noticed
[00:48:16] So, maybe it’s cured at least for another 12 hours or so?
[00:48:22] If it continues to stutter, that’s a datapoint.
[00:49:02] I got a ldap error with "become"
[00:49:44] but I use tools-login.wmflabs.org
[00:50:27] "cannot connect"
[00:50:53] about 00:30 UTC
[01:13:43] andrewbogott: it lags right now
[01:13:53] saving a file with nano
[01:14:18] doctaxon: ok — I see an alert in another channel which is surely related. Thanks!
[01:14:26] oh
[01:16:25] still stuck? Or just slow?
[01:16:27] stuck
[01:16:28] yeah I think NFS is stuck
[01:17:02] now the file is written
[01:17:59] I am on my phone are you guys around to deal with tools home?
[01:18:21] chasemp: yeah, just LDAP failing.
[01:18:25] chasemp: I'll call you if need be.
[01:18:26] dunno what it would be this time
[01:18:27] ah
[01:18:54] any clue what happened to ldap?
[01:19:38] chasemp: it fell over (FD exhaustion)
[01:19:42] got restarted
[01:20:25] Oh gotcha, how do we know fd issues? I will note similar time of night to last time as well
[01:20:38] i wonder if some job hammers it
[01:21:10] fwiw if we get desperate an idletimeout did work to cull bad clients
[01:21:52] Granted 10s was aggressive, maybe 300s? Anyway i am a call away if i can help
[01:22:14] chasemp: yes, there's a 10s idletimeout now.
[01:22:23] chasemp: yup, parav.oid is on it now.
[01:22:33] 10 *minutes*
[01:22:45] oh
[01:22:47] ok
[01:25:32] Ok makes sense i hope that does it, thanks paravoid
[01:25:33] as I said in the other channel... https://graphite.wikimedia.org/render/?width=586&height=308&target=servers.seaborgium.openldap.conns.current
[01:25:33] it lags, because... our pipeline sucks
[01:25:34] but better than nothing
[01:25:34] I'll let it collect some data and then we can create a grafana dashboard
[01:29:08] API does not work?
[01:33:32] chasemp: you configured diamond initially, right?
[01:33:39] why are we pushing to statsd and not to graphite directly?
[01:35:19] statistics i imagine at this point but that is all ori and godog at this point
[01:36:33] 95th percentile etc, i think ppl prefer it mainly and also there is some statsdlb and statsite layering i dont entirely get now
[01:36:42] igloo seems to be stuck on tools. Known?
[01:37:03] diffusion of load? Not sure I didnt have much to do with it
[01:37:04] doctaxon: what do you mean by 'API does not work'?
[01:37:10] Negative24: 'igloo'?
[01:37:21] oh it was my fault sorry
[01:37:29] doctaxon: np :)
[01:37:55] Ok batteries at 3% i gotta get to charger
[01:39:44] YuviPanda: [[WP:Igloo]] and same name on tools
[01:40:23] Negative24: http://tools.wmflabs.org/igloo/ wfm?
[01:41:27] YuviPanda: that's the web server.
It's a gadget on enwiki and isn't responding to requests
[01:41:40] I don't know if that's a tools problem or igloo
[01:41:58] Labs, netops, operations, Patch-For-Review: Create labs baremetal subnet? - https://phabricator.wikimedia.org/T121237#1874862 (Dzahn)
[01:42:04] I can see if there's a problem on the backend if you can point me to a specific request that's failing
[01:42:09] I see that the webservice is up and working
[01:44:32] YuviPanda: nm, its probably not something with tools. All I know is that chrome is waiting on tools.wikimedia.org forever
[01:44:33] Labs, Wikimedia-Extension-setup, wikitech.wikimedia.org, Mobile: Install MobileFrontend on wikitech - https://phabricator.wikimedia.org/T87633#1874870 (Dzahn) I also think that's what we have wikitech-static for. But if we decline this, then fine, then we should just delete the DNS entry,
[01:44:49] Negative24: hit refresh and see what happens :)
[01:45:24] still going
[04:56:28] Labs, Tool-Labs, Gerrit: git clone operations/mediawiki-config on tool labs fail: recursion detected in die_errno handler - https://phabricator.wikimedia.org/T106393#1875009 (liangent) I can now reproduce it in VisualEditor: ``` tools.liangent-php@tools-bastion-01:~/mw/extensions$ cat clone.sh #!/bin...
[05:11:49] ssh to bastion.wmflabs.org seems to be failing
[05:12:11] Connection closed by 208.80.155.129
[05:13:34] tools-login is working fine
[05:15:03] Coren, ^
[05:15:12] YuviPanda, ^
[05:18:21] https://phabricator.wikimedia.org/P2417
[05:35:09] Glaisher, I sent Coren a message
[05:38:25] Krenair: Thanks.
[05:38:29] I just tried sshing on my windows pc as well but it's failing there as well
[05:39:22] yeah it doesn't work for me either
[05:39:30] definitely something up on the server side
[05:41:03] Labs, MediaWiki-extensions-Newsletter: Create a larger newsletter-test instance in labs - https://phabricator.wikimedia.org/T120516#1875027 (Glaisher) Thank you very much bd808!
I just set up the web proxy at https://newsletter-test.wmflabs.org again. But due to some issues with ssh, I haven't been able t...
[05:41:36] should probably file a bug about it then
[05:43:28] Labs: sshing to bastion.wmflabs.org fails with "Connection closed by 208.80.155.129" - https://phabricator.wikimedia.org/T121302#1875028 (Glaisher) NEW
[06:48:43] same, just noticed i can't login :(
[06:50:37] Labs: sshing to bastion.wmflabs.org fails with "Connection closed by 208.80.155.129" - https://phabricator.wikimedia.org/T121302#1875051 (EBernhardson) Doesn't seem isolated to Glaisher, i just noticed same problem
[07:09:52] Labs, Datasets-Archiving, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: [Bug] Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1875055 (Hydriz) Invalid>Open Clearly it hasn't been fixed.
[07:10:01] Labs, Datasets-Archiving, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: [Bug] Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1875058 (Hydriz) a:Hydriz>None
[07:34:39] PROBLEM - Host tools-worker-04 is DOWN: CRITICAL - Host Unreachable (10.68.16.122)
[09:22:13] Krenair: Glaisher around? can you try sshing now?
[09:23:48] YuviPanda: works for me now. It didn't a few hours ago
[09:23:58] zhuyifei1999_: where are you sshing to?
[09:24:04] I just hand-hacked a fix in
[09:24:27] tools-bastion-01.eqiad.wmflabs via bastion
[09:25:31] ah ok
[09:26:21] tools-dev.wmflabs.org worked when tools-bastion-01.eqiad.wmflabs via bastion failed (ik they are different hosts, but I didn't check tools-login.wmflabs.org or tools-bastion-02.eqiad.wmflabs)
[09:26:39] yeah, .wmflabs.org go directly
[09:26:46] ik
[09:26:57] (just to test the bug)
[09:26:58] I know that tools-login.wmflabs.org worked for me
[09:29:25] hmm why isn't there a wikibugs message for your comment to T121302?
[09:30:08] ugh
[09:30:10] it probably died too
[09:30:29] ctcp the bot works
[09:30:49] I restarted both the bots
[09:32:36] also, could you check if the crontab on tools-submit is working? One of my jobs that's supposed to submit 3 times an hour had no log output nor qstat output since 3:10 UTC
[09:33:17] lst submit should be 3 minutes ago, still nothing
[09:33:25] manual jsub works
[09:33:28] *last
[09:34:25] ugh
[09:34:29] what's your tool name
[09:34:38] yifeibot
[09:34:45] the job is flr
[09:35:13] for this bot https://commons.wikimedia.org/wiki/Special:Contributions/FlickreviewR_2
[09:38:40] PROBLEM - Puppet failure on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:41:30] ok, I'm not sure at all why cron isn't working
[09:41:38] :/
[09:42:40] oh
[09:42:42] Dec 12 09:42:02 tools-submit CRON[24428]: Permission denied
[09:42:48] Dec 12 09:42:10 tools-submit nslcd[29853]: [e85ae0] passwd entry uid=80686,ou=people,dc=wikimedia,dc=org denied by validnames option: "80686"
[09:43:02] zhuyifei1999_: can you file a bug?
[09:43:21] k
[09:46:31] Labs, Tool-Labs: cron fail on tools-submit - https://phabricator.wikimedia.org/T121305#1875157 (zhuyifei1999) NEW
[09:47:43] Labs, Tool-Labs: cron fail on tools-submit - https://phabricator.wikimedia.org/T121305#1875165 (yuvipanda) p:Triage>High Seems LDAP related: ```Dec 12 09:42:02 tools-submit CRON[24437]: Permission denied Dec 12 09:42:02 tools-submit CRON[24428]: Permission denied Dec 12 09:42:02 tools-submit CRON[2...
[09:53:16] zhuyifei1999_: can you see if any of your crons have run?
[09:53:33] like right now?
[09:54:06] zhuyifei1999_: in the last few hours
[09:54:13] hmm
[09:56:06] most jobs don't produce very obvious log outputs
[09:58:22] nope, one job that's 17 * * * * has no "job already running messages" since [Sat Dec 12 02:17:03 2015]
[09:59:04] ok
[09:59:05] it's continuous, so it should have that message every hour
[10:03:01] YuviPanda: all my cron sge submissions are not working, too
[10:03:55] Labs: sshing to bastion.wmflabs.org fails with "Connection closed by 208.80.155.129" - https://phabricator.wikimedia.org/T121302#1875203 (fgiunchedi) "fixed" also bastion-restricted-01: ``` root@bastion-restricted-01:~# cat /etc/security/access.conf -:ALL EXCEPT (ops) root:ALL root@bastion-restricted-01:~# p...
[10:04:23] I'm relocating, I'll take a look in about 30mins again. sorry.
[10:04:35] last run was at 2:51 UTC
[10:06:03] k
[10:06:11] just for your info, no emergency request for me
[10:06:21] scratch that, I'm back and looking
[10:06:26] thanks Merlissimo, last run info is helpful
[10:06:42] this breaks a lot of bots wikis rely on doesn't it
[10:07:46] lol unless it's a continuous bot
[10:08:30] yeah
[10:08:34] RECOVERY - Puppet failure on tools-docker-builder-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:08:45] !log tools restarted cron on tools-submit
[10:08:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[10:09:43] hmm next crontab submit is in a minute
[10:10:00] let me see if the restart helped
[10:10:51] ... and qstat says nothing about flr
[10:11:49] Labs, Tool-Labs: cron fail on tools-submit - https://phabricator.wikimedia.org/T121305#1875209 (Stigmj) Same problem for my account. Last task run was 03:10.
[10:14:19] godog: I think the cron issues are related to PAM
[10:15:37] YuviPanda: could be
[10:16:55] zhuyifei1999_: have a bot running anytime soon?
[10:17:44] I can reschedule
[10:18:01] zhuyifei1999_: <3 that'd be lovely
[10:18:39] actually I'll do a simple bash script
[10:21:19] job "jsubtest" for every five minutes
[10:23:52] so next run is in two minutes, outputting date & time to jsubtest.err
[10:24:14] I see other bots running now
[10:25:43] yep got one Sa 12. Dez 10:25:05 UTC 2015
[10:26:07] \o/
[10:26:09] cool
[10:26:11] let me update bug
[10:26:42] Labs, Tool-Labs: cron fail on tools-submit - https://phabricator.wikimedia.org/T121305#1875213 (yuvipanda) https://gerrit.wikimedia.org/r/#/c/258663/ was hand-reverted on tools-submit, bringing cron back. Will need more thorough investigation when I'm more awake...
[10:28:56] "I'm more awake" lol
[10:29:41] hehe, it's 2:30 AM
[10:29:47] and this week has had me sleep not very much
[10:31:11] :/
[10:31:50] 2:30 AM sound like a very bad time to stay awake
[10:32:43] part of being ops, I guess :) it always falls apart when you go 'ok I will take one look before I go to sleep'
[10:32:52] then bam bam bam everything is on fire and you gotta put it out
[10:33:32] lol
[10:34:16] now I wonder how many hours jynus sleep every day
[10:34:32] lol
[10:35:08] jynus never sleeps
[10:35:13] what is this sleeping that you continue talking about
[10:35:31] 'jynus never sleeps' I ask google
[10:35:42] 'Genius never sleeps' it tells me: https://www.psychologytoday.com/blog/mr-personality/201107/creative-insomnia-genius-never-sleeps
[10:36:01] you are definitely a genius, I am not
[10:37:02] clearly explains why I've no idea what I'm doing :D
[10:42:23] hmm I just googled that and found "mysql> SELECT *, sleep(10) FROM test ORDER BY rand();" in his blog I think
[10:43:45] hahah :D
[10:44:48] I wondered what was the context of that ugly query
[10:45:16] and it was to show the usage of temporary tables, not for anything productive
[10:46:02] whatever
[10:46:51] :D
[10:47:08] I wonder if there are random sleep() calls in our NFS setup somewhere
[10:49:35] zhuyifei1999_: btw,
PAWS is a lot more stable now
[10:49:40] I'm going to publicly announce it next week
[10:49:49] ok let me check
[10:53:49] hmm the old "test.py" script running forever
[10:54:25] ?
[10:54:48] import pywikibot
[10:54:49] s = pywikibot.Site()
[10:54:49] p = pywikibot.Page(s, "Test")
[10:54:49] p.get()
[10:55:18] same for curl www.google.com
[10:55:21] zhuyifei1999_: ...
[10:55:23] wat
[10:55:25] I thought I had fixed this
[10:55:27] and I had...
[10:55:46] maybe I should restart or something?
[10:55:54] goddamit
[10:55:56] it's back
[10:56:21] zhuyifei1999_: I killed yours.
[10:56:28] I think one node is pretty badly fucked up
[10:56:42] * YuviPanda checks
[10:56:49] zhuyifei1999_: if you try again you should hopefully get a good node
[10:57:16] ok
[10:59:30] zhuyifei1999_: working now?
[11:00:03] firefox says "The page isn't redirecting properly" on https://tools.wmflabs.org/paws/hub/user/Zhuyifei1999/
[11:00:08] let me try again
[11:00:37] Ah I see
[11:00:48] cached 302 loop
[11:00:51] ah right
[11:01:03] did a hard refresh clean them up?
[11:01:14] https://tools.wmflabs.org/paws/user/Zhuyifei1999/ => https://tools.wmflabs.org/paws/hub/user/Zhuyifei1999/ and back again
[11:01:37] is it still in that loop?
[11:01:43] yeah
[11:02:29] zhuyifei1999_: I just killed everything. try again?
[11:02:48] ctrl-shift-r doesn't seem to work
[11:02:53] yep now works
[11:04:26] I need to make this a lot more robust
[11:04:28] tomorrow, I guess
[11:04:47] zhuyifei1999_: do you have working network access now
[11:05:12] yep now the networks works
[11:05:28] cool
[11:05:33] just tested curl www.google.com and python test.py
[11:05:55] zhuyifei1999_: if you do 'pwb.py login' it'll log you in without asking for your password
[11:06:07] I didn't notice google has this many scripts lol
[11:06:43] nice
[11:07:45] there's even a links browser there now :)
[11:07:52] appearance logging into other wikis doesn't work
[11:08:06] https://www.irccloud.com/pastebin/tVkNwnK4/
[11:09:13] zhuyifei1999_: hmm, I wonder if that's a pywikibot bug
[11:09:28] zhuyifei1999_: can you try creating a 'user-config.py' file in your homedir and setting myfamily and mylang?
[11:09:34] and then trying pwb.py login?
[11:09:38] ok
[11:10:15] 'usernames[family]['*'] = os.environ['JPY_USER']' I wonder why is this limited to only the set family
[11:10:26] zhuyifei1999_: because ['*'] doesn't work
[11:10:45] zhuyifei1999_: if there aren't that many families I can list them all tho
[11:11:07] https://github.com/yuvipanda/paws/blob/master/singleuser/user-config.py is where the source is
[11:11:48] ik
[11:12:05] yep with customized user-config.py it works
[11:12:18] right
[11:12:48] https://phabricator.wikimedia.org/T120334 is the bug in pwb
[11:15:33] yeah I see why, https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/site.py#L1994
[11:16:04] it's checking if the username in user-config.py and the oauth username match
[11:16:18] and the former is a None
[11:16:25] right
[11:16:36] I wonder if we can just list all the families and set usernames for them
[11:17:06] https://gerrit.wikimedia.org/r/#/c/257165/ might or might not solve this too I guess
[11:17:39] I'm not entirely sure about setting on all families.
you don't want tokens to go to third-party wikis
[11:18:32] all wikimedia families I guess
[11:18:34] ideally there should be some kind of wmf-wiki selector
[11:19:33] yeah
[11:21:14] maybe it's not that hard to implement. If I remember correctly, there's a superclass for all wmf wikis
[11:21:44] it would be great if you can mention that in the bug or just send me a Pull request :D
[11:22:41] ugh, it's not yet implemented in pywikibot-core
[11:28:44] hmm the operation of loading all family files then check whether it's a subclass WikimediaFamily might be very expensive
[11:28:57] zhuyifei1999_: can't we just hardcode the list of wikimedia families?
[11:29:06] I don't think there are gonna be any new families for a while
[11:29:26] ugh
[11:32:59] https://github.com/wikimedia/pywikibot-core/search?utf8=%E2%9C%93&q=WikimediaFamily <= everything's here
[11:35:37] so if we hardcode all the projects specified there...
[11:36:19] shit, I should sleep
[11:36:26] I try to make sure I'm asleep by 4AM...
[11:36:30] might miss that today
[11:36:35] zhuyifei1999_: do play around and file bugs! <3
[11:36:38] thank you very much :D
[11:36:55] ok
[11:36:57] np
[12:51:31] Labs: sshing to bastion.wmflabs.org fails with "Connection closed by 208.80.155.129" - https://phabricator.wikimedia.org/T121302#1875352 (coren) Open>Resolved a:coren The puppet variable restricted_to was used inconsistently between projects; it was set to the group name everywhere but in project bas...
[13:05:04] Labs, MediaWiki-extensions-Newsletter: Internal error when creating new user in newsletter-test.wmflabs.org - https://phabricator.wikimedia.org/T119945#1875359 (Glaisher)
[13:05:07] Labs, MediaWiki-extensions-Newsletter: Create a larger newsletter-test instance in labs - https://phabricator.wikimedia.org/T120516#1875357 (Glaisher) Open>Resolved Newsletter extension has also now been enabled. Thanks again.
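(Editor's note on the user-config.py idea discussed between 11:10 and 11:35: the following is only a minimal sketch of the "hardcode the list of wikimedia families" suggestion. `WIKIMEDIA_FAMILIES` and `build_usernames` are hypothetical names, and the family list is an illustrative, incomplete assumption rather than the set actually defined in pywikibot-core. Only the `usernames[family]['*']` pattern and the `JPY_USER` environment variable come from the log. Families outside the list get no username, so nothing would be configured for third-party wikis.)

```python
import os
from collections import defaultdict

# Assumption: an illustrative, incomplete list of Wikimedia family names.
# The real set would have to be taken from the WikimediaFamily subclasses
# in pywikibot-core, as suggested in the conversation.
WIKIMEDIA_FAMILIES = [
    "wikipedia",
    "wiktionary",
    "wikibooks",
    "wikinews",
    "wikiquote",
    "wikisource",
    "wikiversity",
    "wikivoyage",
    "commons",
    "wikidata",
    "meta",
]


def build_usernames(user, families=WIKIMEDIA_FAMILIES):
    """Return a usernames mapping with '*' (every language) set per family.

    Hypothetical helper: it mirrors the per-family form
    usernames[family]['*'] = user quoted in the log and does not touch
    any family that is not explicitly listed.
    """
    usernames = defaultdict(dict)
    for family in families:
        usernames[family]["*"] = user
    return usernames


if __name__ == "__main__":
    # JPY_USER is the variable quoted in the log; the fallback is a placeholder.
    names = build_usernames(os.environ.get("JPY_USER", "ExampleUser"))
    for family in sorted(names):
        print(family, names[family]["*"])
```

(In an actual user-config.py the `usernames` object is already provided by pywikibot, so only the loop body would be needed there; the standalone function is used here so the sketch runs on its own.)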
[13:05:27] Labs, MediaWiki-extensions-Newsletter: Internal error when creating new user in newsletter-test.wmflabs.org - https://phabricator.wikimedia.org/T119945#1875361 (Glaisher) Open>Resolved See {T120516}
[13:05:31] Labs, MediaWiki-extensions-Newsletter: Internal error when creating new user in newsletter-test.wmflabs.org - https://phabricator.wikimedia.org/T119945#1875371 (Glaisher)
[14:39:22] Labs: Can't ssh to social-tools1 from bastion-01 - https://phabricator.wikimedia.org/T121313#1875463 (ashley) NEW
[15:28:39] Labs, Tool-Labs: Labs: Move tools-shadow off the same host as tool-master - https://phabricator.wikimedia.org/T103390#1875517 (Aklapper) >>! In T103390#1808581, @coren wrote: > @yuvipanda: It looks like the test is broken, rather than host distribution. Does somebody plan to fix the test?
[15:48:44] (CR) Aklapper: "Amire80, Ricordisamoa: Could these 13 lines get a review / decision what do here? No updates for 5 months and this patch is rotting..." [labs/tools/translatemplate] - https://gerrit.wikimedia.org/r/225267 (owner: Ricordisamoa)
[17:29:09] Hi, I am trying to process this SQL on labs 'select * from categorylinks where cl_to like "Vesnice%" or cl_to like "Město%" limit 10' but output is served in bad way. See screenshot on http://urbanecm.8u.cz/wikipedia/labsSql.png . Can you help me?
[17:29:59] why is it a bad way?
[17:31:53] because I cannot see it. 10 rows must fit in one screen and I can see only the bottom.
[17:32:54] scroll up? :)
[17:34:53] I think that this isn
[17:34:57] sorry
[17:35:17] I think that this isn't normal (see http://urbanecm.8u.cz/wikipedia/labsSql2.png )
[17:40:51] and, by the way, does somebody know why almost all queries in quarry are in queued state?
[18:53:12] quarry.wmflabs.org gives 502 Bad gateway when trying to login. Known issue?
[19:10:31] Stigmj: File a phabricator task?
[19:43:11] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/HakanIST was created, changed by HakanIST link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/HakanIST edit summary: Created page with "{{Tools Access Request |Justification=modifying existing tools to fix some caveats (especially rech tool ) |Completed=false |User Name=HakanIST }}"
[20:44:46] (CR) Ricordisamoa: "I find this solution dirty because of manual encode/decodes, and a bit pointless since Tool Labs seem to support Python 2 only. I guess on" [labs/tools/translatemplate] - https://gerrit.wikimedia.org/r/225267 (owner: Ricordisamoa)
[22:59:45] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/HakanIST was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=226733 edit summary: