[00:37:17] Tool-Labs-tools-wikiloves: Criar pagina de configuração da ferramenta no commons - https://phabricator.wikimedia.org/T130240#2139636 (Danilo) Thanks Platonides! [[ https://github.com/ptwikis/wikiloves/blob/master/config.json | I rewrote the configuration ]] based on the templates' histories and on the configur...
[01:29:07] hi, I'm getting error 1044 (access denied) after following these instructions https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Steps_to_create_a_user_database_on_tools-db
[01:29:25] should I ask for access to this db?
[01:38:12] it seems that the only grants I have are SELECT and SHOW VIEW according to the 'show grants' command and http://dev.mysql.com/doc/refman/5.7/en/grant.html
[01:45:16] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:33:10] Labs: Enforce true multi-tenancy for labs public DNS - https://phabricator.wikimedia.org/T130032#2139713 (Andrew)
[03:44:14] Labs, Patch-For-Review: Create web-proxy editing panel in Horizon - https://phabricator.wikimedia.org/T124183#2139715 (Andrew)
[03:44:16] Labs, Horizon, Patch-For-Review: Horizon dashboard for managing http proxies for labs instances - https://phabricator.wikimedia.org/T129245#2139716 (Andrew)
[03:44:48] Labs, Horizon, Patch-For-Review: Horizon dashboard for managing http proxies for labs instances - https://phabricator.wikimedia.org/T129245#2099584 (Andrew)
[03:44:50] Labs: Switch to using Horizon/Designate for labs public dns - https://phabricator.wikimedia.org/T124184#2139717 (Andrew)
[03:55:47] RECOVERY - Puppet run on tools-worker-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:00:00] Labs, Horizon, Patch-For-Review: Horizon dashboard for managing http proxies for labs instances - https://phabricator.wikimedia.org/T129245#2139725 (Andrew) Regarding policies... I think we can just re-use the create/delete record policies from designate.
If the proxy policies were ever less restrict...
[04:06:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[04:06:31] RECOVERY - Puppet run on tools-exec-1201 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:14:31] Labs, Horizon, Patch-For-Review: Horizon dashboard for managing http proxies for labs instances - https://phabricator.wikimedia.org/T129245#2139741 (Andrew) I'd also like to see this code in a gerrit patch ASAP -- as a backup mechanism if nothing else.
[07:44:47] Labs: Writes to encoding03.video.eqiad.wmflabs 140G vd hangs forever. - https://phabricator.wikimedia.org/T130577#2139880 (zhuyifei1999)
[07:46:44] PeterBowman, what is your user account?
[09:21:37] Labs, Tool-Labs: Korean Locale Installation - https://phabricator.wikimedia.org/T130532#2140009 (Ykhwong) I can limit my request to the locales as follows: * ko_KR.euckr : cp949 and its extension are compatible with this charset. My application depends on this charset. * ko_KR.utf8 : For unicode compatibi...
[09:45:08] Labs, Labs-Infrastructure: ssh-key-ldap-lookup should support multiple ldap servers - https://phabricator.wikimedia.org/T130583#2140045 (fgiunchedi)
[09:54:18] Labs, Tool-Labs: Offer Korean Locales "ko_KR.euckr" and "ko_KR.utf8" on Tool Labs - https://phabricator.wikimedia.org/T130532#2140084 (Aklapper)
[11:09:19] during the reimport, I will need to make sure replication lag is close to 0, so I will kill every query that is lagging replication
[11:10:27] those take the form of "DELETE FROM WHERE NOT EXISTS (SELECT FROM )"
[11:12:39] either SELECT and then DELETE, or do not use MyISAM tables
[11:18:22] Labs, Tool-Labs: Some of my tools don't have .my.cnf / can't create databases in tools-db - https://phabricator.wikimedia.org/T50950#533327 (PeterBowman) @coren: I hit a similar issue when trying to create a database with my tool (pbbot): ``` $ cat replica.my.cnf | grep user user='s52584' $ sql local Mari...
[11:19:09] thank you, dear bot
[11:19:43] Labs, Labs-Infrastructure, Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2140296 (fgiunchedi)
[11:31:00] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140339 (jcrespo)
[11:32:38] Labs, Tool-Labs: Some of my tools don't have .my.cnf / can't create databases in tools-db - https://phabricator.wikimedia.org/T50950#533327 (jcrespo) @PeterBowman See T130595. Comment there, this one is an old, resolved issue.
[11:33:03] PeterBowman, which database do you want to access, local?
[11:33:15] yes, tools-db
[11:33:42] ok, I will give you the permissions manually, but a proper fix will be needed
[11:33:59] keep an eye on https://phabricator.wikimedia.org/T130595
[11:34:09] ok, thank you :)
[11:35:34] PeterBowman, try now
[11:35:38] jynus: I recall that my tool was created during a major labs outage, and replica.my.cnf didn't show up until several days (maybe weeks) later
[11:35:39] ok
[11:35:49] ah, it could be related
[11:36:52] Query OK, 1 row affected (0.00 sec)
[11:36:53] check that the current issue is solved, then make sure the ticket is followed up so that it is solved for all users
[11:37:08] ok, I'll post the result of this
[11:37:34] Labs, Labs-Infrastructure, Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2140368 (MoritzMuehlenhoff) There's a similar report for openldap 2.4.40 at http://www.openldap.org/lists/openldap-technical/201504/msg00005.html There are three memory leak fixes related t...
[11:38:44] Labs, Horizon, Patch-For-Review: Horizon dashboard for managing http proxies for labs instances - https://phabricator.wikimedia.org/T129245#2140369 (AlexMonk-WMF) >>! In T129245#2139741, @Andrew wrote: > I'd also like to see this code in a gerrit patch ASAP -- as a backup mechanism if nothing else. I...
[11:40:06] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140370 (jcrespo) p:Triage>Normal I've manually added privileges to the affected user, s52584 on labsdb1005, but the issue should be checked (was it a one-time scrip...
[11:48:02] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140390 (PeterBowman) Thanks! I'm including this possibly relevant comment on IRC, the mentioned outage happened in June 2015 (I presume it was [[https://wikitech.wikimedi...
[13:21:59] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140339 (chasemp) So the user in question was created and did have some permissions but was missing others? It is a periodic script that runs that does this, if it's th...
[13:46:04] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2140537 (chasemp) p:Triage>Normal
[13:56:59] Labs, WMF-Legal: Ensure that Terms of Use document restrictions on third-party web interactions - https://phabricator.wikimedia.org/T129936#2120419 (chasemp) hey @ZhouZ! Thanks for digging into this. >>! In T129936#2132231, @bd808 wrote: > I'd be all for just disallowing, but yeah my word of mouth underst...
[14:00:59] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140577 (jcrespo) It is happening for all servers. I suspect a wrong grant for new accounts.
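(Editor's note: the symptom reported above, an account whose 'show grants' output lists only SELECT and SHOW VIEW, can be spot-checked with a small script. The following is a minimal sketch, not the script used by the Labs admins. It assumes the grant lines have already been fetched as plain strings; the set of "required" privileges, the helper names, and the sample grant strings in the usage note are all hypothetical. It ignores which database each grant is scoped to and does not handle column-level grants.)

```python
import re

# Privileges assumed (for illustration only) to be needed for a personal database.
REQUIRED = {"CREATE", "INSERT", "UPDATE", "DELETE"}


def privileges(grant_line):
    """Return the privileges named in one line of SHOW GRANTS output."""
    match = re.match(r"GRANT\s+(.*?)\s+ON\s+", grant_line, re.IGNORECASE)
    if not match:
        return set()
    return {priv.strip().upper() for priv in match.group(1).split(",")}


def missing_privileges(grant_lines):
    """Return the REQUIRED privileges that no grant line mentions.

    Simplification: scope (the database pattern after ON) is ignored.
    """
    held = set()
    for line in grant_lines:
        held |= privileges(line)
    if "ALL PRIVILEGES" in held or "ALL" in held:
        return set()
    return REQUIRED - held
```

(Usage, with made-up grant strings: `missing_privileges(["GRANT SELECT, SHOW VIEW ON `%\\_p`.* TO 's52584'@'%'"])` reports all four assumed privileges as missing, which matches the "SELECT and SHOW VIEW only" situation described earlier in the log.)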
[14:14:38] Labs, Tool-Labs: labsdb accounts being created without grants to create personal databases - https://phabricator.wikimedia.org/T130595#2140613 (chasemp) @jcrespo https://phabricator.wikimedia.org/diffusion/OPUP/browse/production/modules/labstore/files/create-dbusers;1a0a028371723e2efd0bbe7a80bb407a8bd49...
[14:20:08] @zhuyifei1999_ better to ping me in here so others can see but sure I can try to look
[14:20:24] ok
[14:20:25] seems unresponsive to me atm encoding03.video.eqiad.wmflabs
[14:20:34] may need to reboot to further check it out?
[14:20:43] I could ssh in a few hours ago
[14:21:01] atm I cannot
[14:21:08] please don't reboot right now
[14:21:11] um
[14:21:15] Labs: Writes to encoding03.video.eqiad.wmflabs 140G vd hangs forever. - https://phabricator.wikimedia.org/T130577#2140660 (chasemp) p:Triage>Normal
[14:21:23] Labs: Writes to encoding03.video.eqiad.wmflabs 140G vd hangs forever. - https://phabricator.wikimedia.org/T130577#2139880 (chasemp) host is unresponsive atm
[14:21:57] I'm in
[14:22:51] chasemp, there is no obvious problem, but it can be anything from slight differences in the server config, to temporary issues, to many other things
[14:23:00] if you have to reboot, I need to sigterm a few processes to ensure they can be restarted afterwards
[14:23:14] the idea would be to monitor the next account creation and see if it is recurring
[14:23:25] jynus: ok, thanks for looking, I'm not sure what to do there other than ....you beat me to it :)
[14:23:40] if it is not, I can manually add the missing permissions
[14:23:42] I'll try to spot check one in the near future as long as we both have it in mind we'll see I guess
[14:24:00] I don't have much exposure to this job or how it fails etc
[14:24:01] but I didn't want to fix things manually and not report it in case there are larger issues
[14:24:09] is it normal that the last puppet run was 1559 minutes ago?
[14:24:09] chasemp, I have none
[14:24:12] yep understand thanks man
[14:24:23] zhuyifei1999_: no but could be from that high load you mentioned
[14:25:08] the other thing is https://phabricator.wikimedia.org/T130469
[14:25:39] I can do things manually on db side, but I do not know how that is being executed
[14:25:42] um they are nearly all IO waits, shouldn't block other stuff like puppet runs
[14:26:46] top finds 772 minutes of CPU time used by ksoftirqd/0
[14:28:26] jynus: ok I pinged yuvi in that one I'm not sure if there is nuance there I wouldn't know or not
[14:28:37] zhuyifei1999_: where does this try to write files out to?
[14:29:06] ksoftirqd? it's a kernel thread so I have no idea
[14:29:26] you said IO wait, is this NFS?
[14:29:31] no
[14:29:36] local mount
[14:29:42] a vd
[14:30:23] I can't do much w/o being able to access it, and so far no dice on that
[14:30:45] you can't ssh in as root?
[14:31:05] Hey folks. I'm getting bit by the labs proxy and how it handles https. I'm wondering if anyone has a solution to the problem I'm running into.
[14:31:44] So, if you go to "https://ores.wmflabs.org/scores", you'll get redirected to "http://ores.wmflabs.org/scores/"
[14:31:47] zhuyifei1999_: I cannot
[14:31:47] Note the scheme change!
[14:31:56] hmm apparently ls /srv hangs forever as well
[14:32:05] ok I'll prepare for reboot
[14:32:18] This happens because the labs proxy converts the https request to http for ORES.
[14:32:35] But when ORES issues a redirect, it has no idea that the request came from https
[14:32:42] zhuyifei1999_: I'm not sure what the deal is but most probably the host is overwhelmed / storage, depending on what the mechanism writing is
[14:32:42] So it issues a 301 for "http"
[14:32:46] we could try throttling it
[14:32:58] specifically 'pv -L 100M' type thing
[14:33:01] And the labs proxy ignores the protocol changes and just forwards the response.
[14:33:51] Potential solutions: No redirects at all, HTTPS required for all requests, ???
[14:33:58] halfak: is the bad outcome that the redirect url is now inadvertently insecure?
[14:33:59] there should be absolutely no IO since a few hours before it went out
[14:34:23] chasemp, yes, definitely.
[14:34:31] It breaks all sorts of things.
[14:34:52] halfak: https://github.com/wikimedia/operations-puppet/blob/production/modules/wikimetrics/manifests/web.pp#L25
[14:35:10] chasemp, try this URL https://ores.wmflabs.org/v2/
[14:35:24] halfak: so that should fix it, since it is uwsgi that's not doing this redirect properly.
[14:35:32] It fails to load because one of the assets does this redirect and becomes insecure.
[14:35:48] yuvipanda, ? But how would uwsgi know that the request is coming in https?
[14:35:55] Seems like uwsgi is doing the only thing it could do.
[14:35:59] halfak: we set an additional header
[14:36:03] halfak: X-Forwarded-Proto
[14:36:07] Oh!
[14:36:08] Cool
[14:36:27] yuvipanda: oh hi, can you look into T130577, so I don't have to wait for andrew?
[14:36:27] T130577: Writes to encoding03.video.eqiad.wmflabs 140G vd hangs forever. - https://phabricator.wikimedia.org/T130577
[14:36:46] halfak: so if we add that, we make ores https only as well
[14:37:01] zhuyifei1999_: unfortunately that does look like a thing andrewbogott would be best suited to fix
[14:37:12] hmm... Wouldn't be terrible. I'd rather https only than have this bad behavior
[14:37:23] yeah
[14:37:26] :/
[14:37:48] halfak: this is basically how I crossed this similar bridge in prod w/ phab
[14:37:51] https only I mean
[14:38:42] zhuyifei1999_: are you able to get a login there at all?
[14:38:52] yes
[14:39:05] a responsive one
[14:39:25] yuvipanda, shall I make that patch in puppet?
[14:39:26] until querying /srv/ for anything
[14:39:31] halfak: +1
[14:39:35] you tried rebooting already?
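(Editor's note: the fix discussed above was made in the uwsgi/puppet configuration. As a separate illustration of the same idea, that the application can only learn the original scheme from the X-Forwarded-Proto header the proxy sets, here is a minimal WSGI sketch. The class name `ForwardedProtoMiddleware` and the toy `redirect_app` are hypothetical and are not part of ORES or the labs proxy; trusting this header is only reasonable when every request actually passes through the proxy.)

```python
class ForwardedProtoMiddleware:
    """WSGI middleware that adopts the scheme reported by a fronting proxy."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # A proxy chain may send a comma-separated list; the first entry is the client-facing hop.
        proto = environ.get("HTTP_X_FORWARDED_PROTO", "").split(",")[0].strip().lower()
        if proto in ("http", "https"):
            environ["wsgi.url_scheme"] = proto
        return self.app(environ, start_response)


def redirect_app(environ, start_response):
    """Toy app: add a trailing slash, building the URL from the scheme it believes is in use."""
    location = "{}://{}{}/".format(
        environ["wsgi.url_scheme"], environ["HTTP_HOST"], environ["PATH_INFO"]
    )
    start_response("301 Moved Permanently", [("Location", location)])
    return [b""]
```

(Without the middleware, a request that arrived at the proxy over https but reached the app over http would get a redirect to an "http://" URL, which is the behaviour described above; wrapped as `ForwardedProtoMiddleware(redirect_app)`, the Location header keeps the "https" scheme.)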
[14:39:39] reads or writes, both hang
[14:39:43] not yet
[14:39:54] that will kill a few jobs
[14:39:55] RECOVERY - Puppet run on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:40:04] kk yuvipanda
[14:40:04] halfak: you need the 'router_redirect' from line 12 in that file too
[14:40:55] ok, rebooting in a few minutes
[14:41:10] zhuyifei1999_: yeah, that's probably the first thing to try
[14:41:17] since I can't get a shell anyway :(
[14:42:27] Labs, Tool-Labs: Should we change labs and tools proxies to https-only? - https://phabricator.wikimedia.org/T130236#2140810 (Andrew)
[14:42:30] Labs, Tool-Labs, Operations, Traffic, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#2140811 (Andrew)
[14:42:45] Labs, Tool-Labs: Should we change labs and tools proxies to https-only? - https://phabricator.wikimedia.org/T130236#2140813 (yuvipanda)
[14:42:49] yuvipanda, https://gerrit.wikimedia.org/r/278898
[14:42:49] Labs, Tool-Labs, Operations, Traffic, and 2 others: Migrate tools.wmflabs.org to https only (and set HSTS) - https://phabricator.wikimedia.org/T102367#2140814 (yuvipanda)
[14:45:46] um the processes don't even respond to SIGKILL
[14:45:56] force unmounting
[14:46:54] halfak: jenkins -1'd it
[14:47:10] * halfak looks for typo
[14:47:13] halfak: you need the lintignore stuff around the change like in the other place I linked you to
[14:48:52] yuvipanda, got it. Thanks
[14:49:22] halfak: np. it looks good to me, shall I merge it now?
[14:49:46] yuvipanda, yes. It'll take place the next time we restart uwsgi on the web nodes, right?
[14:49:51] ... and hangs
[14:50:06] halfak: nope, should happen immediately on next puppet run
[14:50:12] and will restart the uwsgi bits
[14:50:35] yuvipanda, will the web nodes restart simultaneously?
[14:50:39] halfak: I can also do this during the outage window you already announced if you want to be on the safer side
[14:50:51] yuvipanda, yeah. Seems like a good idea.
[14:50:56] :)
[14:50:59] halfak: when was it again?
[14:51:09] Tomorrow at this time.
[14:51:13] oh I see
[14:51:15] Actually the start of the current hour
[14:51:21] So, 7AM PST
[14:51:24] halfak: hmm I might end up doing it earlier then.
[14:51:38] halfak: I can stop puppet on all of them, and run it on one to see if it works ok :)
[14:51:42] and then re-enable everywhere
[14:51:44] I can have akosiaris merge it tomorrow?
[14:51:55] sure
[14:52:06] andrewbogott: maybe you need to restart via wikitech
[14:52:08] but you'll have the annoying redirect behavior till then
[14:52:14] sudo reboot hangs
[14:52:22] yuvipanda, that's OK. We have had that for a long time.
[14:52:26] zhuyifei1999_: fair enough, I'll try :)
[14:52:28] halfak: cool :)
[14:52:31] halfak: I +1'd it
[14:52:34] Thanks
[14:52:40] halfak: np!
[14:52:58] yuvipanda, one more if you have a minute. https://gerrit.wikimedia.org/r/#/c/278455/
[14:53:05] Should be easy
[14:53:16] !log video rebooting encoding3 because it's mostly broken
[14:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL, dummy
[14:53:23] * halfak punches self
[14:53:28] I think I borked that change
[14:54:46] halfak: minor fix needed
[14:54:48] andrewbogott: just fyi, I got no disconnect message yet
[14:55:06] ok, it'll take a bit
[14:55:13] yuvipanda, rename the class from "base" to "compute"?
[14:55:17] halfak: yeah
[14:55:27] update pushed.
[14:56:41] halfak: done
[14:56:49] Thanks!
[14:58:30] PAWS, Documentation: Documentation for PAWS - https://phabricator.wikimedia.org/T129548#2140823 (yuvipanda) Open>Resolved a:yuvipanda>None
[15:03:09] andrewbogott: the host seems back :)
[15:03:58] zhuyifei1999_: and fixed?
[15:04:34] I can touch /srv/test no problem
[15:04:51] not sure if the problem will reappear
[15:05:23] great, let's close that ticket and can reopen if this happens again
[15:11:04] ok
[15:12:17] Labs: Writes to encoding03.video.eqiad.wmflabs 140G vd hangs forever. - https://phabricator.wikimedia.org/T130577#2140886 (zhuyifei1999) Open>Resolved a:Andrew Rebooted by @andrew via Wikitech
[15:18:07] Labs, Tracking: Increase horizon session length - https://phabricator.wikimedia.org/T130621#2140896 (Andrew)
[16:00:03] Labs, Puppet: Receiving puppet run failure alert for instance where manual puppet runs complete fine - https://phabricator.wikimedia.org/T129403#2141024 (dschwen) Open>Resolved a:dschwen No further emails received. Closing. Thanks!
[16:12:59] jynus: is this information up-to-date? "If you use a user database (see below), connect to a database server instead of one of the aliases. (...) The current database servers are c1.labsdb, c2.labsdb and c3.labsdb."
[16:13:03] it's from https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Naming_conventions
[16:13:30] I'm trying to access tools-db from mysql workbench
[16:14:13] I tried to add a test db (user s52584), and it's not listed here https://tools.wmflabs.org/tools-info/?dblist=tools-db
[16:17:54] well, there is only one accessible physical server for toolsdb, so it does not apply for that
[16:18:26] if in the future we provide more servers, it will be something similar
[16:18:40] right now there is only 1 toolsdb
[16:19:07] that documentation is for replicas
[16:21:38] PeterBowman, I would assume that only lists public databases (*_p)
[16:21:50] (PS1) Glaisher: Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925
[16:22:28] (PS2) Glaisher: Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925
[16:22:32] I can see 4 databases created by you
[16:23:12] (PS3) Glaisher: Improve elections.php
[labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925
[16:24:13] (CR) jenkins-bot: [V: -1] Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[16:24:15] (PS4) Glaisher: Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925
[16:24:37] (CR) jenkins-bot: [V: -1] Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[16:25:12] (CR) jenkins-bot: [V: -1] Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[16:25:27] (CR) jenkins-bot: [V: -1] Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[16:30:41] (CR) Glaisher: "https://gerrit.wikimedia.org/r/278926 drops php53lint test" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[16:46:19] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2141307 (Andrew)
[16:58:51] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2141357 (yuvipanda) I'm going to say '+1 to whatever @bd808 will have said in the future on this ticket'
[17:00:04] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2141380 (Andrew) "When that day comes, you will cry out for relief from the king you have chosen"
[17:08:39] tools login highly slow?
[17:14:55] Steinsplitter: let me check
[17:15:48] ssh to tool labs lasted less than a second once I entered the password
[17:15:54] I'll now become a tool
[17:16:05] \o/ that was fast
[17:16:11] same
[17:16:43] it's pretty dependent on whether users are trying to run things directly on the bastion which we discourage
[17:18:10] There was a hung bash process on there the other day running on one of my tools completely without my knowledge.
It got killed, obviously.
[17:19:13] I wasn't on the user or anything, and I didn't launch it, so I was stumped. I couldn't kill it either, that was the problem.
[17:19:32] to run things ain't dev.tools?
[17:20:58] stuff should run on jobserver.
[17:21:03] I hadn't been on that user for days, so I don't know what was causing it. It was using 99% CPU on the bastion.
[17:21:13] likely screen should be enabled on labs as well :P
[17:21:17] It was a bash process.
[17:21:31] It is.
[17:21:47] Screen and tmux are on there and able to be used.
[17:22:21] *disabled i mean
[17:23:26] tmux only got installed a little while ago.
[17:23:55] It's on there because of those users with unreliable connections I think.
[17:27:33] (CR) MarcoAurelio: "Can I safely merge this then or will jenkins abort merging due to failure on php53?" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[17:28:30] (CR) Glaisher: "This can be merged after that patch is deployed." [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[17:29:46] (CR) MarcoAurelio: "check zend" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[17:47:56] (CR) MarcoAurelio: "recheck" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[17:48:49] (CR) MarcoAurelio: [C: 2] Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[17:49:23] !log tools.stewardbots Dropped support for php5.3 as per https://gerrit.wikimedia.org/r/#/c/278925/
[17:49:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL, Master
[17:49:44] !log tools.stewardbots Merging https://gerrit.wikimedia.org/r/278925
[17:49:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL, Master
[17:53:33] hey everyone how long does it take to get an account approved for
lab access?
[18:03:12] Labs, Tool-Labs, Security-Team: Procure *.tools.wmflabs.org certificate - https://phabricator.wikimedia.org/T130649#2141663 (yuvipanda)
[18:08:26] (Merged) jenkins-bot: Improve elections.php [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[18:12:33] jshulz: it's a bit random, but I'll check the approvals queue for you
[18:25:02] (PS1) MarcoAurelio: Revert "Improve elections.php" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278954
[18:26:48] (CR) MarcoAurelio: "Change reverted since it caused the tool to become a blank page." [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278925 (owner: Glaisher)
[18:27:41] (CR) MarcoAurelio: [C: 2 V: 2] Revert "Improve elections.php" [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/278954 (owner: MarcoAurelio)
[18:28:37] !log tools.stewardbots Change https://gerrit.wikimedia.org/r/278925 reverted in https://gerrit.wikimedia.org/r/278954
[18:28:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL, Master
[18:46:23] Labs, Labs-Infrastructure: Unable to SSH onto tools-login.wmflabs.org - https://phabricator.wikimedia.org/T130446#2141963 (chasemp)
[18:46:25] Labs, Labs-Infrastructure, Patch-For-Review: ssh-key-ldap-lookup should support multiple ldap servers - https://phabricator.wikimedia.org/T130583#2141960 (chasemp) Open>Resolved a:chasemp seems to handle the failure case now
[18:51:48] hi! I'm trying to use wikitools 1.3 in a virtualenv in my tool. however, it fails to query the wikipedia API when running in the grid.
[18:51:54] I think I've tracked this down to the ssl Python module not being present in the grid when running with the virtualenv, which causes urllib2 to fail to handle https, which in turn causes wikitools to fail. is this a known issue?
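(Editor's note: a quick way to confirm the suspicion voiced above, that the interpreter actually running the job cannot load the ssl module, is to ask that interpreter directly. The following is a minimal sketch; the function name `https_support` is hypothetical. It only reports on the interpreter that executes it, so it would need to be run with the virtualenv's own python inside a grid job to be meaningful.)

```python
import importlib
import sys


def https_support():
    """Report whether this interpreter can import the ssl module."""
    try:
        ssl = importlib.import_module("ssl")
    except ImportError as exc:
        # Typical when the virtualenv was built against libraries missing on the execution host.
        return {"ssl": False, "detail": str(exc)}
    return {"ssl": True, "detail": ssl.OPENSSL_VERSION}


if __name__ == "__main__":
    # Print which interpreter is running, then the result of the check.
    print(sys.executable)
    print(https_support())
```

(If the check fails only when submitted to the grid and not on the login host, that points at a mismatch between the machine where the virtualenv was built and the one where it runs.)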
[18:54:58] ggp: it would be if it's an older version of the python-requests package trying to use SSLv3 while that is disabled for security reasons on the server side and should use TLS now
[18:55:37] that specifically if python-requests has been installed using pip
[18:56:00] and pip sucks, not removing the remnants of old versions even when it's told to remove stuff
[18:56:16] and then you still use the old python-requests from whenever ..
[18:56:46] ggp: add '-l release=trusty' to your jsub command
[18:56:55] that'll make it run on trusty nodes, which is what you want
[18:57:09] chasemp: we should change the default sometime soon, this is biting a lot of people
[18:57:39] anecdotally I have seen that as well
[18:58:08] hang on, let me try this from a fresh venv
[18:58:32] I'm not using python-requests as I reproduce this, but I do have it installed. so let me rule that out and use the new option to jsub
[18:59:00] ggp: so if you just use jsub without specifying -l release=trusty, your virtualenv is being built on a trusty machine but run on a precise machine
[18:59:05] which will cause all sorts of problems
[19:01:10] yuvipanda: is there a command line option to specify a virtualenv btw? I'm wrapping my Python script in a shell script that enters the virtualenv first, but that doesn't sound like it would be the proper way :)
[19:01:31] ggp: you can just use the $VIRTUALENVPATH/bin/python to execute your file
[19:01:34] and that should work
[19:01:40] so instead of 'python something.py'
[19:01:42] you would have
[19:01:54] /data/project/$toolname/$virtualenvpath/bin/python something.py
[19:03:20] Labs, Labs-Infrastructure, Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2142031 (MoritzMuehlenhoff) Moving to a backport of 2.4.41 is probably the better solution, though.
[19:07:08] andrewbogott: does the slapd in labtest* also run syncrepl replication?
wondering whether it would be an appropriate testbed for a 2.4.41 backport for T130593
[19:07:08] T130593: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593
[19:07:54] moritzm: should be the same, yes
[19:08:27] oh, wait...
[19:08:31] yuvipanda: aha, this works, thank you!
[19:08:38] no, sorry, my mistake, I was thinking about something else
[19:08:45] moritzm: I think it doesn't, since there's only one of it
[19:10:08] ah, ok so it isn't kept up-to-date with the data from the actual labs slapd, but rather a one-off import
[19:11:21] but it should probably still be useful to check whether a 2.4.41 backport works in general (whether it addresses the memleaks in LDAP replication can probably only be found out in our specific setup)
[19:11:28] moritzm: correct, a one-off
[19:11:40] but yeah, it's not a perfect test but it won't hurt to try there first
[19:12:01] I'll build one in the next few days and run some basic tests, we can then decide whether to install it on labtest
[19:15:05] (PS1) Dzahn: add fake phab deploy private key for compiler [labs/private] - https://gerrit.wikimedia.org/r/278961
[19:15:41] (CR) Dzahn: [C: 2 V: 2] add fake phab deploy private key for compiler [labs/private] - https://gerrit.wikimedia.org/r/278961 (owner: Dzahn)
[19:17:21] Labs, Labs-Infrastructure, Operations: investigate slapd memory leak - https://phabricator.wikimedia.org/T130593#2142143 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[19:17:42] (PS1) Dzahn: move fake phab deploy private key to correct location [labs/private] - https://gerrit.wikimedia.org/r/278962
[19:18:29] (CR) Dzahn: [C: 2 V: 2] "so that https://gerrit.wikimedia.org/r/#/c/274502/ can compile" [labs/private] - https://gerrit.wikimedia.org/r/278962 (owner: Dzahn)
[19:28:02] Labs, Tracking: Increase horizon session length - https://phabricator.wikimedia.org/T130621#2142251 (hashar) From the IRC discussion we had, Horizon sets a cookie `session-id` that
expires after roughly 2 hours and 15 minutes or roughly 8000 seconds. Horizon uses Django configured via modules/openstack/te...
[19:35:32] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2142274 (bd808) The Labs/Tool Labs confusion problem is real. That being said, I'm not sure that either Labs or Tool Labs is big enough without the other to warrant a discrete mailing list today. I'm not highly opp...
[19:38:54] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2142283 (chasemp) One negative thing I have seen with labs-announce is people trying to respond to it directly and that does not work, but for whatever reason I don't see their reply ever surface elsewhere. I'm p...
[19:49:35] Labs, Tool-Labs: Labs/Tools mailing list reform - https://phabricator.wikimedia.org/T130637#2141307 (Legoktm) Does labs-announce have reply-to set to labs-l?
[20:30:49] Tool-Labs-tools-wikiloves: Criar pagina de configuração da ferramenta no commons - https://phabricator.wikimedia.org/T130240#2142413 (Danilo) I put the configuration in [[ https://commons.wikimedia.org/wiki/Module:WL_data | a module on Commons ]], so the same configuration can be reused by modules....
[20:39:51] hello! is there a way I can make my own 500/503/404, etc, HTML pages for my tool?
[20:39:55] on Tool Labs
[20:47:40] PROBLEM - Host tools-docker-builder-01 is DOWN: CRITICAL - Host Unreachable (10.68.19.84)
[20:50:46] MusikAnimal: hmmm... I'm actually not sure if it is possible right now.
The proxy server may take over when it sees those status codes
[20:50:46] yeah just read that at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#Error_pages
[20:50:48] Labs, Tool-Labs: Goal: Allow using k8s instead of GridEngine as a backend for webservices (Tracking) - https://phabricator.wikimedia.org/T129309#2142480 (yuvipanda)
[20:50:48] not sure how the X-Wikimedia-Debug header stuff works, so not going to bother
[20:50:49] Labs, Operations, Patch-For-Review: Setup private docker registry with authentication support in tools - https://phabricator.wikimedia.org/T118758#2142478 (yuvipanda) Open>Resolved I've reverted all the CNAME work - @Joe pointed out that we'll want to have the registry available (in a readonly mo...
[20:50:49] just would be nice to have it show some links to contact the maintainers
[20:50:50] MusikAnimal: it would work for you to see the raw error pages but not generally other users. There are Firefox and Chrome plugins to make using the header from a browser easier
[20:50:50] MusikAnimal: https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[20:51:12] I'm not 100% certain if the labs proxy has been updated to work with the new header standard there though :/
[20:52:46] raw normal HTML files don't need a webserver right? e.g. just browsing as if it were the filesystem itself
[20:54:42] so maybe I could somehow configure Varnish to load those
[20:54:52] MusikAnimal: no, they need a web server. You can't talk HTTP without a web server somewhere
[20:55:10] and we don't have Varnish for Tool Labs
[20:55:12] so no solution to what I'm trying to do then, I guess
[20:55:17] ah
[20:55:41] But Varnish would also need to get content from an http server somewhere
[20:55:41] I was just going by what it said at https://wikitech.wikimedia.org/wiki/Debugging_in_productionl
[20:55:50] right
[20:56:24] yeah.
The header was co-opted to do something similar but different in Tool Labs [20:56:27] bd808: there's tools-static, and tools-static.wmflabs.org/ serves things from ~/www/static, no webservice needed. [20:56:49] yuvipanda: but there is a web server, it's just one that someone else runs [20:56:55] ah true! [20:57:05] just no user controlled one in this case [20:57:06] hey, they works for me, so long as I can tell it what to serve [20:57:13] :( [20:57:14] oh well [20:57:33] I'll just focus on making sure the tool doesn't go down in the first place [20:57:35] MusikAnimal: the default 404 / 500 page links to contact info for maintainers [20:57:36] MusikAnimal: what problem are you hoping to fix? [20:57:48] MusikAnimal: links to their wikitech user page (maybe) [20:58:17] http://tools.wmflabs.org/pageviews-test/ [20:58:24] MusikAnimal: http://tools.wmflabs.org/admin/waaat for example [20:58:33] the default 404 page looks like this -- https://tools.wmflabs.org/?404 [20:58:42] the pageviews-test webservice is not running [20:58:45] 403 == https://tools.wmflabs.org/?403 [20:58:53] and it shows a 503 for some reason [20:58:56] https://tools.wmflabs.org/?500 [20:59:04] https://tools.wmflabs.org/?503 [20:59:27] yeah that 503 page sucks [20:59:41] we don't really have a 503 page I think. that's just nginx default [20:59:48] yeah it is [20:59:49] there is a pretty page but the nginx config eats it somehow [20:59:58] I don't think we've a pretty page for 503 [21:00:20] this is the page they're going to see if I mess up deploying this Intuition update [21:00:25] yuvipanda: we do in the admin tool, but it isn't wired up to nginx properly [21:00:29] we do [21:00:36] error_page 503 /admin/?503; [21:01:00] bd808: 403 and 404 aren't actually hooked [21:01:32] so what makes https://tools.wmflabs.org/admin/?503 not the page that the php would generate? 
[21:02:53] yuvipanda: this is the pretty page that should come out -- https://github.com/wikimedia/labs-toollabs/blob/master/www/content/503.php
[21:03:12] bd808: good question. I don't know
[21:03:20] MusikAnimal has unearthed a bug in our proxy, looks like!
[21:03:36] Labs: Switch to using Horizon/Designate for labs public dns - https://phabricator.wikimedia.org/T124184#2142534 (Andrew)
[21:03:37] woohoo!
[21:03:46] :)
[21:03:59] yuvipanda: oh. I bet I know what the problem is
[21:04:14] https://github.com/wikimedia/labs-toollabs/blob/master/www/index.php#L26-L27
[21:04:20] is it that the upstream lighttpd server is returning 503 that is interpreted by nginx rather than passed?
[21:04:22] * yuvipanda clicks
[21:04:24] the 503 handler is returning a 503
[21:04:31] bam! :D
[21:04:33] it should return a 200
[21:05:05] bd808: woah, HTTP/1.0 503 No Webservice
[21:05:22] that's how you dod status codes in php
[21:05:26] *do
[21:05:28] HTTP/1.0, and that isn't the description for a 503 no?
[21:05:37] also, hahah, nice!
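The diagnosis above (the page that renders the pretty 503 itself answers with a 503, so the proxy treats it as one more failure) can be sketched in Python. This is an illustration only, not the PHP in labs/toollabs: it assumes, as stated later in the channel, that the front-end web server sets the real status for the client, so the handler behind it answers 200. `ERROR_PAGES`, `error_page_app`, and `render` are hypothetical names.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical page titles; the reason text is free-form, only the code matters.
ERROR_PAGES = {
    "403": "Forbidden",
    "404": "Not found",
    "500": "Internal error",
    "503": "No webservice",
}


def error_page_app(environ, start_response):
    """Render the pretty page named by the query string, e.g. /?503."""
    code = environ.get("QUERY_STRING", "")
    title = ERROR_PAGES.get(code, "Error")
    body = f"<h1>{code} {title}</h1>\n".encode("utf-8")
    # Answer 200 here. Assumption: the proxy in front already knows the
    # original status and sends it to the client; if this handler answered
    # 503 as well, the proxy could discard the page and show its default.
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def render(query):
    """Call the app in-process and return (status line, body bytes)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["QUERY_STRING"] = query
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(error_page_app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(render("503"))
```

Calling `render("503")` exercises the handler without a network socket; the status it reports is the one the proxy would see, not the one the end user receives.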
[21:06:14] the text that goes with a status code is arbitrary
[21:06:20] * bd808 will make a patch
[21:06:21] Labs, Tool-Labs: Setup DNS for kubernetes services - https://phabricator.wikimedia.org/T111914#2142537 (yuvipanda) a:yuvipanda>None
[21:06:28] bd808: cool, thanks :D
[21:10:33] Labs, Tool-Labs: Build containers for use with Tool Labs - https://phabricator.wikimedia.org/T130668#2142556 (yuvipanda)
[21:10:39] Labs, Tool-Labs: Build containers for use with Tool Labs - https://phabricator.wikimedia.org/T130668#2142556 (yuvipanda)
[21:10:41] Labs, Tool-Labs: Define base Wikimedia Docker container - https://phabricator.wikimedia.org/T118446#1800488 (yuvipanda)
[21:11:09] (PS1) BryanDavis: Fix 503 pretty page [labs/toollabs] - https://gerrit.wikimedia.org/r/278984
[21:11:29] yuvipanda: ^
[21:12:21] (CR) Yuvipanda: [C: 2] Fix 503 pretty page [labs/toollabs] - https://gerrit.wikimedia.org/r/278984 (owner: BryanDavis)
[21:12:26] bd808: merged
[21:12:43] andrewbogott, yuvipanda: i just clicked on a patch and was suddenly logged out of gerrit, and it won't let me back in. wikitech either. i feel certain that i'm using the right creds, but i'm clearly not. is there any way to tell if the pass has changed recently?
[21:12:43] It needs a followup. there is another busted path
[21:13:12] cwd: 'the pass' meaning your password?
[21:13:19] yes
[21:14:24] I don't know that there's a good way to tell. You can do a password reset on wikitech, I'm not sure what else to suggest.
[21:14:35] Well, that, and check your caps-lock :D
[21:14:48] andrewbogott: :)
[21:14:55] andrewbogott: that will reset the ldap entry?
[21:15:02] right from wikitech?
[21:15:17] yes, wikitech password == ldap password
[21:17:18] perfect, thank you
[21:17:19] i swear i haven't changed it in...a long time. they expire or anything? or have you heard of one being "corrupted" or something?
[21:17:19] nope, neither
[21:17:22] well i have about 20 pws that are in need of a change anyhow, this will be a good place to start
[21:17:24] thanks again
[21:17:29] sorry to bother
[21:18:22] (PS1) BryanDavis: Fix error handler endpoints [labs/toollabs] - https://gerrit.wikimedia.org/r/278987
[21:18:25] cwd: lots of us use keepassX, I'd recommend it if you aren't already using it
[21:19:23] yuvipanda: follow up in https://gerrit.wikimedia.org/r/#/c/278987/
[21:19:45] bd808: that'd affect the other pages too, but I guess that's ok?
[21:19:53] andrewbogott: probably time i retire the old gpg encrypted text file :P
[21:20:02] yeah. they should all return 200 to the calling webserver
[21:20:24] nginx and/or apache will send the right header on to the calling client
[21:20:42] I'm pretty sure I'm the person who broke tha
[21:20:44] *that
[21:20:51] (CR) Yuvipanda: [C: 2] Fix error handler endpoints [labs/toollabs] - https://gerrit.wikimedia.org/r/278987 (owner: BryanDavis)
[21:21:01] cwd: one thing, though -- the newer versions of keepassX are a complete rewrite, generally regarded as crappy and potentially insecure. The lucky/fully vetted version is v0.4.3
[21:21:48] I use keepassx but I hate it
[21:22:12] Candidate: 2.0.2-1
[21:22:15] #testing
[21:22:18] no browser integration that works (at least on osx)
[21:22:35] the 2.x series is fully busted as far as security goes
[21:22:39] I just copy/paste a lot
[21:23:02] TimStarling wrote up a thing about the horror of the 2.x series somewhere
[21:24:42] cwd: ^ ^^ ^^^
[21:25:20] * cwd eyes ye olde gpg encrypted text file
[21:26:28] thcipriani recommends https://www.passwordstore.org/
[21:27:37] does the clipboard integration work in xmonad? i don't love the idea of having a bunch of passwords in my term scrollback
[21:27:42] I do like that one. Bunch of bash wrapping around gpg and pwgen. Works great.
[21:27:52] bd808: another question for ya: my access.log is 694MB in size, but I find that `truncate --size 0` clears it out only temporarily, then the file is restored in full
[21:28:00] some kind of NFS issue I assume?
[21:28:52] cwd: with pass? pass -c [name] prompts you for your gpg encryption key's pass, then uses xsel to copy your pass to your clipboard. Clears automagically in 45 seconds.
[21:29:53] If I get locked out of my Wikitech account with 2FA enabled, then how can I recover my account?
[21:30:57] MusikAnimal: ugh. yeah. It's a "feature" of how the grid handles stderr/stdout
[21:31:12] oh haha okay
[21:31:13] MusikAnimal: to actually clean up you have to stop the service first
[21:31:23] I ended up deleting it, touching it, and restarting
[21:31:46] the grid job holds the file open and that keeps your truncate from really doing what you'd hoped
[21:32:13] but ideally I'd be able to rotate the logs, something like `tail -c 100000 access.log > temp.log; mv temp.log access.log`
[21:32:37] and have that as a cronjob
[21:32:41] yeah. I think we have an open task about making that sane
[21:33:00] but SGE is fighting against a reasonable solution
[21:33:43] is there anything stopping these logs from growing?
[21:33:53] other than available disk space
[21:33:58] no, and it's a global problem
[21:34:05] ouch
[21:34:33] one of the many things about SGE that makes my head hurt
[21:34:58] it's an awesomely powerful system that lets you shoot your own foot off in a lot of ways
[21:35:05] this tool's access.log grows at about 15K a minute
[21:37:13] For access.log I think you can send nginx SIGUSR1 to tell it to drop and reopen the file handle
[21:37:48] * bd808 tries to figure out how that translates into grid commands
[21:38:02] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[21:38:46] MusikAnimal: can you truncate with '> foo.log'
[21:38:48] oh. we use lighttpd, not nginx
[21:38:53] or cat /dev/null > foo.log
[21:39:02] so lighttpd reload would open a new file handle
[21:39:16] http://www.cyberciti.biz/tips/lighttpd-rotating-logs-with-logrotate.html
[21:39:40] chasemp: I think I need sudo for /dev/null
[21:39:48] chasemp: bd808 did we budget for logging hardware for next fiscal?
[21:40:04] * bd808 looks to the boss for that answer
[21:40:44] Labs, Tool-Labs, User-bd808: Server errors of web tools are reported as 404 - https://phabricator.wikimedia.org/T128898#2142738 (scfc) Open>Resolved a:bd808 This has been fixed by b3b4d1843de969b780797049d0bfa8185585c5d2 and 6c3c1eaa1b4b29c15174b87a53e6b9123f2a1892: ``` tools.typoscan@tools-...
[21:40:46] yuvipanda: nothing is on the $$$$ books for now but I made overtures to steal some of the parsoid hosts that are new and didn't work out and in the mean time I should have some labstore hosts to get PoC going
[21:40:50] once I can shuffle things
[21:40:57] ok!
[21:40:59] so no but it's not a total unknown either
[21:41:10] alright. as long as it's on someone's radar :)
[21:41:16] those parsoid hosts would be ideal and are not spoken for I was hoping we could make a case together (3 of us)
[21:41:20] and snatch'em up
[21:41:22] +1
[21:41:31] should be fun!
[21:41:45] (going afk for a bit, battery about to die)
[21:42:04] chasemp: can we just put in a standing order for all of the mis-sized prod hosts? ;)
[21:42:35] :)
[21:43:09] If I get locked out of my Wikitech account with 2FA enabled, then how can I recover my account?
[21:43:10] bd808: I have two hosts that are slightly out of warranty but would make good fodder for getting started but I need their space to shuffle off things for reimaging the labstore backup host
[21:43:22] *nod*
[21:43:23] if you want to let's chat a bit on this and you could get started at least playing?
[21:43:32] if interested
[21:43:43] chasemp: let's talk in 2 weeks?
[21:43:46] yep
[21:43:58] once I'm back from the hackathon
[21:44:26] right
[21:44:39] but yeah I was cruising the hw page yesterday thinking on this
[21:44:47] bd808: OT: Since we got the "Tool" namespace now, do you think it would be useful to migrate the SAL for the tools to this namespace too?
[21:45:04] I'd be for it, yes
[21:45:13] not sure if others would disagree
[21:45:20] no objection
[21:45:23] seems worth a task anyway
[21:45:34] maybe make another survey at phabricator? ;)
[21:45:57] heh. that worked last time but seemed a bit extreme
[21:46:17] we need better governance policies
[21:46:20] * Luke081515 didn't say anything against the survey at phabricator
[21:46:25] I think they are useful
[21:47:56] RECOVERY - Puppet run on tools-docker-builder-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:50:30] Luke081515: legoktm didn't like it and for valid reasons -- https://phabricator.wikimedia.org/T122865#2033434
[22:15:53] bd808: Maybe an "oldschool" poll at wikitech is better
[22:16:35] Luke081515: I think just a phab task and ping the handful of people who are actually using the per-tool SAL would work
[22:17:10] hm, ok, I think that would work too
[22:17:17] if valhalla and scfc don't hate the idea then we are probably good :)
[22:17:43] :)
[22:44:42] Tool-Labs-tools-Other, Community-Tech, Community-Wishlist-Survey, Milestone: Pageview Stats tool - https://phabricator.wikimedia.org/T120497#2143003 (DannyH)
[23:04:10] Labs, Developer-Relations, wikitech.wikimedia.org, Epic: [EPIC] Make wikitech more friendly for the multiple audiences it supports - https://phabricator.wikimedia.org/T123425#2143048 (bd808)
[23:04:13] Labs, wikitech.wikimedia.org, User-bd808: Exclude nova resource pages from *default* wikitech search - https://phabricator.wikimedia.org/T122993#2143045 (bd808) Open>Resolved a:bd808 A [[https://wikitech.wikimedia.org/w/index.php?search=puppet&title=Special%3ASearch&fulltext=1|search for "pup...
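On the access.log truncation discussed from [21:27:52] onward, one plausible reading can be sketched in Python. It assumes the process holding the log open writes without append mode, which the channel does not confirm. In that case the writer keeps its old file offset, so its first write after an external truncate lands at that offset and the file's apparent size jumps back. `demo_truncate_with_open_writer` is a hypothetical helper, and the sketch relies on POSIX file semantics.

```python
import os
import tempfile


def demo_truncate_with_open_writer(append):
    """Truncate a log while a writer still holds it open.

    Returns (size right after the truncate, size after the writer's next write).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "access.log")
        mode = "ab" if append else "wb"
        # buffering=0 so every write reaches the file immediately.
        with open(path, mode, buffering=0) as writer:
            writer.write(b"x" * 1000)      # writer's offset is now 1000
            os.truncate(path, 0)           # what `truncate --size 0` does
            size_after_truncate = os.path.getsize(path)
            writer.write(b"y" * 10)        # the long-running job logs again
            size_after_next_write = os.path.getsize(path)
    return size_after_truncate, size_after_next_write


if __name__ == "__main__":
    # Without append mode the write lands at the stale offset (sparse file).
    print("plain write:", demo_truncate_with_open_writer(append=False))
    # With append mode every write goes to the current end of file.
    print("append mode:", demo_truncate_with_open_writer(append=True))
```

The `mv temp.log access.log` idea has a related catch: a rename leaves the running writer attached to the old file, which is why stopping the service or making the web server reopen its log, as suggested in the channel, is still needed.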
[23:27:20] Labs, Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#2143079 (bd808) @MusikAnimal was asking about log rotation for access.log today. Would that be easier to set up or the same sort of hassle?
[23:51:34] Can anyone help me set up an account?..
[23:51:40] Account creation error : There was either an authentication database error or you are not allowed to update your external account.
[23:52:48] RileyH: I think that is a known bad error message when the shell account name you picked is already in use
[23:53:33] "The error message "There was either an authentication database error or you are not allowed to update your external account." generally indicates an invalid shell account name was used (see T18524)."
[23:53:33] T18524: Allow authentication plugins to report error messages - https://phabricator.wikimedia.org/T18524
[23:53:56] RileyH: what did you put in the "Instance shell account name" field?
[23:54:04] I put riley
[23:54:17] ok. let me see if that really is taken
[23:54:42] yup. there is an existing user with that name
[23:55:13] fml
[23:55:29] there is also an existing rileyh
[23:55:35] were toolserver usernames moved over?
[23:55:56] maybe? I know old svn account names were
[23:56:09] I don't know why toolserver accounts would've been imported
[23:57:07] cheers bd808
[23:57:12] I'll find a different username