[00:00:09] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#2010263 (chasemp) thanks @valhallasw. I did look at this, no game changer conclusions yet but some more details I think. I will update in the morning as it is late, but a few thoughts --...
[00:02:06] ok, i try to keep it short. the idea came up when i saw a tv documentary about the links between goldman sachs and the european central bank. could it be interesting to build up something like a wiki network where all users can show hidden or concealed networks between media, politics, industry, banks, whatever? look at this here: http://visjs.org/network_examples.html everyone could be able to add nodes as well as edges
[00:02:34] i/html site that would give more detailed information about the node and the incoming and outgoing edges
[00:04:54] Labs, Tool-Labs: tools-exec-1213 looks dead - https://phabricator.wikimedia.org/T126141#2010283 (chasemp) Open>Resolved a:chasemp unresponsive to salt so reboot
[00:05:02] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#2010286 (chasemp)
[00:05:24] Labs, Labs-Infrastructure: tools-worker-1002 locked up - https://phabricator.wikimedia.org/T125039#2010287 (chasemp) Open>Resolved a:chasemp back
[00:05:31] Labs, Labs-Infrastructure, Tracking: Labs instances sometimes freeze - https://phabricator.wikimedia.org/T124133#1946783 (chasemp)
[00:06:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/NicoV was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=292946 edit summary:
[00:23:39] Peter__: hmm, sounds very cool, like G. William Domhoff's work. Maybe that could be done using https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki and linking together wikipedia articles...
[00:23:52] I'm not sure where you could suggest this that it would get traction.
[00:25:35] Peter__: You could write to the wikidata-l list: https://lists.wikimedia.org/mailman/listinfo/wikidata (subscribe before posting)
[00:27:40] thank you, i will write them
[01:52:48] Labs, wikitech.wikimedia.org: InstantCommons on wikitech-static - https://phabricator.wikimedia.org/T125695#2010537 (Krenair) After I got access to wikitech-static today I took a look at the config. It turns out it actually has a setting explicitly enabling InstantCommons, so that's not the issue here. I...
[02:08:14] Labs, wikitech.wikimedia.org: InstantCommons on wikitech-static - https://phabricator.wikimedia.org/T125695#2010550 (Krenair) Yeah, definitely a change in versions... If you find one of these images that doesn't work on the old version and visit its file page, you get a 404... But if you visit it on the n...
[02:17:32] Labs, wikitech.wikimedia.org: InstantCommons on wikitech-static - https://phabricator.wikimedia.org/T125695#2010557 (Krenair) (Of course, now I have done it to all of the ones on the front page and we need to find a new example of the bug :/)
[02:29:37] hello, I need some advice on serving the new pageviews tool at https://tools.wmflabs.org/musikanimal/pageviews
[02:30:00] it is an all clientside app, but the route /pageviews is being served by my Ruby app, which is not good
[02:30:42] there is the lighttpd but I wonder if it could handle large traffic, or if it matters since we're only serving a single HTML file, and the assets (CSS, JS) are on the file system
[02:31:53] for instance I see the lighttpd config option "server.max-connections". How would that correlate to a single HTML file? (no backend)
[02:32:31] since it's just an HTML file, there is no "connection", right?
[02:33:17] there are active connections when it is sending the file
[02:34:41] so how would the limit apply in this situation? if it's set to say 300, then up to 300 people can be served the file in a single "instant"? or what's the unit of time
[02:36:01] just need to make sure this doesn't go down... ever. My hope is it being just an all-frontend application downtime won't ever be an issue (barring an issue with Labs itself)
[02:36:21] MusikAnimal: well, I'd guess that it's the number of open connections allowed at any single point in time, but I doubt you really need more than the default (which is 1024 IIRC)
[02:36:56] https://redmine.lighttpd.net/projects/1/wiki/Server_max-connectionsDetails says some other things
[02:37:01] I deleted my .conf because there was some kind of conflict with my Ruby app a while back
[02:37:10] but they have a sample at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web
[02:37:27] might want to seek help setting up... maybe I should create another tool entirely just to test this
[02:37:49] oh, I see
[02:38:13] well, I really doubt that would ever be a problem if you're serving static html
[02:38:40] yeah, there's also the `server.stat-cache-engine = "fam"` config option, so I assume it will stay cached until I pull in new changes
[02:38:57] hopefully I don't have to restart, I don't with the Ruby app
[02:42:06] MusikAnimal: um, if you want it to never go down, labs is the wrong place
[02:42:30] haha, I suppose that's true
[02:42:44] I guess I could put it on GitHub pages
[02:43:17] but there's already a ton of links going to https://tools.wmflabs.org/musikanimal/pageviews , big cleanup effort
[02:43:26] just redirect it
[02:43:48] sort out the major backlinks and don't worry about the others
[02:44:11] I could try to do that through the Ruby app, a JS redirect won't be pretty, I think
[02:44:23] and not a real redirect
[02:47:31] uhm
[02:47:40] please don't send people to a non-privacy policy compliant place :((
[02:47:46] I don't understand why you need 100% uptime
[02:47:55] even Wikimedia production doesn't have that
[02:48:07] last week labs had better uptime than github lol
[02:48:15] yeah I had the same feeling, GitHub will be more reliable but it's not as wiki-ish
[02:48:20] oh really
[02:48:20] well
[02:48:31] that was because of a storm in Virginia I think
[02:48:34] did you miss the giant github outage caused by a power failure?
[02:48:48] https://github.com/blog/2106-january-28th-incident-report
[02:48:50] *Virginia
[02:48:52] yeah
[02:48:56] that's what I'm talking about
[02:49:07] I mean, guess where Labs physically lives
[02:49:38] Virginia?
[02:49:46] ding ding ding
[02:50:02] also one of the larger Amazon EC2 farms
[02:50:09] You still haven't explained why you need 100% uptime for this
[02:50:20] and Carpathia
[02:50:31] *need* is the wrong word
[02:51:56] just getting a lot of traffic is all, so I want to do whatever I can to get optimal uptime
[02:52:31] it already is getting a lot of traffic?
[02:52:58] about 4-5k hits a day
[02:53:05] that's only page refreshes
[02:55:28] I mean, that's not that much, only a few per minute
[02:55:48] yeah not that many at all, but it is for a tool
[02:56:28] and for relying on a Unicorn server on Labs
[02:57:19] but a static HTML files per min is a lot less than, say, a few 30-second copyvio checks per minute
[02:57:33] I'm not sure what I'm getting at though
[02:57:35] definitely, but it's routing through the Ruby app
[02:57:42] so...?
[02:58:45] for one, I think it will eventually go down naturally; took about 3 months last time, but I was on holiday so it stayed down for a week
[02:59:09] the lighttpd webserver is supposed to automatically restart
[02:59:19] but also, restarting the Unicorn server takes 5 minutes or more, for some f-ing reason
[02:59:30] so I can't update my Ruby tools without bringing pageviews down
[03:01:57] okay, so back up for a sec
[03:02:04] why do we need a ruby app to serve a html file?
[03:02:17] we don't, it's just all I had set up
[03:02:31] you can just put the file in public_html
[03:02:52] I mean, if that's the concern here
[03:02:59] well I need to first restore the lighttpd conf
[03:04:05] but also I'm worried... there were conflicts before between lighttpd and the Unicorn server. Don't remember what exactly, but if it's routed properly I think it should be fine
[03:05:33] my plan is to serve from the same /public folder my Ruby app uses for CSS/JS/etc. The HTML for the Ruby app isn't in /public, so no concern with a /index.html conflict, I think
[03:06:03] or I could create a new git repo... but I kind of like it being bundled in with my other work
[03:06:18] I don't think I see where ruby comes into play here
[03:07:39] me neither
[03:07:54] the Ruby Unicorn server routes /pageviews. If it goes down then pageviews goes bye bye, unless I get lighttpd up
[03:08:39] okay first of all, why isn't this a separate tool from your other stuff?
[03:09:55] that's what I'm starting to question, since there's no backend. It made sense initially to have all my tools in the same repo
[03:10:56] I'm going to give lighttpd a try, I think it will be OK
[03:11:12] generally you want to make a separate tool for everything
[03:11:33] the old 'everything goes under /yourname/' toolserver system is kinda a bad idea
[03:11:47] since multiple maintainers are hard and all of that
[03:12:08] not if they all work together, which is the case with all my Ruby tools
[03:12:21] in what sense do they work together?
[03:13:48] they share a lot of the same code
[03:29:39] Labs, wikitech.wikimedia.org: InstantCommons on wikitech-static - https://phabricator.wikimedia.org/T125695#2010662 (Krenair) Open>Resolved a:Krenair Okay, tracked this down. Was due to commons going SSL-only, was fixed in https://gerrit.wikimedia.org/r/#/c/222086/ and https://gerrit.wikimedia.or...
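To put the traffic figure from the discussion above in perspective, here is a quick back-of-the-envelope sketch. It assumes the quoted 4-5k hits a day are spread evenly over 24 hours (real traffic is burstier), and `hits_per_minute` is a hypothetical helper name, not something from the tool.

```python
# Rough arithmetic only; assumes hits are spread evenly across the day.
MINUTES_PER_DAY = 24 * 60


def hits_per_minute(hits_per_day: float) -> float:
    """Average request rate per minute for a given daily hit count."""
    return hits_per_day / MINUTES_PER_DAY


if __name__ == "__main__":
    for daily in (4000, 5000):
        print(f"{daily} hits/day is about {hits_per_minute(daily):.1f} per minute")
```

Either way this is a handful of requests per minute, which matches the "only a few per minute" estimate in the chat and sits far below the lighttpd `server.max-connections` default recalled there (1024, if that recollection is right).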
[03:30:25] Labs, wikitech.wikimedia.org: InstantCommons broken on wikitech-static - https://phabricator.wikimedia.org/T125695#2010665 (Krenair)
[03:47:28] Earwig: I want to give you a hug for saying "the old 'everything goes under /yourname/' toolserver system is kinda a bad idea" :)
[03:47:55] haha
[03:48:41] it's a really short sighted way to think of a FLOSS software project
[03:49:36] I hope to see more people start thinking about tools as proper FLOSS software rather than personal property
[04:08:48] well that didn't work
[04:10:51] https://tools.wmflabs.org/musikanimal/pageviews
[04:16:30] l
[04:53:27] so fun thing ... i tore down discourse.search.eqiad.wmflabs so i could re-create it in the discourse project yuvi created for me last week. Everything is back up and i can talk to the instance over a socks proxy to labs. but after deleting the web proxy from search project and recreating it in the discourse project all i can get from discourse.wmflabs.org is a gateway timeout :S
[04:54:32] the url shown by Special:NovaProxy seems happy, http://discourse1001.discourse.eqiad.wmflabs:80
[04:55:22] Labs, Labs-Infrastructure, labs-sprint-117, labs-sprint-118, and 3 others: Move project membership/assignment from ldap to keystone mysql - https://phabricator.wikimedia.org/T115029#2010737 (Andrew) labtestwikitech.wikimedia.org now reflects the end of the above process (plus one additional patch...
[05:00:18] oh of course, need new security groups for new project
[05:31:00] (PS1) BryanDavis: Add bash-completion config for `become` [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106)
[05:31:56] (CR) jenkins-bot: [V: -1] Add bash-completion config for `become` [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[05:34:18] (CR) BryanDavis: "Not sure why this change would cause a jsub test to fail :/" [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[05:35:05] (CR) BryanDavis: "The completion script itself works; the debian bits are google powered copy-pasta." [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[07:21:36] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:56:30] cc !log running table engine conversion script on db1069 (potential small lag on labs for 1 day)
[09:56:35] only on s3
[10:00:24] RECOVERY - Puppet failure on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:00:57] jynus: thanks!
[10:01:06] Labs, Labs-Infrastructure, DBA, operations: db1069 is running low on space - https://phabricator.wikimedia.org/T124464#2010935 (jcrespo) We are already down to 85% disk usage, the conversion is still ongoing. It may be worth checking labs, too.
[10:26:35] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ljonka was created, changed by Ljonka link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Ljonka edit summary: Created page with "{{Tools Access Request |Justification=Create and submit new extensions, as a developer at HalloWelt, most extensions will be around BlueSpice. |Completed=false |User Name=Ljon..."
[10:53:48] in case anyone is in here using salt for something
[10:54:09] it's going to be about a half hour (however long it takes for puppet to make the rounds) for the minions to come back up
[10:54:28] I did a salt master key fixup (useful because labcontrol1002 had some random different key)
[10:54:48] but I left off the newline so all the minions thought it was a new key :-/
[10:55:06] in about a half hour I'll see which ones haven't recovered and restart via ssh for any that are accessible that way
[11:29:45] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Ljonka was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=293686 edit summary:
[11:30:05] so we have 195 that didn't return, and 532 that are happy
[11:31:42] but a lot of these no returners are running their own salt masters, I wonder how their keys got in here again
[11:31:55] anyways I'll do an ssh loop to restart salt-minion on those and that will be the end of it
[11:49:31] wikitech logins are failing for me ("Incorrect password entered. Please try again."), my password is fine, though (just double-checked using ldapsearch), is that a known problem/fallout of the OSM/Horizon work?
[11:54:44] 123 of those had bad names (no project name in the hostname)
[11:55:07] that's poor. I can remove those keys outright, they are dupes of keys already in there but a sign that something in lab instance setup is going wrong
[12:00:45] (CR) Tim Landscheidt: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/268602 (owner: Tim Landscheidt)
[12:04:50] I can't login on wikitech since at least yesterday
[12:05:11] someone sitting next to me has the same problem
[12:06:49] moritzm: same for me, but seemingly not "known"
[12:06:57] Works for me, strangely enough.
[12:07:25] can you create a phabricator task with the affected username + description of the error?
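Back on the salt master key fixup from earlier in this stretch of the log: the missing-newline mishap can be illustrated with a small sketch. It assumes, purely for illustration, that a client decides whether a key is "the same" by comparing the stored bytes (or a digest of them); this is not taken from salt's source, and the key text and the `fingerprint` helper are made up.

```python
import hashlib


def fingerprint(key_bytes: bytes) -> str:
    """Hex SHA-256 digest of the exact bytes given, trailing newline included."""
    return hashlib.sha256(key_bytes).hexdigest()


# Placeholder key material, not a real key.
with_newline = b"-----BEGIN PUBLIC KEY-----\nEXAMPLE\n-----END PUBLIC KEY-----\n"
without_newline = with_newline.rstrip(b"\n")

if __name__ == "__main__":
    # A byte-exact comparison treats the two files as different keys.
    print(fingerprint(with_newline) == fingerprint(without_newline))
    # Normalising trailing whitespace before comparing avoids the mismatch.
    print(with_newline.rstrip() == without_newline.rstrip())
```

Under that assumption, a single dropped newline is enough for every client to see a "new" key, which is consistent with what was reported in the channel.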
[12:07:50] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011140 (scfc) Open>Resolved a:scfc Now the package installation works (https://integration.wikimedia.org/...
[12:08:03] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011143 (scfc) a:scfc>None
[12:08:47] (CR) Tim Landscheidt: "recheck" [labs/toollabs] - https://gerrit.wikimedia.org/r/268563 (owner: Tim Landscheidt)
[12:12:20] I'll create a phab task
[12:14:14] moritzm: the same credentials do work for gerrit?
[12:14:21] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011150 (hashar) Resolved>Open I have manually updated the image somehow. Will have to ask around but we mos...
[12:14:42] if it works there, it suggests it's something wikitech specific, if not it suggests something LDAP-y
[12:15:30] valhallasw`cloud: it's specific to wikitech for sure, I made an authenticated ldapsearch with my password and that worked fine
[12:15:42] I created https://phabricator.wikimedia.org/T126322
[12:16:06] not sure about the specific project, though, please add as needed, for now I only added operations
[12:16:28] (PS1) Tim Landscheidt: WIP [labs/toollabs] - https://gerrit.wikimedia.org/r/269404
[12:16:48] me neither, but I added labs and wikitech
[12:16:53] * valhallasw`cloud prods wikibugs
[12:17:02] ughhh
[12:17:15] why can I CC projects
[12:17:15] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011152 (valhallasw)
[12:21:24] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011166 (Aklapper) p:Triage>High
[12:28:31] valhallasw`cloud: cc-ing projects sorta the same as cc-ing everyone in that project, even when they are not watching the project
[12:29:07] Yeah. I'm not sure what that's good for ;-)
[12:29:18] except for accidentally spamming everyone when you try to add a project
[12:36:54] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011187 (aude) I can login, but maybe something is different with my sessions, tokens, etc.
[12:39:04] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011189 (JanZerebecki) From logstash, the only relevant message my try created seems to be: "2.1.0 OpenStackNovaController::authenticate return code: 401"
[12:40:26] ok done, about 20 instances still don't respond to salt, can't ssh into those (broken) pretty much. I fixed some broken minion config files (broken puppet) and some out of space instances.
[12:40:32] so I'm off again
[12:40:36] *poof*
[12:42:11] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011194 (hashar) I have managed to login properly. I have two factor authentication if that matter.
[12:42:28] (CR) Tim Landscheidt: "The failure is not related to this change (or jsub); the output of ls on the test server when the argument cannot be found has changed fro" [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[12:46:30] Labs: broken labs instances (ssh or perms), do we care? - https://phabricator.wikimedia.org/T126323#2011201 (ArielGlenn) NEW
[12:48:16] Labs: broken labs instances (ssh or perms), do we care? - https://phabricator.wikimedia.org/T126323#2011215 (valhallasw) Possibly related to {T124133}?
[12:50:10] Labs: salt keys being created and accepted with wrong hostname (no project name in hostname) - https://phabricator.wikimedia.org/T126324#2011218 (ArielGlenn) NEW
[13:06:29] Hello, I have just registered to Wikitech and pushed my first commit, but now I have realized that I would like to change my username from "Petr.matas" to "Petr Matas" to match my Wikipedia name. Is it possible?
[13:08:42] petr-matas: unfortunately not
[13:08:52] petr-matas: you can however create a new account with the correct username
[13:10:27] Ok, now if I create the new account, can I reuse the shell name "petr-matas"?
[13:17:42] no, unfortunately now
[13:17:44] not*
[13:36:44] valhallasw`cloud: ok, thanks
[13:37:19] (Abandoned) Tim Landscheidt: WIP [labs/toollabs] - https://gerrit.wikimedia.org/r/269404 (owner: Tim Landscheidt)
[13:43:23] (PS1) Tim Landscheidt: Fix test suite's check for empty directory [labs/toollabs] - https://gerrit.wikimedia.org/r/269408
[13:50:46] (CR) Tim Landscheidt: [C: 2] Fix test suite's check for empty directory [labs/toollabs] - https://gerrit.wikimedia.org/r/269408 (owner: Tim Landscheidt)
[14:00:52] (PS2) Tim Landscheidt: Add bash-completion config for `become` [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[14:18:46] (CR) Tim Landscheidt: [C: -1] "I built the package on toolsbeta-webgrid-lighttpd-1406 with "sudo DIST=trusty pdebuild --buildresult ..", but the resulting misctools*.deb" [labs/toollabs] - https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: BryanDavis)
[14:24:20] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011398 (JanZerebecki) I can still log in to https://horizon.wikimedia.org/ .
[14:32:12] Labs, operations, wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#2011403 (Krenair) @demon: Ping.
[14:32:17] petan, ^
[14:32:31] that's an existing request for a similar sort of rename
[14:33:42] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2011404 (akosiaris) Taking a look at this I would say that it has nothing to do with the symlink mentioned above, b...
[14:40:46] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011429 (Krenair) WFM. Interestingly I was getting that same OSNC auth 401 error when trying to log into labtestwikitech recently.
[14:44:04] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011431 (jcrespo) I had the same problem, changing the LDAP password though wikitech fixed it for me.
[14:51:39] Labs, operations, wikitech.wikimedia.org: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011439 (JanZerebecki) Restarting keystone didn't help.
[15:15:10] RECOVERY - Puppet failure on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[15:23:31] Labs, operations, wikitech.wikimedia.org, Patch-For-Review: Failing wikitech logins - https://phabricator.wikimedia.org/T126322#2011499 (Andrew) a:Andrew This is resolved for the moment, but the settings are wrong... keeping open until I can fix properly.
[15:35:17] Labs, wikitech.wikimedia.org: Enable math extension on wikitech - https://phabricator.wikimedia.org/T126338#2011531 (Halfak) NEW
[15:36:30] Labs, Wikimedia-Site-Requests, wikitech.wikimedia.org: Enable math extension on wikitech - https://phabricator.wikimedia.org/T126338#2011543 (Krenair)
[15:43:59] Labs: labs dns seems to be holding bad records - https://phabricator.wikimedia.org/T126340#2011568 (chasemp) NEW a:Andrew
[15:47:28] sounds dupey
[15:48:38] Labs: labs dns seems to be holding bad records - https://phabricator.wikimedia.org/T126340#2011600 (Krenair)
[15:48:40] Labs, operations: RDNS for 10.68.18.65 resolves to two different instances - https://phabricator.wikimedia.org/T115194#2011601 (Krenair)
[15:49:44] Labs, operations: RDNS for some labs instance IPs resolve to multiple different instances - https://phabricator.wikimedia.org/T115194#2011608 (Krenair)
[15:51:27] Labs, operations: RDNS for some labs instance IPs resolve to multiple different instances - https://phabricator.wikimedia.org/T115194#2011613 (Krenair)
[15:52:14] Labs, wikitech.wikimedia.org: Hostnames assigned to floating IP persist when deallocated - https://phabricator.wikimedia.org/T55816#2011616 (Krenair) Ping... Another case of that bug has been found by @chasemp.
[15:58:51] Labs, Wikimedia-Site-Requests, wikitech.wikimedia.org: Enable math extension on wikitech - https://phabricator.wikimedia.org/T126338#2011634 (Glaisher) @andrew Any particular reason why it was disabled https://gerrit.wikimedia.org/r/#/c/158409/ ? Can we enable it now?
[16:01:19] Labs, Wikimedia-Site-Requests, wikitech.wikimedia.org: Enable math extension on wikitech - https://phabricator.wikimedia.org/T126338#2011655 (Krenair)
[16:01:21] Labs, Developer-Relations, wikitech.wikimedia.org, Epic: [EPIC] Make wikitech more friendly for the multiple audiences it supports - https://phabricator.wikimedia.org/T123425#2011654 (Krenair)
[16:55:52] valhallasw`cloud: andrewbogott we have some space warnings on tools NFS share, going to have to address this soon. I needed to for capacity planning's sake anyway but fyi I'm going to email that a log culling is necessary and try to do so sanely.
[16:56:09] chasemp: sounds good
[16:56:13] yep
[16:58:02] andrewbogott: labs-announce?
[16:58:44] yeah, probably
[17:13:42] Labs, operations, wikitech.wikimedia.org: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#2011847 (demon) @adrianheine find me on IRC (ostriches or ^d), prefer to do it synchronously so we can troubleshoot if it goes wrong.
[17:27:12] RECOVERY - Puppet failure on tools-exec-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:27:15] PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[17:27:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:28:13] RECOVERY - Puppet failure on tools-exec-1219 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:28:35] Labs, Tool-Labs: tools-exec-12* puppet broken: php5* packages have been upgraded again? - https://phabricator.wikimedia.org/T126205#2011885 (valhallasw) a:valhallasw I'm just ran ```apt-get install -y --force-yes libapache2-mod-php5filter php-pear php5-cli php5-common php5-curl php5-gd php5-intl php5-m...
[17:29:25] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:30:25] bah, I want logstash for tools ;-)
[17:31:55] valhallasw`cloud: so do I ;)
[17:32:13] I feel most of my time is spent in waiting for ssh to connect
[17:32:40] Labs, Tool-Labs: tools-exec-12* puppet broken: php5* packages have been upgraded again? - https://phabricator.wikimedia.org/T126205#2011904 (valhallasw)
[17:32:42] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2011903 (valhallasw)
[17:33:05] Labs, Tool-Labs: tools-exec-gift: libsndfile1:amd64 1.0.25-4ubuntu0.1 cannot be configured because libsndfile1:i386 is in a different version - https://phabricator.wikimedia.org/T126351#2011905 (valhallasw) NEW
[17:34:18] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:34:30] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:52] RECOVERY - Puppet failure on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:36:55] RECOVERY - Puppet failure on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:07] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:07] RECOVERY - Puppet failure on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:37:21] RECOVERY - Puppet failure on tools-exec-1208 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:38:35] RECOVERY - Puppet failure on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:45:10] RECOVERY - Puppet failure on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:51:32] Labs, Tool-Labs: tools-grid-master / almost full (929M/18G free) - https://phabricator.wikimedia.org/T126353#2011964 (valhallasw) NEW
[17:53:08] Labs, Tool-Labs: Fix 'unknown's in shinken - https://phabricator.wikimedia.org/T99072#2011976 (valhallasw)
[17:53:10] Labs, Tool-Labs: Fix shinken config to remove tools-webproxy-test - https://phabricator.wikimedia.org/T99073#2011974 (valhallasw) Open>Resolved a:valhallasw
[17:54:25] Labs, Tool-Labs: tools-docker-registry-01 has incorrect puppetmaster key - https://phabricator.wikimedia.org/T126167#2011981 (valhallasw) Resolved>Open Still broken: ``` Exiting; no certificate found and waitforcert is disabled ```
[17:54:27] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2011983 (valhallasw)
[18:03:51] Labs, Tool-Labs, Continuous-Integration-Infrastructure, Blocked-on-RelEng, Patch-For-Review: debian-glue tries to fetch obsolete package - https://phabricator.wikimedia.org/T125999#2012014 (akosiaris) Tested and the above patch definitely solves the production problem with outdated build environ...
[18:04:34] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2012019 (valhallasw)
[18:04:36] Labs, Tool-Labs: tools-submit: cron service stopped and puppet disabled - https://phabricator.wikimedia.org/T126172#2012016 (valhallasw) Open>Resolved a:valhallasw Host deleted.
[18:05:07] PROBLEM - Host tools-submit is DOWN: CRITICAL - Host Unreachable (10.68.17.1)
[18:07:18] Labs, Tool-Labs: tools-exec-gift: libsndfile1:amd64 1.0.25-4ubuntu0.1 cannot be configured because libsndfile1:i386 is in a different version - https://phabricator.wikimedia.org/T126351#2012021 (valhallasw) Same for `tools-exec-cyberbot`.
[18:11:46] YuviPanda: what did you do to preserve a logged-in state on quarry?
[18:12:06] harej: cookies, probably?
[18:12:34] what would the cookie actually consist of?
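One possible answer to the cookie question, as a minimal sketch: a signed cookie carries the value plus an HMAC, so the server can detect tampering without storing anything itself. `SECRET_KEY`, `sign` and `unsign` are hypothetical names chosen for illustration; a real tool would more likely rely on its web framework's session support than hand-roll this.

```python
import hashlib
import hmac
from typing import Optional

SECRET_KEY = b"change-me"  # hypothetical; keep the real one out of the repo


def _mac(value: str) -> str:
    """Hex HMAC-SHA256 of the value under the server-side secret."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sign(value: str) -> str:
    """Return 'value.signature', suitable for storing in a cookie."""
    return f"{value}.{_mac(value)}"


def unsign(cookie: str) -> Optional[str]:
    """Return the original value if the signature checks out, else None."""
    value, sep, mac = cookie.rpartition(".")
    if not sep:
        return None  # no signature part at all
    # Constant-time comparison on bytes to avoid timing leaks.
    ok = hmac.compare_digest(mac.encode("utf-8"), _mac(value).encode("utf-8"))
    return value if ok else None
```

The value is readable by the user, only not forgeable, so nothing secret should go into it; server-side sessions avoid that limitation at the cost of storage.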
[18:12:37] gerrit-patch-uploader uses a signed cookie on the users side, but you can also use real sessions [18:12:51] I think django probably has sane session support? [18:13:11] django does have a first-party auth system [18:13:25] though i'm not using it in favor of this oauth thing [18:13:43] unrelated question: is there a parsoid API I can use, or should I run my own? [18:14:09] sessions != auth [18:14:17] sessions are a way to store information between requests [18:14:35] which you need for auth, but it doesn't say anything about which kind of auth you're using [18:15:05] right. but if there's an authentication system, there probably is a lower-level API for cookies or sessions or whatever [18:21:05] And sure enough. https://docs.djangoproject.com/en/1.9/topics/http/sessions/ [18:21:40] 6Labs, 10Tool-Labs: tools-exec-gift: libsndfile1:amd64 1.0.25-4ubuntu0.1 cannot be configured because libsndfile1:i386 is in a different version - https://phabricator.wikimedia.org/T126351#2012035 (10valhallasw) [18:21:42] 6Labs, 10Tool-Labs, 5Patch-For-Review: puppet/apt issues on tools-submit - https://phabricator.wikimedia.org/T124014#2012036 (10valhallasw) [18:25:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK: OK: Less than 1.00% above the threshold [0.0] [18:26:21] 6Labs, 10Tool-Labs: tools-exec-cyberbot: Notice: /Stage[main]/Toollabs::Node::All/Labs_lvm::Swap[big]/Exec[create-swap-big]/returns: Volume group "vd" has insufficient free space (1678 extents): 2967 required. - https://phabricator.wikimedia.org/T126356#2012050 (10valhallasw) 3NEW [18:26:31] 6Labs, 10Tool-Labs: tools-exec-cyberbot and tools-exec-gift: Notice: /Stage[main]/Toollabs::Node::All/Labs_lvm::Swap[big]/Exec[create-swap-big]/returns: Volume group "vd" has insufficient free space (1678 extents): 2967 required. 
- https://phabricator.wikimedia.org/T126356#2012050 (10valhallasw) [18:50:57] RECOVERY - Puppet failure on tools-exec-1214 is OK: OK: Less than 1.00% above the threshold [0.0] [18:52:46] 6Labs: broken labs instances (ssh or perms), do we care? - https://phabricator.wikimedia.org/T126323#2012127 (10mmodell) {T125666} [19:00:12] RECOVERY - Puppet failure on tools-exec-1207 is OK: OK: Less than 1.00% above the threshold [0.0] [19:10:15] RECOVERY - Puppet failure on tools-exec-1220 is OK: OK: Less than 1.00% above the threshold [0.0] [19:10:15] RECOVERY - Puppet failure on tools-exec-1215 is OK: OK: Less than 1.00% above the threshold [0.0] [19:22:12] kaldari, ringa ping ping [19:36:33] Hey, I want to delete one of my service groups, Is there anyone to help [19:40:12] valhallasw`cloud: what is project/bene [19:40:20] "wikibase" lots of large zips [19:42:53] chasemp: hey, In order to help, I just cleaned up my logs, and some unsed files, got 18 GB deleted in tools [19:42:56] *unused [19:43:03] Amir1: thanks1 [19:43:09] 1=! :) [19:43:25] I want to delete one of my service groups that might also take some space [19:43:36] like the actual group definition or ? [19:43:39] not sure I understand [19:43:57] just deleted another 4GB [19:44:37] let me show you :) [19:44:45] (trying to find a doc.) [19:44:58] sure [19:47:54] chasemp: https://wikitech.wikimedia.org/wiki/Special:NovaServiceGroup [19:48:34] I deleted ~300MB from my user space [19:49:42] chasemp: https://tools.wmflabs.org/contact/ ;-) so something from benestar|cloud ( https://www.mediawiki.org/wiki/User:Bene* ) [19:49:45] Amir1: ah! right spirit but the tool user won't affect our space issue here, we need to do a whole round of cleanup there though as well :) I'm not entirely clear on best practice in this case. If you can make a task in phabricator? 
otherwise I wouldn't worry at this time [19:50:07] valhallasw`cloud: nice thank you [19:50:21] chasemp, Amir1, Tim (scfc) knows how to do that and does it every now and then, so a phab task would be best [19:50:48] sure valhallasw`cloud :) [19:51:03] in the meantime let me delete some more files [19:51:45] chasemp: could we do a du -hs * on /data/project on labstore? [19:51:54] or did you already do that? [19:52:04] yeah I'm in process of a gentle version of this now [19:52:11] it kind of borks the server atm [19:52:25] * valhallasw`cloud nods [19:52:45] * valhallasw`cloud thinks back about the good ol' toolserver 500MB quota days [19:53:46] deleted 2.3GB [19:54:11] so overall I've deleted 25GB [19:54:12] valhallasw`cloud: going to have to get real w/ quotas here sometime [19:54:16] no help for it [19:54:26] but we can be more lenient than 500MB [19:54:33] * gifti tries to save their data before it is deleted [19:55:28] so we have 8T for 1300 service groups (!) and 860 users [19:55:37] yeah [19:56:09] so about 4G on average, which is not even such a crazy amount of data [19:56:14] we can bump that some and beg and borrow etc but for the most part current allocation of space is just poor [19:56:40] but I guess most of the data is not in the average users but in the top few % [19:56:54] basically yes [19:57:01] we have two things happening [19:57:05] death by a thousand paper cuts [19:57:15] ppl w/ like .e1 .e2 .e3 .e4 [19:57:29] that's qsub [19:57:32] and then the 30 or so ppl w/ a lot of data [19:57:36] right [19:57:51] I mean, none of this is really the users' fault...there is no log cleanup or logic or mechanism [19:58:06] so we are now at an eventual consequence of expected behavior [19:58:14] * valhallasw`cloud nods [19:58:28] of course, the secondary issue is bandwidth [19:58:33] that too [19:58:42] if you write a 1.5T log file, that means you're consistently using a lot of bandwidth as well [19:58:53] which logrotate would hide, in some sense [19:59:03] yeah
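The "about 4G on average" estimate in the exchange above can be checked with a quick calculation. This is a minimal sketch, assuming "8T" means 8 TiB and that the 1300 service groups and 860 users each count as one account sharing the space; neither assumption is stated in the log, and `average_share_gib` is a hypothetical helper name used only for illustration.

```python
# Back-of-the-envelope check of the "about 4G on average" figure.
# Assumption: "8T" is read as 8 TiB (binary units).

TIB = 1024 ** 4
GIB = 1024 ** 3


def average_share_gib(total_bytes: int, accounts: int) -> float:
    """Return the mean space per account, in GiB."""
    if accounts <= 0:
        raise ValueError("accounts must be positive")
    return total_bytes / accounts / GIB


service_groups = 1300
users = 860

# Treat every service group and every user as one account.
avg = average_share_gib(8 * TIB, service_groups + users)
print(f"{avg:.2f} GiB per account")
```

Read with decimal units instead (8 * 10**12 bytes), the share comes out at roughly 3.7 GB, so the rough figure of 4G holds under either reading. An average says nothing about the skew mentioned in the chat, where a few accounts hold most of the data.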
[19:59:04] but we can now measure that more directly iirc? [19:59:11] we can [19:59:27] and also limit it within certain guidelines [19:59:31] 1.5T, holy crap [20:00:39] valhallasw`cloud: did you see https://phabricator.wikimedia.org/T126083#2004459 [20:00:47] some illuminating stuff on actual NFS usage [20:01:28] I can't remember talking to you about this but I took apart the sysutils nfsiostat and made a diamond collector [20:01:38] yeah, I remember talking about that [20:01:40] so it's per node not per tool [20:01:46] back when we were fighting IO congestion [20:03:02] I spent like 2 weeks of every waking moment thinking about how to limit NFS clients and dug into cgroups to do it [20:03:10] and I have finally admitted to myself it's not easy [20:03:18] and that bw / rate is the most sane mechanism atm [20:03:35] though i/o doesn't directly equate i.e. I can make small requests w/ large i/o demands [20:03:46] we can get fairly close and leave it to significant outliers [20:08:22] for a few days I've been meaning to write this up but fire fire fire [20:13:45] chasemp: *grin* 'oh, right, we were going to do logstash' [20:13:55] :) [20:14:25] I think it may be some combo of http://www.fluentd.org/ and elk [20:14:38] that is what the k8s mechanism is built around [20:14:49] I went and saw an openshift setup a few weeks ago where they showed it off [20:14:59] /per user log permissions// [20:15:05] I was like 'yessssss' [20:16:17] as in: k8s containers log to fluentd by default? [20:17:09] yeah it's all baked in [20:17:23] they have a pretty modular mindset so you are not stuck w/ it [20:17:36] but costs of trying to be different I think are not small [20:18:31] 6Labs, 10Tool-Labs: Setup an easy to use logrotate based system for rotating tools logs - https://phabricator.wikimedia.org/T68623#2012498 (10valhallasw) I've fiddled a bit more with this; the following works correctly: ``` valhallasw@tools-bastion-01:~$ cat testrotate /home/valhallasw/echoforever.out { c...
[20:18:32] ^ [20:19:52] I don't see wikibugs, arrow to? [20:20:02] https://phabricator.wikimedia.org/T68623#2012498 [20:20:16] nice [20:20:29] the current problem is basically that the cost of rotating all the files in all the places is too much for the labstore box [20:20:37] and yet doing it all offbox is harder on it in some ways [20:20:44] NFS and log files is just not a good situation [20:21:16] * valhallasw`cloud nods [20:21:19] i was thinking about some softer iterative rotation logic that we could sustain [20:21:22] but I have no real plan atm [20:22:14] I'm also thinking about how we could pipe SGE log files into some other system [20:23:21] but that would probably end up with something like fifo pipes, which already makes my brain hurt [20:23:33] :D I spent some time on that too [20:23:45] would a separate NFS partition/NFS host help? [20:24:24] some horizontal load split has to happen yeah [20:25:03] http://www.fluentd.org/images/fluentd-before.png [20:25:14] Yep, that looks like Tool Labs all right [20:25:25] heh [20:28:00] a not insignificant source of load is read on NFS from the fact that we are literally hosting a thousand sites on NFS [20:28:10] the same NFS that the daemons serving the files are logging to... [20:28:18] and some of those files are 25G and larger [20:28:35] so you can download this huge zip through nginx, lighttpd, from NFS [20:28:36] * valhallasw`cloud nods [20:29:19] I've deleted some more directories which I think took lots of space but it's hard to say exactly how much. Please check if it was helpful [20:29:28] Amir1: thank you [20:31:25] is the enwp10 project relevant now? [20:33:09] :) [20:33:48] 6Labs, 10wikitech.wikimedia.org: Have a process for regularly updating wikitech-static - https://phabricator.wikimedia.org/T125709#2012615 (10Krenair) 5Open>3Resolved It's now on 1.26.2 and I plan to keep it up to date with MW releases.
[20:42:03] 6Labs, 10Tool-Labs, 10Continuous-Integration-Infrastructure, 5Patch-For-Review, 7WorkType-Maintenance: Change sid pbuilder image name to 'unstable' - https://phabricator.wikimedia.org/T111097#2012644 (10akosiaris) >>! In T111097#2007656, @hashar wrote: > Funny side effect found on {T125999}. The labs/too... [20:54:54] (03PS3) 10Tim Landscheidt: Add bash-completion config for `become` [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: 10BryanDavis) [21:01:47] (03CR) 10Tim Landscheidt: [C: 032] "The problem was that the extension in debian/ should be *.bash-completion and not *.bash_completion (one thing I don't like about Debian p" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: 10BryanDavis) [21:20:19] (03PS4) 10Tim Landscheidt: Import sql from operations/puppet [labs/toollabs] - 10https://gerrit.wikimedia.org/r/268563 [21:20:21] (03PS3) 10Tim Landscheidt: Add man page for sql [labs/toollabs] - 10https://gerrit.wikimedia.org/r/268602 [21:25:24] 6Labs, 10Labs-Infrastructure, 10Reading-Web, 6operations, and 3 others: https://wikitech.m.wikimedia.org/ serves wikimedia.org portal - https://phabricator.wikimedia.org/T120527#2012804 (10Dzahn) added the ServerAlias in Apache config, silver answers for wikitech.m now, but merging the DNS change to switch... 
[21:27:17] (03PS1) 10Tim Landscheidt: WIP [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269531 [21:30:18] (03Abandoned) 10Tim Landscheidt: WIP [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269531 (owner: 10Tim Landscheidt) [21:37:54] thanks for hammering tools gluten free crawler [21:37:55] 10.68.21.49 tools.wmflabs.org - [04/Feb/2016:22:25:29 +0000] "GET /toolserver-home-archive/ HTTP/1.1" 200 153 "-" "Mozilla/5.0 (compatible; Gluten Free Crawler/1.0; +http://glutenfreepleasure.com/)" [21:44:55] it's not even really funny [21:45:57] (03PS1) 10Tim Landscheidt: Add missing uploaders [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269537 [21:46:15] oh, after re-reading the page twice, the gluten free part actually is a joke and they are using it seriously but don't tell for what. I see. [21:48:59] 6Labs, 6operations, 10wikitech.wikimedia.org: Update wikitech-static OS/PHP version - https://phabricator.wikimedia.org/T126385#2012874 (10Krenair) 3NEW [21:49:20] (03PS2) 10Tim Landscheidt: Add missing uploaders [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269537 [21:49:42] ^ I'm filing that task to be prepared for when 1.27 does get released [21:50:19] I'm hoping it won't be left long enough to also end up with the question "what about xenial?" [21:53:19] (03CR) 10Tim Landscheidt: [C: 04-1] "While correct, the purpose of this change was to fix the changelog-should-mention-nmu error of lintian (https://integration.wikimedia.org/" [labs/toollabs] - 10https://gerrit.wikimedia.org/r/269537 (owner: 10Tim Landscheidt) [21:54:45] (03PS6) 10Tim Landscheidt: Add list-user-databases command [labs/toollabs] - 10https://gerrit.wikimedia.org/r/234934 (https://phabricator.wikimedia.org/T91231) [21:57:51] (03CR) 10BryanDavis: "Thanks for turning my half-baked patch into a working solution Tim." 
[labs/toollabs] - 10https://gerrit.wikimedia.org/r/269367 (https://phabricator.wikimedia.org/T124106) (owner: 10BryanDavis) [22:06:59] 6Labs: broken labs instances (ssh or perms), do we care? - https://phabricator.wikimedia.org/T126323#2012950 (10mmodell) I care about phab-01, but not sure how to deal with it. [22:14:58] (03PS7) 10Tim Landscheidt: Add list-user-databases command [labs/toollabs] - 10https://gerrit.wikimedia.org/r/234934 (https://phabricator.wikimedia.org/T91231) [22:17:45] 10Tool-Labs-tools-Database-Queries, 6Phabricator: Archive Tool-Labs-tools-Database-Queries project - https://phabricator.wikimedia.org/T107699#2013000 (10Aklapper) >>! In T107699#1844789, @jcrespo wrote: > Sorry for reopening these, but who is actively attending these? I do dozens more of these being pinged by...