[00:15:34] leila: I don't see any of the tasks in https://phabricator.wikimedia.org/tag/article-recommendation/
[00:16:20] https://phabricator.wikimedia.org/T112321
[00:16:31] YuviPanda: they are in increasing-content-coverage
[00:16:44] I just added them as blockers to T112321
[00:16:44] leila: ah
[00:16:46] leila: cool
[00:19:54] Labs, Increasing content coverage, Article-Recommendation: Investigate possible instances with 32G of RAM to test article-reccomendations - https://phabricator.wikimedia.org/T116321#1764450 (leila)
[00:20:14] Labs, Increasing content coverage, Article-Recommendation: Investigate possible instances with 32G of RAM to test article-reccomendations - https://phabricator.wikimedia.org/T116321#1746391 (leila)
[00:21:40] leila: can you add a ticket about the search stuff?
[00:22:06] I'm not sure what should the ticket be about, let me ping Ellery, YuviPanda.
[00:22:14] is that a blocking task YuviPanda?
[00:22:29] leila: no but just a ticket in general since I wasn't really sure what it was about
[00:22:38] got it
[00:30:54] leila: would it be useful for me to list out the 'things' that revscoring has right now?
[00:31:02] like, generalized logging, caching, deployment system, etc?
[00:31:47] mm, should we make a capital P task and make those as subtasks for that, and we get back to them when we want to make it capital P?
[00:31:50] YuviPanda: ?
[00:32:08] leila: kindof. there are several other things that *that* will have that this current list will not :)
[00:32:20] like a *particular* deploy system, a particular way of deploying things, etc
[00:32:26] while currently we just have *a* deploy system
[00:32:28] ah! got it. then listing them somewhere will be useful, YuviPanda
[00:32:32] and much more lax standards
[00:32:37] and also things like security review
[00:32:39] for example
[00:32:41] ok
[00:33:00] leila: https://etherpad.wikimedia.org/p/things-revscoring-has-now-in-labs
[00:33:18] thanks, YuviPanda.
[00:38:53] leila: that's fairly complete I think
[00:38:57] leila: are any of them not clear?
[00:39:54] what is the difference between the two stages YuviPanda?
[00:40:10] leila: stage #1 was stuff we did last quarter
[00:40:15] Stage #2 is stuff we're doing this quarter
[00:40:18] ah! got it, YuviPanda
[00:40:23] stage #2 is more 'nice to have' I guess
[00:40:41] and what does comprehensive logging entail, YuviPanda?
[00:41:06] leila: just putting logging.info / logging.debug calls in as many places as possible
[00:41:15] so you can look at the logs and kind of trace the big important parts of the program
[00:41:32] logging.info("starting request for article foo from lang bar to noo")
[00:41:33] are those logs collected in Labs, YuviPanda?
[00:41:35] etc
[00:41:39] leila: yeah
[00:41:43] leila: and automatically rotated etc
[00:41:47] got it, good to know, YuviPanda.
[00:41:51] yup
[00:41:53] that's handled by the puppetization part
[00:41:57] cool cool! all clear. thanks!
[00:42:16] leila: I think it might be useful if I can make halfak write small 'this is why this was useful!' comments by the side too
[00:42:46] yeah, that'd be great!
[00:43:02] leila: yah it's probably too late today but I guess he's been poked!
[00:43:31] halfak: I'm trying to document the stuff we did for revscoring from an infrastructure POV, and have a list at https://etherpad.wikimedia.org/p/things-revscoring-has-now-in-labs can you look at it and maybe add a short sentence about how each of those things helped?
[00:44:11] we should let him rest YuviPanda
[00:44:16] leila: yeah
[00:44:23] no burning out the aaron
[00:44:28] I'm proud of him that he signs off a bit earlier these days, YuviPanda.
[00:44:33] :D
[00:44:34] +1
[00:44:44] I probably should start doing similar things but then I've to wake up earlier
[00:45:12] http://jessenoller.com/blog/2015/9/27/a-lot-happens is a painful read about burnout :(
[00:46:29] will read it on the plane tomorrow, YuviPanda. ;-)
[00:46:59] leila: ooo nice! enjoy your trip :D
[00:47:14] I will. going to a workshop in Chicago, will be back Saturday night.
[00:47:27] leila: nice
[00:47:30] leila: chicago was awesome
[00:48:10] yeah, that city is great. I won't have much time to explore this time. will have quick meet ups with some old friends, other than that, just workshop.
[00:48:27] and I'm super happy to see that it's not freezing cold there, YuviPanda.
[00:48:46] :D
[00:48:57] leila: just being in the city felt like being in a batman movie
[00:49:15] hahaha! :D
[00:49:32] leila: can you poke ellery? he might be filling up stat1002's disk space
[00:49:52] what should I tell him, YuviPanda? :D
[00:50:08] leila: 'what are you running? and staaaahp'
[00:50:12] :D
[00:50:13] okay!
[00:50:28] leila: he has 400G in his homedir :)
[00:50:55] done, YuviPanda. you're cc-ed.
[00:51:17] okay, I'm going to pack YuviPanda. will be back around 7pm.
[04:24:57] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mirahold was created, changed by Mirahold link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Mirahold edit summary: Created page with "{{Tools Access Request |Justification=Nationstates, a political simulator where you can go in depth on your political ideologies. I plan to use these tools to graph various t..."
[06:46:41] PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[07:26:37] RECOVERY - Puppet failure on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:49:14] Tool-Labs-tools-Other, Community-Tech, Tracking: Improving Magnus' tools (tracking) - https://phabricator.wikimedia.org/T115537#1764829 (Ricordisamoa)
[12:45:51] hey all, i'm following instructions from https://wikitech.wikimedia.org/wiki/Help:MediaWiki-Vagrant_in_Labs to install vagrant on a labs instance, but I'm stuck on the step 7.
[12:46:09] how do I start the LXC virtual machine?
[12:46:44] do I have to install virtualbox first?
[13:12:54] bmansurov: by running 'vagrant up', I presume?
[13:13:09] applying role::labs::mediawiki_vagrant should have installed virtualbox and vagrant
[13:17:30] YuviPanda, i can't connect to postgres in labs, do you know the passwd?
[14:34:49] anyone else have trouble with getting Fabric to work with labs? I've had this for like a year and just worked around it by deploying manually, but it's getting Really annoying: "Fatal error: Incompatible ssh peer (no acceptable kex algorithm)"
[14:35:10] they say it's fixed with paramiko 0.15.1 but I tried uninstalling and reinstalling like a madman and nothing works
[14:36:49] milimetric: let me check which version I have
[14:37:02] you deploy with fabric successfully? :)
[14:37:03] oh, wait. On tool labs we don't have the kex algorithms restricted I think
[14:37:15] oh ok
[14:37:22] yeah, this is deploying to limn1.eqiad.wmnet
[14:37:30] which you can abuse by hopping through tools-login
[14:37:45] and you can actually just allow the old kex algorithms with a hiera parameter
[14:38:30] i only ever had this working before they changed that, and I have some weird Kex algorithm string in my .ssh now
[14:38:33] maybe that's wrong?
[14:39:25] I'm not entirely sure. iirc it's a server-side thing, but I'm not sure if it's tools-bastion or limn1 that's actually giving you trouble
[14:39:29] er, bastion.eqiad
[14:40:42] ok, so testing from tools-bastion: limn1 doesn't accept old kex. That's with paramiko 1.10.1
[14:41:35] and testing locally I cannot connect to either tools-bastion or bastion.eqiad. OK, that doesn't help >_<
[14:41:40] right, I wonder if YuviPanda got this to work 'cause he uses fabric but not always on tools
[14:41:53] but then how can I use fabric to deploy stuff on tools?!
[14:42:06] magic to me :)
[14:42:41] aaah. I know. I installed fabric using pipsi, which installs it in its own venv
[14:42:56] with paramiko 1.15.2
[14:43:08] milimetric: https://github.com/mitsuhiko/pipsi
[14:43:27] or just create a venv and pip install fabric there
[14:44:07] ok, I'll try
[14:44:31] but i mean, i have paramiko 1.15.2, and if you just pip install fabric, it comes with a lower version
[14:56:39] I tried pipsi... after "pipsi install Fabric" I get fab linked to .local/bin but "fab" results in "ERROR: You're missing a dependency! ... To build this project using fabric, you'll need to install fabric (and some other stuff): ... pip install -U Fabric paramiko path.py"
[14:58:07] hello, I've been having a very difficult time getting a cronjob to restart a webservice. I know we have some auto-restart mechanism, but it doesn't seem to work right with xtools
[14:58:39] so I wanted to create a cronjob that restarts it once daily. This is not the same as the webwatcher thing we used to use that apparently kept spawning new webservices instead of restarting the existing ones
[14:59:47] anyway, I've got `jlocal sh ~/webstart.sh` where `webstart.sh` just does `webservice restart`. With that I get an email that says `Can't open /data/project/xtools/webstart.sh`
[15:00:15] webstart.sh has full r/w/x
[15:04:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Mirahold was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=198212 edit summary:
[15:06:29] milimetric: weird :/
[15:06:44] it's ok, no worries, I hate fabric
[15:06:57] I'll just deploy manually from now on and erase this repo
[15:07:22] milimetric: so I don't understand that 'pip install fabric' comes with a lower version (it should just take the latest from pypi)
[15:07:45] and I also don't understand how pipsi install fabric fails. Does 'fab' execute the correct fabric? :/
[15:08:09] yeah, pipsi install fabric, then "which fab" gives me the .local/bin one
[15:08:30] so before the pipsi install, just saying "fab" worked
[15:08:40] but then actually doing the deploy would give me that Kex error
[15:08:55] and after pipsi, "fab" didn't work, saying there are unmet dependencies
[15:09:13] which makes me think the fabric package is broken somehow and doesn't reference path.py or something
[15:11:32] oh, and the lower version thing is if I uninstall everything and just "pip install fabric". Then the installed version of paramiko seems to be 1.10. So I just "pip install paramiko" first and then it's fine
[15:17:10] oh, right. pip doesn't upgrade unless requested explicitly (pip install -U ...)
[15:18:19] fabric just says install_requires=['paramiko>=1.10']
[15:24:28] milimetric: ooooooh. That error actually comes from https://github.com/dsc/kraken-old/blob/master/fabfile/__init__.py
[15:25:34] oh, which I'm assuming is the same as https://github.com/wikimedia/limn-deploy/blob/master/fabfile/__init__.py?
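The pip behaviour described at 15:17:10 and 15:18:19 can be sketched in a few lines of Python. This is a simplified model and not pip's actual resolver: `parse_version`, `satisfies_minimum` and `pip_would_replace` are hypothetical helper names, and only plain dotted numeric versions are handled.

```python
def parse_version(text):
    """Turn a dotted version such as '1.10.1' into a comparable tuple (1, 10, 1)."""
    return tuple(int(part) for part in text.split("."))


def satisfies_minimum(installed, minimum):
    """Return True when `installed` meets a '>=minimum' requirement."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '1.10' compares like '1.10.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def pip_would_replace(installed, minimum, latest, upgrade=False):
    """Simplified model (an assumption, not pip's code): an installed package
    that already satisfies the requirement is left alone unless an upgrade
    was requested explicitly, as with `pip install -U`."""
    if not satisfies_minimum(installed, minimum):
        return True
    return upgrade and parse_version(latest) > parse_version(installed)
```

Under this model an existing paramiko 1.10.1 already satisfies `paramiko>=1.10`, so `pip_would_replace("1.10.1", "1.10", "1.15.2")` is false, while passing `upgrade=True` makes it true.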
[15:26:38] more or less, yes
[15:27:05] yes, ok, oops :) sorry, hard to know when to dive deep and when you're thinking about diving into a dragon cave
[15:27:17] I'll get a better error message then
[15:27:31] Labs, Labs-Infrastructure, netops, operations, labs-sprint-117: Allocate labs subnet in dallas - https://phabricator.wikimedia.org/T115491#1765434 (Andrew)
[15:27:43] Labs, Labs-Infrastructure, netops, operations, labs-sprint-117: Allocate subnet for labs test cluster instances - https://phabricator.wikimedia.org/T115492#1765435 (Andrew)
[15:28:11] milimetric: so it's not entirely clear to me whether it works for you now, but ~/.local/venvs/fabric/bin/pip install path.py will probably solve the pipsi issue
[15:28:28] I can understand if you just gave up, though :-)
[15:28:35] k, I'll try (was foolishly doing pip install path.py)
[15:28:50] hey man, if you don't give up, it's shameful for me to give up
[15:29:30] :) ok, so pipsi fab worked now, but same Kex error: Fatal error: Incompatible ssh peer (no acceptable kex algorithm)
[15:29:39] and it still asks me for my password, which doesn't seem right
[15:30:43] no, that clearly isn't :/
[15:31:45] last thing I can think of: ~/.local/venvs/fabric/bin/pip install -U fabric paramiko path.py
[15:31:52] and if that doesn't work, I give up >_<
[15:32:17] This is so frustrating. Fabric is supposed to make stuff easy, not difficult :{
[15:33:10] Labs, Labs-Infrastructure, netops, operations, labs-sprint-117: Allocate labs subnet in dallas - https://phabricator.wikimedia.org/T115491#1765459 (faidon) a:mark>chasemp `17:28 < chasemp> I would like to outline it and take care of it`
[15:33:26] Labs, Labs-Infrastructure, netops, operations, labs-sprint-117: Allocate subnet for labs test cluster instances - https://phabricator.wikimedia.org/T115492#1765464 (faidon) a:mark>chasemp
[15:38:52] milimetric: valhallasw`cloud yes if you copy the contents of https://wikitech.wikimedia.org/wiki/Hiera:Quarry to the hiera page for your project and run puppet fabric will work
[15:39:10] valhallasw`cloud: I don't think there's a paramiko release with support for the kex we use
[15:39:13] YuviPanda: so it's not working on tool labs at the moment
[15:39:18] oh
[15:39:19] oh.
[15:39:23] I see, that explains.
[15:39:32] yeah paramiko is kind of abandoned
[15:39:38] paramiko is always behind
[15:40:29] yeah but it's been behind for quite a while now
[15:40:31] cool, glad to know I was right to hate it. I'll try the hiera thing
[15:40:33] like a year ago someone sent them a patch
[15:40:42] and boom still not merged
[15:40:44] uuuuh. paramiko 1.15.2 can connect to tools-login, 1.10.1 cannot
[15:41:04] but neither can connect to bastion.wmflabs
[15:41:06] fun fun fun
[15:41:51] https://tools.wmflabs.org/wikidata-game/ is not answering :(
[15:42:08] wrote a mass of automation code at $lastgig that was using paramiko so sad panda it's dying to me
[15:42:19] curl sees no response headers
[15:43:46] chasemp: there's some recent activity in the repo, so hopefully it'll pick up some steam again
[15:44:13] not a lot of stuff though...
[15:45:00] actually, "I'm back from vacation now and will be working to merge this & other high profile stuff ASAP" on that specific issue (comment from a few days ago). So maybe...
[15:45:33] valhallasw`cloud: can you help with wikidata-game?
[15:46:11] valhallasw`cloud: well I know in ansible will use paramiko in some cases and it just got a huge influx of cash
[15:46:20] so maybe it will work out :)
[15:47:03] valhallasw`cloud: can you restart wikidata-game? I wonder if it's just lighttpd crapping out again
[15:47:06] !log tools.wikidata-game 2015-10-29 11:20:33: (server.c.1444) [note] sockets disabled, connection limit reached
[15:47:10] heh
[15:47:17] yeah let me check that issue
[15:47:30] * YuviPanda goes to interview people
[15:47:40] netstat etc
[15:47:58] straaaaaaaaaaaaaaaaaaaceeeeeeee!
[15:48:20] !log tools.wikidata-game yep, 300 CLOSE_WAIT connections for tools.wikidata-game
[15:48:37] !log tools.wikidata-game webservice restart should fix that for now...
[15:48:58] Labs, Tool-Labs: lighttpd does not correctly close connections (CLOSE_WAIT) - https://phabricator.wikimedia.org/T104799#1765534 (valhallasw) Happened for tools.wikidata-game today.
[15:49:24] jzerebecki: try again?
[15:49:54] valhallasw`cloud: thx
[15:49:56] works
[16:06:58] :( YuviPanda still no luck
[16:07:01] https://www.irccloud.com/pastebin/iTe471vM/
[16:07:21] and it asks me for the password!! why?!
[16:07:24] https://www.irccloud.com/pastebin/yug14coj/
[16:07:37] I added the things you said here and ran puppet (twice): https://wikitech.wikimedia.org/wiki/Hiera:Analytics
[16:09:20] milimetric: :( in an interview I'll check
[16:12:28] (no rush, Yuvi :) been like this for over a year)
[16:52:21] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[16:53:06] milimetric: back
[16:53:13] milimetric: still around? I might have a possible solution
[16:54:13] milimetric: env.shell = '/bin/bash -c'
[16:54:16] at the top of your fabfile
[16:54:18] valhallasw`cloud: ^
[17:33:54] MediaWiki-extensions-OpenStackManager, I18n: OpenStackManager has 15 messages without documentation - https://phabricator.wikimedia.org/T103214#1765860 (Umherirrender) Open>Resolved a:Umherirrender Done by export from translatewiki.net, with https://gerrit.wikimedia.org/r/#/c/247683/ and https://...
[17:46:32] YuviPanda: wat
[17:58:18] YuviPanda: nah, no luck on that
[17:58:21] maybe we should hang out
[17:59:45] milimetric: sure. already in another hangout, gimme a few more mins?
[17:59:58] no worries, me too, ping me whenever
[18:02:21] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:20:50] milimetric: wanna do it now?
[18:21:31] valhallasw`cloud: yeah so by default it invokes bash with -l and that treats it as a login shell which sudo is confused about and asks for passwords often
[18:27:08] YuviPanda: sure, now's good
[18:27:51] milimetric: kk
[18:32:03] YuviPanda: https://github.com/wikimedia/limn-deploy/blob/master/fabfile/__init__.py
[18:33:17] https://github.com/wikimedia/limn-deploy/blob/master/fabfile/deploy.py#L83
[18:38:07] valhallasw`cloud: bastion didn't have the no_explicit_macs stuff set
[18:39:42] I just set it
[18:44:38] Labs, Gerrit-Migration: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766048 (yuvipanda) NEW
[18:49:13] Labs: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766056 (yuvipanda)
[18:49:28] Labs, Tool-Labs, Diffusion: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766048 (yuvipanda)
[18:50:04] Labs, Tool-Labs, Diffusion: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766067 (Legoktm) Additionally, it would be nice if said service did not require a new account system and could be hooked up to LDAP. Bonus points if you can grant repo access to all mem...
[18:50:56] Labs, Tool-Labs, Diffusion: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766072 (yuvipanda)
[19:03:59] YuviPanda: awesome, the remaining problem was that we had monkey-patched so I removed that and all works ok
[19:04:09] monkey patch was: https://github.com/wikimedia/limn-deploy/blob/master/fabfile/monkeypatch_sshproxy.py
[19:04:11] milimetric: awesome
[19:04:24] thank you very much, I can deploy like a normal human again
[19:04:25] :)
[19:05:05] milimetric: :D cool
[19:22:55] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766189 (yuvipanda) NEW
[19:23:28] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766197 (chasemp) p:Triage>Normal
[19:25:27] Labs, Labs-Team-Backlog: Support bare-metal server allocation in labs -- bootstrap mode - https://phabricator.wikimedia.org/T95185#1766201 (chasemp) use case https://phabricator.wikimedia.org/T117081
[19:34:20] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766225 (chasemp)
[19:34:22] Labs, Labs-Team-Backlog: Support bare-metal server allocation in labs -- bootstrap mode - https://phabricator.wikimedia.org/T95185#1766224 (chasemp)
[19:34:37] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766226 (yuvipanda)
[19:41:20] hi
[19:54:10] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766189 (yuvipanda)
[20:18:59] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766449 (yuvipanda)
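The login-shell difference described at 18:21:31 can be observed without Fabric. The following is a minimal sketch, assuming `bash` is available on the PATH; `is_login_shell` is a hypothetical helper name, and the fabfile setting quoted at 16:54:13 appears only in a comment and is not executed.

```python
import subprocess


def is_login_shell(shell_args):
    """Run bash with the given leading arguments and report whether
    that bash considers itself a login shell."""
    # `shopt -q login_shell` exits with 0 inside a login shell, 1 otherwise.
    result = subprocess.run(
        ["bash"] + shell_args + ["shopt -q login_shell"],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# Invocation style described in the log as the default: bash -l -c "<command>"
default_is_login = is_login_shell(["-l", "-c"])

# Invocation style matching the suggested line: env.shell = '/bin/bash -c'
plain_is_login = is_login_shell(["-c"])
```

With `-l` bash reads the login profile files before running the command; dropping `-l`, as the suggested `env.shell` value does, skips that step.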
[20:22:12] Labs: Investigate moving mwoffliner onto a labs-on-real-hardware machine - https://phabricator.wikimedia.org/T117081#1766474 (yuvipanda)
[20:29:21] Labs, Wikibugs, grrrit-wm: puppetize grrrit-wm and wikibugs - https://phabricator.wikimedia.org/T104616#1766487 (yuvipanda) grrrit-wm is on kubernetes now, wikibugs should be too
[20:33:28] Labs, Labs-Infrastructure, operations, labs-sprint-117, labs-sprint-118: How to handle mgmt lan for labs bare metal? - https://phabricator.wikimedia.org/T116607#1766502 (RobH)
[20:42:24] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766532 (RobH) NEW a:RobH
[20:42:40] Labs, Labs-Infrastructure, hardware-requests, operations, and 2 others: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#1766541 (RobH)
[20:42:42] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766540 (RobH)
[20:43:13] Labs, Tool-Labs, Diffusion: Figure out a git hosting solution for tools - https://phabricator.wikimedia.org/T117071#1766551 (yuvipanda)
[20:43:59] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766532 (RobH)
[20:52:06] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766623 (RobH)
[20:55:55] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766532 (RobH)
[20:55:58] Labs, Labs-Infrastructure, hardware-requests, operations, and 2 others: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#1766662 (RobH) Open>Resolved a:RobH Ok, so everything in the blocking network tasks states row B is labs in codfw as well (for now.) So I'm resolv...
[21:05:33] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766722 (RobH)
[21:05:40] Labs, Labs-Infrastructure, operations, ops-codfw: on-site tasks for labs deployment cluster - https://phabricator.wikimedia.org/T117107#1766723 (RobH) NEW a:Papaul
[21:07:29] Labs, Labs-Infrastructure, operations: deployment tracking of codfw labs test cluster - https://phabricator.wikimedia.org/T117097#1766532 (RobH)
[23:01:56] Labs, labs-sprint-118: Document support levels for tools and labs projects - https://phabricator.wikimedia.org/T116598#1767179 (chasemp) p:Triage>Normal
[23:25:23] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 57.14% of data above the critical threshold [0.0]
[23:38:48] Labs, wikitech.wikimedia.org, Regression: Failed log in on wikitech results in blank page with Exception - https://phabricator.wikimedia.org/T117133#1767327 (Krinkle) NEW