[00:17:18] Tool-Labs, ToolLabs-Goals-Q4: Make tools-login / bastion hosts redundant and move them to trusty - https://phabricator.wikimedia.org/T91863#1214403 (yuvipanda) Alright, tools-dev should move in, say, a week. 22nd April. Writing announcement email now.
[00:54:32] Tool-Labs, ToolLabs-Goals-Q4: Make tools webproxy redundancy a lot better - https://phabricator.wikimedia.org/T96334#1214519 (yuvipanda) NEW
[00:56:23] Tool-Labs, ToolLabs-Goals-Q4: Use base::firewall on tools proxies - https://phabricator.wikimedia.org/T96335#1214541 (yuvipanda) NEW
[01:10:05] Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4: Use base::firewall on tools proxies - https://phabricator.wikimedia.org/T96335#1214570 (yuvipanda)
[01:48:55] !log tools disable puppet on live webproxy (-01) to apply firewall changes to -02
[01:49:00] Logged the message, Master
[03:28:15] Tool-Labs, ToolLabs-Goals-Q4: Make tools webproxy redundancy a lot better - https://phabricator.wikimedia.org/T96334#1214672 (yuvipanda)
[03:28:17] Tool-Labs, Patch-For-Review, ToolLabs-Goals-Q4: Use base::firewall on tools proxies - https://phabricator.wikimedia.org/T96335#1214669 (yuvipanda) Open>Resolved a:yuvipanda Boom all done :D
[06:51:37] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:01:52] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[07:16:38] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:26:51] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:03:41] Labs, Patch-For-Review: labvirt boxes need a new cert for libvirtd - https://phabricator.wikimedia.org/T96291#1214908 (akosiaris) OK, change submitted. I am unsure of the repercussions of this one. Probably needs to be coordinated so that it happens on all virt nodes in the same time followed by a restart...
[11:56:55] Tool-Labs, Patch-For-Review: Trusty instances do not show the motd banners - https://phabricator.wikimedia.org/T85307#1215073 (faidon) Labs' outdated PAM config is a problem that needs to be solved. See T85910.
[12:13:36] YuviPanda: why I don't see any instances in project huggle? They are up and running but not in web interfaces
[12:14:04] This user is now online in #wikimedia-labs. I'll let you know when they show some activity (talk, etc.)
[12:14:04] @notify YuviPanda
[13:19:04] Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface: Can't list of create new instances - https://phabricator.wikimedia.org/T96362#1215182 (Petrb) NEW
[13:28:15] Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface: Can't list of create new instances - https://phabricator.wikimedia.org/T96362#1215241 (Petrb) p:Triage>Unbreak!
[13:28:23] Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface: Can't list or create new instances - https://phabricator.wikimedia.org/T96362#1215243 (Petrb)
[13:29:48] Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface: Can't list or create new instances - https://phabricator.wikimedia.org/T96362#1215182 (Petrb) Open>Resolved a:Petrb relogging fixed this
[14:21:26] Tool-Labs, Patch-For-Review: Improve & force sudo lecture - https://phabricator.wikimedia.org/T95882#1215412 (scfc) a:valhallasw
[15:13:56] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Hprmedina was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=154360 edit summary:
[15:14:56] o.o
[15:17:05] Wikimedia-Labs-Infrastructure, Wikimedia-Labs-wikitech-interface: Can't list or create new instances - https://phabricator.wikimedia.org/T96362#1215526 (Krenair) Resolved>Invalid
[15:25:47] Hello, I have a bot that I've been working on that is ready to go into the approval process, but I need some Ruby gems installed. Please see: https://phabricator.wikimedia.org/T96261
[15:26:22] It will only take a second. If an admin could go to /data/project/musikbot/MusikBot and run `gem install bundler && bundle install` they'd be my best friend!!!
[15:27:13] Tool-Labs: Trusty instances do not show the motd banners - https://phabricator.wikimedia.org/T85307#1215557 (scfc)
[15:28:16] MusikAnimal: Doesn't ruby allow you to keep local gems without having to install them globally?
[15:28:54] I thought that's what bundler did
[15:28:59] but bundler is not installed :(
[15:29:33] bundler should be global anyway
[15:29:57] MusikAnimal: Can you open a phab ticket for this then? We do not install packages with things like gem and pip and would need to create a debian package for this.
[15:30:10] There is probably a deb for it too. :-)
[15:30:12] https://phabricator.wikimedia.org/T96261
[15:30:33] ^ that ask for a little bit more, I can remove the part about installing Ruby 2, not necessary
[15:30:44] I just need to be able to install gems as needed
[15:30:55] MusikAnimal: That would make it easy to install bundler, yes. :-)
[15:31:35] Tool-Labs, Labs-Q4-Sprint-3: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1215567 (coren) p:Triage>Normal a:coren
[15:32:14] Labs, Labs-Q4-Sprint-2, Labs-Q4-Sprint-3: Slides for the Labs storage presentation - https://phabricator.wikimedia.org/T95317#1215572 (coren)
[15:35:26] MusikAnimal: doesn't "gem install --user-install bundle" work?
[15:35:41] I was just reading that
[15:36:04] the --user-install admittedly is something I wasn't aware of, never been put in this position before. Anyway about to try it!
[15:38:07] MusikAnimal: That's even better if it works for you - ruby isn't in all that much use so global packages may not be the best idea. :-)
[15:40:59] I don't know where it installed it. There should be a .gem directory somewhere. Should I be installing as musikanimal or musikbot?
[15:41:48] bundler installed and now I want to add it to the PATH
[15:48:41] put it in verbose mode and found them, in /data/project/musikbot/.gem
[15:48:51] good stuff, this works for me! thank you Coren sitic
[16:17:34] !log openstack live-migrating instances to labvirt hosts. There may be brief interruptions in responsiveness
[16:17:38] Logged the message, dummy
[16:23:59] Labs: Zillion expired tokens in keystone database - https://phabricator.wikimedia.org/T96256#1215781 (Andrew) As of now, 3953465 So that's progress, albeit creeping progress
[16:33:30] it's a bit confusing that with the new grid job numbers qacct first shows infos about a job from a year ago with the same id and only after a long waiting period the information on the job that ended some minutes ago
[16:34:15] sitic: We should probably truncate the accounting log now that job ids have rollovered. It's just a dumb flat file, and qacct actually traverses it from the start every time.
[16:42:01] Tool-Labs, Labs-Q4-Sprint-3: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1215829 (MusikAnimal) All set, but for reference to other Rubyists working on labs, here's what I did: # install rbenv, follow instructions at https://github.com/sstephenson/rbenv#in...
[16:42:28] ^ Coren you can close that task, not sure if I am able to do it myself
[16:43:02] I left some notes for others should they run into the same problem
[16:45:22] MusikAnimal: You should be able to.
[16:46:00] Tool-Labs, Labs-Q4-Sprint-3: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1215833 (MusikAnimal) Open>Resolved Thank you!
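Coren's remark at 16:34:15 explains sitic's confusion: qacct reads the flat accounting file from the start, so once job ids have rolled over, the first record it meets for a given id is the year-old one. The Python sketch below is a hypothetical illustration, not qacct's implementation. It assumes the classic SGE accounting(5) layout (one colon-separated record per finished job, `#` comment lines, job_number as the sixth field) and keeps the *last* matching record instead of the first. The sample records are invented and abbreviated; real records carry many more trailing fields.

```python
# Hypothetical sketch, not qacct's implementation: pick the most recent
# accounting record for a job id once ids have rolled over.
# Assumes the SGE accounting(5) layout: one colon-separated record per
# finished job, '#' comment lines, job_number as the 6th field.

JOB_NAME_FIELD = 4    # 0-based index of job_name
JOB_NUMBER_FIELD = 5  # 0-based index of job_number


def latest_record(lines, job_id):
    """Return the fields of the LAST record whose job_number is job_id, or None.

    A front-to-back scan that stops at the first hit returns the oldest job
    with a reused id; remembering the last hit returns the newest one.
    """
    wanted = str(job_id)
    found = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split(":")
        if len(fields) > JOB_NUMBER_FIELD and fields[JOB_NUMBER_FIELD] == wanted:
            found = fields  # keep scanning; a later record wins
    return found


if __name__ == "__main__":
    # Invented, abbreviated records (real ones have many more fields).
    sample = [
        "# comment lines are skipped",
        "task:exec-01:tools:alice:old-job:12345:sge:0:1366000000:1366000010:1366000100:0:0",
        "task:exec-06:tools:bob:new-job:12345:sge:0:1429300000:1429300010:1429300100:0:0",
    ]
    record = latest_record(sample, 12345)
    print(record[JOB_NAME_FIELD])  # prints "new-job"
```

Passing an open file object for the grid's accounting file works the same way, since iterating a file yields lines. This still reads the whole file on every lookup, so truncating or rotating the log, as suggested in the channel, remains the cheaper fix.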
[16:46:02] found it
[16:46:03] thanks
[16:46:25] Tool-Labs, Labs-Q4-Sprint-3: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1215841 (Technical13) a:coren>MusikAnimal
[16:46:35] ;)
[16:48:50] is there something wrong with account creation?
[16:59:31] are you talking about wikitech or accounts.wmflabs.org or ???
[17:07:12] Dragonfly6-7: ?
[17:36:42] petan: we deleted all your instances because of inactivity
[17:59:32] @YuviPanda: dplbot's webserver is not responding again (3rd time in 24 hours); I'm leaving it alone this time for you
[18:00:04] Oh damn I'm not at my computer again boo :(
[18:00:20] russblau: it is gonna be about... 2h before I get to a computer :(
[18:00:23] Is that ok?
[18:01:20] it's not ideal :-) but if you think you can determine a cause, 2 hours of downtime now is better than having it go down again in the middle of the night
[18:04:22] Thank you. I think I should be able to do that...
[18:28:53] Gerrit-Patch-Uploader, Easy: Serve static resources from //tools-static.wmflabs.org or /static/ project - https://phabricator.wikimedia.org/T86354#1216338 (Aklapper) @valhallasw: Could you answer Krinkle's question?
[18:30:11] Labs, Labs-Q4-Sprint-2, Labs-Q4-Sprint-3: Slides for the Labs storage presentation - https://phabricator.wikimedia.org/T95317#1216339 (coren) Open>Resolved https://commons.wikimedia.org/wiki/File:WMF_Labs_storage_presentation.pdf
[19:22:19] Wikimedia-Labs-Infrastructure, Beta-Cluster, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#526960 (Dzahn) Is T70387 a duplicate of this?
[20:29:17] Labs, Labs-Q4-Sprint-3: Labs: puppetize stripe_cache_size tweaks on labstores - https://phabricator.wikimedia.org/T96045#1216878 (coren) This isn't a real sysctl, and only becomes available once the devices are assembled. Given that setting the size is an idempotent operation, the easiest thing to do is t...
[20:47:45] YuviPanda: do you deploy apache changes?
[20:49:46] mutante: nope...
[20:50:05] then how did you get stuff deployed when you merged them?
[20:51:13] YuviPanda: I see dplbot webpages are back up; thanks. Do you think this is a permanent fix?
[20:52:05] russblau: not sure. I'm making more changes to the proxy infra today to prevent issues like yours
[20:52:05] Let's see what happens
[21:10:08] > Your job 43104 ("lolrrit-wm") has been submitted
[21:10:44] !log tools.lolrrit-wm manually started it, why is bigbrother still disabled?
[21:10:47] Logged the message, Master
[21:17:49] hi, is it possible to load mobile friendly version of some article to external site (including all the minerva stack)? I know of parse with mobileformat and action=mobileview but it seems to be more for internal use
[21:19:41] eranroz: action=mobileview with the API is stable - it is what is used by the mobile app
[21:20:15] Labs, Labs-Q4-Sprint-3: Labs: puppetize stripe_cache_size tweaks on labstores - https://phabricator.wikimedia.org/T96045#1217086 (coren)
[21:20:29] YuviPanda: but it doesn't give all the nice styles of minerva ;)
[21:20:32] Labs, Labs-Q4-Sprint-3: Labs: puppetize stripe_cache_size tweaks on labstores - https://phabricator.wikimedia.org/T96045#1217088 (coren) Open>Resolved Deployed.
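The T96045 comment at 20:29:17 names the two constraints for the stripe_cache_size tweak: it is not a sysctl and only appears once an md array is assembled, and setting it is idempotent. A minimal Python sketch of a setter that honors both, assuming the usual Linux md location /sys/block/mdN/md/stripe_cache_size (present for RAID4/5/6 arrays only). The `sysfs_root` parameter is a hypothetical hook added so the function can be tried against a scratch directory; this is not the puppet change that was actually deployed.

```python
# Hypothetical sketch (not the deployed puppet change): idempotently set
# stripe_cache_size on every md array that is currently assembled.
import glob
import os


def set_stripe_cache(value, sysfs_root="/sys"):
    """Write `value` to each <sysfs_root>/block/md*/md/stripe_cache_size that exists.

    Arrays that are not assembled have no such file, so the glob simply does
    not see them. Files already holding `value` are left alone, so repeated
    runs are harmless. Returns the paths that were actually changed.
    """
    wanted = str(value)
    pattern = os.path.join(sysfs_root, "block", "md*", "md", "stripe_cache_size")
    changed = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            current = f.read().strip()
        if current != wanted:
            with open(path, "w") as f:
                f.write(wanted)
            changed.append(path)
    return changed
```

Run as root from puppet or cron, a call such as `set_stripe_cache(8192)` would touch only arrays that exist at that moment and whose value differs; 8192 is a placeholder, not the labstore setting.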
[21:20:42] eranroz: then it might be a good question for #wikimedia-mobile :)
[21:20:48] action=mobileview just gives you the content
[21:21:51] YuviPanda: ok thanks
[21:58:05] Labs: Zillion expired tokens in keystone database - https://phabricator.wikimedia.org/T96256#1217253 (Andrew)
[21:58:08] Wikimedia-Labs-wikitech-interface, operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1217252 (Andrew)
[21:58:56] hm, interesting that i can add a blocking task to a resolved issue
[21:59:49] andrewbogott: blocking or "blocked by"
[21:59:57] i guess it should auto-reopen it then
[22:00:15] ‘blocked by'
[22:00:23] and it didn’t reopen, which is fine in this case
[22:00:34] but I’d expect phab to care more strongly about graph integrity :)
[22:04:22] Labs: Upgrade labs cluster to Jessie (alternative to T90821) - https://phabricator.wikimedia.org/T91799#1217270 (Andrew) Open>declined It's going to be Trusty. The cloud archive is just too handy.
[22:15:21] Labs: Nova Instance creation hook for ldap - https://phabricator.wikimedia.org/T91987#1217308 (Andrew) This is now happening. Designate-sink is running a custom plugin python-nova-ldap that creates ldap entries with new instances and deletes them when the instances are deleted. It also cleans up puppet and...
[22:16:47] andrewbogott: to be fair to phab, it is written in PHP and probably doesn’t care strongly about anything :)
[22:16:53] see also: Their horrrrriiiibleeee API
[22:17:08] Man, people sure do keep writing software in php.
[22:17:43] :)
[22:18:36] do we use any software we don't also think sucks?:)
[22:19:06] redis / nginx get very little hate I think?
[22:20:18] git!
[22:20:24] well, submodules :P
[22:20:27] Also, maybe memcache?
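For eranroz's question (21:17:49 to 21:20:48): action=mobileview returns page content only, which is why the Minerva styling is missing from its output. A hypothetical Python sketch that builds such a request URL. The parameter names `page`, `sections` and `prop` are assumptions based on MobileFrontend's module of that era and should be checked against `api.php?action=help&modules=mobileview` on the target wiki, where the module may no longer be available; the default API endpoint is only an example.

```python
# Hypothetical sketch: build a MediaWiki API URL for action=mobileview.
# The parameter names below are assumptions based on MobileFrontend's
# module of that era; verify with api.php?action=help&modules=mobileview.
from urllib.parse import urlencode


def mobileview_url(page, api="https://en.wikipedia.org/w/api.php",
                   sections="all", prop="text"):
    """Return a GET URL requesting the content sections of `page` as JSON.

    The response carries section HTML only; no Minerva CSS or JS comes
    with it, matching what is said in the log.
    """
    params = {
        "action": "mobileview",
        "format": "json",
        "page": page,
        "sections": sections,
        "prop": prop,
    }
    return api + "?" + urlencode(params)


if __name__ == "__main__":
    print(mobileview_url("Main Page"))
```

Styling the returned HTML like the mobile site would still require loading the skin's stylesheets separately, which is the part the channel redirected to #wikimedia-mobile.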
[22:20:30] ah, hmm
[22:20:36] there’s a large contingent that used to hate git
[22:20:41] but a lot of that is just gerrit hate
[22:21:12] andrewbogott: btw, there’s now base::firewall on the labs proxy instances, with appropriate holes for wikitech and the outside world
[22:21:30] that seems good :)
[22:21:31] andrewbogott: part of https://phabricator.wikimedia.org/T96334
[22:21:31] just a fi
[22:21:32] fyi
[22:21:36] andrewbogott: yeah :)
[22:22:00] andrewbogott: I can make them redundant trivially now, I think. not sure if worth it
[22:22:08] (doing a lot of work to make the redundancy of toollabs proxies much better)
[22:22:41] andrewbogott: also did you get my message earlier about needing to clean puppet / salt certs in local puppet / salt masters?
[22:27:41] YuviPanda: I don't /hate/ git but I still miss my clearer svn workflow.
[22:28:03] YuviPanda: Give me svn with proper directory rename support and I'd be happy. :-)
[22:28:13] tch tch :)
[22:28:20] local branches and rebasing ftw :)
[22:28:25] and commit --amend
[22:28:28] and commit -p...
[22:28:30] and stashing...
[22:28:56] I'm not sure I find any of those things to be features rather than warts, actually. Git really shines at a very specific usecase we don't actually care about.
[22:29:09] (well, mw core probably does, not ops)
[22:29:43] Coren: svn doesn't really do pre-merge code review well, though. Not that git-review is a beauty...
[22:30:26] Coren: depending on what part of svn you feel you're missing out on, you could try the github svn bridge
[22:30:43] Meh. I've gotten used to the git workflow by now.
[22:30:53] I'm just getting old and set in mah ways. :-)
[22:31:01] wait, ops doesnt care about commit --amend ?
[22:31:07] that would explain
[22:31:25] mutante: Hah. I meant long-lived feature branch (the use case where git really shines)
[22:31:47] right. the patch workflow can be done with svn as well, but then you're mailing patches around
[22:32:08] although rietveld or phabricator can do that in a smarter way as well, I guess
[22:32:12] :) i love it when patches are actually edited by multiple people.. wiki
[22:32:44] mutante: That's... collaborative but the workflow is anything *but* wiki wiki!
[22:33:03] Wiki is "edit the endresult 'directly' and changes go live now"
[22:33:15] YuviPanda: on a sidenote.... are -dev and -login supposed to be interchangeable? I thought -dev had more compilers and stuff
[22:33:42] YuviPanda: if they are, having them fully interchangeable sounds like a good plan
[22:33:51] valhallasw`cloud: The software is the same, the difference is "don't load -login to avoid messing with other people, if you have heavy things to do do 'em on -dev"
[22:34:11] Like builds, etc.
[22:34:15] valhallasw`cloud: the new ones are the same size.
[22:34:19] and only differ by convention
[22:34:25] in an outage if needed they can be.
[22:34:36] yeah, fair enough. YuviPanda wanted to have -dev available as -login backup for when -login fails, with the same ssh keys
[22:34:38] and yes I’m going to make the keys the same.
[22:34:45] once I talk to csteipp :)
[22:36:13] how long does provisioning take? would it make sense to have a provisioned but not running host standby?
[22:44:48] YuviPanda: I didn’t get your message. But, I’ve been fretting about that today.
[22:45:03] I kind of want to just tell people with local puppet masters that they’re on their own :)
[22:45:38] I can’t imagine having nova know where the puppet master is for an instance /and/ have access to that puppet master to clear certs...
[23:09:10] andrewbogott: yeah and you can’t salt that either because of lack of salt syndic...
[23:10:15] YuviPanda: so, is it valid to just leave that use case to ‘fix it by hand’?
[23:10:22] It’s not /that/ hard to clean a cert.
[23:10:23] I… don’t know.
[23:10:42] it’s a manual step and I hate manual steps. but I’m not sure what the alternative is, but I haven’t thought about that...
[23:10:44] hmmm
[23:11:00] What are local puppetmasters needed for these days?
[23:11:20] local testing. deployment-prep, staging, integration
[23:11:27] well, projectwide puppetmaster at least.
[23:11:28] That’s who, but what?
[23:11:37] Why using them instead of the normal puppetmaster?
[23:11:41] we test patches on deployment-prep before applying them on prod
[23:11:47] ok
[23:11:47] so you cherry-pick that to deployment-prep
[23:11:53] at least deployment prep still manages a separate set of patches
[23:11:56] yeah, that’s hard to avoid.
[23:11:59] and staging would be same use case
[23:12:09] yeah, we’ll keep them minimal but testing is a very valid use case
[23:12:17] Although you could solve that with puppetception maybe? I guess that just adds complexity without solving the cert issue.
[23:12:20] Integration has nobody with +2 caring about it for a long enough time.
[23:12:28] andrewbogott: not for deployment-prep, no.
[23:12:46] because it’s testing ground for prod changes (mw related ones at least)
[23:12:46] So...
[23:13:07] Another option is to just keep id-named certs. And switch to using the modern nova id rather than ec2.
[23:13:21] no, that’s still not human readable
[23:13:23] That sucks for salt, though, right? Because then you can’t do wildcard matches?
[23:13:24] *whispers*....local puppet only....
[23:13:27] just a dream tho
[23:13:37] what do you mean by ‘local puppet only’?
[23:13:49] andrewbogott: how about ec2id for things with local puppet only?
[23:13:55] err
[23:13:57] local puppetmasters
[23:13:58] err
[23:14:01] YuviPanda: ugh!
[23:14:01] non-virt1000 puppetmasters
[23:14:04] But, might be possible.
[23:14:26] Well, I’m committed to abolishing ec2. But proper nova id, maybe.
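The objection at 23:13:23 to id-named certs is about targeting: Salt's default target type is a shell-style glob on the minion id, so a target like `'tools-exec-*'` only works when ids carry the hostname. A small Python illustration, using `fnmatch` as an approximation of Salt's glob matching; the hostnames are examples and the `i-...` ids are invented.

```python
# Illustration only: Salt's default targeting is a shell-style glob on the
# minion id; fnmatch approximates that matching here.
from fnmatch import fnmatch


def glob_targets(minion_ids, pattern):
    """Return the minion ids that `pattern` would select, in input order."""
    return [m for m in minion_ids if fnmatch(m, pattern)]


if __name__ == "__main__":
    # Hostname-based ids vs invented instance-id-based ids.
    by_name = ["tools-exec-06.eqiad.wmflabs", "tools-webgrid-03.eqiad.wmflabs"]
    by_id = ["i-00000a1b.eqiad.wmflabs", "i-00000a1c.eqiad.wmflabs"]
    print(glob_targets(by_name, "tools-exec-*"))  # ['tools-exec-06.eqiad.wmflabs']
    print(glob_targets(by_id, "tools-exec-*"))    # []
```

With id-named minions, the same group could only be reached through grains or an explicit list of ids, which is the loss of convenience being pointed out.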
[23:14:29] andrewbogott: can we write a script that auto accepts all ‘conflicting’ ids
[23:14:45] if that’s done, I think we can tell per-project puppetmaster people to ‘just run that script'
[23:14:49] and maybe we can put that in a cron
[23:14:53] YuviPanda: I don’t know. I thought of that: making the puppetmaster just not care.
[23:14:56] (similar to our current auto accept script)
[23:15:10] right, auto accept could just /really/ auto-accept every damn thing.
[23:15:11] I think making the puppetmaster not just care for per-project puppet stuff is ok...
[23:15:14] yeah
[23:15:25] I guess you’re still waiting on a review from me for auto-accept, right?
[23:15:33] that was just a cleanup yeah
[23:15:41] it currently runs tho
[23:15:44] you say cleanup, I say total rewrite
[23:15:49] ;D
[23:15:53] but, ok, I’ll try to catch up with that
[23:15:57] and then maybe we can make it more lenient.
[23:15:58] andrewbogott: I say cleanup because I didn’t delete the entire file before starting :D
[23:16:07] andrewbogott: and also because there’s still stuff there I have no idea why it’s for
[23:16:07] https://gerrit.wikimedia.org/r/#/c/198790/
[23:16:10] the ‘wat’ comments
[23:16:19] oh, I guess Ryan wrote that originally?
[23:17:19] andrewbogott: his name is on it because it was imported from original repo and he did the import...
[23:17:20] I think
[23:17:25] ah
[23:17:40] so, written by unknown puppet engineers in prehistory
[23:17:43] yes
[23:17:57] we should keep it that way, I think :D
[23:18:00] “We wrote a security system, /and/ a tool to subvert that security system!”
[23:18:23] synthesis, antithesis, puppet
[23:18:28] :D
[23:23:20] YuviPanda: could you approve this? it's the production version of the last one I asked you to approve: https://www.mediawiki.org/w/index.php?title=Special:OAuthListConsumers/view/cc03c438ffef19f19b7d60a952327451&name=&publisher=Ragesoss&stage=0
[23:24:27] ragesoss: done
[23:24:33] thanks!
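The cron idea at 23:14:49 needs a way to decide which certs and keys on a per-project puppetmaster / salt master are stale. A hypothetical Python sketch, assuming certnames equal instance names and that the caller already has the list of signed certs and the list of live instances (how those are fetched is left out). It emits Puppet 3 era `puppet cert clean` and `salt-key -y -d` invocations for every signed name with no live instance behind it. It is not the auto-accept script in Gerrit change 198790.

```python
# Hypothetical sketch, not the auto-accept script under review: list the
# commands that would drop certs/keys left behind by deleted instances on a
# per-project puppetmaster / salt master.


def stale_cert_commands(signed_certs, live_instances):
    """Return argv lists removing every signed cert with no live instance.

    Assumes certnames equal instance names. Cleaning a stale entry lets a new
    instance that reuses the name get a fresh certificate and key accepted.
    """
    live = set(live_instances)
    commands = []
    for name in sorted(set(signed_certs) - live):
        commands.append(["puppet", "cert", "clean", name])  # Puppet 3 era CA CLI
        commands.append(["salt-key", "-y", "-d", name])     # delete the minion key
    return commands


if __name__ == "__main__":
    signed = ["web-01.eqiad.wmflabs", "web-02.eqiad.wmflabs"]
    live = ["web-01.eqiad.wmflabs"]
    for cmd in stale_cert_commands(signed, live):
        print(" ".join(cmd))
```

Returning argv lists rather than running them keeps a dry run easy; each list can be passed to `subprocess.run(cmd, check=True)` once the output looks right. On Puppet 6 and later the CA command is `puppetserver ca clean --certname NAME` instead of `puppet cert clean`.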
[23:24:43] yw :)
[23:54:02] mutante: https://phabricator.wikimedia.org/T273 is that because of the weird phab deployment flow?
[23:55:56] the confusion that is
[23:56:11] Negative24: yes, it got merged but not deployed yet
[23:56:19] i think it's simply not resolved before it's .. resolved
[23:56:35] hmm
[23:56:39] i don't agree with calling things resolved because we will deploy them next week
[23:56:58] how are tasks dealt with when mw changes are only deployed with the mw train?
[23:57:38] i don't really know but i would argue the same, a bug is resolved when the code is actually live
[23:58:34] ok
[23:59:11] it doesn't really harm anything if a task is kept open for a few weeks more
[23:59:41] a task like that is kind of borderline which is why I'm asking