[00:00:07] woooooooooooooo [00:00:35] yuvipanda: is it um, ready to make production edits? I have a replace.py job I need to run [00:00:55] legoktm: how many? [00:01:23] dunno, like 30 or 100 [00:01:29] legoktm: oh yeah, let's do it [00:01:38] legoktm: the only instability is me restarting things now and then but I"ll hold off now :) [00:02:03] also, is the oauth stuff all done now? [00:02:08] legoktm: no haven't started on it yet [00:02:09] or do I have to give it my password? [00:02:14] legoktm: so you have to give it your password now [00:03:14] are the cookie files only readable to me? [00:03:30] legoktm: yes [00:03:41] legoktm: well, and me [00:03:47] legoktm: and other labs / tools admins [00:04:00] I mean, they already have my password from my other tools so nbd [00:04:05] yeah [00:04:08] nobody outside of that [00:04:20] each person's $HOME is individually written to /data/project/paws/userhomes/$userid [00:04:49] legoktm: one issue with OAuth is then I need to get way more powers for PAWS [00:04:56] right. [00:04:58] and not just authentication [00:05:01] tools.paws@jupyter-legoktm-2645:~$ python replace.py [00:05:01] python: can't open file 'replace.py': [Errno 2] No such file or directory [00:05:09] legoktm: pwb replace? [00:05:09] Do I have to do something before I can run stuff? [00:05:15] legoktm: did you do pwb [00:05:17] err [00:05:19] pwb.py [00:05:21] no? [00:05:22] legoktm: pwb.py login? [00:05:25] legoktm: https://www.mediawiki.org/wiki/User:John_Vandenberg/GCI_walk-through [00:05:30] tools.paws@jupyter-legoktm-2645:~$ pwb.py login [00:05:31] bash: pwb.py: command not found [00:05:32] legoktm: the scripts are in /srv/pwb [00:05:48] legoktm: oh wat. when did you login? [00:05:57] a few hours ago [00:05:59] legoktm: oh yeah, you logged in before I fixed that [00:06:05] okay, log out and in again? 
[00:06:10] legoktm: go to 'control panel', and 'stop' your server [00:06:17] legoktm: and wait for about 30s and 'start' again [00:06:20] (I need to fix this properly too) [00:06:41] legoktm: I"ll tell you when it's fully stopped... [00:06:48] lol, does it have to make a network request? [00:07:54] legoktm: no, I deployed a new container [00:07:57] legoktm: with PATH set [00:08:03] and the i18n stuff fixed [00:08:07] you're still on the old container [00:08:17] and 'stop' only rqeuests the stop and doesn't wait for it to actually stop [00:08:20] which I should fix [00:08:47] so is it good for me to login now? [00:08:59] (I didn't really understand what you just said btw) [00:09:09] adding a MOTD is a good idea imo [00:09:21] legoktm: did you go to 'control panel' and click 'stop server' [00:09:26] legoktm: control panel in the top bar [00:10:26] no, I hit logout [00:10:37] legoktm: ah ok. let me just delete it for you [00:10:39] moment [00:10:41] ok :P [00:11:31] legoktm: ok login now [00:13:53] yay it works! [00:14:19] legoktm: \o/ [00:15:11] 10MediaWiki-extensions-OpenStackManager, 10Echo, 3Collaboration-Team-Current: Write presentation models for notifications in OpenStackManager - https://phabricator.wikimedia.org/T116853#1851045 (10Mooeypoo) a:3Mooeypoo [00:16:40] legoktm: did your replace.py run work? [00:17:06] legoktm: also where's the cookie file created? in $HOME? [00:17:56] no, got distracted [00:17:57] in $PYWIKIBOT2_DIR [00:18:16] legoktm: hmm where is that? [00:18:20] legoktm: ok run run run :D [00:18:20] ummm [00:18:21] pm [00:19:23] the cookie file is /srv/pwb/pywikibot.lwp [00:19:26] legoktm: hmm I guess we should make it do it in $HOME [00:19:48] we should set $PYWIKIBOT2_DIR = $HOME/.pywikibot I think [00:20:01] ok [00:20:09] I can do that.. [00:20:16] what else gets put in $PYWIKIBOT2_DIR? [00:20:20] what else is it used for? [00:20:27] legoktm: actually let's switch to #pywikibot [00:20:40] ok [01:23:16] bd808: btw, first edit done via PAWS! 
https://test.wikipedia.org/w/index.php?title=Test&diff=prev&oldid=253311 [01:23:41] neat! [01:23:53] bd808: legoktm was going to run a larger script at some point soon :) [01:24:15] And I'm trying to do deep OAuth integration... [01:24:18] * bd808 fell into a dark hole of fixing scholarships and hasn't even stopped for dinner yet [01:24:39] fun fun fun [01:24:43] Deep like passing the oauth token to pwb? [01:24:45] I saw your commits to slimapp go by [01:24:47] bd808: yes [01:24:54] bd808: so you won't have to enter your password [01:25:07] no passwords please! [01:25:11] yeah [01:25:28] * bd808 lectures yuvipanda about passing prod credentials into labs [01:25:36] :) [01:25:43] bd808: I was testing with a test account :P [01:25:52] bd808: also all these bots already have their passwords lying around on NFS [01:26:18] Brad is trying to figure out a plan for "bot passwords" that would be easy to revoke on the wiki side [01:26:26] +1 [01:26:38] but I guess this oauth thing at least prevents it for interactive pwb usage [01:27:15] there is a patch up right now for "personal" oauth grants [01:27:47] that would let anyone using pwb or another thing with oauth support stay away from passwords [01:28:30] 10PAWS: Do not require users to type passwords into PAWS - https://phabricator.wikimedia.org/T120331#1851216 (10yuvipanda) 3NEW [01:28:39] right now it is possible but you have to get your oauth app approved and this would make a class of grants taht at auto approved but only work for the account that requested them [01:28:39] 10PAWS: Do not require users to type passwords into PAWS - https://phabricator.wikimedia.org/T120331#1851223 (10yuvipanda) p:5Triage>3High [01:33:46] AFAIK the user who requests a grant can use it immediately even before it's approved, for testing purposes [01:34:22] bd808: right, but I guess I can't use those either since that'll require each user to get their own OAuth request approval instead of just using PAWS' [01:35:02] yuvipanda: yeah 
figuring out how to pass through a general PAWS OAuth grant would be nice I thik [01:35:44] your grant may end up looking a bit scary though. I would guess you'd want to ask for *all the permissions* to pass to the bot scripts [01:36:26] bd808: yeah, except any admin type stuff [01:36:45] bd808: I guess I should use all the protections (RSA and ip limit too) [01:37:11] *nod* certainly RSA [01:37:15] bd808: also I'll be passing the consumer id and secret down to the terminal, so they'll all be considered public [01:37:24] actually I probably can't do RSA [01:37:31] since if I do I'll have to also pass that in [01:37:35] so defeating the purpose, I guess [01:37:38] hmmm [01:37:48] but if I specify the callback to be only an exact URL [01:37:53] can others bypass that? [01:37:56] passing the secret is really bad [01:38:06] yeah but what else can I do? [01:38:22] "talk to csteipp" [01:38:26] indeed [01:38:35] but this is the same as the unsolved 'mobile app' or 'desktop app' scenario IMO [01:38:43] yeah [01:38:52] which makes it no good really [01:39:08] well, I don't fully understand the attack scenarios, I guess [01:39:10] the url pinning works afaik [01:39:14] since it's tied to a URL [01:39:33] the url and ip range restrictions should help [01:39:37] yea [01:39:40] h [01:39:45] the ip range would be all of labs, so not *that* helpful [01:39:47] but the URL definitely [01:40:24] if you had the secret and sniffed other tokens then you could act as those other users [01:40:37] bd808: yeah but if you're sniffing other tokens... 
[01:40:48] you might as well sniff the session cookie no [01:41:47] I think oauth sessions need to keep passing the token as a header too, but I get where you are trying to lead me [01:43:05] 'tis also orders of magnitude more secure than a password, I guess :) [01:43:18] for sure [01:43:55] https://phabricator.wikimedia.org/T119859 will also be a 'killer feature' of sorts, I suppose [01:43:59] easy web exposing [01:44:18] bd808: pywikibot people are going to be using it for GCI https://www.mediawiki.org/wiki/User:John_Vandenberg/GCI_walk-through [01:45:20] so I've a captive audience :D [01:45:30] "test subjects" [01:46:56] yuvipanda: is https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Bots_and_gadgets#Migrate_dead_links_to_Wayback_Machine actually hard? [01:47:00] bd808: :) I'm also talking to a researcher from norway who wants to run jupyterhub + kubernetes [01:47:04] bd808: it's a matter of scale [01:47:28] bd808: we've millions of links that need to be checked for HTTP return codes (and rate limiting, etc to deal with - a usual large crawling operation) [01:47:39] bd808: and then the Bot Authorization process to go through [01:48:01] "Likely what will be required is a team of programmers working full-time, something that is beyond the scope of a few volunteers working spare time." [01:48:22] bd808: it's a tantalizingly fun problem, sure [01:49:02] so you need to (a) harvest links from WP, (b) check to see if they are still live, (c) submit to IA if live, (d) tag for cleanup if dead [01:49:29] bd808: (c) is already being done by IA themselves IIRC [01:49:45] but only for newly added links right? [01:50:03] bd808: you need (a), (b), and (e) which is 'replace them with a link to the closet to-the-date-of-the-cite IA link' [01:50:06] (e) is a bit hard too [01:50:19] bd808: hmm, not sure. 
I presumed they did a prior run too, but maybe not [01:50:53] 10PAWS: Do not require users to type passwords into PAWS - https://phabricator.wikimedia.org/T120331#1851249 (10Legoktm) https://www.mediawiki.org/wiki/Manual:Pywikibot/OAuth [01:52:36] (e) could be mechanical turked I bet [01:53:14] ok, time to eat and then look at elastic [01:55:42] \o/ [02:33:22] 10PAWS: Terminal gets 'stuck' after a few minutes without activity - https://phabricator.wikimedia.org/T120335#1851299 (10yuvipanda) 3NEW [02:34:43] 10PAWS: TranslationError : scripts/i18n submodule not fetched - https://phabricator.wikimedia.org/T120312#1851309 (10yuvipanda) 5Open>3Resolved a:3yuvipanda This works now [02:54:43] yuvipanda: so... can I cherry-pick my puppet patch to the k8s puppet master or should I do something else to test things out [02:55:13] bd808: soooo.... self hosted puppetmaster on labs is broken. [02:55:24] dude [02:55:27] err [02:55:29] ok [02:55:31] to rephrase [02:55:33] on *tool labs* is broken [02:55:36] because of a very specific thing [02:55:40] it's localized to just there [02:55:50] it's not broken elsewhere [02:56:01] ah. the cert stuff you were playing with or something I imagine [02:56:02] bd808: but no worry! let me spin up an elasticsearch-01 instance [02:56:18] bd808: since I've almost killed the need for role::puppet::self in clients [02:56:37] bd808: actually, you can do it - just create an instance, and set the 'puppetmaster' hiera variable to 'tools-puppetmaster-01.tools.eqiad.wmflabs' [02:56:48] k [02:56:50] bd808: then run puppet, and do 'rm -rf /var/lib/puppet/ssl' and tada, it's all good [02:56:56] bd808: I'm working on automating the 'rm -rf' part now [02:57:12] bd808: you can cherrypick on to tools-puppetmaster-01 after that and test. 
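For readers following along, the switch-over yuvipanda describes (point the instance at the new master, run puppet, clear the old SSL directory) can be sketched roughly as below. This is only an illustration: the `run` and `switch_puppetmaster` helpers and the `DRY_RUN` variable are hypothetical, `puppet agent --test` is assumed to be how "run puppet" is done on the instance, and the final puppet run that requests a fresh certificate is an assumption based on the later remark that a new client cert was generated. The `puppetmaster` hiera variable is set on wikitech and is not handled here.

```shell
#!/bin/bash
# Sketch only: prints the commands by default; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Assumes the 'puppetmaster' hiera variable already points at
# tools-puppetmaster-01.tools.eqiad.wmflabs for this instance.
switch_puppetmaster() {
    run puppet agent --test          # pick up the new puppetmaster setting
    run rm -rf /var/lib/puppet/ssl   # drop certs issued by the old master
    run puppet agent --test          # assumed: request a cert from the new master
}

switch_puppetmaster
```

Keeping the destructive `rm -rf` behind a dry-run default makes it easy to check the sequence before running it for real.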
[02:57:20] even though puppet is broken on tools-puppetmaster-01 itself [02:57:22] but that's ok [02:57:27] lol [02:57:50] bd808: yeah, this entire saga resulted in this ticket https://phabricator.wikimedia.org/T120159 [02:58:03] where I'm going to dedicate effort into finally killing the almost-duplicate puppet/ module [02:58:15] and it's fun idea of putting certs in different places depending on weteher it is self hosted puppet or not [02:58:43] which works around the problem of bootstrapping itself [02:58:52] and is what you need a magic solution for [02:58:57] bd808: I have a magic solution :D [02:59:11] if I Can figure out bash [02:59:13] bd808: https://dpaste.de/xY4F [02:59:24] bd808: line 28 onwards, I want to check on the result of the 'curl' [02:59:26] * bd808 is a bash wizard [02:59:39] bd808: if it's 0, it means it's fine, no puppet and certs are fine, etc. [02:59:48] bd808: if it is not 0, and specifically if it is 60, means cert has changed [02:59:59] bd808: excepte, this is 'set -e' and so the script fails when curl returns to be not 0 [03:00:09] so.... how do I work around that? [03:00:14] * yuvipanda is a bash muggle [03:00:26] you chain it to an || true [03:00:35] but that will nuke $? [03:00:41] but ... 
there's a way [03:00:46] * bd808 heads to google [03:02:22] I think: set -o pipefail; curl --cacert /var/lib/puppet/ssl/certs/ca.pem https://<%= @puppetmaster %>:8140 || /bin/true [03:03:12] * yuvipanda tries [03:03:55] #!/bin/bash [03:03:57] set -e [03:03:59] # We pass show-diff, show the log may be sensitive, [03:04:01] # so make sure it's sufficiently protected [03:04:03] touch /var/log/puppet.log [03:04:05] chmod 600 /var/log/puppet.log [03:04:07] # Check this before apt-get update, so that our update doesn't screw up [03:04:09] # package installs in a running (manual and/or initial install) puppet run [03:04:11] PUPPETLOCK=`puppet agent --configprint agent_catalog_run_lockfile` [03:04:13] if [ -n "$PUPPETLOCK" -a -e "$PUPPETLOCK" ]; then [03:04:15] set +e [03:04:17] PUPPETPID=$(cat $PUPPETLOCK) [03:04:19] CMDLINE_FILE="/proc/$PUPPETPID/cmdline" [03:04:21] if [ -f $CMDLINE_FILE ]; then [03:04:23] grep -q puppet $CMDLINE_FILE [03:04:25] if [ $? -eq 0 ]; then [03:04:27] echo Skipping this run, puppet agent already running at pid `cat $PUPPETLOCK` >>/var/log/puppet.log [03:04:29] exit 0 [03:04:31] fi [03:04:33] fi [03:04:35] set -e [03:04:39] dammit [03:04:42] i dont think you should have to invent new bash scripts to check SSL certs [03:04:52] check_http has an option for it [03:04:57] dammit [03:05:01] am i here? [03:05:04] double dammit [03:05:06] 19:07 < mutante> i dont think you should have to invent new bash scripts to check SSL certs [03:05:07] yes you are [03:05:09] 19:07 < mutante> check_http has an option for it [03:05:12] yea [03:05:18] mutante: that's not my bash script [03:05:23] mutante: that's me adding 3 lines to puppet-run [03:05:38] ignore me, it seemed you are writing yet another "check certificate" [03:05:58] yuvipanda: oh, why not just do the set +e dance they do above? 
[03:06:06] oh [03:06:08] ofc [03:06:10] can do that [03:06:12] * yuvipanda is an idiot [03:06:13] I just ignored that big block [03:06:19] got caught in my bash defence [03:06:48] bd808: hmm but if I do set -e right after the curl [03:06:51] that'll reset $? too [03:06:52] eh, puppet-run and set -e , we just had that the other day [03:06:55] for the apt-get update thing [03:06:55] I'm sure it can be done with either pipefail or $PIPESTATUS[0] too [03:07:14] and faidon voted down brandon's suggestion to move it [03:07:23] yuvipanda: yeah don't reset until after your test [03:07:36] bd808: but then I want the rm to fail to cause it to fail too [03:07:57] so rm -r ... || exit 1 [03:08:04] ah [03:08:07] ofc [03:08:09] * yuvipanda has no bag of bash tricks [03:08:25] if [ $? -eq '60' ]; then [03:08:29] is that a correct check, bd808? [03:08:33] or am I doing something totally wrong? [03:09:18] that's ok. I's use `if [[ $? -eq 60 ]]` maybe [03:09:24] ok [03:09:29] the [[ is bash's internal test [03:09:43] and nicer than /bin/test that you get with [ [03:09:58] it behaves much better with quoting and things [03:10:28] so you don't have to do crap like [ "x-$FOO" == "x-" ] [03:10:57] * bd808 twitches when he sees that [03:11:49] also see https://gerrit.wikimedia.org/r/#/c/256148/1/modules/base/files/puppet/puppet-run and the comments on that [03:11:50] heh [03:12:18] mutante thanks :) but completely unrelated to what we are doing [03:12:35] well, you are editing the same file [03:12:40] and talked about set -e [03:12:50] I'm also writing bash [03:12:53] and using vim [03:12:53] and it's important where in the file it is [03:12:55] and gerrit :) [03:13:26] mutante: that's for the apt issues we've been seeing? [03:13:28] that's talking about dealing with apt failures. I'm writing sometihng completely different. [03:13:33] are you not changing puppet-run [03:13:38] didnt you just say that [03:13:46] "3 lines in puppet-run" [03:13:47] I am but that patch is about apt... 
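The `set -e` question from this exchange (how to look at curl's exit status without the script aborting) can be sketched as follows. This is a minimal illustration, not the change that went into puppet-run: `fake_curl`, `capture_status` and `classify` are hypothetical names, and `fake_curl` merely stands in for the real `curl --cacert ...` call so the example runs without a puppetmaster. Exit code 60 is curl's "peer certificate cannot be authenticated" status, which the channel treats as "the master's cert has changed".

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for the real curl call; exits with the status it is given.
fake_curl() {
    return "$1"
}

# Run a command under `set -e` without aborting, then print its exit status.
# A command on the left of `||` does not trigger errexit, and `$?` on the
# right-hand side still holds that command's status.
capture_status() {
    local status=0
    "$@" || status=$?
    echo "$status"
}

# Map the exit status to the cases discussed in the channel.
classify() {
    case "$1" in
        0)  echo "ok" ;;
        60) echo "cert-changed" ;;
        *)  echo "other-failure" ;;
    esac
}

status="$(capture_status fake_curl 60)"
echo "status=$status -> $(classify "$status")"
```

Compared with toggling `set +e` and `set -e` around the call, this keeps errexit on for the rest of the script, so a later `rm -r ... || exit 1` style check still behaves as discussed.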
[03:13:50] I"m not touching apt [03:14:07] I'm also not touching the set -es that already exist... [03:14:25] did you see the code we were talking about? [03:14:33] you were about to move the -e to a different location [03:14:34] https://dpaste.de/xY4F [03:14:40] yes, *my* set -e [03:15:49] he's just adding lines 28-31 to check if the puppetmaster has changed to a new cert (by switching masters) [03:17:07] it's a different "puppet-run" file that is only used in labs then? [03:18:29] bd808: do you want to go ahead and do the elasticsearch stuff? I'll figure out the current stuff with the puppet/ module. [03:18:37] shouldn't really clash I guess [03:18:52] yuvipanda: I'm working on it. tools-elastic-01.tools.eqiad.wmflabs is up now [03:18:57] just starting to poke it [03:18:57] bd808: \o/ awesome [03:19:05] bd808: I'll poke you if I touch the puppetmaster at all [03:19:32] * bd808 wonders why this new host is stupid slow [03:20:14] oh. nfs homedir [03:20:19] uck [03:20:57] yuvipanda: is it possible to un-nfs just these hosts or is that project wide? [03:21:07] bd808: oh yeah, we can un-NFS just them [03:21:08] bd808: moment [03:21:20] bd808: https://wikitech.wikimedia.org/wiki/Hiera:Tools/host/tools-k8s-master-01 [03:21:28] bd808: just that, plus manually unmount [03:21:37] bd808: if you create the hiera file before you create the host it wouldn't mount in the first place [03:21:50] next time [03:22:13] I'll probably have to reboot since I'm logged in as me, but I'll figure it out [03:22:35] bd808: oh? I usulaly manage to unmount after sudoing and cding elsewhere [03:25:37] cool. I'll try it [03:27:36] yuvipanda: I apparently don't have sudo in tools? [03:27:55] bd808: oh? [03:28:02] bd808: you totally should have, you're projectadmin... 
[03:28:05] let me check [03:28:17] I got prompted for my password [03:28:43] lookin [03:29:00] there are custom sudoer policies [03:29:02] https://wikitech.wikimedia.org/wiki/Special:NovaSudoer [03:30:00] I suppose I have the rights to add myself to the "roots" policy [03:30:04] bd808: added you anyway now [03:30:22] works. thanks [03:30:43] bd808: yw [03:30:46] ]] [03:51:18] 6Labs, 6operations, 5Patch-For-Review: Kill the 'puppet' module with fire, make self hosted puppetmasters use the puppetmaster module - https://phabricator.wikimedia.org/T120159#1851429 (10yuvipanda) Once ^ gets merged, I'll have to find list of all instances that have role::puppet::self *and* the puppetmast... [03:53:22] yuvipanda: puppet failure -- Could not evaluate: Could not retrieve information from environment production source(s) file:/var/lib/puppet/client/ssl/certs/ca.pem -- is that expected? [03:53:34] bd808: on which host? [03:53:52] that's after switching to the k8s master on tools-elastic-01 [03:54:21] bd808: how did you switch? [03:54:59] with https://wikitech.wikimedia.org/wiki/Hiera:Tools/host/tools-elastic-01 [03:55:14] oh, but I did the old way first (setting in wikitech) [03:55:24] should I undo that part? [03:55:44] bd808: old way as in role::puppet::self? [03:56:00] I didn't check the role but I set the variable [03:56:09] * bd808 undoes that [03:56:11] bd808: oh, yeah, undo the variable and that *should* work [03:56:28] the fact that the variable and the hiera param are the same name but different is confusing [03:56:50] bd808: you would have to rm -rf /var/lib/puppet/ssl too [03:57:11] (this should get better once https://gerrit.wikimedia.org/r/#/c/256890/ lands) [03:57:30] I did that; then it generated a new client cert and I accepted it on tools-puppetmaster-01 [03:58:07] yeah, there was some magic code that attempts to figure out where the puppet ssl dir is in what case [03:58:13] hmm. 
unsetting the ldap var didn't seem to help [03:58:15] and since there are like 4 million edge cases I don't trust it [03:58:25] bd808: hmm, looking [03:58:40] ah [03:58:43] moment [04:01:10] bd808: fixed. see top commit on puppetmaster [04:01:16] I undid all the 'magic' [04:01:46] beautiful commit message [04:01:58] yeah [04:02:07] I need to work that into my patch [04:02:09] somehow [04:03:16] puppet runs clean now [04:03:22] * bd808 gets ready to change that [04:09:30] !log tools Cherry-picked https://gerrit.wikimedia.org/r/#/c/256618 to tools-puppetmaster-01 [04:09:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [04:10:57] hrm.. did I kill the puppetmaster? [04:12:39] oh? [04:13:01] bd808: interesting. [04:13:03] the puppetmaster service went down and won't come back up [04:13:09] I know what happened unfortunately [04:13:12] this is going to be all a mess [04:13:20] let me fix it for now [04:13:28] it was your commit huh? [04:13:33] all certificates [04:14:16] bd808: yes, except now that it does the 'right' thing elsewhere it changed the certificate on the puppetmaster itself [04:14:33] *nod* [04:14:47] that was sort of what the errors looked like [04:15:25] * bd808 needs to learn more about debugging systemd stuff still [04:15:27] bd808: yeah, has;jgha;sdjg [04:15:29] osadjg;asjdg; [04:15:31] asjdg; [04:15:33] akshgd [04:15:35] jn; [04:15:37] I HATE EVERYTHING AA:JFAKGJD [04:15:39] sigh [04:15:45] * yuvipanda stops to breathe for a while [04:16:21] you take this crap pretty seriously [04:16:41] is just computers [04:17:03] not worth actually being mad at ;) [04:17:17] I guess I got nothing else going on, huh? 
:) [04:17:31] * yuvipanda parks that line of thought for a while and continues being mad instead [04:18:20] be aggressively interested in better outcomes; but skip being mad [04:18:31] it's just day 3 of this, I guess [04:19:23] hahaha [04:19:26] ofcours [04:19:35] self hosted puppetmasters keep their puppet conf in a *Different* file [04:19:37] at a different path [04:19:41] and people ask me why I hate it [04:21:43] oh wow [04:21:46] and it declares ssldir *twice* [04:21:49] in each file [04:21:51] for a grand total of 4 times [04:22:18] obviously [04:22:29] 4 shall be the number of the counting [04:22:48] * yuvipanda throws the holy handgrenade at foot [04:23:56] bd808: it' now back to different shit breaking, of course, but at least puppetmaster works [04:24:42] w00t. I got a new error (and one I was expecting) [04:25:34] bd808: yay [04:25:51] bd808: I'll stop touching the puppetmaster now and let you play with stuff. [04:38:57] progress... not success yet [04:39:48] yuvipanda: it almost sort of worked but half way through the certs bit me on the ass again [04:39:58] bd808: oh, which bit now? [04:40:13] bd808: y'know, I can probably cherry-pick my patch again, and *disable puppet* on the puppetmaster [04:40:19] "Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key." [04:41:27] bd808: try again [04:41:41] !log disabled puppet on tools-puppetmaster-01 because everything sucks [04:41:42] disabled is not a valid project. [04:42:03] !log tools disabled puppet on tools-puppetmaster-01 because everything sucks [04:42:07] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [04:42:56] looks like maybe I need to regenerate the client key [04:43:13] "Could not request certificate: The certificate retrieved from the master does not match the agent's private key." 
[04:43:30] bd808: let me look [04:43:41] bd808: nope, moment [04:44:50] bd808: yup, fixed again [04:44:56] bd808: should stay fixed this time for a while [04:45:00] bd808: failures still but I think they are yours [04:45:18] certainly could be. I'll check it out [04:45:21] bd808: I cherry-picked my patch back on top of yours. can you rebase to change order to make your life easier? [04:46:26] bd808: no not you. more puppet fuckery [04:46:29] looking [04:48:52] bd808: actually [04:48:54] + source => 'puppet:///labs/toollabs/elasticsearch/nginx.conf', [04:48:57] might be the issue? [04:49:02] bd808: labs/? [04:49:31] the file is at files/labs/toollabs/elasticsearch/nginx.conf [04:50:01] do I need the "files" in there since it isn't module? [04:50:47] yes I do [04:51:05] bd808: yes [04:52:04] heh. "HEAD is now at dfe480c everything sucks" [04:52:10] indeed [04:53:30] cool. now I think I have some error in my nginx config to fix [04:54:37] \o/ [05:30:24] bd808: are you still messing with the hosts? [05:30:28] ah you are [05:30:30] nvm go on :) [05:30:38] I think this will do it [05:38:02] bd808: \o/ cool [05:38:11] bd808: I think I'm done for the day, since I need to wake up early. [05:38:16] bd808: anything you want me to do before I go? [05:38:28] nope. I'm going to wrap it up too [05:38:38] I think I'm pretty close [05:38:38] bd808: \o/ awesome. [05:38:59] nginx config is still not quite right but not far off I think [05:40:10] \o/ cool [05:41:18] * yuvipanda goes off [05:41:19] good night [06:12:05] 6Labs, 10Beta-Cluster-Infrastructure, 10Labs-Infrastructure, 6operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1851595 (10Chmarkine) Let's Encrypt is in Public Beta now. Everyone can get free certificates from them now. [1] https://letsenc... 
[07:35:46] PAWS, pywikibot-core: svnversion failed - https://phabricator.wikimedia.org/T120268#1851654 (yuvipanda) @jayvdb I already have git installed, and in fact now git clone is what's used to provision pwb. Can you check if this is still happening? [07:48:40] PAWS: 'Stop my Server' in PAWS should wait until server is actually stopped - https://phabricator.wikimedia.org/T120351#1851664 (yuvipanda) NEW [07:52:19] PAWS: Customize Jupyter Logo in PAWS to say PAWS - https://phabricator.wikimedia.org/T120352#1851673 (yuvipanda) NEW [08:21:27] Labs, MediaWiki-extensions-OpenStackManager: Prevent empty service groups - https://phabricator.wikimedia.org/T120022#1851705 (MoritzMuehlenhoff) With the currently used schemas it's needed: We use the object class "groupofnames" for which "member" is a mandatory attribute. and we cannot modify the exist... [08:37:52] Labs, MediaWiki-extensions-OpenStackManager, MediaWiki-General-or-Unknown, wikitech.wikimedia.org, Patch-For-Review: MWException after account creation on wikitech - https://phabricator.wikimedia.org/T117553#1851734 (ori) Since hook signature errors are an abnormal condition, it may be OK to ca... [09:46:29] Labs, MediaWiki-extensions-OpenStackManager: Prevent empty service groups - https://phabricator.wikimedia.org/T120022#1851783 (scfc) If it makes the maintenance of the LDAP server more standardized and less complex, I think that's more important. [11:03:46] Labs, Beta-Cluster-Infrastructure, Labs-Infrastructure, operations: beta: Get SSL certificates for *.{projects}.beta.wmflabs.org - https://phabricator.wikimedia.org/T50501#1851883 (Krenair) Yeah, beta.wmflabs.org was in the private beta. Don't know if it can actually work with our setup though. 
[13:51:14] yuvipanda: https://tools.wmflabs.org/bd808-test/elastic.php [13:56:22] Labs, Tool-Labs, Patch-For-Review, User-bd808: Setup an experimental, user accessible (read+write) ES cluster for Tool Labs - https://phabricator.wikimedia.org/T120040#1852192 (bd808) I've got the basic cluster up and running on tools-elastic-0[123]. The status of the cluster can be seen from https... [14:07:17] Hi all, how can I list all languages for a project? [14:10:33] * anomie sees backscroll [14:10:33] bd808, yuvipanda: Not having actually tried to do it, https://meta.wikimedia.org/wiki/2015_Community_Wishlist_Survey/Bots_and_gadgets#Migrate_dead_links_to_Wayback_Machine doesn't seem that hard really. The hardest part is establishing the relationship with IA, IMO. And depending on how exactly the "(a) harvest all the links" bit is to be done there might be some firehose issues to deal with. Then maybe you could get tricky trying to detect soft-404s. "(e) add archiveurl to cite templates" is really not very hard, I did that sort of thing already in https://en.wikipedia.org/wiki/User:AnomieBOT/source/tasks/ReplaceExternalLinks5.pm [14:10:33] Personally, I've never been terribly impressed with IA's "retroactively apply robots.txt" policy though. [15:50:18] hey, is deployment-prep's quota all full up? i can't create instances there [15:51:32] ottomata: hm nope looks like 33 instances of 50 assuming horizon is right [15:51:48] k trying again [15:52:34] I hate to say it but...have you logged out and in again :D [15:53:09] I have! [15:53:16] i just logged in before trying to create [15:53:30] will try again.. [15:55:41] nope [15:55:42] no good [15:55:46] Failed to create instance. [15:59:34] chasemp: "thcipriani [15:59:34] ottomata: yeah, the amount of memory allocated for the project is filled (as well as number of instances). 
[15:59:35] " [15:59:50] hm well, horizon is off its rocker then [15:59:55] this is the information on wikitech [16:01:09] RAM: 286720/286720 Instances: 50/50 (although, I haven't actually done any counting) [16:01:24] I'm not sure if some instances can be culled or not [16:01:45] and then I'm also not sure how to adjust teh quoto so I'm not much help [16:01:45] atm [16:03:31] I was just looking at the instances, only thing that stands out is the sentry2 instance (don't know the status there) but culling that one wouldn't buy us much of anything. andrewbogott (iirc) was able to increase our quota in the past. [16:08:57] I can icrease quotas at need, How much do you require? [16:09:13] chasemp: Also, I can show you the way. :-) [16:09:50] sweet [16:10:05] wonder why horizon is so far off as a side note [16:10:15] chasemp: Done from labcontrol1001 [16:10:46] chasemp: I... don't know. I don't think the horizon install is entirely up to date atm. [16:11:19] chasemp: So, the idea is once you have sourced ~root/novaenv.sh for the credentials... [16:11:30] chasemp: You can nova quota-show --tenant $foo [16:11:55] chasemp: That'll show you the current values but also the actual names of the quotas. :-) [16:11:59] with you so far [16:12:42] chasemp: Next step is just as simple: nova quota-update --name-of-quota $value $tenant [16:13:13] Like, say, 'nova quota-update --instances 105 tools' [16:18:43] chasemp: what about horizon being off? [16:18:56] thcipriani: So, what quota increase do you need? [16:19:14] is there a way to show the quata usage from nova as well as teh quota itself? [16:20:18] chasemp: on wikitech if you’re on the ‘manage projects’ page there’s a ‘Display Quotas’ link [16:20:29] I’m not sure if there’s a commandline that gives you everything at once [16:20:52] andrewbogott: I never found one, at any rate, but it surprises me. 
[16:21:15] yes, seems silly [16:21:23] the new commandline is all behind the ‘openstack’ command [16:21:28] so there might be something there [16:21:55] Wait, depending on what you meant exactly chasemp, nova absolute-limits --tenant xxx [16:22:29] yeah that's pretty close [16:22:36] | RAM | 286720 | 286720 | [16:22:42] The set of what it shows doesn't exactly match the set of quotas though. [16:23:04] and it's not ordered or consistent naming :) [16:23:05] fun times openstack [16:23:36] (And, IIRC, the 'Max' value there /can/ be the one derived from quota but may be some smaller limit coming from elsewhere). But it does show the 'Used' column. [16:25:03] Coren: we have in the plan 3 new instances (if ottomata was planning on adding 1 new instance) with some buffer room it would be nice to up the quota from 50 to 60 instances for the time being. Up the ram to the point where they could all be medium sized so ram from 286720 to 327680 cores would be fine with 10 new medium instances. [16:25:25] was going to add 2 instances and then delete one after migrating [16:25:30] maybe 3 [16:26:40] well it's a ram limit I believe not strictly an instance one -- which is effectively teh same in this case but not literally [16:28:38] 6Labs, 10Tool-Labs: Move tools-master and tools-shadow to trusty - https://phabricator.wikimedia.org/T94791#1852570 (10coren) `tools-grid-shadow` is now the active master, and seems to be running without issues. I'm going to give it a little while then create the new (trusty) master. [16:31:01] ok. 2 days ago I went to add 2 and only got to 1 (that was smaller than I wanted) so I'd probably ditch the m.medium I made and add back two m.large instances. Adding 3 more medium instances would take up more than half the increase in memory I proposed. If we moved to 368640 that would be enough additional memory capacity for 10 m.large instances. 
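The RAM figures in thcipriani's two proposals can be checked with a little shell arithmetic. This is just a worked example using the numbers quoted in the channel (quota values in MB); the variable names are made up for illustration, and the per-instance sizes it derives are inferred from those figures rather than stated anywhere in the log.

```shell
#!/bin/bash
# RAM quota figures quoted in the channel, in MB.
current_ram=286720
medium_target=327680   # proposed quota for 10 more medium instances
large_target=368640    # proposed quota for 10 more m.large instances
new_instances=10

# Extra RAM each proposal adds, divided across the new instances.
per_medium=$(( (medium_target - current_ram) / new_instances ))
per_large=$(( (large_target - current_ram) / new_instances ))

echo "per medium instance: ${per_medium} MB"
echo "per large instance: ${per_large} MB"
```

The results (4096 MB and 8192 MB per instance) are consistent with the remark that three more medium instances would use a bit over a quarter of the first increase while the second figure leaves room for ten large ones.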
[16:34:25] andrewbogott: Coren how to see global ram availability then to make sure a quota increase on a tenant isn't ill advised? [16:38:25] chasemp: try nova hypervisor-stats [16:39:23] thanks [16:49:35] andrewbogott: Coren ok doing all the math I think it's fair to increase the quota to the requested atm so..I did [16:49:40] ottomata: thcipriani give it a whirl [16:49:54] thanks chasemp [16:52:27] chasemp: thanks, looks like the memory quota has increased but not the instances quota (couldn't create a new m.large just now) [16:55:02] thcipriani: ah yes ok so I upped it by 5, not because I'm a jerk (necessarily) but because the 10 increase at 20% seems like a big jump and we should probably talk general deployment-prep growth before going much further etc etc etc [16:55:40] chasemp: sounds fair :) thanks! [17:12:34] bd808: \o/ awesome [17:12:57] anomie: yes, not that hard - just not a '3 day job' either. A couple of weeks for people who know what they're doing [17:13:19] bd808: anomie I'm involved with the IA people and can do talking integration if needed [17:20:52] yuvipanda: I spent some time this morning scratching my head but finally figured out that ferm had gone wonky on the 01 server and subsequent puppet runs didn't fix it. stopping and starting ferm got it all to green [17:21:10] ah awesome [18:35:32] YuviPanda: I've had the new trusty shadow master play the master role since early today; I've yet to see an issue but keep an eye out just in case? [18:39:02] YuviPanda: also, a fresh pair of eyes on https://gerrit.wikimedia.org/r/#/c/256693/ would be much appreciated; all of Faidon's last quibbles have been squished but it's only been the two of us looking at it.
[19:02:56] (PS1) Dzahn: add fake SSL keys for ldap-laps compile errors [labs/private] - https://gerrit.wikimedia.org/r/256965 [19:03:34] (PS2) Dzahn: add fake SSL keys for ldap-labs compile errors [labs/private] - https://gerrit.wikimedia.org/r/256965 [19:03:53] (CR) Dzahn: [C: 2 V: 2] add fake SSL keys for ldap-labs compile errors [labs/private] - https://gerrit.wikimedia.org/r/256965 (owner: Dzahn) [19:33:22] !log tools switching master role to tools-grid-master [19:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master [19:38:34] Labs, Tool-Labs: Move tools-master and tools-shadow to trusty - https://phabricator.wikimedia.org/T94791#1853090 (coren) `tools-grid-master` is now the active master and `tools-grid-shadow` has resumed its role as shadow master. There is an apparent lingering issue (almost certainly Trusty-based) that th... [20:53:53] YuviPanda: The new trusty master and shadow masters are chugging along; ima power down the old precise instances but not delete them yet in case something goes bust over the weekend. Sounds sane to you? [20:54:01] Coren: +1 [20:55:29] Labs, Tool-Labs: Move tools-master and tools-shadow to trusty - https://phabricator.wikimedia.org/T94791#1853532 (coren) The original masters (`tools-master` and `tools-shadow`) have been turned off but not yet deleted to monitor how things go over the weekend. [21:01:41] cool [21:02:44] PROBLEM - Host tools-master is DOWN: CRITICAL - Host Unreachable (10.68.16.9) [21:05:58] PROBLEM - Host tools-shadow is DOWN: CRITICAL - Host Unreachable (10.68.16.10) [21:08:31] Oh, right. Shinken. [21:17:00] When i log into wikitech.wikimedia.org i'm getting an exception [21:17:03] https://www.irccloud.com/pastebin/nmvodkV7/ [21:17:07] is this a known issue? [21:17:36] jdlrobson: No; lemme look into it. [21:17:45] a refresh shows the sign in worked but i can't manage instances [21:19:24] Labs: 'virt1' entry at markmonitor?
- https://phabricator.wikimedia.org/T102689#1853664 (Andrew) "HI Andrew, We requested the names server virt1.wikimedia.org be removed and the registry responded, that this nameserver is in use as an authoritative nameserver for name(s) within the .org space. The glue re... [21:20:58] Coren: correction - i can manage my instances. So just the internal error issue [21:21:30] Hm. I can't seem to reproduce it - can I trouble you to log off and back on again? [21:22:51] Coren, what? [21:22:54] That's definitely a known issue. [21:23:35] https://phabricator.wikimedia.org/T117553 [21:24:17] Krenair: I was under the impression that that bug was only account creation having issues? [21:25:03] And I can't reproduce the error - I can log in fine. [21:25:12] (Possible difference is I have 2fa enabled) [21:25:46] Coren, well look at the stack traces. . . Definitely the same error [21:26:31] Hmmm. You're right about that - so the issue isn't actually related specifically to either account creation or logging in. [21:37:02] Labs, MediaWiki-extensions-OpenStackManager, MediaWiki-General-or-Unknown, wikitech.wikimedia.org, Patch-For-Review: MWException after account creation on wikitech - https://phabricator.wikimedia.org/T117553#1853761 (ori) Open>Resolved a:ori [21:38:44] jdlrobson: Well, the issue was known apparently, and ori has deployed a patch. :-) [21:39:07] Coren: thanks for the follow up :) [21:48:42] Coren: chasemp andrewbogott first real edits being made to wiki from the pywikibot as a shell thing https://i.imgur.com/PojxutV.png \o/ [21:48:58] (and https://en.wikipedia.org/w/index.php?title=Dean_Kamen&diff=prev&oldid=693785248) [21:49:10] YuviPanda: Muy cool.
[22:18:22] Labs, Labs-Infrastructure: labcontrol1001 and 1002 running web servers on 80 and 443 for no reason - https://phabricator.wikimedia.org/T120449#1854056 (Andrew) NEW [22:38:42] Labs, Labs-Infrastructure, operations: Apache on labs-ns0 - https://phabricator.wikimedia.org/T120463#1854225 (Dzahn) [22:38:48] Labs, Labs-Infrastructure, operations: Apache on labs-ns0? - https://phabricator.wikimedia.org/T120463#1854229 (Dzahn) [22:40:03] Labs, Labs-Infrastructure, operations: Apache on labs-ns[01]? - https://phabricator.wikimedia.org/T120463#1854236 (Krenair) [22:53:02] PAWS, pywikibot-core: svnversion failed - https://phabricator.wikimedia.org/T120268#1854269 (Legoktm) I can't reproduce this anymore. [22:54:09] PAWS, pywikibot-core: svnversion failed - https://phabricator.wikimedia.org/T120268#1854277 (yuvipanda) Open>Resolved a:yuvipanda Yes, I think the git clone stuff made it work. [22:55:16] PAWS, pywikibot-core: FileNotFoundError: [Errno 2] No such file or directory: 'generate_user_files.py' - https://phabricator.wikimedia.org/T120266#1854280 (yuvipanda) Open>Resolved a:yuvipanda [23:23:47] bd808: I talked to csteipp and we've settled on a solution :) [23:23:58] awesome [23:25:18] bd808: are we ready to merge the ES patch [23:25:20] ? [23:25:32] bd808: am ok waiting for next week [23:25:34] if need to [23:25:38] so I can fix the puppet mess by then [23:25:59] Merge whenever. There might be some followup nginx tweaks but it's working now [23:34:46] Having more labs issues bd808 and YuviPanda "No usable default provider could be found for your system." [23:34:54] (when i run vagrant provision) [23:35:36] hmmm... I saw this from someone else (you?) in the past couple of weeks [23:35:59] * bd808 tries to remember what the problem was [23:36:25] instance is reading-web-staging / future-wikipedia [23:36:43] bd808: /etc/profile not picked up? [23:36:45] maybe [23:38:15] ssh hates me there.
can't log in [23:38:16] Labs, Labs-Infrastructure, operations: Apache on labs-ns[01]? - https://phabricator.wikimedia.org/T120463#1854383 (Andrew) [23:38:17] Labs, Labs-Infrastructure, Patch-For-Review: labcontrol1001 and 1002 running web servers on 80 and 443 for no reason - https://phabricator.wikimedia.org/T120449#1854384 (Andrew) [23:38:20] Labs, Labs-Infrastructure, operations: Apache on labs-ns[01]? - https://phabricator.wikimedia.org/T120463#1854387 (Dzahn) looks like this is a duplicate of T120449 just saw https://gerrit.wikimedia.org/r/257034 [23:38:37] Labs, Labs-Infrastructure, Patch-For-Review: labcontrol1001 and 1002 running web servers on 80 and 443 for no reason - https://phabricator.wikimedia.org/T120449#1854389 (Andrew) Apache is used to manage the puppetmasters that run on those boxes. [23:40:23] jdlrobson: found it in my chat logs. the fix last time was -- `sudo chown mwvagrant /srv/mediawiki-vagrant/.vagrant` [23:41:16] vagrant list-roles works it's just vagrant provision [23:41:19] still get "No usable default provider could be found for your system." with that [23:41:22] bd808: & [23:41:22] PAWS: Implement a 'signing OAuth Proxy' for PAWS - https://phabricator.wikimedia.org/T120469#1854394 (yuvipanda) NEW [23:43:08] I got logged in. I'll look around a bit [23:44:50] jdlrobson: seems like maybe the vagrant error messages are shitty. No container was running. `vagrant up` started to work and then failed [23:44:57] Permission denied - /srv/mediawiki-vagrant/puppet/hieradata/vagrant-managed.yaml (Errno::EACCES) [23:45:11] thanks bd808 [23:45:18] jdlrobson: your umask is bad [23:45:27] -rw-r--r-- 1 jdlrobson wikidev 36 Dec 4 23:34 /srv/mediawiki-vagrant/puppet/hieradata/vagrant-managed.yaml [23:45:56] jdlrobson: what does `umask` say? 0002 or 0022? [23:46:06] bd808: 0022 [23:46:17] hmm that's right [23:46:32] why did these files not get the group write bit? [23:47:07] no, that's wrong. should be 0002.
I wonder if that's a labs bug [23:47:23] * bd808 has umask 0022 there too [23:48:13] PAWS: Implement a sane way to access mysql replicas from PAWS - https://phabricator.wikimedia.org/T120471#1854416 (yuvipanda) NEW [23:48:43] hi, is there any way to get more detailed tomcat logs? I'm getting a 500 internal server error, but logs say nothing useful [23:49:24] jdlrobson: ah. apparently we don't normally set the 0002 umask *except* on deploy servers. TIL [23:51:28] jdlrobson: the short term fix is to `chmod -R g+wX /srv/mediawiki-vagrant` [23:51:47] I'll see about a patch to set the umask better [23:52:38] YuviPanda: any tomcat debugging magic that PeterBowman can do? ^^^ [23:53:02] bd808: no, I know Coren was helping him yesterday, and he probably knows more... [23:53:17] I suspect you need to pass flags to tomcat, but I've no idea how tomcat works... :( [23:53:20] bd808: The provider 'lxc' could not be found, but was requested to back the machine 'default'. Please use a provider that exists [23:53:23] not quite out the woods :( [23:54:53] jdlrobson: `vagrant status` says "The container is currently stopped. Run `vagrant up` to bring it up again." -- so still being bitten by bad error messages from vagrant for the LXC provider [23:55:00] do `vagrant up --provision` [23:57:51] PAWS: Do not require users to type passwords into PAWS - https://phabricator.wikimedia.org/T120331#1854433 (yuvipanda) I have talked to @csteipp about this and he's happy enough.
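The umask exchange above comes down to which permission bits a newly created file keeps. A minimal sketch, assuming the usual POSIX behaviour in which a regular file is requested with mode 0666 and the bits set in the umask are cleared from it (the helper names here are illustrative, not from the log):

```python
import stat


def new_file_mode(umask: int, requested: int = 0o666) -> int:
    """Permission bits a new regular file ends up with under the given umask."""
    return requested & ~umask


def describe(umask: int) -> str:
    """Render the resulting mode the way `ls -l` would show it."""
    return stat.filemode(stat.S_IFREG | new_file_mode(umask))


def group_writable(umask: int) -> bool:
    """True if files created under this umask keep the group write bit."""
    return bool(new_file_mode(umask) & stat.S_IWGRP)


if __name__ == "__main__":
    for mask in (0o022, 0o002):
        print(f"umask {mask:04o}: {describe(mask)} group-writable={group_writable(mask)}")
```

With umask 0022 the group write bit is dropped, which matches the `-rw-r--r--` listing in the log; with 0002 it is kept, which is why a shared checkout owned by a group needs the latter (or the `chmod -R g+wX` workaround after the fact).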