[00:00:05] akoopal: s1 through s7, that is correct (though those are, strictly speaking, clusters /of/ databases, each generally holding multiple project databases)
[00:00:57] Coren: ok, but if I create my (well, erwins') userdatabase on all, that should be enough for now
[00:01:26] akoopal: What is the database used for?
[00:01:26] until of course you need to split up more
[00:01:46] Coren: a cache of categories
[00:02:08] Ah, and those require joins with the project databases then?
[00:02:12] yes
[00:02:16] * Coren nods.
[00:02:23] Then seven is what you need.
[00:02:28] this is for the related changes tool
[00:02:50] and by the looks of it others as well
[00:05:45] but I made a good step for now, now bedtime
[00:06:05] Cyberpower678: magnus gave me access, so related changes seems to work :-)
[00:07:08] akoopal, ok. :-)
[00:19:55] hey labs folks, terrrydactyl is working on Wikimetrics and I'd like to get her access to tools
[00:21:27] oops, disregard above, just remembered https://tools.wmflabs.org/
[04:27:05] Coren: it seems my tools tool can't receive mails :p I assume it's because mails to tools.tools@tools.wmflabs.org are routed to tools.maintainers@tools.wmflabs.org, but tools.maintainers@tools.wmflabs.org is considered an address of the maintainers tool.
[05:48:36] I installed postgresql on my instance, and want to sudo -u postgres -i
[05:48:46] Sorry, user nicolas-raoul is not allowed to execute '/bin/bash' as postgres on map.eqiad.wmflabs.
[05:58:13] How can I work around this problem?
[05:58:35] hi Nicolas
[05:58:40] Nicolas: can you sudo as anything at all?
[05:59:05] Hello YuviPanda :-) Nice to meet you, thanks for your work always!
[05:59:15] Nicolas: :D
[06:00:05] sudo worked perfectly when I ran sudo apt-get install postgresql-9.1-postgis
[06:01:03] Nicolas: hmm, can you add me to the project on wikitech? I think this might be a sudo policy issue
[06:02:36] sudo -u root whoami <- works directly, no password prompt
[06:03:02] sudo -u postgres whoami <- asks for password, fails with my wikitech password
[06:04:22] At https://wikitech.wikimedia.org/w/index.php?title=Special:NovaRole&action=addmember&projectname=oxygenguide&rolename=projectadmin I can only add Andrew Bogott, I don't know how to add you...
[06:04:41] Nicolas: hmm, moment
[06:05:25] sorry!
[06:05:29] I was mistaken
[06:05:55] "Successfully added YuviPanda to oxygenguide. "
[06:05:58] woot
[06:06:48] Nicolas: can you also add me as an admin?
[06:07:09] Done
[06:08:16] Nicolas: can you close your current ssh session, log back in, and try again?
[06:08:36] I just tweaked the sudo policy
[06:09:29] Sorry, user nicolas-raoul is not allowed to execute '/usr/bin/whoami' as postgres on map.eqiad.wmflabs.
[06:10:00] in a brand new ssh session (even logged out of bastion)
[06:11:50] Nicolas: ok, try this
[06:11:55] sudo su
[06:12:00] su postgres
[06:12:03] bash
[06:12:09] and then attempt to execute things?
[06:13:34] "sudo su" asks me for a password and fails. Sorry for being a newbie, but is it the same password I use to log into wikitech.wikimedia.org?
[06:13:35] Nicolas: ^
[06:14:30] Nicolas: no, it shouldn't be asking you for a password at all
[06:14:40] Nicolas: mind if I ssh in and take a look?
[06:14:57] Sure! of course, please :-)
[06:15:03] ok :) sshing now
[06:16:41] Nicolas: try the same sequence now?
[06:18:28] Great, that worked! :-)
[06:18:48] Nicolas: woot :)
[06:19:34] What was the trick?
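(A minimal sketch of the workaround being worked out above, assuming an instance where passwordless sudo to root works but the project's sudo policy blocks `sudo -u`; the commands are typed one after another, each dropping you into a new shell:)

    sudo su        # become root first; no password prompt on a correctly set up labs instance
    su postgres    # then switch from root to the postgres system user
    bash           # start a shell as postgres and run psql etc. from here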
I will probably delete this instance and re-create another one in the future; can I perform the trick myself to avoid disturbing you every time?
[06:20:00] Nicolas: so I couldn't get sudo -u to work at all
[06:20:17] Nicolas: but this trick - sudo su && su && bash should work everywhere
[06:20:19] Nicolas: no tricks needed
[06:20:32] Nicolas: I was just trying to mess with the sudo policy to see if I could get sudo -u working, but no luck
[06:21:03] "sudo su" failed the first time I tried, though
[06:21:48] Nicolas: yeah, that is because I had been messing with the sudo policy. If you had tried 'sudo apt-get install ' that would have failed too :)
[06:21:59] I see!
[06:22:30] Nicolas: so if you create a new instance, this trick should still work
[06:23:13] I will try to find a place on the wiki to add the "sudo -u" workaround. Thanks a lot, see you! :-)
[06:23:29] Nicolas: yw! :)
[06:34:25] I added the tip at the bottom of this page: https://wikitech.wikimedia.org/wiki/Help:Sudo_Policies
[06:35:05] Nicolas: woot! Thank you :)
[06:37:39] Nicolas: you can also remove me from the project now if you want
[06:59:51] I installed postgresql (via apt-get install) but after I start it with "sudo /etc/init.d/postgresql start", no process appears, zero result for "ps -e | grep postgres", and zero files in /var/log/postgresql/ ... any idea what's going wrong? On an out-of-the-box instance
[07:53:46] !log deployment-prep installed mariadb via puppet on deployment-db1. no data yet
[07:53:48] Logged the message, Master
[08:20:26] hi. how do I create my own repo for the gsoc project on gerrit?
[09:02:33] hi all!
[09:02:50] where can I see the apache (and php) error logs for tool labs?
[09:02:59] i'm trying to debug http://tools.wmflabs.org/potd-feed/commons/potd-800x600.rss
[09:08:33] this is a 503... I don't get it, I must be missing something obvious :P
[09:08:37] this is a static rss file...
[09:09:22] oh, and the crontab is gone! yay. wtf?
[09:14:23] guh
[09:14:34] can't figure out how to log into the old toollabs instance either
[10:08:01] DanielK_WMDE: you can't log in there because these are deleted
[10:08:07] there are no old instances
[10:09:27] petan: ok. so... where do i find my old crontab entries then?
[10:09:50] you need to ask Coren :/ he did the migration
[10:10:16] petan: fwiw shouldn't Labs be getting IPv6 now it's moving/moved to eqiad?
[10:10:25] no idea
[10:26:06] DanielK_WMDE: it should be ...DATA.crontab
[10:29:35] Steinsplitter: where?
[10:30:16] DanielK_WMDE: in your tool's home folder
[10:31:08] Steinsplitter: there are exactly three things in there: Mail, potd, and replica.my.cnf
[10:31:09] for me they were at /data/project/deinproj/...DATA.crontab...
[10:31:11] there's no DATA
[10:31:15] o_O
[10:33:47] Steinsplitter: different question: where can I see the apache error log?
[10:36:37] DanielK_WMDE: no idea :/. if your tool was force-migrated you have to finalize the migration and restart the webservice.
[10:37:54] i must have missed the memo
[10:38:00] is this stuff documented somewhere?
[10:38:20] how do i restart the webservice (don't all tools share the web service?)
[10:38:59] i mean, this is just a cron job and static files. i didn't expect any trouble migrating this.
[10:39:06] but apparently, there are some hoops to jump through
[10:39:36] --> https://wikitech.wikimedia.org/wiki/Tool_Labs/Migration_to_eqiad
[10:39:52] thanks
[10:40:21] finish-migration
[10:40:21] restart webservice
[10:40:21] could well be that the crons only get copied over during finalization.
np
[10:41:47] yea, crontab is restored by finish-migration
[10:42:51] Steinsplitter: ok, works. thanks for your help!
[12:47:08] Jasper_Deng_away: We had issues migrating to Neutron (which was a requirement) and were against a tight deadline. Rather than risk breaking stuff, we delayed the neutron migration. IPv6 is still on the roadmap, but we'll let the migration dust settle first.
[12:57:00] hashar: deployment-db1 is loading a dump from deployment-sql now. the dump itself will be internally consistent, but out of date by a couple hours. does that matter to beta?
[12:57:08] springle: hello! :-]
[12:57:27] we can afford some data loss
[12:57:30] if it does i'll figure something else out. but this would be easiest :)
[12:57:37] ok
[12:57:59] springle: there is also a wikidata database on the sql02 server
[12:58:09] sql02 is the master for the wikidata and the wikivoyage dbs
[12:58:24] alternative is to make deployment-db1 a slave, let it catch up, then switch master and traffic
[12:59:11] save yourself the hassle of switching master. I am fine dropping out some data
[12:59:41] no reason sql02 data cannot go onto db1?
[12:59:53] no reason, you can dump the few dbs there and import them on db1
[12:59:55] it's a bigger vm
[12:59:56] ok
[13:00:03] what about es1
[13:00:09] then we will have only one master
[13:00:33] es1 ?
[13:00:37] deployment-es1
[13:00:45] is that an external storage thing?
[13:00:45] ohhh
[13:00:52] that is ElasticSearch
[13:01:05] manybubbles will take care of migrating the data / regenerating the search index
[13:01:09] ah ok. nice and ambiguously named like production external storage :D
[13:01:17] yeah ES is a confusing acronym :D
[13:01:37] I don't think we use external storage on beta
[13:01:47] ok easy
[13:02:36] the wikivoyage DBs we can probably drop
[13:04:46] springle: Re dbstore100[12] and "10 delayed slaves replicating all shards" = slaves replicating one shard each or, well, all shards? I.e., would this setup be usable for cross-wiki joins?
[13:04:51] *efficient
[13:05:51] scfc_de: yes, recombining all shards in one place. one box will be available for cross-wiki queries
[13:05:56] won't be that fast
[13:07:12] hashar: yeah, I'll just rebuild the index when the database is ready
[13:07:19] or, if it is ready now, I'll go do it now
[13:07:31] springle: I am dropping the de,en wikivoyage databases from sql02. They are full imports of the old wikivoyages, which is way too big and we don't have any use for them.
[13:07:36] I think we named it deployment-elasticXXXX in the new beta
[13:07:40] hashar: ok
[13:07:46] manybubbles: +1 on name change :]
[13:07:53] manybubbles: nice :)
[13:07:55] bd808|BUFFER: did it
[13:08:08] he is a good man
[13:09:41] springle: Available from Labs/Tools?!
[13:10:17] scfc_de: er, haven't discussed that :) but theoretically it could be
[13:10:40] the sanitarium would need an extra box
[13:14:04] !log deployment-prep Dropping enwikivoyage and dewikivoyage databases from sql02. Related changes are updating the Jenkins config: https://gerrit.wikimedia.org/r/#/c/121045/ and cleaning up the mw-config : https://gerrit.wikimedia.org/r/#/c/121047/
[13:14:07] Logged the message, Master
[13:14:43] hashar: so happy to dump sql02 now?
[13:15:20] springle: yeah dropping enwikivoyage and dewikivoyage there.
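(A sketch of the sql02 → db1 move discussed above: dump just the databases that matter and load them on the new master. The hostnames, credentials and the --single-transaction choice are illustrative assumptions, not the exact commands that were run:)

    # on sql02: consistent dump of the surviving database(s)
    mysqldump --single-transaction --databases wikidatawiki > wikidatawiki.sql
    # load it on the new master; --databases makes the dump recreate the schema itself
    mysql -h deployment-db1.eqiad.wmflabs < wikidatawiki.sql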
[13:15:42] so there is information_schema / mysql and wikidatawiki
[13:15:49] I guess we only care about the wikidatawiki one
[13:17:27] manybubbles: on beta I have dropped wikivoyage from $wgCirrusSearchInterwikiSources ( https://gerrit.wikimedia.org/r/#/c/121047/1/wmf-config/CirrusSearch-labs.php )
[13:17:37] manybubbles: the databases no longer exist.
[13:18:07] springle: it is great you managed to write a mariadb puppet class from scratch :]
[13:20:42] hashar: I have no problem with this
[13:20:54] so do the other databases exist?
[13:21:41] manybubbles: haven't checked, was merely cleaning out wikivoyage :-]
[13:21:46] cool
[13:21:51] manybubbles: sean is populating them on the db1 server \O/
[13:22:00] hashar: it's a little simplistic yet. am a puppet noob
[13:22:14] but it serves for this
[13:22:27] springle: it is better than nothing and it is a good occasion to practice your puppet
[13:22:44] :)
[13:22:58] I started puppet by copy pasting manifests
[13:23:13] then trying to please folks reviewing my changes
[13:23:26] sounds about right
[13:24:02] my first change ever to puppet is a copy paste : https://gerrit.wikimedia.org/r/#/c/401/1/manifests/site.pp
[13:31:15] and configuring db1 :-]
[13:31:15] https://gerrit.wikimedia.org/r/#/c/121051/
[13:32:52] wikidata is surprisingly small: 300MB .. while innodb ibdata is 70G. were those other databases very large?
[13:32:58] on sql02
[13:33:21] we imported on beta all the wikivoyage projects
[13:33:28] before they migrated under the wikimedia umbrella
[13:33:41] so if innodb ibdata is never garbage collected, that would explain the big size
[13:33:58] it will probably end up smaller if you export/import
[13:34:05] yep, the innodb global tablespace never shrinks
[13:34:07] * hashar has no clue how innodb works
[13:34:26] new vms will use innodb_file_per_table for that reason, amongst others
[13:34:46] can you reclaim the space somehow?
[13:35:07] yes
[13:38:00] springle: have you created a database user for mediawiki?
[13:38:11] doing the grants now
[13:38:12] springle: should be filled in on eqiad in the file /data/project/apache/common-local/wmf-config/PrivateSettings.php
[13:38:17] great
[13:38:23] it will be the same as the old one
[13:38:28] we have user 'mw', you can use whatever else, maybe wikiuser
[13:38:42] which do you prefer?
[13:38:45] the PrivateSettings.php file is local to the filesystem so update at will
[13:38:49] whatever works for you
[13:38:55] keeping the same user/pass is fine by me
[13:39:09] old one mw. fewer changes to break today :)
[13:40:32] there is also an 'oren' user on -sql
[13:40:54] can drop it
[13:41:05] probably some leftover from when we started beta
[13:41:09] oren is a volunteer
[13:52:29] !log deployment-prep dropped some live hack on eqiad in /data/project/apache/common-local and ran git pull
[13:52:32] Logged the message, Master
[13:53:12] springle: I managed to access a wiki page!!!!!!!!!!!!!!!!!!!!!
[13:53:56] :)
[13:54:05] so happy
[13:54:20] still loading
[13:54:27] up to simplewiki
[13:54:38] should be in alpha order
[13:54:41] I even managed to login
[13:54:58] simplewiki is quite big I think
[13:55:49] !log deployment-prep stopping udp2log on eqiad bastion, starting udp2log-mw (really should fix that issue one day)
[13:55:51] Logged the message, Master
[14:00:30] manybubbles: mind reviewing a tiny change for beta which changes CommonSettings.php please?
I am paranoid : https://gerrit.wikimedia.org/r/121056
[14:00:39] manybubbles: merely vary the udp2log destination in labs
[14:02:53] done
[14:02:55] hashar: ^
[14:02:57] thx
[14:03:42] in my Special:NovaResources why does my lab instance Puppet status show 'failed' ?
[14:04:07] tonythomas: puppet fails there? Connect to your instance and run: sudo puppetd -tv
[14:04:17] tonythomas: that will apply puppet and show you a bunch of errors
[14:04:28] hashar: actually, I can't connect to ..
[14:04:30] tonythomas: or one can look at /var/log/puppet.log
[14:05:08] hashar: I can log in to the instance remotely via ssh right ?
[14:05:10] !log deployment-prep udp2log functional on eqiad beta cluster \O/
[14:05:13] Logged the message, Master
[14:05:29] tonythomas: nah, you need to ssh to one of the bastion hosts first
[14:05:49] oh! I will do that
[14:05:51] tonythomas: example configuration https://wikitech.wikimedia.org/wiki/Help:Access#Accessing_instances_with_ProxyCommand_ssh_option_.28recommended.29
[14:06:04] tonythomas: that would configure your ssh client so you can do: ssh instance.eqiad.wmflabs
[14:06:27] hashar: thanks. let me try that again
[14:06:27] the ssh client will then figure out the bastion to use, connect to that bastion and connect from the bastion to your instance
[14:06:33] ok
[14:16:59] !log deployment-prep fixed up redis configuration in eqiad. Jobrunner is happy now: aawiki-504cd7d2: 0.9649 21.5M Creating a new RedisConnectionPool instance with id 627014dc7020485d721532dde4142d5190ba3cc1. {{gerrit|121060}}
[14:17:01] Logged the message, Master
[14:22:15] manybubbles: and here is the elastic search basic conf for beta in eqiad
[14:22:17] manybubbles: https://gerrit.wikimedia.org/r/121063
[14:30:39] hashar: for some reason ssh -A @bastion.wmflabs.org
[14:30:42] fails for me
[14:30:53] that username will be my gerrit shell username ?
[14:33:26] tonythomas: your shell username
[14:34:00] tonythomas: most probably tonythomas01
[14:34:20] hashar: of course, that's my gerrit one, and wikitech as I remember,
[14:34:24] but still it fails
[14:34:30] I have added the ssh keys
[14:34:31] yet
[14:34:59] it shows the Public Key issue
[14:36:56] tonythomas: double check your key in the wikitech interface?
[14:37:00] it is apparently there
[14:37:18] hashar: ssh tonythomas01@gerrit.wikimedia.org -p 29418 works fine
[14:37:28] I will try adding the key once again
[14:37:46] I think Gerrit uses a different system
[14:37:55] aka the key is added to your user preference in the Gerrit web interface
[14:38:11] yeah! The same key is added here too I think
[14:38:49] I gave xclip -sel clip < ~/.ssh/id_rsa.pub , pasted it there, and on submit it shows the error - probably due to the same key already existing
[14:38:59] hashar: sorry ! for some reason I have to quit
[14:39:02] will brb
[14:44:05] hashar: Logstash seems to be working in eqiad beta -- https://logstash-beta.wmflabs.org/#/dashboard/elasticsearch/default
[14:45:23] !log deployment-prep created proxy https://logstash-beta.wmflabs.org for logstash instance
[14:45:25] Logged the message, Master
[14:49:22] hashar, anything I can help with today?
[15:05:40] aude, can you catch me up on what's happening with the 'maps' project?
[15:07:29] hashar, anything I can help with today?
[15:09:23] andrewbogott: we need map warper (ask chippy)
[15:09:32] andrewbogott: sean has created the eqiad database \O/ :-]
[15:09:35] wma, check with daniel schwen
[15:09:42] aude: I don't know who chippy is
[15:09:48] tim waters
[15:09:49] here
[15:09:52] :)
[15:09:54] Ah!
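(Re the ProxyCommand setup linked at 14:05 above: a sketch of the one-off command-line form; the username tonythomas01 and the instance name are placeholders, and the Help:Access page has the persistent ~/.ssh/config variant:)

    ssh -o ProxyCommand='ssh -W %h:%p tonythomas01@bastion.wmflabs.org' tonythomas01@instance.eqiad.wmflabs
    # %h/%p expand to the target host and port, so the connection hops through the
    # bastion automatically; putting the same option in a "Host *.eqiad.wmflabs"
    # stanza in ~/.ssh/config makes it permanent.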
[15:10:02] chippy, what's up with 'map warper'?
[15:10:04] I do need to add a ticket to bugzilla, sorry
[15:10:27] rest of the instances are not needed and we can start fresh
[15:10:38] chippy, please do it right now. the deadline for this passed weeks ago…
[15:10:42] maps-warper instance under the maps project.
[15:11:01] aude: What is daniel's irc handle?
[15:11:06] Or, can you pm me his email?
[15:11:12] one sec
[15:11:21] he's here sometimes, usually when he has a question or problem
[15:12:25] andrewbogott: I've got to figure out how to have puppet create the syslog-ng destination directory.
[15:13:05] hashar: Yeah, it's dumb that puppet doesn't do recursive directory creation. Probably just shell out to 'mkdir -p'
[15:13:14] andrewbogott: good idea
[15:13:28] hashar: I bet if you grep you'll find that happening other places in the code
[15:14:35] andrewbogott: i just deleted wikidata instances we maintain and don't care about
[15:14:46] aude: great, thanks.
[15:14:52] there is "wikidata-jenkins" which can be left until the end and deleted
[15:15:01] or probably deleted now
[15:15:27] there are a few others we don't maintain (e.g. they belonged to students)
[15:15:44] if you want to "auto" migrate those and if they survive, ok (if not, ok)
[15:16:04] hashar: done :) now it's connected
[15:16:05] aude: Would you mind making me a bug about those so I don't lose track?
[15:16:11] tonythomas: congrats!
[15:16:16] ok
[15:16:17] I'm happy to migrate them, just don't want to miss anything.
[15:16:19] thanks
[15:16:30] one belongs to magnus
[15:16:37] i'm sure that is migrated
[15:16:46] not 100% but 99% sure
[15:17:42] hashar: thanks ! now doing sudo puppetd -tv
[15:17:55] suppose i could try to ask the students
[15:21:00] andrewbogott, aude have posted to bugzilla: https://bugzilla.wikimedia.org/show_bug.cgi?id=63115
[15:22:12] andrewbogott: I came up with mkdir -p , also added ensure => directory to the /home/wikipedia/syslog file {} statement. https://gerrit.wikimedia.org/r/#/c/119256/4..5/manifests/misc/logging.pp
[15:22:23] andrewbogott: https://bugzilla.wikimedia.org/show_bug.cgi?id=63116
[15:22:26] chippy: I'll copy over your instance right now.
[15:22:33] thank you :)
[15:23:02] chippy: There are a few ways to handle shared storage. If you don't require it to be totally up-to-the-minute fresh then you have recent backups in /data/project/glustercopy and /home/glustercopy.
[15:23:35] chippy: the pmtpa instance will be shut down as part of the copy
[15:24:43] I get this error on running sudo puppetd -tv http://pastebin.com/2pwanYzV
[15:24:50] any thoughts ?
[15:25:37] andrewbogott, thanks. Will it be started after that? Or (as I'm assuming) there isn't any need to after that
[15:25:55] chippy: best to leave the old pmtpa instance to die.
[15:25:59] I'll start up the eqiad one.
[15:26:41] tonythomas: probably your project doesn't have shared homedirs turned on. That's a bit of a bug on our end...
[15:26:53] but if you click on the 'configure' link on the 'manage projects' page there are checkboxes… you should check 'em both.
[15:38:04] andrewbogott: I enabled both. and the Service user homedir pattern needs to be changed to something like /home/tonythomas01/ ?
[15:38:40] tonythomas: um… what project is this? Are you really using service users?
[15:42:58] andrewbogott: it's a project on implementing the VERP feature for mediawiki
[15:43:19] you can see the request here https://wikitech.wikimedia.org/wiki/New_Project_Request/mediawiki-VERP
[15:43:59] tonythomas: Why would you want to put non-human users in /home?
[15:44:20] In labs, those canonically go to /data/project
[15:44:59] andrewbogott: That reminds me, we need to alter the wikitech interface to (a) stop creating the old-style LDAP entries and (b) change references to local- to $project.
[15:45:17] * andrewbogott makes note
[15:45:36] Coren: you mean to change Service user homedir pattern to :/data/project ?
[15:45:51] hashar: dump reload has completed. have to zzz now :) let me know if anything appears amiss
[15:46:25] springle: thank you very much!
[15:46:43] tonythomas: Oh, right, it's /home/%p%u by default. My bad. But yeah, most put it in /data/project
[15:46:49] !log deployment-prep deployment-db1 data loaded
[15:46:51] Logged the message, Master
[15:47:01] hashar: yw
[15:47:08] tonythomas: /home/ also works.
[15:47:19] It just looks odd to me. :-)
[15:47:27] let me try with /data/project
[15:47:38] That shouldn't make a functional difference.
[15:48:32] ok.
[15:49:17] Oh, hey, I think I found a cheat around my NFS race.
[15:49:44] But /man/ it's ugly.
[15:51:05] Coren: I think I am having some connection error when I gave /data/project
[15:51:18] ... "connection error"?
[15:51:39] That setting affects only service users/groups.
[15:53:34] Coren: Now, when I login, it shows the error Unable to create and initialize directory '/home/tonythomas01'.
[15:53:36] and quits
[15:54:24] Wait, you're not supposed to try logging in with service users.
[15:55:49] Ah, nevermind, your shell account name and wikitech names differ and that confused me. :-)
[15:56:01] Coren: yeah :)
[15:56:05] they differ
[15:56:19] don't know why I made it like that, but tonythomas01 is my shell account name
[15:56:23] That last error is a known symptom of an instance not having its home properly rw because of the very race condition I had just discussed. :-)
[15:56:38] I can fix it manually. What's your instance name?
[15:57:19] the instance id is : i-000002ad.eqiad.wmflabs
[15:58:01] and Jeff_Green says he too is having trouble logging in
[15:58:11] he is away at the moment though
[15:59:27] fyi, you don't have to refer to the id. It's also 'box2.eqiad.wmflabs'
[15:59:58] Coren: ok
[16:01:18] Coren: now it's ok
[16:01:23] Thanks
[16:03:20] tonythomas: No it's not. Please log off.
[16:03:21] :-)
[16:03:30] Coren: done
[16:04:23] Won't be long.
[16:04:42] k.
[16:06:59] tonythomas: /now/ it's fixed properly. :-)
[16:07:17] Coren: earlier, there was nothing inside /home/
[16:07:17] :)
[16:07:20] let me see
[16:10:29] Coren: sorry to bug you again, but now sudo puppetd -tv shows this warning: warning: /Stage[main]/Role::Labs::Instance/Mount[/data/project]: Skipping because of failed dependencies
[16:10:46] there are no errors and the catalog run finishes
[16:10:48] tonythomas: I know, I just noticed. I will have to reboot your instance
[16:10:58] To give you /data/project
[16:11:01] logged out
[16:11:03] ok :)
[16:11:20] chippy, still here?
[16:13:42] tonythomas: All done.
[16:16:21] aude: after those three wikidata-dev instances… are we still waiting on addshore to do more work in that project?
[16:17:42] wikidata jenkins can die when tampa dies
[16:17:55] we don't mind keeping it around until then, just in case we want to look up something
[16:19:56] aude: Ok, can I move it to the 'finished' section?
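(Given the shared-storage and NFS caveats above, a quick sanity check to run on a fresh instance; nothing project-specific is assumed here:)

    # is /data/project really the NFS share (and writable), or just an empty local directory?
    mount | grep -E '/data/project|/home'
    df -h /data/project /home
    # if the share is missing or mounted read-only, re-running puppet once the
    # exports have caught up usually fixes it:
    sudo puppetd -tv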
[16:20:00] yes
[16:20:04] cool.
[16:20:17] and after chippy signs off on his maps instance, that one too?
[16:20:27] yes
[16:20:55] ok, thanks
[16:22:08] andrewbogott: So, I found a way to prevent the mounts being mounted readonly if the export didn't go through yet -- but at the cost of the mount /failing/ entirely. There are two approaches: let it fail, meaning you don't get the shared filesystems at all until the next puppet run tries it again, or make puppet wait for the filesystem to become available. I don't particularly like either option.
[16:22:45] I think failing until the next puppet run is ok -- certainly better than what happens now.
[16:23:04] well, really either sound ok to me.
[16:23:09] *sounds
[16:23:40] much better, at least :)
[16:23:40] The "real" solution is to have instance creation actively make the exports rather than wait for a daemon to run at interval.
[16:24:27] Or, actively cause the exports to be created. Same difference.
[16:24:34] Yeah. There are a lot of hacky, easy ways to do that… doing it a proper way with an OpenStack API will be a fair bit of trouble.
[16:24:58] How often does the daemon run now? Making puppet block on creation /and/ shortening the refresh time seem like a pretty good option.
[16:25:16] hm, having a lot of trouble with subj/verb agreement today
[16:26:20] andrewbogott: Hmm. 2 min IIRC
[16:26:35] * Coren checks.
[16:27:12] The problem has never been the actual interval so much as mountd caching what it got the first time 'round for a while after the last unmount
[16:28:10] Right, but if you implement either of the above fixes then the interval starts to matter.
[16:28:35] Indeed.
[16:28:52] Oh, wait, I can't do that.
[16:29:18] If I wait for the filesystem to become available rather than fail, then projects without shared home / shared project will hang puppet.
[16:29:30] Because those will /never/ become available.
[16:30:00] * Coren grumbles.
[16:30:09] So… step one is remove those checkboxes from the web interface, remove the check for those settings in the daemon
[16:30:17] so we always have shared storage. Because we want that anyway, right?
[16:30:43] I suppose we do; though arguably projects that don't need a shared home remove load from the NFS server.
[16:30:53] * Coren ponders.
[16:31:04] True, but right now puppet errors on such projects doesn't it?
[16:31:34] I might actually do one better. If the manage-nfs-daemon provided information on what mounts are available on a ro filesystem, puppet would know (a) when they are ready and (b) which are actually available.
[16:32:06] How would that be communicated?
[16:32:56] Coren: Hmm. Piping to /usr/bin/mail doesn't work from a job, I'm guessing because it backgrounds itself. Is piping to /usr/sbin/exim -odf -i instead (and changing stdin to include Subject: and a blank line) safe?
[16:33:48] create a file named along the lines of .$project.exports in /public/keys, containing the name of the actual exports one-per line? The file would only exist after manage-nfs-daemon did its run (so no ro mounts) and the contents would allow being selective about what is mounted.
[16:34:13] Actually, no, that needs to be .$instance.exports
[16:34:26] hm…
[16:34:29] hacky, but could work.
[16:34:36] But regardless of name, this is something that puppet can check easily.
[16:34:45] * andrewbogott nods
[16:34:59] anomie: It should.
[16:35:14] anomie: Although I would have said /usr/bin/mail also should. Odd.
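(A sketch of the alternative floated at 16:32 above for sending mail from a grid job: pipe straight to exim in the foreground instead of /usr/bin/mail, which backgrounds itself and gets killed along with the job. The recipient address and message body are placeholders:)

    # -odf = deliver in the foreground, -i = don't treat a lone "." line as end of input
    printf 'Subject: job finished\n\nAll done.\n' | /usr/sbin/exim -odf -i tools.example@tools.wmflabs.org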
[16:35:34] Coren: Thanks ;) puppet status: ok
[16:35:50] chippy: I need to step away, but please check out that instance and respond to that bug as soon as you can.
[16:36:03] Coren, back in ~an hour
[16:36:09] andrewbogott: kk.
[16:38:07] Coren: When I use -v with /usr/bin/mail, it looks like something in there backgrounds itself for the actual handing of the message to the MSA. And I'm guessing SGE kills every subprocess when the main process exits, so the mail never actually gets sent.
[16:40:42] anomie: Ah, hmm. Trying to be too smart for its own good, is it? Hm.
[16:42:16] hashar: what is the bastion for the new beta?
[16:42:25] manybubbles: deployment-bastion.eqiad.wmflabs
[16:42:34] well that is not really a bastion, merely the equivalent of tin
[16:42:35] sensible
[16:42:45] I still have to set up the jenkins slave there to update the mw code automatically
[16:42:46] that is what I needed anyway
[16:42:48] as well as the mediawiki-config
[16:43:49] manybubbles: I haven't checked in a couple of days, but the deployment-elastic04 host wasn't joining the cluster when I set it up. I couldn't find an obvious reason why.
[16:44:03] bd808: k
[16:45:09] manybubbles: I am escaping back home, sorry
[16:45:20] see you
[16:45:47] so what is the actual bastion?
[16:45:53] well
[16:45:56] bastion.wmflabs.org
[16:46:03] oh, but that was for pmtpa?
[16:46:04] that is where you ssh to access the instances :-]
[16:46:16] deployment-bastion.eqiad.wmflabs is merely the equivalent of tin / terbium
[16:46:26] aka where we run the mwdeploy scripts
[16:46:39] ok, I'm in!
[16:46:49] congrats!
[16:47:04] if you need to become mwdeploy : sudo su - mwdeploy
[16:47:05] should work
[16:48:02] and i am really off now
[16:48:05] *wave*
[16:50:14] bd808: everything is working well while I build the index, which is good
[16:50:25] I'll look at deployment-elastic1004 later I guess
[16:50:27] not really required....
[16:53:18] Coren: Re /data/project mount, what is the scenario? A new instance X isn't known to the NFS server as belonging to the project Y and thus not granted read-write access? Or no access at all?
[16:57:25] scfc_de: granted ro. The problem is that this is then cached for quite some time.
[16:58:31] But wouldn't that imply that any instance can mount any volume ro? Shouldn't the default be no access?
[16:58:54] scfc_de: That's not the only layer.
[16:59:07] But it's also moot, I'm changing some things around now.
[17:46:05] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap de was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=941287 edit summary: /* Zeitplan */ alignment with the more up-to-date English-language page
[17:47:37] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap de was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=941288 edit summary: /* Zeitplan */ oops, template fix... this VE...
[17:50:55] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap de was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=941294 edit summary: /* Zeitplan */ alignment with the more up-to-date English-language page
[17:50:58] andrewbogott_afk, in a location with (cheap) no ssh access, got messages, many thanks. will investigate asap
[18:00:06] andrewbogott_afk: I've set things so that, atm, too-early mounts simply fail. We can refine this later.
[18:00:20] Worst-case scenario, a puppet run fixes it.
[18:01:12] This will have the side effect that /currently/ mismounted filesystems will be broken, though.
[18:07:53] Change on mediawiki a page Wikimedia Labs/Tool Labs/Roadmap de was modified, changed by Silke WMDE link https://www.mediawiki.org/w/index.php?diff=941334 edit summary: /* Zeitplan */ alignment with the English page
[18:16:15] Coren: are you leaning towards thinking that we want to keep the shared storage configuration options?
[18:20:07] andrewbogott: Coren: Just curious if this is on anyone's radar (and if not, whose radar it should be on). I'm still getting swamped with useless empty Echo notification emails from wikitech (have for several months now). If I recall correctly, this broke because the Echo extension was upgraded (in like August 2013 or so) but without upgrading the database or configuration or some other back-incompat change.
[18:20:07] Krinkle: Can I help with either cvn or integration? Do you need a public IP for integration?
[18:20:47] Krinkle: probably worth logging a bug about that, I don't know what's happening.
[18:20:57] The notifications don't have any content at all? They aren't about instance creation/deletion?
[18:21:20] andrewbogott: There's no /harm/ in it. the gain is not nearly as big as it was avoiding creating gluster volumes, but I can see valid reasons why a project might not want to have shared homes. Perhaps default them to 'on' however?
[18:21:24] For integration, better to ask hashar. Afaik the only thing we need to keep in integration are the Jenkins slaves (which are used by production Jenkins). He's already spawned eqiad versions of those; assuming those work out, we can just let the tampa ones die.
[18:21:35] The other instances aren't used at the moment; when we do, we'll just re-create them accordingly
[18:22:20] As for cvn, that one does indeed need a public subdomain (we use cvn.wmflabs.org), but I can do that myself. Just need to find time to set it all up. I'm hoping to do a rough attempt later today.
[18:22:22] +1 for fixing Echo mails. There isn't even a link to wikitech apart from the preferences in the mails.
[18:22:42] andrewbogott: Can we appoint a time so I can have you standing by to copy over all data stores?
[18:22:48] (at the right time)
[18:23:18] scfc_de, Krinkle, does it help if you uncheck things here? https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-echo
[18:23:25] Krinkle: sure, when would you like to do that?
[18:27:08] andrewbogott: I'm going to migrate one bot manually now to see if the environment is compatible, then I'll set up the right number of empty bot containers and kill the tampa bot processes.
[18:27:15] andrewbogott: What's your time zone?
[18:27:26] cst
[18:27:30] um… -6, I think?
[18:27:38] I'm in SF at the moment.
[18:28:04] !ping
[18:28:05] !pong
[18:28:06] When you say 'bot' -- are we talking about things in toollabs? Or something else?
[18:28:07] ok
[18:28:54] andrewbogott: Nope, not tool labs. I set up a separate project for cvn bots because of them being memory hogs and needing all kinds of evil packages and versions thereof.
[18:29:01] ah, ok.
[18:29:15] Well, we can probably do this yet today, if you have the time. Check back with me after you finish your test?
[18:30:37] andrewbogott: I don't mind the mails as such; they are just useless at the moment. If an instance is created, the notification should be "An instance xyz has been created in project abc." This works for "xyz left a message for you at abc.", and should be made to work for instance creations/deletions as well.
I assume that's Krinkle's issue with that as well?
[18:31:01] yeah, if the test goes well, I'd say let's copy sometime 3 or 4 hours from now?
[18:31:03] andrewbogott:
[18:31:12] !ping
[18:31:13] !pong
[18:31:17] scfc_de: Those messages are fine
[18:31:29] scfc_de: They are visible on the wiki in the Echo ticker
[18:31:50] the problem is that the configuration or db wasn't upgraded properly so the message is stripped from the email (or doesn't get inserted properly)
[18:31:54] the logic for this is all in place
[18:31:59] in OSM and Echo
[18:32:17] so it should just be a matter of running an upgrade script or some config change.
[18:32:21] Yep, on wiki everything's fine; I'm just talking about the mails.
[18:44:57] bd808: Is the logstash migration going OK now? Can I do anything to help?
[18:49:19] Hi
[18:52:29] I'm trying to connect via ssh
[18:53:02] andrewbogott: I think the logstash migration is {{done}} but I'd like to leave the instances in pmtpa running until we shut down beta there. In theory people use them to look at beta errors
[18:53:20] * bd808 will go update the appropriate wiki page
[18:53:23] bd808: can we redirect people to the new instances?
[18:53:30] but it gives me an error
[18:53:57] If you are having access problems, please see: https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
[18:53:57] Permission denied (publickey,hostbased).
[18:54:23] andrewbogott: It's a data thing. pmtpa logstash gets pmtpa beta logs and eqiad->eqiad
[18:54:23] Know why?
[18:54:24] andrewbogott: i just noticed that all the sartoris instances are shut off
[18:54:32] do we have another place to test deploy stuff in labs?
[18:55:13] ottomata: We have trebuchet running in deployment-prep in eqiad
[18:55:18] I have added the public ssh keys
[18:55:26] bd808: is that the same stuff that tin uses?
[18:55:30] Thanks
[18:55:31] or ryan's separate project?
[18:55:48] ottomata: Yes. Using all the same puppet config
[18:56:02] ok awesome, mind if I test some git-fat git-deploy stuff there?
[18:56:27] bd808: ok… I can't promise you much more pmtpa uptime, things are going to start to shut down soon. Basically as soon as hashar finishes with beta, instances are going to start shutting down.
[18:56:28] what labs project?
[18:56:42] bd808: but, anyway, can I mark that project as finished w/migration since there aren't any more pending tasks?
[18:57:25] andrewbogott: Shutdown along with beta is perfect. I'll update the wiki page as soon as I remember where it is :)
[18:57:34] Harpagornis: various things could be going wrong. Which system are you trying to reach?
[18:57:39] bd808: https://wikitech.wikimedia.org/wiki/Labs_Eqiad_Migration_Progress
[18:57:49] ottomata: deployment-prep project (beta)
[18:58:28] ottomata: We have a project-local puppet and salt master at deployment-salt.eqiad.wmflabs
[18:59:35] andrewbogott, I'm using: ssh harpagornis@tools-login-eqiad.wmflabs.org
[18:59:55] bd808: can you add me to that project
[19:01:06] ottomata: Sure can. What's your wikitech name?
[19:01:43] Harpagornis: the error on the login host is "error: key_read: uudecode \n failed"
[19:01:50] I don't know what that means yet, but something to do with your key format...
[19:02:18] Ah, this seems like a good explanation: http://theunknown.com.au/ssh-error-key_read-uudecode/
[19:02:27] Harpagornis: that possible? Just that you have a newline in the middle of your key?
[19:02:37] It could be either on your local system or in the one you uploaded.
[19:02:44] bd808: Ottomata
[19:02:58] I see..
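(The "key_read: uudecode failed" error above usually means the pasted public key picked up a line break or lost a few characters; a couple of local sanity checks, assuming the usual ~/.ssh/id_rsa.pub path:)

    # a valid public key prints a fingerprint; a mangled one errors out
    ssh-keygen -lf ~/.ssh/id_rsa.pub
    # copy the key as a single line, with no embedded newlines
    tr -d '\n' < ~/.ssh/id_rsa.pub | xclip -sel clip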
[19:03:19] andrewbogott: i had that error the other day because of a copy/paste typo in the ssh key
[19:03:33] yep, seems likely, depending on where you copy it from.
[19:03:37] just missing 2 chars
[19:03:38] yeah
[19:03:44] might not be just a newline, could be any typo
[19:04:25] ottomata: {{done}} See https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Puppet_and_Salt for info on the puppet/salt setup there
[19:04:26] how do you request deletion of wikitech pages?
[19:05:03] gifti: probably you can just delete it yourself :) Or ask me.
[19:05:18] i can?
[19:05:23] !log deployment-prep Added ottomata as a project member and admin
[19:05:25] Logged the message, Master
[19:05:33] andrewbogott, copy from the command line
[19:06:16] and also from a text editor
[19:06:48] andrewbogott: ok, please delete the page GiftBot, it's more than obsolete now
[19:07:04] bd808: is one of those hosts a generic 'deploy host' like tin is?
[19:07:14] so: i'm trying to test git-deploy's git-fat support
[19:07:15] gifti: link?
[19:07:26] ottomata: deployment-bastion is the tin equivalent
[19:07:30] sigh
[19:07:34] it worked in the sartoris project, but now that I'm trying in production, it seems salt.utils.which is not doing what I expect
[19:07:35] ok cool
[19:07:37] Harpagornis: I think that when you paste it into the window it's getting a newline… should be pretty clear how to modify that in the entry field.
[19:07:51] gifti: Ah, is that just the page name? That's easy :)
[19:08:19] andrewbogott, I'm in.
[19:08:24] bd808: can I rebase and run puppet?
[19:08:28] Harpagornis: great!
[19:08:33] gifti: done, thanks for the cleanup.
[19:08:51] ottomata: Should be safe. Try not to lose all the local patches that are awaiting upstream merge :)
[19:09:11] k, are they committed there? i don't see anything dirty in the working copy
[19:09:29] thank you as well, andrewbogott :)
[19:10:10] ottomata: Yes, they should be committed on top of the upstream production branch. Fetch + rebase + cherry-pick if you need to pull in pending changes
[19:10:20] ok cool
[19:10:21] thanks
[19:10:33] cherry-pick is only needed to add new stuff obviously
[19:10:50] right, would you prefer cherry-pick or should I just use a temporary local branch to check them out into?
[19:10:54] (I don't have anything yet, but I might)
[19:11:11] also, is there a deploy target I could use to test deploying to?
[19:11:24] i don't really need a whole host for this, i'm just testing deployment
[19:11:40] ottomata: We've been doing cherry-pick from gerrit so far, which has worked well.
[19:11:58] ok
[19:12:55] ottomata: Not sure about the deploy targets. It may be easiest for you to spin up a small instance with the roles you need to act as a target
[19:13:04] ok, i'll do that
[19:13:21] Neat.
[19:14:31] * andrewbogott looks around for more people to nag
[19:14:38] One nice thing about cherry-picking from gerrit is that once the merge happens the rebase will clean up the local commit
[19:15:03] andrewbogott: if we can turn on data4all for a bit, I can copy data off it and then it can be destroyed in peace. Let me do that now.
[19:15:32] YuviPanda: sure -- you should be able to boot up the pmtpa instance, let me know if you have any trouble.
[19:15:41] andrewbogott: ty.
[19:16:59] andrewbogott: btw, I see that 'dptypes' in pmtpa is in 'shut off' state. Do I need to delete it or just let it be?
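(A sketch of the fetch + rebase + cherry-pick flow bd808 describes above for the project-local puppet checkout; the checkout path and the gerrit change ref are placeholders, not values from this log, and "origin" is assumed to point at gerrit:)

    cd /var/lib/git/operations/puppet     # assumed local checkout path on the puppet master
    git fetch origin
    git rebase origin/production          # replay the pending local patches on top of upstream
    # pull one unmerged change straight from gerrit (NN/12345/1 is a placeholder change ref):
    git fetch origin refs/changes/NN/12345/1 && git cherry-pick FETCH_HEAD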
[19:17:32] YuviPanda: there's no real need to clean up stuff in tampa, that'll get taken care of when we format the drives :)
[19:18:33] andrewbogott: hehe. alright then
[19:18:53] YuviPanda: instances that are actually running might give me pause… but if it's shut down then not to worry.
[19:28:29] mutante|away: any update regarding wikistats?
[19:33:52] Hi, I need to store the query result set in a text file. When I try to execute "sql enwiki_p < test-query.sql > test-query.txt", I get "-bash: test.sql: No such file or directory". can anyone help me ?
[19:36:03] andrewbogott: ok, I think data4all can be killed 4eva now. Should I make a note on the wiki page?
[19:36:48] YuviPanda: Sure, move it from 'mothballed' to 'abandoned'
[19:37:43] andrewbogott: ok!
[19:39:39] andrewbogott: So, at this time, new instances that try to wake up faster than the nfs will just fail the mounts; it's suboptimal but much easier on the support ("just run puppet")
[19:40:20] Hi! I want to request access to the 'tools' project. https://wikitech.wikimedia.org/wiki/User:Doc_Taxon
[19:40:28] Coren, that seems better, thanks!
[19:40:43] andrewbogott: I don't want to have puppet wait for it; there's too much risk of a stall if something is funky with manage-nfs-daemon.
[19:41:02] andrewbogott: We'll look into a better solution (through openstack) once the dust settles. kk?
[19:41:07] yep, sounds good.
[20:03:06] bd808: having a bit of trouble getting my new node recognized as a minion
[20:03:19] salt-call pillar.data on the new node looks good I think
[20:03:27] but i can't trigger a remote salt call from deployment-salt
[20:04:00] * bd808 will try to pretend he knows how to troubleshoot salt
[20:04:11] hmm, actually, i can't run that for any minion
[20:04:18] # salt -E 'deployment.*' cmd.run 'hostname'
[20:04:18] No minions matched the target. No command was sent, no jid was assigned.
[20:06:17] ok, bd808, i take it back, git-deploy is working
[20:06:29] i just assumed it wasn't because I couldn't get salt to do what it usually can
[20:06:42] Good. :) I'm still 96% clueless on how to fix it when it's broken
[20:10:12] yeah it is fairly mysterious to me mostly too
[20:10:19] nice work on getting it set up :p
[20:10:29] Ryan did it for me :)
[20:10:32] oh ha
[20:11:13] I think the reason your command didn't work is that the minion names are things like 'i-000002b7.eqiad.wmflabs'
[20:11:42] oh hm
[20:11:42] There is a `fqdn` grain that might match...
[20:11:43] ok
[20:11:46] hm
[20:11:47] ok
[20:11:56] so, hm, do you know this much?
[20:12:04] how does the deploy.py code get run on minions during sync?
[20:12:11] afaik it only exists on the salt master
[20:12:45] It should be on the minions too I think. Let me look at a node that I know works
[20:13:58] ottomata: "somehow" (salt magic I think) deploy.py gets copied to /var/cache/salt/minion/extmods/modules/deploy.py on the minions
[20:17:23] errgh, hgm
[20:17:30] ottomata: I think that /srv/salt/top.sls and /srv/salt/deploy/sync_all.sls are the magic that makes salt copy the code to minions.
[20:18:39] Those were the files that we had to patch to make things work correctly on deployment-bastion. Before the patch they didn't have the statements for the 'deployment_server:true' grain
[20:24:36] bd808: hm,
[20:24:48] i don't think that /var/cache file is what is being run
[20:24:56] i'm putting syntax errors in there and it still does the same thing
[20:26:29] ottomata: On deployment-bastion?
[20:26:36] Doc_Taxon: You noticed petan processed your request?
[20:26:52] oh there?
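(Why the `salt -E 'deployment.*'` call above matched nothing: the minion IDs are instance IDs like i-000002b7.eqiad.wmflabs, not hostnames. A sketch of targeting by the fqdn grain instead, run on the salt master; the deployment-* glob is an assumption about the hostnames involved:)

    salt-key -L                                   # list the actual minion IDs
    salt -G 'fqdn:deployment-*' test.ping         # match on the hostname grain instead
    salt -G 'fqdn:deployment-*' cmd.run hostname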
i'm editing on the minion
[20:26:58] thank you, I already noticed
[20:26:59] scfc_de: I even answered him in -requests
[20:27:08] he went there and requested :3
[20:27:11] Ah, okay, thanks.
[20:27:19] first and only user who did it the proper way XD
[20:27:30] ottomata: Deployment-bastion is a minion, but the special one for this module I think
[20:27:41] proper way?
[20:27:49] And if Guest20326 should come back: That error message doesn't fit the command line => Don't lie, or we can't help you :-).
[20:28:38] yeah totally strange bd808, it's the same there too
[20:28:45] where is this code coming from!?
[20:31:44] ottomata: You might see if you can poke Ryan in #trebuchet-deploy for some tips
[20:32:32] And then write it up in some way that makes sense :)
[21:02:21] bd808: quick learning of something
[21:02:23] saltutil.sync_modules is what does it
[21:02:28] and puppet calls that whenever a module changes
[21:03:03] Cool. Did you figure out which version gets run yet?
[21:15:18] Hello, in the tools project I can't do "become `mytool`." It said: "sudo: ldap_start_tls_s(): Connect error"
[21:15:38] How to fix it?
[21:20:25] Hi all
[21:23:42] Is there somebody who has experience with making MediaWiki an OAuth provider for other applications to use?
[21:24:08] nullzero_: try logging out of tools-login and logging in again
[21:24:27] does anyone know what happened to become ?
[21:24:38] haven't been on in a while
[21:25:08] still doesn't work
[21:25:33] nullzero_: nope. worked one. now the same error Coren ^
[21:25:44] ok, i guess cd'ing will do for now
[21:25:45] *once
[21:27:19] * hedonil can't get it: LDAP errors in *new* eqiad. tssss
[21:27:33] bd808: no, not sure, just that the puppet exec updates everything
[21:27:35] nullzero: LDAP issue. I will be looking into it, I think someone is working on the certificate.
[21:28:19] I can "become wikilint", so good karma/bad karma :-).
[21:28:20] aha. thank you very much.
[21:28:44] Well, no longer.
[21:29:16] scfc_de: ;) LDAP is an attention whore
[21:29:37] it works now. Seemingly it's very unstable
[21:31:13] No, someone was working on it.
[21:33:00] phew, that was a pain to figure out
[21:33:07] bd808: could you look this over real quick?
[21:33:08] https://gerrit.wikimedia.org/r/#/c/121248/
[21:33:21] * bd808 looks
[21:36:40] ottomata: LGTM. I'm assuming you've been testing to get to that state
[21:37:38] yup
[21:37:53] finally got it working
[21:38:01] dunno how best to get logging out of the salt/deploy stuff
[21:38:17] i had to hack the code and add f.write statements to a tmpfile on the target to figure out what was going wrong
[21:38:39] We need to come up with a better solution than that for sure
[21:38:48] for sure
[21:39:17] python logging is easy to work with. We just need a place to send the logs… like my logstash cluster
[21:39:23] yeahhh!
[21:39:24] :)
[21:39:27] well, i mean
[21:39:36] ideally you could do git-deploy sync --verbose
[21:39:40] or something
[21:39:51] anyway, ja
[21:39:52] I have a log appender in the scap project that logs to udp2log
[21:39:53] gimme +1!
[21:40:13] thanks :)
[21:40:51] Does salt.utils.which return nil on failure?
[21:45:55] None
[21:46:16] * bd808 has ruby on the brain at the moment :/
[22:55:18] chippy: there?
[22:56:25] hi andrewbogott
[22:56:35] just replied to the ticket
[22:56:56] Yeah, I saw. We're having some kind of ldap outage right now, breaking lots of login/sudo stuff.
[22:57:12] Folks are working on it but I don't quite understand the problem.
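(The saltutil.sync_modules call mentioned at 21:02 above is what pushes custom modules such as deploy.py out to /var/cache/salt/minion/extmods/modules/ on the minions; a sketch of triggering it by hand when debugging:)

    # from the salt master: resync custom modules on every minion
    salt '*' saltutil.sync_modules
    # or on a single minion, pulling from the master
    sudo salt-call saltutil.sync_modules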
If you can hang on for a few minutes, I'll ping you when it's time to try again
[22:57:16] ouch. Okay, no worries.
[22:58:47] andrewbogott, many thanks. I may head off to bed in 30 mins or so.
[22:59:03] chippy: My experience on maps-warper is that sudo throws an error but works anyway. Same for you?
[22:59:46] andrewbogott, it throws an error and then asks for the password
[22:59:55] oh, ok...
[22:59:56] hm
[23:02:20] chippy: That's something different, then. What is your shell account name and what are you trying to sudo to?
[23:02:34] Coren, my shell account name is chippy
[23:02:45] and I am trying to do a "sudo ls"
[23:03:17] chippy: Wait, on what instance then?
[23:03:27] Coren, the instance is maps-warper
[23:03:30] * Coren thought that was tools.
[23:03:44] ahh nope - part of the map project
[23:03:53] or "maps" :)
[23:04:39] I was on irc a few days ago asking about the Tools Project migration - getting myself confused as to the old name for Labs (toolserver was it?) and the Tools project itself
[23:05:22] User chippy is not allowed to run sudo on maps-warper.
[23:05:54] It's not working because you don't actually have the right to sudo. :-) You might want to check the sudo rules on wikitech?
[23:06:14] Coren, I did have the rights previously (pre-migration)
[23:06:23] also, wikitech login is down, which could be related ^
[23:07:26] chippy: It's related to the "sudo: ldap_start_tls_s(): Connect error" message, but not to the fact that you don't have access.
[23:08:16] okay thanks :) am able to log into wikitech. let's have a look
[23:10:20] so the sudo policy is for all members
[23:10:39] see #wikimedia-operations
[23:10:53] i think ldap is being fixed, which means sudo etc
[23:12:28] probably late to the party again, but on the labs website I suddenly cannot see any of my instances
[23:12:46] ldap issues
[23:13:50] dschwen: how about now?
[23:13:50] dschwen: It should be back now.
[23:14:18] i'm unable to ssh into the instance now (from bastion)
[23:16:13] chippy: me neither. Stay tuned...
[23:17:16] :) I always have patience for sysadmins. They are worth their weight in bitcoins
[23:23:08] It looks like there's going to be a partial outage of eqiad labs instances. This should last no longer than 30 minutes. Sorry, everyone.
[23:23:16] Instances won't actually be down, just unreachable via ssh.
[23:27:18] Hey guys, just a quick note that came to my mind while I was reading this: https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/How_code_is_updated#Puppet_and_Salt. I believe that accepting all client keys on the salt master / puppet master isn't the best way to deal with these keys, as it could happen that some unwanted minion gets registered :) shouldn't this page mention an explicit way to pick up keys manually (i.e. salt-key -L; salt-key -a example.hostname ) ?
[23:28:24] (sorry if this isn't the appropriate channel)
[23:36:39] ssh -A chjohnson@bastion.wmflabs.org = permission denied (public key) ? I have done this a thousand times.
[23:38:25] ChrisJ_WMDE: We're having an ssh outage. I'll post here when it's resolved.
[23:38:38] ok thanks andrew
[23:44:56] Poor ssh, you should accept them for who they are
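(The manual key-acceptance workflow the last question above asks about — list pending minion keys and accept them one at a time instead of auto-accepting everything; the hostname is the log's own placeholder:)

    salt-key -L                      # list accepted and unaccepted minion keys
    salt-key -a example.hostname     # accept one specific minion's key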