[09:57:24] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Nodepool has trouble taking snapshots on OpenStack labs - https://phabricator.wikimedia.org/T138106#2398447 (hashar) p:Triage>Low I can't remember whether I managed to reproduce manually using the openstack CLI. But in case it...
[10:29:19] Labs, Labs-Infrastructure, DBA, Operations: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#1936600 (jcrespo) There is indeed a replacement for labsdb100[123] about to arrive. However, there are no short-term plans for these, as they have lower impact. labsdb10...
[10:39:39] Labs, Tool-Labs: Install debootstrap, fakechroot and fakeroot on tools - https://phabricator.wikimedia.org/T138138#2398645 (tom29739) My tools often need different packages to be installed, and sometimes these can be installed easily (binaries available to download) but often the only options for binarie...
[10:40:35] Labs, Tool-Labs: Install debootstrap and fakechroot on tools - https://phabricator.wikimedia.org/T138138#2398647 (tom29739)
[10:54:21] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out log out and in again? - https://phabricator.wikimedia.org/T138373#2398668 (Peachey88)
[10:54:44] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out log out and in again? - https://phabricator.wikimedia.org/T138373#2398480 (Peachey88) I would assume there is some context behind this?
[10:56:26] enwiki replica will be a bit behind today, as the import process is doing a large chunk, but this is being monitored and under control
[10:56:57] hopefully it will start to go down in ~3 hours
[10:58:06] Labs: Make ladsgroup admin on fa-wp - https://phabricator.wikimedia.org/T138372#2398672 (Peachey88)
[12:08:15] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out log out and in again? - https://phabricator.wikimedia.org/T138373#2398784 (Andrew)
[12:09:56] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out log out and in again? - https://phabricator.wikimedia.org/T138373#2398785 (tom29739) I got really confused by this yesterday, nothing worked, because I didn't know about this.
[12:17:55] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out and in again? - https://phabricator.wikimedia.org/T138373#2398789 (Andrew)
[12:17:59] I'm getting -bash: fork: retry: No child processes when I try to do anything on the tools bastion.
[12:18:35] Labs, wikitech.wikimedia.org: wikitech: Tell people to log out and in again? - https://phabricator.wikimedia.org/T138373#2398792 (Peachey88) Is anything being done towards fixing the cause issue? Having a look in #wikitech.wikimedia.org, the only relevant tasks I found were: - {T118395} - {T101199}
[12:41:42] tom29739: which bastion? and it may be you are hitting a max proc count
[12:41:57] tools-bastion-03
[12:42:58] It seems to be working now.
[12:43:11] But it happens fairly often.
[12:44:15] there are limits in place to prevent fork bombing and resource exhaustion among other things and you seem to have a lot of sessions, it's possible tmux usage that isn't ever getting cleaned up?
[12:45:12] ps -u tom29739 -U tom29739 | grep bash | wc
[12:45:12] 15 60 435
[12:46:58] I closed a few bash windows, did that change it?
[12:47:20] 15 to 8 so yep
[12:47:56] Thanks.
[12:48:10] my guess is you are using tmux or screen or something and sessions are being orphaned kind of thing, it's per user
[12:48:37] I'm using tmux, it says I have 2 sessions, which I do.
[12:49:16] ps -u tom29739 -U tom29739 | grep tmux
[12:49:16] 1651 pts/51 00:00:00 tmux
[12:49:17] 10279 ? 00:05:18 tmux
[12:49:18] 21352 pts/33 00:00:00 tmux
[12:49:24] That's weird.
[12:49:59] I just detached from both, and there's still the ? one there.
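The `ps ... | grep` checks above amount to counting one user's processes per command and spotting the ones with no controlling terminal (`?` in the TTY column). A minimal Python sketch of that check, assuming the default `ps -u USER` column layout (PID, TTY, TIME, CMD); `summarize_ps` and `ps_for_user` are hypothetical helper names, not existing Tool Labs tooling:

```python
from __future__ import annotations

import subprocess
from collections import Counter


def summarize_ps(ps_output: str) -> tuple[Counter, list[int]]:
    """Count processes per command and list PIDs that have no controlling TTY.

    Expects the default `ps -u USER` columns: PID TTY TIME CMD.
    """
    counts: Counter = Counter()
    no_tty: list[int] = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "PID TTY TIME CMD" header.
        if len(fields) < 4 or not fields[0].isdigit():
            continue
        pid, tty, cmd = int(fields[0]), fields[1], " ".join(fields[3:])
        counts[cmd] += 1
        if tty == "?":
            no_tty.append(pid)
    return counts, no_tty


def ps_for_user(user: str) -> str:
    """Return the raw `ps -u USER` output for one user (Linux procps ps)."""
    return subprocess.run(
        ["ps", "-u", user], capture_output=True, text=True, check=True
    ).stdout


# The tmux listing pasted in the log, plus the header line ps normally prints.
sample = """  PID TTY          TIME CMD
 1651 pts/51   00:00:00 tmux
10279 ?        00:05:18 tmux
21352 pts/33   00:00:00 tmux
"""
counts, no_tty = summarize_ps(sample)
print(counts["tmux"], no_tty)  # 3 [10279]
```

On a bastion this would be called as `summarize_ps(ps_for_user("tom29739"))`. A `?` TTY only means the process has no controlling terminal: a tmux or screen server process normally looks like that, and killing it ends every session it hosts, so inspect it before killing anything.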
[12:50:20] yes, I killed it now
[12:50:27] you should be in decent shape
[12:51:10] but fyi w/ something like screen/dtach leaving behind open bash sessions or something is easy and consumes resources thus we limit max procs
[12:51:32] for a long time this would escalate over enough people that it was causing issues
[12:51:36] anyways, gtg
[13:04:34] Labs, Labs-Infrastructure: Empty default security group for newly created project - https://phabricator.wikimedia.org/T136871#2398898 (chasemp) @andrew can you take a look at this when you get back? seems relevant to recent work I imagine :)
[13:08:23] finally, labs enwiki lag going down
[13:58:52] Labs, DBA: write irc bot to report high replag of s{1,2,3}.labsdb on #wikimedia-labsdb - https://phabricator.wikimedia.org/T106151#1460476 (Boshomi) Please notice T138378 ; actually 73970 sec on s1
[14:16:21] !log wikilabesls deploying 503b20c to staging
[14:16:21] wikilabesls is not a valid project.
[14:24:10] !log wikilabels deploying 503b20c to staging
[14:24:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[14:24:18] !log wikilabesls deploying 503b20c to prod
[14:24:19] wikilabesls is not a valid project.
[14:24:25] !log wikilabels deploying 503b20c to prod
[14:24:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[14:24:54] Labs, Labs-Infrastructure: Replag on s1 (enwiki) is 70370 and is still growing fast. - https://phabricator.wikimedia.org/T138378#2399276 (jcrespo)
[14:24:56] Labs, Labs-Infrastructure: Replag on s1 (enwiki) is 70370 and is still growing fast. - https://phabricator.wikimedia.org/T138378#2398563 (jcrespo)
[14:37:48] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Psychoslave was modified, changed by Psychoslave link https://wikitech.wikimedia.org/w/index.php?diff=673021 edit summary:
[14:49:52] All but one of the instances on the privpol-captcha project are asking me for a password when I try to ssh in from bastion-01. I've tried rebuilding the instances, terminating them and creating them again but all to no avail. There doesn't seem to be anything in the log that is bad. :/
[14:51:00] tom29739: do you have a hostname for me?
[14:52:02] (PS1) Jean-Frédéric: Add Iran in Farsi to Monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295507
[14:52:08] Yep, captcha-postgres.privpol-captcha.eqiad.wmflabs captcha-web-1.privpol-captcha.eqiad.wmflabs captcha-web-2.privpol-captcha.eqiad.wmflabs captcha-web-proxy.privpol-captcha.eqiad.wmflabs
[14:52:22] valhallasw`cloud, there should be 4 there.
[14:53:13] tom29739: odd. root key also doesn't work.
[14:53:17] anything in the console log?
[14:53:39] also, re-creating hosts with the same hostname is a bad idea
[14:54:05] I can login to captcha-web-1
[14:54:22] Jun 22 14:53:57 captcha-web-1 sshd[11951]: error: AuthorizedKeysCommand /usr/sbin/ssh-key-ldap-lookup returned status 1
[14:54:45] mm, that's just for root
[14:54:49] captcha-web-proxy is a completely new name.
[14:55:24] Jun 22 14:40:18 captcha-web-1 sshd[1798]: error: AuthorizedKeysCommandUser "ssh-key-ldap-lookup" not found: Success
[14:55:24] Jun 22 14:40:18 captcha-web-1 sshd[1798]: Failed publickey for tom29739 from 10.68.23.58 port 47037 ssh2: RSA 44:1b:59:53:08:c5:7e:91:6b:a3:a9:3b:68:68:ae:4c
[14:55:29] * valhallasw`cloud ponders
[14:55:38] Is my public key wrong?
[14:55:47] no, when you tried to login, puppet wasn't done yet
[14:55:53] can you try again for captcha-web-1?
[14:56:05] captcha-web-1 works.
[14:56:33] The other 3 don't though.
[14:56:36] are you sure puppet is done on all hosts?
[14:56:58] How do I know when it's done?
[14:57:03] the console log
[14:57:49] 'Error: Could not request certificate: The certificate retrieved from the master does not match the agent's private key.' I've just noticed this in the console log for captcha-postgres
[14:58:03] '2016-06-22T14:45:29.773270+00:00 captcha-postgres rc.local[405]: To fix this, remove the certificate from both the master and the agent and then start a puppet run, which will automatically regenerate a certficate.'
[14:58:13] But I can't access the agent :/
[14:58:15] that sounds like a re-used hostname
[14:58:40] Should I completely terminate the lot and start over with new hostnames?
[14:59:20] that depends on what's in the console log...
[14:59:28] if puppet is still running in the log, wait
[14:59:48] if there is an error about the certificate, delete the host and try again with a new hostname
[15:00:12] I see this in captcha-web-2: '2016-06-22T14:45:21.767173+00:00 captcha-web-2 puppet-agent[1074]: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find node 'captcha-web-2.privpol-captcha.eqiad.wmflabs'; cannot compile'
[15:01:10] please create a bug for that one -- that sounds like an issue with puppet
[15:01:28] (as in: the operations/puppet repo or the puppetmaster config)
[15:03:19] captcha-web-2 was a re-used hostname, which might have affected it.
[15:03:46] Oh. in that case, yeah, try again with another hostname
[15:08:18] (CR) Lokal Profil: [C: 2] "Tests look happy! GOGOGOGO" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295507 (owner: Jean-Frédéric)
[15:09:13] (Merged) jenkins-bot: Add Iran in Farsi to Monuments database [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295507 (owner: Jean-Frédéric)
[15:10:24] tom29739: feel free to use anything I say in a logged channel in bug reports, email, or whatever
[15:10:47] OK.
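The rules of thumb given above for reading a new instance's console log (wait while puppet is still running; a certificate mismatch points to a re-used hostname, so recreate the instance under a new name; "Could not find node" is bug-worthy unless the hostname was re-used) can be written down as a small triage function. This is a hypothetical sketch: the match strings are copied from the messages quoted in this log, while `triage_console_log` and its return values are illustrative and not part of any Labs tooling.

```python
from __future__ import annotations

# Substrings copied from console-log messages quoted in this channel.
CERT_MISMATCH = "does not match the agent's private key"
NODE_NOT_FOUND = "Could not find node"
RUN_FINISHED = "Finished catalog run"


def triage_console_log(log_text: str, hostname_reused: bool = False) -> str:
    """Suggest a next step for a new instance from its console log (hypothetical helper)."""
    if CERT_MISMATCH in log_text:
        # Stale certificate on the puppetmaster, e.g. left over from a re-used hostname.
        return "recreate with a new hostname"
    if NODE_NOT_FOUND in log_text:
        # A re-used hostname can explain this too; otherwise it looks like a puppet bug.
        return "recreate with a new hostname" if hostname_reused else "file a bug"
    if RUN_FINISHED in log_text:
        return "puppet finished; try ssh again"
    # Nothing conclusive yet: the initial puppet run is probably still going.
    return "wait"


print(triage_console_log("Error 400 on SERVER: Could not find node 'captcha-web-2'", hostname_reused=True))
```

The checks run in order of severity, so a log that contains both a certificate error and a later "Finished catalog run" line is still reported as needing a new hostname.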
[15:11:56] It appears to work now, thanks valhallasw`cloud
[15:17:12] Labs, Tool-Labs, Tracking: /home/basvb missing replica.my.cnf - https://phabricator.wikimedia.org/T138416#2399516 (Basvb)
[15:18:41] (PS6) Jean-Frédéric: Add local dev environment with docker-compose [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291198 (https://phabricator.wikimedia.org/T136351)
[15:25:53] Labs: Nova_Resource:Puppet.privpol-captcha.eqiad.wmflabs will not go away - https://phabricator.wikimedia.org/T138417#2399539 (tom29739)
[15:29:54] Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2399557 (RobH) The SSDs for this arrived today. I've emailed the labs-l list and updated the topic in #wikimedia-labs to reflect the planned work: > Labs users, > > As many of you may recall, (mostly)...
[15:30:14] Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2399558 (RobH) a:yuvipanda>RobH Stealing this task for implementation tomorrow.
[15:30:40] yuvipanda: I hope I don't mess up your labmon server ;D
[15:33:54] (PS7) Jean-Frédéric: Add local dev environment with docker-compose [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291198 (https://phabricator.wikimedia.org/T136351)
[15:42:29] !log tools.heritage Deployed latest from Git: 4be4f04, c280649, d667e19 (T136566 & T137543), e867c45, 0a09a20
[15:42:31] T137543: Add banana checker to automated tests for heritage - https://phabricator.wikimedia.org/T137543
[15:42:31] T136566: Move i18n messages from Intuition to locally - https://phabricator.wikimedia.org/T136566
[15:42:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[15:51:44] Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2399607 (Krenair)
[15:53:26] Labs: Make ladsgroup admin on the labs 'fa-wp' project - https://phabricator.wikimedia.org/T138372#2399626 (Krenair)
[16:05:21] Quarry: puppet disabled on quarry-main-01 - https://phabricator.wikimedia.org/T136315#2330367 (Krenair) Looks like it's been this way since the 20th of May at 17:21.
[16:13:27] (PS1) Jean-Frédéric: Fix Iran in Farsi config [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295545
[16:15:43] (CR) Lokal Profil: [C: 2] "sure" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295545 (owner: Jean-Frédéric)
[16:17:45] (Merged) jenkins-bot: Fix Iran in Farsi config [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295545 (owner: Jean-Frédéric)
[16:35:44] hi
[16:35:52] I'm working on the enwiki_p database
[16:36:15] And wanted to create a copy of the database
[16:36:28] What is the way for me to do that?
[16:37:08] SoniWP: uh, you can't
[16:37:22] the database is way too big for that
[16:37:27] why do you want to do that?
[16:37:52] Huh
[16:38:09] valhallasw`cloud, I'm essentially trying to run this query
[16:38:11] https://gist.github.com/halfak/f03bfea42d63824b1856fb60ae5e4aa6
[16:38:34] But I am hitting this error.
[16:38:35] ERROR 1044 (42000): Access denied for user 's53024'@'%' to database 'enwiki_p'
[16:38:57] I was told that I need to create my own DB to make a temporary table.
[16:39:00] SoniWP: yes. You're not allowed to create tables in enwiki_p.
[16:39:11] SoniWP, "CREATE DATABASE s53024__sandbox_p;"
[16:39:15] you can create your own database on that server
[16:39:16] but
[16:39:34] SoniWP: https://phabricator.wikimedia.org/T138111#2393960
[16:40:06] Ah, I see. So I need to start the DB name with my own username to create that DB?
[16:40:46] SoniWP: yes, with caveats.
[16:40:47] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database#Steps_to_create_a_user_database_on_the_replica_servers
[16:41:08] you are not going to be able to run EXPLAIN queries either I don't think
[16:41:49] In this case, SoniWP just needs some temp tables as intermediary steps in a query
[16:42:05] ^ That.
[16:42:31] We can get some *massive* performance improvements by doing that
[16:46:11] !log wikilabels sudo -u www-data ../venv/bin/wikilabels new_campaign svwiki "Redigeringskvalitet (5k balanserad)" damaging_and_goodfaith DiffToPrevious 1 50
[16:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[16:46:30] !log wikilabels tail -n +2 ~/datasets/svwiki.revisions_for_review.5k_2016.tsv | cut -f1 | sed -r 's/(.*)/{"rev_id": \1}/' | sudo -u www-data ../venv/bin/wikilabels task_inserts 35
[16:46:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[16:55:37] valhallasw`cloud, I am trying to follow the steps in the link you mentioned
[16:56:04] But the replica database I created does not seem to have any tables.
[16:56:13] Is there a step in between that I am missing?
[16:56:26] !log tools.heritage Deployed latest from Git: 76c6dd6c (T138377)
[16:56:27] T138377: Add Iran in Farsi to the Monuments Database - https://phabricator.wikimedia.org/T138377
[16:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.heritage/SAL, Master
[16:57:04] SoniWP: no, creating tables is a separate step. That's what you do with create table (...)
[16:57:35] valhallasw`cloud, So how do I replicate tables in one db over to my sandbox db?
[16:57:50] ??
[16:58:27] Basically I'm trying to run queries with temporary tables. Because they improve my query massively.
[16:58:57] To do that, I was told I should replicate the DB
[16:59:32] No, you have to create a temporary table /on the replica server/
[16:59:32] So I am trying to copy over the "revision" and the "page" tables in the enwiki_p DB
[16:59:41] why would you need to do that?!
[16:59:54] I think I am confused then
[16:59:55] the tables are right there on the server
[17:00:08] also, are we talking about the query in T138111 ?
[17:00:08] T138111: Run a Tool Labs query without Timing out - https://phabricator.wikimedia.org/T138111
[17:00:14] Can I access enwiki_p while on my replica server?
[17:00:26] 'your replica server'?
[17:00:42] Never mind. I think I am horribly confused with the terminology here.
[17:01:10] And yes, the query I am currently trying is trying to achieve the same thing that T138111 was.
[17:01:11] T138111: Run a Tool Labs query without Timing out - https://phabricator.wikimedia.org/T138111
[17:01:12] a server hosts multiple databases (enwiki_p, s53024...., etc)
[17:01:25] each database can contain multiple tables (eg enwiki_p.page)
[17:01:32] Specifically, https://gist.github.com/halfak/f03bfea42d63824b1856fb60ae5e4aa6 is the query I am currently trying to run
[17:01:39] you can use all tables on one server in a query
[17:02:12] SoniWP: SELECT * FROM page WHERE page_random > 0.5 LIMIT 1000; ?
[17:02:30] valhallasw`cloud, So I can access multiple databases from the same query?
[17:02:34] Yes.
[17:02:41] As long as they are on the same server
[17:03:02] So basically...
[17:03:41] I make my own DB, make a query that accesses enwiki_p and makes a temporary table in my DB
[17:04:03] And then use the temporary table from my DB to run the query I'm trying.
[17:04:10] valhallasw`cloud, Does that make sense?
[17:04:49] Yes.
[17:06:15] Great, thanks
[17:06:33] A little bit of my confusion is now resolved
[17:09:59] Labs, Labs-Infrastructure: Copy labmon data to new SSDs - https://phabricator.wikimedia.org/T137924#2399966 (RobH)
[17:32:43] Labs, Labs-Infrastructure, Patch-For-Review: graphite.wmflabs.org is very slow / flaky - https://phabricator.wikimedia.org/T127957#2400008 (RobH)
[17:35:18] !log discourse Added Gergő Tisza as project admin
[17:35:22] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Discourse/SAL, Master
[17:42:56] bd808: is that discourse project in use or still a kind of toy beta?
[17:43:24] I think it's mostly a POC testing area.
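The approach settled on above (read the replica's `enwiki_p` tables in place, and keep only the intermediate result in a temporary table created from your own `s53024__...` database on the same server) can be tried out locally. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the MariaDB replica: `ATTACH DATABASE` plays the role of a second database on the same server, and the `page`/`revision` rows are made up. On the replicas themselves the SQL would be MariaDB's, roughly `USE s53024__sandbox_p; CREATE TEMPORARY TABLE ... SELECT ... FROM enwiki_p.page ...`; treat that as an untested outline.

```python
import sqlite3

# One connection stands in for one session on the replica server. The main
# database plays the user's own database (e.g. s53024__sandbox_p); the
# attached one plays enwiki_p.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS enwiki_p")

# Toy replica tables with made-up rows (the real ones are read-only and huge).
conn.execute("CREATE TABLE enwiki_p.page (page_id INTEGER, page_title TEXT, page_random REAL)")
conn.executemany(
    "INSERT INTO enwiki_p.page VALUES (?, ?, ?)",
    [(1, "Alpha", 0.91), (2, "Beta", 0.12), (3, "Gamma", 0.67)],
)
conn.execute("CREATE TABLE enwiki_p.revision (rev_id INTEGER, rev_page INTEGER)")
conn.executemany(
    "INSERT INTO enwiki_p.revision VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 2), (13, 3)],
)

# Step 1: materialize the intermediate result once, in a temporary table,
# reading the replica table in place (no copy of enwiki_p is made).
conn.execute(
    "CREATE TEMP TABLE sample_pages AS "
    "SELECT page_id FROM enwiki_p.page WHERE page_random > 0.5"
)

# Step 2: the main query joins the small temporary table against the replica.
rows = conn.execute(
    "SELECT s.page_id, COUNT(r.rev_id) "
    "FROM sample_pages AS s JOIN enwiki_p.revision AS r ON r.rev_page = s.page_id "
    "GROUP BY s.page_id ORDER BY s.page_id"
).fetchall()
print(rows)  # [(1, 2), (3, 1)]
```

The temporary table disappears when the connection closes, so the whole sequence has to run in one session.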
[17:52:25] !log privpol-captcha Finally got salt master working and talking to minions
[17:52:29] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Privpol-captcha/SAL, Master
[17:53:43] Labs, Labs-Infrastructure, DBA, Operations, Blocked-on-Operations: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400045 (RobH)
[18:10:51] !log wikilabels 8812056 goes to staging
[18:10:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[18:13:07] !log wikilabels 8812056 goes to prod
[18:13:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wikilabels/SAL, Master
[18:47:59] Labs: Nova_Resource:Puppet.privpol-captcha.eqiad.wmflabs will not go away - https://phabricator.wikimedia.org/T138417#2400168 (scfc) I always thought that with the migration to Horizon there was a hook (?) that triggered the deletion of the corresponding wiki page, but looking at `modules/openstack` I don't...
[18:51:36] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400172 (ori) Sample invocation: ```lang=bash $ DB_USER=root DB_PASS=password perl maintain-replicas.pl 2> T13...
[19:01:55] Labs, Tool-Labs, Tracking: /home/basvb missing replica.my.cnf - https://phabricator.wikimedia.org/T138416#2400180 (scfc)
[19:02:05] Labs, Tool-Labs: /home/basvb missing replica.my.cnf - https://phabricator.wikimedia.org/T138416#2399516 (scfc)
[19:03:31] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Psychoslave was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=674620 edit summary:
[19:11:47] Labs, Tool-Labs: /home/basvb missing replica.my.cnf - https://phabricator.wikimedia.org/T138416#2400189 (chasemp) thanks @scfc, ran `/usr/local/sbin/delete-dbuser --config /etc/create-dbusers.yaml u10928` so we'll see how it goes
[19:57:31] (PS1) Lokal Profil: Add category to "Unknown fields" reports [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295579
[20:07:38] This page: https://wikitech.wikimedia.org/wiki/Help:Project_hosted_salt_master tells me to add role::salt::masters::labs::project_master to the saltmaster's puppet classes. But on this page: https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&instanceid=f9df29b8-3a66-4bea-bad6-57437ec887ac&project=privpol-captcha&region=eqiad it only shows those 5 puppet classes in 'global groups'. How do I add that class
[20:07:40] to that instance?
[20:09:04] tom29739: you can add new puppet roles, variables, etc for your project using https://wikitech.wikimedia.org/wiki/Special:NovaPuppetGroup
[20:09:28] at one point we had lots of these in the global set but it caused a lot of confusion
[20:10:00] The other thing you can do is add things to your project's hiera settings via wikitech
[20:10:35] that's actually a lot nicer for keeping things in sync across a project
[20:11:01] How do I add puppet classes to the hiera config?
[20:11:27] tom29739: the "classes:" hiera key
[20:11:31] take a look at https://wikitech.wikimedia.org/wiki/Hiera:Deployment-prep
[20:15:35] Change on www.mediawiki.org a page OAuth/For Developers was modified, changed by Slashme link https://www.mediawiki.org/w/index.php?diff=2170405 edit summary: Give clear example of where config gets its data from
[20:24:08] bd808, I'm getting 'Error: Could not request certificate: getaddrinfo: Name or service not known' when I try to run puppet. On these new instances I made, they say error message in console log and I can't ssh in because it asks for a password.
[20:24:22] *they say that error message
[20:24:45] The instance hostnames aren't reused or anything.
[20:25:17] yuck. I've seen that failure occasionally when bringing up new instances. It's a DNS server hiccup at just the wrong time
[20:25:51] I seem to be having no end of problems with this project :/
[20:26:07] Do I need to terminate and create those instances yet again?
[20:26:12] (the fourth time)
[20:27:14] tom29739: you are getting the error on hosts that you have an active shell on too?
[20:27:54] is the error finding the puppetmaster or the salt master?
[20:27:58] I'm in a shell session with captcha-saltmaster-04. I'm trying to run puppet with puppet agent --test --verbose and it doesn't work.
[20:28:08] It errors out with that error.
[20:28:48] the error is with the puppetmaster, I'm trying to create a problem saltmaster.
[20:28:54] *project
[20:29:14] what does `grep server /etc/puppet/puppet.conf` tell you?
[20:29:29] server = labs-puppetmaster-eqiad.wikimedia.org
[20:29:54] Should I try pinging it or something?
[20:30:13] try `host labs-puppetmaster-eqiad.wikimedia.org`
[20:30:49] labs-puppetmaster-eqiad.wikimedia.org is an alias for labcontrol1001.wikimedia.org.
[20:30:49] labcontrol1001.wikimedia.org has address 208.80.154.92
[20:31:18] yeah that looks right
[20:31:46] I tried running puppet again and it didn't work.
[20:31:46] but `sudo puppet agent --test --verbose` is still telling you it can't look up the puppetmaster?
[20:32:16] I need to use sudo?
[20:32:20] * tom29739 facepalms
[20:32:34] It works now.
[20:32:40] \o/
[20:32:43] Sorry for wasting your time.
[20:32:51] no worries
[20:33:14] the ones where the initial puppet run failed are more annoying
[20:33:23] *sometimes* a reboot fixes them
[20:33:39] * tom29739 tries
[20:33:51] if that doesn't work though then you are back at the delete the instance and try again stage
[20:35:17] I'll run out of names to use at this rate.
[20:36:24] 'Warning: Unable to fetch my node definition, but the agent run will continue:' I just spotted that in the log when I rebooted it, does that mean anything?
[20:36:50] Nope, still asks for a password.
[20:37:29] And 'Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Could not find node 'captcha-web-06.privpol-captcha.eqiad.wmflabs'; cannot compile'
[20:38:20] I was advised to file a bug for that earlier, but I didn't because I thought it was caused by reusing a hostname.
[20:38:51] These new instances don't have reused hostnames.
[20:38:51] And these instances are still pointed at the labs puppetmaster?
[20:38:56] Yep.
[20:39:11] I haven't touched the puppetmaster for these ones.
[20:39:38] hmmm... yeah that's a bug somewhere. Cert isn't getting signed on the puppetmaster or something
[20:39:54] * bd808 doesn't have access to that host to debug deeper
[20:40:26] The very first 2 instances I ever created in the project I tried to get a project puppetmaster working, but I deleted those instances and erased the hiera config.
[20:41:02] Is it possible that the old, removed hiera config is somehow coming through?
[20:41:57] possible, but not likely I don't think
[20:42:17] * tom29739 files a bug...
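One plausible reading of the `getaddrinfo` failure that went away with `sudo` (an assumption, not confirmed in this log): when `puppet agent` runs as a non-root user it uses a per-user configuration directory instead of `/etc/puppet/puppet.conf`, so it never sees `server = labs-puppetmaster-eqiad.wikimedia.org` and falls back to Puppet's default server name, `puppet`, which does not resolve. The hypothetical helper below models only that one decision for a `puppet.conf` string, using the standard-library `configparser`; it is not how Puppet itself parses its config.

```python
from __future__ import annotations

import configparser

DEFAULT_PUPPET_SERVER = "puppet"  # Puppet's built-in default for the `server` setting


def effective_server(system_conf: str, running_as_root: bool) -> str:
    """Return the puppetmaster name an agent run would try (simplified model).

    Assumption: a non-root run does not read the system puppet.conf at all,
    so only a root run picks up the `server` line from it.
    """
    if not running_as_root:
        return DEFAULT_PUPPET_SERVER
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(system_conf)
    # For the agent, settings in [agent] take precedence over [main].
    for section in ("agent", "main"):
        if parser.has_option(section, "server"):
            return parser.get(section, "server")
    return DEFAULT_PUPPET_SERVER


conf = "[main]\nserver = labs-puppetmaster-eqiad.wikimedia.org\n"
print(effective_server(conf, running_as_root=True))   # labs-puppetmaster-eqiad.wikimedia.org
print(effective_server(conf, running_as_root=False))  # puppet
```

On an instance, `os.geteuid() == 0` gives the `running_as_root` value; the practical takeaway from the conversation is simply to run the agent as `sudo puppet agent --test --verbose`.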
[20:47:53] Labs: Initial puppet run failing for new instances in privpol-captcha - https://phabricator.wikimedia.org/T138438#2400395 (tom29739)
[20:48:07] bd808, ^
[20:49:08] And now I'm back to the terminate and recreate stage again...
[21:32:14] yuvipanda Hi, do you know how I can get ssh working on labs?
[21:32:26] I'm trying to get jenkins to ssh.
[21:32:35] So I can test something with a test install of gerrit.
[21:32:37] please
[21:33:11] paladox: what problem are you having?
[21:33:31] bd808 I have been trying to test jenkins ssh.
[21:33:39] But it doesn't work
[21:34:25] ssh out from jenkins to some host or ssh into jenkins? And what does "doesn't work" actually mean? Do you have error logs?
[21:35:01] bd808 https://phabricator.wikimedia.org/P3298
[21:35:10] bd808 and yes, trying to ssh out.
[21:35:29] I'm trying to ssh into the same machine to test whether ssh works before doing anything outside.
[21:35:50] http://gerrit-jenkins.wmflabs.org/
[21:35:56] Have you looked at the sshd daemon's logs to see what's happening on that side?
[21:36:27] bd808, nope, where is that located please?
[21:36:35] "Key exchange was not finished" is not a great error message from the jenkins side :/
[21:37:10] bd808, oh.
[21:37:13] I think with our stock config sshd would log to /var/log/syslog
[21:37:42] bd808 ok thanks, I will go and look at the log now
[21:39:16] bd808 last few mins
[21:39:17] Jun 22 21:15:01 gerrit-test CRON[11805]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
[21:39:17] Jun 22 21:17:01 gerrit-test CRON[11831]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
[21:39:18] Jun 22 21:17:01 gerrit-test CRON[11832]: (root) CMD (/usr/local/sbin/puppet-run > /dev/null 2>&1)
[21:39:18] Jun 22 21:17:09 gerrit-test puppet-agent[11874]: Sleeping for 23 seconds (splay is enabled)
[21:39:19] Jun 22 21:17:32 gerrit-test puppet-agent[11874]: Retrieving pluginfacts
[21:39:21] Jun 22 21:17:32 gerrit-test puppet-agent[11874]: Retrieving plugin
[21:39:23] Jun 22 21:17:32 gerrit-test puppet-agent[11874]: Loading facts
[21:39:25] Jun 22 21:17:39 gerrit-test puppet-agent[11874]: Caching catalog for gerrit-test.git.eqiad.wmflabs
[21:39:27] Jun 22 21:17:40 gerrit-test puppet-agent[11874]: Applying configuration version '1466629919'
[21:39:29] Jun 22 21:17:40 gerrit-test crontab[12080]: (root) LIST (root)
[21:39:31] Jun 22 21:17:48 gerrit-test puppet-agent[11874]: Finished catalog run in 8.37 seconds
[21:39:33] Jun 22 21:25:01 gerrit-test CRON[12648]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
[21:39:36] Jun 22 21:35:01 gerrit-test CRON[12769]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
[21:42:08] paladox: I was wrong. sshd logs to /var/log/auth.log. Look for lines containing "sshd"
[21:42:23] bd808, ok thanks
[21:42:48] you may also want to verify that you have the right java classes installed -- https://issues.jenkins-ci.org/browse/JENKINS-26495
[21:43:09] bd808 oh, right classes?
[21:43:13] Is that plugins?
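The advice above (sshd writes to `/var/log/auth.log`; look for lines containing `sshd`) can be turned into a small filter that keeps only sshd errors and rejected public keys. A sketch, assuming the standard syslog line layout seen in the `captcha-web-1` excerpts earlier in this log; `sshd_problems` is a hypothetical name, and the patterns cover only the message types quoted here.

```python
from __future__ import annotations

import re

# Syslog layout, e.g. "Jun 22 14:40:18 captcha-web-1 sshd[1798]: Failed publickey for ..."
SSHD_LINE = re.compile(
    r"^(?P<when>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) sshd\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
FAILED_KEY = re.compile(r"Failed publickey for (?P<user>\S+) from (?P<addr>\S+)")


def sshd_problems(lines):
    """Yield (timestamp, message) for sshd errors and rejected public keys."""
    for line in lines:
        m = SSHD_LINE.match(line.rstrip("\n"))
        if not m:
            continue  # not an sshd line (CRON, puppet-agent, ...)
        msg = m.group("msg")
        if msg.startswith("error:") or FAILED_KEY.search(msg):
            yield m.group("when"), msg


sample_lines = [
    'Jun 22 14:40:18 captcha-web-1 sshd[1798]: error: AuthorizedKeysCommandUser "ssh-key-ldap-lookup" not found: Success',
    "Jun 22 14:40:18 captcha-web-1 sshd[1798]: Failed publickey for tom29739 from 10.68.23.58 port 47037 ssh2: RSA 44:1b:59:53:08:c5:7e:91:6b:a3:a9:3b:68:68:ae:4c",
    "Jun 22 21:15:01 gerrit-test CRON[11805]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)",
]
for when, msg in sshd_problems(sample_lines):
    print(when, msg)
```

On a real host the input would be `open("/var/log/auth.log")`, which on Debian-style systems typically needs root or membership in the `adm` group.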
[21:43:16] I'm using jenkins 2.x
[21:44:28] the last comment on that bug report says that not having JCE (java crypto extension) installed with the JRE causes a similar "Key exchange was not finished, connection is closed." error
[21:46:35] bd808, I think I installed the plugin, but maybe it is the wrong version I installed.
[21:46:43] bd808 where do I install it?
[21:46:44] please
[21:47:19] paladox: I don't know. There is this really cool search engine called google... :)
[21:47:48] bd808, yep, I used that but I can't remember what I searched for that showed me the path to java.
[21:56:46] bd808 you install it in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/security
[22:00:21] bd808 I installed the extension and still the same thing http://gerrit-jenkins.wmflabs.org/computer/Test/log
[22:00:58] http://gerrit-jenkins.wmflabs.org/job/test/17/console
[22:08:06] paladox: from that log, "Offering RSA public key: /var/lib/jenkins/.ssh/id_rsa" is jenkins sending the key to the server. It gets rejected, but you will need to look in the auth.log to figure out why.
[22:08:59] how did you set up the user that is being used to connect?
[22:16:39] bd808 well I used the package from jenkins apt-get.
[22:17:12] that's not going to set up a user account that can ssh into the machine
[22:17:35] assuming the keys need to match up, how would that work? I think there is specific CI puppet stuff for VMs in this space now
[22:17:42] i.e. VMs that get ssh'd into to run CI logic
[22:17:48] you need a user account that exists in Labs LDAP and has a key configured
[22:23:54] !log deployment-prep Installed netpbm on all deployment-mediawiki* hosts to fix ProofreadPage thumbnailing. I wonder if we should include the puppet mediawiki::packages::multimedia class on these hosts really
[22:23:54] Please !log in #wikimedia-releng for beta cluster SAL
[22:24:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL, Master
[22:28:55] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400629 (ori) I ran it against labsdb1001 and labsdb1003 and it seems to have done the trick. labsdb1002's MySQ...
[22:34:42] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400632 (MaxSem) Confirmed working. Let's keep this bug open until labsdb1002 also gets fixed.
[22:36:57] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400639 (Krenair) Thank you @ori! I believe the issues with labsdb1002 are {T126946}
[23:00:10] (PS1) Lokal Profil: Add wikidata connection to monuments_all and Qid tester to updater [labs/tools/heritage] - https://gerrit.wikimedia.org/r/295594 (https://phabricator.wikimedia.org/T55808)
[23:00:36] Labs: tools-worker-1011 has no working user accounts?! - https://phabricator.wikimedia.org/T138447#2400655 (yuvipanda)
[23:11:59] PROBLEM - Host latestimagetest101 is DOWN: CRITICAL - Host Unreachable (10.68.22.143)
[23:13:36] Labs, Tool-Labs, Tracking: Tool Labs users missing replica.my.cnf (tracking) - https://phabricator.wikimedia.org/T135931#2400690 (scfc)
[23:13:38] Labs, Tool-Labs: /home/basvb missing replica.my.cnf - https://phabricator.wikimedia.org/T138416#2400687 (scfc) Open>Resolved a:chasemp ``` scfc@tools-bastion-03:~$ sudo sudo -iu basvb sql enwiki_p Reading table information for completion of table and column names You can turn off this feature...
[23:28:15] Labs, Labs-Infrastructure, DBA: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (ori)
[23:29:53] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400771 (ori) Filed {T138450} for tracking issues with the script.
[23:30:27] Labs, Labs-Infrastructure, DBA: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (ori)
[23:30:31] Labs, Labs-Infrastructure, DBA, Operations, and 2 others: adywiki and jamwiki are missing the associated *_p databases with appropriate views - https://phabricator.wikimedia.org/T135029#2400777 (ori)
[23:30:57] Labs, Labs-Infrastructure, DBA, Blocked-on-Operations: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (ori)
[23:43:45] Labs, Labs-Infrastructure, DBA, Blocked-on-Operations: maintain-replicas.pl unmaintained, unmaintainable - https://phabricator.wikimedia.org/T138450#2400728 (Krenair) I wouldn't call this strictly blocked-on-ops yet (since although ops would have to approve the code, the next step is writing it a...