[07:40:34] petan: ryan upgraded the cluster overnight :)
[07:40:36] upgraded aha
[07:40:37] :D
[07:40:37] !ping
[07:40:38] pong
[07:40:38] you must have missed the labs-l post ;)
[07:40:40] I just came to office :P
[07:40:41] will check my personal mail soon
[07:41:22] !beta restarted apache process on Apaches boxes
[07:41:22] !log deployment-prep restarted apache process on Apaches boxes
[07:41:25] Logged the message, Master
[07:41:40] let me know if you run into bugs
[07:41:44] it's freshly upgraded
[07:42:38] deployment-prep likely needs to be booted in the proper order
[07:42:53] would really be nice to kill off the boot order dependencies
[07:43:38] do you have any dependency in mind?
[07:43:54] some of the instances fail because there's an nfs mount in the fstab
[07:44:06] ahh
[07:44:32] when I migrated from our NFS instance to /data/project, I did not have the mount ensure=>absent snippets in puppet :(
[07:44:58] heh
[07:45:25] RECOVERY HTTP is now: OK on deployment-apache33 i-0000031b output: HTTP OK: HTTP/1.1 200 OK - 27256 bytes in 0.087 second response time
[07:46:22] nagios works? o.o
[07:46:26] that's news
[07:46:34] heh
[07:46:38] who fixed it
[07:46:44] <-
[07:46:47] yay
[07:46:50] how?
[07:46:57] petan: still broken : http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=deployment-prep&style=detail
[07:47:04] nrpe had the wrong ip on a number of instances
[07:47:08] ooh
[07:47:25] though yeah, still has issues
[07:47:27] not sure why
[07:47:36] (No output returned from plugin)
[07:47:39] hm
[07:47:40] not sure why it's doing that
[07:47:42] that's weird
[07:47:50] haven't really investigated much, though
[07:47:58] is apache puppetized on that instance?
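For context on the NRPE fix mentioned above ("nrpe had the wrong ip on a number of instances"): NRPE refuses connections from any host not listed in its `allowed_hosts` directive, so an instance whose config carries the wrong monitoring-server address fails its remote checks. A sketch of the relevant fragment — the 10.0.0.5 address here is purely illustrative, not the actual labs Nagios server:

```
# /etc/nagios/nrpe_local.cfg — local overrides for nrpe.cfg
# Comma-separated list of hosts allowed to talk to this NRPE daemon.
# 127.0.0.1 permits local testing; 10.0.0.5 stands in for the
# (hypothetical) Nagios server address. A stale or wrong IP here is
# the kind of misconfiguration being fixed in the conversation above.
allowed_hosts=127.0.0.1,10.0.0.5
```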
[07:47:58] we need to have a bug for that :P
[07:48:05] because / needs to redirect
[07:48:06] surely not
[07:48:15] I fixed it and somehow that disappeared
[07:48:28] there is a lot of non puppetized stuff
[07:48:34] * Ryan_Lane nods
[07:48:50] !beta migrated Apaches boxes from applicationserver::labs to role::applicationserver
[07:48:50] !log deployment-prep migrated Apaches boxes from applicationserver::labs to role::applicationserver
[07:48:52] Logged the message, Master
[07:49:00] I need to fix some of my scripts
[07:49:05] I'm going to do that tomorrow
[07:49:13] instance creation, for instance, works
[07:49:14] but...
[07:49:15] PROBLEM Puppet freshness is now: CRITICAL on robh-spl i-00000369 output: (Return code of 127 is out of bounds - plugin may be missing)
[07:49:25] home directories aren't being created
[07:49:39] keys should still be updated, though
[07:49:56] and gluster shares aren't being updated either
[07:50:19] both are easy fixes, though :)
[07:51:02] allowed_hosts=127.0.0.1
[07:51:08] nrpe_local.cfg
[07:51:46] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118)
[07:51:46] PROBLEM host: gerrit-puppet-andrewhaul is DOWN address: i-000003c8 CRITICAL - Host Unreachable (i-000003c8)
[07:51:46] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2)
[07:52:06] PROBLEM host: ve-nodejs is DOWN address: i-00000245 CRITICAL - Host Unreachable (i-00000245)
[07:52:14] ok. i need to go to bed
[07:52:15] * Ryan_Lane waves
[07:52:16] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd)
[07:52:16] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0)
[07:52:16] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263)
[07:52:26] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1)
[07:52:32] Ryan_Lane: have a good night :)
[07:52:47] thanks. see you guys tomorrow
[07:52:56] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2)
[07:52:56] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8)
[07:55:26] RECOVERY HTTP is now: OK on deployment-apache32 i-0000031a output: HTTP OK: HTTP/1.1 200 OK - 27256 bytes in 0.085 second response time
[07:57:16] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae)
[07:58:56] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2)
[08:19:16] PROBLEM Puppet freshness is now: CRITICAL on php-packaging i-000003ae output: (Return code of 127 is out of bounds - plugin may be missing)
[08:22:16] PROBLEM Puppet freshness is now: CRITICAL on mobile-sphinx i-00000364 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:22:16] PROBLEM Puppet freshness is now: CRITICAL on syslogcol-ac i-00000362 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:16] PROBLEM Puppet freshness is now: CRITICAL on cvn-apache2 i-00000339 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache32 i-0000031a output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache33 i-0000031b output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-cache-bits02 i-0000031c output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-jobrunner06 i-0000031d output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:17] PROBLEM Puppet freshness is now: CRITICAL on greensmw1 i-0000032c output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:17] PROBLEM Puppet freshness is now: CRITICAL on incubator-bot3 i-00000340 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:18] PROBLEM Puppet freshness is now: CRITICAL on jesusaurus-cleanup i-0000038a output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:18] PROBLEM Puppet freshness is now: CRITICAL on pdbhandler-1 i-0000030e output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:19] PROBLEM Puppet freshness is now: CRITICAL on signwriting-ase10 i-00000322 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:19] PROBLEM Puppet freshness is now: CRITICAL on signwriting-ase9 i-00000316 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:20] PROBLEM Puppet freshness is now: CRITICAL on sultest1 i-0000032d output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:20] PROBLEM Puppet freshness is now: CRITICAL on sultest2 i-00000330 output: (Return code of 127 is out of bounds - plugin may be missing)
[08:29:21] PROBLEM Puppet freshness is now: CRITICAL on sultestdb i-0000032f output: (Return code of 127 is out of bounds - plugin may be missing)
[09:27:16] PROBLEM Puppet freshness is now: CRITICAL on dumps-1 i-00000355 output: (Return code of 127 is out of bounds - plugin may be missing)
[09:52:16] PROBLEM Puppet freshness is now: CRITICAL on mediawiki-dev-1 i-0000039c output: (Return code of 127 is out of bounds - plugin may be missing)
[10:12:16] PROBLEM Puppet freshness is now: CRITICAL on dumps-incr i-0000035d output: (Return code of 127 is out of bounds - plugin may be missing)
[10:12:16] PROBLEM Puppet freshness is now: CRITICAL on translation-memory-3 i-00000358 output: (Return code of 127 is out of bounds - plugin may be missing)
[10:12:25] hashar
[10:12:40] meh
[10:14:19] !log deployment-prep root: test
[10:14:21] Logged the message, Master
[10:20:09] !log deployment-prep root: rebooting bastion
[10:20:10] Logged the message, Master
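The boot-order failures petan described earlier (instances hanging because a stale NFS mount is still in the fstab after the migration to /data/project) are exactly what the "mount ensure=>absent snippets" he mentions would clean up. A hedged sketch with a hypothetical mount point — the real manifests and paths are not quoted in this log:

```
# Hypothetical cleanup snippet: after migrating off the NFS instance,
# a resource like this unmounts the stale share and removes its fstab
# entry, so instances no longer hang at boot waiting on a dead NFS server.
mount { '/home':
    ensure => absent,   # unmount if mounted, and delete the fstab line
}
```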
[10:44:16] PROBLEM Puppet freshness is now: CRITICAL on hildisvini i-000003ac output: (Return code of 127 is out of bounds - plugin may be missing)
[10:59:16] PROBLEM Puppet freshness is now: CRITICAL on gerrit-db i-0000038b output: (Return code of 127 is out of bounds - plugin may be missing)
[10:59:16] PROBLEM Puppet freshness is now: CRITICAL on wikiminiatlas i-0000038c output: (Return code of 127 is out of bounds - plugin may be missing)
[11:01:16] PROBLEM Puppet freshness is now: CRITICAL on gerrit-build i-00000387 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:01:16] PROBLEM Puppet freshness is now: CRITICAL on puppet-abogott i-00000389 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:23:16] PROBLEM Puppet freshness is now: CRITICAL on apachemxetc i-00000348 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:23:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-cache-upload03 i-0000034b output: (Return code of 127 is out of bounds - plugin may be missing)
[11:23:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-integration i-0000034a output: (Return code of 127 is out of bounds - plugin may be missing)
[11:23:16] PROBLEM Puppet freshness is now: CRITICAL on extrev1 i-00000346 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:23:16] PROBLEM Puppet freshness is now: CRITICAL on rocsteady-cleanup i-00000349 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:37:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator-test1 i-000002bf output: (Return code of 127 is out of bounds - plugin may be missing)
[11:37:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator1 i-0000010c output: (Return code of 127 is out of bounds - plugin may be missing)
[11:37:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator2 i-000002c0 output: (Return code of 127 is out of bounds - plugin may be missing)
[11:38:16] PROBLEM Puppet freshness is now: CRITICAL on asher1 i-0000003a output: (Return code of 127 is out of bounds - plugin may be missing)
[11:38:16] PROBLEM Puppet freshness is now: CRITICAL on bastion-restricted1 i-0000019b output: (Return code of 127 is out of bounds - plugin may be missing)
[11:38:16] PROBLEM Puppet freshness is now: CRITICAL on bastion1 i-000000ba output: (Return code of 127 is out of bounds - plugin may be missing)
[11:38:16] PROBLEM Puppet freshness is now: CRITICAL on bob i-0000012d output: (Return code of 127 is out of bounds - plugin may be missing)
[11:38:16] PROBLEM Puppet freshness is now: CRITICAL on bots-1 i-000000a9 output: (Return code of 127 is out of bounds - plugin may be missing)
[14:47:07] <^demon> andrewbogott: So, on a fresh install I'm getting mostly a success, barring http://p.defau.lt/?ItbWWAn87WYxjnsMoctl0g
[14:47:39] ^demon: Yeah, I was wondering what to do about that private reference, just when labs stopped working yesterday.
[14:47:48] We don't need the private key for labs/gerrit do we?
[14:47:59] <^demon> Yes we will.
[14:48:09] <^demon> So we can test replication.
[14:48:51] <^demon> We can generate a separate one for labs
[14:52:32] So, hm, I think I don't know how puppet/private is handled in labs at all. Are labs instances ever allowed to access it?
[14:52:54] <^demon> They use a different repo, labs/private, which is in gerrit.
[14:54:19] Oh, that should be straightforward then. We just need to move that reference into the role so it can be switched accordingly...
[14:54:32] I am moderately distracted atm, but am happy to look at that in a bit.
[14:55:48] <^demon> The private key can sit in the same path, iirc. Puppet on labs resolves that to the same repo.
[14:55:53] <^demon> Need to double check though
[14:56:04] <^demon> Only public key will need to change in gerrit.pp
[14:58:12] !log deployment-prep petrb: fixed mounts
[14:58:14] Logged the message, Master
[14:58:22] !log deployment-prep petrb: we have a new bastion :D
[14:58:23] Logged the message, Master
[14:58:44] <^demon> Subsequent runs seem to make the page-bkg.jpg error disappear and it looks sane.
[14:58:48] <^demon> Just some weird error on a first run
[14:58:55] * andrewbogott nods
[15:00:24] <^demon> The service error is a problem with how java's installed.
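One way to "move that reference into the role so it can be switched accordingly", as discussed above, is to select the key source per realm — Wikimedia's puppet distinguishes labs from production via `$::realm`. This is only a sketch; the class name, file paths, and variable names are hypothetical, not the actual contents of gerrit.pp:

```
# Illustrative only: pick the replication key from labs/private on labs
# and from the production private repo otherwise. All names below are
# made up for the example.
class role::gerrit::replicationkey {
    $key_source = $::realm ? {
        'labs'  => 'puppet:///private/gerrit/replication_key_labs',
        default => 'puppet:///private/gerrit/replication_key',
    }

    file { '/var/lib/gerrit2/.ssh/id_rsa':
        ensure => present,
        owner  => 'gerrit2',
        mode   => '0600',
        source => $key_source,
    }
}
```

Note that ^demon's point above is the simpler alternative: because puppet on labs resolves the private path to labs/private, the same reference can work in both realms with no conditional at all.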
[15:00:26] <^demon> "Cannot find a JRE or JDK. Please set JAVA_HOME to a >=1.6 JRE"
[15:00:46] <^demon> Prolly cuz we're using openjdk...I had this same problem.
[15:00:55] <^demon> Oh, how did I fix it...
[15:03:39] !log deployment-prep petrb: updated puppet
[15:03:40] Logged the message, Master
[15:30:21] <^demon> andrewbogott: Argh, it's because java's installed in different places on labs & production.
[15:30:29] <^demon> So the production config is breaking labs
[15:30:52] Hm, is there any good reason not to just normalize the labs install so it's the same as production?
[15:34:45] <^demon> Yeah, something needs fixing either in prod or labs
[15:34:53] <^demon> No need to have different config here.
[15:36:20] Well, maybe this is a way we can parallelize? Want me to work on refactoring java whilst you continue with gerrit?
[15:37:34] <^demon> Yeah, if you'd take a look at java I'll take a look at the ssh key.
[15:42:10] 'k
[15:56:36] ^demon: Did you have to build a new instance because your old one wouldn't come up after the upgrade? Or were you just sprucing up?
[15:57:33] <^demon> Been sprucing up every so often.
[15:57:54] <^demon> Running it repetitively in various states of brokenness ends up in a very confused puppet.
[16:01:05] Mine seems to've perished in the deluge.
[16:02:00] <^demon> I just deleted gerrit-puppet-overhaul and gerrit-puppet-overhaulz
[16:02:06] <^demon> I didn't touch the andrew one
[16:02:25] I deleted it because I couldn't reach it. Nothing was there that isn't in gerrit anyway.
[16:16:18] Hm. ^demon, have you built an instance since yesterday?
[16:16:35] <^demon> One this morning.
[16:16:53] Did /home get mounted correctly?
[16:17:12] <^demon> As far as I know.
[16:17:15] <^demon> Haven't had any problems
[16:17:55] hm
[16:18:45] <^demon> I fixed the apache restart problem. Wrapped the SSL commands in
[16:18:49] <^demon> PS30 ^
[16:19:51] <^demon> Oh, ServerName is still probably wrong :\
[16:25:08] <^demon> And 31 fixes $url.
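The "explicit directive" that later gets removed from gerrit.config is Gerrit's `container.javaHome` setting: when it pins production's JVM path, the labs init script fails with the JAVA_HOME error quoted above. A sketch of the fragment in question — the user and path here are illustrative, not the deployed values:

```
# gerrit.config — the [container] section can pin the JVM location.
# Hard-coding a production path breaks hosts where openjdk lands in a
# different /usr/lib/jvm/ directory; omitting javaHome entirely lets
# /etc/init.d/gerrit discover the JVM on its own, which is the fix
# settled on later in this log.
[container]
    user = gerrit2
    # javaHome = /usr/lib/jvm/java-6-openjdk/jre   <- directive to drop
```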
[16:28:55] And I still don't have an instance to work on… I'm going to vanish briefly while I wait for my latest attempt to start up.
[16:36:37] <^demon> Other than /home missing, I'm not having any problems with the new instance.
[16:36:38] <^demon> Hmm
[16:55:07] <^demon> Whoo, fixed private key :)
[17:01:02] * Damianz pokes petan
[17:01:22] :P
[17:01:35] o.0
[17:02:07] petan: Any chance of getting added to the nagios project to a) fix the command syntax on some checks to stop it spamming and possibly b) somewhat puppetize it
[17:02:18] ok
[17:21:26] <^demon> andrewbogott: Well, I fixed the key issue, but somewhere I introduced a syntax error that I'm just not seeing.
[17:21:51] ^demon: OK, I'll look. btw, does it turn out that $HOME is broken for you, same as it was for me?
[17:23:11] <^demon> On your new instance, yes.
[17:23:16] <^demon> Works on the one I made earlier today though
[17:33:18] <^demon> Time to play spot the syntax error everyone :)
[17:33:24] <^demon> https://gerrit.wikimedia.org/r/#/c/13484/31..34/manifests/gerrit.pp -- where is the mistake?
[17:33:29] You're missing some commas, for starters...
[17:33:40] <^demon> I don't see it.
[17:33:53] lines 25 and 26
[17:33:57] But there's something else as well
[17:34:17] Oh, wait, hang on --
[17:34:23] the commans are missing in the role class, not the base class.
[17:34:26] commas
[17:36:22] Actually, I think that's it… I'm getting past that now.
[17:36:51] <^demon> Oh whoops, missed the role.
[17:36:53] <^demon> in the path
[17:36:58] Dammit, I got all set up yesterday to use a local puppet repo, and now I don't have a homedir anymore
[17:38:07] <^demon> Extra quote as well. PS36 :p
[17:38:32] So where/when is puppet installing java?
[17:39:12] <^demon> In gerrit::jetty we install openjdk-jre
[17:41:02] <^demon> Which is installing it in /usr/lib/jvm/something
[17:41:09] <^demon> A different something from production
[17:43:28] <^demon> Actually, easiest way I think is just removing the explicit directive in gerrit.config.
[17:43:39] <^demon> /etc/init.d/gerrit is smart enough to find it
[17:52:16] andrewbogott: can you help out Steve Slevinski?
[17:52:27] he's having issues with mysql and the data directory
[17:52:54] sure; what channel is he in?
[17:53:03] no clue :)
[17:53:07] right here
[17:53:09] ah
[17:53:10] sweet
[17:53:35] Ryan_Lane: Are you interested in hearing about post-upgrade things that don't work, or do you already have a long list?
[17:53:46] I'd like to know them
[17:53:50] bugs would be great
[17:54:20] adding bugs to bugzilla would be great, I don't actively want bugs :)
[17:54:24] Damianz log changes
[17:54:24] I have an instance (i-00000245) that is in 'shutoff' state since yesterday noon - how can I start it again?
[17:54:36] gwicke: reboot it
[17:54:46] if it doesn't come up I'll take a look at it
[17:54:49] tried that without success
[17:54:52] ok
[17:54:55] instance id?
[17:54:56] ^demon: Does removing that config line work? Does that mean that fixing puppet is temporarily moot?
[17:55:09] Ryan_Lane: i-00000245
[17:55:32] <^demon> andrewbogott: Yeah, I think it'll work
[17:56:12] slevinski: I'm multi-tasking, but… what's the problem? (Or, is there an email somewhere in my inbox that already describes it?)
[17:57:27] mysql was looking in /mnt/mysql for the database, but the database was in /var/lib/mysql
[17:58:34] slevinski: What puppet class are you using, and when did you last try?
[17:59:34] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000322
[18:00:38] I tried to manually fix it. Moved the /var/lib/mysql to /mnt/mysql and tried to fix the security rights
[18:01:02] slevinski: I cannot, by default, see into your instance. What puppet class?
[18:01:21] role::mediawiki-install::labs
[18:01:34] And, you first set it up a month or so ago?
[18:01:41] yes
[18:02:37] slevinski: I believe that I've fixed that particular bug, although that class is still not especially stable. If you aren't invested in that instance you could start a fresh one. Otherwise if you want to add me to your project as an admin I can go in and see about fixing it by hand.
[18:03:11] gwicke: ok. it's up
[18:03:20] Another option is that you could just remove both directories, uninstall mysql, and do a new puppet run. That /might/ fix things.
[18:03:24] Or it might break things worse.
[18:03:25] Ryan_Lane: thanks!
[18:03:38] when I did a reboot of all instances, it was already in the "rebooting" state, so it didn't happen
[18:03:49] The database has some valuable information I'd like to keep if possible.
[18:04:05] ah, that was probably because I had tried to reboot it yesterday after it went down
[18:04:08] <^demon> andrewbogott: I think I got it working beginning to end :D
[18:04:17] I tried to get mysqldump to work, but it couldn't find the data
[18:04:19] <^demon> I'll fire up a fresh instance (and pray)
[18:04:28] ^demon: Right on! Does it actually do something useful, or just not throw errors?
[18:04:49] slevinski: Ah, so the instance was up and working and then subsequently broke?
[18:04:50] <^demon> Just not throw any errors :)
[18:04:56] <^demon> Might still be some issues to iron out
[18:06:16] It was broke. I managed to get it up and running. I tried to sqldump the data but failed. Rebooted the instances and it's broke again.
[18:06:41] Could not chdir to home directory /home/damian: No such file or directory interesting
[18:07:06] Ryan_Lane: https://bugzilla.wikimedia.org/show_bug.cgi?id=39741
[18:07:10] gwicke: ah, yeah. likely
[18:07:26] andrewbogott: ah. yep. knew about this one
[18:07:48] this is because the script for this needs to be modified for the new project DIT in LDAP
[18:07:56] same with gluster
[18:08:09] I'll be taking care of that today
[18:08:45] Ryan_Lane: OK! I figured you were on top of it.
[18:09:35] slevinski: Do you mind if I add myself to your project, temporarily?
[18:09:48] I do not mind.
[18:11:10] <^demon> andrewbogott: Still got one failure, but it's because gerrit is stupid and explodes if you configure one thing wrong :p
[18:11:15] <^demon> I'm gonna fix that upstream.
[18:12:10] ok… slevinski, I have added the puppet var 'mysql_datadir' to your project. If you change the value of that setting to point at the directory where your actual sql data is and rerun puppet, it may fix the problem.
[18:12:22] Want to give that a try? If it fails then I'll log into the instance directly.
[18:12:33] Thanks
[18:13:16] slevinski: Sorry that we broke your system… it was due to an obvious rearrangement in the sql code that had a typo in it. You just happened to catch a puppet update on a bad day.
[18:13:35] (by 'obvious' I mean 'shouldn't have broken anything')
[18:19:49] !log nagios chmod 644 /etc/nagios3/resource.cfg so nagios can read it on reload.
[18:19:52] Logged the message, Master
[18:20:34] slevinski: Any luck?
[18:20:50] !log nagios Copied /etc/nagios3/conf.d to /etc/nagios3/conf.d.backup and sed -i 's/check_nrpe/check_nrpe_1arg/g' /etc/nagios3/conf.d/* to fix nrpe checks, need to check the parser.
[18:20:51] Logged the message, Master
[18:22:23] ^demon: this patchset is making me die of laughter https://gerrit.wikimedia.org/r/#/c/13484/
[18:22:25] :)
[18:22:37] <^demon> Well it's almost done now :)
[18:22:38] err
[18:22:39] the change
[18:22:46] 37 patchsets! :D
[18:22:55] is this the record?
[18:22:57] How many times did you have to rebase that heh
[18:23:01] thats my lucky number!
[18:23:04] <^demon> Not many rebases.
[18:23:07] this must be the record
[18:23:16] 1337 problems!
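The 'mysql_datadir' puppet variable andrewbogott added above is a per-project global that a class can consume with a guarded default. A sketch only — the actual role::mediawiki-install::labs code is not quoted in this log, and the resource names below are made up:

```
# Hypothetical consumption of the per-project mysql_datadir variable:
# fall back to the stock Debian location when a project doesn't set it,
# so instances like slevinski's can point mysqld back at the directory
# that actually holds their data.
$datadir = $::mysql_datadir ? {
    undef   => '/var/lib/mysql',
    default => $::mysql_datadir,
}

file { '/etc/mysql/conf.d/datadir.cnf':
    content => "[mysqld]\ndatadir = ${datadir}\n",
    notify  => Service['mysql'],
}
```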
[18:23:18] <^demon> It would've been less if Ryan had just reviewed it 3 weeks ago when I asked ;-)
[18:23:20] So calling that
[18:23:23] we need a change hall of fame
[18:23:35] ^demon: can you add that to the interface? changes with the most patchsets ever?
[18:23:37] Can I be listed for "most inline comments (75)"? ;)
[18:23:39] :)
[18:23:44] RoanKattouw: hahaha
[18:23:45] indeed
[18:23:50] Not sure how to update the variable. From the manage puppet groups page, when I click the modify link next to the variable, the new special page doesn't have any way to change the value.
[18:23:54] mine would be in the hall of fame in that situation too
[18:23:57] <^demon> I've got about 80 other things to do.
[18:24:02] <^demon> None of which involve making a hall of fame.
[18:24:05] ^demon: :)
[18:24:20] slevinski: https://labsconsole.wikimedia.org/w/index.php?title=Special:NovaInstance&action=configure&project=signwriting&instanceid=4732cf8f-4f1b-463b-990d-6e8793239d07&region=pmtpa
[18:24:40] * Damianz thinks petan needs to lern to love python
[18:24:43] RoanKattouw: hey, thankfully, I'm not on the openstack api, and there won't ever be a change that large to the extension ever again
[18:24:46] *now
[18:25:12] Did you actually ever get it all reviewed or just merge the last bits? :P
[18:25:19] <^demon> Ryan_Lane: Also, if you could put your gerrit package in ops/debs/gerrit, that'd be super-cool ;-)
[18:25:29] I didn't get the minor fixes changes
[18:25:33] err
[18:25:39] I didn't get the minor fixes reviewed
[18:25:45] because I forgot about it
[18:25:46] slevinski: It's a bit confusing, I think you were on the page to modify puppet options rather than to apply specific puppet options to an instance.
[18:26:00] yes, you are correct.
[18:26:07] ^demon: my package is garbage
[18:26:15] <^demon> Mehhhh :\
[18:26:20] ^demon: I can put it there, but you'll likely need to put some work into it
[18:26:23] <^demon> I know that.
[18:26:26] right now it just wraps the war [18:26:33] <^demon> I need to remove gitweb dependency, for one. [18:26:37] <^demon> (gerrit does not require gitweb) [18:27:06] ah ok [18:27:10] <^demon> Also, I want init to run automatically on upgrade. It runs on install, so upgrade makes sense too. [18:27:16] hm [18:27:23] where do I even have the package? [18:27:31] <^demon> Good question :) [18:27:35] * Damianz goes to make food while nagios catches up. [18:28:29] ^demon: So, you are off and running, and I should now find something else to do, right? [18:28:41] <^demon> Yeah, I think I'm good now. [18:28:45] <^demon> Thanks for all your help :) [18:28:55] ^demon: I probably want to rearrange and squash a few more global references, but it'll be safest to do that once things are working and merged... [18:29:04] (Because then it'll be easier to tell what I broke.) [18:29:35] ^demon: I'm always happy to help, although I'm not convinced that I did, exactly, in this case :( [18:29:49] But I learned more puppetese. [18:30:29] * andrewbogott only contributed three or four of those 38 patch versions. Barely counts. [18:31:30] Question; are custom debs that are not in gerrit around as source packages anywhere? (if debs so such things like srpm's). [18:35:52] Damianz: some .deb might still be in subversion [18:36:21] Damianz: https://svn.wikimedia.org/viewvc/mediawiki/trunk/debs/ [18:36:42] Damianz: of course some are now in Gerrit and others are no more used and should be deleted [18:37:15] Ah, awesome [18:38:04] !log nagios parser re-breaks configs, removing -a $ARG2$ from check_nrpe now as nothing gets an arg passed anyway. This should be fixed in a better way so we /can/ use args later. [18:38:05] Logged the message, Master [18:40:39] Hmm, graphite debs aren't in there either. Maybe they are just pulled straight from the bug on lp and stuck on the repo. [18:41:03] * jeremyb waves Ryan_Lane. 
just wondering how the timing is for that user creation/rename [18:45:47] RECOVERY dpkg-check is now: OK on wlmpuppet i-0000035c output: All packages OK [18:45:47] RECOVERY dpkg-check is now: OK on wmde-test i-000002ad output: All packages OK [18:45:47] RECOVERY Disk Space is now: OK on apachemxetc i-00000348 output: DISK OK [18:45:48] RECOVERY Free ram is now: OK on bastion-restricted1 i-0000019b output: OK: 89% free memory [18:45:48] RECOVERY Total Processes is now: OK on bastion1 i-000000ba output: PROCS OK: 116 processes [18:45:51] RECOVERY Current Load is now: OK on bots-2 i-0000009c output: OK - load average: 0.00, 0.00, 0.00 [18:45:51] RECOVERY Disk Space is now: OK on bots-3 i-000000e5 output: DISK OK [18:46:01] RECOVERY Free ram is now: OK on demo-mysql1 i-00000256 output: OK: 94% free memory [18:46:01] RECOVERY Disk Space is now: OK on demo-deployment1 i-00000276 output: DISK OK [18:46:02] RECOVERY Current Users is now: OK on deployment-apache33 i-0000031b output: USERS OK - 0 users currently logged in [18:46:02] RECOVERY Total Processes is now: OK on deployment-cache-bits02 i-0000031c output: PROCS OK: 81 processes [18:46:06] RECOVERY Current Users is now: OK on deployment-cache-upload03 i-0000034b output: USERS OK - 0 users currently logged in [18:46:07] RECOVERY Current Users is now: OK on dumps-1 i-00000355 output: USERS OK - 0 users currently logged in [18:46:07] RECOVERY Disk Space is now: OK on embed-sandbox i-000000d1 output: DISK OK [18:46:07] RECOVERY Current Users is now: OK on extrev1 i-00000346 output: USERS OK - 0 users currently logged in [18:46:07] RECOVERY Current Users is now: OK on gerrit i-000000ff output: USERS OK - 0 users currently logged in [18:46:07] RECOVERY Disk Space is now: OK on grail i-000003aa output: DISK OK [18:46:07] RECOVERY Total Processes is now: OK on greensmw1 i-0000032c output: PROCS OK: 91 processes [18:46:12] RECOVERY Current Users is now: OK on hugglewa-1 i-000001e0 output: USERS OK - 0 users currently logged in 
[18:46:12] PROBLEM Current Users is now: CRITICAL on integration-apache1 i-000002eb output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:46:12] RECOVERY dpkg-check is now: OK on incubator-bot2 i-00000252 output: All packages OK [18:46:12] RECOVERY Total Processes is now: OK on incubator-bot1 i-00000251 output: PROCS OK: 95 processes [18:46:17] RECOVERY Disk Space is now: OK on kripke i-00000268 output: DISK OK [18:46:22] PROBLEM Total Processes is now: CRITICAL on mobile-wlm i-000002bc output: Connection refused by host [18:46:27] RECOVERY Disk Space is now: OK on mobile-testing i-00000271 output: DISK OK [18:46:27] RECOVERY dpkg-check is now: OK on mwreview i-000002ae output: All packages OK [18:46:27] RECOVERY Disk Space is now: OK on nova-ldap1 i-000000df output: DISK OK [18:46:27] PROBLEM dpkg-check is now: CRITICAL on orgcharts-dev i-0000018f output: DPKG CRITICAL dpkg reports broken packages [18:46:27] PROBLEM Free ram is now: CRITICAL on psm-precise i-000002f2 output: Connection refused by host [18:46:28] RECOVERY Current Users is now: OK on rds i-00000207 output: USERS OK - 0 users currently logged in [18:46:28] RECOVERY Disk Space is now: OK on pediapress-ocg2 i-00000234 output: DISK OK [18:46:29] RECOVERY dpkg-check is now: OK on puppet-abogott i-00000389 output: All packages OK [18:46:29] RECOVERY Free ram is now: OK on redis1 i-000002b6 output: OK: 93% free memory [18:46:30] RECOVERY Current Load is now: OK on shop-analytics-main i-000001e6 output: OK - load average: 0.00, 0.00, 0.00 [18:46:30] RECOVERY Total Processes is now: OK on search-test i-000000cb output: PROCS OK: 83 processes [18:46:32] RECOVERY Current Users is now: OK on signwriting-ase10 i-00000322 output: USERS OK - 3 users currently logged in [18:46:37] * Damianz apologies for the spam *whistles* [18:46:37] RECOVERY Current Users is now: OK on su-fe2 i-000002e6 output: USERS OK - 0 users currently logged in [18:46:37] RECOVERY dpkg-check is now: OK on su-be3 i-000002e9 
output: All packages OK [18:46:42] RECOVERY Current Users is now: OK on swift-be4 i-000001ca output: USERS OK - 0 users currently logged in [18:46:42] RECOVERY dpkg-check is now: OK on swift-be2 i-000001c8 output: All packages OK [18:46:47] RECOVERY Current Users is now: OK on translation-memory-3 i-00000358 output: USERS OK - 0 users currently logged in [18:46:47] RECOVERY Disk Space is now: OK on testing-singer-puppetization i-00000331 output: DISK OK [18:46:48] RECOVERY Total Processes is now: OK on tutopuppet i-00000336 output: PROCS OK: 80 processes [18:46:52] RECOVERY Total Processes is now: OK on varnish i-000001ac output: PROCS OK: 82 processes [18:46:57] RECOVERY Current Load is now: OK on ve-nodejs i-00000245 output: OK - load average: 0.05, 0.11, 0.10 [18:46:58] RECOVERY Disk Space is now: OK on ve-parsoid3 i-00000345 output: DISK OK [18:46:58] RECOVERY Current Users is now: OK on wikibits-mysql i-00000341 output: USERS OK - 0 users currently logged in [18:46:58] RECOVERY Free ram is now: OK on wikidata-dev-1 i-0000020c output: OK: 90% free memory [18:47:03] RECOVERY Current Load is now: OK on wikiminiatlas i-0000038c output: OK - load average: 0.00, 0.01, 0.05 [18:47:03] RECOVERY Current Users is now: OK on wikisource-web i-000000fe output: USERS OK - 0 users currently logged in [18:47:03] RECOVERY Current Load is now: OK on wmde-test i-000002ad output: OK - load average: 0.00, 0.00, 0.00 [18:47:03] RECOVERY Disk Space is now: OK on zeromq1 i-000002b7 output: DISK OK [18:47:03] RECOVERY Current Load is now: OK on worker1 i-00000208 output: OK - load average: 0.12, 0.07, 0.06 [18:47:03] RECOVERY dpkg-check is now: OK on bastion1 i-000000ba output: All packages OK [18:47:04] RECOVERY Current Users is now: OK on bots-2 i-0000009c output: USERS OK - 0 users currently logged in [18:47:13] RECOVERY Current Load is now: OK on build-precise1 i-00000273 output: OK - load average: 0.06, 0.04, 0.05 [18:47:13] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af 
output: All packages OK [18:47:13] RECOVERY Current Users is now: OK on build1 i-000002b3 output: USERS OK - 1 users currently logged in [18:47:13] RECOVERY Free ram is now: OK on catsort-pub i-000001cc output: OK: 91% free memory [18:47:13] RECOVERY Disk Space is now: OK on building i-0000014d output: DISK OK [18:47:14] RECOVERY Current Load is now: OK on conventionextension-trial i-000003bf output: OK - load average: 0.00, 0.00, 0.00 [18:47:14] RECOVERY Free ram is now: OK on demo-deployment1 i-00000276 output: OK: 88% free memory [18:47:18] RECOVERY Current Users is now: OK on deployment-apache32 i-0000031a output: USERS OK - 0 users currently logged in [18:47:18] RECOVERY dpkg-check is now: OK on deployment-cache-bits02 i-0000031c output: All packages OK [18:47:18] RECOVERY Free ram is now: OK on deployment-sql i-000000d0 output: OK: 37% free memory [18:47:18] RECOVERY Total Processes is now: OK on deployment-squid i-000000dc output: PROCS OK: 85 processes [18:47:23] RECOVERY Current Load is now: OK on dev-solr i-00000152 output: OK - load average: 0.14, 0.06, 0.02 [18:47:23] RECOVERY Free ram is now: OK on embed-sandbox i-000000d1 output: OK: 93% free memory [18:47:23] RECOVERY Total Processes is now: OK on en-wiki-db-lucid i-0000023b output: PROCS OK: 79 processes [18:47:28] RECOVERY Current Load is now: OK on follow01-dev i-000003c6 output: OK - load average: 0.16, 0.10, 0.07 [18:47:33] RECOVERY Current Load is now: OK on gerrit-puppet-overhaulz i-000003d2 output: OK - load average: 0.48, 0.19, 0.16 [18:47:34] RECOVERY Total Processes is now: OK on hume i-000003cc output: PROCS OK: 78 processes [18:47:38] RECOVERY Total Processes is now: OK on incubator-apache i-00000211 output: PROCS OK: 130 processes [18:47:44] RECOVERY Current Users is now: OK on jesusaurus-cleanup i-0000038a output: USERS OK - 0 users currently logged in [18:47:44] RECOVERY Free ram is now: OK on kripke i-00000268 output: OK: 96% free memory [18:47:44] RECOVERY Total Processes is now: OK 
on labs-nfs1 i-0000005d output: PROCS OK: 102 processes [18:47:49] RECOVERY Current Load is now: OK on mailman-01 i-00000235 output: OK - load average: 0.00, 0.00, 0.01 [18:47:49] RECOVERY Total Processes is now: OK on master i-0000007a output: PROCS OK: 95 processes [18:47:54] PROBLEM dpkg-check is now: CRITICAL on mobile-wlm i-000002bc output: Connection refused by host [18:47:54] RECOVERY Disk Space is now: OK on mobile-sphinx i-00000364 output: DISK OK [18:47:54] RECOVERY Free ram is now: OK on mobile-testing i-00000271 output: OK: 97% free memory [18:47:54] RECOVERY Current Load is now: OK on nova-precise1 i-00000236 output: OK - load average: 0.37, 0.25, 0.24 [18:47:54] RECOVERY Current Load is now: OK on otrs-jgreen i-0000015a output: OK - load average: 0.00, 0.00, 0.00 [18:47:55] RECOVERY Free ram is now: OK on patchtest i-000000f1 output: OK: 90% free memory [18:47:55] RECOVERY Free ram is now: OK on pediapress-ocg2 i-00000234 output: OK: 91% free memory [18:47:56] RECOVERY Current Load is now: OK on queue-wiki1 i-000002b8 output: OK - load average: 0.00, 0.00, 0.00 [18:47:56] RECOVERY dpkg-check is now: OK on redis1 i-000002b6 output: All packages OK [18:47:58] RECOVERY Current Load is now: OK on su-fe1 i-000002e5 output: OK - load average: 0.02, 0.02, 0.00 [18:47:58] RECOVERY Disk Space is now: OK on su-fe2 i-000002e6 output: DISK OK [18:47:58] RECOVERY Total Processes is now: OK on su-be2 i-000002e8 output: PROCS OK: 120 processes [18:48:03] PROBLEM dpkg-check is now: CRITICAL on sultest1 i-0000032d output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:48:03] PROBLEM dpkg-check is now: CRITICAL on sultest2 i-00000330 output: DPKG CRITICAL dpkg reports broken packages [18:48:03] PROBLEM Total Processes is now: CRITICAL on swift-be1 i-000001c7 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[18:48:08] RECOVERY Disk Space is now: OK on swift-be4 i-000001ca output: DISK OK [18:48:08] RECOVERY Current Load is now: OK on swift-be3 i-000001c9 output: OK - load average: 0.00, 0.02, 0.00 [18:48:08] RECOVERY Current Load is now: OK on test2 i-0000013c output: OK - load average: 0.00, 0.00, 0.00 [18:48:08] RECOVERY Disk Space is now: OK on translation-memory-3 i-00000358 output: DISK OK [18:48:08] RECOVERY Total Processes is now: OK on venus i-000000ea output: PROCS OK: 91 processes [18:48:13] RECOVERY Current Users is now: OK on wikibits-apache i-00000342 output: USERS OK - 0 users currently logged in [18:48:13] RECOVERY Total Processes is now: OK on scribunto i-0000022c output: PROCS OK: 105 processes [18:48:18] RECOVERY Disk Space is now: OK on wikibits-mysql i-00000341 output: DISK OK [18:48:18] RECOVERY Current Users is now: OK on wikiminiatlas i-0000038c output: USERS OK - 0 users currently logged in [18:48:18] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-2 i-00000259 output: DPKG CRITICAL dpkg reports broken packages [18:48:18] RECOVERY dpkg-check is now: OK on search-test i-000000cb output: All packages OK [18:48:18] RECOVERY Disk Space is now: OK on wikisource-web i-000000fe output: DISK OK [18:48:19] RECOVERY Total Processes is now: OK on wikistream-1 i-0000016e output: PROCS OK: 80 processes [18:48:23] RECOVERY Current Load is now: OK on wlmpuppet i-0000035c output: OK - load average: 0.05, 0.09, 0.08 [18:48:23] RECOVERY Current Users is now: OK on worker1 i-00000208 output: USERS OK - 0 users currently logged in [18:48:23] RECOVERY Current Users is now: OK on aggregator2 i-000002c0 output: USERS OK - 0 users currently logged in [18:48:23] RECOVERY Free ram is now: OK on robh2 i-000001a2 output: OK: 87% free memory [18:48:23] RECOVERY Total Processes is now: OK on bastion-restricted1 i-0000019b output: PROCS OK: 103 processes [18:48:33] RECOVERY Total Processes is now: OK on bots-sql1 i-000000b5 output: PROCS OK: 80 processes [18:48:44] 
RECOVERY Current Load is now: OK on demo-web2 i-00000285 output: OK - load average: 0.00, 0.00, 0.00 [18:48:49] RECOVERY dpkg-check is now: OK on deployment-squid i-000000dc output: All packages OK [18:48:49] RECOVERY Free ram is now: OK on ee-prototype i-0000013d output: OK: 86% free memory [18:48:49] RECOVERY Total Processes is now: OK on dumps-1 i-00000355 output: PROCS OK: 91 processes [18:48:54] RECOVERY Current Users is now: OK on follow01-dev i-000003c6 output: USERS OK - 0 users currently logged in [18:48:54] RECOVERY Free ram is now: OK on gerrit i-000000ff output: OK: 82% free memory [18:48:54] RECOVERY Current Load is now: OK on hildisvini i-000003ac output: OK - load average: 0.08, 0.03, 0.05 [18:48:59] RECOVERY Free ram is now: OK on hugglewa-1 i-000001e0 output: OK: 91% free memory [18:49:09] RECOVERY Disk Space is now: OK on maps-tilemill1 i-00000294 output: DISK OK [18:49:09] RECOVERY Disk Space is now: OK on nginx-ffuqua-doom1-3 i-00000196 output: DISK OK [18:49:09] RECOVERY Current Users is now: OK on nova-precise1 i-00000236 output: USERS OK - 0 users currently logged in [18:49:09] RECOVERY Current Users is now: OK on otrs-jgreen i-0000015a output: USERS OK - 0 users currently logged in [18:49:09] RECOVERY Disk Space is now: OK on pediapress-ocg1 i-00000233 output: DISK OK [18:49:14] RECOVERY Current Users is now: OK on queue-wiki1 i-000002b8 output: USERS OK - 0 users currently logged in [18:49:14] RECOVERY Disk Space is now: OK on robh-spl i-00000369 output: DISK OK [18:49:14] RECOVERY Current Load is now: OK on secondinstance i-0000015b output: OK - load average: 0.00, 0.01, 0.00 [18:49:14] RECOVERY dpkg-check is now: OK on scribunto i-0000022c output: All packages OK [18:49:15] RECOVERY Current Users is now: OK on su-fe1 i-000002e5 output: USERS OK - 0 users currently logged in [18:49:15] RECOVERY dpkg-check is now: OK on su-be2 i-000002e8 output: All packages OK [18:49:15] RECOVERY Current Users is now: OK on swift-be3 i-000001c9 output: 
USERS OK - 0 users currently logged in [18:49:16] RECOVERY Free ram is now: OK on swift-be4 i-000001ca output: OK: 88% free memory [18:49:16] RECOVERY Current Load is now: OK on translation-memory-1 i-0000013a output: OK - load average: 0.00, 0.00, 0.00 [18:49:17] RECOVERY Current Users is now: OK on translation-memory-2 i-000002d9 output: USERS OK - 0 users currently logged in [18:49:19] RECOVERY Current Users is now: OK on tutorial-mysql i-0000028b output: USERS OK - 0 users currently logged in [18:49:20] RECOVERY Free ram is now: OK on ubuntu1-pgehres i-000000fb output: OK: 89% free memory [18:49:20] RECOVERY Disk Space is now: OK on ve-nodejs i-00000245 output: DISK OK [18:49:20] RECOVERY Current Load is now: OK on webserver-lcarr i-00000134 output: OK - load average: 0.00, 0.00, 0.00 [18:49:20] RECOVERY dpkg-check is now: OK on venus i-000000ea output: All packages OK [18:49:32] RECOVERY Free ram is now: OK on build1 i-000002b3 output: OK: 88% free memory [18:49:32] RECOVERY Total Processes is now: OK on catsort-pub i-000001cc output: PROCS OK: 79 processes [18:49:35] Damianz: yeahhhhhh ;-] [18:49:40] RECOVERY Disk Space is now: OK on conventionextension-trial i-000003bf output: DISK OK [18:49:40] RECOVERY Current Load is now: OK on demo-web1 i-00000255 output: OK - load average: 0.00, 0.02, 0.04 [18:49:40] RECOVERY dpkg-check is now: OK on deployment-bastion i-00000390 output: All packages OK [18:49:40] RECOVERY Disk Space is now: OK on deployment-dbdump i-000000d2 output: DISK OK [18:49:40] RECOVERY dpkg-check is now: OK on deployment-cache-upload03 i-0000034b output: All packages OK [18:49:41] RECOVERY Current Users is now: OK on deployment-jobrunner06 i-0000031d output: USERS OK - 0 users currently logged in [18:49:41] RECOVERY Total Processes is now: OK on deployment-sql i-000000d0 output: PROCS OK: 81 processes [18:49:50] RECOVERY Total Processes is now: OK on extrev1 i-00000346 output: PROCS OK: 77 processes [18:49:51] Damianz: I got a few bug reports 
somewhere about that [18:49:55] RECOVERY Current Load is now: OK on firstinstance i-0000013e output: OK - load average: 0.00, 0.00, 0.00 [18:50:00] RECOVERY Disk Space is now: OK on follow01-dev i-000003c6 output: DISK OK [18:50:05] RECOVERY Current Users is now: OK on hildisvini i-000003ac output: USERS OK - 0 users currently logged in [18:50:06] RECOVERY Current Load is now: OK on jawiki-demo i-000003cf output: OK - load average: 0.00, 0.02, 0.05 [18:50:06] PROBLEM dpkg-check is now: CRITICAL on localpuppet2 i-0000029b output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:50:06] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253 output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:50:06] PROBLEM dpkg-check is now: CRITICAL on log1 i-00000239 output: DPKG CRITICAL dpkg reports broken packages [18:50:06] RECOVERY Free ram is now: OK on maps-tilemill1 i-00000294 output: OK: 95% free memory [18:50:06] RECOVERY dpkg-check is now: OK on maps-osmrails i-00000373 output: All packages OK [18:50:20] RECOVERY Current Users is now: OK on nginx-dev1 i-000000f0 output: USERS OK - 0 users currently logged in [18:50:25] RECOVERY Current Load is now: OK on nova-osm-keystone i-00000359 output: OK - load average: 0.01, 0.03, 0.05 [18:50:25] RECOVERY Current Load is now: OK on orgcharts-dev i-0000018f output: OK - load average: 0.00, 0.01, 0.05 [18:50:25] RECOVERY Current Load is now: OK on pdbhandler-1 i-0000030e output: OK - load average: 0.01, 0.08, 0.07 [18:50:25] PROBLEM Total Processes is now: CRITICAL on psm-precise i-000002f2 output: Connection refused by host [18:50:30] RECOVERY dpkg-check is now: OK on pediapress-ocg2 i-00000234 output: All packages OK [18:50:30] RECOVERY Current Users is now: OK on reportcard2 i-000001ea output: USERS OK - 0 users currently logged in [18:50:30] RECOVERY Disk Space is now: OK on resourceloader2-apache i-000001d7 output: DISK OK [18:50:30] RECOVERY Total Processes is now: OK on robh2 i-000001a2 output: 
PROCS OK: 88 processes [18:50:35] RECOVERY Current Load is now: OK on search-test i-000000cb output: OK - load average: 0.00, 0.00, 0.00 [18:50:35] RECOVERY Current Users is now: OK on simplewikt i-00000149 output: USERS OK - 0 users currently logged in [18:50:35] RECOVERY Disk Space is now: OK on solr-ci i-00000391 output: DISK OK [18:50:35] RECOVERY Disk Space is now: OK on su-fe1 i-000002e5 output: DISK OK [18:50:35] RECOVERY dpkg-check is now: OK on su-be1 i-000002e7 output: All packages OK [18:50:40] PROBLEM Current Load is now: CRITICAL on sultest1 i-0000032d output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:50:40] RECOVERY Current Users is now: OK on sultest2 i-00000330 output: USERS OK - 0 users currently logged in [18:50:40] RECOVERY Free ram is now: OK on swift-aux1 i-0000024b output: OK: 91% free memory [18:50:40] RECOVERY Total Processes is now: OK on testing-singer-puppetization i-00000331 output: PROCS OK: 80 processes [18:50:45] RECOVERY Current Users is now: OK on translation-memory-1 i-0000013a output: USERS OK - 0 users currently logged in [18:50:45] RECOVERY Current Load is now: OK on tutopuppet i-00000336 output: OK - load average: 0.49, 0.16, 0.09 [18:50:45] RECOVERY Disk Space is now: OK on tutorial-mysql i-0000028b output: DISK OK [18:50:45] RECOVERY Disk Space is now: OK on translation-memory-2 i-000002d9 output: DISK OK [18:50:46] RECOVERY Current Users is now: OK on varnish-precise i-00000311 output: USERS OK - 0 users currently logged in [18:50:46] RECOVERY Current Load is now: OK on vumi i-000001e5 output: OK - load average: 0.01, 0.03, 0.03 [18:50:46] RECOVERY Current Users is now: OK on webserver-lcarr i-00000134 output: USERS OK - 0 users currently logged in [18:50:47] RECOVERY Total Processes is now: OK on ve-parsoid3 i-00000345 output: PROCS OK: 78 processes [18:50:50] RECOVERY Total Processes is now: OK on wikibits-mysql i-00000341 output: PROCS OK: 80 processes [18:50:55] RECOVERY Current Load is now: OK on 
wikidata-dev-2 i-00000259 output: OK - load average: 0.46, 0.31, 0.18 [18:50:56] PROBLEM Current Users is now: CRITICAL on wikidata-dev-3 i-00000225 output: CHECK_NRPE: Error - Could not complete SSL handshake. [18:50:56] RECOVERY Current Load is now: OK on wikiversity-sandbox-frontend i-0000033a output: OK - load average: 0.17, 0.10, 0.07 [18:50:56] RECOVERY dpkg-check is now: OK on wikistats-history-01 i-000002e2 output: All packages OK [18:50:56] RECOVERY Free ram is now: OK on worker1 i-00000208 output: OK: 95% free memory [18:50:56] RECOVERY Total Processes is now: OK on zeromq1 i-000002b7 output: PROCS OK: 78 processes [18:51:32] hashar: Could be interesting - I'm thinking about suggesting having a graphite instance for labs where we can push random stats from each project to it. Could even be interesting pushing nagios perfdata in there for if/when we can support custom checks for people. [18:51:53] Considering only carbon IIRC is packaged in the general repos looking over the source for what's in prod use would be nice [18:54:05] do we need to quiet labs-nagios-wm ? ;-) [18:54:59] jeremyb: naahhh it is finally recovering :-D [18:55:22] jeremyb: It should be ok now, there where like 2-3k of checks due to a bad config [18:55:23] Damianz: that looks better http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=deployment-prep&style=detail [18:55:33] hashar: and will it break again?? 
;) [18:55:36] Damianz: the free ram one still has an issue though [18:55:45] Yeah, looking at that atm - we actually lost the plugin [18:55:59] looks like it is available on some of my instances [18:56:07] deployment-mc reports OK free memory [18:56:10] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [18:56:10] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [18:56:10] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [18:56:10] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [18:56:10] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [18:56:15] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [18:56:32] hmmm [18:56:33] * jeremyb detaches [18:58:00] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [18:58:07] Weird... [19:00:20] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [19:01:30] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [19:03:00] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [19:03:10] PROBLEM Total Processes is now: WARNING on wikistats-01 i-00000042 output: PROCS WARNING: 188 processes [19:09:27] You know what would be cool - having labsconsole changes for the Help namespace relayed in here [19:10:10] PROBLEM host: gerrit-puppet-andrewhauly is DOWN address: i-000003da check_ping: Invalid hostname/address - i-000003da [19:11:23] slevinski: Better? [19:12:01] andrewbogott: You know much about labs puppet or is Ryan a better person? [19:12:26] Damianz: It depends :) [19:12:28] What's up?
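The graphite idea floated above (a labs instance that projects push arbitrary stats into) comes down to carbon's plaintext protocol, which is just `metric value timestamp` lines over TCP port 2003. A minimal sketch; the hostname and metric path are hypothetical, since no such labs instance exists yet:

```shell
#!/bin/sh
# Sketch of pushing a stat to a graphite/carbon listener over the plaintext
# protocol ("metric value timestamp" lines on TCP port 2003). The host
# graphite.wmflabs.org and the metric names are assumptions for illustration.

format_metric() {
    # $1 = metric path, $2 = value, $3 = unix timestamp (defaults to now)
    ts="${3:-$(date +%s)}"
    printf '%s %s %s\n' "$1" "$2" "$ts"
}

send_metric() {
    # Fire one line at a carbon-cache (nc flavour/flags may vary by distro).
    format_metric "$1" "$2" | nc -q1 graphite.wmflabs.org 2003
}
```

Nagios perfdata could be fed through the same one-liner shape, which is what makes the "push perfdata into graphite" idea cheap to try.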
[19:13:41] Damianz: In general Ryan knows more, and I'm more available to actually troubleshoot. [19:14:12] nrpe seems to be handled via puppet on the hosts, but as far as I can see none of them have nagios::monitor or such in ldap as a puppet class. Is that a default for everything or is there something else funky? [19:14:27] Basically I want to know why check_ram.sh (which is in puppet) isn't on some hosts running puppet [19:14:41] Yet nrpe_local.cfg seems to be updated as expected (which is in the same class) [19:14:44] * Damianz confuzzled [19:15:16] Ryan_Lane may be able to answer that question instantly, but in the meantime I will dig in the source. [19:15:38] No huge rush, just making an attempt at making nagios useful heh [19:15:54] Damianz: you know what would be cool? having an IRC feed at all for labsconsole. anywhere. period [19:16:08] Damianz: or pubsubhubbub [19:16:09] jeremyb: That would also be cool. [19:16:13] or XMPP pubsub [19:16:31] * jeremyb runs away [19:16:40] Really I'd love all feeds to go into a huge queue where you can subscribe with history for bots. Like disconnecting from irc for 30min loses shedloads of potentially useful data. [19:17:10] i chatted with krinkle about this at wikimania [19:17:26] we're building something like this for event tracking for e3 experiments (zeromq + redis) [19:17:50] but you'd be surprised how hard it is to come up with a better solution than irc [19:17:58] ZeroMQ is awesome, not totally sure it's useful for this case hmm. Unless you know what endpoints are subscribing I guess. [19:18:00] PROBLEM Free ram is now: CRITICAL on deployment-apache32 i-0000031a output: Connection refused by host [19:18:18] Not working yet. I really messed things up trying to fix it this morning. I'm going to try to put it back the way that it was this morning and retry your idea.
[19:18:57] ZeroMQ could be used in a lot of places where udp is now, for things (like payments) that require a slightly higher guarantee that the logs get to the destination. [19:19:10] slevinski: Ok! I hope I am not making things worse :) [19:19:18] Damianz: wikitech.wikimedia.org/view/Event_tracking [19:20:10] Now that looks useful [19:21:00] PROBLEM Disk Space is now: CRITICAL on deployment-apache32 i-0000031a output: Connection refused by host [19:23:00] Damianz: i hope so! feedback very welcome -- i'm ori@wikimedia.org, in case you don't have a wikitech account [19:23:00] PROBLEM Free ram is now: UNKNOWN on deployment-apache32 i-0000031a output: NRPE: Unable to read output [19:23:05] Damianz: Sorry, reading the backscroll I missed one of your questions. Yes, there is a default for everything. I don't know where it comes from though. [19:23:19] And, you're seeing per-instance differences in that default, which surprises me. [19:24:07] Well some are new instances, others are from back when we ran test - I'd suspect only new ones are missing it. [19:24:21] ori-l: I don't but I'll have a read and let you know if I have any thoughts :) [19:25:29] deployment-apache32 - 10 July 2012 17:35:53 - Missing the plugin [19:25:29] deployment-mc - 23 April 2012 14:02:43 - Has the plugin [19:25:44] For example - I'm just curious as to why it's not getting re-added when nrpe_local gets fixed if you mangle it.
[19:26:00] RECOVERY Disk Space is now: OK on deployment-apache32 i-0000031a output: DISK OK [19:26:10] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [19:26:10] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [19:26:11] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [19:26:20] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [19:26:20] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [19:26:20] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [19:27:20] PROBLEM Puppet freshness is now: CRITICAL on dumps-1 i-00000355 output: (Return code of 127 is out of bounds - plugin may be missing) [19:27:28] Any chance that puppet isn't running at all on the newer instance? [19:28:00] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [19:28:02] I've just force-run it on one after I changed nrpe_local; it fixed nrpe_local.cfg but didn't add the plugin, so I don't think it's that. [19:28:38] Do we have a pastebin around? [19:29:27] http://etherpad.wikimedia.org/9RFl8QljDZ < [19:29:31] Pastebin enough [19:30:16] (the allowed_hosts change was me to check it was actually applying it as I thought) [19:30:30] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [19:32:30] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [19:33:00] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [19:33:43] Ahhh [19:34:12] what happens when you service nagios-nrpe-server stop? [19:34:20] That might be cutting short the party somehow.
[19:35:43] It just exits cleanly, I wonder if it's just including like nrpe::packages and not nagios::monitor - the former just sets up nrpe with some base plugins, the latter adds the free_ram plugin and does some collectfiles stuff for prod from the look of it. [19:36:44] I don't see nagios::monitor getting included anywhere at all. [19:37:53] That would explain why new hosts are broken then [19:38:17] Not sure why https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/nagios.pp;h=0ae31287a726a1a7266ba4d8338eb194c2093bd8;hb=refs/heads/production#l474 that stuff wouldn't be with https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob;f=manifests/nrpe.pp#l73 that stuff [19:38:31] Probably can just move the plugins over and it should work for prod and labs I think [19:38:50] Since nagios::monitor requires nrpe [19:40:10] PROBLEM host: gerrit-puppet-andrewhauly is DOWN address: i-000003da check_ping: Invalid hostname/address - i-000003da [19:41:31] puppetd -tv --noop --debug | grep nrpe shows nrpe being applied but grepping nagios doesn't show ::monitor. Who knew --debug was actually useful [19:43:19] Ah... [19:43:30] Labs stuff is in /usr/lib/nagios/plugins/, prod stuff is in /usr/local/nagios/libexec/ [19:43:34] meh [19:51:37] Thanks for your help andrewbogott, I'll poke puppet to get the file included on labs :) [19:51:56] um… ok.
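The fix sketched above — shipping the plugin from the nrpe class that every host already includes, instead of from nagios::monitor, which nothing on labs includes — would look roughly like this in puppet. The class name, source path, and mode here are illustrative assumptions, not the real operations/puppet values:

```puppet
# Hypothetical sketch: manage check_ram.sh alongside nrpe_local.cfg so that
# any host getting nrpe (labs instances included) also gets the plugin,
# rather than relying on nagios::monitor, which labs never includes.
class nrpe::plugins {
    file { '/usr/lib/nagios/plugins/check_ram.sh':
        owner  => 'root',
        group  => 'root',
        mode   => '0755',
        source => 'puppet:///files/nagios/check_ram.sh',
    }
}
```

Since nagios::monitor already requires nrpe, moving the file resource this direction keeps prod working while fixing the labs path.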
[19:52:13] * andrewbogott sure is getting thanked for not helping a lot today [19:52:20] PROBLEM Puppet freshness is now: CRITICAL on mediawiki-dev-1 i-0000039c output: (Return code of 127 is out of bounds - plugin may be missing) [19:55:09] Hey, ignoring my ranting is sometimes helpful :P [19:56:04] hmm [19:56:13] let's see if Nagios still understands the puppet freshness stuff [19:56:20] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [19:56:20] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [19:56:20] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [19:56:20] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [19:56:20] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [19:56:21] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [19:57:26] Damianz: I am not sure how the puppet freshness works (apparently it's a passive check that waits for some SNMP trap), that is still broken :) [19:57:37] but at least a lot of stuff is working again so thanks for that! [19:57:49] oh [19:58:00] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [19:58:11] and if you have full access to the nagios on labs, you might want to try out updating to icinga, the community fork [19:58:12] I think it's a plugin that does sudo and checks the mtime somewhere, petan wrote a plugin. [19:58:22] Personally I like the snmp version more [19:58:41] oh [19:59:19] Hopefully I'll get some time to puppetize this up and improve it generally. There's some icinga stuff in puppet, not sure if we're heading that direction atm.
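The mtime-style freshness plugin mentioned above could be as simple as comparing the age of puppet's state file against a threshold. A hedged sketch — the state file path and threshold are assumptions, not whatever petan's actual plugin uses:

```shell
#!/bin/sh
# Hedged sketch of an active, mtime-based puppet freshness check, as an
# alternative to the SNMP-trap passive check. The state file path and the
# threshold are illustrative assumptions.

check_puppet_freshness() {
    statefile="$1"   # e.g. /var/lib/puppet/state/state.yaml (assumed)
    threshold="$2"   # max age in seconds before going CRITICAL

    if [ ! -f "$statefile" ]; then
        echo "UNKNOWN: $statefile missing - has puppet ever run?"
        return 3
    fi

    now=$(date +%s)
    mtime=$(stat -c %Y "$statefile")   # GNU stat; use -f %m on BSD
    age=$((now - mtime))

    if [ "$age" -gt "$threshold" ]; then
        echo "CRITICAL: puppet last ran ${age}s ago (threshold ${threshold}s)"
        return 2
    fi
    echo "OK: puppet last ran ${age}s ago"
    return 0
}
```

Dropped into /usr/lib/nagios/plugins/ and wired into nrpe_local.cfg (with sudo if the state file isn't world-readable), this is the "plugin that does sudo and checks the mtime" shape described in the chat.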
[19:59:53] I am pretty sure icinga got set up by LeslieCarr [20:00:06] at least I hinted to her about using Icinga instead of Nagios [20:00:30] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [20:01:42] I'd rather improve the config building stuff so we can include custom checks tbh but yeah it would be interesting. [20:03:00] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [20:03:10] PROBLEM Total Processes is now: CRITICAL on wikistats-01 i-00000042 output: PROCS CRITICAL: 281 processes [20:03:30] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [20:07:39] https://gerrit.wikimedia.org/r/#/c/21822/ I think that fixes it [20:10:10] PROBLEM host: gerrit-puppet-andrewhauly is DOWN address: i-000003da check_ping: Invalid hostname/address - i-000003da [20:11:44] andrewbogott: (if you're bored) do you think that's more merge friendly with an if($::realm == "labs") around it? I don't see why it existing in prod is too bad considering it's in the nrpe file anyway. Not really anything to do with you but heh you're around :P [20:12:20] PROBLEM Puppet freshness is now: CRITICAL on dumps-incr i-0000035d output: (Return code of 127 is out of bounds - plugin may be missing) [20:12:20] PROBLEM Puppet freshness is now: CRITICAL on translation-memory-3 i-00000358 output: (Return code of 127 is out of bounds - plugin may be missing) [20:12:32] Damianz: I'm not sure. Is that component getting installed in prod via a different route? [20:12:51] Or do you think that production regressed in exactly the same way? [20:13:33] nagios::monitor installs it in /usr/local/nagios/libexec/ - not sure if that's for the server rather than nrpe though. [20:14:09] OK. I think this is out of my depth now. Probably best to mark it labs only for the moment… or find someone in the operations channel who knows what's going on in production.
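The realm guard Damianz asks about above would read something like the fragment below. A hedged sketch only: whether production should get the file at all is exactly what is unresolved in this exchange, and the resource details are assumptions.

```puppet
# Sketch: confine the plugin to labs, so production (where nagios::monitor
# installs its copy under /usr/local/nagios/libexec/) is left untouched.
if $::realm == 'labs' {
    file { '/usr/lib/nagios/plugins/check_ram.sh':
        mode   => '0555',
        source => 'puppet:///files/nagios/check_ram.sh',
    }
}
```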
[20:15:09] Hosts I can see in site.pp don't have that check in nagios - I'll poke ops in a while [20:18:31] Hmm, how do you make an inline comment not draft anymore o.0 [20:21:21] Ah, I have to review the change [20:22:49] Yeah [20:23:10] PROBLEM Total Processes is now: WARNING on wikistats-01 i-00000042 output: PROCS WARNING: 188 processes [20:26:20] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [20:26:20] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [20:26:20] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [20:26:20] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [20:26:20] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [20:26:21] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [20:28:00] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [20:28:12] Gitweb is blummin slow [20:29:22] <^demon> Gitweb sucks. [20:30:10] PROBLEM Free ram is now: CRITICAL on nagios-main i-0000030d output: NRPE: Unable to read output [20:30:30] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [20:31:10] PROBLEM Free ram is now: CRITICAL on deployment-cache-bits02 i-0000031c output: NRPE: Unable to read output [20:31:10] PROBLEM Free ram is now: CRITICAL on deployment-bastion i-00000390 output: NRPE: Unable to read output [20:31:57] Don't know about sucks but hell 12 seconds to do a wget of a file [20:32:36] Ryan_Lane: Any reason most of the instances want to do a fsck on the next boot?
Assume you soft restarted them rather than forcefully [20:33:00] PROBLEM Free ram is now: CRITICAL on deployment-apache33 i-0000031b output: NRPE: Unable to read output [20:33:00] PROBLEM Free ram is now: CRITICAL on deployment-apache32 i-0000031a output: NRPE: Unable to read output [20:33:00] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [20:33:06] it's not simple to do a soft shutdown [20:33:14] <^demon> Damianz: It sucks. Gitweb's authors don't know what caching is. [20:33:20] Ah =/ [20:34:30] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [20:35:10] PROBLEM Free ram is now: CRITICAL on deployment-cache-upload04 i-00000357 output: NRPE: Unable to read output [20:35:10] PROBLEM Free ram is now: CRITICAL on deployment-cache-upload03 i-0000034b output: NRPE: Unable to read output [20:35:20] PROBLEM Free ram is now: CRITICAL on bots-cb-dev i-000003d1 output: NRPE: Unable to read output [20:35:20] PROBLEM Free ram is now: CRITICAL on deployment-jobrunner06 i-0000031d output: NRPE: Unable to read output [20:36:10] PROBLEM Free ram is now: CRITICAL on deployment-integration i-0000034a output: NRPE: Unable to read output [20:40:10] RECOVERY Free ram is now: OK on deployment-cache-upload04 i-00000357 output: 204356 [20:40:10] PROBLEM host: gerrit-puppet-andrewhauly is DOWN address: i-000003da check_ping: Invalid hostname/address - i-000003da [20:40:10] RECOVERY Free ram is now: OK on deployment-cache-upload03 i-0000034b output: 333584 [20:40:10] RECOVERY Disk Space is now: OK on nagios 127.0.0.1 output: DISK OK [20:40:20] RECOVERY Free ram is now: OK on bots-cb-dev i-000003d1 output: 177224 [20:41:10] RECOVERY Free ram is now: OK on deployment-cache-bits02 i-0000031c output: 291460 [20:41:10] RECOVERY Free ram is now: OK on deployment-bastion i-00000390 output: 279940 [20:41:21] I don't know if this is a good time, but i just went on the Labs site and i get "No Nova credentials
found for your account." User: Jasonspriggs [20:41:32] Yeah this'll be because of the upgrade [20:41:41] That error message appears sometimes, the workaround is logging out and back in [20:41:43] kk; just wondering :) [20:42:06] Errr... that should be fixed I believe, but you'll need to login again at least once [20:42:07] But a side-effect of the recent upgrade is that everyone is now getting this until they log out and back in [20:42:14] Oh, was it fixed in the upgrade? [20:42:19] Should be [20:42:27] I think Ryan_Lane restarted memcache to clear everyone's login [20:42:42] all is well now; thanks :) [20:42:46] hm [20:42:52] I thought I forced re-auth [20:43:00] RECOVERY Free ram is now: OK on deployment-apache33 i-0000031b output: 676920 [20:43:06] it's weird that it logged you in at all [20:43:28] RoanKattouw: are long-lived tokens stored in the database? [20:43:54] All the instances I can ssh to are fixed free ram wise now [20:44:02] (included beta, hashar) [20:44:05] Damianz: how are you fixing them? [20:44:07] s/ed/ing/ [20:44:13] Ryan_Lane: sudo wget -O /usr/lib/nagios/plugins/check_ram.sh "https://gerrit.wikimedia.org/r/gitweb?p=operations/puppet.git;a=blob_plain;f=files/nagios/check_ram.sh;hb=HEAD"; sudo chmod 0555 /usr/lib/nagios/plugins/check_ram.sh [20:44:16] won't puppet just set that back? [20:44:20] PROBLEM Puppet freshness is now: CRITICAL on hildisvini i-000003ac output: (Return code of 127 is out of bounds - plugin may be missing) [20:44:21] Apparently not [20:44:30] that should get installed via puppet, not wget [20:44:40] yeah, well I have a puppet change in for review :P [20:44:46] link?
[20:44:57] https://gerrit.wikimedia.org/r/#/c/21822/ [20:45:15] See my comment (or scrollback a while ago) for the hmm about labs vs prod [20:45:20] RECOVERY Free ram is now: OK on deployment-jobrunner06 i-0000031d output: 249248 [20:46:10] RECOVERY Free ram is now: OK on deployment-integration i-0000034a output: 290496 [20:47:40] Quick question as I did have an idea for a labs project; has anyone here heard of OpenSim or Second Life? [20:48:07] heard of, never used [20:48:33] yeah. what about them? [20:50:20] kk; cause i remember overhearing during wikimania that online meetings would be useful for like mini-hackathons and things such as that; idk tho if that would be a good idea to try to see if labs would be a good platform to host something like opensim for that purpose; idk if people would be interested though in the project [20:56:20] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [20:56:20] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [20:56:20] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [20:56:20] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [20:56:20] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [20:56:21] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [20:58:10] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [20:58:39] !log nagios re-setup snmpd/snmptt for puppet freshness checks.
Used config from puppet, plugin in /usr/lib/nagios/plugins/eventhandlers [20:58:41] Logged the message, Master [20:59:20] PROBLEM Puppet freshness is now: CRITICAL on gerrit-db i-0000038b output: (Return code of 127 is out of bounds - plugin may be missing) [20:59:20] PROBLEM Puppet freshness is now: CRITICAL on wikiminiatlas i-0000038c output: (Return code of 127 is out of bounds - plugin may be missing) [20:59:57] warning: Could not load fact file /var/lib/puppet/lib/facter/default_interface.rb: ./default_interface.rb:43: syntax error, unexpected kELSE, expecting kEND < puppet's currently throwing that on some servers [21:00:30] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [21:01:20] PROBLEM Puppet freshness is now: CRITICAL on puppet-abogott i-00000389 output: (Return code of 127 is out of bounds - plugin may be missing) [21:03:00] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [21:03:10] PROBLEM Total Processes is now: CRITICAL on wikistats-01 i-00000042 output: PROCS CRITICAL: 281 processes [21:05:20] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [21:08:42] Ryan_Lane: Sorry for the spam I probably just sent you, but if you could look at 21831/21822 when free (heh) it would be awesome. [21:08:51] no worries [21:08:57] I'm sending a bitchy email to the openstack list [21:09:11] Heh [21:09:20] Migration pains related? 
[21:09:36] yes [21:10:16] PROBLEM host: gerrit-puppet-andrewhauly is DOWN address: i-000003da check_ping: Invalid hostname/address - i-000003da [21:14:56] PROBLEM host: signwriting-ase11 is DOWN address: i-000003dc check_ping: Invalid hostname/address - i-000003dc [21:15:30] Change on 12mediawiki a page OAuth/status was modified, changed by Jdforrester (WMF) link https://www.mediawiki.org/w/index.php?diff=576990 edit summary: +2012-08-monthly [21:18:03] not sure if I should be doing this right now, but I created a new instance. Having trouble connecting with ssh: Could not resolve hostname i-000003dc.pmtpa.wmflabs: Name or service not known [21:24:16] PROBLEM Puppet freshness is now: CRITICAL on apachemxetc i-00000348 output: (Return code of 127 is out of bounds - plugin may be missing) [21:24:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-cache-upload03 i-0000034b output: (Return code of 127 is out of bounds - plugin may be missing) [21:24:16] PROBLEM Puppet freshness is now: CRITICAL on deployment-integration i-0000034a output: (Return code of 127 is out of bounds - plugin may be missing) [21:24:16] PROBLEM Puppet freshness is now: CRITICAL on extrev1 i-00000346 output: (Return code of 127 is out of bounds - plugin may be missing) [21:24:16] PROBLEM Puppet freshness is now: CRITICAL on rocsteady-cleanup i-00000349 output: (Return code of 127 is out of bounds - plugin may be missing) [21:24:58] slevinski: Look at the console log - has it finished building? [21:25:09] Usually takes a couple of min, hopefully there should be 0 errors now [21:26:01] Ewww, need to turn my auto add signature off on this thunderbird install. [21:26:15] * Damianz stabs vcard off that message [21:26:26] that email was kind of long [21:26:54] I could have made a link, but people don't click links. Anyway I heard it's the size that matters [21:27:52] * Damianz wonders if Ryan_Lane was talking about his email or his email... wait that doesn't make sense, but it does. 
Either way [21:28:06] PROBLEM host: analytics is DOWN address: i-000000e2 CRITICAL - Host Unreachable (i-000000e2) [21:28:06] PROBLEM host: configtest-main is DOWN address: i-000002dd CRITICAL - Host Unreachable (i-000002dd) [21:28:06] PROBLEM host: deployment-cache-upload is DOWN address: i-00000263 CRITICAL - Host Unreachable (i-00000263) [21:28:06] PROBLEM host: deployment-feed is DOWN address: i-00000118 CRITICAL - Host Unreachable (i-00000118) [21:28:06] PROBLEM host: deployment-backup is DOWN address: i-000000f8 CRITICAL - Host Unreachable (i-000000f8) [21:28:07] PROBLEM host: conventionextension-test is DOWN address: i-000003c0 CRITICAL - Host Unreachable (i-000003c0) [21:28:52] drdee_: meeting? [21:29:04] Damianz: my email to openstack [21:29:12] Ryan_Lane; yes yes [21:29:15] Ah, I really should subscribe to that list [21:29:25] You'll probably feel the same about my email to labs-l though :P [21:29:29] Ryan_Lane: https://plus.google.com/hangouts/_/0730e1adef864fc028ebcbf9005f48c4c6c9a59d [21:29:48] sec [21:30:16] PROBLEM host: wep is DOWN address: i-000000c2 CRITICAL - Host Unreachable (i-000000c2) [21:31:06] PROBLEM host: p-b is DOWN address: i-000000ae CRITICAL - Host Unreachable (i-000000ae) [21:33:06] PROBLEM host: mobile-feeds is DOWN address: i-000000c1 CRITICAL - Host Unreachable (i-000000c1) [21:34:15] * jeremyb sees ryan's mail [21:34:53] Ryan_Lane: not *too* long [21:35:00] * jeremyb runs away [21:35:26] PROBLEM host: pageviews is DOWN address: i-000000b2 CRITICAL - Host Unreachable (i-000000b2) [21:36:23] Emails with "plea" in the subject are always fun [21:38:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator-test1 i-000002bf output: (Return code of 127 is out of bounds - plugin may be missing) [21:38:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator1 i-0000010c output: (Return code of 127 is out of bounds - plugin may be missing) [21:38:16] PROBLEM Puppet freshness is now: CRITICAL on aggregator2 i-000002c0 output: (Return
code of 127 is out of bounds - plugin may be missing) [21:39:16] PROBLEM Puppet freshness is now: CRITICAL on asher1 i-0000003a output: (Return code of 127 is out of bounds - plugin may be missing) [21:39:16] PROBLEM Puppet freshness is now: CRITICAL on bastion-restricted1 i-0000019b output: (Return code of 127 is out of bounds - plugin may be missing) [21:39:16] PROBLEM Puppet freshness is now: CRITICAL on bastion1 i-000000ba output: (Return code of 127 is out of bounds - plugin may be missing) [21:39:16] PROBLEM Puppet freshness is now: CRITICAL on bob i-0000012d output: (Return code of 127 is out of bounds - plugin may be missing) [21:39:16] PROBLEM Puppet freshness is now: CRITICAL on bots-1 i-000000a9 output: (Return code of 127 is out of bounds - plugin may be missing) [21:41:11] Instance i-000003dc status active. Console output: puppet-agent[1072]: Finished catalog run in 627.38 seconds. Webserver available: http://ase.wikipedia.wmflabs.org/index.html . ssh: Could not resolve hostname i-000003dc.pmtpa.wmflabs [21:42:07] indeed [21:42:25] ssh signwriting-ase11 should work [21:42:49] Not so sure why the FQDN has no A record.... might be a ryan thing [21:43:00] Availability Zone Unimplemented [21:43:01] Region Unimplemented [21:43:03] really? [21:43:17] Success. Thanks. [21:43:18] I assume that's just an OSM not yet written code thing [21:43:26] * Damianz hopes [21:44:04] If the wiki's right it has no security group either... so you might have issues. It could have the default as needed and I just can't see it on the resource page though heh [21:46:01] eh?
that has an A record [21:46:15] that said, wmflabs isn't a top level domain [21:46:26] Hmm [21:46:29] you can only resolve it via our resolvers [21:46:34] or if you are using a socks proxy [21:46:42] /etc/resolv.conf has 10.4.0.1 [21:46:45] yes [21:46:50] damian@bastion1:~$ host i-000003dc.pmtpa.wmflabs [21:46:50] Host i-000003dc.pmtpa.wmflabs not found: 3(NXDOMAIN) [21:46:51] which is the network node [21:47:10] damian@bastion1:~$ dig +short i-000003dc.pmtpa.wmflabs @10.4.0.1 [21:47:10] damian@bastion1:~$ [21:47:16] Maybe I'm just being dumb [21:48:45] hm [21:48:47] that should work [21:49:04] does that instance actually exist? [21:49:11] according to the wiki [21:49:26] It's possible he tried sshing before it hit ldap and got a negative result in pdns's packetcache [21:49:29] Unlikely I'd suggest [21:49:37] hm [21:50:25] ah [21:50:31] the A record is indeed missing [21:50:32] oh [21:50:33] crap [21:50:35] heh [21:50:43] lol [21:50:52] I forgot I turned job running off [21:51:03] that gets populated by a job [21:51:06] Wait.. that's a background job!? [21:51:16] it has to be [21:51:20] I really should add `ldapsearch -D $(grep binddn /etc/ldap.conf | awk '{print $2}') -w $(grep bindpw /etc/ldap.conf | awk '{print $2}')` to my bash file as an alias [21:51:31] Gets added after the instance dn is added to the ou? [21:51:42] yes [21:51:51] because networking occurs after creation [21:52:07] Makes sense... openstack really needs a lovely hook system with multiple stages available [21:52:07] so rather than block, it sticks it into a job [21:52:19] the job re-inserts itself if the instance hasn't networked yet [21:52:23] Damianz: we have that [21:52:30] we also have dns support in ldap [21:52:39] I haven't switched to it yet [21:52:48] Ah [21:53:02] Yeah I recall you mentioning that [21:53:20] ok.
fixed [21:53:25] I also turned the jobs back on [21:53:31] I need to purge dns cache for that instance, though [21:53:33] I'm still not 100% sure I understand openstack services/modules/plugins w/e fully as half the 'services' seem to run on top of, not as part of, the core and other stuff's just confusing documentation wise. [21:53:53] heh [21:53:56] yeah, it can be confusing [21:54:40] Like hey this is written in java... but is a service for something in python... with no bindings... so yeah let's just nod [21:58:15] Damianz, your email said that one of my instances (gerrit-puppet-andrewhauly) is down but it isn't. Not a problem for me, but maybe that means nagios is lying to you? [21:59:03] Hmm nagios says it's been up 2m [21:59:22] Possibly it just hadn't re-checked it due to the huge backlog, apologies if that's the case. [21:59:37] The others are down though :) [22:00:24] Not a problem, just thought it might be a useful data point. [22:02:22] Yay for Dzahn merging the snmp change, should be down to 1 broken check and about 10 down hosts after puppet runs on everything (well apart from puppetmaster::self hosts). [22:04:27] Aug 28 21:51:16 i-0000030d nagios3: HOST ALERT: gerrit-puppet-andrewhauly;UP;HARD;1;PING OK - Packet loss = 0%, RTA = 0.63 ms < weird but I'll not fret as there's bigger stuff to sort out for the long term. [22:06:24] Hmm [22:06:50] Aug 28 17:33:45 i-0000030d nagios3: HOST ALERT: gerrit-puppet-andrewhauly;DOWN;SOFT;4;check_ping: Invalid hostname/address - i-000003da < I wonder if that's to do with Ryan_Lane stopping the dns runner stuff, not sure how old the instance is though. [22:07:02] Would explain the lack of dns and therefore a 'help I'm down' [22:07:41] I can pretty much only access things by IP lately; figured that's part of Ryan switching to a new DNS system. [22:07:56] Damianz: the first one seems normal?
alerting you that the downtime is over [22:08:11] mutante: Yeah, but if andrewbogott didn't restart it then it shouldn't have been down [22:08:23] I only created it this morning. [22:08:29] We could probably check using the ip and add a "dns check" if needed. Though hopefully dns should be stable long term [22:08:37] So maybe it took ~8 hours to notice. [22:09:05] was the Nagios service really running ? or did puppet just start it up [22:09:18] They've been down a while tbf, I'd probably put it down to the same issue slevinski was having due to the runner being stopped. Not an expert on the Ryanness of how that works though heh [22:09:51] Nagios has been running, it's not managed via puppet atm. Hence me running around fixing it with a plan to make puppet do it. Been semi-broken since the instances got corrupted though. [22:11:18] Damianz: "Use a FQDN in host_name and the IP in address." is recommended [22:11:49] people keep wondering if they should rely on DNS in Nagios config, but yeah http://lusislog.blogspot.com/2008/04/hostname-vs-ip-address-for-host.html [22:12:12] and thanks for trying to puppetize it, cool [22:13:29] well, and that part just applies if you really want to insist on IP addresses [22:13:36] Yeah - the generation is a bit 'dodgy' atm, would like to do that and make it possible for people to add custom checks to their project without doing the horrid puppet nagios config stuff like prod. [22:18:00] if you try to apply the prod. nagios classes to a labs instance you might notice you are getting duplicate service definitions [22:18:22] if you find a way to prevent any duplicates that would be great [22:18:35] it does not really hurt, as Nagios will just skip them with a warning..but still [22:18:44] it bloats the auto-generated configs [22:19:17] that said i would not totally try to avoid the puppet nagios resources [22:20:44] I sorta like the puppet handled monitoring, it ensures stuff is monitored. 
A few people were against that due to the mess of having to tidy them up, hassle updating them etc. Currently stuff is just pulled from the wiki and there's 1 config for everything. [22:21:44] I'd rather like to scrap the wiki, pull the hosts straight from ldap using the proxyagent login. From there for custom checks either a) in ldap b) on wiki somewhere c) custom interface or d) puppet. Really needs discussion on goals for labs monitoring, which will become more important when we roll the 'production' side out for bots/tools etc. [22:22:12] It does the job for basic stuff now, but can be improved after the base setup is all nicely in puppet and replicable/testable. [22:25:51] !log nagios Changed snmp host for trap in base::puppet - that should fix Puppet freshness checks on anything not running puppetmaster::self [22:25:53] Logged the message, Master [22:26:02] Damianz: a pro for using puppet is that Labs Nagios could actually be used to test improvements that can later be used in the actual production nagios, which does also need improvement [22:26:18] !log nagios Pending change to fix the Free ram checks (21822) [22:26:19] Logged the message, Master [22:26:34] Damianz: but i figured these _might_ be 2 separate things, one actually monitoring labs and one being a testbed for prod. nagios.. [22:27:29] mutante: Agreed, but it also depends somewhat on the setup of puppet which labs is not quite production like. Also some bits of prod are running different paths etc. Due to the setup of labs I don't think they will ever be 100% compatible but yes we should strive to improve and test prod. [22:29:20] one problem i see is that it is very easy to break a complete Nagios by making a minor mistake when adding a new check or host manually [22:29:51] like you introduce a new servicegroup or hostgroup, but forget to define it.. and bam..
Nagios is completely down [22:30:05] happened to prod before [22:30:23] You totally haven't seen my ex-work's nagios script I wrote to handle that stuff, 4k lines of perl goodness :D [22:31:08] But yeah, puppet can help there. I somewhat hate the way storedconfig stuff is used on prod currently anyway, it's still not as abstracted as it could be. [22:31:10] heh, wow [22:31:32] you know there are also some projects that tried to put a web GUI on top of Nagios, to let people define services [22:31:55] i disliked them somehow, but maybe it is a way to make it easier for labs users [22:31:57] ewww [22:32:17] I'd rather put it in ldap and write a page for OSM before that [22:32:32] http://sourceforge.net/projects/nawui/ [22:33:28] http://www.debianhelp.co.uk/nagiosweb.htm [22:33:34] My thought around this was for example nrpe shouldn't be a static file, all checks should be based on for example nagios::monitor::disk_usage which is an abstraction of check_nrpe!disk_usage or such. But I think the way puppet can handle that prevents it from being possible, and I'm not that good with puppet [22:33:43] andrewbogott: I forgot to turn the mediawiki jobs back on [22:33:46] they add the records [22:34:03] I added that back a little bit ago [22:34:16] Ryan_Lane: You mean, regarding my earlier comment about missing dns? [22:34:33] It would be useful if we could auto insert the puppet classes into labsconsole and remove a step of getting stuff added for things like this - currently puppetmaster::self + something new/requires args = nearly impossible to test. [22:35:15] Damianz: what do you mean? [22:35:31] Damianz: Doesn't https://labsconsole.wikimedia.org/wiki/Special:NovaPuppetGroup do that? [22:35:43] ah, you mean read from the repo and auto-generate? [22:35:50] Yeah [22:35:50] andrewbogott: yeah.
about dns [22:35:52] earlier [22:36:04] And I know how hard/annoying that would be so am in no way suggesting it [22:36:16] it would be easier with roles [22:36:18] You could totally do it with regex though! [22:37:20] Ideally I'd like to push my change to my branch, have the option in osm to add my awesome new class to my instance because it's in my project with 0 input from ops. That requires gerrit/puppet/modules (I believe) and some osm magic at a minimum I'd have thought. We can all wish though [22:37:54] so [22:37:59] there's an easier way [22:38:03] that I thought of recently [22:38:30] and it would work globally and per-branch (if we ever get to do per-branch) [22:38:57] we can add keywords into class and variable documentation [22:39:19] then you just parse the comments [22:39:40] that would allow descriptions for the variables and the classes, too [22:40:47] That would be pretty awesome [22:40:52] It would! [22:41:03] Much easier to parse too [22:45:02] Damianz: about the ram_check.sh, i know there is this one: https://gerrit.wikimedia.org/r/#/c/12164/ [22:45:18] !log nagios Commented out free_ram check for now. [22:45:19] Logged the message, Master [22:45:20] but i am not sure anymore why Petr wanted to add a different one from prod [22:46:28] mutante: I think it's the same file, it's currently in the prod config and the file is in the prod repo - it's just missing from the nrpe class, production uses nagios::monitor which adds it after so new labs instances never get it. [22:46:48] Or that's what I think...
really want someone who knows if that's used in prod to comment on the change to ensure I don't borke anything heh [22:48:44] Damianz: ok, hmm, it is indeed on the production server, the file is there, it is also in NRPE [22:48:53] Damianz: but...i do not see a single service actually using that command [22:49:01] Neither do I [22:49:31] but i do see it in nrpe_local.cfg [22:49:32] Actually I need to fix https://gerrit.wikimedia.org/r/#/c/21822/2/manifests/nrpe.pp, I'm adding the file twice for production hosts atm =/ [22:49:51] I'm thinking it's a labs only check but not really sure [22:50:24] Damianz: https://gerrit.wikimedia.org/r/#/c/1638/ [22:50:46] Wrong branch [22:50:59] well, yeah, but has been merged [22:51:01] it's old [22:51:05] Yeah [22:53:21] That sorta says labs specific check to me thinking about it [22:53:51] Damianz: so..conclusion.. yes, it is just used on labs. i don't know what we want less, another "if $realm" or just not care, because it also does not hurt us to have that file [22:53:54] Doesn't really help if it needs wrapping in an if or not, really hate using that sort of logic in the middle of a class... makes things un-usable outside of here and breaks contributions back. [22:54:05] ack [22:54:07] heh [22:55:09] I'd say screw it and leave it like that (now I've removed it from the other class) and eventually move this to something that sucks less for portability. Depends if that gets ops love or not... I'll wait until Ryan gets bored enough to review it. [22:55:43] mutante: it may be missing from the merge [22:56:07] it's fine if it's not wrapped in realm [22:56:13] it doesn't hurt if the script exists on production systems [22:56:16] it isn't being used [22:57:00] It already exists on them, on the wrong path for nrpe to use though (going off nagios::monitor) [22:57:29] Also I've just thought.. why did I remove that file again BAH [22:58:21] * Damianz facepalms at the path and reverts that [22:58:26] yeah, it is in there.
..and now we all agreed to not bother about the $realm check.. ok [23:01:42] Ok, so if that's fine now I just un-did my derpyness I'll just badger someone to merge it then :D *looks at ryan* [23:02:15] I think petan is going to stab me in the morning for spamming his email [23:05:41] labs-nfs1 - Disk space - DISK WARNING - free space: /export 856 MB (4% inode=66%). Might want to clean up some bots stuff soonish. [23:16:15] Ryan_Lane: can this be the new error page for labs https://sphotos-a.xx.fbcdn.net/hphotos-ash3/527920_352703064812991_742496159_n.jpg [23:17:21] LOL [23:17:45] need to photoshop a horn in [23:17:50] I wonder what the license on this photo is [23:18:21] Don't the mwf have a talented team of designers to create random unicorn images for us all day? :D [23:18:32] volunteer designers? [23:19:05] Everyone likes unicorns, volunteer or not [23:19:09] * Damianz rm -f /var/lib/puppet/state/puppetdlock *sigh* [23:20:42] warning: Could not load fact file /var/lib/puppet/lib/facter/default_interface.rb: ./default_interface.rb:43: syntax error, unexpected kELSE, expecting kEND < Is that important at all? [23:22:24] Damianz: not really [23:22:37] http://commons.wikimedia.org/wiki/Category:Unicorns ! [23:22:51] tons of subcategories:) [23:23:00] mutante is actually somewhat awesome [23:24:12] !log nagios free ram check merged in, un-commenting service. Not reloading, should reload on the next parser run giving time for puppet to run on the instances. [23:24:14] Logged the message, Master [23:26:07] http://commons.wikimedia.org/wiki/File:The_Lion_and_the_Unicorn_2.jpg [23:27:48] mutante: is that the new 500 error page http://upload.wikimedia.org/wikipedia/commons/b/bf/The_Lion_and_the_Unicorn_2.jpg [23:28:26] heh, yeah:) [23:33:19] <^demon> Ryan_Lane: https://gerrit.wikimedia.org/r/#/c/21848/ <3 [23:36:24] Ryan_Lane: merge it….
merge it good [23:49:45] ^demon: this is prone to breakage [23:49:53] ^demon: make that a config option that can be passed in [23:50:15] <^demon> Blergh. I was afraid you'd say that [23:50:22] lol
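Closing the loop on Ryan's idea from earlier in the log (keywords in class and variable documentation that labsconsole could parse to auto-generate its class list), a minimal prototype might look like this. The `@labs-ui` keyword and the sample manifest are invented purely for illustration; no such convention exists in the repo.

```python
import re

# Invented sample manifest: "@labs-ui" marks classes a UI should expose.
SAMPLE_MANIFEST = """\
# @labs-ui
# Installs the free_ram nrpe plugin.
class nrpe::check_ram {
}

# Internal helper, not user-facing.
class nrpe::packages {
}
"""

def ui_classes(manifest, keyword="@labs-ui"):
    """Return names of classes whose doc comment carries the keyword."""
    found = []
    pending = False  # keyword seen in the comment block directly above
    for line in manifest.splitlines():
        line = line.strip()
        if line.startswith("#"):
            if keyword in line:
                pending = True
        else:
            match = re.match(r"class\s+([\w:]+)", line)
            if match and pending:
                found.append(match.group(1))
            if line:  # any non-comment, non-blank line ends the block
                pending = False
    return found

print(ui_classes(SAMPLE_MANIFEST))  # → ['nrpe::check_ram']
```

The same comment walk could collect the description lines too, which is what would make the parsed output useful as UI help text rather than just a class list.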