[01:05:52] PROBLEM Total processes is now: WARNING on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS WARNING: 181 processes
[01:10:52] RECOVERY Total processes is now: OK on bots-salebot i-00000457.pmtpa.wmflabs output: PROCS OK: 103 processes
[02:24:22] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: 966748
[02:34:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 951916
[02:39:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1645776
[02:39:22] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: 980196
[03:56:52] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Warning: 16% free memory
[04:21:52] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Critical: 4% free memory
[04:31:52] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f.pmtpa.wmflabs output: OK: 95% free memory
[04:54:04] I am trying to install a custom package with puppet using the "misc:labsdebrepo".
[04:55:18] But I get a "WARNING: the following packages cannot be authenticated" followed by a "E: There are problems and -y was used without --force-yes"
[05:09:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 907944
[05:13:57] Ah, I think I figured out how to sign the repo with my own key and get it to accept it
[05:46:02] RECOVERY Current Users is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: USERS OK - 0 users currently logged in
[05:46:42] RECOVERY Disk Space is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: DISK OK
[05:47:22] PROBLEM Free ram is now: WARNING on aggregator2 i-000002c0.pmtpa.wmflabs output: Warning: 7% free memory
[05:48:32] RECOVERY SSH is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[05:49:12] RECOVERY Total processes is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: PROCS OK: 218 processes
[05:49:52] RECOVERY dpkg-check is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: All packages OK
[05:50:22] RECOVERY Current Load is now: OK on aggregator2 i-000002c0.pmtpa.wmflabs output: OK - load average: 0.02, 0.25, 0.44
[06:14:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:19:21] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 876868
[06:24:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[06:28:23] PROBLEM Total processes is now: WARNING on dumps-bot2 i-000003f4.pmtpa.wmflabs output: PROCS WARNING: 153 processes
[06:32:22] PROBLEM Free ram is now: UNKNOWN on aggregator2 i-000002c0.pmtpa.wmflabs output: NRPE: Call to fork() failed
[06:34:02] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:36:32] PROBLEM SSH is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: Server answer:
[06:37:12] PROBLEM Total processes is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:37:22] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:37:52] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:38:22] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:39:42] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[06:42:12] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1739328
[06:53:23] RECOVERY Total processes is now: OK on dumps-bot2 i-000003f4.pmtpa.wmflabs output: PROCS OK: 149 processes
[07:17:59] !ping
[07:17:59] pong
[07:34:08] oh god... Nemo is in the dumps project? who... when...
[07:35:09] @seenrx Ryan_Lane
[07:35:09] petan: Ryan_Lane is in here, right now (multiple results were found: Ryan_Lane1)
[07:35:19] hey Ryan_Lane
[07:35:26] there is some problem with sudo?
[07:37:01] Damianz can you restore the original nagios ram plugin I wrote?
[07:37:15] should be in gerrit history
[07:37:33] eh
[07:37:35] nvm
[07:39:46] mutante: There? Did Nemo bis say what he intends to do inside the dumps project?
[07:41:00] !ping
[07:41:00] pong
[07:44:04] zzz I should have better prepared for the day when dumps project opens its doors... sigh...
[07:45:58] 10/04/2012 - 07:45:58 - Updating keys for tisane at /export/keys/tisane
[08:04:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 925800
[08:18:44] Damianz: You there?
[08:19:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[08:33:32] PROBLEM Free ram is now: WARNING on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 6826316
[08:39:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 897592
[08:57:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1707256
[09:09:12] PROBLEM Total processes is now: WARNING on aggregator-test1 i-000002bf.pmtpa.wmflabs output: PROCS WARNING: 199 processes
[10:44:12] PROBLEM Total processes is now: CRITICAL on aggregator-test1 i-000002bf.pmtpa.wmflabs output: PROCS CRITICAL: 202 processes
[11:14:12] PROBLEM Total processes is now: WARNING on aggregator-test1 i-000002bf.pmtpa.wmflabs output: PROCS WARNING: 198 processes
[12:19:18] Quentinv57: don't forget to use /mnt/share to store your private data, instead of your /home
[12:19:34] just create a directory there and chmod 700
[12:19:55] your home is shared across all instances including unsafe
[12:20:18] petan, okay, thanks, so I should place my bot scripts on /mnt/share on bots-nr1 ?
[12:21:21] in folder in there
[12:21:28] for example /mnt/share/quentin
[12:21:52] you should never store big stuff in home, it's very small disk and slow
[12:22:01] /mnt is best
[12:22:21] it's local disk, accessible only from restricted instance
[12:49:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
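(Editor's sketch, not part of the channel traffic.) The advice given to Quentinv57 above, to create a directory under /mnt/share and chmod it 700, could be scripted roughly as follows. The helper name `make_private_dir` is hypothetical, and the demo runs against a temporary directory standing in for /mnt/share; only the mode 700 and the example directory name `quentin` come from the chat.

```python
import os
import stat
import tempfile


def make_private_dir(base: str, name: str) -> str:
    """Create base/name if needed and restrict it to its owner (mode 700)."""
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)
    # chmod explicitly: a mode passed to makedirs is filtered by the umask.
    os.chmod(path, 0o700)
    return path


if __name__ == "__main__":
    # A temporary directory stands in for /mnt/share (assumption for the demo).
    with tempfile.TemporaryDirectory() as base:
        private = make_private_dir(base, "quentin")
        print(private, oct(stat.S_IMODE(os.stat(private).st_mode)))
```

On a real instance the first argument would presumably be /mnt/share, as suggested in the conversation.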
[12:54:22] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 913036
[13:04:12] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 190 processes
[13:12:30] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 97 processes
[13:28:32] PROBLEM Total processes is now: WARNING on bastion1 i-000000ba.pmtpa.wmflabs output: PROCS WARNING: 156 processes
[13:50:21] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1718836
[15:07:12] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 190 processes
[15:27:12] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 97 processes
[16:09:11] andrewbogott: Is it possible to set the ssh timeout higher?
[16:10:01] Jan_Luca: I'm not sure. Can you tell me what's happening/what you want to happen?
[16:11:28] I get after ca. 3 minutes inactivity the message: "Write failed: Broken pipe
[16:11:28] " two times
[16:12:20] I'm using ProxyCommand
[16:12:54] Hm, I've had that happen but only at a particular coffeeshop, which makes me think it's a local setting.
[16:13:03] Jan_Luca: I get that also, I think it is a local setting, not a server setting, but I could be wrong.
[16:14:24] Jan_Luca: Are you at a gelato parlor? It only ever happens to me at the gelato shop :)
[16:14:42] * andrewbogott is not helpful
[16:15:43] andrewbogott: Are you using bastion?
[16:15:49] yep
[16:16:05] Um, well, actually, bastion-restricted which is different from the one you're using I think
[16:16:49] Maybe there is a timeout setting?
[16:21:08] Jan_Luca: I don't know. I doubt that it's a setting on Bastion, no one else has mentioned it.
[16:22:58] Google told me: The option "ServerAliveInterval xx" should fix this
[16:27:35] Jan_Luca: Yep, looks like that's something you can set locally. I'm making a note in case I need to work with ice cream again :)
[16:28:00] :-D
[16:31:26] Jan_Luca: you can do the same with a client setting
[16:31:47] ServerAliveInterval xx is a client setting
[16:32:05] The server setting would be ClientAliveInterval xx
[16:32:21] ah. right
[16:36:22] PROBLEM Free ram is now: WARNING on deployment-integration i-0000034a.pmtpa.wmflabs output: 851156
[16:41:21] RECOVERY Free ram is now: OK on deployment-integration i-0000034a.pmtpa.wmflabs output: 498040
[16:45:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1714352
[16:53:48] Damianz: The other day you were talking about having puppet generate a random password and stow it someplace?
[16:54:04] Am I remember that correctly? And, if so, did you come up with any good ideas?
[17:03:39] yeah, I didn't come up with any good idea's really. the best idea I could think of was if /root/.my.cnf doesn't exist change the password and write that file. Which would fail on current boxes where the password has been changed from puppet and could also fail if someone removed that file.
[17:03:56] Not really an awesome easy way to do a 'run once' after installing the package as far as I can tell
[17:14:36] adminxor: Btw for future reference you can't edit patchsets which extends the history
[17:15:30] Damianz: I'm sorry, didn't get that.
[17:15:48] s/can't/can/ bleh
[17:16:44] ah
[17:17:29] andrewbogott: Want to review some changes for puppet?
[17:17:59] Damianz, sure.
[17:18:35] https://gerrit.wikimedia.org/r/#/c/25758/ fixes snmp traps for labs nagios, https://gerrit.wikimedia.org/r/#/c/23681/ fixes the output for the free ram check used in labs
[17:21:06] Damianz, I was not able to amend the changes though.
[17:21:12] I followed https://labsconsole.wikimedia.org/wiki/Help:Git for amending a change.
[17:21:17] But at the end I got a message refusing to push to review
[17:21:48] Hmm, interesting. That should work
[17:21:58] You might need to git add the files though or it could see 0 changes
[17:22:40] brb grabbing food then I'll revise my puppetmaster::self changes and work on bots manifests some more
[17:23:50] Yes, I added the file.
[17:23:59] I had to remove a file from the previous change and add the new one after I did a FETCH_HEAD thing.
[17:24:00] Maybe it's me doing something wrong.
[17:24:07] :)
[17:24:50] Damianz, I don't think I know what 'neon' is and why we don't want it on labs.
[17:26:39] andrewbogott: I /think/ it's for prod, not 100% sure though
[17:26:45] ok.
[17:27:11] Good point about fqdn though
[17:27:28] second one looks good; you're just cutting down on excess logfile verbiage, right?
[17:28:21] Ah, neon runs icinga probably a test box that Leslie was working with.
[17:28:51] Ryan_Lane, can you please take a look at this, if it's any good
[17:28:52] https://gerrit.wikimedia.org/r/#/c/26642/
[17:28:53] Yeah, basically that makes the % show in the output rather than the raw number (which it use to). Seems to just be left over from some testing/debug work that was done with that file.
[17:29:08] Damianz: What's your preferred tool for generating a random password?
[17:29:15] pwgen -N 1
[17:29:30] Ah, that's easy.
[17:29:43] there's a puppet module to add a function for it also
[17:30:34] No changes between HEAD and gerrit/production. lies
[17:31:52] adminxor: You're right, it doesn't work.
[17:32:06] Which is strange considering I've edited things dozens of times in this mannor ...
[17:33:33] * Damianz rebases and tries again
[17:36:06] Change on mediawiki a page Developer access was modified, changed by Matthias M. link https://www.mediawiki.org/w/index.php?diff=590560 edit summary:
[17:36:20] ^demon: Any ideas on why gerrit would refuse an updated patchset?
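(Editor's sketch, not part of the channel traffic.) The free RAM check change discussed above reports a percentage instead of a raw number, matching notices in this log such as "Warning: 16% free memory". The following is only an illustration of that output format, not the plugin from the Gerrit change. The function name and the 20% / 5% thresholds are assumptions chosen to be consistent with the statuses seen in the log; the return codes are the standard Nagios plugin exit codes.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def free_ram_status(free_kb, total_kb, warn_pct=20, crit_pct=5):
    """Return (exit_code, message) with free memory shown as a percentage.

    warn_pct and crit_pct are assumed thresholds, not the real plugin's.
    """
    if total_kb <= 0:
        raise ValueError("total_kb must be positive")
    percent = free_kb * 100 // total_kb
    if percent <= crit_pct:
        return CRITICAL, f"Critical: {percent}% free memory"
    if percent <= warn_pct:
        return WARNING, f"Warning: {percent}% free memory"
    return OK, f"OK: {percent}% free memory"


if __name__ == "__main__":
    # Hypothetical readings in kB; a real check would read /proc/meminfo.
    code, message = free_ram_status(free_kb=160_000, total_kb=1_000_000)
    print(message)
    raise SystemExit(code)
```

Returning the exit code alongside the message keeps the threshold logic easy to test separately from whatever reads the memory figures.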
[17:36:26] I don't think you changed what you wanted to change in patchset 2
[17:36:32] OH, maybe that's what you're fighting with
[17:36:50] * andrewbogott in a meeting now
[17:36:53] I rebased it via gerrit so probably 0 changed
[17:37:52] <^demon> If nothing changed, it won't accept it. As long as it changes in some manner (files, commit msg, parent) gerrit should accept it.
[17:38:28] !log wikidata-dev wikidata-dev-3: dev repo is not working after a complete reinstall with new database. :(
[17:38:30] Logged the message, Master
[17:38:42] I changed the file, edited the commit (which git show shows the correct updated diff), gerrit says I never changed it..
[17:39:31] Change on mediawiki a page Developer access was modified, changed by Matthias M. link https://www.mediawiki.org/w/index.php?diff=590561 edit summary: /* User:Matthias M. */ change e-Mail
[17:41:16] http://pastie.org/private/ks7k0p45ojlwdzwqgcgfa < I definitely changed the file.... unless I'm doing something really dumb
[17:42:30] <^demon> Damianz: That's git-review yelling at you, not gerrit.
[17:42:39] <^demon> Try `git push origin HEAD:refs/for/production`
[17:42:48] <^demon> I hate git-review so much :(
[17:43:23] <^demon> Your branch is 18 commits ahead of origin/production, which is probably confusing git-review.
[17:43:51] It's really not though, if I tried to push git would go hey you're at head.
[17:44:00] pusing to that ref seemed to work.
[17:44:33] Yep that works... *adds git alias*
[17:44:36] Thanks ^demon.
[17:44:38] <^demon> yw.
[17:45:04] <^demon> You can swap HEAD for any given commit you want to push. git-review is just a wrapper around that, really, plus some extra "magic"
[17:45:22] <^demon> I wouldn't mind ditching git-review. Only thing it's good for is automatically installing the commit-msg hook.
[17:45:41] I kinda prefer the idea of working from branches over all this lets go into a detached pretend branch state stuff =/
[17:46:05] ^demon: It didn't do that too. I had to do an scp to get the commit-msg
[17:46:28] <^demon> Well it's supposed to. If it didn't, it just further proves my point that it's useless.
[17:46:42] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-1 i-0000020c.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[17:46:46] Thanks andrewbogott :)
[17:46:48] :D
[17:46:55] np
[17:46:57] <^demon> Nobody uses git-review except us and openstack :p
[17:47:25] We should either improve git-review to be a less dumb wrapper or say hey just makes these aliases and use those :D
[17:48:16] !seen Hydriz
[17:48:25] Change on mediawiki a page Developer access was modified, changed by Matthias M. link https://www.mediawiki.org/w/index.php?diff=590563 edit summary: /* User:Matthias M. */ removed, for now
[17:48:29] @seen Hydriz
[17:48:29] Damianz: Last time I saw Hydriz they were quiting the network N/A at 10/4/2012 2:38:16 PM (03:10:12.6696540 ago)
[17:48:38] aha, thanks:)
[17:48:46] i did not even expect we really have it
[17:49:18] now i would like to leave a message for Hydriz with the bot that it tells him on rejoin :)
[17:49:22] seriously someone linked to http://www.google.com/recaptcha/mailhide/ for their email...
[17:49:27] that would be nice
[17:49:49] like right now I'd happily go @memo petan Free ram check output should be fixed :)
[17:50:07] <^demon> Someone else wrote a new tool called git-change which looks pretty cool.
[17:50:11] <^demon> Haven't tried it yet.
[17:50:13] <^demon> http://engblog.nextdoor.com/post/27136956002/introducing-git-change
[17:51:18] One of those incredibly useful tools was Mondrian, a proprietary code review tool with a slick web UI written by Guido van Rossum. < Yeah Mondrian looks just like gerrit and it's not something I'd call 'slick' heh
[17:51:23] Actually I'd say it looks worse than gerrit
[17:51:42] RECOVERY dpkg-check is now: OK on wikidata-dev-1 i-0000020c.pmtpa.wmflabs output: All packages OK
[17:52:08] I like the idea of selecting reviewers when you submit the change.
[17:52:12] Might check this out later.
[17:54:13] New patchset: DamianZaremba; "Re-enabling puppet check" [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/26645
[17:54:25] Change merged: DamianZaremba; [labs/nagios-builder] (master) - https://gerrit.wikimedia.org/r/26645
[17:58:24] Damianz: i see some working RAM checks though .. where is it broken
[17:58:36] mutante: ?
[17:58:49] The current warnings are puppetmaster::self or hosts with puppet not running.
[17:59:03] It got fixed a while back, the last change was a change to the output.
[17:59:03] did i get you wrong? i thought you are sayiing all the Nagios free RAM checks are broken currently
[17:59:11] ah, ok
[17:59:17] There's like 19 broken.
[17:59:34] and they are all using ::self ?
[17:59:34] Along with like 5 not running nrpe, some with broken dpkg (which will affect updates)
[17:59:59] Dunno, going to check their configs on the wiki and write a status email later.
[18:00:02] taking a look at the dpkg
[18:01:26] oh, now Nagios just died
[18:01:45] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?host=all&servicestatustypes=16&hoststatustypes=15
[18:01:50] yeah typo sorry
[18:01:53] kk
[18:08:30] Really need to get this box puppetized
[18:08:36] Change on mediawiki a page Developer access was modified, changed by LouisDang link https://www.mediawiki.org/w/index.php?diff=590568 edit summary:
[18:10:42] PROBLEM Disk Space is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:10:52] PROBLEM Current Users is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:02] PROBLEM Disk Space is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:02] PROBLEM Current Users is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:02] PROBLEM Disk Space is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:02] PROBLEM Disk Space is now: CRITICAL on ipv6test1 i-00000282.pmtpa.wmflabs output: DISK CRITICAL - free space: / 0 MB (0% inode=54%):
[18:11:02] PROBLEM Disk Space is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:02] PROBLEM Disk Space is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:12] PROBLEM Disk Space is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:12] PROBLEM Disk Space is now: CRITICAL on ganglia-test2 i-00000250.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:12] PROBLEM Disk Space is now: CRITICAL on deployment-video03 i-000003c1.pmtpa.wmflabs output: DISK CRITICAL - free space: / 0 MB (0% inode=82%):
[18:11:22] PROBLEM Disk Space is now: WARNING on ve-roundtrip2 i-0000040d.pmtpa.wmflabs output: DISK WARNING - free space: /run 522 MB (5% inode=99%):
[18:11:42] PROBLEM Free ram is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:42] PROBLEM Disk Space is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:42] PROBLEM Free ram is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:42] PROBLEM Disk Space is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:11:42] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282.pmtpa.wmflabs output: 460616
[18:11:42] PROBLEM Free ram is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:11:43] PROBLEM Free ram is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:43] PROBLEM Free ram is now: WARNING on aggregator-test1 i-000002bf.pmtpa.wmflabs output: Warning: 8% free memory
[18:11:44] PROBLEM Free ram is now: UNKNOWN on su-fe2 i-000002e6.pmtpa.wmflabs output: NRPE: Unable to read output
[18:11:44] PROBLEM Free ram is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:45] PROBLEM Free ram is now: UNKNOWN on pdbhandler-1 i-0000030e.pmtpa.wmflabs output: NRPE: Unable to read output
[18:11:45] PROBLEM Free ram is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:11:56] PROBLEM Free ram is now: UNKNOWN on mars i-000003a8.pmtpa.wmflabs output: NRPE: Unable to read output
[18:11:57] PROBLEM Free ram is now: UNKNOWN on deployment-video03 i-000003c1.pmtpa.wmflabs output: NRPE: Unable to read output
[18:11:57] PROBLEM Free ram is now: WARNING on dumps-bot3 i-000003ef.pmtpa.wmflabs output: Warning: 13% free memory
[18:12:12] PROBLEM Free ram is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:12:12] PROBLEM Free ram is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:12:42] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 902968
[18:13:02] PROBLEM SSH is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused
[18:13:02] PROBLEM SSH is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: Server answer:
[18:13:02] PROBLEM SSH is now: CRITICAL on ganglia-test2 i-00000250.pmtpa.wmflabs output: CRITICAL - Socket timeout after 10 seconds
[18:13:32] PROBLEM Total processes is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:13:32] PROBLEM Total processes is now: WARNING on incubator-apache i-00000211.pmtpa.wmflabs output: PROCS WARNING: 179 processes
[18:13:32] PROBLEM Total processes is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:13:32] PROBLEM Total processes is now: CRITICAL on kripke i-00000268.pmtpa.wmflabs output: PROCS CRITICAL: 229 processes
[18:13:33] PROBLEM Total processes is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:13:33] PROBLEM Total processes is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:13:33] PROBLEM Total processes is now: WARNING on aggregator-test1 i-000002bf.pmtpa.wmflabs output: PROCS WARNING: 194 processes
[18:13:33] PROBLEM Total processes is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:13:42] PROBLEM Total processes is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:13:42] PROBLEM Total processes is now: CRITICAL on ganglia-test2 i-00000250.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:13:43] PROBLEM Total processes is now: WARNING on syslogcol-srv i-000003a9.pmtpa.wmflabs output: PROCS WARNING: 191 processes
[18:14:02] PROBLEM Total processes is now: WARNING on bastion1 i-000000ba.pmtpa.wmflabs output: PROCS WARNING: 153 processes
[18:14:02] PROBLEM Total processes is now: CRITICAL on aggregator1 i-0000010c.pmtpa.wmflabs output: PROCS CRITICAL: 247 processes
[18:14:02] PROBLEM dpkg-check is now: CRITICAL on rds i-00000207.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:02] PROBLEM dpkg-check is now: CRITICAL on worker1 i-00000208.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:12] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:12] PROBLEM dpkg-check is now: CRITICAL on upload-wizard i-0000021c.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:12] PROBLEM Total processes is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:14:12] PROBLEM dpkg-check is now: CRITICAL on log1 i-00000239.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:12] PROBLEM dpkg-check is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:12] PROBLEM dpkg-check is now: CRITICAL on wikidata-dev-2 i-00000259.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:13] PROBLEM Total processes is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:14:13] PROBLEM dpkg-check is now: CRITICAL on mobile-testing i-00000271.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:14] PROBLEM dpkg-check is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:14:14] PROBLEM dpkg-check is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:15] PROBLEM dpkg-check is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:15] PROBLEM dpkg-check is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:42] PROBLEM dpkg-check is now: CRITICAL on ee-prototype i-0000013d.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:42] PROBLEM dpkg-check is now: CRITICAL on fundraising-db i-0000015c.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:42] PROBLEM dpkg-check is now: CRITICAL on orgcharts-dev i-0000018f.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages
[18:14:42] PROBLEM Current Load is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:42] PROBLEM dpkg-check is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:14:43] PROBLEM dpkg-check is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:14:43] PROBLEM Current Load is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:52] PROBLEM Current Load is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:14:52] PROBLEM Current Load is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:52] PROBLEM Current Load is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:52] PROBLEM Current Load is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:14:52] PROBLEM Current Load is now: WARNING on echo-xmpp i-00000351.pmtpa.wmflabs output: WARNING - load average: 6.97, 7.15, 7.17
[18:14:52] PROBLEM Current Load is now: CRITICAL on ganglia-test2 i-00000250.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:15:22] PROBLEM Current Users is now: CRITICAL on wikidata-dev-3 i-00000225.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:15:22] PROBLEM Current Load is now: CRITICAL on patchtest i-000000f1.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:15:22] PROBLEM Current Load is now: CRITICAL on patchtest2 i-000000fd.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:15:22] PROBLEM Current Users is now: CRITICAL on maps-test2 i-00000253.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:15:32] PROBLEM Current Users is now: CRITICAL on mobile-wlm i-000002bc.pmtpa.wmflabs output: Connection refused by host
[18:15:32] PROBLEM Current Users is now: CRITICAL on aggregator2 i-000002c0.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:15:32] PROBLEM Current Users is now: CRITICAL on integration-apache1 i-000002eb.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:15:32] PROBLEM Current Users is now: CRITICAL on sultest1 i-0000032d.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:15:32] PROBLEM Current Users is now: CRITICAL on ganglia-test2 i-00000250.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds.
[18:17:12] PROBLEM Puppet freshness is now: CRITICAL on bastion-restricted1 i-0000019b.pmtpa.wmflabs output: Puppet has not run in the last 10 hours
[18:17:12] PROBLEM Puppet freshness is now: CRITICAL on publicdata-administration i-0000019e.pmtpa.wmflabs output: Puppet has not run in the last 10 hours
[18:17:12] PROBLEM Puppet freshness is now: CRITICAL on robh2 i-000001a2.pmtpa.wmflabs output: Puppet has not run in the last 10 hours
[18:17:12] PROBLEM Puppet freshness is now: CRITICAL on varnish i-000001ac.pmtpa.wmflabs output: Puppet has not run in the last 10 hours
[18:17:12] PROBLEM Puppet freshness is now: CRITICAL on swift-be1 i-000001c7.pmtpa.wmflabs output: Puppet has not run in the last 10 hours
[18:26:03] Ryan_Lane: Don't suppose you're around?
[18:26:56] yeah
[18:26:57] I am
[18:27:03] Damianz: what's up?
[18:27:03] Can you look at an instance?
[18:27:28] bots-sql2 oomed out nslcd etc so I couldn't login, rebooted it didn't come back, reboot says failed now, console log says nothing, ping says dead
[18:27:32] :(
[18:27:42] ugh
[18:27:49] (tried reboot a few times, just says failed every time)
[18:27:53] paravoid: around?
[18:28:08] paravoid: how did you solve that grub issue on the instance that wouldn't come back up?
[18:28:56] * Damianz thinks he's going to sort this instance tonight by either resizing or re-installing it as it's next to useless currently.
[19:31:24] http://opengeoserver.org/ that's freaking cool
[19:40:37] OMG
[19:40:39] A MAP
[19:40:40] OMG
[19:44:04] * Damianz gives Reedy his pills
[19:48:55] i could use some pills, too
[19:50:09] You take the blue pill - the story ends, you wake up in your bed and believe whatever you want to believe. You take the red pill - you stay in Wonderland and I show you how deep the rabbit-hole goes.
[19:56:05] Ryan_Lane: Don't suppose you got to fixing that instance or where you waiting for paravoid to re-appear?
[19:56:14] didn't get a chance yet
[19:56:44] kk
[19:57:04] * Damianz goes to find some clothes to throw in the washer
[20:09:42] !log editor-engagement running dpkg --configure -a on ee-prototype to fix broken dpkg
[20:09:43] Logged the message, Master
[20:10:13] aha, ganglia-monitor ..encountered errors
[20:10:22] Parse error for '/etc/ganglia/gmond.conf'
[20:10:40] Starting Ganglia Monitor Daemon: unexpected token '}'
[20:11:43] tasty... that's managed by puppet
[20:13:32] │ Override local changes to /etc/pam.d/common-*? │
[20:14:54] I'd say yes, puppet should replace anything needed after
[20:17:10] Errors were encountered while processing:php5-memcached
[20:17:13] sigh
[20:18:00] /var/lib/dpkg/info/php5-memcached.postinst: 9: php5enmod: not found
[20:18:49] !log editor-engagement on ee-prototype: removed ganglia-monitor package having issues, upgraded packages with dist-upgrade (incl. mysql,kernel,..lots), override local changes to pam.d/common-*: yes, reinstalled ganglia-monitor, still has issues with ganglia-monitor and also php5-memcached
[20:18:50] Logged the message, Master
[20:20:51] !log editor-engagement ee-prototype is on lucid, want to try a precise upgrade in place?
[20:20:52] Logged the message, Master
[20:21:18] Probably should deploy new boxes over distupgrade
[20:22:54] dist-upgrade is still within lucid. to go to precise it would be like do-release-upgrade -p
[20:23:51] from personal experience the do-release-upgrade also seems to work fine, but deploying a new instance is cleaner. let the people in the project decide
[20:24:19] such silly naming
[20:26:17] mutante: idle question, what's the motivation to upgrade the ee-prototype host?
[20:27:06] chrismcmahonbrb: we have been talking about Nagios criticals and this instance reported broken DPKG, so no upgrades could be installed at all and there were lots of packages with available upgrades [20:27:35] i just picked it as a random example which had broken DPKG and i was already a user on [20:28:03] mutante: thanks, I was wondering because we're mirroring the critical software on that host on beta labs right now (more or less, still got a maintenance hole here and there) [20:29:40] chrismcmahonbrb: yw, for your info though it still has issues now, but you have quite a few packages upgraded, but stay within lucid [20:30:31] i did not reboot it either [20:30:43] but it could use a reboot since it got a new kernel [20:32:19] i wouldn't just do that without asking but since we enabled auto-upgrades for packages installing them would have happened automatically if there would not have been these issues [20:35:29] rebooting might be a bad idea [20:36:02] dunno if Ryan's issue is random or annoyingly standard [20:56:10] Does anyone here have experience with cgroup? [21:04:25] You means cgroups? [21:05:08] Exactly. [21:06:18] I know about it, but haven't used them [21:06:30] Only used them a small bit [21:07:20] I was just playing around with this and wondered if we have any practical usage. [21:08:07] People still go for limits.conf file for memory/cpu fine tuning. But, this one seems like a real good stuff. [21:08:14] Potentially for shared instances like bots [21:08:26] More flexiable than ulimits [21:08:58] Yes, can be useful where more than one apps are running on a single server. [21:09:18] I see systemd has taken advantage of that very much. [21:09:29] !log centralauth Setup en.review1-MW-installation [21:09:31] Logged the message, Master [21:10:54] Not sure about ubuntu, but fedora has implemented systemd a while now and it does a good job in controlling services etc. [21:10:54] I always liked Solaris's smf thingy. systemd is close enough. 
[21:11:45] yeah, could be good for things like bots [21:11:48] for resource limiting [21:11:51] not for security limiting [21:11:52] Ryan_Lane: Why is it not allowed to create domains with a "." (subdomains) [21:12:15] I don't really know my original reasoning for that [21:12:26] likely because it's a subdomain, and it could be inside of another entry [21:12:59] Because I want to create a subdomain of centralauth.wmflabs.org [21:13:15] we'd need to make a centralauth domain for that [21:14:33] technically if you have a subdomain and a zone the zone should take priority [21:14:35] I don't think this is a good solution for the time when Labs will come out of beta [21:17:53] Ryan_Lane, if you don't mind, can you please have a look and let me know if this is any good? [21:17:54] https://gerrit.wikimedia.org/r/#/c/26642/ [21:18:15] Jan_Luca: we're going to be switching out the dns code with better dns code soonish [21:18:16] Ryan_Lane: Can you create such a domain? [21:18:20] I can, yes [21:18:27] Thank you [21:19:28] done [21:19:37] you should be able to add an entry under that domain now [21:19:41] let me know if not [21:19:47] the dns code is quite buggy [21:23:55] Ryan_Lane: It seems to work without any problem, thank you [21:24:01] great [21:24:10] let me know if you run into any other issues [21:24:26] adminxor: hm [21:24:35] I started looking at it earlier [21:24:48] ohh thanks [21:28:51] adminxor: reviewed [21:28:53] see comments [21:29:58] Thank you! [21:53:48] andrewbogott: Any reason you can't just stick the port in the server var rather than another if? 
Seems wasteful [21:53:54] Errr adminxor I mean [21:55:14] I think I messed up something :) [21:55:47] MBPs need a preamp, sucks using decent headphones [21:57:56] I'd say you made a new commit rather than ammending the first which is sorta fine to do but also a little messy :) [22:00:08] Damianz, yes, the same git review didn't work so i was cheking out the new git change [22:00:49] BTW, how do you suggest I fix the port for the server? [22:01:09] Do we have any ?: operator in puppet? [22:01:14] Just like C? [22:01:46] I don't think so [22:04:05] http://pastie.org/private/dik7fkmst3w2rsctt8egiw [22:04:11] any reason that wouldn't work? [22:04:21] I assume it's talking to a standard syslog server [22:04:51] * Damianz needs to fix his tab setup in vim... [22:06:47] ahh...what i was thinking!! Thank you, Damianz [22:06:57] That should work fine [22:07:06] Probably confused by the mess that is the puppet repo heh [22:08:00] Also we really need a pastebin, I might take another hash at trying to figure out the openid extension ... or maybe sunday when I don't have stuff to do before going away. [22:08:12] yeah and a little scared of screwing up things too! [22:12:10] Damianz, can I revert the change I pushed it now? [22:12:56] When you fetch the branch? 
you can use git checkout -- [22:13:19] Damianz: we can have pastebin when someone fixes openid :) [22:13:38] Ryan_Lane: I started looking at it, saw the login hack and ran away quickly ;) [22:13:42] I spent a few hours look at it [22:14:04] deciding the provider stuff needed a decent sized refactor and decided I didn't have time for it [22:15:04] I was half tempted to just write a provider extension and KISS [22:15:16] But the openid spec is a bit urgh worded around implimentation [22:23:04] ugh [22:23:12] the reboot issue with grub is fixed in newer images [22:23:32] Barras: this page says I should ask you if I have a questions about cloaks: http://meta.wikimedia.org/wiki/IRC/Cloaks#People_who_deal_with_Wikimedia_cloaks [22:23:48] Barras: please let me know if I misunderstood it [22:24:00] yeah, right [22:24:15] Barras: I am trying to fill Wikimedia Cloak Request form [22:24:26] let's take it to pm [22:25:27] Ryan_Lane: Well if you fix the types we can use new iamges soon for the mysql servers ;) [22:25:49] well, I should be able to fix grub on the instances [22:26:02] and upgraded packages just work [22:26:11] so once the problem is solved once, it should be fine [22:26:52] I wish paravoid had documented his solution for this [22:27:13] He might have pastebined it if we had a pastebin =D [22:27:24] though talking of docs, the current ones are wrong since we changed things [22:27:29] (for openstack at least) [22:28:44] maybe I can just run grub-install on the disks's device [22:29:28] yeah [22:30:40] and gerrit is down again [22:30:47] le sigh [22:31:19] we could improve this by including unicorns on the 503 error page ;D [22:33:27] That looks better, personally I'd of abandoned 26719 and edited 26642 since it's a crappy dependancy but the content looks about right. [22:35:46] 10/04/2012 - 22:35:46 - Created a home directory for dzahn in project(s): mobile-sms [22:36:01] https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-0000049c <— looks…. 
odd [22:36:06] gerrit is down? [22:36:21] it randomly just 503's for like 20seconds [22:36:22] works for me [22:36:23] probably puppet [22:36:37] maybe [22:37:39] looks like a lot of disk space for the type [22:38:02] it looks like 49c didn't ever build correctly [22:38:10] seems like there might be an issue with security groups, in mobile-sms project, there are rules to allow ports 6667 and 8080 but seems like they arent [22:38:10] bots-sql2 is yours right? [22:38:17] yes [22:38:24] mutante: were they just added? [22:38:29] Ryan_Lane: yea [22:38:39] nova has some bug, which is fixed upstream [22:38:44] older rules work, like the one for port 80 [22:38:52] it occasionally takes a really long time to apply security group rules [22:39:08] ah, thanks [22:40:33] its image id is listed as: ubuntu-kernels/ubuntu-lucid-amd64-linux-image-2.6.32-28-virtual-v-2.6.32-28.55-kernel [22:40:34] that was it, i hear it works now [22:40:35] We need a community member with the nick 'nova' then everything can be their fault [22:40:36] that's weird [22:40:48] 10/04/2012 - 22:40:48 - User dzahn may have been modified in LDAP or locally, updating key in project(s): mobile-sms [22:40:59] that's for 049c [22:41:09] bots-sql2 has it listed correctly [22:41:31] ah [22:41:32] crap [22:41:34] I see the problem [22:41:48] missing on virt5? [22:41:53] the kernel and loader show up in the image list [22:41:59] and someone chose that [22:43:28] bots-sql2 OOM'd [22:44:39] it's in the shutoff state [22:44:47] bah [22:44:50] fucking nova [22:44:55] well, fucking essex [22:45:01] ERROR: Cannot 'reboot' while instance is in task_state rebooting (HTTP 409) [22:45:16] someone in the openstack channel I was a noob because I didn't want to muck in the database when this happened [22:45:19] it oomd? 
[22:45:35] same person that called me a noob when I didn't want to muck in the database for extending a network [22:45:35] it had oomkillerified loads of stuff but it didn't go down till I hit reboot [22:45:39] yeah [22:45:40] then never came back :( [22:45:49] nova tried to reboot it and it failed for some reason [22:45:53] ah [22:45:55] so it's stuck in the rebooting state [22:45:58] awesome [22:46:04] and you have to go into the damn database to fix it [22:46:11] there's a way to force this in folsom, thankfully [22:46:12] this is the box that needs more ram hence types fixing etc. sometime, probably late next week when I get back home. [22:46:17] yeah [22:46:25] I can try a resize for you in a sec [22:46:27] on a test instance [22:46:36] if it works, we can try on bots-sql2 [22:46:44] that would be awesome [22:46:45] lemme get the instance back up first, though [22:46:58] since I don't really want to dump like 7gb of data, create a new box and import it heh [22:47:30] You'd have thought if it failed a reboot and you hit reboot again it would go meh failed but I'll try again [22:48:35] libvirtError: Domain not found: no domain with matching name 'i-000000af' [22:48:36] hm [22:48:57] maybe that's why it failed lol [22:49:12] it wasn't defined in libvirt it seems [22:49:32] maybe it'll work this time [22:49:37] I manually defined it [22:50:09] Considering libvirt has a damn good api with half decent python bindings this shit should be simple to logic test against...
[22:50:27] well, this may be due to a cold migration [22:50:48] maybe my script missed this [22:51:08] it's ponging [22:51:46] it should be up [22:51:50] it is [22:51:53] great [22:51:55] stupid nova [22:51:58] or stupud me [22:52:00] one of the two [22:52:56] !log wikistats added docs and how to build the .deb [22:52:57] Logged the message, Master [22:53:13] !log bots bots-sql2 is back from a long reboot [22:53:14] Logged the message, Master [23:16:18] damn adminxor left [23:17:03] I wish the gerrit rebase button let you chose a rebase target [23:18:07] heh [23:18:13] it just does it against head, right? [23:18:27] if it has dependencies, against its dependencies [23:18:57] urgh, that sucks [23:20:00] well, it makes sense [23:20:12] if you have 10 deps in the chain, you don't want it to lose the deps [23:20:28] I assume it doesn't rebase all the deps in the chain though? [23:21:35] I don't think so [23:21:41] but it may [23:22:05] Meh, I so prefer working on branches and 1 review to merge the branch [23:22:54] I'm fine with this model [23:22:58] * Ryan_Lane shrugs [23:24:09] The downside is you either smash weeks of work into 1 commit which you have to refer back to gerrit to see or you have dozens of deps that need to be reviewed (some of which may fix issues in other commits that would otherwise cause it not to be merged) [23:24:58] wow. man. [23:25:11] puppet's dsl is such a horrible piece of shit [23:25:18] I'm pretty sure case doesn't want a , at the end of the case [23:25:27] and yes it is [23:25:42] that's why I say it's terrible [23:25:52] why do I need a comma in hashes, and not in cases? 
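For reference, a minimal sketch of the comma inconsistency being complained about here. The variable names and port values are made up for illustration; only the syntax is the point. Hash entries and selector branches are comma-separated, while the branches of a case statement are not.

```puppet
# Hypothetical values; only the punctuation matters here.

# Hash: entries are separated by commas.
$syslog_ports = {
  'plain' => 514,
  'tls'   => 6514
}

# Selector (the closest thing Puppet has to a ?: operator): branches take commas too.
$admin_group = $::operatingsystem ? {
  'Solaris' => 'wheel',
  default   => 'root'
}

# Case statement: no comma between branches.
case $::operatingsystem {
  'Ubuntu': {
    $pkg_provider = 'apt'
  }
  default: {
    $pkg_provider = undef
  }
}
```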
[23:26:13] totally [23:26:20] imo they should exist in both [23:26:23] it's so unbelievably inconsistent [23:26:25] ye [23:26:27] yes [23:26:31] it's like php =D [23:26:34] but most importantly, it should be consistent [23:26:37] I hate php, too [23:26:49] they actually refused bug report on the grounds that it's too hard to impliment something consistant in behaviour [23:27:03] which I nearly swore at them for [23:28:24] hahaha [23:28:28] ^^ [23:28:41] giftpflanze: ? [23:28:55] i found that funny [23:28:58] heh [23:29:28] Ryan_Lane: I'm currently playing locally with puppet (inside a vagrant/virtualbox vm) to puppetize TestSwarm [23:29:48] cool [23:29:49] you'll find that painful [23:29:55] Ryan_Lane: I see that most libs for web-app puppetization provide a define that takes params [23:30:09] well... [23:30:19] the model is to use modules [23:30:29] inside the modules use paramaterized classes [23:30:33] then make role classes [23:30:38] Ryan_Lane: e.g. define testswarm::site( ..... ), and in node integration ( testswarm::site { 'swarm.localhost', mysql_user => .. } [23:30:40] call the modules from the role classes [23:30:55] defines aren't what you want [23:31:03] you want parameterized classes [23:31:11] Why not? One can install multiple testswarms [23:31:22] class can only be used once in puppet right? [23:31:27] hm [23:31:28] yes [23:31:30] (due to resources having to be unique) [23:31:33] I ran into this [23:31:36] I had a class first [23:31:45] I'm still lost as to what should be a module and what should be a role class... I tend to just band everything in modules [23:32:04] Damianz: configurables shouldn't go into a role [23:32:06] err [23:32:07] into a module [23:32:13] hack, even package has to be unique, that sucks pretty bad. 
Can't reliably do package { 'foo' : ensure => present } and then Package['foo'] [23:32:15] they should be defined in the role and passed into the module [23:32:34] Krinkle: well, you should do as much as possible in a class [23:32:41] then use defines for the things you can't [23:32:47] you say that, but I'd end up with a role class configuing other classes and that's it and then like maybe 2 3 line classes in a module [23:32:51] so, the package installation and dependencies go into a module [23:32:56] the site creation goes into a define [23:33:03] Ryan_Lane: sure [23:33:05] Damianz: that's fine [23:33:12] Ryan_Lane: (its not that much code, got it all in one define right now) [23:33:17] Ryan_Lane: anyway, I got a question [23:33:23] Damianz: if you put the config items into the module, they aren't re-usable, and then what's the point? [23:33:31] well... [23:33:41] https://gerrit.wikimedia.org/r/#/c/26441/ < I just think that's ugly and a mess and the files should be part of the module [23:33:44] Ryan_Lane: since we have public manifests, how/where do we call testswarm::site { 'wm-integration swarm': mysql_password => ''' [23:33:59] it's also possible to make the roles modules too [23:34:30] Krinkle: that needs to go into the private repo [23:34:37] I couldn't find existing examples of it (do we have wordpress install puppetized?) [23:35:04] the mediawiki role stuff is an example [23:35:07] for labs [23:35:11] Ryan_Lane: the entire call, or do we use private variables or somethign that we then refer to? [23:35:19] private variables [23:35:37] it's used all over the place. not too hard to fine [23:35:39] *find [23:35:43] look at the ldap role [23:35:45] Right, since we want node ... { } to be public as well as which parameters to pass to testswarm::site [23:35:47] or the nova one [23:35:54] cool [23:35:58] I'll check it out [23:36:09] this is my first ever .pp file, so I might have more questions later. 
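A rough sketch of the layout described above: a parameterized class in the module for the install-once parts, a define for the per-site parts, and a role class that holds the configuration and passes it in. All file paths, parameter names, the template name and the `passwords::testswarm` class are assumptions for illustration, not what is actually in the repo.

```puppet
# modules/testswarm/manifests/init.pp (hypothetical)
# Packages and dependencies: declared once per node.
class testswarm($install_dir = '/srv/testswarm') {
  package { 'php5-cli':
    ensure => present,
  }
}

# modules/testswarm/manifests/site.pp (hypothetical)
# A define can be declared several times, once per swarm site.
define testswarm::site($mysql_user, $mysql_password) {
  file { "${testswarm::install_dir}/${name}.json":
    ensure  => present,
    content => template('testswarm/site.json.erb'),  # assumed template name
    require => Class['testswarm'],
  }
}

# manifests/role/testswarm.pp (hypothetical)
# The role holds the configurables and passes them into the module.
class role::testswarm {
  include passwords::testswarm  # assumed class living in the private repo

  class { 'testswarm':
    install_dir => '/srv/testswarm',
  }

  testswarm::site { 'swarm.localhost':
    mysql_user     => 'testswarm',
    mysql_password => $passwords::testswarm::mysql_password,
  }
}
```

With this split the node definition and the parameter names stay public, and only the value of the password variable has to live in the private repo.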
[23:36:27] but so far their docs are pretty good, and I learn by example in the repo and the interwebs [23:36:50] Damianz: in this change almost everything you are doing in roles should be in modules [23:36:52] Ryan_Lane: regarding package {} having to be unique though, is there a solution for that you prefer? [23:37:15] Ryan_Lane: Even the configuration of the mysql class? [23:37:17] Krinkle: define it in the class, only include the class once for the node [23:37:23] I read that some people like to keep them out and put in install instructions that those are simply assumed to be declared somewhere [23:37:36] Damianz: yes [23:37:37] The reason it's a role is a) puppet couldn't see the class in a module and b) a wrapping role class caused conflicts [23:37:56] Ryan_Lane: well, that gives a problem. that only works if you own all pp files [23:38:13] Krinkle: why? [23:38:14] we don't write everything ourselves right? e.g. plugin in a mysql manifest or whatever [23:38:21] I mean, we shouldn't have to [23:38:25] it doesn't matter [23:38:41] as long as we only include the package once in the catalogue's parse tree [23:38:44] package throws up if two files do package { 'php5-cli' : ensure => present } [23:38:45] people get grumpy when you include other modules :P [23:38:50] meaning, the tree of includes for a single node [23:39:03] So we can only have 1 piece of software that uses php :P [23:39:07] great xD [23:39:14] as a side note some stuff in modules is crap anyway because we have random package definitions that conflict so are never really going to be re-usable. [23:39:15] Krinkle: yeah, this is a known and shitty problem [23:39:41] Exactly [23:39:57] So I wonder if there is some big boy solution to this [23:40:07] there's a way to say "only import this if it isn't already imported" [23:40:13] but, doing that for every package is lame [23:40:14] or are there simply no "perfect" modules one can just download and plugin and use in your own node manifests? 
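The "only import this if it isn't already imported" check is presumably Puppet's `defined()` function; that reading is an assumption, the log does not name it. A minimal sketch of both approaches, using a hypothetical `php::cli` class name:

```puppet
# Option 1: guard the declaration. This depends on evaluation order:
# defined() only sees resources that were declared before this line is evaluated.
if ! defined(Package['php5-cli']) {
  package { 'php5-cli':
    ensure => present,
  }
}

# Option 2 (the one recommended in the discussion above): declare the
# package in exactly one class and include that class wherever it is needed.
class php::cli {
  package { 'php5-cli':
    ensure => present,
  }
}

include php::cli  # including the same class again elsewhere is harmless
```

Option 2 only helps when every manifest on the node agrees on which class owns the package, which is the problem with third-party modules raised above.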
[23:40:33] there's no perfect modules [23:40:43] now you see why I complain about puppet so much :) [23:40:45] You mean with realise? [23:40:52] no [23:41:03] there's a way of checking if a resource is already defined [23:41:08] interesting [23:41:43] btw, I read that resource names are globally unique, that's right, right ? [23:41:49] Cloning into .... < so awesome git [23:42:00] Krinkle: well, kind of, yes [23:42:22] Krinkle: a resource can't be defined twice, globally [23:42:26] how global is global. e.g. can there be exec { "foo": } and mystuff::do { "foo" } and require them elsewhere without conflict? [23:42:37] that's different [23:42:43] it's by type too [23:42:48] okay good [23:43:12] that's why you can have a file, a service and a package all with the same name [23:43:35] but if one does like mysql::database { "foo": } and mysql::user { "foo": } and both do exec { "setup stuff $name": ..something else here } [23:43:46] then you are SOL [23:44:09] (e.g. imagine both do that internally in the mysql classes) [23:44:15] I originally did database stuff in the openstack stuff and decided to yank it out [23:44:16] of course that'd be stupid [23:44:20] it was more work than it was worth [23:44:25] Change on 12mediawiki a page Developer access was modified, changed by LouisDang link https://www.mediawiki.org/w/index.php?diff=590646 edit summary: /* User:LouisDang */ [23:44:44] but if it is mysql::database { "foo" } and testswarm::site{ "foo" }, then it becomes more likely [23:45:04] just prepend your resource names [23:45:13] it works for most things [23:45:14] so just using $name isn't enough, the part around that also has to be unique, ideally include the class [23:45:35] Krinkle: is this fun yet? :) [23:45:45] class name* [23:45:51] Ryan_Lane: :/ [23:46:07] I like to watch "vagrant up .." though, its nice [23:46:08] ruby is so powerful because you can make your own dsls! 
[23:46:23] spin up sqeeze, install puppet, and it goes and installs all I told it to [23:46:27] and then I can ssh into it. [23:46:28] *note: your own dsl is going to suck [23:46:39] yeah. vagrant is nice for that [23:46:40] dsl? [23:46:46] domain specific language [23:47:16] can you elaborate a little bit as to what it would be useful for in this context? [23:47:24] I was making a joke [23:47:29] puppet is a dsl written in ruby [23:47:39] and it makes kittens cry with its horribleness [23:47:46] right, it is written as DSL I didn't know that [23:47:47] interesting [23:48:11] this is why I like saltstack [23:48:17] it uses YAML and Jinja templates [23:48:26] its pretty cool though how it uses templates. I was afraid it only had <% varname %> but lots of other ruby as well [23:48:32] (erb) [23:48:35] and you can change the state engine file by file to use straight python [23:48:46] conditions etc. [23:48:54] yeah [23:49:10] erb is basically chopped down ruby [23:49:18] it's like how jinja compares to python [23:50:06] Ryan_Lane: So I know now how crazy this is going to sounds, but yesterday I was looking for a way to parse JSON, extend it and to string - from an erb template. See, testswarm has its config in a JSON file, that's cool, except that the 3 config vars I need to puppetize isn't all there will be. One will be able to add browsers etc. without having to go through puppet. 
[23:50:18] Anyway, I ended up working around it by templating a php file instead [23:50:30] Basically using erb template to patch a file [23:50:32] you can actually do that in erv [23:50:34] err [23:50:35] erb [23:50:38] it can call ruby [23:51:00] Yes, I realised that later, but I'm happy I didn't do it [23:51:02] it should be able to anyway [23:51:02] heh [23:51:14] since that would require puppet to run to re-create the json file [23:51:21] yeah [23:51:40] this is actually one spot where salt would help [23:51:58] you can make a call to regenerate the file [23:52:32] can puppet (within sanity limits) do partial templates? e.g. where there is a file that is going to be modified by humans from time to time (outside puppet), and puppet will maintain part of the file? [23:52:44] e.g. between markers. I know it can do that for crontab [23:52:44] I don't think so [23:52:57] but not sure about regular files directly [23:53:18] one thing we do is create a .d directory with all the files [23:53:24] an have an exec combine them [23:53:31] hmm [23:53:37] I might move this now before sleep [23:53:51] need to kill the apache stuff anyway [23:54:02] Damianz: the bots-sql stuff? [23:54:06] lemme do a quick resize test [23:54:14] puppet stuff [23:54:23] though could do bots-sql stuff also, it's only 1am [23:55:59] ok. attempting resize [23:56:03] let's see how it goes :D [23:56:42] wow. seriously? [23:56:51] it needs to fucking ssh back into itself for this? 
[23:57:11] Ryan_Lane: interesting, some of those files could be puppetized and some not [23:57:13] and now it's in an error status [23:57:22] (only ensure present, replace => false) [23:57:24] Damianz: so, let's not resize that instance [23:57:32] o.0 [23:57:38] ok [23:57:50] I'll re-install it at some point if you fix the types [23:58:33] * Ryan_Lane nods [23:58:46] Command: ssh 10.4.16.8 mkdir -p /var/lib/nova/instances/i-00000255 [23:58:49] ^^ seriously [23:58:51] no joke [23:58:54] awesome [23:59:05] it re-uses the live-migration code [23:59:19] *why* does it re-use all of it, though? [23:59:23] ok so if I move these classes to a module do I need to implement a role class if I don't need to deal with args? I assume module stuff is auto loaded