[00:00:02] they know the clustered systems, since they are setting up a clone of the cluster inside of labs [00:00:04] I'd still like to change puppetmaster::self so you can store the data in /data/project/somefolder and have a cluster of instances configured on the same repo. [00:00:12] Damianz: yeah, that would be nice [00:00:30] rocsteady: also, we have a number of things that are already half-way done to being turned into modules [00:00:38] most of the things that are in the roles directory [00:00:42] Still trying to think of a way to make checkins not suck though =/ [00:00:46] taking them all the way there would be great :) [00:01:00] * Damianz goes back to watching a bearded guy talk about random stuff. [00:01:03] :D [00:01:11] I'm not sure what checkins are. [00:01:26] I'm with Roan at a cafe though in San Francisco. He's helping me get started. [00:01:46] I decided to stay in town a little longer so I could familiarize myself with the workflow before I head back to Portland to help remotely. [00:01:49] Say hi to Catrope [00:02:09] oh. cool. [00:02:17] that's awesome [00:02:24] tell roan I said thanks for helping out [00:03:25] I kinda wish puppet had the searching capacity of chef. Auto configuring stuff based on other stuff is still meh [00:04:17] Damianz: That would be a good thing to tell the puppet guys. They really want to make their software better for the user. [00:04:44] I introduced Roan to Randall Hansen, and he gave Roan his card in case you guys want to swap ideas. [00:05:24] Okay, I'm gonna start working on getting set up. [00:05:32] Thanks! [00:06:31] My vp of tech ops knows Luke (the puppet founder) and Adam (creator of chef) so we tend to have office banter about which sucks less heh. [00:06:31] * RoanKattouw waves [00:06:33] Yeah I tried to help her & jesusaur with this a while ago but somehow their instance got hosed in the middle of the Nova upgrade or something [00:06:45] in the nova upgrade?
[00:06:54] or the instance cold migration snafu? [00:07:10] Actually, it was when they were updating openstack [00:07:22] webs console was down [00:07:32] really? I didn't hear of any instance destruction during that [00:07:36] labs* [00:07:43] * Ryan_Lane did the upgrade [00:07:43] heh [00:07:46] I know there are instances still down, but I gave up bitching about it [00:07:49] I don't remember exactly [00:07:54] It's in the irc scroll back [00:08:01] Damianz: if they are down, it's from the cold migration snafu [00:08:03] I just remember that there was a nova upgrade scheduled for the next Monday, and labsconsole was throwing 500s [00:08:06] lol I was like 'Ryan, labs console is down' and someone else answered lol [00:08:13] when trying to create a new instnace [00:08:34] The existing instance was screwed up somehow, as if puppetmaster::self had been partially installed [00:09:29] GRRRRRRRRRRRRRRRRRRRRR [00:09:34] I seriously fucking hate this script [00:09:35] * Damianz stab [00:09:36] so it never installed properly or something? [00:09:56] or it was running for a while and was destroyed in the upgrade? [00:10:05] * Damianz kicks the shit out of petan's script [00:10:07] I need to know when things die :) [00:10:22] or screw up, so that I can see what happened so it doesn't happen again [00:10:56] ok, I'm just going to replace this with *my* script [00:11:02] Stupid parsing json with grep and awk [00:11:04] Just argh [00:11:05] * Damianz rage [00:11:12] heh [00:12:51] Hello everyone! [00:13:18] Who do I talk to in order to get my account added to the bastion project? [00:17:05] Huji: I can add you [00:17:10] do you already have an account? [00:17:14] I do sir [00:17:18] I am a MW developer [00:17:26] and my account on Gerrit is "huji [00:17:28] " [00:18:15] you already have access [00:18:44] oh really? 
[00:18:46] yep [00:18:47] Because I saw this: https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bastion [00:18:56] But then I don't find myself listed here: [00:18:56] https://labsconsole.wikimedia.org/wiki/Special:Ask/-5B-5BResource-20Type::project-5D-5D/-3F/-3FMember/-3FDescription/mainlabel%3D-2D/searchlabel%3Dprojects/offset%3D0 [00:19:50] hm [00:19:51] ooooooh, I found the problem: I thought that list is alphabetical; it's not! I'm at the end!! [00:20:01] yep [00:20:01] heh [00:22:07] fixed! It is not alphabetical :) [00:22:22] Ryan_Lane : do you also have a few minutes? [00:22:32] sure [00:22:34] what's up? [00:22:40] hmm ok so eqiad isn't online yet [00:22:50] Damianz: not just yet [00:23:04] we want to change some things before we bring it up [00:23:05] sadtimes that it has instances in ldap [00:23:08] like I{v6 [00:23:11] IPv6 [00:23:19] well, technically instances are running on it [00:23:45] Ryan_Lane: I want to understand a few more things about how labs work, and then convey that message to people in Fa WP who are not good at English [00:23:50] can I msg you please? [00:24:01] sure, but it's likely best if it's in channel [00:24:13] that way any questions I answer can be recorded in the channel log [00:24:54] okay. [00:25:09] Is there anyone in here with a computer science degree that can explain what a guard digit actually /does/? [00:25:19] cause my book and the Wikipedia article make zero sense [00:25:33] Ryan_Lane: allow me ask questions one by one; these are all in the light of possible closure of Toolserver [00:25:56] Ryan_Lane: (1) how can one host a project WITH a public IP on the labs? [00:26:06] TParis: guard digit? [00:26:16] Ryan_Lane: Are instance names unique globally? [00:26:18] also called a guard bit? [00:26:19] Huji: any project can have public IPs [00:26:25] Damianz: in region, yes [00:26:28] Used for binary arithmetic? 
[00:26:30] They just have to request a public IP, Huji [00:26:31] Damianz: between regions, no [00:26:32] kk [00:26:40] Ryan_Lane: that is given for free by Labs? [00:26:43] yep [00:26:46] I figure I should be able to find someone in here with a computer degree [00:26:53] TParis: I have one :D [00:27:08] :D So....what's a guard bit? [00:27:08] I'd need to look that up, though [00:27:10] ok [00:27:21] I'm stumped and I can't find an answer that makes sense anywhere [00:27:28] I don't actually do many things that are computer sciency, really ;) [00:27:44] Huji: everything offered is free [00:28:31] Ryan_Lane: (2) a couple of people are running bots in Fa WP using TS. I was wondering if it is better for them to get a Large machine and share it, or get a Tiny machine each. Do we have policies/best practices for that in Labs? [00:28:52] Damianz: ^^ ? [00:29:05] either is fine [00:29:13] it depends on how expensive the bot is [00:29:21] if it's really expensive, generally it's run on its own instance [00:29:29] otherwise it can go on a shared instance [00:30:04] TParis: I've never heard of a "guard bit". I've heard of a "sign bit" [00:30:05] what he said [00:30:21] Ryan_Lane: (3) this is the most important Q; as a CU, I have become aware that AWS IPs are globally blocked, because they have been frequently used to create free open proxies using the AWS free tier. Is that going to affect the Labs-hosted bots? [00:30:34] Roan: I get the sign bit part. My book refers to a guard bit that is used in arithmetic only. It's not stored. [00:30:45] Labs doesn't run on AWS [00:31:02] By the way, I enjoyed your talk in Germany. I've never paid much attention to indexes before. [00:31:09] what is this 'guard bit' in reference to?
[00:31:16] binary arithmetic [00:31:20] [[Guard digit]] [00:31:24] Huji: well, it's possible that Labs ranges could be blocked [00:31:35] Huji: but I believe people in the community are ensuring that doesn't happen [00:31:54] guard digits are used when dividing binary floating point numbers in biased representation. [00:32:00] guard bits is another name [00:32:10] Huji: Labs is completely run inside of our own network (in the same datacenter as production) on our own hardware [00:32:18] we don't share IP address space with anyone else [00:32:20] Ryan_Lane: Want to review some stuff for me? [00:32:26] Damianz: I saw your change [00:32:27] looks good [00:32:35] let me review it tomorrow or monday [00:32:40] I'm leaving really soon [00:32:47] Ryan_Lane: so Labs machines, despite being AWS machines, are all using WMF IPs? [00:32:54] Huji: they aren't AWS [00:32:56] I was thinking about a different one, that one's a bit meh now I added 3 options but can't decide on the correct usage. [00:33:04] Huji: Labs is completely built and maintained by us [00:33:08] Np if you're busy, I'll just annoy Leslie later as it's her realm [00:33:17] Huji: we run OpenStack on the hardware [00:33:17] Ryan_Lane: aah! that's new to me! I remember reading "EC2" somewhere tho [00:33:27] we used to use the EC2 API [00:33:33] which OpenStack is compatible with [00:33:37] we don't even use that any more [00:33:40] we use nova's open API [00:33:48] OpenStack Nova [00:33:53] Ryan_Lane: marvelous! 
That's a big step forward :) [00:33:57] (nova is an openstack project, which is really what we're using) [00:34:08] TParis: maybe the guard bit is a fancy name for some part of the number that tells you if you have +/- zero or NaN [00:34:14] it would be way too expensive to do this on cloud services :) [00:34:27] bleh [00:34:27] also, we don't use third party services, generally, as a rule [00:34:42] Ryan_Lane: in theory, it should be more expensive locally, but I realize cloud services kind of overcharge people [00:34:45] i dont recall the specifics of IEEE floating points [00:34:45] jesusaurus: That's the sign bit...mostly. [00:35:10] Huji: well, we already have datacenters, and we get donated hardware [00:35:12] I think I got it now, I had to stare at the Wikipedia article some more [00:35:25] i thought there was something about the first bit or two in the mantissa as well [00:35:30] and we're expecting to run a *lot* of instances in the future [00:35:40] my estimate is 1000 or more in 2-3 years [00:35:47] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af output: CHECK_NRPE: Socket timeout after 10 seconds. [00:36:15] Ryan_Lane: you've been patient and really helpful. That makes me want to ask yet another question! [00:36:21] jesusaurus: The first bit is the sign bit, the next 8 are the exponent, and the last 23 are the significand. [00:36:23] assuming little bandwidth usage (which is unlikely), that's at minimum $50,000 per year. likely more like $200,000 or more [00:36:29] But my book is crap and the school really needs to find something better. [00:36:42] It pisses me off that these schools use books written by their own professors. [00:36:58] When the books are crap but the prof has a COI and wants to make $$$ so he makes the book the textbook [00:37:07] There should be ethical issues with that.... [00:37:17] Anyway, now I am spamming -labs with my homework... [00:37:19] Man, floating point internals... 
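For reference, the single-precision layout described at 00:36:21 (1 sign bit, then 8 exponent bits, then 23 significand bits) can be checked with a short sketch. The function name `float32_fields` is made up for illustration; it only uses Python's standard `struct` module and does not model guard bits, which are extra working bits kept during arithmetic and are never stored in the result.

```python
import struct


def float32_fields(x):
    """Split x (rounded to IEEE 754 single precision) into (sign, exponent, fraction).

    The exponent is returned in its stored, biased form (bias 127).
    """
    # Pack as a big-endian 32-bit float, then reinterpret the same bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits
    fraction = bits & 0x7FFFFF       # last 23 bits (implicit leading 1 is not stored)
    return sign, exponent, fraction


print(float32_fields(1.0))
print(float32_fields(-0.15625))
```

Subtracting the bias of 127 from the stored exponent gives the actual power of two, so a stored value of 124 means the number is scaled by 2^-3.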
[00:37:23] heh [00:37:32] Hey it's fine I'm also doing homework :) [00:37:39] Huji: ask as many questions as you'd like [00:37:41] Ryan_Lane: how is the Wikimedia DB replication going to be accessible on Labs projects. Say I create an instance, with my code on it, and I want the code to query the DB [00:38:00] replicated databases will be accessible to every instance in every project [00:38:01] Ryan_Lane: or, even more, say I want to get to the mysql prompt, just like how I do in Toolserver. [00:38:14] it should work similarly [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on hugglewiki i-000000aa output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on search-test i-000000cb output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on bots-4 i-000000e8 output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on patchtest2 i-000000fd output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on test2 i-0000013c output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on secondinstance i-0000015b output: Puppet has not run in the last 10 hours [00:38:19] PROBLEM Puppet freshness is now: CRITICAL on bots-dev i-00000190 output: Puppet has not run in the last 10 hours [00:38:20] PROBLEM Puppet freshness is now: CRITICAL on robh2 i-000001a2 output: Puppet has not run in the last 10 hours [00:38:20] PROBLEM Puppet freshness is now: CRITICAL on swift-be3 i-000001c9 output: Puppet has not run in the last 10 hours [00:38:21] PROBLEM Puppet freshness is now: CRITICAL on resourceloader2-apache i-000001d7 output: Puppet has not run in the last 10 hours [00:38:21] PROBLEM Puppet freshness is now: CRITICAL on pediapress-packager i-000001e4 output: Puppet has not run in the last 10 hours [00:38:22] PROBLEM Puppet freshness is now: CRITICAL on wikidata-dev-1 
i-0000020c output: Puppet has not run in the last 10 hours [00:38:24] Ryan_Lane: do I have to connect to some remote DB? [00:38:25] ack [00:38:39] RECOVERY Puppet freshness is now: OK on hugglewiki i-000000aa output: puppet ran at Sun Sep 30 00:38:33 UTC 2012 [00:38:41] Huji: we haven't set it up yet, so I'm not sure [00:38:49] RECOVERY Puppet freshness is now: OK on search-test i-000000cb output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on bots-4 i-000000e8 output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on patchtest2 i-000000fd output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on test2 i-0000013c output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on secondinstance i-0000015b output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on bots-dev i-00000190 output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:49] RECOVERY Puppet freshness is now: OK on robh2 i-000001a2 output: puppet ran at Sun Sep 30 00:38:34 UTC 2012 [00:38:54] heh [00:38:55] :) [00:38:56] I hate you bot [00:38:57] Huji: it would be nice to get a use case on how it's used now, and how you guys would like it to work [00:39:00] not restarting you [00:39:07] Damianz: :D [00:39:15] * Damianz waits for all the checks to catch up [00:39:24] Damianz: thanks for handling nagios [00:39:33] I haven't seen petan around in a while [00:39:38] what's he been up to? [00:39:45] * Damianz has a secret monitoring fetish that he'd like to expand into more st00f [00:39:48] Dunno [00:40:14] !log nagios Parser was pootched due to json format changes. Switched to https://github.com/DamianZaremba/labsnagiosbuilder which pulls from ldap and should be stable (moving to gerrit slowly). 
[00:40:23] oh fuck you adminbot [00:40:26] :D [00:40:31] I really need to fix that bot [00:40:42] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af output: 896700 [00:40:48] !log nagios Parser was pootched due to json format changes. Switched to https://github.com/DamianZaremba/labsnagiosbuilder which pulls from ldap and should be stable (moving to gerrit slowly). [00:40:53] !log nagios Parser was pootched due to json format changes. Switched to https://github.com/DamianZaremba/labsnagiosbuilder which pulls from ldap and should be stable (moving to gerrit slowly). [00:40:58] * Damianz frown [00:41:01] :( [00:41:01] Yeah [00:41:06] It's doing the can't edit page thing [00:41:09] hm [00:41:12] PROBLEM Total processes is now: CRITICAL on search-test i-000000cb output: NRPE: Command check_processes not defined [00:41:12] PROBLEM Total processes is now: CRITICAL on nova-dev3 i-000000e9 output: NRPE: Command check_processes not defined [00:41:12] PROBLEM Total processes is now: CRITICAL on labs-relay i-00000103 output: NRPE: Command check_processes not defined [00:41:12] PROBLEM Total processes is now: CRITICAL on firstinstance i-0000013e output: NRPE: Command check_processes not defined [00:41:17] this must be due to us using the same user [00:41:20] bleh puppet ran [00:41:35] Well it's not netsplits so yeah I think it's a token thing [00:41:38] did you need a review to fix this? [00:41:39] I couldn't see what though [00:41:47] nagios, that is [00:42:09] Nah, my only pending review is to fix misc output... that check if just broken think I typed it wrong [00:42:12] Ryan_Lane: Well, I'm asking more to learn the whole model than a special case. 
Let me explain why I'm interested [00:42:27] Huji: well, so far we don't have that in the model :) [00:42:35] since it isn't implemented yet [00:42:36] Ryan_Lane: if the replica is going to be MySQL based, then, to the best of my knowledge, it can't be run on several machines [00:42:44] yeah, it's going to be mysql [00:42:54] and we have some method of dealing with that, apparently [00:43:15] not using trainwreck, like the TS [00:43:38] binasher (who isn't online right now) would have more info about that [00:43:48] right, that should be fixed [00:44:27] Ryan_Lane: Hmm [00:44:46] So i-00000026 exists in ldap (new region), not in dns though. Is that region not hooked up dns wise? [00:45:03] Damianz: it's not in DNS? [00:45:07] oh wait [00:45:12] ignore me, it's not in ldap apparently [00:45:12] it has to be [00:45:14] ah [00:45:17] where is it? [00:45:20] how did you spot it? [00:45:32] hm. shit. actually the instance IDs are going to be a problem [00:45:38] hmm [00:45:39] we really need to switch away from using the instance IDs [00:45:41] Processing info for dc=i-00000026,dc=eqiad,ou=hosts,dc=wikimedia,dc=org [00:45:43] wth [00:45:51] Ryan_Lane: I'll be waiting to learn about that method! [00:46:11] Huji: well, ideally, it'll work as much like TS as possible [00:46:21] Now I'm confused [00:46:33] the way TS does a few things is kind of problematic… so I'm not sure if it'll be 100% the same [00:46:34] Ryan_Lane: I wish you ended your sentence with "if not better"! [00:46:45] heh [00:46:50] well, I'm hoping it'll be better [00:47:05] damian@nagios-main:~$ ldapsearch -D $(grep binddn /etc/ldap.conf | awk '{print $2}') -w $(grep bindpw /etc/ldap.conf | awk '{print $2}') | grep 'eqiad' | wc -l [00:47:08] 0 [00:47:10] THAT'S NOT POSSIBLE [00:47:39] I'm seeing it [00:47:39] dn: dc=i-00000026,dc=eqiad,ou=hosts,dc=wikimedia,dc=org [00:47:49] Damianz: is this breaking things?
[00:47:52] Yeah my script is seeing it (which is using those same details) [00:48:10] Ryan_Lane: It's just warning for everything because it can't resolve it against dns [00:48:22] ah. right. we were going to switch to FQDNs for IDs rather than the short names [00:48:26] But I don't see how my script (which uses ldap) sees it but I don't when I do a query unless I'm being dumb [00:48:29] but really, getting rid of the IDs would be nice [00:48:39] how is it checking it in DNS? [00:48:42] fqdn? [00:48:49] short name [00:48:56] that won't work [00:49:01] it's not in the search path [00:49:03] I'd like to move to dns and A record based stuff [00:49:15] you need to use FQDN [00:49:35] That's a little hard to figure out hmm [00:50:08] it's in the A record ;) [00:50:11] err [00:50:18] in the associatedDomain record [00:50:27] well, attribute, not record [00:50:31] you know what I mean [00:50:53] yeah [00:51:00] that contains a record for each though [00:51:07] guess I'll just loop and make a best effort guess [00:51:17] heh [00:51:23] this is why I want to kill off IDs [00:51:36] I need to make labsconsole delete keys from puppet using the API to do so, though [00:51:56] ok. I gotta run [00:51:58] * Ryan_Lane waves [00:52:10] Huji: don't hesitate to ask any questions you have [00:52:21] Huji: I'm always happy to answer them [00:52:22] Ryan_Lane: will do [00:52:49] https://github.com/DamianZaremba/labsnagiosbuilder/commit/a27c8c2b2fca65a48899c7827dc97f611e15d90c should sort it [00:52:55] I think [00:52:56] w/e, it seems to work [00:53:04] Better than crapnotworkingjsonparsing [00:53:15] * Damianz waves at Ryan_Lane [00:59:56] Ryan_Lane: Will do! [01:00:10] I'd love to make nagios use names but unless we have 1 server per region that isn't really possible probably... unless I just prefix them all hmmm [01:21:31] Instance State building - does this mean wait?
[01:21:56] yes [01:22:10] once building finishes look at the console log and wait until puppet finishes [01:22:28] ok that must explain this: utrsweb i-00000464 error [01:24:52] PROBLEM host: i-00000464.pmtpa.wmflabs is DOWN address: i-00000464.pmtpa.wmflabs check_ping: Invalid hostname/address - i-00000464.pmtpa.wmflabs [01:25:48] ^^ that's my guy :P [01:26:25] yeah, I'm running the update script while debugging stuff. We don't have an easy way to tell what's online and what's not currently. [01:26:31] Now it's in nova I could actually do that [01:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [01:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [01:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [01:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [01:33:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
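On the FQDN question from 00:48 to 00:51 (the short instance ID is not in the DNS search path, and the host's associatedDomain attribute holds several values), the "loop and make a best effort guess" idea might look roughly like the sketch below. This is not the code from the labsnagiosbuilder commit linked above; `pick_fqdn` is a hypothetical helper, the list stands in for values already read from LDAP, and the name-based entry `utrsweb.pmtpa.wmflabs` is an assumed example of a second value.

```python
def pick_fqdn(instance_id, associated_domains):
    """Return the first associatedDomain value whose leftmost label is instance_id.

    Returns None when no value matches, so the caller can skip the host
    instead of handing Nagios a name that will not resolve.
    """
    prefix = instance_id.lower() + "."
    for domain in associated_domains:
        if domain.lower().startswith(prefix):
            return domain
    return None


# Assumed sample data: one name-based value and one ID-based value for the same host.
domains = ["utrsweb.pmtpa.wmflabs", "i-00000464.pmtpa.wmflabs"]
print(pick_fqdn("i-00000464", domains))
```

Matching on the ID plus a trailing dot avoids picking up a longer ID that merely begins with the same characters.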
[01:34:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [01:34:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [01:34:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [01:34:12] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [01:34:12] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [01:34:12] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [01:34:42] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [01:35:12] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:35:12] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:35:12] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [01:38:23] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 936536 [01:47:09] right, now it shouldn't check stuff until it's built [01:56:52] PROBLEM Total processes is now: CRITICAL on metavidwiki i-00000465.pmtpa.wmflabs output: Connection refused by host [01:57:32] PROBLEM dpkg-check is now: CRITICAL on metavidwiki i-00000465.pmtpa.wmflabs output: DPKG CRITICAL dpkg reports broken packages [02:00:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [02:00:42] 
PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [02:00:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [02:00:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [02:01:48] RECOVERY Total processes is now: OK on metavidwiki i-00000465.pmtpa.wmflabs output: PROCS OK: 84 processes [02:02:28] RECOVERY dpkg-check is now: OK on metavidwiki i-00000465.pmtpa.wmflabs output: All packages OK [02:04:08] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [02:04:08] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [02:04:48] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [02:04:48] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [02:05:13] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [02:05:33] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [02:05:53] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:06:03] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [02:06:13] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:06:13] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN 
address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [02:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [02:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [02:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [02:33:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [02:34:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [02:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [02:35:42] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [02:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [02:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [02:36:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [02:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: 
i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [02:36:32] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [02:37:22] RECOVERY Free ram is now: OK on ipv6test1 i-00000282.pmtpa.wmflabs output: 404564 [02:38:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1688852 [02:45:22] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282.pmtpa.wmflabs output: 450100 [02:55:32] PROBLEM Free ram is now: CRITICAL on ipv6test1 i-00000282.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [02:58:32] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 960864 [03:00:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [03:00:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [03:00:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [03:00:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [03:04:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [03:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [03:05:42] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [03:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [03:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - 
Host Unreachable (i-000000f8.pmtpa.wmflabs) [03:06:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [03:06:12] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1717228 [03:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:06:32] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [03:16:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1711568 [03:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [03:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [03:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [03:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [03:34:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [03:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [03:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [03:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: 
i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [03:36:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [03:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [03:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [03:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [03:45:22] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282.pmtpa.wmflabs output: 456900 [03:57:52] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Warning: 16% free memory [03:58:22] PROBLEM Total processes is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[04:00:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [04:00:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [04:00:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [04:00:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [04:03:12] RECOVERY Total processes is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: PROCS OK: 84 processes [04:04:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [04:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [04:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [04:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [04:06:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [04:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [04:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:06:42] PROBLEM host: 
i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [04:22:43] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f.pmtpa.wmflabs output: Critical: 4% free memory [04:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [04:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [04:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [04:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [04:32:52] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f.pmtpa.wmflabs output: OK: 95% free memory [04:34:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [04:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [04:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [04:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [04:36:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [04:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [04:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs 
is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [04:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [05:00:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [05:00:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [05:00:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [05:00:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [05:04:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [05:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [05:05:49] 09/30/2012 - 05:05:49 - Created a home directory for hazard-sj in project(s): bastion [05:05:59] 09/30/2012 - 05:05:59 - Creating a home directory for hazard-sj at /export/keys/hazard-sj [05:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [05:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [05:06:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [05:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable 
(i-000002dd.pmtpa.wmflabs) [05:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [05:06:48] :| [05:10:44] 09/30/2012 - 05:10:44 - User hazard-sj may have been modified in LDAP or locally, updating key in project(s): bastion [05:10:53] 09/30/2012 - 05:10:53 - Updating keys for hazard-sj at /export/keys/hazard-sj [05:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [05:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [05:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [05:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [05:34:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [05:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [05:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [05:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [05:36:02] PROBLEM host: 
i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [05:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [05:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [05:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [06:00:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [06:00:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [06:00:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [06:00:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [06:04:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [06:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [06:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [06:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [06:06:02] PROBLEM host: 
i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [06:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [06:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [06:25:23] PROBLEM Free ram is now: CRITICAL on ipv6test1 i-00000282.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [06:29:23] PROBLEM Total processes is now: CRITICAL on ipv6test1 i-00000282.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[06:30:12] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [06:30:22] PROBLEM Free ram is now: WARNING on ipv6test1 i-00000282.pmtpa.wmflabs output: 458188 [06:30:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [06:30:42] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [06:30:42] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [06:34:12] RECOVERY Total processes is now: OK on ipv6test1 i-00000282.pmtpa.wmflabs output: PROCS OK: 124 processes [06:34:42] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [06:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [06:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [06:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [06:36:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [06:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [06:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN 
address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [06:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [06:37:32] PROBLEM Free ram is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 7741876 [06:37:33] !ping [06:37:33] pong [06:42:32] PROBLEM Free ram is now: WARNING on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 7679680 [06:47:42] PROBLEM Free ram is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 7861152 [06:54:12] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1744060 [06:55:12] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [06:55:22] RECOVERY Free ram is now: OK on ipv6test1 i-00000282.pmtpa.wmflabs output: 232012 [07:00:13] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [07:00:43] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [07:00:43] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [07:00:43] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [07:04:43] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [07:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [07:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [07:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs 
CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [07:06:12] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [07:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [07:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [07:30:52] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [07:30:52] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [07:31:52] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [07:32:02] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [07:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [07:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [07:36:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [07:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs 
CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [07:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [07:36:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [07:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [07:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [07:53:12] RECOVERY Free ram is now: OK on dumps-bot2 i-000003f4.pmtpa.wmflabs output: 5584468 [07:55:52] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: All packages OK [08:00:52] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [08:00:52] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [08:01:52] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [08:02:02] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [08:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [08:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [08:06:02] PROBLEM host: 
i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [08:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [08:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [08:06:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [08:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [08:11:22] PROBLEM Total processes is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[08:16:12] RECOVERY Total processes is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: PROCS OK: 83 processes [08:22:32] PROBLEM Free ram is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 7744956 [08:30:52] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [08:30:52] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [08:31:52] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [08:32:02] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [08:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [08:36:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [08:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [08:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [08:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [08:36:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [08:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN 
address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [08:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [09:00:52] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [09:00:52] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [09:01:52] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [09:02:02] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [09:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [09:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [09:06:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [09:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [09:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [09:06:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [09:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN 
address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [09:30:52] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [09:30:52] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [09:31:52] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [09:32:02] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [09:35:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [09:36:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [09:36:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [09:36:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [09:36:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [09:36:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [09:36:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:36:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:36:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN 
address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [09:36:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [09:39:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1712360 [10:01:21] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [10:02:01] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [10:02:01] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [10:02:51] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [10:03:02] !log glam deleting instance as puppet connection is broken ... Sep 30 06:26:49 glam-gwtools puppet-agent[9297]: Could not request certificate: getaddrinfo: Name or service not known [10:03:03] Logged the message, Master [10:05:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [10:06:02] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [10:06:02] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [10:06:02] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [10:06:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [10:06:32] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable 
(i-00000118.pmtpa.wmflabs) [10:06:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:06:32] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:06:32] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:06:42] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [10:16:12] PROBLEM Free ram is now: WARNING on bots-2 i-0000009c.pmtpa.wmflabs output: 1728904 [10:31:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [10:32:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [10:32:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [10:32:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [10:35:23] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [10:36:03] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [10:36:03] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [10:36:03] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [10:36:13] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [10:36:33] PROBLEM host: 
i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:36:33] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:36:33] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [10:36:43] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [10:37:13] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [10:59:13] Change on mediawiki a page Developer access was modified, changed by Quentinv57 link https://www.mediawiki.org/w/index.php?diff=588961 edit summary: /* User:Quentinv57 */ + [11:01:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [11:02:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [11:02:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [11:02:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [11:05:32] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [11:06:32] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [11:06:32] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [11:06:32] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable 
(i-000002dd.pmtpa.wmflabs) [11:06:42] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [11:07:02] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:07:12] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:07:12] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:07:12] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [11:07:52] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [11:12:02] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. [11:16:52] RECOVERY dpkg-check is now: OK on bots-sql2 i-000000af.pmtpa.wmflabs output: All packages OK [11:31:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [11:32:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [11:32:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [11:32:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [11:35:32] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [11:36:32] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [11:36:32] PROBLEM host: 
i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [11:36:32] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [11:36:42] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [11:37:12] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:37:12] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:37:12] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [11:37:22] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [11:37:52] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [11:45:25] Change on 12mediawiki a page Developer access was modified, changed by Mardetanha link https://www.mediawiki.org/w/index.php?diff=588970 edit summary: [11:48:51] do i have to log packages that i install for testing? [12:01:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [12:02:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [12:02:02] PROBLEM dpkg-check is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:02:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [12:02:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [12:05:32] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [12:06:32] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [12:06:32] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [12:06:32] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [12:06:42] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [12:07:12] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:07:12] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:07:12] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:07:52] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [12:07:52] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [12:08:32] PROBLEM Free ram is now: CRITICAL on bots-sql2 i-000000af.pmtpa.wmflabs output: CHECK_NRPE: Socket timeout after 10 seconds. 
[12:31:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [12:32:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [12:32:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [12:32:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [12:35:32] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [12:36:32] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [12:36:32] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [12:36:32] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [12:36:42] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [12:37:12] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:37:12] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:37:12] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [12:37:52] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [12:37:52] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) 
[13:01:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [13:02:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [13:02:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [13:02:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [13:05:13] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 193 processes [13:05:33] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [13:06:33] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [13:06:33] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [13:06:33] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [13:06:43] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [13:07:13] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:07:13] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:07:13] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:07:53] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [13:07:53] PROBLEM 
host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [13:10:12] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 100 processes [13:13:23] !log bots installed tclsh8.6 with tip-389-impl for full unicode support on bots-4 [13:13:25] Logged the message, Master [13:31:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [13:32:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [13:32:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [13:32:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [13:36:32] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [13:36:52] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [13:37:12] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [13:37:12] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [13:37:12] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [13:37:42] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:37:52] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:37:52] PROBLEM host: 
i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [13:38:32] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [13:39:12] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [14:01:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [14:02:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [14:02:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [14:02:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [14:06:33] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [14:06:53] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [14:07:13] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [14:07:13] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [14:07:13] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [14:07:43] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:07:53] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:07:53] PROBLEM host: 
i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:08:33] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [14:09:13] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [14:31:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [14:32:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [14:32:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [14:32:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [14:37:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [14:37:45] Change on 12mediawiki a page Developer access was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=589006 edit summary: /* User:Quentinv57 */ re [14:37:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [14:37:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [14:37:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:37:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [14:38:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host 
Unreachable (i-000002dd.pmtpa.wmflabs) [14:38:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:38:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [14:39:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [14:39:42] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [14:41:00] 09/30/2012 - 14:40:59 - Creating a home directory for mardetanha at /export/keys/mardetanha [14:42:03] Change on mediawiki a page Developer access was modified, changed by Jeremyb link https://www.mediawiki.org/w/index.php?diff=589008 edit summary: /* User:Mardetanha */ done [14:45:57] 09/30/2012 - 14:45:57 - Updating keys for mardetanha at /export/keys/mardetanha [14:48:33] can someone please add me to https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots ? 
[15:01:22] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [15:02:02] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [15:02:02] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [15:02:52] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [15:04:12] PROBLEM Total processes is now: WARNING on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS WARNING: 193 processes [15:07:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [15:07:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:07:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [15:07:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [15:07:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [15:08:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [15:08:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:08:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:09:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [15:09:42] PROBLEM 
host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [15:19:36] Mardetanha: as I already told you elsewhere you must first request shell... [15:21:12] RECOVERY Free ram is now: OK on bots-2 i-0000009c.pmtpa.wmflabs output: 1710352 [15:21:13] this is stupid [15:22:38] jeremyb: did so [15:25:18] giftpflanze: ? [15:26:02] Mardetanha: i was responding to your request above. i guess maybe that was before i mentioned shell though... [15:26:10] why do you have to request shell access [15:27:25] ... [15:27:36] because someone said so? why does it matter? [15:28:06] ah, now i got it [15:29:12] RECOVERY Total processes is now: OK on wikistats-01 i-00000042.pmtpa.wmflabs output: PROCS OK: 100 processes [15:29:35] Mardetanha: i see no sign that you have done so [15:29:57] in fact there's no requests waiting... [15:30:33] oh, huh [15:30:38] maybe the page just needed a purge? [15:30:41] i see it now [15:32:01] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [15:32:31] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [15:32:39] :) [15:32:41] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [15:33:11] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [15:37:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [15:37:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:37:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) 
[15:37:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [15:37:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [15:38:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [15:38:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:38:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [15:39:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [15:39:52] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [15:40:45] 09/30/2012 - 15:40:44 - User apache has been renamed, moving home directory in project(s): centralauth [15:55:02] PROBLEM dpkg-check is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:55:42] PROBLEM Current Load is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:56:22] PROBLEM Current Users is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:56:52] PROBLEM Disk Space is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. [15:58:52] PROBLEM SSH is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: Server answer: [15:59:22] PROBLEM Total processes is now: CRITICAL on dumps-bot3 i-000003ef.pmtpa.wmflabs output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:00:41] RECOVERY Current Load is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: OK - load average: 1.55, 1.28, 0.81 [16:00:51] RECOVERY dpkg-check is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: All packages OK [16:01:21] RECOVERY Current Users is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: USERS OK - 0 users currently logged in [16:01:51] RECOVERY Disk Space is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: DISK OK [16:02:01] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [16:02:31] RECOVERY Free ram is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: 5572076 [16:02:41] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [16:02:41] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [16:03:11] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [16:03:21] PROBLEM Free ram is now: WARNING on bots-sql2 i-000000af.pmtpa.wmflabs output: 899688 [16:03:51] RECOVERY SSH is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [16:04:21] RECOVERY Total processes is now: OK on dumps-bot3 i-000003ef.pmtpa.wmflabs output: PROCS OK: 132 processes [16:07:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [16:07:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:07:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [16:07:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: 
i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [16:07:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [16:08:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [16:08:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:08:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:09:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [16:09:52] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [16:19:12] PROBLEM Total processes is now: CRITICAL on kripke i-00000268.pmtpa.wmflabs output: PROCS CRITICAL: 204 processes [16:24:12] PROBLEM Total processes is now: WARNING on kripke i-00000268.pmtpa.wmflabs output: PROCS WARNING: 195 processes [16:32:02] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [16:32:42] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [16:33:12] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [16:33:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [16:37:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [16:37:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs 
PING CRITICAL - Packet loss = 100% [16:37:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [16:37:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [16:37:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [16:38:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [16:38:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:38:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [16:39:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [16:40:12] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [17:02:02] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [17:02:42] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [17:03:12] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [17:03:42] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [17:05:45] 09/30/2012 - 17:05:45 - User huji may have been modified in LDAP or locally, updating key in project(s): bastion [17:05:57] 09/30/2012 - 17:05:56 - Updating keys for huji at /export/keys/huji [17:07:22] PROBLEM host: 
i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [17:07:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:07:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [17:07:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [17:07:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [17:08:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [17:08:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:08:32] PROBLEM host: i-000003e5.pmtpa.wmflabs is DOWN address: i-000003e5.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:09:12] PROBLEM host: i-000000e2.pmtpa.wmflabs is DOWN address: i-000000e2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000e2.pmtpa.wmflabs) [17:11:02] PROBLEM host: i-00000118.pmtpa.wmflabs is DOWN address: i-00000118.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000118.pmtpa.wmflabs) [17:27:59] Good morning nagios! How long has that been happening? [17:29:30] since puppet restarted it again [17:29:57] Is labs actually busted, or are they all false alarms? (All my instances seem to be working…) [17:30:11] There's a handful of instances that have been down for weeks [17:30:27] I'm just in the process of fixing nagios to a point where it stops randomly breaking so we can get around to puppetizing [17:30:30] it [17:30:41] Oh, I see, it's a short list that is repeating. [17:30:58] Though some checks like puppet are just removed for now, since they're pending merged in gerrit. 
[17:31:07] We can probably make it repeat the spamming less though hmmmm [17:31:53] And I really need to check with Ryan as to whether pmtpa and eqiad will have pings allowed between them (currently it seems they do not). [17:32:08] Or if we're going to split into 1 nagios server per region, which I'll have to tweak the script for. [17:32:51] PROBLEM host: i-000003c0.pmtpa.wmflabs is DOWN address: i-000003c0.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000003c0.pmtpa.wmflabs) [17:33:21] PROBLEM host: i-0000038e.pmtpa.wmflabs is DOWN address: i-0000038e.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000038e.pmtpa.wmflabs) [17:33:21] PROBLEM host: i-00000389.pmtpa.wmflabs is DOWN address: i-00000389.pmtpa.wmflabs CRITICAL - Host Unreachable (i-00000389.pmtpa.wmflabs) [17:33:59] andrewbogott: If someone halts an instance from within the instance, nova won't see that it's died and trigger the update script, right? Just if you do an action from nova like reboot. [17:34:26] That's a good question. I'm not totally sure. [17:34:31] PROBLEM host: i-0000031c.pmtpa.wmflabs is DOWN address: i-0000031c.pmtpa.wmflabs CRITICAL - Host Unreachable (i-0000031c.pmtpa.wmflabs) [17:34:32] I think you're right that Nova won't notice. [17:35:42] It would be interesting to have a 'watcher' to restart dead instances that are marked as supposed to be running. That would probably also cause confusion for people heh. 
[17:36:02] * Damianz goes to find the list of instances down since the block storage migration to complain at people with [17:37:22] PROBLEM host: i-00000026.eqiad.wmflabs is DOWN address: i-00000026.eqiad.wmflabs CRITICAL - Packet Filtered (i-00000026.eqiad.wmflabs) [17:37:52] PROBLEM host: i-0000040b.pmtpa.wmflabs is DOWN address: i-0000040b.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:37:52] PROBLEM host: i-000000c1.pmtpa.wmflabs is DOWN address: i-000000c1.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c1.pmtpa.wmflabs) [17:37:52] PROBLEM host: i-000000c2.pmtpa.wmflabs is DOWN address: i-000000c2.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000c2.pmtpa.wmflabs) [17:37:52] PROBLEM host: i-000000f8.pmtpa.wmflabs is DOWN address: i-000000f8.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000000f8.pmtpa.wmflabs) [17:38:12] PROBLEM host: i-000002dd.pmtpa.wmflabs is DOWN address: i-000002dd.pmtpa.wmflabs CRITICAL - Host Unreachable (i-000002dd.pmtpa.wmflabs) [17:38:22] PROBLEM host: i-0000040c.pmtpa.wmflabs is DOWN address: i-0000040c.pmtpa.wmflabs PING CRITICAL - Packet loss = 100% [17:38:32] Go away for now [17:38:48] I wonder if Ryan would be up for allowing nagios to use salt to query info from instances [17:39:00] Could be nicer than nrpe for stuff like checking puppet freshness that requires root [17:46:30] Does nagios currently watch /every/ instance? And is that maybe just silly? [17:47:57] yes [17:48:27] It watches every instance that has an ip (finished building) and uses puppet classes to judge services. [17:48:44] Maybe I'm missing something… it seems like something that should be optional and off by default. [17:48:55] This would make more sense when we have a 'production' cluster. [17:49:21] It's how it's been since before I touched it, though I'd like to have an option in ldap to disable it for purely 'I don't care, it's a dev instance' cases. [17:49:36] Since it's currently rather useless for actual semi-production stuff we care about. 
[17:49:43] Well, right… a user could decide that a given instance is important and turn nagios on. [17:49:46] * andrewbogott nods [17:50:05] I'm just writing an email to propose some changes to it actually. [17:50:24] cool [17:51:04] Would rather like to pull in nagios info/ganglia info to the project/instance pages on the wiki [17:55:07] ganglia is, I think? [17:55:40] At the very bottom, 'stats': https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-00000386 [17:56:04] Adding similar things should be pretty easy, just a template edit. [17:56:23] Um… ok, so, probably you can't see that page I just linked to. But it's on every instance page. [17:59:17] I was thinking more of doing some image transclusion thing because the instance pages suck a little. [18:00:07] Though I'd like to package graphite up and have a thing where hosts can throw data and depending on the source ip it gets .