[00:03:21] New patchset: Ryan Lane; "Disallow changing of passwords through instances" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11261 [00:03:22] paravoid: ^^ [00:03:38] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/11261 [01:18:51] New review: Demon; "(no comment)" [operations/puppet] (test); V: 0 C: 1; - https://gerrit.wikimedia.org/r/11261 [02:18:47] RECOVERY Free ram is now: OK on bots-sql2 i-000000af output: OK: 23% free memory [03:11:47] PROBLEM Free ram is now: CRITICAL on incubator-bot1 i-00000251 output: Critical: 5% free memory [05:14:20] PROBLEM Free ram is now: WARNING on nova-daas-1 i-000000e7 output: Warning: 11% free memory [05:19:10] PROBLEM Free ram is now: WARNING on orgcharts-dev i-0000018f output: Warning: 16% free memory [05:29:10] PROBLEM Free ram is now: WARNING on test-oneiric i-00000187 output: Warning: 17% free memory [05:29:10] PROBLEM Free ram is now: WARNING on utils-abogott i-00000131 output: Warning: 14% free memory [05:34:20] PROBLEM Free ram is now: CRITICAL on nova-daas-1 i-000000e7 output: Critical: 4% free memory [05:39:11] PROBLEM Free ram is now: CRITICAL on orgcharts-dev i-0000018f output: Critical: 4% free memory [05:39:21] RECOVERY Free ram is now: OK on nova-daas-1 i-000000e7 output: OK: 94% free memory [05:44:11] RECOVERY Free ram is now: OK on orgcharts-dev i-0000018f output: OK: 94% free memory [05:44:11] PROBLEM Free ram is now: CRITICAL on utils-abogott i-00000131 output: Critical: 5% free memory [05:49:11] PROBLEM Free ram is now: CRITICAL on test-oneiric i-00000187 output: Critical: 3% free memory [05:49:11] RECOVERY Free ram is now: OK on utils-abogott i-00000131 output: OK: 97% free memory [05:54:07] RECOVERY Free ram is now: OK on test-oneiric i-00000187 output: OK: 97% free memory [09:52:04] hashar: I think new apaches doesn't work [09:52:18] they return socket error on http [09:52:35] HTTP CRITICAL 2012-06-14 09:46:17 1d 20h 58m 11s 
4/4 CRITICAL - Socket timeout after 10 seconds [09:54:41] which apaches ? [09:54:55] I have deleted the old ones 20, 21, 22, 23 yesterday [09:55:11] I guess nagios is not up to date [09:56:39] petan: beta now use only two boxes : apache30 and apache31 [09:56:42] which run Precise [09:56:56] why these two can't be reached by nagios [09:57:05] then I will purge/delete the invalid video thumbnails [09:57:12] did you properly configured the firewall [09:57:21] they need to be in web group [09:57:23] ohh [09:57:30] yeah the security group might be missing [09:57:34] :D [09:57:38] luck we have squid then [09:58:37] I am not sure what the "web" security group does though [09:58:44] there should be a nagios one for sure :) [10:01:20] out for lunch [10:01:23] will be back later this afternoon [11:47:03] PROBLEM Free ram is now: UNKNOWN on incubator-bot1 i-00000251 output: NRPE: Call to fork() failed [11:52:08] RECOVERY Free ram is now: OK on incubator-bot1 i-00000251 output: OK: 33% free memory [11:54:17] back [12:12:37] oh god, labsconsole is slow... [12:15:32] !log bots wmib: patching bot [12:21:37] !labs [12:21:37] https://labsconsole.wikimedia.org/wiki/$1 [12:23:04] !log bots wmib: patching bot [12:25:39] !log bots wmib: patching again :/ [12:25:49] !log bots wmib: something broke [12:29:02] wut [12:31:00] @help [12:31:00] Type @commands for list of commands. This bot is running http://meta.wikimedia.org/wiki/WM-Bot version wikimedia bot v. 1.6.2 source code licensed under GPL and located at https://github.com/benapetr/wikimedia-bot [12:31:09] ah... [12:49:37] bastion hosts down? [12:50:19] hundfred: no [12:50:32] @regsearch nagios [12:50:32] Results (Found 3): nagios, monitor, alert, [12:50:37] !monitor bastion [12:50:37] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?host=bastion [12:50:48] hm... 
[12:50:51] !nagios [12:50:51] http://nagios.wmflabs.org/nagios3 [12:50:57] hundfred: it would be down in nagios [12:51:09] !monitor bastion1 [12:51:09] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?host=bastion1 [12:51:10] :P [12:51:57] i have problems to resolve its name: dig wmflabs.org; connection timed out; no servers could be reached [12:53:32] ok, sry, from another host it seems to work [12:54:18] hundfred: ok that is known issue [12:54:32] it's problem with dns [12:54:58] !bastionip is 208.80.153.194 [12:54:59] Key was added [12:55:05] hundfred: ssh directly to ip [12:55:35] !bastion [12:55:35] http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org; see !access [12:55:40] !bastion del [12:55:40] Successfully removed bastion [12:55:46] !bastionip del [12:55:46] Successfully removed bastionip [12:56:24] !bastion is http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org which should resolve to 208.80.153.194; see !access [12:56:24] Key was added [12:57:23] ok, that works [13:12:47] hundfred: ah, nice to see you already got help [13:13:56] petan: still leftovers from recent DNS issue? i knew about it but thought it was solved already [13:14:08] no idea [13:14:18] I don't have any problem with dns [13:15:00] hundfred: there was a recent issue with labs DNS which should be fixed. if you get different results its caching [13:15:48] I am stroke by that issue at the moment :-] [13:15:53] can't resolve bastion.wmflabs.org [13:16:11] guess I need to wait for my current ISP DNS resolver to timeout the faulty entry [13:16:38] dig bastion.wmflabs.org @8.8.8.8 [13:16:45] VIRT0.WIKIMEDIA.ORG. 
17904 IN A 208.80.153.135 [13:16:48] \O/ [13:16:57] bastion.wmflabs.org. 3600 IN A 208.80.153.194 [13:17:21] 8.8.8.8 = Google DNS [13:17:52] !bastio | hashar [13:17:52] hashar: http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org which should resolve to 208.80.153.194; see !access [13:18:05] there is ip :) [13:19:12] mutante: we are lucky that happened on the labs and not on a production domain [13:20:57] indeed [13:40:18] google seems to have cached it [14:06:53] petan: do you still want to keep that gerrit change in there? (alt. check for puppet freshness) [14:08:10] uh, I don't know, I think I abandoned it [14:08:18] its still alive [14:08:23] hm... [14:08:27] I will reject it [14:08:31] 3376 [14:08:33] thx [14:09:27] Change abandoned: Petrb; "using other" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/3376 [14:14:12] petan: i have another one from you where im added as reviewer, but that is perfectly fine to stay, is pending existing BZ ticket and should just stay as it is [14:15:39] andrewbogott: if you are still looking for ops tasks, I have one :) [14:16:49] aha [14:27:30] hi guys, is bastion currently down? we cannot connect to bastion [14:28:01] works for me though [14:28:22] Abraham_WMDE: there seems to be an issue with dns, bastion.wmflabs.org which should resolve to 208.80.153.194 [14:28:47] hundfred: thx [14:40:29] hundfred: any idea which instance type would be appropriate? [14:40:54] for archive db and graph drawing [14:41:47] Ryan_Lane: want some overview page of instance types and recommendations besides the actual form? [14:42:20] sure [14:42:31] archive db? [14:42:33] what's that? [14:42:48] Abraham_WMDE: there was some DNS issue when we changed IPs [14:43:06] Ryan_Lane: the request is for a second instance in wikistats project to "database-history of wikistats, and to draw some graphs later on."
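The dig answer pasted at 13:16:45 has the usual five fields: name, TTL in seconds, class, type and address. The TTL is what decides how long a resolver may keep serving a stale record. Below is a minimal sketch, assuming POSIX sh and awk, of pulling the TTL and address out of such a line and turning the TTL into a rough wait time. The function names are made up for illustration and do not come from the log.

```sh
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Print "TTL ADDRESS" from one answer line of dig output
# (fields: name, TTL, class, type, data).
answer_ttl_addr() {
    printf '%s\n' "$1" | awk '{ print $2, $5 }'
}

# Convert a TTL in seconds into hours, minutes and seconds.
ttl_human() {
    printf '%dh %dm %ds\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Answer line as pasted in the channel.
answer_ttl_addr 'VIRT0.WIKIMEDIA.ORG. 17904 IN A 208.80.153.135'
ttl_human 17904
```

For the pasted record this prints `17904 208.80.153.135` and `4h 58m 24s`, which is roughly how long that resolver could keep answering with the old address.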
[14:43:16] Abraham_WMDE: you can either switch to using google's resolver, or, if you have access to the resolver you are using, flush its caches [14:43:38] Ryan_Lane: the current project is just a "update once daily"-thing but does not have any history/archive/long-term trends [14:43:39] mutante: in general you should always use the m and not the s instance types [14:44:05] the s types were added before we had project storage [14:44:29] there's rarely a good reason to use m1.tiny [14:44:43] Ryan_Lane: thx, we're connected again [14:44:43] always go with small or larger [14:44:49] tending to go with small [14:44:53] Abraham_WMDE: yw [14:45:01] how about local mysql/maria currently? [14:45:44] the project wouldnt mind, as long as we can backup it somewhere [14:45:52] backups? [14:46:11] they need to backup elsewhere than labs if they really care about the data [14:46:21] ya, so the "have a central db" for labs thing is not right around the corner [14:46:29] * Ryan_Lane nods [14:46:33] ok [14:51:59] hundfred: if you dont say otherwise its gonna be precise (Ubuntu 12.04) [14:54:13] !log wikistats creating a second instance wikistats-archive per BZ 37233 [14:55:11] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache23 i-00000270 output: Puppet has not run in last 20 hours [14:55:16] hundfred: m1.small 1 CPU, 2048MB RAM, 20GB storage [14:56:11] should be enaugh [14:58:01] if you need more, take a medium [14:58:07] you can't easily change size later [14:59:02] will help with backup elsewhere [15:01:11] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache20 i-0000026c output: Puppet has not run in last 20 hours [15:02:10] hundfred: now to the hopping via bastion host: ~/.ssh/config [15:03:34] hundfred: Host wikistats-archive (newline) ProxyCommand ssh -W %h:%p hundfred@bastion.wmflabs.org (newline) User hundfred [15:04:07] hundfred: should let you have direct ssh to the instance without even forwarding agents.. 
eh yeah, i think you another username there, but you'll know [15:04:11] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache22 i-0000026f output: Puppet has not run in last 20 hours [15:07:02] !log wikistats adding mariadb classes and bringing up new instance [15:07:11] PROBLEM Puppet freshness is now: CRITICAL on deployment-apache21 i-0000026d output: Puppet has not run in last 20 hours [15:07:19] sigh, logging [15:09:16] broken again? [15:09:20] lemme fix that today [15:09:32] yes, we have docs for [15:09:40] i forget the IRC trigger:) [15:10:08] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:10:51] thanks Ryan [15:14:25] !log bots moved adminbot to bots-labs [15:14:28] bah [15:14:42] hundfred: https://labsconsole.wikimedia.org/wiki/Nova_Resource:I-000002d7 [15:15:11] !logbot is https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:15:11] Key was added [15:15:24] @search https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:15:24] Results (Found 2): bot, logbot, [15:15:28] !bot [15:15:28] http://meta.wikimedia.org/wiki/WM-Bot | troubleshooting bots -> https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:15:31] aha [15:16:29] hundfred: notice the "get console output" link there, fyi, i am checking though why it is pending [15:17:48] hmm, now having said that it loads for me forever [15:19:07] mutante: i noticed that link, but i am not allowed to have access to it [15:19:17] Sysadmin role required [15:19:45] makes sense, just currently i cant see it myself, looking [15:21:31] hundfred: ah, wth, i just gave you sysadmin, you are technically if you want to take care of the instance [15:24:30] hundfred: lemme remove all puppet classes again and reboot first [15:28:39] !log bots moved adminbot to bots-labs [15:28:39] 
https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:28:40] Error reading project list from LDAP. [15:28:40] bots is not a valid project. [15:28:54] gimme a fucking break [15:29:04] someone started it improperlyt [15:29:20] this is on the new box [15:29:26] !log [15:29:27] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:29:28] !log bots moved adminbot to bots-labs [15:29:28] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation#Troubleshooting [15:29:28] aha [15:29:28] bots is not a valid project. [15:29:34] @infobot-ignore+ log [15:29:35] Item log was inserted to ignore list [15:29:38] !log [15:29:40] :) [15:29:51] the "not a valid project" thing is another one in troubleshooting:) [15:30:00] !log test [15:30:00] Message missing. Nothing logged. [15:30:11] !log bastion hello [15:30:11] bastion is not a valid project. [15:30:19] !log bots hello [15:30:19] bots is not a valid project. [15:30:22] nothing is getting added to the cache file [15:30:55] Ryan_Lane: "It needs /var/run/adminbot as a cache directory and permissions to write to it. It also needs working LDAP to fetch the project list. " [15:30:56] !bastion [15:30:56] http://en.wikipedia.org/wiki/Bastion_host; lab's specific bastion host is: bastion.wmflabs.org which should resolve to 208.80.153.194; see !access [15:31:06] yeah [15:31:06] Ryan_Lane: i had to fix / create that once before [15:31:07] I did all of that [15:31:11] oh..hm [15:32:24] LDAP connection? [15:32:39] maybe the code is somehow fucked up [15:32:43] probably the timestamop [15:32:46] lemme remove the file [15:33:00] !log bots moved adminbot to bots-labs [15:33:01] Error reading project list from LDAP. [15:33:01] bots is not a valid project. [15:33:04] damn it [15:34:35] !log bots moved adminbot to bots-labs [15:34:35] bots is not a valid project. 
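The troubleshooting note quoted at 15:30:55 names two prerequisites for the log bot: a writable cache directory at /var/run/adminbot and working LDAP. Here is a minimal sketch of checking the first one from a shell. The function name is hypothetical, the LDAP side is not covered, and the check only means something when run as the user the bot runs as.

```sh
#!/bin/sh
# Hypothetical helper: report whether a cache directory exists and is
# writable by the current user.
check_cache_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Path taken from the quoted documentation.
check_cache_dir /var/run/adminbot || echo "adminbot cache directory needs attention"
```

A passing check rules out only the directory problem; the "Error reading project list from LDAP" message would still need to be chased separately.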
[15:35:02] this is what I get for not putting this into source control [15:35:04] !log bots moved adminbot to bots-labs [15:35:05] Error reading project list from LDAP. [15:35:05] bots is not a valid project. [15:36:05] oh, I'm missing a config option [15:36:08] "Successfully deleted instance, but failed to remove dumps-2 DNS entry." Zzz [15:36:23] eh? [15:36:25] odd [15:37:53] Hydriz: well, it's not in DNS :) [15:38:02] had you already deleted it in the past? [15:38:06] yep [15:38:10] that's why [15:38:14] but its still in NovaInstance [15:38:24] "dumps-2 i-00000257 running" [15:38:27] sometimes it takes a while for nova to delete it [15:38:33] ah, I see [15:38:48] and labsconsole seems a little slower today though [15:38:59] maybe only for me, hmm [15:39:07] labsconsole is dead for me [15:39:38] in which way? [15:39:41] it works for me [15:40:01] like 208.80.153.135 does not ping nor answer to port 80 or 443 [15:40:08] !log bots moved adminbot to bots-labs [15:40:16] that's because that's no longer its IP [15:40:32] !log bots moved adminbot to bots-labs [15:40:39] labs-morebots: poke poke [15:40:42] wtf you doing? [15:40:50] ohh it changed to just like virt0 :-D TTL still 21512 [15:40:55] so that should fix self tomorrow [15:41:01] wtf [15:41:03] really? [15:41:08] use google's dns [15:41:14] Hydriz: confirming slownewss [15:41:15] no f**** way :-D [15:41:19] I will use /etc/hosts [15:41:19] but just since a little while [15:41:27] little while lol [15:41:41] no i mean like literally 10 minutes ago it wasnt [15:41:46] for me [15:41:50] it occurred to me for a few hours, though it did not harm to wait :P [15:42:07] Bastion ip is 208.80.153.194 [15:42:34] so not sure what that .135 meant :P [15:42:39] by the way fenari resolve labsconsole to the old address still [15:42:42] Ryan_Lane: ack @ google dns, i recommended 8.8.8.8 earlier since worksforme [15:42:56] hashar: I can fix that. 
gimme a sec [15:43:04] it is ok the TTL is 9000 sec [15:43:27] that is the production resolver I guess, not a big deal [15:43:37] as long as the office one is fixed, we should be fine :-) [15:44:01] hashar: why connect from fenari to labsconsole? just curious [15:44:25] mutante: that was to access a dig command from another resolver than mine, my ISP or google [15:44:31] gotcha [15:44:34] mutante: so I did: ssh fenari dig labsconsole [15:45:12] well, it's fixed from fenari now :) [15:45:16] \O/ [15:45:23] * hashar launch a gnome season on fenari [15:45:31] heh [15:45:39] just switch to google dns for a while ;) [15:45:43] season/session [15:46:07] no way! [15:46:15] that will make Google aware of my porn addiction :-] [15:46:51] !log deployment-prep redeleting deployment-apache20 [15:46:53] Logged the message, Master [15:47:22] Successfully deleted instance, but failed to remove deployment-apache20 DNS entry. [15:47:23] oops? [15:47:33] heh [15:47:44] that is an instance I deleted like 2 days ago [15:47:52] hashar: they already are because of the passwords in your mail [15:47:57] it can't get deleted properly apparently [15:48:26] mutante: I don't use gmail for personal emails exactly for that reason :-] [15:48:31] Ryan: Is there a good way of changing the email address of an account? [15:48:46] unfortunately you need to ask us [15:48:52] heh [15:48:55] hashar: if people would just use gpg and then not worry [15:48:59] since mediawiki is a broken piece of crap [15:49:10] doesn't seem like there is a script called setmail [15:49:13] !log bots moved adminbot to bots-labs [15:49:15] Logged the message, Master [15:49:16] which the TS has for changing emails [15:49:18] \o/ [15:49:50] well, someone just needs to fix mediawiki [15:51:30] hmm, one last question [15:51:35] ? [15:51:35] not sure if anyone asked before [15:51:50] but, is it better to do dist-upgrade [15:51:57] or delete and recreate with precise? 
[15:51:59] better to install from scratch [15:52:11] especially for one reason [15:52:21] a dist-upgrade eats more space [15:52:30] since the instances are based off of a base image [15:52:40] so delete and recreate... [15:52:42] using a copy on write filesystem [15:52:46] yes, if possible [15:52:59] and adding a 1 hour delay in between for dns? [15:53:10] my surviving to delete bug is https://bugzilla.wikimedia.org/show_bug.cgi?id=37593 ;) [15:54:04] Hydriz: yes. or just use a different instance name [15:54:24] yep, understood :) [15:55:40] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [15:56:20] ah, heh [15:56:43] hundfred: a good way to point out there is monitoring for it and nagios.wmflabs.org :) [15:56:49] ^ [15:57:10] and the project pages have links to ganglia and nagios [15:57:15] !project bots [15:57:15] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots [15:58:36] mutante: ok, you just did a click on the corresponding puppet class? [15:58:52] hundfred: no, the nagios things works differently and fully automatic [15:59:02] hundfred: nagios pulls info from central place [15:59:21] so notices new instance creations sometimes before they are even up [15:59:37] petan has the details [15:59:40] i think [16:00:50] hm. it's really odd that deletes are failing [16:04:11] Ryan_Lane: I have lots of coding to do today, but don't mind a distraction. What's up? [16:04:34] andrewbogott: well, I'm working on a deployment system and it relies on saltstack [16:04:48] I'd like to have saltstack installed and working in some way in labs [16:05:20] saltstack is a puppet competitor? [16:05:33] what I'd love to have is a single salt master running somewhere (likely on virt0) and salt minions running on all labs instances [16:05:45] * andrewbogott googles [16:05:54] New patchset: Hashar; "enable imagescaler packages on Precise hosts." 
[operations/puppet] (test) - https://gerrit.wikimedia.org/r/11298 [16:06:13] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/11298 [16:06:22] I'd like to work out peer relationships somehow, so that people can run salt from their instances to their instances [16:06:35] that, of course, is harder, since we have sudo policies.... [16:06:43] Is there a reason why you'd want the master on virt0 vs. on a vm? [16:06:53] heh [16:07:08] New review: Hashar; "Issue on a precise host is:" [operations/puppet] (test); V: 0 C: 0; - https://gerrit.wikimedia.org/r/11298 [16:07:10] because salt uses its own auth model that doesn't fit with puppet [16:07:26] so, we want puppet to automatically add nodes into salt's auth database [16:08:04] also, the salt master basically has the same level of access as the puppet master [16:08:08] so it's dangerous to put it in a vm [16:09:27] Ryan_Lane: OK; is this something you'd like me to start on immediately, or just put on the list? [16:09:37] just put on the list [16:10:05] And when you said 'working on a deployment system' do you mean, building a new one? Or using some existing thing? [16:12:26] well, it's a combination of things and some custom code [16:12:28] not much custom code [16:12:30] Ryan_Lane: would you mind having a look at - https://gerrit.wikimedia.org/r/11298 please? It is to have image scaling package to install on Precise hosts [16:12:46] Ryan_Lane: OK... it's on the list :) [16:12:51] thanks :) [16:13:56] hashar: I don't think I understand what this is doing [16:14:43] the if ( $lsbdistcodename == "lucid" ) { was added because some packages were renamed with Lucid [16:14:45] ah [16:14:54] linux-libertine became ttf-linux-libertine [16:14:56] this was due to hardy [16:15:16] I don't think we have any hardy boxes left [16:15:20] we don't [16:15:23] but that is blocking Precise :-] [16:15:31] what about the class below that, that has the same check [16:15:32] ? 
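The point hashar makes here is that an exact match on one release name silently leaves out every later release. The Puppet manifest itself is not shown in the log, so the following is only a shell stand-in for the idea, not the Puppet code: both function names are hypothetical, and what the real manifest did in its else branch is unknown.

```sh
#!/bin/sh
# Shell stand-in for the release check discussed above (assumption: the
# real manifest gated package installation on the release codename).

# Exact-match gate: succeeds only for lucid, so precise is left out.
gate_exact() { [ "$1" = "lucid" ]; }

# Gate that excludes only the old release the check was guarding against.
gate_not_hardy() { [ "$1" != "hardy" ]; }

for codename in hardy lucid precise; do
    gate_exact "$codename" && exact=yes || exact=no
    gate_not_hardy "$codename" && loose=yes || loose=no
    echo "$codename exact=$exact not_hardy=$loose"
done
```

With no hardy hosts left, either gate can simply be dropped, which is what the review comment on the merged change says was done.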
[16:15:40] ohh [16:17:01] I love our workflow [16:17:08] I love git [16:17:10] I like gerrit [16:17:11] New patchset: Hashar; "enable imagescaler classes on Precise hosts." [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11298 [16:17:13] I am happy [16:17:30] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/11298 [16:18:40] Ryan_Lane: I have updated patchset [16:24:24] New review: Ryan Lane; "The checks were due to hardy hosts. We don't have those anymore, so the checks aren't needed now." [operations/puppet] (test); V: 0 C: 2; - https://gerrit.wikimedia.org/r/11298 [16:24:26] Change merged: Ryan Lane; [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11298 [16:26:30] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [16:28:06] will fix the rest later on :-D [16:28:09] thanks ryan! [16:28:15] yw [16:30:25] hey [16:30:27] back [16:43:41] new instance reports as "running" in labsconsole, but did not get IP and console output now loads but empty [16:44:40] RECOVERY HTTP is now: OK on grail i-000002c6 output: HTTP OK: HTTP/1.1 200 OK - 453 bytes in 0.245 second response time [16:44:40] RECOVERY dpkg-check is now: OK on patchtest i-000000f1 output: All packages OK [16:45:08] Ryan_Lane: Unexpected non-MediaWiki exception encountered, of type "RequestCore_Exception" [16:45:18] where's this? [16:45:27] overlay in reboot instance screen [16:45:35] oh. right [16:45:36] want me to pastebin all? 
[16:45:38] I just restarted the api [16:45:42] oh,ok [16:46:20] PROBLEM HTTP is now: WARNING on deployment-apache30 i-000002d3 output: HTTP WARNING: HTTP/1.1 403 Forbidden - 366 bytes in 0.018 second response time [16:46:20] RECOVERY HTTP is now: OK on demo-deployment1 i-00000276 output: HTTP OK: HTTP/1.1 200 OK - 911 bytes in 0.575 second response time [16:49:43] Hydriz: deletes will work now [16:49:47] this was due to the IP change [16:50:00] okay, good :) [16:51:30] PROBLEM HTTP is now: CRITICAL on deployment-apache30 i-000002d3 output: CRITICAL - Socket timeout after 10 seconds [16:52:50] PROBLEM HTTP is now: CRITICAL on grail i-000002c6 output: CRITICAL - Socket timeout after 10 seconds [16:52:50] PROBLEM dpkg-check is now: CRITICAL on patchtest i-000000f1 output: CHECK_NRPE: Socket timeout after 10 seconds. [16:54:30] PROBLEM HTTP is now: CRITICAL on demo-deployment1 i-00000276 output: CRITICAL - Socket timeout after 10 seconds [16:56:30] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [17:00:52] Ryan_Lane: some problem with nagios [17:00:53] ? [17:01:04] you can open a bug for that if there is some [17:01:41] btw I don't believe that nagios doesn't remove the deleted instances [17:01:47] because all hosts are up [17:02:00] "invalid hostname" from nagios sounds like related to dns, no? [17:02:20] aha [17:02:23] probably [17:02:36] I have to go away right now I will be back in 20 min [17:02:43] same here [17:02:46] is there any problem or not [17:02:52] with nagios [17:02:58] it looked as it works fine to me [17:03:07] hi robla [17:03:15] hi petan [17:03:24] mutante: is there still dns issue? [17:03:28] petan: that one wikistats-archive is the one i just created and hasnt really been up yet for me [17:03:30] because I can't resolve right now [17:03:48] petan: locally? 
[17:03:55] mutante: nagios should update the list from smw query [17:04:02] petan: you mean you cant resolve nagios.wmflabs.org from home? [17:04:07] mutante: I can't resolve here in prague [17:04:09] yes [17:04:20] !nagios [17:04:20] http://nagios.wmflabs.org/nagios3 [17:04:24] this is what I can't :) [17:04:25] open [17:04:29] petan: that depends on the local DNS, try using 8.8.8.8 as DNS temporary [17:04:40] hm... [17:04:43] petan: i had that issue earlier but now its gone for me..it depends [17:04:53] ok I really need to go now [17:05:02] petan: independent of that, nagios itself seems to think some instance names are "invalid" [17:05:06] me too [17:05:08] cya [17:06:11] petan: wikistats-archive is just in status "running" but without an IP afaict now .bbl [17:06:32] hence the "invalid address", so i guess forget the dns comment [17:06:51] (just about the nagios message not about your local pc) ..out [17:26:30] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [17:46:38] is labsconsole down? I'm again having trouble reaching it from the wmf network [17:56:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [18:02:42] <^demon> Eloquence: It's a tiny bit slow for me, but I can hit it from outside. [18:03:48] The OIT DNS server had a TTL issue, poke OIT about it [18:04:08] Yesterday after they fixed the wmflabs DNS outage, the negative responses were cached in the office DNS server I think [18:07:50] I get the same response when trying from a German server; doubt that it's just the OIT network [18:11:25] Oh, hm [18:26:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [18:44:10] Eloquence: works for me. 
Might be an issue with the DNS entry that was changed recently [18:44:58] labsconsole.wikimedia.org should resolve to 208.80.152.32 [18:45:22] yeah, I manually added it to /etc/hosts for now .. they're poking at it in #wikimedia-operations [18:46:13] Eloquence: I thought someone reloaded the SF office DNS resolver [18:56:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [19:25:31] New patchset: Hashar; "Explicitly define fonts package for Precise" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11358 [19:25:47] New review: gerrit2; "Change did not pass lint check. You will need to send an amended patchset for this (see: https://lab..." [operations/puppet] (test); V: -1 - https://gerrit.wikimedia.org/r/11358 [19:26:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [19:28:54] New patchset: Hashar; "'gs' package renamed 'ghostscript' in Precise" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11359 [19:29:12] New review: gerrit2; "Lint check passed." [operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/11359 [19:30:21] New patchset: Hashar; "Explicitly define fonts package for Precise" [operations/puppet] (test) - https://gerrit.wikimedia.org/r/11358 [19:30:40] New review: gerrit2; "Lint check passed." 
[operations/puppet] (test); V: 1 - https://gerrit.wikimedia.org/r/11358 [19:56:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [20:26:31] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [20:31:26] https://labsconsole.wikimedia.org/ is still not accessible from wmf office [20:36:46] drdee: the workaround is to get the IP in /etc/hosts aka: 208.80.152.32 labsconsole.wikimedia.org [20:37:15] the fix would be to reload the DNS zone in office resolver :/ [20:37:22] or wait for the time to live to expire [20:57:10] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [21:13:36] !labsconsole.wiki is 208.80.152.32 [21:13:36] Key was added [21:16:00] !labsconsole.wiki [21:16:00] 208.80.152.32 [21:17:38] !lab [21:17:38] There are multiple keys, refine your input: labs, labsconf, labsconsole.wiki, labs-home-wm, labs-morebots, labs-nagios-wm, labs-project, [21:17:48] heh [21:17:55] * jeremyb waits for the bot [21:18:03] !labsconsol [21:18:04] 208.80.152.32 [21:18:14] !labsconso [21:18:14] 208.80.152.32 [21:18:16] <3 [21:18:17] huh [21:18:24] still waiting for the bot... [21:18:29] for what [21:19:43] petan: to say my key was sync'd [21:19:50] ? [21:19:58] what [21:27:10] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [21:27:51] ohhhh, did we already switch to the new key location? so i was waiting for a bot that would never speak [21:31:36] ^demon: so can i just change stuff locally or do i use the test branch or what? [21:32:02] <^demon> If you're volunteering, I'd love to have the labs instance run from puppet. Right now it's just a local manual install in /var/lib/gerrit2 [21:32:02] (also, can i has merge / submit rights on test branch? :-P) [21:32:11] <^demon> I don't has those either :( [21:32:21] oh, really?!!!! 
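The /etc/hosts workaround given at 20:36:46 can be applied by hand. A small sketch of doing it without adding the same line twice follows. The helper name is hypothetical, and the example writes to a scratch file, so it needs no root access. For real use the first argument would be /etc/hosts.

```sh
#!/bin/sh
# Hypothetical helper: add "IP NAME" to a hosts-format file unless an
# uncommented line already lists NAME as a host name or alias.
add_hosts_entry() {
    file=$1 ip=$2 name=$3
    if awk -v n="$name" '
            $1 ~ /^#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$file" 2>/dev/null; then
        echo "already present: $name"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# Scratch file instead of /etc/hosts, so the example is safe to run.
hosts=$(mktemp)
add_hosts_entry "$hosts" 208.80.152.32 labsconsole.wikimedia.org
add_hosts_entry "$hosts" 208.80.152.32 labsconsole.wikimedia.org
cat "$hosts"
```

The entry should be removed again once the resolver's cached record has expired, because a pinned address goes stale at the next IP change.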
[21:32:27] (re you not having) [21:32:35] ok, manual's good because i only have ~5 mins [21:32:36] atm [21:34:02] ^demon: do i need to boot gerrit? [21:34:10] has public IP? [21:34:21] <^demon> Oh, guess it went down at some point. [21:34:31] <^demon> Yeah, public url is http://gerrit-dev.wmflabs.org/r/ [21:34:45] no, i mean just because i changed the config i thought it needed a boot [21:34:54] how do i boot it? [21:35:57] <^demon> export GERRIT_SITE=/var/lib/gerrit2/review_site && /var/lib/gerrit2/review_site/bin/gerrit.sh start [21:37:46] ewww. you can just do `GERRIT_SITE=/var/lib/gerrit2/review_site /var/lib/gerrit2/review_site/bin/gerrit.sh start` then [21:37:51] no export [21:41:38] huh, it's blue [21:41:58] i'm timo! [21:42:19] jeremyb: I have no idea what you mean [21:45:53] <^demon> jeremyb: I dunno, I always use export cuz that's what the docs said to do. [21:46:01] ^demon: btw, links from gitweb grep search results to blobs are buggy. if the search is on a branch or specific commit id / tag / etc. then the links just go to the default HEAD [21:46:07] ^demon: docs are wrong! [21:46:17] <^demon> {{sofixit}} [21:46:39] ^demon: or maybe they are just trying to support ancient or alternative shells [21:46:56] but as far as I know 98% of the WMF is bash [21:46:57] the export may be posix compliant [21:47:21] i have to read more on the POSIX. but yes that was my point [21:47:28] I don't know of a unix-shell that doesn't do FOO=bar comand [21:47:38] no, that's also POSIX [21:47:56] I checked it for an execution code in MW [21:48:11] I guess it could still be older posix, though [21:49:15] solaris sh accepts it [21:49:38] and that's a traditional sh [21:51:26] ^demon: are these regexes case sensitive? [21:51:32] <^demon> Yes [21:52:10] so we should make them [0-9A-Fa-f] i think then. 
(and A-Za-z) [21:56:50] ok, i'm off, bbl [21:57:13] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [22:27:13] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [22:57:13] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [22:57:55] Is Labs down? I can't access ee-prototype.wmflabs.org or labs console. [22:58:47] ^ petan, Ryan_Lane [22:59:08] * Ryan_Lane sighs [22:59:12] Seems so [22:59:14] no [22:59:16] slow* [22:59:16] it's up [22:59:19] DNS is fucked up [22:59:31] ok, as long as it's not just me ;) [22:59:41] well, it's the office, for sure [22:59:47] and other resolvers [22:59:52] it's working fine for google DNS [23:00:07] oh [23:00:14] you guys really shouldn't use the labs logo btw ;) [23:00:50] I will pass that on. [23:00:57] hm. I wonder if we could use variations of the labs logo for each project [23:01:03] Totally should change the labs logo to use a pink unicorn [23:01:25] StevenW: can't use any trademarked logos (according to legal) [23:02:01] 'If you have to ask a laywer it's probably a bad idea' [23:02:06] makes sense [23:25:48] * jeremyb is back; waves [23:27:49] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7 [23:29:26] * jeremyb could use a favor from a root when you have a sec... [23:29:29] gah [23:29:31] ww [23:57:49] PROBLEM host: wikistats-archive is DOWN address: i-000002d7 check_ping: Invalid hostname/address - i-000002d7
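The exchange between 21:35 and 21:49 about starting gerrit.sh turns on the difference between `export VAR=value && command` and `VAR=value command`. A minimal sketch of that difference, assuming a POSIX sh, is shown below. The child `sh -c` is only a stand-in for gerrit.sh, and the path is the one quoted in the log.

```sh
#!/bin/sh
# Path taken from the log; the child command is a stand-in for gerrit.sh.
site=/var/lib/gerrit2/review_site
unset GERRIT_SITE

# Prefix form: the assignment goes into the environment of this one
# external command only and does not persist in the current shell.
prefix_child=$(GERRIT_SITE=$site sh -c 'echo "${GERRIT_SITE:-unset}"')
prefix_parent=${GERRIT_SITE:-unset}

# Export form: the variable stays set in the current shell and is passed
# to every later child process as well.
export GERRIT_SITE=$site
export_child=$(sh -c 'echo "${GERRIT_SITE:-unset}"')
export_parent=${GERRIT_SITE:-unset}

echo "prefix: child=$prefix_child parent=$prefix_parent"
echo "export: child=$export_child parent=$export_parent"
```

Both forms hand the value to the started command. The prefix form just avoids leaving GERRIT_SITE behind in the interactive shell, which is why it was suggested as the tidier one-liner.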