[00:00:07] it always loses stats
[00:00:12] it writes them to tmpfs
[00:00:16] I'm not a huge fan of that
[00:00:47] ...
[00:00:50] really
[00:01:14] fix it if you'd like :)
[00:04:20] not a huge fan of ganglia tbh
[00:30:05] 01/05/2013 - 00:30:05 - Creating a project directory for sartoris
[00:40:08] Damianz: what do you prefer?
[00:42:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 5.04, 6.19, 5.29
[01:02:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 6.15, 5.81, 5.31
[01:04:33] PROBLEM Current Users is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:05:13] PROBLEM Disk Space is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:05:53] PROBLEM Current Load is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:05:53] PROBLEM Free ram is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:06:14] RECOVERY Current Users is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: USERS OK - 0 users currently logged in
[01:06:14] RECOVERY Current Users is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: USERS OK - 0 users currently logged in
[01:06:33] RECOVERY Current Load is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: OK - load average: 0.02, 0.09, 0.08
[01:06:43] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 170 processes
[01:06:53] RECOVERY Disk Space is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: DISK OK
[01:06:53] RECOVERY Total processes is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: PROCS OK: 83 processes
[01:07:03] RECOVERY Disk Space is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: DISK OK
[01:07:13] RECOVERY Current Load is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: OK - load average: 0.00, 0.00, 0.00
[01:07:13] RECOVERY dpkg-check is now: OK on patchtest2.pmtpa.wmflabs 10.4.0.74 output: All packages OK
[01:07:23] PROBLEM Total processes is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:08:13] RECOVERY Free ram is now: OK on patchtest.pmtpa.wmflabs 10.4.0.69 output: OK: 367% free memory
[01:09:33] PROBLEM dpkg-check is now: CRITICAL on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: Connection refused by host
[01:10:53] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 6.28, 6.26, 5.41
[01:14:23] PROBLEM Current Users is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:14:23] PROBLEM Current Users is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:14:43] PROBLEM Current Load is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:15:12] PROBLEM Disk Space is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:15:12] PROBLEM Total processes is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:15:12] PROBLEM Disk Space is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds.
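The PROBLEM/RECOVERY lines above are Nagios-style plugin checks run over NRPE: a check flips to WARNING or CRITICAL when a metric crosses a threshold (here, load averages around 5), and the "Socket timeout" and "Connection refused" variants mean the NRPE agent itself couldn't be reached. A minimal sketch of such a load check follows; the 5.0/8.0 thresholds are assumptions (the labs config isn't shown), but the exit-code convention (0=OK, 1=WARNING, 2=CRITICAL) is the standard Nagios one:

```python
#!/usr/bin/env python
"""Minimal Nagios/NRPE-style load check (sketch; thresholds are assumed)."""
import os
import sys

WARN, CRIT = 5.0, 8.0  # assumed thresholds; the labs config may differ

def main():
    one, five, fifteen = os.getloadavg()
    load = "load average: %.2f, %.2f, %.2f" % (one, five, fifteen)
    if five >= CRIT:
        print("CRITICAL - " + load)
        sys.exit(2)   # Nagios exit code for CRITICAL
    if five >= WARN:
        print("WARNING - " + load)
        sys.exit(1)   # WARNING
    print("OK - " + load)
    sys.exit(0)       # OK

if __name__ == "__main__":
    main()
```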
[01:15:22] PROBLEM Current Load is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:15:22] PROBLEM dpkg-check is now: CRITICAL on patchtest2.pmtpa.wmflabs 10.4.0.74 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:16:22] PROBLEM Free ram is now: CRITICAL on patchtest.pmtpa.wmflabs 10.4.0.69 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:16:42] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 93 processes
[01:22:23] RECOVERY Total processes is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: PROCS OK: 89 processes
[01:23:12] heya. On our labs instance piramido, E3 overrode the captcha and allowed account creation. But I think SilkeMeyer's "Changes order of settings for role_config to work" change to templates/mediawiki/labs-localsetting in puppet stopped this from working.
[01:24:34] RECOVERY Current Users is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: USERS OK - 0 users currently logged in
[01:24:34] RECOVERY dpkg-check is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: All packages OK
[01:24:35] That change moves the require of orig/LocalSettings.php early, so its captcha and "Only sysops can create new accounts" can't be overridden. ?!
[01:25:12] RECOVERY Disk Space is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: DISK OK
[01:25:53] RECOVERY Current Load is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: OK - load average: 0.13, 0.70, 0.81
[01:25:53] RECOVERY Free ram is now: OK on sartoris-deploy1.pmtpa.wmflabs 10.4.1.64 output: OK: 871% free memory
[01:25:53] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 17% free memory
[01:30:54] spagewmf: Dang.
[01:32:05] andrewbogott, it's intentional "Changes order of settings for role_config to work", but we kinda need to be able to create accounts and change the captcha to match enwiki for testing.
[01:32:29] spagewmf: Yeah, I'm saying 'dang' because it seems like either order causes us trouble.
[01:32:36] I have to go in a minute but will look at the problem soon.
[01:33:02] NP, I'll hack around it for now and redo on puppet changes. BTW, "Hi"
[01:33:09] Hi!
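The underlying mechanism in this exchange: MediaWiki settings files are plain assignments applied in sequence, so the last assignment wins, and whether the puppet template requires orig/LocalSettings.php before or after the role config decides whose captcha/account-creation settings take effect. A toy Python illustration of that last-writer-wins merge; the snippet names and keys are invented for illustration, not the actual template contents:

```python
# Toy illustration of why include order matters: later snippets override
# earlier ones, just as later assignments win in LocalSettings.php.
# Names and keys below are hypothetical.
orig_localsettings = {"createaccount": False,      # "Only sysops can create new accounts"
                      "captcha": "strict"}
role_overrides     = {"createaccount": True,       # E3's test setup
                      "captcha": "enwiki-like"}

def apply_in_order(*snippets):
    settings = {}
    for snippet in snippets:   # last writer wins
        settings.update(snippet)
    return settings

# require orig/LocalSettings.php first -> role overrides win:
print(apply_in_order(orig_localsettings, role_overrides))
# require it last -> it clobbers the role config:
print(apply_in_order(role_overrides, orig_localsettings))
```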
[01:37:09] Ryan_Lane: i can't ssh into wikidata-dev-3
[01:37:20] * aude can ssh into the other wikidata instances
[01:40:43] aude: yeah, it seems that something has screwed up the ldap config
[01:40:51] one se
[01:40:51] *Sec
[01:40:54] PROBLEM Free ram is now: CRITICAL on bots-3.pmtpa.wmflabs 10.4.0.59 output: Critical: 4% free memory
[01:41:51] aude: ok, you can now
[01:42:34] uh ok
[01:42:47] not sure what did that
[01:42:53] let me try a puppet run to see if it goes back
[01:43:19] still not sure i can do anything about the sed thing, though
[01:43:32] not sure the purpose
[01:46:13] me either
[01:46:19] it's hurting performance, though
[01:46:28] it's causing like 40% waitio
[01:46:41] puppet run seemed to fix the ldap config
[01:47:30] well, expect the folks from the toolserver and bots ppl to do much crazier stuff
[01:47:33] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 4.47, 4.43, 4.97
[01:47:37] just be prepared for it
[01:47:56] * aude can probably disable the cronjob for the weekend and anja can look at it
[01:50:14] !log wikidata-dev disable cleanupPhpCodeCoverageHtml.sh cronjob on wikidata-dev-3
[01:50:15] Logged the message, Master
[01:50:53] RECOVERY Free ram is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: OK: 697% free memory
[01:51:52] Ryan_Lane: also makes me worry about doing much with the maps project
[01:53:01] aude: yeah. I'm expecting it
[01:53:02] * aude would like test rendering / syncing of osm data for a subset of the osm data, like just the US instead of the world
[01:53:10] it's not the IO that's a problem by itself
[01:53:22] it's that the IO is for something kind of silly :)
[01:53:25] that could still be a bit intensive, but need something like that to test everything
[01:53:33] ok :)
[01:53:41] for maps...
[01:53:49] we really just want to test things
[01:53:53] well, our test code coverage is important too
[01:53:53] so move it to production
[01:53:57] yes
[01:53:59] indeed
[01:54:07] * aude nods
[01:54:09] we have another 3 nodes to add to the cluster in pmtpa
[01:54:13] and another 7 in eqiad
[01:54:19] ok
[01:54:26] we have lots of capacity left
[01:54:42] we may add some SSD instances in the future as well
[01:54:46] err
[01:54:48] SSD hosts
[01:55:00] maybe 1 per datacenter
[01:55:06] good
[01:55:22] osm uses ssd, which is one thing they do nice
[01:55:40] makes a tremendous difference
[01:56:32] the only bad thing is that SSDs are small ;)
[02:01:10] true
[02:04:52] there'd be a little dev work involved in that as well, since we'd need to use scheduler hinting. I think it's worth the effort, though
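"Scheduler hinting" here means steering new instances onto particular compute hosts (e.g., the planned SSD ones) instead of letting the scheduler place them freely; in OpenStack this is done by passing hint key/values that host filters match against. A toy sketch of the idea follows, not OpenStack's actual filter API, and the host names and capabilities are invented:

```python
# Toy sketch of scheduler hinting: pick a compute host matching a requested
# capability (e.g., "ssd"). Hosts and capabilities here are invented examples;
# the real OpenStack implementation is a filter scheduler driven by hints.
HOSTS = {
    "virt5":     {"ssd": False, "free_ram_mb": 96000},
    "virt-ssd1": {"ssd": True,  "free_ram_mb": 48000},
}

def schedule(ram_mb, hints):
    candidates = [
        name for name, caps in HOSTS.items()
        if caps["free_ram_mb"] >= ram_mb
        and all(caps.get(key) == value for key, value in hints.items())
    ]
    if not candidates:
        raise RuntimeError("no host satisfies hints %r" % (hints,))
    # Among matching hosts, prefer the one with the most free RAM.
    return max(candidates, key=lambda name: HOSTS[name]["free_ram_mb"])

print(schedule(2048, {"ssd": True}))   # -> virt-ssd1
print(schedule(2048, {}))              # -> virt5
```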
[02:05:52] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 4.91, 4.12, 4.74
[02:08:49] * aude nods
[02:33:52] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 5.60, 5.57, 5.22
[02:38:32] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes
[02:51:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 22% free memory
[02:52:34] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 2.74, 4.19, 5.00
[02:53:53] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 2.53, 3.74, 4.64
[02:59:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 12% free memory
[03:00:22] PROBLEM Disk Space is now: CRITICAL on deployment-apache32.pmtpa.wmflabs 10.4.0.166 output: DISK CRITICAL - free space: / 286 MB (2% inode=74%):
[04:30:22] PROBLEM Free ram is now: WARNING on bots-2.pmtpa.wmflabs 10.4.0.42 output: Warning: 16% free memory
[04:31:33] PROBLEM Disk Space is now: CRITICAL on deployment-apache33.pmtpa.wmflabs 10.4.0.187 output: DISK CRITICAL - free space: / 286 MB (2% inode=74%):
[04:58:05] why!
[04:59:17] oh wait, that is bots-2
[06:29:53] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes
[06:32:32] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 156 processes
[06:44:53] RECOVERY Total processes is now: OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 147 processes
[06:57:32] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes
[07:25:22] RECOVERY Free ram is now: OK on bots-2.pmtpa.wmflabs 10.4.0.42 output: OK: 20% free memory
[09:37:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 5.00, 4.79, 4.92
[10:25:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.57, 5.00, 5.01
[10:30:32] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.53, 4.92, 4.99
[10:37:22] PROBLEM Free ram is now: CRITICAL on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: Critical: 5% free memory
[10:57:03] @seenrx Ryan
[10:57:03] petan: Last time I saw Ryan_Lane they were quitting the network with reason: Quit: Leaving. at 1/5/2013 3:51:26 AM (07:05:36.6772430 ago) (multiple results were found: Ryan_Lane1, Ryan_Vesey, Ryan_Lane_away, Ryan)
[11:49:13] !logs
[11:49:13] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs
[11:52:13] !log bots ireas: Installed qt4-qmake and libqt4-dev on bots-4.
[12:07:51] is anyone of you able to a) login to wmflabs.org or b) if already logged in, edit a page?
[12:08:12] trying to login, MediaWiki tells me that cookies were disabled (which is not true)
[12:09:02] ireas: I am logged in
[12:09:06] if already logged in (from a previous session), I cannot save changes "due to a session data" (probably about cookies, too)
[12:09:29] benestar: can you please try to edit something?
[12:09:36] ok
[12:10:38] hu
[12:11:09] "Sorry! We could not process your edit due to a loss of session data. Please try again. If it still does not work, try logging out and logging back in."
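This "loss of session data" error usually means the session store dropped the session between loading and submitting the edit form; later in the log petan attributes it to a memcached issue on labsconsole. A minimal set/get round-trip is a quick way to sanity-check a memcached-backed session store; this sketch uses the python-memcached client, and the host/port is a placeholder:

```python
# Minimal memcached health probe (sketch). The address is a placeholder;
# requires the python-memcached package (pip install python-memcached).
import memcache

mc = memcache.Client(["127.0.0.1:11211"])  # assumed session-store address

if mc.set("session_probe", "ok", time=30) and mc.get("session_probe") == "ok":
    print("memcached round-trip OK - sessions should survive")
else:
    print("memcached round-trip FAILED - expect 'loss of session data' errors")
```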
[12:11:22] yes, that's the error I get, too
[12:11:45] trying to login again
[12:12:22] ireas: get the same error
[12:12:32] benestar: okay, thanks for testing!
[12:12:42] it's quite confusing
[12:19:52] wrote a mail to labs-l
[12:38:52] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 16% free memory
[12:39:15] * Beetstra looks at lego|away
[12:39:33] -> 20438 legoktm 20 0 945m 898m 3900 R 6.7 44.7 0:52.57 /data/project/legoktm/py2/bin/python /data/project/legoktm/pywp/indexer2.py
[12:40:10] 945 m of virtual, 898 m of resident .. that is bringing the whole box down ...
[12:41:25] * Beetstra pokes petan / Ryan Lane
[12:41:36] memcached issue
[12:41:54] yes
[12:42:03] that was regarding labsconsole
[12:42:09] oh
[12:42:37] Beetstra ok what do you suggest? larger memory?
[12:42:51] or move that bot of lego to somewhere else?
[12:43:00] Clearly they can't cope together
[12:43:09] hmm, we really need to create a new instance with LARGE memory
[12:43:17] you require root or not?
[12:43:23] for that bot
[12:43:32] no, I can just run normally
[12:43:34] I would prefer to make it NR
[12:43:45] ok
[12:44:56] petan, I would suggest to make more instances, with less bots on them
[12:45:04] bots-4, bots-5 ..
[12:46:32] bots-4 already exist
[12:46:38] @labs-resolve bots
[12:46:38] I don't know this instance - aren't you are looking for: I-0000009c (bots-2), I-0000009e (bots-cb), I-000000a9 (bots-1), I-000000af (bots-sql2), I-000000b4 (bots-sql3), I-000000b5 (bots-sql1), I-000000e5 (bots-3), I-000000e8 (bots-4), I-0000015e (bots-labs), I-00000190 (bots-dev),
[12:46:49] bots-nr1 is nearly empty
[12:46:52] !botsdocs
[12:46:52] https://labsconsole.wikimedia.org/wiki/Nova_Resource:Bots/Documentation
[12:47:22] I've killed my bots so that bot by Lego does not bring the box down ..
[12:47:36] (I tried to kill lego's bot .. but I can't)
[12:48:53] RECOVERY Free ram is now: OK on bots-3.pmtpa.wmflabs 10.4.0.59 output: OK: 25% free memory
[12:48:53] !log bots Beetstra: @Legoktm - top shows: legoktm 20 0 1098m 1.0g 3900 S 0.0 52.3 1:05.72 /data/project/legoktm/py2/bin/python /data/project/legoktm/pywp/indexer2.py - I killed my bots to avoid bots-3 crashing
[12:49:06] ok I will create another instance then, but are you sure bots-nr1 is not enough?
[12:49:13] I could use that one
[12:49:30] can I 'reserve' bots-nr1 for my bots then?
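The top lines Beetstra pastes show ~945m virtual and ~898m resident for the indexer process; resident (RSS) is the figure that actually squeezes other processes off a shared box. The same numbers can be read straight from /proc; a small Linux-only sketch, where the PID comes from the command line (e.g., the 20438 above):

```python
# Read a process's virtual and resident memory from /proc/<pid>/status
# (Linux only). Usage: python memof.py 20438
import sys

def mem_of(pid):
    fields = {}
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            key, _, value = line.partition(":")
            fields[key] = value.strip()
    # VmSize = virtual address-space size, VmRSS = resident set size
    return fields.get("VmSize"), fields.get("VmRSS")

vsize, rss = mem_of(int(sys.argv[1]))
print("virtual: %s, resident: %s" % (vsize, rss))
```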
[12:49:33] you better check load before, but we really need to create some table of bots and servers
[12:49:56] ok, if you want instance only for YOUR bot we should create bots-beetstra or something
[12:50:04] like bots-cb (cluebot)
[12:50:55] OK, then we have to discuss: can I have 'bots-LiWa' for the linkwatcher (then bots-2 is going to be freed completely, that is now my 'private instance' for that bot)
[12:50:59] @labs-project-instances bots
[12:51:00] Following instances are in this project: bots-2, bots-cb, bots-1, bots-sql2, bots-sql3, bots-sql1, bots-3, bots-4, bots-labs, bots-dev, bots-salebot, bots-apache-test, bots-nr1, bots-apache01, bots-abogott-devel, bots-analytics,
[12:51:19] and 'bots-beetstra' for the rest, then I will stay away from bots-3 as well
[12:51:37] @labs-resolve bots-liwa
[12:51:37] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 56 seconds ago
[12:51:52] @labs-resolve liwa
[12:51:52] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 11 seconds ago
[12:52:09] oh
[12:52:12] I didn't get you
[12:52:14] right
[12:52:21] I can create bots-LiWa but what size?
[12:52:28] 2gb ram? 1 cpu?
[12:52:32] or more?
[12:52:33] same as bots-2
[12:52:36] ok
[12:52:42] I will do it as soon as mc is fixed
[12:52:47] is fine
[12:53:30] I am going to temporarily run COIBot/XLinkBot and Unblockbot from bots-nr1 .., leave LiWa on bots-2 .. when you've made the new instances, please ping me, and I will move all
[12:56:14] oh
[12:56:23] nr-2 does not have a complete perl-install ..
[12:56:31] sorry, bots-nr1
[13:04:17] Beetstra will fix
[13:05:09] petan .. I may need more, I am running cpan, but not sure whether I am allowed to install there
[13:05:32] no problem
[13:05:37] just send me what all you need
[13:05:47] OK
[13:06:59] !log bots petrb: upgrading bots-nr1 and installing some more perl packages
[13:07:27] @labs-info bots-nr1
[13:07:28] [Name bots-nr1 doesn't exist but resolves to I-0000049e] I-0000049e is Nova Instance with name: bots-nr1, host: virt5, IP: 10.4.1.2 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: ubuntu-12.04-precise
[13:07:44] mm
[13:07:45] precise
[13:16:54] PROBLEM Free ram is now: WARNING on bots-3.pmtpa.wmflabs 10.4.0.59 output: Warning: 16% free memory
[13:17:34] PROBLEM dpkg-check is now: CRITICAL on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: DPKG CRITICAL dpkg reports broken packages
[13:17:56] Petan: I think these are all: DBI, LWP, Data, Socket, utf8, HTML, Encode, warnings, POE, strict, Date, URI, List
[13:19:19] Oh: XML
[13:19:52] oh funny, updating ldap client make it impossible for machine to contact wmf ldap
[13:20:40] (I hope installing POE also installs ALL subpackages ..)
[13:21:36] Lego's bot is going to bring the bots-3 down on his own ...
[13:21:47] 1.5g memory used there
[13:22:12] I think he needs to optimize it a bit
[13:22:30] :-)
[13:22:32] I don't mean it's not possible to run a bot with such a need for ram, but in his case I guess it's a bug
[13:22:32] RECOVERY dpkg-check is now: OK on bots-nr1.pmtpa.wmflabs 10.4.1.2 output: All packages OK
[13:24:49] I had the same in one of my bots .. perl and python can eat memory if you don't take care
[13:25:03] 15 Mb free and counting down
[13:33:01] :-D .. I think that bot by Lego has started to push other scripts out of existance .. 66 Mb free, using 1.7g now ..
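One defensive pattern when a bot "can eat memory if you don't take care": have it cap its own address space, so a leak kills the bot with MemoryError instead of starving everything else on the shared instance. A sketch using the standard resource module; the 1.5 GB limit is an arbitrary example value:

```python
# Self-imposed memory cap for a bot on a shared instance (Linux). If the
# process leaks past the limit, allocations fail with MemoryError instead
# of dragging the whole box into swap/OOM. The 1.5 GB figure is an example.
import resource

LIMIT_BYTES = 1536 * 1024 * 1024  # ~1.5 GB address-space cap (example value)
resource.setrlimit(resource.RLIMIT_AS, (LIMIT_BYTES, LIMIT_BYTES))

try:
    hog = []
    while True:
        hog.append(bytearray(10 * 1024 * 1024))  # simulate a leak, 10 MB at a time
except MemoryError:
    print("hit the self-imposed cap after %d MB" % (len(hog) * 10))
```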
[13:33:24] PROBLEM Free ram is now: WARNING on bots-2.pmtpa.wmflabs 10.4.0.42 output: Warning: 19% free memory
[13:43:23] RECOVERY Free ram is now: OK on bots-2.pmtpa.wmflabs 10.4.0.42 output: OK: 20% free memory
[13:55:40] you have all packages now?
[13:55:44] Beetstra ^^
[14:01:07] hmm .. no
[14:01:08] Can't locate WWW/Mechanize.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.14.2 /usr/local/share/perl/5.14.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.14 /usr/share/perl/5.14 /usr/local/lib/site_perl .) at Perlwikipedia.pm line 4.
[14:02:15] oh, forgot that module
[14:02:18] maybe Carp?
[14:04:20] all carp.*perl packages installed
[14:04:35] btw cpan is there and you should be able to use it
[14:05:02] I tried, but it told me that I was not allowed to install something, no rights for it
[14:05:09] oh
[14:05:37] and su gives an authentication failure
[14:06:01] try relogin
[14:06:07] because I upgraded ldap client
[14:06:14] so all users who logged before are borked
[14:08:38] build for WWW::Mechanize is running in CPAN
[14:11:13] ok
[14:11:32] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:11:32] ERROR: Can't create '/usr/local/bin'
[14:11:32] Do not have write permissions on '/usr/local/bin'
[14:11:32] !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[14:12:30] ok, send me that cpan command?
[14:12:31] :-(
[14:12:48] or tell me library name
[14:12:51] "install WWW::Mechanize"
[14:13:21] !log bots root: bots-nr1 cpan install WWW:Mechanize
[14:16:43] !log bots root: bots-nr1 cpan install WWW:Mechanize - done
[14:16:58] try now
[14:17:10] XML::Simple
[14:17:37] !log bots root: nr1 cpan install XML::Simple
[14:18:36] done
[14:19:23] PROBLEM Free ram is now: WARNING on bots-2.pmtpa.wmflabs 10.4.0.42 output: Warning: 19% free memory
[14:20:29] first bot is online
[14:21:57] and the second and the third as well
[14:22:04] Thanks!
[14:22:21] I hope this will keep everything running smoothly
[14:23:17] Now I have to run before my wife starts to become angry :-/
[14:23:25] thanks for this, petan. See you tomorrow!
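The @INC error above is Perl's way of saying a required module isn't installed anywhere on its search path; the fix here was a root cpan install. The same "check all dependencies up front instead of dying mid-run" idea, written in Python to match the other sketches in this section (the module list is an example, not the bots' actual requirements):

```python
# Preflight dependency check: report missing modules before the bot starts,
# rather than crashing mid-run with an import error (the Python analogue of
# Perl's "Can't locate ... in @INC"). The module list is an example.
import importlib

REQUIRED = ["mechanize", "lxml", "simplejson"]  # whatever the bot imports

missing = []
for name in REQUIRED:
    try:
        importlib.import_module(name)
    except ImportError:
        missing.append(name)

if missing:
    raise SystemExit("missing modules: %s (install them first)" % ", ".join(missing))
print("all %d required modules importable" % len(REQUIRED))
```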
[14:23:38] ok
[14:24:39] petan: just fyi: logging does not work due to the wiki issue, I guess
[14:25:02] indeed
[14:25:12] but it's better on irc than nowhere
[14:25:15] !logs
[14:25:15] logs http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-labs
[14:25:18] :P
[14:25:39] okay :D
[14:25:41] bye
[14:43:32] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 5.06, 5.16, 5.09
[15:13:33] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.63, 4.73, 5.00
[16:14:52] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 19% free memory
[16:24:52] RECOVERY Free ram is now: OK on swift-be3.pmtpa.wmflabs 10.4.0.124 output: OK: 21% free memory
[16:47:53] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 19% free memory
[17:14:40] !log bots petrb: created 2 new instances per irc log
[17:14:41] Logged the message, Master
[17:16:37] @labs-info bots-nr2
[17:16:37] [Name bots-nr2 doesn't exist but resolves to I-00000567] I-00000567 is Nova Instance with name: bots-nr2, host: virt5, IP: 10.4.1.66 of type: m1.small, with number of CPUs: 1, RAM of this size: 2048M, member of project: bots, size of storage: 30 and with image ID: ubuntu-12.04-precise
[17:16:51] @labs-info bots-liwa
[17:16:51] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 0 seconds ago
[17:17:22] RECOVERY Free ram is now: OK on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: OK: 71% free memory
[17:17:28] @labs-info bots-liwa
[17:17:28] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 12 seconds ago
[17:19:52] PROBLEM host: bots-liwa.pmtpa.wmflabs is DOWN address: 10.4.1.65 CRITICAL - Host Unreachable (10.4.1.65)
[17:19:52] PROBLEM host: bots-nr2.pmtpa.wmflabs is DOWN address: 10.4.1.66 CRITICAL - Host Unreachable (10.4.1.66)
[17:23:54] RECOVERY host: bots-nr2.pmtpa.wmflabs is UP address: 10.4.1.66 PING OK - Packet loss = 0%, RTA = 0.76 ms
[17:23:54] RECOVERY host: bots-liwa.pmtpa.wmflabs is UP address: 10.4.1.65 PING OK - Packet loss = 0%, RTA = 0.69 ms
[17:24:23] PROBLEM Total processes is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:24:23] PROBLEM Total processes is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
[17:25:52] PROBLEM Current Load is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:25:52] PROBLEM dpkg-check is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:25:52] PROBLEM Current Load is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
[17:26:32] PROBLEM Current Users is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:26:33] PROBLEM Current Users is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
[17:26:33] PROBLEM dpkg-check is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
[17:27:12] PROBLEM Disk Space is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:27:12] PROBLEM Disk Space is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
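The DOWN / "Connection refused" / RECOVERY churn above is just the two freshly created instances booting: first unpingable, then up but with the monitoring agent not yet listening. A small poll-until-reachable sketch for that window; NRPE conventionally listens on TCP 5666, and the host and port here are example values:

```python
# Poll a freshly booted instance until its monitoring agent accepts TCP
# connections (sketch). NRPE's conventional port is 5666; the host/port
# below are example values.
import socket
import time

def wait_for_port(host, port, timeout_s=600, interval_s=10):
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            socket.create_connection((host, port), timeout=5).close()
            return True          # port open: checks should start recovering
        except OSError:          # refused, unreachable, or timed out
            time.sleep(interval_s)
    return False

if wait_for_port("10.4.1.66", 5666):  # bots-nr2 in the log above
    print("agent reachable")
else:
    print("still refusing connections")
```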
[17:28:02] PROBLEM Free ram is now: CRITICAL on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Connection refused by host
[17:28:02] PROBLEM Free ram is now: CRITICAL on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: Connection refused by host
[17:30:20] !log bots ireas: Installed qt4-qmake and libqt4-dev on bots-4.
[17:30:22] Logged the message, Master
[17:30:26] ah, great
[17:37:13] RECOVERY Disk Space is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: DISK OK
[17:37:13] RECOVERY Disk Space is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: DISK OK
[17:38:03] RECOVERY Free ram is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: OK: 895% free memory
[17:38:03] RECOVERY Free ram is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: OK: 643% free memory
[17:39:23] RECOVERY Total processes is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: PROCS OK: 84 processes
[17:39:24] RECOVERY Total processes is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: PROCS OK: 93 processes
[17:40:54] RECOVERY Current Load is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: OK - load average: 0.08, 0.59, 0.63
[17:40:54] RECOVERY Current Load is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: OK - load average: 1.72, 1.38, 0.94
[17:40:54] RECOVERY dpkg-check is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: All packages OK
[17:41:34] RECOVERY Current Users is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: USERS OK - 0 users currently logged in
[17:41:34] RECOVERY Current Users is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: USERS OK - 1 users currently logged in
[17:51:32] RECOVERY dpkg-check is now: OK on bots-nr2.pmtpa.wmflabs 10.4.1.66 output: All packages OK
[18:09:22] are coibot, xlinkbot and unblockbot on -nr1 or on -3? doc page says both
[18:21:55] 3
[18:24:46] confusing
[18:52:52] RECOVERY Free ram is now: OK on swift-be3.pmtpa.wmflabs 10.4.0.124 output: OK: 21% free memory
[19:13:32] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 152 processes
[19:15:53] Holy crap sorry
[19:16:04] I didn't even think that script was still running
[19:18:33] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes
[19:20:52] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 19% free memory
[19:29:23] RECOVERY Free ram is now: OK on bots-2.pmtpa.wmflabs 10.4.0.42 output: OK: 20% free memory
[19:48:10] Change on mediawiki a page OAuth/status was modified, changed by 50.136.243.106 link https://www.mediawiki.org/w/index.php?diff=624594 edit summary: [+205]
[19:48:56] o.0
[19:49:01] guess someone at the office forgot to login
[19:50:37] Btw, what's wrong with 80M/s of disk writes? We peak at 3-4G/s of writes on misc servers at w0rk
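Earlier in the log the complaint was ~40% iowait from a cron job; here Damianz asks what's wrong with 80M/s of writes. One quick way to measure a device's write throughput yourself is to delta /proc/diskstats over an interval (Linux; the "vda" device name below is an assumption, substitute whatever your host actually has):

```python
# Sample a block device's write throughput from /proc/diskstats (Linux).
# The sectors-written counter is in 512-byte units. "vda" is an assumed
# device name; substitute your own.
import time

def sectors_written(device):
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[9])  # sectors written since boot
    raise ValueError("device %r not found" % device)

DEV, INTERVAL = "vda", 5.0
before = sectors_written(DEV)
time.sleep(INTERVAL)
after = sectors_written(DEV)
print("%s: %.1f MB/s written" % (DEV, (after - before) * 512 / INTERVAL / 1e6))
```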
[19:52:22] PROBLEM Free ram is now: WARNING on bots-2.pmtpa.wmflabs 10.4.0.42 output: Warning: 19% free memory
[20:40:43] PROBLEM Free ram is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Critical: 5% free memory
[20:55:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.90, 5.13, 5.04
[21:00:35] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.63, 4.94, 5.00
[21:08:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.88, 5.02, 5.02
[21:22:22] RECOVERY Free ram is now: OK on bots-2.pmtpa.wmflabs 10.4.0.42 output: OK: 20% free memory
[21:25:53] RECOVERY Free ram is now: OK on swift-be3.pmtpa.wmflabs 10.4.0.124 output: OK: 20% free memory
[21:30:23] PROBLEM Free ram is now: WARNING on bots-2.pmtpa.wmflabs 10.4.0.42 output: Warning: 19% free memory
[21:53:53] PROBLEM Free ram is now: WARNING on swift-be3.pmtpa.wmflabs 10.4.0.124 output: Warning: 18% free memory
[21:55:25] RECOVERY Free ram is now: OK on bots-2.pmtpa.wmflabs 10.4.0.42 output: OK: 20% free memory
[22:48:08] Ryan_Lane: Quick way to rule virt0 being memcache's issue? Bring an instance up on 1 and use it remotly - generally a bad idea, but would do as a temp fix
[22:50:06] I really don't want to do that ;)
[23:05:17] 01/05/2013 - 23:05:16 - Updating keys for techman224 at /export/keys/techman224
[23:15:40] I'm having problems allocating a public ip address
[23:17:10] do you have a quote of ips?
[23:18:18] Damianz, I'll be back after eating
[23:18:35] nom
[23:18:45] s/quote/quota/
[23:51:53] PROBLEM Free ram is now: CRITICAL on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Critical: 5% free memory
[23:53:53] RECOVERY Free ram is now: OK on swift-be3.pmtpa.wmflabs 10.4.0.124 output: OK: 20% free memory
[23:56:52] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 6% free memory
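A closing note on the "Free ram" checks that run through this whole log: they report percentages, and sometimes values well over 100% (367%, 697%, 895%), which suggests the plugin counts buffers/cache or scales against something other than MemTotal; its exact formula isn't shown here. For reference, the classic pre-MemAvailable way to compute a free-memory percentage from /proc/meminfo is free + buffers + cached over total, sketched below as an illustration rather than a reconstruction of the labs plugin:

```python
# Estimate "free" memory from /proc/meminfo (Linux). Uses the classic
# (MemFree + Buffers + Cached) / MemTotal approximation; the labs check's
# exact formula isn't shown in the log, so treat this as an illustration.
def meminfo():
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            info[key] = int(rest.split()[0])  # values are in kB
    return info

m = meminfo()
free_kb = m["MemFree"] + m.get("Buffers", 0) + m.get("Cached", 0)
print("%.0f%% free memory" % (100.0 * free_kb / m["MemTotal"]))
```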