[00:06:53] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 143 processes
[00:22:54] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[00:29:29] andrewbogott: can definitely upload to labsconsole
[00:36:11] mike_wang: the basic idea is that you'd have a * dns entry, like: *.pmtpa.proxy.wmflabs
[00:37:09] err, make that *.pmtpa.proxy.wmflabs.org
[00:37:16] mike_wang: if a user requested reportcard.pmtpa.proxy.wmflabs.org, it would act as a reverse proxy to reportcard.pmtpa.wmflabs
[01:06:42] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 180 processes
[01:11:42] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 102 processes
[01:18:16] oh my god whoops
[01:18:23] addshore: ?
[01:18:33] oh, number of processes?
[01:18:36] don't worry about that
[01:18:41] urm, i think i just spawned 600 processes on bots-4
[01:18:43] ah
[01:18:44] heh
[01:18:47] ok, maybe do :)
[01:18:49] xD
[01:18:57] urm
[01:18:59] right
[01:19:17] there is a chance that that isnt what I wanted to do
[01:20:12] heh
[01:20:13] it happens
[01:20:18] xD
[01:20:45] It was a nice thought / accident to try and multithread thousands of checks, of course!
[01:21:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[01:21:36] hmm
[01:21:42] how to kill these processes...
[01:21:42] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:23:32] PROBLEM dpkg-check is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:23:43] PROBLEM Disk Space is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:24:04] I have a feeling I may have killed it
[01:24:22] PROBLEM Current Load is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:25:05] PROBLEM Total processes is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: CHECK_NRPE: Socket timeout after 10 seconds.
[01:25:23] PROBLEM host: bots-4.pmtpa.wmflabs is DOWN address: 10.4.0.64 PING CRITICAL - Packet loss = 100%
[01:25:50] Ryan_Lane, is there anything you can do from your side? :/
[01:26:32] oh, you killed the vm?
[01:26:35] reboot it
[01:26:39] you can do so from labsconsole
[01:27:29] * addshore goes in search
[01:27:45] it's under "Manage instances"
[01:27:46] oh
[01:27:47] wait
[01:27:53] are you a sysadmin in bots?
[01:27:55] probably not....
[01:28:07] dont think so
[01:28:08] I can reboot it for you
[01:28:13] it's bots-salesbot?
[01:28:17] bots-4
[01:28:27] ah
[01:29:26] I've put in a reboot request. it may take a little bit. nova (the vm manager) has been slow lately.
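The question at 01:21:42 ("how to kill these processes...") never got an answer before the instance went down. For reference, a minimal shell sketch of cleaning up a fork storm like this one, assuming the runaway processes all belong to your own account and share a recognisable command line; the script name below is hypothetical:

    # Gauge the damage: count processes owned by this account.
    pgrep -c -u "$USER"
    # Kill by full command line (-f) so only the runaway jobs match.
    pkill -f 'multithreaded_checks.sh'   # hypothetical script name
    # Last resort: kill everything the account owns (this ends your shell too).
    pkill -u "$USER"

Once the box is so loaded that even fork() fails, a reboot from labsconsole, as done here, is the practical way out.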
[01:29:41] okey dokey, thanks :)
[01:36:30] * addshore spanks nova
[01:39:13] RECOVERY Current Load is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK - load average: 0.11, 0.03, 0.01
[01:39:23] RECOVERY host: bots-4.pmtpa.wmflabs is UP address: 10.4.0.64 PING OK - Packet loss = 0%, RTA = 0.88 ms
[01:40:53] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 107 processes
[01:41:33] RECOVERY Free ram is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK: 876% free memory
[01:43:23] RECOVERY dpkg-check is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: All packages OK
[01:43:34] RECOVERY Disk Space is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: DISK OK
[01:51:08] :>
[02:15:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 6.54, 6.09, 5.34
[02:17:43] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 5.49, 5.64, 5.10
[02:20:43] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 150 processes
[02:23:54] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 152 processes
[02:24:54] PROBLEM Current Load is now: WARNING on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: WARNING - load average: 6.05, 5.53, 5.17
[02:37:43] RECOVERY Current Load is now: OK on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: OK - load average: 4.62, 4.47, 4.96
[02:40:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 22% free memory
[02:50:33] PROBLEM Current Load is now: WARNING on parsoid-roundtrip6-8core.pmtpa.wmflabs 10.4.0.222 output: WARNING - load average: 6.66, 5.52, 5.21
[02:53:55] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory
[03:09:53] RECOVERY Current Load is now: OK on ve-roundtrip2.pmtpa.wmflabs 10.4.0.162 output: OK - load average: 3.04, 4.13, 4.87
[03:15:42] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 4.14, 4.36, 4.78
[03:23:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory
[04:12:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory
[04:23:43] PROBLEM Free ram is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Critical: 5% free memory
[05:51:59] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[05:59:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[06:29:44] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 156 processes
[06:32:32] PROBLEM dpkg-check is now: CRITICAL on deployment-sql02.pmtpa.wmflabs 10.4.0.248 output: DPKG CRITICAL dpkg reports broken packages
[06:36:32] PROBLEM dpkg-check is now: CRITICAL on conventionextension-trial.pmtpa.wmflabs 10.4.0.165 output: DPKG CRITICAL dpkg reports broken packages
[06:37:33] RECOVERY dpkg-check is now: OK on deployment-sql02.pmtpa.wmflabs 10.4.0.248 output: All packages OK
[06:39:42] RECOVERY Total processes is now: OK on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS OK: 149 processes
[06:45:13] PROBLEM dpkg-check is now: CRITICAL on testing-arky.pmtpa.wmflabs 10.4.0.45 output: DPKG CRITICAL dpkg reports broken packages
[07:00:52] PROBLEM dpkg-check is now: CRITICAL on mw1-21beta-lucid.pmtpa.wmflabs 10.4.0.182 output: DPKG CRITICAL dpkg reports broken packages
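A side note on the PROCS alerts that dominate this log: the flips between OK and WARNING sit right around 150 processes, and between WARNING and CRITICAL around 200, which matches the stock Nagios check_procs plugin run over NRPE with 150/200 thresholds. A sketch; the NRPE command name is an assumption, the real one lives in the instance's nrpe.cfg:

    # On the monitored instance, roughly what nrpe executes:
    /usr/lib/nagios/plugins/check_procs -w 150 -c 200
    # PROCS WARNING: 152 processes
    # From the Nagios server, via the NRPE daemon on the instance:
    /usr/lib/nagios/plugins/check_nrpe -H 10.4.0.64 -c check_total_procs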
[07:33:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: WARNING - load average: 6.77, 6.53, 5.60
[07:42:22] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[08:03:42] RECOVERY Current Load is now: OK on parsoid-roundtrip3.pmtpa.wmflabs 10.4.0.62 output: OK - load average: 2.16, 3.25, 4.54
[08:09:26] !log deployment-prep put back role::beta::autoupdater on -bastion
[08:09:28] Logged the message, Master
[08:12:23] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 201 processes
[08:18:47] !log deployment-prep removed misc::deployment::scripts from -bastion, already provided by misc::deployment::scap_scripts
[08:18:49] Logged the message, Master
[08:24:19] !log deployment-prep made deployment-bastion a git-deploy deployment host
[08:24:39] !log deployment-prep deployed all repos to destination hosts
[08:24:52] labs-morebots: hey, you.
[08:25:04] labs-morebots: wtf?
[08:26:59] !log deployment-prep made deployment-bastion a git-deploy deployment host
[08:26:59] deployment-prep is not a valid project.
[08:27:02] -_-
[08:27:09] !log deployment-prep made deployment-bastion a git-deploy deployment host
[08:27:09] deployment-prep is not a valid project.
[08:27:16] oh, fuck yourself
[08:27:18] !log deployment-prep made deployment-bastion a git-deploy deployment host
[08:27:19] deployment-prep is not a valid project.
[08:27:26] !log testing test
[08:27:26] testing is not a valid project.
[08:27:32] I think I know why
[08:27:45] well, no, actually it should pull that from ldap, right?
[08:28:40] !log testing test
[08:28:40] testing is not a valid project.
[08:28:44] RAWR
[08:29:21] this is definitely an ldap call
[08:29:38] !log testing test
[08:29:39] Logged the message, Master
[08:29:44] !log deployment-prep made deployment-bastion a git-deploy deployment host
[08:29:46] Logged the message, Master
[08:29:54] !log deployment-prep deployed all repos to destination hosts
[08:29:55] Logged the message, Master
[08:30:33] ok, syncing slot0, then bedtime
[08:34:19] !log bastion testing
[08:34:20] Logged the message, Master
[08:34:29] labs-morebots doesn't work?
[08:34:30] I am a logbot running on i-0000015e.
[08:34:30] Messages are logged to labsconsole.wikimedia.org/wiki/Server_Admin_Log.
[08:34:30] To log a message, type !log <msg>.
[08:34:44] !log bastion testing
[08:34:46] Logged the message, Master
[08:34:53] !log bota testing
[08:34:53] bota is not a valid project.
[08:34:56] !log bots testing
[08:34:57] Logged the message, Master
[08:35:29] !ping
[08:35:29] pong
[08:37:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory
[08:40:52] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 22% free memory
[08:45:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory
[08:50:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output
[08:52:22] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[08:55:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[09:00:27] oh lol
[09:00:35] so I really was here?
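The "X is not a valid project" replies above come from the bot's project lookup, identified in the conversation as an LDAP call that had gone stale. Purely as a hedged illustration of what such a check amounts to — the server and base DN below are hypothetical, and the real directory layout may differ:

    # Zero results here would translate to "deployment-prep is not a valid project."
    ldapsearch -x -H ldap://ldap.example.wmflabs \
        -b 'ou=projects,dc=wikimedia,dc=org' '(cn=deployment-prep)' cn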
[09:00:43] my bouncer was only sending messages to channel
[09:00:48] so I thought all these bots are dead
[09:00:52] !ping
[09:00:52] pong
[09:00:55] oh my
[09:03:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 19% free memory
[09:23:14] petan: hahaha
[09:52:23] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 203 processes
[10:02:30] !log Wikidata-dev wikidata-dev-9: Installed openjdk-7-jre.
[10:02:31] Wikidata-dev is not a valid project.
[10:02:41] !log wikidata-dev wikidata-dev-9: Installed openjdk-7-jre.
[10:02:43] Logged the message, Master
[10:53:52] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[10:57:23] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 199 processes
[11:06:52] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[12:40:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory
[12:41:52] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory
[12:42:51] !tunnel
[12:42:52] ssh -f user@bastion.wmflabs.org -L <port>:<server>:<port> -N Example for sftp "ssh chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N" will open bots-1:22 as localhost:6000
[12:53:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory
[12:54:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[13:01:42] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.77, 4.82, 4.97
[13:09:43] @labs-project-info bots
[13:09:43] The project Bots has 18 instances and 45 members, description: A project for creating and running bots for use on Wikimedia Foundation sites.
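A usage sketch to go with the !tunnel help at 12:42:52 above: open the forward through the bastion, then point sftp at the local end of the tunnel (the username and hosts are taken from the bot's own example):

    # Background the tunnel (-f) with no remote command (-N);
    # bots-1:22 becomes reachable as localhost:6000.
    ssh -f chewbacca@bastion.wmflabs.org -L 6000:bots-1:22 -N
    # sftp through the forwarded port.
    sftp -oPort=6000 chewbacca@localhost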
[13:12:03] @labs-info bots-bnr1
[13:12:03] I don't know this instance, sorry, try browsing the list by hand, but I can guarantee there is no such instance matching this name, host or Nova ID unless it was created less than 19 seconds ago
[13:14:33] PROBLEM Current Users is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:15:12] PROBLEM Disk Space is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:15:52] PROBLEM Free ram is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:15:52] PROBLEM Current Load is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:17:22] PROBLEM Total processes is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:18:12] PROBLEM dpkg-check is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: Connection refused by host
[13:24:33] RECOVERY Current Users is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: USERS OK - 0 users currently logged in
[13:24:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[13:25:13] RECOVERY Disk Space is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: DISK OK
[13:25:54] RECOVERY Free ram is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: OK: 2853% free memory
[13:25:54] RECOVERY Current Load is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: OK - load average: 0.43, 0.88, 0.62
[13:27:23] RECOVERY Total processes is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: PROCS OK: 100 processes
[13:27:43] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 151 processes
[13:28:13] RECOVERY dpkg-check is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: All packages OK
[13:37:23] PROBLEM Total processes is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[13:37:33] PROBLEM dpkg-check is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[13:38:33] PROBLEM SSH is now: CRITICAL on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: Server answer:
[13:42:28] RECOVERY Total processes is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: PROCS OK: 125 processes
[13:42:33] RECOVERY dpkg-check is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: All packages OK
[13:43:33] RECOVERY SSH is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0)
[13:43:43] RECOVERY Free ram is now: OK on dumps-bot1.pmtpa.wmflabs 10.4.0.4 output: OK: 54% free memory
[14:12:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[14:22:42] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 146 processes
[14:52:23] PROBLEM Total processes is now: CRITICAL on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS CRITICAL: 202 processes
[14:55:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Unknown
[14:57:23] PROBLEM Total processes is now: WARNING on aggregator1.pmtpa.wmflabs 10.4.0.79 output: PROCS WARNING: 200 processes
[15:00:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
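The bots-bnr1 alerts above show the usual pattern for a freshly built instance: "Connection refused by host" until the NRPE daemon comes up, then everything recovers at once. A hedged sketch for telling the common NRPE failure modes apart, with the plugin path as on a stock Ubuntu install:

    # With no command argument, check_nrpe just asks the daemon for its version.
    /usr/lib/nagios/plugins/check_nrpe -H 10.4.1.68
    # "Connection refused by host"        -> nrpe not listening yet (still booting/puppetising)
    # "Could not complete SSL handshake." -> nrpe up but rejecting the caller
    #                                        (allowed_hosts/SSL mismatch), or too loaded to respond
    # On the instance itself:
    service nagios-nrpe-server status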
[15:01:15] Damianz
[15:01:20] I found it :P
[15:01:23] the old ramcheck
[15:03:33] petan, am i imagining things or was there an instance somewhere with WP dumps on?
[15:03:49] you are imagining things
[15:03:51] :D :D
[15:04:06] there is a beta with some little clones
[15:04:13] there is full clone of simple wiki
[15:04:21] but definitely not full clone of en
[15:04:35] :<
[15:05:00] gonna have to make a wee script to ssh to toolserver do a request and return :P
[15:05:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[15:07:43] @search git
[15:07:43] Results (Found 8): leslie's-reset, damianz's-reset, account-questions, git, origin/test, git-puppet, gitweb, msys-git,
[15:08:45] !git-branches is git branch blah - create a branch; git checkout blah - switch to blah
[15:08:45] Key was added
[15:10:03] !q1 is Damianz where is teh ramcheck in puppet?
[15:10:03] Key was added
[15:10:06] !q1
[15:10:07] Damianz where is teh ramcheck in puppet?
[15:20:42] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 152 processes
[15:21:13] PROBLEM dpkg-check is now: CRITICAL on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: DPKG CRITICAL dpkg reports broken packages
[15:25:42] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 150 processes
[15:44:26] !q1
[15:44:26] Damianz where is teh ramcheck in puppet?
[15:53:18] !q2
[15:53:39] ah nagios no longer reports the puppet freshness :/
[15:53:40] http://nagios.wmflabs.org/cgi-bin/nagios3/status.cgi?hostgroup=deployment-prep&style=detail
[15:53:51] the SNMP traps are sent from host but never received apparently
[15:53:55] or not handled properly
[15:53:55] sniff
[15:56:13] RECOVERY dpkg-check is now: OK on bots-bnr1.pmtpa.wmflabs 10.4.1.68 output: All packages OK
[16:03:32] !log bots root: created 20gb swap on bots-bnr1 to avoid OOM crashes
[16:03:34] Logged the message, Master
[16:04:25] !log bots petrb: moving afc bot to bots-bnr1
[16:04:26] Logged the message, Master
[16:22:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[16:30:52] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[16:38:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory
[16:40:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 21% free memory
[16:46:34] w
[16:46:36] damn
[16:51:41] !log wikidata-dev wikidata-dev-9: Added cron job for test coverage CSV generation.
[16:51:42] Logged the message, Master
[16:56:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory
[17:11:42] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 153 processes
[17:38:53] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 18% free memory
[17:46:43] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 148 processes
[17:54:43] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 153 processes
[18:43:04] <^demon> Ryan_Lane: Could you merge https://gerrit.wikimedia.org/r/#/c/43239/ for me?
[18:53:02] <^demon> Thanks.
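The 16:03 entry above ("created 20gb swap on bots-bnr1 to avoid OOM crashes") is a standard swap-file setup. A sketch; the file path is an assumption:

    # Allocate 20 GB of zeroes (20480 x 1 MiB) and restrict permissions.
    dd if=/dev/zero of=/swapfile bs=1M count=20480
    chmod 600 /swapfile
    # Format as swap and enable it immediately.
    mkswap /swapfile
    swapon /swapfile
    # Persist across reboots.
    echo '/swapfile none swap sw 0 0' >> /etc/fstab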
[18:53:45] yw
[18:54:43] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 149 processes
[18:55:54] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[18:56:24] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[19:00:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[19:01:23] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.02, 0.14, 0.22
[19:03:03] <^demon> Ryan_Lane: https://gerrit-review.googlesource.com/#/c/39140/ just got a +2 \o/
[19:03:07] <^demon> Yayayay.
[19:03:14] sweet
[19:11:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 18% free memory
[19:36:33] RECOVERY Free ram is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: OK: 20% free memory
[20:15:53] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed
[20:35:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[20:38:53] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory
[20:41:23] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory
[20:46:52] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 17% free memory
[20:54:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory
[21:23:18] andrewbogott_afk: How do I assign a public IP to my instance mwang-dev? I added a domain mwang-dev.wmflabs.org through labsconsole, but it doesn't seem to work.
[21:24:31] change the project quota and then assign one
[21:28:13] Damianz: where can I find the project quota?
[21:28:51] I'm not sure there's any interface for it in osm - I know you can do it from the nova management command line
[21:29:08] * Damianz points at Ryan_Lane for knowing that for sure
[21:29:49] mike_wang: https://labsconsole.wikimedia.org/wiki/Help:Contents#Administration
[21:30:01] mike_wang: https://labsconsole.wikimedia.org/wiki/Help:Nova-manage
[21:30:16] Setting tz to SGT = 'Hi John, we've set your timezone to Europe/London!' < *FROWN* wtf is the middleware not setting meh timezone...
[21:30:17] oh
[21:30:20] * Damianz goes to stab
[21:30:23] you don't have root in production
[21:30:31] mike_wang: which project is this in?
[21:30:37] Can I have toor in production?
[21:30:50] Damianz: sure. toor = no shell account
[21:30:52] :)
[21:31:03] toor is totally an alternative r00t account
[21:31:03] :D
[21:31:31] I need to add admin api support to OSM
[21:31:44] so that we can manage quotas and such through the interface
[21:32:05] Ryan_Lane: It is in testlabs
[21:32:25] wait
[21:32:31] you added a domain?
[21:32:39] go delete the domain you added
[21:32:44] that's an entire zone
[21:33:04] hm
[21:33:04] you know the interface could be a level of awesome better
[21:33:08] can we re-write it in python? :D
[21:33:17] well. I guess you actually need a zone to test nginx reverse proxy, eh?
[21:33:18] Damianz: hahaha
[21:33:26] Damianz: if we were going to do that we'd just use horizon
[21:33:38] :D
[21:33:38] <^demon> UUID PATCH IN MASTER.
[21:33:45] <^demon> PARDON ME, IT'S TIME FOR A DANCE PARTY
[21:33:48] tho quota support would be fucking awesome
[21:34:01] and means we can add notices for users trying to do stuff like add ips/servers more than 'failed'
[21:34:10] mike_wang: did you go to "Manage addresses"?
[21:34:15] ^demon: Buuut it's not friday, friday, friday yet :(
[21:34:45] <^demon> I'm treating it as friday. It's time for a beer.
[21:34:53] <^demon> :)
[21:34:54] in that interface you click on "allocate" next to testlabs
[21:34:55] It's always time for beer
[21:35:04] Damianz: and yes, the interface could be way better
[21:35:17] give me some time to work on it and it would improve. heh
[21:35:24] it hasn't had a major change since I switched the API
[21:35:36] I'll add it to the list of other shit you need to do :)
[21:36:17] I should work on osm somewhat now 2 is working.. or I think it's working bleh
[21:38:56] I think LDAP needs to be fixed on it
[21:39:00] in fact, I know it does
[21:39:09] that sucks
[21:39:15] I'll get that working today
[21:39:27] we need to have a dev environment. it's critical
[21:39:28] I might work on salt some more instead then :P
[21:43:19] I can not find "Manage addresses"
[21:43:43] PROBLEM Total processes is now: WARNING on parsoid-spof.pmtpa.wmflabs 10.4.0.33 output: PROCS WARNING: 157 processes
[21:45:00] Ryan_Lane: I found allocate IP
[21:45:14] manage addresses is on the sidebar
[21:45:42] under Labs Netadmins
[21:59:59] oh crap
[22:00:04] what have they done to youtube
[22:03:36] hey guys - trying to create a new instance on labs and I don't have permission
[22:03:44] (I don't see the "add instance" link)
[22:03:58] account: Milimetric
[22:04:10] What project?
[22:04:14] Analytics
[22:04:17] is it per project?
[22:04:32] Yeah - you need sysadmin rights in a project to create stuff
[22:05:31] ok, cool. Could someone give me rights to Analytics? I wanted to create an instance for robla and Diederik, and it's not related to reportcard
[22:05:38] it looks like I have rights to reportcard
[22:06:29] I can't, you need someone with sysadmin rights to add you
[22:07:25] Ryan_Lane: What is the freaking hell point of the 'admins' tab on project pages?
[22:07:26] thanks Damianz, I'll try to hunt someone down
[22:07:35] admins tab?
[22:07:43] !resources bots
[22:07:48] right, above members
[22:07:49] !resource bots
[22:07:49] https://labsconsole.wikimedia.org/wiki/Nova_Resource:bots
[22:07:58] oh
[22:08:04] that's meant to be a list of project admins
[22:08:18] like sysadmins/netadmins or manual?
[22:08:20] I'm very strongly considering collapsing sysadmin and netadmin
[22:08:27] makes sense for us
[22:08:29] yes
[22:08:32] I think so too
[22:08:40] tbh I'd collapse the functions too
[22:08:43] yes
[22:08:45] that's the idea
[22:08:46] Ryan_Lane do you have rights to add me to Analytics sysadmin?
[22:08:48] make adding an ip configuring an instance the same page
[22:08:51] milimetric: yes
[22:09:04] oh, that'd be helpful
[22:09:05] Damianz: I'd like to have the entire interface on one page
[22:09:12] Damianz: with tabs at the top
[22:09:15] all javascript
[22:09:18] yeah
[22:09:21] and ajax
[22:09:22] like user pref page
[22:09:26] oh Ryan_Lane
[22:09:26] less ugly tho
[22:09:30] nvm, diederik added me
[22:09:31] heh
[22:09:31] yes
[22:09:32] sorry!
[22:09:34] oh
[22:09:35] great
[22:09:38] no problem
[22:09:46] thx guys, have a nice night
[22:09:50] It's weird listening to a dance track being played acoustic... has a whole new meaning
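For the quota changes discussed above, which at this point could only be done from the nova management command line (see the Help:Nova-manage link at 21:30:01), a sketch following the essex-era nova-manage syntax; flag spellings varied between OpenStack releases, so treat the exact invocation as an assumption:

    # Show a project's current quotas.
    nova-manage project quota --project=testlabs
    # Raise individual limits; instances, cores and RAM are separate keys.
    nova-manage project quota --project=testlabs --key=instances --value=10
    nova-manage project quota --project=testlabs --key=cores --value=20
    nova-manage project quota --project=testlabs --key=floating_ips --value=2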
[22:13:22] * Damianz wonders if andrewbogott is feeling ok
[22:14:39] I've been getting kicked just as soon as I attach. It's something messed up with dircproxy I think.
[22:14:42] Sorry for the spam
[22:15:10] You should get some irssi proxy ;)
[22:16:08] hm, Ryan_Lane I've got the "Add instance" link for both "Reportcard" and "Analytics" now but adding an instance on "Analytics" fails with no message. Adding one on "Reportcard" works. I'll just do it on "Reportcard" but thought you guys might want to know. Maybe Diederik didn't add my permissions properly
[22:16:22] hm
[22:16:26] define fails
[22:16:28] I wonder if I broke something
[22:17:04] (I'm trying to get good at this so I can puppetize/replace the reportcard2 and kripke instances eventually :))
[22:17:09] milimetric: you mean you get a blank page?
[22:17:15] or it says it failed?
[22:17:18] no, it says it failed
[22:17:24] i can try again and give you the msg
[22:17:28] Sometimes in labs you have to wonder what isn't broken :)
[22:17:30] I bet you've hit a quota
[22:17:38] tho our nova api failure handling is crap
[22:17:40] Failed to create instance.
[22:17:40] Damianz: well, I need to pass through error messages
[22:17:52] and expose quotas
[22:17:55] yes
[22:17:56] that too
[22:18:05] I've never created an instance, but is there a per-project limit?
[22:18:10] milimetric: yes
[22:18:19] lemme see if you hit it
[22:18:23] we have 8 in Analytics
[22:18:24] it's not really instance though, it's cpu/ram, just to be confusing
[22:18:28] yep
[22:18:30] oh gotcha
[22:18:33] well, it's instance too
[22:18:40] I was trying a small instance
[22:18:40] yeah but you're gonna hit cpu first
[22:18:47] ye
[22:18:49] yep
[22:18:51] quota limit
[22:18:58] one sec
[22:19:58] why'd I remove the size from the instance list table?
[22:20:00] annoying
[22:20:08] because it was broken?
[22:20:12] no, it wasnt
[22:20:19] oh nope that was image sizes
[22:20:22] because someone told me the page was too cluttered
[22:20:24] the documentation says 10 instances is the quota limit, should I change it to 8?
[22:20:29] no
[22:20:31] it's not 8
[22:20:37] you likely hit cpu quota
[22:20:40] tbh smw makes it kinda suck
[22:20:41] gotcha
[22:20:56] or ram
[22:20:57] like we take a nice database of info, make it a horrid static format then transclude it
[22:21:08] you'd normally just memcache it and use the api =\ stupid mw
[22:21:09] Damianz: we're only doing that on project pages
[22:21:09] so then I'll work on "Reportcard" for now
[22:21:20] milimetric: try to create it now
[22:21:21] and when I'm ready I'll swap out my new instance for kripke
[22:21:42] should work, I upped instance and cpu quotas for that project
[22:21:46] ooh yay :)
[22:21:49] I'll up ram if necessary
[22:21:51] thank you!
[22:21:57] nope, it worked
[22:21:58] I'd go for memcache more things... but memcache breaks a lot :D
[22:22:01] I have a shiny new instance
[22:22:05] thanks again guys
[22:22:10] Damianz: most likely because the hardware is broken
[22:22:19] milimetric: yw
[22:22:31] I need to figure out which dimm is bad
[22:22:36] It's 2013, new year new 20odd million quid budget... buy ram!
[22:22:51] yeah. but I need to take down virt0 to figure out which dimm is bad
[22:23:10] Would be interesting to see if live migration works
[22:23:11] heh
[22:23:22] actually storage is local still isn't it
[22:23:23] sadface
[22:23:40] or well... glusterish local
[22:24:16] which storage?
[22:24:22] oh
[22:24:23] vm
[22:24:23] yeah
[22:24:24] storage
[22:24:26] err
[22:24:30] instance storage
[22:24:31] local
[22:24:34] yeah
[22:24:45] we're going to try ceph rbd in eqiad
[22:25:12] I'm really hoping ceph becomes awesome and not like gluster... not seen many stats off it really though
[22:25:19] yeah. agreed
[22:25:26] lots of openstack folks using it
[22:25:32] so I have a little faith there
[22:26:08] I think dreamhost backing it is good from a point of real world exposure to insane fs usage patterns
[22:26:15] yes
[22:26:22] and they are using it for their public cloud
[22:26:34] they're using loads of ssds iirc
[22:26:36] so that's a good public example of it
[22:26:37] yeah
[22:26:38] they are
[22:29:14] I wonder if we'll get to play with ipv6 in eqiad too... tho really it's redundant until there's a network node per compute node
[22:29:24] yes
[22:29:26] we will
[22:29:39] we're doing network node per compute in eqiad from the start too
[22:29:53] PROBLEM host: redmine.pmtpa.wmflabs is DOWN address: 10.4.1.71 CRITICAL - Host Unreachable (10.4.1.71)
[22:30:49] Is that still pending the work to advertise ips to the routers over bgp (ie the next 6mo release)?
[22:33:39] yes
[22:33:48] I need to add that to quantum
[22:33:50] badly
[22:33:52] RECOVERY host: redmine.pmtpa.wmflabs is UP address: 10.4.1.71 PING OK - Packet loss = 0%, RTA = 0.79 ms
[22:33:59] I'm just going to add it to nova-network for us
[22:34:04] but I want it upstream
[22:34:22] PROBLEM Total processes is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:35:12] PROBLEM dpkg-check is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:35:52] PROBLEM Current Load is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:36:32] PROBLEM Current Users is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:36:45] I can imagine others using it... since arp is a really crap way of moving ips around
[22:37:12] PROBLEM Disk Space is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:38:02] PROBLEM Free ram is now: CRITICAL on redmine.pmtpa.wmflabs 10.4.1.71 output: Connection refused by host
[22:44:22] RECOVERY Total processes is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: PROCS OK: 84 processes
[22:45:53] RECOVERY Current Load is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: OK - load average: 0.18, 0.75, 0.56
[22:45:53] RECOVERY dpkg-check is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: All packages OK
[22:46:33] RECOVERY Current Users is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: USERS OK - 0 users currently logged in
[22:47:13] RECOVERY Disk Space is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: DISK OK
[22:48:03] RECOVERY Free ram is now: OK on redmine.pmtpa.wmflabs 10.4.1.71 output: OK: 889% free memory
[22:58:54] Damianz: yes, a bunch of people were excited about it at the last summit
[22:59:05] Damianz: did you hear the october summit will be in europe?
[22:59:33] :o nope
[22:59:46] I'm conference planning for this year atm though
[23:00:13] it's a good one to go to
[23:00:48] definitely one of my favorite ones
[23:00:54] Cool
[23:01:16] I'm hoping euro python release their dates soon... wondering if I can ride over to europe and have some fun in the summer
[23:05:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed
[23:07:23] PROBLEM Total processes is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake.
[23:10:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory
[23:12:22] RECOVERY Total processes is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: PROCS OK: 193 processes
[23:35:02] PROBLEM host: lwelling.pmtpa.wmflabs is DOWN address: 10.4.1.69 CRITICAL - Host Unreachable (10.4.1.69)
[23:38:53] RECOVERY host: lwelling.pmtpa.wmflabs is UP address: 10.4.1.69 PING OK - Packet loss = 0%, RTA = 9.37 ms
[23:39:23] PROBLEM Total processes is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:40:53] PROBLEM Current Load is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:40:54] PROBLEM dpkg-check is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:41:33] PROBLEM Current Users is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:42:13] PROBLEM Disk Space is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:43:03] PROBLEM Free ram is now: CRITICAL on lwelling.pmtpa.wmflabs 10.4.1.69 output: Connection refused by host
[23:47:12] RECOVERY Disk Space is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: DISK OK
[23:48:02] RECOVERY Free ram is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: OK: 900% free memory
[23:49:22] RECOVERY Total processes is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: PROCS OK: 84 processes
[23:50:32] PROBLEM Free ram is now: WARNING on mediawiki-bugfix-kozuch.pmtpa.wmflabs 10.4.0.26 output: Warning: 19% free memory
[23:50:52] RECOVERY Current Load is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: OK - load average: 0.08, 0.72, 0.63
[23:50:52] RECOVERY dpkg-check is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: All packages OK
[23:51:32] RECOVERY Current Users is now: OK on lwelling.pmtpa.wmflabs 10.4.1.69 output: USERS OK - 0 users currently logged in