[00:03:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [00:07:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [00:13:53] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 17% free memory [00:16:22] Ryan_Lane, is there a clever way to do a big search/replace in ldap, or would you just change all the sudoers with a hand-edited ldif? [00:17:14] andrewbogott: I usually use a hand-edited ldif [00:17:24] ok, that's fine. [00:19:30] andrewbogott: is adding security groups to instances possible in folsom? [00:19:45] I only said that because I heard it from you (I thought) [00:19:46] that alone may be worth putting in effort to upgrading soon [00:19:49] ah [00:19:55] let me look at the api docs [00:37:44] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [00:37:54] Ryan_Lane: Before I fill in the rest of the blanks… does sudogroup.ldif on virt0 look right to you? [00:37:54] RECOVERY Free ram is now: OK on swift-be4.pmtpa.wmflabs 10.4.0.127 output: OK: 20% free memory [00:37:59] lemme see [00:38:14] also, I'm pretty sure it's still not in the api to add security groups to instances [00:38:21] Also… there are four small OSM patches awaiting your review. [00:38:23] :( [00:38:54] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 20% free memory [00:38:55] Ryan_Lane: It's tempting to add a web group by default to all projects and instances [00:39:08] that doesn't look right ;) [00:39:16] ok -- how so? [00:39:24] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [00:39:27] oh [00:39:29] ignore me [00:39:34] I was reading it incorrectly [00:39:50] did you do a search for any policy that matched "ALL" for sudo user? [00:39:57] yes. [00:40:05] * Ryan_Lane nods [00:40:08] And verified (by hand) that none of them had additional users. [00:40:16] yeah, this looks fine [00:40:24] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 22% free memory [00:40:25] ok. I will get back to the copy/pasting [00:40:52] oh. some of the entries wrapped around [00:40:56] ldapsearch is annoying [00:41:05] and will add line breaks into the files [00:41:09] supposedly that's proper ldif format… indenting the next line by one space [00:41:20] it will probably work :) [00:41:27] I've had issues in the past [00:41:40] * andrewbogott crosses fingers [00:41:42] heh [00:41:48] worst case it errors out on some [00:46:02] PROBLEM Free ram is now: WARNING on swift-be4.pmtpa.wmflabs 10.4.0.127 output: Warning: 15% free memory [00:47:03] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 147 processes [00:51:05] andrewbogott: why the switch to $this->projectName in SpecialNovaSudoer? [00:51:40] I think I thought I would need it. Maybe it didn't actually save any time... [00:52:15] it's not actually a problem [00:53:10] it's less calls and less code [00:54:33] andrewbogott: I think we want to remove "ALL" completely [00:54:36] Oh, it's because I wasn't sure if I could call getRequest in the tryblahsubmit functions. Because isn't the request something different by that time? [00:54:58] Ryan_Lane: You mean remove it from the GUI as well? 
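The line-wrapping problem described above is LDIF folding: ldapsearch breaks long attribute values onto continuation lines that begin with a single space. A minimal sketch of unfolding an export before hand-editing it, assuming GNU awk; the file names (sudoers.ldif, sudoers-unfolded.ldif) and the bind DN are placeholders, not the ones actually used here:

    # Join LDIF continuation lines (a leading space means "continue the
    # previous line") so the export can be search/replaced as whole lines.
    awk 'NR == 1 { line = $0; next }
         /^ /    { line = line substr($0, 2); next }
                 { print line; line = $0 }
         END     { print line }' sudoers.ldif > sudoers-unfolded.ldif

    # After hand-editing, the entries could be applied with something like:
    #   ldapmodify -x -D "$ADMIN_DN" -W -f sudoers-unfolded.ldif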
[00:55:01] yes [00:55:16] it's dangerous [00:55:44] once people start seeing !authenticated, they'll want to use it for other things too [00:56:08] I think that… as I wrote it, the 'ALL' in the GUI is just a placeholder for %project-projectname [00:56:13] So it won't ever get inserted into ldap [00:56:15] oh [00:56:26] that works, then [00:56:37] It's easy to change the gui string to say 'all project memebers' instead of 'ALL' [00:56:43] I think. [00:57:14] I should make it an explicit localized string, that would be clearer. [00:57:51] * Ryan_Lane nods [00:57:55] yeah, indeed [01:00:53] PROBLEM Free ram is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CHECK_NRPE: Socket timeout after 10 seconds. [01:03:23] PROBLEM Current Load is now: CRITICAL on sube.pmtpa.wmflabs 10.4.0.245 output: Connection refused by host [01:03:23] PROBLEM Free ram is now: CRITICAL on sube.pmtpa.wmflabs 10.4.0.245 output: Connection refused by host [01:04:23] PROBLEM Current Load is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CHECK_NRPE: Socket timeout after 10 seconds. [01:04:53] PROBLEM Total processes is now: CRITICAL on sube.pmtpa.wmflabs 10.4.0.245 output: Connection refused by host [01:05:12] PROBLEM dpkg-check is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CHECK_NRPE: Socket timeout after 10 seconds. [01:05:32] PROBLEM Disk Space is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CHECK_NRPE: Socket timeout after 10 seconds. [01:05:52] PROBLEM Total processes is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CHECK_NRPE: Socket timeout after 10 seconds. [01:06:51] ryan_lane, stage three is sudonopassword.ldif [01:07:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 12% free memory [01:07:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [01:08:32] PROBLEM SSH is now: CRITICAL on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: CRITICAL - Socket timeout after 10 seconds [01:08:50] in https://gerrit.wikimedia.org/r/#/c/45481/ you say a localized string [01:08:59] I don't see a message from i18n anywhere [01:09:21] Oh, I didn't make the change yet, that was just a reminder. [01:09:28] ah ok [01:09:42] PROBLEM Total processes is now: WARNING on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS WARNING: 173 processes [01:09:53] meanwhile, nova-precise2 is… broken? [01:10:02] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 152 processes [01:10:07] it is? [01:10:15] in which way? [01:10:35] timing out [01:10:44] nagois agrees :( [01:10:46] on ssh? 
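The diagnosis that follows rests on the instance's console log. A minimal sketch of pulling it with the OpenStack nova client, assuming python-novaclient is installed and admin credentials are exported in the environment; the instance name and grep pattern are illustrative:

    # Fetch the guest console output and look for OOM-killer activity,
    # then reboot the instance if it is wedged.
    nova console-log nova-precise2 | grep -iE 'out of memory|oom-killer' | tail -n 20
    nova reboot nova-precise2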
[01:10:52] ugh [01:11:02] I bet it OOM'd [01:11:11] 1GB of memory may not have been enough [01:11:50] check its console log [01:12:39] fixed one typo in the ldif [01:12:41] otherwise it looks good [01:13:18] yep [01:13:19] OOM [01:13:23] I'm going to reboot it [01:14:43] RECOVERY Total processes is now: OK on bots-salebot.pmtpa.wmflabs 10.4.0.163 output: PROCS OK: 96 processes [01:15:25] ok [01:15:31] it's absurd that get console output is in the same thread as everything else [01:16:41] I guess really it's just absurd that the periodic tasks are [01:16:41] heh [01:18:23] RECOVERY SSH is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1 (protocol 2.0) [01:19:13] PROBLEM Current Load is now: WARNING on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: WARNING - load average: 3.53, 15.04, 15.14 [01:20:23] RECOVERY Disk Space is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: DISK OK [01:20:34] andrewbogott: ok. it's back up [01:20:43] RECOVERY Free ram is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: OK: 221% free memory [01:20:43] RECOVERY Total processes is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: PROCS OK: 111 processes [01:20:53] RECOVERY dpkg-check is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: All packages OK [01:20:53] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 148 processes [01:21:53] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 17% free memory [01:24:13] RECOVERY Current Load is now: OK on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: OK - load average: 0.17, 0.49, 0.26 [01:24:24] PROBLEM dpkg-check is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. [01:29:24] RECOVERY dpkg-check is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: All packages OK [01:39:24] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [02:03:54] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [02:10:22] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [02:20:33] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [02:24:53] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 151 processes [02:25:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [02:29:03] Is the thing returned by $this->msg a different datatype than a "string literal"? [02:31:15] andrewbogott: where? [02:31:31] in mediawiki code [02:31:46] heh. right. but where in the code? [02:31:55] Oh, um… [02:32:35] in SpecialNovaSudoer.php... 
[02:32:57] the behavior changes when I change "All project members" to a localized string [02:33:14] ...maybe I've introduced some interesting bug and it's nothing to do with localization… [02:33:22] RECOVERY Current Load is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK - load average: 0.02, 0.06, 0.02 [02:33:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 16% free memory [02:33:35] I think it just turns a message into a string [02:34:19] Yeah, that's what I'd expect [02:35:02] RECOVERY Total processes is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: PROCS OK: 92 processes [02:40:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [02:41:43] Sure enough, whatever it returns can't be used as an array index [02:46:54] huh. weird [02:47:05] does it need to be explitly turned into a string? [02:48:01] If i just use 'openstackmanager-allmembers' then the logic works but the web interface just displays 'openstackmanager-allmembers'. It doesn't localize. [02:48:25] So I thought I needed to load it with msg() first, but then the option doesn't display at all. [02:48:55] This is in getSudoUsers [02:49:03] * Ryan_Lane nods [02:49:06] The code is live on nova-precise2 if you want to see the weirdness [02:51:23] it's in createSudoers? [02:51:50] Yep, that's what calls it [02:51:54] Or, one of the things [02:52:09] * Ryan_Lane nods [02:52:28] where does the localized msg get injected? [02:52:47] $userUid = $this->msg( 'openstackmanager-allmembers' ); ? [02:53:50] $user_keys[$projectmember] = $userUid [02:53:59] I believe that $projectmember is what gets displayed [02:54:23] If you look at debug lines in the comments, you can see that that line does nothing [02:54:35] where's the debug log? [02:54:42] view source [02:54:50] ah [02:54:50] ok [03:00:00] it never sets the key [03:01:45] Yeah. I print out the key and the value, then set it, then show the newly set array and… it's empty. [03:01:55] Or, that's what it looks like to me? [03:02:00] look at line 271 [03:02:14] in the file on nova-precise2 [03:02:26] that line doesn't get printed in the output [03:03:37] That's a different array, though? user_defaults vs. user_keys [03:03:42] ah [03:03:42] right [03:03:43] crap [03:03:56] the setting should be down around 284 [03:05:16] I'm thinking this is some ascii vs. unicode thing... [03:05:22] which, when I print them out they look the same [03:05:35] Although I would think I'd get an error if I set an invalid key, rather than just nothing at all [03:08:31] Of course if that's right then that would mean that unicode can never be displayed in an htmlform, which seems unlikely [03:11:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [03:14:44] andrewbogott: have you tried casting it to a string explicitly? [03:15:25] that's what I was trying above, in the incorrect spot [03:16:11] nope, I'll try that [03:16:24] the key and value will both need to be cast in that situation [03:16:41] works [03:16:49] But… that means that it can't actually be localized :( [03:16:53] ah [03:16:54] true [03:16:56] :( [03:17:21] So I guess I will make a note about htmlform being broken... 
[03:18:05] yeah, add a bug [03:20:33] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [03:34:52] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 147 processes [03:40:50] OK, there's a new (uglier) patch… I'm going to get some food now. [03:41:29] yep. see it [03:41:35] enjoy [03:41:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [03:50:34] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 7% free memory [04:05:33] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [04:07:53] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 155 processes [04:11:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [04:37:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [04:38:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 30% free memory [04:41:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [04:41:53] RECOVERY Free ram is now: OK on swift-be2.pmtpa.wmflabs 10.4.0.112 output: OK: 20% free memory [04:45:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [04:50:02] PROBLEM Free ram is now: WARNING on swift-be2.pmtpa.wmflabs 10.4.0.112 output: Warning: 17% free memory [04:56:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 18% free memory [05:10:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 19% free memory [05:12:32] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [05:28:52] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed [05:33:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [05:42:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [05:42:52] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 150 processes [06:05:53] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 153 processes [06:08:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Unknown [06:14:24] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [06:30:33] PROBLEM Total processes is now: WARNING on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: PROCS WARNING: 154 processes [06:31:23] PROBLEM Total processes is now: WARNING on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS WARNING: 152 processes [06:35:33] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [06:35:34] RECOVERY Total processes is now: OK on dumps-bot2.pmtpa.wmflabs 10.4.0.60 output: PROCS OK: 149 processes [06:45:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [06:48:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 8% free memory [06:56:22] RECOVERY Total processes is now: 
OK on parsoid-roundtrip4-8core.pmtpa.wmflabs 10.4.0.39 output: PROCS OK: 147 processes [07:12:14] Ryan_Lane: tried to access my instance etherpad-lite but authentication with my username wikinaut _there_ fails. Pls. can you check and correct what is wrong [07:16:42] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [07:17:38] Damianz: yesterday you mentioned "I hate gerrit" because raw file view is missing [07:17:43] here's https://bugzilla.wikimedia.org/show_bug.cgi?id=42989 [07:18:42] PROBLEM Current Load is now: WARNING on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: WARNING - load average: 4.83, 5.16, 5.06 [07:46:55] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [07:53:43] RECOVERY Current Load is now: OK on parsoid-roundtrip7-8core.pmtpa.wmflabs 10.4.1.26 output: OK - load average: 4.97, 4.74, 4.96 [08:17:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [08:23:43] PROBLEM Total processes is now: WARNING on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS WARNING: 152 processes [08:37:18] Change on 12mediawiki a page Wikimedia Labs/Toolserver features needed in Tool Labs was modified, changed by DrTrigon link https://www.mediawiki.org/w/index.php?diff=633934 edit summary: [+178] /* Web */ + some stats [08:41:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 30% free memory [08:47:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [08:58:43] RECOVERY Total processes is now: OK on bastion1.pmtpa.wmflabs 10.4.0.54 output: PROCS OK: 140 processes [09:04:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 17% free memory [09:08:57] !log bots addshore: killing 5716 mono by eikes using 100% cpu [09:09:00] Logged the message, Master [09:10:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [09:10:52] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 147 processes [09:17:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [09:25:32] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 5% free memory [09:35:34] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [09:47:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [09:50:12] PROBLEM dpkg-check is now: CRITICAL on toro.pmtpa.wmflabs 10.4.0.98 output: DPKG CRITICAL dpkg reports broken packages [09:55:53] RECOVERY dpkg-check is now: OK on toro.pmtpa.wmflabs 10.4.0.98 output: All packages OK [10:18:52] PROBLEM Total processes is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS WARNING: 152 processes [10:19:22] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [10:33:52] RECOVERY Total processes is now: OK on bots-4.pmtpa.wmflabs 10.4.0.64 output: PROCS OK: 148 processes [10:49:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [11:10:32] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 4% free memory [11:12:32] PROBLEM Current Load is now: CRITICAL on
etherpad-lite.pmtpa.wmflabs 10.4.0.87 output: CRITICAL - load average: 21.81, 21.56, 20.45 [11:19:42] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [11:50:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [12:20:24] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [12:39:23] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 30% free memory [12:40:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 24% free memory [12:40:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [12:45:34] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 4% free memory [12:51:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [12:53:23] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory [12:57:24] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 17% free memory [13:21:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [13:24:51] anyone familiar with deployment on Labs? [13:51:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [14:21:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [14:51:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [14:58:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed [15:03:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [15:21:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [15:52:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [15:53:53] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. [16:22:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [16:23:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [16:37:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 30% free memory [16:38:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [16:43:52] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed [16:48:14] PROBLEM Disk Space is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. [16:48:24] PROBLEM dpkg-check is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. [16:48:24] PROBLEM Current Load is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. 
[16:51:24] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory [16:52:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [16:53:13] RECOVERY Disk Space is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: DISK OK [16:53:23] RECOVERY dpkg-check is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: All packages OK [16:53:24] RECOVERY Current Load is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: OK - load average: 0.11, 0.22, 0.32 [17:10:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 17% free memory [17:24:22] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [17:38:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [17:54:24] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [18:11:32] Krenair, can I safely do a git-reset on nova-precise2, or do you have a work in progress there? [18:11:50] I'm not doing anything [18:11:58] I was actually waiting for you to finish whatever you were doing there :P [18:15:07] Oh, huh, I wonder what all these local changes are. [18:16:37] It looks like someone uploaded files which included patches in newer versions of master [18:16:43] It might just be that [18:17:06] But they could also have changed the files themselves [18:25:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [18:33:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Unknown [18:38:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [18:38:54] PROBLEM Current Load is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: Connection refused by host [18:39:33] PROBLEM Disk Space is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: Connection refused by host [18:40:14] PROBLEM Free ram is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: Connection refused by host [18:41:44] PROBLEM Total processes is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: Connection refused by host [18:42:24] PROBLEM dpkg-check is now: CRITICAL on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: Connection refused by host [18:43:54] RECOVERY Current Load is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: OK - load average: 0.96, 0.98, 0.53 [18:44:34] RECOVERY Disk Space is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: DISK OK [18:45:13] RECOVERY Free ram is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: OK: 856% free memory [18:46:43] RECOVERY Total processes is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: PROCS OK: 91 processes [18:47:23] RECOVERY dpkg-check is now: OK on pediapress-packager2.pmtpa.wmflabs 10.4.0.13 output: All packages OK [18:56:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [19:25:20] ryan_lane, I'm happy with https://gerrit.wikimedia.org/r/#/c/45481/ now so would appreciate a second review. The other two patches haven't changed (just rebased) so no need to reread them. 
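Earlier in this exchange the question comes up of whether a git-reset on nova-precise2 is safe given the unexplained local changes. A cautious sketch of inspecting and then discarding them while keeping a copy, assuming an ordinary clone tracking origin/master; the checkout path is a placeholder:

    cd /srv/extensions/OpenStackManager   # hypothetical path to the checkout
    git status
    git diff --stat                       # which files carry local changes
    git stash save "pre-reset backup"     # keep the tracked edits, just in case
    git fetch origin
    git reset --hard origin/master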
[19:26:52] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [19:34:56] Change on 12mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by DrTrigon link https://www.mediawiki.org/w/index.php?diff=634083 edit summary: [+234] /* Labs wide (not only bots / tools), but available for all projects */ user projects or "products"? [19:36:55] Change on 12mediawiki a page Wikimedia Labs/Toolserver features wanted in Tool Labs was modified, changed by DrTrigon link https://www.mediawiki.org/w/index.php?diff=634085 edit summary: [+32] /* Labs wide (not only bots / tools), but available for all projects */ aaaww - I forgot; what about migration? [19:40:51] andrewbogott: ok. I'll take a look at that soon [19:41:37] andrewbogott, hey. you done with nova-precise2's OpenStackManager repo for now? [19:41:47] Krenair: Yep, all yours. [19:42:27] there's also https://nova-precise2.pmtpa.wmflabs/wiki2/Main_Page now [19:43:11] hm. though I do a cp of the other location. I should probably ensure its git config is correct [19:44:01] ah. right. no submodules [19:44:02] it's fine [19:56:54] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [20:02:13] PROBLEM Free ram is now: WARNING on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: Warning: 19% free memory [20:12:13] RECOVERY Free ram is now: OK on bots-liwa.pmtpa.wmflabs 10.4.1.65 output: OK: 20% free memory [20:27:53] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [20:28:53] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed [20:33:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [20:40:22] RECOVERY Free ram is now: OK on sube.pmtpa.wmflabs 10.4.0.245 output: OK: 30% free memory [20:40:32] PROBLEM Free ram is now: WARNING on bots-4.pmtpa.wmflabs 10.4.0.64 output: Warning: 6% free memory [20:41:22] RECOVERY Free ram is now: OK on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: OK: 21% free memory [20:45:32] PROBLEM Free ram is now: CRITICAL on bots-4.pmtpa.wmflabs 10.4.0.64 output: Critical: 3% free memory [20:48:22] PROBLEM Free ram is now: WARNING on sube.pmtpa.wmflabs 10.4.0.245 output: Warning: 17% free memory [20:49:22] PROBLEM Free ram is now: WARNING on bots-sql2.pmtpa.wmflabs 10.4.0.41 output: Warning: 15% free memory [20:59:14] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [21:23:22] PROBLEM Disk Space is now: WARNING on nova-precise2.pmtpa.wmflabs 10.4.1.57 output: DISK WARNING - free space: / 567 MB (5% inode=84%): [21:23:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Unable to read output [21:29:24] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [21:53:52] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [21:59:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [22:06:57] andrewbogott: I think I +2'd everything [22:07:04] andrewbogott: ready for deployment? [22:13:08] Ryan_Lane: Yes! [22:13:19] Did you merge too, or shall I do that? [22:13:20] sweet. need help? [22:13:28] go for it [22:13:33] ok. I think I'm set. 
[22:16:45] Running bzcat ../data/simplewiktionary-20130113-pages-meta-current.xml.bz2 | php maintenance/importDump.php on my instance is taking a long time -- going for 20m [22:16:57] storage is slow [22:17:10] glusterfs 40-50% of CPU [22:17:17] is this normal ? [22:17:21] yeah [22:17:25] sadly [22:17:50] How much slower than real metal, roughly ? [22:19:06] not sure we actually have numbers, but quite a bit [22:20:07] glusterfs is incredibly slow [22:20:27] most of the cpu it's eating is likely waiting on io [22:20:53] xyzram: if you need fast storage use /mnt [22:21:15] it's local to the instance, though, so remember that if your instance goes away that storage does too [22:22:00] Ok, that was going to me my question. [22:23:52] PROBLEM Free ram is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to popen() failed [22:24:02] * Damianz waits [22:24:56] xyzram: [22:25:03] root@i-0000009e:/mnt# dd if=testfile of=/dev/null bs=8k [22:25:03] 75624+0 records in [22:25:03] 75624+0 records out [22:25:03] 619511808 bytes (620 MB) copied, 0.307515 s, 2.0 GB/s [22:25:03] root@i-0000009e:/mnt# cd /data/project/ [22:25:06] root@i-0000009e:/data/project# dd if=testfile of=/dev/null bs=8k [22:25:08] 75624+0 records in [22:25:10] 75624+0 records out [22:25:12] 619511808 bytes (620 MB) copied, 20.5206 s, 30.2 MB/s [22:25:17] A little un-realistic maybe but yeah.. it's kinda that slow [22:25:56] Wow, thanks. [22:27:42] Ryan_Lane: it could be interesting to lame benchmark gluster for labs against different write/read workloads just to give users an idea [22:28:06] is it really 30MB/s? [22:28:10] I want to try something. [22:29:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [22:30:22] I'm going to see if switching to the virtio network driver will speed this up [22:30:34] I've been meaning to switch all instances to that for some time [22:30:40] so I'm going to try a single instance first [22:31:52] a 600mb file with 9k block size reports that... dd speed is a bit meh though... we could do better, but just the time it takes to run it seems to be that slow [22:33:30] Confused a bit: I thought /mnt was the fast one but your benchmark seemed to show it as the slow one ? [22:34:07] Oops, sorry, misread MB/GB. [22:34:26] yeah... unit matters, mnt is fast [22:35:54] I'm going to kill my import task and redo it on /mnt [22:42:59] yeah, use /mnt for speed [22:43:16] use /data/project for sharing and large storage [22:43:54] Damianz: 30 MB or Mb? [22:44:11] MB [22:44:14] ok [22:44:25] so 300 not 30 [22:44:32] still slow [22:44:40] -_- [22:44:47] [ 6.931582] 8139cp 0000:00:03.0: eth0: link up, 100Mbps, full-duplex, lpa 0x05E1 [22:44:59] the NICs are 100Mb? [22:45:03] no fucking wonder [22:45:12] .... [22:45:16] lolololol [22:45:31] What are the hosts connected at 1gb/10gb or a PC of x gb? [22:46:15] hosts are 1Gb [22:46:49] ok. 
I can't fix this right now [22:47:08] we need network node per compute node first at minimum [22:47:19] and we likely need to bond some nics on the hosts [22:48:52] PROBLEM Current Load is now: CRITICAL on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: Connection refused by host [22:49:32] PROBLEM Disk Space is now: CRITICAL on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: Connection refused by host [22:49:33] 8139 is 100mbps [22:49:35] 8169 is 1gbps [22:50:12] PROBLEM Free ram is now: CRITICAL on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: Connection refused by host [22:51:42] PROBLEM Total processes is now: CRITICAL on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: Connection refused by host [22:52:22] PROBLEM dpkg-check is now: CRITICAL on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: Connection refused by host [22:52:37] Well normally you'd bond minimum 2 gig nics into 2 separate switches in a stack so you're redundant and fast... it would be interesting to use something like openvswitch, give everything gb then apply QOS to stop 1 box eating the entire hosts bw [22:52:48] paravoid: HAI! Long time no seeeees [22:52:56] I can't believe no one has noticed this, this entire time :D [22:53:04] Ryan_Lane: for shame [22:53:07] Ryan_Lane: for shame [22:53:12] 0 fucks given [22:53:24] I would never put > 100mb of data on gluster :D [22:53:29] heh [22:53:46] Damianz: someone was in the channel the other day talking about how they have 900TB of data corrupted [22:53:52] RECOVERY Current Load is now: OK on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: OK - load average: 0.84, 0.99, 0.55 [22:53:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [22:53:57] It would actually be more interesting if gluster was presented to the host and then to the vm as a 'drive'... but in our use case that would suck quickly [22:54:09] yeah [22:54:10] it would [22:54:12] Ryan_Lane: =\ yeah but like lose 1 host and you're fucked [22:54:17] indeed [22:54:32] RECOVERY Disk Space is now: OK on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: DISK OK [22:54:40] 'oh yeah just stat all the files to replicate' ... 'but I have fucking billions' .... [22:54:41] 900T corrupted? [22:54:46] who puts 900T on gluster? [22:54:48] seriously [22:54:54] that was my thougth [22:54:56] *thought [22:55:01] that's terrifying [22:55:13] RECOVERY Free ram is now: OK on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: OK: 894% free memory [22:55:25] ok, let's switch the driver to virtio on this instance [22:55:31] andrewbogott: <3 [22:56:00] I thought it was + or % or something though *shrug* [22:56:43] RECOVERY Total processes is now: OK on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: PROCS OK: 84 processes [22:57:01] like we give a fuck [22:57:05] what's wrong with lots of processes [22:57:06] -.- [22:57:15] * Damianz adds to list of things to think about removing [22:57:23] RECOVERY dpkg-check is now: OK on testing-virtio.pmtpa.wmflabs 10.4.0.45 output: All packages OK [22:59:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [23:10:28] Damianz: [23:10:32] root@testing-virtio:/data/project# dd if=testfile of=/dev/null bs=8k [23:10:32] 262144+0 records in [23:10:32] 262144+0 records out [23:10:32] 2147483648 bytes (2.1 GB) copied, 24.0796 s, 89.2 MB/s [23:10:42] hmm, is something up with bots-4?
Just realised none of my processes seem to have run in the past 24 hours :/ [23:11:13] root@testing-virtio:/data/project# dd if=/dev/zero of=testfile bs=1048576 count=2048 [23:11:14] 2048+0 records in [23:11:14] 2048+0 records out [23:11:14] 2147483648 bytes (2.1 GB) copied, 60.3552 s, 35.6 MB/s [23:11:31] definitely faster [23:13:02] 35.6 MB/s is a little faster than 30.2 MB/s, yes [23:13:21] that's write anyway [23:13:22] :D [23:13:25] yes [23:13:28] read was 89.2 [23:13:33] nice [23:13:35] yes [23:13:41] didn't scroll up that far [23:13:51] andre__: lack of memory [23:13:54] now I'm positive we're going to overload our network node [23:14:15] I see adding some bgp code in soon [23:14:22] *shrug* [23:14:24] break it [23:14:25] then replace it [23:14:27] heh [23:14:41] well, I'm going to do network node per compute node [23:14:50] then we can bond ports on each of the compute nodes [23:15:04] and then if we saturate the gluster nodes we can bond ports on them too [23:15:22] you realize that both 30 & 35MB/s are > 100mbps, right? :) [23:15:36] yes? [23:15:54] it does caching and shiz so can be over... [23:15:58] also, dd without o_direct are kinda pointless [23:16:03] use oflag=direct [23:16:26] gluster doesn't do direct io, though, right? [23:16:26] and pagesize increments [23:16:34] no idea [23:16:38] it doesn't [23:16:55] this is one of the reasons mysql can't run on gluster [23:21:07] paravoid: I was really just wanting a quick estimate ;) [23:23:54] PROBLEM Free ram is now: UNKNOWN on aggregator2.pmtpa.wmflabs 10.4.0.193 output: NRPE: Call to fork() failed [23:25:23] PROBLEM dpkg-check is now: CRITICAL on aggregator2.pmtpa.wmflabs 10.4.0.193 output: CHECK_NRPE: Error - Could not complete SSL handshake. [23:29:23] PROBLEM host: orgcharts-dev.pmtpa.wmflabs is DOWN address: 10.4.0.122 CRITICAL - Host Unreachable (10.4.0.122) [23:30:23] RECOVERY dpkg-check is now: OK on aggregator2.pmtpa.wmflabs 10.4.0.193 output: All packages OK [23:36:14] Damianz: (long pause) yes, it's actually %project- [23:36:24] * Damianz pats andrewbogott [23:36:51] paravoid: http://etherpad.wmflabs.org/pad/p/gluster-tests [23:37:17] I'm running the same test with --file-test-mode=seqwr [23:37:52] I should probably try with a larger number of files [23:38:12] I think that test did actually distribute the io across all 4 gluster nodes, though [23:39:11] actually, it didn't [23:39:14] that was two nodes [23:39:27] I wonder if gluster assigns clients to nodes, rather than files [23:39:38] I'd have to imagine it's files [23:39:52] Ryan_Lane (or anyone), do you know who is 'in charge' of the mediawiki config on deployment-prep? MaxSem has some pending changes for mobile, wondering who we should coordinate with. [23:40:13] I'd ask hashar [23:40:17] yep [23:40:23] hashar and chrismcmahon [23:40:32] hashar is out this week, alas. [23:41:32] MaxSem, am I properly understanding your issue? You just want to avoid toe-stomping when you make the changes? [23:41:35] * Damianz points at the lovely chrismcmahon [23:42:46] andrewbogott, I see a few undeployed changes in the newdeploy branch and am not sure what's the procedure of synching it with master [23:43:29] I'm not sure how beta is running right now [23:43:38] it shouldn't be using the new deployment system anymore [23:43:44] it should be running master [23:43:56] if it's still on the new deployment system it should get switched back [23:45:45] chrismcmahon, can you comment? 
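Returning to the storage numbers: a rough sketch of repeating the comparison in a way that addresses the objections raised above, with the page cache taken out of the read test and the write test forced to hit storage; direct I/O is deliberately not used on the Gluster mount since, as noted, it is unsupported there. Sizes, paths and the sysbench 0.4-style invocation (32 files, as in the run described) are illustrative:

    cd /data/project

    # Write: conv=fdatasync flushes before dd reports a rate, so the page
    # cache does not inflate the number.
    dd if=/dev/zero of=testfile bs=1M count=2048 conv=fdatasync

    # Read: drop the page cache first (as root) so the file really comes
    # back over the network rather than from memory.
    sync && echo 3 > /proc/sys/vm/drop_caches
    dd if=testfile of=/dev/null bs=8k

    # Roughly the sysbench fileio run being described; rndwr instead of
    # seqwr gives the random-write figure quoted below.
    sysbench --test=fileio --file-total-size=2G --file-num=32 prepare
    sysbench --test=fileio --file-total-size=2G --file-num=32 --file-test-mode=seqwr run
    sysbench --test=fileio --file-total-size=2G --file-num=32 cleanup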
[23:47:55] paravoid: btw, the 35MB I was showing was with a 1GB virtual nic [23:48:02] I enabled virtio driver on an instance [23:49:43] 13.499 MB/s with random writes to 32 files [23:50:29] which actually results in much more network traffic on the gluster side, of course [23:53:01] MaxSem, would you like me to send an email to hashar and chris or do you want to do it? [23:53:53] PROBLEM Free ram is now: WARNING on aggregator2.pmtpa.wmflabs 10.4.0.193 output: Warning: 9% free memory [23:55:15] Ryan_Lane: for etherpad-lite instance : Using username "wikinaut". Authentication failed. [23:55:39] Wikinaut, have you been able to access other instances? [23:55:43] Are you able to log into bastion? [23:55:48] Yes [23:55:52] Yes [23:55:59] openid-wiki = ok [23:56:06] through bastion, of course [23:56:15] I think, the etherpad-lite was one of the 1st [23:56:23] which were set up [23:56:29] perhaps it needs a new installation [23:56:51] I remember to have accessed that one [23:57:07] when it was set up [23:57:34] from my point of view, it can be deleted and set up newly [23:58:00] there's some gluster issue on it [23:58:07] or maybe it's still trying to access nfs? [23:58:30] I'm going to reboot it [23:59:32] RECOVERY Total processes is now: OK on etherpad-lite.pmtpa.wmflabs 10.4.0.87 output: PROCS OK: 98 processes
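For the gluster-versus-stale-NFS question raised just above: a quick check from inside the etherpad-lite instance of which filesystem actually backs /data/project, and a remount as a lighter-weight first step than a reboot; this is only a sketch and assumes the share has an fstab entry:

    grep /data/project /proc/mounts                   # fuse.glusterfs vs nfs
    df -h /data/project                               # hangs or errors if the mount is stale
    umount -l /data/project && mount /data/project    # remount via fstab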