[05:23:50] Beta-Cluster, Wikimedia-Labs-Infrastructure: Setup real ssl certs for Beta Cluster using a restricted project - https://phabricator.wikimedia.org/T75919#792542 (MZMcBride)
[05:44:34] did tools just crash ?
[06:13:54] I am SOOO FUCKING annoyed
[08:24:05] hi
[08:24:09] ohai
[08:24:11] shinken
[08:24:37] what is up,
[08:25:57] what is up,
[08:26:00] aha!
[08:26:07] now let's see if actual notifications work
[08:26:10] * YuviPanda twiddles thumbs
[08:29:51] PROBLEM - Puppet failure on tools-exec-15 is CRITICAL: OK: Less than 1.00% above the threshold [0.0]
[08:29:51] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: OK: Less than 1.00% above the threshold [0.0]
[08:29:57] PROBLEM - Free space - all mounts on tools-login is CRITICAL: OK: All targets OK
[08:29:59] PROBLEM - Free space - all mounts on tools-exec-11 is CRITICAL: OK: All targets OK
[08:30:15] BAAAMM!
[08:34:53] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:34:54] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:34:58] RECOVERY - Free space - all mounts on tools-exec-11 is OK: OK: All targets OK
[08:34:58] RECOVERY - Free space - all mounts on tools-login is OK: OK: All targets OK
[10:05:13] Hi all. Is an admin around? I created a project request (winput).
[10:05:38] I'd like to get it created so that I can try to start setting it up.
[10:37:57] hi
[10:38:02] hi to you too, shinken-wm
[10:38:23] gry: hmm, could you possibly wait a week? labs is slightly constrained on hardware resources at the moment. new servers are on the way, should be there next week
[10:42:42] YuviPanda: I wouldn't take resources, I would mainly do coding. I could do it on localhost of course...
[10:47:29] gry: Start it in tools?
[10:48:34] gry: by resources I meant creating an instance allocates resources to it.
[10:48:37] CPU, Memory, Disk space
[10:48:52] if you absolutely need it, I can create it for you, but would prefer to wait if that's ok with you
[10:51:47] YuviPanda: What's the numbers to these incoming resources btw?
[10:51:59] 'numbers' as in?
[10:52:03] how many servers are coming in?
[10:52:11] and specs?
[10:52:14] let me look 'em up
[10:52:19] Yeah, servers, rams, horses.
[10:55:50] ok
[10:55:58] a930913: how do I start it in tools?
[10:56:14] yay horses :)
[10:57:06] gry: What are you trying to do?
[10:57:47] a930913: i would like to run a small django app on it which some of the wikimedia projects would link to [as an external tool]
[10:58:14] Yeah, tools is the best place for that I reckon.
[10:58:41] gry: Do you have a tools account?
[10:59:06] yes
[10:59:57] a930913: 24 cores (2 CPUs with 12 cores each), 384GB RAM, 4.8TB of raw storage (will be RAIDed)
[10:59:58] gry: So make a new tool (service group).
[11:00:00] a930913: 3 such machines
[11:00:02] i have a 'thing' (which i thought is a project) with it too, in fact a couple, they're listed here: https://tools.wmflabs.org
[11:00:16] how do i make a new tool (service group) ?
[11:00:34] gry: https://wikitech.wikimedia.org/w/index.php?title=Special:NovaServiceGroup&action=addservicegroup&projectname=tools
[11:01:07] YuviPanda: 3x the first message?
[11:01:15] a930913: yes
[11:01:53] gry: yeah, I'd highly suggest using toollabs for this
[11:02:02] cool
[11:02:22] YuviPanda: That's a lot of ram per core, no? :o
[11:02:45] a930913: no :) it closely matches our current hardware in terms of core/ram/storage
[11:03:43] a930913: it essentially should double our capacity.
[11:05:26] YuviPanda: Because there are so many people using it, there needs to be lots of ram, but people sporadically use processing power which is effectively shared?
[11:06:30] a930913: yeah, pretty much.
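
The hardware figures quoted above (3 machines, each with 24 cores, 384GB RAM and 4.8TB of raw storage) can be tallied with a few lines of Python. This is only a back-of-the-envelope sketch: the names are made up for illustration, and the storage total is raw capacity, before the RAID mentioned in the log reduces it.

```python
# Figures as quoted in the log; the names below are illustrative only.
MACHINES = 3
CORES_PER_MACHINE = 24            # 2 CPUs x 12 cores each
RAM_GB_PER_MACHINE = 384
RAW_STORAGE_TB_PER_MACHINE = 4.8  # before RAID


def cluster_totals(machines=MACHINES):
    """Return total cores, RAM (GB), raw storage (TB) and RAM per core (GB)."""
    return {
        "cores": machines * CORES_PER_MACHINE,
        "ram_gb": machines * RAM_GB_PER_MACHINE,
        # Rounded to one decimal to hide floating-point noise in 3 * 4.8.
        "raw_storage_tb": round(machines * RAW_STORAGE_TB_PER_MACHINE, 1),
        "ram_gb_per_core": RAM_GB_PER_MACHINE / CORES_PER_MACHINE,
    }


if __name__ == "__main__":
    print(cluster_totals())
```

With the quoted numbers this comes to 72 cores, 1152GB of RAM and 14.4TB of raw storage, or 16GB of RAM per core, which is the ratio a930913 asks about.
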
[11:06:56] we're usually overprovisioned on CPUs and Storage a lot more than on RAM
[11:06:59] and that works out ok
[11:07:17] although note that a large instance takes up 64GB of RAM, so 384 GB isn't as much :) we've close to a thousand instances.
[11:08:45] YuviPanda: Yeah, that's what I mean, people have a 64GB instance that they only log into once a day to do their batch of processing. Therefore 64GB is needed round the clock, but processing only for the period of the batch.
[11:09:15] gry: How you getting along?
[11:09:49] a930913: it's not needed round the clock, I think. I'm sure there's some intelligent swap going on
[11:10:02] but I've no idea how KVM works at that level :)
[11:10:50] a930913: it is going ok, i logged in :)
[11:12:29] YuviPanda: Power saving styled "swap mode" :p
[11:12:35] hehe :)
[11:17:41] gry: So, what's this app going to do? :)
[11:18:34] see input.mozilla.org, i try to run the same but for wikimedia projects ( anonymous feedback ), smaller projects first, see how it kicks off
[11:19:52] gry: And winput is a concatenation of "wiki input"?
[11:20:14] yes
[11:20:36] for wikimedia projects & wikimedia chapters on opt-in basis after asking them at a village pump
[11:21:38] Tbh, it looks like a public complaints dump :/
[12:07:25] a930913: yes, it's a very low moderation and high noise medium, i'm curious how it avoids duplicates and how useful it gets. it's an experiment
[13:22:47] Good evening!
[13:23:05] well, hello, planemad
[13:23:23] I need to run shp2pgsql to load shapefiles into the postgres database
[13:23:30] but it says postgis is not installed
[13:23:49] The program 'shp2pgsql' is currently not installed. To run 'shp2pgsql' please ask your administrator to install the package 'postgis'
[13:24:31] hmmm
[13:24:51] should I file a bug for this?
[13:25:22] planemad: yup! file a bug on phabricator, toollabs project, and I'll take care of it?
[13:26:14] thanks YuviPanda
[13:27:17] planemad: yw! :)
[13:27:25] planemad: I've also manually installed it on tools-login, but do file a bug.
[13:32:00] planemad: try it out on tools-login now?
[13:33:38] YuviPanda, wonderful, works
[13:33:44] cool :)
[13:34:27] YuviPanda, filed: https://phabricator.wikimedia.org/T76226
[13:34:55] planemad: yeah, am fixing it properly now. you can go ahead, though.
[13:35:05] planemad: any time you need a new package, just file a task
[13:35:15] thank you, Sir
[13:40:26] planemad: yw! :)
[14:13:12] Connection closed by UNKNOWN
[14:13:24] Whois that UNKNOWN? :)
[14:13:33] (language-apertium instance)
[14:13:48] YuviPanda: can you check? ^
[14:13:58] (was working fine till my noon)
[14:14:05] that usually means the VM is OOM or shut down
[14:14:11] kart_: check status in Special:NovaInstance
[14:14:43] ACTIVE
[14:15:34] ok.
[14:15:36] Out of memory: Kill process 16659 (apertium-transf) score 31 or sacrifice child
[14:15:49] on console output is new.
[14:17:29] kart_: ah, yeah. reboot?
[14:17:32] kart_: it just OOM'd
[14:18:37] I rebooted.
[14:18:42] Thanks.
[14:30:47] Coren: Could you help me a bit? I'm trying to get the data from my tool in a page on fr.wiktionary, but I get a Cross-Origin Request error.
[14:33:57] Darkdadaah: You almost certainly want jsonp to do that.
[14:34:03] http://stackoverflow.com/questions/3506208/jquery-ajax-cross-domain
[14:38:14] Coren: heya! think you could take a look at tools-webgrid-05 which became 'SHUTOFF' yesterday and wouldn't restart?
[14:38:29] Yeah, I'm looking at it now.
[14:38:54] I can probably just restart it but I'd really rather figure out why that happened first.
[15:08:21] YuviPanda: I... don't get it. As far as I can tell, it just shut itself down with neither error nor message.
[15:09:36] So, I've forcibly restarted it. Let's see what happens next.
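
For the cross-origin error raised at 14:30, JSONP works by having the tool wrap its JSON in a call to a function named by the requesting page, so the response can be loaded through a script tag rather than an XMLHttpRequest. Below is a minimal server-side sketch in Python. The helper name `jsonp_body` and the example callback name are hypothetical and not taken from Darkdadaah's tool; the callback is restricted to plain identifiers because echoing arbitrary input into a script would be unsafe.

```python
import json
import re

# Accept only simple JavaScript identifiers (optionally dotted) as callback names.
_CALLBACK_RE = re.compile(r"[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*")


def jsonp_body(callback, data):
    """Wrap JSON-serialisable data in a call to the named callback function."""
    if not _CALLBACK_RE.fullmatch(callback):
        raise ValueError("invalid JSONP callback name: %r" % (callback,))
    return "%s(%s);" % (callback, json.dumps(data))


if __name__ == "__main__":
    # The requesting page would define handleData() and load the tool's URL
    # from a <script> tag, passing the callback name as a query parameter.
    print(jsonp_body("handleData", {"word": "chat"}))
```

The tool would normally serve this body with a JavaScript content type instead of `application/json`.
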
[15:12:30] PROBLEM - Puppet staleness on tools-webgrid-05 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[15:22:28] RECOVERY - Puppet staleness on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [3600.0]
[15:34:22] Coren: jsonp is old, he should just set the CORS headers in the lighttpd config. https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web
[17:00:44] Wikimedia-Labs-Infrastructure: make Debian Jessie image for labs - https://phabricator.wikimedia.org/T75592#793510 (Andrew) I have a couple of hand-made jessie instances running now. Unfortunately, the software that we use to build official labs images (python-vm-builder) doesn't support Debian. There's a d...
[20:00:00] YuviPanda: I have absolutely no idea what happened to your instance before. It just shut down without warning or notice and has run without issue since the restart.
[20:09:25] Coren: hello, in case you are around, I have at least one instance locked because of some DNS failure resolution :D
[20:09:45] mount.nfs: Failed to resolve server labstore.svc.eqiad.wmnet: Temporary failure in name resolution
[20:09:52] though that resolves
[20:15:20] Wikimedia-Labs-Infrastructure: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#793736 (hashar)
[20:17:51] Continuous-Integration, Wikimedia-Labs-Infrastructure: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#793749 (hashar) I have marked the associated Jenkins slave as offline. We will want to bring it back online whenever th...
[20:18:33] Continuous-Integration, Wikimedia-Labs-Infrastructure: integration-slave1001.eqiad.wmflabs can't start, mount.nfs yields failure in name resolution - https://phabricator.wikimedia.org/T76250#793752 (hashar)
[20:58:19] Coren: hey! sorry was away.
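
The mount.nfs failure reported at 20:09 is a name-resolution error, so a quick first check from an affected instance is whether the hostname resolves there at all. The sketch below uses Python's standard `socket.getaddrinfo`. The helper name `resolve_host` is made up here, and it only exercises the resolver configured on the machine where it runs, so a successful lookup elsewhere says nothing about the broken instance.

```python
import socket


def resolve_host(hostname):
    """Return (True, sorted addresses) if hostname resolves, else (False, error text)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        return False, str(exc)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return True, sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hostname taken from the error message in the log.
    print(resolve_host("labstore.svc.eqiad.wmnet"))
```

If this fails only on the one instance, the local resolver configuration is the likelier culprit than the DNS servers themselves.
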
[20:58:27] Coren: hmm, I wonder if we should just let it be, or re-create it just to be safe?
[20:59:09] YuviPanda: There seems to be nothing wrong with it atm; I'd at least see if it works right for a while before I gave up on it.
[20:59:22] Coren: hmm, ok. I'll email labs-l again
[21:13:33] Coren / YuviPanda I got an instance that refuse to start because it can't resolve labstore.svc.eqiad.wmnet
[21:13:42] not sure whether it is just this instance or general issue on labs :/
[21:13:55] filed in https://phabricator.wikimedia.org/T76250
[21:13:59] Yeah, I saw your email pop up. afaict, DNS is fine everywhere else.
[21:14:30] I didn't really get a chance to look at it in depth yet, but I'm guessing your resolv.conf got mangled.
[21:14:38] ah possibly
[21:14:49] so if that is just that instance, it can definitely wait next week :]
[21:39:34] Coren, how long does it normally take to sync ssh keys to a new instance these days?
[21:40:42] stwalkerster: A few minutes, at most. IIRC the job runs every other minute. Might be every 5.
[21:40:50] I've been waiting forty.
[21:41:03] sorry, thirty. Can't do maths tonight
[21:41:08] That's definitely not normal. What account name is this?
[21:41:22] I'm stwalkerster, this is the instance accounts-application3
[21:41:57] I'm just getting Permission denied (publickey) to my ssh login attempts
[21:42:31] I'm indeed not seeing an update to your keys since Oct 16
[21:43:15] And you see the new key in the wikitech prefs, right?
[21:43:24] Oct 16? This is a new instance...
[21:43:28] created today.
[21:43:51] No, keys aren't pushed to instances; they are pulled centrally.
[21:44:21] But if you created the instance, you have to wait for puppet to finish entirely, that *can* take a while.
[21:45:21] Aaah... cool. I'll give it a bit longer then :)
[21:45:48] How long depends on what needs to be installed; tool labs grid nodes can take nearly an hour. :-(
[21:46:18] This only has the webserver::php5-mysql role to install
[21:50:23] Shouldn't be all that long then.
[21:57:33] Coren, found the problem
[21:57:52] That class doesn't exist in puppet any more since https://git.wikimedia.org/commit/operations%2Fpuppet.git/d691d9b568c83c2dccbbd978d8efaa0531650ae3
[21:58:36] Could it be removed from the interface, or marked as deprecated on Wikitech?
[21:58:36] Ah!
[21:58:44] Yes, it could. And should.
[21:59:32] * stwalkerster wonders how many instances are/were using that...
[22:02:08] and is there an equivalent I can use?
[22:05:32] No doubt, but I wouldn't know about it. I'm quite sure Joe could tell you; comments on the Gerrit change hint that the functionality has been moved elsewhere.