[00:04:00] Luke081515: did you figure out the email problem?
[00:05:40] yeah, the SMTP server is not configured
[00:06:34] Luke081515: all I did was I changed metamta.mail-adapter to PhabricatorMailImplementationPHPMailerLiteAdapter
[00:06:39] and that fixed it
[00:06:52] it defaults to a test adapter
[00:09:46] Negative24: So if I change this, the mail will work?
[00:10:13] Luke081515: It should. It worked for me. That just configures it to use sendmail
[00:10:34] ok, I will test it now, wait a moment
[00:11:04] Luke081515: You may also want to change metamta.default-address and metamta.domain
[00:12:13] * Luke081515 rates the user Negative24 as "very helpful" :)
[00:12:16] It works!
[00:12:28] :)
[00:18:46] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[00:21:12] Luke081515: were you having problems with user activations because the emails weren't sending?
[00:22:09] YuviPanda: Is there a reason prod can't reach labs public hostnames?
[00:22:21] E.g. https://graphite.wmflabs.org/ is unreachable from graphite1001 it seems
[00:22:28] also tried stat1002, tin, gallium etc.
[00:22:29] Krinkle: I think that's just the general prod -> labs firewall
[00:22:31] No, I let the users create them by their own :)
[00:22:43] Krinkle: and graphite.wmflabs.org is using novaproxy
[00:22:44] YuviPanda: but it could just go through the frontend via outside?
[00:22:46] and deactivated registration after that
[00:22:55] Krinkle: you can possibly try labmon1001.eqiad.wmnet and see if *that* works
[00:22:58] Krinkle: yes
[00:23:27] I guess the hostname is being resolved internally and so it never makes it to the normal path
[00:23:47] Luke081515: ok
[00:24:01] Hm.. labmon1001.eqiad.wmnet seems to work
[00:24:03] nice
[00:24:14] Krinkle: so for graphite.wmflabs.org, it hits novaproxy-01.eqiad.wmflabs and then that routes it to labmon1001.eqiad.wmnet
[00:24:16] Negative24: There is always an alternative :D
[00:24:24] not the most elegant setup but the easiest one
[00:24:40] so prod can't hit anything on labs instances so can't hit graphite.wmflabs.org
[00:24:46] Luke081515: I was just seeing if you had new users that were locked out because they couldn't get activation emails.
[00:24:49] but it can hit labs support vlan machines, so can hit labmon1001
[00:25:36] It seems that graphite instance is very unstable though
[00:25:39] half the requests fail constantly
[00:25:49] oh?
[00:25:54] the labs graphite one?
[00:25:55] At least something somewhere is going wrong
[00:25:57] Yeah
[00:26:04] I thought it's mostly ok in terms of load and scaling
[00:26:04] It's replying with PNG instead of JSON for some requests
[00:26:10] oh
[00:26:18] that seems like a graphite fuckup instead of a load related issue
[00:26:33] See https://grafana.wikimedia.org/dashboard/db/labs-project-board
[00:26:38] hit refresh on the top right a few times
[00:27:01] https://grafana-admin.wikimedia.org/dashboard/db/labs-project-board
[00:27:05] it's not yet public for non-admin yet
[00:27:09] * Krinkle fixes
[00:28:10] I see red exclamation marks
[00:32:17] Yeah, the non-admin url doesn't work because the graphite proxy in grafana uses POST and we disabled POST for grafana.wikimedia.org
[00:32:24] YuviPanda: use the grafana-admin url instead
[00:32:27] ldap login
[00:33:09] Krinkle: ah I see what you mean
[00:33:15] I've no idea why that's happening
[00:34:14] On https://grafana-admin.wikimedia.org/dashboard/db/labs-project-board the overall dashboard is working
[00:34:28] but there's still randomly some panels not working because the ajax request is getting a PNG back instead of JSON
[00:34:30] really weird
[00:38:43] https://grafana.wikimedia.org/dashboard/db/labs-project-board
[00:38:46] working better now, YuviPanda
[00:39:52] still lots of red exclamation marks for me
[00:40:00] I gotta go now though :( file a bug if it persists...
[00:40:10] Yeah, it's probably a bug in Graphite or Grafana
[00:40:21] Not all of them though, right?
[00:58:46] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:02:41] Negative24: Still here?
[01:02:51] yes
[01:03:06] I got another problem with gits
[01:03:13] I cannot read or write via SSH
[01:05:31] Negative24: The error message is:
[01:05:35] FATAL ERROR: Network error: Connection timed out
[01:05:36] fatal: Could not read from remote repository.
[01:05:50] cloning through ssh?
[01:06:10] not cloning, the first point is uploading :), but he can not read it
[01:07:51] hmm yea. SSH is pretty bad at the moment
[01:08:05] this is my first stab at it -> https://gerrit.wikimedia.org/r/#/c/222987/
[01:08:21] But import external don't do anything too
[01:08:58] YuviPanda: did you hear if anyone was reworking (Chasemp?) that? ^
[01:09:14] oh wait. you weren't involved in that
[01:10:39] Luke081515: where are you seeing import external?
[01:11:22] I can create a git by importing from an external URL, but then I got an error too
[01:14:20] Luke081515: are you importing an existing repository on Diffusion
[01:14:30] ah, external import works now, my fault
[01:14:47] yeah, I tried both possibilities
[01:29:41] Luke081515: what was the problem?
[01:30:45] The local directory already exists, so he can't import
[01:51:17] Labs, Phabricator, Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1751263 (zhuyifei1999) >>! In T116442#1751011, @Negative24 wrote: > `role::phabricator::main` isn't the right Puppet class to use in Labs. I'm pretty sure the error had to do with the site variables....
[02:27:27] Labs, Phabricator, Puppet: phabricator puppet at labs broken - https://phabricator.wikimedia.org/T116442#1751267 (Negative24) It's not documented. But it's not hard. Essentially it's just: 1. Apply `role::phabricator::labs` and run puppet 2. `cd /srv/phab/phabricator` and run `sudo bin/storage upgrade` 3....
[10:08:59] Hi. The last plwiktionary dump (20151002) is missing from /public/dumps/public. Is there any reason for this?
[11:36:45] Labs: Latest dumps not available on Labs - https://phabricator.wikimedia.org/T116529#1751601 (Alkamid) NEW
[12:33:57] Labs: Latest dumps not available on Labs - https://phabricator.wikimedia.org/T116529#1751641 (zhuyifei1999) See also https://lists.wikimedia.org/pipermail/labs-l/2015-October/004080.html
[12:38:56] Labs: Latest dumps not available on Labs - https://phabricator.wikimedia.org/T116529#1751644 (zhuyifei1999)
[12:38:57] Labs, Tool-Labs: /public/dumps/public/ is not updating on Tool Labs - https://phabricator.wikimedia.org/T115969#1751645 (zhuyifei1999)
[13:17:00] anyone know if it's possible to get a little bit more space on my /srv partition?
[13:17:35] either from / or from an admin that can increase it (without rebooting or re-mounting anything)
[13:17:38] that sounds impossible :)
[13:18:14] I have this job that's been running for almost 24 hours and it's just baaarely gonna run out of space
[13:22:01] milimetric: other than removing other files from / /srv, no
[13:22:11] I don't think there's a way to do an on-line resize
[13:22:28] someone said:
[13:22:31] lvextend -L +10G /dev/mapper/HU-root
[13:22:31] e2resize /dev/mapper/HU-root
[13:22:46] yes. that requires the device to be offline (=unmounted)
[13:22:47] (not those names)
[13:22:54] ah... k
[13:23:05] sux :)
[13:23:09] thx valhallasw`cloud
[13:23:59] I'd try to move some stuff around to make some space available
[13:25:40] sadly this is a huge local map reduce job, and I watched the files with inotifywatch and it looks like it still needs all of them
[13:26:38] ah, it doesn't keep the files open? In that case you might be able to get away with moving them to another filesystem and symlinking them
[13:27:29] but that depends on how much you need the results Right Now(TM). I'd be inclined to say 'meh, I'll wait for another 24 hours'
[13:28:12] oooh!!! symlink!! you're a genius
[13:28:32] I think I can do that, yeah, in the reduce phase it doesn't seem to touch those files, if I can time the move with the log I should be ok :)
[13:28:43] make sure to check it doesn't have the file open, though (check /proc/<pid>/fd/* )
[13:28:46] lol, that's hilarious, I would've never thought of it
[13:29:23] and you can send SIGSTOP to the process so it doesn't try to open anything while you're doing the moving
[13:42:11] milimetric: There's https://wikitech.wikimedia.org/wiki/Special:NovaVolume , not there's 0 documentation about it
[13:42:17] *but
[13:43:37] zhuyifei1999_: huh... never saw that before, so what is this? shared volumes for your project that are mounted in /srv ?
[13:43:47] you have to unmount / remount /srv though, right?
[13:44:00] no idea what that is
[13:44:08] haha, ok, wild speculation
[13:44:35] the symlink idea is brilliant. I think it's gonna work. I'm writing a script that will move everything and replace with symlinks and I'm watching the file usage pattern
[13:45:02] the code for the special page is https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/master/special/SpecialNovaVolume.php, but it tells little on its purpose and usage
[13:47:04] actually, let me try it
[13:47:45] Failed to create volume.
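(Editor's note on the symlink workaround discussed between 13:26 and 13:44 above.) The script milimetric mentions writing is not shown in the log, so what follows is only a minimal sketch of the idea as described in the chat: pause the job with SIGSTOP, check which files it holds open via /proc/<pid>/fd, move the idle ones to another filesystem, leave symlinks behind, then resume the job with SIGCONT. The function names (`open_paths`, `relocate_with_symlink`, `relocate_idle_files`) are hypothetical, and the sketch assumes Linux, Python 3.9+, and permission to signal and inspect the target process.

```python
import os
import shutil
import signal
from pathlib import Path


def open_paths(pid: int) -> set[str]:
    """Return the targets of every file descriptor the process holds (Linux /proc only)."""
    targets = set()
    for fd in Path(f"/proc/{pid}/fd").iterdir():
        try:
            targets.add(os.readlink(fd))
        except OSError:
            # The descriptor was closed between listing and readlink.
            continue
    return targets


def relocate_with_symlink(src: Path, dest_dir: Path) -> Path:
    """Move src into dest_dir and leave a symlink at the old path."""
    # Use an absolute destination so the symlink resolves from any directory.
    dest = dest_dir.resolve() / src.name
    shutil.move(str(src), str(dest))  # copies then deletes when crossing filesystems
    os.symlink(dest, src)
    return dest


def relocate_idle_files(pid: int, files: list[Path], dest_dir: Path) -> list[Path]:
    """Pause the process, move every file it does not have open, then resume it."""
    os.kill(pid, signal.SIGSTOP)  # stop the job so it cannot open files mid-move
    try:
        busy = open_paths(pid)
        moved = []
        for f in files:
            if str(f.resolve()) in busy:
                continue  # still open in the job: leave it where it is
            moved.append(relocate_with_symlink(f, dest_dir))
        return moved
    finally:
        os.kill(pid, signal.SIGCONT)  # always resume, even if a move failed
```

A call would look like `relocate_idle_files(job_pid, list(Path("/srv/job").glob("part-*")), Path("/mnt/other"))`, where the PID, the glob pattern, and both directories are placeholders. The destination needs enough free space for the files, since a cross-filesystem move copies the data before deleting the original.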
[13:48:12] sounds like we need rights we don't have
[13:49:56] https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/8892ffd0f9c0693f9f0ee92711d806fa5f1d5dd8/nova/OpenStackNovaController.php#L630 ah, Unimplemented
[13:52:38] aha, thx, interesting
[19:13:16] Negative24: Are you here at the moment?
[19:21:14] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]
[19:41:00] milimetric: zhuyifei1999_ I've also never seen that interface...
[19:41:08] I think that might've been a holdover from glusterfs
[19:41:29] milimetric: and if you're already using the srv mount, then no there's no way to get more space, unfortunately
[19:56:22] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:52:15] PROBLEM - Puppet failure on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[22:27:16] RECOVERY - Puppet failure on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:48:11] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Stigmj was created, changed by Stigmj link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Stigmj edit summary: Created page with "{{Tools Access Request |Justification=At first, a statistics backend for the nowiki project. Have earlier used own servers, but running closer to the data would be better. See..."
[23:05:27] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Stigmj was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=196942 edit summary: