[00:00:51] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]
[00:32:14] Coren: https://en.wikipedia.org/wiki/Special:WhatLinksHere/Template:TH_question_page specifically line 14: Topic:S214uoczkp47cfsx (transclusion; from a post) (links) [ https://en.wikipedia.org/wiki/Topic:S214uoczkp47cfsx ]: 13:57, September 10, 2014 Coren (talk | contribs) deleted a topic on Topic:S214uoczkp47cfsx (Test, borne out of wikimedia-l thread.)
[00:32:32] Why are deleted topics still transcluding templates?
[00:33:21] I would expect that's because deletion/suppression doesn't update the templatelinks table. That'd be a mw bug imo.
[00:50:22] Which is why I was reporting it to you here. Want me to open a ticket on Phab?
[01:09:26] Coren: Which is why I was reporting it to you here. Want me to open a ticket on Phab?
[01:14:16] Yep, that needs a phab.
[01:15:51] Coren: under "Flow"?
[01:19:54] Coren: https://phabricator.wikimedia.org/T76681 look accurate?
[01:43:23] T13: Sounds reasonable.
[02:41:12] (CR) Spage: [C: 1] "Looks right to me too, we just need someone in https://wikitech.wikimedia.org/wiki/Grrrit-wm#Access to deploy." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/177371 (owner: Mattflaschen)
[02:45:02] (CR) Legoktm: [C: 2] Rename #wikimedia-corefeatures to #wikimedia-collaboration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/177371 (owner: Mattflaschen)
[02:45:06] (Merged) jenkins-bot: Rename #wikimedia-corefeatures to #wikimedia-collaboration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/177371 (owner: Mattflaschen)
[02:45:41] thanks legoktm!
[02:46:02] !log tools.lolrrit-wm restarting for https://gerrit.wikimedia.org/r/177371
[02:46:04] Logged the message, Master
[02:47:29] (CR) Legoktm: [C: 2 V: 2] Rename #wikimedia-corefeatures to #wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/177372 (owner: Mattflaschen)
[02:48:23] !log tools.wikibugs restarted for https://gerrit.wikimedia.org/r/177372
[02:48:25] Logged the message, Master
[02:48:42] spagewmf: np
[06:28:41] I do not know what is happening but Labs is extremely slow
[06:34:06] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[06:34:32] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[06:41:27] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0]
[06:52:28] PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:52:50] PROBLEM - Puppet failure on tools-exec-15 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:57:32] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[06:59:08] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:00:48] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[07:04:30] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0]
[07:06:31] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:17:27] RECOVERY - Puppet failure on tools-exec-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:22:34] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[07:22:52] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:25:50] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:05:14] All config deployments to beta labs are stuck: https://integration.wikimedia.org/zuul/
[10:42:16] WDQ keeps crashing because the MySQL server on Labs keeps disconnecting since tonight. Trying to hack around the infrastructure, again.
[10:42:22] quote from Magnus
[12:52:40] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:17:44] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]
[14:50:37] PROBLEM - Puppet failure on tools-exec-05 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[15:10:29] !petan-build
[15:10:29] make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
[15:10:51] YuviPanda: Did you try the trusty support for bigbrother?
[15:12:48] Coren: merged it but didn't test yet. Am out at dinner, will test when I come back if you didn't do it already :)
[15:13:01] Doing so now. Go eat.
[15:20:36] RECOVERY - Puppet failure on tools-exec-05 is OK: OK: Less than 1.00% above the threshold [0.0]
[15:41:11] YuviPanda: Works, with a typo fix.
[17:25:38] hi
[17:25:52] I have a problem with an instance
[17:26:20] i cannot reboot it from the wikitech web page
[17:27:02] It says "Error restarting instance 3ef62478-dc92-47b5-8385-78a3010115d3. "
[17:27:26] The instance involved is i-00000336 (osmit-cruncher1)
[17:27:36] Can someone please restart it?
[17:28:10] btw I've no ssh/console access to the instance
[17:29:05] Lemme take a look.
[17:29:55] Thanks
[17:33:14] sbiribizio: It's not immediately clear why it's not booting. I see it attempt to do so, then die quietly. Lemme dig a bit deeper.
[17:34:12] Last console output was reporting an error 32 on mounting
[17:34:53] but if I remember it was mounting /home
[17:38:15] Yeah, that's "normal" if your project isn't set with shared homes - then it can't mount the shared homes. :-)
[17:38:38] ok, false problem
[17:42:25] Ah. I see what the issue is.
[17:42:29] "Cannot set up guest memory 'pc.ram': Cannot allocate memory"
[17:42:37] * Coren cries a little.
[17:45:42] andrewbogott: Can we turn off one or some of wikitech-test-frontend, wikitech-test1-frontend, or wikitech-test1-network? virt1002 is out of resources.
[18:01:06] Coren: I'll kill a few
[18:01:13] since none of them work anyway :(
[18:01:25] Since labs-vagrant doesn't work on precise at all <- poke for YuviPanda
[18:02:18] andrewbogott: Don't blame Yuvi, blame progress. :)
[18:02:33] precise is old busted and trusty is the new hotness
[18:02:57] Until we switch to Jessie or whatever
[18:03:45] 'progress' means that our software only ever has two states: "Not working yet" or "Don't bother to maintain this, it's deprecated"
[18:03:47] andrewbogott: which instance is this?
[18:03:55] YuviPanda: I just deleted it, as per Coren's request.
[18:03:57] but, see for yourself.
[18:04:04] andrewbogott: moving mw to an old enough instance will make that work
[18:04:29] 'an old enough instance'?
[18:05:00] What was totally broken? Just the vendor repo?
[18:05:10] sbiribizio: That made room enough for your instance to start.
[18:05:17] bd808: and many things subsequent
[18:05:39] andrewbogott: Strictly speaking, it didn't have to be deleted so long as it was shut down; but that works. :)
[18:06:24] Ok
[18:06:26] YuviPanda: I feel a bit pinched between "Don't bother to maintain singlenode because everything should use labs-vagrant" and "we aren't maintaining labs-vagrant for your use case"
[18:06:32] Hence, wikitech-test remains kaput
[18:06:39] Can I connect to it?
[18:06:52] ciao sbiribizio
[18:06:59] ciao nemo
[18:07:22] andrewbogott: :( Sorry. I think I have a couple labs-vagrant hosts still running precise but they are running roles other than mediawiki
[18:07:31] sbiribizio: It's up and running now as far as I can tell.
[18:08:24] andrewbogott: err, I mean, old enough checkout of mw
[18:08:35] All right Coren
[18:08:37] andrewbogott: But do you have other blockers to using trusty instead of precise? The hhvm switch was the biggest reason for dropping precise support in mw-v
[18:08:47] andrewbogott: perhaps I should find out that and stick that along with precise-compat
[18:08:48] It works!
[18:08:56] Thanks a lot!!!
[18:09:23] bd808: the blocker is that we're using precise in production, and the whole reason I'm using vagrant is to test for production
[18:09:42] There shouldn't be anything in MW *yet* that won't run on precise
[18:10:08] The yet is important though because I expect that to change by the end of January at the latest
[18:10:57] There will be a discussion on wikitech-l for sure but the last time we had it the only blocker to dropping 5.3 support was the WMF cluster
[18:11:29] and we are just a few api servers away from escaping that for the main site
[18:17:41] andrewbogott: I can try setting up a plain precise vagrant thing on openstack
[18:17:43] project
[18:18:04] YuviPanda: yeah, or I can hack on labs-vagrant… it just won't happen immediately.
[18:18:10] The role I'm using is 'wikitech'
[18:18:29] I'm sure it's broken now, and probably it's up to me to update it, I'm just bitter about so many regressions coming on so quickly
[18:18:39] getting wikitech running on trusty and hhvm would be really good. really double plus good.
[18:20:34] andrewbogott: Yes; apache needs to exist on labstore for it to work right with NFS; but you can't have a username in LDAP if a package expects to be managing it. Simplest solution: create the user (any UID works) on labstore*
[18:21:20] I always considered it a bug that dpkg doesn't allow a user to exist in your directory service and use /that/ rather than create it
[18:21:26] Coren: are there other package users on labstore already? I'm wondering how to avoid future uid conflicts
[18:23:01] andrewbogott: You can't avoid the conflict entirely - creating the user means you can't install the apache packages anymore (though we wouldn't *want* to). Any uid would work but you may want to simply pick a non-system uid just to be sure.
[18:23:35] Coren: I was thinking conflicts with ldap, mostly.
[18:24:05] andrewbogott: Then anything outside the range we allocate with ldap should do. Low 4000s maybe?
[18:24:32] Is ldap for logging into phab case sensitive for usernames?
[18:24:47] Thanks and goodbye
[18:44:58] Greetings!
[18:45:36] Is there documentation available on installing node modules on toollabs?
[18:47:02] https://wikitech.wikimedia.org/wiki/Help:Node.js does not have much. npm install module fails for me
[18:57:12] planemad: do it on trusty.tools.wmflabs.org
[18:57:41] ok
[19:09:43] YuviPanda, getting a mouthful: http://pastebin.com/ANX55EKe
[19:28:54] YuviPanda: can shinken tell me why puppet failed?
[19:29:45] legoktm: sadly no. All your extdist failures are failures across labs. I am investigating
[19:31:59] oh ok
[19:32:06] :) :(
[21:06:44] legoktm: there was a warning from extdist for puppet failures recently, no?
[21:06:47] legoktm: can you tell me timestamp and machine?
[21:07:34] YuviPanda: extdist3
[21:07:43] Date/Time: Thu 04 Dec 19:25:07 UTC 2014
[21:07:47] ah, cool
[21:07:58] looks like most of these are because of... apt-get failing
[21:09:11] :|
[21:09:23] legoktm: lol, not actually. yours failed because... gerrit timed out
[21:09:36] Notice: /Stage[main]/Extdist/Git::Clone[integration/composer]/Exec[git_pull_integration/composer]/returns: fatal: unable to access 'https:
[21:09:36] //gerrit.wikimedia.org/r/p/integration/composer.git/': The requested URL returned error: 503
[21:09:40] >.<
[21:09:48] so it was an actual error
[21:09:52] just... not one we cared about
[21:10:31] legoktm: not sure how exactly to fix it.
[21:10:40] legoktm: I can fix the check to only error out if *two* consecutive checks fail
[21:10:42] Get rid of gerrit?
[21:10:43] :P
[21:10:48] yeah, that would also work
[21:10:52] legoktm: heh ;)
[21:11:01] legoktm: yeah, *but* that fails on some very important cases
[21:11:07] puppet failing shouldn't actively break anything right?
[21:11:11] legoktm: which is when a restart of a service fails because a config file changed and it was wrong
[21:11:13] it just means something is broken
[21:11:17] o.o
[21:11:26] legoktm: and so the service will die, and then it won't be restarted, and boom service gone
[21:11:35] legoktm: and puppet won't try to restart it next time, because hey, this file didn't change!
[21:11:45] I don't mind the spam
[21:12:00] well, a monitoring system that sends spam is useless.
[21:29:13] YuviPanda, tried a whole bunch of things, no luck. Any pointers on how to install topojson?
[21:29:28] planemad: hmm, didn't your partner install it a while ago?
[21:29:37] planemad: he ran into almost exactly the same problem, and found some way around it...
[21:30:01] i've got him on the line, but he says it was some fluke
[21:30:14] ... heh :)
[21:30:33] planemad: can you give me the paste url again? I'm taking a look
[21:30:57] YuviPanda, http://pastebin.com/ANX55EKe
[21:31:37] I'm taking a look now
[21:33:12] Wikimedia-Labs-General: Pages are appearing in a broken way in Betalabs - https://phabricator.wikimedia.org/T76777#819660 (Ryasmeen)
[21:34:28] I'm starting to *hate* nodejs
[21:35:29] YuviPanda: starting to?
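YuviPanda's idea at 21:10:40 of only erroring out when *two* consecutive checks fail can be sketched as a small streak counter. This is a minimal, hypothetical Python helper: `ConsecutiveFailureAlert` and its fields are illustrative names, not shinken's API (Nagios-style configurations usually express the same idea through a `max_check_attempts` setting). It alerts once, when the failure streak first reaches the threshold, and any success resets the streak.

```python
from dataclasses import dataclass


@dataclass
class ConsecutiveFailureAlert:
    """Hypothetical helper: signal an alert only after `threshold` failed checks in a row."""

    threshold: int = 2
    streak: int = 0  # number of consecutive failed checks seen so far

    def record(self, ok: bool) -> bool:
        """Record one check result; return True only when an alert should be sent."""
        if ok:
            # Any successful check clears the streak.
            self.streak = 0
            return False
        self.streak += 1
        # Alert once, when the streak first reaches the threshold, to avoid repeat spam.
        return self.streak == self.threshold
```

With `threshold=2`, feeding the results `[False, True, False, False, False]` through `record` yields `[False, False, False, True, False]`: the isolated failure is ignored, and only the second failure in a row alerts. The catch raised at 21:11 applies directly: if a bad config change kills a service during one run, the next puppet run sees no changed file, reports success, and resets the streak, so this scheme never alerts on that case.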
[21:35:36] you're way behind
[21:35:41] Ryan_Lane: haven't had to deal with it much until now :)
[21:35:50] switch to io.js :D
[21:35:56] (I kid, I kid)
[21:36:22] Ryan_Lane: :D
[21:36:34] Ryan_Lane: y'know, maybee.... :)
[21:36:39] * YuviPanda installs rust everywhere
[21:36:51] use rocket, too
[21:37:12] we aren't hardcoreos enough
[21:38:09] Wikimedia-Labs-General, Beta-Cluster: Pages are appearing in a broken way in Betalabs - https://phabricator.wikimedia.org/T76777#819681 (Ryasmeen)
[21:38:41] YuviPanda: when are you coming to SF next?
[21:38:43] planemad: y'know, I'm somewhat stumped. the web is filled with other people with similar problems with wildly varying solutions, none of which seem to work
[21:39:48] can confirm, still stuck :)
[21:40:34] wtf, it doesn't seem to even write an actual debug log?!
[21:40:58] gwicke: hey! as someone who has probably spent more time with node than others, do you know what to make of http://pastebin.com/ANX55EKe?
[21:41:15] gwicke: npm install fails because it can't seem to find files it thinks it just installed.
[21:41:15] Wikimedia-Labs-General, Beta-Cluster: Pages are appearing in a broken way in Betalabs - https://phabricator.wikimedia.org/T76777#819689 (greg) p:Triage>Unbreak! Confirmed, this is due to a config breakage. @chad / @Aaron: maybe due to https://gerrit.wikimedia.org/r/#/c/177561/ See also: https://integr...
[21:41:19] and no npm-debug.log either
[21:41:37] Wikimedia-Labs-General, Beta-Cluster: Pages are appearing in a broken way in Betalabs - https://phabricator.wikimedia.org/T76777#819692 (greg)
[21:42:45] legoktm: here's the bug that caused your puppetspam, btw: https://phabricator.wikimedia.org/T76771?workflow=create
[21:43:00] thanks
[21:44:44] planemad: sooo.
[21:44:48] planemad: I found https://github.com/npm/npm/wiki/Troubleshooting
[21:45:01] planemad: which just says 'well, so yeah, our threading code kind of sucks badly, so upgrade npm and hope'
[21:45:31] sounds promising..
[21:45:49] planemad: well, only *kind* of, since npm is tied to node...
[21:46:11] YuviPanda, is there some way to install modules without npm?
[21:46:33] planemad: well, you can install it on your computer and just put the node_modules file in your git repo and see if that works
[21:46:39] planemad: that's what your partner did, I think
[21:46:42] damn I keep forgetting his name
[21:46:49] YuviPanda, Hugo
[21:47:02] planemad: damn, yes
[21:47:31] yup, we did that. but how does topojson get added to the path?
[21:47:38] it still says topojson not found
[21:47:53] YuviPanda, > https://github.com/WikimapsAtlas/node_modules/tree/master/node_modules
[21:48:41] ah, heh.
[21:48:52] planemad: do you know if topojson is pure node or is a binding?
[21:49:04] no clue
[21:53:32] YuviPanda, btw http://tools.wmflabs.org/wikiatlas2014/api/v1/
[21:53:38] my first api =)
[21:53:44] :)
[21:53:49] planemad: wooooo! :) congratulations!
[21:53:51] ohai yug
[21:54:01] ohayo!
[21:54:34] i pushed the node modules, should work after the pull
[21:55:29] yug, aha, let me try
[21:56:31] planemad: Need to delete the former node_module folder
[21:56:43] then git pull
[21:56:55] i am ready to do it if ok
[21:57:00] yug: planemad hold off for a min?
[21:57:50] Ryan_Lane: this is indeed *very* terrible. 'oh, just install this newer version and you will be good to go, sure'
[21:57:52] sigh
[21:58:04] :D
[21:58:16] yug: planemad ah, go ahead now :) I thought I could fix it in a 'nice' way but apparently not
[21:58:17] well, just use npm for every single thing
[21:58:23] including node itself
[21:58:28] problem solved!!
[21:58:51] Ryan_Lane: heh :) I *personally* wouldn't mind that, honestly, but I could see how that's just asking for massive headaches.
[21:59:07] it's actually the most straightforward way of doing it
[21:59:10] Yuvipanda: we already fought that npm bug for one hour or so last time.
[21:59:15] 3.. 2.. 1..
[21:59:17] yug: yeah.
[21:59:19] Ryan_Lane: I agree
[21:59:21] tracking what's installed after that is pretty impossible, though
[21:59:31] Ryan_Lane: yup, that as well.
[22:00:42] planemad: Someone removed the node_modules folder before me?
[22:01:02] yug: that would be me :)
[22:01:04] planemad, yes me
[22:01:05] feel free to git reset me
[22:01:12] hmm
[22:01:15] Hahaha
[22:01:23] heh, we all did
[22:01:32] ok, i make sure i am alone doing the git pull
[22:02:41] ok, time for me to head to bed. have a train to catch in a few hours
[22:03:12] yug, you are doing the dirty stuff I suppose?
[22:03:30] Yeap.
[22:04:00] ok, i try to demo topojson on server tonight.
[22:04:19] planemad: seeya
[22:04:41] yug, one topojson is working, this should give some numbers: http://tools.wmflabs.org/wikiatlas2014/api/v1/topojson/IN
[22:04:47] *once
[22:05:26] yug, bye! i'm in the train for 24 hours, will surface on saturday
[22:08:13] Ok, ok. We manage.
[22:08:46] Yuvipanda: do i have the access rights to git clone? It fails.
[22:08:59] yug: you should
[22:09:02] yug: let me check
[22:09:27] yug: you do, I think.
[22:09:35] yug: have you done 'become wikiatlas2014' before trying?
[22:09:47] Yes, i did.
[22:10:48] yug: works fine?
[22:10:52] Yuvipanda: works via https, not via ssh
[22:10:52] yug: what do you mean by 'it fails'?
[22:11:07] yug: yeah, your ssh key isn't on the tools server :)
[22:11:17] yug: always use https on machines that aren't local to you.
[22:11:41] YuviPanda: Ok
[22:11:47] yug: never copy private keys anywhere, and don't use ssh :)
[22:13:21] Oh.. shit.... ^^
[22:13:37] you've your private keys on the box, don't you?
[22:13:45] no, maybe not.
[22:14:09] doesn't look like it at least ;)
[22:14:19] but if you do, I highly suggest removing it and re-generating your keys.
[22:14:36] Yuvipanda: I think i only shared my public key, but i've never been careful about that
[22:14:57] yug: yeah, if you ever share a private key, ever have it get off your machine, you should consider it insecure and stop using it
[22:15:34] ok: topojson module operational on wikiatlas2014.
[22:15:45] yug: yay! :)
[22:16:07] Api not working, probably a path issue.
[22:35:40] when I try to import matplotlib in tool labs it returns the error: "ImportError: No module named _tkinter, please install the python-tk package"
[22:35:56] can someone install this package?
[22:38:48] danilo: tkinter is a GUI toolkit, doesn't make sense in a server environment like toollabs
[22:39:36] Yuvipanda: we cannot do more now without arun. Need to check the path to topojson bin.
[22:39:44] yug: ah ok :)
[22:41:15] I'm planning to use matplotlib to generate some graphs based on database queries, I will save the image, i will not use plt.show()
[22:43:00] danilo: http://stackoverflow.com/a/4935945/17865 might help
[22:44:05] YuviPanda: thanks!
[22:54:24] Coren: heya! I remember you fiddled with apt-get being broken a while ago, and that's resurfaced somewhat: https://phabricator.wikimedia.org/T76771
[22:54:33] Coren: a lot of the transient puppet failures are just apt-get being stuck and timing out
[22:54:36] err
[22:54:38] apt-get update
[22:54:43] do you remember what was happening then?
[23:41:43] YuviPanda: That was out of labs; it's often a symptom of IPv6 being configured on the box but no real v6 connectivity.
[23:42:10] Coren: no, we had something happen in toollabs, and I think you fixed it by making Packages.gz not be gzipped?
[23:42:51] That's something else indeed - yeah, just ungzipping it works - but that shouldn't make timeouts.
[23:43:07] hmm, right
[23:43:15] is there any way for me to diagnose what is happening?
[23:43:19] I don't know if apt-get update keeps logs
[23:43:26] perhaps I should just run it in a loop and see if it fails...
[23:44:05] Running it in an strace might prove instructive. At least you'd know what it's trying to do.
[23:44:25] Coren: true, but transient failure, so need to catch one in the act.
[23:44:42] Coren: only reason I even know about it is because shinken spams me about puppet failures
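To catch the transient `apt-get update` hang in the act, YuviPanda's "run it in a loop" idea could be a loop that reruns the command with a hard timeout and records when and how each run failed. This is a minimal Python sketch under those assumptions; `catch_transient_failures` is a hypothetical helper, and the run count, timeout, and pause are placeholders to tune.

```python
import subprocess
import time
from datetime import datetime, timezone


def catch_transient_failures(cmd, runs, timeout_s, pause_s=0.0, stop_on_first=False):
    """Run `cmd` up to `runs` times; return (utc_start_time, reason) for each failed run.

    A run counts as failed if it exits non-zero or exceeds `timeout_s` seconds
    (subprocess.run kills the child process when the timeout expires).
    """
    failures = []
    for _ in range(runs):
        started = datetime.now(timezone.utc).isoformat(timespec="seconds")
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
            if result.returncode != 0:
                # Keep only the tail of stderr; that is usually where the error is.
                failures.append((started, f"exit {result.returncode}: {result.stderr.strip()[-200:]}"))
        except subprocess.TimeoutExpired:
            failures.append((started, f"timed out after {timeout_s}s"))
        if failures and stop_on_first:
            # Stop immediately so files written by the failing run are not overwritten.
            break
        time.sleep(pause_s)
    return failures
```

For example, `catch_transient_failures(["apt-get", "update"], runs=60, timeout_s=300, pause_s=60, stop_on_first=True)`, run as root, retries roughly once a minute. Following Coren's strace suggestion, the command could instead be `["strace", "-f", "-o", "/tmp/apt-update.trace", "apt-get", "update"]`; since each run overwrites the trace file, `stop_on_first=True` leaves the trace of the failing run in place.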