[00:15:47] !log tools killed all db and tools-webproxy aliases in /etc/hosts for tools-webproxy, since otherwise puppet fails because ec2id thinks we’re not in labs because hostname -d is empty because we set /etc/hosts to resolve IP directly to tools-webproxy [00:15:51] Logged the message, Master [00:22:07] rillke1: just a fyi, the UA change has been merged :) [00:24:40] thank you so much, Yuvi [00:25:40] RECOVERY - Puppet failure on tools-webproxy is OK: OK: Less than 1.00% above the threshold [0.0] [06:52:07] can someone restart WDQ ? [06:52:20] https://wikitech.wikimedia.org/wiki/User:Yuvipanda/Restarting_magnus_wdq [07:17:25] YuviPanda: I'm about to step out but will look at the virt1000 puppetmaster failure when I get back [08:38:46] YuviPanda, ping? [09:14:42] PROBLEM - Puppet failure on tools-exec-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:14:44] PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:14:46] PROBLEM - Puppet failure on tools-webgrid-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:14:58] PROBLEM - Puppet failure on tools-exec-08 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:14:58] PROBLEM - Puppet failure on tools-exec-12 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [09:15:19] PROBLEM - Puppet failure on tools-webgrid-01 is CRITICAL: CRITICAL: 88.89% of data above the critical threshold [0.0] [09:15:23] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:15:29] PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [09:15:41] PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:16:31] PROBLEM - Puppet failure on tools-trusty is CRITICAL: CRITICAL: 100.00% 
of data above the critical threshold [0.0] [09:16:33] PROBLEM - Puppet failure on tools-webgrid-tomcat is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:16:41] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:16:41] PROBLEM - Puppet failure on tools-webproxy is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:16:49] PROBLEM - Puppet failure on tools-exec-04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:16:49] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:17:20] PROBLEM - Puppet failure on tools-webgrid-05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:17:24] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:17:26] PROBLEM - Puppet failure on tools-exec-14 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:17:36] PROBLEM - Puppet failure on tools-exec-05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:17:54] PROBLEM - Puppet failure on tools-exec-wmt is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:17:55] PROBLEM - Puppet failure on tools-submit is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:18:06] PROBLEM - Puppet failure on tools-exec-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:18:08] PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:18:20] PROBLEM - Puppet failure on tools-exec-10 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:18:21] PROBLEM - Puppet failure on tools-exec-13 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:19:25] PROBLEM - Puppet failure 
on tools-mail is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:19:27] PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:19:31] PROBLEM - Puppet failure on tools-shadow is CRITICAL: CRITICAL: 77.78% of data above the critical threshold [0.0] [09:19:33] PROBLEM - Puppet failure on tools-dev is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [09:19:43] PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:19:53] PROBLEM - Puppet failure on tools-exec-15 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:20:14] I do not know what 'data above the critical threshold' means... [09:20:25] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:21:00] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:21:04] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [09:22:10] YuviPanda: <3 [09:22:28] RECOVERY - Puppet failure on tools-exec-14 is OK: OK: Less than 1.00% above the threshold [0.0] [09:23:16] valhallasw`cloud: Do you know what a failure like that signifies? Puppet was totally broken until a moment ago… as soon as I fixed it tools started complaining. [09:24:13] andrewbogott: I'm not sure, but I'd guess it logs the return code of puppet, and compares that to 0? [09:24:31] RECOVERY - Puppet failure on tools-dev is OK: OK: Less than 1.00% above the threshold [0.0] [09:24:38] maybe, except… 77.78%? [09:24:43] RECOVERY - Puppet failure on tools-exec-02 is OK: OK: Less than 1.00% above the threshold [0.0] [09:24:45] RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0] [09:24:48] How can you be 77% above 0? 
[09:24:59] RECOVERY - Puppet failure on tools-exec-12 is OK: OK: Less than 1.00% above the threshold [0.0] [09:25:03] RECOVERY - Puppet failure on tools-exec-08 is OK: OK: Less than 1.00% above the threshold [0.0] [09:25:10] andrewbogott: 77% of data points, maybe? not sure how that would be calculated, though [09:25:23] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0] [09:25:23] RECOVERY - Puppet failure on tools-webgrid-01 is OK: OK: Less than 1.00% above the threshold [0.0] [09:25:29] RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0] [09:25:35] seems to be recovering, anyway, so I assume it's some kind of aftershock from the original failure [09:26:49] RECOVERY - Puppet failure on tools-exec-04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:26:53] RECOVERY - Puppet failure on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [0.0] [09:28:07] RECOVERY - Puppet failure on tools-exec-01 is OK: OK: Less than 1.00% above the threshold [0.0] [09:29:31] RECOVERY - Puppet failure on tools-shadow is OK: OK: Less than 1.00% above the threshold [0.0] [09:29:47] RECOVERY - Puppet failure on tools-webgrid-03 is OK: OK: Less than 1.00% above the threshold [0.0] [09:31:43] RECOVERY - Puppet failure on tools-webproxy is OK: OK: Less than 1.00% above the threshold [0.0] [09:32:26] RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0] [09:32:37] RECOVERY - Puppet failure on tools-exec-05 is OK: OK: Less than 1.00% above the threshold [0.0] [09:32:59] RECOVERY - Puppet failure on tools-submit is OK: OK: Less than 1.00% above the threshold [0.0] [09:33:09] RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0] [09:33:19] RECOVERY - Puppet failure on tools-exec-10 is OK: OK: Less than 1.00% above the threshold [0.0] [09:34:28] RECOVERY - Puppet failure on tools-exec-03 is OK: 
OK: Less than 1.00% above the threshold [0.0] [09:34:42] RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0] [09:35:43] RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0] [09:35:59] RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0] [09:36:05] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0] [09:36:41] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0] [09:37:53] RECOVERY - Puppet failure on tools-exec-wmt is OK: OK: Less than 1.00% above the threshold [0.0] [09:39:29] RECOVERY - Puppet failure on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [09:39:53] RECOVERY - Puppet failure on tools-exec-15 is OK: OK: Less than 1.00% above the threshold [0.0] [09:40:25] RECOVERY - Puppet failure on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0] [09:41:31] RECOVERY - Puppet failure on tools-trusty is OK: OK: Less than 1.00% above the threshold [0.0] [09:41:33] RECOVERY - Puppet failure on tools-webgrid-tomcat is OK: OK: Less than 1.00% above the threshold [0.0] [09:42:19] RECOVERY - Puppet failure on tools-webgrid-05 is OK: OK: Less than 1.00% above the threshold [0.0] [09:43:19] RECOVERY - Puppet failure on tools-exec-13 is OK: OK: Less than 1.00% above the threshold [0.0] [10:49:37] andrewbogott: hey! sorry, just woke up (painkillers) [10:49:49] andrewbogott: our error messages are stupid for check_graphite, I should fix that... [10:50:06] YuviPanda: no worries. The issue is resolved now (I just graceful'd apache on virt1000) [10:50:38] The "100.00% of data above the critical threshold" error is cryptic, yeah [10:50:51] But it's not worth hurting your hands over :) [10:50:56] andrewbogott: but, *threshold* is ‘anything more than 0 puppet failures for more than 0 minutes’, [10:51:08] Ah, I see. 
that makes sense [10:51:35] And, also, that explains why the errors suddenly appeared when I fixed puppet. Presumably the checks were suppressed until virt1000 came back up [10:53:23] andrewbogott: right, because of the dependency [10:54:05] andrewbogott: heh, so I guess the dependency isn’t so bad - because otherwise we would’ve been spammed at the start itself. [10:54:12] andrewbogott: did you get an email saying puppetmaster is down? [10:54:21] YuviPanda: I did, that's how I knew to check. [10:54:31] So, everything worked. Just a bit of extra chatter at the end. [10:54:38] andrewbogott: niiice :) I get way too much shinken spam now to be able to see. [10:55:00] Hm, I got just that once notice, and then a resolved notice. So, just right. [10:55:02] andrewbogott: https://phabricator.wikimedia.org/T76771 is one of the leading causes of puppetspam. [10:55:20] andrewbogott: indeed, but that’s because you aren’t on contact list for toollabs or betalabs, while I’m on contact list for *everything* :) [10:55:30] whose fault is that? [10:55:34] andrewbogott: hehe :) [10:58:58] andrewbogott: I've a fairly decent plan to make the checks more resilient. Will do today. [10:59:24] andrewbogott: although I have no idea how to fix the apt get issue [11:00:05] Since no logs [11:00:10] I don't suppose puppet can distinguish between transient failures and real ones? [11:00:49] andrewbogott: nope. Also some transient failures we want alerts on - specifically, when a config change causes a service restart to fail [11:01:01] Since the restart won't be tried again [11:02:05] hm, true [11:02:32] andrewbogott: so I'm going to track number of failed restarts and set a strict check on *that* and make the regular puppet check flail only if it fails two consecutive times [11:03:01] That seems ok for labs… in prod we want to know right away if someone breaks something [11:03:17] andrewbogott: probably yeah. 
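A minimal sketch of the alert arithmetic discussed above, assuming the check simply counts how many recent data points exceed the threshold. The function names and sample values are hypothetical, and this is not the actual check_graphite code. It also sketches the "two consecutive failures" rule proposed for the regular puppet check. With nine samples, seven above zero gives the 77.78% figure seen in the alerts.

```python
from typing import Sequence


def percent_over(values: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of data points strictly above the threshold."""
    if not values:
        return 0.0
    over = sum(1 for v in values if v > threshold)
    return 100.0 * over / len(values)


def failed_consecutively(values: Sequence[float], runs: int = 2) -> bool:
    """Return True if the most recent `runs` data points all report failures."""
    recent = values[-runs:]
    return len(recent) == runs and all(v > 0 for v in recent)


# Nine hypothetical samples of "failed puppet events" per interval.
samples = [0, 0, 1, 1, 1, 1, 1, 1, 1]
print(f"{percent_over(samples):.2f}% of data above the threshold")
print(failed_consecutively(samples))
```

With a threshold of 0.0, a single failed run in the window is enough to push the percentage above zero, which is why the stricter consecutive-failure rule reduces noise from transient errors.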
[11:03:28] andrewbogott: the alternative is to fix the underlying issue [11:03:35] Which is perhaps the better thing to do [11:04:01] Fixing apt get update would fix 80% of the transient issues [11:04:25] If it's possible. Apt relies on external servers; it could be that the failures are outside our influence [11:04:26] Perhaps I should run it in a tight loop to see when it fails ;) [11:04:46] andrewbogott: yup. It doesn't usually fail, it just... Times out [11:05:06] andrewbogott: maybe we can increase the time out but that might not help [11:06:06] yep, looking at that now [11:06:08] seems possible [11:12:21] andrewbogott: or we could debmirror our external repos [11:15:44] Hm, I see how to change it but not what the units are or what the default is :) [11:20:20] andrewbogott: hehe :) [11:22:26] andrewbogott: I’m thinking I’ll set up more ‘infrastructure’ checks as well soon. NFS comes to mind, and perhaps LDAP too? [11:23:33] yeah, we definitely need an ldap test. This is handled by icinga already, but -- your work is meant to duplicate/replace icinga right? [11:24:20] andrewbogott: well, for labs only at least :) [11:24:40] andrewbogott: shinken is supposed to replace icinga too, but alex will evaluate again before deciding that, I think. and it won’t be the same config - there will always be two instances. [11:24:44] in that case an ldap check might be redundant… I don't think it ever fails except for everything at once [11:24:50] heh, ok [11:25:19] andrewbogott: hmm, but we also already have a puppetmaster check on icing. [11:25:41] Hm, true, and it didn't tell me about the virt1000 thing. So... [11:25:45] I'm not sure what this teaches us [11:25:52] other than that I'm not set up properly in icinga :) [11:26:06] hehe :) [11:26:32] andrewbogott: heh, I just found out that salt has grains for roles per instance too [11:32:30] andrewbogott: can you also do a round of shell requests on wikitech at some point? 
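The idea above of running apt-get update in a loop to see when it times out could be sketched roughly like this. The `probe` helper, its parameters, and the stand-in commands are hypothetical. A real run would pass something like `["apt-get", "update"]` (which needs root), and the timeout here belongs to the wrapper, not to apt or puppet.

```python
import subprocess
import sys
import time


def probe(cmd, attempts=3, timeout=30.0, pause=0.0):
    """Run `cmd` repeatedly and record (outcome, seconds) for each attempt."""
    results = []
    for _ in range(attempts):
        started = time.monotonic()
        try:
            proc = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            outcome = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
        except subprocess.TimeoutExpired:
            # The child is killed by subprocess.run before this is raised.
            outcome = "timeout"
        results.append((outcome, time.monotonic() - started))
        time.sleep(pause)
    return results


# Stand-in commands so the sketch runs anywhere; swap in the real command.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
for outcome, seconds in probe(fast, attempts=2) + probe(slow, attempts=1, timeout=0.5):
    print(f"{outcome} after {seconds:.2f}s")
```

Logging the outcome and duration of each attempt separates "fails outright" from "just times out", which is the distinction raised in the conversation.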
[11:33:20] Yes, although maybe not tonight [11:35:32] andrewbogott: ok [11:35:39] andrewbogott: wait, where are you now? [11:35:42] singapore? [11:36:09] Actually, I just updated shell rights after saying I wouldn't :) [11:36:20] Yeah, Singapore -- just got here so somewhat dazed [11:36:40] andrewbogott: heh :) [11:36:46] andrewbogott: oooh, new timezone! [11:36:55] valhallasw`cloud: legoktm CR / merge on wikibugs? https://gerrit.wikimedia.org/r/178136 [11:37:14] weekly ops meeting is at the worst possible time [11:37:24] andrewbogott: heh, yeah, would be later for you than for me [11:37:34] 3 [11:37:36] yeah [11:37:43] YuviPanda: why did you remove the tool-labs catch-all? [11:37:48] valhallasw`cloud: ouch. [11:37:54] Tool-Labs-tools(-.*)? < that one [11:38:04] see this is why I shouldn’t merge things by myself [11:38:05] also, while you're at it, add Quarry :-p [11:38:06] valhallasw`cloud: will amend [11:38:12] valhallasw`cloud: hehe :) yeah [11:38:29] YuviPanda: you can form an opinion about whether or not that change improves the puppet failure rate (since I'm currently not getting notices for them). [11:38:37] If it doesn't help we should revert or raise the timeout. [11:38:43] But I guess we need to give it a week first. [11:38:43] also to whatever other channel it had to report to (I forgot, it's on phab somewhere :P) [11:40:09] (PS2) Yuvipanda: Update config file for new toollabs / labs-team project names [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178136 [11:40:11] (PS1) Yuvipanda: Add quarry to analytics, research and labs [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178182 [11:40:15] valhallasw`cloud: ^ [11:40:19] andrewbogott: yeah, I can do that.
[11:40:43] (CR) Merlijn van Deen: [C: 2] Update config file for new toollabs / labs-team project names [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178136 (owner: Yuvipanda) [11:40:58] (CR) Merlijn van Deen: [C: 2] Add quarry to analytics, research and labs [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178182 (owner: Yuvipanda) [11:41:02] (CR) Merlijn van Deen: [V: 2] Update config file for new toollabs / labs-team project names [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178136 (owner: Yuvipanda) [11:41:05] (CR) Merlijn van Deen: [V: 2] Add quarry to analytics, research and labs [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/178182 (owner: Yuvipanda) [11:41:18] YuviPanda: can you do the deployment yourself? you should have access [11:41:21] YuviPanda: that patch is at least applying properly... [11:41:30] andrewbogott: heh, yeah :) [11:41:33] valhallasw`cloud: ok [11:41:37] thx [11:41:47] I'm out for now -- with any luck I'll show up for the meeting later [11:41:58] take good care of your hands! [11:41:58] andrewbogott: :) [11:42:02] andrewbogott: will try! [11:52:14] is labs bastion working for others? [12:07:28] ah, nevermind [12:09:41] YuviPanda, need some basic help with adding a path to bashrc [12:10:24] is there something wrong here? export PATH="$HOME/.rbenv/bin:/data/project/wikiatlas2014/wikimapsatlas-server/api/node_modules/topojson/bin:$PATH" [12:10:56] planemad: looks right. [12:11:05] planemad: try putting it in .profile or .bash_profile instead? [12:12:47] planemad: also you’re using ruby in this project as well? nodejs, python *and* ruby? :) [12:13:05] YuviPanda, nope, but that was already there in the .bashrc [12:13:17] planemad: heh :) [12:13:26] planemad: but yeah, putting it in .profile or .bash_profile might help [12:13:55] ok, trying that. Or is it that the bash needs to be restarted or something?
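A small sketch for debugging a PATH change like the one above: given a search path string, check whether a command would resolve on it and which entries do not exist. The helper names are hypothetical, and the temporary directory only stands in for the real `.../topojson/bin` directory. For background, bash reads `~/.bashrc` for interactive non-login shells and `~/.bash_profile`, `~/.bash_login` or `~/.profile` for login shells, so an export in `.bashrc` only reaches an SSH login if one of the profile files sources it.

```python
import os
import shutil
import tempfile


def resolve(command, search_path):
    """Return the full path `command` resolves to on `search_path`, or None."""
    return shutil.which(command, path=search_path)


def missing_dirs(search_path):
    """List entries of `search_path` that are not existing directories."""
    return [d for d in search_path.split(os.pathsep) if d and not os.path.isdir(d)]


# Demonstration with a throwaway directory standing in for the tool's bin dir.
with tempfile.TemporaryDirectory() as bin_dir:
    tool = os.path.join(bin_dir, "topojson")
    with open(tool, "w") as fh:
        fh.write("#!/bin/sh\necho stub\n")
    os.chmod(tool, 0o755)  # which() only reports executable files

    candidate = os.pathsep.join([bin_dir, os.path.join(bin_dir, "typo")])
    print("resolves to:", resolve("topojson", candidate))
    print("missing entries:", missing_dirs(candidate))
```

In a live session the same functions can be called with `os.environ["PATH"]` to see what the current shell actually exported.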
[12:14:08] oh, yeah [12:14:09] that too [12:14:14] bashrc is only read at the start of the session [12:14:20] doesn’t magically appear right after you edit it [12:14:39] so ssh again should work right? [12:16:33] new sessions, still says topojson: command not found [12:18:03] planemad: is there a file called topojson in that path? [12:18:12] planemad: does specifying the full path work? [12:18:19] YuviPanda, adding to .profile fixed it :) [12:18:28] planemad: yay :) cool [12:18:37] whats the reason? [12:19:05] planemad: https://stackoverflow.com/questions/415403/whats-the-difference-between-bashrc-bash-profile-and-environment [12:20:10] wikibugs: welcomeback [12:24:09] YuviPandanks, tha [12:24:12] ah [12:31:33] Coren: around? [12:49:36] Tool-Labs: install tdbc and tdbc::mysql - https://phabricator.wikimedia.org/T53129#824692 (yuvipanda) Status of this bug? [12:55:02] Tool-Labs: install tdbc and tdbc::mysql - https://phabricator.wikimedia.org/T53129#824695 (Giftpflanze) Still not done, still relevant and would be nice to have. [12:55:37] Tool-Labs: install tdbc and tdbc::mysql - https://phabricator.wikimedia.org/T53129#824696 (yuvipanda) pinging @Coren? [12:56:07] valhallasw`cloud: Got it approved, so you can test if you have a moment :) [13:08:36] YuviPanda: is there a debian image yet, or still in progress ? [13:08:48] matanya: very much in progress [13:09:06] YuviPanda: Just work up. [13:09:13] woke* [13:11:29] YuviPanda: any eta ? [13:11:40] matanya: not sure, andrewbogott_afk was working on it [13:11:55] thanks [13:12:20] Coren: ah, ok! Kelson was having trouble sshing in, and I can’t exactly figure out what was going on [13:14:56] I'll take a look once I'm alert. [13:15:02] Coren: ok! [13:31:10] YuviPanda: I expect you already did the obvious checks in the logs? [13:31:48] Coren: yeah, failed with a uudecode failure on the pubkey. not fully sure what that means.
[13:31:59] Coren: and I wasn’t sure how to check his actual public key [13:32:32] It means the pasted key is not valid; you can look at the key in /public/keys -- what's his account name? [13:32:41] oh, right. [13:32:42] kelson [13:32:51] Kelson: are you still around? [13:33:36] YuviPanda: There are newlines inside the key. I expect it's because the way hey copied/pasted it introduced a word wrap. [13:33:36] I forgot about /public/keys [13:33:44] aaaaah [13:33:45] I see [13:33:51] cat’d the key and then pasted it. [13:33:53] Kelson: ^ [13:35:22] So yeah, just removing the spurious newline from the middle of his key should work. :-) [13:35:30] cool :) thanks Coren [13:47:34] Coren: also, was considering adding monit to toollabs (can eventually replace big brother). Not invented here, plus is much more configurable. I’ll take a look next week or somesuch. [13:47:56] people can use that to monitor particular endpoints in their webservice as well, if they require. plus external things, etc. [13:48:36] Yeah, I saw your email. Not in love with the concept - but we'll go into it later. [13:49:25] hmm, ok [15:45:03] YuviPanda, Coren and other fellow tool admins, can you have a look on my last wikitech e-mail and eventually share your opinion should you have some? https://lists.wikimedia.org/pipermail/wikitech-l/2014-December/079779.html [15:45:40] petan: Yeah, I like the idea though you'll need to support at least python with such a framework as it's used as often as PHP [15:46:21] yes I know... that is why I put (php) in there, just to make it clear. I have no problem with support of python, but I have no idea how to mix php with python + I have poor python knowledge [15:46:57] we could create python version that would contain same classes and function though [15:47:03] * functions [15:48:27] I can even imagine that this "framework" would be in some shared location just as pywikipediabot is now, so that its users wouldn't need to bother with updates. 
For that, however, perfect backward compatibility needs to be ensured. [16:09:02] hi, how can I reset my LDAP password? [16:11:04] actually, disregard, found a way around it [18:13:45] Labs-Team: Transient failure on labs puppet hosts because they can't find role::labs::instance in puppet - https://phabricator.wikimedia.org/T77644 (yuvipanda) NEW p:Triage [18:14:07] andrewbogott: so, no apt-get related spam today! (so far!) but a different transient one: https://phabricator.wikimedia.org/T77644?workflow=create [18:20:18] Wikimedia-Labs-Infrastructure: Labs: Enable "Puppet freshness" checks in shinken for cvn project - https://phabricator.wikimedia.org/T68573#829886 (Krinkle) [18:21:50] YuviPanda: that's a strange one. [18:22:08] andrewbogott: yeah [18:22:11] Although, that box uses a local puppetmaster doesn't it? [18:22:24] andrewbogott: hmm, yeah, deployment-salt [18:22:27] So it might reflect some actual outage. [18:23:57] andrewbogott: hmm, possible, but then it should’ve affected the others too [18:24:05] andrewbogott: anyway, I’ll keep it open to see if similar things pop up [18:24:16] not if the failure was very brief and one instance just got lucky [18:24:20] yep, we'll see if it recurs [18:24:27] andrewbogott: let’s also consider the apt-get one fixed if I don’t see any other false warnings for another couple of days [18:24:40] andrewbogott: not one so far, which I consider good! Usually I get one every hour or so [18:24:52] Wow -- easy fix if that's it. [18:25:31] andrewbogott: yeah. I’m cautiously optimistic [18:26:04] andrewbogott: also, what’s the status on spinning up new instances? I’m going to help magnus set up wdq as a new project, fully puppetized. Should I wait for new machines to be online before spinning up instances? [18:26:36] YuviPanda: yeah, ideally. Considering I just had to kill a few to make room for a new instance to start... [18:26:39] Would be best if he can wait.
[18:27:37] andrewbogott: alright, I’ll just re-use existing instances for now. [18:28:18] thanks [18:31:56] andrewbogott: is there any eta on debian image for labs ? [18:32:25] matanya: I'm working on it 80% of my available time right now. Lots of work ahead though... [18:32:43] I see, need any help ? [18:32:44] our old tool, python-vmbuilder, is deprecated and doesn't work for jessie. New tool (bootstrap-vz) is new and doesn't support our use case. [18:33:17] Yes, maybe. I'll write up a list of features we want for bootstrap-vz and share with you. Thanks for the offer. [18:34:19] sure [18:34:56] + when the new HP servers are in place, please give me the joy of globbing more resources [18:35:16] matanya: :) several people are working on it furiously :) [18:35:56] YuviPanda: i'm a demanding customer :D [18:36:15] heh :) [18:36:39] I actually have my own customers to satisfy [18:36:58] heh, it’s customers all the way down [18:37:05] namely wikimania folks, wikidata folks [18:43:48] Wikimedia-Labs-Infrastructure: Labs: Enable "Puppet freshness" checks in shinken for cvn project - https://phabricator.wikimedia.org/T68573#830338 (yuvipanda) Open>Resolved Done! Will email @Krinkle if there are issues. You can see status at http://shinken.wmflabs.org/all?global_search=cvn [18:49:22] Labs-Team, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#830524 (Tnegrin) [19:02:04] YuviPanda: mmm. I tried to do nesting, but that of course fails: query -> latest_revision -> query -> etc :D [19:02:17] I suppose the flat latest_yak wasn't so bad [19:02:25] :) [19:05:39] To ssh://valhallasw@gerrit.wikimedia.org:29418/analytics/quarry/web.git [19:05:39] * [new branch] HEAD -> refs/publish/master/(detached [19:05:40] dafuq [19:06:06] does seem to work though, https://gerrit.wikimedia.org/r/#/c/178128/ [19:08:46] valhallasw`cloud: line endings!
[19:15:24] Hello [19:16:15] Is this a forum to seek help regarding a wikipedia page? [19:16:58] this is an online chat [19:17:15] help regarding a wikipedia page may be more suited for #wikipedia-en [19:17:21] what's your issue? [19:18:14] petan, should I request a gerrit repo for that? [19:19:12] I was a first time contributor to a wikipedia page. Created a topic called "Startups in Hyderabad", I called for the community to participate to build the article [19:19:44] A lot of people contributed to it and suddenly its deleted [19:20:01] with no way to recover [19:20:22] well, it is possible to recover that [19:21:02] don't worry about it [19:21:09] It would be great help, I have been googling around but no luck yet [19:21:13] which is different than "it will stay in wikipedia" [19:21:21] please join #wikipedia-en [19:21:48] That is fine, as long as I can go back to the community and ask them to shift it to some other place [19:21:49] can someone bump shell approval for user:Dfko, we are trying to work together on a project. [19:21:49] Thanks [19:22:04] if #wikipedia-en is not a link for you, type /join #wikipedia-en [19:22:42] notconfusing: hey! looking [19:22:57] YuviPanda: NOW WHAT :O [19:23:06] valhallasw`cloud: you have wrong line endings! look? [19:23:11] :'( [19:23:18] thanks YuviPanda [19:23:27] ftupid text editor [19:23:33] notconfusing: done [19:23:35] well [19:23:36] almost done [19:23:48] heh, that's funny [19:23:58] for new document, the editor is set to unix line endings [19:24:03] notconfusing: done [19:24:06] but apparently it guesses windows endings for an empty document [19:24:39] YuviPanda, thanks [19:24:46] well, an empty document has the same line endings as an empty window document xDD [19:30:21] YuviPanda: ſtupid line endings fixed [19:30:46] valhallasw`cloud: :D ok. 
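For the line-ending mix-up above, a minimal sketch of how a file's line endings could be checked and normalised before pushing a patch. The function names are hypothetical and this is not part of any tool mentioned in the channel. An empty document reports "none", which fits the observation that an editor has nothing to go on when guessing the convention for an empty file.

```python
def classify_line_endings(data: bytes) -> str:
    """Return 'crlf', 'lf', 'cr', 'mixed', or 'none' for the given bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # bare CR, not part of a CRLF pair
    kinds = [name for name, n in (("crlf", crlf), ("lf", lf), ("cr", cr)) if n]
    if not kinds:
        return "none"
    return kinds[0] if len(kinds) == 1 else "mixed"


def to_unix(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"SELECT 1;\r\nSELECT 2;\r\n"
print(classify_line_endings(windows_text))
print(classify_line_endings(to_unix(windows_text)))
print(classify_line_endings(b""))
```

Working on bytes rather than text avoids Python's universal-newline translation, which would otherwise hide the original endings when the file is read.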
[19:31:37] valhallasw`cloud: https://gerrit.wikimedia.org/r/#/c/178129/ still has jenkins complaining [19:31:41] ta [19:33:03] SUCCESS [19:34:23] valhallasw`cloud: I’m going to deploy now [19:34:27] <3 [19:36:19] !log quarry deploying to get in valhallasw`cloud’s patches [19:36:21] Logged the message, Master [19:37:09] valhallasw`cloud: test now? [19:39:43] http://quarry.wmflabs.org/run/4015/metao/ [19:39:45] http://quarry.wmflabs.org/run/4015/meta \o/ [19:40:12] maybe in true REST style it should have an url for the resultset, but w/e [19:40:54] valhallasw`cloud: heh. I plan on restructuring the API at some point anyway. will introduce versions, so current URLs won’t break at least for a year. [19:41:35] Labs-Team, Analytics-Engineering: LabsDB problems negatively affect analytics tools like Wikimetrics, Vital Signs, Quarry, etc. - https://phabricator.wikimedia.org/T76075#831356 (Springle) MariaDB 10.0.15 upgrade done. Seems good with no *new* replication glitches that we've found. About to run the resync pro... [19:43:20] YuviPanda: \o/ [19:44:22] valhallasw`cloud: \o/ [19:46:20] valhallasw`cloud: let me know what you end up building :) [19:47:08] :-) [19:52:01] Labs-Team: Sanity check submitted SSH keys on (Tool) Labs - https://phabricator.wikimedia.org/T77902 (Kelson) NEW p:Triage [19:52:48] MediaWiki-extensions-OpenStackManager: Sanity check submitted SSH keys on (Tool) Labs - https://phabricator.wikimedia.org/T77902#831419 (yuvipanda) [19:55:22] (CR) Ejegg: [C: 1] Add more repos maintained by Fundraising Tech [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/177378 (owner: Awight) [19:59:54] hi, I just set up wikimedia labs access and am trying to connect to tools-login.wmflabs.org but it is not working with my public key and I read something a bit cryptic about using one's private key as password (?!?) on the wiki but the server is not set up to accept password login; can someone tell me what to do?
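On the SSH key sanity check filed above as T77902: a rough sketch of what such a check might look like, prompted by the earlier key that broke because copy/paste introduced newlines. This is an illustration under stated assumptions, not the OpenStackManager implementation. The function name and messages are hypothetical. It relies on the OpenSSH one-line public key layout (`type base64 [comment]`), where the decoded data begins with a length-prefixed copy of the key type.

```python
import base64
import binascii
import struct


def check_public_key(text):
    """Return a list of problems found in an OpenSSH one-line public key."""
    stripped = text.strip()
    if "\n" in stripped or "\r" in stripped:
        return ["key spans multiple lines (wrapped while copying?)"]
    fields = stripped.split(None, 2)
    if len(fields) < 2:
        return ["expected '<type> <base64> [comment]'"]
    key_type, blob_b64 = fields[0], fields[1]
    try:
        blob = base64.b64decode(blob_b64, validate=True)
    except (binascii.Error, ValueError):
        return ["key data is not valid base64"]
    if len(blob) < 4:
        return ["key data is too short"]
    (name_len,) = struct.unpack(">I", blob[:4])
    if blob[4:4 + name_len] != key_type.encode("utf-8"):
        return ["declared key type does not match the key data"]
    return []


# Build a structurally valid dummy key (all-zero key material, not usable).
name = b"ssh-ed25519"
blob = struct.pack(">I", len(name)) + name + struct.pack(">I", 32) + bytes(32)
good = "ssh-ed25519 " + base64.b64encode(blob).decode("ascii") + " user@example"
wrapped = good[:40] + "\n" + good[40:]

print(check_public_key(good))
print(check_public_key(wrapped))
```

A check like this only catches malformed input. It says nothing about whether the key is strong or belongs to the person submitting it.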
[20:00:52] dfko, did you upload your public ssh key? [20:01:03] yes, and I can log in to bastion fine [20:01:16] dfko: then you're probably not in the tools project yet [20:01:39] (step 6 on https://wikitech.wikimedia.org/wiki/Help:Tool_Labs : "Wait for your requests to be completed (you should receive messages on your talk page).") [20:02:07] dfko: where did you read this thing about using a private key as password? [20:02:11] yuvipanda just added me a few minutes ago [20:02:19] oh, I see [20:02:22] perhaps? [20:02:23] no, dfko I granted shell access [20:02:26] " (your login is the one you specified in sign up form; your "pass" is your private key matching the public key you registered)." [20:02:27] wtf. [20:02:57] oh, so how do I join the tools project then? [20:03:12] dfko: I’ll just add you, moment [20:03:20] ok, thanks [20:03:24] dfko: what’s your wiki username? [20:03:35] same, dfko [20:03:54] dfko: done [20:04:00] thanks [20:06:32] great, works now [21:18:34] Ugh. I think I want VB6-style html :| [23:15:21] Labs-Team: apt-get update fails some times causing puppet failure alarms - https://phabricator.wikimedia.org/T76771#832114 (yuvipanda) Ia0e09f620d06d086c8619bb036ec22f7de23ab10 is an attempt to fix this, let's see how well it fares. [23:23:47] Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941 (Capt_Swing) NEW p:Normal [23:27:56] Quarry: Make the query description window expanable - https://phabricator.wikimedia.org/T77945#832191 (Capt_Swing) [23:40:59] Quarry: Allow comments on queries - https://phabricator.wikimedia.org/T71543#832306 (Capt_Swing) If we add comment functionality, we should also add some notification mechanism. The query author is unlikely to notice that someone has commented on their query if they don't get a ping of some sort.
[23:41:28] Quarry: Allow comments on queries - https://phabricator.wikimedia.org/T71543#832312 (yuvipanda) [23:42:32] Quarry: Allow comments on queries - https://phabricator.wikimedia.org/T71543#832317 (Capt_Swing) yep... ugh. That's not going to be trivial. [23:42:51] Multimedia, Tool-Labs: Migrate our labs instance(s) to eqiad - https://phabricator.wikimedia.org/T77255#832318 (Gilles) [23:56:15] Quarry: Allow comments on queries - https://phabricator.wikimedia.org/T71543#832409 (yuvipanda) Yup. only option is if flow is flexible enough, IMO.