[00:52:49] elee: looking at your commit. gonna try and get you at least a +1 (because I'm not powerful :( )
[00:53:00] Negative24: =p
[00:53:04] this was fun though
[00:53:43] elee: quick question: was there a reason you split ln 36 in base to ln 34-35 in patch 12?
[00:53:57] Negative24: got any tickets I can try my hand at?
[00:54:00] which file is this?
[00:54:41] oh
[00:54:44] adminlogbot.py?
[00:54:51] because pep8 was complaining about long lines
[00:55:05] pep8 should go shove it but I guess style guides are a thing =p
[00:56:09] ha
[00:56:21] sorry got distracted. filename is kinda important :D
[00:56:24] =p
[00:56:37] yes linters are very strict (hence the name strict)
[00:57:01] =.=
[00:57:17] anyone able to take care of T105014 ? it kinda broke something and it was just brought to my attention
[00:57:54] "Log files need restor-ing-"
[00:58:22] messing with you, but uh sorry for the nfs outage screwing things up for you. =/
[00:59:04] elee: it's just one file
[00:59:10] elee: looks good +1'd
[00:59:17] roger Betacommand
[00:59:20] and fine business Negative24
[00:59:21] that should help it along a bit
[00:59:29] <3
[01:00:06] unless someone sees something I didn't and -1's it. then it's awkward for me :P
[01:00:11] heh
[01:00:21] honestly I just did this to fix one issues
[01:00:24] issue*
[01:00:28] but no CI is a thing
[01:00:45] yea most definitely
[01:07:49] Labs, WMF-Legal: Make sure tools can be taken over after they are abandoned - https://phabricator.wikimedia.org/T102066#1436525 (ZhouZ) Thanks @Csteipp! @Ricordisamoa, I hope this answers your questions. I don't see more questions related to what you originally raised on the legal side but let me know if...
[01:16:57] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 20.00% of data above the critical threshold [0.0] [01:18:31] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [01:43:31] RECOVERY - Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [01:45:15] 6Labs, 10Tool-Labs, 10Pywikibot-compat-to-core, 10pywikibot-core, 5Patch-For-Review: Install all pywikibot python optional dependencies on tool labs - https://phabricator.wikimedia.org/T86015#1436547 (10jayvdb) [01:46:56] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [02:16:50] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 20.00% of data above the critical threshold [0.0] [02:18:07] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 22.22% of data above the critical threshold [0.0] [02:19:19] PROBLEM - Puppet failure on tools-submit is CRITICAL 55.56% of data above the critical threshold [0.0] [02:19:35] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL 40.00% of data above the critical threshold [0.0] [02:19:35] PROBLEM - Puppet failure on tools-redis-02 is CRITICAL 50.00% of data above the critical threshold [0.0] [02:20:39] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 60.00% of data above the critical threshold [0.0] [02:20:56] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 50.00% of data above the critical threshold [0.0] [02:20:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 50.00% of data above the critical threshold [0.0] [02:21:56] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 60.00% of data above the critical threshold [0.0] [02:44:16] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [02:44:32] RECOVERY - Puppet failure on tools-exec-gift is OK Less than 1.00% above the threshold [0.0] [02:44:33] RECOVERY - 
Puppet failure on tools-redis-02 is OK Less than 1.00% above the threshold [0.0] [02:45:40] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [02:45:54] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [02:45:59] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [02:46:53] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [02:46:55] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [02:48:09] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0] [04:06:06] Does anyone know how to make `webservice uwsgi-python` use Python 3? [04:47:53] 6Labs, 6WMF-Legal: Make sure tools can be taken over after they are abandoned - https://phabricator.wikimedia.org/T102066#1436673 (10Ricordisamoa) Thanks @csteipp and @ZhouZ! Should I ensure that co-maintainers have read the OAuth TOU and promised not to publish keys? [06:23:02] [13intuition] 15siebrand pushed 1 new commit to 06master: 02https://github.com/Krinkle/intuition/commit/d116fade99531460071ef930006d72d34b9c3875 [06:23:03] 13intuition/06master 14d116fad 15Siebrand Mazeland: Localisation updates from https://translatewiki.net. [07:08:38] SigmaWP: you can't at the moment [07:17:50] valhallasw`cloud: don't worry i figured it out [07:34:40] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104, 3Labs-Sprint-105: Labs: increase size of the volume for the maps project and restore - https://phabricator.wikimedia.org/T103358#1436903 (10Kghbln) @coren That's awesome news. Cannot wait to see these maps being up and showing... 
[07:49:11] Labs, VisualEditor, wikitech.wikimedia.org, VisualEditor 2015/16 Q1 blockers: Enable VisualEditor by default on wikitech.wikimedia.org - https://phabricator.wikimedia.org/T104961#1436931 (hashar) @Jdforrester-WMF thank you very much!
[08:00:10] Labs, Labs-Infrastructure, Continuous-Integration-Isolation, Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1436938 (hashar) Works for me with Trusty. I created a Trusty instance on the integration...
[09:37:36] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 50.00% of data above the critical threshold [0.0]
[09:40:55] Labs, operations, wikitech.wikimedia.org: intermittent wikitech failures - https://phabricator.wikimedia.org/T105131#1437053 (fgiunchedi) NEW
[10:02:34] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0]
[10:59:54] Access denied for user 's51291'@'%' to database 's51291_stone'
[11:00:05] database schema has been changed again? .... :/?
[11:00:41] Steinsplitter: the database is probably called s51291__stone?
[11:00:57] no
[11:01:02] it has worked for a year now
[11:01:17] except the few times wmf played around with databases
[11:01:34] i simply don't have time to look every 5 minutes where the database has gone
[11:01:41] Steinsplitter: user databases have always had two underscores in their name
[11:02:14] why has it worked until now?
[11:03:45] no, it isn't called s51291__stone
[11:03:51] it was called s51291_stone
[11:05:10] Steinsplitter: right, so that was a bug / security issue that was 'fixed' recently. I guess we can just rename the db to have two underscores and it should work ok.
[11:05:21] Steinsplitter: which host is this on? how do you connect to it? which tool is this for?
[11:05:59] i created it on commonswiki.labsdb
[11:06:20] YuviPanda: ...aha. That explains.
There are quite a few other databases with just a single underscore :/
[11:06:29] sigh, a lot of renaming then...
[11:07:04] YuviPanda: why was there no mail about this to labs-l...?
[11:07:16] show databases; <--- i can no longer find it there.
[11:07:21] no idea where it has been moved
[11:07:43] jynus: ^ (grant fixing seems to have broken some databases)
[11:08:06] jynus: I wonder if we should fix those by renaming databases or by just adding additional grants?
[11:08:32] if possible, renaming databases with 2 __
[11:09:04] I wonder how many there are
[11:09:05] * YuviPanda checks
[11:10:30] done
[11:10:46] Steinsplitter, test s51291__stone
[11:12:00] yes, thanks :)
[11:12:04] jynus: I found a bunch more - https://dpaste.de/NBmZ
[11:12:21] jynus: can you rename them all? I'll email labs-l...
[11:12:34] YuviPanda, that is not good
[11:12:59] it's 6 databases.
[11:13:10] ok
[11:13:35] for the record
[11:13:38] YuviPanda: are you sure those are all? that looks like just the _p ones
[11:14:08] it is documented that databases should be the "username__*"
[11:14:14] https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database
[11:14:35] if it worked before, it was a bug being exploited
[11:14:45] jynus: yes, but unfortunately since our code didn't enforce it we have to 'fix' these to be right for people...
[11:15:01] YuviPanda, I am explaining the reasoning to the users
[11:15:08] oh yeah, cool :)
[11:15:09] (not to you :-P)
[11:15:56] it would be good to explain why a single _ is a security hole in the email
[11:16:19] http://etherpad.wikimedia.org/p/db-renames
[11:16:28] that's actually a good question
[11:16:32] I don't understand the _ vs __
[11:17:10] _=all characters. When I changed it, I went for the documented format "__"
[11:17:27] aka "\_\_"
[11:17:30] right, so it's just original design from Coren - do you know why you picked __ for user databases?
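A minimal sketch of the `_` vs `\_\_` point above, assuming the labsdb grants are ordinary MySQL database-level grants, where an unescaped `_` in the database name pattern matches any single character, `%` matches any run of characters, and `\_` is a literal underscore. The helper `grant_pattern_matches` and the account name `s5129` are hypothetical and only illustrate the matching rule; the actual grant statements are not shown in this log.

```python
import re


def grant_pattern_matches(pattern: str, db_name: str) -> bool:
    """Emulate how a MySQL database-level grant pattern matches a name.

    Unescaped `_` matches exactly one character, unescaped `%` matches
    any sequence of characters, and a backslash makes the next
    character literal. (Hypothetical helper, not taken from any
    Wikimedia code.)
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: take the following one literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.fullmatch("".join(parts), db_name, flags=re.DOTALL) is not None


if __name__ == "__main__":
    cases = [
        ("s51291_%", "s51291_stone"),       # unescaped: single underscore name is covered
        (r"s51291\_\_%", "s51291_stone"),   # escaped double underscore: old name is not covered
        (r"s51291\_\_%", "s51291__stone"),  # escaped double underscore: renamed db is covered
        ("s5129_%", "s51291_stone"),        # hypothetical other account s5129, unescaped pattern
        (r"s5129\_\_%", "s51291__stone"),   # same account, escaped pattern
    ]
    for pattern, name in cases:
        print(pattern, name, grant_pattern_matches(pattern, name))
```

Under that assumption, an unescaped pattern for a shorter account name can also cover a longer account's databases, which would be one way a single `_` turns into a security problem; whether that was the concrete issue here is not stated in the log.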
[11:18:18] I assume it would be to not confuse it with _p databases
[11:19:07] (actually, I did not change it, I only made it work)
[11:19:10] possibly, although these do start with s***
[11:19:13] indeed.
[11:20:05] yeah, I guess the question is 'what does s1234_p do', but now we've just moved the question to 's1234__p'
[11:20:33] jynus: u1237_platonides_wlm_p also needs renaming, on labsdb1002, I think.
[11:20:39] I'll email individuals right after
[11:20:58] is there a reason we cannot just revert the change to require a single _?
[11:20:58] valhallasw`cloud: yeah, but right now step 1 is getting them to work, I guess, so let's rename and then ask coren for rationale when he's back...
[11:21:04] let me give you a list
[11:21:07] valhallasw`cloud: that'll break all the ones using __
[11:21:16] if we do it properly.
[11:21:26] eh, why? it's 'starts with s1234\_'
[11:21:35] underscores are allowed in a database name
[11:21:39] and if I have your permissions for renaming them, I will do it
[11:21:43] jynus: yeah, go ahead.
[11:22:09] except the rename doesn't fix anything
[11:22:15] people still need to change their code, then
[11:22:28] indeed.
[11:22:46] so unless there's a very good reason to not (at least temporarily!) allow a single underscore, please change the grants instead
[11:23:09] that was what I was going to do
[11:23:20] change the grants?
[11:23:21] but maybe only grant CREATE for double underscore?
[11:23:23] yes
[11:23:25] ok
[11:23:35] it is on the ticket
[11:24:24] which one?
[11:24:25] * YuviPanda checks
[11:24:45] T101758
[11:25:17] The first and most important step has been done [...] However, the issues haven't yet been fixed...
[11:26:10] maybe you should send out a notice to tools operators, so they don't need to spend time searching for db's
[11:27:03] after the security issue has been fixed, yes
[11:27:07] jynus: hmm, I feel slightly out of depth on all this mostly because I don't know why things are the way they are and coren was the one who was involved afaik. He was on vacation last week but is back now, maybe I should just leave it to you two once he's online today (in a few hours)
[11:33:51] YuviPanda: Do you know who to ping / who could do something about https://phabricator.wikimedia.org/T104417 ?
[11:34:23] YuviPanda: coren is working on restoring the files ( https://phabricator.wikimedia.org/T103358 )
[11:34:35] eh, andre__ ^
[11:34:43] * valhallasw`cloud needs more coffee
[11:34:56] :)
[11:35:03] andre__: yes, what valhallasw`cloud said. that's already set as blocked ticket
[11:49:59] oh thank you
[11:50:42] Labs, Discovery, Maps: WikiMiniAtlas (wma.wmflabs.org) is still down - https://phabricator.wikimedia.org/T104417#1437336 (Aklapper) Asked on IRC and @coren is working on restoring the files, see T103358 (a blocker for this task)...
[11:50:44] * andre__ should read closer
[12:17:25] (PS1) Hashar: Passwords/keys for Nodepool [labs/private] - https://gerrit.wikimedia.org/r/223536 (https://phabricator.wikimedia.org/T89143)
[13:49:34] Labs, operations, Labs-Sprint-102, Labs-Sprint-103, and 3 others: labstore has multiple unpuppetized files/scripts/configs - https://phabricator.wikimedia.org/T102478#1437568 (coren)
[14:27:36] Coren, how can I monitor the Cyberbot exec node?
[14:27:55] CP678|Studying: That's a very vague question. What do you mean by monitor?
[14:28:05] Something like Ganglia?
[14:28:52] Coren, ^
[14:31:24] Coren, did I say something dumb?
[14:31:49] CP678|Studying: No, that's an interesting question, actually - one for which I don't have an immediately clear answer.
[14:32:58] we used to have ganglia in labs [14:34:32] We have graphite, that might do the trick. [14:34:59] Coren, can you provide a link and check your PM [14:35:31] CP678|Studying: If we're collecting the stats you need, https://graphite.wmflabs.org/ might give you what you want. [14:35:32] (03PS2) 10Andrew Bogott: Passwords/keys for Nodepool [labs/private] - 10https://gerrit.wikimedia.org/r/223536 (https://phabricator.wikimedia.org/T89143) (owner: 10Hashar) [14:35:59] (03CR) 10Andrew Bogott: "Fake passwords seem better than empty passwords -- that way if something pulls these in it'll be more obvious what's going wrong :)" [labs/private] - 10https://gerrit.wikimedia.org/r/223536 (https://phabricator.wikimedia.org/T89143) (owner: 10Hashar) [14:37:19] CP678|Studying: Depending on what you need to check for your workers, the stats under CPU or memory might be what you need. [14:37:40] Lots of memory and CPU [14:37:47] Good lord. [14:37:51] CP678|Studying: But, right now, there is little to no support for customizing the metrics we collect per-instance [14:38:04] That's a lot of exec nodes. :p [14:39:03] CP678|Studying: Much of this is historical data that could be expunged by now. [14:39:17] CP678|Studying: But the metrics for tools-exec-cyberbot are in there. [14:39:39] I see them. I wish there was a total. [14:43:46] So I have about 1.5 GB [14:44:00] Coren, PM? [14:44:51] (03CR) 10Hashar: [C: 031] "yeah indeed nice idea :-}" [labs/private] - 10https://gerrit.wikimedia.org/r/223536 (https://phabricator.wikimedia.org/T89143) (owner: 10Hashar) [14:45:01] andrewbogott_afk: +1 for fake passwords :} [14:53:08] Coren: can you triage and do the recovery requests that came in this week? A reminder email to labs-announce might also be good... [14:53:53] YuviPanda: Sure thing; I'm doing some tests right now re. maps, but I'll do the triage right after. [14:54:00] ok, thanks. 
[14:54:16] Coren: jynus also had some questions about labsdb grants that you're probably best placed to answer
[14:56:30] things are tracked on
[14:56:54] T101758#1437317
[14:59:37] I actually have some questions now that I see
[15:00:01] differences between account names and its permissions?
[15:00:41] jynus: yes, it'll be awesome if you can co-ordinate with Coren on getting answers to / documenting current behaviors / fixing as you see fit. I'm working on the rewrite of the grants code (and other things) atm.
[15:00:52] I also wasn't around when this setup was first done and so have very few answers...
[15:02:00] Coren, I expect you to help with T85868
[15:02:09] I already did some
[15:03:44] jynus: I'm most familiar with the front half, I'll be glad to document it.
[15:05:25] YuviPanda, is the script something that you want to do personally?
[15:05:41] well, I've a good chunk of it on the way and am pushing up a patch atm.
[15:05:45] doesn't have the mysql bits in it though
[15:05:47] ok, ok
[15:05:52] I am not pressing it
[15:06:00] just wanted to know if you own it
[15:06:05] I am ok
[15:06:24] :) I'm writing a WIP commit message now so I can push it up
[15:06:31] so the plan is
[15:06:41] yuvi finishes the script
[15:07:10] I helped with the final grants, but that may need tweaking (coren may need to chime in here)
[15:07:27] I am doing things at production side already
[15:07:36] and have to fix the pending grants
[15:07:55] is that ok for everyone?
[15:07:57] after making a change to "operations/debs/morebots", who knows the process after that, where do we build the .deb and how does it get installed in tool labs?
[15:08:13] Yeah, that breaks what were the original requirements, but clearly we've been too liberal in replicating the toolserver setup.
[15:08:31] eh, operations/debs/adminbot ..
the name is always confusing
[15:08:36] but same thing
[15:08:46] mutante: Labs exec nodes use ensure => latest so if it gets updated in the repo they'll pick it up
[15:09:02] Coren: so it should be in the normal apt.wikimedia.org repo
[15:09:03] well, the suggestion, Coren, is to use roles, but I discussed with Springle and that requires 10.1 to be transparent
[15:09:05] ?
[15:09:21] and that is not even stable
[15:11:18] mutante: Once upon a time, we used a local repo for this; you might want to double check with apt source morbots comes from now.
[15:11:28] mutante: ... *adminbot
[15:12:09] Coren: ah, ok, will do, thx
[15:15:12] Labs: clean up old ec2id-based salt keys on labs - https://phabricator.wikimedia.org/T103089#1437862 (Andrew) I've cleaned up a few of these. I presume that the unreachable instances are in state 'shutdown' or 'error'?
[15:16:09] Tool-Labs-tools-Morebots: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1437863 (Dzahn) NEW
[15:16:30] Tool-Labs-tools-Morebots: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1437870 (Dzahn)
[15:20:06] Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1437872 (coren) >>! In T104476#1433541, @jcrespo wrote: > Obviously, feel free to argue about the grants provided or ask any question. My only real question is: why not WI...
[15:26:25] Tool-Labs-tools-Morebots: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1437894 (JanZerebecki)
[15:31:23] mutante: it's not in /data/project/.system/, so probably on apt.wm.o, yeah
[15:31:44] mutante: but for the long term, killing the .deb and moving to a fabric-based deploy might make more sense..
[15:42:47] 10Tool-Labs-tools-Morebots: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1437931 (10valhallasw) A simple `debuild` seems to correctly build it, except for a few lintian issues ``` E: adminbot changes: bad-distribution-in-changes-file precise-wikimedia W:... [15:52:13] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104, 3Labs-Sprint-105: Labs: increase size of the volume for the maps project and restore - https://phabricator.wikimedia.org/T103358#1437957 (10coren) This is over 1T done now. The copy is slow because the actual filesystem usage... [15:58:13] 6Labs, 7Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1437995 (10jcrespo) Because as there are no roles, and no centralizing management of privileges, I now have to check individually the 8000 accounts for each server in order t... [15:59:06] 6Labs, 7Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1438001 (10coren) >>! In T104476#1437995, @jcrespo wrote: > Having no GRANT OPTION //at least// allow us to script the changes. Fair enough. [16:40:04] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1438158 (10Dzahn) i built 1.7.7 and imported it into our APT repo: http://apt.wikimedia.org/wikimedia/pool/main/a/adminbot/ BUT.. why is 1.7.6 not there?? whi... [16:41:39] 10Tool-Labs-tools-Morebots, 5Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1438163 (10Dzahn) "adminbot is already the newest version." 
yea, so 1.7.5 and 1.7.7 is for precise, but tools-bastion runs trusty
[16:43:26] ^ hacks
[16:43:42] please update repo when building new package versions
[16:59:49] anyone around to help me poke at a http 502 bad gateway for a fresh instance?
[16:59:52] not sure what i've messed up
[17:01:11] Tool-Labs-tools-Morebots, Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1438186 (Dzahn) carbon: Skipping inclusion of 'adminbot' '1.7.8' in 'trusty-wikimedia|main|amd64', as it has already '1.7.8'. Skipping inclusion of 'adminbot'...
[17:05:01] instance can serve responses (curl from bastion) but nginx is saying NO!
[17:09:14] phuedx: that sounds like it's related to the instanceproxy
[17:09:23] feels like
[17:09:28] did you add a new proxy via the wikitech ui?
[17:10:16] mutante: we don't need a trusty one, the job runs on precise exec nodes I think
[17:10:41] but, yeah, debian packages :(
[17:12:45] valhallasw`cloud: root@tools-bastion-01:~# dpkg -l | grep adminbot
[17:12:46] ii adminbot 1.7.6 all
[17:12:58] that 1.7.6 was not added to repo
[17:13:05] i'm running out of time to spend on this
[17:13:09] it's supposed to be 5 min
[17:13:48] mutante: is the precise package updated? because then we just need to restart the bot
[17:13:59] tools-bastion-01 doesn't even have our repos in the list it seems
[17:14:10] valhallasw`cloud: updated where
[17:14:18] are you talking about bastion or about exec nodes
[17:14:22] or about apt.wikimedia.org
[17:14:28] or the other unknown source
[17:14:47] sorry, precise package, on precise exec nodes
[17:15:02] then why is it installed on tools-bastion-01 too
[17:15:13] in a version that is newer than what is in gerrit
[17:15:23] and trusty
[17:17:12] I don't know.
The precise hosts have 1.7.6 from apt.wm.o
[17:17:26] in precise-wikimedia/universe
[17:17:32] no, they don't
[17:17:34] http://apt.wikimedia.org/wikimedia/pool/main/a/adminbot/
[17:17:37] no 1.7.6 there
[17:17:48] it was skipped
[17:17:50] http://apt.wikimedia.org/wikimedia/pool/universe/a/adminbot/
[17:17:55] universe is wrong
[17:17:59] sigh.. sigh
[17:18:26] phuedx: security groups, open up port 80 on yours :)
[17:18:36] Should be on the sidebar on wikitech
[17:18:36] main = for existing Debian/Ubuntu packages that just have been recompiled/backported for the given distribution.
[17:18:47] universe ^
[17:18:54] main = for Wikimedia native packages, as well as Debian/Ubuntu packages that have had source-modifications
[17:19:27] YuviPanda|brb: have done (i think), created a web role with 80 80 tcp 0.0.0.0/0 and created an instance with the role
[17:19:32] Coren: any eta on the restore for T105014 ?
[17:19:51] valhallasw`cloud: ssh tools-exec-03
[17:19:57] channel 0: open failed: administratively prohibited: open failed
[17:20:01] 1203
[17:20:24] exec hosts are now called tools-exec-12XX for precise and tools-exec-14XX for trusty
[17:20:28] Betacommand: I was about to embark on triage of those requests. Not long, I expect, I should be able to do most of them in the coming hour or so unless they are unusually large.
[17:21:03] less than a meg
[17:21:58] phuedx: not sure then :( sorry!
[17:22:04] valhallasw`cloud: it's not getting the update. adminbot is already the newest version.
[17:22:05] * YuviPanda|brb continues being beb
[17:22:06] Brb
[17:22:25] :(
[17:22:26] mutante: is that because universe has priority over main or something like that?
[17:22:31] Labs, Incident-20150617-LabsNFSOutage: Log file needs restored.
- https://phabricator.wikimedia.org/T105014#1438326 (coren)
[17:22:49] Tool-Labs-tools-Morebots, Patch-For-Review: build new .deb for adminbot/morebots, install on toollabs - https://phabricator.wikimedia.org/T105169#1438327 (Dzahn) http://apt.wikimedia.org/wikimedia/pool/main/a/adminbot/ vs. http://apt.wikimedia.org/wikimedia/pool/universe/a/adminbot/ universe is wrong
[17:23:06] apt-cache madison doesn't show the new ones in http://apt.wikimedia.org/wikimedia/pool/main/a/adminbot/ at all
[17:24:50] Labs, WMF-Legal: Make sure tools can be taken over after they are abandoned - https://phabricator.wikimedia.org/T102066#1438355 (csteipp) @Ricordisamoa, the only agreement is the notice on the bottom of the application, which valhallasw pointed out, so yes, they should be aware that by using OAuth, we rese...
[17:26:17] valhallasw`cloud: probably. i have this: Skipping inclusion of 'adminbot' '1.7.8' in 'trusty-wikimedia|main|amd64', as it has already '1.7.8'. to prove it was imported
[17:28:12] Labs, Incident-20150617-LabsNFSOutage, Labs-Sprint-103, Labs-Sprint-104, Labs-Sprint-105: Labs: Salvage, then remove volumes on labstores' raid6 - https://phabricator.wikimedia.org/T103265#1438377 (coren)
[17:28:12] Labs, Labs-Sprint-104: Recover files from old corrupted file system (Tracking) - https://phabricator.wikimedia.org/T104334#1438376 (coren)
[17:28:14] Labs, Incident-20150617-LabsNFSOutage: Log file needs restored. - https://phabricator.wikimedia.org/T105014#1438374 (coren) Open>declined I'm sorry, that file was damaged beyond repair and contains only blocks full of zeroes.
[17:28:21] Betacommand: Sorry. :-(
[17:28:30] Betacommand: First restore that failed.
[17:28:55] damit
[17:29:14] mutante: it's not listed in http://apt.wikimedia.org/wikimedia/dists/precise-wikimedia/main/binary-amd64/Packages though
[17:29:19] Possibly because it was actively being written to then the system asplode.
[17:29:28] s/then/when/ [17:29:35] Coren: its edited rarely [17:29:48] mutante: http://apt.wikimedia.org/wikimedia/dists/precise-wikimedia/main/binary-amd64/Packages has 1.7.5, http://apt.wikimedia.org/wikimedia/dists/precise-wikimedia/universe/binary-amd64/Packages has 1.7.6 [17:30:17] which explains why my apt-get update && apt-cache madison adminbot doesn't show anything newer [17:39:02] 6Labs, 3Labs-Sprint-104: Recover files from old corrupted file system (Tracking) - https://phabricator.wikimedia.org/T104334#1438421 (10coren) [17:39:03] 6Labs, 3Labs-Sprint-104: Recover /home/kjschiroo/Hours/src/ and /home/kjschiroo/Unwind/src/ - https://phabricator.wikimedia.org/T104993#1438419 (10coren) 5Open>3Resolved The files in those directories have been placed in `/home/kjschiroo/restore.tgz` [17:45:37] elee: now to test your patch [17:45:55] !log testlabs testing the log by logging a test [17:45:58] Logged the message, Master [17:46:06] 6Labs, 10Labs-Infrastructure, 6operations, 10ops-eqiad, 3Labs-Sprint-102: Locate and assign some MD1200 shelves for proper testing of labstore1002 - https://phabricator.wikimedia.org/T101741#1438442 (10Cmjohnson) We do not have spare md1200 shelves lying around. I have one that is not used that is waiti... [17:46:51] er [17:49:38] YuviPanda|brb, mutante: deleting the instance and recreating it (same with the web proxy) worked [17:49:41] :D [17:49:46] <- professional [17:50:57] phuedx: glad the meme is reliable http://knowyourmeme.com/photos/362738 :) [17:53:22] Negative24: fingers crossed [17:53:25] something I could watch? [17:53:58] elee: which bot was it again. I though a !log would do it [17:54:04] hah [17:54:14] adminlog? [17:54:33] so it'd be aprt of the morebots group of bots [17:55:03] so #-operations [17:56:02] I think we'd have to wait until the end of the month? [17:56:14] roger [17:57:28] what is it now? 
i already made a ticket and pasted all the updates [17:57:34] and we talked about it earlier right here [17:58:26] could you read that before the usual comments [18:00:34] Its the morebots project on tools. I'll restart the jobs once exec hosts have the new package [18:02:14] PROBLEM - Puppet failure on tools-exec-1404 is CRITICAL 55.56% of data above the critical threshold [0.0] [18:07:46] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1438544 (10hashar) Noticed the images showing up all have a custom property: `show: true`. Trying to update the Nodepool image to inject the same metad... [18:12:43] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 20.00% of data above the critical threshold [0.0] [18:13:44] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 50.00% of data above the critical threshold [0.0] [18:17:14] RECOVERY - Puppet failure on tools-exec-1404 is OK Less than 1.00% above the threshold [0.0] [18:22:55] 10Tool-Labs-tools-Morebots: Undebianify morebots - https://phabricator.wikimedia.org/T105208#1438593 (10valhallasw) 3NEW [18:24:27] labnodepool1001: DISK CRITICAL - /home/hashar/mount is not accessible: Permission denied [18:26:18] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1438613 (10Andrew) yeah, that's on purpose -- otherwise we'd be inviting users to create instances based on obsolete images. The official upstream bug... 
[18:26:29] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 20.00% of data above the critical threshold [0.0] [18:26:31] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 30.00% of data above the critical threshold [0.0] [18:28:41] 6Labs, 10Labs-Infrastructure: labnodepool1001 - DISK CRITICAL - /home/hashar/mount is not accessible: Permission denied - https://phabricator.wikimedia.org/T105209#1438628 (10Dzahn) 3NEW [18:29:11] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 44.44% of data above the critical threshold [0.0] [18:29:12] mutante: do you want to try and get the .deb situation sorted out or shall I hack a fabric deploy system together? [18:29:41] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 60.00% of data above the critical threshold [0.0] [18:29:55] in any case, thank you for putting time into it [18:37:25] valhallasw`cloud: i want to sort it out but get back to it later please [18:37:39] valhallasw`cloud: i dunno which deployment system is right, we have too many of them :p [18:37:57] put to keep it simple, can it just be git::clone in puppet too? 
[18:38:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [18:42:42] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [18:54:39] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [18:56:28] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [18:56:32] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [18:59:10] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [19:08:39] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1438808 (10hashar) The custom properties need to be passed as a `properties` containing a hash hence: ``` providers: - name: wmflabs-eqiad ....... [19:15:54] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1438814 (10hashar) 5Open>3Resolved a:3hashar show: true did the trick: {F190385 size=full} I can now boot diskimage from Horizon \O/ [19:20:40] mutante: yeah, something like that, with a fabfile that actually starts the bots [19:21:00] mutante: but provisioning is not really important when it runs on tool labs [19:22:15] valhallasw`cloud: it's one of the simplest things to just add a "git::clone" in the right puppet class. 
the only issue is deploying would have to be "rm -rf adminbot; run puppet" [19:22:26] it will clone it just fine when it doesnt exist [19:22:33] but i dont think it will update it on changes [19:22:50] which might not be a huge deal in this case though [19:24:13] mutante: right, but we can't actually start it from puppet as long as it's on tool labs [19:25:00] as long as it's run on the grid, specifically [19:42:53] Hm.. looks like bigbrotherrc stopped ensuring jstart [19:46:30] 6Labs, 10Tool-Labs, 7Regression: [Regression] BigBrother isn't handling jstart - https://phabricator.wikimedia.org/T105223#1438898 (10Krinkle) 3NEW [19:52:44] 6Labs, 10Continuous-Integration-Infrastructure, 10Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1438931 (10coren) AFAICT, this problem solved itself (as expected) since we switched to a properly... [19:55:50] 6Labs, 10Continuous-Integration-Infrastructure, 10Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1438941 (10coren) 5declined>3Resolved Indeed it has: ```marc@tools-bastion-01:~$ host notexist... [20:00:53] andrewbogott: around? [20:00:59] YuviPanda|brb: yep! [20:01:12] 6Labs, 10Labs-Infrastructure: labnodepool1001 - DISK CRITICAL - /home/hashar/mount is not accessible: Permission denied - https://phabricator.wikimedia.org/T105209#1438951 (10hashar) Yup the monitoring probe should filter out some mounts :-/ T104975 [20:01:25] 6Labs, 10Labs-Infrastructure: labnodepool1001 - DISK CRITICAL - /home/hashar/mount is not accessible: Permission denied - https://phabricator.wikimedia.org/T105209#1438953 (10hashar) [20:04:23] andrewbogott: any clue why a new host would sent "mpt raid status change" messages every two hours? 
The mails are empty, and the only reference I can find to mpt-status is in the base image [20:04:47] valhallasw`cloud: no idea at all. What distro? [20:04:51] and when did you build it? [20:04:58] precise, last week [20:05:03] https://phabricator.wikimedia.org/T104779 [20:05:19] sorry, no, trusty [20:05:35] it's https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta-webproxy-01.toolsbeta.eqiad.wmflabs [20:05:46] I just built a fresh trusty image, so it may not happen anymore. But I definitely don’t know why it happened. [20:06:30] that should probably be in a "if $realm" check or something like that [20:06:56] if machine is virtual then no RAID checks [20:07:07] andrewbogott: ok. I'll just keep the bug open then, and somehow disable the service manually [20:07:36] mutante: do you know where the check is running? [20:08:42] andrewbogott: it's from files/monitoring/check-raid.py [20:08:45] in base module [20:09:00] the script tries to be smart about what utility to use to check [20:09:06] depending on the type of server [20:09:18] like Ciscos need another tool than Dell and all that [20:09:54] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1438977 (10valhallasw) /etc/default/mpt-statusd specifies ``` RUN_DAEMON=no ``` which should prevent the daemon from starting. It also prevents it from stopping via service stop, though...... [20:10:20] the question is, are there real RAIDs on that server or not [20:10:23] valhallasw`cloud: [20:10:36] because: [20:10:38] 27 if utility is None: [20:10:38] 28 print 'OK: no RAID installed' [20:10:42] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1438985 (10Andrew) Daniel thinks this is files/monitoring/check-raid.py misapplying. Probably that instance is including parent classes that we don't normally include on labs... in any case,... 
[20:10:52] 38 elif utility == 'mptsas': [20:10:52] 39 status = checkmptsas() [20:10:53] 40 elif utility == 'mdadm': [20:10:53] 41 status = checkSoftwareRaid() [20:12:40] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1438990 (10Dzahn) /puppet/modules/base$ vi files/monitoring/check-raid.py ``` 26 try: 27 if utility is None: 28 print 'OK: no RAID installed' 29 sta... [20:13:05] mutante: it's m1.small, so I don't think so? [20:13:33] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1438995 (10Dzahn) is "mptsas" installed ? [20:14:06] valhallasw`cloud: is maybe just the tool installed for some reason? "mptsas" ? [20:14:43] so yea, it's not the script itself so much [20:15:03] it's the fact that it gets mpt-statusd and mptsas installed even though it's a VM [20:15:12] from base [20:15:27] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1439001 (10valhallasw) `arcconf` seems to be installed; none of the others. [20:15:32] 6Labs: Don't run block-on-mount block-for-home-export if the volume is already mounted - https://phabricator.wikimedia.org/T105226#1439002 (10Andrew) 3NEW a:3Andrew [20:15:46] 6Labs, 10Labs-Infrastructure: Some instances don't get automatic nfs exports - https://phabricator.wikimedia.org/T105024#1439012 (10Andrew) 5Open>3Resolved testlabs-mount103 is working now too. So probably this will be fixed by https://gerrit.wikimedia.org/r/#/c/223656/1 [20:16:36] valhallasw`cloud: how about apt-get remove anything "mpt" related and run puppet. does it come back? 
[20:17:23] eh, i should say "arcconf" then [20:18:18] 39 package { [ 'megacli', 'arcconf', 'mpt-status' ]: [20:18:18] 40 ensure => 'latest', [20:18:27] modules/base/manifests/monitoring/host.pp [20:18:44] Notice: /Stage[main]/Base::Monitoring::Host/Package[arcconf]/ensure: ensure changed 'purged' to 'latest' [20:18:45] hmmm.. but also this: [20:18:46] yep [20:18:47] modules/install_server/files/reprepro/updates:ListShellHook: grep-dctrl -e -S '^megacli|arcconf|lsiutil$' || [ $? -eq 1 ] [20:19:11] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1439026 (10hashar) >>! In T105015#1438613, @Andrew wrote: > yeah, that's on purpose -- otherwise we'd be inviting users to create instances based on obs... [20:19:27] valhallasw`cloud: hehehe, look what i found [20:19:33] commit da2f050608bd6c7bf2856c3e528fef3b50d56de3 [20:19:41] Author: Faidon Liambotis [20:19:41] Date: Fri Jun 5 01:27:53 2015 +0300 [20:19:45] base: kill annoying mpt-status emails [20:19:58] sounds like it didnt kill all of them [20:20:16] " Right now we're getting annoying mpt-status emails when systems are [20:20:16] first installed. [20:20:54] I think that one describes getting a single mail when installing, but maybe the fix actually made it worse? [20:21:15] Instead of this, do the following: [20:21:15] 1. provision /etc/default/mpt-statusd with RUN_DAEMON=no [20:21:15] 2. install mpt-status [20:21:51] 6Labs, 10Tool-Labs: new labs host sends out "mpt raid status change" emails - https://phabricator.wikimedia.org/T104779#1439039 (10Dzahn) so that package comes from base: ``` 13:19 < mutante> 39 package { [ 'megacli', 'arcconf', 'mpt-status' ]: 13:19 < mutante> 40 ensure => 'latest', 13:19 < mu... [20:22:00] mmm, maybe mpt-status was already installed in an earlier puppet run? [20:29:29] hello! [20:29:49] does anyone know how to turn cloud-init verbose/debug mode when an instance boot ? 
[20:30:04] I am wondering whether there is some magic config or a kernel parameter that needs to be set [20:58:57] 6Labs, 10Labs-Infrastructure, 5Continuous-Integration-Isolation, 3Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1439110 (10Andrew) I've built new images for Trusty and Jessie. Not bothering with Precise... [20:59:42] 6Labs, 10Labs-Infrastructure: Some instances don't get automatic nfs exports for a long time - https://phabricator.wikimedia.org/T105024#1439122 (10Andrew) 5Resolved>3Open [21:00:18] 6Labs, 10Labs-Infrastructure: Some instances don't get automatic nfs exports for a long time - https://phabricator.wikimedia.org/T105024#1434698 (10Andrew) Sometimes the delay is quite long: 10 minutes, 20 minutes, maybe an hour. Something is gummed up with the way we detect exports, possibly due to caching... [21:00:58] 6Labs, 10Labs-Infrastructure, 5Continuous-Integration-Isolation, 3Labs-Sprint-103, and 3 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1439141 (10Andrew) 5Open>3Resolved [21:05:06] (03PS1) 10Multichill: Pre wikimania backup [labs/tools/multichill] - 10https://gerrit.wikimedia.org/r/223670 [21:05:08] (03PS1) 10Multichill: Merge branch 'master' of ssh://gerrit.wikimedia.org:29418/labs/tools/multichill into prewikimania [labs/tools/multichill] - 10https://gerrit.wikimedia.org/r/223671 [21:07:01] (03CR) 10Multichill: [C: 032 V: 032] "In a hurry" [labs/tools/multichill] - 10https://gerrit.wikimedia.org/r/223670 (owner: 10Multichill) [21:21:43] 6Labs, 3Labs-Sprint-105: Do a manual backup of labstore1002 - https://phabricator.wikimedia.org/T104882#1439239 (10Andrew) [21:22:02] 6Labs, 3Labs-Sprint-105: Do a manual backup of labstore1002 - https://phabricator.wikimedia.org/T104882#1430972 (10Andrew) Yuvi, did you document this someplace? And if so could I get a link? 
[21:24:25] 6Labs, 7Tracking: Request to create Gather labs project - https://phabricator.wikimedia.org/T89185#1439248 (10Andrew) [21:24:26] 6Labs, 7Tracking: Shutdown Gather labs Project - https://phabricator.wikimedia.org/T105038#1439246 (10Andrew) 5Open>3Resolved Done -- thanks for cleaning up! [22:37:00] 6Labs, 10wikitech.wikimedia.org: Build a simple tool to query which instances have which roles / puppet variables - https://phabricator.wikimedia.org/T103995#1439471 (10Krenair) >>! In T103995#1408352, @Krenair wrote: >>>! In T103995#1407951, @yuvipanda wrote: >> Code at github.com/wikimedia/watroles > > 404... [22:37:06] 6Labs, 10Continuous-Integration-Infrastructure, 10Labs-Infrastructure, 6operations: dnsmasq returns SERVFAIL for (some?) names that do not exist instead of NXDOMAIN - https://phabricator.wikimedia.org/T92351#1439472 (10scfc) Too bad for the readers Google will bring here in the future: Nearly four months o... [22:39:26] 6Labs, 3Labs-Sprint-103, 3Labs-Sprint-104, 3Labs-Sprint-105: Limit available images on horizon - https://phabricator.wikimedia.org/T91782#1439475 (10hashar) Related is {T105015}. Images created by Nodepool were no more showing. Just had to inject a metadata property: `show = true`. [22:40:02] 6Labs, 10Labs-Infrastructure: horizon: as user 'hashar' I can't boot instances from the contintcloud project image - https://phabricator.wikimedia.org/T105015#1434451 (10hashar) Was caused by the new feature {T91782} which now let us hide images \O/ [23:04:25] valhallasw`cloud: git::clone has "ensure => latest" :) let's do that [23:22:17] 10Wikibugs: wikibugs test bug - https://phabricator.wikimedia.org/T1152#1439657 (10Legoktm) 1 [23:43:06] 6Labs, 10Wikimedia-Apache-configuration, 6operations, 10wikitech.wikimedia.org, 5Patch-For-Review: Make 404.php be served as the 404 error for wikitech. - https://phabricator.wikimedia.org/T102147#1439719 (10Dzahn) 5Open>3Resolved a:3Dzahn has been applied on silver and works now :) does look nic... 
[23:58:10] 6Labs, 10Wikimedia-Apache-configuration, 6operations, 10wikitech.wikimedia.org, 5Patch-For-Review: Make 404.php be served as the 404 error for wikitech. - https://phabricator.wikimedia.org/T102147#1439798 (10Krenair) a:5Dzahn>3Southparkfan