[00:19:36] (PS1) Alex Monk: Send Beta-Cluster-Infrastructure to #wikimedia-releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/243844
[00:23:13] (CR) Legoktm: [C: 2] Send Beta-Cluster-Infrastructure to #wikimedia-releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/243844 (owner: Alex Monk)
[00:30:50] (Merged) jenkins-bot: Send Beta-Cluster-Infrastructure to #wikimedia-releng [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/243844 (owner: Alex Monk)
[00:34:01] !log tools.wikibugs Updated channels.yaml to: 9da0a4809b8d990d2a87d465868d8a8c8fd549b1 Send Beta-Cluster-Infrastructure to #wikimedia-releng
[00:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[02:56:02] Labs, Tool-Labs, Patch-For-Review: Please install hugin-tools and pillow - https://phabricator.wikimedia.org/T108210#1704397 (yuvipanda) Open>Resolved a:yuvipanda
[02:56:03] Labs, Tool-Labs, Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#1704400 (yuvipanda)
[02:56:12] Labs, Tool-Labs, Patch-For-Review: Please install hugin-tools and pillow - https://phabricator.wikimedia.org/T108210#1515302 (yuvipanda) a:yuvipanda>valhallasw
[04:25:33] (CR) : "Thanks :)" [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/243844 (owner: Alex Monk)
[04:30:59] woah
[04:31:33] how did that work?
[04:32:03] greg-g: how'd you post that comment?
[04:32:56] Labs: Install mailutils on tools - https://phabricator.wikimedia.org/T114073#1704508 (jimmyxu)
[04:32:58] Labs, Tool-Labs, Tracking: Packages to be added to toollabs puppet - https://phabricator.wikimedia.org/T55704#1704507 (jimmyxu)
[04:33:14] Labs, Tool-Labs, Patch-For-Review, labs-sprint-117: Setup a way to store secrets and access them from puppet inside the Tool Labs project - https://phabricator.wikimedia.org/T112005#1704509 (yuvipanda) There are two ways of doing this: # Add a modulepath for secrets to the self hosted puppetmaster'...
[04:35:23] !log tools created tools-puppetmaster-02 as hot spare
[04:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[04:36:11] legoktm: commented on the patch....?
[04:36:39] legoktm: greg-g is a ninja etc
[04:37:04] greg-g: but gerrit shows the comment as made by "gjg" and the email is from "Anonymous Coward"
[04:37:11] .............
[04:37:27] I promise I did it, unless I was under the influence of a mind control drug
[04:37:54] hmm
[04:38:03] if I use the old change screen view, it shows up as "greg"
[04:38:21] weird.
[04:38:38] I was futzing with my gerrit settings today for some reason..
[04:55:46] Labs, Tool-Labs, Patch-For-Review, labs-sprint-116, labs-sprint-117: Allow direct ssh access to tools - https://phabricator.wikimedia.org/T113979#1704513 (faidon) >>! In T113979#1703261, @scfc wrote: > As I wrote in T48468, for normal users `umask` needs to be `022` to avoid all other users tampe...
[05:04:27] i can't ssh to labs
[05:05:13] hm, ok, one connection just worked
[05:05:22] but others time out
[05:09:19] woo, i'm in again
[06:42:05] !log deleting all instances in k8s-eval project, puppet's failed forever and stuff
[06:42:05] deleting is not a valid project.
[06:44:47] !log k8s-eval deleting all instances in k8s-eval project, puppet's failed forever and stuff
[06:44:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:K8s-eval/SAL, Master
[07:05:14] Labs, Wikimedia-Mailing-lists: Shutdown toolserver-l mailman list - https://phabricator.wikimedia.org/T113845#1704611 (Multichill) Thanks @Dzahn . Is it worth documenting the steps you took so we have a checklist for proper decommissioning of a mailman list?
[08:28:48] yuvipanda: so for the tool labs packaging policy, I don't really care what the end result is as long as we have something consistent. I think that's the same for all of us ;-) so shall I give a shot at defining it?
[08:29:12] yes
[08:29:33] is there a policy for prod? Other than 'can't get stuff from the internet'?
[08:29:53] no
[08:29:56] well
[08:30:08] it's just 'has to be a debian package that an ops person uploads to our repo'
[08:30:21] and debian packages built normally can't just get stuff from the internet, I think
[08:30:27] ah, ok, so 'everything is a deb'
[08:30:31] no git checkouts etc
[08:30:40] well, trebuchet is a thing
[08:30:43] git deploy
[08:30:46] is trebuchet
[08:30:53] and that's used for a bunch of things, especially java
[08:31:07] so yeah, there is no policy. it's a case-by-case basis I guess
[08:31:11] heh. ok.
[08:32:29] my preference would be: 'if it is in an upstream repo, sure! if not, sorry - can you ask them to backport it?'. Specifically I don't want to maintain precise versions of trusty/future backports...
[08:32:45] backporting from vivid+ to trusty seems more ok...
[08:33:03] but I've been up for too long and staring at a computer for too long that I should now go to sleep...
[08:33:09] valhallasw`cloud: start a phab ticket too?
[08:33:10] Yeah, I think the general gist would be 'we don't have time for this'
[08:33:12] I did.
[08:33:21] lemme find it
[08:33:29] https://phabricator.wikimedia.org/T114645
[08:33:29] valhallasw`cloud: <3 thanks.
yeah, "We don't have time for this" is an accurate description atm unfortunately
[08:33:44] we can also mirror ppa's now that we have aptly
[08:33:50] which is something I think we should consider
[08:34:48] 'depends' I guess... but yeah, definitely less of a 'we do not have time for it' excuse. We can mirror 'official' PPAs perhaps. I think the node/python packaging teams have one
[08:34:58] rather than 'person X's PPA' but that too depends on who person X is
[08:35:38] anyway, in a glorious future you just write apt-get or pip install commands (or a combination of both!) in your Dockerfile...
[08:36:12] if I'm still alive by then
[08:36:21] * yuvipanda goes to sleeep for real
[08:36:25] valhallasw`cloud: I merged a few of your patches
[08:36:38] I saw, thanks!
[08:36:42] and good night :-)
[08:36:48] good night!
[08:41:34] yuvipanda: also, if you're still around... is it OK if I bump a few issues from the queue into the sprint so that it gets your (or corens) attention? Mostly direct user-facing issues that really should be solved soon(tm)
[08:42:00] valhallasw`cloud: yeah, which ones?
[08:42:09] I'm going to get started on a 'backlog' queue soon
[08:42:11] actually
[08:42:14] let me just do that
[08:42:18] gimme a sec
[08:42:56] yuvipanda: https://phabricator.wikimedia.org/T104614 https://phabricator.wikimedia.org/T109972 https://phabricator.wikimedia.org/T97857 mostly
[08:44:05] and https://phabricator.wikimedia.org/T109216 and https://phabricator.wikimedia.org/T107725 maybe?
[08:44:17] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage, labs-sprint-117: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1704703 (yuvipanda)
[08:44:29] valhallasw`cloud: can you put them into Labs-Team-Backlog
[08:44:32] and I'll triage now
[08:44:36] oh, sure.
[08:44:55] but everything is backlog ;-D
[08:45:15] Labs, Tool-Labs, Database: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1704704 (yuvipanda) Yeah I think this needs replica.my.cnf re-creation. I think just deleting the user account on the labsdbs should trigger this. I'll do this tomorrow.
[08:45:24] valhallasw`cloud: yeah but if I put them in a sprint I'll get them out of it
[08:45:37] Labs, Tool-Labs, Database, labs-sprint-117: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1704705 (yuvipanda)
[08:45:46] Labs, Tool-Labs, Database, Labs-Team-Backlog, labs-sprint-117: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1704714 (valhallasw)
[08:45:49] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage, Labs-Team-Backlog, labs-sprint-117: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1704717 (valhallasw)
[08:45:52] Labs, Tool-Labs, Labs-Team-Backlog: Tool Labs: Enable php5-mcrypt on Trusty - https://phabricator.wikimedia.org/T97857#1704718 (valhallasw)
[08:45:57] oh, wait, I shouldn't have added those two probably :-p
[08:45:59] oh well.
[08:46:19] valhallasw`cloud: not sure how I can do anything for https://phabricator.wikimedia.org/T109216
[08:46:21] s/I/we/
[08:46:27] oh
[08:46:29] I see
[08:46:30] yuvipanda: changing permissions for existing files
[08:46:33] a find and chmod
[08:46:39] valhallasw`cloud: I guess I should run them on the labstore
[08:46:45] yes, that was my thinking
[08:46:45] than on the client
[08:46:50] so as to not fuck everything up
[08:47:43] valhallasw`cloud: I've put them all on this week's sprint
[08:47:49] and it also needs to be fixed on the pywikibot side, but I think we should fix existing files first
[08:48:05] Labs, Tool-Labs, Labs-Sprint-115, labs-sprint-116: Write admission controller disabling mounting of unauthorized volumes - https://phabricator.wikimedia.org/T112718#1704727 (yuvipanda)
[08:48:06] yeah
[08:48:54] Labs, Tool-Labs, Incident-20150617-LabsNFSOutage, labs-sprint-117: Re-enable cron for tools on tool labs - https://phabricator.wikimedia.org/T104614#1704731 (yuvipanda)
[08:49:03] Labs, Tool-Labs, Database, labs-sprint-117: tools.citationhunt can't access databases - https://phabricator.wikimedia.org/T109972#1704733 (yuvipanda)
[08:49:15] valhallasw`cloud: so I've removed Labs-Team-Backlog from everything except the trusty ticket
[08:49:19] which is Too Sleepy Didn't Read atm
[08:49:27] ok
[08:49:56] valhallasw`cloud: I'll get to the others this week. Put things that you think need attention from one of us onto this board and I'll try to spend some time every day to keep it clean
[08:50:10] this board = labs-backlog?
[08:50:55] Labs-Team-Backlog
[08:51:09] ok. Will do!
[08:51:18] which is specifically for the 3 paid members of the labs team, I guess.
it's all strange new worlds and trying things
[08:51:44] ya
[08:52:27] Labs, Tool-Labs, Labs-Team-Backlog: Set up A-based SPF for tools.wmflabs.org - https://phabricator.wikimedia.org/T104733#1704736 (valhallasw)
[09:46:37] PROBLEM - Puppet staleness on tools-k8s-bastion-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[11:36:58] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: "puppet-compiler02" Jenkins slave is no more connected - https://phabricator.wikimedia.org/T104428#1705041 (hashar) a:Joe I guess @Joe fixed it . There is a slave pooled https://integration.wikimedia.org/ci/computer/compiler02.puppet...
[11:37:05] Labs, Continuous-Integration-Infrastructure, Labs-Infrastructure: "puppet-compiler02" Jenkins slave is no more connected - https://phabricator.wikimedia.org/T104428#1705043 (hashar) Open>Resolved
[12:14:59] Labs, Labs-Infrastructure, Continuous-Integration-Scaling: Support dedicating a specific virt node to a specific nova project - https://phabricator.wikimedia.org/T84989#1705193 (hashar)
[13:22:40] Labs, Beta-Cluster-Infrastructure, Tracking: Beta Cluster <-> WMF Labs policy compliance (tracking) - https://phabricator.wikimedia.org/T114615#1705385 (hashar) Open>stalled
[13:31:08] Labs, Beta-Cluster-Infrastructure: Completely remove Beta Cluster dependency on NFS - https://phabricator.wikimedia.org/T102953#1705419 (hashar) Open>stalled Waiting for {T64835}
[17:39:31] Labs, Labs-Infrastructure, labs-sprint-116: Audit private IP allocation for Labs instances - https://phabricator.wikimedia.org/T113982#1706049 (mark) Open>Resolved Yep, this is fine.
[17:59:16] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[18:19:00] yuvipanda: Can you help me installing that job queue (T112465)?
[18:19:39] hey Luke081515
[18:19:48] I can probably point you to documentation and help you with specific questions
[18:19:57] https://celery.readthedocs.org/en/latest/ is a good place to start for celery
[18:20:02] ok, thanks
[18:20:22] I also suggest just using tools if you want to continue using gridengine.
[18:21:28] yuvipanda: I realized today ppas effectively give the people who build them root access on machines that use the ppas >_<
[18:21:38] valhallasw`cloud: indeed.
[18:21:44] but the good news is one can download the ppa recipe instead
[18:21:49] and check that
[18:21:52] but... more work :(
[18:22:08] packaging is always more owrk
[18:22:10] *work
[18:22:23] I still think people should just use pip if they need things that aren't packages
[18:22:38] or whatever the package manager for their language is
[18:22:39] I'm not just thinking python packages :P
[18:22:48] example in point: composer
[18:23:26] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1706263 (Andrew) well, setting debug=False and verbose=False didn't actually stop me from getting a verbose log. So that nee...
[18:23:26] indeed.
I'm totally ok with a git clone for composer
[18:26:31] The main argument against that is puppet slowness, although I suppose I should actually do some profiling before pointing fingers
[18:34:14] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:58:07] Tool-Labs-tools-Global-user-contributions, Collaboration-Team-Backlog, Flow, xTools-on-Labs: Add Flow contributions to GUC and Xtools - https://phabricator.wikimedia.org/T114777#1705954 (Josve05a)
[20:26:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[20:42:43] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1706745 (Andrew) best I can tell, loglevels can't be changed. Nonetheless, the attached rotation patch should save us.
[20:48:12] Labs, Labs-Infrastructure, hardware-requests, operations, labs-sprint-117: Labs test cluster in codfw - https://phabricator.wikimedia.org/T114435#1706748 (Andrew)
[21:01:12] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:17:43] where are cgi scripts supposed to be placed on tools?
[21:20:01] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1706937 (Dzahn) p:Triage>High
[21:25:28] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1707015 (Dzahn) merged and config snippets got added in holmium.
we should confirm tomorrow or so it got rotated
[21:26:07] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1707021 (Dzahn) p:High>Normal
[21:26:21] Labs, Labs-Infrastructure, operations, Patch-For-Review, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1707025 (Dzahn) a:Dzahn
[21:26:35] Labs, Labs-Infrastructure, operations, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1698955 (Dzahn)
[21:27:45] Labs, Labs-Infrastructure, operations, labs-sprint-117: add logrotate for designate logs (holmium disk space) - https://phabricator.wikimedia.org/T114544#1707045 (Dzahn) a:Dzahn>Andrew @Andrew here, you uploaded the fix. wanna close it tomorrow?
[22:27:48] yuvipanda: how do I serve cgi scripts through the tools webservice?
[22:29:30] Negative24: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web has info
[22:31:00] ok thanks