[02:20:47] ooh, phab has a nice blue bar now [02:35:09] GEOFBOT: and soon a nice logo (see it on phab-03.wmflabs.org) [02:35:27] ooh [02:35:34] tasty [02:36:15] and it has the 2015 redesign which we may also impliment [02:36:25] *implement [04:04:58] PROBLEM - Free space - all mounts on tools-webgrid-generic-1403 is CRITICAL tools.tools-webgrid-generic-1403.diskspace.root.byte_percentfree (<50.00%) [06:44:58] RECOVERY - Free space - all mounts on tools-webgrid-generic-1403 is OK All targets OK [06:54:04] 6Labs, 5Patch-For-Review: Nested ".d" dirs in /etc/apt/ - https://phabricator.wikimedia.org/T104019#1408968 (10Gage) Because new images were not built, I tried to work around this myself like so: ``` sudo mv /etc/apt/apt.conf.d/apt.conf.d/* /etc/apt/apt.conf.d/ sudo mv /etc/apt/preferences.d/preferences.d/* /e... [07:29:09] jgage: that sounds like one of the puppet standard libs has not been updated [07:29:25] jgage: assuming 'umask' as parameter was added to the git checkout thing at some point [07:29:51] jgage: so, basically, I think the self-hosted puppetmaster is working as expected, it's just that that master is not up to date [08:41:34] 10Tool-Labs-tools-Other: Usersearch: ERROR: Unhandled exception.[Errno socket error] [Errno -2] Name or service not known - https://phabricator.wikimedia.org/T104138#1409036 (10Aklapper) >>! In T104138#1408433, @Elee wrote: > Issue has been fixed So should this task be closed as resolved? Who is the assignee here? [08:42:38] 10Tool-Labs-tools-Other: Usersearch: ERROR: Unhandled exception.[Errno socket error] [Errno -2] Name or service not known - https://phabricator.wikimedia.org/T104138#1409037 (10Elee) 5Open>3Resolved a:3Elee @Aklapper er yeah sorry about that. 
[08:44:03] 6Labs, 10Labs-Infrastructure, 5Continuous-Integration-Isolation, 3Labs-Sprint-103, 5Patch-For-Review: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1409042 (10hashar) p:5Triage>3Normal [09:02:12] 6Labs, 7Tracking: Elee Labs Project - https://phabricator.wikimedia.org/T104170#1409059 (10Elee) 3NEW [09:02:26] yes I know wikibugs I love you too. [09:25:48] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1409 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:26:02] PROBLEM - Puppet failure on tools-exec-1212 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:26:10] PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 22.22% of data above the critical threshold [0.0] [09:26:40] PROBLEM - Puppet failure on tools-services-01 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:26:42] PROBLEM - Puppet failure on tools-shadow is CRITICAL 50.00% of data above the critical threshold [0.0] [09:26:46] PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:27:18] PROBLEM - Puppet failure on tools-mail is CRITICAL 44.44% of data above the critical threshold [0.0] [09:27:28] PROBLEM - Puppet failure on tools-exec-1216 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:27:30] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1206 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:27:33] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:27:33] PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 40.00% of data above the critical threshold [0.0] [09:27:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1204 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:27:39] PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 60.00% of data above the critical threshold 
[0.0] [09:28:11] PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 33.33% of data above the critical threshold [0.0] [09:29:55] PROBLEM - Puppet failure on tools-master is CRITICAL 50.00% of data above the critical threshold [0.0] [09:30:35] PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:30:41] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:30:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 20.00% of data above the critical threshold [0.0] [09:30:45] PROBLEM - Puppet failure on tools-exec-1214 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:30:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1406 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:30:59] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 44.44% of data above the critical threshold [0.0] [09:32:01] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1403 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:32:09] PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 44.44% of data above the critical threshold [0.0] [09:32:13] PROBLEM - Puppet failure on tools-redis-01 is CRITICAL 44.44% of data above the critical threshold [0.0] [09:32:15] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 22.22% of data above the critical threshold [0.0] [09:32:23] PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 55.56% of data above the critical threshold [0.0] [09:32:35] PROBLEM - Puppet failure on tools-exec-1406 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:33:36] PROBLEM - Puppet failure on tools-exec-1218 is CRITICAL 20.00% of data above the critical threshold [0.0] [09:33:40] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 20.00% of data above the critical threshold [0.0] [09:33:42] PROBLEM - Puppet failure 
on tools-exec-1209 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:34:18] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 66.67% of data above the critical threshold [0.0] [09:34:24] PROBLEM - Puppet failure on tools-exec-1213 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:35:34] PROBLEM - Puppet failure on tools-exec-1217 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:35:37] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:35:45] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 50.00% of data above the critical threshold [0.0] [09:35:51] PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:35:55] PROBLEM - Puppet failure on tools-static-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:35:55] PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:35:59] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 30.00% of data above the critical threshold [0.0] [09:36:09] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 55.56% of data above the critical threshold [0.0] [09:36:28] PROBLEM - Puppet failure on tools-exec-gift is CRITICAL 40.00% of data above the critical threshold [0.0] [09:36:44] PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:36:50] PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL 40.00% of data above the critical threshold [0.0] [09:37:18] PROBLEM - Puppet failure on tools-submit is CRITICAL 55.56% of data above the critical threshold [0.0] [09:37:19] PROBLEM - Puppet failure on tools-exec-1219 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:37:27] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 55.56% of data above the critical 
threshold [0.0] [09:37:35] PROBLEM - Puppet failure on tools-static-01 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:38:55] PROBLEM - Puppet failure on tools-exec-1203 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:38:57] PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 80.00% of data above the critical threshold [0.0] [09:38:57] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1209 is CRITICAL 60.00% of data above the critical threshold [0.0] [09:39:23] Cyberpower678: know anything about T13? [09:55:38] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [09:55:48] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1409 is OK Less than 1.00% above the threshold [0.0] [09:55:58] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1406 is OK Less than 1.00% above the threshold [0.0] [09:56:00] RECOVERY - Puppet failure on tools-exec-1212 is OK Less than 1.00% above the threshold [0.0] [09:56:08] RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0] [09:56:40] RECOVERY - Puppet failure on tools-services-01 is OK Less than 1.00% above the threshold [0.0] [09:56:41] RECOVERY - Puppet failure on tools-shadow is OK Less than 1.00% above the threshold [0.0] [09:56:45] RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0] [09:57:21] RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0] [09:57:21] RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0] [09:57:29] RECOVERY - Puppet failure on tools-exec-1216 is OK Less than 1.00% above the threshold [0.0] [09:57:31] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1206 is OK Less than 1.00% above the threshold [0.0] [09:57:33] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0] [09:57:37] RECOVERY - 
Puppet failure on tools-precise-dev is OK Less than 1.00% above the threshold [0.0] [09:57:39] RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0] [09:57:39] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1204 is OK Less than 1.00% above the threshold [0.0] [09:58:13] RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0] [09:58:43] RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0] [09:59:11] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [09:59:55] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [10:00:35] RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0] [10:00:41] RECOVERY - Puppet failure on tools-exec-1214 is OK Less than 1.00% above the threshold [0.0] [10:00:43] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0] [10:01:01] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [10:01:46] RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0] [10:01:52] RECOVERY - Puppet failure on tools-webproxy-02 is OK Less than 1.00% above the threshold [0.0] [10:02:01] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1403 is OK Less than 1.00% above the threshold [0.0] [10:02:07] RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0] [10:02:11] RECOVERY - Puppet failure on tools-redis-01 is OK Less than 1.00% above the threshold [0.0] [10:02:11] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0] [10:02:15] RECOVERY - Puppet failure on tools-submit is OK Less than 1.00% above the threshold [0.0] [10:02:19] RECOVERY - Puppet failure on tools-exec-1219 is OK Less than 1.00% above the 
threshold [0.0] [10:02:29] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0] [10:02:35] RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0] [10:02:37] RECOVERY - Puppet failure on tools-exec-1406 is OK Less than 1.00% above the threshold [0.0] [10:03:33] RECOVERY - Puppet failure on tools-exec-1218 is OK Less than 1.00% above the threshold [0.0] [10:03:41] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0] [10:03:57] RECOVERY - Puppet failure on tools-exec-1203 is OK Less than 1.00% above the threshold [0.0] [10:03:57] RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0] [10:04:21] RECOVERY - Puppet failure on tools-exec-1213 is OK Less than 1.00% above the threshold [0.0] [10:05:37] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0] [10:05:38] RECOVERY - Puppet failure on tools-exec-1217 is OK Less than 1.00% above the threshold [0.0] [10:05:47] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0] [10:05:51] RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0] [10:05:54] RECOVERY - Puppet failure on tools-static-02 is OK Less than 1.00% above the threshold [0.0] [10:05:56] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0] [10:05:56] RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0] [10:06:10] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0] [10:06:32] RECOVERY - Puppet failure on tools-exec-gift is OK Less than 1.00% above the threshold [0.0] [10:08:58] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1209 is OK Less than 1.00% above the threshold [0.0] [10:39:38] 6Labs, 
7Tracking: Elee Labs Project - https://phabricator.wikimedia.org/T104170#1409239 (10yuvipanda) Please be more specific as to what you plan on using these for? We prefer to not give out projects to individuals but to specific plans and activities. I also suggest using tools instead if you just want to ex... [10:40:47] 6Labs, 7Tracking: Create labs project for analysis of recent changes and user contributions - https://phabricator.wikimedia.org/T104144#1409243 (10yuvipanda) I suggested on IRC that you use tools instead and we can expand that as necessary. Do you still think you need your own project? [10:58:52] 6Labs, 7Tracking: Elee Labs Project - https://phabricator.wikimedia.org/T104170#1409325 (10Elee) Sure @yuvipanda - I thought I was specific when I said "test changes before deploying". It would be a variety of things. 1. Rewrite of xtools - we've fulfilled the quota for that project with 2 m1.xlarges and a m... [11:07:14] 6Labs, 7Tracking: Elee Labs Project - https://phabricator.wikimedia.org/T104170#1409355 (10yuvipanda) 1 should be done inside the xtools project - we can increase quotas if needed, but I would reccomend not using xlarges for a project that does not clearly need them. Please start with m1. mediums and go on to... [12:00:53] 6Labs, 7Tracking: Create labs project for analysis of recent changes and user contributions - https://phabricator.wikimedia.org/T104144#1409432 (10Luke081515) I prefer it, because this project works also a little bit at the direction of the abusefilter, so maybe I can also extend this extension. [12:22:33] elee, I consider him a friend why? [12:23:00] Cyberpower678: sounds good. Will he be working on xtools soon or should we start just banging something new out? 
[12:24:36] (03PS1) 10Sitic: Fix brackets problem with RTL text [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/221606 [12:24:53] elee, dunno yet [12:25:03] hrm, go bother him =p [12:25:08] (03CR) 10Sitic: [C: 032 V: 032] Fix brackets problem with RTL text [labs/tools/crosswatch] - 10https://gerrit.wikimedia.org/r/221606 (owner: 10Sitic) [12:25:35] elee, let me first wake up [12:25:47] T13 is probably still sleeping [12:26:19] heh [12:40:05] 6Labs, 7Mobile: Decide what to do with the android-build machine - https://phabricator.wikimedia.org/T104190#1409504 (10yuvipanda) 3NEW [12:40:21] hey andrewbogott [12:40:26] 'morning [12:40:28] andrewbogott: WIP but https://tools.wmflabs.org/watroles/role/role::puppet::self [12:40:49] no UI yet for actually specifying what role you want to check but you can click around or modify the URL [12:41:38] Ah, that’s just a report of every puppet class for every project? [12:41:46] YuviPanda: https://tools.wmflabs.org/watroles/variable/realm/labs :P [12:42:19] andrewbogott: yeah, you can query specific ones though. that link is all instances with role::puppet::self [12:42:32] it's a web frontend to LDAP basically [12:42:34] JohnFLewis: :) indeed. [12:42:38] seems handy! [12:42:47] YuviPanda: did you notice that ‘abuse’ email that came in on Friday and again today? [12:43:26] andrewbogott: yeah, I saw your response, and it's an automated one from fail2ban [12:43:30] Oh, I see, discussing it in security already [12:44:01] I thought you might have a better idea of how to track down the offender. I sure don't. [12:44:48] andrewbogott: me neither. we also don't really know wtf the report is actually for - HTTP? ssh? [12:45:00] yeah, it’s not very helpful. [12:49:59] andrewbogott: I'm going to merge https://gerrit.wikimedia.org/r/#/c/218637/21,publish and update documentation accordingly now. 
[12:51:16] YuviPanda: ok [12:57:09] 6Labs, 7Tracking: Elee Labs Project - https://phabricator.wikimedia.org/T104170#1409563 (10scfc) Also, using `nmap` might be considered as "accessing other systems without authorization" under the [[https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use|Labs Terms of Use]]. It is certainly not needed... [13:01:34] (03CR) 10Yuvipanda: [C: 032 V: 032] Add Gage's key to labs root [labs/private] - 10https://gerrit.wikimedia.org/r/221300 (owner: 10Gage) [13:05:25] 6Labs: Find replacements for various things that people were using NFS for but should not have been (Tracking) - https://phabricator.wikimedia.org/T104193#1409578 (10yuvipanda) 3NEW [13:06:36] 6Labs: Simple method to have a per-project debian repository - https://phabricator.wikimedia.org/T104194#1409586 (10yuvipanda) 3NEW [13:07:12] 6Labs: implement a sane way to share dotfiles across labs instances - https://phabricator.wikimedia.org/T102173#1409593 (10yuvipanda) [13:07:14] 6Labs: Find replacements for various things that people were using NFS for but should not have been (Tracking) - https://phabricator.wikimedia.org/T104193#1409592 (10yuvipanda) [13:07:26] 6Labs: Simple method to have a per-project debian repository - https://phabricator.wikimedia.org/T104194#1409586 (10yuvipanda) Having a simple to use aptly role might be an answer? [13:07:54] 6Labs, 3Labs-Sprint-102, 3Labs-Sprint-103, 5Patch-For-Review: Disable NFS by default for new projects - https://phabricator.wikimedia.org/T102403#1409599 (10yuvipanda) 5Open>3Resolved Done now! 
[13:11:15] hi,what means:0-59 0-23 3 8 * 2014 /home/user/Downloads/script_cround.sh as crontab line [13:17:17] !log deployment-prep restarting Elasticsearch to pick up new plugin versions [13:17:21] Logged the message, Master [13:38:36] 6Labs: please move the directory /home/matanya/winners-auctions.com/ from /home to /srv on the video project, instance encoding01.wmflabs.org - https://phabricator.wikimedia.org/T104197#1409675 (10Matanya) 3NEW [13:38:42] YuviPanda: ^ [13:40:23] 6Labs: please move the directory /home/matanya/winners-auctions.com/ from /home to /srv on the video project, instance encoding01.wmflabs.org - https://phabricator.wikimedia.org/T104197#1409693 (10yuvipanda) aaah, so we can't actually do that - /srv is local to the instance itself, and so this would actually ne... [13:40:27] matanya: unfortunately not useful for /srv [13:41:03] so rate limit rsync would work ? [13:41:08] matanya: yeah [13:41:16] ok, starting now [13:41:49] 6Labs: please move the directory /home/matanya/winners-auctions.com/ from /home to /srv on the video project, instance encoding01.wmflabs.org - https://phabricator.wikimedia.org/T104197#1409695 (10Matanya) 5Open>3Invalid a:3Matanya will rsync locally with rate limiting [13:43:35] YuviPanda: please keep an eye, to see i am not killing it [13:43:55] matanya: :) will do! there'll be icinga alerts for labstore1002 on -operations.. [13:44:20] ok, thanks [13:49:08] hi guys [13:49:51] just a stupid thing. but i feel the terminal prompt a bit slower when connected to tool labs lately [13:50:14] i can't type that fast or go up and down with previous commands with the speed i used to [13:50:15] is it me or could there be sth? [13:51:14] valhallasw: do you have the script you used to measure ssh latencies? [13:51:21] YuviPanda: yes [13:51:22] marmick: have you considered using mosh instead of ssh? [13:51:23] sec [13:51:35] valhallasw: would be nice to have marmick run it and see? 
I can also make my friends in India run it [13:52:07] never heard of mosh [13:52:41] https://gist.github.com/valhallasw/aaf79b1617b27d9f9d4a [13:53:01] is it really a good leap between the regular terminal-ssh connection and mosh? [13:53:21] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-102, 3Labs-Sprint-103: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1409718 (10yuvipanda) [13:53:25] 6Labs, 10VisualEditor: Investigate and potentially move off NFS in the 'visualeditor' project - https://phabricator.wikimedia.org/T102688#1409716 (10yuvipanda) 5Open>3Resolved Gwicke rescued the files he needed onto parsoid-spof, and the rest have been unmounted now. I also fixed puppet in the towtruck ins... [13:55:50] marmick: especially with local prediction it's a huge improvement [13:56:36] ok [13:56:36] should i run the code you posted? [14:01:34] valhallasw: [14:01:46] marmick: install mosh and attempt to use those instead? [14:02:01] marmick: eh, if you want to measure the RTT, yes, but if you just want to login with better performance, use mosh [14:13:55] YuviPanda: valhallasw, i'm donig that, yes. [14:14:04] some problems with locales appeared though. i'm on it. [14:30:50] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409758 (10yuvipanda) [14:31:42] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409745 (10yuvipanda) So, let's define a 'check if instance is OK' check, which checks for: # ping # puppet running at all (ignore puppet failures maybe?) And when they fail... [14:31:52] andrewbogott: I can take this on for this sprint if you'd like :) [14:32:01] cool [14:32:11] andrewbogott: do my checks there sound sane? 
[14:32:20] I think so [14:32:34] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409761 (10Andrew) maybe 3. salt ? [14:32:38] YuviPanda: unless explicitly disabled with puppet agent --disable [14:32:41] ? [14:33:00] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409762 (10Andrew) Oh, I guess salt doesn't work since some hosts have local salt masters. [14:33:00] valhallasw: nope, doing that for extended periods of time is a bad thing and people should not do it... [14:33:20] YuviPanda: go fix tools-mailrelay-01, then =p [14:33:27] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409763 (10yuvipanda) How would we check for salt? Puppet can be checked via graphite, ping can be checked by pinging... [14:33:45] valhallasw: that instance is easily deleteable too if it goes bad. [14:33:48] YuviPanda: unrelated, I have two things: 1) How can I moderate labs-announce? 2) Do we really need labs-announce in addition to labs-l? [14:34:13] andrewbogott: JohnFLewis should've sent you a password when he created the list, if not I can find out and send you [14:34:17] YuviPanda: it is, but that's not the point. having an instance where one is still fiddling is sane [14:34:20] I’ll look... [14:34:36] valhallasw: indeed, and getting an alert saying 'yo you have puppet not running there!' is sane too :) [14:34:37] if you spam people for those instances, they are just going to ignore the mails altogether [14:34:57] I don’t think I have it [14:35:00] valhallasw: we should have an appropriate level of spamming, I think. [14:35:06] andrewbogott: let me find and forward. [14:35:13] valhallasw: that’s why I proposed 48 hours — an interval of fiddling. I would be fine with that being 7 days or 14 even. 
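The alerting rule discussed for T104199 can be sketched roughly as follows: an instance counts as broken when ping or the "puppet running at all" check fails, and project admins are mailed only once that state has outlasted a grace interval (48 hours was proposed, with 7 or 14 days floated as alternatives). This is a minimal sketch under assumptions: `InstanceStatus`, `is_ok`, and `should_alert` are hypothetical names, the check results are assumed to be collected elsewhere, and none of this is the actual Labs implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Grace period proposed in the discussion; 7 or 14 days were also floated.
GRACE = timedelta(hours=48)


@dataclass
class InstanceStatus:
    """Hypothetical snapshot of the checks for one instance."""
    name: str
    ping_ok: bool
    puppet_running: bool
    broken_since: Optional[datetime]  # when a check first failed; None if healthy


def is_ok(status: InstanceStatus) -> bool:
    # The two checks listed on the task: ping, and puppet running at all.
    return status.ping_ok and status.puppet_running


def should_alert(status: InstanceStatus, now: datetime,
                 grace: timedelta = GRACE) -> bool:
    # Healthy instances, or ones with no recorded failure time, never alert.
    if is_ok(status) or status.broken_since is None:
        return False
    # Only mail once the instance has been broken for the whole grace period.
    return now - status.broken_since >= grace
```

Passing a longer `grace` (for example `timedelta(days=7)`) would model the more lenient interval suggested in the channel without changing the rest of the logic.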
[14:35:14] also, looking at https://tools.wmflabs.org/watroles/role/role::puppet::self [14:35:30] most are deployment-prep or integration, which have people actively taking care of it [14:35:55] and then there's the long tail... [14:36:13] IMO when someone stops fiddling for the day they should ‘git checkout -b fiddling && git checkout master && git reset --hard origin’ [14:36:28] But that’s a bit much to ask [14:36:37] or autoupdate will take care of it as long as it doesn't conflict [14:36:56] true [14:38:18] andrewbogott: emailed you labs-announce password [14:38:20] Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409772 (Negative24) I have a feeling I'm going to be getting these emails quite often... [14:38:23] thanks [14:38:43] Negative24: if you update your self hosted puppetmaster to latest, you'll get auto update :) [14:38:51] andrewbogott: can you also email about auto update? [14:39:10] sure [14:39:26] andrewbogott: thanks :) [14:40:40] YuviPanda: I believe it did auto update. Was it that puppet flag? [14:40:56] Negative24: yup. if that's the case then you'll be ok. the puppet flag just defaults to true now instead of false [14:41:35] I believe it ran because my work had been wiped 3 or 4 times :P [14:41:50] YuviPanda: is recovery possible now? or, is fsck done now [14:42:00] liangent: yes! [14:42:10] Negative24: ah :) it stopped doing that as well now... [14:42:19] Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409781 (Andrew) @Negative24, are we missing a use case where it is useful/important to leave an instance with broken puppet? [14:42:34] YuviPanda: that's good because I promptly removed the crontab entry [14:42:47] Negative24: heh :) [14:42:57] Negative24: should make it autostash...
[14:43:19] YuviPanda: haven't seen an announcement yet [14:43:30] liangent: oh, I thought one was made... [14:43:32] let me find out [14:43:32] YuviPanda: the method of sending out passwords is awesome is it not? :) [14:44:20] Negative24: autoupdate should never have wiped anything out… it rebases, it doesn’t reset [14:44:26] * andrewbogott checks to make sure this is true [14:44:28] andrewbogott: it did until your change... [14:44:45] oh... [14:44:46] ok then [14:44:55] Negative24: you can blame that one one me [14:45:21] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1409783 (10Negative24) @Andrew, well I do have an instance (phab-pup) which I'm writing puppet configs for and will eventually have those merged into the main puppetmaster but... [14:45:25] he [14:45:27] *heh [14:45:47] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103: Labs: Salvage, then remove volumes on labstores' raid6 - https://phabricator.wikimedia.org/T103265#1409785 (10yuvipanda) [14:45:47] liangent: yup, no announcement. let me make one now. [14:45:57] liangent: can you file a subtask of ^ for your recovery request? [14:47:07] Negative24: are the local changes on phab-pup incompatible with upstream puppet? [14:48:00] andrewbogott: not incompatible just I have to have them tested in their intermediary stages with itself [14:48:17] its ssh stuff so testing is a must [14:48:30] ok, so why does that entail leaving that instance with failing puppet runs? [14:49:14] they're not failing. Its disabled so it doesn't rollback the changes [14:49:20] oh I see [14:49:23] I wasn't clear [14:49:39] phab-pup isn't failing. phab-02 is disabled [14:49:46] phab-pup is just fine [14:49:56] so puppet is disabled on phab-02 [14:49:59] yea [14:50:04] Because you have local changes that will be clobbered by puppet runs. [14:50:08] yup [14:50:10] So... [14:50:11] don’t do that. 
[14:50:33] Aren’t the local changes puppetized in phab-pup? [14:50:47] ^ pending [14:50:54] in the process [14:51:17] ok… so that’s where the ’48 hours’ comes in :) [14:51:35] yea so I need to get on that :) [14:54:21] YuviPanda: I've already created a task somewhere [14:54:22] in phab [14:55:06] hmm and coren has responded [14:55:07] https://phabricator.wikimedia.org/T103268 [14:55:46] liangent: ah, I see. want me to look again? [14:57:19] andrewbogott: is this andrewmadethis branch still important? [14:57:24] YuviPanda: so, I believe I made some change in ~/mw, in response to https switch [14:57:42] Negative24: no need to preserve it, I just needed a branch that worked. [14:57:53] You can replace it with a branch of your choice as long as it’s up to date. [14:58:28] liangent: am looking again now [14:59:22] liangent: uh, there's no mw directory at all? [14:59:24] YuviPanda: more exactly, I did something on 2015-06-16 UTC, after some user told me "my bot is broken", because HTTP api doesn't work anymore [15:00:19] YuviPanda: /data/project/liangent-php/mw [15:00:23] not there? [15:00:34] how would I overwrite the ssh port # specified here https://github.com/wikimedia/operations-puppet/blob/production/modules/ssh/manifests/server.pp#L2 [15:00:50] liangent: ah, no I was looking in home/liangent [15:01:16] btw does it make sense to force https for traffic from labs? [15:01:59] I guess all of them are internal, and using https just adds extra loads [15:02:01] liangent: I'm just copying your mw folder into ~/recovery [15:02:29] YuviPanda: ok but it's huge :p [15:02:40] is done [15:02:44] huh [15:03:21] yeah got it [15:03:55] YuviPanda: may I get ~/.bash_history as well? [15:11:34] liangent: sure [15:12:00] liangent: done for .bash_history too (under recovery) [15:13:31] YuviPanda: would http://pastebin.com/FtCQQ6QJ override https://github.com/wikimedia/operations-puppet/blob/production/modules/ssh/manifests/server.pp#L2 correctly? [15:13:43] YuviPanda: thanks! 
[15:13:53] Negative24: nope [15:14:04] Negative24: would give you a second ssh sever [15:14:08] thought so [15:14:46] I would probably have to block base from running which is very bad [15:15:06] YuviPanda: -rw------- 1 root tools.liangent-php 66938 Jun 29 15:11 .bash_history [15:15:10] I can't read it :( [15:15:26] Negative24: why do you want to remove the current one? [15:15:31] liangent: use 'take' command? [15:15:34] on the recovery directory [15:16:10] YuviPanda: tools.liangent-php@tools-bastion-01:~/recovery$ take .bash_history [15:16:11] .bash_history: you must own the containing directory [15:16:16] YuviPanda: https://phabricator.wikimedia.org/T94217 [15:16:19] drwxr-sr-x 19 root tools.liangent-php 4096 Jun 29 15:02 mw [15:16:26] liangent: go to your homedir, take recovery itself? [15:17:23] seems working, but slow... [15:17:35] liangent: yeah, it's recursively chowning things to you [15:19:57] YuviPanda: could I specify that ssh server on port 22 to use a different config [15:20:05] or I could just use the regular config [15:20:12] any command to compare directory tree and output different (by content) files? [15:20:17] Negative24: probably not. but you can make the port number of the default a hiera-izeable value... [15:20:48] liangent: http://stackoverflow.com/questions/6710878/diff-a-directory-recursively-ignoring-all-binary-files [15:21:10] YuviPanda: I'll just setup the /etc/ssh/sshd_config as the phabricator config and then create a new ssh server with port 222 and the regular sshd_config [15:21:25] may make things a bit more confusing [15:21:36] Negative24: that'll just require you to disable puppet, no? [15:21:52] gr [15:21:55] Negative24: want me to put up a patch showing how? moment [15:21:56] thx [15:22:18] I've disabled puppet and that's gotten me into trouble :P [15:22:45] Negative24: ah! 
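For the question at 15:20:12 about listing files that differ by content between two directory trees (for example a recovered copy and the live one), the usual shell answer is `diff -rq`. The same idea can be sketched in Python with the standard library. This is only an illustration: `differing_files` is a hypothetical helper name, and it deliberately ignores files that exist in just one of the two trees.

```python
import filecmp
import os


def differing_files(left, right):
    """Return relative paths of files present in both trees whose contents differ."""
    changed = []
    for root, _dirs, files in os.walk(left):
        rel_root = os.path.relpath(root, left)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            a = os.path.join(left, rel)
            b = os.path.join(right, rel)
            # shallow=False forces a byte-by-byte comparison instead of
            # trusting size and modification time.
            if os.path.isfile(b) and not filecmp.cmp(a, b, shallow=False):
                changed.append(rel)
    return sorted(changed)
```

Calling `differing_files("recovery/mw", "mw")` would then give the list of files worth inspecting by hand; the two paths here are placeholders, not the exact layout from the log.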
[15:22:49] Negative24: you can just put [15:23:00] Negative24: ssh::server::listen_port: 222 [15:23:02] Negative24: in hiera config [15:23:04] and its hould just work [15:23:06] *should [15:23:26] but that would do it to all instances in a project wouldn't it? [15:23:37] Negative24: there's a trick for that, let me show you in a few minutes. [15:24:13] I was trying to go for a role that would automatically do it so we didn't have to do anything except enable it for future instances [15:26:34] Negative24: so which instance is this? [15:27:03] phab-pup is the puppetmaster and phab-02 is the instance with the fancy ssh configs [15:27:06] 6Labs, 7Puppet: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1409856 (10yuvipanda) 3NEW [15:27:24] Negative24: alright, let me make a local commit on phab-pup showing [15:28:03] ok [15:28:45] Negative24: see now [15:29:05] I see [15:29:33] so that overrides the base ssh server [15:29:37] Negative24: yeah [15:29:42] well, the particular parameter [15:30:08] and that can't be puppetized? [15:31:01] Negative24: as a separate class? no - it's just set in hiera. [15:31:14] Negative24: that commit can go in the operations/puppet repo too - I'll happily merge it for you if you want [15:31:23] that's too bad [15:31:26] please do [15:31:37] Negative24: commit the patch? 
:) [15:31:44] to gerrit that is [15:31:53] one sec [15:50:41] YuviPanda: something has seriously gone wrong with git review [15:50:47] on my machine [15:50:58] and it just started [15:54:21] YuviPanda: and ~/.mysql_history [15:54:27] thanks for joining the program git-review [15:54:31] YuviPanda: https://gerrit.wikimedia.org/r/#/c/221649/ [15:55:21] 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review, 5WMF-deploy-2015-06-23_(1.26wmf11), 5WMF-deploy-2015-06-30_(1.26wmf12): Automatically grant shell user right to everyone who signs up on wikitech - https://phabricator.wikimedia.org/T97334#1409934 (10yuvipanda) Wheee this is done now :D Wait for a few... [15:55:32] Negative24: :) I hate git review, never use it [15:55:35] Negative24: sure, merging now [15:55:40] liangent: sure! after Negative24 :) [15:56:02] YuviPanda: it did something funny this time (Fast forward only?) [15:56:24] YuviPanda: what do you use? [15:57:09] Negative24: I just do 'git push gerrit HEAD:refs/for/production' [15:57:30] Negative24: alright, merged. you should be able to remove that local cherrypick now [15:57:44] YuviPanda: will do thanks [15:58:00] liangent: doing yours now [15:58:17] liangent: done [15:58:21] under the recovery folder again [15:59:07] 6Labs, 6Phabricator: Phabricator security policy open up port 222 for regular ssh with git on port 22 - https://phabricator.wikimedia.org/T94217#1409936 (10yuvipanda) https://gerrit.wikimedia.org/r/#/c/221649/ is related [15:59:17] YuviPanda: nice [16:01:42] YuviPanda: how do you get change-ids then? [16:02:00] Negative24: git review -s once per repo introduces a post-commit hook that does the change ids for you [16:02:12] it's also a trivial script you can copy around yourself [16:02:23] so git review just for ids then [16:02:33] yup [16:04:30] 6Labs, 7Mobile: Decide what to do with the android-build machine - https://phabricator.wikimedia.org/T104190#1409941 (10bearND) For now we still need this setup for our alpha users. 
T99115 could be a replacement for this but I don't know when we get that. [16:15:01] 6Labs: Provide a simple way to backup arbitrary files from instances - https://phabricator.wikimedia.org/T104206#1409976 (10yuvipanda) 3NEW [16:15:25] o/ was wondering if someone could give me a hand at getting a tools lab project up and running. [16:15:41] I tried "create new tool" but that doesn't appear to do anything. [16:16:25] er wait [16:16:32] wait... I think I did it [16:16:42] hah - I'll come back later if this is not the case. [16:16:46] thanks for the help, or lack thereof =] [16:26:30] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410012 (10yuvipanda) Hmm, I don't think we expose emails of people directly, but I think we can just post messages on their wikitech talk pages. @Andrew would that be acceptab... [16:27:14] 6Labs, 3Labs-Sprint-104, 7Puppet: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1410013 (10yuvipanda) [16:27:49] * YuviPanda pokes MaxSem with https://phabricator.wikimedia.org/T103757 [16:28:09] MaxSem: I think you might've responded on IRC but I think I might have missed it [16:28:43] I think I responded on the bug itself? :P [16:29:50] 6Labs, 6Discovery, 10Maps: Investigate and reduce NFS use in maps-team project - https://phabricator.wikimedia.org/T103757#1410022 (10yuvipanda) Can you use /data/scratch for that instead? [16:29:51] MaxSem: woops, you did [16:31:10] 6Labs, 6operations, 3Labs-Sprint-104, 5Patch-For-Review: update star.wmflabs.org cert from sha1 to sha256 - https://phabricator.wikimedia.org/T104017#1410027 (10yuvipanda) [16:31:30] MaxSem: can you use /data/scratch for those instead? seems better fit for that use case than /data/project [16:32:33] where can I read about these differences? 
[16:34:53] MaxSem: at https://wikitech.wikimedia.org/wiki/Help:Shared_storage but I'm in the middle of a major rewrite of that thing :) [16:35:14] MaxSem: /data/scratch is common across all projects, is not backed up in case of disaster, and a bit faster. should be used for cache / move around related stuff [16:35:32] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410047 (10Elee) It would be reasonable to make this an opt-in service, no? [16:35:58] hmm [16:36:36] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410049 (10yuvipanda) No - unmaintained instances are a burden on the labs operations team, and so I do not think you should have to opt in to receive email saying you are doin... [16:38:16] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410057 (10yuvipanda) Oh, do ignore me - email is possible as well (I hadn't looked properly >_>). However, we should still discuss if this should be email vs talk page notific... [16:42:22] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410065 (10Andrew) Talk page is probably adequate. [16:43:21] YuviPanda: last night my calendar dinged and announced that the toolserver ssl cert expires today. Can you think of any reason why that would matter? [16:43:27] 6Labs, 3Labs-Sprint-104: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410068 (10yuvipanda) [16:43:31] e.g. is there a dead host someplace redirecting people to tools, that is now failing? 
[16:43:51] andrewbogott: mine did too, and I think it does - coren would know, but I don't know if there are any actions we need to take [16:44:02] of course [16:44:03] andrewbogott: https://toolserver.org/ [16:44:47] 6Labs, 3Labs-Sprint-104: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410072 (10Andrew) @Elee: No! Project admins have responsibilities, and one of them is to keep their instances up to date and working. [16:44:57] shit [16:45:16] well, maybe we should make toolserver.org http only? No real need for security when there’s only one page there. [16:46:10] YuviPanda: could you please remind me of how to create my db in the different instances of tool labs? I posed my problem several weeks ago in the channel and Coren got to that conclusion. I needed to create my user-db in every instance, so I could use them. [16:46:12] ah, it already serves a perfectly reasonable http page [16:46:21] but with the failure i think it broke, could it be? [16:46:25] andrewbogott: indeed but there are https links to it in wiki pages, I think. [16:46:31] I'm not sure if Coren|Away or valhallasw told me that. [16:46:32] hm [16:47:03] marmick: the databases are completely separate from nfs, so that's unrelated [16:47:17] valhallasw: oks. [16:47:45] then i'll check sth else [16:48:35] 6Labs, 7HTTPS: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410083 (10yuvipanda) 3NEW [16:50:38] 6Labs, 10Datasets-General-or-Unknown, 6operations, 10wikitech.wikimedia.org: Provide dumps of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T54170#1410096 (10ArielGlenn) should we host the latest dump on dumps.wm.org? [16:51:07] YuviPanda: who is aklapper and why do I see him everywhere?
=p [16:51:42] 6Labs, 7HTTPS: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410101 (10Andrew) There's a fair case to be made for letting it rot, but I think it's worth a bit of money just to keep people from hitting a cert warning. [16:52:01] elee: Klapper is the king of bugs, he keeps Phabricator tidy. [16:52:32] elee: An actual human, not a bot. [16:52:34] andrewbogott: heh [16:52:47] why did everything move to phab? [16:52:54] I'm still happy here with trac =p [16:53:35] https://blog.wikimedia.org/2014/06/10/on-our-way-to-phabricator/ [16:54:52] 6Labs, 10Datasets-General-or-Unknown, 6operations, 10wikitech.wikimedia.org: Provide dumps of wikitech.wikimedia.org - https://phabricator.wikimedia.org/T54170#1410103 (10Krenair) Maybe. Might need to keep in mind that you can only login to the database as wikiadmin from silver (e.g. it won't work from tin... [16:55:11] 6Labs, 3Labs-Sprint-104, 7Puppet: Allow per-host hiera overrides via wikitech - https://phabricator.wikimedia.org/T104202#1410105 (10scfc) [16:58:31] 6Labs, 6operations: salt does not run reliably for toollabs - https://phabricator.wikimedia.org/T99213#1410114 (10ArielGlenn) I'm going through all the labs instances and: converting those that still talk to virt1001 to the new saltmaster, generating shorter keys as we have for production, and testing. there a... [17:02:21] PROBLEM - Free space - all mounts on tools-webproxy-01 is CRITICAL tools.tools-webproxy-01.diskspace.root.byte_percentfree (<60.00%) [17:11:06] 6Labs, 3Labs-Sprint-104: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410163 (10scfc) I think mail might be more fruitful, as I believe few people check their wikitech talk pages regularly or enable the wiki notifications (by... 
[17:15:07] 6Labs, 10RESTBase, 10Traffic, 6operations, and 2 others: Fix RESTBase support for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T102178#1410178 (10GWicke) p:5Triage>3Normal [17:16:31] YuviPanda: could you do a quick lookover of http://pastebin.com/udAuVwaZ [17:16:44] 6Labs, 10RESTBase, 10Traffic, 6operations, and 2 others: Fix RESTBase support for wikitech.wikimedia.org - https://phabricator.wikimedia.org/T102178#1410181 (10GWicke) @bblack, do you think it is feasible / advisable to add a `/api/rest_v1/` rewrite in the wikitech nginx config? [17:17:19] 6Labs, 7HTTPS: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410184 (10Andrew) Rob says ~$115 to keep this alive for another year. [17:17:31] there's probably a dozen mistakes because I'm not good at puppet [17:23:29] 6Labs, 7HTTPS: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410216 (10Andrew) I will try to research whether or not anyone is still hitting that page. [17:24:49] Negative24: not right now, sorry :( [17:25:01] np [17:25:03] Negative24: you should also put it up on gerrit and have people who do phab stuff look at it :) [17:25:14] ok [17:26:40] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Temporarily Unavailable - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 401 bytes in 0.002 second response time [17:27:11] 503 Service Temporarily Unavailable ? [17:27:18] woah woah [17:27:20] what happened there [17:27:22] all webservices [17:27:34] broken! 
[17:28:58] !log tools failed over tools webproxy to tools-webproxy-02 [17:29:02] Logged the message, Master [17:29:02] Steinsplitter: phe back now [17:29:10] :) [17:31:49] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410274 (10Andrew) [17:37:01] 6Labs, 3Labs-Sprint-103, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Set up labcontrol1002 as hot spare for labcontrol1001. - https://phabricator.wikimedia.org/T103722#1410307 (10Andrew) 5Open>3Resolved [17:37:02] 6Labs, 3Labs-Sprint-103, 3ToolLabs-Goals-Q4: virt1000 SPOF - https://phabricator.wikimedia.org/T90625#1410308 (10Andrew) [17:37:19] RECOVERY - Free space - all mounts on tools-webproxy-01 is OK All targets OK [17:38:36] 6Labs, 3Labs-Sprint-103, 3Labs-Sprint-104: Limit available images on horizon - https://phabricator.wikimedia.org/T91782#1410317 (10Andrew) [17:38:37] 6Labs, 3Labs-Sprint-104: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1410316 (10yuvipanda) [17:39:54] 6Labs, 3Labs-Sprint-103, 3Labs-Sprint-104: In openstack upstream, add project_id to instance metadata - https://phabricator.wikimedia.org/T103384#1410324 (10Andrew) [17:40:19] 6Labs, 10Labs-Infrastructure, 5Continuous-Integration-Isolation, 3Labs-Sprint-103, and 2 others: Instances without a shared NFS storage suffers from a 3 minutes boot delay - https://phabricator.wikimedia.org/T102544#1410329 (10Andrew) [17:41:10] 6Labs, 3Labs-Sprint-102, 3Labs-Sprint-103, 3Labs-Sprint-104, 5Patch-For-Review: Replace puppetsigner with a script to clean certificates, puppet's autosign and salt's auto accept - https://phabricator.wikimedia.org/T102504#1410339 (10Andrew) [17:45:25] 6Labs, 3Labs-Sprint-104: Provide a simple way to backup arbitrary files from instances - https://phabricator.wikimedia.org/T104206#1410351 (10yuvipanda) [17:46:43] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 806085 bytes in 2.298 second response time 
[17:47:46] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-102, 3Labs-Sprint-103, 3Labs-Sprint-104: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1410371 (10yuvipanda) [17:56:09] 6Labs, 3Labs-Sprint-104: Learn about/document NFS and LVM setup - https://phabricator.wikimedia.org/T104220#1410655 (10Andrew) 3NEW a:3Andrew [17:58:21] 6Labs: Make labs domainproxies fully redundant - https://phabricator.wikimedia.org/T98556#1410771 (10yuvipanda) [17:58:24] 6Labs: Send alert emails to any project admin responsible for an instance with broken puppet - https://phabricator.wikimedia.org/T104199#1410775 (10yuvipanda) [17:58:25] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410778 (10mark) Let's just renew it for another year. [17:58:27] 6Labs, 3Labs-Sprint-102, 3Labs-Sprint-104, 5Patch-For-Review: Labs: manage-nfs-volumes-daemon rewrite - https://phabricator.wikimedia.org/T102782#1410779 (10yuvipanda) [17:59:37] 6Labs, 3Labs-Sprint-102, 3Labs-Sprint-104, 5Patch-For-Review: Labs: manage-nfs-volumes-daemon rewrite - https://phabricator.wikimedia.org/T102782#1410805 (10yuvipanda) @mark has asked me to pick this up and attempt to finish this off this week, since it currently blocks new instances in projects with NFS f... [18:08:37] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410826 (10Andrew) a:3RobH [18:11:36] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1410828 (10RobH) Since it will be a new certificate (as the old one expired and is on an old vendor), would you guys like to furnish (or have me create) a new key for this as well? 
[18:23:23] 6Labs, 6operations, 3Labs-Sprint-102, 3Labs-Sprint-103, and 2 others: labstore has multiple unpuppetized files/scripts/configs - https://phabricator.wikimedia.org/T102478#1410880 (10yuvipanda) [18:23:29] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104: Labs: Salvage, then remove volumes on labstores' raid6 - https://phabricator.wikimedia.org/T103265#1410881 (10yuvipanda) [18:23:38] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104: Labs: Make a new backup of the Labs storage to codfw - https://phabricator.wikimedia.org/T103356#1410882 (10yuvipanda) [18:23:47] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104: Labs: increase size of the volume for the maps project and restore - https://phabricator.wikimedia.org/T103358#1410884 (10yuvipanda) [18:31:31] puppet TIL: classname { ['a', 'b'] : ....} is exactly equivalent to classname { 'a': }; classname { 'b': } [18:39:41] 6Labs, 6operations, 7Puppet: Labs puppet breaks for projects without a Hiera: page on wikitech - https://phabricator.wikimedia.org/T101913#1410960 (10yuvipanda) 5Open>3Resolved a:3yuvipanda [18:46:14] 6Labs: Investigate NFS alternatives to the wikistream project - https://phabricator.wikimedia.org/T103148#1410981 (10yuvipanda) I've copied the homedirs to the local instance and everything seems to continue to work. [18:49:06] valhallasw, where do you request a mailing list? [18:49:21] Cyberpower678: hm? [18:49:48] Cyberpower678: on phabricator, somewhere, I think? there's probably docs on mw.org [18:50:00] https://phabricator.wikimedia.org/maniphest/task/create/?projects=Wikimedia-Mailing-lists [18:50:21] valhallasw, JohnFLewis: thanks [19:09:48] Cyberpower678: check the ticket [19:09:59] I see it. 
Responding [19:12:24] !log tools.forrestbot threw away emails mentioning T104123 because they had a bad Bug: header [19:12:28] Logged the message, Master [19:15:14] JohnFLewis, responded [19:15:21] checking [19:16:09] Cyberpower678: do you want as an announcement only or not? [19:16:23] so only people who are selected can email the list [19:16:37] like labs-announce@lists.wikimedia.org is set up as (and wikimedia-announce) [19:16:38] JohnFLewis, can that be changed by listadmin [19:16:44] yes [19:16:55] Then no announcement only [19:19:17] JohnFLewis, ^ [19:19:41] so public, anyone can email in and it'll be sent [19:19:49] Yes. [19:20:23] 6Labs, 10Deployment-Systems, 10wikitech.wikimedia.org, 5Patch-For-Review: Merge as many configuration hacks in wikitech.php configuration file as possible into InitialiseSettings.php - https://phabricator.wikimedia.org/T75939#1411236 (10Krenair) I think we can probably resolve this after the above commit g... [19:21:16] okay, done. [19:21:28] JohnFLewis, many thanks. :-) [19:22:10] 6Labs, 10Incident-20150617-LabsNFSOutage: Recover data file /data/project/phetools/public_html/data/new_stats.py - https://phabricator.wikimedia.org/T104239#1411239 (10Phe) 3NEW [19:23:30] Cyberpower678: enjoy the new list :) [19:24:20] JohnFLewis, I will. It will hopefully reduce the amount of chasing we have to do on Wikipedia when xTools is down, and everyone posts everywhere like monkeys. :p [19:24:44] mailing lists are awesome for that reason [19:26:53] JohnFLewis, I know. :-) [19:27:38] this may or may not blow up... [19:28:13] Negative24: it's labs, it will ;) [19:28:57] haha its alive! [19:29:02] Test Bot24 [19:29:07] I am a bot. [19:29:49] first time messing with irc and it worked [19:29:58] Suddenly Negative24 realizes there is a major bug in the bot, and the bot activates the nuclear self-destruct sequence. [19:30:45] heh [19:30:58] its only 50 lines long [19:31:00] Negative24, I hope you didn't load the bot onto labs. 
:p [19:31:17] no but why? [19:31:24] oh [19:31:39] Otherwise your bot is about to nuke labs. :p [19:31:51] :P [19:32:07] Which wouldn't be that bad to begin with. :p [19:52:34] andrewbogott: so about the tools webproxy outage - I just switched the IP to the other box and made a hiera edit, and total downtime there was 2 mins... [19:52:45] and tools-webproxy-01 is back again now. not sure what exactly happened [19:56:10] Did you reboot it? Or did it just cheer up after a minute? [19:56:55] andrewbogott: it cheered up. I think it ran out of fds or something maybe - there was a disk space alert right before it, and then it cleared up right after [20:13:37] andrewbogott: I rewrote https://wikitech.wikimedia.org/wiki/Help:Shared_storage [20:13:46] I should probably have a tracking task for those, will do that later [20:13:47] JohnFLewis, how do you shut member moderation off? [20:13:54] MaxSem: ^ you wanted a page explaining the individual mounts :) [20:14:00] I only see non-member options. [20:14:11] members are not moderated [20:14:19] they can only be moderated separately [20:14:44] JohnFLewis, whoops. I had it backwards [20:16:20] so you want it on? [20:31:01] 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review, 5WMF-deploy-2015-06-23_(1.26wmf11), 5WMF-deploy-2015-06-30_(1.26wmf12): Automatically grant shell user right to everyone who signs up on wikitech - https://phabricator.wikimedia.org/T97334#1411480 (10scfc) I assume you tested it with User:TestAccount3?...
[20:32:41] 6Labs, 10wikitech.wikimedia.org, 5WMF-deploy-2015-06-23_(1.26wmf11), 5WMF-deploy-2015-06-30_(1.26wmf12): Grant shell user right with project memberships and remove autocreation of shell requests - https://phabricator.wikimedia.org/T97334#1411491 (10scfc) p:5Triage>3Normal [20:33:09] 6Labs, 10wikitech.wikimedia.org, 5WMF-deploy-2015-06-23_(1.26wmf11), 5WMF-deploy-2015-06-30_(1.26wmf12): Grant shell user right with project memberships and remove autocreation of shell requests - https://phabricator.wikimedia.org/T97334#1411496 (10yuvipanda) Yes, I did. [20:36:10] 6Labs, 10wikitech.wikimedia.org, 5Patch-For-Review, 5WMF-deploy-2015-06-23_(1.26wmf11), 5WMF-deploy-2015-06-30_(1.26wmf12): Grant shell user right with project memberships and remove autocreation of shell requests - https://phabricator.wikimedia.org/T97334#1411503 (10yuvipanda) We could SWAT ^ whenever :) [21:05:44] 6Labs, 6Discovery, 10Maps: Investigate and reduce NFS use in maps-team project - https://phabricator.wikimedia.org/T103757#1411609 (10MaxSem) Yep. Let's kill everything but scratch. I've already moved everything over. [21:06:33] 6Labs, 6Discovery, 10Maps: Investigate and reduce NFS use in maps-team project - https://phabricator.wikimedia.org/T103757#1411611 (10yuvipanda) Wonderful! [21:24:49] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103, 3Labs-Sprint-104: Labs: Salvage, then remove volumes on labstores' raid6 - https://phabricator.wikimedia.org/T103265#1411689 (10yuvipanda) [21:24:51] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-103: Recover some sql queries - https://phabricator.wikimedia.org/T104134#1411687 (10yuvipanda) 5Open>3Resolved Recovered all .sql files in queries, folder, and they are under the 'recovered' folder in your homedir. 
[21:27:57] 6Labs, 6WMF-Legal: Make sure tools can be taken over after they are abandoned - https://phabricator.wikimedia.org/T102066#1411695 (10Ricordisamoa) See also {T59590} [21:50:13] PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 22.22% of data above the critical threshold [0.0] [21:50:55] PROBLEM - Puppet failure on tools-master is CRITICAL 50.00% of data above the critical threshold [0.0] [21:51:41] PROBLEM - Puppet failure on tools-webgrid-lighttpd-1207 is CRITICAL 60.00% of data above the critical threshold [0.0] [21:51:50] 6Labs, 10Incident-20150617-LabsNFSOutage, 3Labs-Sprint-102, 3Labs-Sprint-103, 3Labs-Sprint-104: Audit projects' use of NFS, and remove it where not necessary - https://phabricator.wikimedia.org/T102240#1411780 (10yuvipanda) [21:51:52] 6Labs, 6Discovery, 10Maps, 5Patch-For-Review: Investigate and reduce NFS use in maps-team project - https://phabricator.wikimedia.org/T103757#1411778 (10yuvipanda) 5Open>3Resolved Aand done! [21:51:59] PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL 60.00% of data above the critical threshold [0.0] [21:53:30] 6Labs, 6operations, 3Labs-Sprint-104, 5Patch-For-Review: update star.wmflabs.org cert from sha1 to sha256 - https://phabricator.wikimedia.org/T104017#1411788 (10yuvipanda) 5Open>3Resolved Done! 
[22:00:12] RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0] [22:05:46] 6Labs: Disable NFS for the toolserver-legacy project - https://phabricator.wikimedia.org/T104256#1411844 (10yuvipanda) 3NEW a:3yuvipanda [22:16:37] RECOVERY - Puppet failure on tools-webgrid-lighttpd-1207 is OK Less than 1.00% above the threshold [0.0] [22:16:57] RECOVERY - Puppet failure on tools-exec-cyberbot is OK Less than 1.00% above the threshold [0.0] [22:20:57] RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0] [22:47:58] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1412023 (10RobH) I put in the toolserver.org domain on globalsign two hours ago, and it's still pending vetting. I've emailed our rep to determine what the holdup is. [22:48:08] 6Labs, 7HTTPS, 3Labs-Sprint-104: Renew toolserver.org SSL certificate - https://phabricator.wikimedia.org/T104211#1412024 (10RobH) p:5Triage>3High [22:57:35] 6Labs, 7Tracking: Create labs project for analysis of recent changes and user contributions - https://phabricator.wikimedia.org/T104144#1412052 (10Luke081515) [23:23:57] 6Labs, 10Deployment-Systems, 10wikitech.wikimedia.org, 5Patch-For-Review: Merge as many configuration hacks in wikitech.php configuration file as possible into InitialiseSettings.php - https://phabricator.wikimedia.org/T75939#1412153 (10Krenair) 5Open>3Resolved Let's call this resolved. Not much point... [23:57:44] 6Labs, 10Datasets-General-or-Unknown, 10Labs-Infrastructure, 10Wikidata, and 2 others: Add Wikidata json dumps to labs in /public/dumps - https://phabricator.wikimedia.org/T100885#1412299 (10hoo) 5Open>3Resolved