[05:26:52] a
[05:35:41] (used https)
[05:39:06] (CR) Krinkle: [C: 1] Make urls protocol neutral [labs/tools/heritage] - https://gerrit.wikimedia.org/r/281380 (https://phabricator.wikimedia.org/T131739) (owner: Lokal Profil)
[08:34:52] PROBLEM - Host tools-bastion-01 is DOWN: CRITICAL - Host Unreachable (10.68.17.228)
[10:14:41] Labs, Beta-Cluster-Infrastructure: deployment-upload won't start, upload.beta.wmflabs.org down - https://phabricator.wikimedia.org/T131322#2177257 (hashar) @Andrew Nice! Thank you very much for having ported the hack :-}
[11:17:20] !log tools.stewardbots Imported code for requests.php, not yet merged (T130028)
[11:17:21] T130028: Bring back requests.php - https://phabricator.wikimedia.org/T130028
[11:17:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL, Master
[13:37:55] Labs, Labs-Infrastructure, Tool-Labs: jsub/jstart take 60 s due to /usr/local/bin/log-command-invocation CPU hunger - https://phabricator.wikimedia.org/T131700#2177481 (chasemp) p:Triage>Normal @yuvipanda, any thoughts on making this async to the actual submission process?
[13:38:33] Labs, Horizon: Unable to login on Horizon - https://phabricator.wikimedia.org/T131630#2177484 (chasemp) p:Triage>Normal
[13:39:40] Labs, Tool-Labs: Tool Labs users .bashrc file does not exist for tools accounts - https://phabricator.wikimedia.org/T131561#2177498 (chasemp) p:Triage>Normal
[13:44:27] Labs, Tool-Labs: Create replica.my.cnf in my home directory - https://phabricator.wikimedia.org/T131546#2177503 (chasemp) p:Triage>Normal
[13:45:21] Labs: Add default spam prevention tools or settings to wmflabs instances - https://phabricator.wikimedia.org/T131459#2177505 (chasemp) p:Triage>Normal
[13:51:23] Labs, Horizon: Proxy corner case: proxy name foo.wmflabs.org == domain name foo.wmflabs.org - https://phabricator.wikimedia.org/T131367#2177516 (chasemp) p:Triage>Normal
[13:51:50] Labs, Horizon: DNS Domains view in Horizon for Tools project displays only one domain - https://phabricator.wikimedia.org/T131334#2177519 (chasemp) p:Triage>Normal
[13:52:18] Labs, Labs-Infrastructure: Labs proxy api (aka 'Invisible Unicorn') is a spof - https://phabricator.wikimedia.org/T131308#2177520 (chasemp) p:Triage>High
[13:52:29] Labs, Horizon: Allow creation and deletion of domains - https://phabricator.wikimedia.org/T131301#2177521 (chasemp) p:Triage>Normal
[13:52:44] Labs, Labs-Infrastructure: Abolish use of labs proxies in domains other than .wmflabs.org - https://phabricator.wikimedia.org/T131290#2177522 (chasemp) p:Triage>Normal
[13:52:56] Labs, Labs-Infrastructure: Make labs proxies https only - https://phabricator.wikimedia.org/T131288#2177523 (chasemp) p:Triage>Normal
[13:53:57] Labs, Labs-Infrastructure, Graphite: Graphite is unable to detect, if a paused instance is resumed - https://phabricator.wikimedia.org/T131022#2177524 (chasemp) p:Triage>Low
[13:54:27] Labs, Operations, Patch-For-Review: Labtest designate giving out Forbidden exceptions when trying to list domains - https://phabricator.wikimedia.org/T130979#2177525 (chasemp) p:Triage>Low
[13:55:43] Labs: Interactive consoles? - https://phabricator.wikimedia.org/T130806#2177529 (chasemp) p:Triage>Normal
[13:56:31] Labs, Labs-Infrastructure: Install python-requests-oauthlib on labs - https://phabricator.wikimedia.org/T130529#2177530 (chasemp) p:Triage>Normal
[13:57:35] Labs, Labs-Infrastructure: Install pdf2djvu for Wikisource DjVu aid - https://phabricator.wikimedia.org/T130138#2177531 (chasemp) p:Triage>Normal
[13:58:16] Labs, Labs-Infrastructure: invisible-unicorn/dynamicproxy-api should refuse to add backends to another project's domain - https://phabricator.wikimedia.org/T129800#2177532 (chasemp) p:Triage>Normal
[13:58:24] Labs, Tool-Labs: Investigate kubelet container garbage collection - https://phabricator.wikimedia.org/T129730#2177533 (chasemp) p:Triage>Normal
[13:58:42] Labs, Labs-Infrastructure, Continuous-Integration-Infrastructure: Cannot SSH to a few CI slaves due to DNS failure - https://phabricator.wikimedia.org/T129640#2177534 (chasemp) p:Triage>Normal
[14:01:18] Labs, Tool-Labs, Community-Tech-Tool-Labs: Collect and display basic metrics for all tools (service groups) - https://phabricator.wikimedia.org/T129630#2177535 (chasemp) p:Triage>Normal
[14:01:49] Labs, Labs-Infrastructure, Operations: labnet1002 can't talk to webproxy.eqiad.wmnet:8080, puppet fails to install designateclient - https://phabricator.wikimedia.org/T129623#2177536 (chasemp) p:Triage>Normal
[14:02:12] Labs: Can't create security rules via OSM - https://phabricator.wikimedia.org/T129438#2177537 (chasemp) p:Triage>Normal @andrew, is it fair to close this as "please do it in horizon"?
[14:02:28] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2177540 (chasemp) p:Triage>Normal @Andrew, is it fair to close this as "please do it in horizon"?
[14:03:01] Labs, Tool-Labs: Setup a supported HTTP Ingress solution for Kubernetes - https://phabricator.wikimedia.org/T129312#2177543 (chasemp) p:Triage>Normal
[14:03:47] Labs: Add DNS entry for promethium.wikitextexp.eqiad.wmflabs - https://phabricator.wikimedia.org/T129181#2177544 (chasemp) p:Triage>Normal
[14:04:25] Labs, Documentation: Update documentation for the various phab Labs tags - https://phabricator.wikimedia.org/T129043#2177545 (chasemp) p:Triage>Normal
[14:04:36] Labs, Labs-Infrastructure: puppet::self broken - https://phabricator.wikimedia.org/T128930#2177546 (chasemp) p:Triage>Normal
[14:05:15] Labs: Clean up ogvjs-testing labs instance - https://phabricator.wikimedia.org/T128901#2177547 (chasemp) p:Triage>Normal
[14:05:41] Labs, Tool-Labs, Software-Licensing: Remove or prune cdnjs on tools-static - https://phabricator.wikimedia.org/T128841#2177552 (chasemp) p:Triage>Normal
[14:06:48] Labs, Operations: Can't create account "Bishoy Camel" (user with a former SVN account not migrated) - https://phabricator.wikimedia.org/T128833#2177556 (chasemp) p:Triage>High
[14:07:16] Labs: role::puppet::self requires a puppetmaster restart during apply - https://phabricator.wikimedia.org/T128740#2177557 (chasemp) p:Triage>Normal
[14:07:30] Labs, Tool-Labs, Monitoring, Operations: Make icinga-wm report Tools homepage check at #wikimedia-labs, too - https://phabricator.wikimedia.org/T128716#2177558 (chasemp) p:Triage>Low
[14:07:37] Labs, Tool-Labs, Monitoring, Operations: Add other Tools administrators to the Icinga notification group - https://phabricator.wikimedia.org/T128715#2177559 (chasemp) p:Triage>Normal
[14:07:50] Labs: role::simplelamp fails to start mysql due to apparmor - https://phabricator.wikimedia.org/T128642#2177560 (chasemp) p:Triage>Low
[14:07:59] Labs, Tool-Labs, Operations: Get rid of Tool Labs home page check from shinken - https://phabricator.wikimedia.org/T128615#2177561 (chasemp) p:Triage>Normal
[14:10:16] Labs, Tool-Labs, Patch-For-Review: Puppet errors on tools-web-static-01 and tools-web-static-02 - https://phabricator.wikimedia.org/T128411#2177564 (chasemp) Open>Resolved seems good now
[14:10:53] Labs, MediaWiki-extensions-OpenStackManager, wikitech.wikimedia.org: Unable to add service group to service groups - https://phabricator.wikimedia.org/T128400#2177566 (chasemp) p:Triage>Low
[14:12:24] Labs, DBA: s52490 created 338 simultanous conections to the same server executing the same query - https://phabricator.wikimedia.org/T128355#2177568 (chasemp) Open>Resolved a:chasemp seems to have been ok for awhile so I'm going to resolve for now
[14:12:51] Labs, Phabricator: Create custom Phab form for requesting a new Labs project - https://phabricator.wikimedia.org/T128300#2177571 (chasemp) p:Triage>Normal
[14:13:05] Labs, wikitech.wikimedia.org: SRF preference messages broken - https://phabricator.wikimedia.org/T128027#2177572 (chasemp) p:Triage>Normal
[14:13:14] Labs, MediaWiki-extensions-OpenStackManager: Lists of users in Labs project pages should be sorted by wiki user name - https://phabricator.wikimedia.org/T128002#2177573 (chasemp) p:Triage>Low
[14:13:44] Labs, Labs-Infrastructure: I/O on labmon1001 is very slow - https://phabricator.wikimedia.org/T127957#2177574 (chasemp) p:Triage>High I noticed this as well, it is basically unusable at times.
[14:14:34] Labs, Tool-Labs, Collaboration-Team-Backlog, Community-Tech-Tool-Labs, and 2 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2177576 (chasemp) p:Triage>Normal
[14:15:05] Labs, Labs-Infrastructure: Make labs wikitech role aware - https://phabricator.wikimedia.org/T127771#2177577 (chasemp) p:Triage>Low
[14:15:15] Labs: Move labs auth.logs to central logging - https://phabricator.wikimedia.org/T127717#2177578 (chasemp) p:Triage>Normal
[14:15:25] Labs: Write some labs tests that monitor login and sudo permissions - https://phabricator.wikimedia.org/T127716#2177580 (chasemp) p:Triage>Normal
[14:16:03] Labs, Labs-Infrastructure, Deployment-Systems, Release-Engineering-Team: integration-make-wmf-branch instance stall on Failed to start LSB: NFS support files common to client and server. - https://phabricator.wikimedia.org/T127705#2177582 (chasemp) p:Triage>Normal @hashar what happened with...
[14:16:42] Labs, Tool-Labs: bug :Portgrabber don't support non ASCII characters - https://phabricator.wikimedia.org/T127689#2177584 (chasemp) p:Triage>Normal
[14:16:50] Labs, Tool-Labs: Installer Bundler for managing Ruby application dependencies - https://phabricator.wikimedia.org/T127685#2177585 (chasemp) p:Triage>Normal
[14:17:18] Labs, Tool-Labs: Labs users should be able to force-delete their own jobs - https://phabricator.wikimedia.org/T127681#2177586 (chasemp) p:Triage>Low
[14:24:39] Labs, Tool-Labs: install Morfeusz (morphological analyser) and Python bindings - https://phabricator.wikimedia.org/T127633#2177588 (chasemp) p:Triage>Normal
[14:25:11] Labs, Labs-Infrastructure: Soft mount NFS for almost (all) projects that still have NFS - https://phabricator.wikimedia.org/T127559#2177589 (chasemp) p:Triage>Normal
[14:25:31] Labs, Tool-Labs: gridengine master dependencies are missing for gridengine_resources - https://phabricator.wikimedia.org/T127388#2177590 (chasemp) p:Triage>Normal
[14:26:09] Labs, Labs-Infrastructure: Avoid indefinite growing of apt caches and and old kernel images - https://phabricator.wikimedia.org/T127374#2177591 (chasemp) p:Triage>Normal
[14:28:57] Labs, Tool-Labs, DBA, Patch-For-Review: Tool Labs queries die - https://phabricator.wikimedia.org/T127266#2177594 (chasemp) p:Triage>Normal
[14:31:45] Labs, Tool-Labs: provide a more strict robots.txt at Tool Labs - https://phabricator.wikimedia.org/T127206#2177599 (chasemp) p:Triage>Normal
[14:32:04] Labs, Phabricator: Git broken on phabricator labs machines - https://phabricator.wikimedia.org/T127139#2177600 (chasemp) p:Triage>Normal
[14:34:48] Labs, Wikimedia-Mailing-lists: Create temporary test mailman mailing list to test synchronization with https://discourse.wmflabs.org/ - https://phabricator.wikimedia.org/T126547#2177607 (chasemp) p:Triage>Normal
[14:35:08] Labs, Patch-For-Review: Instances broken on initial provision with dns setup issues - https://phabricator.wikimedia.org/T126580#2177609 (chasemp) p:Triage>Normal
[14:39:26] Labs, Tool-Labs: tools-grid-master / almost full (929M/18G free) - https://phabricator.wikimedia.org/T126353#2177645 (chasemp) Open>Resolved a:chasemp Thanks Should be survivable now > /dev/vda1 ext4 18G 8.5G 8.4G 51% /
[14:39:36] Labs: salt keys being created and accepted with wrong hostname (no project name in hostname) - https://phabricator.wikimedia.org/T126324#2177649 (chasemp) p:Triage>Normal
[14:39:47] Labs: broken labs instances (ssh or perms), do we care? - https://phabricator.wikimedia.org/T126323#2177653 (chasemp) p:Triage>Low
[14:39:55] Labs, wikitech.wikimedia.org: Add links to Labs help/FAQ on Nova Resource project and instance pages - https://phabricator.wikimedia.org/T126289#2177655 (chasemp) p:Triage>Normal
[14:41:24] Labs: sudo does not work for admin users in 'search' project - https://phabricator.wikimedia.org/T126265#2177659 (chasemp) Open>Resolved a:chasemp no movement or word on this so I'm closing
[14:43:39] Labs, Labs-Infrastructure, Deployment-Systems, Release-Engineering-Team: integration-make-wmf-branch instance stall on Failed to start LSB: NFS support files common to client and server. - https://phabricator.wikimedia.org/T127705#2177668 (hashar) Open>declined We used that instance to cut t...
[14:44:52] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2177675 (chasemp)
[14:45:28] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006444 (chasemp)
[14:45:30] Labs, Tool-Labs: tools-web-static-*: Could not find dependent Package[gridengine-common] - https://phabricator.wikimedia.org/T126171#2177677 (chasemp) Open>Resolved a:chasemp seems no longer true
[14:45:45] Labs, Tool-Labs: puppet failure on a large number of instances - https://phabricator.wikimedia.org/T126165#2006444 (chasemp) Open>Resolved a:chasemp
[15:28:55] Labs, Labs-Infrastructure: I/O on labmon1001 is very slow - https://phabricator.wikimedia.org/T127957#2059611 (scfc) If that causes requests from #Shinken to time out from time to time, T99072 might be related.
[15:44:40] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2177991 (AlexMonk-WMF) Well I'm not @Andrew but given that I tested this in both Horizon and OSM when I created this ticket, that's probably not fair.
[15:45:50] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2177992 (chasemp) Ah didn't see that thanks
[15:52:43] Labs, Tool-Labs, Community-Tech-Tool-Labs, Epic: Tools web interface for tool authors (Brainstorming ticket) - https://phabricator.wikimedia.org/T128158#2065676 (chasemp) @bd808 what would you think about using https://phabricator.wikimedia.org/ponder/ for some of this as it's more stackoverflow que...
[15:52:45] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[15:54:44] Tool-Labs-tools-Other: Migrate http://toolserver.org/~dispenser/* to Tool Labs - https://phabricator.wikimedia.org/T68868#2178008 (Dispenser)
[15:54:46] Labs, Wikimedia-Labs-General, DBA, Operations, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#2178009 (Dispenser)
[15:54:48] Wikimedia-Labs-General: Make user_email_authenticated status visible on labs - https://phabricator.wikimedia.org/T70876#2178006 (Dispenser) declined>Open [[http://dispenser.homenet.org/~dispenser/cgi-bin/useractivity.py|One of my tools]] runs slowly because it has to make API requests to get the emaila...
[16:04:47] Labs, Operations, Patch-For-Review: Labtest designate giving out Forbidden exceptions when trying to list domains - https://phabricator.wikimedia.org/T130979#2178026 (Andrew) Open>Resolved a:Andrew
[16:05:33] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2178028 (Andrew) a:Andrew
[16:13:13] Labs, Labs-Infrastructure: Labs proxy api (aka 'Invisible Unicorn') is a spof - https://phabricator.wikimedia.org/T131308#2178053 (yuvipanda) So this uses sqlalchemy, so pretty easy to switchover to toolsdb. But that would mean we'll be relying on toolsdb for infrastructure which makes me feel a bit iffy.
[16:22:40] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:36:55] Labs, Labs-Infrastructure: Labs proxy api (aka 'Invisible Unicorn') is a spof - https://phabricator.wikimedia.org/T131308#2178104 (AlexMonk-WMF) project-proxy is separate from the tools project, nothing outside of the tools project should be touching tools-db and we certainly shouldn't make dynamicproxy d...
[16:37:09] Labs, Tool-Labs: bug :Portgrabber don't support non ASCII characters - https://phabricator.wikimedia.org/T127689#2050607 (yuvipanda) I think this is much more related to GridEngine setting having different PATHs than your terminal when executing it manually.
[16:42:31] Labs, Labs-Infrastructure: Labs proxy api (aka 'Invisible Unicorn') is a spof - https://phabricator.wikimedia.org/T131308#2178140 (yuvipanda) toolsdb is a misnomer, and several projects use it (and that is a supported use case, IMO). I agree that something that is a dependency of wikitech / horizon should...
[16:43:02] Labs, Labs-Infrastructure: Labs proxy api (aka 'Invisible Unicorn') is a spof - https://phabricator.wikimedia.org/T131308#2178144 (yuvipanda) For history: toolsdb used to be a VM on the tools hosts, and didn't get renamed when we moved it to a labs-wide hardware backed db machine...
[17:05:06] Labs, Tool-Labs, Patch-For-Review, Scap3 (Scap3-Adoption-Phase1): Setup a proper deployment strategy for Kubernetes - https://phabricator.wikimedia.org/T129311#2178276 (yuvipanda) a:yuvipanda
[17:05:38] Labs, Labs-Infrastructure, Tool-Labs: jsub/jstart take 60 s due to /usr/local/bin/log-command-invocation CPU hunger - https://phabricator.wikimedia.org/T131700#2178277 (yuvipanda) a:yuvipanda
[17:20:01] Labs, Labs-Infrastructure: labvirt1002 disk space alert - https://phabricator.wikimedia.org/T131777#2178350 (Andrew)
[17:21:32] Labs, Labs-Infrastructure: labvirt1002 disk space alert - https://phabricator.wikimedia.org/T131777#2178366 (Andrew) p:Triage>High
[17:27:13] Labs, Labs-Infrastructure, Tool-Labs: jsub/jstart take 60 s due to /usr/local/bin/log-command-invocation CPU hunger - https://phabricator.wikimedia.org/T131700#2178403 (yuvipanda) a:yuvipanda>None I'm unable to reproduce this now. I'm happy to make it async (probably!), but I am not fully sure i...
[17:35:44] stashbot having issues or is labs?
[17:38:14] greg-g: nothing general ongoing that I am aware of
[17:40:23] hrrm
[17:41:26] I'm getting 503s with phabricator.
[17:42:13] I was going to link to the sal entry about phab going down for maintenance, but...
[17:42:17] https://tools.wmflabs.org/sal/production
[17:42:25] tom29739: phab is going down for maintenance, should be quick (10 minutes)
[17:42:45] chasemp: something's weird: https://tools.wmflabs.org/sal/production
[17:43:06] SAL isn't working either.
[17:44:40] Tool Labs seems to work though.
[17:45:43] Can someone kick stashbot or restart it?
[17:48:07] "boom!" <- how funny
[17:48:42] I just kicked stashbot
[17:48:49] tools.stashbot@tools-bastion-05:~$ ./stashbot.sh restart
[17:48:50] Restarting stashbot...
[17:48:50] Pushed rescheduling of job 4863845 on host tools-exec-1406.eqiad.wmflabs
[17:49:26] does stashbot rely on a phab connection?
[17:49:52] It might. It accesses phab.
[17:50:19] I would look into it but am about to go afk for dinner
[17:50:44] bon proffit
[17:50:51] oh and it's also an irc bot that I haven't got access to
[17:51:18] it was flapping before phab went down, I believe
[17:51:22] * greg-g cross checks timestamps
[17:51:52] yeah, it was flapping before that
[17:51:55] T123
[17:52:41] Phab is back up now.
[17:53:02] SAL still doesn't work.
[17:53:54] T123
[17:56:31] Labs, Labs-Infrastructure: labvirt1002 disk space alert - https://phabricator.wikimedia.org/T131777#2178498 (Andrew) p:High>Normal This was transitory... labvirt1002 is running at 7% free which isn't great but also not an emergency.
[18:05:36] T123
[18:07:02] * greg-g kills stashbot
[18:07:11] simple restarting not working, filing a task
[18:08:42] T12345
[18:08:47] nah
[18:09:23] stashbot is on github, so: https://github.com/bd808/tools-stashbot/issues/9
[18:10:35] Labs, MediaWiki-extensions-OATHAuth, Wikimedia-Hackathon-2016, wikitech.wikimedia.org, and 2 others: 2FA seems to be broken on wmf.19 - https://phabricator.wikimedia.org/T131445#2178534 (dpatrick) Thanks Reedy!
[18:14:22] Labs, Tool-Labs, Patch-For-Review: Puppet fails on tools-elastic-01, tools-elastic-02 and tools-elastic-03: "Class[Nginx] is already declared" - https://phabricator.wikimedia.org/T131644#2178550 (scfc) The error now moved to: ``` [tim@passepartout ~]$ echo tools-elastic-0{1..3}.tools.eqiad.wmflabs |...
[18:19:07] greg-g: it may never have been 'right' post overnight outage?
[18:20:04] chasemp: good point, but it looked like it was going crazy the hour before the thermal paste reapply today
[18:20:20] Labs, Tool-Labs, Patch-For-Review: Puppet fails on tools-elastic-01, tools-elastic-02 and tools-elastic-03: "Class[Nginx] is already declared" - https://phabricator.wikimedia.org/T131644#2173694 (chasemp) yep thanks, merged during a meeting then lunch but I'm coming back around to it
[18:20:31] that ^^ is based solely on looking at my irc logs join/part messages :)
[18:22:27] greg-g: an hour before? I wonder if phab wasn't flaking for awhile pre-hard-shutdown
[18:22:30] all speculation tho
[18:22:54] chasemp: no no, an hour before chris shutting it down pre-emptively to reapply the thermal paste
[18:23:10] ah interesting
[18:23:22] yeah, 17:18ish according to join/parts in here
[18:24:14] :37 was chris shutting it down
[18:24:18] ok, so 20 minutes :)
[19:11:23] Could someone please kick login.tools.wmflabs.org? Unreachable via ssh
[19:12:25] Labs, Tool-Labs: Upgrade to Kubernetes 1.2 - https://phabricator.wikimedia.org/T130972#2178746 (yuvipanda) The plan is: 1. Prepare all config changes and what not needed to upgrade, stage as puppet patches that haven't been merged yet. 2. Stop the 3 services on the master (apiserver, controller-manager,...
[19:12:49] I'm just going to switch it now.
[19:13:26] magnus_: you are right tools-bastion-03.eqiad.wmflabs is available atm
[19:13:41] I'm using tools-bastion-03
[19:15:19] I finally got in, and there were too many things to swat down; it was nearly unresponsive, so I rebooted
[19:15:19] I'm just switching login.tools.wmflabs.org and tools-login to tools-bastion-05
[19:15:24] -03
[19:15:32] !log tools reboot tools-bastion-05
[19:15:35] yep
[19:15:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[19:15:54] I'm pretty sure someone was doing a lot of gzip on NFS
[19:16:03] which wouldn't be helped necessarily but maybe at least less impactful
[19:16:11] pooptop, we need it
[19:17:17] Thanks all!
[19:18:02] this is going to mess up everyone's alerts
[19:43:22] !log tools new bastion!
[19:43:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[19:46:08] YuviPanda: http://pastebin.com/P55KHuky ... I assume it's something on my end by reading the error message, but jsuk.
[19:46:51] Matthew_: nope, I just switched the topic for that - new bastion, so new key :)
[19:47:03] Ohhhhhh ok I gotcha.
[19:47:11] I think we crossed in the middle.
[19:47:37] hehe :D
[19:47:43] Yay!
[19:48:05] Even though I've been using it for the past week.
[19:49:14] YuviPanda: Thank you :)
[19:49:30] np Matthew_ :)
[19:56:06] Labs, Tool-Labs: Convert most top level tools domains to CNAMEs - https://phabricator.wikimedia.org/T131796#2178815 (yuvipanda)
[19:56:32] Labs, Tool-Labs: Convert most top level tools domains to CNAMEs - https://phabricator.wikimedia.org/T131796#2178827 (chasemp) p:Triage>High
[20:00:35] Labs, Tool-Labs: Convert most top level tools domains to CNAMEs - https://phabricator.wikimedia.org/T131796#2178836 (yuvipanda) Tools domains this applies to: 1. tools.wmflabs.org 2. tools-login.wmflabs.org 3. tools-static.wmflabs.org 4. tools-dev.wmflabs.org 5. tools-trusty.wmflabs.org
[20:01:40] Labs, Tool-Labs: Convert most top level tools domains to CNAMEs - https://phabricator.wikimedia.org/T131796#2178815 (Andrew) Using cnames sounds like a great solution. I'm making two changes to the bug title - Adding bastions, as that's the same problem - changing 'domain' to 'record' because I'm trying...
[20:01:59] Labs, Tool-Labs: Convert most top level tool and bastion dns redcords to CNAMEs - https://phabricator.wikimedia.org/T131796#2178841 (Andrew)
[20:02:45] YuviPanda: a new bastion-05, -02, or is -03 now the default bastion?
[20:03:15] 03
[20:03:30] The xlarge one.
[20:03:49] yes
[20:12:38] but when I login to tools-login.wmflabs I get to bastion-05
[20:13:57] it's possible there is stale DNS Luke081515
[20:14:09] tools-login.wmflabs.org hits 03 for me
[20:14:58] seems like not a lot of users are using -03, the CPU usage isn't really bigger than before
[20:15:28] login.tools.wmflabs.org puts me on -03.
[20:16:44] login.tools.wmflabs or tools-login.wmflabs?
[20:17:03] they should go to the same place :)
[20:18:08] I get: login.tools => 03 tools-login = 05
[20:18:11] *=>
[20:18:13] strange
[20:20:59] !log services migrating instance appservice to labvirt1009
[20:21:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Services/SAL, dummy
[20:21:25] Luke081515: am pretty sure it's stale DNS. that was the case for me until a few minutes ago
[20:22:46] ok
[20:37:05] Labs, Tool-Labs: Fix URL encoding of link to user's profile on 'No webservice' warning page - https://phabricator.wikimedia.org/T131799#2178903 (PeterBowman)
[20:52:32] Labs, Labs-Infrastructure: labvirt1002 disk space alert - https://phabricator.wikimedia.org/T131777#2179011 (Andrew) Open>Resolved
[20:57:29] YuviPanda: Ok, I'm now at -03 too :D
[21:08:33] !log rcm moving the data from rcm-2 to iron to prepare killing of rcm-2
[21:08:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Rcm/SAL, Master
[21:14:33] Labs, Tool-Labs: Fix URL encoding of link to user's profile on 'No webservice' warning page - https://phabricator.wikimedia.org/T131799#2179148 (scfc) a:scfc
[21:25:56] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2179176 (Andrew) 2016-04-04 21:17:14.813 11572 INFO nova.compute.api [req-ca935825-a32f-414d-ba40-118d8b28dde6 mortalandrew andrewtestproject - - -] Delete security group sacrificial 2016-04-04 21:17:14.817 1157...
[21:37:15] YuviPanda: or anyone else: did the host key change on tools-login?
[21:41:26] Labs: Can't delete security groups (in horizon or OSM) - https://phabricator.wikimedia.org/T129437#2179198 (Andrew) I've filed an upstream bug for this: https://bugs.launchpad.net/nova/+bug/1566025
[21:43:03] (PS1) Tim Landscheidt: Fix links for maintainers [labs/toollabs] - https://gerrit.wikimedia.org/r/281537 (https://phabricator.wikimedia.org/T131799)
[21:43:10] e.g. is this man-in-the-middle attack warning I'm getting legit?
[21:46:45] MusikAnimal, about a new ssh key?
[21:46:55] yes
[21:47:15] tools-bastion-03 is now on login.tools.wmflabs.org
[21:47:35] that might have been it, when did that happen?
[21:47:46] Hour ago.
[21:47:52] Maybe 2 hours ago.
[21:47:59] ok cool. False alarm. Thank you
[22:02:37] PROBLEM - Host tools-bastion-11 is DOWN: CRITICAL - Host Unreachable (10.68.16.124)
[22:11:38] Labs: Change 'deleted' column datatype in 'security_groups' table in 'nova' database - https://phabricator.wikimedia.org/T131814#2179297 (Andrew)
[22:18:22] Labs: Change 'deleted' column datatype in 'security_groups' table in 'nova' database - https://phabricator.wikimedia.org/T131814#2179317 (scfc) This is the exact same error as T112492 where @jcrespo already changed the database. Has the database been rolled back?
[22:25:07] Labs: Change 'deleted' column datatype in 'security_groups' table in 'nova' database - https://phabricator.wikimedia.org/T131814#2179374 (Andrew) You are correct! I wonder if maybe the nova migration script actively rolled things back when I went from juno->kilo? I'll investigate.
[22:31:11] MusikAnimal: If you'd like, #wikimedia-xtools is a thing.
[22:31:23] yeah that's been around for a while right?
[22:31:43] Yeah, I guess so? Pretty dead.
[22:32:35] I mention it in case we don't want to tie up -labs :)
[22:32:40] yeah that's why I left
[22:33:00] Fair enough.
[22:36:51] MusikAnimal: how goes xtools migration to its own project?
[22:37:13] not sure if that's actually happening, but I guess Matthew_ would be the one to ask
[22:37:50] chasemp: Erm... nothing much is happening yet. I am beginning a rewrite but haven't gotten very far yet (with my free time it will take a while)
[22:37:52] everything mostly works I think, and we're restarting it about 5 times a day now, so the downtime is fairly low
[22:38:20] ok thanks guys
[22:39:08] the restarts are required of course because of the memory leaks due to the poorly written code using an ancient version of PHP
[22:39:19] so a rewrite would certainly be favourable
[22:39:25] just a lot of work
[22:39:34] for a tool that's mostly working
[22:39:54] But pretty impossible to fix if it breaks... at least to begin with.
[22:40:04] this is true ha
[23:20:57] Labs, Tool-Labs, Operations, Icinga: tool labs instance distribution monitoring is broken - https://phabricator.wikimedia.org/T119929#1840320 (faidon) Ping? If this can't be fixed anytime soon, can we remove the check from the servers on puppet at least? (I've been auditing acknowledged-but-forgotte...