[00:54:41] <Betacommand>	 anyone have a second to help verify that I am running into a bug and I am just not being stupid?
[02:01:07] <andrewbogott>	 YuviPanda: any idea why morebots are dead?
[02:01:35] <YuviPanda>	 andrewbogott: no idea. It’s just somewhat unloved of late, I suppose
[02:01:39] <YuviPanda>	 I can go in and bring it back up
[02:01:44] <andrewbogott>	 I’m already doing so
[02:01:44] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[02:01:50] <andrewbogott>	 was wondering if there was a known cause
[02:01:52] <YuviPanda>	 andrewbogott: ah just saw, cool
[02:01:53] <YuviPanda>	 andrewbogott: nope
[02:02:24] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-cyberbot is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]  
[02:04:22] <shinken-wm>	 PROBLEM - Puppet failure on tools-redis is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[02:04:26] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-08 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[02:04:28] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-24 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[02:04:44] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-10 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[02:04:48] <shinken-wm>	 PROBLEM - Puppet failure on tools-services-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[02:05:04] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-05 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]  
[02:05:20] <shinken-wm>	 PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]  
[02:05:22] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-20 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[02:06:48] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 3122 bytes in 0.024 second response time  
[02:07:26] <YuviPanda>	 andrewbogott: ^
[02:07:27] <YuviPanda>	 uh oh
[02:07:34] * YuviPanda checks
[02:07:36] <YuviPanda>	 oh
[02:07:40] <andrewbogott>	 YuviPanda: yes, that is why I am here
[02:07:47] <YuviPanda>	 I didn’t notice the toollabs downtime
[02:07:48] <YuviPanda>	 sorry
[02:07:50] * YuviPanda checks
[02:08:40] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-23 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]  
[02:08:52] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-09 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[02:09:12] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-03 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[02:09:30] <shinken-wm>	 PROBLEM - Puppet failure on tools-redis-slave is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[02:09:32] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-22 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[02:09:41] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [0.0]  
[02:10:12] <shinken-wm>	 PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[02:10:26] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]  
[02:10:38] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-06 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]  
[02:29:44] <shinken-wm>	 RECOVERY - Puppet failure on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:30:17] <shinken-wm>	 RECOVERY - Puppet failure on tools-webproxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:31:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-07 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:32:25] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-cyberbot is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:33:53] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-09 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:13] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-03 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:24] <shinken-wm>	 RECOVERY - Puppet failure on tools-redis is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:29] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-08 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:29] <shinken-wm>	 RECOVERY - Puppet failure on tools-redis-slave is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:29] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-24 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:33] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-22 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:34:46] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-10 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:35:10] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-05 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:35:14] <shinken-wm>	 RECOVERY - Puppet failure on tools-webproxy-01 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:35:24] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-20 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:35:27] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:36:34] <andrewbogott>	 YuviPanda: the issue has passed but I think the original cause is still lurking.  I’ll forward you the email with our current theory.
[02:36:41] <YuviPanda>	 alright!
[02:38:39] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-23 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:39:12] <andrewbogott>	 hm, actually, I’m going to stop keystone again to try one last thing...
[02:39:35] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-06 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:39:57] <andrewbogott>	 nope, no dice
[02:40:37] <shinken-wm>	 RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:40:37] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-06 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:40:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-11 is OK: OK: Less than 1.00% above the threshold [0.0]  
[02:46:48] <shinken-wm>	 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 753925 bytes in 2.133 second response time  
[02:51:01] <wikibugs>	 6Labs, 10Tool-Labs, 3Labs-Q4-Sprint-2, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1211330 (10yuvipanda)
[02:51:03] <wikibugs>	 6Labs, 10Tool-Labs: Planned labs maintenance on tools-db: Puppetization + log file change - https://phabricator.wikimedia.org/T94643#1211327 (10yuvipanda) 5Open>3Resolved a:3yuvipanda All done now \o/
[02:51:39] <wikibugs>	 6Labs, 10Tool-Labs, 3Labs-Q4-Sprint-2, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Puppetize & fix tools-db - https://phabricator.wikimedia.org/T88234#1211331 (10yuvipanda) 5Open>3Resolved Woo, puppetized
[03:06:36] <wikibugs>	 6Labs, 10Tool-Labs, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Retire 'tomcat' node, make Java apps run on the generic webgrid - https://phabricator.wikimedia.org/T91066#1211345 (10yuvipanda) All good. Thanks @scfc
[04:25:09] <MusikAnimal>	 hello, I have a Ruby script on my tool account that requires a few gems. Evidently I'm unable to install any
[04:25:38] <MusikAnimal>	 also I'd prefer Ruby 2.0+ if there's any way to get that installed as well. How should I go about requesting this?
[04:26:51] <pandacat>	 MusikAnimal: hey! File a bug and we will figure out what to do
[04:26:59] <pandacat>	 I suspect you will have to use bundler or some such
[04:27:09] <MusikAnimal>	 yep I was hoping I could use bundler
[04:27:37] <pandacat>	 And rbenv perhaps?
[04:27:45] <pandacat>	 MusikAnimal: I wonder if you can just use rbenv Noe 
[04:27:47] <MusikAnimal>	 that's much better than rvm, yes
[04:27:47] <pandacat>	 Now
[04:27:54] <MusikAnimal>	 not installed
[04:27:55] <MusikAnimal>	 :(
[04:28:30] <pandacat>	 You can install it yourself fairly easily iirc 
[04:28:33] <pandacat>	 No root needed 
[04:29:15] <MusikAnimal>	 surely I'll need root to install another Ruby version?
[04:29:39] <pandacat>	 Nope
[04:29:51] <pandacat>	 If you only need it for yourself I doubt it
[04:30:49] <MusikAnimal>	 alright I'll give it a try then! thanks
[04:32:00] <MusikAnimal>	 another question, what about authentication? I guess my bot password has to live on here somewhere
[04:33:09] <pandacat>	 Yeah just make set permissions accordingly 
[04:33:13] <pandacat>	 And you will be ok 
[04:33:23] <pandacat>	 chmod o= filename
[05:35:51] <wikibugs>	 10Tool-Labs, 3Labs-Q4-Sprint-3, 5Patch-For-Review, 3ToolLabs-Goals-Q4: Make proxylistener not need to keep open socket connections open - https://phabricator.wikimedia.org/T96059#1211569 (10yuvipanda) 5Open>3Resolved a:3yuvipanda
[05:59:16] <sDrewth>	 has something changed on wikitech. it is telling me that i have cookies turned off
[06:05:34] <sDrewth>	 not getting that response from normal wikis
[06:32:02] <sDrewth>	 coren yuvi ^
[06:35:22] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-07 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]  
[06:39:40] <shinken-wm>	 PROBLEM - Puppet failure on tools-login is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]  
[07:04:41] <shinken-wm>	 RECOVERY - Puppet failure on tools-login is OK: OK: Less than 1.00% above the threshold [0.0]  
[07:05:24] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-07 is OK: OK: Less than 1.00% above the threshold [0.0]  
[07:07:38] <Amir1>	 hey, all of my tasks in grid (in my service group) were killed 
[07:07:51] <Amir1>	 one of them take 72 hours to finish
[07:07:58] <Amir1>	 what did I do wrong?
[08:15:07] <wikibugs>	 6Labs, 10Continuous-Integration: integration labs project DNS resolver improperly switched to openstack-designate - https://phabricator.wikimedia.org/T95273#1185067 (10hashar)
[08:37:23] <moritzm>	 logins to wikitech are failing for me (it complains about not having cookies enabled, but the message seems misleading), anyone having the same problem? is that fallout from yesterday's LDAP problems?
[09:38:24] <wikibugs>	 6Labs, 10Tool-Labs, 10Wikimedia-Hackathon-2015: Organize Wikimedia Labs activities at the Wikimedia Hackathon 2015 - https://phabricator.wikimedia.org/T92274#1211783 (10Qgil) @yuvipanda, any progress here? The event is around the corner.
[10:10:35] <wikibugs>	 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211877 (10Magnus) 3NEW
[10:25:59] <afeder>	 moritzm: i get the same (incorrect) error
[10:26:59] <moritzm>	 afeder: yep, godog also confirmed it, also https://phabricator.wikimedia.org/T96240 was opened
[10:27:13] <afeder>	 great
[10:43:12] <sDrewthedoff>	 any news on wikitech and its refusal to log in?
[10:44:42] <wikibugs>	 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211906 (10scfc) Me, too.
[11:04:10] <a930913>	 Almost up to job # 10,000,000 on the grid :o
[11:06:46] <a930913>	 Coren: If a process spawns a new process on the grid and then kills itself, will the new process be killed?
[11:12:50] <wikibugs>	 10Wikimedia-Labs-wikitech-interface: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211922 (10Krenair) In my experience this is usually keystone on virt1001 (or was it 1000? Can't check right now) being down, which requires ops to restart. Other services implicated in the pas...
[11:13:21] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211923 (10Krenair)
[11:14:50] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211926 (10Joe) We already restarted keystone earlier to no effect  I'm looking around to find any specific error.
[11:15:16] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211928 (10Joe) p:5Triage>3Unbreak! a:3Joe
[11:22:32] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211938 (10Krenair) In case you don't already know about it there are troubleshooting instructions at https://wikitech.wikimedia.org/wiki/Wikitech I usually hide the logging behi...
[11:27:32] <a930913>	 Damianz: The redis feed looks borked.
[11:33:47] <hashar>	 sDrewth: there is a recurring issue on labs that cause auth to fail :/
[11:33:53] <hashar>	 a process known as keystone 
[11:34:21] <sDrewth>	 okay, _joe_ is poking at it, I went and asked in wmf-operations
[11:34:54] <sDrewth>	 ah, you are there
[11:38:10] <wikibugs>	 10Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1211942 (10scfc) ``` From: root@tools.wmflabs.org (Cron Daemon) Subject: Cron <root@tools-exec-15> test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily ) To: root@tools.wmflabs.org Date:...
[11:38:43] <wikibugs>	 10Tool-Labs: Unattended upgrades are failing from time to time - https://phabricator.wikimedia.org/T92491#1211943 (10scfc)
[12:00:23] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211951 (10Krenair) a:5Joe>3Krenair ```krenair@silver:/srv/mediawiki/wmf-config$ sudo /usr/sbin/apache2ctl restart krenair@silver:/srv/mediawiki/wmf-config$ ```  Seems to wor...
[12:00:32] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211953 (10Krenair) 5Open>3Resolved
[12:01:05] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211954 (10Joe) I checked/restarted keystone, I checked all the error logs I could think of (both locally and on fluorine), restarted the local failing nutcracker service on silv...
[12:11:04] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211962 (10Joe) Maybe this is obvious enough, but is everyone having problems logging in using the 2 factor auth?
[12:12:26] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211963 (10Krenair) Nope... Works for me
[12:18:02] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211967 (10Joe) So, it seems memcached was the problem. Restarting nutcracker solved the issue afterall.
[12:18:50] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1211969 (10MoritzMuehlenhoff) No, I don't use 2FA, in my case Krenair's Apache restart fixed it for me.
[13:11:12] <Coren>	 a930913: It should, though I've honestly not tried it.  It's a bug if it doesn't.
[13:13:50] * Coren waves.
[13:16:52] <a930913>	 Coren: Ah, so that's why my failover always fails on the grid :D
[13:22:44] <Coren>	 a930913: Having orphaned child processes on nodes kinda defeats the purpose of a grid allocator.  :-)
[13:44:42] <Coren>	 !log tools -exec-14 drained, rebooting.
[13:44:45] <labs-morebots>	 Logged the message, Master
[13:46:53] <Coren>	 !log tools -exec-14 repooled.  That's it for general exec nodes.
[13:46:55] <labs-morebots>	 Logged the message, Master
[13:48:54] <Coren>	 Oh, wait - I forgot -15.  :-(
[14:19:30] <Coren>	 !log tools -exec-15 drained, rebooting.
[14:19:33] <labs-morebots>	 Logged the message, Master
[14:20:26] <Coren>	 Ah, how boring.  Job id rolls around at 10000000.  :-)
[14:22:07] <Coren>	 !log tools -exec-15 repooled
[14:22:09] <labs-morebots>	 Logged the message, Master
[14:22:14] <Coren>	 !log rebooting -shadow
[14:22:14] <labs-morebots>	 rebooting is not a valid project.
[14:22:21] <Coren>	 !log tools rebooting -shadow
[14:22:23] <labs-morebots>	 Logged the message, Master
[14:26:35] <shinken-wm>	 PROBLEM - SSH on tools-shadow is CRITICAL: Connection refused  
[14:27:56] <wikibugs>	 10Wikimedia-Labs-Infrastructure, 5Patch-For-Review: VM images duplicate /etc/apt/preferences.d/wikimedia{,.pref} - https://phabricator.wikimedia.org/T60681#1212155 (10fgiunchedi) this has been merged, we can also improve apt::pin to check whether the name will get ignored by apt or not
[14:29:50] <Coren>	 !log tools rebooting -mail
[14:29:53] <labs-morebots>	 Logged the message, Master
[14:31:36] <shinken-wm>	 RECOVERY - SSH on tools-shadow is OK: SSH OK - OpenSSH_5.9p1 Debian-5ubuntu1.4 (protocol 2.0)  
[14:32:12] <wikibugs>	 10Wikimedia-Labs-wikitech-interface, 6operations: Can not log into wikitech.wikimedia.org - https://phabricator.wikimedia.org/T96240#1212172 (10Andrew) The keystone database is jam packed with expired tokens -- many of these issues seem to be resulting from resulting keystone slowdowns.  I'm working on cleanin...
[14:37:31] <Coren>	 !log tools rebooting -master
[14:37:34] <labs-morebots>	 Logged the message, Master
[14:38:02] <Coren>	 Note: new job submissions may fail for the next 60s or so.
[14:39:00] <Coren>	 Or not.  That was quick.  :-)
[14:41:25] <wikibugs>	 6Labs, 3Labs-Q4-Sprint-2, 3Labs-Q4-Sprint-3, 3ToolLabs-Goals-Q4: Do a rolling restart of Tool Labs precise instances - https://phabricator.wikimedia.org/T95557#1212208 (10coren) This is finishing today; infrastructure hosts have either been rebooted (most) or scheduled to be rebooted (-dev, at 20h UTC)  Le...
[14:54:10] <Krinkle>	 Coren: I'm looking for a way to puppetise tmpfs being mounted at /var/lib/mysql for contint slaves. Kind of in a chicken-egg situation. Could you give some pointers maybe? Need a way to declare the dir, mount tmpfs, and then ensure it's there before mysql server starts.
[14:57:38] <wikibugs>	 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1212221 (10Ricordisamoa) When will I be able to delete "check all my Tool Labs webservices" from my daily to-do list?
[15:04:02] <Coren>	 Krinkle: I do something similar to this with the labs_lvs classes.  The trick is to create a create a mount {} resource that has 'noauto' in the options=>, have ensure => mounted for it, then have the file {} for the directory require the Mount[]
[15:04:39] <Coren>	 Krinkle: Then have the mysql server require the File[]
[15:04:45] <shinken-wm>	 PROBLEM - Puppet failure on tools-jessie-ldapconfig-test is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0]  
[15:05:24] <Krinkle>	 Coren: Interesting. But the dir has to exist before it can mount? Ah, I guess that's only the case the first time. The second time the dir won't be there and shouldn't be created either.
[15:05:25] <Krinkle>	 Interesting
[15:05:33] <Krinkle>	 Coren: OK. I think I've got that.
[15:06:16] <Krinkle>	 Coren: I'm not sure where to change the server though. We just include a package{ mysql-server } at the moment. Is the service being started elsewhere or just the package itself? 
[15:06:36] <Krinkle>	 Ideally I can just delete the directory, stop the server, then reboot the instance and have it running.
[15:06:42] <Coren>	 Krinkle: If you're not sure about the dir existing, you can't have a file {} for it (because you'd have two) but you can make an exec {} of a mkdir -p, give it a creates=> stanza, and have the mount() resource require it
[15:06:56] <Krinkle>	 cool
[15:07:18] <wikibugs>	 6Labs: Zillion expired tokens in keystone database - https://phabricator.wikimedia.org/T96256#1212276 (10Andrew) 3NEW a:3Andrew
[15:07:20] <Coren>	 Krinkle: I can make a changeset for it if you know where you need it.
[15:09:25] <Krinkle>	 Coren: Oh wow that'd be awesome :) We need it for role::ci::slave::labs. The mysql-server package is in contint::packages::labs -> contint::packages.
[15:09:37] <Coren>	 Krinkle: Gimme a minute or two.
[15:10:34] <Coren>	 in role::ci::slave::labs or role::ci::slave::labs::common?
[15:11:52] <Krinkle>	 Coren: role::ci::slave::labs::common is probably better indeed.
[15:12:34] <Krinkle>	 Though contint::packages is currently included from role::ci::slave::labs. So the other slave roles don't depend on it.
[15:13:31] <Coren>	 Well, I'll make the patch for ::common but it's easy enough to move around at need.
[15:14:29] <Krinkle>	 Yeah, looking further I think we'll want it in role::ci::slave::labs, since ::common is also used for servers that serve as ci-slave but not "integration-slave", e.g. beta-cluster, parsoid-deploy etc.
[15:19:34] <Coren>	 Krinkle: I'm not seeing an explicit definition of a 'mysql-server' service, are you just relying on the default start-on-boot behaviour?
[15:20:00] <Krinkle>	 Coren: Yeah.
[15:20:11] <Coren>	 So need to add that too.
[15:20:34] <Krinkle>	 Coren: I was looking into how production does this, but didn't get very far. I'm seeing mysql_wmf::datadir, but it's not referenced anywhere.
[15:21:04] <Coren>	 Krinkle: hiera, I think.  But also, prod does a lot of highly complicated things you certainly don't need to do.
[15:21:17] <Krinkle>	 :)
[15:34:15] <wikibugs>	 6Labs: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1212411 (10MusikAnimal) 3NEW
[15:48:19] <shinken-wm>	 PROBLEM - Puppet failure on tools-webproxy-02 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [0.0]  
[15:54:02] <wikibugs>	 6Labs: Jessie Labs instances do not seem to set $ldap::role::config::labs::ldapconfig - https://phabricator.wikimedia.org/T96266#1212488 (10scfc) 3NEW a:3scfc
[15:57:08] <wikibugs>	 6Labs: Jessie Labs instances do not seem to set $ldap::role::config::labs::ldapconfig - https://phabricator.wikimedia.org/T96266#1212532 (10scfc) ``` Apr 16 15:55:55 toolsbeta-puppetmaster3 puppet-master[1157]: (Scope(Class[Toollabs::Bastion])) Could not look up qualified variable 'ldap::role::config::labs::ldap...
[16:01:12] <Krenair>	 <Krenair> latest timestamp in the revision table for metawiki_p is 20150416022943
[16:01:19] <Krenair>	 are there known issues with replication Coren?
[16:05:51] <Krinkle>	 Coren: I'm trying it out manually on integration-slave-trusty-1012 now.
[16:14:08] <shinken-wm>	 RECOVERY - Puppet failure on tools-webproxy-02 is OK: OK: Less than 1.00% above the threshold [0.0]  
[16:15:31] <wikibugs>	 10Tool-Labs: toollabs::bastion uses $ldap::role::config::labs::ldapconfig without including ldap::role::config::labs - https://phabricator.wikimedia.org/T96266#1212623 (10scfc)
[16:23:43] <Coren>	 Krenair: None known.
[16:23:58] <Coren>	 Krinkle: Tell me how it goes.  :-)
[16:26:07] <Krinkle>	 Coren: Arr. temporairly side tracked, but was figuring out the mount syntax for the command line. Basically doing the steps one by one first.
[16:26:14] <Krinkle>	 On that one server
[16:31:16] <Krinkle>	 Coren: Hm.. I'm curious why there's a file {} after mount{}. What would stop us from requiring mount[] further down?
[16:31:25] <shinken-wm>	 PROBLEM - Host tools-jessie-ldapconfig-test is DOWN: CRITICAL - Host Unreachable (10.68.17.231)  
[16:31:32] <Krinkle>	 Ah, the permissions!
[16:32:35] <Coren>	 Krinkle: That, and that it's more reliable to have the exec depend on the file which we know will be checked even if the filesystem was mounted at boot.
[16:33:57] <Krinkle>	 I think it worked.
[16:34:33] <Krinkle>	 I tried to stop the server, but it came right back due to upstart so just deleted var/lib/mysql as-is, re-created it empty, mounted tmpfs and ran mysql_install_db. 
[16:34:45] <Krinkle>	 Then opening mysql client it shows information_schema there and connecting fine
[16:35:22] <Krinkle>	 Will run mediawiki installer to verify and then delete it again and let puppet do it.
[16:35:31] <Coren>	 Sounds reasonable.
[16:38:48] <Coren>	 YuviPanda: Around?
[16:38:49] <Krinkle>	 Coren: Hm. ERROR 1006 (HY000) at line 2: Can't create database 'jenkins_u0_mw' (errno: 2)
[16:38:51] <Krinkle>	 I'll debug and get back.
[16:39:06] <Coren>	 Krinkle: Possible issue: permissions on the created directory, file ownership.
[16:40:36] <Coren>	 Also, you blasted the directory of mysql but did you start it since?
[16:40:37] <Coren>	 (Otherwise the daemon might be hanging on to old fds)
[16:42:28] <Krinkle>	 Coren: Yeah, I think that's it. Looking in /var/lib/mysql it has mysql/ and performance_schema/ but not ibdata1 and ib_logfile0.
[16:42:29] <Krinkle>	 stopping it and letting it start again fixed it
[16:42:29] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: CRITICAL - Socket timeout after 10 seconds  
[16:42:29] <Coren>	 That's why puppet demands that upstart keeps it on manual.
[16:42:30] <Coren>	 shinken, what you smokin'?  It's fine.
[16:53:37] <Krenair>	 Coren, how up to date should we generally expect the labs replicas to be>?
[16:54:01] <Coren>	 Krenair: They generally keep a few seconds behind prod at most.
[16:54:08] <Krenair>	 right... I think something's broken then
[16:54:27] <Coren>	 Hm.
[16:54:32] <Krenair>	 20150416165301 is max(log_timestamp) for metawiki.logging in prod
[16:54:42] <Krenair>	 20150416032058 in the labs replica
[16:54:58] <Coren>	 Oy, that's quite a few hours.
[16:55:12] <Coren>	 I'm no DB expert, but ima take a look.
[16:57:30] <Krinkle>	 Coren: So it's running fine, though database creation didn't go noticably faster or slower. Still varying between 4 to 10 seconds both on this instance and other instances. Ran it a couple dozen times in a loop.
[16:57:35] <Krinkle>	 Coren: Test execution (read/write queries) do seem to go a fair bit faster.
[16:57:42] <Krinkle>	 So I'll forge ahead and roll it out
[16:57:57] <Coren>	 database creation, I think, is mostly not I/O bound.
[16:58:10] <Krinkle>	 Coren: According to Tim it was.
[16:58:33] <Krinkle>	 https://phabricator.wikimedia.org/T96229
[16:58:36] <Krinkle>	 It's quite odd.
[16:58:46] <Krinkle>	 Something in labs is causing it to stalls sometimes for a minute or more.
[16:59:04] <Krinkle>	 I can't reliably reproduce that though, so I dont know if this fixes that.
[16:59:32] <Coren>	 Well, it can't hurt; and it makes sense that a test database would be ephemeral anyways.
[16:59:32] <Krinkle>	 Tim determined from dumps and debug traces that at https://phabricator.wikimedia.org/T96230 that the install script for mediawiki seems I/O limited.
[16:59:41] <Krinkle>	 yeah :)
[17:00:02] <Coren>	 Krinkle: Might not be i/o in the actual datadir though that is the limit.
[17:01:18] <Krinkle>	 Coren: Yeah, maybe mysql tmpdir is relevant ?
[17:01:37] <Krinkle>	 It has its own tmpdir as well. Not sure what it uses that for
[17:01:47] <Krinkle>	 Would be an interesting next step to try and tmpfsify
[17:01:52] <Coren>	 It probably is; I don't know how much that is sollicited during creation though.
[17:24:15] <wikibugs>	 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1212906 (10scfc) Do not worry.  Tools is a very liberal environment, so as you have been able to ignore `bigbrother` since July 2014 when @coren set it up, the use of @yuvipanda's service manife...
[17:29:27] <Krinkle>	 Coren: 
[17:29:28] <Krinkle>	 Apr 16 17:28:18 integration-slave-trusty-1012 puppet-agent[11419]: Could not retrieve catalog from remote server: Error 400 on SERVER: Invalid parameter requires at /etc/puppet/manifests/role/ci.pp:518 on node i-00000a8e.eqiad.wmflabs 
[17:29:37] <Krinkle>	 I guess that should be require?
[17:29:47] <Coren>	 Krinkle: Err, yes it should.
[17:29:52] <Coren>	 puppet lint failz.
[17:29:59] <Krinkle>	 yeah
[17:30:04] <Krinkle>	 should've caught htat
[17:32:53] <Krinkle>	 Apr 16 17:32:42 integration-slave-trusty-1011 puppet-agent[16550]: (/Stage[main]/Role::Ci::Slave::Labs/Exec[create-mysql-datadir]/returns) executed successfully
[17:32:54] <Krinkle>	 Apr 16 17:32:42 integration-slave-trusty-1011 puppet-agent[16550]: (/Stage[main]/Role::Ci::Slave::Labs/Service[mysql-server]) Could not evaluate: Could not find init script or upstart conf file for 'mysql-server'
[17:33:10] <Krinkle>	 Coren: Hm.. looks like you were right to question that 
[17:33:30] <Coren>	 Hm.
[17:33:52] <Coren>	 Might just be the wrong name.  What's the name of the upstart script for that version of the mysql-server package in /etc/init/?
[17:33:59] <Krinkle>	 There exists /etc/init.d/mysql 
[17:34:20] <Coren>	 That's probably just it then.  Chaning to service { 'mysql': should do the trick.
[17:34:21] <Krinkle>	 Let's try that
[17:37:32] <Krinkle>	 Coren: Hm.. :/
[17:37:34] <Krinkle>	 Apr 16 17:37:04 integration-slave-trusty-1012 puppet-agent[14027]: Could not set 'manual' on enable: undefined method `manual_start' for Service[mysql](provider=upstart):Puppet::Type::Service::ProviderUpstart at 518:/etc/puppet/manifests/role/ci.pp
[17:37:34] <Krinkle>	 Apr 16 17:37:04 integration-slave-trusty-1012 puppet-agent[14027]: Could not set 'manual' on enable: undefined method `manual_start' for Service[mysql](provider=upstart):Puppet::Type::Service::ProviderUpstart at 518:/etc/puppet/manifests/role/ci.pp
[17:37:35] <Krinkle>	 Apr 16 17:37:04 integration-slave-trusty-1012 puppet-agent[14027]: Wrapped exception:
[17:37:36] <Krinkle>	 Apr 16 17:37:04 integration-slave-trusty-1012 puppet-agent[14027]: undefined method `manual_start' for Service[mysql](provider=upstart):Puppet::Type::Service::ProviderUpstart
[17:37:38] <Krinkle>	 Apr 16 17:37:04 integration-slave-trusty-1012 puppet-agent[14027]: (/Stage[main]/Role::Ci::Slave::Labs/Service[mysql]/enable) change from true to manual failed: Could not set 'manual' on enable: undefined method `manual_start' for Service[mysql](provider=upstart):Puppet::Type::Service::ProviderUpstart at 518:/etc/puppet/manifests/role/ci.pp
[17:37:53] <Coren>	 Hm.
[17:38:14] <Coren>	 The provider doesn't support enable => manual despite what the docs claimed.
[17:38:15] <Coren>	 Sad.
[17:38:20] * Coren ponders.
[17:38:54] <Coren>	 I can't think of a way to work around that beyond puppetizing the upstart script itself.
[17:39:09] <Coren>	 (Because we clearly don't want the mysqld to start at boot)
[17:39:11] <Krinkle>	 Coren: Woudl some kind of notify/restart work?
[17:39:21] <Krinkle>	 Oh, right.
[17:39:30] <Coren>	 Krinkle: It might, but it's almost certainly better to not have it start at all until we're ready.
[17:39:35] <Krinkle>	 Yeah
[17:39:47] <Coren>	 Especially if we want to try doing tmpdir as well.
[17:40:30] <Krinkle>	 Coren: I'm curious in the current manifest as written in that patch, whats stopping it from installing it too early etc.? Or will puppet got through and find service{} before it continues?
[17:41:40] <Krinkle>	 Coren: I'll just init.d/mysql stop via dsh and let it start back up
[17:41:44] <Krinkle>	 so that our slaves are back up
[17:41:45] <wikibugs>	 10Wikimedia-Labs-Infrastructure, 5Patch-For-Review: VM images duplicate /etc/apt/preferences.d/wikimedia{,.pref} - https://phabricator.wikimedia.org/T60681#1213001 (10scfc) 5Open>3Resolved a:3scfc The canonical [[https://forge.puppetlabs.com/puppetlabs/apt|puppetlabs/apt]] module solves that question by...
[17:41:49] <Krinkle>	 right now I've taken them all down
[17:42:08] <Krinkle>	 I see /var/lib/mysql/.created exists everywhere now
[17:46:26] <Coren>	 Krinkle: The require stanzas force ordering - if the service isn't started at boot the only thing that can do so is the service {} resource and that one has all the right order.
[17:47:32] <Coren>	 It won't do the mysql_install_db until the package is installed and the tmpfs is mounted, and the service depends on the install having been completed.
[17:51:11] <Krinkle>	 Coren: OK. I've re-pooled a few so that jobs are being processed and changes merged.
[17:51:23] <Krinkle>	 Coren: I'd like to have them rebooted soon as well to make sure everything is all right
[17:51:30] <Krinkle>	 but I guess we still need to figure out the service enabling
[17:54:06] <Krinkle>	 Coren: Maybe we can ensure => stopped and then again ensure => running after the file resource is realised?
[17:54:26] <Krinkle>	 Hm. I guess that would reboot it eveyr pupet run
[17:55:57] <Coren>	 Nope.  We really need to make sure /etc/init/mysql doesn't autostart it.  Easiest way is to add an /etc/init/mysql.override
[17:56:06] * Coren adds it now.
[17:59:16] <Coren>	 Hm.  That'd break with systemd.
[17:59:21] * Coren thinks.
[18:00:36] <Coren>	 Those aren't Jessie, are they?
[18:00:43] <Krinkle>	 Precise and Trusty.
[18:01:01] <Krinkle>	 Although Antoine is working on Jessie (only has slave::labs::common applied and not yet used for anything)
[18:01:05] <Coren>	 Yeah, the .override will do - but will have to be changed when swithing to Jessie.
[18:02:20] <Coren>	 Ah, oops, I undid some of your changes.
[18:02:22] * Coren fixes that
[18:04:02] <Coren>	 Krinkle: The latest changeset should work with any upstart setup.
[18:09:39] <wikibugs>	 6Labs: Zillion expired tokens in keystone database - https://phabricator.wikimedia.org/T96256#1213187 (10Andrew) As of 1pm CDT on April 16th, there are 4147505 rows in the token table.  I'll check back in a day or two and make sure that number is decreasing.  /usr/bin/mysql keystone -e 'SELECT COUNT(*) FROM token;'
[18:10:33] <YuviPanda>	 Coren: hey coren 
[18:10:37] <YuviPanda>	 Just heading to the office 
[18:10:38] <YuviPanda>	 Sup
[18:11:03] <Coren>	 YuviPanda: I wanted to start doing the rolling restarts on the webgrid nodes - wanted to make sure you had nothing baking.
[18:11:22] <YuviPanda>	 Coren: nope - they are all good :)
[18:11:52] <Coren>	 YuviPanda: Exec nodes done, and so are infrastructure except -dev (which has a reboot scheduled for 20h UTC)
[18:11:55] <Krinkle>	 Coren: is .override a special syntax that systemd or upstart adheres?
[18:12:05] <Coren>	 Krinkle: That's an upstartism.
[18:12:08] <Krinkle>	 cool
[18:12:19] <YuviPanda>	 Coren: I'm wondering if we should just being Dev back as trusty. There is already a machine provisioned for it
[18:12:24] <Coren>	 Krinkle: systemd will need an exec{} of systemctl instead.
[18:12:33] <YuviPanda>	 (tools-bastion-02)
[18:13:01] <Coren>	 YuviPanda: Might be worthwhile to point the public IP at it, I suppose, though not without warning as many are using -dev to test their (precise) jobs.
[18:14:00] <YuviPanda>	 Yeah we need a tools-bastion-precise 
[18:14:17] <YuviPanda>	 You are right tho. Needs more warnings and announcements. Let's not do it this time 
[18:14:26] <YuviPanda>	 I'll announce it 
[18:14:48] <YuviPanda>	 Coren: also thoughts on switching default webservice distro to trusty in say 2 weeks?
[18:15:05] <YuviPanda>	 Nothing will change except webservice start will default to trusty and not precise. 
[18:15:17] <YuviPanda>	 We won't migrate any services ourselves either 
[18:15:32] <Coren>	 Two weeks sound okay iff you get the notifications started today.  Much things break if they are moved.
[18:15:33] <YuviPanda>	 But ultimately I want only one precise web node by end of next quarter...
[18:15:48] <YuviPanda>	 Coren: yeah but this is just changing defaults. Not moving anything 
[18:16:00] <YuviPanda>	 I'll start notifications off today 
[18:16:14] <Coren>	 If you change the default, lotsa thing will move implicitly.  I expect most things don't explicitly specify the release.
[18:16:31] <YuviPanda>	 Service Manifests capture it explicitly 
[18:16:47] <YuviPanda>	 And I think we will kill bigbrother for WS before that 
[18:17:04] <YuviPanda>	 So only people doing things by hand (starting and what not) will have it changed 
[18:17:16] <YuviPanda>	 Which I can make webservice command print a notice for 
[18:17:54] <Krinkle>	 Coren: Hm.. getting different kinds of errors after running puppet with the latest patch set
[18:18:08] <Coren>	 Krinkle: That's... vague.  :-)
[18:18:34] <YuviPanda>	 Anyway off to the office!
[18:19:53] <Krinkle>	 Coren: https://gist.github.com/Krinkle/9b845df0c603f8d5d343
[18:24:33] <Krinkle>	 YuviPanda: service manifests are working quite nicely. I see them being auto-created when using webservice2 now. :)
[18:25:36] <YuviPanda>	 Krinkle: :) webservice too - they are the same script now 
[18:26:11] <YuviPanda>	 Except if you call webservice for lighttpd instead of webservice2 it defaults to precise 
[18:26:26] <YuviPanda>	 Krinkle: but yeah idea is you do webservice start and then never worry about it 
[18:26:37] <YuviPanda>	 A webservice stop modifies your service manifest for example 
[18:26:57] <Krinkle>	 so that it won't auto restart?
[18:27:07] <YuviPanda>	 If you explicitly stop it 
[18:27:11] <Krinkle>	 yeah
[18:27:15] <YuviPanda>	 Yup
[18:27:16] <Krinkle>	 makes sense :)
[18:27:20] <YuviPanda>	 Yeah
[18:27:22] <Hprmedina>	 Hi
[18:27:28] <YuviPanda>	 Next is to do this for worker jobs 
[18:27:32] <YuviPanda>	 And then cron 
[18:28:07] <Coren>	 Krinkle: I'm not understanding those logs - I see the /Stage[main]/Role::Ci::Slave::Labs/Service[mysql]/ensure) ensure changed 'stopped' to 'running' but not the things that should have happened before that.
[18:28:28] <Coren>	 Krinkle: Was that on a fresh boot?
[18:28:36] <Krinkle>	 Coren: Nope, not rebooted yet
[18:28:46] <Krinkle>	 last reboot was last week
[18:29:04] <Hprmedina>	 I created an account, I've completed the process well?
[18:29:14] <Hprmedina>	 mu account is hprmedina XD
[18:29:19] <Krinkle>	 Coren: Maybe the enable=>manual from earlier left some residu?
[18:29:20] <Hprmedina>	 my account is hprmedina XD
[18:29:30] <wikibugs>	 6Labs, 10Tool-Labs, 7Tracking: Make toollabs reliable enough (Tracking) - https://phabricator.wikimedia.org/T90534#1213306 (10Ricordisamoa) >>! In T90534#1212906, @scfc wrote: > Do not worry.  Tools is a very liberal environment, so as you have been able to ignore `bigbrother` since July 2014 when @coren set...
[18:29:49] <Coren>	 Krinkle: It shouldn't, since it failed entirely.  But also, I'm not sue where that mysql_upgrade is coming from.
[18:30:16] <Krinkle>	 Coren: I'll reboot one precise and one trusty?
[18:30:59] <Coren>	 Krinkle: Sounds reasonable.
[18:31:32] <Coren>	 (The .override thing won't have any real effect until a reboot anyways since the point was 'don't start on boot'
[18:31:52] <Krinkle>	 "The requested host does not exist." 
[18:32:00] <Krinkle>	 Yikes, my backend session is gone.
[18:32:20] <Coren>	 I know Andrew had to mess with Keystone.
[18:34:56] <Krinkle>	 Coren: Before the restart, my $tail -f syslog shows it got into a loop of sorts between "etc/mysql/debian-start[13592]: Upgrading MySQL" and "kernel: [610994.772232] init: mysql main process ended, respawning"
[18:35:49] <Krinkle>	 Apr 16 18:35:14 integration-slave-trusty-1012 puppet-agent[2023]: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Diamond::Collector[Httpd] is already declared in file /etc/puppet/modules/apache/manifests/monitoring.pp:22; cannot redeclare at /etc/puppet/modules/apache/manifests/monitoring.pp:22 on node i-00000a8e.eqiad.wmflabs 
[18:35:57] <Krinkle>	 pff
[18:39:38] <Krinkle>	 Looks like that only happens on the first run after boot
[18:40:38] <bblack>	 labvirt1001 known?
[18:40:41] <Krinkle>	 Coren: OK. So the same happens after reboot. Here's the run (filtered) https://gist.github.com/Krinkle/0fe33fa174b702921388/raw
[18:41:02] <Coren>	 puppet-agent[2023]: Could not set 'manual' on enable: undefined method `manual_start' for Service[mysql](
[18:41:09] <Coren>	 Are you sure the latest changeset is applied?
[18:41:47] <Krinkle>	 refs/changes/28/204528/10
[18:41:47] <Krinkle>	 Yep
[18:41:48] <Coren>	 Hm.
[18:42:04] <Coren>	 Maybe some odd leftover crud.  Odd.
[18:42:21] <Krinkle>	 It contains +    file { '/etc/init/mysql.override': , and service { 'mysql': } without any 'enable' parameter
[18:42:58] <Coren>	 I just submitted CS 11 that has an explicit 'enable => true,' instead.
[18:43:13] <Coren>	 That might just be it.
[18:43:31] <Krinkle>	 Coren: It does say " Using cached catalog " 
[18:43:38] <Krinkle>	 Is there a way to check what manfiest is pulled down locally?
[18:43:46] <Krinkle>	 Its local copy of the puppetmaster manifets basically
[18:43:58] <Krinkle>	 Possibly because of Apr 16 18:40:44 integration-slave-precise-1012 puppet-agent[1801]: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: Diamond::Collector[Httpd] is already declared in file /etc/puppet/modules/apache/manifests/monitoring.pp:22; cannot redeclare at /etc/puppet/modules/apache/manifests/monitoring.pp:22 on node i-00000a9f.eqiad.wmflabs 
[18:44:04] <Krinkle>	 Which wans't there until after the reboot
[18:44:27] <Coren>	 A different patch that broke things so that the newer version isn't applying?
[18:44:48] <Coren>	 Try to pull the changeset again, I've added an 'enable => true.' that would override any leftovers.
[18:45:14] <Krinkle>	 Yep, running..
[18:46:22] <Krinkle>	 I don't udnerstnad that duplicate declaration. It's referring to the same file/line.
[18:46:27] <Krinkle>	 It runs twice?
[18:47:21] <Krinkle>	 Hm.. apache init and mediawiki.pp though both include it. https://github.com/wikimedia/operations-puppet/search?q="include+apache%3A%3Amonitoring"&type=Code
[18:47:42] <Krinkle>	 Coren: Cool
[18:47:44] <Krinkle>	 Apr 16 18:46:24 integration-slave-precise-1012 puppet-agent[3247]: (/Stage[main]/Role::Ci::Slave::Labs/Service[mysql]/enable) enable changed 'false' to 'true'
[18:47:44] <Krinkle>	 Apr 16 18:46:25 integration-slave-precise-1012 puppet-agent[3247]: Finished catalog run in 26.24 seconds
[18:47:47] <Krinkle>	 Same on trusty
[18:47:48] <Krinkle>	 no errors
[18:47:50] <Krinkle>	 o/
[18:48:07] <Krinkle>	 Apr 16 18:46:29 integration-slave-trusty-1012 puppet-agent[3650]: (/Stage[main]/Role::Ci::Slave::Labs/File[/etc/init/mysql.override]/ensure) defined content as '{md5}cf3f2a865fbea819dadd439586eaee31' 
[18:49:25] <Krinkle>	 Adding some data to the commit message
[18:50:23] <hasharAway>	 Krinkle: Coren: I can confirm Jessie for CI slave is still experimental.  Has a bunch of broken puppet definition and upstart jobs still
[18:52:12] <Hprmedina>	 Hi, I  completed the step 5 (paste ssh key). How can I know if the key is right?
[18:54:12] <Hprmedina>	 XD
[18:54:56] <Coren>	 Hprmedina: My using it to log in.  :-)
[18:55:27] <Hprmedina>	 Hi Coren 
[18:55:40] <Coren>	 Krinkle: The mysql.override /should/ prevent mysql from starting until puppet tells it to.
[18:56:06] <Krinkle>	 Yeah, looks like it does that
[18:57:20] <hashar>	 may I get loginwithshell permission for the user 'nodepoolmanager' please  :)
[18:57:27] <hashar>	 link is https://wikitech.wikimedia.org/wiki/Shell_Request/Nodepoolmanager
[18:57:46] <hashar>	 to be used to by nodepool and log in instances of the 'contintcloud' project
[18:58:35] <Coren>	 hashar: I dunno.  Looks like a sockpuppet to me.  Fishy.  :-)
[18:58:45] <hashar>	 yeah definitely a sock 
[18:59:00] <hashar>	 though it is going to be mostly a bot :D
[18:59:00] <Coren>	 {{done}}
[18:59:43] <hashar>	 merci!
[19:03:46] <Niharika>	 Hello! grrrit-wm seems dead. (In case it's not already known :)
[19:05:52] <YuviPanda>	 hashar: 
[19:05:59] <YuviPanda>	 Q
[19:06:16] <YuviPanda>	 Sorry accidental 
[19:15:29] <YuviPanda>	 Stuck in Bart booo
[19:16:32] <russblau>	 my dplbot tool's web pages (http://tools.wmflabs.org/dplbot/) are returning the "No webservice" error message, even though 'qstat' shows the server is running - any ideas what might be causing this?
[19:17:26] <YuviPanda>	 russblau: hey! Unsure - can you leave it that way for maybe 20mins more so I can debug?
[19:17:36] <YuviPanda>	 A restart would fix it immediately if you wish 
[19:17:45] <YuviPanda>	 But I want to find root cause now 
[19:18:00] <russblau>	 yeah, i just tried a restart before getting your message, and that cured it
[19:18:34] <YuviPanda>	 Ah damn :)
[19:18:37] <YuviPanda>	 It's ok 
[19:18:43] <russblau>	 thanks though
[19:18:46] <YuviPanda>	 I'll put monitoring in place for this later today
[19:18:47] <YuviPanda>	 Yw
[19:20:55] <wikibugs>	 6Labs: labvirt boxes need a new cert for libvirtd - https://phabricator.wikimedia.org/T96291#1213443 (10Andrew) 3NEW a:3akosiaris
[19:29:22] <Betacommand>	 T13|mobile: can you fix wikiviewstats cron? Im getting email spam about undeliverable mail
[19:44:57] <YuviPanda>	 andrewbogott: I just realized that when cleaning puppet / salt certs you need to clean them both on labswide puppetmaster *and* the local puppetmaster if there’s one
[19:44:57] <YuviPanda>	 ...
[19:45:01] <YuviPanda>	 project-wide puppetmaster, that is...
[19:55:24] <wm-bot>	 Change on 12wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Hprmedina was created, changed by Hprmedina link https://wikitech.wikimedia.org/wiki/Nova+Resource%3aTools%2fAccess+Request%2fHprmedina edit summary: Created page with "{{Tools Access Request |Justification=improve access to media wiki with direct access to database, API is very slow an inefficient with some querys |Completed=false |User Name..."
[19:56:16] <Coren>	 !log tools depooling -webgrid-01 for reboot
[19:56:25] <labs-morebots>	 Logged the message, Master
[19:56:52] <wikibugs>	 6Labs: labvirt boxes need a new cert for libvirtd - https://phabricator.wikimedia.org/T96291#1213494 (10akosiaris) So, there are 2 WMF CAs right now. The "old" one and the "new" one. Those are populated on all systems via   https://github.com/wikimedia/operations-puppet/blob/production/manifests/certs.pp#L69  an...
[19:58:55] <T13|mobile>	 Strange. I can't log in to tools-dev but I can log into tools-login.  Tools-dev just drops the connection when I log in.
[19:59:07] <Coren>	 YuviPanda: So, do you think the manifest will go bonkers if I restart the webservices?  That'll requeue them for several seconds.  Otherwise, I can just qdel them and let it restart but then anything that is missed will go down.
[20:00:17] <Coren>	 Then again, webservice will not try to restart the service if it's already there in 'Rqw' state will it?
[20:00:22] <YuviPanda>	 Coren: either should work, but I think you should qdel them and have it start it back up.
[20:00:40] <Coren>	 YuviPanda: How confident are you that everything that is running is in the manifests?
[20:01:31] <YuviPanda>	 Coren: about 99.9%
[20:01:34] <Hprmedina>	 Hi Coren, it's me again... I try to connect but it not works... left some right or something?
[20:01:37] <YuviPanda>	 Coren: because I did a diff
[20:01:48] <YuviPanda>	 of running webservices with service manifests and ended up with null
[20:01:56] <Coren>	 Hprmedina: What are you trying to connect to?
[20:02:04] <Coren>	 YuviPanda: Good enough.  :-)
[20:02:25] <Hprmedina>	 login.tools.wmflabs.org
[20:02:52] <Hprmedina>	 (using PuTTY)
[20:03:03] <Amir1>	 YuviPanda: hey, have you got my email?
[20:03:10] <Coren>	 Hprmedina: What username are you trying to connect with?
[20:03:15] <Hprmedina>	 hprmedina
[20:03:17] <Coren>	 (So I can check the logs)
[20:04:09] <Coren>	 Hprmedina: You have not requested access to the Tools project, nor indeed shell access.  :-)
[20:04:35] <Coren>	 (Well, you may have requested them but you haven't gotten them yet)  :-)
[20:04:35] <Hprmedina>	 o.o  https://wikitech.wikimedia.org/wiki/Shell_Request/Hprmedina this ?
[20:04:44] <Hprmedina>	 ahh... ok!
[20:05:20] <Hprmedina>	 maybe that is the problem....  :P
[20:05:33] <Coren>	 Hprmedina: I've just done both.
[20:06:13] <Coren>	 !log tools -webgrid-01 drained, rebooting.
[20:06:16] <labs-morebots>	 Logged the message, Master
[20:06:27] <Hprmedina>	 nice XD thanks, Coren, I will try again
[20:07:45] <T13|mobile>	 Betacommand: I'm not sure what to comment out and I dont see anything obvious.  That said, I don't seem to be getting any emails.
[20:10:48] <T13|mobile>	 I only see one line not commented out.
[20:11:02] <T13|mobile>	 So... now its commented
[20:24:24] <wikibugs>	 6Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1213564 (10yuvipanda) I think these keys are in the base image and should be removed from there.
[20:26:22] <wikibugs>	 6Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1213566 (10yuvipanda) a:5yuvipanda>3None (and I don't have a fix atm)
[20:29:08] <wikibugs>	 6Labs: /etc/ssh/userkeys/ubuntu notices for every puppet run on labs instances - https://phabricator.wikimedia.org/T94866#1213576 (10akosiaris) I noticed the same thing in a freshly created jessie image today.  ``` Notice: /Stage[main]/Puppetmaster::Ssl/File[/var/lib/puppet/server/ssl]/group: group changed 'pupp...
[20:32:33] <Coren>	 !log tools -webgrid-01 repooled
[20:32:36] <labs-morebots>	 Logged the message, Master
[20:33:26] <Coren>	 !log tools -webgrid-02 depooled
[20:33:29] <labs-morebots>	 Logged the message, Master
[20:35:05] <Coren>	 !log tools -webgrid-02 drained, rebooting
[20:35:07] <labs-morebots>	 Logged the message, Master
[20:38:09] <Coren>	 !log tools -webgrid-02 repooled
[20:38:12] <labs-morebots>	 Logged the message, Master
[20:38:19] <Coren>	 !log tools -webgrid-03 depooled
[20:38:22] <labs-morebots>	 Logged the message, Master
[20:39:04] <YuviPanda>	 Coren: -08 is also precise, btw. scfc recreated -04 with stranger numbering
[20:39:06] <chasemp>	 Coren: say we get a phab uesr who is a spammer and they came in through labs/ldap
[20:39:20] <chasemp>	 is there any policy for banning / warning these people in case they are legit just being silly?
[20:39:27] <chasemp>	 I banned in phab
[20:39:42] <Coren>	 chasemp: Via wikitech talk page, and we can block there.
[20:40:25] <Coren>	 YuviPanda: Yeah, I got a script that checks what instances are precise and not rebooted
[20:40:37] <YuviPanda>	 ah right :)
[20:40:38] <YuviPanda>	 coolo
[20:40:39] <YuviPanda>	 *cool
[20:40:47] <chasemp>	 Coren: like https://wikitech.wikimedia.org/wiki/User_talk:Physicaladdress ?
[20:41:17] <Coren>	 chasemp: Presuming that this is the correct user, yes.  :-)
[20:41:27] <chasemp>	 hah, yes it is
[20:45:09] <Coren>	 !log tools -webgrid-03 drained, rebooting
[20:45:12] <labs-morebots>	 Logged the message, Master
[20:46:33] <Coren>	 !log tools -webgrid-03 repooled, depooling -webgrid-08
[20:46:36] <labs-morebots>	 Logged the message, Master
[20:47:30] <YuviPanda>	 Coren: am writing the trusty webservice switch email now
[20:47:31] <YuviPanda>	 https://etherpad.wikimedia.org/p/toollabs-webservice-trusty-switch
[20:49:09] <chasemp>	 Coren: YuviPanda well fyi if a user comes in from ldap/labs and is banned in phab I'm gogin to suggest talk page notice
[20:49:19] <YuviPanda>	 chasemp: +1
[20:49:20] <chasemp>	 so you may hear from someone unhappy you can point them to us over at phab
[20:49:45] <YuviPanda>	 chasemp: clearly the solution is to embed a wikitext parser in phabricator and hack in talk pages there as well.
[20:49:50] * YuviPanda slinks away again
[20:49:58] <chasemp>	 we could name it OID
[20:50:02] <YuviPanda>	 :D
[20:50:06] <chasemp>	 phoid
[20:50:20] <YuviPanda>	 boid, clearly
[20:50:50] <chasemp>	 andrewbogott: just fyi on this " if a user comes in from ldap/labs and is banned in phab I'm gogin to suggest talk page notice"
[20:51:00] <YuviPanda>	 chasemp: you know we used to do Code Review with a mw extension right? :)
[20:51:07] <Coren>	 YuviPanda: Sounds okay to me.
[20:51:16] <YuviPanda>	 Coren: yeah, let me fill in and send it off. 
[20:55:53] <wikibugs>	 7Tool-Labs: Move tools-mail to trusty - https://phabricator.wikimedia.org/T96299#1213660 (10yuvipanda) 3NEW
[20:56:21] <wikibugs>	 7Tool-Labs, 3ToolLabs-Goals-Q4: Phase out precise instances from toollabs - https://phabricator.wikimedia.org/T94790#1213666 (10yuvipanda)
[20:57:32] <Coren>	 !log tools -webgrid-08 drained, rebooting
[20:57:35] <labs-morebots>	 Logged the message, Master
[21:00:41] <wikibugs>	 6Labs, 3Labs-Q4-Sprint-2, 3ToolLabs-Goals-Q4: Disable LDAP and enable admin puppet module on labstore100[12] - https://phabricator.wikimedia.org/T95559#1213672 (10coren) This will have to be done otherwise as many users are members of more than eight groups and NFS needs to be able to check group membership...
[21:02:18] <wikibugs>	 6Labs, 3Labs-Q4-Sprint-2, 3Labs-Q4-Sprint-3, 3ToolLabs-Goals-Q4: Do a rolling restart of Tool Labs precise instances - https://phabricator.wikimedia.org/T95557#1213673 (10coren) All Precise instances in Tools have had idmap disabled in labs except for the dedicated nodes.  Contacting the maintainers by ema...
[21:12:45] <YuviPanda>	 Coren: one more look at https://etherpad.wikimedia.org/p/toollabs-webservice-trusty-switch? I think I finished it up
[21:19:24] <Coren>	 Still sounds good.  Is the php version the bigger obstacle to migration?
[21:19:48] <YuviPanda>	 Coren: yes. lots of code relies on bugs in php 5.3 to function.
[21:19:54] <YuviPanda>	 well, bugs and long obsolete features
[21:19:56] * Coren nods.
[21:20:00] * YuviPanda cough coughs xtools
[21:20:20] <wikibugs>	 6Labs: Bundler and other Ruby gems needed for MusikBot tool - https://phabricator.wikimedia.org/T96261#1213773 (10MusikAnimal)
[21:21:32] <MusikAnimal>	 ^ that wikibugs bot is fast!
[21:21:53] <MusikAnimal>	 but seriously, do I need to file a phab every time I want gems installed?
[21:48:59] <wikibugs>	 6Labs, 5Patch-For-Review: labvirt boxes need a new cert for libvirtd - https://phabricator.wikimedia.org/T96291#1213885 (10Andrew) libvirtd.conf only allows me to specify one ca file.  So I either need keys on all boxes from the same CA, or some way to chain multiple CAs in a single file.  It shouldn't hurt to...
[21:49:47] <wikibugs>	 6Labs, 5Patch-For-Review: labvirt boxes need a new cert for libvirtd - https://phabricator.wikimedia.org/T96291#1213892 (10Andrew) Oh, I should add -- the main project here is migration of instances from virt10xx to labvirt10xx.  So I need them to talk to each other.