[01:39:48] (CR) Tim Landscheidt: [C: -1] "I'm okay with that, but there needs to be an entry in debian/changelog." [labs/toollabs] - https://gerrit.wikimedia.org/r/194702 (owner: Yuvipanda)
[01:42:40] Coren: Question about database tables, are user tables stored on the same servers as the db replicas?
[01:42:56] Define user tables?
[01:43:22] user created database tables, sorry for not being clearer
[01:43:42] Yes, albeit on different disks
[01:43:57] Well, except for those created on tools-db of course.
[01:44:55] so doing joins across all databases from user tables should be possible?
[01:48:59] Coren: ?
[01:49:27] Betacommand: It is.
[01:50:46] OK, thanks. Getting ready to create/replace a tool that would use it, and before I got too far I wanted to make sure that I could do it that way
[01:51:50] Be mindful of good indices and join order when you do so; joins tend to be expensive on the db otherwise.
[02:02:10] Coren: its more of a select from replica, and dump into user table
[05:17:56] Can someone restart this tool? https://tools.wmflabs.org/unpatrollededitstats/commonswiki/ Seems to have gone down a while back but not come back up after the outage
[05:59:00] Krinkle: yup in a couple of mins.
[06:18:07] Krinkle|detached: done
[06:24:28] (PS2) Yuvipanda: Do not create ~/cgi-bin for new tools [labs/toollabs] - https://gerrit.wikimedia.org/r/194702
[06:43:44] Heya
[06:44:03] hi Jianhui67
[06:44:14] https://tools.wmflabs.org/unpatrollededitstats/
[06:44:36] Can someone do a statistics of this for wikidata and wikispecies
[06:44:39] Would be useful
[06:49:59] Jelte: ^^
[06:51:40] PROBLEM - Puppet failure on tools-webgrid-02 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[06:52:40] Wikimedia-Labs-Infrastructure: Move LabsDB aliases and NAT to DNS and LabsDB servers - https://phabricator.wikimedia.org/T63897#1094926 (scfc) I found some old notes regarding possible migration (non-)issues based on a imaginary world after https://gerrit.wikimedia.org/r/#/c/156599/ was merged that applies t...
[06:52:42] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [0.0]
[06:58:22] Tool-Labs: /usr/bin/sql should query DNS as well to determine whether a database has been replicated - https://phabricator.wikimedia.org/T91733#1094928 (scfc) NEW a:scfc
[06:58:52] PROBLEM - Puppet failure on tools-exec-catscan is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0]
[07:08:00] (CR) Tim Landscheidt: [C: 2] Do not create ~/cgi-bin for new tools [labs/toollabs] - https://gerrit.wikimedia.org/r/194702 (owner: Yuvipanda)
[07:17:47] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0]
[07:20:30] Tool-Labs, Continuous-Integration: labs-toollabs-debian-glue fails apparently with a timeout - https://phabricator.wikimedia.org/T91247#1094957 (scfc) In https://integration.wikimedia.org/ci/job/labs-toollabs-debian-glue/112/console, it passed, but the section that failed previously apparently wasn't eve...
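
A minimal sketch of the "select from replica, dump into user table" pattern discussed above (01:42-02:02), assuming the standard ~/replica.my.cnf credentials file; the s51234__demo database, the blp_pages table and the enwiki example are placeholders, not anyone's actual tool:

$ mysql --defaults-file=$HOME/replica.my.cnf -h enwiki.labsdb -e "
    CREATE DATABASE IF NOT EXISTS s51234__demo;
    CREATE TABLE IF NOT EXISTS s51234__demo.blp_pages (
        page_id    INT UNSIGNED   NOT NULL PRIMARY KEY,
        page_title VARBINARY(255) NOT NULL
    );
    -- join the replica against the user database living on the same server,
    -- filtering on indexed columns as advised above
    INSERT IGNORE INTO s51234__demo.blp_pages
    SELECT p.page_id, p.page_title
    FROM enwiki_p.page p
    JOIN enwiki_p.categorylinks cl ON cl.cl_from = p.page_id
    WHERE cl.cl_to = 'Living_people' AND p.page_namespace = 0;"

As noted at 01:43:57, this only works for databases created next to the replicas; tables on tools-db live on a different host and cannot be joined against the replicas directly.
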
[07:21:49] RECOVERY - Puppet failure on tools-webgrid-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:28:57] RECOVERY - Puppet failure on tools-exec-catscan is OK: OK: Less than 1.00% above the threshold [0.0]
[08:01:53] Tool-Labs: Migrate individual tools to trusty to relieve pressure on older precise nodes - https://phabricator.wikimedia.org/T88228#1095003 (scfc) a:scfc>None
[08:03:55] Wikimedia-Labs-Infrastructure: Move LabsDB aliases and NAT to DNS and LabsDB servers - https://phabricator.wikimedia.org/T63897#1095005 (scfc) Forgot: I had uploaded a small script at https://gerrit.wikimedia.org/r/#/c/191846/ that transforms text files to YAML (for hiera) which might be useful here.
[08:12:24] Labs, Patch-For-Review, Puppet: Enable including classes via hiera for labs - https://phabricator.wikimedia.org/T90592#1095017 (scfc) Open>Resolved a:thcipriani
[08:53:10] Wikimedia-Labs-Infrastructure: Move LabsDB aliases and NAT to DNS and LabsDB servers - https://phabricator.wikimedia.org/T63897#1095079 (yuvipanda)
[08:53:42] PROBLEM - Puppet failure on tools-master is CRITICAL: CRITICAL: 42.86% of data above the critical threshold [0.0]
[08:54:30] good morning labs people
[09:08:15] Tool-Labs: Remove unneeded tools - https://phabricator.wikimedia.org/T91740#1095116 (scfc) NEW a:scfc
[09:10:26] Tool-Labs: Remove unneeded tools - https://phabricator.wikimedia.org/T91740#1095126 (scfc) No grid jobs running.
[09:12:12] Ping. wikibugs dropped off from -dev.
[09:12:26] PROBLEM - Puppet failure on tools-webgrid-06 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[09:18:42] RECOVERY - Puppet failure on tools-master is OK: OK: Less than 1.00% above the threshold [0.0]
[09:36:27] Wikimedia-Labs-Infrastructure: Move LabsDB aliases and NAT to DNS and LabsDB servers - https://phabricator.wikimedia.org/T63897#1095166 (scfc)
[09:36:28] Tool-Labs, Patch-For-Review: /usr/bin/sql should query DNS as well to determine whether a database has been replicated - https://phabricator.wikimedia.org/T91733#1095165 (scfc) Open>Resolved
[09:59:26] YuviPanda: Hi, I try to figure out how to mount the NFS to get more storage with https://wikitech.wikimedia.org/wiki/NFS
[09:59:45] Kelson: oh, that’s not needed for labs.
[09:59:52] YuviPanda: without success. Do I'm at the right place?
[09:59:53] Kelson: it should already be available for you by default
[10:00:00] Kelson: just write to /data/project
[10:00:06] if it isn’t available for you
[10:00:09] just restart your machine
[10:00:13] err, instance
[10:00:15] and it should be available
[10:00:25] all labs instances have NFS mounted on /data/project
[10:00:40] YuviPanda: ok, I have a call, I try afterward. thx
[10:00:45] Kelson: yw
[10:35:07] guys, I found sth I cannot understand...it's in the Noam Chomsky article in Catalan Wikipedia
[10:35:13] maybe you can see what I cannot.
[10:35:29] I see what links to it and I found 655 other articles.
[10:35:31] kelson@mwoffliner3:/srv/kiwix-other/mwoffliner$ df | grep srv
[10:35:31] /dev/mapper/vd-second--local--disk 63242140 47976912 12029596 80% /srv
[10:35:31] kelson@mwoffliner3:/srv/kiwix-other/mwoffliner$ touch titi
[10:35:31] touch: cannot touch 'titi': No space left on device
[10:35:45] but when I check in the other articles if there's a link in the code...there isn't!
[10:35:54] http://ca.wikipedia.org/w/index.php?title=Especial%3AEnlla%C3%A7os&limit=500&target=Noam+Chomsky&namespace=0
[10:35:57] YuviPanda: pretty strange, no space left on device, but the partition is only 80% full....
[10:36:17] it's very strange...
[10:36:21] Kelson: hmm, not sure what’s happening….
[10:36:40] Kelson: did you try using /data/project?
[10:36:47] I’m not sure if anyone else has tried filling /srv before, tho
[10:36:59] YuviPanda: still not, currently reconfiguring a dump to use it.
[10:37:09] alright
[10:37:17] I suspect we would need andrewbogott_afk or Coren to look at the /srv issue
[11:00:24] YuviPanda: did you see what I said? :)
[11:00:35] marcmiquel: oh, sorry. doing too many things...
[11:00:41] no prob :)
[11:01:08] marcmiquel: I’m not sure what’s happening there. Maybe it’s in a template?
[11:01:22] how could I check that?
[11:02:44] marcmiquel: open the page, go to ‘view source’ in browser, and then try to find?
[11:02:50] so search in output html, not in source
[11:03:00] ok! thanks.
[11:04:52] YuviPanda: indeed. i found the links.
[11:05:11] but they aren't supposed to be there. should I alert the community?
[11:05:46] I guess so
[11:05:54] * YuviPanda isn’t fully sure
[11:06:46] can I modify the template myself and that will propagate?
[11:07:09] I guess so. not sure what the community would expect you to do tho
[11:10:40] I will let you know when I see anomie around here
[11:10:40] @notify anomie
[11:16:44] I will let you know when I see hoo around here
[11:16:44] @notify hoo
[12:08:04] Tool-Labs: Remove unneeded tools - https://phabricator.wikimedia.org/T91740#1095369 (scfc) No cron jobs, filesystem done.
[12:16:45] Tool-Labs: Remove unneeded tools - https://phabricator.wikimedia.org/T91740#1095371 (scfc) Removed service groups, still need to look through the databases.
[12:32:18] Labs: Labs NFSv4/idmapd mess - https://phabricator.wikimedia.org/T87870#1095395 (akosiaris) OK, waiting a week has been prudent. Perhaps we should move this forward again ?
[12:57:39] Kelson: Do 'df -i /srv'
[12:58:39] Kelson: Ima going to guess you ran out of inodes.
[13:02:09] Labs, operations, Patch-For-Review: Puppetize labstore1003 - https://phabricator.wikimedia.org/T91573#1095427 (coren) p:Triage>Normal
[13:18:27] Coren: hi
[13:18:43] Coren: I have removed a lot of content, but that's perfectly possible, should I tune the limits?
[13:20:36] Kelson: You may have to, if you have very many small files. ext4 has a default inode/space ratio that is tuned for a typical mix of file sizes.
[13:20:37] YuviPanda: I haven't benched it, but the nfs server seems to be pretty fast - so this looks to be a good alternative to /srv
[13:21:20] Coren: ok, I'll do it. Thank you for this analysis.
[13:22:04] Kelson: \o/
[13:23:59] Kelson: As a rule, if trying to create an empty file fails, it's pretty much guaranteed that you ran out of inodes and not space: an empty file allocates no blocks.
[13:43:57] !toolshelp
[13:43:57] https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Help
[13:51:06] Coren: are there any docs for creating a user database?
[13:51:59] Betacommand: Check section 8 of that page, there is a link to 'complete documentation' which has it.
[13:56:51] Coren: thanks missed the small print
[15:25:20] Labs, Patch-For-Review: Upgrade labstore1002 to Jessie - https://phabricator.wikimedia.org/T91640#1095612 (Cmjohnson)
[16:03:51] Coren, is there no wikimania2016wiki replication?
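
For reference, a quick way to confirm the inode diagnosis given above (12:57-13:23): "No space left on device" while blocks are still free almost always means the inode table is exhausted, since creating an empty file needs an inode but no blocks. The paths follow Kelson's /srv example and the mkfs ratio shown is only an illustrative value:

$ df -h /srv    # block usage; may still show plenty of free space
$ df -i /srv    # inode usage; IUse% at 100% confirms inode exhaustion
# list the directories holding the most files (lots of tiny files is the usual culprit)
$ sudo find /srv -xdev -type f | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head
# the inode/space ratio is fixed at mkfs time; re-creating the filesystem with
# e.g. "mkfs.ext4 -i 8192 /dev/mapper/vd-second--local--disk" allocates roughly
# twice as many inodes as the usual default ratio
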
[16:04:36] Krenair: It may be too recent to have been picked up by the last run. Lemme check.
[16:05:45] 26th Jan: https://lists.wikimedia.org/pipermail/newprojects/2015-January/000097.html
[16:06:20] Krenair: It's replicated, but does not yet have a view. Lemme do a maintain-replicas run.
[16:06:25] YuviPanda: you’re getting these shinken emails about puppet freshness on tools, right? Do you know why that’s happening?
[16:08:43] Krenair: It's in progress. Takes about 30min all told.
[16:09:14] Coren, okay, interesting. Do you have to do these every time a new wiki is created?
[16:11:01] Krenair: Also for schema changes. I generally do it monthly or so as a few weeks' lag usually isn't a major concern but I was a little behind because of outages last month when I would have done it normally.
[16:11:19] It's not automated because it often needs manual intervention when queries are holding table locks.
[16:12:07] Also sometimes schema changes break things.
[16:24:02] Coren, YuviPanda: Would one of you care to fix someone's labs/gerrit/wikitech access? , instructions are linked from the last post.
[16:24:44] YuviPanda: It seems graphite is no longer deleting deleted instances
[16:24:51] Krenair: You should be full of happies, but the alias to it won't be there for a few hours (I'm working on that atm) - connecting to c[123].labsdb will get you to it anyways.
[16:25:28] anomie: I'll try to get to it right after lunch if nobody got to it first.
[16:26:43] Thanks
[16:27:14] Giving me funny graphs like these :D https://graphite.wmflabs.org/render/?title=integration-slave1005+CPU&from=-24h&target=alias%28color%28stacked%28integration.integration-slave1005.cpu.total.user.value%29%2C%22%233333bb%22%29%2C%22User%22%29
[16:28:09] PROBLEM - Puppet failure on tools-exec-11 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0]
[17:15:28] Labs, Puppet: Missing documentation for labs puppet roles - https://phabricator.wikimedia.org/T91770#1095890 (awight) NEW
[17:37:17] Does it take some time for ssh keys to propagate to labs? I should have shell access per https://wikitech.wikimedia.org/w/index.php?title=Shell_Request/Awight but cannot log into bastion1 or my instance...
[17:38:18] nvm! I just got in.
[17:39:40] How would I sudo to root?
[17:42:03] awight: if you have sudo on the machine, sudo su -
[17:42:40] Wikimedia-Labs-wikitech-interface, Wikimedia-IRC, operations: Enable irc feed for wikitech.wikimedia.org site - https://phabricator.wikimedia.org/T36685#1095977 (Glaisher)
[17:42:40] andrewbogott: it asks for a password...
[17:42:49] awight: then you don’t have sudo rights
[17:43:05] hrm! I just created this instance.
[17:43:18] what instance?
[17:43:21] what project?
[17:43:44] ok cool, I see https://wikitech.wikimedia.org/wiki/Help:Sudo_Policies
[17:43:56] I'll try that first
[17:44:53] success!
[17:45:19] woo documentation
[17:45:37] :)
[18:26:06] awight: if you’re still here, can you help me with a small experiment?
[18:26:32] andrewbogott: sure!
[18:26:50] hm… actually, nevermind, this will make a lot more sense if I’m in the same room with someone
[18:26:57] I’ll get subbu to do it this afternoon :)
[18:27:03] Hehe, I was hoping my inexperience might be helpful here.
[18:30:26] Labs: Install OpenStack Horizon for production labs - https://phabricator.wikimedia.org/T87279#1096180 (Andrew) For test/dev/demo purposes: https://horizon.wikimedia.org Login with labs shell name and wikitech password.
The next step is to create a million subtasks here for every little way that horizon is...
[18:31:06] Wikimedia-Labs-Infrastructure, Patch-For-Review: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#1096181 (coren)
[18:31:40] Labs: Fix horizon logo - https://phabricator.wikimedia.org/T91780#1096184 (Andrew) NEW a:Andrew
[18:31:55] Wikimedia-Labs-Infrastructure, Patch-For-Review: Move LabsDB aliases to DNS - https://phabricator.wikimedia.org/T63897#653806 (coren) Edited task description to be clearer about the current status.
[18:32:32] Labs: Limit available images on horizon - https://phabricator.wikimedia.org/T91782#1096199 (Andrew) NEW a:Andrew
[18:44:49] MediaWiki-extensions-OpenStackManager: OpenStackManager special pages should link to in-wiki documentation - https://phabricator.wikimedia.org/T36500#1096302 (Nemo_bis)
[18:47:07] Labs: Horizon security audit - https://phabricator.wikimedia.org/T91784#1096313 (Andrew) NEW a:Andrew
[19:52:10] Labs: Change the way manage-nfs-volumes is monitored - https://phabricator.wikimedia.org/T91806#1096608 (coren) NEW
[19:52:41] Labs: Change the way manage-nfs-volumes is monitored - https://phabricator.wikimedia.org/T91806#1096623 (coren) p:Triage>Normal
[20:31:43] PROBLEM - Puppet staleness on tools-webgrid-tomcat is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[20:32:48] PROBLEM - Puppet staleness on tools-static is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[20:33:00] PROBLEM - Puppet staleness on tools-trusty is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[20:33:30] PROBLEM - Puppet staleness on tools-mail is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [43200.0]
[20:33:44] PROBLEM - Puppet staleness on tools-webgrid-02 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [43200.0]
[20:33:54] PROBLEM - Puppet staleness on tools-webgrid-05 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [43200.0]
[20:34:28] PROBLEM - Puppet staleness on tools-exec-14 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [43200.0]
[20:34:40] PROBLEM - Puppet staleness on tools-exec-gift is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [43200.0]
[20:36:06] Coren: ^ can you take a look?
[20:36:17] andrewbogott: ^ I forgot about your email, I have no idea what’s happening
[20:36:26] YuviPanda: Bleh. Yeah, on it.
[20:36:38] PROBLEM - Puppet staleness on tools-exec-07 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0]
[20:36:46] I’m going to sleep
[20:37:00] YuviPanda: yeah, looks like the same thing
[20:37:04] YuviPanda: You still do that?
[20:37:07] but, not urgent anyway.
sleep well
[20:37:12] PROBLEM - Puppet staleness on tools-webgrid-01 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [43200.0]
[20:37:12] PROBLEM - Puppet staleness on tools-exec-13 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0]
[20:37:14] Coren: I try… sometimes
[20:38:52] PROBLEM - Puppet staleness on tools-dev is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[20:39:02] PROBLEM - Puppet staleness on tools-exec-02 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [43200.0]
[20:39:48] PROBLEM - Puppet staleness on tools-exec-08 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [43200.0]
[20:40:20] PROBLEM - Puppet staleness on tools-webgrid-04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[20:41:10] PROBLEM - Puppet staleness on tools-exec-12 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[20:41:11] PROBLEM - Puppet staleness on tools-exec-01 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [43200.0]
[20:41:14] PROBLEM - Puppet staleness on tools-exec-catscan is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [43200.0]
[20:41:36] PROBLEM - Puppet staleness on tools-shadow is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[20:41:37] PROBLEM - Puppet staleness on tools-exec-04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[20:42:43] PROBLEM - Puppet staleness on tools-webgrid-generic-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [43200.0]
[20:42:57] Coren: every time that happens I log into one of the affected instances and run puppet to see what’s happening, and puppet works fine. This time is no exception. (Just tried on tools-webgrid-04, works fine.)
[20:43:15] andrewbogott: Yeah, I did exactly that on -14 with the same result.
[20:43:17] Is it possible that our test is broken such that it perceives any puppet diff as a failure?
[20:43:22] * Coren tries to dig into logs.
[20:43:51] PROBLEM - Puppet staleness on tools-exec-cyberbot is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [43200.0]
[20:43:51] PROBLEM - Puppet staleness on tools-redis is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[20:44:01] PROBLEM - Puppet staleness on tools-redis-slave is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [43200.0]
[20:44:29] andrewbogott: That'd be an amusing bug, but would also explain a few things.
[20:45:05] How stale does a box have to be to be considered stale?
[20:45:14] e.g. 31 minutes?
[20:45:39] andrewbogott: It's a bit bigger than that IIRC.
[20:45:55] andrewbogott: Huh. Interesting datapoint. The puppet log on -14 has not been touched in some 12h
[20:46:05] if it’s less than 60 mins then I blame that. There should be some slack...
[20:46:12] andrewbogott: So puppet apparently isn't even TRYING to run.
[20:46:21] oh! That’d do it
[20:46:26] And explain why running it by hand works fine
[20:46:37] PROBLEM - Puppet staleness on tools-webgrid-07 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[20:46:39] PROBLEM - Puppet staleness on tools-webproxy-02 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [43200.0]
[20:46:39] but not why it always shapes up an hour later
[20:47:16] Isn't puppet supposed to run from cron?
[20:47:45] PROBLEM - Puppet staleness on tools-exec-05 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[20:47:49] I think so
[20:48:29] PROBLEM - Puppet staleness on tools-exec-09 is CRITICAL: CRITICAL: 62.50% of data above the critical threshold [43200.0]
[20:48:39] hm, must not be in cron, I don’t see crontabs for that
[20:49:03] PROBLEM - Puppet staleness on tools-exec-10 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [43200.0]
[20:49:15] PROBLEM - Puppet staleness on tools-exec-03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[20:49:23] PROBLEM - Puppet staleness on tools-exec-06 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [43200.0]
[20:49:46] andrewbogott: That was why the question and my puzzlement.
[20:49:56] What /does/ run puppet, ostensibly?
[20:50:23] a minute ago I would’ve said ‘cron’
[20:52:15] See, so would I. So now I'm wondering if there is something that's interfering and the cause of our explosions.
[20:53:44] There isn't a puppet agent daemon, and there is nothing in cron so clearly puppet isn't being run.
[20:54:26] RECOVERY - Puppet staleness on tools-exec-14 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:55:24] RECOVERY - Puppet staleness on tools-webgrid-04 is OK: OK: Less than 1.00% above the threshold [3600.0]
[20:57:11] Coren: look in cd /etc/cron.d
[20:57:18] um… in /etc/cron.d/puppet
[20:57:24] so it’s there but when I look at crontab I don’t see it
[20:58:21] I don’t understand why it doesn’t appear in crontab -l
[20:59:48] Well, the cron.d stuff doesn't by default I think.
[21:00:00] PROBLEM - Puppet failure on tools-exec-14 is CRITICAL: CRITICAL: 75.00% of data above the critical threshold [0.0]
[21:00:15] It's a system crontab and not root's specifically.
[21:00:25] But it's not running for some reason.
[21:00:28] * Coren digs moar.
[21:00:58] PROBLEM - Puppet failure on tools-webgrid-04 is CRITICAL: CRITICAL: 37.50% of data above the critical threshold [0.0]
[21:01:17] Wait. Are all the failing boxen trusty?
[21:01:24] Well, not-precise?
[21:02:09] Looking at syslog, I would say that puppet is invoked properly. And the failure is an actual failure.
[21:02:15] E: There are problems and -y was used without --force-yes
[21:02:33] installing install python-requests
[21:02:58] In syslog?
[21:03:06] yeah
[21:03:35] I see that once in the past 24h
[21:04:09] on tools-exec-11 it seems to be happening on every run for the last few hours
[21:04:49] But why doesn't it do it by hand?
[21:05:05] (Also, why isn't /var/log/puppet updated)
[21:05:25] I don’t know what the deal is with /var/log/puppet except that it’s always empty
[21:05:43] hm, running by hand I see the complaint now.
[21:05:52] Hah! Running it again on -14 I see the complaint.
[21:05:54] I wonder…
[21:07:00] nah, I see it every time now
[21:07:04] Maybe we were just hypnotized before
[21:07:18] * Coren checks something.
[21:08:01] 20:54 RECOVERY - Puppet staleness on tools-exec-14 is OK: OK: Less than 1.00% above the threshold [3600.0]
[21:08:03] So no.
[21:09:18] Why is it now trying to downgrade a package?
[21:09:36] And I *know* it didn't do that when I ran it earlier.
[21:09:51] It looks like an apt-get update ran and it lost one of its repos.
[21:10:18] Coren: andrewbogott so I’m technically still sleeping, but https://gerrit.wikimedia.org/r/#/c/119428/ might be related?
[21:10:24] I tested on tools-login after I merged, and it was ok...
[21:10:37] YuviPanda: yeah, maybe breaks on trusty but not precise?
[21:11:15] andrewbogott: well, webgrid-06 is trusty and -04 is precise, and both are being warned for by shinken
[21:11:20] and tools-login is precise
[21:12:07] YuviPanda: I think it's likely related. It seems to have lost track of the packages with newer versions built specifically for tools.
[21:12:16] Hence the failed attempts at downgrading.
[21:12:23] right. I ‘tested’ by doing an apt-cache for nginx-extras
[21:32:41] Gah. Sorry, past four on a friday my brain is fried and I can't actually figure out the issue.
[21:47:43] Coren: andrewbogott: Hm.. is there a bug for the fact that /home is not always mounted for new instances? Of the 20 instances I created over the past week (I re-created some), about 4 did not get /home mounted even after many puppet runs
[21:48:24] Krinkle: there’s not a bug, although I can fix it pretty easily. Go ahead and make a new ticket.
[21:48:46] The issue is just that instances boot before nfs knows about them… it’s a symptom of yuvi’s new partitioning change that makes boots much faster.
[21:49:06] andrewbogott: A reboot fixes it, right?
[21:49:08] yep
[21:49:23] But I can put a manual delay in the firstboot script so that nfs has a chance to catch up
[21:50:20] Labs, Wikimedia-Labs-Infrastructure: New instances don't always get their /data and /home project mounted from NFS - https://phabricator.wikimedia.org/T91822#1096906 (Krinkle) NEW
[22:31:16] Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1097046 (Andrew) NEW a:Andrew
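
A short sketch of the checks being discussed above (20:47-21:12): fragments in /etc/cron.d are system crontabs, so they never appear in "crontab -l", and the cron-driven puppet runs log to syslog rather than to /var/log/puppet. The exact agent invocation inside the cron fragment and the log file names vary by host, so treat this as illustrative:

$ sudo crontab -l             # root's personal crontab only; no puppet entry here
$ cat /etc/cron.d/puppet      # the system crontab fragment that actually drives the agent
$ grep -i puppet /var/log/syslog | tail    # did cron fire it, and what did the run report?
$ sudo puppet agent --test    # run the agent by hand to reproduce the apt/package errors

The "-y was used without --force-yes" error quoted at 21:02:15 is apt refusing to proceed non-interactively, typically over unauthenticated or downgraded packages, which fits the theory at 21:09-21:12 that a locally built package repository had dropped out of the sources.
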
[21:10:37] YuviPanda: yeah, maybe breaks on trusty but not precise? [21:11:15] andrewbogott: well, webgrid-06 is trusty and -04 is precise, and both are being warned for by shinken [21:11:20] and tools-login is precise [21:12:07] YuviPanda: I think it's likely related. It seems to have lost track of the packages with newer versions built specifically for tools. [21:12:16] Hence the failed attempts at downgrading. [21:12:23] right. I ‘tested’ by doing an apt-cache for nginx-extras [21:32:41] Gah. Sorry, past four on a friday my brain is fried and I can't actually figure out the issue. [21:47:43] Coren: andrewbogott: Hm.. is there a bug for the fact that /home is not always mounted for new instances? Of the 20 instances I created over the past week (I re-created some), about 4 did not get /home mounted even after many puppet runs [21:48:24] Krinkle: there’s not a bug, although I can fix it pretty easily. Go ahead and make a new ticket. [21:48:46] The issue is just that instances boot before nfs knows about them… it’s a symptom of yuvi’s new partitioning change that makes boots much faster. [21:49:06] andrewbogott: A reboot fixes it, right? [21:49:08] yep [21:49:23] But I can put a manual delay in the firstboot script so that nfs has a chance to catch up [21:50:20] 6Labs, 10Wikimedia-Labs-Infrastructure: New instances don't always get their /data and /home project mounted from NFS - https://phabricator.wikimedia.org/T91822#1096906 (10Krinkle) 3NEW [22:31:16] 6Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1097046 (10Andrew) 3NEW a:3Andrew [23:11:07] 6Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1097182 (10Andrew) As worded this is incorrect. In keystone an 'admin' is what we call a 'cloud admin' and what in keystone is called a 'member' is almost but not quite what we call a 'project admin'. I will articu... [23:15:24] 6Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1097207 (10Andrew) Largely for my future reference, here's a long discussion with a Horizon dev about how this will work for us: https://phabricator.wikimedia.org/P368 [23:23:45] 6Labs: Rename keystone role 'projectadmin' to 'admin' - https://phabricator.wikimedia.org/T91830#1097225 (10Andrew) In Labs: User: can view project info and access project instances. Cannot change project membership or create VMs. ProjectAdmin: Everything a User does, plus: can create/delete VMs, modify pr...