[05:25:27] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1414 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [43200.0]
[07:04:39] PROBLEM - Puppet run on tools-docker-builder-03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:40:26] Striker, Phabricator, Security-Reviews, Patch-For-Review: Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969#2647761 (faidon) I don't have any strong feelings towards either direction, no. (let's see if Moritz or Darian feel otherwise) As...
[11:14:50] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[11:54:51] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:41:55] PROBLEM - SSH on tools-webgrid-lighttpd-1210 is CRITICAL: Server answer
[13:26:36] (PS1) Tobias Gritschacher: Add 2ColConflict and ElectronPdfService extensions [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/311414
[13:31:30] Labs, Tracking: New Labs project requests (tracking) - https://phabricator.wikimedia.org/T76375#2648137 (chasemp)
[13:31:32] Labs, User-Nikerabbit: Request creation of wmwcourse labs project - https://phabricator.wikimedia.org/T144388#2648135 (chasemp) Open>stalled >>! In T144388#2629978, @Nikerabbit wrote: > Yep. I said early 2017 in case the project work takes more time to finish after the lectures end in December....
[13:31:48] Labs, User-Nikerabbit: Revert in 01/2017: Request creation of wmwcourse labs project - https://phabricator.wikimedia.org/T144388#2648138 (chasemp)
[13:34:46] Labs, Goal: Create labtest cluster - https://phabricator.wikimedia.org/T120293#2648147 (chasemp)
[13:34:48] Labs: Install and configure labtestnet2001 as a labnet gateway - https://phabricator.wikimedia.org/T120297#2648145 (chasemp) Open>Resolved a:chasemp
[13:48:52] Labs, Labs-Infrastructure: New instance first puppet run is broken - https://phabricator.wikimedia.org/T144330#2648217 (chasemp) Open>Resolved a:chasemp Yes, I believe so.
[14:03:14] hey yuvipanda! could you add me to the lolrrit-wm tool please? :D
[14:05:43] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.63 ms
[14:06:00] (CR) Addshore: [C: 1] "Looks like I can't deploy this after I merge yet so just +1 for now." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/311414 (owner: Tobias Gritschacher)
[14:13:20] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[14:27:01] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 0.75 ms
[14:51:59] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[14:54:09] (CR) Paladox: "@Addshore hi, if you can merge I can deploy for you?" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/311414 (owner: Tobias Gritschacher)
[14:54:23] (CR) Addshore: [C: 2] Add 2ColConflict and ElectronPdfService extensions [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/311414 (owner: Tobias Gritschacher)
[14:55:01] (Merged) jenkins-bot: Add 2ColConflict and ElectronPdfService extensions [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/311414 (owner: Tobias Gritschacher)
[14:56:17] (CR) Paladox: "@Addshore hi, would you also be able to merge this one please? This is all tested and has been deployed." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[14:56:41] Labs: Request creation of Mathematical Refresh Rate Policies labs project - https://phabricator.wikimedia.org/T143901#2582592 (Andrew) Hello! I'm sorry that this request hasn't been acknowledged. If you would still like the project created, can you tell us more about what will run inside the project (and,...
[14:57:26] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.64 ms
[14:58:07] (PS7) Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[14:58:43] (CR) Addshore: Do not show merges by the L10n-bot (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[14:59:35] (CR) Paladox: Do not show merges by the L10n-bot (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[14:59:50] Labs, Labs-Infrastructure: Experiment with Linux KSM (dedupe memory shared by instances) on labs infra - https://phabricator.wikimedia.org/T146037#2648431 (hashar)
[15:00:54] (PS8) Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[15:01:17] (CR) jenkins-bot: [V: -1] Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:01:53] (PS9) Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[15:02:53] (CR) Paladox: "@Addshore done" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:04:29] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[15:04:35] (CR) Addshore: Do not show merges by the L10n-bot (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:05:43] (PS10) Paladox: Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082)
[15:05:57] (CR) Paladox: "@Addshore done :)" (1 comment) [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:06:25] (CR) Addshore: [C: 2] Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:06:56] (Merged) jenkins-bot: Do not show merges by the L10n-bot [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:07:21] !log tools.lolrrit-wm deploying https://gerrit.wikimedia.org/r/311414 and https://gerrit.wikimedia.org/r/#/c/308949/
[15:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.lolrrit-wm/SAL, Master
[15:14:11] (CR) Paladox: "Thanks." [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/308949 (https://phabricator.wikimedia.org/T93082) (owner: Paladox)
[15:14:23] addshore ^^ deployed :)
[15:14:29] thanks!
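The change merged and deployed above teaches grrrit-wm to stop relaying merge notifications made on behalf of L10n-bot. grrrit-wm itself is a Node.js service, so the real patch lives in labs/tools/grrrit; the Python sketch below only illustrates the filtering idea, and the event field names (Gerrit stream-events `type` and `patchSet.uploader.username`) and the exact bot account name are assumptions rather than a copy of the deployed code.

```python
# Illustrative sketch only -- grrrit-wm is a Node.js bot, so this is not its
# actual source. It shows the idea behind "Do not show merges by the L10n-bot":
# drop Gerrit stream events for merges made on behalf of the translation bot.

IGNORED_UPLOADERS = {"l10n-bot"}  # assumed account name; adjust as needed


def should_relay(event):
    """Return True if a Gerrit stream event should be relayed to IRC."""
    if event.get("type") != "change-merged":
        return True  # only merge events are filtered here
    patch_set = event.get("patchSet") or {}
    uploader = (patch_set.get("uploader") or {}).get("username", "")
    return uploader.lower() not in IGNORED_UPLOADERS


# Example: a merge uploaded by L10n-bot is suppressed, anything else passes.
assert should_relay({"type": "comment-added"})
assert not should_relay(
    {"type": "change-merged", "patchSet": {"uploader": {"username": "L10n-bot"}}}
)
```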
[15:14:35] You're welcome :)
[15:16:35] Labs: Request creation of Mathematical Refresh Rate Policies labs project - https://phabricator.wikimedia.org/T143901#2648499 (chasemp) p:Triage>Normal @Agaherbert can you expand a bit on what you need here? We can allocate resources but from what I read and understand this seems like the kind of an...
[15:19:51] Hello everyone, db1069 was hit again by this: https://phabricator.wikimedia.org/T145077 - I have collected all the information again and will feed it back to Percona and TokuDB so they can investigate further
[15:20:02] db1069 is now fixed and trying to catch up
[15:21:12] https://tools.wmflabs.org/replag/ s2 lag there is due
[15:21:19] *there is due to this issue
[15:53:50] Tool-Labs-tools-Pageviews: Add "wiki page" as a source to Massviews - https://phabricator.wikimedia.org/T144251#2648701 (MusikAnimal) Open>Resolved a:MusikAnimal Done with https://github.com/MusikAnimal/pageviews/releases/tag/2016.09.19T15.48
[15:54:09] Tool-Labs-tools-Pageviews: Add "subpages" as a source to Massviews - https://phabricator.wikimedia.org/T144238#2648709 (MusikAnimal) Open>Resolved a:MusikAnimal Done with https://github.com/MusikAnimal/pageviews/releases/tag/2016.09.19T15.48
[15:57:39] Labs, Labs-Infrastructure, Analytics: Report page views for labs instances - https://phabricator.wikimedia.org/T103726#2648753 (Milimetric) This should be done with piwik on labs, now that we have more experience with it.
[16:01:18] Quarry, Analytics: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#1195035 (Milimetric) I'm going to untag Analytics, quarry is a different approach, we're about to allow multi-database data access in a different way.
[16:01:25] Quarry: it would be useful to run the same Quarry query conveniently in several database - https://phabricator.wikimedia.org/T95582#2648793 (Milimetric)
[16:36:13] Striker, Phabricator, Security-Reviews, Patch-For-Review: Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969#2648991 (mmodell) >>! In T143969#2647761, @faidon wrote: > Under which account do those git fetches run, and what other privileges...
[17:04:17] Labs, Beta-Cluster-Infrastructure: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2635940 (Andrew) This increase sounds fine to me.
[17:04:25] Labs, Beta-Cluster-Infrastructure: Request increased quota for deployment-prep labs project - https://phabricator.wikimedia.org/T145636#2636577 (Andrew) Yep, increase is fine with me.
[17:10:11] Striker: Allow easy replication of existing github/bitbucket repos - https://phabricator.wikimedia.org/T143971#2649139 (mmodell)
[17:10:15] Striker, Phabricator, Security-Reviews, Patch-For-Review: Unable to mirror repository from git.legoktm.com into diffusion - https://phabricator.wikimedia.org/T143969#2649140 (mmodell)
[17:37:24] Labs, PAWS, Research-and-Data: Setup new labsdbs for PAWS / Quarry - https://phabricator.wikimedia.org/T146061#2649322 (yuvipanda)
[17:38:12] Labs, PAWS, Operations, Research-and-Data, hardware-requests: Purchase new labsdbs for PAWS / Quarry - https://phabricator.wikimedia.org/T146061#2649334 (chasemp) p:Triage>Normal
[17:40:39] yuvipanda: Do you happen to know if the Commons database server is unhappy? Some of my queries that used to take less than a minute started to time out
[17:42:42] multichill, can you elaborate?
[17:45:13] Hey jynus, didn't realize it was you. I do some horrible queries to find a bunch of images. For example /data/project/multichill/queries/commons/paintings_without_wikidata_ci.sql
[17:45:26] no
[17:45:40] I mean, what problems are you finding?
[17:45:47] This query now timed out. ERROR 2013 (HY000) at line 6: Lost connection to MySQL server during query after 2 hours
[17:45:54] It used to complete in much shorter times
[17:45:58] yes, long-running queries do that
[17:46:01] Let me dig up the log
[17:46:02] so
[17:46:07] Tool-Labs-tools-Xtools: Convert all xtools issues to either Phabricator or GitHub - https://phabricator.wikimedia.org/T134632#2649396 (Matthewrbowker) I have added the xtools repository to Phabricator. See {rXT}
[17:46:12] the issue is why they are taking so much time
[17:46:45] Labs, Operations, Research-and-Data, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2649413 (yuvipanda)
[17:46:59] Labs, Operations, Research-and-Data, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2649428 (yuvipanda)
[17:47:01] Labs, PAWS, Operations, Research-and-Data, hardware-requests: Purchase new labsdbs for PAWS / Quarry - https://phabricator.wikimedia.org/T146061#2649430 (yuvipanda)
[17:47:05] Exactly
[17:47:38] This query's real time was either 2 or 3 minutes
[17:48:17] Since a day of 5 that exploded to 116min (KILL)
[17:49:02] jynus: grep -B 5 data/project/multichill/queries/commons/paintings_without_wikidata_ci.txt /data/project/multichill/logs/find_painting_images.log | grep real
[17:49:56] "Since a day of 5" what does that mean?
[17:50:13] jynus: Sorry, since about 5 days ago
[17:50:42] maybe WLM has some stress on commons replica server?
[17:51:05] I see high memory and cpu usage
[17:51:16] 12 runs ago, 2 times a day so something changed about 6 or 7 days ago. Maybe. any graphs that have gone up a lot around that time?
[17:53:42] jynus: 42575053, that's me
[17:54:10] The number of rows is insane
[17:54:52] maybe try splitting the query in smaller chunks
[17:55:02] Labs, Operations, Research-and-Data, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2649465 (chasemp) p:Triage>Normal
[17:57:25] The query hasn't been changed since April and would just complete in a normal time. Without knowing the source of the problem it would be a bit pointless
[18:08:28] Labs, Operations, Research-and-Data-Backlog, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2649537 (ggellerman)
[18:14:48] jynus: The server hosting the Commons database has a high load, is any of the other servers less busy so I can test if it does complete and run in a normal time on that one?
[18:15:25] you can hardcode using enwiki database
[18:15:35] ok for a test
[18:16:03] ok, running now
[18:16:48] Would it be possible that for some reason indexes are not used or something in that direction? I can't use describe because it's a view....
[18:17:39] you should be able to explain the connection
[18:17:52] for a long running query
[18:18:19] http://s.petrunia.net/blog/?p=89
[18:23:16] Oh that's awesome, didn't know that one jynus!
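The blog post jynus links describes MariaDB's `SHOW EXPLAIN FOR <thread_id>`, which returns the execution plan of a statement that is already running, so you can see whether the view is still using the expected indexes even when a plain `DESCRIBE`/`EXPLAIN` of the query text is awkward. A minimal sketch of how a tool account might use it is below; it assumes the usual Tool Labs conventions (credentials in `~/replica.my.cnf`, the `commonswiki.labsdb` alias) and the `pymysql` client library, none of which appear verbatim in the log.

```python
#!/usr/bin/env python3
"""Sketch: EXPLAIN a query that is already running on a labs replica."""
import configparser
import os

import pymysql  # any MySQL/MariaDB client library would work the same way

# Tool Labs convention (assumed here): per-tool credentials in ~/replica.my.cnf
cnf = configparser.ConfigParser()
cnf.read(os.path.expanduser("~/replica.my.cnf"))

conn = pymysql.connect(
    host="commonswiki.labsdb",
    user=cnf["client"]["user"].strip("'"),
    password=cnf["client"]["password"].strip("'"),
    database="commonswiki_p",
)

with conn.cursor() as cur:
    # List this account's long-running queries; 42575053 in the log above was
    # identified this way. Without the PROCESS privilege you only see your own.
    cur.execute(
        "SELECT id, time, LEFT(info, 100) FROM information_schema.processlist "
        "WHERE command = 'Query' AND time > 600 ORDER BY time DESC"
    )
    for thread_id, seconds, snippet in cur.fetchall():
        print(thread_id, seconds, snippet)
        # MariaDB-specific: explain the statement *as it is currently running*,
        # which shows whether the optimizer is still using the expected indexes.
        cur.execute("SHOW EXPLAIN FOR %d" % int(thread_id))
        for row in cur.fetchall():
            print("    ", row)
```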
[18:24:02] I think I am going to do an emergency restart of labsdb1003
[18:24:17] if I do not do it, it will explode and it will be worse
[18:24:31] it is 1 step away from exhausting all memory
[18:25:24] explode?
[18:25:27] duh
[18:25:35] hope that it's not literally
[18:25:58] well, I prefer to restart the server unannounced and be able to start it back
[18:26:11] than it crashing and not being able to start it again
[18:27:00] it is swapping like crazy: https://grafana-admin.wikimedia.org/dashboard/db/mysql?panelId=40&fullscreen&from=1474223207596&to=1474309607597&var-dc=eqiad%20prometheus%2Fops&var-server=labsdb1003
[18:27:25] PAWS, Research-and-Data-Backlog: Create a mailing list for PAWS - https://phabricator.wikimedia.org/T129297#2101483 (leila) @DarTar, do you want this to happen? If not, we can close it and open it as needed in the future.
[18:34:20] we will suffer some turbulence, please keep your seat belts fastened
[18:37:02] You can always ask reedy to be your co-pilot
[18:46:46] multichill, try now on commons
[18:46:50] it should be much better
[18:47:16] no swapping
[18:49:53] we will see how long it lasts...
[18:51:58] Running
[19:07:02] jynus: Bummer, it's still running. Going to kill it and disable the job for now. No sense in hammering the database servers if no result is produced......
[19:25:11] Labs, Operations, Research-and-Data-Backlog, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2649945 (chasemp) hi @ggellerman thanks! I believe this has been specially budgeted for in Q2 of 2016 and should work within that budg...
[19:25:56] Tool-Labs-tools-Erwin's-tools: Unknown Error/MySQL errors - https://phabricator.wikimedia.org/T140421#2649959 (Nemo_bis) That's just replag https://tools.wmflabs.org/replag/
[19:37:20] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:39:54] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2649993 (chasemp)
[19:39:56] Labs, Beta-Cluster-Infrastructure: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2649990 (chasemp) Open>Resolved a:chasemp
[19:43:31] legoktm: start in 15min?
[19:43:43] yuvipanda: sure. I already created the instance
[19:45:10] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2650019 (chasemp)
[19:45:12] Labs, Beta-Cluster-Infrastructure: Request increased quota for deployment-prep labs project - https://phabricator.wikimedia.org/T145636#2650016 (chasemp) Open>Resolved a:chasemp should be gtg, there are a few stacked quota bumps for deployment-prep so let me know @fgiunchedi if you get hung u...
[19:45:28] Labs, Beta-Cluster-Infrastructure: Please raise quota for deployment-prep - https://phabricator.wikimedia.org/T145611#2650022 (hashar) New quotas: | Cores | 171/192 | RAM | 350208/392400
[19:45:35] legoktm: ok. did you already set it up with role::puppet::self?
[19:45:38] if not don't do it!
[19:45:42] no
[19:45:47] ok
[19:45:47] just literally created the instance
[19:46:22] ah ok :)
[19:46:25] legoktm: as jessie?
[19:46:30] of course :)
[19:47:03] legoktm: can you apply the role 'role::puppetmaster::standalone'?
[19:47:50] I have to use wikitech for that right?
[19:48:00] legoktm: yup
[19:48:05] horizon will get it in the next few days
[19:49:41] > Modified instance (integration-puppetmaster01).
[19:50:27] yuvipanda: do I need to force a puppet run or anything?
[19:50:44] legoktm: yeah
[19:50:53] or you can wait for the automatic run sometime in next 30min but force :D
[19:52:38] The last Puppet run was at Mon Sep 19 19:48:39 UTC 2016 (3 minutes ago).
[19:52:51] heh
[19:52:57] not sure if that was in time :S
[19:52:59] also
[19:53:00] integration-puppetmaster01 is a Puppet client of integration-puppetmaster.integration.eqiad.wmflabs (puppetclient)
[19:53:03] is that going to cause problems?
[19:53:26] apparently
[19:53:30] I tried to force puppet
[19:53:31] Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/puppet/fileserver.conf] is already declared in file /etc/puppet/modules/puppet/manifests/self/config.pp:101; cannot redeclare at /etc/puppet/modules/puppetmaster/manifests/config.pp:26 on node integration-puppetmaster01.integration.eqiad.wmflabs
[19:54:02] that's a different problem, caused by the fact that role::puppet::self is applied to all instances by the hiera config for the project
[19:54:55] puppetmasters don't need to be their own clients, do they?
[19:54:57] legoktm: I just did https://wikitech.wikimedia.org/wiki/Hiera:Integration/host/integration-puppetmaster01
[19:54:59] legoktm: try now?
[19:55:06] can just override the puppetmaster on the specific new puppetmaster instance?
[19:55:20] Krenair: that is irrelevant to this particular issue he's having tho
[19:55:39] yeah I was thinking about the other thing
[19:55:42] which is that the puppetmaster class and the puppet class can't co-exist, because role::puppet::self sets up the whole puppetmaster config and stuff even if it's only a client
[19:55:50] yuvipanda: same error
[19:56:06] legoktm: do a git pull -r origin production on the integration puppetmaster?
[19:56:08] I think those lag behind by up to ten minutes
[19:56:50] uh, where is the repo again?
[19:57:03] /var/lib/...something?
[19:57:42] /var/lib/git/operations/puppet
[19:57:53] uh, /var/lib/git is empty
[19:58:02] root@integration-puppetmaster01:/var/lib/git# ls
[19:58:45] legoktm: ok, I'm gonna poke around for a sec
[19:58:50] go for it
[20:00:32] man, I hate role::puppet::self so much
[20:01:16] legoktm: ok, I'm going to remove role::puppet::self from Hiera:Integration
[20:01:32] how much will that break everything else?
[20:01:34] legoktm: I'm tempted to disable puppet across the instances now
[20:01:50] this doesn't affect contintcloud right?
[20:01:57] nope
[20:01:59] just integration
[20:02:01] ok
[20:02:05] should be fine for now then
[20:02:10] just !log in -releng?
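For reference, the two manual steps discussed above — rebasing the in-project puppetmaster's checkout of operations/puppet and then forcing an agent run instead of waiting for the automatic one roughly every 30 minutes — boil down to two shell commands. The small wrapper below is a hypothetical convenience script, not anything that exists in the puppet repo; only the repository path and the `git pull -r origin production` invocation are taken from the log itself.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around the manual steps mentioned above."""
import subprocess

PUPPET_REPO = "/var/lib/git/operations/puppet"  # path quoted in the log


def pull_production(repo=PUPPET_REPO):
    # Rebase the local checkout onto the latest production branch, i.e. the
    # `git pull -r origin production` step suggested for the puppetmaster.
    subprocess.run(["git", "pull", "-r", "origin", "production"], cwd=repo, check=True)


def force_puppet_run():
    # `puppet agent --test` triggers an immediate run instead of waiting for
    # the automatic one. With --test, exit code 2 means "changes applied",
    # so only other non-zero codes are treated as failures.
    result = subprocess.run(["sudo", "puppet", "agent", "--test"])
    if result.returncode not in (0, 2):
        raise RuntimeError("puppet run failed (exit code %d)" % result.returncode)


if __name__ == "__main__":
    pull_production()
    force_puppet_run()
```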
[20:02:53] yeah ok
[20:04:15] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:04:53] PROBLEM - Puppet run on tools-puppetmaster-02 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[20:08:09] PROBLEM - Puppet run on tools-exec-1206 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:09:41] RECOVERY - Puppet run on tools-docker-builder-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:13:34] PROBLEM - Puppet run on tools-k8s-etcd-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:14:04] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:14:06] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:14:36] PROBLEM - Puppet run on tools-worker-1009 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:15:18] ^ problem?
[20:15:28] PROBLEM - Puppet run on tools-worker-1017 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[20:15:54] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[20:16:22] yeah looking
[20:17:20] PROBLEM - Puppet run on tools-flannel-etcd-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:17:26] PROBLEM - Puppet run on tools-webgrid-lighttpd-1209 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:18:06] no idea, it worked on the instance I just tried
[20:18:06] PROBLEM - Puppet run on tools-exec-1216 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:18:12] PROBLEM - Puppet run on tools-worker-1016 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[20:18:15] PROBLEM - Puppet run on tools-exec-gift is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:19:15] PROBLEM - Puppet run on tools-exec-1212 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[20:19:37] well, I'm in the middle of something else that is also time sensitive (integration switchover), so I'm going to let this fire burn
[20:26:17] yuvipanda: I tried to catch a few and so far they both succeed for me so...idk yet, slow but no errors
[20:26:26] so not sure but seems either not urgent or transient
[20:26:51] yeah
[20:34:12] RECOVERY - Puppet run on tools-exec-1212 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:39:35] RECOVERY - Puppet run on tools-worker-1009 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:43:09] RECOVERY - Puppet run on tools-exec-1206 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:16] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[20:44:54] RECOVERY - Puppet run on tools-puppetmaster-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:07] RECOVERY - Puppet run on tools-exec-1216 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:13] RECOVERY - Puppet run on tools-worker-1016 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:53:37] RECOVERY - Puppet run on tools-k8s-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:55:29] RECOVERY - Puppet run on tools-worker-1017 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:55:53] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:56:28] PAWS: python3-tk package missing - https://phabricator.wikimedia.org/T145362#2650344 (Tbayer) Open>Resolved a:yuvipanda >>! In T145362#2629234, @yuvipanda wrote: > Have you tried using `%matplotlib inline` instead of `%matplotlib`? The > former works better in notebooks. Yes, that works, thanks!...
[20:57:19] RECOVERY - Puppet run on tools-flannel-etcd-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:57:23] RECOVERY - Puppet run on tools-webgrid-lighttpd-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:58:13] RECOVERY - Puppet run on tools-exec-gift is OK: OK: Less than 1.00% above the threshold [0.0]
[20:59:04] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:59:06] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:22:58] Labs, Tool-Labs, Collaboration-Team-Triage, Community-Tech-Tool-Labs, and 5 others: Enable Flow on wikitech (labswiki and labtestwiki), then turn on for Tool talk namespace - https://phabricator.wikimedia.org/T127792#2650437 (Catrope)
[21:54:01] PROBLEM - Puppet run on tools-checker-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[21:54:38] PROBLEM - Puppet run on tools-worker-1021 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:54:40] PROBLEM - Puppet run on tools-exec-1209 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[21:54:44] PROBLEM - Puppet run on tools-worker-1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0]
[21:55:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1411 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[21:56:21] ^ all transient
[21:57:37] maybe clients are grouping on the in-project master and causing issues?
[21:58:00] chasemp: no, I merged the change which caused a puppetmaster restart
[21:58:14] so these were the ones that were running when the restart happened
[21:58:21] gotcha
[21:58:48] I'm writing docs now
[21:59:46] RECOVERY - Puppet run on tools-worker-1001 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:09:03] RECOVERY - Puppet run on tools-checker-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:14:33] Labs, Operations, Research-and-Data-Backlog, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2650663 (ggellerman) @chasemp Hi! This was on the Research & Data workboard. Because it looks like Yuvi is doing the work, we moved i...
[22:29:41] RECOVERY - Puppet run on tools-worker-1021 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:29:41] RECOVERY - Puppet run on tools-exec-1209 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:30:21] RECOVERY - Puppet run on tools-webgrid-lighttpd-1411 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:48:07] Labs, Operations, Research-and-Data-Backlog, hardware-requests: eqiad: 2 hardware access request for research labsdbs - https://phabricator.wikimedia.org/T146065#2650788 (DarTar) @chasemp @ggellerman yes, this is part of dedicated capex budget for FY16-17. If there are separate tickets where app...
[23:09:11] PAWS, Research-and-Data-Backlog: Create a mailing list for PAWS - https://phabricator.wikimedia.org/T129297#2650857 (DarTar) @leila this is now subject to the launch timeline, which is realistically going to be in Q3. I'm fine closing it since we'll need to plan the announcement/support strategy when the...