[00:13:31] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:48:29] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:33] 06Labs, 10Horizon: Stop requiring two-factor authentication for horizon.wikimedia.org - https://phabricator.wikimedia.org/T161473#3185606 (10MZMcBride) 05declined>03Open [02:18:16] 06Labs, 10Horizon: Stop requiring two-factor authentication for horizon.wikimedia.org - https://phabricator.wikimedia.org/T161473#3185611 (10MZMcBride) >>! In T161473#3140735, @Andrew wrote: > Wikitech has always required 2fa for any actions that affect VMs. I don't know what this means. I've never needed two... [05:22:32] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:57:32] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [06:09:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [06:16:56] 06Labs, 10DBA: S1 replag at 3 hours - https://phabricator.wikimedia.org/T163023#3185797 (10Marostegui) 05duplicate>03Resolved a:03jcrespo I believe this can be closed as all the slaves caught up after Jaime's fixes. ``` [root@labsdb1001 06:16 /root] # mysql --skip-ssl -e "show all slaves status\G" | egre... [06:27:55] 06Labs, 10Tool-Labs, 06Community-Tech: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3183006 (10Marostegui) So the server hasn't crashed again in almost 3 days which could be either the limit Jaime set and it is killing other slow queries or @Anomie s... [06:35:31] PROBLEM - Puppet run on tools-exec-1423 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [06:49:31] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [07:10:32] RECOVERY - Puppet run on tools-exec-1423 is OK: OK: Less than 1.00% above the threshold [0.0] [07:35:45] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/348416 (owner: 10L10n-bot) [08:41:00] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: Shut down "cewbot" - https://phabricator.wikimedia.org/T160264#3185873 (10Nemo_bis) >>! In T160264#3185229, @Kanashimi wrote: > I am sorry that it seems there are some bugs in my script. I have change some codes and trying to fix the problem. > Please [[ https://... [08:47:38] 06Labs, 10Tool-Labs: `fr-wikiversity` Tool should get deleted - https://phabricator.wikimedia.org/T133778#3185892 (10TerraCodes) [09:04:01] 06Labs, 10Tool-Labs, 10DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3185917 (10Marostegui) 37G of logfile in 4 days is quite a bit (we are logging warnings): ``` [root@labsdb1001 09:01 /srv/sqldata] # ls -lh labsdb1001.err -rw-r----- 1 mysql mysql 37G Ap... [09:27:17] is the max_user_connections setting new on toollabs? [09:27:25] erwin's tools are hitting it [09:30:49] akoopal: if I remember correctly it's set to something insanely high by default, and is reduced to a few if a user account is causing issues. What is it set to for erwin's tools? [09:31:18] akoopal: https://phabricator.wikimedia.org/T162519 [09:47:05] valhallasw`cloud: accoording to the error, 2 [09:47:54] akoopal: see the bug, that's about erwin's account. 
Indeed limited to 2 connections, due to overloading the database. [09:48:35] it is a collection of well used tools [09:48:55] akoopal: and even a collection of well used tools should not overload the database servers [09:49:14] nope, but not simple to debug [09:55:14] over the last 10000 lines in access.log: [09:55:17] 5942 /erwin85/xcontribs.php [09:55:18] 2307 /erwin85/randomarticle.php [09:55:18] 296 /erwin85/relatedchanges.php [09:55:18] 266 /erwin85/categorycount.php [09:55:19] 226 /erwin85/contribs.php [09:56:30] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185960 (10Akoopal) [09:57:44] akoopal: jcrespo is probably OK with increasing it to some higher amount (10 or so), if you believe that will make tools accessible again. [09:57:55] however, if 1400s queries are submitted, the tools /will/ still go down [09:58:10] because those queries will take up a connection each... for 1400s at a time. [09:58:23] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3165938 (10Akoopal) Did: t... [09:58:45] valhallasw`cloud: might be worth a try for now [09:59:46] added the tag for erwin's tools, suggested to split off xcontrib and randomarticle to start with, so the tools are less dependent [10:00:15] akoopal: the only script that has UNION_ALL is contribs.php [10:00:24] and for the rest going to further enjoy my holiday :-) [10:00:28] so that's likely the culprit [10:00:35] ok [10:00:47] PROBLEM - Host tools-exec-1433 is DOWN: CRITICAL - Host Unreachable (10.68.22.87) [10:02:54] valhallasw`cloud: can you add that? [10:12:42] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185981 (10Akoopal) From ir... [10:12:48] valhallasw`cloud: added myself to the bug [10:18:33] valhallasw`cloud: tool is at least working for me atm [10:18:58] if nobody else picks up will see what I can do later [10:20:39] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185989 (10valhallasw) The... [12:41:45] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3186178 (10chasemp) >>! In T162534#3172139, @chasemp wrote: > storage is not quota'd in the same fashion as RAM or CPU. I think what you guys want is described in...
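For reference on the limit discussed above: in MariaDB, a per-account cap like the one applied to s51362 is normally set and inspected via MAX_USER_CONNECTIONS. A minimal sketch, assuming admin access on the labsdb hosts; the '%' host pattern and the value 10 are placeholders, not what was actually configured for this account:
```
-- Sketch only: inspect the per-account connection cap for the tool's DB user
-- (run as an admin on labsdb1001/labsdb1003; standard MariaDB grant tables).
SELECT User, Host, max_user_connections
FROM mysql.user
WHERE User = 's51362';

-- Raising the cap to 10, as floated above, would look roughly like this;
-- '%' stands in for whatever host pattern the account really uses.
GRANT USAGE ON *.* TO 's51362'@'%' WITH MAX_USER_CONNECTIONS 10;

-- Note: the cap only bounds concurrency. 1400-second queries still hold a
-- connection each for their full runtime, so the slow queries need fixing too.
```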
[14:00:52] RECOVERY - Host tools-exec-1433 is UP: PING OK - Packet loss = 0%, RTA = 5.37 ms [14:10:26] PROBLEM - Puppet run on tools-exec-1431 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [14:16:03] PROBLEM - Host tools-exec-1433 is DOWN: CRITICAL - Host Unreachable (10.68.22.87) [14:22:00] 06Labs, 10Tool-Labs, 06Community-Tech: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3186281 (10Anomie) I've already gotten [[https://en.wikipedia.org/wiki/User_talk:AnomieBOT#Broken_redirects_task_seems_to_have_stopped|an inquiry]] from someone who n... [14:45:25] RECOVERY - Puppet run on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [14:59:15] (03CR) 10D3r1ck01: "> Perhaps the readme should reflect the *current* state of code base" [labs/tools/Wikimedia-Emoji-Bot] - 10https://gerrit.wikimedia.org/r/348010 (owner: 10D3r1ck01) [15:05:10] (03PS2) 10D3r1ck01: Update README.md file, add .env.example & .gitignore [labs/tools/Wikimedia-Emoji-Bot] - 10https://gerrit.wikimedia.org/r/348010 [15:10:37] 06Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3186318 (10bd808) >>! In T161898#3180464, @chasemp wrote: > - 9107 tools.best-image:uwsgi-python-best-image > - 9115 tools.framabot:lighttpd-framabot These were both badly broken webse... [15:55:40] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3186481 (10MarkAHershberger) >>! In T162534#3186178, @chasemp wrote: > any luck? Looks straight-forward enough. I'm out this week and @DeepBlue was gone this w/e... [16:16:16] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3186559 (10Cyberpower678) 05Resolved>03Open Well, big brother isn't able to do it's job. My workers will not start. [16:27:31] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3148214 (10valhallasw) As far as I can see, ``` valhallasw@tools-bastion-03:/data/project/iabot$ qstat -u tools.iabot job-ID prior name u... [17:21:25] 06Labs, 06Operations, 13Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3186744 (10Andrew) 05Open>03Resolved Five days without leaks or failures, so I think this is resolved. [17:21:49] !log tools adding 8 more exec nodes: tools-exec-1435 through 1442 [17:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:47:30] thanks andrewbogott [17:53:04] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3186923 (10Cyberpower678) >>! In T161951#3186597, @valhallasw wrote: > As far as I can see, > ``` > valhallasw@tools-bastion-03:/data/project/iabo... [18:03:50] 06Labs: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3186992 (10Andrew) [18:31:27] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: ruarbcom tool runs count.py job once per minute - https://phabricator.wikimedia.org/T163075#3185255 (10Framawiki) I think that you really could set cronjob to run every five minutes for example. It's five times less ! 
Please remember that you use shared service.... [18:34:09] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: ruarbcom tool runs count.py job once per minute - https://phabricator.wikimedia.org/T163075#3187159 (10Kalan) Continuous jobs are really an option. I will convert my script into such a thing before next elections. [18:38:24] 06Labs, 13Patch-For-Review: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187173 (10Andrew) With that patch the error is now Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/init.d/ircecho] is already declared in file /etc/p... [18:47:39] 06Labs, 13Patch-For-Review: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187219 (10Andrew) 05Open>03Resolved a:03Andrew [18:48:21] 06Labs: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187228 (10Paladox) [18:49:34] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:50:50] RECOVERY - Host tools-exec-1433 is UP: PING OK - Packet loss = 0%, RTA = 1.49 ms [18:54:37] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [19:21:38] 06Labs, 10Labs-Infrastructure: Monitor dhcp/dnsmasq on labnet - https://phabricator.wikimedia.org/T162956#3187341 (10Andrew) If we run the test on a labs instance, it's easy: Just 'check_dhcp' in nagios. Not sure we have alerting from within labs though. [19:35:24] !log tools add reedy to sudo all perms so he can admin things [19:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:02:18] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3187553 (10Dzahn) p:05Triage>03Normal [20:02:38] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3166400 (10Dzahn) a:03DeepBlue [20:35:00] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187722 (10Paladox) bump [20:37:34] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187737 (10chasemp) >>! In T162629#3175607, @Paladox wrote: > @chasemp Hi, any update on this please? I would like to start testing https://gerrit.wikimedia.org/r/... [20:38:49] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187740 (10Paladox) Oh i think that was -2 on it ever being in wmf. But i can still do it in labs. As he said i am free to do it in labs with testing before upload... [20:39:59] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187742 (10Paladox) It's probably because i had this "This is a class that will hopefully replace icinga 1.x in wikimedia but manly for us to use in labs to begin... 
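On the ircecho failure quoted above ("Duplicate declaration: File[/etc/init.d/ircecho] is already declared"): this is the generic Puppet error for two classes declaring the same resource. Purely as an illustration of the pattern, not the actual T163129 patch, stdlib's ensure_resource() is one common way to let two code paths reference the same file without colliding; the mode and source values below are hypothetical:
```
# Illustration only -- not the real ircecho manifests from T163129.
# Declaring file { '/etc/init.d/ircecho': } in two classes fails catalog
# compilation with "Duplicate declaration". ensure_resource() (from
# puppetlabs-stdlib) adds the resource only if it is not already declared:
ensure_resource('file', '/etc/init.d/ircecho', {
  'ensure' => 'file',
  'mode'   => '0755',                                   # hypothetical
  'source' => 'puppet:///modules/ircecho/ircecho.init', # hypothetical
})
```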
[20:41:32] !log tools Building tools-docker-builder-05 [20:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:47:02] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187814 (10Paladox) @chasemp i could turn the instance into it's own puppet master allowing me to create a new puppet class and for testing. [20:53:32] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3169464 (10bd808) @Paladox do you have any discussion going with anyone at all about whether there is a desire for icinga2 and any explanation of the benefits it m... [20:56:49] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187853 (10Paladox) >>! In T162629#3187844, @bd808 wrote: > @Paladox do you have any discussion going with anyone at all about whether there is a desire for icinga... [21:16:13] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187887 (10Paladox) @bd808 hi, i've been looking at https://terrty.net/2016/shinken-vs-sensu-vs-icinga2-vs-zabbix/ looking mostly at shinken vs icinga2. According... [21:23:24] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187911 (10bd808) >>! In T162629#3187853, @Paladox wrote: >>>! In T162629#3187844, @bd808 wrote: >> @Paladox do you have any discussion going with anyone at all ab... [21:35:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:35:32] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:35:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:35:52] PROBLEM - Puppet run on tools-elastic-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:36:06] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:36:26] PROBLEM - Puppet run on tools-exec-1431 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:36:39] andrewbogott: madhuvishy any idea on puppet issues^? [21:36:55] nope, but I'll look [21:37:02] PROBLEM - Puppet run on tools-exec-1421 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:37:08] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:37:09] I'm running it on bastion atm to see too... [21:37:19] Error: Failed to apply catalog: Could not find dependency Package[lvm2] for File[/usr/local/sbin/make-instance-vg] at /etc/puppet/modules/labs_lvm/manifests/init.pp:27 [21:37:22] uh ok [21:37:28] someone must have merged something? 
[21:37:40] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:37:42] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:37:43] or apt went crazy [21:38:04] i lvm2 - Linux Logical Volume Manager [21:38:11] yeah, looks like a real issue [21:38:38] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:38:43] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:38:55] PROBLEM - Puppet run on tools-prometheus-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:38:57] same all around [21:38:58] Error: Failed to apply catalog: Could not find dependency Package[lvm2] for File[/usr/local/sbin/make-instance-vg] at /etc/puppet/modules/labs_lvm/manifests/init.pp:27 [21:39:00] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:39:01] yeah I didn't merge anything [21:39:05] * madhuvishy looks [21:39:06] PROBLEM - Puppet run on tools-exec-1438 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:39:16] PROBLEM - Puppet run on tools-redis-1002 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:39:16] package { ['lvm2', 'parted']: [21:39:16] ensure => present, [21:39:17] } [21:39:20] PROBLEM - Puppet run on tools-prometheus-01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [21:39:22] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:39:40] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:39:48] PROBLEM - Puppet run on tools-exec-1440 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:40:15] chasemp: that file hasn't changed in years [21:40:20] root@tools-webgrid-lighttpd-1418:~# apt-get install lvm2 [21:40:20] Reading package lists... Done [21:40:20] Building dependency tree [21:40:22] Reading state information... Done [21:40:24] lvm2 is already the newest version. [21:40:29] PROBLEM - Puppet run on tools-static-11 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:40:30] the package is available, it's the puppet def of the package that's missing [21:40:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:40:39] andrewbogott: ahh [21:40:39] 10PAWS: Notebook fails to save, throwing "413 Request Entity Too Large" error - https://phabricator.wikimedia.org/T163157#3187967 (10Tbayer) [21:40:40] right [21:40:44] ok... [21:41:07] modules/lvm/manifests/init.pp: package { 'lvm2': [21:41:21] or well, it's there right? [21:41:33] not in scope though I guess? [21:41:37] this should be fine tho [21:41:37] package { ['lvm2', 'parted']: [21:41:42] for source => 'puppet:///modules/labs_lvm/make-instance-vg', [21:41:55] in modules/labs_lvm/manifests/init.pp [21:42:19] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:42:20] chasemp https://gerrit.wikimedia.org/r/#/c/348624/ bd*808 looks like he fixed it in that change. 
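Putting the fragments pasted above together, modules/labs_lvm/manifests/init.pp presumably looks roughly like the sketch below (only the package list and the source line are quoted in the log; the rest is guessed). "Could not find dependency Package[lvm2]" is what Puppet reports when the file's require points at a package resource that is no longer present anywhere in the compiled catalog:
```
# Approximate reconstruction from the snippets quoted above, not the real file.
class labs_lvm {
    package { ['lvm2', 'parted']:
        ensure => present,
    }

    file { '/usr/local/sbin/make-instance-vg':
        ensure  => present,           # guessed; not quoted in the log
        mode    => '0544',            # guessed; not quoted in the log
        source  => 'puppet:///modules/labs_lvm/make-instance-vg',
        require => Package['lvm2'],   # the dependency named in the error
    }
}
```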
[21:42:27] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:42:59] I'm going to do a bisect test on the tools puppetmaster. [21:43:11] andrewbogott: https://gerrit.wikimedia.org/r/#/c/348624/ [21:43:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:44:14] chasemp: and that's merged? Or... [21:44:22] it isn't [21:44:23] PROBLEM - Puppet run on tools-exec-1429 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:44:35] I'm very confused as it isn't andrewbogott but something changed here -- must have [21:44:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:44:43] bd808: lvm in labs? something changed? [21:44:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:44:53] oh, it's cherry-picked on the tools puppetmaster [21:44:57] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:44:58] oh shit that [21:45:02] bad news bears [21:45:03] PROBLEM - Puppet run on tools-exec-1435 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:45:04] I'll fix [21:45:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:45:16] is my cherry pick breaking the world? [21:45:19] PROBLEM - Puppet run on tools-elastic-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:45:23] seems like it [21:45:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:45:34] I can toss it if nobody else has [21:45:41] bd808: I think andrewbogott is [21:45:43] PROBLEM - Puppet run on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:45:48] bd808: yeah, we have a fairly solid no cherry picks pact without !log [21:45:59] or at least yuvi and I came to that end [21:46:00] yeah, should be fixed now [21:46:02] I'm a horrible person [21:46:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:47:04] I was trying to make the new docker builder node not suck, but the naive hack was not a good one [21:47:20] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:47:26] mainly I was confused as hell, no worries tho, but in general I would even go so far as to say the channel title here or in -admin should reflect a cherry-pick, or at the least a SAL entry and an IRC ping [21:47:31] but that's based on it being not common [21:47:44] the docker stuff that was made for prod installs lvm2 and conflicts with our role for mounting unallocated disk [21:47:52] merp [21:48:06] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:48:15] ugh [21:48:18] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:48:34] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:49:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
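bd808's diagnosis above (the production Docker manifests and labs_lvm both wanting lvm2) is the classic shared-package collision. As a general pattern only, not the change that was actually merged for this incident, stdlib's ensure_packages() lets unrelated roles declare the same package without a duplicate declaration while still being able to depend on it; both class names below are hypothetical:
```
# General pattern, not the actual fix deployed for this incident.
# ensure_packages() (from puppetlabs-stdlib) only declares a package if no
# other class already has, so two unrelated roles can both ask for lvm2 and
# still write `require => Package['lvm2']` on their own resources.

class labs_lvm::packages {          # hypothetical wrapper class
    ensure_packages(['lvm2', 'parted'])
}

class profile::docker::builder {    # hypothetical class name
    ensure_packages(['lvm2'])       # no duplicate-declaration error
}
```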
[21:50:39] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:50:55] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:50:59] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:51:11] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:51:15] holy emails batman [22:01:28] RECOVERY - Puppet run on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:58] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:38] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [22:13:56] RECOVERY - Puppet run on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:38] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:49] RECOVERY - Puppet run on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:29] RECOVERY - Puppet run on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:33] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:41] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:49] RECOVERY - Puppet run on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:16:07] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:02] RECOVERY - Puppet run on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:08] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:26] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:38] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:08] RECOVERY - Puppet run on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:16] RECOVERY - Puppet run on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:22] RECOVERY - Puppet run on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:24] RECOVERY - Puppet run on tools-exec-1429 is OK: OK: Less than 1.00% above the threshold [0.0] [22:20:30] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [22:22:22] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:18] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:21] RECOVERY - 
Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:43] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:53] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:59] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:03] RECOVERY - Puppet run on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:23] RECOVERY - Puppet run on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:29] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:36] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:44] RECOVERY - Puppet run on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:10] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:20] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:28:06] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [22:28:37] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [23:33:46] 06Labs, 06Developer-Relations (Apr-Jun 2017): Recover Zulip/Mattermost instance on Labs - https://phabricator.wikimedia.org/T162960#3188321 (10srishakatux) 05Open>03Resolved Conversation w/ Yuvipanda who has already done quite a lot of research in this area says: - Mattermost is totally opencore - but we... [23:35:59] PROBLEM - Puppet run on tools-exec-1436 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:39:03] 10Tool-Labs-tools-Xtools: "not valid wiki" error when I check Commons' contributions in X!s Tools - https://phabricator.wikimedia.org/T161427#3188365 (10Matthewrbowker) 05Open>03Resolved I have opened a pull request against the rewrite repository. This should fix the issue, by allowing us to "map" commonswi... [23:49:02] PROBLEM - Puppet run on tools-docker-builder-05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [23:49:35] That tools-docker-builder-05 alert is probably me messing about [23:50:40] 06Labs, 06Developer-Relations (Apr-Jun 2017), 03Google-Summer-of-Code (2017), 10Outreachy (Round-14): Set up a Zulip instance on tool Labs - https://phabricator.wikimedia.org/T163169#3188402 (10srishakatux)