[00:13:31] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [00:48:29] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [02:10:33] 06Labs, 10Horizon: Stop requiring two-factor authentication for horizon.wikimedia.org - https://phabricator.wikimedia.org/T161473#3185606 (10MZMcBride) 05declined>03Open [02:18:16] 06Labs, 10Horizon: Stop requiring two-factor authentication for horizon.wikimedia.org - https://phabricator.wikimedia.org/T161473#3185611 (10MZMcBride) >>! In T161473#3140735, @Andrew wrote: > Wikitech has always required 2fa for any actions that affect VMs. I don't know what this means. I've never needed two... [05:22:32] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [05:57:32] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [06:09:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [06:16:56] 06Labs, 10DBA: S1 replag at 3 hours - https://phabricator.wikimedia.org/T163023#3185797 (10Marostegui) 05duplicate>03Resolved a:03jcrespo I believe this can be closed as all the slaves caught up after Jaime's fixes. ``` [root@labsdb1001 06:16 /root] # mysql --skip-ssl -e "show all slaves status\G" | egre... [06:27:55] 06Labs, 10Tool-Labs, 06Community-Tech: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3183006 (10Marostegui) So the server hasn't crashed again in almost 3 days which could be either the limit Jaime set and it is killing other slow queries or @Anomie s... [06:35:31] PROBLEM - Puppet run on tools-exec-1423 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [06:49:31] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [07:10:32] RECOVERY - Puppet run on tools-exec-1423 is OK: OK: Less than 1.00% above the threshold [0.0] [07:35:45] (03CR) 10jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - 10https://gerrit.wikimedia.org/r/348416 (owner: 10L10n-bot) [08:41:00] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: Shut down "cewbot" - https://phabricator.wikimedia.org/T160264#3185873 (10Nemo_bis) >>! In T160264#3185229, @Kanashimi wrote: > I am sorry that it seems there are some bugs in my script. I have change some codes and trying to fix the problem. > Please [[ https://... [08:47:38] 06Labs, 10Tool-Labs: `fr-wikiversity` Tool should get deleted - https://phabricator.wikimedia.org/T133778#3185892 (10TerraCodes) [09:04:01] 06Labs, 10Tool-Labs, 10DBA: labsdb1001 and labsdb1003 short on available space - https://phabricator.wikimedia.org/T132431#3185917 (10Marostegui) 37G of logfile in 4 days is quite a bit (we are logging warnings): ``` [root@labsdb1001 09:01 /srv/sqldata] # ls -lh labsdb1001.err -rw-r----- 1 mysql mysql 37G Ap... [09:27:17] is the max_user_connections setting new on toollabs? [09:27:25] erwin's tools are hitting it [09:30:49] akoopal: if I remember correctly it's set to something insanely high by default, and is reduced to a few if a user account is causing issues. What is it set to for erwin's tools? [09:31:18] akoopal: https://phabricator.wikimedia.org/T162519 [09:47:05] valhallasw`cloud: accoording to the error, 2 [09:47:54] akoopal: see the bug, that's about erwin's account. 
Indeed limited to 2 connections, due to overloading the database. [09:48:35] it is a collection of well used tools [09:48:55] akoopal: and even a collection of well used tools should not overload the database servers [09:49:14] nope, but not simple to debug [09:55:14] over the last 10000 lines in access.log: [09:55:17] 5942 /erwin85/xcontribs.php [09:55:18] 2307 /erwin85/randomarticle.php [09:55:18] 296 /erwin85/relatedchanges.php [09:55:18] 266 /erwin85/categorycount.php [09:55:19] 226 /erwin85/contribs.php [09:56:30] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185960 (10Akoopal) [09:57:44] akoopal: jcrespo is probably OK with increasing it to some higher amount (10 or so), if you believe that will make tools accessible again. [09:57:55] however, if 1400s queries are submitted, the tools /will/ still go down [09:58:10] because those queries will take up a connection each... for 1400s at a time. [09:58:23] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3165938 (10Akoopal) Did: t... [09:58:45] valhallasw`cloud: might be worth a try for now [09:59:46] added the tag for erwin's tools, suggested to split off xcontrib and randomarticle to start with, so the tools are less dependent [10:00:15] akoopal: the only script that has UNION_ALL is contribs.php [10:00:24] and for the rest going to further enjoy my holiday :-) [10:00:28] so that's likely the culprit [10:00:35] ok [10:00:47] PROBLEM - Host tools-exec-1433 is DOWN: CRITICAL - Host Unreachable (10.68.22.87) [10:02:54] valhallasw`cloud: can you add that? [10:12:42] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185981 (10Akoopal) From ir... [10:12:48] valhallasw`cloud: added myself to the bug [10:18:33] valhallasw`cloud: tool is at least working for me atm [10:18:58] if nobody else picks up will see what I can do later [10:20:39] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Erwin's-tools, 10Tool-Labs-tools-Other, 10DBA: s51362 has been rate limited to 2 concurrent connections for creating hundreds of 1400-second queries to labsdb1001 and labsdb1003 every 10 seconds - https://phabricator.wikimedia.org/T162519#3185989 (10valhallasw) The... [12:41:45] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3186178 (10chasemp) >>! In T162534#3172139, @chasemp wrote: > storage is not quota'd in the same fashion as RAM or CPU. I think what you guys want is described in...
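For reference on the limit discussed above: in MariaDB, a per-account cap like the one applied to s51362 is normally set and inspected via MAX_USER_CONNECTIONS. A minimal sketch, assuming admin access on the labsdb hosts; the '%' host pattern and the value 10 are placeholders, not what was actually configured for this account:
```
-- Sketch only: inspect the per-account connection cap for the tool's DB user
-- (run as an admin on labsdb1001/labsdb1003; standard MariaDB grant tables).
SELECT User, Host, max_user_connections
FROM mysql.user
WHERE User = 's51362';

-- Raising the cap to 10, as floated above, would look roughly like this;
-- '%' stands in for whatever host pattern the account really uses.
GRANT USAGE ON *.* TO 's51362'@'%' WITH MAX_USER_CONNECTIONS 10;

-- Note: the cap only bounds concurrency. 1400-second queries still hold a
-- connection each for their full runtime, so the slow queries need fixing too.
```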
[14:00:52] RECOVERY - Host tools-exec-1433 is UP: PING OK - Packet loss = 0%, RTA = 5.37 ms [14:10:26] PROBLEM - Puppet run on tools-exec-1431 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [14:16:03] PROBLEM - Host tools-exec-1433 is DOWN: CRITICAL - Host Unreachable (10.68.22.87) [14:22:00] 06Labs, 10Tool-Labs, 06Community-Tech: labsdb1001 crashing regularly in the last 2 days due to OOM - https://phabricator.wikimedia.org/T163001#3186281 (10Anomie) I've already gotten [[https://en.wikipedia.org/wiki/User_talk:AnomieBOT#Broken_redirects_task_seems_to_have_stopped|an inquiry]] from someone who n... [14:45:25] RECOVERY - Puppet run on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [14:59:15] (03CR) 10D3r1ck01: "> Perhaps the readme should reflect the *current* state of code base" [labs/tools/Wikimedia-Emoji-Bot] - 10https://gerrit.wikimedia.org/r/348010 (owner: 10D3r1ck01) [15:05:10] (03PS2) 10D3r1ck01: Update README.md file, add .env.example & .gitignore [labs/tools/Wikimedia-Emoji-Bot] - 10https://gerrit.wikimedia.org/r/348010 [15:10:37] 06Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3186318 (10bd808) >>! In T161898#3180464, @chasemp wrote: > - 9107 tools.best-image:uwsgi-python-best-image > - 9115 tools.framabot:lighttpd-framabot These were both badly broken webse... [15:55:40] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3186481 (10MarkAHershberger) >>! In T162534#3186178, @chasemp wrote: > any luck? Looks straight-forward enough. I'm out this week and @DeepBlue was gone this w/e... [16:16:16] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3186559 (10Cyberpower678) 05Resolved>03Open Well, big brother isn't able to do it's job. My workers will not start. [16:27:31] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3148214 (10valhallasw) As far as I can see, ``` valhallasw@tools-bastion-03:/data/project/iabot$ qstat -u tools.iabot job-ID prior name u... [17:21:25] 06Labs, 06Operations, 13Patch-For-Review: Instance creation fails before first puppet run around 1% of the time - https://phabricator.wikimedia.org/T160908#3186744 (10Andrew) 05Open>03Resolved Five days without leaks or failures, so I think this is resolved. [17:21:49] !log tools adding 8 more exec nodes: tools-exec-1435 through 1442 [17:21:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [17:47:30] thanks andrewbogott [17:53:04] 06Labs, 10Tool-Labs, 10InternetArchiveBot: tools.iabot is overloading the grid by running too many workers in parallel - https://phabricator.wikimedia.org/T161951#3186923 (10Cyberpower678) >>! In T161951#3186597, @valhallasw wrote: > As far as I can see, > ``` > valhallasw@tools-bastion-03:/data/project/iabo... [18:03:50] 06Labs: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3186992 (10Andrew) [18:31:27] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: ruarbcom tool runs count.py job once per minute - https://phabricator.wikimedia.org/T163075#3185255 (10Framawiki) I think that you really could set cronjob to run every five minutes for example. It's five times less ! 
Please remember that you use shared service.... [18:34:09] 06Labs, 10Tool-Labs, 10Tool-Labs-tools-Other: ruarbcom tool runs count.py job once per minute - https://phabricator.wikimedia.org/T163075#3187159 (10Kalan) Continuous jobs are really an option. I will convert my script into such a thing before next elections. [18:38:24] 06Labs, 13Patch-For-Review: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187173 (10Andrew) With that patch the error is now Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Duplicate declaration: File[/etc/init.d/ircecho] is already declared in file /etc/p... [18:47:39] 06Labs, 13Patch-For-Review: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187219 (10Andrew) 05Open>03Resolved a:03Andrew [18:48:21] 06Labs: Puppet breakage for ircecho - https://phabricator.wikimedia.org/T163129#3187228 (10Paladox) [18:49:34] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [18:50:50] RECOVERY - Host tools-exec-1433 is UP: PING OK - Packet loss = 0%, RTA = 1.49 ms [18:54:37] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [19:21:38] 06Labs, 10Labs-Infrastructure: Monitor dhcp/dnsmasq on labnet - https://phabricator.wikimedia.org/T162956#3187341 (10Andrew) If we run the test on a labs instance, it's easy: Just 'check_dhcp' in nagios. Not sure we have alerting from within labs though. [19:35:24] !log tools add reedy to sudo all perms so he can admin things [19:35:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:02:18] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3187553 (10Dzahn) p:05Triage>03Normal [20:02:38] 06Labs, 10Labs-Infrastructure, 10WikiApiary: Requesting more disk space a Wikiapiary project instance - https://phabricator.wikimedia.org/T162534#3166400 (10Dzahn) a:03DeepBlue [20:35:00] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187722 (10Paladox) bump [20:37:34] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187737 (10chasemp) >>! In T162629#3175607, @Paladox wrote: > @chasemp Hi, any update on this please? I would like to start testing https://gerrit.wikimedia.org/r/... [20:38:49] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187740 (10Paladox) Oh i think that was -2 on it ever being in wmf. But i can still do it in labs. As he said i am free to do it in labs with testing before upload... [20:39:59] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187742 (10Paladox) It's probably because i had this "This is a class that will hopefully replace icinga 1.x in wikimedia but manly for us to use in labs to begin... 
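On the ircecho failure quoted above ("Duplicate declaration: File[/etc/init.d/ircecho] is already declared"): this is the generic Puppet error for two classes declaring the same resource. Purely as an illustration of the pattern, not the actual T163129 patch, stdlib's ensure_resource() is one common way to let two code paths reference the same file without colliding; the mode and source values below are hypothetical:
```
# Illustration only -- not the real ircecho manifests from T163129.
# Declaring file { '/etc/init.d/ircecho': } in two classes fails catalog
# compilation with "Duplicate declaration". ensure_resource() (from
# puppetlabs-stdlib) adds the resource only if it is not already declared:
ensure_resource('file', '/etc/init.d/ircecho', {
  'ensure' => 'file',
  'mode'   => '0755',                                   # hypothetical
  'source' => 'puppet:///modules/ircecho/ircecho.init', # hypothetical
})
```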
[20:41:32] !log tools Building tools-docker-builder-05 [20:41:36] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL [20:47:02] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187814 (10Paladox) @chasemp i could turn the instance into it's own puppet master allowing me to create a new puppet class and for testing. [20:53:32] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3169464 (10bd808) @Paladox do you have any discussion going with anyone at all about whether there is a desire for icinga2 and any explanation of the benefits it m... [20:56:49] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187853 (10Paladox) >>! In T162629#3187844, @bd808 wrote: > @Paladox do you have any discussion going with anyone at all about whether there is a desire for icinga... [21:16:13] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187887 (10Paladox) @bd808 hi, i've been looking at https://terrty.net/2016/shinken-vs-sensu-vs-icinga2-vs-zabbix/ looking mostly at shinken vs icinga2. According... [21:23:24] 06Labs, 10Monitoring, 10Shinken: Admin request for user paladox and Luke081515 in the project shinken - https://phabricator.wikimedia.org/T162629#3187911 (10bd808) >>! In T162629#3187853, @Paladox wrote: >>>! In T162629#3187844, @bd808 wrote: >> @Paladox do you have any discussion going with anyone at all ab... [21:35:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1405 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:35:32] PROBLEM - Puppet run on tools-exec-1414 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:35:42] PROBLEM - Puppet run on tools-exec-1411 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:35:52] PROBLEM - Puppet run on tools-elastic-03 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:36:06] PROBLEM - Puppet run on tools-exec-1401 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:36:26] PROBLEM - Puppet run on tools-exec-1431 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:36:39] andrewbogott: madhuvishy any idea on puppet issues^? [21:36:55] nope, but I'll look [21:37:02] PROBLEM - Puppet run on tools-exec-1421 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:37:08] PROBLEM - Puppet run on tools-exec-1407 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:37:09] I'm running it on bastion atm to see too... [21:37:19] Error: Failed to apply catalog: Could not find dependency Package[lvm2] for File[/usr/local/sbin/make-instance-vg] at /etc/puppet/modules/labs_lvm/manifests/init.pp:27 [21:37:22] uh ok [21:37:28] someone must have merged something? 
[21:37:40] PROBLEM - Puppet run on tools-exec-1416 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:37:42] PROBLEM - Puppet run on tools-webgrid-lighttpd-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:37:43] or apt went crazy [21:38:04] i lvm2 - Linux Logical Volume Manager [21:38:11] yeah, looks like a real issue [21:38:38] PROBLEM - Puppet run on tools-exec-1406 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:38:43] PROBLEM - Puppet run on tools-services-01 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:38:55] PROBLEM - Puppet run on tools-prometheus-02 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:38:57] same all around [21:38:58] Error: Failed to apply catalog: Could not find dependency Package[lvm2] for File[/usr/local/sbin/make-instance-vg] at /etc/puppet/modules/labs_lvm/manifests/init.pp:27 [21:39:00] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:39:01] yeah I didn't merge anything [21:39:05] * madhuvishy looks [21:39:06] PROBLEM - Puppet run on tools-exec-1438 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:39:16] PROBLEM - Puppet run on tools-redis-1002 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:39:16] package { ['lvm2', 'parted']: [21:39:16] ensure => present, [21:39:17] } [21:39:20] PROBLEM - Puppet run on tools-prometheus-01 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [21:39:22] PROBLEM - Puppet run on tools-bastion-03 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [21:39:40] PROBLEM - Puppet run on tools-exec-1417 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:39:48] PROBLEM - Puppet run on tools-exec-1440 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:40:15] chasemp: that file hasn't changed in years [21:40:20] root@tools-webgrid-lighttpd-1418:~# apt-get install lvm2 [21:40:20] Reading package lists... Done [21:40:20] Building dependency tree [21:40:22] Reading state information... Done [21:40:24] lvm2 is already the newest version. [21:40:29] PROBLEM - Puppet run on tools-static-11 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:40:30] the package is available, it's the puppet def of the package that's missing [21:40:30] PROBLEM - Puppet run on tools-exec-1430 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:40:39] andrewbogott: ahh [21:40:39] 10PAWS: Notebook fails to save, throwing "413 Request Entity Too Large" error - https://phabricator.wikimedia.org/T163157#3187967 (10Tbayer) [21:40:40] right [21:40:44] ok... [21:41:07] modules/lvm/manifests/init.pp: package { 'lvm2': [21:41:21] or well, it's there right? [21:41:33] not in scope though I guess? [21:41:37] this should be fine tho [21:41:37] package { ['lvm2', 'parted']: [21:41:42] for source => 'puppet:///modules/labs_lvm/make-instance-vg', [21:41:55] in modules/labs_lvm/manifests/init.pp [21:42:19] PROBLEM - Puppet run on tools-exec-1410 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:42:20] chasemp https://gerrit.wikimedia.org/r/#/c/348624/ bd*808 looks like he fixed it in that change. 
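Putting the fragments pasted above together, modules/labs_lvm/manifests/init.pp presumably looks roughly like the sketch below (only the package list and the source line are quoted in the log; the rest is guessed). "Could not find dependency Package[lvm2]" is what Puppet reports when the file's require points at a package resource that is no longer present anywhere in the compiled catalog:
```
# Approximate reconstruction from the snippets quoted above, not the real file.
class labs_lvm {
    package { ['lvm2', 'parted']:
        ensure => present,
    }

    file { '/usr/local/sbin/make-instance-vg':
        ensure  => present,           # guessed; not quoted in the log
        mode    => '0544',            # guessed; not quoted in the log
        source  => 'puppet:///modules/labs_lvm/make-instance-vg',
        require => Package['lvm2'],   # the dependency named in the error
    }
}
```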
[21:42:27] PROBLEM - Puppet run on tools-checker-02 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0] [21:42:59] I'm going to do a bisect test on the tools puppetmaster. [21:43:11] andrewbogott: https://gerrit.wikimedia.org/r/#/c/348624/ [21:43:31] PROBLEM - Puppet run on tools-webgrid-lighttpd-1403 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:44:14] chasemp: and that's merged? Or... [21:44:22] it isn't [21:44:23] PROBLEM - Puppet run on tools-exec-1429 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:44:35] I'm very confused as it isn't andrewbogott but something changed here -- must have [21:44:41] PROBLEM - Puppet run on tools-bastion-02 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:44:43] bd808: lvm in labs? something changed? [21:44:51] PROBLEM - Puppet run on tools-webgrid-lighttpd-1409 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:44:53] oh, it's cherry-picked on the tools puppetmaster [21:44:57] PROBLEM - Puppet run on tools-webgrid-lighttpd-1408 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [21:44:58] oh shit that [21:45:02] bad news bears [21:45:03] PROBLEM - Puppet run on tools-exec-1435 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:45:04] I'll fix [21:45:09] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:45:16] is my cherry pick breaking the world? [21:45:19] PROBLEM - Puppet run on tools-elastic-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0] [21:45:23] seems like it [21:45:29] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:45:34] I can toss it if nobody else has [21:45:41] bd808: I think andrewbogott is [21:45:43] PROBLEM - Puppet run on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [21:45:48] bd808: yeah, we have a fairly solid no cherry picks pact without !log [21:45:59] or at least yuvi and I came to that end [21:46:00] yeah, should be fixed now [21:46:02] I'm a horrible person [21:46:11] PROBLEM - Puppet run on tools-webgrid-lighttpd-1410 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0] [21:47:04] I was trying to make the new docker builder node not suck, but the naive hack was not a good one [21:47:20] PROBLEM - Puppet run on tools-exec-1412 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:47:26] mainly I was confused as hell, no worries tho, but in general I would even go so far as to say the channel title here or in -admin should reflect a cherry-pick, or at the least a SAL entry and an IRC ping [21:47:31] but that's based on it being not common [21:47:44] the docker stuff that was made for prod installs lvm2 and conflicts with our role for mounting unallocated disk [21:47:52] merp [21:48:06] PROBLEM - Puppet run on tools-exec-1432 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [21:48:15] ugh [21:48:18] PROBLEM - Puppet run on tools-mail is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:48:34] PROBLEM - Puppet run on tools-exec-1434 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0] [21:49:20] PROBLEM - Puppet run on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
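bd808's diagnosis above (the production Docker manifests and labs_lvm both wanting lvm2) is the classic shared-package collision. As a general pattern only, not the change that was actually merged for this incident, stdlib's ensure_packages() lets unrelated roles declare the same package without a duplicate declaration while still being able to depend on it; both class names below are hypothetical:
```
# General pattern, not the actual fix deployed for this incident.
# ensure_packages() (from puppetlabs-stdlib) only declares a package if no
# other class already has, so two unrelated roles can both ask for lvm2 and
# still write `require => Package['lvm2']` on their own resources.

class labs_lvm::packages {          # hypothetical wrapper class
    ensure_packages(['lvm2', 'parted'])
}

class profile::docker::builder {    # hypothetical class name
    ensure_packages(['lvm2'])       # no duplicate-declaration error
}
```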
[21:50:39] PROBLEM - Puppet run on tools-exec-1442 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:50:55] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:50:59] PROBLEM - Puppet run on tools-exec-1418 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0] [21:51:11] PROBLEM - Puppet run on tools-cron-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0] [21:51:15] holy emails batman [22:01:28] RECOVERY - Puppet run on tools-exec-1431 is OK: OK: Less than 1.00% above the threshold [0.0] [22:10:58] RECOVERY - Puppet run on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:38] RECOVERY - Puppet run on tools-exec-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [22:12:40] RECOVERY - Puppet run on tools-webgrid-lighttpd-1418 is OK: OK: Less than 1.00% above the threshold [0.0] [22:13:56] RECOVERY - Puppet run on tools-prometheus-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:38] RECOVERY - Puppet run on tools-exec-1417 is OK: OK: Less than 1.00% above the threshold [0.0] [22:14:49] RECOVERY - Puppet run on tools-exec-1440 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:13] RECOVERY - Puppet run on tools-webgrid-lighttpd-1405 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:29] RECOVERY - Puppet run on tools-static-11 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:33] RECOVERY - Puppet run on tools-exec-1414 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:41] RECOVERY - Puppet run on tools-exec-1411 is OK: OK: Less than 1.00% above the threshold [0.0] [22:15:49] RECOVERY - Puppet run on tools-elastic-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:16:07] RECOVERY - Puppet run on tools-exec-1401 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:02] RECOVERY - Puppet run on tools-exec-1421 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:08] RECOVERY - Puppet run on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0] [22:17:26] RECOVERY - Puppet run on tools-checker-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:32] RECOVERY - Puppet run on tools-webgrid-lighttpd-1403 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:38] RECOVERY - Puppet run on tools-exec-1406 is OK: OK: Less than 1.00% above the threshold [0.0] [22:18:41] RECOVERY - Puppet run on tools-services-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:01] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:08] RECOVERY - Puppet run on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:16] RECOVERY - Puppet run on tools-redis-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:22] RECOVERY - Puppet run on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:19:24] RECOVERY - Puppet run on tools-exec-1429 is OK: OK: Less than 1.00% above the threshold [0.0] [22:20:30] RECOVERY - Puppet run on tools-exec-1430 is OK: OK: Less than 1.00% above the threshold [0.0] [22:22:22] RECOVERY - Puppet run on tools-exec-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:23:18] RECOVERY - Puppet run on tools-mail is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:19] RECOVERY - Puppet run on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:21] RECOVERY - 
Puppet run on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:43] RECOVERY - Puppet run on tools-bastion-02 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:53] RECOVERY - Puppet run on tools-webgrid-lighttpd-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:24:59] RECOVERY - Puppet run on tools-webgrid-lighttpd-1408 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:03] RECOVERY - Puppet run on tools-exec-1435 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:09] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:23] RECOVERY - Puppet run on tools-elastic-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:29] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:36] RECOVERY - Puppet run on tools-exec-1442 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:44] RECOVERY - Puppet run on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:25:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:10] RECOVERY - Puppet run on tools-webgrid-lighttpd-1410 is OK: OK: Less than 1.00% above the threshold [0.0] [22:26:10] RECOVERY - Puppet run on tools-cron-01 is OK: OK: Less than 1.00% above the threshold [0.0] [22:27:20] RECOVERY - Puppet run on tools-exec-1412 is OK: OK: Less than 1.00% above the threshold [0.0] [22:28:06] RECOVERY - Puppet run on tools-exec-1432 is OK: OK: Less than 1.00% above the threshold [0.0] [22:28:37] RECOVERY - Puppet run on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0] [23:33:46] 06Labs, 06Developer-Relations (Apr-Jun 2017): Recover Zulip/Mattermost instance on Labs - https://phabricator.wikimedia.org/T162960#3188321 (10srishakatux) 05Open>03Resolved Conversation w/ Yuvipanda who has already done quite a lot of research in this area says: - Mattermost is totally opencore - but we... [23:35:59] PROBLEM - Puppet run on tools-exec-1436 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [23:39:03] 10Tool-Labs-tools-Xtools: "not valid wiki" error when I check Commons' contributions in X!s Tools - https://phabricator.wikimedia.org/T161427#3188365 (10Matthewrbowker) 05Open>03Resolved I have opened a pull request against the rewrite repository. This should fix the issue, by allowing us to "map" commonswi... [23:49:02] PROBLEM - Puppet run on tools-docker-builder-05 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [23:49:35] That tools-docker-builder-05 alert is probably me messing about [23:50:40] 06Labs, 06Developer-Relations (Apr-Jun 2017), 03Google-Summer-of-Code (2017), 10Outreachy (Round-14): Set up a Zulip instance on tool Labs - https://phabricator.wikimedia.org/T163169#3188402 (10srishakatux)