[00:10:25] bd808: why did gpy quit last time, also memory issues? at 1GB this time?
[06:48:49] PROBLEM - Puppet errors on tools-exec-1419 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[07:23:50] RECOVERY - Puppet errors on tools-exec-1419 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:49:52] !log tools move ooooold shared resources into archive for later cleanup
[07:49:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[08:08:33] gry: the command to run is "qacct -j gpy". It will take a while to finish but will show you the exit status code, maximum ram allocated, and a bunch of other stuff for each job.
[08:11:01] gry: the most recent job reported ended with 1000.191M max vmem and status 130, so I think that yes it just kept asking for more ram and eventually hit the limit.
[12:40:30] chasemp, andrewbogott is the tuning for Nova Resource/Tool weights something you still want to apply?
[12:41:02] dcausse: andrew is traveling and I'm in meetings all day but yes we want to massage that still (I'm unsure on desired final state tho)
[12:41:04] looking at the bug we discussed last week I see that you moved the page from Nova Resources to Portal
[12:41:11] ah
[12:41:13] right
[12:41:27] ok will wait a bit then
[12:41:41] dcausse: thanks and sorry that didn't occur to me, we were midflight for reorging docs
[12:41:46] maybe the first plan won't hold
[12:42:10] ok
[12:57:24] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#3284037 (bd808)
[12:57:26] Labs, Labs-Infrastructure: Create a new labs flavor available to all project: largedisk - https://phabricator.wikimedia.org/T142166#3284035 (bd808) Open>declined We will continue to handle these on a per-request basis for the foreseeable future. The #cloud-services-team is currently not comfortab...
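The `qacct -j gpy` check described at 08:08:33 and 08:11:01 can be sketched as a small parser that pulls `exit_status` and `maxvmem` out of each job record. This is a minimal sketch, not the tooling used in the channel: it assumes `qacct` prints one `key value` pair per line with records separated by lines of `=` characters, and that a `maxvmem` value without a unit suffix is in bytes. The sample text is hypothetical apart from the `1000.191M` and `130` values quoted in the log; the job numbers and the second record are made up for illustration.

```python
import re

# Assumption: qacct separates job records with a line made only of "=" characters.
RECORD_SEPARATOR = re.compile(r"^=+\s*$", re.MULTILINE)

# Multipliers to convert a suffixed maxvmem value into megabytes.
UNIT_TO_MB = {"K": 1.0 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024}


def parse_qacct(text):
    """Split qacct-style output into one dict of field -> value per job record."""
    records = []
    for block in RECORD_SEPARATOR.split(text):
        record = {}
        for line in block.splitlines():
            parts = line.split(None, 1)  # field name, then the rest of the line
            if len(parts) == 2:
                record[parts[0]] = parts[1].strip()
        if record:
            records.append(record)
    return records


def vmem_to_mb(value):
    """Convert a value such as '1000.191M' to megabytes (no suffix: assumed bytes)."""
    match = re.fullmatch(r"([0-9.]+)([KMGT]?)", value.strip())
    if match is None:
        raise ValueError("unrecognised maxvmem value: %r" % value)
    number, unit = match.groups()
    if unit == "":
        return float(number) / (1024 * 1024)
    return float(number) * UNIT_TO_MB[unit]


# Hypothetical sample: only 1000.191M and 130 come from the log above.
SAMPLE = """\
==============================================================
jobname      gpy
jobnumber    1
exit_status  0
maxvmem      512.000M
==============================================================
jobname      gpy
jobnumber    2
exit_status  130
maxvmem      1000.191M
"""

records = parse_qacct(SAMPLE)
latest = records[-1]  # assumes the last record printed is the most recent job
print(latest["exit_status"], vmem_to_mb(latest["maxvmem"]))
```

In practice the text would come from running `qacct -j gpy` on a Tools host and passing its standard output to `parse_qacct`; the sample string only stands in for that output.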
[13:38:22] !log git rebooting gerrit-test and gerrit-mysql unknown error troubleshooting, possible nagios issue
[13:38:39] of course i go to log something and the bot isn't even here...
[13:50:52] is there some sort of labs maintenance today? my instances are going nuts when it comes to icinga notifications
[13:51:08] but I'm able to access my instances myself with no issue
[14:39:32] It seems that when trying to start nagios-nrpe-server it is now timing out as of today. Was working yesterday.
[14:44:18] fyi, editing on wikitech is down, not sure what's going on
[14:44:32] chasemp: ^
[14:45:03] o.0
[14:46:50] dcausse thanks, i will let bryan know
[21:11:29] Labs, DBA: Labs database corruption - https://phabricator.wikimedia.org/T166091#3285070 (Dispenser)
[22:00:10] Can I install Open Web Analytics or Piwik for tools on Labs?
[23:03:12] bd808: thank you
[23:20:42] PROBLEM - Puppet errors on tools-exec-1437 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:51:20] Labs: IO issues for Tools instances flapping with iowait and puppet failure - https://phabricator.wikimedia.org/T161898#3285317 (Kalan)
[23:51:22] Labs, Tool-Labs, Tool-Labs-tools-Other: ruarbcom tool runs count.py job once per minute - https://phabricator.wikimedia.org/T163075#3285314 (Kalan) Open>Resolved a:Kalan There's now an equivalent continuous job, so marking this as resolved for now.