[00:45:40] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[00:46:36] PROBLEM - Puppet run on tools-webgrid-lighttpd-1412 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:47:00] PROBLEM - Puppet run on tools-exec-1409 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[01:23:51] Labs, Tool-Labs, Wikimedia-Incident: Tune nginx config parameters for tools / labs proxies - https://phabricator.wikimedia.org/T143637#2574397 (Krinkle) For the record, see https://wikitech.wikimedia.org/wiki/Incident_documentation/ToolsProxy20160823 and https://gerrit.wikimedia.org/r/#/c/297829/3/m...
[01:25:41] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:26:34] RECOVERY - Puppet run on tools-webgrid-lighttpd-1412 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:26:58] RECOVERY - Puppet run on tools-exec-1409 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:05:11] PROBLEM - SSH on tools-exec-1217 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:10:47] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[04:19:19] Tool-Labs-tools-Pageviews, I18n: The label Metric in Siteviews is not localizable - https://phabricator.wikimedia.org/T143544#2578152 (MusikAnimal) Open>Resolved a:MusikAnimal Fixed with [[ https://github.com/MusikAnimal/pageviews/commit/93064d7b05c7f2a739b1bbd4f329ac2d7b8ea6a0 | 93064d7 ]]....
[04:40:04] RECOVERY - SSH on tools-exec-1217 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[04:51:03] PROBLEM - SSH on tools-exec-1217 is CRITICAL: Server answer
[05:01:02] RECOVERY - SSH on tools-exec-1217 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[05:07:02] PROBLEM - SSH on tools-exec-1217 is CRITICAL: Server answer
[05:25:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[05:41:01] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[05:46:21] Labs, Tool-Labs, Community-Tech-Tool-Labs, Striker, and 2 others: Deploy "Striker" Tool Labs console to WMF production - https://phabricator.wikimedia.org/T136256#2578178 (mmodell)
[06:00:14] RECOVERY - Puppet run on tools-webgrid-lighttpd-1203 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:01:12] PROBLEM - Puppet staleness on tools-exec-1204 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [43200.0]
[06:10:15] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[06:52:05] Tool-Labs-tools-Pageviews: Prioritize date labels where there are spikes in traffic - https://phabricator.wikimedia.org/T143227#2578191 (MusikAnimal) Open>Invalid I actually misunderstood the request, they wanted to highlight Sundays to better illustrate the growth and decline in pageviews during a o...
[06:56:54] Tool-Labs-tools-Pageviews: Using Quarry as the source in Massviews doesn't work in Safari - https://phabricator.wikimedia.org/T143767#2578221 (MusikAnimal) p:Triage>Low
[06:57:50] Tool-Labs-tools-Pageviews, Community-Tech, I18n: Topviews in the Pageviews labs tool doesn't auto-exclude special pages with localized names - https://phabricator.wikimedia.org/T139725#2440845 (MusikAnimal) This should incidentally be fixed when T142403 is rolled out, hopefully soon :)
[06:58:36] Tool-Labs-tools-Pageviews, Community-Tech, I18n: Topviews in the Pageviews labs tool doesn't auto-exclude special pages with localized names - https://phabricator.wikimedia.org/T139725#2578231 (MusikAnimal)
[06:58:39] Tool-Labs-tools-Pageviews, Community-Tech-Sprint: Restrict Topviews to showing data only for individual days or months - https://phabricator.wikimedia.org/T142403#2533750 (MusikAnimal)
[07:00:11] Tool-Labs-tools-Pageviews: Improve Topviews interface - https://phabricator.wikimedia.org/T142802#2578235 (MusikAnimal) Will happen after T142403 is completed
[07:00:45] Tool-Labs-tools-Pageviews, Community-Tech-Sprint: Restrict Topviews to showing data only for individual days or months - https://phabricator.wikimedia.org/T142403#2533750 (MusikAnimal)
[07:00:47] Tool-Labs-tools-Pageviews: Improve Topviews interface - https://phabricator.wikimedia.org/T142802#2578238 (MusikAnimal)
[07:01:43] Tool-Labs-tools-Pageviews, Community-Tech-Sprint: Restrict Topviews to showing data only for individual days or months - https://phabricator.wikimedia.org/T142403#2533750 (MusikAnimal) a:MusikAnimal
[07:01:54] Tool-Labs-tools-Pageviews: Improve Topviews interface - https://phabricator.wikimedia.org/T142802#2546734 (MusikAnimal) a:MusikAnimal
[07:02:24] PROBLEM - Puppet run on tools-exec-1202 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[07:03:01] Tool-Labs-tools-Pageviews, Community-Tech, I18n: Topviews in the Pageviews labs tool doesn't auto-exclude special pages with localized names - https://phabricator.wikimedia.org/T139725#2578242 (MusikAnimal) a:MusikAnimal
[07:42:23] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:34:54] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Joel was created, changed by Joel link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Joel edit summary: Created page with "{{Tools Access Request |Justification=To help ;) |Completed=false |User Name=Joel }}"
[09:27:38] Tool-Labs-tools-Pageviews, Browser-Support-Apple-Safari: Using Quarry as the source in Massviews doesn't work in Safari - https://phabricator.wikimedia.org/T143767#2578398 (Aklapper)
[10:33:41] !log git upgrading gerrit on gerrit-test
[10:33:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL, Master
[11:50:04] https://tools.wmflabs.org/wikidata-todo/ is broken
[11:50:32] (Also https://tools.wmflabs.org/wikidata-todo/quick_statements.php)
[11:57:30] Labs, Tool-Labs: tools.wmflabs.org/wikidata-todo is not working - https://phabricator.wikimedia.org/T143779#2578752 (abian)
[12:00:00] Tool-Labs-tools-Other: tools.wmflabs.org/wikidata-todo is not working - https://phabricator.wikimedia.org/T143779#2578769 (valhallasw) https://tools.wmflabs.org/wikidata-todo/dupes.html is ok, https://tools.wmflabs.org/wikidata-todo/quick_statements.php is not. This suggests the php-fcgi backends are overlo...
[12:05:26] Tool-Labs-tools-Other: tools.wmflabs.org/wikidata-todo is not working - https://phabricator.wikimedia.org/T143779#2578787 (abian) Then, I overloaded those backends. Everything was fine until I submitted a large job. I'm sorry. :S
[12:11:59] abian: the tool shouldn't completely break down when someone does that ;-)
[12:13:52] Perhaps you're right ':)
[12:28:40] Tool-Labs-tools-Other: tools.wmflabs.org/wikidata-todo is not working - https://phabricator.wikimedia.org/T143779#2578809 (abian) Now, these pages seem to load again.
[12:37:01] Labs: nova-network deprecated, for real this time, as of Openstack N - https://phabricator.wikimedia.org/T142615#2578811 (chasemp) p:Triage>High
[12:37:36] Labs: nova-network deprecated, for real this time, as of Openstack N - https://phabricator.wikimedia.org/T142615#2541058 (chasemp) thanks @andrew, I'm flagging this to be in my face constantly :)
[12:42:38] Labs, Tool-Labs, Wikimedia-Incident: Tune nginx config parameters for tools / labs proxies - https://phabricator.wikimedia.org/T143637#2574397 (chasemp) That's a fairly significant drop :)
[12:46:50] Tool-Labs-tools-Other: tools.wmflabs.org/wikidata-todo is not working - https://phabricator.wikimedia.org/T143779#2578821 (Magnus) QuickStatements calls the tool API one row at a time. A single tab (or even a handful) won't overload it, no matter how long your instruction list is.
[13:30:09] Labs, Tool-Labs, Patch-For-Review: Write diamond collector for gridengine job count stats - https://phabricator.wikimedia.org/T140999#2578939 (chasemp) Open>Resolved
[15:08:01] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2579149 (hashar) @thcipriani pointed to some IRC logs from OpenStack infrastructure that reflected them having the sa...
[15:11:40] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[15:32:45] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2579209 (hashar) As for the quota glitches, the is two ghost instances that needs to be deleted. Also Nova codes see...
[15:51:42] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:19:45] I just posted a talk proposal for wikiconf that may be of interest to some folks here -- https://wikiconference.org/wiki/Submissions:2016/Developing_community_norms_for_critical_bots_and_tools
[16:20:15] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2579386 (hashar) From #openstack-infra: > [16:15:12Z] hashar: There was a quota mismatch with our proj...
[16:34:30] Labs, Operations: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2579482 (fgiunchedi) some progress: the global user "Anonymous" lacked "Viewer" access to the default/main organization, now all dashboards are visible on https://graf...
[16:43:07] Labs, Operations: 4.4-series kernel vs. iptables - https://phabricator.wikimedia.org/T142388#2579538 (MoritzMuehlenhoff) a:MoritzMuehlenhoff
[17:23:51] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2579700 (chasemp) >>! In T143016#2579386, @hashar wrote: > From #openstack-infra: > >> [16:15:12Z] has...
[17:25:22] !log tools depool tools-exec-1217, it is dead/stuck/hung/io-starved
[17:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[17:26:49] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2579718 (yuvipanda) p:Triage>High
[17:50:15] RECOVERY - Puppet run on tools-precise-dev is OK: OK: Less than 1.00% above the threshold [0.0]
[17:51:44] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2579795 (yuvipanda)
[17:56:15] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[18:18:00] !log testlabs disposable-1014-test-vm-01 IO testing on labvirt1014
[18:18:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Testlabs/SAL, Master
[18:20:32] !log tools.xtools Restarted webservice because of 503 errors.
[18:20:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.xtools/SAL, Master
[18:20:54] We made it 9 days that time :)
[18:28:23] Labs, Tool-Labs: Ssh fails for new tools instances - https://phabricator.wikimedia.org/T143812#2579881 (Andrew)
[18:28:32] yuvipanda: ^
[18:29:57] thcipriani: did you make an instance named 'blerg'
[18:30:10] and can I remove it? (stuck in deleting state since 8/10)
[18:30:47] chasemp: yes. Sorry. Confirming instance creation was broken at some point :\
[18:31:09] delete at will
[18:31:13] andrewbogott: if you hop on labcontrol and issue 'nova show 6fe92c84-055b-4a94-a3e6-3d74a49bd015' there is some whacky error output from nova internals I have never seen
[18:31:50] andrewbogott: or I could just show you :) https://phabricator.wikimedia.org/P3884
[18:32:40] I've seen that before… I /think/ that it happens when rabbitmq is stuck or swamped, and that once we get into that timedout state it stays that way forever
[18:32:56] Issuing another delete generally clears it up, but… I wish we had a better theory about the cause of the timeouts
[18:33:20] andrewbogott: this is right around the time of CI breakage and rabbitmq overwhelm
[18:33:35] that fits pretty well then
[18:33:48] yep
[18:34:32] andrewbogott: any idea on opengrok-web
[18:34:35] stuck in error state
[18:34:40] I reset but no clue on this
[18:34:44] seems old too
[18:34:49] | image | ubuntu-10.04-lucid (deprecated) (07072e9d-c65f-427e-838b-81d4be6cdc2a) |
[18:35:27] what's the instance id?
[18:35:42] Labs: Deleted instances stuck in ERROR state in nova - https://phabricator.wikimedia.org/T143566#2579909 (chasemp) Open>Resolved a:chasemp ``` 1999 nova reset-state --active 15fb50c2-56bf-4efc-bc8f-ceb303c5c5d8 2000 nova reset-state --active 21ba7af6-bde1-4f08-a16a-c3faa06b6ccc 2001 nova sho...
[18:36:14] andrewbogott: 0153e94a-2f43-4b87-ac06-d6be88cb0c6c
[18:37:05] Do you think I should try to revive it? Do we have a user complaint?
[18:37:47] no it was stuck in ERROR state
[18:37:52] I reset to active but...no idea otherwise
[18:38:08] went looking because of https://phabricator.wikimedia.org/T143566
[18:38:31] ah, ok. I don't know that there's a good way to know what happened at this point
[18:39:25] live and let live then :) just curious if you knew since it's so old and seems one-offy
[18:41:09] Labs, Tool-Labs: Ssh fails for new tools instances - https://phabricator.wikimedia.org/T143812#2579939 (Andrew) The sad example is andrew-test-1001.tools.eqiad.wmflabs
[18:41:30] Labs, Labs-Infrastructure: Upgrade qemu on labvirts - https://phabricator.wikimedia.org/T142866#2579941 (chasemp) IIUC the jessie stable version of `qemu-system-x86` is still 2.0 vs 2.3 for Trusty and because of that we have discussed downgrading possibly as well
[18:44:45] hi im wondering how do i request a quota increase please?
[18:44:53] PROBLEM - Host andrew-create-test-112 is DOWN: CRITICAL - Host Unreachable (10.68.18.80)
[18:44:55] i forgot the task that tells you how to request it
[18:45:10] paladox: https://phabricator.wikimedia.org/T140904
[18:45:21] thanks
[18:48:39] Labs: Request increased quota for git labs project - https://phabricator.wikimedia.org/T143815#2579978 (Paladox)
[18:50:34] chasemp i wonder would the above work or do i have to specify a cpu or ram increase?
[18:50:51] just a question so that i don't include the wrong information
[18:51:09] that's fine
[18:51:34] ok thanks
[19:01:25] Hi everybody, I have a problem with my tool. The webservice responds "502 Bad Gateway". Any ideas how I can solve the problem?
[19:03:42] I tried: tools.request@tools-bastion-03:~$ webservice restart
[19:03:43] Your job is not running, starting.
[19:04:02] FNDE: I was about to log I stopped it because it was spamming us w/ email
[19:04:14] to the point where it was causing issues for other tools / admins
[19:04:37] Job 128890 caused action: Job 128890 set to ERROR
[19:04:38] User = tools.request
[19:04:38] Queue = webgrid-lighttpd@tools-webgrid-lighttpd-1412.tools.eqiad.wmflabs
[19:04:39] Start Time =
[19:04:41] End Time =
[19:04:43] failed opening input/output file:08/24/2016 19:03:06 [52861:8063]: error: can't open output file "/data/project/request/error.log": P
[19:04:57] :O :O
[19:05:20] Where do u found this information? :)
[19:05:46] that was in the email
[19:05:53] which I imagine is the STDERROR output
[19:06:33] ah.. can I also get this mails?
[19:07:30] I'm not sure how you can (though I believe you can)
[19:07:42] FNDE: PM your email and I'll forward one for now but I stopped it again as it's still doing it
[19:07:43] great! problem solved :)
[19:08:41] I think I deleted this file before O:)
[19:09:02] thank u!
[19:25:55] Reedy: I'm using the latest version of AWB on my personal wiki and am logged in on a bot account but the bot tab seems to be missing. Known Issue? Note that I'm banned on enwp so can't report it there.
[19:27:02] http://ddowiki.com/page/Special:UserRights/Kobold_sneak is the user; Internet Explorer version: 11.0.9600.18427
[19:27:02] .NET version: 2.0.50727.5485
[19:27:02] Windows version: 6.1; AWB v 5.8.7.0 SVN 12080
[19:37:20] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:38:53] Reedy: Apparently it had something to do with AutoWikiBrowser/CheckPage -- But I thought that page was ignored for admin accounts?
[19:39:28] Even listing the bot account on the page wouldn't allow it to work. Confused about this.
[19:39:44] Tried listing with and without the under_score
[19:39:55] Didn't make a difference.
[19:41:59] ShoeMaker: I think Reedy's bouncer is here but his attentions are elsewhere. It might be a while before he gets back to you.
[19:42:39] That's okay, I'm about to head off to work... Just wanted to give him any possible relevant info I could think of.
[20:55:37] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2580531 (hashar) Yeah the quota usage links you have posted earlier have lead me to figure out how to look at the act...
[21:14:09] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2580595 (hashar) Right now with 2 jessie and 2 trusty instances (min-ready values). On the Horizon project page at h...
[21:17:35] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2580596 (hashar) I have looked at all the projects I have access too and `tools` seems to be off by one with the pie...
[21:37:56] Labs, Labs-Infrastructure, Patch-For-Review: Track labs instances hanging - https://phabricator.wikimedia.org/T141673#2580620 (chasemp)
[21:43:22] PROBLEM - Puppet run on tools-webgrid-lighttpd-1210 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[22:00:43] !log ores restarted uwsgi-ores on ores-web-05
[22:00:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Ores/SAL, Master
[22:12:39] Labs, Horizon, Patch-For-Review: Create puppet backend with REST api for labs instance configuration - https://phabricator.wikimedia.org/T133412#2580742 (Andrew) The API as described in the description does not at all correspond to the actual code. I'm pretty sure that we say 'roles' and 'hiera' now...
[22:14:12] Labs, Horizon, Patch-For-Review: Create puppet backend with REST api for labs instance configuration - https://phabricator.wikimedia.org/T133412#2580746 (Andrew) Using a single pathway for hiera means that I have to comingle two different sources of hiera yaml: 1) Hiera that was written free-form b...
[22:14:51] (PS1) BryanDavis: Rebuilt all wheels on Trusty host [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306566
[22:15:25] (CR) BryanDavis: [C: 2] Rebuilt all wheels on Trusty host [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306566 (owner: BryanDavis)
[22:15:31] (Merged) jenkins-bot: Rebuilt all wheels on Trusty host [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306566 (owner: BryanDavis)
[22:15:58] hi, the crosswatch tool gives a 404, webservice crashed?
[22:16:29] (PS1) BryanDavis: Rebuild wheels on Trusty [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306567
[22:16:48] bd808 fails right now with https://phabricator.wikimedia.org/P3887 btw
[22:16:56] meh, wrong channel
[22:17:20] heh
[22:18:20] (CR) BryanDavis: [C: 2] Rebuild wheels on Trusty [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306567 (owner: BryanDavis)
[22:18:26] (Merged) jenkins-bot: Rebuild wheels on Trusty [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306567 (owner: BryanDavis)
[22:19:55] (PS1) BryanDavis: Remove unused wheel wheel [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306569
[22:20:08] (CR) BryanDavis: [C: 2] Remove unused wheel wheel [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306569 (owner: BryanDavis)
[22:20:49] come on jerkins
[22:21:25] (Merged) jenkins-bot: Remove unused wheel wheel [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306569 (owner: BryanDavis)
[22:22:04] (PS1) BryanDavis: Bump wheels submodule [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306570
[22:22:13] (CR) BryanDavis: [C: 2] Bump wheels submodule [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306570 (owner: BryanDavis)
[22:22:20] (Merged) jenkins-bot: Bump wheels submodule [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306570 (owner: BryanDavis)
[22:25:26] (PS1) BryanDavis: Fix hostname of californium [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306572
[22:25:39] (CR) BryanDavis: [C: 2] Fix hostname of californium [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306572 (owner: BryanDavis)
[22:25:48] (Merged) jenkins-bot: Fix hostname of californium [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306572 (owner: BryanDavis)
[22:28:20] RECOVERY - Puppet run on tools-webgrid-lighttpd-1210 is OK: OK: Less than 1.00% above the threshold [0.0]
[22:34:33] Labs, Horizon, Patch-For-Review: Create puppet backend with REST api for labs instance configuration - https://phabricator.wikimedia.org/T133412#2580804 (Andrew) So far I'm finding the format-scrambling issue to be tolerable, so best not to waste much time on this until I have a more complete demo ru...
[22:51:24] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Joel was modified, changed by Tim Landscheidt link https://wikitech.wikimedia.org/w/index.php?diff=818291 edit summary:
[22:58:57] PROBLEM - Host tools-exec-1217 is DOWN: CRITICAL - Host Unreachable (10.68.18.20)
[23:03:48] RECOVERY - Host tools-exec-1217 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms
[23:03:54] !log tools reboot tools-exec-1217
[23:04:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[23:07:04] RECOVERY - SSH on tools-exec-1217 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[23:08:40] PROBLEM - Puppet staleness on tools-exec-1217 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [43200.0]
[23:18:43] RECOVERY - Puppet staleness on tools-exec-1217 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:19:08] chasemp thanks! did you also repool it?
[23:19:36] not yet
[23:21:28] yep
[23:21:29] done
[23:22:08] yuvipanda: the long story short on that VM and why it's in that state is 'I don't know' but I dragged brandon into it and we both puzzled
[23:22:12] it's no simple issue that's for sure
[23:22:39] !log striker Building new trusty uwsgi node to replace striker-uwsgi01 (jessie)
[23:22:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Striker/SAL, Master
[23:23:05] chasemp ok :)
[23:23:20] chasemp thanks for looking into it!
[23:23:46] yep, yuvipanda notes on various things https://phabricator.wikimedia.org/P3886
[23:23:55] I'll try to put it into some context on task
[23:24:01] but I'm burnt out for the moment
[23:24:16] * yuvipanda nods
[23:28:00] (CR) Legoktm: [C: 2] Add #Augmented-Changes-Feed to #wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/306297 (owner: Mattflaschen)
[23:34:08] (Merged) jenkins-bot: Add #Augmented-Changes-Feed to #wikimedia-collaboration [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/306297 (owner: Mattflaschen)
[23:38:39] Labs, Tool-Labs: Ssh fails for new tools instances - https://phabricator.wikimedia.org/T143812#2580929 (yuvipanda) Open>Resolved This was a combination of the following: 1. I had enabled clush to be available on all instances by adding role::toollabs::clush::target tools-wide 2. But this wasn't...
[23:44:28] PROBLEM - Host test-ssh is DOWN: CRITICAL - Host Unreachable (10.68.19.76)
[23:48:56] PROBLEM - Host andrew-test-1001 is DOWN: CRITICAL - Host Unreachable (10.68.18.174)