[00:00:25] Labs, Labs-Infrastructure, Patch-For-Review: Public IPs not being updated from OpenStack Nova plugin - https://phabricator.wikimedia.org/T52620#2570792 (Andrew) In theory this issue is already resolved because the status plugin gets 'exists' notifications that are sent every hour. Do you see otherwise?
[00:15:23] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:54:45] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[02:04:49] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570878 (Nettrom)
[02:30:37] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570878 (yuvipanda) 1. Use jsub instead of qsub? 2. Pass `-l release=trusty` to make sure that it runs on trusty, which does have these packages (We'll be switching this to the default soon)
[02:34:47] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:38:45] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570909 (Nettrom) Open>Resolved a: Nettrom Switching to trusty by adding the parameter fixed it; sorry for not thinking about that before creating the task. Closing it as resolved.
[02:46:52] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[03:21:48] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:42:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 355 bytes in 0.002 second response time
[05:47:05] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.035 second response time
[06:43:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 531 bytes in 0.019 second response time
[06:53:06] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.019 second response time
[07:28:28] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2570993 (valhallasw) One easy thing to check is whether the job can still write to the output file; to do so, ssh to the exec host, and `ls -l /proc/<pid>/fd`, where `<pid>` is the process id (which...
[07:29:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 355 bytes in 0.002 second response time
[07:34:05] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.026 second response time
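The check valhallasw describes at [07:28:28] — can the stuck job still write to its output file? — can also be scripted. A minimal Python sketch of the same idea, assuming you already know the job's process id (the pid below is hypothetical): entries in /proc/<pid>/fd are symlinks, and a target ending in ' (deleted)' means the job is writing into a file that no longer exists.

```python
import os

PID = 12345  # hypothetical: the job's process id, found via ps/qstat on the exec host

# equivalent of `ls -l /proc/<pid>/fd`: each entry is a symlink to an open file
fd_dir = f"/proc/{PID}/fd"
for fd in sorted(os.listdir(fd_dir), key=int):
    target = os.readlink(os.path.join(fd_dir, fd))
    # a target like '/data/project/mytool/out.log (deleted)' means the fd
    # still exists but points at a removed or replaced file
    print(fd, "->", target)
```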
[07:41:29] Hi. Labs tools on Error 500
[07:41:47] 500 Internal Server Error - nginx/1.11.1
[07:45:06] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL - No data received from host
[07:55:47] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:55:59] Tool-Labs-tools-Other: [AG] [Bug] Internet Explorer: Wrong height of input field - https://phabricator.wikimedia.org/T113590#2571024 (Florian)
[08:10:08] mafk, I can get to some labs tools
[08:10:13] which ones can't you get to?
[08:10:52] it's working now
[08:11:01] tools.guc
[08:11:09] and some others
[08:11:16] maybe a temporary outage
[08:26:57] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2571093 (hashar)
[08:30:48] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:33:27] PROBLEM - Puppet run on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:37:26] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2571103 (hashar) Open>Resolved a: hashar That was a transient issue due to labs instances having a `/var` o...
[08:39:39] Labs, Graphite, Operations: lots of graphite metrics under "instances" created - https://phabricator.wikimedia.org/T143405#2571110 (fgiunchedi) @yuvipanda the above would also remove recent files; sth like `find . -type f -mtime +672 -delete`, and delete empty directories too afterwards
[09:13:28] RECOVERY - Puppet run on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:39:16] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[10:38:51] Tool-Labs-tools-Pageviews, I18n: The label Metric in Siteviews is not localizable - https://phabricator.wikimedia.org/T143544#2571253 (Amire80)
[11:51:30] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2571314 (MoritzMuehlenhoff) You mean the "error writing to client: Broken pipe" message? Yes, that's irritating, but benign. See here for more: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug...
[13:40:23] Labs: cronspam from labscontrol1001, labstore1001, labnet1002.eqiad.wmnet, labsdb1003.eqiad.wmnet - https://phabricator.wikimedia.org/T132422#2571494 (Andrew) I fixed several of these just now; will keep an eye on my cronspam to see what I missed.
[13:45:05] Labs: cronspam from labscontrol1001, labstore1001, labnet1002.eqiad.wmnet, labsdb1003.eqiad.wmnet - https://phabricator.wikimedia.org/T132422#2571507 (faidon) Thanks a lot, appreciated :) I'll keep an eye out too.
[14:00:31] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2568538 (chasemp) Am I reading this right: you want whatever quota bump it takes to spin up one large instance while keeping existing instances? I'm not sure if it's only RAM needed in this case but we...
[14:03:02] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2571545 (Hcohl) If by 'spin up', it is meant to 'create a new', then I agree yes.
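The graphite cleanup fgiunchedi sketches at [08:39:39] — delete files older than the cutoff, then prune the directories left empty — might look roughly like this in Python. The root path is an assumption, and the cutoff mirrors the `find` one-liner (`-mtime +672` means older than 672 days):

```python
import os
import time

ROOT = "/var/lib/carbon/whisper/instances"  # assumed metrics directory
CUTOFF = time.time() - 672 * 86400          # matches find's -mtime +672

# walk bottom-up so files are removed first and emptied directories can go after
for dirpath, dirnames, filenames in os.walk(ROOT, topdown=False):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if os.path.getmtime(path) < CUTOFF:
            os.remove(path)
    if dirpath != ROOT and not os.listdir(dirpath):
        os.rmdir(dirpath)
```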
[14:07:26] Labs: Don't set instance root passwords if using a local puppetmaster - https://phabricator.wikimedia.org/T142531#2571549 (yuvipanda) Even more detailed steps: 1. Write a custom fact that parses /etc/puppet/puppet.conf for puppetmaster = value 2. Check if this is equal to the labs puppetmaster value (obtain...
[14:16:38] Labs: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2571566 (chasemp) So the request is for enough quota for one large instance while keeping phlog-02 around, but as a temp bump in some fashion as the resources from phlog-01 will be recycled once the new larg...
[14:57:33] Labs, Operations: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571642 (yuvipanda)
[14:59:39] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2571654 (yuvipanda)
[15:00:10] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2571655 (MusikAnimal) I think I've figured it out. The [[ https://en.wikipedia.org/w/api.php?action=query&titles=MediaWiki&format=json&maxlag=-1 | maxlag API response ]] now includes fractional se...
[15:38:32] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[15:41:58] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[15:47:22] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[15:49:43] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[15:50:38] andrewbogott btw, I think the wikitech api is reporting instances that no longer exist
[15:51:47] see https://wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niregion=eqiad&format=json&niproject=tools for example
[15:51:49] yuvipanda: I don't really know how the wikitech api works or where it gets its data
[15:52:00] andrewbogott it is just a reflection of nova
[15:52:05] let me see if nova has same issue
[15:52:25] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms
[15:53:03] andrewbogott can confirm, instances that are deleted are in ERROR state in nova
[15:53:42] 'in nova' meaning what?
[15:54:02] andrewbogott 'openstack server list' shows them in ERROR state
[15:54:07] OS_TENANT_NAME=tools openstack server list | grep ERROR
[15:54:12] let me file a bug
[15:54:12] ok
[15:54:16] thanks
[15:54:41] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[15:55:19] Labs: Deleted instances stuck in ERROR state in nova - https://phabricator.wikimedia.org/T143566#2571912 (yuvipanda)
[15:55:54] andrewbogott done!
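For the fact yuvipanda outlines at [14:07:26]: real Facter facts are written in Ruby, but the parsing-and-comparison step amounts to something like the Python sketch below. The section/key names and the puppetmaster hostname are illustrative assumptions; the real fact would obtain the central value from configuration, as step 2 says, rather than hard-code it:

```python
import configparser

# hypothetical: the central labs puppetmaster name (the task says to obtain
# this from configuration, not hard-code it)
LABS_PUPPETMASTER = "labs-puppetmaster.example.wmflabs"

cfg = configparser.ConfigParser()
cfg.read("/etc/puppet/puppet.conf")  # INI-style, so configparser can handle it

# the configured master may live in the [agent] or [main] section
master = (cfg.get("agent", "server", fallback=None)
          or cfg.get("main", "server", fallback=""))

# a project-local puppetmaster means instance root passwords should not be set
uses_local_puppetmaster = master != LABS_PUPPETMASTER
print(uses_local_puppetmaster)
```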
[16:01:31] Labs, Operations: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571946 (fgiunchedi) a: fgiunchedi
[16:11:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Joaquinito01 was created, changed by Joaquinito01 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Joaquinito01 edit summary: Created page with "{{Tools Access Request |Justification=Wikipedia |Completed=false |User Name=Joaquinito01 }}"
[16:31:20] (PS1) Lokal Profil: Add project to dbPrimaryKey [labs/tools/heritage] - https://gerrit.wikimedia.org/r/306012 (https://phabricator.wikimedia.org/T143481)
[16:35:32] (CR) Lokal Profil: "Can't try it locally though due to dumps. But possibly if https://gerrit.wikimedia.org/r/#/c/303498/ was merged?" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/306012 (https://phabricator.wikimedia.org/T143481) (owner: Lokal Profil)
[16:45:44] Labs, Labs-Infrastructure: Plan deprecation of all precise instances in Labs - https://phabricator.wikimedia.org/T143349#2572112 (MoritzMuehlenhoff) @yuvipanda : There's at least one precise instance not in your list; precise.debdeploy.eqiad.wmflabs ? This specific instance can be trashed once we're done...
[17:12:40] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:21:21] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572313 (chasemp) @madhuvishy has been formalizing our logic for depooling/pooling grid exec nodes, and so with T140483 resolved we hope to roll this out without rebooting. 0. stage cha...
[17:29:46] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2572331 (MusikAnimal) Open>Resolved a: MusikAnimal I think that was it. Thank you very much for the help everyone (esp. @valhallasw)!! This one was not easy to figure out :) Do we know...
[17:39:28] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572367 (yuvipanda) You can do the same for k8s. You can depool https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Depooling_a_node, do your thing, repool. That will work for all the k8...
[17:42:25] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572382 (madhuvishy) Yeah I'm familiar with doing this for k8s worker nodes - did this a bunch of times while helping @yuvipanda recreate worker nodes a couple weeks ago.
[17:52:43] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:30:53] anyone there? Doing `ps -aux fw` on one of our labs instances displays processes on/off
[18:31:07] which seems really wrong
[18:31:44] cc yuvipanda andrewbogott
[18:32:08] well nevermind
[18:39:01] nuria_: 'on/off' ?
[18:39:57] valhallasw`cloud: ya nevermind, figured out that `ps -aux fw` no longer works, has to be `ps -auxfw`
[18:51:53] Labs, Horizon, Patch-For-Review: Incorrect quota error when creating instances in some projects - https://phabricator.wikimedia.org/T142379#2572906 (Andrew) Open>Resolved ugly hotfix applied!
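The depool/repool flow yuvipanda points to at [17:39:28] is documented on the linked wikitech page; in terms of stock kubectl commands it is roughly cordon, drain, do the maintenance, uncordon. A hedged sketch of that sequence (the node name is hypothetical, and the actual runbook on the wiki is authoritative):

```python
import subprocess

NODE = "tools-worker-1001.eqiad.wmflabs"  # hypothetical node name

def run(*cmd):
    """Echo and run a command, failing loudly on a non-zero exit."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("kubectl", "cordon", NODE)                        # stop new pods landing here
run("kubectl", "drain", NODE, "--ignore-daemonsets")  # evict the running pods

# ... perform the maintenance (e.g. the /scratch remount) ...

run("kubectl", "uncordon", NODE)                      # allow scheduling again
```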
[19:23:34] How do we set the secret key while running a uwsgi Flask app on Labs?
[19:24:03] curry: app.secret_key = ... ?
[19:24:10] curry: I may be missing some context here
[19:24:22] I tried $ export SECRET_KEY='secret_key' in the terminal
[19:25:02] And used app.config['SECRET_KEY'] = environ.get('SECRET_KEY')
[19:25:04] You're trying to run an existing app that requires setting the secret key via an environment variable?
[19:25:24] Yes
[19:25:57] It throws a 'secret key not set' error when I try to access a session variable
[19:28:08] curry: I don't know much of the flask internals, so I'm afraid you'll have to debug that
[19:28:22] http://flask.pocoo.org/docs/0.11/config/ suggests app.config['SECRET_KEY'] = ... should indeed work
[19:28:31] but maybe you're accessing the session variable before the config is set?
[19:28:38] or maybe it's being overwritten somehow?
[19:31:49] I was wondering if ToolLabs deals with secret keys in a different way
[19:32:10] Because what I am doing works perfectly fine on localhost
[19:33:22] curry, how are you running the flask app?
[19:33:28] You're not very clear on where the issue occurs. Is the issue with reading an environment variable or in configuring flask with that secret key?
[19:34:12] The exact error I'm getting in uwsgi.logs is this:
[19:34:15] RuntimeError: The session is unavailable because no secret key was set. Set the secret_key on the application to something unique and secret.
[19:34:42] I'm running the Flask app as:
[19:34:45] webservice uwsgi-python start
[19:34:58] That explains it
[19:35:13] oh, right. environ.get('SECRET_KEY') returns None if it can't read the environment variable
[19:35:27] If you set the env var on the bastion then it doesn't carry over to the grid host
[19:36:13] Ohh!
[19:36:57] Any quicklinks on how to set it on the grid?
[19:37:19] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:38:03] curry, you can't
[19:38:09] I would just load the key from a json file
[19:39:32] thing is even when I explicitly did:
[19:40:09] app.secret_key='secret'
[19:40:29] I get the same error although I am not getting the key from an env var
[19:41:52] curry: are you sure you're not overwriting that value later?
[19:41:58] e.g. with None from os.environ?
[19:42:10] Yes. Sure.
[19:43:53] I just stopped and started the service
[19:43:57] It's working
[19:44:19] curry: Ah, right. If you change code, the server has to be restarted.
[19:44:38] (as opposed to just running 'python app.py', which has some auto-reloading magic)
[19:44:55] Okayy. So you suggest reading in the secret key from an external file?
[19:46:20] Yes, I suggest moving all secrets to a separate file (e.g. 'secrets.json'), then loading the data from there
[19:46:46] secrets.json can then be chmod 600 (i.e. only read/writeable by the user) while the rest of the code can be open for reading
[19:51:36] Alright. Thanks a lot!
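Putting valhallasw`cloud's advice together: environment variables set on the bastion don't reach the webservice host, so read the secret from a file instead. A minimal sketch, assuming a file named secrets.json containing {"secret_key": "..."} — both the filename and the key name are illustrative:

```python
import json
from flask import Flask

app = Flask(__name__)

# the secret lives in a separate file (chmod 600, readable only by the tool
# account) rather than in an environment variable, which would not carry
# over from the bastion to the grid host
with open("secrets.json") as f:
    app.secret_key = json.load(f)["secret_key"]
```

As curry found above, the webservice has to be stopped and started again before a change like this takes effect; there is no auto-reload as with a bare `python app.py`.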
[19:51:48] Labs: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2573062 (JAufrecht) Phlog-01 is not currently running Phlogiston code and hasn't in weeks, so the leak in the charts shouldn't be related to Phlogiston. Phlog-01 was stable until early June (T137736), and t...
[20:01:10] (PS1) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047
[20:02:54] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:05:51] (PS2) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047
[20:09:26] (PS1) BryanDavis: Add python-logstash and bump wheels [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049
[20:29:46] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2573190 (hashar) Nice debugging! >>! In T143016#2559446, @thcipriani wrote: > Messages like this one: > > ``` > DEB...
[20:55:50] (PS2) BryanDavis: Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172)
[20:56:15] (PS3) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172)
[20:56:35] (PS2) BryanDavis: Add python-logstash and bump wheels [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172)
[20:57:23] Tool-Labs-tools-Pageviews: Script/bot rapidly hitting Pageviews tool - https://phabricator.wikimedia.org/T142607#2573335 (MusikAnimal) Open>Invalid This appears to have stopped
[21:03:43] musikanimal: wrt access.log; could that be the log rotation again?
[21:04:06] ?
[21:04:06] it can also just be a bit slow; the easiest way to test is to ssh to the webgrid server and `tail access.log` there
[21:04:09] where do you see that
[21:04:16] https://phabricator.wikimedia.org/T142607#2573335
[21:04:21] "A weird unrelated issue is the access.log on Tool Labs isn't being written to for any requests,"
[21:04:36] oh, yeah
[21:04:49] hmm that could be the log rotation, actually
[21:05:30] the last entries in the access.log are from 6 August
[21:06:02] maybe my log rotation script is broken, but it should scrape off the top part of the file and not the bottom
[21:06:12] tail -c 1000000 $logfile > temp.$$; mv temp.$$ $logfile
[21:06:51] where logfile is one of the files in *.log
[21:07:40] valhallasw`cloud: how might one get into the webgrid server?
[21:08:25] musikanimal, same way as any other sever
[21:08:27] *server
[21:08:41] `ssh tools-webgrid-<host>`
[21:09:01] You can find the host with qsub, like any other grid job
[21:09:52] this tool is on Kubernetes
[21:11:05] Do `kubectl get pods`
[21:11:22] Get the pod number, it'll be the tool name + some random characters
[21:11:59] musikanimal, then `kubectl exec -ti <pod> -- bash`
[21:12:27] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2573496 (chasemp)
[21:12:29] Labs, Phlogiston (Interrupt): Create new Phlogiston instance for production - https://phabricator.wikimedia.org/T142277#2573495 (chasemp)
[21:12:34] Labs: Revert: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2573497 (chasemp)
[21:13:17] sweet, thank you
[21:13:22] never would have figured that out heh
[21:13:32] anyway the logs are the same there, nothing since 6 August
[21:13:39] I'll try disabling the rotation
[21:14:14] it only rotates every two hours though, so we should see something more recent in the log
[21:15:29] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2573507 (chasemp)
[21:15:31] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573504 (chasemp) Open>Resolved a: chasemp @Hcohi should be good to go. Let us know if you have issues.
[21:17:06] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573509 (Hcohl) If by @Hcohi you mean @Hcohl I am very happy about this. Thank you. :)
[21:18:23] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573517 (chasemp) @Hcohl ha yes sorry and yw
[21:19:43] musikanimal: no, because it stops writing after the first rotation
[21:20:09] musikanimal: because the server is writing to a now-no-longer-existing file
[21:20:26] ah, I think I get it
[21:20:31] The log rotation effectively disables the logs
[21:20:42] well that's a shame
[21:20:57] Do you have it in truncate mode?
[21:21:02] I think there's a phab somewhere about getting some rotation system set up?
[21:21:11] I'm not sure
[21:21:22] Yes.. somewhere
[21:21:30] normal ole `truncate --size 10000 file` chops off the end of the file, right?
[21:21:31] musikanimal: somewhere, but not going to happen
[21:21:44] Probably isn't going to happen for a long time though
[21:21:49] this will be better in kubernetes-world
[21:21:59] and in kubernetes-world, this is/should be a solved problem
[21:22:10] yeah, trunc chops off the end
[21:22:25] truncate*
[21:22:31] musikanimal, try `kubectl logs <pod>`
[21:22:52] If the tool logs to stderr/stdout then it'll be in that
[21:22:59] got nothing
[21:23:06] tom29739: musikanimal is not using k8s
[21:23:07] this tool is normal lighttpd
[21:23:17] I am for pageviews
[21:23:28] not for MusikBot
[21:24:59] I've g2g, thanks to you both for your help! I'm learning a lot :)
[21:25:16] see you :-)
[21:27:49] bye :)
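The failure valhallasw`cloud diagnoses above is that `mv` gives the log a new inode, so lighttpd keeps appending to the old, now-unlinked file. The usual workaround (what logrotate calls copytruncate) is to copy the contents out and then empty the original in place, so the writer's file handle stays valid. A minimal Python sketch with hypothetical paths; note the small window in which lines written between the copy and the truncate are lost, a trade-off copytruncate also accepts:

```python
import os
import shutil

LOG = "access.log"        # hypothetical: the live log lighttpd writes to
ROTATED = "access.log.1"  # hypothetical: where the old contents go

# copy first, then truncate *in place*: the inode never changes, so the
# server's open file descriptor keeps pointing at the file being truncated
shutil.copyfile(LOG, ROTATED)
os.truncate(LOG, 0)
```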
[21:57:55] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2573738 (chasemp) yeah we puzzled over this for a good long while. https://graphite.wikimedia.org/render/?width=88...
[22:01:28] !log tools Disabling puppet across tools hosts
[22:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:05:05] Labs, Labs-Infrastructure: Plan deprecation of all precise instances in Labs - https://phabricator.wikimedia.org/T143349#2573771 (AlexMonk-WMF) >>! In T143349#2572112, @MoritzMuehlenhoff wrote: > @yuvipanda : There's at least one precise instance not in your list; precise.debdeploy.eqiad.wmflabs ? This s...
[22:07:09] !log tools Disabled puppet across tools hosts in preparation to merge https://gerrit.wikimedia.org/r/#/c/305657/ (see T134896)
[22:07:10] T134896: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896
[22:07:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:20:22] (CR) BryanDavis: [C: 2] Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:20:39] (CR) BryanDavis: [C: 2] Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:31:19] (Merged) jenkins-bot: Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:31:21] (Merged) jenkins-bot: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:33:00] (PS3) BryanDavis: Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172)
[22:33:38] (CR) BryanDavis: [C: 2] Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:33:44] (Merged) jenkins-bot: Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[23:04:03] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:29:16] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:31:57] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:33:39] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:36:03] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:43:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:43:53] ^ all me, looking
[23:46:02] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:46:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:48:38] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:49:55] RECOVERY - Puppet staleness on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:53:59] RECOVERY - Puppet staleness on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [3600.0]