[00:00:25] Labs, Labs-Infrastructure, Patch-For-Review: Public IPs not being updated from OpenStack Nova plugin - https://phabricator.wikimedia.org/T52620#2570792 (Andrew) In theory this issue is already resolved because the status plugin gets 'exists' notifications that are sent every hour. Do you see otherwise?
[00:15:23] RECOVERY - Puppet run on tools-exec-1202 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:54:45] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[02:04:49] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570878 (Nettrom)
[02:30:37] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570878 (yuvipanda) 1. Use jsub instead of qsub? 2. Pass `-l release=trusty` to make sure that it runs on trusty, which does have these packages (We'll be switching this to the default soon)
[02:34:47] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:38:45] Labs, Tool-Labs: requests_oauthlib not installed on exec hosts? - https://phabricator.wikimedia.org/T143534#2570909 (Nettrom) Open>Resolved a: Nettrom Switching to trusty by adding the parameter fixed it; sorry for not thinking about that before creating the task. Closing it as resolved.
[02:46:52] PROBLEM - Puppet run on tools-services-02 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[03:21:48] RECOVERY - Puppet run on tools-services-02 is OK: OK: Less than 1.00% above the threshold [0.0]
[05:42:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 355 bytes in 0.002 second response time
[05:47:05] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.035 second response time
[06:43:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 531 bytes in 0.019 second response time
[06:53:06] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.019 second response time
[07:28:28] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2570993 (valhallasw) One easy thing to check is whether the job can still write to the output file; to do so, ssh to the exec host, and `ls -l /proc/<pid>/fd`, where `<pid>` is the process id (which...
[07:29:05] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 355 bytes in 0.002 second response time
[07:34:05] RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 3670 bytes in 0.026 second response time
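The check valhallasw describes at [07:28:28] — can the stuck job still write to its output file? — can also be scripted. A minimal Python sketch of the same idea, assuming you already know the job's process id (the pid below is hypothetical): entries in /proc/<pid>/fd are symlinks, and a target ending in ' (deleted)' means the job is writing into a file that no longer exists.

```python
import os

PID = 12345  # hypothetical: the job's process id, found via ps/qstat on the exec host

# equivalent of `ls -l /proc/<pid>/fd`: each entry is a symlink to an open file
fd_dir = f"/proc/{PID}/fd"
for fd in sorted(os.listdir(fd_dir), key=int):
    target = os.readlink(os.path.join(fd_dir, fd))
    # a target like '/data/project/mytool/out.log (deleted)' means the fd
    # still exists but points at a removed or replaced file
    print(fd, "->", target)
```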
[07:41:29] Hi. Labs tools on Error 500
[07:41:47] 500 Internal Server Error - nginx/1.11.1
[07:45:06] PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL - No data received from host
[07:55:47] PROBLEM - Puppet run on tools-worker-1010 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [0.0]
[07:55:59] Tool-Labs-tools-Other: [AG] [Bug] Internet Explorer: Wrong height of input field - https://phabricator.wikimedia.org/T113590#2571024 (Florian)
[08:10:08] mafk, I can get to some labs tools
[08:10:13] which ones can't you get to?
[08:10:52] it's working now
[08:11:01] tools.guc
[08:11:09] and some others
[08:11:16] maybe a temporary outage
[08:26:57] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2571093 (hashar)
[08:30:48] RECOVERY - Puppet run on tools-worker-1010 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:33:27] PROBLEM - Puppet run on tools-flannel-etcd-03 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:37:26] Labs, Labs-Infrastructure, Beta-Cluster-Infrastructure, Tracking: Log files on labs instance fill up disk (/var is only 2GB) (tracking) - https://phabricator.wikimedia.org/T71601#2571103 (hashar) Open>Resolved a: hashar That was a transient issue due to labs instances having a `/var` o...
[08:39:39] Labs, Graphite, Operations: lots of graphite metrics under "instances" created - https://phabricator.wikimedia.org/T143405#2571110 (fgiunchedi) @yuvipanda the above would also remove recent files; sth like `find . -type f -mtime +672 -delete`, and delete empty directories too afterwards
[09:13:28] RECOVERY - Puppet run on tools-flannel-etcd-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:39:16] PROBLEM - Puppet staleness on tools-webgrid-lighttpd-1207 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [43200.0]
[10:38:51] Tool-Labs-tools-Pageviews, I18n: The label Metric in Siteviews is not localizable - https://phabricator.wikimedia.org/T143544#2571253 (Amire80)
[11:51:30] Labs, Tool-Labs: Puppet not running on tools-webgrid-lighttpd-1207 - https://phabricator.wikimedia.org/T143191#2571314 (MoritzMuehlenhoff) You mean the "error writing to client: Broken pipe" message? Yes, that's irritating, but benign. See here for more: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug...
[13:40:23] Labs: cronspam from labscontrol1001, labstore1001, labnet1002.eqiad.wmnet, labsdb1003.eqiad.wmnet - https://phabricator.wikimedia.org/T132422#2571494 (Andrew) I fixed several of these just now; will keep an eye on my cronspam to see what I missed.
[13:45:05] Labs: cronspam from labscontrol1001, labstore1001, labnet1002.eqiad.wmnet, labsdb1003.eqiad.wmnet - https://phabricator.wikimedia.org/T132422#2571507 (faidon) Thanks a lot, appreciated :) I'll keep an eye out too.
[14:00:31] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2568538 (chasemp) Am I reading this right: you want whatever quota bump it takes to spin up one large instance while keeping existing instances? I'm not sure if it's only RAM needed in this case but we...
[14:03:02] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2571545 (Hcohl) If by 'spin up', it is meant to 'create a new', then I agree yes.
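The graphite cleanup fgiunchedi sketches at [08:39:39] — delete files older than the cutoff, then prune the directories left empty — might look roughly like this in Python. The root path is an assumption, and the cutoff mirrors the `find` one-liner (`-mtime +672` means older than 672 days):

```python
import os
import time

ROOT = "/var/lib/carbon/whisper/instances"  # assumed metrics directory
CUTOFF = time.time() - 672 * 86400          # matches find's -mtime +672

# walk bottom-up so files are removed first and emptied directories can go after
for dirpath, dirnames, filenames in os.walk(ROOT, topdown=False):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if os.path.getmtime(path) < CUTOFF:
            os.remove(path)
    if dirpath != ROOT and not os.listdir(dirpath):
        os.rmdir(dirpath)
```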
[14:07:26] Labs: Don't set instance root passwords if using a local puppetmaster - https://phabricator.wikimedia.org/T142531#2571549 (yuvipanda) Even more detailed steps: 1. Write a custom fact that parses /etc/puppet/puppet.conf for puppetmaster = value 2. Check if this is equal to the labs puppetmaster value (obtain...
[14:16:38] Labs: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2571566 (chasemp) So the request is for enough quota for one large instance while keeping phlog-02 around, but as a temp bump in some fashion as the resources from phlog-01 will be recycled once the new larg...
[14:57:33] Labs, Operations: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571642 (yuvipanda)
[14:59:39] Labs, Tool-Labs: Build a puppet failure check for tools that's less flaky than current one - https://phabricator.wikimedia.org/T143499#2571654 (yuvipanda)
[15:00:10] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2571655 (MusikAnimal) I think I've figured it out. The [[ https://en.wikipedia.org/w/api.php?action=query&titles=MediaWiki&format=json&maxlag=-1 | maxlag API response ]] now includes fractional se...
[15:38:32] RECOVERY - Host secgroup-lag-102 is UP: PING OK - Packet loss = 0%, RTA = 1.23 ms
[15:41:58] PROBLEM - Host secgroup-lag-102 is DOWN: CRITICAL - Host Unreachable (10.68.17.218)
[15:47:22] RECOVERY - Host tools-secgroup-test-103 is UP: PING OK - Packet loss = 0%, RTA = 0.60 ms
[15:49:43] PROBLEM - Host tools-secgroup-test-103 is DOWN: CRITICAL - Host Unreachable (10.68.21.22)
[15:50:38] andrewbogott btw, I think the wikitech api is reporting instances that no longer exist
[15:51:47] see https://wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niregion=eqiad&format=json&niproject=tools for example
[15:51:49] yuvipanda: I don't really know how the wikitech api works or where it gets its data
[15:52:00] andrewbogott it is just a reflection of nova
[15:52:05] let me see if nova has same issue
[15:52:25] RECOVERY - Host tools-secgroup-test-102 is UP: PING OK - Packet loss = 0%, RTA = 0.85 ms
[15:53:03] andrewbogott can confirm, instances that are deleted are in ERROR state in nova
[15:53:42] 'in nova' meaning what?
[15:54:02] andrewbogott 'openstack server list' shows them in ERROR state
[15:54:07] OS_TENANT_NAME=tools openstack server list | grep ERROR
[15:54:12] let me file a bug
[15:54:12] ok
[15:54:16] thanks
[15:54:41] PROBLEM - Host tools-secgroup-test-102 is DOWN: CRITICAL - Host Unreachable (10.68.21.170)
[15:55:19] Labs: Deleted instances stuck in ERROR state in nova - https://phabricator.wikimedia.org/T143566#2571912 (yuvipanda)
[15:55:54] andrewbogott done!
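For the fact yuvipanda outlines at [14:07:26]: real Facter facts are written in Ruby, but the parsing-and-comparison step amounts to something like the Python sketch below. The section/key names and the puppetmaster hostname are illustrative assumptions; the real fact would obtain the central value from configuration, as step 2 says, rather than hard-code it:

```python
import configparser

# hypothetical: the central labs puppetmaster name (the task says to obtain
# this from configuration, not hard-code it)
LABS_PUPPETMASTER = "labs-puppetmaster.example.wmflabs"

cfg = configparser.ConfigParser()
cfg.read("/etc/puppet/puppet.conf")  # INI-style, so configparser can handle it

# the configured master may live in the [agent] or [main] section
master = (cfg.get("agent", "server", fallback=None)
          or cfg.get("main", "server", fallback=""))

# a project-local puppetmaster means instance root passwords should not be set
uses_local_puppetmaster = master != LABS_PUPPETMASTER
print(uses_local_puppetmaster)
```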
[16:01:31] Labs, Operations: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2571946 (fgiunchedi) a: fgiunchedi
[16:11:10] Change on wikitech.wikimedia.org a page Nova Resource:Tools/Access Request/Joaquinito01 was created, changed by Joaquinito01 link https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/Access_Request/Joaquinito01 edit summary: Created page with "{{Tools Access Request |Justification=Wikipedia |Completed=false |User Name=Joaquinito01 }}"
[16:31:20] (PS1) Lokal Profil: Add project to dbPrimaryKey [labs/tools/heritage] - https://gerrit.wikimedia.org/r/306012 (https://phabricator.wikimedia.org/T143481)
[16:35:32] (CR) Lokal Profil: "Can't try it locally though due to dumps. But possibly if https://gerrit.wikimedia.org/r/#/c/303498/ was merged?" [labs/tools/heritage] - https://gerrit.wikimedia.org/r/306012 (https://phabricator.wikimedia.org/T143481) (owner: Lokal Profil)
[16:45:44] Labs, Labs-Infrastructure: Plan deprecation of all precise instances in Labs - https://phabricator.wikimedia.org/T143349#2572112 (MoritzMuehlenhoff) @yuvipanda : There's at least one precise instance not in your list; precise.debdeploy.eqiad.wmflabs ? This specific instance can be trashed once we're done...
[17:12:40] PROBLEM - Puppet run on tools-k8s-master-01 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[17:21:21] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572313 (chasemp) @madhuvishy has been formalizing our logic for depooling/pooling grid exec nodes, and so with T140483 resolved we hope to roll this out without rebooting. 0. stage cha...
[17:29:46] Labs, Tool-Labs, Patch-For-Review: Tool Labs jobs locking up - https://phabricator.wikimedia.org/T143375#2572331 (MusikAnimal) Open>Resolved a: MusikAnimal I think that was it. Thank you very much for the help everyone (esp. @valhallasw)!! This one was not easy to figure out :) Do we know...
[17:39:28] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572367 (yuvipanda) You can do the same for k8s. You can depool https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Depooling_a_node, do your thing, repool. That will work for all the k8...
[17:42:25] Labs, Operations, Patch-For-Review: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896#2572382 (madhuvishy) Yeah I'm familiar with doing this for k8s worker nodes - did this a bunch of times while helping @yuvipanda recreate worker nodes a couple weeks ago.
[17:52:43] RECOVERY - Puppet run on tools-k8s-master-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:30:53] anyone there? Doing `ps -aux fw` on one of our labs instances displays processes on/off
[18:31:07] which seems really wrong
[18:31:44] cc yuvipanda andrewbogott
[18:32:08] well nevermind
[18:39:01] nuria_: 'on/off' ?
[18:39:57] valhallasw`cloud: ya nevermind, figured out that `ps -aux fw` no longer works, has to be `ps -auxfw`
[18:51:53] Labs, Horizon, Patch-For-Review: Incorrect quota error when creating instances in some projects - https://phabricator.wikimedia.org/T142379#2572906 (Andrew) Open>Resolved ugly hotfix applied!
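The depool/repool flow yuvipanda points to at [17:39:28] is documented on the linked wikitech page; in terms of stock kubectl commands it is roughly cordon, drain, do the maintenance, uncordon. A hedged sketch of that sequence (the node name is hypothetical, and the actual runbook on the wiki is authoritative):

```python
import subprocess

NODE = "tools-worker-1001.eqiad.wmflabs"  # hypothetical node name

def run(*cmd):
    """Echo and run a command, failing loudly on a non-zero exit."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("kubectl", "cordon", NODE)                        # stop new pods landing here
run("kubectl", "drain", NODE, "--ignore-daemonsets")  # evict the running pods

# ... perform the maintenance (e.g. the /scratch remount) ...

run("kubectl", "uncordon", NODE)                      # allow scheduling again
```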
[19:23:34] How do we set the secret key while running a uwsgi Flask app on Labs?
[19:24:03] curry: app.secret_key = ... ?
[19:24:10] curry: I may be missing some context here
[19:24:22] I tried $ export SECRET_KEY='secret_key' in the terminal
[19:25:02] And used app.config['SECRET_KEY'] = environ.get('SECRET_KEY')
[19:25:04] You're trying to run an existing app that requires setting the secret key via an environment variable?
[19:25:24] Yes
[19:25:57] It throws a 'secret key not set' error when I try to access a session variable
[19:28:08] curry: I don't know much of the flask internals, so I'm afraid you'll have to debug that
[19:28:22] http://flask.pocoo.org/docs/0.11/config/ suggests app.config['SECRET_KEY'] = ... should indeed work
[19:28:31] but maybe you're accessing the session variable before the config is set?
[19:28:38] or maybe it's being overwritten somehow?
[19:31:49] I was wondering if ToolLabs deals with secret keys in a different way
[19:32:10] Because what I am doing works perfectly fine on localhost
[19:33:22] curry, how are you running the flask app?
[19:33:28] You're not very clear on where the issue occurs. Is the issue with reading an environment variable or in configuring flask with that secret key?
[19:34:12] The exact error I'm getting in uwsgi.logs is this:
[19:34:15] RuntimeError: The session is unavailable because no secret key was set. Set the secret_key on the application to something unique and secret.
[19:34:42] I'm running the Flask app as:
[19:34:45] webservice uwsgi-python start
[19:34:58] That explains it
[19:35:13] oh, right. environ.get('SECRET_KEY') returns None if it can't read the environment variable
[19:35:27] If you set the env var on the bastion then it doesn't carry over to the grid host
[19:36:13] Ohh!
[19:36:57] Any quicklinks on how to set it on the grid?
[19:37:19] PROBLEM - Puppet run on tools-docker-builder-01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[19:38:03] curry, you can't
[19:38:09] I would just load the key from a json file
[19:39:32] thing is even when I explicitly did:
[19:40:09] app.secret_key='secret'
[19:40:29] I get the same error although I am not getting the key from an env var
[19:41:52] curry: are you sure you're not overwriting that value later?
[19:41:58] e.g. with None from os.environ?
[19:42:10] Yes. Sure.
[19:43:53] I just stopped and started the service
[19:43:57] It's working
[19:44:19] curry: Ah, right. If you change code, the server has to be restarted.
[19:44:38] (as opposed to just running 'python app.py', which has some auto-reloading magic)
[19:44:55] Okayy. So you suggest reading in the secret key from an external file?
[19:46:20] Yes, I suggest moving all secrets to a separate file (e.g. 'secrets.json'), then loading the data from there
[19:46:46] secrets.json can then be chmod 600 (i.e. only read/writeable by the user) while the rest of the code can be open for reading
[19:51:36] Alright. Thanks a lot!
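Putting valhallasw`cloud's advice together: environment variables set on the bastion don't reach the webservice host, so read the secret from a file instead. A minimal sketch, assuming a file named secrets.json containing {"secret_key": "..."} — both the filename and the key name are illustrative:

```python
import json
from flask import Flask

app = Flask(__name__)

# the secret lives in a separate file (chmod 600, readable only by the tool
# account) rather than in an environment variable, which would not carry
# over from the bastion to the grid host
with open("secrets.json") as f:
    app.secret_key = json.load(f)["secret_key"]
```

As curry found above, the webservice has to be stopped and started again before a change like this takes effect; there is no auto-reload as with a bare `python app.py`.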
[19:51:48] Labs: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2573062 (JAufrecht) Phlog-01 is not currently running Phlogiston code and hasn't in weeks, so the leak in the charts shouldn't be related to Phlogiston. Phlog-01 was stable until early June (T137736), and t...
[20:01:10] (PS1) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047
[20:02:54] PROBLEM - Puppet run on tools-exec-1204 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[20:05:51] (PS2) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047
[20:09:26] (PS1) BryanDavis: Add python-logstash and bump wheels [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049
[20:29:46] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2573190 (hashar) Nice debugging! >>! In T143016#2559446, @thcipriani wrote: > Messages like this one: > > ``` > DEB...
[20:55:50] (PS2) BryanDavis: Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172)
[20:56:15] (PS3) BryanDavis: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172)
[20:56:35] (PS2) BryanDavis: Add python-logstash and bump wheels [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172)
[20:57:23] Tool-Labs-tools-Pageviews: Script/bot rapidly hitting Pageviews tool - https://phabricator.wikimedia.org/T142607#2573335 (MusikAnimal) Open>Invalid This appears to have stopped
[21:03:43] musikanimal: wrt access.log; could that be the log rotation again?
[21:04:06] ?
[21:04:06] it can also just be a bit slow; the easiest way to test is to ssh to the webgrid server and `tail access.log` there
[21:04:09] where do you see that
[21:04:16] https://phabricator.wikimedia.org/T142607#2573335
[21:04:21] "A weird unrelated issue is the access.log on Tool Labs isn't being written to for any requests,"
[21:04:36] oh, yeah
[21:04:49] hmm that could be the log rotation, actually
[21:05:30] the last entries in the access.log are from 6 August
[21:06:02] maybe my log rotation script is broken, but it should scrape off the top part of the file and not the bottom
[21:06:12] tail -c 1000000 $logfile > temp.$$; mv temp.$$ $logfile
[21:06:51] where logfile is one of the files in *.log
[21:07:40] valhallasw`cloud: how might one get into the webgrid server?
[21:08:25] musikanimal, same way as any other sever
[21:08:27] *server
[21:08:41] `ssh tools-webgrid-<host>`
[21:09:01] You can find the host with qsub, like any other grid job
[21:09:52] this tool is on Kubernetes
[21:11:05] Do `kubectl get pods`
[21:11:22] Get the pod number, it'll be the tool name + some random characters
[21:11:59] musikanimal, then `kubectl exec -ti <pod> -- bash`
[21:12:27] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2573496 (chasemp)
[21:12:29] Labs, Phlogiston (Interrupt): Create new Phlogiston instance for production - https://phabricator.wikimedia.org/T142277#2573495 (chasemp)
[21:12:34] Labs: Revert: Request increased quota for Phlogiston labs project - https://phabricator.wikimedia.org/T143020#2573497 (chasemp)
[21:13:17] sweet, thank you
[21:13:22] never would have figured that out heh
[21:13:32] anyway the logs are the same there, nothing since 6 August
[21:13:39] I'll try disabling the rotation
[21:14:14] it only rotates every two hours though, so we should see something more recent in the log
[21:15:29] Labs, Tracking: Existing Labs project quota increase requests (Tracking) - https://phabricator.wikimedia.org/T140904#2573507 (chasemp)
[21:15:31] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573504 (chasemp) Open>Resolved a: chasemp @Hcohi should be good to go. Let us know if you have issues.
[21:17:06] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573509 (Hcohl) If by @Hcohi you mean @Hcohl I am very happy about this. Thank you. :)
[21:18:23] Labs, Math: Request increased quota for Math labs project - https://phabricator.wikimedia.org/T143446#2573517 (chasemp) @Hcohl ha yes sorry and yw
[21:19:43] musikanimal: no, because it stops writing after the first rotation
[21:20:09] musikanimal: because the server is writing to a now-no-longer-existing file
[21:20:26] ah, I think I get it
[21:20:31] The log rotation effectively disables the logs
[21:20:42] well that's a shame
[21:20:57] Do you have it in truncate mode?
[21:21:02] I think there's a phab somewhere about getting some rotation system set up?
[21:21:11] I'm not sure
[21:21:22] Yes.. somewhere
[21:21:30] normal ole `truncate --size 10000 file` chops off the end of the file, right?
[21:21:31] musikanimal: somewhere, but not going to happen
[21:21:44] Probably isn't going to happen for a long time though
[21:21:49] this will be better in kubernetes-world
[21:21:59] and in kubernetes-world, this is/should be a solved problem
[21:22:10] yeah, trunc chops off the end
[21:22:25] truncate*
[21:22:31] musikanimal, try `kubectl logs <pod>`
[21:22:52] If the tool logs to stderr/stdout then it'll be in that
[21:22:59] got nothing
[21:23:06] tom29739: musikanimal is not using k8s
[21:23:07] this tool is normal lighttpd
[21:23:17] I am for pageviews
[21:23:28] not for MusikBot
[21:24:59] I've g2g, thanks to you both for your help! I'm learning a lot :)
[21:25:16] see you :-)
[21:27:49] bye :)
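The failure valhallasw`cloud diagnoses above is that `mv` gives the log a new inode, so lighttpd keeps appending to the old, now-unlinked file. The usual workaround (what logrotate calls copytruncate) is to copy the contents out and then empty the original in place, so the writer's file handle stays valid. A minimal Python sketch with hypothetical paths; note the small window in which lines written between the copy and the truncate are lost, a trade-off copytruncate also accepts:

```python
import os
import shutil

LOG = "access.log"        # hypothetical: the live log lighttpd writes to
ROTATED = "access.log.1"  # hypothetical: where the old contents go

# copy first, then truncate *in place*: the inode never changes, so the
# server's open file descriptor keeps pointing at the file being truncated
shutil.copyfile(LOG, ROTATED)
os.truncate(LOG, 0)
```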
[21:57:55] Labs, Continuous-Integration-Infrastructure, Patch-For-Review, Wikimedia-Incident: Nodepool instance instance creation quota management - https://phabricator.wikimedia.org/T143016#2573738 (chasemp) yeah we puzzled over this for a good long while. https://graphite.wikimedia.org/render/?width=88...
[22:01:28] !log tools Disabling puppet across tools hosts
[22:01:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:05:05] Labs, Labs-Infrastructure: Plan deprecation of all precise instances in Labs - https://phabricator.wikimedia.org/T143349#2573771 (AlexMonk-WMF) >>! In T143349#2572112, @MoritzMuehlenhoff wrote: > @yuvipanda : There's at least one precise instance not in your list; precise.debdeploy.eqiad.wmflabs ? This s...
[22:07:09] !log tools Disabled puppet across tools hosts in preparation to merge https://gerrit.wikimedia.org/r/#/c/305657/ (see T134896)
[22:07:10] T134896: move nfs /scratch to labstore1003 - https://phabricator.wikimedia.org/T134896
[22:07:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[22:20:22] (CR) BryanDavis: [C: 2] Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:20:39] (CR) BryanDavis: [C: 2] Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:31:19] (Merged) jenkins-bot: Add Logstash logging support [labs/striker] - https://gerrit.wikimedia.org/r/305941 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:31:21] (Merged) jenkins-bot: Add python-logstash [labs/striker/wheels] - https://gerrit.wikimedia.org/r/306047 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:33:00] (PS3) BryanDavis: Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172)
[22:33:38] (CR) BryanDavis: [C: 2] Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[22:33:44] (Merged) jenkins-bot: Add logstash logging support [labs/striker/deploy] - https://gerrit.wikimedia.org/r/306049 (https://phabricator.wikimedia.org/T143172) (owner: BryanDavis)
[23:04:03] PROBLEM - Puppet run on tools-mail-01 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:29:16] PROBLEM - Puppet run on tools-precise-dev is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[23:31:57] PROBLEM - Puppet run on tools-webgrid-generic-1402 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[23:33:39] PROBLEM - Puppet run on tools-webgrid-generic-1403 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[23:36:03] PROBLEM - Puppet run on tools-webgrid-generic-1404 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[23:43:14] PROBLEM - Puppet run on tools-webgrid-lighttpd-1203 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[23:43:53] ^ all me, looking
[23:46:02] RECOVERY - Puppet run on tools-webgrid-generic-1404 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:46:58] RECOVERY - Puppet run on tools-webgrid-generic-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:48:38] RECOVERY - Puppet run on tools-webgrid-generic-1403 is OK: OK: Less than 1.00% above the threshold [0.0]
[23:49:55] RECOVERY - Puppet staleness on tools-exec-1211 is OK: OK: Less than 1.00% above the threshold [3600.0]
[23:53:59] RECOVERY - Puppet staleness on tools-exec-1213 is OK: OK: Less than 1.00% above the threshold [3600.0]