[08:19:11] <shinken-wm>	 PROBLEM - SSH on tools-exec-1221 is CRITICAL: Server answer
[08:39:11] <shinken-wm>	 RECOVERY - SSH on tools-exec-1221 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[08:45:12] <shinken-wm>	 PROBLEM - SSH on tools-exec-1221 is CRITICAL: Server answer
[09:05:11] <shinken-wm>	 RECOVERY - SSH on tools-exec-1221 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[09:42:19] <wikibugs>	 Labs, DBA, Horizon: Tgr unable to login on Horizon - https://phabricator.wikimedia.org/T131630#2338309 (jcrespo) Open>Resolved a:jcrespo I will close this for now, the title task (Tgr unable to login on Horizon) is resolved.
[11:18:16] <wikibugs>	 Labs, Tool-Labs, Community-Tech-Tool-Labs, Developer-Relations, Documentation: Create a "my first Python webservice" tutorial for Tool Labs - https://phabricator.wikimedia.org/T134494#2338409 (Qgil)
[11:25:26] <wikibugs>	 Labs, Labs-Infrastructure, DBA: Queries of commonswiki_p.filearchive for fa_sha1 are slow - https://phabricator.wikimedia.org/T71088#726770 (Volans) The query is not using any index: ``` MariaDB LABS localhost (none) > explain SELECT * FROM commonswiki_p.filearchive WHERE fa_sha1 = '0mpoldytyxspxrdbf...
[11:26:39] <shinken-wm>	 RECOVERY - Puppet staleness on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [3600.0]
[11:42:14] <wikibugs>	 Labs, Labs-Infrastructure, DBA: Queries of commonswiki_p.filearchive for fa_sha1 are slow - https://phabricator.wikimedia.org/T71088#726770 (valhallasw) One option might be to create a filearchive_notdeleted view (analogous to the _userindex one), with a `WHERE fa_deleted&1 = 0`.  (https://git.wikime...
[11:53:09] <godog>	 !log tools cherry-pick https://gerrit.wikimedia.org/r/#/c/280652 https://gerrit.wikimedia.org/r/#/c/290479 https://gerrit.wikimedia.org/r/#/c/291710/ on tools-puppetmaster-01
[11:53:12] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[11:55:25] <wikibugs>	 Labs, Labs-Infrastructure, DBA: Queries of commonswiki_p.filearchive for fa_sha1 are slow - https://phabricator.wikimedia.org/T71088#2338487 (jcrespo)
[11:59:48] <shinken-wm>	 PROBLEM - Puppet run on tools-prometheus-01 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[12:09:42] <shinken-wm>	 RECOVERY - Puppet run on tools-prometheus-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:52:12] <shinken-wm>	 PROBLEM - SSH on tools-exec-1221 is CRITICAL: Server answer
[12:59:25] <wikibugs>	 Labs, Tool-Labs: puppet disabled on tools-pastion-01 - https://phabricator.wikimedia.org/T136552#2338643 (valhallasw)
[12:59:53] <wikibugs>	 Labs, Tool-Labs: ssh on tools-exec-1221 closes connection - https://phabricator.wikimedia.org/T136553#2338657 (valhallasw)
[13:05:21] <wikibugs>	 Labs, Tool-Labs: ssh on tools-exec-1221 closes connection - https://phabricator.wikimedia.org/T136553#2338682 (valhallasw) ``` valhallasw@tools-bastion-02:~$ qmod -d "*@tools-exec-1221" valhallasw@tools-bastion-02.tools.eqiad.wmflabs changed state of "continuous@tools-exec-1221.tools.eqiad.wmflabs" (disa...
[13:06:31] <valhallasw`cloud>	 !log tools rebooting tools-exec-1221
[13:06:35] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[13:07:24] <wikibugs>	 Labs, Tool-Labs: ssh on tools-exec-1221 closes connection - https://phabricator.wikimedia.org/T136553#2338685 (Ladsgroup) Was it too resource consuming?
[13:12:11] <shinken-wm>	 RECOVERY - SSH on tools-exec-1221 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[13:13:24] <wikibugs>	 Labs, Tool-Labs: ssh on tools-exec-1221 closes connection - https://phabricator.wikimedia.org/T136553#2338701 (valhallasw) No, the host was hanging, and thus had to be rebooted. The jobs mentioned above were running there, could not be restarted, and thus had to be force-deleted (otherwise SGE would have...
[13:14:24] <wikibugs>	 Labs, Tool-Labs: ssh on tools-exec-1221 closes connection - https://phabricator.wikimedia.org/T136553#2338702 (valhallasw) Open>Resolved a:valhallasw Host is back online after a reboot, and the queues are re-enabled.
[13:15:32] <shinken-wm>	 PROBLEM - Puppet staleness on tools-exec-1221 is CRITICAL: CRITICAL: 28.57% of data above the critical threshold [43200.0]
[13:17:33] <wikibugs>	 Labs, Tool-Labs: Stale NFS handle breaks puppet on tools-exec-1204, -1205 and -1218 - https://phabricator.wikimedia.org/T136495#2338707 (valhallasw) This is now also happening on tools-exec-1203: ``` Error: /Stage[main]/Role::Labs::Nfsclient/Labstore::Nfs_mount[dumps]/File[/public/dumps]: Could not evalu...
[13:20:34] <shinken-wm>	 RECOVERY - Puppet staleness on tools-exec-1221 is OK: OK: Less than 1.00% above the threshold [3600.0]
[13:59:34] <wikibugs>	 Labs, Tool-Labs: puppet disabled on tools-prometheus-01 - https://phabricator.wikimedia.org/T136498#2338794 (fgiunchedi) Open>Resolved a:fgiunchedi indeed, I've reenabled puppet and cherry-picked https://gerrit.wikimedia.org/r/#/c/291710/ on tools-puppetmaster-01 so that's now running the sam...
[14:01:47] <godog>	 YuviPanda: FYI the prometheus instance on tools now has a /tools/ prefix, IOW https://tools-prometheus.wmflabs.org/tools/status from https://gerrit.wikimedia.org/r/#/c/290479/
[14:02:17] <valhallasw`cloud>	 godog: thanks!
[14:02:52] <godog>	 valhallasw`cloud: np, trying to wrap up a few things before going on vacation tomorrow :D
[16:58:36] <wikibugs>	 Labs-project-extdist, MediaWiki-extensions-ExtensionDistributor: Download snapshot generates 404 for downloads - https://phabricator.wikimedia.org/T136564#2339112 (Legoktm) p:Triage>Unbreak! a:Legoktm
[18:06:57] <grrrit-wm>	 (PS1) Jean-Frédéric: Migrate to use Intuition as a library [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291776 (https://phabricator.wikimedia.org/T134565)
[18:15:41] <grrrit-wm>	 (CR) Krinkle: Migrate to use Intuition as a library (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291776 (https://phabricator.wikimedia.org/T134565) (owner: Jean-Frédéric)
[19:09:35] <Matthew_>	 So... where do I connect if I want to be able to access all of the replicas on tool labs?  Is it c1.labsdb?  Or did I read the docs wrong?
[19:13:39] <grrrit-wm>	 (CR) Jean-Frédéric: Migrate to use Intuition as a library (1 comment) [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291776 (https://phabricator.wikimedia.org/T134565) (owner: Jean-Frédéric)
[19:14:54] <grrrit-wm>	 (PS4) Jean-Frédéric: Add local dev environment with docker-compose [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291198 (https://phabricator.wikimedia.org/T136351)
[19:15:20] <grrrit-wm>	 (PS5) Jean-Frédéric: Add local dev environment with docker-compose [labs/tools/heritage] - https://gerrit.wikimedia.org/r/291198 (https://phabricator.wikimedia.org/T136351)
[19:42:53] <danilo>	 hi, why does the recentchanges table of ptwiki_p have data from before the last 30 days? MariaDB [ptwiki_p]> SELECT MIN(rc_timestamp) FROM recentchanges; returns 20160207180814, but it should return something like 20160430...
[19:42:58] <wikibugs>	 Labs, Tool-Labs, xTools-on-Labs: xtools-articleinfo spawning a large number of duplicate webservices - https://phabricator.wikimedia.org/T132471#2339578 (Matthewrbowker) a:Matthewrbowker
[19:43:37] <danilo>	 this is resulting in a bug in a graph that uses this table: http://tools.wmflabs.org/ptwikis/Patrulhamento_de_IPs
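Editor's note on danilo's question above: a minimal sketch of the check being described, assuming the 30-day window he mentions and MediaWiki's 14-digit `YYYYMMDDHHMMSS` timestamp format. The function names and the "now" of 2016-05-30 are illustrative assumptions, not taken from the log.

```python
from datetime import datetime, timedelta

# rc_timestamp values are 14-digit strings: YYYYMMDDHHMMSS
MW_TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"


def rc_cutoff(now: datetime, window_days: int = 30) -> str:
    """Return the oldest timestamp expected in recentchanges (hypothetical helper)."""
    return (now - timedelta(days=window_days)).strftime(MW_TIMESTAMP_FORMAT)


def is_older_than_window(rc_timestamp: str, now: datetime, window_days: int = 30) -> bool:
    """True if a row is older than the retention window.

    Fixed-width digit strings sort the same way as the dates they encode,
    so a plain string comparison is enough here.
    """
    return rc_timestamp < rc_cutoff(now, window_days)


if __name__ == "__main__":
    assumed_now = datetime(2016, 5, 30)  # assumption: approximate date of this log
    print(rc_cutoff(assumed_now))
    print(is_older_than_window("20160207180814", assumed_now))
```

With the assumed date, the value danilo got back from `MIN(rc_timestamp)` falls well before the computed cutoff, which matches his observation.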
[19:48:32] <wikibugs>	 Labs, Tool-Labs, xTools-on-Labs: xtools-articleinfo spawning a large number of duplicate webservices - https://phabricator.wikimedia.org/T132471#2339610 (Matthewrbowker) p:Triage>Normal
[19:49:01] <wikibugs>	 Labs, Tool-Labs: Investigate why Joe is default editor on toollabs - https://phabricator.wikimedia.org/T100526#2339611 (valhallasw) declined>Open Reopening this. For some reason, `git` starts `joe` (as `valhallasw`)  even though my   ``` valhallasw@tools-bastion-03:~/src/pywikibot-core$ cat ~/.se...
[20:02:24] <wikibugs>	 Labs, DBA, Horizon: Tgr unable to login on Horizon - https://phabricator.wikimedia.org/T131630#2339667 (Andrew) Thank you @jynus
[20:09:39] <YuviPanda>	 Matthew_: around?
[20:09:50] <Matthew_>	 Yes
[20:10:15] <YuviPanda>	 Matthew_: do you want me to try moving xtools-articleinfo to kubernetes? that might fix this problem and also provide you with more isolation
[20:10:25] <YuviPanda>	 it doesn't change anything for you
[20:10:28] <YuviPanda>	 it still runs off of NFS
[20:10:35] <YuviPanda>	 and you can change/deploy code as you used to
[20:10:43] <YuviPanda>	 I only have PHP5.6 available now tho
[20:10:44] <Matthew_>	 If you think it will help.
[20:11:00] <YuviPanda>	 Matthew_: so the question becomes, do you think it'll work on php5.6
[20:11:05] <Matthew_>	 I don't know if our current code is php5.6 compatible though.
[20:11:19] <YuviPanda>	 Matthew_: does it run on precise or trusty?
[20:11:19] <Matthew_>	 May I take a look and get back to you?
[20:11:33] <YuviPanda>	 Matthew_: it runs on trusty, so should be ok.
[20:11:35] <YuviPanda>	 Matthew_: sure
[20:11:46] <Matthew_>	 Okay. I'll look and let you know. Thank you!
[20:12:19] <YuviPanda>	 Matthew_: yw. it could also allow you to have a http based health check, so you don't need to do webservice restart
[20:12:44] <Matthew_>	 Okay.
[20:18:49] <wikibugs>	 Labs, Tool-Labs, xTools-on-Labs: xtools-articleinfo spawning a large number of duplicate webservices - https://phabricator.wikimedia.org/T132471#2339690 (MusikAnimal) Since adding Yuvipanda's magic script I haven't noticed any extraneous webservices. I think this can safely be closed as resolved.
[20:20:39] <wikibugs>	 Labs, Tool-Labs, xTools-on-Labs: xtools-articleinfo spawning a large number of duplicate webservices - https://phabricator.wikimedia.org/T132471#2339692 (Matthewrbowker) Open>Resolved >>! In T132471#2339690, @MusikAnimal wrote: > Since adding Yuvipanda's magic script I haven't noticed any ext...
[20:24:26] <Amir1>	 YuviPanda: hey, it would be great (if you have some time) if you could take a look at this
[20:24:27] <Amir1>	 https://gerrit.wikimedia.org/r/#/c/291751/
[20:25:46] <Amir1>	 thanks 
[21:09:52] <wikibugs>	 Quarry: Add date when query was last run - https://phabricator.wikimedia.org/T77941#832144 (agray) This would be very useful for reports which use Quarry data (allowing us to timestamp the source for the end-user). The page currently reports the ID of the last run (in the source as `"qrun_id": 12345`) which...
[21:19:55] <wikibugs>	 Labs, Tool-Labs: jsub appears to act differently towards network requests - https://phabricator.wikimedia.org/T136588#2339811 (Ladsgroup)
[21:30:32] <Krinkle>	 Betacommand: ping
[21:30:47] <Krinkle>	 Betacommand: Do you know the status of https://tools.wmflabs.org/?tool=wikiviewstats / https://tools.wmflabs.org/wikiviewstats it seems to be down
[21:31:00] <Krinkle>	 Is this obsoleted by https://tools.wmflabs.org/pageviews/ ?
[22:00:57] <wikibugs>	 Labs, Tool-Labs: jsub appears to act differently towards network requests - https://phabricator.wikimedia.org/T136588#2339939 (Yamaha5) I test it with core. it shows this error!  ``` WARNING: Waiting 10 seconds before retrying. ERROR: Traceback (most recent call last):   File "/data/project/checkdictatio...
[22:04:18] <Krinkle>	 valhallasw`cloud: Can you update https://github.com/valhallasw/tsreports to have a url set in the url field on top? e.g. to https://tools.wmflabs.org/tsreports/ and also in the readme
[22:04:28] <Krinkle>	 Should we create a redirect from toolserver.org~/reports ?
[22:10:34] <YuviPanda>	 Krinkle: did you ever get a chance to look at nagf?
[22:14:00] <Krinkle>	 Not yet
[22:14:25] <Krinkle>	 YuviPanda: Tell me :)
[22:14:40] <Krinkle>	 tools-login, become nagf, qstat
[22:14:49] <YuviPanda>	 Krinkle: no qstat
[22:14:51] <YuviPanda>	 Krinkle: kubectl get pods
[22:14:56] <Krinkle>	 no service.manifest
[22:15:07] <YuviPanda>	 Krinkle: and if you change public_html/ it'll be instantly reflected
[22:15:26] <YuviPanda>	 Krinkle: yeah, it's the testbed for the new k8s backend. You can see the yaml file it is running in nagf-deployment.yaml
[22:16:21] <Krinkle>	 YuviPanda: rc or deployment?
[22:16:28] <YuviPanda>	 Krinkle: deployment
[22:17:38] <YuviPanda>	 Krinkle: your logs are also back on access.log and error.log
[22:22:02] <Krinkle>	 YuviPanda: Interesting
[22:22:05] <Krinkle>	 So it mounts from NFS?
[22:22:28] <Krinkle>	 and then uses lighttpd to read public_html and write to logs 
[22:22:58] <YuviPanda>	 Krinkle: yup
[22:23:04] <YuviPanda>	 Krinkle: it's the exact same code + config we run on gridengine
[22:23:09] <Krinkle>	 Yeah
[22:23:24] <YuviPanda>	 Krinkle: and I'm currently working on adding a --backend=k8s option to webservice
[22:23:30] <YuviPanda>	 Krinkle: so it'll just submit jobs to k8s instead of gridengine
[22:23:33] <YuviPanda>	 and nothing else changes
[22:23:36] <Krinkle>	 So I assume once this is stable, it might replace that? (with the deployment yaml being implied, rather than explicit for each)
[22:23:43] <YuviPanda>	 Krinkle: yup
[22:24:04] <YuviPanda>	 Krinkle: there are still cases when webservice will want to run under gridengine (primarily, if they are spawning jobs on the grid themselves)
[22:24:11] <YuviPanda>	 Krinkle: other than that, it's all positive.
[22:24:14] <Krinkle>	 YuviPanda: Do you intend to have a relatively simple way to get most of this infra but without NFS requirement?
[22:24:32] <YuviPanda>	 Krinkle: I want to, but I haven't been able to think of a way to do that that is actually simple
[22:24:52] <Krinkle>	 Yeah, you need a way to bundle the code and ship it
[22:24:57] <YuviPanda>	 Krinkle: yup
[22:25:26] <YuviPanda>	 Krinkle: there is https://phabricator.wikimedia.org/T136264 for evaluating a proper PaaS for tools
[22:25:30] <YuviPanda>	 Krinkle: which will definitely be NFS Free
[22:25:49] <YuviPanda>	 Krinkle: https://phabricator.wikimedia.org/T136265 solicits comments on what the evaluation criteria should be
[22:25:50] <Krinkle>	 Having to bootstrap it from a public git repo isn't practical. And of course one would ideally still have easy access to logs and errors (persisted between restarts, and shared across multiple replicas)
[22:26:20] <YuviPanda>	 Krinkle: yeah, so most of these PaaS things have all of that covered.
[22:26:21] <Krinkle>	 I guess k8s would allow one pod to persist as local volume (still not NFS)
[22:26:34] <YuviPanda>	 that's kind of a losing proposition though, since the node could go away
[22:26:36] <Krinkle>	 Separate from the actual http pod
[22:26:50] <Krinkle>	 and then ssh into that via k8s to view the logs
[22:26:52] <YuviPanda>	 the actual solution to that is to deploy actual persistent storage (Cinder / Ceph)
[22:26:56] <Krinkle>	 Right
[22:27:00] <YuviPanda>	 there's a separate ticket for log storage as well
[22:27:02] <Krinkle>	 logstash :)
[22:27:10] <YuviPanda>	 where the actual solution is ElasticSearch + something
[22:27:16] <YuviPanda>	 yeah
[22:27:17] <Krinkle>	 Though difficult with access controls.
[22:27:20] <YuviPanda>	 yup
[22:27:23] <Krinkle>	 :)
[22:27:30] <YuviPanda>	 ElasticSearch's access control plugin is proprietary open core stuff
[22:27:39] <YuviPanda>	 log storage itself is a good 6 month project
[22:28:09] <Krinkle>	 YuviPanda: Something like syslogd to NFS could work maybe
[22:28:11] <YuviPanda>	 Krinkle: so we made an early decision to get rid of gridengine first, and then slowly get rid of NFS.
[22:28:18] <YuviPanda>	 Krinkle: kubernetes has 'log collectors' as a concept
[22:28:20] <Krinkle>	 asynchronous and no hard dependency
[22:28:26] <YuviPanda>	 so those would work
[22:28:39] <YuviPanda>	 yeah, so that might be an intermediate next step
[22:28:39] <Krinkle>	 Yeah
[22:28:53] <Krinkle>	 I guess that's what k8s log collectors could be effectively
[22:29:05] <Krinkle>	 UDP or TCP to a subscriber which then persists separately
[22:29:06] <Krinkle>	 anyhow
[22:29:09] <Krinkle>	 g2h
[22:29:11] <Krinkle>	 g2g
[22:29:14] <YuviPanda>	 Krinkle: kk! cya
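
Editor's note on the log-collection idea discussed above ("UDP or TCP to a subscriber which then persists separately", "asynchronous and no hard dependency"): a minimal sketch of that pattern, not the setup Tool Labs runs. All names here (`persist_datagrams`, `send_log_line`, `demo`, the sample log lines) are hypothetical, and the temporary directory merely stands in for whatever storage the subscriber would write to.

```python
import socket
import tempfile
from pathlib import Path


def persist_datagrams(sock: socket.socket, log_path: Path, max_messages: int) -> int:
    """Subscriber side: receive UDP datagrams and append each as one log line."""
    written = 0
    with log_path.open("a", encoding="utf-8") as log_file:
        while written < max_messages:
            try:
                data, _sender = sock.recvfrom(65535)
            except socket.timeout:
                break  # nothing more arrived within the timeout
            line = data.decode("utf-8", errors="replace").rstrip("\n")
            log_file.write(line + "\n")
            written += 1
    return written


def send_log_line(address, line: str) -> None:
    """Sender side: fire-and-forget, so the web process never waits on the subscriber."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        sender.sendto(line.encode("utf-8"), address)


def demo() -> list:
    """Send two sample lines over loopback and return what the subscriber persisted."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    receiver.settimeout(2.0)
    address = receiver.getsockname()
    try:
        with tempfile.TemporaryDirectory() as tmp:
            log_path = Path(tmp) / "access.log"
            send_log_line(address, "GET /nagf/ 200")
            send_log_line(address, "GET /nagf/missing 404")
            persist_datagrams(receiver, log_path, max_messages=2)
            return log_path.read_text(encoding="utf-8").splitlines()
    finally:
        receiver.close()


if __name__ == "__main__":
    for persisted_line in demo():
        print(persisted_line)
```

The trade-off matches the conversation: because UDP is fire-and-forget, a missing subscriber loses log lines but never blocks the web process. Access control, which Krinkle flags as the hard part, is not addressed here.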