[00:03:59] <GEOFBOT>	 thanks guys :D http://sn1per-api-tests.wmflabs.org/wiki/Main_Page
[00:05:36] <YuviPanda>	 GEOFBOT: :) 
[01:29:00] <wikibugs>	 Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1487205 (yuvipanda) @jcrespo using these credentials from labstore1002 to try to grant grants ends up with:  ```Can\'t connect to MySQL server on \'labsdb1001.eqiad.wmnet\'...
[01:50:01] <wikibugs>	 Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1487225 (yuvipanda) And if I use the root pw (to check rest of script), it fails at executing:  ```    GRANT SELECT, SHOW VIEW ON `%\_p`.* TO 's52632'@'%';```  with  ```Acc...
[02:01:02] <wikibugs>	 Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1487238 (yuvipanda) Messing around, even:  ```mysql:root@labsdb1001.eqiad.wmnet [(none)]> GRANT SELECT, SHOW VIEW ON enwiki_p.* TO 's52632'@'%'; ERROR 1044 (42000): Access...
[02:04:58] <wikibugs>	 Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1487242 (yuvipanda) The actual SQL I need to execute for each user is:  ```    CREATE USER 's52632'@'%' IDENTIFIED BY 'somepasswordhere';     GRANT SELECT, SHOW VIEW ON `%\...
[02:05:42] <shinken-wm>	 PROBLEM - Free space - all mounts on tools-bastion-01 is CRITICAL tools.tools-bastion-01.diskspace.root.byte_percentfree (<50.00%)
[02:06:58] <YuviPanda>	 !log tools removed pacct files from tools-bastion-01
[02:07:02] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[02:20:42] <shinken-wm>	 RECOVERY - Free space - all mounts on tools-bastion-01 is OK All targets OK
[02:21:21] <wikibugs>	 Labs, Database: Provision a labsdb useraccount that can be used to run replica-addusers.pl - https://phabricator.wikimedia.org/T104476#1487263 (Springle) At Yuvi's request on IRC I added `'labsdbadmin'@'10.64.37.7'` with the same permissions/password as @jcrespo added for `'labsdbadmin'@'10.64.37.6'`, sinc...
[02:51:59] <Reedy>	 labsd badmin
[02:55:43] <wikibugs>	 Labs, Labs-Infrastructure, Labs-Sprint-105, Labs-Sprint-106, and 2 others: replica.my.cnf creation broken - https://phabricator.wikimedia.org/T104453#1487301 (scfc) (The rewrite so far only covers service group users ("tool accounts"), for human users this is not fixed yet and this is known, so no n...
[02:56:41] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 33.33% of data above the critical threshold [0.0]
[03:25:47] <bd808>	 YuviPanda: I've got a puppet question that you might know something about. How in the heck does Service['ganglia-monitor'] get applied on a prod instance?
[03:26:32] <bd808>	 ::ganglia::monitor::service defines it but I can't see that applied anywhere
[03:27:02] <bd808>	 I'm getting this failure on a labs instance -- Error: Failed to apply catalog: Could not find dependent Service[ganglia-monitor] for File[/etc/ganglia/conf.d/elasticsearch.pyconf] at /etc/puppet/modules/elasticsearch/manifests/ganglia.pp:8
[03:27:25] <bd808>	 because the service isn't in the manifest
[03:27:40] <shinken-wm>	 PROBLEM - Free space - all mounts on tools-webgrid-lighttpd-1404 is CRITICAL tools.tools-webgrid-lighttpd-1404.diskspace.root.byte_percentfree (<50.00%)
[03:30:23] <bd808>	 found it!
[03:31:26] <bd808>	 ::standard conditionally includes ::ganglia which includes ::ganglia::monitor which has this horrible "include service" in it
[04:39:55] <bd808>	 !log stashbot Setup basic vms
[05:06:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0]
[05:20:37] <YuviPanda>	 bd808: the conditional should've made it not appear in labs....
[05:20:38] <YuviPanda>	 no?
[05:57:28] <shinken-wm>	 PROBLEM - Free space - all mounts on tools-webgrid-lighttpd-1406 is CRITICAL tools.tools-webgrid-lighttpd-1406.diskspace.root.byte_percentfree (<22.22%)
[06:15:00] <grrrit-wm>	 (CR) Sitic: [C: 2 V: 2] Switch to python 3 [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227164 (owner: Sitic)
[06:17:22] <YuviPanda>	 sitic: nice! 
[06:17:23] <YuviPanda>	 :)
[06:17:35] <YuviPanda>	 Did your webservice-new problem get sorted?
[06:20:11] <sitic>	 YuviPanda: well halfak is to blame (only supporting python3 with mediawiki-utilities) ;-)
[06:20:23] <YuviPanda>	 sitic: aaah heh :)
[06:20:31] <YuviPanda>	 I think he will be happy to hear that :)
[06:20:31] <sitic>	 yeah, was a strange issue, I think a stale NFS file handle on one of the webgrid nodes
[06:21:03] <sitic>	 gevent was blocking me, they finally added support two weeks ago
[06:21:52] <sitic>	 YuviPanda: btw, if you want to have a look: https://gerrit.wikimedia.org/r/#/c/227163/ adds ORES support
[06:22:16] <YuviPanda>	 sitic: fight.
[06:22:17] <YuviPanda>	 Err
[06:22:19] <YuviPanda>	 Right
[06:22:22] <YuviPanda>	 (Am on phone)
[06:22:33] <sitic>	 :-)
[06:22:47] <YuviPanda>	 sitic: hopefully I'll have a non-nfs dependent webservice setup in a few months 
[06:23:27] <sitic>	 wow, great =)
[06:24:00] <YuviPanda>	 sitic: yeah, marathon or kubernetes. Unsure yet 
[06:24:04] <YuviPanda>	 Should be fun 
[06:42:33] <shinken-wm>	 RECOVERY - Free space - all mounts on tools-webgrid-lighttpd-1406 is OK All targets OK
[06:47:41] <shinken-wm>	 RECOVERY - Free space - all mounts on tools-webgrid-lighttpd-1404 is OK All targets OK
[07:38:47] <valhallasw`cloud>	 YuviPanda: do I still have to un-1 something?
[07:38:54] <YuviPanda>	 valhallasw`cloud: nope
[07:39:00] <valhallasw`cloud>	 ok!
[07:57:44] <wikibugs>	 Labs, Labs-Infrastructure, Labs-Sprint-105, Labs-Sprint-106, and 2 others: replica.my.cnf creation broken - https://phabricator.wikimedia.org/T104453#1487534 (yuvipanda) So it's mostly fixed for tool accounts now \o/
[08:16:28] <_acs_>	 re
[08:21:26] <wikibugs>	 Labs, Database: Measure capacity and utilization of labsdb*** boxes - https://phabricator.wikimedia.org/T107070#1487542 (jcrespo) Aside from the aggregated totals on Ganglia and Tendril (which lead to discover OOMs on labsdb1002, you can do queries right now on information_schema such as:  ```  MariaDB LAB...
[10:07:32] <shinken-wm>	 PROBLEM - Puppet failure on tools-static-01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[12:42:41] <Katie>	 Is the channel topic still accurate?
[12:43:13] <Katie>	 The crontabs part seems old.
[12:57:19] <jynus>	 Katie, I think that was post-NFS failure
[13:18:52] <wikibugs>	 Labs, Tool-Labs, Database: Tool Labs enwiki_p replicated database missing rows - https://phabricator.wikimedia.org/T106470#1488023 (jcrespo) a:jcrespo
[13:32:56] <a9309131>	 Coren: Is there a diagram or a description of how the database pieces all connect?
[13:35:43] <a9309131>	 Katie: Don't worry, there will be another error soon, and the topic will change ;)
[13:36:13] <jynus>	 a9309131, what are you looking for exactly? Topology, databases?
[13:38:16] <a9309131>	 jynus: My boss can't believe wikipedia uses SQL because he doesn't think it scales well enough. Something to show him how it's done so that it copes.
[13:38:30] <jynus>	 ha
[13:38:34] <a930913>	 :p
[13:38:53] <jynus>	 50% of the wikipedia mysql dbas here
[13:39:10] <jynus>	 all enwiki writes go to one server
[13:39:18] <jynus>	 that is scaling up
[13:39:53] <jynus>	 all uncached reads go to 2-6 slaves per shard, that is scaling out
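Editorial sketch of the split jynus describes: every write goes to a single master, while reads are spread over that shard's replicas. This is only an illustration under stated assumptions. The `ShardRouter` class, the server names, and the keyword-based read detection are hypothetical and are not taken from the Wikimedia configuration.

```python
import itertools


class ShardRouter:
    """Toy router: one master takes all writes, replicas share the reads."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if there are none.
        self._readers = itertools.cycle(replicas or [master])

    def server_for(self, sql):
        # Crude classification on the leading keyword (an assumption of this
        # sketch); anything that is not a read is sent to the master.
        is_read = sql.lstrip().upper().startswith(("SELECT", "SHOW"))
        return next(self._readers) if is_read else self.master


# Hypothetical server names, for illustration only.
router = ShardRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.server_for("SELECT page_title FROM page LIMIT 1"))
print(router.server_for("UPDATE page SET page_touched = NOW()"))
```

Adding replicas raises read capacity (scaling out), while write capacity still depends on how large the single master is (scaling up).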
[13:40:22] <jynus>	 let me see if I can find any pretty graph
[13:49:14] <jynus>	 a930913, I do not have a pretty graph for you, but I can share with you the production configuration of 1 datacenter: https://git.wikimedia.org/blob/operations%2Fmediawiki-config.git/33494f49c75de0e1e0dda7cf4217edd3856ac5eb/wmf-config%2Fdb-eqiad.php
[13:49:56] <jynus>	 that holds the >800 wikis
[13:57:05] <hashar>	 and https://noc.wikimedia.org/db.php  <--- Jynus ;)
[13:57:29] <hashar>	 from noc.wikimedia.org there is also https://dbtree.wikimedia.org/
[13:57:42] <hashar>	 which shows queries per second and the master / slave relationships
[13:57:56] <hashar>	 not sure whether that last one is actually up to date
[13:58:55] <wikibugs>	 Tool-Labs-tools-Other, Epic: Convert all Labs tools to use cdnjs for static libraries - https://phabricator.wikimedia.org/T103934#1488103 (Ricordisamoa)
[13:59:09] <jynus>	 hashar, that is a parsed config, so I do not use it
[13:59:33] <jynus>	 and the other, I do use, but the version on another internal tool with more sensitive data
[13:59:58] <hashar>	 wasn't sure whether you knew about them
[14:00:24] <jynus>	 I kinda did, but I am still discovering services every day to be fair
[14:01:01] <jynus>	 also, those pretty graphs do not work nice with curl, command line for the win :-)
[14:04:54] <jynus>	 Katie: templatelinks sync on enwiki_p is running, now we only have to wait- it may take quite some time
[14:07:49] <a930913>	 jynus: hashar: Thanks guys.
[14:09:44] <jynus>	 to be fair, I think my job is "easy", I think the best thing at wmf is the load balancing and caching, which takes away most of the problems
[14:27:30] <mark>	 that's correct ;)
[14:39:23] <Amir1_>	 Hey, Can you create a submodule of pywikibot named pywikibase?
[14:39:29] <Amir1_>	 pywikibot/pywikibase
[14:39:39] <Amir1_>	 I'm pulling out Wikibase related stuff
[14:47:25] <Amir1_>	 legoktm: hey around?
[14:53:01] <wikibugs>	 Labs, Tool-Labs, Labs-Sprint-107: tools bastion accounting logs super noisy, filling /var - https://phabricator.wikimedia.org/T107052#1488195 (valhallasw) As for other hosts:  tools-exec-1201 has entries as well, but only roughly twice a minute: ``` 10.64.37.10-man   F    root     __         0.00 secs...
[15:24:34] <Amir1_>	 YuviPanda: hey, around?
[15:25:27] <YuviPanda>	 Amir1_: kind of not really
[15:25:27] <YuviPanda>	 Sup
[15:25:53] <Amir1_>	 I just want to create a project in pywikibot
[15:26:14] <Amir1_>	 I don't know how to and probably I don't have permission YuviPanda 
[15:26:47] <YuviPanda>	 I don't know either - shouldn't you ask the pywikibot people?
[15:26:54] <YuviPanda>	 Unless you mean a gerrit project?
[15:27:10] <YuviPanda>	 In which case there is a request form somewhere on mediawiki.org 
[15:30:59] <valhallasw`cloud>	 YuviPanda: I'm considering turning on nfs debugging for a second or two on tools-bastion-01, but I'm afraid it could make the entire server unresponsive
[15:32:08] <Amir1_>	 YuviPanda: I meant a gerrit project
[15:32:12] <Amir1_>	 okay, thanks :)
[15:32:40] <YuviPanda>	 valhallasw`cloud: we could have DNS point -login to -02 and then start messing around?
[15:32:49] <YuviPanda>	 Also brb shower 
[15:32:54] <valhallasw`cloud>	 hm, maybe, yeah.
[15:33:11] <valhallasw`cloud>	 inb4 'dude, where's my screen?'
[15:33:27] <YuviPanda>	 valhallasw`cloud: heh :D 
[15:33:42] <YuviPanda>	 Any running screens on -01?
[15:33:44] <valhallasw`cloud>	 (there's quite a few people running processes again, we should do something about that)
[15:33:54] <valhallasw`cloud>	 probably
[15:34:07] <YuviPanda>	 Ulimits or something 
[15:34:13] <valhallasw`cloud>	 one screen, three tmuxes
[15:34:20] <YuviPanda>	 Ok physically stepping into shower. Brb
[15:34:31] <a930913>	 Daily reboots :D
[15:35:06] <a930913>	 That'll stop 'em.
[15:36:57] <Amir1_>	 https://www.mediawiki.org/wiki/Git/New_repositories/Requests
[15:43:09] <wikibugs>	 Labs, Tool-Labs, Database: Tool Labs enwiki_p replicated database missing rows - https://phabricator.wikimedia.org/T106470#1488314 (jcrespo) Sync is in progress. Looking better already.   ``` MariaDB LABS localhost enwiki > SELECT     -> tl_title     -> FROM page     -> JOIN templatelinks     -> ON tl_...
[15:49:12] <wikibugs>	 Labs, Labs-Sprint-107, Patch-For-Review: Make continuous backups of NFS data to codfw - https://phabricator.wikimedia.org/T106474#1488338 (coren) a:coren
[15:51:40] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Magnus' not found on 'http://tools.wmflabs.org:80/' - 371 bytes in 0.029 second response time
[15:54:13] <wikibugs>	 Labs, Labs-Infrastructure: Rename labstore module to labs_store or similar - https://phabricator.wikimedia.org/T107160#1488346 (scfc) NEW
[15:55:10] <YuviPanda>	 Coren: ^^ shinken alert for toollabs 
[15:55:50] * YuviPanda is on phone on way to office
[15:55:54] <YuviPanda>	 valhallasw`cloud: ^^
[15:56:20] <valhallasw`cloud>	 YuviPanda: {{worksforme}}
[15:56:29] <YuviPanda>	 Ok
[16:00:21] <wikibugs>	 Labs, Tool-Labs, Database: Provide replication lag as a database function - https://phabricator.wikimedia.org/T50628#1488359 (jcrespo)
[16:00:27] <Coren>	 YuviPanda: Not sure what shinken is whining about, I see nothing wrong with it, and plenty of Magnuses.
[16:00:37] <Coren>	 owait.
[16:00:42] <Coren>	 Shinken is trying to HTTP
[16:00:47] <Coren>	 And /that/ fails.
[16:01:31] <Coren>	 Hm.  Intermittently.
[16:01:34] * Coren digs further
[16:06:42] <shinken-wm>	 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 838325 bytes in 2.402 second response time
[16:06:47] <wikibugs>	 Labs, Security, operations: create-dbusers can be used to clobber existing files on the NFS server - https://phabricator.wikimedia.org/T107161#1488399 (scfc) NEW
[16:07:29] <wikibugs>	 Labs, Labs-Infrastructure: Rename labstore module to labs_store or similar - https://phabricator.wikimedia.org/T107160#1488414 (coren) Open>declined a:coren It used to be named labs_storage but @faidon felt that it should reflect the host naming schemes better (labstoreXXXX) for clarity.  So, as d...
[16:14:06] <wikibugs>	 Labs: Find a different backup solution for Wikimetrics - https://phabricator.wikimedia.org/T103001#1488442 (Milimetric) Open>Resolved Thanks, Yuvi.  I verified that backup is working perfectly fine again.  We'll stick with /data/project for now because there's no real incentive to do anything else.
[16:17:15] <wikibugs>	 Labs: Find a different backup solution for Wikimetrics - https://phabricator.wikimedia.org/T103001#1488455 (yuvipanda) Cool. I'll re-open when we have a 'real' backup solution available.
[16:17:51] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds
[16:19:20] <andrewbogott>	 YuviPanda: do you know what’s up with the toollabs page flapping?
[16:19:29] <YuviPanda>	 andrewbogott: no... Coren was looking into it
[16:19:34] <andrewbogott>	 ok, cool.
[16:19:43] <YuviPanda>	 I've to get off to get on BART now, I'll be online in 15
[16:19:51] * andrewbogott should’ve read the backscroll
[16:26:06] <Coren>	 As far as I can tell, it was the admin webservice php that went a bit cray-cray.
[16:26:21] <Coren>	 I've kicked its butt and now it seems to be okay.
[16:27:33] <Coren>	 Aha.
[16:27:42] <Coren>	 PHP Fatal error:  Out of memory 
[16:27:44] <Coren>	 is the cause.
[16:29:28] <wikibugs>	 Labs, Labs-Infrastructure: Rename labstore module to labs_store or similar - https://phabricator.wikimedia.org/T107160#1488533 (scfc) Ah, I have faint memories of that discussion (and as I never touch actual iron, that reasoning is usually not on my mind :-)).
[16:44:17] <niedzielski>	 hashar: hey! i'm a little confused on what to do next for the android emulation test job. you said we should get the job finished first but you disabled it.
[16:49:37] <wikibugs>	 Labs, Beta-Cluster, Wikimedia-Logstash, Patch-For-Review: Logstash on beta yields 500 due to NFS outage (can't open /data/project/logstash/.htpasswd) - https://phabricator.wikimedia.org/T102962#1488597 (bd808)
[16:54:41] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 20.00% of data above the critical threshold [0.0]
[16:59:18] <valhallasw`cloud>	 Coren: what's your hunch on enabling sunrpc.nfs_debug on tools-bastion-01? If it causes a large system load, can we easily set it back to 'off'?
[16:59:40] <valhallasw`cloud>	 it might spam >10k messages per second
[16:59:55] <Coren>	 My first question would be "Why?"
[17:00:15] <valhallasw`cloud>	 Coren: https://phabricator.wikimedia.org/T107052
[17:00:48] <valhallasw`cloud>	 nfs is starting a thread very often (~10k times per second), which floods pacct, but which also indicates an underlying issue
[17:02:57] <Coren>	 valhallasw`cloud: I need to read the kernel source a bit first to figure out what the thread is meant to be doing.
[17:04:09] <valhallasw`cloud>	 it updates nfs client state, basically
[17:04:45] <valhallasw`cloud>	 and as far as I can see, it gets called mostly on errors in the NFS connection
[17:05:24] <wikibugs>	 Labs, Puppet: Move all labs-only puppet roles to manifests/role/labs - https://phabricator.wikimedia.org/T107167#1488633 (yuvipanda) NEW
[17:06:43] <Coren>	 valhallasw`cloud: Yeah, it's not immediately clear what's up.  But I'd debug on anything /but/ -bastion-01.  Why not -bastion-02 at least?
[17:06:53] <valhallasw`cloud>	 Coren: because it's not happening on bastion-02
[17:07:05] <Coren>	 At all, or just not as often?
[17:07:11] <valhallasw`cloud>	 but I can do a more exhaustive search for other hosts where it's happening
[17:07:34] <valhallasw`cloud>	 the manager typically runs once or twice per minute
[17:08:03] * Coren ponders.
[17:08:39] <Coren>	 I'd be surprised if it wasn't just a matter of scale - but if you turn it on do -01 make sure it's only briefly and keep a very close eye on it.
[17:09:14] <valhallasw`cloud>	 *nod*. My fear is not being able to turn it off anymore because of the load
[17:09:22] <wikibugs>	 Labs, Puppet: Move all labs-only puppet roles to manifests/role/labs - https://phabricator.wikimedia.org/T107167#1488644 (demon) Some of them shouldn't be labs-only probably ;-)
[17:09:30] <valhallasw`cloud>	 let me search for other hosts, maybe there is another one borking
[17:09:36] <Coren>	 valhallasw`cloud: foo;sleep 2;not-foo
[17:09:37] <Coren>	 :-)
[17:10:42] * valhallasw`cloud nods
[17:11:12] <valhallasw`cloud>	 ah, it's also happening on some webgrid-14xx nodes it seems
[17:11:40] <valhallasw`cloud>	 yep
[17:12:18] <valhallasw`cloud>	 ok, let's do it on tools-webgrid-lighttpd-1401 instead
[17:12:50] <wikibugs>	 Labs, Puppet: Move all labs-only puppet roles to manifests/role/labs - https://phabricator.wikimedia.org/T107167#1488657 (yuvipanda) Ah, right. so if it is applied in both prod and labs it should *not* be a labs only role but use hiera.  This is for the growing number of things that are 'deployed' to labs...
[17:12:51] * valhallasw`cloud reads up on draining hosts
[17:13:35] <Krenair>	 What do we have installed on labs in the way of mysql libraries for python3?
[17:13:45] <Krenair>	 for tools
[17:14:20] <wikibugs>	 Labs, Tool-Labs, Labs-Sprint-107: tools bastion accounting logs super noisy, filling /var - https://phabricator.wikimedia.org/T107052#1488658 (valhallasw) This is also happening on `tools-webgrid-lighttpd-1401` (and some other lighttpd-14xx hosts, and maybe others as well) which is a safer host to debu...
[17:14:38] <valhallasw`cloud>	 Krenair: we basically have nothing installed in the way of python3 libraries, I think
[17:14:57] <YuviPanda>	 virtualenv, virtualenv, virtualenv
[17:15:21] <Krenair>	 bah, ok
[17:15:23] <Krenair>	 nevermind..
[17:15:35] <YuviPanda>	 we could probably setup pip to allow per-user installs tho
[17:15:39] * YuviPanda gets off bart
[17:16:15] <valhallasw`cloud>	 !log tools disabled queue "webgrid-lighttpd@tools-webgrid-lighttpd-1401.eqiad.wmflabs"
[17:16:18] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[17:23:19] <wikibugs>	 Labs, Tool-Labs: support python3 uwsgi apps - https://phabricator.wikimedia.org/T104374#1488674 (GoldenRing) I've followed the instructions above, and I'm reasonably confident I've done it as described.  But when I try to start the webservice, I get this in the logs:  ``` 2015-07-28 17:22:23: (log.c.166)...
[17:32:15] <Coren>	 valhallasw`cloud: A point that may be of note is whether the incidence of the accounting entries falls off after the node has been drained or not.
[17:34:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0]
[17:39:46] <valhallasw`cloud>	 Coren: *nod*
[17:40:20] <wikibugs>	 Labs, Tool-Labs: Rewrite the meta_p table populating code to python and have it run on a cron - https://phabricator.wikimedia.org/T107094#1488754 (Krenair) a:Krenair
[17:40:55] <YuviPanda>	 Krenair: \o/
[17:41:03] <YuviPanda>	 Krenair: python3 and python3-pymysql?
[17:41:07] <Krenair>	 yep
[17:41:10] <YuviPanda>	 nice
[17:41:23] <Krenair>	 not quite working yet
[17:41:26] <Krenair>	 but it's getting there
[17:41:42] <YuviPanda>	 nice!
[17:43:34] <valhallasw`cloud>	 !log tools rescheduled all webservice jobs on tools-webgrid-lighttpd-1401.eqiad.wmflabs, server is now empty
[17:43:38] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
[17:49:14] <valhallasw`cloud>	 !log tools Jobs were drained at 19:43, but this did not decrease the rate, which is still at ~50k/minute. Now running "sysctl -w sunrpc.nfs_debug=1023 && sleep 2 && sysctl -w sunrpc.nfs_debug=0" which hopefully doesn't kill the server
[17:49:17] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL, Master
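Editorial sketch of the "foo;sleep 2;not-foo" pattern Coren suggested and valhallasw`cloud logged above, written so that the disable step still runs if the wait is interrupted. The helper name `toggle_briefly` and its `run` parameter are hypothetical. The demonstration uses a recording stand-in instead of real `sysctl` calls, which would need root on the affected host.

```python
import subprocess
import time


def toggle_briefly(enable_cmd, disable_cmd, seconds, run=subprocess.check_call):
    """Run enable_cmd, wait `seconds`, then always run disable_cmd.

    If enable_cmd fails, disable_cmd is not run (like `&&` in the shell).
    Unlike the shell chain, an interrupted wait still triggers disable_cmd.
    """
    run(enable_cmd)
    try:
        time.sleep(seconds)
    finally:
        run(disable_cmd)


# Demonstration with a fake runner that only records the commands.
calls = []
toggle_briefly(
    ["sysctl", "-w", "sunrpc.nfs_debug=1023"],
    ["sysctl", "-w", "sunrpc.nfs_debug=0"],
    0.01,
    run=calls.append,
)
print(calls)
```

Leaving `run` at its default would execute the commands for real, so that variant belongs only on a drained host such as the one used in the log.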
[17:55:41] <Krenair>	 >>> req = urllib.request.Request(canon + "/w/api.php?action=query&meta=siteinfo&siprop=general&format=json")
[17:55:41] <Krenair>	 Traceback (most recent call last):
[17:55:41] <Krenair>	   File "<stdin>", line 1, in <module>
[17:55:41] <Krenair>	 AttributeError: 'module' object has no attribute 'request'
[17:55:41] <Krenair>	 >>> req = urllib.request.Request(canon + "/w/api.php?action=query&meta=siteinfo&siprop=general&format=json")
[17:55:47] <Krenair>	 >>>
[17:55:49] <Krenair>	 ... wut?
[17:56:32] <Krenair>	 works if I import urllib.request as well as urllib
[17:56:37] <legoktm>	 yes
[17:56:41] <legoktm>	 OR
[17:56:47] <legoktm>	 import requests
[17:56:50] <legoktm>	 requests.get(...)
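Editorial sketch of the fix Krenair found: in Python 3, `import urllib` on its own does not load the `request` submodule, so it has to be imported explicitly. The helper name `build_siteinfo_request` and the example value for `canon` are assumptions for illustration. The request is only built here, not sent.

```python
import urllib.parse
import urllib.request  # explicit submodule import; plain "import urllib" is not enough


def build_siteinfo_request(canon):
    """Build (but do not send) a siteinfo API request for the wiki at `canon`."""
    query = urllib.parse.urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    })
    return urllib.request.Request(canon + "/w/api.php?" + query)


# "https://en.wikipedia.org" is an illustrative value for `canon`.
req = build_siteinfo_request("https://en.wikipedia.org")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the fetch. The third-party `requests` library that legoktm mentions is an alternative if it is installed, for example in a virtualenv.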
[18:02:07] <wikibugs>	 Labs, Tool-Labs, Labs-Sprint-107: tools bastion accounting logs super noisy, filling /var - https://phabricator.wikimedia.org/T107052#1488828 (valhallasw) The rate did not decrease after draining the server of jobs. Using  ``` sysctl -w sunrpc.nfs_debug=1023 && sleep 2 && sysctl -w sunrpc.nfs_debug=0 `...
[18:03:25] <valhallasw`cloud>	 bah
[18:04:01] <GoldenRing>	 Anyone familiar with using lighttpd + fastcgi + flipflop + flask + python3?
[18:04:31] <Coren>	 valhallasw`cloud: That was impressively uninformative
[18:08:31] <GoldenRing>	 I'm trying to follow the instructions left by valhallasw'cloud on phab ticket 104374, but always get "child exited with status 13" in the error log.
[18:08:36] <GoldenRing>	 Can't make out what I'm doing wrong.
[18:08:57] <valhallasw`cloud>	 GoldenRing: the error reporting for lighttpd is typically not very informative, no :(
[18:09:34] <GoldenRing>	 Any pointers?
[18:09:44] <GoldenRing>	 Either to what's wrong or how to diagnose it.
[18:10:06] <valhallasw`cloud>	 you can try running the app.fcgi directly
[18:10:12] <valhallasw`cloud>	 to see if that gives you a clear error message
[18:11:00] <valhallasw`cloud>	 back in ~20 mins
[18:15:08] <GoldenRing>	 Yes, that gives "OSError: [Errno 88] Socket operation on non-socket"
[18:15:41] <GoldenRing>	 I've tried googling that, but all the answers I've found amount to, "It does that when you run it directly, but it should work under a web server."
[18:17:28] <YuviPanda>	 GoldenRing: hi. let me see if I can help
[18:17:40] <GoldenRing>	 Thanks.
[18:29:18] <GoldenRing>	 Er, so, any ideas?
[18:29:30] <YuviPanda>	 GoldenRing: patch incoming to add native python3 support
[18:30:14] <GoldenRing>	 Heh.  That would be the best solution, for sure.
[18:30:29] <GoldenRing>	 As in, using uwsgi / webservice2?
[18:30:33] <YuviPanda>	 GoldenRing: yes
[18:30:39] <GoldenRing>	 Nice.
[18:30:50] <GoldenRing>	 I'll leave you to it, then.
[18:30:55] <GoldenRing>	 Thanks.
[18:48:25] <wikibugs>	 Labs, Tool-Labs: support python3 uwsgi apps - https://phabricator.wikimedia.org/T104374#1489137 (valhallasw) I'm at a loss why this would happen. I've changed the app.fcgi to be completely wrapped by a try-except block, and even that fails. That's consistent with the output: 'Flask server started <time>'...
[19:02:23] <wikibugs>	 Labs, Tool-Labs, Labs-Sprint-107: tools bastion accounting logs super noisy, filling /var - https://phabricator.wikimedia.org/T107052#1489248 (valhallasw) tcpdump is slightly more informative:  ``` valhallasw@tools-webgrid-lighttpd-1401:~$ sudo tcpdump port 2049 | grep ERROR | head -n 25 tcpdump: verbo...
[19:04:36] <valhallasw`cloud>	 andrewbogott: ^ I got further with tcpdump, but I don't know enough NFS to really make sense of what I'm seeing. If you want to fiddle around, tools-webgrid-lighttpd-1401 has the same issue, and you can do whatever you want to it (unmount stuff, reboot, etc)
[19:05:07] <andrewbogott>	 valhallasw`cloud: maybe Coren will magically fix it, otherwise I’ll look later on
[19:15:57] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 16.67% of data above the critical threshold [0.0]
[19:26:02] <wikibugs>	 Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1489379 (Thgoiter) Down a lot since yesterday. Please restart.
[19:30:02] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 30.00% of data above the critical threshold [0.0]
[19:30:56] <shinken-wm>	 PROBLEM - Puppet failure on tools-master is CRITICAL 50.00% of data above the critical threshold [0.0]
[19:31:53] <wikibugs>	 Labs, Tool-Labs, Patch-For-Review: support python3 uwsgi apps - https://phabricator.wikimedia.org/T104374#1489408 (yuvipanda) ok! so if your webservice is of type uwsgi-plain, you can put whatever you want in your uwsgi.ini - that should be able to get you working with python3 / uwsgi with just followi...
[20:02:10] <wikibugs>	 Tool-Labs-tools-Other: Fix tool kmlexport - https://phabricator.wikimedia.org/T92963#1489476 (valhallasw) Restarted.
[20:05:54] <shinken-wm>	 RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0]
[20:06:29] <jdlrobson>	 YuviPanda: could i borrow you for about 30 mins at some point this afternoon?
[20:07:16] <YuviPanda>	 jdlrobson: sure.
[20:10:03] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0]
[20:19:21] <jdlrobson>	 YuviPanda: what time works for you?
[20:19:31] <YuviPanda>	 jdlrobson: gonna get food in 10, how about 2pm?
[20:19:34] <jdlrobson>	 can i rely on your calendar?
[20:19:34] <YuviPanda>	 jdlrobson: also depends on what for? :)
[20:20:31] <jdlrobson>	 cool sounds good. It's about https://github.com/wikimedia/labs-tools-gerrit-to-redis  (quick demo and getting permissions etc)
[20:26:22] <YuviPanda>	 ah that one
[20:26:23] <YuviPanda>	 ok
[20:26:41] <YuviPanda>	 jdlrobson: yes you can rely on my calendar (yes it's that empty :D)
[20:38:17] <wikibugs>	 Labs, Release-Engineering, operations, wikitech.wikimedia.org, Patch-For-Review: silver / scap -  Could not get latest version: 403 Forbidden - https://phabricator.wikimedia.org/T103138#1489646 (greg)
[20:39:28] <wikibugs>	 Labs, Puppet: Could not find data item labs_recursor - https://phabricator.wikimedia.org/T107205#1489652 (Tgr) NEW
[20:43:58] <wikibugs>	 Wikibugs: wikibugs announcing phab column changes on IRC - https://phabricator.wikimedia.org/T107208#1489704 (greg) NEW
[20:45:17] <wikibugs>	 Wikibugs: wikibugs announcing phab column changes on IRC - https://phabricator.wikimedia.org/T107208#1489719 (greg)
[20:50:28] <grrrit-wm>	 (PS1) Legoktm: Don't notify if multiple ignored actions were triggered [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/227574 (https://phabricator.wikimedia.org/T107208)
[20:50:37] <legoktm>	 bd808: ^ quick cr?
[20:51:01] <bd808>	 legoktm: sure. looking
[20:52:33] <grrrit-wm>	 (CR) BryanDavis: [C: 1] Don't notify if multiple ignored actions were triggered [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/227574 (https://phabricator.wikimedia.org/T107208) (owner: Legoktm)
[20:52:46] <bd808>	 legoktm: I only haz +1 there apparently
[20:53:10] <grrrit-wm>	 (CR) Jforrester: [C: 1] Don't notify if multiple ignored actions were triggered [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/227574 (https://phabricator.wikimedia.org/T107208) (owner: Legoktm)
[20:57:10] <grrrit-wm>	 (CR) Legoktm: [C: 2] Don't notify if multiple ignored actions were triggered [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/227574 (https://phabricator.wikimedia.org/T107208) (owner: Legoktm)
[20:57:26] <grrrit-wm>	 (Merged) jenkins-bot: Don't notify if multiple ignored actions were triggered [labs/tools/wikibugs2] - https://gerrit.wikimedia.org/r/227574 (https://phabricator.wikimedia.org/T107208) (owner: Legoktm)
[20:59:03] <ircnotifier>	 !log tools.wikibugs legoktm: Deployed 680c8aad81158a3ddb1c4018233c07729c163cc0 Don't notify if multiple ignored actions were triggered wb2-phab
[20:59:06] <labs-morebots>	 Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikibugs/SAL, Master
[21:01:04] <YuviPanda>	 jdlrobson: where are you?
[21:01:09] <jdlrobson>	 coming out :D
[21:01:14] <jdlrobson>	 *over :)
[21:01:33] <YuviPanda>	 jdlrobson: ok!
[21:05:11] <YuviPanda>	 mmm fonts
[21:24:59] <shinken-wm>	 PROBLEM - SSH on tools-exec-1213 is CRITICAL - Socket timeout after 10 seconds
[21:29:50] <shinken-wm>	 RECOVERY - SSH on tools-exec-1213 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2~wmfprecise2 (protocol 2.0)
[21:42:17] <grrrit-wm>	 (PS2) Sitic: Add ORES support [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227163 (https://phabricator.wikimedia.org/T106727)
[21:45:05] <grrrit-wm>	 (PS3) Sitic: Add ORES support [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227163 (https://phabricator.wikimedia.org/T106727)
[22:27:18] <wikibugs>	 Wikibugs, Patch-For-Review: wikibugs announcing phab column changes on IRC - https://phabricator.wikimedia.org/T107208#1490007 (Legoktm) Open>Resolved a:Legoktm
[22:28:03] <wikibugs>	 Labs, Puppet: Could not find data item labs_recursor - https://phabricator.wikimedia.org/T107205#1490025 (scfc) The underlying problem (I think) is that `role::puppet::self` down the line includes `puppetmaster::hiera` which sets up:  ```     file { '/etc/puppet/hiera.yaml':         ensure  => $ensure,...
[22:45:34] <grrrit-wm>	 (CR) Sitic: [C: 2 V: 2] Add ORES support [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227163 (https://phabricator.wikimedia.org/T106727) (owner: Sitic)
[22:46:02] <grrrit-wm>	 (PS1) Sitic: Fix logevents for pagetranslation [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227600
[22:46:52] <grrrit-wm>	 (CR) Sitic: [C: 2 V: 2] Fix logevents for pagetranslation [labs/tools/crosswatch] - https://gerrit.wikimedia.org/r/227600 (owner: Sitic)
[23:07:49] <wikibugs>	 Labs, Datasets-Archiving, Labs-Infrastructure, Wikidata: Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1490125 (Hydriz) NEW
[23:08:06] <wikibugs>	 Labs, Datasets-Archiving, Datasets-General-or-Unknown, Labs-Infrastructure, Wikidata: Wikidata JSON dumps gets deleted after every new Wikidata dump - https://phabricator.wikimedia.org/T107226#1490125 (Hydriz)
[23:55:58] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0]