[00:05:32] <mutante>	  kaldari> matt_flaschen: for some reason I can't edit any pages on wikitech wiki. it just gives me a session error.
[00:09:35] <mutante>	 personally i have this issue right now
[00:09:39] <mutante>	  The two-factor authentication token provided was invalid. 
[00:09:52] <mutante>	 using google authenticator as normal
[00:10:22] <legoktm>	 yuvipanda: should it run on trusty or precise?
[00:11:58] <mutante>	 ok, my 2fa problem is solved. my time on the phone was off
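Why a phone clock that is off breaks two-factor codes: time-based one-time passwords (RFC 6238, the scheme Google Authenticator uses by default) derive each code from the current 30-second window, so a device whose clock has drifted into a different window produces codes the server rejects. A minimal Python sketch, assuming the common SHA-1 / 6-digit / 30-second parameters; the function name `totp` and the secret used in the usage note are illustrative and not taken from wikitech's configuration.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code for the given Unix timestamp."""
    # The moving factor is the number of whole time steps since the epoch.
    counter = unix_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

With the RFC test secret `b"12345678901234567890"`, timestamps 30 and 59 fall in the same window and give the same code, while timestamp 60 already gives a different one; a phone that is a minute or more off will therefore usually be rejected unless the server tolerates that much drift.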
[00:18:37] <wikibugs>	 Tool-Labs: Grants for my Tools-db missing to insert new lines - https://phabricator.wikimedia.org/T98790#1278118 (scfc) a:Springle
[00:51:32] <wikibugs>	 Labs, operations, wikitech.wikimedia.org: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1278203 (yuvipanda) NEW
[00:53:01] <wikibugs>	 Labs, operations, wikitech.wikimedia.org: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1278212 (Krenair) See also T87036 - although silver runs trusty and has PHP 5.5 rather than 5.3, we should still migrate it to HHVM.
[00:55:54] <wikibugs>	 Labs, operations, wikitech.wikimedia.org: Move wikitech to HHVM - https://phabricator.wikimedia.org/T98813#1278220 (Krenair)
[01:04:45] <wikibugs>	 Labs, Phabricator: Figure out who made #Labs a restricted-join/edit project and why (or revert) - https://phabricator.wikimedia.org/T98814#1278235 (Krenair) NEW a:chasemp
[01:05:47] <wikibugs>	 Labs, Phabricator: Figure out who made #Labs a restricted-join/edit project and why (or revert) - https://phabricator.wikimedia.org/T98814#1278251 (Krenair)
[01:07:26] <wikibugs>	 Labs, Beta-Cluster, Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1278255 (bd808) Syslogs are now going to deployment-logstash1 instead of deployment-bastion via a cherry-pick of https://gerrit.wikimedia.org/r/210253.  I removed `role::syslog::centralserver` fro...
[01:10:58] <yuvipanda>	 legoktm: lol, so when we expanded capacity two weeks ago, it made webservice restarts so fast that I can’t test my code that waits for webservices to start before returning :)
[01:13:16] <legoktm>	 haha
[01:21:11] <HaeB>	 yuvipanda: quelle horreur ;) apropos: i was wondering; after "webservice uwsgi-python (re)start" i don't have to worry about anything ever again, right?
[01:21:23] <yuvipanda>	 HaeB: basically, yes
[01:21:25] <HaeB>	 i.e. won't normally need to restart it manually
[01:21:28] <yuvipanda>	 no
[01:21:33] <HaeB>	 ok
[01:41:54] <wikibugs>	 Tool-Labs, Patch-For-Review: Unify / simplify webservice code - https://phabricator.wikimedia.org/T98440#1278291 (yuvipanda) So ^ is the beginning of webservice-new for interactive use and webservice-runner for actually starting things on the webservice nodes. Also contains implementations of tool-nodejs...
[01:45:05] <spagewmf>	 yuvipanda: https://wikitech.wikimedia.org/wiki/Help:Labs-vagrant#Update_vagrant says "Run `sudo git pull`". Why do you need to run as root? `git pull` seems to work fine.
[01:47:12] <yuvipanda>	 spagewmf: I think the permissions at labs-vagrant were messed up enough at some point that people basically gave up trying to do the right thing and did sudo on everything
[01:47:18] <yuvipanda>	 which might not be the worst of approaches mind you
[01:48:46] <spagewmf>	 yuvipanda: I always leave the safety on when handling a gun
[01:49:42] <yuvipanda>	 spagewmf: :D only if the gun has a working safety :) but you’re right.  
[01:50:01] <yuvipanda>	 what all of this needs is for someone to spend a day going through it and fixing all the user issues
[01:50:53] <yuvipanda>	 legoktm: https://gerrit.wikimedia.org/r/#/c/210196/ if / when you feel like it :)
[01:51:31] <spagewmf>	 yuvipanda: OK, I'll remove the `sudo`s with a Troubleshooting suggestion to retry with sudo.
[01:51:43] <yuvipanda>	 spagewmf: +1
[01:54:32] <legoktm>	 yuvipanda: I probably won't have time to review it soon sorry
[01:54:41] <yuvipanda>	 legoktm: that’s cool too. thanks :)
[01:54:51] <yuvipanda>	 hence the if/when :)
[01:56:38] <spagewmf>	 legoktm: when are you going to drop the cloak of invisibility on https://wikimediafoundation.org/wiki/Staff_and_contractors?showall=1 so I don't make embarrassing errors in the witness lineup?
[01:56:59] <legoktm>	 spagewmf: not soon, there's a picture of me on officewiki though :P
[01:58:13] <spagewmf>	 legoktm: you look like Adam Basso in a dark room :)  Still better than nothing, thanks.
[01:58:23] <legoktm>	 heh
[02:27:25] <wikibugs>	 Tool-Labs: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1278311 (yuvipanda) NEW
[02:32:23] <wikibugs>	 Tool-Labs: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1278327 (yuvipanda) 5 tools use this currently: ``` ./directory/.lighttpd.conf:#no-default-php ./static/.lighttpd.conf:#no-default-php ./newwebtest/.lighttpd.conf:#no-default-php ./enwp10/.lighttpd.conf:#no-defa...
[02:50:00] <wikibugs>	 Tool-Labs: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1278371 (yuvipanda) p:Triage>Low
[02:50:19] <wikibugs>	 Tool-Labs: Deprecate #no-default-php in .lighttpd.conf - https://phabricator.wikimedia.org/T98818#1278311 (yuvipanda)
[02:50:20] <wikibugs>	 Tool-Labs, Patch-For-Review: Unify / simplify webservice code - https://phabricator.wikimedia.org/T98440#1278372 (yuvipanda)
[03:35:26] <shinken-wm>	 PROBLEM - ToolLabs Home Page on toollabs is CRITICAL - Socket timeout after 10 seconds
[03:40:19] <shinken-wm>	 RECOVERY - ToolLabs Home Page on toollabs is OK: HTTP OK: HTTP/1.1 200 OK - 768374 bytes in 4.542 second response time
[04:05:19] <spagewmf>	 yo yuvipanda , Shirley one should change the admin password for a labs-vagrant instance ASAP before the bad guys watching wikitech for new Nova resources swoop in and compromise it?!
[04:07:06] <bd808>	 spagewmf: don't blame yuvipanda for labs-vagrant. He hasn't worked on it for over a year ;)
[04:07:06] <spagewmf>	 is there a standard place people stick the admin password for others with shell access to the instance?  The old role has /srv/mediawiki/orig/adminpass
[04:08:03] <bd808>	 I tend to lock others out and assume they know how to run createAndPromote.php when they need more rights
[04:08:03] <spagewmf>	 bd808: I'm a lover not a blame stylist :) . I'm fixing up https://wikitech.wikimedia.org/wiki/Help:Labs-vagrant as I go, seems "Change the admin password" is fairly important.
[04:08:53] <bd808>	 noted and appreciated
[04:09:12] <bd808>	 I'm not so great at remembering to document things as I go
[04:09:14] <legoktm>	 $wgMinimalPasswordLength = 0;
[04:10:47] <bd808>	 I don't know that I could list all the default settings provided by MediaWiki-Vagrant that make the wikis it sets up insecure
[04:11:18] <bd808>	 it's tuned to be a dev environment which is not very compatible with "secure"
[04:11:45] <legoktm>	 $wgGroupPermissions['*']  = User::getAllRights(); (won't work unfortunately :()
[04:12:14] <bd808>	 constant passwords, errors to web, mysql connects as admin user, ...
[04:13:26] <spagewmf>	 bd808: agreed, but Shirley changing the admin password is Security step 0.
[04:13:30] <spagewmf>	 *surely
[04:13:53] <bd808>	 agreed (and don't call me Shirley)
[04:15:04] <Magog_the_Ogre>	 my new favorite page: https://en.wikipedia.org/wiki/List_of_lists_of_lists
[04:15:32] <bd808>	 that is a good one for sure
[04:15:43] <bd808>	 wikipedians live their lists
[04:15:50] <bd808>	 s/live/love/
[04:16:30] <bd808>	 https://en.wikipedia.org/wiki/Portal:Contents/Lists
[04:17:14] <Magog_the_Ogre>	 you employees are too kind to us
[04:18:46] <bd808>	 ha. I think the world is pretty kind for paying for me to have this job :)
[04:19:57] <Magog_the_Ogre>	 about a month ago, a tool of mine, er, caused a catastrophic failure on Commons
[04:20:07] <Magog_the_Ogre>	 >days since breaking Wikimedia: 17
[04:20:25] <Magog_the_Ogre>	 I got a polite request to please change its behavior
[04:20:43] <bd808>	 I don't think I've collaborated on a global outage for ... 3 months?
[04:21:04] <Betacommand>	 Magog_the_Ogre: what happened?
[04:21:07] <bd808>	 my logging work helped make one in late January last longer than it should have
[04:21:27] <Magog_the_Ogre>	 nah it was YuviPanda
[04:21:45] <bd808>	 was that the really large gallery thing?
[04:21:57] <Magog_the_Ogre>	 Betacommand, my bot was generating user galleries, but the users weren't watching them very closely, so they were ending up with several thousand per section
[04:22:01] <Magog_the_Ogre>	 yup, that was me
[04:22:18] <bd808>	 it took us a while to puzzle that one out
[04:22:42] <Magog_the_Ogre>	 yeah I truly can't say sorry enough
[04:23:02] <bd808>	 totally not your fault in my opinion
[04:23:20] <Betacommand>	 Magog_the_Ogre: I once had my watchlist crash the servers
[04:23:23] <bd808>	 wikitext provides a myriad of ways to make bad things happen
[04:23:58] <bd808>	 you just happened to make a useful tool that helped make some wikitext we weren't protecting against
[04:24:19] <Magog_the_Ogre>	 true
[04:24:21] <spagewmf>	 "List of Argentinian films of the 1930s" is not a list of lists!
[04:25:02] <bd808>	 all the "WTFs" when we found the problem were pointed at the stuff in core that the galleries exposed
[04:25:04] <Magog_the_Ogre>	 still, no code is perfect, and the lists were well beyond the point of useful. I take at least partial responsibility
[04:26:11] <bd808>	 At $DAYJOB-2 I wrote some code that "misplaced" $4M one day
[04:26:27] <bd808>	 mistakes are easy and yours was cheap in comparison
[04:26:34] <Magog_the_Ogre>	 spagewmf, I think it is
[04:29:14] * Magog_the_Ogre wonders how a computer can misplace hundreds of dollars
[04:29:27] <Magog_the_Ogre>	 and by hundreds, I mean millions (ironically)
[04:31:05] <bd808>	 it was a tax payment app and my code messed up the table that recorded who had paid and how much
[04:31:31] <Magog_the_Ogre>	 oh man
[04:31:35] <bd808>	 so we had $4M in the bank account but weren't sure who to credit for the payments
[04:31:46] <Magog_the_Ogre>	 I work for a company that had accounting problems at one point
[04:31:56] <Magog_the_Ogre>	 we have auditors everywhere for everything
[04:32:10] <Magog_the_Ogre>	 that kind of mistake could literally sink my whole company
[04:32:46] <Magog_the_Ogre>	 not trying to make you feel bad, just giving some perspective from the other side
[04:32:55] <bd808>	 thankfully I had operational logs that could be used to figure things out. It took a couple of nerve wracking days but we handled it
[04:33:02] <Magog_the_Ogre>	 lol
[04:33:47] <bd808>	 At the same job I wrote the accounting system and the NACHA transfer code. I was supposed to be a "professional" ;)
[04:34:36] <Magog_the_Ogre>	 I worked in a bank for a few years (non-IT, call center grunt worker)
[04:34:42] <Magog_the_Ogre>	 I always wondered how ACHs work
[04:35:31] <bd808>	 basically by exchanging flat files over sftp
[04:35:55] <bd808>	 there's a thick manual that describes the file formats and transfer protocols
[04:36:39] <bd808>	 Most folks buy a module that plugs into their accounting system to do the work
[04:36:44] <Magog_the_Ogre>	 I never saw any major mistakes, although my one friend claimed he saw someone's account get debited with the maximum amount, which was like 9.9999999999 trillion
[04:37:01] <Magog_the_Ogre>	 yeah 
[04:37:29] <bd808>	 since I wrote our accounting system I ended up on the hook for the transfer code too
[04:37:31] <Magog_the_Ogre>	 if there's anything I'd want prewritten and pretested software for, it'd be money transfer.
[04:38:19] <bd808>	 there was a pretty reasonable audit and testing process for getting my code certified to operate with the NACHA clearinghouse
[04:38:27] <bd808>	 that was actually a fun project
[04:41:23] <Magog_the_Ogre>	 gn all
[05:36:21] <spagewmf>	 Whoever ported `sudo su vagrant; labs-vagrant git-update`, I <3 you
[05:36:32] <bd808>	 yw
[05:36:56] <bd808>	 I think I did it because you asked for it spagewmf ;)
[05:37:52] <bd808>	 yup -- https://gerrit.wikimedia.org/r/#/c/161360/
[05:38:01] <bd808>	 "S pointed out that this was missing in an email."
[06:35:30] <shinken-wm>	 PROBLEM - Puppet failure on tools-master is CRITICAL 66.67% of data above the critical threshold [0.0]
[07:00:28] <shinken-wm>	 RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0]
[08:54:27] <shinken-wm>	 PROBLEM - Puppet staleness on tools-mailrelay-01 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[09:19:48] <valhallasw>	 yuvipanda: if you have time today, could you try to make the shinken accounts?
[09:27:09] <wikibugs>	 Tool-Labs: Grants for my Tools-db missing to insert new lines - https://phabricator.wikimedia.org/T98790#1278744 (Springle) Afaik labs grants are set by @coren's maintain-replicas.pl, or perhaps @yuvipanda was rebuilding it? You'll need to ask them if anything has changed.
[09:27:21] <wikibugs>	 Tool-Labs: Grants for my Tools-db missing to insert new lines - https://phabricator.wikimedia.org/T98790#1278748 (Springle) a:Springle>None
[09:50:34] <grrrit-wm>	 (CR) Filippo Giunchedi: "understood, thanks" [labs/toollabs] - https://gerrit.wikimedia.org/r/209968 (https://phabricator.wikimedia.org/T98641) (owner: Merlijn van Deen)
[09:56:02] <wikibugs>	 Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1278857 (Springle) a:Springle>jcrespo
[10:21:32] <wikibugs>	 Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1278916 (Springle) Probably we should setup an M5 cluster dedicated to databases for labs services, and also move pdns/designate over from M1 at the same time. A single EQIAD R510 master (db1009 is available)...
[11:22:26] <konggaru>	 I have a question about Python encoding
[11:22:52] <konggaru>	 I ran my Python script on my computer and it works well
[11:23:13] <konggaru>	 and I ran that script on wikimedia-labs via SSH and worked well
[11:23:52] <konggaru>	 However, when I use jstart there is an error "UnicodeEncodeError"
[11:24:17] <konggaru>	 My script has East-Asian characters
[11:24:32] <valhallasw>	 konggaru: set PYTHONIOENCODING
[11:27:22] <valhallasw>	 or explicitly encode() what you print
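The two suggestions above address the same likely cause: a job started through jstart has no terminal attached, so Python may pick an ASCII codec for stdout and fail on East Asian text. Setting `PYTHONIOENCODING=utf-8` in the job's environment changes the codec globally. The second option, encoding explicitly, can be sketched as follows in Python 3 syntax; the helper name `emit` is hypothetical, and UTF-8 is assumed to be the desired output encoding.

```python
import sys


def emit(text, stream=None, encoding="utf-8"):
    """Write text as explicitly encoded bytes, bypassing the stream's own codec."""
    # sys.stdout.buffer is the underlying binary stream in Python 3.
    stream = stream if stream is not None else sys.stdout.buffer
    stream.write((text + "\n").encode(encoding))
    stream.flush()
```

Calling `emit("한국어")` in place of `print("한국어")` writes the same bytes whether or not the process has a terminal or a UTF-8 locale.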
[11:29:28] <konggaru>	 Thank you
[14:01:05] <hashar>	 the continuous integration weekly meeting is starting now in #wikimedia-office . Short agenda is https://www.mediawiki.org/wiki/Continuous_integration/Meetings/2015-05-12
[14:01:49] <wikibugs>	 Labs: Get Labs openstack service dbs on a proper db server - https://phabricator.wikimedia.org/T92693#1279418 (Andrew) Works for me!  Just let me know when/if I should redirect services to a new db host.
[14:19:30] <wikibugs>	 Labs, operations, ops-eqiad: Can labvirt* boxes take more RAM? - https://phabricator.wikimedia.org/T98658#1279518 (Cmjohnson) No, all the slots are full.
[14:24:01] <wikibugs>	 Labs, operations, ops-eqiad: Can labvirt* boxes take more RAM? - https://phabricator.wikimedia.org/T98658#1279538 (Andrew) Open>declined ok, thanks
[14:51:37] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 20.00% of data above the critical threshold [0.0]
[15:17:02] <marcmiquel>	 hi guys
[15:17:13] <marcmiquel>	 i have a question about using python packages in tool labs
[15:17:21] <marcmiquel>	 i need to use one in particular which needs to be installed
[15:17:40] <marcmiquel>	 but the system doesn't allow me to do sudo or use pip
[15:18:08] <marcmiquel>	 does anybody know another way to do it?
[15:21:34] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0]
[15:34:22] <a930913>	 marcmiquel: virtualenv
[15:35:48] <a930913>	 Coren: Why have a load of continuous jobs that had been running for months, stopped?
[15:39:02] <codezee>	 hi all, just a small question, I had filed a request for a labs-instance here at https://phabricator.wikimedia.org/T98537 , but I'm wondering if I need to furnish any other information related to it?
[15:39:22] <codezee>	 as was mentioned, I have created it as a subtask of https://phabricator.wikimedia.org/T76375
[15:44:10] <Betacommand>	 Coren: you around?
[15:45:12] <marcmiquel>	 thanks a930913. i'm going to try.
[15:46:53] <a930913>	 marcmiquel: virtualenv <dir>; source <dir>/bin/activate; pip install <package>;
[15:47:23] <marcmiquel>	 ¿?
[15:47:35] <marcmiquel>	 i'm uploading the last virtualenv to my tool labs
[15:47:55] <a930913>	 marcmiquel: virtualenv is already installed.
[15:48:36] <marcmiquel>	 what is the line you posted me doing?
[15:48:55] <a930913>	 marcmiquel: It's three lines of bash.
[15:49:11] <a930913>	 First makes the virtual environment in the directory.
[15:49:30] <a930913>	 Second activates the environment so you can use it.
[15:49:52] <a930913>	 Third is the pip install to install whatever you want in the environment.
[15:50:09] <marcmiquel>	 but, should i substitute dir for something else?
[15:50:37] <a930913>	 Yeah, I usually use "venv"
[15:50:49] <a930913>	 Assuming it's in the current directory.
[15:51:36] <marcmiquel>	 ¿? u invent the name of the dir?
[15:52:26] <marcmiquel>	 aha, it creates the environment in the dir u create i see
[16:17:24] <marcmiquel>	 a930913: once the modules are installed using virtualenv there is no problem in calling them in the code using "import X", right?
[16:26:26] <valhallasw>	 a930913: stopped? that's odd. Any date/time when that has happened? Might have something to do with the rebuild of the -exec cluster, but that should have restarted the jobs
[16:27:23] <valhallasw>	 marcmiquel: that should work, but remember to source <dir>/bin/activate before you run your script
[16:27:43] <L235>	 Hey, FYI, I'm getting Error: 503, Service Unavailable at Tue, 12 May 2015 16:26:11 GMT on beta.wmflabs.org
[16:27:44] <marcmiquel>	 ah, ok. thanks guys! :)
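Since `import X` only finds the installed modules when the environment has been activated, a script can check this at startup and fail with a clear message. A minimal sketch; the helper name `in_virtualenv` is hypothetical, and it relies on classic virtualenv setting `sys.real_prefix` while newer venv-style environments make `sys.prefix` differ from `sys.base_prefix`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter appears to run inside a virtual environment."""
    # Classic virtualenv records the original interpreter prefix here.
    if hasattr(sys, "real_prefix"):
        return True
    # venv-style environments keep the original prefix in sys.base_prefix.
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix
```

A script could call `in_virtualenv()` first and exit with a hint to run `source <dir>/bin/activate` when it returns False.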
[16:28:20] <valhallasw>	 L235: works for me? I get a redirect to http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page
[16:29:19] <L235>	 Yeah, as in I'm getting forwarded there, but that gives the 503
[16:29:22] <L235>	 Request: GET http://deployment.wikimedia.beta.wmflabs.org/wiki/Main_Page, from 127.0.0.1 via deployment-cache-text02 deployment-cache-text02 ([127.0.0.1]:3128), Varnish XID 1175329398
[16:29:22] <L235>	 Forwarded for: 192.171.222.224, 66.249.85.191, 127.0.0.1
[16:29:22] <L235>	 Error: 503, Service Unavailable at Tue, 12 May 2015 16:27:49 GMT
[16:29:34] <L235>	 (Sorry for the flood)
[16:31:13] <valhallasw>	 L235: hm, not sure. I'd suggest opening a ticket on phab, with tag #beta-cluster
[16:32:13] <L235>	 Will do if not resolved in ~1day or so
[16:32:46] <L235>	 (It's not urgent)
[16:33:58] <L235>	 ....and it's back
[16:47:30] <yuvipanda>	 valhallasw: L235 also #wikimedia-releng deals with betacluster stuff usually 
[16:47:55] <L235>	 Good to know for the future, thanks
[16:48:03] <yuvipanda>	 Yw
[16:48:21] <valhallasw>	 yuvipanda: 'morning
[16:48:22] <yuvipanda>	 valhallasw: I added you as a reviewer to tools - webservice initial commit :)
[16:48:25] <yuvipanda>	 valhallasw: hello
[16:48:30] <valhallasw>	 yuvipanda: yeah, I'll take a look.
[16:48:34] * yuvipanda is just getting ready to head to office
[16:48:36] <valhallasw>	 yuvipanda: who should I poke for ldap nda?
[16:48:40] <yuvipanda>	 valhallasw: sweet.
[16:49:11] <yuvipanda>	 valhallasw: usually ops on clinic duty. I can do it too I think. Open a bug anyeT?
[16:49:23] <valhallasw>	 yuvipanda: https://phabricator.wikimedia.org/T97580 :P
[16:49:27] <valhallasw>	 eh
[16:49:27] <valhallasw>	 https://phabricator.wikimedia.org/T93644
[16:49:43] <yuvipanda>	 Hehe cool
[16:49:48] <yuvipanda>	 I'll do it when I'm at laptop 
[16:49:53] <valhallasw>	 <3
[16:49:58] <yuvipanda>	 I shall do the shinken accounts as well
[16:50:26] <valhallasw>	 <barnstar>
[17:07:25] <Betacommand>	 my first draft of the re-write of the ?status page for tools http://tools.wmflabs.org/betacommand-dev/cgi-bin/sge_status.py
[17:07:47] <Betacommand>	 just need to throw in the filtering system
[18:24:42] <codezee>	 yuvipanda: hello! do you have a moment?
[18:28:45] <Nemo_bis>	 Does someone know what happened to Magnus Manske's viewstats_cache table? https://bitbucket.org/magnusmanske/glamtools/issue/27/unknown-database-s51203__viewstats_cache
[18:29:17] <wikibugs>	 Labs, hardware-requests, operations: labnet1002 - https://phabricator.wikimedia.org/T98740#1280447 (RobH) So we had to order a 10G card for labnet1001, we'll need to do the same for whatever system is allocated for labnet1002.  labnet1001 has the following: 2x Intel Xeon(R) CPU X5650  @ 2.67GHz (6 core...
[18:29:50] <wikibugs>	 Tool-Labs, Wikidata, Patch-For-Review, Wikidata-Sprint-2015-05-05: Add wb_changes_subscription and wbc_entity_usage to labs db replication - https://phabricator.wikimedia.org/T98748#1280450 (Bene)
[18:30:07] <wikibugs>	 Labs, hardware-requests, operations: labnet1002 - https://phabricator.wikimedia.org/T98740#1280451 (RobH) p:Triage>Normal
[18:58:59] <wikibugs>	 Labs, hardware-requests, operations: labnet1002 - https://phabricator.wikimedia.org/T98740#1280508 (Andrew) No need for SSDs; lets go with the 410.  Thanks.
[19:08:15] <codezee>	 hi! I had filed a request for a labs-instance for my project here https://phabricator.wikimedia.org/T98537 , can I know if I need to furnish some other information for it, or would that be enough?
[19:16:26] <valhallasw>	 codezee: looks clear enough to me, so I'd just wait until one of the admins responds.
[19:17:06] <codezee>	 valhallasw: alright :) I'll wait then
[19:21:13] <wikibugs>	 Labs, Tracking: Create Labs Project for WikidataPageBanner extension - https://phabricator.wikimedia.org/T98537#1280560 (Andrew) Open>Resolved a:Andrew Created.  Sumit, you can add additional members or users to your project if you like.  Please document your project here:  https://wikitech.wikime...
[19:21:15] <wikibugs>	 Labs, Tracking: New Labs project requests (Tracking) - https://phabricator.wikimedia.org/T76375#1280563 (Andrew)
[19:26:22] <wikibugs>	 Labs, Tracking: Create Labs Project for WikidataPageBanner extension - https://phabricator.wikimedia.org/T98537#1280574 (Sumit) @Andrew, thanks for the help :)
[19:30:24] <yuvipanda>	 valhallasw: can you check on redis? I got an alert from catchpoijt 
[19:30:24] <yuvipanda>	 Point 
[19:30:26] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 30.00% of data above the critical threshold [0.0]
[19:30:32] <yuvipanda>	 I'm on the way to office now
[19:30:39] <valhallasw>	 yuvipanda: check what exactly?
[19:30:58] <yuvipanda>	 valhallasw: if it is full?
[19:31:41] <yuvipanda>	 Man, public transportation sucks in this city 
[19:32:02] <valhallasw>	 yuvipanda: max is what again?
[19:32:08] <valhallasw>	 we're at 7G now
[19:32:09] <valhallasw>	 http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1430297062.188&target=tools.tools-redis.redis.6379.memory.internal_view&target=tools.tools-redis.redis.6379.memory.external_view&from=-14days
[19:32:17] <yuvipanda>	 Hmm should be ok
[19:32:20] <yuvipanda>	 12g 
[19:32:35] <yuvipanda>	 The test that failed writes a random value and immediately reads it again 
[19:33:08] <yuvipanda>	 Thanks for checking! I'll look at the test when I'm back in the office
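A write-then-read check of the kind described above can be sketched in a few lines. This is not the actual Catchpoint test; `roundtrip_ok` and `DictClient` are hypothetical names, and the client is only assumed to offer `set`, `get` and `delete` the way a redis-py connection does.

```python
import uuid


def roundtrip_ok(client, key_prefix="healthcheck:"):
    """Write a random value, read it straight back, and report whether it matches."""
    key = key_prefix + uuid.uuid4().hex
    value = uuid.uuid4().hex
    client.set(key, value)
    stored = client.get(key)
    client.delete(key)  # do not leave check keys behind
    # A real Redis client may hand back bytes rather than str.
    if isinstance(stored, bytes):
        stored = stored.decode("ascii")
    return stored == value


class DictClient:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)
```

`roundtrip_ok(DictClient())` returns True; against a store that rejects or drops writes, the read would come back empty and the check would return False.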
[19:57:06] <valhallasw>	 yuvipanda: https://docs.python.org/2/library/multiprocessing.html#process-and-exceptions should map to GridTask very nicely
[19:57:21] <valhallasw>	 run() = the actual webserver
[19:57:27] <valhallasw>	 start() = call SGE to start the webserver
[19:57:41] <valhallasw>	 join() = somehow do a blocking SGE join, not sure if that's possible :-p
[19:57:47] <valhallasw>	 name = SGE task name
[19:58:00] <valhallasw>	 is_alive() = talk to the grid to see if status=running
[19:58:21] <valhallasw>	 and terminate() to kill
[20:00:25] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0]
[20:02:51] <valhallasw>	 extending it would be the best, but that's probably a bit too complicated
[20:09:20] <valhallasw>	 anyway. mirroring known interfaces is good
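The mapping from `multiprocessing.Process` to a grid task outlined above can be sketched as a small class. Everything here is an assumption for illustration: `GridTask`, the `backend` object with `submit`, `state` and `delete` methods (which would wrap the grid's submit, status and delete commands), and `FakeBackend`. Because a blocking SGE join may not exist, `join()` polls, and unlike `Process.join()` it returns a bool saying whether the job finished.

```python
import time


class GridTask:
    """Process-like handle for a grid job (hypothetical sketch)."""

    def __init__(self, name, command, backend):
        self.name = name          # SGE task name
        self.command = command    # what run() would execute on the node
        self.backend = backend    # assumed wrapper around the grid commands

    def start(self):
        self.backend.submit(self.name, self.command)

    def is_alive(self):
        return self.backend.state(self.name) == "running"

    def terminate(self):
        self.backend.delete(self.name)

    def join(self, timeout=None, poll_interval=1.0):
        # Poll until the grid no longer knows the job, or the timeout expires.
        deadline = None if timeout is None else time.monotonic() + timeout
        while self.backend.state(self.name) is not None:
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(poll_interval)
        return True


class FakeBackend:
    """In-memory stand-in so the sketch runs without a grid."""

    def __init__(self):
        self.jobs = {}

    def submit(self, name, command):
        self.jobs[name] = "running"

    def state(self, name):
        return self.jobs.get(name)

    def delete(self, name):
        self.jobs.pop(name, None)
```

Keeping the method names identical to `Process` means callers who know that interface can guess how to use the class, which is the point of mirroring a known interface.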
[20:21:58] <yuvipanda>	 valhallasw: looking through rest of your comments now
[20:30:07] <wikibugs>	 Tool-Labs, Hackathon-Lyon-2015: Tool-labs meeting agenda for Lyon Hackathon - https://phabricator.wikimedia.org/T98912#1280778 (valhallasw) NEW
[20:31:49] <valhallasw>	 yuvipanda: ^ please add other things we should discuss
[20:31:59] <yuvipanda>	 looking
[20:34:12] <wikibugs>	 Tool-Labs, Hackathon-Lyon-2015: Tool-labs meeting agenda for Lyon Hackathon - https://phabricator.wikimedia.org/T98912#1280791 (valhallasw)
[20:35:02] <mutante>	 valhallasw: what's wrong  with that tools.wmflabs.org site
[20:35:19] <valhallasw>	 mutante: it's not very user friendly :-p
[20:35:54] <yuvipanda>	 nor very fast :)
[20:35:58] <mutante>	 valhallasw: aha? seems good to me
[20:36:34] <mutante>	 user friendly is a vague term but i guess it depends what you expect it to do
[20:36:59] <yuvipanda>	 right now it, uh, is… good for a listing of all service groups! :)
[20:37:13] <yuvipanda>	 (I like https://tools.wmflabs.org/hay/directory/#/ better)
[20:37:14] <yuvipanda>	 anyway
[20:37:14] <valhallasw>	 mutante: I expect 1) to help users who are looking for a tool to find it, and 2) to help users who have vaguely heard of tool labs to get started, and 3) to help power users to quickly get info on tool status etc
[20:37:23] <yuvipanda>	 +1 to valhallasw 
[20:38:03] <valhallasw>	 for 1 hay's directory is great, for 2 the wikitech help page is sort-of OK but overwhelming, for 3 the current site works but is on the slow end
[20:38:16] <mutante>	 looks like one of the issues is that the "additional information" column is not filled out by all tool maintainers
[20:38:38] <yuvipanda>	 bigger issue is that the home page isn’t really designed with anything in mind :)
[20:41:23] <yuvipanda>	 valhallasw: do you know where JAT_status is documented?
[20:41:51] <yuvipanda>	 http://wiki.gridengine.info/wiki/index.php/GridEngine_XML links to http://gridengine.info/articles/2005/11/03/gridengine-xml-translating-jat_state-values-into-useful-information which is helpfully 404
[20:42:22] <mutante>	 got it
[20:42:50] <valhallasw>	 :/
[20:42:59] <valhallasw>	 yuvipanda: https://github.com/gridengine/gridengine/search?l=c&q=JAT_status&utf8=%E2%9C%93 has nothing
[20:43:35] <yuvipanda>	 valhallasw: yeah. I can probably dig into the code and figure out the bitmasks but...
[20:43:38] <valhallasw>	 so, er, https://github.com/gridengine/gridengine/blob/6a5407d56c85b39290ac2488fb6dec1a4404a974/source/libs/sgeobj/sge_job.h#L44 I guess :X
[20:43:48] <valhallasw>	 just using state 'running' makes more sense, yeah.
[20:43:59] <valhallasw>	 I don't get why jstat is so crappy
[20:44:11] <valhallasw>	 why would you *not* tell me the job status when I ask for a specific job
[20:44:12] <valhallasw>	 qstat*
[20:44:34] <yuvipanda>	 gridengine is in general crappy, IMO
[20:44:57] <valhallasw>	 there's not really a large amount of options though, it seems
[20:45:33] <yuvipanda>	 me and _joe_ are going to play around with Mesos / Marathon during the hackathon. 
[20:45:41] <yuvipanda>	 I already have a test cluster running in labs. should be interesting.
[20:46:37] <valhallasw>	 yuvipanda: mmhm. We could use it for the webgrid at least, and optionally for exec nodes
[20:46:55] <valhallasw>	 I don't think we can realistically *replace* gridengine in the foreseeable future
[20:47:17] <yuvipanda>	 I think if we’re still running gridengine in about, say, 18 months, I’ll consider my time at toollabs a failure :)
[20:47:55] <yuvipanda>	 but anyway, talking of that now is only going to be a distraction, I think
[20:48:02] <yuvipanda>	 don’t know enough about Mesos / Marathon to have informed opinions
[20:48:03] <valhallasw>	 yuvipanda: I'll consider my time at toollabs a failure if gridengine is gone sooner than that. I'm not going to be OK with pulling the rug from under our users again
[20:48:22] <yuvipanda>	 valhallasw: oh totally, me neither. 
[20:48:34] <valhallasw>	 but having something better in parallel: sure
[20:48:46] <yuvipanda>	 valhallasw: I think it’ll happen very similar to the precise / trusty migration
[20:48:56] <yuvipanda>	 valhallasw: where you start off with ‘use this extra parameter for goodies!'
[20:49:01] <valhallasw>	 yuvipanda: state 'running' = 'task has at least one job'
[20:49:03] <yuvipanda>	 valhallasw: and then after some time it becomes default
[20:49:12] <valhallasw>	 https://github.com/gridengine/gridengine/blob/master/source/clients/qstat/qstat_xml.c#L836
[20:49:28] <valhallasw>	 so you don't even need to parse the status
[20:49:31] <yuvipanda>	 and then after basically everyone has migrated and / or alternative arrangements have been made (some tools can still run on gridengine)
[20:49:53] <yuvipanda>	 valhallasw: hmm, I think looking for ‘r’ in qstat -u is clearer...
[20:49:56] <yuvipanda>	 so I’m inclined to leave it as is
[20:50:08] <valhallasw>	 yuvipanda: you mean for 'running' in -xml
[20:50:36] <yuvipanda>	 valhallasw: hmm, I could do that too
[20:50:46] <yuvipanda>	 valhallasw: right now it’s checking for       <state>r</state>
[20:50:52] <valhallasw>	 ehhh
[20:50:53] <valhallasw>	 wait
[20:51:08] <valhallasw>	 you're right, it's <state>r</state>
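Checking for `<state>r</state>` is more robust with an XML parser than with a substring search, because the state can then be tied to one specific job. A minimal sketch, assuming `qstat -xml` output in which each job is a `job_list` element with `JB_name` and `state` children; the helper name `is_running` and the abbreviated `SAMPLE` document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written sample; real output carries more fields per job.
SAMPLE = """
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>1001</JB_job_number>
      <JB_name>lighttpd-mytool</JB_name>
      <state>r</state>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>1002</JB_job_number>
      <JB_name>lighttpd-other</JB_name>
      <state>qw</state>
    </job_list>
  </job_info>
</job_info>
"""


def is_running(qstat_xml: str, job_name: str) -> bool:
    """Return True if a job with this name is listed with state 'r'."""
    root = ET.fromstring(qstat_xml)
    for job in root.iter("job_list"):
        if job.findtext("JB_name") == job_name and job.findtext("state") == "r":
            return True
    return False
```

The comparison is an exact match on `r`, mirroring the check quoted above; combined states would need a looser test.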
[20:51:36] <yuvipanda>	 yeah
[20:51:51] <yuvipanda>	 JA*_ is basically a ‘lol you expect consistency and documentation?'
[20:54:14] <wikibugs>	 Labs: social-tools1 instance is down, rebooting via Special:NovaInstance fails - https://phabricator.wikimedia.org/T98919#1280886 (ashley) NEW
[20:55:06] <yuvipanda>	 andrewbogott: ^ can you take a look? I wonder if it’s another botched move.
[20:55:43] <andrewbogott>	 yuvipanda: ok
[20:56:52] <valhallasw>	 yuvipanda: I have never seen so many layers of indirection in C :|
[20:56:59] <valhallasw>	 stop forwarding that function pointer!
[20:57:30] <valhallasw>	 and then it gets passed to the magic job_stdout_init
[20:57:31] <valhallasw>	 :|
[20:58:11] <valhallasw>	 oh, no, it's just github search being weird
[20:59:42] <wikibugs>	 Labs: social-tools1 instance is down, rebooting via Special:NovaInstance fails - https://phabricator.wikimedia.org/T98919#1280938 (Andrew) Open>Resolved a:Andrew I started it -- it looks OK now.  This is mostly just the result of there not being a 'start' link on wikitech and 'start' and 'reboot' bei...
[21:00:15] <polybuildr>	 Okay, I used bd808's guide to set up a Labs-vagrant role on an instance and also created a web proxy.
[21:00:22] <brion>	 it takes a few minutes for ssh keys to update on the login bastion right? just updating stuff since i moved to a new computer a couple weeks ago :)
[21:00:28] <polybuildr>	 However, http://honeypot-wiki-alpha.wmflabs.org/ has a 504 gateway timeout
[21:00:31] <brion>	 ah there it goes
[21:00:32] <brion>	 \o/
[21:01:01] <ashley>	 andrewbogott, yuvipanda: thanks :D
[21:01:07] <andrewbogott>	 polybuildr: have you checked your security groups?  Might be that your instance is blocking web traffic
[21:01:18] <yuvipanda>	 valhallasw: re: abstracting out the username stuff, I have: https://github.com/wikimedia/operations-software-tools-manifest/blob/master/tools/manifest/tool.py and related
[21:01:21] <polybuildr>	 andrewbogott: It was set to default. Bad idea?
[21:01:39] <andrewbogott>	 polybuildr: you’ll need to look at the actual rules in the security group.
[21:01:40] <yuvipanda>	 valhallasw: maybe I should just move them to a ‘tools’ package, and then put utils and things like this there.
[21:02:00] <bd808>	 polybuildr: My guess would be that you need to open up port 80 in the security group. See (1) here -- https://wikitech.wikimedia.org/wiki/Help:Labs-vagrant#Setting_up_your_instance_with_labs-vagrant
[21:02:07] <valhallasw>	 yuvipanda: and then Tool.from_current_user() or something. That also works, I guess
[21:02:16] <yuvipanda>	 valhallasw: that sounds like the right thing to do, yeah
[21:02:18] <yuvipanda>	 valhallasw: so let me do that.
[21:02:23] <valhallasw>	 yuvipanda: just adding def get_tool_name() is also good enough for now, I think
[21:02:38] <polybuildr>	 bd808: I did a labs-vagrant provision and there were no errors
[21:02:39] <yuvipanda>	 valhallasw: nah, let’s just do the right thing. shouldn’t be that hard.
[21:02:54] <valhallasw>	 yuvipanda: you can also do some home dir stuff with that route
[21:02:56] <valhallasw>	 instead of ~
[21:03:00] <polybuildr>	 bd808: I thought that was enough. Did I miss something from that guide?
[21:03:01] <yuvipanda>	 valhallasw: yup.
[21:03:02] <valhallasw>	 which might just use $USER again :P
[21:03:06] <yuvipanda>	 yup :P
[21:03:12] <yuvipanda>	 valhallasw: I was thinking of that right after I told you all those :P
[21:03:24] <valhallasw>	 but again, using $USER is fine in practice
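A minimal sketch of the `get_tool_name()` / `Tool.from_current_user()` idea discussed above. It assumes tool accounts are named `tools.<name>` (as in `tools.lolrrit-wm`), and it reads the account from the password database rather than from `$USER`, which also yields the home directory without expanding `~`. The `Tool` class here is hypothetical and is not the one in tools-manifest.

```python
import os
import pwd

# Assumption: tool accounts are named "tools.<toolname>".
TOOL_PREFIX = "tools."


def tool_name_from_username(username):
    """Strip the tool prefix from an account name; reject non-tool accounts."""
    if not username.startswith(TOOL_PREFIX):
        raise ValueError("%r is not a tool account" % username)
    return username[len(TOOL_PREFIX):]


class Tool(object):
    """Hypothetical container for the bits other packages keep recomputing."""

    def __init__(self, name, username, home):
        self.name = name
        self.username = username
        self.home = home

    @classmethod
    def from_current_user(cls):
        # Look the account up by effective uid instead of trusting $USER or ~.
        entry = pwd.getpwuid(os.geteuid())
        return cls(tool_name_from_username(entry.pw_name),
                   entry.pw_name, entry.pw_dir)
```

Run as a tool account, `Tool.from_current_user().name` would give the bare tool name; run as a normal user it raises `ValueError`, which is probably what a tool-only utility wants.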
[21:03:31] <bd808>	 polybuildr: the security group stuff is poorly named. It is really about firewall rules for talking to your new VM.
[21:03:50] <polybuildr>	 bd808: So 'default' wasn't the right way to go about it?
[21:03:54] <polybuildr>	 Will look into those settings.
[21:04:09] <bd808>	 default is ok, but you need to add a rule to allow port 80 communications
[21:04:24] <bd808>	 the default settings only let in ssh
[21:04:35] <polybuildr>	 bd808: Hmm, where do I do that? I can't seem to find an interface. Or do I do it from within the machine?
[21:04:59] <bd808>	 polybuildr: It should show on https://wikitech.wikimedia.org/wiki/Special:NovaSecurityGroup
[21:05:13] <grrrit-wm>	 (CR) Merlijn van Deen: [C: 2] "Sorry for the slow merge!" [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/205222 (owner: Werdna)
[21:05:16] <polybuildr>	 bd808: aha :D thanks!
[21:05:16] <grrrit-wm>	 (CR) jenkins-bot: [V: -1] Add #wikimedia-design configuration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/205222 (owner: Werdna)
[21:05:23] <valhallasw>	 :<
[21:09:56] <polybuildr>	 bd808: I added a rule to allow 80 tcp traffic specifically from my IP and I'm still getting the gateway error. Any guesses?
[21:10:21] <polybuildr>	 Also, if I wanted to make the wiki publicly accessible, would the CIDR be 0.0.0.0/0?
[21:10:40] <grrrit-wm>	 (PS3) Merlijn van Deen: Add #wikimedia-design configuration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/205222 (owner: Werdna)
[21:11:03] <grrrit-wm>	 (CR) Merlijn van Deen: [C: 2] Add #wikimedia-design configuration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/205222 (owner: Werdna)
[21:11:18] <andrewbogott>	 polybuildr: yeah, 0.0.0.0/0 is ‘everywhere’
[21:11:20] <bd808>	 polybuildr: yeah, add 0.0.0.0/0. That's really only going to allow all of Labs to get to your instance. There is yet another layer of firewall before the outside world
[21:11:41] <valhallasw>	 yuvipanda: https://gerrit.wikimedia.org/r/#/c/202363/ can haz review
[21:11:41] <andrewbogott>	 your IP specifically won’t work because you’re using a proxy, right?  So your instance is only ever seeing traffic from the proxy.
[21:12:11] <yuvipanda>	 valhallasw: needs manual rebasing :(
[21:12:18] <valhallasw>	 oh :(
[21:12:21] <valhallasw>	 I'll take a look in a sec
[21:12:27] <yuvipanda>	 valhallasw: I’m also vaguely terrified of touching that file - every time I’ve done so (even with +1s from someone else) it’s blown up
[21:12:56] <polybuildr>	 andrewbogott: I'm not behind a proxy as of now.
[21:13:34] <andrewbogott>	 polybuildr: “I used bd808's guide to set up a Labs-vagrant role on an instance and also created a web proxy.”
[21:13:48] <andrewbogott>	 Note the ‘web proxy’ part of that
[21:13:59] <polybuildr>	 Oh.
[21:14:02] <polybuildr>	 Right.
[21:14:06] <polybuildr>	 Ouch. :p Okay.
[21:14:08] <andrewbogott>	 :)
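To illustrate why a rule for one client IP did not help while 0.0.0.0/0 does: the instance only ever sees the proxy's address as the source. A small sketch using Python's `ipaddress` module; both addresses are made up for the example.

```python
import ipaddress


def rule_allows(cidr, source_ip):
    """True if a security-group style CIDR rule matches the source address."""
    return ipaddress.ip_address(source_ip) in ipaddress.ip_network(cidr)


everywhere = "0.0.0.0/0"
client_only = "203.0.113.7/32"  # hypothetical client address (documentation range)
proxy_ip = "10.68.16.4"         # hypothetical internal address of the web proxy

# Traffic reaches the instance from the proxy, not from the client directly.
print(rule_allows(client_only, proxy_ip))  # False
print(rule_allows(everywhere, proxy_ip))   # True
```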
[21:14:46] <grrrit-wm>	 (PS2) Merlijn van Deen: Move Math repo from #mediawiki-visualeditor to #wikimedia-editing [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/206041 (owner: Jforrester)
[21:15:08] <grrrit-wm>	 (CR) Merlijn van Deen: [C: 2] Move Math repo from #mediawiki-visualeditor to #wikimedia-editing [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/206041 (owner: Jforrester)
[21:15:48] <grrrit-wm>	 (Merged) jenkins-bot: Add #wikimedia-design configuration [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/205222 (owner: Werdna)
[21:15:51] <grrrit-wm>	 (Merged) jenkins-bot: Move Math repo from #mediawiki-visualeditor to #wikimedia-editing [labs/tools/grrrit] - https://gerrit.wikimedia.org/r/206041 (owner: Jforrester)
[21:18:50] <polybuildr>	 bd808, andrewbogott: It works! :D Thanks.
[21:18:57] <andrewbogott>	 cool :)
[21:19:00] <ircnotifier>	 !log tools.lolrrit-wm valhallasw: Deployed 38a61981f2f894a19a2a9f0f65f0c0bee07dd16f Merge "Move Math repo from #mediawiki-visualeditor to #wikimedia-editing"
[21:19:03] <valhallasw>	 since when does git log --oneline -3 start a pager?!?!
[21:19:05] <labs-morebots>	 Logged the message, Master
[21:19:20] <valhallasw>	 oh well, luckily fab happily passes 'q' to less
[21:19:27] <valhallasw>	 ohnotthisshitagain
[21:24:34] <polybuildr>	 andrewbogott: This instance is meant to function as a honeypot for spam, to help with a spam deleting extension. It's currently an m1.small instance. Do you think that's enough or would you recommend a bigger instance?
[21:25:13] <andrewbogott>	 polybuildr: it’s hard to know until it’s been running for a bit.  Just make sure you design it in such a way that it’s easy to rebuild, and then we can discard it and build a bigger one if/when necessary.
[21:26:08] <polybuildr>	 There isn't really any design as such. labs-vagrant with port 80 open, that's all.
[21:26:44] <valhallasw>	 yuvipanda: I'm off to bed, but could you do the ldap nda group and the shinken accounts sometime today? Thanks :-)
[21:27:02] <yuvipanda>	 valhallasw: yeah, will do! sorry about the delay, the key thing ruined my morning
[21:27:08] <polybuildr>	 andrewbogott: ^^^
[21:27:16] <yuvipanda>	 valhallasw: thanks for reviews, etc. I’ll have new patchsets up soon
[21:27:16] <valhallasw>	 yuvipanda: np, it's hard to type without keyboard :D
[21:27:18] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1211 is CRITICAL 30.00% of data above the critical threshold [0.0]
[21:27:27] <andrewbogott>	 polybuildr: in this case ‘design’ might mean, ‘write down the steps you took to build it so you can repeat them'
[21:27:28] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1206 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:27:34] <yuvipanda>	 uh oh
[21:27:37] <yuvipanda>	 is that the change I just merged
[21:27:39] * yuvipanda checks
[21:28:49] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-generic-1403 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:28:59] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1407 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:29:03] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1410 is CRITICAL 33.33% of data above the critical threshold [0.0]
[21:29:11] <valhallasw>	 ...prooobably, although I have no idea how
[21:29:22] <valhallasw>	 or something else dying
[21:29:32] <valhallasw>	 Error: Could not retrieve catalog from remote server: Error 400 on SERVER: Must pass central_host to Class[Base::Remote_syslog] at /etc/puppet/modules/base/manifests/init.pp:62 on node i-00000bbc.eqiad.wmflabs
[21:29:33] <polybuildr>	 andrewbogott: thanks!
[21:29:43] <valhallasw>	 oh, you found that already
[21:29:51] <yuvipanda>	 valhallasw: yeah
[21:29:59] <yuvipanda>	 valhallasw: fix coming up, it was the rsyslog patch :)
[21:30:13] <valhallasw>	 yeah, I was going to say 'that was /also/ a patch you merged' :-p
[21:30:13] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1208 is CRITICAL 44.44% of data above the critical threshold [0.0]
[21:30:17] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1202 is CRITICAL 44.44% of data above the critical threshold [0.0]
[21:30:22] <yuvipanda>	 valhallasw: that was the only patch I merged :P
[21:30:26] <shinken-wm>	 PROBLEM - Puppet failure on tools-webproxy-01 is CRITICAL 33.33% of data above the critical threshold [0.0]
[21:30:26] <yuvipanda>	 (today at least)
[21:30:33] <valhallasw>	 oh, you didn't merge the bigbrother one. doh.
[21:30:37] <yuvipanda>	 valhallasw: :)
[21:30:47] <yuvipanda>	 valhallasw: you were going to bed, so I figured I’ll merge it later
[21:30:53] <valhallasw>	 *nod*
[21:30:54] <shinken-wm>	 PROBLEM - Puppet failure on tools-mail is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:31:26] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1201 is CRITICAL 44.44% of data above the critical threshold [0.0]
[21:31:34] <yuvipanda>	 andrewbogott: don’t worry about the ^ failures
[21:31:34] <shinken-wm>	 PROBLEM - Puppet failure on tools-master is CRITICAL 60.00% of data above the critical threshold [0.0]
[21:31:34] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1410 is CRITICAL 30.00% of data above the critical threshold [0.0]
[21:31:42] <andrewbogott>	 yuvipanda: ok, thanks
[21:31:46] <shinken-wm>	 PROBLEM - Puppet failure on tools-bastion-02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:31:52] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1405 is CRITICAL 60.00% of data above the critical threshold [0.0]
[21:31:58] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1408 is CRITICAL 60.00% of data above the critical threshold [0.0]
[21:32:02] <valhallasw>	 yuvipanda: can shinken group failures per type?
[21:32:25] <yuvipanda>	 valhallasw: ‘batching’, you mean? 
[21:32:32] <shinken-wm>	 PROBLEM - Puppet failure on tools-bastion-01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:32:34] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1402 is CRITICAL 30.00% of data above the critical threshold [0.0]
[21:32:36] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1409 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:32:44] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1202 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:32:45] <yuvipanda>	 valhallasw: it can’t, but ircecho should and it’s been a long wanted feature that nobody has the time to work on :(
[21:32:46] <valhallasw>	 yuvipanda: 'max N 'puppet' messages within M minutes'
[21:32:50] <yuvipanda>	 valhallasw: and we end up with this instead :(
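A rough sketch of the "max N 'puppet' messages within M minutes" idea, as a sliding-window throttle keyed by message category. This is not ircecho code; the class name and interface are invented for illustration, and the clock is injectable only to make it easy to test.

```python
import time
from collections import defaultdict, deque


class CategoryThrottle(object):
    """Allow at most max_messages per category within window_seconds."""

    def __init__(self, max_messages, window_seconds, clock=time.time):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.clock = clock
        self.sent = defaultdict(deque)      # category -> timestamps of sent messages
        self.suppressed = defaultdict(int)  # category -> count of dropped messages

    def allow(self, category):
        now = self.clock()
        recent = self.sent[category]
        # Forget messages that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) < self.max_messages:
            recent.append(now)
            return True
        self.suppressed[category] += 1
        return False
```

A bot could call `allow("puppet")` before relaying each alert and, once the window reopens, emit one summary line from `suppressed["puppet"]` instead of dozens of individual PROBLEM messages.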
[21:33:08] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1210 is CRITICAL 66.67% of data above the critical threshold [0.0]
[21:33:23] <yuvipanda>	 valhallasw: I added you to nda group
[21:33:27] <yuvipanda>	 valhallasw: can you check?
[21:33:27] <valhallasw>	 <3
[21:33:32] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1215 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:33:44] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1205 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:33:46] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1208 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:34:27] <valhallasw>	 yuvipanda: yeah, have a build with parameters button for operations-puppet-catalog-compiler
[21:34:31] <valhallasw>	 let me try :P
[21:34:38] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1407 is CRITICAL 50.00% of data above the critical threshold [0.0]
[21:34:46] <shinken-wm>	 PROBLEM - Puppet failure on tools-trusty is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:34:51] <yuvipanda>	 valhallasw: you can also just check by going to graphite.wikimedia.org and trying to log in
[21:34:56] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-generic-1401 is CRITICAL 30.00% of data above the critical threshold [0.0]
[21:35:09] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1408 is CRITICAL 66.67% of data above the critical threshold [0.0]
[21:35:27] <shinken-wm>	 PROBLEM - Puppet failure on tools-static-01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:35:30] <valhallasw>	 yuvipanda: seems to work
[21:35:46] <yuvipanda>	 valhallasw: sweet :)
[21:35:49] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1203 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:36:58] <shinken-wm>	 PROBLEM - Puppet failure on tools-checker-01 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:37:12] <shinken-wm>	 PROBLEM - Puppet failure on tools-checker-02 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:37:18] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-generic-1404 is CRITICAL 44.44% of data above the critical threshold [0.0]
[21:37:59] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1209 is CRITICAL 50.00% of data above the critical threshold [0.0]
[21:38:47] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1207 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:39:01] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1404 is CRITICAL 60.00% of data above the critical threshold [0.0]
[21:39:11] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1204 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:39:17] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1205 is CRITICAL 20.00% of data above the critical threshold [0.0]
[21:40:18] <valhallasw>	 yuvipanda: well, puppet-compiler certainly isn't working :-p but that was to be expected
[21:40:21] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1210 is CRITICAL 40.00% of data above the critical threshold [0.0]
[21:40:31] <yuvipanda>	 valhallasw: :) it isn’t working for prod either atm :)
[21:40:37] <shinken-wm>	 PROBLEM - Puppet failure on tools-exec-1402 is CRITICAL 30.00% of data above the critical threshold [0.0]
[21:41:36] <yuvipanda>	 valhallasw: I’m just going to make it into the util class for now. not going to make it depend on tools-manifest yet.
[21:41:54] <valhallasw>	 ok
[21:41:59] <valhallasw>	 actual bed now :p
[21:42:02] <yuvipanda>	 valhallasw: although at some point we should have a tools-common and have other packages depend on it
[21:42:03] <valhallasw>	 *waves*
[21:42:04] <yuvipanda>	 valhallasw: :P
[21:42:05] <yuvipanda>	 valhallasw: night <3
[21:42:14] <valhallasw>	 yuvipanda: YEAH WITH DEBIAN
[21:42:15] <valhallasw>	 [/runs]
[21:42:21] <shinken-wm>	 PROBLEM - Puppet failure on tools-webgrid-lighttpd-1401 is CRITICAL 66.67% of data above the critical threshold [0.0]
[21:42:34] <valhallasw>	 (once it's a python package that's trivial anyway, so whatevs)
[21:43:08] <yuvipanda>	 valhallasw: yeah that’s what I meant
[21:48:23] <milimetric>	 hey folks, I'm having some problems with /var filling up on beta labs (the eventlogging instance)
[21:48:36] <milimetric>	 deployment-eventlogging02.eqiad.wmflabs
[21:49:02] <milimetric>	 I can clean some small things but mostly if we could clean up the innodb files we'd be good.  I just don't have root
[21:49:13] <milimetric>	 *I don't have root on the mysql database, I have root on the machine
[21:50:19] <yuvipanda>	 milimetric: you’re looking for: #wikimedia-releng :D also, on behalf of greg-g, https://wikitech.wikimedia.org/wiki/Labs_labs_labs for calling it beta labs and not beta cluster :)
[21:50:31] <yuvipanda>	 greg-g: I wonder if emailing engineering@ and wikitech-l@ about https://wikitech.wikimedia.org/wiki/Labs_labs_labs might be a good idea
[21:51:35] <greg-g>	 yuvipanda: :)
[21:52:07] <milimetric>	 yuvipanda: thank you, greg-g: sorry, noted
[21:52:21] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1211 is OK Less than 1.00% above the threshold [0.0]
[21:56:25] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1201 is OK Less than 1.00% above the threshold [0.0]
[21:56:51] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1405 is OK Less than 1.00% above the threshold [0.0]
[21:56:59] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1408 is OK Less than 1.00% above the threshold [0.0]
[21:57:25] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1206 is OK Less than 1.00% above the threshold [0.0]
[21:58:09] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1210 is OK Less than 1.00% above the threshold [0.0]
[21:58:49] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-generic-1403 is OK Less than 1.00% above the threshold [0.0]
[21:59:03] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1407 is OK Less than 1.00% above the threshold [0.0]
[21:59:04] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1410 is OK Less than 1.00% above the threshold [0.0]
[21:59:36] <spagewmf>	 I enabled $wgUseInstantCommons on my labs instance, but no images appear, e.g. http://devhub.wmflabs.org/wiki/File:Wikipedia_Beta_search_on_Android_4.4.4_2015-02-09.png
[21:59:42] <wikibugs>	 Labs, Beta-Cluster, Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1281086 (bd808) Open>Resolved I think all beta should be using NFS for now is images and homedirs.
[22:00:11] <spagewmf>	 I think it's a permissions problem, /var/www is drwxr-xr-x 4 root root.
[22:00:11] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1208 is OK Less than 1.00% above the threshold [0.0]
[22:00:12] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1408 is OK Less than 1.00% above the threshold [0.0]
[22:00:22] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1202 is OK Less than 1.00% above the threshold [0.0]
[22:00:26] <shinken-wm>	 RECOVERY - Puppet failure on tools-webproxy-01 is OK Less than 1.00% above the threshold [0.0]
[22:00:52] <shinken-wm>	 RECOVERY - Puppet failure on tools-mail is OK Less than 1.00% above the threshold [0.0]
[22:01:32] <shinken-wm>	 RECOVERY - Puppet failure on tools-master is OK Less than 1.00% above the threshold [0.0]
[22:01:34] <spagewmf>	 heh, when I look for errors % tail /var/log/hhvm//error.log
[22:01:35] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1410 is OK Less than 1.00% above the threshold [0.0]
[22:01:41] <spagewmf>	 Failed to initialize central HHBC repository:\n  Failed to open /var/www/.hhvm.hhbc: 14 - unable to open database file\n
[22:01:45] <shinken-wm>	 RECOVERY - Puppet failure on tools-bastion-02 is OK Less than 1.00% above the threshold [0.0]
[22:02:33] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1402 is OK Less than 1.00% above the threshold [0.0]
[22:02:33] <shinken-wm>	 RECOVERY - Puppet failure on tools-bastion-01 is OK Less than 1.00% above the threshold [0.0]
[22:02:35] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1409 is OK Less than 1.00% above the threshold [0.0]
[22:02:45] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1202 is OK Less than 1.00% above the threshold [0.0]
[22:03:35] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1215 is OK Less than 1.00% above the threshold [0.0]
[22:03:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1208 is OK Less than 1.00% above the threshold [0.0]
[22:03:43] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1205 is OK Less than 1.00% above the threshold [0.0]
[22:04:03] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1404 is OK Less than 1.00% above the threshold [0.0]
[22:04:39] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1407 is OK Less than 1.00% above the threshold [0.0]
[22:04:47] <shinken-wm>	 RECOVERY - Puppet failure on tools-trusty is OK Less than 1.00% above the threshold [0.0]
[22:04:53] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-generic-1401 is OK Less than 1.00% above the threshold [0.0]
[22:05:25] <shinken-wm>	 RECOVERY - Puppet failure on tools-static-01 is OK Less than 1.00% above the threshold [0.0]
[22:05:49] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1203 is OK Less than 1.00% above the threshold [0.0]
[22:06:58] <shinken-wm>	 RECOVERY - Puppet failure on tools-checker-01 is OK Less than 1.00% above the threshold [0.0]
[22:07:15] <shinken-wm>	 RECOVERY - Puppet failure on tools-checker-02 is OK Less than 1.00% above the threshold [0.0]
[22:07:15] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-generic-1404 is OK Less than 1.00% above the threshold [0.0]
[22:07:59] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1209 is OK Less than 1.00% above the threshold [0.0]
[22:08:44] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1207 is OK Less than 1.00% above the threshold [0.0]
[22:09:12] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1204 is OK Less than 1.00% above the threshold [0.0]
[22:09:18] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1205 is OK Less than 1.00% above the threshold [0.0]
[22:10:20] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1210 is OK Less than 1.00% above the threshold [0.0]
[22:15:29] <bd808>	 spagewmf: ugh. there are a bunch of puppet rules in mediawiki-vagrant that try to keep that error from happening.
[22:16:02] <spagewmf>	 bd808: OK, I'll file a bug.  Meanwhile what permission & owners should I give /var/www ?
[22:16:29] <bd808>	 it should be root:root. the problem is with something in the hhvm config
[22:16:48] <bd808>	 the www-data ran hhvm and it didn't get told the right place to put the hhbc file
[22:17:09] <bd808>	 s/the/when/
[22:17:31] <spagewmf>	 bd808: I ran all my `labs-vagrant provision` as myself and they seemed to complete fine, maybe the setup would have worked with `sudo` prepended.
[22:17:51] <bd808>	 nah. there is a hidden sudo
[22:18:11] <bd808>	 the ruby that is called shells out with sudo puppet ....
[22:19:15] <spagewmf>	 bd808: OK but /var/www/images doesn't exist, should it be somewhere else or should I create it there and change its ownership?
[22:19:58] <bd808>	 let me look at a vagrant managed host...
[22:20:47] <bd808>	 spagewmf: /srv/images is the default location
[22:20:54] <bd808>	 for uploaded images
[22:21:20] <spagewmf>	 bd808: hmm, my commons image ^ above has src="/images/thumb/c/c9/Wikipedia_Beta_search_on_Android_4.4.4_2015-02-09.png/800px-Wikipedia_Beta_search_on_Android_4.4.4_2015-02-09.png"
[22:21:26] <bd808>	 I thought we had instantcommons on by default too?
[22:21:48] <bd808>	 there should be apache rules for mapping /images
[22:22:31] <bd808>	 spagewmf: in /etc/apache2/site-confs/devwiki/00-default.conf
[22:22:41] <bd808>	 Alias /images "/srv/images"
[22:27:20] <shinken-wm>	 RECOVERY - Puppet failure on tools-webgrid-lighttpd-1401 is OK Less than 1.00% above the threshold [0.0]
[22:28:30] <spagewmf>	 bd808: I have a /srv/images owned by www-data and that alias, and http://devhub.wmflabs.org/images/spagetest/test.png works.  So the /var/www permissions issue is a false alarm, though still a problem for that .hhvm.hhbc file
[22:29:10] <bd808>	 *nod* hopefully it was just for some maint script that ran at a weird point
[22:29:51] <wikibugs>	 Labs, Beta-Cluster, Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1281134 (yuvipanda) Awesome, thanks bd808 :)   about 186G of logs are in /data/project/logs/archive, I'll delete them in a week. Does that sound ok?
[22:30:38] <shinken-wm>	 RECOVERY - Puppet failure on tools-exec-1402 is OK Less than 1.00% above the threshold [0.0]
[22:31:06] <spagewmf>	 bd808: re: "I thought we had instantcommons on by default too?" some roles set it true, others false. I'm pretty sure mine was false until I changed it in a /vagrant/settings.d/10-misc.php
[22:43:26] <spagewmf>	 bd808: Sorry, the .hhvm.hhbc permission errors are from a few days ago [1]. I have a /var/run/hhvm that has a recent fcgi.hhbc.sq3 in it.  [1] No timestamps in /var/log/hhvm/error.log lines, yet diehards rebel against systemd's journalctl
[22:45:14] <yuvipanda>	 spagewmf: wait, systemd?
[22:45:30] <yuvipanda>	 are you setting this up on a debian jessie instance?
[22:45:54] <yuvipanda>	 oh
[22:45:57] <yuvipanda>	 I misread that sentence
[22:45:58] <yuvipanda>	 I think
[22:45:59] <spagewmf>	 yuvipanda: no, I'm commenting that hhvm/error.log doesn't have timestamps in it, yet another log file reinventing the wheel.
[22:45:59] <yuvipanda>	 ignore
[22:46:11] <yuvipanda>	 spagewmf: yeah, I read that as ‘yet diehard rebel against systemd’s journalctl'
[22:46:29] <yuvipanda>	 and hence was thinking that it was the one rebelling against systemd’s journalctl
[22:46:56] <spagewmf>	 yuvipanda: I watch the revolting graybeards rail about systemd for entertainment :)
[22:47:02] <yuvipanda>	 ah
[22:47:12] <yuvipanda>	 I back out of all conflicts and watch from afar
[22:48:51] <spagewmf>	 yuvipanda: https://devuan.org/ , so sad
[22:49:14] <yuvipanda>	 spagewmf: yeah, good luck to them, I guess
[22:49:44] <yuvipanda>	 spagewmf: heh, latest dev update is from quite a few months ago
[22:50:24] <yuvipanda>	 spagewmf: haha https://lists.dyne.org/lurker/thread/20150506.000231.e016cb73.en.html
[22:51:05] <yuvipanda>	 spagewmf: also hating software seems a bit like a thing for people with too much time on their hands.
[23:00:51] <wikibugs>	 Tool-Labs: Clean up huge logs on toollabs - https://phabricator.wikimedia.org/T98652#1281197 (yuvipanda) Don't attempt to rm these from the labs instances tho - should be removed on the server directly or we'll kill NFS :)
[23:20:38] <shinken-wm>	 PROBLEM - Puppet failure on tools-precise-dev is CRITICAL 100.00% of data above the critical threshold [0.0]