[00:36:19] <wikibugs>	 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1271171 (10Jdforrester-WMF) 5Open>3R...
[00:36:24] <wikibugs>	 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1270368 (10Jdforrester-WMF) Seems to be...
[00:47:49] <wikibugs>	 10Beta-Cluster, 6Labs: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271190 (10yuvipanda) NFS is paging a lot now, so I'll highly appreciate it if this can happen sooner than later :)
[01:41:05] <wikibugs>	 10Beta-Cluster, 10VisualEditor, 10VisualEditor-MediaWiki, 3Editing Department 2014/15 Q4 blockers, 5WMF-deploy-2015-04-29_(1.26wmf4): VisualEditor fails to load on Beta Cluster, complaining about revID mis-match - https://phabricator.wikimedia.org/T97558#1271316 (10Jdforrester-WMF) 5Open>3Resolved a:...
[01:41:10] <wikibugs>	 10Beta-Cluster, 10VisualEditor, 10VisualEditor-MediaWiki, 3Editing Department 2014/15 Q4 blockers, 5WMF-deploy-2015-04-29_(1.26wmf4): VisualEditor fails to load on Beta Cluster, complaining about revID mis-match - https://phabricator.wikimedia.org/T97558#1246405 (10Jdforrester-WMF)
[01:50:50] <shinken-wm>	 PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%)
[04:29:34] <bd808>	 yuvipanda: want to hear something funny? I think a lot of the NFS problems caused by beta are actually caused by NFS problems in beta.
[04:29:45] <yuvipanda>	 bd808: I KNEW IT! :D
[04:29:55] <yuvipanda>	 nfs dies causing logs causing nfs death causing logs causing nfs death?
[04:30:05] <bd808>	 tail -f /data/project/logs/apache2.log
[04:30:26] * yuvipanda does
[04:30:36] <bd808>	 the mw servers can't write their logs to nfs so they send error logs via syslog that say so
[04:30:45] <bd808>	 and that gets written to nfs 
[04:31:05] <yuvipanda>	 hahaha
[04:31:06] <yuvipanda>	 wow
[04:31:10] <yuvipanda>	 and that’s happening right now
[04:31:21] <bd808>	 has been for a long time I think
[04:31:30] <yuvipanda>	 the stale file handles?
[04:31:38] <bd808>	 yeah
[04:32:12] <yuvipanda>	 heh
[04:32:23] <yuvipanda>	 so are these mw logs going to logstash?
[04:32:34] <bd808>	 perms on that share from deployment-mediawiki01 are wonky
[04:32:42] <bd808>	 the error logs do, yes
[04:32:49] <bd808>	 that access logs don't
[04:32:58] <bd808>	 and that's what is fialing to write
[04:33:06] <yuvipanda>	 why are they not in /var/log
[04:33:13] <yuvipanda>	 mw instances are new and don’t have stupid tiny var
[04:33:30] <bd808>	 because h.ashar didn't wnat to have to ssh all over the place to see logs
[04:33:31] <yuvipanda>	 and who looks at access logs.
[04:33:59] <yuvipanda>	 ok. so we’re going to move them back now
[04:34:08] <yuvipanda>	 and he can set up dsh or other mechanisms if he needs to
[04:34:17] <yuvipanda>	 (if they need to be written to disk at all)
[04:34:24] <bd808>	 I have a patch
[04:34:30] <yuvipanda>	 woooo
[04:34:40] <bd808>	 all it does is remove the custom log config for beta apache
[04:34:49] <bd808>	 which makes things just like prod
[04:34:52] <yuvipanda>	 +1
[04:34:54] <yuvipanda>	 <3
[04:35:24] <bd808>	 access logs local and error logs to syslog which writes local, forwards to udp2log and logstash
[04:35:45] <bd808>	 that will still put the error logs on nfs for now
[04:35:53] <bd808>	 but that would be the next thing to fix
[04:36:01] <yuvipanda>	 yeah, steps...
[04:36:28] <bd808>	 need a log storage box with local disk for the logs and then move the udp2log service there
[04:36:33] <bd808>	 easy peasy
[04:36:40] <yuvipanda>	 :D
[04:40:16] <yuvipanda>	 bd808: poke me when you’ve patch? I also thought of a niceish way to do auth for kibana for tools without requiring LDAP
[04:47:43] <bd808>	 !log cherry-picked https://gerrit.wikimedia.org/r/#/c/209680/
[04:47:51] <qa-morebots>	 Logged the message, Master
[04:49:04] <bd808>	 !log Symbolic link not allowed or link target not accessible: /srv/mediawiki/docroot/bits/static/master/extensions
[04:49:05] <shinken-wm>	 PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 11.11% of data above the critical threshold [0.0]
[04:49:08] <qa-morebots>	 Logged the message, Master
[04:49:18] <bd808>	 probably related to things ori was doing for prod
[04:50:13] <yuvipanda>	 yeah
[04:50:35] <bd808>	 symlinks are wrong
[04:50:53] <yuvipanda>	 in general? :)
[04:51:02] <bd808>	 for the beta docroot
[04:51:19] <bd808>	 off by a level, needs more ..
[04:54:39] <bd808>	 yuvipanda: that patch seems to work fine on mw01. just removed the file and then restarted apache
[04:59:07] <shinken-wm>	 RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0]
[05:14:30] <bd808>	 !log apache2 access logs now only locally on instances in /var/log/apache2/other_vhosts_access.log; error log in /var/log/apache2.log and still relayed to deployment-bastion and logstash (works like production now)
[05:14:35] <qa-morebots>	 Logged the message, Master
[05:14:45] <yuvipanda>	 bd808: \o/
[05:14:52] <yuvipanda>	 bd808: so now we just need a fluroine?
[05:14:59] <bd808>	 yeah
[05:15:16] <bd808>	 I wonder how much disk we need for the logs?
[05:15:25] <bd808>	 we won't need much cpu
[05:16:45] <yuvipanda>	 bd808: use a large and trim as needed?
[05:17:15] <bd808>	 that will probably work fine
[05:17:22] <bd808>	 I can poke at that tomorrow
[05:17:33] <bd808>	 and you should head home young man
[05:17:36] <yuvipanda>	 bd808: <3 thank you very much
[05:17:37] <yuvipanda>	 bd808: I’m home
[05:17:43] <yuvipanda>	 left early today (7pm)
[05:17:54] <bd808>	 early-ish
[05:17:57] <yuvipanda>	 am going to do anti burnout therapy tonight (beer and sleep before 11pm)
[05:19:10] <yuvipanda>	 bd808: re: the logstash thing. We can just do OAuth and enforce it via lua on nginx :)
[05:19:25] <yuvipanda>	 it’s not that crazy - toollabs webproxy already enforces a fuckton of things via lua on nginx
[05:19:31] <bd808>	 oh? fancy
[05:19:51] <bd808>	 ou can teach me how nginx works
[05:19:54] <yuvipanda>	 bd808: yeah, we can basically do what we do for dynamicproxy but do it for auth
[05:20:12] <yuvipanda>	 I’ve no idea how nginx works :P It’s been years since I fucked around with Lua there...
[05:20:15] <yuvipanda>	 but it totally can do this
[05:20:24] <bd808>	 :)
[05:20:25] <yuvipanda>	 you firewall off the host, and make nginx the sole way to get to it
[05:20:32] <yuvipanda>	 and then that works :)
[05:20:45] <yuvipanda>	 bd808: of course that should still be just for admins, I guess. 
[05:20:58] <yuvipanda>	 bd808: and we should just write a variant of heroku logs commandline for tools
[05:21:42] <bd808>	 we can do the nginx trick to isolate kibana too. I think we can get it all really
[05:21:47] <bd808>	 but admins first
[05:22:06] <bd808>	 and then test the nginx revers proxy stuff for leaks
[05:22:27] <yuvipanda>	 yeah
[05:22:43] <yuvipanda>	 well, and then trust kibana to not allow arbitrary queries
[05:22:59] <bd808>	 having a cli tool for searching the logs would be nice for other things too
[05:23:07] <yuvipanda>	 yeah
[05:23:12] <yuvipanda>	 and we can also aggressively limit
[05:23:32] <yuvipanda>	 for toollabs I’m pretty sure I can make nginx do identd authentication :P
[05:23:45] <yuvipanda>	 and issue a cookie or something
[05:24:03] <bd808>	 identd is so old skool
[05:24:20] <bd808>	 makes me want to create a ~/.plan file
[05:24:31] <yuvipanda>	 hehe
[05:24:37] <yuvipanda>	 bd808: but works great for our use case!
[05:24:48] <yuvipanda>	 until the glorious container revolution takes us all, of course
[05:27:42] <wikibugs>	 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271652 (10bd808) The apache2 logs in beta cluster now match the production config. This means that access logs are written to local disk at /var/log/apache2/other_vhosts_access.log on each host. Er...
[06:06:17] <shinken-wm>	 PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused
[06:16:17] <shinken-wm>	 RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.024 second response time
[06:35:46] <shinken-wm>	 RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK
[07:22:51] <wikibugs>	 6Release-Engineering, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 6operations: nb subdomain redirects - https://phabricator.wikimedia.org/T86924#1271748 (10jayvdb)
[07:42:51] <wikibugs>	 10Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1271784 (10ArielGlenn) 5Open>3Resolved This upgrade is complete.  While git deploy works fine, this is in part due to local modifications to code in the trigger package which were made during the previous upg...
[10:45:50] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[12:19:43] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL - Socket timeout after 10 seconds
[12:26:32] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds
[12:26:33] <shinken-wm>	 PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[12:26:33] <shinken-wm>	 PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[12:26:33] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL - Socket timeout after 10 seconds
[12:28:16] <shinken-wm>	 RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48040 bytes in 1.233 second response time
[12:30:54] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47738 bytes in 1.928 second response time
[12:31:20] <shinken-wm>	 RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30076 bytes in 0.908 second response time
[12:41:34] <shinken-wm>	 PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[12:41:34] <shinken-wm>	 PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds
[12:41:35] <shinken-wm>	 PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds
[12:41:35] <shinken-wm>	 PROBLEM - Puppet failure on deployment-bastion is CRITICAL 66.67% of data above the critical threshold [0.0]
[12:41:55] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47745 bytes in 0.808 second response time
[12:42:19] <shinken-wm>	 RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30077 bytes in 1.462 second response time
[12:44:15] <shinken-wm>	 RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48032 bytes in 1.674 second response time
[12:44:33] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 47744 bytes in 0.921 second response time
[12:44:37] <shinken-wm>	 RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 48451 bytes in 3.058 second response time
[12:45:53] <shinken-wm>	 PROBLEM - HHVM Queue Size on deployment-mediawiki01 is CRITICAL 33.33% of data above the critical threshold [80.0]
[12:50:52] <shinken-wm>	 RECOVERY - HHVM Queue Size on deployment-mediawiki01 is OK Less than 30.00% above the threshold [10.0]
[13:18:26] <shinken-wm>	 RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0]
[13:58:21] <wikibugs>	 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272306 (10Ryasmeen) yup working now
[13:58:42] <wikibugs>	 10Beta-Cluster, 10Parsoid, 10VisualEditor, 7Verified: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272307 (10Ryasmeen)
[14:42:23] <wikibugs>	 10Beta-Cluster, 10Graphoid: Deploy Graphoid on Beta Cluster - https://phabricator.wikimedia.org/T97606#1272387 (10mobrovac) >>! In T97606#1262386, @Yurik wrote: > @mobrovac, what are the steps for the manual deployment from deployment-bastion? How are they different from production? Thx!  (P.S. its alive!!!)...
[15:06:49] <wikibugs>	 10Browser-Tests, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Workshop: write the first browsertests/Selenium test - https://phabricator.wikimedia.org/T94024#1272426 (10zeljkofilipin) I am still planning to run the session. I need at least 1 hour. Depending on how many people apply, their previous...
[15:08:29] <wikibugs>	 10Browser-Tests, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Workshop: Fix broken browsertests/Selenium Jenkins jobs - https://phabricator.wikimedia.org/T94299#1272431 (10zeljkofilipin) I am still planning to run the session. I need at least 1 hour. Depending on how many people apply and how hard...
[15:13:05] <wikibugs>	 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272450 (10zeljkofilipin) Let's try Wednesday at 8:30 PDT. I have a meeting with Greg on Thursday at 9 am.
[15:13:07] <wikibugs>	 10Beta-Cluster, 10Parsoid, 10VisualEditor, 7Verified: Cant open any page with in Beta Cluster , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272451 (10greg)
[15:13:16] <wikibugs>	 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272453 (10zeljkofilipin)
[15:36:18] <wikibugs>	 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272497 (10greg) 5Open>3Resolved a:3greg >>! In T98207#1272450, @zeljkofilipin wrote: > Let's try Wednesday at 8:30 PDT. I have a meeting with Greg on Thursday at 9 am.  Dec...
[15:36:30] <wikibugs>	 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272500 (10greg) a:5greg>3zeljkofilipin
[15:56:41] <wikibugs>	 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1272529 (10coren) @aklapper: once you return, we need and ssh key and a signature on L3 then we're all set to set you up.
[16:23:54] <etonkovidova>	 zeljkof: Hello! I see that you're online :) Do you have a link to the hotel in Annecy? I cannot find it anywhere ...
[16:25:41] <zeljkof>	 etonkovidova: let me see, I had it somewhere...
[16:26:40] <etonkovidova>	 zeljkof: wonder how everybody stays amazingly calm without having the link - LOL
[16:26:40] <zeljkof>	 etonkovidova: main page https://office.wikimedia.org/wiki/Lyon_Hackathon
[16:27:15] <zeljkof>	 etonkovidova: hotel https://office.wikimedia.org/wiki/Lyon_Travel_Information_Packet#Lodging_and_Venue_for_the_Wikimedia_Hackathon
[16:27:31] <etonkovidova>	 zeljkof: Hurray! many thanks
[16:28:06] <zeljkof>	 it was in e-mail from travel team, somewhere in mail...
[16:28:18] <zeljkof>	 etonkovidova: no problem 😃
[16:29:07] <shinken-wm>	 PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 100.00% of data above the critical threshold [0.0]
[16:29:08] <etonkovidova>	 zeljkof: hmm.. it's about Lyon only
[16:29:29] <zeljkof>	 etonkovidova: oh, sorry, you are asking for Annency?
[16:29:38] <etonkovidova>	 zeljkof: yes! 
[16:29:40] * zeljkof did not read carefully
[16:29:49] <zeljkof>	 hm, I do not know if I have it
[16:30:07] <zeljkof>	 greg-g: do you have hotel information for Annency? (not Lyon)
[16:30:33] <greg-g>	 zeljkof: i don't, I just pinged rachel to send us the travel packets
[16:30:57] <zeljkof>	 greg-g: thanks
[16:31:00] <greg-g>	 np!
[16:31:10] <zeljkof>	 etonkovidova: see ^ :)
[16:31:20] <etonkovidova>	 zeljkof: and greg-g - I look comparing to you a little paranoid ...
[16:31:46] <greg-g>	 annual review?
[16:31:49] <zeljkof>	 etonkovidova: for me it is just a short flight, I do not even need a passport :)
[16:34:20] <zeljkof>	 greg-g: trip to Annecy
[16:35:49] <grrrit-wm>	 (03PS6) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[16:37:10] <grrrit-wm>	 (03PS7) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[16:37:43] <grrrit-wm>	 (03CR) 10JanZerebecki: "PS7 is only a rebase" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[16:54:49] <grrrit-wm>	 (03CR) 10JanZerebecki: "Checked that Wikibase also works with composer-package-validate: https://integration.wikimedia.org/ci/job/php-composer-package-validate/68" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[16:56:34] <marktraceur>	 Has phpunit-hhvm been broken recently?
[16:56:43] <wikibugs>	 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1272648 (10Dzahn) a:3Aklapper
[16:57:21] <marktraceur>	 Oh, never mind. Might be a real issue
[16:58:43] <grrrit-wm>	 (03CR) 10JanZerebecki: "The composer-package-valide job still works: https://integration.wikimedia.org/ci/job/php-composer-package-validate/66/console" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[16:59:44] <etonkovidova>	 greg-g: regarding Annual Review - should I put you  as a reviewer explicitly on that 2014 - 2015 Annual Review: Co-Worker Feedback Nomination?
[17:00:00] <etonkovidova>	 greg-g: or you will be my reviewer by default? :)
[17:00:52] <greg-g>	 etonkovidova: I'll be reviewing everyone else's review of you, and putting my own feedback in, so, no need to list me
[17:01:15] <etonkovidova>	 greg-g: good - thx!
[17:05:19] <bd808>	 yuvipanda: deployment-fluorine -- trusty or jessie? (the real fluorine is precise)
[17:15:02] <wikibugs>	 6Release-Engineering, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 6operations: nb subdomain redirects - https://phabricator.wikimedia.org/T86924#1272673 (10Dzahn) >>! In T86924#985115, @Aklapper wrote: > Not sure either if this is still site-requests level (shell) or already lower on DNS level (...
[17:18:20] <wikibugs>	 10Deployment-Systems: Come up with an abstract deployment model that roughly addresses the needs of existing projects - https://phabricator.wikimedia.org/T97068#1272680 (10mmodell) @fgiunchedi: if you have anything to add, please add your thoughts on requirements for sentry deployments
[17:21:36] <yuvipanda>	 bd808: same as real flourine I would say 
[17:24:11] <greg-g>	 yuvipanda: you missed the next line: the real one is precise
[17:24:31] <bd808>	 I went with trusty
[17:24:36] <bd808>	 It should work fine
[17:24:54] <bd808>	 I didn't want to try and figure out rsyslog on jessie
[17:26:18] <yuvipanda>	 greg-g: yeah I didn't. I had the same conversation with ^d about staying - stick to same distribution as prod until prod changes 
[17:26:30] <greg-g>	 yuvipanda: :P
[17:26:39] * greg-g assumed there was an issue with precise labs instances
[17:26:55] <bd808>	 nfs I think now
[17:27:11] <bd808>	 also gawd so old
[17:27:38] <yuvipanda>	 greg-g: we had apache writing logs to NFS about how it had errors writing logs to nfs
[17:28:43] <bd808>	 which in turn made NFS sad for the whole labs environment
[17:29:26] <grrrit-wm>	 (03CR) 10JanZerebecki: "Deployed to Jenkins: mwext-WikidataQuality-npm, mwext-WikidataQuality-qunit, mwext-WikidataQuality-repo-tests-mysql-hhvm, mwext-WikidataQu" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[17:31:21] <grrrit-wm>	 (03PS8) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[17:36:25] <grrrit-wm>	 (03CR) 10JanZerebecki: "Deployed to Jenkins again: mwext-WikidataQuality-repo-tests-mysql-hhvm, mwext-WikidataQuality-repo-tests-mysql-zend, mwext-WikidataQuality" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[17:41:07] <shinken-wm>	 PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 14.29% of data above the critical threshold [0.0]
[17:45:29] <yuvipanda>	 bd808: yeah
[17:45:31] <yuvipanda>	 bd808: thanks for working on it :)
[17:45:52] <bd808>	 more fun that writing reviews ;)
[17:46:08] <shinken-wm>	 RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[18:15:32] <bd808>	 !log Cherry-picked https://gerrit.wikimedia.org/r/#/c/209769/
[18:15:36] <qa-morebots>	 Logged the message, Master
[18:19:52] <wikibugs>	 10Beta-Cluster, 6operations: Can't apply ::role::logging::mediawiki on a trusty host - https://phabricator.wikimedia.org/T98627#1272912 (10bd808) 3NEW
[18:21:32] <shinken-wm>	 PROBLEM - Puppet failure on integration-zuul-packaged is CRITICAL 100.00% of data above the critical threshold [0.0]
[18:22:07] <shinken-wm>	 PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 62.50% of data above the critical threshold [0.0]
[18:30:15] <wikibugs>	 10Beta-Cluster: Grant ssh access on deployment-elastic0{5-8}.eqiad.wmflabs - https://phabricator.wikimedia.org/T98624#1272965 (10Krenair)
[18:30:50] <wikibugs>	 10Beta-Cluster, 6operations: Can't apply ::role::logging::mediawiki on a trusty host - https://phabricator.wikimedia.org/T98627#1272966 (10bd808) I'm going to rebuild the instance I was working on as a precise host. Someday we can redo the beta cluster version after it has been sorted out for prod as either tr...
[18:32:11] <shinken-wm>	 PROBLEM - Host deployment-fluorine is DOWN: CRITICAL - Host Unreachable (10.68.16.197)
[18:36:05] <shinken-wm>	 RECOVERY - Host deployment-fluorine is UPING OK - Packet loss = 0%, RTA = 0.72 ms
[18:40:27] <grrrit-wm>	 (03PS9) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[19:22:07] <shinken-wm>	 PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 16.67% of data above the critical threshold [0.0]
[19:24:42] <spagewmf>	 yurik, yuvipanda , bd808, greg-g: I realize I'm late to the party, but can any of you review me?  Just "He edits wiki pages, usually doesn't make 'em any worse.  Wait, everybody does that!"
[19:24:52] <grrrit-wm>	 (03CR) 10JanZerebecki: "Changing the order of wikibase and mw apply settings make the db update script work https://integration.wikimedia.org/ci/job/mwext-Wikidat" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[19:25:19] <yurik>	 spagewmf, who are you?
[19:25:21] <bd808>	 spagewmf: :( I hit my limit yesterday
[19:25:24] <yurik>	 do i know you?
[19:25:27] <yurik>	 :-P
[19:25:49] <bd808>	 spagewmf: have you asked anomie?
[19:25:55] <greg-g>	 spagewmf: I'm full up :(
[19:27:05] <shinken-wm>	 RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0]
[19:27:06] <yuvipanda>	 spagewmf: I hit my limit too :( sorry
[19:29:20] <spagewmf>	 No worries, I'll keep looking (is Dan Duvall in here under an alias?).  I squeezed onto anomie's dance card.
[19:29:36] <greg-g>	 spagewmf: dan's either on a flight to france or in france
[19:29:53] <spagewmf>	 bon voyage
[19:30:21] <bd808>	 Vive la France!
[19:32:07] <valhallasw`nuage>	 Vive la victoire!
[19:32:26] <valhallasw`nuage>	 (I couldn't quite figure out why everything was closed on a friday, until I realized it's the 8th)
[19:33:29] <bd808>	 Oh! It is VE day
[19:34:43] <greg-g>	 oh, VE is done?
[19:36:19] <grrrit-wm>	 (03CR) 10JanZerebecki: [C: 04-1] Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag)
[20:05:30] <bd808>	 !log Cherry-picked https://gerrit.wikimedia.org/r/#/c/209801
[20:05:32] <qa-morebots>	 Logged the message, Master
[21:17:24] <shinken-wm>	 PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL 100.00% of data above the critical threshold [43200.0]
[23:19:15] <wikibugs>	 6Release-Engineering, 6Phabricator: Next Phabricator upgrade on 2015-05-20 (tentative) - https://phabricator.wikimedia.org/T98451#1273789 (10Qgil) The next upgrade should bring public Conpherence rooms, now prevented only by a bug that I reported directly upstream: https://secure.phabricator.com/T8102
[23:23:57] <wikibugs>	 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273803 (10bd808) I created a new instance in deployment-prep named `deployment-fluorine.eqiad.wmflabs`. This host is an m1-large instance with 58G of local disk storage at /srv/mw-log (symlinked to...
[23:25:46] <bd808>	 yuvipanda: one step closer. Next up is the mediawiki-config patch
[23:26:51] <yuvipanda>	 bd808: <3
[23:27:09] <yuvipanda>	 bd808: re: logging terribleness, see also: https://phabricator.wikimedia.org/T98652. a 1T log file filled with php notices and errors only
[23:27:52] <bd808>	 heh. no auto rotation I take it
[23:27:52] <Krenair>	 yuvipanda, careful when deleting that file
[23:28:01] <yuvipanda>	 Krenair: yeah, going to be :)
[23:28:06] <Krenair>	 I deleted one once
[23:28:13] <yuvipanda>	 Krenair: also I can delete that on the NFS host
[23:28:17] <Krenair>	 Pretty sure labs was down for like 15 minutes after
[23:28:33] <bd808>	 for sure do it from the nfs master
[23:30:32] <bd808>	 I kind of want to merge and sync that mediawiki-config patch but 17:30 local time on a Friday is probably not a good idea
[23:30:55] <Krenair>	 the oauth one?
[23:31:05] <Krenair>	 or your logstash one?
[23:31:06] <Krenair>	 or..?
[23:31:10] <bd808>	 logstash one
[23:31:13] <bd808>	 https://gerrit.wikimedia.org/r/#/c/209825
[23:31:18] <bd808>	 beta only
[23:31:35] <yuvipanda>	 bd808: can we cherry pick it? :D
[23:31:40] <bd808>	 nope
[23:31:50] <yuvipanda>	 I’m all for pulling it off before the weekend primarily because less chance of paging on weekend :)
[23:31:56] <yuvipanda>	 although NFS is more stable today
[23:32:04] <yuvipanda>	 so I’m going to leave it to you to decide
[23:32:58] <bd808>	 my inner ori is fighting with my sense of not wanting to break shit
[23:33:07] * Reedy s
[23:33:10] <Reedy>	 FAIL
[23:33:25] <bd808>	 ohi Reedy 
[23:33:55] <bd808>	 I take that as an vote for jfdi
[23:34:54] <Reedy>	 It was supposed to be something along those lines
[23:35:21] * bd808 succumbs to peer pressure
[23:36:09] <Reedy>	 just blame me
[23:36:10] <Reedy>	 I do
[23:37:49] <bd808>	 "Reedy made me do it" -- I think I've said that before
[23:46:11] <greg-g>	 hey, Reedy still has honorary +v in here, just saying
[23:46:44] <Reedy>	 I've still got the powers to go fuck shit up if necessary ;)
[23:47:01] <greg-g>	 Just the way I like it
[23:53:41] <wikibugs>	 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273883 (10bd808) MediaWiki debug logs are now switched to deployment-fluorine
[23:54:32] <bd808>	 !log Switched MediaWiki debug logs to deployment-fluorine:/srv/mw-log
[23:54:38] <qa-morebots>	 Logged the message, Master
[23:58:09] <wikibugs>	 10Deployment-Systems, 6operations, 7Graphite, 5Patch-For-Review: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1273893 (10greg) Thanks @fgiunchedi !
[23:59:45] <bd808>	 !log Created /data/project/logs/WHERE_DID_THE_LOGS_GO.txt to point folks to the right places
[23:59:47] <qa-morebots>	 Logged the message, Master