[00:36:19] 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1271171 (10Jdforrester-WMF) 5Open>3R... [00:36:24] 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1270368 (10Jdforrester-WMF) Seems to be... [00:47:49] 10Beta-Cluster, 6Labs: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271190 (10yuvipanda) NFS is paging a lot now, so I'll highly appreciate it if this can happen sooner than later :) [01:41:05] 10Beta-Cluster, 10VisualEditor, 10VisualEditor-MediaWiki, 3Editing Department 2014/15 Q4 blockers, 5WMF-deploy-2015-04-29_(1.26wmf4): VisualEditor fails to load on Beta Cluster, complaining about revID mis-match - https://phabricator.wikimedia.org/T97558#1271316 (10Jdforrester-WMF) 5Open>3Resolved a:... [01:41:10] 10Beta-Cluster, 10VisualEditor, 10VisualEditor-MediaWiki, 3Editing Department 2014/15 Q4 blockers, 5WMF-deploy-2015-04-29_(1.26wmf4): VisualEditor fails to load on Beta Cluster, complaining about revID mis-match - https://phabricator.wikimedia.org/T97558#1246405 (10Jdforrester-WMF) [01:50:50] PROBLEM - Free space - all mounts on deployment-bastion is CRITICAL deployment-prep.deployment-bastion.diskspace._var.byte_percentfree (<55.56%) [04:29:34] yuvipanda: want to hear something funny? I think a lot of the NFS problems caused by beta are actually caused by NFS problems in beta. [04:29:45] bd808: I KNEW IT! :D [04:29:55] nfs dies causing logs causing nfs death causing logs causing nfs death? [04:30:05] tail -f /data/project/logs/apache2.log [04:30:26] * yuvipanda does [04:30:36] the mw servers can't write their logs to nfs so they send error logs via syslog that say so [04:30:45] and that gets written to nfs [04:31:05] hahaha [04:31:06] wow [04:31:10] and that’s happening right now [04:31:21] has been for a long time I think [04:31:30] the stale file handles? [04:31:38] yeah [04:32:12] heh [04:32:23] so are these mw logs going to logstash? [04:32:34] perms on that share from deployment-mediawiki01 are wonky [04:32:42] the error logs do, yes [04:32:49] that access logs don't [04:32:58] and that's what is fialing to write [04:33:06] why are they not in /var/log [04:33:13] mw instances are new and don’t have stupid tiny var [04:33:30] because h.ashar didn't wnat to have to ssh all over the place to see logs [04:33:31] and who looks at access logs. [04:33:59] ok. so we’re going to move them back now [04:34:08] and he can set up dsh or other mechanisms if he needs to [04:34:17] (if they need to be written to disk at all) [04:34:24] I have a patch [04:34:30] woooo [04:34:40] all it does is remove the custom log config for beta apache [04:34:49] which makes things just like prod [04:34:52] +1 [04:34:54] <3 [04:35:24] access logs local and error logs to syslog which writes local, forwards to udp2log and logstash [04:35:45] that will still put the error logs on nfs for now [04:35:53] but that would be the next thing to fix [04:36:01] yeah, steps... [04:36:28] need a log storage box with local disk for the logs and then move the udp2log service there [04:36:33] easy peasy [04:36:40] :D [04:40:16] bd808: poke me when you’ve patch? 
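
A quick way to watch the loop bd808 describes above — Apache instances failing to write their logs to NFS, complaining via syslog, and the complaints landing back on the same NFS share — is to follow the very file he suggests tailing and count error lines as they arrive. A rough sketch only, assuming the failures show up as lines mentioning stale file handles or permission errors; the exact wording of the real errors isn't quoted in the channel, so adjust the patterns to match.

#!/usr/bin/env python3
"""Rough diagnostic sketch: follow the shared apache2 error log and count
NFS-looking write failures, to see how fast the "errors about logging cause
more logging" loop described in the channel is spinning.

Assumptions: the path is the one bd808 tails above; the match strings are
guesses at what the failing writes look like, not quoted from the log.
"""
import time

LOG_PATH = "/data/project/logs/apache2.log"            # path quoted in the conversation
PATTERNS = ("Stale file handle", "Permission denied")  # assumed error strings


def follow(path):
    """Yield lines appended to *path*, roughly like `tail -f`."""
    with open(path, "r", errors="replace") as handle:
        handle.seek(0, 2)  # start at the current end of the file
        while True:
            line = handle.readline()
            if not line:
                time.sleep(1)
                continue
            yield line


def main():
    hits = 0
    started = time.time()
    for line in follow(LOG_PATH):
        if any(pattern in line for pattern in PATTERNS):
            hits += 1
            rate = hits / max(time.time() - started, 1)
            print(f"{hits} NFS-looking errors so far (~{rate:.1f}/s): {line.strip()}")


if __name__ == "__main__":
    main()
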
I also thought of a niceish way to do auth for kibana for tools without requiring LDAP [04:47:43] !log cherry-picked https://gerrit.wikimedia.org/r/#/c/209680/ [04:47:51] Logged the message, Master [04:49:04] !log Symbolic link not allowed or link target not accessible: /srv/mediawiki/docroot/bits/static/master/extensions [04:49:05] PROBLEM - Puppet failure on deployment-mediawiki01 is CRITICAL 11.11% of data above the critical threshold [0.0] [04:49:08] Logged the message, Master [04:49:18] probably related to things ori was doing for prod [04:50:13] yeah [04:50:35] symlinks are wrong [04:50:53] in general? :) [04:51:02] for the beta docroot [04:51:19] off by a level, needs more .. [04:54:39] yuvipanda: that patch seems to work fine on mw01. just removed the file and then restarted apache [04:59:07] RECOVERY - Puppet failure on deployment-mediawiki01 is OK Less than 1.00% above the threshold [0.0] [05:14:30] !log apache2 access logs now only locally on instances in /var/log/apache2/other_vhosts_access.log; error log in /var/log/apache2.log and still relayed to deployment-bastion and logstash (works like production now) [05:14:35] Logged the message, Master [05:14:45] bd808: \o/ [05:14:52] bd808: so now we just need a fluroine? [05:14:59] yeah [05:15:16] I wonder how much disk we need for the logs? [05:15:25] we won't need much cpu [05:16:45] bd808: use a large and trim as needed? [05:17:15] that will probably work fine [05:17:22] I can poke at that tomorrow [05:17:33] and you should head home young man [05:17:36] bd808: <3 thank you very much [05:17:37] bd808: I’m home [05:17:43] left early today (7pm) [05:17:54] early-ish [05:17:57] am going to do anti burnout therapy tonight (beer and sleep before 11pm) [05:19:10] bd808: re: the logstash thing. We can just do OAuth and enforce it via lua on nginx :) [05:19:25] it’s not that crazy - toollabs webproxy already enforces a fuckton of things via lua on nginx [05:19:31] oh? fancy [05:19:51] ou can teach me how nginx works [05:19:54] bd808: yeah, we can basically do what we do for dynamicproxy but do it for auth [05:20:12] I’ve no idea how nginx works :P It’s been years since I fucked around with Lua there... [05:20:15] but it totally can do this [05:20:24] :) [05:20:25] you firewall off the host, and make nginx the sole way to get to it [05:20:32] and then that works :) [05:20:45] bd808: of course that should still be just for admins, I guess. [05:20:58] bd808: and we should just write a variant of heroku logs commandline for tools [05:21:42] we can do the nginx trick to isolate kibana too. I think we can get it all really [05:21:47] but admins first [05:22:06] and then test the nginx revers proxy stuff for leaks [05:22:27] yeah [05:22:43] well, and then trust kibana to not allow arbitrary queries [05:22:59] having a cli tool for searching the logs would be nice for other things too [05:23:07] yeah [05:23:12] and we can also aggressively limit [05:23:32] for toollabs I’m pretty sure I can make nginx do identd authentication :P [05:23:45] and issue a cookie or something [05:24:03] identd is so old skool [05:24:20] makes me want to create a ~/.plan file [05:24:31] hehe [05:24:37] bd808: but works great for our use case! [05:24:48] until the glorious container revolution takes us all, of course [05:27:42] 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1271652 (10bd808) The apache2 logs in beta cluster now match the production config. 
This means that access logs are written to local disk at /var/log/apache2/other_vhosts_access.log on each host. Er... [06:06:17] PROBLEM - Content Translation Server on deployment-cxserver03 is CRITICAL: Connection refused [06:16:17] RECOVERY - Content Translation Server on deployment-cxserver03 is OK: HTTP OK: HTTP/1.1 200 OK - 1103 bytes in 0.024 second response time [06:35:46] RECOVERY - Free space - all mounts on deployment-bastion is OK All targets OK [07:22:51] 6Release-Engineering, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 6operations: nb subdomain redirects - https://phabricator.wikimedia.org/T86924#1271748 (10jayvdb) [07:42:51] 10Beta-Cluster: upgrade salt on deployment-prep to 2014.7 - https://phabricator.wikimedia.org/T92276#1271784 (10ArielGlenn) 5Open>3Resolved This upgrade is complete. While git deploy works fine, this is in part due to local modifications to code in the trigger package which were made during the previous upg... [10:45:50] PROBLEM - Puppet staleness on deployment-restbase02 is CRITICAL 100.00% of data above the critical threshold [43200.0] [12:19:43] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL - Socket timeout after 10 seconds [12:26:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds [12:26:33] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds [12:26:33] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds [12:26:33] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL - Socket timeout after 10 seconds [12:28:16] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48040 bytes in 1.233 second response time [12:30:54] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47738 bytes in 1.928 second response time [12:31:20] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30076 bytes in 0.908 second response time [12:41:34] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds [12:41:34] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL - Socket timeout after 10 seconds [12:41:35] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL - Socket timeout after 10 seconds [12:41:35] PROBLEM - Puppet failure on deployment-bastion is CRITICAL 66.67% of data above the critical threshold [0.0] [12:41:55] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 47745 bytes in 0.808 second response time [12:42:19] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 30077 bytes in 1.462 second response time [12:44:15] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48032 bytes in 1.674 second response time [12:44:33] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 47744 bytes in 0.921 second response time [12:44:37] RECOVERY - App Server Main HTTP Response on deployment-mediawiki03 is OK: HTTP OK: HTTP/1.1 200 OK - 48451 bytes in 3.058 second response time [12:45:53] PROBLEM - HHVM Queue Size on deployment-mediawiki01 is CRITICAL 33.33% of data above the critical threshold [80.0] [12:50:52] RECOVERY - HHVM Queue Size on 
deployment-mediawiki01 is OK Less than 30.00% above the threshold [10.0] [13:18:26] RECOVERY - Puppet failure on deployment-bastion is OK Less than 1.00% above the threshold [0.0] [13:58:21] 10Beta-Cluster, 10Parsoid, 10VisualEditor: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272306 (10Ryasmeen) yup working now [13:58:42] 10Beta-Cluster, 10Parsoid, 10VisualEditor, 7Verified: Cant open any page with in Betalabs , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272307 (10Ryasmeen) [14:42:23] 10Beta-Cluster, 10Graphoid: Deploy Graphoid on Beta Cluster - https://phabricator.wikimedia.org/T97606#1272387 (10mobrovac) >>! In T97606#1262386, @Yurik wrote: > @mobrovac, what are the steps for the manual deployment from deployment-bastion? How are they different from production? Thx! (P.S. its alive!!!)... [15:06:49] 10Browser-Tests, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Workshop: write the first browsertests/Selenium test - https://phabricator.wikimedia.org/T94024#1272426 (10zeljkofilipin) I am still planning to run the session. I need at least 1 hour. Depending on how many people apply, their previous... [15:08:29] 10Browser-Tests, 10Wikimania-Hackathon-2015, 10Wikimedia-Hackathon-2015: Workshop: Fix broken browsertests/Selenium Jenkins jobs - https://phabricator.wikimedia.org/T94299#1272431 (10zeljkofilipin) I am still planning to run the session. I need at least 1 hour. Depending on how many people apply and how hard... [15:13:05] 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272450 (10zeljkofilipin) Let's try Wednesday at 8:30 PDT. I have a meeting with Greg on Thursday at 9 am. [15:13:07] 10Beta-Cluster, 10Parsoid, 10VisualEditor, 7Verified: Cant open any page with in Beta Cluster , showing error "Error loading data from server: ve-api: Revision IDs (doc=0,api=216266) returned by server do not match. Would you like to retry?" - https://phabricator.wikimedia.org/T98522#1272451 (10greg) [15:13:16] 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272453 (10zeljkofilipin) [15:36:18] 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272497 (10greg) 5Open>3Resolved a:3greg >>! In T98207#1272450, @zeljkofilipin wrote: > Let's try Wednesday at 8:30 PDT. I have a meeting with Greg on Thursday at 9 am. Dec... [15:36:30] 10Browser-Tests, 6Release-Engineering: Determine weekly triage meeting for Browser Tests - https://phabricator.wikimedia.org/T98207#1272500 (10greg) a:5greg>3zeljkofilipin [15:56:41] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1272529 (10coren) @aklapper: once you return, we need and ssh key and a signature on L3 then we're all set to set you up. [16:23:54] zeljkof: Hello! I see that you're online :) Do you have a link to the hotel in Annecy? I cannot find it anywhere ... [16:25:41] etonkovidova: let me see, I had it somewhere... 
[16:26:40] zeljkof: wonder how everybody stays amazingly calm without having the link - LOL [16:26:40] etonkovidova: main page https://office.wikimedia.org/wiki/Lyon_Hackathon [16:27:15] etonkovidova: hotel https://office.wikimedia.org/wiki/Lyon_Travel_Information_Packet#Lodging_and_Venue_for_the_Wikimedia_Hackathon [16:27:31] zeljkof: Hurray! many thanks [16:28:06] it was in e-mail from travel team, somewhere in mail... [16:28:18] etonkovidova: no problem 😃 [16:29:07] PROBLEM - Puppet failure on deployment-zotero01 is CRITICAL 100.00% of data above the critical threshold [0.0] [16:29:08] zeljkof: hmm.. it's about Lyon only [16:29:29] etonkovidova: oh, sorry, you are asking for Annency? [16:29:38] zeljkof: yes! [16:29:40] * zeljkof did not read carefully [16:29:49] hm, I do not know if I have it [16:30:07] greg-g: do you have hotel information for Annency? (not Lyon) [16:30:33] zeljkof: i don't, I just pinged rachel to send us the travel packets [16:30:57] greg-g: thanks [16:31:00] np! [16:31:10] etonkovidova: see ^ :) [16:31:20] zeljkof: and greg-g - I look comparing to you a little paranoid ... [16:31:46] annual review? [16:31:49] etonkovidova: for me it is just a short flight, I do not even need a passport :) [16:34:20] greg-g: trip to Annecy [16:35:49] (03PS6) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [16:37:10] (03PS7) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [16:37:43] (03CR) 10JanZerebecki: "PS7 is only a rebase" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [16:54:49] (03CR) 10JanZerebecki: "Checked that Wikibase also works with composer-package-validate: https://integration.wikimedia.org/ci/job/php-composer-package-validate/68" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [16:56:34] Has phpunit-hhvm been broken recently? [16:56:43] 6Release-Engineering, 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Grant access for aklapper to phab-admins - https://phabricator.wikimedia.org/T97642#1272648 (10Dzahn) a:3Aklapper [16:57:21] Oh, never mind. Might be a real issue [16:58:43] (03CR) 10JanZerebecki: "The composer-package-valide job still works: https://integration.wikimedia.org/ci/job/php-composer-package-validate/66/console" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [16:59:44] greg-g: regarding Annual Review - should I put you as a reviewer explicitly on that 2014 - 2015 Annual Review: Co-Worker Feedback Nomination? [17:00:00] greg-g: or you will be my reviewer by default? :) [17:00:52] etonkovidova: I'll be reviewing everyone else's review of you, and putting my own feedback in, so, no need to list me [17:01:15] greg-g: good - thx! [17:05:19] yuvipanda: deployment-fluorine -- trusty or jessie? (the real fluorine is precise) [17:15:02] 6Release-Engineering, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests, 6operations: nb subdomain redirects - https://phabricator.wikimedia.org/T86924#1272673 (10Dzahn) >>! In T86924#985115, @Aklapper wrote: > Not sure either if this is still site-requests level (shell) or already lower on DNS level (... 
[17:18:20] 10Deployment-Systems: Come up with an abstract deployment model that roughly addresses the needs of existing projects - https://phabricator.wikimedia.org/T97068#1272680 (10mmodell) @fgiunchedi: if you have anything to add, please add your thoughts on requirements for sentry deployments [17:21:36] bd808: same as real flourine I would say [17:24:11] yuvipanda: you missed the next line: the real one is precise [17:24:31] I went with trusty [17:24:36] It should work fine [17:24:54] I didn't want to try and figure out rsyslog on jessie [17:26:18] greg-g: yeah I didn't. I had the same conversation with ^d about staying - stick to same distribution as prod until prod changes [17:26:30] yuvipanda: :P [17:26:39] * greg-g assumed there was an issue with precise labs instances [17:26:55] nfs I think now [17:27:11] also gawd so old [17:27:38] greg-g: we had apache writing logs to NFS about how it had errors writing logs to nfs [17:28:43] which in turn made NFS sad for the whole labs environment [17:29:26] (03CR) 10JanZerebecki: "Deployed to Jenkins: mwext-WikidataQuality-npm, mwext-WikidataQuality-qunit, mwext-WikidataQuality-repo-tests-mysql-hhvm, mwext-WikidataQu" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [17:31:21] (03PS8) 10JanZerebecki: Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [17:36:25] (03CR) 10JanZerebecki: "Deployed to Jenkins again: mwext-WikidataQuality-repo-tests-mysql-hhvm, mwext-WikidataQuality-repo-tests-mysql-zend, mwext-WikidataQuality" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [17:41:07] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 14.29% of data above the critical threshold [0.0] [17:45:29] bd808: yeah [17:45:31] bd808: thanks for working on it :) [17:45:52] more fun that writing reviews ;) [17:46:08] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0] [18:15:32] !log Cherry-picked https://gerrit.wikimedia.org/r/#/c/209769/ [18:15:36] Logged the message, Master [18:19:52] 10Beta-Cluster, 6operations: Can't apply ::role::logging::mediawiki on a trusty host - https://phabricator.wikimedia.org/T98627#1272912 (10bd808) 3NEW [18:21:32] PROBLEM - Puppet failure on integration-zuul-packaged is CRITICAL 100.00% of data above the critical threshold [0.0] [18:22:07] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 62.50% of data above the critical threshold [0.0] [18:30:15] 10Beta-Cluster: Grant ssh access on deployment-elastic0{5-8}.eqiad.wmflabs - https://phabricator.wikimedia.org/T98624#1272965 (10Krenair) [18:30:50] 10Beta-Cluster, 6operations: Can't apply ::role::logging::mediawiki on a trusty host - https://phabricator.wikimedia.org/T98627#1272966 (10bd808) I'm going to rebuild the instance I was working on as a precise host. Someday we can redo the beta cluster version after it has been sorted out for prod as either tr... [18:32:11] PROBLEM - Host deployment-fluorine is DOWN: CRITICAL - Host Unreachable (10.68.16.197) [18:36:05] RECOVERY - Host deployment-fluorine is UPING OK - Packet loss = 0%, RTA = 0.72 ms [18:40:27] (03PS9) 10JanZerebecki: Added job for WikidataQuality extension. 
[integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [19:22:07] PROBLEM - Puppet failure on deployment-fluorine is CRITICAL 16.67% of data above the critical threshold [0.0] [19:24:42] yurik, yuvipanda , bd808, greg-g: I realize I'm late to the party, but can any of you review me? Just "He edits wiki pages, usually doesn't make 'em any worse. Wait, everybody does that!" [19:24:52] (03CR) 10JanZerebecki: "Changing the order of wikibase and mw apply settings make the db update script work https://integration.wikimedia.org/ci/job/mwext-Wikidat" [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [19:25:19] spagewmf, who are you? [19:25:21] spagewmf: :( I hit my limit yesterday [19:25:24] do i know you? [19:25:27] :-P [19:25:49] spagewmf: have you asked anomie? [19:25:55] spagewmf: I'm full up :( [19:27:05] RECOVERY - Puppet failure on deployment-fluorine is OK Less than 1.00% above the threshold [0.0] [19:27:06] spagewmf: I hit my limit too :( sorry [19:29:20] No worries, I'll keep looking (is Dan Duvall in here under an alias?). I squeezed onto anomie's dance card. [19:29:36] spagewmf: dan's either on a flight to france or in france [19:29:53] bon voyage [19:30:21] Vive la France! [19:32:07] Vive la victoire! [19:32:26] (I couldn't quite figure out why everything was closed on a friday, until I realized it's the 8th) [19:33:29] Oh! It is VE day [19:34:43] oh, VE is done? [19:36:19] (03CR) 10JanZerebecki: [C: 04-1] Added job for WikidataQuality extension. [integration/config] - 10https://gerrit.wikimedia.org/r/206392 (owner: 10Soeren.oldag) [20:05:30] !log Cherry-picked https://gerrit.wikimedia.org/r/#/c/209801 [20:05:32] Logged the message, Master [21:17:24] PROBLEM - Puppet staleness on deployment-eventlogging02 is CRITICAL 100.00% of data above the critical threshold [43200.0] [23:19:15] 6Release-Engineering, 6Phabricator: Next Phabricator upgrade on 2015-05-20 (tentative) - https://phabricator.wikimedia.org/T98451#1273789 (10Qgil) The next upgrade should bring public Conpherence rooms, now prevented only by a bug that I reported directly upstream: https://secure.phabricator.com/T8102 [23:23:57] 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273803 (10bd808) I created a new instance in deployment-prep named `deployment-fluorine.eqiad.wmflabs`. This host is an m1-large instance with 58G of local disk storage at /srv/mw-log (symlinked to... [23:25:46] yuvipanda: one step closer. Next up is the mediawiki-config patch [23:26:51] bd808: <3 [23:27:09] bd808: re: logging terribleness, see also: https://phabricator.wikimedia.org/T98652. a 1T log file filled with php notices and errors only [23:27:52] heh. no auto rotation I take it [23:27:52] yuvipanda, careful when deleting that file [23:28:01] Krenair: yeah, going to be :) [23:28:06] I deleted one once [23:28:13] Krenair: also I can delete that on the NFS host [23:28:17] Pretty sure labs was down for like 15 minutes after [23:28:33] for sure do it from the nfs master [23:30:32] I kind of want to merge and sync that mediawiki-config patch but 17:30 local time on a Friday is probably not a good idea [23:30:55] the oauth one? [23:31:05] or your logstash one? [23:31:06] or..? [23:31:10] logstash one [23:31:13] https://gerrit.wikimedia.org/r/#/c/209825 [23:31:18] beta only [23:31:35] bd808: can we cherry pick it? 
:D [23:31:40] nope [23:31:50] I’m all for pulling it off before the weekend primarily because less chance of paging on weekend :) [23:31:56] although NFS is more stable today [23:32:04] so I’m going to leave it to you to decide [23:32:58] my inner ori is fighting with my sense of not wanting to break shit [23:33:07] * Reedy s [23:33:10] FAIL [23:33:25] ohi Reedy [23:33:55] I take that as an vote for jfdi [23:34:54] It was supposed to be something along those lines [23:35:21] * bd808 succumbs to peer pressure [23:36:09] just blame me [23:36:10] I do [23:37:49] "Reedy made me do it" -- I think I've said that before [23:46:11] hey, Reedy still has honorary +v in here, just saying [23:46:44] I've still got the powers to go fuck shit up if necessary ;) [23:47:01] Just the way I like it [23:53:41] 10Beta-Cluster, 6Labs, 5Patch-For-Review: Move logs off NFS on beta - https://phabricator.wikimedia.org/T98289#1273883 (10bd808) MediaWiki debug logs are now switched to deployment-fluorine [23:54:32] !log Switched MediaWiki debug logs to deployment-fluorine:/srv/mw-log [23:54:38] Logged the message, Master [23:58:09] 10Deployment-Systems, 6operations, 7Graphite, 5Patch-For-Review: [scap] Deploy events aren't showing up in graphite/gdash - https://phabricator.wikimedia.org/T64667#1273893 (10greg) Thanks @fgiunchedi ! [23:59:45] !log Created /data/project/logs/WHERE_DID_THE_LOGS_GO.txt to point folks to the right places [23:59:47] Logged the message, Master
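
Picking up the 05:20 idea of a `heroku logs`-style command line for tools: a first cut could be little more than a thin wrapper over the Elasticsearch index that Logstash fills, leaving the access-control question (the nginx/OAuth discussion earlier in the day) to whatever sits in front of it. A rough sketch follows; the host and port, the `logstash-*` index pattern, and the `@timestamp`/`host`/`message` field names are stock Logstash defaults assumed for illustration, not anything confirmed in this channel.

#!/usr/bin/env python3
"""Very rough sketch of a `heroku logs`-style CLI for tools: query the
Elasticsearch index behind Logstash and print the newest matching lines.

Assumed defaults (not taken from the conversation): Elasticsearch reachable on
localhost:9200, indices named logstash-*, documents carrying
@timestamp/host/message fields.
"""
import argparse
import json
import urllib.request

ES_URL = "http://localhost:9200"  # placeholder; the real endpoint would sit behind the auth layer


def search(query, size):
    body = {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"query_string": {"query": query}},
    }
    request = urllib.request.Request(
        f"{ES_URL}/logstash-*/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    parser = argparse.ArgumentParser(description="search the logstash logs from the command line")
    parser.add_argument("query", help="Lucene query string passed to Elasticsearch")
    parser.add_argument("--lines", type=int, default=50, help="number of log lines to show")
    args = parser.parse_args()

    hits = search(args.query, args.lines)["hits"]["hits"]
    for hit in reversed(hits):  # print oldest first, like tail
        source = hit["_source"]
        print(source.get("@timestamp", "?"), source.get("host", "?"), source.get("message", ""))


if __name__ == "__main__":
    main()
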
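
One more sketch, prompted by the 23:27 tangent about T98652 (the ~1 TB log of PHP notices) and Krenair's warning about deleting such files outright: truncating a live log in place frees the space immediately and keeps the path valid, whereas unlinking it leaves every process that still holds the file open pinning the space until it closes the handle. Appending writers carry on seamlessly after a truncate; writers that track their own offset just leave a sparse gap. The path below is a placeholder, not the real file from T98652, and doing any of this from the NFS server itself, as suggested in the channel, remains the cautious option.

#!/usr/bin/env python3
"""Sketch for reclaiming space from a runaway log without yanking the file
out from under whatever is writing to it.

The path is a placeholder. On an NFS-backed share, prefer running this on the
NFS server itself, per the advice in the channel.
"""
import os
import sys


def truncate_log(path, dry_run=True):
    size = os.path.getsize(path)
    print(f"{path}: {size / 1024 ** 3:.1f} GiB")
    if dry_run:
        print("dry run; pass --really to truncate")
        return
    os.truncate(path, 0)  # frees the blocks but keeps the inode and path intact
    print("truncated to 0 bytes")


if __name__ == "__main__":
    truncate_log("/srv/mw-log/runaway.log",  # placeholder path
                 dry_run="--really" not in sys.argv[1:])
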