[00:51:42] (PS1) Awight: Fundraising CRM job can run concurrently now [integration/config] - https://gerrit.wikimedia.org/r/266447 (https://phabricator.wikimedia.org/T91903)
[02:01:41] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:02:07] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:02:17] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:02:35] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[02:06:31] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31731 bytes in 0.566 second response time
[02:06:57] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39540 bytes in 0.741 second response time
[02:07:05] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39866 bytes in 0.726 second response time
[02:07:25] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39547 bytes in 0.536 second response time
[03:16:58] (PS3) Krinkle: Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[03:17:35] (PS4) Krinkle: Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[03:17:48] (CR) jenkins-bot: [V: -1] Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[03:18:39] (CR) jenkins-bot: [V: -1] Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[04:05:14] (CR) Krinkle: "There is a yaml logic error somewhere." [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[04:18:47] !log integration-slave-jessie-1001:/mnt full; cleaned up 15G of files in /mnt/pbuilder/build (27 hours after the last time I did that)
[04:18:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[04:19:02] Continuous-Integration-Config: Frequent "No space left on device" failures for debian-glue jobs on integration-slave-jessie-1001 - https://phabricator.wikimedia.org/T124746#1964900 (bd808) NEW
[09:30:57] !log restarting Jenkins to upgrade the gearman plugin with https://review.openstack.org/#/c/271543/
[09:30:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[09:42:25] Continuous-Integration-Infrastructure, Release-Engineering-Team, Jenkins, Upstream: [upstream] Jenkins Gearman plugin has deadlock on executor threads (was: Beta Cluster stopped receiving code updates (beta-update-databases-eqiad hung) - https://phabricator.wikimedia.org/T72597#1965151 (hashar) Upst...
[10:35:34] Project beta-scap-eqiad build #87682: FAILURE in 2 min 59 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87682/
[11:30:21] (PS1) Hashar: Drop --verbose from castor rsync calls [integration/config] - https://gerrit.wikimedia.org/r/266485
[11:30:29] (CR) Hashar: [C: 2] Drop --verbose from castor rsync calls [integration/config] - https://gerrit.wikimedia.org/r/266485 (owner: Hashar)
[11:32:11] (Merged) jenkins-bot: Drop --verbose from castor rsync calls [integration/config] - https://gerrit.wikimedia.org/r/266485 (owner: Hashar)
[11:36:25] Yippee, build fixed!
[11:36:25] Project beta-scap-eqiad build #87685: FIXED in 6 min 43 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/87685/
[11:38:20] (PS3) Hashar: [labs/tools/ptable] tox [integration/config] - https://gerrit.wikimedia.org/r/265543 (owner: Ricordisamoa)
[11:38:36] (CR) Hashar: [C: 2] "Thanks :)" [integration/config] - https://gerrit.wikimedia.org/r/265543 (owner: Ricordisamoa)
[11:39:35] (Merged) jenkins-bot: [labs/tools/ptable] tox [integration/config] - https://gerrit.wikimedia.org/r/265543 (owner: Ricordisamoa)
[12:14:37] !log Added Jenkins IRC bot (wmf-insecte) to #wikimedia-perf for https://gerrit.wikimedia.org/r/#/c/265631/
[12:14:40] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[12:21:55] (CR) Hashar: "It is missing a level of indentation :-)" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[12:22:00] (PS5) Hashar: Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[12:22:30] (PS6) Hashar: Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[12:25:20] (CR) Hashar: [C: 2] "INFO:jenkins_jobs.builder:Reconfiguring jenkins job performance-webpagetest-wmf" [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[12:32:46] (Merged) jenkins-bot: Notify wikimedia-perf when WebPageTest fails [integration/config] - https://gerrit.wikimedia.org/r/265631 (owner: Phedenskog)
[12:45:56] Hashar: thanks :)
[13:44:48] (PS2) Hashar: Fundraising CRM job can run concurrently now [integration/config] - https://gerrit.wikimedia.org/r/266447 (https://phabricator.wikimedia.org/T91903) (owner: Awight)
[13:45:06] phedenskog: you are welcome :-}
[13:45:32] (CR) Hashar: [C: 2] "I have refreshed the job." [integration/config] - https://gerrit.wikimedia.org/r/266447 (https://phabricator.wikimedia.org/T91903) (owner: Awight)
[13:47:16] (Merged) jenkins-bot: Fundraising CRM job can run concurrently now [integration/config] - https://gerrit.wikimedia.org/r/266447 (https://phabricator.wikimedia.org/T91903) (owner: Awight)
[13:47:26] (PS2) Hashar: [SocialLogin] Replace jslint test with npm and add composer-test [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[13:48:04] (CR) Hashar: [C: 2] [SocialLogin] Replace jslint test with npm and add composer-test [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[13:49:56] (CR) jenkins-bot: [V: -1] [SocialLogin] Replace jslint test with npm and add composer-test [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[13:53:29] (PS3) Hashar: [SocialLogin] Replace jslint test with npm and add composer-test [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[13:53:57] (CR) Hashar: [C: 2] "mwext-SocialLogin-jslint was still references has being non voting." [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[13:57:38] (Merged) jenkins-bot: [SocialLogin] Replace jslint test with npm and add composer-test [integration/config] - https://gerrit.wikimedia.org/r/264967 (owner: Paladox)
[14:07:12] (PS2) Hashar: [WPtouch] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/266306 (owner: Paladox)
[14:07:58] (CR) Hashar: [C: 2] [WPtouch] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/266306 (owner: Paladox)
[14:08:52] (Merged) jenkins-bot: [WPtouch] Add jenkins tests [integration/config] - https://gerrit.wikimedia.org/r/266306 (owner: Paladox)
[14:11:20] Project browsertests-Wikidata-SmokeTests-linux-firefox build #87: STILL FAILING in 13 min: https://integration.wikimedia.org/ci/job/browsertests-Wikidata-SmokeTests-linux-firefox/87/
[14:38:53] (PS2) Hashar: [ShortUrlApi] Replace jslint test with jsonlint and jshint [integration/config] - https://gerrit.wikimedia.org/r/264599 (owner: Paladox)
[14:39:05] (CR) Hashar: [C: 2] [ShortUrlApi] Replace jslint test with jsonlint and jshint [integration/config] - https://gerrit.wikimedia.org/r/264599 (owner: Paladox)
[14:40:51] (Merged) jenkins-bot: [ShortUrlApi] Replace jslint test with jsonlint and jshint [integration/config] - https://gerrit.wikimedia.org/r/264599 (owner: Paladox)
[14:53:05] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1965678 (Dereckson) To help to prioritize, we have received on the GLAM mailing list a report the error also occurs on Commons Beta using the GLAM tools: `[f79958ce] /wiki...
[15:17:00] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1965733 (hashar) >>! In T124677#1962608, @mmodell wrote: > ``` > twentyafterfour@deployment-redis01:~$ redis-server --version > Redis server v=2.8.4 sha=00000000:0 malloc=j...
[15:19:12] Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Tracking: All repositories should pass jshint test (tracking) - https://phabricator.wikimedia.org/T62619#1965737 (hashar)
[15:27:04] (PS8) Hashar: tests: mw exts having npm do not need extension-jslint [integration/config] - https://gerrit.wikimedia.org/r/262708
[15:28:50] (CR) Hashar: [C: 2] tests: mw exts having npm do not need extension-jslint [integration/config] - https://gerrit.wikimedia.org/r/262708 (owner: Hashar)
[15:30:40] (Merged) jenkins-bot: tests: mw exts having npm do not need extension-jslint [integration/config] - https://gerrit.wikimedia.org/r/262708 (owner: Hashar)
[16:12:33] (PS1) Hashar: Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515
[16:12:52] (CR) Hashar: [C: 2] Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515 (owner: Hashar)
[16:16:14] (CR) jenkins-bot: [V: -1] Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515 (owner: Hashar)
[16:20:39] (PS2) Hashar: Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515
[16:21:46] (CR) Hashar: [C: 2] Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515 (owner: Hashar)
[16:23:12] (Merged) jenkins-bot: Make tox voting on puppet/kafkatee and varnishkafka [integration/config] - https://gerrit.wikimedia.org/r/266515 (owner: Hashar)
[16:48:31] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:48:31] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[16:53:33] !log updated OCG to version 64050af0456a43344b32e3e93561a79207565eaf
[16:53:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL, Master
[16:55:16] thcipriani: the redis server used by OCG seems to be readonly now, too.
[16:55:33] Error: send_command: stream not writeable. enable_offline_queue is false\n at RedisClient.send_command
[16:56:04] (CR) Zfilipin: [C: 2] Log SauceLabs session URLs via Cucumber logger embeds [selenium] - https://gerrit.wikimedia.org/r/265894 (https://phabricator.wikimedia.org/T105589) (owner: Dduvall)
[16:56:21] cscott: :\ doing SWAT right at the moment, but I'll check. Which redis host is it in beta?
[16:56:32] *I'll check after SWAT
[16:56:36] thcipriani: i'll try to figure it out.
[16:57:26] i just did the deploy of the new ocg version, so it's *possible* the problem is on my end (maybe the upgraded redis package is incompatible, or something like that)
[16:57:30] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31731 bytes in 0.565 second response time
[16:57:39] but the "stream not writeable" error makes it seem pretty suspicious.
[16:57:54] (Merged) jenkins-bot: Log SauceLabs session URLs via Cucumber logger embeds [selenium] - https://gerrit.wikimedia.org/r/265894 (https://phabricator.wikimedia.org/T105589) (owner: Dduvall)
[16:57:56] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39540 bytes in 0.468 second response time
[16:59:01] cscott: FWIW the problem I found yesterday should have only affected the deployment hosts (servers with role::deployment_server applied), but I don't blame you for being wary.
[17:00:45] thcipriani: created https://phabricator.wikimedia.org/T124791 and the redis server seems to be deployment-redis01.eqiad.wmflabs:6379
[17:01:28] ECONNREFUSED - maybe the root cause is the missing .deployment-prep. ?
[17:04:52] Beta-Cluster-Infrastructure: OCG redis server is readonly in beta cluster - https://phabricator.wikimedia.org/T124791#1966088 (cscott) There's a suspicious "stream not writeable" in there which makes me suspicious that this might be related to T124720. But on the other hand, ECONNREFUSED makes me think that...
[17:07:19] the redis server is https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-redis01.deployment-prep.eqiad.wmflabs and it has the role::jobqueue_redis. dunno if that inherits from role::deployment_server or not.
[17:11:51] Beta-Cluster-Infrastructure: OCG redis server is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1966107 (cscott)
[17:12:47] cscott: no it shouldn't inherit from that role. There were some other redis problems with beta recently, though. Probably related to this.
[17:12:54] Beta-Cluster-Infrastructure: OCG redis server is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1966070 (cscott) from deployment-pdf01: ``` $ redis-cli -h deployment-redis01.eqiad.wmflabs Could not connect to Redis at deployment-redis01.eqiad.wmflabs:6379: Connection refused ``` That seems...
[17:13:07] thcipriani: yeah, the machine seems to be down, or at least the redis service there is.
[17:13:17] thcipriani: i can't connect with redis-cli (details on phab ticket)
[17:14:05] thcipriani: i seem to be able to ssh into it, though.
[17:14:51] thcipriani: but `$ ps afx | fgrep redis` doesn't turn up anything. probably the redis server crashed?
[17:14:52] IIRC the previous problem was that there was an unsupported redis configuration variable for the beta version of redis, so redis wasn't started.
[17:15:05] yeah, probably something like that again
[17:16:02] Beta-Cluster-Infrastructure: Redis server used by OCG (deployment-redis01) is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1966119 (cscott)
[17:36:37] Release-Engineering-Team, DBA, WorkType-Maintenance, user-notice: Requests to globally reset a user's skin preferences - https://phabricator.wikimedia.org/T119206#1966188 (demon) Open>Resolved >>! In T119206#1938966, @Nemo_bis wrote: > I still see interface in Vector on some wiki when I'm logg...
[17:53:36] Release-Engineering-Team, DBA, MediaWiki-Configuration, operations, and 2 others: codfw is in read only according to mediawiki - https://phabricator.wikimedia.org/T124795#1966247 (Krenair) phab doesn't auto-add projects like that anymore
[18:14:56] Beta-Cluster-Infrastructure: Redis server used by OCG (deployment-redis01) is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1966348 (thcipriani) Open>Resolved a:thcipriani Not sure what changed, but it looks like this is now fixed? `/var/log/ocg/ocg.log` output: ``` Jan 26 18:10...
[18:17:02] https://gerrit.wikimedia.org/r/#/c/265513/ < anyone able to merge?
[18:23:20] Continuous-Integration-Infrastructure: CI trusty slaves do not have php5-apcu installed - https://phabricator.wikimedia.org/T124800#1966414 (Legoktm) NEW
[18:25:04] Continuous-Integration-Infrastructure: PHP5.5 tests say tidy is not installed, even though it appears to be installed - https://phabricator.wikimedia.org/T124801#1966430 (Legoktm) NEW
[18:25:33] Beta-Cluster-Infrastructure: Redis server used by OCG (deployment-redis01) is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1966437 (cscott) Yeah, seems to work now. Thanks!
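The redis thread above separates two cases: the host is reachable (ssh works) but nothing is listening on port 6379, so the client gets "Connection refused" instead of a timeout. A minimal Python sketch of that check follows. The helper name `probe_tcp` is hypothetical and nothing here is taken from the log beyond the host and port quoted in it; it only tests the TCP port and says nothing about why the service is not running.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt (hypothetical helper, not from the log).

    Returns "open" if something accepts the connection, "refused" if the host
    answers but nothing listens on the port, "timeout" if no answer arrives in
    time, and "error: ..." for anything else (e.g. a name that does not resolve).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example call, using the host and port quoted in the log:
# probe_tcp("deployment-redis01.eqiad.wmflabs", 6379)
```

A "refused" result while ssh still works would point at the service rather than the machine, which matches the later finding that redis had not started.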
[18:31:27] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce build #575: FAILURE in 1 min 26 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-firefox-sauce/575/
[18:53:45] Project browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #577: FAILURE in 45 sec: https://integration.wikimedia.org/ci/job/browsertests-CentralNotice-en.wikipedia.beta.wmflabs.org-linux-chrome-sauce/577/
[19:06:07] hey yall, anyone know of possible weird redirects from meta to wikimediafoundation.org?
[19:06:16] also office
[19:07:24] https://office.wikimedia.org/wiki/Travel/Wikimania_2016 takes me to wikimediafoundation.org, so does https://meta.wikimedia.org/wiki/Schema:Analytics
[19:07:29] ottomata: *.wikimedia.org is broken right now.
[19:07:45] ottomata: See -operations.
[19:07:48] James_F: ah
[19:07:49] thanks
[19:20:36] thanks
[19:20:56] is there a canonical list of what makes up group[0-2]? bonus points for a link that doesn't redirect to wikimediafoundation.org. :)
[19:21:39] urandom: https://noc.wikimedia.org/conf/highlight.php?file=group0.dblist etc.
[19:21:51] https://noc.wikimedia.org/conf/highlight.php?file=group1.dblist heh
[19:21:58] legoktm: wondered about that, but i don't see a group2 there
[19:22:14] group2 is wikipedia.dblist - group1-wikipedia.dblist
[19:22:52] oh, i see...
[19:22:53] or all.dblist - group0.dblist - group1.dblist ;)
[19:23:06] legoktm: thanks!
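The second definition of group2 given above (all.dblist - group0.dblist - group1.dblist) is a plain set difference. A minimal Python sketch, assuming each .dblist is a text file with one database name per line; the file format, the helper names, and the sample wiki names are all assumptions for illustration and are not taken from the log:

```python
def parse_dblist(text):
    """Parse dblist-style text into a set of names.

    Assumed format: one database name per line; blank lines and lines
    starting with '#' are skipped.
    """
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line)
    return names


def group2(all_dbs, group0, group1):
    """group2 = all - group0 - group1, as described in the chat."""
    return all_dbs - group0 - group1


# Hypothetical sample contents, for illustration only.
all_dbs = parse_dblist("testwiki\nmediawikiwiki\nenwiktionary\nenwiki\ndewiki\n")
g0 = parse_dblist("testwiki\nmediawikiwiki\n")
g1 = parse_dblist("enwiktionary\n")

print(sorted(group2(all_dbs, g0, g1)))
```

With real data the three inputs would be the contents of the files served from the noc.wikimedia.org links above.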
[19:23:30] np
[19:29:42] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:34:31] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 31731 bytes in 0.694 second response time
[19:57:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:57:19] PROBLEM - App Server Main HTTP Response on deployment-mediawiki03 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:57:19] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:00:00] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39534 bytes in 2.047 second response time
[20:11:50] Release-Engineering-Epics, Gather, MobileFrontend, Reading-Web-Planning, and 2 others: [EPIC] Create a formal release process for MobileFrontend/Gather - https://phabricator.wikimedia.org/T100296#1966947 (Jdlrobson)
[20:14:18] Release-Engineering-Epics, Gather, MobileFrontend, Reading-Web-Planning, and 2 others: [EPIC] Create a formal release process for MobileFrontend/Gather - https://phabricator.wikimedia.org/T100296#1966977 (Jdlrobson)
[20:15:11] Release-Engineering-Epics, Gather, MobileFrontend, Reading-Web-Planning, and 2 others: [EPIC] Create a formal release process for MobileFrontend/Gather - https://phabricator.wikimedia.org/T100296#1309879 (Jdlrobson)
[20:17:52] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:17:52] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:18:50] Release-Engineering-Epics, Gather, MobileFrontend, Reading-Web-Planning, and 2 others: [EPIC] Create a formal release process for MobileFrontend/Gather - https://phabricator.wikimedia.org/T100296#1309879 (Jdlrobson) I suggest we rewrite this task to come up with a decision about how to/if at all d...
[20:22:07] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 39874 bytes in 0.694 second response time
[20:22:27] RECOVERY - App Server Main HTTP Response on deployment-mediawiki02 is OK: HTTP OK: HTTP/1.1 200 OK - 39533 bytes in 0.793 second response time
[20:28:32] PROBLEM - App Server Main HTTP Response on deployment-mediawiki01 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:28:32] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:28:34] PROBLEM - App Server Main HTTP Response on deployment-mediawiki02 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[20:43:40] (PS2) Hashar: Add rake as a dependency [selenium] - https://gerrit.wikimedia.org/r/266205
[20:43:40] (CR) Hashar: "Rebased :-)" [selenium] - https://gerrit.wikimedia.org/r/266205 (owner: Hashar)
[20:45:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki01 is OK: HTTP OK: HTTP/1.1 200 OK - 39558 bytes in 1.531 second response time
[20:49:08] Beta-Cluster-Infrastructure: Redis server used by OCG (deployment-redis01) is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1967206 (hashar) deployment-redis01 is running Trusty which ship a version not supporting latency monitoring. Ends up not starting properly :( See T124677
[20:49:21] Beta-Cluster-Infrastructure: Redis server used by OCG (deployment-redis01) is DOWN in beta cluster - https://phabricator.wikimedia.org/T124791#1967209 (hashar)
[20:49:23] Beta-Cluster-Infrastructure: Redis latency monitor config change broke Redis in Beta Cluster - https://phabricator.wikimedia.org/T124677#1962304 (hashar)
[20:53:13] Beta-Cluster-Infrastructure, Staging: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967240 (Krenair)
[20:53:15] Beta-Cluster-Infrastructure, Patch-For-Review, Puppet, Tracking: Remove all ::beta roles in puppet - https://phabricator.wikimedia.org/T86644#1967239 (Krenair)
[20:55:06] Continuous-Integration-Infrastructure: CI trusty slaves do not have php5-apcu installed - https://phabricator.wikimedia.org/T124800#1967250 (hashar) We removed apc a while ago because its opcode cache ends up being confused when the same file name change too often/too fast. Ie when a build reuse a workspace...
[20:56:56] Release-Engineering-Team, Browser-Tests-Infrastructure, Reading-Web, Patch-For-Review: MW-Selenium associates wrong SauceLabs job with Jenkins artifact - https://phabricator.wikimedia.org/T105589#1967253 (hashar) We have agreed to release a new version of mediawiki-selenium and bump the version req...
[20:59:08] Continuous-Integration-Infrastructure, operations, Patch-For-Review, WorkType-NewFunctionality: Phase out operations-puppet-pep8 Jenkins job and tools/puppet_pep8.py - https://phabricator.wikimedia.org/T114887#1967267 (hashar) The job `operations-puppet-tox-pep8-jessie` is always triggered. It will...
[21:17:24] legoktm: Could i ask a question and have some help on it.
[21:18:08] legoktm: How can i install any required extensions under the vendor folder since it seems that it isn't detecting it as installed under an extensions folder.
[21:26:11] Continuous-Integration-Config, MediaWiki-extensions-OpenStackManager: ApiDocumentationTest failure: Undefined property: AuthPlugin::$boundAs - https://phabricator.wikimedia.org/T124613#1967389 (hashar) Do not make it connect to LDAP, notably the Nodepool disposable instances do not connect to LDAP. Havin...
[22:15:49] (PS1) Paladox: [Vector] Add extension-qunit-generic test [integration/config] - https://gerrit.wikimedia.org/r/266604
[22:16:49] (PS1) Paladox: [Metrolook] Add extension-qunit-generic test [integration/config] - https://gerrit.wikimedia.org/r/266605
[22:21:08] (PS1) Paladox: [SemanticSifter] Replace jslint test with jshint and jsonlint [integration/config] - https://gerrit.wikimedia.org/r/266607
[22:37:54] Continuous-Integration-Infrastructure, incident-20160126-WikimediaDomainRedirection: Write and implement tests for Wikimedia's Apache configuration (redirects.conf, etc.) - https://phabricator.wikimedia.org/T45266#1967795 (greg)
[22:38:29] Continuous-Integration-Config, incident-20160126-WikimediaDomainRedirection, operations, Regression: operations-apache-config-lint replacement doesn't check syntax - https://phabricator.wikimedia.org/T114801#1967805 (greg)
[22:38:40] Beta-Cluster-Infrastructure, Staging, incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967807 (greg)
[22:54:27] Continuous-Integration-Config, Fundraising Tech Backlog, Wikimedia-Fundraising-CiviCRM, Patch-For-Review: Optimize CiviCRM CI job - https://phabricator.wikimedia.org/T91903#1967908 (awight)
[22:58:22] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967920 (cscott) Out of curiosity, what would be the beta equivalent of `wikimediafoundation.org` (which was involved in the 2016-01-26 incident). I...
[23:04:35] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967954 (Reedy) >>! In T1256#1967920, @cscott wrote: > Out of curiosity, what would be the beta equivalent of `wikimediafoundation.org` (which was in...
[23:05:58] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967961 (Krenair) There isn't one (currently). One of the issues involved in this is that beta doesn't have all the sites that production has (e.g. c...
[23:08:56] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967974 (cscott) @Reedy So it seems that an equivalent to wikimediafoundation.org doesn't (yet) exist in labs? In the [incident report](https://wiki...
[23:11:08] Yippee, build fixed!
[23:11:09] Project browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce build #398: FIXED in 14 min: https://integration.wikimedia.org/ci/job/browsertests-Gather-en.m.wikipedia.beta.wmflabs.org-linux-chrome-sauce/398/
[23:11:09] (CR) Awight: "Thanks!" [integration/config] - https://gerrit.wikimedia.org/r/266447 (https://phabricator.wikimedia.org/T91903) (owner: Awight)
[23:11:53] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1967989 (Krenair) I'm not sure how wikimediafoundation.org is relevant here? It's a wiki, not a portal
[23:16:37] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1968010 (cscott) The symptom of the problem were redirects to wikimediafoundation.org. Ideally (eventually) the beta configuration would match the p...
[23:17:51] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1968013 (greg) Step 1 would have been for the puppet swat procedure to include testing of changes in Beta.... but I digress.
[23:18:49] Beta-Cluster-Infrastructure, Staging, Incident-20160126-WikimediaDomainRedirection: Rework beta apache config - https://phabricator.wikimedia.org/T1256#1968018 (Krenair) >>! In T1256#1968013, @greg wrote: > Step 1 would have been for the puppet swat procedure to include testing of changes in Beta.......
[23:21:01] Krenair: yeah, touche, I was just thinking about the general issue that ops don't test things in beta very much (some are, which is awesome!)
[23:22:54] I was trying to slowly make the structure of beta vs. structure of prod config the same
[23:23:15] so that they could just be merged into a single common template that gives the correct domains for the site (.org or .beta.wmflabs.org)
[23:24:19] hey, is EventLogging in beta stored in mysql? if so, on which host?
[23:28:16] yes MaxSem
[23:28:24] there are docs somewhere on wikitech
[23:28:48] https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/TestingOnBetaCluster
[23:29:01] https://wikitech.wikimedia.org/wiki/Analytics/EventLogging/TestingOnBetaCluster#Database
[23:32:02] root@deployment-eventlogging03:/home/maxsem# ps aux|grep mysqld
[23:32:03] root 16864 0.0 0.0 10584 940 pts/7 S+ 23:31 0:00 grep --color=auto mysqld
[23:32:03] root@deployment-eventlogging03:/home/maxsem#
[23:33:36] MaxSem: might want to ask in -analytics, as well
[23:34:17] I seem to recall mysqld wasn't installed by puppet or something
[23:34:20] when this got set up
[23:34:24] ottomata would know
[23:34:30] it was
[23:34:46] I started the daemon by hand and it works now
[23:34:57] obviously, no new events (:
[23:38:44] mysql is installed by puppet, but not started