[00:27:22] 10Release-Engineering-Team (Kanban), 10Wikimedia-Site-requests: Close chairwiki - https://phabricator.wikimedia.org/T184961#3901664 (10demon) p:05Triage>03Normal [00:31:40] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Zuul: Update zuul to upstream master - https://phabricator.wikimedia.org/T158243#3901680 (10TerraCodes) >>! In T158243#3900947, @Paladox wrote: > Prods now using a zuul version that supports gerrit 2.14+ :). Then can this task be closed? [00:32:47] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Zuul: Update zuul to upstream master - https://phabricator.wikimedia.org/T158243#3901681 (10Paladox) We only updated to Zuul 2.5.1, but there’s 2.5.2 and 2.6.0. But this shouldn’t block gerrit’s 2.14 update now :) [00:34:47] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Zuul: Update zuul to upstream master - https://phabricator.wikimedia.org/T158243#3901683 (10TerraCodes) [00:34:50] 10Gerrit, 10Release-Engineering-Team (Someday), 10Patch-For-Review: Update gerrit to 2.14.7 - https://phabricator.wikimedia.org/T156120#3901682 (10TerraCodes) [00:35:04] 10Gerrit, 10Release-Engineering-Team (Someday), 10Patch-For-Review: Update gerrit to 2.14.7 - https://phabricator.wikimedia.org/T156120#3308995 (10TerraCodes) Per T158243#3901681 [01:58:22] 10Continuous-Integration-Config, 10Patch-For-Review, 10Technical-Debt, 10Zuul: test-requirements.txt in ci-config still points to precise deb - https://phabricator.wikimedia.org/T162191#3901820 (10Paladox) 05Open>03Resolved a:03Paladox The change was merged :) [01:58:31] 10Continuous-Integration-Config, 10Technical-Debt, 10Zuul: test-requirements.txt in ci-config still points to precise deb - https://phabricator.wikimedia.org/T162191#3901824 (10Paladox) [02:37:35] (03PS1) 10Phantom42: Add WikiAdmin dependency for BlueSpiceInterWikiLinks extension [integration/config] - 10https://gerrit.wikimedia.org/r/404401 (https://phabricator.wikimedia.org/T175794) 
[02:43:40] Project beta-scap-eqiad build #191005: 04FAILURE in 0.39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191005/ [02:56:30] Yippee, build fixed! [02:56:30] Project beta-scap-eqiad build #191006: 09FIXED in 2 min 50 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191006/ [02:57:40] 10MediaWiki-Releasing, 10Analytics: Create dashboard showing MediaWiki tarball download statistics - https://phabricator.wikimedia.org/T119772#3901867 (10Nuria) My advice , rather than using hadoop for this would be to instrument with piwik, releases.wikimedia.org. Combing terabytes of data for this few reques... [05:56:40] 10MediaWiki-Releasing, 10Analytics: Create dashboard showing MediaWiki tarball download statistics - https://phabricator.wikimedia.org/T119772#3901946 (10Legoktm) People don't visit releases.wikimedia.org directly - they click the direct tarball link from mediawiki.org. I don't think piwik will work for that. [06:30:37] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 1343 bytes in 0.014 second response time [06:35:44] RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 46843 bytes in 5.920 second response time [06:53:44] Project beta-scap-eqiad build #191031: 04FAILURE in 5.5 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191031/ [07:06:49] Yippee, build fixed! 
[07:06:50] Project beta-scap-eqiad build #191032: 09FIXED in 3 min 9 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191032/ [08:32:18] (03PS1) 10Hashar: Migrate mediawiki/vagrant to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404416 [08:34:20] (03CR) 10jerkins-bot: [V: 04-1] Migrate mediawiki/vagrant to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404416 (owner: 10Hashar) [08:40:35] (03PS2) 10Hashar: Migrate mediawiki/vagrant to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404416 [08:42:11] (03CR) 10Hashar: [C: 032] Migrate mediawiki/vagrant to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404416 (owner: 10Hashar) [08:43:19] (03Merged) 10jenkins-bot: Migrate mediawiki/vagrant to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404416 (owner: 10Hashar) [09:03:51] (03PS1) 10Hashar: Migrate vagrant doc publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404421 [09:13:13] (03PS1) 10Hashar: docker: pass args to 'rake' [integration/config] - 10https://gerrit.wikimedia.org/r/404422 [09:13:49] (03PS2) 10Hashar: Migrate vagrant doc publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404421 [09:24:33] (03CR) 10Hashar: [C: 032] docker: pass args to 'rake' [integration/config] - 10https://gerrit.wikimedia.org/r/404422 (owner: 10Hashar) [09:25:54] (03Merged) 10jenkins-bot: docker: pass args to 'rake' [integration/config] - 10https://gerrit.wikimedia.org/r/404422 (owner: 10Hashar) [09:27:28] !log deploy rake/rake-vagrant docker images | https://gerrit.wikimedia.org/r/#/c/404422/ [09:27:34] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:38:11] !log Deleting all legacy wmfreleng/ docker images from the Jenkins slaves : sudo cumin --force 'name:docker' "docker images|grep wmfreleng|awk '{print \$3}'|sort|uniq|xargs docker rmi --force" [09:38:15] Logged the message at 
https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [09:51:29] 10Release-Engineering-Team (Kanban), 10MediaWiki-extensions-ORES, 10ORES, 10Scoring-platform-team, 10User-zeljkofilipin: Special:RecentChanges broken on Jenkins slaves - https://phabricator.wikimedia.org/T184938#3902160 (10zeljkofilipin) p:05Triage>03Low a:03zeljkofilipin [09:59:31] PROBLEM - Free space - all mounts on deployment-kafka01 is CRITICAL: CRITICAL: deployment-prep.deployment-kafka01.diskspace.root.byte_percentfree (<22.22%) [10:01:20] hashar: zeljkof : Do you think you can get this merged? https://gerrit.wikimedia.org/r/#/c/403904/ [10:01:36] It's integration config [10:02:00] Also this, with lower priority: https://phabricator.wikimedia.org/T166672 [10:02:21] Amir1: in a meeting, will take a look soon [10:02:27] (03PS3) 10Hashar: Migrate vagrant doc publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404421 [10:02:28] Thanks [10:03:20] (03CR) 10Hashar: [C: 032] Migrate vagrant doc publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404421 (owner: 10Hashar) [10:05:42] (03Merged) 10jenkins-bot: Migrate vagrant doc publish job to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404421 (owner: 10Hashar) [10:09:03] 10Release-Engineering-Team (Kanban), 10Mediawiki-extensions-PropertySuggester, 10Repository-Admins, 10Wikidata: Move PropertySuggester-Python to gerrit - https://phabricator.wikimedia.org/T166672#3902208 (10hashar) a:03hashar [10:09:11] Amir1: yeah the gerrit/github repo migration I can handle it. In a few [10:09:54] Thank you very much [10:10:33] Amir1: for https://phabricator.wikimedia.org/T166672 do you have a Gerrit repository name to suggest? :) [10:10:58] hmm, let me check the current repos [10:11:26] we have wikibase/javascript-api [10:11:40] so I think wikibase/property-suggester-python would be fine [10:11:44] or wikidata/propertysuggested ? 
:D [10:11:52] ahh yeah wikibase bah [10:12:20] I don't think we need the -python suffix [10:12:24] seems wikibase/property-suggester would be simpler [10:12:44] hashar: the problem is that we have a property suggester extension [10:12:51] with the exact same name [10:13:13] also wikidata prefix seems fine to me too (whatever you prefer) [10:29:07] Amir1: back sorry. It is up to you really :) Just pick a name and edit https://phabricator.wikimedia.org/T166672 and i will do the mirroring :] [10:31:40] (03PS1) 10Hashar: Remove mediawiki-vagrant-rake-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/404432 [10:31:51] (03CR) 10Hashar: [C: 032] Remove mediawiki-vagrant-rake-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/404432 (owner: 10Hashar) [10:34:08] (03Merged) 10jenkins-bot: Remove mediawiki-vagrant-rake-jessie [integration/config] - 10https://gerrit.wikimedia.org/r/404432 (owner: 10Hashar) [10:55:15] 10Release-Engineering-Team (Kanban), 10Discovery, 10Discovery-Search (Current work), 10MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), and 2 others: Create selenium-CirrusSearch-jessie daily Jenkins job - https://phabricator.wikimedia.org/T175179#3902290 (10zeljkofilipin) [11:06:50] (03PS3) 10Zfilipin: Create selenium-CirrusSearch-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/398030 (https://phabricator.wikimedia.org/T175179) [11:14:53] 10Release-Engineering-Team (Kanban), 10Discovery, 10Discovery-Search (Current work), 10MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), and 2 others: Create selenium-CirrusSearch-jessie daily Jenkins job - https://phabricator.wikimedia.org/T175179#3902332 (10zeljkofilipin) I have created a te... 
[11:19:31] (03PS4) 10Zfilipin: Create selenium-CirrusSearch-jessie Jenkins job [integration/config] - 10https://gerrit.wikimedia.org/r/398030 (https://phabricator.wikimedia.org/T175179) [11:39:40] dcausse: the job is running https://integration.wikimedia.org/ci/view/Selenium/job/selenium-CirrusSearch-jessie-381785/3/console [11:39:52] try if you can edit and/or run it [11:50:51] zeljkof: thanks! I'll have a look later this week [12:04:49] 10Release-Engineering-Team (Kanban), 10Wiki-Setup (Close): Close chairwiki - https://phabricator.wikimedia.org/T184961#3902407 (10Aklapper) [12:06:09] (03PS1) 10Dalba: Increase pywikibot-tox-docker timeout to 10 min [integration/config] - 10https://gerrit.wikimedia.org/r/404436 [13:12:28] (03CR) 10Hashar: [C: 032] Increase pywikibot-tox-docker timeout to 10 min [integration/config] - 10https://gerrit.wikimedia.org/r/404436 (owner: 10Dalba) [13:13:51] (03Merged) 10jenkins-bot: Increase pywikibot-tox-docker timeout to 10 min [integration/config] - 10https://gerrit.wikimedia.org/r/404436 (owner: 10Dalba) [13:19:13] (03PS1) 10Hashar: Migrate yard-publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404440 [13:21:05] (03CR) 10jenkins-bot: Update RuboCop Ruby gem [ruby/api] - 10https://gerrit.wikimedia.org/r/395511 (https://phabricator.wikimedia.org/T180878) (owner: 10Zfilipin) [13:22:13] (03PS2) 10Hashar: Migrate yard-publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404440 [13:23:22] (03CR) 10jenkins-bot: Update RuboCop Ruby gem [ruby/api] - 10https://gerrit.wikimedia.org/r/395511 (https://phabricator.wikimedia.org/T180878) (owner: 10Zfilipin) [13:23:47] (03CR) 10Hashar: "^^ that is me testing the migration to docker" [ruby/api] - 10https://gerrit.wikimedia.org/r/395511 (https://phabricator.wikimedia.org/T180878) (owner: 10Zfilipin) [13:26:58] (03CR) 10jenkins-bot: Update RuboCop Ruby gem [selenium] - 10https://gerrit.wikimedia.org/r/395513 
(https://phabricator.wikimedia.org/T180878) (owner: 10Zfilipin) [13:29:02] (03CR) 10Hashar: [C: 032] Migrate yard-publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404440 (owner: 10Hashar) [13:30:22] (03Merged) 10jenkins-bot: Migrate yard-publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404440 (owner: 10Hashar) [14:22:45] (03PS1) 10Hashar: Migrate tox publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404448 [14:29:48] (03CR) 10Hashar: [C: 032] Migrate tox publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404448 (owner: 10Hashar) [14:30:16] (03PS2) 10Hashar: Migrate tox publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404448 [14:30:32] (03CR) 10Hashar: [C: 032] Migrate tox publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404448 (owner: 10Hashar) [14:33:07] (03Merged) 10jenkins-bot: Migrate tox publish jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404448 (owner: 10Hashar) [14:53:16] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76) [15:22:26] (03PS1) 10Hashar: Migrate most Doxygen jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404470 [15:22:40] (03CR) 10Hashar: [C: 032] Migrate most Doxygen jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404470 (owner: 10Hashar) [15:24:08] (03Merged) 10jenkins-bot: Migrate most Doxygen jobs to Docker [integration/config] - 10https://gerrit.wikimedia.org/r/404470 (owner: 10Hashar) [15:45:43] (03PS1) 10Hashar: Fix doxygen container not cloning repo [integration/config] - 10https://gerrit.wikimedia.org/r/404475 [15:46:22] (03CR) 10Hashar: [C: 032] Fix doxygen container not cloning repo [integration/config] - 10https://gerrit.wikimedia.org/r/404475 (owner: 10Hashar) [15:48:08] (03Merged) 10jenkins-bot: Fix doxygen container not cloning repo 
[integration/config] - 10https://gerrit.wikimedia.org/r/404475 (owner: 10Hashar) [15:52:01] 10Release-Engineering-Team (Kanban), 10Mediawiki-extensions-PropertySuggester, 10Repository-Admins, 10Wikidata: Move PropertySuggester-Python to gerrit - https://phabricator.wikimedia.org/T166672#3903017 (10Ladsgroup) @hashar: Hey, sorry for the hassle, I talked to the PM and we decided to go with `wikibas... [15:53:07] 10Release-Engineering-Team, 10Developer-Relations, 10Discourse, 10Operations: Enable GitHub login in discourse-mediawiki.wmflabs.org - https://phabricator.wikimedia.org/T184986#3903020 (10Qgil) Thanks! It seems that #release-engineering-team has access to Wikimedia's GitHub account. Adding them to the loop. [15:58:36] Project beta-scap-eqiad build #191093: 04FAILURE in 1 min 39 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191093/ [15:59:58] PROBLEM - Host deployment-etcd-01 is DOWN: CRITICAL - Host Unreachable (10.68.19.227) [16:01:05] PROBLEM - Host deployment-ms-be03 is DOWN: CRITICAL - Host Unreachable (10.68.22.125) [16:03:03] Getting some HTTP 500s from beta,e.g. https://en.wikipedia.beta.wmflabs.org/wiki/Special:Preferences [16:05:17] Project beta-scap-eqiad build #191094: 04STILL FAILING in 1 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191094/ [16:06:03] Krinkle: I know mass reboots are happening for the Meltdown kernel updates atm [16:06:09] but I don't actually know for sure it's related [16:07:33] (03PS1) 10Hashar: Add PHP to doxygen image [integration/config] - 10https://gerrit.wikimedia.org/r/404483 [16:14:20] RECOVERY - Host deployment-ms-be03 is UP: PING OK - Packet loss = 0%, RTA = 200.61 ms [16:14:42] (03PS1) 10Hashar: Fix path in oojs-ui-doxygen-publish [integration/config] - 10https://gerrit.wikimedia.org/r/404485 [16:14:42] RECOVERY - Host deployment-etcd-01 is UP: PING OK - Packet loss = 0%, RTA = 5.82 ms [16:16:47] Yippee, build fixed! 
[16:16:47] Project beta-scap-eqiad build #191095: 09FIXED in 3 min 7 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191095/ [16:22:23] PROBLEM - Puppet errors on deployment-etcd-01 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [16:27:22] RECOVERY - Puppet errors on deployment-etcd-01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:05:28] addshore: I'm going to go ahead and promote wmf.16 to group1, FYI (if you have lingering concerns about T184749 ) [17:05:28] T184749: Every edit (including rollback) distorts non-ASCII text - https://phabricator.wikimedia.org/T184749 [17:14:21] PROBLEM - Host deployment-ms-fe02 is DOWN: CRITICAL - Host Unreachable (10.68.19.247) [17:14:30] PROBLEM - Host integration-slave-docker-1003 is DOWN: CRITICAL - Host Unreachable (10.68.23.158) [17:15:30] PROBLEM - Host deployment-elastic06 is DOWN: CRITICAL - Host Unreachable (10.68.23.242) [17:21:23] RECOVERY - Host deployment-elastic06 is UP: PING OK - Packet loss = 0%, RTA = 1.97 ms [17:21:32] I'm doing a bunch more security reboots today so there are going to be a lot of those for the next few hours. 
[17:24:07] RECOVERY - Host integration-slave-docker-1003 is UP: PING OK - Packet loss = 0%, RTA = 2.02 ms [17:24:58] RECOVERY - Host deployment-ms-fe02 is UP: PING OK - Packet loss = 0%, RTA = 2.01 ms [17:27:15] PROBLEM - Puppet errors on deployment-elastic06 is CRITICAL: CRITICAL: 16.67% of data above the critical threshold [0.0] [17:27:55] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 12198 bytes in 0.168 second response time [17:28:37] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 12198 bytes in 0.159 second response time [17:28:47] PROBLEM - Puppet errors on deployment-snapshot01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:29:40] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 12.50% of data above the critical threshold [0.0] [17:30:18] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 12198 bytes in 0.165 second response time [17:30:42] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 12779 bytes in 0.218 second response time [17:31:44] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 12433 bytes in 
0.159 second response time [17:32:16] RECOVERY - Puppet errors on deployment-elastic06 is OK: OK: Less than 1.00% above the threshold [0.0] [17:32:33] thanks andrewbogott for keeping us updated :) [17:33:31] PROBLEM - Host deployment-redis06 is DOWN: CRITICAL - Host Unreachable (10.68.20.16) [17:34:06] PROBLEM - Host deployment-redis02 is DOWN: CRITICAL - Host Unreachable (10.68.16.231) [17:34:10] PROBLEM - Host deployment-zotero01 is DOWN: CRITICAL - Host Unreachable (10.68.17.102) [17:34:18] PROBLEM - Host integration-slave-docker-1007 is DOWN: CRITICAL - Host Unreachable (10.68.19.105) [17:34:40] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [17:34:42] PROBLEM - Host deployment-tmh01 is DOWN: CRITICAL - Host Unreachable (10.68.16.211) [17:36:46] Project beta-scap-eqiad build #191108: 04FAILURE in 3 min 2 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191108/ [17:36:49] PROBLEM - Host deployment-kafka04 is DOWN: CRITICAL - Host Unreachable (10.68.17.9) [17:36:51] PROBLEM - Host saucelabs-01 is DOWN: CRITICAL - Host Unreachable (10.68.21.186) [17:41:45] RECOVERY - Host deployment-redis06 is UP: PING OK - Packet loss = 0%, RTA = 2.01 ms [17:43:18] RECOVERY - Host deployment-redis02 is UP: PING OK - Packet loss = 0%, RTA = 1.19 ms [17:43:50] RECOVERY - Host deployment-kafka04 is UP: PING OK - Packet loss = 0%, RTA = 3.37 ms [17:44:38] RECOVERY - Host integration-slave-docker-1007 is UP: PING OK - Packet loss = 0%, RTA = 1.55 ms [17:44:44] RECOVERY - Host deployment-tmh01 is UP: PING OK - Packet loss = 0%, RTA = 4.75 ms [17:44:48] RECOVERY - Host saucelabs-01 is UP: PING OK - Packet loss = 0%, RTA = 8.27 ms [17:47:14] Yippee, build fixed! 
[17:47:14] Project beta-scap-eqiad build #191109: 09FIXED in 3 min 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191109/ [17:47:27] PROBLEM - Puppet errors on integration-slave-docker-1007 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0] [17:48:36] (I'm poking at shinken, so it's sometimes dropping off and coming up, jfyi) [17:50:33] PROBLEM - Host deployment-restbase01 is DOWN: CRITICAL - Host Unreachable (10.68.16.128) [17:50:41] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [17:51:16] PROBLEM - Puppet errors on saucelabs-01 is CRITICAL: CRITICAL: 14.29% of data above the critical threshold [0.0] [17:52:26] RECOVERY - Puppet errors on integration-slave-docker-1007 is OK: OK: Less than 1.00% above the threshold [0.0] [17:53:44] PROBLEM - Host deployment-aqs01 is DOWN: CRITICAL - Host Unreachable (10.68.18.237) [17:53:48] PROBLEM - Host castor02 is DOWN: CRITICAL - Host Unreachable (10.68.20.186) [17:53:55] PROBLEM - Host deployment-memc04 is DOWN: CRITICAL - Host Unreachable (10.68.23.25) [17:54:01] PROBLEM - Host integration-r-lang-01 is DOWN: CRITICAL - Host Unreachable (10.68.20.232) [17:54:52] PROBLEM - Host deployment-zookeeper02 is DOWN: CRITICAL - Host Unreachable (10.68.18.75) [17:56:13] RECOVERY - Puppet errors on saucelabs-01 is OK: OK: Less than 1.00% above the threshold [0.0] [17:56:45] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 35931 bytes in 1.140 second response time [17:57:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 46799 bytes in 4.571 second response time [17:58:36] RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 46843 bytes in 1.152 second response time [18:00:19] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 
46849 bytes in 1.600 second response time [18:00:33] RECOVERY - Host deployment-restbase01 is UP: PING OK - Packet loss = 0%, RTA = 44.08 ms [18:00:41] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47434 bytes in 0.724 second response time [18:00:57] RECOVERY - Host deployment-aqs01 is UP: PING OK - Packet loss = 0%, RTA = 4.15 ms [18:01:07] RECOVERY - Host deployment-zookeeper02 is UP: PING OK - Packet loss = 0%, RTA = 4.00 ms [18:01:24] RECOVERY - Host integration-r-lang-01 is UP: PING OK - Packet loss = 0%, RTA = 1.79 ms [18:02:30] RECOVERY - Host deployment-memc04 is UP: PING OK - Packet loss = 0%, RTA = 8.18 ms [18:02:37] RECOVERY - Host castor02 is UP: PING OK - Packet loss = 0%, RTA = 7.99 ms [18:08:51] PROBLEM - Puppet errors on castor02 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [0.0] [18:11:50] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:12:03] PROBLEM - Host deployment-imagescaler01 is DOWN: CRITICAL - Host Unreachable (10.68.19.158) [18:12:52] PROBLEM - Host Generic Beta Cluster is DOWN: CRITICAL - Host Unreachable (en.wikipedia.beta.wmflabs.org) [18:13:22] PROBLEM - Host deployment-cache-text04 is DOWN: CRITICAL - Host Unreachable (10.68.18.103) [18:13:52] RECOVERY - Puppet errors on castor02 is OK: OK: Less than 1.00% above the threshold [0.0] [18:14:06] PROBLEM - Host deployment-restbase02 is DOWN: CRITICAL - Host Unreachable (10.68.17.189) [18:15:15] PROBLEM - Host deployment-ms-be04 is DOWN: CRITICAL - Host Unreachable (10.68.16.139) [18:15:17] PROBLEM - Host deployment-aqs02 is DOWN: CRITICAL - Host Unreachable (10.68.17.90) [18:15:26] PROBLEM - Host deployment-sentry01 is DOWN: CRITICAL - Host Unreachable (10.68.19.148) [18:15:36] PROBLEM - Host deployment-fluorine02 is DOWN: CRITICAL - Host Unreachable (10.68.23.106) [18:15:50] PROBLEM - Host deployment-videoscaler01 is DOWN: CRITICAL - Host Unreachable 
(10.68.19.130) [18:16:14] !log muted shinken in -releng while the reboots are on-going [18:16:19] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [18:17:02] Project beta-scap-eqiad build #191112: 04FAILURE in 3 min 15 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191112/ [18:17:38] ^ probably due to host failures [18:27:05] Yippee, build fixed! [18:27:06] Project beta-scap-eqiad build #191113: 09FIXED in 3 min 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191113/ [19:22:35] Amir1: why did you abandon https://gerrit.wikimedia.org/r/#/c/403930/ ? [19:23:35] thcipriani: sorry, abandoned wrong patch [19:23:39] facepalm [19:23:58] no worries :) [19:24:04] wanted to abandon this: https://gerrit.wikimedia.org/r/#/c/404491/ [19:32:34] 10Beta-Cluster-Infrastructure, 10Commons: Unable to login commons wmflabs - https://phabricator.wikimedia.org/T185028#3903794 (10Framawiki) [19:36:13] wb wikibugs [19:39:09] 10Release-Engineering-Team (Next), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3903847 (10Ottomata) p:05Triage>03Normal Just curious, why not use git fat? We have a git-fat store available already, and it can be used by... [19:39:30] 10Continuous-Integration-Infrastructure, 10Operations, 10HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3903850 (10Ottomata) p:05Triage>03Normal [19:39:54] 10Release-Engineering-Team, 10DBA, 10Operations, 10cloud-services-team: Move some wikis to s5 - https://phabricator.wikimedia.org/T184805#3903854 (10Ottomata) p:05Triage>03Normal [19:41:21] addshore: finally got https://gerrit.wikimedia.org/r/#/c/403930/ merged. Will deploy now and after that roll forward wmf.16. 
[19:41:35] Did something happen to CI selenium username/password config recently [19:41:46] 10Beta-Cluster-Infrastructure: Beta cluster login broken - https://phabricator.wikimedia.org/T185028#3903875 (10Krenair) [19:41:57] All browser tests requiring login on https://integration.wikimedia.org/ci/view/Selenium/job/selenium-MinervaNeue/282/ are suddenly breaking [19:41:59] 10Beta-Cluster-Infrastructure: Beta cluster login broken - https://phabricator.wikimedia.org/T185028#3903765 (10Krenair) Reproduced bug on enwiki [19:42:06] jdlrobson, yep, there's a task open for it [19:42:16] Krenair: can you link me? [19:42:19] (and thanks!) [19:42:21] the one I just touched [19:42:29] ahh perfect thank you!! [19:42:38] I do notice this topic: [19:42:39] * greg-g has changed the topic to: Wikimedia Release Engineering (team members +voiced) | Status: Beta Cluster not well, WMCS reboots in progress [19:42:50] 10Beta-Cluster-Infrastructure: Beta cluster login broken - https://phabricator.wikimedia.org/T185028#3903887 (10greg) 05Open>03Invalid Yes, today is a bad day for the Beta Cluster, the underlying hosts of the VMs are being restarted on a staggered basis to apply linux kernel security patches. Please try agai... [19:42:50] wonder if perhaps the memc/redis instance is down for reboot [19:42:58] that might cause such errors [19:43:00] 10Beta-Cluster-Infrastructure, 10MinervaNeue, 10Readers-Web-Backlog: Beta cluster login broken - https://phabricator.wikimedia.org/T185028#3903895 (10Jdlrobson) FYI this is why the Minerva browser tests are failing right now. [19:43:12] as it'd break session handling I would think [19:43:21] Krenair: wait, it's a prod issue? 
[19:43:25] no [19:43:27] sorry, I missed the convo here [19:43:44] I haven't tested prod [19:43:59] I was confused by "enwiki" [19:44:04] I assume we'd have heard a lot by now if prod login was broken [19:44:12] yeah, no kidding :) [19:44:13] ah [19:44:32] yeah I was referring to en.wikipedia.beta.wmflabs.org which has the same DB name as en.wikipedia.org :/ [19:44:44] right right [19:50:50] good i wasn't doing any work over there today [19:53:32] 10Release-Engineering-Team (Kanban), 10User-greg: Basic plan of post-Hackathon team offsite - https://phabricator.wikimedia.org/T184550#3887485 (10greg) p:05High>03Normal Status: submitted group travel request and let eng-admin@ know, awaiting responses (they're all a bit busy right now) [19:53:53] 10Release-Engineering-Team (Kanban), 10User-greg: Basic plan of pre-Hackathon team offsite - https://phabricator.wikimedia.org/T184550#3903957 (10greg) [19:54:27] (03CR) 10Chad: [C: 032] make-wmf-branch: Clone submodules with --depth=1 [tools/release] - 10https://gerrit.wikimedia.org/r/404529 (owner: 10Chad) [19:55:01] (03Merged) 10jenkins-bot: make-wmf-branch: Clone submodules with --depth=1 [tools/release] - 10https://gerrit.wikimedia.org/r/404529 (owner: 10Chad) [20:00:15] 10Release-Engineering-Team (Next), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3903991 (10demon) >>! In T181855#3903847, @Ottomata wrote: > Just curious, why not use git fat? We have a git-fat store available already, and it... 
[20:02:42] (03PS1) 10Chad: make-wmf-branch: Add force to special submodules [tools/release] - 10https://gerrit.wikimedia.org/r/404532 [20:02:44] (03CR) 10Chad: [C: 032] make-wmf-branch: Add force to special submodules [tools/release] - 10https://gerrit.wikimedia.org/r/404532 (owner: 10Chad) [20:02:47] 10Release-Engineering-Team (Next), 10Scap, 10ORES, 10Operations, 10Scoring-platform-team: scap support for git-lfs - https://phabricator.wikimedia.org/T181855#3904000 (10Ottomata) K cool, sounds good :) [20:03:14] (03CR) 10jerkins-bot: [V: 04-1] make-wmf-branch: Add force to special submodules [tools/release] - 10https://gerrit.wikimedia.org/r/404532 (owner: 10Chad) [20:03:25] (03Merged) 10jenkins-bot: make-wmf-branch: Add force to special submodules [tools/release] - 10https://gerrit.wikimedia.org/r/404532 (owner: 10Chad) [20:10:00] 10Scap, 10Operations: Install git-lfs client (at least on scap targets & masters) - https://phabricator.wikimedia.org/T180628#3904040 (10Ottomata) p:05Triage>03Normal [20:10:23] 10Continuous-Integration-Config, 10Operations, 10Patch-For-Review: Add CI to all operations/* repositories and archive obsolete ones - https://phabricator.wikimedia.org/T180330#3904041 (10Ottomata) p:05Triage>03Normal [20:10:44] 10Release-Engineering-Team (Watching / External), 10Operations, 10User-Joe: [DRAFT][RfC] Deployment of python applications in production - https://phabricator.wikimedia.org/T180023#3904044 (10Ottomata) p:05Triage>03Normal [20:13:39] thcipriani: hey, do you think you can take a look at this? https://gerrit.wikimedia.org/r/#/c/403904/ [20:14:25] Amir1: seems fine to me, you don't need BetaFeatures to be checked out as part of extension tests? [20:14:51] thcipriani: we removed any link to that extension one week ago [20:14:55] prod and tests [20:15:00] ah, k. 
[20:15:14] (03CR) 10Thcipriani: [C: 032] Remove BetaFeatures from ORES extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/403904 (https://phabricator.wikimedia.org/T184554) (owner: 10Ladsgroup) [20:15:28] will deploy after it merges. [20:16:08] Thank you very much [20:16:26] * thcipriani doffs hat [20:16:41] (03Merged) 10jenkins-bot: Remove BetaFeatures from ORES extension dependencies [integration/config] - 10https://gerrit.wikimedia.org/r/403904 (https://phabricator.wikimedia.org/T184554) (owner: 10Ladsgroup) [20:20:15] !log reloading zuul to deploy https://gerrit.wikimedia.org/r/#/c/403904/1 [20:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [20:21:15] Amir1: should be deployed now [20:22:33] Thank you very much [20:22:35] Thanks! [20:23:03] yw :) [20:36:04] Amir1: I am creating wikibase/property-suggester-scripts [20:36:05] :D [20:37:00] Project beta-scap-eqiad build #191127: 04FAILURE in 3 min 19 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191127/ [20:47:21] Yippee, build fixed! [20:47:21] Project beta-scap-eqiad build #191128: 09FIXED in 3 min 41 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191128/ [20:50:20] 10Release-Engineering-Team (Kanban), 10Mediawiki-extensions-PropertySuggester, 10Repository-Admins, 10Wikidata: Move PropertySuggester-Python to gerrit - https://phabricator.wikimedia.org/T166672#3904142 (10hashar) 05Open>03Resolved ``` $ git clone --mirror https://github.com/Wikidata-lib/PropertySugge... [20:55:24] Project beta-code-update-eqiad build #189816: 04FAILURE in 2 min 23 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/189816/ [21:06:12] Project beta-scap-eqiad build #191129: 04FAILURE in 3 min 21 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191129/ [21:06:55] Yippee, build fixed! 
[21:06:55] Project beta-code-update-eqiad build #189817: 09FIXED in 42 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/189817/ [21:08:58] 10Beta-Cluster-Infrastructure, 10ContentTranslation: [betalabs] "Uncaught TypeError: Cannot read property 'changeSettings' of null" when clicking on any option of row tux-message-selector - https://phabricator.wikimedia.org/T185038#3904207 (10Etonkovidova) [21:09:09] Project beta-scap-eqiad build #191130: 04STILL FAILING in 2 min 13 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191130/ [21:11:00] 10Release-Engineering-Team, 10MediaWiki-Vagrant, 10Operations, 10Epic, and 2 others: [EPIC] Migrate base image to Debian Jessie - https://phabricator.wikimedia.org/T136429#3904228 (10jgleeson) [21:11:13] 10Beta-Cluster-Infrastructure, 10Language-2017-Oct-Dec, 10Language-2018-Jan-Mar: [betacluster] Special:Translate does not allow to translate article content - multiple "Uncaught SyntaxError: Unexpected token < in JSON at position 0" - https://phabricator.wikimedia.org/T180841#3904232 (10Etonkovidova) Checked... [21:11:25] 10Beta-Cluster-Infrastructure, 10Language-2017-Oct-Dec, 10Language-2018-Jan-Mar: [betacluster] Special:Translate does not allow to translate article content - multiple "Uncaught SyntaxError: Unexpected token < in JSON at position 0" - https://phabricator.wikimedia.org/T180841#3904239 (10Etonkovidova) 05Open... 
[21:12:40] and login via gui fails with cookie problem [21:13:56] Sagan: see topic [21:14:08] oops, okay [21:16:20] Project beta-scap-eqiad build #191131: 04STILL FAILING in 2 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191131/ [21:22:44] no_justification: we probably need to arm keyholder^ lots of permission denied errors [21:25:52] Project beta-scap-eqiad build #191132: 04STILL FAILING in 2 min 17 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191132/ [21:29:45] !log armed keyholder using instructions on https://wikitech.wikimedia.org/wiki/Keyholder [21:29:51] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:31:04] 10Beta-Cluster-Infrastructure, 10ContentTranslation: [betacluster] "Uncaught TypeError: Cannot read property 'changeSettings' of null" when clicking on any option of row tux-message-selector - https://phabricator.wikimedia.org/T185038#3904355 (10greg) [21:32:42] * greg-g couldn't get su to work, what's my password? :) [21:33:03] * greg-g brbs [21:40:58] 10Continuous-Integration-Infrastructure, 10Operations, 10HHVM: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail - https://phabricator.wikimedia.org/T185024#3903647 (10Anomie) It looks like someone upstream misread RFC 3986 near [[https://github.com/facebook/hhvm/commit/80855dc1f2fe4d9de6b... [21:41:53] Yippee, build fixed! [21:41:54] Project beta-scap-eqiad build #191133: 09FIXED in 8 min 12 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/191133/ [22:02:00] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T180749#3904461 (10thcipriani) [22:02:41] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.31.0-wmf.16 deployment blockers - https://phabricator.wikimedia.org/T180749#3768321 (10thcipriani) 05Open>03Resolved 1.31.0-wmf.16 is now live everywhere. 
[22:40:16] CentralAuth browser tests are failing with "could not find links to Sauce Labs job URL": [22:40:19] https://integration.wikimedia.org/ci/job/selenium-CentralAuth/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=BrowserTests/lastBuild/testReport/(root)/CentralAuth%20log%20in/Test_central_domain_login/ [22:40:42] this is usually a transient error but now it doesn't seem to be [22:41:26] or maybe that's just debug output and unrelated to the actual error? [22:58:59] 10Continuous-Integration-Infrastructure, 10Front-end-Standards-Group, 10MediaWiki-extensions-General: Decide whether we want the package-lock.json to commit or ignore - https://phabricator.wikimedia.org/T179229#3904589 (10Jdlrobson) [23:04:47] tgr: that *might* be due to the cloud reboots [23:21:32] greg-g: rebuilds are still failing [23:21:39] :( [23:22:26] tgr: I hate to say it, but file a task for zeljko [23:23:22] tgr, is it still occurring? [23:23:49] yeah, I started the last attempt 7 minutes ago [23:27:45] did Jenkins recently make it superhard to find the actual test output? 
[23:27:53] (03PS1) 10Chad: make-wmf-branch: Stop committing weird sub-submodule branching [tools/release] - 10https://gerrit.wikimedia.org/r/404600 [23:28:02] it took me at least five minutes of random clicking around [23:31:03] greg-g: turns out it's beta being broken: 18:06 -!- mode/#wikimedia-releng [-o greg-g] by ChanServ [23:31:21] oops [23:31:25] https://en.wikipedia.beta.wmflabs.org/w/api.php?action=query&meta=siteinfo&siprop=general&format=json [23:31:56] so ignore it, I can figure that out [23:32:22] there doesn't seem to be a dashboard on https://logstash-beta.wmflabs.org/app/kibana#/dashboards?notFound=dashboard&_g=() [23:35:08] tgr: ugh and thanks [23:39:23] (03CR) 1020after4: [C: 032] make-wmf-branch: Stop committing weird sub-submodule branching [tools/release] - 10https://gerrit.wikimedia.org/r/404600 (owner: 10Chad) [23:39:58] (03Merged) 10jenkins-bot: make-wmf-branch: Stop committing weird sub-submodule branching [tools/release] - 10https://gerrit.wikimedia.org/r/404600 (owner: 10Chad)