[00:04:42] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4125561 (awight) p:Triage>Normal
[00:05:32] ^ Wrote this into a task just so I don’t forget where we’re at. / so I can forget in half an hour ;-)
[00:06:53] < shinken-wm> PROBLEM - Free space - all mounts on integration-slave-jessie-1003 is CRITICAL
[00:07:04] ^ this looks like it breaks all the jenkins tests
[00:07:57] see all the -1 on operations.. if you go into details it shows that instance name
[00:08:02] the issue is that /srv is full
[00:08:16] pbuilder 2.4G
[00:14:42] !log preparing to deploy phabricator rPHDEP/release/2018-04-12/1 https://phabricator.wikimedia.org/project/view/3335/
[00:14:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[00:14:57] heh wrong channel &
[00:14:59] ^
[00:18:31] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<20.00%)
[00:21:23] integration-slave-jessie-1003 seems to have fixed itself
[00:21:42] now we have a new alert for deployment-fluorine02
[00:28:17] RECOVERY - Free space - all mounts on integration-slave-jessie-1003 is OK: OK: All targets OK
[00:31:16] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[00:35:26] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-Parser, Quibble, and 2 others: [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy - https://phabricator.wikimedia.org/T191771#4125612 (Legoktm) >>! In T191771#4122685, @MoritzMuehlenh...
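The free-space alerts above fire when the share of free bytes on a mount drops below a threshold (the deployment-fluorine02 alert shows `<20.00%`). A minimal Python sketch of that kind of check, assuming a plain percent-free comparison; the function names and the 20% default are illustrative and are not taken from the actual shinken/graphite check:

```python
import shutil


def percent_free(total_bytes: int, free_bytes: int) -> float:
    """Return free space as a percentage of total space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * free_bytes / total_bytes


def is_critical(total_bytes: int, free_bytes: int, threshold: float = 20.0) -> bool:
    """True when free space is below the threshold percentage (assumed rule)."""
    return percent_free(total_bytes, free_bytes) < threshold


def check_mount(path: str, threshold: float = 20.0) -> bool:
    """Apply the same rule to a real mount point, e.g. check_mount('/srv')."""
    usage = shutil.disk_usage(path)
    return is_critical(usage.total, usage.free, threshold)
```

Calling `check_mount("/srv")` on an affected instance would report whether the mount is under the assumed threshold; finding what fills it (such as the 2.4G pbuilder directory mentioned above) is a separate step.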
[00:35:36] Phabricator (Upstream), Upstream: Keep the Phabricator toolbar on the top visible when scrolling down - https://phabricator.wikimedia.org/T191540#4125618 (Johnywhy) Firstly, not trying to be confrontational, just wanting to work effectively with the software and the community. And my feature requests are...
[00:58:30] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4125662 (mmodell) Open>Resolved
[00:58:49] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4119121 (mmodell)
[00:59:17] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4119121 (mmodell) Resolved>Open
[01:04:48] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4125678 (mmodell)
[01:05:26] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4119121 (mmodell)
[01:05:40] Phabricator, Release-Engineering-Team (Kanban): Deploy "Deadlines" feature - https://phabricator.wikimedia.org/T191865#4119121 (mmodell) Open>Resolved
[01:06:42] Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#4125684 (mmodell) @atgo: It's deployed. You'll have to edit the due date on any previously created tasks before it'll show up, due to a configuration problem. All new deadlines should show up on workboards and at the top of ta...
[01:11:11] Phabricator: Add support for task types - https://phabricator.wikimedia.org/T93499#4125690 (atgo) AMAZING THANK YOU @mmodell!
[03:37:14] Project mediawiki-core-code-coverage-php7 build #201: STILL FAILING in 37 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/201/
[04:24:20] Project mediawiki-core-code-coverage build #3440: STILL FAILING in 1 hr 24 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3440/
[04:47:27] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [10.0]
[05:32:28] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [10.0]
[06:07:27] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [10.0]
[06:49:17] PROBLEM - Puppet staleness on deployment-maps03 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [43200.0]
[06:53:32] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:09:06] (CR) Hashar: [C: 2] "> I did not bothered trying to benchmark the 11 patches" (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425426 (owner: Thiemo Kreuz (WMDE))
[07:09:54] (Merged) jenkins-bot: Replace strpos() with faster substr() comparisons [tools/codesniffer] - https://gerrit.wikimedia.org/r/425426 (owner: Thiemo Kreuz (WMDE))
[07:10:39] (CR) jenkins-bot: Replace strpos() with faster substr() comparisons [tools/codesniffer] - https://gerrit.wikimedia.org/r/425426 (owner: Thiemo Kreuz (WMDE))
[07:12:39] MediaWiki-Releasing, MW-1.31-release: Bundle LoginNotify extension with MW 1.31 - https://phabricator.wikimedia.org/T191746#4126031 (Legoktm)
[07:12:59] MediaWiki-Releasing, MW-1.31-release: Bundle CodeEditor extension with MW 1.31 - https://phabricator.wikimedia.org/T191742#4126033 (Legoktm)
[07:13:13] MediaWiki-Releasing, MW-1.31-release: Bundle Thanks extension with MW 1.31 - https://phabricator.wikimedia.org/T191739#4126035 (Legoktm)
[07:13:23] MediaWiki-Releasing, MW-1.31-release: Bundle Thanks extension with MW 1.31 - https://phabricator.wikimedia.org/T191739#4115139 (Legoktm)
[07:13:42] MediaWiki-Releasing, MW-1.31-release: Bundle Thanks extension with MW 1.31 - https://phabricator.wikimedia.org/T191739#4115139 (Legoktm) >>! In T191739#4115458, @Legoktm wrote: > AFAICT Thanks is useless without Echo, but there's no hard dependency set in extension.json? Hard dependency set in https://g...
[07:42:44] Gerrit, Developer-Relations, Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#4126099 (Aklapper) Potentially superseded by T192046 ?
[07:45:38] MediaWiki-Releasing, MW-1.31-release: Bundle Minerva Neue skin with MW 1.31 - https://phabricator.wikimedia.org/T191743#4126103 (Legoktm)
[07:58:37] (CR) Thiemo Kreuz (WMDE): "Thanks a lot for the detailed review! :-)" [tools/codesniffer] - https://gerrit.wikimedia.org/r/425426 (owner: Thiemo Kreuz (WMDE))
[08:42:17] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Lexicographical data, Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4126230 (WMDE-leszek) @zelj...
[08:44:24] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4126233 (MoritzMuehlenhoff) Is the localisation cache currently generated incompletey as a consequence of that bug? I reimaged an app server with str...
[08:57:11] Phabricator (Upstream), Upstream: Keep the Phabricator toolbar on the top visible when scrolling down - https://phabricator.wikimedia.org/T191540#4126256 (Aklapper) > So all feature requests come down to your decision? It is a blurry responsibility shared among folks who maintain Wikimedia's Phab instanc...
[09:17:54] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:17:54] PROBLEM - App Server Main HTTP Response on deployment-mediawiki04 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:22:45] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36481 bytes in 3.807 second response time
[09:22:47] RECOVERY - App Server Main HTTP Response on deployment-mediawiki04 is OK: HTTP OK: HTTP/1.1 200 OK - 47453 bytes in 3.797 second response time
[09:29:50] PROBLEM - App Server Main HTTP Response on deployment-mediawiki05 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:34:43] RECOVERY - App Server Main HTTP Response on deployment-mediawiki05 is OK: HTTP OK: HTTP/1.1 200 OK - 47453 bytes in 3.721 second response time
[09:36:13] PROBLEM - App Server Main HTTP Response on deployment-mediawiki06 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:41:08] RECOVERY - App Server Main HTTP Response on deployment-mediawiki06 is OK: HTTP OK: HTTP/1.1 200 OK - 47503 bytes in 6.503 second response time
[10:54:55] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: [Blocked] Support git-lfs - https://phabricator.wikimedia.org/T180627#4126467 (demon) error: RPC failed; HTTP 504 curl 22 The requested URL returned error: 504 Gateway Time-out Herein lies the clue for this failu...
[10:56:17] Gerrit, Developer-Relations, Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#4126472 (demon) That's a dupe of this....
[11:02:17] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4126493 (demon) Pretty sure that's unrelated....but I've definitely seen it before. Mostly when running maintenance scripts from tin instead of terbi...
[11:14:47] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4126516 (MoritzMuehlenhoff) >>! In T191921#4126493, @demon wrote: > Pretty sure that's unrelated....but I've definitely seen it before. Mostly when r...
[11:28:31] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4126550 (demon) No, that should've done it :(
[11:32:46] Beta-Cluster-Infrastructure, Operations, HHVM, User-Elukey, User-Joe: Upgrade deployment-prep appserver fleet to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T192071#4126558 (Joe) p:Triage>Normal
[11:33:25] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-04-03 (1.31.0-wmf.28)), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4126571 (zeljkofilipin)
[11:38:06] (PS1) Zfilipin: WIP killall ffmpeg [integration/config] - https://gerrit.wikimedia.org/r/425788 (https://phabricator.wikimedia.org/T179188)
[11:39:28] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4126575 (MoritzMuehlenhoff) Comparing the freshly installed app server (mw1265) with an existing one (mw1264) also shows that /srv/mediawiki/php-1.31...
[11:48:40] (CR) Zfilipin: "Hm, did not help." [integration/config] - https://gerrit.wikimedia.org/r/425788 (https://phabricator.wikimedia.org/T179188) (owner: Zfilipin)
[11:52:46] <_joe_> !log creating deployment-mediawiki-07, first stretch appserver T192071
[11:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[11:52:50] T192071: Upgrade deployment-prep appserver fleet to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T192071
[11:53:50] Release-Engineering-Team, MediaWiki-Core-Tests, Epic, MW-1.31-release-notes (WMF-deploy-2018-02-27 (1.31.0-wmf.23)), and 2 others: Q3 Selenium framework improvements - https://phabricator.wikimedia.org/T182421#4126612 (zeljkofilipin) a:zeljkofilipin>None
[11:54:30] Release-Engineering-Team, MediaWiki-Core-Tests, User-zeljkofilipin: Q4 Selenium framework improvements - https://phabricator.wikimedia.org/T190994#4126613 (zeljkofilipin) a:zeljkofilipin>None
[11:55:27] (CR) Zfilipin: [C: -1] WIP killall ffmpeg [integration/config] - https://gerrit.wikimedia.org/r/425788 (https://phabricator.wikimedia.org/T179188) (owner: Zfilipin)
[11:57:19] PROBLEM - Puppet errors on deployment-mediawiki-07 is CRITICAL: CRITICAL: 87.50% of data above the critical threshold [0.0]
[11:58:49] hi hi hashar !
[11:59:08] what CI access is it possible for people to get? what groups do people need to be in in order to do things (like restart jobs?)
[11:59:21] _joe_: Did you already create that instance?
[11:59:34] <_joe_> eddiegp: yes
[11:59:36] If possible, I'd name it 08. We already had a 07 in the past.
[11:59:51] Okay, then don't mind.
[11:59:53] <_joe_> eddiegp: I named it mediawiki-07
[12:00:40] <_joe_> note the dash
[12:00:48] <_joe_> as I want to use "prefix puppet"
[12:01:01] <_joe_> so that I don't have to use that damn horizon UI again
[12:01:02] <_joe_> :P
[12:01:45] Isn't "prefix puppet" in the horizon UI?
[12:02:35] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-WikibaseRepository, Wikidata, and 2 others: selenium test for Wikibase is unstable - https://phabricator.wikimedia.org/T189762#4126623 (hoo)
[12:02:58] <_joe_> yes
[12:03:02] <_joe_> but I can do it once
[12:03:10] <_joe_> and it applies to all the servers from now on
[12:03:17] <_joe_> s/servers/vms/
[12:03:27] Ah, sure. And you can't have a prefix that doesn't end in a dash?
[12:03:36] <_joe_> yes
[12:03:46] <_joe_> but I didn't want to mess up with the existing instances
[12:03:46] (Not that the dash bothers me, just curious)
[12:03:56] <_joe_> so that was my trick :P
[12:04:22] Heh, okay :D
[12:04:54] * eddiegp now waits for someone to confuse the deleted deployment-mediawiki07 with the newly created deployment-mediawiki-07 ;)
[12:06:29] <_joe_> ahah
[12:06:40] <_joe_> yeah if that happens, no big deal
[12:07:17] RECOVERY - Puppet errors on deployment-mediawiki-07 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:16:44] addshore: we have locked Jenkins CI.
[12:16:52] addshore: but "recheck" should be good enough for most use cases
[12:17:25] Cool, just checking, as I'm about to do a talk here at wmde covering CI deployments mw config etc
[12:17:56] addshore: ahhh that sounds awesome :]
[12:18:17] addshore: the twist you can give the audience is: at some point people will be able to reproduce the CI run on their local machine thanks to Docker
[12:18:25] :D
[12:18:36] that is the quibble thing I am polishing up, and later that will be helm / kubernetes
[12:18:42] so you would do something like:
[12:18:57] helm update --change=13291,42 && helm test
[12:18:59] (in theory)
[12:19:17] quibble is an intermediate state
[12:19:19] woo :D
[12:19:31] any other key CI things you think i should cover?
[12:19:56] I'm generally / quickly talking about jenkins zuul, the docker stuff, pipelines, what configs you might want to touch
[12:19:57] etc
[12:20:27] you can talk about the gate pipeline
[12:20:33] and how patches are piled up one after the other
[12:20:49] yarp
[12:20:51] so that each patch is tested AS IF patches ahead already got merged
[12:20:59] composer vs vendor.git is a mess as well
[12:21:08] (for gating: https://docs.openstack.org/infra/zuul/user/gating.html )
[12:21:31] and another thing is explaining the slow down during busy hours
[12:21:38] yup,
[12:21:54] namely: jobs still run on disposable VM which takes ~ 3 minutes to refill and with a pool of ~ 15 instances
[12:23:17] meanwhile
[12:23:27] I am forced to write some javascript bah ( https://gerrit.wikimedia.org/r/#/c/425790/2/tests/selenium/wdio.conf.js )
[12:26:32] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4126664 (stjn)
[12:27:02] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[12:28:05] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4126678 (stjn)
[12:33:45] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4126685 (stjn)
[12:39:39] (PS4) Hashar: docker: hhvm/php55 quibble images on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425600
[12:40:13] (PS5) Hashar: docker: php55 quibble images on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425600
[12:41:05] (PS6) Hashar: docker: php55 quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425600
[12:41:07] (PS1) Hashar: docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793
[12:49:00] (PS1) Hashar: Add php55 quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425795
[12:49:54] (CR) Hashar: [C: 2] "Passed with mediawiki change https://gerrit.wikimedia.org/r/#/c/425790/" [integration/config] - https://gerrit.wikimedia.org/r/425600 (owner: Hashar)
[12:50:13] (CR) Hashar: [C: -1] "Quibbles require support for hhvm -m server" [integration/config] - https://gerrit.wikimedia.org/r/425793 (owner: Hashar)
[12:50:36] (CR) Hashar: [C: 2] "Deployed:" [integration/config] - https://gerrit.wikimedia.org/r/425795 (owner: Hashar)
[12:51:11] (Merged) jenkins-bot: docker: php55 quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425600 (owner: Hashar)
[12:51:42] !log building releng/quibble-jessie and releng/quibble-jessie-php55
[12:51:44] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[12:52:10] (Merged) jenkins-bot: Add php55 quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425795 (owner: Hashar)
[13:23:51] (PS1) Hashar: Abstract out port 9412 [integration/quibble] - https://gerrit.wikimedia.org/r/425804
[13:31:35] (CR) Hashar: [C: 2] Abstract out port 9412 [integration/quibble] - https://gerrit.wikimedia.org/r/425804 (owner: Hashar)
[13:32:02] (Merged) jenkins-bot: Abstract out port 9412 [integration/quibble] - https://gerrit.wikimedia.org/r/425804 (owner: Hashar)
[13:42:26] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[14:24:30] <_joe_> !log installing deployment-mediawiki{08,09} for the beta upgrade to stretch
[14:24:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:24:58] <_joe_> !log installing deployment-mediawiki{08,09} for the beta upgrade to stretch of deployment-prep (T192071)
[14:25:01] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[14:25:01] T192071: Upgrade deployment-prep appserver fleet to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T192071
[14:28:47] s/mediawiki{08,09}/mediawiki-{08,09}/ :P
[14:34:47] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2669 bytes in 6.085 second response time
[14:34:52] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4127057 (MoritzMuehlenhoff) >>! In T191921#4126575, @MoritzMuehlenhoff wrote: > Comparing the freshly installed app server (mw1265) with an existing...
[14:35:01] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Backend fetch failed - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 2669 bytes in 6.059 second response time
[14:38:17] Request from 86.141.212.219 via deployment-cache-text04 deployment-cache-text04, Varnish XID 235588902
[14:46:57] Continuous-Integration-Config, Wiki-Loves-Monuments-Database: Generate coverafe for PHPUnit tests of labs-tools-heritage - https://phabricator.wikimedia.org/T192083#4127096 (Lokal_Profil)
[14:47:13] Continuous-Integration-Config, Wiki-Loves-Monuments-Database: Generate coverafe for PHPUnit tests of labs-tools-heritage - https://phabricator.wikimedia.org/T192083#4127096 (Lokal_Profil) Background: >>! In T179054#3881834, @Legoktm wrote: >>>! In T179054#3877791, @Lokal_Profil wrote: >> Possibly a new t...
[14:48:18] Project beta-scap-eqiad build #203542: FAILURE in 39 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203542/
[14:48:54] 14:44:55 14:44:55 sudo -u mwdeploy -n -- /usr/bin/scap cdb-rebuild on deployment-mediawiki-07.deployment-prep.eqiad.wmflabs returned [255]: Host key verification failed.
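The gate pipeline described at 12:20 above (each patch tested as if the patches ahead of it had already merged) can be sketched in a few lines of Python. This is a deliberately simplified, sequential model: real Zuul tests queued changes in parallel and re-tests those behind a failing change, and the change labels and the `passes` callback here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def gate(queue: Sequence[str],
         passes: Callable[[List[str]], bool]) -> List[Tuple[str, bool]]:
    """Test each queued change on top of the changes ahead of it that passed.

    `passes` receives the speculative state (changes assumed merged plus the
    change under test) and returns True when the test suite succeeds.
    """
    merged: List[str] = []              # changes that passed and "merged"
    results: List[Tuple[str, bool]] = []
    for change in queue:
        speculative_state = merged + [change]
        ok = passes(speculative_state)
        results.append((change, ok))
        if ok:
            merged.append(change)       # later changes build on this one
        # a failing change is dropped, so later changes are tested without it
    return results


# Hypothetical example: change "B" breaks the tests, "A" and "C" do not.
example = gate(["A", "B", "C"], lambda state: "B" not in state)
```

In this model "C" is tested on top of "A" only, because "B" failed and left the queue; that is the property the chat summarises as testing each patch "AS IF patches ahead already got merged".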
[14:50:33] _joe_ is working on these. I assumed he was on it.
[14:51:35] <_joe_> paladox: that machine works, and scap pull works too locally
[14:51:44] oh ok
[14:51:49] thanks
[14:52:11] <_joe_> so this must be an issue with the host key not being recognized from deployment-tin I guess?
[14:52:24] Release-Engineering-Team (Watching / External), Epic, MediaWiki-Platform-Team (MWPT-Q4-Apr-Jun-2018), User-notice: Deploy refactored comment storage - https://phabricator.wikimedia.org/T166733#4127134 (Bstorm)
[14:52:38] <_joe_> but that's a bit strange, puppet should take care of everything
[14:52:45] There was a task about this being a problem, I'll look for it
[14:53:14] PROBLEM - Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76)
[14:53:27] _joe_: T159332
[14:53:27] T159332: Beta cluster scap job ( beta-scap-eqiad ) fails due to puppet erasing /etc/ssh/ssh_known_hosts - https://phabricator.wikimedia.org/T159332
[14:53:43] <_joe_> sigh
[14:54:01] <_joe_> we really can't do things properly in deployment-prep, can we
[14:54:11] Look at the last comment of thcipriani, it has a hint how to fix this.
[14:54:56] We could, if we'd care more. It's just that few people want to spend time working on things there if there's no direct outcome imho.
[14:55:12] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-08 is CRITICAL: Connection refused
[14:56:04] <_joe_> eddiegp: there would've been ways to make deployment-prep less of an intractable oddball, IMHO
[14:56:37] this one? https://phabricator.wikimedia.org/T159332
[14:56:48] puppetdb was down before btw, you can ignore it.
[14:57:16] thcipriani: Yes.
[14:57:19] <_joe_> eddiegp: so that's why the ssh_known_host file is empty
[14:58:21] Idk? Does ssh_known_host need puppetdb?
[14:58:55] <_joe_> yes
[14:59:13] <_joe_> puppet runs on servers, and every server registers its fingerprint and all to puppetdb
[14:59:22] deployment-puppetdb01 has been down for ... weeks? months? T187736
[14:59:22] T187736: Host deployment-puppetdb01 is DOWN: CRITICAL - Host Unreachable (10.68.23.76) - https://phabricator.wikimedia.org/T187736
[14:59:32] it's been down months
[14:59:32] <_joe_> then they also collect all the keys in there to generate the ssh-known-hosts file
[14:59:58] <_joe_> but that's in production, I didn't think we used it in labs
[15:00:57] At least there's an instance for it. Idk if that was just used for testing at some point or is meant to be an instance actually used like puppetdb in prod
[15:02:02] I pinged Krenair (he created the instance) about it a week ago, but it seems he didn't have time to attend to it yet.
[15:06:18] (CR) Niedzielski: [C: 1] Run `npm run selenium` instead of `grunt webdriver:test` [integration/config] - https://gerrit.wikimedia.org/r/424592 (https://phabricator.wikimedia.org/T179190) (owner: Zfilipin)
[15:12:50] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4126664 (Jdforrester-WMF) Normally I'd just use https://phabricator.wikimedia.org/maniphest/query/sPpQV.7vXBLk/#R but sure. Should we make it a sub-project of #Mediawiki-Page-editing ?
[15:28:37] Project beta-scap-eqiad build #203543: STILL FAILING in 39 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203543/
[15:36:04] Project mediawiki-core-code-coverage-php7 build #202: STILL FAILING in 36 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/202/
[15:37:10] Project beta-scap-eqiad build #203544: STILL FAILING in 7 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203544/
[15:40:10] (CR) Krinkle: [C: 1] "Seems fine to land as-is without depends from what I can see. Core supports 'npm run selenium' already. The referenced changeset merely ch" [integration/config] - https://gerrit.wikimedia.org/r/424592 (https://phabricator.wikimedia.org/T179190) (owner: Zfilipin)
[15:43:30] Project beta-scap-eqiad build #203545: STILL FAILING in 5 min 38 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203545/
[15:45:20] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4127260 (stjn) >>! In T192074#4127191, @Jdforrester-WMF wrote: > Normally I'd just use https://phabricator.wikimedia.org/maniphest/query/sPpQV.7vXBLk/#R but sure. That’s not user-friendly, to b...
[15:48:41] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[15:49:49] Project beta-scap-eqiad build #203546: STILL FAILING in 5 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203546/
[15:49:54] Release-Engineering-Team (Kanban), Scap, Operations: mwscript rebuildLocalisationCache.php takes 40 minutes - https://phabricator.wikimedia.org/T191921#4127269 (thcipriani) As suggested in IRC, I ran `perf record` for rebuilding only the English language cdb in beta (since perf requires root). Not s...
[15:52:18] (PS1) Physikerwelt: Add AndreG-P to the jenkins whitelist [integration/config] - https://gerrit.wikimedia.org/r/425841
[15:54:01] thcipriani: I haven't used perf on hhvm yet, but if you haven't already, this might help: https://github.com/facebook/hhvm/wiki/Profiling#linux-perf-tools
[15:54:25] Krinkle: thanks! I'll take a look.
[15:54:28] I noticed the perf map missing, which shouldn't be the case
[15:54:35] "Failed to open /tmp/perf-1109.map, "
[15:54:56] (CR) Zoranzoki21: [C: -1] Add AndreG-P to the jenkins whitelist (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[15:57:30] !log integration-slave-docker-1003 had to reinstall python-pbr to fix puppet complaining about tzdata update
[15:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[15:59:17] Project beta-scap-eqiad build #203547: STILL FAILING in 5 min 32 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203547/
[16:03:00] Krinkle: export php='hhvm -v Eval.Jit=false -v hhvm.keep_perf_pid_map=1' resulted in much more interesting output, thanks for the tip!
[16:03:09] yw, glad it worked!
[16:03:40] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[16:05:02] (PS2) Physikerwelt: Add AndreG-P to the jenkins whitelist [integration/config] - https://gerrit.wikimedia.org/r/425841
[16:08:01] (CR) Zoranzoki21: Add AndreG-P to the jenkins whitelist (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[16:09:00] Project beta-scap-eqiad build #203548: STILL FAILING in 5 min 22 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203548/
[16:09:21] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4126664 (Krinkle) I'd also opt for it to be under #MediaWiki-Page-editing, preferably as either a plain workboard column, or perhaps as "milestone" subproject so that it continues to be manageab...
[16:16:27] (PS3) Physikerwelt: Add AndreG-P to the jenkins whitelist [integration/config] - https://gerrit.wikimedia.org/r/425841
[16:21:29] (CR) Physikerwelt: "Thank you." (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[16:22:17] Project mediawiki-core-code-coverage build #3441: STILL FAILING in 1 hr 22 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3441/
[16:23:06] (CR) Zoranzoki21: [C: 1] "Ok is. Thank you!" [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[16:24:51] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4127441 (Jdforrester-WMF) The problem with it being a milestone is that they are meant to be finishable. :-)
[16:37:58] (PS1) Hashar: Support HHVM has a built-in web server [integration/quibble] - https://gerrit.wikimedia.org/r/425849
[16:38:00] (CR) Zoranzoki21: [C: -1] "I no want to this user be whitelisted. With patch is all ok, but I no want to this user be whitelisted because he is not active here. Unti" [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[16:38:27] (CR) jerkins-bot: [V: -1] Support HHVM has a built-in web server [integration/quibble] - https://gerrit.wikimedia.org/r/425849 (owner: Hashar)
[16:38:32] (CR) Zoranzoki21: [C: -1] "> I no want to this user be whitelisted. With patch is all ok, but I" [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[16:42:54] Project-Admins: Create Technical Writing Project - https://phabricator.wikimedia.org/T192093#4127499 (Aklapper)
[16:48:27] (PS2) Hashar: HHVM as built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849
[16:53:43] Project beta-scap-eqiad build #203549: STILL FAILING in 39 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203549/
[16:54:09] (PS1) Hashar: Simplify tox configuration [integration/quibble] - https://gerrit.wikimedia.org/r/425860
[16:57:38] (PS1) Legoktm: Automatically replace DO_MAINTENANCE [tools/codesniffer] - https://gerrit.wikimedia.org/r/425862
[17:01:13] (CR) Physikerwelt: "Yes. Andre just started his PhD project on MathSearch. It would be a great help to grant him checking rights. Otherwise I see myself typin" [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt)
[17:02:01] PROBLEM - SSH on integration-slave-docker-1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[17:02:26] Project beta-scap-eqiad build #203550: STILL FAILING in 7 min 54 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203550/
[17:06:08] (PS1) Hashar: Add sqlite quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425865
[17:08:07] (CR) Hashar: [C: 2] Add sqlite quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425865 (owner: Hashar)
[17:09:17] Project beta-scap-eqiad build #203551: STILL FAILING in 5 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203551/
[17:09:52] (Merged) jenkins-bot: Add sqlite quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425865 (owner: Hashar)
[17:10:37] (PS2) Umherirrender: Shorten out earlier in the DbrQueryUsage sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425427 (owner: Thiemo Kreuz (WMDE))
[17:10:39] (CR) Umherirrender: [C: 2] Shorten out earlier in the DbrQueryUsage sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425427 (owner: Thiemo Kreuz (WMDE))
[17:11:23] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4127666 (Niedzielski) This is still an issue: ``` Request from 73.252.38.252 via deployment-cache-text04 deployment-cache-text04, Varnish XID 236099863 Error: 503, Backend...
[17:11:30] (Merged) jenkins-bot: Shorten out earlier in the DbrQueryUsage sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425427 (owner: Thiemo Kreuz (WMDE))
[17:11:49] RECOVERY - SSH on integration-slave-docker-1003 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[17:11:59] (CR) jenkins-bot: Shorten out earlier in the DbrQueryUsage sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425427 (owner: Thiemo Kreuz (WMDE))
[17:12:18] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4023631 (EddieGP) All of beta is currently down.
[17:13:05] (CR) Umherirrender: "Is there also a xsd for the ruleset.xml? That could be run on build step of mediawiki-codesniffer." [tools/codesniffer] - https://gerrit.wikimedia.org/r/424970 (owner: Legoktm)
[17:18:03] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4127679 (jcrespo) Giuseppe mentioned some test stretch patches on beta, it may be unrelated, but so he is aware of ongoing issues.
[17:19:10] Project beta-scap-eqiad build #203552: STILL FAILING in 5 min 26 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203552/ [17:23:11] !log add ssh host key for deployment-mediawiki07 to /mnt/home/jenkins-deploy/.ssh/known_hosts so that beta-scap-eqiad will work again [17:23:12] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:25:54] !log running scap pull on deployment-mediawiki07 to catch up from missed beta-scap-eqiad deploys [17:25:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:29:12] PROBLEM - Free space - all mounts on deployment-mediawiki-07 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki-07.diskspace.root.byte_percentfree (<11.11%) [17:29:29] Yippee, build fixed! [17:29:30] Project beta-scap-eqiad build #203553: FIXED in 5 min 44 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203553/ [17:34:13] RECOVERY - Free space - all mounts on deployment-mediawiki-07 is OK: OK: All targets OK [17:43:59] (PS1) Umherirrender: Optimize ShortCastSyntaxSniff sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425871 [17:44:20] (CR) Umherirrender: Optimize ShortCastSyntax sniff for performance (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425504 (owner: Thiemo Kreuz (WMDE)) [17:44:52] (CR) Umherirrender: "This are my ideas for Id1aa50f577356f873c197dc42f342cfe192ce775" [tools/codesniffer] - https://gerrit.wikimedia.org/r/425871 (owner: Umherirrender) [18:01:48] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Host packaged helm charts at https://releases.wikimedia.org/charts - https://phabricator.wikimedia.org/T191821#4117802 (mobrovac) [18:03:05] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Host packaged helm charts at https://releases.wikimedia.org/charts - https://phabricator.wikimedia.org/T191821#4117802 (Dzahn) 
https://releases.wikimedia.org/charts/ [18:10:39] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4127824 (awight) [18:11:22] (CR) Umherirrender: [C: -1] Faster scan for namespaces in the PrefixedGlobalFunctions sniff (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425434 (owner: Thiemo Kreuz (WMDE)) [18:17:14] (CR) Umherirrender: Optimize PHPUnitClassUsage sniff for performance (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425505 (owner: Thiemo Kreuz (WMDE)) [18:18:19] (PS2) Umherirrender: Remove empty lines from comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425423 (owner: Thiemo Kreuz (WMDE)) [18:18:24] (CR) Umherirrender: [C: 2] Remove empty lines from comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425423 (owner: Thiemo Kreuz (WMDE)) [18:19:02] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4127842 (thcipriani) p:Triage>High hrm, everything from load.php is failing. Don't know if this is necessarily deployment-cache-text-04's problem since IIRC that's... 
[18:19:29] (Merged) jenkins-bot: Remove empty lines from comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425423 (owner: Thiemo Kreuz (WMDE)) [18:20:35] (CR) jenkins-bot: Remove empty lines from comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425423 (owner: Thiemo Kreuz (WMDE)) [18:42:32] (CR) Umherirrender: "Not a blocker comment" (3 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425429 (owner: Thiemo Kreuz (WMDE)) [18:52:29] PROBLEM - Puppet errors on deployment-cache-text04 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0] [18:54:30] PROBLEM - Puppet errors on deployment-secureredirexperiment is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:55:00] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 48074 bytes in 4.203 second response time [18:59:44] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36444 bytes in 3.664 second response time [19:02:29] RECOVERY - Puppet errors on deployment-cache-text04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:05:23] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0] [19:05:55] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4127994 (thcipriani) p:High>Normal Well the deployment-mediawiki-07 backend was the cause of 503s today. I changed the appserver backend in hiera to deployment-medi... 
[19:10:07] (PS1) Hashar: DevWebServer now waits for tcp connnection [integration/quibble] - https://gerrit.wikimedia.org/r/425882 [19:10:49] (PS3) Hashar: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 [19:12:07] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4128004 (thcipriani) This is hard to explain. So when deployment-cache-text-04 used deployment-mediawiki-07 as a backend this page was coming back with a 503: https://en.m... [19:12:27] (CR) Umherirrender: Make use of $phpcsFile->eolChar in two sniffs (3 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE)) [19:23:01] (PS2) Umherirrender: Scan for return tags from the end of the function scope [tools/codesniffer] - https://gerrit.wikimedia.org/r/425432 (owner: Thiemo Kreuz (WMDE)) [19:23:04] (CR) Umherirrender: [C: 2] Scan for return tags from the end of the function scope [tools/codesniffer] - https://gerrit.wikimedia.org/r/425432 (owner: Thiemo Kreuz (WMDE)) [19:24:10] (Merged) jenkins-bot: Scan for return tags from the end of the function scope [tools/codesniffer] - https://gerrit.wikimedia.org/r/425432 (owner: Thiemo Kreuz (WMDE)) [19:25:14] (CR) jenkins-bot: Scan for return tags from the end of the function scope [tools/codesniffer] - https://gerrit.wikimedia.org/r/425432 (owner: Thiemo Kreuz (WMDE)) [19:35:25] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0] [19:49:28] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [10.0] [19:50:13] (PS2) Umherirrender: Make mwext-PoolCounter-rake-docker non-voting [integration/config] - https://gerrit.wikimedia.org/r/425337 [19:51:33] (CR) jerkins-bot: [V: -1] Make 
mwext-PoolCounter-rake-docker non-voting [integration/config] - https://gerrit.wikimedia.org/r/425337 (owner: Umherirrender) [19:58:21] (PS3) Umherirrender: Make mwext-PoolCounter-rake-docker non-voting [integration/config] - https://gerrit.wikimedia.org/r/425337 [19:59:36] (CR) jerkins-bot: [V: -1] Make mwext-PoolCounter-rake-docker non-voting [integration/config] - https://gerrit.wikimedia.org/r/425337 (owner: Umherirrender) [20:02:16] (PS4) Umherirrender: Make mwext-PoolCounter-rake-docker non-voting [integration/config] - https://gerrit.wikimedia.org/r/425337 [20:03:37] (PS2) Umherirrender: Automatically replace DO_MAINTENANCE [tools/codesniffer] - https://gerrit.wikimedia.org/r/425862 (owner: Legoktm) [20:04:14] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128116 (mmodell) [20:04:24] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: [Blocked] Support git-lfs - https://phabricator.wikimedia.org/T180627#4128118 (mmodell) [20:04:30] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4125561 (mmodell) Open>Resolved [20:04:41] (CR) Umherirrender: [C: 2] Automatically replace DO_MAINTENANCE [tools/codesniffer] - https://gerrit.wikimedia.org/r/425862 (owner: Legoktm) [20:04:45] (Merged) jenkins-bot: Automatically replace DO_MAINTENANCE [tools/codesniffer] - https://gerrit.wikimedia.org/r/425862 (owner: Legoktm) [20:04:47] (CR) jenkins-bot: Automatically replace DO_MAINTENANCE [tools/codesniffer] - https://gerrit.wikimedia.org/r/425862 (owner: Legoktm) [20:09:01] (CR) Zoranzoki21: [C: -1] "> Yes. Andre just started his PhD project on MathSearch. 
It would be" [integration/config] - https://gerrit.wikimedia.org/r/425841 (owner: Physikerwelt) [20:09:38] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128135 (mmodell) Sorry it took me so long. I just saw the update. [20:19:49] (PS1) Hashar: Helper to check an executable is hhvm [integration/quibble] - https://gerrit.wikimedia.org/r/425890 [20:19:51] (PS1) Hashar: Fix install.php under HHVM with MariaDB [integration/quibble] - https://gerrit.wikimedia.org/r/425891 [20:23:20] (PS4) Hashar: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 [20:24:27] PROBLEM - Mediawiki Error Rate on graphite-labs is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [10.0] [20:26:58] (PS2) Hashar: Fix install.php under HHVM with MariaDB [integration/quibble] - https://gerrit.wikimedia.org/r/425891 [20:27:00] (PS5) Hashar: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 [20:35:25] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-WikibaseRepository, Wikidata, and 2 others: selenium test for Wikibase is unstable - https://phabricator.wikimedia.org/T189762#4128215 (Smalyshev) I've watched the [[ https://integration.wikimedia.org/ci/job/mwe... 
[20:36:54] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)): selenium test for Wikibase is unstable - https://phabricator.wikimedia.org/T189762#4128224 (Smalyshev) [20:51:40] (PS6) Hashar: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 [20:57:44] (CR) Hashar: [C: 2] Helper to check an executable is hhvm [integration/quibble] - https://gerrit.wikimedia.org/r/425890 (owner: Hashar) [20:57:57] (PS2) Hashar: Helper to check an executable is hhvm [integration/quibble] - https://gerrit.wikimedia.org/r/425890 [20:57:59] (PS3) Hashar: Fix install.php under HHVM with MariaDB [integration/quibble] - https://gerrit.wikimedia.org/r/425891 [20:58:01] (PS7) Hashar: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 [21:01:28] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128325 (awight) >>! In T192042#4128135, @mmodell wrote: > Sorry it took me so long. I just saw the update. <3 I'd say that resolving a ta... [21:02:52] Release-Engineering-Team (Kanban), ORES, Scoring-platform-team: Create gerrit mirrors for all github-based ORES repos - https://phabricator.wikimedia.org/T192042#4128334 (mmodell) @awight: I'd like to unblock your project as much as possible and the week is nearly over so a few hours add up. 
[21:03:03] (CR) Hashar: [C: 2] Helper to check an executable is hhvm [integration/quibble] - https://gerrit.wikimedia.org/r/425890 (owner: Hashar) [21:03:21] (CR) Hashar: [C: 2] Fix install.php under HHVM with MariaDB [integration/quibble] - https://gerrit.wikimedia.org/r/425891 (owner: Hashar) [21:03:46] (Merged) jenkins-bot: Helper to check an executable is hhvm [integration/quibble] - https://gerrit.wikimedia.org/r/425890 (owner: Hashar) [21:03:55] (CR) Hashar: [C: 2] "Believe it or not, hhvm -m server works just fine. No need for a router.php." [integration/quibble] - https://gerrit.wikimedia.org/r/425849 (owner: Hashar) [21:04:00] (Merged) jenkins-bot: Fix install.php under HHVM with MariaDB [integration/quibble] - https://gerrit.wikimedia.org/r/425891 (owner: Hashar) [21:04:12] (PS2) Hashar: Simplify tox configuration [integration/quibble] - https://gerrit.wikimedia.org/r/425860 [21:04:30] (Merged) jenkins-bot: Support HHVM built-in web server + integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425849 (owner: Hashar) [21:04:48] (CR) Hashar: [C: 2] Simplify tox configuration [integration/quibble] - https://gerrit.wikimedia.org/r/425860 (owner: Hashar) [21:05:21] (Merged) jenkins-bot: Simplify tox configuration [integration/quibble] - https://gerrit.wikimedia.org/r/425860 (owner: Hashar) [21:07:12] (PS2) Hashar: DevWebServer now waits for tcp connnection [integration/quibble] - https://gerrit.wikimedia.org/r/425882 [21:08:21] PROBLEM - Puppet errors on deployment-mx02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [21:08:51] (CR) Hashar: [C: 2] DevWebServer now waits for tcp connnection [integration/quibble] - https://gerrit.wikimedia.org/r/425882 (owner: Hashar) [21:09:21] (Merged) jenkins-bot: DevWebServer now waits for tcp connnection [integration/quibble] - https://gerrit.wikimedia.org/r/425882 (owner: 
Hashar) [21:17:27] (PS1) Hashar: php_is_hhvm needs clearing in integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425907 [21:18:00] (CR) Hashar: [C: 2] php_is_hhvm needs clearing in integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425907 (owner: Hashar) [21:18:25] (Merged) jenkins-bot: php_is_hhvm needs clearing in integration tests [integration/quibble] - https://gerrit.wikimedia.org/r/425907 (owner: Hashar) [21:23:24] (PS2) Hashar: docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 [21:23:24] (PS1) Hashar: Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 [21:27:45] (PS1) Hashar: docker: 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425909 [21:28:14] (PS1) Hashar: Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 [21:29:48] (PS3) Hashar: docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 [21:30:54] (PS2) Hashar: Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 [21:31:33] (PS2) Hashar: docker: quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425909 [21:31:35] (PS2) Hashar: Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 [21:31:37] (PS4) Hashar: docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 [21:31:39] (PS3) Hashar: Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 [21:31:43] good night [21:41:26] Release-Engineering-Team (Kanban), Release Pipeline: Come up with a decent method of declaring helm chart path/version in service repo - https://phabricator.wikimedia.org/T191327#4128403 (dduvall) [21:41:31] Release-Engineering-Team (Kanban), Release Pipeline, 
Patch-For-Review: Host packaged helm charts at https://releases.wikimedia.org/charts - https://phabricator.wikimedia.org/T191821#4128401 (dduvall) Open>Resolved @Dzahn thanks for the quick merge! [21:42:29] Release-Engineering-Team (Kanban), Release Pipeline: modify service-pipeline to include helm install/helm test - https://phabricator.wikimedia.org/T188935#4128405 (dduvall) [21:42:32] Release-Engineering-Team (Kanban), Release Pipeline: Come up with a decent method of declaring helm chart path/version in service repo - https://phabricator.wikimedia.org/T191327#4101697 (dduvall) [21:50:38] Release-Engineering-Team (Kanban), Release Pipeline: Come up with a decent method of declaring helm chart path/version in service repo - https://phabricator.wikimedia.org/T191327#4128442 (dduvall) In today's SSD (aka CD Pipeline, aka Release Pipeline) Meeting, we agreed to implement option #1 after chart... [22:24:06] no_justification https://bugs.chromium.org/p/gerrit/issues/detail?id=6094#c17 yay! [22:36:54] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4128549 (thcipriani) restored `deployment-mediawiki-07` as appserver backend. It seems the ferm service is having trouble starting on that machine, so the previous varnish... [22:36:57] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 46945 bytes in 4.819 second response time [22:38:46] PROBLEM - Puppet errors on deployment-aqs01 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0] [22:44:11] twentyafterfour: I’m putting together the submodules you mirrored and doing another test deployment, but I had a question for the near future: [22:44:23] How will we safely test this deployment strategy in production? [22:44:58] There are some major differences between beta and prod, obviously. 
I was thinking we could deploy to just one ORES server without too much disruption, if that makes sense to you... [23:01:53] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: Support git-lfs - https://phabricator.wikimedia.org/T180627#4128610 (awight) [23:09:57] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: Support git-lfs - https://phabricator.wikimedia.org/T180627#4128623 (awight) With the gerrit-based submodule workaround, git-lfs is in business on the beta cluster! We have normal, working ORES install with a smal... [23:10:26] I’m ready for production, so going ahead with the canary plan above. [23:18:47] RECOVERY - Puppet errors on deployment-aqs01 is OK: OK: Less than 1.00% above the threshold [0.0] [23:28:04] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: Support git-lfs - https://phabricator.wikimedia.org/T180627#4128655 (awight) Production was unsuccessful, but we're really close! ``` ssh tin git fetch https://gerrit.wikimedia.org/r/mediawiki/services/ores/deploy... [23:28:38] twentyafterfour: Whatever is happening on beta needs to happen to production, and I think we’ll have complete LFS support! [23:29:48] Project-Admins: Create a new component project: MediaWiki-Live-preview - https://phabricator.wikimedia.org/T192074#4128663 (Krinkle) There's merely word semantics, though. I don't see that as a problem. They also work well on [#services](https://phabricator.wikimedia.org/project/view/69/), [#TechCom-RFC](htt... 
[23:30:55] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests: Work around Jenkins's xunit validator being incompatible with PHPUnit 6's extra output in junit.xml - https://phabricator.wikimedia.org/T192120#4128664 (Jdforrester-WMF) [23:30:59] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-WikibaseRepository, Wikidata, MW-1.31-release-notes (WMF-deploy-2018-04-10 (1.31.0-wmf.29)): selenium test for Wikibase is unstable - https://phabricator.wikimedia.org/T189762#4128674 (Legoktm) It's nearly b... [23:58:38] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests: Work around Jenkins's xunit validator being incompatible with PHPUnit 6's extra output in junit.xml - https://phabricator.wikimedia.org/T192120#4128741 (Legoktm) ``` >>> schema=etree.XMLSchema(etree.parse(open('xunit-plugin/src/main/resources/o...