[00:07:04] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests: Work around Jenkins's xunit validator being incompatible with PHPUnit 6's extra output in junit.xml - https://phabricator.wikimedia.org/T192120#4128746 (Legoktm) I tested out https://github.com/jenkinsci/xunit-plugin/pull/57 and it appears to w...
[00:07:59] thcipriani: ^ don't know if you would know how to deploy a patch to a jenkins plugin
[00:10:28] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, Jenkins: Work around Jenkins's xunit validator being incompatible with PHPUnit 6's extra output in junit.xml - https://phabricator.wikimedia.org/T192120#4128757 (Legoktm)
[00:12:14] (PS1) Jforrester: [Wikibase] Temporarily remove flaky mwext-mw-selenium-composer-jessie from test [integration/config] - https://gerrit.wikimedia.org/r/425929 (https://phabricator.wikimedia.org/T189762)
[00:27:14] awight: awesome. Deploying one ores server in prod sounds like a good plan to me
[00:28:55] Continuous-Integration-Infrastructure, Release-Engineering-Team, MediaWiki-extensions-WikibaseRepository, Wikidata, and 2 others: selenium test for Wikibase is unstable - https://phabricator.wikimedia.org/T189762#4128781 (Krinkle)
[00:32:22] twentyafterfour: Do you happen to know what steps remain to make production deployment behave like beta?
[00:32:52] awight: we haven't customized anything in particular, however, we might need to deploy the latest scap to production
[00:33:05] ah, that’s probably all it is. awesome!
[00:33:16] Is there a task I can depend on to make that clear?
[00:33:26] no but I will make one
[00:35:38] FYI, I’m loading a 1.5GB file into git-lfs cos it’s our first use case, and more urgent than the other applications of LFS.
[00:35:57] Doing that now and testing on beta, to see if there are any gotchas at this point.
[00:36:26] awight: T192124
[00:36:27] T192124: Deploy Scap 3.8.0 to production - https://phabricator.wikimedia.org/T192124
[00:36:31] Deployments, Release-Engineering-Team (Kanban), Operations, Release: Deploy Scap 3.8.0 to production - https://phabricator.wikimedia.org/T192124#4128785 (mmodell)
[00:36:34] ty
[00:36:57] you're welcome :)
[00:37:19] Deployments, Release-Engineering-Team (Kanban), Operations, Release: Deploy Scap 3.8.0 to production - https://phabricator.wikimedia.org/T192124#4128799 (awight)
[00:37:23] Release-Engineering-Team (Kanban), Scap, Scoring-platform-team, Patch-For-Review: Support git-lfs - https://phabricator.wikimedia.org/T180627#4128798 (awight)
[00:38:09] twentyafterfour: btw, is this really a blocker, or just a related task? T182085
[00:38:10] T182085: Connect Phabricator to swift for storage of git-lfs and file uploads. - https://phabricator.wikimedia.org/T182085
[00:39:31] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<30.00%)
[00:39:53] awight: it's only a blocker for having lfs in diffusion repos
[00:40:03] (PS1) Dduvall: WIP: Perform helm deployment in service-pipeline [integration/config] - https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935)
[00:40:07] so probably not an overall blocker for you?
[00:40:17] as long as gerrit is doing the job well enough
[00:40:18] uh oh :) I think it’s too late then, I already uploaded the huge file I was mentioning
[00:40:24] ah I see
[00:40:36] ok, maybe this support lfs thing is an epic, then
[00:40:53] large file is fine because my repo is currently hosted in gerrit...
[00:40:53] phabricator will let you create git lfs repos but it'll reject file uploads more than a few megabytes
[00:41:13] nasty business :)
[00:41:22] yeah phab stores files in mysql...not good
[00:41:47] the alternative is amazon s3, which we don't want to pay for nor trust.
[00:42:06] but swift is s3 compatible so it's theoretically trivial to hook up
[00:42:23] smh
[00:59:59] PROBLEM - Free space - all mounts on deployment-ores01 is CRITICAL: CRITICAL: deployment-prep.deployment-ores01.diskspace._srv.byte_percentfree (No valid datapoints found)deployment-prep.deployment-ores01.diskspace.root.byte_percentfree (<30.00%)
[01:04:13] twentyafterfour: successfully deployed the 1.5GB file on beta :D
[01:04:16] The bad news:
[01:04:21] 3.1G .git/modules/submodules/assets/
[01:04:29] lmao
[01:09:57] RECOVERY - Free space - all mounts on deployment-ores01 is OK: OK: deployment-prep.deployment-ores01.diskspace._srv.byte_percentfree (No valid datapoints found)
[02:09:53] (PS1) Jamesmontalvo3: Add jamesmontalvo3 to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/425944
[02:15:09] (CR) Cicalese: [C: 1] Add jamesmontalvo3 to the CI whitelist [integration/config] - https://gerrit.wikimedia.org/r/425944 (owner: Jamesmontalvo3)
[03:56:59] Project mediawiki-core-code-coverage-php7 build #203: STILL FAILING in 56 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/203/
[04:26:45] Project mediawiki-core-code-coverage build #3442: STILL FAILING in 1 hr 26 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3442/
[04:49:47] I will let you know when I see hashar around here
[04:49:47] @notify hashar
[04:55:40] PROBLEM - SSH on integration-slave-docker-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[04:59:11] (PS1) Legoktm: Use external docker-registry URL in README [integration/quibble] - https://gerrit.wikimedia.org/r/425956
[05:00:33] RECOVERY - SSH on
integration-slave-docker-1011 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[06:50:43] PROBLEM - Puppet errors on integration-slave-docker-1003 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[06:53:28] hi hashar!
[06:53:32] quibble is amazing
[06:54:09] within 30 seconds I was fully able to reproduce a sqlite test failure on my laptop
[06:54:16] (CR) Hashar: [C: 2] docker: quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425909 (owner: Hashar)
[06:54:29] And then bisecting was pretty easy actually
[06:54:40] legoktm: ahh good to know :]
[06:54:48] legoktm: I filed a task about sqlite issues on mediawiki/core
[06:55:09] https://phabricator.wikimedia.org/T191035
[06:55:23] hashar: also https://phabricator.wikimedia.org/T192120 is the last task blocking PHPUnit 6 - a 1.31 blocker. How do we deploy a patch to a jenkins plugin?
[06:55:31] (Merged) jenkins-bot: docker: quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425909 (owner: Hashar)
[06:55:42] some are just SELECT yielding things in different orders, others are due to incremental id that are different based on the order tests are run (i think)
[06:56:32] legoktm: I fork the github repo to our gerrit (under something like integration/jenkins-plugin-foobar )
[06:56:40] pick the PR there as a change
[06:56:56] then maven build it and upload the resulting artifact to CI
[06:57:28] and kudos on PHPUnit 6 that must have been a lot of patches
[06:58:17] where/how do I upload the artifact?
[06:58:53] https://integration.wikimedia.org/ci/pluginManager/ -> [Advanced] tab -> Upload Plugin
[06:59:00] not sure whether you have access though
[06:59:21] there are a couple others I have forked https://gerrit.wikimedia.org/r/#/admin/projects/?filter=jenkinsci
[07:00:22] legoktm: I can handle it if you want
[07:01:03] !log rebuilding quibble containers to use 0.0.7
[07:01:06] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:01:41] I'm running $ mvn install right now, if this works I think I can take care of it? I've never fiddled with java that much so I'd like to try
[07:01:50] yup
[07:01:52] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4129085 (EddieGP) >>! In T188913#4127994, @thcipriani wrote: > Well the deployment-mediawiki-07 backend was the cause of 503s today. I changed the appserver backend in hier...
[07:01:58] though it feels like maven is downloading the internet
[07:02:02] you will get the result somewhere under the tree. Maybe /build or /target
[07:02:12] (or just git status --ignored and one of them will be the result)
[07:02:27] and yeah maven handles package management + the build workflow
[07:02:49] three keystrokes (m v n ) for everything!
[07:03:17] the various jar it downloads should end up under ~/.m2 if you wanna clean it up later
[07:06:27] Beta-Cluster-Infrastructure, Operations: Beta cluster Obama page often responds with 503 - https://phabricator.wikimedia.org/T188913#4023631 (MoritzMuehlenhoff) @EddieGP : I'm not sure what changed with the addition of mediawiki07, but I can confirm that mediawiki04 was definitely serving traffic as of T...
[07:09:32] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK
[07:11:12] hashar: do I need to bump a version string or anything?
[07:14:11] (PS1) Legoktm: Added skipped tag for phpunit xml [integration/jenkinsci/xunit-plugin] - https://gerrit.wikimedia.org/r/425965
[07:14:36] (CR) Legoktm: [V: 2 C: 2] Added skipped tag for phpunit xml [integration/jenkinsci/xunit-plugin] - https://gerrit.wikimedia.org/r/425965 (owner: Legoktm)
[07:18:48] (PS1) Legoktm: Bump version to 1.103-wmf.1 [integration/jenkinsci/xunit-plugin] - https://gerrit.wikimedia.org/r/425966
[07:19:15] (CR) Legoktm: [V: 2 C: 2] Bump version to 1.103-wmf.1 [integration/jenkinsci/xunit-plugin] - https://gerrit.wikimedia.org/r/425966 (owner: Legoktm)
[07:21:31] !log uploading xunit 1.103-wmf.1 to jenkins
[07:21:33] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:21:48] > xunit plugin is already installed. Jenkins needs to be restarted for the update to take effect
[07:22:13] !log restarting jenkins
[07:22:15] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[07:22:52] legoktm: \o/
[07:23:14] I will recommend you for a medal
[07:23:49] then I guess test out and if that works fine report on the github PR that it fixed it for us
[07:24:03] but it seems the xunit plugin is kind of abandoned :(
[07:24:34] (CR) Hashar: [C: 2] Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 (owner: Hashar)
[07:24:56] yeah, it hasn't been touched in like 2 years
[07:24:59] legoktm: then I cancel the long living "performance-xxx" jobs
[07:25:03] and restart jenkins on the host
[07:25:06] oh
[07:25:10] that's why it's waiting
[07:25:26] ah it is restarting magically
[07:25:32] I just killed the last job
[07:25:33] yeah jenkins waits for jobs to complete
[07:25:43] RECOVERY - Puppet errors on integration-slave-docker-1003 is OK: OK: Less than 1.00% above the threshold [0.0]
[07:26:25] (CR) Hashar: [C: 2] "We will see what happens" [integration/config] -
https://gerrit.wikimedia.org/r/425793 (owner: Hashar)
[07:27:07] (CR) jerkins-bot: [V: -1] Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 (owner: Hashar)
[07:27:09] (CR) jerkins-bot: [V: -1] docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 (owner: Hashar)
[07:27:46] (CR) Hashar: [C: 2] Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 (owner: Hashar)
[07:27:56] (CR) Hashar: [C: 2] docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 (owner: Hashar)
[07:28:34] * legoktm crosses fingers
[07:28:52] (CR) Hashar: [C: 2] Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 (owner: Hashar)
[07:29:00] (Merged) jenkins-bot: Bump Jenkins jobs to quibble 0.0.7 [integration/config] - https://gerrit.wikimedia.org/r/425910 (owner: Hashar)
[07:29:09] (Merged) jenkins-bot: docker: hhvm quibble image on Jessie [integration/config] - https://gerrit.wikimedia.org/r/425793 (owner: Hashar)
[07:30:13] (CR) jerkins-bot: [V: -1] Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 (owner: Hashar)
[07:30:19] (CR) Hashar: [C: 2] "Forgot to deploy the jobs" [integration/config] - https://gerrit.wikimedia.org/r/425908 (owner: Hashar)
[07:31:34] (Merged) jenkins-bot: Add hhvm quibble jobs [integration/config] - https://gerrit.wikimedia.org/r/425908 (owner: Hashar)
[07:34:28] hashar: https://wikitech.wikimedia.org/wiki/Jenkins#Patch_a_plugin
[07:34:54] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-php70-jessie/478/console PASSED!
[07:35:01] wonderful
[07:35:06] * legoktm hugs hashar
[07:38:49] Continuous-Integration-Infrastructure, MediaWiki-Core-Tests, Jenkins: Work around Jenkins's xunit validator being incompatible with PHPUnit 6's extra output in junit.xml - https://phabricator.wikimedia.org/T192120#4129144 (Legoktm) Open>Resolved a:Legoktm @hashar explained to me the proce...
[07:46:10] (PS2) Legoktm: Don't require documenting self-explaining parameter-less functions [tools/codesniffer] - https://gerrit.wikimedia.org/r/423982 (owner: Thiemo Kreuz (WMDE))
[07:46:17] legoktm: congratulations!
[07:46:22] (CR) Legoktm: [C: 2] "PS2: Rebased" [tools/codesniffer] - https://gerrit.wikimedia.org/r/423982 (owner: Thiemo Kreuz (WMDE))
[07:47:35] (Merged) jenkins-bot: Don't require documenting self-explaining parameter-less functions [tools/codesniffer] - https://gerrit.wikimedia.org/r/423982 (owner: Thiemo Kreuz (WMDE))
[07:48:01] (CR) jenkins-bot: Don't require documenting self-explaining parameter-less functions [tools/codesniffer] - https://gerrit.wikimedia.org/r/423982 (owner: Thiemo Kreuz (WMDE))
[07:49:07] (PS2) Legoktm: Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425515 (owner: Thiemo Kreuz (WMDE))
[07:49:22] (CR) Legoktm: [C: 2] Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425515 (owner: Thiemo Kreuz (WMDE))
[07:50:04] (CR) jerkins-bot: [V: -1] Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425515 (owner: Thiemo Kreuz (WMDE))
[07:53:51] (CR) jerkins-bot: [V: -1] Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425515 (owner: Thiemo Kreuz (WMDE))
[08:01:19] Beta-Cluster-Infrastructure, Scap: Enable `scap log` on deployment-* servers -
https://phabricator.wikimedia.org/T192032#4125106 (EddieGP) Yeah, we could do that, by setting up an instance (or just applying the relevant class to deployment-tin). But then we probably want to make the scap-scripts (those r...
[08:02:43] Beta-Cluster-Infrastructure, Release-Engineering-Team: Request for access to the beta cluster - https://phabricator.wikimedia.org/T190755#4082855 (EddieGP) Do you still have trouble logging into these instances or can this task be closed?
[08:19:45] PROBLEM - Host deployment-mediawiki-08 is DOWN: CRITICAL - Host Unreachable (10.68.17.203)
[08:20:40] <_joe_> !log creating deployment-mediawiki-09 with stretch, eliminating -08 which was left in an unusable state T192071
[08:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:20:42] T192071: Upgrade deployment-prep appserver fleet to Debian Stretch (using HHVM) - https://phabricator.wikimedia.org/T192071
[08:44:08] (CR) Thiemo Kreuz (WMDE): Optimize ShortCastSyntax sniff for performance (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425504 (owner: Thiemo Kreuz (WMDE))
[08:45:03] (CR) Thiemo Kreuz (WMDE): "This proposal looks nice, but does have one issue I care about: The moment all issues are fixed, this code is exclusively confronted with " [tools/codesniffer] - https://gerrit.wikimedia.org/r/425871 (owner: Umherirrender)
[08:48:40] !log debugging T189493 on beta
[08:48:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[08:48:42] T189493: Domain 'sdwiki' is not recognized.
- https://phabricator.wikimedia.org/T189493
[08:54:27] RECOVERY - Mediawiki Error Rate on graphite-labs is OK: OK: Less than 1.00% above the threshold [1.0]
[08:55:16] (PS2) Thiemo Kreuz (WMDE): Minor performance optimizations to the UnusedUseStatement sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425429
[08:55:24] (CR) Thiemo Kreuz (WMDE): Minor performance optimizations to the UnusedUseStatement sniff (3 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425429 (owner: Thiemo Kreuz (WMDE))
[08:58:24] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-09 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:58:40] PROBLEM - Puppet errors on deployment-mediawiki-09 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[09:02:13] (CR) Thiemo Kreuz (WMDE): Optimize PHPUnitClassUsage sniff for performance (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425505 (owner: Thiemo Kreuz (WMDE))
[09:13:39] RECOVERY - Puppet errors on deployment-mediawiki-09 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:18:57] (PS3) Thiemo Kreuz (WMDE): Make use of $phpcsFile->eolChar in two sniffs [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513
[09:19:30] (CR) Thiemo Kreuz (WMDE): Make use of $phpcsFile->eolChar in two sniffs (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE))
[09:21:51] Release-Engineering-Team (Kanban), MediaWiki-Core-Tests, MediaWiki-Parser, Quibble, and 2 others: [REL1_30] Some parserTests fail on debian stretch using Tidy, because of a new version of libtidy - https://phabricator.wikimedia.org/T191771#4129269 (hashar) @MoritzMuehlenhoff as Kunal said, MediaW...
[09:26:07] Release-Engineering-Team (Kanban), Release Pipeline, Patch-For-Review: Host packaged helm charts at https://releases.wikimedia.org/charts - https://phabricator.wikimedia.org/T191821#4129271 (akosiaris) Oh! That's great. Now I can do the following locally on my minikube instance Add and update the re...
[09:29:22] Project-Admins: Create Technical Writing Project - https://phabricator.wikimedia.org/T192093#4129274 (Aklapper) Hmm. There are existing tags like #documentation and #mediawiki-documentation and #pywikibot-documentation and I wonder if they (or their workboard columns) could be (re)used instead of creating mo...
[09:32:47] (CR) Alexandros Kosiaris: [C: 1] "minor comment inline, rest LGTM (although I am no Groovy expert)" (1 comment) [integration/config] - https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: Dduvall)
[09:34:57] PROBLEM - Free space - all mounts on integration-slave-docker-1004 is CRITICAL: CRITICAL: integration.integration-slave-docker-1004.diskspace.root.byte_percentfree (<10.00%)
[09:41:32] (PS1) Thiemo Kreuz (WMDE): Optimize PrefixedGlobalFunctions sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425986
[09:49:17] Project beta-scap-eqiad build #203653: FAILURE in 5 min 33 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203653/
[10:00:16] (CR) Thiemo Kreuz (WMDE): Faster scan for namespaces in the PrefixedGlobalFunctions sniff (2 comments) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425434 (owner: Thiemo Kreuz (WMDE))
[10:00:19] (Abandoned) Thiemo Kreuz (WMDE): Faster scan for namespaces in the PrefixedGlobalFunctions sniff [tools/codesniffer] - https://gerrit.wikimedia.org/r/425434 (owner: Thiemo Kreuz (WMDE))
[10:01:27] Project mwext-phpunit-coverage-publish build #3307: FAILURE in 4.6 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3307/
[10:01:31] Project
mwext-phpunit-coverage-publish build #3308: STILL FAILING in 3.4 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3308/
[10:02:40] !log root@deployment-tin:/mnt/home/jenkins-deploy# sudo -u jenkins-deploy -- sh -c 'ssh-keyscan "deployment-mediawiki-09.deployment-prep.eqiad.wmflabs" >> .ssh/known_hosts' to fix beta-scap-eqiad
[10:02:42] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL
[10:03:16] Yippee, build fixed!
[10:03:17] Project mwext-phpunit-coverage-publish build #3309: FIXED in 1 min 44 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3309/
[10:30:13] PROBLEM - Free space - all mounts on deployment-mediawiki-07 is CRITICAL: CRITICAL: deployment-prep.deployment-mediawiki-07.diskspace.root.byte_percentfree (<22.22%)
[10:32:10] _joe_: ^ Scap failed on deployment-mediawiki-07 due to "No space left on device (28)"
[10:32:28] <_joe_> eddiegp: oh interesting, lemme see what's up there
[10:32:50] Project beta-scap-eqiad build #203654: STILL FAILING in 39 min: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203654/
[10:34:47] And there it is. I fixed the ssh host key failure for deployment-mediawiki-09 earlier btw, see the sal for the command.
[10:35:01] <_joe_> I just fixed it
[10:35:08] <_joe_> eddiegp: how do you fix that?
[10:35:15] <_joe_> so that I can do it next time
[10:35:23] <_joe_> I thought it was a temporary problem
[10:35:25] "see the sal" ;)
[10:35:33] I've put the full command there.
[10:35:44] <_joe_> eddiegp: ok, thanks
[10:35:55] <_joe_> beta is a pile of cheap hacks and bugs :(
[10:37:02] Unfortunately, yes :/ I try to work through some of those one by one.
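(Editor's aside on the known_hosts fix logged at 10:02:40: that command appends whatever ssh-keyscan prints, so running it twice would duplicate the entry. The following is only a rough Python sketch of an idempotent merge. `merge_known_hosts` is a hypothetical helper, not part of scap or the beta tooling, and it assumes plain, unhashed known_hosts lines.)

```python
def merge_known_hosts(existing_lines, scanned_lines):
    """Return existing_lines plus any scanned host keys not already present.

    Hypothetical helper: both arguments are lists of known_hosts-style lines
    ("host keytype base64key"). Blank and comment lines in the scan are skipped.
    """
    seen = {line.strip() for line in existing_lines if line.strip()}
    merged = list(existing_lines)
    for line in scanned_lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # nothing to record for blank or comment lines
        if entry not in seen:
            merged.append(entry)
            seen.add(entry)
    return merged
```

(Feeding the merged list back in with the same scan output should leave it unchanged, which is the point of the exercise.)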
[10:37:52] <_joe_> eddiegp: I think there is a structural problem that needs to be addressed
[10:38:19] <_joe_> but for now, I'll just add the mount of the rest of the disk on /srv to appservers there
[10:42:24] Release-Engineering-Team (Kanban), Wikidata, Wikidata-Ministry-Of-Magic-Tech-Debt, Wikidata-Turtles-Tech-Debt, and 4 others: Run Wikibase daily browser tests on Jenkins - https://phabricator.wikimedia.org/T167432#4129336 (WMDE-leszek)
[10:42:29] Yippee, build fixed!
[10:42:30] Project beta-scap-eqiad build #203655: FIXED in 8 min 56 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203655/
[10:42:46] Yep, that helped. :)
[10:44:16] <_joe_> eddiegp: no actually I'll add the /srv partition in the afternoon, I'd like to do it with zero downtime and I don't trust my connection on a train enough to do it
[10:45:33] _joe_: Hmm, the build is fixed, what was it you did then?
[10:46:39] <_joe_> I just removed old apt archives for now
[10:46:47] <_joe_> that gives you ~ 2 GB of breathing space
[10:46:54] Heh, okay.
[10:48:41] <_joe_> so my plan is to do as follows: add the appropriate puppet class, disable puppet everywhere, move /srv/mediawiki somewhere else, make a symlink, mount the new partition, sync there and swap the symlink
[10:48:45] <_joe_> more or less
[10:49:08] <_joe_> the jobrunner will instead have the dedicated /srv from the beginning
[10:50:12] RECOVERY - Free space - all mounts on deployment-mediawiki-07 is OK: OK: All targets OK
[10:50:39] I read about that at some point. Moving stuff over to a dedicated /srv in the aftermath isn't as easy as it should be.
[10:51:11] (PS1) Thiemo Kreuz (WMDE): Streamline validate-phpcs-xml.php script [tools/codesniffer] - https://gerrit.wikimedia.org/r/426002
[11:02:48] <_joe_> eddiegp: well it's a standard trick sysadmins do all the time when moving things around
[11:03:10] <_joe_> it should be doable with almost-zero downtime
[11:08:33] * eddiegp is out for a while.
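(Editor's aside on the "swap the symlink" step from 10:48:41: one common way to do the swap without a window where the path is missing is to create the new link under a temporary name and rename it over the old one. The sketch below assumes a POSIX filesystem, where rename replaces the link itself atomically. `swap_symlink` and the `.tmp` suffix are illustrative names, not what was actually run on beta.)

```python
import os


def swap_symlink(link_path, new_target):
    """Repoint the symlink at link_path to new_target via an atomic rename.

    Illustrative helper, POSIX only: the new link is built next to the old
    one and then renamed over it, so readers never see a missing path.
    """
    tmp_path = link_path + ".tmp"  # hypothetical temporary name
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)  # clear a leftover from an interrupted run
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)  # rename(2) does not follow the old link
```

(The data sync onto the new partition still has to finish before the swap; this only covers the final pointer change.)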
[11:22:35] (PS3) Thiemo Kreuz (WMDE): Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - https://gerrit.wikimedia.org/r/425515
[11:22:46] (CR) Thiemo Kreuz (WMDE): Replace strpos() with faster substr() comparisons (1 comment) [tools/codesniffer] - https://gerrit.wikimedia.org/r/425426 (owner: Thiemo Kreuz (WMDE))
[11:33:14] (CR) Zfilipin: "@krinkle replacing `grunt webdriver:test` with `npm run selenium` isn't about getting rid of grunt in favor of npm, it's about moving logi" [integration/config] - https://gerrit.wikimedia.org/r/424592 (https://phabricator.wikimedia.org/T179190) (owner: Zfilipin)
[11:38:25] Release-Engineering-Team (Kanban), MW-1.31-release-notes (WMF-deploy-2018-04-03 (1.31.0-wmf.28)), Patch-For-Review, User-zeljkofilipin: Video recording for Selenium tests in Node.js - https://phabricator.wikimedia.org/T179188#4129459 (zeljkofilipin) >>! In T179188#4123859, @zeljkofilipin wrote: >...
[11:42:20] (Abandoned) Zfilipin: WIP killall ffmpeg [integration/config] - https://gerrit.wikimedia.org/r/425788 (https://phabricator.wikimedia.org/T179188) (owner: Zfilipin)
[11:42:51] (Abandoned) Umherirrender: Optimize ShortCastSyntaxSniff sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425871 (owner: Umherirrender)
[11:49:46] (PS4) Umherirrender: Make use of $phpcsFile->eolChar in two sniffs [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE))
[11:49:50] (CR) Umherirrender: [C: 2] Make use of $phpcsFile->eolChar in two sniffs [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE))
[11:50:51] (Merged) jenkins-bot: Make use of $phpcsFile->eolChar in two sniffs [tools/codesniffer] - https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE))
[11:51:16] (CR) jenkins-bot: Make use of $phpcsFile->eolChar in two sniffs [tools/codesniffer] -
https://gerrit.wikimedia.org/r/425513 (owner: Thiemo Kreuz (WMDE))
[11:55:01] (PS2) Umherirrender: Optimize PrefixedGlobalFunctions sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425986 (owner: Thiemo Kreuz (WMDE))
[11:55:05] (CR) Umherirrender: [C: 2] Optimize PrefixedGlobalFunctions sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425986 (owner: Thiemo Kreuz (WMDE))
[11:56:02] (Merged) jenkins-bot: Optimize PrefixedGlobalFunctions sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425986 (owner: Thiemo Kreuz (WMDE))
[11:56:27] (CR) jenkins-bot: Optimize PrefixedGlobalFunctions sniff for performance [tools/codesniffer] - https://gerrit.wikimedia.org/r/425986 (owner: Thiemo Kreuz (WMDE))
[12:06:55] Continuous-Integration-Config, MediaWiki-extensions-GettingStarted, Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129480 (Umherirrender)
[12:32:45] (CR) Hashar: [C: 2] "Definitely. Thank you!"
[integration/quibble] - https://gerrit.wikimedia.org/r/425956 (owner: Legoktm)
[12:33:13] (Merged) jenkins-bot: Use external docker-registry URL in README [integration/quibble] - https://gerrit.wikimedia.org/r/425956 (owner: Legoktm)
[12:36:09] (PS1) Hashar: Add tox to operations/software/wmfmariadbpy [integration/config] - https://gerrit.wikimedia.org/r/426023
[12:38:18] (CR) Hashar: [C: 2] Add tox to operations/software/wmfmariadbpy [integration/config] - https://gerrit.wikimedia.org/r/426023 (owner: Hashar)
[12:38:41] PROBLEM - Puppet errors on deployment-ms-be04 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[12:39:30] (Merged) jenkins-bot: Add tox to operations/software/wmfmariadbpy [integration/config] - https://gerrit.wikimedia.org/r/426023 (owner: Hashar)
[12:54:28] PROBLEM - Puppet errors on deployment-ms-be03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:11:59] PROBLEM - Puppet errors on deployment-jobrunner03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:29:19] Project beta-scap-eqiad build #203673: FAILURE in 5 min 34 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203673/
[13:30:57] PROBLEM - English Wikipedia Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'https://en.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 936 bytes in 0.024 second response time
[13:31:41] PROBLEM - English Wikipedia Mobile Main page on beta-cluster is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on 'https://en.m.wikipedia.beta.wmflabs.org:443/wiki/Main_Page?debug=true' - 955 bytes in 0.096 second response time
[13:32:54] PROBLEM - App Server Main HTTP Response on deployment-mediawiki-07 is CRITICAL: HTTP CRITICAL: HTTP/1.1 404 Not Found - string 'Wikipedia' not found on
'http://en.wikipedia.beta.wmflabs.org:80/wiki/Main_Page?debug=true' - 324 bytes in 0.003 second response time
[13:34:41] _joe_: I suppose that's the "almost"-zero downtime? :)
[13:34:54] <_joe_> yeah :P
[13:34:59] <_joe_> sorry :)
[13:35:01] Project beta-scap-eqiad build #203674: STILL FAILING in 5 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203674/
[13:35:06] <_joe_> it's very slow
[13:35:08] <_joe_> the rsync
[13:35:15] Not a problem.
[13:35:16] <_joe_> and I expected it to work
[13:35:24] <_joe_> and it's not, meh
[13:35:31] <_joe_> on 09 it worked better
[13:35:38] Just, yesterday I assumed you were on it and you weren't, and then beta was down 2h. So I thought I'd better ask ;)
[13:36:02] <_joe_> beta was down why?
[13:36:20] Yesterday, because something was off on -07
[13:36:28] <_joe_> I just added one server to the pool, it wasn't even the main server
[13:36:32] PROBLEM - Puppet errors on deployment-eventlog05 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0]
[13:36:39] _joe_: https://phabricator.wikimedia.org/T188913
[13:36:40] <_joe_> I switched the pointers once I was sure it worked
[13:37:06] <_joe_> that ticket is from march 8?
[13:37:21] Yeah, we stole it yesterday
[13:37:52] See the comments further down
[13:37:58] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-07 is OK: HTTP OK: HTTP/1.1 200 OK - 47363 bytes in 4.648 second response time
[13:38:04] <_joe_> sigh, you know being used to production, where things work correctly...
[13:38:26] <_joe_> (btw the problem was already over since 5 mins, shinken is dumb)
[13:39:01] shinken tests every five minutes or so.
[13:39:10] So it probably started right after the last check
[13:41:13] Yippee, build fixed!
[13:41:14] Project beta-scap-eqiad build #203675: FIXED in 5 min 29 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203675/
[13:41:37] <_joe_> uhm I just noticed in beta
[13:41:46] <_joe_> lrwxrwxrwx 1 mwdeploy mwdeploy 17 Apr 11 19:52 php -> php-1.31.0-wmf.29 is a broken link
[13:42:37] Uhmm yea, because wikiversions should have 'php-master' for everything.
[13:43:12] For whatever reason, it currently doesn't.
[13:43:17] php-master is a valid directory
[13:46:43] At least I think it was that way at some point.
[13:46:48] RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36042 bytes in 5.346 second response time
[13:49:10] <_joe_> ok
[13:49:45] <_joe_> eddiegp: the configuration of the varnish backends is absurd in beta, I agree.
[13:49:52] Project mwext-phpunit-coverage-publish build #3312: FAILURE in 1 min 42 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3312/
[13:49:59] <_joe_> I preferred not to fix it as I supposed people had their reasons
[13:50:08] <_joe_> it's not really my realm
[13:52:25] That's true for most people, unfortunately. beta is seen as mostly releng's realm, but it's not like releng had time/resources spare to spend them on beta - so, the current state.
[13:52:39] At least, not to move forward bigger projects to improve it.
[13:58:18] PROBLEM - Puppet errors on integration-slave-jessie-1003 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[13:59:32] PROBLEM - Puppet errors on integration-slave-jessie-1002 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[14:00:46] eddiegp: beta uses wikiversions-labs.json
[14:00:51] eddiegp: which has "php-master"
[14:01:01] RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47958 bytes in 4.601 second response time
[14:01:48] hashar: Indeed!
So I almost remembered correctly :D [14:02:45] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129688 (10hashar) [14:02:47] 10Continuous-Integration-Config, 10AbuseFilter, 10Upstream: stylelint is just outputting dots and number of errors, making it impossible to fix - https://phabricator.wikimedia.org/T190072#4129686 (10hashar) [14:02:59] <_joe_> eddiegp: I think the root of the evil comes well before not having resources to maintain it [14:03:21] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129480 (10hashar) T190072 is a generic bug for stylelint... [14:03:25] <_joe_> and well before we had a releng team :D [14:05:24] Yeah, I was more talking "why didn't anyone get it out of the bad state it's in" :) [14:09:19] RECOVERY - Puppet staleness on deployment-maps03 is OK: OK: Less than 1.00% above the threshold [3600.0] [14:11:16] Project mwext-phpunit-coverage-publish build #3313: 04STILL FAILING in 1 min 36 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3313/ [14:11:58] RECOVERY - Puppet errors on deployment-jobrunner03 is OK: OK: Less than 1.00% above the threshold [0.0] [14:12:55] Project mwext-phpunit-coverage-publish build #3314: 04STILL FAILING in 1 min 38 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3314/ [14:22:23] Project mwext-phpunit-coverage-publish build #3315: 04STILL FAILING in 1 min 31 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3315/ [14:24:27] Project mwext-phpunit-coverage-publish build #3316: 04STILL FAILING in 2 min 3 sec: 
https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3316/ [14:30:09] <_joe_> anyone has idea about why ferm fails on beta hosts? [14:30:15] <_joe_> before I start debugging [14:33:19] RECOVERY - Puppet errors on integration-slave-jessie-1003 is OK: OK: Less than 1.00% above the threshold [0.0] [14:33:33] thcipriani might be able to help, he tried to debug it yesterday (when it caused the beta cluster to be down) [14:34:18] I guess you've seen https://phabricator.wikimedia.org/T188913#4127842 and his following comments? [14:34:31] I didn't figure out the root cause, it claims to not be able to resolve deployment-prometheus01 but dig works fine [14:35:48] <_joe_> thcipriani: oh I think I know why [14:36:00] ? [14:36:17] <_joe_> dig +short -t AAAA deployment-prometheus01.deployment-prep.eqiad.wmflabs [14:38:03] hashar: You're awesome. [14:38:24] ah, right, resolve is looking for AAAA [14:39:01] Update: I'm disabling nodepool now. It will be off for a while, maybe an hour [14:39:31] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [0.0] [14:45:56] (03CR) 10Jforrester: [C: 031] Run `npm run selenium` instead of `grunt webdriver:test` [integration/config] - 10https://gerrit.wikimedia.org/r/424592 (https://phabricator.wikimedia.org/T179190) (owner: 10Zfilipin) [14:50:27] <_joe_> the issue is conf.d/10_prometheus-nutcracker-exporter:4:&R_SERVICE(tcp, 9191, (@resolve((deployment-prometheus01.deployment-prep.eqiad.wmflabs)) @resolve((deployment-prometheus01.deployment-prep.eqiad.wmflabs), AAAA))); [14:53:38] Shouldn't all instances have both v4 and v6 addresses? [14:54:18] v6 is not supported yet [14:54:48] it's available in prod, but labs doesn't support ipv6 addresses yet [14:54:52] Ah yes. Let me guess, in prod it already is? [14:54:58] Alright :D [14:57:59] We can probably do realm-based both v4&v6 (for prod) or just v4 (for cloud) in profile::prometheus::nutcracker_exporter then?
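(Editor's sketch of the realm-based idea floated just above: ask for AAAA records only where IPv6 exists. This is not the puppet or ferm code; the realm names `production` and `labs`, the helpers `record_types_for_realm` and `resolve_all`, and the stand-in `fake_lookup` with its 192.0.2.10 documentation address are all hypothetical. The lookup is injected so the example does not depend on live DNS, and the failure on an empty answer only mimics the behaviour described in the log.)

```python
from typing import Callable, Dict, List

# Hypothetical realm names; the real puppet realm values are not shown in this log.
RECORD_TYPES_BY_REALM: Dict[str, List[str]] = {
    "production": ["A", "AAAA"],  # prod has IPv6, per the discussion
    "labs": ["A"],                # labs/cloud has no IPv6 yet
}


def record_types_for_realm(realm: str) -> List[str]:
    """Return the DNS record types worth resolving in the given realm."""
    return list(RECORD_TYPES_BY_REALM[realm])


def resolve_all(host: str, realm: str,
                lookup: Callable[[str, str], List[str]]) -> List[str]:
    """Resolve `host` for every record type the realm supports.

    Raises LookupError as soon as one record type returns nothing,
    which is the failure mode reported for the AAAA lookup.
    """
    addresses: List[str] = []
    for rtype in record_types_for_realm(realm):
        found = lookup(host, rtype)
        if not found:
            raise LookupError(f"no {rtype} record for {host}")
        addresses.extend(found)
    return addresses


def fake_lookup(host: str, rtype: str) -> List[str]:
    """Stand-in for DNS: the host has an A record only (made-up address)."""
    records = {
        ("deployment-prometheus01.deployment-prep.eqiad.wmflabs", "A"): ["192.0.2.10"],
    }
    return records.get((host, rtype), [])
```

With the stand-in resolver, the `labs` realm resolves cleanly because only the A record is requested, while the `production` realm raises on the missing AAAA record.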
[15:03:25] <_joe_> eddiegp: it's a bit more than just that [15:03:28] <_joe_> sadly [15:03:38] <_joe_> it's all over the prometheus classe [15:03:41] <_joe_> *classes [15:03:52] <_joe_> I'm trying to find a global fix [15:04:20] Make v6 in cloud work? [15:04:24] * eddiegp runs away [15:05:09] I believe OpenStack recently added support for v6 [15:05:15] though not sure which version. [15:20:49] <_joe_> I monkey patched the ferm issue FTR [15:21:05] <_joe_> but I'll try a better solution on Monday [15:21:43] <_joe_> if no one objects, I'll shut down deployment-mediawiki04 and deployment-mediawiki05 now [15:21:50] <_joe_> shut down, not delete [15:22:21] +1 sounds fine to me [15:24:07] +1 Happy to see them go :) [15:24:46] Except ... how would one restart them? If the answer is "through horizon" then now is probably not the best time to attempt a shutdown. [15:25:26] We should have a chance to restart them in case something goes wrong. [15:27:33] <_joe_> heh you're right [15:27:42] <_joe_> let's wait until Monday [15:37:24] Project mediawiki-core-code-coverage-php7 build #204: 04STILL FAILING in 37 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage-php7/204/ [15:44:19] I've restarted nodepool. I have more upgrades to do but I believe that nodepool is unaffected by the remaining pieces. [16:10:55] Project beta-scap-eqiad build #203692: 04FAILURE in 7 min 16 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203692/ [16:17:56] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129982 (10Umherirrender) 05duplicate>03Open The link...
[16:18:56] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129994 (10hashar) I dont know but T190072 is the generic... [16:19:16] Yippee, build fixed! [16:19:17] Project beta-scap-eqiad build #203693: 09FIXED in 5 min 35 sec: https://integration.wikimedia.org/ci/job/beta-scap-eqiad/203693/ [16:21:20] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4129996 (10Umherirrender) T190072 is closed, I have comme... [16:23:01] 10Continuous-Integration-Config, 10AbuseFilter, 10Upstream: stylelint is just outputting dots and number of errors, making it impossible to fix - https://phabricator.wikimedia.org/T190072#4129999 (10Umherirrender) >>! In T190072#4129805, @Mholloway wrote: > Doesn't fix the issue on Kartographer, either. I s... [16:24:34] Project mediawiki-core-code-coverage build #3443: 04STILL FAILING in 1 hr 24 min: https://integration.wikimedia.org/ci/job/mediawiki-core-code-coverage/3443/ [16:26:19] 10Continuous-Integration-Config, 10AbuseFilter, 10Upstream: stylelint is just outputting dots and number of errors, making it impossible to fix - https://phabricator.wikimedia.org/T190072#4130017 (10Mholloway) >>! In T190072#4129999, @Umherirrender wrote: > I see no failure on the project's patch sets See P... 
[16:30:02] 10Continuous-Integration-Infrastructure (shipyard), 10MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), 10PHP 7.0 support, 10Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#4130022 (10Jdforrester-WMF) [16:30:26] 10Continuous-Integration-Infrastructure (shipyard), 10MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), 10PHP 7.0 support, 10Patch-For-Review: Run MediaWiki tests on PHP 7 - https://phabricator.wikimedia.org/T144962#2615580 (10Jdforrester-WMF) [16:31:21] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown, 10PHP 7.0 support: Make Wikimedia CI run PHP in either PHP 7.0+ or HHVM - https://phabricator.wikimedia.org/T190547#4130032 (10Jdforrester-WMF) [16:31:25] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Release-Engineering-Team (Someday): Get rid of Zend 5.5 tests for wmf branches - https://phabricator.wikimedia.org/T94149#4130031 (10Jdforrester-WMF) [16:32:02] Wikimedia\Rdbms\DBQueryError from line 1453 of /srv/jenkins-workspace/workspace/mediawiki-core-code-coverage/src/includes/libs/rdbms/database/Database.php: A database query error has occurred. Did you forget to run your application's database schema updater after upgrading? [16:32:04] 16:24:31 Query: SELECT page_id,page_namespace,page_title,page_restrictions,page_is_redirect,page_is_new,page_random,page_touched,page_links_updated,page_latest,page_len,page_content_model FROM unittest_page WHERE page_namespace = '1' AND page_title = 'Not_Main_Page' LIMIT 1 [16:32:05] 10Project-Admins: Create Technical Writing Project - https://phabricator.wikimedia.org/T192093#4130033 (10bd808) @srodlund I think @Aklapper has some valid points here, but I can also see the benefit to a consolidated workboard for a group that is trying to selectively attack these problems. I'm wondering if a... 
[16:32:06] 16:24:31 Function: WikiPage::pageData [16:32:08] 16:24:31 Error: 1 no such table: unittest_page [16:32:51] 10Continuous-Integration-Config, 10MediaWiki-General-or-Unknown, 10PHP 7.0 support: Make Wikimedia CI run PHP in either PHP 7.0+ or HHVM - https://phabricator.wikimedia.org/T190547#4076530 (10Jdforrester-WMF) [16:32:58] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure (Little Steps Sprint), 10Release-Engineering-Team (Someday): Get rid of Zend 5.5 tests for wmf branches - https://phabricator.wikimedia.org/T94149#1156282 (10Jdforrester-WMF) [16:33:57] 10Continuous-Integration-Config, 10Release-Engineering-Team (Someday), 10Test-Coverage: Switch MediaWiki coverage job from PHP 5 to PHP 7 - https://phabricator.wikimedia.org/T147778#4130037 (10Jdforrester-WMF) [16:37:09] So the code coverage job uses a table which doesn't exist. not sure where it's supposed to be created? [16:40:17] twentyafterfour: See anomie's comment. The creation happens, but then the tables are dropped because of the connection or something. [16:40:36] oh, temp tables? [16:40:41] "Something or other is making sqlite raise a "database schema has changed" error, which makes MediaWiki drop the DB connection and reconnect, which means all the temporary tables the testing framework had set up are gone." [16:40:46] Yeah. [16:42:31] Weird, well I'm not even sure where to start on that. [16:42:51] Hopefully Brad does. [16:47:42] Thanks James, I would have just been stabbing in the dark with that one anyway [16:55:24] !log cherry-pick https://gerrit.wikimedia.org/r/#/c/426104/ (test 3) [16:55:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [16:57:24] twentyafterfour: Me too, legoktm is the brilliant one, I just happened to read that 10 minutes previously. 
:-) [16:59:03] (03CR) 10Dduvall: WIP: Perform helm deployment in service-pipeline (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: 10Dduvall) [17:02:18] legoktm: Whilst we're waiting for the T191863 fun to be fixed, feel like doing T190548 (or helping me find where to do it myself)? [17:02:19] T190548: Update mediawiki-core-qunit-selenium-jessie/mediawiki-extensions-qunit-jessie jobs from PHP5 to PHP7/HHVM - https://phabricator.wikimedia.org/T190548 [17:02:19] T191863: SearchEngineTest fails during PHPUnit coverage job "no such table: unittest_page" - https://phabricator.wikimedia.org/T191863 [17:04:23] (03PS2) 10Dduvall: WIP: Perform helm deployment in service-pipeline [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) [17:05:48] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team: Request for access to the beta cluster - https://phabricator.wikimedia.org/T190755#4130109 (10chelsyx) Yes I can log in to these instances now. Thanks! @mpopov can you confirm? [17:06:06] Hey, releng -- I'm prepping a couple of new repos (containing old services) for deploy via scap, and I have a couple of questions that I'm hoping someone could answer. Most fundamentally, is this the correct change to make in puppet in order to get things ready for a new service to be scap deployed: https://gerrit.wikimedia.org/r/#/c/426112/ [17:06:30] James_F: https://gerrit.wikimedia.org/r/plugins/gitiles/integration/jenkins/+/master/bin/mw-install-mysql.sh [17:07:08] legoktm: Aha. OK, will give it a whirl. [17:07:11] James_F: oh actually, nope it's just https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/zuul/parameter_functions.py#23 [17:07:52] marlier: looking [17:08:33] legoktm: Ta. 
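(Editor's sketch of the "no such table: unittest_page" failure quoted earlier from anomie's comment: SQLite temporary tables belong to the connection that created them, so a drop-and-reconnect loses them. This is a generic illustration with Python's standard `sqlite3` module, not MediaWiki's test framework; the function name `temp_table_survives_reconnect` and the one-column table layout are made up for the example.)

```python
import os
import sqlite3
import tempfile


def temp_table_survives_reconnect() -> bool:
    """Create a TEMPORARY table, reconnect, and report whether it is still there."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    try:
        # First connection: set up a temporary table, as a test harness might.
        conn = sqlite3.connect(path)
        conn.execute("CREATE TEMPORARY TABLE unittest_page (page_id INTEGER)")
        conn.execute("INSERT INTO unittest_page VALUES (1)")
        conn.commit()
        conn.close()  # simulates the connection being dropped

        # Second connection to the same database file.
        conn = sqlite3.connect(path)
        try:
            conn.execute("SELECT page_id FROM unittest_page LIMIT 1")
            return True
        except sqlite3.OperationalError as err:
            # The temporary table was private to the first connection.
            print(err)
            return False
        finally:
            conn.close()
    finally:
        os.remove(path)
```

The function returns False: the second connection cannot see the table, which matches the symptom in the coverage job once the connection has been re-established.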
[17:08:36] marlier: that looks ok so far, yes [17:10:14] (03PS1) 10Jforrester: Move qunit jobs from Zend PHP 5.5 to PHP 7.0 [integration/config] - 10https://gerrit.wikimedia.org/r/426115 (https://phabricator.wikimedia.org/T190548) [17:10:38] legoktm: Also, given the coverage jobs are broken anyway, should we just do T147778 now? [17:10:38] T147778: Switch MediaWiki coverage job from PHP 5 to PHP 7 - https://phabricator.wikimedia.org/T147778 [17:10:51] twentyafterfour: Sweet, thanks. Since these are services that currently exist (deployed via puppet), my plan is basically this: 1) get this puppet change merged; 2) scap deploy the services from their new repos, without modifying the service config file. This means that scap will restart the services in their current locations, but that's fine. 3) Alter the service config files in puppet so that they point to the new locations; 4) do [17:10:51] one final scap deploy to verify that everything is working. [17:11:09] Does that seem like it makes sense? Anything that's obviously missing? [17:11:59] James_F: I'd like to do one clean run under PHP 5 so we have a benchmark of the coverage difference over the past week (esp w/ Aryeh's work), because PHP 7 counts differently [17:12:00] marlier: seems sensible to me. Nothing obviously missing but it might depend on the specifics of these services. [17:12:24] legoktm: Fair. Also if there's a performance difference it'd be nice to have before/after numbers, especially if they're significant. [17:12:45] twentyafterfour: Cool, thanks. These services happen to be pretty straightforward, so fingers crossed it'll just be smooth... [17:17:00] James_F: I think php7 is twice as fast, it takes like an hour now [17:17:07] Nice. 
[17:20:23] (03PS1) 10Hashar: License under Apache 2.0 [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) [17:22:29] PROBLEM - Puppet errors on deployment-redis02 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:22:30] PROBLEM - Puppet errors on deployment-redis01 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:22:30] PROBLEM - Puppet errors on integration-cumin is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [17:22:49] Project mwext-phpunit-coverage-publish build #3317: 04STILL FAILING in 1 min 49 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3317/ [17:22:59] (03CR) 10Hashar: "Kunal, you did the port of gitchangedinhead to python and I copy pasted it here. I don't think integration/jenkins has any specific licen" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:23:36] 10Gerrit, 10Release-Engineering-Team, 10Operations, 10Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130139 (10Dzahn) [17:23:54] !log Enabling Pipeline Utility Steps plugin in jenkins for changes to service pipeline in https://gerrit.wikimedia.org/r/#/c/425936/ [17:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [17:24:57] (03CR) 10Legoktm: [C: 031] "Fine with me, thanks :)" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:25:26] 10Gerrit, 10Release-Engineering-Team, 10Operations, 10Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130143 (10pmiazga) @Dzahn yes, exactly. I wasn't sure what exactly do I need. II noticed that I'm in `deployers` group and it confused me a lot... 
[17:28:02] PROBLEM - SSH on integration-slave-docker-1012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:28:36] RECOVERY - App Server Main HTTP Response on deployment-mediawiki-09 is OK: HTTP OK: HTTP/1.1 200 OK - 47367 bytes in 7.533 second response time [17:29:03] 10Gerrit, 10Release-Engineering-Team, 10Operations, 10Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130162 (10Dzahn) I _think_ it's that you are missing here: https://gerrit.wikimedia.org/r/#/admin/groups/21,members @greg @demon Can you appr... [17:29:19] (03CR) 10Hashar: "> I prefer GPL on principle, but I'll defer to your choice since you're the main author" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:30:16] PROBLEM - SSH on integration-slave-docker-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:31:23] (03CR) 10Legoktm: [C: 031] "I wrote on the task that GPL v3 is compatible with Apache 2.0.(see https://www.gnu.org/licenses/license-list.html#apache2)" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:31:38] legoktm: tldr: I don't like gpl3 :] [17:31:44] oh? [17:31:51] but maybe we can dual license it [17:32:04] at least the quibble part [17:32:20] then I realized there is code that comes from openstack Zuul, and I went with the easiest path: pick the same license [17:32:27] is there a point/advantage to GPL v3 / Apache dual licensing?
[17:32:29] fair enough [17:32:41] (03CR) 10Thcipriani: [C: 031] "> I wrote on the task that GPL v3 is compatible with Apache 2.0.(see" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:33:17] but maybe we can go with gpl2+ I don't know [17:33:47] I haven't looked a lot at all the madness and picked the easiest path over the GPL2+ :-\ [17:33:52] thcipriani: ^^ :D [17:33:58] anyway. gotta head out for dinner [17:35:12] RECOVERY - SSH on integration-slave-docker-1011 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [17:37:52] RECOVERY - SSH on integration-slave-docker-1012 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [17:48:34] (03CR) 10Hashar: "I am too tired but here is the summary:" [integration/quibble] - 10https://gerrit.wikimedia.org/r/426116 (https://phabricator.wikimedia.org/T192132) (owner: 10Hashar) [17:56:17] PROBLEM - SSH on integration-slave-docker-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [17:58:29] (03CR) 10Legoktm: "Do you want to squash your changes into the original patch?" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/426002 (owner: 10Thiemo Kreuz (WMDE)) [18:00:39] (03PS2) 10Legoktm: Optimize ShortCastSyntax sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425504 (owner: 10Thiemo Kreuz (WMDE)) [18:01:08] RECOVERY - SSH on integration-slave-docker-1011 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [18:01:08] Yippee, build fixed!
[18:01:09] Project mwext-phpunit-coverage-publish build #3318: 09FIXED in 1 min 46 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3318/ [18:02:15] (03CR) 10Legoktm: [C: 032] "I'm not really sure on how far we should be taking these micro optimizations, hardcoding the length is probably OK, but I am slightly unco" [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425504 (owner: 10Thiemo Kreuz (WMDE)) [18:03:16] (03Merged) 10jenkins-bot: Optimize ShortCastSyntax sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425504 (owner: 10Thiemo Kreuz (WMDE)) [18:03:42] (03CR) 10jenkins-bot: Optimize ShortCastSyntax sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425504 (owner: 10Thiemo Kreuz (WMDE)) [18:03:53] (03PS4) 10Legoktm: Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425515 (owner: 10Thiemo Kreuz (WMDE)) [18:05:03] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130293 (10zeljkofilipin) Sor... 
[18:05:49] (03CR) 10Legoktm: [C: 032] "I think the auto fix part potentially reenabling disabled code is OK - we still state that auto fixes do need to be manually reviewed and " [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425515 (owner: 10Thiemo Kreuz (WMDE)) [18:06:36] (03Merged) 10jenkins-bot: Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425515 (owner: 10Thiemo Kreuz (WMDE)) [18:07:00] (03CR) 10jenkins-bot: Fix IllegalSingleLineComment sniff fix for unclosed comments [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425515 (owner: 10Thiemo Kreuz (WMDE)) [18:07:17] PROBLEM - SSH on integration-slave-docker-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:10:57] (03PS2) 10Legoktm: Optimize PHPUnitClassUsage sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425505 (owner: 10Thiemo Kreuz (WMDE)) [18:11:01] (03CR) 10Legoktm: [C: 032] Optimize PHPUnitClassUsage sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425505 (owner: 10Thiemo Kreuz (WMDE)) [18:11:58] (03Merged) 10jenkins-bot: Optimize PHPUnitClassUsage sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425505 (owner: 10Thiemo Kreuz (WMDE)) [18:12:21] (03CR) 10jenkins-bot: Optimize PHPUnitClassUsage sniff for performance [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425505 (owner: 10Thiemo Kreuz (WMDE)) [18:12:54] PROBLEM - SSH on integration-slave-docker-1015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [18:14:45] (03CR) 10Legoktm: [C: 04-1] Enable voting ext-mw-selenium-node-composer-je (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/424277 (owner: 10Addshore) [18:15:00] (03PS2) 10Legoktm: Whitelist HunterH [integration/config] - 10https://gerrit.wikimedia.org/r/423739 (owner: 10Florianschmidtwelzow) [18:15:02] (03PS4) 10Legoktm: Add AndreG-P to the jenkins whitelist [integration/config] 
- 10https://gerrit.wikimedia.org/r/425841 (owner: 10Physikerwelt) [18:15:04] (03PS5) 10Legoktm: Make mwext-PoolCounter-rake-docker non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/425337 (owner: 10Umherirrender) [18:15:06] (03PS2) 10Legoktm: Add jamesmontalvo3 to the CI whitelist [integration/config] - 10https://gerrit.wikimedia.org/r/425944 (owner: 10Jamesmontalvo3) [18:15:21] (03CR) 10Legoktm: [C: 032] Whitelist HunterH [integration/config] - 10https://gerrit.wikimedia.org/r/423739 (owner: 10Florianschmidtwelzow) [18:15:27] (03CR) 10Legoktm: [C: 032] Add AndreG-P to the jenkins whitelist [integration/config] - 10https://gerrit.wikimedia.org/r/425841 (owner: 10Physikerwelt) [18:15:52] (03CR) 10Legoktm: [C: 032] ":(" [integration/config] - 10https://gerrit.wikimedia.org/r/425337 (owner: 10Umherirrender) [18:15:59] (03CR) 10Legoktm: [C: 032] Add jamesmontalvo3 to the CI whitelist [integration/config] - 10https://gerrit.wikimedia.org/r/425944 (owner: 10Jamesmontalvo3) [18:16:55] (03Merged) 10jenkins-bot: Whitelist HunterH [integration/config] - 10https://gerrit.wikimedia.org/r/423739 (owner: 10Florianschmidtwelzow) [18:16:57] (03Merged) 10jenkins-bot: Add AndreG-P to the jenkins whitelist [integration/config] - 10https://gerrit.wikimedia.org/r/425841 (owner: 10Physikerwelt) [18:17:16] (03Merged) 10jenkins-bot: Make mwext-PoolCounter-rake-docker non-voting [integration/config] - 10https://gerrit.wikimedia.org/r/425337 (owner: 10Umherirrender) [18:17:21] (03Merged) 10jenkins-bot: Add jamesmontalvo3 to the CI whitelist [integration/config] - 10https://gerrit.wikimedia.org/r/425944 (owner: 10Jamesmontalvo3) [18:20:57] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130361 (10zeljkofilipin) Whe... 
[18:22:43] RECOVERY - SSH on integration-slave-docker-1015 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0) [18:24:57] (03PS3) 10Legoktm: Minor performance optimizations to the UnusedUseStatement sniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425429 (owner: 10Thiemo Kreuz (WMDE)) [18:25:01] (03CR) 10Legoktm: [C: 032] Minor performance optimizations to the UnusedUseStatement sniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425429 (owner: 10Thiemo Kreuz (WMDE)) [18:25:22] 10Release-Engineering-Team (Kanban), 10MediaWiki-extensions-PoolCounter, 10Patch-For-Review, 10Wikimedia-log-errors (Jenkins Failure): Fix tests of PoolCounter extension - https://phabricator.wikimedia.org/T178517#4130379 (10Umherirrender) This test is now non-voting and removed from gate-and-submit. Pleas... [18:25:59] (03Merged) 10jenkins-bot: Minor performance optimizations to the UnusedUseStatement sniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425429 (owner: 10Thiemo Kreuz (WMDE)) [18:26:28] (03CR) 10jenkins-bot: Minor performance optimizations to the UnusedUseStatement sniff [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/425429 (owner: 10Thiemo Kreuz (WMDE)) [18:28:15] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130383 (10zeljkofilipin) It'... [18:28:49] (03CR) 10Legoktm: [C: 04-1] "This needs a rebase, sorry for letting it sit. Might also be easier to squash Thiemo's follow-up into this patch." 
[tools/codesniffer] - 10https://gerrit.wikimedia.org/r/420159 (https://phabricator.wikimedia.org/T182057) (owner: 10MaxSem) [18:32:03] 10Gerrit, 10Release-Engineering-Team, 10Operations, 10Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130418 (10MaxSem) >>! In T192159#4130118, @Jdlrobson wrote: > @Niharika @MaxSem I don't suppose either of you would be able to help get him up... [18:33:03] (03PS1) 10Legoktm: Release 18.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/426133 [18:35:05] (03CR) 10Legoktm: [C: 032] Release 18.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/426133 (owner: 10Legoktm) [18:35:53] (03Merged) 10jenkins-bot: Release 18.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/426133 (owner: 10Legoktm) [18:36:29] (03CR) 10jenkins-bot: Release 18.0.0 [tools/codesniffer] - 10https://gerrit.wikimedia.org/r/426133 (owner: 10Legoktm) [18:38:43] Project mwext-phpunit-coverage-publish build #3322: 04FAILURE in 1 min 35 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3322/ [18:39:23] PROBLEM - Puppet errors on deployment-mx is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [18:41:16] 10Beta-Cluster-Infrastructure: Could not find class role::etcd::common for deployment-conf03.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T168520#4130435 (10EddieGP) 05Open>03Resolved [18:41:18] 10Beta-Cluster-Infrastructure, 10RelEng-Archive-FY201718-Q1, 10Citoid, 10VisualEditor, and 2 others: Beta cluster varnish fails VCL compilation because citoid.wmflabs.org does not resolve - https://phabricator.wikimedia.org/T168519#4130436 (10EddieGP) [18:42:29] Yippee, build fixed! 
[18:42:30] Project mwext-phpunit-coverage-publish build #3323: 09FIXED in 1 min 51 sec: https://integration.wikimedia.org/ci/job/mwext-phpunit-coverage-publish/3323/ [18:50:37] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130471 (10zeljkofilipin) Loo... [18:52:36] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's node selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130481 (10zeljkofilipin) Hav... [18:54:04] 10Continuous-Integration-Infrastructure, 10Release-Engineering-Team (Kanban), 10Lexicographical data, 10Wikidata, and 2 others: MediaWiki core's selenium tests flaky when run as part of mwext-mw-selenium-node-composer-jessie job - https://phabricator.wikimedia.org/T191537#4130488 (10zeljkofilipin) [19:03:44] (03PS3) 10Dduvall: WIP: Perform helm deployment in service-pipeline [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) [19:05:51] (03CR) 10Dduvall: "thcipriani and I got it running! https://integration.wikimedia.org/ci/blue/organizations/jenkins/service-pipeline-test-only-debug/detail/s" [integration/config] - 10https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935) (owner: 10Dduvall) [19:08:06] 10Continuous-Integration-Config, 10MediaWiki-extensions-GettingStarted, 10Wikimedia-log-errors (Jenkins Failure): npm test of GettingStarted is failing due to stylelint just outputs dots and number of errors - https://phabricator.wikimedia.org/T192146#4130532 (10Umherirrender) 05Open>03Resolved Failing l... 
[19:17:05] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#4130565 (Umh...
[19:17:55] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-General-or-Unknown, Tracking, Wikimedia-log-errors (Jenkins Failure): Let ApiDocumentationTest structure test pass on all repos - https://phabricator.wikimedia.org/T154838#4130568 (Umherirrender)
[19:18:27] Continuous-Integration-Config, RelEng-Archive-FY201718-Q1, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): LinkSuggest2 test failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T155773#4130569 (Umherirrender)
[19:19:28] Continuous-Integration-Infrastructure, Release-Engineering-Team (Kanban), Jenkins, Wikimedia-log-errors (Jenkins Failure): Install zip extension for CI - https://phabricator.wikimedia.org/T179772#4130574 (Umherirrender)
[19:19:47] RelEng-Archive-FY201718-Q1, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): PaginateText extension: Tests failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154935#4130575 (Umherirrender)
[19:19:53] Continuous-Integration-Config, RelEng-Archive-FY201718-Q1, MediaWiki-Core-Tests, MediaWiki-extensions-Other, and 2 others: PagesList tests failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154930#4130576 (Umherirrender)
[19:20:08] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): GooglePlaces tests failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154848#4130577 (Umherirrender)
[19:20:39] Continuous-Integration-Config, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): FlickrAPI test failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154847#4130579 (Umherirrender)
[19:20:41] Continuous-Integration-Config, RelEng-Archive-FY201718-Q1, Brickimedia, MediaWiki-Core-Tests, and 2 others: Skin Refreshed sub repo does not handled in test config - https://phabricator.wikimedia.org/T154806#4130580 (Umherirrender)
[19:21:49] Continuous-Integration-Config, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): FlickrAPI test failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154847#2926008 (Umherirrender)
[19:21:51] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-extensions-Other, Patch-For-Review, Wikimedia-log-errors (Jenkins Failure): GooglePlaces tests failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154848#2926038 (Umherirrender)
[19:21:54] Continuous-Integration-Config, RelEng-Archive-FY201718-Q1, MediaWiki-Core-Tests, MediaWiki-extensions-Other, and 2 others: PagesList tests failing due to missing files located in sub repo - https://phabricator.wikimedia.org/T154930#4130599 (Umherirrender)
[19:21:57] Continuous-Integration-Config, MediaWiki-Core-Tests, MediaWiki-General-or-Unknown, Tracking, Wikimedia-log-errors (Jenkins Failure): Let ApiDocumentationTest structure test pass on all repos - https://phabricator.wikimedia.org/T154838#4130602 (Umherirrender)
[19:22:02] Continuous-Integration-Config, RelEng-Archive-FY201718-Q1, Brickimedia, MediaWiki-Core-Tests, and 2 others: Skin Refreshed sub repo does not handled in test config - https://phabricator.wikimedia.org/T154806#2924515 (Umherirrender)
[19:22:07] Continuous-Integration-Config, TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#4130584 (Umherirrender)
[19:26:39] Continuous-Integration-Config, TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#4130621 (Umherirrender) I have marked all sub tasks with "Wikimedia-log-errors (Jenkins Failure)" https://phabricator.wikimedia.org/project/profile/3298/ Failures o...
[19:41:50] Eurgh.
[19:42:11] I really don't like mixing up CI issues ("Jenkins Failure") with production issues ("Wikimedia-log-errors").
[19:42:17] Was there a task about this?
[19:43:49] (Draft2) Zoranzoki21: Edit Project Config [extensions/RandomArea] (refs/meta/config) - https://gerrit.wikimedia.org/r/426156
[19:44:12] (CR) Zoranzoki21: [C: 2] Edit Project Config [extensions/RandomArea] (refs/meta/config) - https://gerrit.wikimedia.org/r/426156 (owner: Zoranzoki21)
[19:44:19] (CR) Zoranzoki21: [V: 2 C: 2] Edit Project Config [extensions/RandomArea] (refs/meta/config) - https://gerrit.wikimedia.org/r/426156 (owner: Zoranzoki21)
[19:44:58] James_F: maybe open a task and CC Umherirrender? :-/
[19:45:19] or comment on https://phabricator.wikimedia.org/T134090
[19:45:33] (PS3) Umherirrender: Use composer unit tests for BlueSpiceAbout [integration/config] - https://gerrit.wikimedia.org/r/417970
[19:45:40] OK, will do.
[19:46:17] Krinkle created it - https://phabricator.wikimedia.org/project/manage/3298/
[19:46:58] Continuous-Integration-Config, TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#4130668 (Jdforrester-WMF) From IRC: ``` 12:41:50 Eurgh. 12:42:11 I really don't like mixing up CI issues ("Jenkins Failure") with production iss...
[19:47:18] * James_F plays the grumpy old man.
[19:48:15] PROBLEM - SSH on integration-slave-docker-1014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:49:41] legoktm: Jenkins-Failure isn't just for any random test that fails in a random repo
[19:50:11] it's specifically for cases where a job is failing in a way that is affecting other extensions.
[19:50:23] e.g. because a commit was merged that is intermittently failing and passed on the merge
[19:50:53] or because an extension made core tests fail in a way that affects core gerrit patches but somehow the extension patch succeeded and is now breaking master.
[19:51:32] I created it because we had a bunch of things constantly breaking mediawiki/core master due to bad tests in extensions.
[19:51:39] I suppose it's a bit badly defined.
[19:53:08] RECOVERY - SSH on integration-slave-docker-1014 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[19:53:13] Krinkle: Yeah. That'd be "jenkins-failure-breaks-other-repos" not "hey this random repo sucks" >;-)
[19:56:29] I've updated the description to better fit the scope of the parent tag Wikimedia-log-errors.
[19:56:45] "This tag is used to track issues with Jenkins jobs that are failing due to the current master branch of a Wikimedia-deployed repository having reached a state that is not consistently passing its own tests."
[19:56:50] Thoughts?
[19:57:17] Most people will never see the description, only the name.
[19:57:27] So the name needs to be really, clearly unambiguous.
[19:57:38] Name it jenkins-failure-that-breaks-other-repos-but-not-just-some-random-failure-in-some-random-repo-no-really-dont-put-these-here :P
[19:57:40] And even then it's not a wikimedia-production-log-error.
[19:57:49] MediaWiki-Codesniffer: Handle traits in MediaWiki.Commenting.PhpunitAnnotations - https://phabricator.wikimedia.org/T191046#4130693 (Umherirrender) Open>Resolved p:Triage>Normal a:thiemowmde Fixed by https://gerrit.wikimedia.org/r/#/c/423953/ without changing/apply the naming convention
[19:59:27] More seriously, I think most tags are ambiguous if you just look at the name and don't read the description. The best you can do is to have a clear description and remove the tag when it's mis-used.
[20:10:14] James_F: Well, it's got the parent name in the tag as well
[20:10:19] "Wikimedia-log-errors (Jenkins Failure)"
[20:15:01] Gerrit, Developer-Relations, Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#4130724 (srishakatux)
[20:20:27] Gerrit, Developer-Relations, Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#4130757 (hashar)
[20:20:30] Continuous-Integration-Infrastructure, MediaWiki-extensions-SecureSessions, Patch-For-Review: Secure Sessions ships with php-geoip, but test infrastructure has it already compiled, which gives failures - https://phabricator.wikimedia.org/T157814#4130761 (Krinkle) Untagging, SecureSessions is not [WMF...
[20:21:07] Gerrit, Developer-Relations, Developer-Wishlist (2017): Add a welcome bot to Gerrit for first time contributors - https://phabricator.wikimedia.org/T73357#745463 (hashar) sorry I messed it up :( Feel free to keep this or the other task
[20:25:09] Continuous-Integration-Config, TestMe: fix or mark as inactive extensions currently failing CI - https://phabricator.wikimedia.org/T134090#4130790 (Krinkle) @Umherirrender I've untagged most of them because: 1) That tag is only for branches of repos that are WMF-deployed and block or otherwise affect Wi...
[20:37:58] Krinkle: Yeah.
[20:41:41] PROBLEM - Puppet errors on deployment-mediawiki04 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [0.0]
[20:42:35] Oh shut up shinken!
[20:44:51] AaronSchulz: Error: Could not start Service[mcrouter.service]: Execution of '/usr/sbin/service mcrouter.service start' returned 6: Failed to start mcrouter.service.service: Unit mcrouter.service.service failed to load: No such file or directory.
[20:44:56] On deployment-mediawiki04
[20:46:58] eddiegp: yep, fixing
[20:47:17] Thanks.
[20:56:40] RECOVERY - Puppet errors on deployment-mediawiki04 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:23:22] (CR) Legoktm: [C: 1] "This is probably fine, we'll want to deploy this in a low CI traffic time period to make sure it works as expected." [integration/config] - https://gerrit.wikimedia.org/r/426115 (https://phabricator.wikimedia.org/T190548) (owner: Jforrester)
[21:23:53] legoktm: Like… now? :-)
[21:24:30] legoktm: Only half a dozen patches landing right now.
[21:25:17] I was thinking more like 11pm at night when there are none
[21:25:35] also libraryupgrader is just waiting for the queue to be clear before pushing more jobs
[21:35:39] (PS1) Hashar: Relay chromedriver stderr at WARNING level [integration/quibble] - https://gerrit.wikimedia.org/r/426242
[21:39:15] PROBLEM - SSH on integration-slave-docker-1011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[21:39:24] (CR) jerkins-bot: [V: -1] Relay chromedriver stderr at WARNING level [integration/quibble] - https://gerrit.wikimedia.org/r/426242 (owner: Hashar)
[21:41:02] Gerrit, Release-Engineering-Team, Operations, Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130954 (Niharika) I'll be happy to help. :)
[21:44:08] RECOVERY - SSH on integration-slave-docker-1011 is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u4 (protocol 2.0)
[21:45:40] Gerrit, Release-Engineering-Team, Operations, Ops-Access-Requests: Requesting access to deployment for pmiazga - https://phabricator.wikimedia.org/T192159#4130960 (Jdlrobson) Thanks both <3
[21:48:01] (PS4) Dduvall: WIP: Perform helm deployment in service-pipeline [integration/config] - https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935)
[22:07:23] (Draft2) Zoranzoki21: Add RandomPages extension in zuul [integration/config] - https://gerrit.wikimedia.org/r/426255
[22:08:51] apparently you can set phab into read only mode....
[22:08:59] a mode i never knew existed heh
[22:12:34] (PS5) Dduvall: WIP: Perform helm deployment in service-pipeline [integration/config] - https://gerrit.wikimedia.org/r/425936 (https://phabricator.wikimedia.org/T188935)
[22:14:47] (Draft2) Zoranzoki21: Edit Project Config [extensions/RandomPages] (refs/meta/config) - https://gerrit.wikimedia.org/r/426261
[22:14:52] (CR) Zoranzoki21: [V: 2 C: 2] Edit Project Config [extensions/RandomPages] (refs/meta/config) - https://gerrit.wikimedia.org/r/426261 (owner: Zoranzoki21)
[22:43:37] (PS1) Hashar: Fix up hhvm server logging [integration/quibble] - https://gerrit.wikimedia.org/r/426280
[22:46:58] (CR) jerkins-bot: [V: -1] Fix up hhvm server logging [integration/quibble] - https://gerrit.wikimedia.org/r/426280 (owner: Hashar)
[22:50:28] (PS2) Hashar: Fix up hhvm server logging [integration/quibble] - https://gerrit.wikimedia.org/r/426280
[22:52:02] yeah friday night threads deadlock is not for me :D
[22:52:07] good week-end
[22:52:31] (CR) Hashar: "It has a deadlock somehow :-(" [integration/quibble] - https://gerrit.wikimedia.org/r/426242 (owner: Hashar)
[22:53:46] (CR) jerkins-bot: [V: -1] Fix up hhvm server logging [integration/quibble] - https://gerrit.wikimedia.org/r/426280 (owner: Hashar)
[23:04:34] Beta-Cluster-Infrastructure, Release-Engineering-Team, Puppet: Long-lived cherry-picks on deployment-puppetmaster02.deployment-prep.eqiad.wmflabs - https://phabricator.wikimedia.org/T191294#4131090 (EddieGP) Open>declined wontfix until {T135427}. We did, do and will just ignore that check, or...
[23:10:10] Beta-Cluster-Infrastructure, Release-Engineering-Team (Next): Provide a version of frwiki on Beta Cluster / staging - https://phabricator.wikimedia.org/T166290#3291171 (EddieGP) Since {T188288} frwiki already is part of our unified letsencrypt cert. JFTR to whoever will tackle this task.
[23:10:23] Continuous-Integration-Config, Release-Engineering-Team (Someday), Test-Coverage: Switch MediaWiki coverage job from PHP 5 to PHP 7 - https://phabricator.wikimedia.org/T147778#4131098 (Jdforrester-WMF)
[23:13:58] Beta-Cluster-Infrastructure, Wikidata: Domain 'sdwiki' is not recognized. - https://phabricator.wikimedia.org/T189493#4043275 (EddieGP) I poked WMDE about what I found this morning on IRC: > 2018-04-13 09:05:55 eddiegp Hey, how is the wb_changes_dispatch table being populated? > 2018-04-13 09:06:24 eddi...
[23:16:02] Beta-Cluster-Infrastructure, MediaWiki-extensions-ThrottleOverride: Deploy ThrottleOverride to beta cluster - https://phabricator.wikimedia.org/T182161#4131113 (EddieGP) p:Triage>Low
[23:16:26] Beta-Cluster-Infrastructure, cloud-services-team, Puppet: labs-puppetmaster/Labs Puppetmaster HTTPS is UNKNOWN since [...] - https://phabricator.wikimedia.org/T191553#4131114 (EddieGP) p:Triage>Low
[23:16:47] Beta-Cluster-Infrastructure, Puppet: deployment-secureredirexperiment puppet error - https://phabricator.wikimedia.org/T191663#4131115 (EddieGP) p:Triage>Normal
[23:17:07] Beta-Cluster-Infrastructure, Puppet: deployment-eventlog05 puppet errors - https://phabricator.wikimedia.org/T191109#4131116 (EddieGP) p:Triage>Normal
[23:17:15] Beta-Cluster-Infrastructure: Beta: acme-setup failing in beta deployment-cache-upload04 - https://phabricator.wikimedia.org/T178404#4131117 (EddieGP) p:Triage>Normal
[23:25:51] Beta-Cluster-Infrastructure, Operations, Puppet: Make deployment-prep puppetmaster more similar to Production puppetmaster - https://phabricator.wikimedia.org/T146627#4131124 (EddieGP)
[23:25:53] Beta-Cluster-Infrastructure, Patch-For-Review: On beta nutcracker::verbosity default to 5 while prod uses 4, fills disk with a lot of log spam - https://phabricator.wikimedia.org/T136078#4131125 (EddieGP)
[23:25:55] Beta-Cluster-Infrastructure: deployment-cache-upload04/text04 Could not find data item cache::cluster in hiera no default supplied at /etc/puppet/modules/role/manifests/cache/base.pp:17 - https://phabricator.wikimedia.org/T136077#4131126 (EddieGP)
[23:25:57] Beta-Cluster-Infrastructure, Cloud-Services, Operations, Puppet: Implement role based hiera lookups for labs - https://phabricator.wikimedia.org/T120165#4131122 (EddieGP) Open>declined I agree with the previous comments. Horizons prefix functionality seems to cover about everything this w...
[23:32:43] Beta-Cluster-Infrastructure, Release-Engineering-Team (Next): Provide a version of frwiki on Beta Cluster / staging - https://phabricator.wikimedia.org/T166290#4131134 (EddieGP) p:Triage>Low