[00:00:02] Any word from the QA engineers? [00:00:07] should I roll back commons to wmf.8 and everything else to wmf.10? [00:00:15] no I haven't heard anything [00:00:29] I believe they also weren't notified last week that wmf.10 was going to group1 on Tuesday [00:00:54] have they been asked to do group1 testing today? [00:01:33] if not, might want to push back in that case (and send e-mail now ahead of tomorrow) so that we can get all clears tomorrow mid-day [00:02:01] ok [00:02:17] I told people to do group1 testing today. [00:02:23] But they may have been busy. :-) [00:03:01] Krinkle: what about commons, should it be rolled back to wmf.8 for now due to T226448 [00:03:02] T226448: Fatal logged after renaming files: "LocalFile.php: Call to a member function purgeEverything() on boolean" - https://phabricator.wikimedia.org/T226448 [00:03:29] Going forward I think the idea of QA sign off might be interesting to look for in general, for two reasons 1) to reduce stress (e.g. I have to test now or the train will roll forward without me), and 2) so that if people are absent or things happen like offsites or vacation, that there is a signal somewhere that asks for the sign off that can then be delegated accordingly (or knowingly ignored instead of unknowingly). [00:03:41] Might also help move the train faster mid-long term if sign off happens within a day. [00:03:53] e.g. might not need a whole day each time. [00:04:17] twentyafterfour: I don't know. Multimedia should decide principally. [00:04:39] but it's blurry/fuzzy [00:05:03] could roll back to be safe, given we won't learn anything new between now and tomorrow anyhow. 
[00:06:00] gotta go now [00:06:01] o/ [00:06:04] thanks Krinkle [02:11:45] 10Diffusion, 10Release-Engineering-Team (Kanban), 10Operations, 10Packaging, and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10mmodell) So I finally got a chance to test this, I can confirm that my patched sshd bi... [02:13:42] 10Diffusion, 10Release-Engineering-Team (Kanban), 10Operations, 10Packaging, and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10mmodell) a:05mmodell→03None [02:27:18] PROBLEM - Free space - all mounts on deployment-fluorine02 is CRITICAL: CRITICAL: deployment-prep.deployment-fluorine02.diskspace._srv.byte_percentfree (<22.22%) [03:01:19] Project mwcore-phpunit-coverage-master build #9: 04FAILURE in 1 min 18 sec: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/9/ [03:24:23] uhoh [03:24:40] 23:01:09 npm WARN tarball tarball data for webdriverio@4.12.0 (sha1-40De8nIYPIFopN0LOCMi+de+4Q0=) seems to be corrupted. Trying one more time. [06:52:19] RECOVERY - Free space - all mounts on deployment-fluorine02 is OK: OK: All targets OK [07:39:22] This is confusing--testwiki and commonswiki both report that they're on 1.34.0-wmf.10, but according to the roadmap and SAL logs, they should be on wmf.8 [07:40:35] Reedy: ^ do you know why this would be? [07:43:16] wikiversions.json also shows that groups 0 and 1 are on wmf.10 [07:44:15] yesterday the train rolled for groups 0 and 1 [07:44:30] this was to cover last week's train being blocked [07:45:06] https://tools.wmflabs.org/versions/ [07:46:41] Thanks for the sanity check. I'm reading T220735 now, glad to see this was intentional. 
[07:46:42] T220735: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 [07:48:49] no worries [07:49:18] apergos: One more thing I don't understand (it's a long list :p), I see this note in phabricator, > Mentioned in SAL (#wikimedia-operations) [2019-06-24T17:53:12Z] rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.10 refs T220735 [07:49:34] I suppose that was the train deployment. [07:49:41] awight: yes [07:49:48] But I cannot find that message in the wiki SAL [07:49:57] (08:53:12 PM) logmsgbot: !log twentyafterfour@deploy1001 rebuilt and synchronized wikiversions files: group0 wikis to 1.34.0-wmf.10 refs T220735 [07:50:04] (08:55:21 PM) stashbot: twentyafterfour@deploy1001: Failed to log message to wiki. Somebody should check the error logs. [07:50:09] so yeah it didn't make it in [07:50:18] awight: sometimes it fails to write to the wiki SAL due to some timeout or something [07:51:46] ah sorry for the gratuitous ping there twenty afterfour [07:52:00] fwiw, I sent a train status update email this past afternoon to wikitech-l, as is customary [07:52:05] apergos: no problem :) [07:52:11] I guess people use SAL in different ways, but for my lazy style it seems important that we can trust the SAL. Mind if I reopen the task to fix this? 
[07:52:34] fwiw, I see https://phabricator.wikimedia.org/T218708 which was oddly closed as a duplicate of https://phabricator.wikimedia.org/T218608 [07:52:56] hmm [07:53:22] awight: a simple auto-retry from logmsgbot might do the trick [07:53:33] wait 30 secs and retry [07:53:33] sure [07:53:34] er stashbot actually [07:54:24] +1 or some other transactional thing [07:54:29] when I see the "Failed to log message to wiki" error I usually just repeat the !log command but I don't always see it [07:54:32] Now I know to only trust https://tools.wmflabs.org/sal/production though [07:55:16] it was pretty busy in the channel yesterday to see stuff like that [07:55:18] awight: https://tools.wmflabs.org/versions/ is the most user-friendly place to keep tabs on production version status and the SAL [07:55:40] that is quite nice, thanks for the tips [07:56:05] Sorry about the early-morning pings! [07:56:12] np [08:04:06] (03PS1) 10Elukey: Archive the cdh puppet submodule [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) [08:08:08] 10Diffusion, 10Release-Engineering-Team (Kanban), 10Operations, 10Packaging, and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10MoritzMuehlenhoff) I'll file a bug against the Debian OpenSSH package, this seems like... [08:13:26] 10Release-Engineering-Team, 10Stashbot: Make stashbot robust to occasional failure when writing to SAL - https://phabricator.wikimedia.org/T226475 (10awight) [08:16:46] 10Diffusion, 10Release-Engineering-Team (Kanban), 10Operations, 10Packaging, and 2 others: Cannot connect to vcs@git-ssh.wikimedia.org (since move from phab1001 to phab1003) - https://phabricator.wikimedia.org/T224677 (10mmodell) This is causing significant inconvenience as we have some repositories which... 
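The auto-retry idea floated above ("wait 30 secs and retry") could look roughly like the sketch below. This is only an illustration under stated assumptions: `log_with_retry` and `write_entry` are hypothetical names, not stashbot's actual code or API, and the three-attempt limit is an arbitrary choice; only the 30-second wait comes from the conversation.

```python
import time


def log_with_retry(write_entry, message, attempts=3, delay_seconds=30,
                   sleep=time.sleep):
    """Call write_entry(message), retrying after a pause if it raises.

    write_entry is a hypothetical callable that writes one SAL entry and
    raises an exception on failure (for example, on a timeout).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return write_entry(message)
        except Exception as error:
            # Remember the failure; pause only if another attempt remains.
            last_error = error
            if attempt < attempts:
                sleep(delay_seconds)
    # Every attempt failed: surface the last error so it can be reported.
    raise last_error
```

Passing `sleep` as a parameter keeps the wait replaceable in tests; a real bot would more likely schedule the retry asynchronously than block on it.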
[08:17:12] RECOVERY - Puppet errors on integration-slave-jessie-1004 is OK: OK: Less than 1.00% above the threshold [2.0] [08:18:54] RECOVERY - Puppet errors on integration-slave-jessie-1001 is OK: OK: Less than 1.00% above the threshold [2.0] [08:25:54] RECOVERY - Puppet errors on integration-slave-jessie-1002 is OK: OK: Less than 1.00% above the threshold [2.0] [09:11:49] (03CR) 10Hashar: Clone requirements from ext dependencies (034 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [09:27:29] Project beta-code-update-eqiad build #252280: 04FAILURE in 4 min 28 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/252280/ [09:31:52] (03CR) 10Hashar: [C: 04-1] Clone requirements from ext dependencies (037 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [09:32:24] Puppet errors on integration-slave-jessie-1004 is OK < who knows how it happened [09:34:23] Yippee, build fixed! 
[09:34:23] Project beta-code-update-eqiad build #252281: 09FIXED in 1 min 22 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/252281/ [09:41:22] (03CR) 10Awight: Clone requirements from ext dependencies (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [09:48:41] (03CR) 10Awight: Clone requirements from ext dependencies (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [09:50:01] (03PS1) 10Hashar: Extract isCoreOrVendor and isExtOrSkin to util [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 [09:55:21] 10Release-Engineering-Team (Kanban), 10Release Pipeline, 10Core Platform Team (Extension Management (TEC13)), 10Core Platform Team Kanban (Doing), and 2 others: Determine a standard way of installing MediaWiki lib/extension dependencies within containers - https://phabricator.wikimedia.org/T193824 (10daniel... 
[09:58:13] (03CR) 10Hashar: [C: 04-1] Clone requirements from ext dependencies (032 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [09:58:30] (03PS6) 10Hashar: Clone requirements from ext dependencies [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) [10:01:34] (03CR) 10Hashar: "That one should be easy to review :]" (032 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 (owner: 10Hashar) [10:02:07] (03CR) 10Hashar: [C: 04-1] "Still need to try it out and write some more high level tests I guess :-\" [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [10:06:14] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10Operations, 10SRE-Access-Requests: Request access to deployment cluster for Alaa Sarhan - https://phabricator.wikimedia.org/T223698 (10jbond) @greg are you able to approve this request? [10:49:37] (03CR) 10Awight: [C: 03+1] "would merge (but haven't the permissions)" [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 (owner: 10Hashar) [11:09:30] legoktm: hey, would you like to join #freenode_#wikimedia-codehealth:matrix.org ? Occasionally there is some discussion there related to the unit tests separation project. [11:10:40] awight: I will grant you CR+2 on integration/quibble [11:11:32] mszabo-wikia: That looks like such a munged channel :P [11:12:01] yeah, I mean #wikimedia-codehealth [11:12:11] this client is... alpha software [11:12:23] alpha, as in https://twitter.com/DEVOPS_BORAT/status/212664225754132480 [11:12:42] awight: I think you should now be able to +2 https://gerrit.wikimedia.org/r/#/c/integration/quibble/+/518950/ :) [11:19:32] (03CR) 10Hashar: "We also need to remove the Jenkins jobs. 
Can be done by removing cdh/cdh4 in jjb/operations-puppet.yaml:" (032 comments) [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) (owner: 10Elukey) [11:23:07] (03PS2) 10Elukey: Archive the cdh/cdh4 puppet submodule [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) [11:24:08] (03CR) 10Elukey: "Thanks! I didn't remove them because I saw that other jobs were still there:" [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) (owner: 10Elukey) [11:33:34] (03CR) 10Hashar: Minor copyediting (033 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/510246 (owner: 10Awight) [11:38:22] (03CR) 10Hashar: [C: 03+2] "> we have only zookeeper and nginx left, so probably those will need to be cleaned up as well?" [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) (owner: 10Elukey) [11:38:51] elukey: and I am merging the archival of cdh/cdh4 in CI. If you want to empty the git repository and just leave a README.md file, the change would have to be force merged [11:38:55] but the gerrit repos are still open [11:40:34] (03Merged) 10jenkins-bot: Archive the cdh/cdh4 puppet submodule [integration/config] - 10https://gerrit.wikimedia.org/r/518913 (https://phabricator.wikimedia.org/T226474) (owner: 10Elukey) [11:41:26] hashar: ah yes yes [11:41:33] thanks! [11:42:14] !log Deleting Jenkins jobs puppet-cdh-rake-docker and puppet-cdh4-rake-docker # T226474 [11:42:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [11:42:16] T226474: Archive cdh puppet submodule - https://phabricator.wikimedia.org/T226474 [11:50:20] hashar: is there any doc about archival of diffusion/gh repos? 
[11:50:29] (sorry didn't find it :( ) [11:53:50] (03PS1) 10Elukey: Remove old modules already archived [integration/config] - 10https://gerrit.wikimedia.org/r/518973 [11:54:24] ah wait it looks like some modules didn't go through proper archival [11:54:27] like https://gerrit.wikimedia.org/r/#/admin/projects/operations/puppet/mariadb [11:54:34] that afaik is not used anymore [11:54:39] same thing for wikimetrics [11:55:04] so probably worth to open a task [11:55:09] (03CR) 10Awight: [C: 03+2] "Wheee :-)" [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 (owner: 10Hashar) [11:56:04] (03Merged) 10jenkins-bot: Extract isCoreOrVendor and isExtOrSkin to util [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 (owner: 10Hashar) [11:56:50] (03CR) 10jerkins-bot: [V: 04-1] Remove old modules already archived [integration/config] - 10https://gerrit.wikimedia.org/r/518973 (owner: 10Elukey) [11:56:54] (03CR) 10jenkins-bot: Extract isCoreOrVendor and isExtOrSkin to util [integration/quibble] - 10https://gerrit.wikimedia.org/r/518950 (owner: 10Hashar) [11:56:58] (03PS1) 10Jbond: ppc check experimental: removing trailing comma [integration/config] - 10https://gerrit.wikimedia.org/r/518974 (https://phabricator.wikimedia.org/T166066) [12:01:29] (03PS3) 10Awight: Minor copyediting [integration/quibble] - 10https://gerrit.wikimedia.org/r/510246 [12:01:38] (03CR) 10Awight: Minor copyediting (033 comments) [integration/quibble] - 10https://gerrit.wikimedia.org/r/510246 (owner: 10Awight) [12:02:01] Once I remove all the incorrect information I've added, this patch will be +0/-0 ;-) [12:02:29] 10Release-Engineering-Team, 10Release-Engineering-Team-TODO, 10Operations, 10puppet-compiler, and 2 others: Integrate the puppet compiler in the puppet CI pipeline - https://phabricator.wikimedia.org/T166066 (10jbond) I hit trailing comma issue reported by ayounsi above and have attempted a fix. CR whic... 
[13:18:30] PROBLEM - Long lived cherry-picks on puppetmaster on deployment-puppetmaster03 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [0.0] [13:45:46] 10Release-Engineering-Team (Kanban), 10Scap, 10MediaWiki-ResourceLoader, 10MW-1.34-notes (1.34.0-wmf.7; 2019-05-28), and 3 others: Scap deployments are not purging MessageBlobStore (was: Stale localized messages) - https://phabricator.wikimedia.org/T222539 (10jijiki) [13:45:49] 10Release-Engineering-Team (Deployment services), 10Release-Engineering-Team-TODO, 10Scap, 10serviceops, 10User-jijiki: Deploy scap 3.10.0-1 - https://phabricator.wikimedia.org/T224915 (10jijiki) 05Open→03Resolved a:03jijiki [13:58:41] 10Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() when accessing Burndown since 2019.16 - https://phabricator.wikimedia.org/T222586 (10Vlaza-servoy-com) Just got a head-up from Alves Logo Michael (kairel) with a fix that I merged in my fork https://github... [13:59:20] 10Phabricator-Sprint-Extension: Call to undefined method SprintProjectProfilePanelEngine::buildNavigation() when accessing Burndown since 2019.16 - https://phabricator.wikimedia.org/T222586 (10Vlaza-servoy-com) 05Open→03Resolved a:03Vlaza-servoy-com [14:13:17] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Aklapper) [14:14:57] !log gerrit: created `operations/debs/file-read-backwards.git` refs. 
T226449 [14:14:59] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:15:00] T226449: Please create operations/debs/file-read-backwards gerrit repository - https://phabricator.wikimedia.org/T226449 [14:15:12] (03PS1) 10MarcoAurelio: Add `operations/debs/file-read-backwards.git` repo to zuul [integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) [14:18:17] (03PS2) 10MarcoAurelio: Add `operations/debs/file-read-backwards.git` repo to zuul [integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) [14:18:46] (03CR) 10MarcoAurelio: "No idea if this is the right thing to do with this repo though, of if it is rightly configured." [integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) (owner: 10MarcoAurelio) [14:19:14] (03PS7) 10Hashar: Clone requirements from ext dependencies [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) [14:19:18] (03PS1) 10Hashar: Ensure zuul.clone works without mediawiki/core [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 [14:21:45] (03PS1) 10WMDE-Fisch: Add CentralAuth to FileImporter required extensions [integration/config] - 10https://gerrit.wikimedia.org/r/519027 (https://phabricator.wikimedia.org/T225617) [14:22:20] (03CR) 10Hashar: "I have found a bug when testing it and rebased on top of the fix:" [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [14:24:56] (03CR) 10Hashar: Ensure zuul.clone works without mediawiki/core (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:25:08] (03CR) 10Awight: Ensure zuul.clone works without mediawiki/core (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:26:17] awight: I give 
up for today :( [14:26:21] catching up the kids etc [14:26:26] (03CR) 10Hashar: Ensure zuul.clone works without mediawiki/core (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:26:42] at least I managed to rebase my patch [14:27:28] * hashar is a way [14:27:30] away [14:27:31] o/ ! [14:28:38] (03CR) 10Awight: [C: 03+2] Ensure zuul.clone works without mediawiki/core (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:29:15] (03CR) 10Krinkle: "Should this be reflected in the extensions' extension.json file requirements as well, so that MW will verify this?" [integration/config] - 10https://gerrit.wikimedia.org/r/519027 (https://phabricator.wikimedia.org/T225617) (owner: 10WMDE-Fisch) [14:29:17] (03Merged) 10jenkins-bot: Ensure zuul.clone works without mediawiki/core [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:30:34] (03CR) 10jenkins-bot: Ensure zuul.clone works without mediawiki/core [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [14:34:39] !log github: created https://github.com/wikimedia/operations-debs-file-read-backwards mirror refs. T226449 [14:34:41] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:34:41] T226449: Please create operations/debs/file-read-backwards gerrit repository - https://phabricator.wikimedia.org/T226449 [14:39:49] !log Ran `replication start operations/debs/file-read-backwards --wait` on gerrit.wikimedia.org refs. 
T226449 [14:39:54] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [14:39:56] T226449: Please create operations/debs/file-read-backwards gerrit repository - https://phabricator.wikimedia.org/T226449 [14:42:13] (03CR) 10Awight: [C: 03+1] "> Should this be reflected in the extensions' extension.json file" [integration/config] - 10https://gerrit.wikimedia.org/r/519027 (https://phabricator.wikimedia.org/T225617) (owner: 10WMDE-Fisch) [15:25:45] (03PS1) 10Awight: Test some zuul things [integration/quibble] - 10https://gerrit.wikimedia.org/r/519049 [15:26:16] (03CR) 10Awight: [C: 03+2] "I added the discussed tests as I51ca55f5860e833d8c5742cc5a9b33b6866b6469" [integration/quibble] - 10https://gerrit.wikimedia.org/r/519024 (owner: 10Hashar) [15:28:40] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Jdforrester-WMF) [15:29:27] (03CR) 10Awight: Clone requirements from ext dependencies (031 comment) [integration/quibble] - 10https://gerrit.wikimedia.org/r/502286 (https://phabricator.wikimedia.org/T193824) (owner: 10Hashar) [15:50:50] !log deployment-prep restart php7.2-fpm for wikidiff2 upgrade (T223391) [15:50:53] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:50:53] T223391: Deploy Wikidiff2 version 1.8.2 with the timeout issue fixed - https://phabricator.wikimedia.org/T223391 [15:51:22] /o\ I don't think that log went where I hoped it would. [15:52:01] (reposted in #wikimedia-cloud) [15:52:41] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Jdforrester-WMF) [15:53:37] awight: where'd you want it? 
[15:53:49] that's the right log for deployment-prep/beta cluster [15:55:58] (03CR) 10Jforrester: [C: 03+2] Add CentralAuth to FileImporter required extensions [integration/config] - 10https://gerrit.wikimedia.org/r/519027 (https://phabricator.wikimedia.org/T225617) (owner: 10WMDE-Fisch) [15:56:44] greg-g: hi! I think it landed in the beta cluster log, where I think I wanted it ;-) [15:56:50] yay [15:57:32] (03Merged) 10jenkins-bot: Add CentralAuth to FileImporter required extensions [integration/config] - 10https://gerrit.wikimedia.org/r/519027 (https://phabricator.wikimedia.org/T225617) (owner: 10WMDE-Fisch) [15:58:05] !log Adding CentralAuth to FileImporter required extensions in Zuul [15:58:07] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [15:59:47] thcipriani: Noticed a scap deploy just now with 2/11 servers over the threshold, does that mean syncs only stop at canaries when all 11 exceed their local logstash threshold? [16:00:05] also noticed the logstash query is type:mediawiki or hhvm, that'll need to include syslog/php72 as well. [16:00:08] * Krinkle files task [16:00:09] Krinkle: it's a percentage [16:00:28] 15:56:44 Check 'Logstash Error rate for mw1263.eqiad.wmnet' failed: ERROR: 80% OVER_THRESHOLD (Avg. Error rate: Before: 0.02, After: 5.00, Threshold: 1. [16:00:28] 00) [16:00:28] 15:56:44 Check 'Logstash Error rate for mw1278.eqiad.wmnet' failed: ERROR: 50% OVER_THRESHOLD (Avg. Error rate: Before: 0.01, After: 2.00, Threshold: 1. [16:00:28] 00) [16:00:28] 15:56:44 Canary error check failed for 2 canaries, less than threshold to halt deployment (2/11), see https://logstash.wikimedia.org/goto/db09a36be5ed3e [16:00:36] I mean the last line [16:00:41] which is what decides whether to abort or not [16:00:58] is "threshold to halt deployment" 11 of 11? [16:01:17] I think the threshold is 3 of 11. [16:01:37] Or, rather, > 2/11. 
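The arithmetic behind the "(2/11)" message can be sketched as follows. This assumes a 25% failure ratio over 11 canaries (the 0.25 figure is quoted in the conversation itself) and a strict greater-than comparison; the function names are hypothetical and this is not scap's actual code.

```python
def max_failed_canaries(canary_count, ratio=0.25):
    """Largest number of failed canaries that is still tolerated."""
    # int() truncates toward zero, so 0.25 * 11 = 2.75 becomes 2.
    return int(canary_count * ratio)


def should_halt(failed, canary_count, ratio=0.25):
    """Halt only when failures strictly exceed the tolerated maximum."""
    return failed > max_failed_canaries(canary_count, ratio)


def canaries_needed_to_halt(canary_count, ratio=0.25):
    """Smallest failure count that actually stops the deployment."""
    return max_failed_canaries(canary_count, ratio) + 1
```

Under these assumptions two failed canaries out of 11 do not halt the sync and three do, so a log line that prints the truncated maximum (2) as the halting threshold understates it by one.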
[16:01:40] https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/scap/+/master/scap/main.py#388 [16:01:54] Right. [16:02:12] 0.25*11 = 2.75 = 3 canaries for us. [16:02:20] right [16:02:40] ah, the message doesn't say what max_failed_canaries is [16:02:41] OK [16:02:53] Maybe the message should state the threshold. [16:03:03] yeah, that would be helpful, probably [16:04:40] oh, it does, it just uses failed > max_failed_canaries so (2/11) should be (3/11) [16:06:58] thcipriani: I thought that's the number that actually failed [16:07:04] vs the minimum to abort deploy [16:07:24] e.g. when it did not reach the threshold [16:07:36] oh right [16:07:37] wrong line [16:07:39] yep, it does already [16:07:46] just a rounding/truncate error I guess? [16:07:50] exactly [16:07:59] division of ints in python rounds [16:08:05] plus we use > [16:08:12] so the log is incorrect [16:08:26] so it has to be > 2 [16:44:49] (03CR) 10Hashar: [C: 03+2] Add `operations/debs/file-read-backwards.git` repo to zuul [integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) (owner: 10MarcoAurelio) [16:46:34] (03Merged) 10jenkins-bot: Add `operations/debs/file-read-backwards.git` repo to zuul [integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) (owner: 10MarcoAurelio) [16:47:13] (03CR) 10Hashar: [C: 03+2] "Deployed!" 
[integration/config] - 10https://gerrit.wikimedia.org/r/519022 (https://phabricator.wikimedia.org/T226449) (owner: 10MarcoAurelio) [16:48:56] 10Continuous-Integration-Config, 10Release Pipeline, 10serviceops-radar, 10Core Platform Team (RESTBase Split (CDP2)), and 2 others: Trigger RESTRouter image builds on push/tag - https://phabricator.wikimedia.org/T226536 (10mobrovac) p:05Triage→03High [16:56:09] 10Continuous-Integration-Config, 10Release Pipeline, 10serviceops-radar, 10Core Platform Team (RESTBase Split (CDP2)), and 2 others: Trigger RESTRouter image builds on push/tag - https://phabricator.wikimedia.org/T226536 (10mobrovac) [17:06:17] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Krinkle) [17:08:50] (03PS1) 10Thcipriani: Fix failed canary threshold logging [tools/scap] - 10https://gerrit.wikimedia.org/r/519074 [17:23:53] (03CR) 10Jforrester: [C: 03+2] Fix failed canary threshold logging [tools/scap] - 10https://gerrit.wikimedia.org/r/519074 (owner: 10Thcipriani) [17:25:33] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Jdforrester-WMF) [17:26:42] (03Merged) 10jenkins-bot: Fix failed canary threshold logging [tools/scap] - 10https://gerrit.wikimedia.org/r/519074 (owner: 10Thcipriani) [17:27:22] (03CR) 10jenkins-bot: Fix failed canary threshold logging [tools/scap] - 10https://gerrit.wikimedia.org/r/519074 (owner: 10Thcipriani) [17:30:07] 10Continuous-Integration-Config, 10Release Pipeline, 10serviceops-radar, 10Core Platform Team (RESTBase Split (CDP2)), and 2 others: Trigger RESTRouter image builds on push/tag - https://phabricator.wikimedia.org/T226536 (10mobrovac) [17:37:51] (03PS1) 10markahershberger: Edit Project Config [extensions/WhoIsWatching] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/519077 
[17:40:44] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Qgil) [17:40:52] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Qgil) p:05Triage→03High [17:42:29] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) [17:43:56] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) I have the loop with: Firefox 67.0.4 It works successfully with: Chromium 74.0.3729.169 [17:44:02] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Qgil) [17:46:28] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) [17:48:05] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) [17:48:16] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) [17:48:38] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Qgil) I get the loop in Firefox 60.7.0esr (64-bit) Works with Firefox 66.0.5 (64-bit). 
Also works with Chrome 62.0.3202.89 (64-b... [17:50:36] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10mmodell) Works fine for me with Firefox 67.0.3 (64-bit) [17:56:25] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Tgr) Does it also happen with Javascript disabled? [17:56:39] (03PS1) 10Thcipriani: maintentance: cleanup contint1001, too [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) [18:07:22] Yippee, build fixed! [18:07:23] Project mwcore-phpunit-coverage-master build #10: 09FIXED in 3 hr 7 min: https://integration.wikimedia.org/ci/job/mwcore-phpunit-coverage-master/10/ [18:12:41] (03CR) 10Jforrester: "If e.g. we have an image "foo", and we use foo:0.1 in image "bar" and foo:0.2 in image "baz", trimming foo:0.1 will mean that baz will end" [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) (owner: 10Thcipriani) [18:14:09] (03CR) 10Jforrester: maintentance: cleanup contint1001, too (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) (owner: 10Thcipriani) [18:14:53] thcipriani: Is https://phabricator.wikimedia.org/T207707 blocked on SRE? If so, we should mention it in SoS. [18:15:36] James_F: yes please [18:15:48] ServiceOps, I guess? [18:22:18] I think that's correct [18:25:40] although, I suppose, SRE does know who can help, so a general SRE callout in SoS should be fine :) [18:26:45] thcipriani: https://etherpad.wikimedia.org/p/Scrum-of-Scrums updated with our stuff including that CTA. 
:-) [18:27:25] James_F: <3 [18:41:08] 10Release-Engineering-Team (Kanban), 10Release, 10Train Deployments: 1.34.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T220736 (10mmodell) a:03jeena [18:41:16] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Aklapper) Anything relevant shown in the 'console' of the Developer Tools? For more information, please see https://developer.mozi... [18:45:36] (03PS2) 10Thcipriani: maintentance: cleanup contint1001, too [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) [18:47:19] (03CR) 10Thcipriani: maintentance: cleanup contint1001, too (031 comment) [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) (owner: 10Thcipriani) [19:00:24] (03CR) 10Jforrester: [C: 03+1] maintentance: cleanup contint1001, too [integration/config] - 10https://gerrit.wikimedia.org/r/519083 (https://phabricator.wikimedia.org/T207702) (owner: 10Thcipriani) [19:08:27] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10mmodell) deploying MediaWiki 1.34.0-wmf.10 to all wikis [19:10:10] !lloh Update Jenkins collapsible section patterns to account for "quibble.cmd" => "quibble.commands" changing in the output [19:11:25] * Reedy hands Krinkle a g [19:11:27] !log Update Jenkins collapsible section patterns to account for "quibble.cmd" => "quibble.commands" changing in the output [19:11:28] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [19:11:29] :D [19:11:59] what's "a g" ? [19:12:16] attorney general? [19:12:18] :P [19:13:03] it's the letter next to the h :-P [19:13:26] apergos: qwertyist! [19:14:03] well it is in the context of an off-by-one typo! 
[19:14:36] 10Release-Engineering-Team (Kanban), 10Patch-For-Review, 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10mmodell) [19:28:59] 10Release-Engineering-Team (Kanban), 10MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), 10Release, 10Train Deployments: 1.34.0-wmf.10 deployment blockers - https://phabricator.wikimedia.org/T220735 (10Jdforrester-WMF) 05Open→03Resolved [19:32:01] 10Continuous-Integration-Config, 10BlueSpice: In CI BlueSpice repositories should always have BlueSpiceFoundation injected - https://phabricator.wikimedia.org/T226567 (10hashar) [19:46:20] 10Continuous-Integration-Config, 10Continuous-Integration-Infrastructure: New phan dependencies significantly slowed down CI tests - https://phabricator.wikimedia.org/T225112 (10hashar) I noticed we already have a feature flag to disable recursion when the job name contains `-phan-`. Then I am not sure why we... [19:53:30] PROBLEM - Free space - all mounts on deployment-mwmaint01 is CRITICAL: CRITICAL: deployment-prep.deployment-mwmaint01.diskspace.root.byte_percentfree (<11.11%) [20:01:35] 10Gerrit, 10Release-Engineering-Team: Investigate Gerrit https hits spikes - https://phabricator.wikimedia.org/T226570 (10hashar) [20:03:26] RECOVERY - Free space - all mounts on deployment-mwmaint01 is OK: OK: All targets OK [20:13:45] 10Gerrit, 10Release-Engineering-Team: Investigate Gerrit https hits spikes - https://phabricator.wikimedia.org/T226570 (10hashar) Looking at the Apache log for June 25th at 03:00 UTC. I got 51704 requests. The mid hour spike: ` 1 2019-06-25T03:34 1 2019-06-25T03:33 3360 2019-06-25T03:34 5037... 
[20:21:18] 10Gerrit, 10Release-Engineering-Team: Investigate Gerrit https hits spikes - https://phabricator.wikimedia.org/T226570 (10hashar) 05Open→03Resolved a:03hashar [20:21:21] 10Gerrit, 10Release-Engineering-Team, 10VPS-project-codesearch, 10Patch-For-Review: Gerrit thread use GC thrashing - https://phabricator.wikimedia.org/T221026 (10hashar) [20:35:19] 10Gerrit, 10Release-Engineering-Team (Development services), 10Release-Engineering-Team-TODO, 10serviceops-radar: Gerrit http threads stuck behind sendemail thread - https://phabricator.wikimedia.org/T224448 (10thcipriani) This happened twice in the past 24 hours. https://fastthread.io/my-thread-report.js... [20:35:28] Project beta-code-update-eqiad build #252347: 04FAILURE in 2 min 28 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/252347/ [20:37:38] PROBLEM - puppet last run on contint1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 3 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[git_pull_jenkins CI slave scripts] [20:44:24] Yippee, build fixed! 
[20:44:24] Project beta-code-update-eqiad build #252348: 09FIXED in 1 min 23 sec: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/252348/ [20:46:51] 10Release-Engineering-Team (Kanban), 10Security-Team, 10phan-taint-check-plugin: Phan-taint-check-plugin not available for PHP > 7.0 - https://phabricator.wikimedia.org/T207344 (10sbassett) **Update:** T216974#5284212 [21:00:22] Jun 25 20:34:05 contint1001 puppet-agent[17475]: (/Stage[main]/Contint::Slave_scripts/Git::Clone[jenkins CI slave scripts]/Exec[git_pull_jenkins CI slave scripts]/returns) fatal: unable to access 'https://gerrit.wikimedia.org/r/integration/jenkins.git/': The requested URL returned error: 503 [21:00:24] bah [21:02:37] ah yeah gerrit restarted [21:04:52] RECOVERY - puppet last run on contint1001 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [21:04:56] (03PS1) 10Thcipriani: Revert "Gerrit v2.15.14" [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/519137 [21:05:30] (03CR) 10Paladox: [C: 03+2] Revert "Gerrit v2.15.14" [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/519137 (owner: 10Thcipriani) [21:06:09] (03CR) 10Thcipriani: [V: 03+2] Revert "Gerrit v2.15.14" [software/gerrit] (deploy/wmf/stable-2.15) - 10https://gerrit.wikimedia.org/r/519137 (owner: 10Thcipriani) [21:06:39] I cannot suddenly access https://gerrit.wikimedia.org/r/#/dashboard/self [21:06:53] hauskatze yup, known. 
[21:07:15] !log gerrit: Created new repo: operations/debs/coredns - requested by fsero on mediawiki [21:07:16] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [21:07:37] paladox: alright, I'll wait :) [21:11:29] (03PS1) 10MarcoAurelio: Edit Project Config [debs/coredns] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/519139 [21:11:54] (03Abandoned) 10MarcoAurelio: Edit Project Config [debs/coredns] (refs/meta/config) - 10https://gerrit.wikimedia.org/r/519139 (owner: 10MarcoAurelio) [21:15:01] 10Release-Engineering-Team (Kanban), 10Security-Team, 10phan-taint-check-plugin: Phan-taint-check-plugin not available for PHP > 7.0 - https://phabricator.wikimedia.org/T207344 (10Jdforrester-WMF) >>! In T207344#5284226, @sbassett wrote: > **Update:** T216974#5284212 Thank you! [21:21:18] 10Gerrit, 10Release-Engineering-Team (Kanban), 10Release-Engineering-Team-TODO, 10LibUp: Re-enable use of Gerrit HTTP token to push patchsets - https://phabricator.wikimedia.org/T218750 (10thcipriani) 05Resolved→03Open Spoke too soon. Gerrit 2.15.14 caused a lot of SendEmail locks ({T224448}). I have t... [21:21:59] * hauskatze is waiting for the great gerrit upgrade to the pre-v3 version [21:22:17] 2.16 I think? [21:22:25] yup 2.16 [21:22:32] 2.16 is the last 2.x release :) [21:22:48] 3.0 was released in may [21:23:21] 2.16, looking forward to that [21:23:27] ugh I am so not here. really I'm gone... [21:23:47] :D [21:24:38] apergos: lol [21:24:55] paladox: couple of blockers still to fix right? [21:25:29] hauskatze nope. All blockers have been resolved. At least i haven't found anymore and have resolved the other blockers :) [21:25:56] So I guess it's just waiting for some green light [21:25:59] ? [21:26:19] yup [21:27:43] 10Gerrit, 10Release-Engineering-Team (Development services), 10Release-Engineering-Team-TODO, 10serviceops-radar: Gerrit http threads stuck behind sendemail thread - https://phabricator.wikimedia.org/T224448 (10thcipriani) >>! 
In T224448#5284141, @thcipriani wrote: > This happened twice in the past 24 hour... [21:28:39] I'd like to go a week without having to restart gerrit before major migrations :\ [21:29:38] has mostly been my mental block on getting an upgrade plan together [21:30:34] hashar: musikanimal wonders if https://integration.wikimedia.org/ci/job/composer-php72-docker/83/console is related to our ci-setup [21:32:45] hauskatze: yes the container lacks the php extension [21:33:12] hashar: the container is managed by releng or our repo? [21:33:17] by releng [21:33:31] I guess we'll have to ask for it to be added? [21:33:34] and that job is just doing "composer install" && "composer test" [21:33:40] which most of the time is just to run linters [21:33:47] when we introduced composer, that was the intent [21:34:03] eg mediawiki extensions usually do not declare any php extension in their composer.json [21:34:23] but if it runs 'composer install' shouldn't jenkins just fetch and retrieve the packages requested and apply them? [21:34:33] like in our local machines? 
[21:34:56] though I guess that'd be a security issue [21:35:03] if we could load anything we want [21:35:22] the problem is that we are using composer.json as the entry point to run some lint/test commands [21:35:31] when it is also used to define the actual application [21:35:35] but yeah [21:35:36] anyway [21:35:46] we would surely need to add php-mysqli :] [21:35:51] hauskatze: if you file a bug we can probably add php-mysqli to the container [21:35:53] there might be other use cases [21:36:10] the reason it hasn't been installed is probably because no project used it yet [21:36:19] I imagine when they write some test they will also want to have a mysql available [21:41:45] 10Continuous-Integration-Config: Add `ext-mysqli` to the CI container - https://phabricator.wikimedia.org/T226585 (10MarcoAurelio) [21:41:57] legoktm & hashar ^^ as suggested [21:42:20] 10Continuous-Integration-Config, 10Fundraising-Backlog: Phan job fails on CI for mediawiki/core fundraising/REL1_31 - https://phabricator.wikimedia.org/T226156 (10Jgreen) [21:44:18] I'll aim to get to it tonight if no one else does [21:54:06] ty :) [22:00:23] !log Ran `replication start operations/debs/coredns --wait` [22:00:25] Logged the message at https://wikitech.wikimedia.org/wiki/Release_Engineering/SAL [22:07:25] 10Continuous-Integration-Config: Add `ext-mysqli` to the CI container - https://phabricator.wikimedia.org/T226585 (10Jdforrester-WMF) Hmm. `mysqli` seems a bit heavy to add to the `composer-{flavour}-docker` images. We currently have `php(7.\d)?-mysql` in the `quibble-stretch-php7*` images and the custom `civic... [22:30:46] brennen && liw apparently #zuul like the feedback you've given on https://phabricator.wikimedia.org/T218138 :) [22:31:57] paladox: oh yeah? [22:32:12] they say it's very well written [22:32:29] cool. :) [22:33:34] i think i dropped in there for a while while we were looking at that stuff, but trimmed it because i was stacking up IRC channels too fast... 
[22:33:53] heh [22:46:41] 10Beta-Cluster-Infrastructure, 10Release-Engineering-Team (Kanban), 10Code-Stewardship-Reviews: deployment-prep: Code stewardship request - https://phabricator.wikimedia.org/T215217 (10Jrbranaa) a:03Jrbranaa [22:49:25] 10Project-Admins: Desktop Tag - https://phabricator.wikimedia.org/T225799 (10JTannerWMF) Well there can be a feature that is desktop and mobile and we are making it clear to engineers it should be built for both. Sometimes it is only for Desktop or only for Mobile. We is Growth Team and Editing Team Product and... [22:50:42] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10Quiddity) I can no longer reproduce. (yay?!) Hopefully that means it has solved itself somehow? [22:56:54] 10Phabricator-Bot-Requests, 10Discourse, 10Space: Loop trying to create an account in Wikimedia Space with certain browser versions - https://phabricator.wikimedia.org/T226545 (10bd808) Firefox 67.0.4 OSX: I seemed to be in the "loop" state when clicking on the "sign up" button, but then I tried clicking "lo... [23:08:42] 10Continuous-Integration-Config: Move all CI generic tasks from PHP70 to PHP72 - https://phabricator.wikimedia.org/T225457 (10Jdforrester-WMF) [23:08:44] 10Continuous-Integration-Config, 10Patch-For-Review: Run phan secheck on PHP 7.2, not PHP 7.0 - https://phabricator.wikimedia.org/T226420 (10Jdforrester-WMF) 05Open→03Stalled Stalled on blockers.