[00:02:32] !log reindexing Portuguese wikis on elastic@eqiad, elastic@codfw, and cloudelastic complete (T284185)
[00:02:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:02:37] T284185: Reindex German, Dutch, and Portugese Wikis - https://phabricator.wikimedia.org/T284185
[00:07:46] SRE, MW-on-K8s, Release-Engineering-Team, serviceops: Check out www-portals repo in the mediawiki-webserver and in the mediawiki-multiversion images - https://phabricator.wikimedia.org/T285325 (jeena) a:jeena
[00:10:24] (PS1) Jeena Huneidi: Checkout portals for multiversion & webserver img [mediawiki-config] - https://gerrit.wikimedia.org/r/701001 (https://phabricator.wikimedia.org/T285325)
[00:11:16] (CR) Jeena Huneidi: "haven't built this yet" [mediawiki-config] - https://gerrit.wikimedia.org/r/701001 (https://phabricator.wikimedia.org/T285325) (owner: Jeena Huneidi)
[00:26:15] (PS1) Krinkle: InitialiseSettings: Add toolforge.org to wgNoFollowDomainExceptions [mediawiki-config] - https://gerrit.wikimedia.org/r/701003 (https://phabricator.wikimedia.org/T285364)
[00:40:57] !log uploaded new versions of flufl.bounce_4.0-1_amd64.changes hyperkitty_1.3.4-2~bpo10+4_amd64.changes mailman3_3.3.3-1~bpo10+5_amd64.changes mailman-hyperkitty_1.1.0-10~bpo10+1_amd64.changes to apt1001
[00:41:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:49:32] SRE, Wikimedia-Mailing-lists: Upgrade Mailman to flufl.bounce 4.0 - https://phabricator.wikimedia.org/T285120 (Legoktm) Package uploaded to apt.wm.o and I've upgraded the Cloud instance. Will to production later today.
[00:52:16] (CR) BryanDavis: InitialiseSettings: Add toolforge.org to wgNoFollowDomainExceptions (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/701003 (https://phabricator.wikimedia.org/T285364) (owner: Krinkle)
[02:01:46] (Traffic on tunnel link) firing: Traffic on tunnel link - https://alerts.wikimedia.org
[02:14:01] SRE, Performance-Team, serviceops, MW-1.36-notes, and 3 others: Enable "/*/mw-with-onhost-tier/" route for MediaWiki where safe - https://phabricator.wikimedia.org/T264604 (aaron) How many appserver instances are active?
[02:21:46] (Traffic on tunnel link) resolved: Traffic on tunnel link - https://alerts.wikimedia.org
[02:35:15] PROBLEM - SSH on mw1284.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[02:36:14] SRE, Okapi [Wikimedia Enterprise], Platform Engineering, Traffic: Securely connect Wikimedia Enterprise Infrastructure with WMF Kafka Streams - https://phabricator.wikimedia.org/T280628 (RBrounley_WMF) Update here - we are onboarding folks at the current moment - DevOps focused Sr Software Engine...
[03:35:59] RECOVERY - SSH on mw1284.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[03:40:39] PROBLEM - Check systemd state on cumin1001 is CRITICAL: CRITICAL - degraded: The following units failed: database-backups-snapshots.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:49:47] (PS1) Marostegui: db1100: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/701013 (https://phabricator.wikimedia.org/T283235)
[04:51:20] (CR) Marostegui: [C: +2] db1100: Enable notifications [puppet] - https://gerrit.wikimedia.org/r/701013 (https://phabricator.wikimedia.org/T283235) (owner: Marostegui)
[04:52:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16699 and previous config saved to /var/cache/conftool/dbconfig/20210623-045217-root.json
[04:52:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:06:49] PROBLEM - mailman3_queue_size on lists1001 is CRITICAL: CRITICAL: 1 mailman3 queues above limits: bounces is 51 (limit: 25) https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[05:08:03] seems like a temporary spike
[05:08:41] RECOVERY - mailman3_queue_size on lists1001 is OK: OK: mailman3 queues are below the limits https://wikitech.wikimedia.org/wiki/Mailman/Monitoring https://grafana.wikimedia.org/d/GvuAmuuGk/mailman3
[05:09:05] oh, responses from the daily-article-l post
[05:31:06] (PS1) Samwilson: Remove defunct feature flag $wgWikisourceEnableOcr [mediawiki-config] - https://gerrit.wikimedia.org/r/701016 (https://phabricator.wikimedia.org/T285311)
[05:32:33] (CR) Ryan Kemper: [C: +2] [wdqs] fix prometheus_jmx_exporter class selection [puppet] - https://gerrit.wikimedia.org/r/700653 (https://phabricator.wikimedia.org/T270245) (owner: DCausse)
[05:42:53] !log marostegui@cumin1001 dbctl commit (dc=all): 'Start repooling db1100', diff saved to https://phabricator.wikimedia.org/P16700 and previous config saved to /var/cache/conftool/dbconfig/20210623-054252-marostegui.json
[05:42:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:58:13] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16701 and previous config saved to /var/cache/conftool/dbconfig/20210623-055812-root.json
[05:58:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:06:27] (PS1) Elukey: postgresql::slave: add optional to the includes parameter [puppet] - https://gerrit.wikimedia.org/r/701018 (https://phabricator.wikimedia.org/T232358)
[06:08:42] (CR) Elukey: [V: +1] "PCC SUCCESS (NOOP 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29965/console" [puppet] - https://gerrit.wikimedia.org/r/701018 (https://phabricator.wikimedia.org/T232358) (owner: Elukey)
[06:11:08] (CR) Elukey: [V: +1] "PCC SUCCESS (NOOP 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29966/console" [puppet] - https://gerrit.wikimedia.org/r/701018 (https://phabricator.wikimedia.org/T232358) (owner: Elukey)
[06:11:44] (CR) Elukey: [V: +1 C: +2] postgresql::slave: add optional to the includes parameter [puppet] - https://gerrit.wikimedia.org/r/701018 (https://phabricator.wikimedia.org/T232358) (owner: Elukey)
[06:13:16] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16702 and previous config saved to /var/cache/conftool/dbconfig/20210623-061316-root.json
[06:13:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:16:48] (PS2) ArielGlenn: dumps: Remove absented crons [puppet] - https://gerrit.wikimedia.org/r/700978 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[06:17:52] (CR) ArielGlenn: [C: +2] dumps: Remove absented crons [puppet] - https://gerrit.wikimedia.org/r/700978 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[06:21:33] (PS2) ArielGlenn: dumps: Migrate dumplists cron to systemd timer [puppet] - https://gerrit.wikimedia.org/r/700981 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[06:24:40] (CR) ArielGlenn: [C: +2] dumps: Migrate dumplists cron to systemd timer [puppet] - https://gerrit.wikimedia.org/r/700981 (https://phabricator.wikimedia.org/T273673) (owner: Ladsgroup)
[06:28:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repool db1100 after upgrade', diff saved to https://phabricator.wikimedia.org/P16703 and previous config saved to /var/cache/conftool/dbconfig/20210623-062819-root.json
[06:28:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:37:49] !log [WDQS] `ryankemper@wdqs2001:~$ sudo pool`
[06:37:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:19] PROBLEM - SSH on bast3005 is CRITICAL: Server answer: https://wikitech.wikimedia.org/wiki/SSH/monitoring
[06:45:19] (CR) Giuseppe Lavagetto: [C: +2] pipeline: install the wmf internal CAs [mediawiki-config] - https://gerrit.wikimedia.org/r/700843 (https://phabricator.wikimedia.org/T284417) (owner: Giuseppe Lavagetto)
[06:45:28] (CR) DCausse: [C: +1] mjolnir: Provide prioritized topics to bulk daemon (1 comment) [puppet] - https://gerrit.wikimedia.org/r/699814 (https://phabricator.wikimedia.org/T261407) (owner: Ebernhardson)
[06:46:02] (Merged) jenkins-bot: pipeline: install the wmf internal CAs [mediawiki-config] - https://gerrit.wikimedia.org/r/700843 (https://phabricator.wikimedia.org/T284417) (owner: Giuseppe Lavagetto)
[06:47:09] RECOVERY - SSH on bast3005 is OK: SSH OK - OpenSSH_7.9p1 Debian-10+deb10u2 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[06:48:51] (CR) Giuseppe Lavagetto: [C: +2] pipeline: install php extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/700903 (https://phabricator.wikimedia.org/T285309) (owner: Giuseppe Lavagetto)
[06:49:01] (CR) jerkins-bot: [V: -1] pipeline: install php extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/700903 (https://phabricator.wikimedia.org/T285309) (owner: Giuseppe Lavagetto)
[06:49:11] <_joe_> wut
[06:49:23] (PS2) Giuseppe Lavagetto: pipeline: install php extensions [mediawiki-config] - https://gerrit.wikimedia.org/r/700903 (https://phabricator.wikimedia.org/T285309)
[06:49:38] <_joe_> sometimes those merge conflicts are just mysterious
[06:50:47] huh
[06:56:35] !log [WDQS] `ryankemper@wdqs1006:~$ sudo pool`
[06:56:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:06:14] (CR) Legoktm: "Do you need php-excimer and php-wmerrors too?" [mediawiki-config] - https://gerrit.wikimedia.org/r/700903 (https://phabricator.wikimedia.org/T285309) (owner: Giuseppe Lavagetto)
[07:08:45] !log updating mailman packages on lists1001 and restarting (T285120, T280889)
[07:08:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:08:51] T285120: Upgrade Mailman to flufl.bounce 4.0 - https://phabricator.wikimedia.org/T285120
[07:10:20] (PS2) Ryan Kemper: mjolnir: Stop listening to BC topics [puppet] - https://gerrit.wikimedia.org/r/697836 (https://phabricator.wikimedia.org/T261407) (owner: Ebernhardson)
[07:10:44] (PS4) KartikMistry: cxserver: Remove Matxin MT and add more language support to Elia [deployment-charts] - https://gerrit.wikimedia.org/r/700466 (https://phabricator.wikimedia.org/T285199)
[07:13:25] PROBLEM - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:13:36] ^ known
[07:13:41] (CR) Ryan Kemper: [C: +2] mjolnir: Stop listening to BC topics [puppet] - https://gerrit.wikimedia.org/r/697836 (https://phabricator.wikimedia.org/T261407) (owner: Ebernhardson)
[07:14:52] I missed a patch when rebuilding, fixing...
[07:15:25] ACKNOWLEDGEMENT - mailman3_runners on lists1001 is CRITICAL: PROCS CRITICAL: 13 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner Legoktm working on it https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:26:16] !log uploaded mailman3_3.3.3-1~bpo10+6_amd64.changes on apt1001
[07:26:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:28:09] RECOVERY - mailman3_runners on lists1001 is OK: PROCS OK: 14 processes with UID = 38 (list), regex args /usr/lib/mailman3/bin/runner https://wikitech.wikimedia.org/wiki/Mailman/Monitoring
[07:30:23] SRE, Wikimedia-Mailing-lists: Upgrade Mailman to flufl.bounce 4.0 - https://phabricator.wikimedia.org/T285120 (Legoktm) Open→Resolved
[07:42:37] (CR) KartikMistry: [C: +2] cxserver: Remove Matxin MT and add more language support to Elia [deployment-charts] - https://gerrit.wikimedia.org/r/700466 (https://phabricator.wikimedia.org/T285199) (owner: KartikMistry)
[07:43:04] * kart_ updating cxserver..
[07:44:57] (Merged) jenkins-bot: cxserver: Remove Matxin MT and add more language support to Elia [deployment-charts] - https://gerrit.wikimedia.org/r/700466 (https://phabricator.wikimedia.org/T285199) (owner: KartikMistry)
[07:45:01] (CR) Filippo Giunchedi: [C: +2] "LGTM, thank you!" [puppet] - https://gerrit.wikimedia.org/r/700994 (https://phabricator.wikimedia.org/T281358) (owner: Dave Pifke)
[07:46:42] !log kartik@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'cxserver' for release 'staging' .
[07:46:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:49:08] !log kartik@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[07:49:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:54:33] !log kartik@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'cxserver' for release 'production' .
[07:54:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:57:56] !log cxserver: Removed Matxin MT support and added more language support to Elia MT (T285199, T284900)
[07:58:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:58:03] T284900: Support more language pairs with Elia MT - https://phabricator.wikimedia.org/T284900
[07:58:03] T285199: Remove Matxin support from ContentTranslation - https://phabricator.wikimedia.org/T285199
[07:59:45] (PS1) Volans: Add official support for Python 3.9 [software/pywmflib] - https://gerrit.wikimedia.org/r/701051
[08:03:23] (CR) Volans: [C: +2] mediawiki: Make siteinfo API request over HTTPS [software/spicerack] - https://gerrit.wikimedia.org/r/700963 (https://phabricator.wikimedia.org/T266618) (owner: Legoktm)
[08:09:19] (Merged) jenkins-bot: mediawiki: Make siteinfo API request over HTTPS [software/spicerack] - https://gerrit.wikimedia.org/r/700963 (https://phabricator.wikimedia.org/T266618) (owner: Legoktm)
[08:10:18] (PS1) Legoktm: swift: Only run swiftrepl-mw in the active datacenter [puppet] - https://gerrit.wikimedia.org/r/701052 (https://phabricator.wikimedia.org/T285373)
[08:10:53] (CR) Elukey: [C: +1] "\o/" [software/pywmflib] - https://gerrit.wikimedia.org/r/701051 (owner: Volans)
[08:11:28] (CR) Volans: [C: +2] Add official support for Python 3.9 [software/pywmflib] - https://gerrit.wikimedia.org/r/701051 (owner: Volans)
[08:13:43] (Merged) jenkins-bot: Add official support for Python 3.9 [software/pywmflib] - https://gerrit.wikimedia.org/r/701051 (owner: Volans)
[08:14:04] volans: thanks, I think there might be one more spicerack patch related to stopping maintenance systemd timers, will try to do it later today, then do you think we can do a spicerack release on Thursday?
[08:15:24] legoktm: sure, I was planning to do it today already if that was it, but happy to wait for the next patch :)
[08:16:19] RECOVERY - Check systemd state on cumin1001 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:19:51] (CR) Volans: [C: -1] "Potential issue with current puppetization depending how important is to not run a cross sync." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/701052 (https://phabricator.wikimedia.org/T285373) (owner: Legoktm)
[08:26:43] SRE, conftool, serviceops, Datacenter-Switchover: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (Legoktm) >>! In T266717#6723029, @RLazarus wrote: >>>! In T266717#6722705, @Joe wrote: >> I think we have a better way to avoid this. Basically we want to stop r...
[08:27:31] (PS1) Legoktm: mediawiki: Stop all systemd job units when stopping cronjobs [software/spicerack] - https://gerrit.wikimedia.org/r/701053 (https://phabricator.wikimedia.org/T266717)
[08:27:48] volans: ^ is the patch, I just haven't tested yet whether the wildcard actually works
[08:29:28] (CR) Legoktm: mediawiki: Stop all systemd job units when stopping cronjobs (1 comment) [software/spicerack] - https://gerrit.wikimedia.org/r/701053 (https://phabricator.wikimedia.org/T266717) (owner: Legoktm)
[08:30:16] (CR) David Caro: grid: php config don't rely on php being installed by puppet (1 comment) [puppet] - https://gerrit.wikimedia.org/r/700186 (owner: David Caro)
[08:34:07] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
[08:34:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:21] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 14s)
[08:34:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:34:37] legoktm: ack, looking
[08:35:53] wouldn't puppet restart them all? I don't recall by heart if puppet is disabled there during the switch
[08:38:15] SRE, Traffic: Enable UDS support on varnish - https://phabricator.wikimedia.org/T285374 (Vgutierrez)
[08:38:30] (PS1) Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false [puppet] - https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717)
[08:39:27] well these are timers, so puppet doesn't control them
[08:39:42] and also we do disable puppet in 00-disable-puppet
[08:39:48] logger.info('Disabling Puppet on MediaWiki maintenance hosts in %s and %s', args.dc_from, args.dc_to)
[08:39:49] remote.query('A:mw-maintenance').run_sync('disable-puppet "{message}"'.format(message=PUPPET_REASON))
[08:41:03] ack
[08:41:27] (CR) Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: Legoktm)
[08:42:03] (PS1) Vgutierrez: varnish: Add listen on UDS support [puppet] - https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374)
[08:42:05] (CR) Giuseppe Lavagetto: [C: -1] "We should keep using the data in /etc/conftool-state/mediawiki.yaml for this as well." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: Legoktm)
[08:42:35] I trust more the disabled puppet than puppet not restarting them as they are defined in the catalog and might try to ensure they're running
[08:42:38] (CR) Cathal Mooney: [C: +1] "Looks good to me, or at least it makes sense :)" [homer/public] - https://gerrit.wikimedia.org/r/700939 (owner: Ayounsi)
[08:42:38] ;)
[08:42:59] heh
[08:44:16] (CR) Giuseppe Lavagetto: [C: +1] hieradata: Use TLS codfw pool for memcached replication on eqiad [puppet] - https://gerrit.wikimedia.org/r/700861 (https://phabricator.wikimedia.org/T271967) (owner: Effie Mouzeli)
[08:44:24] (PS3) Ema: varnishmtail: add Error and FetchError VSL tags [puppet] - https://gerrit.wikimedia.org/r/700876 (https://phabricator.wikimedia.org/T284576)
[08:44:27] (CR) Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: Legoktm)
[08:44:58] (CR) Giuseppe Lavagetto: [C: +1] Add python-build-bullseye image [docker-images/production-images] - https://gerrit.wikimedia.org/r/685462 (owner: Volans)
[08:46:32] PROBLEM - Check systemd state on thanos-fe2002 is CRITICAL: CRITICAL - degraded: The following units failed: ferm.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:47:01] (CR) Ema: [C: +2] varnishmtail: add Error and FetchError VSL tags [puppet] - https://gerrit.wikimedia.org/r/700876 (https://phabricator.wikimedia.org/T284576) (owner: Ema)
[08:48:32] (CR) Legoktm: swift: Only run swiftrepl-mw in the active datacenter (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701052 (https://phabricator.wikimedia.org/T285373) (owner: Legoktm)
[08:48:56] !log sudo systemctl start ferm.service on thanos-fe2002 (DNS query timeout)
[08:48:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:49:50] (CR) Giuseppe Lavagetto: [C: +1] Add istio 1.9.5 images (1 comment) [docker-images/production-images] - https://gerrit.wikimedia.org/r/700396 (https://phabricator.wikimedia.org/T278192) (owner: Elukey)
[08:49:52] RECOVERY - Check systemd state on thanos-fe2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[08:52:53] SRE, Commons, MediaWiki-File-management, SRE-swift-storage, and 4 others: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (Aklapp...
[08:58:09] (Abandoned) Addshore: Added wmgWikibaseEntitySources setting for defining Wikibase "entity sources" [mediawiki-config] - https://gerrit.wikimedia.org/r/490104 (https://phabricator.wikimedia.org/T214557) (owner: WMDE-leszek)
[08:58:36] (Abandoned) Addshore: Added wmgWikibaseRepoLocalEntitySourceName to define the "local" source of Wikibase Repo [mediawiki-config] - https://gerrit.wikimedia.org/r/490633 (https://phabricator.wikimedia.org/T214557) (owner: WMDE-leszek)
[08:59:33] (PS1) Giuseppe Lavagetto: mwdebug: use latest image [deployment-charts] - https://gerrit.wikimedia.org/r/701058
[09:05:03] (CR) Giuseppe Lavagetto: [C: -1] mediawiki: mw-cli-wrapper: Only run if read only in confctl is false (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: Legoktm)
[09:13:50] (PS1) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:13:56] (PS2) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:14:10] SRE, LDAP-Access-Requests: Grant Access to ldap/wmf for TChin - https://phabricator.wikimedia.org/T285326 (Aklapper) @tchin: Ah, thanks for the clarification and the link! I've [clarified that section](https://office.wikimedia.org/w/index.php?title=Technology%2FOnboarding%2FChecklists%2FTemplate&type=rev...
[09:14:17] (CR) Elukey: [V: +2 C: +2] Add istio 1.9.5 images [docker-images/production-images] - https://gerrit.wikimedia.org/r/700396 (https://phabricator.wikimedia.org/T278192) (owner: Elukey)
[09:14:45] (CR) Giuseppe Lavagetto: [C: +2] mwdebug: use latest image [deployment-charts] - https://gerrit.wikimedia.org/r/701058 (owner: Giuseppe Lavagetto)
[09:15:31] (PS3) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:15:51] (PS4) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:16:16] (PS4) Jbond: P:logoutd: create wrapper script for calling logout.d scripts [puppet] - https://gerrit.wikimedia.org/r/700922 (https://phabricator.wikimedia.org/T283242)
[09:16:48] (CR) Jbond: [C: +1] "fixed thanks" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/700922 (https://phabricator.wikimedia.org/T283242) (owner: Jbond)
[09:17:51] (Merged) jenkins-bot: mwdebug: use latest image [deployment-charts] - https://gerrit.wikimedia.org/r/701058 (owner: Giuseppe Lavagetto)
[09:20:33] (CR) Jcrespo: [C: +2] Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737 (owner: Jcrespo)
[09:22:01] (PS5) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:22:09] !log oblivian@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' .
[09:22:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:22:13] (PS6) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:23:08] (PS7) Jcrespo: Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737
[09:26:48] (PS5) Jbond: P:logoutd: create wrapper script for calling logout.d scripts [puppet] - https://gerrit.wikimedia.org/r/700922 (https://phabricator.wikimedia.org/T283242)
[09:26:59] (CR) Jcrespo: [C: +2] Revert "mariadb: Temporarily reduce retention of dbprov2003-stored backups" [puppet] - https://gerrit.wikimedia.org/r/700737 (owner: Jcrespo)
[09:28:04] SRE, Wikimedia-Mailing-lists: Mailman3 bounce runner is running very slowly - https://phabricator.wikimedia.org/T282348 (Legoktm) Open→Resolved a:Legoktm All of the issues we've run into so far have been patched upstream and deployed to lists.wm.o, so I'm going to close this. In general the b...
[09:28:27] (CR) jerkins-bot: [V: -1] P:logoutd: create wrapper script for calling logout.d scripts [puppet] - https://gerrit.wikimedia.org/r/700922 (https://phabricator.wikimedia.org/T283242) (owner: Jbond)
[09:29:40] (PS2) Vgutierrez: varnish: Add listen on UDS support [puppet] - https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374)
[09:30:11] SRE, Wikimedia-Mailing-lists, I18n, Patch-For-Review, RTL: Make pipermail show RTL emails better by emitting dir=auto - https://phabricator.wikimedia.org/T235458 (Legoktm) Open→Declined At this point we're not going to change the pipermail archives. I don't think this is an issue in h...
[09:35:32] !log mbsantos@deploy1002 Started deploy [kartotherian/deploy@9f16a6b]: (no justification provided)
[09:35:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:35:53] !log mbsantos@deploy1002 Finished deploy [kartotherian/deploy@9f16a6b]: (no justification provided) (duration: 00m 20s)
[09:35:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:25] (PS3) David Caro: toolforge: Add buster specific packages/setting [puppet] - https://gerrit.wikimedia.org/r/700186
[09:42:27] (PS1) David Caro: toolforge.genpp: add buster repos [puppet] - https://gerrit.wikimedia.org/r/701062
[09:42:29] (PS1) David Caro: toolforge.exec_environ: add tests [puppet] - https://gerrit.wikimedia.org/r/701063
[09:42:31] (CR) Filippo Giunchedi: swift: Only run swiftrepl-mw in the active datacenter (2 comments) [puppet] - https://gerrit.wikimedia.org/r/701052 (https://phabricator.wikimedia.org/T285373) (owner: Legoktm)
[09:43:12] PROBLEM - pt-heartbeat-wikimedia service on db2080 is CRITICAL: CRITICAL - Expecting inactive but unit pt-heartbeat-wikimedia is active https://wikitech.wikimedia.org/wiki/MariaDB/pt-heartbeat
[09:43:39] huuh. #-operations, you say.
[09:43:43] please ignore ^
[09:44:45] (CR) jerkins-bot: [V: -1] toolforge: Add buster specific packages/setting [puppet] - https://gerrit.wikimedia.org/r/700186 (owner: David Caro)
[09:45:02] RECOVERY - pt-heartbeat-wikimedia service on db2080 is OK: OK - pt-heartbeat-wikimedia is inactive https://wikitech.wikimedia.org/wiki/MariaDB/pt-heartbeat
[09:45:33] (CR) Majavah: toolforge: Add buster specific packages/setting (1 comment) [puppet] - https://gerrit.wikimedia.org/r/700186 (owner: David Caro)
[09:46:32] (CR) Filippo Giunchedi: [V: +1] "PCC SUCCESS (NOOP 1 DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29967/console" [puppet] - https://gerrit.wikimedia.org/r/701052 (https://phabricator.wikimedia.org/T285373) (owner: Legoktm)
[09:46:54] (CR) David Caro: toolforge: Add buster specific packages/setting (1 comment) [puppet] - https://gerrit.wikimedia.org/r/700186 (owner: David Caro)
[09:47:17] (PS1) Kormat: nrpe: Pass through the provided contact_group. [puppet] - https://gerrit.wikimedia.org/r/701065
[09:47:41] (PS2) Kormat: nrpe: Pass through the provided contact_group. [puppet] - https://gerrit.wikimedia.org/r/701065
[09:48:50] (CR) Kormat: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29968/console" [puppet] - https://gerrit.wikimedia.org/r/701065 (owner: Kormat)
[09:49:26] godog: can i randomly pick you to have a look at a one-line CR ^ please? :)
[09:50:06] kormat: certainly, looking
[09:50:26] (CR) Filippo Giunchedi: [C: +1] "Ship it!" [puppet] - https://gerrit.wikimedia.org/r/701065 (owner: Kormat)
[09:50:42] (CR) Kormat: [V: +1 C: +2] nrpe: Pass through the provided contact_group. [puppet] - https://gerrit.wikimedia.org/r/701065 (owner: Kormat)
[09:50:45] good old reviewboard days
[09:50:55] godog: cheers :)
[09:51:03] (PS15) Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242)
[09:51:27] sure
[09:52:00] PROBLEM - Host wdqs1013 is DOWN: PING CRITICAL - Packet loss = 100%
[09:52:33] (PS16) Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242)
[09:52:35] (CR) Jbond: "updated" (4 comments) [software/pywmflib] - https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: Jbond)
[09:52:41] (PS17) Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242)
[09:52:48] RECOVERY - Host wdqs1013 is UP: PING OK - Packet loss = 0%, RTA = 0.58 ms
[09:53:02] that was expected? ^^
[09:53:23] (PS1) Razzi: superset: rename analytics_cluster::ui::{dashboards,superset} [puppet] - https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219)
[09:57:08] (PS4) David Caro: toolforge: Add buster specific packages/setting [puppet] - https://gerrit.wikimedia.org/r/700186
[09:57:10] (CR) David Caro: toolforge: Add buster specific packages/setting (2 comments) [puppet] - https://gerrit.wikimedia.org/r/700186 (owner: David Caro)
[09:57:12] (PS2) David Caro: toolforge.exec_environ: add tests [puppet] - https://gerrit.wikimedia.org/r/701063
[09:58:11] (PS3) David Caro: toolforge.exec_environ: add tests [puppet] - https://gerrit.wikimedia.org/r/701063
[09:58:18] (PS1) Elukey: istio: add a more specific Depends target [docker-images/production-images] - https://gerrit.wikimedia.org/r/701067 (https://phabricator.wikimedia.org/T278192)
[09:58:32] (CR) David Caro: "check experimental" [puppet] - https://gerrit.wikimedia.org/r/701063 (owner: David Caro)
[09:59:23] (CR) Arturo Borrero Gonzalez: "other than that, LGTM" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/701062 (owner: David Caro)
[10:03:01] (CR) Jbond: [C: +1] "lgtm" [homer/public] - https://gerrit.wikimedia.org/r/700939 (owner: Ayounsi)
[10:03:43] (CR) Giuseppe Lavagetto: [C: +1] "I'm ok with this structure, and with what you have now, but only if you commit to the following:" (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/697938 (https://phabricator.wikimedia.org/T278192) (owner: Elukey)
[10:04:46] (PS1) Jelto: fix cleanup of config backups [gitlab-ansible] - https://gerrit.wikimedia.org/r/701068 (https://phabricator.wikimedia.org/T274463)
[10:04:58] (PS3) Vgutierrez: varnish: Add listen on UDS support [puppet] - https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374)
[10:08:58] (CR) Jelto: "The cleanup of the config backups in /etc/gitlab/config_backup is not working. The cleanup script uses the wrong path (just ls -tp withou" [gitlab-ansible] - https://gerrit.wikimedia.org/r/701068 (https://phabricator.wikimedia.org/T274463) (owner: Jelto)
[10:12:10] (PS2) Razzi: superset: rename analytics_cluster::ui::{dashboards,superset} [puppet] - https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219)
[10:18:19] (PS1) Jbond: C:postgresql::server: prefer an empty list to undef [puppet] - https://gerrit.wikimedia.org/r/701069 (https://phabricator.wikimedia.org/T232358)
[10:18:25] SRE, Wikimedia-Mailing-lists: Enable verp probes in mailman3 - https://phabricator.wikimedia.org/T285361 (Nemo_bis) On VERP in MediaWiki see also: * https://www.mediawiki.org/wiki/VERP * https://www.mediawiki.org/wiki/Manual:Hooks/UserMailerChangeReturnPath
[10:18:39] (PS2) Jbond: C:postgresql::server: prefer an empty list to undef [puppet] - https://gerrit.wikimedia.org/r/701069 (https://phabricator.wikimedia.org/T232358)
[10:19:52] (CR) Jbond: [V: +1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29973/console" [puppet] - https://gerrit.wikimedia.org/r/701069 (https://phabricator.wikimedia.org/T232358) (owner: Jbond)
[10:20:16] (CR) Jbond: "Thanks for this fix, I have created a minor update https://gerrit.wikimedia.org/r/c/operations/puppet/+/701069 to default to an empty list" [puppet] - https://gerrit.wikimedia.org/r/701018 (https://phabricator.wikimedia.org/T232358) (owner: Elukey)
[10:21:54] (PS6) Jbond: P:logoutd: create wrapper script for calling logout.d scripts [puppet] - https://gerrit.wikimedia.org/r/700922 (https://phabricator.wikimedia.org/T283242)
[10:29:22] (PS3) Razzi: superset: rename analytics_cluster::ui::{dashboards,superset} [puppet] - https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219)
[10:30:09] (PS5) Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - https://gerrit.wikimedia.org/r/649684
[10:30:52] (CR) Razzi: [V: +1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29975/console" [puppet] - https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219) (owner: Razzi)
[10:30:59] (CR) jerkins-bot: [V: -1] rspec: update to use rspec-mock instead of mocha [puppet] - https://gerrit.wikimedia.org/r/649684 (owner: Jbond)
[10:31:18] (PS2) Elukey: istio: add a more specific build target [docker-images/production-images] - https://gerrit.wikimedia.org/r/701067 (https://phabricator.wikimedia.org/T278192)
[10:31:38] (CR) Razzi: "Just a little cleanup" [puppet] - https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219) (owner: Razzi)
[10:32:24] (CR) Effie Mouzeli: [C: +2] hieradata: Use TLS codfw pool for memcached replication on eqiad [puppet] - https://gerrit.wikimedia.org/r/700861 (https://phabricator.wikimedia.org/T271967) (owner: Effie Mouzeli)
[10:33:07] SRE, Traffic, Patch-For-Review: Enable UDS support on varnish - https://phabricator.wikimedia.org/T285374 (Vgutierrez) p:Triage→Medium this seemed an innocent change but it effectively forces the update from VCL 4.0 to 4.1: From varnish documentation: ` When UDS listeners are in use, VCL >= 4...
[10:35:27] (03PS1) 10Ema: varnish: add error counters to Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/701070 (https://phabricator.wikimedia.org/T284576) [10:40:34] (03PS6) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [10:40:36] (03PS1) 10Jbond: Gemfile: update puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701071 [10:40:38] (03PS1) 10Jbond: spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701072 [10:41:50] (03CR) 10jerkins-bot: [V: 04-1] Gemfile: update puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701071 (owner: 10Jbond) [10:42:02] (03CR) 10jerkins-bot: [V: 04-1] spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701072 (owner: 10Jbond) [10:42:39] (03PS7) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [10:43:16] (03PS2) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [10:43:26] (03PS3) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [10:44:05] !log oblivian@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[10:44:08] (03PS2) 10Jbond: Gemfile: update puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701071 [10:44:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:44:20] (03PS2) 10Jbond: spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701072 [10:44:39] (03PS4) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [10:44:47] (03CR) 10jerkins-bot: [V: 04-1] rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [10:45:40] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [10:45:54] (03PS4) 10Vgutierrez: varnish: Add listen on UDS support [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) [10:45:56] (03PS1) 10Vgutierrez: vcl: Use VCL 4.1 instead of 4.0 [puppet] - 10https://gerrit.wikimedia.org/r/701073 (https://phabricator.wikimedia.org/T285374) [10:45:59] (03CR) 10jerkins-bot: [V: 04-1] rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [10:47:02] (03PS3) 10Jbond: Gemfile: update puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701071 [10:47:24] (03PS3) 10Jbond: spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701072 [10:47:33] (03PS5) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [10:48:42] (03CR) 10jerkins-bot: [V: 04-1] rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [10:49:26] (03PS8) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] 
- 10https://gerrit.wikimedia.org/r/649684 [10:52:15] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [10:55:22] (03PS9) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [10:56:38] 10SRE, 10MW-on-K8s, 10serviceops: The mediawiki-webserver image should only log in json format - https://phabricator.wikimedia.org/T285384 (10Joe) [10:56:48] 10SRE, 10MW-on-K8s, 10serviceops: The mediawiki-webserver image should only log in json format - https://phabricator.wikimedia.org/T285384 (10Joe) p:05Triage→03Medium [10:58:12] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [10:58:30] 10SRE, 10MW-on-K8s, 10serviceops, 10Patch-For-Review: Install wiki-specific php extensions in the mediawiki production image - https://phabricator.wikimedia.org/T285309 (10Joe) 05Open→03Resolved [10:58:36] 10SRE, 10MW-on-K8s, 10serviceops: Make all httpbb tests pass on the mwdebug deployment. - https://phabricator.wikimedia.org/T285298 (10Joe) [10:58:49] (03PS1) 10Muehlenhoff: Extend access for aikochou [puppet] - 10https://gerrit.wikimedia.org/r/701075 [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: Your horoscope predicts another unfortunate European mid-day backport window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1100). [11:00:04] No GERRIT patches in the queue for this window AFAICS. 
[11:00:07] (03PS2) 10David Caro: toolforge.genpp: add buster repos [puppet] - 10https://gerrit.wikimedia.org/r/701062 [11:00:09] (03PS5) 10David Caro: toolforge: Add buster specific packages/setting [puppet] - 10https://gerrit.wikimedia.org/r/700186 [11:00:11] (03PS4) 10David Caro: toolforge.exec_environ: add tests [puppet] - 10https://gerrit.wikimedia.org/r/701063 [11:00:30] (03CR) 10David Caro: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/701063 (owner: 10David Caro) [11:00:51] 10SRE, 10MW-on-K8s, 10serviceops: Make all httpbb tests pass on the mwdebug deployment. - https://phabricator.wikimedia.org/T285298 (10Joe) [11:02:55] (03PS5) 10Vgutierrez: varnish: Add listen on UDS support [puppet] - 10https://gerrit.wikimedia.org/r/701056 (https://phabricator.wikimedia.org/T285374) [11:09:26] (03CR) 10Muehlenhoff: [C: 03+2] Extend access for aikochou [puppet] - 10https://gerrit.wikimedia.org/r/701075 (owner: 10Muehlenhoff) [11:10:57] !log Removing peering to AS39651 / "Com Hem AB" at AMS-IX (cr2-esams). Peer has left IX. 
[11:11:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:19:11] (03PS10) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [11:19:18] (03Abandoned) 10Jbond: Gemfile: update puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701071 (owner: 10Jbond) [11:19:30] (03Abandoned) 10Jbond: spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701072 (owner: 10Jbond) [11:20:08] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [11:21:24] 10SRE, 10MW-on-K8s, 10serviceops, 10User-jijiki: The mediawiki-webserver image should only log in json format - https://phabricator.wikimedia.org/T285384 (10jijiki) a:03jijiki [11:22:20] (03PS11) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [11:22:22] (03PS1) 10Jbond: spec_hellper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701077 [11:23:16] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [11:24:11] (03CR) 10David Caro: [V: 03+1] "PCC SUCCESS (NOOP 33): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29976/console" [puppet] - 10https://gerrit.wikimedia.org/r/700186 (owner: 10David Caro) [11:25:04] (03PS12) 10Jbond: rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [11:25:06] (03PS1) 10Jbond: Gemfile: update rspec-puppet and puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701078 [11:26:04] (03CR) 10jerkins-bot: [V: 04-1] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [11:30:59] (03PS13) 10Jbond: 
rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 [11:31:22] (03CR) 10Jbond: [C: 03+2] Gemfile: update rspec-puppet and puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/701078 (owner: 10Jbond) [11:38:28] (03PS2) 10Ema: varnish: add error counters to Prometheus [puppet] - 10https://gerrit.wikimedia.org/r/701070 (https://phabricator.wikimedia.org/T284576) [11:40:17] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/701070 (https://phabricator.wikimedia.org/T284576) (owner: 10Ema) [11:42:49] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM" [homer/public] - 10https://gerrit.wikimedia.org/r/700939 (owner: 10Ayounsi) [11:44:51] (03CR) 10Ayounsi: [C: 03+2] Simplify labs-in4/6 firewall filters [homer/public] - 10https://gerrit.wikimedia.org/r/700939 (owner: 10Ayounsi) [11:46:48] !log Simplify labs-in4/6 firewall filters - CR700939 [11:46:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:04] 10SRE, 10Commons, 10MediaWiki-File-management, 10SRE-swift-storage, and 4 others: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (10JGHowe... [11:51:18] (03PS1) 10Ema: varnish: install mtail programs in a loop [puppet] - 10https://gerrit.wikimedia.org/r/701083 [11:52:11] (03PS2) 10Ema: varnish: install mtail programs in a loop [puppet] - 10https://gerrit.wikimedia.org/r/701083 [11:52:22] (03CR) 10Ema: "check experimental" [puppet] - 10https://gerrit.wikimedia.org/r/701083 (owner: 10Ema) [11:55:19] (03PS1) 10Muehlenhoff: Remove backup for sretest [puppet] - 10https://gerrit.wikimedia.org/r/701084 [11:57:26] (03CR) 10Volans: [C: 03+1] "Thanks for the fixes, much nicer! LGTM, one doc nit inline." 
(031 comment) [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [11:58:31] (03PS1) 10Ayounsi: Port cloud-in4 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701085 (https://phabricator.wikimedia.org/T273865) [11:59:44] 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic, 10Epic: Support maps serving for affiliate sites via an allow list - https://phabricator.wikimedia.org/T261694 (10valerio.bozzolan) > Error: 403, Forbidden: Map tiles are restricted to Wikimedia & affiliated sites only. Please post on https... [12:03:51] (03PS2) 10Ayounsi: Port cloud-in4 to Capirca [homer/public] - 10https://gerrit.wikimedia.org/r/701085 (https://phabricator.wikimedia.org/T273865) [12:05:27] (03CR) 10Kormat: [C: 03+1] mediawiki: Reduce purgeParserCache.php sleep from 500ms to 200 [puppet] - 10https://gerrit.wikimedia.org/r/700957 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [12:06:03] (03PS18) 10Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) [12:06:39] (03CR) 10Volans: [C: 03+1] "Ship it!" 
[software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [12:07:05] (03CR) 10Jbond: [C: 03+2] IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [12:09:57] (03CR) 10jerkins-bot: [V: 04-1] IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [12:10:25] (03CR) 10Ayounsi: "Current diff is: https://phabricator.wikimedia.org/P16705" [homer/public] - 10https://gerrit.wikimedia.org/r/701085 (https://phabricator.wikimedia.org/T273865) (owner: 10Ayounsi) [12:13:18] (03CR) 10Ayounsi: "Existing filter is there: https://github.com/wikimedia/operations-homer-public/blob/master/templates/cr/firewall.conf#L838" [homer/public] - 10https://gerrit.wikimedia.org/r/701085 (https://phabricator.wikimedia.org/T273865) (owner: 10Ayounsi) [12:15:16] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist s2 recountCategories.php --mode=pages && foreachwikiindblist s2 recountCategories.php --mode=subcats && foreachwikiindblist s2 recountCategories.php --mode=files # T170737 [12:15:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:15:22] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [12:16:59] PROBLEM - LVS linkrecommendation eqiad port 4005/tcp - Link Recommendation- linkrecommendation.svc.eqiad.wmnet IPv4 on linkrecommendation.svc.eqiad.wmnet is CRITICAL: connect to address 10.2.2.23 and port 4005: Connection refused https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [12:18:53] RECOVERY - LVS linkrecommendation eqiad port 4005/tcp - Link Recommendation- linkrecommendation.svc.eqiad.wmnet IPv4 on linkrecommendation.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 193 bytes in 1.054 second 
response time https://wikitech.wikimedia.org/wiki/LVS%23Diagnosing_problems [12:26:15] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s5 [12:26:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:26:21] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [12:28:07] (03CR) 10Jcrespo: [C: 03+1] "No blocker on my side." [puppet] - 10https://gerrit.wikimedia.org/r/701084 (owner: 10Muehlenhoff) [12:29:40] (03PS19) 10Jbond: IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) [12:32:25] (03Abandoned) 10Marostegui: orchestrator.conf: Do not promote sanitarium masters/backup hosts [puppet] - 10https://gerrit.wikimedia.org/r/700928 (owner: 10Marostegui) [12:33:25] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): hw troubleshooting: server hardlocking for cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T281881 (10jcrespo) >>! In T281881#7146744, @aborrero wrote: > Please @jcrespo enable backups on that server again. @aborrero I did th... 
[12:34:56] (03CR) 10Arturo Borrero Gonzalez: [C: 03+1] "LGTM" [homer/public] - 10https://gerrit.wikimedia.org/r/701085 (https://phabricator.wikimedia.org/T273865) (owner: 10Ayounsi) [12:35:38] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s6 [12:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:35:43] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [12:37:30] (03CR) 10Jbond: [C: 03+2] IDM: create new idm library with logoutd base class [software/pywmflib] - 10https://gerrit.wikimedia.org/r/695341 (https://phabricator.wikimedia.org/T283242) (owner: 10Jbond) [12:46:17] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s7 [12:46:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:23] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [12:47:25] 10SRE, 10Commons, 10MediaWiki-File-management, 10SRE-swift-storage, and 4 others: Re-deleting a Commons file: "Error deleting file: The file "mwstore://local-multiwrite/local-deleted/..." is in an inconsistent state within the internal storage backends". - https://phabricator.wikimedia.org/T270994 (10Aklapp... 
[12:50:46] (03CR) 10Elukey: [V: 03+2 C: 03+2] istio: add a more specific build target [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/701067 (https://phabricator.wikimedia.org/T278192) (owner: 10Elukey) [12:57:04] 10SRE, 10Traffic, 10Patch-For-Review: Enable UDS support on varnish - https://phabricator.wikimedia.org/T285374 (10Vgutierrez) Initial testing in our labs environment shows that curl doesn't play well with PROXY protocol **and** unix domain sockets: ` root@traffic-cache-atsupload-buster:~# curl --haproxy-pro... [12:59:22] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s3 [12:59:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:27] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [13:01:24] (03PS1) 10Krinkle: InitialiseSettings: Change wgEntitySchemaShExSimpleUrl to toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701093 (https://phabricator.wikimedia.org/T262072) [13:01:26] (03PS1) 10Krinkle: CommonSettings: Restore wgCSPFalsePositiveUrls for intuition.toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701094 (https://phabricator.wikimedia.org/T207900) [13:04:29] (03PS1) 10Volans: CHANGELOG: add changelogs for release v4.1.1 [software/cumin] - 10https://gerrit.wikimedia.org/r/701095 [13:06:25] (03CR) 10Ottomata: [C: 03+1] superset: rename analytics_cluster::ui::{dashboards,superset} [puppet] - 10https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219) (owner: 10Razzi) [13:09:50] (03PS1) 10Elukey: istio: add proper quoting around "build" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/701096 [13:10:11] (03CR) 10Elukey: [V: 03+2 C: 03+2] istio: add proper quoting around "build" 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/701096 (owner: 10Elukey) [13:13:56] (03CR) 10Volans: [C: 03+2] CHANGELOG: add changelogs for release v4.1.1 [software/cumin] - 10https://gerrit.wikimedia.org/r/701095 (owner: 10Volans) [13:16:21] (03PS4) 10Volans: Add python-build-bullseye image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/685462 [13:21:03] (03Merged) 10jenkins-bot: CHANGELOG: add changelogs for release v4.1.1 [software/cumin] - 10https://gerrit.wikimedia.org/r/701095 (owner: 10Volans) [13:23:42] (03PS1) 10Volans: Upstream release v4.1.1 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/701098 [13:23:54] 10SRE, 10Services, 10Wikibase-Quality-Constraints, 10Wikidata, and 3 others: Deploy Shellbox instance (shellbox-constraints) for Wikidata constraint regexes - https://phabricator.wikimedia.org/T285104 (10Ladsgroup) [13:24:25] (03CR) 10Volans: [V: 03+2 C: 03+2] Add python-build-bullseye image [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/685462 (owner: 10Volans) [13:24:44] (03CR) 10Jbond: [C: 03+2] rspec: update to use rspec-mock instead of mocha [puppet] - 10https://gerrit.wikimedia.org/r/649684 (owner: 10Jbond) [13:24:59] (03CR) 10Elukey: [C: 03+1] "LGTM, I had the same idea but didn't want to mess up with the existing set up :)" [puppet] - 10https://gerrit.wikimedia.org/r/701069 (https://phabricator.wikimedia.org/T232358) (owner: 10Jbond) [13:25:28] (03PS6) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [13:26:03] (03CR) 10jerkins-bot: [V: 04-1] rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [13:26:11] (03PS2) 10Jbond: spec_helper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701077 [13:26:21] (03PS3) 10Jbond: spec_helper: switch to rspec mocha and add 
rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701077 [13:26:31] (03PS7) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [13:27:26] (03CR) 10jerkins-bot: [V: 04-1] rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [13:27:42] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s4 [13:27:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:48] T170737: Run recountCategories.php on Wikimedia wikis - https://phabricator.wikimedia.org/T170737 [13:29:26] (03CR) 10Volans: [C: 03+2] Upstream release v4.1.1 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/701098 (owner: 10Volans) [13:31:42] (03PS2) 10Effie Mouzeli: profile::thanos::swift: add account for tegola vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/700862 (https://phabricator.wikimedia.org/T283049) [13:32:22] (03PS8) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [13:32:49] (03CR) 10Ottomata: [C: 03+2] Finalize backend migration of CentralNotice EL schemas [puppet] - 10https://gerrit.wikimedia.org/r/699786 (https://phabricator.wikimedia.org/T259163) (owner: 10Ottomata) [13:35:46] (03Merged) 10jenkins-bot: Upstream release v4.1.1 [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/701098 (owner: 10Volans) [13:41:57] (03CR) 10Filippo Giunchedi: [C: 03+1] profile::thanos::swift: add account for tegola vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/700862 (https://phabricator.wikimedia.org/T283049) (owner: 10Effie Mouzeli) [13:42:40] (03CR) 10Kormat: [C: 03+2] mediawiki: Reduce purgeParserCache.php sleep 
from 500ms to 200 [puppet] - 10https://gerrit.wikimedia.org/r/700957 (https://phabricator.wikimedia.org/T282761) (owner: 10Krinkle) [13:45:43] !log uploaded cumin_4.1.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia [13:45:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:23] (03CR) 10Giuseppe Lavagetto: [C: 03+1] rake_modules: update the dynamic spec test to use ParallelTests (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [13:48:24] (03CR) 10Giuseppe Lavagetto: [C: 03+1] spec_helper: switch to rspec mocha and add rspec_parrallel arguments [puppet] - 10https://gerrit.wikimedia.org/r/701077 (owner: 10Jbond) [13:48:52] (03PS1) 10Ottomata: Finalize WMDEBanner* schema migration to Event Platform [extensions/WikimediaEvents] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701103 (https://phabricator.wikimedia.org/T282562) [13:51:21] (03CR) 10Effie Mouzeli: [C: 03+2] profile::thanos::swift: add fake credentials for tegola_prod [labs/private] - 10https://gerrit.wikimedia.org/r/700863 (https://phabricator.wikimedia.org/T283049) (owner: 10Effie Mouzeli) [13:51:25] (03CR) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/700660 (owner: 10Jbond) [13:51:27] (03PS9) 10Jbond: rake_modules: update the dynamic spec test to use ParallelTests [puppet] - 10https://gerrit.wikimedia.org/r/700660 [13:51:30] (03CR) 10Effie Mouzeli: [V: 03+2 C: 03+2] profile::thanos::swift: add fake credentials for tegola_prod [labs/private] - 10https://gerrit.wikimedia.org/r/700863 (https://phabricator.wikimedia.org/T283049) (owner: 10Effie Mouzeli) [13:51:52] (03CR) 10Effie Mouzeli: [C: 03+2] profile::thanos::swift: add account for tegola vector tiles [puppet] - 10https://gerrit.wikimedia.org/r/700862 (https://phabricator.wikimedia.org/T283049) (owner: 10Effie Mouzeli) [13:54:02] !log rolling restart thanos-fe* to pick up new 
tegola-vector-tiles account - T283049 [13:54:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:54:07] T283049: Swift account to store pre-rendered vector-tiles - https://phabricator.wikimedia.org/T283049 [13:56:51] (03CR) 10Jbond: [C: 03+2] "> Patch Set 2: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/701069 (https://phabricator.wikimedia.org/T232358) (owner: 10Jbond) [13:59:14] (03PS1) 10Volans: cuminunpriv: add kerberos config bits [puppet] - 10https://gerrit.wikimedia.org/r/701105 (https://phabricator.wikimedia.org/T244840) [14:03:44] (03PS1) 10Effie Mouzeli: Minor fix on confirmation message. [software/cumin] - 10https://gerrit.wikimedia.org/r/701107 [14:20:24] (03CR) 10Volans: [C: 03+2] "Thanks, LGTM" [software/cumin] - 10https://gerrit.wikimedia.org/r/701107 (owner: 10Effie Mouzeli) [14:25:30] (03PS1) 10Ottomata: DRY profile::analytics::cluster::users [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) [14:25:35] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good!" [puppet] - 10https://gerrit.wikimedia.org/r/701105 (https://phabricator.wikimedia.org/T244840) (owner: 10Volans) [14:25:41] (03Merged) 10jenkins-bot: Minor fix on confirmation message. 
[software/cumin] - 10https://gerrit.wikimedia.org/r/701107 (owner: 10Effie Mouzeli) [14:25:49] (03PS1) 10Elukey: istio: bump proxyv2 docker image version to 1.9.5-2 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/701113 [14:28:05] (03CR) 10Elukey: [V: 03+2 C: 03+2] istio: bump proxyv2 docker image version to 1.9.5-2 [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/701113 (owner: 10Elukey) [14:28:40] (03PS1) 10Muehlenhoff: Extend access for Kay Wong [puppet] - 10https://gerrit.wikimedia.org/r/701114 [14:29:14] (03CR) 10Ottomata: "Should be a no-op" [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:29:38] (03CR) 10Muehlenhoff: [C: 03+2] Extend access for Kay Wong [puppet] - 10https://gerrit.wikimedia.org/r/701114 (owner: 10Muehlenhoff) [14:29:47] (03PS2) 10Muehlenhoff: Extend access for Kay Wong [puppet] - 10https://gerrit.wikimedia.org/r/701114 [14:30:06] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 3): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29977/console" [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:33:07] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 4): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29978/console" [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:33:10] (03PS2) 10Cwhite: logstash: add ECS transition support for Oslo structured logs [puppet] - 10https://gerrit.wikimedia.org/r/695563 (https://phabricator.wikimedia.org/T234565) [14:37:16] (03PS4) 10Cwhite: Add metric group. [software/ecs] - 10https://gerrit.wikimedia.org/r/699428 [14:38:47] (03CR) 10Cwhite: [C: 03+2] Add metric group. [software/ecs] - 10https://gerrit.wikimedia.org/r/699428 (owner: 10Cwhite) [14:39:30] (03Merged) 10jenkins-bot: Add metric group. 
[software/ecs] - 10https://gerrit.wikimedia.org/r/699428 (owner: 10Cwhite) [14:40:11] (03CR) 10Ottomata: [V: 03+1 C: 03+2] DRY profile::analytics::cluster::users [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:41:24] (03CR) 10Elukey: [C: 03+1] "Change looks good but my understanding is that this module will likely go away when an-master1001 will be migrated to Buster. Should we ke" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:41:40] (03PS1) 10Muehlenhoff: return-tgt-for-user: Fix date parsing [puppet] - 10https://gerrit.wikimedia.org/r/701116 [14:42:02] (03CR) 10Ottomata: DRY profile::analytics::cluster::users (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701112 (https://phabricator.wikimedia.org/T284225) (owner: 10Ottomata) [14:42:06] (03PS1) 10Urbanecm: Make Growth features available to newcomers at lvwiki and skwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701117 (https://phabricator.wikimedia.org/T278191) [14:42:53] (03PS2) 10Muehlenhoff: return-tgt-for-user: Fix date parsing [puppet] - 10https://gerrit.wikimedia.org/r/701116 [14:49:58] (03PS1) 10Cwhite: logstash: add and enable ecs revision 4 [puppet] - 10https://gerrit.wikimedia.org/r/701119 [14:51:03] (03PS1) 10Urbanecm: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700740 (https://phabricator.wikimedia.org/T279886) [14:53:36] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s8 [14:53:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:53:42] T170737: Run recountCategories.php on Wikimedia wikis 
- https://phabricator.wikimedia.org/T170737 [14:54:29] !log [urbanecm@mwmaint1002 ~]$ foreachwikiindblist $SHARD recountCategories.php --mode=pages && foreachwikiindblist $SHARD recountCategories.php --mode=subcats && foreachwikiindblist $SHARD recountCategories.php --mode=files # T170737, SHARD=s1 [14:54:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:29] (03CR) 10Cwhite: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/29980/" [puppet] - 10https://gerrit.wikimedia.org/r/701119 (owner: 10Cwhite) [14:56:59] (03PS1) 10Andrew Bogott: puppet_enc_logrotate: remove sudo line [puppet] - 10https://gerrit.wikimedia.org/r/701122 [14:57:27] (03CR) 10Hashar: "> Unfortunately gitiles.jar is already bundled in gerrit.war and would thus be overwritten when updating Gerrit :-\" [software/gerrit] (wmf/stable-3.2) - 10https://gerrit.wikimedia.org/r/700932 (https://phabricator.wikimedia.org/T240264) (owner: 10Hashar) [14:57:54] (03CR) 10Andrew Bogott: [C: 03+2] puppet_enc_logrotate: remove sudo line [puppet] - 10https://gerrit.wikimedia.org/r/701122 (owner: 10Andrew Bogott) [14:59:29] (03PS1) 10Muehlenhoff: Remove access for liw [puppet] - 10https://gerrit.wikimedia.org/r/701125 [15:01:24] (03PS4) 10Cwhite: logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) [15:01:41] (03PS2) 10Muehlenhoff: Remove access for liw [puppet] - 10https://gerrit.wikimedia.org/r/701125 [15:03:16] (03CR) 10jerkins-bot: [V: 04-1] logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) (owner: 10Cwhite) [15:04:18] (03PS5) 10Cwhite: logstash: add ecs migration config for sampled webrequest logs [puppet] - 10https://gerrit.wikimedia.org/r/699254 (https://phabricator.wikimedia.org/T234565) [15:06:43] (03CR) 10Muehlenhoff: [C: 03+2] Remove access for liw 
[puppet] - 10https://gerrit.wikimedia.org/r/701125 (owner: 10Muehlenhoff) [15:17:58] !log Removing peering to AS64050 / "BGP Consultancy Pte Ltd" at AMS-IX (cr2-esams). Peer has left IX. [15:18:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:20:53] (03PS1) 10Elukey: reportupdater::job: fix variable name [puppet] - 10https://gerrit.wikimedia.org/r/701130 (https://phabricator.wikimedia.org/T274880) [15:21:53] (03CR) 10Mforns: [C: 03+1] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/701130 (https://phabricator.wikimedia.org/T274880) (owner: 10Elukey) [15:22:58] (03CR) 10Elukey: [V: 03+1] "PCC SUCCESS (DIFF 1): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/29981/console" [puppet] - 10https://gerrit.wikimedia.org/r/701130 (https://phabricator.wikimedia.org/T274880) (owner: 10Elukey) [15:25:36] (03CR) 10Elukey: [V: 03+1 C: 03+2] reportupdater::job: fix variable name [puppet] - 10https://gerrit.wikimedia.org/r/701130 (https://phabricator.wikimedia.org/T274880) (owner: 10Elukey) [15:27:18] (03CR) 10Volans: [C: 03+2] cuminunpriv: add kerberos config bits [puppet] - 10https://gerrit.wikimedia.org/r/701105 (https://phabricator.wikimedia.org/T244840) (owner: 10Volans) [15:32:37] 10SRE, 10Cloud-Services, 10cloud-services-team (Kanban): rack/setup/install labmon1002 - https://phabricator.wikimedia.org/T165784 (10aborrero) [15:33:19] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): hw troubleshooting: server hardlocking for cloudmetrics1002.eqiad.wmnet - https://phabricator.wikimedia.org/T281881 (10aborrero) 05Open→03Resolved thanks @jcrespo For the record, I just marked the server in netbox as `Active`. 
[15:46:18] (03CR) 10SBassett: [C: 03+1] CommonSettings: Restore wgCSPFalsePositiveUrls for intuition.toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701094 (https://phabricator.wikimedia.org/T207900) (owner: 10Krinkle) [15:48:39] PROBLEM - Host wdqs1013 is DOWN: PING CRITICAL - Packet loss = 100% [15:48:50] jouncebot: next [15:48:50] In 2 hour(s) and 11 minute(s): Train log triage with CPT (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1800) [15:48:50] In 2 hour(s) and 11 minute(s): Morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1800) [15:49:29] RECOVERY - Host wdqs1013 is UP: PING OK - Packet loss = 0%, RTA = 2.39 ms [15:53:41] (03CR) 10Krinkle: InitialiseSettings: Add toolforge.org to wgNoFollowDomainExceptions (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701003 (https://phabricator.wikimedia.org/T285364) (owner: 10Krinkle) [15:53:48] (03PS2) 10Krinkle: InitialiseSettings: Change wgEntitySchemaShExSimpleUrl to toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701093 (https://phabricator.wikimedia.org/T262072) [15:53:53] (03PS2) 10Krinkle: CommonSettings: Restore wgCSPFalsePositiveUrls for intuition.toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701094 (https://phabricator.wikimedia.org/T207900) [16:00:19] 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Services, and 2 others: New Service Request tegola-vector-tiles - https://phabricator.wikimedia.org/T274390 (10jijiki) [16:01:00] (03PS1) 10Effie Mouzeli: tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [16:02:33] (03CR) 10jerkins-bot: [V: 04-1] tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) (owner: 10Effie Mouzeli) [16:04:08] (03PS2) 10Effie Mouzeli: 
tegola-vector-tiles: add helmfile.d config [deployment-charts] - 10https://gerrit.wikimedia.org/r/701138 (https://phabricator.wikimedia.org/T283159) [16:11:57] (03CR) 10BryanDavis: InitialiseSettings: Add toolforge.org to wgNoFollowDomainExceptions (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701003 (https://phabricator.wikimedia.org/T285364) (owner: 10Krinkle) [16:40:39] (03PS3) 10Jgiannelos: Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) [16:42:33] !log re-start sending traffic on the codfw-eqsin Telia transport link [16:42:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:46:53] 10SRE, 10Maps, 10Product-Infrastructure-Team-Backlog, 10Traffic, 10Epic: Support maps serving for affiliate sites via an allow list - https://phabricator.wikimedia.org/T261694 (10AntiCompositeNumber) >>! In T261694#7172223, @valerio.bozzolan wrote: > Edited: I think that these domains could be whiteliste... 
[16:54:18] (03CR) 10Krinkle: "Impact / Test plan: The "check entities" link rendered at https://www.wikidata.org/wiki/EntitySchema:E100" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701093 (https://phabricator.wikimedia.org/T262072) (owner: 10Krinkle) [16:54:26] (03PS3) 10Krinkle: CommonSettings: Restore wgCSPFalsePositiveUrls for intuition.toolforge.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701094 (https://phabricator.wikimedia.org/T207900) [16:54:53] PROBLEM - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is CRITICAL: 107 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [16:56:37] (03PS4) 10Jgiannelos: Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) [16:59:34] (03CR) 10Jgiannelos: "> Patch Set 1: Code-Review-1" [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) (owner: 10Jgiannelos) [17:00:29] RECOVERY - MediaWiki exceptions and fatals per minute for parsoid on alert1001 is OK: (C)100 gt (W)50 gt 10 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [17:06:11] (03CR) 10Brennen Bearnes: [C: 04-1] "> Patch Set 1:" [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/701068 (https://phabricator.wikimedia.org/T274463) (owner: 10Jelto) [17:07:43] !log beginning rolling reboots of kafka-main200[1-5] for updates [17:07:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:09:10] (03CR) 10Brennen Bearnes: [C: 04-1] "An example of current behavior: https://phabricator.wikimedia.org/P16708" [gitlab-ansible] - 10https://gerrit.wikimedia.org/r/701068 
(https://phabricator.wikimedia.org/T274463) (owner: 10Jelto) [17:09:34] (03CR) 10MSantos: [C: 03+2] Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) (owner: 10Jgiannelos) [17:09:36] (03CR) 10Jgiannelos: "Adding Effie in the loop." [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) (owner: 10Jgiannelos) [17:12:28] (03Merged) 10jenkins-bot: Maps vector server PostGIS query improvements [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) (owner: 10Jgiannelos) [17:22:02] (03PS1) 10Gergő Tisza: Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700741 (https://phabricator.wikimedia.org/T284740) [17:23:02] (03PS1) 10Gergő Tisza: Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/700742 (https://phabricator.wikimedia.org/T284740) [17:25:35] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): cloudvirt1038: PCIe error - https://phabricator.wikimedia.org/T276922 (10nskaggs) @Cmjohnson Are we still looking at returning this machine? Has an RMA been started? 
[17:27:07] (03CR) 10Legoktm: [C: 03+2] Bump envoy timeout for parsoid-php [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [17:27:12] (03PS2) 10Legoktm: Bump envoy timeout for parsoid-php [puppet] - 10https://gerrit.wikimedia.org/r/699425 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [17:41:15] PROBLEM - Host wdqs1013 is DOWN: PING CRITICAL - Packet loss = 100% [17:41:21] RECOVERY - Host wdqs1013 is UP: PING OK - Packet loss = 0%, RTA = 0.31 ms [17:43:55] 10SRE, 10SRE-Access-Requests: re-open access to Analytic Cluster for ChristineDeKock - https://phabricator.wikimedia.org/T284987 (10ChristineDeKock) 05Resolved→03Open [17:44:07] 10SRE, 10SRE-Access-Requests: re-open access to Analytic Cluster for ChristineDeKock - https://phabricator.wikimedia.org/T284987 (10ChristineDeKock) Hi all, Thank you for your trouble on this. Unfortunately, I am having some trouble with credentials when accessing the server. I ssh’d into stat1008.eqiad.wm... [17:49:28] 10SRE, 10conftool, 10serviceops, 10Datacenter-Switchover, 10Patch-For-Review: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (10RLazarus) >>! In T266717#7171743, @Legoktm wrote: > I guess the concern is that starting the scripts right away just adds extra pressure an... 
[17:51:23] (03PS2) 10Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false [puppet] - 10https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) [17:51:25] (03PS1) 10Legoktm: mediawiki: Port mw-cli-wrapper to Python [puppet] - 10https://gerrit.wikimedia.org/r/701164 [17:51:57] (03CR) 10Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: 10Legoktm) [17:52:11] (03PS2) 10Ottomata: Enable canary events for NavigationTiming ext streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699789 (https://phabricator.wikimedia.org/T271208) [17:55:11] (03CR) 10Ottomata: [C: 03+2] Enable canary events for NavigationTiming ext streams [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699789 (https://phabricator.wikimedia.org/T271208) (owner: 10Ottomata) [17:57:46] !log otto@deploy1002 Synchronized wmf-config/InitialiseSettings.php: Enable canary events for NavigationTiming ext streams - T271208, T266798 (duration: 01m 29s) [17:57:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:57:52] T271208: NavigationTiming Extension schemas Event Platform Migration - https://phabricator.wikimedia.org/T271208 [17:57:52] T266798: Enable canary events for all streams - https://phabricator.wikimedia.org/T266798 [17:58:02] !log beginning rolling reboots of kafka-main100[1-5] for updates [17:58:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:00:04] marxarelli and jeena: That opportune time is upon us again. Time for a Train log triage with CPT deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1800). [18:00:05] RoanKattouw, Niharika, and Urbanecm: Time to snap out of that daydream and deploy Morning backport window. Get on with it. 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1800). [18:00:05] thand, ottomata, Urbanecm, and tgr: A patch you scheduled for Morning backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:00:11] I can deploy today [18:00:11] o/ [18:00:23] (unless ottomata or others want to self-service) [18:00:37] not very familiar with backports, would apprecate the help [18:00:43] i do configs often, never done backpot [18:00:59] its a noop backport to wmf.11, just didn't make the branch cut [18:01:01] ottomata: do you want me to show you how to do it, or do it for you? [18:01:15] (I'm happy to do any of those, up to you) [18:01:55] (03CR) 10Urbanecm: [C: 03+2] Finalize WMDEBanner* schema migration to Event Platform [extensions/WikimediaEvents] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701103 (https://phabricator.wikimedia.org/T282562) (owner: 10Ottomata) [18:01:59] would appreciate if you did it for me! (that's why I scheduled it :) ), thank you very much! [18:02:10] okay, will do :) [18:02:21] (03CR) 10Urbanecm: [C: 03+2] EditGrowthConfig: Suggested edit "Learn more" link should support interwiki [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700740 (https://phabricator.wikimedia.org/T279886) (owner: 10Urbanecm) [18:02:40] tgr|away: around? 
[18:03:16] (03PS2) 10Urbanecm: Make Growth features available to newcomers at lvwiki and skwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701117 (https://phabricator.wikimedia.org/T278191) [18:03:21] (03CR) 10Urbanecm: [C: 03+2] Make Growth features available to newcomers at lvwiki and skwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701117 (https://phabricator.wikimedia.org/T278191) (owner: 10Urbanecm) [18:04:26] (03Merged) 10jenkins-bot: Make Growth features available to newcomers at lvwiki and skwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701117 (https://phabricator.wikimedia.org/T278191) (owner: 10Urbanecm) [18:04:47] I don't see thand at IRC right now [18:04:55] that's me [18:05:00] I'm under wikitrent_ [18:05:30] hello wikitrent_ ! Are you able to test the patch at testwiki once it's ready for testing? [18:05:49] yes [18:05:58] great, i'll ping you when it's ready :) [18:06:40] (03Merged) 10jenkins-bot: Finalize WMDEBanner* schema migration to Event Platform [extensions/WikimediaEvents] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701103 (https://phabricator.wikimedia.org/T282562) (owner: 10Ottomata) [18:07:14] urbanecm: sorry, had to go afk for a few minutes [18:08:35] ack, no worries [18:09:04] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: b4a786799a249d2012b9c47553a0b64fdce1bac0: Make Growth features available to newcomers at lvwiki and skwiki (T278191; T284149) (duration: 01m 06s) [18:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:11] T278191: Deploy Growth experiments at Latvian Wikipedia - https://phabricator.wikimedia.org/T278191 [18:09:11] T284149: Deploy Growth features on Slovak Wikipedia - https://phabricator.wikimedia.org/T284149 [18:09:16] (03CR) 10Urbanecm: [C: 03+2] Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700741 
(https://phabricator.wikimedia.org/T284740) (owner: 10Gergő Tisza) [18:09:18] (03CR) 10Urbanecm: [C: 03+2] Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/700742 (https://phabricator.wikimedia.org/T284740) (owner: 10Gergő Tisza) [18:11:09] ottomata: I assume no testing is necessary for your patch, right? [18:11:22] (as you said it's a noop backport) [18:11:42] (03CR) 10Effie Mouzeli: "> Patch Set 4:" [deployment-charts] - 10https://gerrit.wikimedia.org/r/685799 (https://phabricator.wikimedia.org/T281976) (owner: 10Jgiannelos) [18:11:54] (03PS2) 10Urbanecm: Revert "Revert "Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699558 (owner: 10STran) [18:11:56] (03CR) 10Urbanecm: [C: 03+2] Revert "Revert "Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699558 (owner: 10STran) [18:13:20] ottomata: see my message above [18:13:31] (03Merged) 10jenkins-bot: Revert "Revert "Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites"" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699558 (owner: 10STran) [18:13:54] wikitrent_: your patch is available at mwdebug1001 for testing, can you have a look please? [18:15:00] k. testing now [18:16:26] (03CR) 10RLazarus: "Thanks for doing this!" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701164 (owner: 10Legoktm) [18:16:30] I'm going to mess around on mwmaint2002 to test some systemd stuff [18:17:05] legoktm: noting it's B&C time, and i'm running scap sync-file from time to time -- hopefully won't affect you. 
[18:19:08] 20:13 ottomata: see my message above # 2 [18:19:35] urbanecm: ack, it shouldn't [18:19:38] just in case any alerts come [18:22:07] !log ebernhardson@deploy1002 Started deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try [18:22:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:23:12] wikitrent_: how is the test going? 🙂 [18:25:37] Sorry I'm helping Trent w/this and we mistook beta for test. We're working on getting admin access for Trent so he can test this. Sorry for the delay! 🙇‍♂️ [18:25:51] Tran: if you can give me his username, happy to grant +sysop [18:25:59] "Developertrent" [18:26:00] *their username [18:27:49] (change visibility) 20:27, 23 June 2021 Martin Urbanec talk contribs block changed group membership for Developertrent from (none) to administrator (temporary, until 20:27, 23 July 2021) (needed to test something with AHT) [18:28:02] Tran: wikitrent_: ^^ [18:28:33] (03Merged) 10jenkins-bot: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700740 (https://phabricator.wikimedia.org/T279886) (owner: 10Urbanecm) [18:28:39] (03CR) 10Samuel (WMF): [C: 03+1] "No issues on my end. Thanks for submitting it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701094 (https://phabricator.wikimedia.org/T207900) (owner: 10Krinkle) [18:29:56] Tran: wikitrent_: lmk if that fixed the access issue plese :) [18:29:59] *please [18:31:15] can confirm it works as expected on testwiki [18:31:19] !log ebernhardson@deploy1002 Finished deploy [wikimedia/discovery/analytics@0535b94]: expect eventgate events for all datacenters, second try (duration: 09m 11s) [18:31:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:22] thank you for the admin access [18:31:46] excellent. Syncing it to production. 
[18:33:14] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 3a2fc6e3687817db5c774e7e527a10dd9e974138: Enable $wgSecurePollSingleTransferableVoteEnabled on beta sites (duration: 01m 05s) [18:33:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:21] wikitrent_: should be done! [18:33:37] thanks! I appreciate all the help :) [18:33:45] any time :) [18:34:35] (03Merged) 10jenkins-bot: Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700741 (https://phabricator.wikimedia.org/T284740) (owner: 10Gergő Tisza) [18:34:39] (03Merged) 10jenkins-bot: Add custom signup flow for donors [extensions/GrowthExperiments] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/700742 (https://phabricator.wikimedia.org/T284740) (owner: 10Gergő Tisza) [18:34:51] PROBLEM - Prometheus jobs reduced availability on alert1001 is CRITICAL: job=swagger_check_citoid_cluster_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:35:14] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/includes/Specials/SpecialEditGrowthConfig.php: 17efbafc300c6745928415d7a10f2dad8f406de4: EditGrowthConfig: Suggested edit "Learn more" link should support interwiki (T279886; T285385) (duration: 01m 06s) [18:35:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:35:21] T279886: Special:EditGrowthConfig does not allow users to use interwiki titles - https://phabricator.wikimedia.org/T279886 [18:35:21] T285385: PHP Deprecated: Use of HTMLTitleTextField::validate will reject external titles in 1.38 when interwiki is false was deprecated in MediaWiki 1.37. 
[Called from HTMLFormField::getErrorsRaw] - https://phabricator.wikimedia.org/T285385 [18:35:39] urbanecm: sorry yes no testing needed [18:35:49] thanks ottomata, syncing [18:35:51] phew [18:35:55] sorry got pulled into aconvo [18:35:56] thank you [18:36:02] no problem [18:36:37] tgr: pulled your patches to mwdebug1001, can you have a look? [18:36:43] RECOVERY - Prometheus jobs reduced availability on alert1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [18:37:21] due to i18n changes, i'll have to use sync-world to get it live -- messages will probably be broken at the debug host [18:37:24] but otherwise it should work [18:37:35] looking [18:39:17] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.11/extensions/WikimediaEvents/extension.json: 01f034b466ff7bdd274e18c9ad7cefe88245548d: Finalize WMDEBanner* schema migration to Event Platform (T282562) (duration: 01m 05s) [18:39:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:39:22] T282562: WMDEBanner* Event Platform Migration - https://phabricator.wikimedia.org/T282562 [18:39:33] ottomata: synced [18:40:10] urbanecm: oh duh, forgot it's feature-flagged. Do we have time for a quick follow-up? 
[18:40:30] tgr: certainly, you're the last B&C customer and we have 20 minutes [18:44:59] PROBLEM - MariaDB Replica Lag: m2 on db1117 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1073.06 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [18:44:59] PROBLEM - MariaDB Replica Lag: m2 on db2133 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1073.54 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [18:45:10] (03PS1) 10Gergő Tisza: Enable GrowthExperiments donor landing page for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701169 (https://phabricator.wikimedia.org/T284799) [18:45:58] urbanecm: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/701169 (will add to the calendar afterwards) [18:46:13] great, merging [18:46:18] (03CR) 10Urbanecm: [C: 03+2] Enable GrowthExperiments donor landing page for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701169 (https://phabricator.wikimedia.org/T284799) (owner: 10Gergő Tisza) [18:46:25] PROBLEM - MariaDB Replica Lag: m2 on db2078 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 1159.81 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [18:47:03] (03Merged) 10jenkins-bot: Enable GrowthExperiments donor landing page for testing [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701169 (https://phabricator.wikimedia.org/T284799) (owner: 10Gergő Tisza) [18:47:32] tgr: pulled onto mwdebug1001 [18:48:33] (03CR) 10RLazarus: "Whew, this is indeed much cleaner! I commented on T266717 about whether to use the read_only state -- assuming we're going ahead with that" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: 10Legoktm) [18:51:13] urbanecm: thanks, works [18:51:19] great, syncing [18:51:52] thank you so much urbanecm ! 
[18:53:33] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.11/extensions/GrowthExperiments/: 76e5fc91083736d14049a05ed227cdea015c113e: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 01m 07s) [18:53:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:53:42] T285281: Donors to newcomers: go straight to homepage - https://phabricator.wikimedia.org/T285281 [18:53:42] T284800: Donors to newcomers: URL parameters - https://phabricator.wikimedia.org/T284800 [18:53:42] T284740: Donors to newcomers: design an enhanced account creation landing page - https://phabricator.wikimedia.org/T284740 [18:54:15] !log urbanecm@deploy1002 Scap failed!: 6/9 canaries failed their endpoint checks(https://en.wikipedia.org) [18:54:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:54:21] RECOVERY - MariaDB Replica Lag: m2 on db2133 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [18:54:21] uhoh [18:54:57] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [18:55:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:01] !log urbanecm@deploy1002 sync-file aborted: REVERT: 76e5fc91083736d14049a05ed227cdea015c113e: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 01s) [18:55:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:10] are scap errors logged somewhere? 
[18:55:33] 10Puppet, 10SRE, 10Infrastructure-Foundations: Puppet does not undo manual "systemd mask $unit" - https://phabricator.wikimedia.org/T285425 (10Legoktm) [18:55:42] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: REVERT: 76e5fc91083736d14049a05ed227cdea015c113e: Add custom signup flow for donors (T284740; T284800; T285281) (duration: 00m 38s) [18:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:56] tgr: Check 'Check endpoints for mw1265.eqiad.wmnet' is CRITICAL: Test Special Version returned the unexpected status 500 (expecting: 200) [18:56:07] for en.wikipedia.org [18:56:19] it should theoretically be in logstash [18:57:02] (03PS1) 10Legoktm: systemd: Ensure units are unmasked [puppet] - 10https://gerrit.wikimedia.org/r/701171 (https://phabricator.wikimedia.org/T285425) [18:57:39] PROBLEM - MediaWiki exceptions and fatals per minute for appserver on alert1001 is CRITICAL: 3948 gt 100 https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [18:57:42] pulled it to mwdebug again [18:58:01] there's a bunch of 'Call to undefined method GrowthExperiments\VariantHooks::onSpecialPage_initList()' but I assume that's just deploy race conditions [18:58:26] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [18:58:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:43] (5000 of them, wow.) [18:58:45] right [18:59:05] the good question is...how are we going to deploy this [18:59:15] is there a safe order to sync this in? [18:59:31] RECOVERY - MediaWiki exceptions and fatals per minute for appserver on alert1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=18&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [19:00:04] marxarelli and jeena: Your horoscope predicts another unfortunate MediaWiki train - American Version deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1900). [19:00:16] I guess everything else then extension.json? [19:00:21] probably [19:00:33] jeena: please wait with train for a while [19:00:42] complications with B&C deployment [19:00:48] 👍 dduvall ^ [19:01:03] should we just stop and leave it to the next backport window? [19:01:14] ack. (5k error spike!) [19:01:20] dduvall: yeah, that's me :/ [19:01:45] Special:Version does work for me, and I can't find the related error [19:02:09] no problem. i will wait patiently [19:02:10] there's a 'Too few arguments to function GrowthExperiments\VariantHooks::__construct()' but that's probably also a race condition [19:02:47] and a few 'Argument 1 passed to MediaWiki\User\UserNameUtils::getCanonical() must be of the type string, null given' which I think are unrelated [19:03:28] tgr: let's revert it and do it later. 
Syncing everything else then extension.json will probably work for hook, but not for the dependencies [19:03:38] (for the hook) [19:03:48] right [19:04:11] (03PS1) 10Urbanecm: Revert "Add custom signup flow for donors" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700744 (https://phabricator.wikimedia.org/T284740) [19:04:22] (03CR) 10Urbanecm: [V: 03+2 C: 03+2] Revert "Add custom signup flow for donors" [extensions/GrowthExperiments] (wmf/1.37.0-wmf.9) - 10https://gerrit.wikimedia.org/r/700744 (https://phabricator.wikimedia.org/T284740) (owner: 10Urbanecm) [19:06:05] going to sync it to ensure there's nothing left from the failed deployment [19:06:09] I don't think there's a safe ordering - the VariantHooks constructor needs extension.json for the new argument, changing extension.json declares VariantHooks::onSpecialPage_initList and that hook is used a lot. [19:06:42] 10SRE, 10SRE-Access-Requests: re-open access to Analytic Cluster for ChristineDeKock - https://phabricator.wikimedia.org/T284987 (10Ottomata) Hi, your ssh/shell username is `christinedk`. You shouldn't need a password to ssh into stat1008. If you can ssh into stat1008 (it sounds like you did at first?) then... 
[19:07:03] probably what triggered the canaries as well [19:07:11] yeah, i think that's what happened [19:07:15] 10Puppet, 10SRE, 10Infrastructure-Foundations, 10Patch-For-Review: Puppet does not undo manual "systemctl mask $unit" - https://phabricator.wikimedia.org/T285425 (10Legoktm) [19:07:32] !log urbanecm@deploy1002 Synchronized php-1.37.0-wmf.9/extensions/GrowthExperiments/: 2338e53: Revert "Add custom signup flow for donors" (T284740; T284800; T285281) (duration: 01m 06s) [19:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:07:39] T285281: Donors to newcomers: go straight to homepage - https://phabricator.wikimedia.org/T285281 [19:07:39] T284800: Donors to newcomers: URL parameters - https://phabricator.wikimedia.org/T284800 [19:07:40] T284740: Donors to newcomers: design an enhanced account creation landing page - https://phabricator.wikimedia.org/T284740 [19:07:56] not sure how it can be deployed other than overriding canaries and causing another 5K errors [19:08:14] so (gathering information for future deployment trainings) is there a way to restructure the patches to make it safe? [19:08:28] urbanecm: can you sync or revert the config patch to avoid future confusion? [19:08:35] tgr: you can make three patches: add the argument to VariantHooks constructor as nullable (fallback to global state), do extension.json change and remove the nullable [19:08:37] tgr: sure, on it [19:09:45] apergos: yes, basically making it three patches, see above [19:10:00] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 6e0f5ad88cb7d99e3b4cf48bccb6e34cdcc64fa5: Enable GrowthExperiments donor landing page for testing (T284799) (duration: 01m 05s) [19:10:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:10:06] T284799: [EPIC] Encourage donors to create accounts - https://phabricator.wikimedia.org/T284799 [19:10:23] ugh. I suppose there's no better way. 
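The three-patch split described above (make the new constructor argument nullable with a fallback to global state, then change the registration, then drop the nullable) can be sketched roughly as follows. This is a Python analogy, not the GrowthExperiments PHP code: `GLOBAL_SERVICES`, `campaign_config`, `OLD_WIRING`, `NEW_WIRING`, `build_and_run`, and the `DonorLanding` page name are all hypothetical, and the wiring dictionaries only stand in for what extension.json declares. It shows only the first patch, under the assumption that the handler code has reached every host before the registration changes.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for the global state the handler falls back to.
GLOBAL_SERVICES = {"campaign_config": {"source": "global"}}


class VariantHooks:
    """Patch 1: the new argument is optional and the new hook method exists."""

    def __init__(self, options: dict, campaign_config: Optional[dict] = None):
        self.options = options
        # Fall back to global state while some hosts still use the old wiring.
        if campaign_config is None:
            campaign_config = GLOBAL_SERVICES["campaign_config"]
        self.campaign_config = campaign_config

    def on_special_page_init_list(self, pages: List[str]) -> None:
        # Present before any registration refers to it.
        pages.append("DonorLanding")


# Stand-ins for the registration before and after patch 2.
OLD_WIRING = {"args": ["options"], "hooks": []}
NEW_WIRING = {
    "args": ["options", "campaign_config"],
    "hooks": ["on_special_page_init_list"],
}


def build_and_run(wiring: dict) -> Tuple[List[str], str]:
    """Construct the handler the way the wiring says and run its hooks."""
    available = {"options": {}, "campaign_config": {"source": "injected"}}
    handler = VariantHooks(*(available[name] for name in wiring["args"]))
    pages: List[str] = []
    for hook in wiring["hooks"]:
        getattr(handler, hook)(pages)
    return pages, handler.campaign_config["source"]
```

With patch 1 deployed everywhere, a host can be on either registration during the sync without failing to construct the handler or calling an undefined method. Patch 3 would then make `campaign_config` required once no host uses the old wiring.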
[19:10:34] tgr: none that i'm aware of [19:10:36] gotcha [19:10:44] should've noticed it earlier, sorry [19:11:01] tgr: I think we're now good to hand over to train folks, right? [19:11:07] RECOVERY - MariaDB Replica Lag: m2 on db1117 is OK: OK slave_sql_lag Replication lag: 0.00 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [19:11:41] I forgot SpecialPage_initList is called all over the place. I guess one learning is to be cautious with that hook. [19:11:55] yeah, train should be good to go. [19:12:11] dduvall: jeena: the floor is yours, sorry for the delay [19:13:04] urbanecm: hey, no problem. thanks for running the deployment! [19:13:27] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10RobH) [19:13:45] urbanecm: so the wmf.11 version was deployed, right? [19:13:55] (no traffic so no canary errors I suppose) [19:13:59] (03PS1) 10Arlolra: Bump envoy timeout for restbase [puppet] - 10https://gerrit.wikimedia.org/r/701172 (https://phabricator.wikimedia.org/T279825) [19:14:20] tgr: yes, but messages weren't rebuilded [19:14:34] (03CR) 10Arlolra: Use restbase-for-services for VE's VirtualRestClient calls (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/699434 (https://phabricator.wikimedia.org/T279825) (owner: 10Arlolra) [19:14:39] eh, we don't have translations yet anyway [19:15:40] yeah. let's leave it as-is for now. [19:16:03] and the train will do a full scap anyway, I think? [19:16:38] would on tuesdays -- promotion is a config change IIRC. [19:16:39] tgr: only on tuesdays when we deploy to testwikis prior to group0 [19:16:40] the splitting-up thing is probably not worth the effort for the single day wmf.9 has left [19:17:23] ah, right. We can do it later, or just wait until Tuesday. [19:18:34] tgr: won't waiting until Tuesday affect our/Growth's ability to QA this? 
That's the only reason we backported this, iirc [19:19:03] depends on how much we care about raw message markers. [19:19:24] the text is not final yet anyway. [19:19:55] so let's wait it is then -- assuming no donor traffic will get to this feature before :) [19:20:25] !log preparing to promote wmf.11 group1 (T281152) cc'ing risky patch contacts Amir1, Krinkle, DannyS712 [19:20:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:34] T281152: 1.37.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T281152 [19:22:31] (03PS1) 10Dduvall: group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701173 [19:22:33] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701173 (owner: 10Dduvall) [19:23:01] > [{reqId}] {exception_url} Error: Call to undefined method GrowthExperiments\VariantHooks::onSpecialPage_initList() [19:23:12] That looks suspect, but not sure if it's already dealt with [19:23:17] Krinkle: that was me, and should no longer happen [19:23:22] ok :) [19:23:28] (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701173 (owner: 10Dduvall) [19:24:26] there are still some log remnants from the revert too around 1907 UTC i believe, e.g. "ArgumentCountError: Too few arguments to function GrowthExperiments\VariantHooks::__construct()" [19:24:34] is that right, tgr? [19:24:54] dduvall: that's also from the deployment, and should be fixed by the revert. 
[19:25:01] !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11 [19:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:25:16] yeah, that patch was reverted [19:26:08] !log dduvall@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 06s) [19:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:26:18] urbanecm, tgr: great, ty [19:26:25] * Krinkle filteres out wmf.9 from his logstash dashboard [19:26:48] urbanecm: I still don't get how the canary error happened, though. [19:27:04] Do canary checks run while the sync is halfway done? [19:29:22] hmm, seeing a number of "PHP Notice: Undefined index: frameCount". are these related to your patch Amir1? [19:29:25] RECOVERY - MariaDB Replica Lag: m2 on db2078 is OK: OK slave_sql_lag Replication lag: 0.36 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [19:29:47] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 [19:30:28] and some "MovePage:974 Failed to create null revision while moving page ID [...]" [19:30:33] tgr: the synthetic swagger checks run afterward, the logstash checks "run" afterward as well (after 30sec or something), but they compare past minute with some previous time, and we don't depool servers during syncs, so any fatals caused to real users do get noticed by the canary logstash checks [19:31:24] or sync is a lot faster than it once was, but it's still not fully atomic. 
[19:31:34] so what happened is that users ran into errors mid-search, and logstash check "just" noticed that [19:31:34] yeah but this sounded like the swagger check [19:31:36] > Check 'Check endpoints for mw1265.eqiad.wmnet' is CRITICAL: Test Special Version returned the unexpected status 500 (expecting: 200) [19:31:38] php opcache revalidation delay does make it less likely for new files to get picked up before the sync is finished [19:31:42] but it's always possible [19:32:14] aye, yeah, that would def run after sync. [19:32:30] was that fatal logged to logstash? [19:33:07] Krinkle: what tgr quoted was logged to my console, and the error message about the SpecialPage_initList hook you quoted was definitely in logstash [19:35:29] !log rebooting kafkamon hosts for updates [19:35:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:36:29] Krinkle: not that I can find (no errors on Special:Version in the logs) [19:36:47] maybe it wasn't a php fatal then, but something else in the middle. [19:37:46] the wmf.11 logspam has definitely increased beyond what i'm comfortable allowing due to "PHP Notice: Undefined index: frameCount" (seemingly related to https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298). 
i'm going to rollback for now and see if we can't get it addressed [19:37:54] it would make sense that a race condition on SpecialPage_initList breaks Special:Version, I'm just not sure how the timing would work [19:39:07] !log rolling back wmf.11 from group1 due to increase in logspam possibly related to noted risky patch https://gerrit.wikimedia.org/r/c/mediawiki/core/+/693298 (cc T281152 and patch contact Amir1) [19:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:39:11] > Jun 23, 2021 @ 18:54:14.295 Check 'Check endpoints for mw1265.eqiad.wmnet' failed: /wiki/{title} (Main Page) is CRITICAL: Test Main Page returned the unexpected status 500 (expecting: 200); /wiki/{title} (Special Version) is CRITICAL: Test Special Version returned the unexpected status 500 (expecting: 200) [19:39:12] T281152: 1.37.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T281152 [19:40:59] tgr: https://logstash.wikimedia.org/goto/f5530bb597284b8c989740e98ab69cbd [19:41:15] this is mw1265 during those 2min with non-mw types includes [19:41:26] So yeah I guess SpecialPage_initList - if it was a php error [19:42:35] thanks for checking [19:42:40] !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9 [19:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:44:06] (03PS1) 10Dduvall: Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701174 [19:44:08] (03CR) 10Dduvall: [C: 03+2] Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701174 (owner: 10Dduvall) [19:44:14] tgr: lastly, https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-2021.06.23?id=_605OnoBCxLmWkI6E6Ue [19:44:19] in the same second, on enwiki Special:Version [19:44:22] so I guess [19:44:32] but yeah, strange that it happened. 
[19:44:53] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701174 (owner: 10Dduvall) [20:00:04] marxarelli and jeena: It is that lovely time of the day again! You are hereby commanded to deploy MediaWiki train - American Version. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1900). [20:00:05] chrisalbon and accraze: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Services – Graphoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T2000). [20:00:52] 10Puppet, 10Infrastructure-Foundations, 10GitLab (Initialization), 10Patch-For-Review, and 3 others: Puppetise gitlab-ansible playbook - https://phabricator.wikimedia.org/T283076 (10brennen) [20:03:10] dduvall: hi, can I have a ticket or something? [20:03:17] I will take a look after eating dinner [20:03:44] Amir1: yes, just filed it https://phabricator.wikimedia.org/T285431 [20:04:08] sorry. i feel like a jerk filing UBNs for "undefined index" but our logspam policy dictates [20:04:32] gotta keep the logs clean [20:05:21] no totally understanable [20:05:30] This patch is massive [20:10:34] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10RobH) [20:11:57] 10SRE: Please create "grant@wikipedia.org" email handle to use for annual fundraising email test - https://phabricator.wikimedia.org/T285432 (10MNoorWMF) [20:13:05] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [20:13:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:23] dduvall: the patches under sre patches wmf 11 don't have the proper permissions. 
I'm not able to copy a new patch into the directory [20:16:37] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [20:16:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:17:32] !log cmjohnson@cumin1001 START - Cookbook sre.dns.netbox [20:17:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:20:00] !log cmjohnson@cumin1001 END (PASS) - Cookbook sre.dns.netbox (exit_code=0) [20:20:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:08] maryum: /srv/patches/1.37.0-wmf.11 on the deploy host? [20:24:26] dduvall: yes that directory [20:24:30] FYI - secteam is deploying an updated patch for T285190 right now. [20:25:25] i see. fwict it's owned by me (since i ran `scap apply-patches`) and the wikidev group [20:25:32] jouncebot: now [20:25:32] For the next 0 hour(s) and 34 minute(s): MediaWiki train - American Version (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T1900) [20:25:32] For the next 0 hour(s) and 34 minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T2000) [20:25:41] and is group browsable/writable [20:25:42] sbassett: i don't think that's to do wise during train :) [20:26:41] urbanecm: Oh, thought that was finished. We can hold off for another half hour. 
[20:27:15] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10RobH) [20:27:48] ftr i'm not a member of the train crew, i only used deployments calendar to check :) [20:28:28] maryum, sbassett: the dir seems to have the same permissions that patch dirs usually have [20:28:43] dduvall: thanks for checking, I'll try again [20:28:51] np [20:29:06] dduvall: the patch files are owned by you though, and don't appear to have group/world write, at least in extensions/CentralAuth [20:29:17] for wmf.11 [20:29:22] * dduvall checks again [20:29:32] confirmed, sbassett is right [20:29:43] ah, odd [20:29:52] on that note, we should update `/usr/local/sbin/fix-staging-perms` to be able to fix patches perms, too [20:29:53] must be an issue with `scap apply-patches` [20:30:37] maryum, sbassett: k. fixed perms! [20:30:46] dduvall: thanks!! [20:30:59] Not a big deal, but secteam folks do need to be able to overwrite anything under /srv/patches (for cases like the updated patch we want to deploy soon). [20:31:53] sbassett: for /srv/mediawiki-stagging, there's the script i linked above. I'll propose to extend it to /srv/patches, too. So if bad perms happen, you'll just run sudo /usr/local/sbin/fix-staging-perms to fix it [20:32:01] looking again, i believe the file perms were simply retained when `scap apply-patches` ran, and the original patch files have the wrong perms [20:32:48] is that due to the umask of the original writer of a patch file? [20:32:56] probably, yes [20:33:12] Ok, well, then maybe we on the secteam need to be more careful of that as well :) [20:34:06] yeah, i can't have you blaming scap. scap is our scapegoat :p [20:34:18] not enough to go around! [20:35:26] er, scap-goat? [20:36:32] lolz [20:36:39] dduvall: I can't believe we didn't make that rename during goatification! [20:37:32] huge oversight by goat-g (cc greg-g) [20:37:32] but seriously, Scappy is a magnificent pig. 
Some might say radiant even [20:38:57] sure is, and an unsung hero [20:39:49] unsung and undocumented [20:40:42] * dduvall kids. scap's user docs aren't bad. inline comments could be better (see bug #1) [21:01:34] Ok, we'd like to try to deploy an updated sec patch for T285190. Are we clear with the train? [21:01:53] ^ dduvall, urbanecm [21:02:49] sbassett: still waiting on a UBN fix for train to proceed, so you can go ahead [21:03:00] Ok, thx [21:04:16] 10SRE, 10LDAP-Access-Requests: Access request to superset for user natalia-rodriguez - https://phabricator.wikimedia.org/T285436 (10NRodriguez) [21:13:05] (03CR) 10Razzi: [V: 03+1 C: 03+2] superset: rename analytics_cluster::ui::{dashboards,superset} [puppet] - 10https://gerrit.wikimedia.org/r/701066 (https://phabricator.wikimedia.org/T268219) (owner: 10Razzi) [21:13:58] * greg-g looks in all confused [21:14:52] it is a great alt name for scap though [21:15:02] a goat alt name [21:15:05] geez folks [21:15:32] and with that driveby udderly useless comment, I am out, have a good rest of your day/evening ,folks [21:19:44] (03PS3) 10Ryan Kemper: mjolnir: Provide prioritized topics to bulk daemon [puppet] - 10https://gerrit.wikimedia.org/r/699814 (https://phabricator.wikimedia.org/T261407) (owner: 10Ebernhardson) [21:22:45] (03CR) 10Ryan Kemper: [C: 03+2] mjolnir: Provide prioritized topics to bulk daemon [puppet] - 10https://gerrit.wikimedia.org/r/699814 (https://phabricator.wikimedia.org/T261407) (owner: 10Ebernhardson) [21:39:14] 10SRE, 10Infrastructure-Foundations, 10Mail: Please create "grant@wikipedia.org" email handle to use for annual fundraising email test - https://phabricator.wikimedia.org/T285432 (10Peachey88) [21:45:42] !log Deployed updated security patch for T285190 to wmf.9 and wmf.11 [21:45:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:52:03] Krinkle: if you're around and willing to review this fixes it https://gerrit.wikimedia.org/r/701179 [21:53:13] Pchelolo: ^ 
[21:53:42] ok, looking [21:53:42] Amir1: Ah, so the ! check was handling it before [21:54:01] That's a good thing to re-review the original commit again for, to see if any other error handling was lost that way [21:54:12] yeah, it used to return "0", now returns [ "_error" => 0 ] [22:00:23] that's what I was wondering too, it should've returned "0" and I forgot that "0" is falsy... [22:03:10] (03PS1) 10Ladsgroup: Check for _error in getting metadata array in PNGHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701186 (https://phabricator.wikimedia.org/T285431) [22:03:16] (03CR) 10Ladsgroup: [C: 03+2] Check for _error in getting metadata array in PNGHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701186 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [22:04:02] (03PS1) 10RobH: install params for mw14[14-47] [puppet] - 10https://gerrit.wikimedia.org/r/701181 (https://phabricator.wikimedia.org/T273915) [22:05:02] (03CR) 10jerkins-bot: [V: 04-1] install params for mw14[14-47] [puppet] - 10https://gerrit.wikimedia.org/r/701181 (https://phabricator.wikimedia.org/T273915) (owner: 10RobH) [22:06:34] (03PS2) 10RobH: install params for mw14[14-47] [puppet] - 10https://gerrit.wikimedia.org/r/701181 (https://phabricator.wikimedia.org/T273915) [22:06:35] Amir1: sorry, i neglected to mention this in the task but i was seeing the same notice from GIFHandler, line 167 [22:06:41] i'll update the task [22:07:17] Please do, I'll make a patch for that as well. 
I should check what others do as well [22:07:50] (03CR) 10RobH: [C: 03+2] install params for mw14[14-47] [puppet] - 10https://gerrit.wikimedia.org/r/701181 (https://phabricator.wikimedia.org/T273915) (owner: 10RobH) [22:10:58] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10RobH) [22:12:10] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` mw1414.eqiad.wmnet ` The log can be found in `/var/log/wmf-au... [22:12:59] dduvall: can you tell if there is any other place causing error as well? [22:13:13] 10SRE, 10ops-eqiad, 10DC-Ops, 10cloud-services-team (Hardware): (Need By: TBD) rack/setup/install cloudcephosd10[16-20].eqiad.wmnet - https://phabricator.wikimedia.org/T274945 (10RobH) a:05RobH→03Cmjohnson I'm not sure why I'm not seeing a DHCP request for cloudcephosd1016: cloudcephosd1016. rack C8.... [22:13:15] not that i saw during the deployment [22:14:10] okay, let's get these two merged and backported, roll the train to group1 (if there is no other blocker) and then I'll fix anything left before rolling to group2. [22:14:15] Does that sound good to you? [22:15:19] I can't find any other usecase dduvall [22:15:25] but there might be some here an there [22:15:44] that sounds good, Amir1. thanks for jumping on this! [22:16:03] jouncebot: now [22:16:03] No deployments scheduled for the next 0 hour(s) and 43 minute(s) [22:16:48] oh I'm sorry I caused this. The thing is that this rework is needed to reduce the chance of commons database blowing up again [22:16:57] it's currently a ticking bomb [22:18:42] Pchelolo: the second set :D https://gerrit.wikimedia.org/r/701182 [22:18:55] I hope this covers all [22:19:01] so far couldn't find anything else [22:19:01] hey, no problem. 
i feel quite relaxed :) [22:19:45] ^^ [22:21:55] (03CR) 10jerkins-bot: [V: 04-1] Check for _error in getting metadata array in PNGHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701186 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [22:23:04] (03CR) 10Ladsgroup: [C: 03+2] "... Again" [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701186 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [22:24:40] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE [22:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:26:50] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1414.eqiad.wmnet with reason: REIMAGE [22:26:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:29:20] I need a +2 for the second patch. Krinkle 🥺 [22:31:31] https://gerrit.wikimedia.org/r/c/mediawiki/core/+/701182 [22:35:18] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['mw1414.eqiad.wmnet'] ` and were **ALL** successful. 
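[editor's note] The regression discussed between 21:53 and 22:00 (the failure value changed from the falsy "0" to the array [ "_error" => 0 ], so the old "!" check stopped catching it) can be illustrated with a small Python analogy. The function names and the default frame count of 1 are hypothetical, not taken from PNGHandler or GIFHandler. Note also that the string "0" is falsy in PHP but truthy in Python, so the sketch uses the empty string as the falsy stand-in.

```python
def frame_count_old_guard(metadata):
    """Guard written for the old failure value, which was simply falsy."""
    if not metadata:
        return 1  # hypothetical default for unreadable metadata
    # An error array such as {"_error": 0} is non-empty, so it passes the
    # guard and the lookup below fails (KeyError here; a notice in PHP).
    return metadata["frameCount"]


def frame_count_checked(metadata):
    """Guard that also recognises the new error shape."""
    # Test for the key's presence rather than its value: an error code of 0
    # is itself falsy and would slip past a truthiness test.
    if not metadata or "_error" in metadata:
        return 1  # same hypothetical default
    return metadata.get("frameCount", 1)


print(frame_count_checked({"_error": 0}), frame_count_checked({"frameCount": 3}))
```

Checking for the key rather than its truthiness keeps the guard correct whatever error code is stored under it.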
[22:40:47] (03Merged) 10jenkins-bot: Check for _error in getting metadata array in PNGHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701186 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [22:42:08] (03PS1) 10Ladsgroup: Check for _error in getting metadata array in GIFHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701187 (https://phabricator.wikimedia.org/T285431) [22:42:14] (03CR) 10Ladsgroup: [C: 03+2] Check for _error in getting metadata array in GIFHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701187 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [22:42:36] !log ladsgroup@deploy1002 Synchronized php-1.37.0-wmf.11/includes/media/PNGHandler.php: Backport: [[gerrit:701186|Check for _error in getting metadata array in PNGHandler (T285431)]] (duration: 01m 06s) [22:42:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:42] T285431: PHP Notice: Undefined index: frameCount - https://phabricator.wikimedia.org/T285431 [22:56:31] 10SRE, 10ops-eqiad, 10DC-Ops, 10serviceops-radar: (Need By: TBD) rack/setup/install mw14[14-56] - https://phabricator.wikimedia.org/T273915 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['mw1415.eqiad.wmnet', 'mw1416.eqiad.wmnet', 'mw1417.eqiad.wm... [22:56:39] rise servers, riseeeee! [22:58:08] 10SRE, 10conftool, 10serviceops, 10Datacenter-Switchover, 10Patch-For-Review: Disable maintenance scripts via conftool - https://phabricator.wikimedia.org/T266717 (10Legoktm) Ack, and we already have a separate step to re-enable them. The masking I suggested earlier doesn't seem to work exactly yet, see... [23:00:05] RoanKattouw, Niharika, and Urbanecm: #bothumor Q:How do functions break up? A:They stop calling each other. Rise for Evening backport window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20210623T2300). 
[23:00:05] No GERRIT patches in the queue for this window AFAICS. [23:02:42] (03CR) 10Legoktm: [C: 03+2] Bump envoy timeout for restbase [puppet] - 10https://gerrit.wikimedia.org/r/701172 (https://phabricator.wikimedia.org/T279825) (owner: 10Arlolra) [23:03:48] (03Merged) 10jenkins-bot: Check for _error in getting metadata array in GIFHandler [core] (wmf/1.37.0-wmf.11) - 10https://gerrit.wikimedia.org/r/701187 (https://phabricator.wikimedia.org/T285431) (owner: 10Ladsgroup) [23:05:14] !log ladsgroup@deploy1002 Synchronized php-1.37.0-wmf.11/includes/media/GIFHandler.php: Backport: [[gerrit:701187|Check for _error in getting metadata array in GIFHandler (T285431)]] (duration: 01m 06s) [23:05:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:05:19] T285431: PHP Notice: Undefined index: frameCount - https://phabricator.wikimedia.org/T285431 [23:06:56] are the sites very slow for anyone else, or just me? [23:08:00] Guest4094: seems ok to me [23:08:16] re-rolling group1 momentarily [23:09:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE [23:09:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:43] !log re-rolling group1 to 1.37.0-wmf.11 (T281152) following deployment of blocker fixes [23:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:10:48] T281152: 1.37.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T281152 [23:10:59] (03PS1) 10Dduvall: group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701185 [23:10:59] looks like i have 60% packet loss when pinging wikimedia sites, probably not our fault [23:11:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE [23:11:01] (03CR) 10Dduvall: [C: 03+2] group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701185 (owner: 
10Dduvall) [23:11:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:11:14] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1415.eqiad.wmnet with reason: REIMAGE [23:11:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:11:39] (03Merged) 10jenkins-bot: group1 wikis to 1.37.0-wmf.11 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701185 (owner: 10Dduvall) [23:13:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE [23:13:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:13:03] !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: group1 wikis to 1.37.0-wmf.11 [23:13:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:13:29] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1416.eqiad.wmnet with reason: REIMAGE [23:13:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:14:08] !log dduvall@deploy1002 Synchronized php: group1 wikis to 1.37.0-wmf.11 (duration: 01m 04s) [23:14:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:08] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE [23:15:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:15:36] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1417.eqiad.wmnet with reason: REIMAGE [23:15:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:03] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE [23:17:06] Amir1: inexplicably seeing "PHP Notice: Undefined index: frameCount" again, now on line 156 :/ [23:17:06] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:17:44] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1418.eqiad.wmnet with reason: REIMAGE [23:17:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:08] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE [23:19:10] Amir1: the && error seems probematic as the error might be falsey [23:19:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:21] !log rolling back 1.37.0-wmf.11 from group1 (T281152) due to reoccurrence of "PHP Notice: Undefined index: frameCount" now at PNGHandler.php:156 (T285431) [23:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:19:27] T281152: 1.37.0-wmf.11 deployment blockers - https://phabricator.wikimedia.org/T281152 [23:19:27] T285431: PHP Notice: Undefined index: frameCount - https://phabricator.wikimedia.org/T285431 [23:19:48] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1419.eqiad.wmnet with reason: REIMAGE [23:19:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:20:01] i unfortunately have to call it a day after rollback and pick up my kid, but i can do group1 and group2 tomorrow morning [23:20:12] er, group1 tomorrow morning and group2 at the normal hour [23:21:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE [23:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:51] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: REIMAGE [23:21:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:21:58] (03PS1) 1020after4: Remove reference to obsolete phabricator libraries. 
[puppet] - 10https://gerrit.wikimedia.org/r/701206 [23:21:59] dduvall: yeah, it'll be an easy fix. just a silly mistake in the patch [23:22:15] I've left CR pointing out that you noticed the error moved one line down, and how to fix it [23:22:27] !log dduvall@deploy1002 rebuilt and synchronized wikiversions files: Revert group1 wikis to 1.37.0-wmf.9 [23:22:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:22:51] Krinkle: right on. thank you! [23:23:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE [23:23:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:23:40] (03PS1) 10Dduvall: Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701207 [23:23:42] (03CR) 10Dduvall: [C: 03+2] Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701207 (owner: 10Dduvall) [23:23:53] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1421.eqiad.wmnet with reason: REIMAGE [23:23:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:24:19] (03Merged) 10jenkins-bot: Revert "group1 wikis to 1.37.0-wmf.11" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/701207 (owner: 10Dduvall) [23:25:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE [23:25:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:05] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1422.eqiad.wmnet with reason: REIMAGE [23:26:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE [23:27:03] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:28:12] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1423.eqiad.wmnet with reason: REIMAGE [23:28:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:29:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE [23:29:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:30:26] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1424.eqiad.wmnet with reason: REIMAGE [23:30:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:08] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE [23:31:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:32:35] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1425.eqiad.wmnet with reason: REIMAGE [23:32:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:33:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE [23:33:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:34:42] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1426.eqiad.wmnet with reason: REIMAGE [23:34:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:35:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE [23:35:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:36:55] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1427.eqiad.wmnet with reason: REIMAGE [23:36:58] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE [23:37:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:39:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE [23:39:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:39:07] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1428.eqiad.wmnet with reason: REIMAGE [23:39:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:41:00] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE [23:41:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:41:12] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1430.eqiad.wmnet with reason: REIMAGE [23:41:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:42:01] !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1429.eqiad.wmnet with reason: REIMAGE [23:42:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:43:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE [23:43:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:43:07] (03Abandoned) 10Arlolra: Switch to using parsoid-async for direct VirtualRestClient connects [mediawiki-config] - 10https://gerrit.wikimedia.org/r/700077 (https://phabricator.wikimedia.org/T244609) (owner: 10Arlolra) [23:45:09] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE [23:45:12] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:45:24] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1431.eqiad.wmnet with reason: REIMAGE [23:45:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:46:50] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE [23:46:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:48:01] !log robh@cumin1001 END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1432.eqiad.wmnet with reason: REIMAGE [23:48:02] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1433.eqiad.wmnet with reason: REIMAGE [23:48:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:48:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:48:51] (03CR) 10Legoktm: mediawiki: Port mw-cli-wrapper to Python (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/701164 (owner: 10Legoktm) [23:49:04] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE [23:49:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:50:19] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1434.eqiad.wmnet with reason: REIMAGE [23:50:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:51:05] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE [23:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:52:37] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1435.eqiad.wmnet with reason: REIMAGE [23:52:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:52:42] 10SRE, 10serviceops, 
10Parsoid (Tracking): Maybe consider consolidating parsoid-* and restbase-* proxy services, respectively - https://phabricator.wikimedia.org/T285445 (10Arlolra) [23:53:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE [23:53:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:53:59] (03PS2) 10Legoktm: mediawiki: Port mw-cli-wrapper to Python [puppet] - 10https://gerrit.wikimedia.org/r/701164 [23:54:01] (03PS3) 10Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false [puppet] - 10https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) [23:54:07] (03CR) 10Legoktm: mediawiki: mw-cli-wrapper: Only run if read only in confctl is false (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/701055 (https://phabricator.wikimedia.org/T266717) (owner: 10Legoktm) [23:54:49] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1436.eqiad.wmnet with reason: REIMAGE [23:54:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:55:08] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE [23:55:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:56:54] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1437.eqiad.wmnet with reason: REIMAGE [23:56:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:57:09] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 2:00:00 on mw1439.eqiad.wmnet with reason: REIMAGE [23:57:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:00] !log robh@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1438.eqiad.wmnet with reason: REIMAGE [23:59:01] !log robh@cumin1001 START - Cookbook sre.hosts.downtime for 
2:00:00 on mw1440.eqiad.wmnet with reason: REIMAGE [23:59:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log