[01:09:36] RECOVERY - MariaDB Replica Lag: s1 on db2141 is OK: OK slave_sql_lag Replication lag: 0.31 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica [02:10:47] (03PS1) 10Reedy: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735754 [02:10:49] (03CR) 10Reedy: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735754 (owner: 10Reedy) [02:13:56] (03Abandoned) 10Reedy: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735754 (owner: 10Reedy) [02:16:25] (03PS1) 10Reedy: Add ami to langlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735755 (https://phabricator.wikimedia.org/T294717) [02:19:34] (03CR) 10Reedy: [C: 03+2] Add ami to langlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735755 (https://phabricator.wikimedia.org/T294717) (owner: 10Reedy) [02:20:14] (03Merged) 10jenkins-bot: Add ami to langlist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735755 (https://phabricator.wikimedia.org/T294717) (owner: 10Reedy) [02:20:33] (03PS1) 10Reedy: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735756 [02:20:35] (03CR) 10Reedy: [C: 03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735756 (owner: 10Reedy) [02:20:55] (03Abandoned) 10Reedy: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735756 (owner: 10Reedy) [02:22:28] !log reedy@deploy1002 Synchronized langlist: Add ami to langlist T294717 T292414 (duration: 00m 55s) [02:22:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:22:37] T294717: Enable ami interlanguage prefix - https://phabricator.wikimedia.org/T294717 [02:22:37] T292414: Create Wikipedia Amis - https://phabricator.wikimedia.org/T292414 [02:22:46] (03PS1) 10Reedy: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735757 [02:22:48] (03CR) 10Reedy: [C: 
03+2] Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735757 (owner: 10Reedy) [02:23:30] (03Merged) 10jenkins-bot: Update interwiki cache [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735757 (owner: 10Reedy) [02:24:29] !log reedy@deploy1002 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 01m 49s) [02:24:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:28:07] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:28:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:31:37] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:31:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:41:34] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:41:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:46:30] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [02:46:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:54:26] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [02:58:34] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [04:03:12] PROBLEM - SSH on puppetmaster1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [05:43:46] PROBLEM - Query Service HTTP Port on wdqs2003 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 298 bytes in 0.003 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [06:05:12] RECOVERY - SSH on puppetmaster1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [07:00:35] (03CR) 10Awight: "This change is ready for review." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 10Awight) [07:50:28] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [07:51:52] (03CR) 10Muehlenhoff: [C: 04-1] "Agreed on installing netcat-openbsd by default, but two comments inline" [puppet] - 10https://gerrit.wikimedia.org/r/735413 (owner: 10Herron) [08:09:50] (03PS2) 10Muehlenhoff: Prefer mx1001 over mx2001 for weights in MX records [dns] - 10https://gerrit.wikimedia.org/r/732924 [08:15:12] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 45.22 ms [08:24:52] (03CR) 10Muehlenhoff: [C: 03+2] Prefer mx1001 over mx2001 for weights in MX records [dns] - 10https://gerrit.wikimedia.org/r/732924 (owner: 10Muehlenhoff) [08:26:44] (03CR) 10Muehlenhoff: Buster tracking updates (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/731777 (owner: 10Muehlenhoff) [09:03:26] !log restarting blazegraph on wdqs2003 (jvm stuck for the last 22hours) [09:03:28] (03PS1) 10Muehlenhoff: Remove LDAP access for lzaman [puppet] - 10https://gerrit.wikimedia.org/r/735931 [09:03:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:04:26] wdqs2003 will likely complain about lag, the 
alert should go off by itself in a couple hours [09:04:28] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [09:05:17] (03CR) 10Muehlenhoff: [C: 03+2] Remove LDAP access for lzaman [puppet] - 10https://gerrit.wikimedia.org/r/735931 (owner: 10Muehlenhoff) [09:05:22] RECOVERY - Query Service HTTP Port on wdqs2003 is OK: HTTP OK: HTTP/1.1 200 OK - 448 bytes in 0.015 second response time https://wikitech.wikimedia.org/wiki/Wikidata_query_service [09:12:10] PROBLEM - WDQS high update lag on wdqs2003 is CRITICAL: 6.521e+07 ge 4.32e+07 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [09:20:26] (03CR) 10Jelto: [C: 03+1] Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 (owner: 10JMeybohm) [09:26:02] (03CR) 10Jelto: [C: 03+2] blubberoid: bump common_templates to 0.4 and chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/734926 (https://phabricator.wikimedia.org/T292390) (owner: 10Jelto) [09:27:18] (03CR) 10Jelto: [C: 03+2] services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [09:27:23] (03CR) 10Jelto: [C: 03+2] Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 (owner: 10JMeybohm) [09:29:22] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 34.59 ms [09:31:44] (03CR) 10jerkins-bot: [V: 04-1] services: deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) (owner: 10Jelto) [09:35:10] (03PS3) 10Jelto: Rakefile: Print stderr in case helmfile build fails [deployment-charts] - 10https://gerrit.wikimedia.org/r/730204 (owner: 10JMeybohm) [09:45:03] (03PS16) 10Jelto: services: 
deploy services with helm3 [deployment-charts] - 10https://gerrit.wikimedia.org/r/721301 (https://phabricator.wikimedia.org/T251305) [10:05:50] (03PS3) 10Jelto: blubberoid: bump common_templates to 0.4 and chart version [deployment-charts] - 10https://gerrit.wikimedia.org/r/734926 (https://phabricator.wikimedia.org/T292390) [10:12:39] jouncebot: nowandnext [10:12:39] No deployments scheduled for the next 0 hour(s) and 47 minute(s) [10:12:39] In 0 hour(s) and 47 minute(s): UTC morning backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1100) [10:16:20] * urbanecm deploying a sec patch [10:17:01] !log Deploy a security patch for T294686 [10:17:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:27:29] (03CR) 10Thiemo Kreuz (WMDE): [C: 03+1] "I can confirm this." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 10Awight) [10:43:30] RECOVERY - WDQS high update lag on wdqs2003 is OK: (C)4.32e+07 ge (W)2.16e+07 ge 2.049e+07 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen [10:46:19] !log installing libdatetime-timezone-perl updates (updates for latest tz changes) [10:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:47:28] (03PS2) 10Btullis: Add more alerts to the data-engineering team [alerts] - 10https://gerrit.wikimedia.org/r/735669 (https://phabricator.wikimedia.org/T293399) [10:50:12] (03PS2) 10Hnowlan: api-gateway: move pathing_map config to helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/735411 (https://phabricator.wikimedia.org/T288789) (owner: 10Elukey) [10:54:09] (03PS1) 10Alexandros Kosiaris: tlsproxy::localssl: Allow disabling http2 [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) [10:54:11] (03PS1) 10Alexandros Kosiaris: maps: Disable http2 in tlsproxy 
[puppet] - 10https://gerrit.wikimedia.org/r/735945 (https://phabricator.wikimedia.org/T275752) [10:54:13] (03PS1) 10Alexandros Kosiaris: elasticsearch::cirrus: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735946 (https://phabricator.wikimedia.org/T275752) [10:54:15] (03PS1) 10Alexandros Kosiaris: cloudelastic: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735947 (https://phabricator.wikimedia.org/T275752) [10:56:00] (03CR) 10jerkins-bot: [V: 04-1] tlsproxy::localssl: Allow disabling http2 [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [10:56:38] (03PS2) 10Urbanecm: foundationwiki: Disable direct account creation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735674 (https://phabricator.wikimedia.org/T205347) [10:56:50] (03CR) 10Urbanecm: [C: 03+2] foundationwiki: Disable direct account creation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735674 (https://phabricator.wikimedia.org/T205347) (owner: 10Urbanecm) [10:57:45] (03Merged) 10jenkins-bot: foundationwiki: Disable direct account creation [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735674 (https://phabricator.wikimedia.org/T205347) (owner: 10Urbanecm) [10:58:52] (03PS3) 10Urbanecm: Add edit-legal to editprotected grant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735716 [10:58:57] (03CR) 10Urbanecm: [C: 03+2] Add edit-legal to editprotected grant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735716 (owner: 10Urbanecm) [10:59:32] logmsgbot is gone! [10:59:36] can someone restart it please? 
[10:59:42] (03Merged) 10jenkins-bot: Add edit-legal to editprotected grant [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735716 (owner: 10Urbanecm) [10:59:55] https://wikitech.wikimedia.org/wiki/Logmsgbot, looks to be co-hosted with icinga [11:00:03] !log 10:59:03 Synchronized wmf-config/InitialiseSettings.php: c236232bc48f4a61e98ffd2a93a23375bbb46287: foundationwiki: Disable direct account creation (T205347) (duration: 00m 56s) [11:00:04] Amir1, Lucas_WMDE, awight, and Urbanecm: Time to snap out of that daydream and deploy UTC morning backport window. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1100). [11:00:05] No Gerrit patches in the queue for this window AFAICS. [11:00:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:00:06] T205347: Enable SUL accounts on Governance wiki - https://phabricator.wikimedia.org/T205347 [11:01:06] (03PS2) 10Alexandros Kosiaris: tlsproxy::localssl: Allow disabling http2 [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) [11:01:08] (03PS2) 10Alexandros Kosiaris: maps: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735945 (https://phabricator.wikimedia.org/T275752) [11:01:10] (03PS2) 10Alexandros Kosiaris: elasticsearch::cirrus: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735946 (https://phabricator.wikimedia.org/T275752) [11:01:12] (03PS2) 10Alexandros Kosiaris: cloudelastic: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735947 (https://phabricator.wikimedia.org/T275752) [11:01:14] (03PS1) 10Alexandros Kosiaris: elasticsearch: Fix spec test [puppet] - 10https://gerrit.wikimedia.org/r/735949 [11:01:56] !log 11:01:21 Synchronized wmf-config/CommonSettings.php: b9aa3d21bfb16aaa9605e7abe311eb122009d6ed: Add edit-legal to editprotected grant (duration: 00m 54s) [11:01:57] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:02:55] (LogstashIndexingFailures) firing: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org [11:03:31] 10SRE, 10serviceops, 10MW-1.38-notes (1.38.0-wmf.6; 2021-10-26), 10Patch-For-Review, 10Sustainability: Jobrunner on Buster occasional timeout on codfw file upload - https://phabricator.wikimedia.org/T275752 (10akosiaris) >>! In T275752#7468984, @Legoktm wrote: >> But now I realize that we were talking HT... [11:04:13] (03CR) 10Hnowlan: [C: 03+2] api-gateway: move pathing_map config to helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/735411 (https://phabricator.wikimedia.org/T288789) (owner: 10Elukey) [11:07:55] (LogstashIndexingFailures) resolved: Logstash Elasticsearch indexing errors - https://wikitech.wikimedia.org/wiki/Logstash#Indexing_errors - https://grafana.wikimedia.org/d/000000561/logstash?viewPanel=40 - https://alerts.wikimedia.org [11:08:47] (03Merged) 10jenkins-bot: api-gateway: move pathing_map config to helmfile [deployment-charts] - 10https://gerrit.wikimedia.org/r/735411 (https://phabricator.wikimedia.org/T288789) (owner: 10Elukey) [11:10:16] (03CR) 10Hnowlan: api-gateway: move pathing_map config to helmfile (031 comment) [deployment-charts] - 10https://gerrit.wikimedia.org/r/735411 (https://phabricator.wikimedia.org/T288789) (owner: 10Elukey) [11:11:05] 10SRE, 10DBA: wmf-auto-reinstall fails on hosts that run pt-heartbeat - https://phabricator.wikimedia.org/T252528 (10Kormat) >>! In T252528#7125738, @LSobanski wrote: > A stub document capturing this is at https://wikitech.wikimedia.org/wiki/MariaDB/Rebooting_a_host. Done. 
[11:13:32] welcome back, logmsgbot [11:14:31] (03CR) 10Alexandros Kosiaris: [C: 03+2] elasticsearch: Fix spec test [puppet] - 10https://gerrit.wikimedia.org/r/735949 (owner: 10Alexandros Kosiaris) [11:14:45] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:14:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:54] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:17:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:08] (03CR) 10Alexandros Kosiaris: [C: 03+1] mediawiki: remove font packages from all canary appservers [puppet] - 10https://gerrit.wikimedia.org/r/735685 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [11:20:57] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1096.eqiad.wmnet [11:20:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:10] (03PS1) 10Urbanecm: QuitMentorship: Pass a logger [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/735641 (https://phabricator.wikimedia.org/T294665) [11:21:18] (03CR) 10Urbanecm: [C: 03+2] QuitMentorship: Pass a logger [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/735641 (https://phabricator.wikimedia.org/T294665) (owner: 10Urbanecm) [11:22:05] !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-worker1097.eqiad.wmnet [11:22:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:32] (03CR) 10JMeybohm: [C: 04-1] role::ml_k8s::master: add node-role.kubernetes.io/master labels (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735577 (https://phabricator.wikimedia.org/T289834) (owner: 10Elukey) [11:27:41] (03CR) 10Urbanecm: "Lydia approved" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/735627 (https://phabricator.wikimedia.org/T294632) (owner: 10Juan90264) [11:27:54] (03CR) 10Urbanecm: [C: 03+1] "LGTM" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735627 (https://phabricator.wikimedia.org/T294632) (owner: 10Juan90264) [11:31:15] 10SRE, 10serviceops, 10Kubernetes, 10Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10Jelto) A proof of concept migration of `blubberoid` in `staging-codfw` was successful. I used the following steps to migrate the service: * Create and submit change for blubberoids `h... [11:31:53] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1097.eqiad.wmnet [11:31:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:40:11] (03Merged) 10jenkins-bot: QuitMentorship: Pass a logger [extensions/GrowthExperiments] (wmf/1.38.0-wmf.6) - 10https://gerrit.wikimedia.org/r/735641 (https://phabricator.wikimedia.org/T294665) (owner: 10Urbanecm) [11:41:36] !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-worker1098.eqiad.wmnet [11:41:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:47:39] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [11:47:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:38] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1098.eqiad.wmnet [11:48:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:49:01] !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-worker1099.eqiad.wmnet [11:49:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:56] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[11:50:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:44] 10SRE, 10Community-Tech, 10serviceops, 10wikidiff2, 10Platform Team Workboards (Platform Engineering Reliability): Deploy wikidiff2 1.13.0 - https://phabricator.wikimedia.org/T285857 (10Daimona) Hey @hnowlan @WDoranWMF, just want to make sure you saw the updates above. This is now ready for deployment in... [11:58:02] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1099.eqiad.wmnet [11:58:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:58:26] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [11:59:34] !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-worker1100.eqiad.wmnet [11:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:29] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/includes/Mentorship/QuitMentorshipFactory.php: 4671528977db15b4e287d50980a684223ab6f611: QuitMentorship: Pass a logger (T294665; 1/2) (duration: 00m 56s) [12:07:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:07:32] T294665: Error: Call to a member function warning() on null - https://phabricator.wikimedia.org/T294665 [12:08:24] !log urbanecm@deploy1002 Synchronized php-1.38.0-wmf.6/extensions/GrowthExperiments/includes/Mentorship/QuitMentorship.php: 4671528977db15b4e287d50980a684223ab6f611: QuitMentorship: Pass a logger (T294665; 2/2) (duration: 00m 55s) [12:08:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:08:34] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1100.eqiad.wmnet [12:08:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:10:50] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 33.74 ms 
[12:12:14] (03PS2) 10MMandere: netboot: Add drmrs DC site subnets [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) [12:16:36] (03CR) 10jerkins-bot: [V: 04-1] Localisation updates from https://translatewiki.net. [software/mailman-templates] - 10https://gerrit.wikimedia.org/r/735968 (owner: 10L10n-bot) [12:18:23] !log btullis@cumin1001 START - Cookbook sre.hosts.reboot-single for host an-worker1101.eqiad.wmnet [12:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:19:22] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [12:25:13] !log btullis@cumin1001 END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1101.eqiad.wmnet [12:25:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:31:46] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 35.67 ms [12:38:03] (03CR) 10Nikerabbit: [V: 03+2] Localisation updates from https://translatewiki.net. 
[software/mailman-templates] - 10https://gerrit.wikimedia.org/r/735968 (owner: 10L10n-bot) [12:40:45] (03PS4) 10Hnowlan: api-gateway: allow /staging/ testing namespace only in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/715467 (https://phabricator.wikimedia.org/T289583) [12:58:32] (03CR) 10Alexandros Kosiaris: [V: 03+1] "PCC SUCCESS (DIFF 5 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32025/console" [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [13:04:04] PROBLEM - Maps tiles generation on alert1001 is CRITICAL: CRITICAL: 90.35% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1 [13:04:23] (03PS1) 10Muehlenhoff: Prefer mx1001 over mx2001 for smart hosts / wiki mail [puppet] - 10https://gerrit.wikimedia.org/r/735972 [13:05:26] 10SRE: Please create "maryana@wikipedia.org" email handle to use for annual fundraising email test - https://phabricator.wikimedia.org/T294758 (10spatton) [13:07:23] (03PS3) 10MMandere: install_server: Add drmrs DC site subnets [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) [13:11:13] 10SRE, 10SRE-Access-Requests: Please create "maryana@wikipedia.org" email handle to use for annual fundraising email test - https://phabricator.wikimedia.org/T294758 (10ssingh) a:03ssingh [13:19:59] 10SRE, 10SRE-Access-Requests: Please create "maryana@wikipedia.org" email handle to use for annual fundraising email test - https://phabricator.wikimedia.org/T294758 (10ssingh) 05Open→03Resolved Hi @spatton: This has been implemented; please feel free to reopen if something is missing or if there are any c... 
[13:31:36] (03PS3) 10Alexandros Kosiaris: tlsproxy::localssl: Allow disabling http2 [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) [13:31:38] (03PS3) 10Alexandros Kosiaris: maps: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735945 (https://phabricator.wikimedia.org/T275752) [13:31:40] (03PS3) 10Alexandros Kosiaris: elasticsearch::cirrus: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735946 (https://phabricator.wikimedia.org/T275752) [13:31:42] (03PS3) 10Alexandros Kosiaris: cloudelastic: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735947 (https://phabricator.wikimedia.org/T275752) [13:31:44] (03PS1) 10Alexandros Kosiaris: relforge: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735973 (https://phabricator.wikimedia.org/T275752) [13:31:46] (03CR) 10David Caro: [C: 03+2] start_instance_with_prefix: add tries parameter [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731912 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [13:31:51] !log installing jbig2dec security updates [13:31:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:29] (03PS4) 10David Caro: ceph: introduce auth load abstraction [puppet] - 10https://gerrit.wikimedia.org/r/735615 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [13:34:58] (03CR) 10jerkins-bot: [V: 04-1] ceph: introduce auth load abstraction [puppet] - 10https://gerrit.wikimedia.org/r/735615 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [13:41:27] (03PS1) 10Muehlenhoff: Add library hint for jbig2dec [puppet] - 10https://gerrit.wikimedia.org/r/735974 [13:48:56] (03PS5) 10David Caro: ceph: introduce auth load abstraction [puppet] - 10https://gerrit.wikimedia.org/r/735615 (https://phabricator.wikimedia.org/T293752) (owner: 10Arturo Borrero Gonzalez) [13:49:47] (03CR) 10BBlack: 
"Looking pretty good - some nitpicks added in the comments:" [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [13:53:40] (03PS1) 10BBlack: Add ntp.drmrs.wm.o for initial installs [dns] - 10https://gerrit.wikimedia.org/r/735976 (https://phabricator.wikimedia.org/T282787) [13:56:25] (03CR) 10Muehlenhoff: [C: 03+2] Add library hint for jbig2dec [puppet] - 10https://gerrit.wikimedia.org/r/735974 (owner: 10Muehlenhoff) [13:57:06] (03CR) 10Kormat: "Some comments below." [puppet] - 10https://gerrit.wikimedia.org/r/735688 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [14:06:19] (03CR) 10Ottomata: [WIP] profile::analytics::database::mariadb_multi (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/735688 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [14:07:51] (03CR) 10Alexandros Kosiaris: [V: 03+1] "PCC SUCCESS (DIFF 5 NOOP 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32027/console" [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [14:11:33] (03CR) 10BBlack: [C: 03+2] Add ntp.drmrs.wm.o for initial installs [dns] - 10https://gerrit.wikimedia.org/r/735976 (https://phabricator.wikimedia.org/T282787) (owner: 10BBlack) [14:12:40] (03PS4) 10MMandere: install_server: Add drmrs DC site subnets [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) [14:13:27] (03CR) 10MMandere: install_server: Add drmrs DC site subnets (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [14:14:26] (03CR) 10BBlack: [C: 03+1] install_server: Add drmrs DC site subnets [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [14:15:55] (03PS1) 10Jelto: services: add support to deploy all services with helm3 
[deployment-charts] - 10https://gerrit.wikimedia.org/r/735979 (https://phabricator.wikimedia.org/T251305) [14:16:22] (03CR) 10MMandere: [C: 03+2] install_server: Add drmrs DC site subnets [puppet] - 10https://gerrit.wikimedia.org/r/735608 (https://phabricator.wikimedia.org/T282787) (owner: 10MMandere) [14:18:29] 10SRE, 10serviceops, 10Kubernetes, 10Patch-For-Review: Migrate to helm v3 - https://phabricator.wikimedia.org/T251305 (10JMeybohm) Great! I think you can proceed with patching the helmfiles. It's worth noticing that `helmfile destroy` does remove the helm history (e.g. all those ConfigMaps tiller creates)... [14:37:41] !log updating PHP on mwdebug1001 [14:37:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:41:09] (03PS1) 10Papaul: change drmrs PDU facilities from 3 phase to 1 phase [puppet] - 10https://gerrit.wikimedia.org/r/735986 (https://phabricator.wikimedia.org/T294597) [14:42:17] (03CR) 10Papaul: [C: 03+2] change drmrs PDU facilities from 3 phase to 1 phase [puppet] - 10https://gerrit.wikimedia.org/r/735986 (https://phabricator.wikimedia.org/T294597) (owner: 10Papaul) [14:54:53] !log uploaded PHP 7.2.34-18+0~20210223.60+debian10~1.gbpb21322+wmf3 to apt.wikimedia.org (buster-wikimedia/component/php72) T294317 [14:54:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:34] 10SRE, 10SRE-Access-Requests: Create "maryana@wikipedia.org" email handle for annual fundraising email test (replying to donate@) - https://phabricator.wikimedia.org/T294758 (10Aklapper) [15:08:15] 10SRE, 10ops-drmrs, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: (Need By: TBD) setup/config PDU in drmrs ( ps1-b12 and ps1-b13) - https://phabricator.wikimedia.org/T294597 (10Papaul) [15:09:00] 10SRE, 10ops-drmrs, 10DC-Ops, 10Infrastructure-Foundations, and 2 others: (Need By: TBD) setup/config PDU in drmrs ( ps1-b12 and ps1-b13) - https://phabricator.wikimedia.org/T294597 (10Papaul) 05Open→03Resolved Complete 
[15:10:46] (03PS15) 10Jgiannelos: maps: Add script to send tile invalidation events [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) [15:12:43] (03CR) 10Alexandros Kosiaris: [V: 03+1 C: 03+1] "PCC says noop for anything that uses the define, aside from swift. On swift http2 is being turned off on purpose" [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [15:12:45] !log installing tiff security updates [15:12:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:39] (03PS1) 10BBlack: conftool-data: remove "dns" cluster [puppet] - 10https://gerrit.wikimedia.org/r/735991 [15:25:57] (03PS1) 10MMandere: site: Add new cache instances [puppet] - 10https://gerrit.wikimedia.org/r/735994 [15:28:50] !log rolling restart of mw canaries to pick up tiff security updates [15:28:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:30:04] jan_drewniak: #bothumor I ♥ Unicode. All rise for Wikimedia Portals Update deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1530).
[15:44:18] (03PS1) 10Ottomata: eventgate-main - Bump image versino to get mediawiki/revision/create slot change [deployment-charts] - 10https://gerrit.wikimedia.org/r/735996 (https://phabricator.wikimedia.org/T293195) [15:48:48] (03CR) 10Ottomata: [C: 03+2] eventgate-main - Bump image versino to get mediawiki/revision/create slot change [deployment-charts] - 10https://gerrit.wikimedia.org/r/735996 (https://phabricator.wikimedia.org/T293195) (owner: 10Ottomata) [15:48:50] (03CR) 10Ottomata: [V: 03+2 C: 03+2] eventgate-main - Bump image versino to get mediawiki/revision/create slot change [deployment-charts] - 10https://gerrit.wikimedia.org/r/735996 (https://phabricator.wikimedia.org/T293195) (owner: 10Ottomata) [15:49:28] !log installing opencv security updates on stretch [15:49:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:50:39] !log otto@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [15:50:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:49] !log otto@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [15:52:49] !log otto@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . 
[15:52:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:01:09] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736000 [16:05:15] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736002 [16:09:28] (03PS1) 10PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736003 [16:15:19] (03CR) 10BryanDavis: "I haven't touched anything in this area for 4+ years, so likely Thiemo is a better reviewer than me. :)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735769 (owner: 10Awight) [16:18:36] (03PS1) 10Muehlenhoff: Add library hint for opencv [puppet] - 10https://gerrit.wikimedia.org/r/736005 [16:20:31] (03CR) 10Andrew Bogott: [C: 03+1] start_instance_with_prefix: Group options in a dataclass [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731908 (owner: 10David Caro) [16:26:37] (03CR) 10Andrew Bogott: [C: 03+1] "Having just copy/pasted all these options into a new cookbook I fully support this!" [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731906 (owner: 10David Caro) [16:28:39] (03PS1) 10Legoktm: shellbox-media: Bump to 2021-11-01-155830-media [deployment-charts] - 10https://gerrit.wikimedia.org/r/736008 [16:32:30] (03CR) 10Legoktm: [C: 03+2] shellbox-media: Bump to 2021-11-01-155830-media [deployment-charts] - 10https://gerrit.wikimedia.org/r/736008 (owner: 10Legoktm) [16:53:23] !log otto@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'production' . [16:53:23] !log otto@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'eventgate-main' for release 'canary' . 
[16:53:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:53:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:57:21] (03CR) 10Andrew Bogott: [C: 03+2] start_instance_with_prefix: fix next index generation [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731885 (owner: 10David Caro) [16:58:27] (03CR) 10Andrew Bogott: [C: 03+2] start_instance_prefix: add reusable params helpers [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731906 (owner: 10David Caro) [17:00:05] ryankemper: How many deployers does it take to do Wikidata Query Service weekly deploy deploy? (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1700). [17:01:14] (03Merged) 10jenkins-bot: start_instance_prefix: add reusable params helpers [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731906 (owner: 10David Caro) [17:01:40] (03CR) 10Andrew Bogott: [C: 03+2] wmcs: create composite type OpenstackIdentifier [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731907 (owner: 10David Caro) [17:01:51] (03CR) 10Andrew Bogott: [C: 03+2] start_instance_with_prefix: Group options in a dataclass [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731908 (owner: 10David Caro) [17:04:39] jouncebot: whats up [17:04:42] jouncebot: nowandnext [17:04:42] For the next 0 hour(s) and 25 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1700) [17:04:42] In 0 hour(s) and 55 minute(s): UTC evening backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1800) [17:04:45] (03CR) 10Andrew Bogott: [C: 03+2] start_instance_with_prefix: work around extra stderr message [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731913 (owner: 10David Caro) [17:05:27] (03Merged) 10jenkins-bot: wmcs: create composite type OpenstackIdentifier [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731907 (owner: 10David Caro) [17:06:06] 
!log removing font packages from canary appservers (T294378, gerrit:735685) [17:06:06] (03Merged) 10jenkins-bot: start_instance_with_prefix: Group options in a dataclass [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731908 (owner: 10David Caro) [17:06:08] (03Merged) 10jenkins-bot: start_instance_with_prefix: add tries parameter [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731912 (https://phabricator.wikimedia.org/T292465) (owner: 10David Caro) [17:06:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:06:10] T294378: Remove mediawiki::packages::fonts from non thumbor servers - https://phabricator.wikimedia.org/T294378 [17:06:17] (03CR) 10Dzahn: [C: 03+2] mediawiki: remove font packages from all canary appservers [puppet] - 10https://gerrit.wikimedia.org/r/735685 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [17:08:41] (03Merged) 10jenkins-bot: start_instance_with_prefix: work around extra stderr message [cookbooks] (wmcs) - 10https://gerrit.wikimedia.org/r/731913 (owner: 10David Caro) [17:16:28] PROBLEM - SSH on puppetmaster1002.mgmt is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [17:17:55] (03CR) 10Alexandros Kosiaris: [C: 03+1] service/miscweb: switch state from service_setup to lvs_setup [puppet] - 10https://gerrit.wikimedia.org/r/694628 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [17:18:04] (03CR) 10Alexandros Kosiaris: [C: 03+1] service/miscweb: switch state from lvs_setup to monitoring_setup [puppet] - 10https://gerrit.wikimedia.org/r/694629 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [17:18:14] (03CR) 10Alexandros Kosiaris: [C: 03+1] service/miscweb: switch state from monitoring_setup to production [puppet] - 10https://gerrit.wikimedia.org/r/694630 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [17:20:33] (03PS1) 10Legoktm: Rebuild PHP 7.2 and 7.3 images 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/736014 (https://phabricator.wikimedia.org/T294317) [17:21:10] (03PS2) 10MMandere: site: Add new cache instances in ulsfo DC [puppet] - 10https://gerrit.wikimedia.org/r/735994 (https://phabricator.wikimedia.org/T290694) [17:21:23] (03CR) 10Alexandros Kosiaris: "You 'll need a change for modules/profile/files/configmaster/disc_desired_state.py as well after you feel it's ready." [puppet] - 10https://gerrit.wikimedia.org/r/694625 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [17:21:27] (03CR) 10Alexandros Kosiaris: [C: 03+1] add miscweb to LVS [puppet] - 10https://gerrit.wikimedia.org/r/694625 (https://phabricator.wikimedia.org/T281538) (owner: 10Dzahn) [17:23:32] (03CR) 10BBlack: [C: 03+1] site: Add new cache instances in ulsfo DC [puppet] - 10https://gerrit.wikimedia.org/r/735994 (https://phabricator.wikimedia.org/T290694) (owner: 10MMandere) [17:23:54] (03CR) 10Legoktm: [V: 03+2 C: 03+2] Rebuild PHP 7.2 and 7.3 images [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/736014 (https://phabricator.wikimedia.org/T294317) (owner: 10Legoktm) [17:25:44] (03PS1) 10Legoktm: Fix version number in PHP 7.4 changelogs [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/736016 [17:25:59] (03CR) 10Legoktm: [V: 03+2 C: 03+2] Fix version number in PHP 7.4 changelogs [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/736016 (owner: 10Legoktm) [17:29:15] (03PS1) 10Dzahn: canary_api: remove font packages from canary API servers [puppet] - 10https://gerrit.wikimedia.org/r/736017 (https://phabricator.wikimedia.org/T294378) [17:30:29] (03CR) 10Dzahn: [C: 03+2] canary_api: remove font packages from canary API servers [puppet] - 10https://gerrit.wikimedia.org/r/736017 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [17:37:36] (03CR) 10Hnowlan: [C: 03+1] maps: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735945 
(https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [17:37:57] (03PS1) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [17:38:22] (03PS1) 10Dzahn: canary_api: actually remove font packages from canary API servers [puppet] - 10https://gerrit.wikimedia.org/r/736020 (https://phabricator.wikimedia.org/T294378) [17:39:58] (03PS2) 10Dzahn: canary_api: actually remove font packages from canary API servers [puppet] - 10https://gerrit.wikimedia.org/r/736020 (https://phabricator.wikimedia.org/T294378) [17:43:08] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:43:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:43:43] (03PS2) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [17:44:30] (03CR) 10Elukey: [V: 03+1] role::ml_k8s::master: add node-role.kubernetes.io/master labels (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735577 (https://phabricator.wikimedia.org/T289834) (owner: 10Elukey) [17:44:35] (03CR) 10Dzahn: [C: 03+2] canary_api: actually remove font packages from canary API servers [puppet] - 10https://gerrit.wikimedia.org/r/736020 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [17:45:20] PROBLEM - Juniper alarms on cr2-codfw is CRITICAL: JNX_ALARMS CRITICAL - 3 red alarms, 1 yellow alarms https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm [17:45:38] PROBLEM - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 85, down: 9, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [17:45:42] PROBLEM - BFD status on cr2-codfw is CRITICAL: CRIT: Down: 1 
https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [17:46:19] !log removing mediawiki font packages from the 8 canary API servers, in addition to 11 canary appservers T294378 [17:46:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:22] T294378: Remove mediawiki::packages::fonts from non thumbor servers - https://phabricator.wikimedia.org/T294378 [17:46:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [17:46:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:46:36] PROBLEM - OSPF status on mr1-codfw is CRITICAL: OSPFv2: 1/2 UP : OSPFv3: 1/2 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [17:47:33] topranks: 9 interfaces down is a lot for cr2-codfw afaict [17:48:39] yeah absolutely... let me look, sounds like a line card or something [17:48:57] ACK, thanks [17:52:13] !log force-resetting FPC 0 on cr2-codfw as it appears hard down. [17:52:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:53:17] (03CR) 10Ebernhardson: [C: 03+1] relforge: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735973 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [17:53:23] (03CR) 10Ebernhardson: [C: 03+1] cloudelastic: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735947 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [17:53:35] (03CR) 10Ebernhardson: [C: 03+1] elasticsearch::cirrus: Disable http2 in tlsproxy [puppet] - 10https://gerrit.wikimedia.org/r/735946 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [17:54:04] mutante: thanks for the heads up. Yeah card 0 seems to have failed. [17:54:38] Looks like it may be in a reboot cycle. Will discuss with dcops see if we can get someone on site. 
[17:54:50] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:55:16] topranks: ok, sounds like "high" prio but not emergency? well, "getting someone on site"-level, ACK! [17:56:26] Definitely high priority, all access to servers is down from cr2-codfw, however they are reachable still from cr1, and the link between crs is working. [17:56:42] In theory should all failover but haven't had time to assess fully the impact. [17:56:56] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [17:57:46] *nod* if it needs a document or someone to chat to dcops I can do that so you can keep looking [17:58:45] on that second part [18:00:05] RoanKattouw and Urbanecm: That opportune time is upon us again. Time for a UTC evening backport window deploy. Don't be afraid. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T1800). [18:00:05] Juan_90264: A patch you scheduled for UTC evening backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [18:01:19] i would be able to deploy today... 
[18:01:22] hi Juan_90264 [18:01:30] was just saying you're not here :) [18:01:47] (Juniper alarm active) firing: Juniper alarm active - https://alerts.wikimedia.org [18:01:52] (03CR) 10MMandere: [C: 03+2] site: Add new cache instances in ulsfo DC [puppet] - 10https://gerrit.wikimedia.org/r/735994 (https://phabricator.wikimedia.org/T290694) (owner: 10MMandere) [18:01:57] Okay [18:02:22] (03PS2) 10Urbanecm: Amend wordmark for the Meetei (Manipuri) Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735679 (https://phabricator.wikimedia.org/T294189) (owner: 10Odder) [18:02:26] Now yes, I'm present [18:02:26] (03CR) 10Urbanecm: [C: 03+2] Amend wordmark for the Meetei (Manipuri) Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735679 (https://phabricator.wikimedia.org/T294189) (owner: 10Odder) [18:02:33] great Juan_90264 [18:03:14] (03Merged) 10jenkins-bot: Amend wordmark for the Meetei (Manipuri) Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735679 (https://phabricator.wikimedia.org/T294189) (owner: 10Odder) [18:03:46] Juan_90264: available at mwdebug1001, please test [18:03:58] Okay [18:05:58] hey mw deployers, heads up for you: just removed all the mw font packages from canary appservers, incl. mwdebug*. if you see any unexpected things, warnings or reports related to missing fonts, please raise them [18:06:02] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:06:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:06:05] it's kind of to prove we can really remove them [18:06:35] urbanecm: I tested and approved [18:06:35] mutante: acknowledged. 
I'm syncing a static resource now, so shouldn't affect me in theory [18:06:53] (might be worth an ops-l mail, which deployers should read -- I don't see any) [18:08:18] !log urbanecm@deploy1002 Synchronized static/images/mobile/copyright/wikipedia-wordmark-mni.svg: fb433d67f738a2b7dd436e9298f716b14d66c155: Amend wordmark for the Meetei (Manipuri) Wikipedia (T294189; 1/2) (duration: 00m 55s) [18:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:08:21] T294189: Add a mobile logo for the Meetei (Manipuri) Wikipedia - https://phabricator.wikimedia.org/T294189 [18:09:26] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:34] !log Purge https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-wordmark-mni.svg (T294189) [18:09:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:51] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: fb433d67f738a2b7dd436e9298f716b14d66c155: Amend wordmark for the Meetei (Manipuri) Wikipedia (T294189; 2/2) (duration: 00m 55s) [18:09:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:09:56] Juan_90264: should be live [18:10:37] (03PS1) 10PipelineBot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736024 [18:10:43] urbanecm: Wordmark fixed working [18:10:51] great! [18:10:54] then we're done :) [18:10:55] (LogstashKafkaConsumerLag) firing: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org [18:11:19] ...i hope that alert's not me? 
[18:11:30] * urbanecm is always suspicious about an alert shortly after his deployment [18:12:12] Exactly, thanks urbanecm [18:12:16] np [18:12:34] hm [18:12:36] I dunno, but there is always a change that adds to $CACHES, which is used in firewalling rules in base or so [18:12:42] s/always/also [18:13:02] ferm reload on almost everything with that [18:13:02] it's complaining about logstash [18:13:58] this? "The reason for index failure is usually conflicting fields, see also bug T150106 for a detailed discussion of the problem. " [18:13:59] T150106: Type collisions in log events causing indexing failures in ELK Elasticsearch - https://phabricator.wikimedia.org/T150106 [18:14:16] (03PS1) 10PipelineBot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736025 [18:14:43] cwhite: around? there's an alert for too many messages in kafka, and it seems to be related to logstash [18:15:55] (LogstashKafkaConsumerLag) firing: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org [18:17:26] seems to have resolved itself? [18:17:34] looking at https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All&from=now-1h&to=now [18:17:42] (03PS1) 10PipelineBot: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736026 [18:18:29] *nod* based on "For a host of reasons it might happen that there's a buildup of messages on Kafka" it was just a temp. buildup it seems [18:19:06] could be restarts caused by the ferm change [18:19:09] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[18:19:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:32] 10SRE, 10ops-codfw, 10DC-Ops, 10Platform Engineering, and 2 others: Q2:(Need By: TBD) rack/setup/install restbase202[456].codfw.wmnet - https://phabricator.wikimedia.org/T294377 (10Eevans) >>! In T294377#7466932, @Papaul wrote: > @Eevans just for curiosity, any reason we have no restbase hosts in row A ?... [18:20:06] (03CR) 10Legoktm: [C: 03+2] shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736024 (owner: 10PipelineBot) [18:20:55] (LogstashKafkaConsumerLag) resolved: (2) Too many messages in kafka logging - https://wikitech.wikimedia.org/wiki/Logstash#Kafka_consumer_lag - https://alerts.wikimedia.org [18:24:18] (03CR) 10MSantos: [C: 03+2] tile-pregeneration: Adapt to new event schema [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/734975 (https://phabricator.wikimedia.org/T293366) (owner: 10Jgiannelos) [18:24:25] (03Merged) 10jenkins-bot: shellbox-constraints: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736024 (owner: 10PipelineBot) [18:24:40] (03CR) 10MSantos: [C: 04-1] "One last nit and I think we are good to go." [puppet] - 10https://gerrit.wikimedia.org/r/722825 (https://phabricator.wikimedia.org/T270175) (owner: 10Jgiannelos) [18:25:25] (03Merged) 10jenkins-bot: tile-pregeneration: Adapt to new event schema [software/tegola] (wmf/v0.14.x) - 10https://gerrit.wikimedia.org/r/734975 (https://phabricator.wikimedia.org/T293366) (owner: 10Jgiannelos) [18:25:25] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [18:25:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:26:42] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . 
[18:26:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:33:23] (03CR) 10Nikki Nikkhoui: [C: 03+1] api-gateway: allow /staging/ testing namespace only in staging [deployment-charts] - 10https://gerrit.wikimedia.org/r/715467 (https://phabricator.wikimedia.org/T289583) (owner: 10Hnowlan) [18:34:46] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:36:52] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [18:37:38] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . [18:37:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:49:26] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-constraints' for release 'main' . 
[18:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:50:06] (03PS1) 10Muehlenhoff: Switch Brooke to volunteer NDA status [puppet] - 10https://gerrit.wikimedia.org/r/736030 [18:51:13] PROBLEM - Check systemd state on cp4035 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:51:41] PROBLEM - Check systemd state on cp4034 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [18:51:45] uh oh [18:53:09] (03CR) 10Legoktm: wikitech::web: remove font packages from wikitech servers (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/735042 (https://phabricator.wikimedia.org/T294378) (owner: 10Dzahn) [18:53:09] Mondays eh. [18:53:48] ACKNOWLEDGEMENT - BFD status on cr2-codfw is CRITICAL: CRIT: Down: 1 Cathal Mooney FPC 0 Linecard failure. T294789 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BFD_status [18:54:28] ACKNOWLEDGEMENT - Juniper alarms on cr2-codfw is CRITICAL: JNX_ALARMS CRITICAL - 3 red alarms, 1 yellow alarms Cathal Mooney FPC 0 Linecard failure. T294789 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Juniper_alarm [18:54:56] tn: very much today [18:55:08] ACKNOWLEDGEMENT - Router interfaces on cr2-codfw is CRITICAL: CRITICAL: host 208.80.153.193, interfaces up: 85, down: 17, dormant: 0, excluded: 0, unused: 0: Cathal Mooney FPC 0 Linecard failure. 
T294789 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [18:56:39] (03PS1) 10Ppchelko: Remove hook set for incident reponse in 2020 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736032 [18:57:17] PROBLEM - traffic-pool service on cp4036 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is inactive https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:02:29] (03CR) 10Legoktm: [C: 03+2] shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736026 (owner: 10PipelineBot) [19:02:31] (03CR) 10Legoktm: [C: 03+2] shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736025 (owner: 10PipelineBot) [19:03:32] (03PS2) 10Legoktm: shellbox-media: Bump to 2021-11-01-180934-media [deployment-charts] - 10https://gerrit.wikimedia.org/r/736008 [19:07:17] jouncebot: nowandnext [19:07:17] No deployments scheduled for the next 0 hour(s) and 52 minute(s) [19:07:17] In 0 hour(s) and 52 minute(s): Services – Graphoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2000) [19:07:25] * urbanecm stashing at mwdebug1001 [19:07:42] (03Merged) 10jenkins-bot: shellbox-timeline: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736026 (owner: 10PipelineBot) [19:07:44] (03Merged) 10jenkins-bot: shellbox: pipeline bot promote [deployment-charts] - 10https://gerrit.wikimedia.org/r/736025 (owner: 10PipelineBot) [19:08:35] PROBLEM - Check systemd state on cp4033 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:09:05] sukhe: are cascading crashes across proxies expected [19:09:26] That's 4 now in same DC [19:10:14] can I press the klaxon yet /j [19:10:25] * legoktm looks [19:10:32] tn it's just so tempting [19:10:37] * hauskatze eyes tn [19:10:50] tn: hehe [19:11:00] Ty 
legoktm [19:11:28] * urbanecm done testing [19:11:30] (ways to get removed from that group #1) [19:11:42] (03CR) 10Dzahn: [C: 03+1] "it does seem to match the ticket from 2019 handled by gtirloni. that's all I know. never been involved in removing dropped tables I think" [puppet] - 10https://gerrit.wikimedia.org/r/735723 (https://phabricator.wikimedia.org/T216481) (owner: 10Zabe) [19:12:14] "I was just checking to see if pressing the button had a cool animation!" [19:13:05] mmandere: varnishncsa and prometheus-varnish-exporter are down on various ulsfo hosts, the last puppet change is yours of "Add new cache instances in ulsfo DC" [19:13:55] discussing this with traffic as well [19:14:40] ok [19:16:37] PROBLEM - traffic-pool service on cp4035 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is inactive https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:16:46] sukhe: any reason to not revert first? [19:16:54] legoktm: they are the added ones [19:17:00] oh [19:17:04] So maybe some setup missed [19:17:37] not pooled yet, great [19:17:49] * legoktm stops looking [19:17:49] it is WIP [19:17:53] see the PENDING section [19:17:58] Icinga is saying they're not active so ye should be ok [19:17:59] there will be more shortly [19:18:01] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=1 [19:18:04] (03PS1) 10Ottomata: Remove unused bigtop hive and oozie database creation code [puppet] - 10https://gerrit.wikimedia.org/r/736034 (https://phabricator.wikimedia.org/T284150) [19:18:12] Icinga is in the process of adding the checks [19:18:34] mutante: as long as we know we don't need to worry [19:18:51] mmandere: if they're not supposed to be working, can they be downtimed? [19:19:14] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox' for release 'main' . 
[19:19:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:19:26] (03PS3) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [19:19:39] PROBLEM - Check systemd state on cp4036 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:20:11] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32029/console" [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [19:20:13] (03PS1) 10Urbanecm: Prepare a QuickSurvey for Growth IP research [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736035 (https://phabricator.wikimedia.org/T294568) [19:20:24] ^ as long as it's 4033 through 4036 they are new [19:20:26] https://phabricator.wikimedia.org/T290694 [19:22:24] (03CR) 10Ottomata: [V: 03+1] "an-db1002 has read_only = 1" [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [19:22:31] (03PS4) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [19:22:57] (03Abandoned) 10Ottomata: [WIP] profile::analytics::database::mariadb_multi [puppet] - 10https://gerrit.wikimedia.org/r/735688 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [19:23:41] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox' for release 'main' . [19:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:24:48] mutante: I see some discussion about some alerts. Is it ok to do a MW deploy now? 
[19:25:05] urbanecm: yep [19:25:09] thanks [19:25:15] (03PS2) 10Urbanecm: Prepare a QuickSurvey for Growth IP research [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736035 (https://phabricator.wikimedia.org/T294568) [19:25:24] (03CR) 10Urbanecm: [C: 03+2] Prepare a QuickSurvey for Growth IP research [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736035 (https://phabricator.wikimedia.org/T294568) (owner: 10Urbanecm) [19:26:11] (03Merged) 10jenkins-bot: Prepare a QuickSurvey for Growth IP research [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736035 (https://phabricator.wikimedia.org/T294568) (owner: 10Urbanecm) [19:26:17] ACKNOWLEDGEMENT - Check systemd state on cp4033 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service daniel_zahn https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:26:17] ACKNOWLEDGEMENT - Check systemd state on cp4034 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service daniel_zahn https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:26:17] ACKNOWLEDGEMENT - Check systemd state on cp4035 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service daniel_zahn https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:26:17] ACKNOWLEDGEMENT - traffic-pool service on cp4035 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is inactive daniel_zahn https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:26:17] ACKNOWLEDGEMENT - Check systemd state on cp4036 is CRITICAL: CRITICAL - degraded: The following units failed: prometheus-varnish-exporter.service,varnishncsa.service daniel_zahn 
https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [19:26:17] ACKNOWLEDGEMENT - traffic-pool service on cp4036 is CRITICAL: CRITICAL - Expecting active but unit traffic-pool is inactive daniel_zahn https://phabricator.wikimedia.org/T290694 https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state [19:26:20] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox' for release 'main' . [19:26:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:28:06] 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (10Dzahn) Icinga alerts that were added by puppet started firing and raised some questions but confirmed it was just about these new hosts and they just switched... [19:28:11] (03CR) 10Dzahn: "https://phabricator.wikimedia.org/T290694#7473030" [puppet] - 10https://gerrit.wikimedia.org/r/735994 (https://phabricator.wikimedia.org/T290694) (owner: 10MMandere) [19:29:26] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: cba805cb8aaa88d814bfff19b82e8f57ace4fafd: Prepare a QuickSurvey for Growth IP research (T294568) (duration: 00m 55s) [19:29:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:29:29] T294568: deploy quicksurvey for editors on eswiki and arwiki (for Growth IP editors research) - https://phabricator.wikimedia.org/T294568 [19:31:57] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[19:31:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:32:21] * urbanecm done [19:34:34] 10SRE, 10ops-ulsfo, 10DC-Ops, 10Traffic: Q1:(Need By: TBD) rack/setup/install cp403[3-6].ulsfo.wmnet - https://phabricator.wikimedia.org/T290694 (10BBlack) Yeah sorry for the noise - we weren't anticipating the hosts re-puppeting themselves into the productions roles (incorrectly!) and should've just puppe... [19:35:22] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [19:35:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:43:23] PROBLEM - k8s API server requests latencies on kubemaster2002 is CRITICAL: instance=10.192.16.48 verb=LIST https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [19:45:21] RECOVERY - k8s API server requests latencies on kubemaster2002 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Kubernetes https://grafana.wikimedia.org/dashboard/db/kubernetes-api?viewPanel=27 [20:00:05] chrisalbon and accraze: Your horoscope predicts another unfortunate Services – Graphoid / ORES deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2000). [20:02:54] (03PS1) 10Legoktm: planet: Add JeanFred's blog [puppet] - 10https://gerrit.wikimedia.org/r/736037 [20:04:03] (03PS3) 10Herron: base_packages: install netcat-openbsd by default [puppet] - 10https://gerrit.wikimedia.org/r/735413 [20:05:04] (03CR) 10Dzahn: [C: 03+2] planet: Add JeanFred's blog [puppet] - 10https://gerrit.wikimedia.org/r/736037 (owner: 10Legoktm) [20:07:04] (03CR) 10Herron: base_packages: install netcat-openbsd by default (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/735413 (owner: 10Herron) [20:08:37] !log planet1002 - systemctl start update-en-planet after merging config change btw. 
legoktm: it should be included in a sec [20:08:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:08:51] ty :) [20:08:56] ;) [20:09:18] (03CR) 10Herron: [C: 03+1] Prefer mx1001 over mx2001 for smart hosts / wiki mail [puppet] - 10https://gerrit.wikimedia.org/r/735972 (owner: 10Muehlenhoff) [20:10:02] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' . [20:10:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:11:04] (03PS1) 10Legoktm: shellbox-syntaxhighlight: Bump to 2021-11-01-180934-syntaxhighlight [deployment-charts] - 10https://gerrit.wikimedia.org/r/736039 [20:11:26] (03CR) 10Legoktm: [C: 03+2] shellbox-media: Bump to 2021-11-01-180934-media [deployment-charts] - 10https://gerrit.wikimedia.org/r/736008 (owner: 10Legoktm) [20:11:30] (03CR) 10Legoktm: [C: 03+2] shellbox-syntaxhighlight: Bump to 2021-11-01-180934-syntaxhighlight [deployment-charts] - 10https://gerrit.wikimedia.org/r/736039 (owner: 10Legoktm) [20:12:36] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' . [20:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:14:28] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-timeline' for release 'main' . 
[20:14:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:16:05] (03Merged) 10jenkins-bot: shellbox-media: Bump to 2021-11-01-180934-media [deployment-charts] - 10https://gerrit.wikimedia.org/r/736008 (owner: 10Legoktm) [20:16:07] (03Merged) 10jenkins-bot: shellbox-syntaxhighlight: Bump to 2021-11-01-180934-syntaxhighlight [deployment-charts] - 10https://gerrit.wikimedia.org/r/736039 (owner: 10Legoktm) [20:17:08] (03CR) 10Dzahn: [C: 03+2] growthexperiments.pp: Remove absented job [puppet] - 10https://gerrit.wikimedia.org/r/734565 (https://phabricator.wikimedia.org/T278103) (owner: 10Urbanecm) [20:18:07] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [20:18:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:22:48] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [20:22:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:24] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-media' for release 'main' . [20:24:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:24:29] (03CR) 10Dzahn: [C: 03+2] growthexperiments.pp: Run purgeExpiredMentorStatus.php twice a month [puppet] - 10https://gerrit.wikimedia.org/r/734568 (https://phabricator.wikimedia.org/T280307) (owner: 10Urbanecm) [20:24:49] (03PS5) 10Dzahn: growthexperiments.pp: Run purgeExpiredMentorStatus.php twice a month [puppet] - 10https://gerrit.wikimedia.org/r/734568 (https://phabricator.wikimedia.org/T280307) (owner: 10Urbanecm) [20:26:19] (03PS1) 10Dbrant: Add Android site association file. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736042 (https://phabricator.wikimedia.org/T294776) [20:26:35] (03CR) 10Muehlenhoff: [C: 03+1] "Looks good to me!" 
[puppet] - 10https://gerrit.wikimedia.org/r/735413 (owner: 10Herron) [20:30:22] !log legoktm@deploy1002 helmfile [staging] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . [20:30:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:32:01] !log legoktm@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . [20:32:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:33:25] !log legoktm@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'shellbox-syntaxhighlight' for release 'main' . [20:33:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:10] !log mwmaint* - new timer/service mediawiki_job_growthexperiments-purgeExpiredMentorStatus created by puppet - T280307 [20:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:34:12] T280307: Mentor dashboard: M2 mentor tools/settings - https://phabricator.wikimedia.org/T280307 [20:39:35] (03PS1) 10Urbanecm: QuickSurveys: Show Growth IP editors survey to 0.1% of users [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736043 (https://phabricator.wikimedia.org/T294568) [20:39:51] thanks mutante! [20:41:27] (03PS1) 10Dzahn: mediawiki: rm mediawiki/maintenance/pageassessments.pp [puppet] - 10https://gerrit.wikimedia.org/r/736044 [20:44:47] !log upgrading PHP 7.2 on mwdebug* servers [20:44:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:45:24] (03CR) 10Dduvall: "Friendly poke: Does anyone else need to sign off, or can I schedule a review/merge for the next Puppet deployment window?" [puppet] - 10https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: 10Dduvall) [20:49:25] (03CR) 10Dzahn: "I think this would be best merged by someone in wmcs (because labs-only but potentially massive effect). 
A change like this doesn't seem t" [puppet] - 10https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: 10Dduvall) [20:49:49] (03CR) 10Andrew Bogott: [C: 03+2] hiera: Add hostname/certname based lookup to secret hierarchy under labs [puppet] - 10https://gerrit.wikimedia.org/r/734703 (https://phabricator.wikimedia.org/T294050) (owner: 10Dduvall) [20:51:07] urbanecm: I keep forgetting every time how exactly we did the timer setup between both data centers [20:51:11] andrewbogott: ty! [20:51:29] sure thing. I'm keeping an eye out but it should be harmless [20:51:51] urbanecm: so once again I was checking whether it's really normal that these for growth* run in both DCs [20:52:06] yeah, the passive one should ignore them AFAIK [20:52:40] and we put the "ignore" part in a different place [20:52:49] where before we skipped creating the units [20:53:03] now we keep them (and run!), but use a wrapper [20:53:12] (or at least that's what i remember) [20:53:47] yes, the wrapper checks whether the DC is active or not [20:53:58] so the timers are running but don't do anything [20:54:00] https://github.com/wikimedia/puppet/blob/production/modules/profile/manifests/mediawiki/periodic_job.pp#L31 [20:54:09] this looks to be the key part [20:54:30] ACK, that's it :) thanks [20:54:49] also means it's safe to manually start those wherever [20:54:56] after adding a new one [20:55:11] sounds so [20:55:36] as long as you use the units created by puppet [20:55:43] and don't manually run other commands that is [20:56:47] i don't think that'd break anything though [20:56:52] !log upgrading PHP 7.2 on A:mw-canary servers [20:56:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:57:16] you'd just end up talking to the primary DB in the passive DC, which is read only [20:59:05] !log mwmaint1002:/# systemctl start mediawiki_job_growthexperiments-purgeExpiredMentorStatus (T280307) [20:59:06] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:59:07] T280307: Mentor dashboard: M2 mentor tools/settings - https://phabricator.wikimedia.org/T280307 [20:59:36] Nov 01 20:59:27 mwmaint1002 mediawiki_job_growthexperiments-purgeExpiredMentorStatus[30987]: scowiki: Deleted 0 rows from user_properties. [21:00:04] Reedy and sbassett: #bothumor I � Unicode. All rise for Weekly Security deployment window deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2100). [21:00:15] urbanecm: Nov 01 20:59:48 mwmaint1002 systemd[1]: mediawiki_job_growthexperiments-purgeExpiredMentorStatus.service: Succeeded. [21:00:22] sounds good :) [21:00:32] saw it go through the list of wikis to end with zuwiki [21:01:16] and `/var/log/mediawiki/mediawiki_job_growthexperiments-purgeExpiredMentorStatus/syslog.log` has the output [21:01:20] looks it's all working :) [21:01:22] thanks mutante [21:01:36] this seems unused: https://gerrit.wikimedia.org/r/c/operations/puppet/+/736044/1/modules/profile/manifests/mediawiki/maintenance/pageassessments.pp [21:01:46] ack, yw [21:02:53] mutante: shouldn't https://gerrit.wikimedia.org/g/operations/puppet/+/ca92d0840a2f954bad6b284a3045baf3b5528dfa/modules/profile/manifests/mediawiki/maintenance.pp#74 load that file? [21:03:42] eh, yes, that looks like it. what was I doing.. grepped for that in my local copy [21:04:51] ah, here is the thing. I searched for the name of the job resource. which is 'pageassessments_cleanup', unlike the class [21:05:04] ty, nevermind [21:05:28] (03Abandoned) 10Dzahn: mediawiki: rm mediawiki/maintenance/pageassessments.pp [puppet] - 10https://gerrit.wikimedia.org/r/736044 (owner: 10Dzahn) [21:05:55] np :) [21:06:05] glad i could help [21:15:46] (03CR) 10Legoktm: "Seems fine to me, but you'll need to rebase since I touched the changelogs. 
I'm not sure what to do about the duplication, I think we can " [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/732737 (owner: 10Ahmon Dancy) [21:18:35] 10SRE, 10serviceops, 10MW-1.38-notes (1.38.0-wmf.6; 2021-10-26), 10Patch-For-Review, 10Sustainability: Jobrunner timeouts on cross-DC file uploads because of HTTP/2 - https://phabricator.wikimedia.org/T275752 (10Legoktm) [21:20:39] PROBLEM - Host logstash2028.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [21:21:54] jouncebot: nowandnext [21:21:54] For the next 1 hour(s) and 38 minute(s): Weekly Security deployment window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2100) [21:21:54] In 1 hour(s) and 38 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2300) [21:22:15] (03CR) 10Legoktm: [C: 03+1] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [21:22:16] * urbanecm deploys a secpatch [21:26:47] RECOVERY - Host logstash2028.mgmt is UP: PING OK - Packet loss = 0%, RTA = 35.97 ms [21:27:21] (03PS1) 10Urbanecm: votewiki: Grant election admins securepoll-view-voter-pii [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736067 (https://phabricator.wikimedia.org/T290808) [21:27:23] (03CR) 10Urbanecm: [C: 03+2] votewiki: Grant election admins securepoll-view-voter-pii [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736067 (https://phabricator.wikimedia.org/T290808) (owner: 10Urbanecm) [21:28:10] (03Merged) 10jenkins-bot: votewiki: Grant election admins securepoll-view-voter-pii [mediawiki-config] - 10https://gerrit.wikimedia.org/r/736067 (https://phabricator.wikimedia.org/T290808) (owner: 10Urbanecm) [21:28:48] !log urbanecm@deploy1002 Synchronized wmf-config/InitialiseSettings.php: 8f5008d9a043c96cd1dba18bdb38e168b01d63d0: votewiki: Grant election admins securepoll-view-voter-pii (T290808) (duration: 00m 55s) [21:28:49] 
Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:42] (03CR) 10Dzahn: [C: 03+1] "compiler output looks good, only changes swift" [puppet] - 10https://gerrit.wikimedia.org/r/735944 (https://phabricator.wikimedia.org/T275752) (owner: 10Alexandros Kosiaris) [21:30:08] !log Deploy a security patch for T290808 [21:30:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:30:56] !log mwdebug-deploy@deploy1002 helmfile [eqiad] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . [21:30:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:33] (03PS5) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [21:32:07] (03CR) 10jerkins-bot: [V: 04-1] Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:32:44] (03PS6) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [21:33:18] (03CR) 10jerkins-bot: [V: 04-1] Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:34:16] (03PS7) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [21:34:24] !log mwdebug-deploy@deploy1002 helmfile [codfw] Ran 'sync' command on namespace 'mwdebug' for release 'pinkunicorn' . 
[21:34:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:51] (03CR) 10jerkins-bot: [V: 04-1] Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:34:53] (03CR) 10Ottomata: "Stevie, could you look over just meta_new.pp and see if my usage of conventions and classes is correct? Thanks!" [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:35:22] (03PS8) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [21:36:04] (03CR) 10jerkins-bot: [V: 04-1] Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:37:02] (03PS9) 10Ottomata: Add role::analytics_cluster::database::meta on an-db100[12] [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) [21:38:02] (03CR) 10Ottomata: [V: 03+1] "PCC SUCCESS (DIFF 2): https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/32033/console" [puppet] - 10https://gerrit.wikimedia.org/r/736019 (https://phabricator.wikimedia.org/T284150) (owner: 10Ottomata) [21:38:48] (03PS4) 10Ahmon Dancy: php-fpm: Add settings to control debuggability [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/732737 [21:39:37] (03CR) 10Ahmon Dancy: "rebased." 
[docker-images/production-images] - 10https://gerrit.wikimedia.org/r/732737 (owner: 10Ahmon Dancy) [21:40:42] (03PS1) 10Aaron Schulz: Add another yubikey for my SSH access [puppet] - 10https://gerrit.wikimedia.org/r/736068 [22:01:47] (Juniper alarm active) firing: Juniper alarm active - https://alerts.wikimedia.org [22:21:24] @seen AaronSchulz [22:21:33] RECOVERY - SSH on puppetmaster1002.mgmt is OK: SSH OK - OpenSSH_7.0 (protocol 2.0) https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook [22:35:25] (03PS2) 10Aaron Schulz: Add another yubikey for my SSH access [puppet] - 10https://gerrit.wikimedia.org/r/736068 [22:46:43] (03PS1) 10Dzahn: snapshot: replace the word cron everywhere [puppet] - 10https://gerrit.wikimedia.org/r/736074 [22:47:22] (03CR) 10jerkins-bot: [V: 04-1] snapshot: replace the word cron everywhere [puppet] - 10https://gerrit.wikimedia.org/r/736074 (owner: 10Dzahn) [22:54:38] (03PS2) 10Dzahn: snapshot: replace the word cron everywhere [puppet] - 10https://gerrit.wikimedia.org/r/736074 [23:00:04] RoanKattouw and Urbanecm: May I have your attention please! UTC late backport window. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2300) [23:00:04] kemayo and Juan_90264: A patch you scheduled for UTC late backport window is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [23:00:38] 👋🏻 [23:01:11] 10SRE, 10Traffic, 10serviceops: Reconcile MediaWiki POST timeout and Varnish/ATS timeouts - https://phabricator.wikimedia.org/T294800 (10Legoktm) [23:02:31] (03PS3) 10Dzahn: snapshot: replace the word cron everywhere [puppet] - 10https://gerrit.wikimedia.org/r/736074 [23:06:46] (03PS4) 10Dzahn: snapshot: replace the word cron everywhere [puppet] - 10https://gerrit.wikimedia.org/r/736074 [23:08:34] Hello [23:10:06] Urbanecm: ? [23:10:45] Plausibly the preceding security deploy window has been running over. 
[23:11:36] Okay [23:11:48] jouncebot: play some music [23:12:51] jouncebot: [23:13:02] Jouncebot: ? [23:13:24] jouncebot: nowandnext [23:13:24] For the next 0 hour(s) and 46 minute(s): UTC late backport window (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211101T2300) [23:13:24] In 2 hour(s) and 46 minute(s): Branching MediaWiki, extensions, skins, and vendor – See Heterogeneous_deployment/Train_deploys (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20211102T0200) [23:15:06] RoanKattouw: friendly jouncebot ping [23:15:49] so tomorrow is a holiday for WMF US [23:15:55] maybe it's like Friday, you know [23:16:09] could influence deployment calendar [23:16:36] They're generally good about keeping the deployment page updated to reflect these things. [23:16:51] not supposed to be, other than it being a European train week [23:18:28] (03CR) 10Aaron Schulz: "I put the public key in my home dir on bast4003" [puppet] - 10https://gerrit.wikimedia.org/r/736068 (owner: 10Aaron Schulz) [23:18:40] Will the changes scheduled for now be implemented? [23:19:12] if no deployers are available for this window, you'll have to reschedule [23:19:21] (03CR) 10Dzahn: "This is supposed to not change anything. Just make it much easier to follow-up with the part that actually switches it to systemd. https:" [puppet] - 10https://gerrit.wikimedia.org/r/736074 (owner: 10Dzahn) [23:20:05] there might be someone else around to deploy if the people that were scheduled for it can't make it [23:20:15] There's still 40 minutes in the window, so remaining available might yet see things deployed. [23:20:42] Isn't mutante a deployer? [23:21:59] I haven't deployed in 8 years or something, I'd not be comfortable without taking the training again nowadays [23:22:02] I don't think so? I can never remember who has permissions, so all I can say is that they're not down as a deployer on anything in the current deployment calendar. :D [23:22:18] Or that. 
[23:24:12] I used to do that when the deployment host and bastion were the same thing and called fenari [23:24:49] so if it's not an emergency or so, i'd rather not try it now after 4pm and before the holiday [23:25:29] I returned [23:25:42] No changes in your absence, alas. [23:26:17] Okay [23:28:36] If I'm not mistaken, I usually see deployers available: legoktm, tgr_ [23:28:53] I'm in a meeting rn, sorry [23:29:22] Alright [23:41:47] I'm going to be eating something -- ping me if a deployment does happen. If not, I'll reschedule my patch once I'm done eating. [23:42:21] Okay Kemayo [23:50:29] * legoktm looks at the patches [23:51:12] (03PS4) 10Juan90264: Enable ArticlePlaceholder for kswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735627 (https://phabricator.wikimedia.org/T294632) [23:51:26] Juan_90264: Kemayo: I can deploy now [23:51:49] (03CR) 10Legoktm: [C: 03+2] Enable ArticlePlaceholder for kswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735627 (https://phabricator.wikimedia.org/T294632) (owner: 10Juan90264) [23:53:13] (03Merged) 10jenkins-bot: Enable ArticlePlaceholder for kswiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/735627 (https://phabricator.wikimedia.org/T294632) (owner: 10Juan90264) [23:53:21] Legoktm: Is the meeting over? [23:53:29] yeah [23:53:53] Juan_90264: it's on mwdebug1001 for you to test [23:54:45] I'll try [23:56:47] legoktm: okay, I’m available. I don’t have much to test for mine though. It’s a config setup patch where I can’t verify anything about it. 
[23:57:04] Kemayo: ack, I'll just sync it out then once we finish the current patch [23:58:42] Legoktm: I tested and approved [23:59:03] cool [23:59:49] (03PS4) 10Legoktm: Add event stream config for discussiontools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731854 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch) [23:59:53] (03CR) 10Legoktm: [C: 03+2] Add event stream config for discussiontools [mediawiki-config] - 10https://gerrit.wikimedia.org/r/731854 (https://phabricator.wikimedia.org/T286076) (owner: 10DLynch)