[00:29:20] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 107.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[01:36:38] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[01:47:44] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is OK: HTTP OK: HTTP/1.0 200 OK - 22343 bytes in 0.007 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[01:53:32] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[02:05:50] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[02:07:50] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices) is CRITICAL: Test Print the Bar page from en.wp.org in A4 format using optimized for reading on mobile devices returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[02:09:34] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[02:09:42] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[02:13:48] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[02:26:52] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is OK: HTTP OK: HTTP/1.0 200 OK - 22339 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[02:30:50] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 108.6 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[02:49:18] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:02:12] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1075 is OK: HTTP OK: HTTP/1.0 200 OK - 22341 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[03:09:50] RECOVERY - Router interfaces on cr2-eqord is OK: OK: host 208.80.154.198, interfaces up: 64, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:11:06] RECOVERY - Router interfaces on cr3-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 76, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down
[03:58:10] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[04:01:42] PROBLEM - proton endpoints health on proton1002 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:03:30] PROBLEM - proton endpoints health on proton1001 is CRITICAL: /{domain}/v1/pdf/{title}/{format}/{type} (Print the Foo page from en.wp.org in letter format) is CRITICAL: Test Print the Foo page from en.wp.org in letter format returned the unexpected status 404 (expecting: 200) https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:05:22] RECOVERY - proton endpoints health on proton1001 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:05:28] RECOVERY - proton endpoints health on proton1002 is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/Services/Monitoring/proton
[04:28:00] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 101.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[05:01:36] !log ats-tls restart in cp1075, cp1081 and cp1087 - T249335
[05:03:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:03:31] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:03:41] wow.. almost 2 minutes to log it
[05:11:46] (CR) Vgutierrez: [C: +2] ATS: Enable inbound TLSv1.3 on the upload cluster [puppet] - https://gerrit.wikimedia.org/r/585697 (https://phabricator.wikimedia.org/T170567) (owner: Vgutierrez)
[05:16:21] !log Enable TLS Session Tickets on eqiad - T245616
[05:16:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:29] T245616: Provide a simple and automated SSL Ticket key generation system for ATS - https://phabricator.wikimedia.org/T245616
[05:16:42] !log Enable inbound TLSv1.3 in upload@eqiad - T170567
[05:16:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:16:47] T170567: Support TLSv1.3 - https://phabricator.wikimedia.org/T170567
[05:16:56] (PS1) BryanDavis: Replace pykube with a custom API client [software/tools-webservice] - https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930)
[05:17:44] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1079 for schema change', diff saved to https://phabricator.wikimedia.org/P10896 and previous config saved to /var/cache/conftool/dbconfig/20200406-051744-marostegui.json
[05:17:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:18:15] !log Deploy schema change on db1079 (this will generate lag on s7 labs)
[05:18:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:20:26] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:23:53] (CR) BryanDavis: "I would classify this as "gently tested". I wrote the K8sClient code originally on dev.toolforge.org and tested its ability to list and de" [software/tools-webservice] - https://gerrit.wikimedia.org/r/586162 (https://phabricator.wikimedia.org/T197930) (owner: BryanDavis)
[05:33:30] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1081 is OK: HTTP OK: HTTP/1.0 200 OK - 22307 bytes in 0.006 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[05:39:20] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (Marostegui) I agree with Nuria - we are kinda converting the Analytis replica into a staging en...
[05:45:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1079 after schema change', diff saved to https://phabricator.wikimedia.org/P10897 and previous config saved to /var/cache/conftool/dbconfig/20200406-054559-marostegui.json
[05:46:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:51] !log ats-tls restart in cp3056, cp3058 and cp3062 - T249335
[05:50:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:50:57] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[05:52:01] (PS1) Marostegui: dbproxy1010: Remove all puppet references [puppet] - https://gerrit.wikimedia.org/r/586165 (https://phabricator.wikimedia.org/T248944)
[05:53:41] (PS1) Marostegui: wmnet: Remove production DNS entries for dbproxy1010 [dns] - https://gerrit.wikimedia.org/r/586166 (https://phabricator.wikimedia.org/T248944)
[05:54:22] !log marostegui@cumin1001 START - Cookbook sre.hosts.decommission
[05:54:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:54:53] !log marostegui@cumin1001 END (PASS) - Cookbook sre.hosts.decommission (exit_code=0)
[05:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:32] (CR) Marostegui: [C: +2] dbproxy1010: Remove all puppet references [puppet] - https://gerrit.wikimedia.org/r/586165 (https://phabricator.wikimedia.org/T248944) (owner: Marostegui)
[05:56:00] (CR) Marostegui: [C: +2] wmnet: Remove production DNS entries for dbproxy1010 [dns] - https://gerrit.wikimedia.org/r/586166 (https://phabricator.wikimedia.org/T248944) (owner: Marostegui)
[05:57:12] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 78.31 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[05:57:42] Operations, ops-eqiad, decommission: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944 (Marostegui) a:Marostegui→Jclark-ctr
[05:57:47] Operations, ops-eqiad, decommission: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944 (Marostegui)
[05:59:36] Operations, ops-eqiad, DC-Ops, decommission: Decommission dbproxy1010.eqiad.wmnet - https://phabricator.wikimedia.org/T248944 (Marostegui) Ready for #dc-ops!
[06:01:34] ACKNOWLEDGEMENT - snapshot of s3 in codfw on db1115 is CRITICAL: snapshot for s3 at codfw taken more than 3 days ago: Most recent backup 2020-04-02 08:53:40 Jcrespo running now, last run failed https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[06:03:00] (CR) Jcrespo: "Will it require grants deletion, too? (by IP, or was it already done)" [puppet] - https://gerrit.wikimedia.org/r/586165 (https://phabricator.wikimedia.org/T248944) (owner: Marostegui)
[06:03:18] (PS1) Marostegui: install_server: Allow labsdb1011 to reimage with Buster [puppet] - https://gerrit.wikimedia.org/r/586167 (https://phabricator.wikimedia.org/T249188)
[06:03:48] (CR) Marostegui: [C: +2] "It was done already: T231280#6021930" [puppet] - https://gerrit.wikimedia.org/r/586165 (https://phabricator.wikimedia.org/T248944) (owner: Marostegui)
[06:07:00] (CR) Muehlenhoff: [C: +1] "LGTM" [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585767 (owner: Ssingh)
[06:07:36] (CR) Jcrespo: "Thanks!" [puppet] - https://gerrit.wikimedia.org/r/586165 (https://phabricator.wikimedia.org/T248944) (owner: Marostegui)
[06:08:35] (CR) Marostegui: [C: +2] install_server: Allow labsdb1011 to reimage with Buster [puppet] - https://gerrit.wikimedia.org/r/586167 (https://phabricator.wikimedia.org/T249188) (owner: Marostegui)
[06:11:32] (PS2) Elukey: admin: allow analytics-privatedata-users to use GPUs by default [puppet] - https://gerrit.wikimedia.org/r/585760
[06:14:16] Operations, SRE-Access-Requests, Developer-Advocacy (Apr-Jun 2020): Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (jcrespo) a:Nuria→Aklapper So what is the status of this? Which rights are needed and which are required?
[06:18:49] !log Deploy schema change on s1 codfw master, this will generate lag on codfw
[06:18:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:23:20] (PS1) Muehlenhoff: ci: Use docker.io on Buster [puppet] - https://gerrit.wikimedia.org/r/586203 (https://phabricator.wikimedia.org/T224591)
[06:23:41] (PS1) Giuseppe Lavagetto: mediawiki: convert all API servers to use envoy for TLS termination. [puppet] - https://gerrit.wikimedia.org/r/586204 (https://phabricator.wikimedia.org/T247389)
[06:23:43] (PS1) Giuseppe Lavagetto: mediawiki: convert all appserver to use envoy for TLS termination [puppet] - https://gerrit.wikimedia.org/r/586205 (https://phabricator.wikimedia.org/T247389)
[06:24:10] Operations, netops, observability: Migrate role::netmon to Buster - https://phabricator.wikimedia.org/T247967 (MoritzMuehlenhoff) p:Triage→Medium
[06:24:15] Operations, Puppet: Upgrade Puppet to 5.5.19 - https://phabricator.wikimedia.org/T248168 (MoritzMuehlenhoff) p:Triage→Medium
[06:26:10] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo) a:Nuria→Mooeypoo While I agree with everyone here, I have a couple of thoughts...
[06:26:58] (CR) jerkins-bot: [V: -1] ci: Use docker.io on Buster [puppet] - https://gerrit.wikimedia.org/r/586203 (https://phabricator.wikimedia.org/T224591) (owner: Muehlenhoff)
[06:27:55] (CR) Elukey: [C: +2] admin: allow analytics-privatedata-users to use GPUs by default [puppet] - https://gerrit.wikimedia.org/r/585760 (owner: Elukey)
[06:28:46] (PS2) Muehlenhoff: ci: Use docker.io on Buster [puppet] - https://gerrit.wikimedia.org/r/586203 (https://phabricator.wikimedia.org/T224591)
[06:29:49] ACKNOWLEDGEMENT - DNS on ganeti1011.mgmt is CRITICAL: DNS CRITICAL - expected 0.0.0.0 but got 10.65.5.106 Ayounsi https://phabricator.wikimedia.org/T249314 https://wikitech.wikimedia.org/wiki/Dc-operations/Hardware_Troubleshooting_Runbook
[06:29:56] (CR) Giuseppe Lavagetto: [C: +2] mediawiki: convert all API servers to use envoy for TLS termination. [puppet] - https://gerrit.wikimedia.org/r/586204 (https://phabricator.wikimedia.org/T247389) (owner: Giuseppe Lavagetto)
[06:30:22] !log Upgrade dbproxy1019 - T231520
[06:30:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:30:28] T231520: Replace labsdb (wikireplicas) dbproxies: dbproxy1010 and dbproxy1011 - https://phabricator.wikimedia.org/T231520
[06:33:29] (PS1) Marostegui: wikireplicas_dns: Replace dbproxy1011 with dbproxy1019 [puppet] - https://gerrit.wikimedia.org/r/586206 (https://phabricator.wikimedia.org/T231520)
[06:35:04] (PS1) Marostegui: wmnet: Replace dbproxy1011 with dbproxy1019 [dns] - https://gerrit.wikimedia.org/r/586207 (https://phabricator.wikimedia.org/T231520)
[06:36:08] (CR) Marostegui: "As done with the previous dbproxy1010 replacement, once merge, I will run wmcs-wikireplica-dns" [puppet] - https://gerrit.wikimedia.org/r/586206 (https://phabricator.wikimedia.org/T231520) (owner: Marostegui)
[06:36:45] <_joe_> !log converting the api servers to envoy for TLS in eqiad
[06:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:45:34] !log delete BGP to AS25074 in amsix
[06:45:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[06:47:14] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (Marostegui) >>! In T249059#6030883, @jcrespo wrote: > While I agree with everyone here, I have...
[06:50:59] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 157.6 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[06:55:20] (PS2) Vgutierrez: partman: clean up cacheproxy selectors [puppet] - https://gerrit.wikimedia.org/r/583613 (https://phabricator.wikimedia.org/T156955) (owner: BBlack)
[06:55:41] (PS1) Elukey: admin: add kerberos flag for aklapper [puppet] - https://gerrit.wikimedia.org/r/586208 (https://phabricator.wikimedia.org/T248905)
[06:57:15] Operations, SRE-Access-Requests, Developer-Advocacy (Apr-Jun 2020), Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (elukey) @Aklapper have you tried https://turnilo.wikimedia.org/ or http://superset.wikimedia.org/ ? The latter allows you...
[07:00:05] (CR) Vgutierrez: [C: +2] partman: clean up cacheproxy selectors [puppet] - https://gerrit.wikimedia.org/r/583613 (https://phabricator.wikimedia.org/T156955) (owner: BBlack)
[07:06:55] !log Rename wb_terms on codfw - T248086
[07:06:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:07:00] T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086
[07:08:02] (PS1) Vgutierrez: ATS: Remove host level codfw storage_elements config [puppet] - https://gerrit.wikimedia.org/r/586296 (https://phabricator.wikimedia.org/T248816)
[07:10:05] (CR) Vgutierrez: "It's a NOOP: https://puppet-compiler.wmflabs.org/compiler1003/21711/" [puppet] - https://gerrit.wikimedia.org/r/586296 (https://phabricator.wikimedia.org/T248816) (owner: Vgutierrez)
[07:12:27] mediawiki.org seems to be loading very slowly for me, just got 18 seconds TTFB, 17 of that spend in parsing
[07:13:16] (PS2) DCausse: cirrus: Increase commonswiki near match weight [mediawiki-config] - https://gerrit.wikimedia.org/r/580394 (https://phabricator.wikimedia.org/T245642) (owner: EBernhardson)
[07:14:09] <_joe_> Nikerabbit: what backend?
[07:14:14] <_joe_> and what page?
[07:14:14] Operations, netops: IRR updates needed - https://phabricator.wikimedia.org/T235886 (ayounsi) Open→Resolved I'd rather avoid creating and managing inetnums at a per team or per route object granularity, but it's not possible to create inetnums for the whole allocation (whole /22 for example), only...
[07:14:18] Operations, Traffic, netops: BGP: Investigate isolating codfw and eqiad - https://phabricator.wikimedia.org/T246721 (ayounsi)
[07:14:21] Operations, netops: esams/knams: advertise 185.15.58.0/23 instead of 185.15.56.0/22 - https://phabricator.wikimedia.org/T207753 (ayounsi)
[07:14:32] (PS1) Vgutierrez: ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335)
[07:16:18] https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP
[07:16:23] (PS1) Ema: 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344)
[07:16:50] (CR) jerkins-bot: [V: -1] 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[07:16:58] _joe_: what do you mean with backend?
[07:18:18] (PS2) Vgutierrez: ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335)
[07:18:43] (CR) Ema: [C: +1] ATS: Remove host level codfw storage_elements config [puppet] - https://gerrit.wikimedia.org/r/586296 (https://phabricator.wikimedia.org/T248816) (owner: Vgutierrez)
[07:18:59] (CR) Vgutierrez: [C: +2] ATS: Remove host level codfw storage_elements config [puppet] - https://gerrit.wikimedia.org/r/586296 (https://phabricator.wikimedia.org/T248816) (owner: Vgutierrez)
[07:20:19] (CR) Ema: ATS: Disable wmf-analytics log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335) (owner: Vgutierrez)
[07:20:59] <_joe_> Nikerabbit: in the page, if you look at the sources, there is "wgHostname":"mw1350"}
[07:21:03] <_joe_> just at the bottom
[07:21:03] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_eqiad site=eqiad https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:21:10] <_joe_> in the page sources
[07:22:05] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:22:28] (PS3) Vgutierrez: ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335)
[07:22:32] (CR) Vgutierrez: ATS: Disable wmf-analytics log (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335) (owner: Vgutierrez)
[07:23:04] <_joe_> Nikerabbit: or from the web console, mw.config.get('wgHostname')
[07:23:08] (PS2) Ema: 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344)
[07:24:33] (CR) jerkins-bot: [V: -1] 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[07:25:34] _joe_: mw1355
[07:25:46] <_joe_> Nikerabbit: what was the url?
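
The page-source lookup _joe_ describes (find "wgHostname" in the HTML, or call mw.config.get('wgHostname') in the browser console) can be sketched offline as follows. This is only a minimal illustration, assuming the page embeds a JSON fragment shaped like the "wgHostname":"mw1350" quoted in the log; the function name extract_wg_hostname and the sample HTML string are hypothetical and not part of any MediaWiki API.

```python
import re
from typing import Optional

# Matches the JSON fragment quoted in the log: "wgHostname":"mw1350"
# Assumption: the value is a plain string with no escaped quotes.
_WG_HOSTNAME_RE = re.compile(r'"wgHostname"\s*:\s*"([^"]+)"')


def extract_wg_hostname(page_source: str) -> Optional[str]:
    """Return the backend hostname embedded in the page source, or None."""
    match = _WG_HOSTNAME_RE.search(page_source)
    return match.group(1) if match else None


# Hypothetical snippet, shaped like the fragment quoted in the log.
sample = '<script>var cfg = {"wgHostname":"mw1350"};</script>'
print(extract_wg_hostname(sample))
```

Returning None when the key is missing keeps the caller from mistaking a page without the fragment for a page served by some default host.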
[07:25:59] _joe_: https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP
[07:26:20] <_joe_> it loaded instantly for me but lemme check the logs of that server
[07:27:56] <_joe_> yes, I can see your request taking 17.6 seconds on the backend
[07:28:17] (PS4) Vgutierrez: ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335)
[07:28:35] <_joe_> but I see a lot of normal parsing times there
[07:35:14] (CR) Filippo Giunchedi: [C: +1] "Retroactive LGTM" [puppet] - https://gerrit.wikimedia.org/r/586111 (https://phabricator.wikimedia.org/T249346) (owner: CDanis)
[07:35:26] !log Rename wb_terms on eqiad excluding labsdb1009, labdb1010, labsdb1011 - T248086
[07:35:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:31] T248086: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086
[07:35:51] !log restart elasticsearch_6@cloudelastic-chi-eqiad on cloudelastic1003 as attempt to fix heavy GC runs (old gen) - T231517
[07:35:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:35:56] T231517: Investigate and fix GC issues on cloudelastic machines - https://phabricator.wikimedia.org/T231517
[07:36:58] _joe_: thanks, probably someone else to determine whether 17 seconds for that page is normal
[07:37:36] <_joe_> Nikerabbit: it doesn't seem so but I don't see anything indicating the server was misoperating at the time
[07:38:12] (CR) Filippo Giunchedi: [C: +2] prometheus: additional external_labels for Thanos [puppet] - https://gerrit.wikimedia.org/r/585468 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[07:39:01] RECOVERY - ElasticSearch shard size check - 9243 on search.svc.codfw.wmnet is OK: OK - All good! https://wikitech.wikimedia.org/wiki/Search%23If_it_has_been_indexed
[07:40:26] For me https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP is: Parsed by mw1266 Cached time: 20200406073916 CPU time usage: 0.504 seconds Saved in parser cache with key mediawikiwiki:pcache:idhash:75216-0!userlang=it and timestamp 20200406073915 and revision id 3737484
[07:41:18] And similar for Finnish: CPU time usage: 0.432 seconds Saved in parser cache with key mediawikiwiki:pcache:idhash:75216-0!userlang=fi and timestamp 20200406074040 and revision id 3737484
[07:43:25] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_mobileapps_cluster_codfw site=codfw https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:43:37] (I know it's not the same as wgHostname , was just wondering what parsing time it has)
[07:45:05] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets
[07:49:41] <_joe_> !log eqiad API migrated to envoy for local TLS termination, now starting codfw
[07:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:50:04] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 79.32 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[07:50:34] !log search index: deleting stale index wikidatawiki_content_1585224806 on cloudelastic:9243
[07:50:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:51:18] (CR) Ema: [C: +1] ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335) (owner: Vgutierrez)
[07:51:37] (CR) Vgutierrez: [C: +2] ATS: Disable wmf-analytics log [puppet] - https://gerrit.wikimedia.org/r/586298 (https://phabricator.wikimedia.org/T249335) (owner: Vgutierrez)
[07:51:46] (PS3) Ema: 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344)
[07:52:31] Operations: Onboarding Janis Meybohm - https://phabricator.wikimedia.org/T249081 (MoritzMuehlenhoff)
[07:54:53] !log rolling restart of ats-tls to disable wmf-analytics log - T249335 T237993
[07:54:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:55:00] T237993: Create replacement for Varnishkafka - https://phabricator.wikimedia.org/T237993
[07:55:00] T249335: Memory leak on ats-tls 8.0.6 - https://phabricator.wikimedia.org/T249335
[07:57:55] (PS4) Ema: 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344)
[08:01:54] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:07:18] !log Deploy schema change on dbstore1003:3311
[08:07:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:08] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp1087 is OK: HTTP OK: HTTP/1.0 200 OK - 22302 bytes in 0.005 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[08:09:41] (PS1) Jdrewniak: Bumping portals to master [mediawiki-config] - https://gerrit.wikimedia.org/r/586305 (https://phabricator.wikimedia.org/T128546)
[08:14:05] (CR) jerkins-bot: [V: -1] 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[08:18:16] (CR) Ema: "recheck" [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[08:18:42] <_joe_> !log conversion of codfw api done
[08:18:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:21:00] (CR) Vgutierrez: [C: +1] 5.1.3-1wm13: handle fetch error in vbf_stp_condfetch [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[08:27:15] (PS1) ArielGlenn: fix up link for multistream index file download [dumps] - https://gerrit.wikimedia.org/r/586308 (https://phabricator.wikimedia.org/T249477)
[08:35:44] (PS1) Elukey: aptrepo: add configuration for AMD ROCm 3.3 [puppet] - https://gerrit.wikimedia.org/r/586310 (https://phabricator.wikimedia.org/T247082)
[08:36:06] Operations, MediaWiki-Cache, Page Content Service, Product-Infrastructure-Team-Backlog, and 3 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (ema)
[08:37:58] (PS2) Elukey: aptrepo: add configuration for AMD ROCm 3.3 [puppet] - https://gerrit.wikimedia.org/r/586310 (https://phabricator.wikimedia.org/T247082)
[08:39:55] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/586310 (https://phabricator.wikimedia.org/T247082) (owner: Elukey)
[08:43:06] (PS2) Elukey: wdqs: Initial configuration of wdqs200[78]. [puppet] - https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) (owner: Gehel)
[08:44:04] (CR) Elukey: "Had to rebase manually due to a site.pp conflict.." [puppet] - https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) (owner: Gehel)
[08:45:24] (PS3) Elukey: wdqs: Initial configuration of wdqs200[78]. [puppet] - https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) (owner: Gehel)
[08:50:35] !log Deploy schema change on db1139:3311
[08:50:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:50:55] (CR) ArielGlenn: [C: +2] fix up link for multistream index file download [dumps] - https://gerrit.wikimedia.org/r/586308 (https://phabricator.wikimedia.org/T249477) (owner: ArielGlenn)
[08:52:16] (CR) Elukey: [C: +2] wdqs: Initial configuration of wdqs200[78]. [puppet] - https://gerrit.wikimedia.org/r/577324 (https://phabricator.wikimedia.org/T246343) (owner: Gehel)
[08:54:35] !log bootstrap wdqs200[7,8] - T246343
[08:54:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:54:40] T246343: Service implementation on wdqs200[7-8].codfw.wmnet - https://phabricator.wikimedia.org/T246343
[08:54:56] (CR) Ema: [V: +2 C: +2] "All tests green in a local buster pbuilder environment." [debs/varnish4] (debian-wmf) - https://gerrit.wikimedia.org/r/586299 (https://phabricator.wikimedia.org/T249344) (owner: Ema)
[08:55:35] !log ariel@deploy1001 Started deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link
[08:55:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:55:45] !log ariel@deploy1001 Finished deploy [dumps/dumps@ae1e705]: add prefetch test, fix multistream index file download link (duration: 00m 09s)
[08:55:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:57:34] (PS1) Filippo Giunchedi: modules: add thanos-sidecar define and profile [puppet] - https://gerrit.wikimedia.org/r/586312 (https://phabricator.wikimedia.org/T233956)
[08:58:14] (PS1) Filippo Giunchedi: prometheus: add thanos-sidecar to prometheus@ops [puppet] - https://gerrit.wikimedia.org/r/586313 (https://phabricator.wikimedia.org/T233956)
[08:58:32] (PS1) Filippo Giunchedi: Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956)
[08:58:34] (PS1) Filippo Giunchedi: prometheus: scrape thanos sidecar/query metrics [puppet] - https://gerrit.wikimedia.org/r/586315 (https://phabricator.wikimedia.org/T233956)
[09:04:01] (CR) jerkins-bot: [V: -1] Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[09:06:15] (PS2) Filippo Giunchedi: Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956)
[09:06:17] (PS2) Filippo Giunchedi: prometheus: scrape thanos sidecar/query metrics [puppet] - https://gerrit.wikimedia.org/r/586315 (https://phabricator.wikimedia.org/T233956)
[09:08:30] !log upload varnish 5.1.3-1wm13 to buster-wikimedia on apt1001.wm.org T249344
[09:08:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:08:35] T249344: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344
[09:11:33] !log cp2027: upgrade varnish to 5.1.3-1wm13 and restart varnish-fe T249344
[09:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:12:29] (CR) jerkins-bot: [V: -1] Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956) (owner: Filippo Giunchedi)
[09:14:43] (PS2) Filippo Giunchedi: modules: add thanos-sidecar define and profile [puppet] - https://gerrit.wikimedia.org/r/586312 (https://phabricator.wikimedia.org/T233956)
[09:14:45] (PS2) Filippo Giunchedi: prometheus: add thanos-sidecar to prometheus@ops [puppet] - https://gerrit.wikimedia.org/r/586313 (https://phabricator.wikimedia.org/T233956)
[09:15:22] (PS3) Filippo Giunchedi: Add Thanos query [puppet] - https://gerrit.wikimedia.org/r/586314 (https://phabricator.wikimedia.org/T233956)
[09:15:24] (PS3) Filippo Giunchedi: prometheus: scrape thanos sidecar/query metrics [puppet] - https://gerrit.wikimedia.org/r/586315 (https://phabricator.wikimedia.org/T233956)
[09:16:21] (CR) Thiemo Kreuz (WMDE): [C: +1] planet: Remove dead blog.wikimedia.org feeds [puppet] - https://gerrit.wikimedia.org/r/586148 (owner: Legoktm)
[09:36:06] (CR) Arturo Borrero Gonzalez: [C: +1] Fix partial rename of "type" parameter to "wstype" [software/tools-webservice] - https://gerrit.wikimedia.org/r/585846 (https://phabricator.wikimedia.org/T249390) (owner: BryanDavis)
[09:39:45] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1099:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10898 and previous config saved to /var/cache/conftool/dbconfig/20200406-093944-marostegui.json
[09:39:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:40:06] !log Deploy schema change on db1099:3311
[09:40:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:50:12] !log push pfw firewall policies - T249267
[09:50:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:53:30] RECOVERY - BGP status on cr2-esams is OK: BGP OK - up: 414, down: 0, shutdown: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23BGP_status
[10:00:55] (CR) Volans: "> Patch Set 2:" (1 comment) [software/homer] - https://gerrit.wikimedia.org/r/584973 (owner: Volans)
[10:01:06] (PS1) Filippo Giunchedi: prometheus: alert on stale textfiles [puppet] - https://gerrit.wikimedia.org/r/586324
[10:05:04] (PS1) Majavah: Create account creator and rollback groups on yowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487)
[10:08:23] Operations, Performance-Team, Traffic, Patch-For-Review, Wikimedia-Incident: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494 (Gilles) @ema @Vgutierrez last time we talked about this a month ago, you were in the process of rolling out A...
[10:14:07] (03CR) 10Filippo Giunchedi: "ATM "atlas_metadata.prom" is alerting, although that is legit I believe because Puppet is managing the file and won't update it unless it " [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [10:30:04] jan_drewniak: (Dis)respected human, time to deploy Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1030). Please do the needful. [10:33:40] (03PS1) 10Elukey: superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) [10:36:12] (03PS2) 10Elukey: superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) [10:40:10] (03CR) 10jerkins-bot: [V: 04-1] superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) (owner: 10Elukey) [10:43:07] (03CR) 10Jdrewniak: [C: 03+2] Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586305 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:43:14] (03PS3) 10Elukey: superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) [10:44:00] (03CR) 10Jbond: [C: 03+1] admin: add kerberos flag for aklapper [puppet] - 10https://gerrit.wikimedia.org/r/586208 (https://phabricator.wikimedia.org/T248905) (owner: 10Elukey) [10:44:06] (03Merged) 10jenkins-bot: Bumping portals to master [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586305 (https://phabricator.wikimedia.org/T128546) (owner: 10Jdrewniak) [10:46:47] (03PS1) 10Arturo Borrero Gonzalez: openstack: queens: wmfkeystonehooks: refresh code to use provider_api [puppet] - 10https://gerrit.wikimedia.org/r/586330 (https://phabricator.wikimedia.org/T249494) [10:47:57] (03PS1) 10Elukey: Add fake kerberos keytabs for Superset hosts [labs/private] - 
10https://gerrit.wikimedia.org/r/586331 [10:48:35] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add fake kerberos keytabs for Superset hosts [labs/private] - 10https://gerrit.wikimedia.org/r/586331 (owner: 10Elukey) [10:50:17] !log jdrewniak@deploy1001 Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: [[gerrit:586305| Bumping portals to master (563985)]] (duration: 01m 12s) [10:50:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:51:09] (03PS4) 10Elukey: superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) [10:51:16] !log jdrewniak@deploy1001 Synchronized portals: Wikimedia Portals Update: [[gerrit:586305| Bumping portals to master (563985)]] (duration: 00m 58s) [10:51:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:54:11] (03PS5) 10Elukey: superset: handle upgrade to 0.36.0 [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) [10:55:52] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:56:02] (03CR) 10Elukey: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/21716/" [puppet] - 10https://gerrit.wikimedia.org/r/586328 (https://phabricator.wikimedia.org/T249495) (owner: 10Elukey) [10:56:37] (03CR) 10Elukey: [C: 03+2] aptrepo: add configuration for AMD ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586310 (https://phabricator.wikimedia.org/T247082) (owner: 10Elukey) [10:59:20] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22395 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [10:59:39] 10Operations, 10MediaWiki-Parser, 10serviceops, 10Core Platform Team Workboards (Clinic Duty 
Team), and 2 others: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (10Peter.ovchyn) @Anomie, I've pushed temporary solutions. I don't think th... [11:00:05] Amir1, Lucas_WMDE, awight, and Urbanecm: #bothumor I ♥ Unicode. All rise for European Mid-day SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1100). [11:00:05] awight, kart_, tgr, dcausse, and tassu: A patch you scheduled for European Mid-day SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [11:00:19] I'm happy to deploy today. [11:00:30] o/ [11:00:31] here o/ [11:00:32] awight: thanks! [11:00:42] coo. [11:00:48] cool :) [11:00:49] kk, I'll merge the two extension patches and deploy whatever lands first. [11:00:53] :-) [11:04:14] tgr: Hi, I'm putting the header change on mwdebug1001, if that will work for you. [11:04:37] (03CR) 10Awight: [C: 03+2] Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [11:04:38] awight: sort of testable, without it some things break on mwdebug [11:04:42] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [11:05:19] tgr: Thanks [11:05:32] (03Merged) 10jenkins-bot: Whitelist X-Wikimedia-Debug header for cross-wiki API requests [mediawiki-config] - 10https://gerrit.wikimedia.org/r/585779 (https://phabricator.wikimedia.org/T249107) (owner: 10Gergő Tisza) [11:06:26] tgr: Patch should be ready to test, on mwdebug1001 [11:08:10] ...on second thought it relies on a core patch which will only ride this week's train. So not actually testable now, sorry. [11:08:30] tgr: Should I revert or just push it like this? 
[11:08:44] It looks harmless enough to my untrained eye [11:09:09] yeah it should be harmless [11:09:25] $a[] = ... works in PHP even if $a is not defined [11:10:41] or maybe I'm wrong and that's an E_NOTICE? [11:10:44] let me test [11:11:25] I think $undefined[] has been assumed safe forever. Also, your patch should only affect people with the debugging plugins enabled. [11:11:32] !log awight@deploy1001 Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:585779| Whitelist X-Wikimedia-Debug header for cross-wiki API requests (T249107)]] (duration: 00m 59s) [11:11:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:11:37] T249107: CORS errors on commons on debug servers - https://phabricator.wikimedia.org/T249107 [11:11:44] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/580394 (https://phabricator.wikimedia.org/T245642) (owner: 10EBernhardson) [11:12:42] (03Merged) 10jenkins-bot: cirrus: Increase commonswiki near match weight [mediawiki-config] - 10https://gerrit.wikimedia.org/r/580394 (https://phabricator.wikimedia.org/T245642) (owner: 10EBernhardson) [11:13:59] ebernhardson: dcausse: ^ ready to test on mwdebug1001 [11:14:05] awight: testing [11:14:09] ty! [11:15:15] awight: it works great! [11:15:34] no, the code runs for everyone [11:15:37] deploying... [11:15:42] but it does seem safe [11:15:49] 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-Site-requests, and 2 others: Remove "Cache-control: no-cache" hack from wmf-config - https://phabricator.wikimedia.org/T247783 (10Joe) I can comment on the PHP part, I'm not 100% sure about the caching layer cache times given the changes happened in... 
[11:16:28] I had a vague recollection that some recent PHP version made that notice-able, but https://www.php.net/manual/en/language.types.array.php#language.types.array.syntax.modifying doesn't mention it and manual testing works fine too [11:17:05] :-) not a bad idea to make that noticeable [11:17:23] 10Operations, 10Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (10Geagea) The user that @Matanya mentiond using Gmail and she olso OTRS volunteer, so her e-mail can be confirmed through OTRS system. [11:17:24] I'll do the testing tomorrow on group0 [11:17:26] (03CR) 10Awight: Create account creator and rollback groups on yowiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:17:32] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:580394|cirrus: Increase commonswiki near match weight (T245642)]] (duration: 00m 59s) [11:17:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:17:37] T245642: Increase the near match weight - https://phabricator.wikimedia.org/T245642 [11:17:48] thanks for the deploy! [11:18:28] !log import AMD ROCm 3.3 packages in buster-wikimedia (component thirdparty/rocm33) - T247082 [11:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:18:33] T247082: Upgrade AMD ROCm to latest upstream - https://phabricator.wikimedia.org/T247082 [11:19:00] (03PS1) 10Elukey: profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) [11:19:21] awight: fixing your comment, a sec [11:19:48] awight: thanks for the deploy! 
[11:20:06] (03PS2) 10Elukey: profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) [11:20:49] (03PS2) 10Majavah: Create account creator and rollback groups on yowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) [11:21:03] tassu: Hi, I'm also trying to understand why there are no rights listed in $wgAddGroups and $wgRemoveGroups. I think this might be a typo? [11:21:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1099:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10899 and previous config saved to /var/cache/conftool/dbconfig/20200406-112123-marostegui.json [11:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:21:38] dcausse: thanks for fixing the thing that I deployed :-) [11:21:56] awight: does not seem effective on production, does it need a double sync? (there's a bug with IS.php caching) [11:22:26] (03CR) 10Awight: Create account creator and rollback groups on yowiki (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:22:49] dcausse: I can double-sync just in case... 
[11:23:32] (03PS3) 10Majavah: Create account creator and rollback groups on yowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) [11:23:32] sure, please [11:23:40] awight: nice catch, need more coffee [11:23:45] i think it should be fine now [11:24:00] (03PS2) 10Volans: commit: do not commit_check on initial empty diff [software/homer] - 10https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) [11:24:02] (03PS2) 10Volans: diff: allow to omit the actual diff [software/homer] - 10https://gerrit.wikimedia.org/r/585511 [11:24:04] (03PS3) 10Volans: diff: use different exit code if there is a diff [software/homer] - 10https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224) [11:24:06] (03PS3) 10Volans: netbox: silently skip devices without platform [software/homer] - 10https://gerrit.wikimedia.org/r/585536 [11:24:18] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1105:3311 for schema change', diff saved to https://phabricator.wikimedia.org/P10900 and previous config saved to /var/cache/conftool/dbconfig/20200406-112417-marostegui.json [11:24:19] (03CR) 10jerkins-bot: [V: 04-1] profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) (owner: 10Elukey) [11:24:21] tgr|away: :D [11:24:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:24:47] !log Deploy schema change on db1105:3311 [11:24:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:10] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: double-syncing (duration: 00m 58s) [11:25:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:25:30] dcausse: double-synced, lmk if I should deploy a revert [11:25:46] awight: I now see the new settings, thanks for deploy! 
:) [11:25:52] +1 [11:26:15] (03PS3) 10Elukey: profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) [11:27:34] (03PS4) 10Majavah: Create account creator and rollback groups on yowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) [11:29:18] (03PS1) 10Elukey: Add fake kerberos keytabs for stat1008 [labs/private] - 10https://gerrit.wikimedia.org/r/586336 [11:29:25] tassu: Thanks, I'll get to that in a few minutes [11:29:32] great, thanks [11:29:36] (03CR) 10Elukey: [V: 03+2 C: 03+2] Add fake kerberos keytabs for stat1008 [labs/private] - 10https://gerrit.wikimedia.org/r/586336 (owner: 10Elukey) [11:29:46] kart_: mwdebug1001 has your change, please lmk how it goes [11:30:00] OK. Testing! [11:30:06] @Nikerabbit ^^ [11:30:15] Checking [11:31:10] I'm able to load old-saved draft. @Nikerabbit [11:31:26] without mwdebug: fail, with mwdebug: pass. please proceed [11:31:30] ty [11:31:58] (03PS1) 10Joal: Bump AQS druid datasource to 2020-03 [puppet] - 10https://gerrit.wikimedia.org/r/586337 [11:32:18] Thanks @Nikerabbit [11:32:25] elukey: for when you're back --^ [11:32:34] !log awight@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/ContentTranslation: SWAT: [[gerrit:586311|Avoid failure on restoring draft with no categories (T249400)]] (duration: 01m 02s) [11:32:36] joal: bonjour! [11:32:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:32:40] T249400: Content Translation tool not allowing work - https://phabricator.wikimedia.org/T249400 [11:32:46] Hey :) [11:32:52] (03PS4) 10Elukey: profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) [11:33:41] Thanks awight !! 
[11:34:20] (03CR) 10Awight: [C: 03+2] Create account creator and rollback groups on yowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:34:23] (03CR) 10Elukey: [C: 03+2] Bump AQS druid datasource to 2020-03 [puppet] - 10https://gerrit.wikimedia.org/r/586337 (owner: 10Joal) [11:34:25] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:34:40] (03CR) 10Elukey: [C: 03+2] profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) (owner: 10Elukey) [11:34:53] (03PS5) 10Elukey: profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) [11:34:58] kart_: For sure! [11:35:18] (03Merged) 10jenkins-bot: Create account creator and rollback groups on yowiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586325 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:36:21] tassu: If you'd like to test, mwdebug1001 has the change [11:36:30] testing now [11:37:03] I remembered the user right wrong, it's "rollback" not "rollbacker" [11:37:14] should I just make a new patch based on master to fix? 
[11:37:18] kk yes please [11:37:58] (03PS1) 10Dzahn: phabricator: re-enable aphlict service [puppet] - 10https://gerrit.wikimedia.org/r/586338 (https://phabricator.wikimedia.org/T238593) [11:38:40] (03PS1) 10Majavah: Fix user right on yowiki rollbacker group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586339 (https://phabricator.wikimedia.org/T249487) [11:38:52] awight: fix patch is at https://gerrit.wikimedia.org/r/#/c/586339/ [11:39:14] (03CR) 10Elukey: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) (owner: 10Elukey) [11:39:25] (03CR) 10Elukey: [C: 03+2] profile::statistics::gpu: upgrade stat1008 to ROCm 3.3 [puppet] - 10https://gerrit.wikimedia.org/r/586334 (https://phabricator.wikimedia.org/T247082) (owner: 10Elukey) [11:39:42] (03CR) 10Awight: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586339 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:40:45] (03CR) 10Dzahn: [C: 04-1] ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [11:40:52] (03Merged) 10jenkins-bot: Fix user right on yowiki rollbacker group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586339 (https://phabricator.wikimedia.org/T249487) (owner: 10Majavah) [11:43:02] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:45:47] (03CR) 10Dzahn: [C: 03+2] "https://puppet-compiler.wmflabs.org/compiler1003/21720/phab1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/586338 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [11:46:05] tassu: Ready to test again :-) [11:46:29] awight: works now, sorry for the hassle [11:46:52] great! 
[11:48:16] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3050 is OK: HTTP OK: HTTP/1.0 200 OK - 22395 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [11:48:20] !log awight@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:586325|Create account creator and rollback groups on yowiki (T249487)]] (duration: 00m 59s) [11:48:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:48:26] T249487: Add rollback and account creator groups on yo.wiki - https://phabricator.wikimedia.org/T249487 [11:48:53] !log elukey@cumin1001 START - Cookbook sre.aqs.roll-restart [11:48:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:50:39] awight: thanks for deploying [11:50:53] +1 :-) thanks for the quick adjustments [11:52:08] !log elukey@cumin1001 END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) [11:52:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:10] (03CR) 10Dzahn: [C: 04-1] "after https://gerrit.wikimedia.org/r/c/operations/puppet/+/586338 the aphlict (nodejs) service is running again. 
But it uses 22280 and 222" [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [11:53:14] !log awight@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/TwoColConflict: SWAT: [[gerrit:586309|Backport talk page and EventLogging changes (T248243, T249404) (duration: 00m 59s) [11:53:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:53:19] T248243: Stop dropping large conflicts from EventLogging - https://phabricator.wikimedia.org/T248243 [11:53:19] T249404: TwoColConflict "exit" metrics failing - https://phabricator.wikimedia.org/T249404 [11:53:35] jouncebot: next [11:53:35] In 0 hour(s) and 6 minute(s): Creating gr.wikimedia.org (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1200) [11:54:23] (03CR) 10Ayounsi: [C: 03+1] commit: do not commit_check on initial empty diff [software/homer] - 10https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) (owner: 10Volans) [11:55:41] (03PS1) 10Elukey: Set apt1001's IPs in analytics-in4/6 term apt [homer/public] - 10https://gerrit.wikimedia.org/r/586340 [11:56:11] XioNoX: --^ [11:56:17] (if you have time) [11:56:30] elukey: ? [11:56:48] my patch for homer [11:57:02] elukey: oh, I mute some bots from here [11:57:08] !log EU swat complete [11:57:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:57:54] elukey: am I CCed? [11:58:06] (03PS1) 10Arturo Borrero Gonzalez: openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [11:59:42] (03CR) 10Ayounsi: "Please add apt2001.wikimedia.org as well." [homer/public] - 10https://gerrit.wikimedia.org/r/586340 (owner: 10Elukey) [12:00:00] elukey: commented [12:00:04] Amir1 and Urbanecm: How many deployers does it take to do Creating gr.wikimedia.org deploy? 
(https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1200). [12:00:22] XioNoX: thanks! Does eqiad ever use 2001? [12:00:41] Urbanecm: Are you ready? Put your hands up [12:00:54] (03PS7) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) [12:01:00] PROBLEM - Check systemd state on boron is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state [12:01:01] elukey: dunno, but it's better to have both just in case [12:02:26] (03PS2) 10Elukey: Set apt1001's IPs in analytics-in4/6 term apt [homer/public] - 10https://gerrit.wikimedia.org/r/586340 [12:02:35] XioNoX: fixed thanks [12:04:13] (03CR) 10Dzahn: [C: 03+1] Set apt1001's IPs in analytics-in4/6 term apt [homer/public] - 10https://gerrit.wikimedia.org/r/586340 (owner: 10Elukey) [12:04:22] !log test grafana 6.7.2 upgrade on grafana2001 - T244208 [12:04:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:04:28] T244208: Upgrade Grafana to 6.7 - https://phabricator.wikimedia.org/T244208 [12:04:52] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:05:08] (03CR) 10Alex Monk: [C: 03+1] "looks like the old code causes one of the errors we saw, if this new code works in codfw1dev let's ship it to eqiad1" [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [12:05:41] (03PS3) 10Elukey: Set apt1001's IPs in analytics-in4/6 term apt [homer/public] - 10https://gerrit.wikimedia.org/r/586340 [12:06:31] (03PS2) 10Arturo Borrero Gonzalez: openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 
10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [12:07:08] (03CR) 10jerkins-bot: [V: 04-1] openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [12:07:36] (03CR) 10Ayounsi: [C: 03+1] "LGTM!" [homer/public] - 10https://gerrit.wikimedia.org/r/586340 (owner: 10Elukey) [12:07:43] (03CR) 10Elukey: [C: 03+2] Set apt1001's IPs in analytics-in4/6 term apt [homer/public] - 10https://gerrit.wikimedia.org/r/586340 (owner: 10Elukey) [12:08:14] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' . [12:08:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:11:31] (03CR) 10Alex Monk: openstack: keystone: queens: fix encoding issues in our custom LDAP handler (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [12:14:21] (03CR) 10Dzahn: [C: 03+2] "Is it replaced by feed 1h https://wikimediafoundation.org/feed/" [puppet] - 10https://gerrit.wikimedia.org/r/586148 (owner: 10Legoktm) [12:14:45] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[12:14:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:14:54] (03CR) 10Volans: "Comments addressed as agreed on IRC" (032 comments) [software/homer] - 10https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) (owner: 10Volans) [12:15:21] (03CR) 10Ayounsi: [C: 03+1] plugins: initial implementation for Netbox data [software/homer] - 10https://gerrit.wikimedia.org/r/584973 (owner: 10Volans) [12:15:42] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22405 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [12:17:25] (03PS8) 10Ladsgroup: Initial configuration for gr.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: 10MarcoAurelio) [12:18:25] (03CR) 10jerkins-bot: [V: 04-1] Initial configuration for gr.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: 10MarcoAurelio) [12:18:57] (03PS1) 10Volans: doc: fix example config.yaml indentation [software/homer] - 10https://gerrit.wikimedia.org/r/586342 [12:18:59] (03PS1) 10Volans: gitignore: add plugins [software/homer] - 10https://gerrit.wikimedia.org/r/586343 [12:19:50] (03CR) 10Volans: [C: 03+2] plugins: initial implementation for Netbox data [software/homer] - 10https://gerrit.wikimedia.org/r/584973 (owner: 10Volans) [12:19:58] (03CR) 10Volans: [C: 03+2] commit: do not commit_check on initial empty diff [software/homer] - 10https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) (owner: 10Volans) [12:20:00] (03CR) 10Volans: [C: 03+2] diff: allow to omit the actual diff [software/homer] - 10https://gerrit.wikimedia.org/r/585511 (owner: 10Volans) [12:20:02] (03CR) 10Volans: [C: 03+2] diff: use different exit code if there is a diff [software/homer] - 
10https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224) (owner: 10Volans) [12:20:08] (03PS3) 10Dzahn: planet: Remove dead blog.wikimedia.org feeds [puppet] - 10https://gerrit.wikimedia.org/r/586148 (owner: 10Legoktm) [12:20:10] (03CR) 10Volans: [C: 03+2] netbox: silently skip devices without platform [software/homer] - 10https://gerrit.wikimedia.org/r/585536 (owner: 10Volans) [12:20:39] (03PS1) 10Filippo Giunchedi: hieradata: move grafana-next to grafana2001 [puppet] - 10https://gerrit.wikimedia.org/r/586344 (https://phabricator.wikimedia.org/T244208) [12:20:59] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1105:3311 after schema change', diff saved to https://phabricator.wikimedia.org/P10901 and previous config saved to /var/cache/conftool/dbconfig/20200406-122058-marostegui.json [12:21:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:16] (03PS2) 10Filippo Giunchedi: hieradata: move grafana-next to grafana2001 [puppet] - 10https://gerrit.wikimedia.org/r/586344 (https://phabricator.wikimedia.org/T244208) [12:21:23] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1089 for schema change', diff saved to https://phabricator.wikimedia.org/P10902 and previous config saved to /var/cache/conftool/dbconfig/20200406-122123-marostegui.json [12:21:25] (03PS1) 10Hnowlan: calico: use rdb2006 for changeprop in codfw [deployment-charts] - 10https://gerrit.wikimedia.org/r/586345 [12:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:21:51] !log Deploy schema change on db1089 [12:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:22:31] (03PS2) 10Dzahn: planet: Add new techblog.wikimedia.org feed [puppet] - 10https://gerrit.wikimedia.org/r/586147 (owner: 10Legoktm) [12:22:36] (03Merged) 10jenkins-bot: plugins: initial implementation for Netbox data [software/homer] - 10https://gerrit.wikimedia.org/r/584973 (owner: 10Volans) 
[12:22:47] (Merged) jenkins-bot: commit: do not commit_check on initial empty diff [software/homer] - https://gerrit.wikimedia.org/r/585510 (https://phabricator.wikimedia.org/T244363) (owner: Volans) [12:22:50] (Merged) jenkins-bot: diff: allow to omit the actual diff [software/homer] - https://gerrit.wikimedia.org/r/585511 (owner: Volans) [12:23:03] (CR) Ssingh: [C: +2] Update the Debian package for the v0.1.1 release [software/censorship-monitoring] - https://gerrit.wikimedia.org/r/585767 (owner: Ssingh) [12:23:10] (PS3) Arturo Borrero Gonzalez: openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [12:23:12] (CR) Dzahn: [C: +2] planet: Add new techblog.wikimedia.org feed [puppet] - https://gerrit.wikimedia.org/r/586147 (owner: Legoktm) [12:23:28] (Merged) jenkins-bot: diff: use different exit code if there is a diff [software/homer] - https://gerrit.wikimedia.org/r/585512 (https://phabricator.wikimedia.org/T249224) (owner: Volans) [12:23:30] (Merged) jenkins-bot: netbox: silently skip devices without platform [software/homer] - https://gerrit.wikimedia.org/r/585536 (owner: Volans) [12:24:10] (CR) jerkins-bot: [V: -1] openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: Arturo Borrero Gonzalez) [12:25:00] (PS5) Dzahn: Add .webm in files.viewable-mime-types of Phabricator [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T215360) (owner: Zoranzoki21) [12:27:21] !log hnowlan@deploy1001 helmfile [EQIAD] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[12:27:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:27:43] the patch is not working properly, aborting creating the wiki [12:28:25] Amir1: I'm now here [12:29:03] (CR) Dzahn: "seems to be a -1 per https://phabricator.wikimedia.org/T244162#6021780" [puppet] - https://gerrit.wikimedia.org/r/569627 (https://phabricator.wikimedia.org/T215360) (owner: Zoranzoki21) [12:29:13] Urbanecm: can you take a look at the config patch? [12:29:22] sure [12:30:13] Operations, Beta-Cluster-Infrastructure: deployment-cache-upload05: Several millions of logstash error entries - https://phabricator.wikimedia.org/T243129 (fgiunchedi) Untagging observability as this is a service / beta issue (also seems fixed?) [12:31:45] Operations, observability: Make grafana-next.wm.o HTTP 302 redirect to grafana.wm.o - https://phabricator.wikimedia.org/T240048 (fgiunchedi) Different approach/idea: point grafana-next to standby grafana: https://gerrit.wikimedia.org/r/c/operations/puppet/+/586344 [12:34:09] Amir1: seems composer buildDBLists wasn't run (MarcoAurelio reports it has an issue, probably old PHP on his side) [12:34:16] uploading a new PS [12:34:21] (PS9) Urbanecm: Initial configuration for gr.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: MarcoAurelio) [12:34:24] Thanks [12:34:49] Urbanecm: do you want to create the wiki? 
I stay beside just in case [12:35:08] Amir1: Sure - if jenkins allows me so [12:35:55] wiki creators need to have wikidata Q so we can add property "created by" to the wiki :p [12:36:03] (CR) Urbanecm: [C: +2] Initial configuration for gr.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: MarcoAurelio) [12:36:10] mutante: create one :P [12:36:52] Urbanecm: done https://www.wikidata.org/wiki/Q89574409 [12:36:58] thanks [12:37:03] (Merged) jenkins-bot: Initial configuration for gr.wikimedia.org [mediawiki-config] - https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: MarcoAurelio) [12:37:18] !log Update eqiad analytics filters with new APT IPs [12:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:38:12] Amir1: pulled onto deploy1001 [12:38:27] and scap pulling onto mwmaint1002 [12:38:54] cool [12:39:15] (PS1) Ayounsi: Sort lvs_neighbors dict [homer/public] - https://gerrit.wikimedia.org/r/586347 [12:39:16] Amir1: `mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki el wikimedia grwikimedia gr.wikimedia.org` is the right thing to run, yes? 
[12:39:44] Operations, Traffic, Wikimedia-Logstash, observability: Varnish does not vary elasticsearch query by request body - https://phabricator.wikimedia.org/T174960 (fgiunchedi) Open→Declined Tentatively resolving since things are working as intended [12:40:07] yup [12:40:22] (CR) Volans: [C: +1] "LGTM" [homer/public] - https://gerrit.wikimedia.org/r/586347 (owner: Ayounsi) [12:40:30] okay, doing so [12:40:50] (CR) Muehlenhoff: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/584638 (https://phabricator.wikimedia.org/T233950) (owner: Jbond) [12:41:21] (CR) Ayounsi: [C: +2] Sort lvs_neighbors dict [homer/public] - https://gerrit.wikimedia.org/r/586347 (owner: Ayounsi) [12:42:16] Amir1: now I just need to sync dblists, wikiversions etc, right? [12:42:33] is the database created? [12:42:37] yes [12:42:42] then yup [12:42:55] (CR) Ayounsi: [C: +1] gitignore: add plugins [software/homer] - https://gerrit.wikimedia.org/r/586343 (owner: Volans) [12:43:24] (CR) Ayounsi: [C: +1] doc: fix example config.yaml indentation [software/homer] - https://gerrit.wikimedia.org/r/586342 (owner: Volans) [12:44:18] !log urbanecm@deploy1001 Synchronized dblists/: 77b9ae9: Create grwikimedia (duration: 00m 59s) [12:44:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:45:53] jouncebot: now [12:45:53] For the next 0 hour(s) and 14 minute(s): Creating gr.wikimedia.org (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1200) [12:45:56] jouncebot: next [12:45:56] In 4 hour(s) and 14 minute(s): Wikidata Query Service weekly deploy (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1700) [12:46:25] !log urbanecm@deploy1001 rebuilt and synchronized wikiversions files: 77b9ae9: Create grwikimedia [12:46:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:46:45] jouncebot: refresh [12:46:46] I refreshed my knowledge 
about deployments. [12:46:47] jouncebot: next [12:46:47] In 0 hour(s) and 13 minute(s): entitysources: Directly create entitySources config for WMF "test" wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1300) [12:47:10] Amir1: synced dblists and wikiversions and scap pulled at mwdebug1001. At this point it should work there, right? [12:47:20] yeah [12:47:33] the issue is it...doesn't. Could you have a look please? [12:47:45] Urbanecm: did you do "sync-wikiversions" [12:47:48] yes [12:48:01] scap sync-wikiversions [12:48:06] to rebuild the cache [12:48:07] hmm [12:48:08] yes, that command [12:48:10] okay [12:48:27] Operations, Analytics, netops: Move netflow data to Eventgate Analytics - https://phabricator.wikimedia.org/T248865 (MoritzMuehlenhoff) p:Triage→Medium [12:48:35] Operations, Wikimedia-Mailing-lists: Delete email addresses with privileged @domain names from mailing lists at offboarding - https://phabricator.wikimedia.org/T248384 (MoritzMuehlenhoff) p:Triage→Medium [12:48:47] Amir1: it says `Error: 1146 Table 'grwikimedia.revtag' doesn't exist` [12:48:59] do you have an idea what uses that table? [12:49:25] (CR) Gehel: [C: +1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/583954 (owner: Muehlenhoff) [12:49:34] codesearch [12:49:47] translate [12:49:52] ah, right [12:49:57] okay, going to create Translate's tables [12:50:54] jouncebot: refresh [12:50:55] I refreshed my knowledge about deployments. [12:50:57] jouncebot: next [12:50:57] In 1 hour(s) and 9 minute(s): entitysources: Directly create entitySources config for WMF "test" wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1400) [12:51:03] * addshore pushed his slot back by 1 hour [12:51:09] Amir1: thx, it's up on mwdebug. 
Going to finish the syncs [12:51:27] (CR) Volans: [C: +2] doc: fix example config.yaml indentation [software/homer] - https://gerrit.wikimedia.org/r/586342 (owner: Volans) [12:51:32] (CR) Volans: [C: +2] gitignore: add plugins [software/homer] - https://gerrit.wikimedia.org/r/586343 (owner: Volans) [12:52:10] eh, is the "writing system" Latin script? [12:52:14] for Greek [12:52:20] as opposed to Ancient greek [12:52:22] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1089 after schema change', diff saved to https://phabricator.wikimedia.org/P10903 and previous config saved to /var/cache/conftool/dbconfig/20200406-125222-marostegui.json [12:52:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:52:48] !log urbanecm@deploy1001 Synchronized multiversion/MWMultiVersion.php: 77b9ae9: Create grwikimedia (duration: 00m 58s) [12:52:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:09] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1107 for schema change', diff saved to https://phabricator.wikimedia.org/P10904 and previous config saved to /var/cache/conftool/dbconfig/20200406-125308-marostegui.json [12:53:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:39] !log Deploy schema change on db1107 [12:53:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:53:48] Operations, netops, Wikimedia-Incident: Add linecard diversity to the router-to-router interconnect in codfw - https://phabricator.wikimedia.org/T248506 (MoritzMuehlenhoff) p:Triage→Medium [12:53:49] mutante: hmm good q. 
Per https://el.wikipedia.org/wiki/%CE%A0%CF%8D%CE%BB%CE%B7:%CE%9A%CF%8D%CF%81%CE%B9%CE%B1, doesn't seem so [12:54:10] !log urbanecm@deploy1001 Synchronized static/images/project-logos/: 77b9ae9: Create grwikimedia (duration: 00m 58s) [12:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:54:29] Urbanecm: ack [12:54:42] RECOVERY - snapshot of s3 in codfw on db1115 is OK: snapshot for s3 at codfw taken less than 3 days ago and larger than 90 GB: Last one 2020-04-06 09:02:54 from db2098.codfw.wmnet:3313 (842 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups [12:54:48] (Merged) jenkins-bot: doc: fix example config.yaml indentation [software/homer] - https://gerrit.wikimedia.org/r/586342 (owner: Volans) [12:54:50] (Merged) jenkins-bot: gitignore: add plugins [software/homer] - https://gerrit.wikimedia.org/r/586343 (owner: Volans) [12:55:38] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: 77b9ae9: Create grwikimedia (duration: 00m 58s) [12:55:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:55:54] okay, wiki seems to be running! [12:55:58] (PS2) Gehel: maps: tweak OSM replication hours [puppet] - https://gerrit.wikimedia.org/r/581636 (owner: MSantos) [12:56:16] going to do interwiki cache now [12:57:24] (CR) Dzahn: [C: +1] "db dump created" [puppet] - https://gerrit.wikimedia.org/r/583920 (owner: Dzahn) [12:57:26] (CR) Jcrespo: [C: +1] "The patch is right- does this need any vlan hole for the new proxy to be available from cloud network? 
You would know that better than I d" [puppet] - https://gerrit.wikimedia.org/r/586206 (https://phabricator.wikimedia.org/T231520) (owner: Marostegui) [12:57:28] (PS1) Urbanecm: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/586349 [12:57:30] (CR) Urbanecm: [C: +2] Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/586349 (owner: Urbanecm) [12:57:43] (CR) Gehel: [C: +2] maps: tweak OSM replication hours [puppet] - https://gerrit.wikimedia.org/r/581636 (owner: MSantos) [12:58:36] (Merged) jenkins-bot: Update interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/586349 (owner: Urbanecm) [12:59:39] !log urbanecm@deploy1001 Synchronized wmf-config/interwiki.php: Update interwiki cache (duration: 02m 22s) [12:59:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:59:57] !log Creation of grwikimedia is done (T245911) [13:00:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:02] T245911: Create a wiki for Wikimedia Community User Group Greece - https://phabricator.wikimedia.org/T245911 [13:00:10] PROBLEM - PHP opcache health on mw2307 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:00:51] Amir1: wiki seems to be up, I'll make them to submit acc details, so they can use it. 
[13:01:11] cool [13:01:17] (CR) Dzahn: [C: +2] switch webserver-misc-apps discovery record to miscweb1002 [dns] - https://gerrit.wikimedia.org/r/584606 (https://phabricator.wikimedia.org/T247648) (owner: Dzahn) [13:01:17] Thanks for doing it [13:01:20] (PS2) Dzahn: switch webserver-misc-apps discovery record to miscweb1002 [dns] - https://gerrit.wikimedia.org/r/584606 (https://phabricator.wikimedia.org/T247648) [13:01:42] thanks Urbanecm, nice [13:01:50] (PS3) Gehel: maps: enable osm replication cron [puppet] - https://gerrit.wikimedia.org/r/581637 (owner: MSantos) [13:01:50] happy to help! [13:02:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Repool db1107 after schema change', diff saved to https://phabricator.wikimedia.org/P10905 and previous config saved to /var/cache/conftool/dbconfig/20200406-130255-marostegui.json [13:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:11] Amir1: how to find the logo URL (on commons) [13:03:20] !log marostegui@cumin1001 dbctl commit (dc=all): 'Depool db1118 for schema change', diff saved to https://phabricator.wikimedia.org/P10906 and previous config saved to /var/cache/conftool/dbconfig/20200406-130320-marostegui.json [13:03:22] sees all other community user groups but Greece [13:03:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:28] !log updating gnutls on buster [13:03:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:46] !log Deploy schema change on db1118 [13:03:47] mutante: https://commons.wikimedia.org/wiki/File:WM_CUG_GR.png? [13:03:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:03:51] (CR) Gehel: [C: +2] maps: enable osm replication cron [puppet] - https://gerrit.wikimedia.org/r/581637 (owner: MSantos) [13:04:02] mutante: yeah, that's the one [13:04:26] thanks! 
[13:04:44] (it's usually linked from the task's description, fwiw) [13:05:14] !log cache: upgrade varnish to 5.1.3-1wm13, begin rolling varnish-fe restarts T249344 [13:05:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:05:21] T249344: varnishd crashes in vbf_stp_condfetch(): cp3057 and cp3061 - https://phabricator.wikimedia.org/T249344 [13:05:23] ah of course, yep [13:06:39] (PS1) Jgreen: remove deprecated hostgroups from nsca_frack.cfg.erb [puppet] - https://gerrit.wikimedia.org/r/586351 (https://phabricator.wikimedia.org/T247855) [13:07:24] (Abandoned) Dzahn: ATS: switch racktables to backend miscweb1002 [puppet] - https://gerrit.wikimedia.org/r/583920 (owner: Dzahn) [13:08:08] (CR) Jgreen: [V: +1 C: +2] remove deprecated hostgroups from nsca_frack.cfg.erb [puppet] - https://gerrit.wikimedia.org/r/586351 (https://phabricator.wikimedia.org/T247855) (owner: Jgreen) [13:08:23] (CR) Jgreen: [V: +1 C: +2] remove deprecated hostgroups from nsca_frack.cfg.erb [puppet] - https://gerrit.wikimedia.org/r/586351 (https://phabricator.wikimedia.org/T247855) (owner: Jgreen) [13:12:36] marostegui: could i ask for a temp mysql GRANT? I am migrating racktables to buster and when the app version changes it wants to also update the DB structure and then i get: "You do not have the SUPER privilege and binary logging is enabled (you *might* want to use the less safe log_bin_trust_function_creators variable)" [13:13:07] it wants to ('CREATE TRIGGER `trigger_test` BEFORE INSERT ON `innodb_test` FOR EACH ROW BEGIN END') [13:13:48] mutante: yes, we can do that and then revoke that SUPER to leave it as it used to be [13:13:52] mutante: talk to me in private [13:13:57] thank you! ok [13:15:48] (CR) Giuseppe Lavagetto: [C: +1] "> > LGTM but add removal of the now-unused template." 
[puppet] - https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus) [13:17:09] RECOVERY - PHP opcache health on mw2307 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [13:17:53] (CR) Giuseppe Lavagetto: [C: +1] "LGTM, one nit." (1 comment) [puppet] - https://gerrit.wikimedia.org/r/585795 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus) [13:18:21] (CR) Giuseppe Lavagetto: [C: +1] maintenance: Migrate echo_mail_batch to periodic_job [puppet] - https://gerrit.wikimedia.org/r/585796 (https://phabricator.wikimedia.org/T211250) (owner: RLazarus) [13:19:59] (PS1) Elukey: amd: update package list after 3.3 upgrade [puppet] - https://gerrit.wikimedia.org/r/586352 (https://phabricator.wikimedia.org/T247082) [13:21:55] (CR) CDanis: [C: +1] hieradata: move grafana-next to grafana2001 [puppet] - https://gerrit.wikimedia.org/r/586344 (https://phabricator.wikimedia.org/T244208) (owner: Filippo Giunchedi) [13:22:06] (CR) Elukey: [C: +2] amd: update package list after 3.3 upgrade [puppet] - https://gerrit.wikimedia.org/r/586352 (https://phabricator.wikimedia.org/T247082) (owner: Elukey) [13:22:38] !log stat1008 upgraded to ROCm 3.3 (enables Tensorflow 2.x) [13:22:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:25:14] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (jcrespo) > My fear is that this become "a given service" and people starting assuming and think... 
[13:26:30] !log reboot stat1008 as test to verify ROCm 3.3 upgrades [13:26:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:27:16] (CR) CDanis: "> Patch Set 1:" [puppet] - https://gerrit.wikimedia.org/r/586324 (owner: Filippo Giunchedi) [13:28:11] (CR) Filippo Giunchedi: [C: +2] hieradata: move grafana-next to grafana2001 [puppet] - https://gerrit.wikimedia.org/r/586344 (https://phabricator.wikimedia.org/T244208) (owner: Filippo Giunchedi) [13:28:48] (CR) CDanis: prometheus: alert on stale textfiles (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586324 (owner: Filippo Giunchedi) [13:29:09] PROBLEM - Host stat1008 is DOWN: PING CRITICAL - Packet loss = 100% [13:31:18] this is me [13:31:19] ^ should be reboot for rocm update I guess [13:31:35] RECOVERY - Host stat1008 is UP: PING OK - Packet loss = 0%, RTA = 0.26 ms [13:32:05] moritzm: yep yep [13:41:11] (CR) Jhedden: [C: +2] openstack: update nova-placement healthcheck in codfwdev1 [puppet] - https://gerrit.wikimedia.org/r/586118 (https://phabricator.wikimedia.org/T249453) (owner: Jhedden) [13:41:34] Operations, Performance-Team, Traffic, Wikimedia-Site-requests, and 2 others: Remove "Cache-control: no-cache" hack from wmf-config - https://phabricator.wikimedia.org/T247783 (ema) When it comes to caching 50x responses: I can confirm with confidence that at the ats-be level we [[https://gerrit.... [13:47:38] !log upload cescout 0.1.1-1 to apt.wm.o (buster) - T247273 [13:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:43] T247273: Deploy the cescout package (censorship monitoring) - https://phabricator.wikimedia.org/T247273 [13:54:58] (PS1) Nikerabbit: Restore Beta Cluster logging [mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) [13:56:30] (CR) Nikerabbit: "@Timo&James: please check if this makes sense." 
[mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: Nikerabbit) [13:57:00] (PS6) ArielGlenn: weekly dump of machine vision tables from commonswiki [puppet] - https://gerrit.wikimedia.org/r/573351 (https://phabricator.wikimedia.org/T236431) [13:57:39] (CR) jerkins-bot: [V: -1] weekly dump of machine vision tables from commonswiki [puppet] - https://gerrit.wikimedia.org/r/573351 (https://phabricator.wikimedia.org/T236431) (owner: ArielGlenn) [14:00:04] addshore: I, the Bot under the Fountain, allow thee, The Deployer, to do entitysources: Directly create entitySources config for WMF "test" wikis deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1400). [14:01:41] (PS7) ArielGlenn: weekly dump of machine vision tables from commonswiki [puppet] - https://gerrit.wikimedia.org/r/573351 (https://phabricator.wikimedia.org/T236431) [14:03:29] jouncebot: now [14:03:29] For the next 0 hour(s) and 56 minute(s): entitysources: Directly create entitySources config for WMF "test" wikis (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1400) [14:04:37] (CR) Krinkle: Restore Beta Cluster logging (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: Nikerabbit) [14:05:32] (PS2) Giuseppe Lavagetto: mediawiki: convert all appserver to use envoy for TLS termination [puppet] - https://gerrit.wikimedia.org/r/586205 (https://phabricator.wikimedia.org/T247389) [14:05:34] (PS1) Giuseppe Lavagetto: parsoid: switch to envoy for TLS termination [puppet] - https://gerrit.wikimedia.org/r/586354 (https://phabricator.wikimedia.org/T247389) [14:05:52] (PS8) Addshore: Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:06:04] (PS5) 
Addshore: Test wikibase clients: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:06:11] (PS5) Addshore: Test commons: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:07:22] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [14:07:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:07:30] !log elukey@cumin1001 END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) [14:07:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:08:13] (CR) Alexandros Kosiaris: [C: -1] calico: use rdb2006 for changeprop in codfw (1 comment) [deployment-charts] - https://gerrit.wikimedia.org/r/586345 (owner: Hnowlan) [14:08:21] (CR) Addshore: [C: +2] Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:09:20] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [14:09:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:09:45] (Merged) jenkins-bot: Test wikidata: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:10:23] Operations, MediaWiki-Parser, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 2 others: API action=parse should be poolcounter-limited if a re-parse is necessary - https://phabricator.wikimedia.org/T243803 (Anomie) Neither api.php nor ApiMain are the locations being asked for. T... 
[14:10:28] (CR) Alexandros Kosiaris: [C: +1] "\o/" [puppet] - https://gerrit.wikimedia.org/r/586354 (https://phabricator.wikimedia.org/T247389) (owner: Giuseppe Lavagetto) [14:12:11] Operations, DBA, Patch-For-Review: Add favicon to icinga and tendril - https://phabricator.wikimedia.org/T204110 (jcrespo) Resolved→Open There is a small bug, in which favicon.ico is referenced as being in the current dir. When we are under a hierarchy (such as https://tendril.wikimedia.org/h... [14:12:23] (PS1) Volans: Release v0.2.0 [software/homer] - https://gerrit.wikimedia.org/r/586355 [14:13:24] (PS1) Ottomata: refine - look for schemas both primary and secondary schema repositories [puppet] - https://gerrit.wikimedia.org/r/586356 (https://phabricator.wikimedia.org/T240985) [14:13:51] (CR) Nikerabbit: Restore Beta Cluster logging (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: Nikerabbit) [14:13:53] (PS2) Ottomata: refine - look for schemas both primary and secondary schema repositories [puppet] - https://gerrit.wikimedia.org/r/586356 (https://phabricator.wikimedia.org/T240985) [14:15:13] (CR) Addshore: [C: +2] Test wikidata: Define entity sources configuration (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/569209 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:15:21] (PS1) Jcrespo: tendril: Fix relative reference to favicon when not on root [software/tendril] - https://gerrit.wikimedia.org/r/586358 (https://phabricator.wikimedia.org/T204110) [14:16:17] Operations, observability, Patch-For-Review, User-CDanis: Upgrade Grafana to 6.7 - https://phabricator.wikimedia.org/T244208 (fgiunchedi) Grafana 6.7.2 is running on https://grafana-next.wikimedia.org and AFAICT is running as expected. I ran into a problem (couldn't login anymore) after the datab... 
[14:18:30] (CR) jerkins-bot: [V: -1] refine - look for schemas both primary and secondary schema repositories [puppet] - https://gerrit.wikimedia.org/r/586356 (https://phabricator.wikimedia.org/T240985) (owner: Ottomata) [14:18:40] (CR) Jcrespo: [V: +2 C: +2] tendril: Fix relative reference to favicon when not on root [software/tendril] - https://gerrit.wikimedia.org/r/586358 (https://phabricator.wikimedia.org/T204110) (owner: Jcrespo) [14:18:45] (PS1) Addshore: Test wikidata: Alter entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/586361 (https://phabricator.wikimedia.org/T248664) [14:18:53] (PS1) Elukey: Fix AMD GPU prometheus exporter for ROCm 3.3 [puppet] - https://gerrit.wikimedia.org/r/586362 (https://phabricator.wikimedia.org/T247082) [14:19:02] (CR) Addshore: [C: +2] Test wikidata: Alter entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/586361 (https://phabricator.wikimedia.org/T248664) (owner: Addshore) [14:19:32] (PS1) Mholloway: MachineVision: Label blacklist updates [mediawiki-config] - https://gerrit.wikimedia.org/r/586363 (https://phabricator.wikimedia.org/T249285) [14:20:04] (PS3) Ottomata: refine - look for schemas both primary and secondary schema repositories [puppet] - https://gerrit.wikimedia.org/r/586356 (https://phabricator.wikimedia.org/T240985) [14:20:20] Operations, DBA, Patch-For-Review: Add favicon to icinga and tendril - https://phabricator.wikimedia.org/T204110 (jcrespo) Open→Resolved Above url works now [14:20:32] (Merged) jenkins-bot: Test wikidata: Alter entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/586361 (https://phabricator.wikimedia.org/T248664) (owner: Addshore) [14:20:48] (CR) Ayounsi: [C: +1] Release v0.2.0 [software/homer] - https://gerrit.wikimedia.org/r/586355 (owner: Volans) [14:23:26] (CR) Elukey: [C: +2] Fix AMD GPU prometheus exporter 
for ROCm 3.3 [puppet] - https://gerrit.wikimedia.org/r/586362 (https://phabricator.wikimedia.org/T247082) (owner: Elukey) [14:23:37] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki T248664 (duration: 00m 59s) [14:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:23:43] T248664: entitysources: Directly create entitySources config for WMF "test" wikis - https://phabricator.wikimedia.org/T248664 [14:24:42] (PS6) Addshore: Test wikibase clients: Define entity sources configuration [mediawiki-config] - https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:24:42] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase entity source config for testwikidatawiki T248664 (cachebust) (duration: 00m 58s) [14:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:04] (CR) Addshore: Test wikibase clients: Define entity sources configuration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:26:07] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10909 and previous config saved to /var/cache/conftool/dbconfig/20200406-142607-marostegui.json [14:26:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:27:08] (CR) Hashar: zuul: provision the scap repository (1 comment) [puppet] - https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) (owner: Hashar) [14:27:15] Operations, Commons, SRE-swift-storage, User-fgiunchedi: Big number of uploads from DPLA bot - https://phabricator.wikimedia.org/T248151 (MoritzMuehlenhoff) @fgiunchedi Is there anything left for this ticket? 
Can it be closed? [14:27:38] Operations, Mail: Wiki email not delivered to GMail - https://phabricator.wikimedia.org/T243937 (MoritzMuehlenhoff) p:Triage→Medium [14:28:01] Operations, Commons, Thumbor: Thumbnailing page 2 of c:File:Mimořádné opatření - zákaz vývozu desinfekce rukou.pdf generates a non-fatal Ghostscript error that is piped to imagemagick - https://phabricator.wikimedia.org/T247473 (MoritzMuehlenhoff) p:Triage→Medium [14:30:43] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10910 and previous config saved to /var/cache/conftool/dbconfig/20200406-143042-marostegui.json [14:30:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:34:50] (PS9) Hashar: zuul: provision the scap repository [puppet] - https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) [14:36:40] (CR) Hashar: "I have changed the systemd template to use the deployment repository when on Buster and relaxed the Icinga check_procs regex." [puppet] - https://gerrit.wikimedia.org/r/579587 (https://phabricator.wikimedia.org/T215458) (owner: Hashar) [14:36:45] (CR) Addshore: Test wikibase clients: Define entity sources configuration (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: WMDE-leszek) [14:37:10] (PS2) Filippo Giunchedi: prometheus: alert on stale textfiles [puppet] - https://gerrit.wikimedia.org/r/586324 [14:37:12] Operations, Anti-Harassment, SRE-Access-Requests, Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (Marostegui) What we can also do is use the "test" hosts (https://wikitech.wikimedia.org/wiki/Ma... 
[14:37:18] (03PS7) 10Addshore: Test wikibase clients: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:37:36] (03CR) 10Addshore: [C: 03+2] Test wikibase clients: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:37:38] (03CR) 10Filippo Giunchedi: prometheus: alert on stale textfiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [14:37:55] (03CR) 10Andrew Bogott: [C: 03+1] "Looks good! Is there any reason to not forward this same change to Rocky?" [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [14:37:55] !log marostegui@cumin1001 dbctl commit (dc=all): 'Slowly repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10911 and previous config saved to /var/cache/conftool/dbconfig/20200406-143755-marostegui.json [14:37:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:38:42] (03Merged) 10jenkins-bot: Test wikibase clients: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569256 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:40:05] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [14:40:07] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config T248664 (duration: 00m 59s) [14:40:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:40:13] T248664: entitysources: Directly create entitySources config for WMF "test" wikis - 
https://phabricator.wikimedia.org/T248664 [14:41:15] (03PS6) 10Addshore: Test commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:41:15] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase client entity source config T248664 (cache bust) (duration: 00m 58s) [14:41:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:42:21] !log marostegui@cumin1001 dbctl commit (dc=all): 'Fully repool db1118 after schema change', diff saved to https://phabricator.wikimedia.org/P10912 and previous config saved to /var/cache/conftool/dbconfig/20200406-144220-marostegui.json [14:42:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:43:05] (03CR) 10Herron: [C: 03+1] prometheus: alert on stale textfiles [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [14:45:17] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 240, down: 1, dormant: 0, excluded: 0, unused: 0: https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [14:45:21] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22401 bytes in 0.272 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [14:45:49] PROBLEM - OSPF status on cr2-codfw is CRITICAL: OSPFv2: 4/5 UP : OSPFv3: 4/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [14:46:29] (03PS2) 10Hnowlan: calico: use rdb2005 for changeprop in codfw [deployment-charts] - 10https://gerrit.wikimedia.org/r/586345 [14:46:32] (03PS1) 10Addshore: TEST: entity source, use modern repoDatabase and interwikiPrefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586366 (https://phabricator.wikimedia.org/T248664) [14:46:37] (03CR) 10CDanis: 
prometheus: alert on stale textfiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [14:46:49] (03CR) 10Addshore: [C: 03+2] TEST: entity source, use modern repoDatabase and interwikiPrefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586366 (https://phabricator.wikimedia.org/T248664) (owner: 10Addshore) [14:46:50] * cdanis looking at OSPF/router interfaces alerts [14:47:49] (03Merged) 10jenkins-bot: TEST: entity source, use modern repoDatabase and interwikiPrefix [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586366 (https://phabricator.wikimedia.org/T248664) (owner: 10Addshore) [14:49:13] (03PS3) 10Filippo Giunchedi: prometheus: alert on stale textfiles [puppet] - 10https://gerrit.wikimedia.org/r/586324 [14:49:20] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix T248664 (duration: 00m 58s) [14:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:49:25] (03CR) 10Filippo Giunchedi: prometheus: alert on stale textfiles (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [14:49:26] T248664: entitysources: Directly create entitySources config for WMF "test" wikis - https://phabricator.wikimedia.org/T248664 [14:49:53] (03CR) 10Marostegui: "Will merge tomorrow, as it is almost the end of the day for me" [puppet] - 10https://gerrit.wikimedia.org/r/586206 (https://phabricator.wikimedia.org/T231520) (owner: 10Marostegui) [14:50:31] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Wikibase, entity source, use modern repoDatabase and interwikiPrefix T248664 (cache bust) (duration: 00m 57s) [14:50:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:51:29] (03CR) 10Addshore: [C: 03+2] Test commons: Define entity sources configuration [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:51:32] (03PS7) 10Addshore: Test commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:51:37] (03CR) 10Addshore: [C: 03+2] Test commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:53:17] (03Merged) 10jenkins-bot: Test commons: Define entity sources configuration [mediawiki-config] - 10https://gerrit.wikimedia.org/r/569257 (https://phabricator.wikimedia.org/T248664) (owner: 10WMDE-leszek) [14:54:39] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration T248664 (duration: 00m 57s) [14:54:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:54:45] T248664: entitysources: Directly create entitySources config for WMF "test" wikis - https://phabricator.wikimedia.org/T248664 [14:55:39] RECOVERY - OSPF status on cr2-codfw is OK: OSPFv2: 5/5 UP : OSPFv3: 5/5 UP https://wikitech.wikimedia.org/wiki/Network_monitoring%23OSPF_status [14:55:52] !log addshore@deploy1001 Synchronized wmf-config/InitialiseSettings.php: TEST: Test commons: Define entity sources configuration T248664 (cache bust) (duration: 00m 57s) [14:55:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:57:45] (03PS1) 10Addshore: Remove wmgWikibase(Repo/Client)Repositories fot test sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) [14:58:05] (03CR) 10CDanis: [C: 03+1] "thanks!" 
[puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [14:58:19] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10dmaza) >>! In T249059#6032405, @Marostegui wrote: > What we can also do is use the "test" hosts... [14:59:35] !log deploy slot done [14:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:00:40] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10aezell) Thanks to everyone for working to find a solution to this gap in environments we seem t... [15:01:27] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10Marostegui) The test host would be the closest (same data and same HW specs as we have in produ... [15:01:42] 10Operations, 10MediaWiki-General, 10serviceops-radar, 10Availability (MediaWiki-MultiDC), and 3 others: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129 (10Krinkle) [15:03:04] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10Marostegui) >>! In T249059#6032463, @aezell wrote: > > As for this particular task, we just h... 
[15:04:31] !log elukey@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [15:04:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:06:10] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10jcrespo) > just explain [...] > Then, we would run the query with a few different parameters t... [15:08:45] (03PS2) 10Volans: Release v0.2.0 [software/homer] - 10https://gerrit.wikimedia.org/r/586355 [15:08:55] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 242, down: 0, dormant: 0, excluded: 0, unused: 0 https://wikitech.wikimedia.org/wiki/Network_monitoring%23Router_interface_down [15:10:11] 10Operations, 10serviceops: miscweb1001/2001 - upgrade to buster or decom - https://phabricator.wikimedia.org/T247648 (10Dzahn) [15:10:34] 10Operations, 10serviceops: miscweb1001/2001 - upgrade to buster or decom - https://phabricator.wikimedia.org/T247648 (10Dzahn) racktables has been switched to miscweb1002 on buster [15:12:56] (03PS1) 10Dzahn: decom miscweb1002 and miscweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/586370 (https://phabricator.wikimedia.org/T247648) [15:15:12] (03PS1) 10Dzahn: decom miscweb1001 and miscweb2001 [dns] - 10https://gerrit.wikimedia.org/r/586371 (https://phabricator.wikimedia.org/T247648) [15:15:19] (03CR) 10Andrew Bogott: [C: 03+1] "confirmed -- they didn't rip out all the dependency code in Queens but they ripped out some of it, and apparently this part." 
[puppet] - 10https://gerrit.wikimedia.org/r/586330 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [15:15:46] (03PS2) 10Dzahn: decom miscweb1001 and miscweb2001 [puppet] - 10https://gerrit.wikimedia.org/r/586370 (https://phabricator.wikimedia.org/T247648) [15:17:05] (03Abandoned) 10Dzahn: add wikipersonas.org and link to ncredir-parking [dns] - 10https://gerrit.wikimedia.org/r/567558 (https://phabricator.wikimedia.org/T241944) (owner: 10Dzahn) [15:18:23] (03PS1) 10MSantos: WIP: collect metrics about OSM DB disk space [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) [15:19:25] (03CR) 10jerkins-bot: [V: 04-1] WIP: collect metrics about OSM DB disk space [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) (owner: 10MSantos) [15:21:37] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:28:19] (03PS1) 10Cmjohnson: Adding mgmt dns for cloudcontrol1005 [dns] - 10https://gerrit.wikimedia.org/r/586374 (https://phabricator.wikimedia.org/T247471) [15:28:47] (03CR) 10jerkins-bot: [V: 04-1] Adding mgmt dns for cloudcontrol1005 [dns] - 10https://gerrit.wikimedia.org/r/586374 (https://phabricator.wikimedia.org/T247471) (owner: 10Cmjohnson) [15:29:03] (03CR) 10Alexandros Kosiaris: [C: 03+1] calico: use rdb2005 for changeprop in codfw [deployment-charts] - 10https://gerrit.wikimedia.org/r/586345 (owner: 10Hnowlan) [15:29:45] (03PS2) 10Cmjohnson: Adding mgmt dns for cloudcontrol1005 [dns] - 10https://gerrit.wikimedia.org/r/586374 (https://phabricator.wikimedia.org/T247471) [15:30:51] (03PS3) 10Cmjohnson: Adding mgmt dns for cloudcontrol1005 [dns] - 10https://gerrit.wikimedia.org/r/586374 (https://phabricator.wikimedia.org/T247471) [15:31:00] 10Operations, 10Release-Engineering-Team-TODO, 
10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10hashar) [15:31:39] (03CR) 10Cmjohnson: [C: 03+2] Adding mgmt dns for cloudcontrol1005 [dns] - 10https://gerrit.wikimedia.org/r/586374 (https://phabricator.wikimedia.org/T247471) (owner: 10Cmjohnson) [15:32:29] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, 10cloud-services-team (Hardware): (Need by: 2020-04-01) rack/setup/install cloudcontrol1005 - https://phabricator.wikimedia.org/T247471 (10Cmjohnson) [15:32:36] (03PS2) 10Mholloway: MachineVision: Label blacklist updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586363 (https://phabricator.wikimedia.org/T249285) [15:33:33] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22410 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [15:34:48] (03CR) 10Hnowlan: [C: 03+2] calico: use rdb2005 for changeprop in codfw [deployment-charts] - 10https://gerrit.wikimedia.org/r/586345 (owner: 10Hnowlan) [15:35:08] (03Merged) 10jenkins-bot: calico: use rdb2005 for changeprop in codfw [deployment-charts] - 10https://gerrit.wikimedia.org/r/586345 (owner: 10Hnowlan) [15:36:03] (03PS2) 10MSantos: WIP: collect metrics about OSM DB disk space [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) [15:36:55] !log hnowlan@deploy1001 helmfile [CODFW] Ran 'sync' command on namespace 'changeprop' for release 'production' . 
[15:36:57] (03CR) 10jerkins-bot: [V: 04-1] WIP: collect metrics about OSM DB disk space [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) (owner: 10MSantos) [15:36:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:39:19] (03CR) 10Volans: [C: 03+2] Release v0.2.0 [software/homer] - 10https://gerrit.wikimedia.org/r/586355 (owner: 10Volans) [15:40:34] (03CR) 10Mholloway: [C: 03+2] MachineVision: Label blacklist updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586363 (https://phabricator.wikimedia.org/T249285) (owner: 10Mholloway) [15:40:35] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:41:35] lots of metawiki parsing errors [15:41:49] api calls [15:42:10] (03Merged) 10jenkins-bot: MachineVision: Label blacklist updates [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586363 (https://phabricator.wikimedia.org/T249285) (owner: 10Mholloway) [15:42:33] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10hashar) [15:43:04] 10Operations, 10Release-Engineering-Team-TODO, 10Continuous-Integration-Infrastructure (phase-out-jessie), 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services): Migrate contint* hosts to Buster - https://phabricator.wikimedia.org/T224591 (10hashar) [15:43:23] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 
(10dmaza) >>! In T249059#6032472, @Marostegui wrote: > The test host would be the closest (same da... [15:43:37] going down now [15:44:13] (03Merged) 10jenkins-bot: Release v0.2.0 [software/homer] - 10https://gerrit.wikimedia.org/r/586355 (owner: 10Volans) [15:45:35] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [15:45:44] !log mholloway-shell@deploy1001 Synchronized wmf-config/InitialiseSettings.php: MachineVision: Label blacklist updates (T249285) (duration: 00m 58s) [15:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:45:50] T249285: CAT blacklist update, 2020-04-02 - https://phabricator.wikimedia.org/T249285 [15:46:11] (03PS4) 10Filippo Giunchedi: prometheus: alert on stale textfiles [puppet] - 10https://gerrit.wikimedia.org/r/586324 [15:47:41] (03CR) 10Filippo Giunchedi: [C: 03+2] prometheus: alert on stale textfiles [puppet] - 10https://gerrit.wikimedia.org/r/586324 (owner: 10Filippo Giunchedi) [15:52:36] (03CR) 10Alexandros Kosiaris: [C: 03+1] "\o/ Nice." 
[puppet] - 10https://gerrit.wikimedia.org/r/586203 (https://phabricator.wikimedia.org/T224591) (owner: 10Muehlenhoff) [15:55:54] (03PS1) 10Bstorm: wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) [15:55:57] 10Operations, 10ops-codfw, 10DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10Papaul) [16:05:46] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: queens: wmfkeystonehooks: refresh code to use provider_api [puppet] - 10https://gerrit.wikimedia.org/r/586330 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [16:07:29] (03CR) 10Krinkle: Restore Beta Cluster logging (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586353 (https://phabricator.wikimedia.org/T165128) (owner: 10Nikerabbit) [16:08:44] (03PS4) 10Arturo Borrero Gonzalez: openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [16:09:10] (03PS2) 10Bstorm: wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) [16:09:39] (03CR) 10jerkins-bot: [V: 04-1] openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [16:13:44] 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-Site-requests, and 2 others: Remove "Cache-control: no-cache" hack from wmf-config - https://phabricator.wikimedia.org/T247783 (10BBlack) I worry that removing a no-cache default might have all kinds of unintended consequences. We should probably do... 
[16:14:15] 10Operations, 10Traffic, 10Patch-For-Review, 10Performance-Team (Radar): Improve ATS backend connection reuse against origin servers - https://phabricator.wikimedia.org/T241145 (10ema) >>! In T241145#5884800, @ema wrote: > The function `release_server_session` calls `do_io_close` if the following condition... [16:14:41] 10Operations, 10ops-codfw, 10DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10Papaul) [16:15:40] (03PS5) 10Arturo Borrero Gonzalez: openstack: keystone: queens: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [16:17:35] (03PS6) 10Arturo Borrero Gonzalez: openstack: keystone: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) [16:18:33] (03CR) 10Giuseppe Lavagetto: [C: 03+2] parsoid: switch to envoy for TLS termination [puppet] - 10https://gerrit.wikimedia.org/r/586354 (https://phabricator.wikimedia.org/T247389) (owner: 10Giuseppe Lavagetto) [16:18:58] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] openstack: keystone: fix encoding issues in our custom LDAP handler [puppet] - 10https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: 10Arturo Borrero Gonzalez) [16:22:02] (03PS1) 10CDanis: disable rebound purges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586390 (https://phabricator.wikimedia.org/T249325) [16:22:09] (03CR) 10Filippo Giunchedi: "Have you looked at what metrics prometheus-postgres-exporter provides for disk space? 
I think we should go the exporter way where we reaso" [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) (owner: 10MSantos) [16:24:08] <_joe_> !log switching parsoid-php to envoy for TLS termination [16:24:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:29:47] 10Operations, 10Patch-For-Review: setup/install cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (10ssingh) 05Open→03Resolved [16:29:49] 10Operations: Hardware request for Postgres database for censorship monitoring scripts - https://phabricator.wikimedia.org/T238652 (10ssingh) [16:29:52] 10Operations, 10Patch-For-Review: setup/install cescout1001.eqiad.wmnet - https://phabricator.wikimedia.org/T239250 (10ssingh) I am closing this task as the deployment is being tracked in T247273. I have updated the status in Netbox to "Active". [16:33:15] (03CR) 10Marostegui: "Where is the logic where you tell the view which table to point to?" [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [16:35:38] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:36:56] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22400 bytes in 0.255 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [16:43:59] (03CR) 10Bstorm: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [16:45:27] (03PS3) 10Bstorm: wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) [16:46:00] (03CR) 10Bstorm: "Better?" 
[puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [16:48:27] PROBLEM - Check no envoy runtime configuration is left persistent on wtp1025 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string entries: {} not found on http://localhost:9631/runtime - 392 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [16:49:50] _joe_: interesting ^ [16:50:41] (03PS1) 10Volans: Add Homer's plugins [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/586397 [16:50:41] <_joe_> rlazarus: that's me :D [16:50:43] (03PS1) 10Volans: Update Homer's src to v0.2.0 [software/homer/deploy] - 10https://gerrit.wikimedia.org/r/586398 [16:50:56] <_joe_> I forgot to reload envoy after switching on more access logging [16:51:08] haha I figured it wouldn't be anybody else :D [16:51:34] <_joe_> oh alex has abused the admin interface on kubernetes too [16:51:44] <_joe_> with some horrible hacks based on nsenter [16:52:02] <_joe_> !log parsoid migrated to use envoy for TLS termination [16:52:04] yeah but global_tls_min_log_code had a certain _joe_ness to it [16:52:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:16] <_joe_> aahh [16:52:44] (03CR) 10Jdlrobson: [C: 03+1] "I think this can be safely landed now" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584734 (https://phabricator.wikimedia.org/T248500) (owner: 10Jforrester) [16:52:53] I would love if envoy runtime config supported comments [16:52:54] <_joe_> rlazarus: I reloaded envoy there, it will recover soon [16:53:03] <_joe_> rlazarus: indeed [16:53:38] <_joe_> rlazarus: I would love if envoy allowed to declare more things like runtime configs, but I guess for most things you want to use xDS [16:53:41] <_joe_> like for timeouts [16:53:48] something like {"layer_values": [{value: "500", comment: "increased access logging for TLS migration -joe"}]} [16:54:08] yeah I guess that's 
true, it ought to live in the config server and then we can just do what we want [16:54:16] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [16:54:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:22] making more work for my future sel-- I mean, for whoever ends up building that [16:55:18] <_joe_> rlazarus: just make sure we need to use yaml for the config server [16:55:21] <_joe_> loads of it [16:55:34] sorry, you're breaking up, did you say protobufs? [16:55:36] <_joe_> bonus points if you make a 1:1 mapping with protobufs [16:55:39] <_joe_> ahahah [16:55:39] sdjfahldksj [16:57:39] Hi, I'm having trouble with a (new?) "UNSTABLE" jenkins result https://integration.wikimedia.org/ci/job/wmf-quibble-vendor-mysql-php72-docker/28970/ -- the test report claims that one test failed, but the log in the console is completely green. Any ideas what is going on? [16:58:03] <_joe_> oh also make any command line utilities conform to the UX guidelines we adopted in SRE https://en.wikipedia.org/wiki/Necronomicon [16:58:26] <_joe_> micgro42: you might find more help in #wikimedia-releng [16:58:59] thanks, I'll ask there [17:00:04] gehel and onimisionipe: Time to snap out of that daydream and deploy Wikidata Query Service weekly deploy. Get on with it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1700). [17:01:17] jouncebot: deployment delayed for the moment [17:02:59] (03CR) 10Marostegui: [C: 03+1] "> Better?" 
[puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [17:08:04] 10Operations, 10ORES, 10Scoring-platform-team (Current): ORES uwsgi consumes a large amount of memory and CPU when shutting down (as part of a restart) - https://phabricator.wikimedia.org/T242705 (10Halfak) [17:09:55] 10Operations, 10ops-eqiad, 10Analytics: (Need by: TBD) rack/setup/install kafka-jumbo100[789].eqiad.wmnet - https://phabricator.wikimedia.org/T244506 (10elukey) >>! In T244506#6022851, @Cmjohnson wrote: > These are failing during install. @elukey can you verify the raid configuration please > > Failed to... [17:11:52] (03PS3) 10Jforrester: Drop fallback support for wgMobileFrontendLogo [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584734 (https://phabricator.wikimedia.org/T248500) [17:17:35] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:18:28] RECOVERY - Check no envoy runtime configuration is left persistent on wtp1025 is OK: HTTP OK: HTTP/1.1 200 OK - 286 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [17:18:35] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. 
https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [17:19:57] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [17:21:11] (03PS1) 10Jforrester: mobile: Stop reading wmgMobileFrontend and wmgMinervaNeue, always true [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586403 [17:21:13] (03PS1) 10Jforrester: Stop defining wmgMobileFrontend and wmgMinervaNeue, unread [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586404 [17:21:15] (03PS1) 10Jforrester: Move mobile-labs into CommonSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586405 [17:21:17] (03PS1) 10Jforrester: Move mobile into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586406 [17:24:15] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22405 bytes in 0.271 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [17:32:26] (03PS1) 10Arturo Borrero Gonzalez: 57.15.185.in-addr.arpa: drop zone [dns] - 10https://gerrit.wikimedia.org/r/586409 (https://phabricator.wikimedia.org/T247972) [17:34:38] 10Operations, 10Anti-Harassment, 10SRE-Access-Requests, 10Patch-For-Review: Requesting access to analytics-privatedata-users for tchanders, dmaza, dbarratt, wikigit - https://phabricator.wikimedia.org/T249059 (10Nuria) >https://wikitech.wikimedia.org/wiki/MariaDB#Testing_servers Nice, +1 to this idea. 
[17:34:58] 10Operations, 10Traffic, 10Availability (MediaWiki-MultiDC), 10Performance-Team (Radar): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Krinkle) [17:35:46] (03PS2) 10Jforrester: Move mobile-labs into CommonSettings-labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586405 [17:35:48] (03PS2) 10Jforrester: Move mobile into CommonSettings [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586406 [17:39:08] PROBLEM - Maps tiles generation on icinga1001 is CRITICAL: CRITICAL: 100.00% of data under the critical threshold [5.0] https://wikitech.wikimedia.org/wiki/Maps/Runbook https://grafana.wikimedia.org/dashboard/db/maps-performances?panelId=8&fullscreen&orgId=1 [17:43:08] (03CR) 10Arturo Borrero Gonzalez: [C: 03+2] 57.15.185.in-addr.arpa: drop zone [dns] - 10https://gerrit.wikimedia.org/r/586409 (https://phabricator.wikimedia.org/T247972) (owner: 10Arturo Borrero Gonzalez) [17:47:25] (03CR) 10Elukey: [C: 03+2] Add imagelinks table to tables sqooped on HDFS [puppet] - 10https://gerrit.wikimedia.org/r/585292 (https://phabricator.wikimedia.org/T249113) (owner: 10Joal) [17:48:25] (03CR) 10MSantos: "> Patch Set 2:" [puppet] - 10https://gerrit.wikimedia.org/r/586372 (https://phabricator.wikimedia.org/T248858) (owner: 10MSantos) [18:00:04] RoanKattouw, Niharika, and Urbanecm: #bothumor I � Unicode. All rise for Morning SWAT(Max 6 patches) deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T1800). [18:00:04] nn1l2: A patch you scheduled for Morning SWAT(Max 6 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. 
[18:00:26] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10Nuria) I think @Aklapper is going to need kerberos cause the work @srishakatux was doing requires access to hadoop [18:03:16] nn1l2: I can SWAT today! [18:03:55] Thanks! [18:04:50] (03PS4) 10Urbanecm: Enable Local upload on azbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584913 (owner: 104nn1l2) [18:05:21] nn1l2: if not already done, could you please install https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug#Browser_extensions to your browser? I'll shortly ask you to verify your change at a debug server [18:05:33] (03CR) 10Urbanecm: [C: 03+2] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584913 (owner: 104nn1l2) [18:06:34] (03Merged) 10jenkins-bot: Enable Local upload on azbwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/584913 (owner: 104nn1l2) [18:06:42] Good luck nn1l2, you’re in safe hands with Urbanecm [18:07:04] thanks RhinosF1 :) [18:07:23] nn1l2: could you have a look if that works as expected at mwdebug1001, please? [18:08:03] Installed. This is my first time! [18:08:15] What should I do? [18:08:35] nn1l2: the extension should add an icon to your browser (usually, right to the URL address bar) [18:08:52] nn1l2: click the extension and turn it on, select mwdebug1001, you should then be able to test [18:09:47] (03CR) 10Dzahn: "https://www.wikidata.org/w/index.php?title=Q89575308&type=revision&diff=1151624665&oldid=1151497225" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/574184 (https://phabricator.wikimedia.org/T245911) (owner: 10MarcoAurelio) [18:09:56] nn1l2: I totally understand it's first time - let me know if you have any questions [18:10:15] Correct, turned on. What test should I do? 
[18:10:39] nn1l2: You should try to upload a file, in this case [18:10:55] just try to verify the change works as you (its author) intended [18:11:20] gehel: what happened to WDQS deploy? I had two patches waiting [18:11:44] dcausse: ^ remember the tasks [18:12:17] RhinosF1: yep, it's in my list. We're planning to do it in ~50' [18:12:25] gehel: cool [18:12:50] to be honest, a bunch of things have come up, we'll at least prepare the release, but we might delay the actual deployment until tomorrow, sorry :/ [18:13:23] gehel: fine by me, dcausse had them ready, I just put the patch ready [18:13:59] RhinosF1: I'll ping you here as soon as I know more! [18:14:07] Thanks! [18:15:25] PROBLEM - PHP opcache health on mwdebug1001 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [18:16:19] Urbanecm: what happened to nn1|2 [18:16:25] I don't know [18:16:43] syncing that anyways, it's blocked on non-existence of MediaWiki:Licenses, when i temp-created it, it worked [18:16:52] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 335a924: Enable Local upload on azbwiki (T248971) (duration: 00m 59s) [18:16:56] Cool [18:16:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:16:58] T248971: Enable local upload on azbwiki - https://phabricator.wikimedia.org/T248971 [18:17:15] RECOVERY - PHP opcache health on mwdebug1001 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [18:18:56] !log urbanecm@deploy1001 Synchronized wmf-config/InitialiseSettings.php: SWAT: 335a924: Enable Local upload on azbwiki (T248971; take II) (duration: 00m 58s) [18:19:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:19:20] nn1l2: I see you had an outage or something. I've tested that on your behalf, it seems to work. Enjoy the change! 
[18:20:05] Uploading files to this wiki is not enabled. Please upload to Wikimedia Commons.To be able to use this special page to upload to this wiki, an administrator needs to add one or more license options to the page MediaWiki:Licenses.Use the following format: * Template name|Label. Use any text to enable uploading without license options.Return to [18:20:06] آنا صفحه. [18:20:26] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10jcrespo) a:05Aklapper→03MoritzMuehlenhoff [18:20:36] This is what I see [18:20:58] 10Operations, 10SRE-Access-Requests, 10Developer-Advocacy (Apr-Jun 2020), 10Patch-For-Review: Add aklapper to analytics-privatedata-users - https://phabricator.wikimedia.org/T248905 (10jcrespo) ^See last comment. [18:21:00] (03PS2) 10Krinkle: reverse-proxy: Disable rebound purges [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586390 (https://phabricator.wikimedia.org/T249325) (owner: 10CDanis) [18:21:03] nn1l2: yes. To be able to use file upload, you (an admin) need to first specify licenses that needs to be used. [18:21:06] *can be [18:21:25] Not an admin on azb. Not even a speaker. [18:21:29] Thanks! [18:21:38] I will take care of that! [18:21:50] nn1l2: Cool, thanks! 
[18:21:54] (03CR) 10Krinkle: "LGTM, but this should not happen until after we confirm that the max-lag concern is indeed correctly addressed by the other mitigation we " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586390 (https://phabricator.wikimedia.org/T249325) (owner: 10CDanis) [18:22:07] !log Morning SWAT done [18:22:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:29:54] (03CR) 10Bstorm: "> Patch Set 3: Code-Review+1" [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [18:31:48] (03CR) 10Bstorm: "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [18:32:11] (03CR) 10Bstorm: [C: 03+2] wikireplicas: point wb_terms_no_longer_updated and wb_terms correctly [puppet] - 10https://gerrit.wikimedia.org/r/586384 (https://phabricator.wikimedia.org/T248592) (owner: 10Bstorm) [18:34:29] (03CR) 1020after4: ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [18:42:41] !log elukey@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [18:42:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:42:45] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:43:13] (03CR) 10Dzahn: ATS/phabricator: directly talk wss:// to aphlict (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: 10Dzahn) [18:48:07] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3056 is OK: HTTP OK: HTTP/1.0 200 OK - 22397 bytes in 0.272 second response time 
https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [18:51:53] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [18:51:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:55:46] (03CR) 10RLazarus: [C: 03+2] maintenance: Migrate translationnotifications jobs to periodic_job [puppet] - 10https://gerrit.wikimedia.org/r/585250 (https://phabricator.wikimedia.org/T211250) (owner: 10RLazarus) [18:57:47] !log elukey@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [18:57:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:58:24] elukey: need any help? [18:58:41] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [18:58:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:02] !log elukey@cumin1001 END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) [19:00:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:00:37] gehel: hey sorry, I was about to ping the chan, the categories transfer seems to not work [19:00:50] I was checking the extended log and one of the command fails [19:00:56] meeting time, can we check tomorrow? 
[19:01:07] sure sure, will retry tomorrow morning :) [19:02:33] gehel: I think I may know why it failed, retrying one last time [19:03:09] !log elukey@cumin1001 START - Cookbook sre.wdqs.data-transfer [19:03:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:03:42] yessss [19:03:54] there was a process bound to the nc port on 2007 [19:04:19] categories.jnl is being copied [19:05:39] !log elukey@cumin1001 END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) [19:05:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:06:38] gehel: \o/ [19:08:20] 10Operations, 10Performance-Team, 10Traffic, 10Wikimedia-Site-requests, and 2 others: Remove "Cache-control: no-cache" hack from wmf-config - https://phabricator.wikimedia.org/T247783 (10Krinkle) MediaWiki's default is already to not cache by default. Any response it produces through normal means sets Cach... [19:09:08] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Framawiki) > The titles will be by user "Maintenance scrip... [19:10:27] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10DannyS712) >>! In T219279#6012187, @holger.knust wrote: >... [19:11:41] (03CR) 10WMDE-leszek: [C: 03+1] Remove wmgWikibase(Repo/Client)Repositories fot test sites [mediawiki-config] - 10https://gerrit.wikimedia.org/r/586368 (https://phabricator.wikimedia.org/T248664) (owner: 10Addshore) [19:13:59] (03CR) 10C. Scott Ananian: "Did this break officewiki? 
Timing seems suspicious:" [puppet] - 10https://gerrit.wikimedia.org/r/586354 (https://phabricator.wikimedia.org/T247389) (owner: 10Giuseppe Lavagetto) [19:14:12] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Anomie) >>! In T219279#6033667, @DannyS712 wrote: > Since... [19:14:17] 10Operations: VE and Flow fails with "Error contacting the Parsoid/RESTBase server (HTTP 404)" on officewiki - https://phabricator.wikimedia.org/T249535 (10Framawiki) [19:14:34] 10Operations: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" on officewiki - https://phabricator.wikimedia.org/T249535 (10Framawiki) [19:14:37] 10Operations, 10MediaWiki-General, 10serviceops, 10Patch-For-Review, 10Service-Architecture: Use envoy for TLS termination on the appservers - https://phabricator.wikimedia.org/T247389 (10cscott) This might have broken officewiki: T249535 [19:14:59] _joe_: any chance you caused VE to break on officewiki? [19:15:22] cf T247389 and T249535 [19:15:22] T247389: Use envoy for TLS termination on the appservers - https://phabricator.wikimedia.org/T247389 [19:15:23] T249535: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" on officewiki - https://phabricator.wikimedia.org/T249535 [19:15:26] <_joe_> cscott: uh it seems not very probable [19:15:44] it's the only conf change that has happened on the parsoid cluster today, as far as i can tell [19:15:50] i'm just looking at the timing [19:15:58] <_joe_> is that just officewiki or all wikis? 
[19:16:46] just officewiki afaik [19:17:00] <_joe_> that's very unprobable then [19:17:15] <_joe_> unless we somehow closed a loophole we didn't know of [19:17:23] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:17:25] cscott: _joe_: VE is getting the "Wikimedia Error" pages as a response from Parsoid. this seems like it might be important [19:17:30] that "unless" is what i'm looking at [19:17:39] <_joe_> MatmaRex: on any wiki? [19:17:53] i don't know, but definitely on officewiki [19:18:15] <_joe_> can someone test another wiki please? if it's just officewiki this might not be related to my change [19:18:27] check the "response" field here: https://logstash-next.wikimedia.org/app/kibana#/doc/2d891220-161a-11ea-a364-c747e6d6cfc2/logstash-mediawiki-2020.04.06?id=rRrgUHEBxWmzajXK0QU1 [19:18:30] <_joe_> anyways, lemme get to my computer [19:18:31] enwiki is fine. [19:18:32] (i hope that link works for you) [19:18:47] 10Operations, 10Cloud-Services, 10Traffic: Requests to production are sometimes timing out or giving empty response - https://phabricator.wikimedia.org/T249035 (10MusikAnimal) The timeouts are still happening consistently, and my bots are still complaining, too 🙁 ([[ https://en.wikipedia.org/w/index.php?titl... 
[19:19:06] (CR) Kosta Harlan: [C: +1] Enable GrowthExperiments welcome survey on Ukrainian, Hungarian, Armenian Wikipedias [mediawiki-config] - https://gerrit.wikimedia.org/r/584135 (https://phabricator.wikimedia.org/T238295) (owner: Gergő Tisza) [19:19:47] en.beta is fine [19:19:53] <_joe_> [19:19:57] <_joe_> oh [19:20:10] in other cases, we're getting: {"messageTranslations":{"en":"The requested relative path (/office.wikimedia.org/v3/page/html/Homepage/268268) did not match any known handler"},"httpCode":404,"httpReason":"Not Found"} [19:20:11] <_joe_> sigh, yes, we must have something that doesn't set the content-length header [19:20:27] https://logstash-next.wikimedia.org/app/kibana#/doc/2d891220-161a-11ea-a364-c747e6d6cfc2/logstash-mediawiki-2020.04.06?id=CBngUHEBxWmzajXKcePI [19:20:29] <_joe_> MatmaRex: that seems restbase though [19:20:37] i think officewiki is special in that it either bypasses restbase or does some other special non-caching thing because it's a private wiki [19:20:39] yeah. there seem to be two problems [19:20:53] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3052 is OK: HTTP OK: HTTP/1.0 200 OK - 22402 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [19:20:58] i forget exactly how it is set up, but it is definitely a unique and special flower [19:21:13] well, as unique and special as any of the private wikis are [19:21:40] <_joe_> yes, something like that [19:21:41] _joe_: VE code that sends these requests does not set the 'content-length' header explicitly. should it do that? (or is it supposed to happen automatically somewhere?) [19:22:06] <_joe_> MatmaRex: I guess it should happen automatically, but let me revert first, ask questions later [19:22:21] <_joe_> the 404 might not be related to my switch, but the 411 surely is [19:22:24] _joe_: it's late for you, want me to take over the revert? 
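Editorial sketch of the 411 being discussed above: HTTP status 411 (Length Required) is what a server may return when a request carries a body but declares no length. The Python below is a minimal illustration under stated assumptions, not the actual logic of envoy, nginx, ATS, or VisualEditor. The function names `status_for_request` and `with_content_length` are hypothetical, and treating only POST, PUT, and PATCH as body-carrying methods is a simplification.

```python
# Illustrative only: a strict front end that refuses body-carrying requests
# without a declared length, plus a helper that adds the missing header.
# Hypothetical names; this is not how envoy or nginx are implemented.

METHODS_WITH_BODY = {"POST", "PUT", "PATCH"}  # simplifying assumption


def status_for_request(method, headers):
    """Return 411 if a body-carrying request declares no length, else 200."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if method.upper() not in METHODS_WITH_BODY:
        return 200
    if "content-length" in lowered:
        return 200
    # A chunked body carries its own framing, so no Content-Length is needed.
    if "chunked" in lowered.get("transfer-encoding", "").lower():
        return 200
    return 411


def with_content_length(headers, body):
    """Return a copy of headers with Content-Length set from the body bytes."""
    fixed = {k: v for k, v in headers.items() if k.lower() != "content-length"}
    fixed["Content-Length"] = str(len(body))
    return fixed
```

A client that calls `with_content_length` before sending, or a proxy that does the equivalent on its behalf, would avoid the 411 in this toy model. Which component should set the header in production is exactly the open question in the conversation.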
[19:22:53] <_joe_> rlazarus: it's going to be tricky. will probably need a lot of coordination, so let's do it together [19:22:58] sgtm [19:23:07] 10Operations: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10matmarex) [19:24:04] <_joe_> I am sorry, I did test VE worked but on itwiki, I saw no significant errors in the logs either [19:24:20] Pchelolo confirms no restbase for private wikis [19:24:42] (03PS1) 10Giuseppe Lavagetto: Revert "parsoid: switch to envoy for TLS termination" [puppet] - 10https://gerrit.wikimedia.org/r/586423 (https://phabricator.wikimedia.org/T249535) [19:24:44] so the special thing here is that restbase *isn't* in the middle on officewiki [19:25:22] <_joe_> rlazarus: so first thing, I'm disabling puppet across all parsoid [19:25:24] (03CR) 10Herron: [C: 03+2] kibana: move httpd proxy authentication to a separate profile [puppet] - 10https://gerrit.wikimedia.org/r/583414 (https://phabricator.wikimedia.org/T246961) (owner: 10Mstyles) [19:25:40] <_joe_> then I think I need to first make sure envoy stops listening on 443, then run puppet [19:25:56] <_joe_> so I'm going to test it first on one server, the procedure should be [19:27:17] <_joe_> depool; rm /etc/envoy/listeners.d/00-tls_terminator_443.yaml; build-envoy-config -c /etc/envoy/; systemctl reload envoyproxy.service; puppet agent -tv; pool if all goes well [19:27:20] <_joe_> sigh [19:27:41] (03CR) 10Giuseppe Lavagetto: [C: 03+2] Revert "parsoid: switch to envoy for TLS termination" [puppet] - 10https://gerrit.wikimedia.org/r/586423 (https://phabricator.wikimedia.org/T249535) (owner: 10Giuseppe Lavagetto) [19:27:51] ack, makes sense [19:28:11] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - 
https://phabricator.wikimedia.org/T219279 (DannyS712) >>! In T219279#6033694, @Anomie wrote: >>>! In... [19:28:59] just as a quick sanity check: officewiki isn't *that* important. if the revert is particularly tricky, it might make more sense just to disable parsoid/flow on officewiki for a few hours while we 'fix' things. [19:29:13] just double-checking that the cure isn't worse than the disease. officewiki isn't a public wiki. [19:29:31] and as far as I understand this, it is only private wikis which are affected. [19:29:46] RhinosF1: we're running late on preparing this deployment [19:29:52] <_joe_> cscott: still, better to revert now and I find root cause tomorrow morning [19:30:01] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (DannyS712) >>! In T219279#6033659, @Framawiki wrote: >> Th... [19:30:20] https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L17274 [19:30:20] RhinosF1: we're actually taking this occasion to introduce a new team member to the release process, and as always, the first time you do something it takes longer [19:30:25] <_joe_> uhm puppet is failing [19:30:27] We'll resume this tomorrow [19:31:01] gehel: no problem, just watched BBC’s emergency news bulletin [19:31:20] RhinosF1: do I want to know? [19:31:30] cscott, on officewiki, Parsoid is used for previews for the source editor. [19:31:46] 2017 source editor is the default there. [19:31:52] gehel: Boris has gone into ICU @ St Thomas’ hospital [19:32:10] subbu: i'm just suggesting that anything which only affects the WMF internal wiki is somewhat lower priority [19:32:12] ouch, things are getting real! [19:32:38] cscott, understood. 
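Editorial sketch of the per-host procedure _joe_ pasted at 19:27:17 (depool, remove the listener file, rebuild the config, reload, run puppet, "pool if all goes well"). The Python below only models the control flow: every step is a caller-supplied callable, nothing here runs real commands, and the names `roll_back_host` and `rolling_rollback` are hypothetical. Stopping the whole run at the first failing host is an assumption made for the sketch, not something stated in the log.

```python
# Illustrative control flow for a one-host-at-a-time rollback.
# All callables are supplied by the caller; no real commands are executed.


def roll_back_host(host, steps, depool, pool):
    """Depool the host, run each step in order, repool only if all succeed.

    Each step is a callable taking the host name and returning True on
    success. Returns True when the host was repooled.
    """
    depool(host)
    for step in steps:
        if not step(host):
            # Leave the host depooled so a human can inspect it.
            return False
    pool(host)
    return True


def rolling_rollback(hosts, steps, depool, pool):
    """Process hosts sequentially and stop at the first failure.

    Returns the list of hosts that were rolled back and repooled.
    """
    done = []
    for host in hosts:
        if not roll_back_host(host, steps, depool, pool):
            break
        done.append(host)
    return done
```

Keeping a failed host depooled trades a little capacity for safety: traffic never reaches a server whose configuration is in an unknown state.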
[19:33:13] subbu: and the revert _joe_ is working on actually touches all the configuration of the production public wikis as I understand it. so the usual calculus of 'revert first' might bear thinking through. [19:34:33] but i think i'm too slow and _joe_ is actually fixing things while i'm still thinking through the issues. ;) [19:35:12] <_joe_> cscott: I'm trying to revert my change quickly [19:35:17] <_joe_> let's see if I pull it off [19:35:34] noted. i'll let _joe_ and sre make that call. [19:37:35] (03PS1) 10Jhedden: openstack: update haproxy healthchecks for openstack services [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) [19:37:37] (03CR) 10jerkins-bot: [V: 04-1] openstack: update haproxy healthchecks for openstack services [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: 10Jhedden) [19:38:58] <_joe_> rlazarus: can you check the status of wtp1025 in pybal? lvs1015 [19:39:02] looking [19:39:50] (03PS2) 10Jhedden: openstack: update haproxy healthchecks for openstack services [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) [19:40:59] PROBLEM - Prometheus jobs reduced availability on icinga1001 is CRITICAL: job=swagger_check_restbase_esams site=esams https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:41:49] 10Operations, 10MediaWiki-Cache, 10Page Content Service, 10Product-Infrastructure-Team-Backlog, and 4 others: cache_text cluster consistently backlogged on purge requests - https://phabricator.wikimedia.org/T249325 (10Krinkle) [19:41:52] _joe_: can't find my notes, remind me where to look? 
[19:41:53] 10Operations, 10Traffic, 10Availability (MediaWiki-MultiDC), 10Patch-For-Review, 10Performance-Team (Radar): Make CDN purges reliable - https://phabricator.wikimedia.org/T133821 (10Krinkle) [19:42:06] 10Operations, 10Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10cscott) >>! In T249535#6033680, @Bugreporter wrote: > Note I tag it as Unbreak Now because of {T249533} which may or may... [19:42:28] _joe_: conftool has it {"weight": 10, "pooled": "yes"} [19:42:54] for service=parsoid{,-php} and same but weight 1 for service=canary [19:43:10] 10Operations, 10Cloud-Services, 10Traffic, 10Wikimedia-Incident: Requests to production are sometimes timing out or giving empty response - https://phabricator.wikimedia.org/T249035 (10Krinkle) [19:43:42] (03CR) 10Jhedden: "I've verified these new checks on all Queens and Rocky OpenStack services. PCC results: https://puppet-compiler.wmflabs.org/compiler1003/2" [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: 10Jhedden) [19:44:39] RECOVERY - Prometheus jobs reduced availability on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Prometheus%23Prometheus_job_unavailable https://grafana.wikimedia.org/d/NEJu05xZz/prometheus-targets [19:45:02] (03CR) 10jerkins-bot: [V: 04-1] openstack: update haproxy healthchecks for openstack services [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: 10Jhedden) [19:45:33] 10Operations, 10Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10cscott) The change at fault appears to be 8e9f967d543721b16ee51fc3772976c8963440ae, which I flagged as possibly suspicio... 
[19:46:00] <_joe_> rlazarus: you can also check, from lvs1015 itself, the pybal admin interface that will tell you the status of the server according to pybal [19:46:16] (PS3) Jhedden: openstack: update haproxy healthchecks for openstack services [puppet] - https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) [19:46:18] (CR) jerkins-bot: [V: -1] openstack: update haproxy healthchecks for openstack services [puppet] - https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: Jhedden) [19:46:22] <_joe_> curl -sL localhost:9090/pools/parsoid-php_443 [19:47:10] ack thanks -- thought I remembered something like that but I couldn't find it in wikitech, made a note and we can follow up on that tomorrow [19:47:35] (PS4) Jhedden: openstack: update haproxy healthchecks for openstack services [puppet] - https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) [19:48:21] <_joe_> rlazarus: so right now what I am going to do is the following cumin sauce [19:48:27] <_joe_> cumin -m async -b1 'A:parsoid and not P{wtp1025*} and A:eqiad' depool 'rm -rf /etc/envoy/listeners.d/00-tls_terminator_443.yaml' 'build-envoy-config -c /etc/envoy/ [19:48:29] <_joe_> ' 'systemctl restart envoyproxy.service' 'run-puppet-agent -e "switching to nginx --joe"' pool [19:48:44] okay, got there: wtp1025.eqiad.wmnet: enabled/up/pooled [19:48:45] <_joe_> heh, minus the paste fail [19:48:58] <_joe_> and at the same time, keep an eye on that curl [19:49:20] <_joe_> watch -n 5 curl -sL localhost:9090/pools/parsoid-php_443 :P [19:49:42] haha I just started that running, but with -n 2 :P [19:49:57] Operations, Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (Framawiki) Same for VE on otrswiki. 
https://otrs-wiki.wikimedia.org/w/api.php?action=visualeditor&format=json&paction=pa... [19:50:15] ^ another private wiki fyi [19:50:20] <_joe_> you will see the servers being first disabled, then going down, then coming back up [19:50:38] <_joe_> Framawiki: yeah we're fixing it now, I honestly have no idea why it went like this [19:51:01] <_joe_> what on earth are we doing differently that specifically breaks parsoid [19:51:16] <_joe_> I mean the 411 I understand, but the 404s... not really [19:51:41] PROBLEM - PHP opcache health on mw2330 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [19:52:16] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10Anomie) >>! In T219279#6033762, @DannyS712 wrote: > I assu... [19:53:23] <_joe_> cscott, Framawiki the rollback is happening; I'm going slower than one would do if the breakage was larger [19:53:41] <_joe_> but in 20 minutes or so it should be completed [19:54:29] 10Operations, 10MediaWiki-General, 10serviceops, 10Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (10DannyS712) >>! In T219279#6033847, @Anomie wrote: >>>! In... [19:55:57] <_joe_> ok I think actually nginx was fixing a problem for us, and envoy is more neutral in proxying requests [19:56:22] in terms of missing content-length header you mean? [19:56:27] <_joe_> yes [19:56:28] _joe_: the difference in config is that restbase isn't in the middle [19:56:51] <_joe_> cscott: restbase surely fixes the absence of the content-length header if it's a post request [19:57:01] <_joe_> rlazarus: this is going to be fun to debug! 
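Editorial sketch for reading the pool status that was being watched with curl during the rollback. The only output format shown in the log is the single line rlazarus quoted, `wtp1025.eqiad.wmnet: enabled/up/pooled`, so the Python below assumes every line has that `host: a/b/c` shape. The real admin interface may print more or differently. The names `parse_pool_status` and `not_serving` are hypothetical, and any state other than the quoted healthy triple is treated as opaque text.

```python
# Illustrative parser for lines shaped like "host: enabled/up/pooled".
# The format is inferred from one line quoted in the log; treat it as a guess.

HEALTHY = ("enabled", "up", "pooled")


def parse_pool_status(text):
    """Map each host name to its three slash-separated state fields."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        host, sep, flags = line.partition(": ")
        parts = tuple(flags.split("/"))
        if not sep or len(parts) != 3:
            continue  # skip blank lines and anything in an unexpected shape
        status[host] = parts
    return status


def not_serving(status):
    """Return the sorted hosts whose state differs from the healthy triple."""
    return sorted(host for host, flags in status.items() if flags != HEALTHY)
```

During a rolling change, an empty result from `not_serving` after each batch would be the signal that the previous host came back before the next one is touched.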
[19:57:06] :D [19:57:24] <_joe_> the error definitely seems to come from the backend and not envoy itself [19:58:33] <_joe_> oh no scratch that [19:58:45] <_joe_> we're calling something via the caching layer [19:58:56] <_joe_> so the error page is actually the one from ATS [19:59:07] <_joe_> I'm looking at the logs MatmaRex pointed to [19:59:18] <_joe_> that encapsulates a response from envoy [20:00:04] halfak and accraze: My dear minions, it's time we take the moon! Just kidding. Time for Services – Graphoid / Citoid / ORES deploy. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T2000). [20:01:23] <_joe_> cscott: I just saved a null edit via VE in officewiki [20:01:25] _joe_, ok, looks fixed on officewiki .. i could open a page, edit and review diffs /cc cscott [20:01:39] <_joe_> but it will take more time to complete though [20:02:00] <_joe_> we're at less than 50% of the cluster done [20:02:40] thanks _joe_ ... i suppose we'll need to file phab tasks for followups for fixing the root issue ... incident report required? [20:02:55] <_joe_> subbu: yes I think so, but we will take care of it [20:03:01] ok, ty. [20:03:04] <_joe_> we also need to understand the root cause better [20:03:36] i think we (both parsing and ops) should have a list of "special cases" in VE/Restbase/Parsoid config [20:03:40] (CR) Andrew Bogott: [C: +1] "nifty" [puppet] - https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: Jhedden) [20:04:18] not just group0/1/2 but also officewiki (as representative of private wikis) and beta/labs. [20:04:20] <_joe_> cscott: yeah I also know about it but I just didn't remember to test a private wiki too [20:04:57] it occurred to us that none of us had tested officewiki recently either. i couldn't immediately even remember if it was expected to work or if it was in the same "known broken" category as wikitech. 
[20:05:13] so we can put that in the "docs required" part of the incident report [20:05:18] (03CR) 10Jhedden: [C: 03+2] openstack: update haproxy healthchecks for openstack services [puppet] - 10https://gerrit.wikimedia.org/r/586424 (https://phabricator.wikimedia.org/T249453) (owner: 10Jhedden) [20:06:13] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [20:06:41] RECOVERY - PHP opcache health on mw2330 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [20:07:17] subbu, _joe_: i'm going to take a home-schooling break. ok for _joe_ to ping subbu and/or ping me on a non-IRC channel (hangouts, phone, etc) if anything urgent comes up? [20:07:33] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3062 is OK: HTTP OK: HTTP/1.0 200 OK - 22402 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [20:07:37] <_joe_> sure but I think we're all set [20:07:46] great, thanks! [20:07:49] cscott, sure. thanks. 
[20:08:02] * cscott keeps his fingers crossed, just to be safe [20:09:09] (03PS1) 10Papaul: DNS: Add mgmt and production DNS for backup2002 [dns] - 10https://gerrit.wikimedia.org/r/586431 [20:10:00] 10Operations, 10Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10Joe) a:03Joe [20:11:08] (03CR) 10Papaul: [C: 03+2] DNS: Add mgmt and production DNS for backup2002 [dns] - 10https://gerrit.wikimedia.org/r/586431 (owner: 10Papaul) [20:12:07] 10Operations, 10ops-codfw, 10DBA, 10Patch-For-Review: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (10Papaul) [20:13:17] <_joe_> Framawiki: still having problems on otrswiki? [20:17:43] 10Operations, 10Patch-For-Review: VE and Flow fail with "Error contacting the Parsoid/RESTBase server (HTTP 404)" / "…(HTTP 411)" on officewiki - https://phabricator.wikimedia.org/T249535 (10Joe) 05Open→03Resolved The rollback of today's change is almost completed. I will take the time tomorrow to try to u... [20:18:55] <_joe_> subbu: the problem should be over, the rollback is completed, at least in eqiad [20:20:04] great. 
[20:20:36] (03PS1) 10Papaul: ADD backup2002 to role spare [puppet] - 10https://gerrit.wikimedia.org/r/586435 [20:20:57] <_joe_> (now running in codfw, where there is no active traffic) [20:22:00] (03CR) 10jerkins-bot: [V: 04-1] ADD backup2002 to role spare [puppet] - 10https://gerrit.wikimedia.org/r/586435 (owner: 10Papaul) [20:24:26] (03PS2) 10Papaul: ADD backup2002 to role spare [puppet] - 10https://gerrit.wikimedia.org/r/586435 (https://phabricator.wikimedia.org/T238601) [20:25:27] (03CR) 10jerkins-bot: [V: 04-1] ADD backup2002 to role spare [puppet] - 10https://gerrit.wikimedia.org/r/586435 (https://phabricator.wikimedia.org/T238601) (owner: 10Papaul) [20:26:42] PROBLEM - Check no envoy runtime configuration is left persistent on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string entries: {} not found on http://localhost:9631/runtime - 392 bytes in 0.001 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [20:27:19] (03PS1) 10Jhedden: openstack: codfw1dev update neutron haproxy config [puppet] - 10https://gerrit.wikimedia.org/r/586437 (https://phabricator.wikimedia.org/T249453) [20:27:55] I just noticed https://gerrit.wikimedia.org/r/plugins/gitiles/labs/tools/wikibugs2/+/master/grrrrit.py#113 [20:29:16] PROBLEM - Widespread puppet agent failures on icinga1001 is CRITICAL: 0.01081 ge 0.01 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [20:30:15] _joe_: are puppet failures related? 
^ [20:30:24] they're on the parsoid cluster [20:30:29] <_joe_> yes [20:30:36] (CR) Jhedden: [C: +2] openstack: codfw1dev update neutron haproxy config [puppet] - https://gerrit.wikimedia.org/r/586437 (https://phabricator.wikimedia.org/T249453) (owner: Jhedden) [20:30:36] yeah just got there too [20:30:38] (PS3) Papaul: ADD backup2002 to role spare [puppet] - https://gerrit.wikimedia.org/r/586435 (https://phabricator.wikimedia.org/T238601) [20:30:39] <_joe_> i think it will solve itself [20:31:03] <_joe_> it's a failed nginx reload, that happens before the restart [20:31:17] '/usr/sbin/service nginx reload' returned 1 instead of one of [0] [20:31:18] nod [20:31:31] <_joe_> yep [20:31:39] <_joe_> so in fact it's a false positive [20:33:27] ACKNOWLEDGEMENT - Check no envoy runtime configuration is left persistent on mw1276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string entries: {} not found on http://localhost:9631/runtime - 392 bytes in 0.001 second response time Giuseppe Lavagetto Monitoring errors from envoy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23Envoy [20:34:44] <_joe_> revert done [20:35:08] _joe_: so regarding opcache/l10n, how do you want to proceed? hiera override in puppet for higher opcache and then depool and enable in wmf-config? [20:35:46] <_joe_> something like that, yes [20:35:58] _joe_: eqiad or codfw? [20:36:40] <_joe_> Krinkle: it makes no difference for my tests, so I'd pick codfw [20:36:44] <_joe_> not a debug server though [20:36:54] ok [20:37:01] I'll patch some things tomorrow and let you know. 
[20:39:41] (PS1) Ppchelko: ChangeProp: add more metrics and deploy the latest code [deployment-charts] - https://gerrit.wikimedia.org/r/586439 (https://phabricator.wikimedia.org/T248677) [20:42:12] <_joe_> ack, thanks :) [20:42:55] (PS2) Jeena Huneidi: mediawiki-dev: Replace phpenmod with configmap to enable xdebug [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) [20:45:03] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (Anomie) >>! In T219279#6033852, @DannyS712 wrote: > But, w... [20:50:54] Operations, Traffic, good first task: Only retry failed requests for external traffic on cache frontends - https://phabricator.wikimedia.org/T249317 (srishakatux) @ema Hello! As this task is tagged as a #good_first_task, I'm wondering if it can be made clear where exactly the code needs to be changed... [21:00:04] Reedy and sbassett: Your horoscope predicts another unfortunate Weekly Security deployment window deploy. May Zuul be (nice) with you. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T2100). [21:05:45] Operations, MediaWiki-General, serviceops, Core Platform Team Workboards (Clinic Duty Team), and 4 others: Some pages will become completely unreachable after PHP7 update due to Unicode changes - https://phabricator.wikimedia.org/T219279 (DannyS712) >>! In T219279#6034047, @Anomie wrote: >>>! In... 
[21:08:59] RECOVERY - Widespread puppet agent failures on icinga1001 is OK: (C)0.01 ge (W)0.006 ge 0 https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/yOxVDGvWk/puppet [21:10:29] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:10:39] (CR) Jeena Huneidi: "> Patch Set 1: Code-Review-1" (2 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) (owner: Jeena Huneidi) [21:11:56] (CR) Jeena Huneidi: "> Patch Set 1:" (3 comments) [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) (owner: Jeena Huneidi) [21:16:46] (PS1) Andrew Bogott: openstack: keystone: update encoding in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586446 (https://phabricator.wikimedia.org/T249494) [21:17:15] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22401 bytes in 0.296 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:17:44] (PS2) Andrew Bogott: openstack: keystone: update encoding in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586446 (https://phabricator.wikimedia.org/T249494) [21:22:53] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:22:59] (CR) Brennen Bearnes: [C: +2] mediawiki-dev: Replace phpenmod with configmap to enable xdebug [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) (owner: Jeena Huneidi) [21:23:36] (Merged) jenkins-bot: mediawiki-dev: Replace 
phpenmod with configmap to enable xdebug [deployment-charts] - https://gerrit.wikimedia.org/r/584733 (https://phabricator.wikimedia.org/T246921) (owner: Jeena Huneidi) [21:30:45] (CR) Andrew Bogott: [C: +2] openstack: keystone: update encoding in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586446 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [21:32:00] (CR) BryanDavis: [C: +1] "The approach of making everything byte strings was not right based on what we are seeing in LDAP. Krenair did some quick testing with py2 " [puppet] - https://gerrit.wikimedia.org/r/586446 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [21:33:53] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3058 is OK: HTTP OK: HTTP/1.0 200 OK - 22408 bytes in 3.310 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server [21:35:13] (PS1) Andrew Bogott: openstack: keystone: update encoding in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586449 (https://phabricator.wikimedia.org/T249494) [21:36:51] (CR) Andrew Bogott: [C: +2] openstack: keystone: update encoding in our custom LDAP handler [puppet] - https://gerrit.wikimedia.org/r/586449 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:02:10] * Krinkle testing stuff on mwdebug1001 [22:05:59] (PS1) Andrew Bogott: wmfkeystonehooks: encode member names as utf8 [puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) [22:07:41] (PS1) Halfak: Explicitly install both myspell-pt-pt and myspell-pt-br [puppet] - https://gerrit.wikimedia.org/r/586456 [22:08:03] (PS2) Halfak: Explicitly install both myspell-pt-pt and myspell-pt-br [puppet] - https://gerrit.wikimedia.org/r/586456 (https://phabricator.wikimedia.org/T249559) [22:11:10] (PS2) Andrew Bogott: wmfkeystonehooks: encode member names as utf8 
[puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) [22:20:16] (PS3) Andrew Bogott: wmfkeystonehooks: encode member names as utf8 [puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) [22:20:18] (PS1) Andrew Bogott: wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) [22:21:40] (CR) jerkins-bot: [V: -1] wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:26:33] (PS2) Andrew Bogott: wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) [22:29:59] (PS4) BryanDavis: wmfkeystonehooks: encode member names as utf8 [puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:31:46] (PS5) BryanDavis: wmfkeystonehooks: encode member names as utf8 [puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:33:07] (CR) Andrew Bogott: [C: +2] wmfkeystonehooks: encode member names as utf8 [puppet] - https://gerrit.wikimedia.org/r/586454 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:35:07] PROBLEM - Logstash Elasticsearch indexing errors on icinga1001 is CRITICAL: 13.01 ge 8 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash [22:36:03] (PS1) Mstyles: kibana: add kibana to relforge [puppet] - https://gerrit.wikimedia.org/r/586460 (https://phabricator.wikimedia.org/T246961) [22:38:45] RECOVERY - Logstash Elasticsearch indexing 
errors on icinga1001 is OK: (C)8 ge (W)1 ge 0.02917 https://wikitech.wikimedia.org/wiki/Logstash%23Indexing_errors https://logstash.wikimedia.org/goto/1cee1f1b5d4e6c5e06edb3353a2a4b83 https://grafana.wikimedia.org/dashboard/db/logstash [22:41:18] (CR) BryanDavis: wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 (3 comments) [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [22:44:04] (PS6) 20after4: phabricator: remove firewall holes for port 80 [puppet] - https://gerrit.wikimedia.org/r/569100 (owner: Dzahn) [22:44:06] (PS8) 20after4: ATS/phabricator: directly talk wss:// to aphlict [puppet] - https://gerrit.wikimedia.org/r/569104 (https://phabricator.wikimedia.org/T238593) (owner: Dzahn) [22:44:08] (PS1) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [22:45:01] (CR) BryanDavis: openstack: keystone: fix encoding issues in our custom LDAP handler (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586341 (https://phabricator.wikimedia.org/T249494) (owner: Arturo Borrero Gonzalez) [22:45:42] (CR) Mstyles: "looks good from the puppet compiler! https://puppet-compiler.wmflabs.org/compiler1001/21723/" [puppet] - https://gerrit.wikimedia.org/r/586460 (https://phabricator.wikimedia.org/T246961) (owner: Mstyles) [22:46:13] (CR) 20after4: ATS/phabricator: configure aphlict certificate (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [22:47:26] (CR) 20after4: [C: -1] "@dzahn: this can merge whenever you are ready, just amend with the certificate details in hieradata." 
[puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [22:48:48] (CR) jerkins-bot: [V: -1] ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [22:57:04] (PS2) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [23:00:04] RoanKattouw, Niharika, and Urbanecm: Dear deployers, time to do the Evening SWAT(Max 6 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T2300). [23:00:04] No GERRIT patches in the queue for this window AFAICS. [23:00:20] (PS3) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [23:03:00] (PS4) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [23:03:27] (PS3) Andrew Bogott: wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) [23:03:57] just got an error - [Xou1IQpAIDEAAC74T5wAAAOQ] 2020-04-06 23:03:09: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" [23:04:10] Unable to open https://meta.wikimedia.org/wiki/Steward_requests/Global - `[Xou1VgpAMNIAA1mYvDEAAAIS] 2020-04-06 23:03:50: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"` [23:05:10] Getting the same on multiple pages, but only when logged in [23:05:19] (this is on enwiki) [23:05:33] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers 
https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:05:39] * addshore reads up [23:06:46] doesn't seem to be in logstash :/ [23:07:23] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:07:31] Filed T249565 [23:07:31] T249565: Unable to open some pages on multiple wikis - DBQueryError - https://phabricator.wikimedia.org/T249565 [23:07:53] (it is, but without request id apparently) [23:07:56] i see a pretty healthy wave of db errors but looks to have recovered [23:08:09] Table 'wikidatawiki.wb_items_per_site' doesn't exist [23:08:13] Still happening for me [23:08:17] (CR) jerkins-bot: [V: -1] ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [23:08:32] (CR) 20after4: "recheck" [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [23:09:08] I'm getting a db error here https://en.m.wikipedia.org/wiki/Dominic_Raab [23:09:17] (PS5) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [23:09:19] mvolz, known issue, thank you for reporting [23:09:32] gotcha [23:09:43] ping Amir1 if you're around [23:09:47] (CR) 20after4: "The compiled change looks like what I would expect:" [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [23:10:07] (PS4) Papaul: ADD backup2002 to role spare [puppet] - https://gerrit.wikimedia.org/r/586435 (https://phabricator.wikimedia.org/T238601) [23:12:05] addshore: I just pinged him off 
irc [23:12:49] ack [23:12:59] seems like Wikibase repo is triggered on wikis which should not be repos? [23:13:25] PROBLEM - PHP opcache health on mw2331 is CRITICAL: CRITICAL: opcache cache-hit ratio is below 99.85% https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [23:13:42] (CR) Papaul: [C: +2] ADD backup2002 to role spare [puppet] - https://gerrit.wikimedia.org/r/586435 (https://phabricator.wikimedia.org/T238601) (owner: Papaul) [23:14:07] (CR) jerkins-bot: [V: -1] ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) (owner: 20after4) [23:14:26] nvm, I was looking at the wrong trace. It affects both wikidata and the WB client wikis, apparently [23:14:31] I'm around [23:14:36] I'm looking at it [23:14:43] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:15:19] Why now, can we rollback until this is fixed? [23:15:39] Rollback what exactly? [23:15:48] I can't view any wikipedia page right now. [23:15:58] Jdlrobson, known issue under investigation, https://phabricator.wikimedia.org/T249565 [23:16:05] On it [23:16:08] yep saw [23:16:13] There's not been any MW deploys for like 6 hours [23:16:22] o_O [23:16:23] what got deployed? [23:16:28] shit i was hoping it was a train thing [23:16:31] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:16:50] I think it's logged in users only? 
[23:17:07] oh no, replicated for anonymous user too [23:19:23] anyone got a logstash URI handy? [23:19:35] Xou1VgpAMNIAA1mYvDEAAAIS [23:19:45] Or is the URI something else? [23:20:13] https://logstash.wikimedia.org/goto/9f6a1d2eb60269b7d8ce0c8a767c7ce8 [23:20:13] PROBLEM - MediaWiki exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:20:42] amir addshore: was wb_items_per_site table added recently? [23:20:49] if so can we revert the patch that introduced it? [23:21:35] Jdlrobson: There's no deploy in the last 6 hours [23:21:37] So it's not obvious [23:21:41] Happened around here: https://grafana.wikimedia.org/d/000000102/production-logging?orgId=1&from=1586213660588&to=1586214947739&var-datasource=eqiad%20prometheus%2Fops&panelId=18&fullscreen [23:21:57] PROBLEM - PHP7 rendering on wtp2010 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 2.757 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:21:57] PROBLEM - PHP7 rendering on mw2301 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.312 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:21:59] PROBLEM - Too many messages in kafka logging-eqiad on icinga1001 is CRITICAL: cluster=misc exported_cluster=logging-eqiad group={logstash,logstash-codfw,logstash7-codfw} instance=kafkamon1001:9501 job=burrow partition={0,1,2,3,4,5} site=eqiad topic=udp_localhost-err https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasour [23:21:59] 
us/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [23:22:01] PROBLEM - PHP7 rendering on mw2285 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.257 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:01] PROBLEM - PHP7 rendering on wtp2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 1.139 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:03] PROBLEM - PHP7 rendering on mw2302 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.688 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:03] PROBLEM - PHP7 rendering on mw2169 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.815 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:03] PROBLEM - PHP7 rendering on mw2143 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.269 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:03] PROBLEM - PHP7 rendering on mw2147 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.404 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:05] PROBLEM - PHP7 rendering on mw1290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:07] PROBLEM - PHP7 rendering on mw2327 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.055 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:07] PROBLEM - PHP7 rendering on mw2261 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server 
Error - 17283 bytes in 1.925 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:07] PROBLEM - PHP7 rendering on wtp2020 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 2.068 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:07] PROBLEM - PHP7 rendering on mw1365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:09] PROBLEM - PHP7 rendering on mw2315 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17281 bytes in 0.862 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:11] PROBLEM - PHP7 rendering on mw2295 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.411 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:11] PROBLEM - PHP7 rendering on mw2142 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17281 bytes in 0.765 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:11] PROBLEM - PHP7 rendering on mw2145 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.233 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:11] PROBLEM - PHP7 rendering on mw2325 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17281 bytes in 0.721 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2207 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17281 bytes in 1.104 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2211 
is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.332 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2175 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.660 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2176 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.913 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw1289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2168 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.239 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:13] PROBLEM - PHP7 rendering on mw2183 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.463 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:14] PROBLEM - PHP7 rendering on mw2190 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.812 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:14] PROBLEM - PHP7 rendering on mw2194 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.027 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:15] PROBLEM - PHP7 rendering on mw2197 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.379 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:15] PROBLEM - PHP7 rendering on mw2186 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.592 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:16] PROBLEM - PHP7 rendering on mw2195 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.912 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:17] PROBLEM - PHP7 rendering on mw2275 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.506 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:17] PROBLEM - PHP7 rendering on mw2350 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.381 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:17] PROBLEM - PHP7 rendering on mw2306 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.733 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:18] PROBLEM - PHP7 rendering on mw2354 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 1.933 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:18] PROBLEM - PHP7 rendering on mw2208 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.355 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:19] PROBLEM - PHP7 rendering on mw2210 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.497 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:19] PROBLEM - PHP7 rendering on mw1347 
is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:21] PROBLEM - PHP7 rendering on mw2277 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.419 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:21] PROBLEM - PHP7 rendering on mw2202 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.831 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:21] PROBLEM - PHP7 rendering on mw2258 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.088 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:21] PROBLEM - PHP7 rendering on mw2192 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.409 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:22] PROBLEM - PHP7 rendering on mw2206 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.140 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:22] PROBLEM - PHP7 rendering on mw2240 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.573 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:22] And there goes the cached pages. 
[23:22:23] PROBLEM - PHP7 rendering on mw2268 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.808 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:23] PROBLEM - PHP7 rendering on mw2357 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.638 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:24] PROBLEM - PHP7 rendering on wtp2005 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 3.319 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:24] PROBLEM - PHP7 rendering on mw2245 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.684 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:25] PROBLEM - PHP7 rendering on mw2218 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.986 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:25] PROBLEM - PHP7 rendering on mw2283 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.786 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:26] PROBLEM - PHP7 rendering on mw2276 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 2.975 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:26] PROBLEM - PHP7 rendering on mw2171 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 3.406 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:27] PROBLEM - PHP7 rendering on mw2253 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 
3.635 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:27] PROBLEM - PHP7 rendering on mw2219 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.062 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:28] PROBLEM - PHP7 rendering on mw2232 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.301 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:28] PROBLEM - PHP7 rendering on mw2244 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.731 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:29] PROBLEM - PHP7 rendering on mw2226 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 4.989 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:29] PROBLEM - PHP7 rendering on mw2217 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 5.428 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:30] PROBLEM - PHP7 rendering on mw2222 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 5.673 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:30] PROBLEM - PHP7 rendering on mw2229 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 6.113 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:33] PROBLEM - PHP7 rendering on mw1342 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:35] PROBLEM - PHP7 rendering on mw2294 is CRITICAL: HTTP 
CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.081 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:35] PROBLEM - PHP7 rendering on mwdebug2001 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17293 bytes in 9.816 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:37] PROBLEM - PHP7 rendering on mw2361 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.607 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:37] PROBLEM - PHP7 rendering on mw2272 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.819 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:37] PROBLEM - PHP7 rendering on mw2356 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.101 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:39] PROBLEM - PHP7 rendering on mw2311 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.720 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:39] PROBLEM - PHP7 rendering on mw2225 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.228 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:39] PROBLEM - PHP7 rendering on mw2236 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.386 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:41] PROBLEM - PHP7 rendering on mw2220 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.025 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:41] PROBLEM - PHP7 rendering on mw2254 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.290 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:41] PROBLEM - PHP7 rendering on mw2270 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.692 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:41] PROBLEM - PHP7 rendering on mw2230 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.981 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:43] PROBLEM - PHP7 rendering on mw2274 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.548 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:43] PROBLEM - PHP7 rendering on wtp2002 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 8.821 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:43] PROBLEM - PHP7 rendering on mw2297 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.373 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:43] PROBLEM - PHP7 rendering on mw2364 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.705 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:43] PROBLEM - PHP7 rendering on mw2330 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 8.912 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2375 
is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.253 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2362 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.463 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2293 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.121 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2300 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.388 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2373 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.687 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:45] PROBLEM - PHP7 rendering on mw2163 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.993 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:47] PROBLEM - PHP7 rendering on mw2368 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.515 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:47] PROBLEM - PHP7 rendering on mw2262 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.838 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:47] PROBLEM - PHP7 rendering on mw2135 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:47] PROBLEM - PHP7 rendering on mw2200 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:48] PROBLEM - PHP7 rendering on mw2269 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:49] PROBLEM - PHP7 rendering on mw2235 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.682 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:49] PROBLEM - PHP7 rendering on mw2137 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.528 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:49] PROBLEM - PHP7 rendering on mw1390 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:50] PROBLEM - PHP7 rendering on wtp2018 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 9.871 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:50] PROBLEM - PHP7 rendering on mw2290 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:51] PROBLEM - PHP7 rendering on mw2231 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:51] PROBLEM - PHP7 rendering on wtp2011 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:52] PROBLEM - PHP7 rendering on mw2233 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 
bytes in 9.879 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:52] PROBLEM - PHP7 rendering on mw2172 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:53] PROBLEM - PHP7 rendering on mw2224 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:53] PROBLEM - PHP7 rendering on mw2184 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:54] PROBLEM - PHP7 rendering on mw2198 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:54] PROBLEM - PHP7 rendering on mw2286 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:55] PROBLEM - PHP7 rendering on mw2271 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:55] PROBLEM - PHP7 rendering on mw2189 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:56] PROBLEM - PHP7 rendering on mw2289 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:56] PROBLEM - PHP7 rendering on mw2257 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:57] PROBLEM - PHP7 rendering on mw2355 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:59] PROBLEM - PHP7 rendering on mw2370 is 
CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.187 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:59] PROBLEM - PHP7 rendering on mw2216 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.569 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:59] PROBLEM - PHP7 rendering on mw2309 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.413 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:22:59] PROBLEM - PHP7 rendering on wtp2004 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17285 bytes in 9.902 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:01] PROBLEM - PHP7 rendering on mw2209 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.104 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:01] PROBLEM - PHP7 rendering on mw2166 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.230 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:01] PROBLEM - PHP7 rendering on mw2164 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.696 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:01] PROBLEM - PHP7 rendering on mw2173 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.803 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:01] PROBLEM - PHP7 rendering on mw2185 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:02] PROBLEM - PHP7 rendering on mw2193 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2332 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.662 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2292 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2317 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2366 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2353 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:05] PROBLEM - PHP7 rendering on mw2308 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:06] PROBLEM - PHP7 rendering on mw2331 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:06] PROBLEM - PHP7 rendering on mw2320 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:07] PROBLEM - PHP7 rendering on mw2255 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:09] PROBLEM - PHP7 rendering 
on mw2352 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.481 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:09] PROBLEM - PHP7 rendering on mw2367 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.867 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:09] PROBLEM - PHP7 rendering on mw2141 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:09] PROBLEM - PHP7 rendering on wtp2009 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:09] PROBLEM - PHP7 rendering on mw2304 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:11] PROBLEM - PHP7 rendering on mw2179 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.708 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:11] PROBLEM - PHP7 rendering on mw2205 is CRITICAL: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 17283 bytes in 9.981 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:11] PROBLEM - PHP7 rendering on mw2203 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:11] PROBLEM - PHP7 rendering on mw2177 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:12] PROBLEM - PHP7 rendering on mw2214 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering 
[23:23:13] PROBLEM - PHP7 rendering on mw2374 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:15] PROBLEM - PHP7 rendering on mw2201 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:15] PROBLEM - PHP7 rendering on mw2239 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:15] PROBLEM - PHP7 rendering on mw2234 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:15] PROBLEM - PHP7 rendering on mw2241 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:15] PROBLEM - PHP7 rendering on mw2291 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:17] PROBLEM - PHP7 rendering on mwdebug1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:17] PROBLEM - PHP7 rendering on mw2204 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:17] PROBLEM - PHP7 rendering on wtp2012 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:17] PROBLEM - PHP7 rendering on mwdebug2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:17] PROBLEM - PHP7 rendering on wtp2014 is CRITICAL: CRITICAL - Socket timeout after 10 seconds 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:18] PROBLEM - PHP7 rendering on mw2182 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:18] PROBLEM - PHP7 rendering on mw2188 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:19] PROBLEM - PHP7 rendering on mw2312 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:19] PROBLEM - PHP7 rendering on mw2326 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:20] PROBLEM - PHP7 rendering on mw2322 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:20] PROBLEM - PHP7 rendering on mw2318 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:21] PROBLEM - PHP7 rendering on mw2288 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:23] PROBLEM - PHP7 rendering on mw2303 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:23] PROBLEM - PHP7 rendering on wtp2013 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:23] PROBLEM - PHP7 rendering on wtp2003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:23] PROBLEM - PHP7 rendering on mw2174 is CRITICAL: CRITICAL - Socket timeout 
after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:23] PROBLEM - PHP7 rendering on wtp2008 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:27] PROBLEM - PHP7 rendering on mw2358 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:29] PROBLEM - PHP7 rendering on wtp2017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:29] PROBLEM - PHP7 rendering on mw2170 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:35] PROBLEM - PHP7 rendering on mw2310 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:35] PROBLEM - PHP7 rendering on mw2144 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:35] PROBLEM - PHP7 rendering on wtp2019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:35] PROBLEM - PHP7 rendering on mw2165 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:37] PROBLEM - PHP7 rendering on mw2316 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:37] PROBLEM - PHP7 rendering on mw2139 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:37] PROBLEM - PHP7 rendering on mw2376 is CRITICAL: CRITICAL 
- Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:37] PROBLEM - PHP7 rendering on mw2237 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:39] PROBLEM - PHP7 rendering on mw2298 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:39] PROBLEM - PHP7 rendering on mw2305 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:39] PROBLEM - PHP7 rendering on mw2360 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:39] PROBLEM - PHP7 rendering on mw2296 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:39] PROBLEM - PHP7 rendering on mw2369 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:40] PROBLEM - PHP7 rendering on mw2273 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:41] PROBLEM - PHP7 rendering on mw2319 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:41] PROBLEM - PHP7 rendering on mw2307 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:41] PROBLEM - PHP7 rendering on mw2314 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:42] PROBLEM - PHP7 rendering on mw2299 is 
CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:42] PROBLEM - PHP7 rendering on mw2365 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:43] PROBLEM - PHP7 rendering on mw1404 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:43] PROBLEM - PHP7 rendering on mw2181 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:44] PROBLEM - PHP7 rendering on mw2178 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:44] PROBLEM - PHP7 rendering on mw2221 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:45] PROBLEM - PHP7 rendering on mw2228 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:47] PROBLEM - PHP7 rendering on mw2323 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:47] PROBLEM - PHP7 rendering on mw2371 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:47] PROBLEM - PHP7 rendering on mw2359 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:47] PROBLEM - PHP7 rendering on mw2284 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:47] PROBLEM - PHP7 rendering 
on mw2287 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:48] PROBLEM - PHP7 rendering on mw2167 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:48] PROBLEM - PHP7 rendering on mw2187 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:49] PROBLEM - PHP7 rendering on mw2146 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:49] PROBLEM - PHP7 rendering on mw2252 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:50] PROBLEM - PHP7 rendering on mw2191 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:50] PROBLEM - PHP7 rendering on wtp2016 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:51] PROBLEM - PHP7 rendering on mw2251 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:51] PROBLEM - PHP7 rendering on mw2227 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:52] PROBLEM - PHP7 rendering on mw2136 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:52] PROBLEM - PHP7 rendering on mw2215 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:53] PROBLEM - 
PHP7 rendering on mw2321 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:53] PROBLEM - PHP7 rendering on mw2333 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:53] HAS ANYONE HERE RUN UPDATE.PHP IN PRODUCTION [23:23:54] PROBLEM - PHP7 rendering on mw2180 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:54] PROBLEM - PHP7 rendering on mw2140 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:55] PROBLEM - PHP7 rendering on mw2313 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:55] PROBLEM - PHP7 rendering on mw2324 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:56] PROBLEM - PHP7 rendering on mw2138 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:56] PROBLEM - PHP7 rendering on mw2328 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:57] Er, I'm getting a bad looking error. 
Maybe it's this ^ `[Xou51ApAAEMAAJ6gnmoAAAAO] 2020-04-06 23:23:01: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"` [23:23:57] PROBLEM - PHP7 rendering on wtp2006 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:57] PROBLEM - PHP7 rendering on wtp2007 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:58] PROBLEM - PHP7 rendering on mw2212 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:58] PROBLEM - PHP7 rendering on mw2199 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:59] PROBLEM - PHP7 rendering on mw2256 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:23:59] PROBLEM - PHP7 rendering on mw2242 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:00] PROBLEM - PHP7 rendering on mw2238 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:00] RECOVERY - PHP7 rendering on mw1365 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.123 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:01] PROBLEM - PHP7 rendering on wtp2015 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:01] PROBLEM - PHP7 rendering on mw2372 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:02] PROBLEM - 
PHP7 rendering on mw2334 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:02] PROBLEM - PHP7 rendering on mw2363 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:03] PROBLEM - PHP7 rendering on mw2351 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:03] PROBLEM - PHP7 rendering on mw2196 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:04] PROBLEM - PHP7 rendering on mw2329 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:04] PROBLEM - PHP7 rendering on mw2223 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:05] RECOVERY - PHP7 rendering on mw1289 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.136 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:05] RECOVERY - PHP7 rendering on mw1347 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.139 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:09] uh oh... [23:24:13] niedzielski: Yes, everyone is. 
[23:24:14] right [23:24:17] PROBLEM - PHP7 rendering on mw1399 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:19] RECOVERY - PHP7 rendering on mw1342 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.151 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:33] RECOVERY - PHP7 rendering on mw1390 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.134 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:47] PROBLEM - PHP7 rendering on wtp1047 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:24:59] RECOVERY - PHP7 rendering on mwdebug1001 is OK: HTTP OK: HTTP/1.1 200 OK - 81702 bytes in 0.222 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:25:19] PROBLEM - PHP7 rendering on mw1401 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:25:23] RECOVERY - PHP7 rendering on mw1404 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.126 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:25:25] Thanks, James_F. Sorry to hear. [23:25:43] It's very bad, you're right. [23:25:43] RECOVERY - PHP7 rendering on mw1290 is OK: HTTP OK: HTTP/1.1 200 OK - 81692 bytes in 0.123 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:25:53] One of the biggest tables in production has just disappeared. 
[23:25:53] (PS1) Bstorm: tools-static: CDNJS suddenly requires SNI name to be past along [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) [23:26:15] RECOVERY - PHP7 rendering on mw2294 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 3.024 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:15] RECOVERY - PHP7 rendering on mw2357 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 9.563 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:15] RECOVERY - PHP7 rendering on mw2245 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 8.256 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:15] RECOVERY - PHP7 rendering on mw2276 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.311 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:15] RECOVERY - PHP7 rendering on mw2229 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.274 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:15] RECOVERY - PHP7 rendering on wtp2005 is OK: HTTP OK: HTTP/1.1 200 OK - 78298 bytes in 8.273 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:16] RECOVERY - PHP7 rendering on mw2253 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.313 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:16] RECOVERY - PHP7 rendering on mw2171 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.318 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:17] RECOVERY - PHP7 rendering on mw2232 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.301 second response time 
https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:17] RECOVERY - PHP7 rendering on mw2283 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.330 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:18] RECOVERY - PHP7 rendering on mw2217 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.289 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:18] RECOVERY - PHP7 rendering on mw2244 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.308 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:19] RECOVERY - PHP7 rendering on mw2226 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.295 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:19] RECOVERY - PHP7 rendering on mwdebug2001 is OK: HTTP OK: HTTP/1.1 200 OK - 78306 bytes in 3.951 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:20] RECOVERY - PHP7 rendering on mw2219 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.315 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:20] RECOVERY - PHP7 rendering on mw2222 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 6.293 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:21] RECOVERY - PHP7 rendering on mw2218 is OK: HTTP OK: HTTP/1.1 200 OK - 78291 bytes in 8.273 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:21] RECOVERY - PHP7 rendering on mw2361 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.257 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:22] RECOVERY - PHP7 rendering on mw2272 is OK: HTTP OK: HTTP/1.1 200 
OK - 78299 bytes in 0.282 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:22] RECOVERY - PHP7 rendering on mw2356 is OK: HTTP OK: HTTP/1.1 200 OK - 78298 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:23] RECOVERY - PHP7 rendering on mw2311 is OK: HTTP OK: HTTP/1.1 200 OK - 78298 bytes in 0.256 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:23] RECOVERY - PHP7 rendering on mw2236 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.278 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:24] RECOVERY - PHP7 rendering on mw2225 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.280 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:24] RECOVERY - PHP7 rendering on mw2270 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.280 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:25] RECOVERY - PHP7 rendering on mw2220 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.282 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:25] RECOVERY - PHP7 rendering on mw2254 is OK: HTTP OK: HTTP/1.1 200 OK - 78299 bytes in 0.282 second response time https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_rendering [23:26:32] !log created wb_items_per_site [23:26:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:26:56] So what happened, a table had mysteriously disappeared? [23:27:18] MaxSem: Seemingly [23:27:37] maybe someone dropped a table accidentally... [23:27:46] thanks @Amir1 for solving the immediate issue of getting Wikipedia back online and stopping the log spam. 
[23:27:52] (PS1) Papaul: Add backup2002 MAC address and partman [puppet] - https://gerrit.wikimedia.org/r/586465 (https://phabricator.wikimedia.org/T238601) [23:28:25] yup, we think somehow update.php got ran which should never happen in production, dropping an important table [23:28:35] I'm assuming there are logs to explain this? [23:29:13] :( [23:29:26] Site should be back up. [23:29:37] RECOVERY - MediaWiki exceptions and fatals per minute on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Application_servers https://grafana.wikimedia.org/d/000000438/mediawiki-alerts?panelId=2&fullscreen&orgId=1&var-datasource=eqiad+prometheus/ops [23:30:16] Amir1: if that really happened, just one table vanishing would be a fairly innocent outcome [23:30:37] Yeah, update.php would also try to ALTER INDEX from what prod has. [23:30:41] Which would be very very bad. [23:31:09] (CR) Papaul: [C: +2] Add backup2002 MAC address and partman [puppet] - https://gerrit.wikimedia.org/r/586465 (https://phabricator.wikimedia.org/T238601) (owner: Papaul) [23:31:23] !log ladsgroup@mwmaint1002:/srv/mediawiki-staging/php-1.35.0-wmf.26$ mwscript extensions/Wikibase/repo/maintenance/rebuildItemsPerSite.php --wiki=wikidatawiki [23:31:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:31:57] MaxSem: yeah, we should check everything :( [23:32:24] Does wb_terms still exist? https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/48b37477025c454dfa392efcb7f22d3bbe9c69f6/repo/includes/Store/Sql/DatabaseSchemaUpdater.php#79 [23:32:51] (CR) Zhuyifei1999: tools-static: CDNJS suddenly requires SNI name to be past along (2 comments) [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:33:05] Jdlrobson: It's being dropped. [23:34:04] probably T157651? 
[23:34:05] T157651: sql.php runs LoadExtensionSchemaUpdates - https://phabricator.wikimedia.org/T157651 [23:34:37] (CR) Bstorm: tools-static: CDNJS suddenly requires SNI name to be past along (2 comments) [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:35:11] (PS2) Bstorm: tools-static: CDNJS suddenly requires SNI name to be past along [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) [23:35:29] Looks like a bot EmausBot is running wild? https://phabricator.wikimedia.org/T249565#6034676 [23:35:34] can we disable it ? [23:36:03] (CR) Bstorm: tools-static: CDNJS suddenly requires SNI name to be past along (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:36:04] https://www.wikidata.org/wiki/Special:Contributions/EmausBot [23:37:12] Jdlrobson blocked [23:37:28] thanks DannyS712 [23:37:43] We have lots of duplicates now - https://www.wikidata.org/wiki/Q15622255 https://www.wikidata.org/wiki/Q89608201 [23:37:56] Not sure who to notify about that [23:38:11] If you can find the earliest duplicate I'll nuke everything since then, and the bot will eventually recreate them once its working [23:39:56] (CR) Zhuyifei1999: tools-static: CDNJS suddenly requires SNI name to be past along (1 comment) [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:40:06] https://www.wikidata.org/w/index.php?title=Q89606763&oldid=1151786598 [23:40:17] (CR) Zhuyifei1999: [C: +1] tools-static: CDNJS suddenly requires SNI name to be past along [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:40:35] AntiComposite: confirmed [23:40:46] was gonna suggest the same [23:40:53] 
https://www.wikidata.org/w/index.php?title=Special:Contributions/EmausBot&offset=20200406232800&limit=500&target=EmausBot [23:41:15] (CR) Bstorm: [C: +2] tools-static: CDNJS suddenly requires SNI name to be past along [puppet] - https://gerrit.wikimedia.org/r/586464 (https://phabricator.wikimedia.org/T249558) (owner: Bstorm) [23:42:17] RECOVERY - Too many messages in kafka logging-eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Logstash%23Kafka_consumer_lag https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad+prometheus/ops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All [23:42:32] Jdlrobson I've deleted the last 500 created (since the 500th was a duplicate); on to the next 500 [23:43:01] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (Papaul) [23:43:05] (CR) Andrew Bogott: [C: +2] wmfkeystonehooks sudo group: encode a bunch of ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586457 (https://phabricator.wikimedia.org/T249494) (owner: Andrew Bogott) [23:46:39] RECOVERY - PHP opcache health on mw2331 is OK: OK: opcache is healthy https://wikitech.wikimedia.org/wiki/Application_servers/Runbook%23PHP7_opcache_health [23:46:40] I have been running the rebuild for a while now [23:47:23] Amir1 can you let me know once its safe to unblock EmausBot? I deleted 510 items that it created today that were likely duplicates [23:47:56] DannyS712: I doubt things should be ran, the table is empty [23:48:01] we are rebuilding it [23:48:24] I have no sense of how long the table rebuilding will take - hours? A week? [23:48:28] months [23:48:35] Oh wow [23:48:40] Well, maybe not plural [23:48:40] :( [23:49:36] let's see [23:49:46] it's faster than I thought [23:49:46] So, restore from backups? 
😭 [23:49:58] it's 110600 items already [23:50:03] probably a whole week [23:51:31] Operations, ops-codfw, DBA: (Need by: TBD) codfw: rack/setup/install backup2002/array backup2002-array1 - https://phabricator.wikimedia.org/T248934 (ops-monitoring-bot) Script wmf-auto-reimage was launched by pt1979 on cumin2001.codfw.wmnet for hosts: ` backup2002.codfw.wmnet ` The log can be found i... [23:52:28] (PS6) 20after4: ATS/phabricator: configure aphlict certificate [puppet] - https://gerrit.wikimedia.org/r/586461 (https://phabricator.wikimedia.org/T238593) [23:53:08] DannyS712: thanks. [23:53:22] Amir1: are you serious? wow o_O [23:53:41] Jdlrobson: yup, it's 80M items [23:54:01] we are at 0.1M right now, so it's like 0.2% done [23:54:19] the most important ones are done though, low qids [23:54:58] Amir1: no backups? [23:55:12] or is this copying from a backup? [23:55:29] After its done, is there a script to find an updated list of true duplicates (https://www.wikidata.org/wiki/Wikidata:True_duplicates) to see what remains to be cleaned up? [23:55:44] Amir1: so what is the impact of this change from a product perspective? [23:55:52] Jdlrobson: there's backups, probably we are going to go in that direction [23:55:53] without that data what consequences are there? [23:56:17] articles can't use wikidata's data, they are not connected to each other (between languages) [23:56:27] jouncebot: now [23:56:27] For the next 0 hour(s) and 3 minute(s): Evening SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200406T2300) [23:56:31] does that mean no wikidata descriptions? 
[23:56:33] * addshore is deploying [23:56:40] (PS1) Bstorm: tools-static: apply SNI name setting to fontcdn as well [puppet] - https://gerrit.wikimedia.org/r/586475 (https://phabricator.wikimedia.org/T249558) [23:58:28] syncing [23:58:42] addshore https://test.wikidata.org/wiki/Wikidata:Main_Page `(Cannot access the database: Cannot access the database: Unknown error (10.64.16.7))` [23:58:48] Jdlrobson: yup [23:59:03] Amir1: so what can i tell my product manager(s) to expect? [23:59:10] (PS1) Andrew Bogott: wmfkeystonehooks sudo group: encode yet more ldap values as utf8 [puppet] - https://gerrit.wikimedia.org/r/586476 (https://phabricator.wikimedia.org/T249494) [23:59:12] DannyS712: let me fix test as well [23:59:23] !log addshore@deploy1001 Synchronized php-1.35.0-wmf.26/extensions/Wikibase/repo/includes/Store/Sql/DatabaseSchemaUpdater.php: Do not try to drop things when theres no wb_terms table T208425 T249565 (duration: 00m 59s) [23:59:27] Jdlrobson: Sure, honestly, everything is on fire right now [23:59:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:59:31] T208425: [EPIC] Kill the wb_terms table - https://phabricator.wikimedia.org/T208425 [23:59:31] T249565: Wikidata's wb_items_per_site table has suddenly disappeared, creating DBQueryErrors on page views - https://phabricator.wikimedia.org/T249565 [23:59:41] addshore sorry, I thought that was already merged there