[06:48:00] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp4029.ulsfo.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[07:35:32] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4029.ulsfo.wmnet'] ` Of which those **FAILED**: ` ['cp4029.ulsfo.wmnet'] `
[07:35:45] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp4029.ulsfo.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[07:57:41] Traffic, MobileFrontend, Operations, TechCom-RFC, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Base) What does this mean for me as for a user who edits Wikimedia projects from mobile u...
[08:03:45] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4029.ulsfo.wmnet'] ` Of which those **FAILED**: ` ['cp4029.ulsfo.wmnet'] `
[08:32:20] <_joe_> vgutierrez, ema - don't we set a header in the caching layer that identifies each request?
[08:33:23] <_joe_> so that requests to the app layer have a header, like X-Request-Id ?
[08:35:01] <_joe_> oh I see, we don't
[08:35:15] <_joe_> we just avoid unsetting it
[08:35:21] <_joe_> for internal requests
[08:37:24] Traffic, Operations, serviceops, Patch-For-Review: Add x-request-id to httpd (apache) logs - https://phabricator.wikimedia.org/T244545 (Joe)
[09:57:29] _joe_: hey, I think marko was working on something similar
[09:57:55] something something uuid
[09:57:58] <_joe_> yes
[10:02:15] _joe_: see https://phabricator.wikimedia.org/T201409 and related tasks
[10:04:43] <_joe_> yes I'm aware
[10:05:11] <_joe_> my need is: we generate a lot of mw-appserver-to-api requests
[10:05:20] <_joe_> I want to track them across
[10:05:56] <_joe_> so one way of doing that is setting X-Request-Id at the edge layer, and track it in the logs of apache
[10:06:34] <_joe_> but generating the unique id at the edge layer can wait for now, probably
[10:24:15] Traffic, Operations, serviceops, Patch-For-Review: Add x-request-id to httpd (apache) logs - https://phabricator.wikimedia.org/T244545 (jijiki) p:Triage→Medium
[10:27:12] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp4028.ulsfo.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[10:54:28] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4028.ulsfo.wmnet'] ` Of which those **FAILED**: ` ['cp4028.ulsfo.wmnet'] `
[11:03:21] Traffic, Discovery, Operations, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Lucas_Werkmeister_WMDE) > Either the lag in WDQS needs to be fixed, or we need to introduce some scalin...
[11:05:26] Traffic, Discovery, Operations, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Ladsgroup) Yes, I think `wgWikidataOrgQueryServiceMaxLagFactor` should be way higher. Something like 12...
[11:08:49] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp4027.ulsfo.wmnet ` The log can be found in `/var/log/wmf-auto-reima...
[11:09:30] Traffic, Operations: Upgrade ncredir cluster to buster - https://phabricator.wikimedia.org/T243391 (Vgutierrez)
[11:36:51] Traffic, Discovery, Operations, Wikidata, and 3 others: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service) - https://phabricator.wikimedia.org/T243701 (Lea_Lacroix_WMDE) Hello all, Here are some news: we are going to try and increase the maxlag connected...
[11:38:11] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` ['cp5005.eqsin.wmnet', 'cp5011.eqsin.wmnet'] ` The log can be found i...
[11:40:37] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp4027.ulsfo.wmnet'] ` and were **ALL** successful.
[11:44:50] Traffic, Operations, Patch-For-Review: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (Vgutierrez)
[12:24:14] Traffic, Operations: Upgrade ncredir cluster to buster - https://phabricator.wikimedia.org/T243391 (Vgutierrez) Open→Resolved a:Vgutierrez
[12:24:26] Traffic, Operations: Upgrade ncredir cluster to buster - https://phabricator.wikimedia.org/T243391 (Vgutierrez)
[12:24:34] \o/
[12:26:32] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5011.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101226_vgutie...
[12:31:11] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5004.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101230_vgutie...
[13:10:25] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5004.eqsin.wmnet'] ` Of which those **FAILED**: ` ['cp5004.eqsin.wmnet'] `
[13:11:27] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5011.eqsin.wmnet'] ` Of which those **FAILED**: ` ['cp5011.eqsin.wmnet'] `
[13:46:38] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5010.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101346_vgutie...
[13:48:40] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5003.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101348_vgutie...
[13:49:10] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (Vgutierrez)
[14:29:17] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5003.eqsin.wmnet'] ` Of which those **FAILED**: ` ['cp5003.eqsin.wmnet'] `
[14:30:39] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5010.eqsin.wmnet'] ` Of which those **FAILED**: ` ['cp5010.eqsin.wmnet'] `
[14:45:51] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (Vgutierrez)
[14:47:35] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5009.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101447_vgutie...
[14:50:07] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Script wmf-auto-reimage was launched by vgutierrez on cumin1001.eqiad.wmnet for hosts: ` cp5002.eqsin.wmnet ` The log can be found in `/var/log/wmf-auto-reimage/202002101449_vgutie...
[15:30:01] _joe_: I want to set such things at the edge this Q
[15:32:45] <_joe_> cdanis: perfect!
[15:33:04] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5009.eqsin.wmnet'] ` Of which those **FAILED**: ` ['cp5009.eqsin.wmnet'] `
[15:39:03] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['cp5002.eqsin.wmnet'] ` and were **ALL** successful.
[16:24:54] Traffic, Operations: Upgrade cache cluster to debian buster - https://phabricator.wikimedia.org/T242093 (Vgutierrez)
[17:44:27] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on esams caches - https://phabricator.wikimedia.org/T243167 (RobH) @bblack, Can we modify this task to include the eqiad caches that need update as well? I'll be handing these remotely. During this process, if any single se...
[18:04:19] same issue with wikiworkshop.org as T243948 ?
[18:04:19] T243948: SSL CRITICAL - OCSP staple validity for www.wikipedia.bg has X seconds left - https://phabricator.wikimedia.org/T243948
[18:04:38] (see icinga warnings, wanted to give a heads up before heading out)
[18:44:55] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on esams caches - https://phabricator.wikimedia.org/T243167 (BBlack) @RobH - Yes, let's edit this to include eqiad as well. We've had the same symptoms both places, and they're the same approximate generation of hardware con...
[18:45:18] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on esams caches - https://phabricator.wikimedia.org/T243167 (BBlack) a:BBlack→RobH
[20:26:46] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[20:27:03] bblack: updated the cp bios, quick question, i figure its ok to do one from each pool at a time right?
[20:27:07] ie: one text, one upload
[20:27:27] also the numbers are not consistent site to site. eqiad has even as upload, esams as odd as upload ;_;
[20:27:30] it makes me sad
[20:28:10] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[20:29:01] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH) My plan is to do one from each service group (upload/text) at a time, batched together. (It is just as easy to watch two bios updates as one, it doesn't q...
[20:29:15] Is anyone doing OS upgrades on eqiad CP systems at this time?
[20:29:26] if not, im going to start doing bios updates.
[20:29:47] (likely not as its no longer AM in Pacific time)
[20:29:54] and late in day for the eu opsen doing them.
[20:48:28] Traffic, Commons, MediaWiki-File-management, Multimedia, and 8 others: Picture from Commons not found from Singapore - https://phabricator.wikimedia.org/T231086 (aaron) I think having swift-repl manually set X-Timestamp is doable now. It would work kind of like rsync can in that regard. This also...
[20:52:21] robh: yeah the numbers are sad :) nobody should be on buster upgrades at this time of day
[20:52:25] Traffic, MobileFrontend, Operations, TechCom-RFC, Readers-Web-Backlog (Tracking): Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Jdforrester-WMF) >>! In T214998#5863413, @Base wrote: > What does this mean for me as for...
[20:52:45] robh: and yes, you can do one from each cluster.
[20:57:34] awesome, thanks!
[20:57:40] ill admin log when i go through them
[21:37:14] Ok, I'm starting on them now. AFAIK there won't be anything but IRC icinga noise
[21:37:39] and i cannot stop all of it (the noisiest being the check for the vanish across the entire cluster check) so not bothering to silence the per host checks
[21:37:42] that way i can see them come back
[21:43:35] did you mean the old ipsec checks that spam from all the other nodes?
[21:43:55] those are gone now, we got rid of ipsec
[21:44:17] so I think if you down the host and dependencies work as expected, should be no noise
[21:44:28] robh: ^
[21:44:35] ohhhh
[21:44:40] yeah thats what i meant
[21:44:45] ok, i'll do that for the rest then
[21:44:56] cp1075 and cp1076 will be noisy but the remainder i can maint mode
[21:45:00] when i work them
[21:45:04] bblack: thanks =]
[22:02:49] I have this ready for tomorrow's router upgrade if someone wants to +1 https://gerrit.wikimedia.org/r/c/operations/dns/+/571368
[22:19:51] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[22:39:57] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[22:44:11] Traffic, DC-Ops, Operations, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[23:02:34] Traffic, Beta-Cluster-Infrastructure, Operations, RESTBase, User-Ryasmeen: Restbase routing down on beta, 2020-02-07 - https://phabricator.wikimedia.org/T244586 (Jdforrester-WMF) Thank you!
[23:05:10] Traffic, DC-Ops, Operations, ops-eqiad, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[23:24:53] vgutierrez: ema: curious regarding T238494/T244538, is dns disabling expected to reduce latency? What is ats-tls currently doing with dns during the critical path?
[23:24:55] T238494: 15% response start regression as of 2019-11-11 (Varnish->ATS) - https://phabricator.wikimedia.org/T238494
[23:30:55] Traffic, DC-Ops, Operations, ops-eqiad, ops-esams: Upgrade BIOS and IDRAC firmware on R440 cp systems - https://phabricator.wikimedia.org/T243167 (RobH)
[23:59:36] Traffic, Operations, SRE-swift-storage, Performance-Team (Radar): Reduce amount of headers sent from web responses - https://phabricator.wikimedia.org/T194814 (Krinkle)
[23:59:41] Traffic, Analytics, Operations, Performance-Team: Only serve debug HTTP headers when x-wikimedia-debug is present - https://phabricator.wikimedia.org/T210484 (Krinkle)