[00:19:23] netops, DC-Ops, Infrastructure-Foundations, ops-codfw, SRE: codfw row C/D upgrade racking task - https://phabricator.wikimedia.org/T360789#9924471 (Papaul)
[06:31:28] netops, Infrastructure-Foundations: magru ipv6 issues - https://phabricator.wikimedia.org/T368499 (ayounsi) NEW
[08:30:33] ryankemper: could you forward the calendar event / google meet details?
[08:41:05] netops, Infrastructure-Foundations, SRE: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts - https://phabricator.wikimedia.org/T368513 (cmooney) NEW p:Triage→Medium
[08:43:56] vgutierrez: Added you. Not sure how to forward it more generally but the event is `DPE Search - The Wednesday Meeting™` on my calendar if anyone else needs the info
[08:45:55] ryankemper: it collides with the SRE management meeting so probably b.black won't be able to attend
[08:46:19] The stated time is a bit of a lie. It's listed as a 30 minute meeting but really it's ~2 hours
[08:46:24] I'll be there and try to gather info to be able to discuss it internally
[08:46:34] IOW start time is correct but not end time
[08:46:42] ack
[09:54:15] netops, Infrastructure-Foundations, SRE, Patch-For-Review: No unicast IP ranges announced to peers from eqdfw - https://phabricator.wikimedia.org/T367439#9925639 (cmooney) >>! In T367439#9921613, @ayounsi wrote: > Your proposal seems good to me. > > Adding the anycast AS makes sense, I think I in...
[13:23:16] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9926374 (Bmueller)
[13:24:14] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9926376 (Bmueller) @daniel thank you for all the prep work! This is good to go :-)
[13:43:04] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9926454 (Joe) >>! In T364400#9780622, @BBlack wrote: >>>! In T364400#9779996, @hnowlan wrote: >> Could we implement this remapping at the ATS layer rather than the Apache one, in a...
[14:09:08] Traffic, MW-Interfaces-Team, serviceops: map the /api/ prefix to /w/rest.php - https://phabricator.wikimedia.org/T364400#9926617 (daniel) >>! In T364400#9926454, @Joe wrote: > Even more to @bblack's comment, I would just have apache funnel anything under `/api` it receives to an endpoint in mediawiki...
[14:35:01] Traffic: LVSRealserverMSS alert is broken for ferm based hosts - https://phabricator.wikimedia.org/T367204#9926710 (Vgutierrez) p:Triage→Medium
[14:46:01] netops, Traffic, Infrastructure-Foundations, serviceops: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544 (Vgutierrez) NEW
[14:46:21] netops, Traffic, Infrastructure-Foundations, serviceops: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544#9926761 (Vgutierrez) p:Triage→Medium
[14:51:00] netops, Traffic, Infrastructure-Foundations, serviceops: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545 (Vgutierrez) NEW
[14:51:13] netops, Traffic, Infrastructure-Foundations, serviceops: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545#9926781 (Vgutierrez) p:Triage→Medium
[14:53:27] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes - https://phabricator.wikimedia.org/T362323#9926798 (Clement_Goubert)
[14:53:31] _joe_, cdanis, topranks, XioNoX, bblack, fabfur: no rush at all but I'd like to get some feedback from you guys on both T368545 && T368544 <3
[14:53:32] T368545: weighted maglev viability for low-traffic services - https://phabricator.wikimedia.org/T368545
[14:53:33] T368544: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544
[15:06:44] thanks vgutierrez <3
[15:12:34] netops, Traffic, Infrastructure-Foundations, serviceops: IPIP encapsulation considerations for low-traffic services - https://phabricator.wikimedia.org/T368544#9926889 (cmooney) > IPIP encapsulation has a 20 bytes overhead that needs to be accounted somehow, in high-traffic[12] services we chose...
[15:16:40] in hieradata/role/common/dnsbox.yaml under profile::dns::auth::acmechief_target::acmechief_hosts acmechief1002.eqiad.wmnet seems missing?
[15:17:41] sorry, I was in meetings
[15:17:52] moritzm: yeah I am not sure how, looking at the log now
[15:18:28] thx, no rush
[15:18:32] 9bc5cad8a3f71e5f668991099cfe80a9379bbc04
[15:18:37] just something I noticed when cleaning out the legacy configs
[15:19:55] that seems unrelated? it's 1002 that is missing and my commit doesn't touch it?
[15:19:57] let me blame the exact line to see how we missed it
[15:20:16] yes, I got the wrong commit, looking again
[15:20:35] yeah no idea
[15:20:47] I will fix it but better I think I will try to generate it automatically
[15:20:50] thanks for the heads-up
[15:21:00] ack, great
[15:21:05] sorry, 9bc5ca was not related, I was confusing it with the P7 change
[15:22:21] ack
[15:24:45] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049969
[15:29:54] Traffic, MoveComms-Support, MW-on-K8s, serviceops, and 2 others: Move 100% of external traffic to Kubernetes - https://phabricator.wikimedia.org/T362323#9926945 (Jdforrester-WMF) Should we call this Resolved and track the remaining migrations in the parent, T290536?
[16:01:09] sukhe: o/ can we do https://gerrit.wikimedia.org/r/c/operations/puppet/+/1042278 today?
[16:01:19] i'm here and don't have any more meetings today wow!
[16:01:42] ottomata: happy to. I just need 30 mins to roll out and an existing change that I need to babysit
[16:03:12] okay! perfect. i'll be here! gonna make some lunch back shortly
[16:03:49] thanks, will ping you when done
[16:05:09] Traffic, Data-Engineering, Observability-Logging, Patch-For-Review: Upgrade hosts to haproxy 2.8.10 - https://phabricator.wikimedia.org/T367756#9927223 (Fabfur) Update on this investigation: apparently capturing the frame with tcpdump from HAProxy to Benthos, doesn't show the "log merging". The w...
[16:30:45] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927349 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5018.eqsin.wmnet with OS b...
[16:31:30] ottomata: hello hello
[16:31:32] here and looking at it now
[16:33:52] ok, let's merge it if you are ready
[16:40:13] ottomata: will wait for you to be around but I am here
[16:40:23] hello!
[16:40:23] yes
[16:40:30] why did I not see this ping. Hmm
[16:40:33] sorry sukhe i'm here
[16:41:16] np
[16:41:23] let's do it unless you want to do it yourself in which case go for it :)
[16:41:53] ok starting
[16:41:58] i can do it, just want to have help in case I break something!
[16:42:00] proceeding
[16:42:06] sure! go for it
[16:42:17] the per-host hiera override is helpful so we don't need to disable puppet on A:cp-text
[16:42:29] but I will ensure a NOOP on another host for my OCD if nothing else :)
[16:42:57] I forget does puppet have gate and submit? Or should I submit?
[16:43:27] submitting...
[16:44:08] just submit
[16:44:11] and merge on puppetmaster
[16:44:18] and then run sudo puppet agent -tv on cp1100
[16:44:25] puppet merged
[16:44:32] running puppet
[16:46:59] some puppet failures because of missing dependencies, e.g. removing eventlogging varnishkafka instance, systemd and nrpe stuff. running puppet again to make sure it was just for that run
[16:47:20] there was a whitespace change in the vcl file on cp1101
[16:47:22] extra newline
[16:47:24] but looks good
[16:48:10] nope, still failures
[16:48:15] huh
[16:48:16] let's see
[16:48:16] the vcl and varnish stuff look okay
[16:48:25] its just the removal of varnishkafka-eventlogging
[16:49:58] looking
[16:50:12] so
[16:50:31] in cp1100, you set it to absent
[16:50:43] and then present by default. that's intentional?
[16:51:07] yes. intentional
[16:51:19] the eventual intention is to remove it everywhere
[16:51:28] then we can get rid of the relevant puppet code
[16:51:45] it looks like the instance was stopped
[16:51:56] puppet just doesn't know how to do ensure=>absent all the way down correctly i think
[16:52:03] yeah
[16:52:07] but we need to clean this up, so looking where
[16:52:11] also looking
[16:52:12] I disabled puppet elsewhere
[16:52:15] just in case
[16:52:24] fwiw puppet was fine on cp1101
[16:52:30] its only on this host that i'm setting absent
[16:52:32] present was default before
[16:52:52] yep, matched it up
[16:54:55] it seems to have to do with the enable param on the service
[16:55:05] but not sure why its a problem?
[16:55:07] yeah but it's weird
[16:55:44] are ensure => stopped and enable => true incompatible somehow?
[16:55:50] i wouldn't think so.
[16:57:08] netops, Infrastructure-Foundations, SRE, Patch-For-Review: Configure QoS marking and policy across network - https://phabricator.wikimedia.org/T339850#9927454 (cmooney)
[16:57:48] PCC confirms present to absent as well
[16:58:05] - ensure => running
[16:58:05] + ensure => stopped
[16:59:06] hm, but why doesn't PCC show enabled => false
[16:59:23] base::service_unit looks like it should do that if ensure is not 'present'
[16:59:57] https://github.com/wikimedia/operations-puppet/blob/production/modules/base/manifests/service_unit.pp#L125
[17:00:55] POH
[17:00:56] OH
[17:00:57] found it
[17:01:01] haha, I am curious what
[17:01:05] its in varnishkafka::instance ensure is set to true explicitly
[17:01:06] sorry
[17:01:07] enable*
[17:01:09] is set to true
[17:01:33] i think that should not provide an override for enable
[17:01:39] and let service_unit manage it
[17:02:02] hahadefine varnishkafka::instance( $ensure = 'present',
[17:02:08] -haha, that was earlier
[17:02:33] the param docs are not updated but ensure is there
[17:03:23] patch incoming
[17:03:37] https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049994
[17:03:54] ah this enable ha
[17:03:55] ok
[17:03:59] let's run PCC on it
[17:04:06] btw, base::service_unit has been replaced with systemd::service back in 2018/2019 for basically everything EXCEPT varnishkafka and confd because it felt dangerous to touch those
[17:04:06] running now
[17:04:14] so like almost nothing else uses that anymore
[17:04:16] oh! interesting.
[17:04:34] https://gerrit.wikimedia.org/r/q/message:service_unit+owner:dzahn@wikimedia.org
[17:04:38] well, i think varnishkafka is on its (slow) way out...
[17:04:58] webrequest is going to haproxy + benthos, eventlogging removed here, only thing left might be statsv? which we want to replace too
[17:05:04] looking
[17:05:06] pcc https://puppet-compiler.wmflabs.org/output/1049994/3075/
[17:05:26] lgtm?
[17:05:32] yep
[17:05:33] - enable => True
[17:05:33] + enable => False
[17:05:44] no change on cp1101
[17:05:58] if you feel like changing that. ancient ticket: https://phabricator.wikimedia.org/T194724
[17:06:20] mutante: nice history
[17:06:37] thanks mutante! i hope we'll just decom varnishkafka and remove the puppet code one day
[17:06:49] sounds good :)
[17:06:54] thanks for +1 sukhe, merging
[17:07:02] $deityspeed
[17:08:05] sukhe: btw, is there a way I can make a request go through a specific host? I want to test this whole thing by sending a request via cp1100
[17:08:22] (running puppet)
[17:08:29] ottomata: https://wikitech.wikimedia.org/wiki/Varnish#Force_your_requests_through_a_specific_Varnish_frontend
[17:08:51] well once again I should search before I ask
[17:08:52] ty
[17:09:01] np, you should be more surprised we have docs :)
[17:09:58] puppet success!
[17:10:01] nice!
[17:10:14] clean Puppet run == happiness
[17:10:28] hm, these instructions are for linux, not macos i think! Eeek
[17:11:26] do i need setcap? i will find out
[17:12:52] hm, you know, i can just curl from cp1100 :)
[17:13:08] ha, but well sure, but if you need to reach from the outside world
[17:13:15] yeah
[17:14:25] it works!
[17:14:30] you will need CAP_NET_BIND_SERVICE for 443
[17:14:31] nice
[17:15:35] okay sukhe the next step is to do this everywhere
[17:15:40] ottomata: ok.
[17:15:46] do you want to do this today? happy either way but asking!
[17:16:08] i think so! let me double check something
[17:16:55] yes, let's do it.
[17:17:22] hm, perhaps, let's verify that the beacon endpoints still work from cp1100
[17:17:23] doing that now...
[17:18:02] Traffic, Infrastructure-Foundations, Puppet-Core, SRE, patch-welcome: Deprecate `base::service_unit` in puppet - https://phabricator.wikimedia.org/T194724#9927634 (Dzahn) status of this ticket in 2024. remaining services using this: [] base::service_unit { 'prometheus-node-exporter': [] base...
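The `enable` problem worked out between 16:54 and 17:05 can be sketched in a few lines. This is a hypothetical Python model, not the Puppet code: `service_params` and its arguments are invented for illustration. It assumes only what the log states, namely that the service wrapper derives `enable` from `ensure` unless the caller passes an explicit value.

```python
from typing import Optional


def service_params(ensure: str, enable: Optional[bool] = None) -> dict:
    """Hypothetical model of a service wrapper's parameter resolution.

    Assumption taken from the discussion: when `enable` is not given,
    it follows `ensure` (enabled only when ensure is 'present').
    """
    running = ensure == "present"
    if enable is None:
        enable = running
    return {"ensure": "running" if running else "stopped", "enable": enable}


# Before the fix: the caller hardcodes enable=True, so an absent
# instance is stopped but still enabled.
before = service_params("absent", enable=True)

# After the fix: the caller passes no override and the wrapper decides.
after = service_params("absent")

print(before)  # {'ensure': 'stopped', 'enable': True}
print(after)   # {'ensure': 'stopped', 'enable': False}
```

The two results line up with the PCC diffs quoted in the log: the first run changed only `ensure`, and the follow-up patch flipped `enable` from True to False.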
[17:19:48] ok :)
[17:21:39] hm, I think the only other beacon usage is statsv? verified that works.
[17:22:35] codesearch is showing me some more
[17:23:03] they all respond with 204 as expected
[17:23:10] /beacon/event is the only one that is different
[17:23:12] okay i think we can proceed
[17:23:29] as long as you have verified it :)
[17:24:57] just change the default in the profile I guess, no need for hiera
[17:25:33] oh
[17:25:37] hm, indeed!
[17:26:29] will need to change the beacon regex. should I do that in the default in profile::cache::varnish::frontend too?
[17:27:51] let me see again, just a sec
[17:28:35] yes, as long as this is the only place it's used, it should be there I think
[17:29:18] I added the param in the last patch
[17:29:19] so it should be ;)
[17:30:33] :D
[17:31:16] sukhe: i could also remove the param and move the regex back to hardcoded in vcl file?
[17:31:23] maybe the param will be useful in future tho?
[17:32:25] yeah, leave it in the profile vs hardcoded
[17:32:33] okay
[17:34:45] ah we should run puppet on icinga host too, just got an alert for vk-eventlogging on cp1100
[17:34:50] ottomata: CR 1050000. if you ever wanted to try the powerball, maybe today is the day
[17:34:57] oh ho!
[17:35:01] ottomata: which alert?
[17:35:04] sukhe: this uses nrpe... hm
[17:35:21] Notification Type: PROBLEM
[17:35:21] Service: eventlogging Varnishkafka log producer
[17:36:01] ah no
[17:36:06] don't worry, that's cp5018, being reimaged
[17:36:16] oh? no its from cp1100
[17:36:25] Notification Type: PROBLEM
[17:36:25] Service: eventlogging Varnishkafka log producer
[17:36:25] Host: cp1100
[17:36:25] Address: 10.64.0.79
[17:36:25] State: CRITICAL
[17:36:28] oh there too? let me see; I just saw cp1100
[17:36:30] er, 5018
[17:36:36] it might be gone now
[17:36:44] yeah this was from 47 mins ago
[17:36:45] in my email
[17:36:56] but we'll prob get more if we don't run puppet to ensure the alert is gone too
[17:36:57] yeah, all clean now
[17:37:05] after we do this everywhere
[17:37:13] yeah don't worry, we can clean it up
[17:37:52] running pcc on https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050000
[17:37:57] thanks
[17:38:26] nice doc comments ottomata!
[17:39:44] always :D
[17:39:45] looking at PCC
[17:40:31] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927697 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5018.eqsin.wmnet with OS bulls...
[17:40:53] hm, i guess cp1101 doesn't have varnishkafka-eventlogging?
[17:41:26] upload
[17:41:52] oh weird, 1100 and 1102 are text but 1101 is upload? even/odd?
[17:41:53] IIRC, eventlogging is only in text
[17:41:56] yep
[17:41:57] TIL
[17:42:05] if you want more fun, it's not the same on all sites :P
[17:42:08] okay well i chose my PCC targets well then
[17:42:13] i do not want more fun
[17:42:49] ottomata: you should also remove the cp1100 override in this commit maybe
[17:42:53] oh yesh thank you
[17:43:08] pushed
[17:43:21] i'll run pcc again there for ol times sake
[17:43:23] lol ok
[17:44:05] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927710 (BCornwall)
[17:45:30] no diff on cp1100 as expected
[17:45:35] yep
[17:45:36] lgtm sukhe. whatcha think?
[17:45:44] ottomata: I think we can ship it.
[17:45:50] if you want to be extra safe
[17:45:54] disable puppet on A:cp-text
[17:45:56] and then roll on one
[17:45:58] and then everywhere
[17:46:00] or I can do it
[17:46:05] but I think that's probably the best in this ase
[17:46:06] *case
[17:46:18] if you have your fingers ready please go for it, i'd have to dig up my cumin manual
[17:46:25] on it
[17:46:28] been a while since i've done sre work :)
[17:46:46] hm, can I disable the icinga alert ahead of time? i'd prefer not to email spam this
[17:47:06] all good, one host at a time so I will see and silence the alert
[17:47:27] where did you get the email about this though?
[17:47:52] analytics-alerts@wikimedia.org
[17:47:55] i think its configured that way
[17:47:56] ha
[17:48:00] contact groups whatever
[17:48:04] ok, let me know if you get it for this time or not
[17:48:10] there are so many options for disabling this in icinga
[17:48:11] merging on cp4037
[17:48:15] ty
[17:48:43] I chose "disable active checks for this service"
[17:49:32] sukhe: i haven't merged patch, shall I?
[17:49:40] oh i see your ship it comment
[17:49:46] go for it, I was disabling Puppet
[17:49:48] k
[17:49:49] merging
[17:50:01] and yeah, NOOP on 4037 for now. will run again
[17:51:15] puppet-merge complete
[17:51:20] logged in -operations
[17:51:24] running
[17:52:36] https://puppetboard.wikimedia.org/report/cp4037.ulsfo.wmnet/0ecda2e91a549cd8e757fbf02502a1223c6833c6
[17:53:03] uhhhh TIL puppetboard
[17:53:26] amazing
[17:53:39] looks good to you?
[17:53:51] ya i will do my test on cp4037 real quick to be super sure
[17:53:58] thanks
[17:54:18] (fixing bastion config for ulsfo...)
[17:55:21] sukhe: looks good to me!
[17:56:22] nice!
[17:56:41] so basically from here on, we roll it out to all others, albeit a bit more batched out
[17:56:45] k
[17:56:56] do you just progressively enable puppet?
[17:56:58] yep
[17:57:11] k, lemme know if i can help
[17:57:16] you silenced it all on Icinga, thanks
[17:57:16] i'll be here
[17:57:21] yup
[17:57:24] thanks, np, you can leave it to me. will let you know when done
[17:57:28] yeehaw you the best
[17:57:39] haha hardly, you did all the work
[18:17:45] ottomata: all done :)
[18:17:55] oh my! looking!
[18:19:06] yay it works from my local
[18:19:12] congrats!
[18:19:30] awesooomme this is so exciting sukhe
[18:19:32] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927836 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5019.eqsin.wmnet with OS b...
[18:19:49] i can FINALLY decom!
[18:19:56] we started this migration 4 years ago!!!!!!
[18:20:17] i'll give this a day to settle before decoming things
[18:20:18] but will prep patches
[18:20:22] thank you so much!
[18:20:29] np, happy to help remove stuff!
[18:20:51] feel free to add us on the review for the rest of the stuff too if desired
[18:26:32] hm, i wonder how this is going in beta right now... :)
[18:26:39] this stuff is configured there too
[18:26:55] ottomata: we don't own beta as such but yeah, probably good to check there
[18:26:57] (any puppetboard in beta?)
[18:27:04] not that I know of no
[18:34:54] Traffic, SRE: Anycast NTP and update the list of timeservers for P:systemd::timesyncd - https://phabricator.wikimedia.org/T366360#9927897 (ssingh) Open→Resolved a:ssingh This was rolled out to all 2166 hosts today that are now using `ntp-[abc].anycast.wmnet`. All traces of `ntp.anycast.wm...
[18:45:39] hmmm
[18:45:58] ottomata: what's up?
[18:46:14] seeing something unexpected?
[18:46:30] MediaWikiPingback events look like they come in pretty frequently
[18:46:37] but, none since 18:10 utc?
[18:47:11] hm, yeah something not good
[18:47:11] https://grafana.wikimedia.org/goto/M17SPpQIg?orgId=1
[18:47:13] investigating
[18:47:29] let me check the timing
[18:48:14] yep
[18:48:14] matches
[18:48:44] attempting logstash fu
[18:49:01] i am a white belt
[18:51:56] i can curl fine...
[18:52:10] and make dummy MWPingback events come through...
[18:52:38] sorry, not very useful with this but we can revert, push the patch and see if it helps
[18:52:46] that way we can at least narrow it down to this
[18:52:47] that should be OK
[18:53:18] k, gimme a couple mins...it'll be hard to repro this after revert, since my manual test works...
[18:53:31] i want to see if i can find the cause
[18:53:49] sure
[18:54:22] ottomata: it seems like there was a deployment there as well?
[18:54:31] ended at 18:17
[18:54:34] eh?
[18:54:35] hm
[18:54:36] i see
[18:54:42] 14:53:31 < jeena> Rolling back the train due to higher than normal DB query error rates
[18:54:48] I have no idea if related or not, will leave that to you
[18:55:08] hmmmmmmmmmm
[18:55:13] maybe i see it...
[18:56:06] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927953 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5019.eqsin.wmnet with OS bulls...
[18:56:22] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9927954 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5019.eqsin.wmnet with OS b...
[18:56:41] sukhe: yeah hm
[18:56:51] curl -v 'https://www.mediawiki.org/beacon/event?%7B%22schema%22%3A%22Test%22%2C%22revision%22%3A15047841%2C%22wiki%22%3A%22472d1e0653ccb71526c091ecd2ea4bbe%22%2C%22event%22%3A%7B%22OtherMessage%22%3A%22OttoTest%5Cu0020Message%22%7D%7D;';
[18:56:53] redirects
[18:56:56] 301
[18:57:07] curl -v 'https://www.mediawiki.org/beacon/event/?%7B%22schema%22%3A%22Test%22%2C%22revision%22%3A15047841%2C%22wiki%22%3A%22472d1e0653ccb71526c091ecd2ea4bbe%22%2C%22event%22%3A%7B%22OtherMessage%22%3A%22OttoTest%5Cu0020Message%22%7D%7D;';
[18:57:11] with / before query params
[18:57:12] works
[18:57:13] that was my test.
[18:57:20] the clients do not use the /
[18:57:25] why does / redirect!?
[18:57:26] grr
[18:57:56] ottomata: the regex
[18:58:13] no, i think the request is making it to mw.org?
[18:58:19] hmm
[18:58:29] < server: mw-web.eqiad.main-8456bb6d77-rp62h
[18:58:37] < x-cache: cp1112 miss, cp1112 miss
[18:58:37] < x-cache-status: miss
[18:58:59] the req is being handled by docroot /beacon/event/index.php
[18:59:09] so somehow a request to /beacon/event is redirecting to /beacon/event/
[18:59:41] hmmm a symlink might work? I think so? rather than relying on /index.php handling
[18:59:46] will try on mwdebug real quick
[18:59:52] ok :]
[19:00:00] (thank you for rubber ducking with me)
[19:01:56] ah no of course not, because without .php file ext it is a text file
[19:01:58] okay lets revert
[19:02:07] patch coming...
[19:02:14] ok!
[19:04:21] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[19:05:02] sukhe: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050027
[19:06:01] you still want to remove it on cp1100 I am assuming?
[19:06:29] yeah i suppose we should, i was going to keep it to test buuut we should just fully revert.... one sec
[19:06:50] ok
[19:07:22] k done
[19:08:38] running PCC
[19:10:09] rolling it out
[19:10:16] first one host then all in batches of 20
[19:10:27] thank you
[19:13:59] ottomata: rolling out everywhere else
[19:14:21] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[19:15:33] ack
[19:19:41] ottomata: all done
[19:20:02] coming back up
[19:23:12] sukhe: thank you
[19:23:12] https://grafana.wikimedia.org/goto/Pq4v8pwIg?orgId=1
[19:23:14] looks good
[19:23:25] that's enough breaking things for today
[19:23:37] might have to go back to the drawing board here :/
[19:23:51] i was trying to avoid special frontend routing here, but we might have to
[19:24:36] ok! sorry that our joy was short-lived but could be worse :)
[20:03:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:05:12] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928301 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5019.eqsin.wmnet with OS bulls...
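The `/beacon/event` redirect found between 18:56 and 18:59 can be modelled with a short sketch. It is illustrative only: `directory_redirect` and `DIRS` are hypothetical names, and the rule it encodes (a directory path without a trailing slash gets a 301 to the same path with the slash, query string kept) is an assumption about the cause. Web servers such as Apache with mod_dir's `DirectorySlash` behave this way by default, but the log does not confirm that this is what happened here.

```python
from typing import Optional, Set
from urllib.parse import urlsplit, urlunsplit


def directory_redirect(url: str, directories: Set[str]) -> Optional[str]:
    """Return the Location a 301 would point at, or None if no redirect.

    Assumption: the server redirects a directory path that lacks a
    trailing slash to the same path with the slash, keeping the query.
    """
    parts = urlsplit(url)
    if parts.path in directories and not parts.path.endswith("/"):
        return urlunsplit(
            (parts.scheme, parts.netloc, parts.path + "/", parts.query, parts.fragment)
        )
    return None


DIRS = {"/beacon/event"}  # hypothetical docroot directory holding index.php

no_slash = "https://www.mediawiki.org/beacon/event?%7B%22schema%22%3A%22Test%22%7D"
with_slash = "https://www.mediawiki.org/beacon/event/?%7B%22schema%22%3A%22Test%22%7D"

print(directory_redirect(no_slash, DIRS))    # the with_slash URL
print(directory_redirect(with_slash, DIRS))  # None
```

Under that assumption the manual test with the trailing slash is served directly, while the slashless form that clients send gets redirected first, which is consistent with the two curl results in the log.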
[20:08:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5019 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[20:48:37] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928552 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5020.eqsin.wmnet with OS b...
[21:13:47] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928630 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5020.eqsin.wmnet with OS bulls...
[21:14:08] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928633 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5020.eqsin.wmnet with OS b...
[22:12:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[22:17:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[22:19:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[22:23:00] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928841 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5020.eqsin.wmnet with OS bulls...
[22:24:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5020 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[22:26:29] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928853 (BCornwall)
[22:47:15] https://gerrit.wikimedia.org/r/c/operations/dns/+/1050075
[22:48:06] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928876 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage was started by brett@cumin2002 for host cp5021.eqsin.wmnet with OS b...
[23:12:17] FIRING: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[23:17:17] RESOLVED: SLOMetricAbsent: - https://alerts.wikimedia.org/?q=alertname%3DSLOMetricAbsent
[23:53:38] FIRING: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5021 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS
[23:56:49] Traffic, DC-Ops, ops-eqsin, SRE, Patch-For-Review: Q4: install PCIe NVMe SSDs into eqsin text cp50(1[789]|2[01234] - https://phabricator.wikimedia.org/T365763#9928977 (ops-monitoring-bot) Cookbook cookbooks.sre.hosts.reimage started by brett@cumin2002 for host cp5021.eqsin.wmnet with OS bulls...
[23:58:38] RESOLVED: [8x] LVSRealserverMSS: Unexpected MSS value on 103.102.166.224:443 @ cp5021 - https://wikitech.wikimedia.org/wiki/LVS#LVSRealserverMSS_alert - https://grafana.wikimedia.org/d/Y9-MQxNSk/ipip-encapsulated-services?orgId=1&viewPanel=2&var-site=eqsin&var-cluster=cache_text - https://alerts.wikimedia.org/?q=alertname%3DLVSRealserverMSS