[06:40:30] Amir1: looks good, adding a VTC test
[06:41:21] testing with https://wikitech.wikimedia.org/wiki/Varnish#Configuration
[07:17:22] Amir1: merging later today, thankyou! (pun intended)
[07:45:35] netops, Operations, ops-codfw: (Need by: ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (ayounsi) Please make sure to update the cables in Netbox: https://netbox.wikimedia.org/dcim/devices/2133/ Now that the switches are managed by Homer/Netbox there are some...
[09:21:32] ema: well played :D
[09:21:35] Thanks!
[09:42:06] hello! I have a service I'd like to move to lvs_setup (https://gerrit.wikimedia.org/r/c/operations/puppet/+/619800). is now a safe/sensible time to do a pybal restart as per https://wikitech.wikimedia.org/wiki/LVS#Configure_the_load_balancers?
[10:06:59] better today than tomorrow :)
[10:18:35] https://puppet-compiler.wmflabs.org/compiler1001/24481/ pcc looks happy as expected
[10:18:54] hnowlan: you'll need to restart pybal on the low-traffic and secondary load balancers
[10:19:36] so, lvs1015, lvs1016, lvs2009 and lvs2010 :)
[10:37:08] vgutierrez: cool, thanks! The state change itself looks okay?
[10:37:24] I know it's just one line but the docs have the fear in me :D
[10:37:34] I'd say so :)
[10:42:36] alright, going to merge and run puppet on the lvs servers so
[10:44:15] actually vgutierrez would you mind +1ing the change if you have a sec please?
[10:46:24] thanks!
[11:01:05] all done, gonna restart the pybals on that list
[11:10:16] done! which active loadbalancer should I be restarting now?
[11:20:03] oh wait I get it, lvs1015 and lvs2009 *are* the low-traffic ones and lvs1016 and lvs2010 are the secondary ones
[11:42:49] I think I did something wrong at some point :) I get no route to host when querying it on the configured port (https://api-gateway.svc.eqiad.wmnet:8087/healthz) Any suggestions for how I might best debug what I broke?
[12:47:04] sorry..
lunch break :)
[12:47:08] let's see...
[12:47:49] no worries!
[12:47:56] a tcp traceroute looks very odd compared to something like mobileapps
[12:50:32] so.. are we sure that the k8s cluster allows traffic to 10.2.2.55?
[12:50:56] pybal / ipvsadm looks good
[12:51:16] the servers listed for 10.2.2.55 are responding on port 8087
[12:51:42] but they need to allow incoming traffic for 10.2.2.55 as well
[12:51:48] I'm not 100% sure how to check that on a k8s host
[12:53:10] ohhh hmm good question
[12:57:39] ohh looks like there might be some extra config that the doc doesn't have
[12:59:41] hnowlan: hieradata/role/common/kubernetes/worker.yaml
[12:59:52] I'd say your service should be listed under profile::lvs::realserver::pools:
[13:02:06] oh, nice
[13:02:23] I'll add that to the doc
[13:03:18] https://gerrit.wikimedia.org/r/c/operations/puppet/+/619996 if you have a sec please
[13:06:06] hnowlan: yup, that does the trick.. see https://puppet-compiler.wmflabs.org/compiler1003/24482/ and how 10.2.2.55 is being added to LVS_SERVICE_IPS in kubernetes1005
[13:06:32] ah awesome
[13:06:34] FFS...
[13:06:44] I abandoned yours instead of my CR by mistake
[13:06:48] /o\
[13:07:01] heh no big deal
[13:07:22] restored and +1ed
[13:07:23] go ahead
[13:11:31] thanks!
[13:16:27] vgutierrez: it worked!
Thanks a lot for all the help
[13:16:36] nice :D
[14:05:32] oh, nice, good find
[14:05:58] I figured it was something like that when I glanced at this before an appt this morning but didn't have time to dig in and didn't know offhand where the k8s worker configs lived in puppet
[14:06:23] hnowlan: in general it's quite hard to diagnose these issues without looking at tcpdumps or other such things, it's the nature of LVS-DR
[14:06:40] the fact that backend health probing happens over a separate path than the actual data path *also* does not help
[14:08:42] heh I can imagine
[14:08:55] yeah it can be a little bit tricky
[14:09:05] Some Day(tm) we'll fix that
[14:09:11] I wasn't even sure where to look beyond my mental copy of Baby's First TCP Problem
[14:09:18] does volans need to bring the socket checker again? :-P
[14:09:34] ahaha
[14:56:20] netops, Operations: Standardize VRRP group IDs - https://phabricator.wikimedia.org/T260363 (ayounsi) p:Triage→Low
[15:04:03] netops, DC-Ops, Operations, ops-eqiad, cloud-services-team (Hardware): (Need By: 2020-06-12) rack/setup/install WMCS 10G switches - https://phabricator.wikimedia.org/T251632 (ayounsi) I believe: > Also em0, the management ports don't have their cable info in Netbox. Is the last thing to do h...
[16:11:34] hello again, I have more questions I'm afraid. I'm close to having everything in place for my new LVS service. My next task is to make it available to the open internet at api.wikimedia.org. I've read https://wikitech.wikimedia.org/wiki/Global_traffic_routing and I was wondering if there's any additional process required to get approval and/or config outside of ATS and setting varnish to pass
[16:11:40] uncached?
[16:11:45] I'm aware this is probably a big question, heh
[16:13:15] hnowlan: we have surprisingly little process here :)
[16:14:16] it's possible you should have already asked for legal/privacy reviews a while ago, although if you're not doing any extra logging yourself (using things other than logstash and webrequest analytics), they're generally no-ops
[16:14:29] did you add the DNS name to the external zones?
[16:14:37] likely you just want a CNAME to dyna
[16:15:23] I have a CR for it but it's not merged yet
[16:17:24] oh, I think you're talking about https://gerrit.wikimedia.org/r/c/operations/dns/+/619798 ? assuming you used the discovery name in the traffic backend configuration, yes that will need to get merged first
[16:17:34] but I meant editing operations/dns // templates/wikimedia.org
[16:17:51] which should be a separate patch
[16:19:48] I meant this one (but the other one needs doing too) https://gerrit.wikimedia.org/r/c/operations/dns/+/599273
[16:20:05] ah I missed that somehow
[16:20:16] ah it's old, and technically it's not my CR but I'm going to take over driving it
[16:21:56] but yeah -- dyna.wikimedia.org resolves to the public IP for the text-lb LVS IP, which then routes to the cp-text nodes, which are configured via the files mentioned on the Global traffic routing wikitech page
[16:22:06] I think you should be set after that
[16:22:36] fantastic, thanks a lot!
[16:24:26] np :)
[16:41:57] Could I borrow a +1 on https://gerrit.wikimedia.org/r/c/operations/dns/+/619798 if anyone has a sec?
[16:43:49] you did the operations/puppet conftool-data/ and hieradata/ patch already right?
[16:44:16] (I think I stamped those...)
[16:44:49] yeah, you did :)
[16:45:06] hnowlan: have you merged DNS patches before?
[16:46:31] yep! (as of today, heh)
[16:46:36] ok cool :)
[16:46:59] thanks for the review
[17:05:14] ok - last humble review request, I promise.
after this one I will never* ask for another review (today) https://gerrit.wikimedia.org/r/c/operations/puppet/+/620067
[21:37:42] HTTPS, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): Set "https_upgrade" configuration flag for domainproxy to enforce HTTPS upgrade for GET|HEAD requests - https://phabricator.wikimedia.org/T120486 (bd808) >>! In https://gerrit.wikimedia.org/r/620122, @bd808 wrote: > We really never...
[21:45:09] HTTPS, Cloud-VPS, Patch-For-Review, cloud-services-team (Kanban): Set "https_upgrade" configuration flag for domainproxy to enforce HTTPS upgrade for GET|HEAD requests - https://phabricator.wikimedia.org/T120486 (bd808) Here's my strawman timeline and communications plan: * week of 2020-08-17: **...