[01:29:53] 10Traffic, 10MediaWiki-ResourceLoader, 10Operations, 10Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951291 (10Krinkle) 05Open>03stalled [02:01:33] 10Traffic, 10Analytics, 10Operations, 10Research, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3951340 (10Tgr) @Nuria the new config is live now (although it will only take effect gradually due to Varnish caching). Can you check if t... [02:02:17] 10Traffic, 10Continuous-Integration-Infrastructure, 10Operations: Lower varnish caching length on doc.wikimedia.org - https://phabricator.wikimedia.org/T184255#3951341 (10Legoktm) 05Open>03Resolved a:03Legoktm Yep! [02:16:56] 10Traffic, 10MediaWiki-ResourceLoader, 10Operations, 10Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951353 (10BBlack) @Krinkle - Seems sane to try. I do wonder if browsers will actually allow `Age: 0` to... [02:23:34] 10Traffic, 10MediaWiki-ResourceLoader, 10Operations, 10Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951362 (10Krinkle) 05stalled>03Open [02:24:41] 10Traffic, 10MediaWiki-ResourceLoader, 10Operations, 10Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#1448575 (10Krinkle) @BBlack Thanks for the quick response. I'll put move it out of blocked then. Will let... 
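The T105657 idea above can be sketched in a few lines of Python. This is an illustration only, with a hypothetical 5-minute max-age, not the actual MediaWiki/Varnish code: the point is that an Expires computed once at cache-fill time shrinks as the object ages in cache, while one computed per request always grants the client the full lifetime.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)  # hypothetical client-side TTL for load.php

def expires_at_cache_time(cached_at):
    # Current behaviour: Expires fixed when the object enters the cache.
    return cached_at + MAX_AGE

def expires_at_request_time(request_time):
    # Proposed behaviour: Expires relative to each individual request.
    return request_time + MAX_AGE

cached_at = datetime(2018, 2, 7, 12, 0, tzinfo=timezone.utc)
late_request = cached_at + timedelta(minutes=4)  # served from cache 4 min later

remaining_old = expires_at_cache_time(cached_at) - late_request
remaining_new = expires_at_request_time(late_request) - late_request
```

Here the late client gets only 1 minute of freshness under the old scheme versus the full 5 minutes under the proposed one; the open question in the thread is whether browsers actually honour an accompanying `Age: 0`.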
[02:24:47] 10Traffic, 10MediaWiki-ResourceLoader, 10Operations, 10Performance-Team: Expires header for load.php should be relative to request time instead of cache time - https://phabricator.wikimedia.org/T105657#3951365 (10Krinkle) a:05Krinkle>03None [11:13:23] hello people, I created https://gerrit.wikimedia.org/r/c/408773/ as an attempt to add ipsec from cp hosts to kafka-jumbo [11:13:34] let me know whenever you have time for a review :) [11:53:51] OMG polygerrit [11:54:01] it will take some effort to get used to it visually :) [11:54:12] indeed! [11:56:34] why is that the case every time when gerrit introduces a UI? [11:58:31] I imagine it is because our brain gets wired onto a certain gerrit way of doing things, and then we have to re-learn all of it again [11:59:21] when github would changes the UI I'm sure they get a lot of shit from users for instance [11:59:51] s/would//, that wasn't English ^ [12:00:30] elukey: is the comment here still valid, even in light of the new patch? https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/kafka/analytics.pp#L13 [12:01:10] or is the plan with the new jumbo cluster to fix the mirrormaker thing too? [12:02:54] ema: we need to migrate kafka-main (kafka for eventbus/jobqueues/etc..) 
to Kafka 1.0 first (as jumbo is), and then we'll be able to run mirror maker from jumbo hosts [12:04:39] mirror maker is basically a kafka consumer/producer, that reads from a cluster and replicates to another one [12:04:57] but since kafka main still runs 0.9, running it from Jumbo wouldn't work [12:05:17] because 0.9 and 1.0 don't talk to each other I guess [12:06:29] I think it is a matter of having a consumer/producer using two different api versions, but I'd need to recheck what was the issue that Andrew got when he tried to migrate mirror maker [12:07:24] there you go https://phabricator.wikimedia.org/T177216#3712901 [12:07:42] (at the time Jumbo was 0.11, now 1.0) [12:09:23] lol @ crap crackers [12:09:29] we are planning to migrate kafka main to 1.0 next quarter if everything goes as planned [12:09:32] ahahahha yes [12:13:22] elukey: so the patch says it adds ipsec config for cache<->jumbo communication, but it seems to be a noop on cache hosts? Shouldn't we get the jumbo hosts listed under /etc/ipsec.conf on those? [12:15:05] yep I think so, but I don't remember what kind of things are needed to be done other than the changes that I made [12:16:02] in theory in role::ipsec there is an explicit hiera call for "cache::ipsec::kafka::nodes" [12:16:14] or is this one of those cases where pcc thinks the change is a noop while in fact it is not? 
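The MirrorMaker description above ("basically a kafka consumer/producer") boils down to a pump loop. A minimal sketch with plain-Python stand-ins for the consumer and producer (the real tool uses the Kafka client APIs, which is exactly where the 0.9-vs-1.0 protocol compatibility problem bites):

```python
def mirror(consume, produce):
    # Pump every record from the source cluster into the destination.
    # `consume` stands in for a Kafka consumer and `produce` for a
    # producer; both are plain callables here, not real Kafka clients.
    copied = 0
    for record in consume():
        produce(record)
        copied += 1
    return copied

# Hypothetical records: (topic, payload) pairs.
source = [("webrequest", b"msg1"), ("webrequest", b"msg2")]
dest = []
n = mirror(lambda: iter(source), dest.append)
```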
[12:16:27] maybe we can check the catalog [12:19:32] kafka-jumbo hosts are only listed in known_hosts if I'm reading this right: https://puppet-compiler.wmflabs.org/compiler02/9878/cp1052.eqiad.wmnet/change.cp1052.eqiad.wmnet.pson [12:21:36] I tried to check in the full catalog traces of other kafka hosts but didn't find the ipsec config [12:22:01] yeah this is probably a case that trips out pcc [12:22:32] oh wait, but we've tried pcc on cp1052 and I don't see kafka nodes on eqiad cache hosts' ipsec.conf [12:22:53] don't shoot me, I didn't design the original ipsec-related classes :P [12:23:18] :) [12:23:22] https://github.com/wikimedia/puppet/blob/production/modules/role/manifests/ipsec.pp#L36 [12:24:04] are those done with exported resources? [12:24:14] kafka_nodes['eqiad'] is not among the array_concat in the eqiad branch (obviously) [12:24:32] elukey: maybe try pcc on a codfw cache host? [12:24:34] oh, yeah [12:24:40] that makes sense [12:24:45] try pcc from a cache host that's not in eqiad and thus actually uses ipsec to eqiad :) [12:25:23] but in general, the gerrit changeset looks sane to me on manual inspection [12:26:22] beware you'll probably have to rush it once you push it to avoid icinga spam. As in, get the agent run done on the jumbos and all the non-eqiad caches in a relatively short number of minutes [12:26:52] ema: oh right of course! sorryyyy [12:27:40] bblack: hi! so Jumbo first, then all the non eqiad caches [12:27:53] doesn't matter which is first, just all done in short time together. [12:28:32] all right [12:29:06] basically as soon as any one of the set of (all non-eqiad caches + affected jumbos) puppetizes, it will begin alerting icinga something's wrong until most of the rest are also done, with the problem reaching critical mass when about half of them are done. 
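The site logic being puzzled over above can be sketched like this (hypothetical data, not the real hiera structure): eqiad cache hosts never get the eqiad kafka nodes as ipsec peers, because only cross-DC traffic is encrypted, which is why pcc against cp1052 (an eqiad host) showed a noop.

```python
# Hypothetical stand-in for the cache::ipsec::kafka::nodes hiera data.
kafka_nodes = {"eqiad": ["kafka-jumbo1001.eqiad.wmnet"]}

def ipsec_peers(cache_site, kafka_sites=("eqiad",)):
    # A cache host only lists kafka nodes in *other* sites as ipsec peers.
    return [node
            for site in kafka_sites if site != cache_site
            for node in kafka_nodes[site]]
```

This matches the advice that follows: re-run pcc on a non-eqiad cache host to see the real diff.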
[12:29:36] I think it takes a few minutes before icinga actually hits IRC (like 7 failed checks) to smooth this out a little, but still [12:32:03] I can alert people before merging so it doesn't seem that the world is falling over if icinga starts to complain :) [12:32:38] yeah might be wise, stuff some warning in -ops "hey if you see any ipsec alerts, don't freak out" [12:33:54] elukey: I looked at your new pcc, it seems to only be configuring v4, I guess the jumbo hosts lack dns-configured (static-mapped) v6? [12:35:06] bblack: you are right, I thought Andrew already did it but I was wrong [12:35:51] that is the interface::add_ip6_mapped { 'main': } right ? [12:36:07] your change as-is will "work", but if for some reason the nodes connect over v6 it will be unprotected [12:36:42] I guess if the (TCP-level) connection is always cache->kafka-jumbo, they'd never see the v6 since it's missing in DNS. But if they could connect the other way around it would be an issue. [12:37:28] I am not rushing and I really want to do things properly, so I'd add interface::add_ip6_mapped { 'main': } first, roll it out and then re-run pcc to see if we are good [12:37:48] ok [12:38:00] yeah looks like the old nodes configure it that way in role::kafka::analytics [12:38:29] maybe add to role::kafka::jumbo::broker ? 
[12:38:41] (and also add entries to DNS) [12:39:35] I can go do the DNS-side patch, I'm familiar with it anyways [12:40:11] if you have time it would be really great [12:42:02] https://gerrit.wikimedia.org/r/c/408795/ - jenkins doesn't like it of course but I can bypass [12:42:53] going to check the ip6 traffic on jumbo though [12:48:20] on jumbo1001 I can see only neighbor solicitation, some multicast traffic (from/to non global ip6 addresses) and RA [12:48:36] so it shouldn't get upset by this change afaict [12:50:35] the sorting of our ipv6 revdns lists is awful lol [12:50:49] I was going to fix it first where I saw issues, but apparently that's a rather large project :P [12:51:35] elukey: one thing that concerns me is the jumbos are stretch. I'm sure (hope?) we've already done the ip6-mapped thing on stretch somewhere. [12:51:48] I'm less-sure we've applied the ipsec stuff on stretch anywhere yet [12:52:21] pvoid tends to say "it's all ok because the jessie<->stretch diff is relatively-small", so I'm probably just being paranoid. [12:52:49] but something to keep an eye out for, maybe check if we've already got other (non-cache) ipsec hosts on stretch [12:54:09] I can run it on one kafka-jumbo host and triple check, worst that can happen is that I need to rollback. The cluster is not handling important traffic at the moment and it can tolerate a one-node failure [12:54:17] what do you think? [12:54:46] for the ipsec part I would not be so sure to be honest [12:55:02] (I mean, I wouldn't be so inclined to just merge and see :) [12:58:21] (going to grab lunch quickly and then I'll be back) [12:58:31] ok [13:05:05] elukey: DNS part @ https://gerrit.wikimedia.org/r/#/c/408801/ (I'd deploy the ip6 mapped part first and see that the IPs look ok in practice on the hosts, before touching DNS, just in case. Also useful validation of the DNS patch that its IPs match the ones picked by the add_ip6_mapped) [13:23:42] back again! 
[13:24:11] bblack: so going to merge my change and run puppet on kafka-jumbo1001 so we can see the end result [13:24:57] ok [13:36:40] bblack: done! [13:37:33] elukey@kafka-jumbo1001:~$ ip addr | grep 2620:0:861:101:10:64:0:175 ---> inet6 2620:0:861:101:10:64:0:175/64 scope global mngtmpaddr dynamic [13:40:22] tried to ping6 it from kafka1012, works fine [13:50:50] looks good :) [13:51:31] all right, rolling out the rest then :) [13:54:45] IP6 2620:0:861:102:1a66:daff:fefc:ccbc.59048 > puppetmaster1001.eqiad.wmnet.puppet: [13:55:00] so one thing goes over ip6, puppet! [13:56:44] yeah probably anything where the kafka-jumbo host initiates (does the DNS lookup) and mapped-v6 is defined in DNS for the destination host [14:08:57] the dns changes look good, shall I merge + authdns-update? [14:12:55] elukey: +1 yeah [14:20:06] done! [14:21:44] looks good in my limited testing/looking [14:23:04] so, moving on to the ipsec part... [14:23:21] the templates for ipsec.conf use my favorite function: scope.function_ipresolve [14:23:35] which IIRC resolves on the puppetmaster during compilation [14:24:10] and the eqiad DNS caches almost certainly already had the kafka-jumbo hostnames [14:25:01] we can fix that pretty easily on prod by forcing those 6 hostnames out of the powerdns caches, then the real puppetmaster would compile the new ipsec changeset with the appropriate ipv6 stuff in place [14:25:17] I'm not exactly sure how we flush DNS like that for the puppet-compiler hosts so that it would show the same [14:26:47] in any case, the TTL is 1h, so in theory it would fix itself "naturally" in both cases 1h after the authdns-update [14:27:46] (or maybe if you're lucky, sooner. worth trying anyways) [14:28:57] I like the dns cache wipe approach! [14:29:55] it will work for prod if everything else is fine, it just (maybe? probably?) 
won't affect the puppet-compiler hosts, and thus pcc won't reflect that v6 is fixed in the ipsec.conf changes [14:30:05] (until 1h, and/or if we're lucky, who knows) [14:32:12] while I get the obvious utility factor of "ipresolve", it's really a nasty hack that shouldn't exist. Maybe once we have IPAM we can do some sort of export from IPAM -> hieradata or something else of similar effect, and then not have compilation depend on runtime DNS lookups on the compiler. [14:34:24] or a cumin query to gather them from the IPAM :D [14:35:11] that sounds loopy too lol [14:36:06] the stuff ipresolve() is used for really should be global "facts" of some kind, at least in the hieradata sense. [14:37:21] ema: https://gerrit.wikimedia.org/r/#/c/408810/ ? It's my best stab at what might resolve T181315 so far. I have yet to find any solid indicator that this is driven by anything other than some kind of 24h expiry stampede. [14:37:21] T181315: Varnish HTTP response from app servers taking 160s (only 0.031s inside Apache) - https://phabricator.wikimedia.org/T181315 [14:38:01] ema: (the syntax is tested btw, works on real v4 + v5 servers) [14:45:04] bblack: https://puppet-compiler.wmflabs.org/compiler02/9884/cp2002.codfw.wmnet/ looks good :) [14:45:40] (didn't wipe anything, still in a meeting) [14:55:37] elukey: awesome [14:56:04] elukey: checked via cumin, and this will be the first case of C:strongswan + F:lsbdistcodename = stretch :) [14:58:27] bblack: if you have time I could merge https://gerrit.wikimedia.org/r/c/408773/ and then run puppet on kafka-jumbo, meanwhile you run it on the non-eqiad caches? [14:58:54] elukey: well, we can hit it all in one go with cumin, but... 
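As an aside on the mapped-v6 address verified on kafka-jumbo1001 above: interface::add_ip6_mapped derives the v6 address by reusing the host's IPv4 octets verbatim as the last four IPv6 groups. A sketch of that scheme (assumption: the /64 prefix comes from the host's subnet; the real derivation happens in the puppet define, not in code like this):

```python
def mapped_v6(prefix64, v4):
    # Each IPv4 octet becomes one IPv6 group, digits reused verbatim
    # (decimal digits are all valid hex, so the result always parses).
    return prefix64 + ":" + ":".join(v4.split("."))

# 10.64.0.175 is kafka-jumbo1001's v4 address, per the ip addr output above.
addr = mapped_v6("2620:0:861:101", "10.64.0.175")
```

This is what makes the mapped addresses human-decodable at a glance, and what the DNS-side patch has to reproduce exactly.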
[14:59:11] 10HTTPS, 10Traffic: Wikimania.org uses an invalid security certificate - https://phabricator.wikimedia.org/T186717#3952715 (10Stryn) [14:59:17] oh yes even better, I didn't want to mess with cache hosts :) [14:59:18] we should take a minute and stare at the stretch+ipsec case a bit, in case there's anything obviously-borked about it [14:59:27] bblack: the patch lgtm, I wonder whether we should do something similar for keep too though. Or is the theory that TTLs going to 0 all roughly at the same time would cause a surge in IMS requests down the cache stack and that might have something to do with the issue? [15:00:09] elukey: jessie uses strongswan 5.3.0-1+wmf2, and stretch has 5.5.1-4+deb9u1 . Not sure what our wmf mods are, or what kind of diffs we might care about on upstream 5.3->5.5 [15:00:29] * volans loves to see people doing complex query syntaxes and learning from the error message (hopes it was useful) [15:01:06] volans: yeah I did 'P{C:strongswan} and P{F:lsbdistcodename = stretch}' (or jessie) to look. [15:01:41] I tried without P{} first and it told me to use the "global grammar", which wasn't very obvious, but I eventually sort of figured out what to do on wikitech cumin page, figuring the P{} and similar wrappers must be the global grammar. [15:02:34] yeah, saw that from the logs :-P any suggestions to make that more obvious? [15:04:30] our jessie package of strongswan was just a backport since Jeff needed some feature which was only in 5.3 or so (and IIRC +wmf2 only backports some security fixes), other than that we should not have any modifications towards Debian [15:04:33] * volans s/General/Global/ on wikitech page in the meanwhile [15:05:48] and Debian.NEWS only mentions older versions, so probably nothing too disruptive in 5.5 over 5.3 [15:06:16] are the strongswan versions incompatible? 
[15:06:45] nono we are just sanity checking that we are good before proceeding [15:06:52] on paper it should be fine for us [15:10:46] it's probably worth checking the changelog, though [15:11:31] bblack: forget about the keep comment, TTL+grace+keep is when objects become eviction candidates, so swizzling ttl is enough [15:11:40] Debian.NEWS lists the disruptive changes which require admin intervention [15:12:17] but OTOH if strongswan was somehow incompatible with the strongswan version in oldstable that would be quite a massive thing and likely mentioned in Debian.NEWS [15:13:00] ema: yeah, I'm thinking the VCL incantation to do it for all TTLs isn't that bad, either, gonna rework so it affects <1d natural TTLs, too [15:18:03] hi :) [15:18:08] hi there :) [15:18:20] there's less bot and other misc chatter here than in -operations [15:18:34] pre-welcome to the team :) [15:19:47] heeeey vgutierrez! [15:19:51] welcome [15:21:08] Hola vgutierrez! I'm sorry I failed to say hello the other day in person, I heard you were around only later on... [15:25:08] don't worry about that :D [15:25:39] vgutierrez: hello! [15:29:29] yey! hi elukey. I'm eager to start on Monday.. it's been already 2-3 months since the interviews! [15:30:15] ema: updated https://gerrit.wikimedia.org/r/#/c/408810/ [15:32:26] oops, I left in my debug header output :) [15:32:32] bblack: I was about to ask :) [15:32:41] also, the nice commit message is gone [15:32:58] well I replaced it with a nice comment in the code, seems better :) [15:36:28] well, I often find it useful to have the commit log explaining a bit more about the change, but YMMV! [15:37:01] yeah I guess we differ there. I'm in general naturally-prone to writing huge commitlog explanations for a very short code change. [15:37:30] but I've recognized it as an anti-pattern. then it's hard to understand the code when reading the code, and you always have to history-dive git to find out why things work the way they do. 
[15:38:00] seems better to summarize what would've been in the extra-long commitlog in an actual code comment, making both the diff and the final state of the code more self-explanatory. [15:40:48] fair enough [15:40:50] bblack: VTC test case? :) [15:45:39] ema: that would be interesting to write. I mean, I guess we could test that it's within the 5% bounds, but if we test that it's not equal to the original fixed ttl the test will randomly fail sometimes when std.random picks exactly 1.0 [15:47:10] bblack: yep, checking if it's within the bounds seems like the way to go [15:48:25] and the usefulness there I guess is catching a case where something in VCL parsing/compilation changes the meaning of = * std.random (which should return a real) [15:48:50] which is not entirely unlikely, there have been odd changes in type conversions and operators before [16:12:43] any thoughts about the ipsec 5.5 vs 5.3 doubt? [16:13:54] elukey: I dug through it a bit just now, and it seems reasonably sane. possibly an improvement, at least once both sides are eventually on 5.5. [16:14:28] elukey: what I'm just starting to stare at now is the whole ferm/iptables issue. I think we've done ferm+ipsec somewhere before, I'm not sure if it's automagic when you include the ipsec role. [16:15:50] bblack: iirc when we added kafka1023 we didn't have to touch ferm/iptables [16:19:26] elukey: modules/role/manifests/kafka/analytics/broker.pp: include ::ferm::ipsec_allow [16:19:41] ^ we need that on the jumbo brokers in whatever appropriate place, in the ipsec patch [16:20:42] should that go into the ipsec role somehow? [16:20:46] bblack: snap, thanks a lot [16:22:31] ottomata: no, because all the other ipsec consumers don't use ferm [16:23:01] I guess we could do some kind of "if ferm" conditional, I donno. 
I'd assume it just pulls in ferm in general if it wasn't present, or fails [16:25:15] volans: oddly, I don't get any hosts for 'C:strongswan and C:ferm::service', although I'm pretty sure hosts like kafka101.eqiad.wmnet have both [16:25:16] mmk [16:25:43] kafka1001 probably does not have strongswan, but kafka1012 should maybe? [16:25:59] kafka1012 should have both i'd think [16:26:10] kafka1012 is spared/decommed or something I think [16:26:29] bblack: interesting, but I cannot even with the global grammar [16:26:33] oh no, that's kafka1018 I'm thinking of [16:26:56] wait ferm service is not a class ;) [16:27:16] oh, well C:ferm is though [16:27:41] 50 hosts with 'P{C:strongswan} and P{C:ferm}' [16:28:17] 130 with 'P{C:strongswan} and P{R:ferm::service}' [16:28:25] ah ok [16:28:42] the latter seems "wrong" at some level, maybe not the cumin level heh [16:28:53] to me too [16:29:20] 10HTTPS, 10Traffic, 10Operations: Wikimania.org uses an invalid security certificate - https://phabricator.wikimedia.org/T186717#3953055 (10Stryn) 05Open>03declined See T133548 [16:29:25] now, if you do 'C:strongswan and C:ferm' it doesn't return anything, while it should give you the same error for Facts and Resources mix [16:29:38] because puppetdb API v3 allows for only one resource to be queried per query [16:29:43] ah! [16:29:50] and that's on me not "detecting" it [16:30:45] in any case: [16:30:47] API v4 should work, at least in theory, we'll see soon [16:30:51] 'P{C:strongswan} and P{C:ferm} and P{C:ferm::ipsec_allow}' -> 50 hosts [16:31:00] 'P{C:strongswan} and P{C:ferm} and not P{C:ferm::ipsec_allow}' -> 0 hosts [16:31:13] so ipsec_allow is consistently applied where it needs to be so far [16:31:15] ottomata: would https://gerrit.wikimedia.org/r/#/c/408773/5/modules/profile/manifests/kafka/broker.pp be ok for this use case? 
[16:31:20] nice [16:32:30] elukey: seems like it would be saner to put ferm::allow_ipsec together with ipsec itself (which is not the case for the old kafka1012-style brokers) [16:32:44] elukey: am fine with that ya [16:32:57] where are you including the ipsec class though right now? [16:32:59] in the role? [16:33:05] maybe put those two together? [16:33:42] the kafka1012 cluster was using this to put them together: [16:33:43] modules/role/manifests/kafka/analytics.pp: include ::role::ipsec [16:33:43] modules/role/manifests/kafka/analytics/broker.pp: include ::ferm::ipsec_allow [16:34:03] it seems like they should be together in one place or another, but I donno what other constraints might be going on there [16:34:36] the current jumbo+ipsec patch puts ::role::ipsec in: modules/role/manifests/kafka/jumbo/broker.pp [16:34:37] yeah, the linter is crazy these days [16:35:02] roles can only include profiles or other roles maybe? profiles cannot include roles? [16:35:27] there should be one single role on each node that includes only profiles [16:35:34] elukey: i'd put the ferm include in the role with your # temp comment and skip past the linter [16:35:38] and the things we used to call roles are usually all profiles [16:35:42] and should be renamed [16:35:44] yayaya [16:35:46] but they aren't all renamed [16:35:48] either way, both role::kafka::analytics and role::kafka::analytics::broker are roles, so the rules don't change whichever file it's in [16:35:52] so sometimes you have to skip past the linter to do things [16:35:58] yea, there is a chicken-egg problem often, i feel ya [16:36:09] gotta find somewhere to start [16:36:51] ok so all in the role with a temp comment [16:36:56] at least the "include ::apache" stuff can be fixed now, heh [16:38:07] elukey: yeah I'd say leave existing stuff alone and just put them together in jumbo/broker.pp for now. It all needs refactoring anyways. 
[16:42:55] I explicitly added the class, it pleases the linter but it is a trick of course [16:42:58] https://gerrit.wikimedia.org/r/#/c/408773/6/modules/role/manifests/kafka/jumbo/broker.pp [16:43:16] running pcc [16:43:35] so, there is an actual behavior difference there in theory, between "include" (include-like) and "class" (resource-like) in puppet terms [16:43:53] but in practice we seem to already be using it both ways without issue [16:43:56] modules/profile/manifests/redis/multidc.pp: class { '::ferm::ipsec_allow': } [16:43:59] modules/role/manifests/kafka/analytics/broker.pp: include ::ferm::ipsec_allow [16:44:27] yes I am aware of the difference, I thought it was the same for this use case since we only include it once anyway (but I could be wrong) [16:44:36] since the class has no parameters and it doesn't get included twice, yeah I think it doesn't matter [16:44:39] just noting! :) [16:44:53] oh yes please do! If it wasn't for you I'd have forgotten 100 things :) [16:47:37] so I think we look good to deploy. It's difficult to do a test where we just get 1 broker + 1 cache to see any obvious breakage before it hits the rest and/or alerts icinga [16:47:52] (so difficult that it's probably not worth the effort, vs applying quickly and reverting quick if issue) [16:47:52] checked https://puppet-compiler.wmflabs.org/compiler02/9888/kafka-jumbo1001.eqiad.wmnet/, it contains Ferm::Rule[ferm-ipsec-esp] [16:48:04] agreed [16:49:05] ok so I'd say we are good to merge [16:50:53] ya bblack no prod uses of vk + jumbo yet, so ya [16:50:56] +1 merge [16:52:11] elukey: if you want I can try to control the carnage a bit and handle the deploy. 
puppet-disable all the affected hosts, puppet 1x broker + 1x cache manually and see the connection go alive, then release->puppetize the rest [16:52:23] (all quick enough for no icinga, hopefully) [16:53:35] if something makes it just not work at all, at least then we're limiting to up to 2x ipsec alerts -> revert [16:54:26] bblack: sure, I am currently checking a good cumin alias [16:54:47] I was naively testing A:cp-codfw and A:cp-ulsfo and A:cp-esams and A:kafka-jumbo [16:55:12] that sounds about right, I didn't think of A:kafka-jumbo and tried something dumb :) [16:55:13] * volans will not say publicly that he is secretly working on adding autocomplete to cumin, including aliases [16:55:14] * elukey waits for Riccardo [16:55:19] hahaha [16:57:06] elukey: puppet disabled, ready? [16:57:14] ready [16:57:23] going to merge then [16:57:24] well, one host is being slow, sec [16:57:34] ok done [16:58:52] bblack: merged [17:00:36] do you want me to run puppet on one kafka jumbo + cp codfw host? [17:00:47] (don't want to step on your toes) [17:00:48] already done, looks good, running the rest [17:01:00] hahaha ok I should have guessed that [17:01:03] thanks a lot :) [17:02:47] root@kafka-jumbo1001:~# /usr/local/lib/nagios/plugins/check_strongswan [17:02:50] Strongswan OK - 114 ESP OK [17:02:55] root@cp2001:~# /usr/local/lib/nagios/plugins/check_strongswan [17:02:55] Strongswan OK - 68 ESP OK [17:03:12] that doesn't confirm every pair, but confirms every affected host is functional for at least some pairings, which is close enough [17:03:15] should be ok! 
:) [17:03:28] \o/ [17:06:04] all affected hosts have been re-checked by icinga since the puppet change rolled out, too, all OK [17:06:28] well except kafka-jumbo themselves, because icinga hasn't gotten around to adding the NRPE check for them yet [17:06:44] but all the cache-side hosts have, so they're all validating they're connecting to all the brokers anyways [17:07:02] this is massive, we are now unblocked to make the migration, thanks a ton bblack! [17:07:11] np! [17:13:38] volans: https://wikitech.wikimedia.org/w/index.php?title=Cumin&type=revision&diff=1781936&oldid=1781700 looks perfect, would've made the lookup easier :) [17:14:24] I realized it was using a different naming when you mentioned it, and fixed :) [17:14:28] thanks for telling me [17:22:20] 10netops, 10Cloud-Services, 10Operations: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#3953299 (10brion) In the middle of the week it seems less congested than on the weekend, still on the same route. Seeing up to 32 megabits downl... [17:54:02] 10Traffic, 10Maps-Sprint, 10Operations: Decide on Cache-Control headers for map tiles - https://phabricator.wikimedia.org/T186732#3953401 (10Gehel) [17:57:18] 10netops, 10Cloud-Services, 10Operations: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2343755 (10ayounsi) What speed/path do you get for example between your home and your linode server? Can you also try for example: https://uplo... [18:19:26] 10Traffic, 10Analytics, 10Operations, 10Research, and 6 others: Referrer policy for browsers which only support the old spec - https://phabricator.wikimedia.org/T180921#3953540 (10Nuria) Will do, let's let it bake a bit and i shall check. [18:49:02] ema: https://gerrit.wikimedia.org/r/#/c/408810/ updated with working VTC for your amusement tomorrow. 
If you can double-check that this shouldn't cause awfulness and push it in your AM, we'll have time to see some positive impact before the week is out, if any. [19:16:23] 10Traffic, 10Operations, 10Page-Previews, 10RESTBase, and 2 others: Cached page previews not shown when refreshed - https://phabricator.wikimedia.org/T184534#3953710 (10Volans) [19:17:19] 10netops, 10Cloud-Services, 10Operations: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#3953714 (10brion) I get a full 150 megabits download (my bandwidth cap) on that file from ulsfo, and about 100 megabits from my Linode server (t... [19:35:21] still no large mailbox ramps in ulsfo since the upgrade. no significant spikes either, just those single-digit floor values creeping in, and occasionally tiny spikes into the low double digits. [19:35:35] ~1.5 days to go though [19:37:09] (until the first of them reach ~7d uptime and restart again. I guess really it will take nearly two full weeks to see them all hit their maximal uptimes, in the case that some of it's chash-destination-specific *and* uptime-sensitive)