[07:46:51] <_joe_> I need a quick sanity check on my latest php nugget
[07:46:57] <_joe_> https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589541
[07:47:14] <_joe_> any takers? :P
[09:21:29] <_joe_> XioNoX: ema and I have a quiz for you
[09:21:56] bring it on
[09:25:06] <_joe_> so, HTCP goes via UDP
[09:25:32] <_joe_> and I would expect this to limit the amount of data we can send to the caches, specifically the length of a URL
[09:25:48] <_joe_> to less than the MTU we use in production
[09:26:07] <_joe_> but in practice, ema was able to purge a URL that is more than 2k bytes long
[09:26:09] or actually, is IP fragmentation saving the day?
[09:26:33] <_joe_> ema: can you paste the tcpdump capture you had?
[09:26:52] sure
[09:26:52] tcpdump -n -v udp port 4827 and host 239.128.0.112 and greater 1500
[09:26:58] tcpdump: listening on eno1, link-type EN10MB (Ethernet), capture size 262144 bytes
[09:27:01] 09:16:25.191816 IP (tos 0x0, ttl 6, id 44466, offset 0, flags [+], proto UDP (17), length 1500)
[09:27:04] 10.64.16.77.47889 > 239.128.0.112.4827: UDP, bad length 2102 > 1472
[09:28:01] all other packets have `flags [DF]` Don't Fragment I guess?
[09:28:14] so maybe the [+] is fragmentation
[09:28:26] what do you mean by other packets?
[09:28:53] other packets captured with `greater 10` instead of `greater 1500`
[09:29:13] eg:
[09:29:19] 09:27:35.909623 IP (tos 0x0, ttl 6, id 2958, offset 0, flags [DF], proto UDP (17), length 125) 10.64.16.67.42720 > 239.128.0.112.4827: UDP, length 97
[09:29:23] <_joe_> ema: I'm also in awe that the IP stack accepts a packet with bad length
[09:29:35] DF is... so yeah, DF is the don't fragment flag
[09:29:58] wait
[09:30:20] + is reported if MF is set, and DF is reported if F is set
[09:30:51] what does 'F is set' mean?
[09:31:10] (I suppose MF means More Fragments)
[09:32:45] _joe_: the bad length is maybe because the NIC does the reassembly
[09:32:58] <_joe_> XioNoX: ok, that makes sense then
[09:33:13] still thinking about the flags
[09:33:20] yes, you can send > 1500 length packets because of IP fragmentation/reassembly
[09:33:35] and yes, NICs (often) do fragmentation/reassembly in hardware
[09:36:36] <_joe_> yeah I was thinking of the fact you can't really do that over the internet reliably
[09:36:45] you can
[09:37:19] ema: can you send me a pcap? I'll look at it in wireshark as the tcpdump flags confuse me
[09:37:21] <_joe_> right, but at that point why not use tcp instead
[09:39:20] XioNoX: cp2009:/home/ema/x.pcap
[09:41:48] ema: so yeah, the + means MF (More Fragments) as in, expect more fragments to follow
[09:43:55] and the one you sent me doesn't have the bad length
[09:44:18] mmh, maybe it's just tcpdump saying so
[09:44:23] tcpdump -r x.pcap
[09:44:25] reading from file x.pcap, link-type EN10MB (Ethernet)
[09:44:28] 09:38:54.554415 IP mwmaint1002.eqiad.wmnet.54496 > 239.128.0.112.4827: UDP, bad length 2102 > 1472
[09:48:52] ema: what issue are you trying to debug?
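The numbers in the capture fall out of simple arithmetic, and the behaviour under discussion is easy to reproduce. A sketch (assuming the standard 1500-byte Ethernet MTU): the first IP fragment carries 1472 bytes of UDP payload, while the UDP header in that fragment declares the full 2102-byte length, which is what tcpdump reports as "bad length 2102 > 1472". The send/receive part runs over loopback, where the MTU is much larger, so no real fragmentation happens; on a real NIC the same send would produce the fragments seen above.

```python
import socket

# Fragment arithmetic for the capture (assuming a 1500-byte MTU):
MTU, IPV4_HDR, UDP_HDR = 1500, 20, 8
first_frag_payload = MTU - IPV4_HDR - UDP_HDR
assert first_frag_payload == 1472
# The UDP header travels only in the first fragment and declares the full
# payload length (2102), hence tcpdump's "bad length 2102 > 1472" when it
# dissects that fragment on its own.

# A datagram larger than the MTU is still delivered whole: the IP layer
# fragments and reassembles, and the application never sees fragments.
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))
port = rx.getsockname()[1]

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b"x" * 2102, ("127.0.0.1", port))  # same size as the purged URL

data, _ = rx.recvfrom(65535)
print(len(data))  # 2102
```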
[09:49:45] yeah this is normal
[09:49:49] your pcap only has the first fragment
[09:49:58] so reassembly fails
[09:50:12] XioNoX: it's Friday, there are no issues on Friday
[09:50:36] _joe_ and I are just having some fun
[09:51:01] hahaha
[09:51:04] <_joe_> yeah that "bad length" puzzled us :)
[09:51:09] let me know if I can help more
[09:51:16] <_joe_> and we hoped you could teach us about it :P
[09:51:17] the *UDP* header has a length field
[09:51:21] of the entire payload
[09:51:31] the UDP header is in the first fragment
[09:51:54] so tcpdump dissects that, figures out to expect a UDP payload of 2102
[09:52:06] <_joe_> ok, that makes sense
[09:52:10] the pcap has only the first *IP* fragment, which contains only one part of the payload (1472)
[09:52:28] note that unlike tcpdump, wireshark refuses to even dissect L4 until it sees all fragments
[09:52:35] <_joe_> 1472 being mtu - ip header - udp header
[09:52:38] (it's probably configurable somewhere)
[09:52:49] so it won't show any errors/warnings, it will just show "Data"
[09:53:09] <_joe_> so I should always use wireshark I guess :P
[09:54:10] yeah, ipv4 header 20, udp header 8
[09:55:03] kinda weird that you're seeing fragments and reassembly didn't happen at the NIC level
[09:55:17] maybe the remainder got lost though
[09:55:35] <_joe_> paravoid: apparently not, the daemon got the full url AIUI
[09:56:16] <_joe_> I'll re-offer some nice php code for review if anyone's inclined https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589541/
[09:56:35] <_joe_> it's fresh, it uses tabs and modern stuff like [ ] arrays!
[09:56:45] yeah nice try
[09:57:01] <_joe_> :D
[09:58:29] does anyone know how to interact with Broadcom NICs btw? I think the embedded LLDP daemon is running on some hosts, but no idea how to check/disable it
[09:58:39] for https://phabricator.wikimedia.org/T250367
[09:59:04] the NIC has an embedded LLDP daemon?!
[09:59:06] wtf
[09:59:19] TIL
[10:00:05] there are some mentions that it exists, e.g. in https://github.com/torvalds/linux/commit/7d63818a35851cf00867248d5ab50a8fe8df5943
[10:00:39] but no idea if I can query that flag from linux
[10:08:15] there is a bios menu, maybe there is an option there
[10:10:51] 90 hosts are "broken" so I hope we don't have to reboot them :(
[10:26:33] XioNoX: https://github.com/ubports/ubuntu_kernel_xenial/tree/master/ubuntu/bnxt mentions the DCBX agent and suggests it can be queried and changed with lldptool-dcbx https://manpages.debian.org/testing/lldpad/lldptool-dcbx.8.en.html. I didn't see lldptool on the servers so haven't explored further
[10:27:08] yeah that's from lldpad, which is a different implementation from lldpd
[11:56:58] zoom windows analysis https://dev.io/posts/zoomzoo/ (seems like they have similar issues to the linux client, and more)
[11:59:31] "Who put the "Zoo" in "Zoom"?"
[12:24:59] puppet cert expiry
[12:24:59] openssl s_client -connect projects.puppetlabs.com:443 -showcerts 2> /dev/null notAfter=Apr 16 23:59:59 2020 GMT
[12:25:06] :(
[12:49:31] I checked zoom javascript dependencies a bit, there's a bunch of very old libs iirc
[13:25:32] _joe_: you want json_decode and not json_encode?
[13:27:22] <_joe_> cdanis: uh, no
[13:27:48] <_joe_> ofc I want _encode
[13:27:50] <_joe_> meh
[13:27:58] <_joe_> I even wrote it multiple times :D
[13:28:09] :D
[13:28:22] <_joe_> at least I was consistent :P
[13:39:53] elukey: to clarify, are you thinking I should back up to the n-1 version of the patch that includes get => 'LatestRoute|Pool|designate'?
[13:40:04] andrewbogott: correct
[13:40:11] 'k, makes sense
[13:40:12] thank you!
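On the puppet cert expiry above: the `notAfter=` value that openssl prints can be parsed and compared directly. A sketch; the date is the one from the paste, and the comparison date comes from the fact that this chat happened on 2020-04-17 (visible in the timestamped puppet temp file later in the log), so the cert had expired the previous day.

```python
import datetime

# The notAfter value from the openssl output quoted above.
not_after = "Apr 16 23:59:59 2020 GMT"
expiry = datetime.datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z")

# Day of the chat: the cert was already expired, hence the ":("
now = datetime.datetime(2020, 4, 17)
print(expiry < now)  # True
```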
[13:40:16] with the caveat that I don't know how it will behave :)
[13:40:28] but it makes sense for your use case and it is outlined by upstream
[13:40:32] so it seems safe
[13:40:40] sorry, I didn't get the use case at first
[13:40:50] I thought you needed something different
[13:44:04] _joe_: did you know that we're saturating network rx on some appservers some of the time?
[13:44:31] <_joe_> cdanis: I didn't know, no. Interesting, that might explain some things I was seeing
[13:44:50] https://w.wiki/N6Z
[13:44:57] <_joe_> it's quite surprising too
[13:45:22] yeah!
[13:45:30] <_joe_> I would guess that's transfer of some large query dataset
[13:45:42] <_joe_> like say a query returned 50 MB of data
[13:45:54] wow rx?
[13:45:59] <_joe_> I can see that filling up RX bytes for a fraction of a second
[13:46:01] it's only running on the canary appservers -- I think there are 5 in eqiad? -- so it's surprising to see any events at all
[13:46:16] <_joe_> elukey: I would've been more worried if it was TX
[13:46:22] these events are for several seconds _joe_
[13:46:38] <_joe_> well, if the query is 100 MB of data :P
[13:46:57] <_joe_> and at the same time you have another 40 requests in flight doing queries to other databases
[13:47:16] _joe_: rx is also not great, no? It means that some clients send a ton of data to appservers, maybe in bursts?
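The rx-saturation numbers in this thread convert between byte counters and link utilization as follows. A sketch assuming 1 Gbit/s NICs on the appservers (an assumption; the link speed is not stated in the log), which also shows why a minute-averaged graph can hide per-second saturation.

```python
LINK_BPS = 10**9  # assumed: 1 Gbit/s NIC

def utilization(bytes_per_sec):
    """Fraction of link capacity consumed by a given byte rate."""
    return bytes_per_sec * 8 / LINK_BPS

# A minute-averaged spike of 30 Mbyte/s is only ~24% of the link...
print(utilization(30_000_000))  # 0.24
# ...so a handful of seconds at >90% utilization barely moves the
# one-minute mean, which is what a per-second saturation exporter catches.
```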
[13:47:25] <_joe_> ofc it's not great
[13:47:29] elukey: or memcached ;)
[13:47:35] <_joe_> but I can expect it to happen from time to time
[13:48:07] <_joe_> elukey: I'm willing to bet 95% of rx traffic comes from the datastores
[13:48:13] cdanis: ah right, good point, didn't think about it
[13:48:30] <_joe_> databases, memcached, other services
[13:48:37] yes yes, good point
[13:48:39] lol, you can't just enter a unix timestamp in the grafana textbox
[13:50:28] the usual monitoring shows mw1265 spiking to 30 Mbyte/sec (0.24 Gbit/s) and mw1262 spiking to 26 Mbyte/sec (0.2 Gbit/s)
[13:51:32] whereas nic-saturation-exporter reports that mw1265 had >90% util on rx for 9 seconds, and mw1262 for 8 seconds
[13:57:19] I'm a bit worried about the cardinality of the histogram metric I'd want, will have to do some math
[14:06:48] godog: responded and patched https://gerrit.wikimedia.org/r/c/operations/puppet/+/589400 and also added https://gerrit.wikimedia.org/r/c/operations/puppet/+/589597 on top
[14:16:39] ottomata: thanks! in a meeting, will take a look later
[14:20:36] danke!
[15:06:59] is there anybody I can talk with about mwlog1001? Quick patch: I want to remove the rsync rules that allow stat1007 to pull mw-api logs - https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/589600/
[15:58:33] <_joe_> cdanis: I would say % of time the nic was saturated per minute?
[15:58:54] _joe_: that's computable from metrics we already have
[15:59:17] I was thinking of also having the exporter provide a histogram with the per-second utilization percentages themselves, with fine-grained buckets at the top end
[15:59:31] so you could tell 85% vs 90% vs 95%
[15:59:45] elukey: if you're still around… interested in helping me understand why mcrouter won't start up? The error message is pretty overwhelming.
[15:59:50] https://www.irccloud.com/pastebin/4mhIuuHR/
[16:00:04] andrewbogott: sure
[16:00:22] the complaint seems to be 'no route'
[16:00:47] (almost certainly due to a missing semicolon or something)
[16:00:55] here's the class def:
[16:00:57] https://www.irccloud.com/pastebin/lbafBfDC/
[16:02:20] andrewbogott: I think it is worth checking the json file that got rendered with jq '.'
[16:02:28] if there is a problem it will be clear
[16:03:15] elukey: sorry, how do I see the rendered json?
[16:05:12] andrewbogott: it should be under /etc/mcrouter/etc.., is cloudservices1003 one of the hosts in which the change was applied?
[16:05:24] currently only cloudservices2002-dev.wikimedia.org
[16:05:46] and puppet seems to not write the file to /etc/mcrouter due to the failure
[16:06:09] maybe that's the problem and not the config
[16:06:52] yeah, so /etc/mcrouter/config.json is not there
[16:08:11] so validate_cmd => "/usr/bin/mcrouter --validate-config --port ${port} --route-prefix ${region}/${cluster} --config file:%",
[16:08:34] the config is wrong, but luckily there is something in the puppet log
[16:09:04] this?
[16:09:08] /usr/bin/mcrouter --validate-config --port 11213 --route-prefix codfw/designate-codfw --config file:/etc/mcrouter/config.json20200417-21884-160kevf
[16:09:15] I'm not sure how to capture the file it wants to validate
[16:10:21] so just copy that to a file, then cat the file | jq '.'
[16:10:30] there seems to be no pool configured
[16:10:54] right, but that file is transitory
[16:11:06] <_joe_> andrewbogott: look at https://www.irccloud.com/pastebin/4mhIuuHR/
[16:11:11] <_joe_> the json is there
[16:11:27] ah, ok, just without newlines
[16:11:28] <_joe_> just run puppet, copy the json in the log to a file, run jq on it
[16:11:37] yep --^
[16:11:59] andrewbogott: in your patch there was a .each IIRC that looped over the hosts
[16:12:11] I guess it is not working in -dev?
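The jq step above amounts to parsing the rendered config and inspecting its pools. A minimal sketch with a hypothetical config fragment (not the actual rendered file) showing what the "no pool configured" symptom looks like once the JSON is readable:

```python
import json

# Hypothetical minimal mcrouter-style config; the real rendered file is in
# the puppet log, one long line, which is why jq was needed to read it.
rendered = '{"pools": {"designate": {"servers": []}}}'

config = json.loads(rendered)  # raises ValueError if the JSON is malformed
for name, pool in config["pools"].items():
    if not pool["servers"]:
        # An empty server list is why validation fails with 'no route'.
        print(f"pool {name!r} has no servers")
```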
[16:12:24] ah yeah, I bet that's it
[16:12:42] servers => $designate_hosts.map |$host| { "${host}: 11211" }
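The last line builds the pool's server list from $designate_hosts; it behaves roughly like the Python below (host names are hypothetical, and the host:port formatting is an assumption based on the Puppet snippet). In the -dev deployment the equivalent list apparently came out empty, which is why the rendered config had no usable pool.

```python
import json

# Hypothetical designate hosts; in -dev the list was empty.
designate_hosts = ["cloudservices2004-dev.wikimedia.org",
                   "cloudservices2005-dev.wikimedia.org"]

# Rough Python equivalent of the Puppet map above.
servers = [f"{host}:11211" for host in designate_hosts]

print(json.dumps({"pools": {"designate": {"servers": servers}}}, indent=2))
```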