[09:09:43] I feel like I haven't complained about predictable network interface names in a while
[09:10:32] interface names get so long that they overflow the buffer allocated to the IRQ ID
[09:11:48] example...
[09:11:51] https://www.irccloud.com/pastebin/qPaJKwHZ/
[09:13:11] if you're lucky enough you get an interface name short enough to get the full irq id
[09:13:16] https://www.irccloud.com/pastebin/ZrEffO2C/
[09:13:26] 😠
[09:13:51] and to add extra insult, the struct in Linux is still limited to 16 chars minus the terminating null :-)
[09:14:48] moritzm: any idea on how to debug how IRQs are assigned to NIC queues BTW?
[09:15:06] no idea, sorry
[09:15:41] /proc/interrupts ?
[09:17:14] jynus: more like why I'm getting 16 IRQs associated to apparently 16 queues when ethtool is reporting 8 queues
[09:18:01] hmmm and I need to debug interface-rps.py cause it's definitely expecting iface-TxRx-%d as a pattern
[09:18:10] and that's not true at least on lvs7002
[09:18:14] 🍿
[09:18:43] and all thanks to the predictable network interface names
[09:49:28] so basically the broadcom driver sets 2 MSI-X vectors per queue
[09:49:55] (one for rx and one for tx, hence the 2 IRQs per combined queue)
[09:52:08] vgutierrez: I suspect maybe there are 8 rx queues, 8 tx queues?
[09:52:29] that seems to be what /sys/class/net/ens1f0np0/queues/ suggests anyway
[09:56:00] topranks: so those are combined queues apparently
[09:56:10] https://www.irccloud.com/pastebin/RPzXpRkn/
[10:03:27] the concept of a combined tx/rx queue confuses me too much I think :)
[10:07:30] and it's probably a good match for an XDP based load balancer
[10:07:52] given XDP gets the packet from the NIC, modifies the packet and sends it back to the same NIC (XDP_TX action)
[10:10:28] but this is a PITA in the sense of having a CPU<->NIC queue mapping that's agnostic of the NIC driver
[10:11:57] the most accurate way of fetching those mappings for bnxt_en seems to be dumping /sys/class/net/$iface/queues/rx-*/rps_cpus
[10:13:22] this is important in order to have one LRU map per CPU handling incoming traffic
[10:13:37] https://www.irccloud.com/pastebin/RlnrkZLa/
[10:13:54] the number at the end of the map name is the CPU ID
[10:14:53] so in XDP it fetches the proper LRU map with
[10:14:56] __u32 cpu_num = bpf_get_smp_processor_id();
[10:14:56] void* lru_map = bpf_map_lookup_elem(&lru_mapping, &cpu_num);
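
A rough sketch of how those per-CPU LRU maps could be wired up, using libbpf's BTF map-in-map syntax with an outer BPF_MAP_TYPE_ARRAY_OF_MAPS named lru_mapping as in the snippet above; the function name, MAX_CPUS, and the key/value types and sizes are illustrative guesses, not the actual load balancer's definitions:

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    #define MAX_CPUS 128 /* assumption: upper bound on CPU IDs */

    /* Outer array indexed by CPU ID; each slot holds one LRU hash. */
    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY_OF_MAPS);
        __uint(max_entries, MAX_CPUS);
        __type(key, __u32);
        __array(values, struct {
            __uint(type, BPF_MAP_TYPE_LRU_HASH);
            __uint(max_entries, 65536);
            __type(key, __u64);   /* hypothetical flow key */
            __type(value, __u32); /* hypothetical backend index */
        });
    } lru_mapping SEC(".maps");

    SEC("xdp")
    int balancer(struct xdp_md *ctx)
    {
        /* Pick the LRU map belonging to the CPU this packet landed on. */
        __u32 cpu_num = bpf_get_smp_processor_id();
        void *lru_map = bpf_map_lookup_elem(&lru_mapping, &cpu_num);
        if (!lru_map)
            return XDP_ABORTED; /* no inner map installed for this CPU */

        /* ... connection lookup/insert in lru_map, packet rewrite ... */

        return XDP_TX; /* bounce the packet back out the same NIC */
    }

    char _license[] SEC("license") = "GPL";

Userspace would create one inner LRU map per CPU (hence the CPU ID suffix on the map names in the pastebin) and install each at its index before attaching the program, so every CPU only ever touches its own map on the hot path.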
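
And a small userspace sketch of the rps_cpus dumping idea: glob the per-queue files and decode the standard sysfs cpumask format (comma-separated 32-bit hex words, most significant group first) into CPU IDs. The fallback interface name is just a placeholder:

    #include <glob.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Print the CPU IDs set in a sysfs cpumask like "00000000,00008080". */
    static void print_cpus(const char *mask)
    {
        char buf[256];
        char *words[64];
        int nwords = 0;

        snprintf(buf, sizeof(buf), "%s", mask);
        for (char *tok = strtok(buf, ",\n"); tok && nwords < 64;
             tok = strtok(NULL, ",\n"))
            words[nwords++] = tok;

        /* The last word covers CPUs 0-31, the one before it 32-63, ... */
        for (int i = 0; i < nwords; i++) {
            unsigned long w = strtoul(words[nwords - 1 - i], NULL, 16);
            for (int bit = 0; bit < 32; bit++)
                if (w & (1UL << bit))
                    printf(" %d", i * 32 + bit);
        }
    }

    int main(int argc, char **argv)
    {
        const char *iface = argc > 1 ? argv[1] : "ens1f0np0";
        char pattern[256];
        glob_t g;

        snprintf(pattern, sizeof(pattern),
                 "/sys/class/net/%s/queues/rx-*/rps_cpus", iface);
        if (glob(pattern, 0, NULL, &g) != 0)
            return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
            FILE *f = fopen(g.gl_pathv[i], "r");
            char mask[256];

            if (!f)
                continue;
            if (fgets(mask, sizeof(mask), f)) {
                printf("%s:", g.gl_pathv[i]);
                print_cpus(mask);
                printf("\n");
            }
            fclose(f);
        }
        globfree(&g);
        return 0;
    }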
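
For the name-length complaint at the top of the log, a toy reproduction of the IRQ label truncation: a driver composes "<iface>-TxRx-<n>" into a fixed-size buffer, and a long predictable name leaves no room for the suffix. Both the buffer size and the interface name here are made up for illustration:

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical driver-side buffer; real drivers size this
         * differently, but the effect is the same once the name plus
         * suffix exceeds it. */
        char irq_name[18];

        snprintf(irq_name, sizeof(irq_name), "%s-TxRx-%d",
                 "enp175s0f0np0", 12);
        printf("%s\n", irq_name); /* "enp175s0f0np0-TxR": queue ID lost */
        return 0;
    }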
[15:25:56] does anyone know which servers host the ops prometheus instance? I was looking on the prom servers but I didn't see them under /srv/prometheus/ops like I'd expect. I just wanted to make sure the BB check we removed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/1161960 was actually removed
[15:28:10] inflatador: I usually go by https://gerrit.wikimedia.org/g/operations/puppet/+/39bd40d905b40d3a393f960e2e9048542a3bdc29/hieradata/common/prometheus.yaml#10
[15:29:01] there's probably a better way, of course :)
[15:29:05] swfrench-wmf thanks! I was looking at the wrong prom server
[15:30:22] and...looks like we are good on the BB check
[15:31:09] 💙cdanis@cumin2002.codfw.wmnet ~ 🕦☕ sudo cumin 'P:prometheus::ops'
[15:31:11] 9 hosts will be targeted:
[15:31:13] prometheus[2005-2006].codfw.wmnet,prometheus6002.drmrs.wmnet,prometheus[1005-1006].eqiad.wmnet,prometheus5002.eqsin.wmnet,prometheus3003.esams.wmnet,prometheus7002.magru.wmnet,prometheus4002.ulsfo.wmnet
[15:31:15] DRY-RUN mode enabled, aborting
[15:34:27] that'll do it as well :)
[15:35:33] (the more encompassing one if desired is simply A:prometheus)
[16:02:13] ACK, will keep that in my back pocket ;)