[08:03:00] netops, Operations: cr2-eqdfw (MX204) vhclient log noise - https://phabricator.wikimedia.org/T203261 (ayounsi) Steps for the upgrade: [] Verify image checksum and validate `request system software validate /var/tmp/junos-vmhost-install-mx-x86-64-17.4R2.4.tgz` [] Start upgrade process `request system soft...
[09:04:32] vgutierrez: so where are we with puppet stuff?
[09:06:02] morning to you too
[09:06:41] so volans comments are addressed, bblack active/passive too
[09:07:16] so... let's merge it and run puppet in certcentral[12]001?
[09:11:54] Traffic, Operations: Document eqsin power connections in Netbox - https://phabricator.wikimedia.org/T207138 (faidon) p:Triage>Normal
[09:11:55] * volans hides
[09:12:21] volans <3
[09:12:24] sorry
[09:12:41] yes, good morning
[09:12:52] I was hoping to get volans' comments
[09:13:32] I'll see if I can get to it this morning, a bit busy
[09:15:06] Traffic, Operations, Wikimedia-Incident: Add maint-announce@ to Equinix's recipient list for eqsin incidents - https://phabricator.wikimedia.org/T207140 (faidon) p:Triage>High
[09:19:29] Traffic, Operations: Document eqsin power connections in Netbox - https://phabricator.wikimedia.org/T207138 (faidon) This refers to power connections specifically as it's a subtask of the power incident, but that spreadsheet covers patches as well, and we should probably document these as well. Also, I'...
[09:49:17] netops, Cloud-Services, Operations: Consider renumbering Labs to separate address spaces - https://phabricator.wikimedia.org/T122406 (faidon) I think this is now done with Neutron, and while the old space remains for now, the migration is underway, so this task can be closed. @ayounsi, @aborrero, @ch...
[11:28:40] sigh... checking the pcc looks confusing with some stuff... listing absent things in the change catalog it's confusing
[11:39:42] vgutierrez, well the catalog would contain absent resources right?
[11:39:59] yup
[11:40:01] it would need to be in the catalog for the client to be able to handle absenting it
[11:40:45] but in the summary you just see a resource File[/etc/certcentral/http-challenges]
[11:40:51] yeah
[11:41:02] and it looks weird taking into account that it's set us as absent
[11:41:07] s/us/up/
[11:41:25] I'm not too surprised about that
[11:41:39] I also wouldn't worry about it for deploying a brand new class to a fresh server...
[13:46:59] /* Disable multiple TX rings by default. Simple round-robin hardware
[13:47:02] * scheduling of the TX rings can cause starvation of rings with
[13:47:05] * small packets when other rings have TSO or jumbo packets.
[13:47:17] ^ tg3.c source. so maybe doing the ethtool to give them matching TX queues is a bad idea! :)
[13:48:36] will need to rework the tg3 interface::rps support a bit
[13:50:40] sorry g.ehel, I ran out of time verifying some things yesterday, but turns out it was good to wait and verify everything heh
[13:59:50] XioNoX: will be there shortly, maybe a min or three off!
[13:59:58] ok!
[14:36:58] netops, Operations: cr2-eqdfw (MX204) vhclient log noise - https://phabricator.wikimedia.org/T203261 (ayounsi) Open>Resolved Confirmed no more noisy logs.
[14:45:53] netops, Cloud-Services, Operations: Consider renumbering Labs to separate address spaces - https://phabricator.wikimedia.org/T122406 (ayounsi) No objections for me.
[16:45:25] bblack: last (hopefully) question about https://gerrit.wikimedia.org/r/c/operations/puppet/+/465624 (rps for wdqs)
[16:46:00] bblack: you're better placed than me to evaluate the risk, is this something I should deploy first one one server to validate for a few days?
[16:46:24] or is it safe enough to send to all in one shot?
[16:47:55] gehel: with the interface::rps stuff as it is now, it should be pretty damn safe, but I would still maybe disable puppet on wdqs* and just do one first to validate that assumption.
[16:48:07] ofc!
[16:48:15] thanks for all the work!
[16:48:27] gehel: and do the numa_networking amend first, so we don't have to re-deploy/watch it again to do that later :)
[16:48:40] sure
[16:49:07] I'll add that right now, but I'll merge tomorrow (late in the day here, I want to be around for a bit once it's deployed)
[16:49:31] but in general my "disable puppet" above, I meant short term in case of obvious breakage. long-term I don't think there's any way this could be more harmful than helpful.
[16:50:05] bblack: I trust you, I just think it is bad policy to merge and run :)
[16:50:08] worst case it just doesn't really fix your real problems, and instead removes one symptom and moves you on to staring at something else :)
[16:50:20] gehel: it is :)
[16:50:32] * gehel has enough to stare at for a lifetime!
[16:51:23] I'm not entirely sure why that numa_networking has to be in the hosts/ hierarchy. Would regexes.yaml work as well?
[16:51:52] * gehel is trying to avoid creating 13 files just to set one common variable
[16:53:30] gehel: just stuff it in per-host yaml and ignore _joe_'s cries of pain. there's a history to it, and it doesn't work easily any other way, even though it seems like it should/would.
[16:53:34] it'll go away soon enough
[16:53:46] rgr
[16:54:12] the patch is prepped already to make it go away, at which point I'll effectively revert this part of your patch
[16:54:25] but I have to carefully test some stuff on LVS's use-case first, so I can't quite do it yet
[16:54:26] I'll add a note
[16:56:41] this is all good anyways, as it pushes me to finish that cleanup, and also I needed to do all of these things to improve LVS low-level perf, and to make interface::rps deploy on our new authdns machines (which use the same card as wdqs)
[17:11:52] bblack: heads up, I'll try to pick your brain tomorrow about T207195 (too late today to start this kind of conversation)
[17:11:52] T207195: Configure LVS endpoints for new elasticsearch clusters - https://phabricator.wikimedia.org/T207195
[19:00:01] ok :)
[20:34:49] Traffic, Operations, hardware-requests, ops-esams: Procure and install LVS and miscellaneous servers - https://phabricator.wikimedia.org/T184068 (RobH)