[00:45:57] netops, Operations: csw2-esams's VCP link flapped - https://phabricator.wikimedia.org/T229755 (ayounsi) Open→Declined > I finished working on them but I was not able to match the digital trace to any software report like bug or PR. > When there is a core-dump alongside to an event that caused an...
[01:59:58] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (Slaporte) @tramm I wanted to confirm that we got your email and we're looking into it. Chuck is out of office for the next few days, following his work at Wiki...
[02:00:24] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (Slaporte) a:CRoslof→Slaporte
[14:21:11] hello traffic team
[14:21:21] I'd like to merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/531154/ to swap the turnilo backend to another host
[14:21:33] so I'd need to merge + run puppet on cp_text
[14:21:47] would it be ok or it is a bad time?
[14:21:49] Cc: ema
[14:22:33] elukey: go ahead!
[14:22:38] <3
[14:35:28] ema: new versio of turnilo available btw! :)
[14:35:31] *version
[14:38:46] ema: what's the ATS equivalent of https://gerrit.wikimedia.org/r/c/operations/puppet/+/480782 ? (the upload.yaml file doesn't have the route_table anymore
[14:39:59] XioNoX: requests go straight from ATS to the origin servers (eg: swift) now
[14:41:47] XioNoX: and we're using DNS discovery to figure out to which data center requests need to go. What are you working on?
[14:41:47] ema: ok, so is there anything that I need to do to drain as much traffic from codfw as possible?
[14:42:31] need to do a precautionary depool of codfw for future router work
[14:44:39] XioNoX: ack, right now we're using swift.discovery.wmnet and kartotherian.discovery.wmnet in active/active mode
[14:44:50] XioNoX: we'll need to set them eqiad-only https://wikitech.wikimedia.org/wiki/DNS/Discovery
[14:45:51] ah cool, doc looks clear :)
[14:45:53] thanks!
[14:46:21] so basically:
[14:46:22] is there a querry that returns everything that is pooled in codfw?
[14:46:34] to depool codfw maps:
[14:46:36] confctl --object-type discovery select 'dnsdisc=kartotherian,name=codfw' set/pooled=false
[14:46:47] to depool codfw swift:
[14:46:49] confctl --object-type discovery select 'dnsdisc=swift,name=codfw' set/pooled=false
[14:48:37] mmh I wouldn't know how to get everything pooled in codfw
[14:48:53] perhaps volans / _joe_ have ideas
[14:50:03] thanks!
[14:50:04] yeah you can use regex there
[14:50:12] so name=.* should do (by memory)
[14:50:25] sorry dnsdisc-.*
[14:50:28] <_joe_> XioNoX: wrong
[14:50:31] <_joe_> err
[14:50:32] <_joe_> volans:
[14:50:34] but I need to check
[14:50:34] <_joe_> :P
[14:50:36] syntax
[14:50:43] ema, the puppet change is still needed for text, right?
[14:50:52] I'm barely functional today, sorry :)
[14:50:56] <_joe_> confctl --object-type discovery select 'name=codfw' get gets you all the objects
[14:51:11] XioNoX: correct, puppet change still needed for text
[14:51:18] thanks!
[14:51:49] XioNoX: are you touching only external network or also internal?
[14:52:03] the current etcd primary is in codfw too
[14:52:07] and how instant is the discovery change? just need to do it a few minutes before the work, right?
[14:52:15] <_joe_> XioNoX: the DNS ttl
[14:52:23] <_joe_> so 5 minutes if you don't shorten it
[14:54:25] volans: all codfw, nothing is supposed to be impacted, but doing the low hanging depools just in case
[15:01:06] ok
[15:52:52] ema: so LVS2001/2/3 are going to lose their connection to cr1-codfw briefly, they're all primary with lvs2004/5/6 being backup it seems. Once codfw is depooled is it safe to disable pybal on 1/2/3 ? - this is just to prevent a possible page to everybody, or should I just downtime the paging alert?
[15:57:05] XioNoX: I'd downtime the alert
[15:57:14] ok!
[15:58:00] XioNoX: when is the activity planned, again?
[15:58:13] ema: in 1h
[15:58:17] 3h long max
[15:58:33] XioNoX: ack, I'll be afk but reachable on the phone in case you need me :)
[15:59:37] ema: thanks, sites should be depooled (I'll downtime eqsin as well as the only eqsin transport goes to codfw), so I'm expecting noise at worse
[15:59:48] ema: any idea where the LVS checks live?
[16:03:41] XioNoX: mmh nope!
[16:06:18] XioNoX: but maybe you can search for "lb.codfw" on https://icinga.wikimedia.org/icinga/
[16:06:45] ema: godog told me I could search for #page as well, meeting now, but will look after
[16:06:53] ah nice
[16:09:19] Displaying Result 1 - 587 of 587 Matching Services
[16:09:22] FYI
[16:11:15] ema: actually I'll still need to fail over the LVS that is in front of internal services, as those won't be depooled
[16:17:30] yeah just disable puppet and then stop pybal, on all the lvs100[123]
[16:17:41] (otherwise puppet might restart it on you)
[16:17:48] err sorry 200[123]
[16:18:22] hmmm we already have ulsfo+codfw text alerts now
[16:19:01] any chance this is just the turnilo changes?
[16:20:02] XioNoX: any thoughts?
the availability drop is basically for cache_text for the whole codfw side of the world (codfw, ulsfo, eqsin)
[16:20:37] bblack: I didn't do any change so far
[16:31:38] bblack: let me know if/when it's fine to merge the DNS and varnish depools - https://gerrit.wikimedia.org/r/c/operations/dns/+/531502 - https://gerrit.wikimedia.org/r/c/operations/puppet/+/531513
[16:35:14] XioNoX: should be fine now. Do the DNS one first and give it 15 minutes before you do the varnish one, it will make things slightly smoother.
[16:35:26] ok!
[16:38:03] DNS done
[16:56:56] XioNoX: on the varnish patch, of course keep in mind without a cumin run on the source DCs' varnishes it would take ~30m to apply fully
[16:57:10] yep, thx!
[16:57:26] (it has no effect at the destination, so you just have to hit eqsin + ulsfo)
[17:04:11] bblack: I'm wondering if it's not easier to disable the BGP session to lvs2001/2/3 on the router side rather than stop puppet/pybal
[17:07:57] XioNoX: whatever's easier for you I guess.
[17:08:27] ok, making sure there were no other moving parts on the LVS side
[17:10:07] technically, we could also just reconfigure pybal on these to talk to the other router instead and keep them all alive
[17:10:16] but it hardly seems worth the trouble for a short depooled outage
[17:12:28] sounds good, thx
[17:12:47] all the depools are done, time to start the work
[18:09:54] Traffic, Elasticsearch, Operations, Discovery-Search (Current work), Patch-For-Review: Icinga check defined from LVS configuration for cloudelastic are borked - https://phabricator.wikimedia.org/T229621 (debt) Open→Resolved
[18:35:05] hmm that's weird
[18:35:11] something is wrong with my gmail account
[18:35:15] is it just me?
[18:35:45] something? :P
[18:36:09] yeah, sorry
[18:36:20] I was logged out first and now it tells me to "Contact your domain admin for help."
[18:36:31] weird
[18:49:23] netops, Operations: PCI Gap Assessment auditor question about SNMP - https://phabricator.wikimedia.org/T230952 (Jgreen)
[18:52:20] netops, Operations: PCI Gap Assessment auditor question about SNMP - https://phabricator.wikimedia.org/T230952 (ayounsi) Open→Resolved a:ayounsi SNMPv2c Read Only. Easy task! :)
[19:07:12] ha ok. resolved. apparently, IT thought my contract had expired with the Foundation so I was logged out of my account
[19:07:47] (!) that was a bit scary, to be told that my password has been changed
[19:37:05] netops, Operations, ops-codfw: update RE-S-X6-64G-S in cr[12]-codfw - https://phabricator.wikimedia.org/T226422 (ayounsi) Open→Resolved DONE! Everything is healthy, very little alert noise, no service impact.
[19:54:09] netops, Operations, ops-eqiad: (Need By: Sept 30) update RE-S-X6-64G-S in cr[12]-eqiad - https://phabricator.wikimedia.org/T226424 (ayounsi)
[20:16:01] Traffic, Operations: Configure Layer3 hashing for router ECMP (for anycast DNS) - https://phabricator.wikimedia.org/T230955 (BBlack)
[20:26:25] Traffic, Core Platform Team, Operations, Performance-Team, and 4 others: Serve Main Page of WMF wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (kchapman)
[20:26:53] Traffic, Core Platform Team, Operations, Performance-Team, and 4 others: Serve Main Page of WMF wikis from a consistent URL - https://phabricator.wikimedia.org/T120085 (kchapman) @CCicalese_WMF could you review this from a product perspective and determine if it is something we want to do?
[20:45:14] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (Slaporte) >>! In T204056#5261399, @tramm wrote: > ... Wikimedia Eesti doesn't directly control any nameservers (however we control many DNS records of domains...
[21:10:48] Traffic, Operations, Phabricator, Release-Engineering-Team (Development services), and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (JAufrecht) > title, subtitle and description I think what you have now, "Wikimedia Tech...
[21:30:16] Traffic, Operations, Phabricator, Release-Engineering-Team (Development services), and 2 others: Prepare Phame to support heavy traffic for a Tech Department blog - https://phabricator.wikimedia.org/T226044 (Aklapper) General reminder about [naming things](https://www.mediawiki.org/wiki/Naming_th...
[22:40:02] netops, Operations, ops-eqiad: (Need By: Sept 30) update RE-S-X6-64G-S in cr[12]-eqiad - https://phabricator.wikimedia.org/T226424 (ayounsi) Scheduled for Thursday Sept 5th, 8am PST, 11am local time, 15:00 UTC. 3h
[22:52:21] Traffic, netops, Operations: Configure interface damping on primary links - https://phabricator.wikimedia.org/T196432 (ayounsi)
[22:52:24] netops, Operations, ops-eqiad: (Need By: Sept 30) update RE-S-X6-64G-S in cr[12]-eqiad - https://phabricator.wikimedia.org/T226424 (ayounsi)
[22:52:26] netops, Operations, ops-codfw: update RE-S-X6-64G-S in cr[12]-codfw - https://phabricator.wikimedia.org/T226422 (ayounsi)
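
Editor's sketch, not part of the channel log: the codfw discovery depool discussed between 14:46 and 14:52 could be rehearsed roughly as below. This is a dry run that only prints the confctl invocations exactly as they were quoted in the channel; it does not call confctl. The variable names (SERVICES, SITE, TTL_SECONDS) and the two helper functions are hypothetical and exist only for illustration, and the 300-second wait assumes the unshortened 5-minute DNS TTL that _joe_ mentioned.

```shell
#!/bin/sh
# Dry-run sketch: prints the confctl commands quoted in the log instead of running them.
# SERVICES, SITE and TTL_SECONDS are illustrative names, not part of any WMF tooling.

SERVICES="kartotherian swift"   # the two active/active discovery records named at 14:44
SITE="codfw"
TTL_SECONDS=300                 # "5 minutes if you don't shorten it" (14:52)

# Print the depool command for one discovery record, in the form quoted at 14:46.
depool_cmd() {
    printf "confctl --object-type discovery select 'dnsdisc=%s,name=%s' set/pooled=false\n" "$1" "$2"
}

# Print the command that lists every discovery object for a site, as quoted at 14:50.
list_cmd() {
    printf "confctl --object-type discovery select 'name=%s' get\n" "$1"
}

# 1. Show what exists for the site, 2. depool each record, 3. note the TTL wait.
list_cmd "$SITE"
for svc in $SERVICES; do
    depool_cmd "$svc" "$SITE"
done
echo "# then wait at least ${TTL_SECONDS}s for cached DNS answers to expire"
```

Printing rather than executing keeps the sketch safe to run anywhere; on a host where confctl is actually available, the printed lines would still need to be reviewed against the DNS/Discovery documentation linked at 14:44 before use.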