[00:57:02] Traffic, Analytics, Analytics-EventLogging, Operations, Performance-Team: Increase EventLogging limit from 2K to 4K - https://phabricator.wikimedia.org/T208282 (Krinkle)
[00:58:21] Traffic, Analytics, Analytics-EventLogging, Operations, Performance-Team: Increase EventLogging limit from 2K to 5K - https://phabricator.wikimedia.org/T208282 (Krinkle)
[02:34:48] Traffic, Community-Tech, MediaWiki-Parser, Operations, and 5 others: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (Samwilson) I think this just needs rebasing.
[02:38:25] netops, Operations, ops-eqiad, Patch-For-Review: Rack/setup cr2-eqord - https://phabricator.wikimedia.org/T204170 (Papaul) Open>Resolved Double checked, it looks like https://netbox.wikimedia.org/dcim/devices/1954/ is complete so I deleted the first one
[10:21:54] Certcentral, Traffic, Operations: certcentral: delay deployment of renewed certs to wait out skewed client clocks - https://phabricator.wikimedia.org/T204997 (Vgutierrez) Let's Encrypt intentionally backdates the issued certificates 1 hour. ```name=cercentral logs Oct 30 10:02:36 certcentral1001 cert...
[11:35:08] Traffic, Operations, Wikimedia-Incident: Power incident in eqsin - https://phabricator.wikimedia.org/T206861 (ema)
[11:35:10] Traffic, Operations, monitoring, Patch-For-Review: Icinga: check_confd_vcl_reload unknown when file is missing - https://phabricator.wikimedia.org/T206950 (ema) Open>Resolved a:ema Fixed: https://gerrit.wikimedia.org/r/470353
[12:02:14] Certcentral, Traffic, Operations: certcentral: delay deployment of renewed certs to wait out skewed client clocks - https://phabricator.wikimedia.org/T204997 (BBlack) So, with regard to the potential staging delays in this and T207295, the reason they're not urgent or required for conversion of the...
[12:22:48] ema: if you have a min, can you double-check for VCL stupidity on my part in https://gerrit.wikimedia.org/r/c/operations/puppet/+/470579 ? :)
[12:23:28] ema: it's killing support for *.zero.wikipedia.org, but leaving "zero.wikipedia.org" and "zero.wikimedia.org" working as they were (other project domains definitely don't have zero hostnames at all)
[12:26:15] or I guess alternatively, it might be safer to just leave the VCL alone until the final removal of zero
[12:26:21] and just mess with the san list stuff
[12:26:31] yeah, maybe I'll just do that.
[12:26:34] ema: nevermind :)
[12:41:15] is the date of the last zero contracts expiry public?
[12:43:24] no, I don't think so
[12:45:11] I'm trying to think/search what of that is public
[12:45:19] https://phabricator.wikimedia.org/T187716 is there, but kinda dated and lacks any firm end date
[12:48:53] ah that links this: https://blog.wikimedia.org/2018/02/16/partnerships-new-approach/
[12:49:09] which says "The Wikipedia Zero program will end in 2018" in its summary
[12:51:09] so it could have already ended or it will end within the next 8 weeks
[12:51:23] /9
[12:51:40] well according to a blog post from february anyways
[12:51:52] obviously, plans slip/change, so I doubt the "will end in 2018" is really a commitment
[13:13:33] bblack: ok, so we're leaving the VCL as is for now
[13:16:27] speaking of VCL :)
[13:16:55] bblack: do we really need this in misc-frontend? https://github.com/wikimedia/puppet/blob/production/modules/varnish/templates/misc-frontend.inc.vcl.erb#L5
[13:17:26] I've added /usr/share/varnish/tests/text/27-sitemaps-rewrites.vtc and that passes w/o the misc-frontend code above, so maybe the duplication is not needed?
[14:01:32] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi) Here is the full list of hosts in that row. No outages expected, but brief (5s) connectivity interruption for some racks is possible.
CCing services owners, to know if it's an acceptab...
[14:06:39] ema: yeah I'm out of context on that now, but we should revisit it a bit. I'm inclined to think the testing just fails to test reality here because of , because it didn't work without the duplication in practice, and acted as if all of req.* gets reset to original incoming values as soon as you VCL-switch. It could still arguably be factored better even then though
[14:06:45] .
[14:19:22] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (jcrespo) > Thursday No problem on my side, a short network outage is not a huge issue on codfw for dbs, but I cannot guarantee they will not page, and I won't be around to attend it- someone e...
[14:23:56] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (fgiunchedi) >>! In T208272#4706141, @ayounsi wrote: > CCing services owners, to know if it's an acceptable risk and if it can be mitigated by depooling services. Short interruptions are ok wit...
[14:38:27] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Papaul)
[15:00:17] ema: I finally got time to validate https://gerrit.wikimedia.org/r/c/operations/puppet/+/468320/ on deployment-prep, with multiple elasticsearch instances
[15:00:43] it seems to work fine at least for that use case (it is a noop in other current use cases)
[15:00:55] ema: your +1 (or -1) would be welcomed
[15:20:55] gehel: looking!
[15:23:48] ema: thanks!
[15:31:46] gehel: is this currently tested on deployment-elastic05? How is it configured?
[15:32:59] I have rolled back the cherry-pick already, but the code change on elastic is similar to https://gerrit.wikimedia.org/r/c/operations/puppet/+/466591
[15:33:15] (I configured that directly in the instance puppet)
[15:33:42] ah!
[15:33:54] it declared 2 TLS proxies, on port 9243 and 9443, sharing the same cert
[15:34:37] not sure exactly what you'd like to know
[15:35:36] ema: thanks!
[15:35:44] https://gerrit.wikimedia.org/r/c/operations/puppet/+/466591 answers my question :)
[15:38:33] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi)
[15:40:28] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (Papaul)
[15:53:16] Certcentral: Improve expected exceptions logging in certcentral - https://phabricator.wikimedia.org/T208326 (Vgutierrez)
[16:05:13] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (ayounsi)
[16:11:15] I^Hwe need to get busy on decoms too
[16:11:37] eeden, lvs1007-12 (whatever's left there), decommed eqiad cp10xx, etc
[16:11:47] all of those are in indeterminate states stalled on us somewhere
[16:12:27] once I get past my next meeting and then make some forward progress on GlobalSign I'll take a look at some of it
[16:13:58] cool, I'll look at some of it tomorrow too
[16:14:01] and also, re-check whether we're still blocked on the rest of lvs1013-16 for now and still have to keep lvs100x in play until then
[16:15:26] afternoon vgutierrez
[16:16:09] how's stuff going? just checking in to ensure you're not waiting for me for anything
[16:19:15] hi Krenair
[16:19:22] just studying the logs of this morning's tests
[16:19:25] cool
[16:19:31] so we got the certificates as expected
[16:19:54] you can see the output of certcentral2001 in T208326
[16:19:54] T208326: Improve expected exceptions logging in certcentral - https://phabricator.wikimedia.org/T208326
[16:20:16] you deleted/moved the certs on disk to have it issue new ones right?
[16:20:20] indeed
[16:20:24] in both nodes
[16:20:53] and ran puppet at the same time, so it triggered the reconfig in certcentral2001 to use the new certificate, and restarted the service in both nodes
[16:21:03] s/new certificate/new account/
[16:22:08] so besides the log verbosity, I've realized we've a small bug related to the exponential backoff
[16:22:45] we are ignoring systematically the state SELF_SIGNED, we cannot do that unless the previous state is INITIAL
[16:23:22] (or VALID)
[17:38:44] netops, Operations, ops-codfw: codfw row C recable and add QFX - https://phabricator.wikimedia.org/T208272 (elukey) About the C4 switch replacement: there are 4 mw hosts in codfw that are acting as proxies for mcrouter to replicate keys from eqiad to codfw: ``` elukey@mw1347:~$ cat /etc/mcrouter/con...
[20:14:51] netops, Cloud-VPS, Operations, Patch-For-Review, cloud-services-team (Kanban): ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew)
[20:15:39] netops, Cloud-VPS, Operations, Patch-For-Review, cloud-services-team (Kanban): ntp broken in new region - https://phabricator.wikimedia.org/T208244 (Andrew) a:Andrew
[22:45:26] Traffic, Community-Tech, MediaWiki-Parser, Operations, and 5 others: Show SVGs in page language if available - https://phabricator.wikimedia.org/T205040 (Samwilson) I was waiting for others to weigh in. They haven't. I've +2'd it. :)
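
Note on the 16:22 exponential backoff remarks: the rule described there (SELF_SIGNED may only be ignored when the previous state was INITIAL or VALID) could be sketched roughly as below. This is a minimal illustration, not certcentral's actual code: the `CertState` enum, the `ISSUANCE_FAILED` member and the `can_skip_self_signed` helper are hypothetical names. Only INITIAL, SELF_SIGNED and VALID appear in the log.

```python
from enum import Enum, auto


class CertState(Enum):
    """Hypothetical certificate states; only the first three are named in the log."""
    INITIAL = auto()
    SELF_SIGNED = auto()
    VALID = auto()
    ISSUANCE_FAILED = auto()  # assumption: stands in for any other prior state


def can_skip_self_signed(current, previous):
    """Return True only when a SELF_SIGNED state may be ignored.

    Per the discussion, skipping is acceptable only if the previous
    state was INITIAL or VALID; any other history must not be ignored.
    """
    if current is not CertState.SELF_SIGNED:
        return False
    return previous in (CertState.INITIAL, CertState.VALID)
```

With this shape, a SELF_SIGNED state reached from anything other than INITIAL or VALID would still go through the backoff handling instead of being skipped unconditionally.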