[07:28:17] hola, anybody against me merging https://gerrit.wikimedia.org/r/#/c/313400/ ?
[08:24:50] netops, Operations, Ops-Access-Requests: Access to network devices - https://phabricator.wikimedia.org/T147061#2683656 (ArielGlenn) p:Triage>Normal
[13:28:32] HTTPS, Traffic, Operations, Wikimedia-Incident: Make OCSP Stapling support more generic and robust - https://phabricator.wikimedia.org/T93927#2684299 (BBlack) This is still technically an outstanding issue that should be addressed, but it's relatively low priority with relatively low risk, at lea...
[13:31:14] Traffic, Operations, media-storage, Patch-For-Review: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257#2593559 (BBlack) >>! In T144257#2681470, @Aklapper wrote: > Is anybody actively investigating this? / Does this need more investigation? Or did the merged patc...
[13:31:37] Traffic, Operations, media-storage, Patch-For-Review: Certain images failing to load in ulsfo - https://phabricator.wikimedia.org/T144257#2684312 (BBlack)
[13:31:39] Traffic, Operations, Patch-For-Review: varnish backends start returning 503s after ~6 days uptime - https://phabricator.wikimedia.org/T145661#2684310 (BBlack)
[13:33:47] Traffic, Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684318 (BBlack) It would probably be better to upgrade the deployment-prep upload cache to varnish4.
[14:00:24] morning bblack! Anything against me merging https://gerrit.wikimedia.org/r/#/c/313400/ ?
[14:00:54] Traffic, Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684352 (AlexMonk-WMF) >>! In T147116#2684318, @BBlack wrote: > It would probably be better to upgrade the deployment-prep upload cache to varnish4. Okay...
[14:01:27] thanks!
[14:01:28] elukey: go for it
[14:06:08] Traffic, Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684357 (BBlack) I wish :) The basic flow we're using on prod nodes is here, but some of that's inapplicable to deployment-prep: https://wikitech.wikimed...
[14:15:02] Traffic, Operations, Patch-For-Review: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2684373 (BBlack) There's not much left to do here and we're no longer actively investigating. However, I'd like to try removing the 401 hack at some point, to see i...
[14:19:27] HTTPS, Traffic, MediaWiki-Page-editing, Operations, Browser-Support-Internet-Explorer: text input history/autocomplete doesn't work with HTTPS under IE8-10 - https://phabricator.wikimedia.org/T55636#2684382 (BBlack) Open>declined Declining this task because (a) It's been open for 3 ye...
[14:20:46] HTTPS, Traffic, Operations, Wikimedia-Shop: store.wikimedia.org HTTPS issues - https://phabricator.wikimedia.org/T128559#2684390 (BBlack)
[14:20:48] HTTPS, Traffic, Operations, Wikimedia-Shop: Canonical URL in Store points to HTTP address, should be HTTPS - https://phabricator.wikimedia.org/T131131#2684389 (BBlack)
[14:22:01] HTTPS, Traffic, Operations, Patch-For-Review: Create a secure redirect service for large count of non-canonical / junk domains - https://phabricator.wikimedia.org/T133548#2684397 (BBlack)
[14:22:03] HTTPS, Traffic, Operations: https://wikipedia.com and similar throw certificate warning - https://phabricator.wikimedia.org/T42998#2684399 (BBlack)
[14:22:58] HTTPS, Traffic, MediaWiki-General-or-Unknown, Wikimedia-General-or-Unknown: securecookies - https://phabricator.wikimedia.org/T119570#2684401 (BBlack) Removing Traffic/Ops here, as the Traffic-layer cookies are all marked secure now.
[14:28:15] Traffic, Operations: OpenSSL 1.1 deployment for cache clusters - https://phabricator.wikimedia.org/T144523#2684429 (BBlack) We discussed this at the offsite, and we're ready to go with OpenSSL 1.1.0b. The plan is to patch our build such that the -dev package is version-differentiated in the package ti...
[14:30:53] Traffic, Operations, Patch-For-Review: Planning for phasing out non-Forward-Secret TLS ciphers - https://phabricator.wikimedia.org/T118181#2684436 (BBlack) We've discussed (at our offsite meetings) our strategy for removing the final pair of non-forward-secret ciphers (DES-CBC3-SHA and AES128-SHA)....
[14:52:12] Traffic, Operations: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684468 (BBlack)
[15:01:33] Traffic, Operations: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684520 (BBlack)
[15:05:27] Traffic, Operations: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202#2684557 (BBlack)
[15:08:43] Traffic, MediaWiki-General-or-Unknown, Operations, Release-Engineering-Team, and 5 others: Make sure we're not relying on HTTP_PROXY headers - https://phabricator.wikimedia.org/T140658#2471564 (BBlack) Is there more to do here on the MW-Core side of things?
[15:15:53] Traffic, Operations, Patch-For-Review: Decom bits.wikimedia.org hostname - https://phabricator.wikimedia.org/T107430#2684611 (BBlack) The hostname's been gone for ~12 days now, so odds of revert seem low at this point. I'm going to merge up the VCL patch to kill the unused bits code there, and push...
[15:25:17] Traffic, Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684616 (AlexMonk-WMF) Open>Resolved a:AlexMonk-WMF >>! In T147116#2684357, @BBlack wrote: > I wish :) Yeah I knew you were gonna say that....
[15:26:23] Traffic, Beta-Cluster-Infrastructure, Operations, Patch-For-Review: Upload cache in beta is broken - https://phabricator.wikimedia.org/T147116#2684621 (BBlack) I think we can abandon the patch. We're assuming we're past the point of reverting to varnish3 for the upload caches at this point, just...
[15:32:34] Traffic, Operations, Patch-For-Review: Stop using persistent storage in our backend varnish layers. - https://phabricator.wikimedia.org/T142848#2684632 (BBlack) Open>Resolved a:BBlack We're past this decision point now. There are issues with `file` storage in Varnish4 as well, but mitiga...
[15:33:04] Traffic, Operations, Patch-For-Review: varnishd: Assert error in smp_oc_getobj(), storage/storage_persistent_silo.c line 417 - https://phabricator.wikimedia.org/T142810#2684636 (BBlack) Open>Resolved a:BBlack No longer relevant (see T142848)
[15:46:42] Traffic, Operations: Removing support for DES-CBC3-SHA TLS cipher - https://phabricator.wikimedia.org/T147199#2684679 (BBlack)
[15:49:17] Traffic, Operations: Removing support for AES128-SHA TLS cipher - https://phabricator.wikimedia.org/T147202#2684685 (BBlack)
[16:04:09] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2684754 (BBlack) I don't think there's really anything we can do here on our end, and this has been opened with no pr...
[16:05:59] HTTPS, Traffic, Operations, Performance-Team, and 2 others: HTTPS-only for stream.wikimedia.org - https://phabricator.wikimedia.org/T140128#2684764 (BBlack) >>! In T140128#2637840, @Dzahn wrote: >>>! In T140128#2625078, @AlexMonk-WMF wrote: >> Can you filter those access logs down to labs entries...
[16:14:40] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2684780 (Reedy) Canned response to send back to them, and something for them to push to their IT guys to point out th...
[16:19:15] HTTPS, Traffic, Operations, WMF-Communications, Security-Other: Server certificate is classified as invalid on government computers - https://phabricator.wikimedia.org/T128182#2684800 (BBlack) I think probably the best canned response we can send is something along the lines of: ```Probably...
[16:40:12] Traffic, Operations, Patch-For-Review: TLS stats regression related to Chrome/41 on Windows - https://phabricator.wikimedia.org/T141786#2684860 (BBlack) At least in the initial few minutes after removing the workaround, there's no apparent return of the bad traffic. Will leave this for a few days to...
[16:58:41] bblack: not sure if you are working on it but I received "ERROR:conftool:Error when trying to set/pooled=no on name=cp1064.eqiad.wmnet,service=varnish-be-rand"
[16:59:05] (48 mins ago, cron root@)
[16:59:20] ERROR:conftool:Failure writing to the kvstore: Backend error: Raft Internal Error :
[16:59:30] known issue with the etcd cluster I think
[17:00:38] yes I mentioned since I don't know if 1064 needed to be restarted or not
[17:01:05] in this case no
[17:01:40] well, I take that back. maybe :)
[17:02:07] originally it was failing with a zero exit code and letting the restart happen anyways (without the pre-depool), but that may not be the case anymore
[17:02:32] looks like the restart still happened
[17:06:14] so it looks like the caches are running python-conftool-0.3.0 with no upgrade avail, but the repo has a 0.3.1 release that I guess hasn't made it to carbon yet, with the bugfix
[17:06:39] (the exit-code bugfix, which will cause the restart to not happen if the depool fails, which is more-correct but also makes the real issue more-annoying)
[17:09:18] Traffic, Operations: etcd cluster has Raft Internal errors sporadically - https://phabricator.wikimedia.org/T147209#2685056 (BBlack)
[17:09:35] ^ made a ticket about the deeper issue, we seemed to not make one originally
[17:11:02] nice!
[17:11:26] I was trying to check how vk reacted to the restart but I can only see Oct 03 05:36:16 cp1064 varnishkafka[4031]: VSLQ_Dispatch: Log acquired!
[17:11:40] that is early this morning
[17:12:10] the new functionality seems working fine though
[17:12:11] Sep 30 05:36:19 cp1064 varnishkafka[11447]: VSLQ_Dispatch: Log acquired!
[17:12:14] Sep 30 17:30:05 cp1064 varnishkafka[11447]: VSLQ_Dispatch: Varnish Log abandoned or overrun.
[17:12:17] Sep 30 17:30:05 cp1064 varnishkafka[11447]: VSLQ_Dispatch: Log acquired!
[17:12:23] but this one was some days ago
[17:12:53] ahhh the script erroring was varnish-backend
[17:12:58] okok now I got it
[17:13:22] :)
[17:13:23] only a varnish-frontend restart causes a vk log abandoned/acquired log
[17:13:27] sorry for the spam :)
[17:13:58] well in these cases, when the backends do their scripted restarts, all the frontends in the same DC also reload their VCL (to depool the backend in question, then later to repool it)
[17:14:15] and apparently the frontend VCL reloads had some probability of killing vk's connection to shm log
[17:14:28] (which is hopefully addressed now with the new reconnect code)
[17:15:13] (I really hope so!)
[17:19:22] <_joe_> bblack: the conftool upgrade will be available tomorrow
[17:19:41] <_joe_> sorry, didn't make it before the offsite and I needed to work on k8s today before yuvi went away
[17:25:50] np :)
[20:00:55] bblack: your phabricator replies are priceless thank you :]
[21:40:22] Traffic, Operations, Patch-For-Review, codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2686167 (BBlack)
[21:40:25] Traffic, Operations, Services, Patch-For-Review: Declarative configuration for varnish services and backends - https://phabricator.wikimedia.org/T110717#2686169 (BBlack)
[22:00:37] Traffic, Operations, Patch-For-Review, codfw-rollout: Varnish support for active:active backend services - https://phabricator.wikimedia.org/T134404#2686270 (BBlack) I've merged the "declarative config" ticket to here, it's worth perusing the older comments/commits there at T110717. The rational...
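
The exit-code discussion in this log (17:02 to 17:06) is about whether a scripted restart should go ahead when the pre-depool step fails. A minimal sketch of the "more-correct" behaviour described there: skip the restart if the depool reports a non-zero exit code. This is not the actual cron script or conftool code. The function name `restart_with_depool` and its three callables are hypothetical, and leaving the node depooled after a failed restart is an assumption made for illustration.

```python
from typing import Callable

# Each step is a callable returning a process-style exit code (0 = success).
# These callables are hypothetical stand-ins for the real depool, restart
# and repool commands, which this log does not show.
Step = Callable[[], int]


def restart_with_depool(depool: Step, restart: Step, repool: Step) -> str:
    """Run depool -> restart -> repool, stopping at the first failure."""
    if depool() != 0:
        # Depool failed (e.g. a kvstore write error): do not restart a
        # node that may still be receiving traffic.
        return "skipped"
    if restart() != 0:
        # Assumption: leave the node depooled so it gets no traffic.
        return "restart-failed"
    if repool() != 0:
        return "repool-failed"
    return "ok"
```

In practice each callable might wrap something like `subprocess.run(cmd).returncode`. The older behaviour described at 17:02 corresponds to a depool step that returned 0 even though it had failed, so the restart went ahead without the node being depooled.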