[00:50:09] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248 (Krenair) I don't think we use certbot anywhere except maybe Gerrit. This ticket hasn't been updated since the acme-chief deployment, which is now being used for the...
[06:00:11] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248 (Krinkle)
[06:02:29] HTTPS, Traffic, Operations, Performance-Team (Radar): TLS certificates renewal process - https://phabricator.wikimedia.org/T196248 (Krinkle) @BBlack Based on the three references you've made to this ticket over the past two years, I guess this has de-facto been accepted as-is. Should we document...
[07:06:09] netops, Operations, ops-codfw: Switch on rack C7 in codfw is down - https://phabricator.wikimedia.org/T267865 (elukey) Just added two days of downtime to all the hosts in the rack, hopefully it will be less spammy. As follow up of this task I think that we should prioritize T225005, having only 3 ka...
[08:27:32] Traffic, Desktop Improvements, Operations, Product-Infrastructure-Team-Backlog, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (Jgiannelos) >>! In T266373#6616778, @akosiaris wrote: >>>! In T266373#6613038, @Jgiannelos wrote: >> @akos...
[08:33:26] Traffic, Desktop Improvements, Operations, Product-Infrastructure-Team-Backlog, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (Jgiannelos) >>! In T266373#6617586, @akosiaris wrote: >> Interestingly, proton returns transfer-encoding:...
[09:00:03] Traffic, Operations: purged is not resilient to kafka main nodes going down - https://phabricator.wikimedia.org/T267867 (jijiki) p:Triage→Medium
[10:45:06] netops, Operations, ops-codfw: Switch on rack C7 in codfw is down - https://phabricator.wikimedia.org/T267865 (ops-monitoring-bot) Icinga downtime for 1 day, 0:00:00 set by dcaro@cumin1001 on 1 host(s) and their services with reason: The switch it depends on is down ` cloudbackup2002.codfw.wmnet `
[11:05:38] Traffic, Desktop Improvements, Operations, Product-Infrastructure-Team-Backlog, and 4 others: Connection closed while downloading PDF of articles - https://phabricator.wikimedia.org/T266373 (akosiaris) >>! In T266373#6623109, @Jgiannelos wrote: >>>! In T266373#6616778, @akosiaris wrote: >>>>! In...
[11:18:38] HTTPS, Traffic, Operations, Patch-For-Review, and 2 others: sec-warning page uses the term "Wikipedia" incorrectly - https://phabricator.wikimedia.org/T241656 (Ladsgroup) Open→Resolved a:Ladsgroup
[14:30:56] HTTPS, Traffic, Beta-Cluster-Infrastructure, Operations: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (Vgutierrez) >>! In T267858#6622251, @Krenair wrote: > @Vgutierrez FYI in case this could happen in prod too, I haven't be...
[15:49:11] Traffic, Operations, Patch-For-Review, User-notice: Deprecate TLSv1.2 weak ciphersuites - https://phabricator.wikimedia.org/T258405 (Vgutierrez) Open→Resolved a:Vgutierrez
[16:11:57] netops, Operations, ops-codfw: ripe-atlast-codfw is down - https://phabricator.wikimedia.org/T267714 (Papaul) power cycle device, checked cable, swapped cable device is still showing down
[16:12:48] netops, Operations, ops-codfw: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (ayounsi)
[16:13:50] netops, Operations, ops-codfw: ripe-atlas-codfw is down - https://phabricator.wikimedia.org/T267714 (ayounsi) a:faidon I think Faidon is the person who knows the most about the Atlas :) Feel free to re-assign as needed.
[16:29:22] Traffic, Beta-Cluster-Infrastructure, Operations, User-Ryasmeen: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (Vgutierrez) Resolved→Open Re-opening as mentioned in https://phabricator.wikimedia.org/T267006#6624466 deployment-cache-upload06 has been om...
[16:29:43] vgutierrez: hi, indeed deployment-cache-upload06 hasn't been fixed :/
[16:31:18] btw nothing marked on hold on upload06
[16:31:27] so maybe the apt upgrade fails :/
[16:31:51] I'm not familiar with how those systems are usually upgraded TBH
[16:32:02] no idea either :]
[16:32:11] I guess we just try hard until something works
[16:32:18] poor server
[16:32:19] mainly puppet passes and packages are all upgraded
[16:32:23] be gentle
[16:32:41] for the text06 one, ariel and I took over on thursday after some other acted on it
[16:33:09] and we just noticed some packages were marked on hold. After an upgrade puppet was all happy as well as the varnish services
[16:33:12] (surprisingly)
[16:34:39] probably having to downgrade libvarnishapi2 is messing with apt
[16:34:44] libvarnishapi2:
[16:34:44] Installed: 6.1.1-1+deb10u1
[16:34:44] Candidate: 6.0.6-1wm2
[16:35:15] netops, Operations, ops-codfw: Switch on rack C7 in codfw is down - https://phabricator.wikimedia.org/T267865 (ayounsi) Spare switch configured. Old (failed): https://netbox.wikimedia.org/dcim/devices/1892/ New (spare): https://netbox.wikimedia.org/dcim/devices/235/ That's where T259166 would be use...
[16:36:02] certainly
[16:36:15] I think we have unattended upgrade system enabled on beta cluster
[16:44:01] are you going to trigger the upgrade or may I?
[16:56:15] already on it :)
[17:04:57] Traffic, Beta-Cluster-Infrastructure, Operations, User-Ryasmeen: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (Vgutierrez) Open→Resolved a:Vgutierrez manually ran `apt upgrade` and puppet afterwards.. everything seems ok on deployment-cache-upload06
[17:12:04] vgutierrez: we need to remember to re-pool the codfw lvs :)
[17:12:16] yup
[17:12:40] let me repool cp2037 and cp2038 first though :)
[17:27:08] XioNoX: lvs2007 is looking good :)
[17:27:15] nice!
[17:27:22] traffic is coming back https://grafana.wikimedia.org/d/000000343/load-balancers-lvs?orgId=1&viewPanel=7&from=now-30m&to=now
[17:29:28] Traffic, Beta-Cluster-Infrastructure, Operations, User-Ryasmeen: Beta needs to be upgraded to Varnish 6 - https://phabricator.wikimedia.org/T267561 (hashar) thank you!
[17:38:12] netops, Operations, ops-codfw: Switch on rack C7 in codfw is down - https://phabricator.wikimedia.org/T267865 (Volans) I've run this piece of code to migrate the interfaces from the old to the new device in a Netbox `nbshell`. ` import uuid request_id = uuid.uuid4() user = User.objects.get(username='...
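The libvarnishapi2 versions quoted at 16:34:44 show apt's candidate (6.0.6-1wm2) sorting below the installed build (6.1.1-1+deb10u1), which is why moving to it counts as a downgrade. As a rough illustration only, here is a minimal Python sketch of Debian version ordering (epoch first, then upstream version, then Debian revision; `~` sorts before everything, letters before other symbols). It is written from dpkg's comparison rules, is not dpkg's own code, and `debian_version_cmp` is a hypothetical helper name.

```python
from typing import Tuple

# Minimal sketch of Debian version ordering, modelled on dpkg's rules;
# not dpkg's actual implementation. debian_version_cmp is hypothetical.

DIGITS = "0123456789"


def _order(c: str) -> int:
    """Sort weight of one non-digit character ('' means end of string)."""
    if c == "" or c in DIGITS:
        return 0
    if c == "~":
        return -1  # tilde sorts before everything, even end of string
    if c.isascii() and c.isalpha():
        return ord(c)  # letters sort before other symbols
    return ord(c) + 256


def _verrevcmp(a: str, b: str) -> int:
    """Compare two upstream (or revision) strings; returns <0, 0 or >0."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Compare the leading non-digit run character by character.
        while (i < len(a) and a[i] not in DIGITS) or (j < len(b) and b[j] not in DIGITS):
            ac = _order(a[i]) if i < len(a) else 0
            bc = _order(b[j]) if j < len(b) else 0
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Skip leading zeros, then compare the digit runs numerically.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and a[i] in DIGITS and j < len(b) and b[j] in DIGITS:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in DIGITS:
            return 1  # a's number has more digits
        if j < len(b) and b[j] in DIGITS:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version: str) -> Tuple[int, str, str]:
    """Split 'epoch:upstream-revision'; epoch defaults to 0, revision to ''."""
    epoch = 0
    if ":" in version:
        e, version = version.split(":", 1)
        epoch = int(e)
    if "-" in version:
        upstream, revision = version.rsplit("-", 1)
    else:
        upstream, revision = version, ""
    return epoch, upstream, revision


def debian_version_cmp(v1: str, v2: str) -> int:
    """Return <0, 0 or >0 as v1 sorts before, equal to, or after v2."""
    e1, u1, r1 = _split(v1)
    e2, u2, r2 = _split(v2)
    if e1 != e2:
        return e1 - e2
    return _verrevcmp(u1, u2) or _verrevcmp(r1, r2)


installed = "6.1.1-1+deb10u1"
candidate = "6.0.6-1wm2"
# True: the installed build sorts above the candidate, so installing the
# candidate would be a downgrade.
print(debian_version_cmp(installed, candidate) > 0)
```

The sketch only reproduces the ordering; whether apt actually moves to a lower candidate depends on pin priorities (see apt_preferences(5)).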
[17:41:40] netops, Operations, ops-codfw: Switch on rack C7 in codfw is down - https://phabricator.wikimedia.org/T267865 (ayounsi) Open→Resolved a:ayounsi Thanks, I think we're all done here. RMA in T267950.
[19:55:44] HTTPS, Traffic, Beta-Cluster-Infrastructure, Operations: The certificate for upload.beta.wmflabs.org expired on November 13, 2020. - https://phabricator.wikimedia.org/T267858 (Krenair) Open→Resolved
[22:01:03] Domains, Traffic, Operations, Patch-For-Review: Change of nameservers for Wikimedia.org.tr - https://phabricator.wikimedia.org/T259792 (CRoslof) Open→Resolved a:CRoslof I have updated the nameservers to the ones requested.