[00:35:54] Traffic, DC-Ops, Operations, ops-esams: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228 (Dzahn) cp3033 is shown as having CRIT redundancy https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cp3033&service=IPMI+Sensor+Status
[00:40:03] Traffic, Operations, ops-esams: cp3033 unreacheable since 2018-07-15 11:47:31 - https://phabricator.wikimedia.org/T199677 (Dzahn) The host also shows that power supplies are not redundant.. which had a comment linking to T177403 -> T177228. And support has expired (https://netbox.wikimedia.org/dcim...
[00:44:18] Traffic, Operations: cp4021 - UNKNOWN: cannot run varnishstat - https://phabricator.wikimedia.org/T221731 (Dzahn)
[04:10:09] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (CDanis) @Cwek Thank you very much for the detailed report! I've rolled back the experimental change to our DNS...
[06:31:32] Traffic, Operations: cp4021 - UNKNOWN: cannot run varnishstat - https://phabricator.wikimedia.org/T221731 (Vgutierrez) p:Triage→Low that's expected, as @ema mentioned yesterday in -traffic: ` so we've got cp4021 reimaged as Varnish/ATS and it seems to be looking kind-of OK it is howeve...
[07:04:39] Traffic, Operations: Evaluate ATS TLS stack - https://phabricator.wikimedia.org/T220383 (Vgutierrez) We need to keep an eye on https://github.com/apache/trafficserver/issues/5084
[08:25:54] Traffic, Operations: Evaluate ATS TLS stack - https://phabricator.wikimedia.org/T220383 (MoritzMuehlenhoff) >>! In T220383#5133808, @Vgutierrez wrote: > We need to keep an eye on https://github.com/apache/trafficserver/issues/5084 Buster has OpenSSL 1.1.1b, so this affects ATS as shipped in Buster? Shou...
[08:27:56] Traffic, Operations: Evaluate ATS TLS stack - https://phabricator.wikimedia.org/T220383 (Vgutierrez) so I guess it's affected but right now I'm working under the assumption that we will use stretch in the cp nodes, using our own ATS packaging. @ema can confirm that :)
[08:29:48] moritzm, ema maybe is faster here :)
[08:30:17] moritzm: the cp-ats nodes are currently using our (ema) custom packaging for trafficserver for stretch
[08:30:45] so at some point we need to bring libssl1.1.1 to stretch as we briefly discussed in the past
[08:31:19] and consider the issue described on https://github.com/apache/trafficserver/issues/5084 that BTW looks like it will be fixed in 1.1.1c
[08:34:09] vgutierrez: if https://github.com/apache/trafficserver/issues/5084 is reproducible in buster (with buster packages, not ours), we should file a bug in debian
[08:37:00] when it comes to our own setup: is there any other software requiring libssl1.1.1 in stretch?
[08:37:20] how much of an effort is it to backport it?
[08:42:10] moritzm suggested following a similar approach as the one used to bring 1.1.0 to jessie IIRC
[08:57:06] Traffic, Operations: cp4021 - UNKNOWN: cannot run varnishstat - https://phabricator.wikimedia.org/T221731 (ema) Indeed our Varnish mailbox lag Icinga check only applies to Varnish backends, given that backends are those affected by T145661 and similar issues. During the Puppet refactoring splitting front...
[09:07:26] cdanis: https://en.greatfire.org/https/wikipedia.org and https://www.comparitech.com/privacy-security-tools/blockedinchina/ seems to still give negative results
[10:00:24] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris)
[10:03:55] Traffic, Operations, Patch-For-Review: cp4021 - UNKNOWN: cannot run varnishstat - https://phabricator.wikimedia.org/T221731 (ema) Open→Resolved
[10:07:04] elukey: Receiver signal average optical power : 0.0003 mW / -35.23 dBm
[10:07:18] but the other side reports
[10:07:25] Receiver signal average optical power : 0.4013 mW / -3.97 dBm
[10:07:41] looks like 1 direction of the fiber was cut or something
[10:07:54] shark eating the cable
[10:08:07] lol
[10:08:13] more like broken I'd say though
[10:08:19] deteriorated badly
[10:09:16] ah we got an email
[10:10:37] we have the images: https://youtu.be/XMxkRh7sx84?t=10
[10:11:27] lol
[10:12:04] the must by so yummy for them, or good to bite for their teeth
[10:12:13] *it must be
[10:12:18] Service Identifier: BDFS2448
[10:12:18] Ticket Created On: BDFS2448
[10:12:18] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris) Received information for Level3 ` This is to confirm that ticket 16262986 has been created regarding your service. Customer Name: Wikimedia Foundation Billing Account Number: 1-DCG6LL Customer...
[10:12:27] so.. BDFS2448 is a valid timestamp?
[10:12:32] I should start using it ;-)
[10:12:41] anyway
[10:13:17] we have to wait for them to fix the link, and hope that knams feels good today :D
[10:19:30] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris) p:Triage→High
[10:27:08] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (Cwek) @CDanis I can't confirm it completely, but it seems the side effect may have formed. I extracted some su...
[10:30:09] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (Cwek) @CDanis Thanks your help, but it seems the side effect may have formed. I extracted some subdomains of w...
[10:41:08] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris) New update says: ` Field Operations dispatched and upon arrival to the site determined the fiber near the equipment had been burned. Field Operations are currently working to install a new fibe...
[10:41:25] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris)
[10:41:32] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris) p:High→Low
[12:18:59] gilles: yes, I'll find some time this week to look at your patch!
[13:26:09] thanks!
[14:04:40] netops, Operations: Level3 esams <-> eqiad link outage - https://phabricator.wikimedia.org/T221758 (akosiaris) Open→Resolved a:akosiaris CenturyLink sent a summary and a notification they'll close the issues as resolved on their end. ` Summary: On April 24, 2019 at 9:21 GMT, CenturyLink ide...
[14:44:29] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (BBlack) @Cwek - Thanks for the reports! Have you tried other Wikimedia projects (e.g. wikiversity, wikiquote,...
[14:58:36] netops, Analytics-Kanban, EventBus, Operations: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690 (Ottomata) a:Ottomata→None
[14:58:46] who should I bug about ^
[14:58:48] :)
[14:58:49] ?
[15:09:37] Traffic, Operations, Patch-For-Review, Performance-Team (Radar): Refactor public-facing DYNA scheme for primary project hostnames in our DNS - https://phabricator.wikimedia.org/T208263 (Cwek) @BBlack You can read this [[ https://zh.wikipedia.org/wiki/Help:%E5%A6%82%E4%BD%95%E8%AE%BF%E9%97%AE%E7%B...
[15:10:14] ottomata: T221690 ?
[15:10:14] T221690: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690
[15:18:17] XioNoX: ya
[15:19:13] netops, Analytics-Kanban, EventBus, Operations: Allow analytics VLAN to reach schema.svc.$site.wmnet - https://phabricator.wikimedia.org/T221690 (ayounsi) a:ayounsi
[15:19:20] I can help
[15:19:38] will do it later today
[15:27:56] XioNoX: I can do it if you want
[15:28:26] it should be an easy term to add
[15:28:31] (ipv6 and ipv4)
[15:28:55] elukey: I'll do it through jnt, unless it's urgent
[15:29:35] ack then
[15:35:22] not urgent, thank you!
[15:40:24] elukey: schema.svc.eqiad.wmnet and schema.svc.codfw.wmnet only have v4 DNS
[15:40:44] ah yes even better then
[15:44:02] permission to bounce pybal on lvs low-traffic eqiad/codfw? context is https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504590/
[15:47:54] herron: go ahead, as usual starting with the secondaries and double-checking with ipvsadm that everything looks sane :)
[15:48:15] kk thanks
[15:56:01] done
[16:01:00] herron: thanks!
[19:58:17] mutante: can you handle the communication to wikitech-l regarding the TLS cipher suite change in gerrit?
[20:09:27] I'm working on that vgutierrez
[20:09:31] https://etherpad.wikimedia.org/p/g505410announce
[20:16:35] thx
[20:18:56] vgutierrez: i will add it to the deployment calendar and then we can send the etherpad contents to wikitech-l
[20:19:15] sound good? then we have a time for users to expect it
[20:25:52] yep
[20:34:47] https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1824248&oldid=1824244
[20:34:59] tomorrow morning and added myself to possible puppet swatters
[20:35:11] netops, Operations, cloud-services-team (Kanban): Allocate VIP for failover of the maps home and project mounts on cloudstore1008/9 - https://phabricator.wikimedia.org/T221806 (Bstorm)
[20:35:26] netops, Operations, cloud-services-team (Kanban): Allocate VIP for failover of the maps home and project mounts on cloudstore1008/9 - https://phabricator.wikimedia.org/T221806 (Bstorm) a:Bstorm→None
[20:39:39] notified releng
[21:05:38] bblack: i would like to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/500715 and i can't decide if it counts as trivial or dangerous .. adding a regex to text.yaml for getting wikiba.se done
[21:23:27] mutante: seems relatively safe, maybe pass it through puppet-compiler on a text node first. But probably the worst that can happen is a bunch of puppetfails from VCL failing to reload (or wikibase itself not working, but it's new to this cluster anyways)
[21:26:57] bblack: sounds good and will do that. thank you
[22:20:16] merged
[22:31:03] netops, Operations, cloud-services-team (Kanban): Allocate VIP for failover of the maps home and project mounts on cloudstore1008/9 - https://phabricator.wikimedia.org/T221806 (ayounsi) Thanks, sounds good! There is nothing special to do, make sure to reserve it in DNS, eg. 208.80.155.119/2620:0:861:...
[22:36:57] netops, Operations, cloud-services-team (Kanban): Allocate VIP for failover of the maps home and project mounts on cloudstore1008/9 - https://phabricator.wikimedia.org/T221806 (Bstorm) Great! I'll sort that out, then.
[22:37:08] netops, Operations, cloud-services-team (Kanban): Allocate VIP for failover of the maps home and project mounts on cloudstore1008/9 - https://phabricator.wikimedia.org/T221806 (Bstorm) a:Bstorm
[22:46:25] Traffic, Operations, Wikidata, serviceops, and 4 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) >>! In T155359#5108338, @Dzahn wrote: > Next is deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/500715 for T99531#50771...
[23:10:38] netops, Operations, ops-ulsfo: Interface errors on cr4-ulsfo:et-0/0/1 - https://phabricator.wikimedia.org/T205937 (RobH) Open→Resolved a:RobH
[23:18:36] Domains, Traffic, Operations, WMF-Legal, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (Dzahn) Hi @tramm Any update on this from your side?