[00:31:55] netops, DBA, Operations, Patch-For-Review: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2967405 (jcrespo) I have upgraded all packages except wmf-mariadb10 and restarted the server for kernel update.
[01:36:58] netops, Operations, ops-codfw: codfw: mc2019-mc2036/switch port configuration - https://phabricator.wikimedia.org/T156212#2967567 (Papaul)
[06:55:57] netops, DBA, Labs, Operations, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2968016 (Marostegui)
[07:04:40] netops, DBA, Labs, Operations, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2968020 (Marostegui) For the record and tracking purposes: after lots of hours and hassle we were able to switch db1095's (new sanitarium) master from db1052...
[07:07:39] netops, DBA, Operations, ops-eqiad: Move db1054 to C3 - https://phabricator.wikimedia.org/T156225#2968022 (Marostegui)
[07:08:02] netops, DBA, Labs, Operations, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2961118 (Marostegui)
[08:33:16] moritzm: I've upgraded cp3034 to 8.7 and the new kernel, looks good so far
[08:38:23] ok, thanks
[08:54:32] upgrading cp3040 too so we have one text and one upload node on the newest kernel to keep an eye on
[10:16:49] codfw pooled, frontends refilling: https://grafana.wikimedia.org/dashboard/db/prometheus-varnish-dc-stats?panelId=32&fullscreen&var-datasource=codfw%20prometheus%2Fops&var-cluster=cache_upload&var-layer=backend&var-layer=frontend&from=now-1h&to=now
[10:39:31] nice graph :)
[10:43:31] wikilove to filippo!
[11:07:03] netops, Operations, ops-codfw: asw-a7-codfw is down - https://phabricator.wikimedia.org/T154758#2968449 (faidon) Open>Resolved a:faidon Nothing more to do here.
[11:07:19] netops, Labs, Operations: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#2968453 (faidon) p:High>Unbreak!
[11:10:22] netops, Labs, Operations: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#2968455 (faidon) The switch rebooted again overnight (Jan 25 01:16 UTC). We are going to proceed with a replacement as soon as the DBA work (T155999) is done. Setting priority to...
[11:12:43] netops, Operations: Enabling IGMP snooping on QFX switches breaks IPv6 (HTCP purges flood across codfw) - https://phabricator.wikimedia.org/T133387#2968457 (faidon) Open>stalled
[11:14:03] netops, Labs, Operations: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#2968471 (Marostegui) >>! In T155875#2968455, @faidon wrote: > The switch rebooted again overnight (Jan 25 01:16 UTC). We are going to proceed with a replacement as soon as the DB...
[11:55:32] netops, DBA, Operations, Patch-For-Review: Switchover s1 master db1057 -> db1052 - https://phabricator.wikimedia.org/T156008#2968514 (Marostegui) This will be happening Thursday 25th at 07:00 UTC
[14:57:46] Traffic, Operations: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#2968867 (BBlack)
[14:58:11] Traffic, Operations: Turn up network links for Asia Cache DC - https://phabricator.wikimedia.org/T156031#2968883 (BBlack)
[14:58:13] Traffic, Operations: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#2968882 (BBlack)
[14:58:40] Traffic, Operations: Configuration for Asia Cache DC hosts - https://phabricator.wikimedia.org/T156027#2968889 (BBlack)
[14:58:42] Traffic, Operations: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#2968867 (BBlack)
[15:04:59] remember we had some mysterious varnish system a while ago where lots of packages had been upgraded to the version in backports? that seems to be a bug in apt apparently, at least there are other reports of that: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=849382
[15:06:42] funky!
[16:10:43] netops, DBA, Operations, ops-eqiad, Patch-For-Review: Move db1054 to A2 - https://phabricator.wikimedia.org/T156225#2969189 (Marostegui)
[16:26:12] netops, DBA, Operations, ops-eqiad, Patch-For-Review: Move db1054 to A3 - https://phabricator.wikimedia.org/T156225#2969260 (Marostegui)
[16:26:39] netops, DBA, Operations, ops-eqiad, Patch-For-Review: Move db1054 to A3 - https://phabricator.wikimedia.org/T156225#2968022 (Marostegui) in the end it will go to A3 as Chris found some issues on the racks we previously selected.
[16:55:26] netops, DBA, Operations, ops-eqiad, Patch-For-Review: Move db1054 to A3 - https://phabricator.wikimedia.org/T156225#2969337 (Marostegui) Open>Resolved a:Cmjohnson db1054 has been moved. DNS updated db-eqiad,codfw files updated mysql and replication started finely. tendril updated...
[16:55:31] netops, DBA, Labs, Operations, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#2969340 (Marostegui)
[16:58:41] netops, Labs, Operations: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#2969364 (Cmjohnson) @faidon new switch has been installed. Also added an uplink module. The switch is accessible via mgmt
[19:45:17] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2970074 (RobH)
[19:46:55] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970080 (RobH)
[19:49:33] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970098 (Dzahn) Yep, i would suggest to install https://certbot.eff.org/ and run that there. It would get the cert and also create the Apache config snippet.
[19:54:09] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970105 (RobH) Also this doesn't appear to be in our monitoring, and it should be in icinga. I'm adding now.
[19:55:50] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2970113 (Krenair)
[19:55:52] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970111 (Krenair) Open>Invalid It already runs LE. You can add me to the monitoring if you like.
[19:58:52] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2970121 (RobH)
[19:59:13] Traffic, Operations: Letsencrypt all the prod things we can - planning - https://phabricator.wikimedia.org/T133717#2240497 (RobH) Seems wikitech-static was converted previously, so it was already done.
[20:46:52] HTTPS, Traffic, Operations, Wikimedia-Blog: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2970304 (EdErhart-WMF) @BBlack Automattic has done this. Can someone check and make sure it's been set correctly before we close the ticket?
[22:36:15] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970689 (Dzahn) added an Icinga contact for Krenair. but contact is not in a group yet.
[22:36:42] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970693 (Dzahn) @Robh wanna link the monitoring change
[22:53:35] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970706 (RobH) I linked this task in the commit. I thought it would show here post merge.... odd. I know it shows when bug:task# shows but should also post merge...
[22:54:15] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970707 (RobH) https://gerrit.wikimedia.org/r/#/c/334180/ & https://gerrit.wikimedia.org/r/#/c/334177/
[22:55:58] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970708 (Krenair) https://gerrit.wikimedia.org/r/#/c/334177/ https://gerrit.wikimedia.org/r/#/c/334180/ I think it doesn't like the whitespace between the task line...
[23:43:38] Traffic, Operations: convert wikitech-static.wikimedia.org to use LE rather than GS certificate - https://phabricator.wikimedia.org/T156294#2970836 (Dzahn) It's the lack of "Bug: "