[00:23:06] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es2034.codfw.wmnet'] ` and were **ALL** successful.
[00:25:32] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul)
[00:25:50] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Papaul)
[00:26:27] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Papaul) Open→Resolved @Marostegui all yours
[04:59:28] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (Marostegui) p:Triage→Medium
[05:02:56] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) Thank you @Papaul - they all look good to me.
[05:04:22] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[05:08:02] DBA: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[05:18:49] DBA, Data-Services, cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (Marostegui) Thank you for your understanding :-)
[07:07:29] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (jcrespo) Both global and session status seem to be doing the right thing (global status checked stopping replication), so it is not the server (x1...
[07:11:53] this may be interesting for you manuel and papaul: https://netbox.wikimedia.org/extras/reports/puppetdb.PhysicalHosts/
[07:14:14] ok, thanks
[07:44:34] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (jcrespo) Discarding also Grafana dashboards: ` irate(mysql_global_status_commands_total{instance="$server:$port",command="update"}[5m...
[07:50:53] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (jcrespo) However, while I can see the related updates later on, they are around 1 per second, not the large amount shown on the master, and not eno...
[09:33:27] I am relatively confident about: https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/626602/4
[09:33:38] but what do you think about: https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/626603/2
[09:33:40] ?
[09:34:14] will that work with m1, x1, es, pc?
[09:34:19] yep
[09:34:25] then I am ok with it
[09:34:28] it uses the newly centralized system
[09:34:51] so as long as the hiera key is updated, it will be kept up to date
[09:35:06] ok as in, will be potentially useful for you?
[09:35:39] yes, I have +1ed, I am used to using numbers, but this is good too
[09:35:51] for me it is mostly a "not having to remember the port/typing less"
[09:39:22] I am mostly thinking of other people not having to memorize the ports
[09:45:58] it's nice to have unit tests :-)
[10:39:59] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui) @ifried is this still scheduled for next week?
[10:42:03] akosiaris: otrs plan will go forward? Do I start to snapshot otrs and "break" otrs-test on db1077?
[10:56:44] I have started a snapshot now, in any case
[10:57:00] heads up for log, in case there is lag on db1117 for the heavy ongoing backups
[11:06:08] jynus: I was about to ask. Thanks!
[11:13:00] jynus: see -operations, not urgent, not an issue, but just for your awareness
[11:14:05] I gave them 2h of downtime
[11:19:36] strange
[11:19:40] that shouldn't happen
[11:19:45] lag yes, but overload?
[11:20:00] I have seen that before, so I am not surprised, especially when doing transfers from db1117
[11:20:38] snapshot is ongoing
[11:20:52] I see and because it is db1117, it affects almost all proxies
[11:22:01] ERROR 2013 (HY000): Lost connection to MySQL server at 'sending connection information to server', system error: 104
[11:22:08] but it works when connecting from localhost
[11:22:47] also, at most I would expect overload on m2, but not the others :-(
[11:22:49] Yes, that is what I said, that I have seen that before, the host rejecting/getting saturated, especially when doing transfers
[11:23:32] hopefully transfer should finish soon
[11:23:38] sorry about the noise
[11:24:07] not an issue
[12:29:28] db1117 network copy finished, so there should be no more errors from now on
[12:29:37] ok thanks
[12:30:14] procedure somewhere should note that otrs/db1117 backups should be done with less concurrency
[12:30:24] I will mention that on wiki
[13:09:58] about to bring down db1077 to restore and set it up as an additional otrs-only replica of m2
[13:10:33] ok
[13:10:45] I will check if microcode update is to be done there at the same time
[13:19:50] DBA, Operations, SRE-swift-storage, Goal: WMF media storage must be adequately backed up in a remote location - https://phabricator.wikimedia.org/T262668 (jcrespo)
[13:27:07] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried) @Marostegui Yes, we're planning for a Tuesday, 9/15 release. Thanks for checking in! I'll update you on Tuesday and let you know if the release is still...
[13:35:50] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui) Thanks!
[14:03:34] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui)
[14:03:51] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui) Open→Resolved This is all done
[15:05:57] DBA: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Pppery)
[16:40:56] PROBLEM - MariaDB sustained replica lag on db1081 is CRITICAL: 71.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[16:41:04] PROBLEM - MariaDB sustained replica lag on db1121 is CRITICAL: 28.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[16:41:58] PROBLEM - MariaDB sustained replica lag on db1141 is CRITICAL: 24.8 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1141&var-port=9104
[16:48:40] RECOVERY - MariaDB sustained replica lag on db1081 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[16:48:48] RECOVERY - MariaDB sustained replica lag on db1121 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1121&var-port=9104
[16:49:42] RECOVERY - MariaDB sustained replica lag on db1141 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1141&var-port=9104
[18:45:00] PROBLEM - MariaDB sustained replica lag on db1081 is CRITICAL: 6.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[18:48:52] PROBLEM - MariaDB sustained replica lag on db1147 is CRITICAL: 2.4 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1147&var-port=9104
[18:49:04] PROBLEM - MariaDB sustained replica lag on db1143 is CRITICAL: 2.6 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[18:49:52] RECOVERY - MariaDB sustained replica lag on db1147 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1147&var-port=9104
[18:53:12] RECOVERY - MariaDB sustained replica lag on db1143 is OK: (C)2 ge (W)1 ge 0 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1143&var-port=9104
[19:04:42] RECOVERY - MariaDB sustained replica lag on db1081 is OK: (C)2 ge (W)1 ge 0.2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db1081&var-port=9104
[19:35:25] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (wiki_willy)
[19:36:56] DBA, DC-Ops, Operations, ops-eqiad: New Date - Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (wiki_willy) a:Jclark-ctr→Cmjohnson
[19:37:16] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (wiki_willy) a:Jclark-ctr→Cmjohnson
[19:37:42] DBA, DC-Ops, Operations, ops-eqiad: Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C4 and C5 - https://phabricator.wikimedia.org/T261456 (wiki_willy) a:Jclark-ctr→Cmjohnson
[19:39:58] DBA, DC-Ops, Operations, ops-eqiad: New Date - Thur, Sept 17: PDU Upgrade 12pm-4pm UTC- Racks D1 and D2 - https://phabricator.wikimedia.org/T261459 (wiki_willy)
[20:40:13] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (RobH)
[20:56:41] DBA, DC-Ops, Operations, ops-eqiad: Mon, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (RobH)
[20:56:56] DBA, DC-Ops, Operations, ops-eqiad: Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C4 and C5 - https://phabricator.wikimedia.org/T261456 (RobH)
[20:57:20] DBA, DC-Ops, Operations, ops-eqiad: New Date - Thur, Sept 17: PDU Upgrade 12pm-4pm UTC- Racks D1 and D2 - https://phabricator.wikimedia.org/T261459 (RobH)
[20:57:51] DBA, DC-Ops, Operations, ops-eqiad: New Date - Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks D7 and D8 - https://phabricator.wikimedia.org/T261454 (RobH)
[20:58:00] DBA, DC-Ops, Operations, ops-eqiad: Wed, Sept 16 PDU Upgrade 12pm-4pm UTC- Racks C6 and C7 - https://phabricator.wikimedia.org/T261457 (RobH)