[01:06:28] DBA, Data-Services, Quarry: SQL requests to DB replicas became work much slower, both from Quarry and from process on Toolforge - https://phabricator.wikimedia.org/T262757 (bd808) Plugging the query into https://sql-optimizer.toolforge.org/ gives this explain result: |id| select_type| table| type|...
[05:14:06] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (Marostegui)
[05:14:13] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (Marostegui) Open→Resolved This is all done
[05:36:02] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui)
[05:36:16] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui) p:Triage→High a:Marostegui
[05:42:42] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui) Something happened on the 26th of August, when that table started to take up a lot more disk space on the host {F32349428}
[05:52:13] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui) Most of the logged queries are coming from db1138 from what I can see.
[06:03:03] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui) p:High→Medium I have truncated the table and after a few minutes the table again has almost 1M rows and this is interesting: ` mysql:root@localhost [tendril]> select distinct(m_server_id) from general_log...
[07:40:17] DBA, Data-Services, Projects-Cleanup: Drop DB tables for now-deleted fixcopyrightwiki from production - https://phabricator.wikimedia.org/T246055 (Marostegui) @Bugreporter what's the benefit of renaming if they need to be truncated anyways?
[07:44:31] DBA: db1115's (tendril) disk filling up - https://phabricator.wikimedia.org/T262782 (Marostegui) Open→Resolved Mystery solved.
Looks like the `general_log` table on db1138 contained lots of queries (248k) (this was probably enabled by me for MCR query capture) from the 26th Aug. This was being imported e...
[08:14:44] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) @Papaul mysql stopped, you can proceed with this host whenever you want.
[08:52:41] I will stop replication on db1077 soon, when alex confirms. That should not alert, but likely in a few minutes/hours m2 will start to get lag due to big alters
[08:52:53] ok
[08:59:30] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui)
[08:59:41] DBA, Patch-For-Review: Productionize es20[26-34] and es10[26-34] - https://phabricator.wikimedia.org/T261717 (Marostegui) es2026 is fully pooled on es2
[10:48:50] DBA: transfer.py fails when copying data between es hosts - https://phabricator.wikimedia.org/T262388 (jcrespo) a:jcrespo
[11:06:46] marostegui: o/ let me know if I can help with anything
[11:07:00] Amir1: o/ back from holidays?
[11:07:10] Yup :D
[11:07:20] Amir1: Good to have you around again! Thanks for the offer :**
[11:07:40] ^_^
[11:10:56] DBA, DC-Ops, Operations, ops-eqiad: Mon, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (Marostegui) >>! In T261455#6423007, @Marostegui wrote: > Please take extra care with db1087, db1100 and db1109, they are eqiad masters and lots of slaves hang...
[11:48:33] Blocked-on-schema-change, DBA: Extend sites.site_global_key on WMF production - https://phabricator.wikimedia.org/T260476 (Marostegui)
[11:48:54] Blocked-on-schema-change, DBA: Extend sites.site_global_key on WMF production - https://phabricator.wikimedia.org/T260476 (Marostegui) Open→Resolved All done
[13:47:32] DBA, Data-Services, cloud-services-team (Kanban): enwiki database replicas (Toolforge and Cloud VPS) are more than 24h+ lagged - https://phabricator.wikimedia.org/T262239 (Marostegui) For what it's worth, the last maintenance that needs to happen on s1 is being run at the moment, I expect it to take 2...
[14:29:07] jynus: https://phabricator.wikimedia.org/P12579 :(
[14:29:42] No puppet run around that time
[14:29:58] Unless you want to debug something I will re-start it again
[14:30:38] sure
[14:30:42] so it is something else
[14:31:24] We'll see, I will re-start it again with the same conditions (puppet enabled and to a different directory)
[14:32:01] nc processes were not killed btw
[14:33:31] Started, we'll see how it goes. It failed after 3.5TB
[14:36:09] matching ps1-d5-eqiad maintenance, coincidence?
[14:36:33] yeah, it should be, that's eqiad
[14:43:08] Yeah, it is all being done from codfw, as I am using cumin2001
[14:43:20] so I am then completely lost
[14:44:50] I have commented on the task, for tracking
[14:44:57] thanks
[14:44:58] DBA: transfer.py fails when copying data between es hosts - https://phabricator.wikimedia.org/T262388 (Marostegui) For what it's worth, there was an error today between es2017 and es2027's transfer after 3.5TB: {P12579} The only puppet runs around those times at the destination host (es2027) are: - Sep 14 12:...
[14:46:16] I should've used verbose, oh well
[15:22:38] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ...
FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (lmata) Is there any specific action you'd like us to take regarding the exporter?
[15:27:07] DBA, Operations, observability: Prometheus/MariaDB counts a 'SELECT ... FOR UPDATE' query as an UPDATE query - https://phabricator.wikimedia.org/T262579 (jcrespo) I think the title no longer reflects reality; I am not sure anymore if there was an issue, but even if there was, I don't believe it is on th...
[15:29:47] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: Mon, Sept 14 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (RobH)
[16:13:37] DBA, DC-Ops, Operations, ops-eqiad: Tue, Sept 8 PDU Upgrade 12pm-4pm UTC- Racks D3 and D4 - https://phabricator.wikimedia.org/T261452 (RobH) a:Jclark-ctr→Cmjohnson It appears all the steps by onsites were done, but it's unclear. If there are any pending steps for these, please do so and t...
[16:14:08] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[16:14:11] DBA, Data-Services: Prepare and check storage layer for arbcom_ruwiki - https://phabricator.wikimedia.org/T262832 (JJMC89)
[16:15:50] DBA, Data-Services: Prepare and check storage layer for arbcom_ruwiki - https://phabricator.wikimedia.org/T262832 (JJMC89) public→private per the parent task
[16:19:20] DBA, Data-Services: Prepare and check storage layer for arbcom_ruwiki - https://phabricator.wikimedia.org/T262832 (jcrespo) As documented, please do not create the database itself until it is confirmed here that it is not being sent to cloud.
[16:24:52] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[16:25:58] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[16:32:40] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH) Please note that ps1-d6-eqiad does not see ps2-d6-eqiad, I suspect it is not linked correctly via cable. The new netbox entries for these two...
[16:40:47] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) hardware diagnostics error below {F32350724}
[17:00:14] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH) PDUs show correctly in icinga, so the errors for them are legit: ps1-d6-eqiad doesn't see ps2, so it has errors.
[18:04:21] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (ifried) @Marostegui Hello! I'm sending an update that we now plan to postpone the release to a week later, so we can address a UX issue. This means that Group 0...
[19:05:48] DBA, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Patch-For-Review, User-Marostegui: DBA review for Echo push notification subscription tables - https://phabricator.wikimedia.org/T246716 (Mholloway) There's no particular reason we need TEXT as opposed to BLOB, so I pushed a...
[19:12:40] DBA, DC-Ops, Operations, ops-eqiad: New Date - Tue, Sept 15 PDU Upgrade 12pm-4pm UTC- Racks C2 and C3 - https://phabricator.wikimedia.org/T261455 (wiki_willy)
[19:14:39] DBA, DC-Ops, Operations, ops-eqiad: New Date - Mon, Sept 14: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (wiki_willy)
[19:32:02] DBA, DC-Ops, Operations, ops-eqiad: New Date - Mon, Sept 14: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:32:35] DBA, DC-Ops, Operations, ops-eqiad: New Date - Mon, Sept 14: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:32:50] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:34:16] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH) I've removed the due/work date as the majority of the onsite work was completed today. All that is pending onsite work is: [] - onsite to update this task with the as...
[19:38:59] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade 12pm-4pm UTC- Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:39:27] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:52:10] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:56:23] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH)
[19:56:42] DBA, DC-Ops, Operations, ops-eqiad: PDU Upgrade Racks D5 and D6 - https://phabricator.wikimedia.org/T261453 (RobH) So the only pending item is ps1-d6-eqiad cannot see ps2-d6-eqiad. I suspect the silver link cable is not seated.
[20:05:09] DBA, Community-Tech, Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (Marostegui) That sounds good @ifried - let me know when the final date is clear Thank you for the heads up
[20:07:05] DBA, Product-Infrastructure-Team-Backlog, Push-Notification-Service, Patch-For-Review, User-Marostegui: DBA review for Echo push notification subscription tables - https://phabricator.wikimedia.org/T246716 (Marostegui) Thanks. What's the release plan? Is this going to go full ON for all the w...
[23:28:58] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) Papaul, That error just points to the existence of errors. Were there any other errors following that? I'm sorry if I didn't mak...
[23:29:59] DBA, Operations, ops-codfw, Patch-For-Review, User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) Hey Michael, No need to be sorry I knew I had to continue diagnostics which I did and it was taking too long and was waiting for it...