[09:18:14] I am going to work on db1082/db1124
[09:25:43] DBA, Data-Services, Operations: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (jcrespo) db1124:s5 stopped at db1082-bin.002490:667685191 ` root@db1124[(none)]> show global variables like '%gtid%'; +------------------------+------------------------------...
[11:25:17] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change, User-Banyek: Dropping user.user_options on wmf databases - https://phabricator.wikimedia.org/T85757 (Banyek) On `db1062` (s7 master) every database is done except `eswiki`; I have to retry this later. (Lock wait timeout exceeded)
[13:39:42] DBA, Analytics, Analytics-Kanban, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Banyek) @Bstorm so, what do you think, should I drop these?
[14:30:25] DBA, Operations, ops-codfw: Degraded RAID on db2047 - https://phabricator.wikimedia.org/T212966 (Marostegui) Open→Resolved Thanks! ` root@db2047:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337E0DB0) Port Name: 1I Port Name: 2I Gen...
[14:30:37] DBA, Operations: Predictive failures on disk S.M.A.R.T. status - https://phabricator.wikimedia.org/T208323 (Marostegui)
[14:43:01] DBA, Jade, Operations, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (daniel) Moving this to "backlog" for now. This should have happened right after last week's TechCom meeting, but I forgot to do...
[15:04:32] I am going to switch replication on db1124:3315 to codfw for some time
[15:04:50] until it catches up
[15:05:26] you mean db1082 catches up?
[15:05:42] no, db1082 has already caught up
[15:06:04] but I cannot point to it because it lacks row based binlogs
[15:06:33] so I move sanitarium until now and then put it back
[15:06:43] (I got the right position in advance)
[15:06:59] ah, you mean leaving db1082 alone
[15:07:00] right
[15:07:32] to get it recloned
[15:07:41] it was recloned already
[15:07:49] ah!
[15:07:52] sorry, missed that!
[15:07:59] but binlogs cannot be used
[15:08:12] I mean, if we were in an emergency, we could use them
[15:08:19] yep, I get it now :)
[15:08:26] but I prefer to switch over the replication temporarily
[15:08:30] yeah
[15:12:22] DBA, Operations, ops-codfw, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) I have depooled es2019 so it is ready to be powered off once @Papaul is ready for it
[15:17:04] DBA, SDC Engineering, SDC General, Wikidata, and 2 others: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 (Marostegui) The database is still present at the s4 servers, and I would like to clean it up. It cannot be done directly on...
[15:17:22] DBA, SDC Engineering, SDC General, Wikidata, and 2 others: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 (Marostegui) Resolved→Open
[15:22:05] DBA, Operations, ops-codfw, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) a:Banyek→Papaul Assigning to @Papaul as per our chat
[15:24:50] DBA, Recommendation-API, Research, Core Platform Team Backlog (Watching / External), and 2 others: Recommendation API exceeds max_user_connections in MySQL - https://phabricator.wikimedia.org/T212154 (Marostegui) Open→Resolved I have deployed and changed the mysql config: ` | GRANT USAGE...
[15:40:41] I'll leave soon for the kindergarten, but I'll be online later
[15:41:46] DBA, Recommendation-API, Research, Core Platform Team Backlog (Watching / External), and 2 others: Recommendation API exceeds max_user_connections in MySQL - https://phabricator.wikimedia.org/T212154 (bmansurov) @Marostegui thanks!
[15:48:13] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[15:51:05] DBA, SDC Engineering, SDC General, Wikidata, and 2 others: Create a production test wiki in group0 to parallel Wikimedia Commons - https://phabricator.wikimedia.org/T197616 (jcrespo) By the way, are people aware that a shard called "test-s4" has 2 dedicated large hosts and ready to be used for pr...
[16:01:17] DBA, Jade, Operations, TechCom-RFC, and 2 others: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (Halfak) The meeting happened. @Krinkle confirmed that his questions had been answered. I imagined that he'd report to y'all.
[16:06:40] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[16:07:19] DBA, Analytics, Analytics-Kanban, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Bstorm) Well, my end can't use 'em very effectively. It's all whether we are going to build it around analytics...
[16:11:02] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) @faidon @RobH can we follow up with Dell to see what's going on a more formal way? This server has been unusable since it arrived and it is brand new :-|
[16:20:05] DBA, Operations, ops-codfw, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Papaul) a:Papaul→Marostegui Update BIOS from 2.4.3 to 2.8.0 IDRAC from 2.40 to 2.61 system is power on
[16:27:56] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (RobH) >>! In T207258#4862960, @Marostegui wrote: > @faidon @RobH can we follow up with Dell to see what's going on a more formal way? This server has been unusable since it arrived and it is bran...
[16:28:19] DBA, Operations, ops-eqiad: rack/setup/install pc1007-pc1010 - https://phabricator.wikimedia.org/T207258 (Marostegui) Thank you!
[16:34:54] DBA, Operations, ops-codfw: Several es20XX servers keep crashing (es2017, es2019, es2015, es2014) since 23 March - https://phabricator.wikimedia.org/T130702 (Marostegui)
[16:34:57] DBA, Operations, ops-codfw, Patch-For-Review: es2019 is not responsive - https://phabricator.wikimedia.org/T212833 (Marostegui) Open→Resolved Thank you! I have repooled the server!
[16:46:31] DBA, Analytics, Analytics-Kanban, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Marostegui) +1 to drop them if @Milimetric doesn't need them
[16:57:42] so this is the current status: I started replication on db1124:s5 from db2066
[16:57:52] but I stopped it on the 3 labsdbs
[16:58:13] if everything is working, I will restart the labsdbs
[16:58:17] ah good
[16:58:27] will you let it replicate for a few mins first?
[16:58:42] hours
[16:59:19] it was a tricky change because filters + separate master make it non-trivial
[16:59:27] and it worries me to automate it
[17:01:12] in fact, even if I wanted to change it back to db1082 quickly, I cannot, it has to catch up almost a whole day
[17:46:33] DBA, Analytics, Analytics-Kanban, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Milimetric) +1 to drop, our other solution is feasible and we're going that way for the foreseeable future. Hap...
[18:51:26] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[19:01:29] DBA, Data-Services, Operations: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (jcrespo) This is mostly fixed, except gtid must be enabled on 82 and 1124, plus 82 must be repooled.
[19:07:12] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[20:13:53] DBA, Analytics, Analytics-Kanban, Data-Services, and 3 others: Create materialized views on Wiki Replica hosts for better query performance - https://phabricator.wikimedia.org/T210693 (Banyek) Ok, then we all agree, I'll drop those tables tomorrow morning
[20:29:34] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[20:32:19] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui) s3 codfw will be done with replication once the backups are finished. s3 will be done on a per host basis to avoid lags [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1002 [] db1124 [] db1123 [] db1...
[20:37:29] DBA, Analytics, Analytics-Kanban, User-Elukey: Review dbstore1002's non-wiki databases and decide which ones needs to be migrated to the new multi instance setup - https://phabricator.wikimedia.org/T212487 (leila) >>! In T212487#4851201, @elukey wrote: >>>! In T212487#4839932, @elukey wrote: >>...
[21:03:29] DBA, Data-Services, Operations: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108 (bd808)
[22:14:10] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui) s3 codfw is done - with replication.
[22:14:23] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)
[22:14:39] DBA, Patch-For-Review: Drop valid_tag table - https://phabricator.wikimedia.org/T212254 (Marostegui)