[04:50:00] DBA, Operations, CAS-SSO, Patch-For-Review, User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (Marostegui) @jbond anything pending here or can this be closed?
[04:51:00] DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (Marostegui) Did the backup work fine @jcrespo?
[05:01:55] DBA, Operations, Patch-For-Review, User-Kormat: Set up replication for zarcillo - https://phabricator.wikimedia.org/T257816 (Marostegui) a:Kormat
[05:11:55] DBA, Core Platform Team, Schema-change, User-DannyS712: iwlinks indexes should be UNIQUE INDEXes - https://phabricator.wikimedia.org/T256842 (Marostegui) p:Triage→Medium
[05:13:23] DBA, Cloud-Services, User-Kormat: Prepare and check storage layer for sysop_itwiki - https://phabricator.wikimedia.org/T257125 (Marostegui) a:Kormat→None
[05:15:22] DBA, Operations, Sustainability (Incident Prevention): Disallow 'weight: 0' for MW db config in dbctl - https://phabricator.wikimedia.org/T239901 (Marostegui) Open→Declined Going to close this as declined for now, as looks like we are not going to proceed with this so far.
[05:22:51] DBA, Quibble: Optimize MySQL settings for MediaWiki CI / Quibble - https://phabricator.wikimedia.org/T218196 (Marostegui) @hashar is this still needed?
[05:24:26] DBA, Operations, Traffic, Patch-For-Review: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462 (Marostegui) a:Rduran→None
[05:26:19] DBA, Operations, Epic, Performance-Team (Radar), Sustainability (Incident Prevention): Decide how to improve parsercache replication, sharding and HA - https://phabricator.wikimedia.org/T133523 (Marostegui)
[07:20:14] DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (jcrespo) a:jcrespo→Marostegui A backup was successfully produced, however it is empty (has no data) like the database.
[07:42:11] DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (Marostegui) Thanks Jaime! @dpifke anything else left from your side or we are good to close this?
[08:29:04] DBA, Operations, CAS-SSO, Patch-For-Review, User-jbond: Request new database for idp-test.wikimedia.org - https://phabricator.wikimedia.org/T256120 (jbond) Open→Resolved nothing pending on this task, resolving and thanks
[08:44:38] DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) After a few days of investigati...
[09:53:31] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui)
[09:55:12] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui) More HW logs - we've got a broken DIMM apparently ` Record: 1 Date/Time: 03/30/2020 23:27:03 Source: system Severity: Ok Description: Log cleared. -----------------------...
[09:58:28] DBA, Operations, ops-eqiad, Patch-For-Review: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui) I have hit F1, to continue its boot, so we can see the OS. The memory on the OS looks ok: ` root@db1145:~# free -g total used free...
[09:58:48] DBA, Operations, ops-eqiad, Patch-For-Review: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui) p:Triage→Medium
[09:59:45] DBA, Operations, ops-eqiad, Patch-For-Review: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui)
[11:06:17] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (jcrespo) I've started a restore process from yesterday's backups.
[11:10:52] DBA: Solve transferpy concurrency issue with auto port detection - https://phabricator.wikimedia.org/T256450 (jcrespo) The shared checksum path on parallel execution issue should get top priority, as this was already observed problematic when running two independent executions of transfer.py to the same host...
[11:13:57] DBA: Solve transferpy concurrency issue with auto port detection - https://phabricator.wikimedia.org/T256450 (Privacybatm) Yeah, I agree with you, I am working in this issue currently. Thank you.
[12:45:10] DBA: Make checksum parallel to the data transfer in transferpy package - https://phabricator.wikimedia.org/T254979 (jcrespo) So the largest issues on how options work, which makes them very confusing: If I do --no-checksum, I expect to not get any checksum; however, I get a parallel checksum. If I do --para...
[13:01:29] DBA: Make checksum parallel to the data transfer in transferpy package - https://phabricator.wikimedia.org/T254979 (Privacybatm) >>! In T254979#6315023, @jcrespo wrote: > So the largest issues on how options work, which makes them very confusing: > > If I do --no-checksum, I expect to not get any checksum;...
[13:06:48] DBA, Patch-For-Review: Make checksum parallel to the data transfer in transferpy package - https://phabricator.wikimedia.org/T254979 (Privacybatm) I have just updated the commit message so that it is visible here!
[13:20:48] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (jcrespo) a:wiki_willy The service is back up from backups so the backup service continues uninterrupted during the weekend. @wiki_willy let us know what is the next step as mentioned by Maroste...
[13:24:27] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui) @jcrespo remember I disabled notifications via puppet, I guess we should leave them disabled until the maintenance is done?
[13:26:02] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (jcrespo) Yes, I agree with that option. Thanks for creating the ticket and doing the initial triage!
[13:26:32] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (Marostegui) Cool! <3
[13:33:46] DBA, Patch-For-Review: Solve transferpy concurrency issue with auto port detection and checksum file names - https://phabricator.wikimedia.org/T256450 (Privacybatm)
[17:09:59] DBA, Operations, ops-eqiad: db1145 crashed - memory issues - https://phabricator.wikimedia.org/T258249 (wiki_willy) a:wiki_willy→Jclark-ctr @Jclark-ctr - can you check this one out when you're onsite next? It was only installed a few months ago, so we should be able to RMA the part pretty ea...
[17:14:28] DBA, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (wiki_willy) @Marostegui - here are the details below on what Dell replaced. The DIMM A10, the SSD in slot 0, and the system board (though the board wasn't bad...it was just the CMOS...
[18:13:03] DBA, Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (dpifke) Open→Resolved All good on this end. Thanks!
[18:50:31] DBA, Operations, ops-eqiad, Patch-For-Review: Degraded RAID on db1131 - https://phabricator.wikimedia.org/T257253 (Jclark-ctr) Thanks willy Yes. DIMM A10 SSD slot 0 Main board
[22:28:54] DBA, CheckUser, Stewards-and-global-tools, I18n, and 2 others: Incomplete i18n for log entries in CheckUser - https://phabricator.wikimedia.org/T41013 (Huji) The reason I mentioned DBA is because I know there has been some sensitivity about the size of CU tables. See T257223 for instance.