[05:39:17] DBA, MediaWiki-Database, MediaWiki-extensions-OATHAuth: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643 (Marostegui) centralauth has been done: ` root@cumin1001:/home/marostegui# mysql.py -hdb1090:3317 centralauth -e "show create table oathauth_users\G" | egrep "module|dat...
[05:39:41] DBA, MediaWiki-Database, MediaWiki-extensions-OATHAuth: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643 (Marostegui)
[06:10:13] DBA, Patch-For-Review: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981 (Marostegui) db1112 is now the sanitarium master for s3.
[06:52:59] DBA, Patch-For-Review: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1077.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201906200652_marostegui_2...
[07:13:32] DBA: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1077.eqiad.wmnet'] ` and were **ALL** successful.
[07:41:15] DBA, Operations, ops-eqiad, Patch-For-Review: db1077 crashed - https://phabricator.wikimedia.org/T225391 (Marostegui) And after the reboot the battery fully failed T226154: ` Battery/Capacitor Count: 0 `
[07:41:34] DBA: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981 (Marostegui) And after the reboot the battery fully failed T226154: ` Battery/Capacitor Count: 0 `
[09:28:17] DBA: Replace db1077 with db1112 - https://phabricator.wikimedia.org/T225981 (Marostegui) Open→Resolved db1077 is now replicating from db1111 in the test-s4 cluster. The temporary data has been also removed from dbprov1001 ` root@dbprov1001:/srv/backups/tmp# df -hT /srv Filesystem Type Si...
[10:39:31] DBA, MediaWiki-Database, MediaWiki-extensions-OATHAuth: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643 (Marostegui)
[10:40:29] DBA, MediaWiki-Database, MediaWiki-extensions-OATHAuth: Schema change to oathauth_users - https://phabricator.wikimedia.org/T225643 (Marostegui) Open→Resolved All the fishbowl wikis are done: ` for i in `cat s3_fishbowl | awk -F "." '{print $1}'`; do echo $i; mysql.py -hdb1123 $i -e "show cr...
[13:38:14] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Papaul) a:Papaul→Marostegui Disk replaced
[13:39:25] DBA, Operations, ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (Papaul) a:Papaul→Marostegui Disk replaced
[14:04:43] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) Thanks! It is rebuilding ` root@db2058:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337DC560) Port Name: 1I Port Name: 2I Gen8 Ser...
[14:05:12] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) a:Marostegui→Papaul The disk failed, can we try another one? Thanks! ` physicaldrive 1I:1:3 (port 1I:box 1:bay 3, SAS, 600 GB, Failed) `
[14:06:52] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) Sorry, this was for db2043
[14:07:07] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) a:Papaul→Marostegui
[14:07:59] DBA, Operations, ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (Marostegui) a:Marostegui→Papaul The disk has failed - can we try a different one? ` root@db2043:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337FA...
[14:18:53] jynus and marostegui: We're all set to update the wiki replica views for https://phabricator.wikimedia.org/T101631 . Would today be OK for us to apply the change?
[14:19:41] jeh: You can do labsdb1012 and labsdb1011 today yeah
[14:19:57] 1009 and 1010 probably will have to wait
[14:20:13] Is it ok if we don't end up needing to depool?
[14:20:13] as we are doing some maintenance on 1011 so we cannot really depool them
[14:20:18] ah sure
[14:20:24] if you don't need to depool that's fine
[14:20:28] We probably will, but just asking :-D
[14:20:34] 1012 anytime
[14:20:37] 1011 is depooled
[14:20:46] 1009 and 1010...let's try without depooling first :)
[14:21:04] It's on the revision table, so it's not likely lol
[14:21:31] yeah....we can always try to see how lucky we are today
[14:21:49] It's happened before!
[14:21:58] don't jinx it!!!
[14:22:04] DBA, Operations, ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (Marostegui) @Papaul has removed and inserted back the disk and it is rebuilding again. Let's see if it goes fine this time or we have to replace it completely ` root@db2043:~# hpssacli controller all sho...
[14:23:57] thanks!
[14:31:20] marostegui: https://logstash.wikimedia.org/goto/3e50e34511b93a5a97b80c24a1f4ba74 (T205045)
[14:31:20] T205045: Exception from LinksUpdate: Deadlock found in database query (from Wikibase\Client\Usage\Sql\EntityUsageTable::addUsages) - https://phabricator.wikimedia.org/T205045
[14:32:20] Amir1: oh sweet!! you fixed it?!
[14:37:37] yup :P
[14:38:01] <3
[14:38:04] thanks!! :)
[15:02:25] bstorm_: did it work on 1010 without depooling?
[15:03:04] marostegui: no, it ran into a lock there
[15:03:07] :(
[15:05:54] He's trying 1009 now
[15:06:19] i think it will fail
[15:06:26] | 7762620 | maintainviews | localhost | NULL | Query | 10 | Waiting for table metadata lock
[15:10:45] 1009 completed
[15:10:55] oh nice
[15:10:59] so maybe give 1010 another go?
[15:13:46] yeah, I'll try again
[15:13:53] * marostegui fingers crossed
[15:15:57] 1010 completed successfully this time
[15:16:11] \o/
[15:26:53] DBA, Operations, ops-codfw: Degraded RAID on db2058 - https://phabricator.wikimedia.org/T225902 (Marostegui) Open→Resolved The RAID is back to Optimal! ` root@db2058:~# hpssacli controller all show config Smart Array P420i in Slot 0 (Embedded) (sn: 0014380337DC560) Port Name: 1I...
[15:26:59] DBA, Data-Services, Tracking-Neverending: Wikireplica service for tools and labs - issues and missing available views (tracking) - https://phabricator.wikimedia.org/T150767 (JHedden)
[15:27:48] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (Marostegui) @RobH if you add the production DNS entries, I can take care of the installations myself
[15:43:28] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (RobH)
[15:43:50] DBA, Operations, ops-eqiad: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (RobH) a:RobH→Marostegui Assigned to @Marostegui per irc sync up (dns records are live.)
[15:44:00] DBA, Operations: eqiad: rack/setup/install (4) dbproxy systems. - https://phabricator.wikimedia.org/T225704 (RobH)
[16:54:15] DBA, Operations, ops-codfw: Degraded RAID on db2043 - https://phabricator.wikimedia.org/T225889 (Marostegui) And the disk failed again
[19:37:10] DBA, Growth-Team, PageCuration, Community-Tech (Kanban), Spike: [4 hours] Investigate whether it's possible to order by tag value - https://phabricator.wikimedia.org/T225169 (MusikAnimal) @Marostegui @jcrespo Apologies for the ping, but we were hoping to get the OK from DBAs on our proposed q...
[19:40:11] DBA, Growth-Team, PageCuration, Community-Tech (Kanban), Spike: [4 hours] Investigate whether it's efficient to order by tag value (DBA input requested) - https://phabricator.wikimedia.org/T225169 (MusikAnimal)