[08:32:30] DBA, ContentTranslation, Language-2018-Jan-Mar, MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), and 2 others: CX2: Register the version used to start a translation - https://phabricator.wikimedia.org/T187986#4060285 (KartikMistry) >>! In T187986#4052729, @Nikerabbit wrote: > [x] Te...
[08:51:30] I've checked the consistency of m1: most were false positives, I fixed 1 row
[08:52:10] nice!
[08:52:13] \o/
[09:34:12] did you start db1073?
[09:34:27] uh?
[09:34:44] db1073 is a misc master, I haven't touched it
[09:35:16] db1073 is running with the socket on /tmp, but it has its systemd setting PrivateTmp=true
[09:35:25] I don't know how that is possible
[09:36:04] Ah, that might date from when it was set up
[09:36:16] I think I did the ln -s
[09:36:33] but mariadb cannot boot with the socket there
[09:37:01] the real socket is on /tmp, it is not a link
[09:37:18] Then I don't get it :|
[09:37:35] maybe it was started before puppet came in?
[09:37:38] literally, if mysqld tries to touch tmp, the process is killed
[09:39:07] we will need to restart it at some point
[09:39:17] I am going to move the default config on all misc
[09:39:23] we can do it tomorrow morning if you want, it should be a matter of seconds
[09:39:36] well, I am not in a hurry
[09:39:47] I just don't understand how it is technically possible
[09:39:59] maybe it was a race condition
[09:40:00] how it happens is normal
[09:40:09] we now have a bad config
[09:40:17] but how did systemd allow it?
[09:40:59] I may do some tests on a non-critical host
[09:41:31] so maybe it started mysql before systemd got the config?
[09:41:48] but that wouldn't be possible either
[09:41:49] I guess that is possible
[09:41:57] but without a systemd unit
[09:42:00] yeah
[09:42:02] the only way to do that is init.d
[09:42:12] and it is clearly done by systemd
[09:42:22] at least systemd knows about it
[09:44:24] DBA, Operations, hardware-requests: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4060424 (Marostegui) Backup a binary copy of db1009: `es2001:/srv/backups/older/m5/db1009_binary_copy/db1009.tar.gz` Backup a logical copy of testreduce_0715.results table: `es2001:/srv/backups/older/m...
[09:47:54] this was all to generate this list: https://phabricator.wikimedia.org/T148507#4060444
[09:51:16] regarding ^ I would contact the testreduce* databases owner, and offer the dump
[09:51:46] if it is something that is concerning for the owner, we can make the dump available
[09:51:56] even if most likely it will not
[09:51:57] you have any idea how that would be?
[09:52:02] *who
[09:52:20] there is some ticket in which I checked if we should do backups of that
[09:52:24] and I was told no
[09:52:28] let me search for it
[09:52:28] ah cool, I will look for it
[09:52:36] don't worry, I can do it :)
[09:52:43] Don't want you to waste your time
[09:52:53] search closed testreduce backups or something like that
[09:52:58] ok!
[09:52:59] thanks
[09:54:54] https://phabricator.wikimedia.org/T186585#3949953 ?
[09:56:07] yes
[09:56:11] cool!
[09:56:13] thanks
[09:56:37] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4060488 (Marostegui)
[09:56:39] if there are no backups, there is no guarantee of consistency either
[09:56:51] yeah, I will drop a line to confirm they are ok
[09:57:03] but let's at least explain what happened
[09:57:11] yeah :)
[09:57:23] e.g. maybe the contents are not needed, but they can recreate them
[10:07:33] I realised there are only 22 different rows and not 57k :-)
[10:07:37] I emailed the owner anyways
[10:27:28] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4060598 (Marostegui)
[10:27:50] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4035423 (Marostegui) @chasemp can you please proceed and remove the ACL for db1009 now?
[10:28:29] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4060600 (Marostegui)
[10:29:19] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4035423 (Marostegui) a:jcrespo>RobH This host is now ready for DC Ops decommissioning, so assigning it to @RobH
[10:30:26] thanks for the work!
[10:31:45] Getting old hosts to be decom'ed is always a pleasure :p
[10:33:43] marostegui: It's still not as fun as servers that staff specifically wanted to see blown up (see solaris boxes)
[10:34:32] or ciscos? ;-)
[10:34:46] heh, yeah
[11:12:49] DBA, Wikimedia-Incident: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562#4060683 (jcrespo)
[11:12:51] DBA, Operations, Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4060677 (jcrespo) Open>Resolved a:jcrespo Done at T184696 and T183735
[11:15:23] DBA, Operations, Goal: Generate consistent logical database backups in CODFW - https://phabricator.wikimedia.org/T184699#4060692 (jcrespo)
[11:15:26] DBA, Patch-For-Review: Failover existing eqiad database backup system to the new codfw database logical backup system - https://phabricator.wikimedia.org/T184697#4060689 (jcrespo) Open>Resolved a:jcrespo>mark The generation worked, we have backups on codfw AND bacula.
[11:27:39] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4060730 (jcrespo)
[11:27:47] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4060742 (jcrespo) p:Triage>Low
[12:17:14] DBA, Operations, Patch-For-Review: Switchover m1 master from db1016 to db1063 - https://phabricator.wikimedia.org/T189655#4060833 (Marostegui) @akosiaris we are planning to suggest in the meeting today: tomorrow Tuesday at 16:00UTC, would that work for you?
[12:17:53] DBA, Operations, Patch-For-Review: Switchover m1 master from db1016 to db1063 - https://phabricator.wikimedia.org/T189655#4060834 (akosiaris) Yes, that's fine.
[12:19:03] DBA, Operations, Patch-For-Review: Switchover m1 master from db1016 to db1063 - https://phabricator.wikimedia.org/T189655#4060839 (Marostegui) Awesome! Thanks! We will mention it in the meeting today then, and we'll see what we get :)
[12:37:31] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4060892 (Marostegui)
[13:00:01] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4060927 (Marostegui)
[13:18:24] DBA, ContentTranslation, Language-2018-Jan-Mar, MW-1.31-release-notes (WMF-deploy-2018-03-20 (1.31.0-wmf.26)), Schema-change: CX2: Register the version used to start a translation - https://phabricator.wikimedia.org/T187986#4060982 (Petar.petkovic)
[15:07:33] DBA, Operations, hardware-requests, Patch-For-Review: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4061400 (Marostegui) >>! In T189216#4058198, @Marostegui wrote: > So, the checks finished and there were differences on testreduce_0715.results (173GB) table, between the followin...
[15:23:06] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4061483 (Marostegui) s1: db1052 master: ``` root@db1052:~# megacli -LDPDInfo -aAll | egrep -i "slot|error" Slot Number: 0 Media Error Count: 106 Other Error Count: 1 Slot Number: 1 Medi...
[15:36:09] DBA: Generate report of disk health for database masters and master candidates - https://phabricator.wikimedia.org/T190035#4061535 (jcrespo)
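
Note on the disk health report (T190035): the filtered megacli output quoted above could be turned into a per-slot summary along these lines. This is only a rough sketch, not what was actually run for the task. It assumes the three field names visible in the log (Slot Number, Media Error Count, Other Error Count) each appear on their own line. The function names, the threshold parameter and the slot 1 values in the sample are hypothetical, since the log is truncated after slot 0.

```python
import re
from typing import Dict, List

# Field names as they appear in the megacli output quoted in the log.
_FIELD = re.compile(
    r"^(Slot Number|Media Error Count|Other Error Count):\s*(\d+)$"
)


def parse_disk_errors(output: str) -> List[Dict[str, int]]:
    """Group the filtered megacli lines into one dict per disk slot."""
    disks: List[Dict[str, int]] = []
    for line in output.splitlines():
        match = _FIELD.match(line.strip())
        if not match:
            continue  # ignore anything that is not one of the three fields
        key, value = match.group(1), int(match.group(2))
        if key == "Slot Number":
            # A slot line starts a new disk record.
            disks.append({"slot": value, "media_errors": 0, "other_errors": 0})
        elif disks:
            if key == "Media Error Count":
                disks[-1]["media_errors"] = value
            else:
                disks[-1]["other_errors"] = value
    return disks


def disks_over_threshold(
    disks: List[Dict[str, int]], max_media_errors: int = 0
) -> List[int]:
    """Return slots whose media error count exceeds a (hypothetical) limit."""
    return [d["slot"] for d in disks if d["media_errors"] > max_media_errors]


# Slot 0 values come from the log; slot 1 values are made up for illustration.
SAMPLE = """\
Slot Number: 0
Media Error Count: 106
Other Error Count: 1
Slot Number: 1
Media Error Count: 0
Other Error Count: 0
"""

if __name__ == "__main__":
    for disk in parse_disk_errors(SAMPLE):
        print(disk)
    print("slots to review:", disks_over_threshold(parse_disk_errors(SAMPLE)))
```

The text would come from capturing the same command shown in the log on each master or master candidate; which error count justifies replacing a disk is a policy decision the log does not state.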