[02:28:56] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3433998 (10Peachey88)
[04:45:33] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3434091 (10Marostegui) Hi Rob, It is scheduled for decom but I don't know if this will happen any time soon. It is the s4 master, so if we happen to have some old disks from other hosts around the DC it would b...
[05:03:26] 10DBA: db2019 has performance issues, replace disk or switchover s4 master elsewhere - https://phabricator.wikimedia.org/T170351#3434128 (10Marostegui) And the disk finally failed (see how it no longer appears) and how it was automatically detected on T170503: ``` root@db2019:~# megacli -PDList -aall | grep Slo...
[05:14:16] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare and check storage layer for dinwiki - https://phabricator.wikimedia.org/T169193#3434164 (10Marostegui) The sanitization was done correctly and no private data was reported, so I have created the views on labsdb1001 and labsdb1003. The new labs hosts have repl...
[07:54:52] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3434309 (10Papaul) @Marostegui this was not assigned to me so I missed it on my dashboard, sorry about that, but I will look into this first thing in the AM once on site. We should have some disks from the dec...
[07:55:47] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3434311 (10Marostegui) @Papaul no worries! :-) Thanks!
[09:31:08] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3434611 (10Marostegui) db1095 (sanitarium2) is done for the tables that exist: ``` root@neodymium:/home/marostegui# for i in `cat s1_tables`; do e...
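The megacli checks quoted in the tickets above lend themselves to simple parsing. The sketch below is a hypothetical helper (not part of the actual tooling used here) that tallies drives per firmware state from `megacli -PDList` output; the sample text is an abbreviated, invented example of that output format.

```python
# Hypothetical helper: tally physical drives per firmware state from
# `megacli -PDList -aAll` output (line format assumed from typical megacli output).
def count_firmware_states(pdlist_output):
    counts = {}
    for line in pdlist_output.splitlines():
        line = line.strip()
        if line.startswith("Firmware state:"):
            state = line.split(":", 1)[1].strip()
            counts[state] = counts.get(state, 0) + 1
    return counts

# Abbreviated, invented sample of the output format:
sample = """\
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Failed
"""
```

A failed drive then shows up as a non-`Online` state in the tally, and a drive that disappears entirely (as happened here) shows up as a lower total count.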
[09:31:48] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3434612 (10Marostegui)
[09:54:44] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3434686 (10Marostegui) I think for db1047, I will alter the same tables as in dbstore1002 to: 1) have both analytic hosts consistent 2) I doubt...
[11:56:04] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3434899 (10Marostegui)
[12:00:39] 10DBA, 10Operations, 10ops-eqiad: Degraded RAID on db1066 - https://phabricator.wikimedia.org/T169448#3434923 (10Marostegui)
[12:33:54] hello!
[12:34:24] I tried to run the eventlogging cleaner on db1047 for only one day in 2014 (as a test) and this is the first error that I got
[12:34:27] pymysql.err.InternalError: (1290, 'The MariaDB server is running with the --read-only option so it cannot execute this statement')
[12:34:33] that makes sense, didn't even think about it
[12:37:01] (the statement was a DELETE)
[12:37:34] Yeah, it is read only, like dbstore1002, although jynus mentioned some days ago that in the past those hosts were available for writes
[12:38:47] So we have two options then, either set them to be writable or you run the script with root
[12:38:50] Neither of them is ideal XD
[12:39:04] ahh this is why eventlogging_sync.sh works!
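Error 1290 in the traceback above is MySQL/MariaDB's ER_OPTION_PREVENTS_STATEMENT, raised here because the server runs with --read-only. A cleaner script could recognize that specific failure and stop early with a clear message instead of crashing mid-batch; this is an illustrative sketch, not the actual eventlogging cleaner code.

```python
# Illustrative sketch (not the actual cleaner): recognize MariaDB's
# read-only error (1290, ER_OPTION_PREVENTS_STATEMENT) so a batch job
# can bail out cleanly instead of dying with a raw traceback.
READ_ONLY_ERRNO = 1290

def is_read_only_error(args):
    """args is the (errno, message) tuple carried by pymysql.err.InternalError."""
    errno, message = args
    return errno == READ_ONLY_ERRNO and "read-only" in message
```

Matching on the specific errno keeps genuine SQL errors loud while making the read-only case actionable (skip, warn, or switch hosts).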
[12:39:11] because it runs as root
[12:39:33] Yeah, I don't like the idea of having those hosts writable to be honest
[12:39:50] me neither, since IIUC they have automatic failover activated
[12:40:03] so if db1046 goes fishing db1047 becomes the EL master
[12:40:28] but it would manually need to be set read_only=0
[12:40:34] at the time of the failover
[12:41:13] marostegui: sure sure I meant if we set read_only=0 on db1047 to allow the cleaner script to work
[12:42:02] That would allow people (with enough grants) to change data directly there
[12:42:39] but with the current grants we should be fine no?
[12:43:07] in any case, what is the best option in your opinion?
[12:43:45] I am checking the users and their grants
[12:45:16] Looks like there is no user with write privileges apart from root
[12:46:05] So it might be safe to set read_only=off there, but I would like to hear a second opinion from jynus as I believe it was changed in the past from off to ON for some reason
[12:50:35] marostegui: I'd be afraid of having db1047 be able to completely take over from db1046, not sure if this is what we want.. at the moment eventlogging would just fail inserting if db1046 was down
[12:50:56] I think it used to be off, and maybe there were user databases
[12:51:06] elukey: what do you mean?
[12:51:30] in all cases, check puppet and modify accordingly
[12:52:19] marostegui: say db1046 goes awol, db1047 takes over and becomes the m4 master no? At the moment it wouldn't accept INSERTs from eventlogging, but with read_only false it could
[12:52:31] elukey: correct yes
[12:52:57] marostegui: so I am wondering if this is dangerous or not :D
[12:53:30] Yeah, it is dangerous, but so is running the script as root :)
[12:53:59] where would you run the script from?
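The grants audit discussed above (no user besides root with write privileges) could be scripted over `SHOW GRANTS` output. A hedged sketch, with invented example grant strings rather than db1047's real ones:

```python
# Hedged sketch: flag users whose SHOW GRANTS output includes write
# privileges. The grant strings below are invented examples.
WRITE_PRIVS = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE",
               "ALL PRIVILEGES"}

def has_write_grant(grant_stmt):
    head = grant_stmt.upper().split(" ON ", 1)[0]   # e.g. "GRANT SELECT, INSERT"
    privs = {p.strip() for p in head[len("GRANT "):].split(",")}
    return bool(privs & WRITE_PRIVS)

examples = [
    "GRANT SELECT ON `log`.* TO 'research'@'%'",          # read-only user
    "GRANT ALL PRIVILEGES ON *.* TO 'root'@'localhost'",  # write-capable
]
```

Running something like this over every account before setting read_only=0 turns "I checked and there are no users with writes" into a repeatable check.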
[12:54:06] localhost
[12:54:25] ah, then maybe it is not that bad
[12:54:46] atm I authenticate via unix_socket using the eventlogcleaner user
[12:55:23] you can authenticate via socket with root as well
[12:56:49] so the "best" trade-off at the moment would be to run it as root?
[12:57:04] if it is from localhost, I would say so
[12:57:28] all rightzzz
[12:58:52] marostegui: I am starting to think that there is a curse on this script :D
[12:59:04] yeah, the old data doesn't want to be killed
[13:04:14] don't run it as root
[13:04:38] eventlogging will eventually not have other data
[13:04:56] and the other data is a copy- worst case that can happen is it being leaked
[13:05:09] which can happen now
[13:05:16] root has other worse privileges
[13:05:32] delete on log.% is the least hurtful
[13:05:39] I asked elukey not to use root
[13:06:07] root should never be used
[13:06:18] neither for this nor for replication
[13:06:25] Yeah, I am not a big fan of root, but if he is not sure if they want to have db1047 accepting writes straight away after a failover, there is not much option :(
[13:06:38] yes, there is an option
[13:06:53] remove all write rights from other users
[13:07:05] I checked and there are no users with writes
[13:07:11] yep
[13:07:12] like, research for instance doesn't have them
[13:07:17] I assume other than root?
[13:07:22] other than root yeah
[13:07:28] then that's it
[13:07:36] the other thing would make as much sense as
[13:07:58] setting enwiki masters as read only
[13:08:03] and set it to run as root
[13:08:07] (mediawiki)
[13:08:12] dbs are to be used
[13:08:22] slaves in general should be in read only
[13:08:29] but this is not a slave
[13:08:38] same thing as labs hosts
[13:08:44] people use them in r/w
[13:08:46] yes, it is in that limbo for me
[13:08:56] let them use the db
[13:09:57] Then let's change puppet, set them read_only=0 and let it run elukey :-)
[13:10:28] can they delete the whole data? Yes!
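The unix_socket authentication mentioned above maps to pymysql's `unix_socket` connection parameter. A minimal sketch, assuming a default Debian-style socket path; the real path, user, and database names are illustrative assumptions, not confirmed by the log:

```python
# Minimal sketch of connecting over the local MySQL unix socket with pymysql.
# The socket path, user, and database here are assumptions for illustration.
def socket_connect_kwargs(user, db, socket="/run/mysqld/mysqld.sock"):
    return {"user": user, "db": db, "unix_socket": socket,
            "charset": "utf8mb4"}

# Not executed here; requires a local MariaDB instance:
# import pymysql
# conn = pymysql.connect(**socket_connect_kwargs("root", "log"))
# with conn.cursor() as cur:
#     cur.execute("SELECT @@read_only")  # sanity-check before attempting a DELETE
```

Checking `@@read_only` up front is a cheap way to fail fast with a clear message rather than hitting error 1290 halfway through a cleanup batch.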
[13:10:41] but I am sure in that case luca will help recover it
[13:10:47] ahahahah
[13:10:48] :)
[13:11:24] all right, code review coming
[13:11:29] xddddd
[13:11:39] I am sorry, but excessive bureaucracy
[13:11:51] when there is a not perfect, but appropriate way
[13:11:58] not a fan
[13:12:04] let users do their thing
[13:12:32] plus, I already told him it was ok
[13:12:51] I wasn't aware - sorry :)
[13:12:56] elukey: sorry for the back and forth!
[13:12:58] not a problem with you
[13:13:08] I was just saying sorry to luca
[13:13:18] to confuse him
[13:13:35] And I confused him too!
[13:13:41] we had a long meeting the other day
[13:13:48] and discussed this and other matters
[13:14:46] I don't even know why that is in read only
[13:14:55] I think it didn't use to be
[13:15:03] yeah that is what you told me
[13:15:06] that it used to be OFF
[13:15:13] but if nobody complains
[13:15:16] but as it is ON, maybe there is some past nightmare history...
[13:15:26] no reason to put it on
[13:15:38] if someone complains, like this exact case, we change it :-)
[13:16:18] for some reason I didn't connect the read-only with "the script will not work" since eventlogging_sync.sh was running fine
[13:16:26] the root account was the key :D
[13:16:26] yeah
[13:16:33] I mentioned that briefly
[13:16:40] we should create a dedicated account for that
[13:16:41] I might have missed it, sorry :(
[13:16:52] but I didn't want to ask you
[13:17:04] because you were already helping a lot
[13:17:06] with this
[13:52:20] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3435384 (10Marostegui) Is this supposed to be like this (s3 master)? ``` root@db1075[(none)]> use maiwikimedia; Database changed root@db1075[maiwikimedia]> select count(*) fr...
[14:02:26] I will fix dbstore2001
[14:02:31] (replication broken)
[14:02:39] (because of dinwiki creation)
[14:02:56] (/() )
[14:02:57] XD
[14:07:47] I was going to do that- saw it, but there was no rush
[14:07:55] yep :)
[14:08:02] did you see, by the way, pc2006?
[14:08:04] ?
[14:08:09] yeah, I replied
[14:08:12] I think the raid is fine
[14:08:18] o, didn't see that
[14:08:34] I may not be subscribed?
[14:08:44] if you created the ticket, you must be, no?
[14:08:56] strange
[14:09:09] I am subscribed
[14:09:13] spam?
[14:09:37] buried among other emails
[14:23:02] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3435546 (10Urbanecm) >>! In T168788#3435384, @Marostegui wrote: > Is this supposed to be like this (s3 master)? > ``` > root@db1075[(none)]> use maiwikimedia; > Database chan...
[14:24:08] 10DBA, 10Cloud-Services, 10User-Urbanecm: Prepare and check storage layer for maiwikimedia - https://phabricator.wikimedia.org/T168788#3435566 (10Marostegui) >>! In T168788#3435546, @Urbanecm wrote: >>>! In T168788#3435384, @Marostegui wrote: >> Is this supposed to be like this (s3 master)? >> ``` >> root@db...
[14:30:42] checking Icinga status I found that dbstore2002, db1102 and db1024 have notifications disabled for the host check (but not for the services). When you have time (no hurry) could you check if they are in the state you want them please? :)
[14:31:49] I will check, I think db1024 is scheduled for decom
[14:31:55] off the top of my head
[14:31:57] But I will check :)
[14:33:24] thanks, because usually you want them either all disabled or all enabled (host + services)
[14:33:35] yeah :)
[14:37:41] all cleaned up
[14:37:43] thanks for the heads up!
[14:46:39] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3435733 (10Marostegui) dbstore1001 is done (in the same way as dbstore1002), all tables but the two big ones that crashed dbstore1002: ``` root@n...
[15:10:14] 10DBA, 10MediaWiki-Database, 10ArchCom-RfC (ArchCom-Approved), 10RfC: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3435925 (10daniel) Since no objections have been raised during the last call period, this RFC has been approved for implementa...
[15:11:51] dbstore2002 seems to have a gap in its monitoring https://grafana-admin.wikimedia.org/dashboard/file/server-board.json?refresh=1m&orgId=1&var-server=dbstore2002&var-network=bond0&from=1499954757509&to=1499956408765
[15:12:16] did it crash or something?
[15:12:25] o no
[15:12:32] it is a gap everywhere
[15:20:06] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3435969 (10Papaul) a:03Marostegui Disk replacement complete
[15:23:39] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3433715 (10jcrespo) Thanks papaul, do you have an idea of how many 600 gb disks like these are left? I do not need an exact count, "0, 1, 2 or many" is specific enough. I think chris is short of them I wonde...
[15:25:16] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3435983 (10Marostegui) Thanks @Papaul, looks like after our doubts identifying the correct disk, we changed the correct one :-) ``` Code: 0x0000006a Class: 0 Locale: 0x02 Event Description: Rebuild auto...
[15:57:17] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204#3436120 (10Marostegui)
[16:02:28] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3436132 (10Papaul) @jcrespo I do not have spares on site, the ones I am using are pulled from decom servers.
[16:05:25] 10DBA, 10MediaWiki-Database, 10ArchCom-RfC (ArchCom-Approved), 10RfC: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3436141 (10Reedy) Sweet, thanks. Fine to go into 1.30? I'll see about making some patches, and see what we can start to remo...
[16:10:21] 10DBA, 10MediaWiki-Database, 10ArchCom-RfC (ArchCom-Approved), 10RfC: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3436171 (10daniel) >>! In T161232#3436141, @Reedy wrote: > Sweet, thanks. > > Fine to go into 1.30? I guess so, but ask @dem...
[16:14:02] 10DBA, 10MediaWiki-Database, 10ArchCom-RfC (ArchCom-Approved), 10RfC: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3436202 (10demon) 1.30? We haven't even begun yet. Go ahead
[16:27:20] 10DBA, 10MediaWiki-Database, 10ArchCom-RfC (ArchCom-Approved), 10Patch-For-Review, 10RfC: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3436277 (10jcrespo) This may require multiple changes on the official wiki documentation, you will need...
[16:48:45] 10DBA, 10Operations, 10ops-codfw: Degraded RAID on db2019 - https://phabricator.wikimedia.org/T170503#3436419 (10Marostegui) 05Open>03Resolved And RAID back to optimal: ``` root@db2019:~# megacli -LDInfo -lALL -aALL Adapter 0 -- Virtual Drive Information: Virtual Drive: 0 (Target Id: 0) Name...
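The resolution above was confirmed by `megacli -LDInfo` reporting the virtual drive back in Optimal state. A hypothetical parser along the same lines (the line format is assumed, and the sample text is abbreviated and invented):

```python
# Hypothetical parser: pull the virtual drive state out of
# `megacli -LDInfo -lALL -aALL` output (assumed line format: "State : Optimal").
def raid_state(ldinfo_output):
    for line in ldinfo_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("State"):
            return stripped.split(":", 1)[1].strip()
    return None

# Abbreviated, invented sample:
sample = "Virtual Drive: 0 (Target Id: 0)\nName :\nState : Optimal\n"
```

Anything other than "Optimal" (e.g. "Degraded" during the rebuild) would keep the ticket open.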
[16:48:59] 10DBA: db2019 has performance issues, replace disk or switchover s4 master elsewhere - https://phabricator.wikimedia.org/T170351#3436421 (10Marostegui) The disk was replaced and the raid is back to optimal (T170503#3436419), let's see if it has any effect on this issue in the next few days.
[17:06:53] 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3436497 (10Cmjohnson) @jcrespo and @marostegui db1106 raid has been setup.
[17:33:06] 10DBA, 10Operations, 10Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3436693 (10Marostegui) Thanks @Cmjohnson I have restarted the host for its reinstallation. I will close this task when done.
[20:04:26] 10DBA, 10MediaWiki-Database, 10MediaWiki-Documentation, 10ArchCom-RfC (ArchCom-Approved), and 3 others: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3437495 (10Reedy)
[20:04:57] 10DBA, 10MediaWiki-Database, 10MediaWiki-Documentation, 10ArchCom-RfC (ArchCom-Approved), and 3 others: Should we bump MediaWiki's minimum supported MySQL Version to 5.5? - https://phabricator.wikimedia.org/T161232#3125908 (10Reedy) >>! In T161232#3436277, @jcrespo wrote: > This may require multiple change...