[04:23:28] DBA, Operations, ops-codfw: es2015 crashed with no logs - https://phabricator.wikimedia.org/T147769#2702976 (jcrespo)
[06:46:11] DBA, Operations, Patch-For-Review: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2703074 (Marostegui) Upgraded: ``` root@db1082:~# hpssacli controller slot=1 show | grep -i firmware Firmware Version: 4.02 ``` I will slowly get this server back to the pool but I think this...
[07:27:13] DBA, Operations, ops-eqiad: Physically move db1053 to a different rack - https://phabricator.wikimedia.org/T147774#2703076 (Marostegui)
[07:42:24] DBA, Operations, Patch-For-Review: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2703104 (Marostegui) Open>Resolved
[08:05:25] DBA, Operations, ops-eqiad: db1065: Degraded RAID - https://phabricator.wikimedia.org/T147396#2703129 (Marostegui) All good now, thanks ``` Device Present ================ Virtual Drives : 1 Degraded : 0 Offline : 0 Physical Devices : 14 Disk...
[08:05:34] DBA, Operations, ops-eqiad: db1065: Degraded RAID - https://phabricator.wikimedia.org/T147396#2703130 (Marostegui) Open>Resolved
[08:42:51] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2703221 (jcrespo) Above commands as of now: ``` $ sudo salt -C 'G@cluster:mysql and G@site:eqiad' cmd.run 'grep -l 'server\.key' /etc/my.cnf' | grep -c '/etc/my\.cnf' 102 $ sudo salt -C '...
[08:46:04] marostegui: you'll have to change db1053 ip address and row change btw, each row is in its own vlan/subnet
[08:46:32] godog: thanks - is that something that I do or DC-Ops do?
[08:46:57] godog: By the way the hp upgrade went fine
[08:47:51] nice! glad it didn't cause problems
[08:48:31] marostegui: you can update dns yourself, https://wikitech.wikimedia.org/wiki/DNS#Changing_records_in_a_zonefile
[08:48:55] Ah, good.
I will wait for Chris to confirm if it can be moved :)
[09:22:11] so for what I see, wikidata unbreak now is an IN query that was deployed without proper testing
[09:22:22] and in + 10+ items
[09:23:28] \o/
[09:23:45] which causes multiple 10+ second queries
[09:23:50] which causes pileups
[09:24:25] probably deployed thursday
[09:27:05] db1015 has started to complain about disk space, too
[09:28:50] We can replace it with another one?
[09:29:08] we should... now we can??
[09:29:30] those are tracked on the decommission <=db1050
[09:29:35] ticket
[09:30:35] Yeah, I am not seeing many servers +db1050 commented out so I guess there are not lots of options to replace those
[09:31:42] there are some >1050 commented
[09:31:50] for which the intention is to replace those
[09:32:07] I also mentioned to you the idea of consolidating the recentchanges slaves with compression
[09:32:16] yeah
[09:32:23] so the plan is there, it is just a bit vague
[09:32:27] we can try to work on that for those >1050
[09:32:29] as a background
[09:36:32] BTW, you will see dbstores or other servers not direct slaves of the master
[09:36:37] that is not intended
[09:37:04] it is just that some of those are also masters, so gtid doesn't work very well
[09:37:31] and changing that topology, especially of a delayed slave is a pain- stopping the slave does not stop the slave
[09:37:47] some of those are masters?
[09:38:05] they have writes not coming from the master
[09:38:09] aaah
[09:38:10] yes
[09:38:15] e.g.
most of analytics
[09:38:17] you told me about those writable slaves
[09:38:22] which messes up gtid
[09:39:00] so the combination of factors makes it difficult to change their topology, even if we have a script for that
[09:39:14] but that doesn't work for the delayed slave
[09:39:50] because replication is controlled by events
[09:41:16] if you try to do CHANGE commands without disabling events that ends up with corruption most likely
[09:57:58] regarding https://gerrit.wikimedia.org/r/#/c/313235/ I see you a bit optimistic about how easy it is to apply that
[09:58:04] first, I disagree with the change
[09:58:17] myisam used 3, which was the innodb default
[09:58:33] plus it requires an index to be rebuilt when changed
[10:35:32] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2703452 (jcrespo) a:jcrespo>None
[10:40:25] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2703471 (jcrespo) To clarify, pending tasks: - let's schedule a downtime for the master fa...
[11:46:55] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2703535 (Marostegui) I have dropped the tables again from S2, the hosts that failed: `db1063, db1069, dbstore1001, dbstore1002, dbstore2002` It has been a while now and none has brok...
[11:53:58] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2703551 (Paladox) We would need to do the + changes to https://github.com/wikimedia/phabric...
[12:03:10] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2703554 (Paladox) What we could do here https://github.com/wikimedia/phabricator/blob/wmf/...
[12:34:39] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2703580 (Marostegui) Given that we have to copy all the shards to this server, I was thinking on doing the following: - xtrabackup + import tablespaces We could try xtrabackup from S3 - db2065 for instance,...
[12:40:01] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2703586 (jcrespo) What about compression and 5.7? Because right now there is not enough space for all shards.
[12:42:16] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2703598 (Marostegui) I was planning on testing the compression on dbstore not on the slaves so the plan I had in mind was: Xtrabackup S3 and once that shard is running, compress it. Import another shard, com...
[12:44:37] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2703600 (jcrespo) That seems ok to me, just it was not clear to me.
[12:45:38] db1035 is also running low on space
[12:45:49] :(
[12:46:08] Maybe it is time to bump the priority of the decom to avoid noise?
[12:47:07] DBA, Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2703602 (Marostegui) Regarding 5.7, I was planning on installing: ``` marostegui@dbstore2001:~$ apt-cache show wmf-mysql57 Package: wmf-mysql57 Version: 5.7.9-1 Architecture: amd64 Maintainer: Jaime Crespo <...
[12:47:11] for some reason it is also caching?
[12:47:24] ^that is too old
[12:47:49] and has security bugs
[12:48:03] Ah
[12:48:17] we should also create a 10.1 if that is what we are going to setup on labs
[12:48:35] let me see what there is on debian
[12:48:44] ok
[12:49:29] stable has 10.0.27
[12:50:04] and 5.5.52
[12:51:08] testing has the same but also 5.6.30
[12:52:12] we probably have to start from 0
[12:52:14] mariadb 10.1 might be skipped in Debian in favour of directly moving to 10.2: https://lists.debian.org/debian-release/2016/10/msg00065.html
[12:52:20] They still package mysql-mmm jesus…
[12:52:21] hasn't been decided yet
[12:52:35] moritzm, wow, I didn't know they wanted to kill mariadb
[12:52:46] they will effectively do that if they do that
[12:52:56] marostegui: that has been removed in the meantime: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=828075
[12:53:10] jynus: why would that kill mariadb?
[12:53:28] moritzm: Ah, that makes more sense, because as it says there, it is mostly abandoned
[12:53:30] 10.2 is not a good release
[12:53:59] but they will put it out on december no matter what
[12:54:28] 10.1 is ok, and fixes needed bugs and features
[12:54:28] could you voice your concern to Otto Kekäläinen, then? He's maintaining mariadb in Debian
[12:54:34] otto?
[12:54:48] the same one that praises the business license, ceo of mariadb?
[12:54:51] Not Andrew Otto :-)
[12:55:02] does debian know he does that?
[12:55:39] sure, he did all the recent uploads: https://packages.qa.debian.org/m/mariadb-10.0.html
[12:55:54] or what do you mean with "knows that he does that"?
[12:56:00] no, I mean debian know that
[12:56:47] moritzm, recommended read: https://mariadb.org/mariadb-true-open-source-project/
[12:57:03] look who signs :-)
[12:59:23] moritzm, do not worry, he really knows my concerns very clearly, I expressed them to him for a long time in person
[12:59:55] ok :-) I doubt the release team will accept it anyway, the timing is too bad
[13:00:26] I am ok with vendors helping if it is mutually beneficial
[13:00:39] but shouldn't they stay away from non-technical decisions?
[13:01:04] is there any debian policy about private interests?
[13:02:00] DBA: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2703605 (Marostegui) db2065.codfw.wmnet is done ``` MariaDB PRODUCTION s4 localhost commonswiki > show create table revision\G *************************** 1. row *************************** Table: revision Create Table: CREAT...
[13:03:48] your link is also very interesting
[13:04:32] the machine usage policy forbids to use Debian hosts for monetary gain, but that's a very blurry area.
It happens that some companies let their staff maintain a package on company time, but that's fine by standard practices
[13:04:43] that is ok
[13:04:55] as in, who will know better mariadb than him
[13:04:59] unless you maybe screw the package and work as a consultant to fix installations :-)
[13:05:11] I do not see issues with that
[13:05:38] but decisions like versions or package X and not Y as alias is where I see conflicts
[13:07:44] so, I think we will take 10.0 package, take all crap out and use it as a base for the 10.1 packages
[13:08:29] sounds like a plan
[13:08:46] and for once, we will have packages before debian sid does
[13:10:10] we will also prepare 5.7, in case the debian maintainer plans to continue with the hara-kiri of 10.2
[13:10:32] XDDD
[13:12:01] it is better that I couldn't stay for the mariadb developers conference, they would have kicked me out
[13:12:36] Or maybe converted you to a mariadb believer
[13:13:52] I am a believer, I am this close to create my own foundation with proper values
[13:14:18] you know, this thing called free software
[13:14:43] hey, I got up at 5 am today with a page, do not take this into account
[13:14:45] :-)
[13:14:51] I was wondering if they'll go to Fosdem :)
[13:15:18] jynus: Maybe it is time to log off? :)
[13:18:26] let me check what is ubuntu doing
[13:18:35] I know they have more recent mysql packages
[13:19:30] they have 5.7.15-0ubuntu0, we can use as a reference
[13:19:56] but still on 10.0.27
[13:20:05] 5.7.15 nice
[13:20:20] what is ubuntu's sid?
[13:20:29] (aside from LTS, I mean)
[13:20:55] yakkety apparently
[13:21:50] maybe my dream of debian packages + patches is impossible :-(
[13:22:25] is that what you have with the current 5.7 we have in the repo now?
[13:23:01] no, that is a custom package, following previous dba's work
[13:23:26] then there is the whole /opt which is no longer needed, as we have fully upgraded now
[13:23:52] jynus: mariadb has a repository with all the packages, including 10.1 for jessie AFAIU
[13:24:02] volans, we do not use binary packages!
[13:24:15] deb-src too
[13:24:22] we do not trust their sources, as to trust their binaries :-)
[13:24:38] let me see
[13:25:02] https://downloads.mariadb.org/mariadb/repositories/#mirror=eenet&distro=Debian&distro_release=jessie--jessie&version=10.1
[13:34:19] I am comparing the patches between 10.0 and 10.1 and they are very similar, but have different names so I am getting blind comparing both
[13:38:23] the one from mariadb includes the tests, which we also include, and take a huge amount of space
[13:43:52] arg, they have added galera as part of 10.1
[13:45:38] oh nooo
[13:46:14] DBA: hitcounter and _counter tables are on the cluster but were deleted/unsused? - https://phabricator.wikimedia.org/T132837#2703656 (Marostegui) Regarding S4, I have dropped both tables from the master `db1040` (without replicating the statement) to make sure nothing is issued from the master downstream, as...
[14:21:32] DBA: Defragment db1070, db1082, db1087, db1092 - https://phabricator.wikimedia.org/T137191#2703703 (Marostegui) Maybe we can try https://github.com/giacomolozito/ibdata-shrinker for defragmenting one of these hosts.
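A minimal sketch of one way to pair up renamed patches, as in the 10.0 vs 10.1 comparison mentioned at 13:34:19. It matches patches by content similarity rather than by file name. The function name, the patch names and the 0.8 threshold are all hypothetical, chosen only for illustration, and this is not the method actually used in the channel.

```python
import difflib


def match_patches(old, new, threshold=0.8):
    """Pair each patch in `old` with its most similar patch in `new`.

    `old` and `new` map patch file names to patch text. Returns a list of
    (old_name, new_name, ratio) tuples for pairs at or above `threshold`.
    """
    pairs = []
    for old_name, old_text in old.items():
        best_name, best_ratio = None, 0.0
        for new_name, new_text in new.items():
            # ratio() is 1.0 for identical texts and lower as they diverge.
            ratio = difflib.SequenceMatcher(None, old_text, new_text).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = new_name, ratio
        if best_name is not None and best_ratio >= threshold:
            pairs.append((old_name, best_name, best_ratio))
    return pairs


# Hypothetical patch sets: same content under different names, plus an unrelated file.
patch_text = "--- a/x\n+++ b/x\n-foo\n+bar\n"
old_patches = {"01_fix_paths.patch": patch_text}
new_patches = {"paths.diff": patch_text, "other.diff": "completely different content here"}
matched = match_patches(old_patches, new_patches)
```

In practice the inputs would be read from the two `debian/patches` directories; anything left unpaired is a candidate for a patch that was added or dropped between versions.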
[14:22:26] DBA, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2703708 (Marostegui)
[14:22:29] DBA, Operations: Drop database table "email_capture" from Wikimedia wikis - https://phabricator.wikimedia.org/T57676#2703707 (Marostegui) Open>Resolved
[14:43:29] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2703759 (Paladox) Or for https://github.com/wikimedia/phabricator/blob/wmf/stable/src/appli...
[15:29:12] compiling mariadb10.0, 10.1 and 5.7- average load 3
[15:29:28] only?
[15:29:34] :)
[15:30:40] I will have to compile boost 1.59 embedded into the server, m*ritz will not be happy :-/
[15:44:40] so we called our mariadb-10.1 packages wmf-mariadb10
[15:45:03] *10.0
[15:45:18] now I do not know what to call the mariadb-10.1
[15:45:33] wmf-mariadb101 ?
[16:00:13] rotfl
[16:01:23] I think we should follow proper debian naming here
[16:01:25] if possible
[16:06:22] we are installing non-standard files in a non-standard location
[16:06:35] last thing I am thinking is debian policies
[16:06:40] so we should use also a non standard name? :D
[16:07:05] the package is correctly versioned 10.0.27-1
[16:07:26] but we want to separate the 10.0 and 10.1 versions
[16:07:34] as they do
[16:07:47] we do not want production on 10.1
[16:07:53] by accident
[16:07:57] ofc
[16:12:14] they have different repos, same name for the packages
[16:12:25] what?
[16:12:42] mariadb-server_10.0.27+maria-1~sid_all.deb vs mariadb-server_10.1.18+maria-1~sid_all.deb
[16:12:47] you want to share the name?
[16:13:09] with upstream packages?
[16:13:14] * volans just reporting
[16:13:25] I was looking at them to see what they did
[16:13:41] oh, the problem is the existing wmf-mariadb10
[16:13:45] which is a bad name
[16:13:57] but I suppose nobody at the time supposed there was going to be a 10.1
[16:14:04] just a 11
[16:14:05] agree
[16:14:07] or whatever
[16:14:36] probably wmf-mariadb101 makes sense
[16:15:15] not pretty but almost clear, someone might think it contains a tutorial named MariaDB 101 though :-P
[16:15:43] alternatively wmf-mariadb10-1
[16:15:47] no
[16:15:56] at most wmf-mariadb10.1
[16:16:07] or wmf-mariadb-10.1
[16:16:18] can you use the . before the _? I don't remember if the version checker will be fooled
[16:17:04] the official version is mariadb-server-core-10.0
[16:17:32] ok then
[16:17:44] the thing is
[16:17:51] originally, it was wmf-mariadb
[16:18:08] then we created wmf-mariadb10 as an alternative
[16:44:32] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2704003 (jcrespo) No it is not, it is not available on labs- and it should not be until this is resolved.
[16:46:37] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2704009 (jcrespo) @Marostegui we should do this tomorrow, with special guest @chasemp , if he wants.
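On the question at 16:16:18 about using a `.` in the package name: Debian Policy allows lowercase letters, digits, `+`, `-` and `.` in package names, which must be at least two characters long and start with an alphanumeric character. The `_` is not allowed in the name, so it can safely separate name and version in the .deb file name. A minimal sketch that checks the candidate names from the discussion against that rule; the helper name is hypothetical, and it only checks syntax, not which name is the better choice.

```python
import re

# Debian Policy package name syntax: [a-z0-9] first, then at least one more
# character from lowercase letters, digits, '+', '-' and '.'.
_PACKAGE_NAME_RE = re.compile(r"[a-z0-9][a-z0-9+.\-]+")


def is_valid_package_name(name):
    """Return True if `name` is syntactically valid as a Debian package name."""
    return _PACKAGE_NAME_RE.fullmatch(name) is not None


# Candidate names taken from the discussion above.
candidates = [
    "wmf-mariadb101",
    "wmf-mariadb10-1",
    "wmf-mariadb10.1",
    "wmf-mariadb-10.1",
]
results = {name: is_valid_package_name(name) for name in candidates}
```

All four candidates pass this syntax check, so the choice between them is a matter of readability and consistency with the existing `wmf-mariadb10` name.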
[18:42:52] DBA, Phabricator, Release-Engineering-Team, Patch-For-Review, Wikimedia-Incident: Contention on search phabricator database creating full phabricator outages - https://phabricator.wikimedia.org/T146673#2704167 (mmodell) Upstream task about InnoDB support: https://secure.phabricator.com/T11741
[19:19:48] DBA, Labs, Labs-Infrastructure, Operations: Prepare and check production and labs-side filtering for olowiki - https://phabricator.wikimedia.org/T147302#2704247 (Marostegui) Sounds good to me, let's do it tomorrow! On 10 Oct 2016 18:46, "jcrespo" wrote:...