[00:13:23] 10DBA, 10Operations, 10Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070#3151523 (10Dzahn)
[00:14:55] 10DBA, 10Operations, 10Patch-For-Review: Cleanup or remove mysql puppet module; repurpose mariadb module to cover misc use cases - https://phabricator.wikimedia.org/T162070#3151523 (10Dzahn) Created subtask to make quarry use the mariadb module since that is one of the few things still using it.
[06:20:28] 10DBA, 10Operations, 10Wikimedia-Site-requests: Global rename of JeanBono → Rexcornot: supervision needed - https://phabricator.wikimedia.org/T181170#3783151 (10Marostegui) I am online now :-)
[06:54:09] 10Blocked-on-schema-change, 10DBA, 10MediaWiki-extensions-ORES, 10Patch-For-Review, and 2 others: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045#3783170 (10Marostegui)
[06:55:30] 10Blocked-on-schema-change, 10DBA, 10MediaWiki-extensions-ORES, 10Patch-For-Review, and 2 others: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045#3783172 (10Marostegui)
[06:59:18] 10Blocked-on-schema-change, 10DBA, 10MediaWiki-extensions-ORES, 10Patch-For-Review, and 2 others: Review and deploy schema change on dropping oresc_rev_predicted_model index - https://phabricator.wikimedia.org/T180045#3783175 (10Marostegui) 05Open>03Resolved All done
[09:16:26] I have upgraded the s5 master to .33 because I have the intention of rebooting all s5/s8 replicas
[09:16:36] (on codfw)
[09:16:38] ah great!
[09:17:33] port changes are tricky- many things to change- mediawiki, puppet, mysql itself
[09:18:10] yeah, when I was deploying it I was like: ugh, this will be a bit of a pain to change to 3318
[09:18:31] but it is a one-time change, and only on 2 instances per replica set
[09:19:03] yes and the good thing is that they are not masters or anything "critical"
[09:48:54] https://dbtree.wikimedia.org/ (scroll down)
[09:49:26] yay!!!!!!!!
[09:51:24] marostegui, jynus: do you have any long-running DBA tasks on sarin/neodymium? I'd like to reboot both of them into a new kernel next week (but totally flexible on the time)
[09:51:53] moritzm: I have two of them
[09:52:01] but I think they will finish by tomorrow
[09:52:10] (only on neodymium)
[09:52:47] moritzm: set a calendar invite and we will make sure nothing is running there
[09:53:37] let me check sarin, maybe you could do that now so we can migrate things there if necessary
[09:54:28] marostegui: I only see a screen there, maybe yours?
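As an aside to the port discussion at 09:17–09:19: in the multi-instance layout each replica set gets its own port (3318 for s8 here), so mediawiki-config, puppet and the MariaDB instance itself all have to agree on it. A minimal sketch of how that surfaces on the command line, assuming a conventional per-instance socket name; the socket path is an assumption, not necessarily the real layout:

```
# Connect over TCP on the section's dedicated port (the same form is used
# later in this log to verify db1101 after its move to s8).
mysql --skip-ssl -h db1101.eqiad.wmnet -P 3318 -e "SELECT @@hostname, @@port"

# Locally on the host, each instance would expose its own socket instead
# (hypothetical path following an sX naming convention).
mysql --skip-ssl -S /run/mysqld/mysqld.s8.sock -e "SELECT @@port"

# MediaWiki's db lists then carry the port in the host entry, e.g.
# "db1101:3318" rather than a bare "db1101".
```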
[09:54:32] if not I will kill it
[09:54:42] nope, don't think I have logged in to sarin for months
[09:55:07] maybe it is mine
[09:55:11] there's a connection to db2051 for commonswiki from Aug31
[09:55:40] ok, killed
[09:55:47] maybe you can do sarin now
[09:56:01] and then we start new long-running processes there
[09:56:07] ok, let me briefly check with Riccardo (and I'll also ping the security channel), then we could go ahead with sarin today
[09:56:12] of course
[09:56:30] moritzm: sorry, but our maintenance is long-running
[09:56:35] apologies
[09:56:47] * volans reading backlog
[09:57:26] no blockers for me to reboot sarin
[09:57:30] jynus: sure, fully understood
[09:58:06] moritzm: I have cleaned up my screens on neodymium, all but the ones that are running
[09:58:33] let's reboot sarin today and then we can look into neodymium next week
[10:00:43] marostegui: sharing terminals- that is an improvement on your addiction to using screen :-D
[10:01:05] XDDDDDDDDD
[10:24:05] I'm installing the new 4.9.51 kernels on db hosts now (we actually spoke about this last week or so, but then something else came up)
[10:39:40] thanks
[10:39:53] I wonder if we should be doing the same for mariadb versions
[10:40:17] after all, no problems should happen unless something other than the mysqld executable changes
[10:40:46] moritzm: we also need to speak about ferm at some point
[10:41:01] both for mariadb and in general, but it is not a blocker for us
[10:41:37] jynus: would be happy to, let's do that tomorrow on "quiet Friday"?
[10:41:54] (talk about ferm I mean)
[12:45:02] I wonder if I should remove partitioning from db2038 or just reimage it, which will be faster?
[13:54:08] Is that s5?
[13:56:16] I decided to clone in the end- see SAL
[13:56:24] ah ok :)
[13:59:00] although it is going quite slowly
[14:12:55] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3783937 (10Marostegui)
[14:14:20] jynus: db1101 now has mysql stopped
[14:14:24] is that a blocker for you?
[14:14:34] I can bring it up and do the copy later
[14:15:07] no
[14:15:11] not a problem for me
[14:15:14] ok :)
[14:15:25] I am creating a patch, however
[14:15:38] and you may want to apply it before putting it up?
[14:15:42] sure :)
[14:16:27] where do we put the multiinstance hosts? at the start of it all, at the end?
[14:17:02] as they have multiple sets
[14:17:13] I keep putting them in the same order, depending on the hostname
[14:17:20] as in master, then minor -> major
[14:17:24] no
[14:17:27] you talking about wmconfig?
[14:17:28] but
[14:17:34] on puppet
[14:17:35] ah
[14:17:44] they are ordered in general by replication set
[14:17:52] but obviously those have 2 of them
[14:18:16] yeah, maybe we can put the lower shard first?
[14:18:17] maybe we put a comment
[14:18:27] and then all multi at the end?
[14:18:36] yeah, that sounds good
[14:18:38] a comment on the s1, s2 sections?
[14:18:46] the order of each one is not that important
[14:18:51] for me
[14:19:14] yep, not for me either
[14:36:55] so I have https://gerrit.wikimedia.org/r/393065
[14:37:08] I can move db1101 to s8 if you tell me so
[14:37:30] the rest (except db2038) have only been fake-moved
[14:38:48] yep, db1108 is now stopped, so you can do it if you like
[14:38:51] sorry db1101
[14:40:53] patch updated
[14:41:02] I will run the compiler, as I do not trust all changes
[14:41:33] will that create sqldata.s8?
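On the clone mentioned at 13:56 (copying data onto db2038 rather than reimaging it): a clone like that is essentially streaming a stopped instance's datadir across the network. A generic sketch only; it is not necessarily the transfer tooling actually used here, and the port and paths are placeholders:

```
# On the receiving host, listen and unpack into the (empty) target directory.
nc -l -p 4444 | tar -C /srv -xf -

# On the source host, with mysqld stopped so the datadir is consistent,
# stream the directory over.
tar -C /srv -cf - sqldata | nc db2038.codfw.wmnet 4444
```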
[14:41:37] the directory I mean
[14:44:13] no no
[14:44:17] that has to be done manually
[14:44:24] well
[14:44:32] maybe the dir is created
[14:44:41] but definitely it is not moved :-)
[14:45:03] that is why it helps if it is already down
[14:45:38] I think the dir and the run dir have to be deleted manually
[14:45:40] yeah, i will create+move it
[14:45:42] and the config
[14:45:43] or just rename it :)
[14:45:46] yep
[14:46:01] it does not delete existing dirs
[14:46:33] so I think if we rename it and not delete old dirs, one could manually mistakenly start the wrong shard
[14:47:04] because I think there is a "do not try to start if there is no sqldir.X"
[14:47:19] not sure if the dir or the config
[14:47:19] yeah, I will just rename s5 to s8 on db1101
[14:47:28] so sqldata.s5 will no longer exist
[14:47:38] you need to do it once puppet is deployed
[14:47:42] yeah
[14:47:42] or disable puppet
[14:47:48] and i need to wait for the transfer to finish :)
[14:47:51] I am still compiling
[14:47:53] :-)
[14:48:01] same here
[14:48:08] sure, sure, I will also wait for the transfer to finish, so we have time
[14:49:01] I think someone is installing kernels
[14:49:05] :-D
[14:49:19] Someone blond even?
[14:49:29] blond?
[14:49:38] I thought it was brown
[14:49:47] For me that is blond
[14:50:23] my official hair colour is "Angelic blonde"
[14:50:33] do you dye it that way?
[14:51:11] hahaha
[14:53:50] every week!
[14:55:13] so there are several errors: https://puppet-compiler.wmflabs.org/compiler02/8952/
[14:55:48] where does db1100 go?
[14:56:23] db1100 is in s5
[14:56:27] as far as the etherpad
[14:56:56] db1109 and db1110 are the new hosts, so maybe the catalog is missing or something for the compiler? I have seen that before with new hosts
[14:57:01] maybe I am blind
[14:57:02] volans helped me at the time
[14:57:11] but I do not see 1100 on the etherpad
[14:57:24] what's up?
[14:57:42] jynus: you are not blind, no
[14:58:05] is it new?
[14:58:20] jynus: db1100 isn't new, I guess I missed it when I built the etherpad
[14:58:26] it is up and running normally
[14:58:28] it should go to s5
[14:58:37] that is ok
[14:58:46] so I probably deleted it because of the etherpad
[14:58:57] but yes, it was on s5
[14:59:17] volans: you remember that issue with new hosts failing on puppet compiler? because some magic had to be run?
[14:59:31] https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs#FAQ
[14:59:47] if the host is new you need to update the compiler facts
[15:00:16] that was it!
[15:00:18] :**
[15:05:29] db1101 disables s5, but doesn't declare s8
[15:06:03] db2038 changes correctly to s8
[15:06:24] yeah, I see
[15:06:54] db1109 and db1110 fail completely
[15:07:04] yeah, that is because of what volans says
[15:07:05] checking 1101 and these others
[15:07:10] ?
[15:07:21] but those are not new?
[15:07:27] they are
[15:07:30] oh
[15:07:31] db1109 and db1110 are new
[15:07:35] ok
[15:07:43] new but already replicating and in production
[15:07:54] so I shouldn't configure them yet, or should I?
[15:08:04] you should
[15:08:05] they are all set
[15:08:08] ok
[15:08:09] we only have to update the facts
[15:08:16] so they appear fine on puppet compiler
[15:08:21] so only db1101 to fix
[15:08:29] (aside from db1100)
[15:08:29] indeed
[15:09:41] oh, I know
[15:09:51] multiinstance doesn't have s8 code
[15:10:19] Aaaaah right, db2038 isn't multiinstance, of course
[15:14:45] will db1100 alert or something?
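The manual step debated at 14:44–14:47 — puppet lays down the s8 config for db1101 but never moves the data directory — would boil down to something like the following. The systemd unit name and paths follow the multi-instance convention referred to in the log and are assumptions, not the exact commands that were run:

```
# Either disable puppet until the gerrit change is merged, or simply wait for it.
puppet agent --disable "moving db1101 from s5 to s8"

# The instance is already stopped here; rename rather than delete, as discussed.
mv /srv/sqldata.s5 /srv/sqldata.s8

# Re-enable puppet so it writes the s8 instance config, then start the instance.
puppet agent --enable
puppet agent -t
systemctl start mariadb@s8
```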
[15:15:15] db1100 has notifications disabled for mysql services (we should enable them actually)
[15:15:18] the rest is enabled
[15:16:10] 6 large servers for s5 is a bit too much
[15:16:37] do we have some left for misc, which is what we bought them for?
[15:16:57] big servers, no, we don't
[15:17:08] any server
[15:17:57] it is ok to overprovision for the switchover, but we should think later about how we use the resources
[15:17:58] once we start freeing up hosts with the multi-instance, we will have some of the old recentchanges (96G) free
[15:18:04] ok
[15:18:09] worry for later
[15:18:13] yep
[15:21:04] that should be it: https://puppet-compiler.wmflabs.org/compiler02/8953/db1101.eqiad.wmnet/
[15:21:18] \o/
[15:21:28] also https://puppet-compiler.wmflabs.org/compiler02/8953/db1100.eqiad.wmnet/
[15:22:10] maybe I can set db1071 as master already to puppetize the heartbeat
[15:23:15] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3784020 (10Marostegui) db1097.s5 is now replicating
[15:23:26] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3784021 (10Marostegui)
[15:23:31] db1097.s5 is now up with db1101's data, as the transfer finished.
[15:23:40] I will leave db1101 stopped, so we can migrate it to s8 :)
[15:23:42] :-)
[15:24:17] we can deploy now and I can disable puppet on the ones I am transferring now
[15:24:26] sure
[15:26:37] check gerrit:393065
[15:27:04] checking
[15:33:33] +1'ed with a comment
[15:33:37] an FYI mostly
[15:38:25] let's fix that now
[15:38:48] ok :)
[15:40:32] check the latest patch while I try to run the compiler on 91 and 92 codfw
[15:40:38] checking
[15:49:41] ok, about to deploy, ready for changes on db1101?
[15:49:48] yep!
[15:49:56] I have disabled puppet on db2038, not finished transferring yet
[15:50:20] go for it!
[15:53:34] let me know when merged
[15:53:50] merged and deployed
[15:53:57] ok
[15:54:00] let me run puppet on db1101
[15:54:09] with noop first
[15:54:55] mmm
[15:55:05] I think I missed changing db1071 to s8?
[15:55:12] yeah
[15:55:13] i saw that
[15:55:19] but I assumed you wanted to leave it on s5
[15:55:22] no problem because
[15:55:23] just give the master role to it
[15:55:34] It didn't kill the existing heartbeat
[15:55:38] but it is not what I wanted
[15:55:40] that is what i understood from the commit message
[15:55:50] I wanted that before
[15:55:54] but not now
[15:56:03] haha
[15:56:04] ok ok
[15:56:37] root@neodymium:~# mysql --skip-ssl -hdb1101 -P3318 -e "select @@hostname" -BN
[15:56:40] db1101
[15:56:41] \o/
[15:57:57] I have updated tendril for db1101
[15:59:11] it is looking good
[16:01:35] check icinga
[16:01:44] it will change with a delay
[16:02:03] it will be very easy to create silly alerts with these movements
[16:02:08] haha I was actually running puppet on einsteinium
[16:03:09] - check_command nrpe_check!check_mariadb_slave_sql_state_s7!10
[16:03:12] + check_command nrpe_check!check_mariadb_slave_sql_state_s8!10
[16:03:16] :)
[16:04:02] will it succeed?
[16:04:25] we will seeeee
[16:04:35] so far the checks are showing up
[16:04:41] for s8
[16:07:20] +echo 'db1071 is a Core DB Server s8 (mariadb::core)'
[16:07:33] icinga works fine
[16:07:49] It has lag!
[16:07:52] haha
[16:07:59] We also have to change prometheus port
[16:08:02] from 3315 to 3318
[16:08:02] will it alert?
[16:08:08] oh?
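The remaining manual bit raised at 16:07:59 — pointing the Prometheus mysqld exporter target for db1101 at 3318 instead of 3315 — is, as the next lines note, not automated. A rough sketch only; the targets file name is an assumption about the monitoring setup, not a known path:

```
# Swap the scrape target's port for db1101 (hypothetical targets file).
sed -i 's/db1101:3315/db1101:3318/' /srv/prometheus/targets/mysql-core_eqiad.yaml

# Depending on the setup, prometheus either re-reads file-based targets on its
# own or needs a reload of its service afterwards.
```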
[16:08:12] didn't it change
[16:08:17] oh
[16:08:19] you mean
[16:08:25] the prometheus exporter config
[16:08:28] yeah, sorry
[16:08:30] which is still not automatic
[16:08:35] :-(
[16:08:58] lag is gone :)
[17:12:37] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3784339 (10Marostegui) db2085 is now fully compressed ``` root@db2085:~# df -hT /srv Filesystem Type Size Used Avail Use% Mounted on /dev/mapper/tank-data xfs 3.6T 1.9T 1.8T 51% /srv ```
[19:03:37] 10DBA, 10MediaWiki-Configuration, 10Operations, 10Wikidata: Test moving testwikidatawiki database to s8 replica set on Wikimedia - https://phabricator.wikimedia.org/T180694#3784503 (10Marostegui) To give an update. Jaime successfully moved wikidatata to the s8 set of servers in codfw (passive DC). There is...
[19:28:05] 10DBA, 10MediaWiki-Configuration, 10Operations, 10Wikidata: Test moving testwikidatawiki database to s8 replica set on Wikimedia - https://phabricator.wikimedia.org/T180694#3784515 (10Addshore) >>! In T180694#3784503, @Marostegui wrote: > To give an update. > Jaime successfully moved wikidatata to the s8 s...
[19:31:40] marostegui: ^^ did you typo?
[19:56:35] 10DBA, 10MediaWiki-Configuration, 10Operations, 10Wikidata: Test moving testwikidatawiki database to s8 replica set on Wikimedia - https://phabricator.wikimedia.org/T180694#3784552 (10Marostegui) wikidata: https://gerrit.wikimedia.org/r/#/c/391835/ Again, this was only for READS and has NO EFFECT on produc...
[19:57:11] addshore: ^ :)
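For context on the db2085 comment at 17:12 ("now fully compressed"): on these hosts that normally means the large tables were rebuilt with InnoDB's compressed row format, which is what frees the space shown in the df output. A hedged illustration only — socket, database and table are examples, not the commands actually run:

```
# Rebuild one table with InnoDB compressed row format (example names only).
mysql --skip-ssl -S /run/mysqld/mysqld.s8.sock wikidatawiki \
  -e "ALTER TABLE revision ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8"

# Then check the effect on disk, as in the task update above.
df -hT /srv
```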