[05:08:16] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3711753 (10Marostegui) [05:08:45] 10Blocked-on-schema-change, 10DBA, 10Patch-For-Review: Drop now redundant indexes from pagelinks and templatelinks - https://phabricator.wikimedia.org/T174509#3686678 (10Marostegui) [05:30:28] 10DBA, 10Data-Services, 10Patch-For-Review: Toolforge oursql connecting to enwiki.analytics.db.svc.eqiad.wmflabs raises error 1615 'Prepared statement needs to be re-prepared' but works fine on enwiki.labsdb - https://phabricator.wikimedia.org/T179041#3711765 (10Marostegui) 05Open>03Resolved a:03Maroste... [07:27:44] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3582626 (10Marostegui) >>! In T175096#3711025, @Luke081515 wrote: > Is it possible to make a dump of the merl DB and put it into the too... 
[07:39:18] * elukey ready to copy data whenever is fine for the dba :P [07:39:32] o/ [07:39:36] let's go then [07:39:41] we have to downtime db1047 [07:39:45] stop all slaves [07:39:48] grab those positions [07:39:59] stop mysql [07:40:01] transfer it [07:40:07] start mysql on db1108 without replication [07:40:09] reset slave [07:40:14] and configure replication on both channels [07:40:21] and mysql_upgrade [07:40:42] going to downtime db1047 for a week [07:40:48] so db1108 will not replicate any wiki, only the log database with the custom script [07:40:52] ah [07:40:54] nice [07:40:56] \o/ [07:41:03] let's copy everything though [07:41:12] and later we can drop everything but the log database [07:41:19] sure [07:41:23] otherwise ibdata will complain about tablespaces and all that jazz [07:41:38] and we have to reset slave all on db1108 anyways [07:41:48] to clean up all the replication leftovers after the copy [07:41:49] qq - what do you mean with "grab those positions" ? [07:41:58] ah [07:42:04] the output from show slave status on db1047 [07:42:14] if we wanted to start replication on db1108 we would need the coordinates [07:42:20] to know where to replicate from [07:42:25] binlog file and position [07:42:33] which task are you using to !log stuff? [07:42:56] https://phabricator.wikimedia.org/T177405 [07:43:01] great [07:43:08] we also have [07:43:09] https://phabricator.wikimedia.org/T156844 [07:44:03] do you have to stop eventlogging script on db1047 before i stop mysql? [07:44:27] better to do it but it will fail if mysql goes down [07:44:38] ok, you do it?
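The checklist above hinges on "grab those positions", i.e. reading the binlog coordinates out of `SHOW SLAVE STATUS\G` before moving a replica. A minimal sketch of that step in Python; the sample output and the `db1052-bin.*` file names are made up for illustration, not the real db1047 values:

```python
# Sketch: pull the replication coordinates (binlog file + position)
# out of `SHOW SLAVE STATUS\G` text before moving a replica.
# The sample output below is fabricated for illustration.

def parse_slave_status(text):
    """Return a dict of Field: value pairs from \\G-style output."""
    status = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            status[key.strip()] = value.strip()
    return status

sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
              Master_Log_File: db1052-bin.002915
          Read_Master_Log_Pos: 619873804
        Relay_Master_Log_File: db1052-bin.002915
          Exec_Master_Log_Pos: 619873804
"""

status = parse_slave_status(sample)
# The *executed* position is the safe pair to resume from,
# not the position the IO thread has merely read.
coords = (status["Relay_Master_Log_File"], int(status["Exec_Master_Log_Pos"]))
print(coords)
```

These are the two values one would feed to `CHANGE MASTER TO master_log_file=..., master_log_pos=...` on the new host.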
[07:44:41] sure [07:44:45] thanks [07:45:19] replication stopped on db1047 [07:45:32] puppet disabled and eventlogging_sync stopped [07:45:44] great [07:45:45] stopping mysql [07:46:07] just in case: https://phabricator.wikimedia.org/P6187 [07:47:53] mmm [07:48:03] we do have to copy only the log database [07:48:10] otherwise we will not have space in db1108 [07:48:12] to copy all the content [07:48:47] that is going to be a pain [07:49:07] because mariadb will not like it right? [07:49:11] yeah [07:49:18] we cannot drop stuff on db1047 no? [07:49:30] * elukey thinks about it [07:49:44] we can try the copy and then deal with the errors [07:49:53] so your proposal would be to drop the wiki databases on db1047 and then copy, so ibdata will be happier [07:49:59] yeah [07:50:14] because I am not sure how ibdata and tokudb will deal with it [07:50:36] and all the .tokudb files [07:50:40] for instance: _enwiki_sql_920_1ece_key_ep_org_city_d1ac_1_19_B_2.tokudb [07:51:24] so from what I know that host is not really used a lot [07:52:02] if we can keep the schemas that are not "Replica only" (like the wikis) I am fine with it [07:52:13] what do you mean? [07:52:23] trying to explain [07:53:17] my understanding is that all/some wikis are replicated into schemas that should not be different in any way from their masters (so no chance that anybody is working on them doing any kind of research work) [07:53:39] meanwhile there are other databases (maybe research or whatever) that can contain data that people need to keep [07:54:26] yeah [07:54:36] what i suggest is to drop s1.dblist and s2.dblist from there [07:54:41] +2 [07:54:46] is that fine with you? [07:54:46] let's do it [07:55:02] that means [07:55:39] https://phabricator.wikimedia.org/P6188 [07:55:57] We would also then have to stop and unconfigure replication, as it will fail as those databases are no longer there [07:56:20] makes sense yes [07:57:05] should I go drop those dbs then?
[07:57:24] yep please do [07:57:28] ok [07:58:19] if you have patience can you explain to me how to unconfigure replication after the work? Curious :) [07:59:22] ah [07:59:23] it is easy [07:59:23] XD [07:59:44] root@EVENTLOGGING m4[(none)]> set @@default_master_connection='s1'; [07:59:45] Query OK, 0 rows affected (0.00 sec) [07:59:46] root@EVENTLOGGING m4[(none)]> reset slave all; [07:59:46] Query OK, 0 rows affected (0.01 sec) [07:59:49] root@EVENTLOGGING m4[(none)]> set @@default_master_connection='s2'; [07:59:52] Query OK, 0 rows affected (0.00 sec) [07:59:57] root@EVENTLOGGING m4[(none)]> reset slave all; [07:59:59] Query OK, 0 rows affected (0.10 sec) [08:00:04] root@EVENTLOGGING m4[(none)]> show all slaves status\G [08:00:07] Empty set (0.00 sec) [08:00:23] going to start with the drops [08:00:59] ahhh nice [08:01:38] doing the drops [08:01:48] screen -x drops [08:01:50] if you want to follow [08:02:42] we will see if that is enough :| [08:03:01] drops done [08:03:04] let's stop mysql now [08:04:39] still bigger than db1108 [08:05:43] do we have to migrate the user tables, or just log one? [08:06:25] log+staging? [08:06:35] i think we have to go for mysqldump option i believe [08:07:04] really? Still bigger? How is that possible? [08:07:48] root@db1047:/srv/sqldata# du -sh . [08:07:48] 3.9T . 
[08:08:24] there must be something super big other than the log databse [08:08:26] *database [08:08:31] checking [08:08:33] all those crappy tokudb files [08:09:51] 57G _log_sql_420e_109dd026_main_6e352a157_1_1a_B_0.tokudb [08:09:52] 68G _log_sql_420e_29ae7b2_main_4cefd7892_1_1a_B_0.tokudb [08:09:52] 212G staging [08:09:52] 399G log [08:09:52] 2.4T ibdata1 [08:10:03] lovely [08:10:09] then we do have to go for mysqldump [08:10:13] to migrate it to file per table [08:10:16] and get rid of ibdata [08:10:20] yeah :( [08:10:28] ok [08:10:32] so we have to prepare db1108 for it [08:10:34] let me see [08:11:25] let me start a mysql instance there [08:11:31] otherwise we cannot do it :) [08:12:50] shall I or do you want to do it? [08:12:55] i will do it [08:12:58] it is painful sometimes [08:13:19] is there a happy moment in the life of a dba sometimes? [08:13:22] :D [08:13:35] yeah, when we log off [08:13:36] XDDD [08:17:04] so let's copy over only the log databsae [08:17:10] yeah, still setting up the instance [08:17:13] (this morning I can't type database) [08:17:24] let me know if I can help, feeling useless :( [08:19:40] looking good now [08:19:45] let me do some changes in puppet before i forget [08:21:46] maybe you can help me there [08:21:49] It feels like being in a very famous and old garage where super expert mechanics take care of your car, knowing what's wrong simply by listening to the sound of the engine [08:22:10] "yeah that thing clearly has to be changed, can you hear it?"
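The `du -sh` triage above (finding that the shared 2.4T ibdata1, not the log database, was eating the disk) can be sketched as a small directory walker. The demo layout below is a throwaway fabrication, not db1047's real datadir:

```python
# Sketch: reproduce the `du` triage in Python — walk a datadir and report
# the biggest top-level entries, largest first, to spot a bloated ibdata1.
import os
import tempfile

def biggest_entries(datadir, top=5):
    """Return (size_bytes, name) for top-level entries, largest first."""
    sizes = []
    for name in os.listdir(datadir):
        path = os.path.join(datadir, name)
        if os.path.isfile(path):
            total = os.path.getsize(path)
        else:
            total = 0
            for root, _, files in os.walk(path):
                total += sum(os.path.getsize(os.path.join(root, f)) for f in files)
        sizes.append((total, name))
    return sorted(sizes, reverse=True)[:top]

# Demo with an illustrative throwaway layout: one fat shared tablespace
# file next to a small per-database directory.
d = tempfile.mkdtemp()
with open(os.path.join(d, "ibdata1"), "wb") as f:
    f.write(b"\0" * 4096)
os.makedirs(os.path.join(d, "log"))
with open(os.path.join(d, "log", "uuid.ibd"), "wb") as f:
    f.write(b"\0" * 1024)

entries = biggest_entries(d)
print(entries)
```

With `innodb_file_per_table` each table lives in its own `.ibd` file, which is why dumping and reloading (rather than copying the datadir) is what actually reclaims the ibdata1 space.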
[08:22:12] :D [08:22:28] can you change in db1108 config: [client] [08:22:29] port = 3306 [08:22:29] socket = /run/mysqld/mysql.sock [08:22:33] but poor db1047 is not a fancy car, it is like an old Fiat Panda [08:22:38] sure [08:22:45] and same socket for the [mysqld] section [08:22:53] instead of /tmp/mysql.sock [08:22:59] it needs to be: /run/mysqld/mysql.sock [08:23:05] mysql is now up \o/ [08:24:21] yessss [08:24:21] and the tokudb plugin looks loaded [08:24:22] nice [08:24:29] going to start dumping the log database remotely [08:24:32] staging is not needed? [08:25:23] first time that I hear the name [08:25:30] haha [08:25:35] let me dump the mysql user table first [08:25:40] so we can have the same grants from db1047 [08:25:50] I am fixing puppet [08:25:54] thanks! [08:27:15] users loaded and looking good [08:27:18] going to start the dump [08:29:30] I created https://gerrit.wikimedia.org/r/#/c/386586, checking what is going to change for db1046 [08:31:18] backups running [08:31:28] dumps, sorry [08:31:34] mmm [08:31:44] that will change db1046 [08:31:52] it should only change db1108 (for now I guess) [08:32:02] i mean we can change it for db1046 [08:32:04] but requires a restart [08:33:27] it shouldn't be a big problem, we can do it, or maybe simply keep the change in the config file there, waiting for the next restart? [08:33:39] no, because it will fail [08:33:47] as it will change the config where the init.d looks for the socket [08:33:53] so the correct order is [08:33:55] stop mysql [08:34:03] merge puppet so the new socket location is set in the config [08:34:05] start mysql [08:34:13] ahhh [08:34:15] yeah [08:34:21] it is a bit messy [08:37:57] so pcc shows no fidd [08:37:59] *diff [08:38:07] I might have done something wrong [08:38:12] mmm [08:38:18] which profiles do db1047 and db1046 have?
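The my.cnf change being asked for above would look roughly like this fragment (a sketch only: the real file is puppet-templated and carries many more settings, and the socket filename shown is the one quoted in the chat at this point):

```ini
# /etc/my.cnf on db1108 — sketch of the socket move discussed above.
# Only the affected keys are shown; the real file is MANAGED BY PUPPET.
[client]
port   = 3306
socket = /run/mysqld/mysql.sock

[mysqld]
# previously /tmp/mysql.sock
socket = /run/mysqld/mysql.sock
```

The [client] entry matters as much as the [mysqld] one: it is what lets a plain `mysql` invocation find the server without an explicit `-S` flag once puppet has rolled the file out.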
[08:39:09] so 47 has [08:39:13] class role::mariadb::misc::eventlogging::replica { include ::profile::mariadb::misc::eventlogging::database include ::profile::mariadb::misc::eventlogging::replication [08:39:21] meanwhile 46 should have only database [08:39:38] so db1047 will be affected by that change? [08:40:18] so class mariadb::config's defaults for socket and port are the same ones that I changed [08:40:28] this is probably why puppet is showing a no-op [08:40:53] mysql 26235 110 50.2 54657200 33147156 ? Sl 08:03 41:33 /opt/wmf-mariadb10/bin/mysqld --basedir=/opt/wmf-mariadb10 --datadir=/srv/sqldata --plugin-dir=/opt/wmf-mariadb10/lib/plugin --user=mysql --skip-slave-start --log-error=/srv/sqldata/db1047.err --pid-file=/srv/sqldata/db1047.pid --socket=/tmp/mysql.sock --port=3306 [08:40:58] it is using the "old" socket location [08:41:02] it should change db1047 no? [08:41:34] it should but since we already have an if guard for basedir we can definitely add more parameters [08:41:40] so only db1108 will change [08:41:46] ah [08:41:48] then great! [08:42:16] thanks :* [08:43:11] still need to figure out though why the current default parameters for mariadb::config in the el::database profile are not applied [08:43:26] they should go to /etc/my.cnf right ? [08:43:28] or init.d ? [08:44:06] ah no basedir/my.cnf [08:44:16] marostegui: hey, can we deploy this? https://gerrit.wikimedia.org/r/386588 [08:44:21] checking [08:44:25] It's done now :) [08:44:59] it looks safe, no? :) [08:45:07] yes :) [08:45:28] +1ed [08:45:58] do we have to optimize wb_terms by the way? [08:46:01] to claim space back? [08:46:44] marostegui: It's 5% smaller [08:46:49] your call :) [08:47:49] -rw-rw---- 1 mysql mysql 697G Oct 26 08:47 /srv/sqldata/wikidatawiki/wb_terms.ibd [08:47:52] might be worth [08:47:53] XD [08:49:26] ahhhh I am stupiiiidddd..
el's mariadb config uses eventlogging.my.cnf.erb [08:49:29] uffff [08:49:39] that doesn't use any socket variable [08:49:42] * elukey cries in a corner [08:50:01] hahahahaha [08:58:57] marostegui: https://gerrit.wikimedia.org/r/#/c/386586/3 [09:00:15] let's see [09:00:43] marostegui: we are also working on something that makes it cut to 1/3 at least [09:01:14] elukey: I wonder why db1047 shows a change if its current my.cnf has /tmp [09:01:27] root@db1047:~# cat /etc/my.cnf | grep socket [09:01:27] socket = /tmp/mysql.sock [09:01:27] socket = /tmp/mysql.sock [09:01:41] Amir1: that is needed yeah [09:01:51] when do you think it will be available? [09:02:42] several weeks from now, the change needed for it just got merged [09:03:02] marostegui: it shows a change in the class parameter, not in the file.. that makes sense since the variable was not used in there [09:03:08] no? [09:03:19] oh yes [09:03:19] haha [09:03:20] right [09:03:22] (in the file only the "MANAGED BY PUPPET" bit ends up) [09:03:25] * marostegui needs a coffee? [09:04:43] marostegui: I think we need to deploy this patch: https://gerrit.wikimedia.org/r/#/c/386588/ [09:04:43] :D :D [09:05:07] Amir1: oh, sorry, I wasn't aware you were waiting for me to merge [09:05:08] hehe [09:05:13] :D [09:05:37] doing it now [09:06:34] Amir1: merged [09:06:48] Thanks! [09:07:20] marostegui: merged and executed puppet on db1046/dbstore1002 (the rest have puppet disabled) [09:07:42] yeah, i have puppet disabled on 1108 [09:07:49] i will enable it once we have imported the log table [09:07:54] otherwise mydumper will complain about SSL [09:08:57] makes sense [09:21:01] marostegui: so mydumper atm is grabbing the log db from db1047 automagically?
[09:21:15] yeah [09:21:27] once done, we will import it on db1108 [09:21:31] it should be fast to import [09:21:36] it is not big [09:21:51] but as it is dumping remotely, it will take a bit to get [09:21:58] so far it has gotten 83G (it is compressing on the fly) [09:22:21] db1108: screen -x mydumper [09:22:24] to see what it is doing [09:23:50] nice! Thanks! [10:24:26] 10DBA, 10Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3712029 (10Marostegui) [10:24:49] 10DBA, 10Operations: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#3712033 (10Marostegui) [10:24:52] 10DBA, 10Patch-For-Review: Run pt-table-checksum on s3 - https://phabricator.wikimedia.org/T164488#3235303 (10Marostegui) 05Open>03Resolved Everything has been checksummed. I believe s3 is in good shape now. [10:25:34] 10DBA, 10Operations: Decommission db1015, db1035, db1044 and db1038 - https://phabricator.wikimedia.org/T148078#2714228 (10Marostegui) [12:38:00] 10DBA, 10Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3712197 (10Marostegui) [12:38:37] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3712198 (10Marostegui) [12:53:38] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3712208 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` db2086.codfw.wmnet ``` The log can be found in `/var/log/wmf-auto-reimage/2017102...
[12:54:38] 10DBA, 10Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3712211 (10Marostegui) [12:55:31] elukey@db1108:~$ du -hs /srv/tmp/db1047 [12:55:31] 391G /srv/tmp/db1047 [12:55:35] ufff [12:55:45] yeah, still making its way [13:12:25] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3712219 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2086.codfw.wmnet'] ``` and were **ALL** successful. [13:46:23] it is finishing, no more threads are being created by mydumper [13:46:26] so the last 4 tables are being done now [13:46:37] one of them just finished, so only 3 pending [13:46:45] wow, 2! [13:48:44] finished [13:48:49] going to import them now on db1108 [13:49:58] importing [14:07:51] just read, nice! [14:09:30] elukey@db1108:~$ sudo mysql -u root --skip-ssl [14:09:30] ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mysql.sock' (2 "No such file or directory") [14:09:33] is it normal? [14:09:40] 10DBA, 10Operations, 10ops-eqiad: db1101 crashed - memory errors - https://phabricator.wikimedia.org/T178383#3712363 (10Marostegui) I have repooled db1101 [14:09:49] it is normal, because there is no socket there :) [14:09:56] ah yes still socket = /tmp/mysql.sock in my.cnf [14:10:11] nono I was wondering about the socket path [14:10:27] just use -S /run/mysqld/mysqld.sock [14:10:28] I guess we'll need to stop mysql, re-enable puppet, run it, restart [14:10:40] no, there is no need to stop mysql [14:11:01] to connect: mysql --skip-ssl -S /run/mysqld/mysqld.sock [14:11:15] once we run puppet you will not need -S [14:11:21] because client is updated [14:11:29] as the my.cnf will get updated and use it [14:11:30] yeah [14:11:48] don't you love databases? [14:12:10] I do!
[14:13:00] let me know if I can help [14:13:46] nah, we just need to wait [14:13:54] for the import to finish [14:14:00] we should stop and start mysql to make sure it goes fine [14:14:02] and then we are done [14:14:13] we also need to run mysql_upgrade [14:14:24] i will do it once the tables are imported [14:14:28] then stop mysql, and then start it [14:14:38] will enable puppet once stopped [14:14:45] we can enable puppet on db1047 if you like [14:15:01] if i enable puppet on db1108 it will start the event logging script, right? [14:15:04] is that ok? [14:15:50] I think that I masked the unit [14:15:52] lemme check [14:16:43] masked, so it is fine to run puppet [14:16:52] ok, it will not be anytime soon :) [14:16:53] if you are ok I'll re-enable puppet on db1047 [14:17:00] yeah, db1047 should be fine [14:17:16] db1108 we have to wait [14:17:25] so for db1108 these are the steps [14:17:33] - wait for the import to finish [14:17:37] - run mysql_upgrade [14:17:40] - stop mysql [14:17:42] - enable and run puppet [14:17:44] - start mysql [14:17:47] that should be it [14:18:20] ah mysql_upgrade to upgrade mysql tables.. will it work smoothless? [14:18:32] smoothly, better :D [14:18:33] yeah, it should be fine [14:19:32] db1047 seems fine [14:19:59] 10DBA, 10Patch-For-Review: Support multi-instance on core hosts - https://phabricator.wikimedia.org/T178359#3712371 (10Marostegui) a:03Marostegui [14:20:07] good! [14:20:08] 10DBA, 10Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3712372 (10Marostegui) a:03Marostegui [15:45:20] 10DBA, 10Cloud-VPS, 10Data-Services: Add pp_propname/pp_value index to Labs replica - https://phabricator.wikimedia.org/T140609#2470422 (10bd808) The `*.{analytics,web}.db.svc.eqiad.wmflabs` servers should have these indexes (from maintenance/tables.sql) on the page_props table: ``` CREATE UNIQUE INDEX /*i*/...
[15:50:31] 10DBA, 10Cloud-VPS, 10Data-Services: Add pp_propname/pp_value index to Labs replica - https://phabricator.wikimedia.org/T140609#2470422 (10Marostegui) >>! In T140609#3712651, @bd808 wrote: > The `*.{analytics,web}.db.svc.eqiad.wmflabs` servers should have these indexes (from maintenance/tables.sql) on the pa... [15:52:23] 10DBA, 10Cloud-VPS, 10Data-Services: Add pp_propname/pp_value index to Labs replica - https://phabricator.wikimedia.org/T140609#3712669 (10Marostegui) They are actually there: ``` mysql:root@localhost [enwiki]> select @@hostname; +------------+ | @@hostname | +------------+ | labsdb1010 | +------------+ 1 ro... [15:52:53] 10DBA, 10Cloud-VPS, 10Data-Services: Add pp_propname/pp_value index to Labs replica - https://phabricator.wikimedia.org/T140609#3712670 (10bd808) >>! In T140609#3712651, @bd808 wrote: > The `*.{analytics,web}.db.svc.eqiad.wmflabs` servers should have these indexes (from maintenance/tables.sql) on the page_pr... [15:54:09] 10DBA, 10Cloud-VPS, 10Data-Services: Add page_props.pp_value index to Wiki Replicas - https://phabricator.wikimedia.org/T140609#3712676 (10bd808) [15:57:08] 10DBA, 10Cloud-VPS, 10Data-Services: Add page_props.pp_value index to Wiki Replicas - https://phabricator.wikimedia.org/T140609#3712679 (10Marostegui) If you want it to be added there can you please add it here: `./modules/role/files/labs/db/views/extra-wikireplicas-only-indexes.sql` [16:13:05] 10DBA, 10Cloud-VPS, 10Data-Services: rev_len should be available also for deleted revisions in database replicas - https://phabricator.wikimedia.org/T101631#3712712 (10bd808) @Bawolff can you think of a reason that we should be hiding the length of rev deleted revisions in the replicas? They do seem to be [[... [16:39:43] marostegui: need to log off but I'll check tomorrow morning db1108, since afaics when the load finishes nothing will happen [16:39:53] (puppet disabled, eventlogging_sync masked, etc..) [17:01:13] elukey: sure, no worries! 
[17:01:18] it will take the whole night to import anyways [17:14:53] marostegui: not super urgent, but your feedback would be useful here at T178128 [17:14:53] T178128: Access to raw database tables on labsdb* for wmcs-admin users - https://phabricator.wikimedia.org/T178128 [17:18:50] madhuvishy: i will try to take a look tomorrow or monday [17:19:01] marostegui: okay thanks! :) [17:19:06] :-) [17:29:14] 10DBA, 10Analytics, 10Data-Services, 10Research, 10cloud-services-team (Kanban): Implement technical details and process for "datasets_p" on wikireplica hosts - https://phabricator.wikimedia.org/T173511#3713170 (10bd808) >>! In T173511#3558849, @bd808 wrote: > The urgency for `datasets_p` is that finding... [17:31:03] 10DBA, 10Operations, 10cloud-services-team, 10Scoring-platform-team (Current): Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3713192 (10bd808) >>! In T168584#3570152, @bd808 wrote: > If we do lose a disk on 1001/3 to the powercycle though it will be hard to recover so we s... [17:55:14] marostegui: bd808 So i put up https://gerrit.wikimedia.org/r/#/c/386660 for the dns switchover for labsdb1001, when we talked about this before jaime suggested doing something like this a day/some hours before the reboot, but that will be a Sunday. I am wondering if we should do it tomorrow end of day but not sure if 1003 can handle the load over the weekend, any thoughts? 
[17:56:25] (or I could wake up super early on Monday 30th) [18:16:35] 10DBA, 10Operations, 10cloud-services-team, 10Patch-For-Review, 10Scoring-platform-team (Current): Labsdb* servers need to be rebooted - https://phabricator.wikimedia.org/T168584#3713361 (10madhuvishy) Started a planning doc for the reboots here - https://etherpad.wikimedia.org/p/labsdb-reboots [18:28:41] 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Drop the "wb_terms.term_language" index - https://phabricator.wikimedia.org/T179106#3713420 (10hoo) [18:29:40] 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3713408 (10hoo) [18:30:01] 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Drop the "wb_terms.wb_terms_language" index - https://phabricator.wikimedia.org/T179106#3713408 (10hoo) [22:18:17] madhuvishy: whatever you and marostegui think is right works for me [23:59:33] 10DBA, 10Data-Services, 10cloud-services-team (Kanban): Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3714074 (10Quiddity) >>! In T175096#3711831, @Marostegui wrote: >>>! In T175096#3711025, @Luke081515 wrote: >> Is it possible to make a...