[06:41:27] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2419082 (Marostegui) >>! In T139090#2778068, @jcrespo wrote: > After running over 30 000 alter tables, this is nominatively done; > > Nice job!!! :)
[07:32:42] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2778728 (Marostegui) This is now running on db2023
[08:00:09] DBA, Operations, ops-codfw: db2034 crashes meta ticket - https://phabricator.wikimedia.org/T150233#2778754 (Marostegui)
[08:00:38] DBA, Operations, ops-codfw: db2034 crashes meta ticket - https://phabricator.wikimedia.org/T150233#2778772 (Marostegui)
[08:01:08] DBA, Operations, ops-codfw: db2034 crashes meta ticket - https://phabricator.wikimedia.org/T150233#2778754 (Marostegui)
[08:01:11] DBA, Operations, ops-codfw: db2034 crash - https://phabricator.wikimedia.org/T137084#2356666 (Marostegui)
[08:01:34] DBA, Operations, ops-codfw: db2034 crashes meta ticket - https://phabricator.wikimedia.org/T150233#2778754 (Marostegui)
[08:01:36] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755877 (Marostegui)
[08:12:05] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2778780 (Marostegui) dbstore2001,2002 are done ``` root@neodymium:~# for i in dbstore2001 dbstore2002; do echo $i; mysql -h$i.codfw.wmnet -A commonswiki -e "show create table revision\G";done dbstore2001 ******...
[08:46:31] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2778800 (Marostegui) This is now running on dbstore1001
[09:47:52] DBA, Operations, ops-codfw: db2034 crashes meta ticket - https://phabricator.wikimedia.org/T150233#2778883 (jcrespo) @Marostegui thank you for this work, I know it takes some time
[09:48:45] :-) thanks
[09:54:04] packaged 10.1.19
[09:54:25] labsdb1009 down (power supply failure)
[09:55:01] Yeah, I read that :(
[09:55:23] better now than with data
[10:17:16] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2778978 (Marostegui) This is running on db1059: ```./software/dbtools/osc_host.sh --host=xxx --port=3306 --db=commonswiki --table=revision --method=ddl --no-replicate "DROP KEY rev_id, DROP PRIMARY KEY, ADD PRI...
[10:41:05] there is lag on dbstore1001 - s4. Doing anything there?
[10:42:18] I see yes, no problem then
[10:58:13] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2779071 (jcrespo) self reminder: I cannot check right now dbstore2002.codfw.wmnet because it is under maintenance.
[11:05:56] jynus: yes, sorry, an alter
[12:04:09] what do you think of https://gerrit.wikimedia.org/r/320358 ?
[12:04:45] checking
[12:05:09] sure
[12:05:21] this is in preparation for another change
[12:05:25] when did you add me as a reviewer? never got the email
[12:05:34] oh?
[12:05:45] that is important, that you receive those
[12:05:54] I always do
[12:06:07] Aaaa, it went to spam :?
[12:06:12] lol
[12:06:19] put some filters there
[12:06:45] I +1'ed
[12:07:01] thank you, I will send another after I deploy this ones
[12:07:05] *one
[12:10:01] sounds good!
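The host-by-host verification quoted in this log (running `show create table revision\G` against each server and reading the key definitions) could be automated roughly as follows. This is a minimal sketch under stated assumptions, not the tooling used in the channel: the function names, the host labels, and the sample DDL strings are all hypothetical, and the regular expression assumes plain column names without index prefix lengths.

```python
import re

# Matches the column list of a PRIMARY KEY clause in SHOW CREATE TABLE output.
# Assumption: no prefix lengths such as `col`(10) inside the parentheses.
PK_PATTERN = re.compile(r"PRIMARY KEY\s*\(([^)]*)\)")


def primary_key_columns(create_table_sql):
    """Return the primary-key columns as a tuple, or None if no key is defined."""
    match = PK_PATTERN.search(create_table_sql)
    if match is None:
        return None
    return tuple(col.strip().strip("`") for col in match.group(1).split(","))


def hosts_with_unexpected_pk(ddl_by_host, expected):
    """List the hosts whose primary key differs from the expected column tuple."""
    expected = tuple(expected)
    return sorted(
        host
        for host, ddl in ddl_by_host.items()
        if primary_key_columns(ddl) != expected
    )


# Hypothetical sample data; real input would be the DDL text fetched per host.
SAMPLE_DDL = {
    "host-a": """CREATE TABLE `revision` (
  `rev_id` int(10) unsigned NOT NULL,
  `rev_page` int(10) unsigned NOT NULL,
  PRIMARY KEY (`rev_id`)
)""",
    "host-b": """CREATE TABLE `revision` (
  `rev_id` int(10) unsigned NOT NULL,
  `rev_page` int(10) unsigned NOT NULL,
  PRIMARY KEY (`rev_page`,`rev_id`)
)""",
}

if __name__ == "__main__":
    # The expected key here is illustrative, chosen only to exercise the check.
    print(hosts_with_unexpected_pk(SAMPLE_DDL, expected=("rev_id",)))
```

The expected key is passed in by the caller on purpose, so the same check can be reused for whichever definition a given schema change is meant to produce.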
[12:12:10] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2779315 (Marostegui) db2023 is now done ``` root@neodymium:~# mysql -hdb2023.codfw.wmnet -A dewiki -e "show create table revision\G" *************************** 1. row *************************** Table: revision Create Tabl...
[12:13:23] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2779318 (Marostegui)
[12:15:33] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2779319 (jcrespo) Aside from the dbs that do not exist on labs, and the above 2 hosts, the following failed to apply correctly: ``` KEY `pl_backlinks_namespace` (`pl_namespace`,`pl_title`,`pl_fr...
[12:20:16] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2779321 (jcrespo) I will not fix labsdb1001 and labsdb1003- they are at EOL, and the new ones will have the right structure; plus it will create unnecessary lag and metadata locking to users. Thi...
[12:23:05] DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2779337 (Marostegui) codfw is finished so now eqiad needs fixing, the following hosts need to get the right PK and the right indexes: ``` db1069.eqiad.wmnet PRIMARY KEY (`rev_page`,`rev_id`), labsdb1001.eqiad.wmnet PRIMARY KEY...
[12:37:43] Blocked-on-schema-change, DBA: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2779406 (jcrespo) Now that dbstore2002 is up again, I can see it is fully ALTER'ed for the shards that it contains (s1, s4, s3). db2034 is considered broken, so no issue with that. I will depool...
[12:44:44] I think there is too low concurrency on the mariadb connection pool
[12:44:57] we should increase it, at least on the newer servers
[14:22:33] db1059 is not fully depooled, that is a problem
[14:22:48] fully deployed?
[14:22:53] depooled
[14:22:53] ?
[14:23:10] it says it is depooled, but it is receiving api traffic
[14:23:22] ?
[14:23:39] how come?
[14:23:44] Maybe we can stop replication?
[14:23:50] to force the load balancer to "forget" it?
[14:24:18] it is deployed as depooled
[14:24:28] but it is not on the repo?
[14:25:00] • 10:08 marostegui@tin: Synchronized wmf-config/db-eqiad.php: Depool db1059 - T149079 T147305 (duration: 00m 57s)
[14:25:00] T147305: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305
[14:25:01] T149079: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079
[14:26:17] anyway, I am doing a complex patch, will ask for your help soon
[14:26:24] oki
[14:27:28] DBA, Operations, ops-codfw: install new disks into dbstore2001 - https://phabricator.wikimedia.org/T149457#2779776 (Marostegui) @Papaul can we do this on Thursday? On Wednesday night I will take a snapshot of dbstore2001 so we should be good to go on Thursday. I have been talking to @Vo...
[14:32:27] re: db1059…a tcpdump reveals no traffic from API (so far)
[14:32:47] (read traffic)
[14:32:57] yes, it must be my local copy only
[14:49:10] marostegui, please check https://gerrit.wikimedia.org/r/320401
[14:49:16] it is quite important
[14:52:02] checking
[14:53:26] added 1 comment
[15:08:13] uploaded new version
[15:08:20] checking
[15:08:28] check thoroughly, it is too many things at the same time
[15:08:39] I am checking also
[15:12:12] jynus: did you see my comment? (i see no patch after that comment, just saying)
[15:12:49] maybe I didn't downloaded it?
[15:12:56] it should be up now
[15:13:05] *uploaded it
[15:13:34] now i see it
[15:13:41] checking then
[15:14:01] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2779859 (yuvipanda) I wonder if it'll be better to do this next quarter. We've already done a few bits of pretty disruptive maintenance, and have on...
[15:17:03] shit
[15:17:05] I hit +2
[15:17:08] by mistake :(
[15:18:11] I wanted to +1, damn
[15:21:08] jynus: ^
[15:39:06] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2779893 (jcrespo) > I wonder if it'll be better to do this next quarter. I am ok with next quarter- let's set a time. I have worked around the 5.5 s...
[15:43:39] volans, check https://tendril.wikimedia.org/report/slow_queries?host=^db&user=wikiuser&schema=wik&qmode=eq&query=&hours=5
[15:43:55] it is not that I do not think db1065 is a problem
[15:44:06] it is just that it is an old problem
[15:45:41] interesting
[15:49:25] marostegui, see what you think about https://gerrit.wikimedia.org/r/320414 instead
[15:49:40] let's see
[15:50:37] Sounds good, I am sure those .65 and .66 will appreciate the help although they have a lot more memory, maybe it is better 3,3,1 as weights?
[15:50:40] just asking
[15:51:06] jynus: I know slow queries there are "normal", but the spikes on the DB specific graphs are unrelated IMHO, but a bit busy right now, cannot elaborate
[15:51:27] no, those happen from time to time
[15:52:09] especially when we have extra load and 2 fewer servers than usual
[15:53:56] marostegui, api servers also have main traffic, I would prefer 1,1,1
[15:54:45] true, they have 50,50
[15:54:50] good then
[15:56:04] in most cases it is a connection issue rather than load
[15:56:35] we will see what happens now
[16:05:18] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2779979 (yuvipanda) Ok. Early January?
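To make the 3,3,1 versus 1,1,1 weight discussion concrete, the sketch below computes the fraction of read traffic each replica would receive if servers were picked strictly in proportion to their weights. That proportionality is an assumption made for illustration (real selection may also take lag and connection limits into account), and the replica labels are placeholders, since the log does not say which weight maps to which host.

```python
from fractions import Fraction


def traffic_shares(weights):
    """Map each replica to its expected share of reads, given integer weights.

    Assumes purely weight-proportional selection; labels are hypothetical.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {host: Fraction(weight, total) for host, weight in weights.items()}


if __name__ == "__main__":
    for proposal in ({"replica-a": 3, "replica-b": 3, "replica-c": 1},
                     {"replica-a": 1, "replica-b": 1, "replica-c": 1}):
        shares = traffic_shares(proposal)
        print({host: str(share) for host, share in shares.items()})
```

Under that assumption, 3,3,1 gives the two heavier replicas 3/7 of the reads each and the third 1/7, while 1,1,1 splits the reads evenly into thirds.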
[16:08:53] DBA, Labs, Labs-Infrastructure, Operations, Patch-For-Review: Migrate labsdb1005/1006/1007 to jessie - https://phabricator.wikimedia.org/T123731#2779984 (jcrespo) January ok, but after the 15th.
[16:13:59] we need one more server on s1, and that is db1095
[16:35:53] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2780028 (Cmjohnson) Replaced the disk at slot 4
[16:36:43] * marostegui crosses his fingers ^
[16:37:58] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2780035 (Marostegui) db1059 is done: ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql -hdb1059 -A commonswiki -e "show create table revision\G" *************************** 1. row ****************...
[16:39:48] git fetch
[16:39:50] gah
[16:39:55] lol
[16:41:27] DBA, Patch-For-Review: codfw: Fix S4 commonswiki.templatelinks partitions - https://phabricator.wikimedia.org/T149079#2780060 (Marostegui) ALTER running on db1059.
[16:50:50] something went wrong with https://gerrit.wikimedia.org/r/320346
[16:50:57] I do not know what it is
[16:51:04] but it doesn't matter now
[16:51:10] the server is depooled
[16:54:54] yeah, tcpdump wasn't showing anything
[16:55:08] what do you see wrong there?
[16:55:17] not with the patch itself
[16:55:42] some kind of different ordering or something on tin
[16:57:29] I will delay https://gerrit.wikimedia.org/r/320392 for tomorrow
[16:57:40] ok
[16:57:46] Remember tomorrow bank holiday in Madrid
[16:57:58] remember that you, too
[16:58:02] XDD
[16:58:03] wink wink
[16:58:08] XDD
[16:58:23] I will probably login before going to bed to start a snapshot on dbstore2001 but that is 2 minutes :)
[16:58:53] So it is ready by thursday so we can place the new disks there: https://phabricator.wikimedia.org/T149457#2779776
[16:59:11] why a snapshot on dbstore2001?
[16:59:28] it will be wiped
[16:59:38] Yes, but it will be transferred to dbstore2002 :)
[16:59:51] I feel better if I have one :)
[16:59:56] will it fit?
[16:59:59] yes
[17:00:05] how?
[17:00:07] more than enough space in dbstore2002
[17:00:18] and I am also compressing S3 top tables and it is going really well
[17:00:34] are we taking like 3TB on 2002?
[17:00:51] No, the tar.gz of dbstore2001 will be less than 600G
[17:01:12] true, normal backups take 250GB or so
[17:01:16] per shard
[17:01:19] Yep
[17:01:31] And S1 and S4 are compressed + the compression of S3 has saved already around 500G
[17:01:35] And there are still some wikis to go
[17:01:40] nice
[17:01:58] I think the total size of the three shards is going to be 2.2 or 2.0T
[17:04:54] you got to love the new servers I bought- reducing alter table time 5x
[17:05:28] while you are away, I will try to reimage some servers to jessie
[17:05:45] and implement the unix_auth plugin
[17:07:41] DBA, Patch-For-Review: Unify commonswiki.revision - https://phabricator.wikimedia.org/T147305#2780094 (Marostegui) >>! In T147305#2778800, @Marostegui wrote: > This is now running on dbstore1001 Finished: ``` MariaDB MARIADB dbstore1001 commonswiki > show create table revision\G ***********************...
[17:09:06] The old servers took 5x more time?
[17:09:08] :_(
[17:13:58] DBA, Operations, ops-codfw: install new disks into dbstore2001 - https://phabricator.wikimedia.org/T149457#2780119 (Papaul) @Marostegui yes Thursday 10:00 am works for me.
[18:04:16] DBA, Operations, ops-codfw: install new disks into dbstore2001 - https://phabricator.wikimedia.org/T149457#2780233 (Marostegui) Great thank you! I will wait for you and once you are around I will shutdown the server then Thanks!
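The "will it fit?" exchange about the dbstore2001 snapshot comes down to simple arithmetic, which the sketch below spells out. It only uses the figures quoted in the conversation (an archive under 600G, a dataset of 2.0 to 2.2T); the free space on the target host is not stated in the log, so `free_gb` is a made-up value, and the 10% headroom and the 1T = 1000G conversion are assumptions as well.

```python
def archive_ratio(archive_gb, dataset_gb):
    """Archive size as a fraction of the on-disk dataset size."""
    if dataset_gb <= 0:
        raise ValueError("dataset size must be positive")
    return archive_gb / dataset_gb


def fits_on_target(archive_gb, free_gb, headroom_fraction=0.10):
    """True if the archive fits while leaving headroom_fraction of the free space unused."""
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return archive_gb <= free_gb * (1 - headroom_fraction)


if __name__ == "__main__":
    ARCHIVE_GB = 600  # upper bound quoted in the log
    for dataset_gb in (2000, 2200):  # 2.0T and 2.2T, assuming 1T = 1000G
        print(dataset_gb, round(archive_ratio(ARCHIVE_GB, dataset_gb), 2))
    # free_gb is a hypothetical figure; the log does not give the real one.
    print(fits_on_target(ARCHIVE_GB, free_gb=1000))
```

Keeping some headroom in the check is a design choice rather than something the log prescribes: a transfer that fills the target disk completely leaves no room for the archive to be unpacked or for the host's own data to grow.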
[18:13:58] DBA: Test InnoDB compression - https://phabricator.wikimedia.org/T139055#2780291 (Marostegui) I have tested compression on the top3 tables in S3 (revision, pagelinks, templatelinks) and the dataset has been reduced around `500G` in that whole shard. dbstore2001 now contains S1, S3, S4 (S1 and S4 fully compr...
[18:56:50] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2755877 (Papaul) Dear Mr Papaul Tshibamba, Thank you for contacting Hewlett Packard Enterprise for your service request. This email confirms your request for service and the details are below. Y...
[19:38:51] Blocked-on-schema-change, DBA, Patch-For-Review: Deploy I2b042685 to all databases - https://phabricator.wikimedia.org/T139090#2780605 (jcrespo) Open>Resolved This is officially deployed- it only took continuous schema changes for 2 months.
[19:41:52] DBA, CirrusSearch, Discovery, Discovery-Search (Current work), and 2 others: CirrusSearch SQL query for locating pages for reindex performs poorly - https://phabricator.wikimedia.org/T147957#2710120 (jcrespo) a:jcrespo
[19:42:08] DBA, CirrusSearch, Discovery, Discovery-Search (Current work), and 3 others: MySQL chooses poor query plan for link counting query - https://phabricator.wikimedia.org/T143932#2583671 (jcrespo) a:jcrespo
[22:25:21] jynus: l10n_cache shouldn't be used in production
[22:25:32] In fact, it's probably worth clearing out the table
[22:25:39] I see enwiki has many rows
[22:30:18] DBA: truncate l10n_cache table on WMF wikis - https://phabricator.wikimedia.org/T150306#2781385 (Reedy)
[22:52:56] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#2781458 (jcrespo)
[22:52:58] DBA: truncate l10n_cache table on WMF wikis - https://phabricator.wikimedia.org/T150306#2781457 (jcrespo)
[22:53:25] DBA: truncate l10n_cache table on WMF wikis - https://phabricator.wikimedia.org/T150306#2781385 (jcrespo) I know this table doesn't have to be dropped, only dropped virtually (truncated).
[22:53:38] WFM :)
[23:22:46] DBA, Community-Tech, Patch-For-Review, WMF-deploy-2016-11-01_(1.29.0-wmf.1), WMF-deploy-2016-11-08_(1.29.0-wmf.2): Create a maintenance script for populating the local_user_id and global_user_id fields in the centralauth localuser table - https://phabricator.wikimedia.org/T142503#2781510 (Dann...