[07:04:54] DBA, Patch-For-Review: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#3063687 (Marostegui) db2034 is done: ``` Table: revision Create Table: CREATE TABLE `revision` ( `rev_id` int(8) unsigned NOT NULL AUTO_INCREMENT, `rev_pag...
[07:31:24] DBA: Investigate db1047 replication lag - https://phabricator.wikimedia.org/T159266#3063702 (Marostegui) So, by looking at the binlogs I have seen that all activity related to pt-table-checksum finished at this time: ``` db1047-bin.005028 #170228 17:13:33 ``` As I said yesterday, BBU looked good. Although...
[07:41:02] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3063704 (Marostegui) I have resumed the run on eowiki: ``` Resuming from eowiki.revision chunk 932, timestamp 2017-02-28 17:13:33 Checksumming eowiki.revision: 3%...
[07:45:05] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3063707 (Marostegui) The misc backups were generated: ``` root@dbstore1001:/srv/backups# ls -lh total 39G -rw-r--r-- 1 root root 16G Mar 1 01:59 m1-20170301010001.sql.gz -rw-r--r-- 1 root root 1...
[07:55:27] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3063711 (Marostegui) Looks like the job is running (with lots of other jobs) or at least bacula reports so : ``` Running Jobs: Console connected at 01-Mar-17 07:48 48872 Full dbstore1001.eqia...
[08:13:52] DBA, MediaWiki-User-blocking, Community-Tech-Sprint: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3063737 (Legoktm) Shouldn't ipc_hex be NOT NULL? Also, are we going to have an identical table for archive?
[08:40:38] DBA, MediaWiki-User-blocking, Community-Tech-Sprint: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3063773 (jcrespo) @MusikAnimal Table design it 100% dependent of the queries that are going to be done, please add a...
[08:48:26] DBA, Community-Tech, MediaWiki-Categories, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3063820 (thiemowmde) > I'm worried about bloating an already huge table […] I wonder why you are phrasing this as a response to what I wrote...
[08:59:36] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3063864 (Marostegui) I am doubtful whether it is going to run sometime. If we look at the past events: ``` 01-Feb 23:31 dbstore1001.eqiad.wmnet-fd JobId 46761: shell command: run ClientRunBeforeJ...
[09:05:42] DBA, Community-Tech, MediaWiki-Categories, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3063867 (Aklapper) [ Though people might disagree on approaches, could everyone please be respectful and assume people mean well? Thanks a lo...
[09:13:10] Blocked-on-schema-change, DBA, Community-Tech, Stewards-and-global-tools (Temporary-UserRights): Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605#3063873 (Marostegui) The errors stopped right after I stopped the ALTERs and there has been no more. So I am going t...
[09:17:45] DBA, Community-Tech, MediaWiki-Categories, Patch-For-Review: Increase size of categorylinks.cl_collation column - https://phabricator.wikimedia.org/T158724#3063877 (jcrespo) If we are doing things "right", hashes are a bad idea for primary keys (foreign keys of this table). They can have collisio...
[09:48:52] DBA, Patch-For-Review: Install and reimage dbstore1001 as jessie - https://phabricator.wikimedia.org/T153768#3063951 (Marostegui) Looking at other jobs, looks like they are quite delayed too: ``` -- Scheduled time: 26-Feb-2017 04:05:02 Start time: 28-Feb-2017 12:27:04 -- Schedu...
[10:00:31] leave it as it is marostegui :-), no need to keep it monitoring until it fails
[10:01:03] the backups?
[10:01:26] yeah :-)
[10:01:36] do we get a notification?
[10:01:53] I was trying to understand why it is not running, is it usual to get it that delayed?
[10:01:57] yes
[10:02:08] specially lately
[10:02:56] Ah ok ok. Then I will not force the backups to run, and let it either fail or run :)
[10:03:12] wait a couple of day, then get worried
[10:03:35] last time was because we run out of time
[10:03:43] *ran
[10:03:58] Ah, couple of days. sounds "good" as in that is normally what the other jobs are getting delayed :)
[10:09:46] do you have handy a non-special slave with good indexes?
[10:09:58] good = the last ones?
[10:10:13] as in, indexes as they should be
[10:10:39] rc slave = db2034 which is the indexes we agreed on for those slaves
[10:11:06] if you want a non rc slave: db2070
[10:11:12] (those are enwiki, both of them)
[10:14:40] I am checking this: https://phabricator.wikimedia.org/T159319#3064012
[10:15:32] oh wow, the first one
[10:42:53] FYI, I'm upgrading kernel on trusty DB servers (as usual no reboots)
[10:43:06] ok, thanks for the heads up!
[11:50:05] DBA: Investigate db1047 replication lag - https://phabricator.wikimedia.org/T159266#3064167 (Marostegui) I have been doing more pt-table-checksum runs this morning and I have stopped now as I am going for lunch. There are no pending transaction executions on db1047 so, I would discard pt-table-checksum for n...
[12:13:20] DBA: Investigate db1047 replication lag - https://phabricator.wikimedia.org/T159266#3064209 (jcrespo) Open>Resolved a:jcrespo Let's close it, I only opened as a a reminder if it continued the following day.
[13:28:01] jynus: what do you think about notice sent today for maint next wed for https://phabricator.wikimedia.org/T157359#3050755?
[13:28:19] iiuc it's more or less you guys doing the maint and we are just standing around being really good looking
[13:28:25] and offering moral support
[13:29:16] wed+thur looks ok to me, but we should ask manuel, too
[13:29:37] I would do it wed
[13:29:40] kk
[13:29:45] I'll make sure the annoucnement is sent today
[13:29:46] (so we have more days left just in case)
[13:30:04] marostegui, I would start it on wed- I am not sure I would finish then :-)
[13:30:08] or try to I don't seem to have the labs-announce list password but I'll figure it otu :)
[13:30:16] yeah it's noted as taking a few days
[13:30:34] jynus: even better reason to start wed :p
[13:30:42] should we say 48 hours?
[13:30:49] or 72 hours
[13:31:28] let's stick to 48, but starting 8 UTC
[13:31:44] it should be just a few hours
[13:31:53] k
[13:31:58] it is stop, copy, reimage, copy back, right?
[13:32:01] copy-reimage-copy twice
[13:32:33] the problem is we are not that much in contact with who and how it is used, so things could happen
[13:32:37] we also need to include cross fingers somewhere there
[13:32:42] e.g. we knew mariadb 10 worked
[13:32:53] jynus: yeah someone will show up angry but we don't have a ton of choice here
[13:33:01] we are not sure about postgres 9.x
[13:33:09] whatever is jessies version
[13:33:16] hm
[13:33:18] right
[13:33:18] so I wanted extra time
[13:33:24] it is aronud 1T data to copy over
[13:33:29] not that it doesn't work
[13:33:35] it is used in production chasemp
[13:33:46] maps in prod uses postgres?
[13:33:47] but in case something inexpected happens
[13:33:54] yeah, and other stuff
[13:34:17] I think puppet :D
[13:34:29] what I meant is that mariadb 10 had been tested for years on labs
[13:34:37] and despite it, some complained
[13:34:40] yeah I'm with ya
[13:35:11] also migration is not as streamlined as mysql
[13:35:22] worse case scenario, we have to export and reimport
[13:35:47] that is why I want that extra time, even if we don't use it
[13:36:03] or random puppet stuff that is not compatible with jessie
[13:36:46] we should abuse discovery, in case they have some tips about osm on jessie
[13:38:47] understood, thanks jynus
[13:43:30] jynus: marostegui spot check my announcement please https://etherpad.wikimedia.org/p/labsdb10067
[13:44:20] we should call the servers by the name know by the users
[13:44:28] do you know if you have a dns or alias?
[13:44:45] like toolsdb or enwiki.sql?
[13:44:45] I believe they connect to them directly by name
[13:44:49] ah, good
[13:44:52] I don't /think/ so
[13:45:13] changing ot fqdn in case
[13:45:39] let's then just add that they are the PostgresQL ("OSM") servers
[13:45:48] ok
[13:46:03] the text disappeared
[13:46:10] ah :-)
[13:46:15] https://etherpad.wikimedia.org/p/labsdb10067
[13:46:24] copy/paste gave me fits in ff
[13:47:06] date fix noted
[13:47:08] iso date, I do know not what data you had written
[13:47:10] fixing subject name too
[13:47:13] should we also mention that we will update postgre?
[13:47:21] americans and their silly dates
[13:47:23] no point
[13:47:32] it is not like there is an alternative
[13:47:56] we can add an "upgrade client libaries" if you need, though
[13:47:56] ok!
[13:48:14] but that can wait until the post-maintenance
[13:48:17] can you add that if you want? not sure how to word
[13:48:18] yeah
[13:48:21] I'm ok as-is
[13:48:56] sure!
[13:49:00] let me change the wording of the window
[13:49:03] ok
[13:49:04] 48 of window
[13:49:21] I want to avoid "do you really need so much time down?"
[13:49:22] comment
[13:49:35] XDD
[13:49:52] it's like you've been here before
[13:51:37] do you mind?
[13:51:38] Thank you,
[13:51:39] The Labs & DBA teams
[13:52:37] the rest is form, which I do not care so much
[13:52:50] I would apologize more than say thanks
[13:52:57] but it is not like it is that important
[13:53:45] it's an implicit thanks for their patience :D
[13:53:53] and grace in understanding operational realities
[13:53:59] ok off I go
[13:56:31] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3064388 (chasemp) ```We need to take labsdb1006.eqiad.wmnet and labsdb1007.eqiad.wmnet offline to update them from Ubuntu Precise to Debian Jessie on 2017-03-08. This...
[13:56:44] DBA, Labs, Labs-Infrastructure, Operations: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359#3064389 (chasemp)
[14:11:51] Blocked-on-schema-change, DBA, Community-Tech, Stewards-and-global-tools (Temporary-UserRights): Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605#3064413 (Marostegui) All the wikis in s3 have been finished. I believe all the shards are now done. I am going to re...
[14:19:35] DBA, Patch-For-Review: Deploy gtid_domain_id flag in our mysql hosts - https://phabricator.wikimedia.org/T149418#3064434 (JAllemandou) Thanks for the answer @Marostegui . Slow data is better than no data :) I'll keep listening to this task for news and ping once in a while.
[15:17:25] DBA, Release-Engineering-Team: Missing / Dropped databases? - https://phabricator.wikimedia.org/T132838#2211904 (Marostegui) We had this same issue with the dbstore servers while working on: T155605
[15:40:33] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3064637 (Huji) @Umherirrender may I asked what workaround you are talking about? The BloomCache workaround is not possib...
[15:53:56] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3064669 (Huji) @jcrespo can you point me out to the part of the code that skips showing the edit log when the user is an...
[16:32:04] Blocked-on-schema-change, DBA, Community-Tech, Stewards-and-global-tools (Temporary-UserRights): Schema changes for expiring user groups - https://phabricator.wikimedia.org/T155605#3064805 (Marostegui) So I have done the following checks. Every shard master + a slave in codfw. Checked all the wik...
[16:33:15] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3064819 (jcrespo) @Huji Are you sure you are asking the right person? I am not a developer, and I have never done a sing...
[17:35:15] DBA, Patch-For-Review: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038 - https://phabricator.wikimedia.org/T154485#3065009 (Marostegui) Wikis tested today: eowiki - no differences fiwiki - no differences idwiki - stopped on the page table as it is pretty much the end of the day...
[18:25:40] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3065289 (Huji) @jcrespo I asked you because of your comment on https://gerrit.wikimedia.org/r/#/c/139103/ PS7 where you...
[18:40:07] DBA, MediaWiki-User-blocking, Community-Tech-Sprint: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3065313 (Bawolff) > Why 255 bytes for the ip address? If that represents a number, ipv4 and ipv6 have, respectively,...
[18:47:00] DBA, MediaWiki-Database, MediaWiki-Logging, Performance-Team, and 2 others: Logging needs an index to optimize searching by log_title - https://phabricator.wikimedia.org/T68961#3065334 (jcrespo) @Huji Sadly, I knew what happened, but for 3rd party recount, I wasn't part of the people implemented...
[18:57:46] DBA, MediaWiki-User-blocking, Community-Tech-Sprint: Do test queries for range contributions to gauge performance of using different tables - https://phabricator.wikimedia.org/T156318#3065355 (jcrespo) > Would you suggest doing something like 2 unsigned ints (for 128 bits), or as small a varchar as p...
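Editor's note on the T156318 range-contributions thread (ipc_hex, "Why 255 bytes for the ip address?"). An IPv4 address is 32 bits and an IPv6 address is 128 bits. A zero-padded hex encoding therefore needs at most 8 or 32 characters, and the packed form needs 4 or 16 bytes. The Python sketch below, which uses only the standard ipaddress module, checks those sizes and shows range lookups on such a column. It is a sketch under stated assumptions. The helper names (storage_sizes, to_hex, range_bounds, split_64) are hypothetical. The uppercase, unprefixed hex format is an illustration and not necessarily what MediaWiki stores in ipc_hex.

```python
from __future__ import annotations

import ipaddress

# Hypothetical helpers for illustration only; the hex format below is an
# assumption, not necessarily the encoding MediaWiki stores in ipc_hex.


def storage_sizes(addr: str) -> tuple[int, int, int]:
    """Return (bits, packed bytes, hex characters) needed for one address."""
    ip = ipaddress.ip_address(addr)
    packed = ip.packed  # network-order bytes: 4 for IPv4, 16 for IPv6
    return ip.max_prefixlen, len(packed), len(packed.hex())


def to_hex(addr: str) -> str:
    """Fixed-width, zero-padded uppercase hex (8 chars IPv4, 32 chars IPv6)."""
    return ipaddress.ip_address(addr).packed.hex().upper()


def range_bounds(cidr: str) -> tuple[str, str]:
    """Lowest and highest hex value in a CIDR block, usable as BETWEEN bounds."""
    net = ipaddress.ip_network(cidr, strict=False)
    return to_hex(str(net.network_address)), to_hex(str(net.broadcast_address))


def split_64(addr: str) -> tuple[int, int]:
    """Split an IPv6 address into two 64-bit halves (e.g. two BIGINT UNSIGNED)."""
    value = int(ipaddress.IPv6Address(addr))
    return value >> 64, value & ((1 << 64) - 1)


if __name__ == "__main__":
    for a in ("192.0.2.1", "2001:db8::1"):
        print(a, storage_sizes(a), to_hex(a))
    print(range_bounds("192.0.2.0/24"))
    print([hex(h) for h in split_64("2001:db8::1")])
```

Zero-padded, fixed-width hex keeps string order equal to numeric order within one address family. That property lets a CIDR range become a single BETWEEN on an indexed string column. If IPv4 and IPv6 values share one column, they need a family prefix or separate handling so the two families do not interleave. The split_64 variant matches the "2 unsigned ints (for 128 bits)" idea only if each half is a 64-bit BIGINT UNSIGNED, because a MySQL INT UNSIGNED holds only 32 bits.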