[07:21:34] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820100 (10Marostegui) The following tables have been renamed as they are in the list of ignored t...
[07:37:14] <wikibugs_>	 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2820103 (10Marostegui) Papaul has swapped the PSU between each other, so I am trying to crash it again. If this is not successful we will try to replace both PSUs with other ones.
[08:10:50] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820152 (10Marostegui) I just tested what happens with a master with the old timestamp format and...
[08:14:51] <wikibugs_>	 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2820171 (10Marostegui) And the server died again after 37 minutes (yesterday when it was plugged to the other PDU it took a lot longer to crash and it only crashed on the second attempt after 1:20h...
[08:34:35] <wikibugs>	 10DBA, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Implement a frontend failover solution for labsdb replicas - https://phabricator.wikimedia.org/T141097#2820195 (10jcrespo) > How is the haproxy layer failed over (between nodes) in prod atm? LVS or ucarp/VRRP or ?  There is no redundancy at the mome...
[08:42:04] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820213 (10Marostegui) I have audited all the tables and columns appearing on the triggers below a...
[08:52:07] <marostegui>	 jynus: Can I run an alter table on db1092? Are you done with it? (it is depooled -S5)
[08:52:19] <wikibugs_>	 10DBA, 10MediaWiki-extensions-WikibaseRepository, 10Wikidata: Wikibase\Repo\Store\Sql\SqlEntityIdPager::fetchIds query slow - https://phabricator.wikimedia.org/T151356#2820253 (10jcrespo) I am ok with that, but on my version, it doesn't create a temporary table (maybe you were mixing 2 different executions?)...
[08:57:31] <jynus>	 marostegui, yes, I am finished
[08:57:39] <marostegui>	 thanks!
[09:03:10] <wikibugs_>	 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2820273 (10Marostegui) Running ALTER table on db1092 which is currently depooled due to: T151272
[09:06:17] <jynus>	 what is the plan with db1095?
[09:06:49] <marostegui>	 jynus: I wanted to let it replicate for a few more hours and after that, I would say we are ready to copy the content to a labs box
[09:06:50] <jynus>	 (aside from leaving it there for a while)
[09:07:10] <jynus>	 so you want to copy that already, without compression?
[09:07:16] <jynus>	 or does it have compression?
[09:07:23] <marostegui>	 no, it doesn't
[09:07:40] <marostegui>	 at this point we need to decide if we want more than 1 shard
[09:07:43] <jynus>	 I was asking because I do not know if it is better to compress
[09:07:44] <marostegui>	 or stick to 1 shard
[09:08:02] <marostegui>	 Let me check one thing
[09:08:05] <jynus>	 or to load the things from dbstore
[09:08:29] <jynus>	 we only have to do one shard, but we need to do all shards eventually
[09:08:57] <jynus>	 also, do not move the tables on sanitarium2
[09:09:07] <jynus>	 just delete them
[09:09:16] <jynus>	 no need to be careful there
[09:09:24] <marostegui>	 sure
[09:10:03] <jynus>	 if something breaks it will break equally deleting and moving, and we can reimport individual tables from dbstore1001
[09:10:13] <marostegui>	 ok, I left a file logging the lag of dbstore2001 (and 2002) every 2 minutes for the last hours and I see no big deal (maybe 40 seconds of lag or stuff like that)
[09:10:16] <marostegui>	 so compression looks good
[09:10:32] <jynus>	 the problem is importing s3
[09:10:43] <jynus>	 lots of work there
[09:10:48] <marostegui>	 well, what we can do is
[09:10:55] <marostegui>	 1) import s3 with a normal netcat
[09:11:01] <marostegui>	 2) run the script (will take ages)
[09:11:11] <marostegui>	 3) import the rest of shards moving the tablespaces, less work
[09:11:18] <marostegui>	 no?
[09:11:39] <jynus>	 are those xors?
[09:11:49] <marostegui>	 haha
[09:11:49] <jynus>	 or steps
[09:12:04] <jynus>	 i do not understand
[09:12:06] <marostegui>	 I think it is easier to import s3 in a normal way
[09:12:12] <marostegui>	 (netcat)
[09:12:19] <marostegui>	 rather than importing the ibd files
[09:12:23] <jynus>	 so we discard what we have done with s1?
[09:12:35] <jynus>	 that is why I was asking
[09:12:53] <marostegui>	 ah, well, setting all this up doesn't take long now that we know how to do it
[09:12:58] <marostegui>	 I can have S3 and S1 done in one day
[09:13:25] <jynus>	 well, but not populating labs if it is going to be reimported
[09:14:22] <marostegui>	 yes, that might take another day
[09:14:30] <marostegui>	 to copy the content to the labs servers
[09:14:50] <jynus>	 maybe labs can be done in a single copy?
[09:15:04] <jynus>	 to avoid overhead and errors?
[09:15:14] <marostegui>	 yes, once we have db1095 with both shards, we can just use netcat
[09:15:17] <marostegui>	 and copy all the stuff
[09:15:28] <jynus>	 that is why I was asking, coming up with a plan
[09:15:37] <jynus>	 to not work more than necessary
[09:15:42] <marostegui>	 ok, let me write one in the ticket and we can discuss it there?
[09:15:48] <jynus>	 ok
[09:15:54] <marostegui>	 going to do it now
[09:15:55] <jynus>	 then there is the rest of the shards
[09:15:58] <marostegui>	 in the meta ticket?
[09:16:00] <jynus>	 that can wait
[09:16:03] <marostegui>	 sure
[09:16:09] <marostegui>	 so we want S3 and S1 at this point
[09:16:23] <jynus>	 but we may have to use sanitarium
[09:16:33] <jynus>	 I am thinking of all shards, after december
[09:16:39] <jynus>	 not to be done now
[09:16:47] <jynus>	 but they have to be eventually done
[09:16:51] <marostegui>	 yeah, that can be done without problems as we can import them easily
[09:16:58] <marostegui>	 the problem is just S3 :)
[09:17:03] <jynus>	 so not to create ourselves problems
[09:17:08] <jynus>	 after the goal
[09:17:28] <marostegui>	 yes agreed
[09:17:35] <jynus>	 I am thinking of that
[09:17:44] <jynus>	 not because it has to be done in a hurry
[09:17:58] <jynus>	 but to make our life easier
[09:18:21] <marostegui>	 well, the goal says 1 shard, and that shard would need to be s3 to avoid problems in the future, but we can include S1 easily
[09:18:25] <marostegui>	 at this point I mean
[09:18:34] <jynus>	 for example
[09:18:43] <jynus>	 if we import, import already compressed
[09:18:49] <marostegui>	 yeah
[09:18:55] <jynus>	 so we do not have to work more later
[09:19:01] <marostegui>	 agreed
[09:19:16] <jynus>	 note we have until 31 december
[09:19:31] <jynus>	 so we do not have to have it next week
[09:19:35] <marostegui>	 sure
[09:19:44] <marostegui>	 But I would like to have S3 and S1 by next week up and running
[09:19:50] <marostegui>	 actually by the first days of the week
[09:19:58] <marostegui>	 on db1095 at least
[09:20:05] <jynus>	 db1095 maybe
[09:20:14] <jynus>	 labsdb may be too optimistic
[09:20:34] <marostegui>	 ok, let me write the plan in the meta ticket
[09:20:36] <marostegui>	 so we can discuss
[09:20:39] <jynus>	 and leaving some days to check for problems, as you intended
[09:20:42] <jynus>	 it was a good idea
[09:20:43] <marostegui>	 yep
[09:20:49] <marostegui>	 I never trust replication :p
[09:21:49] <jynus>	 especially not when combined with arbitrary sanitization and replication filtering
[09:22:12] <marostegui>	 exactly
[09:26:54] <wikibugs_>	 10DBA, 13Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2820295 (10Marostegui) And it crashed again after 57 minutes.   @Papaul let's try what we agreed on yesterday - swap both PSUs with different ones.
[09:32:53] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820298 (10jcrespo) db1095 is ignoring 'heartbeat.%', and we need that for replication lag control...
[09:34:42] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820311 (10Marostegui) Yes, you are right. It was ignored on the previous tests because ROW based...
[09:36:44] <wikibugs_>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2820313 (10Marostegui) The tests on T150960 are looking good so we'd need to discuss the next...
[09:41:42] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2820324 (10jcrespo) Looks good. s3 will need lots of checking. Maybe I could work on an heuris...
[09:44:05] <wikibugs_>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2820338 (10Marostegui) >>! In T150802#2820324, @jcrespo wrote: > Looks good. s3 will need lots...
[09:48:03] <wikibugs_>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820365 (10Marostegui) heartbeat is no longer being ignored.
[09:55:20] <wikibugs_>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820400 (10jcrespo) Why was it being ignored, puppet? If yes, we need to change it there.
[09:55:50] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2820401 (10Marostegui) No, me manually after importing S1.
[10:30:53] <wikibugs>	 10DBA: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2820496 (10Marostegui) db1092 is done:  ``` MariaDB PRODUCTION s5 localhost dewiki > show create table revision\G *************************** 1. row ***************************        Table: revision Create Table: CREATE TABLE `revis...
[10:31:25] <wikibugs_>	 10DBA, 06Labs, 10Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2820497 (10jcrespo) I have been testing the roles, they work as advertised:  ``` mysql -u u2...
[10:46:21] <wikibugs_>	 10DBA, 06Labs, 10Labs-Infrastructure: Migrate existing labs users from the old servers, if possible using roles and start maintaining users on the new database servers, too - https://phabricator.wikimedia.org/T149933#2820524 (10jcrespo) @chasemp @yuvipanda @Marostegui With the above, the changes to permissio...
[10:47:01] <jynus>	 ^roles
[10:49:35] <wikibugs_>	 10DBA, 06Operations: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#2820534 (10Peachey88)
[10:52:37] <wikibugs>	 10DBA, 06Operations: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#2819072 (10jcrespo) There is no /srv partition on silver. probably it should check /a instead of / ?
[10:56:49] <wikibugs>	 10DBA, 06Operations: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#2820552 (10jcrespo) The MariaDB disk space check is a legacy of the past- there should be only one disk check, and the critical (and warning) level should be higher for database...
[11:11:46] <wikibugs>	 10DBA, 06Operations: Icinga MariaDB disk space check on silver checks the wrong partition - https://phabricator.wikimedia.org/T151491#2820601 (10Volans) @jcrespo see T151489, there is a `/srv` mount point as well as `/a` mount point, they both mount the same partition!
[11:21:47] <wikibugs_>	 10DBA: Meta ticket: Deploy InnoDB compression where possible - https://phabricator.wikimedia.org/T150438#2820626 (10Marostegui) p:05Triage>03Low
[11:43:10] <wikibugs_>	 10DBA, 06Operations, 13Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2820680 (10jcrespo) p:05High>03Normal
[11:43:51] <wikibugs>	 10DBA, 07Availability: Look into Maria 10 parallel-replication - https://phabricator.wikimedia.org/T85266#2820681 (10jcrespo) p:05Triage>03Low
[11:56:05] <wikibugs>	 10DBA, 06Operations, 13Patch-For-Review: Firewall configurations for database hosts - https://phabricator.wikimedia.org/T104699#2820707 (10jcrespo) p:05Triage>03Normal
[12:02:09] <marostegui>	 jynus: slave_exec_mode               | STRICT on db1095 so ROW based is actually working otherwise we'd have seen it
[12:02:56] <marostegui>	 lunch
[13:12:49] <wikibugs_>	 10DBA, 06Labs, 10Labs-Infrastructure: Provision with data the new labsdb servers and provide replica service with at least 1 shard from a sanitized copy from production - https://phabricator.wikimedia.org/T147052#2820860 (10Marostegui)
[13:12:51] <wikibugs>	 10DBA: Fix dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T130128#2820861 (10Marostegui)
[13:12:54] <wikibugs_>	 10DBA, 13Patch-For-Review: Reimage dbstore2001 as jessie - https://phabricator.wikimedia.org/T146261#2820858 (10Marostegui) 05Open>03Resolved This is done - I will create a different task to import all the shards to dbstore2001 and dbstore2002.
[13:21:42] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Provision db1095 with at least 1 shard, sanitize and test slave-side triggers - https://phabricator.wikimedia.org/T150802#2820867 (10Marostegui) Candidates masters for db1095:  S3: Depooled servers: db1035 db1044  S1...
[13:31:18] <wikibugs_>	 10DBA: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2820882 (10Marostegui)
[13:31:51] <wikibugs>	 10DBA: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2820882 (10Marostegui)
[14:02:15] <wikibugs_>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2821001 (10Marostegui) So far so good, replication is going fine and I can see new records being i...
[14:09:20] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2821109 (10jcrespo) I believe you, I am just unsure //why// it works. BTW, I've seen replication w...
[14:19:30] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2821160 (10Marostegui) >>! In T150960#2821109, @jcrespo wrote: > I believe you, I am just unsure /...
[14:26:54] <wikibugs>	 10DBA, 10Datasets-General-or-Unknown, 06Labs, 10Labs-Infrastructure, 13Patch-For-Review: Initial data tests for db1095 (temporary db1069 - sanitarium replacement) - https://phabricator.wikimedia.org/T150960#2821176 (10jcrespo) >>! In T150960#2821160, @Marostegui wrote: > We can do that, but how can we fi...
[14:29:18] <wikibugs_>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2821195 (10daniel) I think phab behavior is misleading at best. I commented to that effect on the upstream ticket.
[14:29:50] <wikibugs>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2821197 (10daniel) @Marostegui any chance of recovering the lost titles and descriptions?
[14:29:55] <marostegui>	 jynus: you have the pt-table-checksum handy?
[14:30:14] <marostegui>	 like, the options
[14:30:35] <jynus>	 yes
[14:30:43] <jynus>	 but I can do that, do not worry
[14:30:51] <marostegui>	 ah good :)
[14:31:09] <jynus>	 remember that we need to patch pt- to work for our environment
[14:31:17] <marostegui>	 true
[14:31:25] <jynus>	 I think the patch went through
[14:31:38] <jynus>	 but I doubt it is already on debian, as it was not a security issue
[14:32:46] <marostegui>	 https://phabricator.wikimedia.org/T151228#2821197 -> you have any advice on that? not now, whenever you have time to read and teach me :)
[14:33:12] <jynus>	 how busy are you today?
[14:33:24] <jynus>	 I am not much because of the freeze
[14:33:28] <marostegui>	 same here
[14:33:34] <marostegui>	 maybe we can discuss it now?
[14:33:40] <jynus>	 maybe we can talk about this and the mariadb package
[14:33:47] <jynus>	 should take little time
[14:33:53] <marostegui>	 sounds good
[17:52:40] <wikibugs>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2821602 (10jcrespo) I was working with Manuel on this, and he recovered a copy of the database from 2016-10-05 01:00:02, previous one exist. In order to help with this issue, I...
[18:24:29] <wikibugs>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2811476 (10epriestley) If it's easier, this data can also be recovered completely from the live `phabricator_calendar.calendar_eventtransaction` table, which stores the old and...
[18:50:38] <wikibugs_>	 10DBA: Use tls for dump backup generation - https://phabricator.wikimedia.org/T151583#2821866 (10jcrespo)
[22:50:12] <wikibugs>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2822184 (10mmodell) @jcrespo & @Marostegui:  The calendar event phid for E66 is `PHID-CEVT-sggwinpwrtfsppo7pbqd` and the timestamp for the edit is `1479751874`.  All of the aff...
[23:14:12] <wikibugs_>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2822198 (10mmodell) This query selects the phid of the event instance, the oldValue and the newValue for the events in question:  ```lang=mysql      SELECT ev.phid, tr.old...
[23:23:38] <wikibugs_>	 10DBA, 10Phabricator, 07Upstream: Editing a recurring event overrides all past instances - https://phabricator.wikimedia.org/T151228#2822216 (10mmodell) this is the update query which I am not confident enough to run on live database:  ```lang=mysql     UPDATE calendar_event ev INNER JOIN calendar_eventtrans...