[05:12:42] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui) `wb_terms` is finished and it is all clean.
[05:30:54] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui) The following tables are empty so no need to check:...
[05:31:25] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui)
[05:54:05] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui)
[07:43:20] do you know if s5 codfw hosts are catching up soon (aka are they in the middle, final steps of the import?)
[07:43:46] the import is finished, they are catching up
[07:44:09] on the replicas too?
[07:44:18] yep
[07:44:23] nice, thank you
[08:11:43] T207253: I'm starting to check how to use compare.py (I'll run it on a smaller wiki, check the output, get familiar with the tool, and then start to work on this seriously)
[08:11:44] T207253: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253
[08:11:56] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui) Final round of tables that have been checked and ar...
[08:12:17] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui)
[08:19:52] DBA, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Marostegui) a:Banyek As per our last chat, assigning it to Balazs as he'll start with it
[08:27:06] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui) Open>Resolved a:Marostegui>jcrespo...
[08:41:49] jynus: should I repool db1092 and db1087?
[08:43:08] if they are ok from your point of view, yes
[08:43:15] but I need to enable notifications first
[08:43:20] I can do that too
[08:43:23] no worries
[08:43:37] I was asking if you are planning to touch them for some reason today
[08:44:57] no, I said yesterday I was done with them
[08:45:21] maybe they should be restarted
[08:45:30] for upgrade and sane memory state
[08:45:33] sounds sane!
[08:45:36] before repool
[08:45:46] let me know how I can help
[08:45:51] No worries, I can do it
[08:45:53] just: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/469388/
[08:45:57] :)
[08:46:46] what about drops?
[08:47:06] drops?
[08:47:07] I guess I can wait until s5 is checked
[08:47:10] ah
[08:47:12] yeah
[08:49:44] can you reassign to me or put back into backlog all tickets you have assigned currently before tomorrow?
[08:49:55] yeah
[08:50:01] or stall the ones you really want to do
[08:50:03] I was planning to do so, as soon as I finish my checks
[08:50:23] I will do a wrap up on my tickets after lunch :)
[08:50:53] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek)
[09:09:55] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) In the very first iteration I create a 'config' file which contains a list of sections, dbs, tables, hosts, and a wrapper around it which will start the co...
[09:10:54] The first wrapper will be a simple text-based table with positional data, but I think the best wrapper would be a python one, which could parse a yaml config
[09:11:28] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Marostegui) You might also want to run a first iteration over the hosts and tables you select to make sure the tables are indeed the same from the first day, in ord...
[09:29:14] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Pigsonthewing) Resolved>Open The issue with https://www...
[09:35:42] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Marostegui) I guess it is because what @Addshore described at T...
[09:36:00] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) The first dataset to check will be: ```s1 enwiki user user_id db1067 db2048 s2 bgwiki user user_id db1066 db2035 s3 bmwiki user user_id db1075 db2043 s3 j...
[09:42:51] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Marostegui) According to sizes For s2 maybe you want to check: ``` enwiktionary itwiki ptwiki ``` s3: ``` mediawikiwiki enwikibooks frwikisource ``` s5: ``` dewiki...
[09:44:33] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) Ok, I'll use those!
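(Editor's note) The "simple text based table with positional data" mentioned at 09:10:54, and the dataset posted at 09:36:00, could be read by a small Python helper along the following lines. This is only a sketch under assumptions: the meaning of the six columns (section, database, table, key column, first host, second host) is inferred from the sample rows, and `CompareJob` and `parse_jobs` are hypothetical names. The snippet deliberately does not invoke compare.py, since its command-line interface is not shown in this log.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompareJob:
    """One row of the positional config (column meanings are assumed)."""
    section: str
    database: str
    table: str
    key_column: str
    host_a: str
    host_b: str


def parse_jobs(text: str) -> List[CompareJob]:
    """Parse whitespace-separated rows; blank lines and '#' comments are skipped."""
    jobs = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 6:
            raise ValueError(
                f"line {lineno}: expected 6 fields, got {len(fields)}"
            )
        jobs.append(CompareJob(*fields))
    return jobs


# The three complete rows quoted in the 09:36:00 message.
DATASET = """\
s1 enwiki user user_id db1067 db2048
s2 bgwiki user user_id db1066 db2035
s3 bmwiki user user_id db1075 db2043
"""

if __name__ == "__main__":
    for job in parse_jobs(DATASET):
        print(f"{job.section}: {job.database}.{job.table} "
              f"by {job.key_column} on {job.host_a} vs {job.host_b}")
```

Keeping the parsing separate from whatever runs the comparison means the same job list could later be produced from a YAML file instead, as suggested in the chat, without touching the rest of the wrapper.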
[09:55:51] DBA, Patch-For-Review: Drop ct_ indexes on change_tag - https://phabricator.wikimedia.org/T205913 (Marostegui) Open>stalled
[09:55:58] DBA, Datasets-General-or-Unknown, Patch-For-Review, Release-Engineering-Team (Watching / External), and 2 others: Automate the check and fix of object, schema and data drifts between mediawiki HEAD, production masters and slaves - https://phabricator.wikimedia.org/T104459 (Marostegui)
[09:56:01] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117 (Marostegui) Open>stalled
[09:56:03] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (Marostegui)
[09:56:06] DBA, Schema-change, Tracking: [DO NOT USE] Schema changes for Wikimedia wikis (tracking) [superseded by #Blocked-on-schema-change] - https://phabricator.wikimedia.org/T51188 (Marostegui)
[09:56:08] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) Open>stalled
[09:56:13] DBA, Schema-change, Tracking: [DO NOT USE] Schema changes for Wikimedia wikis (tracking) [superseded by #Blocked-on-schema-change] - https://phabricator.wikimedia.org/T51188 (Marostegui)
[09:56:16] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) stalled>Open
[09:56:20] DBA, Schema-change, Tracking: [DO NOT USE] Schema changes for Wikimedia wikis (tracking) [superseded by #Blocked-on-schema-change] - https://phabricator.wikimedia.org/T51188 (Marostegui)
[09:56:22] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_cur_time on wmf databases - https://phabricator.wikimedia.org/T67448 (Marostegui) Open>stalled
[09:56:29] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Dropping rc_moved_to_title/rc_moved_to_ns on wmf databases - https://phabricator.wikimedia.org/T51191 (Marostegui) Open>stalled
[10:05:15] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (jcrespo) @Addshore I thought you had communicated to wikidata u...
[10:31:01] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) I think this could be a good format/data YAML to start with: ```s1: primary: db1067 compare: db2048 databases: enwiki: tables:...
[10:59:06] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (jcrespo) @Banyek Hardcoding the masters in configurations seems to me like a bad idea- they are already defined redundantly 4 times on mediawiki, on puppet, on tend...
[11:01:23] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Marostegui) I wouldn't use tables_to_check.txt for now (not entirely, just a few of those, like `revision` or `user`). So the checks don't take too long and we can...
[11:02:34] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (jcrespo) > I wouldn't use tables_to_check.txt for now I wouldn't either, just wanted him to have a look at it and not duplicate the master definition.
[11:13:06] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) @jcrespo You are right indeed, but the current data structure is good in the sense that it will be no real problem to remove the 'primary' and maybe the...
[11:17:03] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (jcrespo) To not just be a pain, this is how you can discover the master for a particular section automatically: root@db1115.eqiad.wmnet[zarcillo]> SELECT instance...
[11:18:47] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) Thank you!
[11:25:15] I'll go and eat something
[12:50:19] DBA, Operations, cloud-services-team, wikitech.wikimedia.org, and 2 others: Move some wikis to s5 - https://phabricator.wikimedia.org/T184805 (Marostegui) a:Marostegui>jcrespo I have checked `revision` and `user` table for all the wikis in s5: ``` dewiki cebwiki shwiki srwiki mgwiktionar...
[13:57:10] DBA, Operations: Populate the wikishared db on all dbstores - https://phabricator.wikimedia.org/T126252 (Marostegui) I am not sure whether this still applies or not, does it?
[13:59:22] DBA, Operations: Populate the wikishared db on all dbstores - https://phabricator.wikimedia.org/T126252 (jcrespo) Open>Resolved a:jcrespo This was done a long time ago on dbstore1002, and doesn't apply anymore on dbstores due to multiinstance.
[14:07:01] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, wikidata-tech-focus: wikibase: synchronize schema on production with what is created on install - https://phabricator.wikimedia.org/T85414 (Marostegui) >>! In T85414#4661580, @WMDE-leszek wrote: > `wb_terms_entity_id` only uses the "old", nu...
[14:16:11] DBA, Lexicographical data, Wikidata, Datacenter-Switchover-2018, and 5 others: S8 replication issues leading to rows missing during eqiad -> codfw switch (Was: "A few lexemes disappeared") - https://phabricator.wikimedia.org/T206743 (Addshore) >>! In T206743#4691328, @jcrespo wrote: > @Addshore I...
[16:06:23] DBA, User-Banyek, Wikimedia-Incident: Compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Banyek) **Note: this is work in progress.** The current yaml structure is the following: ``` Sections: - s1: Primary: db1067 Compare: db2048 Database...
[16:08:17] I'm leaving for now
[19:33:26] DBA, MediaWiki-Database, Wikimedia-production-error: excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1 - https://phabricator.wikimedia.org/T207881 (mmodell)
[19:37:33] DBA, MediaWiki-Database, Wikimedia-production-error: excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1 - https://phabricator.wikimedia.org/T207881 (mmodell) {F26775508}
[21:10:17] DBA, MediaWiki-Database, Wikimedia-production-error: excessive "lock wait timeout exceeded " error rate after deploying 1.33.0-wmf.1 to group1 - https://phabricator.wikimedia.org/T207881 (Banyek) Those are the exceptions in the given time range: https://logstash.wikimedia.org/app/kibana#/dashboard/de...
[21:25:22] DBA, User-Banyek: dbstore2002 tables compression status check - https://phabricator.wikimedia.org/T204930 (Banyek) Here are the compressible tables from s1: ``` MariaDB [(none)]> SELECT table_schema, table_name, data_length/1024/1024/1024 as size FROM information_schema.tables WHERE engine='INNODB' and...
[21:59:16] DBA: dbproxy1005 reports database failover - https://phabricator.wikimedia.org/T207901 (Banyek)
[21:59:32] DBA: dbproxy1005 reports database failover - https://phabricator.wikimedia.org/T207901 (Banyek) p:Triage>Normal