[00:26:03] DBA, Operations, Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#3188507 (Dzahn) not sure if this should be tagged as traffic or not. please feel free to remove it. it just got auto-added because it copies tags when you create somethi...
[06:12:13] DBA: db1069 s7 replication thread stuck on fawiki.flaggedrevs_tracking - https://phabricator.wikimedia.org/T163183#3188964 (Marostegui)
[06:12:27] DBA: db1069 s7 replication thread stuck on fawiki.flaggedrevs_tracking - https://phabricator.wikimedia.org/T163183#3188977 (Marostegui) Open>Resolved
[06:13:20] DBA: db1069 s7 replication thread stuck on fawiki.flaggedrevs_tracking - https://phabricator.wikimedia.org/T163183#3188964 (Marostegui) This looks similar to: T161781
[06:15:24] DBA: db1069 s7 replication thread stuck on fawiki.flaggedrevs_tracking - https://phabricator.wikimedia.org/T163183#3188982 (Marostegui)
[07:06:24] did s7-sanitarium crash?
[07:06:55] oh, I see T163183
[07:06:55] T163183: db1069 s7 replication thread stuck on fawiki.flaggedrevs_tracking - https://phabricator.wikimedia.org/T163183
[07:10:45] yeah
[07:10:47] only that instance
[07:11:09] i created the ticket for future references and tracking
[07:12:35] it happened on s3
[07:12:42] on friday
[07:13:06] the upgrade made it less frequent, but it did not solve it
[07:14:44] and it happened on s2 some days ago too: https://phabricator.wikimedia.org/T161781
[07:14:54] which table was it on s3?
[07:14:57] same?
[07:15:18] tag_summary
[07:15:29] on iswiki
[07:16:20] I wonder why it happens on db1069 but not on the dbstore servers...
[07:16:38] tokudb master
[07:17:03] with multisource
[07:17:27] but db1069 doesn't have technically, multisource
[07:17:38] well, tokudb master
[07:18:03] when we finish these 2 weeks, we have to connect labsdb1001 and 3 with db1095
[07:18:03] but it is pretty much the same as the dbstore, because it replicates from non tokudb
[07:18:32] not row and not master
[07:19:11] yeah, i guess that is it
[07:19:13] :(
[07:36:51] DBA, Revision-Scoring-As-A-Service, rsaas-articlequality : [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#3189093 (Marostegui) >>! In T146718#3186337, @Halfak wrote: > Hi @Marostegui. > > * Requirements are 15.6GB with an additional 2GB p...
[07:53:12] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, User-Daniel: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3189123 (hoo) >>! In T151717#3184079, @jcrespo wrote: >> We want to collect additional information on one of these wikis for a while >...
[08:02:36] DBA: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807#3189128 (Marostegui) The dsns_s1 table is ready with the correct hosts. The rc slaves on both dcs now have the filter: ``` Replicate_Wild_Ignore_Table: enwiki.__wmf_checksums ``` The table needs to be dropped and recreated...
[08:17:41] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3189184 (Marostegui)
[08:56:29] marostegui, you will be happy to know that all external storage checks I have done until now reveal no differences between servers
[08:57:42] \o/
[08:58:03] niiiiiiiiiiiiiiiiice!!!!!
[08:58:10] I was not 100% sure about it
[08:58:32] not everything has been checks, only enwiki and wikidata
[08:58:36] *checked
[09:30:50] DBA, Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3189325 (jcrespo) text is ok, checking user_properties now.
[09:33:59] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3189329 (Marostegui) The following tables per wiki need to be excluded (no PK): arwiki ``` __iwlinks_new categorylinks change_tag click_tracking click_tracking_user_properties cur edit_page_tracking ep_users_per_course hidden image...
[10:10:21] DBA, MediaWiki-extensions-WikibaseClient, Wikidata, User-Daniel: Usage tracking: record which statement group is used - https://phabricator.wikimedia.org/T151717#3189407 (matej_suchanek)
[10:45:52] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3189443 (Marostegui) The dsns_s7 table is ready. The replication filters are in place on the rc slaves: ``` Replicate_Wild_Ignore_Table: arwiki.__wmf_checksums,cawiki.__wmf_checksums,centralauth.__wmf_checksums,eswiki.__wmf_checksum...
[10:46:06] DBA: Run pt-table-checksum on s7 - https://phabricator.wikimedia.org/T163190#3189445 (Marostegui)
[10:46:08] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3189446 (Marostegui)
[10:46:55] DBA, Operations, Patch-For-Review: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#2266762 (Marostegui)
[10:46:57] DBA: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807#3189448 (Marostegui)
[11:31:46] jynus, marostegui: you ok if I do another test of tendril switch to codfw and then back?
[11:32:11] yes, go on
[11:32:39] +1
[11:33:07] ack
[11:35:41] tendril tree in codfw configuration now
[11:36:25] looks good to me, ready to switch it back if you don't need to check anything else
[11:37:06] looks good to me too
[11:37:12] switch back if you like
[11:37:16] ok
[11:37:43] done
[11:38:04] looks good again
[12:53:46] DBA: DROP OAI-related tables - https://phabricator.wikimedia.org/T139342#2428747 (Marostegui) While checking for tables with PK I found this ticket and took a look at this table across all the masters, as it looks like it can be dropped and it is pretty big on some shards. It indeed looks unused as per: T549...
[14:04:35] there is an http request on codfw that takes 2 minutes to be served
[14:05:48] jynus: Special:UserLogin ? :)
[14:05:59] I think it used to be recentchanges
[14:06:14] after a few runs it goes faster
[14:06:20] that one was the slowest of the warmup script ;)
[14:06:35] I've made another script for pre-warmup
[14:06:43] of the databases
[14:07:27] dependencies are crazy
[14:07:37] commons depends on meta domain
[14:07:42] and meta depends on mediawiki
[14:08:26] if I pre-warm the databases, the actual warmup script may actually take less time as a side effect
[14:09:14] if you warmup it today we can test it if you want
[14:09:23] I am doing it as we speak
[14:09:32] how long does it take?
[14:09:37] but for tomorrow it will have to be repeated
[14:09:45] a cold cold a few minutes
[14:09:52] after a few runs, a few seconds
[14:10:05] recentchanges took two minutes the first time
[14:10:24] then there is the external store scripts, that take 24 hours
[14:10:37] and they have been running since this morning
[14:10:53] is that script the script you were using to check the differences or a new one?
[14:11:24] the one for the external storage, the same one
[14:12:08] the one for http, I would like to merge it with legoktm with a fast and a slow option
[14:13:24] the problem with doing it with http, is that if memcache caches stuff, it actually doesn't warmup the databases
[14:15:00] /wiki/Special:RecentChangesLinked/Main_Page takes forever
[14:16:19] but are we warming up memcached too apart from varnish?
[14:16:28] varnish?
[14:16:35] we don't touch varnish
[14:16:59] marostegui: we warmup apc + memcached + db
[14:17:07] varnish is already warm
[14:17:15] because already in production
[14:17:55] I am reading phase 4, and I don't know why I thought we were including varnish along with apc
[14:17:59] my bad
[14:22:15] DBA, Revision-Scoring-As-A-Service, rsaas-articlequality : [Discuss] Hosting the monthly article quality dataset on labsDB - https://phabricator.wikimedia.org/T146718#3189849 (Halfak) @Marostegui, we've already experimented heavily with usage of this table by researchers at the above mentioned worksh...
[17:16:16] DBA, Patch-For-Review: run pt-tablechecksum on s6 - https://phabricator.wikimedia.org/T160509#3190424 (jcrespo) user_properties checked and fixed, although not on the delayed slaves as it is not an append-only table. Checking now watchlist.
[17:55:05] [07:12:05] the one for http, I would like to merge it with legoktm with a fast and a slow option <-- sorry, what's the context?
[17:56:19] don't worry too much
[17:56:36] I will send a CR when/if I find the time
[18:04:42] ok
[18:15:35] DBA, Operations, Traffic: dbtree: make wasat a working backend and become active-active - https://phabricator.wikimedia.org/T163141#3187493 (BBlack) Yeah, leave the traffic tag as we'll want to basically revert https://gerrit.wikimedia.org/r/#/c/348456/ once dbtree is ready for it.
[18:19:42] DBA, Operations, ops-eqiad, Patch-For-Review: eqiad rack/setup 11 new DB servers - https://phabricator.wikimedia.org/T162233#3190761 (Cmjohnson) 10 of the 11 servers that arrived are racked, switch configured, raid completed, idrac setup and dns entries for both mgmt and production. They are rea...