[06:09:46] DBA, Operations, ops-codfw, Patch-For-Review: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3162515 (Marostegui)
[06:09:56] DBA, Operations, ops-codfw, Patch-For-Review: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3154083 (Marostegui) >>! In T162159#3160873, @Papaul wrote: > @Marostegui Just for your information > > asw-a2-codfw > asw-a7-codfw > asw-b2-codfw > asw-b7-codfw > as...
[06:20:50] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3162523 (Marostegui) After disabling sync binlog and trx commit yesterday the server caught up. I have enabled gtid as well. I have sent the patch to pool it, but I think we should...
[06:53:47] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3162525 (jcrespo) The first million of revisions is a bit more diverse, but the results are mostly maintained (probably less bots were active)- more savings in number of rows due to a lot of empt...
[06:58:09] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3153644 (Marostegui) >>! In T162138#3159663, @jcrespo wrote: > > 2/3 of them would repeat at least once, and around half of them at least 10 times, with a general reduction of 1/3 of the origina...
[07:02:20] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3162531 (jcrespo) Wikidata seems to be the one that would benefit the most, due to many bot repeating messages: ``` root@dbstore2002[ops]> SELECT count(*) FROM wikidatawiki.revision WHERE rev_i...
[07:03:12] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3162533 (jcrespo) Open>Resolved
[07:24:53] lag on dbstore1002:s2 is me (see SAL)
[07:31:03] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3162569 (Marostegui) >>! In T162138#3162531, @jcrespo wrote: > Wikidata seems to be the one that would benefit the most, due to many bot repeating messages: > > > Final size: 96M 99G -> 96M :-O
[07:39:48] DBA: How many revision comments are exactly the same? Get some stats. - https://phabricator.wikimedia.org/T162138#3162593 (jcrespo) No, 96M is the final size for 1 million comments- including indexes. The projected size for revision is less than the current half (99G compressed-> 45G) plus a comments table o...
[07:42:43] DBA, Operations, ops-codfw, Patch-For-Review: setup tempdb2001(WMF6407) - https://phabricator.wikimedia.org/T162290#3162594 (Marostegui) host added to tendril too.
[07:48:23] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3162595 (Marostegui) db1094 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb1094.eqiad.wmnet $i -e "show create table revision\G" | egrep "KEY";done arwi...
[08:27:19] DBA, Wikidata, Performance, User-Daniel, and 2 others: Use redis-based lock manager in dispatch changes in production - https://phabricator.wikimedia.org/T159826#3162747 (hoo)
[08:27:21] DBA, Wikidata, Patch-For-Review, User-Daniel, and 2 others: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3162745 (hoo) Open>Resolved We deployed this yesterday and I can confirm it to work correctly. I tested this by st...
[08:44:38] you stopped replication on db1047 I assume?
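The T162138 figures count three things: how many revisions share their comment with at least one other revision, how many share it with at least 10, and how many rows a separate comments table would save. A minimal Python sketch of those three numbers follows. It assumes the comments of a sample have already been exported as a list of strings, one per revision. `comment_stats` is a hypothetical helper and does not reproduce the method used on the ticket.

```python
from collections import Counter
from typing import Dict, List


def comment_stats(comments: List[str], heavy: int = 10) -> Dict[str, float]:
    """Duplicate statistics for revision comments (one list entry per revision)."""
    if not comments:
        return {"repeat_once": 0.0, "repeat_heavy": 0.0, "row_reduction": 0.0}
    counts = Counter(comments)
    total = len(comments)
    # Share of revisions whose exact comment text appears in at least one other revision.
    repeat_once = sum(c for c in counts.values() if c >= 2) / total
    # Share of revisions whose comment text appears at least `heavy` times.
    repeat_heavy = sum(c for c in counts.values() if c >= heavy) / total
    # Storing every distinct comment once leaves len(counts) rows instead of `total`.
    row_reduction = 1 - len(counts) / total
    return {
        "repeat_once": repeat_once,
        "repeat_heavy": repeat_heavy,
        "row_reduction": row_reduction,
    }
```

An empty comment is treated like any other comment text here, so many empty comments collapse into a single row.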
[08:44:52] yes
[08:44:55] ok :)
[08:44:58] just confirming
[08:45:02] part of the s2 fix
[08:45:14] yeah, I guessed that, but as I saw the warning, I thought I would ask
[08:45:42] there are huge lists of rows missing on dbstore1002 and db1047
[08:46:00] I intended to stop replication for just a second
[08:46:08] but it takes 5 minutes
[08:46:19] and a 100MB import
[08:47:49] I am going to downtime db1047, too
[08:47:55] sure
[08:48:17] why do you think db1047 has so many differences normally? I guess because it has been used to play around with analytics?
[08:48:36] No idea
[08:48:41] crash?
[08:48:50] other issues with multisource?
[08:49:22] maybe crashes yeah
[08:49:31] crashes+multisource isn't a good combination
[08:49:31] however
[08:49:42] there are almost full ranges missing
[08:49:47] on s2
[08:50:42] that looks like someone has played with tables or something
[08:50:54] who knows where it was recloned from and all that
[08:51:09] there was a data loss of properties when I entered
[08:51:22] (someone run a script to purge properties)
[08:51:26] *ran
[08:51:33] maybe related?
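Missing row ranges like the ones on dbstore1002 and db1047 are what a chunked checksum is built to find. The table is split into primary-key ranges, each range is checksummed on both copies, and only ranges whose checksums disagree need a closer look. pt-table-checksum also works chunk by chunk, but it computes the checksums in SQL on the master and lets replication recompute them on the replicas. The Python below only sketches the idea. It assumes both copies fit in memory as `{pk: row}` dicts with an integer primary key, and the function names are hypothetical.

```python
import zlib
from typing import Dict, Set, Tuple

Row = Tuple[object, ...]


def chunk_checksums(rows: Dict[int, Row], chunk_size: int) -> Dict[int, int]:
    """Return a CRC32 per primary-key chunk, where the chunk id is pk // chunk_size."""
    sums: Dict[int, int] = {}
    for pk in sorted(rows):
        chunk = pk // chunk_size
        # Feed rows in PK order into a running CRC32 for their chunk.
        data = repr((pk, rows[pk])).encode("utf-8")
        sums[chunk] = zlib.crc32(data, sums.get(chunk, 0))
    return sums


def differing_chunks(
    master: Dict[int, Row], replica: Dict[int, Row], chunk_size: int = 1000
) -> Set[int]:
    """Chunk ids whose checksums differ, including chunks present on only one side."""
    a = chunk_checksums(master, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return {c for c in a.keys() | b.keys() if a.get(c) != b.get(c)}
```

The chunk boundaries only line up on both hosts because the primary key is numeric. This is one reason numerical PKs make these comparisons easier and reduce false positives.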
[08:51:59] anyway, the analytics slaves are the only ones affected
[08:52:06] the others seem fine
[08:54:09] yeah, and I am glad about that :)
[08:54:21] i mean, it is not ideal, but it could be a lot worse
[08:54:45] there are also some false positives, due to the detection method
[08:55:56] numerical PKs + pt-heartbeat will help
[08:57:20] yeah, we desperately need those PKs :)
[08:57:48] not only will make the checks easier, it will prevent issues in the first place
[08:58:11] writes based on PKs + better locking + possibility of RBR
[09:25:00] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3162835 (hoo)
[09:26:42] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3162851 (Marostegui) dbstore1002 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdbstore1002.eqiad.wmnet $i -e "show create table revision\G" | egrep "KEY"...
[09:41:41] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3162872 (jcrespo) Everything seems ok or has been corrected (there were some false positives above). I am going to give a quick l...
[09:55:47] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3162892 (Marostegui) >>! In T154485#3162872, @jcrespo wrote: > Everything seems ok or has been corrected (there were some false p...
[10:44:06] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3163025 (daniel) Thank you @hoo! One thing that still worries me is that we don't have a mechanism for stale locks. I...
[10:46:15] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3163026 (hoo) >>! In T159828#3163025, @daniel wrote: > Thank you @hoo! > > One thing that still worries me is that we...
[10:50:50] jynus: Is there an easy way for you to tell me the number of updates occurring to a single table in… maybe a minute?
[10:51:49] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3163047 (jcrespo) As a small side note- that can also happen on mysql. Despite locks being released on session disconn...
[10:51:54] like now, or all the time?
[10:52:01] jynus: Both works
[10:52:05] I just want to get an idea
[10:52:08] I think so
[10:52:12] please tell me
[10:52:17] which table/server
[10:52:35] wikidatawiki.wb_changes_dispatch
[10:52:53] I expect there to be around 40-50 updates a second
[10:52:55] I had those stats up, but had to disable them because of privacy concerns
[10:53:05] for smaller wikis
[10:53:26] so rows written on the s5 master?
[10:53:30] Yes
[10:53:34] to that table
[10:55:43] Maybe like the rate would also be interesting (like what percentage of the s5 or wikidatawiki writes happens to that table)
[10:55:52] but that's optional
[10:56:16] I can give you details, but I will put them on an NDA-private paste, ok?
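The stale-lock worry on T159828 is commonly handled by giving every lock an expiry. A lock whose holder crashed then stops blocking others once its timeout passes. The class below is a hypothetical in-memory sketch of that idea. It uses an injectable clock so it can be tested, and it does not reproduce MediaWiki's `LockManager` or the redis lock manager API.

```python
import time
from typing import Callable, Dict, Tuple


class ExpiringLockManager:
    """Hypothetical lock table where every lock carries an expiry time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (owner, expires_at)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, owner: str, ttl: float) -> bool:
        """Take (or refresh) the lock unless another owner holds an unexpired one."""
        now = self._clock()
        held = self._locks.get(key)
        if held is not None and held[0] != owner and held[1] > now:
            return False
        # Either free, expired (stale), or already ours: (re)take it with a new expiry.
        self._locks[key] = (owner, now + ttl)
        return True

    def release(self, key: str, owner: str) -> bool:
        """Release only if `owner` still holds the lock."""
        held = self._locks.get(key)
        if held is None or held[0] != owner:
            return False
        del self._locks[key]
        return True
```

The trade-off is choosing the TTL. If it is too short, a slow but live holder loses its lock. If it is too long, a crashed holder blocks dispatching for that long.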
[10:56:41] it is running now
[10:56:46] Fine with me… I just need the numbers to argue internally
[10:56:51] he he
[10:56:55] I have some ideas to reduce this
[10:56:58] I have all kinds of numbers
[10:57:04] feel free to ask for more
[10:57:17] 42
[10:57:21] :-P
[10:57:39] this is supposed to go into tendril, but never find the time to implement a nice interface
[10:58:00] Oh, I'd love to have that
[10:58:06] I do, too
[10:58:11] *would
[10:58:24] it is scheduled for when I have the time :-)
[10:59:24] quick and dirty results: https://phabricator.wikimedia.org/P5222
[11:00:02] those are absolute numbers, so just subtract and divide by the time to get the "speed"
[11:00:46] That's very close to what I expected… thanks :)
[11:00:56] *absolute number since the counter reset, normally a mysql restart
[11:01:25] I'm fairly sure we can get this down by one or two orders of magnitude ;)
[11:01:47] that would be great, I think wikidata will grow more
[11:02:03] otherwise, if too many writes start happening
[11:02:19] we will have to use separate masters for some functionality to avoid lag
[11:02:27] BTW, unrelated
[11:02:40] s8 servers just arrived
[11:03:07] but do not expect anything in service until several weeks ahead
[11:03:10] Cool :)
[11:03:12] No worries
[11:22:59] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3163156 (daniel) >>! In T159828#3163026, @hoo wrote: > We can set a timeout for this `LockManager::lock` allows that (...
[13:03:49] DBA, Wikidata, User-Daniel, User-Ladsgroup, Wikidata-Sprint: Use redis-based lock manager for dispatchChanges on test.wikidata.org - https://phabricator.wikimedia.org/T159828#3163721 (hoo) >>! In T159828#3163156, @daniel wrote: >>>! In T159828#3163026, @hoo wrote: >> We can set a timeout for...
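The numbers in the paste are cumulative since the last counter reset (normally a mysql restart), so a rate needs two readings. Subtract the first from the second and divide by the elapsed time. A minimal sketch follows, assuming two samples of a per-table rows-written counter and a server-wide one. The `Sample` type and function names are hypothetical, and a lower second reading is treated as a reset that makes the rate unusable.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    t: float         # seconds at which the counters were read
    table_rows: int  # cumulative rows written to one table since the last reset
    total_rows: int  # cumulative rows written on the whole server since the last reset


def per_second(first: int, second: int, dt: float) -> Optional[float]:
    """Rate between two cumulative readings, or None if time did not advance or the counter reset."""
    if dt <= 0 or second < first:
        return None
    return (second - first) / dt


def write_stats(s1: Sample, s2: Sample) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Return (table rows/s, total rows/s, table share of writes in percent)."""
    dt = s2.t - s1.t
    table_rate = per_second(s1.table_rows, s2.table_rows, dt)
    total_rate = per_second(s1.total_rows, s2.total_rows, dt)
    share = None
    if table_rate is not None and total_rate:
        share = 100.0 * table_rate / total_rate
    return table_rate, total_rate, share
```

For example, two readings 60 seconds apart in which the table counter grew by 2700 give 45 rows/s. That figure is made up for illustration and is not taken from the paste.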
[14:10:16] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3156921 (jcrespo) So that we are not a blocker- creation of tables in production, specially if we have aread...
[14:13:04] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3163941 (Marostegui) >>! In T162252#3163937, @jcrespo wrote: > So that we are not a blocker- creation of tab...
[14:15:40] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3163944 (jcrespo) To clarify- it may be blocked on us right now to create the database and because labs filt...
[14:16:27] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3163945 (jcrespo) > This sentence actually confused me x1 == extension1 :-)
[14:39:16] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3164014 (Marostegui) db1039 is now done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb1039.eqiad.wmnet $i -e "show create table revision\G" | egrep "KEY";done...
[14:57:32] DBA, Patch-For-Review: Unify revision table on s7 - https://phabricator.wikimedia.org/T160390#3164036 (Marostegui) db1062 is done: ``` root@neodymium:/home/marostegui# for i in `cat s7_T160390`; do echo $i; mysql --skip-ssl -hdb1062.eqiad.wmnet $i -e "show create table revision\G" | egrep "KEY";done arwi...
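The `show create table revision\G | egrep "KEY"` loops print the index definitions of every wiki on a host, and the goal of T160390 is that all of them end up identical. A sketch of automating that comparison follows. It assumes the `SHOW CREATE TABLE` output has already been collected into a dict keyed by wiki name. The helpers are hypothetical, and they use the most common key set as the reference.

```python
from collections import Counter
from typing import Dict, FrozenSet


def key_lines(create_table: str) -> FrozenSet[str]:
    """Keep the lines `egrep "KEY"` would keep, trimmed and without a trailing comma."""
    return frozenset(
        line.strip().rstrip(",")
        for line in create_table.splitlines()
        if "KEY" in line
    )


def outliers(schemas: Dict[str, str]) -> Dict[str, FrozenSet[str]]:
    """Wikis whose KEY definitions differ from the most common set among all wikis."""
    keysets = {wiki: key_lines(ddl) for wiki, ddl in schemas.items()}
    if not keysets:
        return {}
    reference, _ = Counter(keysets.values()).most_common(1)[0]
    return {wiki: keys for wiki, keys in keysets.items() if keys != reference}
```

Stripping the trailing comma keeps a key that merely moved to the last position from showing up as a difference. When a target schema is known, comparing against it is safer than trusting the majority.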
[15:08:55] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3164065 (Marostegui) A quick look on the x1 production hosts (thanks Jaime for the clarification) shows no r...
[15:14:47] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3164073 (jcrespo) > production hosts Was thinking on dbstore (backup) hosts, which were problematic (rememb...
[15:24:16] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3164084 (Marostegui) Yup, we can leave labs aside for the "first stage" and then study those without blockin...
[15:25:33] DBA, Cognate, Wikidata, Wikimedia-Extension-setup, and 3 others: Create SQL database and Tables for Cognate extension to be used on Wiktionaries - https://phabricator.wikimedia.org/T162252#3164085 (jcrespo) @Addshore I hope labs access is not a blocker for this, that can be done at a later date.
[16:00:43] DBA, MediaWiki-Special-pages, Wikimedia-Site-requests, Patch-For-Review, Wikimedia-log-errors: "Invalid DB key" errors on various special pages - https://phabricator.wikimedia.org/T155091#3164121 (matmarex)
[16:12:23] DBA, Operations, ops-codfw, Patch-For-Review: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3164156 (Marostegui)
[17:03:27] DBA, Operations, ops-codfw, Patch-For-Review: codfw rack/setup first 10 DB servers - https://phabricator.wikimedia.org/T162159#3164204 (Papaul)
[17:49:00] DBA, Patch-For-Review: run pt-table-checksum on s2 (WAS: run pt-table-checksum before decommissioning db1015, db1035,db1044,db1038) - https://phabricator.wikimedia.org/T154485#3164374 (jcrespo) db1018 and db1021 have the exact same watchlist table, we can now use the slave to compare it to the other serv...
[20:25:33] DBA, Labs, Operations: eqiad: (2) hardware access request for labsdb1006 & 7 refresh - https://phabricator.wikimedia.org/T161755#3164860 (chasemp)
[20:25:36] DBA, Labs, Operations: eqiad: (2) hardware access request for labsdb1004 & 5 refresh - https://phabricator.wikimedia.org/T161754#3164861 (chasemp)