[00:35:53] DBA, Data-Services, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#2415416 (MusikAnimal) ```lang=sql SELECT COUNT(rev_id) FROM enwiki_p.revision WHERE rev_page = 48357647 ``` returns 4 when it should be 0, as the [[ https://en.wikipedia.org/wiki/Draft:Sohail_Khan | page...
[00:57:30] DBA, Data-Services, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3587107 (bd808) >>! In T138967#3586971, @Dispenser wrote: > From arwiki: [[https://ar.wikipedia.org/wiki/File:%D8%A7%D9%84%D9%84%D9%87_%D8%B9%D8%B2_%D9%88%D8%AC%D9%84.png|File:الله عز وجل.png]]. Deleted...
[00:59:31] DBA, Data-Services, Epic: Labs database replica drift - https://phabricator.wikimedia.org/T138967#3587109 (bd808)
[05:19:55] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587269 (Marostegui) @Papaul please change the disk whenever you can. Thanks!
[05:21:33] DBA: truncate l10n_cache table on WMF wikis - https://phabricator.wikimedia.org/T150306#3587272 (Marostegui) Open>Resolved This is now all done
[05:21:38] DBA, Epic, Tracking: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921#3587274 (Marostegui)
[08:29:50] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587467 (jcrespo) @Marostegui, are we sure we want this done, and not get rid of the host directly?- it is a very old host and we have its replacements setup. I would ask how many spare disks we have left,...
[08:31:47] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587473 (Marostegui) >>! In T175228#3587467, @jcrespo wrote: > @Marostegui, are we sure we want this done, and not get rid of the host directly?- it is a very old host and we have its replacements setup. I...
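MusikAnimal's 00:35:53 report rests on a simple rule: once a page has been deleted, its revision rows should no longer be visible on the replica, so a non-zero COUNT(rev_id) for that page is evidence of drift. A minimal sketch of that check follows. It assumes a simplified two-column revision table and uses Python's built-in sqlite3 as a stand-in for the real enwiki_p replica (reaching the actual Wiki Replicas would need a MySQL client library, which is out of scope here). An expected count of 0 only holds for a page that was deleted in full; count_revisions and is_drifted are illustrative names.

```python
import sqlite3


def count_revisions(conn: sqlite3.Connection, page_id: int) -> int:
    """Return how many revision rows are visible for page_id."""
    # Same shape as the query in the log; "?" is sqlite3's placeholder style.
    (count,) = conn.execute(
        "SELECT COUNT(rev_id) FROM revision WHERE rev_page = ?", (page_id,)
    ).fetchone()
    return count


def is_drifted(conn: sqlite3.Connection, page_id: int, expected: int = 0) -> bool:
    """True if the visible row count differs from what the replica should show."""
    return count_revisions(conn, page_id) != expected


# Demo: an in-memory stand-in whose simplified revision table still holds
# 4 rows for a page that (by assumption) has been deleted upstream.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
replica.executemany("INSERT INTO revision (rev_page) VALUES (?)", [(48357647,)] * 4)
print(count_revisions(replica, 48357647), is_drifted(replica, 48357647))  # 4 True
```

Keeping the expected count as a parameter lets the same check cover partially deleted pages, where some revisions should legitimately remain.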
[08:38:44] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587490 (jcrespo) @Papaul Do you have plenty of old 300GB disks that would not be used otherwise, or should we speed up the decommissioning (it will happen eventually, but right now we have other priorities).
[08:39:42] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3587492 (Marostegui) a:Papaul
[09:10:47] DBA: Run pt-table-checksum on s4 (commonswiki) - https://phabricator.wikimedia.org/T162593#3587559 (jcrespo) All differences mentioned on T162593#3175313 handled. Now going with ``` oldimage text watchlist ```
[09:16:48] DBA, Patch-For-Review: Productionize 11 new eqiad database servers - https://phabricator.wikimedia.org/T172679#3587586 (Marostegui)
[09:18:15] DBA: run pt-tablechecksum on s5 - https://phabricator.wikimedia.org/T161294#3587590 (Marostegui) db1100 has been cloned from db1049 and migrated to file per table. It is now catching up, once it has caught up I will create the decommission task for db1049 but I won't act on it as I will be gone for two weeks...
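pt-table-checksum (Percona Toolkit) looks for replica inconsistencies by checksumming each table in chunks on the master and letting replication run the same checksum queries on the replicas, so the per-chunk results can be compared; the differences it finds are then handled table by table, as jcrespo reports for s4. The sketch below only illustrates the chunk-comparison idea in plain Python over hypothetical in-memory rows. It is not how the tool is implemented (the tool works in SQL, through replication), and the (primary key, payload) row shape is an assumption.

```python
import zlib
from typing import Dict, Iterable, List, Tuple

Row = Tuple[int, str]  # (primary key, serialized row): a simplified, hypothetical shape


def chunk_checksums(rows: Iterable[Row], chunk_size: int) -> Dict[int, Tuple[int, int]]:
    """Map each primary-key range (chunk) to (row count, running CRC32 of its rows)."""
    sums: Dict[int, Tuple[int, int]] = {}
    for pk, payload in sorted(rows):  # key order makes the CRC deterministic
        chunk = pk // chunk_size
        count, crc = sums.get(chunk, (0, 0))
        sums[chunk] = (count + 1, zlib.crc32(f"{pk}\x00{payload}\n".encode(), crc))
    return sums


def differing_chunks(master: Iterable[Row], replica: Iterable[Row], chunk_size: int = 1000) -> List[int]:
    """Chunks whose (count, checksum) differ, including chunks present on one side only."""
    a = chunk_checksums(master, chunk_size)
    b = chunk_checksums(replica, chunk_size)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


# Demo: 5,000 rows; the replica has one changed row and one missing row.
master = [(pk, f"text-{pk}") for pk in range(5000)]
replica = list(master)
replica[2500] = (2500, "text-drifted")
del replica[4100]
print(differing_chunks(master, replica))  # [2, 4]
```

Comparing small chunks rather than whole tables narrows each difference down to a key range, which is what makes fixing tables such as oldimage, text and watchlist one range at a time practical.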
[11:28:37] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3587855 (Marostegui)
[11:48:10] so the eventlogging_cleaner.py script that I was running on dbstore1002 and db1047 was heavily thrashing mysqld
[11:48:17] https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=db1047&refresh=1m&orgId=1&from=now-1h&to=now&panelId=19&fullscreen after I stopped it
[11:48:50] yeah nice: https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=db1047&refresh=1m&orgId=1&from=now-7d&to=now&panelId=19&fullscreen
[11:49:24] on db1047 this was the top of top: 5254 mysql 20 0 0.112t 0.040t 8548 S 57.4 65.5 257007:05 mysqld
[11:49:36] on dbstore1002 it was even worse
[11:49:57] I noticed it because the UPDATE queries were taking longer and longer to complete
[11:50:53] on dbstore1047 the batch size was 100k (super quick at the beginning, ~4s to complete), while on dbstore1002 it was 10k
[11:51:08] dbstore1002 has a lot more load
[11:51:13] as it replicates all the shards
[11:51:42] ah I can still see mysqld now in top with huge Virt/Res values...
[11:52:03] is it normal due to the nature of mysqld or a sign of some sort of thrashing?
[11:52:06] yeah, but that is normal
[11:52:14] ah okok
[11:53:01] maybe I can retry with a less aggressive schedule? It was 1s between each batch
[11:53:08] I can wait say 5/10s and see
[11:53:42] yeah, worth a try
[11:53:51] the disk was saturated though, so it makes sense it was slow as hell
[11:54:29] I was wondering why the hell a batch was taking so long
[11:54:34] then I checked the graphs :P
[11:54:50] ok so I'll retry with db1047 only and 10s of sleep
[12:01:41] marostegui: not really a big diff - https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=db1047&refresh=1m&orgId=1&from=now-1h&to=now&panelId=19&fullscreen
[12:03:23] I'll try smaller batches
[12:24:19] 10s of sleep + 1000 rows in the batch seems to be sustainable
[12:27:28] also trying other combinations
[12:33:05] 10k with 4s between batches looks good too
[13:22:57] DBA, Operations, ops-eqiad: Decommission db1049 - https://phabricator.wikimedia.org/T175264#3588137 (Marostegui)
[13:23:14] DBA, Operations, ops-eqiad: Decommission db1049 - https://phabricator.wikimedia.org/T175264#3588137 (Marostegui) p:Triage>Normal
[13:36:04] Blocked-on-schema-change, Community-Tech, MediaWiki-Database, Hindi-Sites, and 3 others: Allow comments longer than 255 bytes - https://phabricator.wikimedia.org/T6715#3588249 (Anomie)
[13:44:19] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588328 (jcrespo) @bd808 @Anomie I think this could "break" some tools, maybe it would be nice to give to "announce...
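The 11:48 to 12:33 exchange about eventlogging_cleaner.py boils down to a throttled batch loop: run one bounded UPDATE, pause, repeat, and tune batch size and pause until the disks stop saturating. The script's actual code is not shown in the log, so the sketch below is only a generic version of that pattern under stated assumptions: run_batch is a hypothetical callback that executes one bounded statement (for example a single-table UPDATE with a LIMIT clause) and returns the rows it touched, and the sleep function is injected so the loop can be tested without waiting.

```python
import time
from typing import Callable


def run_in_batches(
    run_batch: Callable[[int], int],
    batch_size: int,
    sleep_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeatedly call run_batch(batch_size), pausing sleep_s seconds between calls.

    run_batch is a hypothetical callback: it runs one bounded statement and
    returns the number of rows it touched. The loop stops after the first
    batch that touches fewer than batch_size rows.
    """
    total = 0
    while True:
        affected = run_batch(batch_size)
        total += affected
        if affected < batch_size:
            return total  # short batch: work is done, no need to pause again
        sleep(sleep_s)  # let the server, and its disks, catch up


def max_rows_per_second(batch_size: int, sleep_s: float, query_s: float = 0.0) -> float:
    """Throughput ceiling: at most one batch per (query time + pause)."""
    return batch_size / (query_s + sleep_s)


# Demo with a fake table of 25,000 rows and a recorded (not real) sleep.
state = {"remaining": 25_000}


def fake_batch(limit: int) -> int:
    done = min(limit, state["remaining"])
    state["remaining"] -= done
    return done


pauses: list = []
total = run_in_batches(fake_batch, batch_size=10_000, sleep_s=4, sleep=pauses.append)
print(total, pauses)  # 25000 [4, 4]
print(max_rows_per_second(1_000, 10), max_rows_per_second(10_000, 4))  # 100.0 2500.0
```

With the numbers from the log, 1000 rows every 10s caps out at 100 rows/s, while 10k rows every 4s allows up to 2500 rows/s; real throughput is lower because each cycle also includes the query's own run time (for example, 100k rows taking ~4s plus a 1s pause gives at most 20000 rows/s).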
[14:44:11] marostegui: jynus for your reading pleasure whenever you have a sec https://www.mediawiki.org/wiki/Special_Interest_Groups and https://wikitech.wikimedia.org/wiki/Wiki_Replicas_Special_Interest_Group
[14:44:43] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588548 (Anomie) This change itself shouldn't break any tools. I can't promise that it won't because tools might be...
[14:45:07] chasemp: I gave them a quick look, but I am not sure I follow the context for that?
[14:45:53] Monthly meeting with a few of us (including you guys, assuming you are still ok with it) where we discuss the setup and anything that is on the horizon or open tasks etc
[14:45:54] jynus: I believe it is the monthly meeting about the future or the present of cloud (and, for us, databases) that chase and I thought it would be a good idea to do, back at Wikimania
[14:46:02] that's it
[14:46:19] we are putting a bit of a formal bow on it to make it fancy, calling it a SIG
[14:46:25] and giving us a platform for making Decisions(tm)
[14:46:33] cue 'dun dun dunnnnn'
[14:47:28] I am not sure about that, I mean, I am all for meeting you and supporting you
[14:48:02] but shouldn't a body like that be made up of users, who give feedback to us so we can support them?
[14:48:14] users in the broad sense
[14:48:23] yes, users who we know are large stakeholders are welcome
[14:48:41] halfak for instance has said he'll come to communicate end user issues he has seen
[14:48:45] I guess they will start joining once these meetings start happening?
[14:49:00] let me put it in other terms: I do not need anything from cloud, and probably never will
[14:49:14] I serve cloud to achieve its goals
[14:49:23] I think it's mostly us needing things from you and wanting to keep you guys in the loop
[14:49:30] I think it is more of a sync meeting so we both (teams) know what might be coming
[14:49:43] it's a "make time as this is important" approach rather than a "we'll find time if it's important" one
[14:49:48] yeah
[14:50:09] if it doesn't work out / seems pointless / gets too weird we can always disband
[14:50:11] I still do not understand, but manuel probably spent more time talking with you about it
[14:50:26] I am not criticizing it, I just do not get it (yet?)
[14:50:31] jynus: take it as a monthly sync meeting, I would say
[14:50:37] yep
[14:50:52] assuming that's cool
[14:51:24] I like the idea, let's do a few iterations and then evaluate if it is needed
[14:51:31] sounds good
[14:51:48] yes, but I see 1 person from cloud and 1 person from DBA meeting, not "everybody at the same time, including users"
[14:52:08] This could end up much more between us "cloud" and users, with inviting you guys when we know there is a big topic coming
[14:52:15] yeah
[14:52:17] it's early days jynus, we are trying to work it out :)
[14:52:20] that I would understand
[14:52:33] like, you reaching users and at some point needing feedback from us
[14:52:38] that I would get
[14:53:06] If you guys are willing to commit to coming to the monthly for a bit we'll figure out what makes sense
[14:53:08] it is not that I do not want to talk to users, it is just that users want things that sometimes I do not even understand
[14:53:24] chasemp: you should invite jynus as special guest on special occasions ;)
[14:53:37] jynus: we need you as a paternal figure to assuage our fears and make us feel it's all going to be alright
[14:53:49] chasemp: I am fine with trying it out and then reevaluating after a few of them, yes
[14:53:56] chasemp: hey, you were the ones that formalized it so much
[14:54:05] :-)
[14:54:29] I'm a bureaucrat at heart it seems
[14:54:50] you mean a manager? :-p
[14:54:53] marostegui: jynus ok cool, just wanted to give a heads up :) if one of you could come to an invite we'll send for a bit that's good enough :)
[14:54:57] heh
[14:55:15] I am just trying to keep my meetings to a minimum, whenever there is real need
[14:57:21] totally fair
[14:58:35] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588577 (jcrespo) Thank you @Anomie, that is exactly why I was confused and why I said I didn't know all the steps...
[15:01:35] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588585 (jcrespo) @Anomie one last question, related to the process, this task (T174569) is blocking the process, w...
[15:05:07] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3588594 (Papaul) @jcrespo technically we do not have any 300GB spare disks. I am trying to load my Google spreadsheet for server decommission to see if we do have a server with 300GB but can't for the m...
[15:09:51] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3588596 (jcrespo) @Papaul no problem, do not work too hard, we may replace the full server soon.
[15:21:00] DBA, Operations, Patch-For-Review: dbtree: don't return 200 on error pages - https://phabricator.wikimedia.org/T163143#3588637 (Dzahn) Open>Resolved a:Dzahn https://gerrit.wikimedia.org/r/#/c/353388/1/index.php
[15:21:02] DBA, Operations, Traffic, Patch-For-Review: dbtree broken (for some users?) - https://phabricator.wikimedia.org/T162976#3588640 (Dzahn)
[15:46:15] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588736 (Anomie) No problem at all, I'm happy to answer good questions. I should be on IRC as usual, 1300 UTC to 21...
[15:49:09] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, MediaWiki-Platform-Team: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3588742 (jcrespo) p:Low>Normal Setting it to normal, I will adjust it to High if more time passes with no f...
[17:15:14] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3589016 (Papaul) @jcrespo db200[1-9] all have 12x300GB disks; we can pull one out and use it for db2010 for now.
[17:19:26] DBA, Operations, ops-codfw: Degraded RAID on db2010 - https://phabricator.wikimedia.org/T175228#3589027 (jcrespo) Yes, those are unused, you can use one of those with no problem. Please do if it doesn't take much of your time, thank you.
[17:24:03] DBA, Data-Services, cloud-services-team: Identify tools hosting databases on labsdb100[13] and notify maintainers - https://phabricator.wikimedia.org/T175096#3589046 (bd808) Initial list of accounts: {P5960}
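The dbtree task Dzahn resolved at 15:21:00 is about HTTP status codes: an error page served with 200 looks like a success to browsers, monitoring and HTTP caches alike. The actual fix lives in dbtree's PHP index.php (the Gerrit change linked in that message) and is not reproduced here. Purely as an illustration of the principle, here is a minimal WSGI sketch in Python; make_app and the render callback are hypothetical names, and choosing 503 rather than 500 is an assumption.

```python
from typing import Callable, Iterable, List, Tuple

Headers = List[Tuple[str, str]]


def make_app(render: Callable[[], str]):
    """Return a WSGI app that serves render() and maps failures to an error status.

    render is a hypothetical page generator that raises when it cannot build the page.
    """
    def app(environ: dict, start_response: Callable[[str, Headers], object]) -> Iterable[bytes]:
        try:
            body, status = render(), "200 OK"
        except Exception as exc:
            # Still show an error page, but with a non-200 status so that
            # clients, monitoring and caches can tell it is not a real result.
            body, status = f"error: {exc}\n", "503 Service Unavailable"
        start_response(status, [("Content-Type", "text/plain; charset=utf-8")])
        return [body.encode("utf-8")]
    return app


# Demo: a render function that fails, and a fake start_response that records the status.
def failing_render() -> str:
    raise RuntimeError("cannot reach database")


seen: list = []
result = make_app(failing_render)({}, lambda status, headers: seen.append(status))
print(seen[0])  # 503 Service Unavailable
```

The body can stay human-readable either way; the point of the change is only that the status line no longer claims success.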