[09:17:19] DBA, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3852480 (jcrespo) I would definitely suggest using a new cluster (active or separate, probably a separate one with 1:1 duplication woul...
[09:32:07] DBA, MediaWiki-extensions-Newsletter, Google-Code-in-2017, Patch-For-Review, Performance: List of Newsletters should have a column showing the number of issues - https://phabricator.wikimedia.org/T180979#3853483 (jcrespo) You lack one key metric to give you an idea- how many rows are there on...
[09:38:31] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853489 (jcrespo) @Addshore "blocking one of the MCR related tickets" If things like this are highly...
[10:08:54] can I reuse db1055 for misc?
[10:13:06] yep
[10:13:07] all yours
[10:19:17] DBA, Operations, Goal: Decommission old coredb machines (<=db1050) - https://phabricator.wikimedia.org/T134476#3853576 (jcrespo)
[10:20:04] DBA, Goal: Decommission database hosts < db2030 (tracking) - https://phabricator.wikimedia.org/T176243#3853577 (jcrespo)
[10:21:44] DBA, Operations, hardware-requests, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087114 (jcrespo)
[10:23:18] DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#3853594 (jcrespo)
[10:23:21] DBA, Operations, hardware-requests, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2087114 (jcrespo)
[10:23:39] DBA, Operations, hardware-requests, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#3853595 (jcrespo)
[10:23:41] DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#2546929 (jcrespo)
[10:26:53] do you think we should have 1 candidate in STATEMENT on all shards, or that we should, but it is not a huge priority?
[10:27:19] yeah, it would be good to have them ready
[10:27:30] not sure if it has to be done right right right now, but we should do it
[10:28:06] At least it makes it easier to pick a candidate: in an emergency, we do not have to think about which one to pick
[10:29:37] one more question, is db1056 available too from commonswiki, or just temporarily down?
[10:29:55] if it is available, I will take it now, too
[10:30:06] available too!
[10:30:19] great, so we are in a better place than I expected
[10:30:36] we still have to replace db1030 (dump in s6)
[10:30:40] but that can wait I think
[10:30:55] We can reuse one of the new DBs to reshuffle a bit and get rid of it
[10:31:06] But I think db1055 and db1056 should go to misc
[10:33:39] oh, you repooled db1067 already
[10:33:56] or maybe I did? cannot remember that
[10:34:46] I don't think I did
[10:35:53] anyway, check my short-term plan:
[10:36:11] oh, you already did
[10:36:29] hehe, yeah, I just commented, if you don't mind leaving db1039 there for now
[10:36:33] I use it as a reminder :)
[10:36:46] data has not been fully checksummed there
[10:37:37] for some reason, I got the wrong ticket
[10:37:43] when I checked for that one
[10:37:58] and thought it pointed to "decom", not to the checksum one
[10:40:20] Yes, not a big deal, I can still checksum it, but I prefer to leave it in the config files till it is totally checksummed
[10:40:27] sure, I agree
[10:40:42] I just thought it was an unintended leftover
[10:42:59] and do you see why I bother you every time?
[10:43:06] I am old and cannot be trusted
[10:43:21] ahhaha
[10:43:32] Sorry for nitpicking stuff like the commit message
[10:43:40] I just wanted to make sure you were not forgetting something
[10:43:41] :)
[10:43:45] no, no nitpick
[10:43:54] I wonder who repooled db1067
[10:44:33] it looks like you: https://gerrit.wikimedia.org/r/#/c/398058/
[10:44:34] XD
[10:44:43] Jcrespo Wed Dec 13 15:05:38 2017 +0000
[10:44:46] yeah
[10:44:48] haha
[10:46:29] I will probably create misc_multiinstance, if I see I can set up everything in 2 days
[10:47:16] If not, I will at least leave things ready for a failover to stretch of most misc services (which is why I upgraded the proxies, too)
[10:47:48] nice, misc_multi-instance is a win
[10:48:37] The thing is, I am not sure if we should keep X and X_multiinstance
[10:48:51] and if we do, because they are different roles
[10:49:13] we should refactor into profiles to avoid code duplication
[10:49:40] FYI, in case you have some outage on misc
[10:50:18] some passive misc hosts upgraded to stretch have manual fixes because I could not be bothered to create hiera keys for the upgrade
[10:50:43] that will be fixed once we have active stretch masters
[10:51:13] ok, good to know, thanks :)
[10:52:19] 2 fixes: ln -s /run/mysqld/mysqld.socket /tmp/mysql.sock
[10:52:30] haha nice one
[10:52:51] and the basedir on config, which is hardcoded to mariadb10
[10:59:26] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853708 (Addshore) >>! In T183252#3852204, @Catrope wrote: > I exported the data on db03, imported it...
[11:23:43] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853749 (Catrope) >>! In T183252#3853489, @jcrespo wrote: > @Addshore "blocking one of the MCR related...
[11:45:42] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853800 (daniel) @Catrope thank you for the detailed post mortem!
[12:05:11] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853814 (jcrespo) > I think the background here is not that this broke MCR To clarify, I on my commen...
[12:10:46] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853821 (Addshore) > My comment was specifically about the unbreakage, that more people should know ho...
[12:21:22] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853855 (jcrespo) > Are there any written docs https://www.mediawiki.org/wiki/Manual:MySQL
[12:22:18] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853866 (jcrespo) In particular, https://www.mediawiki.org/wiki/Manual:Backing_up_a_wiki
[12:28:01] DBA, Beta-Cluster-Infrastructure, MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review: Unbreak replication in beta cluster - https://phabricator.wikimedia.org/T183252#3853888 (jcrespo) BTW, let's move discussion to a separate ticket, this is technically closed.
[13:07:27] DBA, Patch-For-Review: run pt-tablechecksum on s5/s8 - https://phabricator.wikimedia.org/T161294#3853936 (Marostegui) After fixing lots of differences on wikidata tables, it is impossible to say that it is fixed 100% but it is in a much more consistent state than it was before and I believe it is quite...
[14:06:34] DBA, Wikimedia-Site-requests: Global rename of ولاء → لا روسا: supervision needed - https://phabricator.wikimedia.org/T183467#3854087 (alanajjar)
[14:06:56] DBA, Wikimedia-Site-requests: Global rename of ولاء → لا روسا: supervision needed - https://phabricator.wikimedia.org/T183467#3854073 (alanajjar)
[14:59:11] DBA, Operations: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3854189 (jcrespo) p:Triage>Normal
[14:59:45] DBA, Operations: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3854202 (jcrespo) Very related to T183249
[15:00:18] DBA, Operations, Patch-For-Review: Reimage and upgrade to stretch all dbproxies - https://phabricator.wikimedia.org/T183249#3848018 (jcrespo) Very related (blocked on failovers of): T183469
[15:06:15] DBA, Operations: Setup newer machines and replace all old misc (m*) and x1 codfw machines - https://phabricator.wikimedia.org/T183470#3854215 (jcrespo) p:Triage>Normal
[15:07:39] DBA, Operations: Setup newer machines and replace all old misc (m*) and x1 codfw machines - https://phabricator.wikimedia.org/T183470#3854229 (jcrespo) Eqiad version: T183469 Very related: T170662
[15:08:08] DBA, Patch-For-Review: Productionize 22 new codfw database servers - https://phabricator.wikimedia.org/T170662#3681943 (jcrespo) See: T183470
[15:46:19] DBA, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854304 (Anomie)
[16:20:39] Blocked-on-schema-change, DBA, Data-Services, Dumps-Generation, and 2 others: Schema change for refactored comment storage - https://phabricator.wikimedia.org/T174569#3854397 (CCicalese_WMF)
[16:33:24] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3854189 (ops-monitoring-bot) Script wmf-auto-reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db1055.eqiad.wmnet', 'db1056.eqi...
[16:38:48] DBA, MediaWiki-Platform-Team (MWPT-Q2-Oct-Dec-2017): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854456 (Anomie) > Note that clusters are nowadays fully logical entities, and that they can physically be on the same machines than...
[16:50:12] DBA, Operations, hardware-requests, Goal: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#3854519 (RobH) a:RobH>None
[16:51:32] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854524 (Anomie)
[17:05:08] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854606 (jcrespo) @Anomie Apologies if I didn't express myself clearly. You may or may not have understood my proposal, it is not 100%...
[17:09:31] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#3854610 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1056.eqiad.wmnet', 'db1055.eqiad.wmnet'] ``` and were **ALL** successful.
[17:15:23] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854616 (jcrespo) Data I have right now: es1 (the read only hosts): have 11TB available, they are using 7.8T es2 and es3 (read-write):...
[18:23:08] DBA, Operations, Patch-For-Review: Rack and setup db1111 and db1112 - https://phabricator.wikimedia.org/T180788#3854799 (Marostegui) This has been all set. Servers replicate between each other (db1111 being the master). They contain, as requested, Commonswiki and eowiki Credentials were sent to MCR...
[18:29:36] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854818 (Anomie) "storing similar pages together for better optimization" would be the large code rewrite, since ExternalStore only get...
[18:35:00] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3854832 (jcrespo) Following from that, and my data, do you think we could fit the rows we have to rewrite on the "old" or on the new se...
[20:35:10] DBA, MediaWiki-extensions-Newsletter, Google-Code-in-2017, Patch-For-Review, Performance: List of Newsletters should have a column showing the number of issues - https://phabricator.wikimedia.org/T180979#3855333 (Bawolff) It's very unclear what sort of growth nl_issues/nl_newsletters will have...
[21:09:26] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018): Determine how to update old compressed ExternalStore entries for T181555 - https://phabricator.wikimedia.org/T183419#3855448 (Anomie) Hmm. 11TB available, 7.8T used, leaves 3.2T free, which is room to resave about 40% of the data. On dewiki only about...