[00:04:14] DBA, Operations, ops-codfw, Patch-For-Review: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Papaul)
[00:37:31] DBA, Operations, ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Papaul)
[00:38:51] DBA, Operations, ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Papaul) @Marostegui @jcrespo please feel free to take over the task. Thanks.
[06:53:28] DBA, Operations, ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Marostegui) Open→Resolved
[06:53:30] DBA, Goal: Address Database infrastructure blockers on datacenter switchover & multi-dc deployment - https://phabricator.wikimedia.org/T220170 (Marostegui)
[06:55:19] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (Marostegui)
[06:55:30] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (Marostegui) p:Triage→Normal
[06:58:56] DBA: Recompress special slaves across eqiad and codfw - https://phabricator.wikimedia.org/T235599 (Marostegui)
[07:15:50] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (Marostegui)
[07:36:57] I am afraid that installing 10.1 on buster isn't as easy as we planned, as it expects libjemalloc1, which isn't present on buster; buster ships libjemalloc2
[07:37:10] jynus: ^
[07:40:04] There aren't even 10.1 packages from MariaDB for buster, so I think I am going to reimage those hosts as stretch and go ahead with 10.1 instead of experimenting here. I don't have much time for this now, and it is just 4 hosts. We will soon have db1107 for us, so we can experiment with buster and 10.3
[08:00:49] DBA, Patch-For-Review: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db2132.codfw.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/201911130800_marostegui_190...
[08:06:26] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui)
[08:06:38] Blocked-on-schema-change, DBA: Schema change to rename user_newtalk indexes - https://phabricator.wikimedia.org/T234066 (Marostegui)
[08:07:03] Blocked-on-schema-change, DBA, Core Platform Team: Schema change for refactored actor and comment storage - https://phabricator.wikimedia.org/T233135 (Marostegui)
[08:22:57] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (jcrespo) > on our side (which stores and uses the most of stuff SDC use), we can safely move to another server, we don't do any joins with other tables in the code. It was actually wikibase team that said to us said it was not ye...
[08:25:23] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db2132.codfw.wmnet'] ` and were **ALL** successful.
[08:32:32] we can still do 10.3 or 10.4 on stretch (not saying we have to, just suggesting an alternative)
[08:32:37] *on buster
[08:32:52] yeah, 10.3 on buster is "easy"
[08:51:35] I am compiling 10.4 too, just to have more options
[08:53:09] nice, thanks
[08:57:16] jynus: the jemalloc Debian maintainer can most certainly help you with the build :-) https://packages.qa.debian.org/j/jemalloc.html
[10:03:30] DBA: Decommission db2067.codfw.wmnet - https://phabricator.wikimedia.org/T233185 (Marostegui)
[10:03:33] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (Marostegui)
[10:03:58] DBA, MediaWiki-General, Operations, TechCom: Evaluate and decide the future of relational datastore at WMF after the upgrade of MariaDB 10.1 is finished - https://phabricator.wikimedia.org/T193224 (jcrespo) db1114 is now running percona-server 8.0, if anyone wants to test it.
[10:49:57] there are large numbers of wikidata deadlocks
[10:50:12] https://logstash.wikimedia.org/goto/c324309305e72647bb0b05872747836c
[10:50:41] Since 8:15
[10:50:50] I pinged Amir1
[10:51:08] I saw db1087 being detected as lagging from time to time
[10:51:16] probably due to that query
[10:51:29] although it moves between 1-2 seconds
[10:51:35] (the lag)
[10:51:36] Are dumps running?
[10:51:42] no
[10:51:52] it looks like a cron
[10:51:59] ah yes
[10:51:59] it's been running for 9h
[10:52:03] vslow
[10:52:26] yep
[12:16:21] marostegui: can we kill it now?
[12:16:39] it's not an important thing; in fact I want to suggest killing it entirely
[12:22:19] I hope the deadlocks didn't cause the script to fail migrating data
[12:29:39] FYI, I'm in the process of rolling out a new Icinga check which validates that the microcode mitigations for CPU vulns are correctly applied. I made a dry-run and it shows that dbproxy2001,2002,2004 (weirdly not 2003) are running a fixed kernel for MDS, but the mitigations are not visible in /proc/cpuinfo
[12:30:32] the kern.log entries are no longer in logs, so it's hard to tell whether it needs a firmware update (we have seen that for the puppet master hosts) or whether it's some other glitch
[12:30:55] I'll simply ack these when the check goes live (as I suppose those are not simple to reboot for further tests)
[13:06:15] Amir1: It finished as soon as I created the task, but if you think it shouldn't run...+1
[13:06:58] moritzm: we can reboot those anytime, they are not active
[13:07:21] I will get the PM approval soon and just disable it the way we disabled it for the other one
[13:07:31] thank you :)
[13:31:53] DBA, MediaWiki-Logging, Core Platform Team Workboards (Clinic Duty Team), Performance Issue, and 2 others: Page creation log cannot be viewed from oldest records, Fatal: "execution time limit of 60 seconds was exceeded" - https://phabricator.wikimedia.org/T237026 (Marostegui) This bad index choi...
[13:57:47] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (Ladsgroup) >>! In T232446#5659283, @jcrespo wrote: >> on our side (which stores and uses the most of stuff SDC use), we can safely move to another server, we don't do any joins with other tables in the code. > > It was actually w...
[14:06:03] DBA: Compress new Wikibase tables - https://phabricator.wikimedia.org/T232446 (jcrespo) > I'm in the wikibase team. Can you tell me who said it and where, maybe I'm missing something? Technically it's not possible but it's just matter of sending proper connection to the class and that's all. I didn't talk d...
[14:07:49] marostegui: ack, thx
[15:50:22] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[15:53:48] DBA: Productionize db213[2-5} - https://phabricator.wikimedia.org/T238183 (Marostegui) a:Papaul→Marostegui
[15:54:18] DBA, Operations, ops-codfw: (codfw):rack/setup/install db213[2-5] - https://phabricator.wikimedia.org/T237702 (Marostegui) Thanks @Papaul - the hosts look good, I will create another task to productionize them Thanks!
[16:43:17] DBA, Operations, ops-codfw: Upgrade db2072 firmware and bios - https://phabricator.wikimedia.org/T237905 (Papaul) a:Papaul→Marostegui Before BIOS Version 2.4.3 Firmware Version 2.40 After BIOS Version 2.10.5 Firmware Version 2.70
[16:47:13] DBA, Operations, ops-codfw: Upgrade db2072 firmware and bios - https://phabricator.wikimedia.org/T237905 (Marostegui) Open→Resolved Thanks - I have started MySQL (and run mysql_upgrade). Thanks Jaime for getting this host down for Papaul too!
[16:47:16] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
[16:47:23] DBA, Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (Marostegui)
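The 12:29 exchange about dbproxy2001/2002/2004 describes a host whose kernel has the MDS fix while /proc/cpuinfo does not show the mitigation. The following is a minimal sketch of that kind of test, assuming x86 Linux, where microcode carrying the MDS fix advertises the `md_clear` flag on every processor's `flags` line. This is NOT the Icinga check mentioned in the log, whose code does not appear here, and the function name `mds_microcode_present` is hypothetical.

```python
from pathlib import Path


def mds_microcode_present(cpuinfo_text: str) -> bool:
    """Hypothetical helper: True only if every CPU's flags line lists md_clear.

    Assumption: on x86 Linux, microcode with the MDS fix exposes the
    `md_clear` flag in /proc/cpuinfo; a fixed kernel alone does not add it.
    """
    # Collect the flag tokens from each processor's "flags : ..." line.
    flag_lines = [
        line.split(":", 1)[1].split()
        for line in cpuinfo_text.splitlines()
        if line.startswith("flags")
    ]
    # No flags lines at all (e.g. a non-x86 host) counts as "not present".
    return bool(flag_lines) and all("md_clear" in flags for flags in flag_lines)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        ok = mds_microcode_present(cpuinfo.read_text())
        print("md_clear present on all CPUs" if ok else "md_clear missing")
```

On kernels that carry the MDS patches, /sys/devices/system/cpu/vulnerabilities/mds gives a cross-check. According to the kernel's MDS admin guide, a patched kernel running without updated microcode reports a state like "Vulnerable: Clear CPU buffers attempted, no microcode", which matches the firmware-update suspicion raised at 12:30.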