[06:17:39] DBA, Growth-Team, MediaWiki-Watchlist, MW-1.36-notes (1.36.0-wmf.25; 2021-01-05), Wikimedia-production-error: ClearUserWatchlistJob/WatchedItemStore::removeWatchBatchForUser bad database peformance on enwiki and others, causing database lag - https://phabricator.wikimedia.org/T270481 (Marosteg...
[06:23:24] PROBLEM - MariaDB sustained replica lag on db2140 is CRITICAL: 1.072e+05 ge 2 https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Replication_lag https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-server=db2140&var-port=9104
[06:29:06] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui)
[06:29:32] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) p:Triage→Medium
[06:32:56] DBA: db2078 m1 mysqld process crashed - https://phabricator.wikimedia.org/T270877 (Marostegui) p:Triage→Medium
[06:50:39] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) @CDanis I have been trying to add db2144 as the first slave on x2, but I am getting errors with the validation of `x2` as an accepted value: ` The modified object fails validation: 'x2' does not mat...
[06:57:20] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[07:10:04] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[07:17:44] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[07:49:32] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[07:50:39] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[08:13:28] Morning :)
[08:38:15] hey sobanski
[08:39:13] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[08:39:58] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[08:40:27] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[08:41:51] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui) On s3 it is only present on: ` +--------------+ | table_schema | +--------------+ | aawiki | | testwiki | +--------------+ 2 rows in set (0.051 sec) `
[08:42:27] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui)
[08:42:41] Data-Persistence-Backup: Run check table periodically on backup source hosts - https://phabricator.wikimedia.org/T265866 (Marostegui)
[08:42:44] DBA, Data-Persistence-Backup: Drop table profiling from WMF wiki mariadb servers - https://phabricator.wikimedia.org/T266125 (Marostegui) Open→Resolved This is completed
[08:42:47] DBA, Epic, Tracking-Neverending: Database tables to be dropped on Wikimedia wikis and other WMF databases (tracking) - https://phabricator.wikimedia.org/T54921 (Marostegui)
[08:51:51] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[08:52:32] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[08:52:58] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui) @Ladsgroup this looks good? ` mysql:root@localhost [labtestwiki]> show create table user_properties\G *************************** 1. row *********************...
[08:55:49] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[08:55:58] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[08:56:00] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui)
[08:56:30] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui) This schema change cannot be done ONLINE as it changes the datatype of the column. However, the table looks small enough (at least on `enwiki`) that it can be applied directly...
[08:57:30] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui) @Ladsgroup looks good?: ` mysql:root@localhost [labtestwiki]> show create table content_models\G *************************** 1. row *************************** Table: co...
[08:57:42] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui)
[09:04:27] DBA, Patch-For-Review: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (Marostegui) No errors on db1154 after 10 days
[09:20:54] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Ladsgroup) LGTM
[09:22:35] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui) Thanks I will deploy it to s6 and let it run for a couple of days to make sure nothing strange comes up.
[09:27:24] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Ladsgroup) LGTM. Thanks!
[09:30:28] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui) Thanks I will deploy it to s6 and let it run for a couple of days to make sure nothing strange comes up.
[09:38:25] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui) s6 is done, let's see if we find issues: ` # /home/marostegui/section s6 | while read host port; do echo "$host:$port"; mysql.py -h$host:$port ruwiki -e "show create table cont...
[09:38:37] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui)
[09:50:42] Blocked-on-schema-change, DBA: Increase size of content_models.model_id - https://phabricator.wikimedia.org/T270053 (Marostegui)
[09:57:47] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui) s6 codfw is done: ` # /home/marostegui/section s6 | grep codfw | while read host port; do echo "$host:$port"; mysql.py -h$host:$port ruwiki -e "show create t...
[09:58:08] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[09:59:54] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui) s6 eqiad: [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [] db1140 [] db1139 [] db1131 [] db1125 [] db1113 [] db1098 [x] db1096 [] db1...
[10:02:24] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui) db1096 on eqiad is altered, I am going to leave it for a couple of days to make sure no queries are forcing that index: ` # for i in frwiki jawiki ruwiki; do e...
[10:03:12] Blocked-on-schema-change, DBA: Schema change for renaming user_properties_property index - https://phabricator.wikimedia.org/T270187 (Marostegui)
[11:31:06] marostegui: Happy new year, ping about this :D https://gerrit.wikimedia.org/r/c/operations/puppet/+/642649
[11:31:18] Amir1: hoi!
[11:31:20] checking
[11:31:46] Amir1: ah yes
[11:31:50] Do you want me to merge?
[11:32:12] yes please :D
[11:32:33] it should be noop
[11:32:43] famous last words!
[11:34:23] Amir1: merged, running puppet manually on dbmonitor1001 now
[11:34:52] Thanks. There's a couple hundreds left :D
[11:35:00] Info: Applying configuration version '(728ffeef14) Marostegui - tendril: Migrate hiera() to lookup() and setting datatype'
[11:35:00] Notice: Applied catalog in 16.40 seconds
[11:35:01] all good
[11:35:08] noop as expected
[11:35:10] \o/
[11:35:42] \o\ |o| /o/
[14:31:41] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui)
[14:31:52] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui) p:Triage→Medium
[14:32:42] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui) codfw ones should be easy, the rest would need depooling and we should probably take the opportunity to upgrade kernels if there're pending updates.
[14:54:56] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui)
[15:32:44] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Papaul) @Marostegui yes we can swap the DIMM and see . You can depool the server when you can and let me know.
[15:34:56] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) @Papaul server off!
[15:43:42] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Papaul) @Marostegui swapped A7 with B6 , clear the IDRAC log no more errors for now
[15:46:03] DBA, Operations, ops-codfw: db2140 crashed due to HW memory errors - https://phabricator.wikimedia.org/T271084 (Marostegui) Thanks Papaul! I am going to check the data and will close the task once I am done. If it happens again we can reopen and ask Dell for a replacement
[17:04:31] Data-Persistence-Backup, Wikibugs: wikibugs test bug part II - https://phabricator.wikimedia.org/T90594 (Legoktm) Testing https://gerrit.wikimedia.org/r/c/labs/tools/wikibugs2/+/644529
[17:04:54] sobanski: ^^ seems to work
[17:05:31] legoktm: thanks!
[18:20:09] DBA, GrowthExperiments, Growth-Team (Current Sprint), Patch-For-Review, and 2 others: Slow load times for Special:Homepage on cswiki - https://phabricator.wikimedia.org/T267216 (kostajh) >>! In T267216#6689751, @MMiller_WMF wrote: > A part of this work, which was a collaboration with the Search t...
[19:05:34] DBA, SRE-tools: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski)
[19:34:52] DBA, SRE-tools: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski)
[19:37:21] DBA, SRE-tools, IPv6: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (Aklapper)
[19:42:19] Data-Persistence-Backup, SRE-tools: Some Data Persistence Backup clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271148 (LSobanski)
[19:43:03] DBA, SRE-tools, IPv6: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski) I moved the backup hosts to https://phabricator.wikimedia.org/T271148.
[19:43:21] Data-Persistence-Backup, SRE-tools: Some Data Persistence Backup clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271148 (LSobanski)
[19:43:49] Data-Persistence-Backup, SRE-tools: Some Data Persistence Backup clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271148 (LSobanski)
[19:44:20] DBA, SRE-tools, IPv6: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski)
[19:47:37] DBA, SRE-tools, IPv6: Some Data Persistence clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski) p:Triage→Medium
[19:47:46] Data-Persistence-Backup, SRE-tools: Some Data Persistence Backup clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271148 (LSobanski) p:Triage→Medium
[19:48:06] DBA, SRE-tools, IPv6: Some Data Persistence DB clusters apparently do not support IPv6 - https://phabricator.wikimedia.org/T271140 (LSobanski)
[22:11:31] DBA, Beta-Cluster-Infrastructure, Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Legoktm)
[22:57:43] DBA, Beta-Cluster-Infrastructure, Operations, serviceops, Patch-For-Review: Upgrade the MediaWiki servers to ICU 63 - https://phabricator.wikimedia.org/T264991 (Legoktm)