[01:05:32] DBA, MediaWiki-extensions-Translate, Privacy Engineering, Language-Team (Language-2021-January-March), and 3 others: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist - https://phabricator.wikimedia.org/T272428 (JFishback_WMF) From a privacy perspective, if this table is merely cach...
[05:44:13] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[05:59:28] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[06:01:45] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[06:02:09] DBA, Patch-For-Review: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui) Pre failover steps are done
[06:03:29] DBA, MediaWiki-extensions-Translate, Privacy Engineering, Language-Team (Language-2021-January-March), and 3 others: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist - https://phabricator.wikimedia.org/T272428 (Marostegui) @abi_ let's make it private then?
[06:22:20] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (AndyRussG)
[06:23:03] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (AndyRussG)
[06:23:27] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (AndyRussG)
[06:27:47] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (Marostegui) p:Triage→Medium
[07:10:39] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[07:14:29] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui) This was done successfully. read only on: 07:00:37, read only off: 07:01:52, total read only time: 1:15 minutes
[07:15:01] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[07:36:38] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui) The update to zarcillo database was done manually as it failed, we are investigating and following up on irc about it. Not a big deal.
[07:38:59] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[07:44:24] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui)
[07:44:50] DBA: Switchover s4 (commonswiki) from db1081 to db1138 - https://phabricator.wikimedia.org/T271427 (Marostegui) Open→Resolved Thanks @jcrespo and @Kormat for the support!
[07:44:53] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui)
[07:44:55] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[07:44:57] DBA, SRE: db1080-95 batch possibly suffering BBU issues - https://phabricator.wikimedia.org/T258386 (Marostegui)
[07:49:01] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Marostegui)
[07:49:16] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (Marostegui) p:Triage→Medium
[07:49:17] there we go kormat ^ :)
[07:53:08] you will be able to update db1081 later, right, too? https://phabricator.wikimedia.org/T266483#6772425
[07:57:11] DBA: Fix db-switchover update zarcillo part - https://phabricator.wikimedia.org/T272954 (jcrespo) I will add for context that the core reason for this happening is the required low level of consistency of TokuDB, if we get rid of TokuDB (e.g. moving the db) the source of the problems would disappear. But I a...
[07:57:24] jynus: yeah, but that host is going away as soon as I clone db1160 :)
[07:57:42] oh, I see, understood
[08:05:55] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1160.eqiad.wmnet'] ` The log ca...
[08:18:24] DBA, MediaWiki-extensions-Translate, Privacy Engineering, Language-Team (Language-2021-January-March), and 3 others: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist - https://phabricator.wikimedia.org/T272428 (Nikerabbit) >>! In T272428#6776281, @Marostegui wrote: > @abi_ let's ma...
[08:26:02] DBA, SRE: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1160.eqiad.wmnet'] ` and were **ALL** successful.
[09:04:55] DBA: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Marostegui)
[09:05:10] DBA: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Marostegui) p:Triage→Medium
[09:11:23] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:23:55] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) Procedure: Pre restart [] Silence m3 hosts [] buffer pool dump + disablement in advance to make the restart faster Restart [] `!log m3 master restart - T272596` [] set phabricato...
[09:28:46] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:28:48] DBA, Wikimedia-Etherpad: Upgrade and restart m1 master (db1080) - https://phabricator.wikimedia.org/T271540 (Marostegui)
[09:29:02] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:29:06] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui)
[09:29:31] DBA, wikitech.wikimedia.org, User-notice, cloud-services-team (Kanban): Restart m5 master (db1128) - https://phabricator.wikimedia.org/T272388 (Marostegui)
[09:29:33] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:36:21] DBA, Patch-For-Review: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Urbanecm)
[09:36:29] DBA, MediaWiki-extensions-Translate, Privacy Engineering, Language-Team (Language-2021-January-March), and 3 others: Error 1146: Table 'mediawikiwiki.translate_cache' doesn't exist - https://phabricator.wikimedia.org/T272428 (Urbanecm)
[09:37:09] DBA, Patch-For-Review: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Urbanecm) This is not a subtask of a former train blocker, removing relationship.
[09:38:37] DBA, Patch-For-Review: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Kormat) Open→Resolved a:Kormat Table added to $private_tables, change pushed, sanitarium mariadbs restarted, and changes confirmed to be l...
[09:41:15] DBA, Patch-For-Review: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Marostegui) Thanks @Kormat! @abi_ @Nikerabbit can you please let us know when the table is created so we can double check it is indeed not replicated?
[10:29:27] DBA: Mark mediawikiwiki.translate_cache as private so it doesn't replicate to wiki replicas - https://phabricator.wikimedia.org/T272957 (Nikerabbit) Creation of the table is not high priority for us as it's not yet needed on production. That's why I tried to keep that discussion out of the train blocker task.
[10:55:01] DBA, OTRS, Performance-Team, Recommendation-API: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui)
[10:57:45] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui) p:Triage→Medium @akosiaris @hnowlan @MoritzMuehlenhoff @kostajh #performance-team #research I would like to propose Wednesday 5th Feb...
[10:59:53] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (hnowlan) No objections from us on Sockpuppet.
[11:03:14] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (kostajh) >>! In T272964#6776773, @Marostegui wrote: > @akosiaris @hnowlan @MoritzMuehlenhoff @kostajh #performance-team #research I would like to propose...
[11:03:27] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (akosiaris) Fine by me, I 'll keep an eye on OTRS and recommendation api.
[11:08:09] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (MoritzMuehlenhoff) >>! In T272964#6776773, @Marostegui wrote: > @akosiaris @hnowlan @MoritzMuehlenhoff @kostajh #performance-team #research I would like...
[11:12:16] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui) Thank you guys for the fast responses! Going to schedule it for Wednesday 5th Feb at 09:00AM UTC then - @dpifke if this doesn't work for `xh...
[11:12:21] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui)
[11:13:08] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui)
[11:13:46] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui) I checked the 2020 calendar (I wonder why.....), Wednesday is 3rd of Feb, not 5th :)
[11:16:30] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui)
[11:18:51] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui) Maintenance window booked on the deployment calendar
[11:51:42] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1160 is ready to replace db1081. Leaving it to replicate for 24h before pooling it.
[12:09:50] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui) m3 cleaned.
[13:36:03] DBA, mariadb-optimizer-bug: Investigate possible optimizer regression on 10.4.17 with DELETE statements - https://phabricator.wikimedia.org/T268457 (Marostegui) There [[ https://jira.mariadb.org/browse/MDEV-24266?focusedCommentId=178319&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpa...
[13:55:05] jynus: great suggestions on the db-switchover CR, thank you :)
[13:55:17] "SET STATEMENT" is very neat
[13:57:59] I already voted +1, so it was just real comments
[13:58:36] I hadn't used that a lot but I thought it would be nice to mention it, as we have a few places where we can start using it
[15:14:39] DBA, Platform Engineering Roadmap Decision Making, SRE, Performance-Team (Radar), User-Kormat: Remove groups from db configs - https://phabricator.wikimedia.org/T263127 (Krinkle) @Kormat @Marostegui I believe this is unblocked now for you to remove groups from the db configuration. At this...
[15:20:21] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Krinkle)
[15:58:58] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (dpifke)
[15:59:33] DBA, OTRS, Performance-Team, Recommendation-API, Research: Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (dpifke) No objection here.
[16:00:56] DBA, OTRS, Recommendation-API, Research, Performance-Team (Radar): Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (dpifke)
[16:13:07] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Andrew) @Marostegui that is indeed more complicated than I was expecting! Does that imply downtime or other effects on non-wikitech wikis? If it makes it any easier,...
[16:15:06] I just added a couple of extra metrics we had as gaps on the mysql dashboard: max connections and access denied https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=10&orgId=1
[16:16:23] as I realized those were plotted on the aggregated but not on the individual graph
[16:18:51] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui) >>! In T167973#6777519, @Andrew wrote: > @Marostegui that is indeed more complicated than I was expecting! Does that imply downtime or other effects on no...
[16:25:06] marostegui, we were missing access denied from this, now they show up: https://grafana.wikimedia.org/d/000000273/mysql?viewPanel=10&orgId=1&var-server=es1032&var-port=9104&from=1611667503442&to=1611678303442
[16:25:51] what's 0.8 access denied? XD
[16:25:56] per second
[16:26:18] I can change the units
[16:26:23] mmm what is having access denied?
[16:26:26] but if we have 10000 / s
[16:26:34] is when we want to worry about it
[16:26:50] as in, what it means exactly the metric, or what is causing it?
[16:27:38] what is causing it
[16:27:59] this is a proxysql instance I had some years ago and now I am trying to find :-)
[16:28:02] I think
[16:28:06] I am working on it
[16:28:30] XDDD
[16:29:49] I saw it on the global graph, and now I can find it thanks to the per-instance graph
[16:30:41] could be also tendril, I will update you when I find it, and in the meantime fixing a few dashboard bugs
[16:39:46] we have log_warnings=0, recommended value is 1 and 2 for extra errors (access errors), I may create a ticket to evaluate increasing it (but not important right now)
[16:44:04] and we must have quantum effects on servers, because now that I am monitoring the logs, we get no errors :-(
[16:47:07] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (Marostegui)
[16:53:35] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (Marostegui)
[16:53:43] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campign types feature - https://phabricator.wikimedia.org/T272953 (Marostegui) testwiki is done: ` # mysql.py -hdb1123 testwiki -e "show create table cn_notice_log\G show create table cn_notices\G" *******...
[16:54:42] I found it
[16:55:35] * RhinosF1|NotHere is blind today
[16:57:56] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Andrew) What steps can I take to get this on your goals list for a future quarter? I've tried pinging @LSobanski but he seems not to be following. We are making ongo...
[17:07:18] DBA, wikitech.wikimedia.org: Move database for wikitech (labswiki) to a main cluster section - https://phabricator.wikimedia.org/T167973 (Marostegui) @Andrew @lsobanski is out on vacation - he will get back to you once we've discussed this.
[17:11:23] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) New DIMM has been dispatched for the server I will coordinate a time with you to power down to restore the original configuration.
[17:12:13] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Marostegui) Sounds good @Cmjohnson let me know when it arrives and you plan to change it so I can stop mysql Thank you
[17:37:08] DBA, SRE, Continuous-Integration-Config, Release-Engineering-Team (CI & Testing services), and 2 others: Create integration test env for wmfmariadbpy - https://phabricator.wikimedia.org/T265266 (thcipriani)
[18:17:01] DBA, mariadb-optimizer-bug: Investigate possible optimizer regression on 10.4.17 with DELETE statements - https://phabricator.wikimedia.org/T268457 (Marostegui) So far 10.4.18 doesn't seem to change the query plan and keeps picking the wrong one. The file with the sanitized data has been sent to MariaDB...
[21:09:33] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campaign types feature - https://phabricator.wikimedia.org/T272953 (Reedy)