[05:55:09] DBA, MediaWiki-API, MediaWiki-Database, MW-1.29-release-notes, and 4 others: ApiQueryExtLinksUsage::run query has crazy limit - https://phabricator.wikimedia.org/T59176#2875574 (Krinkle)
[07:13:47] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2875604 (Marostegui) We can try to: 1) mysqldump that table and place it somewhere else 2) try the alter if it works, it is clearly a bug although we thought that forcing a table rebuild (COPY) always fixed the issue, so it is very wor...
[07:16:25] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2875607 (Marostegui) No worries - I have extended the downtime for the lag checks until Monday. However if it finishes before th...
[07:22:04] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2875608 (Marostegui) Oh, I just read correctly that you are creating AN EMPTY TABLE!!! :o
[07:44:38] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2875617 (Marostegui) db1049 (master is done) ``` root@neodymium:~# mysql -hdb1049 -A dewiki -e "show create table revision\G" *************************** 1. row *************************** Table: revis...
[07:46:42] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2875620 (Marostegui) dewiki.revision is now unified. We agreed on not doing labs/db1069 servers so I believe this is ready to be closed. ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in `cat...
[07:47:01] DBA: Rampant differences in indexes on enwiki.revision across the DB cluster - https://phabricator.wikimedia.org/T132416#2875622 (Marostegui)
[07:47:02] DBA, Patch-For-Review: Fix PK on S5 dewiki.revision - https://phabricator.wikimedia.org/T148967#2875621 (Marostegui) Open>Resolved
[07:47:50] Blocked-on-schema-change, DBA, Collaboration-Team-Triage, Flow, and 2 others: Add primary keys to remaining Flow tables - https://phabricator.wikimedia.org/T149819#2875623 (Marostegui) a:Marostegui
[07:53:26] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2875632 (Marostegui) s7 has been imported correctly into dbstore2001 and it is now catching up with the master.
[08:25:00] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2875658 (Marostegui) Alter running on db1071
[08:42:49] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2875665 (Marostegui) I have been trying a few different things but I cannot really see how that duplicate key is possible.
[09:22:03] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2875714 (Marostegui) db1071 done ``` root@neodymium:~# mysql -hdb1071 -A wikidatawiki -e "show create table revision\G" *************************** 1. row *************************** T...
[09:23:28] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2875719 (Marostegui) I am trying to reproduce this issue in a 10.0.28 version with the same data.
[09:32:16] DBA, Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2875757 (jcrespo) This maintenance T152761#2874723 would explain the extra resource usage.
[09:34:34] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2875764 (Marostegui) Alter running on db1070
[09:35:43] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2875769 (jcrespo) It is labsdb1001, it could be a 10.0.15-specific bug.
[10:21:47] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2875867 (Marostegui) db1070 is done ``` root@neodymium:~# mysql -hdb1070 -A wikidatawiki -e "show create table revision\G" *************************** 1. row ***************************...
[10:27:41] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#2024851 (jcrespo) a:Dereckson>jcrespo
[10:50:16] DBA: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300#2875886 (Marostegui)
[10:50:35] DBA: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300#2875904 (Marostegui) p:Triage>Low
[11:59:58] DBA, Wikimedia-Site-requests, Patch-For-Review: Recreate a wiki for Wikimedia Portugal - https://phabricator.wikimedia.org/T126832#2876087 (Dereckson) Thanks for taking care of that, Jaime. @Alchimista So you can plan a little bit, we're on the last deployment day of the year before the code freeze, so a l...
[14:02:29] DBA: Unknown cause is creating lag on db1048 under write load (but not on the other m3 slaves) - https://phabricator.wikimedia.org/T151039#2876333 (Marostegui) There has been no lag for the last 7 days: https://grafana.wikimedia.org/dashboard/db/mysql?var-dc=eqiad%20prometheus%2Fops&var-server=db1048&from=no...
[14:03:49] DBA: Unknown cause is creating lag on db1048 under write load (but not on the other m3 slaves) - https://phabricator.wikimedia.org/T151039#2876336 (jcrespo) Open>Resolved a:Marostegui We suspect this is not infrastructure-caused, but app layer. So let's resolve it.
[14:04:33] we should increase (with time) thread_pool_size to 64 globally
[14:07:47] Ah I see…we have 32
[14:14:36] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2876344 (Marostegui) This is definitely the table itself. I have tried importing that table (taken with a mysqldump) into a brand new 10.0.28 server and I got ``` "ERROR 1062 (23000) at line 4362: Duplicate entry '0-Merah_Putih' for key...
[14:26:45] DBA: duplicate key problems - https://phabricator.wikimedia.org/T151029#2876390 (jcrespo) Plan: stop replication with other server in sync; recreate the table from 0; restart replication.
[14:27:27] ^ how do you plan to recreate the table? just copying it from another server?
[14:51:06] Question: the tables that exist on x1 for each wiki are not supposed to exist on the wiki itself. i.e. zhwiki, which lives in s3, has a database in x1, but those tables in x1 aren't in the s3 copy. That is the way it is, right?
[14:51:58] Because I see for instance: aawiki has the same tables in its s3 instance as in x1 (but empty)
[14:54:15] that is mediawiki
[14:54:23] there are many tables on mediawiki, the product
[14:54:36] that we do not use (e.g. jobqueue vs. redis)
[14:54:53] text, that we actually use, but to store a reference to ES servers
[14:55:11] in some cases, those may be in use
[14:55:21] what I am afraid of is copying data from x1 that will overwrite valid data on the wikis themselves
[14:55:30] the only tables on x1 databases are called: echo_blabla
[14:55:39] ah, I see
[14:55:45] they are just 5 tables
[14:56:03] one thing I did in the past
[14:56:16] I can do an easy find actually to see if any of those tables are in the current dbstore2001 wikis and with size larger than 0 :)
[14:56:17] is to replicate only wikishared and echo table
[14:56:37] of wikishared and flow
[14:56:39] *or
[14:56:50] to avoid accidental overwrites
[14:57:11] yeah, all of them are actually "echo" tables
[14:57:23] in some cases, echo is installed locally
[14:57:40] you have to go to the config to see which one is the canonical
[14:58:41] apart from flow and wikishared, the rest of the tables are only "echo" tables
[14:59:18] https://noc.wikimedia.org/conf/highlight.php?file=flowprivate.dblist
[14:59:37] ^these are the ones that are dangerous to overwrite
[15:00:20] actually, it is on: https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php
[15:00:38] / Use separate database on extension1 cluster for all non-private wikis.
[15:00:41] // Use separate database on extension1 cluster for all non-private wikis.
[15:01:58] mmm interesting
[15:02:32] there is also https://phabricator.wikimedia.org/T119154
[15:05:13] Yeah a quick find reveals that there are not many that are actually used, for instance, metawiki uses them or kowiki
[15:06:11] jynus: I have sent you a mail about our problem from yesterday, but it's not urgent. Thank you
[15:19:37] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2876587 (Marostegui) alter running on db1092
[15:22:34] I saw your comment on dbstore2001 icinga's check about s7 being delayed 24 intentionally…and got me confused
[15:26:57] ?
[15:27:11] well, that is supposed to be delayed
[15:27:14] it has to be
[15:27:16] oh, sorry, your comment is from Feb
[15:27:17] XD
[15:27:32] it will be as soon as we can do it without the events
[15:27:39] yeah yeah
[15:27:45] but I was confused like: when did we do it?
[15:27:50] but I just realised it was from february
[15:46:19] DBA, Patch-For-Review: Import S2,S6,S7,m3 and x1 to dbstore2001 and dbstore2002 - https://phabricator.wikimedia.org/T151552#2876669 (Marostegui) x1 needs a bit more coordination than just the mysqldump from db2033 as there are tables existing in both, dbstore2001 and db2033. So far, a find reveals that t...
[15:47:49] DBA, Patch-For-Review: Wikidatawiki revision table needs unification - https://phabricator.wikimedia.org/T150644#2876673 (Marostegui) db1092 is done ``` root@neodymium:/home/marostegui/git/software/dbtools# mysql -hdb1092 -A wikidatawiki -e "show create table revision\G" *************************** 1. r...
[18:09:30] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#2877405 (jcrespo) I have enabled TLS on neodymium and sarin, but because the mysql clients there are not using OpenSSL, clients will fail with: ``` ERROR 2026 (HY000): SSL connection err...
[18:11:04] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2877415 (kaldari) Open>Resolved a:kaldari @Marostegui: The script is finished. Feel free to reinstate the lag checks.
[18:12:47] DBA, Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2877427 (kaldari) Open>Resolved The script is finally finished.
[18:13:32] DBA, Performance: s3 database resource usage and contention increased 2-10x times - https://phabricator.wikimedia.org/T153184#2877431 (jcrespo) Confirmed that was the cause: https://grafana-admin.wikimedia.org/dashboard/db/mysql-aggregated?from=1481220766259&to=1481825566259&var-dc=eqiad%20prometheus%2Fo...
[18:20:51] DBA, Epic, Patch-For-Review: Decouple roles from mariadb.pp into their own file - https://phabricator.wikimedia.org/T150850#2877460 (jcrespo) Can you see^, @Dzahn? Slowly, but steadily.
[18:48:37] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2877682 (Papaul) @Marostegui HP tech just left site and replaced the main-board once again also installed HP service Pack on the server. We can crash it again. Thanks.
[18:50:07] DBA, MediaWiki-Database, Operations: db1028 increased lag after extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php - https://phabricator.wikimedia.org/T152761#2877690 (Marostegui) Thanks for the heads up - I have now removed the downtimes.
[18:51:05] DBA, Patch-For-Review: db2034: investigate its crash and reimage - https://phabricator.wikimedia.org/T149553#2877706 (Marostegui) Thanks for the heads up @Papaul, I will do it tomorrow morning and report the results
[20:11:09] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2878046 (ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` The log can be found in `/var/log/wmf-auto-reimage/2...
[20:30:08] DBA, Operations, ops-eqiad: Multiple hardware issues on db1073 - https://phabricator.wikimedia.org/T149728#2878062 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` Of which those **FAILED**: ``` set(['db1073.eqiad.wmnet']) ```
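Note on the x1 overlap check discussed between 14:55 and 15:05: the worry was that importing x1 could overwrite tables that already hold data on the destination host. The log does not show the actual "find" that was run, so the following is only a rough sketch of the same idea. It assumes the table inventory has already been collected by some other means into plain Python structures. The function name, the wiki names, the table names and the sizes are all hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple

# (database, table), e.g. ("wiki_a", "echo_event"). Names are placeholders.
TableKey = Tuple[str, str]


def overlapping_nonempty_tables(
    local_sizes: Dict[TableKey, int],
    x1_tables: List[TableKey],
) -> List[TableKey]:
    """Return tables present in both inventories that already hold data locally.

    local_sizes maps each table on the destination host to its size in bytes.
    x1_tables lists the tables that the x1 import would bring in.
    """
    x1_set = set(x1_tables)
    return sorted(
        key for key, size in local_sizes.items()
        if key in x1_set and size > 0
    )


# Hypothetical inventory, not taken from any real server.
local = {
    ("wiki_a", "echo_event"): 4096,        # exists in both, has data
    ("wiki_b", "echo_event"): 0,           # exists in both, but empty
    ("wiki_c", "echo_notification"): 1024, # exists in both, has data
    ("wiki_c", "revision"): 2048,          # not part of x1 at all
}
x1 = [
    ("wiki_a", "echo_event"),
    ("wiki_b", "echo_event"),
    ("wiki_c", "echo_notification"),
]

risky = overlapping_nonempty_tables(local, x1)
for database, table in risky:
    print(f"{database}.{table} would be overwritten")
```

Tables that are empty locally are skipped on purpose, since overwriting them loses nothing. Which copy is canonical for the remaining ones still has to be confirmed from the configuration, as noted at 14:57:40.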