[07:05:41] DBA, Gerrit, Operations, Patch-For-Review, Release-Engineering-Team (Next): Gerrit is failing to connect to db on gerrit2001 thus preventing systemd from working - https://phabricator.wikimedia.org/T176532#3628916 (elukey) Just added a week of downtime to gerrit2001 since icinga was spamming.
[07:10:33] DBA, Operations, ops-eqiad: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#4022894 (Marostegui) @Cmjohnson let us know if you have time to do this sometime this week. Thanks!
[07:40:29] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022936 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th...
[07:51:53] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022944 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` Of which those **FAILED**: ``` ['db1073.eqiad.wmnet'] ```
[07:55:40] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022950 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th...
[07:58:17] DBA, MediaWiki-extensions-WikibaseRepository, Wikidata, Patch-For-Review, Wikidata-Sprint-2018-02-28: Consider dropping the "wb_items_per_site.wb_ips_site_page" index - https://phabricator.wikimedia.org/T179793#4022954 (thiemowmde)
[08:09:17] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022961 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th...
[08:35:39] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4022999 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` and were **ALL** successful.
[09:12:03] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4023055 (Marostegui)
[09:12:04] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4023056 (Marostegui)
[10:02:03] Blocked-on-schema-change, Reading List Service, Reading-Infrastructure-Team-Backlog (Kanban): Deploy ReadingLists schema change for efficient count(*) handling - https://phabricator.wikimedia.org/T188048#4023176 (jcrespo) https://wikitech.wikimedia.org/w/index.php?title=Deployments&action=historysubm...
[10:12:25] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4023189 (ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on neodymium.eqiad.wmnet for hosts: ``` ['db1073.eqiad.wmnet'] ``` Th...
[10:13:03] Blocked-on-schema-change, Reading List Service, Reading-Infrastructure-Team-Backlog (Kanban): Deploy ReadingLists schema change for efficient count(*) handling - https://phabricator.wikimedia.org/T188048#4023190 (jcrespo) a:jcrespo
[10:28:30] Blocked-on-schema-change, Reading List Service, Reading-Infrastructure-Team-Backlog (Kanban): Deploy ReadingLists schema change for efficient count(*) handling - https://phabricator.wikimedia.org/T188048#4023207 (jcrespo) a:jcrespo>Tgr This has been applied to production: ``` $ grep -v dbstor...
[10:32:29] DBA, Operations, Patch-For-Review: Setup newer machines and replace all old misc (m*) and x1 eqiad machines - https://phabricator.wikimedia.org/T183469#4023215 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db1073.eqiad.wmnet'] ``` and were **ALL** successful.
[10:39:45] DBA, Schema-change: Drop externallinks.el_from_namespace on wmf databases - https://phabricator.wikimedia.org/T114117#4023224 (jcrespo) Removing blocked because as far as I can see, this is not blocking anything or anyone (people should just not use the column until it is deleted). This doesn't mean it s...
[10:41:32] DBA, Collaboration-Team-Triage, MediaWiki-extensions-PageCuration, Schema-change: Drop ptrl_comment in production - https://phabricator.wikimedia.org/T157762#4023228 (jcrespo) As per comment, there is no one blocked there, this is a regular DBA cleanup task. Thanks for reporting- it helps making t...
[10:43:29] DBA, Schema-change: Apply enum changes to (img|oi|fa)_major_mime on production - https://phabricator.wikimedia.org/T72005#4023240 (jcrespo)
[10:54:05] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#4023255 (jcrespo)
[10:55:21] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#2665708 (jcrespo)
[10:56:57] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#2665708 (jcrespo) Please add the summary as similar as the one suggested at https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change whenev...
[10:58:22] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#4023271 (jcrespo)
[10:58:24] DBA: truncate l10n_cache table on WMF wikis - https://phabricator.wikimedia.org/T150306#4023270 (jcrespo)
[10:58:56] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#2665708 (jcrespo)
[11:00:17] Blocked-on-schema-change, MediaWiki-Database, Patch-For-Review: Add a primary key to l10n_cache - https://phabricator.wikimedia.org/T146591#2665708 (jcrespo)
[11:28:10] Blocked-on-schema-change, Reading List Service, Reading-Infrastructure-Team-Backlog (Kanban): Deploy ReadingLists schema change for efficient count(*) handling - https://phabricator.wikimedia.org/T188048#4023311 (Tgr) Open>Resolved a:Tgr>jcrespo Thanks! >>! In T188048#4023207, @jcres...
[11:43:27] jynus: marostegui hey, I'm working on two things for wikidata atm, you know about one of them (reducing the logging table size everywhere) but this is another case.
Optimizing wb_terms table to reduce its size and speed up lookups. https://phabricator.wikimedia.org/T188279 I put up a document for that: https://docs.google.com/document/d/1Op3GHFS0wOXYATA8EC92WquQVNAJR1jH-a0a4KaB-B8 The reason I'm asking you is that we probably
[11:43:27] start a new table called (maybe) wb_terms_new and read from both tables for a while until the new table is fully populated and the old table got shrunk (moving rows from the old table to the new table gradually) and then drop the old table altogether
[11:44:00] Does this work for you? Do you think I should optimize it column by column instead?
[11:45:16] (we are in a meeting)
[11:47:31] same here, but mine is crowded, no one noticed (yet) :D
[12:11:20] please don't advance much on wb_terms_new (in terms of code and final decisions) until you get more input from us
[12:11:51] that could be easier to manage, but it needs more resources in cpu and disk space and io, which we may not have
[12:16:40] jynus: sure, that's why I'm reaching out to you
[12:21:06] Amir1: The idea looks good to me, but as jynus said, we need to make sure we have enough resources to support both, enough io to support reads from both tables, and space of course to handle both
[12:21:20] the idea doesn't look good to me
[12:22:03] rewriting the wb_terms at application level may not be better than running an alter table and maintaining a full duplicate of it
[12:22:24] I am not sure if we can afford an alter that rebuilds the table
[12:22:36] how big is the table now
[12:22:37] let's see
[12:23:13] I am not saying we can afford that, I am literally saying:
[12:23:24] "rewriting the wb_terms at application level may not be better than running an alter table"
[12:23:41] Jesus, 818G :|
[12:24:39] marostegui: don't worry, MCR will make that and the revision table a single table
[12:24:47] XDDDDD
[12:24:52] I am not kidding
[12:25:10] DBA, Data-Services, Tracking: Wikireplica service for tools and labs - issues and missing
available views (tracking) - https://phabricator.wikimedia.org/T150767#4023474 (aborrero)
[12:25:12] DBA, Cloud-VPS, Data-Services, cloud-services-team (Kanban): Add page_props.pp_value index to Wiki Replicas - https://phabricator.wikimedia.org/T140609#4023472 (aborrero) Open>stalled DBAs are currently busy and can't take a look at this. Will revisit later.
[12:28:51] DBA: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4023480 (Marostegui)
[12:29:28] DBA: Decommission db1011 - https://phabricator.wikimedia.org/T184703#3892852 (Marostegui)
[12:45:28] DBA, Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4023513 (Marostegui)
[12:46:49] DBA, Operations, ops-eqiad, Patch-For-Review: Decommission db1011 - https://phabricator.wikimedia.org/T184703#4023515 (Marostegui) a:RobH db1011 is now ready to be decommissioned by DC Ops - assigning it to @RobH
[12:50:13] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4023524 (Marostegui)
[12:50:35] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4023526 (Marostegui)
[12:50:37] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4023527 (Marostegui)
[13:02:52] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4023551 (Marostegui) s7 progress: [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1102 [x] dbstore1001 - broken host [] d...
[13:02:56] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4023552 (Marostegui) s7 progress: [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1102 [...
[13:03:02] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4023553 (Marostegui) s7 progress: [] labsdb1009 [] labsdb1010 [] labsdb1011 [] db1102 [x] dbstore1001 -...
[13:03:15] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#4023554 (Marostegui)
[13:03:41] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#4023555 (Marostegui)
[13:04:14] DBA, MediaWiki-Platform-Team (MWPT-Q3-Jan-Mar-2018), Patch-For-Review, Schema-change: Fix WMF schemas to not break when comment store goes WRITE_NEW - https://phabricator.wikimedia.org/T187089#4023556 (Marostegui)
[13:04:28] Blocked-on-schema-change, DBA, Patch-For-Review, Schema-change: Perform schema change to add externallinks.el_index_60 to all wikis - https://phabricator.wikimedia.org/T153182#3964085 (Marostegui)
[13:04:41] Blocked-on-schema-change, DBA, MediaWiki-Database, Multi-Content-Revisions, and 3 others: Schema change to prepare for dropping archive.ar_text and archive.ar_flags - https://phabricator.wikimedia.org/T185128#3964058 (Marostegui)
[14:25:13] Amir1: we should have a more in-depth conversation, 1TB of data is not that much, 1TB of a single, monolithic table could be too much for some operations, including
performing maintenance.
[15:15:09] DBA, Operations, ops-eqiad: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023865 (Cmjohnson) @Marostegui Feel free to fail the disk...I am ready w/a replacement
[15:19:38] DBA, Operations, ops-eqiad: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023869 (Marostegui) >>! In T188187#4023865, @Cmjohnson wrote: > @Marostegui Feel free to fail the disk...I am ready w/a replacement Thanks - I will do in a sec once I get someone to double check the comm...
[15:22:36] jynus: I want to replace the disk on db1068 (s4 master) with media errors, I want to mark it as failed before chris removes it, do you have a sec to review the disk and the command? (https://phabricator.wikimedia.org/T188187)
[15:23:10] it is disk 32:9 so…megacli -PDOffline -PhysDrv [32:9] -a0
[15:23:28] you need either escapes
[15:23:30] or ''
[15:23:35] check the command on wikitech
[15:24:07] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting#Disks_about_to_fail
[15:24:20] ah yes, good one
[15:24:34] -a0 or -aALL is the same, there is normally only 1 controller which is the first one
[15:24:48] megacli -PDOffline -PhysDrv \[32:9\] -a0
[15:25:02] do you want me to check the actual slot too?
[15:25:02] yeah, I have the tendency to avoid using -aALL as I have had issues in the past with it
[15:25:09] cool
[15:25:13] If you have time, I prefer another pair of eyes
[15:25:17] ok
[15:25:17] just to confirm
[15:27:25] there are more disks with errors, BTW
[15:27:27] although less
[15:27:52] yep
[15:28:23] 32:9 is indeed the worst one
[15:28:27] cool!
[15:28:31] going to mark it then
[15:30:47] Firmware state: Offline
[15:30:57] yep :)
[15:31:48] technically, there are some commands to prepare it for removal
[15:31:55] -PDMarkMissing
[15:32:04] actually -PdPrpRmv
[15:32:22] if you do the above, I think you have to include it manually
[15:32:26] But that shouldn't be necessary if the disk is being seen as offline as far as I know
[15:32:29] exactly
[15:49:55] DBA, Operations, ops-eqiad: Degraded RAID on db1068 - https://phabricator.wikimedia.org/T188187#4023913 (Marostegui) This has been replaced by Chris: ``` root@db1068:~# megacli -PDRbld -ShowProg -PhysDrv [32:9] -aALL Rebuild Progress on Device at Enclosure 32, Slot 9 Completed 1% in 12 Minutes. ```
[15:55:39] DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#4023923 (jcrespo) @marostegui I wanted to setup db1113 as sanitarium multiinstance, but I see sanitarium_multiinstance.pp is very poorly puppetized. Compare it to core_multiinstance.pp.
[16:04:37] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all eqiad database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321#4023963 (Marostegui)
[16:04:39] DBA, Operations, ops-eqiad, Patch-For-Review: Move db1069 to A1 - https://phabricator.wikimedia.org/T186699#4023961 (Marostegui) Open>Resolved This is all done now - Chris will update racktables Thanks @Cmjohnson
[16:05:24] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all eqiad database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321#3940903 (Marostegui)
[16:08:50] DBA, Epic: Meta ticket: Migrate multi-source database hosts to multi-instance - https://phabricator.wikimedia.org/T159423#4023973 (Marostegui) >>!
In T159423#4023923, @jcrespo wrote: > @marostegui I wanted to setup db1113 as sanitarium multiinstance, but I see sanitarium_multiinstance.pp is very poorly...
[16:21:47] are you executing twice tendril changes on both datacenters?
[16:21:57] or only keeping eqiad up to date?
[16:22:09] only eqiad, but maybe it is a good idea to do it on both
[16:22:19] (I only did one change since we set up codfw)
[16:22:36] whatever we do, let's do the same
[16:22:46] ok, let's do it on both?
[16:22:50] ok
[16:23:04] which one did you change last?
[16:23:23] db1011 (remove)
[16:23:29] I will execute on codfw
[16:23:33] ok, I will run my change
[16:23:36] let me do it
[16:23:38] oki!
[16:23:40] I have another change to do
[16:23:44] thank you
[16:23:45] will run mine and yours
[16:23:56] thanks
[16:25:27] of course: "ERROR 1577 (HY000) at line 6: Cannot proceed because system tables used by Event Scheduler were found damaged at server start"
[16:25:36] lovely
[16:25:46] on codfw?
[16:25:58] yes
[16:26:06] They haven't responded yet to my ticket :(
[16:29:32] I guess we have to do the same and not do it
[16:29:47] yeah :(
[16:32:27] DBA, Patch-For-Review: Prepare and indicate proper master db failover candidates for all eqiad database sections (s1-s8, x1) - https://phabricator.wikimedia.org/T186321#4024060 (Marostegui) Open>Resolved a:Marostegui
[16:55:44] I repooled labsdb1010
[16:57:00] saw it yeah - thanks
[17:25:46] jynus: marostegui I just got back from a marathon of meetings, shall I make a meeting in hangout? that would make things fast
[17:26:00] when do you have some time to discuss this
[17:26:28] Amir1: (we are in a meeting now), probably we should read the document first (I haven't read it myself yet) and then come up with concrete questions or comments there (I guess)
[17:26:49] that would work for me too
[17:27:25] +1
[17:47:09] marostegui: can you check bdwikimedia 'show tables;' for me?
- T188853#4024324
[17:47:09] T188853: Install translate extension in bd.wikimedia.org - https://phabricator.wikimedia.org/T188853
[17:48:18] Hauskatze: I am in a meeting, but: https://phabricator.wikimedia.org/P6796
[17:48:50] thanks!
[17:49:07] not sure why those tables ain't replicated, they shouldn't contain any private data
[17:49:19] that is the db master, not any lab host
[17:49:41] * Hauskatze checks with the src/sql to see if there's a missing table not created by createExtensionTables.php of WikimediaMaintenance
[17:50:57] It works fine
[17:51:02] Unless there's been some new tables added
[17:53:21] https://github.com/wikimedia/puppet/blob/c42782570dfe438f91a6ace59396eebadf1b103b/modules/role/templates/labs/db/views/maintain-views.yaml
[17:53:28] Hauskatze: I don't think any of the translate tables are replicated
[17:53:38] Neither 1:1, nor custom viewed
[17:53:53] Reedy: I think there's a missing table, 'translate_stash'
[17:54:12] Hauskatze: let's be clear and not mix production vs. wikireplicas
[17:54:14] I think it wasn't created because WikimediaMaintenance doesn't have it in the list of tables to create
[17:54:30] which servers are you talking about?
[17:54:30] jynus: I'm clear about that, don't worry
[17:54:44] I know some tables are not displayed on replicas
[17:54:47] well, it is not clear to me which you are talking about
[17:54:53] that's why I asked here for a show tables
[17:54:55] so I am asking
[17:55:00] Is it causing an error? Is the table actually used?
[17:55:08] Reedy: I asked nikerabbit
[17:55:17] he'll let us know in good time I guess
[17:55:19] Hauskatze: the show tables is from the master
[17:55:27] if it is needed, we can add it later
[17:55:34] meta for example, doesn't have translatestash either
[17:55:35] so far, the extension isn't failing
[17:55:47] https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/sql/translate_stash.sql
[17:56:22] considering the table has been around since 2013...
[17:56:27] I'm guessing we'd have been adding it if we needed it
[17:57:02] https://www.mediawiki.org/wiki/Special:TranslationStash doesn't seem to exist so yeah, maybe it's not needed at all
[17:57:13] All the other tables are created so this should be fine
[18:00:53] are you sure that is the right section?
[18:01:01] that it is not being created on x1?
[18:01:21] there is some translation stuff on x1, that I know
[19:22:45] DBA, Data-Services, Goal, Patch-For-Review, cloud-services-team (FY2017-18): Migrate all users to new Wiki Replica cluster and decommission old hardware - https://phabricator.wikimedia.org/T142807#4024699 (zhuyifei1999)
[21:08:16] DBA, Jouncebot, MediaWiki-Maintenance-scripts, Operations, and 3 others: Add section for long-running tasks on the Deployment page (specially for database maintenance) - https://phabricator.wikimedia.org/T144661#4025055 (Framawiki)