[06:00:12] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui)
[06:01:37] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui) `testcommonswiki` and `commonswiki` ` # ./section s4 | while read host port; do echo "$host:$port" ; mysql.p...
[06:01:55] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui)
[06:06:02] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui) `wikidatawiki` ` # ./section s8 | while read host port; do echo "$host:$port" ; mysql.py -h$host:$port wiki...
[06:06:22] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui)
[06:07:12] Blocked-on-schema-change, DBA, Wikidata, Technical-Debt, and 2 others: Make wb_changes_dispatch.chd_seen unsigned in production - https://phabricator.wikimedia.org/T273874 (Marostegui) Open→Resolved All done
[06:08:41] Blocked-on-schema-change: Schema change for dropping defaults of ipb_timestamp and ipb_expiry - https://phabricator.wikimedia.org/T273358 (Marostegui) Altered db1098 and db1096 in eqiad. Will leave them running for a few hours to make sure nothing has those indexes hardcoded.
[06:19:48] DBA, DC-Ops, SRE, ops-codfw, Patch-For-Review: (Need By: TBD) rack/setup/install db21[45-52] - https://phabricator.wikimedia.org/T273568 (Marostegui) These hosts have been added to puppet with: `insetup` role and also assigned a partman recipe for the installation. The only puppet change need...
[06:19:50] DBA, decommission-hardware, Patch-For-Review: decommission db1094.eqiad.wmnet - https://phabricator.wikimedia.org/T273710 (Marostegui)
[07:52:05] DBA, serviceops, Parsoid (Tracking): mariadb failing on testreduce1001 - https://phabricator.wikimedia.org/T274034 (Joe)
[07:59:08] DBA, serviceops, Parsoid (Tracking): mariadb failing on testreduce1001 - https://phabricator.wikimedia.org/T274034 (Marostegui) I have fixed this issue and mysql is now running. ` root@testreduce1001:/var/lib# mysql testreduce -e "show tables" +----------------------+ | Tables_in_testreduce | +------...
[08:51:59] DBA, Patch-For-Review, Performance-Team (Radar): Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Volans) Most likely the `x2` section should be added to spicerack too in `spicerack/mysql_legacy.py:CORE_SECTIONS`
[09:45:05] DBA, SRE, ops-eqiad: eqiad: move db1111 to rack A8 - https://phabricator.wikimedia.org/T273982 (Marostegui) @Cmjohnson db1111 is now off. You can proceed whenever you like.
[10:47:53] I have refactored some backup code, you will see that db hosts now say there is no available backups
[10:48:12] that is technically true, but only because the old backups are under a different name
[10:48:39] I will explain this better on meeting
[11:25:01] hurray - got a fix merged into dbdeployer: https://github.com/datacharmer/dbdeployer/pull/125
[11:26:01] <3
[11:32:37] hey, ignore my comment about db backups being renamed- it was an artifact of monitoring in the middle of deployment
[11:33:09] things will keep for now as is, only 2 new backups will be added: ES ones
[11:34:40] and those are running now
[11:35:01] I will document all changes on wiki when 100% sure they are working
[11:35:06] that will be easier to follo
[11:35:10] *follow
[12:32:08] DBA, decommission-hardware, Patch-For-Review: decommission db1094.eqiad.wmnet - https://phabricator.wikimedia.org/T273710 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1094.eqiad.wmnet` - db1094.eqiad.wmnet (**PASS**) - Downtimed host on Icinga...
[12:33:59] DBA, DC-Ops, decommission-hardware, ops-eqiad, Patch-For-Review: decommission db1094.eqiad.wmnet - https://phabricator.wikimedia.org/T273710 (Marostegui) a:Marostegui→wiki_willy Ready for #dc-ops
[12:34:32] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[13:17:58] snapshots of db1171 seem to be working as expected, last test is today's scheduled logical dumps
[13:27:13] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo) After deployment, the latest operational steps are being done to ensure backups will be generated correctly, after that (and its documentation), we cou...
[13:30:22] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo)
[13:30:29] Data-Persistence-Backup, SRE, Patch-For-Review: Implement logic to be able to perform incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (jcrespo)
[13:30:45] Data-Persistence-Backup, Epic, Patch-For-Review: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (jcrespo)
[13:30:47] Data-Persistence-Backup, SRE, Patch-For-Review: Implement logic to be able to perform incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (jcrespo)
[13:30:49] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo)
[13:30:52] Data-Persistence-Backup, SRE, Patch-For-Review: Implement logic to be able to perform incremental backups of ES hosts - https://phabricator.wikimedia.org/T244884 (jcrespo) Stalled→Open p:High→Low
[13:33:24] Data-Persistence-Backup, Epic, Patch-For-Review: Improve regular production database backups handling - https://phabricator.wikimedia.org/T138562 (jcrespo)
[14:29:11] Blocked-on-schema-change: Schema change for dropping defaults of ipb_timestamp and ipb_expiry - https://phabricator.wikimedia.org/T273358 (Marostegui)
[14:30:06] Blocked-on-schema-change: Schema change for dropping defaults of ipb_timestamp and ipb_expiry - https://phabricator.wikimedia.org/T273358 (Marostegui)
[14:34:53] is there anything long-running for mysql or backups on cumin2001 which would prevent a reboot tomorrow morning European time?
[14:36:12] not from my side, but I think dumps do run on tuesdays
[14:36:22] jaime will know better
[14:40:28] ack
[15:25:21] could we do them now or on wednesday?
[15:25:39] oh, wait, but dumps do not depend on cumin
[15:25:49] so it can be done tomorrow no issue
[15:27:45] sadly s4 logical backups now take 14-15 hours due to image table growth
[15:32:04] thx, I'll aim for tomorrow, then (but will doublecheck the channel before I start anyway)
[15:50:27] DBA, SRE, ops-eqiad: eqiad: move db1111 to rack A8 - https://phabricator.wikimedia.org/T273982 (Cmjohnson) Open→Resolved @Marostegui Thanks, the server move was successful and you are able to ssh.
[15:50:56] DBA, SRE, ops-eqiad: eqiad: move db1111 to rack A8 - https://phabricator.wikimedia.org/T273982 (Marostegui) Thank you Chris - I will take care of starting mysql and repooling the host.
[18:11:54] DBA: wikiuser cannot write to labswiki from mw* servers - https://phabricator.wikimedia.org/T274168 (Urbanecm)
[18:32:11] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo) codfw -> eqiad completed rather quickly. ` 305897 Full 5,561 3.078 T OK 08-Feb-21 17:19 backup2002.codfw.wmnet-Monthly-1st-Wed-EsRwEqi...
[18:41:58] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (jcrespo) @ayounsi Could you think of a reason for this discrepancy at network layer? I cannot think of one at hw or software level. The only thing I see differe...
[18:59:02] DBA: wikiuser cannot write to labswiki from mw* servers - https://phabricator.wikimedia.org/T274168 (LSobanski) p:Triage→Medium
[19:02:01] DBA: wikiuser cannot write to labswiki from mw* servers - https://phabricator.wikimedia.org/T274168 (Marostegui)
[19:40:36] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Set up backup strategy for es clusters - https://phabricator.wikimedia.org/T79922 (ayounsi) Overall there is more traffic in the eqiad->codfw direction, so that could be part of the reason. Is it possible to know a bit more about the source,...
[21:55:20] DBA, wikitech.wikimedia.org: wikitech database has almost all of its varbinary fields wrong - https://phabricator.wikimedia.org/T269348 (Ladsgroup) Another ping? One month since my last ping has passed.
[22:27:00] DBA, wikitech.wikimedia.org, cloud-services-team (Kanban): wikitech database has almost all of its varbinary fields wrong - https://phabricator.wikimedia.org/T269348 (Reedy)