[05:10:01] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for arywiki - https://phabricator.wikimedia.org/T257725 (Marostegui) p:Triage→Medium Let us know when the database is created
[05:12:45] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui)
[06:53:06] DBA, Operations, Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1097.eqiad.wmnet` - db1097.eqiad.wmnet (**FAIL**) - Downtimed host on Icinga - Found...
[06:54:33] DBA, Operations, Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (ops-monitoring-bot) cookbooks.sre.hosts.decommission executed by marostegui@cumin1001 for hosts: `db1097.eqiad.wmnet` - db1097.eqiad.wmnet (**FAIL**) - Downtimed host on Icinga - Found...
[06:55:32] DBA, Operations, Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (Marostegui) I have powered off the host manually, the IPMI connection was failing
[06:55:56] DBA, Operations, Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (Marostegui)
[07:07:50] DBA, Operations, Patch-For-Review: decommission db1097.eqiad.wmnet - https://phabricator.wikimedia.org/T257406 (Marostegui)
[07:14:42] DBA, Patch-For-Review: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (Marostegui) a:Marostegui
[08:28:42] jynus: oh hai. i see that backups are using es1022 as a source for es4
[08:28:46] (modules/profile/templates/mariadb/backup_config/backup1002.cnf.erb)
[08:29:05] i'm going to reimage it today; should i change that to point to a different es4 node (i.e. es1021)?
[08:29:35] how long will the reimage last?
[08:29:57] let's say 4h to be safe
[08:30:10] yeah, I was going to say, be pessimistic
[08:30:55] because of the low write rate + binlogs, we only take a dump currently every week, on monday->tuesday UTC night
[08:31:08] so I think it should be ok unless something goes wrong
[08:31:42] alright. i'll go ahead without changing that, then, but if the reimaging runs into issues i'll send a CR just for that
[08:32:06] yeah, we will have to think anyway how to handle a worst-case scenario
[08:32:26] plus we have backup redundancy on codfw, so don't worry too much if one run fails
[08:33:27] jynus: cool, thanks
[08:57:17] DBA, Epic, Patch-For-Review, User-Kormat: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 (ops-monitoring-bot) Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts: ` ['es1022.eqiad.wmnet'] ` The log can be found in `/var/log/wm...
[09:16:21] DBA, MediaWiki-General, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (Marostegui)
[09:16:52] this is the second host which didn't pxe boot...
[09:17:10] it worked the second time?
[09:17:27] i haven't retried it yet
[09:20:53] i've manually set it to pxe boot via ipmitool, rebooting now to see what happens
[09:21:16] it does say IPMI: Boot to PXE Boot Requested by iDRAC
[09:21:46] maybe worth checking with arzhel to see if the network is ok
[09:39:51] hmmh, according to the logs on install1003 es1022 the offer/request/ack look fine, probably an issue locally
[09:42:53] moritzm: the request didn't reach install1003 until the third reboot attempt
[09:43:04] this is the same thing that happened with es1021 last week
[09:43:21] could be a firmware issue, or a network issue, or.. i don't know
[09:44:11] it could be a coincidence, but the time that did work for both machines was when i manually selected pxe from the serial console
[09:45:41] FYI: https://gerrit.wikimedia.org/r/c/operations/puppet/+/612167
[09:47:11] DBA, Epic, Patch-For-Review, User-Kormat: Upgrade es4 to debian buster + mariadb 10.4 - https://phabricator.wikimedia.org/T257284 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es1022.eqiad.wmnet'] ` and were **ALL** successful.
[09:51:20] DBA, MediaWiki-General, MW-1.35-notes (1.35.0-wmf.34; 2020-05-26), Patch-For-Review, and 2 others: Normalise MW Core database language fields length - https://phabricator.wikimedia.org/T253276 (Marostegui) s3 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [x] dbstore1004...
[13:17:05] DBA, CheckUser, Growth-Team, Thanks, User-DannyS712: Monitor the growth of CheckUser tables after the addition of Thanks data - https://phabricator.wikimedia.org/T257223 (Huji)
[13:18:52] DBA, Operations, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Kormat)
[13:23:53] DBA, CheckUser, Growth-Team, Thanks, User-DannyS712: Monitor the growth of CheckUser tables after the addition of Thanks data - https://phabricator.wikimedia.org/T257223 (Huji) Stalled→Open Not stalled anymore, since the feature created in T255526 is now live on all WMF wikis.
[13:24:05] DBA, CheckUser, Growth-Team, Thanks, User-DannyS712: Monitor the growth of CheckUser tables after the addition of Thanks data - https://phabricator.wikimedia.org/T257223 (Huji) @Marostegui can I ask you to provide an update every Tuesday?
[13:26:01] DBA, CheckUser, Growth-Team, Thanks, User-DannyS712: Monitor the growth of CheckUser tables after the addition of Thanks data - https://phabricator.wikimedia.org/T257223 (Marostegui) Will do my best!
[13:26:01] DBA, Operations, Epic, User-Kormat: Set up replication for zarcillo - https://phabricator.wikimedia.org/T257816 (Kormat)
[13:26:02] DBA, Operations, Epic, User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (Kormat)
[13:26:02] DBA, Operations, Epic, User-Kormat: Add monitoring to ensure consistency between puppet and zarcillo - https://phabricator.wikimedia.org/T257821 (Kormat)
[13:26:02] DBA, Operations, Epic, User-Kormat: Set up replication for zarcillo - https://phabricator.wikimedia.org/T257816 (Marostegui) p:Triage→Medium zarcillo was moved to db2093 because db1115 was not stable for a few week, but since: T252331 T231165 and T231182 were solved, we've not had any iss...
[13:26:02] DBA, Operations, Epic, User-Kormat: Add monitoring to ensure consistency between tendril and zarcillo - https://phabricator.wikimedia.org/T257822 (Kormat)
[13:26:02] DBA, Operations, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Marostegui) p:Triage→Medium
[13:29:51] DBA, Operations, Epic, User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (Marostegui) p:Triage→Medium There are two options here: 1- Move the database to either m1 or m2. That ensure replication, and a proxy in front of it. 2- Refactor haproxy puppe...
[13:36:27] DBA, Operations, Epic, User-Kormat: Add haproxy config for zarcillo - https://phabricator.wikimedia.org/T257819 (Marostegui) Open→Stalled
[13:36:30] DBA, Operations, Epic, User-Kormat: Use zarcillo as an authoritative inventory of db instances/roles - https://phabricator.wikimedia.org/T257814 (Marostegui)
[13:50:53] DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui)
[14:02:40] DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (WDoranWMF) @Marostegui Yeah, if you don't m...
[14:04:09] DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (Marostegui) >>! In T238966#6301290, @WDoran...
[14:40:19] DBA, Cloud-Services, CPT Initiatives (MCR Schema Migration), Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (WDoranWMF) >>! In T238966#6301296, @Maroste...
[15:05:20] DBA, Patch-For-Review: Create more tests for transfepy package - https://phabricator.wikimedia.org/T257600 (Privacybatm) The following scenarios comes under this ticket: > 1. Unit tests: **transfer.py** correctness in logger level setting correctness in passing new options: verbose, parallel-checksum s...
[15:18:34] DBA, Epic, User-Kormat: Switchover es4 master from es1020 to es1021 - https://phabricator.wikimedia.org/T257847 (Kormat)
[15:18:40] ^ WIP
[15:20:41] kormat: ^ probably doesn't need the epic task, it is not _that_ hard :p
[15:20:57] oops :) it inherited that from the parent task
[15:22:35] yeah, I am not sure I like that feature of inheriting everything from the parent task, especially when you create a simple task and the parent has like 100 subscribers and they are all in the new task too XD
[15:23:28] in this case i'm looking at some of the subscribers and wondering would it be polite to remove them from the subtask? but i don't know why they subscribed to the parent task, so ¯\_(ツ)_/¯
[15:24:07] yeah, I always have the same doubt XD
[15:24:09] there is not rules, but I normally only start with those relevant
[15:24:12] *are
[15:24:33] the ones on parent should receive a notif anyway if they want to be subscribed is the thought
[15:24:39] kormat: normally if it is going to have a very specific task I don't do it, to avoid spamming them
[15:24:58] if it is something that requires lots of discussion/awareness then I leave them subscribed
[15:25:38] what if you want to punish them for subscribing?
[15:25:50] then add them as watchers for operations tag!
[15:25:55] :D
[15:26:19] kormat just subscribed you to T119626
[15:26:20] T119626: Eliminate SPOF at the main database infrastructure - https://phabricator.wikimedia.org/T119626
[15:26:50] noooooo
[15:29:54] DBA, User-Kormat: Switchover es4 master from es1020 to es1021 - https://phabricator.wikimedia.org/T257847 (Kormat)
[15:46:20] DBA, User-Kormat: Switchover es4 master from es1020 to es1021 - https://phabricator.wikimedia.org/T257847 (Kormat)
[15:58:24] DBA, Cognate, ContentTranslation, Growth-Team, and 10 others: Upgrade x1 databases to Buster and Mariadb 10.4 - https://phabricator.wikimedia.org/T254871 (Marostegui) Wednesday 15th July at 06:00 AM UTC we will be setting x1 in read-only for around 1 minute to switchover the master to an upgraded...
[16:20:24] DBA, User-Kormat: Switchover es4 master from es1020 to es1021 - https://phabricator.wikimedia.org/T257847 (Marostegui) > Check that es4 is indeed read-only (How?) Once you've deployed the RO patch, you can inspect the master current binlog (`show master status;`) using `mysqlbinlog` and check that only...
[17:34:49] DBA, CheckUser, Growth-Team, Thanks, User-DannyS712: Monitor the growth of CheckUser tables after the addition of Thanks data - https://phabricator.wikimedia.org/T257223 (Huji) a:Marostegui Much appreciated!