[02:48:02] DBA, WikimediaEditorTasks, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302 (Krinkle) //(Does not affect the wikimedia/rdbms PHP library.)//
[03:01:02] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Krinkle)
[03:01:07] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Krinkle)
[05:41:04] DBA, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 - https://phabricator.wikimedia.org/T219963 (Marostegui)
[05:41:13] DBA, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 - https://phabricator.wikimedia.org/T219963 (Marostegui) p:Triage→Normal
[05:44:52] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[05:55:54] DBA: Decommission 2 codfw x1 hosts db2033 and db2034 - https://phabricator.wikimedia.org/T219493 (Marostegui) a:Marostegui
[06:56:28] so surprisingly, the new backup system is running without any errors
[06:56:53] although it takes a lot of time to generate backups and snapshots at the same time from dbstore2*
[06:57:15] but that will hopefully be mitigated once the new sources are in place, no?
[07:03:22] maybe ¯\_(ツ)_/¯
[07:03:30] xDDD
[07:04:10] So, the filtered tables thingy, as far as I remember if the column is "K" there will be no trigger there
[07:27:34] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui) a:Marostegui Change applied first to `labtestwiki` as requested :-) ` root@db1073.eqiad.wmnet[labtestwiki]> show cre...
[07:27:43] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[07:33:26] tendril is having issues
[07:33:39] oh shit
[07:33:42] the usual memory leak
[07:33:43] let's see
[07:34:20] https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&orgId=1&var-server=db1115&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql&from=now-6h&to=now
[07:37:01] should we go for a restart?
[07:37:37] sure, you do it?
[07:37:40] yeah
[07:37:43] checking one more thing
[07:37:45] maybe downtime first
[07:37:51] can you do that for me?
[07:37:52] I can do that last part
[07:37:55] thanks
[07:38:41] done
[07:38:50] thanks, checking one more thing before restarting
[07:39:29] Got timeout reading communication packets
[07:39:35] There was a problem processing the query on the foreign data source.
[07:39:42] nothing on logs or hw logs
[07:39:54] going for the restart
[07:39:59] I will upgrade it
[07:46:00] tendril should come back in a bit
[07:46:46] I have seen another issue, not directly related
[07:47:28] mysql is fully up and event_scheduler is on
[07:47:38] tendril is back
[07:47:41] what have you seen?
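The job_params change in T219887 (applied first to labtestwiki at 07:27:34) is about column capacity: in MySQL and MariaDB a BLOB holds at most 65,535 bytes (2^16 - 1) and a MEDIUMBLOB at most 16,777,215 bytes (2^24 - 1), so serialized job parameters above roughly 64 KiB do not fit in a plain BLOB. The log does not spell out the motivation; presumably it is room for larger payloads. A minimal sketch, assuming the serialized payload is available as bytes; `smallest_blob_type` is a hypothetical helper, not part of MediaWiki or the job queue.

```python
# Maximum byte lengths of the MySQL/MariaDB binary large-object column types.
BLOB_LIMITS = [
    ("TINYBLOB", 2**8 - 1),
    ("BLOB", 2**16 - 1),
    ("MEDIUMBLOB", 2**24 - 1),
    ("LONGBLOB", 2**32 - 1),
]


def smallest_blob_type(payload: bytes) -> str:
    """Return the smallest BLOB type whose maximum length fits the payload (hypothetical helper)."""
    size = len(payload)
    for name, limit in BLOB_LIMITS:
        if size <= limit:
            return name
    raise ValueError(f"payload of {size} bytes exceeds LONGBLOB")


if __name__ == "__main__":
    # A 100,000-byte payload is too long for BLOB (truncated with a warning,
    # or rejected in strict SQL mode) but fits comfortably in MEDIUMBLOB.
    print(smallest_blob_type(b"x" * 100_000))
```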
[09:00:57] jynus, marostegui: FYI, I'm going to stop puppet on dbproxy1006/07/08/10/11 in a bit and merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/487895/ (discussed the steps yesterday with Jaime)
[09:01:19] ok
[09:01:26] sure
[09:24:31] DBA, WikimediaEditorTasks, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302 (Marostegui) >>! In T218302#5078853, @Marostegui wrote: > Having a table under `$private_tables` means it will n...
[09:29:31] DBA, Reading-Infrastructure-Team-Backlog, WikimediaEditorTasks: Drop wikishared.wikimedia_editor_tasks_entity_description_exists table from x1 - https://phabricator.wikimedia.org/T219963 (Marostegui) Open→Resolved Done ` root@db1069.eqiad.wmnet[wikishared]> select count(*) from wikimedia_edit...
[09:29:36] DBA, WikimediaEditorTasks, Patch-For-Review, Reading-Infrastructure-Team-Backlog (Kanban): Choose DB/Cluster for WikimediaEditorTasks tables - https://phabricator.wikimedia.org/T218302 (Marostegui)
[09:56:52] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[10:02:08] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[10:15:18] it took 12 hours to do an s8 snapshot :-O
[10:15:48] how come?
[10:16:02] it is true that 2 dumps were running on that server at the same time
[10:18:42] the good news is that dumps got reduced by half
[10:21:31] s8 backup is 1.5TB uncompressed
[10:22:13] or 442G compressed
[10:23:47] oh nice
[10:23:55] 442G compressed is impressive
[10:23:55] more than I expected
[10:24:10] the "uncompressed" one is supposed to be compressed InnoDB already
[10:38:57] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[10:42:32] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[10:52:43] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[12:50:56] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) I would like to proceed with the above ^ plan Tuesday or Thursday next week cc @aaro...
[13:04:13] marostegui: jynus Today I did a test deployment of Url shortener, so reads and writes increased a little, tell me if something went/goes/will go crazy
[13:06:23] Amir1: you've got any graphs?
[13:06:31] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (jcrespo) Did you run the warmup script on codfw? what was the effect?
[13:07:22] marostegui: not so far, have you seen anything on x1?
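For scale, the s8 figures quoted at 10:21:31 and 10:22:13 work out to a compression ratio of roughly 3.4 to 3.5, depending on whether "1.5TB" means decimal terabytes or binary tebibytes; either way about 70% of the space is saved, on top of the InnoDB compression already present in the "uncompressed" copy (10:24:10). A minimal sketch of the arithmetic; the unit interpretation is an assumption, so both readings are computed.

```python
def compression_ratio(original: float, compressed: float) -> float:
    """How many times smaller the compressed artifact is (both sizes in the same unit)."""
    return original / compressed


def space_saved(original: float, compressed: float) -> float:
    """Fraction of the original size saved by compression."""
    return 1 - compressed / original


COMPRESSED_GB = 442  # "442G compressed", as quoted in the log

# Assumption: "1.5TB" may be decimal (1500 GB) or binary (1536 GiB); show both readings.
for label, original_gb in (("decimal TB", 1.5 * 1000), ("binary TiB", 1.5 * 1024)):
    ratio = compression_ratio(original_gb, COMPRESSED_GB)
    saved = space_saved(original_gb, COMPRESSED_GB)
    print(f"{label}: {ratio:.2f}x smaller, {saved:.0%} saved")
```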
[13:07:58] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) >>! In T210725#5081484, @jcrespo wrote: > Did you run the warmup script on codfw? wh...
[13:07:59] Amir1: When was the test done, so I can narrow down the times?
[13:08:16] marostegui: 1pm Poznan time
[13:08:21] until 1:30
[13:09:45] https://grafana.wikimedia.org/d/000000273/mysql?orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1069&var-port=9104&from=1554287456000&to=1554291033000
[13:10:00] there is a spike at 10:50 on the master
[13:11:10] It does some reads on master for weird reasons that I need to fix
[13:11:22] (by fix I mean just switching it to replica)
[13:15:34] the slave db1120 also has a spike, on handler_read_rnd_next, which generally means table scans
[13:16:26] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[13:18:07] jynus: /var/lib/mediawiki-cache-warmup/warmup.js is the one for parsercache? or am I getting confused
[13:35:13] @ meeting
[13:35:35] :)
[13:36:43] actually, thanks for the reminder :)
[13:54:02] back
[13:55:00] o/
[13:55:01] it is documented on the dc-failover wiki page, can't remember now
[13:55:10] yeah, I saw that one there
[13:55:56] you can also request individual pages and see in the debug information which pc it was retrieved from
[13:56:15] (bypassing the cache, ofc)
[13:57:01] e.g. "Saved in parser cache with key eswiki:pcache:idhash:2271189-0!thumbsize=5 and timestamp 20190403133323 and revision id 112969012"
[13:58:06] So what I had in my notes was: nodejs /var/lib/mediawiki-cache-warmup/warmup.js ~/urls-cluster.txt spread appservers.svc.codfw.wmnet
[13:58:16] which matches what I saw on: https://wikitech.wikimedia.org/wiki/Switch_Datacenter/MediaWiki
[14:11:27] https://logstash.wikimedia.org/goto/f6705ebc9015ac3e90ca26aa4de2b4d5
[14:11:40] errors?
[14:11:55] nop
[14:11:59] I am browsing some more wikis
[14:12:04] ah, debug
[14:12:26] Hehe yeah, the debug for the parsercache keys
[14:29:34] DBA, MediaWiki-Cache, Patch-For-Review, Performance-Team (Radar), User-Marostegui: Replace parsercache keys to something more meaningful on db-XXXX.php - https://phabricator.wikimedia.org/T210725 (Marostegui) So, I have done the following test with X-Wikimedia-debug I have browsed codfw for...
[14:33:14] DBA, Operations, ops-codfw: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T219852 (Papaul) a:Papaul→Marostegui complete
[14:33:55] DBA, Operations, ops-codfw: Degraded RAID on db2070 - https://phabricator.wikimedia.org/T219852 (Marostegui) Thanks! ` physicaldrive 1I:1:6 (port 1I:box 1:bay 6, SAS, 600 GB, Rebuilding) `
[14:47:08] I am re-starting puppet on dbproxy 7, 8, 10 and 11
[15:01:41] jynus or marostegui: the battery is here... the server will need to be shut down to swap it out
[15:01:54] cmjohnson1: ok, so we need to plan for a failover
[15:02:28] yes... I need maybe 5 mins of downtime
[15:02:30] cmjohnson1: that will take some time (as it requires read-only time for the wikis), we will get back to you once we are ready for it (probably not even next week)
[15:02:40] okay...let me know
[15:02:44] cmjohnson1: thank you
[15:02:58] db1075 I guess
[15:03:01] yep
[15:06:46] DBA, Operations, ops-eqiad: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui) @Cmjohnson let us know that the BBU arrived and he'll need to put the server down to be able to replace it. So we need to do a failover and failback to db1075 (the previous...
[15:07:00] DBA, Operations, ops-eqiad: db1078 s3 primary DB master BBU pre-failure - https://phabricator.wikimedia.org/T219115 (Marostegui) a:Cmjohnson→Marostegui
[15:07:54] I proposed a date there ^
[15:09:11] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo)
[15:09:44] DBA, Goal: Purchase and setup remaining hosts for database backups - https://phabricator.wikimedia.org/T213406 (jcrespo)
[15:09:51] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo) Open→Resolved a:jcrespo→Papaul This is done, except the problems with mounting point of the ssds, to be handled...
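Going back to the parser cache check (13:55 to 14:29): when browsing many wikis with debug enabled, it can help to pull the fields out of the "Saved in parser cache" debug line mechanically instead of reading each one by eye. A minimal sketch, using the eswiki line quoted at 13:57:01 as its only sample. The field names are assumptions (the log does not define the key layout; the number before the dash appears to be the page ID), `parse_pcache_debug` is a hypothetical helper, and the 14-digit timestamp is assumed to be UTC like other MediaWiki timestamps.

```python
import re
from datetime import datetime, timezone

# Pattern for the debug text quoted in the log; other MediaWiki versions may word it differently.
DEBUG_LINE = re.compile(
    r"Saved in parser cache with key (?P<key>\S+) "
    r"and timestamp (?P<ts>\d{14}) and revision id (?P<rev>\d+)"
)


def parse_pcache_debug(line: str):
    """Split a 'Saved in parser cache' debug line into its parts, or return None."""
    match = DEBUG_LINE.search(line)
    if match is None:
        return None
    # Key layout assumed from the single sample: <wiki>:pcache:idhash:<id>-<n>!<options>
    parts = match.group("key").split(":", 3)
    if len(parts) != 4:
        return None
    wiki, _prefix, _kind, rest = parts
    id_part, _, options = rest.partition("!")
    page_id, _, variant = id_part.partition("-")
    saved_at = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc  # assumption: MediaWiki 14-digit timestamps are UTC
    )
    return {
        "wiki": wiki,
        "page_id": int(page_id),
        "variant": variant,  # meaning of the number after the dash is not confirmed here
        "options": options,
        "saved_at": saved_at,
        "revision_id": int(match.group("rev")),
    }


sample = (
    "Saved in parser cache with key eswiki:pcache:idhash:2271189-0!thumbsize=5 "
    "and timestamp 20190403133323 and revision id 112969012"
)
print(parse_pcache_debug(sample))
```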
[15:09:56] \o/
[15:12:04] DBA, CirrusSearch, Discovery, MediaWiki-JobQueue, and 2 others: Change job table to use mediumblob for job_params field - https://phabricator.wikimedia.org/T219887 (Marostegui)
[15:19:38] DBA, Goal: Decomission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo)
[15:22:06] DBA, Goal: Decomission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (Marostegui) When creating the final task to decommission dbstore2002 please make sure a point to label the BBU as broken on the DCOps onsite steps. Thanks!
[15:23:04] DBA, Operations, ops-eqiad: rack/setup/deploy eqiad dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T219399 (jcrespo)
[15:23:07] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/install (5) dedicated dump slaves - https://phabricator.wikimedia.org/T219463 (jcrespo)
[15:23:10] DBA, Operations, ops-eqiad, Patch-For-Review: rack/setup/install db1139|db1140.eqiad.wmnet (2 dump slaves) - https://phabricator.wikimedia.org/T218985 (jcrespo)
[15:23:13] DBA, Goal: Decomission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo)
[15:23:15] DBA, Operations, ops-codfw, Patch-For-Review: rack/setup/deploy codfw dedicated backup recovery/provisioning hosts - https://phabricator.wikimedia.org/T218336 (jcrespo)
[15:25:11] DBA, Goal: Decomission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo)
[15:44:51] DBA, Goal: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo)
[15:45:25] DBA, Goal: Decommission dbstore1001, dbstore2001, dbstore2002 and es2001-4 hosts* - https://phabricator.wikimedia.org/T220002 (jcrespo) p:Triage→Normal a:jcrespo
[16:01:44] In theory, the latest proxysql has support for TLS 1.2 https://github.com/sysown/proxysql/releases
[23:55:15] DBA, TechCom-RFC: MediaWiki database policy and/or guidelines (2019) - https://phabricator.wikimedia.org/T220056 (Krinkle)
[23:56:21] DBA, TechCom-RFC: MediaWiki database policy and/or guidelines (2019) - https://phabricator.wikimedia.org/T220056 (Krinkle) Ref T190379#5073911. I think the first two points in the objective should be addressed by reincorporating them into the development policy, similar to before. The third point (abou...
[23:57:07] DBA, Performance-Team, TechCom, TechCom-RFC (TechCom-Approved): RFC: Re-establish the development policies - https://phabricator.wikimedia.org/T190379 (Krinkle) @jcrespo @mark This RFC marked a clean start with the entry point at . The ones atta...
[23:57:34] DBA, TechCom-RFC: MediaWiki database policy and/or guidelines (2019) - https://phabricator.wikimedia.org/T220056 (Jdforrester-WMF)
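On the ProxySQL TLS 1.2 note at 16:01:44: if that support gets used, clients can be set to refuse anything older. A minimal Python sketch of such a client-side policy, assuming Python 3.7 or later for `ssl.TLSVersion`; `make_client_context` is a hypothetical helper and the CA bundle path is left as a placeholder. The MySQL protocol upgrades to TLS inside its own handshake, so this context would have to be handed to a MySQL client library rather than used to wrap a raw socket to the proxy port; how that is done depends on the client and is not shown.

```python
import ssl
from typing import Optional


def make_client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """Build a client-side TLS context that refuses anything older than TLS 1.2."""
    # create_default_context() turns on certificate verification and hostname checking.
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


if __name__ == "__main__":
    # Placeholder: pass the path of the internal CA bundle here (hypothetical).
    context = make_client_context()
    print(context.minimum_version, context.verify_mode, context.check_hostname)
```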