[07:19:09] DBA, Labs, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#3012217 (Marostegui)
[07:19:12] DBA, Operations, Patch-For-Review: Move db1073 to B3 - https://phabricator.wikimedia.org/T156126#3012215 (Marostegui) Open>Resolved Server is repooled this can be closed.
[07:22:58] DBA, Labs, Operations, netops, Patch-For-Review: DBA plan to mitigate asw-c2-eqiad reboots - https://phabricator.wikimedia.org/T155999#3012218 (Marostegui) Open>Resolved All the initial actions listed on the original ticket to mitigate this issue have been completed, the only pending...
[08:57:48] Blocked-on-schema-change, Collaboration-Team-Triage, Notifications, Patch-For-Review, Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#3012358 (Marostegui) I am going to start with S7 now (slave by slave). I said I would start on Monday bu...
[09:14:42] Blocked-on-schema-change, Collaboration-Team-Triage, Notifications, Patch-For-Review, Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#2334714 (jcrespo) Some context for echo: T119154 and T153638 <- this, in the future, will solve many conf...
[09:34:55] jynus: for coordination, do you need to work over any of the s7 hosts in eqiad?
[09:35:08] I will be depooling them during today
[09:35:43] I am doing db1034 now
[09:35:50] not sure which shard it is
[09:35:54] s7 :)
[09:36:03] can you leave it depooled so I can alter it? (it is a fast one)
[09:36:09] I only need to restart it
[09:36:16] it should only take 15 minutes
[09:36:23] sounds good, let me know when I can proceed
[09:36:26] my alter is around 2 minutes
[09:36:27] then I'm done
[09:36:39] I will ping you
[09:36:42] thanks
[09:36:59] db1028 is also depooled, is that you too?
[09:37:29] ah no
[09:37:32] that is me :)
[09:37:34] I will repool it
[09:40:19] BTW
[09:40:29] I gave you a heads up for some echo tables
[09:40:46] that have to be dropped rather than altered, on the ticket
[09:41:11] (I am not telling you to drop them now, but at least not wasting time altering them)
[09:43:02] yeah I did, I am just altering s7 which is metawiki and kowiki (kowiki is to be unused, but it is just one line), for s3 will only alter the used ones :)
[09:43:22] I won't alter anything but the used ones on s3
[10:15:22] marostegui, db1034 is ready, down'ed until 18h
[10:15:35] thanks!
[10:17:12] I have to depool db2040, though
[10:17:26] go for it :)
[10:17:35] I can take care of reverting https://gerrit.wikimedia.org/r/#/c/336661/ once I am done
[10:18:47] I am done
[10:30:41] DBA: Automate dataset backup recovery - https://phabricator.wikimedia.org/T157668#3012505 (Marostegui)
[12:09:42] Blocked-on-schema-change, Collaboration-Team-Triage, Notifications, Patch-For-Review, Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#3012659 (Marostegui) hosts done in s7 so far (I decided to alter also kowiki, for consistency, as it was...
[12:23:53] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3012696 (ops-monitoring-bot) Script wmf_auto_reimage was launched by jynus on neodymium.eqiad.wmnet for hosts: ``` ['db2040.codfw.wmnet'] ``` The log can be found in `/var/log/wmf-auto-re...
[13:57:17] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3012956 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['db2040.codfw.wmnet'] ``` and were **ALL** successful.
[15:21:44] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013195 (jcrespo) All core servers/server with core data now support TLS connections and use it for replication (except labs- the new server support it, but are not accessible remotely for...
[15:23:01] jynus, marostegui: \o/
[15:23:29] See my comment on other channel, moritzm
[15:23:43] moritzm: it was only jynus :-)
[15:24:05] he's restarted I don't know how many hosts the last few days
[15:24:09] and reimaged...
[15:24:21] jynus: I'll have a look at the task tomorrow
[15:24:31] dbstore1002, 2001 and 2002 have to restart the replication channels
[15:24:55] doing it now
[15:26:58] high QPS on db1094
[15:27:12] db1086 seems underused
[15:27:16] expected :)
[15:27:18] I depooled it
[15:27:21] ok ok
[15:27:22] and repooling it now
[15:27:25] no rush
[15:27:33] I thought it had an unbalance
[15:27:34] No, I am done :)
[15:27:37] normally
[15:27:51] yes, but I mean I only worry if it was the normal state
[15:27:58] like, some server underused
[15:28:07] ah no no :)
[15:30:44] DBA, MediaWiki-extensions-ORES, Reading Epics, Revision-Scoring-As-A-Service, and 2 others: rcshow=oresreview is slow - https://phabricator.wikimedia.org/T152585#3013230 (Halfak) a:Ladsgroup
[15:43:22] Blocked-on-schema-change, Collaboration-Team-Triage, Notifications, Patch-For-Review, Schema-change: Add primary key to echo_notification table - https://phabricator.wikimedia.org/T136428#3013297 (Marostegui) s7 is all done: ``` root@neodymium:/home/marostegui/git/software/dbtools# for i in...
[16:21:34] DBA, Operations, ops-codfw, Patch-For-Review: db2060 not accessible - https://phabricator.wikimedia.org/T156161#3013464 (Marostegui) The server is off now. Feel free to turn it on once it is all done (or if the HP technician doesn't show up again) Thank you!
[17:08:53] DBA: Use tls for dump backup generation - https://phabricator.wikimedia.org/T151583#3013632 (jcrespo)
[17:08:57] DBA, Operations, Performance-Team, Availability, Wikimedia-Multiple-active-datacenters: Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#3013633 (jcrespo)
[17:09:06] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013629 (jcrespo) Open>Resolved a:jcrespo I have restarted all replication channels of dbstore1002/2001/2002 and db1047. I consider this task resolved, with some follow-ups,...
[17:23:36] DBA, Operations, Performance-Team, Availability, Wikimedia-Multiple-active-datacenters: Apache <=> mariadb SSL/TLS for cross-datacenter writes - https://phabricator.wikimedia.org/T134809#3013704 (jcrespo) TLS is deployed on all core MySQLs (s*, x2, es*, pc* shards)- although for obvious reaso...
[17:31:28] DBA, Operations: Followup for TLS MariaDB server roll-out - https://phabricator.wikimedia.org/T157702#3013769 (jcrespo)
[17:37:44] DBA, Operations: Followup for TLS MariaDB server roll-out - https://phabricator.wikimedia.org/T157702#3013816 (jcrespo)
[17:37:47] DBA: Use tls for dump backup generation - https://phabricator.wikimedia.org/T151583#3013815 (jcrespo)
[17:37:51] DBA, Operations, Patch-For-Review: Set up TLS for MariaDB replication - https://phabricator.wikimedia.org/T111654#3013817 (jcrespo)
[17:59:47] DBA, Browser-Tests-Infrastructure, MediaWiki-API, Release-Engineering-Team: Database query error (internal_api_error_DBQueryError) while getting list=allrevisions - https://phabricator.wikimedia.org/T123557#3013913 (Jdlrobson) p:Triage>Unbreak! Hey this is hitting browser tests now consis...
[18:08:27] DBA, Browser-Tests-Infrastructure, MediaWiki-API, Release-Engineering-Team: Database query error (internal_api_error_DBQueryError) while getting list=allrevisions - https://phabricator.wikimedia.org/T123557#3013983 (jcrespo) @Jdlrobson I do not know the details- but the original bug is a producti...
[18:13:52] DBA, Browser-Tests-Infrastructure, MediaWiki-API, Release-Engineering-Team: Database query error (internal_api_error_DBQueryError) while getting list=allrevisions - https://phabricator.wikimedia.org/T123557#3014016 (Jdlrobson) p:Unbreak!>Triage Tracking this now in T157711
[18:14:08] DBA, MediaWiki-API: Database query error (internal_api_error_DBQueryError) while getting list=allrevisions - https://phabricator.wikimedia.org/T123557#3014020 (Jdlrobson)
[20:46:36] DBA, Operations, ops-codfw, Patch-For-Review: db2060 not accessible - https://phabricator.wikimedia.org/T156161#3014626 (Papaul) unfortunately once again the tech didn't show up as scheduled between 9 am and 11am. I had to call HP and find out why but they couldn't tell me the reason the tech did...