[05:59:58] DBA, Orchestrator, User-Kormat: Enable communication between orchestrator and clouddb hosts - https://phabricator.wikimedia.org/T273606 (Marostegui) \o/ thanks!
[06:02:28] DBA, SRE: Decom dbmonitor2001 - https://phabricator.wikimedia.org/T274496 (Marostegui) p:Triage→Medium a:Kormat Yeah, as far as I remember we're not using this for anything Assigning it for Stevie for confirmation and removal (if that applies)
[06:10:23] DBA, SRE, ops-eqiad: Investigate and repool db1134 - https://phabricator.wikimedia.org/T274472 (Marostegui) Thanks everyone who responded to this incident!
[06:17:33] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) >>! In T258361#6822070, @jcrespo wrote: > I am taking db1163 to, at least temporarily, substitute db1134 due to T274472. Thanks. I...
[07:58:24] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui) a:Marostegui
[07:59:16] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[08:08:34] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui) Altered first db1098:3316 and will leave it for 24h to see if any errors show up s6 progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 []...
[08:08:37] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui)
[08:08:40] Blocked-on-schema-change, DBA: Schema change for renaming name_title_timestamp on archive table - https://phabricator.wikimedia.org/T273359 (Marostegui) ` # for i in frwiki jawiki ruwiki; do echo $i; mysql.py -hdb1098:3316 $i -e "show create table archive\G" | grep title_timestamp ; done frwiki KEY `ar...
[08:26:51] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[08:30:32] DBA, decommission-hardware, Patch-For-Review: decommission db1075.eqiad.wmnet - https://phabricator.wikimedia.org/T274235 (Marostegui)
[08:30:48] DBA, decommission-hardware, Patch-For-Review: decommission db1075.eqiad.wmnet - https://phabricator.wikimedia.org/T274235 (Marostegui)
[08:45:50] DBA, Add-Link, Growth-Structured-Tasks, Growth-Team (Current Sprint), and 2 others: Add Link engineering: Provide a mechanism for storing data about which link recommendations were rejected by the user - https://phabricator.wikimedia.org/T266446 (kostajh) I don't think there is anything to QA her...
[09:00:52] DBA, decommission-hardware: decommission db1076.eqiad.wmnet - https://phabricator.wikimedia.org/T274752 (Marostegui)
[09:01:22] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:01:27] DBA, decommission-hardware: decommission db1076.eqiad.wmnet - https://phabricator.wikimedia.org/T274752 (Marostegui)
[09:01:30] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:40:57] marostegui: candidate master tag on dbctl was not touched, as I saw it was already in a weird state; I will leave it to you to put it in the state you prefer
[09:44:15] for db1163?
[09:47:04] all of s1 needs a candidate master tag revision
[09:47:15] so I didn't want to touch it
[09:47:31] sure, I will take care of it, db1134 needs to be removed and db1163 will be added
[09:47:48] just leave it in the way you prefer
[09:48:12] as I knew that was not used at the moment and I wasn't sure how to leave it
[09:48:53] maybe the checklist needs a point of "update candidate master tag on dbctl" or something, until integrated
[09:50:47] fixed, db1134 removed, db1163 added
[09:51:27] ah, and db1134 was left in STATEMENT
[09:51:45] I am telling you all of this so you are 100% aware of current status
[09:52:46] This is what I have in mind as next steps: https://phabricator.wikimedia.org/T258361#6829557
[09:53:47] https://gerrit.wikimedia.org/r/c/operations/puppet/+/664230
[10:30:58] would it be possible to reboot cumin1001 tomorrow from the perspective of DB/backup maint tasks?
[10:31:40] moritzm: I am doing some data checks there, so if you can give me a heads up tomorrow that'd be cool so I can write down where I stop it
[10:33:13] ack, will do (if it also works out for backups)
[10:43:06] for me tomorrow is the best day
[10:43:22] although backups are usually done any time after 9
[11:02:34] great, I'll recheck with both of you tomorrow before I start
[12:15:25] DBA, Patch-For-Review, Performance-Team (Radar): Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) a:Marostegui→Kormat Thanks @Krinkle and @aaron. We synced up in our meeting today, assigning this to Stevie for the puppet changes needed.
[12:51:51] DBA, decommission-hardware, Patch-For-Review: decommission db1093.eqiad.wmnet - https://phabricator.wikimedia.org/T273955 (Marostegui)
[13:02:50] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) db1162 is fully pooled
[15:55:49] hi… following up on T272571 I would like to update the suggester
[15:55:49] T272571: Updating's Wikidata property suggester caused replica lag on all wikidata databases - https://phabricator.wikimedia.org/T272571
[15:56:11] With the new process (data refining done prior to importing) this should be fine
[15:56:29] but nevertheless, I wanted to coordinate the update this time
[15:59:50] jynus: ^
[16:00:16] I read you, but I am unclear about the ask?
[16:00:50] s/unclear/unsure/
[16:02:18] I mostly wanted to give you a heads up... is there anything in particular you want me to look at (other than the replag dashboard)?
[16:02:37] nothing else, really
[16:03:01] the replag and the db error dashboard
[16:03:17] https://logstash.wikimedia.org/app/dashboards#/view/87348b60-90dd-11e8-8687-73968bebd217
[16:04:22] if you did some tweaks and expect no lag, you can resolve it
[16:04:31] (the ticket)
[16:15:07] The update ran through and the dashboards look good (as expected)
[16:15:13] Thus I'm going to resolve the ticket
[16:15:31] cool, thanks!
[16:45:50] DBA: Add a link engineering: Database for link recommendation service - https://phabricator.wikimedia.org/T267214 (kostajh) Resolved→Open @Marostegui I was wondering if you could double-check the permissions, I am seeing this for the admin user: ` pymysql.err.OperationalError: (1142, "SELECT command...
[17:54:41] Data-Persistence-Backup, Analytics: Evaluate the need to generate and maintain zookeeper backups - https://phabricator.wikimedia.org/T274808 (jcrespo)
[17:56:40] Data-Persistence-Backup, Analytics: Evaluate the need to generate and maintain zookeeper backups - https://phabricator.wikimedia.org/T274808 (jcrespo) This is not something we ask analytics to take care of, but for the initial questions, I believe @elukey or @ottomata may be the person to know more about...
[17:58:04] jynus: o/ I'd be interested to start a discussion (for next fiscal in case) with you about backup of Hadoop data during the next weeks if you have time
[17:58:24] ha ha, nice try to derail me :-DDDD
[17:58:35] I'd be happy to talk about backups
[17:58:37] nono I am going to check the zookeeper ticket I promise
[17:58:37] of anything
[17:58:48] ticket is mostly a question (but an informed one)
[17:58:56] please don't think I am dumping a task
[17:59:18] we have just created an ad-hoc cluster to support the upgrade of the hadoop cluster to bigtop (that we are going to repurpose during the following days) since we needed to save ~400TB of data
[17:59:28] elukey, we should indeed sync up on backup needs
[17:59:34] but long term it would be great to have a backup of datasets that we cannot re-create
[17:59:52] what is the best medium? Should I open a task and add you in Cc?
[17:59:54] we have a lot of upcoming tasks, and simplifying workflows, or creating some that work for all
[18:00:01] super thanks a lot :)
[18:00:04] would be beneficial to all
[18:00:42] the thing is, some of the tools we have may not work great for all usages
[18:01:33] for example, for media backups we need a completely different backup model and tech
[18:01:43] we may need the same for whatever datasets you handle
[18:01:56] the same == something completely different
[18:02:34] for now, let's schedule a meeting for 2 topics:
[18:02:49] (even if very in the future)
[18:02:59] 1) persistence team understanding analytics needs
[18:03:08] (including hadoop and what you mention)
[18:03:52] 2) analytics helping persistence document existing datasets and their backup needs
[18:04:57] sounds good
[18:13:44] DBA, Data-Persistence-Backup, Data-Persistence: Drop unused database "bacula" from m1 - https://phabricator.wikimedia.org/T274809 (jcrespo)
[18:14:53] DBA, SRE, serviceops, Goal, Patch-For-Review: Strengthen backup infrastructure and support - https://phabricator.wikimedia.org/T229209 (jcrespo)
[18:15:12] DBA, Data-Persistence-Backup, Data-Persistence: Drop unused database "bacula" from m1 - https://phabricator.wikimedia.org/T274809 (jcrespo)
[18:15:15] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo)
[18:15:42] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) Open→Resolved Regarding the last 2 points, we have, in a way, done the last point "parametrize better the jobdefaults i...