[04:53:09] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campaign types feature - https://phabricator.wikimedia.org/T272953 (AndyRussG) >>! In T272953#6777633, @Marostegui wrote: > testwiki is done: Ahh cool thanks so much!! Yeah looks good to me! :D
[06:10:39] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[06:11:01] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui) Open→Resolved All done
[06:50:36] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui) a:Marostegui
[06:51:21] Blocked-on-schema-change, DBA, Fundraising-Backlog: CentralNotice: Update DB schema on Meta for campaign types feature - https://phabricator.wikimedia.org/T272953 (Marostegui) a:Marostegui
[07:07:13] DBA, Data-Persistence-Backup, Orchestrator: Orchestrator db logical backups - https://phabricator.wikimedia.org/T266636 (Marostegui) Stalled→Resolved a:jcrespo I am going to close this as resolved. I have checked the backups at `dbprov2003` and they look good. `orchestrator.database_inst...
[08:50:37] DBA, wikitech.wikimedia.org, User-notice, cloud-services-team (Kanban): Restart m5 master (db1128) - https://phabricator.wikimedia.org/T272388 (Marostegui) Pre-restart steps are done
[09:04:18] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:04:20] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[09:04:25] DBA, wikitech.wikimedia.org, User-notice, cloud-services-team (Kanban): Restart m5 master (db1128) - https://phabricator.wikimedia.org/T272388 (Marostegui) Open→Resolved This was done. Downtime start: 09:00:23 Downtime stop: 09:00:51 Total: 28 seconds ` +--------------------+ | @@report_...
[09:05:00] arturo: m5 master was restarted, downtime was 28 seconds, please double check if everything from your side looks good (wikitech looks fine)
[09:07:37] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) m5 is now in orchestrator
[09:07:44] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:08:23] marostegui: awesome! <3
[09:09:02] <3!!
[09:10:03] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (Marostegui)
[09:14:20] thanks marostegui will check in a bit, currently in a meeting
[09:14:27] arturo: thanks!
[09:15:46] marostegui: I can RW on wikitech 🎉
[09:16:20] arturo: yeah, wikitech works fine, I am more worried about any other services you might have that could be using that host
[09:16:27] I have no visibility on those :)
[09:39:14] marostegui: striker was the other?
[09:39:42] arturo: you have the list of databases at https://phabricator.wikimedia.org/T272388
[09:40:33] DBA, Orchestrator: Add m* sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[09:40:39] DBA, OTRS, Recommendation-API, Research, Performance-Team (Radar): Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui)
[09:41:16] marostegui: striker works 🎉
[10:25:46] Data-Persistence-Backup, SRE, Patch-For-Review: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo)
[10:25:50] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo)
[10:58:33] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[11:09:43] DBA, Orchestrator, Patch-For-Review, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[11:15:16] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui) es5 cleaned
[11:15:24] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[11:15:31] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) es5 is now in orchestrator
[11:18:45] DBA, Orchestrator: Cleanup heartbeat.heartbeat on all production instances - https://phabricator.wikimedia.org/T268336 (Marostegui)
[11:20:11] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui)
[11:21:51] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) We need to decide what to do with hosts on es1, es2 and es3. They do not have replication enabled so we need to decide if we want them in orchestrator as standalone or not. At the momen...
[11:46:44] DBA, OTRS, Recommendation-API, Research, Performance-Team (Radar): Restart m2 database master (db1107) - https://phabricator.wikimedia.org/T272964 (Marostegui) Procedure: Pre restart [] Silence m2 hosts [] buffer pool dump + disablement in advance to make the restart faster Restart [] `!log...
[12:07:19] Data-Persistence-Backup, SRE, Patch-For-Review: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo) ` $ ssh apt1001.wikimedia.org Linux apt1001 XXXXX-amd64 #1 SMP Debian XXXXX (XXXXX) x86_64 Debian GNU/Linux 10 (buster) Backed...
[12:08:02] Data-Persistence-Backup, SRE, Patch-For-Review: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo) ` $ ssh gerrit1001.wikimedia.org ... Backed up on this host: gerrit-repo-data ... `
[12:27:00] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Kormat) >>! In T272568#6783461, @Marostegui wrote: > My personal opinion is to include them on orchestrator even if they are not replicating from each other - @Kormat thoughts? 👍 from me.
[12:32:19] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) Sounds good, I will clean up the heartbeat table there. It needs re-creation with the newer one schema, as the "shard" section doesn't exist on that one. Once that is done, hopefully o...
[12:42:09] Data-Persistence-Backup, SRE: Revert OpenSSL min version configuration introduced for bacula compatibility - https://phabricator.wikimedia.org/T273182 (jcrespo)
[12:47:17] Data-Persistence-Backup, SRE: Revert OpenSSL min version configuration introduced for bacula compatibility - https://phabricator.wikimedia.org/T273182 (jcrespo) p:Triage→Medium
[12:48:08] Data-Persistence-Backup, SRE: Revert OpenSSL min version configuration introduced for bacula compatibility - https://phabricator.wikimedia.org/T273182 (jcrespo) I have asked on ^this ticket of potential schedule, if they answer we can decide to go for exception or wait for upgrade.
[12:51:52] DBA, Orchestrator: Add m* and es4/es5 sections to Orchestrator - https://phabricator.wikimedia.org/T272568 (Marostegui) So looks like that even if the "alias" query works for all the three hosts, just the first one that gets discovered is the one that gets placed on `es1` cluster, the rest are just their...
[13:56:42] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Marostegui)
[15:12:07] hi all quick question (hopefully) is there a recommended method/profile for installing databases in a cloud project?
[15:13:33] kormat: perhaps?
[15:13:59] hiii
[15:15:04] jbond42: the usual profiles work in cloud, but the firewall rules might not be what you want
[15:16:27] you're probably best off asking wmcs
[15:17:01] kormat: when you say the "usual profiles", ah ok was going to point out that there are 46 mariadb profiles :)
[15:17:07] thanks i'll ask in cloud
[15:20:22] jbond42: i'll probably regret asking, but what's the context? :)
[15:20:40] jbond42: apt-get install :-P
[15:20:42] * volans hides
[15:22:33] kormat: creating a copy of idp.wikimedia.org in cloud
[15:22:44] it stores u2f registrations in a sql db
[15:23:36] mariadb?
[15:23:47] yes in production it uses m1
[15:23:55] ah i see
[15:24:17] it doesn't have to store them in a sql db and it's a bit overkill for a dev environment but at the same time i want to try and mirror production as best i can
[15:25:09] andrewb.ogott: has pointed me towards role::mariadb::cloudinfra which should be enough to get me sorted
[15:25:13] i think you're likely to end up needing a custom mariadb profile
[15:25:25] ok. good luck :)
[15:25:59] :) thanks i'll try not to pollute things further
[15:26:59] if you have any questions about our puppet code,
[15:27:05] i'll get you marostegui's number
[15:27:12] lol :D
[15:37:09] marostegui the DIMM has arrived for clouddb1019
[15:47:47] cmjohnson1: I will power off the host for you, one sec
[15:47:55] okay
[15:47:56] thanks
[15:48:02] should only take a minute
[15:49:29] cmjohnson1: host powered of!
[15:49:30] off
[15:57:31] marostegui server is back up
[16:02:08] cmjohnson1: thanks I can see the memory back!
[16:02:14] I will take it from here, thank you so much
[16:06:08] DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Cmjohnson)
[16:06:44] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Cmjohnson) Open→Resolved This has been fixed. DELL Return tracking # USPS 9202394653012447257126
[16:08:46] DBA, SRE, ops-eqiad: Memory errors on clouddb1019 - https://phabricator.wikimedia.org/T272125 (Marostegui) Thank you so much
[16:41:48] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo)
[16:49:10] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) :-) ` 303254 Back Full 0 0 helium.eqiad.wmnet-Monthly-1st-Wed-Archive-archive-backup is running `
[18:32:01] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` db1159.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-rei...
[18:54:02] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1159.eqiad.wmnet'] ` and were **ALL** successful.
[19:04:56] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[19:07:14] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH) >>! In T267043#6767978, @Marostegui wrote: > @RobH it looks like db1163 has RAID0 instead of RAID10: > ` Acknowledged, fix and reimage in progress!
[19:08:01] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` db1163.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-rei...
[19:30:11] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1163.eqiad.wmnet'] ` and were **ALL** successful.
[19:41:22] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH) >>! In T267043#6784890, @RobH wrote: >>>! In T267043#6767978, @Marostegui wrote: >> @RobH it looks like db1163 has RAID0 instead of RAID10: >> ` > > Acknowledged,...
[19:42:37] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` db1171.eqiad.wmnet ` The log can be found in `/var/log/wmf-auto-rei...
[20:04:36] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1171.eqiad.wmnet'] ` and were **ALL** successful.
[20:19:02] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) yay ` 303254 Full 4 568.7 G OK 28-Jan-21 20:16 helium.eqiad.wmnet-Monthly-1st-Wed-Archive-archive-backup `
[20:24:40] Data-Persistence-Backup, SRE, Goal, Patch-For-Review: Followup to backup1001 bacula switchover (misc pending tasks) - https://phabricator.wikimedia.org/T238048 (jcrespo) yay*2 ` 303259 Restore 1 2.369 G OK 28-Jan-21 20:21 RestoreFiles ` ` $ diff /var/tmp/bacula-restores/srv/ba...
[20:26:41] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[21:15:42] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['db1172.eqiad.wmnet', 'db1173.eqiad.wmnet', 'db1175.eqiad.wmnet']...
[21:39:55] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1175.eqiad.wmnet', 'db1172.eqiad.wmnet', 'db1173.eqiad.wmnet'] ` and were **ALL** successful.
[21:51:00] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (aaron) >>! In T269324#6772394, @Marostegui wrote: > This is all done - hosts are ready to start getting data. I was thinking that these would be setup just like the pcxxxx servers (e.g. each server in eqiad ha...
[22:34:33] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[22:36:05] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH) Open→Resolved
[22:36:07] DBA, SRE, Patch-For-Review: Productionize db1155-db1175 and refresh and decommission db1074-db1095 (22 servers) - https://phabricator.wikimedia.org/T258361 (RobH)