[00:11:56] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[00:26:00] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[00:31:15] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts: ` ['db1156.eqiad.wmnet', 'db1157.eqiad.wmnet', 'db1158.eqiad.wmnet',...
[00:58:13] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1166.eqiad.wmnet', 'db1164.eqiad.wmnet', 'db1170.eqiad.wmnet', 'db1162.eqiad.wmnet', 'db1160.eqiad.wmnet', 'db1...
[01:06:37] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH)
[01:07:51] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (RobH) > Of which those **FAILED**: > ` > ['db1159.eqiad.wmnet', 'db1171.eqiad.wmnet', 'db1172.eqiad.wmnet', 'db1173.eqiad.wmnet', 'db1175.eqiad.wmnet'] > ` I've updated...
[05:47:54] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Marostegui) Open→Resolved a:Marostegui Change has been applied - thanks daniel for working out the patch!
[05:51:58] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Marostegui) Actually the original patch creator was @Aklapper so thank you too!
:)
[05:58:06] DBA: m2 codfw master crashed - https://phabricator.wikimedia.org/T272614 (Marostegui) So it also made the slave in codfw crash: ` Jan 21 17:15:50 db2078 mysqld[2936]: 210121 17:15:50 [ERROR] mysqld got signal 11 ; Jan 21 17:15:50 db2078 mysqld[2936]: This could be because you hit a bug. It is also possible t...
[06:05:19] DBA, Patch-For-Review: Productionize x2 databases - https://phabricator.wikimedia.org/T269324 (Marostegui) Thank you Chris! I have pooled codfw hosts with no issues. On Monday I will do the same with eqiad ones as I rather not pool eqiad new hosts on Friday - just in case even if they are not in use, so...
[06:05:56] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) Thanks @mmodell - going to send a calendar invite for 06:00 AM UTC!
[06:06:35] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui) For Wednesday 27th!
[06:32:48] DBA, Phabricator: Restart m3 (phabricator) database master db1132 - https://phabricator.wikimedia.org/T272596 (Marostegui)
[06:33:56] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[06:44:08] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[06:58:11] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[07:23:59] DBA, DC-Ops, SRE, ops-eqiad: (Need By: 2020-11-29) rack/setup/install db11[51-76] - https://phabricator.wikimedia.org/T267043 (Marostegui) @RobH it looks like db1163 has RAID0 instead of RAID10: ` root@db1163:~# megacli -LdPdInfo -a0 Adapter #0 Number of Virtual Disks: 1 Virtual Drive: 0 (Targe...
[07:39:00] Blocked-on-schema-change, DBA: Schema change for timestamp field of uploadstash - https://phabricator.wikimedia.org/T270055 (Marostegui)
[09:08:32] jynus: am i correct that the cumin hosts aren't backed up at all?
[09:08:45] I can check
[09:09:03] are you concerned about any dataset in particular?
[09:09:17] /home/kormat :)
[09:10:56] /srv/deployment is backed up from cumin only
[09:11:25] I think that is by design, you shouldn't store anything on /home that you didn't want to lose
[09:12:04] but if you put a ticket with convincing arguments (not to me, I am happy to do it with no problem) we can easily change that
[09:17:41] gotcha
[09:44:35] jynus: After rebuilding the m2 hosts, is there a place where I can copy+paste the dump grants for the specific databases in m2?
[09:44:54] apart from db1117 and changing the IP to codfw hosts?
[09:46:03] not really, I can take care of that and promise to document them better
[09:46:37] jynus: don't worry, I can just dump them from db1117:3322 and change the IPs to the codfw hosts, maybe we can track them in a .sql file in puppet like we do with the rest of grants
[09:50:14] done, applied them from db1117:3322 changing the ips
[09:51:57] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[09:56:25] DBA: Grant "sockpuppet_import" user INDEX on "sockpuppet" database - https://phabricator.wikimedia.org/T272533 (Marostegui)
[09:56:37] DBA: m2 codfw master crashed - https://phabricator.wikimedia.org/T272614 (Marostegui) Open→Resolved I have rebuilt the hosts - and also upgraded their mariadb version. There's very little information on what actually caused the crash, I think the index creation played a role here, but hard to know th...
[10:08:13] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Aklapper) Resolved→Open >>! In T272654#6767902, @Marostegui wrote: > Change has been applied (Thanks everyone.) Hmm, https://gerrit.wikim...
[10:18:14] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Marostegui) Open→Resolved I forgot the m3-slave CNAME uses the hostname directly instead of the proxy, which is not nice but we can fix th...
[10:20:05] jynus: kormat: would it make sense to have a list of backed up directories in the MOTD, so immediately obvious when logging in?
[10:20:35] I think that should be doable
[10:20:42] i like the idea
[10:22:22] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Aklapper) Yes, works now! Does that mean https://gerrit.wikimedia.org/r/c/operations/puppet/+/657692/ should be closed or abandoned or so? Thanks!
<3
[10:24:10] Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo)
[10:31:35] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[10:43:33] I am going to work as a goal on bacula puppet clean up, so I most likely will be able to take care of that refinement
[10:48:26] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[10:50:37] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat) Script to check: ` NODESET="db[1075-1076,1093,1109,1130,1134,1136,1138,1141,1149].eqiad.wmnet" for inst in $(nodeset -e "$NODESET"); do if [ "$(sudo -H mysql.py -h $inst -B...
[11:22:23] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[11:24:41] DBA, netbox: Grants not working with DB hosts with to ipv6 - https://phabricator.wikimedia.org/T270101 (jcrespo)
[11:36:19] Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (LSobanski) p:Triage→Medium Sounds like a good idea. Is this to address a specific concern that came up? One thing that comes to mind is the amount, re...
[11:40:00] Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo) > Is this to address a specific concern that came up?
^@mark
[11:43:10] Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (mark) It's purely an idea I've had for a long time, to make it immediately obvious to anyone logging in what is backed up, and what isn't. That should help to...
[11:46:00] I was looking over pending reboots and I think dbmonitor2001 can be decommissioned, right? it was never a real fallback for dbmonitor1001 (since tendril doesn't work on PHP 7) and if we add cross DC redundancy for Orchestrator it would be named dborch2001, so it seems to me it can simply go away?
[11:49:05] Data-Persistence-Backup, SRE: print a list of backed up directories in the MOTD of production servers - https://phabricator.wikimedia.org/T272686 (jcrespo) 2 notices: * This could only cover direct bacula backups (things that are indirectly backed up, like puppet or gerrit repos) or database and media ba...
[12:34:17] marostegui, I am documenting dump grants on puppet, you will be shocked soon why I was waiting to do it:-)
[12:34:57] * jynus warns everyone to not look directly to the eyes of my next patch
[12:35:55] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[12:40:12] Just call it a Medusa patch, or maybe a Basilisk patch
[12:57:52] I just used the right emoji on comment https://gerrit.wikimedia.org/r/c/operations/puppet/+/657801
[12:59:44] this will test the limit of "a bad patch is better than no patch"
[13:05:00] I am also going to add the grant inventory to potential topics for GSOC
[13:08:36] haha nice patch
[13:09:01] jynus: I don't get the commit message?, which docs?
[13:09:18] it was supposed to be documentation, not documents
[13:09:22] will amend title
[13:09:33] aah
[13:12:04] DBA, Phabricator, SRE, Patch-For-Review: Grant phstats user SELECT rights for phabricator_policy database - https://phabricator.wikimedia.org/T272654 (Marostegui) Just merged it! :)
[13:13:38] I just realized there was a task for this T111929
[13:13:39] T111929: Puppetize grants for mysql hosts that are the source of recovery (dbstore, passive misc) - https://phabricator.wikimedia.org/T111929
[13:13:55] there's always a task!
[13:15:40] https://knowyourmeme.com/memes/relevant-phabricator-task
[13:17:01] this reminds me, x2 is still WIP, right?
[13:17:08] yes
[13:17:17] ok, will wait when done to talk backups
[13:17:18] and no backup is needed - see my comment on the patchset
[13:17:22] oh, true
[13:17:25] I forgot
[13:17:32] sorry
[13:17:41] I didn't want to abandon the patch myself as I am not the owner
[13:17:54] there's a patch?
[13:18:10] https://gerrit.wikimedia.org/r/c/operations/puppet/+/649820
[13:18:36] look, this week I had like 10 access request, not my best week :-)
[13:19:16] Eh?
[13:19:16] oh, I missed the response because I was away,
[13:19:36] I mean that I am a bit overloaded with info, and clumsy
[13:20:09] because clinic duty
[13:20:26] I have not complained at all anywhere
[13:26:59] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat)
[13:27:08] DBA, Orchestrator, User-Kormat: Enable report_host for mariadb - https://phabricator.wikimedia.org/T266483 (Kormat)
[13:27:10] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Kormat) Open→Resolved This is now complete.
[13:27:37] DBA, Orchestrator, User-Kormat: Enable report_host on candidate masters - https://phabricator.wikimedia.org/T271106 (Marostegui) a:Marostegui→Kormat Thanks!
[13:33:56] DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui)
[13:35:08] DBA, Patch-For-Review: Test upgrading sanitarium hosts to Buster + 10.4 - https://phabricator.wikimedia.org/T268742 (Marostegui)
[13:35:14] DBA, cloud-services-team (Kanban): Move wikireplicas under the new sanitarium hosts (db1154, db1155) - https://phabricator.wikimedia.org/T272008 (Marostegui) Open→Stalled clouddb1015:3314 moved. The only pending host is clouddb1019 which is waiting for on-site maintenance as it is inaccessible (T...
[15:05:50] * sobanski stepping out for the rest of the day