[00:53:47] DBA, Beta-Cluster-Infrastructure, Continuous-Integration-Infrastructure, Wikimedia-Rdbms, Epic: Enable MariaDB/MySQL's Strict Mode - https://phabricator.wikimedia.org/T108255 (Reedy)
[01:05:19] DBA, Performance-Team: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (dpifke)
[01:06:08] DBA, Performance-Team: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (dpifke) p:Triage→Low This is not urgent; we can make do at the moment by logging in as 'xhgui'.
[04:39:55] kormat or jynus around?
[04:40:45] hey, what's up?
[04:41:26] kormat: With the latest changes to switchover.py I am not sure about: https://phabricator.wikimedia.org/P12280
[04:41:37] I can use the old repo, so that is ok
[04:42:17] I added one more line to the error, so refresh :)
[04:42:21] `db-switchover`
[04:42:26] aha!
[04:42:58] that works
[04:43:02] great
[04:43:02] Thanks
[04:43:06] I will replace the documentation
[05:02:19] DBA, Phabricator, Patch-For-Review: Upgrade m3 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T259589 (Marostegui) Test read-only off
[05:02:41] DBA, Phabricator: Upgrade m3 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T259589 (Marostegui)
[05:02:54] DBA, Operations, Phabricator: Upgrade m3 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T259589 (Marostegui)
[05:09:14] DBA, Epic, Patch-For-Review: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (Marostegui)
[05:09:39] DBA, Operations, Phabricator: Upgrade m3 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T259589 (Marostegui) Open→Resolved This was done successfully. m3 fully runs Buster and MariaDB 10.4 db1128 will be moved to m5, and that will be tracked at T260324 Thanks @mmodell for hel...
[07:01:53] DBA, Operations, Phabricator: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (jcrespo)
[07:12:35] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for thankyouwiki - https://phabricator.wikimedia.org/T260551 (Marostegui) a:Marostegui→None Thank you, the triggers worked fine, that new user got redacted correctly.
[07:15:19] marostegui: `dpkg -L wmfmariadbpy-admin` - that'll show you the names of the executables that are provided now on cumin machines
[07:16:01] kormat: thanks, that's useful!
[07:17:26] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for lldwiki - https://phabricator.wikimedia.org/T259436 (Marostegui) `lldwiki_p` database created `| GRANT SELECT, SHOW VIEW ON `lldwiki\_p`.* TO 'labsdbuser'` also applied. #cloud-services-team please go ahead and cre...
[07:20:25] DBA, Operations, Phabricator: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (jcrespo)
[07:22:57] DBA, Performance-Team: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (Marostegui) I think it should be fine to give ALTER, most of the applications that live on misc, have users that have it. However, it would be good to get a heads up anytime you're going...
[07:24:58] I will make sure the grants are the same on all m3 hosts, cleanup anything that is obsolete and reflect all changes on the grant files
[07:25:58] thanks jynus
[07:39:46] DBA, Operations, Parsoid, serviceops, Parsoid-Tests: update mysql GRANTs for testreduce - https://phabricator.wikimedia.org/T260627 (Kormat) Hi, i've created the new grants. Please test and let me know if there are any issues. Cheers.
[07:47:30] DBA, Phabricator, Release-Engineering-Team-TODO, serviceops, and 3 others: Improve privilege separation for phabricator's config files and mysql credentials - https://phabricator.wikimedia.org/T146055 (jcrespo) Stalled→Open
[07:56:07] DBA, Phabricator, Release-Engineering-Team-TODO, serviceops, and 4 others: Improve privilege separation for phabricator's config files and mysql credentials - https://phabricator.wikimedia.org/T146055 (jcrespo) This is done: https://gerrit.wikimedia.org/r/c/operations/puppet/+/620879 Anything pen...
[07:59:02] did you have the time to check https://gerrit.wikimedia.org/r/c/operations/puppet/+/620722 ? I don't need it merged fast, but I would like to know if the general approach would be correct
[08:02:24] jynus: i've had a quick look. i disagree that this is blocking merging the wmfmariadbpy CR. now that we're doing _releases_ of wmfmariadbpy, the constraint is that we need section_ports.csv deployed before we can do the 0.5 release.
[08:02:46] but also we need that file on all cumin+db hosts, because with 0.4 python3-wmfmariadbpy will be deployed on all those hosts
[08:04:06] sure
[08:04:15] I just did the example with cumin only
[08:04:42] I think we could change wmfmariadbpy.pp into a package + config setup
[08:04:44] i would prefer to merge the wmfmariadbpy CR, and finish the 0.4 release process before coming back to this
[08:04:55] which one?
[08:05:20] https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/620291/
[08:05:46] ok, and that will not affect you?
[08:05:50] actually merging that before 0.4 doesn't really matter
[08:06:03] let's wait, just in case
[08:06:09] ok
[08:06:42] but my question wasn't really about any of the above, have no issue with that
[08:07:04] it was more of if the puppet idea was in the right direction
[08:07:15] but I am guessing the answer is "will look at it later"
[08:07:46] i like the general idea
[08:08:17] cool, it was just that
[08:08:27] I will stop working on it until a deploy is made
[08:08:39] is there something I can do to help the first deploy/version
[08:08:53] i'm blocked on gerrit permissions :/
[08:09:25] if I have them, I could create the tag with your instructions
[08:10:00] in any case, in other topics, backup2001 now has eqiad backups
[08:10:50] it got stuck yesterday: https://phabricator.wikimedia.org/P12285
[08:10:53] but it worked today
[08:11:27] A for aborted and f for failed, T for terminated OK
[08:12:22] I restarted all hosts because there were buffer errors on bacula
[08:12:55] weird stalls
[08:13:34] I am working now on the memory dashboard, I think I have to relearn prometheus with thanos storage
[08:22:25] marostegui: is the old m3 master still around? I want to do a grant diff
[08:22:34] yep it is
[08:25:14] it is almost impossible to diff between both :-(
[08:27:08] but they are the same as db1117
[08:27:19] I will keep a dump just in case
[08:37:03] DBA, Operations, Parsoid, serviceops, and 2 others: update mysql GRANTs for testreduce - https://phabricator.wikimedia.org/T260627 (Kormat)
[08:37:20] we need a grant database
[08:37:52] that way we can have a n:n relationship between client hosts, users and its corresponding grants
[08:47:22] DBA, Operations, Phabricator, Patch-For-Review: replace phabricator db passwords with longer passwords - https://phabricator.wikimedia.org/T250361 (jcrespo) Open→Resolved a:jcrespo
[08:53:55] marostegui: I have closed the phab ticket as the scope is technically done, but I will spend some extra time cleaning up/adding missing grants
[08:54:03] jynus: thanks
[08:56:42] DBA, Operations, Sustainability (Incident Followup): Redefine mysql GRANTs for wikiadmin - https://phabricator.wikimedia.org/T249683 (jcrespo)
[08:57:03] we also have T201662 pending...
[08:57:39] yes, we have many things pending
[08:58:54] he he
[09:02:46] Last dump for s1 at eqiad (db1139.eqiad.wmnet:3311) taken on 2020-08-18 00:55:26 is 147 GB, but previous one was 157 GB, a change of 6.3%
[09:04:16] don't worry, when you come back from vacations, I will have fixed all tickets!!!!
[09:04:45] https://jynus.com/gif/high_five.gifv
[09:22:19] hi! I should really stay in this channel
[09:22:28] hashar: welcome :)
[09:23:09] so you can't change rights for the transferpy gerrit repo
[09:23:27] even jynus who is in the owner group "operations-software-wmfmariadbpy"
[09:23:57] aand jynus can't add anyone there cause the group is owned by the "Gerrit managers" group
[09:24:00] which is well badly named
[09:24:18] so I have just changed the group to be self owned
[09:24:31] so that anyone in operations-software-wmfmariadbpy can add new members to it
[09:24:41] which in turn grants the ability to change the access
[09:25:05] and added kormat to it ;)
[09:25:14] \o/
[09:25:18] hashar: thank you! :)
[09:26:02] thanks
[09:26:52] aaand it works: https://gerrit.wikimedia.org/r/admin/repos/operations/software/wmfmariadbpy,tags
[09:28:33] amazing
[09:28:34] to be fair
[09:28:41] Gerrit permissions are a bit messy :-\
[09:39:10] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui) I have merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881 so the hosts will get installed with RAID10...
[09:39:13] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (Marostegui) I have merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/620881 so the hosts will get installed with RAID10...
[09:40:37] DBA, DC-Ops, Operations, ops-codfw, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (Marostegui)
[09:41:29] DBA, DC-Ops, Operations, ops-eqiad, Patch-For-Review: (Need By: 2020-08-31) rack/setup/install es10[26-34].eqiad.wmnet - https://phabricator.wikimedia.org/T260370 (Marostegui)
[09:43:59] the list of ports would simplify the multiinstance classes a lot: https://gerrit.wikimedia.org/r/c/operations/puppet/+/620899
[09:44:40] oh that's nice
[09:46:23] not sure if we should maintain separate lists of extra_ports and prometheus ports: convention or configuration?
[10:06:56] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui)
[10:08:13] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) p:Triage→High Setting this to high to make sure the host is back online before 1st of September, which is when the DC switchover is happening
[10:41:52] DBA, Operations, ops-codfw, Patch-For-Review: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) The mgmt interface became responsive again, maybe switch issue? @ayounsi could you help checking?
[10:43:20] DBA, User-Kormat, cloud-services-team (Kanban): Parametrize wmf-pt-kill so it can connect to different sockets - https://phabricator.wikimedia.org/T260511 (Kormat)
[10:43:43] DBA, Operations, ops-codfw, Patch-For-Review: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) These are the HW logs of the host: ` ------------------------------------------------------------------------------- Record: 3 Date/Time: 08...
[10:52:46] DBA, Operations, ops-codfw, Patch-For-Review: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) I cannot see anything on the console, however the idrac says that the host is on, so I had to issue a power cycle, which didn't work and I had to...
[10:59:16] DBA, Operations, ops-codfw, Patch-For-Review: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (ayounsi) >>! In T260670#6392699, @Marostegui wrote: > The mgmt interface became responsive again, maybe switch issue? @ayounsi could you help checking? Those...
[10:59:51] DBA, Operations, ops-codfw, Patch-For-Review: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Thank you!
[11:06:17] DBA, Data-Services, cloud-services-team (Kanban): Prepare and check storage layer for thankyouwiki - https://phabricator.wikimedia.org/T260551 (Marostegui) The private data check came back clean so I have: - created `thankyouwiki_p` - Gave the grants to `labsdbuser`: `| GRANT SELECT, SHOW VIEW ON `t...
[11:31:42] DBA: Compare a few tables per section before the switchover - https://phabricator.wikimedia.org/T260042 (Marostegui) a:Marostegui Will work on this next week, before the switchover
[11:32:21] DBA, Patch-For-Review, Sustainability (Incident Followup), User-Banyek: Automatically compare a few tables per section between hosts and DC - https://phabricator.wikimedia.org/T207253 (Marostegui)
[11:34:07] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (jcrespo) Does this host need reprovisioning?
[11:34:47] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) Replication came back clean and with no errors on startup (or after mysql_upgrade), so I think we are good
[11:59:59] jynus, marostegui: ok, now that i can finally release wmfmariadbpy 0.4, i'm going to follow this (slightly updated) deployment plan: https://phabricator.wikimedia.org/P12206
[12:03:07] looks ok to me, just one question, could step 10 be done by deploying only https://gerrit.wikimedia.org/r/c/operations/puppet/+/619443/4/modules/mariadb/manifests/packages_wmf.pp ?
[12:03:25] (better than install it manually?)
[12:03:30] kormat: sounds good to me, the last line, 29, is that intended to be switchover.py or db-switchover?
[12:04:40] jynus: i'm deliberately not using puppet before step 15
[12:04:44] ok
[12:04:52] then install it in several batches
[12:04:58] marostegui: updated
[12:05:45] as in, don't run apt install on 200 hosts at the same time :-D
[12:06:04] kormat: edited it s/dc/db
[12:06:26] oops, right
[12:07:26] jynus: i can use cumin with a small batch size i guess
[12:08:04] I have edited wikitech to change switchover.py for db-switchover across our doc
[12:08:26] awesome :)
[12:08:38] what was the sql statement you used for that? ;)
[12:08:55] hahaha it was just a couple of pages that contained it
[12:29:48] wmfmariadbpy 0.4 installed on cumin2001; mysql.py and db-replication-tree work fine
[12:30:03] installing to db2131 now (to test buster)
[12:34:13] works fine. same done for db2094 (stretch)
[13:09:40] DBA, observability: check_mariadb_dump failing on alert[12]* hosts - https://phabricator.wikimedia.org/T260686 (fgiunchedi)
[13:57:44] DBA, Performance-Team, Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (Marostegui) First part of the task (revoking grants) done: ` # ./section m2 | while read host port; do echo "==== $host:$port ===="; mysql.py -h$host:$port -e "show...
[13:58:31] DBA, Performance-Team, Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (Marostegui)
[14:07:31] sustained lag on db1104
[14:08:10] yes
[14:08:11] MCR
[14:08:20] bad performance since yesterday
[14:08:22] oh
[14:08:28] Yes, it has been running since yesterday
[14:08:29] as in ongoing maintenance?
[14:08:44] let me downtime it then?
[14:08:44] yes
[14:08:51] it is downtimed
[14:08:57] oh
[14:09:21] Ah, I see, it expired downtime
[14:09:24] ah
[14:09:26] I will downtime it again
[14:09:37] done
[14:09:38] I was like... am I looking at the wrong host?
[14:21:57] wmfmariadbpy deployment status: everything until line 27 is done. i'm going to wait until tomorrow before i delete check_mariadb.py
[14:23:32] good work, kormat
[14:23:47] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) If it is switch issue we should know today since msw-c1 is set to be replaced today.
[14:24:52] ta :)
[14:26:14] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) @Papaul there's definitely also something going on with the host, as the CPU errors reported on the HW logs do match the alert time
[14:27:48] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) @Marostegui can you depool it so i can do some maintenance on?
[14:29:15] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Marostegui) @Papaul mysql stopped, you can proceed as needed. Thank you!
[14:29:22] I can take care, marostegui go rest
[14:30:22] jynus: I will be around for another 15/20 mins as I need to slowly repool db1104, which just finished MCR :)
[14:30:44] btw, the only thing to handover is db2125, in case papaul needs something from us (like now)
[14:30:47] jynus kormat ^
[14:30:56] the rest is all stopped, no on-going maintenance or anything like that
[15:25:11] DBA, Operations, ops-codfw: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (Papaul) a:Papaul→Marostegui Drained the power from the server and did FW upgrade Before FW upgrade BIOS Version 2.4.7 iDRAC Firmware Version 3.36.36.36 After FW upgrad...
[16:11:44] DBA, Performance-Team, WikimediaDebug: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (Krinkle)