[05:27:58] 10DBA, 10Operations, 10ops-codfw, 10User-Kormat: db2125 crashed - mgmt iface also not available - https://phabricator.wikimedia.org/T260670 (10Marostegui) Thank you! I have just uploaded the new version of mariadb (10.4.14) to the repo which has been tested on a couple of servers (codfw and eqiad) for a week
[05:36:38] 10DBA, 10observability: check_mariadb_dump failing on alert[12]* hosts - https://phabricator.wikimedia.org/T260686 (10Marostegui) p:05Triage→03Medium a:03Marostegui Assigning to Jaime to see if he can take an initial look during the week
[05:39:56] 10DBA, 10observability: check_mariadb_dump failing on alert[12]* hosts - https://phabricator.wikimedia.org/T260686 (10Marostegui) a:05Marostegui→03jcrespo
[07:22:10] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10Marostegui) p:05Triage→03Medium Thanks @ifried for reaching out. Overall, I feel good with that plan, especially the order and the set of wikis it will be de...
[07:32:22] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Marostegui) >>! In T260373#6401735, @jcrespo wrote: > @Papaul (manuel is on vacation until Monday), what about 1 on A1 and 2 on A6? Same row but it lo...
[08:05:18] 10DBA: dbtree slowdown 2020-08-20 - https://phabricator.wikimedia.org/T260876 (10Marostegui) Thank you guys for debugging and fixing!
[08:06:06] 10DBA, 10Continuous-Integration-Config, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1)): Run wmfmariadbpy on CI - https://phabricator.wikimedia.org/T261098 (10hashar)
[08:08:20] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (2020-08-15) rack/setup/install dbprov1003.eqiad.wmnet - https://phabricator.wikimedia.org/T258750 (10jcrespo) >>! In T258750#6404800, @Cmjohnson wrote: > @jcrespo This server is ready for you, I did update the site.pp role to insetup. I didn't want to install it...
[08:24:15] Krinkle pinged me on a broken dashboard for Bacula, so I tried to fix it the best I could (suggestions welcome): https://grafana.wikimedia.org/d/413r2vbWk/bacula
[08:24:48] the problem is that, depending on the screen width, figures appear weird
[08:25:07] jynus: should the top center panel say "Last 7 Days" instead?
[08:25:20] correct, about to change it
[08:25:23] what's the difference between "last successful backup" and last successful FULL backup?
[08:25:46] normally incrementals and differentials are more frequent than full backups
[08:25:59] sometimes the full fails but not the others, or vice versa
[08:26:05] but what does that full mean?
[08:26:19] marostegui: that you aren't hungry any more
[08:26:27] everything was backed up, instead of only files that changed since the last backup or last full
[08:26:44] full backups vs incremental vs differential
[08:26:45] but full is for external store or in general?
[08:27:05] this is bacula, this is for all backups
[08:27:14] you can select the job at the top
[08:27:45] 10DBA, 10Continuous-Integration-Config, 10Patch-For-Review, 10Release-Engineering-Team (CI & Testing services), 10Release-Engineering-Team-TODO (2020-07-01 to 2020-09-30 (Q1)): Run wmfmariadbpy on CI - https://phabricator.wikimedia.org/T261098 (10hashar) p:05Triage→03Medium I guess the bulk of the wo...
[08:27:48] for most backups a full is only done once a month, depending on the configuration
[08:28:07] but that is just transparent for recovery, bacula takes care of everything when you hit recover
[08:28:25] aaaah ok, full in bacula's context, got it
[08:28:37] yes, sorry, this is general backups
[08:28:44] (bacula)
[08:29:00] yep, got it now!
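The full/incremental/differential distinction explained above can be sketched as follows. This is a minimal, hypothetical illustration of the three backup levels in general (file paths and dates are invented, not taken from the actual Bacula configuration):

```python
from datetime import datetime

# Hypothetical file records: path -> last-modified time (illustrative only).
files = {
    "/etc/app.conf": datetime(2020, 8, 1),
    "/var/lib/app/data.db": datetime(2020, 8, 20),
    "/var/log/app.log": datetime(2020, 8, 23),
}

last_full = datetime(2020, 8, 2)     # when the last FULL backup ran
last_backup = datetime(2020, 8, 21)  # when the last backup of ANY level ran


def full():
    # a full backup copies everything
    return set(files)


def differential():
    # a differential copies everything changed since the last FULL
    return {p for p, mtime in files.items() if mtime > last_full}


def incremental():
    # an incremental copies everything changed since the last backup
    # of any level (full, differential, or incremental)
    return {p for p, mtime in files.items() if mtime > last_backup}
```

With these dates, `differential()` picks up both files changed since the full, while `incremental()` only picks up the file changed since the most recent backup; this is why incrementals can fail while fulls succeed (or vice versa), as noted above.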
[08:29:38] if you select dbprov2001 jobs it will say it does incremental backups
[08:29:50] but because of how database backups work, they will really be full
[08:30:22] https://grafana.wikimedia.org/d/413r2vbWk/bacula?viewPanel=10&orgId=1&var-site=eqiad&var-job=dbprov2001.codfw.wmnet-Monthly-1st-Wed-Databases-mysql-srv-backups-dumps-latest&from=1595665818045&to=1598257818045
[08:31:05] ^ there are backups every day but only those around dumps will generate meaningful data
[08:45:00] 10DBA, 10observability: check_mariadb_dump failing on alert[12]* hosts - https://phabricator.wikimedia.org/T260686 (10jcrespo) from T247966 I understand that alert1001 and alert2001 are new icinga hosts similar to the existing ones, right? If yes, the only needed change is to add them to the allow list for gra...
[08:48:02] WARN Memory 90% used (dbstore1004)
[08:50:11] kormat: hi, so friday evening I went wild and wanted to run the wmfmariadbpy integration tests on CI. I think I solved the low-hanging fruit (such as requiring percona-toolkit in the CI docker image), but the suite still dies out horribly https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/621762/ :]
[08:50:30] so i saw :)
[08:50:33] kormat: it is nowhere near urgent, but I guess it would be nice to have integration tests running on CI when possible hehe
[08:50:39] i'll try to have a look today
[08:50:53] yeah don't worry
[08:51:21] I don't want to set priorities for you guys. Just feel free to take over / amend the patch as needed when you get time to finely tweak the suite
[08:51:41] I also went with a horrible hack to have the suite point to the mysqld instance started by the CI container. But that is fragile
[08:59:04] 10DBA, 10Performance-Team, 10WikimediaDebug, 10Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (10Marostegui) a:03Marostegui
[09:03:42] 10DBA, 10observability: check_mariadb_dump failing on alert[12]* hosts - https://phabricator.wikimedia.org/T260686 (10fgiunchedi) That's correct @jcrespo, those alert* hosts will be replacing the existing icinga hosts. I can confirm that we're OK now, the check works: ` [1598259185] SERVICE ALERT: alert1001;d...
[09:05:33] ^ LGTM now, re: the puppetized grants, I did grep puppet for icinga1001 or its IP address but the grants didn't come up (FYI)
[09:32:55] 10DBA, 10Performance-Team, 10WikimediaDebug, 10Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (10Marostegui) The new user has been created: ` db2133.codfw.wmnet:3306 GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER ON `xhgui`.* TO `xhguiad...
[09:32:59] 10DBA, 10Performance-Team, 10WikimediaDebug, 10Patch-For-Review: Additional database user for XHGui administration - https://phabricator.wikimedia.org/T260640 (10Marostegui)
[09:39:29] marostegui: o/ I just wanted to say now more than one third of core tables are migrated to abstract schema \o/
[09:39:38] nice!!!!
[09:39:40] good work!
[09:39:51] lots of extensions are moving too, wikibase is almost done
[09:40:11] so you'll see lots of .sql files moving :D sorry
[09:41:00] I need a decision on the timestamp type so I can move forward with the rest of the tables (https://phabricator.wikimedia.org/T42626) basically, binary is smaller but it would need more schema changes in production
[09:41:35] Sorry, I haven't had the time to re-read that task
[09:42:51] no worries, take your time, you have until the weekend, otherwise I'll be dropping tables in production at random
[09:42:59] hahaha
[09:48:44] I think we will have to migrate the MySQL panels on top, too
[09:48:57] for grafana, as those are deprecated
[12:04:42] 10DBA, 10Patch-For-Review: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (10Marostegui)
[12:25:38] jynus: re: using 'black', i have so far excluded wmfbackups from the coverage, as it's likely moving to a new repo soon, and it's 'your' code
[12:27:52] well, I mentioned it mostly because of remoteexecution, which called cumin and had a different style
[12:28:10] if riccardo plans to migrate to it, no issue
[12:28:33] also, have you seen the resulting code?
[12:28:53] there would be a few fixes that would need human attention
[12:29:32] i haven't looked at it in detail. what sort of fixes do you have in mind?
[12:32:53] I can't (or don't know how to) link to specific line numbers on a diff
[12:34:52] ok - filename + line number would work instead
[12:35:01] it doesn't work for me
[12:35:05] anyway
[12:35:12] https://gerrit.wikimedia.org/r/c/operations/software/wmfmariadbpy/+/621753/2/wmfmariadbpy/WMFReplication.py new line 335
[12:35:36] that while is crazy - sure, the previous version isn't precisely clean
[12:35:49] but it would need further refactoring
[12:36:38] or things like line 520
[12:36:41] *620
[12:38:29] at the risk of being a bit blunt, to me this looks like the formatter is highlighting code that was already in need of refactoring
[12:38:33] which i'm fine with
[12:38:35] indeed
[12:39:09] but hey, if you want to work on that, no complaints
[12:39:57] i figure that i'll probably end up refactoring a lot of the code, adding in exceptions etc. i have no problem tackling this sort of thing in the process
[12:40:21] did you convince riccardo to go with double quotes?
[12:41:00] that is the only thing that feels weird to me, working with one style on some repos and another on others
[12:42:00] this is a gordian-knot approach.
[12:42:10] there appears to be consensus that we should be using a code formatter for python
[12:42:18] there is no consensus on which formatter we should use
[12:42:28] black is seeing fairly wide adoption externally
[12:42:48] so rather than try to fix the lack of consensus internally,
[12:42:59] i'd rather make the part i have ownership of better
[12:43:18] otherwise i risk getting mired in unproductive discussions forever
[12:43:30] ok, so if you take care of maintaining it, I will follow you
[12:43:37] great, thanks :)
[12:43:50] but please, not just CI, refactoring changes!
[12:43:54] packaging, etc.
[12:44:07] I will need you to take care of functionality too!
[12:44:17] of course
[12:44:23] ok, then go ahead
[12:49:14] perhaps converting those Yes/No strings into booleans would be valuable here too :)
[12:49:49] mark: at a minimum making them 'constants'
[12:49:55] mark: they are not booleans
[12:50:02] they are multi-state strings
[12:50:03] oh right, that too
[12:50:09] thanks, mysql.
[12:50:13] ok
[12:50:14] Yes, No, Connecting
[12:50:16] etc.
[12:50:38] "Yes, No, Fuck you for assuming there were only 2 possible states", etc
[12:50:57] all complaints are to be redirected to mysql commands
[12:51:53] for example, if you run start replication, it will immediately switch to running Yes, followed by null, followed by Connecting, followed by Yes or an error
[12:52:02] all very fun
[12:54:02] especially the Null ;)
[12:54:20] "this is a FSM, but now we're confused!" ;)
[12:54:35] actually not sure if it sends null or the empty string, one of the two
[12:54:47] mark: haha
[12:55:10] mark: to make things nicer, Yes, YES and yes are different, valid values
[12:55:24] do they also have different semantics?
[12:55:30] not internally
[12:55:42] but they will return exactly what you send them in some queries
[12:55:47] so any case
[12:56:00] in some queries, of course, in others they will return 0 or 1
[12:56:43] for example, to configure the buffer pool you can write '100G' and it returns the bytes when read
[12:56:56] but if you pass that on the command line it returns an error
[12:57:03] wow
[12:57:12] I guess this reinforces the need for some abstraction layer here ;)
[12:57:43] nah, my code written at 4am before the switchover date is perfect /s
[12:58:02] mark: i think a distraction layer might be even more relevant
[13:13:52] 10DBA: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by marostegui on cumin1001.eqiad.wmnet for hosts: ` ['db1128.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202008241313_marostegui_12147.log`.
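The "constants, not booleans" suggestion discussed above could be sketched as an enum with a case-insensitive parser, since Yes, YES and yes are all valid. This is an assumption about what such an abstraction might look like, not the actual wmfmariadbpy code; the state names are the ones mentioned in the conversation:

```python
from enum import Enum


class ReplicationIOState(Enum):
    # Slave_IO_Running is a multi-state string, not a boolean: MariaDB
    # can report Yes, No, Connecting, and sometimes null/empty.
    YES = "yes"
    NO = "no"
    CONNECTING = "connecting"
    UNKNOWN = ""

    @classmethod
    def parse(cls, raw):
        # Normalize case so "Yes", "YES" and "yes" map to the same state;
        # treat None, empty, and unexpected values as UNKNOWN.
        try:
            return cls((raw or "").strip().lower())
        except ValueError:
            return cls.UNKNOWN
```

Callers then compare against `ReplicationIOState.YES` instead of string-matching every spelling MySQL might emit.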
[13:47:18] 10DBA: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['db1128.eqiad.wmnet'] ` and were **ALL** successful.
[14:13:59] 10DBA: Upgrade m5 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T260324 (10Marostegui)
[14:34:57] jynus: do you have any more feedback on the 'black' CRs?
[14:36:59] @ meeting
[14:37:07] ah ok, np
[15:09:16] kormat: back
[15:09:35] kormat, I was only worried about the string delimiters difference
[15:09:39] compared to cumin
[15:09:43] but go ahead
[15:09:57] I already said I would follow your lead on that
[15:22:17] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-09-14) rack/setup/install dbprov2003.codfw.wmnet - https://phabricator.wikimedia.org/T258749 (10jcrespo) 05Open→03Resolved closing again, as it has not happened again.
[16:09:54] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[16:10:16] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul)
[17:19:09] jynus: do database backup jobs explicitly name databases or are they on a per-host basis?
[17:19:29] My actual question is: before dropping a database, do I need to make some corresponding change in a backup table someplace?
[17:20:22] so nothing breaks
[17:20:34] but I would be happy if you ping me on a ticket or somewhere
[17:20:43] so there are no useless grants
[17:20:59] not only for backups
[17:21:09] to keep track that we haven't lost a db at some point
[17:21:23] but it doesn't have to be real time or top priority - not-found dbs are just ignored
[17:22:26] andrewbogott: e.g. if you mean misc/m5, we keep an up-to-date list at https://wikitech.wikimedia.org/wiki/MariaDB/misc#m5
[17:22:41] jynus: great, I'm making a ticket now
[17:22:43] it is manual at the moment, we want to move the metadata to an automatic process at some point
[17:22:59] adding me to an existing cleanup ticket is ok too
[17:23:10] as long as you are ok leaving it open until I do the cleanup
[17:23:15] whatever is easier for you
[17:23:49] 10DBA, 10Cloud-VPS, 10cloud-services-team (Kanban): Drop openstack databases from m5-master - https://phabricator.wikimedia.org/T261152 (10Andrew)
[17:24:19] I didn't have an existing cleanup ticket other than the 'move things to galera' ticket, so I just made a cleanup task ^
[17:24:56] cool, add both me (for backups) and manuel (for db)
[17:24:59] at least
[17:25:11] wow, that is a lot of dbs
[17:25:21] is it all but striker maybe?
[17:25:27] maybe all openstack ones?
[17:25:38] it's all the openstack things
[17:25:42] cool
[17:25:45] I'm double-checking now to make sure no one is connected
[17:25:47] we will have to update the wiki
[17:26:20] backups will be kept for around 2-3 months from now, then deleted forever, FYI
[17:27:55] This will show me every open db connection to m5, right?
[17:27:55] mysql -u root -BNe "select host,count(host) from processlist group by host;" information_schema
[17:28:18] I guess that isn't definitive since it's a single point in time
[17:32:13] 10DBA, 10Cloud-VPS, 10cloud-services-team (Kanban): Drop openstack databases from m5-master - https://phabricator.wikimedia.org/T261152 (10Andrew) Before we drop things I'd appreciate having a DBA confirm that there's no longer any activity to any of those. I've checked myself but it's nice to have a second...
[21:12:23] 10DBA, 10Cloud-Services, 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09), 10Platform Team Initiatives (MCR Schema Migration), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10daniel)
[21:59:52] 10DBA, 10DC-Ops, 10Operations, 10ops-codfw: (Need By: 2020-08-31) rack/setup/install es20[26-34].codfw.wmnet - https://phabricator.wikimedia.org/T260373 (10Papaul) The Delivery ETA for this is 08/31/20 so it is not possible to have those servers by 2020-08-31.
[22:47:59] 10DBA, 10Community-Tech, 10Expiring-Watchlist-Items: Watchlist Expiry: Release plan [rough schedule] - https://phabricator.wikimedia.org/T261005 (10ifried) @Marostegui Thank you for your prompt response! We are very happy to read that you feel good about our plan. Yes, we can push the release schedule a week...
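The connection check at 17:27:55 above queries `information_schema.processlist` once, which (as noted at 17:28:18) only captures a single point in time. One way to narrow that gap is to sample the query repeatedly and aggregate the results. The sketch below is a hypothetical aggregation helper (function name and sample data are invented); it still cannot prove that no client ever connects, only that none was seen during the sampling window:

```python
from collections import Counter


def merge_samples(samples):
    """Aggregate repeated processlist samples into a per-host summary.

    Each sample is one run of:
      SELECT host, COUNT(host) FROM information_schema.processlist GROUP BY host;
    represented as a dict of host -> concurrent connection count.
    Returns the maximum concurrent connections seen per host.
    """
    seen = Counter()
    for sample in samples:
        for host, count in sample.items():
            # Counter defaults missing hosts to 0, so max() is safe.
            seen[host] = max(seen[host], count)
    return dict(seen)
```

Any host that never appears in the merged result was never observed connected during the window; hosts that do appear need investigating before their databases are dropped.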