[03:28:52] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [05:43:58] 10DBA, 10Patch-For-Review, 10Upstream, 10cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (10elukey) +1 please proceed! [07:04:44] 10DBA, 10Patch-For-Review, 10Upstream, 10cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) Thank you! [07:09:48] 10DBA: Create prometheus alert to detect lag spikes - https://phabricator.wikimedia.org/T253120 (10Marostegui) This happened again on Sunday: https://grafana.wikimedia.org/d/000000273/mysql?panelId=6&fullscreen&orgId=1&from=1591486181606&to=1591537428665&var-dc=eqiad%20prometheus%2Fops&var-server=db1092&var-port... [07:18:10] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [07:21:41] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [09:39:45] 10DBA, 10Gerrit: Get a writable reviewdb clone to test Gerrit upgrade with - https://phabricator.wikimedia.org/T254516 (10Marostegui) Thanks Jaime for taking care of this. Just a clarification, there is a RO gerrit user called: `gerritro` which just SELECT grant. ` GRANT SELECT ON `reviewdb`.* TO 'gerritro'@'1... [09:42:18] 10DBA, 10Gerrit: Get a writable reviewdb clone to test Gerrit upgrade with - https://phabricator.wikimedia.org/T254516 (10Dzahn) Access for the gerritro user was fixed on May 28th in T243800#6171861. And that made it possible to run gerrit on the test server. I think that's an entirely separate request now,... [09:46:48] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) [09:57:09] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) [11:06:52] 10DBA: Check that all core hosts have events installed and enabled - https://phabricator.wikimedia.org/T254689 (10Marostegui) [11:07:17] 10DBA: Check that all core hosts have events installed and enabled - https://phabricator.wikimedia.org/T254689 (10Marostegui) 05Open→03Resolved This is all done. I am going to create a task, as we discussed, to check if we have events and if the event_scheduler is enabled. [11:09:26] 10DBA, 10Gerrit: Get a writable reviewdb clone to test Gerrit upgrade with - https://phabricator.wikimedia.org/T254516 (10jcrespo) Thanks, now that I understood the actual requirements (I wasn't aware of the existence of a ro account), I will take care of creating a test upgrade database from a reviewdb snapsh... [11:11:07] 10DBA: Create an monitoring check for event_scheduler and for installed events - https://phabricator.wikimedia.org/T254738 (10Marostegui) [11:11:23] 10DBA: Create a monitoring check for event_scheduler and for installed events - https://phabricator.wikimedia.org/T254738 (10Marostegui) p:05Triage→03Medium [11:31:52] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) [11:40:10] 10DBA, 10Cloud-Services, 10CPT Initiatives (MCR Schema Migration), 10Core Platform Team Workboards (Clinic Duty Team), and 2 others: Apply updates for MCR, actor migration, and content migration, to production wikis. - https://phabricator.wikimedia.org/T238966 (10Marostegui) @daniel I haven't found anythin... [12:05:17] 10DBA: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (10Marostegui) [12:18:29] 10DBA: Compress enwiki InnoDB tables - https://phabricator.wikimedia.org/T254462 (10Marostegui) [13:47:32] re: https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/603434, volans has suggested that instead of starting a read-only transaction that i could do `set session transaction read only;` [13:48:04] so that even if a cookbook uses multiple transactions, they will all default to read-only (if in dry-run mode, or specifically requested by the cookbook) [13:48:33] but I have no idea what approach is best and what are the drawbacks :) [13:48:55] i'm pretty sure it's still better than "start transaction read only" [13:49:17] the issue with that is that will "fix" dmls [13:49:45] but FYI, stop slave/change master would work, it is an innodb-level config [13:49:47] jynus: I had also other 2 concerns 1) root user and 2) DDLs [13:50:10] we literally were talking about that [13:50:18] on the meeting today [13:50:36] we brought up the posibility of a new read only grants account [13:51:17] indeed, that would be nice and simplify things I think [13:51:29] I mentioned not only for this [13:51:47] but because I already wanted it for snpashotting, which currently runs as root [13:52:11] ahh. now i have a better understanding of the issues [13:52:20] kormat: the other bit I forgot to comment in the CR. Usually for spicerack modules that needs config we read some global-puppet-distributed config in /etc/spicerack/... and that could have 2 users one for RO and one for RW [13:52:25] so `drop table X` is not done inside a transaction. sigh. [13:52:43] so the 3 of us we do a complete trio [13:53:02] riccardo knows mysql and autmation, but doesn't know ongoing db infra [13:53:10] you know ongoing infra and automation [13:53:26] I know mysql and infra but suck at automation 0:-D [13:53:33] and manuel? :D [13:53:59] manuel is the boss of everyone :-D [13:54:25] am I? [13:54:39] talking about positive things, I like that kormat used the way I use for authentication [13:54:53] jynus: i was reading mysql.py :) [13:54:54] which is not ideal but should work until we have a proper accounting system [13:55:12] kormat: read sql.py, it has some ingrained logic, don't reuse it [13:55:30] but it is nice to get better understanding of needs [13:56:03] basically to avoid making the same mistakes I did [13:58:11] kormat: https://phabricator.wikimedia.org/diffusion/OSMD/browse/master/wmfmariadbpy/sql.py [13:58:23] this was in the middle of a refactoring that never came [13:58:41] oh, not that one [13:58:57] https://phabricator.wikimedia.org/diffusion/OSMD/browse/master/wmfmariadbpy/WMFMariaDB.py [13:59:22] I will eventually ask you to reimplement that on spicerack, but that is out of scope [13:59:35] +1 [13:59:51] but I think it would be a nice read to understand what ended up there [14:00:03] ack [14:00:04] for example, I would prefer to work on transactions [14:00:14] I mean, to have transactions user-controlled [14:00:31] even if it is an admin-based interface [14:00:43] I cannot advice you on the best way [14:00:47] ok, back to the short-term: i'll change my CR to use `set session transaction read only`, but add a note that this doesn't protect against a lot of the more dangerous actions [14:01:02] because it could be ok to implement one model, and later change it [14:01:21] my question is, doesn't dry-run actually do stop slave? [14:01:31] or something like that? [14:01:37] dry-run of what? [14:01:48] the db switchover script [14:01:51] *dc [14:01:56] no idea, i haven't looked at it [14:02:33] the old legacy module run sql commands via cumin, so the dry_run logic was taken care by the remote modue [14:02:34] yeah "set global read_only=1;" keeps working on read only transaction [14:02:39] not running the remote command on dry_run [14:02:55] ok, then maybe not worth implementing read only now? [14:03:05] just replicationg the old functionality? [14:04:08] we can draw the line where we want. The good part is that cookbooks are code reviewed too, so we can also start with something less safe for a general approach [14:04:10] because I don't really see an "easy" way to implement it, so maybe better to skip it [14:04:22] is there a (small) set of sql commands that are 'safe' (i.e. do not change db state)? (e.g. 'select', 'show') [14:04:32] as long as we keep that in mind while reviewing the cookbooks until we implement a more general less powerful user [14:04:39] kormat: very difficult to implement without a full parser [14:04:54] i'm not talking about anything that complex [14:05:07] i'm thinking of checking the first word of the sql statement, and if it's not in a whitelist, reject in dry-run [14:05:08] yeah you can nest queries in ways that it's hard to know if they write [14:05:17] ahh, right. so much for that [14:05:18] e.g. "/* a comment*/ SELECT * FROM table with read lock;" is a dangerous operation [14:05:32] and you will need to parse nested comments, and select meaning [14:05:40] ok, doomed, got it [14:05:41] not impossible , but you get the idea :-D [14:05:59] I think I will create the task with read only admin [14:06:06] which I think all agree is a good idea [14:06:16] +100 [14:06:23] +1000 [14:06:31] but focus on a first initial small implementation first? [14:06:34] yourself, I mean [14:06:47] something that works even unsafe [14:06:55] and with the account option in mind? [14:07:03] yep [14:07:08] sgtm [14:07:08] disclaimer again, just suggesting [14:07:53] I think the idea is good [14:08:09] but it needs more work to implement it [14:08:52] I am more interested on other thing, which is discovery- that will be kept on puppet right now, right? [14:09:17] only the "trasnsport" changes, right? [14:09:22] *transport [14:10:36] i assume so, yes. [14:10:48] so I think a small path would do for now [14:11:00] may main concern about _legacy implementation was discoverability [14:11:12] of course we are now also free to query tendril/zarcillo [14:11:24] so I think a working implementation first of the querying interface [14:11:52] would be enought, to avoid the "cumin mysql -S" [14:12:08] * kormat nods [14:12:47] I only recommend to read the mysql.py and WMFMariaDB.py and other caller scripts to help understand how we used it, if it helps [14:14:36] regarding zarcillo, it will depend on what will be the actual source of truth in the end [14:15:28] sure [14:17:04] if we do a wrapper of execute(), we could do what you say, alternatively [14:17:37] e.g. update() insert() admin_cmd() select(), but I think that would be too much work [14:18:14] if we need to re-implement an ORM let's just use one ;) my 2 cents on this [14:18:21] yep yep [14:18:30] noooo [14:19:18] so let me summarize my criticisms of _legacy: ability to handle multiplexing [14:19:39] I don't care as much about implementation details [14:22:17] let's also use mysql terminology for [] -> groups [14:22:25] sections is confusing with wmf db sections [14:23:56] IMHO the main objective of the first iteration is feature parity and multiplexing (being able to connect to the many instances of a single server) [14:26:35] right, we just need an ORM that abstracts away database management ;-) [14:27:26] one that has a single method `do_the_thing_im_thinking_of()`, which takes no parameters [14:27:41] do_the_thing_I_should_be_thinking_of() [14:27:59] haha. even better [14:28:13] sudo do_the_right_thing() [14:28:20] jynus: let me know when you have the task filed for the `adminro` user, i want to reference it in my CR [14:28:31] I was writing it right now [14:28:45] +10000 [14:29:28] whould we start an etherpad to agree on some "phases" for this module? I think we have various iterations we can go through, also to keep the CRs manageable [14:30:13] things like "connect to any mysql instance with the proper config", "run query", "discover hosts based on section/role/dc/etc." [14:30:32] y'all are getting real ambitious :) [14:30:45] so far I have 2 large things "_legacy feature parity" "multiplexing". For later we discussed "read only/limited rights" "discoverability not on puppet" [14:31:08] we can create a tracking task [14:34:54] jynus: about the config file, what do you think of shipping a my.cnf~like file in /etc/spicerack/mysql/ maybe with multiple INI sections if we need different ones based on core DBs, labs, misc, etc... [14:35:09] volans: we already do that [14:35:23] /root/.my.cnf has that [14:35:29] provisioned by puppet [14:35:51] it has not all features we may want, but it is the use case: admin, cloud, replication, etc. [14:36:22] volans: pymysql parses my.cnf-format files [14:36:29] pls don't make me have to parse them myself [14:37:02] ok, we could reuse that one, although there is also an ongoing effort to allow to run cookbooks as non-root, but we can fix that later, if/when needed [14:37:20] yes, the locations is there because it made sense [14:37:33] at the time- only root has access to it to prevent accidents [14:37:46] but can be changed- it is however used by other scripts [14:38:28] when we fine-grained permissions better, it could be owned by a mysqladmin user or whatever [14:39:04] ack [14:40:05] kormat: I know, I was not suggesting that ;) [14:41:01] 10DBA: Setup and global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (10jcrespo) [14:41:22] 10DBA: Setup a global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (10jcrespo) [14:41:56] 10DBA: Setup a global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (10jcrespo) [14:50:39] I realized we could do the rw/ro split with roles on the same account [14:51:21] https://mariadb.com/kb/en/set-role/ [14:52:21] although it would still need DCL changes [15:03:54] jynus: hmm. would this allow the sudo-like thing you were talking about for admins doing work? [15:04:01] sortof [15:04:13] on the fly adquiring and removing rights [15:04:47] disadvantage: slightly more complexity, non-portable to non-mariadb [15:06:02] I will put a comment as a possiblity on the ticket [15:07:22] advantages: avoiding duplicating accounts for each admin account on each network/domain [15:12:50] 10DBA: Setup a global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (10jcrespo) One thing that could be useful is exploring the usage of [[ https://mariadb.com/kb/en/set-role/ | SET ROLE ]] for this (worth mentioning it so it... [15:16:29] BTW, I was checking which grants would be where and just realized REPLICATION_SLAVE_ADMIN was implemented on 8.0 [15:18:16] oh nice [15:19:10] MySQL 8.0 [15:19:23] https://bugs.mysql.com/bug.php?id=78393 [15:19:33] I will check MariaDB, othewise write a bug [15:20:18] Fixed on 10.5.2, which is not GA yet [15:35:53] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [15:57:16] 10DBA: Setup a global admin account that can only read/have limited privileges to databases for safer debugging - https://phabricator.wikimedia.org/T254756 (10Volans) >>! In T254756#6202036, @jcrespo wrote: > * Disadvantages ** At least for the Spicerack integration the RO mode used in dry-run or when asking fo... [16:57:50] 10DBA, 10Operations, 10ops-codfw: db2075 failed to boot kernel 2/3 tries, please upgrade firmware/BIOS to mitigate - https://phabricator.wikimedia.org/T254139 (10Papaul) Before ` BIOS Version 2.4.3 Firmware Version 2.40.40.40 IP Address(es) 10.193.1.55 iDRAC MAC Address 84:7B:EB:F6:97:56 DNS Domai... [16:58:45] 10DBA, 10Operations, 10ops-codfw: db2075 failed to boot kernel 2/3 tries, please upgrade firmware/BIOS to mitigate - https://phabricator.wikimedia.org/T254139 (10Papaul) 05Open→03Resolved @jcrespo firmware upgrade complete [16:58:48] 10DBA, 10Operations: Reboot, upgrade firmware and kernel of db1096-db1106, db2071-db2092 - https://phabricator.wikimedia.org/T216240 (10Papaul) [17:40:52] 10DBA, 10Patch-For-Review, 10Upstream, 10cloud-services-team (Kanban): Reimage labsdb1011 to Buster and MariaDB 10.4 - https://phabricator.wikimedia.org/T249188 (10Marostegui) Replication started on labsdb1011 (Stretch+ Mariadb 10.1 again) after copying it from labsdb1012. [17:42:55] 10DBA, 10Operations, 10ops-codfw: db2075 failed to boot kernel 2/3 tries, please upgrade firmware/BIOS to mitigate - https://phabricator.wikimedia.org/T254139 (10jcrespo) Thank you for the help, putting the services back up. [17:49:27] 10DBA, 10Operations, 10ops-codfw: db2075 failed to boot kernel 2/3 tries, please upgrade firmware/BIOS to mitigate - https://phabricator.wikimedia.org/T254139 (10jcrespo) @marostegui there seems to be a bug on 10.1.45-MariaDB installed locally, as the systemd unit doesn't notify the start (despite actually g... [18:19:41] 10DBA, 10Operations, 10ops-codfw: db2075 failed to boot kernel 2/3 tries, please upgrade firmware/BIOS to mitigate - https://phabricator.wikimedia.org/T254139 (10Marostegui) Yeah, I was testing the new version on that host with the new package and then I got into lots of others things. If you have some time... [18:37:51] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [18:38:41] What's the point of having the WAS part? [18:38:45] It makes titles pointlessly long [18:40:56] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [18:42:44] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [19:12:51] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code (WAS: ipb_address_unique has an extra column in the code but not in product... - https://phabricator.wikimedia.org/T251188 [19:13:22] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Reedy) [19:29:54] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Marostegui) [19:57:30] 10DBA, 10Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (10Gilles) [20:36:49] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Tchanders) Thank you @Ladsgroup ! [20:39:10] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Niharika) Woohoo. Thanks @Ladsgroup! That was a big help. [20:52:39] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Ladsgroup) Unblock is done and I took the liberty to apply the change... [20:53:01] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10MW-1.35-notes (1.35.0-wmf.36; 2020-06-09): ipb_address_unique has an extra column in production but not in the code - https://phabricator.wikimedia.org/T251188 (10Ladsgroup) [21:06:03] 10DBA, 10Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (10Krinkle) >>! @Krinkle wrote at T199853: > In these 14 months (2016-2018), we've created 14 GB worth of data. > `name=July 2018 > krinkle@tungsten:~$ df -h > /dev/mapper/tank-data 1.6T 14G | /srv > ` Curr... [21:13:57] 10DBA, 10Performance-Team: Database for XHGui profiles - https://phabricator.wikimedia.org/T254795 (10Krinkle) >>! Task description by @dpifke: > In a cross-DC scenario, MediaWiki would write to the master instance, MariaDB would be responsible for replicating those writes, and each XHGui front-end would read...