[04:33:59] DBA, MediaWiki-API, Core Platform Team Backlog (Watching / External), Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (Krinkle) It looks like this #wikimedia-pro...
[06:03:40] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (Marostegui) >>! In T206203#5015671, @jcrespo wrote: >> And what I get is that I don't really know what happened as there is no usage or trace of error. > > Y...
[07:49:12] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (jcrespo) > Let's add those lines to https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494899/13/modules/profile/files/mariadb/daily_snapshot.py That was...
[08:03:02] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[08:03:04] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[08:06:15] jynus: can you check: es2001:/home/marostegui/backups.cnf ?
[08:06:27] I am running: sudo -u dump /home/marostegui/puppet/modules/profile/files/mariadb/backup_mariadb.py --config-file /home/marostegui/backups.cnf and it does nothing :)
[08:06:34] so not sure what's going on
[08:09:39] note that the wrong user/password for db1115 is on purpose
[08:12:49] uh
[08:13:00] don't try to backup db1115 please
[08:13:04] ?
[08:13:10] I am not trying to backup db1115
[08:13:15] ah
[08:13:15] I am trying to backup s6
[08:14:47] I don't think the yaml is valid?
[08:14:57] you should get a debug file with the error
[08:15:15] Ah I see
[08:15:16] true
[08:15:37] Can that be noted to be fixed? as in: let me know there is a debug file that I can check instead of returning nothing?
[08:15:37] always check $? after execution
[08:15:51] what is the bug?
[08:16:20] I can add a --verbose to print to stdout
[08:16:29] https://phabricator.wikimedia.org/P8181
[08:16:36] it doesn't print to it by default
[08:16:43] I am sorry if I am testing silly things, but you asked me to test, and that is what I am doing
[08:16:46] :)
[08:16:52] run echo $?
[08:17:11] if it is not 0, that is your error, to know more, read the logs
[08:17:26] And as I don't have everything in mind like you, I might encounter things like these, which for you are obvious but not for someone who doesn't have all the background
[08:17:48] It is not 0, but it is not obvious to someone who is not involved heavily in the internals :)
[08:17:54] yes, that is why I am asking you to do it
[08:17:59] :)
[08:18:00] that needs documentation
[08:18:34] filippo said to use logging, not me BTW
[08:19:03] sure, just asking it to be noted, so we can iterate over all these things later or at your convenience
[08:19:08] and I agreed as that is not really intended for interactive usage
[08:19:24] marostegui: I think that is best written on the ticket :-)
[08:19:31] ok, will do it!
[08:19:37] I will write as I see things
[08:19:41] that ok?
[08:19:45] sure
[08:22:19] DBA, MediaWiki-API, Core Platform Team Backlog (Watching / External), Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (jcrespo) I don't think that one is exactly...
[08:23:17] if they are like a list of many of those, maybe create a paste and put it on the ticket so you don't have to comment every time?
[08:23:26] ok!
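The exchange above (errors go to a debug file, the exit status is the only visible signal, and a --verbose flag could also print to stdout) can be illustrated with a minimal Python sketch. It is not the code of backup_mariadb.py: the function names, the logger name, the `task` callable and the default log path are all hypothetical, and only the --verbose flag name comes from the conversation.

```python
import argparse
import logging
import sys


def build_logger(log_path, verbose):
    """Always log to a file; mirror messages to stdout only if verbose."""
    logger = logging.getLogger("backup_sketch")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    # Drop handlers left over from a previous call so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.addHandler(logging.FileHandler(log_path))
    if verbose:
        logger.addHandler(logging.StreamHandler(sys.stdout))
    return logger


def main(argv, task, log_path="backup_sketch.log"):
    """Run `task` (a stand-in for the real backup work) and return an exit status."""
    parser = argparse.ArgumentParser(description="hypothetical backup wrapper")
    parser.add_argument("--verbose", action="store_true",
                        help="also print log messages to stdout")
    args = parser.parse_args(argv)
    logger = build_logger(log_path, args.verbose)
    try:
        task()
    except Exception as exc:
        # The error lands in the log file; the caller only sees a non-zero status.
        logger.error("backup failed: %s", exc)
        return 1
    logger.info("backup finished")
    return 0
```

A real entry point would pass the return value of `main` to `sys.exit`, so that `echo $?` in the shell shows 0 on success and 1 on failure, with the details in the log file unless --verbose is given.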
[08:24:05] BTW, there is an empty mysql on es2002 if you want to backup a small host
[08:24:12] ah cool
[08:24:33] if you are testing on codfw, is it ok if I stop db1114 and replace it with an empty host to do the same?
[08:24:41] totally
[08:24:46] also, my backup test of enwiki took 2TB
[08:25:27] maybe can consider skipping indexes option?
[08:26:09] aka --compact
[08:26:30] I guess we make the main functionality work first, then we optimize :-D
[08:27:52] haha yeah
[08:28:12] question
[08:28:19] yes
[08:28:22] I cannot use --config-file --only-postprocess right?
[08:28:32] if I use --config-file then all the other options are ignored, right?
[08:28:45] at the moment either you use --config-file or other options
[08:28:48] cool
[08:29:09] but you can add only_postprocess: True to the top of the file
[08:29:31] feel free to suggest alternatives, although I would say to rank them
[08:29:38] yeah, don't worry
[08:29:51] "this bug is preventing me from doing a backup" vs "this is usability"
[08:30:13] at the moment there will be a lot of the first ones :-)
[08:32:17] db1118 doesn't have the ops database and procedures, I am going to create it
[08:34:02] that was making queries run for over 60 seconds
[08:41:33] DBA, MediaWiki-API, Core Platform Team Backlog (Watching / External), Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (jcrespo) What would you think of creating...
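One possible alternative to the "either --config-file or other options" behaviour described above is to reject the combination outright instead of ignoring the extra options. The sketch below is an assumption for illustration only, not how backup_mariadb.py works: the parser and the `parse_args` helper are hypothetical, and only the two flag names are taken from the conversation.

```python
import argparse


def parse_args(argv):
    """Parse options, refusing to mix --config-file with any other option."""
    parser = argparse.ArgumentParser(prog="backup_sketch",
                                     description="hypothetical backup CLI")
    parser.add_argument("--config-file",
                        help="read every setting from this file")
    parser.add_argument("--only-postprocess", action="store_true",
                        help="skip the backup and only postprocess")
    args = parser.parse_args(argv)

    # Collect every option, other than --config-file, that differs from its default.
    overridden = [name for name, value in vars(args).items()
                  if name != "config_file" and value != parser.get_default(name)]
    if args.config_file and overridden:
        # parser.error() prints the usage line and exits with status 2.
        parser.error("--config-file cannot be combined with: "
                     + ", ".join(sorted(overridden)))
    return args
```

With this approach a call such as `parse_args(["--config-file", "backups.cnf", "--only-postprocess"])` stops with a usage error rather than silently dropping the second flag; putting only_postprocess: True in the file, as suggested above, stays the way to combine both.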
[11:03:47] so the basic snapshot cycle works, but for some reason, despite the --compress, it doesn't get compressed
[11:12:02] I think I got a but on the backup creation
[11:12:05] *bug
[11:29:49] DBA, Goal, Patch-For-Review: Implement database binary backups into the production infrastructure - https://phabricator.wikimedia.org/T206203 (Marostegui) Some tests/comments {P8183}
[11:46:58] Marostegui: if the above are your only complaints, things are going better than I thought
[11:47:01] :-)
[11:47:11] or you didn't test enough yet :-)
[11:47:17] I have only tested dumps and not all the options! :)
[11:47:40] But so far, only usage issues I have seen
[11:48:17] yeah, I agree with you that usability is not the best
[11:48:19] can I test snapshots by doing Type: snapshot on the cnf?
[11:48:32] yep
[11:48:34] well
[11:48:35] cool
[11:48:38] locally ones
[11:48:42] yeah
[11:48:49] for remote you have to use transfer.py
[11:48:57] yeah, I will test on es2001 locally
[11:49:14] I am making now consistent successful runs from db1114
[11:49:24] Ah cool
[11:49:42] so I have deployed the wmfmariadbpy (which doesn't get deployed anywhere) as a milestone
[11:49:54] not deployed, committed
[11:50:07] and will move the other one to in-review
[11:50:30] so that there is a lot of issues still, but most functionality should be there
[11:50:49] error handling is hard
[11:50:57] more like error reporting
[11:51:11] because logs is the preferred report for unattended scripts
[11:51:16] mm, es2001 doesn't have mysql there, so I will use the other host you suggested
[11:51:37] I will use 2002
[11:51:54] so I don't know how to balance interactive and unattended error reporting
[11:52:30] if I had to give priority, I would ask for #1 protect against run without options triggering the whole cycle
[11:52:58] well, to be fair, that was done on purpose
[11:53:06] but I guess it can be confusing
[11:53:18] it is confusing if you are not familiar
[11:53:28] and normally when you run a script with nothing you'd expect a usage or something
[11:53:39] not triggering the whole thing, that is how I see it
[11:53:40] or maybe I should create separate executables
[11:53:51] one for interactive and another for non-interactive usage
[11:54:03] What is more complex?
[11:54:17] Add that protection as by default show the help or having another script?
[11:54:40] it is not as much as complexity as cleanliness
[11:54:57] e.g. move config reading to backup_mariadb_uninteractive
[11:55:13] and leave backup_mariadb as only options
[11:55:21] with saner defaults
[11:55:59] but how many different scripts we have now?
[11:56:25] what do you mean with scripts? Because we have a lot of files
[11:56:35] I mean .py
[11:56:45] the core functionality is in WMFBackups
[11:56:51] but you don't run that
[11:56:59] Yeah, so let's put it this way
[11:57:31] If I want to run a dump I use backup_mariadb.py, if I need a snapshot I also run backup_mariadb.py (locally) and if I need a remote one I need to run transfer.py?
[11:57:48] alternatively I can create a --quiet
[11:58:03] or a --verbose to print in stdout
[11:58:34] technically, if you want a remote backup
[11:59:04] you need to run transfer.py and then backup_mariadb --only-postprocess
[11:59:30] we really need a cheatsheet
[11:59:31] :p
[11:59:32] that is why daily_snapshot
[11:59:38] does everything for you
[11:59:46] including dumps?
[11:59:51] not dumps
[11:59:57] but it could
[12:00:06] ok, so daily_snapshot to run snapshots either locally or remote if I need it to
[12:00:11] we could centralize everything on cumin
[12:00:11] and backup_mariadb for dumps?
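The protection requested above (a run without options should show usage instead of triggering the whole cycle) could look roughly like the following Python sketch. It is an illustration under assumptions, not the actual behaviour of backup_mariadb.py: the `parse_or_usage` helper and the prog name are hypothetical, and --config-file is the only flag borrowed from the conversation.

```python
import argparse
import sys


def parse_or_usage(argv, out=None):
    """Return parsed options, or print the help text and return None if argv is empty."""
    parser = argparse.ArgumentParser(prog="backup_sketch",
                                     description="hypothetical backup CLI")
    parser.add_argument("--config-file",
                        help="run the backups described in this file")
    if not argv:
        # No arguments at all: show the usage instead of starting any work.
        parser.print_help(out if out is not None else sys.stdout)
        return None
    return parser.parse_args(argv)
```

An unattended job would then have to name its configuration explicitly, for example `parse_or_usage(["--config-file", "/home/marostegui/backups.cnf"])`, while a bare interactive invocation only prints the help text.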
[12:00:31] but I would prefer not to unless we have to (for snapshots)
[12:00:40] I added a new comment on the review
[12:00:54] that may clarify a bit the whole thing until I write a more detailed architecture
[12:01:01] sure, I am fine either way, but I think we need a list of what to use each for, I am confused
[12:01:08] ah let me see
[12:01:18] see https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/494899/
[12:01:23] that is the first step :-)
[12:01:40] then give me more time to write the detailed version to the wiki
[12:01:45] sure sure
[12:01:49] not pushing you at all
[12:01:57] documenting at the same time than trying to figure it out is hard :-)
[12:02:07] I am doing my best to test, but it is complex :)
[12:02:52] it will all make sense, I will work now on a diagram
[12:02:57] :)
[12:03:09] * marostegui sends the jaime's love sticker
[12:03:15] and if you follow puppet, in the end it will be just two scripts
[12:03:28] daily_snapshot and mariadb_backup
[12:03:43] that is kinda what I figured, but I wasn't sure :)
[12:03:49] first for snapshots, second for dumps
[12:04:17] to be fair, it is complex because what it does is complex itself
[12:04:39] lots of orchestration going on, many hosts involved, etc
[12:04:40] I am not blaming you or anything, just to be clear on that
[12:04:43] I know
[12:04:59] I am not being defensive, just defending you :-)
[12:05:01] I am trying to test stuff but sometimes it is hard to know whether it is intended or not :)
[12:07:57] so the main takeaway is that I am not putting at the moment a lot of focus on this being command-line friendly
[12:09:21] error reporting and handling is important, though
[12:10:18] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[12:10:20] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[12:12:49] DBA, Patch-For-Review: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database - https://phabricator.wikimedia.org/T210292 (jcrespo) {F28374996} {F28374999}
[12:13:13] in the end it should be as easy as: https://phabricator.wikimedia.org/T210292#5017175 :-)
[12:13:27] \o/
[12:18:45] I am going to leave running a larger backup (m5 snapshot from the passive host)
[14:58:35] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui) s6 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [x] dbstore1001 [] db1125 [] db1113 [] db1098 [] db1096 [] db1093 [] db1088 [] db1085 [...
[14:58:40] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui) s6 eqiad progress [] labsdb1012 [] labsdb1011 [] labsdb1010 [] labsdb1009 [] dbstore1005 [x] dbsto...
[15:02:07] DBA, cloud-services-team (Kanban): CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts - https://phabricator.wikimedia.org/T218029 (Andrew) We would also want labtestwiki here (the MW database for labtestwikitech)
[15:30:29] DBA, Data-Services: Discrepancies with logging table on different wikis - https://phabricator.wikimedia.org/T71127 (Marostegui)
[15:31:11] Blocked-on-schema-change, MediaWiki-Database, MW-1.32-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Schema-change: Add index log_type_action - https://phabricator.wikimedia.org/T51199 (Marostegui)
[16:07:46] I will test the snapshots tomorrow :)
[16:52:34] DBA, MediaWiki-API, Core Platform Team Backlog (Watching / External), Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (Anomie) >>! In T149077#5016519, @Krinkle w...
[17:24:11] DBA, cloud-services-team (Kanban): CloudVPS: evaluate convenience of having codfw openstack DBs in proper DB hosts - https://phabricator.wikimedia.org/T218029 (aborrero) We discussed about this in our last WMCS team meeting and we agreed that having a m6-master (or whatever is the name) in codfw would be...
[18:51:46] DBA, MediaWiki-API, Core Platform Team Backlog (Watching / External), Core Platform Team Kanban (Waiting for Review), and 2 others: Certain ApiQueryRecentChanges::run api query is too slow, slowing down dewiki - https://phabricator.wikimedia.org/T149077 (Krinkle) For the record, I included all th...
[23:36:26] DBA, Patch-For-Review: Implement a proof of concept of a snapshot cycle automation for a mediawiki section database - https://phabricator.wikimedia.org/T210292 (jcrespo) Snapshotting is now working consistently. There was a locking issue due to stdout piping + xtrabackup verbose output (for long running b...