[01:37:49] We have a database in production (x1) called "trash". Good luck grepping to see if it's used 😒 (T411835)
[01:37:49] T411835: Investigate unusal dbs in x1 - https://phabricator.wikimedia.org/T411835
[08:15:38] 🗑
[09:29:56] no problems with transfer.py since the upgrade, right?
[09:34:44] it's running right now for me
[09:35:07] I saw, that's why I asked :-)
[09:39:07] so far so good, but it's not finished yet. In general I haven't seen issues in past runs
[09:39:23] I'd still like an ETA during the run tho :D
[09:41:16] that's a good suggestion
[09:41:52] although not trivial to do right for interactive vs batch usage
[09:42:09] for example, I don't necessarily like how cumin does it
[09:42:46] feel free to file a ticket if you have a clear proposal on how to solve that, and I can try to implement it for the next version
[09:43:08] e.g.: on logs only, a socket, etc.
[09:45:18] Yes. In theory I prefer to have tools generate ample metrics rather than log numeric values but our monitoring stack is not built for this
[09:46:55] So progress tracking is trivial, interface not so, I think that's the main hurdle
[09:50:53] for me the reason for that is twofold: planning at what time to repool and also investigating whether a cloning is slower than expected, as it could reveal disk issues
[09:51:08] ofc
[09:51:13] s/disk/drive/
[09:52:29] the thing is, communication is done through cumin (clusterssh), so it cannot be done with the usual methods (e.g. curses UI)
[09:52:56] but it can do other stuff, like query progress on a separate thread and show it on screen
[09:53:08] or print it on screen
[09:53:27] basically depending on how people want to receive the info
[09:54:08] for example, if it is run as part of a cookbook, a log format may be preferred
[09:54:40] as I don't think it is expected to be seen interactively on console
[09:56:16] personally I'd rather only use the terminal "append only" without curses and even without any line delete/replace
[09:56:43] that's useful to know
[09:56:50] and how frequent?
[09:57:00] once every 5 minutes?
[09:57:15] with average transmission and write speeds?
[09:57:59] (I mean more in general as a "policy": scripts output should be copypastable and testable so no fancy screen redraws etc)
[09:58:20] yes 5 mins would be ideal
[09:58:22] again, we work with what we have, I have to disable that on cumin
[09:58:32] I mean, the interactive stuff
[09:59:02] because it was built as an interactive tool first
[09:59:22] not a transfer.py functionality or something I built
[10:00:26] and, in fact, transfer.py predates cumin (we used salt before)
[10:00:44] (I get it, I was talking in abstract e.g. when writing new tools)
[10:01:01] and I agree with that sentiment
[10:01:33] for transfer.py just logging out bandwidth and eta every 5 mins either in console or in a file would be ideal
[10:01:50] I am explaining why progress tracking was something I wanted to have but it is not trivial to decide how to do it
[10:02:00] but a log every X time should be easy
[10:03:04] alternatively, a temporary socket file that can be queried if external tracking is needed for automation
[10:04:32] cat /run/transferpy.name/progress_percentage -> 12
[10:05:21] this conversation was helpful, thanks
[10:05:42] IMO the socket requires a listener and does not keep history. Appending into a log file is "boring" in a good way and keeps history
[10:06:22] no worries, we can do both :-D
[13:04:48] jynus: you probably noticed already - the transfer worked ok
[13:06:47] 👍
[14:26:48] Amir1: Manuel is ooto so I added you as reviewer for https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1215116 and https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214083/3 . Also regarding https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1215575 I think we need to discuss what the UX should be: do we depool pc* by selecting a host or passing the section name?
[14:27:15] I'm slowly waking up. Give me a bit!
[14:43:38] I am soon to go away, remember I won't be around until Tuesday
[14:48:16] Amir1: I can't find the AMR timezone in my tzdata :D
[14:48:44] :D It has a bit of random.random() to spice things up
[15:11:32] The sixth Spice Girl - Timezone Spice
[15:27:24] xD
[17:20:01] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
[18:05:43] normal s2 replica
[18:05:50] let's see what's going on
[18:07:34] > Dec 04 17:05:09 db1229 systemd[1]: prometheus-mysqld-exporter.service: Job prometheus-mysqld-exporter.service/start failed with result 'dependency'.
[18:08:28] it looks fine to me
[18:08:39] (restarted the service manually)
[21:20:12] FIRING: SystemdUnitFailed: wmf_auto_restart_prometheus-mysqld-exporter.service on db1229:9100 - https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state - https://grafana.wikimedia.org/d/g-AaZRFWk/systemd-status - https://alerts.wikimedia.org/?q=alertname%3DSystemdUnitFailed
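
Editorial sketch, not part of the conversation: the append-only progress log proposed between 10:01 and 10:06 (average bandwidth and ETA written every 5 minutes, with no screen redraws) might look roughly like the following. This is not transfer.py code. The function names `progress_line` and `log_progress`, the `get_bytes_done` callback, and the line format are all hypothetical and chosen only for illustration; how the real tool would learn the bytes transferred so far is left open.

```python
import time
from datetime import datetime, timedelta, timezone


def progress_line(bytes_done, bytes_total, elapsed_s, now=None):
    """Build one log line: percentage, average bandwidth and estimated finish time."""
    now = now or datetime.now(timezone.utc)
    # Average rate since the start of the transfer, in bytes per second.
    rate = bytes_done / elapsed_s if elapsed_s > 0 else 0.0
    percent = 100.0 * bytes_done / bytes_total if bytes_total else 0.0
    if rate > 0:
        remaining_s = (bytes_total - bytes_done) / rate
        eta = (now + timedelta(seconds=remaining_s)).strftime("%Y-%m-%d %H:%M:%S")
    else:
        eta = "unknown"  # nothing transferred yet, so no estimate is possible
    return (f"{now:%Y-%m-%d %H:%M:%S} progress={percent:.1f}% "
            f"avg_rate={rate / 1e6:.1f}MB/s eta={eta}")


def log_progress(get_bytes_done, bytes_total, log_path, interval_s=300,
                 sleep=time.sleep, clock=time.monotonic):
    """Append one progress line to log_path every interval_s seconds until done.

    get_bytes_done is a hypothetical callback returning the bytes copied so far.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    start = clock()
    while True:
        sleep(interval_s)
        done = get_bytes_done()
        line = progress_line(done, bytes_total, clock() - start)
        # Append only: earlier lines are never rewritten, so history is kept
        # and the output stays copy-pastable.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(line + "\n")
        if done >= bytes_total:
            break
```

Because each entry is a plain appended line, the same output could go to the console or to a file, and a cookbook could read the last line instead of querying a socket.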