[02:34:53] 10DBA, 10MediaWiki-Page-derived-data, 10Performance-Team (Radar), 10Schema-change: Avoid MySQL's ENUM type, which makes keyset pagination difficult - https://phabricator.wikimedia.org/T119173 (10Krinkle) p:05Triage→03Medium [05:29:33] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master: Tue 5th May - https://phabricator.wikimedia.org/T251154 (10Marostegui) 05Open→03Resolved This was done. We started a bit later than expected due to some on-going issues with another service. RO started: 05:20:59 RO finished: 05:23:34 [05:29:36] 10DBA, 10Operations, 10Puppet, 10User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [05:30:02] 10DBA, 10Operations, 10Puppet, 10User-jbond: DB: perform rolling restart of mariadb daemons to pick up CA changes - https://phabricator.wikimedia.org/T239791 (10Marostegui) [05:30:22] 10DBA, 10Operations: Upgrade and restart s5 and s6 primary DB master: Tue 5th May - https://phabricator.wikimedia.org/T251154 (10Marostegui) [05:31:48] 3000% cpu usage [05:33:18] there is a lot of "delete from processlist where server_id = @server_id" [05:33:24] I don't think that is normal [05:33:52] is that coming from the watchdog or what? [05:34:19] I think all hosts are trying to be updated at the same time [05:34:25] which is not great for performance [05:34:34] I would give it a few minutes [05:34:38] and check the status again [05:35:18] hopefully the events don't pile up after the first run [05:35:36] I think it will stop working, actually it is already increasing for all of them [05:36:36] Errr [05:37:11] root@db1115:~# host 10.64.32.13 [05:37:12] 13.32.64.10.in-addr.arpa domain name pointer orespoolcounter1002.eqiad.wmnet. [05:37:52] Why do we have that ip issuing a delete? [05:38:27] Or mwmaint1002 running delete from processlist where server_id = @server_id as root? [05:39:27] I don't think that's right, must be a dns issue [05:41:07] it is also pointing to cumin1001, I think that is just misleading [05:42:31] CREATE PROCEDURE [05:42:40] so looks like 81166 root 10.64.32.25 tendril Connect 1745 Copying to tmp table insert into processlist_query_log\n (server_id, stamp, id, user, host, db, time, info)\n 0.000 is holding everything from running smoothly? [05:43:03] it is like running a lot of setup host processes [05:46:37] cannot say, honestly [05:46:50] lots of inserting and CREATE * things running [05:46:55] Going to stop the event_scheduler [05:47:13] to see what cleans up and what keeps running [05:47:42] activity back to 0% [05:47:52] so it is the scheduler that is causing high load [05:48:03] | 240574 | root | 10.64.32.25 | tendril | Connect | 172 | Copying to tmp table | insert into processlist_query_log [05:48:03] (server_id, stamp, id, user, host, db, time, info) [05:49:04] and q.stamp > now() - interval 7 day [05:49:37] 245 seconds and still didn't finish [05:51:42] this is me right now: https://youtu.be/12LLJFSBnS4?t=18 [05:51:54] hahahahahahahaha [05:52:22] I am trying to see wtf Copying to tmp table | insert into processlist_query_log [05:52:22] (server_id, stamp, id, user, host, db, time, info) that is [05:52:24] and where it is coming from [05:52:34] the fact that it takes more than 5 minutes to finish is concerning [05:52:56] not sure if that might be causing the rest of things to pile up [05:54:18] maybe that table just exploded? [05:54:26] that table is huge, so...
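For reference, the checks being juggled above reduce to a couple of plain MariaDB statements. A minimal sketch only — the 60-second threshold and column list are illustrative, the table and state names come from the log:

```sql
-- Look for long-running tendril statements (the "Copying to tmp table" insert above):
SELECT id, user, host, db, time, state, LEFT(info, 120) AS query
FROM information_schema.processlist
WHERE command <> 'Sleep'
  AND time > 60
ORDER BY time DESC;

-- Tendril's per-host polling runs as scheduled events; pausing the scheduler
-- stops new runs from piling up while the backlog is inspected:
SET GLOBAL event_scheduler = OFF;
```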
[05:54:33] that may be one of the maintenance processes that run from time to time [05:54:44] on a cron [05:55:26] to be honest, I would just truncate that table [05:55:36] (even if it is not the source of this problem) [05:55:39] sure [05:55:46] but you will have to kill the query first [05:55:51] yeah [05:56:38] https://phabricator.wikimedia.org/P11134 that's crazy XD [05:57:09] it is the query log, I think it is normal for it to be big [05:57:22] yeah I know, but we haven't purged it in years [05:57:34] it gets purged [05:57:56] a partition gets created and dropped, see partition time [05:58:28] yeah, but it is insane that it grows 16GB in one day [05:59:14] if you intend to kill, better sooner than later :-D [05:59:56] yeah, the join with processlist_query_log must be just crazy for it to finish [06:00:04] if the table is that huge [06:00:08] going to kill that insert [06:03:17] jynus: let's truncate the table? [06:03:30] ok to m1 [06:03:33] ok to me [06:03:43] ok, doing it and after that I will start the event scheduler [06:04:46] https://phabricator.wikimedia.org/P11135 [06:04:50] going to start event scheduler [06:05:15] ok, done [06:06:29] I am going to do a recap on phabricator [06:06:39] so all this is recorded for posterity [06:06:54] lol [06:10:06] things are looking good so far now [06:10:12] yep [06:10:41] for some reason the processlist got much larger laterly [06:11:20] 3GB vs 15 GB [06:11:22] are you comparing the backupsizes? [06:11:33] we don't backup tendril [06:11:37] only zarcillo [06:11:40] ah [06:11:46] not by choice [06:11:53] but we do backup tendril's schemas? [06:12:00] it doesn't work because of external locking [06:12:23] we have a backup of it on dbprov and the secondary host [06:12:31] yep [06:13:25] going to get some breakfast and will document the tendril's incident [06:13:30] on its phab task [06:13:36] I will go for a walk [06:13:42] enjoy :) [06:13:47] I will ask you about something non-work later [06:13:55] ok! [06:13:56] but related to databases :-D [06:33:59] something went wrong with es5 backup [06:34:03] will check it later [06:40:13] 10DBA, 10Privacy Engineering, 10Security-Team: Drop (and archive?) aft_feedback - https://phabricator.wikimedia.org/T250715 (10jcrespo) a:03jcrespo [06:42:02] 10DBA, 10Privacy Engineering, 10Security-Team: Drop (and archive?) aft_feedback - https://phabricator.wikimedia.org/T250715 (10jcrespo) Thanks, useful evaluation and notice, @JFishback_WMF, taking it from here to generate the exports without the problematic rows. [07:08:23] 10DBA, 10Operations, 10Patch-For-Review: Disable/remove unused features on Tendril - https://phabricator.wikimedia.org/T231185 (10Marostegui) We had an issue with tendril today where tendril was very slow and almost unresponsive, at first I thought it was another case of {T231769}, but it wasn't. First of a... [07:15:08] marostegui: following the discussion from yesterday, what hosts do we want to reimage to buster+10.4 for testing and load purposes? [07:16:06] kormat: we need to upgrade at least another host from es4 or es5 but in eqiad this time [07:17:48] heh. https://tendril.wikimedia.org/tree isn't loading for me [07:18:00] is it down again? 
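The cleanup agreed on above (kill the stuck insert, truncate the query log, restart the scheduler) amounts to something like the following sketch; the thread id is the one shown in the processlist earlier and the schema name `tendril` is the one seen on db1115:

```sql
-- How big has the query log grown, and how are its partitions laid out?
SELECT partition_name, table_rows,
       ROUND(data_length / 1024 / 1024 / 1024, 1) AS data_gb
FROM information_schema.partitions
WHERE table_schema = 'tendril'
  AND table_name = 'processlist_query_log';

-- Kill the stuck insert first (id 240574 from the processlist), then reset the
-- table and let the event scheduler resume:
KILL 240574;
TRUNCATE TABLE tendril.processlist_query_log;
SET GLOBAL event_scheduler = ON;
```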
[07:18:10] hosts and activity look ok [07:18:53] so that's a different issue then, as the DB host this time looks fine [07:18:55] let's investigate [07:20:52] `PHP Notice: Undefined offset: 1661 in /srv/dbtree/inc/tree.php on line 14` [07:21:30] $host = $this->hosts[$host_id]; [07:21:31] XD [07:23:22] I guess 1661 was a host ID [07:23:31] root@db1115.eqiad.wmnet[tendril]> select * from servers where id=1661; [07:23:31] Empty set (0.00 sec) [07:24:58] so everything but tree seems to be working [07:31:11] there is something weird with the data [07:31:23] 1661 is in global_Status_log, and slave_status [07:31:30] (i don't know if that's expected or not) [07:31:48] There is a duplicate host with the same id [07:31:51] let me fix that [07:31:56] and they both have 1660 as id [07:32:04] before you do - how can i see this? [07:32:22] ah, I was examining db1115 tendril database, on the servers table [07:32:34] so you've already hid the evidence? :) [07:32:38] and tendril's code runs on dbmonitor1001 [07:32:44] No, I haven't touched it yet [07:33:02] `select COUNT(*) from servers where id=1660;` gives me `1` [07:33:21] yes, same, but look at this: [07:33:33] | 1660 | es2020.codfw.wmnet | 3306 | 10.192.0.157 | 2020-05-05 07:22:53 | NULL | 2020-05-04 14:00:00 | 2020-05-05 06:30:00 | 2020-05-05 07:23:18 | 2020-05-05 07:22:39 | 2020-05-05 07:23:33 | 2020-05-05 07:23:25 | NULL | 180355229 | 171966665 | [07:33:33] | 1657 | db1107.eqiad.wmnet | 3306 | 10.64.0.214 | 2020-05-05 07:23:24 | NULL | 2020-05-05 07:00:00 | 2020-05-05 06:27:00 | 2020-05-05 07:23:09 | 2020-05-05 07:23:15 | 2020-05-05 07:23:32 | 2020-05-05 07:23:30 | NULL | NULL | NULL | [07:33:33] | 1660 | es2020.codfw.wmnet [07:34:01] wtf, it is gone now? [07:37:24] we can try to insert a dummy row with id=1661 [07:38:01] are there delete triggers (is that the right word?) to remove entries from other tables if a server row is deleted? [07:38:44] I guess so, but it is all a big of black magic yeah. But if a host is removed, the events are (or should be) removed as part of the tendril-drop script [07:39:13] you mean the glorious 1k-line bash script? :) [07:39:18] yes! [07:39:19] haha [07:39:32] oh, sorry, that's the add script. the drop script is quite simple [07:39:36] so that removes all the events from the host that is being disabled and dropped [07:41:02] something i've been wondering: should these glorious scripts be using a transaction? [07:43:38] so from what I can see it was working today around 6:30AM CEST [07:49:42] marostegui: this error might be a red-herring. it's present in even logs from 2020-04-16 [07:49:54] the offset one? [07:49:57] yep [07:50:27] Yeah, I was checking enabling php errors on tree.php and it wasn't very successful there [07:52:22] I am sure this is too much of a coincidence and it is most likely related to today's issues [07:52:45] `PHP Warning: gethostbyaddr(): Address is not a valid IPv4 or IPv6 address in /srv/tendril/lib/utility.php on line 507, referer: https://tendril.wikimedia.org/tree` [07:53:47] that could be because of my tests with a dummy hosts (if it is from a few minutes ago) [07:53:53] it is, yeah [07:54:14] so... so far we have zero useful debug info from tendril. "neat" [07:54:20] see what I say about killing? if we had invested all time time in maintaining in creating a new, simpler one, we would have 2 developments [07:55:01] hmm. 
there's a js error [07:55:09] `Error: Must call google.charts.load before google.charts.setOnLoadCallback` [07:55:48] uf, if api was updated, we are for a fun ride [07:56:50] "SyntaxError: JSON.parse: unexpected character at line 1 column 1 of the JSON data Resource URL: https://tendril.wikimedia.org/jquery-1.9.1.min.js" [07:57:46] also "Using //@ to indicate sourceMappingURL pragmas is deprecated. Use //# instead" [07:58:38] https://developers.google.com/chart/interactive/docs/basic_load_libs#update-library-loader-code [07:58:50] so the api has changed, but i don't know when [07:59:29] buf :( [07:59:39] "To update your existing charts, you can replace the old library loader code with the new code" [07:59:42] i wonder if we can pin to the older version. i'll poke around. [07:59:44] cool [07:59:54] very useful documentation :-D [08:00:32] maybe it is a 4 code change? [08:01:26] v47 was released on 2020-01-06, previous version was 2018-10-01 [08:01:47] it should be a 1 code change, I am going to try it on server directly, ok? [08:02:08] *1 line of code [08:02:22] it's not :) [08:02:38] or at least not from what i can see [08:02:40] but sure, go for it [08:02:46] you can't break it any more than it's already broken :) [08:03:21] kormat: btw: https://phabricator.wikimedia.org/T96499 [08:03:30] yep :) [08:04:03] 'Module "current" is not supported.' [08:04:43] will just mimic the code snippet [08:04:54] maybe the api of the api changed too :-D [08:10:04] Am I doing something wrong? [08:10:15] ReferenceError: drawChart is not defined [08:12:43] sorry I had to restart my laptop [08:12:46] It got totally frozen [08:13:37] what are you currently doing jynus ? [08:13:38] I seem, I have to update google.setOnLoadCallback(drawChart); [08:13:41] *see [08:13:52] as soon as I fine where is that defined :-D [08:13:54] *find [08:14:31] only in 6 places :-D [08:15:02] haha [08:15:42] yep, tree is back [08:15:53] namespace changed [08:15:56] can you do a recap of what was needed? [08:16:02] yeah, yeah [08:16:10] I was waiting for the patch to explain [08:16:17] not that I wasn't [08:16:28] just it would be easier with a patch if you let me [08:16:32] this is not a fixc [08:16:39] this is just "a test for a proper fix" [08:17:19] summary: just google api changed not our fault this time [08:17:41] I am wondering if this could have had something to do with the early overloads or it is just pure coincidence? [08:17:49] coincidence [08:17:51] Even though we fixed the earlier issues by truncating the huge table [08:17:56] I belive people were having tree issues [08:18:13] maybe deprecation happened at different times due to google's cdn/dcs [08:18:19] for contex of the early issues kormat : https://phabricator.wikimedia.org/T231185#6107666 [08:18:20] it finally hit europe [08:18:37] so this is unrelated, this was just javascript [08:21:00] downloading patch and reviewing in a second [08:24:32] marostegui: https://gerrit.wikimedia.org/r/c/operations/software/tendril/+/594412 [08:24:56] it is not a complete patch, it needs to change all instances of google.setOnLoadCallback(drawChart); [08:25:38] but that is what it is on dbmonitor1001 right now, just for explanation [08:25:40] I see so essentially: https://developers.google.com/chart/interactive/docs/basic_load_libs#update-library-loader-code [08:25:46] Let's include that on the commit message? 
[08:25:48] the url I mean [08:25:53] yeah, but our usage is not that trivial [08:25:58] yeah [08:26:16] because the function is available on some pages [08:26:23] so cannot be on the header [08:26:31] I don't even know how that worked before [08:26:41] I guess it never did- it just errored all the time [08:27:00] yes [08:27:03] will add that [08:27:07] just wanted to share quickly [08:27:12] yep! [08:27:19] I have to finish the patch [08:27:26] maybe kormat can help me review it? [08:27:29] I hope they don't start changing more things or this will be fun [08:27:36] and also make another for dbtree [08:27:41] which it is also broken [08:27:57] yeah, as they are separate files :-/ [08:28:12] more than files, repos :-D [08:29:38] let me take a quick break [08:30:46] i think we should pin to a specific version [08:31:14] we can replace `'current'` with `'47'`, for example [08:32:59] can that be done actually? [08:33:18] yep, documented here: https://developers.google.com/chart/interactive/docs/basic_load_libs#load-version-name-or-number [08:33:44] marostegui: so, going back to the start of all this, how about i reimage es1024 (from es5)? [08:33:53] oh, that maybe a better approach - I am scared if they started changing stuff more often, before we've gotten rid of tendril [08:34:01] yep, exactly [08:34:44] kormat: es1024 sounds good [08:35:12] alrighty. i'll do that now. then i'll start looking at partman vomit. [08:35:33] XD [08:36:05] kormat: regarding the suggestion about the version, I would suggest we discuss on jaime's patch [08:36:09] so it doesn't get lost on irc [08:36:34] +1 [08:37:19] done [08:37:56] thank you! [08:58:57] you can check updated patch, that should be more "presentable" [08:59:04] I will check dbtree now [09:00:29] I wonder if https://www.google.com/jsapi is still needed [09:01:04] I can try dropping it on prod, see if something else breaks [09:02:29] yeah, I think it is not needed [09:08:15] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Marostegui) >>! In T251188#6105539, @Tchanders wrote: > @jcrespo @Marostegui - thanks for pinging AHT. This would... [09:10:57] jynus: another pass done [09:11:31] my bad [09:12:18] but please check patch 3 (soon 4) [09:13:07] i thought i did, but :gerrit: [09:16:13] he [09:16:34] has anyone noticed that gerrit is pretty terrible? [09:16:36] it changes color when on an outdated version [09:16:41] it is super clear! 
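The fix being discussed is essentially Google's documented loader migration; below is a rough before/after sketch (not the actual tendril/dbtree patches, which touch several pages), with the version pinned to '47' instead of 'current' as suggested above:

```html
<!-- old: loader from www.google.com/jsapi
     google.load('visualization', '1', {packages: ['corechart']});
     google.setOnLoadCallback(drawChart);                            -->

<!-- new: gstatic loader, pinned to a frozen release instead of 'current' -->
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
  google.charts.load('47', {packages: ['corechart']});
  google.charts.setOnLoadCallback(drawChart);

  function drawChart() {
    // the existing chart-drawing code stays as it was
  }
</script>
```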
[09:16:55] I wonder if you are using polygerrit or old gerrit [09:17:13] polygerrit [09:18:27] let me know if ok to deploy [09:18:38] and yes, that last change is a bit risky [09:18:48] but this is tendril we are talking about [09:19:45] LGTM'd [09:20:01] different SLA require different approaches :-D [09:20:21] SAL of tendril right now is "50% of the times works" [09:20:24] *SLA [09:20:42] I've also uploaded https://gerrit.wikimedia.org/r/c/operations/software/dbtree/+/594422 [09:21:09] which explains the "dbtree doesn't work" reports we got laterly from 1 person [09:21:22] I will update the jquery version there, too [09:21:26] LGTM'd too [09:21:55] one strange thing of this repo [09:22:00] is the deploy is manual [09:22:24] one has to go to dbmonitor host and rebase [09:22:52] we could do it automatically on merge, but I decided not to, but I am no longer in charge :-D [09:24:53] i'm currently looking at the db case in netboot.cfg, and i'm wondering how exact we want the patterns to be [09:25:14] because some of them cover a lot more hosts than we have, and some are very very exact, and i'm not sure what the rationale is [09:25:31] kormat: basically we cover more db* hosts [09:25:35] so we don't have to worry about new ones [09:25:44] the exact ones is because we don't have many of them anyways [09:25:49] and we don't usually buy those [09:25:54] but db* hosts, we usually buy [09:26:01] ie: es hosts, we rarely do, or pc [09:26:53] is it me or has the graphical representation changed, too: https://tendril.wikimedia.org/report/slow_queries?host=^db&user=wikiuser&schema=wik&hours=1 [09:27:04] seems... different [09:27:13] yeah [09:27:16] it is different [09:28:24] poke around and report if you see something broken... that wasn't broken befor ofc [09:28:39] marostegui: hmm. the other question is - what happens if you pxe boot with partman/custom/no-srv-format.cfg? i know it causes partitioning to fail, but is that meaningfully different from the partitioner waiting for human input? [09:29:03] yeah, it errors out and the install cannot complete at all [09:29:06] yes, it won't find the root system [09:29:22] this was mostly an accidental finding, but we ended up thinking of it as a feature [09:29:31] i might not be asking this question correctly [09:29:36] so even an op couldn't manuualy operate it [09:29:37] :-D [09:29:50] ah. and that's a desired feature? [09:30:00] well, depends on the alternative [09:30:01] because if it isn't, we could drop the entire case [09:30:11] in which case all db hosts will pause looking for manual input [09:30:15] ideally, the install would not even start [09:30:17] no automatic data loss [09:30:20] 10DBA, 10Operations, 10Privacy Engineering, 10Traffic, and 4 others: dbtree loads third party resources (from google.com/jsapi) - https://phabricator.wikimedia.org/T96499 (10Marostegui) For the record: https://gerrit.wikimedia.org/r/#/c/operations/software/tendril/+/594412/ https://gerrit.wikimedia.org/r/#... [09:30:31] sure. but i think that's a bigger thing to tackle, and is tracked by the task you filed [09:30:31] or the host wouln't even reboot [09:30:36] ah, ok [09:30:50] so for the /srv thing, the ideal is to just do an install [09:31:01] coplete wipe of / but keeping /src [09:31:16] /srv, is that the scope you are working with, or something else? 
[09:31:36] sorry, I may be missunderstanding the contex, ignore me if that's the case [09:31:37] let me back up slightly: at some point in the (hopefully near) future we'll have some flag somewhere we can toggle to say this host should reimage from pxe [09:31:45] yep [09:31:58] in the meantime, we want to manually use netboot.cfg to say if a host should reimage or not [09:32:12] but if enabled, it shoud keep /srv at all times... except if it is a new host [09:32:18] so we should have 2 recipes [09:32:24] if we remove the no-srv-format case, all db hosts will default to waiting for human input before destructive actions [09:32:25] "reimage keeping /src" [09:32:33] and "reimage fully (new hosts only)" [09:32:36] and we don't need to maintain this pattern of hosts [09:33:09] I personally do not like that [09:33:27] I think it should even fail complately or go though completely [09:33:44] what does it do if it fails completely? [09:33:50] does it reboot? [09:33:54] not touch disk at all [09:34:07] kormat: right now, if it fails, it keeps waiting for an human to reboot it [09:34:08] prevent reboot if possible, but not sure that is possible [09:34:09] or does it also wait for human input? [09:34:15] marostegui: right [09:34:25] we have to count that reboot will always be possible [09:34:34] but we should eliminate the human factor too [09:34:35] so from a safety point of view it's equivalent [09:35:03] well, the current system requires 2 people to ok a reimage, a deployer and a reviewer [09:35:19] I think that is a feature, let me give you an example [09:35:40] jynus: whereas with what i'm talking about someone could reboot a machine off pxe, and do an install manually? is that the case you're concerned about? [09:35:51] a person wants to reimage db1101, types db1001 which has super-important data, and manually reimages db1001 [09:36:53] there should be some kind of conscient decision [09:37:01] I am getting a bit lost, I think we are having now two very similar threads, but I am not following any of them [09:37:04] that can be reviewed [09:37:07] he he [09:37:13] sorry, moving away [09:37:20] jynus: when i fix the partman recipe, there shouldn't be any more manual interaction with reimaging [09:37:38] ok [09:37:48] (fresh installs are maybe an exception? don't know there) [09:37:51] kormat: can you expose what you have in mind (or questions) and once done, we can ask you questions or discuss ideas? [09:37:59] yes [09:38:27] sometimes mock patches work too, to start a discussion [09:39:09] marostegui: sure. let me quickly put a description together (probably in a paste) [09:39:20] sounds good [09:39:24] thank you [09:39:44] I think that'll help to have a clear view of what you have in mind (at least to me) [09:39:52] and then we can discuss ideas and ask questions :) [09:43:09] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Tchanders) @Marostegui Thanks for looking into those wikis - even though we know it only affects a few, it's help... [09:46:42] marostegui, jynus : https://phabricator.wikimedia.org/P11138 [09:47:25] kormat: thanks - reading [09:47:27] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Marostegui) >>! 
In T251188#6107915, @Tchanders wrote: > @Marostegui Thanks for looking into those wikis - even th... [09:49:17] kormat: so by default any host will fail to install unless told otherwise? [09:49:30] yes, same as currently [09:49:49] and if specified, it will do a full reimage including /srv? [09:50:22] marostegui: we will presumably have srv-format.cfg and no-srv-format.cfg (or similar). we'd use no-srv-format for reimages, [09:50:33] and srv-format for fresh installs, or where we want to wipe /srv for whatever reason [09:51:22] Right, so by cases: 1) by default fail on the install 2) have an specific partman for full reimage including /srv 3) have an specific partman to reimage without including /srv? [09:51:23] is it me or https://dbtree.wikimedia.org/ is now blue? [09:51:36] marostegui: exactly [09:51:38] :-/ [09:51:40] jynus: yes [09:51:43] he he [09:51:48] why on on tendril? [09:52:00] is it on the app, or is it a default? [09:52:20] marostegui: though technically 1) would be "block in the partitioner for human input" rather than an explicit fail [09:52:23] maybe it was always blue and we just "fixed it" [09:52:34] kormat: right [09:53:15] kormat: so if you want to reimage a host, you basically add a line with the partman recipe you want (either full or avoid /srv) one, right? [09:53:17] ideally T251416 would get fixed, and it would never even make it into the installer [09:53:19] T251416: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 [09:53:22] I still don't understand fully [09:53:28] marostegui: yes [09:53:30] "Drop the case entry from netboot.cfg. This will cause all db hosts to block on human input by default." [09:53:58] does this mean have no recipe by default? [09:54:11] yes [09:54:53] problem is foundations will not like that [09:55:10] "all hosts should have the recipe to reimage them" [09:55:32] that sounds like a foundations problem :P [09:55:41] no, it is our problem :-D [09:55:45] anyway, ignore that [09:55:47] for now [09:55:54] I think it is ok-ish [09:56:04] "ok-ish" \o/ [09:56:05] I am not convinced it is better than a hard failure [09:56:14] I think it is sane [09:56:28] I just have to be convinced that it is better than other alternatives [09:56:43] specially on edge cases [09:56:58] "I forgot to remove the change after reimage" etc. [09:57:17] jynus: that stuff should be handled by netbox or whatever [09:57:24] oh, I agree [09:57:29] so this is short-term? [09:57:38] you should have a dropdown to select the next single-boot target [09:57:39] you should have started there [09:57:54] jynus: well, i hope so. but i've no idea how long it will take for T251416 to be fixed (if ever) [09:58:00] yeah, ignore that [09:58:12] because without that, we're left with this [09:58:13] I mean this is "as long as ^is not fixed", right? [09:58:18] yep [09:58:26] ok, that makes me more open to change [09:58:43] specially if it is safe and make people more productive [09:59:15] my hope is to have a cookbook for this, which will require you to specify fresh install or reimage (and ask if you're really really sure), [09:59:28] there is some things to discuss [09:59:35] it would then poke the flag in netbox, drain the machine, do the reimage, etc etc [09:59:43] (that's a far-away target) [09:59:44] which is coordination [09:59:57] now a days a change helps communicate "I am reimaging X" [10:00:04] would that work now too? 
[10:00:16] I guess so because it would be added [10:00:26] jynus: with my proposal? yes. [10:00:26] it won't change I think [10:00:30] however [10:00:49] it depends on having a working (real) /srv-no-format [10:01:05] right? [10:01:19] or do you intend to do that change first? [10:01:19] jynus: it's not a hard requirement, but ideally yes [10:01:40] so the downside [10:01:47] I am just discussin, eh? [10:01:47] the CR that fixes srv-no-format needs to change netboot.cfg to not use it [10:02:02] there is still room for manual error and no 2 person check [10:02:33] jynus: someone has to make the CR, someone else has to review it; is that insufficient? [10:02:35] until a long term fix is done [10:03:02] ok, so you assume that when you want to reimage, nobody will try to do it manually from now on [10:03:08] which is a fair assumption [10:03:25] (another reason i'd love to have this automated - i forgot to revert the change that made es2025 reimagable yesterday :/) [10:03:49] sure, but the same thing would happen here, only worse, right? [10:03:55] you forgot to revert that change [10:04:05] but it would still need manual imput [10:04:17] which I know it is not ideal [10:04:18] jynus: sure. that issue is only solvable with a fully-automated pipeline [10:04:32] or, no. it's solvable if you can set a one-time boot target [10:04:42] well, we already have that [10:04:50] the problem is, it sometimes fails [10:05:01] (i guess a boot target that disables pxe boot in the bios would technically be one-time ;) [10:05:17] no, I mean that wmf-auto-reimage does a one time net boot [10:05:26] it is an ipmi command [10:05:33] jynus: from the machine-side, right? gotcha [10:05:39] i'm talking about from the pxe server side [10:05:40] but that is not the problem to solve [10:05:42] yeah [10:05:59] so in general, and this is my opinion, manuel is the one you have to convince [10:06:21] I am not amused, but I am not against it, as long as it improve someone's workflow [10:06:41] and all people agree on not doing manual reimages [10:07:28] I think we need to try to narrow scopes a bit. This solution would be already better than what we have and will reduce errors for sure, specially with all the regex and all that. 
[10:07:49] It gives us the same safety margin we have at the moment (no less, which is what could be worrying) [10:08:07] I would love T251416 to be solved, but that won't happen in 6 months [10:08:08] T251416: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 [10:08:13] And I think this is a step towards it [10:08:48] we'll still have the same issues as we have now (if we forget to remove the host from the file and all that) [10:08:53] actually, as long as we don't have a /no-srv-reimage, I think this is worse [10:09:11] I think for me that is a blocker [10:09:14] But if I understood correctly, by default the host will mimic that same behaviour [10:09:21] jynus: that's fine from my pov [10:09:30] jynus: i'm happy to block this change on fixing the partman recipe [10:09:31] because you will be using the same recipe [10:09:41] for reimages and for "blocking reimages" [10:09:45] and tha leads to human error [10:09:53] but the important thing is that if i fix the partman recipe (and don't change the name), then everything _will_ reimage by default :) [10:09:55] oh, I connected to db1001 [10:10:01] that is ok [10:10:10] I think the imporatnt part is the separation [10:10:15] that is very much not ok :) [10:10:34] anyway, i think we have enough of a consensus to move forward [10:10:46] jynus, marostegui : thanks for your input, it's very appreciated :) [10:10:59] kormat: but that is T251416 not on our scope [10:11:46] 10DBA: Make partman/custom/no-srv-format.cfg work - https://phabricator.wikimedia.org/T251768 (10Kormat) N.B.: the case in netboot.cfg **must** be changed in the same CR that fixes no-srv-format.cfg, as otherwise all db hosts will reimage by default. [10:12:00] "**must** be changed in the same CR" [10:12:01] ^ added a comment to the partman fixing task [10:12:05] no [10:12:13] I only say to be a blocker [10:12:22] jynus: did you read the justification? [10:12:26] so there is a different workflow [10:12:47] then you didn't understood what I meant [10:13:21] What are we discussing now? [10:13:27] marostegui: unclear [10:13:36] all db hosts will reimage by default is ok [10:13:49] jynus: why is that ok? [10:13:54] jynus: that's not ok to me [10:14:34] what I ask is different workflow between "manually handle partitioning (error)" and "automated /srv keeping" [10:15:15] If I read again: https://phabricator.wikimedia.org/P11138 it says that by default hosts will wait (like they do now) by default, which is OK to me [10:15:30] that is what I call "reimage by default" [10:15:40] start the reimage, but not complete it :-D [10:15:42] I am not following I think [10:16:20] I am ok with https://phabricator.wikimedia.org/P11138 [10:16:22] "reimage by default" to me implies that reimaging completes [10:16:23] as written [10:16:24] so you want the host not to go into the installer by default if started with PXE? [10:16:30] kormat: yeah, same [10:16:40] but that includes "To reimage a host you create a specific case with the (fixed) no-srv-format partman recipe, which will make the installer run to completion." 
[10:17:06] aka "fixing no-srv-format", ofc not by default [10:17:22] jynus: I think that means if you want a host to reimage without /srv, you need to set an specific partman recipe to it [10:17:23] so all or nothing, I was just saying doesn't have to be the same CR [10:17:29] yes [10:17:33] and I am ok with taht [10:17:42] Ok, so that's clear too [10:17:47] but for me that is a dependency [10:17:55] Dependency on what? [10:18:11] on taking out hosts from netboot.cnf [10:18:34] the 2 checkboxoes there :-D [10:18:35] as in: removing the lines? [10:18:43] yes, that is what he is proposing [10:18:55] yeah, but what do you mean with a dependency? [10:19:06] do both checkboxes at the same time [10:19:09] aka [10:19:17] fix partman AND remove the hosts [10:19:21] not just remove the hosts [10:19:33] to prevent mistakes [10:19:34] you mean: if we remove that line the host should not complete the reimage in case it is done by mistake? [10:20:12] what he is proposing is: [10:20:25] remove hosts from netboot- that will make reimage stop at partitioning [10:20:41] fix no-srv-reimage so it reimages automatically [10:21:15] for reimages, no-srv-reimage will have to be added to netboot.cfg [10:21:15] yeah, I understand the proposal, what I want to understand is what is the dependency or blocker you are mentioning :) [10:21:48] not removing the hosts from netboot until T251768 is resolved [10:21:49] T251768: Make partman/custom/no-srv-format.cfg work - https://phabricator.wikimedia.org/T251768 [10:22:02] because if it is done, we will get acostumed to reimage manually [10:22:31] we should do things full manually or fully automatic, so we change the workflow at the same time [10:22:47] not remove and "manually partition" [10:22:55] jynus: that's what my update to T251768 means [10:23:02] kormat: I got you [10:23:09] i think we're violently agreeing :) [10:23:12] I only clarified the part of the CR [10:23:20] not necesarilly the same :-D [10:23:25] but you got me I think [10:23:40] now we need manuel, which is the one you must convince :-D [10:23:54] but I am the one that I am not explaining myself [10:23:55] I am convinced of that too, so that's why I asked what were we discussing [10:23:57] ok [10:24:02] nothing then :-D [10:24:04] :-P [10:24:07] Because I thought we all were on the same page [10:24:08] going in circles [10:24:10] :-DDDD [10:24:11] i think the difference is: [10:24:18] I was not convinced at first [10:24:20] to be fair [10:24:29] jynus was saying we shouldn't remove the entry from netboot.cfg until >= partman fix [10:24:34] exactly [10:24:37] i'm saying we should do it at exactly the same time [10:24:50] but that is essentially the same [10:24:53] ha ha [10:24:56] yes [10:25:04] the only difference is if we do it in 2 steps, in between db hosts can auto-reimage [10:25:13] but the before and after states are identical [10:25:16] I don't like 2 steps, error prone [10:25:25] "ups I reimaged the wrong host" [10:25:31] but that is my opinion [10:25:41] kormat: but https://phabricator.wikimedia.org/T251768#6107956 states it won't be 2 steps [10:25:41] i was talking about the other order, [10:25:48] partman fix first, then netboot change after [10:26:09] marostegui: yes :) i'm just trying to explain why we had the above discussion [10:26:12] marostegui: which he added after my suggestion, so got me on board :-D [10:26:13] right [10:26:17] So many meta things already [10:26:20] anyway, we're all good :) [10:26:30] there is one last thing, which is external agreement, [10:26:44] so expect 
resistence [10:26:50] will only be able to help as much [10:27:01] that's fine, i can live with that [10:27:14] lets stop discussin [10:27:16] anyone who objects can take it up with T251416 :) [10:27:16] T251416: PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 [10:27:22] kormat: +1 [10:27:48] so I am guilty of always thinking about the big picture [10:28:06] jynus: we're a lot alike on that :) [10:28:06] that is sometimes good, sometimes creates bikeshedding [10:28:26] so feel free to slap me and say "narrow scope" [10:28:44] note also the typical phabricator reader has very low attention span [10:29:01] and we sometimes are very casual phabricator readers [10:29:23] marostegui: https://gerrit.wikimedia.org/r/c/operations/puppet/+/594446 for you [10:33:09] marostegui: ty. also https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/594449 [10:35:31] uh, something weird happened [10:35:39] i don't think i merged that [10:36:06] you normally need to wait for CI to verify the change or force it (not nice) [10:36:41] right - but i don't think i did anything, [10:36:44] from gerrit's output you force a rebase? [10:36:45] and gerrit says it's changed [10:37:05] Kormat [10:37:06] 12:34 PM [10:37:06] ↩ [10:37:06] Uploaded patch set 2: Patch Set 1 was rebased. [10:37:09] i am confuse [10:37:50] CommitDate: 2020-05-05 12:32:24 +0200 [10:37:52] vs [10:37:55] oh crud [10:37:57] CommitDate: 2020-05-05 12:34:18 +020 [10:38:20] ok. i think i simply got confused between my 2 gerrit CRs. like a _pro_. [10:39:18] so if everyone can simply forget the last 4 minutes happened, that'd be great! :) [10:39:32] (no? damn) [10:39:34] XD [10:40:34] Publicly logged: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-databases/ [10:40:54] 404 \o/ [10:41:33] it has changed: http://bots.wmflabs.org/logs/%23wikimedia-databases/ [10:41:51] I was about to do the same [10:41:52] damn you [10:42:09] kormat: you are welcome [10:42:19] * kormat plots revenge [10:42:23] there is also http://bots.wmflabs.org/browser/index.php?display=%23wikimedia-databases [10:42:36] more friendly for old dates [10:44:02] also, BTW now we are reimaging individual servers [10:44:37] when a mass reimage is needed, I used to reimage many at the same time (or at least enable them, never all technically at the same time) [10:44:43] which makes the process less tedious [10:45:11] e.g. "all db108X servers" [10:45:47] ack. i might do that at some point, but so far i'm having a little bit of difficulty keeping track when i'm doing more than one at a time. there's maybe 40 steps in the process at the moment [10:45:52] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) [10:53:28] marostegui: is it ok to directly depool es1024? [10:54:48] for es hosts it is important to have at least 2 replicas all the time [10:55:11] I think that would mean giving weight to the master other than 0 [10:55:11] kormat: you can depool es1024 but give weight on the master, maybe just 50 [10:55:34] so that it also serves some of the read load? [10:55:35] for context, nothing would break immediately [10:55:42] kormat: yep [10:55:57] but if there is connection errors to the only replica, we logged some errors in the past [10:56:00] kormat: in case the other slave gets some overload, things will also be able to reach the master [10:56:02] ok. 
and there's only 3 hosts in equad. gotcha. [10:56:15] yeah, only applies to those [10:56:37] also in general, be gentler, those are big servers with hds [10:56:43] so need more love :-D [10:56:57] almost all others are with ssds [10:57:21] alright :) [10:57:25] marostegui: please check the diff on cumin1001 [10:57:58] look ma, I am a frontend develper now: https://gerrit.wikimedia.org/r/c/operations/software/dbtree/+/594457 [10:58:02] kormat: checking [10:58:26] kormat: +1 [10:59:45] you get a +1, you get a +1 you get a +1 https://media1.giphy.com/media/xT0BKqB8KIOuqJemVW/200w.webp [11:11:40] there's a lot of wikiadmin entries in processlist: https://phabricator.wikimedia.org/P11142 [11:11:42] is that expected? [11:13:01] (that's es1024) [11:15:23] checking [11:15:35] right, the dumps [11:15:47] let's check with apergos if it is ok to reboot it [11:15:52] normally it is ok [11:15:56] but check with them [11:33:10] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) [11:34:38] 10DBA: Drop wb_terms in production from s4 (commonswiki, testcommonswiki), s3 (testwikidatawiki), s8 (wikidatawiki) - https://phabricator.wikimedia.org/T248086 (10Marostegui) [11:50:02] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts: ` ['es1024.eqiad.wmnet'] ` The log can be found in `/var/log/wmf-auto-reimage/202005051149... [11:51:54] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) @Ladsgroup to make this even more interesting, I just realised that db1105:3311 doesn't have any of the `tmp_` indexes. I am going... [11:53:05] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Ladsgroup) >>! In T206103#6108179, @Marostegui wrote: > @Ladsgroup to make this even more interesting, I just realised that db1105:3311 doesn't... [12:13:22] 10DBA, 10Epic: Upgrade WMF database-and-backup-related hosts to buster - https://phabricator.wikimedia.org/T250666 (10ops-monitoring-bot) Completed auto-reimage of hosts: ` ['es1024.eqiad.wmnet'] ` and were **ALL** successful. [12:46:58] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) I found a few queries only, and the query plans are very similar. The query time with and without the index almost doesn't change, o... [12:49:42] marostegui: es1024 is ready to resume its duty. is it ok to undo the weight/pooling changes from earlier in a single step? [12:50:06] kormat: no, take sometime I would suggest -p25, -p50, -p75 -p100 [12:50:11] and the master to be removed after that [12:50:18] the master's weight I mean [12:51:48] grand, on it. [12:51:53] thank you [13:04:14] nuke the master! [13:04:50] on it! [13:10:34] yo! [13:10:47] it's busy in here! 
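As an aside, confirming that the wikiadmin entries from P11142 really are the dump run (and how long they have been there) only takes a processlist query; a sketch along these lines:

```sql
-- Who is connected to es1024, how many connections per user, and how long has
-- the longest one been running? The dump run shows up as the wikiadmin user.
SELECT user, COUNT(*) AS connections, MAX(time) AS longest_seconds
FROM information_schema.processlist
GROUP BY user
ORDER BY connections DESC;
```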
[13:11:09] so let me show you the diagram, which will simplify much of the later discussion [13:11:41] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10MediaWiki-Special-pages: Optimize recentchanges queries - https://phabricator.wikimedia.org/T251885 (10Marostegui) [13:12:00] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10MediaWiki-Special-pages: Optimize recentchanges queries - https://phabricator.wikimedia.org/T251885 (10Marostegui) 05Open→03Stalled Stalling as we might get help with this in a few weeks. [13:12:02] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) [13:12:12] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10MediaWiki-Special-pages: Optimize recentchanges queries - https://phabricator.wikimedia.org/T251885 (10Marostegui) p:05Triage→03Medium [13:12:21] XioNoX: https://phab.wmfusercontent.org/file/data/6spjtaxe3sfs7nzkgjuf/PHID-FILE-3euio526kenfyricxcw3/backup_workflow.png [13:13:08] ok! [13:13:16] all of those are 10G hosts [13:13:37] and the idea is to cross-replicate all backups to the other site [13:13:41] in numbers [13:14:20] that's a bit scary :) [13:14:34] that is around 16TB each week [13:15:04] well, that is the part why we tell you [13:15:21] if it wasn't scary and it was 1 KB we wouldn't be having this conversation :-D [13:15:34] speak your mind or ask questions :-D [13:15:51] yeah, doing some maths [13:16:18] it is not homogeneus during the week [13:16:30] first, is it possible to rate limit the transfers? [13:16:32] but there is some control we can do about frequency, caps, etc. [13:17:18] yes, there it is "Maximum Bandwidth Per Job" option [13:17:30] now the question is how much that is needed [13:17:37] 1 job is 1 transfer from X to Y ? [13:17:39] as we have 1Gbit clients for the most part [13:17:45] and we have limited concurrency [13:18:03] I think current concurrency is 2 jobs per pooll [13:18:04] 1G or 10G? [13:18:08] jynus: have you considered bittorrent? /s [13:18:18] 10DBA, 10Growth-Team, 10MediaWiki-Recent-changes, 10Schema-change: recentchanges table indexes: tmp1, tmp2 and tmp3 - https://phabricator.wikimedia.org/T206103 (10Marostegui) @Ladsgroup right now this is the situation with both recentchanges hosts on `enwiki`. db1099 only has `tmp_3` index db1105 has neith... [13:18:20] most origin clients are 1G [13:18:30] because well, backups [13:18:36] so origin are dbprovxxxx ? [13:18:43] no, all backups [13:18:50] dbXXXX [13:18:56] gerrit, netbox [13:19:13] those write directly to the sd daemon "file storage" [13:19:15] so those are not on the diagrams [13:19:46] they write to the local DC backupxxx ? [13:20:01] so 2 separate things [13:20:13] database backups vs other backups [13:20:24] the digram is database backups only [13:20:46] ok, the scope here is intra DC or inter DC transfers? [13:20:50] regular backups would be the same, except instead of dbprov, it would be each configured service [13:21:09] I am guessing you are interested in inter [13:21:26] but I wonder if you worry about intra too because of that question? [13:22:15] Interested in both, but the codfw-eqiad link is the main bottleneck [13:22:19] ok [13:22:33] so let me summarize current state of non-db backups [13:22:45] network part mostly [13:23:08] backup daemons read data and send it encrypted to bacula storage [13:23:17] as the active one is backup1001 [13:23:28] so their local DC one, right? 
[13:23:33] that means that everybody, including codfw hosts, send it to backup1001 [13:23:36] ah [13:23:38] ok [13:23:40] (note current setup) [13:23:46] because you hit the point :-D [13:23:57] now, for redundancy [13:24:17] backup1001 send a copy of their data to backup2001 [13:24:27] like every week or so [13:24:41] you can check backup1001 for the input and output usage [13:25:05] most intensive this week as it is when full backups run [13:26:04] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=8&fullscreen&orgId=1&refresh=5m&var-server=backup1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=1586093159044&to=1588685159045 [13:26:27] backups are programmed to run at non-peak hours, so 4 am or so [13:26:59] so things that change now and in the future [13:27:16] interesting, the RX spikes are much higher than TX? [13:27:29] yeah [13:27:29] and more frequent [13:27:43] not everything is replicated [13:27:53] also different policies [13:27:57] etc [13:28:15] I am going to guess that the largest spikes are dbs [13:28:22] that are not replicated with bacula's method [13:28:34] but are more than 50% of data [13:28:40] ok [13:28:56] so dbs followed the same pattern so far [13:29:06] but the graph I showed you earlier [13:29:27] changes a bit the setup for I think better redundancy [13:29:37] backups are generated locally, in both dcs [13:29:47] but the plan is to send it to the other dc [13:29:55] makes sens [13:29:57] less efficient, but more resilient [13:30:23] maybe we will end up with bacula doing that too, not sure yet [13:30:31] rather than bacula's own replication [13:30:31] so host -> localbackup -> remotebackup [13:30:37] yep [13:30:43] that is the diagram above [13:30:54] also they need to be compreseed/postprocessed [13:31:02] so those are the intermediate dbprov [13:31:02] that sounds better to me than sending codfw to eqiad and then back to codfw :) [13:31:06] cool [13:31:08] yeah [13:31:22] to be fair, database backups is new [13:31:38] regular backups I tried to keep the same setup "if it is not broken, don't fix it" [13:31:51] so we will see with databases, and maybe later migrate backups to a similar model [13:32:06] backups here == non-db backups [13:32:07] so because we're adding DB backups we have to backup more things [13:32:09] in any caase [13:32:13] yes [13:32:16] cool [13:32:22] we just enable external store backups [13:32:33] those are much larger than old database backups [13:32:37] these are content backups [13:33:14] think old backups were only around 3TB logic + 6 snapshots [13:33:16] how fast do those cross DC transfer need to happen? Once a week? and if it takes many hours or more I guess it's fine? [13:33:24] new ones are 24 TB or so [13:34:13] so speed on backup is not a high importance [13:34:17] speed on recovery is [13:34:25] but of course [13:34:31] we are limited to the frequency [13:34:51] backups must happen before the new ones start :-D [13:35:01] haha of course [13:35:15] at the moment, the bacula part takes around 3 hours [13:35:25] not counting the external store [13:35:36] are they rate limited so far? [13:35:46] technically no [13:36:05] but jobs are not necesarilly fast and are limited in number of jobs at the same time [13:36:22] compression and encryption has to happen, etc. 
[13:36:49] I would like for you to look at past backup periods and tell me "this is worring" [13:36:57] or "put a cap of X MB/S" [13:37:01] or any guidance [13:37:13] or just keep an eye and tell me not to do stuff [13:37:27] specially now that we are going to go ino that 16TB/week of transfer [13:37:48] of course, escalonated (jobs are scheduled on different days, etc.) [13:38:22] for example, we could run one job of the ES hosts and I can abort or show you how to abort if it creates issues [13:38:29] and we can tune [13:38:38] not sure, tell me how you would go about this [13:39:00] so far there is nothing worrying [13:39:07] he [13:39:26] what is your theoretical worries? [13:39:36] *are [13:39:47] maximizing link between dcs? [13:39:52] do you have number to share? [13:40:02] if there is a possibility to not run all backups at the same time, spread it one after the other, and rate limit them to something like 3Gbps, it would be a good start [13:40:04] any tip? [13:40:17] " if there is a possibility to not run all backups at the same time" that already happens [13:40:33] that is something we want anyway for hd performance [13:40:44] worries would be any core link saturation (between routers or between switches [13:40:45] ) [13:40:52] uf 3Gbits? [13:40:57] we would never reach that [13:41:01] but we have alerts for that, and looking at graphs we're so far good [13:41:15] ok [13:41:29] so sata is 6Gbits [13:41:44] and these are HDs, we don't even reach that even theoretically [13:42:20] good :) [13:42:36] those hosts have lot of disk, but they are not DB-like hosts [13:42:49] ok, I will be pinging you on big milestones [13:43:08] but I think we should be very far from having network issues [13:43:42] feel free to also ping me if you see bad perf due to some of my hosts [13:43:59] for sure yeah [13:44:44] from regular hosts to backup hosts within a DC are they also run one after the other? [13:44:50] yes [13:44:54] I can show you graphs of that [13:45:07] but in those cases, backups is slow [13:45:09] I'm guessing at some point we will have to run some of them in parallel if there are too many hosts to backup, no? [13:45:13] and from 1G hosts [13:45:21] we have a paralelism of 2 [13:45:22] good point [13:45:34] again, disk (iops) is the issue [13:45:42] that works for me :) [13:45:56] lots of paralelism hurts disk performance, which is our biggest bottleneck [13:46:04] or in the case of logical backups, cpu [13:46:14] let me show you graphs anyway [13:46:26] sure [13:47:00] https://grafana.wikimedia.org/d/000000377/host-overview?panelId=8&fullscreen&orgId=1&from=1588498469268&to=1588686392529&var-server=dbprov1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=mysql [13:47:11] first spikes are a normal backup [13:47:29] the spaces in between are backup processing (cpu, not trasmission) [13:47:47] so paralelism of 2, but to different host even [13:48:11] ok! 
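Some back-of-the-envelope numbers behind "we would never reach that": 16 TB of cross-DC copies per week (decimal TB) averages out to roughly

```latex
\frac{16\,\mathrm{TB} \times 8\,\mathrm{bit/byte}}{7 \times 86400\,\mathrm{s}}
  \approx \frac{1.28 \times 10^{14}\,\mathrm{bit}}{6.05 \times 10^{5}\,\mathrm{s}}
  \approx 2.1 \times 10^{8}\,\mathrm{bit/s} \approx 210\,\mathrm{Mbit/s}
```

Even squeezed into a single 24-hour window that is only about 1.5 Gbit/s, and, as noted above, the sending hosts are bounded by spinning-disk throughput well before they saturate their links, so the 3 Gbit/s figure is comfortably out of reach.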
[13:48:19] yeah I don't see any network risk so far [13:48:30] I am not sure why the second is higher [13:48:30] within or between DCs [13:48:34] I was going to say bacula [13:48:41] but that shoudl be tx [13:48:57] note also the times [13:49:06] they are scheduled on purpose at night [13:49:12] this also for db load [13:49:21] that should help something too :-D [13:49:45] ok, as I said, when all backups are setup, will ping you again [13:50:11] as the content ones will be as large as all backups before, but we'll see [13:50:40] but I think we have different definition of "lot of network usage" 0:-D [13:51:04] there is 1 last question, a bit unrelated [13:51:54] I am sure there is some QoS on network, hard limits, etc., understandibly [13:52:27] I wonder how easy/useful/needed would be to disable some of these in case of a total disaster scenario [13:52:44] "all wikis are down and have to recover from backups" [13:53:23] feel free to tell me if my question is stupid [13:54:43] there is no QoS or limitations other than actual interface speeds [13:54:51] oh [13:55:12] ok, then I can see your worries, at least theoretically [13:55:14] :-D [13:55:36] but also answer my question :-D [13:55:54] if everything was down, we would have free highway to recover :-D [13:56:07] thanks, XioNoX that was very helpful [13:56:19] I hope it was somewhat interesting for you [13:56:35] to expose you to the backups world [13:56:40] at our scale QoS is a pain and the answer is usually to get bigger pipes [13:56:53] not complaining at all [13:56:56] :-D [13:57:00] yeah for sure, I understand better what's up with backups [13:57:13] thanks! [14:23:20] sadly, our little tendril troubles meant than some backups finished but couldn't report to the database as such [14:35:27] es2025.codfw.wmnet is giving me the same errors as labsdb1009 "broken table detected" [14:35:37] so probably some views having issues [14:36:30] 10DBA, 10DC-Ops, 10Operations, 10ops-eqiad: (Need By: 31st May) rack/setup/install db114[1-9] - https://phabricator.wikimedia.org/T251614 (10Jclark-ctr) a:03Jclark-ctr [14:40:20] the reason why it is having issues is that es2025 was upgraded to 10.4 [14:40:36] there is an extra grant, "DELETE HISTORY" [14:42:50] wow, that worked? [14:51:41] FYI, very interesting for those handling reimages: https://wikitech.wikimedia.org/w/index.php?title=MariaDB&type=revision&diff=1865129&oldid=1864948 [15:06:27] how does an extra grant breaks mydumper? [15:06:52] Ah I see the diff [15:06:54] Interesting [15:07:07] I will loop all our 10.4 and delete that grant tomorrow [15:25:09] 10DBA, 10Dumps-Generation, 10MediaWiki-extensions-CodeReview, 10Security-Team: Publish SQL dumps of CodeReview tables - https://phabricator.wikimedia.org/T243055 (10Jdforrester-WMF) 05Open→03Resolved https://dumps.wikimedia.org/other/ -> https://dumps.wikimedia.org/other/codereview/20200428/ Thanks,... [15:29:53] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) Err, @Marostegui Are you sure P11137 is the right link to the paste? I am in WMF-NDA group but I don't... [15:32:56] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10jcrespo) @Niharika try now, access was very restricted before. I have made it a bit less (but still private). 
[15:34:12] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) >>! In T251188#6109123, @jcrespo wrote: > @Niharika try now, access was very restricted before. I have... [15:37:02] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10jcrespo) I've added you to the paste manually, if that doesn't work either, I am as confused as marostegui! [15:40:08] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) @jcrespo Hmm, doesn't work. {F31803604} [15:41:23] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10jcrespo) last try! Otherwise I will just leave it on production. [15:44:13] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) >>! In T251188#6109186, @jcrespo wrote: > last try! Otherwise I will just leave it on production. Got... [15:45:20] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Marostegui) That's weird....I created the past with WMF-NDA as usual :-/ Did something change on the paste creati... [15:46:44] I don't get the paste issue, the WMF-NDA description says: Create a paste that is visible only to members of WMF-NDA, plus any subscribers you CC. [15:47:08] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10jcrespo) Just in case: ` root@mwmaint1002:/home/niharika29$ ls -lhsa P11137.txt 4.0K -rw-r--r-- 1 niharika29 wik... [15:48:14] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment, 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10jcrespo) [15:49:19] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) [15:49:26] jynus: can you see this? https://phabricator.wikimedia.org/P11154 [15:53:00] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Marostegui) >>! In T251188#6109111, @Niharika wrote: > Err, @Marostegui Are you sure P11137 is... [15:55:13] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) >>! In T251188#6109302, @Marostegui wrote: >>>! In T251188#6109111, @Niharika wrote:... 
[15:57:06] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Marostegui) That would explain why you were not able to see it until you were CC'ed. Only WMF-N... [16:20:46] 10DBA, 10DC-Ops, 10Operations, 10Sustainability (Incident Prevention): PXE Boot defaults to automatically reimaging (normally destroying os and all filesystemdata) on all servers - https://phabricator.wikimedia.org/T251416 (10colewhite) p:05Triage→03Medium [20:23:15] 10Blocked-on-schema-change, 10DBA, 10Anti-Harassment (The Letter Song), 10Patch-For-Review: ipb_address_unique has an extra column in the code but not in production - https://phabricator.wikimedia.org/T251188 (10Niharika) With @SPoore's help we managed to clear all the duplicate blocks. CC @Tchanders @dma...
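One loose end from the afternoon: removing the extra DELETE HISTORY privilege that the 10.4 hosts picked up, as planned above, is a one-liner per account. A sketch only — the account name and host pattern below are placeholders, not the real dump user:

```sql
-- See whether the account picked up the new MariaDB 10.3+/10.4 privilege on upgrade:
SHOW GRANTS FOR 'dump'@'10.%';          -- placeholder account

-- Drop just that privilege so the grants match the 10.1 hosts again:
REVOKE DELETE HISTORY ON *.* FROM 'dump'@'10.%';
```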