[06:48:25] volans elukey not sure if this is related to the errors you guys were troubleshooting a week ago: https://phabricator.wikimedia.org/P13461
[07:13:26] marostegui: hola! This seems after the dns diff right? If so no it is a different one.. anything useful in the logs?
[07:13:43] maybe cumin failed on some nodes
[07:13:49] yeah, it is after the dns diff
[07:15:04] ah so from the logs all the dns servers failed to be uodated basically
[07:15:50] :-/
[07:16:08] *updated
[07:16:09] I can try running it again to see if it was a one time thing
[07:17:10] yep in theory it should be ok, but the cookbook may fail for other reasons (like the ones I was seeing with the analytics nodes)
[07:17:26] if that doesn't work we can run the netbox dns cookook
[07:17:30] *cookbook
[07:17:55] ok, let me try again
[07:18:42] the fact that all DNS servers failed it is weird though
[07:18:53] yeah, that is strange
[07:25:57] elukey: nah, same error :(
[07:26:11] elukey: maybe I should create a task?
[07:27:59] marostegui: yes definitely
[07:28:10] will do, thank you!
[07:30:45] Done: https://phabricator.wikimedia.org/T268963
[07:32:19] I am running utils/deploy-check.py on dns5001 (without deploy etc..) to see if it yields to some errors
[07:34:39] thanks for adding that to the task!
[07:34:55] error: CNAME 'es2-master.eqiad.wmnet.' points to known same-zone NXDOMAIN 'es1015.eqiad.wmnet.'
[07:35:11] aaaaah
[07:35:13] I can change that
[07:35:17] ta daaaan
[07:35:18] :)
[07:35:35] thanks for finding that!
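(Editor's note: the error above comes from a pre-deploy check that rejects a CNAME whose target lives in the same zone but has no records. The following is only a minimal sketch of that idea, not the actual gdnsd or deploy-check.py logic. The function name `find_dangling_cnames`, the in-memory record layout, the `es3-master` and `es1016` entries, and the 192.0.2.16 documentation address are all assumptions made for illustration.)

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: owner name -> list of (record type, value).
Records = Dict[str, List[Tuple[str, str]]]


def find_dangling_cnames(zone: str, records: Records) -> List[Tuple[str, str]]:
    """Return (owner, target) pairs for CNAMEs whose same-zone target has no records."""
    suffix = "." + zone.rstrip(".") + "."
    dangling = []
    for owner, rrsets in records.items():
        for rtype, value in rrsets:
            if rtype != "CNAME":
                continue
            # Only targets inside this zone can be checked against the zone data.
            if value.endswith(suffix) and value not in records:
                dangling.append((owner, value))
    return dangling


# Illustrative data: es1015 has been removed, but a CNAME still points at it.
records: Records = {
    "es2-master.eqiad.wmnet.": [("CNAME", "es1015.eqiad.wmnet.")],
    "es3-master.eqiad.wmnet.": [("CNAME", "es1016.eqiad.wmnet.")],
    "es1016.eqiad.wmnet.": [("A", "192.0.2.16")],
}

dangling = find_dangling_cnames("eqiad.wmnet", records)
for owner, target in dangling:
    print(f"error: CNAME '{owner}' points to same-zone missing name '{target}'")
```

(Running a check like this before removing a host's records would flag the CNAMEs that need to be repointed first.)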
[07:35:41] I think it would be extremely useful to get cumin's output in the cookbook
[07:35:46] at least in debugging logs
[07:35:46] I will review all the other cnames, as we might find the same issue
[07:35:52] ack adding this to the task
[07:36:23] heh, this didn't happen with es1 cause we don't have a cname for that, but we should
[07:36:30] at least for consistency, I will add it too
[07:38:06] elukey: https://gerrit.wikimedia.org/r/c/operations/dns/+/644086
[07:38:44] marostegui: so it looks good but I have no idea about the hostnames :D
[07:39:01] elukey: haha, no worries, it was more a syntax double check :)
[07:39:41] marostegui: yes yes I figured, I wouldn't trust myself to review database configs either, +1ed :D
[07:39:49] elukey: I will re-run the decom script in a few minutes
[07:40:03] ack
[07:45:40] elukey: it failed for other reasons, but I think it worked now
[07:49:30] marostegui: ah yes that is still a problem, if the .mgmt is not there the cookbook doesn't like it
[07:49:45] marostegui: but, now I am confused, how can the cookbook say "nothing to commit" ?
[07:49:57] elukey: I guess it did commit the other previous changes
[07:50:05] the removal of the es1015.eqiad.wmnet dns
[07:50:54] Host es1015 not found: 3(NXDOMAIN)
[07:53:22] ah ok yes it was failing to deploy the zone files
[07:53:30] but the record was removed
[07:56:06] yep yep confirmed the diff comes from the netbox dns repo
[07:56:17] that of course contains the diff
[07:56:18] perfect
[07:57:55] marostegui: going also to open a task for the mgmt thing, so we remember
[07:58:03] (for Riccardo's happiness)
[08:01:24] https://phabricator.wikimedia.org/T268965
[08:03:03] so we can close the other one I think?
[08:03:41] let's keep it open, the cumin's output logging part I think it is something to work on, it should be straightforward from the logs why the cookbook failed
[08:03:56] but not sure it may be too verbose
[08:05:26] elukey: cool!
[08:06:20] marostegui: ah one last thing, I forgot.. did you cumin cumin?
[08:06:33] elukey: you mean cumin cumin cumin?
[08:06:38] yes exactly
[08:06:46] yes, always with cumin
[08:06:51] perfect, then we are ok
[08:30:58] * volans has the impression to have been summoned...
[08:34:39] re: cumin's output in cookbooks, I've just unilaterally decided that I'll re-enable them all, and if any cookbook is too spammy we can adjust at will even if we don't have yet all the bits in cumin's API to manage that
[08:34:58] I'll read the rest in a moment
[08:35:05] marostegui, elukey ^^^
[08:36:30] <3
[08:36:55] volans: \o/
[08:38:19] marostegui: actually you didn't run it again, because on no-changed by default it doesn't deploy
[08:38:34] also you don't need to run the whole decom but can just run the sre.dns.netbox cookbook in those cases
[08:38:37] ah :)
[08:38:42] with a special option
[08:38:51] https://wikitech.wikimedia.org/wiki/DNS/Netbox#Force_update_generated_records
[08:39:03] ah thanks volans - didn't know that!
[08:39:20] but the decom cookbook should probably be more explicit in those cases
[08:39:25] so something to improve anyway
[08:39:41] volans: yeah, to me it wasn't clear whether something was actually done or not
[08:40:25] volans: so the records were committed in netbox but not deployed to the dns auth servers?
[08:40:29] the authdns-update failed, it can fail on a single host for $reasons or everywhere for errors, but in all cases it's virtually impossible that it can break the dns
[08:41:02] :challenge-accepted:
[08:41:25] elukey: indeed because gdnsd pre-check failed because of the cname pointing to a same-zone not-existing name
[08:41:46] in this case the CNAMEs should have been renamed before the decom to prevent the failure
[08:42:46] mmm but es1015.eqiad.wmnet yields to nxdomain to me now
[08:43:34] because probably marostegui has manually run authdns-update
[08:43:43] ah yes for the CNAMES
[08:43:46] volans: I did for the cnames yeah
[08:43:46] okok now it makes sense
[08:43:51] perfect
[08:44:00] that deployed also the auto-generated changes
[08:44:21] so nothing to do AFAICT without having checked the servers
[08:44:34] but, if you want to be extra secure you can run the dns.netbox cookbook with the --force option
[08:44:50] makes sense yes
[08:45:26] as follow up, more logging + probably something mention the --force etc.. could be useful to avoid forgetting
[08:45:48] I knew about it but completely forgot
[08:45:56] yep
[09:21:57] marostegui, elukey: https://gerrit.wikimedia.org/r/c/operations/software/spicerack/+/644175
[09:34:46] hashar: FYI, there's an OpenJDK bugfix release (11.0.9.1), the bugfix from .1 doesn't really concern us, but Jenkins refuses to start since it expects 3 parts in the version number only: https://github.com/jenkinsci/packaging/pull/198
[09:35:32] the 11.0.9.1 update will land in the next Buster point release (happening next weekend), but I suppose there will be some Jenkins update to hand the patch from the pull request, so Jenkins can be upgraded before the new JRE
[09:36:50] moritzm: oh that is nice
[09:36:55] moritzm: thanks for catching that one :]
[09:37:14] though the Jenkins master run on java 8 still
[09:37:20] the agents do use java 11
[09:41:20] looking at https://ubuntu.com/security/notices/USN-4607-2 this affects anyone running the latest Ubuntu LTS, so I guess Jenkins will push out a fix soon
[17:20:24] (can wait for meeting end) volans: Re: cumin v0.0.45 should I be aware of any change for client code calling cumin, or new features I should use?
[17:21:16] jynus: are you using spicerack's 'remote' module or directly importing cumin
[17:21:19] ?
[17:21:47] directly cumin for now, as spicerack didn't exist when I started using it
[17:22:27] cumin side nothing changed yet, and the next release will anyway be backward compatible as we can't require all clients to be upgraded at the same time
[17:22:39] I'll keep you posted on that side of things
[17:22:45] ok, so that means I should be looking to the changes?
[17:23:13] we have most db-related stuff abstracted in one class, so any change should be easy
[17:23:59] as for the current spicerack release the only change involving cumin is that when a cookbook calls run_sync or run_async the usual cumin output that currently is suppressed will be printed to stdout/err
[17:24:20] volans, this is the whole extent of cumin usage by all tools: https://phabricator.wikimedia.org/diffusion/OSMD/browse/master/wmfmariadbpy/RemoteExecution/CuminExecution.py
[17:24:34] s/all/some/
[17:25:04] are there db tools not using that?
[17:25:21] that use cumin directly, I mean
[17:25:41] jynus: ah, no. lots not using cumin at all (directly or indirectly)
[17:25:54] yeah, :-D
[17:26:01] jynus: what you'll be able to change when the new cumin release will be available (still need some work)
[17:26:04] is remove the https://phabricator.wikimedia.org/diffusion/OSMD/browse/master/wmfmariadbpy/RemoteExecution/CuminExecution.py$52
[17:26:07] hack
[17:26:12] we are not heavy users, just transfer, switchover and xtrabackup mostly
[17:26:21] and choose on a per-execution basis if you want the output printed out or not
[17:26:24] volans: thanks, keep us updated
[17:26:29] sure!
[17:26:39] but if it is only that, seems like an easy change
[17:28:33] also you "can" change it not forced to :)
[17:28:38] but it's a hack
[17:29:27] yeah, I would be glad to change it
[17:30:17] that was added, I think by the volunteer, after your suggestion when there was no alternative
[17:34:08] yep
[17:36:40] also if using cumin directly is not preferred, we can also have a look at that
[17:39:49] not at all
[17:39:57] it's meant to be used at will
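(Editor's note: the discussion above is about choosing, per execution, whether a command's output is printed or suppressed. The sketch below shows one generic way to do that in plain Python with `contextlib`. It does not use cumin's or spicerack's API, and it is not the code at the linked CuminExecution.py line; `run_with_output_control`, `print_output`, and `noisy_command` are hypothetical names introduced only for illustration.)

```python
import contextlib
import io
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_output_control(action: Callable[[], T], print_output: bool = True) -> T:
    """Run `action`, optionally discarding anything it writes to stdout/stderr."""
    if print_output:
        return action()
    # Redirect both streams to throwaway buffers for the duration of the call.
    with contextlib.redirect_stdout(io.StringIO()), contextlib.redirect_stderr(io.StringIO()):
        return action()


def noisy_command() -> int:
    """Hypothetical stand-in for a remote execution that prints per-host output."""
    print("host1: ok")
    return 0


# The return value is preserved either way; only the printing differs.
rc = run_with_output_control(noisy_command, print_output=False)
```

(Note that this only captures output written through Python's `sys.stdout` and `sys.stderr`; output written directly to the underlying file descriptors would need a different approach.)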