[08:11:56] hey folks, in T374830 some users are reporting failures in resolving gerrit.wikimedia.org from within WMCS virtual machines. I wonder if we could be getting rate limited or similar
[08:11:58] T374830: Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org - https://phabricator.wikimedia.org/T374830
[08:12:38] VMs have a resolver configured from WMCS, that then forwards to ns{0,1,2}.wikimedia.org for wikimedia.org lookups
[08:12:41] rings any bell?
[08:12:45] <_joe_> arturo: rate limited with what?
[08:13:35] _joe_: I don't know, that's why I ask. We have been unexpectedly rate-limited in the past by the CDN. There could be other [unknown to me] ratelimits somewhere
[08:14:07] <_joe_> arturo: resolution is DNS, so defaults to UDP and doesn't go via the CDN
[08:14:34] <_joe_> that's why I was asking, I'm not aware of any rate-limiting in our DNS infra
[08:15:27] ok, thanks. I'll keep diving in logs to see if I can find any other leads
[08:19:49] <_joe_> If I had to bet, given the problem is intermittent, it might be some kind of obscure networking issue anywhere between the openstack networking and every other piece involved.
[08:20:20] <_joe_> but also, why would the dns recursor not cache the value for gerrit.wikimedia.org?
[10:19:13] I need internet access from `stat1011` (just to pull in some python libraries); a ghost of a memory in my brain is telling me there's a command I have to run to get external network access working, but I cannot for the life of me find it on wikitech
[10:21:15] of course now I find https://wikitech.wikimedia.org/wiki/HTTP_proxy and its `set_proxy` (: thanks anyway!
[10:25:56] <_joe_> :)
[10:26:23] <_joe_> moritzm, elukey what would be a good time to disable puppet on the puppetservers for ~ 1 hour tops?
[10:27:00] anytime, I am not working on it and Moritz is out :)
[10:28:00] <_joe_> ok :) I'll do it after lunch break then
[10:28:16] <_joe_> I counted on doing it this morning but I got swayed by various fires
[10:28:29] <_joe_> thanks <3
[11:31:42] arturo: no rate-limiting on the DNS hosts of any kind
[11:32:01] sukhe: thanks for confirming
[11:32:53] arturo: do we have an idea as to since when we are seeing this?
[11:33:08] > Feels like it’s happening rather too often since last Friday or so. this?
[11:33:23] sukhe: I have no specific data, the ticket is the main source of information
[11:38:07] ok thanks. I will take a look when I am online properly soon
[11:49:44] the pdns-recursor throttling you mentioned kicks in when the auth server doesn't answer a query or answers in a way that the recursor doesn't like but I don't see that happening here
[11:50:15] like there is nothing to not like about gerrit.wikimedia.org that gdnsd will respond
[11:51:38] ah your settings are different since you run your own rec
[11:52:17] you can try bumping max-tcp-per-client as well in addition to the timeout
[13:25:32] <_joe_> elukey: I just disabled puppet on the puppetservers
[13:28:17] okok
[15:22:07] <_joe_> Amir1: I have a change to merge
[15:22:23] what's up
[15:24:27] _joe_: you already merged it. Sorry :(
[15:29:52] _joe_: https://dumps.wikimedia.org/other/wikitech/
[15:30:14] now updating the rackspace host to point to this
[18:32:28] anyone know the best way to test a cookbook change in combination with a spicerack change?
[18:33:16] no such way other than to wait for the spicerack release. but now I am curious too :)
[18:34:26] jhathaway: though now I got curious and noticed https://wikitech.wikimedia.org/wiki/Spicerack/Cookbooks#Creating_your_local_environment
[18:35:50] hmm, yeah I wonder if that would work, if I run those steps on a prod server in my home dir
[18:39:15] I made this gross playbook when I first started, but it's probably discouraged to use ;( https://gitlab.wikimedia.org/repos/search-platform/sre/spicerack-dev-env
[18:40:25] inflatador: which specific parts out of curiosity?
[18:41:15] There's a ticket somewhere, but the playbook downloads stuff from pip which is frowned upon
[18:41:52] ah but local tests should be fine though? we don't do that in just prod
[18:46:50] yeah, it's useful for you to run locally. I only ever ran it from cumin as that was the only place that had access to all the bits I needed
[18:47:15] there are probably ways to do it without downloading anything from pip too