[09:18:08] moritzm: do we get debian-debug on our debian mirror?
[09:18:36] (I need libssl debug symbols on acmechief-test1001)
[09:18:50] <_joe_> vgutierrez: you mean the -dbgsym packages?
[09:18:53] <_joe_> sure thing
[09:18:55] yep
[09:22:08] vgutierrez: we don't, see https://phabricator.wikimedia.org/T164819#3309872
[09:22:39] ack, thx
[09:41:23] <_joe_> volans: are you doing something on the puppet compiler? compiler1002 is unreachable it seems, at least via web
[09:42:45] <_joe_> uhm I can ssh just fine into the server
[09:43:45] no, not at all _joe_
[09:46:04] <_joe_> volans: yeah it's issues with ferm in labs apparently
[10:21:50] moritzm: so.. it looks like python3-cryptography on buster has a memory leak, it's already been fixed by upstream with https://github.com/pyca/cryptography/commit/9a22851fab924fd58482fdad3f8dd23dc3987f91, I guess that I should open a bug in debian to get that backported?
[10:25:43] that would be great! the next OpenSSL update (to appear in the next few days) will cause a build failure in python-cryptography (which needs a fix in py-crypto to use the correct API), so there'll be an update soon anyway, we could kill two birds with one stone there
[10:26:19] updating to py-crypto 2.7 will fix it as well
[10:26:50] can you file it with severity: important and mention that the leak is triggerable with real workloads?
[10:27:00] sure
[10:30:46] moritzm: tags: fixed-upstream applies here?
[10:30:53] yes!
[10:30:58] nice :)
[10:48:41] done: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=941413
[11:22:22] ack, nice
[12:19:13] I'm looking for advice on how to properly work around the python linter in this patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/539853
[12:19:44] I'm hoping not to introduce any overriding instructions in the files themselves, to keep them a verbatim upstream copy
[13:47:35] <_joe_> herron: I see Timo agrees as well, I think we should just merge your change https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539623/
[13:48:46] _joe_: ok, will give that a shot this morning
[13:49:26] <_joe_> then I'll need to understand why it doesn't set type=mediawiki
[13:49:38] <_joe_> but that's not your concern for now :)
[13:58:01] hah ok
[15:28:02] heh heh, I was taking a quick look at provisioning a grafana1002 and now I'm wondering if grafana has a slow memory leak :D https://grafana.wikimedia.org/d/000000377/host-overview?refresh=5m&panelId=4&fullscreen&orgId=1&var-server=grafana1001&var-datasource=eqiad%20prometheus%2Fops&var-cluster=misc&from=now-90d&to=now
[15:29:13] fun!
[15:58:02] anyone want to take a quick look at https://gerrit.wikimedia.org/r/c/operations/dns/+/539894 ?
[16:00:18] cdanis: +1
[16:00:26] thanks a million
[16:02:28] you're welcome
[16:03:27] hm, isn't there a way to override flake8 checks in a given puppet subdir? I thought it was just .pep8 but now I can't find an example
[16:28:38] <_joe_> andrewbogott: tox.ini
[16:29:47] <_joe_> sorry I saw the message from arturo earlier but forgot to reply :(
[16:34:59] some of the kafka logging consumers are lagging a lot
[16:35:00] https://grafana.wikimedia.org/d/000000484/kafka-consumer-lag?from=now-3h&to=now&orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-cluster=logging-eqiad&var-topic=All&var-consumer_group=All
[16:35:51] Cc: godog, shdubsh --^
[16:37:50] thanks _joe_
[16:39:10] elukey: It looks like there were a few bursts of messages over the last hour. Not sure where from though. Backlog appears to be getting consumed no problem though.
[16:39:21] ack!
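
(Aside, not part of the original log: a minimal sketch of how per-partition consumer-group lag like the above could be checked with kafka-python, for readers without access to the Grafana dashboard. The broker, topic and group names below are placeholders, not the real values of the logging pipeline, and the topic is assumed to exist.)

    # Hypothetical sketch: all broker/topic/group names are placeholders.
    from kafka import KafkaConsumer, TopicPartition

    consumer = KafkaConsumer(
        bootstrap_servers="logging-broker.example:9092",  # placeholder broker
        group_id="logstash",                              # placeholder consumer group
        enable_auto_commit=False,
    )

    topic = "example-logging-topic"  # placeholder topic
    partitions = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end_offsets = consumer.end_offsets(partitions)        # latest offset per partition

    for tp in partitions:
        committed = consumer.committed(tp) or 0           # last offset committed by the group
        print(f"{tp.topic}[{tp.partition}] lag={end_offsets[tp] - committed}")

    consumer.close()

(In production these numbers come from the Prometheus/Grafana dashboard linked above; the sketch only illustrates what "consumer lag" measures: latest offset minus committed offset, per partition.)
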
[16:40:49] elukey shdubsh thanks, I'll take a look as well after the meeting
[16:44:09] akosiaris: FYI the above might be k8s logging ^ still to be confirmed but the increase lines up
[16:44:43] yup looks like it
[16:45:07] that's a big increase
[16:45:41] way way too big, no way the cluster produces that many logs
[16:46:11] yeah, trying to locate which container(s)
[16:46:45] *drumroll* I think it is zotero
[16:47:06] lol
[16:47:27] lemme see what I can do for that
[16:48:59] ack, lag is reducing but looks like that's still ~2k log/s
[16:58:31] akosiaris: think your recent patch is also doing this? "Error while evaluating a Resource Statement, Function lookup() did not find a value for the name 'profile::rsyslog::kubernetes::token' at /etc/puppet/modules/profile/manifests/rsyslog/kubernetes.pp:2 on node tools-worker-1001.tools.eqiad.wmflabs"
[16:58:58] ah indeed
[16:59:10] I completely forgot about tools, sigh
[16:59:23] I'll revert, no easy way to fix 2 issues at the same time
[17:00:18] akosiaris: it's up to you, having puppet broken on tools isn't an emergency
[17:02:14] thanks akosiaris! I'll follow up either in a new task or reopen existing ones
[17:22:10] what does it usually mean if gnt-instance console just sits and hangs?
[17:22:58] oh, I bet puppet hasn't run on the install hosts yet
[17:30:21] cdanis: probably that, either that or the console configuration (the ttyS0.115200 file) is wrong
[17:30:28] yeah I've made that mistake before as well ;)
[17:30:31] (mispasted MAC address)
[17:30:38] anyway all good now, thanks akosiaris
[17:30:44] yw
[18:46:25] elukey: can matomo.analytics.eqiad.wmflabs be deleted? Or, failing that, could you take a stab at fixing puppet there? It's been unpuppetized for a couple of months now. Currently failing with "Function lookup() did not find a value for the name 'profile::tlsproxy::service::cert_domain_name'"
[18:54:07] andrewbogott: he is out for today
[18:56:49] XioNoX: ok
[19:06:35] andrewbogott: 2 instances deleted, one you asked about and another old thing
[19:06:49] mutante: thank you!
[19:06:52] unrelated to matomo, I mean
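
(Aside, not part of the original log: regarding the earlier question about overriding flake8 checks for a puppet subdirectory that is a verbatim upstream copy, flake8 reads its configuration from tox.ini (or setup.cfg / .flake8), so the exclusion can live there rather than as noqa annotations inside the upstream files. The fragment below is only a sketch; the directory path is a placeholder, not the actual layout of operations/puppet.)

    # Sketch of a tox.ini fragment; the path is a placeholder.
    [flake8]
    # Skip the verbatim upstream copy entirely, so no in-file overrides are needed.
    extend-exclude =
        modules/example_module/files/upstream/

(Alternatively, flake8's per-file-ignores option can silence only specific checks for those paths while still linting the rest of the tree.)
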