[07:13:11] hmm for some reason on https://gerrit.wikimedia.org/r/c/operations/puppet/+/572515 ulsfo/profile/trafficserver/tls.yaml gets applied as expected but all the variations that I've tested for profile::varnish::cache::frontend fail to be applied as expected :/
[07:17:43] <_joe_> vgutierrez: I think you're doing this quite the wrong way
[07:17:52] <_joe_> first of all
[07:18:03] <_joe_> do you want to apply to both text and upload?
[07:18:06] yes
[07:18:14] <_joe_> so duplicate the data.
[07:18:20] why?
[07:18:24] <_joe_> in role/ulsfo/cache/...
[07:18:30] the profiles are shared across the roles
[07:18:35] why do I need to duplicate that?
[07:18:48] <_joe_> because it makes it more clear what applies to any role
[07:18:55] <_joe_> it's also in our puppet rules
[07:19:10] <_joe_> extreme DRY will only make your life miserable (like right now)
[07:19:22] <_joe_> Info: Applying configuration version '(496da4b903) Giuseppe Lavagetto - discoverydns: integrate into servicecatalog'
[07:19:27] <_joe_> argh wrong paste
[07:19:31] :)
[07:19:32] <_joe_> as for your change
[07:19:48] <_joe_> profile::cache::varnish::frontend::runtime_params
[07:20:11] <_joe_> -> ulsfo/profile/cache/varnish/frontend.yaml
[07:20:15] yeah
[07:20:25] <_joe_> if you want to go this way
[07:20:37] <_joe_> but again, I'd rather use role/ulsfo/...
[07:20:42] I tried that on PS2
[07:20:55] and pcc didn't like it
[07:21:10] <_joe_> uhm, are you sure?
[07:21:38] <_joe_> hieradata/ulsfo/profile/varnish/cache/frontend.yaml
[07:21:44] <_joe_> see the difference?
[07:21:57] <_joe_> _joe_> -> ulsfo/profile/cache/varnish/frontend.yaml
[07:22:49] oh nice, yet another L8 issue
[07:22:51] you're right
[11:42:40] jynus, marostegui: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=db1115&service=snapshot+of+s3+in+eqiad
[11:43:01] XioNoX: thanks, we are aware and working on it
[11:43:09] jynus: let's ack it
[11:43:09] cool!
[11:43:28] doing my waking up round of icinga alerts
[11:48:48] also this has been alerting, and notifications are disabled, but no notes and no ack: https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=ganeti1009&service=configured+eth
[12:39:43] XioNoX: I've always wondered what is the right policy around ack'ing
[12:39:43] especially around "I know it is happening and working on it, but the issue is ongoing"
[12:39:43] to me acking means, I know there is an issue and someone else who goes to /alerts doesn't need to freak out about it
[12:39:44] that is the issue, maybe someone else has to
[12:39:44] as it may lead to blind acks/forgetting
[12:39:44] (I have acked it BTW), just throwing the question
[12:39:44] why blind ack?
[12:39:44] well, we don't have s3 eqiad fresh backups, that is something of concern
[12:39:45] you can also set an expiration date if you're worried about ack'ing and getting distracted
[12:39:46] basically, I've seen different people having a different understanding on how alerts should be treated
[12:39:46] I am more on your side, but I've seen people disagreeing
[12:39:46] "It is useful to see things going off and on in certain cases"
[12:39:46] (which could be true too)
[12:39:47] I think, and I said this some times, we need a proper dashboard with "things that are broken", different from the alert system
[12:39:47] yeah, we should have some standard policy
[12:39:48] that means yet another dashboard
[12:39:48] he he
[12:39:48] and then nobody will look at the alert system :)
[12:39:48] so in an ideal world, integrated
[12:39:50] but separating "things you should look at now" vs "things that you should now have ongoing issues/maintenance"
[12:39:50] *know
[12:39:50] so like red and yellow alerts kind of things? :)
[12:39:51] no
[12:39:51] or list of all alerts and list of all ack'ed alerts? :)
[12:39:51] not based on priority
[12:39:51] alerts <> broken things
[12:39:52] not having backups would be a top priority, but it wouldn't really need immediate paging
[12:39:52] think of a better !log system
[12:39:52] https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=16&hoststatustypes=3&serviceprops=6 :)
[12:39:52] e.g. you note that from 3:00 to 7:00 there will be a router maintenance
[12:39:52] XioNoX: that is terrible
[12:39:52] and not what I would like to have
[12:39:52] I think I understand what you mean :)
[12:39:52] think performance issues
[12:39:52] icinga only has on or off
[12:39:52] or maybe a warning
[12:39:52] sure, there is metrics
[12:39:52] but right now https://icinga.wikimedia.org/alerts is the one stop to know if anything needs attention, so to me it makes sense to keep it clean
[12:39:53] I don't disagree
[12:39:53] I just would like something better!
[12:39:53] and observability is working on it, maybe alertmanager is the answer
[12:39:53] there would be like 2 axis: importance and time sensitivity
[12:39:53] keep pages for important and time sensitive
[12:39:53] but I am missing things that are important but not time sensitive
[12:39:53] call it dashboard or call it something else
[13:33:16] jynus: me personally, i only ack stuff when I'm working on it or created a task to be followed up
[14:42:21] fyi i seem to be getting packet loss reaching gerrit (not checked other endpoints) seems to be an issue in telia
[14:42:24] https://phabricator.wikimedia.org/P10424
[14:42:26] XioNoX: ^^
[14:44:27] I guess we don't have to treat our circuit ID private as they're in Telia's PTR - https://phabricator.wikimedia.org/P10424$16
[14:44:59] are we supposed to treat circuit IDs as private?
[14:45:13] i put circuit IDs and transit vendor trouble ticket numbers in icinga acks all the time :D
[14:46:33] I think at some point we did
[14:47:18] trouble tickets and CID could cause someone calling on our behalf and asking stupid things
[14:49:32] jbond42: what's your source IP? (can pm me)
[14:50:07] 144.2.161.226 let me know if you want me to run any tests
[14:50:19] oops that was meant to be a pm
[14:50:24] haha
[14:51:13] thx
[14:51:47] jbond42: email sent, noc@ CCed
[14:51:54] ack cheers
[15:54:16] Question for the Debian pros, if https://packages.debian.org/sid/gobgpd works fine on Buster, can I just add the deb to the wikimedia-buster repo or it's more complicated than that?
[16:07:01] <_joe_> yes
[16:07:32] thanks!
[16:08:07] <_joe_> it was yes to both options
[16:08:42] as long as it's yes to the first one :)
[16:16:51] RIPE Atlas is so good
[16:17:14] ah?
[16:18:16] doing a bunch of state-by-state latency measurements now
[16:18:21] really cool to be able to do it
[16:19:26] nice, didn't know it had this granularity
[16:19:42] it's not automated, but you can draw bounding boxes on a map to select probes
[16:20:36] ah ok! Which state is the most annoying to draw a box around?
[16:21:36] West Virginia or Illinois (just because of population density around St Louis) are pretty up there
[16:22:48] Tennessee is rather easy, although I was surprised there's 0 probes in the Memphis metro area
[16:50:36] I have no idea what flowspec is
[16:53:58] replied on -ops
[16:54:07] ah
[16:54:15] it was a new name for me :-D
[16:54:28] I first thought mw flow weird service
[17:05:18] I'm having trouble opening pages on enwiki -- is something up?
[17:05:27] yes
[17:05:34] known and working on it
[17:05:54] ack, thanks
[17:24:20] liw: problems still?
[17:25:44] jynus, seems to work now
[17:25:56] cool :-D
[22:14:38] Puppet question, if I do: https://www.irccloud.com/pastebin/OKVj2LKW/
[22:15:22] Is there a way to replace the current: `$network_infra = $network_data['network::infrastructure']` to restore the old behavior? Aka return only a list of prefixes
[23:17:05] chatted in pvt with Xio.NoX, for the curious one solution is:
[23:17:39] $network_data['network::infrastructure'].reduce( [] ) |$memo, $value| { $memo + $value[1] }
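
For anyone reading the log later, here is a minimal sketch of what that `reduce` one-liner does. The pastebin contents are not in the log, so the shape of the data is an assumption: `network::infrastructure` is taken to be a hash whose values are arrays of prefixes. The `site_a`/`site_b` keys and the documentation-range prefixes are hypothetical.

```puppet
# Hypothetical stand-in for the looked-up data (assumed shape: hash of name => array of prefixes).
$network_data = {
  'network::infrastructure' => {
    'site_a' => ['192.0.2.0/28', '2001:db8:a::/64'],
    'site_b' => ['198.51.100.0/28'],
  },
}

# Iterating a hash with reduce yields each entry as a [key, value] pair,
# so $value[1] is the array of prefixes for that entry.
# Array + Array concatenates, so $memo grows into one flat list.
$network_infra = $network_data['network::infrastructure'].reduce([]) |$memo, $value| {
  $memo + $value[1]
}

# $network_infra now holds the three prefixes as a single flat array.
notice($network_infra)
```

Where the `values` and `flatten` functions are available (built in on recent Puppet versions, otherwise from stdlib), `flatten($network_data['network::infrastructure'].values)` should give the same result under the same assumption about the data.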