[15:01:21] I went over all the OKRs today
[15:01:24] lgtm!
[15:01:37] I left a couple of comments, but otherwise feel free to add them to betterworks
[15:01:52] btw, we're experimenting with something new to avoid information duplication
[15:02:16] that is, me not posting any "team" KRs to avoid the duplication of information
[15:02:33] so rather than have my OKRs be O: "
[15:02:33] Wikimedia’s technical infrastructure is sustained at common industry standards and levels and maintained with low technical debt
[15:02:49] and then KR, ElasticSearch 7.2, and then herron have another OKR that's about the same thing
[15:03:20] instead, the idea is for me to only have those long-running Os, and then y'all aligning your OKRs to them
[15:03:45] ok, so you'll get only your personal KRs and not also our Os as KRs
[15:03:45] I think mark deployed it successfully -- we're still experimenting with it, so let me know if there are kinks to work out
[15:03:49] if I get it right
[15:04:09] and we align with your Os, not with your KRs
[15:04:34] yes
[15:04:39] i don't think that's even possible?
[15:04:46] and for sure it gets very confusing
[15:05:07] ack
[17:55:19] paravoid: is there a task with details of what metrics you want in prometheus from netbox?
[17:55:44] not yet AFAIK
[17:56:13] I think we should first check if upstream has anything open related
[17:56:33] chaomodus, volans: I've been getting https://tty.gr/s/netbox-server-error.png every now and then
[17:56:49] I got that while we were in the meeting and tried to click volans' link around utilization
[17:56:54] interesting
[17:57:06] and if not maybe think of having some sort of script that runs daily collecting data from the API
[17:57:09] possibly related to https://phabricator.wikimedia.org/T232767 ?
[17:57:10] uh?
[17:57:16] weird, never happened to me on the web
[17:58:40] root@netboxdb1001:~# grep -c failed /var/log/postgresql/postgresql-11-main.log*
[17:58:43] /var/log/postgresql/postgresql-11-main.log:536
[17:58:45] /var/log/postgresql/postgresql-11-main.log.1:880
[17:59:10] 2020-01-15 17:51:21 GMT FATAL: password authentication failed for user "netbox"
[17:59:13] 2020-01-15 17:51:21 GMT DETAIL: Password does not match for user "netbox".
[17:59:49] I have experienced that occasionally
[17:59:58] like when we first switched to this setup
[18:00:14] but not for some time, I'll look at it
[18:00:21] i assume you get it more frequently than that.
[18:00:38] I've gotten it 3-4 times over the past month, not a ton
[18:00:50] but logs indicate it happens often enough
[18:01:05] not rare enough to ignore I'm afraid :)
[18:01:21] Okay cool
[18:01:51] yep added to list.
[18:05:11] timing on the logs is suspect
[18:05:35] lots of bursts at :21 and :51, which probably coincides with some periodic cron or something
[18:05:38] the dump job maybe?
[18:06:02] yep
[18:06:22] puppet is at 20,50
[18:06:33] I'm checking a thing
[18:06:34] Ah
[18:06:42] yah that seems like a co-incidence
[18:06:45] like could cause it
[18:07:03] nope it's that
[18:07:07] tell me that puppet is not changing the password on every run
[18:07:07] we run it all the time
[18:07:11] and probably re-set it
[18:07:16] https://puppetboard.wikimedia.org/report/netboxdb1001.eqiad.wmnet/329fae24ffecb694dd3fe2ac535d74e3744b999d
[18:07:29] doh
[18:07:32] yep
[18:08:15] the onlyif
[18:08:21] seems to not work anymore I guess
[18:08:29] modules/postgresql/manifests/user.pp +71
[18:08:44] what are we looking there?
[18:08:53] the pass_set?
[18:08:55] set to what?
[18:09:22] it's odd that it would break since this code is the same code as the previous installation.
[18:09:25] yep, that's the user set
[18:10:05] oh my gosh, I have an idea
[18:10:23] what even is that code
[18:10:34] it surely isn't storing the password as md5 :\
[18:12:12] Oh
[18:12:12] what do you mean?
[18:12:14] yes
[18:12:16] it is
[18:12:18] I think i know why the onlyif is broken
[18:12:34] but I haven't tested - this is a substantially different version of postgres
[18:12:58] my gut is telling me that the postgres upgrade broke it
[18:13:03] as it's slightly different
[18:13:07] yep
[18:14:47] mmmh no, that's correct
[18:15:07] $password_md5 = md5("${password}${user}") is generating the correct hash
[18:15:45] erm
[18:15:47] okay
[19:42:42] I'm happy to report that thanks to some debugging we have fixed the user creation loop
[19:43:12] side note, the postgres stuff in puppet is non-pretty.
[19:43:47] tnx volans :)
[19:43:50] ahaha wow
[19:44:21] we're still doing multiple onlyif checks but at least it's a noop each time now :) cc paravoid
[19:45:06] I've created https://phabricator.wikimedia.org/T242910 kinda related
[21:56:06] <_joe_> our postgres module is questionable, yes
[22:11:20] mmhm
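
Note on the md5 check discussed above: PostgreSQL's legacy md5 password storage is the literal prefix "md5" followed by the hex MD5 of the password concatenated with the role name, which matches the md5("${password}${user}") expression quoted in the log. The sketch below is a rough Python illustration of how a guard like the onlyif could compare the expected value with what the database has stored, so that the password is only reset when the two differ. The function names are hypothetical, the stored value is assumed to have been read elsewhere (for example from pg_shadow), and this is not the code in user.pp nor the fix that was actually applied, which the log does not describe.

```python
import hashlib
from typing import Optional


def pg_md5_hash(password: str, user: str) -> str:
    """Build the string PostgreSQL stores for an md5-hashed role password.

    Format: "md5" + hex(md5(password + username)).
    """
    digest = hashlib.md5((password + user).encode("utf-8")).hexdigest()
    return "md5" + digest


def needs_password_reset(stored: Optional[str], password: str, user: str) -> bool:
    """Return True only when the stored hash differs from the expected one.

    `stored` is assumed to be the value read from the database by the caller;
    None means the role has no password set.
    """
    return stored != pg_md5_hash(password, user)


if __name__ == "__main__":
    # Hypothetical credentials, for illustration only.
    expected = pg_md5_hash("example-secret", "netbox")
    print(needs_password_reset(expected, "example-secret", "netbox"))
    print(needs_password_reset(None, "example-secret", "netbox"))
```

With a guard of this shape, a periodic run that finds a matching hash does nothing, which is the "noop each time" behaviour mentioned at 19:44:21.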