[15:59:30] If anyone wants to crawl deep into a systemd/journald rabbit hole with me -- https://phabricator.wikimedia.org/T151422#5315881 -- trying to figure out how to stop journald from splitting messages on 2K boundaries
[16:06:38] *cough* buster upgrade *cough*
[16:07:08] seriously though, IIRC the 2k limitation has been lifted in journald shipping with buster
[16:10:12] bd808: https://github.com/systemd/systemd/commit/ec20fe5ffb8a00469bab209fff6c069bb93c6db2
[17:08:21] shdubsh: so the fun thing is that according to the tarball that I get with `apt-get source systemd` our Stretch hosts have that fix applied, but ... the behavior has not changed.
[17:08:49] it is very confusing so far
[17:13:55] moritzm, for your sso checklist: I'm reminded that there are not currently any read-only ldap replicas in codfw yet. So probably we'll want that at some point before we rely on the ro endpoints. (I set up dns while I was allocating things but never made the actual VMs)
[17:13:59] confusion is lessening! `apt-cache show systemd | grep Version` -- there are 2 different systemd packages in apt for stretch
[17:14:18] one before the 2K fix, one after
[17:14:36] and of course the one before is the one installed on the hosts I'm testing
[17:17:21] bd808: yes there is an updated version available for stretch in backports. I was told not to use it because it didn't benefit from the debian security process
[17:17:50] yeah. my brain hurt is all solved now I think. :)
[18:16:12] ack, that's a good point, we'll probably look into adding codfw replicas as well
[18:36:44] is gerrit like super slow for anyone else?
[18:37:41] it was for valentin today fwiw
[18:39:20] chaomodus: can't confirm, seems ok
[18:39:33] just me then
[18:41:26] chaomodus: could be that it first tries IPv6 and then gives up and falls back to IPv4?
[18:41:52] could be
[18:46:44] oh, it was just slow for me
[18:48:00] paladox: oh. mail threads?
[18:48:19] doesn't look like it's mail threads, i see no backup of threads
[18:49:10] the other common one that slowed it down temp. was reindexing
[18:49:45] reindexing wouldn't slow gerrit down like that, but a JVM GC would.
[18:50:01] https://gerrit.wikimedia.org/r/monitoring
[18:50:14] don't see anything obvious there
[18:50:46] https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&panelId=1&fullscreen i see a white line around 18:40 BST
[18:51:01] oh, that's UTC sorry
[18:51:15] hmm.. look at "http severe" mean time
[18:51:36] only 142 hits out of almost a million though
[18:52:00] yeh, i see https://gerrit.wikimedia.org/r/monitoring?part=graph&graph=httpMeanTimes
[18:52:14] paladox: oh yea, i see the gap
[18:52:41] https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&panelId=1&fullscreen&from=1562697408587&to=1562697945536
[18:52:54] yup
[18:53:08] oh, it lasted for 2 minutes?
[18:53:44] it's only checking every 30 sec, could also be just 1:01 i think
[18:54:06] ah ok.
[18:54:57] last entry in gc_log is 3 days ago
[18:55:45] gc_log is gerrit :)
[18:56:06] JVM GC is different (not run by gerrit)
[18:56:43] i see an increase in GC pause time here https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&panelId=14&fullscreen&from=1562697408587&to=1562697945536
[18:57:06] the gerrit.json log also has the pause
[18:57:28] there is a log line at 18:39:56 and then next at 18:43
[18:57:48] oh
[18:58:08] there are some exceptions before or after that but they seem "normal"
[18:58:15] like unknown user or auth failed
[18:58:46] https://gerrit.wikimedia.org/r/monitoring?part=graph&graph=gc <-- shows an increase
[18:58:58] the "gap" could also not mean anything because generally there aren't that many log entries that it has something new every few seconds
[18:59:13] in gerrit.json that is
[18:59:50] paladox: oh yea.. i think you got it then
[18:59:53] Yeh (I'm thinking this is just a java pause for gc)
[19:00:03] ack
[19:01:50] but we already switched to G1 to reduce those
[19:01:53] right
[19:03:09] yup
[19:03:33] it appears that the memory increased around the time, see https://grafana.wikimedia.org/d/Bw2mQ3iWz/gerrit-javamelody?orgId=1&panelId=12&fullscreen
[19:55:53] chaomodus: since you were working on netbox/netmon, do you know if we can decom netmon1003 now?
[19:56:05] i have a bunch of gerrit changes waiting for that OK
[19:56:05] afaik yes
[19:56:23] we're moving to ganeti
[19:56:31] but maybe ask arzhel
[19:56:42] https://phabricator.wikimedia.org/T198939#5104657
[19:57:27] chaomodus: it also depends on whether servermon as a service is retired or not
[19:57:27] oic
[19:57:32] faidon would know then
[19:57:37] I never used netmon1003
[19:57:47] maybe akosiaris too
[19:59:47] yea, ok. thanks. i already assigned that quite a while ago
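The gap-spotting done by hand in the gerrit.json log earlier (one entry at 18:39:56, the next at 18:43) can be sketched as a small scan over a JSON-lines log. This is a minimal sketch, not Gerrit tooling: it assumes each line is a JSON object carrying an ISO-8601 timestamp in a field named `@timestamp` (a hypothetical field name; check the real log format), and `find_log_gaps` is a hypothetical helper. As pointed out in the discussion, a gap on a quiet log is only a hint, so it should be cross-checked against the GC pause and memory graphs.

```python
import json
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def find_log_gaps(lines: Iterable[str], threshold: timedelta,
                  ts_field: str = "@timestamp") -> List[Tuple[datetime, datetime]]:
    """Return (before, after) timestamp pairs where consecutive log entries
    are more than `threshold` apart.

    Assumption: every useful line is a JSON object whose `ts_field` holds an
    ISO-8601 timestamp such as "2019-07-09T18:39:56+00:00" (hypothetical format).
    """
    gaps = []
    prev = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            ts = datetime.fromisoformat(entry[ts_field])
        except (ValueError, KeyError, TypeError):
            # Skip lines that are not JSON objects or lack a parseable timestamp.
            continue
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

For example, `find_log_gaps(open("gerrit.json"), timedelta(minutes=1))` would list every silence longer than a minute; the one-minute threshold is arbitrary and should sit above the log's normal quiet periods, otherwise idle stretches show up as false "pauses".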