[10:58:59] hi all, I intend to deploy the following change today, aiming for 13:00 UTC. This change will make the new puppetmaster1003 server (running puppet 5.10) a real load-balanced backend. Please shout or comment on the ticket if you have any concerns etc. https://gerrit.wikimedia.org/r/c/operations/puppet/+/538590
[11:07:35] <_joe_> jbond42: as long as you tested one server in every class, I don't see issues on the serviceops side
[11:09:28] _joe_: thanks, I have used octocatalog-diff to check all servers and have also pointed individual servers at the new puppetmaster for a number of weeks
[11:10:04] <_joe_> yeah I followed that, but never looked at the details
[11:10:10] <_joe_> it all seems pretty safe
[11:10:16] ack, cheers
[11:10:20] <_joe_> out of curiosity, what did you need to fix?
[11:10:26] <_joe_> just the hiera backends?
[11:10:37] <_joe_> or were there new incompatibilities?
[11:11:15] hardly anything tbh. I can't remember all the early ones, but there were a few issues with, I think, the volatile file share and puppetdb_query, and more recently there were some scoping issues where resource defaults had been used
[11:11:43] e.g. https://gerrit.wikimedia.org/r/c/operations/puppet/+/537641
[11:28:44] <_joe_> oh right
[11:28:53] <_joe_> "fun"
[11:29:13] <_joe_> we had another case where that happens, mwmaint1002 I think
[11:30:19] <_joe_> but if the diff is ok I guess we're GTG
[11:31:46] yes, there were a couple of other instances but they should all be sorted now
[13:20:49] ok, I'm gonna deploy this now; ping me if you see anything unexpected
[13:45:52] hi all, `git fetch --all` is now taking about 2-3 minutes each time, which seems a bit long to me. Is this similar to what others are seeing, or is there something I can do to speed things up on my side? https://phabricator.wikimedia.org/P9151 <- GIT_TRACE=1 git fetch --all
[13:48:14] I don't see any obvious problems with gerrit's monitoring metrics right now, fwiw
[13:48:33] for me it takes the usual 5ish seconds
[13:48:33] <_joe_> so one of the reasons is
[13:48:41] <_joe_> we have a ton of git references
[13:48:45] <_joe_> one for every patchset
[13:48:49] <_joe_> and that doesn't really scale
[13:49:11] cdanis: are you planning to do a dbctl deployment today?
[13:49:15] <_joe_> I think at some point we should "archive" the current puppet repo and start with a fresh one
[13:49:25] <_joe_> marostegui: nope, we were just discussing that
[13:49:28] cdanis: We are doing a master switchover tomorrow morning, so maybe better to wait until that is done?
[13:49:30] ah cool
[13:49:31] marostegui: not today, pretty jetlagged, but I am thinking tomorrow or Wednesday
[13:49:49] thanks Joe, I'm just reading https://phabricator.wikimedia.org/T103990 which seems related to the git references issue
[13:49:52] cdanis: On Thursday we have another switchover, so tomorrow is probably better :)
[13:49:57] ok!
[13:50:04] cdanis: or next week, up to you :)
[13:50:14] this one would be very annoying to roll back, because of the schema changes
[13:50:56] <_joe_> jbond42: when I say "fresh" I mean with all history but no refs
[13:51:02] <_joe_> to all the past patchsets
[13:52:14] _joe_: yes, I think that would be good; it's starting to get a bit frustrating, and I also think it's probably impacting puppet-merge as well
[13:52:28] I think we need to do git gc more often.
[13:52:52] Tyler and I talked about changing the config for that, since it currently only runs once a week (on a Saturday)
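As context for the ref-count discussion above, here is a rough shell sketch of how one could measure the problem locally. The clone URL is illustrative (Gerrit's anonymous-HTTPS endpoint for operations/puppet), and these commands are an assumption about how to reproduce the observation, not what was actually run for the linked paste.

```
# List every ref the server advertises; with the pre-v2 git protocol this
# advertisement is sent at the start of every fetch.
git ls-remote https://gerrit.wikimedia.org/r/operations/puppet | wc -l

# Most of those are Gerrit per-patchset refs under refs/changes/.
git ls-remote https://gerrit.wikimedia.org/r/operations/puppet | grep -c 'refs/changes/'

# Trace a fetch to see where the time and bytes go (same GIT_TRACE approach
# as the paste linked at 13:45, plus packet-level tracing).
GIT_TRACE=1 GIT_TRACE_PACKET=1 git fetch --all
```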
[13:54:27] git gc wouldn't solve that one, since all those refs would still exist; git protocol v2 would solve it if/when that happens upstream
[13:54:36] or archiving the old repo and gluing it together with a new one
[13:56:04] thcipriani: it's happening in gerrit 3.1 :)
[13:59:24] <_joe_> paladox: this has nothing to do with gc
[13:59:35] ok
[14:00:04] <_joe_> ok, this looks like a compelling argument to move there as soon as it's available
[14:00:24] <_joe_> jbond42: well, locally in the DC I suppose bandwidth is not the issue
[14:03:02] true
[14:03:21] <_joe_> but we can put that to the test, I mean
[14:03:54] ack, fyi I have applied the suggestions here and it's now down to about 20 seconds from 2-3 minutes https://phabricator.wikimedia.org/T103990#2144157
[14:07:06] <_joe_> git-upload-pack sends back 210979 refs
[14:07:28] <_joe_> for a total of 15 Mb of data
[14:07:37] <_joe_> just for upload-pack
[14:07:44] btw
[14:08:14] I've a .gitconfig snippet for always fetching over HTTPS and always pushing over SSH, which gets you really good performance, and if you're using gpg-agent to manage your SSH keys it means you don't needlessly re-authenticate
[14:08:26] <_joe_> cdanis: that's what jbond42 just did
[14:08:30] https://github.com/cdanis/dotfiles/blob/master/git/.gitconfig#L17
[14:08:50] _joe_: but with the .gitconfig snippet you don't have to run set-remote commands in each clone
[14:09:15] <_joe_> oh, you meant global?
[14:09:19] yes
[14:09:26] with the pushInsteadOf config statement
[14:09:50] cheers cdanis
[14:09:51] so you just need to drop that in your ~/.gitconfig and you're done
[14:13:57] I'll comment on the bug as well
[14:19:17] https://phabricator.wikimedia.org/T103990#5516281
[16:04:20] bblack: should I assume for now that the D2 switch is staying where it is? Or do y'all still expect to replace it sometime soon?
[16:04:29] D2: Add .arcconfig for differential/arcanist - https://phabricator.wikimedia.org/D2
[16:05:03] <_joe_> lol
[16:05:10] um, thanks stashbot
[16:06:31] andrewbogott: don't know yet, I'll follow up with Juniper support today
[16:06:41] XioNoX: ok!
[16:06:52] please ping if/when the replacement is scheduled so we can downtime some things
[16:07:03] sure, will do!
[18:03:50] I *think* so long as we don't have further issues/recurrence we might just keep it in service for now, re: D2
[18:03:50] D2: Add .arcconfig for differential/arcanist - https://phabricator.wikimedia.org/D2
[18:08:59] I opened https://phabricator.wikimedia.org/T233645 about investigating what happened
[19:52:12] XioNoX: did you know that RIPE Atlas will do traceroutes?? this is great
[19:52:21] yeah
[19:53:34] cdanis: but they don't have good tooling around them, so we would have to write parsers, and ways to actually figure out where an issue possibly is
[19:54:27] yeah, going to e.g. https://atlas.ripe.net/measurements/22902219/#!probes and clicking the little ℹ️ icon is not the best UX
[19:59:19] XioNoX: it appears we both had the same idea to file incident reports for the issue on Friday. How would you like to deconflict?
[19:59:34] shdubsh: merge :)
[20:00:16] I didn't fill in the timeline, and the "services" part is only guesses, so I'd say your version is more accurate on that
[20:01:28] Ok. I've got some time to do some merging. :)
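Referring back to the .gitconfig discussion around 14:08: a minimal sketch of the general insteadOf/pushInsteadOf pattern for fetching over anonymous HTTPS while pushing over SSH. The Gerrit hostname, the /r/ path, the standard Gerrit SSH port 29418, and the USER placeholder are illustrative assumptions; cdanis's actual snippet is the GitHub link in the log.

```
# ~/.gitconfig (sketch)

# Rewrite SSH remotes to anonymous HTTPS, so fetches need no authentication.
[url "https://gerrit.wikimedia.org/r/"]
	insteadOf = ssh://USER@gerrit.wikimedia.org:29418/

# Always push over SSH. pushInsteadOf takes precedence over insteadOf for
# pushes; the second (identity) entry keeps SSH-cloned repos pushing over
# SSH rather than being rewritten to HTTPS by the rule above.
[url "ssh://USER@gerrit.wikimedia.org:29418/"]
	pushInsteadOf = https://gerrit.wikimedia.org/r/
	pushInsteadOf = ssh://USER@gerrit.wikimedia.org:29418/
```

With something like this in ~/.gitconfig, `git fetch` in any clone goes over HTTPS and `git push` over SSH, without having to change the remote URL in each repository.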