[12:31:43] apologies to halfak and Nettrom for missing pings, I'll be off for an hour or so and then back on. Might want to start a phab task or an email as this seems like a longer design brainstorm than an IRC chat can do justice
[17:29:18] halfak are you coming to documentation time today? I've got a couple things I wanted to run by you
[17:29:34] J-Mo1, I'm in CSCW paper mode, but I think I can be there anyway.
[17:29:54] cool. I'll keep it brief
[19:56:28] halfak: Amir1: hi, are you online to watch ores a little
[19:56:45] i'll change it back to how it was before.. right
[19:56:53] but i was told i should not have done this in one hard step
[19:57:11] but in a soft-switch, first active/active, wait, then disable the original one
[19:57:31] so i'll do that when going back
[19:59:13] mutante, did something go wrong?
[20:00:49] halfak: no. i mean something went wrong yesterday after deploy, so we switched to eqiad, right. and now codfw is fixed by amir and so we would revert to how it was before
[20:01:13] Oh I see. Yes I'll be around for the next few hours.
[20:01:18] To monitor and help out :)
[20:02:23] halfak: ok, great. so what i will do now is set it to BOTH for like half an hour
[20:02:33] and then disable codfw
[20:02:47] then everything is like before, and this is what traffic told me, do it in 2 steps
[20:02:55] like, i should have done that yesterday too
[20:02:59] mutante, s/codfw/eqiad/
[20:03:04] to avoid 50x during switch
[20:03:13] arg, yes :)
[20:03:30] back to how it was before the deployment yesterday
[20:03:41] and then i will leave it like that until the Ops "switch DC" period is over
[20:04:02] and then we finally set it to active/active and leave it like that permanently
[20:05:37] OK cool. Sounds like a good plan.
[20:05:54] mutante, why would that make sense to do yesterday?
[20:05:58] When codfw was broken.
[20:07:32] halfak: good question, frankly i was just advised to do it like that by a comment from brandon of traffic
[20:09:03] 15:59 < bblack> mutante: that method of switching causes a 5xx spike (but luckily this is a low-traffic service)
[20:09:06] 15:59 < bblack> the duration of the 5xx spike should only be during the cumin run, basically
[20:09:09] 16:02 < bblack> (sorry it's not well documented outside of wikitech. But basically to switch from only-eqiad to only-codfw (or vice-versa) without a 508 spike during the deployment process, you
[20:09:13] have to do it in two stages, switching to active:active first in the middle)
[20:09:16] well, and only during the cumin run .. which isn't that long
[20:09:51] 508 is "loop detected"
[20:11:13] gotcha. That makes sense.
[20:11:35] In future situations like that, we might want to go active-active only for a brief moment.
[20:14:29] doing the puppet run on the 15 cache servers, that makes it active/active
[20:17:38] finished
[20:18:00] cool. Checking on grafana
[20:18:37] Saw codfw come alive at 2008 UTC.
[20:18:52] Errors and response time look OK
[20:19:06] sounds good
[20:23:17] Everything continues to look good. Looking away for a bit.
[20:23:58] ok, then uploading the second step / rebasing Amir's change to revert my original change
[20:32:30] switching back to codfw-only 5/15
[20:33:19] 5/15/
[20:33:21] *?
[20:34:50] done. 15/15 (servers that are running varnish and make up the "misc:cache" group)
[20:34:59] Oh I see
[20:35:15] yea, when i run puppet there to apply the config change on all
[20:35:32] so now we are back to like it was before yesterday's deploy
[20:45:47] mutante, all looks good.
[20:46:51] Ready to declare victory?
[21:21:23] hey halfak. a note that I've received your ping about Tech Dept. Platforms. I just had not a chance to give it a close look. Once I do that, I will address your assignment to me as well. Thanks for the ping.
[21:24:15] lzia, hey! No problem. Just wanted to make sure your stuff got listed in whatever kind of thing that document will become :)
[21:27:39] yeah, really appreciate it, halfak. :) it's been non-stop since a few days ago. ;)
[21:28:33] I hear you. Keep on trucking. It's been the same here. Just getting done with paper #2. Had a major hiccup in data processing today that threatened the submission. D:
[21:32:10] :D
[21:32:14] good luck, halfak. :)
[21:32:24] Thanks
[21:32:30] * halfak flexes and digs in heels.
[21:32:39] you too!
[21:42:51] yeah, thanks.
[22:29:29] halfak: yes, victory , all good :)
[22:29:47] :)
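
Note on the two-stage switch discussed around 20:09: the rule bblack describes (never go straight from only-eqiad to only-codfw; pass through active/active first) can be sketched as a small planner. This is an illustrative sketch only, not the actual puppet or cumin configuration. The function name `plan_switch` and the set-based representation of "active datacenters" are assumptions made for the example.

```python
from typing import FrozenSet, List

# The two datacenters named in the log.
DATACENTERS = frozenset({"eqiad", "codfw"})


def plan_switch(current: FrozenSet[str], target: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Return the active sets to roll out, in order (hypothetical helper).

    Each entry stands for one full config rollout across the cache servers.
    A datacenter is only removed after the target is already enabled
    everywhere, so a direct only-A to only-B flip never happens.
    """
    if not current or not target:
        raise ValueError("at least one datacenter must stay active")
    if not (current <= DATACENTERS and target <= DATACENTERS):
        raise ValueError("unknown datacenter")
    if current == target:
        return []

    both = current | target
    steps: List[FrozenSet[str]] = []
    if both != current:
        steps.append(both)    # stage 1: enable the target alongside the original
    if both != target:
        steps.append(target)  # stage 2: disable whatever should no longer serve
    return steps


if __name__ == "__main__":
    for step in plan_switch(frozenset({"eqiad"}), frozenset({"codfw"})):
        print(sorted(step))
```

Switching from only-eqiad to only-codfw yields two rollouts (both, then codfw only), which matches the sequence mutante ran at 20:14 and 20:32. Moving to or from active/active needs a single rollout.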