[03:08:07] Traffic, netops, Operations, ops-eqsin: cp5010 - no link on primary ethernet port - https://phabricator.wikimedia.org/T187158#3966190 (Papaul) The DAC needed to be seated on the switch side @BBlack please check and see if you can get not to the server [03:15:45] Traffic, Operations, ops-eqsin: rack/setup/install cp50(0[1-9]|1[0-2]) - https://phabricator.wikimedia.org/T181557#4025915 (BBlack) [03:15:47] Traffic, netops, Operations, ops-eqsin: cp5010 - no link on primary ethernet port - https://phabricator.wikimedia.org/T187158#4025912 (BBlack) Open>Resolved a:Papaul DHCP works, so interface is fixed, thanks! [03:36:34] Traffic, Operations, ops-eqsin: rack/setup/install dns500[12] - https://phabricator.wikimedia.org/T181556#4025979 (BBlack) [03:36:38] Traffic, Operations, ops-eqsin: dns5002 mgmt console unreachable - https://phabricator.wikimedia.org/T186902#4025975 (BBlack) Open>Resolved a:Papaul @Papaul re-seated mgmt console cable, seems to be working now [06:11:07] Traffic, netops, Operations, ops-eqsin: replace eqsin SFP-T/SFP+ - https://phabricator.wikimedia.org/T188923#4026116 (ayounsi) [06:25:11] netops, Operations: cr1-eqsin faulty interfaces - https://phabricator.wikimedia.org/T187807#4026126 (ayounsi) Open>Resolved Unit replaced by Papaul, all interfaces are up! No alarms. [08:44:58] vgutierrez: shall we try 1.15.0 on pybal-test? :) [08:45:11] sure [08:45:20] but let me learn how I should package the thing :) [08:45:33] yesterday I was able to generate the .deb in boron [08:45:41] but not in /var/cache/pbuilder/result.... [08:45:54] ah, let me know if you ever figure out how to do that! [08:46:12] wut?
[08:46:19] I saw pybal 1.14.X there [08:46:22] so you know how :P [08:46:27] I've tried for a bit, given up, and now I just copy the packages under the right place in /var/cache/pbuilder/result/whatever [08:46:32] and fixe perms [08:46:36] *fix [08:46:38] :facepalm: [08:46:44] hehe :) [08:46:58] hmm maybe volans could help us [08:47:50] frankly, I've never considered this a big enough issue to care, but I see that you're much more diligent than I am! [08:48:14] hahahah [08:48:31] I'm trying to avoid bad habits from day #1 [08:48:44] but yeah.. I was tempted to do the same yesterday [08:48:55] anyways, next step would be to rsync things [08:48:59] so, on apt.w.o: [08:49:39] sudo rsync -v boron.eqiad.wmnet::pbuilder-result/jessie-amd64/pybal-blah* . [08:50:36] sudo -i ; cd /srv/wikimedia [08:50:39] reprepro include jessie-wikimedia ~vgutierrez/pybal_1.15.0_amd64.changes [08:51:06] and in case reprepro yells at you about bad distributions or similar nonsense, --ignore=wrongdistribution [08:55:44] * _joe_ larts ema [08:55:56] <_joe_> you should use the proper distro names in the changelog [09:01:53] vgutierrez: not sure how I can help but happy to try ;) give me 5 minutes though [09:02:38] sure [09:03:05] we were trying to figure out how packages end automatically in /var/cache/pbuilder/result/.... [09:03:39] we are building pybal in boron like this: $ ARCH=amd64 DIST=jessie WIKIMEDIA=yes gbp buildpackage -j8 -us -uc -sa --git-builder=git-pbuilder --git-ignore-branch --git-ignore-new [09:04:11] and the package ends in $HOME [09:04:30] you can skip --git-ignore-new / --git-ignore-branch of course according to circumstances [09:04:40] as in, if you're on the right branch :D [09:05:04] gbp doesn't like 1.15 as a branch xD [09:07:30] unless you say --git-debian-branch=1.15 :) [09:07:51] or, unless we ship debian/gbp.conf with the appropriate value, which might be a good idea [09:11:18] additional eyes welcome! 
https://gerrit.wikimedia.org/r/#/c/416652/ (post-v5-upgrade cleanup) [09:11:23] * ema goes afk for a bit [09:21:24] vgutierrez: so, I'm no expert here among all those DD, but I would definitely ship a gbp.conf with the right config (upstream-branch, debian-branch, upstream-tag [if you use tags]) [09:22:01] noted :) [09:22:30] besides that, what's the main difference between what you do and what we do? [09:22:40] none [09:22:51] cause yesterday you told me that you always end in the result in /var/cache instead of in $HOME [09:22:54] xD [09:22:57] let me check my script that makes cumin releases [09:23:11] ahhh maybe [09:23:16] GIT_PBUILDER_AUTOCONF=no [09:23:31] cit. To disable all attempts to discover the base path, tarball, or configuration file and set up the pbuilder options and instead rely on the settings in .pbuilderrc, set GIT_PBUILDER_AUTOCONF to "no". [09:23:50] $ cat .pbuilderrc [09:23:51] BUILD_HOME=$BUILDDIR [09:24:46] I think I had this issue once we moved to boron that is stretch from the previous build machine that was jessie [09:25:24] or better, I had some issues, and I think this fixed them, but I don't remember that I was getting the .deb in my home, but could be [09:25:48] vgutierrez ^^^ [09:26:32] -rw-r--r-- 1 vgutierrez wikidev 48K Mar 6 09:25 /var/cache/pbuilder/result/jessie-amd64/pybal_1.15.0_all.deb [09:26:35] yey [09:26:36] that made the trick [09:26:37] volans: <3 [09:26:45] volans.beers++ [09:27:00] lol, yw! :) [09:27:57] BTW, 100% unrelated thing, which URL shortener are we using for the IRC logs?
[09:28:04] vgutierrez: feel free to check my release script in /home/volans/cumin-release [09:28:09] the URL in the topic doesn't work for me [09:28:25] and http://wm-bot.wmflabs.org/browser/index.php?display=%23wikimedia-traffic feels huge [09:28:57] https:// even [09:29:04] vgutierrez: none officially :( if you see in ops bit.ly was used [09:29:24] but there isn't an internal/official one AFAIK yet [09:31:05] hmmm [09:31:06] :_( [09:38:39] vgutierrez@pybal-test2001:~$ apt-cache policy pybal [09:38:40] pybal: [09:38:40] Installed: 1.14.4 [09:38:40] Candidate: 1.15.0 [09:39:09] \o/ [09:39:37] vgutierrez: when adding/upgrading a package to APT, always log in -ops ;) [09:39:46] *!log [09:40:30] things like https://tools.wmflabs.org/sal/production?p=0&q=apt.wikimedia.org&d= [09:41:54] thx :D [09:42:15] and I just noticed is not mentioned in the wikitech page for reprepro, adding it [09:45:37] and done [10:31:50] https://info.varnish-software.com/blog/varnish-6.0 [10:31:54] but no release on github :( [10:32:27] "Varnish 6.0 is currently under limited availability; during spring 2018, it will be made generally available. Get in touch to learn more about this new release." [10:32:30] sigh! [10:32:41] <_joe_> wat [10:32:57] "Varnish 6.0 is currently under limited availability; during spring 2018, it will be made generally available" [10:33:00] oops [10:33:00] that :) [10:34:06] <_joe_> notice how encryption of cache objects is supported, but I see no news about TLS [10:35:14] that's because it already exists in the version that will support encrypted cache objects: the proprietary one [10:35:19] "Varnish Plus" [10:36:43] "Vanish 6.0 now fully supports HTTP/2.0" [10:37:14] fully means also over tls? }:) [10:41:02] they offered hitch (a TLS terminator) for this before [10:41:17] https://github.com/varnish/hitch/ [10:42:17] I am wondering how many changes to the shmlog they have done this time (already done a big one for 5.2...) 
[14:31:15] as FYI with https://gerrit.wikimedia.org/r/#/c/416683/1/hieradata/role/common/cache/text.yaml we are migrating vk text traffic to Jumbo [14:31:15] vgutierrez: how's 1.15.0 doing on pybal-test? :) [14:33:16] behaves as expected on my short experience :) [14:33:57] I was thinking on the BGP issue: T188085 [14:33:58] T188085: Pybal stuck at BGP state OPENSENT while the other peer reached ESTABLISHED - https://phabricator.wikimedia.org/T188085 [14:34:08] we've been very Pybal centric on our approach [14:35:53] maybe XioNoX could help us, it would be nice to have something like https://gerrit.wikimedia.org/r/c/415260/ but network side, or even better, showing discordancies between what PyBal and the routers are reporting [14:39:36] yeah [14:40:24] an easy way to see how the routers are behaving is looking for AS64600 in libreNMS [14:40:29] "easy" [14:42:08] vgutierrez: so, if pybal-test looks good we can upgrade lvs1010 to 1.15.0. The host does not serve any actual user traffic but it has quite a few services and checks enabled [14:42:25] that sounds nice [14:44:10] what's the best approach to trigger the package update? [14:45:34] apt install pybal? :) [14:48:37] I guess that requires a !log line, right? [14:48:41] indeed [14:53:10] vgutierrez@lvs1010:~$ curl http://127.0.0.1:9090/metrics 2>1 |grep bgp |grep -v "#" [14:53:12] pybal_bgp_enabled 0.0 [14:58:26] log seems normal (same warnings as before the update) [14:58:34] ok [14:59:07] bgp is disabled on lvs1010, nice that we have a prometheus metric for that too now :) [14:59:18] something small-ish but with BGP enabled? [14:59:34] as BGP was the most affected thing in this release [14:59:52] lvs5003.eqsin.wmnet? 
[15:00:07] right, the new DC :D [15:00:56] vgutierrez@lvs5003:~$ grep bgp /etc/pybal/pybal.conf [15:00:56] bgp = yes [15:06:06] not looking good :& [15:06:08] :/ [15:06:21] Mar 06 15:02:57 lvs5003 pybal[11775]: File "/usr/lib/python2.7/dist-packages/pybal/bgp/bgp.py", line 2420, in _sendUpdates [15:06:24] Mar 06 15:02:57 lvs5003 pybal[11775]: attributeMap.setdefault(advertisement.attributes, set()).add(advertisement) [15:06:27] Mar 06 15:02:57 lvs5003 pybal[11775]: File "/usr/lib/python2.7/dist-packages/pybal/bgp/bgp.py", line 790, in __hash__ [15:06:30] Mar 06 15:02:57 lvs5003 pybal[11775]: return reduce(operator.xor, map(hash, self.itervalues()), 0) [15:06:33] Mar 06 15:02:57 lvs5003 pybal[11775]: File "/usr/lib/python2.7/dist-packages/pybal/bgp/bgp.py", line 655, in __hash__ [15:06:36] Mar 06 15:02:57 lvs5003 pybal[11775]: return hash((self.value[0:3] + (frozenset(self.value[3]), ))) [15:06:39] Mar 06 15:02:57 lvs5003 pybal[11775]: exceptions.IndexError: tuple index out of range [15:07:14] ha! [15:07:17] luckily, all of eqsin is still downtimed in icinga :) [15:07:35] I was going to undo that this morning, but it can wait till you're done using it as a testbed too :) [15:07:53] good timing :) [15:07:57] indeed [15:08:13] vgutierrez: that exception was not being raised on pybal-test though, was it? 
[15:08:50] I didn't see it [15:08:53] double checking [15:08:57] vgutierrez: please !log the upgrade of lvs5003 too [15:09:38] sir, yes sir [15:10:49] root@pybal-test2001:~# journalctl -u pybal --since "1 hour ago" | grep -i excep [15:10:52] root@pybal-test2001:~# [15:10:55] vgutierrez@lvs5003:~$ sudo journalctl -u pybal --since "10 minutes ago" |grep -i excep [15:10:58] Mar 06 15:02:57 lvs5003 pybal[11775]: --- --- [15:11:01] Mar 06 15:02:57 lvs5003 pybal[11775]: exceptions.IndexError: tuple index out of range [15:11:09] looks like it's something config related [15:11:14] or environment specific [15:11:34] full traceback here: https://phabricator.wikimedia.org/P6801 [15:12:10] looks like something is wrong with the prefixes to advertise in lvs5003 [15:12:22] I'm not familiar with this part of the code, but I can see something strange, the parent signature (Base def __init__(self, value=(AFI_INET, SAFI_UNICAST), attrTuple=None) [15:13:26] while the concrete one is: [15:13:27] __init__(self, value=None, attrTuple=None) [15:13:55] and calls the parent with value=value (so overriding its value) if the attrTuple is not True [15:14:58] might be a red herring ofc ;) [15:16:04] so the error is basically a serialization error while trying to log something right? [15:17:06] hmm serialization error when trying to send an UPDATE BGP message [15:17:28] oh, no [15:17:29] right [15:28:23] vgutierrez: this is a shot-in-the-dark suggestion, but perhaps the __setattr__ method recently added to class BGPPeering doesn't handle inheritance right? 
[15:28:41] the part that reads: [15:28:43] # old style class, super().__setattr__() doesn't work [15:28:43] # https://docs.python.org/2/reference/datamodel.html#customizing-attribute-access [15:28:46] self.__dict__[name] = value [15:29:34] the only other place in that file I see a custom __setattr__ being used, the part that does the setting looks more like: [15:29:37] super(FSM, self).__setattr__(name, value) [15:29:50] yup [15:30:13] class FSM(object) works with super()... [15:30:44] all webrequest traffic migrated to Jumbo (finally) [15:30:50] but yeah I have honestly very little knowledge of advanced OO trickery with python, and thus most of pybal goes right over my head :) [15:30:54] elukey: \o/ [15:30:58] \o/ [15:31:46] from the documentation referenced in that fragment of code [15:31:48] If __setattr__() wants to assign to an instance attribute, it should not simply execute self.name = value — this would cause a recursive call to itself. Instead, it should insert the value in the dictionary of instance attributes, e.g., self.__dict__[name] = value. For new-style classes, rather than accessing the instance dictionary, it should call the base class method with the same name, [15:31:54] for example, object.__setattr__(self, name, value). [15:32:23] BGPPeering being and old-style class, it looks like "self.__dict__[name] = value" should be enough [15:33:25] is that code path unit tested? :) [15:33:52] the __setattr__()? 
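A minimal sketch of the two assignment styles quoted above, written for Python 3, where every class is new-style. TrackedPeering and its changes list are hypothetical names, not PyBal code; the assumption is that the hook exists to observe attribute changes (for example, to feed metrics).

```python
class TrackedPeering(object):
    """Hypothetical stand-in for a class that hooks attribute assignment."""

    def __init__(self):
        # Write straight into the instance dict so the hook below is not
        # triggered before 'changes' exists.
        self.__dict__['changes'] = []
        self.state = 'IDLE'

    def __setattr__(self, name, value):
        # Record the assignment, then delegate to the base class.
        # Writing "self.name = value" here would recurse forever.
        self.changes.append((name, value))
        super(TrackedPeering, self).__setattr__(name, value)


peering = TrackedPeering()
peering.state = 'ESTABLISHED'
```

On an old-style Python 2 class super() is not available, which is why the quoted comment falls back to self.__dict__[name] = value.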
[15:33:55] indeed [15:34:17] hmmm ok [15:35:30] but that's an easy test [15:35:42] right, comment out the setattr and see if lvs5003 still crashes :) [15:35:45] if I get rid of __setattr__ temporarily we'd only lose some metrics [15:35:47] yup [15:35:48] :D [15:36:39] I only focus on that because looking at the 1.14..master diffs, it's one of the only places that seems to play any kind of trickery that might trip up some self.value[0:3] thing in a __hash__ [15:36:53] although I cannot draw a direct line between the error and that code in my head heh [15:38:19] root@lvs5003:/usr/lib/python2.7/dist-packages/pybal/bgp# journalctl -u pybal --since "3 minutes ago" |grep excep [15:38:22] Mar 06 15:37:49 lvs5003 pybal[20022]: --- --- [15:38:25] Mar 06 15:37:49 lvs5003 pybal[20022]: exceptions.IndexError: tuple index out of range [15:38:28] nah [15:38:44] ok, try to log self.value maybe? [15:39:16] the one leading to IndexError [15:39:29] bblack: for me smells like https://github.com/wikimedia/PyBal/commit/e2f4c6feec7afd8bb0f507ae60aae141742c0f13#diff-5650e972ab1a1492f87127184aab13ee [15:43:29] yeah that makes some sense too [15:45:46] are we happy of the fact that exceptions in bgp.py do not make pybal crash, but just get logged? [15:46:11] heh [15:46:46] not crashing is a noble goal, but not-crashing by letting one of your most important functions cease working is probably not a good way to get there :) [15:47:46] in this specific case it's bad.. basically it's crashing in the point where everything looks good on the BGP side (session is ESTABLISHED) but no UPDATE messages are being sent [15:50:58] Mar 06 15:49:36 lvs5003 pybal[23129]: [bgp] INFO: Hashing ((2, 1), '2001:df2:e500:101:10:132:0:13', []) [15:51:02] that's self.value [15:51:19] and of course self.value[3] triggers the out of range [15:52:58] ha! [15:53:32] I assume the tuple looked differently on 1.14.x? 
[15:54:51] according to the comments [15:54:53] should look like this [15:54:53] # Tuple encoding of self.value: [15:54:54] # (AFI, SAFI, NH, [NLRI]) [15:55:03] [NLRI] is not there at all [15:55:51] and looking at __init__ it looks like the caller is messing up [15:55:51] def __init__(self, value=None, attrTuple=None): [15:55:51] super(MPReachNLRIAttribute, self).__init__(value=value, attrTuple=attrTuple) [15:55:55] if not attrTuple: [15:55:57] self.value = value or (AFI_INET6, SAFI_UNICAST, IPv6IP(), []) [15:57:32] see my first comment above :D [15:58:28] well the (2,1) is probably afi,safi and needs flattening somewhere [16:00:47] https://github.com/wikimedia/PyBal/blob/1.15/pybal/bgpfailover.py#122 [16:00:50] that looks like the offender to me [16:01:12] compared to https://github.com/wikimedia/PyBal/blob/1.14/pybal/bgpfailover.py#L52-L53 [16:02:20] yes [16:02:33] and it's new in the MED-related patch [16:02:58] so it's missing the safi bit? [16:04:34] it should be afAttrs[bgp.MPReachNLRIAttribute] = bgp.MPReachNLRIAttribute((af[0], af[1], bgp.IPv6IP(self.nexthopIPv6), [])) [16:05:10] I think that sounds right [16:05:34] (although arguably it will always be SAFI_UNICAST, but probably better to not hardcode that in this place I guess) [16:08:06] now the log looks like this [16:08:07] Mar 06 16:07:14 lvs5003 pybal[26937]: [bgp.FSM@0x7f2ae5607850 peer 103.102.166.129:61602] INFO: State is now: ESTABLISHED [16:08:10] Mar 06 16:07:14 lvs5003 pybal[26937]: [bgp.BGPFactory@0x7f2ae574a488] INFO: BGP session established for ASN 64600 peer 103.102.166.129 [16:08:13] Mar 06 16:07:14 lvs5003 pybal[26937]: [bgp] INFO: Hashing (2, 1, '2001:df2:e500:101:10:132:0:13', []) [16:08:16] Mar 06 16:07:14 lvs5003 pybal[26937]: [bgp] INFO: Hashing (2, 1, '2001:df2:e500:101:10:132:0:13', []) [16:08:22] the session is ESTABLISHED, UPDATES are being sent and there are not crashes [16:08:30] s/not/no/g [16:10:53] not bad for my first attempt of releasing pybal /o\ [16:12:16] wfm \o/
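The failure and the fix discussed above can be reproduced in isolation. This is a rough sketch, not the PyBal source: MPReachSketch is a hypothetical class that copies only the __hash__ expression from the traceback, and the next hop is kept as a plain string instead of bgp.IPv6IP.

```python
AFI_INET6 = 2
SAFI_UNICAST = 1


class MPReachSketch(object):
    """Hypothetical reduction; value is expected as (AFI, SAFI, NH, [NLRI])."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Same expression as in the traceback: needs four elements.
        return hash(self.value[0:3] + (frozenset(self.value[3]),))


af = (AFI_INET6, SAFI_UNICAST)
nexthop = '2001:df2:e500:101:10:132:0:13'

# Nested (AFI, SAFI) pair: only three elements, so value[3] is missing.
broken = MPReachSketch((af, nexthop, []))
# Flattened tuple, as in the proposed fix.
fixed = MPReachSketch((af[0], af[1], nexthop, []))

try:
    hash(broken)
    broken_error = None
except IndexError as exc:
    broken_error = exc

fixed_hash = hash(fixed)
```

The nested form matches the logged "Hashing ((2, 1), ...)" line, and the flattened form matches the "Hashing (2, 1, ...)" lines seen after the fix.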
[16:12:38] I'll try to come with a test that shows the issue and a patch [16:12:42] for 15.0.1 [16:12:50] *1.15.1 [16:16:49] nice [16:18:05] bblack: when you have a sec: https://gerrit.wikimedia.org/r/#/c/416652/ (post-v5 cleanup) [16:26:08] <_joe_> can I suggest to keep the amount of logs in normal operating mode to a minimum? [16:27:54] regarding Mar 06 16:07:14 lvs5003 pybal[26937]: [bgp] INFO: Hashing (2, 1, '2001:df2:e500:101:10:132:0:13', [])? [16:31:05] ema: relatedly - I did first-install on cp5010 last night after its ethernet was fixed. It came up strangely puppet-wise, because apparently it installed varnish4 initially. but "apt-get install varnish" installed varnish5. I'm not sure what the deal is there with priorities/repos/etc. [16:32:56] (I'm guessing it's still in experimental or whatever) [16:40:08] bblack: it is still in experimental, yes [16:40:53] so that should be fixed once we move the package to main [16:41:35] 15:40:24 an easy way to see how the routers are behaving is looking for AS64600 in libreNMS [16:41:35] 15:40:29 "easy" [16:41:50] an easier way is just logging into the router and use "show bgp neigh ..." commands [16:42:21] oh mark, hi [16:42:24] https://gerrit.wikimedia.org/r/c/416711/ plz :D [16:45:14] +1 [16:45:32] awesome :D [16:46:27] the nice bit about bgp in pybal is [16:46:32] it only really does anything during startup [16:46:41] so once we verify whether it correctly sends the update on startup [16:46:46] and then manages to maintain the session [16:46:49] then we're good ;) [16:46:54] yup.. but in this scenario [16:47:10] sending the updates is crashing, and pybal looks all good [16:47:19] besides the logs screaming of course :) [16:47:24] well we've seen this pattern before, e.g. with pybal<->etcd [16:47:52] what pattern do you mean? [16:47:55] the general problem is a thread handling "foo" can crash and the rest of pybal keeps going, with only logs or functional problems to let us know what happened. 
[16:48:11] there are not really threads, but yeah [16:48:32] s/thread/some-unit-of-abstraction-that-can-crash-while-other-parts-of-the-process-dont/ [16:48:49] we should probably improve our monitoring/alerting [16:49:44] it's tricky to decide when pybal should exit on failure and when it shouldn't [16:49:49] it really depends on the situation [16:49:59] given we have backup routes to protect us in the case of a pybal crash over multiple machines, it seems to be like an improvement if the general-case were fixed that all crashes bubble up to process death instead of being caught wherever it is they're being caught at. [16:50:31] having pybal exit sucks, but having pybal half-functional and not knowing it unless we add a bunch of specific monitoring to double-check everything it touches also sucks. [16:50:32] perhaps it should be a config option [16:50:37] yes [16:50:52] perhaps any top level unhandled exception could also be a metri [16:50:53] metric [16:50:58] so prometheus metric could alert [16:51:33] in our specific case it would probably be fine to exit always nowadays, yes [16:51:40] but for other people not always necessarily [16:51:45] (and apparently there are a few 3rd party users out there) [16:52:34] vgutierrez: it would be good to configure pybal-test/quagga so that updates are actually sent (that would have allowed to repro this issue there rather than on lvs5) [16:52:51] I guess I can merge this: https://gerrit.wikimedia.org/r/c/416712/ just cherrypicking the fix from master [16:53:31] I'm just saying on general principle: if an unpredicted random exception happens, the code can't trust itself anymore to make a judgement call about whether continuing to operate at all is a good idea. [16:53:33] ema: maybe the difference is that we don't have any IPv6 prefix in the pybal test environment? [16:53:41] vgutierrez: that can very well be [16:54:06] i.e. 
that this is generally a case of https://en.wikipedia.org/wiki/Defensive_programming -> "here is also the risk that the code traps or prevents too many exceptions, potentially resulting in unnoticed, incorrect results" [16:54:09] vgutierrez: yes, re:cherry-picking to 1.15 [16:54:38] nice, merging it [16:56:00] vgutierrez: we also need to cherrypick https://gerrit.wikimedia.org/r/#/c/416412/ onto master, and add a new commit to master bumping debian/changelog to 1.15.1, to be later cherry-picked onto the 1.15 branch [16:56:32] hmmm [16:56:44] you mean https://gerrit.wikimedia.org/r/c/416657/ ? [16:57:02] oh you've done it already :) [16:57:05] ema: I'm working on the changelog for 1.15.1 right now [16:57:10] * vgutierrez slow [16:57:43] in this case, any exception related to a bgp session should at least reset that session [16:58:13] vgutierrez: I hadn't noticed that the 1.15.0 commit was already on master, very good [16:58:23] well but this particular type/source of bgp exception wasn't fixable, and wouldn't have been predicted by a specific catch. and if we reset, it would just crash again in a loop [16:58:38] yes, but we would notice that [16:59:12] mark: maybe, maybe not :) [16:59:22] alternatively a router sends something either bad or unexpected, should that necessarily crash pybal? [16:59:28] well we'd also notice a single global crash of the process easier. the only way we notice the reset->crash loop is the logspam somewhere on one end or the other. [16:59:57] when a router sends something bad or unexpected, that's an input validation issue, those should be predictable/catchable. [17:00:12] they should be yes [17:00:21] but if they crash all your pybal instances at once then that doesn't really help you [17:00:26] anyway, meeting now [17:00:49] blerg meetings! 
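One way to picture the two ideas floated above (a config option to exit on failure, and a metric for unhandled exceptions) is the sketch below. Everything in it is hypothetical: CrashPolicy, exit_on_failure and unhandled_total are illustrative names, and none of this is PyBal or Twisted code.

```python
import logging

log = logging.getLogger('sketch')


class CrashPolicy(object):
    """Hypothetical wrapper: count unhandled exceptions, optionally exit."""

    def __init__(self, exit_on_failure=False):
        self.exit_on_failure = exit_on_failure
        # Could be exported as a counter for alerting.
        self.unhandled_total = 0

    def run(self, func, *args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            self.unhandled_total += 1
            log.exception('unhandled exception in %s',
                          getattr(func, '__name__', func))
            if self.exit_on_failure:
                # Let the whole process die instead of limping along.
                raise SystemExit(1)
            return None


def send_updates():
    # Stand-in for the failing code path.
    raise IndexError('tuple index out of range')


lenient = CrashPolicy(exit_on_failure=False)
lenient.run(send_updates)

strict = CrashPolicy(exit_on_failure=True)
try:
    strict.run(send_updates)
    exited = False
except SystemExit:
    exited = True
```

With exit_on_failure off, the only signal is the counter and the log line, which is the half-functional situation described above; with it on, the failure becomes a process exit that is hard to miss.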
[17:29:42] gbp-buildpackage -> git-pbuilder -> pdebuild -> cowbuilder -> pbuilder -> pbuilder-buildpackage -> dpkg-buildpackage -> debian/rules binary [17:29:54] I'm just leaving this one here [17:37:30] well.. 1.15.1 rolled out successfully in lvs1010 && lvs5003 :D [17:45:46] vgutierrez: cool, let's see if they behave during the EU night :) [17:45:54] of course they will! [18:45:12] Traffic, Operations, TemplateStyles, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#4028292 (ggellerman) [19:40:47] Traffic, Operations, TemplateStyles, Wikimedia-Extension-setup, and 4 others: Deploy TemplateStyles to WMF production - https://phabricator.wikimedia.org/T133410#3331578 (Tgr) [22:54:09] Traffic, netops, Operations, ops-eqsin: replace eqsin SFP-T/SFP+ - https://phabricator.wikimedia.org/T188923#4029421 (Papaul) [x] On the device labeled cr1-eqsin, Juniper MX104, top of rack 603, please replace the 4 optics present in the embedded ports (aka not in modules) labeled xe-2/0/0 to xe-... [22:59:22] Traffic, Operations, ops-eqsin: cp5006 unresponsive - https://phabricator.wikimedia.org/T187157#4029425 (Papaul) All the normal troubleshooting was done on the server. - Unplugging the power - Removing the PSU's for 15 minutes while working on the router Server will not power on.
[23:16:28] Traffic, netops, Operations, ops-eqsin: replace eqsin SFP-T/SFP+ - https://phabricator.wikimedia.org/T188923#4029452 (Papaul) [23:17:31] Traffic, netops, Operations, ops-eqsin: replace eqsin SFP-T/SFP+ - https://phabricator.wikimedia.org/T188923#4024207 (Papaul) [23:18:15] Traffic, netops, Operations, ops-eqsin: replace eqsin SFP-T/SFP+ - https://phabricator.wikimedia.org/T188923#4024207 (Papaul) [] On the device labeled asw-0603-eqsin, Juniper EX4600, rack 603, please replace the SFP-T (copper SFPs) present in ports 12, 14 and 23 with the QFX-SFP-1GE-T transceiver... [23:26:46] netops, Operations, ops-eqsin: return faulty MX104 to Juniper - https://phabricator.wikimedia.org/T189060#4029482 (ayounsi)