[00:11:18] Traffic, Operations, ops-ulsfo, Patch-For-Review: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3695566 (RobH)
[00:43:00] Traffic, Operations, ops-ulsfo: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3695572 (RobH)
[08:15:06] Traffic, Operations, Wikimedia-Logstash, Patch-For-Review, Services (watching): RESTBase logs disappeared from logstash - https://phabricator.wikimedia.org/T178078#3695783 (fgiunchedi) Open>Resolved a:fgiunchedi No problem @Pchelolo ! The problem has been fixed on the lvs side too...
[08:52:18] Traffic, Operations: Select or Acquire Address Space for Asia Cache DC - https://phabricator.wikimedia.org/T156256#3696012 (faidon) We now have an APNIC account, and we were assigned today this IP space: - 103.102.166.0/24 - 2001:df2:e500::/48 There is an on-going thread with APNIC about some WHOIS oddi...
[08:52:32] bblack, XioNoX, ema: ^ :) :)
[08:53:26] Traffic, Operations: Allocate address space for Singapore (APNIC) - https://phabricator.wikimedia.org/T156256#3696016 (faidon)
[08:59:26] \o/
[09:00:24] nice!
[09:09:28] <_joe_> paravoid: :)))
[11:07:49] yay
[12:53:29] awesome :)
[12:53:55] paravoid: I have portal access now, I just have to sort out getting everyone else SG3 portal access (and maybe getting myself access to the other DCs in it)
[12:57:36] ^ paravoid: assigned or allocated? :)
[12:58:27] we're not a LIR
[12:58:32] this is infrastructure
[12:58:46] 2. This assignment cannot be further sub-delegated to your customers
[12:58:50] This is for your infrastructure only.
[12:59:13] ok
[12:59:42] they still haven't fixed the whois situation and didn't really read my email
[12:59:55] so I sent them another one last night, they'll probably see it tomorrow or whatever
[13:00:11] anyway, long story short, let's not edit those objects or create route objects etc. until this whole thing is fixed
[13:03:31] in case they change the numbers when they reset/fix the handles situation?
[13:03:57] (surely not. but I guess, we don't know which handles to use for other things)
[13:06:34] paravoid: I'm able to go into the portal form now to order metro cross-connects in SG, and the only option for the "other" side from SG3 is SG2 :)
[13:07:13] (also I've added mark, paravoid, ayounsi, robh to SG3 access in the portal)
[13:07:24] great
[13:09:58] awesome
[13:10:32] the IPs won't change, but whois objects reference each other
[13:10:56] so if we e.g. create route objects now that are mnt-by MAINT-WIKIMEDIA19-AP it might make things more difficult down the line
[13:11:17] also, APNIC is confused as to what has happened (they said that I created 20 IRT objects)
[13:11:35] so let's not touch anything and confuse them more :)
[13:11:44] bblack: any clarity on SG3<->SG1 interconnects?
[13:14:07] not yet, the sales rep responded to only one of my pair of questions (about the portal stuff)
[13:14:43] oh wait, I see that my first cup of coffee is not yet complete
[13:14:55] hidden at the bottom of the email after an inline image, she did answer the other question:
[13:14:58] "The cross connects at NRC:455+MRC:396 per port on the order can be connected between SG1 and SG3, however it does not apply to connections to SG2 (further site).
[13:15:01] Do let us have the details of the connections to provision them."
[13:15:06] ah!
[13:15:08] awesome
[13:15:12] good news!
[13:16:17] I can see SG3 in the portal
[13:16:24] good
[13:18:11] the cross-connect form allows me to choose SG1, SG2 and SG3
[13:19:06] anyway, we'll figure that out
[13:19:31] I can just ask SG1 carriers now without asking for local loops, that's the immediate gain :)
[13:19:37] right
[13:20:17] that was quite the scare eh :0
[13:20:18] :)
[13:21:01] yes!
[13:22:21] but still, they knew we were asking about carriers and ordering x-connects to use them, they sent us the huge SG1 list for comparison to other DCs, and only ever offered SG3 for the space. It was an oversight on our (my!) end not to ask, but it'd be pretty shady of them to now say we needed more fees to reach that carrier list :P
[13:23:22] shady, but not beyond them
[16:06:06] ah of course, it's in the ticket, I don't know why I couldn't find it elsewhere
[16:06:09] https://gerrit.wikimedia.org/r/#/c/317450/
[16:06:54] (^ that's alex's outstanding patch for ncredir)
[16:07:03] (the other alex)
[16:13:36] Traffic, Operations: Allocate address space for Singapore (APNIC) - https://phabricator.wikimedia.org/T156256#3696962 (BBlack) We can do revdns and basic puppet address space commits here or in T156027 as appropriate I think (maybe most of the puppet-level stuff over there). One thing it would be nice t...
[16:22:43] Traffic, Operations, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3696991 (Johan) Very well. "You may be able to install" is actually fairly complicated to parse (three verbs, not obvious to non-f...
[16:22:56] bblack: re https://phabricator.wikimedia.org/T156256#3696962 those work for me
[16:25:31] bblack: well, we use a /110 for v6 at other sites
[16:26:39] yeah there's some discrepancies in how we declare things at different levels
[16:27:14] the /110 is more-correct
[16:27:37] but the bottom line is the whole /64 contains nothing but the /110
[16:28:19] https://www.mediawiki.org/wiki/Wikipedia_Zero/IP_Addresses
[16:28:24] ^ this is, I think, the list they use now
[16:28:49] although it wasn't edited since 2014, and I swear there was another similar page that had more detail on a different wiki
[16:30:09] bblack: yeah that's why I said the /64 works as well, we don't have any risk of overlapping
[16:30:32] ah found it: https://office.wikimedia.org/wiki/Wikipedia_Zero_Destination_IP_Addresses
[16:30:45] I also made https://wikitech.wikimedia.org/wiki/IP_and_AS_allocations but it's not complete for v6
[16:30:52] they have the /111 (half the /110) for the older sets at the bottom that are no-multimedia
[16:31:12] AFAIK they don't have non-multimedia partners anymore though, it's just the top set that matters (same as the other link)
[16:31:29] that makes sense then
[16:34:24] paravoid: I can't access the RPKI page on the APNIC portal: "You do not have the Resource certification - Update permission. You must have this permission before you can access this page."
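Note on the /64, /110 and /111 exchange (16:25 to 16:31): the containment relationships being discussed can be checked with Python's standard `ipaddress` module. This is only a sketch. The /48 is the assignment quoted in the log, but the specific /64 and /110 below are hypothetical placeholders, since the log does not say which prefixes would be used at the new site.

```python
import ipaddress

# The /48 is the APNIC assignment quoted in the log.
site_v6 = ipaddress.ip_network("2001:df2:e500::/48")

# ASSUMPTION: these two prefixes are made up for illustration; the log does
# not name the /64 or the /110 that would actually be used.
declared_64 = ipaddress.ip_network("2001:df2:e500:ed1a::/64")
service_110 = ipaddress.ip_network("2001:df2:e500:ed1a::/110")

# "the /111 (half the /110)": splitting one bit deeper yields two /111 halves.
halves_111 = list(service_110.subnets(prefixlen_diff=1))


def summarize():
    """Return the containment facts discussed in the log as plain values."""
    return {
        "110_inside_64": service_110.subnet_of(declared_64),
        "64_inside_48": declared_64.subnet_of(site_v6),
        "addresses_in_110": service_110.num_addresses,
        "addresses_per_111": [n.num_addresses for n in halves_111],
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(key, value)
```

Declaring the whole /64 is the looser statement and the /110 the more precise one; as long as nothing else lives in that /64, either declaration covers the same service addresses.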
[16:35:11] Traffic, Operations, ops-ulsfo: replace ulsfo aging servers - https://phabricator.wikimedia.org/T164327#3229950 (RobH)
[16:35:46] I'll have a look in a bit, but in any case
[16:35:49] don't do anything just yet
[16:35:56] as I said above :)
[16:36:14] no route objects, no DNS, nothing
[16:36:26] as they're already confused and it's going to be even more confusing
[16:36:56] yup, was just curious to see how they do RPKI :)
[16:38:19] huh, they seem to have extra permission bits
[16:38:23] hidden very well :)
[16:38:51] I think I just gave you access to it
[16:39:38] Traffic, Operations, ops-ulsfo: decommission/replace bast4001.wikimedia.org - https://phabricator.wikimedia.org/T178592#3697086 (RobH)
[16:40:47] bblack: just to confirm i can now see sg3 in portal.equinix.com for incoming shipment requests
[16:41:02] yay
[16:41:02] i know you are handling xconnects so didn't bother to check that (since you said it's not right anyhow yet)
[16:41:17] Traffic, Operations: Allocate address space for Singapore (APNIC) - https://phabricator.wikimedia.org/T156256#3697109 (faidon) Yup, that's fine, as is creating the zones in the DNS and puppet repository (but not do the reverse delegation).
[17:45:37] bblack, hey, did you take a look at that patch?
[17:47:30] no, but I've been doing useless-manager stuff like bringing it up in meetings and planning that we'd like to complete this work somehow unofficially this quarter :)
[17:47:45] Krenair: ^ :)
[17:49:32] :D
[18:00:36] Traffic, Operations, ops-ulsfo, Patch-For-Review: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3697388 (ops-monitoring-bot) Script wmf-auto-reimage was launched by bblack on neodymium.eqiad.wmnet for hosts: ``` ['cp4029.ulsfo.wmnet', 'cp4030.ulsfo.wmn...
[18:19:20] bblack: anything more we should do for https://phabricator.wikimedia.org/T176386 ?
[18:21:55] Traffic, DC-Ops, Operations, ops-esams: Multiple systems in esams OE10 showing PSU failures - https://phabricator.wikimedia.org/T177228#3651185 (faidon) IIRC, @mark said that the rack in question doesn't have a secondary PDU. New PDUs for esams are in the budget this year, so I guess this is plan...
[18:25:08] Traffic, Operations, Patch-For-Review: upload@ulsfo strange ethernet / power / switch issues, etc... - https://phabricator.wikimedia.org/T176386#3697507 (BBlack) Open>Resolved a:BBlack Nothing really to do here, except remember it if new power issues arise with the new hosts...
[18:25:08] paravoid: nope, not really
[18:26:45] thx :)
[18:31:25] Traffic, Operations, ops-ulsfo, Patch-For-Review: rack/setup/install cp40(29|3[012]).ulsfo.wmnet - https://phabricator.wikimedia.org/T178423#3697535 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['cp4031.ulsfo.wmnet', 'cp4030.ulsfo.wmnet', 'cp4032.ulsfo.wmnet', 'cp4029.ulsfo.wmnet'] `...
[18:51:19] netops, Operations: Find a new PIM RP IP - https://phabricator.wikimedia.org/T167842#3346542 (hashar) Thanks @Faidon for the link! Jenkins is never out of surprise. We do not rely on that auto discovery feature and I will get it disabled in the daemon.
[18:54:17] netops, Continuous-Integration-Infrastructure, Operations, Jenkins, Release-Engineering-Team (Kanban): Disable Jenkins autodiscovery system - https://phabricator.wikimedia.org/T178608#3697616 (hashar)
[19:06:06] netops, Operations: Find a new PIM RP IP - https://phabricator.wikimedia.org/T167842#3697696 (ayounsi) @Gehel See Faidon's comment on T167842#3353703. Is there any reasons to have JMX agent autodiscovery enabled?
[19:10:06] ema: I suspect the recent refactorings (profile, etc?) have messed up dependencies for new installs
[19:10:34] not that our deps were perfect before, we typically had to run puppet 4-5 times after a fresh install to get everything clean
[19:10:46] but now it fails repeatedly without intervention so far
[19:11:08] (got stuck on retrying to reload VCL, when the initscripts weren't yet provisioned to turn on inline C support)
[19:11:42] and then after manually deleting the statefiles for retry-reload-vcl stuff, it still gets stuck because it's not provisioning the systemd unit files for our varnish instances before trying to start them
[19:11:52] maybe that latter bit is the systemd unit declaration stuff, will look around
[19:13:30] Error: Could not start Service[varnish-frontend]: Execution of '/bin/systemctl start varnish-frontend' returned 6: Failed to start varnish-frontend.service: Unit varnish-frontend.service failed to load: No such file or directory.
[19:13:35] Error: /Stage[main]/Cacheproxy::Instance_pair/Varnish::Instance[text-frontend]/Base::Service_unit[varnish-frontend]/Service[varnish-frontend]/ensure: change from stopped to running failed: Could not start Service[varnish-frontend]: Execution of '/bin/systemctl start varnish-frontend' returned 6: Failed to start varnish-frontend.service: Unit varnish-frontend.service failed to load: No such file o
[19:13:41] r directory.
[19:13:43] ^ that
[19:16:01] ah, it's really a shortcoming of base::service_unit for this use-case
[19:16:03] $path = $initscript ? {
[19:16:03] 'systemd' => "/lib/systemd/system/${name}.service",
[19:16:06] 'systemd_override' => "/etc/systemd/system/${name}.service.d/puppet-override.conf",
[19:16:19] it supports a systemd_override + default debian basic unitfile, or a custom unitfile
[19:16:29] it doesn't support a custom unitfile + custom override too
[19:16:59] (I guess the implicit assumption there is "you could've put these overrides in the custom unit file you're already deploying")
[19:17:51] and of course, had that not been at issue, the vcl reload wouldn't have gone into repeat-fail (for lack of inline-c param in the custom base unit file)
[19:23:32] netops, Continuous-Integration-Infrastructure, Operations, Jenkins, Release-Engineering-Team (Kanban): Disable Jenkins autodiscovery system - https://phabricator.wikimedia.org/T178608#3697773 (hashar) a:hashar
[19:33:10] netops, Operations: Find a new PIM RP IP - https://phabricator.wikimedia.org/T167842#3697822 (Gehel) I see no reason to have jolokia even accessible on the network, it should be local only. I'll have a look into our config (to be honest I don't know much about jolokia, but that's a good occasion to dig i...
[19:44:16] bblack: have a look at systemd::unit, I think the intention for that was to replace base::service_unit
[19:45:42] we just moved from systemd::service -> base::service_unit heh
[19:48:20] netops, Operations, fundraising-tech-ops, ops-codfw, Patch-For-Review: codfw: rack frack refresh equipment - https://phabricator.wikimedia.org/T169643#3697889 (ayounsi) Open>Resolved
[20:07:31] netops, Operations, Patch-For-Review, Performance-Team (Radar), Performance-Team-notice: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3697928 (ayounsi) As of today, 180 BGP sessions use the old AS# and 216 use the new one. Timeline for decommissioning the old AS# (d...
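Note on the base::service_unit limitation described at 19:16 to 19:17: the pasted selector picks exactly one file path per resource, which is why a custom unit file and a custom override cannot both be deployed through it. The Python below is a rough model of that behavior, not the Puppet code itself. It mirrors only the two branches pasted in the log, and `combined_paths` is a hypothetical helper showing what supporting both would look like.

```python
def service_unit_path(name: str, initscript: str) -> str:
    """Model of the pasted selector: one initscript kind maps to one path."""
    paths = {
        "systemd": f"/lib/systemd/system/{name}.service",
        "systemd_override": (
            f"/etc/systemd/system/{name}.service.d/puppet-override.conf"
        ),
    }
    if initscript not in paths:
        # Any other branches of the real selector were not pasted in the log,
        # so this sketch deliberately does not guess at them.
        raise ValueError(f"branch not shown in the log: {initscript!r}")
    return paths[initscript]


def combined_paths(name: str, custom_unit: bool, override: bool) -> list:
    """HYPOTHETICAL: deploy a custom unit file and an override together."""
    wanted = []
    if custom_unit:
        wanted.append(service_unit_path(name, "systemd"))
    if override:
        wanted.append(service_unit_path(name, "systemd_override"))
    return wanted


if __name__ == "__main__":
    # The single-path model can give the unit file or the override, not both.
    print(service_unit_path("varnish-frontend", "systemd"))
    print(combined_paths("varnish-frontend", custom_unit=True, override=True))
```

With a single `$path` per resource, the varnish-frontend case in the log would need either two resources or the overrides folded into the custom unit file, which is the implicit assumption mentioned at 19:16:59.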
[20:11:37] netops, Operations, Patch-For-Review: Tracking task for network syslog messages - https://phabricator.wikimedia.org/T174397#3697931 (ayounsi) Open>Resolved Will update that task if needed in the future.
[22:33:26] netops, Operations, Patch-For-Review, Performance-Team (Radar), Performance-Team-notice: Merge AS14907 with AS43821 - https://phabricator.wikimedia.org/T167840#3698379 (BBlack) +1 LGTM!