[07:12:06] Traffic, Operations: varnish-frontend-fetcherr: Assert error in vslc_vtx_next, 100% CPU usage - https://phabricator.wikimedia.org/T253093 (ema)
[07:13:15] Traffic, Operations: varnish-frontend-fetcherr: Assert error in vslc_vtx_next, 100% CPU usage - https://phabricator.wikimedia.org/T253093 (ema) p:Triage→Medium
[07:29:17] netops, Operations, ops-eqiad: Three ports on asw2-d-eqiad are not working as expected - https://phabricator.wikimedia.org/T247881 (ayounsi) Open→Stalled p:Triage→Low a:Cmjohnson→None Sounds good! This will have to wait for a time we for example do T196487. Outside of COVID t...
[08:27:17] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (Dzahn) Is the requested hostname "homer" a copy/paste error? Does this need to be in eqiad or would codfw work just as well?
[08:50:37] Traffic, Operations, Patch-For-Review: varnish-frontend-fetcherr: Assert error in vslc_vtx_next, 100% CPU usage - https://phabricator.wikimedia.org/T253093 (ema) Open→Resolved a:ema Checking `poll()` and exiting when it returns something else than `None` seems to have fixed things. Tested...
[09:22:44] netops, Operations, ops-eqiad: (Need by: 2019-09-30) upgrade msw1-eqiad from EX4200 to EX4300 - https://phabricator.wikimedia.org/T225121 (faidon) Are there any updates to this task and any particular reasons it's been held up? While this was never super urgent, we're now at the ~one year mark since...
[10:31:00] Traffic, Discovery, Operations: Search autocompletion broken for recent articles (after April 30?) for some users / browsers - https://phabricator.wikimedia.org/T253114 (Tgr)
[10:33:47] Traffic, Discovery, Operations: Search autocompletion broken for recent articles (after April 30?) for some users / browsers - https://phabricator.wikimedia.org/T253114 (Tgr)
[12:00:49] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (ssingh)
[12:01:53] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (ssingh) >>! In T253024#6147641, @Dzahn wrote: > Is the requested hostname "homer" a copy/paste error? (Updated so as not to confuse with the existing service) > Does this need to be in e...
[12:20:11] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (ssingh)
[12:22:18] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (Dzahn) a:Dzahn
[12:22:44] sukhe_: lol, bikeshed2001?
[12:23:26] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (ssingh) Updated the task description as we have decided to go with codfw. For the disk, we can start with 30G and then add an additional one if required. (In case we decide to use the disk...
[12:24:12] sukhe_: since now I've been sniped, I suggest `doh2001` and also updating https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Servers
[12:25:00] cdanis: haha! it is intentional. we decided to not refer to wikidough or the resolver explicitly, just so as to not attract attention. I decided to use "homer" (d'oh!) but it was an existing service (I had just looked up the zone file :)
[12:25:08] ahah
[12:25:48] I suppose, seems a little odd to me given it's a public bug and a public wikitech page ;) but sure
[12:26:02] yeah that's fair
[12:26:14] channeling my inner v.o.l.a.n.s. you might want to use one of the one-off style names https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions#Miscellaneous_servers
[12:26:35] cdanis: those are deprecated AFAIK
[12:26:42] The use of those names is deprecated in favour of specialized cluster names above, when possible.
[12:26:48] not for single servers
[12:26:56] WARN: bikeshedding about bikeshedding about naming conventions
[12:27:03] haha
[12:27:06] only for stuff which resembles some level of cluster
[12:27:35] my work here is done
[12:27:45] i can already see the future ticket where the one-off name is converted to a "proper cluster"
[12:27:58] it sounds like a production deployment would be different anyway
[12:28:10] I needed a name that is !wikidough or !resolver or !doh
[12:28:15] proper cluster name, multiple machines in at least each code DC
[12:28:33] cdanis: yeah, this is experimental for now. we will do smarter things from the actual deployment, with a separate IP space and all that :)
[12:28:35] so I'm not worried about that part mutante
[12:29:43] well then.. pick a nice star name :) https://en.wikipedia.org/wiki/List_of_proper_names_of_stars
[12:29:54] that has not been used before :p
[12:30:18] in retrospect, the ticket could have been made private as well but I think we didn't want that. there is also the fact that we need to balance being public about this so that *some* people can use it and help us test it but not all people and that balance is tricky
[12:32:08] I am a bit biased towards andromeda
[12:33:45] too bad deployment_servers are not misc anymore. adhil "The name was originally Arabic الذيل‎ aḏ-ḏayl, 'the train'"
[12:36:07] sukhe_: that's a constellation, not a star ;)
[12:36:46] right, I thought that was fine but apparently no, it has to be *a* star
[12:37:10] (https://www.imdb.com/title/tt0213327/)
[12:37:17] volans: i was thinking it and did not dare to say it :)
[12:37:25] 3rd column :)
[12:37:58] I see nothing existing for "malmok"
[12:38:00] done?
[12:38:19] git log -p is fre
[12:38:21] *free
[12:38:27] no objection :D
[12:38:43] sounds good
[12:38:56] phew
[12:44:20] Traffic, Operations, vm-requests: Create a Ganeti VM for Wikidough - https://phabricator.wikimedia.org/T253024 (ssingh)
[13:38:34] https://labs.ripe.net/Members/joao_m_ceron/the-bgp-tuner-intuitive-anycast-management
[13:40:03] neat!
[13:42:00] Verfploeter is pretty cool
[13:59:46] https://www.isi.edu/~johnh/PAPERS/Vries17a.pdf also a nice read
[14:08:53] netops, Operations: Set minimum-links 2 to AMS-IX LACP - https://phabricator.wikimedia.org/T253122 (ayounsi) p:Triage→High
[14:14:13] netops, Operations: Set minimum-links 2 to AMS-IX LACP - https://phabricator.wikimedia.org/T253122 (CDanis) LGTM assuming we don't also configure `optimize-aggregate-frr`: https://kb.juniper.net/InfoCenter/index?page=content&id=KB34635&actp=METADATA
[14:17:06] "[...]is partially sponsored by the Department of Homeland Security (DHS) Science and Technology Directorat"
[14:17:11] I didn't know that was a thing
[14:18:45] XioNoX: i didn't either, but it doesn't exactly surprise me i guess
[14:19:25] netops, Operations: Set minimum-links 2 to AMS-IX LACP - https://phabricator.wikimedia.org/T253122 (jbond) LGMT
[14:21:13] I don't think I understand what Verfploeter is from the Abstract and the Conclusion
[14:21:55] XioNoX: it's, for every /24 that has one, list with an IP address that responds to ping requests, and a mass ping-er you run somewhere, and then a ping response collector you run on each of your anycast nodes
[14:22:07] and then some tools to do geocoding & data analysis
[14:22:46] ah ok!
[14:22:47] thanks
[14:23:54] just sends pings with a source address == your anycast address
[14:28:41] netops, Operations: Set minimum-links 2 to AMS-IX LACP - https://phabricator.wikimedia.org/T253122 (ayounsi) Open→Resolved Thanks! This will also help in case the wrong cable gets bumped into during the new link provisioning.
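Editor's sketch of the Verfploeter idea described at 14:21:55 to 14:23:54: pings go out with the anycast address as their source, so each echo reply lands on whichever anycast node routing prefers for that /24. The Python below covers only the analysis step, assuming the replies have already been collected per site. The function names, the site names, and the sample addresses (documentation ranges) are hypothetical and not taken from the tool itself.

```python
import ipaddress
from collections import Counter


def prefix24(ip):
    """Return the /24 network containing the given IPv4 address string."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def build_catchment(hitlist, replies_by_site):
    """Map each probed /24 to the anycast site that received its echo reply.

    hitlist: one responsive IPv4 address per /24 (the probe targets).
    replies_by_site: dict of site name -> source addresses of replies seen there.
    Replies from prefixes that were never probed are ignored.
    """
    probed = {prefix24(ip) for ip in hitlist}
    catchment = {}
    for site, sources in replies_by_site.items():
        for src in sources:
            net = prefix24(src)
            if net in probed:
                # Keep the first site seen for a prefix (assumption: one reply per /24).
                catchment.setdefault(net, site)
    return catchment


def site_shares(catchment):
    """Count how many probed /24s each site attracts."""
    return Counter(catchment.values())


if __name__ == "__main__":
    # Hypothetical data: documentation-range targets and made-up site names.
    hitlist = ["192.0.2.10", "198.51.100.7", "203.0.113.99"]
    replies = {
        "site-a": ["192.0.2.10", "203.0.113.99"],
        "site-b": ["198.51.100.7", "10.0.0.1"],  # 10.0.0.1 was never probed
    }
    for net, site in sorted(build_catchment(hitlist, replies).items()):
        print(net, "->", site)
```

Geocoding the /24s, the other analysis step mentioned in the chat, would be layered on top of the resulting map and is left out here.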
[15:55:20] bblack: PSA that we can now do Prometheus queries like https://w.wiki/RGh (if this returns grafana homepage, log in and try again)
[15:57:27] nice!
[15:57:37] why does this need a login?
[15:57:51] it's the /explore endpoint, which lets you query arbitrary metrics and queries
[15:57:52] the explore feature
[15:58:03] normally you need dashboard edit to do that
[15:58:56] * bd808 kind of misses the old fully public graphite data
[15:59:49] Traffic, Operations: track NIC firmware version numbers across the fleet - https://phabricator.wikimedia.org/T236744 (CDanis) Open→Resolved Prometheus metrics now exist, via a textfile exporter installed by Puppet on every physical host. Sample output for the metric on an LVS machine: ` # HELP...
[16:04:57] cdanis: awesome :)
[16:05:17] bd808: yeah, the tradeoff is you get to actually do computation in your queries
[16:05:40] we could probably make it public, I guess, but we'd want to check what we actually have lurking in our prometheis first
[16:05:41] bd808: my feeling re: graphite https://i.imgur.com/A356jiI.gif
[16:06:49] but in seriousness yes, we can also think about another prometheus instance that's public and then we're ok if folks go crazy on it, modulo what cdanis re: data
[16:28:10] netops, Operations: intermittent brief data dropouts for esams netflow data - https://phabricator.wikimedia.org/T253128 (CDanis)
[16:28:18] netops, Operations: intermittent brief data dropouts for esams netflow data - https://phabricator.wikimedia.org/T253128 (CDanis) p:Triage→Low
[16:30:09] Traffic, Operations: OCSP Stapling for Intermediates - https://phabricator.wikimedia.org/T148134 (Aklapper) >>! In T148134#2760707, @BBlack wrote: > Stalling this on further progress in the rest of the world (browsers implementations, TLS Cached Info, etc). 3.5 years after stalling this task, it seems t...
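Editor's sketch for the 15:59:49 message about NIC firmware metrics exposed through a textfile exporter: a minimal Python example of rendering a file in the Prometheus text exposition format and writing it atomically, so a scraper never reads a half-written file. The metric name `nic_firmware_info`, its labels, and the sample values are hypothetical; the actual metric is truncated in the log and is not reproduced here.

```python
import os
import tempfile


def escape_label(value):
    """Escape backslash, double quote and newline as the exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(firmware_by_nic, metric="nic_firmware_info"):
    """Render one gauge sample per NIC, carrying the firmware version as a label."""
    lines = [
        f"# HELP {metric} NIC firmware version exposed as a label (hypothetical name).",
        f"# TYPE {metric} gauge",
    ]
    for nic, version in sorted(firmware_by_nic.items()):
        labels = f'device="{escape_label(nic)}",version="{escape_label(version)}"'
        lines.append(f"{metric}{{{labels}}} 1")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temporary file in the same directory, then rename it over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Hypothetical sample values; a real collector would query the NIC driver.
    sample = {"eth0": "1.2.3", "eth1": "1.2.3"}
    with tempfile.TemporaryDirectory() as outdir:
        target = os.path.join(outdir, "nic_firmware.prom")
        write_atomically(target, render_metrics(sample))
        with open(target) as handle:
            print(handle.read(), end="")
```

Putting the version in a label on a constant-value gauge is a common way to expose string-valued facts, since Prometheus samples are numeric.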
[16:43:53] Traffic, Operations: Secure shared ticket key rotation for anycast authdns - https://phabricator.wikimedia.org/T240863 (BBlack) Open→Declined There's not much DoTLS adoption so far, and really our primary HTTPS termination needs this more than AuthDNS does, at which point we can just copy whateve...
[16:43:58] Traffic, netops, Operations, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (BBlack)
[17:34:10] netops, Operations: scrape ripe atlas data for a few anchors at other large networks - https://phabricator.wikimedia.org/T252890 (RLazarus) p:Triage→Medium
[19:28:46] netops, Operations: intermittent brief data dropouts for esams netflow data - https://phabricator.wikimedia.org/T253128 (ayounsi) Relevant [[ https://turnilo.wikimedia.org/#wmf_netflow/4/N4IgbglgzgrghgGwgLzgFwgewHYgFwhLYCmAtAMYAWcATmiADQgYC2xyOx+IAomuQHoAqgBUAwoxAAzCAjTEaUfAG1QaAJ4AHLgVZcmNYlO4B9E3sl6ASn...
[19:50:02] netops, Operations, ops-codfw: (Need by: End of July-2020 ) codfw:rack/setup/new management switches - https://phabricator.wikimedia.org/T253154 (Papaul) p:Triage→Medium a:Papaul
[19:51:44] Traffic, Operations, Wikimedia-General-or-Unknown, User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (Krinkle)
[19:51:55] Traffic, Operations, Wikimedia-General-or-Unknown, User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (Krinkle)
[20:39:34] Traffic, Operations: Whitelist x-wikimedia-debug header field (currently not allowed by Access-Control-Allow-Headers in preflight response) - https://phabricator.wikimedia.org/T252826 (BPirkle)
[20:51:03] Traffic, MobileFrontend, Operations, TechCom-RFC, Readers-Web-Backlog (Tracking): RFC: Remove .m. subdomain, serve mobile and desktop variants through the same URL - https://phabricator.wikimedia.org/T214998 (Krinkle)
[21:09:28] Traffic, netops, Operations, Patch-For-Review, Performance-Team (Radar): Anycast AuthDNS - https://phabricator.wikimedia.org/T98006 (BBlack) Status Update: Worked through a bunch of the software-level complexities today with getting bird::anycast to advertise an authdns IP from all the auth...