[00:29:26] apparently this can end up being the challenge value: -TRta7bBXdBMGITJASxX13gBsPeUbYn9HRqRg9sHmYY
[00:29:33] which causes problems for command handling
[00:35:21] https://phabricator.wikimedia.org/P7529
[00:35:41] (personal subdomain in use because apparently I'm not allowed to make NS records out of designate)
[00:43:46] so yeah, along with the _acme-challenge auto-removal thing that's something that'll need to be handled in gdnsd somehow
[00:44:41] oh it looks like I might be able to do --
[00:44:44] hmm
[00:44:50] ye
[00:44:51] yes
[01:00:37] https://phabricator.wikimedia.org/P7530
[01:00:38] ta-da
[02:05:44] opened https://phabricator.wikimedia.org/T204013 and the upstream one linked there for my designate NS record problem
[08:32:30] Traffic, Horizon, Operations, Upstream: Horizon Designate dashboard not allowing creation of NS records - https://phabricator.wikimedia.org/T204013 (ema) p:Triage>Normal
[08:53:10] Traffic, Operations, Patch-For-Review: Deploy initial ATS test clusters in core DCs - https://phabricator.wikimedia.org/T199720 (ema)
[08:53:13] Traffic, Operations: Evaluate Apache Traffic Server - https://phabricator.wikimedia.org/T96853 (ema) Open>Resolved a:ema This can be closed now that we have: deployed two test clusters running ATS and routing traffic to all our applications, gained basic operational experience with it, verifi...
[11:01:59] Krenair: the _acme-challenge autoremoval is already merged. The leading dash is an interesting gotcha that will break scripts with I guess a probability of ~1/64 challenges if integrators don't think of it
[11:02:40] I wonder if there's something I can change with getopt() to make it work better
[11:04:50] yeah there is
[11:04:56] Two other modes are also implemented.
If the first character of optstring is '+' or the environment variable POSIXLY_CORRECT is set, then option processing stops
[11:05:16] so I can do that, and it won't try to process -TRta7bBXdBMGITJASxX13gBsPeUbYn9HRqRg9sHmYY after it's already stopped at acme-dns-01
[11:08:27] I'm fine with the -- trick
[11:10:51] yeah but it will be a common FAQ thing, I'd have to at least document that anyone writing a script driving it should always stick the -- in there
[11:11:19] ok
[11:11:22] now I'm lost in the rabbithole of +/POSIXLY_CORRECT being a GNU thing and how this works on FreeBSD heh
[11:11:39] is gdnsd expected to run on FreeBSD?
[11:13:39] yes
[11:14:08] I don't test it myself usually, but I try to be cognizant of portability and look at their docs where it matters, because others do
[11:15:07] Allan Jude maintains a ports package for it: https://www.freshports.org/dns/gdnsd/
[11:15:36] that hasn't updated in forever heh, but he's recently filed bugs
[11:17:27] so yeah the + hack is GNU, but POSIXLY_CORRECT should work
[11:17:49] (and then FreeBSD and Linux will both treat it the same way and the literal -TRta7bBXdBMGITJASxX13gBsPeUbYn9HRqRg9sHmYY works)
[11:35:20] does FreeBSD recognise the --?
[11:35:42] yeah it does, but it also doesn't need it
[11:36:04] basically you never would've run into the problem in the first place on FreeBSD, it would've just worked
[11:36:12] I just had to make sure the workaround for GNU libc didn't mess that up :)
[11:36:42] FreeBSD and the POSIX standards stop looking for -options after the first non-option string like "acme-dns-01"
[11:37:10] GNU libc by default does something different: it permutes the argument array, shifting non-option arguments like "acme-dns-01" out towards the end while searching for more -options beyond them.
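The parsing modes discussed above can be tried out without a C toolchain. The following is a minimal sketch using Python's standard `getopt` module as a stand-in for C `getopt(3)`: `getopt.getopt` stops at the first non-option argument (the POSIX/FreeBSD behaviour described above), while `getopt.gnu_getopt` keeps scanning past non-options by default and honours both a leading `+` in the option string and `POSIXLY_CORRECT`. It is not gdnsdctl's actual parser; the option string `"D"` and the domain `example.org` are assumptions made up for illustration.

```python
import getopt
import os

# Hypothetical option set for illustration; not gdnsdctl's real flags.
SHORTOPTS = "D"
# Challenge value quoted in the log; the leading "-" is the troublemaker.
CHALLENGE = "-TRta7bBXdBMGITJASxX13gBsPeUbYn9HRqRg9sHmYY"
# "example.org" is a placeholder domain.
ARGS = ["acme-dns-01", "example.org", CHALLENGE]


def parse(parser, args, shortopts=SHORTOPTS, posixly_correct=False):
    """Run a getopt-style parser; return (opts, rest), or None on a parse error."""
    saved = os.environ.pop("POSIXLY_CORRECT", None)
    if posixly_correct:
        os.environ["POSIXLY_CORRECT"] = "1"
    try:
        return parser(args, shortopts)
    except getopt.GetoptError:
        return None
    finally:
        # Restore the caller's environment exactly as it was.
        os.environ.pop("POSIXLY_CORRECT", None)
        if saved is not None:
            os.environ["POSIXLY_CORRECT"] = saved


results = {
    # POSIX style: stop at the first non-option ("acme-dns-01").
    "posix": parse(getopt.getopt, ARGS),
    # GNU default: scan past non-options, so the challenge looks like options.
    "gnu default": parse(getopt.gnu_getopt, ARGS),
    # GNU with a leading '+' in the option string.
    "gnu with '+'": parse(getopt.gnu_getopt, ARGS, "+" + SHORTOPTS),
    # GNU with POSIXLY_CORRECT set in the environment.
    "gnu with POSIXLY_CORRECT": parse(getopt.gnu_getopt, ARGS, posixly_correct=True),
    # GNU with an explicit "--" before the positional arguments.
    "gnu with '--'": parse(getopt.gnu_getopt, ["--"] + ARGS),
}

for mode, outcome in results.items():
    print(mode, "->", "rejected" if outcome is None else outcome[1])
```

In this sketch only the GNU default mode rejects the arguments, because it tries to read the challenge as a bundle of unknown short options. Every other mode hands all three positional arguments back untouched, which matches the reasoning in the log: either `--` or a POSIX-style parse keeps a dash-prefixed challenge safe.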
[11:42:50] fun
[11:43:02] in my case the command will now be
[11:43:53] /usr/bin/gdnsdctl -- acme-dns-01 somedomain somechallenge somedomain -someotherchallenge
[11:44:43] hopefully this will all work on systems other than debian stretch though that's the only one I've tested
[11:44:57] btw, I've had to make a few fixes which will need to be reviewed
[11:46:42] who should do that in valentin's absence?
[11:55:40] I guess me!
[12:07:47] ok. one of them has a test failure I haven't figured out yet
[12:49:37] ok done
[12:58:11] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531 (abian) In case you want to analyze the situation, wikiba.se is down right now.
[13:00:46] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531 (Addshore) Indeed it is: {F25772293} $ ping wikiba.se ``` Pinging wikiba.se [89.31.143.100] with 32 bytes of data: Request timed out. R...
[13:02:43] Traffic, Operations, Goal, Patch-For-Review: Deploy a scalable service for ACME (LetsEncrypt) certificate management - https://phabricator.wikimedia.org/T199711 (Krenair) Got a bunch of patches open that need review, in no particular order: * This one is needed to implement the outcome of the abo...
[14:04:04] Domains, Traffic, Operations: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (tramm)
[14:07:35] bblack, realised something about my puppetisation for this
[14:07:53] we're going to need puppet to notify our API service when it creates a new authorised host file
[14:08:37] right now it does File <<| tag == 'certcentral-authorisedhosts' |>> to pull in those but that has no relation to the service...
[14:12:39] oh wait
[14:12:44] the @@file that creates it
[14:12:51] notify => Base::Service_unit['uwsgi-certcentral']
[14:13:33] I wonder if that tries to do the notify on the node that imports the file resource
[14:20:57] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531 (Dzahn) these are the contact options listed on the site at http://wikiba.se/contact/ (in case it's down) to find an admin: //It's also po...
[18:04:06] ema, bblack: I'd love to have your great minds have a look at https://phabricator.wikimedia.org/T202765
[18:04:40] for context, we have a bot that generates too much load on wdqs, but evades our throttling by using multiple IPs
[18:05:46] We're not expecting any changes in transit/peering traffic, right? only transport links
[18:10:18] gehel: I get "Access Denied: Restricted Task"
[18:10:43] ema: ofc :/
[18:11:17] ema: added you as a subscriber, is that enough?
[18:12:01] XioNoX: all esams<->eqiad traffic will become esams<->codfw, which I think means that the answer to your question is "yes, only transport links"
[18:12:30] gehel: it works, yep
[18:13:18] ema: to summarize the issue: we have a service which does not scale (wdqs), a fairly naive throttling mechanism, and a bot which misbehaves
[18:13:43] gehel: how about requiring bots to have a decent UA string? :)
[18:14:29] yeah, problem is defining "decent" and still accepting browsers on the same endpoint
[18:14:49] doh, browsers!
[18:16:33] XioNoX: we will however also depool eqiad from frontend traffic, so shifts in transit/peering expected too?
[18:18:52] ema: ok! good to know
[18:19:25] ema: depool during the whole duration of the switchover or temporarily?
[18:20:05] XioNoX: the whole thing AFAIU
[18:20:26] rgr!
[18:21:29] ema: then keep in mind we will only have codfw active in the US when we move ulsfo
[18:22:13] XioNoX: when will that happen, again?
[18:22:47] ema: last week of sept, most likely monday/tuesday
[18:23:24] we might want to repool eqiad temporarily when we depool ulsfo, probably a good traffic meeting topic
[18:24:38] XioNoX: so nowadays ulsfo is not that busy thanks to eqsin, so *maybe* it's not a tragedy to depool it while eqiad is also depooled. Definitely a good topic for Thursday though :)
[19:36:30] Domains, Traffic, Operations, Patch-For-Review: Move wikimedia.ee under WM-EE - https://phabricator.wikimedia.org/T204056 (Dzahn) a:Dzahn
[19:41:26] XioNoX: there's transit changes too, from depooling the eqiad edge
[19:41:52] (transit load shifting from eqiad transits/peers to codfw/eqdfw transits/peers)
[19:43:19] yep
[19:43:31] keeping an eye on them
[19:43:41] thankfully we have aggregate commits
[19:43:53] gehel: in general we can't set rules on UA strings that make perfect logical sense. We can ask that well-behaved UAs please send us informative ones for debugging. But if we try to do something like "reject/throttle insane/short/empty/stupid UA strings", there ends up being no great definition for such UA strings.
[19:44:05] (we've been through this a few times before on other services)
[19:46:11] re: ulsfo depool vs eqiad depool - yeah we can talk about it thursday, but it will probably be fine loadwise. It's not ideal in terms of US redundancy, but I think we can hold on just-codfw until/unless we have an edge problem in codfw that necessitates spinning eqiad back up
[19:47:14] loadwise it should be fine, anyways
[19:47:37] (but maybe the killer issue will be transit capacity?)
[19:48:17] requests-wise it's peaks of ~11k rps
[19:54:51] I'll look at transit capacity, I *think* we should be fine
[19:55:20] fyi, hover over the ports on the right to see current usage in codfw: https://librenms.wikimedia.org/bill/bill_id=15/ but we're far from any issue
[19:58:03] yeah, moving ulsfo's load to codfw still is far from any saturation
[21:11:14] Traffic, Operations, Performance-Team, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) @kaldari Indeed. Problem is that the number of affected pages was...
[22:28:44] Traffic, Operations, Performance-Team, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Nemo_bis) Google is known to not respect our canonical URLs, see {T93550}. T...
[22:41:50] Traffic, Operations, Performance-Team, Wikimedia-General-or-Unknown, and 2 others: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (kaldari) >Problem is that the number of affected pages was somewhere > 500,0...