[10:15:40] I've filed an issue for the ATS problem that caused all cp-ats to end up with a full root partition: https://github.com/apache/trafficserver/issues/4635
[10:22:25] Traffic, Operations, Patch-For-Review: ATS: log inspection at runtime - https://phabricator.wikimedia.org/T204225 (ema) >>! In T204225#4761225, @ema wrote: > 1. trafficserver closes its open logpipes upon logging.yaml config reload s/closes/unlinks/. Bug filed upstream: https://github.com/apache/tra...
[11:33:41] ema: nice report <3
[15:16:09] bblack, ema: is T99531 on your radar?
[15:16:10] T99531: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531
[15:21:52] echo T99531 >> RADAR
[15:21:52] T99531: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531
[15:21:55] paravoid: it is now! :)
[15:23:49] :P
[15:27:53] paravoid: we'll discuss that during tomorrow's traffic meeting at the latest
[15:28:06] np
[15:30:30] meanwhile, it took me a couple of hours to figure out that the reason why we don't get -dbgsym packages built for trafficserver since we switched to llvm is that binaries generated by llvm do not have a BuildID unless you pass -Wl,--build-id to LDFLAGS
[15:31:15] and in turn dh_strip silently ignores executables without BuildID (even if you're building with DH_VERBOSE set)
[15:32:21] οθψη!
[15:32:24] ouch even :)
[15:32:29] how come we switched to llvm?
[15:32:31] it's apparently a "major new feature" https://bcain-llvm.readthedocs.io/projects/clang/en/release_39/ReleaseNotes/#major-new-features
[15:33:50] paravoid: heh, since version 8 ats needs C++17
[15:34:18] so in order to build against stretch we had to either backport gcc (lol) or use the llvm version in stretch-backports
[15:34:32] ah and stretch doesn't have gcc 7
[15:35:13] correct
[15:35:18] yeah makes sense
[15:37:03] paravoid: I'm not really here, but yes it's on the radar.
Ideally at this point we make it part of the certcentral certs (probably next Q realistically, to stick one of those onto the cache clusters just for this)
[15:37:27] paravoid: but... we can't do much until the zone is transferred, no matter which path we go.
[15:37:58] bblack: yeah, I just pinged you because there was a task update last week with zone contents etc.
[15:38:04] and the task went from Stalled to Open
[15:38:15] yeah
[15:38:16] https://phabricator.wikimedia.org/T99531#4746550
[15:38:38] so someone needs to bring that into ops/dns (as-is using their IPs), and then we need to go back to wmde about registration transfer.
[15:38:50] there's even a patch!
[15:39:32] once the reg xfer is done, then it's all on us to get LE certs up and going on the cache clusters (but yeah, realistically, next Q before we're pushing new certcentral-based certs to the caches)
[15:40:31] since LE will be DNS-based now, we can issue the certs ahead of switching server IPs too, so it makes the transition process simpler and smoother
[15:40:52] oh haha, I hadn't thought of that
[15:40:54] that's awesome heh
[15:41:42] anyway, it sounds like next step is someone from traffic responding to that task and possibly amending + merging that patch
[15:42:34] yeah, I can respond for now, maybe we leave the DNS fiddling for next week though
[15:44:57] what does BuildID contain? I have some guesses as to why it might be off by default in llvm/clang
[15:46:34] is it just a cryptographic hash?
[15:47:14] cdanis: https://fedoraproject.org/wiki/Releases/FeatureBuildId
[15:48:45] aha ty
[15:48:47] Traffic, Operations, Wikidata, wikiba.se, and 2 others: [Task] move wikiba.se webhosting to wikimedia cluster - https://phabricator.wikimedia.org/T99531 (BBlack) Thanks for the data and the patch! We'll dig into the DNS patch next week and get it merged in so we're serving wikiba.se from our DNS...
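Note on the BuildID question above: the build ID is stored in the ELF file as a note of type NT_GNU_BUILD_ID (3) with owner name "GNU", and Debian -dbgsym packages place detached debug files under /usr/lib/debug/.build-id/, keyed by that ID. The following is a minimal sketch, assuming the raw contents of the note section have already been extracted into a bytes object; the function names find_build_id and debug_path are made up for illustration and are not part of any tool mentioned in the log.

```python
import struct

NT_GNU_BUILD_ID = 3  # note type that carries the GNU build ID


def find_build_id(notes: bytes, little_endian: bool = True):
    """Scan raw ELF note data; return the build ID as hex, or None if absent."""
    fmt = "<III" if little_endian else ">III"
    offset = 0
    while offset + 12 <= len(notes):
        # Each note starts with three 32-bit words: namesz, descsz, type.
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, offset)
        offset += 12
        name = notes[offset:offset + namesz]
        offset += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = notes[offset:offset + descsz]
        offset += (descsz + 3) & ~3  # descriptor is padded the same way
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            return desc.hex()
    return None


def debug_path(build_id: str) -> str:
    """Path where a detached debug file keyed by this build ID would live."""
    return f"/usr/lib/debug/.build-id/{build_id[:2]}/{build_id[2:]}.debug"


if __name__ == "__main__":
    # Synthetic note with a made-up 4-byte ID (real IDs are usually 20 bytes).
    sample = struct.pack("<III", 4, 4, NT_GNU_BUILD_ID) + b"GNU\x00" + bytes.fromhex("deadbeef")
    build_id = find_build_id(sample)
    print(build_id, debug_path(build_id))
```

A binary linked without --build-id has no such note, so find_build_id returns None and there is no ID to key a debug file on, which matches the dh_strip behaviour described at 15:31:15.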
[15:49:07] bblack: cool, thx :)
[15:50:41] (and sorry for the ping + discussion, I hadn't realized you were off, my bad!)
[15:56:27] cdanis: I imagine llvm does not pass --build-id to be faster than gcc hehe
[15:57:54] haha my first thought was about build repeatability
[15:58:04] but yeah performance was my second thought
[16:05:29] hey look, debug symbols \o/ https://integration.wikimedia.org/ci/job/debian-glue/1331/
[16:48:47] Traffic, Operations, Wikimedia-Incident: Add maint-announce@ to Equinix's recipient list for eqsin incidents - https://phabricator.wikimedia.org/T207140 (RobH) There has been no notices from eq singapore to test this, and last note was from Vivian's update on Nov 16th. I emailed a reply in just now...
[17:06:30] Traffic, Operations, ops-eqsin: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) > Hi Rob, > Good day to you. > I am replying on behalf of my colleague Marco, as you have spoken earlier. > > I have created a new case number on the issue with the serve...
[20:20:29] Traffic, Operations, ops-eqsin: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) So Dell wants us to update the bios and return this to service to see if the error happens again. I'll flash the bios, and attempt to run memtest remotely and see if that w...
[20:50:50] when using varnishlog do you guys have a way of excluding the healthcheck/varnishcheck stuff?
[20:53:06] Traffic, Operations, ops-eqsin: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) The latency in pushing things to the mgmt network is pretty high, but it is working.
Updated the idrac firmware to 2.60 from 2.50, now updating bios from 2.5.4 to 2.8.0
[21:07:31] Traffic, Operations, ops-eqsin: cp5001 unreachable since 2018-07-14 17:49:21 - https://phabricator.wikimedia.org/T199675 (RobH) Bios updated, now running memtest86+ via Dell diagnostics boot option entry.
[21:17:55] on a semi-related note, I've got a new varnish server in deployment-prep that is responding with 503s to client requests. but varnishlog is not logging anything
[21:24:01] think it might need -n frontend
[21:24:22] yeah there we are
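Note on the two varnishlog questions above: the log does not record an answer to the healthcheck question, so what follows is only one possible approach, not what the channel actually uses. varnishlog takes -n to select a named instance (the "frontend" case at 21:24:01) and -q for a VSL query, which can drop requests by URL. The sketch below assembles such a command line; the helper name varnishlog_argv and the "/check" path are hypothetical placeholders.

```python
import shlex


def varnishlog_argv(instance=None, exclude_urls=()):
    """Build a varnishlog command line as an argv list.

    instance     -- value for -n (the varnishd instance name), if any
    exclude_urls -- request URLs to filter out with a -q VSL query
    """
    argv = ["varnishlog"]
    if instance:
        argv += ["-n", instance]
    # One string comparison per excluded URL, joined with "and".
    clauses = [f'ReqURL ne "{url}"' for url in exclude_urls]
    if clauses:
        argv += ["-q", " and ".join(clauses)]
    return argv


if __name__ == "__main__":
    # "/check" is a made-up healthcheck path; substitute the real one.
    argv = varnishlog_argv("frontend", ["/check"])
    print(shlex.join(argv))
```

The returned list can be passed directly to subprocess.run on a host where varnishlog is installed; shlex.join (Python 3.8+) is only used here to print a copy-pasteable command.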