[08:41:50] ema,bblack: the new varnishkafka with -T 700 (max seconds to wait between Begin and End timestamps before marking a log incomplete) together with the new filter for Websockets upgrade req solved the inconsistency check issues that we had during the past weeks. I think that we might need to tune the -T and -L (maximum number of incomplete logs waiting for the timeout to expire) again for the uploa
[08:41:56] d migration, but overall I think that from the vk side we are good!
[08:42:10] \o/
[08:42:12] fantastic
[08:42:32] \o/
[08:42:51] elukey: great job :)
[08:44:11] Traffic, Analytics-Kanban, Operations, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2465314 (elukey) Ran again the query, no empty dt fields for the past hours too. The issue seems solved! We'll might need to tune a...
[08:44:19] Traffic, Analytics-Kanban, Operations: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2465315 (elukey)
[09:37:26] Traffic, Operations, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, and 2 others: Zero: Investigate removing the limit on carrier tagging to m-dot and zero-dot requests - https://phabricator.wikimedia.org/T137990#2465363 (ema) p:Triage>Normal
[10:07:34] elukey: awesome :)
[10:17:03] bblack: morning!
[10:17:21] very early morning :)
[10:17:32] bblack: thoughts on the Range/X-Range thing?
[10:18:16] feel free to reply after coffee/breakfast :)
[10:36:25] <_joe_> uhm, on palladium we have /var/lib/puppet/volatile/squid
[10:36:37] <_joe_> can you confirm we can safely remove those files?
[10:44:51] _joe_: those files look great from an archeological point of view
[10:45:03] <_joe_> we have backups :)
[10:45:32] then I'd say it's fine to remove them :)
[10:49:53] what about our secret plan to switch back to squid????
[10:50:12] bblack: a bunch of git reverts should do the trick
[10:53:12] ema: where's the X-Range thing at?
[10:53:45] bblack: I haven't prepared a proper CR yet, just this https://phabricator.wikimedia.org/P3440
[10:54:01] ema: also, I think we can do some v3-level pre-patching to upload VCL and avoid $reason_response and $unset_remove
[10:57:30] ema: as in rebase onto https://gerrit.wikimedia.org/r/#/c/299126/
[10:58:43] ooh unset is availabe in v3!
[10:59:36] yeah we use it all over the place, I donno why that one line was "remove"
[11:00:16] anyways, rebasing right quick
[11:01:49] cool, then we can get rid of the additional variable in https://gerrit.wikimedia.org/r/#/c/298744/
[11:03:25] oh yeah, you said that before :)
[11:03:45] yeah rebased
[11:04:06] wonderful
[11:07:03] food o'clock here
[11:18:01] HTTPS, Traffic, Operations, MW-1.27-release-notes, and 2 others: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#2465558 (Liuxinyu970226)
[11:29:49] ema: 4 patch chain now ending at https://gerrit.wikimedia.org/r/#/c/299130/ (X-Range stuff)
[11:30:18] (I put another generic one in the middle, to define vcl_backend_fetch per-cluster)
[11:41:32] bblack: not urgent or anything, but I would be interested to know why https://phabricator.wikimedia.org/T140206#2456629 didn't show up in the logs for insecure post traffic
[11:44:20] legoktm: if I had to guess, I'd say that code doesn't log in with a username? we only reported on authenticated users...
[11:44:32] no, it does not
[11:44:39] well, there you go :)
[11:44:41] makes sense then
[11:45:15] otherwise we'd be reporting users' IPs in public places, or if it's labs-based we'd just be reporting the IPs of random labs exec nodes that vary over time for the same tool
[11:45:28] right
[11:45:28] all API access should be logged-in for that reason alone, IMHO
[11:45:52] well, anything anyone considers important anyways
[11:46:15] we'll be looking at the same thing (really logged in) for ratelimiting at some point in the future, too.
[11:46:54] the idea being if it's a logged-in session we let the application layer handle any per-user ratelimits if it wants. But if it's unauthenticated traffic, we apply global conservative ratelimits to avoid anonymous abuse.
[11:47:02] (up in varnish)
[11:48:34] but back to dumpinterwiki.php specifically: apparently it's also not logging warnings or errors anywhere that anyone will look at or alert on...
[11:49:05] because we've been sending an API warning for a long time, and it's been randomly failing 10% of its requests with 403 for the past month too.
[11:53:50] yeah.
[11:53:59] I'll try and follow up on that later
[13:32:53] bblack: s/CORS/OK/ of course!
[13:33:18] so simple now that I see it but I don't think I would have thought of that :)
[13:33:40] I was too busy finding a way to set obj.response in v4 heh
[13:50:41] ema: I think we'll hit the same thing on text, same basic pre-patches
[13:51:01] oh maybe not
[13:51:47] just the restbase redirects hack has resp.response, but it's in deliver
[13:51:56] anyways, we'll see it when we get there
[14:02:10] bblack: minor comment about https://gerrit.wikimedia.org/r/#/c/299129
[14:02:26] that patch introduces a bunch of newlines
[14:02:50] which are a bit of a distraction when reading the compiled VCL
[14:03:13] perhaps we could use the ERB closing tag -%> instead of a newline?
[14:03:43] this is not beyond nit-picking
[14:04:06] hem, this *is* beyond nit-picking :)
[14:06:23] here the newlines I'm talking about: https://puppet-compiler.wmflabs.org/3351/cp1048.eqiad.wmnet/
[14:06:30] Traffic, Analytics-Kanban, Operations: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2465959 (Ottomata) NICE WORK!
[14:12:47] ema: yeah they suck, but I'd rather have the source be readable than the compiled output
[14:13:20] agreed, the source is more important
[14:16:53] bblack: I've changed the commit message of https://gerrit.wikimedia.org/r/#/c/299130/
[14:17:18] it will show up as your commit message so please double check what I said before I ruin your reputation :)
[14:30:27] ema: all looks mergeable to me, assuming v3 output is sane
[14:48:55] bblack: v3 output LGTM https://puppet-compiler.wmflabs.org/3355/cp1048.eqiad.wmnet/
[15:08:53] ema: for sub upload_common_set_[x]range ... should they be a in v4-only block in upload-common maybe? I don't think they hurt in v3, assuming they don't cause compilation failure because of some unknown variable name or whatever
[15:09:30] bblack: the unknown variable stuff should be a simple warning
[15:09:41] "unused", rather
[15:11:40] let me try that on my labs instance though
[15:26:55] yeah hopefully, since they're simple
[15:32:45] bblack: after a few issues downgrading to v3 (PEBKAC), it compiled
[15:36:19] it compiled! ship it!
[15:36:30] :)
[15:36:33] (the frontend)
[15:37:05] there might be another PEBKAC for the backend
[15:37:09] hang on a sec
[15:38:47] Error: (-spersistent): fallocate() for file /srv/vdb/varnish.main1 failed: No space left on device
[15:38:51] heh
[15:39:24] yeah those files are huge
[15:43:32] VCL compiled.
[15:43:56] how incredibly good it would be to have CI for this
[15:46:15] * elukey sees hashar runnin away screaming
[16:10:00] Traffic, Analytics-Kanban, Operations: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2466298 (Nuria) Open>Resolved
[16:12:58] bblack: feel free to merge today, otherwise I'm gonna do that on Monday :)
[16:13:06] o/
[16:14:35] ok