[07:06:25] Wikimedia-Apache-configuration, Operations, HHVM, Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2446548 (elukey) After a bit of digging, it seems that the AH01075/AH01068 er...
[10:51:40] Traffic, Analytics-Kanban, Operations, Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2446844 (elukey) I checked again how VSL manages memory to establish the effect of the -T timeout (default 120 sec) and -L limit (...
[12:54:07] <_joe_> ema: around?
[12:54:37] _joe_: yep
[12:55:14] <_joe_> ema: so, I'd need help from either you or brandon to understand how to manage new pages that are being autogenerated from wikidata data
[12:55:19] <_joe_> on some smaller wikis
[12:55:36] <_joe_> are you available maybe tomorrow morning to start discussing it?
[12:55:47] sure thing
[12:58:26] _joe_: if you were thinking of a hangout it might be better to do it in the afternoon though, so that bblack can also join if he wants
[13:00:00] <_joe_> oh right
[13:00:02] <_joe_> he's back?
[13:00:11] I think he should be back today yes
[13:00:23] <_joe_> ok so let's wait for him
[13:16:16] maps upgraded to 4.1.3-1wm1
[13:19:10] Traffic, Operations, Continuous-Integration-Infrastructure (phase-out-gallium): Move gallium to an internal host? - https://phabricator.wikimedia.org/T133150#2447151 (hashar) integration.wikimedia.org (with Zuul and Jenkins) is going to migrate to scandium.eqiad.wmnet doc.wikimedia.org is looking fo...
[13:22:18] yeah I'm here
[13:22:31] bblack: hi! :)
[13:22:32] I have a long backlog of phab/gerrit/email/irc/etc to catch up on
[13:22:35] hi! :)
[13:23:05] <_joe_> hi! Welcome back!
[13:23:11] <_joe_> enjoyed your time off?
[13:23:20] yes! :)
[13:23:47] <_joe_> ok so I'll hold off on spoiling your comeback with wikidata until tomorrow :P
[13:24:21] ok :)
[13:41:53] welcome back! :)
[13:46:11] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2447205 (BBlack) The cutoff date is coming up tomorrow! One more list update, from the past 48H: New usernames not seen before: ``` HWY...
[14:00:29] Traffic, Analytics, MediaWiki-extensions-CentralNotice, Operations: Generate a list of junk CN cookies being sent by clients - https://phabricator.wikimedia.org/T132374#2447244 (BBlack) Yes, we can help wipe these out at the Varnish layer, by unsetting blacklisted cookies we see. We've done that...
[14:07:53] Traffic, Varnish, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 5 others: Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) - https://phabricator.wikimedia.org/T45250#2447263 (fgiu...
[14:12:17] Traffic, Varnish, Fundraising-Backlog, MediaWiki-extensions-CentralNotice, and 5 others: Redo /beacon/impression system (formerly Special:RecordImpression) to remove extra round trips on all FR impressions (title was: S:RI should pyroperish) - https://phabricator.wikimedia.org/T45250#2447284 (BBla...
[15:11:39] Traffic, Operations, Performance-Team, Patch-For-Review, perfnotice: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2447594 (BBlack) {F4262192}
[15:49:04] Hey ema. Did you get my questions from the last few days?
[15:51:44] Traffic, Operations, fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2447784 (BBlack)
[15:54:50] Traffic, Operations, fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2447794 (BBlack) @Jgreen thanks for working on this! I've re-audited all the Fundraising wikimedia.org hostnames, updated https://wikitech.wikimed...
[15:57:00] Snorri_: hi! I don't actually remember, what did you ask?
[15:59:29] ema: I had 2 questions. First: Are the hash functions in use evenly distributed? (I guess they are, but still ;) ) Second: Are the caches purged/flushed every 24h? My crawlers suggest this but this wouldn't make sense regarding the 7 days TTL.
[16:00:04] Traffic, Operations, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447812 (BBlack) I'd like to share keys in the long run, but I think sh for port 80 is the right move for now. It will also clear up confusion on our TFO success/fail stats in ge...
[16:11:35] HTTPS, Traffic, Operations, Wikimedia-Blog: Switch blog to HTTPS-only - https://phabricator.wikimedia.org/T105905#2447858 (BBlack) If there's no real cost to do so, it would be ideal to ask them to switch our VIP for blog.wm.o to HTTPS-by-default and LetsEncrypt (as the latter will save us some m...
[16:19:20] Traffic, Operations, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447888 (BBlack) Another thing just occurred to me though - until we switch port 80 to nginx or patch our varnish, we don't have TFO support on port 80 regardless, as varnish does...
[16:20:10] Traffic, Operations, Patch-For-Review: Switch port 80 to nginx on primary clusters - https://phabricator.wikimedia.org/T107236#2447893 (BBlack)
[16:20:12] Traffic, Operations, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447892 (BBlack)
[16:22:06] Traffic, Operations, Patch-For-Review: Investigate TCP Fast Open for tlsproxy - https://phabricator.wikimedia.org/T108827#2447901 (ema) >>! In T108827#2447888, @BBlack wrote: > Another thing just occurred to me though - until we switch port 80 to nginx or patch our varnish, we don't have TFO support...
[16:23:22] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2447906 (Whatamidoing-WMF) I've contacted the newest three. I'm going to post general messages to all the WP:BOTN pages and a few VPTs a...
[16:31:25] Traffic, Operations, Performance-Team: Support brotli compression - https://phabricator.wikimedia.org/T137979#2447957 (Gilles) It might be a good idea to experiment with this locally using our real content, to see what kind of gains we'd be looking at. SDCH+gzip might be worth looking into as well....
[16:36:35] Traffic, Operations, Performance-Team: Support brotli compression - https://phabricator.wikimedia.org/T137979#2447965 (BBlack) I agree that SDCH has better upsides (for supporting clients), it just also seems like a much larger effort to turn it on and get it tuned, and I have no idea how we'd integr...
[16:39:28] ema: bblack: https://github.com/openresty/nginx-systemtap-toolkit looks cool
[16:41:11] Traffic, Operations, Performance-Team: Support brotli compression - https://phabricator.wikimedia.org/T137979#2448019 (Gilles) Actually it's probably Linkedin, not Facebook that this guy works for. I pieced it together from his HN history, he often comments on Apache Traffic Server, which Linkedin i...
[16:43:01] Traffic, Operations: Evaluate Apache Traffic Server - https://phabricator.wikimedia.org/T96853#2448043 (BBlack) http://www.slideshare.net/thenickberry/reflecting-a-year-after-migrating-to-apache-traffic-server
[16:43:16] Traffic, Operations, Performance-Team: Support brotli compression - https://phabricator.wikimedia.org/T137979#2448048 (BBlack) Nice ATS link! Added to T96853
[16:44:24] ori: nice find, looks super useful :)
[16:45:27] yes! embedding systemtap inside perl should be illegal, but very interesting indeed
[16:50:39] Snorri_: we normally don't flush the caches, no. As for the hashing part, do you want to know if the hashing function distributes clients evenly among nodes? I guess so, it would be a pretty sad hashing function otherwise :)
[16:54:10] just making sure ;) the flushing is very strange. My probes are ~2h apart for each website and they normally stay in the cache the whole time (even sites that, other than by my crawler, are used pretty much twice a day) but every 24h they are removed from the cache. Looks like a flush, or an offline project crawling the whole wikipedia once every 24 hours or ... something like that.
[16:58:32] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448199 (BBlack) @Whatamidoing-WMF Thanks! I'm still getting caught up a bit from being on vacation.... The original plan (and still th...
[17:00:08] Snorri_: how are you determining when the pages are removed from cache?
[17:00:28] Snorri_: our front-most caches have a maximum TTL of 1 day for all objects, but deeper caches are allowed to cache longer if nothing else prevents it.
[17:01:39] bblack: That solves everything! A TTL of 1 day on the mem-cache and 7 days on disc. Perfect! Thanks a lot!
[17:02:55] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2343854 (AryanSogd) What should I do to make sure my bot (AryanBot) is not broken? >>! In T136674#2447906, @Whatamidoing-WMF wrote: > I'v...
[17:02:58] (determining a removal through the Age value. If it was 23h and the next probe 2 hours later from the same IP finds an Age lower than 23h, it had to have been removed from this cache)
[17:03:04] Traffic, Operations, fundraising-tech-ops: Fix nits in Fundraising HTTPS/HSTS configs in wikimedia.org domain - https://phabricator.wikimedia.org/T137161#2448253 (Jgreen) >>! In T137161#2447794, @BBlack wrote: > @Jgreen thanks for working on this! I've re-audited all the Fundraising wikimedia.org ho...
[17:08:31] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448299 (AryanSogd) What should I do to make sure my bot (AryanBot) is not broken?
[17:09:55] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448315 (Elitre) Quoting a linked list message: - The simple solution is to simply include the "rawcontinue" parameter with your requ...
[17:11:41] Snorri_: well, Age should go longer than a day, as Age is transitive through the caches.... there could yet be some artificial reason we're unaware of that Age almost never goes over a day.
[17:12:55] Snorri_: we've taken some stats samples ourselves on Age before, though, and found it does drop off pretty quickly in practice on actual served requests
[17:13:24] Snorri_: https://phabricator.wikimedia.org/T124954
[17:14:01] but (at least back then!) we didn't see a big cliff in the Age data on the text clusters until somewhere between 7 and 14 days out.
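The probe-based check Snorri_ describes at 17:02:58 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `find_cache_resets`, the five-minute tolerance, and the sample probe values are all invented here and do not come from the log. The idea is that if an object stayed cached, its Age header should grow by about the wall-clock time between two probes; a clearly lower Age means a fresher copy was served. Since Age is transitive through the cache layers, such a reset says the object was refetched or purged somewhere, not which layer dropped it.

```python
from datetime import datetime, timedelta


def find_cache_resets(probes, tolerance=timedelta(minutes=5)):
    """Return the (probe_time, age) pairs where Age is lower than expected.

    probes: list of (probe_time, age) tuples sorted by probe_time, where
    age is a timedelta parsed from the HTTP Age response header.
    tolerance: hypothetical slack for clock skew and request latency.
    """
    resets = []
    for (prev_time, prev_age), (cur_time, cur_age) in zip(probes, probes[1:]):
        # If nothing was evicted or purged, Age grows with wall-clock time.
        expected_age = prev_age + (cur_time - prev_time)
        if cur_age < expected_age - tolerance:
            resets.append((cur_time, cur_age))
    return resets


# Made-up sample data: probes two hours apart, with one reset in between.
start = datetime(2016, 7, 1, 0, 0)
probes = [
    (start, timedelta(hours=21)),
    (start + timedelta(hours=2), timedelta(hours=23)),
    (start + timedelta(hours=4), timedelta(minutes=30)),
    (start + timedelta(hours=6), timedelta(hours=2, minutes=30)),
]

for probe_time, age in find_cache_resets(probes):
    print(f"Age reset seen at {probe_time}: Age was only {age}")
```

Run over a longer series of probes for one URL, the spacing between reported resets gives a rough estimate of how long the object survives, which is the 24h pattern under discussion.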
[17:15:47] bblack: This depends. If I understand it correctly it works like this: I request a site no one has yet requested anywhere. So it moves to esam (in my case) mem-cache, data cache, eqiad data cache, app-server. All 3 caches get the site stored. After that I probe the site every 2h. Every day the mem-cache responds as it already got the site. After 24h its TTL is up and it is purged. The next request will look in the data cache again. But
[17:15:47] as 24 hours of traffic happened the site is already gone, so it works just like I started. Right?
[17:19:20] Traffic, Operations, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2448487 (AryanSogd) Thank you, Elitre.
[17:28:39] Snorri_: not sure, I have a meeting starting up in a few minutes, but will circle back to this a little later. It's unlikely, though, that our backend disk caches cycle naturally in 24H due to regular traffic. They're probably oversized compared to the entire dataset.
[17:30:23] bblack: I do have some (very few) sites that sometimes get a higher than 24h Age value. So not all sites are like this. But it looks like most of them, as well as some of the "semi-popular" sites. Like "Cat" or "Dog".
[17:30:30] And thanks a lot for the help!
[19:04:51] Snorri_: if we wanted to see natural rollover times for the cache infra as a whole, probably the best testcase would be to find pages that are (a) popular enough that they're definitely hit many times an hour (there's lots of these to pick from) and (b) not edited frequently and don't reference frequently-edited templates (so we know our PURGE mechanism isn't routinely invalidating them)
[19:05:43] Snorri_: the cache storage is shared between all the projects/languages/articles, so the rollover time is going to be the same for them all. We could also pick a very very unpopular page that nobody ever views or edits, and keep polling it consistently to keep it warm in the caches ourselves, and track its rollover
[19:07:23] Snorri_: I tend to think we're not rolling over the disk caches just due to size limits, though. If even an unpopular page that's not being frequently-purged never lives longer than 24h while we regularly poll it, then it has to be due to mediawiki's header policy or PURGE traffic hitting it for some unknown reasons
[19:10:31] bblack: I tried to consider both for my thesis. I do have sites used quite frequently. (Or so I hope) I will check tomorrow through my data sets what sites like "Cat", "Dog", the main page and so on say.
[19:12:52] bblack: Also I do have sites almost never visited. Like a very small village in Germany. Both should not be edited frequently. So I can check them for the provided data. :) I'll let you know what I can find out!
[19:19:39] Snorri_: keep in mind there's indirect purges from templates and linked data, too. Some things (probably especially cities) include all kinds of standard frequently-updated templates, or reference wikidata data that gets updated routinely, etc.
[19:19:54] Snorri_: (in which case the article might not be edited often, but it may still get purged and re-parsed often)
[19:24:46] bblack: These indirect purges could be problematic. But if the sites are really purged every 24h even though this should not happen due to the size and no updates, then it would be possible to check on indirect purges. Am I right?
[19:34:46] Snorri_: the only way I know to check purges is to actually log the purge traffic, which is immense due to other ongoing related challenges
[19:35:21] Snorri_: but if we had just a handful of specific articles to watch, we could filter down on that and just log purges on those for a week or two and try to figure out why they were purged at those times
[19:36:35] (re purge rate issues, see also https://phabricator.wikimedia.org/T124418 -> https://phabricator.wikimedia.org/T133821 )
[19:37:17] but current thinking is that the high rate of total purges doesn't necessarily mean every article is being purged - a *lot* of the raw purge requests are probably duplicates for certain heavily-hit categories of pages, etc...
[19:38:53] bblack: Well...my bachelor thesis is (hopefully) written by the end of the month. So this might not make it into my thesis. But that doesn't mean I won't be interested in it after that. So this might be something to consider. Narrowing it down to a few interesting sites (frequent and just probed) might be something to do!
[19:44:59] Traffic, Operations: Content purges are unreliable - https://phabricator.wikimedia.org/T133821#2449554 (BBlack) >>! In T133821#2352086, @ori wrote: >>>! In T133821#2245711, @BBlack wrote: >> However, we reverted this because it seemed to make the race issues worse at the time. > > How did you know? Bec...
[19:50:37] Traffic, Operations, Community-Liaisons (Jul-Sep-2016), Patch-For-Review: Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2449600 (BBlack) The patch link above is pretty self-descriptive, and I'm planning to deploy that tomorrow. Will u...
[21:51:42] Traffic, Operations, Continuous-Integration-Infrastructure (phase-out-gallium): Move gallium to an internal host? - https://phabricator.wikimedia.org/T133150#2450319 (Dzahn) >>! In T133150#2447151, @hashar wrote: > `doc.wikimedia.org` is looking for a new home. Ganeti VM ?