[00:22:49] AaronSchulz: ping
[00:26:58] meow?
[00:27:10] * AaronSchulz needs more catitude
[00:27:11] AaronSchulz: Aye, I'm looking into MediaWiki filecache.
[00:27:21] AaronSchulz: Hoping maybe you know it a bit better than I do.
[00:28:28] AaronSchulz: It's quite basic at the moment (raw file contents, with the disk file mtime used to "store" the last-modified metadata), so I'm not quite sure how to make it support content that is keyed by content hash rather than timestamp (e.g. like ETag HTTP headers instead of Last-Modified)
[00:29:14] I was thinking a separate metadata file, or maybe wrapping the file in JSON with a "content" key and one for the metadata. That'll make the file a bit bigger, probably create some serialisation artefacts, and increase the file size of such a cache.
[00:29:19] Is that a good way to do it?
[00:29:52] Krinkle: why not use APC?
[00:30:14] ori: FileCache is for entire HTTP responses.
[00:30:18] We don't use it in prod.
[00:30:35] The current implementation only has a file system backend.
[00:30:55] I don't really care what backend it uses, but I don't think it's a priority right now to implement different storage backends for it.
[00:31:10] ok
[00:31:14] we probably want better persistence guarantees than APC or other object caches, though
[00:31:43] without a cache for it, third parties without Varnish will be terribly slow.
[00:32:34] there was a trick for having nginx serve the file cache directly and bypass PHP entirely, which made it pretty fast
[00:32:41] We essentially cache very little. Mostly client-side (browser or proxy) right now.
[00:33:03] how did you determine that this is a problem?
[00:33:05] legoktm: for page views?
[00:33:20] Krinkle: yup, let me find it..
[00:33:46] ori: The problem is that ResourceFileCache is timestamp-based, so my WIP patch just breaks for installs that have the file cache enabled.
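The "wrap the file in JSON" idea floated above can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki's actual FileCache code; the function names and the `hash`/`content` key layout are assumptions for illustration.

```python
import json

def write_cache_entry(path, content, content_hash):
    # Hypothetical sketch of the "wrapped file" idea: store the response
    # body and its content hash together in a single JSON document, so the
    # ETag can be validated without a separate metadata file or mtime trick.
    with open(path, "w") as f:
        json.dump({"hash": content_hash, "content": content}, f)

def read_cache_entry(path):
    # Reading the hash requires loading and parsing the whole file --
    # exactly the serialisation overhead mentioned in the discussion.
    with open(path) as f:
        entry = json.load(f)
    return entry["content"], entry["hash"]
```

The downside noted in the chat is visible here: even a cheap hash check has to deserialize the entire blob.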
[00:34:04] Krinkle: https://www.mediawiki.org/wiki/Manual:File_cache#Serving_cached_pages_directly
[00:34:06] The quick solution is to store the hash somewhere.
[00:34:19] legoktm: Oh, ok. That's a possible later step. This is about load.php responses.
[00:34:22] store it in the database
[00:34:27] ah
[00:34:36] ResourceFileCache, not HtmlFileCache
[00:34:51] ori: Hm.. content and metadata of cache objects in separate places?
[00:35:13] update.php clears the object cache, it doesn't clear the file cache.
[00:35:14] * AaronSchulz leans towards metadata files but would have to think about that
[00:35:30] too bad xattr is so spotty ;)
[00:35:42] Even separate files are somewhat icky, but I'm fine either way (separate meta file or wrapped file)
[00:36:03] Krinkle: theoretically you'd bump $wgCacheEpoch whenever you run update.php...
[00:36:20] legoktm: lol, no?
[00:36:21] who does that
[00:36:50] Krinkle: $wgCacheEpoch = max( $wgCacheEpoch, gmdate( 'YmdHis', @filemtime( "$IP/LocalSettings.php" ) ) );
[00:36:59] that's in Setup.php
[00:37:10] so unless you turned that off, it should get bumped
[00:37:23] Krinkle, you could also have a header in the file, before the gzipped content, that holds the hash
[00:37:51] that way you do not need to deserialize the blob to check the ETag
[00:38:52] Right
[00:39:01] that also avoids having to keep two files in sync
[00:39:15] Ah, right. The benefit of a meta file and mtime is that it's outside the file
[00:39:23] and doesn't require loading the file off disk every time
[00:39:24] good point
[00:39:58] Hm.. actually.. Maybe I can just change it to treat mtime as an Expires instead of Last-Modified.
[00:40:32] We change URLs anyway when there's new content.
[00:40:45] and in other cases we always use the full max-age.
[00:41:20] So I guess we can ignore the ETag, just like browsers ignore the ETag during the Expires period before they query the server.
[00:41:31] * Krinkle ponders.
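AaronSchulz's hash-header suggestion (a header before the gzipped content, so the ETag can be checked without decompressing the blob) could look something like this. A hedged Python sketch with hypothetical names and a hypothetical fixed-width hash field, not the patch actually written:

```python
import gzip

# Assumed layout: a fixed-length ASCII hash, then the gzipped body.
HASH_LEN = 40  # e.g. a hex SHA-1 digest

def write_with_header(path, content_hash, body):
    assert len(content_hash) == HASH_LEN
    with open(path, "wb") as f:
        f.write(content_hash.encode("ascii"))  # plain-text header
        f.write(gzip.compress(body))           # compressed payload

def read_hash(path):
    # Cheap ETag check: read only the first HASH_LEN bytes,
    # never touching (or decompressing) the payload.
    with open(path, "rb") as f:
        return f.read(HASH_LEN).decode("ascii")

def read_body(path):
    # Full read path: skip the header, then decompress.
    with open(path, "rb") as f:
        f.seek(HASH_LEN)
        return gzip.decompress(f.read())
```

As noted in the chat, this keeps content and metadata in one file (nothing to keep in sync) while still making the revalidation check cheap.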
[00:42:32] yeah, the hash is indeed in the gz file name (I haven't looked at this in a while), which lets us do some tricks
[00:42:49] Oh yeah, that too. The file name itself.
[00:43:19] though that's just the context hash right now
[00:43:21] which represents either a versioned URL (which contains the version hash already), or an unversioned URL (which we unconditionally cache regardless of changes)
[00:43:43] Krinkle, does your patch compare hashes all the time or augment that with a blind TTL?
[00:43:51] As long as it includes modules and version, we're good from a cache-expiration perspective
[00:44:02] AaronSchulz: The TTL is handled by the clients, not the server.
[00:44:09] However, for the file cache we'll need to handle it ourselves.
[00:44:25] For regular HTTP caching by Varnish/browsers we don't need to do anything other than set Expires:
[00:44:35] Yeah, I'll give it a blind TTL.
[00:44:54] right now it (sometimes) compares timestamps.
[00:48:13] AaronSchulz: there's some odd magic going on here though https://github.com/wikimedia/mediawiki/blob/master/includes/resourceloader/ResourceLoader.php#L794-L807
[00:54:52] * AaronSchulz wishes getVersion had more docs
[01:01:02] Krinkle, soo, mostly you just need to avoid isCacheGood() if the URL is versioned, since a hit implies an ETag match?
[01:01:40] Hm.. but isCacheGood() is given a timestamp relative to now, not to the Last-Modified header
[01:01:43] time() - maxage
[01:01:47] wait, this is already doing a blind TTL?
[01:01:59] yes
[01:02:07] it's just subtracting it out to accomplish that
[01:02:17] right, so it doesn't need any changing at all
[01:02:21] :)
[01:02:35] except for the parameter passed to tryRespondLastModified()
[01:02:39] currently $ts
[01:02:49] Which in my patch is $etag
[01:02:56] Will need to add some abstraction.
[01:03:01] Arrr. no, right.
[01:03:28] will people use IMS *and* ETag together? I assume not
[01:03:43] It would work if the cache were a consumer of MW directly, but it's not.
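The "blind TTL" behaviour being described (isCacheGood() given `time() - maxage` as its cutoff) amounts to a pure freshness check on the file's mtime. A rough Python analogue, under the assumption that this mirrors the logic discussed, not MediaWiki's real implementation:

```python
import os
import time

def is_cache_good(path, max_age):
    # Rough analogue of the FileCache::isCacheGood() behaviour discussed:
    # a blind TTL. The entry is considered fresh iff its mtime is newer
    # than "now minus max_age" -- no Last-Modified or ETag comparison at
    # all, which is why hash-based URLs don't actually need hash support
    # inside the file cache.
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # no cache file: definitely not good
    return mtime > time.time() - max_age
```

This is why the conclusion in the chat is that the cache "doesn't need any changing at all" for hash-based content: freshness is decided by age alone, and stale-but-unchanged entries are simply regenerated.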
It's a data provider to the client on top of RL.
[01:03:52] So it still needs to know whether to return 304 Not Modified or not
[01:03:58] Yeah, they would not
[01:04:19] But since we don't know how long it's been since the client received the original Expires, we don't know whether it's stale or not.
[01:04:38] on the server it now uses time() - maxage, but we don't have a value for time() from the client
[01:04:42] it used to be Last-Modified
[01:04:55] Although that was a cheat, I guess
[01:05:05] since it didn't really come from the client in this code path, it's just a fake value, time() - maxage
[01:05:08] for the blind TTL
[01:05:13] but we don't have an equivalent of that.
[01:05:28] when the client sends If-None-Match, we don't know how long ago that hash was issued.
[01:10:47] Hm.. I think I've got a way to bypass it. Thx for the idea bouncing, AaronSchulz :)
[01:11:20] Krinkle, is this on an etherpad or phab task somewhere?
[01:11:33] https://gerrit.wikimedia.org/r/#/c/207661/
[01:11:39] https://phabricator.wikimedia.org/T94074
[01:11:43] it would be nice to see the control flow rather than trying to imagine it
[01:12:02] It's just a side effect. Because the entire system is changing to be hash-based, the FileCache is one of the dependencies caught along the way that doesn't seem entirely compatible at first
[01:12:24] But with a blind TTL (which is already what it does, actually) it won't be a problem, so there won't be a need to implement any hash support in there.
[01:12:42] A few things will need updating in the code paths calling it, but the essentials can stay as-is
[01:14:53] reviews / thoughts about https://gerrit.wikimedia.org/r/210257 appreciated, especially "OMG you're going to break everything"
[01:15:18] but, yeah, the tryRespondLastModified() call would need tweaking of course, I forgot to mention that
[01:15:30] legoktm, are you?
[01:15:46] AaronSchulz: only for stupid users
[01:17:15] Krinkle, is your patch *done* after that change?
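The revalidation asymmetry being discussed can be made concrete with a small sketch (hypothetical names, not MediaWiki code): with If-Modified-Since the server can compare against a fake cutoff like `time() - maxage`, but an If-None-Match hash is opaque and carries no age information at all, so the only possible check is equality.

```python
def respond(current_etag, if_none_match=None):
    # Hash-based revalidation: the server cannot tell how long ago the
    # client's hash was issued; it can only test whether it still matches.
    # Returns (status, body_needed).
    if if_none_match is not None and if_none_match == current_etag:
        return 304, False  # client copy still valid: headers only
    return 200, True       # send a full response with the current ETag
```

Note there is no timestamp anywhere to subtract a max-age from, which is exactly the "we don't have an equivalent of that" problem above.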
[01:17:26] AaronSchulz: I think so, yes.
[01:17:48] I haven't done broad testing yet, but my initial list of known issues is finished after this.
[01:20:54] AaronSchulz: I've got two minor bug-fix patches that this depends on in order to test properly https://gerrit.wikimedia.org/r/#/q/status:open+project:mediawiki/core+branch:master+topic:rl-filecache,n,z
[01:32:44] Thx AaronSchulz
[02:40:54] AaronSchulz: Got it. I moved tryRespondFromFileCache() in respond() to below the creation of $etag so that tryRespondNotModified() can be handled first.
[02:41:05] Then it's a simple matter of a blind TTL and sending the headers
[02:41:53] Also added some docs to getVersion() :)
[04:46:05] near the top of my xhprof output I'm seeing: 99.54% 834.073 1 - gmdate
[15:31:47] tgr: Brad is going to be out today, so I'll just cancel our team meeting if that's ok with you
[15:32:12] sure
[15:32:16] * bd808 tries to imagine a developer saying "NO I NEED ANOTHER MEETING!"
[15:33:52] when it was one meeting per week in multimedia, it could get lonely without it
[15:34:15] with mobile web's average of 7 meetings/week, I feel I'll be OK
[15:34:24] :) understandable
[15:34:43] I have fought for a weekly meeting before as a remotie
[17:39:57] AaronSchulz: around? I need some cache invalidation help :)
[17:46:43] csteipp: have you tried turning it off and on again?
[17:46:44] * Reedy grins
[17:47:48] Ironically, that is the only solution at this point :)
[17:48:30] * csteipp goes to restart all memcache on the cluster...
[17:48:48] (kidding, in case ops sees that ^)
[17:53:20] csteipp: anything I can help with?
[19:19:13] bd808: I'm getting a duplicate-source complaint from apt-get update in vagrant (running current HEAD)
[19:19:50] W: You may want to run apt-get update to correct these problems
[19:19:56] Silly apt, that's what I'm doing
[19:20:27] tacotuesday: trying myself now...
[19:21:56] tacotuesday: I have no conflicts, but no exotic roles either
[19:22:22] I only have cirrussearch explicitly turned on
[19:22:26] my /etc/apt/sources.list.d/ has multiverse.list and wikimedia.list
[19:22:45] does it tell you where the dupes are coming from?
[19:22:59] W: Duplicate sources.list entry http://security.ubuntu.com/ubuntu/ trusty-security/multiverse amd64 Packages (/var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_trusty-security_multiverse_binary-amd64_Packages)
[19:22:59] W: Duplicate sources.list entry http://security.ubuntu.com/ubuntu/ trusty-security/multiverse i386 Packages (/var/lib/apt/lists/security.ubuntu.com_ubuntu_dists_trusty-security_multiverse_binary-i386_Packages)
[19:25:10] Ah, I think I get it
[19:25:17] I have an old multiverse entry sitting in sources.list.
[19:25:25] When it should be in multiverse.list now
[19:26:41] ah. The ones in my sources.list are commented out
[19:27:15] Yeah, trying that
[19:29:31] I think that'll do it
[19:44:22] I think I'm going for tacos tonight
[19:44:52] mmmmmm... tacos
[19:45:41] AaronSchulz: 10.64.32.22 is spamming fatal monitor with deadlocks
[19:45:52] Trying to update `category`
[20:48:11] legoktm: Are 6 +1s on https://gerrit.wikimedia.org/r/#/c/210257/ enough? ;)
[20:48:46] 1+1+1+1+1+1 = +1
[20:49:23] bd808: I think so :P
[20:51:20] thanks :)
[20:53:54] also: what does +1 mean anyway? Reviewer has a working mouse.
[20:54:40] Do you even need a mouse? ;)
[20:54:50] it's gerrit
[20:54:52] so probably yes
[20:55:24] you can do it with the SSH interface
[20:55:33] hah
[21:12:22] Tim-away: You've answered this question before, but I didn't write it down anywhere, so I have to ask you again: what is it about APC in HHVM that is substantially different and better than APC in PHP5?
[21:36:38] hhvm memory-maps the APC storage area, as I recall
[21:36:52] basically making it a very lightweight lookup
[21:37:29] there is also a way to make a .so to pre-populate it.
That's how FB does their l10n strings
[22:06:40] ori: Are you going to be able to make the meeting right now?
[22:07:43] csteipp: oh shit
[22:07:44] on my way
[22:07:50] just had my 1:1 with Damon and it went long
[22:38:41] hey, is there any way to upgrade the database schema manually if I were to point a 1.25 installation at a 1.24.2 database?
[22:38:55] looking to hot-swap my server, basically
[22:39:49] should I uninstall 1.25 on the new server, then follow the upgrade option in the installer?
[22:51:30] hi all, can someone please review https://gerrit.wikimedia.org/r/#/c/210395/ ? Thanks
[22:53:51] bmansurov: I added some reviewers who probably know a bit more about it
[22:53:55] csteipp: https://gerrit.wikimedia.org/r/#/c/182995/
[22:54:01] legoktm: thank you
[23:43:10] anyone wanna +2 https://gerrit.wikimedia.org/r/#/c/210260/ ? MaxSem?