[00:23:38] TimStarling: how would you implement caching of frequently-accessed parser cache items in APC? The parser cache as a whole belongs in memcached, but a small subset of it is very hot.
[00:24:53] is APC under HHVM an LRU cache?
[00:26:39] No.
[00:26:59] makes things interesting
[00:27:14] how small is this subset?
[00:28:26] Pretty small. Having a dozen or two of the most popular templates and modules in APC would go a long way.
[00:28:59] templates and modules are not in the parser cache
[00:29:45] you mean the preprocessor DOM cache
[00:30:09] Yes. I mix the two up. :/
[00:30:58] having a bounded LRU cache behind APC in HHVM would be pretty cool
[00:31:44] actually, no
[00:31:45] I looked a bit and thought that it might be possible to configure using the LRU stuff TimStarling made for the regex cache
[00:31:47] I mean the revision cache
[00:32:05] revisiontext:textid:NNN stuff
[00:32:18] bd808: interesting idea
[00:32:18] I guess another question is what the hit rate would be
[00:32:45] example: enwiktionary:revisiontext:textid:32974160 gets ~190 GETs / second
[00:32:59] that's a Lua module
[00:33:10] if the hit rate is high, you can just copy all memcached fetches to a local LRU cache
[00:33:46] if the hit rate is low, the hot objects will be evicted and you won't get much benefit
[00:34:37] https://en.wiktionary.org/wiki/Module:languages
[00:38:01] it seems to me that it is not especially simple to find the most popular, say 100 pages, from a very large set with a fairly flat probability distribution
[00:38:29] you could have a separate analytics system which calculates that list and pushes it into memcached
[00:38:51] and then the nodes would copy the list into APC and consult it on each memcached fetch
[00:39:19] if the page title is in the popular list, they would check APC, and if it is not there, copy from memcached to APC
[00:39:59] ideally this would be done in something like nutcracker
[00:40:02] like I say, if the probability distribution is not so flat, you can do it by sampling or simple LRU
[00:40:31] ori: I was thinking the same thing (nutcracker local caching)
[00:41:14] i'd love to have a small (say, 20-megabyte) memcached instance and for nutcracker to intelligently copy most-frequently-accessed keys there
[00:41:19] invalidation is the really hard part
[00:41:38] keys can have a very short TTL
[00:42:06] going from a fetch every second to a fetch every 20 seconds would be perfectly fine
[00:42:37] a fetch from a remote memcached server, I mean.
[00:49:08] basically i'd like nutcracker to override the hash-based key-to-server mapping for a small subset of hot keys
[00:50:04] but also ideally pass through to the normal backend on miss too
[00:50:15] right
[00:51:19] maybe we should just cache everything in the Module namespace in APC with a short TTL
[00:51:56] hhvm APC is unbounded which makes that sound scary
[00:52:19] module namespace is small + short ttl
[00:53:01] So like an ApcBufferedMemcachedBagOStuff ?
[00:53:16] heh
[00:55:29] and when the module namespace becomes not so small...
[00:55:40] boom!
[00:55:54] hhvm crashing empties the cache
[01:01:10] i think i'm just going to do this in ScribuntoEngineBase::fetchModuleFromParser
[03:37:56] TimStarling: https://gerrit.wikimedia.org/r/226260
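For context, a minimal sketch of the approach settled on above: fetching Module: namespace revision text through APC with a short TTL. The function and key names here are illustrative, not the actual code in change 226260.

```php
<?php
// Illustrative sketch only -- not the actual patch in Gerrit change 226260.
// Hot Lua module text is looked up in APC first, so a handful of very hot
// keys stop hammering remote memcached; the short expiry keeps HHVM's
// unbounded APC from growing without limit.
function fetchModuleTextCached( Title $title ) {
	$key = 'scribunto:module-text:' . $title->getPrefixedDBkey();
	$text = apc_fetch( $key );
	if ( $text === false ) {
		$rev = Revision::newFromTitle( $title );
		if ( !$rev ) {
			return null; // nonexistent module; don't cache the miss
		}
		$text = $rev->getContent()->getNativeData();
		apc_store( $key, $text, 60 ); // 60s TTL bounds staleness and APC growth
	}
	return $text;
}
```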
[04:14:47] why is ApiResult so complicated?
[04:14:56] it seems to just exist to screw up my result arrays
[04:15:34] I want to just return an array from execute() and have it magically show up in Parsoid after JSON serialize/deserialize
[04:15:49] exactly the same in every way
[04:15:50] TimStarling: how? if you use formatversion=2 it should be untouched
[04:16:43] it is complicated
[04:18:31] formatversion only changes ApiFormatJson, it doesn't even affect ApiResult
[04:18:47] and PHP, but yes...
[04:19:14] it affects how your result is converted to JSON
[04:19:28] ah ok, it suppresses the BC transformation in ApiResult::getResultData()
[04:20:49] ok let's set the stopscrewingwithmydata=1 flag and see if that makes a difference
[04:20:57] thanks for the tip
[04:32:30] yeah, one of my problems was
[04:32:47] * - Boolean-valued items are changed to '' if true or removed if false,
[04:32:48] * unless listed in META_BC_BOOLS. This may be skipped by including
[04:32:48] * 'nobool' in the value array.
[04:37:12] that still leaves hundreds of lines of weird special cases
[04:38:20] like, what would happen if I had an array with a key that was a user-specified string?
[04:39:15] nothing unless the string started with an underscore, and then the value will be removed, even if it is nested 10 levels deep, in an array inside a stdClass object, inside an array
[04:39:46] stripMetadata() will hunt down those underscores wherever they are hiding
[04:40:13] yes, I hit that before
[04:40:33] and where is it documented? certainly not in the doc comments of ApiResult
[04:40:53] ApiResult "simply wraps a nested array() structure, adding some functions to simplify array's modifications."
[04:41:14] thanks ApiResult! 1400 lines of simplifications!
[04:42:43] and why is it calling $wgContLang->normalize() on all strings?
[04:43:36] what if I want to serialize image metadata containing a gzipped text layer or something
[04:44:41] I'm pretty sure nothing in the HTML output layer calls $wgContLang->normalize()
[04:45:06] it is potentially very slow, so it should be the input layer's responsibility
[04:46:58] TimStarling: I live-hacked and deployed https://gerrit.wikimedia.org/r/#/c/226260/ to gauge its impact, then reverted. It shaves more than 100ms off page save time. See the bottommost graph on https://performance.wikimedia.org/ . (Deployed at 28 past the hour; reverted at 43 past.)
[04:47:24] yeah, the only things that call it are Sanitizer, WebRequest (conditionally), Xml::elementClean() and ApiResult
[04:47:49] ori: but it will crash HHVM, right?
[04:48:05] no, the keys have a 60-second expiry
[04:48:44] so you are relying on the parser being slow enough that it can't write that much data to APC in a minute?
[04:48:54] depends on the number of cores, I guess
[04:49:20] it's only for Module: NS revision text
[04:49:49] which is how big?
[04:50:36] not so big that the number of pages in that namespace that get used in a minute are enough to fill up gigabytes of RAM
[04:51:13] 22MB on enwiki
[04:51:28] didn't make a visible dent on http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&c=Application+servers+eqiad&m=cpu_report&s=by+name&mc=2&g=mem_report
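To make the BC transformation quoted at 04:32:47 concrete, here is a sketch of what it does to an API module's result. The field names are made up for illustration; META_BC_BOOLS and addValue() are the real ApiResult API.

```php
// Sketch only; the field names are hypothetical.
// With the default formatversion=1, ApiResult's BC transformation rewrites
// booleans; formatversion=2 leaves them alone. Underscore-prefixed keys are
// stripped as metadata in either version.
$data = [
	'enabled' => true,   // v1 output: "" ; v2 output: true
	'hidden' => false,   // v1 output: key removed entirely ; v2: false
	'_note' => 'x',      // stripped by stripMetadata() at any nesting depth
];
// Opt specific keys out of the boolean rewrite for legacy (v1) clients:
$data[ApiResult::META_BC_BOOLS] = [ 'enabled', 'hidden' ];
$this->getResult()->addValue( null, $this->getModuleName(), $data );
```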
[04:52:21] * robla lurks, having just loaded performance.wikimedia.org and done some novice browsing of it (nice work!)
[04:52:33] robla!!!!!!
[04:52:49] heya o/
[04:53:06] :)
[04:53:10] hi!
[04:53:22] howdy!
[04:53:28] how is everything?
[04:53:57] by and large, exceptionally good
[04:54:40] excellent
[04:55:24] I still have a crutch which I use for longer walks, but around the house, I don't need it.
[04:56:27] I have some other niggly problems, but on the whole, I got off pretty lucky for having been slammed by a car
[04:57:19] yeah
[04:57:25] how are things in this virtual neck of the woods?
[04:57:50] oh, puttering along
[04:59:06] pretty good, just got back from Wikimania. legoktm's SUL finalization talk on Friday stole the show. I had to spend Saturday and Sunday listening to people gush about how great he is.
[04:59:39] :o
[04:59:42] hi robla!
[05:00:31] hi! glad to hear that people are receiving the SUL stuff well. I have to dogpile on the legoktm gushing :-)
[05:02:15] I've been working on a couple of parsoid projects mostly
[05:03:07] the architecture committee was going along nicely for a while, but now seems to have stalled somewhat, maybe due to wikimania
[05:03:29] TimStarling: the Parsoid work seems interesting/fun?
[05:03:44] yes
[05:04:26] I rewrote half of PEG.js, the JavaScript parser generator which parsoid uses
[05:04:34] mostly for fun ;)
[05:05:43] currently in discussions with upstream -- he's not sure he likes my rewrite, but it's 10% faster, and can probably be faster still with a bit more work
[05:05:49] :-) yeah, I heard/read a *little* about that... is it looking like that work will make it all of the way out, or is it in the "we'll see..." stage?
[05:06:19] it'll be deployed to WMF
[05:06:46] arlo just merged it for round-trip testing
[05:06:53] nice!
[05:07:19] that is 10% faster on their benchmarks
[05:08:21] the architecture committee has been keeping the RFC process kicking along, and there was also a strategic document, mostly authored by Daniel and Gabriel
[05:08:39] setting out some big projects for MW core that we would like to see happen
[05:09:16] discussions of committee structure and governance model were mostly deferred waiting for you to come back
[05:10:34] yeah, I skimmed around on the Arch stuff. I'm simultaneously eager to get back but also making sure that I'm not being stupid about jumping back in too quickly
[05:11:04] fair enough
[05:12:09] have you been to the office yet?
[05:12:20] yeah, a couple of visits
[05:16:36] I'm hopeful I'll largely be able to focus on Arch Committee stuff, which is what I had worked out with Damon before both of us disappeared for our own reasons.
[05:18:22] Terry and Lila are on board, and I'm looking forward to working more with them too
[05:19:10] are you thinking about joining the archcom hangout meeting tomorrow? I'm not sure how many people will be there
[05:19:37] Daniel is definitely out, until August 5 in fact
[05:21:40] I *might*... I still have a bunch of personal stuff that I've been putting off, but at least tomorrow I'm not scheduled for any therapy at the same time.
[05:22:22] ok
[05:22:33] I hadn't thought through the timing of that meeting until you brought it up just now.
[16:52:01] anomie: review requested for https://gerrit.wikimedia.org/r/#/c/226260/ , if you have the time
[16:52:09] * anomie looks
[16:52:22] i cherry-picked it and deployed it to gauge impact, and it was beneficial for save time (~100ms) and did not cause a visible bump in memory usage
[16:52:36] tim checked the total size of the module NS on enwiki and it is 22 MB
[16:52:41] there's also the 60s TTL
[16:53:53] ori: What's the worst-case if someone makes lots of huge modules and invokes them all in one page?
[18:15:45] anomie: nothing really; you'd have to fill up gigs of RAM with non-duplicate Lua content
[18:48:45] ori: did the login method for phabricator change?
[18:48:56] I don't remember it asking for LDAP credentials before
[18:49:11] and what I thought were my username/pw don't work
[18:49:14] but I could've forgotten
[18:49:35] swtaarrs: you should still be able to use your mediawiki.org account
[18:50:01] ooh
[18:50:07] aha, thanks!
[19:01:42] bd808: got a second?
[19:01:51] in a 1:1
[19:02:01] ping me when you're free?
[19:03:18] sure
[19:03:47] thanks
[19:13:34] swtaarrs: that's https://phabricator.wikimedia.org/T963 biting
[19:14:08] ahh ok
[19:15:31] Nemo_bis: that reminds me of the time some random blog article was the top google result for "facebook login". people actually clicked on it, scrolled down to the first textbox they could find (the blog's comment section), entered their username/pw, and hit submit
[19:15:58] so there were tons of plaintext fb credentials just sitting there until the owner figured out what was happening and deleted them
[19:17:31] heh
[19:17:53] phishing is all too easy with the average internet userbase
[21:14:53] ori: finally out of meetings
[21:27:37] bd808: still there?
[21:27:42] yup
[21:27:44] sup?
[21:28:01] ok, so I basically wanted your advice on how to manage APC. Here's what I've learned through experimentation in the last few hours:
[21:29:14] the following code appears to be a reliable means of determining the current size, in bytes, of APC:
[21:29:15] $cacheInfo = apc_cache_info( 'user', true );
[21:29:15] $cacheSizeMb = $cacheInfo['values_size'] / ( 1 << 20 );
[21:29:37] errr, in megabytes that is
[21:30:02] it seems desirable to augment HHVM's APC management with some logic internal to APCBagOStuff
[21:30:28] To set an upper limit?
[21:30:41] the question is how -- should APCBagOStuff::set() simply decline to set something if APC is over a certain size threshold?
[21:31:09] That's *probably* safe in that I don't think any code that calls APCBagOStuff::get() depends on the get being successful; there is always a fallback
[21:31:27] one bit of ugliness that we'd need to consider is this:
[21:31:53] keys aren't evicted from APC at the exact moment they expire; rather, they get evicted the next time HHVM iterates through all keys looking for expired ones to evict
[21:32:01] and that happens... every N `set` operations
[21:32:16] so if we decline to set items once HHVM reaches a certain size, we won't trigger eviction of expired items either
[21:32:26] so we might have to do something like, decline to set the value, but instead set some dummy key
[21:32:32] right. I think PHP5 evicts before apc_add() returns false for the store being full
[21:33:06] maybe rejecting set()s isn't the right approach at all
[21:33:25] that's what PHP5 does when full
[21:33:37] but implementing it in our classes is a bit hacky
[21:33:51] it really should be an upstream feature in HHVM
[21:34:02] a limit on the apc store size
[21:34:11] yeah
[21:34:28] maybe there is one and it isn't documented, i'll go read the code again
[21:34:50] Worth looking again. I didn't find one a couple of weeks ago though
[21:35:12] but if indeed there isn't one, given that an upstream feature, even if implemented, won't make it to our production environment for at least a couple of months, does emulating the PHP5 behavior in APCBagOStuff seem sensible?
[21:35:43] It would help keep us from shooting our own foot.
[21:36:02] * ori nods
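For reference, a small sketch wrapping the two lines quoted at 21:29:15 into a reusable guard. The function name is made up; the 'values_size' field is the HHVM-specific detail discussed above, and PHP5's APC reports memory differently.

```php
// Sketch based on the apc_cache_info() snippet quoted above.
// HHVM-specific: relies on the 'values_size' field discussed in-channel.
function apcUserCacheSizeBytes() {
	$cacheInfo = apc_cache_info( 'user', true );
	return $cacheInfo['values_size'];
}

// Usage: skip a write once APC passes some cap, e.g.
// if ( apcUserCacheSizeBytes() < $capBytes ) { apc_store( $key, $value, 60 ); }
```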
[21:36:10] one other question -- how about checking free memory instead of apc size?
[21:36:25] decline to set if free memory is less than some threshold, rather than if apc size is bigger than some threshold
[21:36:26] that would be very hhvm specific
[21:36:33] in php5 they are distinct pools
[21:36:46] free system memory, i mean
[21:36:59] apc is 1..N shm segments
[21:38:56] So if we want an HHVM safety net I would lean towards a $wgMaxAPCSize that defaults to -1 and then stuff in APCBagOStuff to enforce a simple limit if >=0
[21:39:14] and just have the value there be bytes
[21:40:50] for the cleanup stuff you could make a dummy key that you just store a bool in to trick it
[21:41:39] meaning if we are above the limit then poke a tiny consistent key to increment the periodic gc counter
[21:41:53] yeah
[21:47:34] anomie: in the meantime, though, https://gerrit.wikimedia.org/r/#/c/226260/ is safe, both in theory and in practice (it has been in prod for the past day)
[22:36:57] TimStarling: https://gerrit.wikimedia.org/r/#/c/226260/ has been in production for a while now, with no perceptible impact on memory usage on the application servers. Are you still uncomfortable with merging it?
[22:53:55] it's not the typical case that concerns me
[23:05:34] I'll comment on the change
[23:23:54] TimStarling: thanks; I'll have to think about those
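Putting the pieces together, a sketch of the $wgMaxAPCSize safety net described at 21:38:56 through 21:41:39, assuming the apcUserCacheSizeBytes() helper sketched earlier. The class and key names are illustrative, not actual MediaWiki code.

```php
// Illustrative sketch of the $wgMaxAPCSize idea, not actual MediaWiki code.
// Assumes the apcUserCacheSizeBytes() helper sketched earlier.
class BoundedApcBagOStuff extends APCBagOStuff {
	public function set( $key, $value, $exptime = 0 ) {
		global $wgMaxAPCSize; // bytes; -1 (the default) disables the limit

		if ( $wgMaxAPCSize >= 0 && apcUserCacheSizeBytes() >= $wgMaxAPCSize ) {
			// Over the cap: decline the write, but still poke a tiny
			// consistent key so HHVM's every-N-stores sweep of expired
			// keys keeps running and eventually frees space.
			apc_store( 'bounded-apc:gc-poke', true, 1 );
			return false;
		}
		return parent::set( $key, $value, $exptime );
	}
}
```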