[04:23:01] wikimedia/remex-html is developed on Gerrit but composer.json points to github
[04:23:08] is that standard practice?
[04:23:28] it makes contributing even harder than it generally is with Gerrit
[05:06:40] tgr|away: I don't see anything referencing github in remex's composer.json?
[05:29:06] legoktm: yeah, my bad, it's not in composer.json, but the packagist package is linked to github
[05:29:20] oh, we have to do that for the zip download feature
[05:29:43] so you do composer install --prefer-source to submit a patch and then you have to mess with the remotes first
[05:44:13] <_joe_> ok talking about packagist, there are a series of packagist files with defects and I get some spam from them
[06:06:29] tgr: yeah :| I'd love not relying on github
[06:10:12] couldn't we just HTTP-redirect the downloads?
[06:10:30] that's still relying on github but at least only for the zipballs
[10:34:00] <_joe_> not sure how reliable this is, but same page rendered on hhvm and php7, the memory used by RemexHtml spikes from the 6.8 Mb used in HHVM to 23,4 Mb on PHP7
[10:34:43] <_joe_> https://performance.wikimedia.org/xhgui/run/view?id=5c6e7820bb8544d05aeac100 (PHP7) vs https://performance.wikimedia.org/xhgui/run/view?id=5c6e7dbbbb85444b0b71e0d7 (HHVM)
[10:34:55] <_joe_> php7 is way faster though, even with tideways enabled
[12:21:49] <_joe_> is there any guide about how to write javascript for MediaWiki extension? I mean a style guide, not the mw js api
[12:22:07] <_joe_> https://www.mediawiki.org/wiki/Manual:Developing_extensions doesn't have many useful pointers
[13:19:12] _joe_: https://www.mediawiki.org/wiki/Manual:Coding_conventions/JavaScript
[13:19:20] <_joe_> Reedy: thanks
[16:02:46] _joe_: that's interesting. RemexHtml\Tokenizer\Tokenizer::consumeAttribs has a giant regular expression (https://github.com/wikimedia/remex-html/blob/2ecb8ff12/RemexHtml/Tokenizer/Tokenizer.php#L1108-L1149), and both HHVM and PHP7 compile those just-in-time, using PCRE. So it's possible that the difference in memory usage you saw across the two requests is the difference between a JIT cache hit and a
[16:02:48] cache miss.
[16:03:08] <_joe_> uhm
[16:03:16] Also, PCRE JIT doesn't cache anything for you, it's entirely up to the caller to store and reuse JIT data. pcre_study() gives you JIT data and it's up to the caller code to store and reuse JIT data. It'd be interesting to compare how HHVM and PHP7 cache JIT data.
[16:03:53] err, repeatedly repeating myself
[16:05:22] lastly pcre allocates a stack for executing compiled regular expressions and its (again) up to the caller whether and how to reuse the stack space for subsequent regexp execution. And the size of the stack is tunable.
[16:06:01] <_joe_> I'm not sure php has a global pcre cache in fpm
[16:06:04] <_joe_> while hhvm does
[16:06:53] php 7 does: https://externals.io/message/87245
[16:07:22] assuming it is not turned off at compile time, which I would it isn't for the version the wmf is running
[16:07:29] *would hope
[16:10:40] AFAIK neither HHVM or PHP exports any metrics about the pcre cache, and it's not exposed to PHP code at all
[16:11:43] it would be super interesting to instrument it, since mediawiki spends a lot of time executing regular expressions
[16:15:43] heh, you're running php 7.2.8 AFAICT, and there was a leak in the PCRE cache that was fixed in 7.3: heheh: https://github.com/php/php-src/commit/9278be148e751bc6d2107f4df667f6a6de4daa66
[16:16:03] not saying that's what you saw but maybe :)
[16:18:24] Oh gods, let's not postpone PHP7 by needing to move to 7.3 over 7.2. ;-(
[16:18:48] memory leaks didn't stop the HHVM roll-out, so no reason to panic :)
[16:19:32] besides this is no more than a plausible guess
[16:19:33] True.
[16:20:13] I just /so/ want us to move off HHVM so we can drop the slowest component of CI.
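The points above about caller-managed caching and the lack of cache metrics can be illustrated with a rough sketch. This is not how PHP or HHVM implement their PCRE caches; it uses Python's `re` module as a stand-in, and the class name `CompiledRegexCache`, its `max_entries` limit, and the `hits`/`misses` counters are all hypothetical names chosen for illustration. It only shows the general shape: the caller keeps compiled patterns around, reuses them on a hit, pays the compile cost on a miss, and counts both so the behaviour can be observed.

```python
import re
from collections import OrderedDict


class CompiledRegexCache:
    """Caller-managed LRU cache of compiled patterns with hit/miss counters.

    Hypothetical illustration only: it does not model PCRE JIT data,
    PHP's internal PCRE cache, or HHVM's cache.
    """

    def __init__(self, max_entries=128):
        # max_entries is an arbitrary illustrative limit.
        self.max_entries = max_entries
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, pattern, flags=0):
        key = (pattern, flags)
        compiled = self._entries.get(key)
        if compiled is not None:
            # Cache hit: reuse the compiled pattern and mark it recently used.
            self.hits += 1
            self._entries.move_to_end(key)
            return compiled
        # Cache miss: compile, store, and evict the least recently used entry
        # if the cache has grown past its limit.
        self.misses += 1
        compiled = re.compile(pattern, flags)
        self._entries[key] = compiled
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)
        return compiled
```

Comparing `hits` and `misses` after a request would give the kind of signal discussed above: a request that mostly misses does the compile work (and allocates for it), while a request that mostly hits does not.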
[17:15:33] I filed a task for investigating this further: https://phabricator.wikimedia.org/T216744
[17:30:20] <_joe_> ori: thanks
[17:30:52] <_joe_> I was in back-to-back meetings where people annoyingly demanded my full attention
[17:31:28] <_joe_> (and we can probably backport that fix if it's good for memory usage and we do have an issue)
[17:31:43] it occurred to me that you can infer how boring or interesting my day job is on any given week based on how active I am on wmf IRC channels
[17:31:56] <_joe_> ori: I hope it gets more boring
[17:32:16] <_joe_> ori: you know I'm still searching for an SRE that should concentrate on the MediaWiki side of things
[17:32:23] <_joe_> right?
[17:32:46] <_joe_> also, we'd prefer to find someone located in a timezone reasonably away from UTC
[17:33:30] <_joe_> :P
[17:33:45] maybe one day :)
[17:33:54] <_joe_> yeah yeah
[17:34:32] <_joe_> I can't even sweeten the pill with you, you know *exactly* what you'd be getting into
[17:35:02] <_joe_> but tbh, I'd love to be able to spend time chasing profiling details like this
[17:41:26] <_joe_> instead, I'm stuck doing 1000 things that pertain to solving tech debt, which includes fun like finding a new sensible home for 10 million flow k/v data which were stashed on the session redises
[17:42:17] <_joe_> err, 6 million, in truth :)
[17:48:04] * James_F hugs the wonderful _joe_ and ori.
[17:53:16] <_joe_> James_F: wait until my two patches for WikimediaEvents fall in your lap :P
[17:53:37] Ha.
[17:53:44] On that note, I'm off to the opera.
[17:53:46] <_joe_> probably tomorrow, I'm too tired to finish and submit them right now
[17:53:48] <_joe_> ahah
[17:53:57] <_joe_> ok, tomorrow then I guess
[17:55:57] <_joe_> eprodromou: hah, I was about to ping you. We should take some time (not today, I'm leaving) to talk about T212129
[17:55:57] T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed. - https://phabricator.wikimedia.org/T212129
[17:56:09] <_joe_> and also practical next steps for it
[17:56:31] heh, that's a fun problem
[17:56:34] i'm off too, nice chatting
[17:56:50] for variable definitions of "fun"
[17:57:24] _joe_: so I think the best first step is to turn off Redis storage and see what breaks
[17:57:53] <_joe_> eprodromou: well right now it would break the site, given some of the things we stash there
[17:58:07] Sorry, I was being facetious
[17:58:14] <_joe_> aahhahah ok :P
[17:58:20] I think it would break a lot of things
[17:58:32] <_joe_> I'm too tired, Ive started working 12 hours ago
[17:58:48] I'll try to say what I mean instead
[17:58:51] <_joe_> let's try to catch up tomorrow :)
[17:59:08] That sounds good. Want to put a meeting on my schedule?
[17:59:19] Or just plan on connecting here when we both have free time?
[17:59:37] <_joe_> that sounds better :)
[18:00:04] Excellent, let's do that
[18:00:10] <_joe_> I'll try to get a longer break during the day and to be around after 17:00 UTC