[00:06:40] tgr|away: The unpatrolled file stuff for wp0, are we finishing that? I would assume not. [00:07:16] There's a few stale CRs relevant to it in FileBackend and related classes. [00:15:59] TimStarling: SMalyshev: Regarding PHP7 profiling (https://phabricator.wikimedia.org/T176916) - I'm working from the other end of the problem (collection, aggregation), and close to done with generalising arclamp to support two incoming queues for receiving stacktraces (instead of just one for xenon). [00:16:51] I'm not sure what the current status or expectation was. I'd like to help out where I can with the instrumentation, but I feel the current state is still a bit too rough for me to take it further, I don't have the C skills and experience for it :) [00:17:27] Pchelolo: https://www.mediawiki.org/wiki/Core_Platform_Team/Skills_matrix I copied the list from Developers/Maintainers for now, but was going to expand it to meet reality afterwards [00:17:37] I would be fine with pushing it to next Q, assuming ops won't go live with PHP7 before then. [00:18:06] great legoktm thank you. I'll add services stuff there [00:18:11] :) [00:19:00] should we use rel-eng 5-star pictograms there? might be easier to read then a bunch of numbers [00:19:07] Krinkle: I did write profiler for PHP (though not for 7), so I could probably help some, though I am not familiar with Xenon parts [00:19:20] (and I do know C :) [00:19:49] so if I could help with reviews/PHP questions/etc. I'd be happy to [00:20:21] SMalyshev: basically, to get to a point where we have something that can produce stacktraces on app servers for PHP 7 fpm requests in some way. Once that exists, I can take it further to find a way to send them to Redis and publish, aggrgate, and visualise etc. [00:21:25] Krinkle: stacktrace when? generally stacktrace on PHP should be easy, http://php.net/debug_backtrace does it so just do the same :) [00:21:37] Whether as php extension and/or from PHP user land in-process and/or from an observer deamon somehow, I do not know. [00:21:50] SMalyshev: sampling profiler, every N ms in all of prod, without significant slow down [00:22:10] ah, that might be a bit trickier. Definitely extension then, not PHP user land [00:22:30] You can Tim started on it at https://phabricator.wikimedia.org/T176916#4301180 [00:22:52] and probably would need to use some trickery so that e.g. we don't copy the same data structures a million times [00:24:26] This is mostly outside my playing field, but as I understand it there is the option of "write a program" (C?) that captures PHP state from all current php procs at an interval, or "write a PHP extension" (C?) that captures it from within the process. Neither of those I feel able to do myself, so I depend on you and/or Tim for that. [00:24:36] being a blocker for PHP 7 deployment in prod. [00:24:49] No particular rush on my part, but presumably in the next 1-2 quarters. [00:25:33] from memory, without thinking about it too much, I'd suggest collecting stack along the same lines as dump_bt does, without resolving any pointers etc. and then at the end of the request serialize it and ship out whereever we need it [00:26:26] N dump_bt [00:26:33] (sorry, typo) [00:26:54] Krinkle: extension is probably easier since it already has access to all the guts... though the gdb trick is nice and might be easier to code. But I am not sure performance-wise it would work [00:27:30] Right. It's a trade off between no overhead in the process, and no overhead on pausing the process, correct? [00:29:21] There's also overhead of gdb itself, which has no performance guarantees whatsoever [00:29:44] Pchelolo: I was a bit concerned about horizontal space [00:29:46] i.e. it might decide it needs symbols for some shared lib and get stuck for several seconds trying to find them [00:30:01] Pchelolo: but if they fit it works for me [00:30:26] SMalyshev: the gdb overhead, would that apply to in-process/PHP extension as well? [00:30:37] Krinkle: so for running this continuously on production server is sounds a bit dangerous... but we might experiment and see maybe it's not that bad [00:30:42] I assume now, given debug_backtrace() hopefully does not pause for seconds. [00:30:49] not* [00:30:51] legoktm: just checked - they fit now [00:31:09] but on my screen we only have space for 2 more team members :) [00:31:20] Krinkle: no, php extension would just collect data, it doesn't need any external startup gdb does, it already knows all symbols etc. since compiler/loader already resolved them [00:31:20] heh :) [00:31:30] SMalyshev: Right. [00:32:07] Krinkle: it will also have better performance since if you have strings there, when generating them you can just save char* instead of copying the whole string [00:32:30] since it lives in the same process (and we know php is not going to drop e.g. class names until the request end) [00:33:20] for gdb, if you collect same function name 100 times, you copy it 100 times [00:33:25] not a huge deal for a modern cpu, but it all adds up [00:33:49] SMalyshev: Anyway, sorry to be direct, but in a nut shell: I do not think realistically I can code this, regardless of how much advice. I can help brainstorm, and I can maybe spend a quarter trying to add the last bit of C code needed to connect with Redis, but the actual "thing" it self, will be a CPT need. Just need to decide on when. I'd gladly take the chance to learn more about this from you (plural), if we can afford the delay of doing so [00:33:49] wrt to PHP7 deploy. [00:34:16] And CPT in turn might need your expertise on PHP internals. [00:35:35] regarding it adding up, I have no idea what I'm talking about, but.. it could be interesting to look at how HHVM/Xenon "does it" internally, which is what we use in prod now for our sampling profiler. [00:35:51] Krinkle: yep, I understand. So, I do not know enough about this stuff yet (I mean, what is the full task def, what needs to be done/already done, when etc.) so I am not sure whether/when my help would be needed. [00:36:23] the worker proceses in hhvm have one thread per request and if I understood correctly, they capture it from the master thread, using a tick to synchronise them for the briefest of moments. [00:36:42] but I would be glad to help if needed. So maybe we (myself, you, @TimStarling, anybody else who needs to be in the loop) can set up a meet and talk about it [00:36:49] every 100ms, each request thread gets such a signal. [00:37:31] granted, I wouldn't be surprised if it's all very differnet and not applicable or to learn from for php. [00:38:07] in 7.1 you can set a flag from a signal handler which will interrupt the VM, that would be a great way to do profiling [00:38:28] yeah here it might be some complications too, e.g. signals are expensive usually, and can happen in unsafe places, might be better to have something that checks e.g. on the end of each statement or use zend_interrupt_whatever for this [00:38:31] even if it waits until function exit to call the hook [00:38:55] for hhvm, xenon (in cpp) internally gets a stack from every req thread in every worker every 100ms. And then at the shutdown of a randomly chosen web request, we have a user land xenon_get_data callback to send it to Redis ourselves. [00:38:56] if it calls a hook on function exit, you then know that the signal occurred during that function, that's all you need for profiling [00:39:21] TimStarling: in this case then we might even not need a signal. Just sample the time each time we end the statement, if 100ms passed call the backtrace dump [00:39:35] iirc there are pretty cheap ways to get time on Linux, may even not need a syscall for it [00:40:07] that's how xhprof works already, but I think Krinkle benchmarked it and found that it was adding overhead [00:40:47] xhprof adds function entry/exit hooks to the VM and then in sampling mode, just increments a counter on most hook events [00:40:59] it's not super optimised, there's also configuration reads and the like [00:41:04] well yes it'd add overhead anyway, code does not run for free :) but I wonder whether it's more overhead than generating signal every 100ms... I dunno, that needs checking. If signal is faster then let's do signal [00:41:08] Yeah, xhprof's sample mode (100ms as well) added between 10 and 35% slowdown. compared to >300% overhead for unsampled) [00:41:27] maybe we should look at why it is so slow [00:41:35] 300% is probably for doing way more than occasional backtrace... [00:41:56] it probably records timing of every function call at least [00:42:19] I did email nikic asking if he wanted to do a contract for us, but he didn't reply [00:43:21] might be that he's not interested... it's a pity. [00:43:37] 300% is unsampled, pretty much doing a backtrace every time [00:43:51] 10-35% is just for checking a few global variables per function call [00:44:42] right. So if instead of this it would just check the time and only do BT when at least 100ms passed since the last one I believe it'd be better than 300% [00:45:42] this is really what the sampled mode does already [00:45:44] Looking at Xenon, each thread has a XenonSignalFlag. This is set by the observer thread for all known threads every 100ms. Then in a regular thread, the onFunctionExit checks if the flag was set. [00:45:47] /* See if its time to sample. While loop is to handle a single function [00:45:47] * taking a long time and passing several sampling intervals. */ [00:45:47] while ((cycle_timer() - hp_globals.last_sample_tsc) [00:45:47] > hp_globals.sampling_interval_tsc) { [00:45:55] This is presumably why it is light as it needs to do no time computation. [00:46:08] where cycle_timer() is [00:46:13] asm volatile("rdtsc" : "=a" (__a), "=d" (__d)); [00:46:37] this is hp_sample_check() in xhprof [00:47:01] Krinkle: yeah, that's what I thought for signal thing. It doesn't even have to be separate thread... [00:47:32] TimStarling: ah, so that sounds exactly what I was thinking about :) [00:47:34] it's called from hp_begin(), I suspect hp_begin() is really where the overhead is [00:48:12] or rather, hp_execute() [00:48:47] I wonder if it overrides the interrupt handler, does it also need the execute? [00:49:55] looks like hp_sample_check is called at the begin and end of each function [00:50:01] should not be too expensive [00:50:08] zend_execute() is replaced by hp_execute(), so you miss out on anything clever zend_execute() was doing [00:50:33] maybe we need to profile the profiler ;-) [00:50:43] for example does userspace function return actually return from the standard VM executor, or does it just adjust the internal stack? [00:50:52] yeah I think because xhprof has two modes, it does a lot of things that it won't need to do in sample mode [00:50:59] I think it is all just gotos in the core [00:51:09] but when you add a hook, that is disabled [00:51:56] TimStarling: possibly. That's why I am saying maybe in sampling mode some of these hooks are not needed. You may lose occasional sample because of it if you're unlucky but it should be faster [00:52:21] SMalyshev: regarding signal, how could the signal get sent after 100ms without a separate thread to schedule, and without checking every time whether its time? [00:52:24] it doesn't use interrupt hook [00:52:39] so I wonder if one rewrites it to use interrupt hook whether it'd be better [00:53:02] Krinkle: alarm signal? [00:53:15] you can use timer_create() with SIGEV_THREAD [00:53:27] setitimer etc. [00:53:34] this runs the timeout handler in a separate thread [00:53:53] * Krinkle only knows the names of C built-ins that have PHP equivalents. [00:53:55] setitimer is old, timer_create() is what we used for the Lua sampling profiler and it is quite nice [00:54:30] TimStarling: ah, ok, even better... timer_create seems to be newer and nicer [00:54:49] Krinkle: long story short, Linux has syscalls to deliver signals at prescribed intervals [00:55:28] or do other stuff in case of timer_create [00:55:47] SMalyshev: interesting. does that get added to an event loop, or would the process check at its own convenient time whether the signal was delivered? [00:56:34] Krinkle: no, signals are asynchronous. Which is a problem, because they can happen at wrong time, but Zend Engine already has code to deal with it [00:56:48] k [00:56:50] so it could just set a flag and then when we're ready we'll check the flag [00:56:55] a lot of timer_create() is implemented in the C library, the kernel really only provides one timer per process, then the C library multiplexes it [00:58:04] I never used SIGEV_THREAD, needs to read more about it, sounds useful. But in any case, with high probablity we can make signal way work, or we can go the same way xhprof does, but probably using interrupt handler would be more efficient [00:58:55] unfortunately, I need to go in about 5 mins, but if you want to set up a meet (or discussion thread or anything like that) where we could discuss it, I'd be glad to participate [01:00:25] Do you think we should try to improve/fork/upstream xhprof, or do something separate in a minimal way that we may and up submitting as improvement, but keep using our own? [01:09:30] I think we could try with improving xhprof but probably will need to do our own parts with regard to collection anyway.... [01:09:55] as I understand, basic xhprof doesn't do much beyond writing stuff to files? [01:11:55] SMalyshev: It doesn't write to files afaik, it requires the user land to call a function when it wants to stop recording and gets an array of 0 or more samples. [01:12:12] ah yes... [01:12:13] it also requires userland to start it, which we'd do from auto_prepend_file presumably. [01:12:29] we alreayd have such a file in place for our manual unsampled xhprof/xhgui profiles. [01:13:04] ok in this case we'd need some php/extension capability to ship the results out I guess. I am not sure how their sampling mode works in that case, so we'd have to check [01:13:30] but I think we'd probably try to use xhprof+something else to store the results then [01:13:31] for shipping, I'd be fine with keeping that in userland post-shutdown. The xhprof model basically. [01:13:45] also what xenon does currently. [01:14:35] The hard part is tracing in a way with low overhead. If that ends up happening within a PHP exension (not a dbg process) then it's presumably easiest to let PHP userland deal with the collection part. [01:14:57] our curent logic for this is https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/profiler.php [01:15:54] where we register a shutdown callback in which we ask HHVM any data collected so far in that worker process. It's also where we start/stop the debug profiling (unsampled). Both ship to Redis and/or Mongo (XHGui). [01:17:40] yep, sounds good. So probably improving xhprof sampling mode is the way to go (at least my current thinking) [01:18:03] have to run now, will be back in 2-3 hrs if anything else is needed [03:10:07] legoktm, Reedy: OAuth in awk! [03:57:10] incredible [11:22:19] tgr: i just noticed that the new classes in the diff patch are not namespaced. [11:22:37] i think all new classes should be. is this an oversight, or is there a good reason for them not being? [11:24:34] DanielK_WMDE: I think it's less confusing than a) mixing namespaced and non-namespaced content in includes/diff, b) having separate includes/diff and includes/Diff, c) choosing some non-intuitive namespace name just to avoid conflict, which are the options I can think of [11:25:34] adding a namespace after the fact is easy, the platform team ha a project for namespaceing all of MW core, I think it's easier to deal with it as part of that [11:25:43] I'd just happily mix... [11:26:04] we should ask TimStarling, he's the namespacification guy... but then, i'd really like to just get this merged :) [11:26:18] playing with the GetSlotDiffRenderer patch atm [11:39:47] tgr: so... what about SlotRoleHandlers bein in control of SlotDiffRenderers? That would just happen on top, and the hook would only be used for per-content-model overrides? [11:41:45] the hook is used for non-per-content-model overrides [11:42:14] well, it has access to the content model [11:42:19] it doesn't have access to the role name [11:42:29] or I guess it can be per content model, the important part of it is that the extension doing the overriding does not own the content type (or role) [11:42:44] you can't really provide a SlotDiffRenderer without considering the content model [11:43:10] well, for text content you can. but not for all kinds of content [11:44:29] the only use case in my mind is mobile frontend overriding the differ for all text content models. [11:44:30] you could use it for some kind of decoration, change the output format without changing the content of the diff and such [11:45:16] i guess, though that's kind of scary, considering that the renderer returns an html snipped that contains bare table rows [11:45:21] there are two use cases currently (which are pretty similar), inline formatting via MobileFrontend and via WikEd [11:45:24] there's really not much decoration you can do there [11:45:55] yea, both override diffing for text types [11:46:01] parsing table HTML and turning it into something nicer, for example [11:46:05] how can they reliably detect whether it's a text type? [11:46:19] insteancof TextContentHandlerß [11:46:22] ? [11:46:24] yes [11:46:42] I'm wondering whether this hook should rather be in TextContentHandler, then [11:47:15] or better even, in TextSlotDiffRenderer [11:47:16] someone might want to use it to override for a specific non-text content type [11:47:39] yea, that's possible, i'm just not sure I want that :) [11:48:04] I feel for the current use casees, a GetTextDiff hook would be more useful [11:48:05] I imagine it will be in SlotRoleHandler eventually as that's the logical place for getSlotDiffRenderer [11:48:06] anyway [11:48:08] abotu the name... [11:48:39] thinking abotu this again, SlotDiffRenderer seems off - the implementation is not per slot role. [11:48:46] it's typically per content model [11:48:55] so ContentDIffRenderer would make more sense [11:51:11] to me SlotDiffRenderer says that it is responsible for rendering the diff for a single slot (as opposed to a full page) [11:51:29] ContentDiffRenderer sounds reasonable too, though [11:52:26] yea, that'S how we originally arrived at the name. But you could just as well use the renderer completely outseide the context of slots. [11:52:52] the interface knows nothing about sltos. the implementation knows nothing about slots. so why should the name refer to slots?... [11:54:36] I'm still unsure to what extent we'll have slots with a generic content type and most logic in the role handler [11:55:45] I see usecases for having different wikitext slots, btu they donÄ't require much custom logic [11:58:54] ideally I'd like to see a diff rendering with two separate steps, with a diff engine creating a logical diff representation from two contents and a diff formatter turning that into HTML (or JSON or whatever) [11:59:20] where the first step is on the content level, the second step might be on the content or role level, not quite sure [11:59:50] but that would be a lot of refactoring [12:00:06] that would be nice, but iirc is a performance issue [12:00:19] that is, we'd need both options: do it nice and slot, or quick and inflexible [12:00:40] we could just split wikidiff2 in two [12:00:50] "just" :) [12:01:05] as I said, lots of refactoring [12:01:16] I don't think it would impact performance much [12:01:36] I commented inline on the patch. Can you reply there, so our thinking is visible? [12:01:47] sure [12:02:27] uh. the MobileFrontend patch went in after I gave a +2 on the core change it depends on - but the core change faield to merge [12:02:37] the dependency handling seems off. [12:03:27] ah, nvm, i got confused [12:04:10] both are in now! [12:34:39] DanielK_WMDE: followed up [12:34:45] gtg for lunch now [14:07:54] what's a Volkswagen hack? is that manufacturer considered especially shoddy or something? [14:10:30] Gas-Auto [14:18:34] tgr: thanks for deploying that interface-admin bureaucrat change :) [16:04:50] I think there are other places in core where we already mix namespaced and non-namespaced stuff [16:50:26] Anyone got a trivial script setup to loop over all CA wikis and do an API request? [16:51:59] api write [17:00:19] tgr, anomie: I followed up on the RevisionRenderer patch. Do you think you can have a look at the modifications/replies today? Then I can follow up again in the morning. [17:01:50] tgr: the diff stuff is in now, I'll poke at it on beta a little [17:10:22] legoktm: btw, been seeing a certain fatal in patch coverage a few times recently. "Undefined index: R" followed by PHP Fatal in Git.php [17:10:24] https://integration.wikimedia.org/ci/job/mediawiki-phpunit-coverage-patch-docker/792/console [17:10:32] Might be due to the renamed file? [17:13:21] tgr: I could be wrong, but I suspect it relates to the emissions scandal from Volkswagen, where they passed the test for low emission by making their engines detect whether or it is likely it is under a test (knowing how the inspectors do their work) and emit less there than under normal conditions. [17:14:40] tgr: Krinkle is right, I was referring to "code detects that it's under test and fakes stuff". [17:15:10] https://en.wikipedia.org/wiki/Volkswagen_emissions_scandal [17:16:08] they call it a "defeat device" [17:17:47] i stole that from Thiemo_WMDE, who complained about my "VW code" at some point :) [17:19:37] Reedy: yeah, it's in the task about MassMessage being broken [17:20:33] https://phabricator.wikimedia.org/T139380#4460313 [17:20:45] it's an API read but probably not hard to adapt [17:21:06] lol [17:21:31] that looks horrible :P [17:22:20] everything written in shell scripting looks horrible [17:22:24] I have somewhere in my edit history a JS script that does a global preference write when you visit Special:BlankPage?doit=1 or something. [17:22:32] Now we have GlobalPrefs it's not really needed. [17:22:35] it's just slightly uglier due to being on a single line [17:22:57] I wonder if it'd be easier just to write a maintenance script and run it over foreachwiki [17:23:13] Plausibly [17:23:38] you can use foreachwikiindblist with eval.php probably [17:24:37] DanielK_WMDE: will look [17:30:12] thank you! [17:35:38] * @deprecated since 1.25 [17:35:38] */ [17:35:38] public function getErrorsArray() { [17:35:38] return $this->getStatusArray( 'error' ); [17:35:44] getStatusArray is protected... Helpful [17:39:23] tgr: forgot to push, sorry. update is up now. [17:40:47] DanielK_WMDE: you might have missed some comments in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/451025/18/includes/parser/ParserOutput.php , I wrote those later [17:41:25] right, I'll look [17:43:04] https://gerrit.wikimedia.org/r/454325 that'll do [17:52:23] legoktm: regading the patch-coverage issue, Id like to fix that actually. (to better understand how it works) But might need some help. [18:10:35] https://gerrit.wikimedia.org/r/454326 fixes a copy pasta error in the docs [18:16:55] (+2'd) [18:18:36] ta [18:27:40] "Use @note annotation instead of @note:" [18:27:42] what? [18:28:10] oh, it doesn't like the colon? [18:29:53] yup [18:33:34] What does "@deprecated" do. Is that being read by some programs/scripts or is just a note for the developers? [18:35:05] it's for developers, the "machine-readable" equivalent is calling wfDeprecated() [18:35:20] tgr: thanks so https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/StopForumSpam/+/454086/3/updateBlacklist.php is mostly a no-op right? [18:35:29] although IDEs do warn when a deprecated method is used [18:35:41] maybe these days phan does too? [18:36:01] I looked to use wfDeprecated ( __METHOD__, 1.31 ) but I'm not sure how to use that [18:36:16] so I went to soft-deprecate first [18:36:51] if you mean that it does not affect the behavior of the script then yes, it's a noop [20:29:39] Krinkle: hey gentle ping that I need a little more context on https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/EventLogging/+/453457/ [20:29:57] I'm about to run out for a bit, but maybe let's chat whenever you're free? [20:30:18] milimetric: I can talk now, or when you're back before ~4 hours from now. [20:30:56] ok, great, I'll ping you when I'm back [21:08:45] How would I go about checking if an extension is installed in any of our wikis? Is there a list/tool somwhere that lists this kind of information? [21:09:09] Basically, Special:Version but for all the wikis :p [21:09:39] https://noc.wikimedia.org/conf/highlight.php?file=InitialiseSettings.php [21:09:47] https://noc.wikimedia.org/conf/highlight.php?file=CommonSettings.php [21:10:13] TBH, dmaza, https://noc.wikimedia.org/conf/highlight.php?file=extension-list is the simplest list [21:10:42] very nice.. I didn't think of looking into configs [21:10:46] thank you Reedy [22:51:05] Krinkle: looks like I never added renamed file support, it would go into https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/phpunit-patch-coverage/+/master/src/Git.php#28 and the GitChanged class [23:06:37] it probably would be easier to consider renamed files as deleted + created files