[15:58:58] Hey Elitre. :-)
[15:59:14] About to start the weekly triage meeting. Can't hear James_F or others on Webex though :)
[15:59:30] Elitre: We're all muted right now, yes.
[16:00:16] Welcome everyone to this week's VE triage meeting. As the topic states, please see https://www.mediawiki.org/wiki/Talk:VisualEditor/Portal for details and link to join the audio portion of the meeting on Webex.
[16:00:56] As a reminder, if you don't feel like clicking on the link above :p , this is what's on the agenda: reviews of release criteria, of resolved blockers and of nominated blockers, then a short discussion about other business.
[16:02:31] https://phabricator.wikimedia.org/project/profile/1015/ is the link to the criteria.
[16:02:59] Anyone wishing to discuss that? If not, let's move on.
[16:04:11] Here's the link to the solved blockers bugs: https://phabricator.wikimedia.org/maniphest/query/k2_oebAEjnjt/#R
[16:04:22] https://phabricator.wikimedia.org/T91145 was a regression.
[16:04:40] https://phabricator.wikimedia.org/T91307 irritating, now fixed.
[16:04:56] https://phabricator.wikimedia.org/T91299 was about redlinks.
[16:05:14] https://phabricator.wikimedia.org/T88386 and https://phabricator.wikimedia.org/T76523 , also fixed.
[16:05:56] The latter means the toolbar is now loaded faster.
[16:06:25] https://phabricator.wikimedia.org/T89054 was about the context menu.
[16:07:15] https://phabricator.wikimedia.org/T70590 was about the behavior of the link inspector.
[16:07:40] https://phabricator.wikimedia.org/T89309 affected Safari users.
[16:07:59] https://phabricator.wikimedia.org/T71474 , https://phabricator.wikimedia.org/T89923 also now fixed.
[16:08:56] https://phabricator.wikimedia.org/T89878 is about integration with OOjs UI dialog.
[16:09:43] I might have missed a few of them.
[16:09:58] To follow the discussion about nominated blockers: check out the left column at https://phabricator.wikimedia.org/project/sprint/board/1015/ .
[16:10:49] End of quarter in 4 weeks! let's not freak out about that though :)
[16:11:53] https://phabricator.wikimedia.org/T85622 is a bug meaning that new internal links, in preview, point to an URL which has /w/ rather than /wiki/ in it, and therefore won't work.
[16:12:12] Accepting it as a Polish task, since it's so minor.
[16:13:02] https://phabricator.wikimedia.org/T89399 is a dependency of a bug accepted last week.
[16:13:27] Also being accepted as such. Relatively irritating and inexplicable.
[16:13:32] https://phabricator.wikimedia.org/T90420 meant that after clicking Edit on any wiki in Firefox, although the cursor is correctly at the top of the page, the view will immediately scroll to the bottom.
[16:13:45] Accepting it.
[16:13:52] https://phabricator.wikimedia.org/T90454 is about loading the "welcome to VE" message earlier than what happens now.
[16:14:18] improved perceived speed, therefore, accepting as Polish.
[16:14:36] https://phabricator.wikimedia.org/T91245 is a performance issue.
[16:15:48] (shout out to Wittylama for showing it to us. Accepting it!)
[16:16:28] Trevor suggests moving it to the Polish queue though.
[16:16:43] https://phabricator.wikimedia.org/T91248 , another performance issue.
[16:17:20] What's all this talk about people from Poland
[16:17:38] Queueing and being minors etc.
[16:17:47] (aka the chimera bug. difficult to explain, but will be accepted)
[16:18:05] marktraceur: we love that community :p
[16:18:16] https://phabricator.wikimedia.org/T52227 is also related to design, specifically, to the toolbar appearing on several lines when the window is narrow. (So annoying, IMHO.)
[16:19:16] oops, I was looking at another task :) https://phabricator.wikimedia.org/T90815 is an item related to the toolbar's design, accepted anyway.
[16:20:00] https://phabricator.wikimedia.org/T51806 is a quite old request to display hidden templates somehow so that the editors are aware of where they are, and can interact with them.
(My favourite in this batch. Glad my clever colleague Sherry nominated it.)
[16:22:04] Accepted as polish task.
[16:22:25] https://phabricator.wikimedia.org/T90673 sub-task of an already accepted task.
[16:22:36] therefore, needs to be accepted.
[16:23:29] anything you'd like to say? feedback about the meeting(s)?
[16:24:39] We'd like to switch to Google Hangouts: any concerns about this?
[16:25:20] (looks like there are a couple of people on call in Webex right now though).
[16:25:44] (three people calling in via phone currently in that meeting)
[16:26:18] They'll probably be able to do the same in Hangouts.
[16:28:15] Thanks for coming, you'll find the logs and minutes linked from https://www.mediawiki.org/wiki/Talk:VisualEditor/Portal ASAP. We'll probably be using Hangouts for the next meeting. Get in touch or leave further feedback on that very same page. bye!
[17:50:40] Tech talk starting in 10 min
[17:58:57] talk starting soon! :)
[17:59:48] starting in just a few min
[18:00:00] awesome
[18:00:01] exciting!!!
[18:00:36] for once, I make it to one of these events !!
[18:00:41] but why am I the only one in the Tech Talk Hangout? :P
[18:00:44] yay, hi thedj!
[18:00:51] andre__: what's the URL?
[18:01:04] don't know, whatever I found in the "internal" Engineeering calendar
[18:01:09] I just click, I don't read.
[18:01:20] ah, more people!
[18:01:46] i only see the youtube link
[18:01:56] ohai
[18:02:24] almost there :)
[18:03:29] The sixth suggested video by Youtube: Hack Hungry Shark evolution monedas y diamantes infinitos ( NO ROOT )
[18:03:38] 7th actually
[18:03:47] whouhou, I amde it.
[18:04:15] starting!
[18:04:28] let me know when you are watching on youtube
[18:04:34] Now
[18:04:38] rfarrand: now
[18:04:54] ping me with all questions :)
[18:04:58] * anomie is watching on youtube
[18:05:07] youtube not working for me
[18:05:25] try to reload jeremyb2
[18:05:37] rfarrand: i'm watching on youtube as well
[18:05:37] youtube working for me
[18:06:02] ok, maybe got it now
[18:06:22] great
[18:06:45] 30 people watching!
[18:08:17] HA !
[18:08:25] thedj: :)
[18:09:24] var_export never gets any love.
[18:09:54] WAT
[18:09:59] That's pretty horrible.
[18:10:32] (the list thing, not var_export)
[18:10:46] ewww, why do these code samples have smart quotes? :-) :-)
[18:11:20] <^d> superm401: echo var_export($var, true)
[18:12:03] ^d, yep. What about it?
[18:12:26] <^d> superm401: I love var_export :)
[18:12:27] rfarrand: in the future can we start the stream early (5 or 10 mins ahead?) even if the camera, etc. is not actually turned on. or else google fix whatever the problem is that means I can be early and even leave and come back a few times but still can't manage to get it to start here at the same time you guys started. (so almost guaranteed to miss something)
[18:12:32] :)
[18:12:45] ^d, you can leave off the echo and true, of course. It outputs to stdout by default.
[18:13:13] <^d> echo "foo msg: " + var_export...
[18:13:17] rfarrand: (I'm willing to do a bit of testing with you if you want to figure out a way that works well)
[18:14:02] jeremyb2: let me think about a good way to address that. That would effect the start of the youtube video which is what most people end up watching over time. I can always invite you directly to the hangout which we start early.
[18:14:11] I think there's an implicit "if you have your editor configured to do that"
[18:14:34] Yeah, sounds like a really cool feature though.
[18:14:41] Presumably there's a command line version too.
[18:14:45] rfarrand: which is why we should test with a dummy hangout. IIRC it's not just on or off.
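The var_export() exchange above can be sketched as follows. This is a minimal illustration, assuming a hypothetical `$config` array that does not appear in the talk or the log. With the second argument set to true, var_export() returns the representation as a string instead of printing it, which is what a message prefix needs. PHP concatenates strings with `.`, so the `+` in the chat line would be treated as arithmetic rather than concatenation.

```php
<?php
// Hypothetical data, for illustration only.
$config = array('debug' => true, 'retries' => 3);

// Default behaviour: var_export() prints the representation itself
// and returns null, so no echo is needed.
var_export($config);
echo "\n";

// With the second argument set to true it returns a string instead,
// which can then be concatenated into a larger message with ".".
$message = "foo msg: " . var_export($config, true);
echo $message, "\n";
```

The returned string is valid PHP source, which is the main practical difference from print_r().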
[18:16:27] jeremyb2: happy to test ideas out with you
[18:16:42] danke :)
[18:17:31] Hm.. sounds like Node.js / yield :)
[18:17:39] I wonder if we could write a transpiler from Hack to node.js
[18:17:40] :P
[18:17:53] and mediawiki will run on nodejs, at last
[18:18:50] <^d> nodejs would be so much nicer without the js part
[18:19:06] ^d: It's event loop has been factored out for a while now
[18:19:07] Krinkle: hehe :)
[18:19:09] there's lots of binding actually
[18:19:11] even php ones
[18:19:29] libuv
[18:19:31] <^d> srsly? color me surprised
[18:19:41] https://github.com/joyent/node/tree/master/deps/uv#readme
[18:19:42] * ^d doesn't exactly pay attention though
[18:20:02] https://github.com/libuv/libuv *
[18:20:09] I guess async functions useful if response involves multiple requests to services
[18:20:33] worse, it often DOES io.
[18:20:49] I think that's overstating how much Lua we use a bit.
[18:20:55] Yeah, it's the best we can do without threading.
[18:21:15] <^d> superm401: Well, we use it a ton on-wiki. I don't think the analogy really works
[18:22:08] <^d> Although maybe you could async calls to luastandalone
[18:22:11] * ^d shrugs
[18:22:49] so, async in php basically only works as long as you have lots of other async stuff at the same time right ?
[18:22:59] You'd need multiple instances, a single instance isn't thread safe I don't think.
[18:23:05] (of Lua)
[18:23:07] thedj, in actual PHP, not much async support at all.
[18:23:15] <^d> anomie: probably yeah
[18:23:26] i meant hack sorry, already mixing stuff up
[18:23:58] PHP can use pthread, but it's rare.
[18:24:34] pthread + apache doesn't mix too well, IIRC
[18:24:42] Sounds like Hack has nice syntax even for linear async dependencies.
[18:24:55] Hm.. callback hell or promises are hidden with 'yield' in js though, much like 'await'
[18:24:58] But maybe more of a perf win for depending on multiple once at once.
[18:25:01] He, he's saying that now. cool
[18:25:11] multiple ones.
[18:26:56] I wonder what happens if I await a non-async function
[18:27:25] runtime error ? or just gets ignored ?
[18:28:21] akosiaris, it has a type checker, so maybe it could warn you.
[18:28:47] superm401: could be
[18:29:27] <^d> maybe you wait until you dieeeeee
[18:29:27] Question for later: Would there we a way to do call u->friends() speculatively before you know if it's needed, then only use it sometimes?
[18:30:21] doesn't strong typing imply you can't have dynamic dyspatch like $foo->$bar($foo)? B/c I don't see how it is possible to statically check that
[18:31:18] SMalyshev: how is that different from $$n ?
[18:31:33] SMalyshev, you mean like List?
[18:31:35] jeremyb2: not different, really, just one example.
[18:31:51] With function append (T obj)?
[18:32:13] superm401: happy to ask, how do you verbalize "u->friends()"
[18:32:15] heh
[18:32:41] superm401: no I do not. I mean dynamic function call which is not known in advance. I heard "if static type checker passes it, type error can not happen at runtime". I don't see how $foo->$bar($foo) can fullfill this promise
[18:32:46] I wasn't paying attention, what does the ? mean in the function arguments ? optional ?
[18:32:54] rfarrand, "call the friends method on u" should work. Thanks.
[18:32:56] SMalyshev: the type checking is optional
[18:33:06] thedj: ? means it is "nullable"
[18:33:08] thedj: It means the object could be null
[18:33:08] thedj: nullable
[18:33:13] ah righ
[18:33:25] superm401: thanks :)
[18:33:30] Oh, that's nice (specifying field and constructor param simultaneously).
[18:33:41] k. thx guys. was difficult to hear from the kitchen :) (coder needs coffee)
[18:33:44] spagewmf: that doesn't answer the question. Let's say I use that option and all my functions are types. You still don't know statically what $foo->$bar means
[18:33:47] Sometimes people in PHP don't even specify the field, making it public, which is not great...
[18:34:37] SMalyshev, oh, I missed the $bar before.
[18:34:46] Maybe if you use that you're giving up on the static type guarantees.
[18:35:25] Hack accepts most PHP (type annotations are optional), but the static typing only applies if you annotate.
[18:35:29] superm401: that's not what he said :) he said type error can not happen
[18:35:49] SMalyshev, sounds like a question for the end. :) I interpreted it as only if you use the type annotations.
[18:35:50] but if you do $foo->$bar it can happen
[18:36:24] this list example is really bad example
[18:36:26] What about function foo( int|string $x ) ?
[18:36:26] who has actually used hack already?
[18:36:41] if it's absolutely edge case why harp on it so much?
[18:37:10] SMalyshev, he mentioned it twice briefly...
[18:37:17] Not async?
[18:37:20] nobody sane would do list($a, $b) = "string" anyway
[18:37:44] jeremyb2: FB mostly
[18:37:50] <^d> SMalyshev: Not on purpose
[18:38:05] ^d: how can you accidentally write such code?
[18:38:07] thedj: i meant within wikimedia/mediawiki :)
[18:38:14] <^d> But the $foo = 'string'; list($a, $b) = $foo
[18:38:18] <^d> That's possible by accident
[18:38:19] thedj: (even just to play or for 10 minutes)
[18:38:29] SMalyshev, you might right list($a, $b, $c) = $threeLetterCode;
[18:38:30] ^d: that works as expected. Only literal string doesn't
[18:38:49] superm401: s/right/write/ :)
[18:38:54] SMalyshev, it's a legit WTF that they work differently with constant strings.
[18:39:22] superm401: yes, except that nobody would use literal string in this context, so it's a bad example
[18:39:33] SMalyshev, I disagree. You might in a REPL, or in a test.
[18:39:50] superm401: why? for what legit purpose such code may be used?
[18:39:58] If it works differently for a constant, that's a major pain when trying out code in a PHP shell (REPL).
[18:40:12] I might want to use $threeLetterCode for real, but use 'abc' when testing in a PHP shell.
[18:40:21] superm401: did you actually try it and it was a major pain for you in REPL?
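For the list() discussion above, here is a minimal sketch of one way to split a short string into variables without relying on how list() treats strings. It assumes the hypothetical `$threeLetterCode` variable from the chat. Because str_split() returns a real array, the result should not depend on whether the right-hand side is a literal or a variable, nor on the PHP version in use.

```php
<?php
// Hypothetical value, named after the variable mentioned in the chat.
$threeLetterCode = 'abc';

// str_split() turns the string into array('a', 'b', 'c'), so list()
// destructures an array rather than a string.
list($a, $b, $c) = str_split($threeLetterCode);
echo "$a-$b-$c\n"; // prints "a-b-c"

// The same call works unchanged with a literal, e.g. in a PHP shell.
list($x, $y, $z) = str_split('xyz');
echo "$x-$y-$z\n"; // prints "x-y-z"
```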
[18:41:56] I'm 99% sure nobody present every tried it in a real context
[18:41:59] jeremyb2: some use outside Facebook on their blog, http://hhvm.com/blog/6005/hack-community-roundup-3
[18:42:07] can we switch tomorrow ? it's not the perfect language, but it seems a whole lot of more usable than pure php to me :)
[18:43:03] Don't necessarily want to rush to adopt a language controlled only by one company. We saw how that worked with Oracle.
[18:43:04] We can't do much experimenting like that if we want to maintain support for Zend PHP.
[18:43:18] Luckily, some of this stuff has been adopted by PHP 7.
[18:43:36] Does it transpile to PHP 5.3?
[18:43:39] thedj: there are at least two hosts and imagescalers still on zend
[18:43:45] list() on strings is gone on PHP 7 btw
[18:43:49] <^d> anomie: We'll drop 5.3 support soon enough anyway
[18:43:56] <^d> 5.6, I believe
[18:43:59] there, zend escape hatch built in !
[18:44:02] ^d: halelujah! :)
[18:44:09] anomie: "The Hack transpiler (or h2tp) provides a PHP 5.4+ compatibility"
[18:44:11] ^d: I've been hearing that for a while :/
[18:44:11] ^d: well, if it transpiles, then maybe not? ;)
[18:44:30] <^d> anomie: Get the rest of cluster on zend.
[18:44:32] list() on strings?!.... wow >_<
[18:45:47] spagewmf: afaik that does not support at least async, so we still could not use all of hack
[18:46:07] That's also almost valid E4X. :)
[18:46:16] I wonder object of what is xhp? is it DOM?
[18:46:35] can I xpath it?
[18:46:40] Woah, weird colons all over the place suddenly
[18:47:02] DanielK and all: Can I please ask a general question about AuthManager and ContentHandler, since I'm a little unfamiliar with them, and in terms of Wikipedia's 288 languages? How do these work inter-lingually? And in what ways are https://www.mediawiki.org/wiki/Requests_for_comment/AuthManager and https://phabricator.wikimedia.org/T89733 - anticipating adding further languages? Thanks.
[18:47:02] Yeah, are those inner classes or a new namespace syntax or what?
[18:47:12] <^d> We should use hack in wmf-config shit. Multiversion and CommonSettings and the like.
[18:47:36] ^d: Yeah, or an opt-in database sub class
[18:47:42] <^d> That too ^
[18:47:48] xhp is it's own thing. https://www.facebook.com/notes/facebook-engineering/xhp-a-new-way-to-write-php/294003943919?_fb_noscript=1
[18:48:21] Scott_WUaS_, people are watching a tech talk right now. Maybe ask on https://lists.wikimedia.org/mailman/listinfo/wikitech-l or https://lists.wikimedia.org/mailman/listinfo/wikitech-l .
[18:48:29] Err, or https://www.mediawiki.org/wiki/Project:Support_desk
[18:48:33] Thanks
[18:48:33] ^d: also we could use it where we compile into PHP. E.g. lightncandy compiles templates to PHP. It's unclear if we gain much performance
[18:48:56] sounds a bit like angular directives
[18:49:09] superm401: I think XHP is more like a different kind of quoted string
[18:49:27] Yeah, plus it lets you make elements that know how to render to real HTML.
[18:50:13] XHP things are actually classes with a parse level syntactic sugar to let you write them like xml fragments
[18:51:58] People probably wouldn't inline them all in real life. E.g. the authors would probably be $postAuthors
[18:52:04] Some PHP functions take it in either order!
[18:52:13] we are going to need to be quick with questions today - sorry - we need to be out of this room exactly on the hour
[18:52:16] can't you just use foreach and write in like 2 lines?
[18:53:23] I will work with Josh to fine a way for people to ask questions after this.. maybe be email
[18:53:31] *by email
[18:53:33] Yay, more javascript paradigms
[18:53:53] rfarrand: someone can have the speaker look at his laptop in the collab space while we ask questions in this IRC channel
[18:54:01] ok, so we've got shorthand lambda syntax
[18:54:11] questions?
[18:54:17] which also imports whole scope
[18:54:29] In fact, Yahoo recognised the similarity with Hack and wrote static type system on top of Javascript.
[18:54:38] imports whole scope [citation needed]
[18:54:48] http://flowtype.org/
[18:54:52] question: is anyone using the transpiler?
[18:54:54] s/Yahoo/Facebook
[18:54:58] I believe Erik B told me it adds implicit use statements.
[18:55:08] Flow is Hack for JavaScript.
[18:55:27] Also, JavaScript does the same thing, but I doubt it actually keeps the whole scope alive in any sane JS VM.
[18:55:30] <^d> bd808: Hahaha. I tried to test hack on a mostly-unused filed in wmf-config but this happened:
[18:55:36] <^d> 18:54:30 sync-file failed: Command '/usr/bin/php -l /srv/mediawiki-staging/w/query.php' returned non-zero exit status 255
[18:55:51] anyone in the hangout can ask directly
[18:56:05] ^d: heh. lint failure
[18:56:07] Krinkle: Flow is improved discussion and collaboration for MediaWiki :)
[18:56:08] andrewbogott_afk, bd808, ori
[18:56:13] ^
[18:56:38] opps, not andrewbogott_afk, i mean andre_
[18:56:43] Hi rfarrand o/
[18:56:49] hi Krinkle :)
[18:57:17] last chance for questions!
[18:57:28] ping me so I know that you actually want me to ask
[18:58:07] superm401: do you still want "Would there we a way to do "call the friends method on u" speculatively before you know if it's needed, then only use it sometimes??
[18:58:14] rfarrand, yes
[18:59:05] so...all hooks wouldn't be type checked?
[18:59:09] rfarrand: I want to ask about governance and future. Are there any core committers outside Facebook? What's to stop FB from screwing up Hack?
[18:59:20] No need for 'do'
[18:59:22] question: is there a performance difference between running async code in hhvm in native hack and the same code transpiled to php?
[19:00:29] So strict mode is unusable for us. Ok.
[19:00:33] basically, to enable strict typing you have to give up the dynamic nature of php
[19:00:47] all variable dispatch etc is gone
[19:01:32] and if there's no multidispatch in functions is may become a little awkward
[19:01:36] anomie: you can still use strict mode for some files
[19:01:52] I argue you never need $foo->$bar in serious code.
[19:02:02] who is the speaker?
[19:02:09] jzerebecki: you can but you lose the strict guarantee (no runtime errors) unless you go total strict
[19:02:15] sure
[19:02:21] If there's a finite number of options, explicit flow should do. If there's more than a few, the code needs refactoring.
[19:02:28] The only way we could really use it would be via libraries that had both a php and a hack version (or a hack and transpiled version)
[19:02:32] It's one of those bad parts you can just take out of the language as a user and not use.
[19:02:40] spagewmf and anyone else
[19:02:43] or in stand alone service things
[19:02:44] Josh is still here
[19:02:52] want me to set up a hangout
[19:02:55] and you can ask questions?
[19:03:07] Krinkle: declaring dynamic dispatch a bad part of PHP is going quite far. IMHO it is one of the good parts
[19:03:10] rfarrand: we're in IRC, maybe he can join it
[19:03:27] ok, he is standing around chatting with people now
[19:03:36] I can suggest it afterwards
[19:03:37] SMalyshev: dynamic functions and weak typing (e.g. one function can return different types) is good.
[19:03:49] But variable function calls is imho just poor application design.
[19:03:55] Krinkle: of course it can be done in other means (like multimethods)
[19:03:58] There's no need forit.
[19:04:05] rfarrand: if he has a laptop, he visits https://meta.wikimedia.org/wiki/IRC_office_hours , clicks #wikimedia-office^connect and is in
[19:04:06] Krinkle: no they are not
[19:05:02] they allow you to handle dynamic range of options without writing huge switches etc.
[19:05:14] Don't need switches either
[19:05:38] SMalyshev: Got an example of where a method name is passed as a variable in good code?
[19:06:12] Krinkle: that would depend on your definition of good code. Since you predefined all code where it's used to be bad, we have a contradiction here
[19:06:51] if you give another definition - sure, I used it many times, but by virtue of that you'd define my code bad. it still worked, was nice, readable and easily extendable
[19:07:38] SMalyshev: An example in code you consider good, then.
[19:08:02] I'm open to redefining my scope to include it if it seems reasonable to me.
[19:09:51] Krinkle: an implementation of Strategy pattern would use it
[19:10:05] for example
[19:12:06] (of course, you don't have to pass method name - closure would do too)
[19:12:23] in general, what PHP calls "callable"
[19:13:42] but beyond that, any code that you would implement as multimethod in a language supporting them
[19:13:55] bd808: so for library we'd have to figure out how to have hhvm hosts pick the hack version of the library, and zend hosts use the regular (maybe transpiled) version.
[19:15:09] what's multimethod? multiple definitions with different signatures?
[19:15:15] bd808: I wonder what that gets us? Maybe runtime speedup, and speaker Jeff would say Hack's type checking delivers better library quality
[19:15:19] spagewmf: yeah, which would be a configuration time option I think. Two libraries implementing a common interface and you'd use composer or whatever to choose which one to deploy and run
[19:15:34] jeremyb2: basically set of methods that have the same name but different arg types
[19:16:02] SMalyshev: method overloading does not exist in PHP.
[19:16:05] so you can do foo.bar(xyz, baz) and which bar is called depends not only on foo but also on xyz and baz
[19:16:05] Nor javascript
[19:16:22] Krinkle: that's correct. But you can do the same with dynamic functions
[19:16:27] It only has one definition.
How is that related to calling a method from a given string?
[19:16:32] spagewmf: and possibly features like async. imagine a service pattern where one version used curl/multi-curl and the other used async/await
[19:16:52] Krinkle: don't focus on strings, it doesn't have to be a string, any callable
[19:17:00] I think we're in a terminology mixup. I'm not familiar with using terms 'multimethod' or 'dynamic functions' by your definition. I probably know them as something else.
[19:17:15] SMalyshev: Invoking a callable (e.g. $foo() as opposed to $bar->$foo()) is fine.
[19:17:19] but the danger of course is diverging implementations and the php codebase rotting (unless it was transpiled)
[19:17:22] And Hack will support that
[19:17:31] Krinkle: imagine you have a Printer method than needs to print a number of types. $printer->print($anything)
[19:17:54] does phab use hack? are we running on zend or hhvm?
[19:17:57] how you make it easy for it to do that without having a huge switch or other uglyness?
[19:18:08] Printer class rather
[19:18:10] SMalyshev: print is the method name, yes/
[19:18:18] Where is the variable>
[19:18:36] Krinkle: yes, for public API but you can have internal implementation. Imagine you have 20 types to print
[19:18:52] each type has different printing code
[19:19:11] SMalyshev: And so you'd do that. get_type($anything) and call internal $this->print_$type($anything): ?
[19:19:13] That'd horrible.
[19:19:39] A switch case for type seems quite reasonable there.
[19:19:49] Krinkle: that's one way, not get_type() - that's produce "object" - but with $anything->getPrintType()
[19:20:03] Krinkle: 20-clause switch? You really think it's better?
[19:20:10] SMalyshev: If the anything object has a method then there is no variable.
[19:20:29] Krinkle: if each printing method has 20 lines, you've got 400-line function. And god forbid you miss a break; anywhere...
[19:20:30] SMalyshev: can you show us example zend php for what you're imagining?
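The Printer scenario debated above can be sketched in two ways. This is a hedged illustration and not the code from the gist or from anyone in the channel: the class names `House`, `Car`, `DynamicPrinter` and `MapPrinter`, and the `render*` method names, are assumptions that only loosely follow the chat (`getPrintType()` is the one name taken from it). The first printer builds a method name from a string, which is the `$this->$method()` style under discussion. The second keeps an explicit map of callables, the `$foo()` style described as fine, and also avoids a long switch.

```php
<?php
// Plain data holders: they report what they are, not how to print.
class House {
    public $rooms = 3;
    public function getPrintType() { return 'House'; }
}
class Car {
    public $wheels = 4;
    public function getPrintType() { return 'Car'; }
}

// Style 1: variable method call, name built from the reported type.
// ("render" is used because "print" is reserved in older PHP versions.)
class DynamicPrinter {
    public function render($thing) {
        $method = 'render' . $thing->getPrintType();
        if (!method_exists($this, $method)) {
            throw new InvalidArgumentException('No renderer for ' . $thing->getPrintType());
        }
        return $this->$method($thing);
    }
    protected function renderHouse(House $h) { return "House with {$h->rooms} rooms"; }
    protected function renderCar(Car $c) { return "Car with {$c->wheels} wheels"; }
}

// Style 2: explicit registry of callables, looked up by type.
class MapPrinter {
    private $renderers;
    public function __construct() {
        $this->renderers = array(
            'House' => function (House $h) { return "House with {$h->rooms} rooms"; },
            'Car'   => function (Car $c) { return "Car with {$c->wheels} wheels"; },
        );
    }
    public function render($thing) {
        $type = $thing->getPrintType();
        if (!isset($this->renderers[$type])) {
            throw new InvalidArgumentException("No renderer for $type");
        }
        $renderer = $this->renderers[$type];
        return $renderer($thing);
    }
}

$dynamic = new DynamicPrinter();
$mapped = new MapPrinter();
echo $dynamic->render(new House()), "\n"; // House with 3 rooms
echo $mapped->render(new Car()), "\n";    // Car with 4 wheels
```

Both versions keep the printing logic out of the data classes. They differ mainly in whether the set of supported types is implied by method names or listed explicitly in one place.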
[19:20:31] SMalyshev: I think you misunderstood my original premise. Which is that calling a method name on an object with a variable is bad practice.
[19:21:08] e.g. $this->$bar($something) where $bar holds a string that is a method name.
[19:21:17] jeremyb2: give me 2 mins to sketch it out
[19:21:45] Krinkle: but $this->bar($something) is fine, right?
[19:22:48] jeremyb2: Yeah, totally. I don't think any sizable program wouldn't have that.
[19:22:54] That's the most boring code any PHP program would have.
[19:22:56] ok :)
[19:22:57] bread and butter
[19:23:08] i don't look at php too much :D
[19:23:30] * jeremyb2 repeats: does phab use hack? are we running on zend or hhvm?
[19:23:41] chasemp: ^
[19:23:41] jeremyb2: something like this: https://gist.github.com/smalyshev/a6528384707ddbfe38e9
[19:23:48] jeremyb2: I think it might run hhvm, but not using Hack internally.
[19:23:57] it doesn't
[19:24:11] so, zend or hhvm?
[19:24:17] It runs on nginx and claims powered by PHP/5.5.9-1ubuntu4.5
[19:24:19] old but http://www.quora.com/Does-Phabricator-run-on-HipHop-for-PHP
[19:24:31] afaik evans opinion hasn't changed
[19:24:46] Krinkle: that's exactly what I disagree with
[19:24:49] hhvm is //possible// maybe even easy but no one is using it that I have seen
[19:25:03] chasemp: Hm... I knew Phabricator doesn't use Hack features. But do we not run it on hhvm, like other app servers?
[19:25:14] that's correct
[19:25:21] I don't think it is a bad practice - actually, I think it's one of a very useful tools in PHP toolkit
[19:25:29] Or does this mean this is the only server we have that uses Zend PHP with a version other than 5.3
[19:25:32] ok, so we could do hhvm if we felt like it but essentially only facebook actually does it
[19:25:50] there are other zend instances running I believe, thought I cant' remember what
[19:26:03] chasemp: on trusty / php 5.5?
[19:26:25] the determination to purge standard php from mw is larger than appservers
[19:26:26] we outsourced wordpress. we probably have some drupal
[19:26:27] trusty PHP 5.5.9
[19:26:36] is fundraising all hhvm?
[19:26:46] (and/or civi)
[19:27:28] it's a good question for jgreen but I think not
[19:27:34] huh, i thought contacts.wm.o was dead. but seems to still be running
[19:27:48] Krinkle: wikitech is on trusty / php 5.5 as well
[19:28:43] hhvm isn't all sunshine and roses, for performance it's brilliant in some scenarios right but adds complexity and isn't a total win in every case
[19:28:52] but then again it's not my call to make
[19:29:29] complexity for developer? or you mean precompiling hhbc?
[19:29:35] if we were cpu bound for phab and were willing to take on the local mismatch from the canonical instance it would be worth it
[19:29:39] jeremyb2: I don't think fundraising is on a branch that supports hhvm yet. Anything from before this past November or so would have various blocker bugs for running on hhvm
[19:29:40] complexity for support
[19:29:59] bd808: ahh
[19:30:42] bd808: remember the great code review marathon of 2014? reviewing stuff that had already been deployed for a while. (I guess self-merges)
[19:31:09] SMalyshev: We'll have to disagree to agree ;-). I've seen the concept you mean there but never written like that and not aware of any qualitative or maintainability compromise. And I'm quite strict when it comes to readability and such in code.
[19:31:34] jeremyb2: I didn't get roped into that but I did hear about the list that had to be checked out, yeah
[19:33:01] Krinkle: so how would you write that code better? e.g. in example on gist, what would you do?
[19:35:31] SMalyshev: If you consider a wide range of conceptually similar problems but that e.g.
involve something other than a string (or something that would fit within a reasonable method name) or something that isn't returned by a single method, then one woudl also be forced to use a registry, subclasses, switch case or elseif construct.
[19:35:33] https://github.com/wikimedia/mediawiki/blob/631186747a9/includes/content/JsonContent.php
[19:35:34] for example
[19:35:51] or https://github.com/wikimedia/mediawiki/blob/631186747a9/includes/json/FormatJson.php
[19:37:32] Krinkle: if you have complex case that you can not reduce to a simple parameter, yes. But most cases can be reduced so, at least within the parameters for task in hand
[19:37:37] Krinkle: SMalyshev: why not just have a ->print() instead of a (or in addition to) a ->getPrintType() ?
[19:38:29] jeremyb2: because House class doesn't know about how printing is done, it's not its business. It just holds data
[19:39:05] jeremyb2: there could be 100 of different printers, we can't put all of them inside House. Of course, if you have only one printer, then you can do it this way
[19:39:23] i don't understand
[19:39:30] House doesn't have to know how to print everything
[19:39:31] jeremyb2: but once you have the second one, you don't want to go and modify all sibject classes, right?
[19:39:34] just how to print House
[19:39:35] *subject
[19:40:03] jeremyb2: but what if you have 10 ways of printing? as HTML, as SVG, as JSON, as PNG, etc.
[19:40:25] that's different
[19:40:56] jeremyb2: right. so you can do it both ways, depending on which part would be likely to expand most and which part needs to know most
[19:41:04] first you print to e.g. XML. or YAML. or just an in memory structure. then something renders it to HTML, SVG, JSON
[19:41:17] both ways are valid, in appropriate circumstances, I just describe on of them
[19:41:42] jeremyb2: you seek workaround for a fixed design.
that's not a point of the exercise - the point is to find better design :)
[19:41:56] If every printer needs logic for every type, something is wrong.
[19:42:26] Krinkle: every printer always needs the logic for every type, there's no way out of it. The question is *where* this logic is located
[19:42:27] And yeah, it seems like a syntactical trick that saves a few lines of code and decreases readability.
[19:42:58] If it only contains data, and the data is in a usable format. The printer should be able to deal with it in a generic way.
[19:43:17] perhaps baring a small set of exceptions that are generic based on the shape of the data, not a specific type
[19:43:19] Krinkle: no it doesn't decrease redability - you can very easily know which code goes where by just looking at the names, and you also have right types
[19:43:37] Anyway, that's how I roll.
[19:43:37] o/
[19:44:05] maybe the problem is that this is an abstract discussion. SMalyshev can you give some real production code as an example?
[19:44:07] Krinkle: you seem to just refuse to consider the case where you can not have generic logic that works for all cases
[19:44:18] jeremyb2: real production code of what?
[19:44:38] of $$n or $a->$b
[19:44:58] that is analogous to your House example
[19:45:10] of course it's an abstract discussion. How discussion about abstract design option like "dynamic function are always bad" can't be abstract?
[19:45:30] if you show an example that's not bad
[19:45:37] SMalyshev: I still think that the only case that warrants such code flow is if you're dealing with data from a foreign source you do not control, and as such can't blame it on one's own design.
[19:46:02] jeremyb2: I don't want to give you my code since it's essentially the same and I don't have code handy right now since I didn't spend last night looking for specific production code example for this :) [19:47:56] As with all bad parts in languages, there are safe ways to use them, but it takes effort to distinguish from unsafe ones and it's easily written a different way that doesn't require use of that feature. Akin to e.g. the 'with' statement or 'goto'. [19:48:39] Krinkle: properly compartmentalized module rarely controls all the environment around it, it'd be too much information inside the module [19:48:54] so if the environment needs to change, the module's assumptions would break [20:03:33] SMalyshev: i think when Krinkle says foreign he means something broader than just outside the module [20:04:08] Yeah. If the data is within your own application, it should be possible to design it in a way that doesn't require this [20:04:46] Krinkle: I don't think modifying your data so you avoid a prejudice in using specific language tools is a good design principle [20:04:52] And if such data would enter an application, if it needs this kind of repetition and redesigning in every printer, I'd probably just abstract it first (within pure data) so that it isn't needed. [20:05:05] SMalyshev: It's quite the other way around. [20:05:17] It's escalating down to a principle in this case. [20:05:54] I don't see how declaring part of language toolkit bad by principle and rearranging all the data around it is a good design approach. [20:06:07] Consider a large Venn diagram and there's a tiny bit of pixel overlap of this particular language feature, but when zooming out there is a large body of reason on top of this, it just happens to alienate this feature. And I don't see that as a coincidence. [20:06:53] goto is bad. It took 2 generations to get there and make it ubiquitous without argument.
People are used to doing things a certain way and that simply takes generations to get around. [20:07:08] It's not bad because it doesn't work well with complex data or edge cases. [20:07:15] It's inherently bad. [20:07:44] There can be very large mostly good programs that happen to use it. Then changing that one part seems odd. Especially if it requires the underlyin data model to change. [20:07:49] Krinkle: goto is not abstractly bad (and also not always bad, try to implement a good complex parser that performs without goto and you'll see how it goes). goto is bad because of its consequences [20:08:06] I [20:08:33] I've written and read good well-performant parsers in C, JavaScript, PHP and python without goto. Not missed it a single second. [20:08:42] Krinkle: nothing is "inherently bad". it's just a dogmatic approach to programming, turning it into a religion. programming is not a religion, it's a craft [20:08:48] Anyway, this isn't going anywhere. in #wikimedia-dev for anything else. [20:09:20] Removing bad parts liberates the mind to focus on things that matter. Letting go of the small cases where it felt nice is part of moving on. [21:00:49] #startmeeting RFC meeting [21:00:50] Meeting started Wed Mar 4 21:00:49 2015 UTC and is due to finish in 60 minutes. The chair is TimStarling. Information about MeetBot at http://wiki.debian.org/MeetBot. [21:00:50] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:00:50] The meeting name has been set to 'rfc_meeting' [21:01:10] o/ [21:01:20] \o [21:01:22] #topic AuthManager | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ [21:01:30] #link https://www.mediawiki.org/wiki/Requests_for_comment/AuthManager [21:01:38] * aude waves :) [21:01:54] hey aude [21:02:04] actually on time ... 
[21:04:23] so there are these 4 sets of components [21:04:36] I'm gonna sit this one out because I'm falling asleep [21:04:36] Hopefully next week I'll have adjusted to this timezone properly and I can actually participate [21:04:53] PreAuthenticationFilter, AuthenticationProvider, SecondaryAuthenticationProvider, PostAuthenticationFilter [21:04:58] ok RoanKattouw [21:05:26] I suppose they will be the names of interfaces or abstract classes? [21:05:42] Most likely, yes. I'm not sure about PostAuthenticationFilter. [21:06:01] That one may just be a Hook rihgt? [21:06:06] Yes. [21:06:10] what's the diff between AuthenticationProvider and Secondary* ? [21:06:50] Our split was things that "really" authn (db, ldap) and additional controls (2-factor) [21:07:05] but how would the interface be different? [21:07:05] to allow composition of the parts of the flow [21:07:11] Only one AuthenticationProvider is actually asked to "PASS" on the authentication request, and it determines the local identity of the user being authenticated. SecondaryAuthenticationProviders are stuff that should be done during the authn process like "reset password" [21:07:20] actually, it would be useful to see what the interface of any of them is supposed to look like [21:07:28] Where if they fail then the login fails, but they don't directly determine who is logging in. [21:08:06] We haven't decided on code-level interface yet, we didn't want to get too much into the weeds before we knew where we were going generally. [21:08:08] wouldn't these be just PostAuthenticationFilter's? [21:08:19] or filters aren't allowed to ask for data? [21:08:21] no, because the user isn't authenticated [21:08:28] there's three user states [21:08:36] not logged in, half logged in, and completely logged in [21:08:39] No, because two-factor determines whether whether yo're authenticated. [21:08:40] PostAuthenticationFilter is for stuff that replaces the usual "redirect back to the 'returnto'", like GettingStarted. 
[21:08:43] Post is *after* authentication. [21:09:02] AuthenticationProvider and SecondaryAuthenticationProviders are the two transitions between those three states [21:09:36] aha. I think it is worth adding to state diagram - i.e. which one does which state [21:09:36] I'm not sure how the half logged in state is implemented at the moment [21:10:02] anomie, superm401: so, how is the post-ath thing a "filter", then? [21:10:03] the half logged in state would be that we think we know who you are, but you can't do anything until you finish confirming your identity [21:10:04] maybe that is something we just hacked up for the global password reset we did a while back? [21:10:18] TimStarling: In current MediaWiki? I don't know that we even have that state. Probably some hook that aborts the login somehow. [21:10:20] Currently, the half-logged in state is just when you know the user is logged in, but you haven't set $wgUser yet... [21:10:43] There are a couple cases in the big switch statement that do that... it's ugly. [21:10:45] seems to me that a post auth hook is useful, but calling it "filter" seems misleading [21:10:58] Not my term, I was just explaining it. [21:11:04] SpecialUserlogin has 19 Hooks::run() calls; I bet several of them are the "abort" steps for the half-login state [21:12:03] DanielK_WMDE: I was grasping for words that were semi-descriptive. Filter fit in my head like a unix output filter [21:12:07] We really just have 2 half-logged in special cases right now-- temporary passwords and (hard) expired passwords. Both are ugly. 
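The three user states and the two transitions described above can be sketched as a small state machine. This is a Python illustration only: the participants say the code-level interface has not been decided, so AuthFlow, its method names, and the use of plain callables as providers are all assumptions made for this sketch, not the AuthManager API.

```python
from enum import Enum


class LoginState(Enum):
    NOT_LOGGED_IN = "not logged in"
    HALF_LOGGED_IN = "half logged in"
    LOGGED_IN = "completely logged in"


class AuthFlow:
    """Hypothetical tracker for one login attempt."""

    def __init__(self):
        self.state = LoginState.NOT_LOGGED_IN
        self.user = None

    def authenticate(self, primary_provider, request):
        """First transition: the primary provider decides who the user is."""
        if self.state is not LoginState.NOT_LOGGED_IN:
            raise RuntimeError("primary authentication already done")
        user = primary_provider(request)  # returns a user name or None
        if user is None:
            return False
        self.user = user
        self.state = LoginState.HALF_LOGGED_IN
        return True

    def run_secondary(self, secondary_providers, request):
        """Second transition: extra checks can fail the login but never pick the user."""
        if self.state is not LoginState.HALF_LOGGED_IN:
            raise RuntimeError("no half-logged-in user to confirm")
        for check in secondary_providers:
            if not check(self.user, request):
                # Any failing secondary check aborts the whole login.
                self.user = None
                self.state = LoginState.NOT_LOGGED_IN
                return False
        self.state = LoginState.LOGGED_IN
        return True
```

In this sketch a two-factor check would be one entry in secondary_providers; anything in the PostAuthenticationFilter role would only run once the state is LOGGED_IN.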
[21:12:14] it will definitely be nice to declare half login to be an actual thing instead of a collection of hook hacks [21:12:54] * anomie notes names can be bikeshedded, he doesn't care much [21:13:03] yeah we should probably think of a better name for it [21:13:07] bd808: Hooks::run( 'AuthenticationComplete' ) [21:13:16] I think the state tracking will be handled in the AuthenticationResponse object [21:13:47] bd808: it would be great if we could have a clean separation of service objects and value/state objects here. [21:13:49] bd808: Or in the session that's being set up through the whole process. [21:13:53] That object being a POPO that passes data around [21:14:05] yea [21:14:09] DanielK_WMDE: I think we do [21:14:24] But please point out where we may be mixing [21:15:01] bd808: I didn't mean to imply that you are, just blurting out my thoughts :P [21:15:06] We should also remember that some half-logged in filters depend on past history. [21:15:11] but I'm having a hard time seeing any of this from the rfc [21:15:21] E.g. a CAPTCHA that comes up for 3 bad logins (but doesn't show on your first attempt). [21:15:25] would be great to see some (mock) code, and maybe a state/transition diagram [21:15:48] well, the data flow diagram is kind of like that i guess [21:15:50] Not sure how that would fit in. [21:15:54] *nod* I think the state transition is there in narratives but not explicit [21:16:12] so are the inferfaces [21:16:21] I think the auth flow and the actual backend checking should be separated - i.e. there are many ways to ask for username/password, including multi-screen ones (e.g. anthi-phising images etc) [21:16:26] The narratives in https://www.mediawiki.org/wiki/Requests_for_comment/AuthManager#Federated_auth approach a pseudo-code level of detail [21:16:34] superm401: the captcha plugin would be responsible for maintaining that state I think [21:17:01] SMalyshev: They are, based on the AuthenticationRequest types. 
[21:17:19] superm401: it would have to be tied to a session some where to track failure attempts [21:17:44] Probably in AuthManager itself decorating the session I guess [21:18:13] Right. So there would be one CAPTCHA PreAuthenticationFilter for login, but usually it reports that no form fields should be displayed (a noop), and only for certain session history does it ask you to show a CAPTCHA form? [21:18:19] and then like legoktm says the captcha provided SecondaryAuthenticationProviders would kick in a enfoce it [21:18:34] superm401: Yeah [21:18:49] Throttling is based on ip, not session... but yeah, either way I think the plan is for the captcha plugin to determine what it needs. [21:18:56] SMalyshev has a good point, I guess you would call it model/controller separation? [21:19:34] you could put a line between session and request management and verifying authentication [21:19:47] TimStarling: right. there should be something that I can call like $something->validate($username, $password) regardless of how I came to have the pair [21:20:06] TimStarling, SMalyshev: The AuthenticationRequest type defines what fields are requested from the user, and an instance holds values of those fields. Then the backend gets one of those instances and can do whatever it wants, including requesting another AuthenticationRequest type to get further info. [21:20:41] yeah, so the backend is deciding how many request/response cycles are required from the user [21:20:53] AuthenticationProvider is that place that checks the data collected however [21:20:56] and the backend is presumably responsible for managing multi-request state in a session [21:21:00] when I get this new AuthenticationRequest, where the old one is kept? Session? [21:21:18] *howefer it was collected [21:21:24] Session, or wherever else the backend might want to store it. [21:21:39] i'm still blurry about the state management. all state needs to be in the session, right? 
and when a request comes in, some code needs to find out in which phase we are, and which auth module/object should take control at this point. Then the relevant module can look at the session and request, and act on that input. [21:21:59] but how do we make sure we know which component to transfer control to when a request comes in? [21:22:07] That's what AuthManager is for: tracking state in the session and determining how to dispatch stuff. [21:22:51] plus the entry points needing to know what state they are in based on the response from AuthManager [21:22:57] did a miss the bit where it sais how this is done? [21:23:17] Special:Login (and ApiLogin) would basically take input, make one or more AuthenticationRequest instances, and call some AuthManager->start() or AuthManager->continue() method. [21:24:05] bd808: Beyond "starting a new auth-flow" versus "continuing an auth-flow already in progress", I don't think the frontend needs to know much. [21:25:02] would the provider know how login request was asked? i.e. you may want to ask somebody to create an account if asked via web but not via API, for example [21:25:22] I think the basic federated auth flow describes the state diagram -- https://www.mediawiki.org/wiki/Requests_for_comment/AuthManager#Simple_login_via_web_UI [21:25:37] anomie: would be useful to have the interface/methods drafted somewhere [21:25:52] SMalyshev: No. The provider would just return "No account" and the frontend would decide how exactly to reflect that result to the user. [21:26:08] DanielK_WMDE: "write the code and I'll tell you if it's ok"? [21:26:23] ^ what he said [21:26:33] bd808: the interface, not the code [21:26:59] anomie: so where the "no account" part would be? is it "UI"? 
[21:27:19] We hoped to describe the problem at a level that the interface was an implementation detail, but I think I understand your point [21:27:51] I tend to agree that interface (literal one or mock class or something, no code probably just the defs) may help [21:28:07] only function defs, only public APIs, etc. [21:29:09] well, the more you add to the RFC, the harder it is to approve [21:29:38] I think the overall direction is pretty clear already. [21:29:52] And most of the details would probably fall right into place. [21:29:59] TimStarling: true for scope/features. but i wouldn't say that more explanation or details make an rfc harder to approve [21:30:14] Can we agree in principle to move forward and then get interested parties involved in further design and code review? [21:30:39] SMalyshev: The AuthenticationProvider would return a "PASS" with no local user mapping. We don't seem to have defined exactly how that situation gets returned from AuthManager to the frontend. [21:30:57] Or are the fundamental questions that need to be answered? [21:31:02] I think we have no really serious objections [21:31:23] i personally don't see anything i'd disagree with, but a lot that i understand only vaguely... [21:31:44] but that's my problem, i guess [21:32:08] I like the general structure but I expect there would be some edge cases not described here. But that's ok [21:32:16] #info SMalyshev is concerned about lack of model/controller split [21:32:34] #info several people would like to see interface definitions [21:32:38] It's hard to think of all the edge cases until you get down into the code. [21:32:42] DanielK_WMDE: I can make sure you get looped in to design and early code reviews [21:32:43] anomie: agree [21:33:21] bd808: to be honest, i'm curious about it, but probably won't have time to really dig in. 
[21:33:29] *nod* [21:33:31] sure, cc me, but no promises :) [21:33:52] #info no serious problems in principle given the current level of detail [21:34:04] One question I think we wanted feedback on was how aggressive we can be about deprecation of the existing plugin/hooks. Are we ok totally removing AuthPlugin and most hooks by 1.27? [21:34:31] 1.27 is the next LTS [21:34:45] and we'd like to kill the old stuff before it ships [21:34:58] * AaronS didn't see any serious problems with it either [21:35:02] That is ~12 months out [21:35:16] getting to some code/interfaces seems low enough risk then [21:35:20] who's going to rewrite Special:UserLogin and Special:ConfirmEdit (CAPTCHAs) to provide these additional AuthenticationProvider UI fields? The former is especially crufty. [21:35:32] we don't have time for talking about deprecation right now [21:35:37] probably AuthPlugin is used by third parties a bit, although not sure how up-to-date people keep their wikis [21:35:40] let's move on to Daniel's RFC [21:35:41] spagewmf: This project would [21:35:42] * aude would be ok with it [21:36:05] #topic ContentHandler search support | RFC meeting | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/ (Meeting topic: RFC meeting) [21:36:07] bd808: godspeed [21:36:08] * manybubbles is here for DanielK_WMDE's rfc [21:36:12] spagewmf: Hopefully with help from some frontend/design folks for the design aspect. [21:36:17] #link https://phabricator.wikimedia.org/T89733 [21:36:36] so, let me give a short intro [21:36:50] currently, there's Content::getTextForSearchEngine [21:37:01] getTextForSearch* [21:37:16] heh, aude knows the interface better than I do ;) [21:37:22] anyway, it returns a string which is then proccessed for the full text index. 
[21:37:28] DanielK_WMDE: nah you are right [21:37:42] that works well enough for text based content, but is a bit sad for anythign more structured [21:37:53] getTextForSearchIndex [21:38:02] * manybubbles doesn't care about method name [21:38:12] :) [21:38:22] so, for structured data, it would be nice to be able to expose individual fields separately [21:38:35] with a weight and a data type defined for each type [21:38:52] This might be relevant for Flow as well, though it's a bit more complicated since the data isn't really stored in the page. [21:39:02] so for INDEX_TYPE_QUANTITY - what would happen when quantities have units? [21:39:05] that way, Wikibase could tell cirrus "this is the label in French" and "this is a property value" [21:39:21] superm401: agreed. I'm not sure what to do there [21:39:50] this proposal is one way to make Cirrus to a better job with wikidata [21:39:50] SMalyshev: nothing, probably. for that level of sophistication, we'd want to integrate a proper query service [21:39:54] When is this method called? Just on save? Is there a way to trigger it? [21:40:01] that's beyond what i intended with this rfc [21:40:17] for now, it would just be great oif the content handler could define different fields to expose to the search index [21:40:18] superm401: that is probably complicated - it depends [21:40:35] superm401: links update. more or less. [21:40:50] do we use non-text fields in cirrus search now, like geo? [21:41:03] SMalyshev: yes - they are added using hooks [21:41:05] SMalyshev: via geodata extension [21:41:07] superm401: the search index is secondary data, like the link tables. updated from the job queue after save. but it depends on which search engoine you use. [21:41:07] cirrus calls some hooks [21:41:50] DanielK_WMDE: i think it's not really right not to think about where the request came from - it was really about making cirrus better for wikidata. unless its about something else that I've missed. 
[21:41:53] superm401: the fact that flow data isn't really stored in pages means you'll end up re-implementing a lot of core functionality. all the secondary data tracking stuff... [21:42:25] DanielK_WMDE, yeah, we have. Some stuff we can leverage core, other stuff we have to re-implement. [21:42:44] manybubbles: yes, it's about making cirrus more aware of wikidata's data structure. but it could be useful for other things as well. [21:42:46] DanielK_WMDE: secondary data updates are all handled through core, we just provided an apropriate ParserResult currently [21:42:51] where that breaks down is infinitly long pages [21:43:01] DanielK_WMDE, Matthias has already basically re-implemented search, so this is more of an academic question for now. [21:43:07] Unless we could later move over to it. [21:43:27] DanielK_WMDE: it might be useful to enumerate any of those other things in the rfc [21:44:16] manybubbles: ok. well. the handler for wikitext could expose things like sections, categories, etc separately. this is currently handled by cirrus itself, right? [21:44:39] DanielK_WMDE: yes - I imagine we'd push all the special stuff cirrus has for wikitext into core at that point. [21:45:05] manybubbles: exactly. ok, i'll put that into the rfc [21:45:27] the idea here is that instead of just shoving a content object and search and saying "here you go, you figure out what to do" the content model has opinions on how to expose its structured data [21:45:53] ebernhardson: ParserOutput has fields for the different types of links. I'm proposing a similar mechanism for exposing different fields to the search index. [21:45:54] I like the idea of having cirrus understand wikidata better, but I wonder if we're not going too far with field types, etc. - is it really the place or should it be in the query engine? [21:45:55] I replied to mz in the ticket a few minutes ago with what I think is a decent justification. 
or justification equation [21:46:04] maybe it is, just wondering [21:46:19] SMalyshev: i would start with this in a more incremental, modest way [21:46:21] ebernhardson: i don't want to bind it to ParserOutput, because extracting that info isn't necessarily bound to generating HTML output [21:46:26] like expose text fields [21:46:49] aude: yeah I really like the idea of letting wikidata make sense out of the data and tell which thing is where [21:46:59] DanielK_WMDE, I don't think he was saying the search data should be in ParserData. [21:47:04] SMalyshev: the way we'd do it without this is to have Cirrus explicitly aware of wikidata (by type checks) or have wikidata explicitly aware of cirrus (by hook registration) [21:47:15] geo is something people want [21:47:16] ebernhardson, you were just elaborating on what I said re re-implementing, right? [21:47:18] and could be easy [21:47:20] SMalyshev: well, the search engine needs to somehow know what kind of data will be in the fields, so it can set up the indexes appropriately [21:47:21] but I'm not sure encouraging people doing data range searches there is the right platform... dunno [21:47:21] DanielK_WMDE: it sounds like you want some bit of page metadata that's independent of content type? [21:47:27] and property - wikibase-item is easy [21:47:36] other things not so easy [21:47:44] and multilingual part is probably most tricky [21:48:14] all this would be more limited obviously than full query engine [21:48:50] how is CirrusSearch able to handle incategory: , hastemplate: etc. currently? Is it all special-cased wikitext handling? [21:49:00] gwicke: no, Content doesn't know any metadata. Metadata yxould be handled in a similar way, but Content is about, well, content. It doesn't even know the page title. Just what's *on* the page [21:49:05] spagewmf, the logic is in Cirrus, but they want to move it to core. 
[21:49:16] spagewmf: yes [21:49:34] DanielK_WMDE: right, hence metadata [21:49:39] cirrus is explicitly aware of wikitext and has hooks for everything else. like geo [21:49:41] to core meaning to wikidata code, right? [21:49:50] SMalyshev: no [21:50:01] SMalyshev: i agree that range searches (quantity, geo, date) go pretty far beyond what search is currently being used for. [21:50:04] this is a pretty generic problem [21:50:04] We were talking about wikitext fields, categories, etc. [21:50:08] anything that is general enough (eg. useful to db backend search) [21:50:08] i could drop them from the rfc for now [21:50:12] could be in mw core [21:50:35] SMalyshev: i'm mostly interested in full text vs. prefix match, and suppor for multi-value and multi-lingual fields. [21:50:39] parsoid had the same need, and I expect a lot more kinds of page / revision associated metadata to come up in the future [21:50:43] aude: so where knowledge about what is where in wikidata would be? i.e. what would implement getAllSearchIndexFieldDefinitions? [21:50:44] DanielK_WMDE, I don't think you need to drop them. [21:50:51] i would drop date and quantity [21:50:55] It is okay if the field data is a little more advanced than Special:Search supports for now. [21:51:40] SMalyshev: soem in cirrus (e.g. how to handle coordinates, maybe), some in wikibase [21:51:49] the thing about this proposal is that its ok to ask for things the search index can't do [21:51:52] it just won't do them [21:51:57] SMalyshev: wikidata content handlers would expose wikidata specific stuff. the wikitext content handler would expose wikitext specific stuff, like sections. Wouldn't need to know about that in cirrus any more. [21:51:58] and this interface to help bridge the two [21:52:00] its a _should_ not a _must_ [21:52:01] superm401, manybubbles: thanks. 
This RFC should say how it changes the current wikitext processing [21:52:04] in my opinion [21:52:16] DanielK_WMDE: ok, got it, makes sense [21:52:20] manybubbles: exactly. the content handler just gives the search index more to work with [21:52:38] something dumb like the default sql based thing can just concatinate all values, and run mysql's fulltext index [21:52:58] DanielK_WMDE: or choose not to use some fields [21:53:21] or types [21:53:26] #info spagewmf and others suggest the rfc should say how it would change wikitext processing [21:53:39] or html processing [21:53:52] #info manybubbles suggests the rfc should list examples of fields that could be exposed for different kinds of content [21:54:29] DanielK_WMDE: yeah - for wikitext is a great example [21:54:42] or Parsoid HTML [21:54:46] so is it worth talking about if the rfc is worth doing at all? [21:54:57] gwicke: but exposing content model specific info is different from exposing revision/page metadata. the info is used in the same way, but the responsibility for providing it would lie with very different components [21:54:58] yes [21:55:02] manybubbles: I was about to ask you that [21:55:08] like, its totally worth doing if there are more search backend or if there are more things than just wikidata [21:55:10] DanielK_WMDE: i think it's mainly the metadata things that are in common across content types [21:55:21] aude: [21:55:23] If ContentHandler folk like this proposal and Search folk like it, let this be their vows [21:55:24] indeed [21:55:29] getTextForSearchIndex gives raw text (wikitext e.g.) [21:55:37] for wikibase, that would be raw json(?) [21:55:46] kind of weird to me [21:56:04] aude: i think we concatenate all values fdrom the json, omitting the keys [21:56:12] DanielK_WMDE: maybe [21:56:13] manybubbles: it sounds like you would prefer to see Cirrus or some abstraction of Cirrus specified as a primary search engine for core [21:56:13] and pass that to the full text index. 
it sucks :) [21:56:21] kind of like we do now [21:56:32] DanielK_WMDE: I don't think that we can keep all processing related to a content model in a monolithic framework in the longer term [21:56:34] and the other search engines would have a kind of compatibility interface [21:57:08] and what the core exposes via ContentHandler would be basically what Cirrus supports, and the other search engines would transform that into something they support [21:57:10] gwicke: there's no need to. any component can dispatched based on content model id [21:57:47] TimStarling: i was thinking the ContentHandler would expose something completely abstracted from what individual search engines support [21:57:58] TimStarling: not really - this abstraction makes a ton of sense given that we have to support both database search and cirrus. its just that we could fix wikidata's search by doing some hooks work in cirrus and wikidata. it might be simpler and it wouldn't need a grand abstraction [21:58:00] but we can tune it for Cirrus [21:58:02] could you redefine this as a schema that contains the information? [21:58:11] rather than a procedural interface [21:58:24] that'd open up many ways to produce this metadata [21:58:26] The abstraction seems sound to me. I don't see any reason to special-case Wikidata. [21:58:34] it might not be a good idea to do the hooks but its worth comparing that solution to this one [21:59:08] gwicke: what kind of schema do you have in mind? expose this as an xml document? [21:59:15] a JSON schema, for example [21:59:32] i think just super simple steps, like getFieldsForSearchIndex return array( 'raw_text' => x ) [21:59:32] gwicke: can you link an example of the thing you are thinking of? [21:59:36] manybubbles: well, we're almost out of time, and you are our senior search engineer [21:59:39] for TextContent [21:59:41] Yeah, if we did do a schema (not sure that's necessary), it would be nice to use JsonSchema since EventLogging already does. 
[21:59:49] TimStarling: in that case I think its worth doing. [21:59:58] TimStarling: I'm also the only search engineer :) [22:00:00] manybubbles: well ,this solution is pretty light-weight. We can easily provide default implementations in the base classes for Content and ContentHandler. [22:00:05] manybubbles: basically, stash the return values from the RFC methods in a JSON document that has a schema [22:00:18] it's adding one method to each interface, and a basic implementation for each [22:00:21] not much work [22:00:25] we probably care more about the data than how it's produced [22:00:41] gwicke: I think we're out of time. discuss on the rfc itself? [22:00:58] #info RFC approved by manybubbles, please continue with design/implementation [22:01:07] sure [22:01:09] \o/ [22:01:13] manybubbles: the hook think doesn't need a core change, but is a special case hack, which doesn't give a nice way for others to do the same. also, wikibase would need to know about cirrus (ore the other way around) [22:01:22] yay :) [22:02:02] #info gwicke would prefer a metadata schema over a procedural interface [22:02:15] ok, I guess we'll schedule another meeting in a month or two? [22:02:23] gwicke: i'm thinking in terms of php code here, so it's all methods and interfaces. if want to think of this in terms of a storage service, this would be exposed as json. [22:02:52] DanielK_WMDE: I'm just trying to compare this proposal to an alternate one. [22:03:00] TimStarling: i hope to have something on gerrit my then. if it doesn't get merged, let' [22:03:11] ...let's schedule another meeting :) [22:03:12] gwicke, are you saying that e.g. core would provide a schema, and then core and Parsoid would have independent implementations providing data that fits the schema? [22:03:17] DanielK_WMDE: it would be nice to lock that schema down [22:03:31] so that other services can consume it & produce it [22:03:52] superm401: yes [22:04:04] gwicke: the name/indexType/multiValue/weight thing? 
Sure, we can have a JSON binding in addition to the PHP binding [22:04:14] as we get fancier with our extraction heuristics I expect more to live in independent services [22:04:39] #info: propose a JSON binding for the field metadata [22:04:56] ok. all done? [22:05:12] #endmeeting [22:05:13] Meeting ended Wed Mar 4 22:05:12 2015 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:05:13] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-03-04-21.00.html [22:05:13] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-03-04-21.00.txt [22:05:13] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-03-04-21.00.wiki [22:05:13] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2015/wikimedia-office.2015-03-04-21.00.log.html [22:05:26] DanielK_WMDE: basically, data objects ;) [22:05:32] thanks TimStarling for running the meeting! [22:05:34] or was it value? [22:05:50] gwicke: POPOs ;) [22:06:05] nah [22:06:13] POJO as 'JSON objects' [22:06:41] JSON objects is techncially an oxymoron. [22:06:50] gwicke: for wikibase, we have an abstract data model spec, with specis for different bindings (JSON, RDF, etc) hanging off it. [22:06:51] technically [22:07:10] superm401: an oxymoron? isn't really a hyperbola? [22:07:27] DanielK_WMDE, no, JSON is a format for strings. Objects are not strings. [22:07:28] DanielK_WMDE: where are those wikibase specs? [22:07:36] Though it's a very technical distinction. [22:07:37] DanielK_WMDE: either way, interoperable spec = good [22:08:15] superm401: but it expands to "JavaScript Object Notation". "JavaScript Object Notation Objects" is kind of silly... [22:08:35] DanielK_WMDE, yeah, it's a string notation you can serialize objects too. 
[22:08:50] gwicke: the abstract one is being overhauled, the JSON one needs minor updates, and the RDF one is missing ;) but here you go: https://www.mediawiki.org/wiki/Wikibase/DataModel [22:08:51] But that's enough of my nit-picking for now. :) [22:09:02] s/too/to [22:09:27] superm401: well, it also implies a specific notion of "object". [22:09:32] but right. enough for now :) [22:09:34] DanielK_WMDE: thanks [22:09:41] see you guys later! [22:10:19] Have a good day. [22:10:27] good night! [22:11:50] night!
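The field-definition idea from the ContentHandler discussion can be illustrated with a short sketch. It is Python rather than the PHP the RFC targets, and it assumes a definition carries the four attributes named in the meeting (name, index type, multi-value flag, weight). SearchFieldDefinition, flatten_for_fulltext and the sample field names are hypothetical. The helper shows the fallback mentioned for a simple backend: concatenate all exposed values and hand the result to a plain full-text index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchFieldDefinition:
    """Hypothetical description of one field a content model exposes."""

    name: str
    index_type: str          # e.g. "text"; the type names here are placeholders
    multi_value: bool = False
    weight: float = 1.0


def flatten_for_fulltext(definitions, values):
    """Join every exposed value into one string for a backend without fields."""
    parts = []
    for definition in definitions:
        value = values.get(definition.name)
        if value is None:
            continue  # the content did not supply this field
        if definition.multi_value:
            parts.extend(str(item) for item in value)
        else:
            parts.append(str(value))
    return " ".join(parts)
```

A backend that understands fields could instead use index_type and weight to configure each field separately, and ignore any type it does not support.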