[14:06:11] * anomie reads backscroll and is reminded of T150417
[15:36:41] anomie: my conflict meeting got done early, so if you want to meet now I can
[15:37:07] bd808: ok. Give me a minute to get some water.
[15:37:15] sounds good
[18:31:35] what would be the best way to create a private JS function and not freak jsduck out? https://gerrit.wikimedia.org/r/#/c/362326/11/resources/src/mediawiki/mediawiki.util.js
[19:24:38] Fannnnnnntastic
[19:24:47] make-release won't make a patch between rc.0 and rc.1
[19:24:50] * RainbowSprinkles stabs
[19:25:05] -rw-r--r-- 1 demon wikidev 20 Jul 7 19:09 mediawiki-1.29.0-rc.1.patch.gz
[19:25:38] Pretty sure that file should be larger than that
[19:29:54] https://github.com/wikimedia/mediawiki/compare/1.29.0-rc.0...1.29.0-rc.1
[19:29:58] Quite a few changes
[19:30:05] MaxSem: mark it as @ignore (or do not add a doc comment at all)
[19:42:48] RainbowSprinkles: AaronSchulz: Would appreciate some thoughts from your perspectives on https://gerrit.wikimedia.org/r/#/c/363831/
[19:43:01] Reedy: obvs
[19:50:18] thanks MatmaRex
[20:05:15] Gahhhhhhhh
[20:05:37] I had a huge reply written to you Krinkle but then that browser window crashed :(
[20:07:14] * RainbowSprinkles writes tldr version instead and ragequits
[20:08:52] * MaxSem gives RainbowSprinkles a mug of hot cocoa with rainbow sprinkles
[20:09:36] I know why I'm grumpy, I never made coffee
[20:09:55] how iz i awak?
[20:11:58] RainbowSprinkles: Reminds me of that time I thought I had cleaned the whole house and finished the big school project I had to do, and thought really hard about all the writing and all - then the alarm went off and it was still 8AM...
[20:12:20] I had to do it all *again*, or so it felt.
[20:12:37] I call that predicting the future!
[20:12:42] You're clearly psychic
[20:13:23] I'm psychic too. I had a dream about rearranging my living room one night, and the next day I rearranged my living room
[20:15:33] hehe
[20:15:53] Psychic, or, inspired!
[20:54:41] FFS eslint, you suck
[20:57:21] MaxSem why?
[20:57:26] is it that new 4.x update?
[20:57:45] for all the reasons in the world :P
[20:57:55] lol
[20:58:04] try indent-old i think they call it
[21:28:22] MaxSem: What now?
[21:30:14] catch 22: it didn't like the function at the beginning of the file, it doesn't like it at the end either
[21:55:30] Lol, I've been trying to trace where the phrase "This can help to ease strain on a virtual person." came from on https://www.mediawiki.org/wiki/Manual:Job_queue#Job_examples
[21:55:37] https://www.mediawiki.org/w/index.php?title=Manual:Job_queue&diff=prev&oldid=191006
[21:55:42] https://www.mediawiki.org/w/index.php?title=Manual:Job_queue&diff=prev&oldid=191085
[21:55:51] I guess it was added in context of no longer needing to do null edits.
[21:55:52] :D
[21:56:57] wow, Krinkle reads manuals :O
[22:11:03] brion: Do you remember the commit or workaround for https://phabricator.wikimedia.org/T155110 ? I'm curious to see in what case you had to pass a $fn name inside a run() method that had to match the outer transaction.
[22:13:56] Krinkle im wondering could you take another look at https://gerrit.wikimedia.org/r/#/c/193434/ please?
[22:23:37] legoktm: https://gerrit.wikimedia.org/r/#/c/220319/ at two years old is your oldest not-V-1, not-C-1, not-WIP patch; want to respond to a.nomie's comments so it can be merged? :-)
[22:29:03] James_F: actually it probably needs a decent amount of work to use the new i18n warning system :/
[22:31:11] Krinkle: I think I went with something dumb like putting the two cases into a single class instead of using a subclass
[22:31:39] In that case it was just a switch with different behavior in one or two places so not too weird to do :)
[22:32:04] legoktm: Meh. :-(
[22:32:06] So I used a parameter for the option instead of the job class type
[22:32:49] Should be in TimedMediaHandler's WebVideoTranscodeJob if you care to check details
[23:13:12] Does anyone here know why Sanitizer::stripAllTags() uses custom implementations of both strip_tags() (delimiterReplace) and html_entity_decode() (decodeCharReferences)?
[23:13:37] blame?
[23:13:50] I'm grepping for calls to html_entity_decode( strip_tags( $x ) ) to replace them with Sanitizer::stripAllTags( $x ), since that's conceptually more or less the same, but I can't find docs on why these things/differences exist
[23:13:52] Hmm I'll try that
[23:16:21] All blame tells me is that the relevant PHP builtins have never been used there, we've always used some kind of custom implementation or other
[23:17:00] "It was written in times of PHP 3 alpha"
[23:25:27] RoanKattouw, let Jerkins decide: https://gerrit.wikimedia.org/r/#/c/363983/
[23:25:50] Yeah that's the thing, I have trouble believing this code was written before PHP$
[23:25:54] *PHP4
[23:26:25] it passes Sanitizer tests, btw
[23:26:30] It'll likely fail unless you pass ENT_QUOTES | ENT_HTML5 or something as the second param to html_entity_decode
[23:26:43] Yeah because **those functions don't have tests**
[23:26:52] \m/
[23:26:59] resolved wfm?
[23:27:24] Or well, at least stripAllTags doesn't have very meaningful ones, but decodeCharReferences has some. You could replace that function directly so that its tests also test the PHP builtin
[23:27:30] And then you might get some failures to do with quote characters
[23:27:37] Until you pass the right ENT_ keywords
[23:30:23] RoanKattouw: Assuming this stemmed from my comment about RC tags – I believe there was another thread somewhere in Phab or Gerrit that went a step further with regards to why we treat the tag description as parseable sometimes, but return it as a text() transform only in the API, and in other places (the new RCFilters UI) as parsed but stripped down to plain text - which seems like it may produce confusing content if it uses hyperlinks or
[23:30:23] templates and such, which – if not encouraged/allowed, why then allow them in other cases. Seems like we should either decide on the format, or have separate messages.
[23:30:36] e.g. label vs rich description. or something
[23:30:55] seems like description is mainly for the table on Special:Tags, which can be quite large/lengthy
[23:31:36] Yeah, we show a truncated description in the RCFilters interface
[23:31:45] There's a short name message and a long description message
[23:31:54] I would support banning rich text from the short name message
[23:32:11] And yes you assume correctly
[23:32:21] OK, here's a weird line of code
[23:32:23] + $query = wfCgiToArray( html_entity_decode( $parsedUrl['query'] ) );
[23:32:39] Where $parsedUrl is "Result of parseUrl() or wfParseUrl()"
[23:32:50] So how can html_entity_decode possibly be relevant or correct here?
[23:32:55] It's a URL, not HTML
[23:33:29] pfft. now you're assuming that all the code even makes sense
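(A rough sketch of the pattern being questioned above, not the actual MobileFrontend code. It assumes a MediaWiki environment; wfParseUrl(), wfCgiToArray() and wfArrayToCgi() are real core helpers, but the URL and the removed parameter name are invented for illustration.)

    // Hypothetical desktop URL, invented for illustration.
    $parsedUrl = wfParseUrl( 'https://en.wikipedia.org/w/index.php?title=Foo&mobileaction=toggle_view_desktop' );

    // The line quoted above: the query string is percent-encoded, not HTML,
    // so decoding HTML entities here is at best a no-op and at worst corrupts the data.
    $query = wfCgiToArray( html_entity_decode( $parsedUrl['query'] ) );

    // A straightforward "parse the query, remove a param, serialize" simply
    // drops the entity decoding:
    $query = wfCgiToArray( $parsedUrl['query'] );
    unset( $query['mobileaction'] ); // illustrative parameter name
    $parsedUrl['query'] = wfArrayToCgi( $query );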
[23:33:56] I'm just removing it
[23:35:29] so, here's the answer: https://integration.wikimedia.org/ci/job/mediawiki-phpunit-hhvm-jessie/9465/ :P
[23:35:35] RoanKattouw: If this is related to MediaWiki.php, WebRequest, PathRouter or Title, then I think I know. It deals with the fact that we html-encode things in wikitext but those can also appear in [[ ]], which we support as a way of more easily expressing hard-to-type characters.
[23:35:56] It's in MobileFrontend, in MobileContext::updateDesktopUrlQuery()
[23:36:02] E.g. [[&bull;]]
[23:36:15] links to https://en.wikipedia.org/wiki/•
[23:36:17] OK great but how does that ever relate to HTML?
[23:37:27] Sorry, I mean how does that relate to URLs
[23:37:47] This function parses a URL, parses its query string, removes a param, and then serializes
[23:38:22] so if we tweak a couple tests we can make things run a bit faster :)
[23:39:26] So on that basis alone it has no business decoding entities on the way, but even then, decoding entities in URLs seems very wrong
[23:41:40] Yeah
[23:41:41] does it really decode URLs? https://en.wikipedia.org/wiki/&
[23:41:47] https://github.com/search?l=PHP&q=org%3Awikimedia+html_entity_decode+-repo%3Awikimedia%2Fwikimedia-fundraising-SmashPig-vendor+-repo%3Awikimedia%2Foperations-debs-libvpx+-repo%3Awikimedia%2Fmediawiki-extensions-DonationInterface-vendor+-repo%3Awikimedia%2Fwikimedia-fundraising-crm-drush+-repo%3Awikimedia%2Fwikimedia-fundraising-crm-vendor+-repo%3Awikimedia%2Fintegration-composer+-repo%3Awikimedia%2Fwikimedia-fundraising-crm-drupal+-repo%3Awikim
[23:41:50] edia%2Fwikimedia-fundraising-crm+-repo%3Awikimedia%2Fmediawiki-vendor+-repo%3Awikimedia%2Fwikimedia-bugzilla-modifications+-repo%3Awikimedia%2Fmediawiki-extensions-Wikidata+-repo%3Awikimedia%2Fphabricator+-repo%3Awikimedia%2Fwikimedia-fundraising-crm-civicrm+-repo%3Awikimedia%2Fphabricator-libphutil+-repo%3Awikimedia%2Foperations-software-librenms+-repo%3Awikimedia%2Foperations-debs-hhvm+-repo%3Awikimedia%2Fmediawiki-debian&type=Code&utf8=%E2%9
[23:41:50] C%93
[23:41:50] ugh..
[23:42:04] http://bit.ly/2tzGwGS
[23:42:28] or https://git.io/vQKA8
[23:43:37] Here's another one doing a similar thing https://github.com/wikimedia/mediawiki-extensions-Echo/blob/95f83de22584a0c401359da9f0e5b320ca937f53/includes/DiscussionParser.php#L1181
[23:44:17] Yes I'm already fixing a bunch of extensions
[23:44:33] I knew about the Echo and Flow ones
[23:44:33] And then there's code doing it multiple times - https://github.com/wikimedia/mediawiki-extensions-BlueSpiceExtensions/blob/3d14eaa103864cbfc964ededb7524f2ebb5c9233/ExtendedSearch/includes/BuildIndex/BuildIndexMainControl.class.php#L705
[23:44:44] That's actually why the SpecialRC one was identical to those, because the same people wrote them :)
[23:44:57] I'm about to submit changes to some extensions fixing some of them
[23:45:01] RoanKattouw: LocalisedException in core
[23:45:04] * c raises an eyebrow
[23:45:16] Krinkle: I'm ahead of you: https://gerrit.wikimedia.org/r/363987
[23:45:21] And the one in API of course, which is the same as the one in LocalizedException
[23:45:36] Esther: Krinkle needs a spanking
[23:46:58] RoanKattouw: cool, assuming tests pass, I'll land it.
[23:47:29] Also, we can update Sanitizer as far as I'm concerned. Perhaps add a few tests while at it, but overall, I don't see a security issue here, just a performance and code smell one :)
[23:47:30] Krinkle: I would like to get an answer from somebody (e.g. bawolff or TimStarling) about why the built-ins are not used though
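(A minimal sketch of the equivalence discussed above — html_entity_decode( strip_tags( $x ) ) versus Sanitizer::stripAllTags( $x ). The input string is made up and it assumes MediaWiki's Sanitizer class is loaded; as noted, the built-in route only lines up once the right ENT_ flags are passed.)

    // Made-up input, just to compare the two routes.
    $text = 'A <span title="x">&quot;quoted&quot;</span> &amp; entity-laden string';

    // MediaWiki's helper: strips all tags and decodes character references in one call.
    $a = Sanitizer::stripAllTags( $text );

    // PHP built-ins: conceptually the same operation, but only close to equivalent
    // once ENT_QUOTES | ENT_HTML5 (or similar) is passed; the default flags leave
    // single quotes and some entities encoded.
    $b = html_entity_decode( strip_tags( $text ), ENT_QUOTES | ENT_HTML5 );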
[23:47:42] And to have better test coverage for these things
[23:47:52] Sanitizer seems like a very important place to have test coverage
[23:48:17] RoanKattouw: Can you check if these are used (indirectly) by Sanitizer::escapeId, or anything involving link anchors and such? We need those to be relatively stable to avoid breaking links and anchors and such.
[23:48:36] I know we do some basic stripping when converting e.g. == Foo == to /* Foo */ for edit summaries
[23:48:38] The entity decoding stuff is probably used there yes
[23:48:44] Nothing I'm currently touching is though
[23:49:07] Right, you're not changing the Sanitizer method at the moment?
[23:50:05] Exactly, just changing more code to use it instead of builtins
[23:50:14] Also didn't eliminate all usage of the built-ins
[23:50:29] Parser.php uses strip_tags() and I'm quite hesitant to change that because it's Parser.php
[23:50:52] Also this patch for MF's (mis)usage of html_entity_decode: https://gerrit.wikimedia.org/r/363991
[23:51:02] Meh and my core patch fails Jenkins, looking at why
[23:51:11] did you try git blame?
[23:51:34] Line length
[23:51:37] TimStarling: Yes I did
[23:51:55] And I found no justification anywhere, except maybe that some of these things are so old that PHP4 might not have been in vogue at the time
[23:52:34] The absence of strip_tags() in stripAllTags() was originally something like preg_replace( '!\<.*\>', '' ) (by Brion), and you changed it to delimiterReplace later when you introduced it, but that was for perf not correctness
[23:53:07] Similarly for entity decoding I didn't find any illuminating commit messages, but I was only looking at the usage inside stripAllTags, not at the introduction of those functions more generally
[23:53:43] I remember not trusting strip_tags
[23:54:02] Don't recall details why though
[23:54:24] The funny thing is, it is used in a few places (only one in core after my patch though, but that's in Parser.php). Just not in the Sanitizer
[23:54:44] random HTML decoding is the kind of thing I would expect in Lee's code
[23:55:00] he had a very integrated mindset, and would shove features anywhere
[23:55:21] For entity decoding I can imagine not trusting html_entity_decode(), because of the crazy options that it takes, and because of how crazy entity stuff is generally
[23:56:10] Maybe it was that the whitelisting doesn't blacklist or validate attributes
[23:56:12] yeah, it's wrong, RemexHtml has the full tables from the spec
[23:56:35] I tried using PHP's numeric character encoding, but that was non-compliant also
[23:56:38] Eg if you allow span you allow span with onmouseover or some such
[23:57:01] I mean for things like �
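(A minimal illustration of the attribute concern mentioned just above — the input is made up. strip_tags() can be told to keep certain tags, but it never inspects or strips their attributes, which is one plausible reason for not trusting it in the Sanitizer.)

    // Made-up input.
    $html = '<span onmouseover="evil()">hello</span> <script>bad()</script>';

    // Keeping <span> also keeps its attributes untouched; the <script> tags are
    // stripped but their text content ("bad()") remains.
    echo strip_tags( $html, '<span>' );
    // => <span onmouseover="evil()">hello</span> bad()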