[01:23:49] O_O [01:24:05] scheduled meeting? [01:48:02] <[[Cameron]]> 0.o Is there? [01:48:21] <[[Cameron]]> Oh looks like IRC Cloud died [09:12:53] any WMF legal members here? [17:44:06] ToAruShiroiNeko: This really isn't the place to ask for legal help, but maybe I can direct you to the correct person if you can describe what you're looking for [17:44:38] I need to discuss the matter with someone from the legal team, to get an update of something ongoing [17:44:41] its nothing too serious [17:45:28] OK then [17:46:11] ToAruShiroiNeko: There is legal@wikimedia.org for general questions I guess [17:46:35] yes, I dont want to email, it isnt that important [17:46:58] Well, I can assert with 98% certainty that you won't get an answer here [21:00:42] * robla gets prepped to start Phab:E325 [21:01:32] #startmeeting ArchCom IRC meeting [21:01:33] Meeting started Wed Oct 26 21:01:32 2016 UTC and is due to finish in 60 minutes. The chair is robla. Information about MeetBot at http://wiki.debian.org/MeetBot. [21:01:33] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote. [21:01:33] The meeting name has been set to 'archcom_irc_meeting' [21:01:52] #topic T138783 - SVG in HTML [21:01:52] T138783: SVG Upload should (optionally) allow the xhtml namespace - https://phabricator.wikimedia.org/T138783 [21:02:06] #topic T138783 - HTML in SVG [21:02:09] :-) [21:02:13] hi folks! [21:03:06] so....today's topic is about whether we should allow HTML in our SVG [21:03:51] * robla goes to find more links to describe this better.... [21:04:45] [21:05:44] matmarex suggested "We have a HTML validation library (the Sanitizer class) and it could probably be hooked up to validating HTML" [21:05:52] does that seem like a good idea? [21:06:26] MatmaRex is not in this channel and is marked /away [21:06:35] but maybe we have bawolff? [21:06:42] Yep [21:07:16] i think it's important to differentiate between a) should mediawiki support this at all and b) should this be allowed/used on wikimedia sites [21:07:45] also, what html is typically encountered in that context, and how well is that supported by different renderers? [21:08:13] as i wrote on the bug, seems in theory workable but im concerned the sanitizer wants to rewrite during sanitization which means youd have to be very pedantic in writing your svgs [21:08:41] unless we change the upload pipeline to allow files to be modified during upload [21:08:47] as for (a), if there was a patch in gerrit to allow HTML behind a feature flag, it would presumably be accepted, we don't have a massively high bar for that sort of thing, right? [21:09:54] TimStarling: I think the bar is "is this secure?" [21:09:58] yeah, you would have trouble using sanitizer directly [21:10:25] TimStarling: or more nicely, some dort of pluggable validation thingy, yea. [21:10:29] I'm not really a big fan of Sanitizer in terms of the way the code is currently structured [21:10:33] I'd guess the security depends on the sanitizer? [21:10:38] then this could be an extension rather than a feature flag [21:11:12] SMalyshev: also on the consumer of the html. currently, it's us rendering these, not the browser. [21:11:32] you mean SVGs? [21:11:45] what originally started this conversation was a discussion about a draw.io plugin where the feature flag wanted was "turn off sanitization" [21:11:51] yes. well, the html inside the svgs [21:11:56] ah yes if we have SVG renderer we also need to see what's going on there [21:12:26] according to bawolff, our SVG renderer does not support HTML, you'll just get an empty box [21:12:26] They were using svg foreign objecty things [21:12:56] so there may be fallback for renders not supporting it [21:12:56] which is not surprising, it's a very complicated spec to pull in [21:13:56] * DanielK_WMDE_ wants interactive/scripted svg [21:14:03] an idea I had last hour: maybe we could have an option in the image link which enables client-side SVG rendering [21:14:24] whether or not rsvg properly handles the fallback i have no idea as i havent tested that [21:14:43] you know there are performance reasons to do SVG in the server at the moment [21:14:47] TimStarling: sounds ok for 3rd parties. for wikimedia, we'd need a decent fallback renderer [21:14:55] DanielK_WMDE: well thats a whole other bag of worms :) [21:15:08] if sanitizer follows https://meta.wikimedia.org/wiki/Help:HTML_in_wikitext is seems ok [21:15:44] #info question discussed: should we use the Sanitizer class as HTML validation library for SVG? [21:15:49] Sanitizer is what the parser itself uses so it should [21:15:59] SMalyshev: i think that page is basically documenting what the sanitizer does [21:16:57] Sanitizer::removeHTMLtags() is currently only used by the parser, so it probably does some things which are wikitext-specific [21:17:23] and like bawolff previously said, it works by rewriting dubious HTML to be less dubious [21:17:32] The translated entity names for example are wikitext specific [21:17:41] so you either have to allow it to alter uploads or have a strict rejection mode [21:18:15] (side note: please feel free to add "#info" tag notes to get captured in the summary of this discussion) [21:19:01] Allowing mediawiki to alter uploads would make a bunch of things unrelated to this much easier [21:19:03] for example, unquoted attribute values are rewritten to be double-quoted, which avoids relying on client parsing details [21:19:46] there's nothing really stopping us from doing it, right? [21:19:47] #info question discussed: should MediaWiki be able to alter SVG uploads on save? [21:20:29] I think some code has to be rearchitected, but not a major amount [21:20:53] AaronSchulz: does it sound difficult to you? [21:21:03] The upload code is something a lot of people find scary and dont like touching [21:21:31] but i dont thimk it would be very hard [21:22:20] bawolff: yeah, probably [21:23:00] UploadBase::performUpload() just uploads $this->mTempPath [21:23:19] it's not even trying to move the file from the stash, it just reuploads [21:23:39] so if you modify that file, you're all done, right? [21:24:17] I imagine you would want to also give the user a warning [21:24:31] sure [21:24:50] #info bawolff says code might need rearchitecting. AaronSchulz doesn't think it would be too difficult [21:26:46] #info TimStarling would like there to be an image link option which does client-side SVG rendering [21:27:41] If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify [21:28:08] #info 14:27:41  If we allow altering the svg, i think it would be cool to replace our entire validation code with DomPurify [21:28:22] thoughts on that? [21:28:36] * robla goes to find the DomPurify link [21:28:38] slightly complicated by that project being in js, but thats probably not an issue for wmf at least [21:28:57] #link https://github.com/cure53/DOMPurify [21:29:14] and i think we would be better using someone elses sanitizer than our own [21:31:09] Since more eyes on their sanitizer than ours [21:31:11] bawolff: you're suggesting that the Wikimedia instances would rely on server-side Javascript (i.e. Node.js). do you think it makes sense to require Node.js for secure SVG? [21:31:31] (for third parties? [21:31:37] hi, sorry i'm late to SVG party. DanielK_WMDE_ I second your desire for interactive SVGs :) Graphs at this point can produce SVG output, but graphs does not (yet) support an ability to create URL links [21:31:54] i wouldnt say im happy about it... [21:32:04] that library does not look like an improvement [21:32:38] (hi All) [21:33:04] so if graphoid was allowed to produce SVG output, it would already be a quality improvement, but not necessarily a functional improvement right away [21:33:40] but at least it gives a path forward, whereas without SVG, we have no easy way of producing clickable graphs [21:33:59] legoktm pointed out T86874 on wikitech-l [21:33:59] T86874: Make SVG sanitization into a library - https://phabricator.wikimedia.org/T86874 [21:34:35] TimStarling: can you be more specific? I havent looked at that library in detail im more coming from a i dont like our current approach [21:35:15] #info question discussed: is DOMPurify an improvement over the status quo [21:35:53] for a start, it's only 860 physical lines of code, so if you wanted it you could just port it to PHP [21:36:51] it's apparently just an XSS filter, not a cross-site request filter [21:38:12] Yes, there would have to be a layer on top for parts outside their security model but inside ours [21:38:23] I don't know how well it has been reviewed [21:39:18] IIRC csteipp viewed DOMPurify favorably, but we didn't manage to finish the review before he left [21:39:25] it's a lot more permissive than Sanitizer [21:40:48] agreed on the need to layer our specific restrictions on top [21:41:23] var IS_ALLOWED_URI = /^(?:(?:(?:f|ht)tps?|mailto|tel):|[^a-z]|[a-z+.\-]+(?:[^a-z+.\-:]|$))/i; [21:41:36] parsoid's sanitizer works at the token / sax level, so it shouldn't be all that hard to wrap it into a SAX visitor [21:41:45] I've got to go [21:42:41] TimStarling: d'oh! ok....thanks for the discussion! [21:42:58] I definitely think that we should leverage other people's work in this space as much as possible [21:43:36] #info 14:42:58  I definitely think that we should leverage other people's work in this space as much as possible [21:45:06] gwicke: do you think we should stop working on our PHP version of SVG validation? [21:45:09] robla, do you keep a list of potential usecases? Graphoid SVG output would be the first in line [21:45:36] (i meant i will be the first person to get into that line) [21:46:14] yurik: I don't. This topic isn't *yet* an RFC, but if you want to turn this into an RFC, that'd be a great place to put your case on the first line [21:46:14] Isnt graphoid svgs entirely machine generated (ie not attacker controlled)? Does it need general sanitation? [21:47:00] robla: I think we should look at features / maturity of different options, and then evaluate a) which requires the least initial investment, and b) which will need the bigger long term maintenance effort on our end [21:47:03] bawolff, that was my understanding too, but for some reason csteip (i think) wanted to sanitize everything regardless of the source [21:47:18] "just in case" (tm) [21:47:42] gwicke: I can't commit to writing an RFC for this, but that all seems reasonable. would you be up for writing it? [21:47:46] Gwicke: do you know other options? [21:48:05] I haven't checked the PHP code in a while, but my recollection from the last time I talked about this with csteipp was that it wasn't very complete for things like html-in-SVG [21:48:24] DomPurify was the only one i could find that was even remotely good [21:48:36] the problem with that logic is that Vega is already enabled on the client, so if there is a security vulnerability, people are already exposed to it (i hope we caught all of them). So just adding SVG output does not seem to increase the attack surface [21:48:41] bawolff: that's the only one we seriously considered back then [21:49:23] but who knows, the web library world is changing every couple of months these days.. [21:49:56] Im intrigued by Tim's comment about porting to php was interesting. That honestly never even occured to me [21:50:23] but im not sure about the library support in php for DOM [21:50:42] which dompurify seems to use extensively [21:51:04] https://github.com/OWASP/java-html-sanitizer looks potentially interesting [21:51:25] perhaps its there. Im honestly not very familar with that area of the php ecosystem [21:51:38] * robla creates an RFC page https://www.mediawiki.org/wiki/Requests_for_comment/SVG_Upload_should_(optionally)_allow_the_xhtml_namespace [21:51:41] I don't think it supports SVG docs, however [21:53:06] ok...we're coming to the end of the hour. This has been a really informative discussion [21:53:32] ...and we have a place we can collaborate on prose ^^^ [21:53:57] https://github.com/darylldoyle/svg-sanitizer [21:54:14] Did we come to a conclusion about the original issue? [21:54:42] * subbu only read the last few lines of this rfc since he got here late [21:55:01] bawolff: I think the original issue was about should we allow XHTML at all, and I haven't heard any objections to it [21:55:26] "just" need to solve the sanitization challenge.. [21:55:27] the conversation seems moved to "which validation library should we use?" [21:55:43] Subbu: thats exactly what im looking for [21:55:49] that question seems it will be solved after vi vs emacs ;-) [21:56:23] seriously though, we have a bunch of validation options we should probably consider in the RFC writeup [21:56:27] the second issue is that we'll need to upgrade our rasterization tools to support HTML [21:56:51] we do have at least one option for that, but it's not rsvg [21:56:52] gwicke: yeah, I don't think we'll have time to touch on that topic [21:57:11] let's take the discussion back to phab. I'll take the action to summarize there [21:57:12] Thanks for the discussion, All! [21:57:33] any last thoughts before I hit #endmeeting? [21:57:56] (I'll end this arbitrarily very close to the top of the hour) [21:58:31] we should continue this conversation over on #wikimedia-multimedia maybe :-) [21:58:44] Gwicke: i dont know if thats actually neccesary for the original use case [21:58:47] gwicke: further thoughts about the sanitization issue? [21:58:50] * robla regrets not more explicitly pinging them before this [21:59:02] 60 second warning [21:59:14] obviously it would make more sense [21:59:52] alrighty....thank you everyone! No meeting planned next week, and the following week has an option or two [22:00:01] Thank you! [22:00:07] #endmeeting [22:00:08] Meeting ended Wed Oct 26 22:00:07 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) [22:00:08] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-10-26-21.01.html [22:00:08] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-10-26-21.01.txt [22:00:08] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-10-26-21.01.wiki [22:00:08] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-10-26-21.01.log.html [22:04:49] bawolff: http://pdf-electron.wmflabs.org/png?accessKey=secret&url=https://upload.wikimedia.org/wikipedia/commons/b/bd/Test.svg [22:05:50] this is using Chrome's SVG renderer