[20:59:09] soon...
[21:01:08] * robla gets plugged in
[21:01:31] #startmeeting ArchCom RFC meeting - Markdown support | Wikimedia meetings channel | Please note: Channel is logged and publicly posted (DO NOT REMOVE THIS NOTE) | Logs: http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-office/
[21:01:31] Meeting started Wed Jun 22 21:01:31 2016 UTC and is due to finish in 60 minutes. The chair is brion. Information about MeetBot at http://wiki.debian.org/MeetBot.
[21:01:31] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[21:01:32] The meeting name has been set to 'archcom_rfc_meeting___markdown_support___wikimedia_meetings_channel___please_note__channel_is_logged_and_publicly_posted__do_not_remove_this_note____logs__http___bots_wmflabs_org__wm_bot_logs__23wikimedia_office_'
[21:01:46] i hope that wasn't too many bits for poor meetbot
[21:02:19] #link https://phabricator.wikimedia.org/E218 Phab event for this week's meeting
[21:02:40] #info discussing https://phabricator.wikimedia.org/T137946 develop Markdown support strategy for MediaWiki
[21:03:20] #link https://www.mediawiki.org/wiki/Requests_for_comment/Markdown this week's RFC
[21:03:30] * robla wipes brow
[21:03:41] robla, care to chat a bit on the background?
[21:04:33] sure, this is asking "what should our Markdown strategy be?", where pretty much any answer is valid
[21:04:56] :D
[21:05:12] why I'm asking that: there are many, many flavors of "wiki syntax" out there, of which MediaWiki wikitext is only one
[21:06:00] (but ours is the _real_ wikisyntax... :P )
[21:06:04] many implementations claim "Markdown support", but the interpretation varies quite a bit based on implementation
[21:06:43] YairRand: :-D I think that actually gets to the heart of it
[21:07:51] YairRand: do you (or anyone out there) believe that all other implementations will "see the light" and start using our format? should they?
[21:08:59] a different question is: will all the disparate markdown efforts to go beyond "simple" markdown eventually arrive at the wikitext level of complexity?
[21:09:25] even if the syntax will probably not be wikitext syntax itself.
[21:09:59] (taking off my chair hat momentarily) what's a reason a given wiki might have for choosing to use markdown? preference, or compatibility with existing data or other tools, or?
[21:10:21] (that might affect how one would go about such support)
[21:10:56] I think both questions are very good, and now I'm having trouble choosing :-)
[21:11:00] :D
[21:11:08] let's do em in turn
[21:11:11] migrating from a github wiki to mediawiki might be one reason to want markdown page source
[21:11:54] *nod*
[21:11:54] bd808: yup
[21:11:54] are there any serious limitations regarding wikitext that are solved in other syntaxes? are they pretty freely convertible?
[21:12:09] (Is there a question here about how Wikimedia markdown talked about now will interface with SQID and Wikidata?)
[21:12:35] YairRand: the Pandoc folks aspire to provide complete interchangeability
[21:12:58] #info open question: reasons for choosing markdown? example: moving hosting of a github wiki
[21:12:59] robla: ...
[21:13:06] * subbu is looking at http://pandoc.org/README.html#pandocs-markdown and sees that it is a pretty long spec
[21:13:47] #info open question: complexity and extensions to the markup? example: would we need a syntax extension for templates/parserfunctions/lua/wikidata/etc?
[21:14:39] easy things are easy to convert, hard things are ....... well that's the question isn't it :D
[21:15:04] one good reason to entertain this markdown question for mediawiki is that it might let us abstract the markup / parsing parts of the codebase behind an interface.
[21:15:26] #info for convertibility of markdownish things, see pandoc http://pandoc.org/README.html#pandocs-markdown
[21:15:41] what does cut and paste support mean for users in practice?
[21:15:52] agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
[21:15:58] subbu: good point. also, how much do we rely on wikitext eg in the user interface?
[21:16:00] i toyed with that interface idea in https://www.mediawiki.org/wiki/User:SSastry_(WMF)/Notes/Wikitext#Core_ideas
[21:16:35] brion, yes, wikitext in the UI is tricky ...
[21:16:49] site messages are another i guess.
[21:17:25] TimStarling: I know what it means for me, but that's probably a better question for the folks who work with VE regularly, since my understanding is that cut-n-paste bugs happen a lot
[21:17:28] #info question: heavy use of wikitext in UI may require core parser. implications for alternate formats?
[21:17:48] * robla goes to find the Phab component for cut-n-paste issues
[21:17:58] brion, is this (wikitext in UI) used a lot in non-wmf installs of mediawiki?
[21:18:26] https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
[21:18:29] would markdown be a third editing mode, after "source" and VE?
[21:18:33] #link https://phabricator.wikimedia.org/project/view/898/ VisualEditor copypaste component in Phab
[21:19:00] TimStarling, I would think not.
[21:19:12] would you have an "insert markdown" toolbar button which gives you a box for pasting markdown?
[21:19:18] subbu: at least some yes, sentences and paragraphs allowing bold, links, etc on various special pages. don't know how scary they are
[21:19:27] as in .. i see robla's proposal as that of using it as an interchange format for copy-paste
[21:20:12] #info question: would cut-and-paste and interchange for markdown add a third editing mode beyond source/visual?
[21:20:38] agreed. getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core
[21:20:43] or it could be done as a ContentHandler
[21:21:17] yeah. then you could have a mixed wiki if you wanted
[21:21:29] then you wouldn't even touch $wgParser or create a Parser base class
[21:21:31] i don't see a use case for mixed-markup-format wikis.
[21:21:31] #info tim sez "getting serious about multiple markup formats would lead to cleaning up a lot of entangled cruft in core"
[21:21:34] that would be pretty confusing.
[21:21:44] no, I was quoting bd808
[21:21:51] #info whoops bd808 sez that
[21:22:04] * brion quote parsing error ;)
[21:22:07] * bd808 denies it all
[21:22:40] it can be the default content handler if you like, the point of doing it as a content handler is that it gives you a convenient pre-existing hook point
[21:22:53] i can see particular uses, such as when a wiki is used as a source repository of documents to be reused.... but they get scary ;)
[21:22:59] (for mixed modes)
[21:23:00] pretty much everything about wikitext has already been abstracted there, for wikidata's benefit
[21:23:36] things like links table updates, redirect syntax, PST and parsing itself
[21:23:42] #info tim is pretty sure ContentHandler can implement a markdown mode well. should already be well-factored. can be used as default contenthandler in theory
[21:23:48] i see ...
[21:24:51] that wouldn't affect site messages because the message system grabs onto $wgParser
[21:25:12] but maybe that's not a bad thing
[21:25:17] but they'd still have to be written in wikitext if they are stored in a wiki page, right?
[21:25:19] yeah, that's the point
[21:25:47] site messages could have the wikitext content type, so you could even preview them using wikitext
[21:26:27] we already support default content types that vary depending on namespace
[21:26:27] #info example of needing core parser: messages in MediaWiki: namespace, such as site notices. force them to use wikitext CH
[21:26:34] again for wikidata's benefit
[21:26:34] is some sort of wikitext always going to be at the heart of MediaWiki or is T112999 foreseeable?
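[editor's note] The ContentHandler idea discussed above — registering a handler per content model, with namespace-based defaults, so markdown pages could coexist with wikitext without touching $wgParser — can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki's actual (PHP) ContentHandler API; all class and function names here are made up, and the "parsers" are one-line stand-ins.

```python
# Illustrative sketch (NOT MediaWiki's real ContentHandler API): a per-model
# handler registry, so a wiki could in principle mix wikitext and markdown pages.

class ContentHandler:
    """Base class: one subclass per content model."""
    def render(self, text):
        raise NotImplementedError

class WikitextHandler(ContentHandler):
    def render(self, text):
        # stand-in for the full wikitext parser: handle one '''bold''' span
        return "<p>" + text.replace("'''", "<b>", 1).replace("'''", "</b>", 1) + "</p>"

class MarkdownHandler(ContentHandler):
    def render(self, text):
        # stand-in for a CommonMark/markdown parser: handle one **bold** span
        return "<p>" + text.replace("**", "<b>", 1).replace("**", "</b>", 1) + "</p>"

HANDLERS = {"wikitext": WikitextHandler(), "markdown": MarkdownHandler()}
# default content model varies by namespace, as discussed for site messages
DEFAULT_MODEL_BY_NAMESPACE = {"Main": "markdown", "MediaWiki": "wikitext"}

def render_page(namespace, text, model=None):
    """Pick the handler from an explicit model, else the namespace default."""
    model = model or DEFAULT_MODEL_BY_NAMESPACE.get(namespace, "wikitext")
    return HANDLERS[model].render(text)

print(render_page("Main", "**bold** text"))        # markdown is the wiki default
print(render_page("MediaWiki", "'''bold''' text")) # site messages stay wikitext
```

The point of the sketch is the dispatch shape: callers never see a parser directly, only a handler looked up from the page's content model.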
[21:26:35] T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
[21:27:09] robla: it's conceivable but we'd have to eliminate or make optional the remaining wikitext users ;)
[21:27:36] brion, i don't think robla is saying get rid of wikitext .. but whether mediawiki might support an option without wikitext.
[21:27:48] allowing the parser for site messages to change would be like adding a language variant to every i18n language which seems unlikely to turn out well
[21:28:22] right you'd basically have to change them to plaintext or plaintext with a very limited markup that is not full wikitext
[21:28:31] * subbu is trying to grok what bd808 just said
[21:28:34] I don't think it would really be helpful to attempt to translate i18n into some other markup language
[21:28:46] but we've got all sorts of fun things like grammatical plural and gender markers done via a subset of wiki markup
[21:28:47] you know, i18n really drove the development of a lot of parser features
[21:28:48] subbu: en-wikitext && en-markdown
[21:29:30] #info i18n is heavily dependent on a subset of the core parser for plurals, genders, and other message variants... but that doesn't have to be used for content if you don't want
[21:29:36] let's say that the version of wikitext we have now is "wikitext 1.0". is "wikitext 1.1" something we could do? (and still support i18n)
[21:30:10] * brion ponders
[21:30:36] could we, or would we want to, split a wikitext spec into 'the bits used for i18n' and 'extra fancy-ass markup used in wikipedia-like content'?
[21:30:45] robla, wikitext has evolved over the years .. so, i guess the qn. you are asking is if explicit versioning is needed?
[21:30:48] or is that even worse :D
[21:30:49] i18n of course is a mix of formats
[21:31:05] preprocessed plain text, preprocessed HTML and true wikitext
[21:31:07] plaintext, plaintext plus, wikitext, html, .... oh helllllls
[21:31:14] subbu: yeah, I think so
[21:31:49] well, except the qqq language which is pretty consistently wikitext
[21:32:52] #info question: is explicit versioning needed? can/should we make a 'wikitext 1.1' that is always implemented for i18n and ui messages?
[21:33:18] #info note i18n messages are a mix of plaintext+preprocess, HTML+preprocess, and pure wikitext
[21:34:42] robla, are you proposing any role for markdown on WMF wikis?
[21:35:10] (What are the implications of these MediaWiki markdown choices/decisions re ContentTranslation and Wikipedia's 358 languages, and security questions especially?)
[21:35:47] TimStarling: I think it potentially has a role in normalizing CopyPaste issues, but the path toward that is complicated
[21:35:59] #info question: implications of markdown choices on other tools like CT, need for i18n, and security?
[21:36:19] that requires browsers, doc-creating systems (word, etc.) to support conversion to "standard" markdown.
[21:36:45] it seems very limited as an interchange format
[21:37:02] compared to RTF, HTML, PDF, etc.
[21:37:22] if I were going to copy-paste from a markdown wiki page, bug report, or readme file on github for instance, my choices are to copy-paste the source, or copy-paste the rendered HTML
[21:37:39] subbu: I think at a base level, we have a number of applications that claim "text/html" during copy/paste operations, but that text/html can contain pretty much anything
[21:37:52] we know that pasting text/html is way harder than it should be ;) but we already support it in VE
[21:38:07] brion, from some sources, yes.
[21:38:13] pasting HTML into VE is already good enough to be useful
[21:38:14] brion: we support it today, but it's an arms race, isn't it?
[21:38:17] benefits of source copy?
[21:38:18] I have used it a few times
[21:38:18] hehe yep
[21:39:00] no one (that I'm aware of) has defined a useful subset of HTML that is safe for copy/paste operations
[21:39:08] but so is markdown isn't it?
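[editor's note] The "grammatical plural and gender markers done via a subset of wiki markup" mentioned above refers to message syntax like `{{PLURAL:$1|page|pages}}`. The toy resolver below (a Python sketch; real MediaWiki implements per-language plural rules, GENDER, GRAMMAR, and more, driven by CLDR data) illustrates why even "plaintext" i18n messages still need a small parser.

```python
import re

# Toy expander for the {{PLURAL:$1|singular|plural}} message construct.
# English-only two-form rule for illustration; MediaWiki's real i18n layer
# selects among per-language plural forms (some languages have several).

def expand_message(msg, n):
    """Substitute $1 with n, then resolve any {{PLURAL:...}} constructs."""
    msg = msg.replace("$1", str(n))
    def pick(match):
        count = int(match.group(1))
        forms = match.group(2).split("|")
        return forms[0] if count == 1 else forms[-1]
    return re.sub(r"\{\{PLURAL:(\d+)\|([^{}]*)\}\}", pick, msg)

print(expand_message("Deleted $1 {{PLURAL:$1|page|pages}}", 1))  # Deleted 1 page
print(expand_message("Deleted $1 {{PLURAL:$1|page|pages}}", 5))  # Deleted 5 pages
```

This is the "subset of the core parser" dependency: any non-wikitext future still has to provide something equivalent for every message in every language.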
[21:39:22] if we support github's extensions, next we get asked about someone else's extensions
[21:40:18] #info question is the HTML copy-paste "arms race" good enough vs markup-specific paste converter tools for markdown etc?
[21:40:36] HTML paste is likely to work if the HTML is very simple
[21:41:00] for example if you're copying from a github README.md you'd expect it to work
[21:41:29] TimStarling: is there a "very simple" subset of HTML we can get browser makers to support?
[21:41:43] (for copy/paste purposes)?
[21:41:47] robla, you linked to https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12 ... what are your thoughts on how likely it is to be adopted?
[21:42:24] robla: no... but then browsers can't export to markdown either
[21:42:31] #link https://tools.ietf.org/html/draft-ietf-appsawg-text-markdown-12
[21:42:55] subbu: I think something like that could happen
[21:43:18] our original goal for parsoid html2wt (which is still there as a comment in the serialization code) is to be able to accept arbitrary html and convert it to "acceptable" wikitext. but we haven't quite worked on that goal for a while now since we are mostly behind clients whose output is more controlled.
[21:44:05] subbu: what do you mean by "output is more controlled"?
[21:44:27] as in .. VE/CX/Flow etc. don't generate arbitrary html.
[21:44:39] ah, got it
[21:45:25] but, if you, say, took the html from a bbc article and gave it to parsoid to convert to wikitext, the output isn't pretty.
[21:45:35] so...basically, the copy/paste code works when we can control the generation of the HTML, but most implementations don't conform to our spec
[21:45:51] no, VE does its own handling of copy-pasted HTML .. it doesn't go through parsoid.
[21:46:20] fun :D
[21:46:29] you mean it cleans up the HTML before it hands it to parsoid for serialization?
[21:46:41] but, we've talked about creating a library for normalization and cleanup.
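[editor's note] The "easy things are easy to convert" point — e.g. copying from a simple github README — can be made concrete: a few "original markdown" constructs map almost one-to-one onto wikitext. The sketch below is nothing like a real converter (pandoc handles nesting, escaping, code blocks, tables, and many dialects); it only shows that the simple cases are mechanical rewrites.

```python
import re

# Minimal sketch mapping a few original-Markdown constructs to wikitext.
# Deliberately naive: no nesting, no escaping, no lists or code blocks --
# just an illustration that the simple constructs translate one-to-one.

RULES = [
    (re.compile(r"^### (.+)$", re.M), r"=== \1 ==="),      # h3 -> level-3 heading
    (re.compile(r"^## (.+)$", re.M), r"== \1 =="),         # h2 -> level-2 heading
    (re.compile(r"^# (.+)$", re.M), r"= \1 ="),            # h1 -> level-1 heading
    (re.compile(r"\*\*(.+?)\*\*"), r"'''\1'''"),           # **bold** -> '''bold'''
    (re.compile(r"\[([^\]]+)\]\(([^)]+)\)"), r"[\2 \1]"),  # [text](url) -> [url text]
]

def md_to_wikitext(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text

print(md_to_wikitext("# Title\nSee **the docs** at [MW](https://mediawiki.org)."))
```

The hard direction is exactly what the discussion identifies: templates, parser functions, and extension tags have no markdown counterpart at all.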
[21:47:01] #info for comparison, the HTML paste handling in VE is done by normalizing HTML on the VE end, before it eventually lands in parsoid during save/serialization
[21:47:10] TimStarling, as far as i know ... they strip unrecognized / unsupported attributes.
[21:47:30] #info ideally the parsoid html2wt would take any html and produce 'acceptable' wikitext but is not fully exercised at that right now
[21:49:28] things like html2wt are going to be necessary for a long time, I imagine, but it seems to me we should at least start pulling people toward a world where html2wt isn't necessary
[21:50:32] well, there's the html-only world possibility :)
[21:50:47] where you'd still have some validation stage
[21:50:56] but not a major reparse i guess
[21:51:13] (and presumably a stage to handle composition of templates, media etc)
[21:51:56] for parsoid to accept arbitrary html, we would need to run a sanitization pass on the html and strip unrecognized attributes, normalize html, etc.
[21:52:09] I think we live in a world where wikitext is sanitized and tries to be safe, and HTML is known unsafe
[21:52:28] indeed we'd have "inside html" and "outside html" at the least
[21:52:32] which is also something that needs to happen with a html-only wiki .. sanitization at the very least.
[21:52:32] never, EVER mix em :D
[21:52:44] there's no "sanitized HTML" spec
[21:53:09] :)
[21:53:14] #info an HTML-only storage world needs to carefully sanitize between "outside HTML" and "safe inside HTML".... but there's no spec! we'd need one
[21:53:37] there's the old HTML email spec
[21:53:58] (but yeah, that's not really a good alternative)
[21:54:38] https://en.wikipedia.org/wiki/HTML_email
[21:55:34] probably we need to spec out our extensions as well, such as how you extract the file name from a usage, a wiki page from a link, a template reference and parameter set from a big ol' blob of divs or whatever
[21:55:46] I think if VE's HTML paste can produce reasonable wikitext markup for any HTML generated from original markdown, then that more or less replaces the need for direct markdown paste
[21:56:15] i tend to agree
[21:56:36] "original markdown" as in http://daringfireball.net/projects/markdown/syntax
[21:56:45] which is much simpler than pandoc markdown
[21:57:28] commonmark would be the modern simple version, I think
[21:57:47] http://commonmark.org/
[21:57:50] ok we're getting low on time
[21:58:08] any action items to pursue? decisions made?
[21:58:21] T127329 is the placeholder for the parsoid side work to consolidate html-import/cleanup code into a library for use by whoever.
[21:58:21] T127329: Using Parsoid as a wikitext bridge for importing content into wikitext format - https://phabricator.wikimedia.org/T127329
[21:58:50] #link https://phabricator.wikimedia.org/T127329 related parsoid bridge for html-import-to-wikitext
[21:59:19] Thanks All!
[21:59:26] so I'm fairly skeptical about the idea of direct markdown paste as being superior to markdown->html->wikitext
[21:59:28] subbu: my understanding is that you're working on RFCs as a goal soon, right?
[21:59:28] i was interested in the markdown strategy as a potential benefit for refactoring some code in mediawiki .. but looks like that is mostly already in place?
[21:59:50] yay wikidata -> contenthandler \o/
[22:00:17] robla, rfcs for .. that task i pasted above?
[22:00:21] #info tim is skeptical of direct paste; html import seems to serve well
[22:00:33] subbu: something related to T112999?
[22:00:33] T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
[22:00:43] #action someone should revise the RfC, probably drop the cut-paste
[22:00:44] ah, cscott territory.
[22:00:49] yes.
[22:00:58] #action update T112999 for the ContentHandler era
[22:00:58] T112999: Let MediaWiki operate entirely without wikitext - https://phabricator.wikimedia.org/T112999
[22:01:15] i'll chat with him about it.
[22:01:36] #action subbu will chat with cscott
[22:01:38] thanks all!
[22:01:40] #endmeeting
[22:01:42] Meeting ended Wed Jun 22 22:01:41 2016 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:01:42] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-22-21.01.html
[22:01:42] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-22-21.01.txt
[22:01:42] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-22-21.01.wiki
[22:01:42] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2016/wikimedia-office.2016-06-22-21.01.log.html
[22:01:49] thanks brion!
[22:01:55] :D
[22:01:57] see y'all later!
[22:02:19] see ya