[21:03:53] <_joe_> DanielK_WMDE: meeting :P
[21:04:37] _joe_: yea, let me in!...
[21:04:46] ..."requesting access"...
[21:04:52] <_joe_> uh we didn't see a request popping up
[21:04:59] great.
[21:05:02] let me try that again
[21:05:32] can you invite me explicitly?
[21:05:36] it did that for me, but worked after I restarted my browser
[21:05:44] <_joe_> marko is doing that now
[21:06:00] I think there was a chromium update since our last meeting
[21:07:11] i just restarted
[21:07:16] still nothing?
[21:07:32] <_joe_> nope
[22:01:42] o/
[22:02:01] #startmeeting TechCom RFC discussion
[22:02:01] Meeting started Wed Feb 28 22:02:01 2018 UTC and is due to finish in 60 minutes. The chair is DanielK_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:02:01] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:02:01] The meeting name has been set to 'techcom_rfc_discussion'
[22:02:01] Meeting started Wed Feb 28 22:02:01 2018 UTC and is due to finish in 60 minutes. The chair is DanielK_WMDE. Information about MeetBot at http://wiki.debian.org/MeetBot.
[22:02:01] Useful Commands: #action #agreed #help #info #idea #link #topic #startvote.
[22:02:01] The meeting name has been set to 'techcom_rfc_discussion'
[22:02:14] oh yay, we have two meetbots again
[22:02:24] :)
[22:02:27] #topic RFC: Make some aspects of Tidy's whitespace stripping behavior part of wikitext parsing "spec"
[22:02:33] #link https://phabricator.wikimedia.org/T157418
[22:03:13] hey subbu!
[22:03:15] hi
[22:03:38] who all is here for this rfc discussion?
[22:04:03] I'm here
[22:04:09] * _joe_ waves
[22:04:15] I'm here (though mostly listening/reading)
[22:04:18] i'm here
[22:04:19] <_joe_> although I'm not sure I can be of any help :P
[22:04:22] \o/
[22:04:35] alright .. there are enough of us. DanielK_WMDE do I need to summarize the rfc here?
[22:04:38] these things tend to start with tumbleweed, and turn into a frenzy in the last 10 minutes...
[22:04:40] I'm here listening
[22:05:04] subbu: yes, please give a short intro, and ask the main questions you want to address in this meeting
[22:05:08] sounds good.
[22:05:35] subbu: feel free to make liberal use of #info to put stuff into the minutes
[22:05:41] Tidy (who everyone is familiar with, I presume) strips whitespace in HTML tags (among other things).
[22:05:42] actually, please everyone make liberal use of #info
[22:06:10] Since it runs after the PHP parser generates its output, it effectively strips whitespace from both native wikitext constructs and html tags.
[22:06:35] so "* foo" renders as <li>foo</li>, '' foo '' renders as foo and so on.
[22:06:51] As it turns out, editors have implicitly come to rely on this whitespace stripping behavior.
[22:07:14] specifically in nested lists which are rendered as horizontal lists via CSS
[22:07:25] and which wrap the list in a ( .. ) pair
[22:07:46] so ** foo\n ** bar renders as (foo bar) without any whitespace before the ( and the )
[22:08:11] as we are replacing tidy with a html5 tidier (RemexHtml based one), this whitespace is no longer being stripped.
[22:08:29] because whitespace is sensitive in html5 + css land.
[22:08:42] so, editors are now seeing ( foo bar ) now and they don't like it, understandably.
[22:08:57] So, that is the problem we now have. And the RFC is about figuring out how to address this.
[22:09:07] anyone have any questions about this problem statement?
[22:09:20] what is the specific question you want answered here today?
[22:09:31] right, i'll get to that then.
[22:09:33] some presumably like it since we had a number of bugs open declaring that tidy's whitespace stripping was unacceptable
[22:10:09] perhaps, yes. although the only ones we've heard from right now are those who like their hlist formatting preserved.
[22:10:10] but I agree with the proposal "3c" that whitespace should be removed from the start and end of block-level wikitext constructs
[22:10:54] The 3 proposals are in the RFC description.
[22:11:06] 3a: strip whitespace from native wikitext as well as html tags
[22:11:16] 3b: strip whitespace from all (and only) native wikitext constructs
[22:11:23] the only reason this wasn't done from the outset is because tidy and user-editable CSS were introduced almost simultaneously, by gwicke
[22:11:24] 3c: strip whitespace from only block-level wikitext constructs.
[22:11:42] ah, i didn't know that piece of history.
[22:11:42] so hlists became possible at the same time as tidy started stripping whitespace
[22:12:26] gwicke also rewrote doBlockLevels() at around the same time
[22:12:38] (perhaps jumping ahead) -- my perspective is that it should be possible to write wikitext for any particular HTML you want to generate (* this guarantee is not (yet) possible, but it's the goal)
[22:12:45] based on the discussion in the ticket, especially comments by PerfektesChaos, it looks like (3c) is the only viable option .. i wanted to hear if anyone is aware of other reasons / concerns with that proposal.
[22:13:04] but stripping whitespace is controllable with , so if you want whitespace after your list tags, write <li> <-whitespace here
[22:13:24] so i have no problem with stripping whitespace by default when you write "html in wikitext"
[22:13:26] I think it should be considered a grammar change rather than postprocessing
[22:14:28] that is, list grammar will be e.g. "*" WS* content
[22:14:45] okay, so there are two pieces here (a) do we agree with the 3c being the spec for wikitext going forward? (b) implementation detail: grammar / post-processing / ... ?
[22:15:04] any objections / concerns / comments about (a)?
[22:15:40] * subbu doesn't know how we could do the 9 seconds of awkward silence => consent .. as we do in some hangout meetings.
[22:15:40] retaining whitespace in explicit HTML tags at least allows the editor the freedom to do weird things with whitespace if they want to
[22:15:44] TimStarling: only for * lists, though? what about for explicit <li> tags?
[22:15:59] oh, you just answered that. so whitespace preserved in explicit li tags
[22:16:09] yes
[22:16:15] cscott, yes, proposal 3c: is only native wikitext .. not html tags.
[22:16:25] subbu: is that possible? or are some of the complaints about html tags?
[22:16:34] * cscott seems to be just a few seconds too slow today
[22:17:03] how certain are we that excluding html tags is sufficient?
[22:17:18] and tim, i assume that * x'
[22:17:26] and tim, i assume that `* x` preserves the whitespace?
[22:17:33] cscott, my objection to stripping whitespace in html tags is that it introduces a subtle diff between how we are used to html in non-wikipedia pages and on wikipedia pages.
[22:17:40] and copy-paste hassles.
[22:17:57] that is not whitespace at the frontend which is where I'm proposing to implement it
[22:18:02] subbu: oh, totally agreed, i just thought that some of the constructs the editors found problematic used explicit HTML
[22:18:17] cscott, no .. ** for nested lists as far as i am aware.
[22:18:32] I'm saying put it in the parsoid tokenizer, and in MW's doBlockLevels()
[22:18:49] in the parsoid tokenizer, is seen as is, presumably it is not whitespace
[22:18:54] and what TimStarling said above about ws in explicit html tags gives editors freedom to do crazy things with ws if they want to :)
[22:19:12] in doBlockLevels() will be seen as a strip marker, which is also presumably not whitespace
[22:19:20] well, i'm in favor of the more extreme proposal (stripping whitespace even around block level *explicit html tags*), so i'm definitely in favor of the less extreme proposal (stripping whitespace around equivalent wikitext constructs) as well
[22:19:21] TimStarling, implementation wise, it is simpler to do it in the cleanup phase in parsoid .. so we don't have to mess with figuring out how to preserve that lost whitespace for accurate DSR computation.
[22:19:44] yes, is seen as is.
[22:19:56] fair enough
[22:20:13] but, doBlockLevels seems like the right place in php parser.
[22:20:23] subbu: we need to figure out how to handle \0 and the other bogus control characters; we might be able to create a token in the tokenizer for those and also for whitespace-after-lists
[22:20:30] and as it turns out matmarex has already implemented this for headings, and we didn't know :)
[22:20:37] which means "drop in the output" more or less
[22:21:00] there's also doTableStuff()
[22:21:10] which you include as a requirement in your proposal
[22:21:13] cscott, we could do that .. create a placeholder token .. just feels more hassle. but, okay, we don't have to resolve that now.
[22:21:15] preserves whitespace, | does not?
[22:21:25] subbu: agreed re deferring parsoid implementation qs
[22:21:46] TimStarling, ya .. it seems sensible to do for all block tags so it feels consistent.
[22:22:43] doTableStuff() does not try to identify HTML tables, so HTML tables would continue to preserve whitespace
[22:22:49] is it time to start adding #info and #agreed items?
[22:23:27] #info editors depend on tidy's whitespace stripping for nested lists to implement things like horizontal lists surrounded by ( and )
[22:23:38] we're agreed on "wikitext constructs that generate HTML tags which by default are display:block" ?
[22:24:03] (trying to write a succinct but precise definition)
[22:24:19] I think a table cell is technically not display:block?
[22:24:19] #info RemexHtml, as a HTML5 parser, no longer strips whitespace in tags which breaks horizontal list rendering and this rfc is about solving that problem
[22:24:43] and <p> tags are display:block, although it's arguable what "stripping whitespace" around \n\n is.
[22:25:00] #info The RFC has 3 proposals for addressing this
[22:25:09] #info 3a: strip whitespace from native wikitext as well as html tags
[22:25:16] #info 3b: strip whitespace from all (and only) native wikitext constructs
[22:25:21] "wikitext constructs that generate HTML tags which by default are not display:inline" ?
[22:25:24] #info 3c: strip whitespace from only block-level wikitext constructs.
[22:25:30] table-cells are display:table-cell i think
[22:25:49] #info history --> the only reason this wasn't done from the outset is because tidy and user-editable CSS were introduced almost simultaneously, by gwicke
[22:25:50] Do we strip whitespace for wikitext "a\n\n b" ?
[22:26:09] #info history --> so hlists became possible at the same time as tidy started stripping whitespace
[22:26:12] well, that is <p>a</p><p> b</p>
[22:26:41] the line breaks are retained, which I think is fine, it helps source readability, is apparently harmless
[22:27:09] so I would say we are not including line break generated paragraphs in this
[22:27:31] "non-paragraph wikitext constructs that generate HTML tags which by default are not display:inline"
[22:27:41] hehe
[22:27:52] okay, so "block" is not the right term .. is that what we are discussing now? :)
[22:27:59] you know there is a limited number of wikitext constructs, that's not an external spec reference
[22:28:08] you could just list them (as subbu did in the RFC)
[22:28:14] it appears we do strip trailing whitespace at the end of <p> tags
[22:28:28] ya, we could just list them :) --> * # : ; {| |} | ! || !!
[22:28:46] TimStarling: sure. i'm just trying to come up with a consistent list by making a short executable specification ;)
[22:28:56] i think you could argue that we *are* stripping whitespace around <p> tags
[22:29:12] "wikitext constructs that generate HTML tags which by default are not display:inline" seems short and correct.
[22:29:47] cscott, although this is not about what we do for convenience that doesn't matter for rendering, but about what is *required* by the spec.
[22:29:56] and what we want to put in the spec as required.
[22:30:45] alright, should i add an agreed item into the minutes? :)
[22:30:47] well, you could make the whitespace in P tags significant with css. so it does matter that tidy is stripping it.
[22:31:04] this has turned into a standard parsing team geek out session
[22:31:08] lol
[22:31:10] :)
[22:31:52] <pre> tags break my short definition, though.  We don't strip whitespace at the beginning or the end of the generated <pre> tag.
[22:31:53] * subbu is trying to figure out whether TimStarling thinks that is a good or a bad thing for this office hour
    [22:32:10] 	 I think we can call it approved with option 3c as stated, if there are no objections?
    [22:32:20] 	 subbu: you forgot headings in your explicit list
    [22:32:27] 	 oh right! let me amend it.
    [22:32:44] 	 well, that was the point of trying to generate the list by type  instead of relying on an ad hoc list ;)
    [22:34:18] 	 anyway, i think the notional spec should be that non-inline html tags generated by wikitext constructs (not explicit html) have whitespace stripped before and after.  with the exception of <pre>.
    [22:34:29] 	 that should be consistent and happens to match what we are doing anyway right now ;)
    [22:35:06] 	 oh -- whitespace *after*.  hm.  that's not actually required by your inline list folks, is it subbu?
    [22:35:24] 	 *within*, not after/before.
    [22:35:35] 	 it's a little hard to match the "whitespace after" in Tim's notional grammar.
    [22:35:49] 	 subbu: not anywhere within
    [22:36:12] 	 within means at the end of a list item if there is a following list item? that sounds complicated
    [22:36:16] 	 subbu: just at the start of the firstChild and at the end of the lastChild, assuming those are text nodes.
    [22:36:39] 	 yes .. agreed .. within is not precise.
    [22:36:43] 	 TimStarling: yeah.  grammar would be '* WS content WS NL' something like that.
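
Editorial sketch of the notional list-item grammar quoted above (markers, optional whitespace, content, optional whitespace, end of line). This is a hedged illustration only: the `LIST_ITEM` pattern and `parse_list_item` helper are hypothetical names invented here, not code from MediaWiki's doBlockLevels() or the Parsoid tokenizer, and the marker set is an assumption.

```python
import re

# Hypothetical pattern: one or more list markers, optional whitespace,
# lazily matched content, optional trailing whitespace, end of line.
LIST_ITEM = re.compile(r"^(?P<markers>[*#:;]+)[ \t]*(?P<content>.*?)[ \t]*$")


def parse_list_item(line: str):
    """Return (markers, content) for a list line, or None for other lines."""
    match = LIST_ITEM.match(line)
    if match is None:
        return None
    # Whitespace next to the markers and at the end of the line is dropped;
    # whitespace between words inside the content is left alone.
    return match.group("markers"), match.group("content")
```

With this pattern, `parse_list_item("**  foo bar  ")` yields the markers `**` and the content `foo bar`, so only the edges of the item lose whitespace.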
    [22:36:48] 	 let's just say "start of"?
    [22:36:56] 	 "within" hopefully does not mean removing spaces from between words in a sentence
    [22:37:16] 	 TimStarling: i'm pretty sure that's not what subbu meant, although that's the dictionary definition of what he said
    [22:37:17] 	 :)
    [22:37:17] 	 in table cells removing whitespace from the end of the cell does matter
    [22:37:28] 	 I mean, if you somehow style them as not table cells
    [22:38:01] 	 TimStarling: grammar for table cell divider is WS BAR WS ?  could be done, just a bit messy w/ the lookahead
    [22:38:04] 	 #info agreement on option 3c: strip whitespace from only block-level wikitext constructs.
    [22:38:05] 	 it's like for definition lists, you can have multiple block level elements generated from one line
    [22:38:33] 	 #info tentative agreement on: let people do crazy things with html tags.
    [22:39:21] 	 DanielK_WMDE, TimStarling also let us say that parsing team will work out precise details and wording of the spec ?
    [22:39:27] 	 i think it's saner to just remove at start of tag, that can be done with the tokenizer in a straightforward way (regardless of whether our actual implementation ends up doing it that way)
    [22:39:45] 	 so, we aren't constrained by having to work out all details .. as long as TechCom has the confidence that we can do it. :)
    [22:39:52] 	 work out all details *now*
    [22:39:52] 	 subbu, cscott: let me play devil's advocate for a minute. how about introducing  or some such, instead of changing the parser to stay compatible with a horrible hack?
    [22:40:38] 	 DanielK_WMDE: I think my wikitext 2.0 proposal was {* *} for new-style lists
    [22:40:46] 	 which could opt in to some different behavior
    [22:40:54] 	  you mean? that could work too, i think .. the trick is what does it involve in terms of editors accepting it.
    [22:40:57] 	 although i don't know if this is the behavior I particularly want to opt in to.
    [22:41:10] 	 subbu: the purpose on having an RFC on this is to make sure we don't introduce syntax we later regret...
    [22:41:15] 	 yes.
    [22:41:22] 	  could be an extension
    [22:41:27] 	 totally.
    [22:41:38] 	 totally was for DanielK_WMDE's comment
    [22:41:40] 	 although then you'd need to be explicit with your 
    [22:41:43] 	 subbu: hlist, of course. my brain is approaching midnight.
    [22:42:23] 	 isn't there already a parser function or something for {#strip} ?
    [22:42:29] 	 cscott: kind of like . which should be called 
    . except that it's not 
    .
    [22:42:31] 	 **   a\n **b\n .. 
    [22:42:54] 	 can the hlist templates be changed from `** ...foo...` to `**{#strip ...foo...}` ?
    [22:43:08] 	 although one of the reasons this RFC came about was because my suggesting that editors strip whitespace from nested lists wasn't acceptable.
    [22:43:09] 	 TimStarling, subbu: was there agreement on when and where to do the stripping?
[22:43:10] if you have e.g. "; aaa : bbb\n" you would propose <dl><dt>aaa </dt><dd>bbb</dd></dl>, with the space retained after aaa?
[22:43:13] ah, yes,
[22:43:15] * cscott shudders
[22:43:32] they liked pretty and readable wikitext and wanted their whitespace in lists for readability.
[22:43:49] according to izno, quid.dity, perfekeschaos at least.
[22:43:57] TimStarling: yes for the def list q, although there's some   craziness in that particular case isn't there?
[22:44:22] subbu: give them syntax highlighting instead :)
[22:44:44] i'm actually listing the extension idea from DanielK_WMDE
[22:44:58] it would be easy to write an extension that would do the stripping in this one particular case
[22:45:01] cscott: true, that's weird
[22:45:17] although then we'd still have whitespace stripping in headings, which would maybe be an oddball case
[22:45:17] DanielK_WMDE, cscott if we want them to do or {#strip .. }, we could ask them to strip whitespace as well, no?
[22:45:38] so, i guess we are no longer agreed on 3c at this point.
[22:45:38] subbu: well, they want the WS for readability inside the list. i can't argue with that
[22:45:56] I don't want
[22:46:04] subbu: i'm just saying we could wrap it in something and/or give them slightly different markup to use, so long as it is equally spaced-out-and-readable
[22:46:11] and other parser modes are not actually all that easy
[22:46:41] ** foo ** bar ** baz seems just as readable, although I'll agree that **{#strip foo }**{#strip bar} probably fails on the readability score
[22:47:16] certainly it's inelegant to do them as an extension tag, you have to do recursiveTagParse()
[22:47:44] there are plenty of extensions which do that, however
[22:47:48] that horse is out of the barn
[22:47:57] I think allowing whitespace between the star and the content in a list item is absolutely fine as a human-understandable grammar
[22:47:58] all of wikisource is built on it
[22:48:37] TimStarling: for the `** foo ** bar ** baz` example, the whitespace before the ** is equally problematic though, isn't it?
[22:48:53] surely you know about the implementation problems though, you "shuddered" at the thought of it
[22:48:54] #info other ideas proposed as an alternative to proposal 3c: use as an extension or #strip parser function
[22:49:39] (fwiw i shudder at because the actual semantics of it are close-to-but-not-the-same as a bunch of other pre-like modes. it's not the recursive parse that bothers me...)
[22:49:43] #info this rfc exists because editors want their whitespace in nested lists for readability reasons
[22:50:08] subbu: would they be okay with `** foo\n** bar\n` ?
[22:50:20] #info I don't want
[22:50:44] cscott, i don't know.
[22:50:48] the problem with recursiveTagParse() is that it has to re-enter the preprocessor
[22:50:50] okay, 10 mins left.
[22:50:52] where do we stand?
[22:51:03] it breaks frames, it breaks preprocessor DOM caching
[22:51:35] resetting: i'm actually fine with stripping at the beginning and end of wikitext block constructs. we just can't pretend to put it in the grammar, it will be part of the semantics of the output (like ws stripping for scribunto arguments, say)
[22:51:48] it's consistent with how headings already behave
[22:51:53] #info certainly it's inelegant to do them as an extension tag, you have to do recursiveTagParse(); the problem with recursiveTagParse() is that it has to re-enter the preprocessor; it breaks frames, it breaks preprocessor DOM caching
[22:51:59] it works the way we want table cells to work
[22:52:03] you know editors can't even do {{{1}}} because that's how broken recursiveTagParse() is
[22:52:49] so long as you can bypass the whitespace stripping on both sides with if desired, strip away. IMO.
[22:53:05] #info resetting: i'm actually fine with stripping at the beginning and end of wikitext block constructs. we just can't pretend to put it in the grammar, it will be part of the semantics of the output (like ws stripping for scribunto arguments, say); it's consistent with how headings already behave; it works the way we want table cells to work
[22:53:55] i like consistent stripping on both sides rather than piecemeal approach, and i'm fairly convinced that we can write a short spec justifying which constructs we do this on.
[22:54:09] TimStarling: maybe we need an alternative to recursiveTagParse. Something like scoped parser options.
[22:54:27] ok, five minutes left
[22:54:35] no time to dive into that rabbit hole
[22:54:46] subbu: is there anything critical left to discuss?
[22:55:01] #info as it turns out matmarex has already implemented whitespace-stripping for headings, and we didn't know :)
[22:55:04] i think most people *expect* whitespace to be insignificant in HTML, and are saddened and surprised when they do something which makes `\nfoo...` not actually render the way they expect (with a tiny tiny space in front of or above the foo)
[22:55:12] can the RFC be approved, DanielK_WMDE ?
[22:55:15] DanielK_WMDE, is there agreement around 3c?
[22:55:37] i didn't hear any objections.
[22:55:43] i myself know too little about it.
[22:55:54] any objections against option 3c?
[22:56:04] +1 to 3c
[22:56:09] maybe put it for last call ... to give folks 1 week to object .. or whatever is usual techcom practice
[22:56:17] subbu: of course
[22:56:22] but, we'll work on implementing it.
[22:56:23] ok.
[22:56:42] note that I asked for objections at 32 minutes past
[22:56:44] DanielK_WMDE, ok .. was just clarifying since i wasn't sure what "approved" meant. :)
[22:56:47] subbu: min 1 week after announcement. that means two weeks from now, in practice.
[22:57:07] TimStarling: so 6 days, 23 hours, and 28 for last call is what you're saying? ;)
[22:57:10] subbu: yea, sorry, that wasn't quite clear :)
[22:57:23] #info so long as you can bypass the whitespace stripping on both sides with if desired, strip away. IMO.
[22:57:42] no more qns. from me at this point.
[22:58:17] #action parsing team to clarify precise details and a human-understandable description of the change.
[22:58:22] #info no objections raised, RFC to go on last call for approval.
[22:58:26] is there such a thing as #action? :)
[22:58:46] i think so. i just stick to the three commands i know ;)
[22:58:46] i can add it back as info if not
[22:58:50] https://wiki.debian.org/MeetBot
[22:59:05] there is .. alright.
[22:59:15] yes, action exists. even #agreed exists :) i keep forgetting that
[22:59:22] ok, thanks all!
[22:59:32] \o/ thanks everyone for your input.
[22:59:39] #endmeeing
[22:59:45] #endmeeting
[22:59:45] i know TimStarling enjoyed the geek out even if it sounded like a complaint. :)
[22:59:46] Meeting ended Wed Feb 28 22:59:45 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:59:46] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.html
[22:59:46] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.txt
[22:59:46] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.wiki
[22:59:46] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.log.html
[22:59:46] Meeting ended Wed Feb 28 22:59:45 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
[22:59:46] Minutes: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.html
[22:59:46] Minutes (text): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.txt
[22:59:46] Minutes (wiki): https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.wiki
[22:59:47] Log: https://tools.wmflabs.org/meetbot/wikimedia-office/2018/wikimedia-office.2018-02-28-22.02.log.html
[23:00:13] * DanielK_WMDE loves it how both these bots actually write to the same files
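
Editorial sketch of option 3c done as a cleanup pass, the alternative to a grammar change raised in the discussion: whitespace is stripped at the start of the first child and the end of the last child of elements generated by native wikitext, while explicit HTML tags are left alone. Everything here is an assumption made for illustration: the `Element` class, the `from_wikitext` flag, and the `STRIPPED_TAGS` set (derived from the constructs listed in the meeting, `* # : ;`, table cells, and headings) are hypothetical and do not come from Parsoid, RemexHtml, or MediaWiki.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumed tag set for the constructs named in the meeting; <pre> and <p>
# are deliberately left out, as the discussion suggests.
STRIPPED_TAGS = {"li", "dt", "dd", "td", "th",
                 "h1", "h2", "h3", "h4", "h5", "h6"}


@dataclass
class Element:
    """Minimal stand-in for a DOM node; children are text or elements."""
    tag: str
    children: List[Union["Element", str]] = field(default_factory=list)
    from_wikitext: bool = True  # False for explicit HTML written by an editor


def strip_edges(node: Element) -> Element:
    """Strip edge whitespace in wikitext-generated block elements, in place."""
    for child in node.children:
        if isinstance(child, Element):
            strip_edges(child)
    if node.from_wikitext and node.tag in STRIPPED_TAGS and node.children:
        # Only the first and last children are touched, and only if text.
        if isinstance(node.children[0], str):
            node.children[0] = node.children[0].lstrip()
        if isinstance(node.children[-1], str):
            node.children[-1] = node.children[-1].rstrip()
    return node
```

Tracking provenance with a flag keeps the "let people do crazy things with html tags" outcome: an explicit list item keeps its whitespace, while the same content written with `*` does not.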