I use parsoid to publish email messages into wiki and have a little
Sometimes the generated article has "preformatted" fragments that do not
have any special formatting in the source text.
After investigating, I discovered that this is caused by spaces at the
start of a new line in the HTML text.
When the source HTML of the email is viewed in a browser these spaces have
no effect, but after conversion to wikitext they become part of the markup.
Next, trying to discover the way Parsoid works, I saw that
normally these spaces are surrounded with a <nowiki> tag, but in some
circumstances that does not happen.
So I made a test HTML file to see the different conversion results:
The result of conversion is:
It seems that if new line is just at end of <span> tag, <nowiki> is not
It is possible that Arlo's bugfix will satisfy your use case.
However, note that Parsoid will introduce <nowiki> protection around
characters that will parse differently if not escaped. So "<p> foo</p>"
will convert to "<nowiki> </nowiki>foo". You can avoid this by passing
the 'scrub_wikitext' flag to the html -> wikitext API endpoint. This
tells Parsoid to normalize the input HTML to eliminate the need for
FYI in case this flag is pertinent to your use case.
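
For illustration, here is a minimal sketch of how such a request might be built in Python. The base URL, the domain and the exact endpoint path (/{domain}/v3/transform/html/to/wikitext/) are assumptions about a local Parsoid install, and build_scrub_request is a hypothetical helper name; only the 'scrub_wikitext' flag itself comes from this thread. The request is built but not sent.

```python
import json
import urllib.request


def build_scrub_request(base_url, domain, html):
    """Build (but do not send) an html -> wikitext request with scrub_wikitext set.

    The endpoint path below is an assumption about the Parsoid HTTP API layout;
    adjust it to match your installation.
    """
    url = f"{base_url.rstrip('/')}/{domain}/v3/transform/html/to/wikitext/"
    payload = {"html": html, "scrub_wikitext": True}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local service and domain, used only for illustration.
req = build_scrub_request("http://localhost:8000", "localhost", "<p> foo</p>")
# To actually send it:
# wikitext = urllib.request.urlopen(req).read().decode("utf-8")
```
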
> On 7/22/19 10:51 AM, Arlo Breault wrote:
>>> On Jul 22, 2019, at 5:11 AM, Sergey F <[hidden email]> wrote:
>>> The result of conversion is:
>> Yes, this looks like a bug
>> See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811
> Thanks Arlo!
> It is possible that Arlo's bugfix will satisfy your use case.
It would have helped if I had actually seen Arlo's patch before I sent
that email - he was fixing a case where we were not adding a nowiki
where it should have been added.
So, you will need to pass the scrub_wikitext parameter if you want to
avoid the nowikis. Or, you can normalize the HTML yourself before
passing it to Parsoid.
Or, if you were just reporting the inconsistency, ignore my emails. :-)
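
If you go the route of normalizing the HTML yourself, a rough sketch in Python could look like the following. It simply strips spaces and tabs at the start of every line before the HTML is handed to Parsoid. The function name strip_leading_spaces is hypothetical, and the regex approach is an assumption that suits simple email HTML: it would also alter the contents of <pre> blocks, so a real HTML parser may be a better fit if those matter to you.

```python
import re

# Spaces or tabs at the start of any line (multiline mode).
_LEADING_WS = re.compile(r"(?m)^[ \t]+")


def strip_leading_spaces(html):
    """Remove leading spaces/tabs from each line; line breaks are kept as-is."""
    return _LEADING_WS.sub("", html)


sample = "<p>a\n  b</p>\n <span>c</span>"
cleaned = strip_leading_spaces(sample)
```

The line breaks themselves are left in place, so words on adjacent lines stay separated.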