leading space and <span> tag


Sergey F
Hello,

I use Parsoid to publish email messages to a wiki and have a small
issue.
Sometimes the generated article has "preformatted" fragments that do not
have any special formatting in the source text.
After investigation I discovered that this is caused by spaces at the
start of a new line in the HTML text.
When the source HTML of the email is viewed in a browser these spaces
have no effect, but after conversion to wikitext they become part of the
markup.
Next, trying to understand the way Parsoid works, I saw that normally
these spaces are wrapped in a <nowiki> tag, but in some circumstances
this does not happen.

So I made a test HTML file to compare the conversion results:

<html>
<head>
</head>
<body>

<p>test2<span>
  test3
</span></p>

<p><span>test2
  test3
</span></p>

<p>textx<span>test2
  test3
</span></p>

</body>
</html>

The result of conversion is:

test2<span>
  test3
</span>

<span>test2
<nowiki> </nowiki>test3
</span>

textx<span>test2
<nowiki> </nowiki>test3
</span>

It seems that if the newline comes immediately after the opening <span>
tag, <nowiki> is not inserted.

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: leading space and <span> tag

Arlo Breault


> On Jul 22, 2019, at 5:11 AM, Sergey F <[hidden email]> wrote:
>
> <p>test2<span>
>   test3
> </span></p>
>
> The result of conversion is:
>
> test2<span>
>   test3
> </span>

Yes, this looks like a bug

See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811

Thanks



Re: leading space and <span> tag

Subramanya Sastry
On 7/22/19 10:51 AM, Arlo Breault wrote:

>> On Jul 22, 2019, at 5:11 AM, Sergey F <[hidden email]> wrote:
>>
>> <p>test2<span>
>>   test3
>> </span></p>
>>
>> The result of conversion is:
>>
>> test2<span>
>>   test3
>> </span>
> Yes, this looks like a bug
>
> See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811
>
> Thanks

Thanks Arlo!

Sergey:

It is possible that Arlo's bugfix will satisfy your use case.

However, note that Parsoid will introduce <nowiki> protection around
characters that would parse differently if not escaped. So "<p> foo</p>"
will convert to "<nowiki> </nowiki>foo". You can avoid this by passing
the 'scrub_wikitext' flag to the html -> wikitext API endpoint [1]. This
tells Parsoid to normalize [2] the input HTML to eliminate the need for
those nowikis.

FYI in case this flag is pertinent to your use case.
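
For illustration, a request with that flag set might be built as in the sketch below. This is only a sketch under assumptions: the host, port and domain are placeholders, the helper name build_html2wt_request is hypothetical, and the URL shape and JSON body should be checked against the API page [1] for your Parsoid version. Only the 'html' field and the 'scrub_wikitext' flag come from the discussion above.

```python
import json
import urllib.parse
import urllib.request


def build_html2wt_request(html, title,
                          base="http://localhost:8000",  # assumed Parsoid address
                          domain="example.org"):         # assumed wiki domain
    """Build (but do not send) an html -> wikitext request with scrub_wikitext set."""
    # Assumed v3 URL shape; verify against the API documentation [1].
    url = "{}/{}/v3/transform/html/to/wikitext/{}".format(
        base, domain, urllib.parse.quote(title, safe=""))
    # scrub_wikitext asks Parsoid to normalize the HTML before serializing.
    body = json.dumps({"html": html, "scrub_wikitext": True}).encode("utf-8")
    return urllib.request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST")


if __name__ == "__main__":
    req = build_html2wt_request("<p> foo</p>", "Test page")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read().decode("utf-8")
```

Nothing is sent until the request is passed to urllib.request.urlopen, so the URL and payload can be inspected first.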

Subbu.

1.
https://www.mediawiki.org/wiki/Parsoid/API#For_HTML_-%3E_wikitext_requests

2. https://www.mediawiki.org/wiki/Parsoid/Normalizations



Re: leading space and <span> tag

Subramanya Sastry
On 7/22/19 11:05 AM, Subramanya Sastry wrote:

> On 7/22/19 10:51 AM, Arlo Breault wrote:
>>> On Jul 22, 2019, at 5:11 AM, Sergey F <[hidden email]> wrote:
>>>
>>> <p>test2<span>
>>>   test3
>>> </span></p>
>>>
>>> The result of conversion is:
>>>
>>> test2<span>
>>>   test3
>>> </span>
>> Yes, this looks like a bug
>>
>> See https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/524811
>>
>> Thanks
>
> Thanks Arlo!
>
> Sergey:
>
> It is possible that Arlo's bugfix will satisfy your use case.

It would have helped if I had actually seen Arlo's patch before I sent
that email - he was fixing a case where we were not adding a nowiki
where it should have been added.

So, you will need to pass the scrub_wikitext parameter if you want to
avoid the nowikis. Or, you can normalize the HTML yourself before
passing it to Parsoid.
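
One way to do that normalization yourself is sketched below, assuming the only problem is spaces or tabs at the start of a line inside ordinary text. It is not Parsoid's own normalization code; the class and function names are hypothetical, and it relies only on Python's standard html.parser. Text inside <pre>, <textarea>, <script> and <style> is left alone, since leading whitespace can matter there.

```python
import re
from html.parser import HTMLParser


class LeadingSpaceStripper(HTMLParser):
    """Re-emit HTML, dropping spaces/tabs that directly follow a newline in text."""

    PRESERVE = {"pre", "textarea", "script", "style"}

    def __init__(self):
        # Keep entities as written so they can be re-emitted unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.preserve_depth = 0

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in self.PRESERVE:
            self.preserve_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))
        if tag in self.PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if not self.preserve_depth:
            # The newline still renders as whitespace in a browser,
            # so only the indentation after it is removed.
            data = re.sub(r"\n[ \t]+", "\n", data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))

    def handle_comment(self, data):
        self.out.append("<!--{}-->".format(data))

    def handle_decl(self, decl):
        self.out.append("<!{}>".format(decl))

    def handle_pi(self, data):
        self.out.append("<?{}>".format(data))


def strip_leading_spaces(html):
    parser = LeadingSpaceStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(strip_leading_spaces("<p>test2<span>\n  test3\n</span></p>"))
```

Run on the first test paragraph from the original report, this removes the two spaces before "test3" and leaves the tags untouched. Whether the resulting wikitext is free of nowikis still depends on Parsoid, so it is worth checking the output for your own messages.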

Or, if you were just reporting the inconsistency, ignore my emails. :-)

Subbu.

