WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)


WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Brion Vibber
On Mon, May 2, 2011 at 12:04 AM, Tim Starling <[hidden email]> wrote:

> Can someone please tell me, in precise technical terms, what is wrong
> with Wikia's WYSIWYG editor and why we can't use it?
>
> I have heard that it has bugs in it, but I have not been told exactly
> what these bugs are, why they are more relevant for Wikimedia than for
> Wikia, or why they can't be fixed.
>
> Years ago, we talked dismissively about WYSIWYG. We discussed the
> features that a WYSIWYG editor would have to have, pointing out how
> difficult they would be to implement and how we didn't have the
> manpower to pull off such a thing. Now that Wikia has gone ahead and
> implemented those exact features, what is the problem?
>

The most fundamental problem with Wikia's editor remains its fallback
behavior when some structure is unsupported:

  "Source mode required

  Rich text editing has been disabled because the page contains complex
code."

Here's an example of unsupported code, the presence of which makes a page
permanently uneditable by the rich editor until it's removed:

  <table>
  <tr><td>a</td></tr>
  </table>

You can try this out now at http://communitytest.wikia.com/

It will at least let you edit other *sections* that don't contain anything
that scares it, but if the nasty bit is somewhere in what you want to edit,
it just doesn't recover.


There are some smart things in what they're doing: annotating the markup
ought to be a big help in hooking up the rendered HTML bits back to the
original source. The way they hold template invocations and plugins as
standalone placeholders within the rich text is pretty good (and could be a
bit better if it could display some content and provide even more advanced
invocation editing tools, which is all detail work).

But if it just gives up on entire pages, we've got a problem because to
handle Wikipedia we need to handle lllooonnnggg pages that tend to include
lots of complex templates which pull in funky code of their own.

At a minimum, assuming the other round-tripping problems are resolved and
the treatment of templates and extensions can be improved, it would need to
be changed to recognize uneditable chunks and present them as placeholders
too -- like templates, you should be able to dive into source and edit them
if need be, but they ought not to destroy the rest of the page.
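At the risk of oversimplifying, the placeholder idea can be sketched like this (hypothetical Python; the node names and the crude paragraph splitting are invented for illustration, not taken from any existing editor): anything the parser doesn't recognize becomes an opaque node that round-trips untouched, instead of blocking editing of the whole page:

```python
import re

def parse(text):
    """Split wikitext into editable paragraphs and opaque raw blocks."""
    nodes = []
    for chunk in text.split("\n\n"):
        if re.search(r"<table|\{\|", chunk):   # unsupported: keep verbatim
            nodes.append(("raw", chunk))       # placeholder, still source-editable
        else:
            nodes.append(("para", chunk))      # fully rich-editable
    return nodes

def serialize(nodes):
    """Rebuild the exact source, including the untouched raw blocks."""
    return "\n\n".join(chunk for _, chunk in nodes)

src = "Intro text.\n\n<table>\n<tr><td>a</td></tr>\n</table>\n\nMore text."
tree = parse(src)
assert [kind for kind, _ in tree] == ["para", "raw", "para"]
assert serialize(tree) == src   # the scary table no longer blocks editing
```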


Beyond that let's flip the question the other way -- what do we *want* out
of WYSIWYG editing, and can that tool provide it or what else do we need?
I wrote up some notes a few weeks ago, which need some more collation &
updating from the preliminary experiments I'm doing, and I would greatly
appreciate more feedback from you, Tim, and from everyone else who's been
poking about in parser & editing land:

  http://www.mediawiki.org/wiki/Wikitext.next

And also some of Trevor's notes which I have poked at:

  http://www.mediawiki.org/wiki/Visual_Editor_design

I've got some aggressive ideas about normalizing how we deal with template
expansion to work at the parse tree level; this can be friendlier to some
levels of caching, splitting portions of parsing between PHP and optimized
native code, or even mixing some things between pre-parsed text and
client-side work, but most importantly I'm interested in making sure we have
a relatively clean hierarchical relationship between parts of the document,
which we can use to much more reliably hook up parts of the rendered HTML
output:

* maintain an abstract parse tree that can be hooked up fully to both the
original source text *and* the live output DOM
* do section, paragraph, or table-cell editing inline directly on a view
page, with predictable replacements
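As a concrete sketch of those two bullets (hypothetical Python; the trivial paragraph "parser" and the data-node-id attribute are invented for illustration): each tree node records the span of its source text, and the renderer stamps the node id into the output, so script on the view page can map a clicked element back to the exact source it should replace.

```python
from dataclasses import dataclass
import html

@dataclass
class Node:
    node_id: int
    start: int   # offsets of this node's text within the original source
    end: int

def parse_paragraphs(src):
    """Trivial 'parser': each blank-line-separated block is one node."""
    nodes, pos = [], 0
    for i, block in enumerate(src.split("\n\n")):
        nodes.append(Node(i, pos, pos + len(block)))
        pos += len(block) + 2   # skip the "\n\n" separator
    return nodes

def render(src, nodes):
    """Output HTML annotated with node ids, for inline-editing hooks."""
    return "".join(
        f'<p data-node-id="{n.node_id}">{html.escape(src[n.start:n.end])}</p>'
        for n in nodes)

def source_for(src, nodes, node_id):
    """What an inline editor would load when a rendered block is clicked."""
    n = nodes[node_id]
    return src[n.start:n.end]

src = "First paragraph.\n\nSecond & last."
nodes = parse_paragraphs(src)
assert source_for(src, nodes, 1) == "Second & last."
assert 'data-node-id="1">Second &amp; last.</p>' in render(src, nodes)
```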

It may well be that this is too expansive and we'll want to contract to
something that's more like Wikia's annotated parser output -- in most cases
it should give us similar information but it'll probably be harder to
replace parts of the page at runtime in JavaScript.


Another goal beyond editing itself is normalizing the world of 'alternate
parsers'. Several have been announced recently, and we now have a large
array of them available, all a little different. We even use mwlib
ourselves in the PDF/ODF export deployment, and while we don't maintain that
engine we need to coordinate a little with the people who do so that new
extensions and structures get handled.

A new visual editor that's built around a normalized, defined parser could
be a great help; other folks will be able to use compatible parsers instead
of mostly-similar parsers.


For the moment I'm mostly schooling myself on the current state of the world
and setting up experimental tools to aid in debugging extra
parser/editor-related goodies (eg the inspector tool I'm fiddling with at
http://en.wikipedia.org/wiki/User:Brion_VIBBER/vector.js ), but hope to get
some of these projects moving forward after Berlin.

-- brion
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Fred Bauder-2

> Beyond that let's flip the question the other way -- what do we *want*
> out
> of WYSIWYG editing, and can that tool provide it or what else do we need?

We want something simpler and easier to use. That is not what Wikia has.
I could hardly stand trying it out for a few minutes.

Fred



Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Magnus Manske-2
On Mon, May 2, 2011 at 7:33 PM, Fred Bauder <[hidden email]> wrote:
>
>> Beyond that let's flip the question the other way -- what do we *want*
>> out
>> of WYSIWYG editing, and can that tool provide it or what else do we need?
>
> We want something simpler and easier to use. That is not what Wikia has.
> I could hardly stand trying it out for a few minutes.

So, why not use my WYSIFTW approach? It will only "parse" the parts of
the wikitext that it can turn back, edited or unedited, into wikitext,
unaltered (including whitespace) if not manually changed. Some parts
may therefore stay as wikitext, but it's very rare (except lists,
which I didn't implement yet, but they look intuitive enough).

Today's featured article parses in 2 sec in Chrome, so it's fast
enough for most situations using a current browser, and it also
supports section editing. There's basic functionality for most things,
even a one-click "insert reference" function. There's also still lots
missing, but nothing fundamental, mostly time-sink functions like
"insert table column" etc.
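To illustrate that invariant (a hypothetical Python sketch, not WYSIFTW's actual JavaScript): anything the editor lifts into rich form must serialize back byte-for-byte, whitespace included, and everything it can't handle stays literal wikitext; bold here is just a stand-in for the handled constructs.

```python
import re

BOLD = re.compile(r"'''(.*?)'''")

def lift(text):
    """Turn handled constructs into tokens; leave the rest as literal text."""
    out, pos = [], 0
    for m in BOLD.finditer(text):
        out.append(("text", text[pos:m.start()]))
        out.append(("bold", m.group(1)))
        pos = m.end()
    out.append(("text", text[pos:]))
    return out

def lower(tokens):
    """Serialize back; an unedited document must come back byte-for-byte."""
    return "".join(t if kind == "text" else f"'''{t}'''" for kind, t in tokens)

sample = "Plain  text,\n'''bold bit''' and   trailing spaces.  "
assert lower(lift(sample)) == sample   # whitespace-exact round trip
```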

Magnus


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Brion Vibber
On Mon, May 2, 2011 at 12:55 PM, Magnus Manske
<[hidden email]> wrote:

> On Mon, May 2, 2011 at 7:33 PM, Fred Bauder <[hidden email]>
> wrote:
> >>> Beyond that let's flip the question the other way -- what do we *want*
> >> out
> >> of WYSIWYG editing, and can that tool provide it or what else do we
> need?
> >
> > We want something simpler and easier to use. That is not what Wikia has.
> > I could hardly stand trying it out for a few minutes.
>
> So, why not use my WYSIFTW approach? It will only "parse" the parts of
> the wikitext that it can turn back, edited or unedited, into wikitext,
> unaltered (including whitespace) if not manually changed. Some parts
> may therefore stay as wikitext, but it's very rare (except lists,
> which I didn't implement yet, but they look intuitive enough).
>

There's a lot I like about the WYSIFTW tool:
* replacing the section edits inline is kinda nice
* folding of extensions and templates is intelligent and allows you to edit
them easily (unlike Wikia's which drops in opaque placeholders, currently
requiring you to switch the *entire* section to source mode to change them
at all) -- some infoboxes for instance show up as basically editable tables
of parameter pairs, which is pretty workable!
* popup menus on links, images, etc provide access to detail controls
without cluttering up their regular view

I've added a side-by-side view of a popular article (top of [[w:Barack
Obama]]) with its WYSIFTW editing view and the Wikia editor (which just
gives up and shows source) at:

http://www.mediawiki.org/wiki/Wikitext.next#Problems

There are, though, cases where WYSIFTW gets confused, such as a <ref> with
multi-line contents -- it doesn't get that the lists, templates etc are
inside the ref rather than outside, which messes up the folding.
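The nesting problem can be shown with a tiny depth-tracking pass (hypothetical Python; real <ref> handling also needs attributes and self-closing tags): a flat scanner sees the "*" list lines inside <ref>...</ref> as top-level lists, while tracking tag depth assigns them to the enclosing ref.

```python
def owners(text):
    """For each line, report whether it sits inside a <ref> element."""
    result, depth = [], 0
    for line in text.split("\n"):
        opens = line.count("<ref>")
        closes = line.count("</ref>")
        # a line that starts inside a ref, or opens one, belongs to the ref
        inside = depth > 0 or opens > 0
        depth += opens - closes
        result.append("ref" if inside else "top")
    return result

src = "Body text.<ref>\n* cite one\n* cite two\n</ref>\nNext paragraph."
assert owners(src) == ["ref", "ref", "ref", "ref", "top"]
```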

These sorts of things are why I think it'd be a win to use a common
wikitext->AST parser for both rendering and editing tasks: if they're
consistent we eliminate a lot of such odd edge cases. It could also make it
much easier to do fine-grained editing; instead of invoking the editor on an
entire section at a time, we could click straight into a paragraph, table,
reference, etc, knowing that the editor and the renderer both are dividing
the page up the same way.
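A toy analogue of that consistency argument (hypothetical Python, standing in for a real wikitext->AST parser): one parse produces the tree, and both the renderer and the serializer consume that same tree, so they cannot disagree about where a construct begins and ends.

```python
import re

# one token pass: bold spans, or runs of literal text (quotes handled singly)
TOKEN = re.compile(r"'''(.+?)'''|([^']+|')", re.S)

def to_tree(src):
    tree = []
    for bold, text in TOKEN.findall(src):
        if bold:
            tree.append({"type": "bold", "content": bold})
        else:
            tree.append({"type": "text", "content": text})
    return tree

def to_html(tree):
    return "".join(f"<b>{n['content']}</b>" if n["type"] == "bold"
                   else n["content"] for n in tree)

def to_source(tree):
    return "".join(f"'''{n['content']}'''" if n["type"] == "bold"
                   else n["content"] for n in tree)

src = "plain '''bold''' tail"
tree = to_tree(src)
assert to_source(tree) == src            # same tree feeds the serializer...
assert to_html(tree) == "plain <b>bold</b> tail"   # ...and the renderer
```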

-- brion

Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Platonides
In reply to this post by Magnus Manske-2
Magnus Manske wrote:
>
> So, why not use my WYSIFTW approach? It will only "parse" the parts of
> the wikitext that it can turn back, edited or unedited, into wikitext,
> unaltered (including whitespace) if not manually changed. Some parts
> may therefore stay as wikitext, but it's very rare (except lists,
> which I didn't implement yet, but they look intuitive enough).
>
> Magnus

Crazy idea: What if it was an /extensible/ editor? You could later add a
module to enable lists, or "enable graphic <ref>", but also instruct it on
how to present to the user some crazy template with a dozen parameters...




Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Freako F. Freakolowsky
best idea so far ...

On 03. 05. 2011 00:29, Platonides wrote:

> Magnus Manske wrote:
>> So, why not use my WYSIFTW approach? It will only "parse" the parts of
>> the wikitext that it can turn back, edited or unedited, into wikitext,
>> unaltered (including whitespace) if not manually changed. Some parts
>> may therefore stay as wikitext, but it's very rare (except lists,
>> which I didn't implement yet, but they look intuitive enough).
>>
>> Magnus
> Crazy idea: What if it was an /extensible/ editor? You could add later a
> module for enable lists, or "enable graphic <ref>", but also instruct it
> on how to present to the user some crazy template with a dozen parameters...


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

lee worden
In reply to this post by Brion Vibber
On 05/02/11 15:30, [hidden email] wrote:

> Magnus Manske wrote:
>>
>> So, why not use my WYSIFTW approach? It will only "parse" the parts of
>> the wikitext that it can turn back, edited or unedited, into wikitext,
>> unaltered (including whitespace) if not manually changed. Some parts
>> may therefore stay as wikitext, but it's very rare (except lists,
>> which I didn't implement yet, but they look intuitive enough).
>>
>> Magnus
> Crazy idea: What if it was an /extensible/ editor? You could add later a
> module for enable lists, or "enable graphic <ref>", but also instruct it
> on how to present to the user some crazy template with a dozen parameters...

Seems like it will need to be extensible, to allow authors of MW
extensions to add support for cases where they've changed the parser's
behavior?


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

George William Herbert
In reply to this post by Platonides
On Mon, May 2, 2011 at 3:29 PM, Platonides <[hidden email]> wrote:

> Magnus Manske wrote:
>>
>> So, why not use my WYSIFTW approach? It will only "parse" the parts of
>> the wikitext that it can turn back, edited or unedited, into wikitext,
>> unaltered (including whitespace) if not manually changed. Some parts
>> may therefore stay as wikitext, but it's very rare (except lists,
>> which I didn't implement yet, but they look intuitive enough).
>>
>> Magnus
>
> Crazy idea: What if it was an /extensible/ editor? You could add later a
> module for enable lists, or "enable graphic <ref>", but also instruct it
> on how to present to the user some crazy template with a dozen parameters...

Generically a nice idea.

Specific to Wikipedia / WMF projects - all the extensions you might
consider adding are pretty much required for our internal uptake of
the tool, as our pages are the biggest / oldest / crustyest ones
likely to have to be managed...


--
-george william herbert
[hidden email]


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Tim Starling-2
In reply to this post by Brion Vibber
On 03/05/11 04:25, Brion Vibber wrote:
> The most fundamental problem with Wikia's editor remains its fallback
> behavior when some structure is unsupported:
>
>   "Source mode required
>
>   Rich text editing has been disabled because the page contains complex
> code."

I don't think that's a fundamental problem, I think it's a quick hack
added to reduce the development time devoted to rare wikitext
constructs, while maintaining round-trip safety. Like you said further
down in your post, it can be handled more elegantly by replacing the
complex code with a placeholder. Why not just do that?

CKEditor makes adding such placeholders really easy. The RTE source
has a long list of such client-side modules, added to support various
Wikia extensions.

> Here's an example of unsupported code, the presence of which makes a page
> permanently uneditable by the rich editor until it's removed:
>
>   <table>
>   <tr><td>a</td></tr>
>   </table>
>
> You can try this out now at http://communitytest.wikia.com/

Works for me.

http://communitytest.wikia.com/wiki/Brion%27s_table

> Beyond that let's flip the question the other way -- what do we *want* out
> of WYSIWYG editing, and can that tool provide it or what else do we need?
> I've written up some notes a few weeks ago, which need some more collation &
> updating from the preliminary experiments I'm doing, and I would strongly
> appreciate more feedback from you Tim and from everyone else who's been
> poking about in parser & editing land:
>
>   http://www.mediawiki.org/wiki/Wikitext.next

Some people in this thread have expressed concerns about the tiny
breakages in wikitext backwards compatibility introduced by RTE,
despite the fact that RTE has aimed for, and largely achieved, precise
backwards compatibility with legacy wikitext.

I find it hard to believe that those people would be comfortable with
a project which has as its goal a broad reform of wikitext syntax.

Perhaps there are good arguments for wikitext syntax reform, but I
have trouble believing that WYSIWYG support is one of them, since the
problem appears to have been solved already by RTE, without any reform.

> Another goal beyond editing itself is normalizing the world of 'alternate
> parsers'. There've been several announced recently, and we've got such a
> large array now of them available, all a little different. We even use mwlib
> ourselves in the PDF/ODF export deployment, and while we don't maintain that
> engine we need to coordinate a little with the people who do so that new
> extensions and structures get handled.

I know that there is a camp of data reusers who like to write their
own parsers. I think there are more people who have written a wikitext
parser from scratch than have contributed even a small change to the
MediaWiki core parser. They have a lot of influence, because they go
to conferences and ask for things face-to-face.

Now that we have HipHop support, we have the ability to turn
MediaWiki's core parser into a fast, reusable library. The performance
reasons for limiting the amount of abstraction in the core parser will
disappear. How many wikitext parsers does the world really need?

-- Tim Starling



Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Chad
On Mon, May 2, 2011 at 8:28 PM, Tim Starling <[hidden email]> wrote:

> I know that there is a camp of data reusers who like to write their
> own parsers. I think there are more people who have written a wikitext
> parser from scratch than have contributed even a small change to the
> MediaWiki core parser. They have a lot of influence, because they go
> to conferences and ask for things face-to-face.
>
> Now that we have HipHop support, we have the ability to turn
> MediaWiki's core parser into a fast, reusable library. The performance
> reasons for limiting the amount of abstraction in the core parser will
> disappear. How many wikitext parsers does the world really need?
>

People want to write their own parsers because they don't want to use PHP.
They want to parse in C, Java, Ruby, Python, Perl, Assembly and every
other language other than the one that it wasn't written in. There's this, IMHO,
misplaced belief that "standardizing" the parser or markup would put us in a
world of unicorns and rainbows where people can write their own parsers on
a whim, just because they can. Other than "making it easier to integrate with
my project," I don't see a need for them either (and tbh, the endless
discussions grow tedious).

I don't see any problem with keeping the parser in PHP, and as you point out
with HipHop support on the not-too-distant horizon the complaints about
performance with Zend will largely evaporate.

-Chad


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Chad
On Mon, May 2, 2011 at 8:38 PM, Chad <[hidden email]> wrote:
> People want to write their own parsers because they don't want to use PHP.
> They want to parse in C, Java, Ruby, Python, Perl, Assembly and every
> other language other than the one that it wasn't written in.

s/wasn't/was/

-Chad


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Brion Vibber
In reply to this post by Tim Starling-2
On Mon, May 2, 2011 at 5:28 PM, Tim Starling <[hidden email]> wrote:

> On 03/05/11 04:25, Brion Vibber wrote:
> > The most fundamental problem with Wikia's editor remains its fallback
> > behavior when some structure is unsupported:
> >
> >   "Source mode required
> >
> >   Rich text editing has been disabled because the page contains complex
> > code."
>
> I don't think that's a fundamental problem, I think it's a quick hack
> added to reduce the development time devoted to rare wikitext
> constructs, while maintaining round-trip safety. Like you said further
> down in your post, it can be handled more elegantly by replacing the
> complex code with a placeholder. Why not just do that?
>

Excellent question -- how hard would it be to change that?

I'm fairly sure that's easier to do with an abstract parse tree generated
from source (don't recognize it? stash it in a dedicated blob); I worry it
may be harder trying to stash that into the middle of a multi-level HTML
translation engine that wasn't meant to be reversible in the first place (do
we even know if there's an opportunity to recognize the problem component
within the annotated HTML or not? Is it seeing things it doesn't recognize
in the HTML, or is it seeing certain structures in the source and aborting
before it even gets there?).

Like many such things, this might be better resolved by trying it and seeing
what happens -- I don't want us to lock into a strategy too early when a lot
of ideas are still unresolved.


I'm very interested in making experimentation easy; for my pre-exploratory
work I'm stashing things into a gadget which adds render/parse
tree/inspector modes to the editing page:

http://www.mediawiki.org/wiki/File:Parser_Playground_demo.png (screenshot &
links)

I've got this set up as a gadget on mediawiki.org now and as a user script
on en.wikipedia.org (loaded on User:Brion_VIBBER/vector.js) just for tossing
random pages in and getting a better sense of how things break down.
Currently parser variant choices are:

* the actual MediaWiki parser via API (parse tree shows the preprocessor
XML; side-by-side mode doesn't have a working inspector mode though)
* a really crappy FakeParser class I threw together, able to handle only a
few constructs. Generates a JSON parse tree, and the inspector mode can
match up nodes in side-by-side view of the tree & HTML.
* PegParser using the peg.js parser generator to build the source->tree
parser, and the same tree->html and tree->source round-trip functions as
FakeParser. The peg source can be edited and rerun to regen the new parse
tree. It's fun!

These are a long way off from the level of experimental support we're going
to want, but I think people are going to benefit from trying a few different
things and getting a better feel for how source, parse trees, and resulting
HTML really will look.

(Template expansion isn't yet presented in this system, and that's going to
be where the real fun is. ;)


> Some people in this thread have expressed concerns about the tiny
> breakages in wikitext backwards compatibility introduced by RTE,
> despite the fact that RTE has aimed for, and largely achieved, precise
> backwards compatibility with legacy wikitext.
>
> I find it hard to believe that those people would be comfortable with
> a project which has as its goal a broad reform of wikitext syntax.
>
> Perhaps there are good arguments for wikitext syntax reform, but I
> have trouble believing that WYSIWYG support is one of them, since the
> problem appears to have been solved already by RTE, without any reform.
>

Well, Wikia's RTE still doesn't work on high-profile Wikipedia article
pages, so that remains unproven...

That said, an RTE that doesn't require changing core parser behavior *WILL
BE A HUGE BENEFIT* for getting it into use sooner, and it still leaves
future reform efforts open.

I'm *VERY OPEN* to the notion of doing the RTE using either a supplementary
source-level parser (which doesn't have to render all structures 100% the
same as the core parser, but *needs* to always create sensible structures
that are useful for editors and can round-trip cleanly) or an alternate
version of the core parser with annotations and limited transformations (eg
like how we don't strip comments out when producing editable source, so we
need to keep them in the output in some way if it's going to be fed into an
HTML-ish editing view).

A supplementary parser that deals with all your editing fun, but doesn't
play super nice with open...close templates is probably just fine for a huge
number of purposes.

> Now that we have HipHop support, we have the ability to turn
> MediaWiki's core parser into a fast, reusable library. The performance
> reasons for limiting the amount of abstraction in the core parser will
> disappear. How many wikitext parsers does the world really need?
>

I'm not convinced that a giant blob of MediaWiki is suitable as a reusable
library, but would love to see it tried.

-- brion

Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Brion Vibber
On Mon, May 2, 2011 at 5:55 PM, Brion Vibber <[hidden email]> wrote:

> On Mon, May 2, 2011 at 5:28 PM, Tim Starling <[hidden email]> wrote:
>
>> I don't think that's a fundamental problem, I think it's a quick hack
>> added to reduce the development time devoted to rare wikitext
>> constructs, while maintaining round-trip safety. Like you said further
>> down in your post, it can be handled more elegantly by replacing the
>> complex code with a placeholder. Why not just do that?
>>
>
> Excellent question -- how hard would it be to change that?
>
> I'm fairly sure that's easier to do with an abstract parse tree generated
> from source (don't recognize it? stash it in a dedicated blob); I worry it
> may be harder trying to stash that into the middle of a multi-level HTML
> translation engine that wasn't meant to be reversible in the first place (do
> we even know if there's an opportunity to recognize the problem component
> within the annotated HTML or not? Is it seeing things it doesn't recognize
> in the HTML, or is it seeing certain structures in the source and aborting
> before it even gets there?).
>
> Like many such things, this might be better resolved by trying it and
> seeing what happens -- I don't want us to lock into a strategy too early
> when a lot of ideas are still unresolved.
>

Had a quick chat with Tim in IRC -- we're definitely going to try poking at
the current state of the Wikia RTE a bit more.

I'll start merging it to our extensions SVN so we've got a stable clone of
it that can be run on stock trunk. Small changes should be mergeable back to
Wikia's SVN, and we'll have something available for stock distributions
that's more stable than the old FCK extension, and that we can start
experimenting with along with other stuff.

Another good thing in this code is the client-side editor plugins; once one
gets past the raw "shove stuff in/out of the markup format" stage, most of
the hard work and value of an editor actually comes from the helpers for
working with links, images, tables, galleries, etc -- dialogs, wizards, and
helpers for dragging things around. That's all stuff we can examine and
improve on or build from.

-- brion

Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Andreas Jonsson-5
In reply to this post by Chad
2011-05-03 02:38, Chad wrote:

> On Mon, May 2, 2011 at 8:28 PM, Tim Starling <[hidden email]> wrote:
>> I know that there is a camp of data reusers who like to write their
>> own parsers. I think there are more people who have written a wikitext
>> parser from scratch than have contributed even a small change to the
>> MediaWiki core parser. They have a lot of influence, because they go
>> to conferences and ask for things face-to-face.
>>
>> Now that we have HipHop support, we have the ability to turn
>> MediaWiki's core parser into a fast, reusable library. The performance
>> reasons for limiting the amount of abstraction in the core parser will
>> disappear. How many wikitext parsers does the world really need?
>>
>
> People want to write their own parsers because they don't want to use PHP.
> They want to parse in C, Java, Ruby, Python, Perl, Assembly and every
> other language other than the one that it wasn't written in. There's this, IMHO,
> misplaced belief that "standardizing" the parser or markup would put us in a
> world of unicorns and rainbows where people can write their own parsers on
> a whim, just because they can. Other than "making it easier to integrate with
> my project," I don't see a need for them either (and tbh, the endless
> discussions grow tedious).

My motivation for attacking the task of creating a wikitext parser is,
aside from it being an interesting problem, a genuine concern about the
fact that such a large body of data is encoded in such a vaguely
specified format.

> I don't see any problem with keeping the parser in PHP, and as you point out
> with HipHop support on the not-too-distant horizon the complaints about
> performance with Zend will largely evaporate.

But most of the parser's work consists of running regexp pattern
matching over the article text, doesn't it?  Regexp pattern matching is
implemented by native functions.  Does the Zend engine have a slow
regexp implementation?  I would have guessed that the main reason that
the parser is slow is the algorithm, not its implementation.

Best Regards,

Andreas Jonsson


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Daniel Friesen-4
On 11-05-03 03:40 AM, Andreas Jonsson wrote:

> 2011-05-03 02:38, Chad skrev:
> [...]
>> I don't see any problem with keeping the parser in PHP, and as you point out
>> with HipHop support on the not-too-distant horizon the complaints about
>> performance with Zend will largely evaporate.
> But most of the parser's work consists of running regexp pattern
> matching over the article text, doesn't it?  Regexp pattern matching are
> implemented by native functions.  Does the Zend engine have a slow
> regexp implementation?  I would have guessed that the main reason that
> the parser is slow is the algorithm, not its implementation.
>
> Best Regards,
>
> Andreas Jonsson
regexps might be fast, but when you have to run hundreds of them all
over the place and do stuff in-language then the language becomes the
bottleneck.
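That shape is easy to see in any scripting language (a hypothetical Python sketch, not PHP, so only indicative): the same number of matches, but one version stays inside the native regex engine while the other does its per-match work in the host language.

```python
import re
import timeit

text = "word " * 20_000
pat = re.compile(r"\bword\b")

def native_sub():
    # replacement handled entirely in C by the regex engine
    return pat.sub("token", text)

def callback_sub():
    # one interpreted function call per match
    return pat.sub(lambda m: m.group(0).upper(), text)

assert native_sub() == "token " * 20_000
assert callback_sub() == "WORD " * 20_000

# on CPython the callback version is typically several times slower
print(timeit.timeit(native_sub, number=3))
print(timeit.timeit(callback_sub, number=3))
```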

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]



Zend performance (Was: WYSIWYG and parser plans)

Domas Mituzas
>
> regexps might be fast, but when you have to run hundreds of them all
> over the place and do stuff in-language then the language becomes the
> bottleneck.

some oprofile data shows that pcre is only a few percent of execution time - and there's really a lot of Zend internals standing in the way - memory management (HPHP implements it as C++ object allocations via jemalloc), symbol resolution (native calls in C++), etc.

Domas

samples  %        image name               app name                 symbol name
492400    9.6648  libphp5.so               libphp5.so               _zend_mm_alloc_int
451573    8.8634  libc-2.7.so              libc-2.7.so              (no symbols)
347812    6.8268  libphp5.so               libphp5.so               zend_hash_find
345665    6.7847  no-vmlinux               no-vmlinux               (no symbols)
330513    6.4873  libphp5.so               libphp5.so               _zend_mm_free_int
225755    4.4311  libpcre.so.3.12.1        libpcre.so.3.12.1        (no symbols)
159925    3.1390  libphp5.so               libphp5.so               zend_do_fcall_common_helper_SPEC
137709    2.7029  libphp5.so               libphp5.so               _zval_ptr_dtor
127233    2.4973  libxml2.so.2.6.31        libxml2.so.2.6.31        (no symbols)
111249    2.1836  libphp5.so               libphp5.so               zend_hash_quick_find
93994     1.8449  libphp5.so               libphp5.so               _zend_hash_quick_add_or_update
84693     1.6623  libphp5.so               libphp5.so               zend_assign_to_variable
84256     1.6538  fss.so                   fss.so                   (no symbols)
56474     1.1085  libphp5.so               libphp5.so               execute
49959     0.9806  libphp5.so               libphp5.so               zend_hash_destroy
48450     0.9510  libz.so.1.2.3.3          libz.so.1.2.3.3          (no symbols)
46967     0.9219  libphp5.so               libphp5.so               ZEND_JMPZ_SPEC_TMP_HANDLER
46523     0.9131  libphp5.so               libphp5.so               _zend_hash_add_or_update
45747     0.8979  libphp5.so               libphp5.so               zend_str_tolower_copy
39154     0.7685  libphp5.so               libphp5.so               zend_fetch_dimension_address
35356     0.6940  libphp5.so               libphp5.so               ZEND_RECV_SPEC_HANDLER
33381     0.6552  libphp5.so               libphp5.so               compare_function
32660     0.6410  libphp5.so               libphp5.so               _zend_hash_index_update_or_next_insert
31815     0.6245  libphp5.so               libphp5.so               zend_parse_va_args
31689     0.6220  libphp5.so               libphp5.so               ZEND_SEND_VAR_SPEC_CV_HANDLER
31554     0.6193  libphp5.so               libphp5.so               _emalloc
30404     0.5968  libphp5.so               libphp5.so               _get_zval_ptr_var
29812     0.5851  libphp5.so               libphp5.so               ZEND_ASSIGN_REF_SPEC_CV_VAR_HANDLER
28092     0.5514  libphp5.so               libphp5.so               ZEND_DO_FCALL_SPEC_CONST_HANDLER
27760     0.5449  libphp5.so               libphp5.so               zend_hash_clean
27589     0.5415  libphp5.so               libphp5.so               zend_fetch_var_address_helper_SPEC_CONST
26731     0.5247  libphp5.so               libphp5.so               _zval_dtor_func
24732     0.4854  libphp5.so               libphp5.so               ZEND_ASSIGN_SPEC_CV_VAR_HANDLER
24732     0.4854  libphp5.so               libphp5.so               ZEND_RECV_INIT_SPEC_CONST_HANDLER
22587     0.4433  libphp5.so               libphp5.so               zend_send_by_var_helper_SPEC_CV
22176     0.4353  libphp5.so               libphp5.so               _efree
21911     0.4301  libphp5.so               libphp5.so               .plt
21102     0.4142  libphp5.so               libphp5.so               ZEND_SEND_VAL_SPEC_CONST_HANDLER
19556     0.3838  libphp5.so               libphp5.so               zend_fetch_property_address_read_helper_SPEC_UNUSED_CONST
18568     0.3645  libphp5.so               libphp5.so               zend_get_property_info
18348     0.3601  libphp5.so               libphp5.so               zend_std_get_method
18279     0.3588  libphp5.so               libphp5.so               zend_get_hash_value
17944     0.3522  libphp5.so               libphp5.so               php_var_unserialize
17461     0.3427  libphp5.so               libphp5.so               _zval_copy_ctor_func
17187     0.3373  libtidy-0.99.so.0.0.0    libtidy-0.99.so.0.0.0    (no symbols)
16341     0.3207  libphp5.so               libphp5.so               zend_get_parameters_ex
16103     0.3161  libphp5.so               libphp5.so               zend_std_read_property
15662     0.3074  libphp5.so               libphp5.so               zend_hash_copy
14678     0.2881  libphp5.so               libphp5.so               zend_binary_strcmp
14556     0.2857  apc.so                   apc.so                   my_copy_hashtable_ex
14279     0.2803  libphp5.so               libphp5.so               _zend_mm_realloc_int
13993     0.2747  oprofiled                oprofiled                (no symbols)
13680     0.2685  libphp5.so               libphp5.so               dom_nodelist_length_read
13265     0.2604  libphp5.so               libphp5.so               zval_add_ref
13166     0.2584  libphp5.so               libphp5.so               zend_objects_store_del_ref_by_handle
13084     0.2568  libphp5.so               libphp5.so               ZEND_INIT_METHOD_CALL_SPEC_CV_CONST_HANDLER
13030     0.2558  libphp5.so               libphp5.so               zend_assign_to_object
11822     0.2320  libphp5.so               libphp5.so               ZEND_INSTANCEOF_SPEC_CV_HANDLER
11511     0.2259  libphp5.so               libphp5.so               zend_fetch_property_address_read_helper_SPEC_CV_CONST
11425     0.2242  libphp5.so               libphp5.so               _estrndup
11340     0.2226  libphp5.so               libphp5.so               zendi_smart_strcmp
11227     0.2204  libphp5.so               libphp5.so               ZEND_JMPZ_SPEC_VAR_HANDLER
11174     0.2193  libphp5.so               libphp5.so               ZEND_FETCH_CLASS_SPEC_CONST_HANDLER
11080     0.2175  libphp5.so               libphp5.so               _zend_hash_init
10908     0.2141  libphp5.so               libphp5.so               zend_object_store_get_object
10623     0.2085  libphp5.so               libphp5.so               zend_assign_to_variable_reference
10577     0.2076  libphp5.so               libphp5.so               zend_hash_index_find
10231     0.2008  libphp5.so               libphp5.so               ZEND_JMP_SPEC_HANDLER
10227     0.2007  libphp5.so               libphp5.so               ZEND_RETURN_SPEC_CONST_HANDLER
9400      0.1845  libphp5.so               libphp5.so               _safe_emalloc
8973      0.1761  libphp5.so               libphp5.so               ZEND_BOOL_SPEC_TMP_HANDLER
8652      0.1698  libphp5.so               libphp5.so               zend_lookup_class_ex
8504      0.1669  libphp5.so               libphp5.so               ZEND_JMPZ_EX_SPEC_TMP_HANDLER
8489      0.1666  libphp5.so               libphp5.so               zend_call_function
8448      0.1658  libphp5.so               libphp5.so               convert_to_boolean
8307      0.1630  libphp5.so               libphp5.so               ZEND_JMPZ_SPEC_CV_HANDLER
8297      0.1629  libphp5.so               libphp5.so               zend_hash_rehash
8092      0.1588  libphp5.so               libphp5.so               ZEND_INIT_METHOD_CALL_SPEC_UNUSED_CONST_HANDLER
7855      0.1542  libphp5.so               libphp5.so               ZEND_RETURN_SPEC_VAR_HANDLER
7659      0.1503  libphp5.so               libphp5.so               instanceof_function_ex
7552      0.1482  libphp5.so               libphp5.so               ZEND_FE_FETCH_SPEC_VAR_HANDLER
7383      0.1449  libphp5.so               libphp5.so               ZEND_FETCH_OBJ_R_SPEC_UNUSED_CONST_HANDLER
7036      0.1381  libphp5.so               libphp5.so               is_identical_function
7012      0.1376  libphp5.so               libphp5.so               php_is_type
6907      0.1356  libphp5.so               libphp5.so               zend_hash_get_current_data_ex
6901      0.1355  libphp5.so               libphp5.so               ZEND_SEND_REF_SPEC_CV_HANDLER
6881      0.1351  libphp5.so               libphp5.so               concat_function
6860      0.1346  libphp5.so               libphp5.so               zend_hash_del_key_or_index
6843      0.1343  libphp5.so               libphp5.so               php_pcre_match_impl
6648      0.1305  libphp5.so               libphp5.so               zend_isset_isempty_dim_prop_obj_handler_SPEC_VAR_CV
6600      0.1295  libphp5.so               libphp5.so               ZEND_ASSIGN_DIM_SPEC_CV_UNUSED_HANDLER
6538      0.1283  libphp5.so               libphp5.so               _phpi_pop
6306      0.1238  libphp5.so               libphp5.so               zend_get_constant_ex
6254      0.1228  libphp5.so               libphp5.so               zif_strtr
5901      0.1158  libphp5.so               libphp5.so               zend_fetch_class
5829      0.1144  libphp5.so               libphp5.so               zif_dom_nodelist_item
5809      0.1140  libphp5.so               libphp5.so               sub_function
5805      0.1139  libphp5.so               libphp5.so               zend_std_write_property
5789      0.1136  libphp5.so               libphp5.so               ZEND_RETURN_SPEC_CV_HANDLER
5753      0.1129  libphp5.so               libphp5.so               _ecalloc
5678      0.1114  libmysqlclient.so.15.0.0 libmysqlclient.so.15.0.0 (no symbols)
5650      0.1109  libphp5.so               libphp5.so               ZEND_ADD_ARRAY_ELEMENT_SPEC_CONST_UNUSED_HANDLER
5470      0.1074  libphp5.so               libphp5.so               ZEND_FETCH_W_SPEC_CONST_HANDLER
5262      0.1033  libphp5.so               libphp5.so               ZEND_SEND_VAL_SPEC_TMP_HANDLER
5259      0.1032  libphp5.so               libphp5.so               ZEND_ASSIGN_SPEC_CV_TMP_HANDLER
5128      0.1007  libphp5.so               libphp5.so               ZEND_FETCH_DIM_W_SPEC_CV_CV_HANDLER




Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Jay Ashworth-2
In reply to this post by Andreas Jonsson-5
----- Original Message -----
> From: "Andreas Jonsson" <[hidden email]>

> Subject: Re: [Wikitech-l] WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

> My motivation for attacking the task of creating a wikitext parser is,
> aside from it being an interesting problem, a genuine concern for the
> fact that such a large body of data is encoded in such a vaguely
> specified format.

Correct: Until you have (at least) two independently written parsers, both
of which pass a test suite 100%, you don't have a *spec*.

Or more to the point, it's unclear whether the spec or the code rules, which
can get nasty.
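
[A minimal sketch of what Jay is describing, with hypothetical toy parsers
standing in for real wikitext implementations: a shared test suite acts as
the spec, and two independently written parsers must both pass it 100%.]

```python
import re

# Two independent "parsers" for a toy subset of wikitext: ''italic''.

def parser_a(src: str) -> str:
    # First implementation: regex-based.
    return re.sub(r"''(.+?)''", r"<i>\1</i>", src)

def parser_b(src: str) -> str:
    # Second implementation: hand-written scanner, no regexes shared with A.
    out, i = [], 0
    while i < len(src):
        if src.startswith("''", i):
            end = src.find("''", i + 2)
            if end != -1:
                out.append("<i>" + src[i + 2:end] + "</i>")
                i = end + 2
                continue
        out.append(src[i])
        i += 1
    return "".join(out)

# The shared suite *is* the spec: both implementations must pass every case.
SPEC_CASES = [
    ("plain text", "plain text"),
    ("''hi''", "<i>hi</i>"),
    ("a ''b'' c", "a <i>b</i> c"),
]

for parser in (parser_a, parser_b):
    for source, expected in SPEC_CASES:
        assert parser(source) == expected, (parser.__name__, source)
print("both parsers agree with the spec suite")
```

Where the two implementations disagree on an input the suite doesn't cover,
the behaviour is unspecified - which is exactly the point: the suite, not
either codebase, is what rules.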

Cheers,
-- jra


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

MZMcBride-2
In reply to this post by Tim Starling-2
Tim Starling wrote:

>> Another goal beyond editing itself is normalizing the world of 'alternate
>> parsers'. There've been several announced recently, and we've got such a
>> large array now of them available, all a little different. We even use mwlib
>> ourselves in the PDF/ODF export deployment, and while we don't maintain that
>> engine we need to coordinate a little with the people who do so that new
>> extensions and structures get handled.
>
> I know that there is a camp of data reusers who like to write their
> own parsers. I think there are more people who have written a wikitext
> parser from scratch than have contributed even a small change to the
> MediaWiki core parser. They have a lot of influence, because they go
> to conferences and ask for things face-to-face.
>
> Now that we have HipHop support, we have the ability to turn
> MediaWiki's core parser into a fast, reusable library. The performance
> reasons for limiting the amount of abstraction in the core parser will
> disappear. How many wikitext parsers does the world really need?

I realize you have a dry wit, but I imagine this joke was lost on nearly
everyone. You're not really suggesting that everyone who wants to parse
MediaWiki wikitext compile and run HipHop PHP in order to do so.

It's unambiguously a fundamental goal that content on Wikimedia wikis be
able to be easily redistributed, shared, and spread. A wikisyntax that's
impossible to adequately parse in other environments (or in Wikimedia's
environment, for that matter) is a critical and serious inhibitor to this
goal.

MZMcBride




Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

Chad
On Tue, May 3, 2011 at 2:15 PM, MZMcBride <[hidden email]> wrote:
> I realize you have a dry wit, but I imagine this joke was lost on nearly
> everyone. You're not really suggesting that everyone who wants to parse
> MediaWiki wikitext compile and run HipHop PHP in order to do so.
>

And how is using the parser with HipHop going to be any more
difficult than using it with Zend?

-Chad


Re: WYSIWYG and parser plans (was What is wrong with Wikia's WYSIWYG?)

MZMcBride-2
Chad wrote:
> On Tue, May 3, 2011 at 2:15 PM, MZMcBride <[hidden email]> wrote:
>> I realize you have a dry wit, but I imagine this joke was lost on nearly
>> everyone. You're not really suggesting that everyone who wants to parse
>> MediaWiki wikitext compile and run HipHop PHP in order to do so.
>
> And how is using the parser with HipHop going to be any more
> difficult than using it with Zend?

The point is that the wikitext and its parsing should be completely separate
from MediaWiki/PHP/HipHop/Zend.

I think some of the bigger picture is getting lost here. Wikimedia produces
XML dumps that contain wikitext. For most people, this is the only way to
obtain and reuse large amounts of content from Wikimedia wikis (especially
as the HTML dumps haven't been re-created since 2008). There needs to be a
way for others to be able to very easily deal with this content.
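
[To make the reuser's position concrete - a sketch, with the XML below
hand-written to mimic the dump format's page/revision/text nesting
(namespaces and most attributes omitted for illustration): extracting the
raw wikitext is the easy part; parsing it is where reusers get stuck.]

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a dump fragment (real dumps carry an XML
# namespace and many more elements).
DUMP_SNIPPET = """<mediawiki>
  <page>
    <title>Example</title>
    <revision>
      <text>'''Example''' is a [[test page]].</text>
    </revision>
  </page>
</mediawiki>"""

root = ET.fromstring(DUMP_SNIPPET)
for page in root.iter("page"):
    title = page.findtext("title")
    wikitext = page.findtext("./revision/text")
    print(title, "->", repr(wikitext))

# At this point a reuser holds raw wikitext in their own language of
# choice - and still needs a wikitext parser, which is the gap at issue.
```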

Many people have suggested (with good reason) that this means that wikitext
parsing needs to be reproducible in other programming languages. While
HipHop may be the best thing since sliced bread, I've yet to see anyone put
forward a compelling reason that the current state of affairs is acceptable.
Saying "well, it'll soon be much faster for MediaWiki to parse" doesn't
overcome the legitimate issues that re-users have (such as programming in a
language other than PHP, perish the thought).

For me, the idea that all that's needed is a faster parser in PHP is a
complete non-starter.

MZMcBride


