Cutting MediaWiki loose from wikitext

Cutting MediaWiki loose from wikitext

Daniel Kinzler
Hi all. I have a bold proposal (read: evil plan).

To put it briefly: I want to remove the assumption that MediaWiki pages always
contain wikitext. Instead, I propose a pluggable handler system for different
types of content, similar to what we have for file uploads. So, I propose to
associate a "content model" identifier with each page, and have handlers for
each model that provide serialization, rendering, an editor, etc.
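
To make this concrete, a handler could look roughly like this (a minimal
sketch; the class and method names are illustrative, not an actual interface):

  // Illustrative sketch only. Each content model gets a handler that
  // knows how to serialize, render and diff that kind of content.
  abstract class ContentHandler {
      /** Identifier of the content model, e.g. "wikitext" or "json". */
      abstract public function getModelName();

      /** Serialize a content object into a blob for text storage. */
      abstract public function serializeContent( $content, $format );

      /** Reconstruct a content object from a stored blob. */
      abstract public function unserializeContent( $blob, $format );

      /** Render the content as HTML for page views. */
      abstract public function getHtml( $content );

      /** Provide a diff between two revisions of this content type. */
      abstract public function createDifferenceEngine( $old, $new );
  }

  // Handlers would then be registered per model identifier, e.g.:
  // $wgContentHandlers['json'] = 'JsonContentHandler';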

The background is that the Wikidata project needs a way to store structured data
(JSON) on wiki pages instead of wikitext. Having a pluggable system would solve
that problem along with several others, like doing away with the special cases
for JS/CSS, maintaining categories etc. separately from the body text, managing
Gadgets sanely on wiki pages, and several other things (see the link below).

I have described my plans in more detail on meta:

  http://meta.wikimedia.org/wiki/Wikidata/Notes/ContentHandler

A very rough prototype is in a dev branch here:

  http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

Please let me know what you think (here on the list, preferably, not on the talk
page there, at least for now).

Note that we *definitely* need this ability for Wikidata. We could do it
differently, but I think this would be the cleanest solution, and would have a
lot of mid- and long-term benefits, even if it's a short-term pain. I'm
presenting my plan here to find out if I'm on the right track, and whether it is
feasible to put this on the road map for 1.20. It would be my (and the Wikidata
team's) priority to implement this and see it through before Wikimania. I'm
convinced we have the manpower to get it done.

Cheers,
Daniel


Re: Cutting MediaWiki loose from wikitext

Alex Brollo
I agree that it's ironic to play with a powerful database-built project
and to have no access to, nor encouragement for, organizing our data as it
should be organized. But we do use normal pages as a data repository too,
simply by marking some specific areas of pages as "data areas". Moreover, we
use the same page both as a normal wikitext container and as a "data
container". Why not?

Alex brollo (it.source)

Re: Cutting MediaWiki loose from wikitext

John Erling Blad
In reply to this post by Daniel Kinzler
I like this idea, it solves a lot of problems.
John

On Mon, Mar 26, 2012 at 4:45 PM, Daniel Kinzler <[hidden email]> wrote:

> [...]


Re: [Wikitext-l] Fwd: Cutting MediaWiki loose from wikitext

Jeremy Baron
In reply to this post by Daniel Kinzler
On Mon, Mar 26, 2012 at 12:50, Maximilian Doerr <[hidden email]> wrote:
> I strongly disagree with removing wikitext.  Several thousand bots depend
> on it, especially ClueBot NG, which is a highly sophisticated anti-vandalism
> bot.  I recommend going with our current route, where the visual editor
> writes the wikimarkup in real time and vice versa.

I suspect you didn't understand Daniel's message (or didn't read it?). I
imagine most pages will still use the same markup we have today after his
idea is implemented.

For the pages that are different, the bots can and should adapt.

-Jeremy


Re: Cutting MediaWiki loose from wikitext

Brion Vibber
In reply to this post by Daniel Kinzler
I'm generally in favor of this plan. I haven't looked over the specific
code experiments yet but the plan sounds solid. A few notes:

* over time we'll want to do things like migrate File: pages from 'plain
wikitext that happens to have an associated file' to 'structured data about
a file'. This will be magnificent.

* I wouldn't overmuch emphasize things like "oh you could have pages in
markdown or tex!", though it does sound neat and all. :)

* we need to make sure that import/export round-trips things consistently,
including for "non-wikitext" stuff. Either that means making import/export
content-aware, or shipping the serialized form through the export XML?


As for timing: Daniel's hoping for something in the neighborhood of an
August deployment. I think if we keep things minimal that should be
feasible; it's somewhat similar to the migration of Image stuff with
MediaHandler classes.

I'm a bit uncertain about the idea of 'multipart' pages, though attached
data YES YES in some clean way is needed.

-- brion


On Mon, Mar 26, 2012 at 7:45 AM, Daniel Kinzler <[hidden email]> wrote:

> [...]

Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
On 26.03.2012 22:02, Brion Vibber wrote:
> I'm generally in favor of this plan. I haven't looked over the specific
> code experiments yet but the plan sounds solid.

YAY!

> * over time we'll want to do things like migrate File: pages from 'plain
> wikitext that happens to have an associated file' to 'structured data about
> a file'. This will be magnificent.

I hope to get the WMNL guys excited about this idea; it would really rock for
GLAM applications.

> * I wouldn't overmuch emphasize things like "oh you could have pages in
> markdown or tex!", though it does sound neat and all. :)

Yes. For the record, I do *not* want to move Wikipedia to another syntax.
(Well, I wish it *used* another syntax, but that's a completely separate
discussion.)

> * we need to make sure that import/export round-trips things consistently,
> including for "non-wikitext" stuff. Either that means making import/export
> content-aware, or shipping the serialized form through the export XML?

I intend the importer/exporter to use the serialized form, and to be aware only
of the additional revision attributes specifying the content model and
serialization format.
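
For illustration, a revision in the dump might then look something like this
(the element names are made up here; the actual schema would be defined as
part of the implementation):

  <revision>
    <id>12345</id>
    <model>wikidata-item</model>
    <format>application/json</format>
    <text xml:space="preserve">{"label":{"en":"Berlin"}}</text>
  </revision>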

How a wiki should react when importing content for an unknown handler is an open
issue, though. Fail? Import a blank page? Import as wikitext?...

But we don't need to solve that here and now.

> As for timing; Daniel's hoping for something in the neighborhood of an
> August deployment. I think if we keep things minimal that should be
> feasible; it's somewhat similar to the migration of Image stuff with
> MediaHandler classes.

This is because of Wikidata's tight timeline. We'll be working hard on getting
this ready soon.

> I'm a bit uncertain about the idea of 'multipart' pages, though attached
> data YES YES in some clean way is needed.

That bit is mostly idle musing - "multipart" and "attachments" are *not* needed
for Wikidata, though they open up several neat use cases.

Thanks for the feedback Brion!

-- daniel


Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Alex Brollo
On 26.03.2012 18:18, Alex Brollo wrote:
> I agree that it's ironic to play with a powerful database-built project
> and to have no access to, nor encouragement for, organizing our data as it
> should be organized. But we do use normal pages as a data repository too,
> simply by marking some specific areas of pages as "data areas". Moreover, we
> use the same page both as a normal wikitext container and as a "data
> container". Why not?

Because it is not sufficient. There is no way to query such data efficiently,
there is no standard web API to access it, and there are no URLs to reference
it (without the text around it).

The proposal allows for structured data as page content, as well as any other
type of page content, and it also potentially allows multiple types of data to
exist as part of the same page (using some mechanism of "attachment" or
"multipart").

-- daniel



Re: Cutting MediaWiki loose from wikitext

MZMcBride-2
In reply to this post by Daniel Kinzler
Daniel Kinzler wrote:
> To put it briefly: I want to remove the assumption that MediaWiki pages
> always contain wikitext. Instead, I propose a pluggable handler system for
> different types of content, similar to what we have for file uploads. So, I
> propose to associate a "content model" identifier with each page, and have
> handlers for each model that provide serialization, rendering, an editor, etc.

It's an ancient assumption that's built into many parts of MediaWiki (and
many outside tools and scripts). Is there any kind of assessment of the
level of impact this would have?

For example, would the diff engine need to be rewritten so that people can
monitor these pages for vandalism? Will these pages be editable in the same
way as current wikitext pages? If not, will there be special editors for the
various data types? What other parts of the MediaWiki codebase will be
affected and to what extent? Will text still go in the text table or will
separate tables and infrastructure be used?

I'm reminded a little of LiquidThreads for some reason. This idea sounds
good, but I'm worried about the implementation details, particularly as the
assumption you seek to upend is so old and ingrained.

> The background is that the Wikidata project needs a way to store structured
> data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
> solve that problem along with several others, like doing away with the special
> cases for JS/CSS, maintaining categories etc. separately from the body text,
> managing Gadgets sanely on wiki pages, and several other things (see the
> link below).

How would this affect categories being stored in wikitext (alongside the
rest of the page content text)? That part doesn't make any sense to me.

MZMcBride




Re: Cutting MediaWiki loose from wikitext

Platonides
In reply to this post by Daniel Kinzler
I like the general idea (haven't gone through the detailed pages).


> On 26.03.2012 22:02, Brion Vibber wrote:
>> * over time we'll want to do things like migrate File: pages from 'plain
>> wikitext that happens to have an associated file' to 'structured data about
>> a file'. This will be magnificent.
I think that File: pages that happen to be SVGs are a much easier approach.


>> I'm a bit uncertain about the idea of 'multipart' pages, though attached
>> data YES YES in some clean way is needed.
>
> That bit is mostly idle musing - "multipart" and "attachments" are *not* needed
> for Wikidata, though they open up several neat use cases.

It's just something to take into account when designing the extensibility.


> A very rough prototype is in a dev branch here:
>
>   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

It looks really evil to publish that svn branch just days after the git
migration :)
I think that branch - created months ago - should be migrated to git, so
we could all despair..^W benefit from git's wonderful branching abilities.

Best regards



Re: Cutting MediaWiki loose from wikitext

Tim Starling-2
In reply to this post by Daniel Kinzler
On 27/03/12 01:45, Daniel Kinzler wrote:
> [...]

For the record: we've discussed this previously and I'm fine with it.
It's a well thought-out proposal, and the only request I had was to
ensure that the DB schema supports some similar projects that we have
in the idea pile, like multiple parser versions.

On 27/03/12 09:37, MZMcBride wrote:
> For example, would the diff engine need to be rewritten so that people can
> monitor these pages for vandalism? Will these pages be editable in the same
> way as current wikitext pages? If not, will there be special editors for the
> various data types?

These questions are all answered on the notes page that Daniel linked
to. The answers are yes, no and yes.

-- Tim Starling



Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Platonides
On 27.03.2012 00:09, Platonides wrote:
> It looks really evil to publish that svn branch just days after the git
> migration :)
> I think that branch - created months ago - should be migrated to git, so
> we could all despair..^W benefit from git's wonderful branching abilities.

Indeed - when I asked Chad about that, he said "ask me again once the dust has
settled". I'd be happy to have this in git.

Or... well, maybe I'll just make a patch from that branch, make a fresh branch
in git, and cherry-pick the changes, trying to keep things minimal. Yeah, that's
probably the best thing to do.

-- daniel


Re: [Wikitext-l] Fwd: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Daniel Kinzler
On 27.03.2012 02:19, Daniel Friesen wrote:
> Non-wikitext data is supposed to give extensions the ability to do things
> beyond WikiText. The data is always going to be in an opaque form controlled
> by the extension.
> I don't think that low-level serialized data should be visible at all to
> clients, even if they know it's there.

The serialized form of the data needs to be visible at least in the XML dump
format. How else could we transfer non-wikitext content between wikis?

Using the serialized form may also make sense for editing via the web API,
though I'm not sure yet what the best way is here:

a) keep using the current general, text-based interface with the serialized
form of the content

or b) require a specialized editing API for each content type.

Going with a) has the advantage that it will simply work with current API
client code. However, if the client modifies the content and writes it back
without being aware of the format, it may corrupt the data. So perhaps we should
return an error when a client tries to edit a non-wikitext page "the old way".
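
For example, under a), a client could write a JSON page through the existing
edit module, with the serialized form in the text parameter (the title and
payload here are invented for illustration):

  api.php?action=edit&title=Data:Berlin&token=...
          &text={"label":{"en":"Berlin"}}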

The b) option is a bit annoying because it means that we have to define a
potentially quite complex mapping between the content model and the API's result
model (nested PHP arrays). This is easy enough for Wikidata, which uses a
JSON-based internal model. But for, say, SVG... well, I guess the specialized
mapping could still be "escaped XML as a string".

Note that if we allow a), we can still allow b) at the same time - for Wikidata,
we will definitely implement a special-purpose editing interface that supports
stuff like "add value for language x to property y", etc.

> Just like database schemas change, I expect extensions to also want to alter the
> format of data as they add new features.

Indeed. This is why, in addition to the content model identifier, the
serialization format is explicitly tracked in the database and will be present
in dumps and via the web API.

> Also, I've thought about something like this for quite a while. One of the things
> I'd really like us to do is start using real metadata even within normal
> WikiText pages. We should really replace in-page [[Category:]] with a real
> string of category metadata. Which we can then use to provide good intuitive
> category interfaces. ([[Category:]] would be left in for templates,
> compatibility, etc...).

That could be implemented using a "multipart" content type. But I don't want to
get into this too deeply - multipart has a lot of cool uses, but it's beyond
what we will do for Wikidata.

> This case especially tells me that raw is not something that should be
> outputting the raw data, but should be something which is implemented by
> whatever implements the normal handling for that serialized data.

You mean action=raw? Yes, I agree. action=raw should not return the actual
serialized form. It should probably return nothing or an error for non-text
content. For multipart pages it would just return the "main part", without the
"extensions".

But the entire "multipart" stuff needs more thought. It has a lot of great
applications, but it's beyond the scope of Wikidata, and it has some additional
implications (e.g. can the old editing interface be used to edit "just the text"
while keeping the attachments?).

-- daniel



Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Tim Starling-2
On 27.03.2012 00:33, Tim Starling wrote:
> For the record: we've discussed this previously and I'm fine with it.
> It's a well thought-out proposal, and the only request I had was to
> ensure that the DB schema supports some similar projects that we have
> in the idea pile, like multiple parser versions.

Thanks Tim! The one important bit I'd like to hear from you is... do you think
it is feasible to get this not only implemented but also reviewed and deployed
by August?... We are on a tight schedule with Wikidata, and this functionality
is a major blocker.

I think implementing ContentHandlers for MediaWiki would have a lot of benefits
for the future, but if it's not feasible to get it in quickly, I have to think
of an alternative way to implement structured data storage.

Thanks
Daniel



Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by MZMcBride-2
On 27.03.2012 00:37, MZMcBride wrote:
> It's an ancient assumption that's built into many parts of MediaWiki (and
> many outside tools and scripts). Is there any kind of assessment of the
> level of impact this would have?

Not formally, just my own poking at the code base. There are a lot of places
in the code that access revision text and do something with it; not all of them
can easily be found or changed (this is especially true for extensions).

My proposal covers a compatibility layer that will cause legacy code to just see
an empty page when trying to access the contents of a non-wikitext page. Only
code aware of content models will see any non-wikitext content. This should
avoid most problems, and should ensure that things will work as before at least
for everything that is wikitext.
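
Sketched in code, the idea is roughly this (illustrative only, not the
actual Revision code):

  // Legacy callers only understand wikitext; for anything else,
  // pretend the page is empty instead of handing them a serialized
  // blob they would mangle.
  class RevisionCompat {
      private $model;
      private $blob;

      public function __construct( $model, $blob ) {
          $this->model = $model;
          $this->blob  = $blob;
      }

      public function getText() {
          if ( $this->model !== 'wikitext' ) {
              return '';
          }
          return $this->blob;
      }
  }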

> For example, would the diff engine need to be rewritten so that people can
> monitor these pages for vandalism?

A diff engine needs to be implemented for each content model. The existing
engine does not need to be rewritten; it will be used for all wikitext pages.

> Will these pages be editable in the same
> way as current wikitext pages?

No. The entire point of this proposal is to be able to neatly supply specialized
display, editing and diffing of different kinds of content.

> If not, will there be special editors for the
> various data types?

Indeed.

> What other parts of the MediaWiki codebase will be
> affected and to what extent?

A few classes (like Revision or WikiPage) need some major additions or changes;
see the proposal on meta. Lots of places should eventually be changed to become
aware of content models, but don't need to be adapted immediately (see above).

> Will text still go in the text table or will
> separate tables and infrastructure be used?

Uh, did you read the proposal?...

All content is serialized just before storing it. It is stored in the text
table using the same code as before. The content model and serialization format
are recorded in the revision table.

Secondary data (index data, analogous to the link tables) may be extracted from
the content and stored in separate database tables, or in some other service, as
needed.
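
As a rough sketch of that save path (the helper functions here are invented
for illustration, not actual MediaWiki functions):

  // Serialize the content, store the blob through the existing
  // text-table code path, and record model and format with the
  // revision row.
  function saveContent( $page, $content, ContentHandler $handler ) {
      $format = $handler->getDefaultFormat(); // hypothetical accessor
      $blob   = $handler->serializeContent( $content, $format );

      $textId = storeInTextTable( $blob );    // unchanged storage path
      insertRevisionRow( $page, $textId, array(
          'content_model'  => $handler->getModelName(),
          'content_format' => $format,
      ) );

      // Secondary (index) data goes to separate tables, analogous
      // to the link tables for wikitext.
      updateSecondaryData( $page, $content );
  }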

> I'm reminded a little of LiquidThreads for some reason. This idea sounds
> good, but I'm worried about the implementation details, particularly as the
> assumption you seek to upend is so old and ingrained.

It's more like the transition to using MediaHandlers instead of assuming
uploaded files to be images: existing concepts and actions are generalized to
apply to more types of content.

LiquidThreads introduces new concepts (threads, conversations) and interactions
(re-arranging, summarizing, etc.) and tries to integrate them with the concepts
used for wiki pages. This seems far more complicated to me.

>> The background is that the Wikidata project needs a way to store structured
>> data (JSON) on wiki pages instead of wikitext. Having a pluggable system would
>> solve that problem along with several others, like doing away with the special
>> cases for JS/CSS, maintaining categories etc. separately from the body
>> text, managing Gadgets sanely on wiki pages, and several other things (see the
>> link below).
>
> How would this affect categories being stored in wikitext (alongside the
> rest of the page content text)? That part doesn't make any sense to me.

Imagine a data model that works like mime/multipart email: you have a wrapper
that contains the "main" text as well as "attachments". The whole shebang gets
serialized and stored in the text table, as usual. For displaying, editing and
visualizing, you have code that is aware of the multipart nature of the content,
and puts the parts together nicely.
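
Serialized, such a multipart page might look something like this (purely
illustrative; nothing like this is specified yet):

  {
    "main": {
      "model": "wikitext",
      "data": "'''Berlin''' is the capital of Germany..."
    },
    "attachments": {
      "categories": { "model": "category-list", "data": ["Cities", "Capitals"] }
    }
  }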

However, the category stuff is a use case I'm just mentioning because it has
been requested so often in the past (namely, editing categories, interlanguage
links, etc. separately from the wikitext); this mechanism is not essential to
the concept of ContentHandlers, and not something I plan to implement for the
Wikidata project. It's just something that will become much easier once we have
ContentHandlers.

-- daniel


Re: Cutting MediaWiki loose from wikitext

Alex Brollo
In reply to this post by Daniel Kinzler
I can't understand the details of this talk, but if you like, take a look at
the raw code of any ns0 page on it.wikisource and consider that the "area dati"
is removed from the wikitext as soon as a user opens the page in edit mode,
and rebuilt when the user saves it; or take a look here:
http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used in
edit automation/help are collected as js objects.


Alex brollo

Re: [Wikitext-l] Fwd: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Daniel Kinzler
On 27.03.2012 09:33, Oren Bochman wrote:
> 1. JSON - that's not a very reader-friendly format. It is also not an ideal
> format for the search engine to consume, due to the lack of support for
> metadata and a data schema. XML is universally supported, more human-friendly,
> and supports a schema, which can be useful well beyond this initial use.

JSON is the internal serialization format. It will not be shown to the user or
used to communicate with clients, unless of course they use JSON for
interaction with the web API, as most do.

The full text search engine will be fed a completely artificial view of the
data. I agree that JSON wouldn't be good for that, though XML would be far worse
still.

As to which format and data model to use to represent Wikidata records
internally: that's a different discussion, independent of the idea of
introducing ContentHandlers to MediaWiki. Please post to wikidata-l about that.

> 2. Be bold but also be smart and give respect where it is due. Bots and
> everyone else who has written tools for and about MediaWiki, having made a
> basic assumption about the page structure, would be broken. Many will not so
> readily adapt.

I agree that backwards compatibility is very important. This is why I took care
not to break any code or client using the "old" interface on pages that contain
wikitext (i.e. the standard/legacy case). The current interface (both the web
API and the methods in MediaWiki core) will function exactly as before for
all pages that contain wikitext.

For pages not containing wikitext, such code cannot readily function. There are
two options here (currently controlled by a global setting): pretend the page is
empty (the default) or throw an error (probably better in the case of the web
API, but too strict for other uses).
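
In configuration terms, something along these lines (the setting name is
illustrative, not final):

  // Controls what legacy text access sees for non-wikitext pages:
  // 'ignore' pretends the page is empty, 'fail' throws an error.
  $wgContentHandlerTextFallback = 'ignore';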

> 3. A project like wikidata, in its infancy, should make every effort to be
> backwards compatible. It would be far wiser to place wikidata into a page
> with wiki source, using a custom <xml/> tag or even a <cdata/> xhtml tag.

I strongly disagree with that; it introduces more problems than it solves.
Denny and I decided against this option specifically in light of the experience
he collected with embedding structured data in wikitext in Semantic MediaWiki
and Shortipedia.

But again: that's a different discussion, please post your concerns to wikidata-l.

Regards,
Daniel


Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Alex Brollo
On 27.03.2012 09:47, Alex Brollo wrote:
> I can't understand the details of this talk, but if you like, take a look at
> the raw code of any ns0 page on it.wikisource and consider that the "area
> dati" is removed from the wikitext as soon as a user opens the page in edit
> mode, and rebuilt when the user saves it; or take a look here:
> http://it.wikisource.org/wiki/MediaWiki:Variabili.js where data used in
> edit automation/help are collected as js objects.

Yes. Basically, the ContentHandler proposal would introduce native support for
this kind of thing into MediaWiki, instead of implementing it as a hack with
JavaScript. Wouldn't it be nice to get input forms for this data, or have nice
diffs of the structure, or good search results for data records?... Not to
mention the ability to actually query for individual data fields :)

-- daniel


Re: Cutting MediaWiki loose from wikitext

Antoine Musso-3
In reply to this post by Daniel Kinzler
Daniel Kinzler wrote:
> A very rough prototype is in a dev branch here:
>
>   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/

I guess we could have that migrated to Gerrit and review the project there.

--
Antoine "hashar" Musso



Re: Cutting MediaWiki loose from wikitext

Daniel Kinzler
On 27.03.2012 11:26, Antoine Musso wrote:
> Daniel Kinzler wrote:
>> A very rough prototype is in a dev branch here:
>>
>>   http://svn.wikimedia.org/svnroot/mediawiki/branches/Wikidata/phase3/
>
> I guess we could have that migrated to Gerrit and review the project there.

Sure, fine with me :) Though I will likely make a new branch and merge my
changes again more cleanly. What's there now is really a proof of concept. But
sure, have a look!

-- daniel



Re: [Wikitext-l] Fwd: Cutting MediaWiki loose from wikitext

Daniel Kinzler
In reply to this post by Daniel Kinzler
On 27.03.2012 13:07, [hidden email] wrote:
>> JSON is the internal serialization format.
>
> You're suggesting to use MediaWiki as a model :)
> What's stopping you from implementing it as a _file_ handler, not _article_
> handler?

Because of the actions I want to be able to perform on them, most importantly
editing, but also having diff views for the history, automatic merge to avoid
edit conflicts, etc.

These types of interaction are supported by MediaWiki for "articles", but not
for "files".

In contrast, files are rendered/thumbnailed (we don't need that), get included
in articles with a box and caption (we don't want that), and can be
accessed/downloaded directly as a file via http (we definitely don't want that).

So, what we want to do with the structured data fits much better with
MediaWiki's concept of a "page" than with the concept of a "file".

> I mean, _articles_ contain text (now wikitext).
> All non-human readable/editable/diffable data is stored as "files".

But that data WILL be readable/editable/diffable! That's the point! Just not as
text, but as something else, using special viewers, editors, and diff engines.
That's precisely the idea of the ContentHandler.

> Now they are all in the File namespace, but maybe it's much simpler to allow
> storing them in other namespaces and to write file handlers for
> displaying/editing them than to break the idea of an "article"?

How does what I propose break the idea of an article? It just means that
articles do not *necessarily* contain text. And it makes sure that whatever it
is that is contained in the article can still be viewed, edited, and compared in
a meaningful way.

-- daniel

