Simple Page Object model using #lst

Simple Page Object model using #lst

Alex Brollo
I'd like to share an idea. If you think that I don't know what I am
talking about, you're probably right; nevertheless I'll try.

Labeled section transclusion, I presume, simply runs a substring search
on the raw wiki code of a page; it gives back a piece of the page as it is
(but removes any <section...> tag inside). Imagine that this "copy and
paste" of chunks of wiki code were the first parsing step, the result
being a new wiki text, which is then parsed for template code and other
wiki code.
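
As a rough illustration of the substring idea described above (not the extension's actual code), a minimal Python sketch might look like this. The function name `extract_section` and the exact marker spelling `<section begin=NAME />` are assumptions made for the example.

```python
import re

# Matches any LST-style marker such as <section begin=foo /> or <section end=foo />.
SECTION_TAG = re.compile(r"<section\s+(?:begin|end)=[^>]*?/>")

def extract_section(wikitext, name):
    """Return the raw text between the begin and end markers of `name`."""
    begin = f"<section begin={name} />"
    end = f"<section end={name} />"
    start = wikitext.find(begin)
    if start == -1:
        return ""
    start += len(begin)
    stop = wikitext.find(end, start)
    if stop == -1:
        return ""
    # Drop any other section markers that fall inside the extracted chunk.
    return SECTION_TAG.sub("", wikitext[start:stop])

page = (
    "Intro. <section begin=a />Alpha <section begin=b />beta"
    "<section end=a /> gamma<section end=b /> Outro."
)
print(extract_section(page, "a"))  # -> Alpha beta
print(extract_section(page, "b"))  # -> beta gamma
```

The returned chunk is still raw wiki text, so it could be handed to the rest of the parser as the "new wiki text" the paragraph above imagines.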

If this happened, I imagine that the original page could be considered
an "object", i.e. a collection of "attributes" (fragments of text) and
"methods" (template chunks). So you could write template pages with
collections of different template functions, or pages with collections of
different data, or mixed pages with both data and functions, any of them
being accessible from any wiki page of the same project (while waiting for
interwiki transclusion).

Then, by carefully adding a "self-transclusion permission", so that chunks
of a page's code can be used within the same page, the conversion of a page
into a true, even if simple, "object" would be complete.

Alex
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Simple Page Object model using #lst

Jesse (Pathoschild)
On Tue, Jan 25, 2011 at 8:14 AM, Alex Brollo <[hidden email]> wrote:
> If this happened, I imagine that the original page could be considered
> an "object", i.e. a collection of "attributes" (fragments of text) and
> "methods" (template chunks).

Labeled Section Transclusion can be used this way, but it's not very
efficient for this. Internally it uses generated regular expressions
to extract sections; you can peek at its source code at
<http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/LabeledSectionTransclusion/lst.php?view=markup>.

--
Yours cordially,
Jesse (Pathoschild)


Re: Simple Page Object model using #lst

Alex Brollo
2011/1/25 Jesse (Pathoschild) <[hidden email]>

> On Tue, Jan 25, 2011 at 8:14 AM, Alex Brollo <[hidden email]> wrote:
> > If this happened, I imagine that the original page could be considered
> > an "object", i.e. a collection of "attributes" (fragments of text) and
> > "methods" (template chunks).
>
> Labeled Section Transclusion can be used this way, but it's not very
> efficient for this. Internally it uses generated regular expressions
> to extract sections; you can peek at its source code at
> <http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/LabeledSectionTransclusion/lst.php?view=markup>.

Thanks, but I'm far from understanding such PHP code, nor do I have any idea
about the "whole exotic thing" of wiki code parsing and HTML generation.
But if I were to write something like #lst, I'd index the text using the
section tags simply as delimiters, building something hidden like this in the
wiki code or in another field of the database:

<!-- sections
s1[0:100]
s2[120:20]
s3[200:150]
 -->

where s1, s2, s3 are the section names and the numbers are the offset/length
of the text between the section tags within the wiki page "string"; or
something similar, built to be extremely simple and fast to parse and to give
back substrings of the page in the fastest, most efficient way. Such data
should be calculated only when the page content changes. I guess that the
efficiency of sections would increase a lot, encouraging wider use of #lst.

If such parsing of section text were the first step of page parsing,
even segments of text delimited by noinclude tags could be retrieved.
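
The offset/length index proposed above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the names `build_index` and `get_section` are hypothetical, and the marker syntax `<section begin=NAME />` / `<section end=NAME />` is assumed. The index would be rebuilt only when the page is saved; retrieval is then plain slicing.

```python
import re
from collections import defaultdict

# Assumed marker syntax: <section begin=NAME /> ... <section end=NAME />
MARKER = re.compile(r'<section\s+(begin|end)="?([^"\s/>]+)"?\s*/>')

def build_index(wikitext):
    """Map each section name to a list of (offset, length) spans of its content."""
    index = defaultdict(list)
    open_at = {}  # section name -> offset where its content starts
    for m in MARKER.finditer(wikitext):
        kind, name = m.group(1), m.group(2)
        if kind == "begin":
            open_at[name] = m.end()
        elif name in open_at:
            start = open_at.pop(name)
            index[name].append((start, m.start() - start))
    return dict(index)

def get_section(wikitext, index, name):
    """Concatenate every span recorded for `name` using plain slicing."""
    return "".join(wikitext[off:off + length] for off, length in index.get(name, []))

page = ("x<section begin=s1 />hello<section end=s1 />y"
        "<section begin=s2 />world<section end=s2 />")
index = build_index(page)          # would be computed once, at save time
print(get_section(page, index, "s1"))  # -> hello
print(get_section(page, index, "s2"))  # -> world
```

Note that a span recorded this way still contains the markers of any other section nested inside it; a real implementation would have to strip those after slicing.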

Alex

Re: Simple Page Object model using #lst

Alex Brollo
2011/1/25 Alex Brollo <[hidden email]>

Just to test the effectiveness of such a strange idea, I added some formal
section tags to a 6 KB text file, section.txt, then I wrote a simple script to
create a "data area". This is the result (a Python dictionary inside an HTML
comment) appended to the section.txt file:

<!--SECTIONS:{'<section begin=1 />': [(152, 990), (1282, 2406), (4078, 4478)],
'<section begin=6 />': [(19, 115)],
'<section begin=2 />': [(2443, 2821), (2859, 3256)],
'<section begin=4 />': [(1555, 1901)],
'<section begin=5 />': [(171, 477)],
'<section begin=3 />': [(3704, 4042)]}-->

then I ran these lines from Python IDLE:

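>>> for _ in range(1000):
    # reload the whole file on every cycle
    f = open("section.txt").read()
    # find_stringa is my own helper; here it pulls out the "data area"
    indici = eval(find_stringa(f, "<!--SECTIONS:", "-->"))
    t = ""
    # join all the chunks recorded for section 1
    for start, end in indici["<section begin=1 />"]:
        t += f[start:end]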

As you can see, the code does the following, 1000 times:
opens the file and loads it;
selects the "data area" (find_stringa is a personal string search tool to
extract strings) and converts it into a dictionary;
retrieves all the text inside the multiple sections named "1" (the worst case
in the list: section 1 has three instances: [(152, 990), (1282, 2406), (4078,
4478)]).
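
The helper `find_stringa` is not shown in the thread, so the following self-contained sketch uses a hypothetical stand-in, `find_between`, and assumes it simply returns the text between two markers. It also swaps `eval` for `ast.literal_eval`, which parses the dictionary literal without executing arbitrary code.

```python
import ast

def find_between(text, start_marker, end_marker):
    """Hypothetical stand-in for find_stringa: text between two markers, or None."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    return None if end == -1 else text[start:end]

def read_section(text, key):
    """Read the appended data area and join all (start, end) slices for `key`."""
    data = find_between(text, "<!--SECTIONS:", "-->")
    if data is None:
        return ""
    spans = ast.literal_eval(data)  # dict literal -> dict, no code execution
    return "".join(text[start:end] for start, end in spans.get(key, []))

body = "0123456789abcdefghij"
text = body + "<!--SECTIONS:{'<section begin=1 />': [(2, 5), (10, 13)]}-->"
print(read_section(text, "<section begin=1 />"))  # -> 234abc
```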

Time to do 1000 cycles: more or less 3 seconds on a far from powerful PC.
:-)
Fast, in my opinion!

So it can be done, and it runs effectively too, doesn't it?

Alex

Re: Simple Page Object model using #lst

Platonides
Had LST used <section name=foo> </section> to mark sections,
instead of <section begin=foo />content<section end=foo />, it
would be as easy as traversing the preprocessor output, which would
already have the sections split.

Alex Brollo wrote:

> Time to do 1000 cycles: more or less 3 seconds on a far from powerful PC.
> :-)
> Fast, in my opinion!
>
> So, it can be done, and it runs, in an effective way too. Doesn't it?
>
> Alex

It can obviously be done. But you should compare it against the original
implementation; 3 seconds by itself isn't meaningful.
Another thing to test would be using stripos() instead of those regexes,
in case it is faster.
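
A comparison of the kind suggested here could be set up along these lines. This is a Python sketch rather than PHP, so it says nothing about the real extension's performance: `with_regex` and `with_find` are hypothetical names, the regex is not the one lst.php generates, and `str.find` is case-sensitive whereas PHP's stripos() is not.

```python
import re
import timeit

def with_regex(text, name):
    """Extract every instance of section `name` with a regular expression."""
    n = re.escape(name)
    pattern = re.compile(
        r"<section\s+begin=" + n + r"\s*/>(.*?)<section\s+end=" + n + r"\s*/>",
        re.DOTALL,
    )
    return "".join(pattern.findall(text))

def with_find(text, name):
    """Extract the same instances with plain substring searches."""
    begin, end = f"<section begin={name} />", f"<section end={name} />"
    parts, pos = [], 0
    while True:
        start = text.find(begin, pos)
        if start == -1:
            break
        start += len(begin)
        stop = text.find(end, start)
        if stop == -1:
            break
        parts.append(text[start:stop])
        pos = stop + len(end)
    return "".join(parts)

# Synthetic test page: ten sections separated by filler text.
sample = ("filler " * 100 + "<section begin=1 />chunk<section end=1 />") * 10

for fn in (with_regex, with_find):
    seconds = timeit.timeit(lambda: fn(sample, "1"), number=1000)
    print(f"{fn.__name__}: {seconds:.3f} s for 1000 runs")
```

Both functions are run on the same input, so only the relative timings matter; the absolute numbers depend on the machine.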



Re: Simple Page Object model using #lst

Brion Vibber
On Tue, Jan 25, 2011 at 10:27 AM, Platonides <[hidden email]> wrote:

> Had LST used <section name=foo> </section> to mark sections,
> instead of <section begin=foo />content<section end=foo />, it
> would be as easy as traversing the preprocessor output, which would
> already have the sections split.
>

It was done this way in order to allow overlapping sections: LST was created
so arbitrary parts of a document on Wikisource can be quoted while retaining
a direct link to the original document as it continues to be edited.

Basically, the section markers are permanent markers for the source of a
copy-and-paste operation. One person might be copying from paragraph 1 to
paragraph 4; another might copy from paragraph 3 to paragraph 5; your page
structure looks like this:

  [page]
    [section-open 1/]
    [para 1/] <!-- in section 1 only -->
    [para 2/] <!-- in section 1 only -->
    [section-open 2/]
    [para 3/] <!-- in both section 1 and 2 -->
    [para 4/] <!-- in both section 1 and 2 -->
    [section-close 1/]
    [para 5/] <!-- in section 2 only -->
    [section-close 2/]
  [/page]

Since the LST sections overlap, they don't really fit well in the
hierarchical structures that the preprocessor deals in except as standalone
start/end markers.
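
To make the nesting problem concrete, here is a small Python check, using a hypothetical helper `partially_overlap`, that tells whether two ranges intersect without one containing the other. Ranges like that cannot both be expressed as properly nested elements.

```python
def partially_overlap(a, b):
    """True if ranges a and b intersect but neither contains the other."""
    (a_start, a_end), (b_start, b_end) = sorted([a, b])
    return a_start < b_start < a_end < b_end

# Paragraph ranges from the example above (end index exclusive):
section_1 = (1, 5)   # paragraphs 1..4
section_2 = (3, 6)   # paragraphs 3..5
print(partially_overlap(section_1, section_2))  # -> True
print(partially_overlap((1, 5), (2, 4)))        # -> False (properly nested)
```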

*BUT* ... it's probably possible to actually redo things to use that above
structure in a sensible way, instead of doing text regexes:

  iterate through the node tree:
    if found desired section start node:
      start saving our spot
    if found desired section end node:
      if start node was at same level:
        grab everything in between
        RETURN that to upstream parser
      else:
        find the closest common parent node of start and end
        build a node tree that has the parts of the start's parent before
          the start trimmed, and the parts of the end's parent after the
          end trimmed
        RETURN that to upstream parser
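
One way to sketch the traversal above in Python is shown below. It is a simplification under stated assumptions: the `Node` class is a hypothetical stand-in, not MediaWiki's preprocessor DOM, and instead of locating the common parent explicitly it prunes a copy of the tree during a single in-order walk. The result is a comparable trimmed tree, rooted at the page rather than at the common parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    kind: str                  # "elem", "text", "begin" or "end"
    value: str = ""            # element name, text content, or section name
    children: List["Node"] = field(default_factory=list)

def extract_section(root: Node, name: str) -> Optional[Node]:
    """Return a pruned copy of the tree holding only content inside section `name`."""
    inside = False

    def prune(node: Node) -> Optional[Node]:
        nonlocal inside
        if node.kind in ("begin", "end"):
            if node.value == name:
                inside = (node.kind == "begin")
            return None        # markers themselves are never copied
        if node.kind == "text":
            return Node("text", node.value) if inside else None
        kept = [c for c in map(prune, node.children) if c is not None]
        return Node("elem", node.value, kept) if kept else None

    return prune(root)

def texts(node: Optional[Node]) -> List[str]:
    """Flatten a (possibly empty) tree into its text leaves, in document order."""
    if node is None:
        return []
    if node.kind == "text":
        return [node.value]
    return [t for child in node.children for t in texts(child)]

def para(text: str) -> Node:
    return Node("elem", "p", [Node("text", text)])

# The overlapping layout from the message above.
page = Node("elem", "page", [
    Node("begin", "1"), para("para 1"), para("para 2"),
    Node("begin", "2"), para("para 3"), para("para 4"),
    Node("end", "1"), para("para 5"), Node("end", "2"),
])
print(texts(extract_section(page, "1")))  # -> ['para 1', 'para 2', 'para 3', 'para 4']
print(texts(extract_section(page, "2")))  # -> ['para 3', 'para 4', 'para 5']
```

Because the walk only toggles a flag at the markers, it also handles a start marker and an end marker that sit at different depths of the tree.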

One could also pull the markers out of the original text and store them as
separate metadata in some way, which seems to be part of the suggestions
earlier in the thread. The main problem here is that we could easily end up
losing track of the markers during editing; we have no persistent identity
for pieces of text, so if there's not a visible node in there for editors to
move & copy along with their alterations, the markers may not be able to
persist automatically.
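
The failure mode described above is easy to reproduce in a toy Python example: offsets stored apart from the text point at the wrong characters as soon as someone edits earlier in the page. The strings and the helper `section_text` are made up for the illustration.

```python
def section_text(text, span):
    """Slice a section out of the text using a stored (start, end) pair."""
    start, end = span
    return text[start:end]

original = "Intro. Quoted passage. Outro."
start = original.index("Quoted passage.")
span = (start, start + len("Quoted passage."))  # stored once, apart from the text

print(section_text(original, span))  # -> Quoted passage.

# An editor adds a sentence at the top; the stored span is not updated.
edited = "A new first sentence. " + original
print(section_text(edited, span))    # no longer the quoted passage
```

Inline markers avoid this because they move together with the text they delimit.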

-- brion