HTML Dump of Individual Article(s)

HTML Dump of Individual Article(s)

Dan Davis
Is there a utility that will dump an article or list of articles to
HTML or PDF at the command line from a locked-down Wiki (i.e., login
required to view)? The pages in question are mostly self-contained and
heavily utilize tables and other formatting.

I found http://en.wikipedia.org/wiki/Wikipedia:Database_download and
http://meta.wikimedia.org/wiki/Data_dumps, but they seem to be more
geared to dumping the whole wiki and not individual pages.

Any pointers?

Dan

Re: HTML Dump of Individual Article(s)

Rob Church
On 12/05/06, Dan Davis <[hidden email]> wrote:
> Is there a utility that will dump an article or list of articles to
> HTML or PDF at the command line from a locked-down Wiki (i.e., login
> required to view)? The pages in question are mostly self-contained and
> heavily utilize tables and other formatting.
>
> I found http://en.wikipedia.org/wiki/Wikipedia:Database_download and
> http://meta.wikimedia.org/wiki/Data_dumps, but they seem to be more
> geared to dumping the whole wiki and not individual pages.

The dumpHTML maintenance script lets you specify start and end page
identifiers, which can be used to dump a single page or a range of
pages with consecutive identifiers.
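
For example, driven from Python it looks roughly like this (an
untested sketch; the -d/-s/-e option names match 1.6-era dumpHTML.php,
but check the usage text of the script shipped with your version):

    # Rough sketch of driving MediaWiki's dumpHTML maintenance script.
    # -d = destination directory, -s/-e = start/end page_id (assumed
    # option names; verify against the script's own usage output).
    import subprocess

    def dump_page_range(wiki_root, dest_dir, start_id, end_id):
        """Dump pages with page_ids in [start_id, end_id] to static HTML."""
        subprocess.run(
            ["php", "maintenance/dumpHTML.php",
             "-d", dest_dir, "-s", str(start_id), "-e", str(end_id)],
            cwd=wiki_root, check=True,
        )

    # A single page is just a range of one: start == end.
    dump_page_range("/var/www/wiki", "/tmp/htmldump", 42, 42)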


Rob Church

Re: HTML Dump of Individual Article(s)

Dan Davis
On 5/12/06, Rob Church <[hidden email]> wrote:

> On 12/05/06, Dan Davis <[hidden email]> wrote:
> > Is there a utility that will dump an article or list of articles to
> > HTML or PDF at the command line from a locked-down Wiki (i.e., login
> > required to view)? The pages in question are mostly self-contained and
> > heavily utilize tables and other formatting.
>
> The dumpHTML maintenance script allows you to specify a start and end
> page identifier which could be used to dump a single page or a number
> of pages with consecutive identifiers.
>

No way to do this by title, only by page ID? Is it possible for this
to work with pages that require login? The output I get is a page
saying I must log in to view it.

Is it possible to dump the article text only, without headers, footers,
and sidebars, but with the normal formatting parsed correctly? I want
to have an automated process that will do wonderful things with the
article text, without the navigational features, etc.

Dan

Re: HTML Dump of Individual Article(s)

Tels
Moin,


On Tuesday 16 May 2006 14:59, Dan Davis wrote:

> On 5/12/06, Rob Church <[hidden email]> wrote:
> > On 12/05/06, Dan Davis <[hidden email]> wrote:
> > > Is there a utility that will dump an article or list of articles to
> > > HTML or PDF at the command line from a locked-down Wiki (i.e.,
> > > login required to view)? The pages in question are mostly
> > > self-contained and heavily utilize tables and other formatting.
> >
> > The dumpHTML maintenance script allows you to specify a start and end
> > page identifier which could be used to dump a single page or a number
> > of pages with consecutive identifiers.
>
> No way to do this by title, only by page ID? Is it possible for this
> to work with pages that require login? The output I get is a page
> saying I must log in to view it.
>
> Is it possible to dump the article text only, without headers, footers,
> and sidebars, but with the normal formatting parsed correctly? I want
> to have an automated process that will do wonderful things with the
> article text, without the navigational features, etc.

Have you looked at wiki2xml? See

 http://bloodgate.com/wiki/index.php?title=Special:Wiki2XML

for an example.
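
If wiki2xml doesn't suit, another route for the login and text-only
questions is plain HTTP scripting against index.php: action=render
returns the parsed article body without the skin's header, footer, and
sidebar. A rough, untested sketch follows (the wiki URL and credentials
are placeholders, and the login form field names are assumptions;
inspect your wiki's Special:Userlogin form, and note that newer
MediaWiki versions also require a login token):

    # Log in to a MediaWiki wiki, then save one article's parsed body HTML.
    import requests

    INDEX = "http://wiki.example.com/index.php"  # placeholder URL

    session = requests.Session()  # keeps the login cookies across requests
    # wpName/wpPassword/wpLoginattempt match 2006-era MediaWiki's login
    # form; confirm the field names against Special:Userlogin on your wiki.
    session.post(
        INDEX,
        params={"title": "Special:Userlogin", "action": "submitlogin"},
        data={"wpName": "Dan", "wpPassword": "secret",
              "wpLoginattempt": "Log in"},
    )

    # action=render returns just the rendered article text, no skin chrome.
    resp = session.get(INDEX, params={"title": "Main_Page", "action": "render"})
    resp.raise_for_status()
    with open("Main_Page.html", "w", encoding="utf-8") as fh:
        fh.write(resp.text)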

Best wishes,

Tels


--
 Signed on Tue May 16 18:58:25 2006 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 STOP! We have run out of virgins!

