Re: MediaWiki to Latex Converter

Re: MediaWiki to Latex Converter

Dirk Hünniger
Hugo Vincent <hugo <at> bluewatersys.com> writes:

>
> Hi everyone,
>
> I recently set up a MediaWiki (http://server.bluewatersys.com/w90n740/)
> and I need to extract the content from it and convert it into LaTeX
> syntax for printed documentation. I have googled for a suitable OSS
> solution but nothing was apparent.
>
> I would prefer a script written in Python, but any recommendations
> would be very welcome.
>
> Do you know of anything suitable?
>
> Kind Regards,
> Hugo Vincent,
> Bluewater Systems.
>

This problem is actually solved: there is an easy way to export MediaWiki
articles to LaTeX and PDF.

see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Yours Dirk Hünniger




_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: MediaWiki to Latex Converter

Svip
On 16 June 2012 10:51, Dirk Hünniger <[hidden email]> wrote:

> This problem is actually solved: there is an easy way to export MediaWiki
> articles to LaTeX and PDF.
>
> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf

Interesting, but why is it so large?  Is the source code available?


Re: MediaWiki to Latex Converter

Dirk Hünniger
On 06/16/2012 12:03 PM, Svip wrote:
> Interesting, but why is it so large?  Is the source code available?
The source code is available here

http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/

The binary is large because it contains everything necessary to compile
the generated LaTeX code, which is basically a full installation of MiKTeX.
Yours Dirk Hünniger


Re: MediaWiki to Latex Converter

Platonides
In reply to this post by Dirk Hünniger
On 16/06/12 10:51, Dirk Hünniger wrote:
> This problem is actually solved: there is an easy way to export MediaWiki
> articles to LaTeX and PDF.
>
> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf
>
> Yours Dirk Hünniger

How does it compare with
http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?

Also, are you aware you're replying to an 8-year-old thread?




Re: MediaWiki to Latex Converter

Dirk Hünniger
On 06/16/2012 05:53 PM, Platonides wrote:
> On 16/06/12 10:51, Dirk Hünniger wrote:
>> This problem is actually solved: there is an easy way to export MediaWiki
>> articles to LaTeX and PDF.
>>
>> see http://de.wikibooks.org/wiki/Benutzer:Dirk_Huenniger/wb2pdf
>>
>> Yours Dirk Hünniger
> How does it compare with
> http://www.mediawiki.org/wiki/Extension:Wiki2LaTeX ?
>
I invested much more time in the development, so it is probably more
complete. If you really want to know, I can make a feature-by-feature
list. But it's going to be very long.

Just to give you an idea of how deeply I went into detail, here is a
question I had to think about. If a table is very wide, it has to be
landscape, but if it is a nested one it must not. And if it is very long,
it has to span several pages. And if it begins with a set of rows
each containing at least one header cell, those rows have to
be repeated at the top of each new page of the table. And by the way, what
happens if these cells contain footnotes?

Sounds like fun?
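
The table rules described above can be written down as a small decision function. The following is only a rough sketch in Python, not code from wb2pdf (whose converter is written in Haskell). The `Table` class, the `plan_layout` function, and the page width and rows-per-page thresholds are all made up for illustration, and the footnote question is left open.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Table:
    # Each row is a list of cells; a cell is (text, is_header).
    rows: List[List[Tuple[str, bool]]]
    width_cm: float        # estimated natural width of the table (hypothetical input)
    nested: bool = False   # True if the table sits inside another table's cell


def leading_header_rows(table: Table) -> int:
    """Count the rows at the top that each contain at least one header cell."""
    count = 0
    for row in table.rows:
        if any(is_header for _text, is_header in row):
            count += 1
        else:
            break
    return count


def plan_layout(table: Table, page_width_cm: float = 16.0,
                rows_per_page: int = 40) -> dict:
    """Decide orientation, page breaking and header repetition for one table.

    The thresholds are illustrative assumptions, not values used by wb2pdf.
    """
    multipage = len(table.rows) > rows_per_page
    return {
        # Wide tables go landscape, unless they are nested.
        "landscape": table.width_cm > page_width_cm and not table.nested,
        # Long tables span several pages.
        "multipage": multipage,
        # Leading header rows are repeated on each page of a multi-page table.
        "repeat_header_rows": leading_header_rows(table) if multipage else 0,
    }
```

Even this toy version needs three interacting inputs (width, nesting, leading header rows) before footnotes enter the picture.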

An important advantage for the user is that you can immediately use it
on Wikipedia, Wikibooks, etc.
This is because it runs on the client side.

On the other hand Wiki2LaTeX runs on the server side. That means it
needs to be installed by the administrator of the Wiki.

I will also provide a server side version of my software if requested to
do so.

  Yours Dirk


Re: MediaWiki to Latex Converter

Platonides
In reply to this post by Dirk Hünniger
On 16/06/12 12:25, Dirk Hünniger wrote:
> On 06/16/2012 12:03 PM, Svip wrote:
>> Interesting, but why is it so large?  Is the source code available?
> The source code is available here
>
> http://wb2pdf.svn.sourceforge.net/viewvc/wb2pdf/
>
> The binary is large because it contains everything necessary to compile
> the generated LaTeX code, which is basically a full installation of MiKTeX.
> Yours Dirk Hünniger

Have you heard of dependencies?
You have to download a 364M file, which extracts to 898M.
Of those, 94M are Linux-specific. The rest includes MiKTeX files, object
files, DLLs, EXEs, ImageMagick, Tcl/Tk, Olson db...
The real code seems to lie at trunk/wb2pdf/trunk/src, being just 4MB.

And if we look at the Linux version, it isn't better. It not only
places everything into a /usr/bin subfolder, it also copies everything (90M)
to /tmp on each run, completely oblivious of security.
Running this program on a shared system is a vulnerability in itself.
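
One common way to reduce this kind of risk is to stage files in a private, randomly named directory instead of a predictable path under /tmp. The sketch below assumes that a predictable location in the world-writable /tmp is the concern; the thread does not say how wb2pdf actually chooses its path. The function name `stage_runtime_files` and the `wb2pdf-` prefix are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def stage_runtime_files(source_dir: Path) -> Path:
    """Copy runtime files into a fresh private directory and return its path.

    tempfile.mkdtemp creates a uniquely named directory that is readable,
    writable and searchable only by the creating user, so other users on a
    shared system cannot pre-create or tamper with it.
    """
    workdir = Path(tempfile.mkdtemp(prefix="wb2pdf-"))  # hypothetical prefix
    shutil.copytree(source_dir, workdir / "runtime")
    return workdir


# The caller is responsible for cleanup, e.g.:
#   workdir = stage_runtime_files(Path("/usr/lib/wb2pdf"))  # hypothetical path
#   try: ...
#   finally: shutil.rmtree(workdir)
```

This still copies the files on every run. Installing them once in a fixed read-only location avoids the copy altogether.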

Why don't you make a package with just the wb2pdf specific files?
Also, temporary build files are not needed on a release.



Re: MediaWiki to Latex Converter

Dirk Hünniger
On 06/16/2012 06:49 PM, Platonides wrote:
> Have you heard of dependencies?
> You have to download a 364M file, which extracts to 898M
> Of those 94M are Linux-specific. The rest includes miktex files, object
> files, dlls, exes, imagemagick, tcl/tk, Olson db...
> The real code seem to lie at trunk/wb2pdf/trunk/src, being just 4MB.
>
> And if we look at the linux version, it isn't better. It does not only
> place everything into a /usr/bin subfolder, it copies everything (90M)
> to /tmp on each run. Completely oblivious of security.
> Running this program on a shared system is a vulnerability on itself.
>
> Why don't you make a package with just the wb2pdf specific files?
> Also, temporary build files are not needed on a release.


I provide one download that is easy to use for any user of both Linux and
Windows. Thus it obviously contains files unnecessary for each of the
two operating systems. I have heard of dependencies, and the .deb
contains a lot of them, and they are downloaded when it is installed. I
can produce a higher quality .deb file. It will still be 90MByte because
I need a full Unicode font. To be precise, I need twelve variants of it,
and that's the 90MByte. I essentially did the tmp trick in order to get
around the work of researching where to install each file, properly
fixing the path names in the code, and testing that. So for now you
can run the software and test every feature you want, and if you or
somebody else decides to use it, I will make a .deb file that
fits your needs. This will probably take two weeks, with most of the
time being spent on choosing proper directories.

Yours Dirk


Re: MediaWiki to Latex Converter

Platonides
On 16/06/12 19:14, Dirk Hünniger wrote:

> On 06/16/2012 06:49 PM, Platonides wrote:
>> Have you heard of dependencies? You have to download a 364M file, which
>> extracts to 898M. Of those, 94M are Linux-specific. The rest includes
>> miktex files, object files, dlls, exes, imagemagick, tcl/tk, Olson
>> db... The real code seems to lie at trunk/wb2pdf/trunk/src, being just
>> 4MB.
>> And if we look at the Linux version, it isn't better. It does not
>> only place everything into a /usr/bin subfolder, it copies everything
>> (90M) to /tmp on each run. Completely oblivious of security. Running
>> this program on a shared system is a vulnerability in itself.
>> Why don't you make a package with just the wb2pdf specific files? Also,
>> temporary build files are not needed on a release.
>
>
> I provide one download that is easy to use for any user of both Linux and
> Windows. Thus it obviously contains files unnecessary for each of the
> two operating systems.
If it was just a few extra MB, I could agree. But 94M / 800M IMHO are
past the point where you should split per OS.

> I have heard of dependencies and the .deb
> contains a lot of them, and they are downloaded when it is installed. I
> can produce a higher quality .deb file.

> It will still be 90MByte because
> I need a full Unicode font. To be precise, I need twelve variants of it,
> and that's the 90MByte.
You mean the mega font? That's actually 207M uncompressed :)
That should probably go to a different package (and depend on it). I
don't see why it couldn't fall back to another available font if it's not
available, though.
Many wikis are written in just a tiny subset of Unicode.
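
Whether a given text actually needs the large merged font could in principle be checked before compiling. The following is a rough sketch, assuming that a script name taken from each character's Unicode name is a good enough approximation. The function names and the choice of "LATIN" as the set covered by a default font are illustrative assumptions, not part of wb2pdf.

```python
import unicodedata


def scripts_used(text: str) -> set:
    """Return a rough set of scripts occurring in text.

    The first word of a character's Unicode name is usually its script
    (for example "LATIN", "CYRILLIC" or "CJK"); this is a heuristic only.
    """
    scripts = set()
    for ch in text:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def needs_merged_font(text: str, covered=frozenset({"LATIN"})) -> bool:
    """True if text uses a script outside the set a default font covers."""
    return not scripts_used(text) <= covered


print(sorted(scripts_used("Hello мир 中文")))
print(needs_merged_font("Hello world"))
```

A wiki written entirely in one script would then never need the large font.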

It seems you're creating it from wqyzenhei + unifont + freeserif fonts.
Why do you need to merge them?


> I essentially did the tmp trick in order to get
> around the work of researching where to install each file and to
> properly fix the path names in the code and to test that.

In case of doubt, you should have placed the folder in /usr/lib.
A number of files would be better placed at /usr/share, though.
But I'm not sure what many of the files are.
For instance, what's the purpose of the geturl and pa programs?

And why do you have copies at bin/ and dist/build? Furthermore, why are
they different?
Build artifacts are also common there.

> So for now you
> can run the software and test every feature you want, and if you or
> somebody else decides to use it, I will make a .deb file that
> fits your needs. This will probably take two weeks, with most of the
> time being spent on choosing proper directories.

I feel a bit wary of running that :S



Re: MediaWiki to Latex Converter

Dirk Hünniger



> You mean the mega font? That's actually 207M uncompressed :)
> That should probably go to a different package (and depend on it). I
> don't see why it couldn't fall back to another available font if it's not
> available, though.
The point is that the change of font has to happen inside a run of the
LaTeX compiler. I tried that and it sometimes works, but often the
compiler does not produce any output if I do that. So the best approach is to
give the compiler one font for the whole document and let it run with that.
>
> It seems you're creating it from wqyzenhei + unifont + freeserif fonts.
> Why do you need to merge them?
I merged them because changing the font in LaTeX does not always work,
especially inside headings which become part of the table of contents.
>> I essentially did the tmp trick in order to get
>> around the work of researching where to install each file and to
>> properly fix the path names in the code and to test that.
> In case of doubt, you should have placed the folder in /usr/lib.
> A number of files would be better placed at /usr/share, though.
> But I'm not sure what many of the files are.
> For instance, what's the purpose of the geturl and pa programs?
The main part of the program is written in the wonderful and easy to
learn purely functional programming language Haskell. Some minor parts
are written in Python 3, and these two parts need to communicate. Currently
pa and geturl are binaries created by the Haskell compiler GHC. pa is
essentially a compiler for the MediaWiki language: it parses the markup into
a tree and writes that tree out as LaTeX. The problem with the MediaWiki
language is that it allows improper bracketing of tags. It is therefore not
context-free, so there is no BNF for it, which rules out all normal
parsers. You therefore need a more obscure technology such as monadic
parser combinators in Haskell.
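
To illustrate what improper bracketing means in practice, here is a deliberately simplified Python sketch. It is not how pa works and does not use monadic parser combinators; it swaps in a plain stack-based recovery strategy, and the tag syntax, the `parse_tolerant` name and the recovery rules are all assumptions made for the example.

```python
import re

# A tag such as <b> or </b>, or a run of text (a lone "<" counts as text).
TOKEN = re.compile(r"<(/?)(\w+)>|([^<]+|<)")


def parse_tolerant(markup: str):
    """Build a tree of (tag, children) nodes from possibly misnested tags.

    Recovery rules (illustrative): a closing tag implicitly closes any
    elements opened inside the one it matches; a closing tag with no
    matching open element is ignored; unclosed elements end at end of input.
    """
    root = ("root", [])
    stack = [root]
    for match in TOKEN.finditer(markup):
        closing, tag, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            stack[-1][1].append(text)
        elif not closing:
            node = (tag, [])
            stack[-1][1].append(node)
            stack.append(node)
        elif any(node[0] == tag for node in stack[1:]):
            # Pop up to and including the matching open element.
            while stack.pop()[0] != tag:
                pass
        # Otherwise: a stray closing tag, silently dropped.
    return root


print(parse_tolerant("<b>bold <i>both</b> italic</i>"))
```

For the misnested input above, the `</b>` closes both `<i>` and `<b>`, and the trailing `</i>` is dropped. A strict grammar-based parser would reject the input instead, which is the difficulty the paragraph above describes.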

But since you seem to have a good idea of where to put which file,
maybe you could give me some hints on that, since that would make my work
much easier.
> And why do you have copies at bin/ and dist/build? Furthermore, why are
> they different?
> Build artifacts are also common there.
I will remember this for future versions of the deb file. Essentially I
only need the stuff in the bin directory. The stuff in the build
directory is just created by the ghc build tools.

Yours Dirk


Re: MediaWiki to Latex Converter

Dirk Hünniger
In reply to this post by Platonides
> You mean the mega font? That's actually 207M uncompressed :)
> That should probably go to a different package (and depend on it). I
> don't see why it couldn't fall back to another available font if it's not
> available, though.
I could indeed work without that font. But in this case I will have to create
font switching commands in the LaTeX file. This means that it won't
compile with pdflatex, since that does not allow font switching inside
headings. Furthermore, the LaTeX file will become significantly less
readable. I also cannot put the fonts into another package, since the
Debian project is not going to accept that package, as I just
investigated. So essentially it is not possible to create a
significantly better deb file, from my point of view.
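
To give an impression of why generated font switching hurts readability, the sketch below wraps runs of CJK characters in a switching command. It is an illustration only: `\cjkfont` is a hypothetical macro that the preamble would have to define, the code point range covers only the basic CJK Unified Ideographs block, and no LaTeX escaping is done.

```python
from itertools import groupby


def is_cjk(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def add_font_switches(text: str, macro: str = "\\cjkfont") -> str:
    """Wrap each run of CJK characters in a group that switches the font.

    `macro` is a hypothetical font-switching command; plain text is
    passed through unchanged.
    """
    parts = []
    for cjk, run in groupby(text, key=is_cjk):
        chunk = "".join(run)
        parts.append("{%s %s}" % (macro, chunk) if cjk else chunk)
    return "".join(parts)


print(add_font_switches("Hello 世界!"))
```

Every mixed-script sentence in the output would carry such groups, one per script run, compared with a single font declaration when one merged font is used.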
Yours Dirk
