Dear Brian,
On 9/13/15, Brian Wolff <[hidden email]> wrote:
> On 9/12/15, wp mirror <[hidden email]> wrote:
>> 0) Context
>>
>> I am currently developing new features for WP-MIRROR (see
>> <https://www.mediawiki.org/wiki/Wp-mirror>).
>>
>> 1) Objective
>>
>> I would like WP-MIRROR to generate all image thumbs during the mirror
>> build process. This is so that MediaWiki can render pages quickly
>> using precomputed thumbs.
>>
>> 2) Dump importation
>>
>> maintenance/importDump.php - this computes thumbs during importation,
>> but is too slow.
>> mwxml2sql - loads databases quickly, but does not compute thumbs.
>>
>> 3) Question
>>
>> Is there a way to compute all the thumbs after loading the databases
>> quickly with mwxml2sql?
>>
>> Sincerely Yours,
>> Kent
>
> Hi. My understanding is that wp-mirror sets up a MediaWiki instance
> for rendering the mirror. One solution would be to set up 404-thumb
> rendering. This makes it so that instead of pre-rendering the needed
> thumbs, MediaWiki renders the thumbs on demand whenever the web
> browser requests one. There are instructions for how this works at
> https://www.mediawiki.org/wiki/Manual:Thumb.php. This is probably
> the best solution to your problem.

Right. Currently, WP-MIRROR does set up MediaWiki to use 404-thumb
rendering. This works fine, but it can cause a few seconds of latency
when rendering pages. Also, it would be nice to be able to generate thumb
dump tarballs, just as we used to generate original-size media dump
tarballs. I would like WP-MIRROR to have such dump features.

> Otherwise, MW needs to know what thumbs are needed for all pages,
> which involves parsing the pages (e.g. via refreshLinks.php). This is
> a very slow process. If you already had all the thumbnails generated,
> you could perhaps just copy over the thumb directory, but I'm not sure
> where you would get a pre-generated thumb directory.

WP-MIRROR does load the *links.sql.gz dump files into the *links tables,
because this method is two orders of magnitude faster than
maintenance/refreshLinks.php.

> --
> -bawolff

Idea. I am thinking of piping the *pages-articles.xml.bz2 dump file
through an AWK script that writes all unique [[File:*]] tags into a file.
This can be done quickly. The question then is: given a file with all the
media tags, how can I generate all the thumbs? What MediaWiki function
should I call? Can this be done using the web API? Any other ideas?

Sincerely Yours,
Kent
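P.S. For anyone who wants to experiment, here is a rough sketch of that
extraction step in Python rather than AWK. It assumes the English
File:/Image: prefixes and of course misses files that templates pull in;
the script and dump file names on the command line are just examples:

    #!/usr/bin/env python3
    # Rough sketch: list the unique [[File:...]] / [[Image:...]] targets
    # found in a pages-articles dump.  English prefixes only; files
    # transcluded via templates will be missed.
    import bz2
    import re
    import sys

    FILE_LINK = re.compile(r'\[\[\s*(?:File|Image)\s*:\s*([^|\]]+)',
                           re.IGNORECASE)

    def extract(dump_path):
        seen = set()
        with bz2.open(dump_path, 'rt', encoding='utf-8',
                      errors='replace') as dump:
            for line in dump:
                for match in FILE_LINK.finditer(line):
                    name = match.group(1).strip().replace(' ', '_')
                    if name and name not in seen:
                        seen.add(name)
                        print(name)

    if __name__ == '__main__':
        # e.g. extract-file-links.py enwiki-latest-pages-articles.xml.bz2
        extract(sys.argv[1])

The resulting list could then be fed to whatever step actually renders
the thumbs.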
On 15/09/15 01:34, wp mirror wrote:
> Idea. I am thinking of piping the *pages-articles.xml.bz2 dump file
> through an AWK script that writes all unique [[File:*]] tags into a
> file. This can be done quickly. The question then is: given a file
> with all the media tags, how can I generate all the thumbs? What
> MediaWiki function should I call? Can this be done using the web API?
> Any other ideas?
>
> Sincerely Yours,
> Kent

You know it will fail for all kinds of images included through templates
(particularly infoboxes), right?
On Mon, Sep 14, 2015 at 4:49 PM, Platonides <[hidden email]> wrote:
> You know it will fail for all kinds of images included through
> templates (particularly infoboxes), right?

Indeed, it is not possible to find out what thumbnails are used by a page
without actually parsing it. Your best bet is to wait until Parsoid dumps
become available (T17017 <https://phabricator.wikimedia.org/T17017>), then
go through those with an XML parser and extract the thumb URLs. That's
still slow, but not as slow as the MediaWiki parser. (Or you can try to
find a regexp that matches thumbnail URLs, but we all know what happens
<http://stackoverflow.com/a/1732454/323407> when you use a regexp to
parse HTML.) After that, just throw those URLs at the 404 handler.
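For illustration, a rough sketch of that last step, assuming the mirror
answers missing thumbs through thumb.php / thumb_handler.php and uses the
hashed upload layout; the base URL, thumb width, and input list below are
placeholders, not WP-MIRROR settings:

    #!/usr/bin/env python3
    # Rough sketch: warm the thumbnail cache by requesting one thumb per
    # file through the mirror's 404 handler.  The base URL, width, and
    # input list are placeholders.
    import hashlib
    import urllib.parse
    import urllib.request

    MIRROR = 'http://localhost/images/thumb'   # placeholder base URL
    WIDTH = 220                                # placeholder thumb width

    def thumb_url(filename):
        # MediaWiki's hashed layout: the first one and two hex digits of
        # the MD5 of the (underscored) file name pick the directory pair.
        name = filename.replace(' ', '_')
        md5 = hashlib.md5(name.encode('utf-8')).hexdigest()
        quoted = urllib.parse.quote(name)
        return '%s/%s/%s/%s/%dpx-%s' % (
            MIRROR, md5[0], md5[:2], quoted, WIDTH, quoted)

    def warm(filenames):
        for name in filenames:
            try:
                urllib.request.urlopen(thumb_url(name), timeout=60).read()
            except Exception as exc:
                print('failed: %s: %s' % (name, exc))

    if __name__ == '__main__':
        with open('file-list.txt', encoding='utf-8') as fh:  # placeholder
            warm(line.strip() for line in fh if line.strip())

If the URLs come straight out of a Parsoid dump, the thumb_url() step can
be skipped and the extracted URLs requested as-is.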