Google Summer of Code: accepted projects


Google Summer of Code: accepted projects

Roan Kattouw-2
Yesterday, the selection of GSoC projects was officially announced.
For MediaWiki, the following projects have been accepted:

* Niklas Laxström (Nikerabbit), mentored by Siebrand, will be working
on improving localization and internationalization in MediaWiki, as
well as improving the Translate extension used on translatewiki.net
* Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
thumbnailing daemon, so image manipulation won't have to happen on the
Apache servers any more
* Jeroen de Dauw, mentored by Yaron Koren, will be improving the
Semantic Layers extension and merging it into the Semantic Google Maps
extension
* Gerardo Antonio Cabero, mentored by Michael Dale (mdale), will be
improving the Cortado applet for video playback (I'm a bit fuzzy on
the details for this one)

The official list with links to (parts of) the proposals can be found
at the Google website [1]; lists for other organizations can be
reached through the list of participating organizations [2].

The next event on the GSoC timeline [3] is the community bonding
period [4], during which the students are supposed to get to know
their mentors and the community. This period lasts until May 23rd,
when the students actually begin coding.

Starting now and continuing at least until the end of GSoC in August,
you will probably see the students around on IRC and the mailing lists
and hear about the projects they're working on. To repeat the crux of an
earlier thread on this list [5]: be nice to these special newcomers, make
them feel welcome and comfortable, and try not to bite them :)

To the mentors and students: have fun!

Roan Kattouw (Catrope)

[1] http://socghop.appspot.com/org/home/google/gsoc2009/wikimedia
[2] http://socghop.appspot.com/program/accepted_orgs/google/gsoc2009
[3] http://socghop.appspot.com/document/show/program/google/gsoc2009/timeline
[4] http://googlesummerofcode.blogspot.com/2007/04/so-what-is-this-community-bonding-all.html
[5] http://lists.wikimedia.org/pipermail/wikitech-l/2009-March/041964.html


Re: Google Summer of Code: accepted projects

Marco Schuster-2
On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:

> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
> thumbnailing daemon, so image manipulation won't have to happen on the
> Apache servers any more


Wow, I'm looking forward to this. It might be worth a try to give the
uploader the ability to choose non-standard resizing filters, or even
full-fledged image manipulation, something like a wiki-style Photoshop.

Marco


--
VMSoft GbR
Nabburger Str. 15
81737 München
Geschäftsführer: Marco Schuster, Volker Hemmert
http://vmsoft-gbr.de

Re: Google Summer of Code: accepted projects

Michael Dale-4
I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop"
would be cool ... but not in the scope of that SoC project ;)

--michael

Marco Schuster wrote:

> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:
>
>  
>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>> thumbnailing daemon, so image manipulation won't have to happen on the
>> Apache servers any more
>>    
>
>
> Wow, I'm looking forward to this. It might be worth a try to give the
> uploader the ability to choose non-standard resizing filters, or even
> full-fledged image manipulation, something like a wiki-style Photoshop.
>
> Marco
>
>
>  



Re: Google Summer of Code: accepted projects

David Gerard-2
2009/4/22 Michael Dale <[hidden email]>:
> Marco Schuster wrote:
>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:

>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>> Apache servers any more

>> Wow, I'm looking forward to this. It might be worth a try to give the
>> uploader the ability to choose non-standard resizing filters, or even
>> full-fledged image manipulation, something like a wiki-style Photoshop.

> I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop"
> would be cool ... but not in the scope of that SoC project ;)


You can do pretty much anything with ImageMagick. The trouble is that
it's not the fastest at *anything*. It depends how much that affects
performance in practice: something that *just* thumbnails could be far
more efficient, but you'd need a new program for each function, and most
Unix users of MediaWiki already thumbnail with ImageMagick, so it'll be
there.
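
For reference, the on-demand path today boils down to a shell-out along
these lines (a simplified sketch; the real command and flags vary with
configuration):

    // Illustrative only: roughly the shell-out MediaWiki makes per thumb.
    $cmd = 'convert ' . escapeshellarg( $srcPath ) .
        ' -thumbnail ' . escapeshellarg( $width . 'x' . $height ) .
        ' ' . escapeshellarg( $dstPath );
    $retval = 0;
    wfShellExec( $cmd, $retval ); // nonzero $retval means no thumbnail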


- d.


Re: Google Summer of Code: accepted projects

Chad
On Tue, Apr 21, 2009 at 8:16 PM, David Gerard <[hidden email]> wrote:

> 2009/4/22 Michael Dale <[hidden email]>:
>> Marco Schuster wrote:
>>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:
>
>>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>>> Apache servers any more
>
>>> Wow, I'm looking forward to this. It might be worth a try to give the
>>> uploader the ability to choose non-standard resizing filters, or even
>>> full-fledged image manipulation, something like a wiki-style Photoshop.
>
>> I was looking at http://editor.pixastic.com/ ... "wiki-style photoshop"
>> would be cool ... but not in the scope of that SoC project ;)
>
>
> You can do pretty much anything with ImageMagick. The trouble is that
> it's not the fastest at *anything*. It depends how much that affects
> performance in practice: something that *just* thumbnails could be far
> more efficient, but you'd need a new program for each function, and most
> Unix users of MediaWiki already thumbnail with ImageMagick, so it'll be
> there.
>
>
> - d.
>

The main issue with the daemon idea (which was discussed at length in
#mediawiki a few weeks ago) is that it requires a major change in how we
handle images.

Right now, the process involves rendering on demand rather than at
leisure. This has the benefit of always producing an ideal thumbnailed
image at the end of every parse. However, the major drawbacks are an
increase in parsing time (while we wait for ImageMagick to do its thing)
and an increased load on the app servers. The only time we can sidestep
this is when someone uses a thumb dimension for which we already have a
thumb rendered.

In order for this to work, we'd need to shift to a style of "render when you get
a chance, but give me the best fit for now." Basically, we'd begin parsing and
find that we need a thumbnailed copy of some image, but we don't have the
ideal size just yet. Instead, we could return the best-fitting thumbnail so far
and use that until the daemon has given us the right image.
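
Roughly, with a hypothetical helper (assuming we track which widths have
already been rendered for each file):

    // Sketch: pick the best already-rendered width for a requested one.
    function pickBestFitWidth( array $renderedWidths, $wanted ) {
        sort( $renderedWidths );
        foreach ( $renderedWidths as $w ) {
            if ( $w >= $wanted ) {
                return $w; // smallest thumb we can scale *down* from
            }
        }
        // Nothing big enough rendered yet; use the largest we have, if any.
        return $renderedWidths ? end( $renderedWidths ) : false;
    }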

Not an easy task, but I certainly hope some progress can be made on
it over the summer :)

-Chad


Re: Google Summer of Code: accepted projects

Aryeh Gregor
In reply to this post by Marco Schuster-2
On Tue, Apr 21, 2009 at 7:54 PM, Marco Schuster
<[hidden email]> wrote:
> Wow, I'm looking forward to this. It might be worth a try to give the
> uploader the ability to choose non-standard resizing filters, or even
> full-fledged image manipulation, something like a wiki-style Photoshop.

That seems to be orthogonal to the proposed project.


Re: Google Summer of Code: accepted projects

Aryeh Gregor
In reply to this post by Chad
On Tue, Apr 21, 2009 at 8:34 PM, Chad <[hidden email]> wrote:

> The main issue with the daemon idea (which was discussed at length in
> #mediawiki a few weeks ago) is that it requires a major change in how we
> handle images.
>
> Right now, the process involves rendering on demand rather than at
> leisure. This has the benefit of always producing an ideal thumbnailed
> image at the end of every parse. However, the major drawbacks are an
> increase in parsing time (while we wait for ImageMagick to do its thing)
> and an increased load on the app servers. The only time we can sidestep
> this is when someone uses a thumb dimension for which we already have a
> thumb rendered.
>
> In order for this to work, we'd need to shift to a style of "render when you get
> a chance, but give me the best fit for now." Basically, we'd begin parsing and
> find that we need a thumbnailed copy of some image, but we don't have the
> ideal size just yet. Instead, we could return the best-fitting thumbnail so far
> and use that until the daemon has given us the right image.

I'm not clear on why we don't just make the daemon synchronously
return a result the way ImageMagick effectively does.  Given the level
of reuse of thumbnails, it seems unlikely that the latency is a
significant concern -- virtually no requests will ever actually wait
on it.
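
To illustrate, the synchronous version could be a plain blocking
round-trip to the daemon (hypothetical wire protocol and hostname):

    // Sketch: a synchronous render request from the Apache side.
    function renderThumbSync( $title, $width ) {
        $sock = fsockopen( 'thumbd.internal', 8888, $errno, $errstr, 5 );
        if ( !$sock ) {
            return false; // daemon unreachable: fall back to local resize
        }
        fwrite( $sock, "RENDER $title $width\n" );
        $status = trim( fgets( $sock ) ); // blocks until the daemon is done
        fclose( $sock );
        return $status === 'OK';
    }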


Re: Google Summer of Code: accepted projects

Michael Dale-4
Aryeh Gregor wrote:
> I'm not clear on why we don't just make the daemon synchronously
> return a result the way ImageMagick effectively does.  Given the level
> of reuse of thumbnails, it seems unlikely that the latency is a
> significant concern -- virtually no requests will ever actually wait
> on it.
>  
(I basically outlined these issues on the SoC page, but here they are
again with a bit more clarity.)

I recommended that the image daemon run semi-synchronously, since the
changes needed to maintain multiple states, return non-cached placeholder
images, and manage updates and page purges for when updated images become
available within the Wikimedia server architecture probably won't be
completed in the Summer of Code timeline. If the student is up for it, the
fully asynchronous concept would be useful for other components like video
transformation/transcoding, sequence flattening, etc., but it's not what I
would recommend for the Summer of Code timeline.

== Per the issues outlined in bug 4854 ==

I don't think it's a good idea to invest a lot of energy into a separate
Python-based image daemon. It won't avoid all the problems listed in bug
4854.

Shell-character-exploit issues need to be guarded against anyway (since
not everyone is going to install the daemon).

Other people using MediaWiki won't want to add a Python- or Java-based
image resizer and then resolve its Python or Java dependencies and
libraries. It won't be easier to install than ImageMagick or php-gd, which
are repository-hosted applications already present in shared hosting
environments.

Once you start integrating other libs like Batik (Java), it becomes
difficult to resolve dependencies (Java, Python, etc.), and to install it
you have to push out a "new program" that is not integrated into the
application repository managers of the various distributions.

The potential to isolate CPU and memory usage should be considered in the
core MediaWiki image resize support anyway, i.e. we don't want to crash
other people's servers running MediaWiki by not checking the upper bounds
of image transforms. Instead we should make the core image transform
smarter: maybe have a configuration variable that /attempts/ to bound the
memory used by spawned processes, and take that into account before
issuing the shell command for a given large image transformation with a
given shell application.
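
A sketch of the kind of guard I mean ($wgMaxShellMemory exists today; the
wrapper and the cost estimate are hypothetical):

    function boundedImageTransform( $cmd, $srcWidth, $srcHeight ) {
        global $wgMaxShellMemory; // KB; enforced via ulimit in wfShellExec

        // Rough decode-cost estimate: ~4 bytes per pixel, converted to KB.
        $estKB = (int)( $srcWidth * $srcHeight * 4 / 1024 );
        if ( $wgMaxShellMemory && $estKB > $wgMaxShellMemory ) {
            return false; // refuse up front rather than spawn a doomed process
        }
        $retval = 0;
        wfShellExec( $cmd, $retval );
        return $retval === 0;
    }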

== What the image resize efforts should probably focus on instead ==

(1) making the existing system "more robust", and (2) taking better
advantage of multi-threaded servers.

(1) Right now the system chokes on large images. We should deploy support
for an in-place image resize, maybe something like VIPS
(http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use).
The system should intelligently call VIPS to transform the image to a
reasonable size at upload time, then use that derivative for just-in-time
thumbs in articles. (If VIPS is unavailable, we don't transform, and we
don't crash the Apache node.)
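
The upload-time pre-shrink could look something like this (a sketch;
assumes the VIPS 7.x command line with im_shrink is installed):

    // Shrink a huge original to a manageable derivative at upload time.
    function vipsPreShrink( $src, $dst, $factor ) {
        // im_shrink takes separate x and y shrink factors.
        $cmd = 'vips im_shrink ' . escapeshellarg( $src ) . ' ' .
            escapeshellarg( $dst ) . " $factor $factor";
        $retval = 0;
        wfShellExec( $cmd, $retval );
        return $retval === 0; // on failure, keep serving the original
    }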

(2) Maybe spin out the image transform process early in the parsing of
the page, with a placeholder and a callback, so that by the time all the
templates and links have been looked up the image is ready for output.
(Maybe another function, wfShellBackgroundExec($cmd, $callback_function),
perhaps using pcntl_fork, then a normal wfShellExec, then pcntl_waitpid,
then the callback function, which sets some variable in the parent process
so that pageOutput knows it's good to go.)
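
A very rough sketch of that hypothetical wfShellBackgroundExec (callback
plumbing omitted; fork() under Apache has its own caveats):

    function wfShellBackgroundExec( $cmd ) {
        $pid = pcntl_fork();
        if ( $pid == -1 ) {
            $retval = 0;
            wfShellExec( $cmd, $retval ); // fork failed: degrade to blocking
            return false;
        }
        if ( $pid == 0 ) {
            $retval = 0;
            wfShellExec( $cmd, $retval ); // child does the slow work
            exit( $retval );
        }
        // Parent keeps parsing; pageOutput later does:
        //   pcntl_waitpid( $pid, $status );
        return $pid;
    }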

If, operationally, the "daemon" should be on a separate server, we should
still more or less run synchronously, as mentioned above. If possible, the
daemon should be PHP-based, so we don't explode the dependencies for
deploying robust image handling with MediaWiki.

peace,
--michael


Re: Google Summer of Code: accepted projects

Brion Vibber-3
In reply to this post by Roan Kattouw-2
Thanks for taking care of the announce mail, Roan! I spent all day
yesterday at the dentist's... whee :P

I've taken the liberty of reposting it on the tech blog:
http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/

I'd love for us to get the students set up on the blog to keep track of
their project progress and raise visibility... :D

-- brion


Re: Google Summer of Code: accepted projects

Magnus Manske-2
In reply to this post by Marco Schuster-2
On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster
<[hidden email]> wrote:

> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:
>
>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>> thumbnailing daemon, so image manipulation won't have to happen on the
>> Apache servers any more
>
>
> Wow, I'm looking forward to this. It might be worth a try to give the
> uploader the ability to choose non-standard resizing filters, or even
> full-fledged image manipulation, something like a wiki-style Photoshop.

On a semi-related note: What's the status of the management routines
that handle "throwaway" things like math PNGs?
Is this a generic system, so it can be used e.g. for Jmol PNGs in the future?
Is it integrated with the image thumbnail handling?
Should it be?

Magnus


Re: Google Summer of Code: accepted projects

Brion Vibber-3
On 4/22/09 11:13 AM, Magnus Manske wrote:

> On Wed, Apr 22, 2009 at 12:54 AM, Marco Schuster
> <[hidden email]> wrote:
>> On Wed, Apr 22, 2009 at 12:22 AM, Roan Kattouw <[hidden email]> wrote:
>>
>>> * Zhe Wu, mentored by Aryeh Gregor (Simetrical), will be building a
>>> thumbnailing daemon, so image manipulation won't have to happen on the
>>> Apache servers any more
>>
>> Wow, I'm looking forward to this. It might be worth a try to give the
>> uploader the ability to choose non-standard resizing filters, or even
>> full-fledged image manipulation, something like a wiki-style Photoshop.
>
> On a semi-related note: What's the status of the management routines
> that handle "throwaway" things like math PNGs?

There is no management for this yet, it's done ad-hoc in each such
system. :(

> Is this a generic system, so it can be used e.g. for Jmol PNGs in the future?
> Is it integrated with the image thumbnail handling?
> Should it be?

We do need a central management system for this, which can handle:

1) Storage backends other than raw filesystem

We want to migrate off of NFS to something whose failover and other
characteristics we can better control. Not having to implement the
interface a second, third, fourth, etc. time for math, timeline, and so
on would be nice.


2) Garbage collection / expiration of no-longer-used items

Right now math and timeline renderings just get stored forever and ever...


3) Sensible purging/expiration/override of old renderings when renderer
behavior changes

When we fix a bug in, upgrade, or expand the capabilities of texvc etc.,
we need to be able to re-render the new, corrected images, preferably in
a way that's friendly to caching and doesn't kill our servers with a
giant immediate crush of requests.


4) Rendering server isolation

Being able to offload rendering to a subcluster with restricted resource
limits can help avoid bringing down the entire site when there's a
runaway process (like all those image resizing problems we've seen with
giant PNGs and animated GIFs).

It may also help to do some privilege separation for services we might
not trust quite as much (shelling out to an external program with
user-supplied data? What could go wrong? :)
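
To make point 1 concrete, here's the sort of interface I mean (purely
hypothetical; nothing like it exists yet):

    // One backend interface shared by math, timeline, thumbs, etc.
    interface ManagedFileRepo {
        // Store a rendered item under a key (e.g. hash of input plus
        // renderer version).
        public function store( $key, $localPath );
        // Public URL for a stored item, or false if it isn't there yet.
        public function getUrl( $key );
        // Point 2: drop items not accessed since $timestamp.
        public function purgeUnusedSince( $timestamp );
        // Point 3: invalidate everything from renderer versions < $version.
        public function purgeOlderThanVersion( $version );
    }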

-- brion


Re: Google Summer of Code: accepted projects

K. Peachey
In reply to this post by Brion Vibber-3
On Thu, Apr 23, 2009 at 2:30 AM, Brion Vibber <[hidden email]> wrote:

> Thanks for taking care of the announce mail, Roan! I spent all day
> yesterday at the dentist's... whee :P
>
> I've taken the liberty of reposting it on the tech blog:
> http://techblog.wikimedia.org/2009/04/google-summer-of-code-student-projects-accepted/
>
> I'd love for us to get the students set up on the blog to keep track of
> their project progress and raise visibility... :D
>
> -- brion
Maybe a nice little install of WordPress MU might be in order, so they
each have a little blog they can update.


Re: Google Summer of Code: accepted projects

Wu Zhe
In reply to this post by Michael Dale-4
Michael Dale <[hidden email]> writes:

> I recommended that the image daemon run semi-synchronously, since the
> changes needed to maintain multiple states, return non-cached placeholder
> images, and manage updates and page purges for when updated images become
> available within the Wikimedia server architecture probably won't be
> completed in the Summer of Code timeline. If the student is up for it,
> the fully asynchronous concept would be useful for other components like
> video transformation/transcoding, sequence flattening, etc., but it's not
> what I would recommend for the Summer of Code timeline.

I may have problems understanding the concept of "semi-synchronous": does
it mean that when MW parses a page containing thumbnail images, the parser
sends requests to the daemon, which replies twice to each request, once
immediately with a best fit or a placeholder (synchronously) and once
later when the thumbnail is ready (asynchronously)?

> == What the image resize efforts should probably focus on instead ==
>
> (1) making the existing system "more robust", and (2) taking better
> advantage of multi-threaded servers.
>
> (1) Right now the system chokes on large images. We should deploy support
> for an in-place image resize, maybe something like VIPS
> (http://www.vips.ecs.soton.ac.uk/index.php?title=Speed_and_Memory_Use).
> The system should intelligently call VIPS to transform the image to a
> reasonable size at upload time, then use that derivative for just-in-time
> thumbs in articles. (If VIPS is unavailable, we don't transform, and we
> don't crash the Apache node.)

Wow, VIPS sounds great; I'm still reading its documentation. How is its
performance on relatively small images (not huge, a few hundred pixels in
width/height) compared with traditional single-threaded resizing programs?

> (2) Maybe spin out the image transform process early in the parsing of
> the page, with a placeholder and a callback, so that by the time all the
> templates and links have been looked up the image is ready for output.
> (Maybe another function, wfShellBackgroundExec($cmd, $callback_function),
> perhaps using pcntl_fork, then a normal wfShellExec, then pcntl_waitpid,
> then the callback function, which sets some variable in the parent
> process so that pageOutput knows it's good to go.)

An asynchronous daemon doesn't make much sense if the page purge occurs
on the server side, but what if we put off the page purge to the browser?
It would work like this:

1. the MW parser sends a request to the daemon
2. the daemon finds the work non-trivial and replies *immediately* with a
   best fit or just a placeholder
3. the browser renders the page, finds it's not final, and sends a request
   to the daemon directly using AJAX
4. the daemon replies to the browser when the thumbnail is ready
5. the browser replaces the temporary best fit / placeholder with the new
   thumb using JavaScript

The daemon now has to deal with two kinds of clients: MW servers and
browsers.
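
For steps 3-5, plain polling would do. A sketch of the daemon's
browser-facing handler (hypothetical names throughout):

    // thumb_status.php: the browser polls this until the thumb is ready.
    $key = $_GET['key']; // identifies the source file + requested width
    header( 'Content-Type: application/json' );
    if ( thumbReady( $key ) ) { // hypothetical lookup in the daemon's state
        echo json_encode( array( 'done' => true, 'url' => thumbUrl( $key ) ) );
    } else {
        echo json_encode( array( 'done' => false ) ); // browser retries soon
    }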

Letting the browser wait instead of the MW server has the benefit of
reduced latency for users, while they still have an acceptable page to
read before the image replacement happens and a perfect page afterwards.
For most users, the replacement would likely occur as soon as the page
finishes loading: transferring the page takes some time, and the daemon
would already have finished thumbnailing in the meantime.




Re: Google Summer of Code: accepted projects

Wu Zhe
In reply to this post by Wu Zhe

Sorry about the duplicates. I posted via gmane, but hadn't seen my post
show up there for some time and thought there must be something wrong
with gmane. This won't happen again.



Re: Google Summer of Code: accepted projects

Nikola Smolenski
In reply to this post by Wu Zhe
Wu Zhe wrote:

> An asynchronous daemon doesn't make much sense if the page purge occurs
> on the server side, but what if we put off the page purge to the
> browser? It would work like this:
>
> 1. the MW parser sends a request to the daemon
> 2. the daemon finds the work non-trivial and replies *immediately* with a
>    best fit or just a placeholder
> 3. the browser renders the page, finds it's not final, and sends a request
>    to the daemon directly using AJAX
> 4. the daemon replies to the browser when the thumbnail is ready
> 5. the browser replaces the temporary best fit / placeholder with the new
>    thumb using JavaScript
>
> The daemon now has to deal with two kinds of clients: MW servers and
> browsers.

To me this looks way too overcomplicated. I suggest a simpler approach
(sketched below):

1. MW copies a placeholder image to the appropriate filename; the
placeholder could be the original image, the best-matching thumb, or a PNG
with the text "wait until the thumbnail renders";
2. MW sends a request to the daemon;
3. the daemon copies the resized image over the placeholder.
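
A sketch with hypothetical helpers; the key point is that the thumb's URL
never changes, so no page purge is needed:

    function requestThumb( $src, $thumbPath, $width ) {
        // 1. Put a placeholder at the final path right away.
        copy( bestPlaceholderFor( $src, $width ), $thumbPath ); // hypothetical
        // 2. Ask the daemon to render the real thumbnail.
        enqueueForDaemon( $src, $thumbPath, $width ); // hypothetical
        // 3. The daemon writes to a temp file and rename()s it over
        //    $thumbPath, so readers never see a half-written image.
    }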


Re: Google Summer of Code: accepted projects

Wu Zhe
Nikola Smolenski <[hidden email]> writes:

> Wu Zhe wrote:
>> An asynchronous daemon doesn't make much sense if the page purge occurs
>> on the server side, but what if we put off the page purge to the
>> browser? It would work like this:
>>
>> 1. the MW parser sends a request to the daemon
>> 2. the daemon finds the work non-trivial and replies *immediately* with
>>    a best fit or just a placeholder
>> 3. the browser renders the page, finds it's not final, and sends a
>>    request to the daemon directly using AJAX
>> 4. the daemon replies to the browser when the thumbnail is ready
>> 5. the browser replaces the temporary best fit / placeholder with the
>>    new thumb using JavaScript
>>
>> The daemon now has to deal with two kinds of clients: MW servers and
>> browsers.
>
> To me this looks way too overcomplicated. I suggest a simpler approach:
>
> 1. MW copies a placeholder image to the appropriate filename; the
> placeholder could be the original image, the best-matching thumb, or a
> PNG with the text "wait until the thumbnail renders";
> 2. MW sends a request to the daemon;
> 3. the daemon copies the resized image over the placeholder.

This simpler approach differs in that it gets rid of the AJAX part: users
now have to refresh the page manually. Whether the AJAX is worth the
effort is debatable.



Re: Google Summer of Code: accepted projects

Magnus Manske-2
In reply to this post by Brion Vibber-3
I've created an initial proposal for a unified storage-handling database:

http://www.mediawiki.org/wiki/User:Magnus_Manske/File_handling

Feel free to edit and comment :-)

Cheers,
Magnus
