Wiki@Home Extension


Wiki@Home Extension

Michael Dale-4
Want to point out the working prototype of the Wiki@home extension.
Presently it focuses on a system for transcoding uploaded media to free
formats, but will also be used for "flattening sequences" and maybe
other things in the future ;)

It's still rough around the edges ... it presently features:
* Support for uploading non-free media assets,

* putting those non-free media assets into a jobs table and distributing
the transcode job into $wgChunkDuration-length encoding jobs (each
piece is uploaded then reassembled on the server; that way big
transcoding jobs can be distributed to as many clients as are
participating; see the sketch after this list)

* It supports multiple derivatives for different resolutions based on
the requested size.
** In the future I will add a hook for OggHandler to use that as well ..
since a big usability issue right now is users embedding HD or high-res
Ogg videos into a small video space in an article ... and naturally
it performs slowly.

* It also features a JavaScript interface for clients to query for new
jobs, get the job, download the asset, do the transcode & upload it (all
through an API module, so people could build a client as a shell script
if they wanted).
** In the future the interface will support preferences, basic
statistics and more options like "turn on wiki@home every time I visit
Wikipedia" or "only get jobs while I am away from my computer".

* I try to handle derivatives consistently with the "file"/media
handling system. So right now your uploaded non-free-format file will be
linked to on the file detail page and via the API calls. We should
probably limit client exposure to non-free formats. Obviously the files
have to be at a public URL to be transcoded, but the interfaces for
embedding and the stream detail page should link to the free-format
version at all times.

* I tie transcoded chunks to user IDs; this makes it easier to disable
bad participants.
** I need to add an interface to delete derivatives if someone flags
them as bad.

* It supports $wgJobTimeOut for re-assigning jobs that don't get done
within $wgJobTimeOut.
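
Roughly, the chunking and time-out bookkeeping looks something like the
sketch below. This is a simplified illustration rather than the actual
extension code: $wgChunkDuration and $wgJobTimeOut are the real config
variables, but the function and field names are made up.

  <?php
  $wgChunkDuration = 30;    // seconds of source video per encoding job
  $wgJobTimeOut    = 3600;  // seconds before an unfinished job is reassigned

  /**
   * Split a source clip of $lengthSeconds into per-chunk job records
   * that clients can claim through the API.
   */
  function wahMakeChunkJobs( $sourceUrl, $lengthSeconds ) {
      global $wgChunkDuration;
      $jobs = array();
      for ( $start = 0; $start < $lengthSeconds; $start += $wgChunkDuration ) {
          $jobs[] = array(
              'source'     => $sourceUrl,
              'start'      => $start,
              'end'        => min( $start + $wgChunkDuration, $lengthSeconds ),
              'assignedTo' => null,   // user id of the client working on it
              'assignedAt' => null,   // timestamp, checked against $wgJobTimeOut
              'done'       => false,
          );
      }
      return $jobs;
  }

  /**
   * A job that was handed out but not returned within $wgJobTimeOut goes
   * back into the pool so another client can pick it up.
   */
  function wahIsStale( array $job ) {
      global $wgJobTimeOut;
      return $job['assignedTo'] !== null
          && !$job['done']
          && ( time() - $job['assignedAt'] ) > $wgJobTimeOut;
  }

Once all the chunks for a source come back they get concatenated into the
final derivative on the server.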

This was hacked together over the past few days so it's by no means
production-ready ... but should get there soon ;)  Feedback is welcome.
It's in SVN at: /trunk/extensions/WikiAtHome/

peace,
michael



Re: Wiki@Home Extension

Gregory Maxwell
On Fri, Jul 31, 2009 at 9:51 PM, Michael Dale<[hidden email]> wrote:
> the transcode job into $wgChunkDuration length encoding jobs. ( each
> pieces is uploaded then reassembled on the server. that way big
> transcoding jobs can be distributed to as many clients that are
> participating )

This pretty much breaks the 'instant' gratification you currently get on upload.


The segmenting is going to significantly harm compression efficiency for
any inter-frame-coded output format unless you perform a two-pass
encode with the first pass on the server to do keyframe location
detection, because the stream will restart at cut points.

> * I tie transcoded chunks to user ids this makes it easier to disable
> bad participants.

Tyler Durden will be sad.

But this means that only logged in users will participate, no?


Re: Wiki@Home Extension

Michael Dale-4
Gregory Maxwell wrote:

> On Fri, Jul 31, 2009 at 9:51 PM, Michael Dale<[hidden email]> wrote:
>  
>> the transcode job into $wgChunkDuration length encoding jobs. ( each
>> pieces is uploaded then reassembled on the server. that way big
>> transcoding jobs can be distributed to as many clients that are
>> participating )
>>    
>
> This pretty much breaks the 'instant' gratification you currently get on upload.
>  

true... people will never upload to a site without instant gratification (
cough youtube cough ) ...

At any rate it's not replacing the Firefogg upload, which has instant
gratification at the point of upload; it's ~just another option~...

Also I should add that this wiki@home system just gives us distributed
transcoding as a bonus side effect ... its real purpose will be to
distribute the flattening of edited sequences. So that 1) IE users can
view them, 2) we can use effects that for the time being are too
computationally expensive to render out in real time in JavaScript, 3)
you can download and play the sequences with normal video players, and 4)
we can transclude sequences and use templates, with changes propagating
to flattened versions rendered on the wiki@home distributed computer.

While presently many machines in the Wikimedia internal server cluster
grind away at parsing and rendering HTML from wikitext, the situation is
many orders of magnitude more costly when using transclusion and templates
with video ... so it's good to get this type of extension out in the wild
and warmed up for the near future ;)


> The segmenting is going to significantly harm compression efficiency for
> any inter-frame-coded output format unless you perform a two-pass
> encode with the first pass on the server to do keyframe location
> detection, because the stream will restart at cut points.
>
>  

also true. Good thing theora-svn now supports two-pass encoding :) ...
but an extra keyframe every 30 seconds probably won't hurt your
compression efficiency too much.. vs the gain of having your hour-long
interview transcode a hundred times faster than non-distributed
conversion (almost instant gratification). Once the cost of generating
a derivative is on par with the cost of sending out the clip a few times
for "viewing", lots of things become possible.
>> * I tie transcoded chunks to user ids this makes it easier to disable
>> bad participants.
>>    
>
> Tyler Durden will be sad.
>
> But this means that only logged in users will participate, no?
>  

true...  You also have to log in to upload to Commons....  It will make
life easier and make abuse of the system more difficult.. plus it can
act as a motivating factor with distributed@home teams, personal stats
and all that jazz, just as people like to have their name show up on the
"donate" wall when making small financial contributions.

peace,
--michael


Re: Wiki@Home Extension

Gregory Maxwell
On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale<[hidden email]> wrote:
> true... people will never upload to site without instant gratification (
> cough youtube cough ) ...

Hm? I just tried uploading to youtube and there was a video up right
away. Other sizes followed within a minute or two.

> At any rate its not replacing the firefogg  that has instant
> gratification at point of upload its ~just another option~...

As another option— okay. But video support on the site stinks because
of the lack of server-side 'thumbnailing' for video.  People upload
multi-megabit videos, which is a good thing for editing, but then they
don't play well for most users.

Just doing it locally is hard— we've had failed SOC projects for this—
doing it distributed has all the local complexity and then some.

> Also I should add that this wiki@home system just gives us distributed
> transcoding as a bonus side effect ... its real purpose will be to
> distribute the flattening of edited sequences. So that 1) IE users can
> view them 2) We can use effects that for the time being are too
> computationally expensive to render out in real-time in javascript 3)
> you can download and play the sequences with normal video players and 4)
> we can transclude sequences and use templates with changes propagating
> to flattened versions rendered on the wiki@home distributed computer

I'm confused as to why this isn't being done locally at Wikimedia.
Creating some whole distributed thing seems to be trading off
something inexpensive (machine cycles) for something there is less
supply of— skilled developer time.  Processing power is really
inexpensive.

Some old copy of ffmpeg2theora on a single core of my Core 2 desktop
processes a 352x288 input video at around 100 Mbit/sec (input video
consumption rate). Surely the time and cost required to send a bunch
of source material to remote hosts is going to offset whatever benefit
this offers.

We're also creating a whole additional layer of cost in that someone
has to police the results.

Perhaps my Tyler Durden reference was too indirect:

* Create a new account
* splice some penises 30 minutes into some talking head video
* extreme lulz.

Tracking down these instances and blocking these users seems like it
would be a full-time job for a couple of people, and it would only be
made worse if the naughtiness could be targeted at particular
resolutions or fallbacks (making it less likely that clueful people
will see the vandalism).


> While presently many machines in the Wikimedia internal server cluster
> grind away at parsing and rendering HTML from wikitext, the situation is
> many orders of magnitude more costly when using transclusion and templates
> with video ... so it's good to get this type of extension out in the wild
> and warmed up for the near future ;)

In terms of work per byte of input the wikitext parser is thousands of
times slower than the Theora encoder. Go go inefficient software. As a
result the difference may be less than many would assume.

Once you factor in the ratio of video to non-video content for the
foreseeable future this comes off looking like a time-wasting
boondoggle.

Unless the basic functionality— like downsampled videos that people
can actually play— is created I can't see there ever being a time
where some great distributed thing will do any good at all.

>> The segmenting is going to significantly harm compression efficiency for
>> any inter-frame-coded output format unless you perform a two-pass
>> encode with the first pass on the server to do keyframe location
>> detection, because the stream will restart at cut points.
>
> also true. Good thing theora-svn now supports two pass encoding :) ...

Yea, great, except doing the first pass for segmentation is pretty
similar in computational cost to simply doing a one-pass encode of
the video.

> but an extra keyframe every 30 seconds probably won't hurt your
> compression efficiency too much..

It's not just about keyframe locations— if you encode separately and
then merge, you lose the ability to provide continuous rate control. So
there would be large bitrate spikes at the splice intervals which will
stall streaming for anyone without significantly more bandwidth than
the clip's rate.

> vs the gain of having your hour long
> interview trans-code a hundred times faster than non-distributed
> conversion.  (almost instant gratification)

Well tuned, you can expect a distributed system to improve throughput
at the expense of latency.

Sending out source material to a bunch of places, having them crunch
on it on whatever slow hardware they have, then sending it back may
win on the dollars per throughput front, but I can't see that having
good latency.

> true...  You also have to log in to upload to commons....  It will make
> life easier and make abuse of the system more difficult.. plus it can

Having to create an account does pretty much nothing to discourage
malicious activity.

> act as a motivation factor with distributed@home teams, personal stats
> and all that jazz. Just as people like to have their name show up on the
> "donate" wall when making small financial contributions.

"But why donate? Wikimedia is all distributed, right?"

But whatever— It seems that the goal here is to create trendy buzzword
technology while basic functionality like simple thumbnailing sits
completely unaddressed.


Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell <[hidden email]> wrote:

> On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale<[hidden email]> wrote:
>
>
> Once you factor in the ratio of video to non-video content for the
> for-seeable future this comes off looking like a time wasting
> boondoggle.
>

I think you vastly underestimate the amount of video that will be uploaded.
Michael is right in thinking big and thinking distributed. CPU cycles are
not *that* cheap. There is a lot of free video out there and as soon as we
have a stable system in place wikimedians are going to have a heyday
uploading it to Commons.

Re: Wiki@Home Extension

Gregory Maxwell
On Sat, Aug 1, 2009 at 2:54 AM, Brian<[hidden email]> wrote:
> On Sat, Aug 1, 2009 at 12:47 AM, Gregory Maxwell <[hidden email]> wrote:
>> On Sat, Aug 1, 2009 at 12:13 AM, Michael Dale<[hidden email]> wrote:
>> Once you factor in the ratio of video to non-video content for the
>> for-seeable future this comes off looking like a time wasting
>> boondoggle.
> I think you vastly underestimate the amount of video that will be uploaded.
> Michael is right in thinking big and thinking distributed. CPU cycles are
> not *that* cheap.

Really rough back-of-the-napkin numbers:

My desktop has an X3360 CPU. You can build systems all day using this
processor for $600 (I think I spent $500 on it 6 months ago).  There
are processors with better price/performance available now, but I can
benchmark on this.

Commons is getting roughly 172,076 uploads per month now across all
media types: scans of single pages, photographs copied from Flickr,
audio pronunciations, videos, etc.

If everyone switched to uploading 15-minute SD videos instead of
other things, there would be 154,868,400 seconds of video uploaded to
Commons per month. Truly a staggering amount. Assuming a 40-hour work
week, it would take over 250 people working full time just to *view*
all of it.

That number is an average rate of 58.9 seconds of video uploaded per
second, every second of the month.

Using all four cores, my desktop encodes video at >16x realtime (for
moderate-motion standard-def input using the latest Theora 1.1 SVN).

So you'd need fewer than four of those systems to keep up with the
entire Commons upload rate switched to 15-minute videos.  Okay, it
would be slow at peak hours and you might wish to produce a couple of
versions at different resolutions, so multiply that by a couple.

This is what I meant by processing being cheap.

If the uploads were all compressed at a bitrate of 4 Mbit/sec, users
were kind enough to spread their uploads out through the day, the
distributed system were perfectly efficient (only needing to send one
copy of the upload out), and Wikimedia were only paying
$10/Mbit/sec/month for transit out of their primary datacenter... we'd
find that the bandwidth cost of sending that source material out
again would be $2,356/month. (58.9 seconds per second * 4 Mbit/sec *
$10/Mbit/sec/month)

(Since transit billing is on the 95th-percentile 5-minute average of
the greater of inbound or outbound, uploads are basically free, but
sending out data to the 'cloud' costs like anything else.)

So under these assumptions, sending out compressed video for
re-encoding is likely to cost roughly as much *each month* as the
hardware for local transcoding... and the pace of processing speed-up
seems to be significantly better than the declining prices for
bandwidth.

This is also what I meant by processing being cheap.
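
Spelled out in one place (same figures as above; the only added
assumption is ~30.4 days per month):

  <?php
  $uploadsPerMonth = 172076;                       // current Commons uploads/month
  $videoSeconds    = $uploadsPerMonth * 15 * 60;   // 154,868,400 s if every upload were a 15-min video
  $secondsPerMonth = 30.4 * 24 * 3600;             // ~2,626,560 s in a month
  $uploadRate      = $videoSeconds / $secondsPerMonth;  // ~58.9 s of video per wall-clock second

  $encodersNeeded  = $uploadRate / 16;   // ~3.7 boxes encoding at >16x realtime
  $transitPerMonth = 58.9 * 4 * 10;      // 4 Mbit/s source * $10/Mbit/s/month ~= $2,356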

Because uploads won't be uniformly spaced you'll need some extra
resources to keep things from getting bogged down at peak hours. But the
poor peak-to-average ratio also works against the bandwidth costs. You
can't win: unless you assume that uploads are going to be very low
bitrate, local transcoding will always be cheaper, with very short
payoff times.

I don't know how to figure out how much it would 'cost' to have human
contributors spot embedded penises snuck into transcodes, figure out
which of several contributing transcoders are doing it, and block them,
only to have the bad user switch IPs and begin again ... but it seems
impossibly expensive even though it's not an actual dollar cost.


> There is a lot of free video out there and as soon as we
> have a stable system in place wikimedians are going to have a heyday
> uploading it to Commons.

I'm not saying that there won't be video; I'm saying there won't be
video if development time is spent on fanciful features rather than
desperately needed short-term functionality.  We have tens of
thousands of videos, many of which don't stream well for most people
because they need thumbnailing.

Firefogg was useful upload lubrication. But user-powered cloud
transcoding?  I believe the analysis I provided above demonstrates
that resources would be better applied elsewhere.


Re: Wiki@Home Extension

David Gerard-2
In reply to this post by Brian J Mingus
2009/8/1 Brian <[hidden email]>:

> I think you vastly underestimate the amount of video that will be uploaded.
> Michael is right in thinking big and thinking distributed. CPU cycles are
> not *that* cheap. There is a lot of free video out there and as soon as we
> have a stable system in place wikimedians are going to have a heyday
> uploading it to Commons.


Oh hell yes. If I could just upload any AVI or MPEG4 straight off a
camera, you bet I would. Just imagine what people who've never heard
the word "Theora" will do.


- d.


Re: Wiki@Home Extension

Gergő Tisza
In reply to this post by Gregory Maxwell
Gregory Maxwell <gmaxwell@gmail.com> writes:

> I don't know how to figure out how much it would 'cost' to have human
> contributors spot embedded penises snuck into transcodes and then
> figure out which of several contributing transcoders are doing it and
> blocking them, only to have the bad user switch IPs and begin again.
> ... but it seems impossibly expensive even though it's not an actual
> dollar cost.

The standard solution to that is to perform each operation multiple times on
different machines and then compare the results. Of course, that raises
bandwidth costs even further.
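
A minimal sketch of what the comparison could look like on the receiving
side (nothing like this exists in the extension, and it assumes the
encodes are bit-identical across clients, i.e. the same encoder build and
settings, which is itself a strong assumption):

  <?php
  // Accept a chunk only when enough independently submitted copies agree
  // byte-for-byte; otherwise send the job out again.
  function pickAgreedResult( array $uploadedChunkPaths, $minAgreement = 2 ) {
      $byHash = array();
      foreach ( $uploadedChunkPaths as $path ) {
          $byHash[ sha1_file( $path ) ][] = $path;
      }
      foreach ( $byHash as $hash => $paths ) {
          if ( count( $paths ) >= $minAgreement ) {
              return $paths[0];   // enough clients produced the same bytes
          }
      }
      return false;               // no consensus: re-dispatch the job
  }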



Re: Wiki@Home Extension

Kat Walsh-4
In reply to this post by David Gerard-2
On Sat, Aug 1, 2009 at 9:57 AM, David Gerard<[hidden email]> wrote:

> 2009/8/1 Brian <[hidden email]>:
>
>> I think you vastly underestimate the amount of video that will be uploaded.
>> Michael is right in thinking big and thinking distributed. CPU cycles are
>> not *that* cheap. There is a lot of free video out there and as soon as we
>> have a stable system in place wikimedians are going to have a heyday
>> uploading it to Commons.
>
>
> Oh hell yes. If I could just upload any AVI or MPEG4 straight off a
> camera, you bet I would. Just imagine what people who've never heard
> the word "Theora" will do.

Even if so, I don't think assuming that every single commons upload at
the current rate will instead be a 15-minute video is much of an
underestimate...

-Kat


--
Your donations keep Wikipedia online: http://donate.wikimedia.org/en
Wikimedia, Press: [hidden email] * Personal: [hidden email]
http://en.wikipedia.org/wiki/User:Mindspillage * (G)AIM:Mindspillage
mindspillage or mind|wandering on irc.freenode.net * email for phone


Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 10:12 AM, Kat Walsh <[hidden email]> wrote:

> On Sat, Aug 1, 2009 at 9:57 AM, David Gerard<[hidden email]> wrote:
> > 2009/8/1 Brian <[hidden email]>:
> >
> >> I think you vastly underestimate the amount of video that will be uploaded.
> >> Michael is right in thinking big and thinking distributed. CPU cycles are
> >> not *that* cheap. There is a lot of free video out there and as soon as we
> >> have a stable system in place wikimedians are going to have a heyday
> >> uploading it to Commons.
> >
> >
> > Oh hell yes. If I could just upload any AVI or MPEG4 straight off a
> > camera, you bet I would. Just imagine what people who've never heard
> > the word "Theora" will do.
>
> Even if so, I don't think assuming that every single commons upload at
> the current rate will instead be a 15-minute video is much of an
> underestimate...
>
> -Kat
>

A reasonable estimate would require knowledge of how much free video can be
automatically acquired, its metadata automatically parsed, and then
automatically uploaded to Commons. I am aware of some massive archives of
free-content video. Current estimates based on images do not necessarily
apply to video, especially as we are just entering a video-aware era of the
internet. At any rate, while Gerard's estimate is a bit optimistic in my
view, it seems realistic for the near term.

Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 10:17 AM, Brian <[hidden email]> wrote:

>
> [snip]
>
> A reasonable estimate would require knowledge of how much free video can be
> automatically acquired, it's metadata automatically parsed and then
> automatically uploaded to commons. I am aware of some massive archives of
> free content video. Current estimates based on images do not necessarily
> apply to video, especially as we are just entering a video-aware era of the
> internet. At any rate, while Gerard's estimate is a bit optimistic in my
> view, it seems realistic for the near term.
>

Sorry, looked up to the wrong message - Gregory's estimate.

Re: Wiki@Home Extension

Gregory Maxwell
In reply to this post by Brian J Mingus
On Sat, Aug 1, 2009 at 12:17 PM, Brian<[hidden email]> wrote:
> A reasonable estimate would require knowledge of how much free video can be
> automatically acquired, it's metadata automatically parsed and then
> automatically uploaded to commons. I am aware of some massive archives of
> free content video. Current estimates based on images do not necessarily
> apply to video, especially as we are just entering a video-aware era of the
> internet. At any rate, while Gerard's estimate is a bit optimistic in my
> view, it seems realistic for the near term.

So—  The plan is that we'll lose money on every transaction but we'll
make it up in volume?

(Again, this time without math: the rate of increase, as a function of
video-minutes, of the amortized hardware cost for local
transcoding is lower than the rate of increase in bandwidth costs
needed to send off the source material to users to transcode in a
distributed manner. This holds for pretty much any reasonable source
bitrate, though I used 4 Mbit/sec in my calculation.  So regardless of
the amount of video being uploaded, using users is simply more
expensive than doing it locally.)

Existing distributed computing projects work because the ratio of
CPU-crunching to communicating is enormously high. This isn't (and
shouldn't be) true for video transcoding.

They also work because there is little reward for tampering with the
system. I don't think this is true for our transcoding. There are many
who would be greatly gratified by splicing penises into streams, far
more so than by anonymously and undetectably making a protein fold wrong.

... and it's only reasonable to expect the cost gap to widen.

On Sat, Aug 1, 2009 at 9:57 AM, David Gerard<[hidden email]> wrote:
> Oh hell yes. If I could just upload any AVI or MPEG4 straight off a
> camera, you bet I would. Just imagine what people who've never heard
> the word "Theora" will do.

Sweet! Except, *instead* of developing the ability to upload straight
off a camera, what is being developed is user-distributed video
transcoding— which won't do anything itself to make it easier to
upload.

What it will do is waste precious development cycles maintaining an
overly complicated software infrastructure, waste precious Commons
administration cycles hunting subtle and confusing sources of
vandalism, and waste income from donors by spending more on additional
outbound bandwidth than would be spent on computing resources to
transcode locally.


Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 11:04 AM, Gregory Maxwell <[hidden email]> wrote:

> On Sat, Aug 1, 2009 at 12:17 PM, Brian<[hidden email]> wrote:
> > A reasonable estimate would require knowledge of how much free video can be
> > automatically acquired, its metadata automatically parsed, and then
> > automatically uploaded to Commons. I am aware of some massive archives of
> > free-content video. Current estimates based on images do not necessarily
> > apply to video, especially as we are just entering a video-aware era of the
> > internet. At any rate, while Gerard's estimate is a bit optimistic in my
> > view, it seems realistic for the near term.
>
> So—  The plan is that we'll lose money on every transaction but we'll
> make it up in volume?


There are always tradeoffs. If I understand wiki@home correctly, it is also
intended to be run @foundation. It works just as well for distributing
transcoding over the foundation cluster as it does for distributing it to
disparate clients. Thus, if the foundation encounters a CPU backlog and
wishes to distribute some long-running jobs to @home clients in order to
maintain realtime operation of the site in exchange for bandwidth, it could.
Through this method the foundation could handle transcoding spikes of
arbitrary size. In the case of spikes, @foundation can do first-pass
get-something-back-to-the-user-now encoding and pass the rest of the tasks
to @home.

Re: Wiki@Home Extension

Gregory Maxwell
On Sat, Aug 1, 2009 at 1:13 PM, Brian<[hidden email]> wrote:
>
> There are always tradeoffs. If I understand wiki@home correctly it is also
> intended to be run @foundation. It works just as well for distributing
> transcoding over the foundation cluster as it does for distributing it to
> disparate clients.

There is nothing in the source code that suggests that.

It currently requires the compute nodes to be running the Firefogg
browser extension.  So this would require loading an X server and
Firefox onto the servers in order to have them participate as it is
now.  The video data has to take a round trip through PHP and the
upload interface, which doesn't really make any sense; that alone could
well take as much time as the actual transcode.

As a server distribution infrastructure it would be an inefficient one.

Much of the code in the extension appears to be there to handle issues
that simply wouldn't exist in the local transcoding case.   I would
have no objection to a transcoding system designed for local operation
with some consideration made for adding externally distributed
operation in the future if it ever made sense.

Incidentally— The slice and recombine approach using oggCat in
WikiAtHome produces files with gaps in the granpos numbering and audio
desync for me.


Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 11:47 AM, Gregory Maxwell <[hidden email]> wrote:

> As a server distribution infrastructure [snip]
>

It had occurred to me that wiki@home might be better generalized to a
heterogeneous compute cloud for foundation-trusted code. The idea would be
QEMU sandboxes distributed via BOINC. So the foundation could distribute
transcoder sandboxes to a certain number of clients, and sandboxes specific
to the needs of researchers using datasets such as the dumps, which are often
easily parallelized using map/reduce. The head node would sit on the
toolserver. The QEMU instances would run Ubuntu. The researcher submits a job,
which consists of a directory containing his code, his data, and a file
describing the map/reduce partitioning of the data. The head node compiles
the code into a QEMU instance and uses BOINC to map it to a client that is
running Win/Linux/Mac.  Crazy, right? ;-)

Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 12:05 PM, Brian <[hidden email]> wrote:

> On Sat, Aug 1, 2009 at 11:47 AM, Gregory Maxwell <[hidden email]>wrote:
>
>> As a server distribution infrastructure [snip]
>>
>
> [snip]
>

Various obvious efficiency improvements occurred to me. If the clients are
already running an Ubuntu QEMU instance then they can simply be shipped the
code and the data. They compile the code and run their portion of the data.
The transcoder clients sit idle with a transcoder instance ready, process
the data and send it back. Obviously, it is not very optimal to ship out an
entire OS for every job.. :)

Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 12:18 PM, Brian <[hidden email]> wrote:

>
> [snip]
>
> Various obvious efficiency improvements occurred to me. If the clients are
> already running an Ubuntu QEMU instance then they can simply be shipped the
> code and the data. They compile the code and run their portion of the data.
> The transcoder clients sit idle with a transcoder instance ready, process
> the data and send it back. Obviously, it is not very optimal to ship out an
> entire OS for every job.. :)
>

And of course, you can just ship them the binaries!

Re: Wiki@Home Extension

Michael Dale-4
In reply to this post by Gregory Maxwell
Some notes:
* ~It's mostly an API~. We can run it internally if that is more cost
efficient. (Will do a command-line client shortly.) ... (As mentioned
earlier, the present code was hacked together quickly; it's just a
prototype. I will generalize things to work better as internal jobs,
and I think I will not create File:Myvideo.mp4 wiki pages but rather create
a placeholder File:Myvideo.ogg page and only store the derivatives
outside of the wiki page node system. I also notice some sync issues with
oggCat which are under investigation.)

* Clearly CPUs are cheap; so is power for the computers, human
resources for system maintenance, rack space and internal network
management, and we of course will want to "run the numbers" on any
solution we go with. I think your source bitrate assumption was a little
high; I would think more like 1-2 Mbit/s (with cell-phone cameras targeting
low bitrates for transport and desktops re-encoding before upload). But
I think this whole conversation is missing the larger issue, which is: if
it's cost-prohibitive to distribute a few copies for transcode, how are we
going to distribute the derivatives thousands of times for viewing?
Perhaps future work in this area should focus more on the distribution
bandwidth cost issue.

* Furthermore, I think I might have misrepresented wiki@home; I should
have more clearly focused on the sequence flattening and only mentioned
transcoding as an option. With sequence flattening we have a more
standard viewing bitrate of source material, and CPU costs for rendering
are much higher. At present there is no fast way to overlay HTML/SVG on
video with filters and effects that are presently only predictably
defined in JavaScript. For this reason we use the browser to WYSIWYG-render
out the content. Eventually we may want to write an optimized
stand-alone flattener, but for now the wiki@home solution is worlds less
costly in terms of developer resources, since we can use the "editor" to
output the flat file.

* And finally, yes ... you can already insert a penis into video uploads
today, with something like:

  ffmpeg2theora -s 0 -e 42.2 -o part1.ogg someVideo.ogg
  ffmpeg2theora -s 42.2 -o part2.ogg someVideo.ogg
  oggCat spliced.ogg part1.ogg myOneFramePenis.ogg part2.ogg

But yeah, it's one more level to worry about, and if it's cheaper to do it
internally (the transcodes, not the penis insertion) we should do it
internally. :P  (I hope others appreciate the multiple levels of humor here.)

peace,
michael

Gregory Maxwell wrote:

> [snip]



Re: Wiki@Home Extension

David Gerard-2
In reply to this post by Brian J Mingus
2009/8/1 Brian <[hidden email]>:

> And of course, you can just ship them the binaries!


Trusted clients are impossible. Particularly for protecting against
lulz-seekers.


- d.


Re: Wiki@Home Extension

Brian J Mingus
On Sat, Aug 1, 2009 at 1:07 PM, David Gerard <[hidden email]> wrote:

> 2009/8/1 Brian <[hidden email]>:
>
> > And of course, you can just ship them the binaries!
>
>
> Trusted clients are impossible. Particularly for protecting against
> lulz-seekers.
>
>
> - d.
>
>
Impossible? That's hyperbole.