Upload large files

Magnus Manske
I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
some are larger than 20 MB. Uploading those leads me back to a blank
upload page, without comment or error. 20MB seems to be a magical limit
for PHP.

Is there a way to bypass that limit? I'd hate to have to cut perfectly
good ogg files.

Magnus


_______________________________________________
Wikitech-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Re: Upload large files

Steve Bennett-4
On 8/21/06, Magnus Manske <[hidden email]> wrote:
> I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
> some are larger than 20 MB. Uploading those leads me back to a blank
> upload page, without comment or error. 20MB seems to be a magical limit
> for PHP.

You mean a magical limit for uploading to MediaWiki? Maybe a nice
person with access to the servers would copy them for you? :)

Steve

Re: Upload large files

Rob Church
On 21/08/06, Steve Bennett <[hidden email]> wrote:
> On 8/21/06, Magnus Manske <[hidden email]> wrote:
> > I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
> > some are larger than 20 MB. Uploading those leads me back to a blank
> > upload page, without comment or error. 20MB seems to be a magical limit
> > for PHP.
>
> You mean a magical limit for uploading to MediaWiki? Maybe a nice
> person with access to the servers would copy them for you? :)

There are limits within MediaWiki, PHP and Apache. Magic can come from
more than one direction.


Rob Church

Re: Upload large files

Magnus Manske
Rob Church schrieb:

> On 21/08/06, Steve Bennett <[hidden email]> wrote:
>  
>> On 8/21/06, Magnus Manske <[hidden email]> wrote:
>>    
>>> I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
>>> some are larger than 20 MB. Uploading those leads me back to a blank
>>> upload page, without comment or error. 20MB seems to be a magical limit
>>> for PHP.
>>>      
>> You mean a magical limit for uploading to MediaWiki? Maybe a nice
>> person with access to the servers would copy them for you? :)
>>    
>
> There are limits within MediaWiki, PHP and Apache. Magic can come from
> more than one direction.
>  
I don't think it's MediaWiki, as it doesn't display the
file-is-too-large message, but just returns a blank upload form. And
AFAIK, Apache doesn't have an upload size limit /per se/ (though you can
set one, apparently, with LimitRequestBody). Thus, I'd bet on PHP to be
the culprit. I know we've raised the PHP upload limit, but maybe not
enough, or maybe there is a compiled-in limit?
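Since Apache, PHP and MediaWiki can each reject a request, the ceiling that actually applies is the smallest of them. A minimal Python sketch, not MediaWiki or PHP code: it assumes the limits have already been read from Apache's LimitRequestBody, PHP's upload_max_filesize and post_max_size, and any MediaWiki-side limit, and reports which one bites first. The example values are hypothetical. One PHP behaviour that may be relevant: when a request exceeds post_max_size, PHP leaves $_POST and $_FILES empty, so the script sees what looks like a fresh, empty form, which would match the blank page described above.

```python
# Minimal sketch: find the tightest of several upload limits.
# Layer names and example values are hypothetical; real values would
# come from httpd.conf, php.ini and the wiki configuration.

# PHP shorthand suffixes (case-insensitive): K, M, G.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_php_size(value: str) -> int:
    """Convert PHP-style shorthand such as '20M' or '512k' into bytes."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def effective_upload_limit(limits: dict[str, int]) -> tuple[str, int]:
    """Return (name, bytes) of the smallest active limit.

    A value of 0 is treated as "no limit", which is what 0 means for
    Apache's LimitRequestBody.
    """
    active = {name: size for name, size in limits.items() if size > 0}
    if not active:
        raise ValueError("no upload limit configured at any layer")
    name = min(active, key=active.get)
    return name, active[name]


if __name__ == "__main__":
    example = {
        "Apache LimitRequestBody": 0,                      # unlimited
        "PHP upload_max_filesize": parse_php_size("20M"),
        "PHP post_max_size": parse_php_size("100M"),
        "MediaWiki limit": parse_php_size("100M"),
    }
    # prints ('PHP upload_max_filesize', 20971520)
    print(effective_upload_limit(example))
```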


Also, I don't think manually copying files would be a good idea. Even if
you replaced an existing upload, at least the img_size field would be
wrong. And manually copying a file, then manually updating the database so
that it shows the correct values, just to upload a larger-than-usual file
is ... not good at all.

Magnus

Re: Upload large files

Magnus Manske
Also, that error should at least display a message, not just return a
blank upload form.

Magnus

Re: Upload large files

Rob Church
In reply to this post by Magnus Manske
On 21/08/06, Magnus Manske <[hidden email]> wrote:
> I don't think it's MediaWiki, as it doesn't display the
> file-is-too-large message, but just returns a blank upload form. And
> AFAIK, Apache doesn't have an upload size limit /per se/ (though you can
> set one, apparently, with LimitRequestBody). Thus, I'd bet on PHP to be
> the culprit. I know we've raised the PHP upload limit, but maybe not
> enough, or maybe there is a compiled-in limit?

Yeah, I was advising Steve that more than one possible limit exists. :)


Rob Church

Re: Upload large files

Steve Bennett-4
On 8/21/06, Rob Church <[hidden email]> wrote:
> Yeah, I was advising Steve that more than one possible limit exists. :)

Heh, I wasn't actually meaning to blame MediaWiki, but it sounded like it.

Ugly kludge of the day idea: add a "feature" where two files can be
appended to create a new file. Then users could upload large files in
20 MB segments and concatenate them afterwards.
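Steve's segment-and-concatenate idea could look roughly like this. A minimal Python sketch, assuming the client splits the file and supplies a SHA-1 of the whole file so the server can check the reassembled result; the function names, the checksum step and the in-memory handling are illustrative assumptions, not an existing MediaWiki feature.

```python
import hashlib

# Hypothetical segment size, matching the ~20 MB limit discussed above.
SEGMENT_SIZE = 20 * 1024 * 1024


def split_bytes(data: bytes, segment_size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut data into consecutive segments of at most segment_size bytes."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def concatenate(segments: list[bytes], expected_sha1: str) -> bytes:
    """Join segments in upload order and verify the whole-file checksum."""
    joined = b"".join(segments)
    if hashlib.sha1(joined).hexdigest() != expected_sha1:
        raise ValueError("reassembled file does not match the checksum")
    return joined


if __name__ == "__main__":
    original = b"OggS" + bytes(5000)      # stand-in for an .ogg file
    parts = split_bytes(original, 2048)   # small segments for the demo
    print([len(p) for p in parts])        # prints [2048, 2048, 908]
    restored = concatenate(parts, hashlib.sha1(original).hexdigest())
    print(restored == original)           # prints True
```

The checksum guards against segments arriving out of order or truncated, which a plain "append file B to file A" operation would not notice.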

Steve

Re: Upload large files

Tim Starling
In reply to this post by Magnus Manske
Magnus Manske wrote:
> I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
> some are larger than 20 MB. Uploading those leads me back to a blank
> upload page, without comment or error. 20MB seems to be a magical limit
> for PHP.
>
> Is there a way to bypass that limit? I'd hate to have to cut perfectly
> good ogg files.

PHP stores the entire contents of the POST request in memory, as it is
receiving it. That's why we can't allow arbitrarily large uploads, the
server would run out of memory. In any case, HTTP does not support resuming
for uploads, so it's quite fragile.
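For contrast, a handler that never holds the whole body in memory reads it in fixed-size chunks and writes each one out immediately, so memory use is bounded by the chunk size. A minimal Python sketch; the chunk size and function name are hypothetical, it says nothing about how PHP itself is implemented, and it does not address the lack of resuming.

```python
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical; memory per read stays at this size


def stream_to_file(source: BinaryIO, dest: BinaryIO, max_bytes: int,
                   chunk_size: int = CHUNK_SIZE) -> int:
    """Copy source to dest chunk by chunk, refusing more than max_bytes.

    Returns the number of bytes written. On ValueError, dest holds a
    partial copy that the caller should discard.
    """
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:          # end of the request body
            return written
        if written + len(chunk) > max_bytes:
            raise ValueError(f"upload exceeds {max_bytes} bytes")
        dest.write(chunk)
        written += len(chunk)


if __name__ == "__main__":
    body = io.BytesIO(b"\x00" * 200_000)   # stands in for a request body
    out = io.BytesIO()                     # stands in for a temp file
    print(stream_to_file(body, out, max_bytes=1_000_000))  # prints 200000
```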

Ideally, we should use a protocol which is designed for uploading large
files efficiently and robustly. FTP is one such protocol, that's what
archive.org use for their video and audio uploads. They do it like this:

1. When a web account is created, an FTP account and home directory is set up.
2. Via a PHP script, the user gives the name of the collection of files they
want to upload. The script creates a directory for the upload on the FTP server.
3. The user logs in to the FTP server using their web username and password.
They upload the files using an FTP client.
4. The user "checks in" the files. There is an HTML file in the FTP
directory called "CLICK_HERE_WHEN_DONE.htm" which does this operation via a
meta refresh. Alternatively, it's done automatically 48 hours after the
directory creation.
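The archive.org-style flow in steps 1 to 4 amounts to a small state machine per upload directory. A minimal Python sketch of steps 2 and 4 (one directory per collection, explicit check-in, automatic check-in after 48 hours); the class, field names and path layout are hypothetical, and the FTP server itself is out of scope.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Step 4: unchecked directories are checked in automatically after this.
AUTO_CHECKIN_AFTER = timedelta(hours=48)


@dataclass
class UploadDirectory:
    """One collection directory on the FTP server (hypothetical model)."""
    user: str
    collection: str
    created: datetime
    checked_in: bool = False

    @property
    def path(self) -> str:
        # Step 2: one directory per collection under the user's home
        # directory (layout assumed for illustration).
        return f"/home/{self.user}/{self.collection}"

    def check_in(self) -> None:
        # Step 4: the user signals that the files are complete
        # (archive.org uses a CLICK_HERE_WHEN_DONE.htm page for this).
        self.checked_in = True

    def is_ready(self, now: datetime) -> bool:
        """Ready for processing once checked in, or 48 h after creation."""
        return self.checked_in or now - self.created >= AUTO_CHECKIN_AFTER


if __name__ == "__main__":
    t0 = datetime(2006, 8, 22, 12, 0)
    d = UploadDirectory("Magnus", "classical-ogg", t0)
    # prints /home/Magnus/classical-ogg False
    print(d.path, d.is_ready(t0 + timedelta(hours=47)))
```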

So we could set up something like that. Or maybe we could just outsource our
large file handling to them. It'd certainly save on hard drive costs,
wouldn't it?

-- Tim Starling

Re: Upload large files

Magnus Manske
Tim Starling schrieb:

> Magnus Manske wrote:
>  
>> I've found some nice classical ogg files online (CC-BY-SA-2.0). However,
>> some are larger than 20 MB. Uploading those leads me back to a blank
>> upload page, without comment or error. 20MB seems to be a magical limit
>> for PHP.
>>
>> Is there a way to bypass that limit? I'd hate to have to cut perfectly
>> good ogg files.
>>    
>
> PHP stores the entire contents of the POST request in memory, as it is
> receiving it. That's why we can't allow arbitrarily large uploads, the
> server would run out of memory. In any case, HTTP does not support resuming
> for uploads, so it's quite fragile.
>
> Ideally, we should use a protocol which is designed for uploading large
> files efficiently and robustly. FTP is one such protocol, that's what
> archive.org use for their video and audio uploads.
Or we could use a "mixed" solution:
* I upload my file to a publicly accessible location (ftp or http, no
matter), if it's not already online
* I call "Special:Upload?source=web"
* The upload <input> is replaced with a simple text input row for the URL
* Instead of using the PHP upload mechanism, MediaWiki just copies the
file through ftp/http

Advantages:
* Simple changes to MediaWiki
* No need to set up ftp accounts etc.

Disadvantages:
* User needs a place to store files temporarily online (shouldn't be too
hard these days)
* People might copy stuff from anywhere on the web (they can do that
already, but only for small files ;-))
We might want to restrict this in some creative way; at least we could
dynamically set the size limit (20 MB for newbies, 1GB for admins;-)

I'd volunteer to implement the above; sounds like just a few lines of
code (with a simple hard limit, say, 100MB for everyone).
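The checks proposed here (fetch only over ftp/http, a size limit that may depend on the user) could be sketched as below. A minimal Python sketch, not the code that was committed: the group names, the limits (taken from the "20 MB for newbies, 1 GB for admins" remark) and the function names are assumptions. A declared size, such as a Content-Length header, can be wrong or missing, so the returned limit would still have to be enforced while the data is copied.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical per-group limits, following "20 MB for newbies, 1 GB for
# admins". "user" and "sysop" are used here as illustrative group names.
GROUP_LIMITS = {
    "user": 20 * 1024 ** 2,
    "sysop": 1024 ** 3,
}
# ftp and http as proposed; https added as an assumption.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def upload_limit_for(groups: set[str]) -> int:
    """Most generous limit among the user's groups; 0 means not allowed."""
    return max((GROUP_LIMITS.get(g, 0) for g in groups), default=0)


def check_copy_upload(url: str, declared_size: Optional[int],
                      groups: set[str]) -> int:
    """Validate a copy-from-URL request and return the byte limit to enforce."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.hostname:
        raise ValueError("only http, https or ftp URLs with a host are accepted")
    limit = upload_limit_for(groups)
    if limit == 0:
        raise PermissionError("this user may not upload from a URL")
    if declared_size is not None and declared_size > limit:
        raise ValueError(f"file is {declared_size} bytes; limit is {limit}")
    return limit
```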
>
> So we could set up something like that. Or maybe we could just outsource our
> large file handling to them. It'd certainly save on hard drive costs,
> wouldn't it?
>  
Rely on others to store our valuable, multi-GB open content pr0n^W
music files? Never! :-)

Magnus

Re: Upload large files

Erik Moeller-3
In reply to this post by Tim Starling
On 8/22/06, Tim Starling <[hidden email]> wrote:
> Or maybe we could just outsource our
> large file handling to them. It'd certainly save on hard drive costs,
> wouldn't it?

I think that would make sense, yes. Archive.org could get API access
to push new metadata records to Commons, so that the files can be
referenced using standard syntax, but external URLs are used whenever
they are directly pointed to.

Put it in the non-existent Wikimedia roadmap. ;-)
--
Peace & Love,
Erik

Re: Upload large files

Alex Powell-3
Out of interest, how big is Wikipedia now? I mean the actual InnoDB
file on the master server, and all the images in the media folder -
not the ones that are released to the public.

Kind regards,

Alex

On 8/22/06, Erik Moeller <[hidden email]> wrote:

> On 8/22/06, Tim Starling <[hidden email]> wrote:
> > Or maybe we could just outsource our
> > large file handling to them. It'd certainly save on hard drive costs,
> > wouldn't it?
>
> I think that would make sense, yes. Archive.org could get API access
> to push new metadata records to Commons, so that the files can be
> referenced using standard syntax, but external URLs are used whenever
> they are directly pointed to.
>
> Put it in the non-existent Wikimedia roadmap. ;-)
> --
> Peace & Love,
> Erik

Re: Upload large files

Magnus Manske
In reply to this post by Magnus Manske
Magnus Manske schrieb:
> I'd volunteer to implement the above; sounds like just a few lines of
> code (with a simple hard limit, say, 100MB for everyone).
>  
OK, done :-)

Set $wgAllowCopyUploads to true to turn it on.
$wgMaxUploadSize (default: 100 MB) limits the file size.

Magnus

Re: Upload large files

Brion Vibber
Magnus Manske wrote:
> Magnus Manske schrieb:
>> I'd volunteer to implement the above; sounds like just a few lines of
>> code (with a simple hard limit, say, 100MB for everyone).
>>  
> OK, done :-)
>
> Set $wgAllowCopyUploads to true to turn it on.
> $wgMaxUploadSize (default: 100 MB) limits the file size.

There are very serious security problems with this, as discussed on IRC. (Please
try to be in #mediawiki when making commits for feedback.)

-- brion vibber (brion @ pobox.com)

Re: Upload large files

Arne 'Timwi' Heizmann
In reply to this post by Magnus Manske

Hi Magnus,

> Or we could use a "mixed" solution:
> * I upload my file to a publicly accessible location (ftp or http, no
> matter), if it's not already online
> * I call "Special:Upload?source=web"
> * The upload <input> is replaced with a simple text input row for the URL
> * Instead of using the PHP upload mechanism, MediaWiki just copies the
> file through ftp/http

Why are you suggesting a separate upload page? Why not just add
a radio button right there on the Upload page?

However, as Brion Vibber already mentioned, there are significant
security issues with this. I have a suggestion that might solve them; if
I have overlooked a security problem that this doesn't solve, please let
me know.

My suggestion is thus:

  * The upload page displays (if the "upload from web" option is
    selected) a randomly-generated token. This token is generated only
    once for every user, and then stays the same.
  * When uploading a file, the user needs to submit two URLs:
    * One that points to a text file containing the above token
    * One to the actual file he wants to upload
  * The upload is allowed only if the two files are on the same domain
    (or in the same directory, depending on how draconian you want it).
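Timwi's proposal can be sketched as follows. Assumptions: instead of storing a random token per user, this sketch derives a stable one with HMAC from a server secret (storing a random value, for example from Python's secrets module, would work as well); "same domain" is simplified to "same host name"; and fetching the token file is left to the caller, which passes in its text. All names are hypothetical.

```python
import hashlib
import hmac
import posixpath
from urllib.parse import urlparse

SERVER_SECRET = b"hypothetical-server-secret"  # would be kept private


def user_token(username: str) -> str:
    """Stable per-user token shown on the upload page."""
    mac = hmac.new(SERVER_SECRET, username.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def same_location(token_url: str, file_url: str, same_dir: bool = False) -> bool:
    """Same host name, or additionally the same directory if same_dir."""
    a, b = urlparse(token_url), urlparse(file_url)
    if a.hostname is None or a.hostname != b.hostname:
        return False
    if same_dir:
        return posixpath.dirname(a.path) == posixpath.dirname(b.path)
    return True


def verify_copy_upload(username: str, token_url: str, token_text: str,
                       file_url: str, same_dir: bool = False) -> bool:
    """Allow the upload only if the token matches and the URLs line up."""
    token_ok = hmac.compare_digest(token_text.strip().encode("utf-8"),
                                   user_token(username).encode("utf-8"))
    return token_ok and same_location(token_url, file_url, same_dir)
```

This only shows that the uploader controls content on that host; it says nothing about what the uploaded file contains.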

Ideas? Criticism?
Timwi

Re: Upload large files

Simetrical
Why is downloading from an unknown server any less secure than
downloading from an unknown user?  You have to ensure that the file is
non-malicious either way.

Re: Upload large files

Magnus Manske
In reply to this post by Arne 'Timwi' Heizmann
Timwi schrieb:

> Hi Magnus,
>
>  
>> Or we could use a "mixed" solution:
>> * I upload my file to a publicly accessible location (ftp or http, no
>> matter), if it's not already online
>> * I call "Special:Upload?source=web"
>> * The upload <input> is replaced with a simple text input row for the URL
>> * Instead of using the PHP upload mechanism, MediaWiki just copies the
>> file through ftp/http
>>    
>
> Why are you suggesting a separate upload page?
I don't.
>  Why not just add
> a radio button right there on the Upload page?
>  
I have already implemented it. It is the same upload page, just with the
textbox instead of the <input type=file>. It uses a little extra code in
SpecialUpload.php, is all.
> However, as Brion Vibber already mentioned, there are significant
> security issues with this. I have a suggestion that might solve them; if
> I have overlooked a security problem that this doesn't solve, please let
> me know.
>  
Following concerns raised by Brion and Tim, I've rewritten the copy-from-URL
part using cURL, which makes the function less susceptible to
malicious/broken sources.

> My suggestion is thus:
>
>   * The upload page displays (if the "upload from web" option is
>     selected) a randomly-generated token. This token is generated only
>     once for every user, and then stays the same.
>   * When uploading a file, the user needs to submit two URLs:
>     * One that points to a text file containing the above token
>     * One to the actual file he wants to upload
>   * The upload is allowed only if the two files are on the same domain
>     (or in the same directory, depending on how draconian you want it).
>  
This isn't really a security feature, as an Evil User (tm) can still
upload any file (s)he wants.

It could, however, be a measure against newbies trying to copy random
files from the web. But they can already do that right now: they only have
to save the file locally, as long as it's not too large. So it would only
prevent newbies without their own web space from uploading large files. Is
that really worth the bother?

If activated, my implementation by default only grants admins the right
to upload large files. So, to solve my original problem, I'd have to
find a commons admin, and write on his/her talk page to please upload
the files I stored at (URL), maybe give the file description/license
there or insert it myself once it's up. As long as the overall number of
large files to upload is low, that should work just fine.

Or I'll have to run for admin myself. I have a feeling I might be
accepted ;-)

Magnus

Re: Upload large files

Platonides
Actually, the "upload from web" available for anyone would improve things,
as there wouldn't be so much "no source". We could find the url from where
it was caught. I am assuming the original url appearing on the summary.
Currently, the instructions for "move to Commons" are: download the file, go
to Commons, fill in the summary, *upload the file*, and press upload. It could
be as simple as: go to Commons, fill in the summary, fill in the URL, upload.
Even easier if you use CommonsHelper, as it would fill in almost everything
for you. I think the step where I spend the most time in the 'to Commons'
process is the downloading/uploading, as the file full of bytes must cross
the net.
Then more tricks could be added, such as a bot checking licenses for
uploads from Flickr, auto-NFD for blacklisted URLs, etc.

Platonides

Re: Upload large files

Jay Ashworth-2
On Wed, Aug 23, 2006 at 03:03:27PM +0200, Platonides wrote:
> Actually, the "upload from web" available for anyone would improve things,
> as there wouldn't be so much "no source". We could find the url from where
> it was caught. I am assuming the original url appearing on the summary.

It occurs to *me* that logging the source URL in the file history might
be useful in many circumstances.
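Recording the source URL could be as simple as one more field on the upload log entry. A minimal Python sketch with hypothetical names; the host blacklist check illustrates the "blacklisted URLs" idea quoted above, and the blacklist contents are invented placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from urllib.parse import urlparse

# Placeholder blacklist (hypothetical); real entries would be configurable.
BLACKLISTED_HOSTS = {"bad.example"}


@dataclass(frozen=True)
class UploadLogEntry:
    filename: str
    uploader: str
    timestamp: datetime
    source_url: Optional[str] = None  # None for ordinary file uploads

    @property
    def needs_review(self) -> bool:
        """True if copied from a blacklisted host or one of its subdomains."""
        if self.source_url is None:
            return False
        host = urlparse(self.source_url).hostname or ""
        return any(host == h or host.endswith("." + h) for h in BLACKLISTED_HOSTS)
```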

Cheers
-- jra
--
Jay R. Ashworth                                                [hidden email]
Designer                          Baylink                             RFC 2100
Ashworth & Associates        The Things I Think                        '87 e24
St Petersburg FL USA      http://baylink.pitas.com             +1 727 647 1274

        The Internet: We paved paradise, and put up a snarking lot.

Re: Upload large files

Arne 'Timwi' Heizmann
In reply to this post by Simetrical
Simetrical wrote:
> Why is downloading from an unknown server any less secure than
> downloading from an unknown user?  You have to ensure that the file is
> non-malicious either way.

The uploaded file itself is not the only source of a potential security
problem here.

Re: Upload large files

Steve Bennett-4
On 8/24/06, Timwi <[hidden email]> wrote:
> The uploaded file itself is not the only source of a potential security
> problem here.

Is that BEANS I can smell?:)

Steve