[Mediawiki-l] Special characters on uploaded files

Stuardo Herrera
Hello everyone!

I'm uploading some PDF files with special characters in their names to my
MediaWiki site, but when I try to open them I get a Not Found page. I have
noticed that when I upload them, the special characters in their names are
replaced with other characters on the server, which is why MediaWiki can't
find them. Is there a way, or an extension, to avoid this?
Thank you!

--
:::Stuardo Herrera:::
http://stuardo.wordpress.com
http://php.develsystems.com
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Re: [Mediawiki-l] Special characters on uploaded files

Brion Vibber
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stuardo Herrera wrote:
> I'm uploading some pdf files that have some special characters in their
> names to my mediawiki site, but when I try to open them, I get to a Not
> Found page. I have noticed that when I upload them, their names change in
> the server to other characters where the special chars are and that's why
> mediawiki can't find them. Is there a way or an extension to avoid this?
> Thank you!

Is your server running on Microsoft Windows? In this case file uploads
with non-ASCII characters will not work correctly.

On other operating systems you should not have this difficulty, so more
details would be welcome.

- -- brion vibber (brion @ pobox.com)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFr/E8wRnhpk1wk44RAn44AJsHYOjk4RGxu6tFdSZVWDp4oJSMrQCg0dao
Y4nnFsPXsfB2mLNkYb4qlzs=
=y298
-----END PGP SIGNATURE-----


Re: [Mediawiki-l] Special characters on uploaded files

Stuardo Herrera
Thanks, Brion. Yep, IIS 6 on Windows Server 2003. Is there a solution, or
should I just tell my users not to upload files with special characters in
their names? Thank you again :)

2007/1/18, Brion Vibber <[hidden email]>:

> Is your server running on Microsoft Windows? In this case file uploads
> with non-ASCII characters will not work correctly.
>
> On other operating systems you should not have this difficulty, so more
> details would be welcome.



Re: [Mediawiki-l] Special characters on uploaded files

Fernando Correia
I also had this issue. For now I put a warning on the upload page not to use
accented characters.

This is not a very good solution...

Windows doesn't have any problem handling special characters in file names.
But either MediaWiki or PHP is mangling the file name somehow before
writing it, and the mangling used to create the file on the file
system is different from the one used later in the URL. That's why the file
is not found.

Possible solutions would be:

a) Find out what is changing the file name (MediaWiki or PHP) and try to
disable it.
b) Add some code to the upload form that, when running under
Windows, would suggest a safe file name.

I might try to do this in the future if there is a chance the patch would be
accepted for MediaWiki.
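
To make option (b) concrete, here is a minimal sketch of what "suggest a safe
file name" could look like. It is written in Python purely for illustration and
is not MediaWiki code; the function name suggest_safe_filename and the fallback
name "upload" are made up for this example. It assumes that stripping accents
and replacing everything else with underscores is acceptable to the users.

```python
import re
import unicodedata


def suggest_safe_filename(name: str) -> str:
    """Return an ASCII-only suggestion for an uploaded file name (hypothetical helper)."""
    # Decompose accented letters into base letter + combining mark (NFKD),
    # then drop every character that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace anything outside a conservative whitelist with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", ascii_only)
    # Fall back to a fixed name if nothing usable is left.
    return cleaned if cleaned.strip("._") else "upload"
```

With this sketch, "Año_nuevo.pdf" would be suggested as "Ano_nuevo.pdf". Names
in scripts with no ASCII decomposition lose all their letters, so a real patch
would probably need a better fallback than a fixed string.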

Re: [Mediawiki-l] Special characters on uploaded files

jdd
Fernando Correia wrote:

> Windows doesn't have any problem handling special characters in file names.

Wrong.

Windows has many problems: it uses special codes for some
characters, as does the Joliet CD/DVD file system. This is easy to see
when reading, from Windows, any file written on a strictly
UTF-8-compliant Unix system.

jdd


--
http://www.dodin.net
Votez pour nous, merci - vote for us, thanks :-)
http://musique.sfrjeunestalents.fr/artiste/Magic-Alliance/


Re: [Mediawiki-l] Special characters on uploaded files

Brion Vibber
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

jdd wrote:
> Fernando Correia wrote:
>> Windows doesn't have any problem handling special characters in file names.
>
> Wrong.
>
> Windows has many problems: it uses special codes for some
> characters, as does the Joliet CD/DVD file system. This is easy to see
> when reading, from Windows, any file written on a strictly
> UTF-8-compliant Unix system.

(If you configure your mount options properly on the Unix/Linux side you
won't have that problem!)


The problem is that Windows has a kind of weird schizophrenic approach
to character sets.

Part of the system works in pure, total Unicode, speaking and storing
UTF-16 everywhere. This is the Unicode or "wide character" interface.

Part of the system works in a language- or system-dependent second
encoding which may be 8-bit or variable length. This is the (not very
accurately named) "ANSI" interface.

(And then just to be a jerk, part of the system works in *another*
language- or system-dependent *third* encoding, 8-bit or variable
length, which is the "OEM" charset. This is used in console-mode
terminals and the DOS-compatible 8.3 filenames on FAT volumes.)


Now, for better or for worse, if you use the (Unix-derived) C standard
library, like most ports of Unix apps probably do, it seems to prefer
using the ANSI (or maybe OEM?) encoding of things.

MediaWiki generally assumes you're running on a modern Unix and speaks
UTF-8 everywhere, including with the filesystem. That assumption breaks
on Windows, where filenames on the filesystem *as seen from PHP* are
accessed through some kind of horrid "ANSI" (or OEM?)-to-Unicode
translation layer.

This means you basically get gibberish, since MediaWiki and the web
server see different versions of the filename.
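
The mismatch can be illustrated outside MediaWiki with a few lines of Python.
This is only a sketch: it assumes the "ANSI" code page is Windows-1252 (cp1252),
which is typical for Western European installs but differs per system, and the
helper name misread_utf8_name is made up for the example.

```python
def misread_utf8_name(name: str, codepage: str = "cp1252") -> str:
    """Show how a UTF-8 encoded file name looks when read in a legacy code page."""
    utf8_bytes = name.encode("utf-8")
    # errors="replace" avoids an exception for bytes the code page leaves undefined.
    return utf8_bytes.decode(codepage, errors="replace")


original = "Año.pdf"
# The two UTF-8 bytes of "ñ" (0xC3 0xB1) are read as two separate characters,
# so the name seen through the code page is "AÃ±o.pdf".
on_disk = misread_utf8_name(original)
# The two names no longer match, so a lookup by the original name fails.
names_match = on_disk == original
```

Here names_match ends up False, which is the "Not Found" situation in
miniature: one side asks for "Año.pdf" while the other side only knows the
mangled name.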


A planned change to the file storage scheme will make this issue
obsolete as file storage will be done with nice, ASCII-clean
alphanumeric hash keys, but that might be another major version or two
before it gets done.
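
As a rough illustration of why hash keys sidestep the problem, here is a short
Python sketch. It is not the planned MediaWiki scheme, whose details are not
given here; hashing the file title with SHA-1 and the name storage_key are
assumptions made for the example.

```python
import hashlib


def storage_key(title: str) -> str:
    """Derive an ASCII-only storage name from a file title (hypothetical scheme)."""
    # The title is hashed as UTF-8 bytes; the hex digest contains only 0-9 and a-f,
    # so the file system never sees a non-ASCII character.
    return hashlib.sha1(title.encode("utf-8")).hexdigest()


key = storage_key("Año.pdf")
```

The key is always 40 hexadecimal characters, whatever characters the title
contains, so the encoding used by the file system API stops mattering.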


If someone happens to know a convenient way to tell the system "my
process speaks UTF-8, let me use the damn Unicode filenames" that'd be
super. Otherwise... hack in a check for non-ASCII chars? *shrug*
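
A check of that kind could be as small as the following Python sketch. It is an
illustration only, not a MediaWiki patch; the function names and the on_windows
flag are made up for the example, and the caller is assumed to detect the
platform itself (for instance with os.name == "nt" in Python).

```python
def has_non_ascii(name: str) -> bool:
    """Return True if the file name contains any character outside 7-bit ASCII."""
    return any(ord(ch) > 127 for ch in name)


def check_upload_name(name: str, on_windows: bool) -> None:
    """Reject non-ASCII upload names, but only when running on Windows."""
    if on_windows and has_non_ascii(name):
        raise ValueError(
            "File names with non-ASCII characters cannot be stored on this "
            "server; please rename the file and try again."
        )
```

Making the check conditional keeps Unix installations, where UTF-8 file names
work, unaffected.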

- -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFsQUswRnhpk1wk44RAu+nAJ9Ph4Pd2hTejpMmRrrYUU21WBjJBQCeLK43
m9V/59LLt+dA+oMfftRGyWg=
=ZfNo
-----END PGP SIGNATURE-----


Re: [Mediawiki-l] Special characters on uploaded files

Fernando Correia
I agree that would be a terrific solution at the root of the problem. But it
is a big change and may be too far in the future.

A quicker but effective solution could be some special processing on the
post event of the upload file form, "cleaning" the file name. This could be
conditional so it would not affect UNIX installations.
_______________________________________________
MediaWiki-l mailing list
[hidden email]
http://lists.wikimedia.org/mailman/listinfo/mediawiki-l
Reply | Threaded
Open this post in threaded view
|

Re: [Mediawiki-l] Special characters on uploaded files

Stuardo Herrera
Oh well, I'll think of a hack then. Meanwhile I hope all my users read the
"don't upload special chars" message. Thanks to everyone who helped!

2007/1/19, Brion Vibber <[hidden email]>:

> If someone happens to know a convenient way to tell the system "my
> process speaks UTF-8, let me use the damn Unicode filenames" that'd be
> super. Otherwise... hack in a check for non-ASCII chars? *shrug*



Re: [Mediawiki-l] Special characters on uploaded files

Fernando Correia
2007/1/19, Fernando Correia <[hidden email]>:
>
> I agree that would be a terrific solution at the root of the problem. But
> it is a big change and may be too far in the future.
>
> A quicker but effective solution could be some special processing on the
> post event of the upload file form, "cleaning" the file name. This could be
> conditional so it would not affect UNIX installations.
>


Brion, do you think such a patch would have a chance of being incorporated
into MediaWiki?

Re: [Mediawiki-l] Special characters on uploaded files

Brion Vibber
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Fernando Correia wrote:

> 2007/1/19, Fernando Correia <[hidden email]>:
>> I agree that would be a terrific solution at the root of the problem. But
>> it is a big change and may be too far in the future.
>>
>> A quicker but effective solution could be some special processing on the
>> post event of the upload file form, "cleaning" the file name. This could be
>> conditional so it would not affect UNIX installations.
>>
>
>
> Brion, do you think such a patch would have a chance of being incorporated
> into MediaWiki?

Could be done.

- -- brion vibber (brion @ pobox.com / brion @ wikimedia.org)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFtOCbwRnhpk1wk44RArdJAJ9j5+SFDvJ7KNaEmwiyw0+3JBgLTgCfQjxP
S9G4Fhbk0hMCHJjjdbtPhns=
=+aN9
-----END PGP SIGNATURE-----
