Links with (, ), :, ' - break all the time

Eric K-2
If anyone on an external site posts a link to our wiki, the link will often break if it contains spaces, brackets, apostrophes or colons. For example (input, then the output as rendered):

input: www.mywiki.com/wiki/don't speak
output: www.mywiki.com/wiki/don              [breaks at the apostrophe]


Or:

input: www.mywiki.com/wiki/orange (fruit)
output: www.mywiki.com/wiki/orange (fruit                 [breaks at a bracket]

And so on.


The links break and there is nothing I can do about it. Sometimes I email webmasters asking them to fix their rendering software, but this doesn't achieve anything. There are hundreds if not thousands of link-rendering routines out there that simply don't handle brackets, spaces, apostrophes and other characters correctly. I no longer want to use these characters in article names. For internal wiki links, I can write [[dontspeak|Don't Speak (song)]] and accept that the article URL differs from the title for some articles. That is perfectly fine, because my higher priority is to prevent broken links and to avoid losing site visitors who could not reach the correct page.

How this would work is, I would probably set the Title in a separate tag, e.g. I would create:

- www.mywiki.com/wiki/dontspeak

And on that page, I would have something like:
<title>Don't Speak (song)</title>

This would be the H1 for that page, where the title is usually found. This way people get everything they need:
- a working URL that won't break on other websites and forums
- a title that can be any number of characters long.
Does anything exist that does this? If not, any suggestions on how to get this done would be appreciated.
Please note, I'm not talking about sorting categories. This is about the title of the page and its URL and for some pages I want these to be different from each other. I see the HTML code for that heading is:
- <h1 id="firstHeading"
So maybe I would overwrite the H1 tag that has that ID with my own title, or something like that. I don't know if it's possible.


thanks
Erik
_______________________________________________
MediaWiki-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Re: Links with (, ), :, ' - break all the time

Benjamin Lees
http://www.mediawiki.org/wiki/Manual:$wgAllowDisplayTitle

But why don't you just use redirects?

Re: Links with (, ), :, ' - break all the time

Eric K-2
In reply to this post by Eric K-2
I found the answer to my own question in a previous email to this list. The answer is at:
http://www.mediawiki.org/wiki/Manual:$wgAllowDisplayTitle
Putting {{DISPLAYTITLE:Foo}} in the page sets the displayed page title.





Re: Links with (, ), :, ' - break all the time

Kilian-3
In reply to this post by Benjamin Lees
On 01/09/2012 03:02 AM, Benjamin Lees wrote:
> But why don't you just use redirects?

Redirects wouldn't solve the problem. Users would be redirected to URLs
with spaces/punctuation, copy them from their browser's location bar and
still post them elsewhere.

Re: Links with (, ), :, ' - break all the time

Eric K-2
In reply to this post by Eric K-2
I'm thinking of using the DISPLAYTITLE tag whenever a page name contains brackets, commas, colons or apostrophes.
I would try not to use these characters in the page name, but sometimes they have to be used. I would strip these characters from the page name and use dashes instead, and these alternate names would appear in Recent Changes, Contributions and other places and still look OK.
I would have to use the [[Foo|Bar]] format for making internal wiki links.

This would all have to be done to prevent the following kinds of links from being broken:
wiki/Don't Speak (song)
Here the page name would be: Dont Speak - song
This would be the actual name of the page and the link wouldn't break. I would use

- {{DISPLAYTITLE:Don't Speak (song)}}
to set the correct page title. Preventing broken incoming links is a high priority for me.
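
The renaming scheme above can be sketched in Python as follows. This is only an illustration and not part of MediaWiki: safe_page_name and wikitext_for are hypothetical helper names, and the exact set of characters to strip is an assumption based on the list above. Note also that MediaWiki by default only accepts a display title that normalizes to the real page name, so $wgRestrictDisplayTitle would most likely have to be set to false for a title this different to be shown.

```python
import re

# Assumption: these are the characters to avoid in page names.
APOSTROPHES = "'\u2019"
SEPARATORS = r"[()\[\],:;]"


def safe_page_name(display_title):
    """Derive a link-safe page name from the desired display title."""
    # Drop apostrophes outright: "Don't" -> "Dont".
    name = re.sub("[" + APOSTROPHES + "]", "", display_title)
    # Replace each run of separators (and surrounding spaces) with " - ".
    name = re.sub(r"\s*" + SEPARATORS + r"+\s*", " - ", name)
    # Trim dashes and spaces left over at either end.
    return name.strip(" -")


def wikitext_for(display_title):
    """Return (DISPLAYTITLE tag, internal link) for a display title."""
    name = safe_page_name(display_title)
    tag = "{{DISPLAYTITLE:" + display_title + "}}"
    link = "[[" + name + "|" + display_title + "]]"
    return tag, link


print(safe_page_name("Don't Speak (song)"))  # Dont Speak - song
print(wikitext_for("Don't Speak (song)"))
```

Two different display titles can collapse to the same safe name under a scheme like this, so a real script would also need to check for collisions before renaming anything.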

If anyone has any better suggestions on how to stop incoming links from being broken by various characters, let me know.

Erik






Re: Links with (, ), :, ' - break all the time

Eric K-2
In reply to this post by Kilian-3
Yes, I've seen the recommendation to use redirects. It's not a good solution. For example, say we have two pages:
mysite.com/wiki/Don't Speak (song)
mysite.com/wiki/Don'Amos
Both of these URLs would break at:
mysite.com/wiki/Don
and a redirect would not work, because a single truncated URL cannot redirect to both pages. I could create a "help" page for the "Don" entry, but I've lost the visitor's interest already. On my site I have many pages breaking "early" at the same location in this way, so this is an actual problem for me.

Sometimes we would need multiple redirects for a single page. Take the same title:
mysite.com/wiki/Don't Speak (song)
There are 3 possible places where the link could break:
mysite.com/wiki/Don
mysite.com/wiki/Don't Speak (
mysite.com/wiki/Don't Speak (song

Again, the first break might not be enough to tell where the user wanted to go. So now we need to create multiple redirects for every page that has this problem, and we're still not sure the visitor will arrive where they wanted to. So creating redirects is not a feasible solution. Plus, suppose I do have a redirect for a certain entry. If a user posts a link on a forum, the forum software renders it as:
mysite.com/wiki/Don't Speak (song
This will confuse the user and make them wonder whether the link will work or not.

The only real solution is to not allow these characters in a URL at all:
- commas, apostrophes, brackets, colons, semicolons and so on.
That is the real problem that needs to be dealt with somehow. Since there are thousands of URL-rendering routines all over the web, the only URLs we can be sure will work everywhere are those that contain just letters, numbers, underscores and dashes.
Suppose someone writing a URL-rendering routine saw this:
"Hey, did you see the site I sent you (http://mediawiki.org/wiki/extensions(safe)), was actually not working?"
The developer will break the URL before the closing bracket, while MediaWiki wants the bracket to be part of the URL.
Or in this case:
"Hey, I went to http://www.mediawiki.org/wiki/blah, had coffee and then went to bed".
Here again the developer will break it at the comma, while MediaWiki might want us to include the comma in the URL.
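
To make the failure concrete, here is a minimal Python sketch of the kind of conservative link detector described above. It is an assumption about how such routines tend to work, not the code of any particular forum package: URL_PATTERN and detect_url are hypothetical names, and mysite.example is a placeholder domain.

```python
import re

# Assumption: a conservative linker that only accepts these characters in a URL.
URL_PATTERN = re.compile(r"https?://[A-Za-z0-9\-._~/%#?=&]+")


def detect_url(text):
    """Return the first URL a simple whitelist-based linker would detect."""
    match = URL_PATTERN.search(text)
    if match is None:
        return None
    # Trailing dots are usually sentence punctuation, so drop them.
    return match.group(0).rstrip(".")


# The apostrophe is outside the whitelist, so the link is cut at "Don".
print(detect_url("See http://mysite.example/wiki/Don't_Speak_(song) for details"))

# A name made of letters, underscores and dashes survives the trailing comma.
print(detect_url("I went to http://mysite.example/wiki/Dont_Speak_-_song, had coffee."))
```

A linker that accepted brackets or commas would instead have to guess whether a trailing ")" or "," belongs to the URL or to the sentence, which is the ambiguity in the two quoted examples.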

So in my opinion, MediaWiki should take care of this one way or the other and not allow people to use these forbidden characters in the URL, while still allowing them in the page heading.
{{DISPLAYTITLE}} works, but I have to take some additional steps. Ideally it would work like this:
- For a page with the name (or URL) "Foo", if we use {{DISPLAYTITLE:Bar}} on that page, then:
--- if we make internal wiki links to [[Foo]] or [[Bar]], they should automatically always link to wiki/Foo, but display as "Bar". *
--- Any automatically generated page logs, contributions links and so on should always link to Foo, but display it as Bar.
--- If we want a different link text, we can use [[Foo|Blah blah]] as usual.
The URL (Foo) is where we're restricted in the characters we can use, and Bar is where we have complete freedom to display anything in the page heading.
*: Once again, our first priority is to prevent broken links, and although this creates an inconsistency compared to other pages on the site, this is the only option we have.
Also, having a link that doesn't match the page heading is very commonplace on non-wiki sites, so it's not a problem.

I'm OK with the solution I posted. I would use DISPLAYTITLE, and use [[Foo|Bar]] for internal wiki links. I would format Foo to be as close to Bar as possible, but without any problematic characters.
So the page name would be:
Dont Speak - song  (actual name of the page)
The page heading, set with {{DISPLAYTITLE:Don't Speak (song)}}, would be as I wanted it to be, and internal links would be written as:
[[Dont Speak - song|Don't Speak (song)]]
I have no option but to not use the apostrophe. Let's see how Yahoo Mail and the list software format this URL:
http://en.wikipedia.org/wiki/Don't_Speak            - [1]
I know it won't work if it's posted on topix.com and many other sites.
Another problem arises if we have these characters in the URL and they're percent-encoded, e.g.:
http://en.wikipedia.org/wiki/Don%27t_Speak
That doesn't look good.
The % encoding does make the URL work, but it's not guaranteed that the user will get it the way I get it. Many times I've seen people copy-pasting links to my site that didn't have the % encoding, and they broke. I don't know how that happened, but I won't blame the user. They simply copy-pasted in a different environment. A URL should work when copy-pasted, and if we're tolerating a failure rate here, it should be extremely low (say, 1%). For example, the word "Apple" can be copy-pasted with the same results in every environment, but URL [1] has a high failure rate, which is why I've seen many broken links. So to me, % encoding is not a solution that prevents failure, and therefore should not be used.
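
The relationship between the two forms of the URL can be shown with Python's standard library. This is only a sketch of the encoding itself. Whether a given browser or mail client shows or copies the encoded or the decoded form is not something the thread establishes, so treat the "decoding step" below as one possible explanation for links arriving without their % encoding.

```python
from urllib.parse import quote, unquote

page = "Don't_Speak_(song)"

encoded = quote(page)        # percent-encode everything outside the unreserved set
decoded = unquote(encoded)   # what a decoding step turns it back into

print(encoded)  # Don%27t_Speak_%28song%29
print(decoded)  # Don't_Speak_(song)
```

A page name that contains only letters, digits, underscores and dashes comes out of quote() unchanged, so there is no second form for it to be converted into along the way.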

First, we are a website to the world, and then we are a wiki to the people who work on the site. So my first priority is to have links that don't break. If that means having links that work but don't look perfect (e.g. using "Dont", which is not grammatically correct), I would rather do that than have a grammatically correct link that will break when posted on some websites.

In any case, I think this is something that should have been dealt with, so that these problematic characters would never be seen in the URL of Wikipedia or any other MediaWiki site.
It's not a big problem for me to use the DISPLAYTITLE feature, do the workarounds and tolerate some non-ideal page logs, which will show the page URL instead of the page title (I will try to keep the difference as small as possible). I'm glad that solution is an option and the feature is built in.
I do think the performance of URLs on a website is a serious issue. They should always work, and if I have to do some extra work to make them work, that's fine with me.

I wish I didn't have to use these characters in the page heading, but many times I have to, and that freedom should be there, as it is on a non-wiki website. At the same time, I should not have a URL that might break, and it's OK for the page heading and the URL to differ from each other. I can imagine millions of MediaWiki links breaking every day due to the presence of these characters. If the MediaWiki developers decided to deal with this, they would have to figure out a way to keep the page heading separate from the URL and still have everything work fine.
Now, my site wouldn't exist without the MediaWiki software, so I'm very thankful to all those who have worked on it.
But anyway, yeah, these are some of my thoughts on URL breaks and page headings.

Eric













Re: Links with (, ), :, ' - break all the time

Platonides
On 10/01/12 22:57, Eric K wrote:
> The only real solution is to not allow these characters in a URL at all:
> - commas, apostrophes, brackets, colons, semicolons and so on.

Just change $wgLegalTitleChars to a subset that pleases you.


Re: Links with (, ), :, ' - break all the time

Eric K-2
Thanks! I'm renaming the existing titles and will use the $wgLegalTitleChars variable you suggested to keep future titles correct.
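
For the existing titles, a short Python sketch like the one below could list the pages that would need renaming. $wgLegalTitleChars itself is a PHP setting in MediaWiki's configuration and is not shown here. The ALLOWED pattern is a hypothetical whitelist chosen for illustration, and titles_to_rename is a made-up helper name.

```python
import re

# Hypothetical whitelist: letters, digits, space, underscore and dash only.
ALLOWED = re.compile(r"[A-Za-z0-9 _\-]+")


def titles_to_rename(titles):
    """Return the titles containing at least one character outside ALLOWED."""
    return [t for t in titles if not ALLOWED.fullmatch(t)]


existing = ["Apple", "Don't Speak (song)", "Orange (fruit)", "Dont Speak - song"]
print(titles_to_rename(existing))  # ["Don't Speak (song)", 'Orange (fruit)']
```

The list of existing titles would have to come from the wiki itself, for example from a page listing or a database export.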


