[RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Daniel Friesen-2
On 2013-09-16 7:09 PM, Gabriel Wicke wrote:

> Any of the entry points? Any new entry point? Anything we ever want to
> put into the root?
> We should be able to avoid most conflicts by picking prefixed entry
> points. However, as we can't drop the clashing /w/api.php any time soon
> I have removed the /wiki/ part from the RFC:
>
> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs
>
> So now only the conversion from
>
> /w/index.php?title=foo?action=history
> to
> /foo?action=history
>
> is under discussion.
>
> Gabriel
Has the practice of disallowing /w/ or /index.php inside robots.txt to
force search engines to completely ignore search, edit pages,
exponential pagination, etc. been considered?

Btw, side note on root urls. We still have an open bug allowing attacks
on wikis using root paths:
https://bugzilla.wikimedia.org/show_bug.cgi?id=38048

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]


_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Tim Starling-2
In reply to this post by Gabriel Wicke-3
On 17/09/13 11:08, Gabriel Wicke wrote:

> On 09/16/2013 04:34 PM, Brian Wolff wrote:
>> Additionally, there are some security issues in IE6 when doing
>> foo?action=raw, if I recall.
>
> Yes, IIRC some version of IE disregarded the Content-type header and
> guessed the content type based on the URL and the content. If the URL
> contained .php (only outside the query string?), it disabled this behavior.
>
> Tim mentions in
> https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
> this only applied to IE3 and earlier, and IE4 respects the Content-type
> header. As the market share of IE <= 3 is probably non-existent we could
> probably blacklist it from logging in and content API access altogether.

This issue affects IE at least up to IE 6, possibly later, see bug 28235.

-- Tim Starling



Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
In reply to this post by Daniel Friesen-2
On 09/16/2013 07:24 PM, Daniel Friesen wrote:

> On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
>> Any of the entry points? Any new entry point? Anything we ever want to
>> put into the root?
>> We should be able to avoid most conflicts by picking prefixed entry
>> points. However, as we can't drop the clashing /w/api.php any time soon
>> I have removed the /wiki/ part from the RFC:
>>
>> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs
>>
>> So now only the conversion from
>>
>> /w/index.php?title=foo?action=history
>> to
>> /foo?action=history
>>
>> is under discussion.
>>
>> Gabriel
> Has the practice of disallowing /w/ or /index.php inside robots.txt to
> force search engines to completely ignore search, edit pages,
> exponential pagination, etc.. been considered?

See
https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration

> Btw, side note on root urls. We still have an open bug allowing attacks
> on wikis using root paths:
> https://bugzilla.wikimedia.org/show_bug.cgi?i

That looks like a fixable bug. In Parsoid for example all internal links
are relative, which avoids the protocol-relative URL issue you reported
there.

Gabriel
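
The protocol-relative URL issue mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical wiki served from the root of `wiki.example` and a title that itself begins with a slash; the `./` prefix for relative links is an assumption made for illustration, not a description of the actual fix for the bug.

```python
from urllib.parse import urljoin, urlsplit

BASE = "https://wiki.example/"  # hypothetical wiki served from the domain root


def root_path_href(title: str) -> str:
    """Naive absolute-path link on a root-pathed wiki: '/' + title."""
    return "/" + title


def relative_href(title: str) -> str:
    """Relative link; the './' prefix keeps the title from being read as a host."""
    return "./" + title


def host_of(url: str) -> str:
    """Host that a browser would contact for this absolute URL."""
    return urlsplit(url).netloc


hostile_title = "/evil.example/page"  # hypothetical title starting with a slash

# '/' + title gives '//evil.example/page', which resolves to a different host.
print(host_of(urljoin(BASE, root_path_href(hostile_title))))
# The relative form stays on the wiki's own host.
print(host_of(urljoin(BASE, relative_href(hostile_title))))
```

With an article path such as /wiki/ the same title produces /wiki//evil.example/page, which has no leading double slash and so cannot be mistaken for a protocol-relative URL.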


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Jeremy Baron
In reply to this post by Gabriel Wicke-3
On Tue, Sep 17, 2013 at 2:09 AM, Gabriel Wicke <[hidden email]> wrote:
> So now only the conversion from
>
> /w/index.php?title=foo?action=history
> to
> /foo?action=history

Do you mean:

to
/wiki/foo?action=history

?

> is under discussion.

See also https://gerrit.wikimedia.org/r/51595 and RT# 864 (aka
https://bugzilla.wikimedia.org/21919 ) which all seem to prefer
docroot verification rather than DNS.

-Jeremy


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Jon Robson
In reply to this post by Gabriel Wicke-3
I would suggest taking a look at the number of 404s caused by people trying
to access pages without the wiki prefix. This would be interesting data
to go alongside this proposal.
On 16 Sep 2013 20:01, "Gabriel Wicke" <[hidden email]> wrote:

> On 09/16/2013 07:24 PM, Daniel Friesen wrote:
> > On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
> >> Any of the entry points? Any new entry point? Anything we ever want to
> >> put into the root?
> >> We should be able to avoid most conflicts by picking prefixed entry
> >> points. However, as we can't drop the clashing /w/api.php any time soon
> >> I have removed the /wiki/ part from the RFC:
> >>
> >> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs
> >>
> >> So now only the conversion from
> >>
> >> /w/index.php?title=foo?action=history
> >> to
> >> /foo?action=history
> >>
> >> is under discussion.
> >>
> >> Gabriel
> > Has the practice of disallowing /w/ or /index.php inside robots.txt to
> > force search engines to completely ignore search, edit pages,
> > exponential pagination, etc.. been considered?
>
> See
> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration
>
> > Btw, side note on root urls. We still have an open bug allowing attacks
> > on wikis using root paths:
> > https://bugzilla.wikimedia.org/show_bug.cgi?i
>
> That looks like a fixable bug. In Parsoid for example all internal links
> are relative, which avoids the protocol-relative URL issue you reported
> there.
>
> Gabriel

Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
In reply to this post by Tim Starling-2
On 09/16/2013 07:48 PM, Tim Starling wrote:
> On 17/09/13 11:08, Gabriel Wicke wrote:
>> Tim mentions in
>> https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
>> this only applied to IE3 and earlier, and IE4 respects the Content-type
>> header. As the market share of IE <= 3 is probably non-existent we could
>> probably blacklist it from logging in and content API access altogether.
>
> This issue affects IE at least up to IE 6, possibly later, see bug 28235.

Thanks for the pointer! It is sad that IE6 (and likely IE7) is still
haunting us. IE8+ is covered by the X-Content-Type-Options header.

It sounds like your Content-Disposition solution [1] should still work
for IE6/7 where that header is not used otherwise. The existing users of
that header all seem to be file-related. Did I miss any use in action
handlers?

Gabriel

[1]: https://bugzilla.wikimedia.org/show_bug.cgi?id=28235#c6
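
As a rough sketch of the header combination discussed above: the function below builds response headers for raw content. It assumes the Content-Disposition approach amounts to offering the body as an attachment; the exact header value proposed in [1] is not reproduced here, and `raw_response_headers` is a hypothetical helper, not a MediaWiki function.

```python
def raw_response_headers(content_type: str, filename: str) -> dict[str, str]:
    """Headers for a raw-content response that browsers should not sniff."""
    # Keep only characters that cannot break out of the quoted filename.
    safe_name = "".join(c for c in filename if c.isalnum() or c in "._-") or "download"
    return {
        "Content-Type": content_type,
        # Honoured by IE8 and later: trust the declared type, do not sniff.
        "X-Content-Type-Options": "nosniff",
        # Assumed fallback for older browsers: download instead of rendering.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }


for name, value in raw_response_headers("text/x-wiki", "Foo bar.txt").items():
    print(f"{name}: {value}")
```

Building the headers in one place is in the spirit of Tim's later point that the header handling needs to be abstracted so that it is consistent.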


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
In reply to this post by Jeremy Baron
On 09/16/2013 08:48 PM, Jeremy Baron wrote:
> On Tue, Sep 17, 2013 at 2:09 AM, Gabriel Wicke <[hidden email]> wrote:
>> /w/index.php?title=foo?action=history
>> to
>> /foo?action=history
>
> Do you mean:
>
> to
> /wiki/foo?action=history

Yes, sorry. The RFC had it right, in case you read that ;)

Gabriel
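
The conversion under discussion can be sketched as a small rewrite function. This is an illustration only: it assumes site-relative URLs, the /w/index.php and /wiki/ paths used in this thread, and `&` as the separator between query parameters; `clean_url` is a hypothetical name.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit


def clean_url(url: str, article_path: str = "/wiki/") -> str:
    """Rewrite /w/index.php?title=T&k=v into /wiki/T?k=v; pass other URLs through."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    title = next((v for k, v in params if k == "title"), None)
    if parts.path != "/w/index.php" or title is None:
        return url
    # Everything except 'title' stays in the query string, in its original order.
    rest = [(k, v) for k, v in params if k != "title"]
    path = article_path + quote(title, safe="/:")
    return path + ("?" + urlencode(rest) if rest else "")


print(clean_url("/w/index.php?title=foo&action=history"))  # /wiki/foo?action=history
```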


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Tim Starling-2
In reply to this post by Gabriel Wicke-3
On 17/09/13 14:01, Gabriel Wicke wrote:

> On 09/16/2013 07:48 PM, Tim Starling wrote:
>> On 17/09/13 11:08, Gabriel Wicke wrote:
>>> Tim mentions in
>>> https://www.mediawiki.org/wiki/Special:Code/MediaWiki/49833#c3561 that
>>> this only applied to IE3 and earlier, and IE4 respects the Content-type
>>> header. As the market share of IE <= 3 is probably non-existent we could
>>> probably blacklist it from logging in and content API access altogether.
>>
>> This issue affects IE at least up to IE 6, possibly later, see bug 28235.
>
> Thanks for the pointer! It is sad that IE6 (and likely IE7) is still
> haunting us. IE8+ is covered by the X-Content-Type-Options header.
>
> It sounds like your Content-Disposition solution [1] should still work
> for IE6/7 where that header is not used otherwise. The existing users of
> that header all seem to be file-related. Did I miss any use in action
> handlers?

I'm assuming you can grep for Content-Disposition as well as I can.
IIRC, the difficulty with Content-Disposition, in the context of a
security patch, was the need to abstract handling of the header out of
the various places that send it, so that it would be consistent and
demonstrably secure. That would have made the security patch larger
and more complex than it needed to be, which would have been a problem
for backporters. That shouldn't be a concern for your feature.

-- Tim Starling



Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

K. Peachey-2
In reply to this post by Gabriel Wicke-3
On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke <[hidden email]> wrote:

> There *might* be, in theory. In practice I doubt that there are any
> articles starting with 'w/'. To avoid future conflicts, we should
> probably prefix private paths with an underscore as titles cannot start
> with it (and REST APIs often use it for special resources).
>
>
I bet people have said that about single letter interwikis, but we do have
quite a few "<single letter>:" page titles around. Having "<single letter>/"
titles is not unbelievable.

Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Nikola Smolenski-2
On 17/09/13 10:24, K. Peachey wrote:

> On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke <[hidden email]> wrote:
>
>> There *might* be, in theory. In practice I doubt that there are any
>> articles starting with 'w/'. To avoid future conflicts, we should
>> probably prefix private paths with an underscore as titles cannot start
>> with it (and REST APIs often use it for special resources).
>>
>>
> I bet people have said that about single letter interwikis, but we do have
> quite a few "<single letter>:" page titles around. Having "<single letter>/"
> titles is not unbelievable.

I have found 2476 pages in English Wikipedia that start with
'[something]/', including pages starting with '//'. None of them start
with a small letter though, for obvious reasons.


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Daniel Kinzler
In reply to this post by Gabriel Wicke-3
Am 17.09.2013 00:34, schrieb Gabriel Wicke:
> There *might* be, in theory. In practice I doubt that there are any
> articles starting with 'w/'.

I count 10 on en.wiktionary.org:

https://en.wiktionary.org/w/index.php?title=Special%3APrefixIndex&prefix=w%2F&namespace=0

> To avoid future conflicts, we should
> probably prefix private paths with an underscore as titles cannot start
> with it (and REST APIs often use it for special resources).

That would be better.

But still, I think this is a bad idea. Essentially, putting articles at the root
of the domain means hogging the domain as a namespace. Depending on what you
want to do with your wiki, this is not a good idea.

For instance, Wikidata uses the /entity/ path for URIs representing things,
while the documents under /wiki/ are descriptions of these things. If page
content was located at the root, we'd have nasty namespace pollution.

Basically: page content is only one of the things a wiki may serve. "Internal"
resources like CSS are another. But there may be much more, like structured
data. It's good to use prefixes to keep these apart.

-- daniel



Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Daniel Friesen-2
In reply to this post by Nikola Smolenski-2
On 2013-09-17 2:29 AM, Nikola Smolenski wrote:

> On 17/09/13 10:24, K. Peachey wrote:
>> On Tue, Sep 17, 2013 at 8:34 AM, Gabriel Wicke <[hidden email]>
>> wrote:
>>
>>> There *might* be, in theory. In practice I doubt that there are any
>>> articles starting with 'w/'. To avoid future conflicts, we should
>>> probably prefix private paths with an underscore as titles cannot start
>>> with it (and REST APIs often use it for special resources).
>>>
>>>
>> I bet people have said that about single letter interwikis, but we do
>> have quite a few "<single letter>:" page titles around. Having
>> "<single letter>/" titles is not unbelievable.
>
> I have found 2476 pages in English Wikipedia that start with
> '[something]/', including pages starting with '//'. None of them start
> with a small letter though, for obvious reasons.
The problem with that query is you're searching Wikipedia. Try
Wiktionary instead. I found 5 just on the first letter I tested
https://en.wiktionary.org/wiki/Special:PrefixIndex/a/

Also, pages prefixed with "<single letter>/" aren't the only thing that
creates conflicts. As far as standard rewrite rules and web servers are
concerned, a directory at /a/ and /a are the same thing. See how
https://en.wikipedia.org/w is not a 404 pointing to [[w]] like
https://en.wikipedia.org/a is, but instead is the same as w/ and hence
w/index.php. So really any single-letter article on a root-pathed wiki
conflicts with any single-letter root directory. ;) And Wikipedia has a
redirect like that for every single letter of the Latin alphabet.
(Actually forget the Latin alphabet, they've practically got most of
Unicode there)

Side topic: https://en.wiktionary.org/w/r/t is messed up: "To check for
"r/t" on Wikipedia, see: //en.wikipedia.org/wiki/r/t
<https://en.wikipedia.org/wiki/r/t>"

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
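
The collision rule described above can be sketched as a simple check. It assumes a root-pathed wiki and a hypothetical set of reserved root directories; `conflicts` and `is_private_path` are illustrative names, and the underscore convention is the one suggested earlier in the thread, not an existing feature.

```python
def conflicts(title: str, reserved_dirs: set[str]) -> bool:
    """True if a root-pathed title would collide with a reserved root directory.

    Both '/w' and '/w/...' count, since web servers treat /w and /w/ alike.
    """
    first_segment = title.split("/", 1)[0]
    return first_segment in reserved_dirs


def is_private_path(path: str) -> bool:
    """Underscore-prefixed paths are reserved under the suggested convention."""
    return path.startswith("/_")


reserved = {"w"}  # hypothetical: only /w/ lives in the root
for title in ["w", "w/o", "wiki", "W/O"]:
    print(title, conflicts(title, reserved))
```

The check is case-sensitive, which matches the observation that Wikipedia titles start with a capital letter while Wiktionary titles may not.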


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Daniel Friesen-2
In reply to this post by Gabriel Wicke-3
On 2013-09-16 8:01 PM, Gabriel Wicke wrote:

> On 09/16/2013 07:24 PM, Daniel Friesen wrote:
>> On 2013-09-16 7:09 PM, Gabriel Wicke wrote:
>>> Any of the entry points? Any new entry point? Anything we ever want to
>>> put into the root?
>>> We should be able to avoid most conflicts by picking prefixed entry
>>> points. However, as we can't drop the clashing /w/api.php any time soon
>>> I have removed the /wiki/ part from the RFC:
>>>
>>> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs
>>>
>>> So now only the conversion from
>>>
>>> /w/index.php?title=foo?action=history
>>> to
>>> /foo?action=history
>>>
>>> is under discussion.
>>>
>>> Gabriel
>> Has the practice of disallowing /w/ or /index.php inside robots.txt to
>> force search engines to completely ignore search, edit pages,
>> exponential pagination, etc.. been considered?
> See
> https://www.mediawiki.org/wiki/Requests_for_comment/Clean_up_URLs#Migration
Ok. Though even assuming the non-standard * and Allow: features are
supported by all the bots we want to target, I don't like the idea of
blacklisting /wiki/*? in this way.

I don't think that every URL with a query in it is something we want to
blacklist from search engines. Plenty are, but sometimes content that
would be a good idea to index is served with a query.

For example, the non-first pages of long categories and of
Special:Allpages' pagination. The latter has robots=noindex (though I
think we may want to reconsider that), but the former is not noindexed
and, with the introduction of rel="next" etc., would be pretty
reasonable to index, yet is currently blacklisted by robots.txt.

Additionally, while we normally want to noindex edit pages, this isn't
true of redlinks in every case. Take redlinked category links, for
example. These link to an action=edit&redlink=1 URL which, for a search
engine, would then redirect back to the pretty URL for the category. But
because of robots.txt this link is masked: the search engine cannot
read the intermediate redirect.

The idea I had to fix that naturally was to make MediaWiki aware of
this and, whether by a new routing system or simply filters for specific
simple queries, make it output /wiki/title?query URLs for those cases
where it's a query we would want indexed, and leave robots-blacklisted
stuff under /w/ (though I did also consider a separate short URL path
like /w/page/$1 to make internal/robots-blacklisted URLs pretty).
However, adding Disallow: /wiki/*? to robots.txt will preclude the
ability to do that.

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]
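
To make the effect of a Disallow: /wiki/*? rule concrete, here is a minimal sketch of wildcard matching for robots.txt path patterns. It assumes the `*` and `$` extensions behave as commonly documented by the major crawlers (`*` matches any run of characters, `$` anchors the end, everything else is a prefix match); `rule_matches` is a hypothetical helper and the example URLs are made up.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Match a robots.txt path pattern using the common '*' and '$' extensions."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; every other character is literal.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, giving prefix semantics.
    return re.match(regex, url_path) is not None


rule = "/wiki/*?"
for path in ["/wiki/Foo", "/wiki/Category:Foo?from=M", "/wiki/Foo?action=edit"]:
    print(path, rule_matches(rule, path))
```

Under these assumptions the rule blocks the paginated category URL just as it blocks the edit URL, which is the loss of flexibility described above.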



Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Nikola Smolenski-2
In reply to this post by Daniel Friesen-2
On 17/09/13 11:59, Daniel Friesen wrote:
> On 2013-09-17 2:29 AM, Nikola Smolenski wrote:
>> I have found 2476 pages in English Wikipedia that start with
>> '[something]/', including pages starting with '//'. None of them start
>> with a small letter though, for obvious reasons.
> The problem with that query is you're searching Wikipedia. Try
> Wiktionary instead. I found 5 just on the first letter I tested
> https://en.wiktionary.org/wiki/Special:PrefixIndex/a/

There are 124 of which 63 start with a small letter.


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Daniel Friesen-2
In reply to this post by Daniel Kinzler
On 2013-09-17 2:48 AM, Daniel Kinzler wrote:

>> To avoid future conflicts, we should
>> probably prefix private paths with an underscore as titles cannot start
>> with it (and REST APIs often use it for special resources).
> That would be better.
>
> But still, I think this is a bad idea. Essentially, putting articles at the root
> of the domain means hogging the domain as a namespace. Depending on what you
> want to do with your wiki, this is not a good idea.
>
> For instance, Wikidata uses the /entity/ path for URIs representing things,
> while the documents under /wiki/ are descriptions of these things. If page
> content was located at the root, we'd have nasty namespace pollution.
>
> Basically: page content is only one of the things a wiki may serve. "Internal"
> resources like CSS are another. But there may be much more, like structured
> data. It's good to use prefixes to keep these apart.
>
> -- daniel
+1

We've got others for content-related things too besides ones for
internal resources and structured data.

eg: https://test2.wikipedia.org/s/85

((And I'll try to resist starting a rant about the knockoff "REST" which
is a partial premise here))

~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Brad Jorsch (Anomie)
In reply to this post by Gabriel Wicke-3
On Mon, Sep 16, 2013 at 7:41 PM, Gabriel Wicke <[hidden email]> wrote:
> Using sub-resources rather than the random switch to /w/index.php is
> more important for caching (promotes deterministic URLs) and does not
> seem to involve similar trade-offs.

Note that "promotes deterministic URLs" applies only to cases where
only one parameter other than 'title' is provided to index.php
(usually this parameter is 'action'). If the URL has more than one
parameter other than 'title', you're still out of luck.

"But you can turn on $wgActionPaths to remove 'action' from the query
string too!" you say? But then you're still stuck if the URL has two
parameters other than 'action' and 'title'. Such as "offset" and
"limit", for example.


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
In reply to this post by Daniel Kinzler
On 09/17/2013 02:48 AM, Daniel Kinzler wrote:
> Am 17.09.2013 00:34, schrieb Gabriel Wicke:
>> There *might* be, in theory. In practice I doubt that there are any
>> articles starting with 'w/'.
>
> I count 10 on en.wiktionary.org:
>
> https://en.wiktionary.org/w/index.php?title=Special%3APrefixIndex&prefix=w%2F&namespace=0

The good news is that none of them is /w/{index,api,load}.php ;)

>> To avoid future conflicts, we should
>> probably prefix private paths with an underscore as titles cannot start
>> with it (and REST APIs often use it for special resources).
>
> That would be better.
>
> But still, I think this is a bad idea. Essentially, putting articles at the root
> of the domain means hogging the domain as a namespace. Depending on what you
> want to do with your wiki, this is not a good idea.

I agree that it does not make sense to place the wiki at the root level
if you are running (or plan to run) other services on the domain. On
Wikipedia, the wiki is the primary use case. Optimizing for the common
use case can be a good idea.

> Basically: page content is only one of the things a wiki may serve. "Internal"
> resources like CSS are another. But there may be much more, like structured
> data. It's good to use prefixes to keep these apart.

For different representations of the same resource there is also much to
be said for suffixes, even if some of those representations are not
visual. Additionally, we have namespaces as a prefix mechanism within a
wiki. There will surely be cases where leaving the wiki makes sense, but I
am hesitant to discard the flat wiki namespace all too quickly.

Gabriel


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
In reply to this post by Brad Jorsch (Anomie)
On 09/17/2013 08:40 AM, Brad Jorsch (Anomie) wrote:
> On Mon, Sep 16, 2013 at 7:41 PM, Gabriel Wicke <[hidden email]> wrote:
>> Using sub-resources rather than the random switch to /w/index.php is
>> more important for caching (promotes deterministic URLs) and does not
>> seem to involve similar trade-offs.
>
> Note that "promotes deterministic URLs" applies only to cases where
> only one parameter other than 'title' is provided to index.php
> (usually this parameter is 'action'). If the URL has more than one
> parameter other than 'title', you're still out of luck.

An end point that wants to be cacheable should only use one query
parameter, which might well be a path. Hypothetical examples:

http://wiki.org/wiki/Foo?r=latest/html
http://wiki.org/wiki/Foo?r=123456/wikitext

An alternative solution would be to specify a list of required query
parameters and a canonical ordering, and to reject (or redirect)
requests not conforming to this spec. The problem I see with this
approach is that many client libraries don't provide control over the
order of query parameters, which would make such an interface hard to use.

Gabriel
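
The canonical-ordering alternative could look roughly like the sketch below. It assumes alphabetical ordering of parameters as the canonical form, which is an arbitrary choice made for illustration; `canonical_url` and `needs_redirect` are hypothetical names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query parameters sorted by name, then value."""
    parts = urlsplit(url)
    params = sorted(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit(parts._replace(query=urlencode(params)))


def needs_redirect(url: str) -> bool:
    """True when a request URL is not already in canonical form."""
    return canonical_url(url) != url


print(canonical_url("http://wiki.org/wiki/Foo?offset=20&limit=50"))
```

A cache keyed on the canonical form would store one entry for both parameter orders; a server following the "redirect" variant would answer non-canonical requests with a redirect to `canonical_url(url)`.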


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Brad Jorsch (Anomie)
On Tue, Sep 17, 2013 at 12:27 PM, Gabriel Wicke <[hidden email]> wrote:
>
> An end point that wants to be cacheable should only use one query
> parameter, which might well be a path. Hypothetical examples:
>
> http://wiki.org/wiki/Foo?r=latest/html
> http://wiki.org/wiki/Foo?r=123456/wikitext

So now you're cramming multiple parameters, ordered, into one
parameter? Why not go all the way and do
http://wiki.org/wiki/123456/wikitext/Foo then?

But IMO, that's ridiculous.

> An alternative solution would be to specify a list of required query
> parameters and a canonical ordering, and to reject (or redirect)
> requests not conforming to this spec.

"reject" is even more ridiculous. "redirect" is less ridiculous, but
is strange and will increase latency and number-of-requests for
clients that don't know the magic order.

What is the actual benefit we're trying to get here? All I've gotten
so far along those lines is "improve cacheability", but it doesn't
seem to have been established whether caching even needs improving in
this area.


--
Brad Jorsch (Anomie)
Software Engineer
Wikimedia Foundation


Re: [RFC]: Clean URLs- dropping /wiki/ and /w/index.php?title=..

Gabriel Wicke-3
On 09/17/2013 11:24 AM, Brad Jorsch (Anomie) wrote:

> On Tue, Sep 17, 2013 at 12:27 PM, Gabriel Wicke <[hidden email]> wrote:
>> An end point that wants to be cacheable should only use one query
>> parameter, which might well be a path. Hypothetical examples:
>>
>> http://wiki.org/wiki/Foo?r=latest/html
>> http://wiki.org/wiki/Foo?r=123456/wikitext
>
> So now you're cramming multiple parameters, ordered, into one
> parameter? Why not go all the way and do
> http://wiki.org/wiki/123456/wikitext/Foo then?

I consider the article to be the main resource we are interested in,
with a revision and then a specific part (format) of that revision as a
sub-resource. As our titles can contain slashes we need to delimit the
main resource from the sub-resource part. A single query parameter that
specifies the sub-resource path achieves that.

> What is the actual benefit we're trying to get here? All I've gotten
> so far along those lines is "improve cacheability", but it doesn't
> seem to have been established whether caching even needs improving in
> this area.

A heavily-used content API will perform better and use fewer resources
when it is cacheable. This will become more important over time, so I
believe it is worth spending a small amount of effort on now.

Gabriel
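
The sub-resource scheme from the hypothetical examples above can be sketched as a parser. It assumes the `r=<revision>/<format>` layout shown in those examples and a /wiki/ article path; `SubResource` and `parse_subresource` are illustrative names, not part of any existing API.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, unquote, urlsplit


class SubResource(NamedTuple):
    title: str
    revision: str
    fmt: str


def parse_subresource(url: str, article_path: str = "/wiki/") -> SubResource:
    """Split /wiki/<title>?r=<revision>/<format> into its three parts."""
    parts = urlsplit(url)
    if not parts.path.startswith(article_path):
        raise ValueError("not an article URL")
    # The whole remaining path is the title, so titles may contain slashes.
    title = unquote(parts.path[len(article_path):])
    r_values = parse_qs(parts.query).get("r")
    if not r_values or "/" not in r_values[0]:
        raise ValueError("missing r=<revision>/<format>")
    revision, fmt = r_values[0].split("/", 1)
    return SubResource(title, revision, fmt)


print(parse_subresource("http://wiki.org/wiki/Foo?r=latest/html"))
print(parse_subresource("http://wiki.org/wiki/A/B?r=123456/wikitext"))
```

Because the query string delimits the sub-resource, a title such as A/B is never confused with a revision or format segment.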
