Re: [Wikitech-l] Flattening a wikimedia category

David Gerard-2
On 4 February 2010 17:38, Daniel Schwen <[hidden email]> wrote:

>> But we need the functionality there first, so we can *then* flatten.

> Ahh, the good old chicken and egg ;-)
> I don't let that count. We have plenty of working category
> intersection tools already.


Yes, but they're not part of the interface.

The technology needs to work with the data - the six million files and
their categories, carefully added by hand by humans.

If category intersections worked, the categories could then be broken
down to work better with category intersections.

Demanding that all six million files be de-categorised before you'll
even allow a category intersection tool to *possibly* be deployed is
backward.

People need to be able to go gradually.


- d.

_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l
Re: [Wikitech-l] Flattening a wikimedia category

Daniel Schwen-2
> Yes, but they're not part of the interface.
So what?!
The first step has been made on the technical side. _No_ step has been
made at all on the categorization side.

> The technology needs to work with the data - the six million files and
> their categories, carefully added by hand by humans.
The technology works in principle. But it can never work fully
satisfactorily with the current categorization scheme.

> If category intersections worked,
They do.

> Demanding that all six million files be de-categorised before you'll
> even allow a category intersection tool to *possibly* be deployed is
> backward.
I never demanded that. Geez. What I want is for the Commons community
to pledge support for a change of the categorization system. Putting
intersection in the interface before they do is a _waste of time_.
I'm asking for them to show the _tiniest_ sign of support. The
programmers have already bent over backwards (including me with my own
intersection tool).

Of course recategorisation will take time, and so will the deployment
of a production quality intersection interface. Stop pretending that
either side has to finish all their work first and show a polished end
result! That will never work and just stall developments.

Re: [Wikitech-l] Flattening a wikimedia category

geni
In reply to this post by David Gerard-2
On 4 February 2010 17:44, David Gerard <[hidden email]> wrote:

> On 4 February 2010 17:38, Daniel Schwen <[hidden email]> wrote:
>
>>> But we need the functionality there first, so we can *then* flatten.
>
>> Ahh, the good old chicken and egg ;-)
>> I don't let that count. We have plenty of working category
>> intersection tools already.
>
>
> Yes, but they're not part of the interface.
>
> The technology needs to work with the data - the six million files and
> their categories, carefully added by hand by humans.
>
> If category intersections worked, the categories could then be broken
> down to work better with category intersections.
>
> Demanding that all six million files be de-categorised before you'll
> even allow a category intersection tool to *possibly* be deployed is
> backward.
>
> People need to be able to go gradually.
>
>
> - d.

This can be got around by calling the new system "tagging" and running
it separately from the existing category system.



--
geni

Re: [Wikitech-l] Flattening a wikimedia category

Daniel Schwen-2
>> People need to be able to go gradually.

Yeah, tried that before. See [1] (Template:Tag). But that would be
quite the kludge. There are plenty of ways to change the category
system. What should come first is either a dictatorial decree or - if
it must - a vote/!vote for switching systems.

If that is decided, a bot could easily be run to write a flattened
category list onto every category page. That list would have to be
manually reviewed for goofs like the ones Aryeh pointed out.
Furthermore, we'd have to blacklist every category that does not
describe a singular concept. That can be done using templates that are
manually set and read out by bots.

(abbreviated) example:
Category:Churches in Guernsey

would get a template on its category page with all the categories that
occur somewhere above "Category:Churches in Guernsey" in the tree:
Category:Churches in Normandy
Category:Religious buildings in Guernsey
Category:Religious buildings in Normandy
Category:Buildings in Guernsey
Category:Guernsey
Category:Religion in Guernsey
Category:Buildings in Normandy
Category:Architecture of Normandy
Category:Normandy
Category:Architecture of Europe
Category:Provinces of France
Category:Provinces
Category:History of France
Category:History
Category:History of Europe by country
Category:France
...

A ginormous list. However every blacklisted category could already be
filtered out! Leaving us with
Category:Guernsey
Category:Normandy
Category:Provinces
Category:History
Category:France
...

Well, Category:Churches had better be in there somewhere ;-). Anyhow, that
list will be much shorter now, and users can quickly weed out nonsense
categories like the abstract Category:History and Category:Provinces.
A bot could then recategorize all images in the reviewed
category.
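
To make the flatten-then-filter step concrete, here is a minimal Python
sketch. Everything in it is an assumption for illustration: the PARENTS
map is a made-up toy fragment (not the real Commons tree), the BLACKLIST
set stands in for the manually set templates, and flatten_ancestors is a
hypothetical helper, not an existing bot.

```python
from collections import deque

# Toy parent map (category -> direct parent categories). Invented for
# illustration only; the real links would come from the wiki.
PARENTS = {
    "Churches in Guernsey": ["Churches in Normandy", "Religious buildings in Guernsey"],
    "Churches in Normandy": ["Religious buildings in Normandy"],
    "Religious buildings in Guernsey": ["Buildings in Guernsey"],
    "Religious buildings in Normandy": ["Buildings in Normandy"],
    "Buildings in Guernsey": ["Guernsey"],
    "Buildings in Normandy": ["Normandy"],
    "Guernsey": ["Normandy"],
    "Normandy": ["Provinces of France"],
    "Provinces of France": ["Provinces", "France"],
}

# Stand-in for the manually maintained blacklist of categories that do
# not describe a singular concept.
BLACKLIST = {
    "Churches in Normandy",
    "Religious buildings in Guernsey",
    "Religious buildings in Normandy",
    "Buildings in Guernsey",
    "Buildings in Normandy",
    "Provinces of France",
}


def flatten_ancestors(category, parents, blacklist=frozenset()):
    """Return every category above `category`, breadth-first, minus the blacklist."""
    seen = {category}  # also protects against cycles in the category graph
    order = []
    queue = deque(parents.get(category, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reached via another path
        seen.add(current)
        order.append(current)
        queue.extend(parents.get(current, []))
    return [c for c in order if c not in blacklist]


if __name__ == "__main__":
    for name in flatten_ancestors("Churches in Guernsey", PARENTS, BLACKLIST):
        print("Category:" + name)
```

On this toy data the unfiltered list has ten entries and the filtered
one has four (Guernsey, Normandy, Provinces, France), which is the list
a human reviewer would then prune further.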

[1] http://commons.wikimedia.org/w/index.php?title=Special:Undelete&target=Template%3ATag

Re: [Wikitech-l] Flattening a wikimedia category

Alison M. Wheeler
<pedant> I know this is just an example but ...
----- "Daniel Schwen" <[hidden email]> wrote:
> (abbreviated) example:
> Category:Churches in Guernsey
> A ginormous list. However every blacklisted category could already be
> filtered out! Leaving us with
> Category:Guernsey
> Category:Normandy
> Category:Provinces
> Category:History
> Category:France

Is it just me who notes that Guernsey is one of the *UK* Channel Islands, and not part of France ...
</pedant>

Alison

Re: [Wikitech-l] Flattening a wikimedia category

Daniel Schwen-2
> Is it just me who notes that Guernsey is one of the *UK* Channel Islands, and not part of France ...
> </pedant>

That.. ..uhm... ...*sweat*...

...that was EXACTLY my point! ;-)
The commons categorization system is screwed up :-P

Re: [Wikitech-l] Flattening a wikimedia category

Daniel Kinzler
In reply to this post by David Gerard-2
Robert Stojnic wrote:

> Aryeh Gregor wrote:
>> Right.  Supporting category intersection and search in category with
>> better UI (we already sort of support it if you know the right magic
>> terms) is what we should be aiming for here.
>>  
>
> Last year, just around this time, we came to exactly the same
> conclusion. And just like then, there is no shortage of good
> opinions on how to do it, only of people to actually do the programming.
>
> r.

Wikimedia Germany has contracted Neil Harris to work on implementing deep
category intersection. The goal is basically a rewrite of my sucky CatScan tool.
The result is hopefully fast & generic enough so it can be used as a service
that integrates with the current search infrastructure.

The project has started; there is funding and a project plan. I expect to see
usable results soon. In fact, I hope to present this at the developer meeting in
April (Neil, contact me about attending) and discuss the integration into Lucene
search.

I agree that full recursive flattening of the current category structure
sometimes leads to bad results (especially on the English Wikipedia; Commons is
quite bad too). A depth of 5, however, is generally useful. One common use case
is intersecting a content category with a maintenance category, for organizing
editorial work in a wiki project. In that case, at least one category comes from
a template.
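
As a rough illustration of depth-limited intersection, here is a minimal
Python sketch. It is not how CatScan or the planned rewrite works; the
SUBCATS and MEMBERS maps, the file names, the "Needs cleanup" maintenance
category and both function names are hypothetical, chosen only to show
the idea.

```python
# Toy data: category -> direct subcategories, and category -> pages
# placed directly in it. All names are invented for illustration.
SUBCATS = {
    "Buildings": ["Churches"],
    "Churches": ["Churches in Guernsey"],
}
MEMBERS = {
    "Churches": ["File:Some church.jpg"],
    "Churches in Guernsey": ["File:Town Church.jpg"],
    "Needs cleanup": ["File:Town Church.jpg", "File:Other.jpg"],
}


def pages_within_depth(root, subcats, members, max_depth=5):
    """Collect pages in `root` and in subcategories at most `max_depth` levels down."""
    pages = set()
    seen = {root}  # avoids revisiting categories (cycles, diamonds)
    frontier = [root]
    for _ in range(max_depth + 1):  # level 0 is the root itself
        next_frontier = []
        for cat in frontier:
            pages.update(members.get(cat, ()))
            for child in subcats.get(cat, ()):
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        if not next_frontier:
            break
        frontier = next_frontier
    return pages


def deep_intersection(cat_a, cat_b, subcats, members, max_depth=5):
    """Pages reachable from both categories within the depth limit."""
    return (pages_within_depth(cat_a, subcats, members, max_depth)
            & pages_within_depth(cat_b, subcats, members, max_depth))


if __name__ == "__main__":
    print(deep_intersection("Buildings", "Needs cleanup", SUBCATS, MEMBERS))
```

With the default depth the toy query finds the one file that is both
under "Buildings" and flagged for cleanup; with max_depth=1 it finds
nothing, because that file sits two levels below "Buildings".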

Atomic categorization, aka tagging, however also sucks: the tags are either too
generic (so it's hard to find stuff) or too specific (you never know what to
search for). Tags implying/including other tags is very useful, which is exactly
what categories with deep intersection will provide.


-- daniel


Re: [Wikitech-l] Flattening a wikimedia category

Daniel Schwen-2
On Thu, Feb 4, 2010 at 1:50 PM, Daniel Kinzler <[hidden email]> wrote:
[...]

Ok, all that sounded oddly familiar...

> Atomic categorization aka tagging however also sucks:
Well, I certainly would not say it sucks. After all _every_ major
image library uses it. Will it be perfect? Probably not, but perfect
is the enemy of good enough ;-)

> the tags are either too
> generic (so it's hard to find stuff) or too specific (you never know what to
> search for). tags implying/including other tags is very useful. which is exactly
I do not see this problem at all. In my example above we would have
_both_ specific (Normandy, Guernsey) and general (France) tags. Search
for whatever you like and narrow down using intersection. How can you
not know what to search for? This is a problem we have _now_! Our
categories are ridiculously specific. Going atomic will only make this
situation better in this respect.
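
A minimal Python sketch of what "search and narrow down using
intersection" could look like with atomic tags. The FILE_TAGS data, the
file names and the build_index/search helpers are all hypothetical; this
only illustrates the idea and is not any existing tool.

```python
from collections import defaultdict

# Toy data: file -> atomic tags. Invented for illustration only.
FILE_TAGS = {
    "File:A.jpg": {"Churches", "Guernsey"},
    "File:B.jpg": {"Churches", "Normandy", "France"},
    "File:C.jpg": {"Lighthouses", "Normandy", "France"},
}


def build_index(file_tags):
    """Build an inverted index: tag -> set of files carrying that tag."""
    index = defaultdict(set)
    for filename, tags in file_tags.items():
        for tag in tags:
            index[tag].add(filename)
    return index


def search(index, *tags):
    """Return the files that carry every one of the given tags."""
    if not tags:
        return set()
    result = set(index.get(tags[0], set()))  # copy, so the index is not mutated
    for tag in tags[1:]:
        result &= index.get(tag, set())
    return result


if __name__ == "__main__":
    index = build_index(FILE_TAGS)
    print(sorted(search(index, "France")))              # general tag
    print(sorted(search(index, "France", "Churches")))  # narrowed down
```

Starting from the general tag and adding a second one shrinks the toy
result from two files to one, without anyone having to know a name like
"Churches in Normandy" in advance.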

Re: [Wikitech-l] Flattening a wikimedia category

geni
In reply to this post by Daniel Schwen-2
On 4 February 2010 19:49, Daniel Schwen <[hidden email]> wrote:
>> Is it just me who notes that Guernsey is one of the *UK* Channel Islands, and not part of France ...
>> </pedant>
>
> That.. ..uhm... ...*sweat*...
>
> ...that was EXACTLY my point! ;-)
> The commons categorization system is screwed up :-P

Not really. The islands belonged to Normandy and still belong to the
Duchy of Normandy. They are not part of the UK. Normandy however is
now part of France.

--
geni

Re: [Wikitech-l] Flattening a wikimedia category

David Gerard-2
On 4 February 2010 20:12, geni <[hidden email]> wrote:

> Not really. The islands belonged to Normandy and still belong to the
> Duchy of Normandy. They are not part of the UK. Normandy however is
> now part of France.


They are indeed not part of the UK. They just, er, share in the phone,
monetary and postal systems ... to some degree ...


- d.

Re: [Wikitech-l] Flattening a wikimedia category

David Gerard-2
In reply to this post by David Gerard-2
On 7 February 2010 08:45, Andrew Garrett <[hidden email]> wrote:

> Not at all, it's entirely reasonable to discuss the problems associated
> with the current categorisation system, and what methods we'd like to
> use to improve it.


The current categorization system is per-wiki-specific. It's done
differently in different places. So it's not clear that you won't
require 750 different discussions.

To get back to the topic of category intersections on Commons:

Could the developers please outline, point by point, the precise hoops
we need to jump through to get category intersections on Commons? New
hoops seem to have been introduced during the current discussion.

Please make an unambiguous list of the hoops Commons will be required
to jump through before this feature can happen, so it's actually clear
to all and we're all working from the same page, rather than trying to
guess what shrubbery you'll be demanding next.

Thanks!


- d.

Re: [Wikitech-l] Flattening a wikimedia category

David Gerard-2
On 7 February 2010 13:09, Daniel Schwen <[hidden email]> wrote:

> Ok, let's say Neil found a way to deal with 10. I give you that this
> is implementation specific. Number 2) however is independent of any
> implementation. Here you have your "hoop" (to stick with your
> pejorative lingo): Get rid of the crazy category system and go atomic.
> What is vague about this, what part of this is unclear to you?


The problem is that doing this before the feature that uses it is in
place renders categorisation on Commons even more useless. What this
will mean is that you will be requiring a direct reduction in the
usability of the wiki content before *possibly* implementing a
feature.

In practice, the difference between this and saying "No, never" is
telling people to do work that you know can't happen.

Please leave commons-l in the cc: this time, thanks.


- d.

Re: [Wikitech-l] Flattening a wikimedia category

David Gerard-2
On 7 February 2010 13:27, Roan Kattouw <[hidden email]> wrote:

> There's no reason why it couldn't be the other way around: an
> intersection feature could be written and deployed *first*, *then* the
> category trees on Commons would be gradually migrated to the new
> system. Issues like nonsense results for automatic flattening could be
> mitigated by disabling features or making them less visible.


*Precisely*. This is why the new (and it is new) demand to trash the
present category tree before *possibly* implementing a category
intersection feature is, in practical terms, indistinguishable from
sheer contemptuous obstructionism. Daniel may be terribly offended
that I dare to be acerbic about his expression of contempt, but I find
his expression of contempt rather more offensive.


- d.

Re: [Wikitech-l] Flattening a wikimedia category

Daniel Schwen-2
In reply to this post by David Gerard-2
> In practice, the difference between this and saying "No, never" is
> telling people to do work that you know can't happen.

Wow, this is rich. We already had this conversation. A reminder:

> Demanding that all six million files be de-categorised before you'll
> even allow a category intersection tool to *possibly* be deployed is
> backward.
I never demanded that. Geez. What I want is for the Commons community
to pledge support for a change of the categorization system. Putting
intersection in the interface before they do is a _waste of time_.
I'm asking for them to show the _tiniest_ sign of support. The
programmers have already bent over backwards (including me with my own
intersection tool).

Re: [Wikitech-l] Flattening a wikimedia category

bawolff
In reply to this post by David Gerard-2
>In response to all the category intersection/flattening stuff
It's amazing how different this conversation sounds when you compare
the wikitech-l one vs the commons-l one.

-bawolff

Re: [Wikitech-l] Flattening a wikimedia category

Mike Peel
Yes, it's rather confusing... ;-) Perhaps someone subscribed to both  
lists could summarize what this conversation is about, and what has  
come out of it so far?

Mike

On 8 Feb 2010, at 20:39, bawolff wrote:

>> In response to all the category intersection/flattening stuff
> It's amazing how different this conversation sounds when you compare
> the wikitech-l one vs the commons-l one.
>
> -bawolff
>
> _______________________________________________
> Commons-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/commons-l

