Re: [Foundation-l] Letter to the community on Controversial Content

Re: [Foundation-l] Letter to the community on Controversial Content

Andreas Kolbe
Note: This foundation-l post is cross-posted to commons-l, since this discussion may be of interest there as well.



> From: Tobias Oelgarte <[hidden email]>

> It is an in-house problem, as I explained on the brainstorming page [1].
> To put it short: It is a self-made problem, based on the fact that these
> images got more attention than others. Thanks to failed deletion
> requests, many people cared about them. This results in more
> exact descriptions and file names than for average images. That's what
> search engines prefer, and now we have them in a top spot. Thanks for
> caring so much about these images and not treating them like anything else.



I don't think that is the case, actually. Brandon described how the search function works here:


To take an example, the file 


(a prominent search result in searches for "shower") has never had its name or description changed since it was uploaded from Flickr. My impression is that refinement of file names and descriptions following discussions has little to do with sexual or pornography-related media appearing prominently in search listings. The material is simply there, and the search function finds it, as it is designed to do.



> Andreas, you currently represent exactly the kind of argumentation that
> leads anywhere but to a solution. I already explained in the post
> "Controversial Content vs Only-Image-Filter" [2] that single
> examples don't represent the overall topic. They also don't add
> anything to the discussion as an argument. They would be an argument if we
> knew the effects that occur. We have to clarify these questions:



It is hard to say how else to provide evidence of a problem, other than by giving multiple (not single) examples of it.

You could also search for blond, blonde, red hair, strawberry, or peach ...

What is striking is the crass sexism of some of the filenames and image descriptions: "blonde bombshell", "Blonde teenie sucking", "so, so sexy", "These two had a blast showing off" etc.


One of the images shows a young woman in the bathroom, urinating: 


Her face is fully shown, and the image, displayed in the Czech Wikipedia, carries no personality rights warning, nor is there evidence that she has consented to or is even aware of the upload.

And I am surprised how often images of porn actresses are found in search results, even for searches like "Barbie". Commons has 917 files in Category:Unidentified porn actresses alone. There is no corresponding Category:Unidentified porn actors (although there is of course a wealth of categories and media for gay porn actors).



> * Is it a problem that the search function displays sexual content? (A 
> search should find anything related, by definition.)



I think the search function works as designed, looking for matches in file names and descriptions. 



> * Is sexual content overrepresented by the search?



I don't think so. The search function simply shows what is there. However, the sexual content that comes up for innocuous searches sometimes violates the principle of least astonishment, and thus may turn some users off using, contributing to, or recommending Commons as an educational resource.



> * If that is the case, why is it that way?
> * Can we do something about it, without drastic changes, like 
> blocking/excluding categories?



One thing that might help would be for the search function to privilege files that are shown in top-level categories containing the search term: e.g. for "cucumber", first display all files that are in category "cucumber", rather than those contained in subcategories, like "sexual penetrative use of cucumbers", regardless of the file name (which may not have the English word "cucumber" in it).

A second step would be to make sure that sexual content is not housed in the top categories, but in appropriately named subcategories. This is generally already established practice. Doing both would reduce the problem somewhat, at least in cases where there is a category that matches the search term.
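The category-priority idea can be sketched as a two-tier ranking. This is a hypothetical illustration, not how the Commons search backend actually works: the `File` record, the `tier` and `rank` functions and the sample file names are assumptions, and matching is plain case-insensitive substring search. Files placed directly in a category whose name equals the query come first, whatever their file name; files that only match by name or description follow.

```python
from dataclasses import dataclass, field


@dataclass
class File:
    # Hypothetical record; real Commons file metadata is much richer.
    name: str
    description: str
    categories: set[str] = field(default_factory=set)


def tier(f: File, query: str) -> int:
    """0 = directly in a category named exactly like the query,
    1 = query appears in the file name or description, 2 = no match."""
    q = query.casefold()
    if any(c.casefold() == q for c in f.categories):
        return 0
    if q in f.name.casefold() or q in f.description.casefold():
        return 1
    return 2


def rank(files: list[File], query: str) -> list[File]:
    """Drop non-matches, then sort by tier. Python's sort is stable,
    so files keep their original relative order within a tier."""
    matches = [f for f in files if tier(f, query) < 2]
    return sorted(matches, key=lambda f: tier(f, query))
```

With a file named "Gurke.jpg" placed directly in a category "Cucumber", a query for "cucumber" would list it first even though its name contains no English word, while files that sit only in subcategories drop to the second tier. Plurals, translations and category redirects are deliberately ignored in this sketch.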


Regards,
Andreas


[1] 
http://meta.wikimedia.org/w/index.php?title=Controversial_content%2FBrainstorming&action=historysubmit&diff=2996411&oldid=2995984
[2]
http://lists.wikimedia.org/pipermail/foundation-l/2011-October/069699.html

On 17.10.2011 02:56, Andreas Kolbe wrote:
> Personality conflicts aside, we're noting that non-sexual search terms in Commons can prominently return sexual images of varying explicitness, from mild nudity to hardcore, and that this is different from entering a sexual search term and finding that Google fails to filter some results.
>
> I posted some more Commons search terms where this happens on Meta; they include
>
> Black, Caucasian, Asian;
>
> Male, Female, Teenage, Woman, Man;
>
> Vegetables;
>
> Drawing, Drawing style;
>
> Barbie, Doll;
>
> Demonstration, Slideshow;
>
> Drinking, Custard, Tan;
>
> Hand, Forefinger, Backhand, Hair;
>
> Bell tolling, Shower, Furniture, Crate, Scaffold;
>
> Galipette – French for "somersault"; this leads to a collection of 1920s pornographic films which are undoubtedly of significant historical interest, but are also pretty much as explicit as any modern representative of the genre.
>
> Andreas

_______________________________________________
Commons-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/commons-l

Re: [Foundation-l] Letter to the community on Controversial Content

Tobias Oelgarte
On 17.10.2011 12:47, Andreas Kolbe wrote:
> Note: This foundation-l post is cross-posted to commons-l, since this discussion may be of interest there as well.
>
>> From: Tobias Oelgarte [hidden email]
>
>> It is an in-house problem, as I explained on the brainstorming page [1].
>> To put it short: It is a self-made problem, based on the fact that these
>> images got more attention than others. Thanks to failed deletion
>> requests, many people cared about them. This results in more
>> exact descriptions and file names than for average images. That's what
>> search engines prefer, and now we have them in a top spot. Thanks for
>> caring so much about these images and not treating them like anything else.
>
> I don't think that is the case, actually. Brandon described how the search function works here:
>
> To take an example, the file
>
> (a prominent search result in searches for "shower") has never had its name or description changed since it was uploaded from Flickr. My impression is that refinement of file names and descriptions following discussions has little to do with sexual or pornography-related media appearing prominently in search listings. The material is simply there, and the search function finds it, as it is designed to do.

That is again picking an example. But what do you expect to find? Say that someone actually searches for an image of this practice. Should he find it in the last spot? A good search algorithm treats everything equally and delivers the closest matches. A more intelligent search would deliver images of showers first if you search for "shower", since it knows the difference between the terms "golden shower" and "shower". That's how it should work. It's definitely not an error of the search engine itself, but it could be improved to deliver better-matching results, without any marking. Extending it to exclude marked content leads back to the basic question(s), which should be unnecessary.
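Knowing the difference between "golden shower" and "shower" amounts to phrase-aware matching. A minimal sketch, assuming a hand-maintained list of compound phrases (hypothetical; a real engine might derive such phrases from usage statistics instead): a text counts as a standalone hit only if the query word still occurs after every listed compound phrase has been blanked out, and standalone hits are ranked ahead of compound-only hits. Nothing is filtered or marked; only the order changes. The function names are illustrative, not part of any MediaWiki API.

```python
import re


def _word(term: str) -> str:
    # Whole-word pattern for a lower-cased term or phrase.
    return r"\b" + re.escape(term.casefold()) + r"\b"


def standalone_hit(text: str, term: str, compounds: list[str]) -> bool:
    """True if `term` occurs in `text` outside every listed compound phrase."""
    t = text.casefold()
    for phrase in compounds:
        # Blank out the compound so its inner words can no longer match.
        t = re.sub(_word(phrase), " ", t)
    return re.search(_word(term), t) is not None


def rank_by_phrase(texts: list[str], term: str, compounds: list[str]) -> list[str]:
    """Keep texts containing `term` as a whole word; standalone uses come
    first, texts where it only appears inside a compound phrase come last."""
    hits = [x for x in texts if re.search(_word(term), x.casefold())]
    return sorted(hits, key=lambda x: 0 if standalone_hit(x, term, compounds) else 1)
```

Because the sort is stable, texts within each group keep their original order, so this could sit on top of whatever relevance ordering the engine already produces.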

>> Andreas, you currently represent exactly the kind of argumentation that
>> leads anywhere but to a solution. I already explained in the post
>> "Controversial Content vs Only-Image-Filter" [2] that single
>> examples don't represent the overall topic. They also don't add
>> anything to the discussion as an argument. They would be an argument if we
>> knew the effects that occur. We have to clarify these questions:
> It is hard to say how else to provide evidence of a problem, other than by giving multiple (not single) examples of it.
>
> You could also search for blond, blonde, red hair, strawberry, or peach ...
>
> What is striking is the crass sexism of some of the filenames and image descriptions: "blonde bombshell", "Blonde teenie sucking", "so, so sexy", "These two had a blast showing off" etc.
>
> One of the images shows a young woman in the bathroom, urinating:
>
> Her face is fully shown, and the image, displayed in the Czech Wikipedia, carries no personality rights warning, nor is there evidence that she has consented to or is even aware of the upload.
>
> And I am surprised how often images of porn actresses are found in search results, even for searches like "Barbie". Commons has 917 files in Category:Unidentified porn actresses alone. There is no corresponding Category:Unidentified porn actors (although there is of course a wealth of categories and media for gay porn actors).

Evidence would be a statistic showing how many people are actually happy with the results. Happy in the sense of: "I will use it again; I was not so offended that I stopped using it."

If the naming of these images is a problem, then we can just rename them to something more useful. We have templates and bots for that. Marking the images would not help in this case. But doing what we can already do would be a simple, working solution: rename them.

The case of this image and others is already addressed in COM:PEOPLE. I also see no direct relation between this topic (keeping/deleting) and the search function and its results.

Everyone should know that "Barbie" is an often-used term or part of a pseudonym. That the search reacts to both is quite right. The word itself does not distinguish between multiple meanings. But that's again not the problem.

I must remind you not to construct special cases. Better to spend the time searching for good solutions that don't need to discriminate content to give the best possible results.

>> * Is it a problem that the search function displays sexual content? (A
>> search should find anything related, by definition.)

> I think the search function works as designed, looking for matches in file names and descriptions.

That means that it does its job as intended.

>> * Is sexual content overrepresented by the search?

> I don't think so. The search function simply shows what is there. However, the sexual content that comes up for innocuous searches sometimes violates the principle of least astonishment, and thus may turn some users off using, contributing to, or recommending Commons as an educational resource.

That needs a big question mark; it has been an unproven statement since the beginning of the discussion. Commons and Wikipedia are meant to represent the whole variety of knowledge. A word search will eventually deliver anything that goes by that name, ambiguous or not. That means you will find anything related, since the projects don't aim at a particular audience. Take "kids", for example.

>> * If that is the case, why is it that way?
>> * Can we do something about it, without drastic changes, like
>> blocking/excluding categories?

> One thing that might help would be for the search function to privilege files that are shown in top-level categories containing the search term: e.g. for "cucumber", first display all files that are in category "cucumber", rather than those contained in subcategories, like "sexual penetrative use of cucumbers", regardless of the file name (which may not have the English word "cucumber" in it).

Refining the search should definitely be an option. After reading Brandon's comment, I also have to wonder why it doesn't consider categories. Those are the places where we have already pre-sorted content ourselves. It would definitely be worth the effort, since it would do two things at once:

1. It would most likely give better results, even if the description or filename is not translated.
2. A search function that finds content more effectively would also minimize the effect we are talking about.

> A second step would be to make sure that sexual content is not housed in the top categories, but in appropriately named subcategories. This is generally already established practice. Doing both would reduce the problem somewhat, at least in cases where there is a category that matches the search term.

I'm a little against categories that are introduced purely to divide content into sexual (offensive) and non-sexual (not offensive) content. If the practice/depiction has its own specialized term, then it is acceptable. But introducing pseudo-categories just bloats the category tree and effectively hides content. If we implement the first idea and introduce special categories, then we are effectively back to filtering and non-neutral judgment.

PS: I was wondering which mail client you use. Usually the structure is destroyed and the order of mails (Re:) is not kept, which makes it hard to follow conversations.


Re: [Foundation-l] Letter to the community on Controversial Content

Alex Brollo
The most curious and unexpected sex content I ever found is in [[Category:HTML]]. An unusual way to learn HTML syntax... browse the category, or search directly with the keywords "html hr tag" :-D

Alex


Re: [Foundation-l] Letter to the community on Controversial Content

Craig Franklin-2
There are a lot of images on Commons that make me scratch my head and say "what the hell?". I've just found another two :-).

On 18 October 2011 04:10, Alex Brollo <[hidden email]> wrote:
> The most curious and unexpected sex content I ever found is in [[Category:HTML]]. An unusual way to learn HTML syntax... browse the category, or search directly with the keywords "html hr tag" :-D
>
> Alex


Re: [Foundation-l] Letter to the community on Controversial Content

Andreas Kolbe
In reply to this post by Tobias Oelgarte
From: Tobias Oelgarte <[hidden email]>
> That is again picking an example. But what do you expect to find? Say that someone actually
> searches for an image of this practice. Should he find it in the last spot? A good search algorithm
> treats everything equally and delivers the closest matches. A more intelligent search would
> deliver images of showers first if you search for "shower", since it knows the difference between the
> terms "golden shower" and "shower". That's how it should work. It's definitely not an error of the
> search engine itself, but it could be improved to deliver better-matching results, without any marking.
> Extending it to exclude marked content leads back to the basic question(s), which should be
> unnecessary.



I would expect that someone who has entered "shower" as their search terms is looking for images
of showers, and that someone looking for images of wet sex would enter "golden shower" as their
search term. So if we present an image of people urinating on each other in response to a search like
"shower", we violate the principle of least astonishment. 



> Evidence would be a statistic showing how many people are actually happy with the results.
> Happy in the sense of: "I will use it again; I was not so offended that I stopped using it."



One thing not to forget here is that we may turn away users who might otherwise contribute. If the users
that *remain* are happy, that does not mean that we have not lost many others, nor that those who
remained are representative of the broader population.



> If the naming of these images is a problem, then we can just rename them to something more useful. We
> have templates and bots for that. Marking the images would not help in this case. But doing what we
> can already do would be a simple, working solution: rename them.



Difficult too; would you suggest giving all sexual images code names?



> Everyone should know that "Barbie" is an often-used term or part of a pseudonym. That the search
> reacts to both is quite right. The word itself does not distinguish between multiple meanings. But that's
> again not the problem.



Someone entering "Barbie" as the search term is probably looking for images of Barbie dolls, not images
of porn actresses like Lanny Barbie, Fetish Babie or Barbie Love. I think it's not unreasonable to expect
the latter group of people to enter both parts of the name.



> > One thing that might help would be for the search function to privilege files that are shown in top-
> > level categories containing the search term: e.g. for "cucumber", first display all files that are in
> > category "cucumber", rather than those contained in subcategories, like "sexual penetrative use of
> > cucumbers", regardless of the file name (which may not have the English word "cucumber" in it).

> Refining the search should definitely be an option. After reading Brandon's comment, I also have to
> wonder why it doesn't consider categories. Those are the places where we have already pre-sorted
> content ourselves. It would definitely be worth the effort, since it would do two things at once:
> 1. It would most likely give better results, even if the description or filename is not translated.
> 2. A search function that finds content more effectively would also minimize the effect we are
> talking about.


We are in agreement on that point. I've asked Brandon (on the gendergap list) whether this would be a lot of work,
but he has previously indicated that finding time to reprogram this might be difficult. Nevertheless, I
think it is something we should pursue. Anything you can do to help is appreciated.




> I'm a little against categories that are introduced purely to divide content into sexual (offensive) and
> non-sexual (not offensive) content. If the practice/depiction has its own specialized term, then it is acceptable.
> But introducing pseudo-categories just bloats the category tree and effectively hides content. If we
> implement the first idea and introduce special categories, then we are effectively back to filtering and
> non-neutral judgment.



I don't think it makes sense to feature women with cucumbers inserted in their vagina in the top-level
cucumber category. Again, principle of least astonishment.



> PS: I was wondering which mail client you use. Usually the structure is destroyed and the order of
> mails (Re:) is not kept, which makes it hard to follow conversations.



I know, it's a pain. This should be my last post to this list with the Yahoo client; I've gotten myself a
Gmail account and will use that from now on.



Cheers,
Andreas


