Unbreaking statistics

Peter Gervai-5
Hello,

I see I've created quite a stir, but so far nothing really useful has
come of it. :-(

But I see this one from Neil:
> Yes, modifying the http://stats.grok.se/ systems looks like the way to go.

To me it doesn't really seem to be, since it uses an extremely
dumbed-down input which contains only page views and [unreliable]
byte counters. Most probably it would require large rewrites, plus a
magical new data source.

> What do people actually want to see from the traffic data? Do they want
> referrers, anonymized user trails, or what?

Are you old enough to remember stats.wikipedia.org? As far as I
remember it originally ran Webalizer, then something else, then
nothing. If you check a Webalizer report you'll see what's in it. We
are using (or we used, until our nice fellow editors broke it)
AWStats, which provides basically the same with more caching.

The most used and useful stats are page views (daily and hourly
breakdowns are pretty useful too), referrers, visitor domain and
provider stats, OS and browser stats, screen resolution stats, bot
activity stats, and visitor duration and depth, among others.

At a brief glance I could replicate the grok.se stats easily, since
they seem to be built from http://dammit.lt/wikistats/, but that data
is completely useless for anything beyond page hit counts.

Is there a possibility to write code which processes the raw Squid data?
Who do I have to bribe? :-/

--
 byte-byte,
    grin


Re: Unbreaking statistics

Tim Starling-2
Peter Gervai wrote:
> Is there a possibility to write code which processes the raw Squid data?
> Who do I have to bribe? :-/

Yes, it's possible. You just need to write a script that accepts a log
stream on stdin and builds the aggregate data from it. If you want
access to IP addresses, it needs to run on our own servers, with only
anonymised data being passed on to the public.

http://wikitech.wikimedia.org/view/Squid_logging
http://wikitech.wikimedia.org/view/Squid_log_format

-- Tim Starling
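
For concreteness, a minimal sketch in Python of the kind of stdin
filter Tim describes. The whitespace-delimited field layout below is
an assumption; check the Squid_log_format page above for the real one.

# Count page views from a log stream on stdin, emitting no IPs.
# The field layout is assumed, not the documented Wikimedia format.
import sys
from collections import defaultdict
from urllib.parse import urlsplit

URL_FIELD = 8  # assumed position of the request URL in each line

counts = defaultdict(int)
for line in sys.stdin:
    fields = line.split()
    if len(fields) <= URL_FIELD:
        continue  # malformed line, skip it
    path = urlsplit(fields[URL_FIELD]).path
    if path.startswith('/wiki/'):
        counts[path[len('/wiki/'):]] += 1

for title, hits in sorted(counts.items(), key=lambda kv: -kv[1]):
    print('%d\t%s' % (hits, title))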



Re: Unbreaking statistics

Alex Zaddach
In reply to this post by Peter Gervai-5
Peter Gervai wrote:

> Is there a possibility to write code which processes the raw Squid data?
> Who do I have to bribe? :-/

We do have http://stats.wikimedia.org/ which includes things like
http://stats.wikimedia.org/EN/VisitorsSampledLogOrigins.htm

--
Alex (wikipedia:en:User:Mr.Z-man)


Re: Unbreaking statistics

Robert Rohde
In reply to this post by Tim Starling-2
On Fri, Jun 5, 2009 at 6:38 PM, Tim Starling<[hidden email]> wrote:

> Yes, it's possible. You just need to write a script that accepts a log
> stream on stdin and builds the aggregate data from it. If you want
> access to IP addresses, it needs to run on our own servers, with only
> anonymised data being passed on to the public.

How much of that is really considered private?  IP addresses
obviously, anything else?

I'm wondering if a cheap and dirty solution (at least for the low
traffic wikis) might be to write a script that simply scrubs the
private information and makes the rest available for whatever
applications people might want.

-Robert Rohde
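
A sketch of the field-level scrub Robert means, in Python. The field
positions and timestamp format are assumptions, and, as the replies
below argue, this alone does not make a log safe to publish.

# Keep only a whitelist of fields and truncate timestamps to the
# hour. Assumed layout: 0:host 1:seq 2:timestamp 3:svc-time
# 4:client-ip 5:status 6:size 7:method 8:url 9+:referrer, UA, ...
import sys

TIME_FIELD, METHOD_FIELD, URL_FIELD = 2, 7, 8

for line in sys.stdin:
    fields = line.split()
    if len(fields) <= URL_FIELD:
        continue
    hour = fields[TIME_FIELD][:13]  # '2009-06-05T22' if ISO 8601
    print(hour, fields[METHOD_FIELD], fields[URL_FIELD])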


Re: Unbreaking statistics

Gregory Maxwell
On Fri, Jun 5, 2009 at 10:13 PM, Robert Rohde<[hidden email]> wrote:

> How much of that is really considered private?  IP addresses
> obviously, anything else?
>
> I'm wondering if a cheap and dirty solution (at least for the low
> traffic wikis) might be to write a script that simply scrubs the
> private information and makes the rest available for whatever
> applications people might want.

There is a lot of private data in user agents ("MSIE 4.123; WINNT 4.0;
bouncing_ferret_toolbar_1.23 drunken_monkey_downloader_2.34" may be
uniquely identifying). There is even private data in titles if you
don't sanitize carefully
(/wiki/search?lookup=From%20rarohde%20To%20Gmaxwell%20OMG%20secret%20stuff%20lemme%20accidently%20paste%20it%20into%20the%20search%20box).
And there is private data in referrers
(http://rarohde.com/url_that_only_rarohde_would_have_comefrom).

Things which individually do not appear to disclose anything private
can disclose private things (look at the people uniquely identified by
AOL's 'anonymized' search data).

On the flip side, aggregation can take private things (e.g. user
agents, IP info, referrers) and convert them to non-private data: top
user agents, top referrers, highest-traffic ASNs... but it becomes
potentially revealing if not done carefully: the 'top' network and
user agent info for a single obscure article in a short time window
may be information from only one or two users, not really an
aggregation.

Things like common paths through the site should be safe so long as
they are not provided with too much temporal resolution, are limited
to existing articles, and are limited either to really common paths
or to paths broken into two- or three-node chains, with the least
common of those withheld.

Generally, when dealing with private data you must approach it with
the same attitude that a C coder must take to avoid buffer overflows:
treat all data as hostile, assume all actions are potentially
dangerous, try to figure out how to break it, and think deviously.
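
Gregory's aggregation-threshold point translates directly into code.
A minimal sketch, where K and the grouping key are illustrative
choices:

# Only release an aggregate row once it covers at least K events,
# so the 'top user agent' for an obscure article is never a single
# person.
from collections import Counter

K = 10

def publishable(events):
    """events: iterable of (article, user_agent_family) pairs."""
    counts = Counter(events)
    return {key: n for key, n in counts.items() if n >= K}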


Re: Unbreaking statistics

Brian J Mingus
In reply to this post by Robert Rohde
Scrubbing log files to make the data private is hard work. You'd be
impressed by what researchers have been able to do: taking purportedly
anonymous data and using it to identify users en masse by correlating
it with publicly available data from other sites such as Amazon,
Facebook and Netflix. Make no mistake: if you don't do it carefully
you will become the target of, in the best of cases, an academic
researcher who wants to prove that you don't understand statistics.


Re: Unbreaking statistics

Robert Rohde
In reply to this post by Gregory Maxwell
On Fri, Jun 5, 2009 at 9:20 PM, Gregory Maxwell<[hidden email]> wrote:

> [snip]
> Generally, when dealing with private data you must approach it with
> the same attitude that a C coder must take to avoid buffer overflows:
> treat all data as hostile, assume all actions are potentially
> dangerous, try to figure out how to break it, and think deviously.

On reflection I agree with you, though I think the biggest problem
would actually be a case you didn't mention: if one provided timing
and page view information, one could almost certainly single out
individual users by correlating view timing with edit histories.

Okay, so no stripped logs. The next question becomes what the right
way to aggregate is. We can A) reinvent the wheel, or B) adapt a
pre-existing log analyzer in a mode that produces clean aggregate
data. While I respect the work of Zachte and others, this might be a
case where B is the better near-term solution.

Looking at http://stats.wikipedia.hu/cgi-bin/awstats.pl (the page that
started this mess), his AWStats config already suppresses IP info and
aggregates everything into groups from which it is very hard to
identify anything personal. (There is still a small risk in allowing
users to drill down to pages / requests that are almost never made,
but perhaps that could be turned off.) AWStats has native support for
Squid logs and is open source.

This is not necessarily the only option, but I suspect that if we gave
it some thought it would be possible to find an off-the-shelf tool
that would be good enough to support many wikis and configurable
enough to satisfy even the GMaxwells of the world ;-). huwiki is
actually the 20th largest wiki (by number of edits), so if it worked
for them, then a tool like AWStats can probably work for most of the
projects (which are not EN).

-Robert Rohde


Hotlinking (was Re: Unbreaking statistics)

Platonides
In reply to this post by Alex Zaddach
Alex wrote:
> We do have http://stats.wikimedia.org/ which includes things like
> http://stats.wikimedia.org/EN/VisitorsSampledLogOrigins.htm

I see the site www.musicistheheartofoursoul.com pretty high on that
list. Looking at the page, they include many images from Wikimedia
servers, hotlinking them without a link to the image page.

Moreover, these aren't even free images, but fair-use ones uploaded
on enwiki.

Shouldn't we politely ask them to make a local copy?



Re: Hotlinking (was Re: Unbreaking statistics)

Brian J Mingus
Two things:
1. There isn't a good way to get an image dump.
2. Allowing hotlinking seems to fit nicely within the WMF mission.

On Sat, Jun 6, 2009 at 6:24 PM, Platonides <[hidden email]> wrote:

> Shouldn't we politely ask them to make a local copy?

Re: Unbreaking statistics

John at Darkstar
In reply to this post by Tim Starling-2
If someone wants to work on this, I have some ideas for making
something useful out of this log, but I'm a bit short on time.
Basically there are two ideas that would be really useful: one is to
figure out which articles are most interesting to show on a portal,
and the other is how to detect missing links between articles.
John

Tim Starling skrev:

> Yes, it's possible. You just need to write a script that accepts a log
> stream on stdin and builds the aggregate data from it.


Re: Unbreaking statistics

John at Darkstar
In reply to this post by Robert Rohde
Some articles are only very rarely requested, and those can be used to
uniquely identify a machine. Then there are all those who do something
that ends up in public logs. The latter are very difficult to
obfuscate, but the former can be handled by setting a time frame long
enough that sufficient other traffic falls within the same window.
Unfortunately this time frame is pretty long for some articles; from
some tests it seems to be weeks on Norsk (bokmål) Wikipedia.
John
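
A minimal sketch of John's windowing idea in Python: accumulate
per-article counts and release a window only once it holds enough
views for a single request to hide in the crowd. The trigger count K
is illustrative, and a real version would also cap the window's age.

# article -> views accumulated in the current window
from collections import defaultdict

K = 20
pending = defaultdict(int)

def record_view(article):
    """Return (article, count) when a window becomes releasable."""
    pending[article] += 1
    if pending[article] >= K:
        released = (article, pending[article])
        pending[article] = 0  # start a new window
        return released
    return None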



Re: Hotlinking (was Re: Unbreaking statistics)

David Gerard-2
In reply to this post by Brian J Mingus
2009/6/7 Brian <[hidden email]>:

> Two things
> 1. There isn't a good way to get an image dump
> 2. Allowing hotlinking seems to fit nicely within the WMF mission.


Hotlinking isn't generally allowed, but using Commons as a remote
repository on your own MediaWiki is.


- d.


Re: Hotlinking (was Re: Unbreaking statistics)

John at Darkstar
In reply to this post by Brian J Mingus
Hotlinking fair-use images is something that should not be possible.
John

Brian skrev:

> Two things:
> 1. There isn't a good way to get an image dump.
> 2. Allowing hotlinking seems to fit nicely within the WMF mission.


Re: Hotlinking (was Re: Unbreaking statistics)

Brian J Mingus
In reply to this post by David Gerard-2
What do you mean it's not allowed? It works. There is only one way to
disallow it!

On Sun, Jun 7, 2009 at 1:11 AM, David Gerard <[hidden email]> wrote:

> Hotlinking isn't generally allowed, but using Commons as a remote
> repository on your own MediaWiki is.

Re: Hotlinking (was Re: Unbreaking statistics)

Brian J Mingus
In reply to this post by John at Darkstar
What on earth are you talking about?

On Sun, Jun 7, 2009 at 1:12 AM, John at Darkstar <[hidden email]> wrote:

> Hotlinking fair-use images is something that should not be possible.

Re: Hotlinking (was Re: Unbreaking statistics)

John at Darkstar
Platonides commented on fair-use images; you say in point 2 that
hotlinking is something that fits nicely within the WMF mission; I say
it should not be possible to hotlink fair-use images.

How would you argue that serving _fair_use_images_ for someone else is
within the WMF mission? How would you argue that such use of the
images does not violate the copyright owners' rights to the images?

John

Brian skrev:

> What on earth are you talking about?


Re: Hotlinking (was Re: Unbreaking statistics)

Robert Rohde
On Sun, Jun 7, 2009 at 1:37 AM, John at Darkstar<[hidden email]> wrote:
> Platonides commented on fair-use images; you say in point 2 that
> hotlinking is something that fits nicely within the WMF mission; I say
> it should not be possible to hotlink fair-use images.
>
> How would you argue that serving _fair_use_images_ for someone else is
> within the WMF mission? How would you argue that such use of the
> images does not violate the copyright owners' rights to the images?

At the risk of stating the obvious, the person hotlinking the image
could also have an entirely reasonable fair use claim.

-Robert Rohde


Re: Hotlinking (was Re: Unbreaking statistics)

John at Darkstar
The person hotlinking can have a reasonable fair use claim, but the
site serving a fair-use image for someone else would most likely be in
serious trouble defending its position. If you have some reasoning
that this is not the case it would be interesting, as the fair-use
images from English Wikipedia could then be moved to Commons.

Robert Rohde skrev:

> At the risk of stating the obvious, the person hotlinking the image
> could also have an entirely reasonable fair use claim.


Re: Hotlinking (was Re: Unbreaking statistics)

Gerard Meijssen-3
Hoi,
Is this discussion about policy relevant to this mailing list?
Thanks,
     GerardM

2009/6/7 John at Darkstar <[hidden email]>

> The person hotlinking can have a reasonable fair use claim, but the
> site serving a fair-use image for someone else would most likely be in
> serious trouble defending its position.

Re: Hotlinking (was Re: Unbreaking statistics)

David Gerard-2
2009/6/7 Gerard Meijssen <[hidden email]>:

> Is this discussion about policy relevant to this mailing list?


Somewhat:

If we officially don't like hotlinking, is it reasonable to disable
hotlinking from Wikimedia sites? If so, can it be done without
breaking remote file repo use of Commons?


- d.
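
The usual mechanism is a Referer check, and it need not break remote
file repos: a MediaWiki using Commons as a remote repository fetches
files server-side, so no browser Referer is involved. A purely
illustrative sketch of the distinction in Python; this is not
Wikimedia's actual configuration, and real matching needs more care
than a suffix test.

# Empty or absent referers pass, since server-side remote-repo
# fetches and privacy-stripped browsers send none.
from urllib.parse import urlsplit

ALLOWED_SUFFIXES = ('.wikipedia.org', '.wikimedia.org')

def is_hotlink(referer):
    """True if the request looks like a browser hotlink from a
    non-Wikimedia page."""
    if not referer:
        return False
    host = urlsplit(referer).netloc
    # Deliberately crude; a real check would handle ports, exact
    # domains, and lookalike hosts.
    return not host.endswith(ALLOWED_SUFFIXES)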
