Tool to find poorly written articles

Tool to find poorly written articles

Ditty Mathew
Hi,

I am planning to develop a tool to find poorly written articles and rank them accordingly. This would give statistics on which articles need to be improved to make them well written. Also, finding a good article in one language would help us recommend it in other languages where the same article is poorly written.

Does any tool that does this already exist? If not, would such a tool be helpful? Can you give me some suggestions?


with regards

Ditty

_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: Tool to find poorly written articles

Aileen Oeberst
I am currently on vacation and will not be able to answer your mail before
November 10. But I will get back then as soon as possible.

Best regards, Aileen Oeberst



Re: Tool to find poorly written articles

Aaron Halfaker-2
Hi Ditty!

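Since Aileen is on vacation (lol), I've got some references for you.

   - Cosley, D., Frankowski, D., Terveen, L., & Riedl, J. (2007, January).
   SuggestBot: using intelligent task routing to help people find work in
   Wikipedia. In *Proceedings of the 12th International Conference on
   Intelligent User Interfaces* (pp. 32-41). ACM.
   http://pensivepuffin.com/dwmcphd/syllabi/info447_wi14/readings/09-Systems/Cosley.SuggestBot.IUI07.pdf
   - Warncke-Wang, M., Cosley, D., & Riedl, J. (2013, August). Tell me
   more: an actionable quality model for Wikipedia. In *Proceedings of the
   9th International Symposium on Open Collaboration* (p. 8). ACM.
   http://opensym.org/wsos2013/proceedings/p0202-warncke.pdf

If you are interested in the article quality predictions that are used in SuggestBot, check out wikiclass (https://pythonhosted.org/wikiclass/). Right now, we only have models built for English Wikipedia, but the features are relatively language-agnostic and should work in other languages.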

-Aaron


Re: Tool to find poorly written articles

Ziko van Dijk-3
Hello,
What exactly do you mean by "poorly written"? Dnaber presented LanguageTool at Wikimania; it detects wording that might be incorrect.
Kind regards
Ziko


Re: Tool to find poorly written articles

Ditty Mathew
What I meant by "poorly written" is the overall quality of the article.

with regards

Ditty


Re: Tool to find poorly written articles

Ditty Mathew
In reply to this post by Aaron Halfaker-2
Hi Aaron,

How can we evaluate such a system? Are there any existing ratings of articles available?

with regards

Ditty


Re: Tool to find poorly written articles

Aaron Halfaker-2
Ditty, 

Yes.  See the following for discussion:

Warncke-Wang, M., Cosley, D., & Riedl, J. (2013, August). Tell me more: an actionable quality model for Wikipedia. In Proceedings of the 9th International Symposium on Open Collaboration (p. 8). ACM.  http://opensym.org/wsos2013/proceedings/p0202-warncke.pdf
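If human ratings are available, one way to evaluate a predictor is to compare its output against them. The following is a minimal sketch, assuming you already have one human-assigned class and one predicted class per article. The ordered label list `SCALE` is a simplified version of English Wikipedia's assessment scale chosen only for illustration, and `evaluate` is a hypothetical helper, not part of wikiclass or of the paper above.

```python
from collections import Counter

# Simplified ordinal quality scale (an assumption for illustration; English
# Wikipedia's assessment scale also has A-class and list-specific classes).
SCALE = ["Stub", "Start", "C", "B", "GA", "FA"]
RANK = {label: i for i, label in enumerate(SCALE)}


def evaluate(human, predicted):
    """Compare predicted quality classes with human-assigned ones.

    Both arguments map article titles to labels from SCALE. Only articles
    present in both mappings are scored.
    """
    titles = sorted(set(human) & set(predicted))
    if not titles:
        raise ValueError("no articles in common")
    n = len(titles)
    exact = sum(human[t] == predicted[t] for t in titles)
    # The scale is ordinal, so also count predictions off by at most one class.
    within_one = sum(abs(RANK[human[t]] - RANK[predicted[t]]) <= 1 for t in titles)
    # Confusion counts keyed by (human label, predicted label).
    confusion = Counter((human[t], predicted[t]) for t in titles)
    return {"accuracy": exact / n, "within_one": within_one / n, "confusion": confusion}
```

Reporting "within one class" next to exact accuracy is a design choice: on an ordered scale, confusing Stub with Start is a smaller error than confusing Stub with FA.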


Re: Tool to find poorly written articles

James Salsman-2
In reply to this post by Ditty Mathew
Ditty,

Article quality is inherently subjective in the hard-AI sense. A panel of judges will consider accurate articles full of spelling, grammar, and formatting errors superior in quality to hoax, biased, spam, or out-of-date articles with perfect grammar, impeccable spelling, and immaculate formatting.

In my studies of the short popular vital articles (WP:SPVA), the feature I've found so far that correlates most closely with subjective mean-opinion-score quality is sentence length. But it has diminishing returns, and the raw correlation is +0.2 at best.
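A measurement like this amounts to computing a sentence-length statistic per article and correlating it with the human scores. The following is a minimal sketch, assuming plain article text (not wikitext) and a list of mean opinion scores in the same order. The naive regex sentence splitter and both function names are illustrative assumptions, not James's actual method.

```python
import math
import re


def mean_sentence_length(text):
    """Average number of words per sentence.

    Naive splitter (an assumption): a sentence ends at '.', '!' or '?'
    followed by whitespace, so abbreviations like "e.g." are miscounted.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return 0.0
    words = sum(len(s.split()) for s in sentences)
    return words / len(sentences)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / (sx * sy)
```

Usage would be `pearson([mean_sentence_length(t) for t in texts], scores)`. On Python 3.10 and later, `statistics.correlation` computes the same coefficient.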

The entirely subjective nature of article quality is additional support for automating accuracy review.

Best regards,
James



Re: Tool to find poorly written articles

Simon Knight
In reply to this post by Ditty Mathew
Hi Ditty, there might be some other relevant literature in this list:
https://wikimedia.org.uk/wiki/Talk:Technology_Committee/Project_requests/WikiRate_-_rating_Wikimedia#Relevant_literature  (it's an area Wikimedia UK are interested in exploring)

Best
Simon


Re: Tool to find poorly written articles

WereSpielChequers-2
In reply to this post by James Salsman-2
And just to add to the complexity of James' comments: some people think that a general-interest encyclopaedia should be written for a general audience, so articles with long sentences should be improved by rewriting them into more, shorter sentences.


Re: Tool to find poorly written articles

Aileen Oeberst
In reply to this post by Ditty Mathew
I am currently on vacation and will not be able to answer your mail before
November 10. But I will get back then as soon as possible.

Best regards, Aileen Oeberst



Re: Tool to find poorly written articles

Joe Corneli-3
In reply to this post by WereSpielChequers-2

On Sat, Oct 25 2014, WereSpielChequers wrote:

> And just to add to the complexity of James' comments; there are some people
> who think that a general interest encyclopaedia should be written for a
> general audience. So articles with long sentences should be improved by
> rewriting into more but shorter sentences,

How about an even simpler version of the problem: an encyclopedia
written by robots for robots.  I speak, of course, of DBpedia.  We could
equally ask: what makes for quality entries there?


Re: Tool to find poorly written articles

Ziko van Dijk-3
Hello Ditty,

It is difficult for me to understand your question unless you are more
specific about what you consider a "poorly written article". "Poorly" can
refer here to many different things: readability, grammar, balance,
statements supported by 'sources', a good division of knowledge over
several articles, etc.

I think that software tools can only give a hint; the judgement
(how "good" an article is) can only be made by a human, on the basis
of concrete criteria for what is meant by "good", and for which target
group. I would say that some Wikipedia articles are "good" for
experts but at the same time unsuitable for the general public.

E.g., a software tool can count the words per sentence, but long
sentences are not necessarily good or bad by themselves.

Etc. :-)

Kind regards
Ziko

Re: Tool to find poorly written articles

Ditty Mathew
Hi Ziko,

You are right. But if an article has very little content, or few references, edits, images, links, etc., then it is of poor quality. Based on these factors, we can estimate the quality of an article to some extent.
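The factors listed above can be counted directly from an article's wikitext, except the edit count, which comes from the page history. The following is a minimal sketch. `quality_features` and the regexes are hypothetical and only roughly approximate wikitext syntax. For example, they miss images placed through infobox parameters and citation templates used outside `<ref>` tags.

```python
import re

# Rough approximations of wikitext syntax (assumptions, not a parser).
# "<ref" followed by whitespace, ">" or "/" matches <ref>, <ref name=...>
# and <ref .../>, but not <references/>.
REF_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)
IMAGE_RE = re.compile(r"\[\[\s*(?:File|Image)\s*:", re.IGNORECASE)
# Any [[...]] link that is not a file, image or category link.
LINK_RE = re.compile(r"\[\[(?!\s*(?:File|Image|Category)\s*:)", re.IGNORECASE)


def quality_features(wikitext, edit_count):
    """Return simple size and structure counts for one article revision.

    edit_count must be supplied separately (e.g. from the page history),
    since it cannot be read from the article text itself.
    """
    return {
        "words": len(wikitext.split()),
        "references": len(REF_RE.findall(wikitext)),
        "images": len(IMAGE_RE.findall(wikitext)),
        "links": len(LINK_RE.findall(wikitext)),
        "edits": edit_count,
    }
```

Each count on its own is only a hint. Turning them into a ranking still needs weights, which is roughly what a trained model like the ones Aaron pointed to would provide.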

with regards

Ditty


Re: Tool to find poorly written articles

Ziko van Dijk-3
Okay. What do you think of the wikibu tool from Switzerland? It
assumes that the number of editors, readers, etc. are indicators of
quality, or at least a basis for discussion.
Kind regards
Ziko

http://www.wikibu.ch/search.php?search=Frankfurter+Nationalversammlung


Re: Tool to find poorly written articles

Joe Corneli-3
In reply to this post by Ditty Mathew

On Sat, Oct 25 2014, Ditty Mathew wrote:

> Hi Ziko,
>
> You are right. But if an article has very little content, or few
> references, edits, images, links, etc., it is of poor quality. Based on
> these factors, we can estimate the quality of an article to some extent.

To some extent I would agree with you, and there's a comparison of just
this nature on pp. 96-98 of my thesis (http://oro.open.ac.uk/40775/).

However, the classic Hannah Arendt [1] vs Pamela Anderson [2] example
seems like it might be a challenge: I'd be curious to know which one of
those articles your metrics would describe as better quality.  And how
would you compare those to the biography of Meredith L. Patterson [3]?

Further, if you try to compare biographical articles with articles on
technical topics, like the article on ultrafilters [4] mentioned in my
thesis, then you'll really be comparing apples and oranges.  At the very
least it seems like you should take into consideration "network"
properties of the article relative to other *related* articles --
although then you will quickly get into the business of evaluating
sub-sections of the encyclopedia.
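The idea of judging an article relative to its *related* articles, rather than against the whole encyclopedia, can be sketched as a z-score: how far one metric for an article sits from the same metric across its neighbours. This is only an illustration under stated assumptions: `relative_score` is a hypothetical helper, "related" is taken to mean the articles it links to, the population standard deviation is used, and the article titles and citation counts below are made up.

```python
from statistics import mean, pstdev


def relative_score(article, metric, links):
    """Hypothetical helper: z-score of metric[article] against the
    articles it links to (its assumed "related" articles).

    metric: dict of article title -> numeric value (e.g. citation count)
    links:  dict of article title -> set of linked article titles
    """
    values = [metric[t] for t in links.get(article, ()) if t in metric]
    if len(values) < 2:
        return None  # too few related articles to compare against
    spread = pstdev(values)
    if spread == 0:
        return None  # neighbours are identical, so a z-score is undefined
    return (metric[article] - mean(values)) / spread


# Made-up citation counts for four hypothetical articles.
metric = {"A": 12, "B": 20, "C": 30, "D": 10}
links = {"A": {"B", "C", "D"}}
print(round(relative_score("A", metric, links), 2))  # -0.98
```

A negative score only says that article "A" has fewer citations than its neighbours; whether that makes it worse still depends on the role the article is meant to play.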

You may also have to consider the role that the article is meant to
play: e.g. is it just there to present simple facts, or is it meant to
be more expository?  A print encyclopedia would have zero links, and a
given article might be "impressionistic" and still high-calibre:
http://www.newyorker.com/magazine/2001/07/16/encyclopaedia-anderson

Joe

[1] https://en.wikipedia.org/wiki/Hannah_Arendt
[2] https://en.wikipedia.org/wiki/Pamela_Anderson
[3] https://en.wikipedia.org/wiki/Meredith_L._Patterson
[4] https://en.wikipedia.org/wiki/Ultrafilter


Re: Tool to find poorly written articles

Kerry Raymond
In reply to this post by Ziko van Dijk-3
I think it's pointless to argue over what we mean by "quality" or "well
written" in general. It is fair to say that there are a lot of mechanically
derivable metrics for articles including:

* number of citations
* number of unique citations
* article length
* density of citations and of unique citations relative to article length
* ditto for photos, infoboxes, navboxes, categories, etc.
* linguistic analysis such as sentence length and Flesch-Kincaid readability scores
* age of article
* number of editors
* number of page views
* density of ...
* number of reverts
* reverts per editor/year/etc.
* number of inbound links, number of outbound links, number of redlinks
* manual quality assessments (usually in project tags)
* presence of "issue" tags, e.g. refimprove, citation needed, etc.

It seems to me that if we had a tool that could generate a wide range of
these sorts of metrics, folks could then put their own algorithm on top
to compute and weight whatever combination of them makes sense for their
particular purpose.
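The separation proposed here (a tool emits raw metrics, and each user applies their own weighting) can be sketched in Python. This is a minimal illustration, not an existing tool: all function names, the short list of issue templates, and the example weights are hypothetical. Citations are counted with a simple regular expression over `<ref>` tags (a named ref reused several times counts once as a unique citation), readability uses the standard Flesch-Kincaid grade-level formula with a crude vowel-group syllable heuristic, and the `prose` argument is assumed to be plain text with markup already removed (for example with mwparserfromhell's `strip_code()`).

```python
import re

# Opening <ref ...> tags, including self-closing reuses like <ref name="a" />.
# The \b stops this from matching <references />.
REF_TAG = re.compile(r"<ref\b([^>]*)>", re.IGNORECASE)
NAME_ATTR = re.compile(r'name\s*=\s*(?:"([^"]*)"|([^\s/>]+))', re.IGNORECASE)
# Hypothetical, deliberately short list of "issue" templates to look for.
ISSUE_TAGS = ("citation needed", "refimprove")


def citation_counts(wikitext):
    """Return (total <ref> tags, distinct citations); a named ref counts once."""
    total, unique = 0, set()
    for attrs in REF_TAG.findall(wikitext):
        total += 1
        m = NAME_ATTR.search(attrs)
        if m:
            unique.add(m.group(1) if m.group(1) is not None else m.group(2))
        else:
            unique.add(("anonymous", total))  # each unnamed ref is distinct
    return total, len(unique)


def count_syllables(word):
    """Crude heuristic: count vowel groups, minus a silent final 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if n > 1 and word.endswith("e") and not word.endswith("le"):
        n -= 1
    return max(n, 1)


def readability(prose):
    """Words per sentence and Flesch-Kincaid grade level for plain text."""
    sentences = [s for s in re.split(r"[.!?]+", prose) if s.strip()]
    words = re.findall(r"[A-Za-z']+", prose)
    if not sentences or not words:
        return {"words_per_sentence": 0.0, "fk_grade": 0.0}
    wps = len(words) / len(sentences)
    spw = sum(count_syllables(w) for w in words) / len(words)
    return {"words_per_sentence": wps,
            "fk_grade": 0.39 * wps + 11.8 * spw - 15.59}


def article_metrics(wikitext, prose):
    """Raw, unweighted metrics; the caller decides what they mean."""
    total, unique = citation_counts(wikitext)
    length = len(prose.split())
    lower = wikitext.lower()
    metrics = {
        "citations": total,
        "unique_citations": unique,
        "length_words": length,
        # distinct citations per 1,000 words of prose
        "citation_density": 1000 * unique / length if length else 0.0,
        "issue_tags": sum(lower.count("{{" + t) for t in ISSUE_TAGS),
    }
    metrics.update(readability(prose))
    return metrics


def weighted_score(metrics, weights):
    """Each user supplies their own weights; unlisted metrics are ignored."""
    return sum(w * metrics.get(name, 0.0) for name, w in weights.items())


wikitext = ('Cats purr.<ref name="a">X</ref> Cats sleep.<ref name="a" /> '
            'Dogs bark.<ref>Y</ref>{{Citation needed|date=May 2014}}')
prose = "Cats purr. Cats sleep. Dogs bark."
m = article_metrics(wikitext, prose)
print(m["citations"], m["unique_citations"], m["issue_tags"])  # 3 2 1
print(weighted_score(m, {"unique_citations": 1.0, "issue_tags": -5.0}))  # -3.0
```

Because the raw metrics sit on very different scales, a real user would probably normalise them (for instance relative to related articles) before weighting; the weights above are arbitrary and say nothing about what "quality" means.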

Kerry

-----Original Message-----
From: [hidden email]
[mailto:[hidden email]] On Behalf Of Ziko van Dijk
Sent: Saturday, 25 October 2014 11:28 PM
To: Research into Wikimedia content and communities
Subject: Re: [Wiki-research-l] Tool to find poorly written articles

Okay. What do you think of the wikibu tool from Switzerland? It
treats the number of editors and readers etc. as indicators of
quality, or at least as a basis for discussion.
Kind regards
Ziko

http://www.wikibu.ch/search.php?search=Frankfurter+Nationalversammlung


Re: Tool to find poorly written articles

Edward Saperia

I agree with this *so much*. Give us infrastructure to make views, and we'll use it to make amazing things!

Sent from my iPhone

> On 25 Oct 2014, at 21:41, "Kerry Raymond" <[hidden email]> wrote:

Re: Tool to find poorly written articles

Jack Park
In reply to this post by Kerry Raymond
I think it's still some time off, but several open source Watson-like machine reading tools are coming along, any one of which could be put to the task of interest here. They could do more than that, but it's a start.


On Sat, Oct 25, 2014 at 1:41 PM, Kerry Raymond <[hidden email]> wrote:

Re: Tool to find poorly written articles

Aileen Oeberst
In reply to this post by Ditty Mathew
I am currently on vacation and will not be able to answer your mail before
November 10. But I will get back then as soon as possible.

Best regards, Aileen Oeberst

