No. of articles deleted over time

Haifeng Zhang
Dear all,

Is there an easy way to get the number of articles deleted in Wikipedia over time (e.g., per month)?

Can I use Quarry? What tables should I use?


Thanks,

Haifeng Zhang
_______________________________________________
Wiki-research-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l

Re: No. of articles deleted over time

Aaron Halfaker
Here's a related bit of work:
https://meta.wikimedia.org/wiki/Research:Wikipedia_article_creation

In this research project, I used a mix of both the deletion log and the
archive table to get a sense for when pages were being deleted.

Ultimately, I found that the easiest way to operationalize a deletion event
was to take the most recent ar_timestamp for a page in the archive table.
I could only go back to 2008 with this metric because the archive table
didn't exist before then.
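A minimal Python sketch of the archive-table approach, assuming the rows have already been exported from Quarry as (page, ar_timestamp) pairs, with ar_timestamp as a 14-character MediaWiki timestamp string (YYYYMMDDHHMMSS). The page key (here a hypothetical (namespace, title) tuple) and the function name are illustrative and not taken from the linked queries. The latest ar_timestamp is the time of the page's last archived revision, so it serves as a proxy for the deletion date rather than the deletion date itself.

```python
from collections import Counter

def deletions_per_month(archive_rows):
    """Count deleted pages per month, using each page's most recent
    ar_timestamp as a proxy for when it was deleted.

    archive_rows: iterable of (page_key, ar_timestamp) pairs, where
    page_key is a hypothetical identifier such as (ar_namespace, ar_title)
    and ar_timestamp is a 14-character string YYYYMMDDHHMMSS.
    """
    last_seen = {}
    for page_key, ts in archive_rows:
        # Fixed-width digit strings compare in chronological order,
        # so a plain string comparison finds the latest revision.
        if page_key not in last_seen or ts > last_seen[page_key]:
            last_seen[page_key] = ts
    # The first six characters of a MediaWiki timestamp are YYYYMM.
    return Counter(ts[:6] for ts in last_seen.values())
```

A page that was deleted, recreated and deleted again under the same key is counted once, in the month of its latest archived revision.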

The archive table is available in Quarry. See
https://quarry.wmflabs.org/query/38414 for an example query that gets the
timestamp of an article's last revision.

The logging table is also in Quarry. See
https://quarry.wmflabs.org/query/38415 for an example query that gets
deletion events.
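For the logging-table route, here is a sketch of a monthly count of deletion events, run against an in-memory SQLite table so it is self-contained. The table is a simplified stand-in holding only the columns the query uses; the real logging table has more, and the query has not been checked against Quarry's replica. The filter keeps log_type = 'delete' with log_action = 'delete', which leaves out restorations ('restore') and revision deletions ('revision'), and restricts the count to namespace 0.

```python
import sqlite3

# Simplified stand-in for MediaWiki's logging table (assumption: only the
# columns used below; the production table has many more).
SCHEMA = """
CREATE TABLE logging (
    log_type      TEXT,
    log_action    TEXT,
    log_timestamp TEXT,
    log_namespace INTEGER,
    log_title     TEXT
)
"""

# Group deletion events by the YYYYMM prefix of the log timestamp.
MONTHLY_DELETIONS = """
SELECT substr(log_timestamp, 1, 6) AS month, COUNT(*) AS deletions
FROM logging
WHERE log_type = 'delete'
  AND log_action = 'delete'
  AND log_namespace = 0
GROUP BY month
ORDER BY month
"""

def monthly_deletions(conn):
    """Return a list of (YYYYMM, number_of_deletion_events) tuples."""
    return conn.execute(MONTHLY_DELETIONS).fetchall()
```

This counts deletion events, not distinct pages, so a page deleted twice in the same month is counted twice.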

On Tue, Aug 13, 2019 at 2:51 PM Haifeng Zhang <[hidden email]>
wrote:


Re: No. of articles deleted over time

metasj
Since bug 26122 has been fixed, is there any reason not to use the deletion
log instead?

On Thu, Aug 15, 2019 at 10:27 AM Aaron Halfaker <[hidden email]>
wrote:

--
Samuel Klein          @metasj           w:user:sj          +1 617 529 4266

Re: No. of articles deleted over time

Morten Wang
A couple of lessons about article deletions from the ACTRIAL analysis:

   1. The logging table does not appear to contain correct page IDs of
   deleted pages until some time in 2014[1]. If you're looking at historical
   data and want to combine earlier deletions with other information,
   following Aaron's lead and using the archive table is probably the way to
   go.
   2. The article namespace doesn't just contain "articles"; it also
   contains redirects and disambiguation pages. Redirects in particular can
   affect measurements of the number of pages deleted[2], because there have
   been large-scale cleanups of redirects. As far as I know, the archive
   table has no information about redirect status, but the log comment can
   be used to identify a substantial number of such deletions.
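Following the second point, one rough way to separate likely redirect cleanups from other deletions is to match on the log comment. The pattern below (any comment containing the word "redirect") is a hypothetical heuristic, not the classification used in the ACTRIAL analysis, and the sketch assumes the comment text has already been fetched alongside each deletion's month.

```python
import re
from collections import Counter

# Hypothetical heuristic: a deletion counts as a likely redirect cleanup
# if its log comment contains the word "redirect" (case-insensitive).
REDIRECT_PATTERN = re.compile(r"\bredirect", re.IGNORECASE)

def split_redirect_deletions(rows):
    """Split deletion counts by whether the comment looks like a redirect
    deletion.

    rows: iterable of (month, log_comment) pairs; log_comment may be None.
    Returns (redirects, others), two Counters keyed by month.
    """
    redirects, others = Counter(), Counter()
    for month, comment in rows:
        # Treat a missing comment as empty, so it falls into "others".
        if REDIRECT_PATTERN.search(comment or ""):
            redirects[month] += 1
        else:
            others[month] += 1
    return redirects, others
```

The heuristic misses redirect deletions whose comment never uses the word and can misfire on comments that mention redirects for other reasons, so it is worth spot-checking a sample by hand.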

The code I used in our analysis of deletion reasons, which also covers the
article namespace, is on GitHub:
https://github.com/nettrom/actrial/blob/master/python/deletionreasons.py

Footnotes:

   1. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-29
   2. https://meta.wikimedia.org/wiki/Research_talk:Autoconfirmed_article_creation_trial/Work_log/2018-01-19#Improving_the_data_gathering


Cheers,
Morten

On Fri, 16 Aug 2019 at 05:31, Samuel Klein <[hidden email]> wrote:
