[MediaWiki-l] Permanently remove old revisions and unused files?

Mickey Feldman
I have been looking for an extension or process to remove all revisions
of pages "older than _date_" or "all but the last _n_", but have not
found anything close.

This is a private corporate wiki used for internal documentation. Pages
evolve, but then generally stabilize, after which they serve only as
reference and are rarely edited. There is no need to keep the hundreds of
revisions that grew them to their final form.

Likewise, there are older and unused versions of uploaded files that are
just clutter.

Extension:Nuke does not meet this need.
Extension:DeleteBatch doesn't either.
Extension:DeletePagePermanently - nope.

There are maintenance scripts for deleting archived revisions and
purging old text - also not what I'm looking for.

So far I'm finding no way to do this other than manually, one page at a
time, which is a no-go. There are tens of thousands of pages.

I may have to write a new extension from scratch, but I'm finding it
hard to believe this functionality does not already exist.

Have I overlooked something obvious? Am I the only one who has wanted
something like this?

Thanks in advance.


--
M. Feldman

---------------------------

Vigil Health Solutions Inc.
www.vigil.com


_______________________________________________
MediaWiki-l mailing list
To unsubscribe, go to:
https://lists.wikimedia.org/mailman/listinfo/mediawiki-l

Re: Permanently remove old revisions and unused files?

George William Herbert

Why?  Space is rarely an issue...

George William Herbert
Sent from my iPhone

> On Feb 4, 2016, at 12:38 PM, Mickey Feldman <[hidden email]> wrote:


Re: Permanently remove old revisions and unused files?

Daniel Barrett
In reply to this post by Mickey Feldman
Mickey,

What business problem are you trying to solve by deleting old revisions of articles? They don’t take up much disk space, and they aren’t visible unless you intentionally go looking for them (with the View History tab). Is the problem just personal taste -- you don't like seeing so many revisions -- or is there some other business reason? Note: If you don’t want users to see revisions at all, you could hide the View History tab with a line in MediaWiki:Vector.css (or MediaWiki:Common.css), at least as a first step:

   #ca-history { display:none; }

If the old versions are a security risk, there is a feature to hide (not delete) particular revisions: https://www.mediawiki.org/wiki/Manual:RevisionDelete.

Regarding removal of unused, uploaded files, here is a SQL query that (I believe) lists all unused files that are more than 90 days old. (Critiques are welcome.) You can then feed the list to the script "maintenance/deleteBatch.php" supplied with MediaWiki to delete them.

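-- "wp_" is this wiki's table prefix; substitute your own.
select
 concat('File:', p.page_title) as 'unused file'
from
 wp_page p
 left outer join wp_imagelinks il on (il.il_to = p.page_title)
 inner join wp_image i on (i.img_name = p.page_title)
where
 p.page_namespace = 6                       -- File: pages only
 and il.il_to is null                       -- not embedded on any page
 and datediff(now(), i.img_timestamp) > 90  -- uploaded more than 90 days ago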

DanB


Re: Permanently remove old revisions and unused files?

Mickey Feldman

The wiki is now about 10 GB. It does compress to about 1 GB. Although
text pages are saved as their differences, as far as I know new versions
of images are saved in their entirety. Even saving only diffs, hundreds
of revisions of thousands of pages does add up.

I want to do a daily off-site backup. This wiki is on a shared virtual
host without command line access, thus no ability to use rsync over ssh,
which would allow only the changes to be moved. An entire image of the
system needs to be saved so that it can be restored fairly painlessly. I
want to shrink this as much as possible, since I am already running into
problems with the archiving of the system on the host due to size. For
example, I have had to compress each branch of the images folder
individually - gzipping it into a single archive fails, despite the
hosting company having bumped timeouts and memory allowances.
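
The per-branch compression described above could be sketched roughly as follows. This is only an illustration in Python, and it assumes the host can run a Python script at all (from cron, for instance); the function name `archive_branches` and the directory names are hypothetical, not something MediaWiki provides.

```python
import tarfile
from pathlib import Path


def archive_branches(images_dir, out_dir):
    """Write one .tar.gz per top-level subdirectory of images_dir.

    Files that sit directly in images_dir (not in a subdirectory) are
    skipped by this sketch. Returns the archive paths in sorted order.
    """
    images = Path(images_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    written = []
    for branch in sorted(p for p in images.iterdir() if p.is_dir()):
        target = out / f"{branch.name}.tar.gz"
        # Each branch gets its own archive, so no single run has to
        # compress the whole images tree in one go.
        with tarfile.open(str(target), "w:gz") as tar:
            tar.add(str(branch), arcname=branch.name)
        written.append(target)
    return written
```

Keeping each archive small is the point here: a failure or timeout then costs one branch rather than the whole backup.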

Moving to a host with complete command line access might be a solution,
but currently the hosting company deals with security issues (beyond
allowing only authorized users to log in, of course). If we go to
something like Rackspace, then security becomes our problem, and I don't
pretend to have sufficient expertise in that.


Suggestions and alternatives welcome.

> Date: Fri, 5 Feb 2016 16:28:49 +0000
> From: Daniel Barrett <[hidden email]>
> Subject: Re: [MediaWiki-l] Permanently remove old revisions and unused files?
>



Re: Permanently remove old revisions and unused files?

Platonides
On 09/02/16 15:31, Mickey Feldman wrote:
>
> The wiki is now about 10 GB. It does compress to about 1 GB. Although
> text pages are saved as their differences,

Not exactly. Only if you run a maintenance script, which you probably
haven't.


> as far as I know new versions of images are saved in their entirety.

Right



> I want to do a daily off-site backup. This wiki is on a shared virtual
> host without command line access, thus no ability to use rsync over ssh,
> which would allow only the changes to be moved. An entire image of the
> system needs to be saved so that it can be restored fairly painlessly. I
> want to shrink this as much as possible, since I am already running into
> problems with the archiving of the system on the host due to size. For
> example, I have had to compress each branch of the images folder
> individually - gzipping it into a single archive fails, despite the
> hosting company having bumped timeouts and memory allowances.
>
> Moving to a host with complete command line access might be a solution,
> but currently the hosting company deals with security issues (beyond
> allowing only authorized users to log in of course). If we go to
> something like rackspace, then security becomes our problem, and I don't
> pretend to have sufficient expertise in that.
>
>
> Suggestions and alternatives welcome.

If you are at the point of wanting to remove old revisions in order to
have a small backup size, you could just as well filter the backup so
that it only includes the newest revisions.

That seems like the wrong solution, though. I don't think your old
revisions make up a significant share of the whole backup size.
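
One way to read the filtering idea above is to post-process an XML export of the wiki so that only the newest revision of each page is kept. The following is a minimal Python sketch under that assumption; it expects the usual export layout (`<page>` elements containing `<revision>` elements, each with a `<timestamp>`), and the function names are made up for illustration. It covers page text only, not uploaded files.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def revision_timestamp(revision):
    """Return the revision's <timestamp> text, or '' if it has none."""
    for child in revision:
        if local_name(child.tag) == "timestamp":
            return child.text or ""
    return ""


def keep_latest_revisions(xml_text):
    """Return the export XML with all but the newest revision of each page removed."""
    root = ET.fromstring(xml_text)
    # Reuse the document's own namespace as the default so output tags stay unprefixed.
    if root.tag.startswith("{"):
        ET.register_namespace("", root.tag[1:].split("}", 1)[0])

    # Collect the pages first so nothing is removed while iterating the tree.
    pages = [el for el in root.iter() if local_name(el.tag) == "page"]
    for page in pages:
        revisions = [el for el in page if local_name(el.tag) == "revision"]
        if len(revisions) < 2:
            continue
        # Assumes ISO 8601 UTC timestamps, so string order is time order.
        newest = max(revisions, key=revision_timestamp)
        for revision in revisions:
            if revision is not newest:
                page.remove(revision)
    return ET.tostring(root, encoding="unicode")
```

Because this works on a copy of the export, the wiki itself keeps its full history.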


I would check whether the rsync binary is installed, even if you don't
have command line access. You could have a cron job (or a web page, if
there's no other way) launch the rsync that copies the changes to the
other site.
Just note that, should your main site be compromised, you do not want
your backup to be deleted through that path. That is not so different
from making sure that, if your files were corrupted (on purpose?), your
backup doesn't copy those changes before you notice.
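
A minimal sketch of both points, in Python and under stated assumptions: `shutil.which` checks whether an rsync binary is on the PATH, and the command is built without `--delete` and with `--backup`/`--backup-dir`, so files removed or overwritten at the source are not simply lost on the backup side. The function names and the example paths are hypothetical, and this limits rather than eliminates the risk from a compromised main site.

```python
import shutil


def rsync_available():
    """Return the path of the rsync binary, or None if it is not on PATH."""
    return shutil.which("rsync")


def build_rsync_command(source, destination, backup_dir):
    """Build an rsync argument list that copies changes without deleting anything."""
    return [
        "rsync",
        "--archive",                   # recurse and preserve times, permissions, links
        "--compress",                  # compress data in transit
        "--backup",                    # keep the old copy of any file that gets overwritten
        f"--backup-dir={backup_dir}",  # store those old copies here, on the receiving side
        # Deliberately no --delete: files removed at the source stay in the backup.
        source,
        destination,
    ]
```

The resulting list could be handed to `subprocess.run` from whatever the host allows to be scheduled, for example `build_rsync_command("images/", "backup@example.org:wiki/images/", "/srv/wiki-replaced")`.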

