best way to clean up the wiki from 3Gb of spam


best way to clean up the wiki from 3Gb of spam

Yury Katkov
Hi everyone!

I have found myself in the following situation several times: I
create a wiki for some event or small project, everything works fine,
and after the event or project is over nobody looks at or edits the
wiki for several months. When somebody finally needs the wiki again,
they discover that the database now holds 3 GB of text spam. Suppose
that the wiki hosting offers no backup or rollback option. So here is
the question: how do I

1) remove all the spam
2) delete all the spam accounts
3) reduce the database size from 3 GB back to its original size

Cheers,
Yury Katkov, WikiVote

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
Do you have a list of legitimate known good accounts?

On Fri, Aug 24, 2012 at 3:27 AM, Yury Katkov <[hidden email]> wrote:

> Hi everyone!
>
> I have found myself in the following situation several times: I
> create a wiki for some event or small project, everything works fine,
> and after the event or project is over nobody looks at or edits the
> wiki for several months. When somebody finally needs the wiki again,
> they discover that the database now holds 3 GB of text spam. Suppose
> that the wiki hosting offers no backup or rollback option. So here is
> the question: how do I
>
> 1) remove all the spam
> 2) delete all the spam accounts
> 3) reduce the database size from 3 GB back to its original size
>
> Cheers,
> Yury Katkov, WikiVote

Re: best way to clean up the wiki from 3Gb of spam

Derric Atzrott
>Do you have a list of legitimate known good accounts?

I'm really interested in this too. I just deleted the databases for
two MediaWiki installations that I had run for similar reasons...

Thank you,
Derric Atzrott



Re: best way to clean up the wiki from 3Gb of spam

Tei-2
In reply to this post by Yury Katkov
On 24 August 2012 09:27, Yury Katkov <[hidden email]> wrote:

> Hi everyone!
>
> I have found myself in the following situation several times: I
> create a wiki for some event or small project, everything works fine,
> and after the event or project is over nobody looks at or edits the
> wiki for several months. When somebody finally needs the wiki again,
> they discover that the database now holds 3 GB of text spam. Suppose
> that the wiki hosting offers no backup or rollback option. So here is
> the question: how do I
>

No backups, no way to roll back to a date? That's bad.
You could start a wiki from scratch and manually copy over whatever
was good from the old one. Maybe share this task with a few selected
volunteers.
Start the new one with anonymous edits disabled, a sexy theme and a
huge campaign to attract people: "Not like the old wiki! This one is
actually good and maintained!"
Maybe the lack of maintenance contributed to the decay. I wonder if a
wiki without enough contributors is worth keeping, like a garden
without anyone to cut the grass.



--
--
End of message.


Re: best way to clean up the wiki from 3Gb of spam

Derric Atzrott
>No backups, no way to roll back to a date? That's bad.
>You could start a wiki from scratch and manually copy over whatever
>was good from the old one. Maybe share this task with a few selected
>volunteers.
>Start the new one with anonymous edits disabled, a sexy theme and a
>huge campaign to attract people: "Not like the old wiki! This one is
>actually good and maintained!"
>Maybe the lack of maintenance contributed to the decay. I wonder if a
>wiki without enough contributors is worth keeping, like a garden
>without anyone to cut the grass.

It certainly is, if for no other reason than its historical value.

We still keep all the Wikimania wikis around.

Thank you,
Derric Atzrott



Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
In reply to this post by Tei-2
Given enough facts, it would be rather easy for me to write a script
that nukes the spam. I did something similar on
http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand


Re: best way to clean up the wiki from 3Gb of spam

Yury Katkov
Hi everyone! I agree with everyone in this thread, but the main
problem is that even if I write a bot or use extensions that remove
pages, the actual database records won't be deleted. If I understand
correctly, the MediaWiki philosophy is that we never simply drop a
page or an account from the database; deletion only hides those
nasty spam pages.

Consequently, after the deletions my database won't shrink back to
its original 100 MB; it will stay at around 3 GB, which is a problem
for hosting.

The proposed solution of exporting all the pages to a brand new wiki
solves this problem. Are there any other solutions that don't involve
dropping my old spammed database?
-----
Yury Katkov



On Fri, Aug 24, 2012 at 4:13 PM, John <[hidden email]> wrote:
> Given enough facts, it would be rather easy for me to write a script
> that nukes the spam. I did something similar on
> http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand

Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
What can be done after mass deleting is to purge the archive table in
the database, which should reduce its size significantly. In the
example I linked, where I cleaned up an existing site, I reduced the
database size by about 90%.
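A minimal sketch of the purge step John describes, written in Python to match the pywikipediabot tooling mentioned in this thread. It assumes shell access to a standard MediaWiki install and the maintenance scripts `deleteArchivedRevisions.php` (with `--delete`) and `purgeOldText.php` (with `--purge`). Check the Manual:Reduce_size_of_the_database page and your MediaWiki version before relying on those names. The function only builds the commands, and `run` prints them unless `execute=True`, so nothing irreversible happens by default. The install path is hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def purge_commands(install_dir: str) -> list[list[str]]:
    """Build, but do not run, the commands that permanently remove deleted content.

    Assumes the MediaWiki maintenance scripts live in <install_dir>/maintenance.
    """
    maint = Path(install_dir) / "maintenance"
    return [
        # Permanently delete archived revisions (pages that were already deleted).
        ["php", str(maint / "deleteArchivedRevisions.php"), "--delete"],
        # Remove text rows that no revision or archive row references any more.
        ["php", str(maint / "purgeOldText.php"), "--purge"],
    ]


def run(commands: list[list[str]], execute: bool = False) -> None:
    """Print each command; only execute it when execute=True (after a backup!)."""
    for cmd in commands:
        print(shlex.join(cmd))
        if execute:
            # check=True stops at the first failing script.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(purge_commands("/var/www/wiki"))  # hypothetical install path; dry run only
```

Archived revisions cannot be undeleted once this runs, so take a database dump first. On MySQL, the files on disk may also not shrink until the affected tables are rebuilt (for example with `OPTIMIZE TABLE`), and then only if each InnoDB table has its own tablespace file.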

On Fri, Aug 24, 2012 at 11:47 AM, Yury Katkov <[hidden email]> wrote:

> Hi everyone! I agree with everyone in this thread, but the main
> problem is that even if I write a bot or use extensions that remove
> pages, the actual database records won't be deleted. If I understand
> correctly, the MediaWiki philosophy is that we never simply drop a
> page or an account from the database; deletion only hides those
> nasty spam pages.
>
> Consequently, after the deletions my database won't shrink back to
> its original 100 MB; it will stay at around 3 GB, which is a problem
> for hosting.
>
> The proposed solution of exporting all the pages to a brand new wiki
> solves this problem. Are there any other solutions that don't involve
> dropping my old spammed database?
> -----
> Yury Katkov

Re: best way to clean up the wiki from 3Gb of spam

Tyler Romeo
In reply to this post by Yury Katkov
Technically speaking, pages and accounts can be permanently deleted (I
believe there is an extension for it). However, since MediaWiki does not
use foreign keys, you have to be careful not to break things in the process.

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | [hidden email]
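Tyler's warning about missing foreign keys can be made concrete. The sketch below uses Python's built-in `sqlite3` with a heavily simplified, hypothetical two-table schema that only borrows MediaWiki-style column names (`page_id`, `rev_page`, `rev_id`). It is not the real MediaWiki schema. It shows how deleting a `page` row by hand silently leaves its revisions behind, and the kind of `LEFT JOIN` check worth running after any manual deletion.

```python
import sqlite3


def find_orphan_revisions(conn: sqlite3.Connection) -> list[int]:
    """Return rev_id values whose rev_page points at a page that no longer exists."""
    rows = conn.execute(
        """
        SELECT r.rev_id
        FROM revision AS r
        LEFT JOIN page AS p ON p.page_id = r.rev_page
        WHERE p.page_id IS NULL
        ORDER BY r.rev_id
        """
    ).fetchall()
    return [rev_id for (rev_id,) in rows]


def demo() -> list[int]:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-ins for page and revision; deliberately no FOREIGN KEY constraint.
    conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    conn.execute("CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER)")
    conn.executemany("INSERT INTO page VALUES (?, ?)", [(1, "Main_Page"), (2, "Spam_page")])
    conn.executemany("INSERT INTO revision VALUES (?, ?)", [(10, 1), (11, 2), (12, 2)])
    # Deleting only the page row succeeds and leaves revisions 11 and 12 dangling.
    conn.execute("DELETE FROM page WHERE page_id = 2")
    orphans = find_orphan_revisions(conn)
    conn.close()
    return orphans


print(demo())  # [11, 12]
```

In a real cleanup the same idea applies to every table that refers to a page, revision or user by ID, which is why the maintenance scripts are usually safer than hand-written `DELETE` statements.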



On Fri, Aug 24, 2012 at 11:47 AM, Yury Katkov <[hidden email]> wrote:

> Hi everyone! I agree with everyone in this thread, but the main
> problem is that even if I write a bot or use extensions that remove
> pages, the actual database records won't be deleted. If I understand
> correctly, the MediaWiki philosophy is that we never simply drop a
> page or an account from the database; deletion only hides those
> nasty spam pages.
>
> Consequently, after the deletions my database won't shrink back to
> its original 100 MB; it will stay at around 3 GB, which is a problem
> for hosting.
>
> The proposed solution of exporting all the pages to a brand new wiki
> solves this problem. Are there any other solutions that don't involve
> dropping my old spammed database?
> -----
> Yury Katkov

Re: best way to clean up the wiki from 3Gb of spam

Yury Katkov
In reply to this post by John Doe-27
Here is the manual on how to purge the archive table:
http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database
Thanks John, that's a perfect solution!
-----
Yury Katkov



On Fri, Aug 24, 2012 at 7:51 PM, John <[hidden email]> wrote:

> What can be done after mass deleting is to purge the archive table in
> the database, which should reduce its size significantly. In the
> example I linked, where I cleaned up an existing site, I reduced the
> database size by about 90%.

Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
Like I said, if you want I can whip up a script to nuke the spam; just
drop me an email off-list.

On Fri, Aug 24, 2012 at 11:54 AM, Yury Katkov <[hidden email]> wrote:

> Here is the manual on how to purge the archive table:
> http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database
> Thanks John, that's a perfect solution!
> -----
> Yury Katkov

Re: best way to clean up the wiki from 3Gb of spam

Sumana Harihareswara-2
Speaking of scripts, it would be cool if someone would polish this set
of anti-spam scripts a little bit and see if it's worth advertising more:

 https://www.noisebridge.net/wiki/Secretaribot
 https://github.com/dannyob/secretaribot

--
Sumana Harihareswara
Engineering Community Manager
Wikimedia Foundation

On 08/24/2012 11:55 AM, John wrote:

> Like I said, if you want I can whip up a script to nuke the spam; just
> drop me an email off-list.

Re: best way to clean up the wiki from 3Gb of spam

Yury Katkov
In reply to this post by John Doe-27
Hi John, thanks! Take your time! If you already have such a script
and can share it, please do! But if not, I think it will be a good
exercise in pywikipediabot or extension development for me.
-----
Yury Katkov



On Fri, Aug 24, 2012 at 7:55 PM, John <[hidden email]> wrote:

> Like I said, if you want I can whip up a script to nuke the spam; just
> drop me an email off-list.

Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
It's rather easy to write in pywiki; I just need some information from
you about your wiki (e.g., are all edits after date X bad? Do you have
only Y valid users, and what are their names?). Details like that allow
me to tailor the script to your needs.
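The two criteria John asks about (a cutoff date and a list of valid users) combine into a simple filter. A minimal sketch, assuming the revisions have already been fetched as (title, user, timestamp) records. The `Revision` type, the cutoff date and the user names are all hypothetical. A page is only flagged when every one of its revisions looks like spam, so a legitimate page that a spammer edited is left for a manual revert rather than deleted.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One edit as it might be fetched from the wiki (hypothetical record type)."""
    title: str
    user: str
    timestamp: datetime


def is_spam_revision(rev: Revision, cutoff: datetime, good_users: set[str]) -> bool:
    """An edit counts as spam if an unknown account made it on or after the cutoff."""
    return rev.user not in good_users and rev.timestamp >= cutoff


def pages_to_delete(revs: list[Revision], cutoff: datetime, good_users: set[str]) -> list[str]:
    """Return titles whose revisions are *all* spam; mixed pages need a revert instead."""
    by_title: dict[str, list[Revision]] = {}
    for rev in revs:
        by_title.setdefault(rev.title, []).append(rev)
    return sorted(
        title
        for title, page_revs in by_title.items()
        if all(is_spam_revision(r, cutoff, good_users) for r in page_revs)
    )


if __name__ == "__main__":
    utc = timezone.utc
    cutoff = datetime(2012, 3, 1, tzinfo=utc)  # hypothetical "only spam after this" date
    revs = [
        Revision("Main_Page", "Alice", datetime(2012, 1, 5, tzinfo=utc)),
        Revision("Main_Page", "Spambot1", datetime(2012, 5, 1, tzinfo=utc)),
        Revision("Cheap_pills", "Spambot1", datetime(2012, 4, 2, tzinfo=utc)),
        Revision("Schedule", "Alice", datetime(2012, 6, 1, tzinfo=utc)),
    ]
    print(pages_to_delete(revs, cutoff, {"Alice"}))  # ['Cheap_pills']
```

The same `is_spam_revision` test could also drive which accounts to block, since any account with only spam edits is a candidate.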

On Fri, Aug 24, 2012 at 12:03 PM, Yury Katkov <[hidden email]> wrote:

> Hi John, thanks! Take your time! If you already have such a script
> and can share it, please do! But if not, I think it will be a good
> exercise in pywikipediabot or extension development for me.
> -----
> Yury Katkov

Re: best way to clean up the wiki from 3Gb of spam

Yury Katkov
I think that we have the date after which there was only spam.
-----
Yury Katkov
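Given a known cutoff date, one way to list candidate pages is the MediaWiki web API's `list=recentchanges` module restricted to page creations (`rctype=new`). A minimal sketch that only builds the request URL. The API endpoint and the cutoff are hypothetical. With `rcdir=newer`, `rcstart` is the earliest timestamp and results run forward in time.

```python
from urllib.parse import urlencode


def new_pages_query(api_url: str, since: str, limit: int = 500) -> str:
    """Build a MediaWiki API URL listing pages created at or after `since`.

    `since` is an ISO 8601 timestamp such as "2012-03-01T00:00:00Z".
    """
    params = {
        "action": "query",
        "list": "recentchanges",
        "rctype": "new",        # page creations only
        "rcdir": "newer",       # oldest first, starting at rcstart
        "rcstart": since,
        "rcprop": "title|user|timestamp",
        "rclimit": str(limit),
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and cutoff date.
    print(new_pages_query("https://wiki.example.org/w/api.php", "2012-03-01T00:00:00Z"))
```

One caveat: recent changes are only kept for `$wgRCMaxAge` (90 days by default), so on a wiki that was spammed for longer than that, older page creations will not show up and a query against the database itself may be needed. Results beyond `rclimit` have to be fetched by following the API's continuation parameters.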



On Fri, Aug 24, 2012 at 8:07 PM, John <[hidden email]> wrote:

> It's rather easy to write in pywiki; I just need some information from
> you about your wiki (e.g., are all edits after date X bad? Do you have
> only Y valid users, and what are their names?). Details like that allow
> me to tailor the script to your needs.

Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
Can I get a link to your site? I would love to take a look and write
you that script (I always love a challenge).

On Fri, Aug 24, 2012 at 12:10 PM, Yury Katkov <[hidden email]> wrote:

> I think that we have the date after which there was only spam.
> -----
> Yury Katkov
>
>
>
> On Fri, Aug 24, 2012 at 8:07 PM, John <[hidden email]> wrote:
>> It's rather easy to write in pywiki; I just need some information from
>> you about your wiki (e.g. are all edits after date X bad? we only have
>> Y valid users and here are their names, etc.). Details like that allow me
>> to tailor the script to your needs.
>>
>> On Fri, Aug 24, 2012 at 12:03 PM, Yury Katkov <[hidden email]> wrote:
>>> Hi John, thanks! Take your time! If you already have such a script
>>> and can share it, please do! If not, I think it will be a good
>>> exercise in pywikipediabot or extension development for me.
>>> -----
>>> Yury Katkov
>>>
>>>
>>>
>>> On Fri, Aug 24, 2012 at 7:55 PM, John <[hidden email]> wrote:
>>>> Like I said, if you want I can whip up a script to nuke the spam; just
>>>> drop me an email off list.
>>>>
>>>> On Fri, Aug 24, 2012 at 11:54 AM, Yury Katkov <[hidden email]> wrote:
>>>>> http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database and
>>>>> here is the manual on how to purge the archive database! Thanks John,
>>>>> that's a perfect solution!
>>>>> -----
>>>>> Yury Katkov
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 24, 2012 at 7:51 PM, John <[hidden email]> wrote:
>>>>>> What can be done after mass deleting is to purge the archive database
>>>>>> table, which should reduce the database size significantly. If you take
>>>>>> a look at the example where I cleaned up an existing site, you'll see I
>>>>>> reduced the database size by about 90%.
>>>>>>
>>>>>> On Fri, Aug 24, 2012 at 11:47 AM, Yury Katkov <[hidden email]> wrote:
>>>>>>> Hi everyone! I agree with everyone in this thread, but the main
>>>>>>> problem is that even if I create a bot or use extensions that remove
>>>>>>> pages, the actual database records won't be deleted. If I understand
>>>>>>> correctly, the MediaWiki philosophy tells us that we cannot just drop
>>>>>>> a page or an account from the database; deletion only means that
>>>>>>> those nasty spam pages are hidden.
>>>>>>>
>>>>>>> Consequently, after the deletions the size of my database won't shrink
>>>>>>> back to the original 100 MB; it remains around 3 GB, which is a
>>>>>>> problem for hosting.
>>>>>>>
>>>>>>> The proposed solution of exporting all the pages to a brand new wiki
>>>>>>> solves this problem. Are there any other solutions where the dropping
>>>>>>> of my old spammed database is not involved?
>>>>>>> -----
>>>>>>> Yury Katkov
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Aug 24, 2012 at 4:13 PM, John <[hidden email]> wrote:
>>>>>>>> Given enough facts it would be rather easy for me to write a script
>>>>>>>> that nukes said spam; I did something similar on
>>>>>>>> http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand

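For the archive purge discussed in the quoted exchange, one of the options described on the Manual:Reduce_size_of_the_database page linked there is MediaWiki's maintenance script deleteArchivedRevisions.php. A minimal Python sketch of calling it, assuming a standard install with the script under maintenance/ and php on the PATH; archive_purge_command and purge_archive are hypothetical helper names, not part of MediaWiki or pywikipediabot:

```python
import subprocess
from pathlib import Path


def archive_purge_command(mw_root: str, really_delete: bool = False,
                          php: str = "php") -> list[str]:
    """Build the deleteArchivedRevisions.php invocation for a MediaWiki install.

    Without --delete the script only reports how many archived revisions it
    found; with --delete the removal is permanent (no more undeletion).
    """
    script = Path(mw_root) / "maintenance" / "deleteArchivedRevisions.php"
    cmd = [php, str(script)]
    if really_delete:
        cmd.append("--delete")
    return cmd


def purge_archive(mw_root: str, really_delete: bool = False) -> None:
    # Raises CalledProcessError if the maintenance script exits non-zero.
    subprocess.run(archive_purge_command(mw_root, really_delete), check=True)
```

Run it once without really_delete to see the count, then with really_delete=True only after the mass deletion is finished, since purged revisions can no longer be undeleted. This shrinks the tables, which is not necessarily the same as shrinking the database files on disk.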

Re: best way to clean up the wiki from 3Gb of spam

Daniel Friesen-2
In reply to this post by Yury Katkov
Be aware that by default InnoDB uses a single file called ibdata1 for all of
its data storage.
When you remove data from the database, InnoDB does not shrink ibdata1.
So even if you reduce your 3GB database to under 1GB, that only leaves room
for more than 2GB of new content before ibdata1 grows again; the actual size
on disk that your database takes up will likely remain at 3GB.

So if you really want to reduce the on-disk size, exporting and re-importing
at least your raw database becomes necessary at some point, since InnoDB will
never give you that disk space back.
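A rough sketch of that export and re-import in Python, assuming the stock mysqldump and mysql clients, credentials supplied by a MySQL option file such as ~/.my.cnf, and a hypothetical target host whose data directory is fresh. The function names and the new_host parameter are illustrative only. Reloading into another database on the same server does not help while the old tables still live in ibdata1; the space only comes back if the data lands in a new tablespace, e.g. a re-initialized server, ideally with innodb_file_per_table enabled so that later deletions can be reclaimed per table.

```python
import subprocess

# Credentials are assumed to come from a MySQL option file (e.g. ~/.my.cnf),
# so no user/password flags appear below.


def dump_command(db: str, dump_file: str) -> list[str]:
    """mysqldump invocation for one database on the current server."""
    return [
        "mysqldump",
        "--single-transaction",            # consistent InnoDB snapshot without locking
        "--default-character-set=binary",  # avoid any charset conversion of page text
        f"--result-file={dump_file}",
        db,
    ]


def create_command(db: str, host: str) -> list[str]:
    """Create the empty target database on the fresh server."""
    return ["mysql", f"--host={host}", "-e", f"CREATE DATABASE `{db}`"]


def load_command(db: str, host: str) -> list[str]:
    """mysql client invocation that reads the dump from stdin."""
    return ["mysql", f"--host={host}", "--default-character-set=binary", db]


def dump_and_reload(old_db: str, new_db: str, new_host: str,
                    dump_file: str = "wiki-dump.sql") -> None:
    subprocess.run(dump_command(old_db, dump_file), check=True)
    subprocess.run(create_command(new_db, new_host), check=True)
    with open(dump_file, "rb") as fh:
        subprocess.run(load_command(new_db, new_host), stdin=fh, check=True)
```

Once the load succeeds, point $wgDBserver and $wgDBname in LocalSettings.php at the new server and database.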

On Fri, 24 Aug 2012 08:54:26 -0700, Yury Katkov <[hidden email]>  
wrote:

> http://www.mediawiki.org/wiki/Manual:Reduce_size_of_the_database and
> here is the manual on how to purge the archive database! Thanks John,
> that's a perfect solution!
> -----
> Yury Katkov
>
>
>
> On Fri, Aug 24, 2012 at 7:51 PM, John <[hidden email]> wrote:
>> What can be done after mass deleting is to purge the archive database
>> table, which should reduce the database size significantly. If you take
>> a look at the example where I cleaned up an existing site, you'll see I
>> reduced the database size by about 90%.
>>
>> On Fri, Aug 24, 2012 at 11:47 AM, Yury Katkov <[hidden email]>  
>> wrote:
>>> Hi everyone! I agree with everyone in this thread, but the main
>>> problem is that even if I create a bot or use extensions that remove
>>> pages, the actual database records won't be deleted. If I understand
>>> correctly, the MediaWiki philosophy tells us that we cannot just drop
>>> a page or an account from the database; deletion only means that
>>> those nasty spam pages are hidden.
>>>
>>> Consequently, after the deletions the size of my database won't shrink
>>> back to the original 100 MB; it remains around 3 GB, which is a
>>> problem for hosting.
>>>
>>> The proposed solution of exporting all the pages to a brand new wiki
>>> solves this problem. Are there any other solutions where the dropping
>>> of my old spammed database is not involved?
>>> -----
>>> Yury Katkov
>>>
>>>
>>>
>>> On Fri, Aug 24, 2012 at 4:13 PM, John <[hidden email]>  
>>> wrote:
>>>> Given enough facts it would be rather easy for me to write a script
>>>> that nukes said spam; I did something similar on
>>>> http://manual.fireman.com.br/wiki/Especial:Registro/Betacommand


--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


Re: best way to clean up the wiki from 3Gb of spam

Derric Atzrott
In reply to this post by John Doe-27
>It's rather easy to write in pywiki; I just need some information from
>you about your wiki (e.g. are all edits after date X bad? we only have
>Y valid users and here are their names, etc.). Details like that allow me
>to tailor the script to your needs.
>
>Can I get a link to your site? I would love to take a look and write
>you that script (I always love a challenge).

If you give your script some sort of configuration variables (or something
along those lines) for these different things, then you could release it and
help many people.
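One way to keep such a script configurable is to separate the decision logic from the calls to the wiki. A minimal Python sketch, assuming the edit history has been exported as (title, user, timestamp) records and that the two criteria mentioned earlier in the thread (a cutoff date and a list of known-good users) are enough; CleanupConfig, Edit and plan_cleanup are hypothetical names. The actual deleting, reverting and blocking would still go through pywikipediabot or the wiki's own tools.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CleanupConfig:
    # Hypothetical settings: edits after this moment are treated as spam...
    spam_after: datetime
    # ...unless they were made by one of these known-good accounts.
    trusted_users: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Edit:
    title: str
    user: str
    timestamp: datetime


def plan_cleanup(edits, config):
    """Split pages and users into delete / revert / block lists."""
    clean_pages, spam_pages, spam_users = set(), set(), set()
    for edit in edits:
        is_spam = (edit.timestamp > config.spam_after
                   and edit.user not in config.trusted_users)
        if is_spam:
            spam_pages.add(edit.title)
            spam_users.add(edit.user)
        else:
            clean_pages.add(edit.title)
    # Pages touched only by spammers can be deleted outright; pages that also
    # have legitimate edits should be reverted to their last clean revision.
    to_delete = sorted(spam_pages - clean_pages)
    to_revert = sorted(spam_pages & clean_pages)
    to_block = sorted(spam_users)
    return to_delete, to_revert, to_block


if __name__ == "__main__":
    config = CleanupConfig(spam_after=datetime(2012, 3, 1),
                           trusted_users=frozenset({"Alice"}))
    history = [
        Edit("Main Page", "Alice", datetime(2012, 1, 10)),
        Edit("Main Page", "Spammer1", datetime(2012, 6, 1)),
        Edit("Cheap pills", "Spammer1", datetime(2012, 6, 2)),
        Edit("Cheap pills", "Spammer2", datetime(2012, 7, 1)),
        Edit("Agenda", "Alice", datetime(2012, 7, 5)),
    ]
    print(plan_cleanup(history, config))
    # prints (['Cheap pills'], ['Main Page'], ['Spammer1', 'Spammer2'])
```

Keeping the criteria in one config object means another wiki owner only has to change the cutoff date and the trusted-user list, not the logic.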

If you do decide to release it, I would cross-post to the mailing list for
MediaWiki administrators as well. I'm sure someone on there could use it.

Thank you,
Derric Atzrott



Re: best way to clean up the wiki from 3Gb of spam

John Doe-27
I've got a script, but I would like to test it before I make it public. If
someone has a site with spam and would let me test it there, I'd
appreciate it.

On Fri, Aug 24, 2012 at 12:20 PM, Derric Atzrott
<[hidden email]> wrote:

>>It's rather easy to write in pywiki; I just need some information from
>>you about your wiki (e.g. are all edits after date X bad? we only have
>>Y valid users and here are their names, etc.). Details like that allow me
>>to tailor the script to your needs.
>>
>>Can I get a link to your site? I would love to take a look and write
>>you that script (I always love a challenge).
>
> If you give your script some sort of configuration variables (or something
> along those lines) for these different things, then you could release it and
> help many people.
>
> If you do decide to release it, I would cross-post to the mailing list for
> MediaWiki administrators as well. I'm sure someone on there could use it.
>
> Thank you,
> Derric Atzrott
>


Re: best way to clean up the wiki from 3Gb of spam

Tyler Romeo
I do! http://wiki.sittv.com has been building up spam for a number of
months (or longer).

--
Tyler Romeo
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | [hidden email]



On Fri, Aug 24, 2012 at 12:52 PM, John <[hidden email]> wrote:

> I've got a script, but I would like to test it before I make it public. If
> someone has a site with spam and would let me test it there, I'd
> appreciate it.
>
> On Fri, Aug 24, 2012 at 12:20 PM, Derric Atzrott
> <[hidden email]> wrote:
> >>It's rather easy to write in pywiki; I just need some information from
> >>you about your wiki (e.g. are all edits after date X bad? we only have
> >>Y valid users and here are their names, etc.). Details like that allow me
> >>to tailor the script to your needs.
> >>
> >>Can I get a link to your site? I would love to take a look and write
> >>you that script (I always love a challenge).
> >
> > If you give your script some sort of configuration variables (or something
> > along those lines) for these different things, then you could release it
> > and help many people.
> >
> > If you do decide to release it, I would cross-post to the mailing list for
> > MediaWiki administrators as well. I'm sure someone on there could use it.
> >
> > Thank you,
> > Derric Atzrott
> >