mwdumper problems

candy-4
Hi all,

I tried to import the entire English Wikipedia dump (some 15 GB) into
MediaWiki 1.4.
The entire process took 2 days, and then it seems nothing happened!
Somewhere along the way I got an error like the following:

ERROR 1062 (23000) at line 28: Duplicate entry '1' for key 1

But the import was not aborted, so I let it continue.

I used the following command for the import:

java -Xmx512m -server -jar mwdumper.jar --format=sql:1.4 \
    pages_full.xml.bz2 | mysql -u root -p wikidb

where wikidb is the database name.

I think other people have had similar problems, as is evident here:

http://meta.wikimedia.org/wiki/Talk:Data_dumps

So what went wrong, and what needs to be done to make it work?

Candy

_______________________________________________
Wikitech-l mailing list
[hidden email]
http://mail.wikipedia.org/mailman/listinfo/wikitech-l

Re: mwdumper problems

Jitse Niesen
On 1/8/06, candy <[hidden email]> wrote:
>
> I tried to import the entire English Wikipedia dump (some 15 GB) into
> MediaWiki 1.4.
> The entire process took 2 days, and then it seems nothing happened!
> Somewhere along the way I got an error like the following:
>
> ERROR 1062 (23000) at line 28: Duplicate entry '1' for key 1
>
> But the import was not aborted. So I let it continue.

I never tried importing the dump into MediaWiki 1.4, but I had the same
problem in 1.5. The solution for me was to empty some tables first. In
1.5, the following does the trick: run "mysql -u root -p wikidb" and
give the commands "truncate page; truncate revision; truncate text;".
My guess is that in 1.4 you need to do "truncate cur;".

Jitse
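
The table-emptying step can be scripted so that the same statements are
used on every retry. A minimal sketch, assuming the 1.5 tables named
above and, for 1.4, the cur and old tables (the 1.4 list is an
assumption; the thread only guesses at it). The helper name
truncate_sql is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the TRUNCATE statements for the tables that
# the import fills, for a given MediaWiki schema version.
# 1.5: page, revision, text (as stated in the thread).
# 1.4: cur, old (an assumption, not confirmed in the thread).
truncate_sql() {
    case "$1" in
        1.4) tables="cur old" ;;
        1.5) tables="page revision text" ;;
        *)   echo "unknown schema version: $1" >&2; return 1 ;;
    esac
    for t in $tables; do
        printf 'TRUNCATE TABLE %s;\n' "$t"
    done
}

# Usage (not run here; user and database name are the ones from the thread):
#   truncate_sql 1.5 | mysql -u root -p wikidb
```

Printing the statements first, without piping them to mysql, makes it
easy to review what will be emptied before anything is deleted.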

Re: mwdumper problems

candy-4
Jitse Niesen wrote:

> I never tried importing the dump in MediaWiki 1.4, but I had the same
> problem in 1.5. The solution for me was to empty some tables first. In
> 1.5, the following does the trick: Run "mysql -u root -p wikidb" and
> give the commands "truncate page; truncate revision; truncate text;".
> My guess is that in 1.4 you need to do "truncate cur;".
>
> Jitse


I did as you instructed and emptied the cur, old, and blobs tables. The
import then ran for 3 days. The database (wikidb in this case) grew to
only 4.1 GB, although the full dump is 15 GB.
After the import finished, I started the HTTP server, but searching for
a page such as "France" in the search field returns no results, which
suggests the import was not successful. What do you think is the problem?
Is there something more I should do?

Candy


Re: Re: mwdumper problems

Rob Church
What does Special:Allpages show?


Rob Church

On 17/01/06, candy <[hidden email]> wrote:

> I did as you instructed and emptied the cur, old, and blobs tables. The
> import then ran for 3 days. The database (wikidb in this case) grew to
> only 4.1 GB, although the full dump is 15 GB.
> After the import finished, I started the HTTP server, but searching for
> a page such as "France" in the search field returns no results, which
> suggests the import was not successful. What do you think is the problem?
> Is there something more I should do?
>
> Candy

Re: mwdumper problems

candy-4
It shows only a few pages:

A                   to Alexander Selkirk
Alexander Severus   to Antipope
Antiprism           to Foreign relations of Antigua and Barbuda
Geography of Africa to Ægir

Why wasn't the entire dump imported? Since only a few pages are there,
there are many red links, that is, broken links, I guess.

Candy
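
One way to gauge how much of the dump actually arrived is to count the
pages in the XML and compare that with the number of rows in the
database. A minimal sketch, assuming each page in the dump opens with a
<page> tag on its own line and that, under the 1.4 schema, each page
gets one row in the cur table. The helper name count_pages is
hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count <page> elements in a MediaWiki XML dump
# read from standard input. Assumes at most one opening <page> tag per
# line; closing </page> tags do not match the pattern.
count_pages() {
    grep -c '<page>'
}

# Usage (not run here; file and database names are the ones from the thread):
#   bzcat pages_full.xml.bz2 | count_pages
#   mysql -u root -p wikidb -e 'SELECT COUNT(*) FROM cur;'
```

If the second number is far smaller than the first, the import stopped
early rather than finishing with a smaller data set.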

Rob Church wrote:

> What does Special:Allpages show?
>
>
> Rob Church

Re: Re: mwdumper problems

Brion Vibber
candy wrote:
> It shows only a few pages:
>
> A                   to Alexander Selkirk
> Alexander Severus   to Antipope
> Antiprism           to Foreign relations of Antigua and Barbuda
> Geography of Africa to Ægir
>
> Why wasn't the entire dump imported? Since only a few pages are there,
> there are many red links, that is, broken links, I guess.

Perhaps an error occurred, and you didn't notice the error message?

Note that when piping SQL to the mysql command-line client, mwdumper will
continue unpacking the XML all the way through to the end but generally nothing
else will go into the database beyond the first database error that occurs.

Make sure that your database tables are *completely* empty before running
mwdumper. If there are any pages in there, you are likely to encounter errors.

-- brion vibber (brion @ pobox.com)
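
Because a single database error is easy to miss in days of output, it
can help to log what the mysql client prints and to check its exit
status afterwards. A minimal sketch, assuming a POSIX shell (where a
pipeline's exit status is that of its last command) and assuming the
mysql client exits with a non-zero status when it stops on an error.
The function name import_checked and the log file name are
hypothetical:

```shell
#!/bin/sh
# Hypothetical wrapper: run a producer command, pipe its output into a
# consumer command, save the consumer's error output to a log file, and
# report whether the consumer failed.
# Arguments: $1 producer command, $2 consumer command, $3 log file.
import_checked() {
    producer=$1
    consumer=$2
    log=$3
    # The pipeline's exit status is the consumer's exit status.
    if sh -c "$producer" | sh -c "$consumer" 2>"$log"; then
        echo "import finished without a database error"
    else
        status=$?
        echo "import failed (exit status $status); see $log" >&2
        return 1
    fi
}

# Usage (not run here; command lines are the ones from the thread):
#   import_checked \
#     'java -Xmx512m -server -jar mwdumper.jar --format=sql:1.4 pages_full.xml.bz2' \
#     'mysql -u root -p wikidb' \
#     import-errors.log
```

The wrapper does not stop the producer early; it only makes the failure
visible once the pipeline has ended, so the log is still worth checking
while a long import is running.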

