Re: Job queue affecting semantic queries in MW > 1.22

egel
> The job queue on my site has now swollen to well over a million and
> continues to grow. A critical point has been reached where semantic
> queries are starting to fail, particularly those that need to delve
> into subcategories. "Data repair and upgrade" succeeded within two
> days, but did nothing to change that.
>
> The culprit may or may not be the introduction in MW 1.22 of a
> significant change to the way jobs are executed - see
> https://www.mediawiki.org/wiki/Manual:Job_queue#Changes_introduced_in_MediaWiki_1.22
> - in my case, $wgPhpCli is set to "false" because the default
> assumption "/usr/bin/php" is incorrect. The claim in the documentation
> that the old code is used when $wgPhpCli is not set to an actual path
> does not ring true here. (The issue remains true for later releases of
> MW 1.22 and setting $wgJobRunRate to a higher value has no effect).

Have you tried setting $wgPhpCli to the right path?
Some hosting providers let you schedule commands as cron jobs. Have
you tried running runJobs.php as a cron job?
Also, MW 1.23 introduces a significant change to the way jobs are
executed, and it is scheduled for release today. So if everything else
fails, you can try updating MW and hope for the best.
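
For readers who want to try the scheduled approach suggested above, here is a minimal Python sketch of a wrapper that a cron entry could call to process the queue in small batches. It only assembles and runs a command line; `--maxjobs` and `--type` are options of `runJobs.php`, while the function names and both paths in the example are hypothetical and must be adapted to the actual host.

```python
import subprocess
from typing import List, Optional


def build_run_jobs_command(php_path: str, wiki_root: str, max_jobs: int,
                           job_type: Optional[str] = None) -> List[str]:
    """Build the argument list for one batch run of maintenance/runJobs.php."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be a positive integer")
    script = f"{wiki_root.rstrip('/')}/maintenance/runJobs.php"
    command = [php_path, script, "--maxjobs", str(max_jobs)]
    if job_type is not None:
        # Restrict the batch to a single job type, e.g. r"SMW\UpdateJob".
        command += ["--type", job_type]
    return command


def run_batch(command: List[str]) -> int:
    """Run one batch and return the exit code of the PHP process."""
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Hypothetical paths, for illustration only; nothing is executed here.
    cmd = build_run_jobs_command("/usr/local/bin/php", "/var/www/wiki", 100)
    print(" ".join(cmd))
```

Keeping each batch small limits how long a single run occupies a shared host, and the batch size can be raised once the runs prove stable.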

--
With kind wanderer's greetings

Wouter Rademaker


------------------------------------------------------------------------------
Learn Graph Databases - Download FREE O'Reilly Book
"Graph Databases" is the definitive new guide to graph databases and their
applications. Written by three acclaimed leaders in the field,
this first edition is now available. Download your free book today!
http://p.sf.net/sfu/NeoTech
_______________________________________________
Semediawiki-user mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/semediawiki-user

Re: Job queue affecting semantic queries in MW > 1.22

Cavila Contrafibularity
No cron jobs, but Karsten's advice is proving very useful - thanks for this! I'm now using the extension (Extension:MaintenanceShell) to run the script with "--maxjobs 100" each time and will gradually increase the number (it won't work in one go, as you might expect). An interesting finding at this early stage is how often the same two or three pages are listed, presumably because requests for the same page have been piling up over time.

> Have you tried setting $wgPhpCli to the right path?

Actually, what I should have said there is that either the path is incorrect or my installation simply lacks the permission to access it (being on a shared host and all). If it's the former, I would need to approach the hosting provider.

The approach to job queues in MW 1.23 may be an improvement, but I will first try to sort out this mess on its own terms, so to speak.

Thanks for all your comments,

Cav


> To: [hidden email]; [hidden email]
> Subject: RE: [Semediawiki-user] Job queue affecting semantic queries in MW >  1.22
> Date: Wed, 4 Jun 2014 14:54:22 +0200
> From: [hidden email]

Re: Job queue affecting semantic queries in MW > 1.22

kghbln
Great that this option is working out for you. I believe that this
change [1] aims to prevent the same page from being processed over and
over again. It would be great if this could be tested. I have not had
the time to do so yet.

Cheers Karsten


[1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/307

On 04.06.2014 at 15:35, Cavila wrote:


Re: Job queue affecting semantic queries in MW > 1.22

Cavila Contrafibularity
Hmm, this may actually be more of an SMW issue than I would have guessed, since the only jobs listed so far belong to SMW\UpdateJob. Some more information, then:

* The output produced after running runJobs.php (using Extension:MaintenanceShell) consists of many pairs of lines. One such pair may look like this:
   2014-06-04 14:34:02 SMW\UpdateJob [name of page] STARTING

   2014-06-04 14:34:02 SMW\UpdateJob [name of page] t=6 good

* The same number (t=...) may appear more than once for the same page.
** The list tends to include the same three pages (though not exclusively), so each of them occurs hundreds of times. Hoping that it would disappear from the results, I deleted one of them, but it keeps coming up.
* Oddly and worryingly, the API indicates that the job queue continues to grow.
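
To put numbers on how often the same pages recur, output of this kind can be tallied with a short script. The following is a minimal Python sketch that assumes every line follows the "date time job-type page status" layout of the pair quoted above; the regular expression, the function name and the sample page name are illustrative assumptions, not part of MediaWiki or SMW.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed layout: "<date> <time> <job type> <page> STARTING"
#             or: "<date> <time> <job type> <page> t=<number> <status>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<job>\S+) (?P<page>.+?) "
    r"(?:STARTING|t=(?P<elapsed>\d+) (?P<status>\w+))$"
)


def count_finished_jobs(lines: Iterable[str]) -> Counter:
    """Count completed jobs per (job type, page) pair."""
    counts: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("status") is None:
            continue  # skip blank, unrecognized and STARTING lines
        counts[(match.group("job"), match.group("page"))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample mirroring the pair of lines quoted above.
    sample = [
        r"2014-06-04 14:34:02 SMW\UpdateJob Example_page STARTING",
        r"2014-06-04 14:34:02 SMW\UpdateJob Example_page t=6 good",
    ]
    for (job, page), total in count_finished_jobs(sample).most_common():
        print(total, job, page)
```

Only the completion lines are counted, so each job contributes once even though it produces two lines of output.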

Software:
* Semantic Bundle including SMW 1.9.2
* PHP 5.4.4-14
* MySQL 5.5.37

Cav

> Date: Wed, 4 Jun 2014 15:42:40 +0200
> From: [hidden email]
> To: [hidden email]
> Subject: Re: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22

Re: Job queue affecting semantic queries in MW > 1.22

James HK
Hi,

SMW\UpdateJob is normally triggered by:

- \SMWSQLStore3SetupHandlers::refreshData when executing rebuildData.php
- \SMWSQLStore3Writers::updateRedirects
- \SMW\MediaWiki\Jobs\UpdateDispatcherJob, which dispatches jobs when
a property type is changed or a subject is deleted (but only when the
corresponding setting is enabled; it is disabled by default)

Cheers

On 6/4/14, Cavila <[hidden email]> wrote:


Re: Job queue affecting semantic queries in MW > 1.22

Cavila Contrafibularity
- Does "Data repair and upgrade", as started from Special:SMWAdmin, trigger "rebuildData.php" and thereby increase the job queue? And what happens when the script has run its course? Are the tasks executed or are they deferred to the queue? Perhaps a couple of aborted "Data repair and upgrade" attempts in the past account for lingering jobs (or even loops?).
- There are lots of redirects on my site.
- The third one looks irrelevant if the setting is disabled.

(Note: I don't know whether "SMW\UpdateJob" covers the entire queue, but it is the only job type reported thus far.)

"runJobs.php" has been run many times over, but I have managed to clear less than 1% of the backlog. The process is so slow and the work so tedious (I do have a life ;) that I'm seriously considering emptying the job table altogether. A clean slate.

Cav

> Date: Thu, 5 Jun 2014 00:19:50 +0900
> Subject: Re: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
> From: [hidden email]
> To: [hidden email]
> CC: [hidden email]; [hidden email]

Re: Job queue affecting semantic queries in MW > 1.22

Cavila Contrafibularity
A further update:

(1) The maintenance script "showJobs.php" (again executed through the same extension) reports the number of pending jobs per group. While "runJobs.php" had so far shown me nothing but "SMW\UpdateJob", "showJobs.php" revealed that the vast majority of queued items belong to "SMWUpdateJob" (without a backslash), which points to a remnant of SMW 1.8.0.5 (and/or earlier).[1] The job table has now been purged.

(2) The "removeDuplicates" patch that MWJames submitted last month (https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/307/files) has now been applied. Hopefully, that will solve or at least reduce future occurrences of the issue.

(3) My starting point for this topic, the clue that something was not quite right, was that queries on categories were starting to lose many relevant pages from their subcategories, despite $smwgQSubcategoryDepth being set to "20" or "30". Simply clearing the job table has not (yet) resolved that issue. (An inline query probably relies on jobs to give up-to-date information, but when the pages themselves are up to date, this shouldn't affect Special:Ask, right?)
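
The two ideas in points (1) and (2), counting queued jobs per type and collapsing repeated requests for the same page, can be illustrated with a small Python sketch. This is not the code of the patch linked above and does not read the wiki database; it is a toy model over an in-memory list, and the `Job` type, the function names and the page names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Job(NamedTuple):
    """Toy stand-in for one queued job (hypothetical structure)."""
    job_type: str  # e.g. r"SMW\UpdateJob" or the legacy "SMWUpdateJob"
    page: str


def count_by_type(jobs: Iterable[Job]) -> Counter:
    """Number of pending jobs per job type, similar in spirit to a per-group listing."""
    return Counter(job.job_type for job in jobs)


def remove_duplicates(jobs: Iterable[Job]) -> List[Job]:
    """Keep the first job for each (type, page) pair, preserving queue order."""
    seen = set()
    unique: List[Job] = []
    for job in jobs:
        if job not in seen:
            seen.add(job)
            unique.append(job)
    return unique


if __name__ == "__main__":
    queue = [
        Job("SMWUpdateJob", "Page_A"),
        Job("SMWUpdateJob", "Page_A"),
        Job(r"SMW\UpdateJob", "Page_A"),
        Job("SMWUpdateJob", "Page_B"),
    ]
    print(count_by_type(queue))
    print(remove_duplicates(queue))
```

In this model the legacy and the namespaced job type count as different types, which is the same distinction that made the leftover "SMWUpdateJob" entries stand out in the per-group listing.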

Cav



[1] Compare https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/297 

> From: [hidden email]
> To: [hidden email]
> Date: Thu, 5 Jun 2014 10:05:56 +0100
> CC: [hidden email]
> Subject: Re: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
>
> - Does "Data repair and upgrade", as initiated from Special:SMWAdmin, trigger "rebuildData.php", thereby increasing the job queue? And what happens when the script has run its course: are the tasks executed or are they deferred to the queue? Perhaps a couple of abortive attempts at "Data repair and upgrade" in the past account for lingering jobs (or even loops?).
> - There are lots of redirects on my site.
> - The third one looks irrelevant if the setting is disabled.
>
> (Note: I don't know if "SMW\UpdateJob" covers the entire queue but it is the only one indicated thus far.)
>
> "runJobs.php" has been run many times over, but I have managed to remove less than 1% of the backlog. The process is so slow and the work so tedious (I do have a life ;) ) that I'm seriously considering emptying the jobs table of its contents. A clean slate.
>
> Cav
>
> > Date: Thu, 5 Jun 2014 00:19:50 +0900
> > Subject: Re: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
> > From: [hidden email]
> > To: [hidden email]
> > CC: [hidden email]; [hidden email]
> >
> > Hi,
> >
> > SMW\UpdateJob is normally triggered by:
> >
> > - \SMWSQLStore3SetupHandlers::refreshData when executing rebuildData.php
> > - \SMWSQLStore3Writers::updateRedirects
> > - \SMW\MediaWiki\Jobs\UpdateDispatcherJob, which dispatches jobs when a
> > property type is changed or a subject is deleted (but only when the
> > setting is enabled; it is disabled by default)
> >
> > Cheers
> >
> > On 6/4/14, Cavila <[hidden email]> wrote:
> > > Mmm, this may actually be more of an SMW issue than I would have guessed,
> > > since the only jobs listed so far belong to SMW\UpdateJob. Some more
> > > information then:
> > >
> > > * The list that is produced after runJobs.php has been executed (using
> > > Extension:MaintenanceShell) contains many pairs of lines. One such pair may
> > > look like:
> > >    2014-06-04 14:34:02 SMW\UpdateJob [name of page] STARTING
> > >
> > >    2014-06-04 14:34:02 SMW\UpdateJob [name of page] t=6 good
> > >
> > > * The same number (t=...) may appear more than once for the same page.
> > > ** This list tends to include the same three pages (though not exclusively)
> > > so they are occurring hundreds of times. Hoping that it would disappear from
> > > the results, I deleted one of them, but it keeps coming up.
> > > * Oddly and worrisomely enough, the API indicates that the job queue
> > > continues to grow.
> > >
> > > Software:
> > > * Semantic Bundle including SMW 1.9.2
> > > * PHP 5.4.4-14
> > > * MySQL 5.5.37
> > >
> > > Cav
> > >
> > >> Date: Wed, 4 Jun 2014 15:42:40 +0200
> > >> From: [hidden email]
> > >> To: [hidden email]
> > >> Subject: Re: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
> > >>
> > >> Great that this option is working out for you. I believe that this
> > >> change [1] aims to avoid the same page being processed over and over
> > >> again. It would be great if this could be tested. I have not had
> > >> the time to do so yet.
> > >>
> > >> Cheers Karsten
> > >>
> > >>
> > >> [1] https://github.com/SemanticMediaWiki/SemanticMediaWiki/pull/307
> > >>
> > >> Am 04.06.2014 15:35, schrieb Cavila:
> > >> > No cron-jobs, but Karsten's piece of advice proves to be very useful -
> > >> > thanks for this! I'm now using the extension to run the script with
> > >> > "--maxjobs 100" each time and will gradually increase the number (it
> > >> > won't work in one go, as you might expect). An interesting outcome at
> > >> > this early stage is how often the same two or three pages are listed,
> > >> > presumably because requests for the same page have been heaping up over
> > >> > time.
> > >> >
> > >> >> Have you tried setting $wgPhpCli to the right path?
> > >> > Actually, what I should have said there is that either the path is
> > >> > incorrect or my installation simply lacks the permission to access it
> > >> > (being on a shared host and all). If it's the former, I would need to
> > >> > approach the hosting provider.
> > >> >
> > >> > The approach to job queues in MW 1.23 may be an improvement, but I will
> > >> > first try to sort out this mess on its own terms, so to speak.
> > >> >
> > >> > Thanks for all your comments,
> > >> >
> > >> > Cav
> > >> >
> > >> >
> > >> >> To: [hidden email];
> > >> >> [hidden email]
> > >> >> Subject: RE: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
> > >> >> Date: Wed, 4 Jun 2014 14:54:22 +0200
> > >> >> From: [hidden email]
> > >> >>
> > >> >>> The job queue on my site has now swollen to well over a million and
> > >> >>> continues to grow. A critical point has been reached where semantic
> > >> >>> queries are starting to fail, particularly those that need to delve
> > >> >>> into subcategories. "Data repair and upgrade" succeeded within two
> > >> >>> days, but did nothing to change that.
> > >> >>>
> > >> >>> The culprit may or may not be the introduction in MW 1.22 of a
> > >> >>> significant change to the way jobs are executed - see
> > >> >>> https://www.mediawiki.org/wiki/Manual:Job_queue#Changes_introduced_in_MediaWiki_1.22
> > >> >>> - in my case, $wgPhpCli is set to "false" because the default
> > >> >>> assumption "/usr/bin/php" is incorrect. The claim in the documentation
> > >> >>> that the old code is used when $wgPhpCli is not set to an actual path
> > >> >>> does not ring true here. (The issue remains true for later releases of
> > >> >>> MW 1.22 and setting $wgJobRunRate to a higher value has no effect).
> > >> >> Have you tried setting $wgPhpCli to the right path?
> > >> >> Some internet providers allow you to add commands as a cron-job; have
> > >> >> you tried adding runJobs.php as a cron-job?
> > >> >> Again, MW 1.23 introduces a significant change to the way jobs are
> > >> >> executed and is planned to be released today. So if everything else
> > >> >> fails, you can try to update MW and hope for the best.
> > >> >>
> > >> >> --
> > >> >> Met Vriendelijke Zwerversgroeten
> > >> >>
> > >> >> Wouter Rademaker
> > >> >>

Re: Job queue affecting semantic queries in MW > 1.22

Cavila Contrafibularity
Just a quick status report to wrap things up. The issue, or similar behaviour, is now being discussed at https://github.com/SemanticMediaWiki/SemanticMediaWiki/issues/330

After emptying the jobs table, I initiated a new "Data repair and upgrade" process. When it was complete, over 20,000 new "SMW\UpdateJob" jobs had been created. As the number of jobs decreased, with a little help from maintenance/upgrade.php, subcategories gradually started returning results again (the queue is currently under 1,000)! So if you find yourself in a similar situation, it might not be such a bad idea to clear the table of pending jobs, re-initiate "Data repair and upgrade" and speed up the upgrade if necessary.
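
Working the queue down in batches could be scripted roughly as follows. This is a sketch under several assumptions: shell access to the wiki (which a shared host may not offer), batches run with "runJobs.php" and its "--maxjobs" option as mentioned earlier in the thread, and optional narrowing to one job type with "--type". The helper names, the default paths and the doubling schedule are illustrative choices only; the sketch prints the commands instead of running them.

```python
import shlex


def escalating_batches(start, factor, ceiling):
    """Yield batch sizes start, start*factor, ... that do not exceed ceiling."""
    if start < 1 or factor < 2:
        raise ValueError("start must be >= 1 and factor >= 2")
    size = start
    while size <= ceiling:
        yield size
        size *= factor


def run_jobs_command(maxjobs, job_type=None, php="php",
                     script="maintenance/runJobs.php"):
    """Build the argument list for one runJobs.php batch.

    The php and script defaults are assumptions; adjust them to the
    actual installation.
    """
    cmd = [php, script, "--maxjobs", str(maxjobs)]
    if job_type is not None:
        cmd += ["--type", job_type]
    return cmd


if __name__ == "__main__":
    # Start with 100 jobs per run and double the batch size each time.
    for size in escalating_batches(100, 2, 1000):
        cmd = run_jobs_command(size, job_type=r"SMW\UpdateJob")
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing rather than executing keeps the sketch harmless; each printed line could be pasted into a shell or a cron entry once the paths have been checked.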

Cav

From: [hidden email]
To: [hidden email]
CC: [hidden email]
Subject: RE: [Semediawiki-user] Job queue affecting semantic queries in MW > 1.22
Date: Thu, 5 Jun 2014 14:03:50 +0100




         