smw_RefreshData.php suggestions?

Laurent Alquier
Hello

I am still running into many issues with smw_RefreshData.php when refreshing pages with Virtuoso as the SPARQLStore.

Some issues are caused by my data and some need more troubleshooting (a plain 'request denied' from the PHP script, although the SPARQL query itself works fine when run directly in Virtuoso).

I am looking for suggestions from anyone familiar with the refresh process on where to start patching to add two new options:

1- I would like the script to skip pages that fail to refresh for whatever reason and log them somewhere so that the refresh can resume with the next available page.

2- I would like to improve the way the script runs in batch mode by letting it save the last updated page, so that I can do things like refresh 1000 pages a day from a script.
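
The two options above could be sketched roughly as follows. This is only an illustration of the intended behaviour, written as a standalone Python wrapper rather than as a patch to the PHP script. Everything in it is an assumption: `refresh_page` is a hypothetical callable standing in for whatever refreshes a single page ID, and the function names, the JSON checkpoint file, and the tab-separated failure log are made up for the example and are not part of Semantic MediaWiki.

```python
import json
import os
from typing import Callable, List


def load_checkpoint(path: str, default: int = 1) -> int:
    """Return the next page ID to process, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "r", encoding="utf-8") as fh:
        return int(json.load(fh)["next_id"])


def save_checkpoint(path: str, next_id: int) -> None:
    """Persist the next page ID so that a later run can resume from it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"next_id": next_id}, fh)
    os.replace(tmp, path)  # rename over the old file to avoid a half-written checkpoint


def run_batch(
    refresh_page: Callable[[int], None],  # hypothetical: refreshes one page ID, raises on failure
    last_id: int,
    batch_size: int,
    checkpoint_path: str,
    failed_log_path: str,
) -> List[int]:
    """Refresh up to `batch_size` page IDs, skipping and logging any that fail."""
    start = load_checkpoint(checkpoint_path)
    end = min(start + batch_size - 1, last_id)
    failed: List[int] = []
    for page_id in range(start, end + 1):
        try:
            refresh_page(page_id)
        except Exception as exc:
            # Option 1: record the failure and carry on with the next page.
            failed.append(page_id)
            with open(failed_log_path, "a", encoding="utf-8") as log:
                log.write(f"{page_id}\t{exc}\n")
        # Option 2: remember where to resume, even if this page failed.
        save_checkpoint(checkpoint_path, page_id + 1)
    return failed
```

Calling `run_batch` once a day with `batch_size=1000` would give the "1000 pages a day" behaviour, and the failure log could be fed back into a later run that retries only the listed IDs.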

Right now, the script just stops randomly when it runs into pages with issues. Since I have more than 300,000 page IDs to process, running through the whole thing with a few seconds delay between pages would require days. It is not possible to sit there and babysit that script to restart it every time it fails.
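
To put rough numbers on this, here is a small back-of-the-envelope calculation. The 3-second delay is an assumed value standing in for "a few seconds", and the estimate ignores the time spent actually refreshing each page, so the real duration would be longer.

```python
import math


def refresh_duration_days(page_count: int, delay_seconds: float) -> float:
    """Days spent waiting if every page is followed by a fixed delay."""
    return page_count * delay_seconds / 86_400  # 86,400 seconds per day


def days_at_daily_quota(page_count: int, pages_per_day: int) -> int:
    """Number of daily runs needed when only `pages_per_day` pages are refreshed per run."""
    return math.ceil(page_count / pages_per_day)


# 300,000 page IDs with an assumed 3-second delay between pages
print(f"{refresh_duration_days(300_000, 3):.1f} days")  # prints "10.4 days"

# The same 300,000 page IDs at 1000 pages per day
print(days_at_daily_quota(300_000, 1000))  # prints 300
```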

I don't mind looking into the code and proposing fixes on GitHub. I just need a good place to start among the many layers of the script.

--
- Laurent Alquier
http://www.linfa.net

------------------------------------------------------------------------------

_______________________________________________
Semediawiki-devel mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/semediawiki-devel
Re: smw_RefreshData.php suggestions?

kghbln
Hi Laurent,

Just two days ago I had the same thought (1) when refreshing the data of a site. Luckily I only had 20k IDs, so after half an hour ... I personally get your point, and a good starting point is probably to add these as feature requests for rebuildData.php on GitHub.

Cheers, Karsten


On 13.11.2015 at 15:10, Laurent Alquier wrote: