FileBackend branch merge

FileBackend branch merge

Aaron Schulz
I plan to merge the file backend branch next Monday (PST).

Overview:
FileRepo was refactored to use storage paths instead of file system paths. A storage path looks like "mwstore://backend/container/rel_path_to_file". This is somewhat similar to FileRepo virtual URLs (though virtual URLs are URL-encoded, of course), which look like "mwrepo://repo/zone/rel_path_to_file".
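
To make the path layout concrete, here is a minimal sketch that splits a storage path into its backend, container, and relative path. The function name splitStoragePathSketch() and the backend/container names in the usage line are made up for illustration; this is not code from the branch.

```php
<?php
// Hypothetical helper, not taken from the branch: splits a storage path of
// the form "mwstore://backend/container/rel_path_to_file" into its parts.
function splitStoragePathSketch( $storagePath ) {
	$prefix = 'mwstore://';
	if ( strpos( $storagePath, $prefix ) !== 0 ) {
		return null; // not a storage path (e.g. a plain FS path)
	}
	// Limit to 3 pieces so slashes inside the relative path are preserved.
	$parts = explode( '/', substr( $storagePath, strlen( $prefix ) ), 3 );
	if ( count( $parts ) < 2 || $parts[0] === '' || $parts[1] === '' ) {
		return null; // backend or container name is missing
	}
	return array(
		'backend' => $parts[0],
		'container' => $parts[1],
		'rel' => isset( $parts[2] ) ? $parts[2] : '',
	);
}

// The backend and container names here are assumptions for the example.
$parts = splitStoragePathSketch( 'mwstore://local-backend/local-public/a/ab/Example.jpg' );
```

A plain FS path such as "/var/www/images/a.png" makes the sketch return null, which is one simple way for calling code to tell the two kinds of path apart.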

Some functions, like storeBatch(), still allow FS paths as sources. The important breaking changes are in functions like File::getPath(), which now return storage paths instead of file system paths. The append-related functions were removed, as we are using concatenate instead (already added in trunk in r104687).

The main goal is to abstract storage away so that various backends (FS, Swift, S3, Azure, ...) can be supported. Our current NFS usage for thumbnails is not sustainable in the short term, nor is the usage for source files in the long term. Beyond being a single point of failure, NFS doesn't scale very well. With new features like chunked uploads and TimedMediaHandler, we hope to actually have serious video content in the future, which will require a better storage medium.

Other changes:
* Media handler code was minimally affected, as the transform tools are based on FS file reads/output anyway. However, File::transform() will copy the output (if any) to the final storage path destination.
* Upload code was minimally affected too. Initial uploads still work with temp FS source files and call performUpload(). Stash-based uploads still store virtual URLs in the DB to track the uploaded files (from the initial attempt). When the user finishes and uploads from the stash, the usual performUpload() function is called on a local FS copy. Chunked uploads likewise use keys that determine virtual URLs, and use the FileRepo::concatenate() function to create a new storage file. The usual performUpload() function is then called on a local FS copy of that file. Improvements could still be made here.
* Minor changes to img_auth.php/thumb.php were also required.
* Thumb handler code was recently added to /trunk; this can eventually be used to replace our custom thumb-handler.php script on our NFS thumbnail cache server.
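
As a rough illustration of the concatenation step for chunked uploads, the sketch below joins local chunk files, in order, into one destination file. It is not the FileRepo::concatenate() implementation; the function name concatenateChunksSketch() is hypothetical, and it works on plain FS paths only.

```php
<?php
// Hypothetical helper, NOT FileRepo::concatenate() itself: joins local chunk
// files, in the order given, into a single destination file.
function concatenateChunksSketch( array $chunkPaths, $destPath ) {
	$dest = @fopen( $destPath, 'wb' );
	if ( $dest === false ) {
		return false; // destination could not be opened for writing
	}
	$ok = true;
	foreach ( $chunkPaths as $chunkPath ) {
		$src = @fopen( $chunkPath, 'rb' );
		if ( $src === false ) {
			$ok = false; // a missing chunk invalidates the whole result
			break;
		}
		// Append this chunk's bytes to the destination.
		if ( stream_copy_to_stream( $src, $dest ) === false ) {
			$ok = false;
		}
		fclose( $src );
		if ( !$ok ) {
			break;
		}
	}
	fclose( $dest );
	if ( !$ok ) {
		unlink( $destPath ); // do not leave a partial file behind
	}
	return $ok;
}
```

The chunks are copied stream-to-stream rather than read fully into memory, which matters once the uploads are large video files.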

Breakage:
Typically, the more a module makes use of FileRepo and virtual URLs, the less likely it is to break. Even calling File::getPath() and using that as a source to FileRepo::store() will happen to still work. Things like:
a) filemtime( $file->getPath() )
b) copy( $file->getPath(), ... )
c) StreamFile::stream( $file->getPath() )
...will be broken. You will see errors about PHP not finding a wrapper for 'mwstore'.
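
To see why these calls fail, note that PHP only understands URL-style paths whose scheme has a registered stream wrapper. The sketch below checks for that; the function name hasUnregisteredSchemeSketch() and the example path are hypothetical, and it assumes nothing has registered a wrapper for 'mwstore'.

```php
<?php
// Hypothetical guard, not MediaWiki API: returns true if $path uses a
// URL-style scheme for which PHP has no registered stream wrapper.
function hasUnregisteredSchemeSketch( $path ) {
	if ( !preg_match( '!^([a-zA-Z][a-zA-Z0-9+.-]*)://!', $path, $m ) ) {
		return false; // plain file system path, no scheme
	}
	return !in_array( strtolower( $m[1] ), stream_get_wrappers(), true );
}

// The backend and container names here are assumptions for the example.
$path = 'mwstore://local-backend/local-public/a/ab/Example.jpg';
if ( hasUnregisteredSchemeSketch( $path ) ) {
	// filemtime(), copy() and similar native calls would fail on this path.
	echo "Cannot pass $path to native file functions\n";
}
```

Extension code that currently hands File::getPath() straight to native file functions could use a check like this to find the call sites that need updating.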

For example, ConfirmAccount and NSFileRepo will need updating. Since I wrote the former, it may provide an example for any updates needed. Such extensions will want to use FileRepo with an FSFileBackend and handle storage paths properly. If done correctly, the end-user won't notice anything on upgrade.

All core unit tests pass on my local machine.

End-users:
Once bugs are ironed out, nothing should really change for end-users. Setup.php will automatically create backwards-compatible FSFileBackend containers for repositories. There aren't really any user-facing features in this rewrite.

Re: FileBackend branch merge

Danny Joe Bauch
Very exciting. I can hardly wait until Monday to get my hands on this
to try it out on Azure. That's been the missing piece for me.
Everything else in MW 18.5 works for me on Azure, except paging beyond
the first page for some of the special pages (e.g. Most Wanted) due to
a SQL Server PDO bug that degrades scrollable cursors to forward-only
cursors when using the Common Table Expressions (CTEs) that I
introduced to compensate for the lack of LIMIT/OFFSET syntax in T-SQL
for SQL Azure.

On Fri, Dec 16, 2011 at 4:40 PM, Aaron S. <[hidden email]> wrote:

> --
> View this message in context: http://wikimedia.7.n6.nabble.com/FileBackend-branch-merge-tp1799672p1799672.html
> Sent from the Wikipedia Developers mailing list archive at Nabble.com.
>
> _______________________________________________
> Wikitech-l mailing list
> [hidden email]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
