Reasons for not migrating to Tool Lab

Reasons for not migrating to Tool Lab

Merlissimo
Perhaps it is useful to summarize the reasons why Toolserver users are not
able to switch to Tool/Bot Labs. I added my main reasons. Perhaps others
can add their reasons, too? (Maybe we should also add this list to the
wiki page.)

temporary blockers
* no replication of wikimedia wiki databases
** joining of user databases with wiki databases
* no support for script execution dependency (on ts: currently done by sge)
* no support for servlets

missing support blockers
* no support for new users not familiar with Unix-based systems
* no transparent updating of packages with security problems/bugs

permanent blockers
* license problems (I wrote code at work for my company and reuse parts
for my bot framework. I don't have the right to release this code as open
source, which Labs policy requires.)
* no DaB.

_______________________________________________
Toolserver-l mailing list ([hidden email])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: Reasons for not migrating to Tool Lab

MZMcBride-2
Merlissimo wrote:

> Perhaps it is useful to summarize the reasons why Toolserver users are not
> able to switch to Tool/Bot Labs. I added my main reasons. Perhaps others
> can add their reasons, too? (Maybe we should also add this list to the
> wiki page.)
>
> temporary blockers
> * no replication of wikimedia wiki databases
> ** joining of user databases with wiki databases
> * no support for script execution dependency (on ts: currently done by sge)
> * no support for servlets
>
> missing support blockers
> * no support for new users not familiar with Unix-based systems
> * no transparent updating of packages with security problems/bugs
>
> permanent blockers
> * license problems (I wrote code at work for my company and reuse parts
> for my bot framework. I don't have the right to release this code as open
> source, which Labs policy requires.)
> * no DaB.

I think I'd add "general direction of centralizing everything under a single
Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a
reasonable case for why deprecating the Toolserver and creating Wikimedia
Labs is a great idea, but I don't see it yet.

I don't see why each (Wikimedia) chapter shouldn't have its own replica of
the databases. We want free content to be free (and re-used and re-mixed and
whatever else). If you're going to invest in infrastructure, I think it
makes more sense to bolster replication support than try to compete with the
Toolserver.

That said, pooled resources can sometimes be a smart move to save on
investments such as hardware. Chapters working together is not a bad thing
(I believe some chapters donated to Wikimedia Deutschland for Toolserver
support in the past). But the broader point is that users should be very
cautious of the general direction that a Wikimedia (Foundation) Labs is
headed, and ask whether it's really a good idea if it means the destruction
of free-standing projects such as the Toolserver.

MZMcBride

Re: Reasons for not migrating to Tool Lab

Tim Landscheidt
(anonymous) wrote:

> [...]
> I think I'd add "general direction of centralizing everything under a single
> Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a
> reasonable case for why deprecating the Toolserver and creating Wikimedia
> Labs is a great idea, but I don't see it yet.

> I don't see why each (Wikimedia) chapter shouldn't have its own replica of
> the databases. We want free content to be free (and re-used and re-mixed and
> whatever else). If you're going to invest in infrastructure, I think it
> makes more sense to bolster replication support than try to compete with the
> Toolserver.

> That said, pooled resources can sometimes be a smart move to save on
> investments such as hardware. Chapters working together is not a bad thing
> (I believe some chapters donated to Wikimedia Deutschland for Toolserver
> support in the past). But the broader point is that users should be very
> cautious of the general direction that a Wikimedia (Foundation) Labs is
> headed, and ask whether it's really a good idea if it means the destruction
> of free-standing projects such as the Toolserver.

IMHO you have to differentiate between data and function.
It makes no sense to build artificial obstacles when setting
up some tool that can only be reasonably used with the live
dataset.  On the other hand, preparing for a day where WMF
turns rogue is never wrong.

  But the nice thing about Labs is that you can try out (replicable :-))
replication setups at no cost, and don't have to make upfront investments
in hardware, etc., so when the time comes, you can just upload your setup
to EC2 or whatever and have a working Wikipedia clone running in a
manageable timeframe.

Tim

Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
In reply to this post by Merlissimo
> temporary blockers
> * no replication of wikimedia wiki databases
> ** joining of user databases with wiki databases

We currently have no plans for having the user databases on the same
servers as the replicated databases. Direct joins will not be
possible, so tools will need to be modified.

> * no support for script execution dependency (on ts: currently done by sge)

There's less of a need for this in Labs. If whatever you are running
is really expensive, you can have your own instance. That said, I was
looking at integrating a global queuing system. It won't be SGE,
though.

If someone is really keen on SGE, then I recommend they work with us
to puppetize it. Thankfully, open grid engine is already packaged in
ubuntu, which should make that much easier.
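For context on what "script execution dependency" means: on the Toolserver, a job can be held until another job has finished (SGE's `qsub -hold_jid`). Whatever queue Labs ends up with would need to respect that ordering. A rough sketch of just the ordering part, using Python's standard `graphlib` (3.9+); the job names are hypothetical, and this does not schedule anything on hosts the way SGE does.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical bot jobs: each key may only run after every job in its set.
# This is roughly the relationship `qsub -hold_jid` expresses in SGE.
JOBS = {
    "fetch_recentchanges": set(),
    "update_stats": {"fetch_recentchanges"},
    "render_report": {"update_stats"},
    "post_to_wiki": {"render_report", "update_stats"},
}


def run_order(jobs: dict[str, set[str]]) -> list[str]:
    """Return an order in which every job starts after all of its dependencies."""
    try:
        return list(TopologicalSorter(jobs).static_order())
    except CycleError as err:
        # A cycle can never be satisfied; err.args[1] lists the nodes involved.
        raise ValueError(f"circular job dependency: {err.args[1]}") from err


if __name__ == "__main__":
    for job in run_order(JOBS):
        print("run", job)
```

A real queue would also run independent jobs in parallel; `TopologicalSorter.get_ready()`/`done()` supports that, whereas `static_order()` just gives one valid serial order.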

> * no support for servlets
>

I'm not sure what you mean by servlets.

> missing support blockers
> * no support for new users not familiar with Unix-based systems

Can you describe how this is handled in Toolserver currently?

> * no transparent updating of packages with security problems/bugs
>

Ubuntu has unattended-upgrades. It's generally enabled on instances.
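If you want to check a given instance, unattended-upgrades on Ubuntu is normally switched on by two APT settings, usually in `/etc/apt/apt.conf.d/20auto-upgrades`:

    APT::Periodic::Update-Package-Lists "1";
    APT::Periodic::Unattended-Upgrade "1";

A minimal sketch that reads that one file and reports whether both settings are non-zero. It assumes the standard file location and ignores overrides from other files in `apt.conf.d`, so treat it as a quick check, not an authoritative answer (`apt-config dump` shows the merged configuration).

```python
import re
from pathlib import Path

# Matches lines such as: APT::Periodic::Unattended-Upgrade "1";
SETTING = re.compile(r'^\s*(APT::Periodic::[\w-]+)\s+"([^"]*)"\s*;')
REQUIRED = ("APT::Periodic::Update-Package-Lists", "APT::Periodic::Unattended-Upgrade")


def parse_periodic(text: str) -> dict[str, str]:
    """Collect APT::Periodic::* settings from one apt.conf fragment."""
    settings = {}
    for line in text.splitlines():
        match = SETTING.match(line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def unattended_upgrades_enabled(text: str) -> bool:
    """True if both the list refresh and the upgrade run are set to non-zero."""
    settings = parse_periodic(text)
    return all(settings.get(key, "0") not in ("", "0") for key in REQUIRED)


if __name__ == "__main__":
    path = Path("/etc/apt/apt.conf.d/20auto-upgrades")
    if path.exists():
        print("enabled" if unattended_upgrades_enabled(path.read_text()) else "disabled")
    else:
        print("not found:", path)
```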

> permanent blockers
> * license problems (I wrote code at work for my company and reuse parts for
> my bot framework. I don't have the right to release this code as open
> source, which Labs policy requires.)

This will continue to be a permanent blocker.

You can't decide that on your own, but you can ask your employer if
you can open source the code.

> * no DaB.
>

I'd love DaB to help us improve Labs.

Everything about Labs is fully open. Anyone can help build it, even
the production portions.

- Ryan

Re: Reasons for not migrating to Tool Lab

Daniel Schwen-2
In reply to this post by MZMcBride-2
At the risk of outing myself as "naive": I do not see this as a
problem the way MZMcBride does. I think the Foundation should have earned
our trust by now, and them locking down the data does not seem like a
credible threat to me.
In any case:

a) you can download dumps to access the data independently from WMF
b) the replication to the TS is already "at the mercy" of WMF. The TS
does not make the data any freer.
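On point (a): the public dumps use a predictable layout on dumps.wikimedia.org, e.g. `https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2`. A minimal sketch that builds such a URL and streams the file to disk in chunks. The helper names are illustrative, and which files exist for a given wiki should be checked against that wiki's directory listing rather than assumed.

```python
import shutil
import urllib.request

DUMPS_BASE = "https://dumps.wikimedia.org"


def latest_dump_url(wiki: str, suffix: str = "pages-articles.xml.bz2") -> str:
    """URL of the 'latest' dump file for a wiki database name such as 'enwiki'."""
    return f"{DUMPS_BASE}/{wiki}/latest/{wiki}-latest-{suffix}"


def download(url: str, dest: str) -> None:
    """Stream to disk 1 MiB at a time; large wikis' dumps are many gigabytes."""
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out, length=1 << 20)


if __name__ == "__main__":
    url = latest_dump_url("simplewiki")
    print(url)
    # download(url, "simplewiki-latest-pages-articles.xml.bz2")  # network access
```

This only gets you what the dumps contain, and only as fresh as the last dump run.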

Best,
Dschwen


> I think I'd add "general direction of centralizing everything under a single
> Wikimedia Foundation is a bad idea" as a permanent blocker. Maybe there's a
> reasonable case for why deprecating the Toolserver and creating Wikimedia
> Labs is a great idea, but I don't see it yet.
>
> I don't see why each (Wikimedia) chapter shouldn't have its own replica of
> the databases. We want free content to be free (and re-used and re-mixed and
> whatever else). If you're going to invest in infrastructure, I think it
> makes more sense to bolster replication support than try to compete with the
> Toolserver.

Re: Reasons for not migrating to Tool Lab

Erik Moeller-4
In reply to this post by MZMcBride-2
On Wed, Sep 26, 2012 at 10:15 AM, MZMcBride <[hidden email]> wrote:
> I think I'd add "general direction of centralizing everything under a single
> Wikimedia Foundation is a bad idea" as a permanent blocker.

As others have noted, there's a difference between offering data
(which we do - we've spent a lot of time, money and effort to ensure
that stuff like dumps.wikimedia.org works reliably even at enwiki
scale) and providing a working environment for the dev community.

Having a primary working environment like Labs makes sense in much the
same way that it makes sense to have a primary multimedia repository
like Commons (and Wikidata, and in future probably a gadget
repository, a Lua script repository, etc.). It enables community
network effects and economies of scale that can't easily be replicated
and reduces wasteful duplication of effort.

That said, I'd love to make more real-time data feeds available for
third parties in general. The analytics team is currently looking into
offering a sensible alternative to the IRC feed for edit metadata, for
example.

Erik
--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

Support Free Knowledge: https://wikimediafoundation.org/wiki/Donate

Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
> As others have noted, there's a difference between offering data
> (which we do - we've spent a lot of time, money and effort to ensure
> that stuff like dumps.wikimedia.org works reliably even at enwiki
> scale) and providing a working environment for the dev community.
>
> Having a primary working environment like Labs makes sense in much the
> same way that it makes sense to have a primary multimedia repository
> like Commons (and Wikidata, and in future probably a gadget
> repository, a Lua script repository, etc.). It enables community
> network effects and economies of scale that can't easily be replicated
> and reduces wasteful duplication of effort.
>

I'd like to go a little further on this point.

One of the goals of Labs is to have a fully virtualized clone of our
entire infrastructure that is also completely puppetized in a way
that's reusable by third parties. If you're worried about WMF, then
you should participate in Labs. You should help puppetize and should
help make everything usable by non-WMF entities.

Bringing community operations members back into the operations of the
site is another one of the goals of Labs. If we have enough community
operations people, then the projects aren't dependent on the knowledge
of the staff to survive.

If WMF becomes evil, fork the entire infrastructure into EC2,
Rackspace cloud, HP cloud, etc. and bring the community operations
people along for the ride. Hell, use the replicated databases in Labs
to populate your database in the cloud.

- Ryan

Re: Reasons for not migrating to Tool Lab

MZMcBride-2
In reply to this post by Erik Moeller-4
Erik Moeller wrote:

> As others have noted, there's a difference between offering data
> (which we do - we've spent a lot of time, money and effort to ensure
> that stuff like dumps.wikimedia.org works reliably even at enwiki
> scale) and providing a working environment for the dev community.
>
> Having a primary working environment like Labs makes sense in much the
> same way that it makes sense to have a primary multimedia repository
> like Commons (and Wikidata, and in future probably a gadget
> repository, a Lua script repository, etc.). It enables community
> network effects and economies of scale that can't easily be replicated
> and reduces wasteful duplication of effort.

Yes, there's a difference. But in this case, as far as I understand it, a
direct cost (or casualty) of setting up Wikimedia Labs is the existence of
the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of
course. (And it's not as though the Toolserver has ever been without its
share of issues; I'm not trying to whitewash the past here.) But the
question is: if such a Wikimedia testing infrastructure comes at the cost of
losing the Toolserver, is that acceptable?

Ryan Lane wrote:
> If WMF becomes evil, fork the entire infrastructure into EC2,
> Rackspace cloud, HP cloud, etc. and bring the community operations
> people along for the ride. Hell, use the replicated databases in Labs
> to populate your database in the cloud.

Tim Landscheidt wrote:
> But the nice thing about Labs is that you can try out (replicable :-))
> replication setups at no cost, and don't have to make upfront investments
> in hardware, etc., so when the time comes, you can just upload your setup
> to EC2 or whatever and have a working Wikipedia clone running in a
> manageable timeframe.

This is not an easy task. Replicating the databases is enormously
challenging (they're huge datasets in the cases of the big wikis) and
they're constantly changing. If you tried to rely on dumps alone, you'd
always be out of date by at least two weeks (assuming dumps are working
properly). Two weeks on the Internet is a lot of time.

But more to the point, even if you suddenly had a lot of infrastructure
(bandwidth for constantly retrieving the data, space to store it all, and
extra memory and CPU to allow users to, y'know, do something with it) and
even if you suddenly had staff capable of managing these databases, not
every table is even available currently. As far as I'm aware,
http://dumps.wikimedia.org doesn't include tables such as "user",
"ipblocks", "archive", "watchlist", any tables related to global images or
global user accounts, and probably many others. I'm not sure a full audit
has ever been done, but this is partially tracked by
<https://bugzilla.wikimedia.org/show_bug.cgi?id=25602>.

So beyond the silly simplicity of the suggestion that one could simply "move
to the cloud!", there are currently technical impossibilities to doing so.

MZMcBride

Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
> Yes, there's a difference. But in this case, as far as I understand it, a
> direct cost (or casualty) of setting up Wikimedia Labs is the existence of
> the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of
> course. (And it's not as though the Toolserver has ever been without its
> share of issues; I'm not trying to whitewash the past here.) But the
> question is: if such a Wikimedia testing infrastructure comes at the cost of
> losing the Toolserver, is that acceptable?
>

This is a straw man argument. The mere existence of Labs doesn't mean
the loss of Toolserver.

Labs is more than just a testing infrastructure. It's an
infrastructure for creating things, for enabling volunteer operations,
for bringing operations and development together, for integrating
other projects, and for providing free hosting to projects that may
not have it otherwise. Labs just also happens to need some of the same
features as Toolserver.

Again, as I've mentioned, Labs' purpose isn't to be a Toolserver replacement.
Its vision is much, much larger than what the Toolserver can do.

> Ryan Lane wrote:
>> If WMF becomes evil, fork the entire infrastructure into EC2,
>> Rackspace cloud, HP cloud, etc. and bring the community operations
>> people along for the ride. Hell, use the replicated databases in Labs
>> to populate your database in the cloud.
>
> Tim Landscheidt wrote:
>> But the nice thing about Labs is that you can try out (replicable :-))
>> replication setups at no cost, and don't have to make upfront investments
>> in hardware, etc., so when the time comes, you can just upload your setup
>> to EC2 or whatever and have a working Wikipedia clone running in a
>> manageable timeframe.
>
> This is not an easy task. Replicating the databases is enormously
> challenging (they're huge datasets in the cases of the big wikis) and
> they're constantly changing. If you tried to rely on dumps alone, you'd
> always be out of date by at least two weeks (assuming dumps are working
> properly). Two weeks on the Internet is a lot of time.
>
> But more to the point, even if you suddenly had a lot of infrastructure
> (bandwidth for constantly retrieving the data, space to store it all, and
> extra memory and CPU to allow users to, y'know, do something with it) and
> even if you suddenly had staff capable of managing these databases, not
> every table is even available currently. As far as I'm aware,
> http://dumps.wikimedia.org doesn't include tables such as "user",
> "ipblocks", "archive", "watchlist", any tables related to global images or
> global user accounts, and probably many others. I'm not sure a full audit
> has ever been done, but this is partially tracked by
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=25602>.
>
> So beyond the silly simplicity of the suggestion that one could simply "move
> to the cloud!", there are currently technical impossibilities to doing so.
>

The same impossibilities apply to forking any single CC-licensed project
online. We're not allowed by our privacy policy (and very likely by
law) to provide that information. It's absurd to fault us on this. I
guess we're being evil by not being evil.

We're providing every single other piece of the puzzle required
for forking.

- Ryan

Re: Reasons for not migrating to Tool Lab

Hersfold Wikipedia
You may not have meant for it to lead to the end of the Toolserver, but
apparently that's how WMDE is taking it, and it sounds like that's going
to be the inevitable result. To say otherwise is rather naive at this
point, given the size of the threads talking about this.

----
User:Hersfold
[hidden email]

On 9/26/2012 6:06 PM, Ryan Lane wrote:

>> Yes, there's a difference. But in this case, as far as I understand it, a
>> direct cost (or casualty) of setting up Wikimedia Labs is the existence of
>> the Toolserver. Does Wikimedia need a great testing infrastructure? Yes, of
>> course. (And it's not as though the Toolserver has ever been without its
>> share of issues; I'm not trying to whitewash the past here.) But the
>> question is: if such a Wikimedia testing infrastructure comes at the cost of
>> losing the Toolserver, is that acceptable?
>>
> This is a straw man argument. The mere existence of Labs doesn't mean
> the loss of Toolserver.
>
> Labs is more than just a testing infrastructure. It's an
> infrastructure for creating things, for enabling volunteer operations,
> for bringing operations and development together, for integrating
> other projects, and for providing free hosting to projects that may
> not have it otherwise. Labs just also happens to need some of the same
> features as Toolserver.
>
> Again, as I've mentioned, Labs' purpose isn't to be a Toolserver replacement.
> Its vision is much, much larger than what the Toolserver can do.
>
>> Ryan Lane wrote:
>>> If WMF becomes evil, fork the entire infrastructure into EC2,
>>> Rackspace cloud, HP cloud, etc. and bring the community operations
>>> people along for the ride. Hell, use the replicated databases in Labs
>>> to populate your database in the cloud.
>> Tim Landscheidt wrote:
>>> But the nice thing about Labs is that you can try out (replicable :-))
>>> replication setups at no cost, and don't have to make upfront investments
>>> in hardware, etc., so when the time comes, you can just upload your setup
>>> to EC2 or whatever and have a working Wikipedia clone running in a
>>> manageable timeframe.
>> This is not an easy task. Replicating the databases is enormously
>> challenging (they're huge datasets in the cases of the big wikis) and
>> they're constantly changing. If you tried to rely on dumps alone, you'd
>> always be out of date by at least two weeks (assuming dumps are working
>> properly). Two weeks on the Internet is a lot of time.
>>
>> But more to the point, even if you suddenly had a lot of infrastructure
>> (bandwidth for constantly retrieving the data, space to store it all, and
>> extra memory and CPU to allow users to, y'know, do something with it) and
>> even if you suddenly had staff capable of managing these databases, not
>> every table is even available currently. As far as I'm aware,
>> http://dumps.wikimedia.org doesn't include tables such as "user",
>> "ipblocks", "archive", "watchlist", any tables related to global images or
>> global user accounts, and probably many others. I'm not sure a full audit
>> has ever been done, but this is partially tracked by
>> <https://bugzilla.wikimedia.org/show_bug.cgi?id=25602>.
>>
>> So beyond the silly simplicity of the suggestion that one could simply "move
>> to the cloud!", there are currently technical impossibilities to doing so.
>>
> The same impossibilities apply to forking any single CC-licensed project
> online. We're not allowed by our privacy policy (and very likely by
> law) to provide that information. It's absurd to fault us on this. I
> guess we're being evil by not being evil.
>
> We're providing every single other piece of the puzzle required
> for forking.
>
> - Ryan

Re: Reasons for not migrating to Tool Lab

Platonides
In reply to this post by Ryan Lane-3
On 26/09/12 20:25, Ryan Lane wrote:
>> temporary blockers
>> * no replication of wikimedia wiki databases
>> ** joining of user databases with wiki databases
>
> We currently have no plans for having the user databases on the same
> servers as the replicated databases. Direct joins will not be
> possible, so tools will need to be modified.

-50

It's such a useful feature that it would be worth setting up local MySQL
slaves just to have them.
I know, the all-powerful Labs environment is unable to run a MySQL
instance, but we could use MySQL Cluster, trading memory (available) for
joins (denied).




>> * no support for script execution dependency (on ts: currently done by sge)
>
> There's less of a need for this in Labs. If whatever you are running
> is really expensive, you can have your own instance. That said, I was
> looking at integrating a global queuing system. It won't be SGE,
> though.
>
> If someone is really keen on SGE, then I recommend they work with us
> to puppetize it. Thankfully, Open Grid Engine is already packaged in
> Ubuntu, which should make that much easier.

SGE is a strong queue system. We have people and tools already trained
to use it. It would be my first option.
That said, if the presented alternative has the same user interface, it
shouldn't be a problem. For instance, I don't have an opinion about
which of the SGE forks would be preferable.


>> * no support for servlets
>
> I'm not sure what you mean by servlets.

J2EE, I guess.




>> * no DaB.
>>
>
> I'd love DaB to help us improve Labs.
>
> Everything about Labs is fully open. Anyone can help build it, even
> the production portions.
>
> - Ryan

Would it be worth our efforts? I sometimes wonder why we should work on
that (yes, I'm pessimistic right now).
For instance, the squid in front of *.beta.wmflabs.org. It was configured
by Petan and me. We had absolutely no support from the WMF. The squid
wasn't purging correctly. It worked on production, so there was a config
error somewhere.
We begged to see the squid config for months. But as it was in the
private repository, no, it can't be shown, just in case it has something
secret (very unlikely for squid config). Yes, we will clean them up and
publish, eventually. Months passed (not to mention how publishing the
config had been requested years ago). It could have been quickly
reviewed before being handed out, and we weren't going to abuse it if
something weird really was in there. Replicating the WMF setup was done
without viewing that same setup. I finally fixed it. I was quite proud
of having solved it.
Where is that file right now? It vanished. The file was lost in one of
the multiple corruptions of labs instances. It was replaced with a copy
of the cluster config (which was finally published in the meantime).
So it feels like wasted effort now. I'd have liked to save a local copy
at least.

It's not enough to leave tools there and say "It is fully open. Anyone
can help build it"

Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
In reply to this post by Hersfold Wikipedia
On Wed, Sep 26, 2012 at 6:29 PM, Hersfold <[hidden email]> wrote:
> You may not have meant for it to lead to the end of the Toolserver, but
> apparently that's how WMDE is taking it, and it sounds like that's going to
> be the inevitable result. To say otherwise is rather naive at this point,
> given the size of the threads talking about this.
>

I'll be honest, I don't really care about the politics behind any of
this, and I'm going to ignore anything more related to that. WMDE
dropping Toolserver is their decision and it doesn't affect how Labs
will operate in the future.

Labs is adding infrastructure needed to support Toolserver users. If
there's anything the Toolserver community needs that isn't in our
current roadmap, I'm more than happy to work on those issues with the
community. The environment isn't going to be exactly the same, so
tools and bots may need to be modified. We can provide the necessary
resources, access, and training to integrate into the new environment.
WMDE will be providing resources to help with migrations.

Overall the environment provided by Labs has the ability to be much
more flexible and much more powerful than Toolserver. I hope everyone
migrates over, but I'll understand if anyone feels like it's too much
work.

- Ryan

Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
In reply to this post by Platonides
>> We currently have no plans for having the user databases on the same
>> servers as the replicated databases. Direct joins will not be
>> possible, so tools will need to be modified.
>
> -50
>
> It's such a useful feature that it would be worth setting up local MySQL
> slaves just to have them.
> I know, the all-powerful Labs environment is unable to run a MySQL
> instance, but we could use MySQL Cluster, trading memory (available) for
> joins (denied).
>

I'm not the one setting up the databases. If you want information
about why this won't be available, talk to Asher (binasher in
#wikimedia-operations on Freenode). Maybe he can be convinced
otherwise.

Of course, in the production cluster we don't do joins this way. We
handle the joins in the app logic, which is a more appropriate way of
doing this.
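To make "handling the joins in the app logic" concrete for a tool that used to join its own user database against a wiki replica: query each database separately, then combine the rows in the tool. A minimal sketch, with two in-memory SQLite databases standing in for the separate servers. `page`/`page_id`/`page_title` follow MediaWiki's schema (simplified); `tracked_pages` and its columns are a hypothetical tool table. On the real servers you would use a MySQL client library instead.

```python
import sqlite3


def setup() -> tuple[sqlite3.Connection, sqlite3.Connection]:
    # Stand-in for the wiki replica (MediaWiki-style `page` table, simplified).
    wiki = sqlite3.connect(":memory:")
    wiki.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    wiki.executemany("INSERT INTO page VALUES (?, ?)",
                     [(1, "Main_Page"), (2, "Sandbox"), (3, "Toolserver")])
    # Stand-in for the tool's own user database, living on another server.
    user = sqlite3.connect(":memory:")
    user.execute("CREATE TABLE tracked_pages (page_id INTEGER, note TEXT)")
    user.executemany("INSERT INTO tracked_pages VALUES (?, ?)",
                     [(3, "migrate"), (2, "ignore"), (99, "deleted?")])
    return wiki, user


def tracked_with_titles(wiki: sqlite3.Connection,
                        user: sqlite3.Connection) -> list[tuple[int, str, str]]:
    """Application-side equivalent of an INNER JOIN on page_id."""
    tracked = user.execute("SELECT page_id, note FROM tracked_pages").fetchall()
    ids = sorted({page_id for page_id, _ in tracked})
    if not ids:
        return []
    # One batched lookup against the replica instead of a cross-server join.
    placeholders = ",".join("?" * len(ids))
    titles = dict(wiki.execute(
        f"SELECT page_id, page_title FROM page WHERE page_id IN ({placeholders})", ids))
    # Rows with no matching page are dropped, as an inner join would drop them.
    return sorted((pid, titles[pid], note) for pid, note in tracked if pid in titles)


if __name__ == "__main__":
    for row in tracked_with_titles(*setup()):
        print(row)
```

With large id lists you would split the `IN (...)` lookup into chunks, since database drivers limit the number of bound parameters per statement.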

> SGE is a strong queue system. We have people and tools already trained
> to use it. It would be my first option.
> That said, if the presented alternative has the same user interface, it
> shouldn't be a problem. For instance, I don't have an opinion about
> which of the SGE forks would be preferable.
>

In general in Labs we don't have a large need for a queuing system
right now. If Toolserver folks need it very badly, it's possible to
add; someone just needs to put the effort into it. It likely wouldn't
be amazingly hard to puppetize this to run in a single project. Making
things multi-project is difficult and takes effort. Anyone can do the
single-project version in a project, multi-project will likely take
engineering effort.

>
>>> * no support for servlets
>>
>> I'm not sure what you mean by servlets.
>
> J2EE, I guess.
>

Well, if it's available in the Ubuntu repos, or if it's open source,
then it's available in Labs.

>> I'd love DaB to help us improve Labs.
>>
>> Everything about Labs is fully open. Anyone can help build it, even
>> the production portions.
>>
> Would it be worth our efforts? I sometimes wonder why we should work on
> that (yes, I'm pessimistic right now).
> For instance, the squid in front of *.beta.wmflabs.org. It was configured
> by Petan and me. We had absolutely no support from the WMF. The squid
> wasn't purging correctly. It worked on production, so there was a config
> error somewhere.
> We begged to see the squid config for months. But as it was in the
> private repository, no, it can't be shown, just in case it has something
> secret (very unlikely for squid config). Yes, we will clean them up and
> publish, eventually. Months passed (not to mention how publishing the
> config had been requested years ago). It could have been quickly
> reviewed before being handed out, and we weren't going to abuse it if
> something weird really was in there. Replicating the WMF setup was done
> without viewing that same setup. I finally fixed it. I was quite proud
> of having solved it.

And you should be. Your changes kept that project moving along for
months until I broke it.

> Where is that file right now? It vanished. The file was lost in one of
> the multiple corruptions of labs instances. It was replaced with a copy
> of the cluster config (which was finally published in the meantime).
> So it feels like wasted effort now. I'd have liked to save a local copy
> at least.
>

To be fair, there's only been a single occurrence of instance
corruption, which was due to a bug in KVM.

Also, yes, the squid configuration was finally published because one
of the devs spent the time to do so. I was working on stabilizing
things most of that time.

Does this mean your efforts were wasted? Of course not. Your efforts
helped keep the project running, which is important. Just because your
file was replaced with the production copy doesn't mean the work put
into it was for nothing.

> It's not enough to leave tools there and say "It is fully open. Anyone
> can help build it"
>

We're also putting effort into making the migration happen, but we're
focusing our efforts in different places. We can't do everything,
which is why I'm trying to encourage others to help out. If we work on
separate pieces of the work it'll go much quicker.

- Ryan


Re: Reasons for not migrating to Tool Lab

Tim Landscheidt
In reply to this post by MZMcBride-2
(anonymous) wrote:

> [...]
> Ryan Lane wrote:
>> If WMF becomes evil, fork the entire infrastructure into EC2,
>> Rackspace cloud, HP cloud, etc. and bring the community operations
>> people along for the ride. Hell, use the replicated databases in Labs
>> to populate your database in the cloud.

> Tim Landscheidt wrote:
>> But the nice thing about Labs is that you can try out (replicable :-))
>> replication setups at no cost, and don't have to make upfront
>> investments in hardware, etc., so when the time comes, you can just
>> upload your setup to EC2 or whatever and have a working Wikipedia
>> clone running in a manageable timeframe.

> This is not an easy task. Replicating the databases is enormously
> challenging (they're huge datasets in the cases of the big wikis) and
> they're constantly changing. If you tried to rely on dumps alone, you'd
> always be out of date by at least two weeks (assuming dumps are working
> properly). Two weeks on the Internet is a lot of time.

I don't know whether this really isn't an easy task, but you are
probably right.  So what?  If a scenario of WMF turning rogue
couldn't bear losing two weeks of edits while saving almost
a decade, we should work on ways to produce incremental dumps.

> But more to the point, even if you suddenly had a lot of infrastructure
> (bandwidth for constantly retrieving the data, space to store it all, and
> extra memory and CPU to allow users to, y'know, do something with it) and
> even if you suddenly had staff capable of managing these databases, not
every table is even available currently. As far as I'm aware,
> http://dumps.wikimedia.org doesn't include tables such as "user",
> "ipblocks", "archive", "watchlist", any tables related to global images or
> global user accounts, and probably many others. I'm not sure a full audit
> has ever been done, but this is partially tracked by
> <https://bugzilla.wikimedia.org/show_bug.cgi?id=25602>.

The first part is easy: you go to some supplier and buy
bandwidth, space, memory and CPU.  There is even staff for
hire.

The second part is simple as well: what do you need
"ipblocks" or "watchlist" in a Wikipedia clone for?  It
certainly is neither free content nor the content users use
Wikipedia for.

> So beyond the silly simplicity of the suggestion that one could simply "move
> to the cloud!", there are currently technical impossibilities to doing so.

And it would be far more helpful if you could stop spreading
FUD and instead show what actual impediments there are, for
example in a Labs project.

Tim



Re: Reasons for not migrating to Tool Lab

Ariel Glenn WMF
On 26-09-2012, Wed, at 23:38 +0000, Tim Landscheidt
wrote:

> (anonymous) wrote:
>
> > [...]
> > Ryan Lane wrote:
> >> If WMF becomes evil, fork the entire infrastructure into EC2,
> >> Rackspace cloud, HP cloud, etc. and bring the community operations
> >> people along for the ride. Hell, use the replicated databases in Labs
> >> to populate your database in the cloud.
>
> > Tim Landscheidt wrote:
> >> But the nice thing about Labs is that you can try out (replicable :-))
> >> replication setups at no cost, and don't have to make upfront
> >> investments in hardware, etc., so when the time comes, you can just
> >> upload your setup to EC2 or whatever and have a working Wikipedia
> >> clone running in a manageable timeframe.
>
> > This is not an easy task. Replicating the databases is enormously
> > challenging (they're huge datasets in the cases of the big wikis) and
> > they're constantly changing. If you tried to rely on dumps alone, you'd
> > always be out of date by at least two weeks (assuming dumps are working
> > properly). Two weeks on the Internet is a lot of time.
>
> I don't know whether this really isn't an easy task, but you are
> probably right.  So what?  If a scenario of WMF turning rogue
> couldn't bear losing two weeks of edits while saving almost
> a decade, we should work on ways to produce incremental dumps.
>

In fact there are (experimental) adds/changes dumps, so while it might
not be a 5 minute procedure to get that data into your copy, and
deletions and suppressions wouldn't be covered, the amount of data that
would be lost would be pretty small.

Ariel



Re: Reasons for not migrating to Tool Lab

Federico Leva (Nemo)
In reply to this post by Ryan Lane-3
 > In general in Labs we don't have a large need for a queuing system
 > right now.

Of course, because nobody is using it right now. I suppose Toolserver
didn't need it when it had only a few users consuming its resources.

 > Does this mean your efforts were wasted? Of course not. Your efforts
 > helped keep the project running, which is important. Just because your
 > file was replaced with the production copy doesn't mean the work put
 > into it was for nothing.

Amazing! We should suggest this approach to the editor engagement team,
the post-edit feedback could say "Your valuable edit is now visible to
the world. It will probably be reverted in a few minutes, but you can
still be proud of it, knowing that the new revision written by someone
else is much better!".

Nemo


Re: Reasons for not migrating to Tool Lab

Federico Leva (Nemo)
In reply to this post by Ryan Lane-3
Ryan wrote:
 > Again, as I've mentioned, Labs' purpose isn't a Toolserver replacement.
 > Its vision is much, much larger than what the Toolserver can do.

Which in the meanwhile will allow us to do a much, much narrower set of
things for Wikimedia projects than the Toolserver can do.

Of course, maybe in 5 or 10 years users will be able to reinvent from
scratch or readapt what has been done on the Toolserver in these years,
and in a much better way. By then, en.wiki might have been locked
due to the drop in editor activity.

But yes, you're right, we should be using a better terminology: the
Toolserver isn't being "replaced", it's being
killed/terminated/discontinued/trashed/<insert favourite word here>.

Hersfold wrote:
 > You may not have meant for it to lead to the end of the Toolserver, but
 > apparently that's how WMDE is taking it, and it sounds like that's going
 > to be the inevitable result. To say otherwise is rather naive at this
 > point, given the size of the threads talking about this.

+1 (except that it's not WMDE).

Ryan wrote:
 > I'll be honest, I don't really care about the politics behind any of
 > this, and I'm going to ignore anything more related to that. WMDE
 > dropping Toolserver is their decision [...]

Ridiculous. Your boss said that it's the WMF's decision to terminate the
Toolserver just a few mails ago: «for our part, we will not continue to
support the current arrangement (DB replication, hosting in our
data-center, etc.) indefinitely».
<http://lists.wikimedia.org/pipermail/toolserver-l/2012-September/005294.html>

 > [...] and it doesn't affect how Labs
 > will operate in the future. [...]
 > WMDE will be providing resources to help with migrations.

Can Pavel confirm this? Or are you the one who decides about WMDE budget
now?

In general, I'm really amazed by this approach "it doesn't affect us"
etc. Is Wikimedia Labs supposed to advance Wikimedia's mission and help
Wikimedia projects or not? Can this be done ignoring the context? Do you
really think that trashing all tools and services currently on
Toolserver rather than ensuring they mostly will continue operating
makes any sense for the scope of Wikimedia Labs?
I wish someone would estimate the value of Toolserver's current tools and
services in terms of development work hours and the cost of migration.
I'm quite sure that by requiring a huge migration effort, and
therefore trashing most stuff, you'll be losing millions of dollars of
value for your Wikimedia Labs. Too bad the Wikimedia projects will also
lose the corresponding value.

Finally, I'm greatly re-evaluating the wisdom of those users who across
the years insistently used things like appspot.com, heroku.com or their
own websites where possible for their Wikimedia tools. They are
extremely unreliable and limited by what's possible with dumps, API and
screenscraping, but at least they don't rely on a single person in the
WMF not pressing the huge red button.

Nemo


Re: Reasons for not migrating to Tool Lab

Peter Wendorff
In reply to this post by Merlissimo
Hi.
I'm not happy with the decisions discussed here either, and I don't want to support that decision with this mail, but if I understand it right, at least part of what you describe might be wrong.

If I read it right, the Labs environment is basically a kind of private cloud, using lots of virtual machines as servers on a hardware setup of fewer machines with more power each.

If that's true, and provided that the corresponding admins allow it, then some of your points are wrong. (My comments are interleaved with your lines.)

On 26.09.2012 16:10, Merlissimo wrote:
> temporary blockers
> *[...]
> * no support for script execution dependency (on ts: currently done by sge)

If scripts run on virtual machines in the Labs environment, it's possible
1) to run several scripts on the same machine, keeping the dependencies in their execution, or
2) to run several scripts on different machines, with the execution dependency modelled through web APIs/interfaces. For some combinations that's overkill in complexity, but sometimes it might be useful, too.

> * no support for servlets

Most likely wrong, as it should be possible to use virtual machines running Java and a servlet container like Tomcat or Jetty.

> missing support blockers
> * no support for new users not familar with unix based systems
> * no transparent updating of packages with security problems/bug

+1

> permanent blockers
> * license problems (i wrote code at work for my company and reuse parts for my bot framework. I have not the right to declare this code as open source which is needed by labs policy.)

Well... yes, but I doubt this is a big issue for most Toolserver users, since when I joined the Toolserver, a (declared) open source license was already a necessary condition for the tools.

> * no DaB.

+10 (well... or more)
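Option 1) above, keeping execution dependencies by running the dependent scripts on a single Labs VM, could look roughly like the following. This is a minimal sketch, assuming each tool is a plain command and the dependencies are declared by hand; the job names and commands are hypothetical placeholders. It only enforces ordering and stops on failure; it provides none of the scheduling, queuing or resource limits that SGE offers on the Toolserver.

```python
import subprocess
import sys
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical jobs: name -> (command, names of jobs that must finish first).
JOBS = {
    "fetch": ([sys.executable, "-c", "print('fetching data')"], set()),
    "analyse": ([sys.executable, "-c", "print('analysing data')"], {"fetch"}),
    "report": ([sys.executable, "-c", "print('writing report')"], {"analyse"}),
}


def run_in_order(jobs):
    """Run every job after its dependencies; stop at the first failure."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: deps for name, (_cmd, deps) in jobs.items()}
    # static_order() raises graphlib.CycleError (a ValueError) on cycles.
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        command, _deps = jobs[name]
        # check=True raises CalledProcessError, so dependent jobs never start.
        subprocess.run(command, check=True)
    return order


if __name__ == "__main__":
    print(run_in_order(JOBS))
```

For option 2), the same dependency graph could be kept, with each step triggering the next machine over a web interface instead of calling subprocess.run locally.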

And after reading a little bit more about labs some points to add:

- As DaB already pointed out: there is no OSM database, and it's not even possible to use OSM data, since all content used has to be under a CC license, which is no longer true for OSM.
- OSM, which was a partner project of WMDE (not sure what it was called exactly), or at least one acknowledged as eligible for WMDE support, is neither Wikimedia nor MediaWiki, and it is therefore not possible to work on it in Labs with projects that are not directly tied to MediaWiki (especially as the content isn't CC, see above).

regards
Peter

P.S.: Interestingly, there are very strict rules about personal data, but passwords only have to be hashed: someone could use unsalted MD5, or even the hash function myHash(s) { return substr(s, 0, 12) }, which is technically a hash function, but neither cryptographic nor secure...
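For contrast with the truncation "hash" in the P.S., a minimal sketch of what a reasonable password-hashing scheme looks like: a random per-password salt plus a deliberately slow key-derivation function, here PBKDF2-HMAC-SHA256 from Python's standard library. The iteration count and salt size are illustrative assumptions; nothing in this thread says the Labs policy prescribes any particular scheme.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune to the hardware
SALT_BYTES = 16       # illustrative salt size


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest), using a fresh random salt per password."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because every password gets its own salt, two users with the same password end up with different stored digests, and the slow derivation makes brute-forcing a leaked table expensive; unsalted MD5 and the truncation function provide neither property.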


Re: Reasons for not migrating to Tool Lab

Ryan Lane-3
In reply to this post by Federico Leva (Nemo)
On Thu, Sep 27, 2012 at 1:36 AM, Federico Leva (Nemo)
<[hidden email]> wrote:
>> In general in Labs we don't have a large need for a queuing system
>> right now.
>
> Of course, because nobody is using it right now. I suppose Toolserver didn't
> need it when it had only a few users consuming its resources.
>

I should know better than to feed a troll, but Labs is relatively
heavily used. At this moment there are 233 virtual machines running
across 125 projects. It's actively used by quite a number of bots
(which have already moved from Toolserver). It's being used by the
following teams:

* Analytics
* Editor-engagement
* Visual editor
* Global education
* QA
* Mobile
* Pediapress
* Localization
* Wikidata
* Operations
* Fundraising
* Core services

Many of those teams host multiple active projects.

Additionally, we have a number of volunteer-driven projects. Here are a
few choice ones:

* Bots
* Deployment-prep
* Maps (for OpenStreetMap)
* Wikistats
* Wikitrust
* Signwriting
* Phabricator
* Metavidwiki
* Huggle
* Glam
* Wiki loves monuments
* Blamemaps
* Counter vandalism network

It was used extensively during Google Summer of Code by the students
and mentors. It's also used very heavily during hackathons; most
projects demo at the end with Labs.

These projects aren't in great need of a queue because they don't
fight against each other for shared resources. When bots and tools are
added that need to do expensive, long-running queries against a set of
common databases we'll likely need some form of queuing system, but it
hasn't been a high priority since we haven't been working on
Toolserver like features.

- Ryan


Re: Reasons for not migrating to Tool Lab

Andrei Cipu
> Additionally, we have a number of volunteer driven projects. Here's a
> few choice ones:
>
> * Bots
> * Deployment-prep
> * Maps (for OpenStreetMaps)
> * Wikistats
> * Wikitrust
> * Signwriting
> * Phabricator
> * Metavidwiki
> * Huggle
> * Glam
> * Wiki loves monuments
> * Blamemaps
> * Counter vandalism network
>

Where can we find more information about these projects, especially OSM and WLM?

