switching to something better than irc.wikimedia.org

switching to something better than irc.wikimedia.org

Petr Bena
Hi,

I think the IRC feed of recent changes works great, but there is
still a lot of room for improvement.

As Ryan Lane once suggested, we could probably use a system of queues
instead of IRC, which would be even more advanced. My suggestion is to
create some kind of feed in a machine-parseable format, like XML.

This feed would be distributed by some kind of dispatcher living on a
server such as feed.wikimedia.org, offering not just recent changes
but also recent history (for example, the last 5000 changes per
project).

If a service that parses this feed went down for a moment, it could
retrieve a backlog of changes when it came back.

The current feed on irc.wikimedia.org should stay, but we could change
it so that the current bot retrieves its data from the new XML feed
instead of directly from the Apaches.
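To make the idea concrete, here is a small sketch of what one machine-parseable feed entry and its parsing could look like. The `<recentchange>` schema and every field name in it are made up for illustration; the real format would be whatever the dispatcher ends up emitting.

```python
# Hypothetical feed entry; the schema is illustrative only.
import xml.etree.ElementTree as ET

sample = """<recentchange project="en.wikipedia" id="554433">
  <title>Example article</title>
  <user>ExampleUser</user>
  <type>edit</type>
  <comment>fixed a typo</comment>
  <diffbytes>-12</diffbytes>
</recentchange>"""

# A consumer parses the entry with any standard XML library,
# instead of scraping a colour-coded IRC line.
change = ET.fromstring(sample)
print(change.get("project"), change.findtext("title"), change.findtext("diffbytes"))
```

The advantage over the current IRC lines is that consumers no longer need fragile regex parsing.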

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: switching to something better than irc.wikimedia.org

Tyler Romeo
Hey,

It sounds like an interesting idea. Actually, AWS (I've been working with
it recently for Extension:AWS) has a similar architecture, where you
establish a push notification service using their Simple Notification
Service and have it send messages to a queue using their Simple Queue
Service.

The difficulty in replacing IRC would be that, first of all, it would
almost definitely have to be a push-based service, where the wiki would
publish the message and the notification server would send the recent
change out to all the subscribed clients. This raises the question of
whether there's an existing piece of software that does this, or whether
it would require implementing a daemon in the form of a maintenance
script that handles the job.
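The push model described above can be sketched in a few lines; the `Broker` class and its method names here are purely illustrative, not an existing MediaWiki or AWS API:

```python
# Minimal in-memory sketch of the push model: the wiki publishes a change
# once, and the notification server fans it out to every subscriber.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Push-based: the broker calls out to clients; clients never poll.
        for callback in self._subscribers[channel]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("enwiki.recentchanges", received.append)
broker.subscribe("enwiki.recentchanges", lambda m: received.append(m.upper()))
broker.publish("enwiki.recentchanges", "Example_page edited by ExampleUser")
print(received)
```

SNS→SQS does essentially this, with the queues giving each subscriber its own durable buffer.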

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | [hidden email]



Re: switching to something better than irc.wikimedia.org

Petr Bena
I believe it would require creating a new daemon (preferably written
in C++), which I am willing to write, that could do something similar
to what the ircd does right now: deliver each new change to all
connected clients.

Preferably there would be a set of processes working together in this
system: some kind of cache daemon that would hold the history for all
projects, a dispatcher that would handle client requests and retrieve
the data from the cache daemon, and a listener that would receive the
UDP traffic from the wikis and forward it to the cache daemon.

We could of course create a multithreaded single-process daemon as
well, but that would make it a little less stable, given that a crash
in any thread would bring down the whole system.
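The listener piece above is small enough to sketch. The wire format (fields separated by 0x1F) and the in-process dict standing in for the cache daemon are both assumptions for the demo:

```python
# Sketch of the listener: receive UDP datagrams from the wikis and hand
# each change to a per-project history cache. Loopback sockets simulate
# the apaches; the defaultdict stands in for the cache daemon.
import socket
from collections import defaultdict, deque

cache = defaultdict(lambda: deque(maxlen=5000))  # project -> recent changes

listener = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
listener.bind(("127.0.0.1", 0))          # ephemeral port for the demo
addr = listener.getsockname()

# Simulate one wiki apache emitting a change over UDP.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"enwiki\x1fExample_page\x1fExampleUser", addr)

datagram, _ = listener.recvfrom(65535)
project, title, user = datagram.decode().split("\x1f")
cache[project].append((title, user))
print(project, cache[project][0])

sender.close()
listener.close()
```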


Re: switching to something better than irc.wikimedia.org

Daniel Friesen-2
We actually have an open RFC on this topic:

https://www.mediawiki.org/wiki/Requests_for_comment/Structured_data_push_notification_support_for_recent_changes

--
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://danielfriesen.name/]



Re: switching to something better than irc.wikimedia.org

Petr Bena
I see that the RFC is considering multiple formats; why not support
all of them? We could let the client request the format it likes,
either XML or JSON; it would be up to the dispatcher how it produces
the output data.
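Per-request format negotiation on the dispatcher side could look roughly like this; the field names and the `render` helper are made up for the sketch:

```python
# One internal representation of a change, serialized into whichever
# format the client asked for.
import json
import xml.etree.ElementTree as ET

def render(change, fmt):
    if fmt == "json":
        return json.dumps(change)
    if fmt == "xml":
        root = ET.Element("recentchange")
        for key, value in change.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: " + fmt)

change = {"project": "enwiki", "title": "Example_page", "user": "ExampleUser"}
print(render(change, "json"))
print(render(change, "xml"))
```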


Re: switching to something better than irc.wikimedia.org

Happy Melon-2
Because we made that mistake with the API, and now we're stuck with a bunch
of deadweight formats that do nothing other than increase maintenance
costs.  If your first preference as a client developer is for JSON, it's
really not that hard for you to go get a library to receive it in XML
instead, or vice versa.  That's the whole point of a standardised format.

--HM



Re: switching to something better than irc.wikimedia.org

Tyler Romeo
In reply to this post by Petr Bena
The RFC doesn't seem to have gotten much interest (only a burst of edits
from Krinkle in August and then it died). But interesting nonetheless.

The one thing I do know is that if this were to be implemented, it would
probably be pretty complex. It would have to support at least a couple of
different push methods (I'd imagine IRC would be one of them for backwards
compatibility) and would have to be efficient enough to handle the load of
receiving and sending Wikipedia's recent changes. Like Petr said, the
client would probably be in C++ or some other language that supports true
multithreading and is efficient enough.

At that point, it might as well be its own product, i.e., a generic push
notification server similar to Amazon's SNS. I feel like such a project
would take an insane amount of resources to develop. Between design,
coding, and testing, we'd be lucky to see it implemented by MW 1.25 ;)

Nonetheless, it sounds like a fun project, and if some developers would be
interested in putting together a generic C++ push notification server, I'd
be happy to help out.


Re: switching to something better than irc.wikimedia.org

Petr Bena
In reply to this post by Happy Melon-2
The problem is that while XML is a widely accepted standard supported on
all platforms and in all languages, JSON, even if it might be better, is
not so well supported at the moment. For this reason I think it would be
cool to be able to offer multiple output formats.

In the end, as you said, it's not that hard to get a library that
converts one format to the other, so why can't we use such a library on
the dispatcher side, instead of forcing client developers to hunt down
such a library for their language just to do the conversion themselves?

Re: switching to something better than irc.wikimedia.org

Petr Bena
In reply to this post by Tyler Romeo
OK, I have added this to the hackathon topics as well...


Re: switching to something better than irc.wikimedia.org

Tyler Romeo
In reply to this post by Petr Bena
On Fri, Mar 1, 2013 at 9:04 AM, Petr Bena <[hidden email]> wrote:

> The problem is that while XML is a widely accepted standard supported on
> all platforms and in all languages, JSON, even if it might be better, is
> not so well supported at the moment. For this reason I think it would be
> cool to be able to offer multiple output formats.
>

JSON is *very* widely supported in almost every language. It would
definitely not be a problem if we only supported JSON. Furthermore, JSON
represents data using only native types, all of which exist in PHP.
In other words, a PHP/C++/etc. variable can be directly serialized into
JSON, whereas in XML this is very much not the case, due to attributes and
the ability to have child elements of different types, which makes it much
more difficult to implement.
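The native-types point can be shown in two lines (Python here, but the same holds for PHP's `json_encode`); the sample change dict is invented:

```python
# A variable built entirely from native types serializes to JSON in one
# call and round-trips losslessly, with no schema decisions. The same data
# in XML forces choices: attribute or child element? how to mark the bool?
import json

change = {
    "project": "enwiki",
    "title": "Example_page",
    "minor": True,           # booleans, numbers, lists map 1:1 to JSON
    "lengths": [1024, 1012],
}

encoded = json.dumps(change)
decoded = json.loads(encoded)
print(decoded == change)  # the structure survives the round trip unchanged
```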


Re: switching to something better than irc.wikimedia.org

Petr Bena
I have not yet found a good and stable library for JSON parsing in C#;
if you know of one, let me know :)

However, I disagree with "I feel like such a project would take an
insane amount of resources to develop." If we don't make it insanely
complicated, it won't take an insane amount of time ;). The cache
daemon could be memcached, which is already written and stable. The
listener is a simple daemon that just listens on UDP, parses the data
from MediaWiki, and stores it in memcached in some universal format;
the dispatcher is just a process that takes the data from the cache,
converts it to the requested format, and sends it to the client.

Sounds easy ;)
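The cache-plus-backlog piece (the "last 5000 changes" from the original proposal) is the part a reconnecting client would hit. A sketch, with a deque standing in for memcached and all names invented:

```python
# Keep the last N changes per project so a client that was down for a
# moment can catch up: it asks for everything after the last id it saw.
from collections import deque

class ChangeCache:
    def __init__(self, maxlen=5000):
        self._history = deque(maxlen=maxlen)  # oldest entries fall off

    def append(self, change_id, payload):
        self._history.append((change_id, payload))

    def backlog(self, since_id):
        # Everything newer than the last change the client saw.
        return [p for (cid, p) in self._history if cid > since_id]

cache = ChangeCache(maxlen=5000)
for i in range(1, 11):
    cache.append(i, "change %d" % i)

print(cache.backlog(since_id=7))  # the three changes the client missed
```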


Re: switching to something better than irc.wikimedia.org

Tyler Romeo
On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena <[hidden email]> wrote:

> I have not yet found a good and stable library for JSON parsing in C#;
> if you know of one, let me know :)
>

Take a look at http://www.json.org/. They have a list of implementations
for different languages.

> However, I disagree with "I feel like such a project would take an
> insane amount of resources to develop." If we don't make it insanely
> complicated, it won't take an insane amount of time ;). The cache
> daemon could be memcached, which is already written and stable. The
> listener is a simple daemon that just listens on UDP, parses the data
> from MediaWiki, and stores it in memcached in some universal format;
> the dispatcher is just a process that takes the data from the cache,
> converts it to the requested format, and sends it to the client.


Here's a quick list of basic requirements we'd have to implement:

   - Multi-threading, which is in and of itself a pain in the a**.
   - Some sort of queue for messages, rather than hoping the daemon can
   send out every message in realtime.
   - The ability for clients to register with the daemon (and a place to
   store a client list).
   - Multiple methods of notification (IRC would be one, XMPP might be a
   candidate, and a simple HTTP endpoint would be a must).

Even just those basics aren't an easy task, especially considering that
unless WMF allocates resources to it, the project would be run solely by
those who have enough free time. Also, I wouldn't use memcached as a
caching daemon, primarily because I'm not sure such an application even
needs a caching daemon. All it does is relay messages.
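One way to read the message-queue requirement above is a bounded queue per subscriber, so a slow client can't stall the daemon or consume unbounded memory. A sketch, with drop-oldest as one possible (assumed) overflow policy:

```python
# Per-subscriber bounded queues: a slow client only loses its own oldest
# messages; it never blocks delivery to everyone else.
from collections import deque

class Subscriber:
    def __init__(self, maxlen=3):
        self.queue = deque(maxlen=maxlen)  # overflow drops the oldest entry

    def deliver(self, message):
        self.queue.append(message)

fast, slow = Subscriber(), Subscriber()
for n in range(5):
    msg = "change %d" % n
    fast.deliver(msg)
    slow.deliver(msg)
    fast.queue.popleft()  # the fast client drains as messages arrive

print(list(slow.queue))  # the slow client keeps only the newest three
```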


Re: switching to something better than irc.wikimedia.org

Petr Bena
I still don't see it as too complex. A matter of a month (or months)
for volunteers with limited time.

However, I don't quite see what is so complicated about the last two
points. Given the frequency of updates, the simplest approach is to
have the client (a user / bot / service that needs to read the feed)
open a persistent connection to the server (dispatcher), which forks
itself just as sshd does, and the new process handles all requests
from that client. The client somehow specifies what kind of feed it
wants (that's the registration part), and the forked dispatcher keeps
it updated with information from the cache.

Nothing hard. And what's the problem with multithreading, huh? :) BTW,
I don't really think there is a need for multithreading at all, but
even if there were, it shouldn't be that hard.
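For what it's worth, the fork-per-connection model described above is exactly what Python's standard-library `socketserver.ForkingTCPServer` does on Unix; each connection is handled in a child process, sshd-style, so one crashing handler can't take down its siblings. The one-line "feed" below is a stand-in:

```python
# sshd-style dispatcher sketch: each accepted connection is handled in a
# forked child process.
import socket
import socketserver
import threading

class FeedHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # A real dispatcher child would stream changes until the client
        # disconnects; here it sends a single line and exits.
        self.wfile.write(b"enwiki Example_page edited by ExampleUser\n")

server = socketserver.ForkingTCPServer(("127.0.0.1", 0), FeedHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

client = socket.create_connection(server.server_address)
line = client.makefile().readline()
print(line.strip())

client.close()
server.shutdown()
server.server_close()
```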


Re: switching to something better than irc.wikimedia.org

Yuvi Panda
0mq? RabbitMQ? Seem to fit the use case pretty well / closely.

--
Yuvi Panda T
http://yuvi.in/blog

Re: switching to something better than irc.wikimedia.org

Petr Bena
Closely, but they seem a bit overcomplicated to me. What I proposed is
simple enough that you could just use telnet to retrieve the last
changes.

With RabbitMQ, for example, you need third-party client libraries just
to connect to the server and obtain some data... But I don't have a
problem with using anything that already works and is fast and stable.
Just please let's make it better than what we have now (making it
worse would be no fun :P)


Re: switching to something better than irc.wikimedia.org

Chad
In reply to this post by Petr Bena

There's been a request for years to provide this data with XMPP.
https://bugzilla.wikimedia.org/17450

https://bugzilla.wikimedia.org/30555 also seems related to the
RFC.

-Chad


Re: switching to something better than irc.wikimedia.org

Asher Feldman
In reply to this post by Petr Bena
I don't think a custom daemon would actually be needed.

http://redis.io/topics/pubsub

While I was at Flickr, we implemented a pub/sub based system to push
notifications of all photo uploads and metadata changes to Google, using
Redis as the backend. The rate of uploads and edits at Flickr in 2010 was
orders of magnitude greater than the rate of edits across all WMF
projects. Publishing to a Redis pub/sub channel does grow in cost as the
number of subscribers increases, but I don't see a problem at our scale.
If it came to that, there are ways around it.

We are planning on migrating the wiki job queues from MySQL to Redis in
the next few weeks, so it's already a growing piece of our
infrastructure. I think the bulk of the work here would actually just be
in building a frontend webservice that supports websockets / long
polling, provides a clean API, and preferably uses OAuth or some form of
registration to ward off abuse and allow us to limit the growth of
subscribers as we scale.


Re: switching to something better than irc.wikimedia.org

Petr Bena
A web frontend, you say?

If you compare the raw data of the IRC protocol (one RC feed message)
with the raw data of an HTTP request and response for a page consisting
only of that one RC feed message, you will see a huge difference in size
and performance.

Also, requiring all kinds of authentication doesn't seem like an
improvement to me. It would only complicate what is simple now. Have
there been many attempts to abuse irc.wikimedia.org so far? There is no
authentication there at all.
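To put rough numbers on that size comparison, here is a quick sketch framing the same one-line RC message as an IRC PRIVMSG versus a minimal HTTP/1.1 response. The IRC prefix and the header set are typical stand-ins, not captured Wikimedia traffic:

```python
# Per-message framing overhead: one RC line as an IRC PRIVMSG vs. the
# same line wrapped in an HTTP/1.1 response with a typical header set.
payload = "[[Example page]] edited by ExampleUser (+12) fixed a typo"

irc_frame = ":rc-bot PRIVMSG #en.wikipedia :%s\r\n" % payload

headers = [
    "HTTP/1.1 200 OK",
    "Date: Fri, 01 Mar 2013 12:00:00 GMT",
    "Server: Apache",
    "Content-Type: text/plain; charset=utf-8",
    "Content-Length: %d" % len(payload),
    "Connection: keep-alive",
]
http_frame = "\r\n".join(headers) + "\r\n\r\n" + payload

print(len(irc_frame), len(http_frame))
```

A long-lived websocket or persistent connection would of course amortize the HTTP handshake instead of paying it per message; the comparison above is per-response polling at its worst.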

On Fri, Mar 1, 2013 at 5:46 PM, Asher Feldman <[hidden email]> wrote:

> I don't think a custom daemon would actually be needed.
>
> http://redis.io/topics/pubsub
>
> While I was at flickr, we implemented a pubsub based system to push
> notifications of all photo uploads and metadata changes to google using
> redis as the backend. The rate of uploads and edits at flickr in 2010 was
> orders of magnitude greater than the rate of edits across all wmf projects.
> Publishing to a redis pubsub channel does grow in cost as the number of
> subscribers increases but I don't see a problem at our scale. If so, there
> are ways around it.
>
> We are planning on migrating the wiki job queues from mysql to redis in the
> next few weeks, so it's already a growing piece of our infrastructure.  I
> think the bulk of the work here would actually just be in building
> a frontend webservice that supports websockets / long polling, provides a
> clean api, and preferably uses oauth or some form of registration to ward
> off abuse and allow us to limit the growth of subscribers as we scale.
>
> On Friday, March 1, 2013, Petr Bena wrote:
>
>> I still don't see it as too much complex. Matter of month(s) for
>> volunteers with limited time.
>>
>> However I quite don't see what is so complicated on last 2 points.
>> Given the frequency of updates it's most simple to have the client
>> (user / bot / service that need to read the feed) open the persistent
>> connection to server (dispatcher) which fork itself just as sshd does
>> and the new process handle all requests from this client. The client
>> somehow specify what kind of feed they want to have (that's the
>> registration part) and forked dispatcher keeps it updated with
>> information from cache.
>>
>> Nothing hard. And what's the problem with multithreading huh? :) BTW I
>> don't really think there is a need for multithreading at all, but even
>> if there was, it shouldn't be so hard.
>>
>> On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo <[hidden email]>
>> wrote:
>> > On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena <[hidden email]>
>> wrote:
>> >
>> >> I have not yet found a good and stable library for JSON parsing in c#,
>> >> should you know some let me know :)
>> >>
>> >
>> > Take a look at http://www.json.org/. They have a list of implementations
>> > for different languages.
>> >
>> > However, I disagree with "I feel like such a project would take an
>> >> insane amount of resources to develop." If we wouldn't make it
>> >> insanely complicated, it won't take insane amount of time ;). The
>> >> cache daemon could be memcached which is already written and stable.
>> >> Listener is a simple daemon that just listen in UDP, parse the data
>> >> from mediawiki and store them in memcached in some universal format,
>> >> and dispatcher is just process that takes the data from cache, convert
>> >> them to specified format and send them to client.
>> >
>> >
>> > Here's a quick list of things that are basic requirements we'd have to
>> > implement:
>> >
>> >    - Multi-threading, which is in and of itself a pain in the a**.
>> >    - Some sort of queue for messages, rather than hoping the daemon can
>> >    send out every message in realtime.
>> >    - Ability for clients to register with the daemon (and a place to
>> store
>> >    a client list)
>> >    - Multiple methods of notification (IRC would be one, XMPP might be a
>> >    candidate, and a simple HTTP endpoint would be a must).
>> >
>> > Just those basics isn't an easy task, especially considering unless WMF
>> > allocates resources to it the project would be run solely by those who
>> have
>> > enough free time. Also, I wouldn't use memcached as a caching daemon,
>> > primarily because I'm not sure such an application even needs a caching
>> > daemon. All it does is relay messages.
>> >
>> > *--*
>> > *Tyler Romeo*
>> > Stevens Institute of Technology, Class of 2015
>> > Major in Computer Science
>> > www.whizkidztech.com | [hidden email]

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: switching to something better than irc.wikimedia.org

Tyler Romeo
In reply to this post by Asher Feldman
On Fri, Mar 1, 2013 at 11:46 AM, Asher Feldman <[hidden email]> wrote:

> I don't think a custom daemon would actually be needed.
>
> http://redis.io/topics/pubsub
>
>
>
> While I was at flickr, we implemented a pubsub based system to push
> notifications of all photo uploads and metadata changes to google using
> redis as the backend. The rate of uploads and edits at flickr in 2010 was
> orders of magnitude greater than the rate of edits across all wmf projects.
> Publishing to a redis pubsub channel does grow in cost as the number of
> subscribers increases but I don't see a problem at our scale. If so, there
> are ways around it.
>
> We are planning on migrating the wiki job queues from mysql to redis in the
> next few weeks, so it's already a growing piece of our infrastructure.  I
> think the bulk of the work here would actually just be in building
> a frontend webservice that supports websockets / long polling, provides a
> clean api, and preferably uses oauth or some form of registration to ward
> off abuse and allow us to limit the growth of subscribers as we scale.
>

Interesting. I didn't know Redis had something like this. I'm not too
knowledgeable about Redis, but would clients be able to subscribe directly
to Redis pub/sub channels? Or would that be a security issue (like
allowing people direct access to memcached would be), meaning we would
have to implement our own notification service anyway?
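For reference, the publishing side Asher describes is small. This is a
sketch only: the "rc.&lt;wiki&gt;" channel naming and the event fields are
invented here, and the client object is assumed to be a redis-py
`redis.Redis` instance (not created in the sketch, since exposing Redis
directly to untrusted clients is exactly the security concern above):

```python
# Sketch of publishing one recent-change event to a Redis pub/sub channel.
# Channel naming ("rc.<wiki>") and the event shape are illustrative
# assumptions, not an existing WMF format.
import json


def channel_for(wiki: str) -> str:
    """Map a wiki database name to a hypothetical pub/sub channel name."""
    return "rc." + wiki


def encode_change(change: dict) -> str:
    """Serialize one recent-change event as JSON for publishing."""
    return json.dumps(change, sort_keys=True)


def publish_change(client, wiki: str, change: dict):
    """Publish via any object with a redis-py style publish(channel, message)."""
    return client.publish(channel_for(wiki), encode_change(change))
```

A trusted frontend service would hold the actual `redis.Redis` connection
and relay messages outward, so untrusted subscribers never touch Redis
itself.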

> 0mq? RabbitMQ? Seem to fit the use case pretty well / closely.


Hmm, I've only ever thought of RabbitMQ as a messaging service between
linked applications, but I suppose it could be used as a push notification
service as well.
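Whichever broker is chosen, the pattern is the same: a publisher fans each
message out to every subscriber's queue. A minimal in-process sketch of
that pattern (purely illustrative; 0MQ or RabbitMQ provide this over the
network, with their own client libraries):

```python
# In-process sketch of the publish/subscribe pattern that 0MQ or RabbitMQ
# would provide out of the box: the broker copies each published message
# into every subscriber's queue for that topic.
from collections import defaultdict
from queue import Queue


class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic: str) -> Queue:
        """Register a new subscriber and return its private message queue."""
        q = Queue()
        self.subscribers[topic].append(q)
        return q

    def publish(self, topic: str, message: str) -> None:
        """Fan the message out to every queue subscribed to this topic."""
        for q in self.subscribers[topic]:
            q.put(message)


broker = Broker()
feed = broker.subscribe("rc.enwiki")
broker.publish("rc.enwiki", "Example edit")
print(feed.get_nowait())  # -> Example edit
```

The per-subscriber queues are also what gives each client a small backlog
if it falls briefly behind, instead of dropping messages.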

*--*
*Tyler Romeo*
Stevens Institute of Technology, Class of 2015
Major in Computer Science
www.whizkidztech.com | [hidden email]
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Reply | Threaded
Open this post in threaded view
|

Re: switching to something better than irc.wikimedia.org

Asher Feldman
In reply to this post by Petr Bena
On Friday, March 1, 2013, Petr Bena wrote:

> web frontend you say?
>
> if you compare the raw data of irc protocol (1 rc feed message) and
> raw data of a http request and response for one page consisting only
> of that 1 rc feed message, you will see a huge difference in size and
> performance.


I was suggesting it for websockets or long polling, so the above
comparison isn't relevant. The connection is established once, with its
protocol overhead, then stays open while messages are continually pushed
from the server. It's not a web request for a page containing one RC
message.
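That push model can be shown with a toy example: one connection setup,
then the server side keeps sending events with no per-message request
cycle. Here `socketpair()` stands in for a websocket; real code would use
a websocket library and frame the messages properly:

```python
# Toy illustration of push over a single persistent connection:
# socketpair() gives two connected sockets, standing in for a
# client/server websocket pair.
import socket

server, client = socket.socketpair()

# One "handshake", then the server pushes several events unprompted.
for event in ("edit 1\n", "edit 2\n", "edit 3\n"):
    server.sendall(event.encode())
server.close()

# The client just reads whatever arrives until the connection closes.
received = client.makefile().read()
client.close()
print(received.splitlines())  # -> ['edit 1', 'edit 2', 'edit 3']
```

No request was issued for the second and third events; that is the whole
difference from the one-page-per-message comparison above.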

> Also all kinds of authentication required doesn't seem like an
> improvement to me. It will only complicate what is simple now. Have
> there been many attempts to abuse irc.wikimedia.org so far? there is
> no authentication at all.


Maybe none is needed but I don't think the irc feed interests anyone
outside of a very small community. Doing something a little more modern
might attract different uses. It might not, but I have no idea.


>
> On Fri, Mar 1, 2013 at 5:46 PM, Asher Feldman <[hidden email]>
> wrote:
> > I don't think a custom daemon would actually be needed.
> >
> > http://redis.io/topics/pubsub
> >
> > While I was at flickr, we implemented a pubsub based system to push
> > notifications of all photo uploads and metadata changes to google using
> > redis as the backend. The rate of uploads and edits at flickr in 2010 was
> > orders of magnitude greater than the rate of edits across all wmf
> projects.
> > Publishing to a redis pubsub channel does grow in cost as the number of
> > subscribers increases but I don't see a problem at our scale. If so,
> there
> > are ways around it.
> >
> > We are planning on migrating the wiki job queues from mysql to redis in
> the
> > next few weeks, so it's already a growing piece of our infrastructure.  I
> > think the bulk of the work here would actually just be in building
> > a frontend webservice that supports websockets / long polling, provides a
> > clean api, and preferably uses oauth or some form of registration to ward
> > off abuse and allow us to limit the growth of subscribers as we scale.
> >
> > On Friday, March 1, 2013, Petr Bena wrote:
> >
> >> I still don't see it as too much complex. Matter of month(s) for
> >> volunteers with limited time.
> >>
> >> However I quite don't see what is so complicated on last 2 points.
> >> Given the frequency of updates it's most simple to have the client
> >> (user / bot / service that need to read the feed) open the persistent
> >> connection to server (dispatcher) which fork itself just as sshd does
> >> and the new process handle all requests from this client. The client
> >> somehow specify what kind of feed they want to have (that's the
> >> registration part) and forked dispatcher keeps it updated with
> >> information from cache.
> >>
> >> Nothing hard. And what's the problem with multithreading huh? :) BTW I
> >> don't really think there is a need for multithreading at all, but even
> >> if there was, it shouldn't be so hard.
> >>
> >> On Fri, Mar 1, 2013 at 3:47 PM, Tyler Romeo <[hidden email]>
> >> wrote:
> >> > On Fri, Mar 1, 2013 at 9:16 AM, Petr Bena <[hidden email]>
> >> wrote:
> >> >
> >> >> I have not yet found a good and stable library for JSON parsing in
> c#,
> >> >> should you know some let me know :)
> >> >>
> >> >
> >> > Take a look at http://www.json.org/. They have a list of
> implementations
> >> > for different languages.
> >> >
> >> > However, I disagree with "I feel like such a project would take an
> >> >> insane amount of resources to develop." If we wouldn't make it
> >> >> insanely complicated, it won't take insane amount of time ;). The
> >> >> cache daemon could be memcached which is already written and stable.
> >> >> Listener is a simple daemon that just listen in UDP, parse the data
> >> >> from mediawiki and store them in memcached in some universal format,
> >> >> and dispatcher is just process that takes the data from cache,
> convert
> >> >> them to specified format and send them to client.
> >> >
> >> >
> >> > Here's a quick list of things that are basic requirements we'd have to
> >> > implement:
> >> >
> >> >    - Multi-threading, which is in and of itself a pain in the a**.
> >> >    - Some sort of queue for messages, rather than hoping the daemon
> can
> >> >    send out every message in realtime.
> >> >    - Ability for clients to register with the daemon (and a place to
> >> store
> >> >    a client list)
> >> >    - Multiple methods of notification (IRC would be one, XMPP might
> be a
> >> >    candidate, and a simple HTTP endpoint would be a must).
> >> >
> >> > Just those basics isn't an easy task, especially considering unless
> WMF
> >> > allocates resources to it the project would be run solely by those who
> >> have
> >> > enough free time. Also, I wouldn't use memcached as a caching daemon,
> >> > primarily because I'm not sure such an application even needs a
> caching
> >> > daemon. All it does is relay messages.
> >> >
> >> > *--*
> >> > *Tyler Romeo*
> >> > Stevens Institute of Technology, Class of 2015
> >> > Major in Computer Science
> >> > www.whizkidztech.com | [hidden email]
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l