m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC


Manuel Arostegui
Hello,

The current primary master for m1 (db1063), which mostly serves internal
services plus etherpad, is not in good health: it is an old host that needs
to be decommissioned, and its disks are failing pretty much every week
(plus other disks in predictive failure).

We have decided to fail it over to one of the newer hosts, db1135:
https://phabricator.wikimedia.org/T231403

We have scheduled this switchover for: Tuesday 10th September at 16:00 UTC

This failover should be rather quick and should only take a few seconds
(while we reload haproxy). During those few seconds, the following
services will be read-only:

bacula
etherpadlite
librenms
puppet
racktables
rt

Communication will happen at #wikimedia-operations
If you are around at that time and want to help with the monitoring, please
join us!

Thanks
_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Pine W
Hello,

Does this mean that https://etherpad.wikimedia.org/ will be temporarily
read only? That site seems to be somewhat popular for taking notes during
meetings. If that site will be affected then I recommend that you put up a
site notice there, possibly a multilingual one depending on how much use
the site gets for non-English content.
Pine
( https://meta.wikimedia.org/wiki/User:Pine )




Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Manuel Arostegui
On Thu, Aug 29, 2019 at 3:05 PM Pine W <[hidden email]> wrote:

> Hello,
>
> Does this mean that https://etherpad.wikimedia.org/ will be temporarily
> read only? That site seems to be somewhat popular for taking notes during
> meetings. If that site will be affected then I recommend that you put up a
> site notice there, possibly a multilingual one depending on how much use
> the site gets for non-English content.
> Pine
> ( https://meta.wikimedia.org/wiki/User:Pine )
>
>
>
Hello Pine,

Yes, etherpad will be read-only for a few seconds (5-10 seconds).
I don't know if it is possible to put a banner on Etherpad itself; is that
what you mean?

Thanks
Manuel.




Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Pine W
Yes, I think that a banner would be preferable, but a notice on the
homepage would be a reasonable backup option.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Manuel Arostegui
On Thu, Aug 29, 2019 at 3:50 PM Pine W <[hidden email]> wrote:

> Yes, I think that a banner would be preferable, but a notice on the
> homepage would be a reasonable backup option.
>
>
Thanks. Unfortunately, I don't know if that is possible on Etherpad; I
don't have the knowledge to do so.
Maybe I can put a User-Notice tag on the Phabricator task, so more people
would be aware of that brief interruption?
Again, we are expecting 5-10 seconds of read-only time.

Thanks,
Manuel.



Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Pine W
Hi Manuel,

I'm not familiar with User-Notice tags, but I'll trust your judgment.
Perhaps you could also mention the planned outage of Etherpad in Tech News.

It sounds like this will be a low-impact outage, but there is a small
chance that someone will be taking notes on Etherpad during a meeting that
is important to them. Because Etherpad cannot accept new writes while it is
read-only, they could lose a few seconds of notes while the meeting is
ongoing.

Pine
( https://meta.wikimedia.org/wiki/User:Pine )



Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Manuel Arostegui
Hello,

This will happen in 1 hour.

Thanks
Manuel.

On Thu, Aug 29, 2019 at 1:00 PM Manuel Arostegui <[hidden email]>
wrote:

> We have scheduled this switchover for: Tuesday 10th September at 16:00 UTC

Re: m1 (etherpad, bacula, librenms, rt, racktables) db primary master switchover: 10th Sept 16:00 UTC

Manuel Arostegui
This was done.
Read-only starts: Tue Sep 10 16:10:39 UTC 2019
Read-only stops: Tue Sep 10 16:10:45 UTC 2019

Total read-only time: 6 seconds
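
For anyone who wants to double-check that figure, here is a minimal Python
sketch that recomputes the window from the two timestamps above. It is an
illustration only, not part of the switchover tooling; the names
STAMP_FORMAT and parse_utc are made up for this example, and the format
string assumes an English (C) locale for the day and month abbreviations.

```python
from datetime import datetime, timezone

# Layout of the timestamps reported above, e.g. "Tue Sep 10 16:10:39 UTC 2019".
# The literal "UTC" is matched as plain text rather than parsed as a zone.
STAMP_FORMAT = "%a %b %d %H:%M:%S UTC %Y"


def parse_utc(stamp: str) -> datetime:
    """Parse one reported timestamp and mark it explicitly as UTC."""
    return datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)


read_only_start = parse_utc("Tue Sep 10 16:10:39 UTC 2019")
read_only_stop = parse_utc("Tue Sep 10 16:10:45 UTC 2019")

# Subtracting two aware datetimes gives a timedelta.
read_only_seconds = (read_only_stop - read_only_start).total_seconds()
print(f"Total read-only time: {read_only_seconds:.0f} seconds")
```

The result matches the reported 6 seconds, which is within the 5-10
seconds estimated earlier in the thread.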

On Tue, Sep 10, 2019 at 5:00 PM Manuel Arostegui <[hidden email]>
wrote:

> Hello,
>
> This will happen in 1 hour.
>
> Thanks
> Manuel.