Phabricator was down for a short time today (April 4th)

Greg Grossmeier-2
Apologies for not sending out this announcement beforehand.

Short summary: The machine that Phabricator is hosted on rebooted itself
last night due to high temperatures. It ended up just shutting itself
down.

Today we needed our DataCenter Technician to reapply the thermal paste
in an attempt to remedy the issue. That took less than 10 minutes but it
happened during the middle of the day.

Full details: https://phabricator.wikimedia.org/T131742

And yes, we are requesting a backup machine so issues like this won't
have as much of an impact on you (our users):
https://phabricator.wikimedia.org/T131775

Best,

Greg

--
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |

_______________________________________________
Wikitech-l mailing list
[hidden email]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: Phabricator was down for a short time today (April 4th)

Vi to
Why not a small virtualised cluster for these not-so-resource-consuming
services like OTRS, phab, etc?

/me runs away before writing the word-which-shouldn't-be-written

Vito

Re: Phabricator was down for a short time today (April 4th)

Alex Monk
Actually I believe OTRS was moved into the ganeti VM cluster a couple of
months ago.

I'm not sure whether Phabricator is considered a not-so-resource-consuming
service...

Re: Phabricator was down for a short time today (April 4th)

Vi to
Uhm, tech still lists OTRS on iodine, which seems to be decommissioned.
I had a look at
http://ganglia.wikimedia.org/latest/?r=year&cs=&ce=&m=cpu_report&c=Miscellaneous+eqiad&h=iridium.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS 
and...well fairly busy but still "packable", though I'll let more
experienced people think about it ;)

Vito

Re: Phabricator was down for a short time today (April 4th)

Alex Monk
Yeah... It also still mentions mchenry. I marked it as outdated.

Re: Phabricator was down for a short time today (April 4th)

Ricordisamoa
In reply to this post by Greg Grossmeier-2
Cool down, Phab. Cool down. We need you.

Re: Phabricator was down for a short time today (April 4th)

Mukunda Modell
In reply to this post by Vi to
I would like to see Phabricator move to a virtualized cluster, or even a
bare-metal cluster. However, I would not include Phabricator in a list
of "not-so-resource-consuming services."

Phabricator definitely uses a significant amount of resources. It's currently
running on a 12-core server with 64 GB of RAM and 500 GB of RAID storage.
That hardware is a little more than required, but not by a very large margin.

I've recently requested a second machine to be used for a backup
Phabricator instance, so that we can avoid long downtimes during
maintenance or hardware failure. That request is still being discussed at
https://phabricator.wikimedia.org/T131775

Phabricator is gaining improved high-availability support thanks to recent
work upstream, so it might be possible to have dual-master phabricator
nodes in the near future. See https://secure.phabricator.com/T10751 for
upstream progress.

Re: Phabricator was down for a short time today (April 4th)

Jaime Crespo
On Thu, Apr 14, 2016 at 10:26 PM, Mukunda Modell <[hidden email]> wrote:
> Phabricator is gaining improved high-availability support thanks to recent
> work upstream, so it might be possible to have dual-master phabricator
> nodes in the near future. See https://secure.phabricator.com/T10751 for
> upstream progress.

Phabricator has 3 dedicated bare-metal machines on the database side
(including geographical replication and a 24-hour delayed slave).
Currently the slaves are mostly used for backups, maintenance and
long-running stats.

It would be great if they could be used for the main app, too (ro and
semi-automatic failover)!

Re: Phabricator was down for a short time today (April 4th)

Mukunda Modell
Indeed, Jaime, I think that will be possible very soon. The upstream task I
linked to previously (https://secure.phabricator.com/T10751) is about two
things:

1. Properly handling clustered databases, especially dealing with read-only
slaves.
2. Implementing replicated Git hosting, where repositories can exist on
multiple back-end nodes, with reads going to any up-to-date copy of the repo
and writes replicated to every other copy.

It seems like progress is being made on both fronts and I am happy to see
it.
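
As a rough illustration of the second item above, and not a description of how
Phabricator actually implements it, the sketch below models each repository
copy with a version counter, sends reads to any online copy at the newest
version, and applies writes to every online, up-to-date copy. All names here
(RepoReplica, ReplicatedRepo, pick_reader, write) are hypothetical and exist
only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RepoReplica:
    """One back-end node holding a copy of a repository (hypothetical model)."""
    name: str
    version: int = 0      # number of writes this copy has applied
    online: bool = True
    commits: list = field(default_factory=list)


class ReplicatedRepo:
    """Toy model: reads go to any up-to-date copy, writes go to all of them."""

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def latest_version(self):
        # The newest version known to any copy, online or not.
        return max(r.version for r in self.replicas)

    def pick_reader(self):
        """Return the first online replica that is fully up to date."""
        latest = self.latest_version()
        for r in self.replicas:
            if r.online and r.version == latest:
                return r
        raise RuntimeError("no up-to-date replica is online")

    def write(self, commit):
        """Apply a write to every online, up-to-date replica.

        Offline or stale replicas are skipped and fall behind; a real system
        would need a catch-up step, which this sketch leaves out.
        Returns the number of replicas that accepted the write.
        """
        latest = self.latest_version()
        targets = [r for r in self.replicas if r.online and r.version == latest]
        if not targets:
            raise RuntimeError("no up-to-date replica is online")
        for r in targets:
            r.commits.append(commit)
            r.version = latest + 1
        return len(targets)
```

In this model a node that was offline during a write is never chosen for
reads until it has caught up, which is the "any up-to-date copy" condition
described in the list.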

