Web services and SSH down?

DeltaQuad Wikipedia
Hi all,

I'm currently getting connection timeouts on HTTP (at the pages listed below, without the secure part), and 404s on HTTPS on TS pages such as:
https://toolserver.org/~unblock/p/appeal.php
https://toolserver.org/~acc/acc.php
https://toolserver.org/~snottywong/index.html
https://toolserver.org/~helloannyong/range/

On SSH to willow and nightshade, my key doesn't work. No errors, it just thinks for a long time after I input my username and then may or may not request a keyboard interactive password. If I don't get asked for a password, after about 1min30sec I get a connection timeout error.

DeltaQuad
English Wikipedia Administrator and Checkuser

_______________________________________________
Toolserver-l mailing list ([hidden email])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: https://wiki.toolserver.org/view/Mailing_list_etiquette

Re: Web services and SSH down?

John Doe-27
web and ssh work for me

On Mon, Feb 25, 2013 at 8:18 PM, DeltaQuad Wikipedia
<[hidden email]> wrote:

> Hi all,
>
> I'm currently getting connection timeouts on HTTP (at the pages listed
> below, without the secure part), and 404s on HTTPS on TS pages such as:
> https://toolserver.org/~unblock/p/appeal.php
> https://toolserver.org/~acc/acc.php
> https://toolserver.org/~snottywong/index.html
> https://toolserver.org/~helloannyong/range/
>
> On SSH to willow and nightshade, my key doesn't work. No errors, it just
> thinks for a long time after I input my username and then may or may not
> request a keyboard interactive password. If I don't get asked for a
> password, after about 1min30sec I get a connection timeout error.
>
> DeltaQuad
> English Wikipedia Administrator and Checkuser


Re: Web services and SSH down?

DeltaQuad Wikipedia
They *just* came back up. Sorry for the spam all.

DeltaQuad
English Wikipedia Administrator and Checkuser


On Mon, Feb 25, 2013 at 8:25 PM, John <[hidden email]> wrote:
> web and ssh work for me
>
> On Mon, Feb 25, 2013 at 8:18 PM, DeltaQuad Wikipedia
> <[hidden email]> wrote:
> > Hi all,
> >
> > I'm currently getting connection timeouts on HTTP (at the pages listed
> > below, without the secure part), and 404s on HTTPS on TS pages such as:
> > https://toolserver.org/~unblock/p/appeal.php
> > https://toolserver.org/~acc/acc.php
> > https://toolserver.org/~snottywong/index.html
> > https://toolserver.org/~helloannyong/range/
> >
> > On SSH to willow and nightshade, my key doesn't work. No errors, it just
> > thinks for a long time after I input my username and then may or may not
> > request a keyboard interactive password. If I don't get asked for a
> > password, after about 1min30sec I get a connection timeout error.
> >
> > DeltaQuad
> > English Wikipedia Administrator and Checkuser

Re: Web services and SSH down?

MZMcBride-2
DeltaQuad wrote:
> They *just* came back up. Sorry for the spam all.

It's about 9 p.m. on Monday evening right now for me.
https://toolserver.org/~mzmcbride/watcher/ and other similar URLs were
404ing for me yesterday (Sunday) evening. And then they suddenly started
working again without explanation. It seems to be an intermittent issue.

Maybe it's related to the start of a new UTC day and load? Or maybe it's
just an intermittent issue. Probably needs to be investigated if it
continues to happen, though.

MZMcBride




Re: Web services and SSH down?

Johannes Kroll
On Mon, 25 Feb 2013 20:58:19 -0500
MZMcBride <[hidden email]> wrote:

> DeltaQuad wrote:
> > They *just* came back up. Sorry for the spam all.
>
> It's about 9 p.m. on Monday evening right now for me.
> https://toolserver.org/~mzmcbride/watcher/ and other similar URLs were
> 404ing for me yesterday (Sunday) evening. And then they suddenly started
> working again without explanation. It seems to be an intermittent issue.
>
> Maybe it's related to the start of a new UTC day and load? Or maybe it's
> just an intermittent issue. Probably needs to be investigated if it
> continues to happen, though.

While trying to load http://toolserver.org/~render/stools/tlg, we got
500 errors first and then "connection reset". SSH to nightshade took 2
minutes or so to connect. Now web & ssh seem to be working again.

Yesterday evening up till early in the morning today, SQL queries were
very slow. I didn't take measurements, but simple page queries that would
normally execute instantly took minutes.

Don't know if the two things are related.



Re: Web services and SSH down?

Marlen Caemmerer-3
On Tue, 26 Feb 2013, Johannes Kroll wrote:

>
> While trying to load http://toolserver.org/~render/stools/tlg, we got
> 500 errors first and then "connection reset". SSH to nightshade took 2
> minutes or so to connect. Now web & ssh seem to be working again.
>
About what time did you try?

> Yesterday evening up till early in the morning today, SQL queries were
> very slow. I didn't take measurements, but simple page queries that would
> normally execute instantly took minutes.
>
Did you try all night, or only at a particular time? And which databases seemed to respond more slowly?
The problem is that the head nodes also do the SQL forwarding,
so if the active one is fishy you might not get SQL connections at all.
The problem should have occurred between about 0:30 and 1:30 am UTC (1:30 and 2:30 CET), though.
If you tried outside this timeframe, it would be good to know whether you saw any other errors and what they looked like.
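
For comparing reports against that window, here is a minimal Python sketch that normalises a mail Date header to UTC and checks its time of day against 0:30 to 1:30 UTC. The function names are made up for illustration, and it compares time of day only, since the window is given without a date.

```python
from datetime import datetime, time, timedelta, timezone
from email.utils import parsedate_to_datetime

# Window mentioned above: about 0:30 to 1:30 UTC (time of day only).
WINDOW_START = time(0, 30)
WINDOW_END = time(1, 30)

# CET is UTC+1 (no daylight saving time in February).
CET = timezone(timedelta(hours=1))


def to_utc(header_date):
    """Parse an RFC 2822 style date string and convert it to UTC."""
    return parsedate_to_datetime(header_date).astimezone(timezone.utc)


def in_window(header_date):
    """Return True if the UTC time of day lies inside the 0:30-1:30 window."""
    return WINDOW_START <= to_utc(header_date).time() <= WINDOW_END


def utc_time_in_cet(hour, minute):
    """Convert a UTC time of day to CET and return it as HH:MM.

    The date used here (26 Feb 2013) is only a carrier for the conversion.
    """
    stamp = datetime(2013, 2, 26, hour, minute, tzinfo=timezone.utc)
    return stamp.astimezone(CET).strftime("%H:%M")
```

As a usage example, `to_utc("Mon, 25 Feb 2013 20:58:19 -0500")` (the date of MZMcBride's mail) gives 01:58:19 UTC on 26 Feb, which `in_window` reports as outside the window, and `utc_time_in_cet(0, 30)` gives "01:30".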

Cheers
  nosy


Re: Web services and SSH down?

Legoktm
On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <[hidden email]> wrote:
> On Tue, 26 Feb 2013, Johannes Kroll wrote:
>
> > While trying to load http://toolserver.org/~render/stools/tlg, we got
> > 500 errors first and then "connection reset". SSH to nightshade took 2
> > minutes or so to connect. Now web & ssh seem to be working again.
>
> About what time did you try?

On IRC myself and jem- reported having issues at around 10:10am UTC and it recovered around 10:14am UTC.
As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
tsbot and tsnag also left the channel at 11:05am UTC after timing out.

> Cheers
>         nosy

--Legoktm



Re: Web services and SSH down?

Johannes Kroll
On Tue, 26 Feb 2013 05:19:32 -0600
legoktm <[hidden email]> wrote:

> On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <
> [hidden email]> wrote:
>
> > On Tue, 26 Feb 2013, Johannes Kroll wrote:
> >
> >
> >> While trying to load http://toolserver.org/~render/stools/tlg, we got
> >> 500 errors first and then "connection reset". SSH to nightshade took 2
> >> minutes or so to connect. Now web & ssh seem to be working again.
> >>
> > About what time did you try?
>
> On IRC myself and jem- reported having issues at around 10:10am UTC and it
> recovered around 10:14am UTC.
> As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
> tsbot and tsnag also left the channel at 11:05am UTC after timing out.

Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so
processes.

Some lines from strace when it was hanging:

connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.24.1.18")}, 16) = 0
poll([{fd=6, events=POLLOUT}], 1, 0)    = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
poll([{fd=6, events=POLLIN}], 1, 4999)  = 0 (Timeout)

Port 53 is DNS? So it looks like some DNS query timed out?
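
To back up the DNS reading, here is a minimal Python sketch that decodes the visible part of the first sendto() payload as a DNS message (header layout as in RFC 1035). The helper names are hypothetical, and only the bytes strace actually printed are used; the part hidden behind "..." is not guessed.

```python
import struct

# Visible prefix of the first sendto() payload from the strace lines above,
# with strace's octal escapes rewritten as hex. The rest of the 41-byte
# packet was elided by strace ("...") and is deliberately left out.
PAYLOAD_PREFIX = (
    b"\xdd\x33\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00"
    b"\x04ldap\x03esi\x0atoolserver"
)


def parse_dns_header(data):
    """Unpack the fixed 12-byte DNS header into a small dict."""
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!6H", data[:12]
    )
    return {
        "id": ident,
        "is_query": (flags & 0x8000) == 0,  # QR bit clear means a query
        "recursion_desired": bool(flags & 0x0100),
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }


def parse_visible_labels(data, offset=12):
    """Read length-prefixed name labels until the data runs out.

    Compression pointers are not handled; a plain query does not need them.
    """
    labels = []
    while offset < len(data):
        length = data[offset]
        if length == 0:
            break  # root label, end of the name
        label = data[offset + 1 : offset + 1 + length]
        if len(label) < length:
            break  # truncated label, stop rather than guess
        labels.append(label.decode("ascii"))
        offset += 1 + length
    return labels
```

On the bytes above, the header decodes as a standard query with one question and recursion desired, and the visible labels are ldap, esi and toolserver, so the process was apparently trying to resolve a name beginning with ldap.esi.toolserver when the poll() timed out.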

Now it seems to be working again. I didn't log the whole strace run, but
I saved the lines that I still had in the terminal buffer... I can send
it if anybody needs it.




Re: Web services and SSH down?

Johannes Kroll
On Tue, 26 Feb 2013 12:54:27 +0100
Johannes Kroll <[hidden email]> wrote:

> On Tue, 26 Feb 2013 05:19:32 -0600
> legoktm <[hidden email]> wrote:
>
> > On Tue, Feb 26, 2013 at 4:38 AM, Marlen Caemmerer <
> > [hidden email]> wrote:
> >
> > > On Tue, 26 Feb 2013, Johannes Kroll wrote:
> > >
> > >
> > >> While trying to load http://toolserver.org/~render/stools/tlg, we got
> > >> 500 errors first and then "connection reset". SSH to nightshade took 2
> > >> minutes or so to connect. Now web & ssh seem to be working again.
> > >>
> > > About what time did you try?
> >
> > On IRC myself and jem- reported having issues at around 10:10am UTC and it
> > recovered around 10:14am UTC.
> > As of 11:08am UTC I cannot ssh in, and phe was getting 404s.
> > tsbot and tsnag also left the channel at 11:05am UTC after timing out.
>
> Curious: now 'ps aux' on nightshade hangs after displaying some 30 or so
> processes.
>
> Some lines from strace when it was hanging:
>
> connect(6, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.24.1.18")}, 16) = 0
> poll([{fd=6, events=POLLOUT}], 1, 0)    = 1 ([{fd=6, revents=POLLOUT}])
> sendto(6, "\3353\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
> poll([{fd=6, events=POLLIN|POLLOUT}], 1, 5000) = 1 ([{fd=6, revents=POLLOUT}])
> sendto(6, "\23I\1\0\0\1\0\0\0\0\0\0\4ldap\3esi\ntoolserver"..., 41, MSG_NOSIGNAL, NULL, 0) = 41
> poll([{fd=6, events=POLLIN}], 1, 4999)  = 0 (Timeout)
>
> Port 53 is DNS? So it looks like some DNS query timed out?

If DNS drops out from time to time, could that explain the problems we
see? Even rsync failed for me at one point, in addition to the web
and ssh stuff.

Which machine has address 10.24.1.18? Why would it be down or
unreachable?
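
One way to check whether that resolver drops out from time to time would be to probe it directly. Below is a rough Python sketch, not a tested monitoring tool: it sends a plain UDP DNS query and reports how long the answer took, or None if nothing came back. The function names, the query ID and the default of two attempts are my own choices; the 5 second timeout mirrors the poll() timeout in the strace.

```python
import socket
import struct
import time


def build_query(name, ident=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 is an A record, class IN)."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!6H", ident, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!2H", qtype, 1)


def probe(server, name, timeout=5.0, attempts=2):
    """Return seconds until a matching reply arrives, or None on failure."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            start = time.monotonic()
            try:
                sock.sendto(packet, (server, 53))
                reply, _addr = sock.recvfrom(512)
            except OSError:
                # Covers timeouts as well as errors such as "network unreachable".
                continue
            if reply[:2] == packet[:2]:  # reply carries the same query ID
                return time.monotonic() - start
    return None
```

Run from a login server in a loop, something like `probe("10.24.1.18", "toolserver.org")` would return None whenever the resolver does not answer in time, which could then be lined up with the times of the web and ssh problems.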


