A note about DTP Forum


ktinkel
January 29th, 2008, 01:36 PM
The DTP Forum is off-line for now, and has been since mid-morning January 29.

I woke up this morning to discover that we had been “restored” to our January 26 condition, with many messages missing. After thinking about it a bit and having a few exchanges with tech support at our host, I closed the site, and began backing up the missing files.

It is a slow process; the first increment has been running for 3-plus hours as I type, and there are three more to go.

But if any of our members are reading this: slow though it is, we are working on restoring most of what was lost, and hope to be back on-line by Wednesday, if not sooner.

Thank you.

Judy G. Russell
January 30th, 2008, 06:05 PM
I woke up this morning to discover that we had been “restored” to our January 26 condition, with many messages missing.
Ouch ouch ouch... and of course being "restored" to an earlier date tells me your host restored an old backup...

ktinkel
January 30th, 2008, 08:26 PM
Ouch ouch ouch... and of course being "restored" to an earlier date tells me your host restored an old backup...
Yep. Talk about agita! Before that we thought we were being hacked (and may have been, in fact).

It has been something like 10 days since I slept a normal night and ate meals on time. I have been eating cereal for lunch for a week (having overslept).

This was really ridiculous. Who says this is a virtual world? Hah! :eek:

Judy G. Russell
January 31st, 2008, 05:12 PM
we thought we were being hacked (and may have been, in fact).
Sounds like it was more than just you -- maybe the whole shared server.

ktinkel
January 31st, 2008, 08:52 PM
Sounds like it was more than just you -- maybe the whole shared server.
Well, the RAID failed, and that certainly affected everyone.

As for whether we were being hacked, that is an open question. I paid little attention during the last couple of weeks in December (crazy time of the year). But it appears our bandwidth began to increase then (from 12% or so to 20% of 25GB). In January it went crazy, and was 73% on 1/18.

I went looking for a cause (the forum itself had not been particularly busy, and it was around that time we began to have many complaints about slow response times). The only thing that really glared was the Error Log, which was pumping out hundreds of errors a day.

Many errors referred to missing files. I have been going through all of those, checking them, and replacing buttons, icons, and other doodads that were unavailable. That, plus the vB update (and our 24-hour hiatus), has reduced the error rate, but I still do not know for sure what was behind it. We lost some files last week, but it may have been as the server was failing. Or it might have been some skulduggery. Not sure.

Anyway, we are better now, and I will be vigilant. Sigh.
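
For anyone facing the same kind of Error Log flood, here is a minimal sketch of how the "missing files" errors could be tallied per day and per file. It assumes an Apache 2.x-style error log line with a "File does not exist:" message, which may not match what this host actually writes; the sample line and the function name `summarize_missing_files` are hypothetical.

```python
import re
from collections import Counter

# Assumed line format (hypothetical sample, not taken from the forum's real log):
# [Fri Jan 18 10:12:01 2008] [error] [client 10.0.0.1] File does not exist: /images/button.gif
LINE_RE = re.compile(
    r"^\[\w{3} (?P<month>\w{3}) +(?P<day>\d{1,2}) [\d:]+ (?P<year>\d{4})\] "
    r"\[error\] .*File does not exist: (?P<path>\S+)"
)

def summarize_missing_files(lines):
    """Count 'File does not exist' errors per day and per requested path."""
    per_day = Counter()
    per_path = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip notices, other errors, and lines in another format
        day = f"{m['year']} {m['month']} {int(m['day']):02d}"
        per_day[day] += 1
        per_path[m['path']] += 1
    return per_day, per_path
```

Calling it on the lines of the log and looking at `per_path.most_common(20)` would show which buttons and icons are requested most often while missing, and `per_day` would show whether the error rate is really coming down.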

Lindsey
February 1st, 2008, 12:04 AM
Well, the RAID failed, and that certainly affected everyone.
Ouch! We had that happen to one of our servers a couple of years ago, and it was not pretty.

Many errors referred to missing files.
Just my inexpert guess, because I am not a network tech, but that does sound to me as if it could have been the result of a failing server.

--Lindsey

Judy G. Russell
February 1st, 2008, 01:42 PM
Well, the RAID failed, and that certainly affected everyone. ... We lost some files last week, but it may have been as the server was failing. Or it might have been some skulduggery. Not sure.
Sure sounds like a failing server to me. But vigilance is a good thing anyway.

ktinkel
February 3rd, 2008, 09:28 PM
Just my inexpert guess, because I am not a network tech, but that does sound to me as if it could have been the result of a failing server.
Right you are. The RAID failed.

To make things crazier, about 12 hours after they fixed that, one of the drives failed, and we lost another bunch of hours over that one.

We were more or less out of business from late Jan 28 through the 30th.

Yuk.

Lindsey
February 3rd, 2008, 10:49 PM
We were more or less out of business from late Jan 28 through the 30th.
Ouch!! But it does make me feel a little less bad that when we had a RAID failure, it took the application using that server offline for the whole next day.

--Lindsey

ktinkel
February 4th, 2008, 11:40 AM
Ouch! We had that happen to one of our servers a couple of years ago, and it was not pretty.

Just my inexpert guess, because I am not a network tech, but that does sound to me as if it could have been the result of a failing server.
I wish I had told you about this as it was unfolding -- we might have saved some down time!

Our story is complicated because we had been having complaints from members about slow responses for several weeks. So they moved us from one server to another. It did not solve the problem; it got worse, and I was pestering tech support for days with complaints about slowness, then asking for advice on the mounting errors, concern about possibly being hacked, and about missing files. They didn’t seem to take any of it seriously. Then boom.

But I find it hard to believe that they had failing RAIDs on two servers.

Ah, well. Experience keeps a dear school …

ktinkel
February 4th, 2008, 11:45 AM
Sure sounds like a failing server to me. But vigilance is a good thing anyway.
Yes, I guess so. We are monitoring everything closely.

I even got Summary (http://summary.net/summary.html) so I can [try to] understand Raw Access Logs.

Lindsey
February 4th, 2008, 07:24 PM
I wish I had told you about this as it was unfolding — we might have saved some down time!
Well, as I said, I am not a network tech, so I don't know that I could have offered much more than sympathy!

Simultaneous failing RAIDs on two different servers is a bit hard to believe. I'd have thought that reports of slow responses and mounting errors would have been a red flag to them saying "pending disk failure," but as I said, I don't do network support, so ....

--Lindsey

ktinkel
February 7th, 2008, 03:04 PM
Well, as I said, I am not a network tech, so I don't know that I could have offered much more than sympathy!

Simultaneous failing RAIDs on two different servers is a bit hard to believe. I'd have thought that reports of slow responses and mounting errors would have been a red flag to them saying "pending disk failure," but as I said, I don't do network support, so ....
Turns out they had also updated their server software -- then reverted when they started having lots of problems on all the servers. Our original server was affected by that, it seems. The RAID failure was on the second.

Yesterday we were still having problems so they moved us again; things are better though not perfect. Three servers in three weeks is some kind of a record for me.

We are looking around, but carefully.

Lindsey
February 7th, 2008, 10:17 PM
Turns out they had also updated their server software — then reverted when they started having lots of problems on all the servers. Our original server was affected by that, it seems. The RAID failure was on the second.
Ahhhh, that makes more sense. Software upgrades are a bitch! (But, ummm, I have to wonder why they apparently did all of them at once, rather than phasing them in? Perhaps they had reasons, but if I have a choice, I always, always, always prefer phased upgrades. That way if there are problems, at least you have confined them to a small subset of your universe.)

--Lindsey
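
To make the phased-upgrade idea concrete, here is a rough sketch under stated assumptions: the function `phased_upgrade`, its `upgrade` and `is_healthy` callbacks, and the batch sizes are all hypothetical, and nothing here reflects how the host actually rolls out software. It upgrades a small batch first, checks it, and stops before touching the remaining servers if anything in the batch looks unhealthy.

```python
def phased_upgrade(servers, upgrade, is_healthy, batch_sizes=(1, 2)):
    """Upgrade servers in growing batches; halt at the first unhealthy batch.

    Returns (upgraded, unhealthy): the servers touched so far and the ones
    that failed the health check (empty if the rollout completed cleanly).
    """
    upgraded = []
    remaining = list(servers)
    sizes = list(batch_sizes)
    while remaining:
        # Use the next planned batch size; once the plan runs out, do the rest.
        size = sizes.pop(0) if sizes else len(remaining)
        batch, remaining = remaining[:size], remaining[size:]
        for server in batch:
            upgrade(server)
        upgraded.extend(batch)
        unhealthy = [s for s in batch if not is_healthy(s)]
        if unhealthy:
            # Stop here: servers still in `remaining` were never touched.
            return upgraded, unhealthy
    return upgraded, []
```

With five servers and batch sizes of 1 and 2, a problem that shows up in the second batch leaves the last two servers on the old software, which is the "small subset of your universe" point.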

ktinkel
February 8th, 2008, 03:20 PM
Ahhhh, that makes more sense. Software upgrades are a bitch! (But, ummm, I have to wonder why they apparently did all of them at once, rather than phasing them in? Perhaps they had reasons, but if I have a choice, I always, always, always prefer phased upgrades. That way if there are problems, at least you have confined them to a small subset of your universe.)
Who knows their reasons? Not me. :(