I got a nice email from Dreamhost earlier today... good details, and they specifically call out that they're working on the speed issues...
On Monday, September 12 the greater Los Angeles area experienced a major power outage
affecting large sections of the city, including our main data center. The power
outage began shortly before 1pm PST and continued until about 4:30pm PST. Our data center is
equipped with a redundant backup power system with both battery UPS systems and diesel
generators, but the backup failed and our entire data center was powered down.
We have previously covered much of this information on our official weblog
(http://blog.dreamhost.com/) but many of you have not seen that information so we will
recap the events here.
When the grid power to our building was cut, the UPS system kicked in and kept
the building up and running. The five generators also fired up and began providing
power. The building needs four generators to operate at full power so the system is
designed to tolerate a single failure. Unfortunately, two of the five generators failed within
minutes of each other. We receive our power from the building housing our data center and they also
manage the redundant power system. We do not know the exact reason for the generator
failures at this time. We have received some vague explanations that we have not found to be
satisfactory.
Regardless, the remaining three generators were not sufficient to meet the building's power
needs and that caused the emergency electrical systems to transfer into a “load shed
mode” and the building’s UPS system to turn itself off, thus preventing permanent
power-related equipment damage. That shut everything down, including emergency lighting,
and the building was evacuated.
About 15 minutes later, one of the generators was started up to power emergency
lighting and a couple of our senior technicians made their way into the (still evacuated)
building and down
to our data center to assess the damage. Since the backup power had failed, our own data
center power remained off until the main grid power came back. We then proceeded to
power up our equipment. Servers (and all computers) consume significantly more power when
booting up than when up and running so there is some risk of overloading the power system if
too many of them are flipped on at once. Keeping that in mind, we powered everything on as
quickly as possible. At that time the majority of our services were fully back up
but some services were still down and we began the process of systematically checking
services and making any necessary repairs and adjustments. Whenever a large number of
servers suddenly loses power a certain small percentage of them will not come back up,
and when you have several hundred servers it takes a while to verify all of them.
Once our own access to our servers was restored our staff continued working into the
night to restore as much service as possible and to respond to as many of your support cases as
possible. Some of our staff continued working all the way through the night and we
were able to restore almost everything that first night.
Tuesday (September 13) started off early with all of us addressing the residual issues. Then
around noon that day one of our core routers experienced an internal failure related to
damage previously sustained during the power outage. Our routers handle all of the
traffic coming in and out of our network and they are set up in a redundant way to minimize
network disruption when a failure does occur. In this case, the main CPU of the router (called
the 'supervisor') died and the secondary one took over. Everything continued working almost as
it should have, but there is a remaining router issue that we are still working with
on. That issue is responsible for the slower-than-normal performance of our network
and it will be resolved absolutely as soon as possible.
During this outage, our off-network Emergency Status Page
proved to be an invaluable resource for disseminating information among our customers. The
status page remained up throughout the power outage and was updated regularly as we
received new information. Unfortunately, not everyone knows about it and we will be working
to improve that situation in the coming days. Those bloggers among you that did
check the status page were extra helpful in passing along the information to other
dreamhosters who were still in the dark. Thank you to everyone who helped out with that.
This announcement will be followed by another explaining what went wrong with our
and what we plan to do to address them. That will come in the next few days.
We will be continuing to provide more detailed information on our official weblog, which can be
found here: http://blog.dreamhost.com/
Also, everyone who has not bookmarked our Emergency Status Page should do so now. That
page is found here: http://status.dreamhost.com/
We will be improving on the basic page we have there to provide as useful of an
information as possible.
If you have any additional questions about this outage, please let us know. We will
be happy to address all of your questions or concerns.
The Un-Happy DreamHost Powerless Team