Last week, the server was down for a little over 24 hours. It's our first major outage in quite some time, but still - it was annoying. I finally got things back online at about 9pm Friday night, and we seem to be stable now. A list of the changes is below, plus details on the circumstances leading up to the problem.
Executive summary: Telstra sucks.
The Outage
On Tuesday last week, I put in an order with Internode to have our IP address space expanded from a single IP to a subnet of five. This, in a nutshell, will allow us to have more than one server directly accessible from the internet, and is pretty much necessary for some expansion plans I have in mind. More detail on that will come at some other time.
We received the following advice from Internode in response to our request:
Date: 27 September 2005
The request now needs to be created with Telstra Wholesale to modify this service with the new IP range. This normally would take around a week or so. You should be contacted with the new range once this request has been processed by Telstra. The Accounts department will then adjust your billing accordingly.
Two days later, our line dropped out - the very day I was going to send out a "we have some downtime coming up next week" email to everyone. We couldn't get a traceroute through to the server. I called Internode to query this, given that there was a pending request in the system that could cause something like what we were seeing. According to the person I spoke to, it looked like Telstra may have done something on the line, but it wasn't clear whether they had or not.
I bundled up some gear and drove out to the premises where our hardware lives. We have had a nice, new Cisco 837 router sitting down there, ready to be configured, for a few months now. I had been planning to spend the weekend researching how to configure it for the new address range, once I'd been told what that range was.
I gave up on Thursday night, given I had no net connection with which to look up documentation (I'd cleverly removed the modem from my laptop some months earlier and not put it back) and it was still unclear whether we had our old address, the new one, or none at all.
On Friday afternoon, Internode confirmed that the new address range was definitely set up, and they even supplied me with a sample configuration for the 837 on their network, which took a great deal of uncertainty out of the situation.
I arrived on-site Friday evening, plugged in the router, hooked up my laptop, loaded the configuration that Internode had supplied and everything started working. Hooray for Internode. I can't thank their support staff enough for helping me out over this period. They were great.
It took another hour or so to get the servers reconfigured and to make the necessary DNS changes so the world could find us again. By Saturday morning, we were getting mail again, and most of our hosted sites were reachable.