This is the fifth part of the story of a community network that was
cracked and what was done to recover from it. The first
part Cracked! Part 1:
Denial and truth details the report that leads to the discovery
that the community network was indeed cracked and some of the initial
reactions. The second article Cracked! Part 2: Watching and
Waiting talks about how they learned more about the cracker
and what they did next. The third Cracked! Part 3: Hunting
the hunter talks about some of the efforts made to track down the
cracker and some surprises. The fourth Cracked! Part 4: The Sniffer tells how they found the
sniffer that the cracker was running on their network and what they did
next. This article covers the rebuilding of the system to recover from the
crack and fix some long-standing problems. Future articles detail
their conversations with the cracker on IRC, the hole they missed and
the cracker's revenge.
By this point we have realized that we must get the cracker off our
machines before it is too late. It is only a matter of time
before he trashes our system to clean up his tracks, gets a sniffer
running under a different architecture or uses us to launch some denial of
service attack. The FBI has not delivered anything, even though the
agents always sound positive about the situation whenever we talk to them.
The next step is up to us.
There are pros and cons involved in attempting a rebuild.
First on the pro side we will have
the opportunity to set things up in a planned manner, not the
randomness caused by the equipment slowly coming in as it was
donated. We will be able to get all of the machines running a newer
version of their operating systems and the machines of the same
architecture will be running the same versions of the operating
system. On the negative
side, it will be a lot of work (remember, I was a volunteer), more than I
can do in my spare time, so I will have to take time off from work. Other
people will also have to take time off. There
is the chance that we will do all of this and the cracker will still
be able to crack us, bringing us most of the way back to where we
started, except that he will know that we know about him, and there
is the chance that we will tip our hand and the cracker will destroy
things to cover up his tracks before we can lock him out.
From these factors we decided that we had to do the rebuild all at
once. We had to make sure that we cleaned up all the machines at the
same time and did not let them back on the net until we were as sure as
we could be that the cracker could not get back on. We decided to
make the changes in about a month when we had scheduled to move the
equipment from their original site to a new location across town.
We also decided to not tell the users any more than we had to. We
told them that we had to move the equipment and did not tell them
that we had been cracked and were going to have to do all the other
work needed to rebuild the
system. By doing this we planned to keep the cracker in the dark and
prevent him from destroying the system to cover his tracks. The
downside to this was that we told
the users that we were going to be offline for a day or two when we
knew that it was going to be more like a week.
Before we did anything we needed to figure out what we wanted to do
and how we were going to do it. We knew that without a plan we were
not going to get anything done.
We made the following goals for the redesign and securing of our system:
- Make the system more reliable and make no machine depend on another:
One of the major problems with our original setup was that the
system was very
unreliable due to the convoluted and interdependent configuration of
our machines. When one machine crashed it would take down the others.
This was in many ways worse than our having been cracked. We were
determined to eliminate or reduce this interdependence.
We also did not want the machines to trust one another any more than
was absolutely essential. If one of our machines was cracked, we did
not want the cracker to easily leverage this into all of them being cracked.
- Restrict logins to one box:
Our users could log in to all but one of our machines. This caused
more admin work keeping user software such as pine, elm, inn, etc. up to
date on multiple platforms. In a commercial environment this might
not have been a problem, but it caused a problem for our volunteer
system admins. It also increased our vulnerability to system
cracks due to the number of OS versions and software versions that we
had to keep an eye on. We wanted to improve this by simplifying things
and allowing user logins on only one of the Alphas.
- Limit which set user id and set group id programs users can run:
Every set user or group id program that a user can run is one that may
have a vulnerability in it. The less they can run, the less they can
abuse. As an example, Digital Unix had over three hundred set
user and set group id programs, and only a handful of these had man
pages. Many of these were very useful when running the Alpha as a desktop
machine but were a trap waiting to be sprung on a server.
- Users cannot run their own software:
Users will run the most surprising things. The cracker was running a
user space port redirector, and a file transfer utility that left no
records in the logs. There are many other dangerous tools, such as denial of service
programs, software that allows you to execute commands on a machine without
logging in or leaving any trace, or that assists in exploiting
vulnerabilities and race
conditions. We also had a steady supply of users running bone-headed
scripts that would suck up more than their share of system resources.
We wanted to stop all of this on our machines.
- Have server software run on one machine:
In our original configuration we had many services running on many
different machines and we wanted to reduce that as much as we could.
For example we wanted only one news server, one IRC server, mail, etc.
Running server software on multiple hosts made us more vulnerable to
attack and increased the system admin load.
- Use a log host:
One of the problems with being cracked is that once someone has root
on your machine you can no longer trust its logs. By sending all the
syslog messages to a log host we would be more likely to have reliable
logs. Of course, you must pay special attention to securing it, because
if the log host is cracked then you can no longer trust its logs and
you are back where you started.
Also having the logs all in one place would allow us to monitor things
much more closely. We also planned a system to parse the logs and
pull out events that we wanted to monitor.
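Forwarding to a log host is a one-line change in each client's /etc/syslog.conf (a line such as `*.* @loghost`). The parsing side can be sketched in a few lines of Python; this is a modern illustration of the idea, and the patterns are hypothetical examples rather than the list that was actually monitored:

```python
import re

# Patterns worth flagging -- hypothetical examples, not the actual
# list that was monitored; tune these to the daemons you run.
ALERT_PATTERNS = [
    re.compile(r"refused connect"),
    re.compile(r"authentication failure", re.IGNORECASE),
    re.compile(r"ROOT LOGIN"),
]

def interesting_events(lines):
    """Return the syslog lines that match any alert pattern."""
    return [line for line in lines
            if any(p.search(line) for p in ALERT_PATTERNS)]
```

In practice a script like this would run from cron on the log host and mail anything it flags to the administrators.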
- Run tripwire on the newly installed machines and generate a
baseline for the installation:
After we were cracked we had no way to know what had been changed on
our systems. By running tripwire we could capture a cryptographic
snapshot of what applications were on our system and be able to tell
what had been changed at a later time.
Once these goals had been established we built a planning document
that detailed what the system would look like when we finished and
what steps were needed along the way. The plan was then distributed
to all the system administrators who each provided feedback to
improve it. By doing this and
using this plan we could look at each piece individually and improve
it while still keeping in mind what the final system would look like.
It also turned out to be invaluable during the work because everyone
involved understood the plan and knew what needed to be done.
We placed announcements on our web pages and in our system messages
informing our users that we were moving the equipment in a few weeks,
and even made an
announcement in the local newspaper. As planned, we made no
mention that we had been cracked or that the outage was going
to be longer than the weekend we had announced.
We lined up all of the people that needed to be involved. Not just
the system administrators, but also people who would help pack up
the equipment, move it across town and set it back up. Some of these
volunteers drove hundreds of miles to be there and help. All of them
did much more than their share and went the extra mile.
One of the other goals of this move was to physically organize the
equipment. The equipment as it had been acquired and as it was arranged
in the old location was quite a mess. There were lots of external
SCSI cabinets containing hard drives and tape drives. These had been
placed wherever they would stack, without regard to what system they were
attached to or any other rhyme or reason. We had received some donated
racks with shelves, and some of the volunteers had created a detailed plan
to physically restructure the equipment so that all of the components
of each machine were together and would be much easier to work on.
The day of the move came without any new surprises. We had worked
long and hard refining the plan and had made many improvements. Some
of us had driven from far away to be there and help. I felt like we were ready.
Before driving to the office where the equipment was, I logged in and
started one last backup. By the time I arrived the backup had finished,
so I shut down the systems and we were ready to move them to their new
home across town.
Our crew packed up the equipment and drove it across town to the new
location. As with any move of computer equipment, especially older
boxes, we had a lot of concerns that the drives would not spin up or
that something would be damaged during the move. But in this move we
were lucky and everything spun back up.
When we arrived with the equipment at the new site one crew of about
four to five people started setting up the racks,
bolting them together, cutting cable holes through them and building
a very nice custom rack for the system. All of the
system admins including myself started setting up the systems on the
floor in preparation for reinstalling them. We had one system admin
there who was an expert on the IBM RS6000s, so he started working on
installing a new release of AIX. The rest of us started
installing the latest Digital Unix on the Alphas.
Installing the operating systems was accompanied by the normal
frustrations but went fairly smoothly. One by one we replaced the
operating systems with a clean trusted version. For the first
time all the machines were using the exact same operating system as
the others of their type.
Once the operating systems were installed, we compiled and
installed ssh and turned
off all the services and daemons that we could. On most of the
machines we turned off everything but ssh. The only user service
we left running was mail on one of our machines. This was so that as
we rebuilt everything the incoming mail would continue to spool to this
machine. Our users had no way to read this mail, but by letting it
spool to a machine we kept it from being lost or
returned to the sender as undeliverable, and kept our users from being dropped
from their mailing lists.
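On the Unixes of that era most network services were started by inetd, so turning a service off mostly meant commenting out its line in /etc/inetd.conf and telling inetd to reread the file. An illustrative fragment (the daemon paths vary by OS, so treat these as placeholders):

```
# /etc/inetd.conf -- comment out every service you can live without
#ftp     stream  tcp  nowait  root    /usr/sbin/ftpd      ftpd
#telnet  stream  tcp  nowait  root    /usr/sbin/telnetd   telnetd
#shell   stream  tcp  nowait  root    /usr/sbin/rshd      rshd
#login   stream  tcp  nowait  root    /usr/sbin/rlogind   rlogind
#finger  stream  tcp  nowait  nobody  /usr/sbin/fingerd   fingerd
```

A kill -HUP sent to the inetd process then makes it reread its configuration. Standalone daemons have to be disabled in the rc scripts as well.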
Next we ran tripwire and saved all the cryptographic
hashes onto the machine that was going to be our log host. The hashes
were also copied to a floppy to be taken off site. I then ran
tripwire on the hashes themselves and wrote down the MD5 hash of the
hash files. We were feeling a little paranoid. We would have
preferred to burn them to a CD or some other unwritable media but did
not have the equipment to do it.
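The idea behind that snapshot can be sketched with nothing but MD5; the following Python sketch illustrates the approach (the function names are mine, and this is not tripwire's actual interface):

```python
import hashlib

def md5_file(path):
    """MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def baseline(paths):
    """Map each path to its digest: the 'known good' snapshot."""
    return {p: md5_file(p) for p in paths}

def changed(paths, snapshot):
    """Paths whose current digest no longer matches the snapshot."""
    return [p for p in paths if md5_file(p) != snapshot.get(p)]
```

Tripwire does far more (it records permissions, ownership, sizes and several hash types), but the principle is the same: any later mismatch against the off-line snapshot means the file changed.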
As the machines' operating systems were installed and secured, we
started moving them into their new
racks. We wired them to power and to our internal network that we had
not yet connected to the Internet.
With only root allowed to log in, and then only with ssh, and with no
services running but ssh (with the
exception of sendmail running on one machine),
we felt ready to connect them to the net. We had configured ssh to
only allow connections that had a key placed in the authorized_keys
file on the server and had put in place new root passwords. We plugged
the Ethernet cable into the jack on the wall and within a half hour
had spotted attempted telnet connections from the machine the cracker
normally connected from.
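The ssh setup described above comes down to a handful of sshd_config directives. This fragment is illustrative rather than a copy of the site's actual file; the directive names are those of the ssh 1.x sshd of that period:

```
# sshd_config (illustrative fragment, not the site's actual file)
PermitRootLogin yes            # root may log in, but only with a key,
PasswordAuthentication no      # since passwords are off entirely
RSAAuthentication yes          # only keys listed in ~/.ssh/authorized_keys
RhostsAuthentication no        # no host-based trust between machines
RhostsRSAAuthentication no
PermitEmptyPasswords no
```

With passwords off entirely, there is nothing to guess; a connection without a matching key in authorized_keys is simply refused.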
We telnetted to several off-site machines and checked the network
connectivity at our new site and the DNS changes that had been made.
Then from a remote site I started a port scan on each of our machines
to make sure that we had not accidentally left any daemons running or
ports open. It showed only what we expected to be running.
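A scan like that can be sketched as a simple TCP connect scan; this Python sketch is a modern illustration of the technique, not the scanner that was actually used:

```python
import socket

def port_open(host, port, timeout=0.5):
    """TCP connect probe of one port: True if something accepts."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return True
    except (socket.timeout, OSError):
        return False
    finally:
        s.close()

def scan(host, ports):
    """Return the subset of ports that are open on host."""
    return [p for p in ports if port_open(host, p)]
```

For example, scan("www.example.org", range(1, 1024)) would return the open low ports; a real scanner parallelizes the probes and checks UDP services as well.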
It was now evening and we had all had enough for the day. I
went home, dialed in and checked that the key for my home machine
was working and that I could ssh into all of the machines, then
went to bed.
The next day I started on the hardest part of the rebuild: all of the
custom and open source software. The first pieces I worked on were the
Radius server and the pop server. We provided free dialup accounts,
and we wanted to get those people up and working as fast as
possible. Radius is a daemon that provides authentication for
our dialup servers.
Once I had radius and pop mail up, I started downloading and
compiling the source for most of the software we would need. I also
compiled the custom software that we needed to run. Some of it
was really hard to find and/or compile. I made steady progress,
but I still ended up working on it for days.
There were many other things to configure and install. At each step a
lot of care had to be taken to ensure that I was closing and not
opening any vulnerabilities. We replaced some of the software we had
been running with different packages that we felt were more secure.
For the most part these packages have been more secure than the
packages that we replaced. If you read bugtraq or other security
lists, you will probably
notice that many of the same packages show up again and again with exploits.
There were lots of little problems to
work out. For example, the menuing software acted differently than it
had with the earlier version of Digital Unix. Some of the problems
were caused by the effort to make things work just as they used to, in
order to reduce support calls after we started allowing logins again.
There was also a lot of work done to secure the systems. For
example, in Digital Unix there were hundreds of set user id
programs, most of which were owned by root. As this was the OS that was
going to have user logins, this
was a big potential problem. To reduce it I turned off all
the set user id bits and then turned back on only the ones that were needed.
We broke the set user id programs into several categories:
- Not needed
This was the great bulk of them. Somewhere around 250 applications.
Many of these applications have shown up on the bugtraq mailing list
with exploits attached. It is a really nice feeling to read them and
know that your system is safe.
- Needed by staff / volunteer accounts
There were a few of these. Examples are ping, traceroute, etc. We
made a policy that to be in the group that could run these you had to
only connect with ssh. We also kept membership in this group very small.
- Needed by everyone
There were two of these. mail and sendmail. It amazed me that out of
more than three hundred applications only two were needed by all our
users. It is also something to consider that there are other mail
packages that do not require set user id root to run, only set group
id in a mail group. If I were setting this up today, there would not
be any set user id programs owned by root that could be executed by ordinary users.
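The audit that produced those categories can be sketched as a walk over the filesystem; this is a hypothetical Python illustration of the technique (the original work would more likely have been done with the standard Unix find and chmod commands):

```python
import os
import stat

def find_setuid(root):
    """Walk a tree and return regular files with setuid or setgid set."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # file vanished or is unreadable
            if stat.S_ISREG(mode) and mode & (stat.S_ISUID | stat.S_ISGID):
                hits.append(path)
    return hits

def strip_setuid(path):
    """Clear the setuid and setgid bits on one file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_ISUID | stat.S_ISGID))
```

Each file the walk reports either gets its bits stripped or is consciously assigned to one of the categories above.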
There was a lot more work to do than I had expected, and it took
much longer than planned. I ended up taking
almost a week off from work before I was through. Others also contributed
many days of volunteer work.
After the first day we had our webserver back up and had used this to
notify our users about the situation. We also ended up talking to the
local press and admitting that we were down to recover from being cracked.
It was very educational for me to compare what a system looked like
when it was carefully planned on paper rather than just grown. Our
system reliability went up tenfold. The system
administration workload dropped
to a third of what it had been. The system was much easier to secure and
much simpler to understand.
However it is one thing for it to be better and easier and quite
another to get every hole. In the battle between the system
administrator and the cracker the system administrator has to find all
of the holes and the cracker only needs one. As we were soon to find out.