# RootPrompt.org   Nothing but Unix.


 Feature: Cracked! part 5: Rebuilding

This is the fifth part of the story of a community network that was cracked and what was done to recover from it.

This article covers the rebuilding of the system to recover from the crack and fix some long standing problems.

Future articles detail their conversations with the cracker on IRC, the hole they missed and the cracker's revenge.

 (Submitted by Noel Mon Jun 12, 2000 )

  

Cracked


Part 5

Rebuilding


This is the fifth part of the story of a community network that was cracked and what was done to recover from it. The first part, Cracked! Part 1: Denial and truth, details the report that led to the discovery that the community network was indeed cracked and some of the initial reactions. The second article, Cracked! Part 2: Watching and Waiting, talks about how they learned more about the cracker and what they did next. The third, Cracked! Part 3: Hunting the hunter, talks about some of the efforts made to track down the cracker and some surprises. The fourth, Cracked! Part 4: The Sniffer, tells how they found the sniffer that the cracker was running on their network and what they did next. This article covers the rebuilding of the system to recover from the crack and fix some long standing problems. Future articles detail their conversations with the cracker on IRC, the hole they missed and the cracker's revenge.

By this point we have realized that we must get the cracker off of our machines before it is too late. It is only a matter of time before he trashes our system to clean up his tracks, gets a sniffer running under a different architecture or uses us to launch some denial of service attack. The FBI has not delivered anything, even though they always sound positive about the situation whenever we talk to them. The next step is up to us.

There are pros and cons involved in attempting a rebuild. First, on the pro side, we will have the opportunity to set things up in a planned manner, without the randomness caused by the equipment slowly coming in as it was donated. We will be able to get all of the machines running a newer version of their operating systems, and the machines of the same architecture will be running the same versions of the operating system. On the negative side it will be a lot of work (remember I was a volunteer), more than I can do in my spare time, so I will have to take time off from work. Other people will also have to take off work. There is the chance that we will do all of this and the cracker will still be able to crack us, bringing us most of the way back to where we started from, except that he will know that we know about him. There is also the chance that we will tip our hand and the cracker will destroy things to cover up his tracks before we can lock him out.

From these factors we decided that we had to do the rebuild all at once. We had to make sure that we cleaned up all the machines at the same time and that we did not let them back on the net until we were as sure as we could be that the cracker could not get back on. We decided to make the changes in about a month, when we had scheduled to move the equipment from its original site to a new location across town.

We also decided to not tell the users any more than we had to. We told them that we had to move the equipment and did not tell them that we had been cracked and were going to have to do all the other work needed to rebuild the system. By doing this we planned to keep the cracker in the dark and prevent him from destroying the system to cover his tracks. The downside to this was that we told the users that we were going to be offline for a day or two when we knew that it was going to be more like a week.

Before we did anything we needed to figure out what we wanted to do and how we were going to do it. We knew that without a plan we were not going to get anything done.

We made the following goals for the redesign and securing of our systems:

  • Make the system more reliable and make no machine depend on another:

    One of our major problems with our original setup was that the system was very unreliable due to the convoluted and interdependent configuration of our machines. When one machine crashed it would take down the others. This was in many ways worse than our having been cracked. We were determined to eliminate or reduce this interdependence.

    We also did not want the machines to trust one another any more than was absolutely essential. If one of our machines was cracked we did not want the cracker to easily leverage this into all of them being cracked.

  • Restrict logins to one box:

    Our users could log in to all but one of our machines. This caused more admin work keeping user software such as pine, elm, inn, etc. up to date on multiple platforms. In a commercial environment this might not have been a problem, but it caused a problem for our volunteer system admins. It also increased our vulnerability to system cracks due to the number of OS versions and software versions that we had to keep an eye on. We wanted to improve this by simplifying things and allowing user logins to only one of the alphas.

  • Limit what set user id and set group id programs that users can run:

    Every set user or group id program that a user can run is one that may have a vulnerability in it. The less they can run, the less they can abuse. As an example, Digital Unix had over three hundred set user and set group id programs and only a handful of these had man pages. Many of these were very useful when running the Alpha as a desktop machine but were only a trap waiting to happen on a server.

  • Users can not run their own software:

    Users will run the most surprising things. The cracker was running a user space port redirector and a file transfer utility that left no records in the logs. There are many such things: denial of service programs, software that allows you to execute commands on the machine without logging in or leaving any trace, and tools that help you exploit vulnerabilities and race conditions. We also had a steady supply of users running boneheaded scripts that would suck up more than their share of system resources. We wanted to stop all of this on our machines.

  • Have server software run on one machine:

    In our original configuration we had many services running on many different machines and we wanted to reduce that as much as we could. For example we wanted only one news server, one IRC server, one mail server, etc. Running server software on multiple hosts made us more vulnerable to attack and increased the system admin load.

  • Use a log host:

    One of the problems with being cracked is that once someone has root on your machine you can no longer trust its logs. By sending all the syslog messages to a log host we would be more likely to have reliable logs. Of course you must pay special attention to securing it, because if the log host is cracked then you can no longer trust its logs and you are back where you started.

    Also having the logs all in one place would allow us to monitor things much more closely. We also planned a system to parse the logs and pull out events that we wanted to monitor.

  • Run tripwire on the newly installed machines and generate a baseline for the installation:

    After we were cracked we had no way to know what had been changed on our systems. By running tripwire we could capture a cryptographic snapshot of what applications were on our system and be able to tell what had been changed at a later time.
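As an illustration of the log parsing mentioned under the log host goal, here is a minimal Python sketch. It is not the system that was actually built. The pattern names, the patterns themselves and the sample log lines are all assumptions invented for the example, and a real site would tune them to the daemons it runs.

```python
import re

# Hypothetical patterns for events worth watching. These are assumptions
# for illustration, not the rules the original administrators used.
WATCH_PATTERNS = {
    "failed_login": re.compile(r"(failed|invalid) (password|login)", re.IGNORECASE),
    "su_attempt": re.compile(r"\bsu(\[\d+\])?:"),
    "refused_connect": re.compile(r"refused connect", re.IGNORECASE),
}


def scan_log_lines(lines):
    """Return (label, line) pairs for every line matching a watched pattern."""
    hits = []
    for line in lines:
        for label, pattern in WATCH_PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break  # report each line once, under the first matching label
    return hits


if __name__ == "__main__":
    # Invented sample lines in a syslog-like layout.
    sample = [
        "Jun 10 02:14:07 alpha1 login: failed login for guest\n",
        "Jun 10 02:16:00 alpha1 sendmail[123]: message accepted\n",
        "Jun 10 02:15:40 alpha1 su: root attempt by guest\n",
    ]
    for label, line in scan_log_lines(sample):
        print(f"{label}: {line}")
```

Running something like this on the log host rather than on the monitored machines keeps the filter out of reach of anyone who has only cracked one of the other boxes.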

Once these goals had been established we built a planning document that detailed what the system would look like when we finished and what steps were needed along the way. The plan was then distributed to all the system administrators who each provided feedback to improve it. By doing this and using this plan we could look at each piece individually and improve it while still keeping in mind what the final system would look like. It also turned out to be invaluable during the work because everyone involved understood the plan and knew what needed to be done.

We placed announcements on our web pages and in our system messages informing our users that we were moving the equipment in a few weeks and even made an announcement to the local newspaper. As was planned, we made no mention that we had been cracked or that the outage was going to be longer than the weekend that we had announced that we were going to be unavailable.

We lined up all of the people that needed to be involved. Not just the system administrators but also people who would help pack up the equipment, move it across town and set it back up. Some of these volunteers drove hundreds of miles to be there and help. All of them did much more than their share and went the extra mile.

One of the other goals of this move was to physically organize the equipment. The equipment as it had been acquired and as it was arranged in the old location was quite a mess. There were lots of external SCSI cabinets containing hard drives and tape drives. These had been placed where they would stack, without regard to what system they were attached to or any other rhyme or reason. We had received some donated racks with shelves and some other people had created a detailed plan to physically restructure the equipment so that all of the components of each machine were together and would be much easier to work on.

The day of the move came without any new surprises. We had worked long and hard refining the plan and had made many improvements. Some of us had driven from far away to be there and help. I felt like we were ready.

Before driving to the office that the equipment was in I logged in and started one last backup. By the time I arrived the backup had finished, so I shut down the systems and we were ready to move them to their new home across town.

Our crew packed up the equipment and drove it across town to the new location. As with any move of computer equipment especially older boxes we had a lot of concerns that the drives would not spin up or that something would be damaged during the move. But in this move we were lucky and everything spun back up.

When we arrived with the equipment at the new site one crew of about four to five people started setting up the racks, bolting them together, cutting cable holes through them and building a very nice custom rack for the system. All of the system admins, including myself, started setting up the systems on the floor in preparation for reinstalling them. We had one system admin there who was an expert on the IBM RS6000s, so he started working on installing a new release of AIX. The rest of us started working on installing the latest Digital Unix on the alphas.

Installing the operating systems was accompanied by the normal frustrations but went fairly smoothly. One by one we replaced the operating systems with a clean trusted version. For the first time all the machines were using the exact same operating system as the others of their type.

Once the operating systems were installed we went in, compiled and installed ssh, and turned off all the services and daemons that we could. On most of the machines we turned off everything but ssh. The only user service we left running was mail on one of our machines. This was so that as we rebuilt everything the incoming mail would continue to spool to this machine. Our users had no way to read this mail, but by letting it spool to a machine we kept them from losing it, kept it from being returned to the sender as undeliverable, and kept them from being dropped from their mailing lists.

Next we ran tripwire and saved all the cryptographic hashes onto the machine that was going to be our log host. The hashes were also copied to a floppy to be taken off site. I then ran tripwire on the hashes themselves and wrote down the MD5 hash of the hash files. We were feeling a little paranoid. We would have preferred to burn them to a CD or some other unwritable media but did not have the equipment to do it.
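The idea behind a baseline like this can be sketched in a few lines of Python. This is only an illustration of the technique, not tripwire itself and not what was run on these machines. It assumes SHA-256 in place of the MD5 mentioned above, and the function names are made up for the example.

```python
import hashlib
import os


def hash_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root):
    """Map each regular file under root (by relative path) to its digest."""
    baseline = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files; hash only regular files.
            if os.path.isfile(path) and not os.path.islink(path):
                baseline[os.path.relpath(path, root)] = hash_file(path)
    return baseline


def compare_baselines(old, new):
    """Report which paths were added, removed or changed since the baseline."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "changed": changed}
```

A baseline built right after a clean install, and stored somewhere the monitored machine cannot write to, can later be compared against a fresh run with `compare_baselines` to see what has changed. A real tool also records permissions, ownership and timestamps, which this sketch ignores.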

As the machines' operating systems were installed and secured we started moving them into their new racks. We wired them to power and to our internal network that we had not yet connected to the Internet.

With only root allowed to log in, and then only with ssh, and with no services running but ssh (with the exception of sendmail on one machine), we felt ready to connect them to the net. We had configured ssh to only allow connections that had a key placed in the authorized_keys file on the server and had put new root passwords in place. We plugged the Ethernet cable into the jack on the wall and within a half hour had spotted attempted telnet connections from the machine the cracker normally connected from.

We telnetted to several off site machines and checked the network connectivity in our new site and the DNS changes that had been made. Then from a remote site I started a port scan on each of our machines to make sure that we had not accidentally left any daemons running or ports open. It only showed what we expected to be running.
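For readers who have not done this kind of check, a TCP connect scan is simple enough to sketch in Python. This is an illustration only and not the scanner that was used here. The address, the port list and the set of expected ports in the example are assumptions, and you should only scan machines you administer.

```python
import socket


def scan_ports(host, ports, timeout=0.5):
    """Try a TCP connect to each port and return the ones that accept."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success and an error code otherwise.
            if sock.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports


def unexpected_ports(open_ports, expected):
    """Return the open ports that are not on the list we meant to leave on."""
    return sorted(set(open_ports) - set(expected))


if __name__ == "__main__":
    # Hypothetical check: only ssh (22) should answer on this host.
    found = scan_ports("127.0.0.1", [21, 22, 23, 25, 80])
    print("open:", found)
    print("unexpected:", unexpected_ports(found, expected={22}))
```

A connect scan only sees TCP services that complete a handshake, so it says nothing about UDP daemons. An empty "unexpected" list is a sanity check, not proof that nothing else is listening.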

It was now evening and we had all had enough for the day. I went home and dialed in and checked that my key for my home machine was working and that I could ssh into all of the machines and then went to bed.

The next day I started on the hardest part of rebuilding, all of the custom and open source software. The first pieces I worked on were the Radius server and the pop server. We provided free dialup accounts and we wanted to get those people up and working as fast as possible. Radius is a daemon that provides authentication for our dial up servers.

Once I had radius and pop mail up I started downloading and compiling the source for most of the software we would need. I also compiled the custom software that we needed to run. Some of the stuff was really hard to find and/or compile. I made steady progress but still ended up working on it for days.

There were many other things to configure and install. At each step a lot of care had to be taken to ensure that I was closing and not opening any vulnerabilities. We replaced some of the software we had been running with different packages that we felt were more secure. For the most part these packages have been more secure than the packages that we replaced. If you read bugtraq or other security lists you will probably notice that many of the same packages show up again and again with security problems.

There were lots of little problems to work out. For example the menuing software acted differently than it had with an earlier version of Digital Unix. Some of the problems were caused by our effort to make things work just like they used to, in order to reduce support calls after we started allowing logins again.

There was also a lot of work done to secure the systems. For example, in Digital Unix there were hundreds of set user id programs, most of which were owned by root. As this was the OS that was going to have user logins this was a big potential problem. To reduce this problem I turned off all the set user id bits and then only turned back on the ones that were needed.

We broke the set user id programs into several categories:

  • Not needed

    This was the great bulk of them. Somewhere around 250 applications. Many of these applications have shown up on the bugtraq mailing list with exploits attached. It is a really nice feeling to read them and know that your system is safe.

  • Needed by staff / volunteer accounts

    There were a few of these. Examples are ping, traceroute, etc. We made a policy that to be in the group that could run these you had to only connect with ssh. We also kept membership in this group very small.

  • Needed by everyone

    There were two of these: mail and sendmail. It amazed me that out of more than three hundred applications only two were needed by all our users. It is also something to consider that there are other mail packages that do not require set user id root to run, only set group id in a mail group. If I were setting this up today there would not be any set user id programs owned by root that could be executed by normal users.
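The audit described above can be sketched in Python as follows. This is a hypothetical illustration, not the procedure that was run on the Digital Unix machines. The function names and the `keep` allowlist are assumptions, and instead of clearing every bit and re-enabling the needed ones it clears the bits on everything that is not on the allowlist, which ends in the same state.

```python
import os
import stat

SETID_BITS = stat.S_ISUID | stat.S_ISGID


def find_setid_files(root):
    """Return regular files under root with the setuid or setgid bit set."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)  # lstat so symlinks are not followed
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(info.st_mode) and info.st_mode & SETID_BITS:
                found.append(path)
    return sorted(found)


def strip_setid(path):
    """Clear the setuid and setgid bits, leaving other permissions alone."""
    mode = stat.S_IMODE(os.lstat(path).st_mode)
    os.chmod(path, mode & ~SETID_BITS)


def lock_down(root, keep):
    """Strip set id bits from every file not in keep; return what was changed."""
    stripped = []
    for path in find_setid_files(root):
        if path not in keep:
            strip_setid(path)
            stripped.append(path)
    return stripped
```

Running `find_setid_files` on its own first and reading the list is the safer way to start, since stripping the bit from a program that something depends on will break it. On a real system `lock_down` needs root, and the `keep` set would hold the handful of programs from the last two categories above.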

There was a lot more work to do than I had really expected. I had been working on it much longer than I had expected. I ended up taking almost a week off from work before I was through. Others also contributed many days of volunteer work.

After the first day we had our webserver back up and had used this to notify our users about the situation. We also ended up talking to the local press and admitting that we were down to recover from being cracked.

It was very educational for me to compare what a system looked like when it was carefully planned on paper rather than just grown. Our system reliability went up tenfold. The system administration work went down to a third or less. The system was much easier to secure and much simpler to understand.

However, it is one thing for it to be better and easier and quite another to get every hole. In the battle between the system administrator and the cracker the system administrator has to find all of the holes and the cracker only needs one. As we were soon to find out.



Copyright 1999-2005 Noel Davis. Noel also runs web sites about sailing and kayaking.
All trademarks are the property of their owners.
All articles are owned by their author