If you examine your logs, you will occasionally find that your site is being hammered by one or two visitors. These are usually spiders (automated robots that crawl web sites for various reasons), although sometimes they are malicious users or hackers.

Spiders are generally looking for something specific. Some harvest email addresses so spam lists can be built (see “SPAM – email spiders”), and others perform various research tasks. Most, however, belong to search engines such as AltaVista, or to services you’ve set up yourself, such as the Atomz site search.

Hackers tend to target pay sites, especially pornographic ones, and highly visible sites such as Yahoo or Amazon. They are usually probing for vulnerabilities they can exploit.

By examining your logs regularly you will spot this kind of traffic, and you can then decide whether it is valuable or desired. This matters especially if you are paying for your bandwidth, or if your ISP has warned you that your traffic is too high.

What can you do if you decide that a spider or visitor is not desirable? If it is a spider, you can exclude it in your robots.txt file, although this is not always effective, since many spiders (especially email harvesters) simply ignore that standard. It is, however, a good place to start your exclusion process.
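For example, an entry like the following asks one particular spider to stay out of the entire site. The user-agent name “BadBot” here is just a placeholder; substitute the name the spider reports in your logs.

# robots.txt — ask one misbehaving spider (placeholder name "BadBot") to stay away
User-agent: BadBot
Disallow: /

Well-behaved spiders will fetch this file and honor the rule; the ill-behaved ones are exactly the ones that won’t, which is why the next technique is often necessary.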

Another option, if you have access to your .htaccess file, is to deny the intruders access to your site outright.

So, edit your .htaccess file and add some deny clauses, as shown in the example below.

<Limit GET POST>
# Evaluate the allow directives first, then the deny directives;
# a matching deny takes precedence
order allow,deny
# Admit everyone by default...
allow from all
# ...except this specific IP address
deny from 210.165.39.212
</Limit>

What these lines accomplish is to allow all users to access the site except those coming from the specified IP address: Apache checks the allow directives first, then the deny directives, and a matching deny wins. You can also deny by domain name in the form “.domain.com”.

There is no limit to the number of IP addresses and domains that may be included; just put each one on its own line, as in the sketch below.
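For instance, a block like this one denies several visitors at once. The addresses and domain shown are made-up placeholders (192.0.2.55 is from the reserved documentation range); substitute the ones you find in your logs.

<Limit GET POST>
order allow,deny
allow from all
# Placeholder addresses and domain -- substitute the ones from your logs
deny from 210.165.39.212
deny from 192.0.2.55
deny from .badcrawler.example.com
</Limit>

One caution: denying by domain name forces Apache to do a reverse DNS lookup on each request to see whether the visitor’s host name matches, which can slow the server down. Denying by IP address avoids that cost.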