February 23, 2004

Easy SpamAssassin Tips That Work

A LawTech Guru feature article by Jeffrey Beard
(For reprint arrangements, please contact me.)

If you're using the popular SpamAssassin software to deal with spam, or perhaps considering its use, here are some firsthand tips written in plain English to improve its effectiveness:

SpamAssassin was included in the base monthly price of my web host provider, one of the deciding factors for choosing them. Between June and November, SpamAssassin did an incredibly accurate job of flagging spam with virtually no false positives (fewer than a dozen misflagged legit e-mails in 6 months). SpamAssassin does this by analyzing each e-mail for certain traits and assigning a differently weighted value to each trait found. It then adds up these values, and if the total reaches your chosen threshold, it flags the message as spam.
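To picture the scoring described above, here is a minimal Python sketch. The rule names and weights are hypothetical, made up purely for illustration; they are not SpamAssassin's real rules or scores. Only the default threshold of 5.0 comes from this article.

```python
# Hypothetical rule names and weights, for illustration only.
# They are NOT SpamAssassin's actual rule set or scores.
RULE_SCORES = {
    "SUBJECT_ALL_CAPS": 1.5,
    "MENTIONS_FREE_OFFER": 2.1,
    "FORGED_SENDER": 2.4,
}


def total_score(matched_rules, rule_scores=RULE_SCORES):
    """Add up the weight of every trait that was found in a message."""
    return sum(rule_scores[name] for name in matched_rules)


def is_spam(matched_rules, threshold=5.0):
    """Flag the message when its total reaches the chosen threshold."""
    return total_score(matched_rules) >= threshold


# Two traits stay under the threshold; a third pushes the total over it.
print(is_spam(["SUBJECT_ALL_CAPS", "MENTIONS_FREE_OFFER"]))
print(is_spam(["SUBJECT_ALL_CAPS", "MENTIONS_FREE_OFFER", "FORGED_SENDER"]))
```

The point of the sketch is that no single trait decides the outcome. A message is flagged only when enough weighted traits pile up.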

Since SpamAssassin had done a great job, I left the original default settings alone. In December, my experience changed dramatically. Suddenly, roughly half of incoming spam messages were scoring below SpamAssassin's default threshold of 5.0. Luckily I wasn't seeing any false positives (legit e-mail being moved into my Spam folder), but I had to wade through a lot of spam left in my regular Inbox. It appeared spammers were crafting messages that slipped under the radar of SpamAssassin's default settings. I didn't want to reduce the threshold score because some valid e-mail was scoring in the 4.x range. I'd rather err on the side of having some spam in my Inbox than filtering legitimate e-mails into my Spam folder. However, I missed reading several important messages in my Inbox because they were buried in the surrounding spam.

At first I chalked it up to the holidays -- spammers were going all out during the big spending season. But it didn't relent in January or February. That's when I decided to take things into my own hands. I called my host provider's tech support, which has been exceptional on technical matters. Surprisingly, both the first-level rep and the supervisor were pretty clueless about SpamAssassin, and suggested I head on over to SpamAssassin's web site for better documentation. I was disappointed there as well. Armed with the suspicion that there had to be more people using SpamAssassin with similar problems, I went a-Googling.

I quickly located information on enabling SpamAssassin's RBL checks (Realtime Blackhole List, a blacklist of servers used by spammers), as well as its Bayesian features for better spam identification and classification. I found it easy to do, and it took only 20 minutes. The immediate results over the past several days are very encouraging, although quite preliminary: out of more than 100 total spam messages received, all but five were properly identified as spam, and I had no false positives. That's a far cry from the 10-25 spams previously left in my Inbox each day.
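For the curious, DNS-based blacklists of this kind are conventionally queried by reversing the octets of the sending server's IP address and appending the blacklist's DNS zone. The Python sketch below only builds that lookup name; it does no network lookup and is not SpamAssassin's own code. The zone "dnsbl.example.org" is a placeholder, not a real blacklist.

```python
def dnsbl_query_name(ip_address, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    octets = ip_address.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("expected a dotted IPv4 address: %r" % ip_address)
    # The octets are reversed, then the blacklist zone is appended.
    return ".".join(reversed(octets)) + "." + zone


# "dnsbl.example.org" is a hypothetical zone used only for this example.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

If a lookup of that name returns an answer, the server is listed; if the name does not exist, it is not.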

Enabling SpamAssassin's RBL checks resulted in spam originating from known open relays (i.e., mail servers that allow spammers to send mail through them) receiving a substantially higher total score -- for example, 8.7 instead of 2.7. As mentioned above, anything scoring 5.0 and higher gets filtered into my Spam folder via a simple rule in my e-mail program. [Please Note: The corresponding risk with using RBL checks is that legitimate e-mail coming from blacklisted servers may be improperly flagged as spam because of this trait.]
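The filtering rule amounts to reading the score SpamAssassin stamps on each message and comparing it to 5.0. Here is a rough Python sketch of that logic. It assumes the score appears in an "X-Spam-Status" header as "hits=8.7" (or "score=8.7", which some versions use instead). The sample message and the rule name "EXAMPLE_RULE" are invented for the example.

```python
import re
from email import message_from_string

# Assumed header layout: "X-Spam-Status: Yes, hits=8.7 required=5.0 ...".
# Some versions label the number "score=" instead, so both are accepted.
SCORE_RE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the score from the X-Spam-Status header, or None if absent."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def target_folder(raw_message, threshold=5.0):
    """Mimic a mail rule: 5.0 and higher goes to Spam, the rest to the Inbox."""
    score = spam_score(raw_message)
    return "Spam" if score is not None and score >= threshold else "Inbox"


# Invented sample message; EXAMPLE_RULE is a placeholder rule name.
sample = (
    "From: someone@example.com\n"
    "Subject: hello\n"
    "X-Spam-Status: Yes, hits=8.7 required=5.0 tests=EXAMPLE_RULE\n"
    "\n"
    "body text\n"
)
print(target_folder(sample))
```

A message with no score header falls through to the Inbox, which matches my preference for erring on the side of letting mail through.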

So now you know the "Why" and my preliminary results. Here is the "How" for making desired changes, and it's not difficult:

At Lunarpages.com, I have two easy ways of changing my SpamAssassin user settings. The first is using their web-based Control Panel, under Mail, then under SpamAssassin. The other is adding the desired changes to the text-based "user_prefs" file and uploading it to my server via FTP. The catch: either method requires one to understand the settings, their syntax, and the best way to select them.

That's where the SpamAssassin Configuration Generator site came in very handy. My web server is running SpamAssassin version 2.63, and the SA Config Generator site works with versions 2.5x and above. As the site states, "This tool is designed to make it easier to customize an installation of SpamAssassin with some common options. After you answer the questions below, a SpamAssassin configuration file matching your choices will be displayed, and you can download it and use it with your SpamAssassin installation." The best part is that it not only lists some of the most useful SA features and their options, but actually explains what each setting does.

I entered my choices into the web form, and it generated the following SpamAssassin setting file for me:

# SpamAssassin config file for version 2.5x
# generated by http://www.yrex.com/spam/spamconfig.php (version 1.01)

# How many hits before a message is considered spam.
required_hits 5.0

# Whether to change the subject of suspected spam
rewrite_subject 0

# Text to prepend to subject if rewrite_subject is used
subject_tag *****SPAM*****

# Encapsulate spam in an attachment
report_safe 1

# Use terse version of the spam report
use_terse_report 0

# Enable the Bayes system
use_bayes 1

# Enable Bayes auto-learning
auto_learn 1

# Enable or disable network checks
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
# - english
ok_languages en

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales en

The big changes above were the "skip_rbl_checks 0" to enable RBL checking (don't you just love double-negative syntax?), and the two Bayes settings.
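If the double negative is hard to keep straight, a few lines of Python can spell it out. This is only a sketch for reading a settings file in the simple "name value" layout shown above. The function names are mine, and it does not reflect how SpamAssassin itself parses its configuration.

```python
def parse_prefs(text):
    """Turn 'name value' lines into a dict, skipping blanks and # comments."""
    prefs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        prefs[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return prefs


def rbl_checks_enabled(prefs):
    # "skip_rbl_checks 0" means "do NOT skip", i.e. RBL checks are on.
    # Only an explicit 0 counts here; what a missing setting means
    # depends on the installation's own defaults.
    return prefs.get("skip_rbl_checks") == "0"


def bayes_enabled(prefs):
    return prefs.get("use_bayes") == "1"


settings = parse_prefs(
    "# Enable or disable network checks\n"
    "skip_rbl_checks 0\n"
    "use_bayes 1\n"
    "auto_learn 1\n"
)
print(rbl_checks_enabled(settings), bayes_enabled(settings))
```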

After that, I downloaded the original default "user_prefs" file from my web server via FTP so I could edit it. Windows Notepad, while primitive, is more than sufficient for the quick copy/paste task. If you want a more full-featured text editor, then I strongly recommend TextPad. I retained all the original text for future reference (commented out by preceding "#" characters), pasted the above text into the bottom of the file, and saved it. It was then uploaded via FTP to replace the original.
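The same copy/paste job could be scripted. The sketch below is one way to do it in Python, under the assumption that the file is plain text with "#" comments. It comments out every active line of the original file and appends the new settings at the bottom. The function name and the sample text are hypothetical.

```python
def merge_user_prefs(original_text, new_settings_text):
    """Comment out the original settings and append the new ones at the end."""
    kept = []
    for line in original_text.splitlines():
        if line.strip() and not line.lstrip().startswith("#"):
            # Active setting: keep it for reference, but disable it.
            kept.append("# " + line)
        else:
            # Blank lines and existing comments are left untouched.
            kept.append(line)
    return "\n".join(kept) + "\n\n" + new_settings_text.strip() + "\n"


# Hypothetical stand-in for the downloaded file and the generated settings.
original = "required_hits 5.0\n# note\n\nskip_rbl_checks 1\n"
generated = "skip_rbl_checks 0\nuse_bayes 1\n"
print(merge_user_prefs(original, generated))
```

To use it on a real file, read the downloaded "user_prefs" into a string, pass it in along with the generated settings, and write the result back out before uploading.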

To double-check the settings actually changed, I went into the web-based SpamAssassin Control Panel, and sure enough, all of the new settings were displayed. Alternatively, I could have manually entered the above settings into LunarPages' web-based Control Panel and skipped the FTP file transfer. If you are running some type of SpamAssassin plugin program locally on your PC instead of a web server, odds are that the text-based settings file is stored on your local hard drive.

Lastly, I expect everyone's mileage will vary, as we all have a different mix of e-mail messages. I also plan to monitor the true effectiveness of these setting changes over a longer period. However, it was quite empowering to be able to combat spam on my own terms and see immediate results. While somewhat cryptic at first, the SpamAssassin software was fairly easy to tweak with a little self-help. Perhaps best of all, I didn't have to go purchase one of the many commercial anti-spam packages or services, as it was already included in my low monthly web host fee.

I prefer using SpamAssassin because frankly, I've never liked the various "whitelist" spam services. Why should I make friends and business colleagues jump through confirmation hoops when the problem is on my end? Not exactly my idea of customer service. Likewise, there will always be some people who won't perform the confirmation process, so their e-mail would be blocked from reaching me. So I prefer to let spam through as long as it's flagged and managed appropriately. I'm also dramatically increasing the odds that I will see the important messages that were previously buried amongst the flotsam.

As a parting tip, if you're looking for a good free FTP program without included adware, then I heartily recommend LeechFTP, which has many features and has worked extremely well for me.

Topic(s):   Feature Articles  |  Privacy & Security
Posted by Jeff Beard
Comments

Mr. Beard has once again demonstrated his knowledge, experience and tenacity in solving a mysterious area of the Internet.

Thanks for providing this advice. I need it and will implement it ASAP.

Thanks.

Posted by: J. Larry Green at April 7, 2004 09:18 AM