Docs/SysAdmin/Server/Mail/Mailfilter
From Mandriva Community Wiki
A lot of us are bothered by spam these days, to put it mildly. This tutorial explains how to set up mail filtering in order to get rid of spam.
Conventions Used in this Tutorial
mail client is what you use to read and send e-mail, like KMail, Mozilla-mail, etc.
localuser is your username on the local machine, i.e. who you log in as.
username is the username you have for logging in to your POP server.
password is the password you have for logging in to your POP server.
Assumptions
- You should know how to edit configuration files in a text editor either from the GUI or the command line in a terminal.
- You should have your software sources configured for contrib.
What is spam?
"Spam" is also known as "junk mail" or "unsolicited commercial e-mail". Most of it consists of sex or "earn money fast" advertisements.
How to deal with spam
You can delete the spam yourself as you read it, but you can save time by setting up a mail filter that classifies your e-mail for you. The following strategies can be applied:
- Text analysis. Simple hand-written rules are applied, e.g. all mail containing "penis enlargement" in the title is classified as spam. This is the least effective method.
- Naive Bayesian rule. This is a naive but effective classification based on probabilities. Given two training data sets (one containing "good" mail, the other containing "junk" mail), the classifier tries to guess which category an incoming mail belongs to. It is the same strategy as the previous one, except that the rules are created by the system itself from the training sets, which makes it much more effective. The major drawback is the risk of "false positive" classification: the system is not 100% reliable, and some good messages can be classified as "junk".
- Blacklists and collaborative spam-tracking databases. By definition, a spammer spams many people. The idea here is that one of the victims warns the others about the spam he or she has just received. This collaborative strategy is quite effective (and there are virtually no "false positives"). http://www.mail-abuse.org and http://www.ordb.org provide blacklists. http://razor.sf.net/ is a collaborative spam-tracking database.
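As a rough illustration of the Naive Bayesian strategy listed above, the probability that a message is spam given that it contains a word follows from Bayes' rule. The notation here (S for spam, H for good mail, w for a word) is an assumption made for this sketch and is not taken from any particular filter; real filters differ in how they estimate and combine these probabilities.

```latex
P(S \mid w) = \frac{P(w \mid S)\,P(S)}{P(w \mid S)\,P(S) + P(w \mid H)\,P(H)}

% "Naive" means the words w_1, ..., w_n are treated as independent:
P(S \mid w_1, \dots, w_n) \propto P(S) \prod_{i=1}^{n} P(w_i \mid S)
```

With made-up numbers: if a word appears in 40% of the junk training mail and 0.5% of the good training mail, and both categories are equally likely beforehand, then P(S | w) = 0.2 / (0.2 + 0.0025), or about 0.988. The remaining 1.2% is where false positives come from.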
Mail client workaround
The simplest solution is to use a mail client that can filter mail by itself. Mozilla-mail is one example:
Mozilla-mail
In the menu choose Tools > Junk mail control. Select the account you want to apply the spam filter to, and check Enable junk mail controls. This mail filter offers you the following options:
- Do not mark messages as junk mail if the sender is in my address book. This is definitely an effective rule to follow. It will prevent many cases of "false positive" classification.
- Move incoming messages determined to be junk mail to. Of course, you want to get rid of your junk mail, but you probably don't want it deleted without being notified. Mozilla actually moves the junk to another folder (by default "Junk", but you can choose any other folder, even one from another account). You should have a look from time to time to recover any "false positives".
At the beginning, Mozilla won't classify anything for you. This is because Mozilla doesn't know yet what junk mail looks like: you have to teach the classifier. To do this, simply press the "junk" button whenever you come across a spam message.
Fetchmail + Mailfilter
As one solution, I would like to introduce you to Mailfilter (http://mailfilter.sourceforge.net/), which applies simple rules.
This is a very lightweight and easy to use solution for people who have relatively simple mail setups. What I mean by that is, you are using a simple mail client to grab your POP mail, not processing large volumes of mail from various locations, or using Hotmail or other web based e-mail.
Your mail client will no longer connect directly with the POP server, but will snag your mail from your local machine (i.e. /var/spool/mail/localuser). And you will retrieve your mail from your provider to your local machine with fetchmail. It's no big deal, you won't even notice the difference, except of course that you are receiving less spam, or, eventually, none at all! (RegisDecamps - 31 May 2003: I don't agree with that: It makes a difference if you are using Imap to read your mails)
Mailfilter
Log in as root and run urpmi mailfilter.
Essentially, what Mailfilter does is analyse the mail on your POP server for certain characteristics in the headers, such as "From", "To", "cc", "Subject", and the like. It checks those characteristics against the rules in your config file and deletes what you have defined as spam. For example, you might have an entry in your config file like this:
DENY = "^Subject:.*VIAGRA"
so that any mail that has the word "viagra" in the subject line will be classified as spam. One really great thing about mailfilter is that it comes with an example config file, so that you can see the basic syntax to use, called Regular Expressions (or regexp for short). If you are already familiar with regexp, then you will be up and running in no time. If not, enough examples are provided in the supplied config file to get you started pretty much right away.
But let's say your friend sends you an e-mail with the subject "Viagra". Why one would do that, I have no idea, but there is a way to make sure you don't miss your friend's e-mail about how much fun Viagra is, and it's an ALLOW rule. For example, let's say your friend's e-mail is [email protected] (this is starting to sound like a pretty strange person, but hey, we've all got strange friends, right?). Well, you just put an entry in your config like this:
ALLOW = "^From:.*[email protected]"
and your friend's e-mail will not get deleted.
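To get a feel for how a DENY rule and an ALLOW rule interact, the following is a minimal sketch that imitates the behaviour described above with grep. It is not Mailfilter's own code: the function name classify_headers and the address friend@example.com are hypothetical, and the case-insensitive matching (grep -i) is an assumption based on the statement that "viagra" matches the VIAGRA pattern.

```shell
#!/bin/sh
# Sketch only: imitates "ALLOW overrides DENY" on a block of mail headers.
# The patterns and the address are illustrative assumptions.
ALLOW_PATTERN='^From:.*friend@example\.com'
DENY_PATTERN='^Subject:.*VIAGRA'

classify_headers() {
    # $1: the message headers, one header per line
    if printf '%s\n' "$1" | grep -Eiq "$ALLOW_PATTERN"; then
        echo "keep"      # an ALLOW match wins over any DENY match
    elif printf '%s\n' "$1" | grep -Eiq "$DENY_PATTERN"; then
        echo "delete"    # only a DENY rule matched
    else
        echo "keep"      # no rule matched
    fi
}

classify_headers 'From: spammer@example.net
Subject: Cheap viagra here'
classify_headers 'From: friend@example.com
Subject: Viagra'
```

The first call prints delete and the second prints keep, which is the outcome the two rules above are meant to produce.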
Just a simple setup like that is going to put a serious dent in the amount of spam you receive, but as you go along, you can get more creative with your rules, and I will update this page with some of the rules that I have picked up on the way. Of course you should read the docs provided on the site, particularly the FAQ, but I would recommend joining the mailing list as well, so if something isn't covered in the docs or you are not clear, you can ask informed questions on the list. There are several people, including the very friendly and helpful developer, Andreas Bauer, who are usually pretty quick to respond, although don't be offended if they just point you back to the documentation.
Fetchmail
To configure your mail client, what you are going to do is go into the options for your mail account, and in the dialogue box for "Servers", you will replace *pop3.ispname.com* with your local mail spool. As I mentioned above it should be /var/spool/mail/localuser. The way this is done is going to vary from mail client to mail client. Click on Help in Mozilla, KMail, Sylpheed, or whatever you use if you are not sure how to make this change.
Next, you need to set up fetchmail. Normally, fetchmail does not run automatically when your machine boots, so you have two choices here: you can enable fetchmail to run at system boot or when you log in. Personally, I chose to have it run when I log in, just because it is a little easier to configure for newbies like us! What you do is edit your .bashrc, located in your home directory. You could simply add the line "exec fetchmail", but the problem there is, what if it is already running for some reason? So I was lucky enough for someone to provide me with this little bit of script you can paste into your .bashrc:
#exec on startup
ps ax > ~/tmp/bashterm
grep fetchmail ~/tmp/bashterm > /dev/null
if [ $? -ne 0 ] ; then
    fetchmail
fi
What this does is check to see if fetchmail is already running, and if not, execute it. You should have a ~/tmp directory by default, and this script will create a file called bashterm in there to do its business. Now, of course, you may want to kill fetchmail when you log out, perhaps so as not to interfere with another user. In that case, you will edit another file in your home directory, .bash_logout. This one is a lot simpler, because all it does is use the -q switch from fetchmail to kill it. In your .bash_logout file, add these lines:
fetchmail -q
clear
All of this is according to personal preferences, of course, but I can assure you that the way I have it set up works fine for me.
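If you would rather not write a temporary file from your .bashrc, the same "start it only if it is not running" check can be sketched with pgrep. This is an alternative to the script above, not the method it uses, and it assumes the pgrep utility (from the procps package) is installed. The function names is_running and start_if_not_running are hypothetical.

```shell
# Sketch only: assumes pgrep is available on the system.
is_running() {
    # -x matches the process name exactly; -u limits the search to this user
    pgrep -x -u "$(id -u)" "$1" > /dev/null
}

start_if_not_running() {
    # $1 is both the process name to look for and the command to start
    if ! is_running "$1"; then
        "$@"
    fi
}

# Do nothing at all if fetchmail is not installed
if command -v fetchmail > /dev/null; then
    start_if_not_running fetchmail
fi
```

Matching the exact process name avoids a weakness of grepping the full ps output, where any command line that merely contains the word "fetchmail" counts as a match.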
Now on to configuring fetchmail: You can use the fetchmailconf program to create the configuration file, which you can run from the command line, or you could create one yourself, called .fetchmailrc, in your home directory. It should look something like this:
# Configuration file for fetchmail
set postmaster "localuser"
set bouncemail
set no spambounce
set properties ""
set daemon (time in seconds between checking your POP server)
poll pop3.ispname.com with proto POP3
    user 'username' there with password 'password' is 'username' 'localuser' here options nokeep/keep
    preconnect "mailfilter"
The first thing to notice is right at the end, where it says preconnect "mailfilter". This is where mailfilter is "called" to analyse the mail on the server.
You will see that after options, I put nokeep/keep. You have to choose one or the other. With nokeep, fetchmail deletes the mail from your POP server once it has retrieved it; with keep, it leaves a copy on the server. Just keep these two things in mind. If you choose keep and your mail piles up on the POP server, your ISP could get angry at you and take various actions, such as not letting any more mail through. If you choose nokeep, the only copy is the one on your local machine, so it is a good idea to archive your mail once your mail client has retrieved it. The only reason to use keep that I have run into is a shared account, say for a husband and wife or a whole family, where you don't use the same computer to retrieve the mail, or you use separate accounts on the same computer. In a case like this, it is best to put some thought into how you are going to run fetchmail before proceeding. I will attempt in later stages of this tutorial to give some guidance and examples for you to consider.
Of course, if you check more than one POP account, you can simply add more entries to the .fetchmailrc, in the same format, except, as explained in the mailfilter docs, you only have to call mailfilter once.
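Because .fetchmailrc stores your POP password in plain text, it is worth making the file readable by its owner only; fetchmail is generally strict about this and may refuse to start if the file is accessible to other users. A minimal sketch follows, where the function name secure_rcfile is hypothetical:

```shell
# Sketch only: restrict the run control file to its owner.
secure_rcfile() {
    # $1: path to the run control file; do nothing if it does not exist
    [ -f "$1" ] || return 0
    chmod 600 "$1"
}

secure_rcfile "$HOME/.fetchmailrc"
```

Afterwards, ls -l ~/.fetchmailrc should show the permissions as -rw-------.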
Fetchmail + Spamassassin + Procmail + Courier-imap
The tactics used by SpamAssassin include text and header analysis, blacklists, and a collaborative spam-tracking database.
In this solution, we will use:
- Fetchmail to fetch your mails from your provider to your local machine
- SpamAssassin to classify them and tag them (alternatively, you can use http://spamprobe.sf.net/ which applies the Naive Bayesian rule)
- Procmail to store the tagged mails in the appropriate Maildir
- Courier-imap to serve your mails stored in the Maildir to any imap-compliant mail client (equally you can use WU-imap or Cyrus-imap)
Spamassassin
Log in as root and run urpmi spamassassin. The default rules are very good, so there is nothing more to configure.
Fetchmail
This is exactly like in the previous section; see: Fetchmail
Imap
urpmi maildirmake
Mails will be stored in the Maildir format. Each user should thus perform in their home directory:
maildirmake++ Maildir
Procmail
Procmail is responsible for delivering the mail locally.
As root, edit /etc/procmailrc (or have each user edit his or her own .procmailrc) to make the mail go through SpamAssassin:
VERBOSE=no
SHELL=/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin
MAILDIR=$HOME/Maildir/
#ORGMAIL=emergency-inbox
DEFAULT=new
LOGFILE=procmail.log
:0fw
| spamassassin --auto-whitelist
As a result, the mail will be tagged by SpamAssassin. All you have to do now is file the mail that has been classified as spam in a separate IMAP folder. Still editing /etc/procmailrc, add:
:0:
* ^X-Spam-Status: Yes
.Junk/
Thus, the junk mail will automatically be stored in the Imap folder called "Junk".
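To check by hand whether a given message would match the condition of the Junk recipe, you can test its headers with grep. The sketch below is not part of procmail; the function name is_tagged_spam is hypothetical, the sample header value is purely illustrative, and grep -i is used on the assumption that procmail matches conditions without regard to case.

```shell
# Sketch only: does the header block of a message carry the spam tag?
# Reads one message on standard input.
is_tagged_spam() {
    # sed stops at the first empty line, so only the headers reach grep
    sed '/^$/q' | grep -qi '^X-Spam-Status: Yes'
}

printf 'Subject: hi\nX-Spam-Status: Yes, hits=12.3 required=5.0\n\nbody\n' |
    is_tagged_spam && echo "would go to .Junk/"
```

You can feed it a real message with something like is_tagged_spam < somefile, using any file from your Maildir.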
Courier-imap
Up to now, your e-mails are stored in $HOME/Maildir/ if they are good and in $HOME/Maildir/.Junk/ if they are spam (according to the Maildir format, unread mail is in the new/ subdirectory and read mail is in the cur/ subdirectory). Some mail clients, like Sylpheed, can read Maildirs directly, but it might be easier to set up an Imap server:
urpmi courier-imap
Now configure your mail client to use localhost as mail server, with imap protocol.
Since you now have an Imap mail server, you can read your mails exactly the same way, with any mail client, from any computer (that is networked, of course).
Debugging
This last solution is the most "advanced". It is not really hard to set up, but since several programs are involved, a mistake can be hard to find. You will find useful logs in /var/log/mail/* and $HOME/Maildir/procmail.log.
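When debugging, it also helps to see where the mail actually landed. The following sketch counts the files in the new/ and cur/ subdirectories of a Maildir folder; the function name count_maildir is hypothetical, and the paths assume the Maildir layout used in this tutorial.

```shell
# Sketch only: count messages in a Maildir folder's new/ and cur/.
count_maildir() {
    # $1: path to a Maildir folder, e.g. "$HOME/Maildir"
    for sub in new cur; do
        n=$(find "$1/$sub" -type f 2>/dev/null | wc -l)
        echo "$sub: $((n))"
    done
}

count_maildir "$HOME/Maildir"
count_maildir "$HOME/Maildir/.Junk"
```

If the counts for .Junk never change while spam keeps arriving in the main folder, the procmail recipe or the SpamAssassin tagging is the place to look.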
Bayesian filters
Bayesian filters seem to be very efficient (there was a good comparison on slashdot recently). AugustinMa - 28 Aug 2003
Bogofilter
Current Sylpheed - and no doubt other clients - can be easily configured to use bogofilter from within for any spam that still arrives in your inbox, and it will autolearn on messages that you characterize as 'junk' or 'no junk' with the appropriate buttons. (Dick Gevers - 19 Feb 2006)
POPFile
PopFile is very easy to configure, but much of the documentation refers to Windows installations. See the PopFile Wiki page for details of setting up and using PopFile. AnneWilson 13 Jan 2005
Other References
Setting up Fetchmail, Procmail, and SpamAssassin
Unless otherwise noted, this page has been written by: RegisDecamps - 31 May 2003