Train SpamAssassin Spam Email Message Detection in Zentyal

SpamAssassin does a great job of identifying SPAM emails and is probably one of the most popular tools in its class. An important part of fighting SPAM email is keeping up with changes in both SPAM and non-SPAM (often called Ham) email content. Luckily for us, whatever magic SpamAssassin employs to decide whether an email is SPAM can be trained, so that emails that were wrongly classified are classified correctly in future.

I’m using SpamAssassin as part of a Zentyal email server, but a similar process works with a standalone SpamAssassin install.

The first step is to create a folder in a given email account that will be used to manually classify SPAM email messages. You could use your existing SPAM/Junk folder, but I’ve called mine ‘Spam Training’ to keep it separate. When you receive email in your inbox that hasn’t been classed as SPAM but should be, manually move it to the new ‘Spam Training’ folder, and SpamAssassin will update its detection routine the next time the training job described below runs. I’ve kept the folders separate because messages can be falsely marked as SPAM and left in the SPAM folder, and you wouldn’t want SpamAssassin to automatically learn those as SPAM.

Cron Job

The next step is to set up a cron job that runs the SpamAssassin utility sa-learn to consume the emails and update its detection mechanism. sa-learn learns from messages that you specify as SPAM with the --spam switch and from messages that are not SPAM with the --ham switch.
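Before scheduling anything, it can help to run the training by hand once. The following is a minimal sketch of a wrapper around sa-learn; the function name train_folder and the SA_LEARN variable are my own illustrative names, not part of SpamAssassin, and only the --spam and --ham switches described above are used.

```shell
# Command used for training. Overridable so the wrapper can be dry-run,
# e.g. SA_LEARN=echo prints the command arguments instead of learning.
SA_LEARN="${SA_LEARN:-sa-learn}"

# train_folder KIND DIR
#   KIND is "spam" or "ham"; DIR is a folder of messages (e.g. a Maildir cur/).
train_folder() {
    kind="$1"
    dir="$2"
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: train_folder spam|ham DIR" >&2; return 2 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "no such folder: $dir" >&2
        return 1
    fi
    # Hands the folder to sa-learn with the matching switch.
    "$SA_LEARN" "--$kind" "$dir"
}

# Example call (path taken from the crontab entries in this article):
# train_folder spam "/mailvol/jamescoyle.net/james.coyle/Maildir/.Spam Training/cur/"
```

Setting SA_LEARN=echo first is a cheap way to check the path and switch that would be used without touching the learned data.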

Open your crontab for editing and add one or more of the lines below as required.

crontab -e

Crontab entries to teach SpamAssassin

Add one or more of the lines below to your crontab. Each entry runs once a day at 03:30 and logs to /var/log/spam_train.log; both can be changed to suit your requirements. The mailbox location in these commands is /mailvol/jamescoyle.net/ and will need to be changed to match your email server environment.

There are a couple of other things to note with this process:

  1. Emails will only be learnt once. If you re-run the commands on the same emails they will not be learnt again.
  2. Emails will not be deleted from these folders so you’ll need to set up data retention rules.
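Since the training folders are never emptied automatically, one possible retention rule is to delete messages older than a fixed age. This is only a sketch: the function name prune_training_folder and the 30-day default are assumptions of mine, and the age should comfortably exceed the interval between training runs so that nothing is removed before it has been learnt.

```shell
# prune_training_folder DIR [DAYS]
#   Deletes regular files under DIR last modified more than DAYS days ago.
#   DAYS defaults to 30 (an arbitrary choice for illustration).
prune_training_folder() {
    dir="$1"
    days="${2:-30}"
    if [ ! -d "$dir" ]; then
        echo "no such folder: $dir" >&2
        return 1
    fi
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$dir" -type f -mtime +"$days" -exec rm -f {} +
}

# Example call (path taken from the crontab entries in this article):
# prune_training_folder "/mailvol/jamescoyle.net/james.coyle/Maildir/.Spam Training/cur" 30
```

Your mail client or server may offer its own expiry settings for a folder, which would make a script like this unnecessary.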

Learn SPAM messages for a specific mailbox folder ‘Spam Training’

30 03 * * * L=/var/log/spam_train.log && echo `date` >> $L && sa-learn --spam --showdots /mailvol/jamescoyle.net/james.coyle/Maildir/.Spam\ Training/cur/ >> $L

Learn Ham messages for a specific mailbox folder ‘Ham Training’

30 03 * * * L=/var/log/spam_train.log && echo `date` >> $L && sa-learn --ham --showdots /mailvol/jamescoyle.net/james.coyle/Maildir/.Ham\ Training/cur/ >> $L

Search all users for a folder called ‘Spam Training’ and learn them as SPAM

Note: this could be process intensive for large mailboxes. 

30 03 * * * L=/var/log/spam_train.log && echo `date` >> $L && find /mailvol/jamescoyle.net/* -name '*Spam Training' -exec sa-learn --spam --showdots {} \; >> $L
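If the one-line crontab entry becomes hard to read, the same search can be moved into a small script and called from cron instead. The sketch below assumes the training folders are directories with an exact name such as ‘.Spam Training’; the function name learn_all_users and the SA_LEARN variable are hypothetical names used for illustration.

```shell
# Command used for training; SA_LEARN=echo gives a dry run that only
# prints the arguments that would be passed for each matching folder.
SA_LEARN="${SA_LEARN:-sa-learn}"

# learn_all_users ROOT NAME KIND
#   ROOT: top of the mail store, NAME: exact folder name to look for,
#   KIND: "spam" or "ham".
learn_all_users() {
    root="$1"
    name="$2"
    kind="$3"
    # -type d restricts matches to directories, so a stray file with a
    # similar name is never handed to the learner.
    find "$root" -type d -name "$name" -exec "$SA_LEARN" "--$kind" {} \;
}

# Example call (mail store path taken from this article):
# learn_all_users /mailvol/jamescoyle.net '.Spam Training' spam >> /var/log/spam_train.log
```

A dry run with SA_LEARN=echo lists every folder that would be learnt, which is a quick way to gauge how heavy the job will be on a large mail store.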
