SpamAssassin is back

https://lwn.net/Articles/769917/

By Jonathan Corbet
November 2, 2018


OSSEU

The SpamAssassin 3.4.2 release was the first from that project in well
over three years. At the 2018 Open Source Summit Europe, Giovanni Bechis
talked about that release and those that will be coming in the near
future. It would seem that, after an extended period of quiet, the
SpamAssassin project is back and has rededicated itself to the task of
keeping junk out of our inboxes.

Bechis started by noting that spam filtering is hard because everybody’s
spam is different. It varies depending on which languages you speak, what your
personal interests are, which social networks you use, and so on. People
vary, so results vary; he knows a lot of Gmail users who say that its spam
filtering works well, but his Gmail account is full of spam. Since Google
knows little about him, it is unable to train itself to properly filter his
mail.

Just like Gmail, SpamAssassin isn’t the perfect filter for everybody right
out of the box; it’s really a framework that can be used to create that
filter. Getting the best out of it can involve spending some time to write
rules, for example. Most of the current rule base is aimed at
English-language spam, which isn’t helpful for people whose spam comes in
other languages. Another useful thing to do is to participate in the
MassCheck project, which can quickly evaluate the effectiveness of new
rules on a large body of spam. In particular, MassCheck performs a nightly
run to check the hit rate of rules to determine how those rules are
performing in real installations. It can also check for overlap; if two
rules always trigger on the same messages, there isn’t really a need for
both of them. This information feeds into the RuleQA database to give a
picture of how the rules are working overall.
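
The hit-rate and overlap checks described above can be illustrated with a
small Python sketch. This is not MassCheck's actual code or data format;
the rule names, the message IDs, and the use of Jaccard similarity as the
overlap measure are all assumptions made for the sake of the example.

```python
from itertools import combinations


def hit_rates(hits_by_rule, corpus_size):
    """Return the fraction of the corpus that each rule fired on."""
    return {rule: len(msgs) / corpus_size for rule, msgs in hits_by_rule.items()}


def redundant_pairs(hits_by_rule, threshold=1.0):
    """Return (rule_a, rule_b, overlap) for rule pairs that hit nearly the same messages.

    Overlap is the Jaccard similarity of the two sets of matched messages;
    a value of 1.0 means the rules always trigger together.
    """
    pairs = []
    for a, b in combinations(sorted(hits_by_rule), 2):
        hits_a, hits_b = hits_by_rule[a], hits_by_rule[b]
        union = hits_a | hits_b
        if not union:
            continue  # neither rule ever fired; nothing to compare
        overlap = len(hits_a & hits_b) / len(union)
        if overlap >= threshold:
            pairs.append((a, b, overlap))
    return pairs


# Hypothetical results: rule name -> IDs of the messages it matched,
# taken from an imaginary corpus of five messages.
hits = {
    "RULE_A": {1, 2, 3},
    "RULE_B": {1, 2, 3},
    "RULE_C": {3, 4},
}

rates = hit_rates(hits, corpus_size=5)
dupes = redundant_pairs(hits)
```

Here RULE_A and RULE_B match exactly the same messages, so they are
reported as a redundant pair, while RULE_C overlaps only partially and is
left alone. Lowering `threshold` would flag near-duplicates as well.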

SpamAssassin is not just for email filtering, Bechis said; some sites are
using it to detect spam submitted in web forms, for example.

So what is new in SpamAssassin? There has been a lot of work by the
project’s system administration team, he said, to update the
infrastructure. That has resulted in the rebuilding of the MassCheck
implementation from scratch. The 3.4.2 release contained fixes for four security
bugs, and also an important workaround for a Perl bug that was only
triggered on Red-Hat-based distributions. Startup time has been improved,
and SSLv3 support has been removed. The “freemail antiforge” mechanism,
which seeks to detect forged Gmail messages, has been improved. The
geo-aware scoring system can adjust scores based on which continent the
mail came from. The URILocalBL
plugin, which can blacklist URLs based on information like where they are
hosted, has seen a number of improvements.

The 3.4.2 release also saw the addition of the HashBL plugin, which can be
used to block email addresses from domains that cannot be blocked
wholesale. There is a new anti-phishing plugin that can filter on URLs
commonly found in phishing emails. The new ResourceLimits plugin can put
limits on the amount of CPU and memory used by SpamAssassin. And the
FromNameSpoof plugin tries to detect attempts to confuse users about the
source of an email using the full-name field.
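
To make the full-name trick concrete, here is a minimal Python sketch of
one possible check: flag a From header whose display name contains an
email address in a different domain than the real sender address. This is
only an illustration of the general idea; it is not how the FromNameSpoof
plugin is implemented, and the function name and heuristic are
assumptions.

```python
import re
from email.utils import parseaddr

# Matches something that looks like an email address and captures its domain.
ADDRESS_IN_NAME = re.compile(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)")


def name_spoofs_address(from_header):
    """Return True if the display name shows an address from another domain.

    Hypothetical heuristic: mail clients often show only the display name,
    so a name like "support@paypal.com" can hide the real sender.
    """
    display_name, address = parseaddr(from_header)
    match = ADDRESS_IN_NAME.search(display_name)
    if not match or "@" not in address:
        return False  # no address in the name, or no usable sender address
    claimed_domain = match.group(1).lower()
    real_domain = address.rsplit("@", 1)[1].lower()
    return claimed_domain != real_domain


spoofed = name_spoofs_address('"support@paypal.com" <attacker@example.net>')
honest = name_spoofs_address("Alice Example <alice@example.org>")
```

A real filter would feed a result like this into a score rather than
rejecting the message outright, since some legitimate senders also put an
address in the name field.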

Some future plugins include a couple that are aimed at detecting Microsoft
Office attachments containing macros. There is one for checking URLs from
URL-shortening services; it will filter based on the final destination of
those URLs. The KAM.cf ruleset is an unofficial addition that can allow
sites to respond more quickly to new spam campaigns, but at a cost of more
false positive results. Also coming is a set of international channels
that will carry signed rulesets designed for different parts of the planet.

The SpamAssassin 4.0 release can be expected around January, Bechis said.
It will include full UTF-8 support that has been completely rewritten, with
better detection of east-Asian languages. The TxRep plugin, which
applies scores to messages depending on the reputation of the sender, is being
improved and will be able to use PostgreSQL 10. The Office macro and URL
shortener plugins will be in this release, but another new plugin to check
for suspicious URLs inside attachments will have to wait until 4.1.

Further in the future, the project plans to update its approach to machine
learning. The current code is getting old, and there is interest in
applying deep-learning techniques to the spam-detection problem. There was
a Google Summer of Code project that attempted to make progress in that
area, but it didn’t succeed, so more work is needed.

When asked whether the SpamAssassin project had really slowed down as
much as its release history suggests, Bechis conceded that it had. A
number of people had left the project, and there were infrastructure
problems that blocked the rule-generation process. But the situation has
since improved, he said. The project has picked up a new set of developers
and is moving forward again. Certainly the world can only benefit from
better spam filtering.

The slides from this talk [PDF] are available.

[Thanks to the Linux
Foundation, LWN’s travel sponsor, for supporting my travel to the event.]

