The War on Spam
by Larry Kelly
Without some corrective action, spam threatens to turn e-mail into one gigantic electronic garbage delivery service
Anyone who has an e-mail account has probably come into contact with spam. The word spam was reportedly taken from a Monty Python skit and refers to unsolicited, automated commercial e-mail. It would be no exaggeration to say that spam delivery has reached epidemic proportions: in a single week before last Christmas, I received 1,250 spam e-mails. Of course, not all commercial e-mail is spam. Many companies use “opt-in” campaigns that ask for approval before adding a name to a mailing list. Some Web forms are mildly deceptive about how they obtain that approval, but respectable companies will delete an e-mail address on request (“opt-out”). Bulk e-mail generated by viruses, while potentially nasty and a real nuisance, is not considered spam.
According to the latest report from Postini, the number of spam messages sent to its users increased 150% last year. Postini is an e-mail security provider that collects real-time data from the more than 40 million messages it processes each day for users of its spam and virus filtering services. In 2002, the company processed more than nine billion messages, and over the course of that year the share identified as spam rose from 20% in January to more than 60% in December. In a single day, Postini quarantined nearly 22 million spam messages, 64% of all the messages it processed that day. Other e-mail filtering providers have reported similar increases.
Spam is becoming a serious drain on the e-mail system, and the cost of the equipment needed to handle the increased volume will certainly be passed on to customers. As mail servers bog down under the load, they need more disk space and processing power, and they may end up spending more time stopping spam than delivering mail. There is also a direct cost to companies in the employee time needed to sort incoming messages and delete the unwanted content. Some users of wireless e-mail devices such as the BlackBerry even get the privilege of paying for their spam. Finally, there is the cost of valuable e-mail lost in the sea of spam, whether through oversight or erroneous filtering.
Without some corrective action, spam will turn e-mail into one gigantic electronic garbage delivery service. Twenty-six American states have passed legislation to control spam, but the Canadian government has not. At this point, the only relevant Canadian legislation is the Personal Information Protection and Electronic Documents Act, which controls the sale of personal information. It gives some measure of control over the people who sell e-mail lists, but not over the spammers themselves. A good resource in the fight against spam is the Coalition Against Unsolicited Commercial Email, or CAUCE (it rhymes with “sauce”), a worldwide organization of volunteers who work tirelessly to fight spam. For activists, SpamCop offers a service that sends notices to the spammer’s Internet service provider (ISP). The spam victim uses a command in their e-mail client to “reveal headers” (display the full message headers) and then forwards the offending message to SpamCop. The headers are required because the return address in a spam message is usually forged. SpamCop works out the true origin of the e-mail from the headers and sends a warning to the spammer’s ISP. SpamCop claims that this is effective and can cause a spammer to lose their Internet account.
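To see why the headers matter, here is a minimal Python sketch that pulls relay IP addresses out of a message's Received headers using the standard `email` module. It is not SpamCop's actual method. Each mail server that handles a message adds its own Received line at the top, so the list reads from the most recent hop back toward the sender. The From line, by contrast, can say anything. The sample message and all addresses are invented. A real tracer also has to treat the lower hops with suspicion, because a spammer can forge those too.

```python
import re
from email import message_from_string

# Matches an IPv4 address in square brackets, the form many servers use in
# Received headers, e.g. "from host (host.example [203.0.113.7])".
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message: str) -> list:
    """Return bracketed IPs from the Received headers, most recent hop first."""
    msg = message_from_string(raw_message)
    ips = []
    # get_all() returns headers in the order they appear; because each server
    # prepends its own Received line, the first one is the most recent hop.
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IP.findall(str(header)))
    return ips


# Invented sample: the From line names one domain, the relays tell another story.
RAW = (
    "Received: from mail.example.net (mail.example.net [198.51.100.2])\n"
    "\tby mx.example.org; Mon, 6 Jan 2003 10:00:00 -0700\n"
    "Received: from unknown (dialup.example.com [203.0.113.7])\n"
    "\tby mail.example.net; Mon, 6 Jan 2003 09:59:58 -0700\n"
    "From: \"Prize Dept\" <winner@forged.example>\n"
    "Subject: You won!\n"
    "\n"
    "Click here.\n"
)

print(relay_ips(RAW))  # ['198.51.100.2', '203.0.113.7']
```

The last address in the list is the hop closest to the claimed origin, which is where a complaint to the sender's ISP would start.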
Unless a gigantic asteroid crashes into the earth and sends the surviving population into caves, electronic communication is here to stay. Unfortunately, so is spam. The most effective way of dealing with spam is to use an e-mail filtering service or a hardware filtering device, but these cost money, and in some cases serious money. A simple, reasonably low-cost alternative is the spam filter: software that takes on the job of sorting good e-mail from bad. The nature of e-mail makes that job difficult. After all, one man’s spam is another man’s treasure. Filters rely on a combination of tools: “blacklists,” “whitelists,” good word lists and bad word lists.
Blacklists are well-intentioned lists of known spammers’ e-mail addresses. However, centralized blacklists made headlines in tech circles because their high error rate quarantined hundreds of thousands of legitimate messages. Once a sender’s address was on a blacklist, even by mistake, it was very difficult to get it removed. Local blacklists customized for each user seem to be a better approach. They won’t stop spam from new senders, but they keep spam from previously seen senders from showing up again. A whitelist is just the opposite – a list of all the e-mail addresses the user wishes to receive mail from. Any message from an address on the whitelist is passed through regardless of its content. Good filters make it easy to add new addresses to the whitelist as mail arrives from them.
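A local whitelist and blacklist can be as simple as two sets of addresses checked in order, with the whitelist first. The sketch below is hypothetical: `SenderLists` and its methods are illustrative names, not any product's API. It also assumes the sender is taken from the From header, which spammers can forge.

```python
from email.utils import parseaddr
from typing import Optional


class SenderLists:
    """Hypothetical per-user whitelist and blacklist keyed on sender address."""

    def __init__(self) -> None:
        self.whitelist = set()
        self.blacklist = set()

    @staticmethod
    def _address(from_header: str) -> str:
        # parseaddr('"Mom" <Mom@Example.com>') gives ('Mom', 'Mom@Example.com');
        # lower-casing the address is a simplification for matching.
        return parseaddr(from_header)[1].lower()

    def allow(self, from_header: str) -> None:
        self.whitelist.add(self._address(from_header))

    def block(self, from_header: str) -> None:
        self.blacklist.add(self._address(from_header))

    def verdict(self, from_header: str) -> Optional[str]:
        address = self._address(from_header)
        if address in self.whitelist:   # whitelist wins regardless of content
            return "inbox"
        if address in self.blacklist:   # previously seen spammer
            return "spam"
        return None                     # unknown sender: leave it to word filters


lists = SenderLists()
lists.allow('"Mom" <mom@example.com>')
lists.block("deals@spam.example")
print(lists.verdict("Mom <MOM@example.com>"))   # inbox
print(lists.verdict("deals@spam.example"))      # spam
print(lists.verdict("stranger@example.net"))    # None
```

Returning `None` for unknown senders reflects the point above: a local blacklist does nothing about new spammers, so those messages fall through to the word filters.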
Good word lists consist of words that spammers are unlikely to use, such as product names, project names, model numbers or other unique words. When good words are found, they can either cause the filter to skip the rest of the filtering process and send the message straight to the inbox, or simply lessen the weight of the bad words in the overall score. The more good words that show up, the less likely it is that a message is spam.
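Both behaviours can be sketched with a simple weighted score. Good words subtract from the bad-word total, and enough of them skip the check entirely. In this hypothetical sketch, the word lists, weights and skip threshold are all invented for illustration. A real filter would tune them for each user.

```python
import re

# Hypothetical word lists and weights, invented for illustration.
GOOD_WORDS = {"invoice": 2.0, "budget": 1.5, "x200": 3.0}
BAD_WORDS = {"free": 1.0, "casino": 2.0, "cash": 1.5, "click": 2.5}
SKIP_THRESHOLD = 3.0  # good-word total at which bad words are ignored


def word_score(text: str) -> float:
    """Higher means more spam-like; good words offset bad ones."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    good = sum(GOOD_WORDS.get(w, 0.0) for w in words)
    if good >= SKIP_THRESHOLD:
        # Enough good words: skip the rest of the check, send to the inbox.
        return float("-inf")
    bad = sum(BAD_WORDS.get(w, 0.0) for w in words)
    return bad - good


print(word_score("Free casino cash, click now"))        # 7.0
print(word_score("Free budget review"))                 # -0.5
print(word_score("Quote for the X200, free shipping"))  # -inf
```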
Bad words call for the real black magic in spam filtering, because spammers are constantly changing their strategy to evade filters. The hottest technology now is the Bayesian filter, which calculates the probability that a message is spam based on its contents. Each “bad word” is assigned a number representing the likelihood that a message containing it is spam. These individual probabilities are combined mathematically into an overall probability that the message is spam. If that probability exceeds a set threshold, the message is diverted to your spam bucket. One researcher found that filtering out e-mails containing the single word “click” would eliminate 80% of spam. But if filtering on “click” is effective, you can be sure spammers will stop using it or switch to a spelling variation such as clik, c(l)ick or clikk.
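The combining step can be sketched in a few lines of Python. This version follows the naive Bayes rule popularized by Paul Graham's 2002 essay “A Plan for Spam”. Given per-word spam probabilities p1 through pn, the message probability is the product of the p values divided by the sum of that product and the product of the (1 − p) values. As in Graham's filter, unseen words get 0.4 and only the 15 words furthest from a neutral 0.5 are combined. The word probabilities and the 0.9 threshold are assumptions for illustration. A real filter learns the probabilities from mail the user has already sorted.

```python
import math
import re

# Hypothetical per-word spam probabilities, as a filter might learn them
# from mail the user has already sorted into spam and good mail.
WORD_PROB = {"click": 0.95, "free": 0.90, "casino": 0.92,
             "meeting": 0.05, "invoice": 0.10}
UNSEEN = 0.4        # probability assumed for words never seen in training
MAX_WORDS = 15      # only the most "interesting" words are combined
THRESHOLD = 0.9     # assumed cut-off for the spam bucket


def spam_probability(text: str) -> float:
    """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    probs = [WORD_PROB.get(w, UNSEEN) for w in words]
    # Keep the words whose probabilities are furthest from a neutral 0.5.
    probs = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:MAX_WORDS]
    # Sum logs instead of multiplying, so many small factors don't underflow.
    log_spam = sum(math.log(p) for p in probs)
    log_good = sum(math.log(1 - p) for p in probs)
    return 1 / (1 + math.exp(log_good - log_spam))


def is_spam(text: str) -> bool:
    return spam_probability(text) > THRESHOLD


print(round(spam_probability("Click for free casino chips"), 4))  # 0.9989
print(is_spam("Agenda for the invoice meeting"))                  # False
```

Because “clik” and “clikk” are simply unseen words here, they would get the neutral-ish 0.4. That is exactly the evasion described above, and it is why these filters need retraining as spammers adapt.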
Most filters must be taught the difference between spam and good mail, and they require ongoing maintenance and adjustment. A fine-tuned filter can remove 95% or more of unsolicited e-mail with very few false positives (good e-mail erroneously classified as spam), while a poorly maintained filter will be hard pressed to stop 50%. Up-to-date filter rules are important, but the order of the rules is critical. The whitelist rules should come first in the filter list. Otherwise, an e-mail from a friend with a phrase like “Hey, Larry, I got some free coupons for the casino” would be bounced to the spam pile because of the words “free” and “casino” before it ever reached the address filter. Put the positive rules (whitelist and good words) at the beginning and the blacklist and negative word rules (such as adult, cash, celebrity) at the end. Since no filter is perfect, send spam to a mailbox called Filter or Spam that can be reviewed before it is emptied. Keep in mind that while spam that sneaks through a filter is a nuisance, lost or misclassified e-mail is an order of magnitude worse.
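Rule order is easy to demonstrate with a first-match-wins rule list. In this hypothetical sketch, each rule either names a folder or passes. The friend's casino message then lands in a different folder depending on whether the whitelist runs first. The addresses and word lists are invented.

```python
import re
from typing import Callable, List, Optional

# A rule inspects (sender, body) and returns a folder name, or None to pass.
Rule = Callable[[str, str], Optional[str]]

# Hypothetical lists for illustration.
WHITELIST = {"friend@example.com"}
BLACKLIST = {"offers@spam.example"}
GOOD_WORDS = {"project", "invoice"}
BAD_WORDS = {"free", "casino", "adult", "cash", "celebrity"}


def words(body: str) -> set:
    return set(re.findall(r"[a-z]+", body.lower()))


def whitelist_rule(sender: str, body: str) -> Optional[str]:
    return "Inbox" if sender in WHITELIST else None


def good_word_rule(sender: str, body: str) -> Optional[str]:
    return "Inbox" if words(body) & GOOD_WORDS else None


def blacklist_rule(sender: str, body: str) -> Optional[str]:
    return "Spam" if sender in BLACKLIST else None


def bad_word_rule(sender: str, body: str) -> Optional[str]:
    return "Spam" if words(body) & BAD_WORDS else None


def sort_message(rules: List[Rule], sender: str, body: str) -> str:
    for rule in rules:            # the first rule that names a folder wins
        folder = rule(sender, body)
        if folder is not None:
            return folder
    return "Inbox"                # nothing matched


POSITIVE_FIRST = [whitelist_rule, good_word_rule, blacklist_rule, bad_word_rule]
NEGATIVE_FIRST = [bad_word_rule, blacklist_rule, good_word_rule, whitelist_rule]

friend = "friend@example.com"
note = "Hey, Larry, I got some free coupons for the casino"
print(sort_message(POSITIVE_FIRST, friend, note))  # Inbox
print(sort_message(NEGATIVE_FIRST, friend, note))  # Spam
```

Sending everything that matches a negative rule to a folder named “Spam”, rather than deleting it, keeps the review step suggested above possible.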
Many e-mail filters can label messages with colours or words. When e-mail volume is low and mostly clean, labels are not really needed. However, when spam arrives by the dozens or hundreds per day, labels are a quick way to identify e-mail by subject, content or sender. Assign one colour to sales and another to friends and family. E-mails with headers marked in red really stand out from the pile.
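Colour labelling usually comes down to a small table of conditions. The hypothetical rules below are checked in order and mark friends and family in green and sales enquiries in red. The domains, keywords and colours are placeholders, not settings from any particular mail client.

```python
from email.utils import parseaddr
from typing import Optional

# Hypothetical labelling rules, checked in order: (field, keyword, colour).
LABEL_RULES = [
    ("sender", "@family.example", "green"),   # friends and family
    ("subject", "quote", "red"),              # sales enquiries
    ("subject", "order", "red"),
]


def label(from_header: str, subject: str) -> Optional[str]:
    """Return the colour of the first matching rule, or None for no label."""
    fields = {
        "sender": parseaddr(from_header)[1].lower(),
        "subject": subject.lower(),
    }
    for field, keyword, colour in LABEL_RULES:
        # Plain substring match keeps the sketch simple ("order" also
        # matches "border"); a real filter would match whole words.
        if keyword in fields[field]:
            return colour
    return None


print(label('"Mom" <mom@family.example>', "Dinner Sunday?"))    # green
print(label("buyer@client.example", "Quote request, 40 units"))  # red
print(label("someone@example.net", "Hello"))                     # None
```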
Follow Alberta Venture On: