The Inevitability of Spam

Blog spam (blam), search engine spam (spamdexing), fax spam, Wikipedia spam, Second Life spam, instant messenger spam (spim): you name an electronic form of communication, and I’ll show you a spam that invades it.

Taking into account the following variables:
  • Overhead: The costs and overhead of electronic spamming include bandwidth, developing or acquiring an email/wiki/blog spam tool, taking over or acquiring a host/zombie, etc.
  • TransactionCost: The incremental cost of contacting each additional recipient once a method of spamming is constructed.
  • Risks: Chance and severity of legal and/or public reactions, including damages and punitive damages, expressed per recipient
  • Damage: Impact on the community and/or communication channels being spammed (see Newsgroup spam)
  • Benefit: expected profit from each recipient who responds to the spam
  • ConversionRate: chance that someone who is spammed responds and adds to your total profit
  • AudienceSize: number of recipients the spam reaches
The back-of-the-napkin formula to predict whether spam will happen is

Risks*AudienceSize + Overhead + TransactionCost*AudienceSize
(is greater or less than)
Benefit*ConversionRate*AudienceSize
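
To make the comparison concrete, here is a minimal Python sketch of the napkin formula. It assumes Risks and TransactionCost are per-recipient figures and Benefit is the profit per converted recipient, which is how the formula uses them. The function name and every number below are made up for illustration, not measured data.

```python
def spam_is_profitable(overhead, transaction_cost, risks,
                       benefit, conversion_rate, audience_size):
    """Return True when the expected benefit exceeds the expected cost."""
    # Left-hand side of the napkin formula: what the spammer pays.
    total_cost = (risks * audience_size
                  + overhead
                  + transaction_cost * audience_size)
    # Right-hand side: what the spammer expects to earn.
    total_benefit = benefit * conversion_rate * audience_size
    return total_benefit > total_cost


# Hypothetical numbers, chosen only to show the shape of the comparison.
params = dict(overhead=500.0, transaction_cost=0.00001, risks=0.00001,
              benefit=20.0, conversion_rate=0.00001)

small_community = spam_is_profitable(audience_size=1_000_000, **params)
large_community = spam_is_profitable(audience_size=10_000_000, **params)
print(small_community, large_community)
```

With these invented inputs the one-million-member community is not worth spamming, while the ten-million-member one is.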

It gets interesting when, as in nearly every form of electronic communication, TransactionCost quickly approaches zero. Anonymity is easy too, so Risks are also minimized. Dropping both terms and dividing through by AudienceSize, we get:

Overhead/AudienceSize <> Benefit*ConversionRate
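
Rearranged, the simplified comparison gives the audience size at which spamming starts to pay. The sketch below solves for that threshold. The optional transaction_cost and risks arguments are my own addition so the same function also covers the unsimplified case, and the numbers are hypothetical.

```python
import math


def break_even_audience(overhead, benefit, conversion_rate,
                        transaction_cost=0.0, risks=0.0):
    """Audience size above which expected benefit exceeds expected cost."""
    # Net expected gain per recipient once per-recipient costs are removed.
    margin = benefit * conversion_rate - transaction_cost - risks
    if margin <= 0:
        # Each extra recipient loses money, so no audience is large enough.
        return math.inf
    return overhead / margin


# Hypothetical inputs: $500 of tooling, $20 per conversion, 1 in 100,000 converts.
threshold = break_even_audience(overhead=500.0, benefit=20.0,
                                conversion_rate=0.00001)
print(f"break-even audience: about {threshold:,.0f} recipients")
```

Any community larger than the returned threshold is, by this napkin math, worth a spammer's time.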

So, as soon as any electronic community gets large enough to outweigh the initial spam-tool overhead, it will invariably fall prey to spam. Slightly pessimistic of me, I know, but I don’t see any alternatives: machine techniques for spam filtering continue to fight an arms race with spam distributors, with no clear end on the horizon. The only hope (that I can see) is CAPTCHAs, which offset the formula by keeping TransactionCost above zero, imposing a “cost” of time/attention/thinking on every communication.
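
How much “cost” would a CAPTCHA have to impose? Solving the original inequality for TransactionCost gives a rough answer: spam stops paying once TransactionCost reaches Benefit*ConversionRate - Risks - Overhead/AudienceSize. Here is a small sketch of that rearrangement, with a hypothetical function name and invented numbers.

```python
def min_deterrent_cost(overhead, benefit, conversion_rate,
                       audience_size, risks=0.0):
    """Smallest per-recipient cost at which spam stops being profitable."""
    # Expected gain per recipient before any per-message cost is charged.
    gain_per_recipient = (benefit * conversion_rate
                          - risks
                          - overhead / audience_size)
    # A negative value means spam already loses money; no extra cost is needed.
    return max(0.0, gain_per_recipient)


# Hypothetical community of ten million members.
needed = min_deterrent_cost(overhead=500.0, benefit=20.0,
                            conversion_rate=0.00001,
                            audience_size=10_000_000)
print(f"per-message cost needed: ${needed:.5f}")
```

The point of the sketch is only that the required cost is tiny per message but grows with the community, since Overhead/AudienceSize shrinks as the audience gets bigger.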

Which brings us to the facetiously named

Hill’s Law
Any online community’s total value = the cost of injecting spam into the system * amount of spam in the system * ? (a constant I just made up)
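
As a toy illustration, the law is a one-line product. In the sketch below the constant is a required argument because I have no value to offer for it; the function name and sample numbers are hypothetical.

```python
def hills_law_value(cost_per_spam, spam_count, constant):
    """Estimate community value as cost per spam * amount of spam * constant."""
    return cost_per_spam * spam_count * constant


# Hypothetical inputs: $0.002 to inject each spam message, one million
# spam messages in the system, and a placeholder constant of 1.0.
estimate = hills_law_value(cost_per_spam=0.002, spam_count=1_000_000,
                           constant=1.0)
print(f"estimated community value: ${estimate:,.2f}")
```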

Put another way: A new and potentially more accurate way to estimate a startup’s market worth as it vies for VC money is through the sum of spim, blam, spaSMS, spamdexing, bots, farmers, phishers and offshore traders.

The modestly named “Hill’s Law” came to mind during talks with internet companies of various sizes about their relative problems with spam, and from observing how the problem’s magnitude (IMHO) tracked with the community’s market value. An example is Yahoo’s recent addition of CAPTCHA verification to its online chat service, a common reaction to reduce spam in the transaction-cost-free communication environment of “Web 2.0” communities. Combine this with the going rate for CAPTCHA-cracking-style, low-latency OCR work (human powered? Who cares!) through something similar to Amazon Mechanical Turk, and you’ve got a precise market metric for exactly how much it is worth to spammers to infiltrate a chat room and push unwanted ads to the room and a known number of viewers, a number I’m sure they’d be none too happy about revealing. I think this is a better metric than the CPM or CPA cost of advertising, because the spam has to go through the same hoops that each user does when communicating.
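
The metric described above can be sketched as a simple division. If spammers keep paying a given price to get past one CAPTCHA and show one ad to a room, then each viewer must be worth at least that price divided by the room size. The one-CAPTCHA-per-ad assumption, the function name, and the numbers are all mine and purely illustrative.

```python
def implied_value_per_viewer(captcha_solve_price, viewers_per_room):
    """Lower bound on what one viewer is worth to a spammer who keeps paying."""
    if viewers_per_room <= 0:
        raise ValueError("viewers_per_room must be positive")
    # Assumes one solved CAPTCHA buys one ad shown to the whole room.
    return captcha_solve_price / viewers_per_room


# Hypothetical inputs: $0.002 per solved CAPTCHA, 40 viewers in the room.
floor_value = implied_value_per_viewer(captcha_solve_price=0.002,
                                       viewers_per_room=40)
print(f"each viewer is worth at least ${floor_value:.5f} to the spammer")
```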

For any given form of electronic (or low transaction cost) communication that provides the backbone of a Web 2.0 community:
  • … is there a range of CAPTCHA difficulty/human-only barrier placed on the communication choke points where (economically speaking) it isn’t worth it for any spammers to outsource or manually crack through, but is “worth it” for the general users of the service to put up with?
  • … to what extent does placing these added restrictions on the end users’ experience squeeze down on the users’ tolerance for the prevalent advertising based business model?
  • … and finally, can this squeeze be offset by tricks like reCAPTCHA, actually putting those brain cycles to work and recouping value from the spammers?
I vote ‘no chance in hell but enough to manage the issue’, ‘enough that Yahoo should be sweating’, and ‘absolutely’, respectively. But for the next year, the market will be swinging back and forth (take a look at the users’ complaints in the rooms before the CAPTCHAs went in), and it will be a great time for a few people to make loads of cash both exploiting and patching those inefficiencies.

Originally Posted: June 28th, 2007
