I was puzzled when I saw this article about the recent connection between pesticides and ADHD. It wasn't the subject that I found interesting, but the hyperlinks. As I was reading the article, one passage stood out as bizarre:
"Chemical influences like pesticides used to protect produce from insects are believed to contribute heavily to this upward trend. Some scientists believe it may have an even greater impact than other environmental factors like video games, television and online personal loan advertisements that may have been linked previously to ADHD behavior." [emphasis mine; original link removed.]
Really? Online personal loan advertisements have been linked to inducing ADHD behavior? Of course, I was curious and clicked the link... but it only went back to a rather obnoxious personal loan advertisement on the host site. I looked at other links in the story and realized they all pointed to the ad in interesting ways. Like the text about US and Canadian children... "Canadian" links to "ontario-payday-loans" on the site. Oh... oh that's dirty.

I traced the WHOIS record for the site to an online marketing company specializing in SEO (search engine optimization). My first thought was that they had simply scraped the original article off the Net and then inserted links -- but I couldn't find any exact matches for this story. However, another story on their site yielded a bunch of exact matches on other sites.

So maybe they are doing something especially innovative for SEO: paraphrasing articles that seem to be legit news stories (I mean, they are, unless they are scraped) and then overlaying the text with links to their ads. Hey, I give them some credit; the links weren't completely random.

However, this is still a relatively "hard" way to get the numbers required for SEO campaigns -- rewriting pieces and hand-linking them isn't the most scalable business model. And because it takes "human-time" to do, Google can conceivably keep up. But what if there were an automatic way to paraphrase?

SEO's Killer App

Unfortunately, automated paraphrasing is exactly what is around the corner. Anne Eisenberg's article "'Get Me Rewrite!' 'Hold On, I'll Pass You to the Computer'" describes how researchers at MIT and Cornell have come up with probabilistic methods of generating paraphrased content automatically. The code has been out there for a long time already, and there are several interesting projects along similar lines, such as Sujit Pal's Lucene-summarizer, Classifier4J, Open Text Summarizer and MEAD.

Imagine you are an SEO marketer... you suddenly have the ability to automatically paraphrase content from other sources (even the New York Times!), and because there are no identical phrases, Google can't detect the copying (and therefore can't penalize your ranking). In fact, it's difficult for humans to tell which is the paraphrase and which is the original... and unlike most scraped content feeds, it looks completely legit to readers who stumble across it via search engines.
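To make the "no identical phrases" problem concrete, here is a minimal sketch of exact-phrase matching using word shingles and Jaccard overlap. This is only an illustration under my own assumptions, not a description of how Google actually detects duplicates; the function names and the paraphrased sentence are made up for the example, and the original sentence is trimmed from the quote above.

```python
import re

def shingles(text, n=4):
    """Return the set of overlapping n-word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; defined here as 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

original = ("Pesticides used to protect produce from insects are believed "
            "to contribute heavily to this upward trend.")
scraped = original  # a verbatim scrape of the same sentence
# Hand-written paraphrase, invented for this example.
paraphrase = ("Many researchers think the chemicals sprayed on crops to keep "
              "bugs away play a large role in the increase.")

copy_score = jaccard(shingles(original), shingles(scraped))
para_score = jaccard(shingles(original), shingles(paraphrase))
print(copy_score, para_score)
```

The verbatim scrape shares every four-word phrase with the original, while the paraphrase shares none, so a detector that relies on identical phrases has nothing to grab onto.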

Mix this with other SEO techniques, such as dynamic domain generation, and sell into several different TLDs, and search engines would not be able to detect or defeat the flood of seemingly legitimate links.

So why am I helping the "evil" SEO marketers out there by foretelling this unstoppable weakness and possibly bringing about the demise* of search engines?

*Yes, this is what unrestrained SEO might do. Search engines are only used by people when they yield useful results -- when every link goes to an ad, a trick, or a deception, people will stop using the service. Notice the chilling effect that telemarketing has had on land-line phones (many people have cell phones only now), or that door-to-door salesmen had in a previous era.

The Counter Move

Fortunately, search engines have some awesome counter moves at their disposal. It's the same thing that identifies spam email so readily: the target URL. SEO marketers have many, many tricks, but the one thing they can't disguise is the link they want you to go to. So this is how you identify them:

"SURBLs are lists of web sites that have appeared in unsolicited messages. Unlike most lists, SURBLs are not lists of message senders... Web sites seen in unsolicited messages tend to be more stable than the rapidly changing botnet IP addresses used to send the vast majority of them... Many applications support SURBLs, including SpamAssassin and filters for most major MTAs including sendmail, postfix, qmail, exim, Exchange, qpsmtpd and others."
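Applied to web pages instead of email, the idea might look something like the sketch below: pull the link targets out of a page and check each host against a list of known spamvertised domains. Real SURBL lists are queried over DNS; to keep this self-contained I am assuming a local set instead, and the domains, the sample page and the function names are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collects the host name of every absolute <a href="..."> in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            host = urlparse(href).hostname  # None for relative links
            if host:
                self.hosts.append(host.lower())

def is_listed(host, blocklist):
    """True if the host or any parent domain of it appears in the blocklist."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))

def flagged_hosts(html, blocklist):
    """Return the sorted, de-duplicated hosts in the page that are listed."""
    collector = LinkCollector()
    collector.feed(html)
    return sorted({h for h in collector.hosts if is_listed(h, blocklist)})

# Hypothetical blocklist and page, standing in for a real SURBL-style list.
BLOCKLIST = {"loans.example"}
PAGE = (
    '<p>US and <a href="http://www.loans.example/ontario-payday-loans">'
    'Canadian</a> children were part of the '
    '<a href="https://news.example/adhd-study">study</a>.</p>'
)

result = flagged_hosts(PAGE, BLOCKLIST)
print(result)
```

However the surrounding text is paraphrased, the link on "Canadian" still has to point at the advertiser's domain, and that is the part this check keys on.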

