Wednesday, May 19, 2010

the privacy of feedly feeds

A while ago I confronted Feedly on Twitter about an apparent hole in their Firefox plugin:


They claimed they don't store credentials per se, and after further investigation I believe them, but there's still something not quite right.

See, if you install the plugin, everything appears normal:

But when you turn on Firefox's "Private Browsing" mode and click the Feedly button, you still see your feeds!




Fortunately, after a while, Feedly attempts to update your feed and displays the login screen:



So this tells me that what Feedly says is probably true: they don't cache your credentials in the plugin. However, they still apparently cache content from your feeds for a little while, until the next refresh period. By itself, this content cache isn't a bad thing (it's a performance optimization and saves network bandwidth) -- but the fact that their local content cache doesn't respect privacy modes in the browser is somewhat disturbing... Does that mean they cache outside the browser's model? Or does it mean that Firefox doesn't secure local data? Either conclusion would be troubling.
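To make the concern concrete, here's a minimal sketch of the behavior I'd expect (hypothetical Ruby, purely for illustration -- Feedly's plugin is obviously not written this way): a content cache that checks a private-browsing flag and refuses to serve previously stored entries once that flag is set.

```ruby
# Hypothetical sketch (not Feedly's code) of a feed cache that respects a
# private-browsing flag: nothing fetched in a normal session is served back
# once the browser says the current session is private.

class FeedCache
  def initialize
    @entries = {}  # feed_url => list of cached items
  end

  # Only persist items fetched outside of private mode.
  def store(feed_url, items, private_mode)
    @entries[feed_url] = items unless private_mode
  end

  # In private mode, refuse to serve previously cached content at all;
  # the caller has to re-authenticate and fetch fresh.
  def fetch(feed_url, private_mode)
    private_mode ? nil : @entries[feed_url]
  end
end

cache = FeedCache.new
cache.store("http://example.com/feed", ["item 1", "item 2"], false)
p cache.fetch("http://example.com/feed", false)  # => ["item 1", "item 2"]
p cache.fetch("http://example.com/feed", true)   # => nil -- no leakage into private mode
```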

Does this actually expose private information in practice? I can't guess how you'd exploit it, but it certainly doesn't give me a warm fuzzy feeling either.

Monday, May 17, 2010

SEO's "killer app"

I was stopped short when I saw this article about the recent connection between pesticides and ADHD. It wasn't the subject that I found interesting, but the hyperlinks... in particular, as I was reading the article, something stood out as bizarre:
"Chemical influences like pesticides used to protect produce from insects are believed to contribute heavily to this upward trend. Some scientists believe it may have an even greater impact than other environmental factors like video games, television and online personal loan advertisements that may have been linked previously to ADHD behavior." [emphasis mine and original link to http://personalmoneystore.com removed.]
Really? Online personal loan advertisements have been linked to inducing ADHD behavior? Of course, I was curious and clicked the link... but it only went back to a rather obnoxious personal loan advertisement on the host site. I looked at other links in the story and realized they all pointed to the ad in interesting ways. Like the text about US and Canadian children... "Canadian" links to "ontario-payday-loans" on the site. Oh... oh that's dirty.

I traced the whois for the site to adworkz.com, an online marketing company specializing in SEO (or Search Engine Optimization). My first thought was that they had simply scraped the original article off the Net and then inserted links -- but I couldn't find any exact matches for this story. However, another story on their site yielded a bunch of exact matches on other sites.

So maybe they are doing something genuinely innovative for SEO... they paraphrase articles that appear to be legit news stories (legit, that is, unless they were scraped) and then overlay the rewrite with links to their ads. Hey, I'll give them some credit: the links weren't completely random.

However, this is still a relatively "hard" way to get the numbers required for SEO campaigns -- rewriting pieces and hand-linking them isn't the most scalable business model. And because it takes "human-time" to do, Google can conceivably keep pace. But what if there were an automatic way to paraphrase?


SEO's Killer App

Unfortunately, automated paraphrasing is exactly what's around the corner. Anne Eisenberg's article "’Get Me Rewrite!’ ’Hold On, I’ll Pass You to the Computer’" describes how researchers at MIT and Cornell have come up with probabilistic methods for generating paraphrased content automatically. The code has been out there for a long time already, and there are several interesting projects along similar lines, such as Sujit Pal's Lucene summarizer, Classifier4J, Open Text Summarizer, and MEAD.

Imagine you are an SEO marketer... you suddenly have the ability to automatically paraphrase content from other sources (even the New York Times!), but because there are no identical phrases, Google can't detect the duplication (and therefore can't penalize your ranking). In fact, it's difficult for humans to tell which is the paraphrase and which is the original... and unlike most scraped content feeds, it looks completely legit to readers who stumble across it via search engines.
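To see why exact-phrase matching gives up so easily, here's a rough sketch of the generic shingling technique used for near-duplicate detection (the textbook idea, not a claim about Google's actual pipeline): a straight scrape shares nearly all of its word n-grams with the original, while a decent paraphrase shares almost none.

```ruby
require "set"

# Rough illustration of shingle-based near-duplicate detection
# (the generic technique, not any search engine's implementation).

# Break a text into overlapping word n-grams ("shingles").
def shingles(text, n = 4)
  words = text.downcase.gsub(/[^a-z\s]/, "").split
  (0..words.length - n).map { |i| words[i, n].join(" ") }.to_set
end

# Jaccard similarity between the two shingle sets: |A ∩ B| / |A ∪ B|
def similarity(a, b)
  sa, sb = shingles(a), shingles(b)
  (sa & sb).size.to_f / (sa | sb).size
end

original   = "Some scientists believe pesticides may have an even greater impact than other environmental factors."
scraped    = original
paraphrase = "Certain researchers think exposure to pesticides affects children far more strongly than other parts of their environment."

puts similarity(original, scraped)     # 1.0 -- exact copies are trivially caught
puts similarity(original, paraphrase)  # ~0 -- the shared shingles all but disappear
```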

Mix this with other SEO techniques, such as dynamic domain generation, sell into several different TLDs, and search engines would not be able to detect or defeat the flood of seemingly legitimate links.

So why am I helping the "evil" SEO marketers out there by spelling out this hard-to-stop exploit and possibly bringing about the demise* of search engines?

*Yes, this is what unrestrained SEO might do. People only use search engines when they yield useful results -- when every link goes to an ad, a trick, a deception, people will stop using the service. Notice the chilling effect that telemarketing has had on land-line phones (many people have cell phones only now), or that door-to-door salesmen had on a previous era.

The Counter Move

Fortunately, search engines have some awesome counter-moves at their disposal. It's the same thing that identifies spam email so readily: the target URL. SEO marketers have many, many tricks, but the one thing they can't disguise is the link they want you to go to. So this is how you identify them:

"SURBLs are lists of web sites that have appeared in unsolicited messages. Unlike most lists, SURBLs are not lists of message senders... Web sites seen in unsolicited messages tend to be more stable than the rapidly changing botnet IP addresses used to send the vast majority of them... Many applications support SURBLs, including SpamAssassin and filters for most major MTAs including sendmail, postfix, qmail, exim, Exchange, qpsmtpd and others."



Wednesday, May 12, 2010

consumers hooked on oil? please.

I don't agree with Peter Maass' assessment that
"because when you kind of get down to it, American consumers do want to have their gasoline."
As a consumer, I'd love to spend less on gas. One practical example of how I could do this is to telecommute to work (not every job allows this, but some do), but most corporations don't believe it's an option. How about WebEx meetings instead of business travel? Many don't believe this is as good as face time and are willing to pay for air travel. There are many other alternatives I could list that require unacceptable tradeoffs, drastically reduced standards of living, or drastically higher expenses to support alternative energy. The infrastructure for what he wants simply doesn't exist yet.

So what are consumers going to do? Of course they're going to want their gasoline because there is no way to make a living without it. Give us an alternative before accusing us of inaction! Personally, I think rising gas prices will ultimately motivate alternative energy. As alternative energy becomes cheaper than oil, consumers will be happy to switch, as will the corporations and markets that connect them.

But Maass' premise is insulting. He thinks we can afford it and we're just being petulant. Instead, he should be focusing his energy on the existing economic system that was built on oil. Of course we want to get off oil dependency as soon as possible and move towards sustainable energy, but Maass wants the consumer to single-handedly bear the cost of switching without any help from the structure of corporations or governments.

Most of us can't afford that and Maass is forgetting where the consumer's power to spend money comes from in the first place: the ability to earn money and spend responsibly within that structure, not apart from it.

Tuesday, May 4, 2010

Ben Ward says "Link to it!!"

Ben Ward's post about the difference between web-like rich applications and real web content is awesome.

I think it pins down some distinctions that have been brewing for a while, and solidifies several of my own thoughts about it.

For example, his statement:
"If you want to build the most amazing user interface, you will need to use native platforms."
This is the fuel behind Java applets, Flash, SVG, VRML, and every other extension to the web developed so far. It's about presentation. It's something the W3C does exceptionally poorly, and I think they shouldn't attempt to do it at all. It's also something that every tools company has struggled with: the only way to extend kick-ass native platforms into the browser is to create a new plugin, and the only way to get a new plugin recognized is ubiquity among the user base. Flash arguably won that battle, but it's a losing war in the long run because now it's Flash or nothing. The other "open" alternatives are valid attempts in their own right, but have never been quite ubiquitous enough. And HTML5 simply shuffles a bunch of capabilities from plugins to the browser, which certainly makes those capabilities ubiquitous, but doesn't necessarily make them better, or even more open (as Ben points out about H.264).

But Ben also makes a brilliant case for content:
"Want to know if your ‘HTML application’ is part of the web? Link me into it. Not just link me to it; link me into it. Not just to the black-box frontpage. Link me to a piece of content. Show me that it can be crawled, show me that we can draw strands of silk between the resources presented in your app. That is the web: The beautiful interconnection of navigable content. If your website locks content away in a container, outside the reach of hyperlinks, you’re not building any kind of ‘web’ app. You’re doing something else."
This idea, the linking of content, is exactly what the W3C has always excelled at. It's awesome, the idea that I can simply link and connect pieces of content from all over the world, make my own connections and conclusions, and in turn become a piece of content for someone else. This was Tim Berners-Lee's original vision of the web, and I agree with Ben: it shouldn't be lost. We should fight for it. Maybe the real problem is that presentation and content have gotten so tangled and confused that we've forgotten the difference?

This galvanizes me to talk about some experiments I've been doing recently to make simple websites that make the distinction VERY VERY clear. Let me show you an example:

This semester, I had to produce "mini-sites" for class homework assignments. These mini-sites basically had to run as static HTML because they had to be viewed from the local filesystem of a TF after being zipped up and submitted. They had to be viewable on the wide range of platforms used by the TFs, which effectively meant vanilla HTML/CSS, but I didn't want to have to code a TOC by hand either. What I came up with was an XML file tuned exactly to my content.

Take a look at the finished HTML page: writeup.html

This was generated from a handcrafted XML file: homework.xml. (You'll have to "View Source" to see the XML if you're in an XSLT-capable browser like Firefox, because it will automatically render the HTML for you.) There are very few presentation aspects in this XML file, and quite a few "new" custom elements I created that are specific to the structure of my assignments. It's small and ad hoc, which is fine. It's basically a microformat without all of the extraneous divs.

I render it with XSLT using this file for the presentation layer: homwork.xsi. This file contains all the rejiggering and mucking about that presentation layers require. Even so, I tried to make it as clean as possible by utilizing CSS where I could and getting creative with XSLT. It can even provide microformats-compatible semantic markup in the generated div elements -- the best of both worlds! Finally, I generated the static HTML file using a very simple Ruby script: genhtml.rb.
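For the curious, genhtml.rb boils down to little more than "load XML, apply XSLT, write HTML." Here's a hedged sketch of that step using Nokogiri -- the element names, stylesheet, and file names below are illustrative stand-ins, not the actual homework.xml vocabulary:

```ruby
require "nokogiri"

# Sketch of an XML-to-static-HTML generator in the spirit of genhtml.rb.
# The content and stylesheet below are illustrative placeholders.

content = Nokogiri::XML(<<-XML)
<assignment course="CS101" number="3">
  <problem id="1">
    <statement>Implement a linked list.</statement>
    <answer>See linked_list.rb.</answer>
  </problem>
</assignment>
XML

stylesheet = Nokogiri::XSLT(<<-XSL)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/assignment">
    <html>
      <body>
        <h1><xsl:value-of select="@course"/> Homework <xsl:value-of select="@number"/></h1>
        <xsl:for-each select="problem">
          <div class="problem">
            <h2>Problem <xsl:value-of select="@id"/></h2>
            <p class="statement"><xsl:value-of select="statement"/></p>
            <p class="answer"><xsl:value-of select="answer"/></p>
          </div>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
XSL

# Apply the stylesheet and write plain static HTML -- no server required.
File.write("writeup.html", stylesheet.transform(content).to_html)
```

The custom elements stay purely structural; everything visual lives in the stylesheet (and the CSS it references), which is the whole point.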

Sure, it's rough and simple (I didn't have a lot of time), but it highlights what I want to show: we don't need fancy web platforms or standards to separate content from presentation. We can do it right now, today, with the tools we have; it just requires the right mindset.

I'd love to see WebGL, Processing, and Flash sites that have separate RESTful URLs that lay out all their internal content in browsable XML or JSON microformats -- easy to link to, easy to transform and include in other pages or references. (btw, Processing already does this somewhat by putting links to the source code in the applet page -- brilliant!)

I'd also love to see brilliant new presentation-layer technologies form -- WebGL, Processing for the web, JavaScript, Flash, WebKit, etc. -- because I want to innovate and create that new killer interface and interaction model for my content that no one has thought of or implemented before!

I think Ben's post and my experiments show a way that we can have both of these things at the same time as long as we don't get confused and tangled up in the standards war or the RIA platform war and forget the difference between content and presentation.

Monday, May 3, 2010

O'Reilly's Internet Operating System FTW!!

I'm glad O'Reilly wrote his post "The State of the Internet Operating System." I agree with it a lot.

I said this years ago. In fact, I'll repeat the idea here: just like early video devs went through hell writing to non-standard graphics card memory models, early web devs have had to struggle with non-standard serialization and presentation formats.

And just like back then, a crop of "device drivers" slowly formed (first from DOS-era libraries like the Miles Sound System, then later via the OS) -- well, it's happening now: people fed up with the conflicting standards of the W3C and webdev hell are starting to create "web-standard"-independent "drivers" -- like WebKit, like jQuery -- that don't care which browser or which version of HTML or CSS you really have, and that degrade gracefully by offering "hardware abstractions" (or in this case, browser and W3C standards abstractions!). And JavaScript is rapidly becoming the language of the new platform because no one else is moving fast enough.

In my mind it's time to put the W3C to bed where they belong -- leave them as keepers of a data standard for the semantic web -- but for crying out loud, give the presentation devs a standard platform we can actually code to, with the solidity and flexibility of PostScript or RenderMan!

IMHO, the next steps (great leaps forward) are standard serialization formats. Who gives a flying &#&#@ if you are doing a GET or a POST, JSON or XML, or Java-to-JavaScript, or what the hell have you -- SERIOUSLY, it's as insane as memory management was before contiguous memory controllers (does anyone remember those days?) -- it's all state serialization, and it should have a freakin' standard way of being written and read no matter where it comes from or where it goes to. I'm not talking implementation details, I'm talking raw syntax.

ActiveRecord and JSON begin to get close to this, but we've got a LONG way to go before I can just say "give me an address from the database, post it to that RESTful service, and write it to a memory cache over there" -- and have it all be the same damn syntax! Why reinvent the wheel 80 bazillion times?
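To make the complaint concrete, here's a toy sketch of what I'm asking for (all names and endpoints below are made up; this isn't any existing library): one record, one canonical serialization, and three destinations -- a pretend database, a RESTful POST, and a memory cache -- all reading and writing the exact same representation.

```ruby
require "json"
require "net/http"
require "uri"

# Toy sketch of "one syntax for every serialization target": the record is
# turned into a single canonical representation (JSON here), and every
# destination consumes those same bytes. Names and URLs are hypothetical.

address = { name: "Ada Lovelace", street: "12 Analytical Way", city: "London" }
payload = JSON.generate(address)          # one canonical serialization

# 1. "Database": pretend persistence, keyed by name.
database = {}
database[address[:name]] = payload

# 2. RESTful service: POST the same bytes (example.com is a placeholder).
uri = URI("http://example.com/addresses")
Net::HTTP.post(uri, payload, "Content-Type" => "application/json")

# 3. Memory cache: again, the same bytes.
cache = {}
cache["address:#{address[:name]}"] = payload

# Reading back is symmetric: parse the same representation wherever it lives.
restored = JSON.parse(cache["address:#{address[:name]}"], symbolize_names: true)
p restored
```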

/rant off