Thursday, December 23, 2010

how do you "freeze" individual gems in Rails 3?

OK, this wasn't documented anywhere (the best I could find was the Rails 3.0 announcement and the Bundler rationale), so after reading the Rails guide I hacked around with it on my own.

In the old days (Rails 2) you could do something like this:
rake gems:unpack
and it would take the required gems and copy them into vendor/gems.

But Rails 3.0 gets rid of this handy feature in favor of Bundler's approach. Bundler's basic approach assumes that you'll always run bundle install in every deployment environment and that it can go to the internet to get the gems. Neither of these assumptions is valid in my case.

There is one way of doing this:
bundle install --deployment
but that copies everything -- rails, ruby, 10,000 gems I've never heard of (well, yeah, I've heard of 'em, but sheesh) -- and nicely bloats the target environment. So it seems it is all or nothing.

Anyone else figure this out?
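For what it's worth, the closest middle ground I've since seen is Bundler's own caching: `bundle package` copies just the .gem files your Gemfile needs into vendor/cache, and `bundle install --local` then installs from there without touching the network. You can also pin an individual gem to an unpacked local copy with `:path`. A sketch (the gem names, versions, and paths here are made up for illustration, not from this post):

```ruby
# Gemfile -- pinning one gem to a local unpacked copy (illustrative names/paths)
source "http://rubygems.org"

gem "rails", "3.0.3"

# Bundler resolves this gem from the local directory instead of the network:
gem "nokogiri", :path => "vendor/gems/nokogiri-1.4.4"
```

Run `bundle package` once while you're online; deployments can then use `bundle install --local` and never hit the internet.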

Thursday, December 16, 2010

the single most important unstated rule about BDD...

I've been co-leading a book club at work studying the RSpec Book. One thing that was difficult for me and other people new to Behavior-Driven Development (BDD) is that it often seems like the authors modify code in the BDD part of the cycle instead of waiting until the Test-Driven Development (TDD) part of the cycle.

For example, the first chapters dealing with BDD seem to make a lot of changes in actual code files and only at the end of Chapter 4 do they drop the inscrutable phrase, "now we have our first logical error." I know they mean "our first error that requires going into TDD" (because that's the next chapter), but how does this differ from all the code mucking they've already done? At first glance it looks like you can just skip around writing whatever code you want.

But I've found that the authors of the RSpec Book follow a really important rule that they don't come out and say:
When in the BDD cycle you can define new classes or methods (and even arguments to methods) BUT you are not allowed under any circumstances to change or define the implementation of those classes and methods.
This is super important because it explains why the authors sometimes write code while they're still in the BDD phase -- they aren't skipping around after all!

It makes sense when you think about it because BDD is all about the behavior and interaction between classes (outside), whereas TDD is all about the functionality and implementation of classes (inside). So, of course, in BDD you'd want the flexibility to define an interface on the outside as much as possible before being driven down into TDD.
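Here's a tiny sketch of what the rule permits (a hypothetical example of my own, not code from the book):

```ruby
# While still in the outer BDD cycle you may introduce a class, its methods,
# and even their arguments -- but every body stays empty. Implementations are
# off-limits until the inner TDD cycle drives them with unit-level examples.
class Codebreaker
  def initialize(output)   # allowed: a new method with a new argument
    @output = output
  end

  def start
    # allowed: the method exists so a scenario can call it;
    # NOT allowed: writing what it actually does.
  end
end

game = Codebreaker.new($stdout)
game.respond_to?(:start)   # => true -- enough for a scenario to run (and fail usefully)
```

The empty bodies are the whole point: the scenario can execute end-to-end and fail for the right reason, which is what drops you into TDD.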

Many programmers are confused at first about the difference between BDD and TDD, and consequently many have blogged about the philosophical differences (google it!), but I haven't seen any clear rules before. This rule really helps solidify the differences in practical terms, so you know exactly when you are in BDD and when you aren't.

Monday, November 22, 2010

are entropy and risk related?

A friend of mine who is a plasma physicist posted a personal version of his presentation on the nature of the instability generated by a Maxwell's Demon wire-array. Well, who doesn't like Maxwell's Demon, the little imp who goes around subverting the Second Law of Thermodynamics by decreasing entropy?

Well since my friend's version had a lot more physics humor in it, I said he had invented a new field: "stand-up physicist" (although some may claim Feynman has first dibs on that)... my friend went on to say that next time he should relate the demon size collapse to our financial cycles. Heh heh...

But wait... I think that implies a really interesting question: is the concept of risk (including financial risk) somehow related to entropy?

In a time when our markets are being determined more and more by the math of physics and information theory, the idea that lowering financial risk is somehow akin to lowering entropy would be a very deep insight into the limits of a financial system.

Think about it. So far, Maxwell's demon hasn't beaten the 2nd law on the large... you may lower local entropy, but in the large, things always bounce back to a net entropy increase. Sound familiar? Markets and quants may be able to locally lower risk through use of financial derivatives, but ultimately, in the large, the markets always bounce back.

Wow. I just googled for "risk entropy" and apparently people in the field are already well aware of the connection. Well, even if it's not original, it's still a fascinating relationship.

Actually, this source summed it up great (duh!):

Any project, large or small is associated with expected and unexpected problems. The analogy mentioned above could be derived from the Second Law of Thermodynamics. The Second law of thermodynamics deals with a concept : Entropy. Entropy, in short, is the amount of disorderliness of the system. Entropy is also a measure on the information contained in an system. In information technology, entropy is considered as the amount of uncertainty in an given system. This has a defined relation, "As the amount of information increases, the disorderliness of a system (entropy) decreases".
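The information-theoretic entropy that quote gestures at has a concrete formula -- Shannon's H = -Σ p·log2(p), the average uncertainty of a distribution. A quick sketch (my illustration, not from the quoted source):

```ruby
# Shannon entropy in bits: the "amount of uncertainty" the quote refers to.
def entropy(probs)
  probs.reject { |p| p.zero? }
       .reduce(0.0) { |h, p| h - p * Math.log2(p) }
end

entropy([0.5, 0.5])   # => 1.0  (a fair coin: maximum uncertainty for two outcomes)
entropy([1.0, 0.0])   # => 0.0  (a certain outcome carries no uncertainty at all)
```

Lowering entropy means concentrating probability into fewer outcomes -- which is a pretty fair description of what risk management tries to do.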

Thursday, August 26, 2010

compiling opengl-redbook examples on Lucid guest


Update: Rats! Turns out that any video-settings change or reboot gets confused with the nvidia driver present -- even though it isn't active, it doesn't play nicely with the VirtualBox driver... it convinces it there is no 3D present and runs in software (dirt slow). I eventually uninstalled it... maybe it's freeglut, but I'm not sure. Needs more research. Later.


I'm starting a class this fall in computer graphics and I thought I'd experiment with compiling some of the examples in the redbook. I have to wait for our class copies of Visual Studio (our course is taught targeting Windows), but I wanted to try some things, so I decided to use my Lucid Lynx Ubuntu 10.04 instance to compile examples.

First off, VirtualBox rocks. You have to hand it to the team because their 3D acceleration is strictly super-awesome... it lets me run full compiz settings while running on an i7 920 / GTX 260-equipped Windows 7 host. It's fast! Like pretty close to native fast. So fast that I simply use Ubuntu in VM mode instead of dual-booting. (Actually, I pretty much like everything about Windows 7 too, except maybe the filesystem changes, but there are more tools available for Linux, so it helps to have the best of both worlds.) Make sure 3D acceleration is enabled in your VM before you start; I also used the max setting of 128 MB VRAM. Anyway, back to the story...

Once in the guest OS, I downloaded the samples via apt-get, installed the usual libs... in this case the ones suggested here. Like the poster I tried the shipped makefile at first and got:

$ make
make: *** No rule to make target `$@.o', needed by `hello'. Stop.

Ugh. I suspect GNU make doesn't expand automatic variables like $@ in a prerequisite list (at least not without .SECONDEXPANSION), but I gave up trying to understand it too much (yeah, lazy) and switched to the suggested longhand:

$ g++ hello.c -lGL -lGLU -lglut -o hello

Success! Then I ran it:

$ ./hello
OpenGL Warning: XGetVisualInfo returned 0 visuals for 0x24c10a0

Segmentation fault

Ugh. Not what I wanted. Did some poking around and found that nVidia and ATI "driver" installs on Linux *replace* the default Mesa -lGL and -lGLU libraries with their own. So on a lark I tried

$ sudo apt-get install nvidia-glx-185-dev

And it worked like a charm!!

That's kind of obvious (because the VirtualBox 3D is supposed to be as clean a passthru as possible) but also kind of amazing (because the Ubuntu guest is set up to use the VirtualBox driver, not the nvidia 185 driver for Ubuntu) -- it just worked... at least for this sample! :)


Saturday, August 7, 2010

DRM should live in the document, not the device

I started taking a closer look at Kindle. A new semester is starting up and I noticed that several of the optional textbooks for the course are available on Amazon in Kindle format, so I was interested in seeing what the format could do.

Here's what I found:

  • If you are writing academic papers you can't cite Kindle versions very well. My suggestion to the Kindle development team is to expose "Locations" as standard URIs instead of the current proprietary bookmarks that only work on Kindle. URIs would be sharable and documentable and thus fit better with existing citation standards for electronic sources.

  • If you are using Kindle for PC and you have a programming book with source code samples, you can't copy/paste the samples from the app into your editor. It's very ironic that you have to retype code snippets when you're reading an ebook on your PC.

  • If you have a book and a friend asks you about it, a very common social case is to say "oh, here, you can borrow it, I'm not reading it right now" -- you can't do this with the Kindle because the DRM is in the device, not the document. I have to loan my entire library (and the Kindle too) in order to satisfy this use case. For this reason, I'm beginning to think that DRM (document rights management) should live in the doc, and merely enforce uniqueness and have nothing to do with "rights" or licenses per se.
I know the suits will be shocked at this idea: "but but, we were going to get all this money from individual sales forced on the customers!" -- no, I don't think you would. I think people will simply shrug and say "sorry, can't loan you the book, but you can buy your own" and then people will either say "ok", or "never mind".

However, as books move to this new medium, I realize that the concept of a "library" simply doesn't work with DRM in the device. You can't loan licenses in the current model. Of course, the publishers again want to water at the trough of infinite profits -- but I think the reality will be far worse -- libraries will simply dry up and there will be little electronic initiative to replace them.

Open formats may fill a little of the gap, but right now this is limited to academic papers and a handful of independent authors... it's hardly enough to keep libraries working... furthermore, open formats don't really encourage a library (except maybe digitally speaking) because you can just make copies of them.

If DRM lived in the doc, I could buy it, I could share it, but I couldn't copy it. If my friend had it, I wouldn't have it. This satisfies the "scarcity" requirement of the publishing industry (without scarcity, there is no value for books or any media).

But somewhere along the way, DRM became about "licenses" tied to devices. So I can't share, I can't own, but everyone can buy their own copies. Of course, the consumer market says "hey, well in that case, I don't use that book all the time, it's not worth the same price" -- and publishers are again shocked that consumers don't want to pay for ebooks at only 20% off.

Even the O'Reilly subscription model is hard to stomach at $300-400/yr for access to their entire library. I've paid a huge amount, but at the end of the year, I own nothing. It's really hard to see any value in that arrangement unless I'm constantly using 10 books every day, and even then... I can't share them with any colleagues. Then again, at $50-$100 per computer graphics book, and factoring in that such technology books are obsolete within a couple of years on average -- after a certain point, subscriptions do look "cheaper"... but it's still pretty expensive from my perspective.

I ended up buying physical copies.

Wednesday, July 28, 2010

how to call lambdas without call()

One thing that is rather nifty in javascript is the ability to assign anonymous functions to variables and then simply call them. For example, Protovis has this nifty method for creating a mapping function from the specified domain to the specified range:
var y = pv.Scale.linear(0, 100).range(0, 640);
y(100); // 640
y(0); // 0
y(50); // 320
Neat!! Well, how about Ruby?

Well, I've been reading Metaprogramming Ruby (which is a really fun book so far) and we have lambdas. However, with a lambda, you usually have to assign the lambda and then call() it.

It looks like this:
f = lambda {|x| x}
f.call(10) # 10

But I wanted this to be more like javascript's syntax, so I did some tinkering. Here's what I came up with:
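The actual code from that tinkering isn't shown above, so here's a hedged guess at the kind of thing it might be: Ruby 1.9 lambdas already ship with call sugar that gets close to the javascript syntax. The linear_scale helper is my own invention, modeled on the Protovis example:

```ruby
# A lambda that maps a domain onto a range, like pv.Scale.linear (my helper,
# not from the original post):
def linear_scale(d0, d1, r0, r1)
  lambda { |x| r0 + (x - d0) * (r1 - r0).to_f / (d1 - d0) }
end

y = linear_scale(0, 100, 0, 640)

y.call(50)   # => 320.0  the long form
y.(50)       # => 320.0  Ruby 1.9 "dot-parens" sugar for #call
y[50]        # => 320.0  Proc#[] is an alias for #call
```

You still can't write bare y(50) -- Ruby resolves that as a method call, not a variable lookup -- but y.(50) and y[50] get pleasantly close.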

Wednesday, May 19, 2010

the privacy of feedly feeds

A while ago on Twitter I confronted Feedly about an apparent hole in their Firefox plugin:

They claimed they don't store credentials per se, and after further investigation I believe them, but there's still something not quite right.

See, if you install the plugin, everything appears normal:

But when you turn on Firefox's "Private Browsing" mode and click the Feedly button, you still see your feeds!

Fortunately, after a while, Feedly attempts to update your feed and displays the login screen:

So this tells me that what Feedly says is probably true: they don't cache your credentials in the plugin. However, they apparently do cache content from your feeds for a little while, until the next refresh period. By itself, this content cache isn't a bad thing (it's a performance optimization and saves network bandwidth) -- but the fact that their local content cache doesn't respect privacy modes in the browser is somewhat disturbing... does that mean they cache outside the browser's storage model? Or does that mean Firefox doesn't secure local data? Either conclusion would be troubling.

Does this actually expose private information in practice? I can't guess how you'd exploit it, but it certainly doesn't give me a warm fuzzy feeling either.

Monday, May 17, 2010

SEO's "killer app"

I was stymied when I saw this article about the recent connection between pesticides and ADHD. It wasn't the subject that I found interesting, but the hyperlinks... in particular as I was reading the article something stood out as bizarre:
"Chemical influences like pesticides used to protect produce from insects are believed to contribute heavily to this upward trend. Some scientists believe it may have an even greater impact than other environmental factors like video games, television and online personal loan advertisements that may have been linked previously to ADHD behavior." [emphasis mine; original link removed.]
Really? Online personal loan advertisements have been linked to inducing ADHD behavior? Of course, I was curious and clicked the link... but it only went back to a rather obnoxious personal loan advertisement on the host site. I looked at other links in the story and realized they all pointed to the ad in interesting ways. Like the text about US and Canadian children... "Canadian" links to "ontario-payday-loans" on the site. Oh... oh that's dirty.

I traced the whois for the site to an online marketing company specializing in SEO (Search Engine Optimization). My first thought was that they had simply scraped the original article off the Net and then inserted links -- but I couldn't find any exact matches for this story. However, another story on their site yielded a bunch of exact matches on other sites.

So maybe they are doing something especially innovative for SEO... they are paraphrasing articles, which seem to be legit news stories (I mean they are unless they are scraped) - and then overlay the article with links to their ads. Hey, I give them some credit, the links weren't completely random.

However, this is still a relatively "hard" way to get the numbers required for SEO campaigns -- rewriting pieces and hand-linking them isn't the most scalable business model. And because it takes "human-time" to do, Google can conceivably keep up with the Joneses. But what if there was an automatic way to paraphrase?

SEO's Killer App

Unfortunately, automated paraphrasing is exactly what is around the corner. Anne Eisenberg's article "’Get Me Rewrite!’ ’Hold On, I’ll Pass You to the Computer,’" describes how researchers at MIT and Cornell have come up with probabilistic methods of generating paraphrased content automatically. The code has been out there for a long time already and there are several interesting projects along similar lines such as Sujit Pal's Lucene-summarizer, Classifier4J, Open Text Summarizer and MEAD.

Imagine you are an SEO marketer... you suddenly have the ability to automatically paraphrase content from other sources (even the New York Times!), but because there are no identical phrases, Google can't detect (and therefore can't block) your ranking. In fact, it's difficult for humans to tell which is the paraphrase and which is the original... and unlike most scraped content feeds, it looks completely legit to readers who stumble across it via search engines.

Mix this with other SEO techniques, such as dynamic domain generation, and sell into several different TLDs, and search engines would not be able to detect or defeat the flood of seemingly legitimate links.

So why am I helping the "evil" SEO marketers out there by foretelling this unstoppable weakness and possibly bringing about the demise* of search engines?

*Yes, this is what unrestrained SEO might do. Search engines are only used by people when they yield useful results -- when every link goes to an ad, a trick, a deception, people will stop using the service. Notice the chilling effect that telemarketing has had on land-line phones (many people have cell phones only now), or that door-to-door salesmen had in a previous era.

The Counter Move

Fortunately, search engines have some awesome counter-moves at their disposal. It's the same thing that identifies spam email so readily: the target URL. SEO marketers have many, many tricks, but the one thing they can't disguise is the link they want you to go to. So this is how you identify them:

"SURBLs are lists of web sites that have appeared in unsolicited messages. Unlike most lists, SURBLs are not lists of message senders... Web sites seen in unsolicited messages tend to be more stable than the rapidly changing botnet IP addresses used to send the vast majority of them... Many applications support SURBLs, including SpamAssassin and filters for most major MTAs including sendmail, postfix, qmail, exim, Exchange, qpsmtpd and others."
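To make the quote concrete: SURBL lookups are plain DNS queries against a special zone -- you resolve "<domain>.multi.surbl.org", and any A-record answer means the domain is listed (NXDOMAIN means it isn't). A sketch of how a filter forms the query name; the helper and its naive two-label domain-stripping are my own, not part of the SURBL spec:

```ruby
require 'uri'

# Build the DNS name a SURBL client would look up for a given link.
# (Real clients use a proper public-suffix list; last(2) is a crude stand-in.)
def surbl_query_name(url)
  host = URI.parse(url).host
  domain = host.split(".").last(2).join(".")
  "#{domain}.multi.surbl.org"
end

surbl_query_name("http://spam.example.com/buy")  # => "example.com.multi.surbl.org"
```

Resolving that name (e.g. with Resolv.getaddress) and catching the NXDOMAIN error gives you a listed?/not-listed answer.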

Wednesday, May 12, 2010

consumers hooked on oil? please.

I don't agree with Peter Maass' assessment that
"because when you kind of get down to it, American consumers do want to have their gasoline."
As a consumer, I'd love to spend less on gas. One practical example of how I could do this is to telecommute to work (not every job can do this, but some can), but most corporations don't believe this is an option. How about WebEx meetings instead of business travel? Many don't believe this is as good as face time and are willing to pay for air travel. There are many other alternatives I could list, but they either require unacceptable tradeoffs, drastically reduced standards of living, or drastically higher expenses to support alternative energy. The infrastructure for what he wants simply doesn't exist yet.

So what are consumers going to do? Of course they're going to want their gasoline because there is no way to make a living without it. Give us an alternative before accusing us of inaction! Personally, I think rising gas prices will ultimately motivate alternative energy. As alternative energy becomes cheaper than oil, consumers will be happy to switch, as will the corporations and markets that connect them.

But Maass' premise is insulting. He thinks we can afford it and we're just being petulant. Instead, he should be focusing his energy on the existing economic system that was built on oil. Of course we want to get off oil dependency as soon as possible and move towards sustainable energy, but Maass wants the consumer to single-handedly bear the cost of switching without any help from the structure of corporations or governments.

Most of us can't afford that and Maass is forgetting where the consumer's power to spend money comes from in the first place: the ability to earn money and spend responsibly within that structure, not apart from it.

Tuesday, May 4, 2010

Ben Ward says "Link to it!!"

Ben Ward's post about the difference between web-like rich applications and real web content is awesome.

I think it pins down some distinctions that have been brewing for a while, and solidifies several of my own thoughts about it.

For example, his statement:
"If you want to build the most amazing user interface, you will need to use native platforms."
This is the fuel behind java applets, Flash, SVG, VRML and every other extension to the web developed so far. It's about presentation. It's something the W3C does exceptionally poorly, and I think they shouldn't attempt to do it at all. It's also something that every tools company has struggled with: the only way to extend kick-ass native platforms into the browser is to create a new plugin, and the only way to get a new plugin recognized is ubiquity among the user base. Flash arguably won that battle, but it's a losing war in the long run because now it's Flash or nothing. The other "open" alternatives are valid attempts in their own right, but have never been quite ubiquitous enough. And HTML5 simply shuffles a bunch of capabilities from plugins into the browser, which certainly makes those capabilities ubiquitous, but doesn't necessarily make them better, or even more open (as Ben points out about H.264).

But Ben also makes a brilliant case for content:
"Want to know if your ‘HTML application’ is part of the web? Link me into it. Not just link me to it; link me into it. Not just to the black-box frontpage. Link me to a piece of content. Show me that it can be crawled, show me that we can draw strands of silk between the resources presented in your app. That is the web: The beautiful interconnection of navigable content. If your website locks content away in a container, outside the reach of hyperlinks, you’re not building any kind of ‘web’ app. You’re doing something else."
This idea, the linking of content, is exactly what the W3C has always excelled at. It's awesome, the idea that I can simply link and connect pieces of content from all over the world, make my own connections and conclusions, and in turn become a piece of content for someone else. This was Tim Berners-Lee's original vision of the web and I agree with Ben: it shouldn't be lost. We should fight for it. Maybe the real problem is that presentation and content have gotten so tangled and confused that we've forgotten the difference?

This galvanizes me to talk about some experiments I've been doing recently to make simple websites that make the distinction VERY VERY clear. Let me show you an example:

This semester, I had to produce "mini-sites" for class homework assignments. These mini sites had to basically run as static html because they had to be viewed from the local filesystem of a TF after being zipped up and submitted. They had to be viewable on the wide range of platforms used by the TFs, which effectively meant vanilla HTML/CSS, but I didn't want to have to code a TOC by hand either. What I came up with was an xml file tuned exactly to my content.

Take a look at the finished html page: writeup.html

This was generated from a handcrafted xml file: homework.xml. (You'll have to "View Source" to see the xml if you're in an XSLT-capable browser like Firefox, because it will automatically render the HTML for you.) There are very few presentation aspects in this xml file, and quite a few "new" custom elements I created that are specific to the structure of my assignments. It's small and ad-hoc, which is fine. It's basically a micro-format without all of the extraneous divs.

I render it with xslt using this file for the presentation layer: homwork.xsi. This file contains all the rejiggering and mucking about that presentation layers require. Even so, I tried to make it as clean as possible by utilizing CSS where I could and getting creative with XSLT. It can even provide microformats-compatible semantic markup in the generated div elements -- the best of both worlds! Finally, I generated the static html file using a very simple ruby script: genhtml.rb.
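Ruby's standard library has no XSLT engine, so a genhtml.rb like that typically shells out to xsltproc or uses a binding like libxslt-ruby. The content-vs-presentation split itself, though, can be sketched with nothing but stdlib REXML -- the element names below are invented for illustration, not taken from the actual homework.xml:

```ruby
require 'rexml/document'

# Content layer: custom elements tuned exactly to the material, zero styling.
xml = <<XML
<assignment title="Homework 3">
  <problem name="hello"><notes>First OpenGL program.</notes></problem>
  <problem name="double-buffer"><notes>Animation with swap buffers.</notes></problem>
</assignment>
XML

doc = REXML::Document.new(xml)

# Presentation layer: a separate pass that decides what HTML the content becomes.
items = doc.root.get_elements("problem").map do |p|
  "<li><b>#{p.attributes['name']}</b>: #{p.elements['notes'].text}</li>"
end
html = "<h1>#{doc.root.attributes['title']}</h1>\n<ul>#{items.join}</ul>"
puts html
```

Swap the second half for an XSLT stylesheet and you have essentially the pipeline described above: the content file never changes when the presentation does.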

Sure it's rough and simple (I didn't have a lot of time), but it highlights what I want to show: we don't need fancy web platforms or standards to separate content from presentation. We can do it right now, today, with the tools we have, it just requires the right mind-set.

I'd love to see WebGL, Processing and Flash sites that have separate RESTful urls that flay out all their internal content in browseable xml or JSON microformats, easy to link to, easy to transform and include in other pages or references. (btw, Processing already does this somewhat by putting links to the source code in the applet page -- brilliant!)

I'd also love to see brilliant new presentation layer technologies form, like WebGL, Processing for the web, javascript, Flash, webkit, etc. because I want to innovate and create that new killer interface and interaction model for my content that no one has thought of before and no one else has implemented before!

I think Ben's post and my experiments show a way that we can have both of these things at the same time as long as we don't get confused and tangled up in the standards war or the RIA platform war and forget the difference between content and presentation.

Monday, May 3, 2010

O'Reilly's Internet Operating System FTW!!

I'm glad O'Reilly wrote his post "The State of the Internet Operating System." I agree with it a lot.

I said this years ago. In fact, I'll repeat the idea here: just like early video devs went through hell writing to non-standard graphic card memory models, early web devs have had to struggle with non-standard serialization and presentation formats.

And just like back then, a crop of "device drivers" slowly formed (first from DOS-era libraries like the Miles Sound System, then later via the OS). Well, it's happening now: people fed up with the conflicting standards of the W3C and webdev hell are starting to create "web-standard"-independent "drivers" -- like WebKit, like jQuery -- that don't care what version of HTML or CSS your browser really has, and that degrade gracefully by offering "hardware abstractions" (or in this case, browser and W3C standards abstractions!). And javascript is rapidly becoming the language of the new platform because no one else is moving fast enough.

In my mind it's time to put the W3C to bed where they belong -- leave them as keepers of a data standard to the semantic web, but for crying out loud, give the presentation devs a standard platform we can actually code to with the solidity and flexibility of postscript or renderman!

IMHO, the next steps (great leaps forward) are standard serialization formats. Who gives a flying &#&#@ if you are doing a GET or a POST, JSON or XML, or java-to-javascript, or what the hell have you -- SERIOUSLY, it's as insane as memory management was before contiguous memory controllers (does anyone remember those days?). It's all state serialization; it should have a freakin' standard way of being written and read no matter where it comes from or where it goes. I'm not talking implementation details, I'm talking raw syntax.

ActiveRecord and JSON begin to get close to this, but we've got a LONG way to go before I can just say "give me an address from the database, post it to that RESTful service, and write it to a memory cache over there" -- and have it all be the same damn syntax! Why reinvent the wheel 80 bazillion times?
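Ruby's stdlib actually hints at what that standard could feel like: one in-memory structure, several wire formats, all through symmetric read/write verbs. A sketch of the idea, not a proposal:

```ruby
require 'json'
require 'yaml'

# One piece of state...
address = { "street" => "123 Main St", "city" => "Springfield" }

# ...serialized two ways with the same verb shape (to_* out, parse/load in):
as_json = address.to_json
as_yaml = address.to_yaml

JSON.parse(as_json) == address   # => true
YAML.load(as_yaml) == address    # => true
```

The point is the symmetry: the caller never cares which format is on the wire, only that write-then-read round-trips the state.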

/rant off

Friday, January 8, 2010

risk management implies socialism not capitalism...

I've been hearing several stories on NPR recently that question the idea of an unregulated free-market economy given the recent financial meltdown. The argument is that since laissez-faire markets have been shown "not to work," there are now new doubts about whether markets are the most efficient system for managing resources.

Ok, so why is this disturbing? Because I think a critical distinction is being lost in the debate between free-markets and controlled economies. The critical distinction is a concept of risk.

Unregulated free markets work. They work well. They are extremely efficient. Nothing I've seen disproves that. However, they can be extraordinarily brutal -- usually when speculative bubbles pop. The free-market theory has been that less regulation will allow these bubbles to pop sooner and smaller -- but as recent experience has shown, that's not necessarily the case. A free market ideally runs on information that represents actual value, but it's entirely possible for prolonged periods of speculation to exist and even grow, fueling such bubbles. When large bubbles pop, huge sections of the economy may be at risk. If this were allowed to happen, the results would be catastrophic, but they would ensure a quick demise for everyone who touched (and fueled) such speculation. This doesn't mean the concept of the free market is broken, or that it's not efficient. It is cruelly efficient. But what we are really talking about is risk, not the free market!

In fact, let me make a bold thesis: all forms of risk management move away from free-market capitalism towards socialism.

I'm not necessarily against socialism, but let's look closely at this new distinction that is absent from the current dialogue: if you embrace risk management, it means ultimately that you are diffusing risk to other areas, you are in effect damping all the wild fluctuations of the market into a smooth, level field. Hedge funds did this by intricately linking and amplifying thousands of previously unrelated investments. The bailout did this by spreading the losses over all American taxpayers. The extreme end of this arrow is zero-risk: zero development, zero capital, zero life. The extreme end of the free-market arrow is maximum-risk, which could be heaven or hell, or likely both over time.
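The damping metaphor can be made concrete with a toy simulation (my illustration, not from the post): pool n independent bets and the variance of the pooled average drops like 1/n -- the risk isn't destroyed, it's spread out.

```ruby
# Sample variance of a list of outcomes.
def variance(xs)
  mean = xs.reduce(:+) / xs.size.to_f
  xs.reduce(0.0) { |s, x| s + (x - mean)**2 } / xs.size
end

srand(1)
flip   = lambda { rand < 0.5 ? 1.0 : -1.0 }        # a single +/-1 "investment"
single = Array.new(10_000) { flip.call }            # one bet at a time
pooled = Array.new(10_000) do                       # average of 100 pooled bets
  Array.new(100) { flip.call }.reduce(:+) / 100.0
end

variance(single)  # roughly 1.0
variance(pooled)  # roughly 0.01 -- the fluctuations are damped 100x
```

Which is exactly the arrow described above: each step of pooling moves you toward the smooth, level, low-variance end of the spectrum.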

The math of hedge funds and derivatives is fundamentally unstable -- everywhere it has been used, the results have been unpredictable and disastrous, going back to the very first hedge fund to use "zero-risk" maths, which lost billions of dollars in a few months. I believe this is because people are missing the connection between risk management and socialist economics. One implies the other automatically. This becomes painfully obvious if you look at the basic premise of hedge funds: to create zero-risk investment instruments... how is such a thing possible in a capitalist system, which is built on risk? It would be like trying to build a casino with zero risk... would you still call it gambling?

Realistically, we will always have a mixed economy in some respects. The question is not whether one socio-economic system is better than another, it's how much do you want of one or the other. Or put simply, how much risk can we afford? I don't think zero-risk is necessarily a good place to aim, because we've seen exactly how that turns out.