Jason Morrison

Make stuff.

Reading List: February 2014

I am maintaining a reading list:

http://jayunit.net/reading-list

I’m choosing a theme per month, with several works on the theme. I’ll try to collect notes at the end of each month about what I read and wrote, what I thought, and how I might revisit that theme in the future.

Take “reading” loosely – presentations and podcasts definitely count.

What did I complete in February?

My high level goals for February were to:

  • Read about Meteor, FRP, and React.
  • Do some toy projects with these.
  • Skim a finance book.

Meteor

What did I read?

Thoughts:

Lots of interesting ideas, but I’m not sure I’ll use Meteor. Maybe I’ll use parts (DDP).

  • Isomorphic code (the same code running on the client and server) is interesting. I would have to build a large app to see the advantages demonstrated, but I can imagine them.
  • Seems to work best when all used together, but no reason you couldn’t extract e.g. DDP or Deps and use in an existing application. (Right?)
  • There are three places to get packages – core, Atmosphere, and, as of June 2013, NPM. It’s nice that they bless a small set of core packages as a stdlib of sorts, although I don’t understand the need for Atmosphere and NPM to be separate.
  • The Deps module, especially Deps.autorun, is a particularly elegant approach to automatic dependency registration that avoids the need for static analysis. The clever part relies on JavaScript’s single-threaded nature, and tracks the current computation as Deps.currentComputation.

    • At first I thought this could break down if you depend on a boolean expression a() && b(). If a() returns false, the language short-circuits evaluation so that b() is never invoked, so the dependency is not registered. However, once a() returns true, b() will get run and the dependency is registered correctly. There must be some shortcomings? External data is a clear example, but that is covered by the more explicit Deps.Dependency facilities.
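The short-circuit case is easy to reproduce in a toy tracker. The sketch below is written in Ruby rather than JavaScript, to match the other code on this page, and `Tracker`, `ReactiveVar`, and `autorun` are hypothetical names, not Meteor's API. It only assumes that a getter registers whichever computation is currently running, and it never clears old dependencies, which a real implementation would do.

```ruby
# Toy dependency tracker illustrating the idea behind Deps.autorun.
# All names here are hypothetical; this is not Meteor's implementation.
module Tracker
  class << self
    # The computation currently running, if any (compare Deps.currentComputation).
    attr_accessor :current_computation
  end

  # Run the block once, then again whenever a ReactiveVar it read changes.
  def self.autorun(&block)
    computation = nil
    computation = lambda do
      previous = Tracker.current_computation
      Tracker.current_computation = computation
      begin
        block.call
      ensure
        Tracker.current_computation = previous
      end
    end
    computation.call
    computation
  end
end

class ReactiveVar
  def initialize(value)
    @value = value
    @dependents = []
  end

  # Reading registers the running computation as a dependent.
  def get
    computation = Tracker.current_computation
    @dependents << computation if computation && !@dependents.include?(computation)
    @value
  end

  # Writing a new value reruns every registered dependent.
  def set(value)
    return if value == @value
    @value = value
    @dependents.dup.each(&:call)
  end
end

a = ReactiveVar.new(false)
b = ReactiveVar.new(1)
runs = []

Tracker.autorun { runs << (a.get && b.get) }
b.set(2)     # b was never read, so nothing reruns
a.set(true)  # reruns; this time b.get is evaluated and registered
b.set(3)     # reruns, because b is now a dependency

p runs  # prints [false, 2, 3]
```

The missed registration is harmless here: while `a` is false, the value of `b` cannot affect the result, so there is nothing to rerun for.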

Questions:

  • How do DDP and the use of a document storage model inform data modeling? How would you build a system like DDP atop a relational model? One system I work on uses multiple steps of data mapping (SQL –> Python objects –> JSON –> Backbone models and back). This seems like unnecessary layers of complexity, and replicating a datastore into the client, like minimongo does, seems like a preferable situation in many cases. How might you introduce DDP to an existing rich-client application without rewriting it in Meteor?
  • How do you build for reliability atop DDP and RPC? (E.g. ensuring all RPC endpoints are idempotent.) How does DDP navigate timeout/retry/backoff? See Andrew Wilcox’s meteor-offline-data work.
  • How does operational transform (OT) fit in? Compare to the Derby framework and its Racer library, which uses ShareJS for OT:

    • OT is one approach to conflict resolution. I assume there are many. What are the tradeoffs?
    • What other conflict resolution approaches exist? Maybe some in the thinkdistributed.io Causality episode.

      The correctness problems of OT led to introduction of transformationless post-OT schemes, such as WOOT, Logoot and Causal Trees (CT). “Post-OT” schemes decompose the document into atomic operations, but they workaround the need to transform operations by employing a combination of unique symbol identifiers, vector timestamps and/or tombstones.

      http://en.wikipedia.org/wiki/Operational_transformation

    • If you use OT, can you use tree-structured data? Is the scope of OT limited to a document? Can you coordinate operations across documents?

What would I study about this next?

  • How does the very new (documented February 27) Blaze rendering system compare to React?
  • How well does Meteor play with other libraries? I recall seeing a “modularized” version of Meteor where some parts were available a la carte. What does it look like to involve something like Backbone for models? React for DOM computation?
  • What is the multi-server story (for performance and for availability)? I think that the new oplog work is supposed to support this.
  • SQL bindings

Functional Reactive Programming (FRP)

What did I read?

There are a bunch more resources in my 2014-February reading-list page that I collected but didn’t get to.

Thoughts:

Seems like a solid theoretical underpinning for complex dataflow apps.

FRP concepts (behaviors) have similarities to promises – they represent an abstraction of a value. Promises represent a future value. Behaviors represent a value which may vary continuously over time; they are functions of time.
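As a rough illustration of the "function of time" reading, here is a minimal Ruby sketch. `Behavior`, `at`, and `map` are hypothetical names chosen for this example and are not taken from any FRP library or paper.

```ruby
# A behavior modeled as a plain function from time to a value.
# Behavior, at, and map are hypothetical names for illustration only.
class Behavior
  def initialize(&fn)
    @fn = fn
  end

  # Sample the behavior at time t (seconds, in this example).
  def at(t)
    @fn.call(t)
  end

  # Derive a new behavior by transforming every sampled value.
  def map(&transform)
    Behavior.new { |t| transform.call(at(t)) }
  end
end

position = Behavior.new { |t| 2.0 * t }    # moves at 2 units per second
shifted  = position.map { |x| x + 1.0 }    # same motion, offset by 1

puts position.at(0.5)  # prints 1.0
puts shifted.at(0.5)   # prints 2.0
```

Nothing is computed until a behavior is sampled, which is one way it differs from a promise that resolves once to a single value.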

From reading the papers, I also learned a more general concept – that of a paper separately introducing formal semantics from a specific implementation. I haven’t read enough CS papers to know how typical this is.

That reminds me of a thinkdistributed.io podcast on Raft (a consensus algorithm) which was designed for understandability, seemingly a novel goal in CS research. The result is a large number of (attempted) implementations, because the theory was so approachable. Does the Raft paper propose a formal semantics? See in May/June.

Questions:

I’d like to try out https://github.com/baconjs/bacon.js and read about underlying ideas.

If I were to study further/return to this, what would I look at?

I didn’t get to read too much about FRP. I would like to try some of the JS implementations to build some small dataflow applications. I’d like to build some medium-sized React.js apps and compare that experience to the FRP libraries.

React.js

What did I read?

First, I read about React from other people:

Then I read the docs themselves:

And aimed to get my hands dirty:

Thoughts:

Looks good, I want to do more with it.

The design of React is very appealing: using functional composition, cohering templates with view logic, implementing synthetic DOM events atop delegation, and providing an immediate mode atop the DOM’s retained mode. I’m not sure about the JSX syntax, but I think I’ll like it as soon as I install vim-jsx.

If I were to study further/return to this, what would I look at?

  • Build something more substantial with React
  • Om, a CLJS wrapper atop React enjoying advantages from immutable data structures and presumably other CLJS fanciness
  • What does this discussion mean by “UI as value”?

People:

https://twitter.com/sgrove https://twitter.com/floydophone https://twitter.com/swannodette

Finance

On my friend Mac Cowell’s recommendation, I skimmed some of The Little Blue Book That Still Beats the Market. I maintain a healthy level of skepticism, but underneath the “wow it’s pure magic!” skin appears to be a proxy for value investing that identifies underpriced companies. Time will tell if broad dissemination of this valuation strategy will correct the underpricing, or if it holds.

I’d like to, as an exercise, build a software implementation of this strategy and backtest it. I’d also like to understand if there is affordable historical market information that avoids survivorship bias, and to understand what other backtesting blind spots I may have.

I’ll follow this up (also on Mac’s recommendation) with Graham and Zweig’s The Intelligent Investor, maybe in March or April.

Links from an hour-ish of searching about building trading simulations:

Retrospective

The pace for this month was quite high, but I enjoyed keeping up with it. I’ve found it helpful to schedule reading nights on my calendar to try and block some time off for paying attention.

I could not read some of the FRP papers for lack of understanding of some foundational functional programming concepts (applicatives, monoids), so I will read up on those next month.

I wish I had done more hands-on programming with these new tools.

The part I have enjoyed the most is discussing these ideas with others. I’d like to try small reading groups or journal clubs around some of my future readings.

Next, onto March reading: including Clojure, FP concepts, and core.async.

RedStorm: Distributed Computation in Ruby

On December 11, 2012 I gave a talk at Boston.rb about writing distributed realtime computations in Ruby using Storm by Nathan Marz and RedStorm by Colin Surprenant.

There is a video of the talk on the Boston.rb website, and the slides are posted online.

Basically, Storm provides a framework for building streaming/realtime computations (like log analysis, for example) and distributed RPC for running large ad hoc computations on a cluster. RedStorm is a JRuby-based adapter for writing these computations and assembling them into topologies (workflows) in Ruby.

Here are the recommended resources from my talk:

Getting started

Related software tools

  • storm-contrib provides integration with many third-party tools like communicating with queues, service buses, and databases.
  • storm-deploy “makes it dead-simple to deploy Storm clusters on AWS.”
  • storm-mesos provides integration with Apache Mesos for cluster resource management.

Documentation

Talks

Two excellent talks by Storm author Nathan Marz:

Book

  • Big Data is an early access book by Nathan Marz which covers “Principles and best practices of scalable realtime data systems”

Other ESP/CEP resources

Storm lives in a space that’s often referred to as ESP (“Event Stream Processing”) or CEP (“Complex Event Processing”):

CLAHub: Easy Contributor Agreements on GitHub

CLAHub is a small side project I cooked up a few months ago, and just got around to open-sourcing. The goal is to remove the friction of Contributor License Agreements for contributors and maintainers alike. It’s not done yet, but I’m curious to hear what people think.

What is it?

The general idea with CLAs is this: contributors grant the maintainer a license to distribute their code, and state that they’re legally able to do so. A fair number of projects have a CLA in place, including jQuery, Node.js, Django, and Chef. In the best cases the CLA is signed via electronic signature, like Node.js does with a Google Form. In the worst cases you have to print, sign, and fax the agreement. In all cases, maintainers are responsible for cross-referencing contributions and signatures to make sure all contributions have a corresponding signature.

With CLAHub and an open source project on GitHub you can:

  • Sign in with GitHub and create a CLA for your project.
  • Ask contributors to sign in with GitHub to electronically sign the CLA.
  • See on each pull request whether the contributors have all signed your CLA. This uses the handy Commit Status API, similar to what CI tools do.
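The pull request check can be sketched roughly as follows. This is not CLAHub's actual code: `cla_status` and the logins are hypothetical, and using `failure` for unsigned contributors is an assumption. The `state` values themselves come from GitHub's Commit Status API, where the resulting hash would be posted to `/repos/:owner/:repo/statuses/:sha`.

```ruby
# Hypothetical helper: choose a commit status for a pull request's head commit.
# `contributors` are the GitHub logins found on the pull request's commits;
# `signers` are the logins that have signed the project's CLA.
def cla_status(contributors, signers)
  missing = contributors.uniq - signers
  if missing.empty?
    { state: "success", description: "All contributors have signed the CLA." }
  else
    { state: "failure", description: "Not yet signed by: #{missing.join(', ')}" }
  end
end

status = cla_status(%w[alice bob alice], %w[alice])
puts status[:state]        # prints failure
puts status[:description]  # prints Not yet signed by: bob
```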

Here’s the app. There’s a little slideshow on the frontpage to see how it works. And here’s the source on GitHub.

Learn more about CLAs

Here’s some more background on CLAs:

Want to choose a CLA? Project Harmony is a web tool that helps you quickly select a CLA.

Feedback

There’s more that needs to be done, but the core of the app works. The next steps are in GitHub issues.

Do you use a CLA for your project(s)? Would this encourage you to add a CLA if you don’t have one already? (That’s not really my goal – just to reduce friction where CLAs are already valuable.) If you have a CLA, would you use something like this to reduce the barrier to entry and your overhead? What kinds of features would be useful?

Papernaut: Exploring Online Discussion of Academic Papers

If you regularly read scholarly papers, you likely use a reference manager to maintain your personal library. Papernaut connects to your library to find online coverage and discussion of your papers in blogs, forums, and mainstream media. My hope is that these discussions can provide broader perspective on research and, in some cases, be the spark that starts a new collaboration.

Here’s a very quick video demo. We start with a Zotero library that includes a paper from Science on the effect of pesticides on honey bees. We then connect to Papernaut, and find several discussions and articles, including one in The Guardian:

I’ve been working on Papernaut in my spare time for a few months, and I’m happy to say that it’s now open source. The project comes in two parts, and the source is on GitHub:

If you are interested in how the application is put together, the rest of this article is a technical overview of the moving parts and how they interact.

Overview: A simple example

Let’s walk through a simplified example. Say I have only one paper in my reference manager — that paper from earlier, about the effect of pesticides on honey bees:

Henry, M., Beguin, M., Requier, F., Rollin, O., Odoux, J., Aupinel, P., Aptel, J., Tchamitchian, S., & Decourtye, A. (2012). A Common Pesticide Decreases Foraging Success and Survival in Honey Bees. Science, 336 (6079), 348-350 DOI:10.1126/science.1215039

Let’s also say that the engine is crawling content from only one source feed, ResearchBlogging.org. Among many other content items, that source feed contains a relevant entry, whose content page is on The Guardian.

We’ll look at how the engine crawls and indexes this source feed. Then, we’ll see how the frontend pulls the paper from my reference manager and asks the engine for relevant discussions.

Papernaut-engine: Loading content and identifying papers

The goal of the engine is to produce a collection of Discussion records, each of which links to several Identifier records, representing journal papers that are referenced from the Discussion. In our example, the Discussion is the article in The Guardian, and the relevant Identifier is DOI:10.1126/science.1215039. There are also intermediate objects, Page and Link which connect Discussions to Identifiers.
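To make the relationships concrete, here is a sketch using plain Ruby structs in place of the engine's database records. The record ids match the ones in this walkthrough; the `Link` fields (`discussion_id`, `page_id`) and the `discussions_for` helper are assumptions made for illustration.

```ruby
# Plain-Ruby stand-ins for the engine's records. The Link fields and the
# lookup helper below are assumptions, not the engine's real schema.
Discussion = Struct.new(:id, :url, :title)
Page       = Struct.new(:id, :url)
Link       = Struct.new(:discussion_id, :page_id)
Identifier = Struct.new(:id, :page_id, :body)

discussions = [
  Discussion.new(3424,
                 "http://www.guardian.co.uk/science/grrlscientist/2012/may/08/1",
                 "Bee deaths linked to common pesticides | video | G...")
]
pages       = [Page.new(7531, "http://dx.doi.org/10.1126/science.1215039")]
links       = [Link.new(3424, 7531)]
identifiers = [Identifier.new(1819, 7531, "DOI:10.1126/science.1215039")]

# Walk Identifier -> Page -> Link -> Discussion for one identifier string.
def discussions_for(body, identifiers, links, discussions)
  page_ids = identifiers.select { |i| i.body == body }.map(&:page_id)
  discussion_ids = links.select { |l| page_ids.include?(l.page_id) }.map(&:discussion_id)
  discussions.select { |d| discussion_ids.include?(d.id) }
end

found = discussions_for("DOI:10.1126/science.1215039", identifiers, links, discussions)
puts found.map(&:url)
```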

The engine consists of two main parts: loaders (which are Ruby classes), and the query API (a Rails app). For loading, it also depends on an external running instance of the Zotero translation-server.

Loading content by crawling feeds

The loaders load discussion candidates from feeds and archives, extract outbound links, and store these in the database.
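The link-extraction step might look something like the sketch below. `outbound_links` is a hypothetical helper, the HTML snippet is made up for illustration, and dropping links that stay on the discussion's own host is an assumption about what counts as "outbound". A regex is enough for a sketch; a real loader would more likely use an HTML parser.

```ruby
require "uri"

# Hypothetical link extractor: collect absolute http(s) links that leave the
# discussion's own host.
def outbound_links(html, discussion_url)
  own_host = URI.parse(discussion_url).host
  html.scan(/href="(https?:\/\/[^"]+)"/).flatten.uniq.reject do |url|
    URI.parse(url).host == own_host
  end
end

# Made-up markup standing in for a crawled discussion page.
html = <<~HTML
  <p>See <a href="http://dx.doi.org/10.1126/science.1215039">the paper</a>
  and <a href="http://www.guardian.co.uk/science">more science news</a>.</p>
HTML

extracted = outbound_links(html, "http://www.guardian.co.uk/science/grrlscientist/2012/may/08/1")
puts extracted
```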

In the first step, I invoke the ResearchBlogging.org loader to crawl and index the most recent 100 pages of their archives:

[engine] rails runner "Loaders::ResearchbloggingWebLoader.new(100).load"

This will load a large number of Discussion entries into the database, with zero or more Page entries for each Discussion, corresponding to outbound links.

At this point, the engine database contains the Discussion:

#<Discussion id: 3424,
             url: "http://www.guardian.co.uk/science/grrlscientist/2012/may/08/1",
             title: " Bee deaths linked to common pesticides | video | G...", ...>

and the linked Page entries:

[#<Page id: 7531, url: "http://dx.doi.org/10.1126/science.1215039", ... >,
 #<Page id: 7532, url: "http://pubget.com/doi/10.1126/science.1215039", ... >,
 #<Page id: 7533, url: "http://dx.doi.org/10.1126/science.1215025", ... >,
 #<Page id: 7534, url: "http://pubget.com/doi/10.1126/science.1215025", ... >]

Identifying papers via the Zotero translation-server

The engine determines which outbound links (or Pages) are academic papers by issuing calls to the Zotero translation-server HTTP API. The translation-server is a third-party project from open-source reference manager Zotero. It examines a given URL and, if that page contains an academic paper, it returns common publication identifiers such as DOI or PMID.

The translation-server wraps the Zotero translators, a set of JavaScript scripts that do the heavy lifting of parsing a webpage and attempting to identify it as one or more academic publications. These translators are maintained by the community, keeping them fairly up-to-date with publishers. The translation-server uses XULRunner to run these scripts in a Gecko environment, and makes them available through a simple HTTP API:

[~] ~/dev/zotero/translation-server/build/run_translation-server.sh &
    zotero(3)(+0000000): HTTP server listening on *:1969

[~] curl -d '{"url":"http://www.sciencemag.org/content/336/6079/348.short","sessionid":"abc123"}' \
     --header "Content-Type: application/json" \
     http://localhost:1969/web | jsonpp

    [
      {
        "itemType": "journalArticle",
        "creators": [
          { "firstName": "M.", "lastName": "Henry", "creatorType": "author" },
          { "firstName": "M.", "lastName": "Beguin", "creatorType": "author" },
          { "firstName": "F.", "lastName": "Requier", "creatorType": "author" },
          { "firstName": "O.", "lastName": "Rollin", "creatorType": "author" },
          { "firstName": "J.-F.", "lastName": "Odoux", "creatorType": "author" },
          { "firstName": "P.", "lastName": "Aupinel", "creatorType": "author" },
          { "firstName": "J.", "lastName": "Aptel", "creatorType": "author" },
          { "firstName": "S.", "lastName": "Tchamitchian", "creatorType": "author" },
          { "firstName": "A.", "lastName": "Decourtye", "creatorType": "author" }
        ],
        "notes": [],
        "tags": [],
        "publicationTitle": "Science",
        "volume": "336",
        "issue": "6079",
        "ISSN": "0036-8075, 1095-9203",
        "date": "2012-03-29",
        "pages": "348-350",
        "DOI": "10.1126/science.1215039",
        "url": "http://www.sciencemag.org/content/336/6079/348.short",
        "title": "A Common Pesticide Decreases Foraging Success and Survival in Honey Bees",
        "libraryCatalog": "CrossRef",
        "accessDate": "CURRENT_TIMESTAMP"
      }
    ]

There are several useful standardized identifiers here – DOI, URL, and ISSN.
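A sketch of turning one translation-server item into identifier strings follows. The field names (`DOI`, `url`, `ISSN`) come from the response above; the `identifier_strings` helper is hypothetical, and which fields the engine actually records is an assumption.

```ruby
# Hypothetical helper: build "TYPE:value" strings from one translation-server item.
def identifier_strings(item)
  ids = []
  ids << "DOI:#{item['DOI']}" if item["DOI"]
  ids << "URL:#{item['url']}" if item["url"]
  # The ISSN field can hold several comma-separated values.
  item.fetch("ISSN", "").split(/,\s*/).each { |issn| ids << "ISSN:#{issn}" }
  ids
end

# The relevant fields from the response shown above.
item = {
  "DOI"  => "10.1126/science.1215039",
  "url"  => "http://www.sciencemag.org/content/336/6079/348.short",
  "ISSN" => "0036-8075, 1095-9203"
}

puts identifier_strings(item)
```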

So, continuing with our example from above, I’ll next start the Zotero translation server and identify the pages:

[engine] ~/dev/zotero/translation-server/build/run_translation-server.sh &
         zotero(3)(+0000000): HTTP server listening on *:1969

[engine] rails runner "ParallelIdentifier.new(Page.unidentified).run"

The engine issues calls to the translation-server and records new Identifiers. Now, the Page entries we previously crawled:

[#<Page id: 7531, url: "http://dx.doi.org/10.1126/science.1215039", ... >,
 #<Page id: 7532, url: "http://pubget.com/doi/10.1126/science.1215039", ... >,
 #<Page id: 7533, url: "http://dx.doi.org/10.1126/science.1215025", ... >,
 #<Page id: 7534, url: "http://pubget.com/doi/10.1126/science.1215025", ... >]

have corresponding Identifier records:

[#<Identifier id: 1819, page_id: 7531, body: "DOI:10.1126/science.1215039" ...>,
 #<Identifier id: 1820, page_id: 7531, body: "URL:http://www.sciencemag.org/content/336/6079/348" ...>,
 #<Identifier id: 1821, page_id: 7533, body: "DOI:10.1126/science.1215025" ...>,
 #<Identifier id: 1822, page_id: 7533, body: "URL:http://www.sciencemag.org/content/336/6079/351" ...>]

Two of the four pages were identified (7531 and 7533), and both of those pages received two identifiers apiece. This means that the Guardian Discussion actually referenced two different papers, not just the one we’re interested in.

Now that there is a link between the paper in question and this discussion page, we are ready to visit the frontend.

Papernaut-frontend: importing libraries, finding discussions

The frontend works in two distinct phases: first, it helps you import papers from your reference manager. Second, it shows you discussions for those papers.

You can import your papers via the Zotero API or Mendeley API by giving Papernaut access to your libraries via OAuth. This happens with omniauth-zotero and omniauth-mendeley libraries, followed by the ZoteroClient and MendeleyClient classes.

Alternatively, you can import papers from most reference management software by exporting and uploading a .bibtex file. Papers and their identifiers are then extracted with the BibtexImport class.

Many papers will have multiple identifiers, and the frontend attempts to clean and validate your papers’ identifiers as best it can in an attempt to find the best matches.

Once your papers are loaded into the frontend, it issues requests to the papernaut-engine query API to find discussions that match papers in your library.

The interface between the frontend and the engine is a set of Identifier strings, which take a type/value form:

  • DOI:10.1038/nphys2376
  • ISSN:1542-4065
  • PMID:10659856
  • URL:http://nar.oxfordjournals.org/content/40/D1/D742.full
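Parsing the type/value form can be sketched as below. `KNOWN_TYPES` and `parse_identifier` are hypothetical names, and rejecting unknown types is an assumption about how strict the cleaning should be. The one detail that matters is splitting on the first colon only, since URL values contain colons of their own.

```ruby
KNOWN_TYPES = %w[DOI ISSN PMID URL].freeze

# Split "TYPE:value" on the first colon only, because URL values contain colons.
def parse_identifier(string)
  type, value = string.split(":", 2)
  unless KNOWN_TYPES.include?(type) && value && !value.empty?
    raise ArgumentError, "unrecognized identifier: #{string.inspect}"
  end
  [type, value]
end

p parse_identifier("DOI:10.1038/nphys2376")
p parse_identifier("URL:http://nar.oxfordjournals.org/content/40/D1/D742.full")
```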

So, in our example video above, we authenticate via Zotero and authorize Papernaut’s API access via OAuth. The frontend extracts our library of papers from Zotero and stores their Identifiers locally. It issues requests to the engine’s query API for matching discussions, and displays those to the end user:

Deployment

In production, the Papernaut engine and frontend are deployed to Heroku. The translation-server is deployed to EC2. I spin it up and run the loaders periodically, to reduce hosting overhead.

There is a DEPLOY.md file for both the frontend and the engine that goes into further detail.

Next steps

I’m excited to see what kinds of results people get with Papernaut, but it’s still very early software. I look forward to making a variety of improvements.

I’d really like to add a bulk request API endpoint to the engine, so that the frontend can discover discussions in a single HTTP request, rather than one request per paper. That’s a big performance hit, and the user experience right now for large libraries is that the frontend just hangs for a while.
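The difference in round trips can be sketched with a stand-in that only counts requests. `FakeEngine` and both of its methods are hypothetical; the bulk method in particular is the endpoint wished for here, not something the engine provides today.

```ruby
# Stand-in for the engine's query API that only counts HTTP round trips.
# Both methods are hypothetical; no real network calls are made.
class FakeEngine
  attr_reader :requests

  def initialize(index)
    @index = index      # identifier string => array of discussion URLs
    @requests = 0
  end

  # Current shape: one request per paper.
  def discussions_for(identifier)
    @requests += 1
    @index.fetch(identifier, [])
  end

  # Wished-for shape: one request for the whole library.
  def discussions_for_all(identifiers)
    @requests += 1
    identifiers.map { |id| [id, @index.fetch(id, [])] }.to_h
  end
end

index = {
  "DOI:10.1126/science.1215039" =>
    ["http://www.guardian.co.uk/science/grrlscientist/2012/may/08/1"]
}
library = ["DOI:10.1126/science.1215039", "DOI:10.1038/nphys2376"]

one_by_one = FakeEngine.new(index)
library.each { |id| one_by_one.discussions_for(id) }

bulk = FakeEngine.new(index)
results = bulk.discussions_for_all(library)

puts one_by_one.requests  # prints 2
puts bulk.requests        # prints 1
```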

On the engine side, I’d like to do a better job of culling false positives in the matching engine, and of contributing to Zotero’s translators to improve the match rate. I think the primary issue there is that the translation-server actually only runs a subset of all the Zotero translators, as some declare that they only work inside a real browser context (see “browserSupport”).

I’d like to get a larger sample set of BibTeX files to try, as there are probably edge cases and assumptions in the importer waiting to be hit.

I’d also like to background some of the tasks in the frontend’s import process; validating DOIs is a big one there. Ideally, the whole library import would be backgrounded, and the user interface would be notified when the import is complete.

Currently, some matches are missed because the engine and frontend have different identifiers for the same paper – say a DOI and a PMID. I also have an experimental branch that cross-references papers with the crossref.org API, which yields more complete information. Ideally that would happen in the engine. I’ve also seen some library management and import tools that use Google Scholar to improve matching and identification.

After that, I’d like loaders to run semi-continuously instead of manually, and to have more robust infrastructure around paper identification.

In the long term, it would be interesting to try and bring the discussion matching experience directly into reference managers. This is one reason why I provide the engine query API separately from the frontend.

Conclusion

I’m most interested in hearing feedback from people. Is this useful to you? If you use a reference manager, give Papernaut a spin and let me know how it goes.

A Year of Travel

On December 4, Lindsay and I returned to the US after a year of traveling abroad. Lindsay diligently blogged our experiences and her photos at cadeparade.com.

We have spent December visiting family and friends. On December 31, we fly to San Francisco to start the next chapter of our lives.

It’s adventure time all over again

Time to sift through apartments and carefully consider our work, to reacquaint ourselves with first world amenities and first world problems. To reunite with family and friends, to fondly shuffle through our notes and photos, and to reflect on our travel experiences and put them into context.

Also, to eat fajitas and burritos en masse, because let me tell you: Mexican and Tex-Mex food outside the Americas just is not the same.

The first half in photos

All photos are by Lindsay Cade, and are from the first half of the year.

During the first six months, December 2011 through May 2012, we traveled in India, Thailand, Laos, Vietnam, Cambodia, and Burma (Myanmar).

We traveled to places beautiful and remote:

Ate incredible foods:

And some not-so-incredible ones:

We enjoyed amazing sunsets:

We ventured across deserts:

into backwaters:

through rivers and valleys:

We marvelled at constructions old and new:

During the second six months, June through November, we traveled in the Czech Republic, Italy, Turkey, Germany, France, England, Thailand (again! we are quite fond of it), South Korea, Malaysia, and Hong Kong. We ended the trip where we began, returning to India for a month.

I cannot recommend this experience highly enough. My sense of perspective and patience have been changed at a fundamental level. At the same time, I’m very much ready for this return to the US, to be with friends and to focus on my career, to do good in this world of which I’ve now seen a tiny slice more.

Hitting the Road!

On November 28, my wife Lindsay and I are flying to India. We have no return tickets, and little plan. I’m leaving a great job; “professional ennui” is the furthest thing from my motivations. What’s going on?!

Adventure Time!

It’s adventure time!

If there’s one common lesson I could distill from my collegiate and professional engagements, it would be the value of diverse experience, and the difficulty of planning to build that experience. Sometimes you just gotta jump in learning’s way.

We’re young, not tied down, and have seen like 0.0001% of the world. So, earlier this year, after getting engaged, we decided: let’s hit the road! Our plans are loose. As of now, we:

  • Have 1-way tickets to Delhi and 5-year visas to India. Many countries in Asia have VOA (visa on arrival) for US citizens.
  • Got our arms jabbed (immunizations).
  • Are brandishing a fat sack of doxy and a veritable menagerie of antibiotics.
  • Booked two days at a hotel to buffer our jetlag.
  • Asked a friend-of-a-friend to find a short-term lease in Delhi.
  • Are super frigging pumped. I mean, come on!

I’ll miss the crap out of my friends here in the US. We’re flying around a bit to visit folks before heading overseas – San Fran tomorrow through Wednesday, then Buffalo, then Houston for Thanksgiving.

Then, on November 28, IAH-ORD-DEL.

Closing thoughts

Journeys are the midwives of thought. Few places are more conducive to internal conversations than a moving plane, ship or train. There is an almost quaint correlation between what is in front of our eyes and the thoughts we are able to have in our heads: large thoughts at times requiring large views, new thoughts new places. Introspective reflections which are liable to stall are helped along by the flow of the landscape. The mind may be reluctant to think properly when thinking is all it is supposed to do.

If we find poetry in the service station and motel, if we are drawn to the airport or train carriage, it is perhaps because, in spite of their architectural compromises and discomforts, in spite of their garish colours and harsh lighting, we implicitly feel that these isolated places offer us a material setting for an alternative to the selfish ease, the habits and confinement of the ordinary, rooted world.

― Alain de Botton, The Art of Travel

Backbone.js on Rails Talk

On Tuesday, September 20, I gave a talk at the New Hampshire Ruby Users Group on Backbone.js on Rails. I’ll be giving a very similar talk on Tuesday, October 11 at boston.rb and a version more targeted to front-end developers on Wednesday, October 26 at the Boston Front End Developers meetup.

I have posted the Backbone.js on Rails slides online, and the slide source is on my GitHub.

As an aside, I’m using landslide for the slides – I love the resulting HTML and interface, though I’ve heard great things about deck.js.

People found the resources sections useful. Many of the links are buried in the presenter notes, so I’ll repeat them here. There are plenty more online, and I’m sure I’m missing some content. Please link to any of your favorites in the comments, and I’ll add them.

Testing

Push synchronization

Get started with Backbone

Further reading: Books on JavaScript

Further reading: Online resources

Notes From the MIT Startup Bootcamp 2011

If you’d like to talk with other people who made it to this event, check out the Hacker News discussion thread.

Yesterday, September 24, 2011, I had the pleasure of attending MIT’s 2011 Startup Bootcamp. In its third year, Startup Bootcamp brought an inspiring and thoughtful collection of speakers who have had a variety of startup successes.

The event hashtag #sb2011 is a stream of reactions and pull-quotes from the event – mixed here and there with excited anticipation for a dance festival in Goa.

Ten speakers presented a variety of viewpoints, insight, and food for thought.

It was a mixed bag – yes, there was unnecessary focus on vanity metrics and the rah-rah of startup theater. Breathless celebration of hockeysticking uniques and of flying around to court VCs makes for good TechCrunch articles. Like it or not, that’s an inculcated part of startup culture.

But if you get past the Hollywooding and the Silicon Valley adulation, there were gems of solid advice, grounded in experience, on hiring (Paul English of Kayak), data-driven product development (Naveen Selvadurai of foursquare), optimizing your life for personal growth (Drew Houston of Dropbox), identifying underlying social and technological shifts that enable new products (Charlie Cheever of Quora, Patrick Collison of Stripe), negotiation (Alex Polvi of Cloudkick), the importance of on-the-ground and unscalable product development tactics early on (Nathan Blecharczyk of Airbnb), earning and answering to the responsibility of finding your own way in the world (Anthony Volodkin of Hype Machine) and how important it is to empower yourself in perhaps the largest disruptive theme of our time by learning to code (Patrick Collison of Stripe).

Paul English, CTO and co-founder of Kayak

Recruit a diversity of success.

Paul spoke on three kinds of recruiting: companies recruiting new hires, companies recruiting investors, and job-seekers recruiting companies.

When you’re recruiting, look for success, regardless of the kind. In fact, look for a diversity of success. Paul once hired an Olympic rower and a chess grandmaster, and couldn’t be happier with these decisions. Find people who operate at the top levels of excellence.

Some companies have a “no assholes” rule – at Kayak, they have a policy of “no neutrals”. Like Charlie Cheever, who later discussed the importance of hiring people you have high-bandwidth communication with, Paul encouraged building a team of people who are fully engaged: “intense and in-your-face – in a good way.”

Leah Culver, CEO and co-founder of Convore

Show up, say yes.

Leah told a lighthearted and likeable story, full of serendipity and luck, of her journey from big state school CS major to Silicon Valley startup founder. She shared stories of driving a UHaul from her native Minnesota out to the Bay Area (picked not primarily for its burgeoning tech scene, but for how much better the weather is), getting started with Instructables, and bumping into Pownce co-founders Kevin Rose and Daniel Burka at a party.

Have a good story to tell the press – you don’t have to tell people the ugly, dirty truth.

Another of Leah’s pieces of advice was a common thread through the talks – that of consistent applied effort. “Show up,” she said – in places with a critical mass of startup people, such as Silicon Valley – and “say yes” to opportunities that come your way.

Andrew Sutherland, founder of Quizlet

I didn’t just rush it on my parents that I was leaving MIT. It took two whole weeks.

Andrew shared his story of inspiration for an online learning tool. When he hacked together a prototype to help study for a French III class in high school and subsequently aced the test, he knew he was onto something.

Andrew discouraged market research – “If I had googled for online flash cards, I would have found other sites, that were not as good, and I wouldn’t have made Quizlet. Now, we’re 10x the [volume] of our next competitor.”

This phrasing raised some contention. I would reframe his advice as: focus on your own products rather than on the competition, and don’t be discouraged by incumbent players; rather, recognize them as a validation of the market space, and proceed to out-execute them.

Naveen Selvadurai, co-founder of foursquare

At first, go with your hunch. Later, with data.

Naveen worked for Lucent and Sun in college. This was important – it was real-world learning. Seeing engineering culture, doing code reviews, shipping real products. Sun had an open culture of learning where you can dive into other products. “How’d they build Solaris? File systems?” Just sign up for the mailing list.

Naveen shared seven pieces of distilled advice:

  1. Keep good company.
  2. Make something that people want.
  3. Build around an atomic action.
  4. Seek mentors early.
  5. At first, go with your hunch. Later, with data.
  6. Balance unknowns with knowns.
  7. Always be recruiting.

On the last point Naveen shared the four stages of foursquare’s hiring strategy:

  1. Hire friends
  2. Hire friends of friends
  3. Use an external agency (but they didn’t find this valuable)
  4. Hire an internal full-time recruiter.

It needs to be someone’s job to think about recruiting, seven days a week. Additionally, as a founder, you must always be recruiting.

Charlie Cheever, founder of Quora

Work with people you have really high-bandwidth communication with. Understand how the other person is thinking.

Charlie shared great advice on early-stage tactics. Start with few users (Quora started with fewer than fifty) and a low-cost MVP. Foster the community by hand, be high-touch and, if your business builds on user-generated content, be prepared at the beginning to build a lot of it by yourself. See how the experiment goes, and then take the learning from that experience and apply it to your MVP.

He shared the importance of collecting metrics early on. With Quora, they actually stored the entire webpage for every visit for every customer, so that they could go back later, having identified trends or formulated hypotheses, and see the site as their users saw it.

They noticed a set of high-engagement users, looked at these users’ experiences, and found that they had all used Facebook Connect. Running with this, the team spent time focusing on improving their social experience.

Charlie also left the audience with good food for thought:

What wave enables your product? Why is now the right time to build it?

For foursquare, it was GPS-enabled mobile phones. For Quora, it was that “normal” people were comfortable sharing things online, and that the web was turning into a mess; with Google turning up more content farm results, people were moving onto safe harbors of organized information like IMDB and Wikipedia. The timing was right.

Drew Houston, co-founder of Dropbox

Get out of your comfort zone. Learn a little about a lot.

“Everything big starts small” – Drew’s original perception of startups was that of Tolkien’s Mount Doom. His original strategy to build a successful startup was to be overwhelmingly prepared – nab an MIT CS degree, get a few years’ experience working for small companies and big companies alike, come back for a PhD, maybe an MBA.

He then related a story from Dropbox’s origins: Drew had just settled into his seat on a Chinatown bus from Boston, in which he could usually get in several hours of undisturbed work. He popped open his laptop, and searched his pockets for his ever-present USB thumb drive. “Shit.” Realization set in just as he visualized, in his mind’s eye, the thumb drive sitting on his desk at home. “Like any good engineer with a problem to solve, I opened my editor.” Drew then wrote the first lines of what would eventually become Dropbox. Today, his company has a multi-billion dollar valuation and “stores more files than Twitter stores tweets.”

Drew exhorted the audience to learn about a broad variety of topics: sales, marketing, finance, accounting, product design, psychology, influence, negotiation, organizational design, management and leadership, business strategy. Buy books (“today we have this amazing thing, Amazon”), dip in, find mentors, and surround yourself with smart people.

Wrapping up, Drew shared his advice for success:

  • Take on more than you’re “ready for.”
  • Maximize how much you learn per unit time.
  • Stack the odds in your favor. Surround yourself with great people; you are the average of your five closest friends.
  • The fastest way to learn about startups is to join one.
  • Starting a company is one of the best ways for engineers to change the world.

Alex Polvi, founder of Cloudkick

No matter what number they offer, pause, count to 10 in your head, and then act as disappointed as possible.

Alex spoke on negotiation, specifically about his experience of his company Cloudkick being acquired by Rackspace.

  • If a VP of Corp Dev says “strategic” to you, they are talking about acquisition.
  • Acquisitions are a bit like romantic relationships: you often get the most attention when you’re looking for it the least. Once you are involved with one party, others can sense it. You somehow become more desirable.
  • Once you have a term sheet from one prospective buyer, you have great leverage. When others call you up, you can very quickly get to hard numbers.

The best negotiation position is one of truth. Build something of value that people want, and your position is irrefutable.

Alex also discussed the importance of taking care of your team, and the people around you. Upon acquisition, he fully accelerated all employees’ options – whether they had been with Cloudkick for four years or four weeks, they were all fully vested and could share in the company’s success. It was important that the acquiring party, Rackspace, was on board with this – and they were. Rackspace wanted the new team members to stick around not because they were waiting to vest, but because they wanted to be there.

Anthony Volodkin, founder of Hype Machine

Venture Capital? You do not need anyone’s permission to make stuff.

Anthony shared the perspective that VC or angel investment can be very important, but it’s not for everyone. “I don’t want to shut something off because the math doesn’t work. For people to not remember it. That would make me sad.”

Anthony’s vision was a question: while people with cool friends can get interesting music recommendations from that network, what about people without cool friends? He knew that there was great taste and insight being shared by music bloggers online, and sought to aggregate and distill it. “I didn’t want to miss anything.”

(If music startups are your thing, Anthony couldn’t recommend highly enough Dalton Caldwell’s talk from Startup School 3 on music startups.)

Find your own way.

He started Hype Machine from his dorm room. He didn’t take investor money. This gave Anthony and his team the freedom to run the company as they pleased.

“We wanted to travel,” he said – so they packed their bags and hung out in Berlin for a month. It was cheaper than they would have thought, “about six thousand dollars,” and incredibly fun. But if they’d had VC money? “No way,” Anthony imagined an advisor’s response, “we thought you were, you know, going to be working sixteen hour days. Now you want to go to Berlin and maybe work?”

YCombinator? TechStars? Just fucking make something.

Anthony exhorted: it’s okay to have a different process. Don’t discount investment and the accompanying advisors, but don’t go blindly down that most celebrated path. With a different process, it’s easier to stand out, to be differentiated. You can always get money if you are making something great.

Nathan Blecharczyk of Airbnb

You have to have a vision, you have to be able to execute that vision.

Nathan shared a 2008 pitch deck for Airbnb (then AirBed&Breakfast) – the first time this deck had ever seen the light of day. Tiffany Kosolcharoen posted photos of the slides on her blog.

He highlighted its strengths – it had a problem statement, and had a bottom-up business projection by analogy to CouchSurfing and Craigslist. He was also quick to point out its weaknesses – it involved hand-wavy notions of unlikely major player partnerships, and touted top-down projections (“If we can capture 2% of the $1.9B travel booking market… imagine!”) that are quick to raise doubt from savvy advisors or investors.

The company was accepted into Y Combinator’s Winter 2009 class. YC companies are supposed to be heads-down; but at Paul Graham’s behest, the cofounders narrowed their market focus to just New York and hopped redeyes back and forth every few weeks. They met with their initial supply-side renters in bars, and chatted about how things were going. As the team refined the product and identified sticking points, they could be on the ground to help optimize listings. They’d go with people into their homes and take high-quality photos. They found that the initial asking rates were a little too high, so they asked their listers (after a few drinks) to lower their prices. Things clicked, and soon they had handled $250,000 in bookings, of which they collected 10%.

Fast-forward to the YC W09 Demo Day, and although at that point Airbnb had already accepted Sequoia investment, they had prepared a Demo Day deck. Gone was the hand-wavy top-down projection and partnership hopefulness, replaced with a quarter million dollars of demonstrable traction, a tight initial market focus, and a clear problem statement.

Like many of the speakers, Nathan stressed the importance of finding quality mentors.

Patrick Collison, co-founder of Stripe

It is impossible to motivate great people by something that is merely going to be profitable.

Patrick’s talk was an excellent finish to the day. He delivered an essay full of engaging stories – I sincerely hope it will be posted online in full.

Patrick’s story was of his trip from hardcore Lisp academic to startup founder. Along the way, he developed one of the first iPhone apps, an offline Wikipedia, before the SDK and App Store, by debugging ARM assembly. He shared the touching experience of getting emails from users whose lives he had changed; from bringing the world’s knowledge to villages in rural Peru and Ghana to delivering the freedom to browse Wikipedia without oversight to people behind the Great Firewall of China. At nineteen, he co-founded and sold an online auction tool, and is currently working on a new payment startup, Stripe.

The anthropological story of the last twenty years is that software is taking over the world. Even if you’re a traveling violinist, you should learn how to program. Do all you can to ensure code is not a foreign language.

Hello, Octopress. Hello, Blog.

On writing

I’ve written sporadically here for several years about programming and language theory, synthetic biology, amateur biology, running user groups and barcamps, multitouch and immersive interactions.

I’ve imported my old posts from WordPress into Octopress. That was — oh wait, I was about to write about that experience before I even began. I was going to say how buttercream-frosting-smooth it was, and that’s probably because I have a lot of confidence in exactly that, mostly due to their well-coiffed htmls. Update! Turns out they’re Jekyll migrations instead. Still easy-peasy.

I’ve written more frequently and recently over on the thoughtbot blog, on development-related topics from little tips to medium-size tips to architecture deep-dives, from product announcements to high-performance bears.

I’ll be traveling extensively over the next year, and will be writing about that, too. But that’s a different post.

On tools

I wrote most of my previous posts in Mephisto, which was kind of janky after a while, and then switched to WordPress, which is totally not Ruby, and more or less means I have to run a VPS and make sure I don’t get chainsawed by spammers. Also, I’m interested in switching to a toolset more near and dear to my heart. Octopress fits the bill.

This also means I can write using vim and git, like a champ.