Author Archives

Epistemological Modesty and Big Data

I am currently “reading” (listening to the audiobook, actually) David Brooks’ The Social Animal. First off, if you care about understanding human behavior in any realm, it is a fantastic read. I have always enjoyed Brooks’ columns at the NYT. His writing in this book is just as engaging and thought-provoking. I listened to a segment [...]

Digital Marketing and BI integrated in Business Process

With my evaluation of the major SEM platform vendors over the last few months as well as my time with demand-side platforms for display, I’ve finally experienced a productized version of business intelligence integrated with business process that scales across many different businesses. While evaluating (and now using) SEM platforms, I have been impressed by: [...]

Small businesses and BI

I saw Dayna Grayson’s post about the winners of NBVP’s Seed Competition. One of the winners was Profitably, which provides analytics and BI on data in Quickbooks. They are joining a growing list of companies that sell SaaS BI on standard data schemas to SMBs (small and medium businesses). Another recent one that [...]

Summing a column of numbers

Often, when I’m doing data validation with the output from Hive and comparing it to another system, it’s useful to get Unix to do some summing for me.  awk, cut, sort, and uniq are quite handy in these cases, and often much faster than modifying and re-running Hive queries.  Here’s my bag of tricks: – [...]
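As a rough illustration of the kind of summing described above (a minimal sketch, not the author's own bag of tricks, which is elided here), the helpers below assume the Hive output has been saved as a tab-separated file. The function names `sum_column` and `sum_by_key` are hypothetical and exist only for this example.

```shell
# Hypothetical helpers; assumes tab-separated input such as saved Hive CLI output.

# Sum one numeric column: sum_column <column-number> <file>
sum_column() {
  awk -F'\t' -v col="$1" '{ total += $col } END { print total + 0 }' "$2"
}

# Sum a value column grouped by a key column:
#   sum_by_key <key-column> <value-column> <file>
# Output is "key<TAB>sum", sorted by key so two systems can be diffed.
sum_by_key() {
  awk -F'\t' -v k="$1" -v v="$2" \
    '{ sum[$k] += $v } END { for (key in sum) print key "\t" sum[key] }' "$3" | sort
}
```

For example, `sum_column 2 output.tsv` prints the total of the second column, and `cut -f1 output.tsv | sort | uniq -c` gives a quick row count per distinct value of the first column.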


A colleague recently pointed me at Datameer, an analytics front-end for Hadoop.  As their website and datasheet mention, they use a familiar spreadsheet interface for large data.  I recently saw a demo of the product, and I thought they had done a nice implementation of joins through a graphical user interface targeted at non-ETL experts. [...]

Hive Annoyances

As I mentioned in my prior post, I’ve been using Hadoop / Hive for six months now. My top three frustrations with Hive v0.4.0: 1) the lack of a decent-quality CLI (command line interface) for Hive. Editing of a query in Hive is very limited. You can’t use custom keybindings – ideally, you’d want the CLI to [...]

Hadoop and Hive

I’ve been using Hadoop and Hive for the last six months and have been pretty impressed with how well they work. To state the obvious, if you can correctly formulate your query, nothing beats this approach. It’s been very useful for doing cohort analysis and large-scale lifetime value computations on a relatively high traffic [...]


It’s been over a year since I last posted. A second child and a new job (focused on digital marketing at KAYAK) are the major updates; a new non-professional blog and a lot more time on digital photography have been among the other changes in the last year. In the context of digital marketing, I’ve [...]

Is running a company easier than picking a cereal?

NPR’s Fresh Air today interviewed Jonah Lehrer, who has a new book out about how humans make decisions. A couple of topics dominated the interview: An overload of choices makes decision-making much more difficult since our prefrontal cortex can only handle a small number of choices at a time (somewhere between 5 and 12) Several [...]

Physical interfaces

When I was working on a travel-related start-up idea last year, I had a call with a senior executive from a major web travel company. What she said really surprised me – that the majority of America still decides on where to go based on the following technique: when they see a compelling destination in [...]