Epistemological Modesty and Big Data

I am currently “reading” (listening to the audiobook actually) David Brooks’ Social Animal. First off, if you care about understanding human behavior in any realm, it is a fantastic read. I have always enjoyed Brooks’ columns at the NYT. His writing in this book is just as engaging and thought-provoking. I listened to a segment [...]

Digital Marketing and BI integrated in Business Process

With my evaluation of the major SEM platform vendors over the last few months as well as my time with demand-side platforms for display, I’ve finally experienced a productized version of business intelligence integrated with business process that scales across many different businesses. While evaluating (and now using) SEM platforms, I have been impressed by: [...]

Small businesses and BI

I saw Dayna Grayson’s post about the winners of NBVP’s Seed Competition. One of the winners was Profitably, which is providing analytics and BI on data in Quickbooks. They are joining a growing list of companies that are selling SaaS BI on standard data schemas to SMBs (small and medium businesses). Another recent one that [...]

Summing a column of numbers

Often, when I’m doing data validation with the output from Hive and comparing it to another system, it’s useful to get Unix to do some summing for me.  awk, cut, sort, and uniq are quite handy in these cases, and often much faster than modifying and re-running Hive queries.  Here’s my bag of tricks: – [...]
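
As an illustration of the kind of summing described above (a minimal sketch, not the tricks from the full post): the helper below adds up one column of delimited text with awk. The function name sum_column, the default tab delimiter, and the sample data are assumptions made for this example.

```shell
# Hypothetical helper: sum one column of delimited text read from stdin.
# Usage: sum_column <column-number> [delimiter]
# The delimiter defaults to a tab (an assumption about the exported data).
sum_column() {
    col="$1"
    delim="${2:-$(printf '\t')}"
    # Accumulate the chosen field; "+ 0" prints 0 instead of an empty line on empty input.
    awk -F "$delim" -v c="$col" '{ total += $c } END { print total + 0 }'
}

# Example: sum the second column of a small tab-separated sample.
printf 'a\t10\nb\t20\nc\t30\n' | sum_column 2   # prints 60
```

Running the same helper over the exports from both systems gives two totals to compare without modifying or re-running the query.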


A colleague recently pointed me at Datameer, an analytics front-end for Hadoop.  As their website and datasheet mention, they use a familiar spreadsheet interface for large data.  I recently saw a demo of the product, and I thought they had done a nice implementation of joins through a graphical user interface targeted at non-ETL experts. [...]

Hive Annoyances

As I mentioned in my prior post, I’ve been using Hadoop / Hive for six months now. My top three frustrations with Hive v0.4.0: 1) the lack of a decent-quality CLI (command line interface) for Hive. Editing of a query in Hive is very limited. You can’t use custom keybindings – ideally, you’d want the CLI to [...]


It’s been over a year since I last posted. A second child and a new job (focused on digital marketing at KAYAK) are the major updates; a new non-professional blog and a lot more time on digital photography have been among the other changes in the last year. In the context of digital marketing, I’ve [...]

A capital framework for evaluating career choices

My brother recently proposed the following framework for thinking about career choices or rather, choices “for your next gig.” It is a capital-oriented framework where traditional capital (money, equity) is just one part of it. He defines five different types of capital: Financial capital: traditional capital falls into this category e.g. How much money will [...]

Revenue – cost = profit

One of the simplest yet most striking ideas that I’ve heard for thinking about case interviews is this: always start with the simple framework that you are trying to maximize profit, which is the difference between revenue and cost. Everything else should be done either to increase revenue or decrease cost. Similarly, one of [...]

Who has time for this?

I finally decided to start my own blog. I always wondered how people manage to have time to write up blog entries. I will shortly find out. The title of this post, apart from being my own rhetorical question, is also the title of David Cowan’s blog.