Monthly Archives June 2010

Digital Marketing and BI integrated in Business Process

Between evaluating the major SEM platform vendors over the last few months and working with demand-side platforms for display advertising, I’ve finally seen a productized version of business intelligence that is integrated with business process and scales across many different businesses. While evaluating (and now using) SEM platforms, I have been impressed by: [...]

Small businesses and BI

I saw Dayna Grayson’s post about the winners of NBVP’s Seed Competition. One of the winners was Profitably, which provides analytics and BI on data in QuickBooks. They are joining a growing list of companies selling SaaS BI on standard data schemas to SMBs (small and medium businesses). Another recent one that [...]

Summing a column of numbers

Often, when I’m validating Hive output against another system, it’s useful to let Unix do some of the summing for me. awk, cut, sort, and uniq are quite handy in these cases, and they are often much faster than modifying and re-running Hive queries. Here’s my bag of tricks: [...]
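For readers who would rather script the same check, here is a minimal Python sketch of the two most common cases: summing one column, and summing a value column grouped by a key column (what you would otherwise do with awk or with sort and uniq). It assumes tab-delimited input, as printed by the Hive CLI, and 1-based column numbers like awk’s `$2`; the file name `sum_column.py` and both function names are hypothetical.

```python
import sys
from collections import defaultdict


def sum_column(lines, col, sep="\t"):
    """Sum the 1-based column `col` of delimited text lines (like awk's $col).

    Lines that are too short or have an empty field in that column are skipped.
    """
    total = 0.0
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= col and fields[col - 1].strip():
            total += float(fields[col - 1])
    return total


def sum_by_key(lines, key_col, val_col, sep="\t"):
    """Sum column `val_col` grouped by column `key_col` (both 1-based)."""
    totals = defaultdict(float)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        if len(fields) >= max(key_col, val_col) and fields[val_col - 1].strip():
            totals[fields[key_col - 1]] += float(fields[val_col - 1])
    return dict(totals)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pipe Hive output in and pass the column to sum, e.g.
    #   hive -e '<query>' | python sum_column.py 2
    print(sum_column(sys.stdin, int(sys.argv[1])))
```

Parsing every value as a float keeps the sketch simple; if you are reconciling exact integer counts, converting with `int` instead avoids any floating-point rounding in the comparison.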


A colleague recently pointed me to Datameer, an analytics front-end for Hadoop. As their website and datasheet mention, they use a familiar spreadsheet interface for large data. I recently saw a demo of the product, and I thought they had done a nice job of implementing joins in a graphical user interface aimed at people who aren’t ETL experts. [...]

Hive Annoyances

As I mentioned in my prior post, I’ve been using Hadoop / Hive for six months now. My top three frustrations with Hive v0.4.0: 1) the lack of a decent-quality CLI (command line interface). Editing a query in Hive is very limited. You can’t use custom keybindings; ideally, you’d want the CLI to [...]

Hadoop and Hive

I’ve been using Hadoop and Hive for the last six months and have been pretty impressed with how well they work. To state the obvious: if you can formulate your query correctly, nothing beats this approach. It’s been very useful for cohort analysis and large-scale lifetime value computations on a relatively high-traffic [...]
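To make the shape of a cohort/lifetime-value computation concrete, here is a toy, in-memory Python sketch. On Hadoop the same logic would run as Hive GROUP BY queries over far more data. The `(user_id, event_date, revenue)` event schema, grouping cohorts by the calendar month of a user’s first event, and defining lifetime value as average revenue per user to date are all assumptions for illustration, not the author’s actual queries.

```python
from collections import defaultdict
from datetime import date


def cohort_ltv(events):
    """Group users into cohorts by the month of their first event.

    events: iterable of (user_id, event_date, revenue) tuples (assumed schema).
    Returns {(year, month): (num_users, avg_revenue_per_user)}.
    """
    first_seen = {}
    revenue_by_user = defaultdict(float)
    # Pass 1: find each user's first event date and total revenue.
    for user, day, revenue in events:
        if user not in first_seen or day < first_seen[user]:
            first_seen[user] = day
        revenue_by_user[user] += revenue

    # Pass 2: roll users up into their cohort month.
    users_per_cohort = defaultdict(int)
    revenue_per_cohort = defaultdict(float)
    for user, day in first_seen.items():
        cohort = (day.year, day.month)
        users_per_cohort[cohort] += 1
        revenue_per_cohort[cohort] += revenue_by_user[user]

    return {
        cohort: (n, revenue_per_cohort[cohort] / n)
        for cohort, n in users_per_cohort.items()
    }


if __name__ == "__main__":
    sample = [
        ("u1", date(2010, 1, 5), 10.0),
        ("u1", date(2010, 3, 1), 5.0),
        ("u2", date(2010, 1, 20), 3.0),
        ("u3", date(2010, 2, 2), 7.0),
    ]
    for cohort, (n, ltv) in sorted(cohort_ltv(sample).items()):
        print(cohort, n, ltv)
```

The two passes mirror a typical two-step Hive job: first aggregate per user, then aggregate those per-user rows per cohort.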