Hadoop and Hive

I’ve been using Hadoop and Hive for the last six months and have been pretty impressed with how well they work. To state the obvious, if you can formulate your query correctly, nothing beats this approach. It has been very useful for cohort analysis and large-scale lifetime value computations on a relatively high-traffic site. There are of course limits to what you want to keep in Hadoop/Hive; however, the convenience and the growing feature set are steadily easing those limits.

Hive is not a good backend store for a BI product, since it offers no caching at all. However, a workflow where you crunch the data in Hadoop/Hive and then export the results to a MySQL table (or an Endeca instance) for use in a BI tool works very well.
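As a rough sketch of the export step, here is a small Python example. It assumes the Hive result has already been dumped to tab-separated text, one row per cohort. The table name `cohort_ltv`, its columns, and the helper names are made up for illustration, and `sqlite3` stands in for MySQL so the example runs without a server. With a real MySQL driver the placeholders would be `%s` rather than `?`, and the upsert would be written as `REPLACE INTO`.

```python
import sqlite3


def parse_hive_rows(lines, delimiter="\t"):
    """Yield (cohort, users, revenue) tuples from delimited text lines.

    The three-column layout is an assumption made for this sketch.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        cohort, users, revenue = line.split(delimiter)
        yield cohort, int(users), float(revenue)


def load_summary(conn, rows):
    """Load the pre-aggregated rows into a small table a BI tool can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cohort_ltv ("
        "cohort TEXT PRIMARY KEY, users INTEGER, revenue REAL)"
    )
    # INSERT OR REPLACE is SQLite syntax; it makes re-running the export
    # overwrite a cohort instead of failing on the primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO cohort_ltv (cohort, users, revenue) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


# Hypothetical sample rows standing in for the Hive output file.
sample = ["2012-01\t1200\t8400.50\n", "2012-02\t950\t5130.00\n"]

conn = sqlite3.connect(":memory:")
load_summary(conn, parse_hive_rows(sample))

# The BI side then reads the small table instead of hitting Hive.
for cohort, ltv in conn.execute(
    "SELECT cohort, revenue / users FROM cohort_ltv ORDER BY cohort"
):
    print(cohort, round(ltv, 2))
```

The point of the split is that the heavy aggregation happens once in Hadoop/Hive, while the BI tool only ever queries a small, indexed summary table.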
