HBase, one API to rule them all

Posted on 6 May 2015

In a recent O’Reilly podcast, Cloudera’s Michael Stack highlighted the recent contributions of Google’s engineering team to the HBase roadmap. At the time, he mostly emphasized Google’s deep experience and lead in wide-column datastore technology, and the considerable value the team adds in making HBase an even better database.

Today’s announcement of a Cloud Bigtable offering on Cloud Platform casts his remark in a new light. As emphasized by Google, Cloud Bigtable is fully compatible with ...

HBase: one database for both analytical and operational workloads

Posted on 15 Feb 2015

Given its deep integration with the Hadoop platform, the promise of HBase is clear: one datastore for both analytical and operational workloads. But these two types of workloads call for very different data access patterns.

On the one hand, exploratory analytics focuses on complex analysis. Data analysts need to run rich queries on the data, and typically produce analytic reports by combining standard SQL queries with BI visualization products. On the other hand, operational intelligence requires ...

Three cool features of HBase

Posted on 29 Jan 2015

With the release of HBase v1.0 now imminent, we would like to pause and share our thoughts on some cool features of HBase. We will not discuss HBase scalability, flexible schema design or deep integration with the Hadoop platform and ecosystem here; these are well known by now. Instead, we will focus on three additional characteristics that make HBase truly stand apart from other NoSQL databases:

  • Sorted row-keys
  • Control over data sharding
  • Strong consistency

Sorted row-keys

Manipulation ...

Kafka’s emergence in the Hadoop ecosystem

Posted on 25 Nov 2014

Although Kafka is far from having the same visibility as Spark, it is nonetheless emerging as a first-class citizen of the Hadoop ecosystem. Some would say this is only natural, considering the kafkaesque nature of the Hadoop ecosystem of open source projects…

What Kafka brings to the table is a top-notch, scalable layer for shuffling messages between different execution engines and streamlining data pipelines. The rising interest in Kafka is intimately linked to Hadoop’s transformation into a ...

A tribute to Facebook engineering

Posted on 30 Aug 2014

As a company heavily focused on HBase, we felt it appropriate to pay tribute to Facebook engineering on this blog. Facebook’s decision to use HBase as the backend for its Messages application back in 2010 was arguably a pivotal moment in the development of the column-oriented key-value store.

Back then, HBase was mostly used for storing web crawl data, and deployments were few and far between. Also, Facebook had internally developed its own key-value store, Cassandra, ...

Scala gaining ground in the Hadoop ecosystem

Posted on 27 Aug 2014

Java has historically been the main development language within the Hadoop community. Both Apache Hadoop and HBase are developed in Java, as was most of the early tooling for the platform. Overall, Java’s preeminence is clear, and the language’s vast pool of developers further guarantees its privileged position.

Over the last two to three years, however, Scala has been gaining significant ground. Obviously, the two languages should not be seen as opposed, since Scala programs run on the JVM. Scala provides full ...

