Three cool features of HBase

Posted on 29 Jan 2015

With the release of HBase v1.0 now imminent, we would like to pause and share our thoughts on some cool features of HBase. We will not talk here about HBase's scalability, flexible schema design or deep integration with the Hadoop platform and ecosystem; these are all well known by now. Instead, we will focus on three additional characteristics of HBase that make it truly stand apart from other NoSQL databases:

  • Sorted row-keys
  • Control over data sharding
  • Strong consistency

Sorted row-keys

Manipulation ...

Read more →

Kafka’s emergence in the Hadoop ecosystem

Posted on 25 Nov 2014

Although Kafka is far from having the same visibility as Spark, it is nonetheless emerging as a first-class citizen of the Hadoop ecosystem. Some would say this is only natural considering the Kafkaesque nature of the Hadoop ecosystem of open source projects…

What Kafka brings to the table is a top-notch, scalable layer to shuffle messages around different execution engines and streamline data pipelines. The rise of interest in Kafka is intimately linked to Hadoop’s transformation into a ...

Read more →

A tribute to Facebook engineering

Posted on 30 Aug 2014

As a company heavily focused on HBase, we felt it appropriate to pay tribute to Facebook engineering in this blog. Facebook’s decision to use HBase as the backend for its Messages application back in 2010 was arguably a pivotal moment in the development of this column-oriented key-value store.

Back then, HBase was mostly used for storing web crawling data, and deployments were few and far between. Also, Facebook had internally developed its own key-value store, Cassandra, ...

Read more →

Scala gaining ground in the Hadoop ecosystem

Posted on 27 Aug 2014

Java has historically been the main development language within the Hadoop community. Both Apache Hadoop and HBase are developed in Java, as is most of the early tooling for the platform. Overall, Java’s preeminence is clear, and the language’s vast pool of developers further guarantees its privileged position.

Over the last two to three years, however, Scala has been gaining significant ground. Obviously the two languages should not be seen as opposed, as Scala programs run on the JVM. Scala provides full ...

Read more →

Bigtop: The unsung hero of Hadoop’s development

Posted on 21 Jul 2014

With the release of Hadoop 2.0 and YARN, MapReduce has been downgraded to one among many execution engines running on the platform, and the combination of HDFS and YARN represents the new core of Hadoop. Or more precisely (to take into account the specificities of the MapR or IBM distributions), the combination of the HDFS API and YARN has become Hadoop’s kernel.

However, above this kernel, the Hadoop ecosystem remains a conglomerate of open-source projects often initiated ...

Read more →