Cloudera

Cloudera provides a distribution of the Hadoop data management platform known as CDH (Cloudera's Distribution Including Apache Hadoop). CDH is a comprehensive platform that significantly accelerates deployment of Apache Hadoop. It also adds capabilities designed to make Hadoop more useful in the enterprise.

Here are some of the features of Cloudera’s Hadoop distribution:

  • HDFS – Self-healing distributed file system
  • MapReduce – Powerful, parallel data processing framework
  • Hadoop Common – A set of utilities that support the Hadoop subprojects
  • HBase – Hadoop database for random read/write access
  • Hive – SQL-like queries and tables on large datasets
  • Pig – Dataflow language and compiler
  • Oozie – Workflow for interdependent Hadoop jobs
  • Sqoop – Integrate databases and data warehouses with Hadoop
  • Flume – Highly reliable, configurable streaming data collection
  • ZooKeeper – Coordination service for distributed applications
  • Hue – User interface framework and SDK for visual Hadoop applications
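
To make the MapReduce entry in the list above more concrete, here is a minimal sketch of the map, shuffle, and reduce steps for a word count, written in plain Python. It does not use Hadoop or any CDH API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration, and a real Hadoop job would run these steps in parallel across a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

The point of the model is that each map call and each reduce call depends only on its own input, which is what lets a framework such as MapReduce spread the work over many machines.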

Using the Cloudera distribution for Hadoop provides capabilities that are:

  • Hardened. Patched with fixes backported from later releases to improve stability and performance.
  • Integrated and simplified. Cloudera manages cross-component integration, versions, and interdependencies.
  • Functionally rich. The broadest feature set of any Hadoop distribution.
  • Proven in the enterprise. In use in financial services, telecom, web, manufacturing, media, and retail industries.
  • Flexible. Run CDH on premises or in the cloud, on multiple OS versions with multiple installation options.
  • Supported. Backed by the project founders and committers.
  • 100% Apache licensed.

Cloudera should be considered a first-choice platform for large-scale enterprise data management.

For more on Cloudera see: http://cloudera.com