Chapter 15. Apache Hadoop and Spark interface

Abstract

DataCleaner supports big data processing on the Apache Hadoop platform. This chapter guides you through setting up the environment and running your first DataCleaner job on Hadoop.

Table of Contents

Hadoop deployment overview
Setting up the Spark and DataCleaner environment
Upload configuration file to HDFS
Upload job file to HDFS
Upload executables to HDFS
Launching DataCleaner jobs using Spark
Using Hadoop in DataCleaner desktop
Configuring Hadoop clusters
CSV datastores on HDFS
Limitations of the Hadoop interface