Welcome to the DataCleaner community edition site.

The premier open source Data Quality solution.

What is DataCleaner?

Data profiling

The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values.
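To make the idea of profiling concrete, here is a minimal sketch in plain Python. It is not DataCleaner's engine or API. The function names (value_pattern, profile_column) and the pattern symbols (A, a, 9) are assumptions chosen for illustration. It counts missing values, tallies the character patterns of the remaining values, and flags values containing non-ASCII characters.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: 'A' for upper-case letters,
    'a' for other letters, '9' for digits; other characters are kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Hypothetical helper: summarize one column of string values."""
    missing = 0
    non_ascii = 0
    patterns = Counter()
    for v in values:
        # Treat None and blank strings as missing values.
        if v is None or v.strip() == "":
            missing += 1
            continue
        patterns[value_pattern(v)] += 1
        if not v.isascii():
            non_ascii += 1
    return {
        "rows": len(values),
        "missing": missing,
        "patterns": patterns,
        "non_ascii": non_ascii,
    }


report = profile_column(["AB-12", "cd-34", "", None, "Zoë-5"])
```

In this sketch, a column whose values mostly share one pattern but include a few outliers is a typical candidate for closer inspection.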

Interrogating and profiling your data is an essential activity of any Data Quality, Master Data Management or Data Governance program. If you don’t know what you’re up against, you have little chance of fixing it.

Data wrangling

DataCleaner is built to handle data both big and small. Give everything from CSV files and Excel spreadsheets to relational databases (RDBMS) and NoSQL databases a spin!

Use reference data, both external and internal, to verify that your data values correspond to the real world. DataCleaner allows you to build your own cleansing rules and apply them across different usage scenarios or target databases. Whether it is simple search/replace rules, regular expressions, pattern matching or completely custom transformations, it’s all possible.
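As a rough illustration of composing cleansing rules and checking the result against reference data, here is a small sketch in plain Python. It does not use DataCleaner's own components. The helper names (replace_rule, regex_rule, compose) and the sample reference set are hypothetical.

```python
import re
from typing import Callable

# A rule is any function that takes a string and returns a cleaned string.
Rule = Callable[[str], str]


def replace_rule(old: str, new: str) -> Rule:
    """Simple search/replace rule."""
    return lambda value: value.replace(old, new)


def regex_rule(pattern: str, repl: str) -> Rule:
    """Regular-expression substitution rule."""
    compiled = re.compile(pattern)
    return lambda value: compiled.sub(repl, value)


def compose(*rules: Rule) -> Rule:
    """Chain rules so that each one receives the previous rule's output."""
    def apply(value: str) -> str:
        for rule in rules:
            value = rule(value)
        return value
    return apply


# Illustrative pipeline: trim, collapse whitespace, upper-case, normalize a variant.
clean_country = compose(
    str.strip,
    regex_rule(r"\s+", " "),
    str.upper,
    replace_rule("U.S.A.", "USA"),
)

# Made-up reference data used to verify cleaned values.
REFERENCE_COUNTRIES = {"USA", "DENMARK"}

cleaned = clean_country("  u.s.a. ")
is_known = cleaned in REFERENCE_COUNTRIES
```

Because every rule has the same shape, the same rules can be recombined into different pipelines for different scenarios or targets.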

A Data Quality eco-system

Pluggability and connectivity are key to the open source design philosophy of DataCleaner. The application not only delivers out-of-the-box functionality, but also hosts an eco-system of community-driven extensions, integrations, shared content and more.

Developers can embed DataCleaner into other applications, build plug-ins for specific use cases, or even use adaptors that make DataCleaner work with Apache Hadoop and Apache Spark. Other prominent integrations include Pentaho Data Integration and support for custom data source definitions via the Apache MetaModel framework.