Value matcher

The Value matcher works much like the Value distribution analyzer, except that it takes a list of expected values and puts everything else into a group of 'unexpected values'. This division of values has a few consequences:

  1. You get a built-in validation mechanism. For example, you might expect only 'M' and 'F' values in your 'gender' column; anything else is unexpected and therefore, in a sense, invalid.

  2. The division makes it easier to monitor specific values in the data quality monitoring web application.

  3. This analyzer scales much better for large datasets, since the groupings are known in advance (deterministic) and can thus be prepared for in the batch run.
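
The idea behind the list above can be illustrated with a minimal sketch. This is not the analyzer's actual implementation; the function name `match_values` and the sample data are hypothetical and exist only for illustration. It shows why the groupings are deterministic: one counter per expected value is created up front, and everything else falls into a single 'unexpected' count.

```python
def match_values(values, expected_values):
    """Count occurrences of each expected value; lump the rest together.

    Hypothetical helper for illustration only, not part of any real API.
    """
    # The set of groups is fixed before any data is read.
    counts = {expected: 0 for expected in expected_values}
    unexpected = 0
    for value in values:
        if value in counts:
            counts[value] += 1
        else:
            # Anything not in the expected list is treated as unexpected.
            unexpected += 1
    return counts, unexpected


# Made-up sample data for a 'gender' column.
gender_column = ["M", "F", "F", "x", "M", "", "F"]
counts, unexpected = match_values(gender_column, ["M", "F"])
print(counts, unexpected)
```

Because the number of groups never grows with the data, memory use in this sketch depends only on the length of the expected-value list, not on how many distinct values the column contains.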