The DataValidator for PowerCenter product was originally developed by a company called DVO SOFTWARE. It is now available under the Informatica brand as Informatica PowerCenter Data Validation Option (DVO).
DVO is a custom tool built on top of Informatica PowerCenter. DVO integrates with the Informatica PowerCenter Repository and Integration Services and enables developers and business analysts to create rules to test the data being transformed during the data integration process.
DVO Architecture
Courtesy: Informatica Corp.
- Data Validation Option requires installation and setup of PowerCenter.
- Source and target data table and file definitions are imported from PowerCenter repositories.
- You set up table pairs and test rules in Data Validation Option. This test metadata is stored in the Data Validation Option repository.
- When the tests are run, DVO communicates with PowerCenter through an API to create appropriate mappings, sessions, and workflows, and to execute them.
- PowerCenter, not Data Validation Option, connects to the data being tested.
- After the tests are executed, results are stored in the Data Validation Option repository and displayed in the DVO Client.
Typical Data Validation Workflow
- Data Validation Option reads one or more PowerCenter metadata repositories.
- You define the validation rules in Data Validation Option.
- You run the rules to ensure the data conforms to the validation rules. When you do this, Data Validation Option performs the following tasks:
  - Creates and executes all tests through PowerCenter.
  - Loads results into the DVO results database and displays them in the DVO Client.
- You examine the results and identify sources of inconsistencies in the ETL process or the source systems.
- You repeat this process for new records.
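The workflow above can be pictured with a small, self-contained sketch. This is not DVO's API or repository schema: the `AggregateTest` class, the `run_tests` function, the `test_results` table, and the sample tables are all hypothetical names, and an in-memory SQLite database stands in for the source, the target, and the results database. It only illustrates the idea of defining rules once, running them, and storing the outcome for later review.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class AggregateTest:
    """Hypothetical test rule: compare one aggregate across a table pair."""
    name: str
    source_sql: str
    target_sql: str


def run_tests(conn, tests):
    """Run each rule, store the outcome in a results table, and return all rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_results "
        "(name TEXT, source_value, target_value, passed INTEGER)"
    )
    for test in tests:
        source_value = conn.execute(test.source_sql).fetchone()[0]
        target_value = conn.execute(test.target_sql).fetchone()[0]
        conn.execute(
            "INSERT INTO test_results VALUES (?, ?, ?, ?)",
            (test.name, source_value, target_value, int(source_value == target_value)),
        )
    return conn.execute(
        "SELECT name, source_value, target_value, passed FROM test_results"
    ).fetchall()


# Illustrative data: the target is missing one source row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_orders (id INTEGER, amount INTEGER);
    CREATE TABLE tgt_orders (id INTEGER, amount INTEGER);
    INSERT INTO src_orders VALUES (1, 100), (2, 250), (3, 75);
    INSERT INTO tgt_orders VALUES (1, 100), (2, 250);
""")

tests = [
    AggregateTest("row_count",
                  "SELECT COUNT(*) FROM src_orders",
                  "SELECT COUNT(*) FROM tgt_orders"),
    AggregateTest("sum_amount",
                  "SELECT SUM(amount) FROM src_orders",
                  "SELECT SUM(amount) FROM tgt_orders"),
]

results = run_tests(conn, tests)
for row in results:
    print(row)
```

Both rules fail here because the target table lacks order 3, which is the kind of inconsistency you would then trace back to the ETL process or the source system.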
Benefits of DVO
- DVO significantly reduces the time required for data validation and for production data auditing and verification. It replaces traditional methods such as visual inspection, data comparison tools, and hand-written SQL scripts (row counts, minus queries, and so on). The risk of human error with these traditional methods is particularly high when the data set is large.
- Maintaining different test scripts to validate data for different projects is cumbersome. DVO provides an easy-to-use GUI to test the rules created for data validation across multiple projects.
- No programming skills are needed to create validation tests.
- DVO includes a repository with reporting capabilities to provide a complete audit trail of all tests and their results.
- DVO reads data definitions from PowerCenter metadata repositories and can easily deal with data definition changes.
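For comparison, the hand-written approach mentioned in the first benefit (row counts and minus queries) might look like the sketch below. The table names and data are made up for illustration, and SQLite is used only so the example runs anywhere; SQLite spells the set difference as `EXCEPT`, where Oracle uses `MINUS`.

```python
import sqlite3

# Illustrative source and target tables; one name was altered during the load.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customer (id INTEGER, name TEXT);
    CREATE TABLE tgt_customer (id INTEGER, name TEXT);
    INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO tgt_customer VALUES (1, 'Ann'), (2, 'Bobby'), (3, 'Cy');
""")

# Row-count check: passes even though the data differs.
src_count = conn.execute("SELECT COUNT(*) FROM src_customer").fetchone()[0]
tgt_count = conn.execute("SELECT COUNT(*) FROM tgt_customer").fetchone()[0]
print("row counts match:", src_count == tgt_count)

# Minus query: source rows that have no identical row in the target.
missing_in_target = conn.execute(
    "SELECT id, name FROM src_customer "
    "EXCEPT "
    "SELECT id, name FROM tgt_customer"
).fetchall()
print("source rows not found in target:", missing_in_target)
```

The row counts agree while the minus query still finds a mismatched row, which shows why a single hand-written check is easy to get wrong and why such scripts multiply as projects grow.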
Key Pointers For DVO Testing
- DVO tests data only, not mappings or workflows. Testing mappings is unit testing, which is different from data validation.
- DVO reads only table definitions from PowerCenter metadata repositories and checks the data at either end of the process. It shows problems or inconsistencies only; it does not attempt to identify the bug in the ETL process.
- Do not copy formulas from the ETL mapping into Data Validation Option. If there is an error in the ETL mapping formula, you will replicate it in Data Validation Option, and Data Validation Option will not catch it. Therefore, you must always maintain a proper separation between ETL and testing.
- Do not try to do everything in Data Validation Option. If you think that a particular step can be accomplished more easily with SQL, use SQL. If you run 95% of your validation in Data Validation Option, and can document it with the audit trail, this is more than enough.
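The advice against copying ETL formulas into the test can be illustrated with a toy example. Everything here is hypothetical: `etl_total` stands in for a mapping expression that contains a bug, and `spec_total` stands in for a rule written independently from the business requirement. Neither is a DVO or PowerCenter function.

```python
def etl_total(unit_cents, quantity):
    """Hypothetical ETL mapping formula with a bug: it adds instead of multiplying."""
    return unit_cents + quantity


def spec_total(unit_cents, quantity):
    """Rule written independently from the requirement: total = unit price * quantity."""
    return unit_cents * quantity


# Source rows as (unit price in cents, quantity), and the target produced by the buggy ETL.
source_rows = [(250, 2), (100, 3)]
target_totals = [etl_total(unit, qty) for unit, qty in source_rows]

# Anti-pattern: the test reuses the ETL formula, so it reproduces the bug and passes.
copied_check = all(
    etl_total(unit, qty) == total
    for (unit, qty), total in zip(source_rows, target_totals)
)

# Independent check: the rule comes from the requirement, so the bug surfaces.
independent_check = all(
    spec_total(unit, qty) == total
    for (unit, qty), total in zip(source_rows, target_totals)
)

print("copied formula passes:", copied_check)
print("independent rule passes:", independent_check)
```

The copied formula agrees with the faulty target by construction, while the independently written rule fails and exposes the error.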
DVO Usage
- Validate Data being Transformed - ETL Testing, ETL Reconciliation, Application Migration
Courtesy: Informatica Corp.
- Validate if Data is Identical - ETL Migration, Database Migration, ETL Version Upgrade
References
- www.informatica.com
- Informatica PowerCenter Data Validation Option (Version 9.1.2.0) Installation and User Guide