Saturday, May 22, 2010

Feature Manipulation Engine (FME) and Informatica PowerCenter

Feature Manipulation Engine (FME) is a spatial ETL tool by Safe Software Inc.

The advantage of a spatial ETL tool is that it can read, write and even manipulate spatial data. While an ETL tool must have processing capabilities for the various column types that are in a non-spatial database or system, a spatial ETL must also have the spatial operations - geoprocessing capabilities that change the structure and representation of spatial data - needed to move from one spatial database or GIS to another.

Besides, FME can read data from and write data to over 250 supported formats. Informatica PowerCenter as an ETL tool cannot read from or write data in the spatial domain. So, the FME Extension for Informatica extends Informatica PowerCenter with spatial data integration capabilities.

You can read more on FME at http://www.safe.com/. Also, Safe Software Inc. provides a Free FME Evaluation Download that has 14 day free evaluation period. It also has a series of tutorials on how to get started with FME, how to register the FME Extension in an Informatica PowerCenter Repository Service as a plug-in and how to use the FME Extension for Informatica.

As shown below, the Oracle_BI_DW_Base Informatica Repository Service has registered the FME Client plug-in in the Administration Console.


Now, if we check the Informatica PowerCenter Designer client tool, we have a new transformation added i.e. the FME Transformation as shown below.


The FME transformation, if used within any Informatica PowerCenter mapping will enable spatial data flow within Informatica. FME is certainly a tool to explore, particularly if you are well-versed with the spatial domain.

Friday, May 21, 2010

Concatenating multiple fields from the source table as comma separated values into a field in the target table

The purpose of the mapping discussed below is to concatenate the employee skills data available in the EMPLOYEE_SKILLS source table and load it into the EMPLOYEE_SKILL_SUMMARY target table.

Below is the script for EMPLOYEE_SKILLS source table:

CREATE TABLE EMPLOYEE_SKILLS
(EMP_ID NUMBER(10),
EMP_NAME VARCHAR2(30),
EMP_SKILL1 VARCHAR2(40),
EMP_SKILL2 VARCHAR2(40),
EMP_SKILL3 VARCHAR2(40));

The data available in the EMPLOYEE_SKILLS table is shown below:


Below is the script for EMPLOYEE_SKILL_SUMMARY target table:

CREATE TABLE EMPLOYEE_SKILL_SUMMARY
(EMP_ID NUMBER(10),
EMP_NAME VARCHAR2(30),
EMP_SKILLS VARCHAR2(120));

The data from EMP_SKILL1, EMP_SKILL2 and EMP_SKILL3 columns of EMPLOYEE_SKILLS table should be concatenated as comma separated values into the EMP_SKILLS column of the EMPLOYEE_SKILL_SUMMARY table.

The mapping shown above is the earlier mapping which concatenated all the skill Input port values in the EMP_SKILL_OUT Output port in the EXP_CONCAT_SKILLS Expression Transformation. The expression used in the EMP_SKILL_OUT port is shown below.


The issue surfaced after the data was loaded in the target table. Below are the records loaded into the EMPLOYEE_SKILL_SUMMARY table.


Now, the EMP_SKILLS field was displayed in some analytical reports and the way the data showed up didn’t look very pleasing. For instance, EMP_ID = 1199 has no skills in the EMPLOYEE_SKILLS source table, so when the three input port values were concatenated in the expression transformation, the data displayed in the reports was ',,'. So, a logic needed to be implemented that took care of NULL values and didn’t add a ‘comma’ for the NULL values.

The new logic is discussed below.


Instead of concatenating all the EMP_SKILL1_IN, EMP_SKILL2_IN and EMP_SKILL3_IN port values together, the new logic checked if the Input Skill values in the ports EMP_SKILL1_IN and EMPSKILL2_IN were having NULL values. If it is a NULL value, it will pass a blank string as '', else it would concatenate the incoming value with a ‘comma’ in the variable ports EMP_SKILL1_V and EMP_SKILL2_V.

The EMP_SKILLS_CONCAT_V variable port will concatenate values from the variable ports EMP_SKILL1_V, EMP_SKILL2_V and the input port EMP_SKILLS3_IN.

Now, the only problem of a ‘comma’ occurring at the end of the concatenated string will be in a scenario when EMP_SKILL2_IN and EMP_SKILL3_IN values are NULL or when EMP_SKILL3_IN values is NULL. For instance, if we observe in the source table EMPLOYEE_SKILLS, for EMP_ID = 1127, the EMP_SKILL1 and EMP_SKILL2 columns have valid data. So, for this particular record, the EMP_SKILLS_CONCAT_V port will have the string value as ‘SQL Server 2005,OBIEE 10.1.3.4,’. At the end of the string, the extra comma gets added as well.

To eliminate this ‘comma’, the string in the variable port EMP_SKILLS_CONCAT_V is reversed in the variable port EMP_SKILLS_REVERSE_V. The EMP_SKILLS_REMOVE_COMMA_V variable port will check for the occurrence of a ‘comma’ at the start of the data string. If it encounters a comma, it will pass only the substring without the ‘comma’ at the start of the string, else it passes the data as it is. This is achieved with the expression as shown below.



Now, to get back the original concatenated string without any extra ‘comma’ , reverse the data again and pass it through the EMP_SKILLS_OUT output port to the final target table column EMP_SKILLS.

These are the records in the EMPLOYEE_SKILL_SUMMARY table after running the modified mapping again.