Friday, May 21, 2010

Concatenating multiple fields from the source table as comma separated values into a field in the target table

The purpose of the mapping discussed below is to concatenate the employee skills data available in the EMPLOYEE_SKILLS source table and load it into the EMPLOYEE_SKILL_SUMMARY target table.

Below is the script for EMPLOYEE_SKILLS source table:


The data available in the EMPLOYEE_SKILLS table is shown below:

Below is the script for EMPLOYEE_SKILL_SUMMARY target table:


The data from EMP_SKILL1, EMP_SKILL2 and EMP_SKILL3 columns of EMPLOYEE_SKILLS table should be concatenated as comma separated values into the EMP_SKILLS column of the EMPLOYEE_SKILL_SUMMARY table.

The mapping shown above is the earlier mapping which concatenated all the skill Input port values in the EMP_SKILL_OUT Output port in the EXP_CONCAT_SKILLS Expression Transformation. The expression used in the EMP_SKILL_OUT port is shown below.

The issue surfaced after the data was loaded in the target table. Below are the records loaded into the EMPLOYEE_SKILL_SUMMARY table.

Now, the EMP_SKILLS field was displayed in some analytical reports and the way the data showed up didn’t look very pleasing. For instance, EMP_ID = 1199 has no skills in the EMPLOYEE_SKILLS source table, so when the three input port values were concatenated in the expression transformation, the data displayed in the reports was ',,'. So, a logic needed to be implemented that took care of NULL values and didn’t add a ‘comma’ for the NULL values.

The new logic is discussed below.

Instead of concatenating all the EMP_SKILL1_IN, EMP_SKILL2_IN and EMP_SKILL3_IN port values together, the new logic checked if the Input Skill values in the ports EMP_SKILL1_IN and EMPSKILL2_IN were having NULL values. If it is a NULL value, it will pass a blank string as '', else it would concatenate the incoming value with a ‘comma’ in the variable ports EMP_SKILL1_V and EMP_SKILL2_V.

The EMP_SKILLS_CONCAT_V variable port will concatenate values from the variable ports EMP_SKILL1_V, EMP_SKILL2_V and the input port EMP_SKILLS3_IN.

Now, the only problem of a ‘comma’ occurring at the end of the concatenated string will be in a scenario when EMP_SKILL2_IN and EMP_SKILL3_IN values are NULL or when EMP_SKILL3_IN values is NULL. For instance, if we observe in the source table EMPLOYEE_SKILLS, for EMP_ID = 1127, the EMP_SKILL1 and EMP_SKILL2 columns have valid data. So, for this particular record, the EMP_SKILLS_CONCAT_V port will have the string value as ‘SQL Server 2005,OBIEE,’. At the end of the string, the extra comma gets added as well.

To eliminate this ‘comma’, the string in the variable port EMP_SKILLS_CONCAT_V is reversed in the variable port EMP_SKILLS_REVERSE_V. The EMP_SKILLS_REMOVE_COMMA_V variable port will check for the occurrence of a ‘comma’ at the start of the data string. If it encounters a comma, it will pass only the substring without the ‘comma’ at the start of the string, else it passes the data as it is. This is achieved with the expression as shown below.

Now, to get back the original concatenated string without any extra ‘comma’ , reverse the data again and pass it through the EMP_SKILLS_OUT output port to the final target table column EMP_SKILLS.

These are the records in the EMPLOYEE_SKILL_SUMMARY table after running the modified mapping again.

1 comment:

  1. Thank you very much for your detailed explanation, It just helped me a lot!