Tuesday 4 September 2012

FUZZY LOOKUP TRANSFORMATION





Introduction:
            Using Fuzzy lookup we can find the percentage matches between two strings.

Sample package:
Requirement:-

Source table:-
            Source table is having persons information (ID, Name, and Gender) shown below.



Lookup table:-
            Lookup table is having valid person’s names, shown below.


Target table:-
            We need to populate target table with valid records from source, using lookup table.


            Source table is having person’s information (Valid and invalid). Lookup table is having valid person’s names. I want to extract valid person’s information from source with the help of lookup table, and load it into target table.

            Source and lookup table person names (data in the Name column) are similar but not exactly matched, so we need to find percentage matched between the name columns. Based on that percentage we need to take decision weather that particular record is valid or invalid.

            Lookup transformation creates additional columns, in those “_Similarity” column returns the percentage matches between the mapped columns. Using conditional split we can split the data into different sets (valid and invalid records).


Package creation:-
  • From control flow items, drag and drop “Data flow task” into designer surface.
  • Right click on “Data flow task” and select edit option then it will go to “Data flow”
  • From data flow items, drag and drop “OLEDB source” into designer surface. Create the source table mappings.
  • After source mappings, drag and drop the “Fuzzy Lookup” transformation into designer surface. Connect the source green arrow to the transformation shown below screen shot.



  • Edit the “Fuzzy Lookup” transformation. Select the reference table (Lookup table) shown below.



  • Go to “Columns” tab.



  • In the above screen shot, two mappings are available between the ID and Name columns. We need to delete the ID mapping, because we need to find the valid records based on name field only.
  • Click on the ID mapping line and press delete button from the key board then mapping will get delete. Select “Name” column check box, from lookup columns. See the below screen shot for reference.



  • Click “OK” button, then “Fuzzy Lookup Transformation Editor” window will disappear.
  • Drag and drop the “Conditional split” transformation into designer surface.
  • Connect the “Fuzzy lookup” transformation green arrow to the “Conditional split”.
  • Edit the “Conditional split” and expand the columns.



  • Drag and drop the “_Similarity” column into “Condition” box.



  • If the strings are matched more than 60% then I want to consider that particular record as valid other wise we should ignore.

Condition: _Similarity>0.6



  • Click “OK” Button.
  • Drag and drop the “OLEDB destination” into designer surface.
  • Connect the conditional split green arrow to the “OLEDB destination” then “Input and output selection” window will open.



  • In that window “Output:” dropdown list select the case name (Case 1).



  • Click “OK” button.
  • Edit the “OLEDB target” and map the target table. After that execute the package.



Target table output:


Note: R King is not loaded into target table because the record is invalid record (Reference table doesn’t have the “R King” information).

 
 
 

 
 
 
 

 

No comments:

Post a Comment