Thursday 3 July 2014

How to remove duplicate records from a flat file

Approach 1: Using a Sorter Transformation

Create a mapping as follows:
If your source is relational, select the "Select Distinct" option on the Properties tab of the Source Qualifier; no Sorter Transformation is required.
If your source is a flat file, add a Sorter Transformation between the Source Qualifier and the target, and select the "Distinct" option on its Properties tab.
 
Connect the required ports from the Sorter Transformation to the target instance.
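The Sorter is configured in the Designer rather than in code, but its "Distinct" logic is easy to picture: every port becomes a sort key, so identical rows end up next to each other and all but one can be dropped. The Python below is a minimal sketch of that idea, assuming the flat file is a simple comma-separated file; the function name `sorter_distinct` and the sample data are hypothetical and are not part of Informatica.

```python
import csv
import io
from typing import Iterable, List, Tuple

Row = Tuple[str, ...]


def sorter_distinct(rows: Iterable[Row]) -> List[Row]:
    """Sort on every column, then drop rows equal to the one just kept.

    Models a Sorter Transformation with "Distinct" enabled: every port acts
    as a sort key, so duplicate rows become adjacent and only one survives.
    """
    result: List[Row] = []
    for row in sorted(rows):
        # After sorting, a duplicate can only match the last row kept.
        if not result or row != result[-1]:
            result.append(row)
    return result


# Hypothetical flat-file contents (EmpId, Name, Salary), for illustration only.
SAMPLE = """101,John,5000
102,Mary,6000
101,John,5000
103,Ravi,4500
102,Mary,6000
"""

rows = [tuple(r) for r in csv.reader(io.StringIO(SAMPLE))]
for row in sorter_distinct(rows):
    print(row)
```

Running it prints three rows, `('101', 'John', '5000')`, `('102', 'Mary', '6000')` and `('103', 'Ravi', '4500')`: the output is both sorted and duplicate-free, just as the Sorter's output would be.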

Approach 2: Using an Aggregator Transformation

This approach works by selecting the Group By option for all ports in the Aggregator Transformation.

Create a mapping with an Aggregator Transformation between the Source Qualifier and the target.

In the Aggregator Transformation, select Group By for all ports. (If you want to filter duplicates based only on selected ports/columns, select Group By for just those ports/columns.)
 
Connect the required ports from the Aggregator Transformation to the target instance.
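Grouping by ports yields one output row per distinct combination of the group-by values. When only some ports are grouped, the other ports need a value from somewhere; the sketch below assumes the Aggregator's default of returning the last row received for each group. It is a hypothetical Python model of the logic (the function `aggregator_group_by` and the sample rows are illustrative), and its output order, first appearance of each group, may differ from what PowerCenter produces.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Tuple[str, ...]


def aggregator_group_by(rows: Iterable[Row], group_by: Sequence[int]) -> List[Row]:
    """Return one row per distinct value of the group-by columns.

    For columns that are not grouped, the last row seen in each group wins
    (assumed to mirror the Aggregator's default of returning the last row).
    Groups are returned in order of first appearance.
    """
    groups: Dict[Row, Row] = {}
    for row in rows:
        key = tuple(row[i] for i in group_by)
        groups[key] = row  # overwrite: the last row of the group is kept
    return list(groups.values())


# Hypothetical input rows: (EmpId, Name, Salary).
rows = [
    ("101", "John", "5000"),
    ("102", "Mary", "6000"),
    ("101", "John", "5000"),
    ("101", "John", "5500"),
]

# Group by all ports: only exact duplicates are removed.
print(aggregator_group_by(rows, [0, 1, 2]))
# Group by EmpId only: one row per employee, taking the last row received.
print(aggregator_group_by(rows, [0]))
```

Grouping by all ports keeps three rows, because `("101", "John", "5500")` differs from the other 101 rows in Salary. Grouping by EmpId alone keeps two rows, and the 101 row carries the Salary of the last row received, 5500.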

Approach 3: Using a Rank Transformation
Create a mapping with a Rank Transformation between the Source Qualifier and the target.

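In the Rank Transformation, select Group By for the “EmpId” port, choose a rank port, and set Number of Ranks to 1 so that only one row per EmpId is passed on.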
Connect the ports from the Rank Transformation to the target instance.
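A Rank Transformation grouped on EmpId ranks the rows within each EmpId group and passes on only the top N of them; with N set to 1, each employee appears once. The post names only the group-by port, so the rank port used here, `Salary`, is a hypothetical choice, as are the function `rank_top_n` and the sample data. This is a Python sketch of the logic, not Informatica's implementation.

```python
from typing import Dict, Iterable, List

Row = Dict[str, str]


def rank_top_n(rows: Iterable[Row], group_by: str, rank_port: str, n: int = 1) -> List[Row]:
    """Keep the top n rows (highest rank_port value) within each group.

    With n=1 this leaves exactly one row per group value, which is how a
    Rank Transformation grouped on EmpId removes duplicate employees.
    """
    groups: Dict[str, List[Row]] = {}
    for row in rows:
        groups.setdefault(row[group_by], []).append(row)

    result: List[Row] = []
    for key in groups:  # groups in order of first appearance
        # sorted() is stable, so ties keep their original input order.
        ranked = sorted(groups[key], key=lambda r: float(r[rank_port]), reverse=True)
        result.extend(ranked[:n])
    return result


# Hypothetical input rows; "Salary" is an assumed rank port.
rows = [
    {"EmpId": "101", "Name": "John", "Salary": "5000"},
    {"EmpId": "102", "Name": "Mary", "Salary": "6000"},
    {"EmpId": "101", "Name": "John", "Salary": "5500"},
    {"EmpId": "102", "Name": "Mary", "Salary": "6000"},
]

for row in rank_top_n(rows, group_by="EmpId", rank_port="Salary"):
    print(row)
```

Each EmpId comes out once: 101 with Salary 5500 (its highest value) and 102 with Salary 6000. Unlike the Sorter and the Aggregator over all ports, the rank port lets you decide *which* of the duplicate rows survives.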

3 comments:

  1. We can do this with just a Rank Transformation; there is no need for flagging and a Filter Transformation.

  2. Yes, we can also do this with a Rank Transformation. I have already posted it above as Approach 3 using the Rank Transformation.
