DataStage Data Transformation
Transforming Data with the Transformer Stage
The Transformer stage is used within a DataStage job (referred to as DS for the remainder of this article) to filter, transform, or re-direct data. Overall, you can define conditions in the Transformer that will affect the output. In this article, we will explore some of the ways that you can use the Transformer.
Another common use of the Transformer, is splitting the output to multiple tables or databases. You can also test for specific conditions, and if it fails, you can define a ‘reject’ link, and examine the failed data.
These concepts are best demonstrated with examples.
There are 3 general types of transformations:
- Derivations change data.
- Constraints filter data
- Rejections direct data.
We have created a simple Parallel job with a source DB2 connector and the same for the target. The Transformer links the two connectors. Both the source and target have been configured. The three different types of Transforms will be described as follows:
Filter/Constraint – 1
Filters, or Constraints, can allow rows to pass that satisfy specific conditions. For example, you can filter customer data so that only Canadian customers can pass into a table. In addition, you can split data into multiple outputs, based on conditions. We could have one table populated with customers from Canada only, and another table populated with customers from any other country, excluding Canada. This is what the filter would look like the stage using ‘Constraints’.
Parallel Job using the Transformer Stage with Multiple Outputs
Derivations – 2
Derivations are typically used in conjunction with Stage Variables. As stated previously, Derivations are used to change data. Let us look at an example.
We have a field in our source table named ACTIVE. The value is either ‘Y’ or ‘N’. Our requirement in the Target table field ACTIVE is for that to be displayed as either ‘Active’ or ‘Inactive’. A Stage Variable ‘AcctActiveFlag’ is created to hold the output. An If/Else statement tests for the input values and outputs to ‘AcctActiveFlag’.
‘Y’ becomes ‘Active’ and ‘N’ becomes ‘Inactive’ in the target field. There are also various functions available (Date, Time, Number, String, and Type Conversion) that can assist us in converting data. We could change a null to a value using the ‘NullToValue’ function for instance.
Reject Links – 3
We can think of Reject Links as a direction type of transformation. Data that passes our criteria will continue to the Target table.
Data that fails our criteria is ‘re-directed’ to a separate link and output (ex: text file). We can then examine the data and ascertain the reason for the rejection.
This is yet another function available in the Transformer Stage. Multiple output rows can be written from an input row. Looping is useful for writing repeating values to the output for an input, as in grouping.
There are two components to Loop Processing:
- Loop Condition – This is a True/False expression that is evaluated after a row is read.
- Loop Variable – While the condition is met, the Variable will continue to execute.
Note ** You must ensure that the condition eventually changes to false. This is necessary to prevent an infinite loop state.
In the following example, we have a source table with a ‘MODEL’ field. The model names are delimited by the backslash character (Input Table). However, we need to split them. Our desired result is that each model will be on a separate row (Output Table).
To implement this, we would create a Stage Variable ‘CarCount’ in the Transformer Stage to hold the count of the fields. In the derivation we would create an expression assigned to split and count for ‘CarCount’. It would look like this:
Next, we set our Loop Condition as less than or equal to our Stage Variable (CarCount).
Furthermore, we create a Loop Variable ‘modelLoopVar’ that is assigned the value of the Derivation formula: Field(to_transformer.MODEL, “/”, @ITERATION, 1)
Drag the Loop Variable ‘modelLoopVar’ down to the Input Derivation. Your Transformer configuration should look like this:
The Transformer stage is quite useful at filtering and modifying data. However, it does have some drawbacks. The overhead when using a Transformer is relatively high. Keep this in mind when designing your job. A good practise is to examine all the requirements, then build your job accordingly.
If performance is paramount, DS provides different stages, where the functionality overlaps. These alternatives have a smaller performance ‘’cost’.
If you just want to split data to multiple outputs, you would be better served using the ‘Copy’ stage instead of the ‘Transformer’. Likewise, for filtering, use the ‘Filter’ stage. However, what makes the Transformer appealing is that it has all the features of the Copy, Filter, and Switch stage combined.
IBM’s online documentation provides a listing and description for every DS stage here at DS Stages