Interview Questions And Answers Informatica Powercenter

Summary

Informatica PowerCenter is an ETL tool used to extract, transform, and load data. The list of important interview questions below covers topics such as dynamic file creation, join implementation, session recovery strategy, the difference between variables and parameters, cache types, the update strategy, mapping variables, and more. Candidates are also advised to be well-versed in Unix and SQL and familiar with automation concepts in PowerCenter; the ability to handle complex requirements and troubleshoot issues is equally essential for interview preparation.

This document contains a list of 50 very important Informatica PowerCenter questions that are a must-read before attending an interview.

Informatica PowerCenter is an ETL (Extract, Transform, and Load) tool that can extract information from various source systems such as databases, flat files, XML, etc., transform the extracted information according to the requirement, and load it into a target, which again can be a database, flat file, XML, etc.

Most asked interview questions for Informatica PowerCenter

1. How can dynamic files be created as a target in Informatica?

To generate multiple flat files dynamically, we can use the Transaction Control transformation (TCT). Informatica provides a special port, "FileName", in the flat-file target definition. Go to the Target Designer, edit the file definition, and click the button in the upper-right corner to add this special port; it can then be connected to an Expression transformation that generates the appropriate file name.

2. How can joins be performed in Informatica without using the joiner transformation?

This can be done using the source qualifier transformation by using the "User Defined Join" property.
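A minimal sketch, assuming two hypothetical relational sources EMPLOYEES and DEPARTMENTS connected to the same Source Qualifier and joined on DEPT_ID, the User Defined Join property would contain a condition such as:
EMPLOYEES.DEPT_ID = DEPARTMENTS.DEPT_ID
The Integration Service includes this condition in the SQL it generates against the source database, so the join is performed by the database rather than inside the mapping.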

3. What is meant by 'Output is deterministic' and 'Output is repeatable' in Informatica?

'Output is deterministic' is set when the source output does not change between session runs. 'Output is repeatable' is set when the order of the source output is the same between session runs. Normally, the Integration Service stages the source data for recovery; when these options are set, the source data is not staged for recovery.

4. What is meant by Session Recovery Strategy in Informatica?

Session Recovery Strategy is used when a session fails and needs to restart from the point of failure. Informatica uses an inbuilt repository table, OPB_SRVR_RECOVERY, to determine the last committed record and resumes from the next record. The session also needs to be configured for this. To do this:
Go to Session --> Properties --> set the attribute 'Recovery Strategy' to 'Resume from last checkpoint'.
The commit strategy also matters here: if the commit interval is 500 records and the session fails at the 1,100th record, the last commit was at record 1,000, so the new run resumes from the 1,001st record.

5. What is the difference between a variable and a parameter in Informatica?

A parameter in Informatica is one whose value is specified in a parameter file and cannot change during the session run. A variable, by contrast, is one whose value can change during the session run.

6. What is the difference between a Static cache and a dynamic Cache?

  • Static lookup cache:
    • When the Dynamic Lookup Cache property is not selected, the cache is static and the data in the cache stays the same for the entire session. PowerCenter does not update the cache while it processes the transformation.
  • Dynamic lookup cache:
    • When both the Lookup Caching Enabled and Dynamic Lookup Cache properties are selected, the lookup cache is dynamic.
    • As the Integration Service runs the session and passes rows to the target, whenever it inserts or updates a row in the target table it also inserts or updates the corresponding row in the cache.

7. Under what situation do we use dynamic lookup transformation?

A dynamic lookup transformation is used when the table being looked up can itself change during the session run, so those changes (for example, rows just inserted or updated in the target) must also be reflected in the lookup cache. A typical example is implementing SCD Type II.

8. When Dynamic Cache is selected, a new default port will be created. What does the new port do?

When the dynamic cache is enabled, the cache is updated as the lookup table changes during the session run, and a new default port called NewLookupRow is created. For each row it generates a value (0 = no change, 1 = new record, 2 = changed record), and based on these values a downstream Update Strategy transformation can act further:

  • 0 = Integration Service does not update or insert the row in the cache.
  • 1 = Integration Service inserts the row into the cache.
  • 2 = Integration Service updates the row in the cache.
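A minimal sketch of an Update Strategy expression acting on this port; rejecting unchanged rows is just one possible design choice:
IIF(NewLookupRow = 1, DD_INSERT, IIF(NewLookupRow = 2, DD_UPDATE, DD_REJECT))
Rows with NewLookupRow = 0 are thereby kept out of the target, while new and changed rows are inserted and updated respectively.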

9. What are the different types of ports in Expression transformation?

There are three types of ports:

  • Input
  • Output
  • Variable

10. An expression transformation has two variable ports at the beginning, one output port, and one input port. In what sequence are these ports evaluated?

The sequence will be: Input --> Variable1 --> Variable2 --> Output
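A minimal sketch illustrating this order with hypothetical ports that compute the difference between the current and previous row's salary (input ports are evaluated first, then variable ports top to bottom, then output ports; variable ports retain their value from the previous row):
IN_SAL (input)
v_PREV_SAL (variable) = v_CURR_SAL
v_CURR_SAL (variable) = IN_SAL
OUT_SAL_DIFF (output) = v_CURR_SAL - v_PREV_SAL
Because v_PREV_SAL is evaluated before v_CURR_SAL is overwritten, it still holds the previous row's salary when the output port is computed.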

11. What is MD5 function?

MD5 (Message-Digest algorithm 5) is a hash function in Informatica used to verify data integrity. The MD5 function calculates the checksum of the input value; it is a one-way cryptographic hash function with a 128-bit hash value.
MD5 returns a 32-character string of hexadecimal digits (0-9 and a-f), and returns NULL if the input is NULL.
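A minimal sketch of how MD5 is often used in an Expression transformation to detect changed records; the port names are hypothetical:
CHECKSUM_OUT = MD5(CUST_NAME || '|' || TO_CHAR(CUST_ID) || '|' || ADDRESS)
Comparing this checksum with the one stored on the target row tells you whether any of the concatenated columns changed, without comparing each column individually.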

12. A row has 5 columns. How do you take these 5 ports, i.e. a single row, and create 5 rows from it?

This can be done using the Normalizer transformation: set the number of occurrences for the column to 5, and the built-in GK and GCID ports are generated automatically.

13. What is meant by the Target load plan in Informatica?

When we have multiple targets to load in a mapping, we can decide which target to load first and which to load next. This is helpful when two or more targets have a primary key-foreign key relationship between them.
To do this: go to Mappings --> Target Load Plan --> select the target load order.

14. While comparing the source with the target, when there is a need to avoid duplicate rows coming from the source, then which lookup should be used?

Dynamic Lookup should be used, as it creates a dynamic cache of the target which changes as the rows are processed. In this case, rows can be determined for insert or update. A normal lookup creates a static cache of the target.

15. You are reading data from a huge (100 million row) flat file; the target is an Oracle table on which you either insert or update, and Oracle has the correct indexes. What do you look at on the Informatica side to improve performance?

Drop the indexes before the session runs and re-create them once the session completes (for example, through pre-session and post-session SQL or command tasks).

16. What are the different partitioning methods in Informatica?

  • Database partitioning: This can be used with Oracle or IBM DB2 source instances on a multi-node tablespace, or with DB2 targets. The PowerCenter Integration Service queries the IBM DB2 or Oracle system for table partition information and reads partitioned data from the corresponding nodes in the database.
  • Hash partitioning: Use this when you want the PowerCenter Integration Service to distribute rows to the partitions by group, for example when you need to sort items by item ID but do not know how many items have a particular ID number. There are two types of hash partitioning:
    • Hash auto-keys: The PowerCenter Integration Service creates a compound partition key using all grouped or sorted ports. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
    • Hash user keys: The PowerCenter Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.
  • Key range: The PowerCenter Integration Service passes data to each partition depending on the ranges you specify for each port. One or more ports can form a compound partition key. Use key range partitioning where the sources or targets in the pipeline are partitioned by key range.
  • Pass-through: Choose pass-through partitioning where you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions. The PowerCenter Integration Service passes all rows at one partition point to the next partition point without redistributing them.
  • Round-robin: The PowerCenter Integration Service distributes blocks of data to one or more partitions so that each partition processes rows based on the number and size of the blocks.

17. How does Update Strategy work in Informatica?

The Update Strategy transformation is used to insert, update, and delete records in the target table. It can also reject records so that they never reach the target. Within a mapping, the update strategy expression flags each row with one of the following constants:
DD_INSERT: Numeric value is 0. Used for flagging the row as Insert.
DD_UPDATE: Numeric value is 1. Used for flagging the row as Update.
DD_DELETE: Numeric value is 2. Used for flagging the row as Delete.
DD_REJECT: Numeric value is 3. Used for flagging the row as Reject.
At the session level, 'Treat source rows as' must be set to 'Data Driven'.
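A minimal sketch of an update strategy expression, assuming a hypothetical lookup return port LKP_CUST_ID that is NULL when the row does not yet exist in the target:
IIF(ISNULL(LKP_CUST_ID), DD_INSERT, DD_UPDATE)
Each row is then flagged as an insert or an update, and the data-driven session applies the corresponding operation on the target.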

18. When doing inserts and updates on a target table, the Update Strategy transformation can perform poorly. How do you perform inserts and updates without using it? How do you design the mapping?

We can create two mappings: one that inserts the new records and another that updates the existing records. In the update mapping, connect the key column and the columns that have to be updated to the target table. In the corresponding sessions, set 'Treat source rows as' to Insert for the insert mapping and to Update for the update mapping.

19. The Update Strategy transformation provides constants for insert, update, delete, and reject. What does the update strategy expression evaluate to?

For each row, the expression evaluates to one of these constants, and the Integration Service performs the corresponding operation on that row.

20. If the update strategy evaluates a record to Reject, what will Informatica do?

The Integration Service does not write that record to the target. If the 'Forward Rejected Rows' option is enabled on the Update Strategy transformation, the record is written to the session reject file; otherwise it is dropped.

21. How do you improve the performance of a Joiner Transformation?

Use sorted input on the Joiner, and choose the table with fewer records as the master table.

22. What is Transaction Control transformation?

The Transaction Control transformation is used to control the commit and rollback of transactions based on an expression evaluated for each row. The following built-in variables are available in the transaction control expression:

  • TC_CONTINUE_TRANSACTION: The Integration Service does not perform any transaction change for this row. This is the default value of the expression.
  • TC_COMMIT_BEFORE: The Integration Service commits the transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
  • TC_COMMIT_AFTER: The Integration Service writes the current row to the target, commits the transaction, and begins a new transaction. The current row is in the committed transaction.
  • TC_ROLLBACK_BEFORE: The Integration Service rolls back the current transaction, begins a new transaction, and writes the current row to the target. The current row is in the new transaction.
  • TC_ROLLBACK_AFTER: The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled-back transaction.
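A minimal sketch of a transaction control expression, assuming a hypothetical flag port NEW_FILE_FLAG that marks the first row of each group:
IIF(NEW_FILE_FLAG = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
This is the typical pattern used together with the FileName port from question 1: each committed transaction is written out as a separate flat file.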

23. What is the difference between a Reusable transformation and a Mapplet?

A reusable transformation is a single transformation that can be reused in many mappings.
A Mapplet is a group of transformations that together implement a particular piece of logic and can be reused in many mappings.

24. Can a Mapplet read data from a source, pass it through an Expression transformation, and write data to a target?

We can add source definitions that act as a source of data for the mapplet, and we can add as many sources as we want; another way to feed data into a mapplet is with an Input transformation. A mapplet can contain an Expression transformation. However, the output of a mapplet cannot be connected to a target inside the mapplet, because target definitions cannot be included in a mapplet.
You cannot include the following objects in a mapplet:

  • Normalizer transformations.
  • COBOL sources.
  • XML Source Qualifier transformations.
  • XML sources.
  • Target definitions.
  • Pre- and post-session stored procedures.
  • Other mapplets.

25. You design a Mapplet and drag it into a mapping. Which of its ports are visible? How do you pass data into the Mapplet from the mapping, and what are the output ports?

The Input and Output transformation ports of the Mapplet are visible. Data is passed into the Mapplet through its Input transformation ports, and the Output transformation ports are what downstream transformations or targets in the mapping connect to.

26. What are the different types of tasks?

Different types of tasks include:

  1. Assignment - Used to assign a value to a workflow variable.
  2. Command - Used to run a shell command during the workflow.
  3. Control - Used to stop or abort the workflow.
  4. Decision - Specifies a condition to evaluate.
  5. Email - Used to send an email during the workflow.
  6. Event-Raise - Notifies the Event-Wait task that an event has occurred.
  7. Event-Wait - Waits for an event to occur before starting the next task.
  8. Session - Used to run a mapping created in the Designer by linking it to the session.
  9. Timer - Waits for a specified time before starting the next task.

27. A Session task runs a mapping. You design a mapping and somebody uses it in a workflow. Can you name a few properties of the mapping that can be overridden at the session level?

Some of the properties of mapping that we can override are:

  • Table names.
  • Properties to treat the target rows (insert, update).
  • Joining condition.
  • Filter condition.
  • Source qualifier overrides.

28. What is a Control Task in a workflow?

Control tasks can be used to stop or abort the workflow.

29. If you run a Workflow and it fails, how would you investigate the failure?

We can do this by looking at the session and workflow logs.

30. How do you access the Session or Workflow log?

We can access the session and workflow logs from the Workflow Monitor (right-click the session or workflow run and choose Get Session Log or Get Workflow Log).

31. What sections are present in a workflow parameter file? What type of information does it contain?

Sample structure:
[Global]
[Folder_Name.WF:wf_name]
$param1=
It might contain log file names, parameter values such as dates that have to be passed in, and connection strings. The values remain the same for the duration of the current session run.
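A slightly fuller sketch of a parameter file; the folder, workflow, session, and parameter names below are hypothetical:
[Global]
$$ENV=PROD
[MY_FOLDER.WF:wf_load_customers]
$$LOAD_DATE=2020-01-31
$DBConnection_SRC=ORA_SRC_CONN
[MY_FOLDER.WF:wf_load_customers.ST:s_m_load_customers]
$InputFile_CUST=/data/in/customers.dat
The $$ prefix denotes mapping parameters and variables, a single $ prefix denotes session parameters such as connections and file names, and values in a more specific section override those in [Global].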

32. How do you execute a workflow from a Unix shell script?

Using the pmcmd command:
pmcmd startworkflow -sv ${INTEGRATION_SERVICE} -d ${DOMAIN_NAME} -u ${INFA_USR} -p ${INFA_PWD} -f ${FOLDER_NAME} -wait ${WORKFLOW_NAME}
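A minimal shell sketch around the same command; the environment variable names mirror the ones above and are assumptions, not fixed names. Because -wait is used, pmcmd blocks until the workflow finishes and returns a zero exit code only on success:
#!/bin/sh
# Start the workflow and wait for it to complete
pmcmd startworkflow -sv ${INTEGRATION_SERVICE} -d ${DOMAIN_NAME} -u ${INFA_USR} -p ${INFA_PWD} -f ${FOLDER_NAME} -wait ${WORKFLOW_NAME}
rc=$?
# A non-zero return code means the workflow failed or could not be started
if [ ${rc} -ne 0 ]; then
    echo "Workflow ${WORKFLOW_NAME} failed with return code ${rc}"
    exit ${rc}
fi
echo "Workflow ${WORKFLOW_NAME} completed successfully"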

33. How can a mapping variable be used in a workflow?

A mapping variable can be assigned to a workflow variable in the Workflow Manager.
To do this: Go to Session→ Edit→Components→Pre/Post session variable assignment.

34. How can the session name be used in a command task in a workflow?

This can be done with the help of $PMSessionName.
Example: echo "Session: $PMSessionName" >> Dummy.txt

35. How can the session name and workflow name be used in an Email task without hardcoding them?

This can be done by using %s for the session name and %w for the workflow name.

36. We have a scenario where we want our session to stop processing further when the 2nd error is detected while running it. How to achieve this?

There is a session-level property, 'Stop on errors'.
Set this value to 2.

37. What type of sessions allow the variable assignment from mapping to a workflow?

Only non-reusable sessions. For reusable sessions, we cannot assign mapping variables to workflow variables.

38. What are different types of tracing levels in Informatica 10.1.1?

  • None: Applicable only at the session level; the Integration Service uses the tracing levels configured in the mapping.
  • Terse: The Integration Service logs initialization information, error messages, and notification of rejected data in the session log file.
  • Normal: The Integration Service logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. It summarizes session results, but not at the level of individual rows.
  • Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details, the names of index and data files used, and detailed transformation statistics.
  • Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into the mapping. It also notes where it truncates string data to fit the precision of a column and provides detailed transformation statistics. With this tracing level, the Integration Service writes row data for all rows in a block when it processes a transformation.

39. How can the ‘not exists’ operator be implemented in Informatica?

Implementing the NOT EXISTS operator is straightforward in Informatica. For example, to get only the records that are present in table A and not in table B, use a Joiner transformation with A as the master and B as the detail, specify the join condition, and select Detail Outer Join as the join type. This returns all records from table A and only the matching records from table B. Connect the Joiner to a Filter transformation whose condition checks that the B-side key port is NULL; this keeps the records that are in A and not in B. Then connect the Filter to the target definition.
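A minimal sketch of that filter condition, assuming the B-side join key is carried through a hypothetical port B_CUST_ID:
ISNULL(B_CUST_ID)
Because the detail outer join returns NULL for the B-side ports when no match exists, this condition keeps exactly the rows present in A but missing from B.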

40. If a parameter file path is defined at both the session level as well as at the workflow level, which path will be taken while running the workflow?

The workflow-level parameter file is picked up irrespective of whether a parameter file is defined at the session level or not.

41. When do we select a connected/unconnected transformation in Informatica?

An unconnected Lookup should be selected when the same lookup has to be performed at multiple places; it is a kind of reusable lookup that is called like a function from any transformation using the :LKP expression, and it uses only a static cache.
A connected Lookup should be used when the same lookup does not need to be reused at multiple places. It is part of the data flow and can use either a static or a dynamic cache.

42. When we right click on a running session from the workflow monitor, Stop and Abort options are available. What is the difference between both of them?

The Stop option makes the Integration Service stop reading data from the source, but it continues processing and writing out the records already read.
The Abort option not only stops reading records from the source but also stops processing the in-flight records; it has a 60-second timeout after which the session is terminated.

43. What scenarios compel the Informatica server to reject records?

This happens when a row is flagged DD_REJECT in the Update Strategy transformation. Rows are also rejected when they violate a database constraint defined on the target table.

44. How can we keep last 20 session logs in Informatica?

Go to Session → Properties → Config Object → Log Options
Set these two properties:
Save Session Log By: Session runs
Save Session Log for These Runs: 20

45. How can we delete duplicate rows from flat files?

We can use a Sorter transformation and select the Distinct option to remove the duplicate rows.

46. Under what condition selecting Sorted Input in aggregator may fail the session?

Even if the input data is properly sorted, the session may fail if the order of the sort-key ports does not match the order of the group-by ports in the Aggregator.

47. How does Sorter handle NULL values?

We can configure the way the sorter transformation treats null values. Enable the property Null Treated Low if we want to treat null values as lower than any other value when it performs the sort operation. Disable this option if we want the Integration Service to treat null values as higher than any other value.

48. How does rank transformation handle string values?

The Rank transformation can return strings at the top or the bottom of the session sort order. When the Integration Service runs in Unicode mode, it sorts character data using the sort order associated with the Integration Service code page (which may be French, German, etc.). When it runs in ASCII mode, it ignores this setting and uses a binary sort order to sort character data.

49. What is a Dimensional Model?

  • Data modeling is the process of designing the database to fulfil the business requirement specifications.
  • A data modeler (or database architect) designs the warehouse database using a GUI-based data modeling tool such as ERwin.
  • ERwin is a data modeling tool from Computer Associates (CA).
  • A dimensional model consists of the following types of schemas designed for the data warehouse:
    • Star schema.
    • Snowflake schema.
    • Galaxy schema.
  • A schema is a data model consisting of one or more tables.

50. What are the different ways of debugging a code in Informatica?

  • Use a debugger at the mapping level. A debugger gives you row by row data.
  • Use the workflow/session logs.
  • Create and add a target after any transformation whose output data you want to check.
  • Change the tracing level at session level/transformation level to verbose data.

Important Tips to remember while preparing for an interview

  • The interview questions for any ETL tool like Informatica PowerCenter or IBM DataStage mainly consist of three types of questions:
    • Theoretical questions related to the tool.
    • Scenario-based questions related to the tool.
    • How to implement a particular piece of functionality through the tool and with SQL/Unix (whichever is applicable). Basic knowledge of Unix and SQL is essential for clearing any ETL interview.
  • A lot of questions are generally asked about automation that you may have implemented in your project using Informatica PowerCenter.
    • For example, export and import automation, which can be done using Unix and PowerCenter.
  • Thoroughly go through all the properties at the session, workflow, and mapping levels; a lot of questions are asked about them.
  • Be prepared for general questions like 'What is the most complex requirement you have implemented in Informatica PowerCenter' or 'What is the most complex bug you have faced and how did you fix it'.

All the Best!!
