What can you do to increase performance or explain Performance tuning in Informatica?
The goal of performance tuning is to optimize session performance so sessions run during the available load window for the Informatica Server.Increase the session performance by following:
The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect on session performance.So avoid network connections.
Flat files: If your flat files stored on a machine other than the Informatica server, move those files to the machine that consists of Informatica server.
Relational data sources: Minimize the connections to sources, targets and Informatica server to improve session performance.Moving target database into server system may improve session performance.
Staging areas: If you use staging areas you force Informatica server to perform multiple data passes.Removing of staging areas may improve session performance.
You can run the multiple Informatica servers’ against the same repository.Distributing the session load to multiple Informatica servers may improve session performance.
Run the Informatica server in ASCII data movement mode improves the session performance. Because ASCII data movement mode stores a character value in one byte.Unicode mode takes 2 bytes to store a character.
If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.
We can improve the session performance by configuring the network packet size, which allows
data to cross the network at one time. To do this go to server manger, choose server configure database connections.
If your target consists key constraints and indexes you slow the loading of data. To improve the session performance in this case drop constraints and indexes before you run the session and rebuild them after completion of session.
Running parallel sessions by using concurrent batches will also reduce the time of loading the
data. So concurrent batches may also increase the session performance.
Partitioning the session improves the session performance by creating multiple connections to sources and targets and loads data in parallel pipe lines.
In some cases if a session contains an aggregator transformation, you can use incremental aggregation to improve session performance.
Avoid transformation errors to improve the session performance.
If the session contained lookup transformation you can improve the session performance by enabling the look up cache.
If your session contains filter transformation, create that filter transformation nearer to the sources or you can use filter condition in source qualifier.
Aggreagator, Rank and joiner transformation may often decrease the session performance .Because they must group data before processing it. To improve session performance in this case use sorted ports option.
How do you handle decimal places while importing a flat file into Informatica?
While importing flat file definition just specify the scale for a numeric data type. In the mapping, the flat file source supports only number data type(no decimal and integer). In the SQ associated with that source will have a data type as decimal for that number port of the source.
Source - Number data type port - SQ - decimal data type. Integer is not supported. Hence decimal is taken care.
What is the use of incremental aggregation? Explain in brief with an example?
It’s a session option. When the Informatica server performs incremental aggregation, it passes new source data through the mapping and uses historical chache data to perform new aggregation calculations incrementally. For performance we will use it.
Differences between Normalizer and Normalizer transformation?
Normalizer: It is a transformation mainly used for Cobol sources,it changes the rows into columns and columns into rows
Normalization: To remove the redundancy and inconsistency
What is the target load order?
You specify the target load order based on source qualifiers in a mapping. If you have the multiple source qualifiers connected to the multiple targets, you can designate the order in which Informatica server loads data into the targets.
What are the output files that the Informatica server creates during the session run?
Informatica server log: Informatica server(on Unix) creates a log for all status and error messages(default name: pm.server.log). It also creates an error log for error
messages. These files will be created in Informatica home directory.
Session log file: Informatica server creates session log file for each session. It writes information about session into log files such as initialization process, creation of sql commands for reader and writer threads, errors encountered and load summary. The amount of detail in session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in mapping. Session detail include information such as table name, number of rows written or rejected you can view this file by double clicking on the session in monitor window.
Performance detail file: This file contains information known as session performance details which helps you where performance can be improved. To generate this file select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write to targets.
Control file: Informatica server creates control file and a target file when you run a session that uses the external loader. The control file contains the information about the target flat file such as data format and loading instructions for the external loader.
Post session email: Post session email allows you to automatically communicate information about a session run to designated recipents.You can create two different messages. One if the session completed successfully the other if the session fails.
Indicator file: If you use the flat file as a target, you can configure the informatica server to create indicator file. For each target row, the indicator file contains a number to indicate whether the row was marked for insert, update, delete or reject.
Output file: If session writes to a target file, the informatica server creates the target file based on file properties entered in the session property sheet.
Cache files: When the informatica server creates memory cache it also creates cache files.
For the following circumstances informatica server creates index and data cache files:
What is Data driven?
The informatica server follows instructions coded into update strategy transformations with in the session mapping determine how to flag records for insert, update, delete or reject. If you do not choose data driven option setting, the informatica server ignores all update strategy transformations in the mapping.
How the informatica server sorts the string values in Rank transformation?
We can run informatica server either in UNICODE data moment mode or ASCII data moment mode.
Unicode mode: In this mode informatica server sorts the data as per the sorted order in session.
ASCII Mode: In this mode informatica server sorts the date as per the binary order.
When do you use an unconnected lookup and connected lookup?
what is the difference between dynamic and static lookup?
Why and when do we use dynamic and static lookup?
In static lookup cache, you cache all the lookup data at the starting of the session. In dynamic lookup cache, you go and query the database to get the lookup value for each record which needs the lookup. Static lookup cache adds to the session run time, but it saves time as informatica does not need to connect to your database every time it needs to lookup. Depending on how many rows in your mapping needs a lookup, you can decide on this. Also remember that static lookup eats up space. so remember to select only those columns which are needed.
How do we do unit testing in Informatica? How do we load data in Informatica?
Unit testing in Informatica are of two types
1. Quantitative testing
2. Qualitative testing
1. First validate the mapping
2.Create session on the mapping and then run work flow.
Once the session is succeeded then right click on session and go for statistics tab.There you can see how many numbers of source rows are applied and how many number of rows loaded in to targets and how many number of rows rejected. This is called Quantitative testing.
If once rows are successfully loaded then we will go for qualitative testing.
1.Take the DATM (DATM means where all business rules are mentioned to the corresponding source columns) and check whether the data is loaded according to the DATM in to target table. If any data is not loaded according to the DATM then go and check in the code and rectify it.
This is called Qualitative testing.This is what a developer will do in Unit Testing.
Explain error handling in Informatica with examples?
There is one file called the bad file which generally has the format as *.bad and it contains the records rejected by informatica server. There are two parameters one for the types of row and other for the types of columns. The row indicators signify what operation is going to take place (i.e. insertion, deletion, updating etc.). The column indicators contain information regarding why the column has been rejected. (Such as violation of not null constraint, value error, overflow etc.) If one rectifies the error in the data present in the bad file and then reloads the data in the target, then the table will contain only valid data.
What is power center repository?
Standalone repository: A repository that functions individually, unrelated and unconnected to other repositories.
Global repository: (Power Center only.) The centralized repository in a domain, a group of connected repositories. Each domain can contain one global repository. The global repository can contain common objects to be shared throughout the domain through global shortcuts.
Local repository. (Power Center only.) A repository within a domain that is not the global repository. Each local repository in the domain can connect to the global repository and use objects in its shared folders.
Explain difference between static and dynamic cache with one example?
Static Cache: Once the data is cached, it will not change. Example unconnected lookup uses static cache.
Dynamic Cache: The cache is updated as to reflect the update in the table (or source) for which it is referring to. (Ex. connected lookup).
What is update strategy transformation?
The model you choose constitutes your update strategy, how to handle changes to existing rows. In Power Center and Power Mart, you set your update strategy at two different levels:
Within a session. When you configure a session, you can instruct the Informatica Server to either treat all rows in the same way (for example, treat all rows as inserts), or use instructions coded into the session mapping to flag rows for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag rows for insert, delete, update, or reject.
Explain Informatica server Architecture?
Informatica server, load manager/rs,data transfer manager,reader,temp server and writer are the components of Informatica server. First load manager sends a request to the reader if the reader is ready to read the data from source and dump into the temp server and data transfer manager manages the load and it send the request to writer as per first in first out process and writer takes the data from temp server and loads it into the target.
Define Informatica repository?
Informatica Repository: The Informatica repository is at the center of the Informatica suite. You create a set of meta data tables within the repository database that the Informatica application and tools access. The Informatica client and server access the repository to save and retrieve meta data.
What are the difference between joiner transformation and source qualifier transformation?
Joiner Transformation can be used to join tables from heterogeneous (different sources), but we still need a common key from both tables. If we join two tables without a common key we will end up in a Cartesian Join. Joiner can be used to join tables from difference source systems where as Source qualifier can be used to join tables in the same database. We definitely need a common key to join two tables no mater they are in same database or difference databases.
How can you improve session performance in aggregator transformation?
One way is supplying the sorted input to aggregator transformation. In situations where sorted input cannot be supplied, we need to configure data cache and index cache at session/transformation level to allocate more space to support aggregation.
What is the difference between connected and unconnected stored procedures ?
Unconnected: The unconnected Stored Procedure transformation is not connected directly to the flow of the mapping. It either runs before or after the session, or is called by an expression in another transformation in the mapping.
Connected: The flow of data through a mapping in connected mode also passes through the Stored Procedure transformation. All data entering the transformation through the input ports affects the stored procedure. You should use a connected Stored Procedure transformation when you need data from an input port sent as an input parameter to the stored procedure, or the results of a stored procedure sent as an output parameter to another transformation.
Is sorter an active or passive transformation? What happens if we uncheck the distinct option in sorter? Will it be under active or passive transformation?
Sorter is an active transformation. if you don't check the distinct option it is considered as a passive transformation. Because this distinct option eliminates the duplicate records from the table.
How can we partition a session in Informatica?
Partitioning option optimizes parallel processing on multiprocessor hardware by providing a thread-based architecture and built-in data partitioning. GUI-based tools reduce the development effort necessary to create data partitions and streamline ongoing troubleshooting and performance tuning tasks, while ensuring data integrity throughout the execution process. As the amount of data within an organization expands and real-time demand for information grows, the Power Center Partitioning option enables hardware and applications to provide outstanding performance and jointly scale to handle large volumes of data and users.
What are active and passive transformations?
Transformations can be active or passive. An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition.
A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation.
What is tracing level and what are the types of tracing levels?
Tracing level represents the amount of information that Informatica server writes in a log file.
Types of tracing level:
How can you say that union Transformation is Active transformation?
By Definition, Active transformation is the transformation that changes the number of rows that pass through it. In union transformation the number of rows resulting from union can be different from the actual number of rows.
Is a fact table normalized or de-normalized?
A fact table is always DENORMALISED table. It consists of data from dimension table (Primary Key's) and Fact table has foreign keys and measures.
What are the different threads in DTM process?
Master thread: Creates and manages all other threads
Mapping thread: One mapping thread will be creates for each session.Fectchs session and mapping information.
Pre and post session threads: This will be created to perform pre and post session operations.
Reader thread: One thread will be created for each partition of a source. It reads data from source.
Writer thread: It will be created to load data to the target.
Transformation thread: It will be created to transform data.
If we are using Update Strategy Transformation in a mapping how can we know whether insert or update or reject or delete option has been selected during running of sessions in Informatica?
In Designer while creating Update Strategy Transformation uncheck "forward to next transformation". If any rejected rows are there automatically it will be updated to the session log file.
Update or insert files are known by checking the target file or table only.
How to join two tables without using the Joiner Transformation?
It’s possible to join the two or more tables by using source qualifier. But provided the tables should have relationship.
When you drag and drop the tables you will be getting the source qualifier for each table. Delete all the source qualifiers. Add a common source qualifier for all. Right click on the source qualifier you will find EDIT, click on it. Click on the properties tab and then you will find sql query in that you can write your sql.
Which is better among incremental load, Normal Load and Bulk load?
It depends on the requirement. Otherwise Incremental load can be better as it takes only that data which is not available previously on the target.
What is the difference between summary filter and detail filter?
Summary filter can be applied on a group of rows that contain a common value. Whereas detail filters can be applied on each and every red of the data base.
What are the tasks that Load manger process will do?
Manages the session and batch scheduling: When you start the Informatica server the load manager launches and queries the repository for a list of sessions configured to run on the Informatica server. When you configure the session the load manager maintains list of list of sessions and session start times. When you start a session load manger fetches the session information from the repository to perform the validations and verifications prior to starting DTM process.
Locking and reading the session: When the Informatica server starts a session load manager locks the session from the repository. Locking prevents starting the session again and again.
Reading the parameter file: If the session uses a parameter files,load manager reads the parameter file and verifies that the session level parameters are declared in the file
Verifies permission and privileges: When the session starts load manger checks whether or not the user have privileges to run the session.
Creating log files: Load manger creates log file contains the status of session.
What is Router transformation?
Router transformation allows you to use a condition to test data. It is similar to filter transformation. It allows the testing to be done on one or more conditions.
What type of metadata is stored in repository?
Source definitions: Definitions of database objects (tables, views, synonyms) or files that provide source data.
Target definitions: Definitions of database objects or files that contain the target data.
Multi-dimensional metadata: Target definitions that are configured as cubes and dimensions.
Mappings: A set of source and target definitions along with transformations containing business logic that you build into the transformation. These are the instructions that the Informatica Server uses to transform and move data.
Reusable transformations: Transformations that you can use in multiple mappings.
Mapplets: A set of transformations that you can use in multiple mappings.
Sessions and workflows: Sessions and workflows store information about how and when the Informatica Server moves data. A workflow is a set of instructions that describes how and when to run tasks related to extracting, transforming, and loading data. A session is a type of task that you can put in a workflow. Each session corresponds to a single mapping.
How to delete duplicate rows in flat files source?
Use a sorter transformation, in this you will have a "distinct" option make use of it.
Can you use aggregator/active transformation after update strategy transformation?
You can use aggregator after update strategy. The problem will be, once you perform the update strategy, say you had flagged some rows to be deleted and you had performed aggregator transformation for all rows, say you are using SUM function, then the deleted rows will be subtracted from this aggregator transformation.
What is the difference between dimension table and fact table and what are different dimension tables and fact tables?
Fact table contain measurable data, contains primary key
Different types of fact tables:
2. Non additive
3. Semi additive
Dimensions table contain textual description of data.
It contains primary key.
What is meant by lookup cache?
The Informatica server builds a cache in memory when it processes the first row at a data in a cached look up transformation. It allocates memory for the cache based on the amount you configure in the transformation or session properties. The Informatica server stores condition values in the index cache and output values in the data cache.
Can you use the mapping parameters or variables created in one mapping into any other reusable transformation?
Yes. Because reusable transformation is not contained with any maplet or mapping.
What are reusable transformations?
You can design using two methods:
1. using transformation developer
2. Create normal one and promote it to reusable
What is Code Page used for?
Code Page is used to identify characters that might be in different languages. If you are importing Japanese data into mapping, you must select the Japanese code page of source data.
Can you use a session Bulk loading options and during this time can you make a recovery to the session?
If the session is configured to use in bulk mode it will not write recovery information to recovery tables. So Bulk loading will not perform the recovery as required.
What are the differences between connected and unconnected lookup?
1) Receives input values directly from the pipe line.
2) you can use a dynamic or static cache.
3) Cache includes all lookup columns used in the mapping.
4) Support user defined default values.
1) Receives input values from the result of a lkp expression in a another transformation.
2) You can use a static cache.
3) Cache includes all lookup output ports in the lookup condition and the lookup/return port.
4) Does not support user defined default values.
What are the scheduling options to run a session?
A session can be scheduled to run at a given time or intervel, or you can manually run the session.
Different options of scheduling:
Run only on demand: server runs the session only when user starts session explicitly.
Run once: Informatica server runs the session only once at a specified date and time.
Run every: Informatica server runs the session at regular intervals as u configured.
Customized repeat: Informatica server runs the session at the dates and times specified in the repeat dialog box.
What is parameter file?
Parameter file is to define the values for parameters and variables used in a session. A parameter file is a file created by text editor such as word pad or notepad.
You can define the following values in parameter file:
What are the session parameters?
Session parameters are like mapping parameters, that represent values you might want to change between sessions such as database connections or source files.
Server manager also allows you to create user defined session parameters. Following are user defined session parameters:
Source file names: Use this parameter when you want to change the name or location of session source file between session runs.
Target file name: Use this parameter when you want to change the name or location of session target file between session runs.
Reject file name: Use this parameter when you want to change the name or location of session reject files between session runs.
In a sequential batch can you run the session if previous session fails?
Yes. By setting the option always runs the session.
How can you transform row to a column?
1. We can use normalizer transformation
2.Use pivot function in oracle
What are the basic needs to join two sources in a source qualifier?
Basic need to join two sources using source qualifier:
1) Both sources should be in same database
2) The should have at least one column in common with same data types
What are two types of processes that Informatica runs the session?
Load manager Process: Starts the session, creates the DTM process, and sends post-session email when the session completes.
The DTM process: Creates threads to initialize the session, read, write, and transform data, and handle pre- and post-session operations.
What are mapping parameters and variables in which situation we can use it ?
If we need to change certain attributes of a mapping after every time the session is run, it will be very difficult to edit the mapping and then change the attribute. So we use mapping parameters and variables and define the values in a parameter file. Then we could edit the parameter file to change the attribute values. This makes the process simple.
Mapping parameter values remain constant. If we need to change the parameter value then we needs to edit the parameter file.
But value of mapping variables can be changed by using variable function. If we need to increment the attribute value by 1 after every session run then we can use mapping variables.
In a mapping parameter we need to manually edit the attribute value in the parameter file after every session run.
What is the method of loading 5 flat files of having same structure to a single target and which transformations I can use?
1. Write all files in one directory then use file repository concept (don’t forget to type source file type as indirect in the session).
2. Use union transformation to combine multiple input files into a single target.
In which circumstances that Informatica server creates Reject files?
When it encounters the DD_Reject in update strategy transformation.
Violates database constraint
Field in the rows was truncated or overflown.
What is the default join that source qualifier provides?
Inner equi join.
What is the difference between Stored Procedure (DB level) and Stored proc trans (INFORMATICA level) ? Why should we use SP trans ?
First of all stored procedures (at DB level) are series of SQL statement. And those are stored and compiled at the server side. In the Informatica it is a transformation that uses same stored procedures which are stored in the database. Stored procedures are used to automate time-consuming tasks that are too complicated for standard SQL statements. if you don't want to use the stored procedure then you have to create expression transformation and do all the coding in it.
What are variable ports and list two situations when they can be used?
We have mainly tree ports Import, Output, Variable port. Import represents data is flowing into transformation. Out port is used when data is mapped to next transformation. Variable port is used when we mathematical calculations are required.
This is a scenario in which the source has 2 columns
and there should be 2 targets one to show the duplicate values and another target for distinct rows.
10 A 10 A
20 C 20 C
which transformation can be used to load data into target?
Step1: sort the source data based on the unique key.
Flag= iif(col1 =prev_col1,'Y','N')
prev_col1 = col1
1.for duplicate record: condition: falg = 'Y'
2. For distinct Records condition
flag = 'N'
What r the types of lookup caches?
1) Static Cache
2) Dynamic Cache
3) Persistent Cache
4) Reusable Cache
5) Shared Cache
What are the real times problems that generally come up while doing/running mapping/any transformation? Explain with an example?
Here are few real time examples of problems while running informatica mappings:
1) Informatica uses OBDC connections to connect to the databases.The database passwords (production) is changed in a periodic manner and the same is not updated at the Informatica side. Your mappings will fail in this case and you will get database connectivity error.
2) If you are using Update strategy transformation in the mapping, in the session properties
you have to select Treat Source Rows: Data Driven. If we do not select this Informatica
server will ignore updates and it only inserts rows.
3) If we have mappings loading multiple target tables we have to provide the Target Load Plan
in the sequence we want them to get loaded.
4) Error: Snapshot too old is a very common error when using Oracle tables. We get this error
while using too large tables. Ideally we should schedule these loads when server is not very
busy (meaning when no other loads are running).
5) We might get some poor performance issues while reading from large tables. All the source tables should be indexed and updated regularly.
In update strategy target table or flat file which gives more performance? Why?
Pros: Loading, Sorting, Merging operations will be faster as there is no index concept and Data will be in ASCII mode.
Cons: There is no concept of updating existing records in flat file. As there is no indexes, while lookups speed will be lesser.
What is the difference between constraint base load ordering and target load plan ?
Constraint based load ordering
If the data in Table-1 is dependent on the data in Table-2 then Table-2 should be loaded first. In such cases to control the load order of the tables we need some conditional loading which is nothing but constraint based load. In Informatica this feature is implemented by just one check box at the session level.
What is parameter file?
When you start a workflow, you can optionally enter the directory and name of a parameter file. The Informatica Server runs the workflow using the parameters in the file you specify.
For UNIX shell users, enclose the parameter file name in single quotes:
For Windows command prompt users, the parameter file name cannot have beginning or trailing spaces. If the name includes spaces, enclose the file name in double quotes:
-paramfile ?$PMRootDirmy file.txt?
Note: When you write a pmcmd command that includes a parameter file located on another machine, use the backslash () with the dollar sign ($). This ensures that the machine where the variable is defined expands the server variable.
Pmcmd startworkflow -UV USERNAME -PV PASSWORD -s SALES: 6258 -f east -w wSalesAvg -paramfile '$PMRootDir/myfile.txt'