Thursday, April 17, 2008

Data Warehouse Interview

Explain why and where exactly we use Lookup transformations.

You can use the Lookup transformation to perform many tasks, including:

Get a related value. For example, your source includes employee ID, but you want to include the employee name in your target table to make your summary data easier to read.

Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).

Update slowly changing dimension tables. You can use a Lookup transformation to determine whether rows already exist in the target.
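
As a rough illustration of the first and third uses above, here is a minimal Python sketch. It is not Informatica code; it only mimics what a lookup does. The employee IDs, names, column names and function names are all made up for the example.

```python
# Hypothetical reference data: employee ID -> employee name.
EMPLOYEE_NAMES = {101: "Alice", 102: "Bob"}


def lookup_employee_names(source_rows, names=EMPLOYEE_NAMES, default="UNKNOWN"):
    """Return copies of the source rows with an employee_name column added."""
    enriched = []
    for row in source_rows:
        out = dict(row)  # copy so the source row is left untouched
        out["employee_name"] = names.get(row["employee_id"], default)
        enriched.append(out)
    return enriched


def split_inserts_and_updates(source_rows, target_keys):
    """Look each row up in the target keys to decide insert vs. update."""
    inserts = [r for r in source_rows if r["employee_id"] not in target_keys]
    updates = [r for r in source_rows if r["employee_id"] in target_keys]
    return inserts, updates
```

A row whose key is missing from the reference data gets the default value instead of failing, which is one common way to handle unmatched lookups.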

How do you tell the Aggregator stage that the input data is already sorted?

By enabling the Sorted Input property in the Aggregator properties.
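
To see why the sorted-input hint matters, here is a small Python sketch of a streaming aggregation. It only illustrates the idea and does not show how any ETL tool implements it; the column names "dept" and "amount" are assumptions. When rows arrive sorted by the group key, each group can be totalled and emitted as soon as the key changes, without caching the whole input.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows, key="dept", value="amount"):
    """Yield (key, total) pairs; rows must already be sorted by key."""
    for group_key, group_rows in groupby(rows, key=itemgetter(key)):
        # groupby only merges adjacent rows, so one group is held at a time.
        yield group_key, sum(r[value] for r in group_rows)
```

If the input is not really sorted, the same key comes out more than once with partial totals, so the property should only be enabled when the sort order is guaranteed.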

What are push and pull ETL strategies?

Push and pull strategies determine how data comes from the source system to the ETL server.

Push : In this case the source system pushes (i.e. sends) data to the ETL server.

Pull : In this case the ETL server pulls (i.e. gets) data from the source system.

What is the difference between an ODS and a staging area?

ODS :-
An Operational Data Store holds recent operational data at a finer granularity than the data warehouse. The ODS comes after the staging area.
E.g.:
Let's say we have day-level granularity in the OLTP system and year-level granularity in the data warehouse.
If the business (a manager) asks for week-level granularity, we would have to go to the OLTP system and summarize the day-level data to the week level, which would be painstaking. So what we do is maintain week-level granularity in the ODS, for about 30 to 90 days of data.

Note : The ODS contains cleansed data only, i.e. data that has already passed through the staging area.
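
The day-to-week summarization in the example can be sketched in a few lines of Python. This is only an illustration under assumptions: the input is taken to be (date, amount) pairs, and weeks are taken to be ISO weeks, neither of which is specified above.

```python
from collections import defaultdict
from datetime import date


def rollup_to_weeks(daily_rows):
    """Sum day-level amounts into (ISO year, ISO week) buckets."""
    weekly = defaultdict(float)
    for day, amount in daily_rows:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        weekly[(iso[0], iso[1])] += amount
    return dict(weekly)
```

Keeping this pre-summarized result in the ODS means a week-level request does not have to go back to the OLTP system.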

Staging Area :-
It is populated once the ETL extract step has finished. The staging area consists of:
1. Metadata.
2. The work area where we apply our complex business rules.
3. A place to hold the data and do calculations.
In other words, it is a temporary work area.

How do you capture changes in data if the source system has no date/time field in the source table from which you need to extract the data?

The DW database can be Oracle or Teradata. The requirement here is to pull data from the source system, and the ETL needs to devise a mechanism to identify changed or new records. The source system can be a legacy system such as an AS400 application or a mainframe application. List all such methods of data capture. The ETL tool can be Informatica, DataStage or custom ETL code.
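
One possible method, shown here only as a sketch, is to compare the current extract against a hash of each row saved from the previous run. The function names and the idea of keeping the hashes as a dict keyed by primary key are assumptions for the example. Other approaches, such as database triggers or reading the database log, may suit a given source better.

```python
import hashlib


def row_hash(row, columns):
    """Hash the tracked columns of a row into one comparable string."""
    joined = "|".join(str(row[c]) for c in columns)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def detect_changes(previous_hashes, current_rows, key, columns):
    """Compare the current extract with the hashes saved by the last run."""
    inserts, updates = [], []
    current_hashes = {}
    for row in current_rows:
        h = row_hash(row, columns)
        current_hashes[row[key]] = h
        if row[key] not in previous_hashes:
            inserts.append(row)   # key not seen before: new record
        elif previous_hashes[row[key]] != h:
            updates.append(row)   # key seen, content differs: changed record
    deleted_keys = [k for k in previous_hashes if k not in current_hashes]
    # Save current_hashes somewhere durable to compare against next time.
    return inserts, updates, deleted_keys, current_hashes
```

On the first run previous_hashes is empty, so every row is treated as an insert.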

If a lookup (LKP) on the target table is used, can we update the rows without an Update Strategy transformation?

Yes, by using a dynamic lookup.

In what scenario is ETL coding preferred over database-level SQL or PL/SQL coding?

When the data scrubbing process is difficult. For example, a file contains a date column in the form 20070823 but the data warehouse requires the date as 08/23/2007; in that case the cleanup is easier to handle in the ETL layer.
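
The date example can be sketched as follows, assuming custom ETL code written in Python. The function name and the choice to return None for bad values are assumptions for the example.

```python
from datetime import datetime


def reformat_date(raw, in_fmt="%Y%m%d", out_fmt="%m/%d/%Y"):
    """Convert a YYYYMMDD string to MM/DD/YYYY; return None if it is invalid."""
    try:
        return datetime.strptime(raw.strip(), in_fmt).strftime(out_fmt)
    except ValueError:
        # Not a real calendar date in the expected layout: flag it for review.
        return None
```

Here reformat_date("20070823") gives "08/23/2007", and a value that is not a real date gives None so it can be routed to a reject file instead of being loaded.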
