Header Ads

SSIS Interview Questions and Answers

SSIS Development Questions

How many difference source and destinations have you used?
 It is very common to get all kinds of sources so the more the person worked with the
better for you. Common ones are SQL Server, CSV/TXT, Flat Files, Excel, Access, Oracle,
MySQL but also Salesforce, web data scrapping.

What configuration options have you used?
 This is an important one. Configuration should always be dynamic and usually is done using XML and/or  Environment Variable and SQL Table with all configurations.

How do you apply business rules in SSIS (Transformations….Specific calculations
but also cleansing)?
 Some people use SSIS only to extract data and then go with stored procedures
only….they are usually missing the point of the power of SSIS. Which allows to create "a
flow" and on each step applies certain rules this greatly simplifies the ETL process and
simplicity is very good.

How to quickly load data into sql server table?
 Fast Load option. This option is not set by default so most developers know this
answer as otherwise the load is very slow.

Give example of handling data quality issues?
Data Quality is almost always a problem and SSIS handles it very well. Examples
include importing customers from different sources where customer name can be duplicates.
For instance you can have as company name: SQL Server Business Intelligence but also SQL
Server BI or SQL Server BI LTD or SQL Server BI Limited or intelligence (with one l).
There are different ways to handle it. Robust and time consuming is to create a table with or
possible scenarios and update it after each update. You can also use fuzzy grouping which is
usually easy to implement and will make usually very good decisions but it is not 100%
accurate so this approach has to be justified. Other typical quality issues are nulls (missing
values), outliers (dates like 2999 or types like 50000 instead of 5000 especially important if
someone is adjusting the value to get bigger bonus), incorrect addresses
and these are either corrected during ETL, ignored, re-directed for further manual updates or
it fails the packages which for big processes is usually not practised.

 When to use Stored Procedures?
 This was one of the requested question in comment (at the bottom of the page). This
one is very important but also tricky. ALL SSIS developers have SQL Server background and
that is sometime not very good if they use SQL not SSIS approach.

Let's start with when you typically use SPs. This is for preparing tables (truncate), audit tasks
(usually part of SSIS framework), getting configuration values for loops and a few other
general tasks.

During ETL extract you usually type simple SQL because it comes from other sources and
usually over complication is not a good choice (make it dynamic) because any changes
usually affect the package which has to be updated as well.

During Transformation phase (business rules, cleaning, core work) you should use
Transformation tasks not Stored procedures! There are loads of tasks that make the package
much easier to develop but also a very important reason is readability which is very important
for other people who need to change the package and obviously it reduces risks of making
errors. Performance is usually very good with SSIS as it is memory/flow based approach. So
when to use Stored Procedures for transformations? If you don't have strong SSIS developers
or you have performance reasons to do it. In some cases SPs can be much faster (usually it
only applies to very large datasets). Most important is have reasons which approach is better
for the situation.

What is your approach for ETL with data warehouses (how many packages you
developer during typical load etc.)?
This is rather generic question. A typical approach (for me) when building ETL is to.
Have a package to extract data per source with extract specific transformations (lookups,
business rules, cleaning) and loads data into staging table. Then a package do a simple merge
from staging to data warehouse (Stored Procedure) or a package that takes data from staging
and performs extra work before loading to data warehouse. I prefer the first one and due to
this approach I occasionally consider having extract stage (as well as stage phase) which gives
me more flexibility with transformation (per source) and makes it simpler to follow (not
everything in one go). So to summarize you usually have package per source and one package
per data warehouse table destination. There are might be other approach valid as well so ask
for reasons.

General SSIS Questions 

Which versions of SSIS have you used? 
 Differences between 2005 and 2008 are not very big so 2005, 2008 or 2008 R2
experience usually is very similar. The big difference is with 2000 which had DTS and it very
different (SSIS is created from scratch)

 Have you used SSIS Framework? 
This is common term in SSIS world which just means that you have templates that
are set up to perform routine tasks like logging, error handling etc. Yes answer would usually
indicate experienced person, no answer is still fine if your project is not very mission critical.

 Do you have experienced working with data warehouses? 
 SSIS is in most cases used for data warehouses so knowledge of Data Warehouses
 Designs is very useful.

 What are the changes in SSIS 2012? 
 Major changes are configuration,deployment and logging.

No comments:

Powered by Blogger.