Sunday, May 19, 2013

Data Extraction Techniques

 

SSIS Packages primary objective is to move data from a source to a destination database. The Sources can be a flat file, a SQL Server Database, an Oracle Based DB, FTP Server, Excel File, Access DB, etc. [1]

 

Choosing the right Data source control will be the first step in improving performance of your SSIS package. The naming standards require Data Flow Sources to be named as in the example “SRC Get Customer Orders”.

The most commonly used Data flow sources by MS BI Developers are

OLE DB

Most commonly used, Reads Data from Views, SQL Queries, Tables and Variables. Simple to configure and use

Uses MS Default OLE DB Providers and generates the metadata based on the column data type.

Works best for SQL Server Databases that do not have huge volume of records.

ADO.NET

Uses ADO.NET Providers to connect to Data sources.

Use ADO.NET when you want to avoid oracle SQL data type mismatches, 64 bit runtime engines and when you have a provider that is not supported by OLEDB.

ADO.NET Source is a little bit more efficient in managing its buffer memory usage so preferred when the source data set volume is very high.

ADO Sources can have the SQL Command assigned from variables by assigning the SQL Command Property of the ADO Source from the Parent Data Flow Control. The variable can be built by expressions.[2]

 

 The Flat file and the Excel File Data flow sources are other commonly used data flow Sources.

Use the SQL Command to select data from the Source rather than the View or Table. Do not use Select (*) Statements as the Control will take up more memory to determine the column Metadata.

The SQL Query should contain code to do data type conversions that might be required. For e.g. Convert and Cast Statements can be embedded in the SQL, rather than adding a separate transformation to convert data types. Add an order by to the SQL Statement rather than adding a Sort Transformation.



[1] Supported Data Sources – Technet Article

[2] MSDN Forum Configuring ADO NET Source Query using Variables

Wednesday, March 20, 2013

Guidelines to Design and Develop SSIS Packages

Introduction
As most companies are investing a lot to get their data in order, Microsoft BI technologies have started to play a pivotal role in converting data to information. Microsoft SQL Server Integration Services (SSIS) allows the users to create packages that facilitate data integration, and data analysis. SSIS packages should be designed to meet the requirements and meet optimal performance standards as well. The package performance is determined by a number of external factors such as the amount of data, infrastructure and the design itself.  This article will highlight some features of SSIS that developers can use to design and develop efficient packages
Guidelines to Package Design
The most common function of SSIS packages is to move data from a source to a destination. Every SSIS package one designs should follow the following design template
1.       Flexible – Design a package that is configurable
2.       Extract – Design a package that uses the best extraction technique for the Source Data
3.       Transform – Design packages that use minimal transformations when possible
4.       Load and Log – Design packages that loads data quickly and logs information for reprocessing and maintenance.

·         SSIS does plenty of transformations in memory, the package can be performance intensive, so as a rule of thumb split your package design into smallest units of wok possible.
·         If the package is expected to support restarts then the checkpoints property should be turned on for the Packages.
·         Use the Package properties Under Execution group to control Package behaviour on Error, number of concurrent executables. [1]
Designing Configurable Packages
Over the course of the development cycle, the SSIS package would have to be moved and tested in various environments.  The Source and the Destination settings would also change along with the environments, so packages should support configurations that can be changed as packages are deployed across environments.
·         SSIS packages can use XML Configuration, Environment variables configuration and also SQL Server Configurations. The Attribute values are saved in the configuration store and your package should dynamically load the attribute values from the store. SQL Server configuration store would be the preferred option. [2] [3]
·         SSIS packages support the use of Variables. Variables can be used to build expressions that can be used in Precedence Constraints, Expressions of Connection Managers, Source Queries, etc.[4]
·         SSIS packages using dynamic configuration settings must have the Delay Validation property set to True.








[2] Package Configurations –MSDN Article
[3] Best Practices – Package Configurations - Technet Article

Sunday, November 4, 2012

Introduction to SSIS

SSIS

Introduction

SQL Server Integration Services is part of the Business Intelligence Technology offered my MS SQL Server. SSIS is a platform to build enterprise level data integration and data transformation solutions.
SSIS provides the developers a set of built in tasks, that will allow him to perform a variety of actions such as Merge data from multiple systems of records, populate data marts or data warehouses, perform Data transformation activities and perform a host of maintenance activities on the database.
Typical uses of Integration Services Package are better explained - http://msdn.microsoft.com/en-us/library/ms137795(v=sql.105).aspx
Architecture
The SSIS Architecture is made of the following 4 components. Refer Image from the MSDN Library for SSIS.
The SSIS has 4 main components.
1.       The Integration Service – a windows service and is set up when the SQL Server is installed with the integration services module. The Integration Service does the work to monitor, to execute, and to import and export SSIS Packages. The packages can be either a file system or stored in a Database. The Integration service is controlled by a configuration file called the MsDtsSrvr.ini.xml.
2.       SSIS Runtime Engine and object model – The runtime engine acts as the core piece of functionality that coordinates execution of tasks, manages threads that execute the tasks and the object model exposes the native functionality of SSIS for the developers to use and define custom tasks or invoke packages remotely from third party applications
3.       SSIS Clients – the SSIS Designer Tool or any Custom Script that can leverage the SSIS runtime and object model to design and create a SSIS Package.
4.       SSIS Data Flow Engine and Data Flow Components – The Data flow engine manages the data pipeline and is invoked by the Data flow Components. The Data flow engine is instrumental in fetching the records from the source, transforming the data and saving them into the database. Most of the performance tuning for a SSIS Package will be done at the data flow engine level.
SSIS Elementary units
Task – Individual unit of works that help achieve the required functionality. The SSIS Designer tools offer a variety of Tasks that can be used readily. The developer can also define and use custom tasks.
Some of the tasks that I have used are Data Flow task, Execute SQL task, Script Task, Sequence Containers, for each loop enumerators etc.
Package – A XML Structured executable program that executes a series of tasks to implement a workflow. The precedence constraints are set on tasks to control the sequence of execution of various tasks.
Variables – Can be used to set properties dynamically. Variables are assigned through configuration files/Database or even from other Tasks.
Connection managers – Can define data sources to be used by the source and destination transformations. The Connection managers can be configured by variables.