Posts

Thomas Hoover Joins UpSearch

We are excited to announce that Thomas Hoover has joined UpSearch. He will be working as a Data Platform Consultant based in Perrysville, Ohio. Welcome, Tom!

 

About Tom Hoover

Tom Hoover is passionate about helping organizations use Microsoft’s data platform, especially SQL Server, to transform their business while ensuring critical data systems maintain confidentiality, integrity, and availability.

Since 2005, Tom has specialized in Microsoft’s database platforms and data management. He is an accomplished DBA team leader with significant OLTP and data warehouse expertise. Tom excels at protecting data in highly secure environments while empowering users to easily access the information they need. Tom believes that by using data correctly, businesses can change their customers’ lives.

To learn more about Tom, visit https://upsearch.com/thomas-hoover/.

Please help me welcome Tom to the UpSearch team.

Shawn Upchurch,
Founder & CEO

Welcome to UpSearch, Tom!

Fun Facts

Tom is a musician, historian, and avid reader. He is also knowledgeable in the Department of Defense Information Assurance Certification and Accreditation Process (DIACAP) and has a proven track record in managing the Plan of Action and Milestones (POAM) based on Government IA guidance and regulations for both new systems and existing SQL Server systems undergoing re-accreditation. Additionally, Tom has years of experience in ensuring compliance with various DISA Security Technical Implementation Guides (STIGs) using both approved automated tools and manual review.

Tom is married to an incredible lady and has four wonderful daughters.

About UpSearch

UpSearch is a company of data management and analytics experts who enable digital maturity with Microsoft’s technologies. Its mission is to enable every leader to unlock data’s full potential. UpSearch provides full lifecycle support for SQL Server, SQL Server in Azure (IaaS), Azure SQL DB (PaaS), Azure SQL DW (PaaS), Analytics Platform System (APS), and Power BI.

Implementing a data warehouse: SSIS Connection Managers

Originally published on ColleenMorrow.com.

This post is part of a series on implementing a data warehouse with Microsoft SQL Server 2012.

One of the first items you’ll create when developing a new project in SQL Server Integration Services (SSIS) is a connection manager. Connection managers allow control flow and data flow tasks within SSIS packages to access a variety of data sources and destinations. Depending on the type of connection manager you create, you’ll be able to access relational database sources like SQL Server or Oracle, web services, external files, and FTP servers. The connection manager you select determines not only which data store you can access but also what functionality is available to you, so choose wisely.

The following connection managers are built-in:

  • SQL Server
    • ADO.NET (preferred for stored procedures and parameterized queries, thanks to its easier parameter handling and support for a larger range of data types)
    • ODBC
    • OLE DB (deprecated, use ODBC instead)
    • SQL Server Compact Edition
  • SQL Server administrative tasks
    • SMO
  • File operations
    • File or Multiple File
    • Flat File or Multiple Flat File
    • FTP
  • Other
    • Analysis Services
    • DQS (Data Quality Services)
    • Excel
    • HTTP
    • MSMQ (Microsoft Message Queueing)
    • SMTP
    • WMI (Windows Management Instrumentation)
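
Under the hood, each of these connection managers wraps a provider-specific connection string. For illustration only (the server name and authentication settings here are assumptions), an ADO.NET (SqlClient) and an ODBC connection string to a local AdventureWorks2012 database might look like:

```text
ADO.NET (SqlClient):
  Data Source=localhost;Initial Catalog=AdventureWorks2012;Integrated Security=True

ODBC:
  Driver={SQL Server Native Client 11.0};Server=localhost;Database=AdventureWorks2012;Trusted_Connection=yes;
```

In practice you rarely type these by hand; the connection manager editor builds them for you, but recognizing the pieces helps when troubleshooting.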

Once you’ve created a connection manager, you can reuse it as often as needed within the package/project, in a variety of roles.  For example, an ODBC connection to the AdventureWorks2012 database can be used as the source for a data flow task, a data flow destination, the connection for an Execute SQL Task, etc.  It’s not necessary to create a separate connection manager for each task.

32-bit or 64-bit

One of the most confusing aspects of SSIS development for newcomers is knowing when you’re running 32-bit and when you’re running 64-bit. The SSDT development environment uses 32-bit providers, so you’ll only see providers for which a 32-bit version of the adapter is installed on the workstation where you’re developing. When you execute the project outside of the development environment, the version of the adapter is determined by the operating system you’re running on. As you start to develop projects in SSDT, you may notice a project property called Run64BitRuntime. Don’t be confused: this property applies only at design time, not at runtime.

Scope

Within SSIS, there is a hierarchy of scope.  For example, a project can contain one or more packages.  Those packages can contain one or more connection managers.  Additionally, the project itself can contain one or more connection managers.  Where a connection manager is available is dependent on where it was created.  A package-scoped connection manager is created within the scope of a specific package and may only be accessed by tasks within that package.  Connection managers that are project-scoped, however, are available to tasks in any package within the project.  If you define both a project-scoped and a package-scoped connection manager with the same name, the package-scoped connection manager’s settings will take precedence within that package.

Summary

Connection managers are relatively simple, yet critical components in SSIS development. Familiarize yourself with the various types available and the pros and cons of each.  And bear in mind that third-party connection managers are also available.

About the Author

SQL Server Consultant

Colleen Morrow

Colleen Morrow is SQL Server Consultant at UpSearch.

Colleen is a database strategist, community advocate, author, blogger and public speaker. She is passionate about helping technology leaders use Microsoft’s SQL Server to protect, optimize and unlock data’s value.

Colleen has been working with relational databases for almost 20 years. Since 2000, Colleen has specialized in SQL Server and Oracle database management solutions. She excels at performance tuning, troubleshooting mission critical SQL Server environments, and training technology professionals and business users of all levels.

Since 2011, Colleen has maintained a SQL Server focused blog at http://colleenmorrow.com. She is an active member of the Ohio North SQL Server User Group, as well as a volunteer for the Professional Association for SQL Server (PASS). Colleen earned a Bachelor of Science in Computer and Information Systems from Cleveland State University.

Learn more about Colleen Morrow at https://upsearch.com/colleen-morrow/.

Implementing a data warehouse: Introduction to SSDT

Originally published on ColleenMorrow.com.

This post is part of a series on implementing a data warehouse with Microsoft SQL Server 2012.

Once you’ve designed your fact tables and dimensions, decided how you’re going to partition and index your tables, and created your empty structures, it’s time for the fun stuff:  loading data.

There are two basic scenarios when it comes to moving data from one repository to another: simple and complex. For simple data movements, data is read from the source and loaded into the destination as-is. This can be as simple as an export/import process. But for most data warehouse environments, a simple data movement isn’t going to cut it. Most of the time, we need to massage or transform the data prior to loading it into the data warehouse. This is a complex data movement, also known as ETL, or Extract-Transform-Load.
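
To make the distinction concrete, here is a T-SQL sketch of both kinds of movement. The staging and dw schemas, table names, and columns are all hypothetical:

```sql
-- Simple data movement: copy rows as-is from a staging table
INSERT INTO dw.Customer (CustomerNum, FirstName, LastName, BirthDate)
SELECT CustomerNum, FirstName, LastName, BirthDate
FROM staging.Customer;

-- Complex (ETL) movement: transform during the load --
-- standardize the name format, derive an age band, filter bad rows
INSERT INTO dw.Customer (CustomerNum, FullName, AgeBand)
SELECT CustomerNum,
       UPPER(LastName) + ', ' + FirstName,
       CASE WHEN DATEDIFF(YEAR, BirthDate, GETDATE()) < 30
            THEN 'Under 30'
            ELSE '30 and over'
       END
FROM staging.Customer
WHERE BirthDate IS NOT NULL;
```

SSIS packages express the same extract, transform, and load steps graphically, with the transformations happening in the data flow rather than in SQL.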

SQL Server Integration Services (SSIS) is Microsoft’s primary tool for developing and implementing the ETL process.  And SQL Server Data Tools is the development environment in SQL Server 2012 for the development and deployment of SSIS projects, as well as SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) projects.  If you’ve worked with SSIS packages prior to SQL Server 2012, you’re probably familiar with the Business Intelligence Development Studio (BIDS), which was similar to the SSDT interface.

The first time you start up SSDT, you may see a window asking you to choose your default environment settings. If you’ll primarily be developing SSIS packages, go ahead and choose Business Intelligence Settings. This will customize your environment for developing business intelligence projects. You can always change this setting later. Once Visual Studio opens, click New Project, navigate to Installed Templates -> Integration Services, and click Integration Services Project. Give your project a name and click OK.

An SSDT Tour

 

The SSDT interface

 

In the upper right-hand corner of the SSDT interface you’ll see the Solution Explorer.  There you should see the Solution and Project you just created, as well as all related files.  Right now, the only file in our project is the default Package.dtsx. Any additional files or project-level connection managers or parameters you create will be listed here as well.

Below the Solution Explorer, in the lower right-hand corner, you’ll see a Getting Started window, listing resources for sample SSIS projects, which can be helpful for getting familiar with the interface.  If you close the Getting Started tab, you’ll see the Properties pane.

Continuing clockwise, at the bottom of the interface you’ll see the Connection Managers pane, where you will define package-level connection managers.  Connection managers allow SSIS to access various data sources and destinations.  We’ll discuss connection managers in more depth next time.

On the left side of the interface you’ll see the SSIS Toolbox pane.  The items listed in the Toolbox are context-specific.  Right now, in the context of the control flow tab, you’ll find control flow tasks like the Execute SQL Task, Execute Process Task, FTP Task, etc.  Click on the Data Flow tab, and you’ll see a different set of items in the Toolbox, ETL tasks that are specific to the data flow.

Toolbox context

We should probably take a step back for a second and talk about control flow vs. data flow.  In SSIS, control flow is the defined order of operations in a package and the precedence constraints that define when they are executed.  Data flow describes the movement of data from source, through any transformations, to the destination.  The data flow itself is actually a task within the control flow of the package.

Other uses for SSIS

Data movement is the most common use for SSIS, but keep in mind that SSIS is capable of much more, including

  • File system operations and FTP access – if the data you need to load is transported or provided in files, SSIS can be used to access those files
  • Invoking external processes – if, for example, an external service is required for your warehouse load process, the service can be invoked from within SSIS, making integration easier to maintain
  • Database administration tasks – maintenance plans are implemented as SSIS packages
  • OS operations – system state can be accessed via Windows Management Instrumentation (WMI)
  • Sending email
  • Processing SSAS objects

Summary

If you’re not already familiar with the SSDT environment, take some time to poke around.  Download some sample projects and get comfortable with the interface.  We’ll be spending a lot of time there in the next few weeks.


Implementing a data warehouse: Intro to Columnstore Indexes

Originally published on ColleenMorrow.com.

This post is part of a series on implementing a data warehouse with Microsoft SQL Server 2012.

SQL Server 2012 introduced a new way of storing nonclustered indexes, called the columnstore index, which has been shown to improve performance in a data warehouse environment many times over. How, you might ask? Well, the magic of the columnstore index lies in how the data is stored.

Rowstore structure

If you look at a traditional nonclustered index, you’ll see that the data is stored in a row-wise format. For example, let’s assume we have a Customers table with columns for CustomerNum, FirstName, LastName, and BirthDate. In a row-wise storage format, data pages contain rows of data, as illustrated below.

Rowstore storage structure
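
In T-SQL, the example Customers table described above might be defined like this (the data types are assumptions for illustration):

```sql
-- A minimal sketch of the example table
CREATE TABLE dbo.Customers (
    CustomerNum INT           NOT NULL PRIMARY KEY,
    FirstName   NVARCHAR(50)  NOT NULL,
    LastName    NVARCHAR(50)  NOT NULL,
    BirthDate   DATE          NULL
);
```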

Let’s say I want to run the following query:

SELECT LastName FROM dbo.Customers

Even though I only want the LastName values, SQL Server still has to read all of the pages in their entirety into memory to satisfy my query (assuming I don’t have an index on last name). That makes for a lot of wasted space in memory holding information I don’t really need (e.g., first names and birth dates). It also means a lot of extra IO operations if the data isn’t already in memory.

Columnstore structure

With columnstore indexes, data is stored in a columnar format, meaning pages contain the data from a single column, as shown below.

Columnstore storage structure

Columnstore storage structure

By storing data for a single column on a page, SQL Server can now read in only the data really needed by a particular query. If my query only needs LastName, SQL Server only needs to retrieve the pages for LastName. This leads to fewer IO operations and less wasted memory space. SQL Server also uses a special compression algorithm on columnstore indexes, leading to even greater IO savings.

Batch Mode Processing

But wait, there’s more! In addition to highly compressed columnstore indexes, SQL Server 2012 also introduced batch mode processing. In the data warehouse environment, we’re typically scanning and processing large sets of data. Working with individual rows would be very inefficient, so SQL Server can now choose to process data in batches, rather than one row at a time.

Not all operators will run in batch mode. In SQL Server 2012, batch mode processing is available for filter, sequence project, scan, hash match (partial) aggregation, hash match (inner join), and hash table build operations. You can check the Actual Execution Mode and Estimated Execution Mode properties of the operators in the actual execution plan to determine which mode SQL Server used. Keep in mind that batch processing is really only used when joining, filtering, or aggregating large data sets.

The cost of columnstore indexes

Unfortunately, there’s no such thing as a free lunch. SQL Server still needs to ultimately return rows of data, so it has to take the columnstore data and reconstruct the rows that satisfy a query. This reconstruction takes time and CPU resources. In SQL Server 2012, a nonclustered columnstore index is also read-only, which makes its underlying table read-only as well, so you won’t be using this in your OLTP databases. (SQL Server 2014 introduced updateable clustered columnstore indexes, but we won’t go into those here.) And if your table is partitioned, the columnstore index must be partition-aligned.

There can also only be one columnstore index on a table. A common recommendation for the data warehouse is to create a columnstore index on fact tables, putting all of the columns of the fact table in the columnstore index. Larger dimension tables can also benefit from columnstore indexes. Just keep in mind that you don’t want to use this feature on tables that are updated frequently.
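
Following that recommendation, creating a columnstore index over all the columns of a fact table looks like this in SQL Server 2012 (the dbo.FactSales table and its columns are hypothetical):

```sql
-- One nonclustered columnstore index covering every column of the fact table
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_FactSales
ON dbo.FactSales (DateKey, CustomerKey, ProductKey, Quantity, SalesAmount);

-- In SQL Server 2012 this makes dbo.FactSales read-only.
-- A common load pattern is to disable the index, load, then rebuild:
-- ALTER INDEX NCCI_FactSales ON dbo.FactSales DISABLE;
-- ...load the fact table...
-- ALTER INDEX NCCI_FactSales ON dbo.FactSales REBUILD;
```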

Summary

That’s a brief introduction on columnstore indexes and their benefits, as well as their inherent costs. Next time we’ll look at how to implement columnstore indexes in our data warehouse.
