Posts

How CommitBatchSize And CommitBatchThreshold Affect SQL Server Replication

Originally published on KendalVanDyke.com.

Note: This is a long post. I hope it’s worth your while to read.

A common suggestion for optimizing transactional replication performance is to adjust the values for CommitBatchSize and CommitBatchThreshold in your distribution agent profile. Unfortunately, what these two values really do isn’t documented very well anywhere. According to Books Online:

  • CommitBatchSize “Is the number of transactions to be issued to the Subscriber before a COMMIT statement is issued. The default is 100.”
  • CommitBatchThreshold “Is the number of replication commands to be issued to the Subscriber before a COMMIT statement is issued. The default is 1000.”

When you read this you might wonder what the difference is between a transaction and a command. After all, isn’t a command just an autocommit transaction? Does this mean that the commands in a transaction could get broken up into smaller transactions at the subscriber? Won’t that violate ACID? Microsoft went out of their way to make these two distinct values so each one has to influence delivery of commands in some way, right?

I went the cheap route to find out and posted to the forums at SQLServerCentral and to microsoft.public.sqlserver.replication. While I waited for an answer, I set up some simple tests to try to figure it out for myself.

Before reading on, I want to test your replication mettle and give you a little quiz. If you want to cut to the chase and see if you’re right, jump to the conclusions at the end of this post to view the answers.

  1. Based on the defaults above, what happens if I issue 1,500 insert/update/delete statements, each affecting a single row, all contained within an explicit transaction (e.g. BEGIN TRANSACTION, 1,500 updates, then COMMIT)?
  2. What if those 1,500 statements are not part of a transaction?
  3. What happens if I issue three update statements: the first affecting 600 rows, the second affecting 500 rows, and the third affecting 400 rows?

 

Setup

I started by creating a simple table and populating it with 1,000 rows (representative of a large and random sample set):

SET NOCOUNT ON 
GO 
CREATE TABLE TransReplTest ( 
    [ID] INT IDENTITY(1,1), 
    [SomeDate] DATETIME, 
    [SomeValue] VARCHAR(20), 
    CONSTRAINT [PK_TransReplTest] PRIMARY KEY CLUSTERED ( 
        [ID] 
    ) 
) 
GO 
INSERT INTO TransReplTest (SomeDate, SomeValue) 
SELECT GETDATE(), CONVERT(VARCHAR(20), RAND() * 1000) 
GO 1000

I created a publication, added a table article to it, and set up a subscription. I created a new distribution agent profile based on the default agent profile and changed only the values for CommitBatchSize and CommitBatchThreshold. Each time I changed their values I stopped and started the distribution agent to ensure I was using the new values. Finally, I set up a Profiler session on the subscriber capturing the following events: “RPC:Completed”, “SQL:BatchCompleted”, and “SQL:BatchStarting”. The sessions picked up some extra noise, so for the sake of readability I saved them and filtered them for each test afterwards.
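The profile changes are easy enough to make in the UI, but profiles and their parameter overrides live in msdb on the distributor, so you can double-check which values a profile actually carries. This is a hedged sketch; I believe the relevant tables are MSagent_profiles and MSagent_parameters, and that agent_type 3 means the distribution agent:

-- On the distributor: list each profile and the parameters it overrides
SELECT p.profile_name, 
       ap.parameter_name, 
       ap.value 
FROM msdb.dbo.MSagent_profiles p 
JOIN msdb.dbo.MSagent_parameters ap ON ap.profile_id = p.profile_id 
WHERE p.agent_type = 3   -- distribution agent profiles
ORDER BY p.profile_name, ap.parameter_name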

Test 1

To make things easier to understand I set the values low, like this:

CommitBatchSize: 3
CommitBatchThreshold: 5

First I ran this statement which randomly updates 10 rows, one at a time, all contained within a transaction:

BEGIN TRANSACTION

DECLARE @rownum TINYINT 
SET @rownum = 0

WHILE @rownum < 10 
BEGIN

    UPDATE TransReplTest 
    SET SomeDate = GETDATE() 
    WHERE ID IN ( 
        SELECT TOP 1 ID 
        FROM TransReplTest 
        ORDER BY NEWID() 
    )

    SET @rownum = @rownum + 1 
END

COMMIT

Profiler shows 1 transaction with 10 updates. CommitBatchThreshold didn’t appear to affect anything here.


When I issue the same statement without the explicit transaction, Profiler shows 4 transactions of 3 updates, 3 updates, 3 updates, and 1 update. CommitBatchSize causes every 3 statements (each of which is an autocommit transaction) to be grouped together in a transaction on the subscriber.
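For reference, the non-transactional version is simply the same loop with the BEGIN TRANSACTION/COMMIT pair removed:

DECLARE @rownum TINYINT 
SET @rownum = 0

WHILE @rownum < 10 
BEGIN
    UPDATE TransReplTest 
    SET SomeDate = GETDATE() 
    WHERE ID IN ( 
        SELECT TOP 1 ID 
        FROM TransReplTest 
        ORDER BY NEWID() 
    )
    SET @rownum = @rownum + 1 
END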


Next I issued a single update statement affecting 10 rows:

UPDATE TransReplTest 
SET SomeDate = GETDATE() 
WHERE ID IN ( 
    SELECT TOP 10 ID 
    FROM TransReplTest 
    ORDER BY NEWID() 
)

Profiler shows 1 transaction with 10 updates at the subscriber. Again CommitBatchThreshold did not affect anything here.

 

Finally, I issued an update statement affecting 3 rows 3 times:

UPDATE TransReplTest 
SET SomeDate = GETDATE() 
WHERE ID IN ( 
    SELECT TOP 3 ID 
    FROM TransReplTest 
    ORDER BY NEWID() 
) 
GO 3

This time Profiler shows 2 transactions; the first contains 6 updates and the second contains 3 updates. Even though CommitBatchSize was set to 3, the number of rows affected by the first two statements exceeded CommitBatchThreshold (set to 5) and so the third statement was put into its own transaction.


Test 2

Let’s see what happens if we switch the values around and run through the same statements as Test 1:

CommitBatchSize: 5
CommitBatchThreshold: 3

Randomly update 10 rows in a single transaction. We still see 1 transaction with 10 updates.


Next, the same statement without the explicit transaction. This time we see 4 transactions with updates in batches of 4, 1, 4, and 1. CommitBatchThreshold set to 3 was the reason we saw the smaller batches. But wait – shouldn’t a CommitBatchThreshold of 3 mean we should see 3 transactions with 3 updates in each transaction and 1 transaction with 1 update?


Next, the single update statement affecting 10 rows. Profiler shows 1 transaction with 10 updates. CommitBatchThreshold made no difference here.


Finally, updating 3 rows 3 times. Profiler shows 1 transaction with 6 updates and 1 transaction with 3 updates. Just like in Test 1, even though CommitBatchThreshold was set to 3, the number of rows affected by the first two statements exceeded CommitBatchSize (set to 5) and so the third statement was put into its own transaction.


Conclusions

Based on the profiler traces, here’s what we can conclude:

  • Replication delivers changes to subscribers one row at a time; a single update statement which changes 1,000 rows at the publisher will result in 1,000 statements at the subscriber.
  • Transactions at the publisher are kept intact at the subscriber. If a million rows are affected inside a transaction – either explicit or autocommit – on the publisher you can count on a million statements in one explicit transaction to be delivered to the subscriber. We saw this in both the statement which updated 10 rows one at a time in an explicit transaction and the statement which updated 10 rows in one shot.
  • The lower of the two values for CommitBatchSize and CommitBatchThreshold is the primary driver for how statements are delivered to subscribers. When the number of transactions on the publisher reaches that lower value the distribution agent will apply them in a single explicit transaction at the subscriber. We saw this in the statement which updated 10 rows one at a time (not in a transaction).
  • The higher of the two values can cause commands to be delivered before the lower value is met. If the number of rows affected at the publisher reaches the higher value the distribution agent will immediately apply the changes in a single explicit transaction at the subscriber. We saw this in the statements which updated 3 rows 3 times.

OK, now that we know how CommitBatchSize and CommitBatchThreshold affect SQL Server replication, let’s look at the answers to the quiz:

  1. Based on the defaults above, what happens if I issue 1,500 insert/update/delete statements, each affecting a single row, all contained within an explicit transaction (e.g. BEGIN TRANSACTION, 1,500 updates, then COMMIT)?
    Answer: Transactions at the publisher are honored at the subscriber so we’ll see one explicit transaction applied at the subscriber which contains 1,500 updates.
  2. What if those 1,500 statements are not part of a transaction?
    Answer: Because CommitBatchSize is set to 100 we’ll see 15 transactions at the subscriber, each containing 100 statements which affect one row per statement.
  3. What happens if I issue three update statements: the first affecting 600 rows, the second affecting 500 rows, and the third affecting 400 rows?
    Answer: Because CommitBatchThreshold is set to 1,000 we’ll see 2 transactions at the subscriber. The first transaction will contain 1,100 statements and the second transaction will contain 400 statements.

BTW, remember that one weird case where CommitBatchSize was set to 5, CommitBatchThreshold was set to 3, and when we updated 10 rows one at a time we saw 4 transactions with updates in batches of 4, 1, 4, and 1? I think that’s a bug in SQL Server. It looks like the distribution agent alternates between placing (CommitBatchThreshold + 1) and (CommitBatchThreshold – 2) commands into each transaction delivered to the subscriber. Since this only appears to happen when CommitBatchSize is higher than CommitBatchThreshold – and most people don’t change the values to work that way – it seems relatively insignificant, but it’s still not the behavior I expected to see.

 

P.S. – I did get a response to my Usenet posting from MVPs Hilary Cotter and Paul Ibison. You can read it here. Gopal Ashok, a program manager on the SQL Server replication team, confirmed that what I thought was a bug was indeed a bug. Yay me!

 

About the Author

Microsoft SQL Server MVP & Principal Consultant

Kendal Van Dyke

UpSearch Alum Kendal Van Dyke is a database strategist, community advocate, public speaker and blogger. He is passionate about helping leaders use Microsoft's SQL Server to solve complex problems that protect, unlock and optimize data's value.

Since 1999, Kendal has specialized in SQL Server database management solutions and provided IT strategy consulting. Kendal excels at disaster recovery, high availability planning/implementation and debugging/troubleshooting mission critical SQL Server environments.

Kendal Van Dyke served the SQL Server community as Founder and President of MagicPass, the Orlando, FL based chapter of the Professional Association for SQL Server (PASS). In 2012, Kendal served as a member of the PASS Board of Directors.

Kendal remains active in the SQL Server community as a speaker and blogger. He teaches SQL Server enthusiasts and technology leaders how to protect, unlock and optimize data’s value. Since 2008, Kendal has operated a SQL Server focused blog at http://www.kendalvandyke.com/.

Microsoft acknowledged Kendal for his support and outstanding contributions to the SQL Server community by awarding him Microsoft MVP (2011-15). Learn more about Kendal Van Dyke https://upsearch.com/kendal-van-dyke/.

About UpSearch


UpSearch is a company of data management and analytics experts who enable digital maturity with Microsoft’s technologies. Its mission is to enable every leader to unlock data’s full potential. UpSearch provides full lifecycle support for SQL Server, SQL Server in Azure (IaaS), Azure SQL DB (PaaS), Azure SQL DW (PaaS), Analytics Platform System (APS), and Power BI.

Long merge and Slow merge SQL Server Replication Warnings

Originally published on KendalVanDyke.com.

During my presentation about Transactional Replication at SQLSaturday #8 – Orlando 2008, I was asked what the difference was between Long merge and Slow merge SQL Server replication warnings. I didn’t know the answer (it was a session about transactional replication after all), but not being one to let a good question go unanswered I had to find out.

Here’s what they mean:

  • Long merge refers to the amount of time that the merge agent has run (or is currently running) during the most recent synchronization. If this value exceeds the set threshold the alert will fire.
  • Slow merge refers to the number of rows processed per second by the merge agent while running. If this value falls below the set threshold the alert will fire.

There are two threshold values for each alert – one for dialup connections and one for LAN connections. For dialup the default is 8 minutes for long merge and 20 rows/sec for slow merge. For LAN the default is 3 minutes for long merge and 100 rows/sec for slow merge.
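If you’d rather check these numbers yourself than wait for an alert, the merge agent’s session history on the distributor records both figures. Here’s a minimal sketch, assuming a distribution database named distribution and that MSmerge_sessions carries duration (elapsed seconds) and delivery_rate (rows per second) columns:

USE distribution
GO
SELECT a.name AS agent_name, 
       s.start_time, 
       s.duration,       -- what the long merge warning watches
       s.delivery_rate   -- what the slow merge warning watches
FROM dbo.MSmerge_sessions s 
JOIN dbo.MSmerge_agents a ON a.id = s.agent_id 
ORDER BY s.start_time DESC
GO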

Thresholds are enabled and set in Replication Monitor by selecting your merge publication in the left pane and the “Warnings” tab in the right pane. Alert actions are set by clicking the “Configure Alerts” button in the same pane or in Management Studio’s Object Explorer under SQL Server Agent –> Alerts. A screenshot of the Warnings tab in Replication Monitor is below.



 


Reduce Transactional SQL Server Replication Synchronization Time

Originally published on KendalVanDyke.com.

I’ve had to take some big transactional snapshots before – 15 million rows / 10 GB big. Applying the snapshot to a subscriber sitting in the same facility over a local network can take upwards of an hour…so imagine how long it takes to apply the snapshot to a subscriber sitting in another data center. Even at the best of speeds it can take 6-8 hours or more for database synchronization.

You can intervene and reduce the time it takes to apply the snapshot. Here’s how:

  1. Create the subscription to the publication and start the snapshot agent. Before the snapshot agent completes, stop the distribution agent for the subscriber. When the snapshot agent finishes it will have written several files for each article to the snapshot folder – including IDX, PRE, SCH, XPP, and BCP files – along with a copy of any pre- and post-snapshot scripts that were configured for the publication.
  2. Using your text editor of preference (Notepad works) open the pre-snapshot script in the snapshot folder, copy and paste the commands into Management Studio, and execute them against the subscriber. Do the same for the PRE, SCH, XPP, and IDX files (respectively) for each article.
  3. Using your compression tool of choice, compress each BCP file and transfer it to the subscriber. I use Info-Zip because I can compress all files individually on the command line using a FOR command like so:
    FOR %F in ("{snapshot folder path}\*.bcp") DO zip -j "%~nF.zip" "%F"
  4. At the subscriber, uncompress each BCP file and use the bcp utility to import the files into their respective tables. There are two options that must be provided for the file to be imported correctly: -n for “native type” and -k for “keep null values”. Other options can be provided as necessary for authentication, server name, database name, and table. I created a batch script that I named “unzip and bcp.bat” to process each compressed file. Here’s what the script looks like:
    pushd "%~dp1"
    unzip -j %1
    bcp "%2" in "%~dpn1.bcp" -n -T -k &amp;gt;&amp;gt; "%~dp1%2.bcplog.txt"
    del "%~dpn1.bcp"
    popd

    The script takes two arguments – the full path of the compressed file (%1 in the script) and the qualified name of the table (i.e. database.owner.table) to bulk insert into (%2 in the script). A log is written in the same directory as the compressed file in order to review for any errors that might have occurred during the bulk copy. The log file is the name of the table (second argument) with “.bcplog.txt” added on at the end. Since the script works with one file at a time and there are multiple files to process, I use the FOR command again like so:

    FOR %F in ("{compressed file path}\{table name}*.zip") DO "unzip and bcp.bat" "%F" {db}.{owner}.{table}

    I always make it a point to check the log files for any errors that occurred (search for the word “error”) when the bcp in is done. I also select the top 500 rows from each table to ensure the data looks good and didn’t get mangled somewhere along the way (a quick T-SQL sketch of these checks follows this list).

  5. At the distributor, replace each file in the snapshot folder with an empty file with the same name. An easy way to do this is create an empty file called “blank.txt” and use the FOR command:
    FOR %F in ("{snapshot folder path}\*.*") DO copy /Y "blank.txt" "%F"
  6. Start the distribution agent. When you view the agent history it will look like it’s going through every article in the publication and applying the snapshot, but in reality it’s not doing anything because all of the files are empty. After it “applies” the snapshot it will start delivering any transactions that have been read by the log reader agent. Voilà!
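Here’s the quick sanity check mentioned in step 4, run before kicking off the distribution agent. A minimal sketch – the database and table names are placeholders:

-- Run at both the publisher and the subscriber and compare the counts
SELECT COUNT(*) AS row_count 
FROM MyDatabase.dbo.MyBigTable

-- Eyeball a sample to make sure nothing got mangled in transit
SELECT TOP 500 * 
FROM MyDatabase.dbo.MyBigTable 
ORDER BY 1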

 

Note that you don’t have to do this for every article in your publication – if only one of them is big you can work with just the files generated for that article and leave the others alone. When the distribution agent runs it will do what it normally does to the other articles and “apply” the blank files for the article you transferred manually.

I hope this saves you time the next time you need to transfer a large snapshot!

Want to Learn More About SQL Server Replication?

If you'd like to learn more about how UpSearch can support your SQL Server database replication initiative, visit SQL Server Replication or contact us today.


How To Script SQL Server Replication Stored Procedures

Originally published on KendalVanDyke.com.

Ever wonder how to script SQL Server replication stored procedures? If you use stored procedures to replicate commands, there’s a very useful procedure in SQL 2000 SP1 and up which will generate the custom insert, update, and delete procedures for all articles in a transactional publication: sp_scriptpublicationcustomprocs. It takes only one parameter – the name of the publication – and is run at the publisher. Details are available in BOL here.

sp_scriptpublicationcustomprocs relies on other stored procedures to create the scripts, some of which have been around since the SQL Server 7.0 days. These SQL replication stored procedures are used to script individual articles and take an article ID as their only parameter (Article IDs are found by querying sysarticles in the published database). For CALL syntax the procedures sp_scriptinsproc, sp_scriptdelproc, and sp_scriptupdproc will generate the insert, delete, and update procedures, respectively. For XCALL syntax use sp_scriptxupdproc to generate the update procedure and sp_scriptxdelproc to generate the delete procedure. For MCALL and SCALL syntax use sp_scriptmappedupdproc to generate the update procedure. (Side note – if you don’t understand the differences between CALL, SCALL, MCALL, and XCALL check out Specifying How Changes Are Propagated for Transactional Articles in BOL)
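Usage is about as simple as it gets. A quick sketch – the publication name and article ID below are placeholders:

-- At the publisher, in the published database: script the custom
-- procedures for every article in the publication
EXEC sp_scriptpublicationcustomprocs @publication = N'MyPublication'

-- Or script one article at a time; look up its article ID first
SELECT artid, name FROM sysarticles
EXEC sp_scriptinsproc @artid = 1   -- insert procedure (CALL syntax)
EXEC sp_scriptupdproc @artid = 1   -- update procedure (CALL syntax)
EXEC sp_scriptdelproc @artid = 1   -- delete procedure (CALL syntax)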

The ability to script some or all of the procedures necessary for a publication is useful for many reasons. The obvious one is when you already have the data at the subscriber and set up a subscription without synchronization. Or you may want to modify the scripts to do something special at the subscriber. The not-so-obvious situations are the ones that make life as a DBA interesting. I’ve had developers who didn’t recognize “what all those sp_MS stored procedures were” in their database and included dropping them as part of their deployments. Oops. To be fair, I’ve also dropped them unintentionally in moments of stupidity. You know that checkbox that says “Connect to Subscriber and delete record of this subscription at the Subscriber also” when you delete a subscription? Leave that checked next time you drop a subscription in a central subscriber topology and you’ll find out why sp_scriptpublicationcustomprocs is so handy.

Want to Learn More About SQL Server Replication?

If you'd like to learn more about how UpSearch can support your SQL Server database replication initiative, visit SQL Server Replication or contact us today.


When Good SQL Server Replication Agents Go Bad…

Originally published on KendalVanDyke.com.

Here’s the scenario: You’re responsible for replication (you are the DBA after all) and all of your publishers use a dedicated distributor. This happens to be in a high-traffic datacenter, and there are a lot of publications and subscriptions, which means there are a lot of SQL Server replication agents running (think hundreds). One day you’re spinning up a new server and subscribe it to the handful of publications it needs…and then you start noticing that some distribution agents don’t run anymore. Nothing you do – restarting agents, restarting services, or restarting the server – seems to fix the problem. You wonder if your distributor has “hit its limit” and whether you need to start talking to your boss about buying new hardware. Meanwhile you’re in firefight mode because subscribers aren’t getting data, customers are starting to notice that things aren’t updating, and you’re pulling your hair out wondering how to fix it.

If you’ve ever been in this scenario, chances are the new distribution agents have caused your distributor to use up its non-interactive desktop heap. Fortunately there’s an easy workaround to get things back on track. Microsoft KB article 949296 details the problem and links to KB article 824422 for the workaround. It requires editing the registry, so you’ll need admin-level access to the server. Once the change is in place your distribution agents should all run again, and you can tell your boss how you just saved the company from spending a few thousand dollars on a new distributor.

Want to Learn More About SQL Server Replication?

If you'd like to learn more about how UpSearch can support your SQL Server database replication initiative, visit SQL Server Replication or contact us today.


Transactional SQL Server Replication: Architecture, Setup, and Monitoring

 

About the Presentation

At some point in every DBA’s career they’ll probably be asked to work with transactional SQL Server replication. While it may seem scary at first, it’s not that complicated once you understand how all the parts and pieces work together. In this session we’ll cover the fundamentals of how transactional replication in SQL Server works, walk through setting up a working replication topology, and learn how to keep an eye on things to make sure everything is working properly.

Session Goals:

  • Understand core concepts of transactional replication including: replication topologies, publishers, distributors, agents, & agent profiles.
  • Learn how to configure a working replication topology
  • Learn how to monitor transactional replication performance

Session Level: Beginner

Presentations (Upcoming & Past)

Want to Learn More About SQL Server Replication?

If you'd like to learn more about how UpSearch can support your SQL Server database replication initiative, visit SQL Server Replication or contact us today.


Implementing a data warehouse: Fact tables

Originally published on ColleenMorrow.com.

This post is part of a series on implementing a data warehouse with Microsoft SQL Server 2012.

Today we’re exploring fact tables in more depth. If you’ll recall, fact tables store the quantitative data that lies at the heart of the data warehouse. Without fact tables, our data warehouse would really just be a bunch of mostly unrelated dimension tables, and how boring would that be?

Fact columns

The FactInternetSales fact table in the AdventureWorksDW2012 database

Just like dimension tables, fact tables have their own column types (a simplified sketch follows the list). They are:

  • Foreign Keys – As in an OLTP database, foreign key columns reference primary key values in the dimension tables with which our fact table is associated. In the FactInternetSales table, these would include ProductKey, CustomerKey, and OrderDateKey, among others.
  • Business Keys – These are the natural keys carried over from the table we’re using as our primary source for this fact table. They’re not strictly necessary, but they make it easy to compare and link back to the source data. For example, the SalesOrderNumber and SalesOrderLineNumber columns comprise a composite business key in the FactInternetSales table. They were pulled from the OrderDetails source table.
  • Lineage columns – Just as with dimension tables, these are strictly for auditing purposes.
  • Measures – These columns store measurements of interest associated with a business process. I’m talking about columns like SalesAmount, OrderQuantity, etc. They are the heart of our fact table and the values we’re going to aggregate for reporting. Therefore, they’re usually of a numeric data type.
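Putting those column types together, a stripped-down fact table might look something like this. This is a hypothetical sketch, much simplified from the real FactInternetSales, and it assumes the referenced dimension tables already exist:

CREATE TABLE dbo.FactInternetSalesSketch (
    -- Foreign keys referencing dimension surrogate keys
    ProductKey INT NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    CustomerKey INT NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    OrderDateKey INT NOT NULL REFERENCES dbo.DimDate (DateKey),
    -- Business key carried over from the source system
    SalesOrderNumber NVARCHAR(20) NOT NULL,
    SalesOrderLineNumber TINYINT NOT NULL,
    -- Lineage column for auditing the load
    LoadBatchID INT NOT NULL,
    -- Measures
    OrderQuantity SMALLINT NOT NULL,
    SalesAmount MONEY NOT NULL,
    CONSTRAINT PK_FactInternetSalesSketch 
        PRIMARY KEY (SalesOrderNumber, SalesOrderLineNumber)
);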

Additivity

The primary benefit of a data warehouse is the ability to aggregate data for reporting, forecasting, etc. We don’t often report row-level information; rather, we want to be able to slice and dice sales by quarter, by region, by product type, and so on. So when we design our fact tables and start adding measures, we need to keep in mind how those measures will be aggregated, because sometimes how we aggregate a particular measure will depend on what dimension we’re looking at.

For some measures, it’s pretty straightforward.  When it comes to a measure like sales amount, we can sum it by date, by region, by product type.  If a measure can be summed across all dimensions, it’s referred to as an additive measure.

Sometimes, however, we can sum a measure across all dimensions except for time.  An example of this is a balance amount.  If Bob has $2000 in his checking account and Betty has $3000, then we can sum across the customer dimension and say that they have a sum of $5000.  However, if Bob buys a new laptop today and his checking account balance drops to $500, Bob obviously doesn’t have $2500 total.  We can’t sum the account balance across the time dimension.  We would need to do something like take the average instead, or simply use the last value.  Measures like this are called semi-additive measures.
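To make that concrete, here’s a small hypothetical example in T-SQL showing why a balance sums safely across customers but not across dates:

CREATE TABLE #FactAccountBalance (
    CustomerName VARCHAR(10), 
    BalanceDate DATE, 
    BalanceAmount MONEY
);
INSERT INTO #FactAccountBalance VALUES 
    ('Bob',   '2012-06-01', 2000), 
    ('Betty', '2012-06-01', 3000), 
    ('Bob',   '2012-06-02', 500),   -- Bob bought a laptop
    ('Betty', '2012-06-02', 3000);

-- Summing across the customer dimension for one day works fine: $3500
SELECT BalanceDate, SUM(BalanceAmount) AS TotalBalance 
FROM #FactAccountBalance 
WHERE BalanceDate = '2012-06-02' 
GROUP BY BalanceDate;

-- Summing across time says Bob has $2500, which he never had.
-- Take each customer's latest balance instead.
SELECT SUM(b.BalanceAmount) AS TotalBalance 
FROM #FactAccountBalance b 
WHERE b.BalanceDate = (SELECT MAX(b2.BalanceDate) 
                       FROM #FactAccountBalance b2 
                       WHERE b2.CustomerName = b.CustomerName);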

Finally, some measures can’t ever be summed.  These are called non-additive measures, and include measures like discount percentages and prices.

Summary

In this installment, we reviewed the different types of fact columns and the additivity of measures.  Next time, we’ll talk about implementing fact and dimension tables in the data warehouse.

 

About the Author

SQL Server Consultant

Colleen Morrow

Colleen Morrow is a SQL Server Consultant at UpSearch.

Colleen is a database strategist, community advocate, author, blogger and public speaker. She is passionate about helping technology leaders use Microsoft’s SQL Server to protect, optimize and unlock data’s value.

Colleen has been working with relational databases for almost 20 years. Since 2000, Colleen has specialized in SQL Server and Oracle database management solutions. She excels at performance tuning, troubleshooting mission critical SQL Server environments, and training technology professionals and business users of all levels.

Since 2011, Colleen has maintained a SQL Server focused blog at http://colleenmorrow.com. She is an active member of the Ohio North SQL Server User Group, as well as a volunteer for the Professional Association for SQL Server (PASS). Colleen earned a Bachelor of Science in Computer and Information Systems from Cleveland State University.

Learn more about Colleen Morrow at https://upsearch.com/colleen-morrow/.

 

PowerShell 101 for the SQL Server DBA

 

About the Presentation

The more you have to manage, the more likely you’ll want to automate your processes. The PowerShell scripting language will make you truly effective at managing lots of servers. But it’s more than just a scripting language – it’s an interactive shell that stores data for you and allows you to implement ad-hoc solutions quickly and easily. Within the PowerShell environment you can easily manage both SQL Server instances and the Windows servers themselves, giving you a ‘best of both worlds’ environment that puts you in control. This session will introduce you to SQL Server PowerShell and show you how to use it to manage SQL Server across many instances.

Session Level: Intermediate

Download the Presentation

PowerShell 101 for the SQL Server DBA

Presentations (Upcoming & Past)

 

About the Author

Microsoft SQL Server MVP and Practice Leader

Allen White

Allen White is an UpSearch Alum and Microsoft SQL Server MVP.  For over 30 years, Allen has specialized in developing applications that manage the movement of data and maximize data's usefulness. Allen excels at communicating highly technical information using language that results in increased client engagement and understanding, regardless of technical competency.

Allen has been working with relational database systems for over 20 years. He has architected database solutions in application areas like retail point-of-sale (POS), POS audit, loss prevention, logistics, school district information management, purchasing and asset inventory and runtime analytics. Allen thrives on providing comprehensive solutions to information management problems across a great variety of application environments.


 

Factor Encrypted Data Requirements into your Disaster Recovery Plan

Backups are probably the first, most important thing a DBA can do for his company’s data. Without backups, any number of problems can cause data loss, and significant data loss can be devastating to a company. Of course, without testing those backups they’re just bits on a disk, and may or may not be useful. Testing those backups regularly gives the DBA practice restoring, so the steps are ingrained when they’re needed for a real recovery, plus it ensures that the backups themselves are good.
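As a minimal sketch of what “testing a backup” means in practice (the file paths and logical file names here are hypothetical):

-- Cheap first pass: verify the backup file is readable
RESTORE VERIFYONLY FROM DISK = 'C:\Backups\MyDatabase.bak'
GO
-- The real test: restore it under another name and inspect the result
RESTORE DATABASE MyDatabase_Test 
FROM DISK = 'C:\Backups\MyDatabase.bak' 
WITH MOVE 'MyDatabase' TO 'C:\TestRestore\MyDatabase_Test.mdf', 
     MOVE 'MyDatabase_log' TO 'C:\TestRestore\MyDatabase_Test_log.ldf'
GO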

Sensitive data in the databases, of course, needs to be encrypted, so that only authorized users have access to that data. We’ve seen too many cases of the wrong people getting access to personal information that leads to fraud and identity theft, so encrypting that data is also critical to a company’s success.

Testing restores of database backups, with encrypted data in the database, becomes a bit more complicated, and it’s important to remember a few things before attempting to restore from backups with encrypted data in SQL Server.

First, and this is really important, back up the service master key for the instance and the database master key for each database which uses SQL Server encryption. It’s not hard – here are the steps:

use master
GO
-- The service master key is instance-level; back it up once per server
BACKUP SERVICE MASTER KEY TO FILE = 'C:\MyDirectory\ServerSMK.key'
ENCRYPTION BY PASSWORD = 'UD58ss6r'
GO
use MyDatabase
GO
-- Back up the database master key in each database that uses encryption
BACKUP MASTER KEY TO FILE = 'C:\MyDirectory\MyDatabaseMK.key'
ENCRYPTION BY PASSWORD = 'UD58ss6r'
GO

Copy these files somewhere safe, where you know you’ll be able to find them if you ever need to recover the data in these databases.
When you do need to recover a database, restore it from backup, then issue the following commands to restore full access to the encrypted data:

use master
GO
-- Decrypt with the password used when the key was backed up
RESTORE SERVICE MASTER KEY FROM FILE = 'C:\MyDirectory\ServerSMK.key' DECRYPTION BY PASSWORD = 'UD58ss6r'
GO
use MyDatabase
GO
-- DECRYPTION uses the backup password; ENCRYPTION sets the password that
-- protects the restored key; FORCE continues even if some data loss may occur
RESTORE MASTER KEY FROM FILE = 'C:\MyDirectory\MyDatabaseMK.key'
DECRYPTION BY PASSWORD = 'UD58ss6r'
ENCRYPTION BY PASSWORD = 'ccH4QvQCp8Ry6nYSsVxZ5oU'
FORCE
GO
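A quick way to confirm the restored key is usable is to open and close the database master key explicitly, using the password supplied in the ENCRYPTION BY PASSWORD clause above:

use MyDatabase
GO
-- If this succeeds, the database master key was restored correctly
OPEN MASTER KEY DECRYPTION BY PASSWORD = 'ccH4QvQCp8Ry6nYSsVxZ5oU'
CLOSE MASTER KEY
GO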

Disaster recovery without testing doesn’t amount to much, so it’s a good exercise for many reasons.

Want to Learn More About SQL Server Disaster Recovery?

If you'd like to learn more about how UpSearch can support your SQL Server disaster recovery initiative, visit SQL Server Disaster Recovery or contact us today.


Naming Conventions for Tables and Views

In years past I taught the Microsoft 2779B Implementing a Microsoft SQL Server 2005 Database class, and in Module 7 (focusing on Views) I came across the following “tip”:

Tip: You should develop a consistent naming convention to distinguish views from tables. For example, you could add the letter v or the word view as a suffix to the name of each view that you create. This approach allows tables and views to be easily distinguished.

Now, I agree that you should develop a consistent naming convention, but I advised the students that (in my opinion) views are, from the user’s perspective, no different than tables. The relational model makes little distinction between tables (relational variables, or “base” relvars) and views (“derived” or “virtual” relvars).

In fact, in C.J. Date’s book An Introduction to Database Systems (8th Edition, Addison-Wesley/Pearson Education, 2004, pg. 75), Chris states “Indeed, the fact that a view is a relvar is precisely one of the strengths of relational systems; it is important in just the same way as the fact that a subset is a set is important in mathematics. Note: SQL Products and the SQL Standard often seem to miss this point, however, inasmuch as they refer repeatedly to ‘tables and views’ with the tacit implication that a view is not a table.”

In other words, from the perspective of an application or a user, there is no difference between a view and a table. Now, we know that’s not completely true, because it’s easy to create a view that can’t be updated because it joins multiple tables, and the query processor can’t parse out what’s being updated amongst the various tables in the view. That’s a relatively easy problem to solve, however, with the INSTEAD OF trigger.
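Here’s a minimal sketch of that pattern, with hypothetical table, view, and column names: a view joining two tables, plus an INSTEAD OF UPDATE trigger that routes each column back to its base table.

CREATE VIEW dbo.CustomerOrders
AS
SELECT c.CustomerID, c.CustomerName, o.OrderID, o.OrderStatus
FROM dbo.Customers c
JOIN dbo.Orders o ON o.CustomerID = c.CustomerID
GO
CREATE TRIGGER dbo.trg_CustomerOrders_Update
ON dbo.CustomerOrders
INSTEAD OF UPDATE
AS
BEGIN
    -- Apply customer column changes to the Customers base table
    UPDATE c
    SET c.CustomerName = i.CustomerName
    FROM dbo.Customers c
    JOIN inserted i ON i.CustomerID = c.CustomerID

    -- Apply order column changes to the Orders base table
    UPDATE o
    SET o.OrderStatus = i.OrderStatus
    FROM dbo.Orders o
    JOIN inserted i ON i.OrderID = o.OrderID
END
GO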

At a former company we had an application that managed web sites. It used a master database for all sites, and then a database for each site. In each database was a table that the application used to display exhibitors at our trade shows. Some of the trade shows have multiple shows around the country, and for these the site database was actually a set of databases, with a parent database for the main site and child databases for the individual shows in the group. In the child databases the table was replaced by a view presenting data from the parent database. The application had a problem with an update to this object, so I wrote the INSTEAD OF trigger, and the application worked as it was supposed to. I didn’t have to create views in every database used by the application for this exception situation.

So, my point is that tables and views should follow the same naming standard – the difference between the two should be transparent to the user.

 

About the Author

Microsoft SQL Server MVP and Practice Leader

Allen White

Allen White is a Microsoft SQL Server MVP and Practice Leader at UpSearch.

For over 30 years, Allen has specialized in developing applications that manage the movement of data and maximize data’s usefulness. Allen excels at communicating highly technical information using language that results in increased client engagement and understanding, regardless of technical competency.

Allen has been working with relational database systems for almost 20 years. He has architected database solutions in application areas like retail point-of-sale (POS), POS audit, loss prevention, logistics, school district information management, purchasing and asset inventory and runtime analytics. Allen thrives on providing comprehensive solutions to information management problems across a great variety of application environments.

Learn more about Allen White at https://upsearch.com/allen-white.

 

About UpSearch

UpSearch provides as-needed Microsoft SQL Server DBA Services across the globe. We specialize in helping leaders protect, unlock and optimize data’s value.