Getting More Rows Per Buffer in Your SQL Server Integration Services Package
In the past, I’ve written on my personal blog about the Default Buffer Max Rows property in SQL Server Integration Services (SSIS), and how powerful it can be. As a recap, there are two properties in SSIS that control the amount of data that your ETL (Extract, Transform, and Load) package will hold in any given buffer: the default buffer max rows and default buffer size properties.
When performance tuning SSIS packages, we can modify either of these properties to adjust how much data SSIS can carry through a buffer at once. By default, this is 10,000 rows or 10 MB.
Before we can dynamically set the default buffer max rows property, we first need to find our destination table and identify the maximum row length in that table. Fortunately, this can be obtained with a simple query (using DimCurrency in the AdventureWorksDW2012):
SUM (max_length) [row_length]
FROM sys.tables t
JOIN sys.columns c
JOIN sys.schemas s
WHERE t.name = ‘DimCurrency’
AND s.name = ‘dbo’
You could easily turn this into a procedure to take the table name and schema as a parameter and return it. For this post, I’m going to leave it as a query. Once we have the max row length, it becomes a simple exercise in math. The table in the example query above returns a max row length of 118 (in bytes). We take this value and divide it into our default value (which fortunately is stored in SSIS in bytes as 10485760). So our calculation, 10485760 divided by 118, gives us a result of 88,862 rows. We’ll round down to the closest 100, and 88,800 will be our default buffer max rows property.
Now how do we make this dynamic? To start, we open up our SSIS package and add an Execute SQL task into the Control Flow. A good way to improve upon this would take this, put in a Pre-Execute SQL task, and write the value to an information table. Then you can log it and do some reporting around what your buffers are. As an added bonus with this dynamic structure: as your data scales, your ETL package will too!
Next, we want to create 3 variables:
The str_table_name and str_schema_name variables should be strings. The int_row_size variable should be an Int32 and the value should be set to 1. These settings will avoid an error in our expression later. In this way we can parameterize our inputs for our destination tables and the corresponding values and create a placeholder for our incoming result. If your package names are the same as the destination tables then instead of having to assign this value manually you could just configure it from an expression, using the @[System::PackageName] global variable:
Let’s open up our Execute SQL Task and configure it. The connection will point to our destination database, and the SQL query will be as we’ve outlined above, subbing parameters for the table and schema name:
Do the parameter mapping on the second tab to parameterize your table and schema with the variables you created above, and set your result set to single row. The single row result set should map to your int_row_size variable. Now for the fun part. Create a data flow task, right click on it and go to “Properties.”
Under “Expressions,” select DefaultBufferMaxRows and expand the expression ellipses. For the expression, use the following logic:
@[User::int_row_size]==1? 10000: 10485760/ @[User::int_row_size]
The initial test of the value is to ensure that if the SQL fails to get the proper row size, your buffers remain at the default. And that’s it! Now you can dynamically set your default buffer max rows for SSIS!
Now that’s some awesome-sauce.
About the Author
Chris Schmidt is a data architect and has more than six years of experience in developing business intelligence solutions. Chris is extremely passionate about business intelligence and helping companies achieve insight through data. He lives in Columbus, OH and is an active member of the SQL community.