SQLServer

Microsoft SQL Server 2000 Index Defragmentation Best Practices

손병환 2006. 1. 14. 21:49

The article below is excerpted from the Microsoft website. I am posting it because I personally found it very helpful. If there is any problem with reposting it, I will remove it immediately.

http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx

 

Microsoft SQL Server 2000 Index Defragmentation Best Practices

Updated: March 19, 2003

Author: Mike Ruthruff

February 2003

Summary As Microsoft SQL Server 2000 maintains indexes to reflect updates to their underlying tables, these indexes can become fragmented. Depending on workload characteristics, this fragmentation can adversely affect workload performance. This white paper provides information to help you determine whether you should defragment table indexes to benefit workload performance. To defragment indexes, SQL Server 2000 provides several statements. This white paper compares two of those statements: DBCC DBREINDEX and DBCC INDEXDEFRAG.


Overview

This white paper provides information that you can use to determine whether you should defragment indexes to benefit the workload performance in your production environment. In addition, this paper compares DBCC DBREINDEX and DBCC INDEXDEFRAG, the statements provided by Microsoft SQL Server 2000 to perform index defragmentation. Included in this comparison are the results from testing these statements against different databases and hardware environments. For information about these test environments, see "Small-Scale Environment vs. Large-Scale Environment" and Appendix A later in this paper.

Note: Defragmentation does not yield performance gains in every case. Every scenario is different. Similarly, determining when you should run the defragmentation statements requires analysis.

This white paper describes important considerations and the general process for determining when you should defragment indexes. The following is a summary of the key points presented in this paper:

Before you defragment indexes, ensure that system resource issues, such as physical disk fragmentation or inappropriate schemas, are not adversely affecting workload performance.

DBCC SHOWCONTIG allows you to determine the amount of index fragmentation. When you run this statement, pay particular attention to the values for Logical Fragmentation and Page Density.

Workload type is an important consideration when determining whether you should defragment indexes. Not all workload types benefit from defragmenting. Read-intensive workload types that do significant disk I/O benefit the most. The test results showed that the Decision Support System (DSS) workload type benefited much more than the OLTP (Online Transaction Processing) workload type.

Fragmentation affects disk performance and the effectiveness of the SQL Server read-ahead manager. There are some key indicators available through the Windows Performance Monitor that show this.

The decision of whether to use DBCC DBREINDEX or DBCC INDEXDEFRAG is based on your availability needs and the hardware environment.

Updating statistics is a side effect of DBCC DBREINDEX, which is not the case with DBCC INDEXDEFRAG. You can increase the effectiveness of DBCC INDEXDEFRAG by running UPDATE STATISTICS afterwards.

 

Understanding Fragmentation

Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file. All leaf pages of an index contain pointers to the next and the previous pages in the index. This forms a doubly linked list of all index/data pages. Ideally, the physical order of the pages in the data file should match the logical ordering. Overall disk throughput is increased significantly when the physical ordering matches the logical ordering of the data. This leads to much better performance for certain types of queries. When the physical ordering does not match the logical ordering, disk throughput can become less efficient, because the disk head must move back and forth to gather the index pages instead of scanning forward in one direction. Fragmentation affects I/O performance, but has no effect on performance of queries whose data pages reside in the SQL Server data cache.

When indexes are first built, little or no fragmentation should exist. Over time, as data is inserted, updated, and deleted, fragmentation levels on the underlying indexes may begin to rise. To correct the fragmentation, SQL Server provides the following statements:

DROP INDEX followed by CREATE INDEX

CREATE INDEX WITH DROP_EXISTING

DBCC INDEXDEFRAG

DBCC DBREINDEX

DBCC INDEXDEFRAG and DBCC DBREINDEX were used for the tests described in this white paper. These statements allowed both online and offline scenarios to be represented. DBCC DBREINDEX builds indexes the same way as CREATE INDEX; therefore, the results achieved with DBCC DBREINDEX are similar to those if CREATE INDEX were used. The results of the tests and the functionality provided by each of these statements are described later in this paper.

 

Considerations Before Defragmenting

System Resource Issues

Before you decide to defragment indexes, you should first ensure that any performance issues are not related to system resource limitations. A detailed discussion about this issue is beyond the scope of this paper; however, some of the more common resource issues are related to I/O subsystem performance, memory usage, and CPU utilization. For sources of more in-depth information about diagnosing these types of problems, see "Finding More Information" later in this paper.

Physical Disk Fragmentation

Disk fragmentation can contribute to poor performance on some systems. To determine whether disk fragmentation exists, use system tools provided in Microsoft Windows or from third parties to analyze the drives on which SQL Server databases reside. On small-scale environments with more conventional I/O subsystems, it is recommended that you correct disk fragmentation before running index defragmentation tools. On large-scale environments that benefit from more intelligent disk subsystems, such as storage area network (SAN) environments, correcting disk fragmentation is not necessary.

Poorly Performing Queries

When investigating any performance-related problems, you must identify which queries in the workload are performing poorly. Some of the information discussed here is also used later in the paper to determine which indexes to focus on for defragmentation.

You can use SQL Profiler to identify poorly performing queries. (For more information, see the topic "SQL Profiler" in SQL Server Books Online.) Running SQL Profiler introduces some overhead; however, monitoring only the following events should allow you to collect the needed information with minimal impact on performance (this varies, but generally causes less than a 10 percent impact on CPU utilization).

SQL Profiler provides a trace template named SQLProfilerTSQL_Duration, which captures relevant events. You can use this to identify poorly performing queries quickly. You can also manually create a SQL Profiler trace that captures these events:

TSQL: SQL:BatchCompleted

Stored Procedures: RPC:Completed

The length of time you need to run SQL Profiler depends on the server workload. For the trace to be most useful, it should represent a typical workload or, at least, the part of the workload that exhibits poor performance. After the trace has been captured, examine the trace log focusing on the duration column. This is a measure in milliseconds of the duration of each batch or query.

Identifying the Poorest Performing Queries

Here are some suggestions to best determine the poorest performing queries in the trace:

Group the trace by query duration. Focus first on the "Top 10" worst performing queries.

If your application uses stored procedures extensively, consider using the SQLProfilerSP_Counts trace template to identify the most commonly called stored procedures. Focus on the worst performing stored procedures that are used most frequently.

Consider saving the collected data into SQL Server tables. This allows you to query the tables and perform more detailed analysis (for example, average duration, max duration, and so on) of the workload performance.
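
If the trace was saved to a file, one way to load it into a table for this kind of analysis is the fn_trace_gettable function. The following is a minimal sketch; the trace file path and the WorkloadTrace table name are hypothetical:

-- Load the saved Profiler trace file into a table.
SELECT * INTO dbo.WorkloadTrace
FROM ::fn_trace_gettable('C:\traces\workload.trc', default)
GO
-- The ten longest-running statements captured in the workload
-- (the Duration column is reported in milliseconds).
SELECT TOP 10 Duration, CAST(TextData AS nvarchar(4000)) AS QueryText
FROM dbo.WorkloadTrace
ORDER BY Duration DESC
GO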

Underlying Schema

After you have identified the longest-running, worst-performing queries, you must ensure that the underlying schema is optimal for them. Verify that appropriate indexes exist and that the queries actually use them. Using SQL Query Analyzer to display and review query plans allows you to see which indexes are being used by queries in the workload. When the execution plan of a query is displayed graphically in SQL Query Analyzer, out-of-date or missing statistics are indicated as warnings (the table name appears in red text). Resolve any out-of-date statistics before defragmenting.
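
If out-of-date statistics are flagged, they can be refreshed before defragmenting. The following is a minimal sketch, using table_1 (the sample table name that appears elsewhere in this paper):

-- Refresh statistics for one table, scanning all rows:
UPDATE STATISTICS table_1 WITH FULLSCAN
GO
-- Or refresh statistics for every table in the current database:
EXEC sp_updatestats
GO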

When you examine query plans, keep the following suggestions in mind:

Look for steps in the execution plan that have a high cost associated with them. These are the most expensive parts of the query. Improving performance of these steps results in the biggest performance gains.

Identify steps that perform index scans. Index scans benefit the most from defragmenting. Note indexes where poorly performing queries use index scans so you can focus on those when you defragment the indexes.

Capturing a trace with SQL Profiler, in addition to allowing you to manually review the query plans, allows you to use the Index Tuning Wizard to analyze the workload. You can use the report created by the Index Tuning Wizard to determine whether any changes must be made to the underlying schema. Address any needed schema changes before defragmenting indexes.

 

Small-Scale Environment vs. Large-Scale Environment

The tests described later in this paper were run on two servers, each having a very different I/O subsystem. One server represented a small-scale environment and the other represented a large-scale environment. The specifics of each environment are presented here to provide a context in which to interpret the test results.

Small-Scale Environment

In the smaller configuration, the database size ranged from 10 to 20 gigabytes (GB). Data was spread across two physical spindles, with tempdb and the database log separated on 2 additional spindles using RAID 0. The database configuration for the DSS database consisted of two file groups, each having one file. The database configuration for the OLTP database consisted of one file group with one data file.

Large-Scale Environment

Microsoft partnered with Hitachi Data Systems to build the large-scale Storage Area Network (SAN) environment using a Hitachi Freedom Storage Lightning 9900 Series Lightning 9960 system for storage of the data. The database size for the testing done on this system was approximately 1 terabyte (TB). Data was spread across 64 physical spindles using RAID 1+0. The spindles used for data were exposed through eight Logical Unit Numbers (LUNs), and the database configuration consisted of one file group containing eight data files. tempdb and the database log were created on a separate set of spindles isolated from the data, with tempdb spread across 48 spindles and the log across 8 spindles. To quickly back up and restore images of the fragmented database, two Hitachi ShadowImage software copies of the data/log were maintained in the SAN, and the Lightning 9960 system was used to resynchronize a ShadowImage software copy of the data with the production copy. On this larger system, tests were run at only two of the three fragmentation levels because of the amount of storage needed to maintain a copy for each level (approximately 1.4 TB).

Performance Impact of Index Fragmentation

The test results are discussed in more detail later in this paper; however, it is worth noting that although index fragmentation had an adverse effect on both environments, the impact on the large-scale environment was significantly less than on the small-scale environment. These results are to be expected because the larger environment benefits from greatly increased I/O performance provided by the SAN: Not only can the data be spread across many more spindles, but also the SAN provided 16 GB of data cache. The I/O benchmark tests performed when creating the 1-TB database yielded a maximum read throughput of 354 MB/sec, compared to only 71 MB/sec on the small-scale environment.

Note: These values can vary based on your specific implementation and storage configuration.

Obviously, having a high-performance I/O subsystem benefits SQL Server performance; however, performance gains can still be realized by defragmenting indexes across all systems. When building databases, give careful consideration to the I/O subsystem and be sure to isolate log files on separate spindles from data files whenever possible.

 

Deciding When to Defragment Indexes

Consider the following important recommendations when deciding to defragment indexes:

Identify fragmented indexes.

Understand which workload types benefit most from defragmenting.

Determine the amount of I/O performed by a query.

Understand the effect of fragmentation on disk throughput and the SQL Server read-ahead manager.

In the following sections, where appropriate, the results from the tests have been included to help illustrate the recommendations.

Using DBCC SHOWCONTIG to Identify Fragmented Indexes

When deciding to defragment, you must first identify the fragmented indexes. DBCC SHOWCONTIG allows you to measure fragmentation and page density levels on indexes.
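
The paper does not show the exact statement used to produce the sample output below, but a report of this form can be generated with an invocation such as the following (table_1 matches the table name in the sample):

-- Report fragmentation for all indexes on the table:
DBCC SHOWCONTIG ('table_1') WITH ALL_INDEXES
GO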

The following is sample output from DBCC SHOWCONTIG:

DBCC SHOWCONTIG scanning 'table_1' table...
Table: 'table_1' (453576654); index ID: 1, database ID: 8
TABLE level scan performed.
- Pages Scanned................................: 48584
- Extents Scanned..............................: 6090
- Extent Switches..............................: 12325
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 49.27% [6073:12326]
- Logical Scan Fragmentation ..................: 10.14%
- Extent Scan Fragmentation ...................: 32.74%
- Avg. Bytes Free per Page.....................: 1125.2
- Avg. Page Density (full).....................: 86.10%
DBCC SHOWCONTIG scanning 'table_1' table...
Table: 'table_1' (453576654); index ID: 2, database ID: 8
LEAF level scan performed.
- Pages Scanned................................: 41705
- Extents Scanned..............................: 5221
- Extent Switches..............................: 6094
- Avg. Pages per Extent........................: 8.0
- Scan Density [Best Count:Actual Count].......: 85.55% [5214:6095]
- Logical Scan Fragmentation ..................: 7.80%
- Extent Scan Fragmentation ...................: 6.63%
- Avg. Bytes Free per Page.....................: 877.7
- Avg. Page Density (full).....................: 83.20%

When you examine the results from DBCC SHOWCONTIG, pay particular attention to logical scan fragmentation and average page density. Logical scan fragmentation is the percentage of out-of-order pages in an index. (Note: This value is not relevant for heaps and text indexes; a heap is a table with no clustered index.) Page density is a measure of the fullness of the leaf pages of an index. For more information, see the topic "DBCC SHOWCONTIG" in SQL Server Books Online.

Analyzing Output from DBCC SHOWCONTIG

When analyzing the output from DBCC SHOWCONTIG, consider the following:

Fragmentation affects disk I/O. Therefore, focus on the larger indexes because their pages are less likely to be cached by SQL Server. Use the page count reported by DBCC SHOWCONTIG to get an idea of the size of the indexes (each page is 8 KB in size). Generally, you should not be concerned with fragmentation levels of indexes with less than 1,000 pages. In the tests, indexes containing more than 10,000 pages realized performance gains, with the biggest gains on indexes with significantly more pages (greater than 50,000 pages).

High values for logical scan fragmentation can lead to degraded performance of index scans. In the tests, workload performance increased after defragmenting when clustered indexes had logical fragmentation greater than 10 percent, and significant increases were attained when logical fragmentation levels were greater than 20 percent. Consider defragmenting indexes with 20 percent or more logical fragmentation. Remember that this value is meaningless when reporting on a heap (Index ID = 0).

Low values for average page density can result in more pages that must be read to satisfy a query. Reorganizing the pages so they have a higher page density can result in less I/O to satisfy the same query. Generally, tables have a high page density after initial loading of data, and page density may decrease over time as data is inserted, resulting in splits of leaf pages. When examining the value for average page density, remember that this value is dependent on the fillfactor specified when the table was created.

Although scan density can also be an indication of fragmentation levels, it is not valid when indexes span multiple files; thus, scan density should not be considered when examining indexes that span multiple files.

Monitoring Fragmentation Levels

Regularly monitoring fragmentation levels on indexes is good practice. For a sample script that demonstrates how to automate the capture and rebuild of heavily fragmented indexes, see the "DBCC SHOWCONTIG" topic in SQL Server Books Online. Consider using the TABLERESULTS option of DBCC SHOWCONTIG and importing the output into tables at regular intervals; doing this allows you to monitor fragmentation levels over time. Also consider using the WITH FAST option when running DBCC SHOWCONTIG on a busy server. The WITH FAST option allows DBCC SHOWCONTIG to skip scanning the leaf pages of an index, which makes the statement run faster; however, because the leaf pages are not scanned, page density cannot be reported.
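
The following is a minimal sketch of these options; the table and index names in the last statement are hypothetical:

-- All indexes on every table in the current database, full scan,
-- returned as a result set that can be saved for trend analysis:
DBCC SHOWCONTIG WITH TABLERESULTS, ALL_INDEXES
GO
-- The same report on a busy server: WITH FAST skips the leaf-level scan,
-- so page density is not reported, but the statement completes much sooner.
DBCC SHOWCONTIG WITH TABLERESULTS, ALL_INDEXES, FAST
GO
-- A single table and index:
DBCC SHOWCONTIG ('table_1', 'IX_table_1_key') WITH TABLERESULTS, FAST
GO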

Table 1 shows the amount of time it took to run DBCC SHOWCONTIG against all the indexes in the small-scale and large-scale DSS environments. The ALL_INDEXES and TABLERESULTS options were used for each test.

Table 1 DBCC SHOWCONTIG performance

DBCC SHOWCONTIG options          Total number of index pages (all indexes)   Run time (minutes)

Small-Scale Environment
Not using the WITH FAST option   1,702,889                                    5.02
Using the WITH FAST option       1,702,889                                    0.90

Large-Scale Environment
Not using the WITH FAST option   111,626,354                                  382.35
Using the WITH FAST option       111,626,354                                  48.73

Understanding Which Workload Types Benefit Most from Defragmenting Indexes

When you decide to defragment indexes, it is important to understand that certain workload types realize a much greater performance increase than others as a result of defragmenting indexes. Fragmentation has an adverse effect on disk I/O performance. Queries that scan large ranges of index pages are affected most by fragmentation and gain the most from defragmenting.

The tests examined the performance of two typical database workload types with fragmentation present, contrasted with the performance of these same two workload types after index defragmentation. Testing was performed against a representative OLTP database and a representative DSS database. The OLTP type of workload primarily performs focused updates to particular ranges of the data (inserts, updates, and deletes) and very selective reads of the data. The workload for the DSS system, however, is read intensive, consisting of queries that perform joins across several tables. Typically, these queries must perform scans of one or more underlying indexes to satisfy the query results. The test results show that fragmentation had little impact on the OLTP workload performance; however, the DSS workload performance realized significant gains from defragmenting fragmented indexes.

The DBCC INDEXDEFRAG and DBCC DBREINDEX statements were used to defragment indexes. Specific functionality of the DBCC INDEXDEFRAG and DBCC DBREINDEX statements is discussed in more detail later in this paper.

OLTP-Type Workload

In the tests, the OLTP database workload simulated order processing for a warehouse environment. The workload consisted of five stored procedures issuing transactions for new orders, order status inquiries, deliveries, stock level status, and payment processing. The stored procedures used queries including inserts, updates, deletes, and very selective SELECT queries.

Figure 1 shows the difference in query performance for each of the five stored procedures, before and after the indexes were defragmented using DBCC INDEXDEFRAG and DBCC DBREINDEX.

Figure 1: Average duration for each stored procedure in the OLTP-type workload, before and after defragmenting. The lower values indicate better performance.

As shown in Figure 1, there is little difference between the performance of the stored procedures before and after defragmenting. Because the underlying queries issued by these stored procedures acted upon very selective portions of the data, workload performance was not adversely affected by fragmented indexes. Some of the results shown in Figure 1 may tend to indicate that running defragmentation utilities actually decreased performance of the stored procedures; however, there was a natural variation of 10 to 20 percent in the performance of the stored procedures during the workload run. The differences shown in Figure 1 are within this range. More importantly, the results show that over time there was no decrease in performance of the stored procedures as the fragmentation levels rose.

DSS-Type Workload

In the tests, the DSS workload consisted of 22 reporting-type queries that issued complex SELECT statements against the database. These queries were run serially as a batch against the server. All of the queries in this workload consisted of one or more table joins, and most queries scanned a large portion of the underlying indexes.

Table 2 shows the average fragmentation and page-density levels for the target indexes that were used in the tests. Fragmentation levels were achieved by a combination of the following:

Bulk inserts of new data into the database, simulating periodic refreshes of the data.

Deletes performed on ranges of data.

Updates performed on some key values; these affected fragmentation levels the least, and the number of updates was relatively small in comparison to the inserts and deletes.

Table 2 Average logical fragmentation and page-density levels tested on the small- and large-scale environments

Fragmentation level   Average logical fragmentation (%)   Average page density (%)

Small-Scale Environment
Low (1)               7.8                                  80.1
Medium (2)            16.6                                 68.1
High (3)              29.5                                 69.2

Large-Scale Environment
Low (1)               5.9                                  84.4
Medium (2)            13.8                                 70.3

The test results for the DSS workload type were very different from the results of the OLTP workload type. The workload performance improved significantly after defragmenting the indexes. This was expected because the performance for this workload was highly dependent on the disk throughput. (Most of the workload queries performed index scans.) Figures 2 and 3 that follow show the performance gains realized by the DSS workloads before and after defragmenting indexes. As the figures indicate, workload performance gained significant improvement from defragmenting.

The workload performance increase realized in the small-scale environment ranged from 60 percent at the low level of fragmentation to more than 460 percent at the highest level of fragmentation. The workload performance increase realized for the large-scale environment ranged from 13 percent at the low fragmentation level to 40 percent at the medium fragmentation level. The results show that fragmentation had less impact on the large-scale environment because that environment benefited from a better-performing disk subsystem. The test results are discussed in more detail in "Effect of Fragmentation on Disk Throughput and the SQL Server Read-Ahead Manager" later in this paper.

Figure 2: Total workload duration time for the DSS workload for different levels of fragmentation on the small-scale environment. The lower values indicate better performance.

Figure 3: Total workload duration time for the DSS workload for different levels of fragmentation on the large-scale environment. The lower values indicate better performance.

The results shown in Figure 2 may seem a bit counter-intuitive because DBCC INDEXDEFRAG actually resulted in better workload performance than DBCC DBREINDEX on the small-scale environment. In most cases, completely rebuilding the indexes should yield better performance.

When interpreting these test results, keep the following in mind:

The results shown in Figures 2 and 3 indicate a "best case" scenario for DBCC INDEXDEFRAG. In the tests, DBCC INDEXDEFRAG was run on a quiescent system; therefore DBCC INDEXDEFRAG could fully eliminate fragmentation. When DBCC INDEXDEFRAG is run on nonquiescent systems on which updates to the underlying data are taking place, DBCC INDEXDEFRAG skips locked pages as it encounters them. As a result, DBCC INDEXDEFRAG may not be able to completely eliminate fragmentation. To measure the effectiveness of DBCC INDEXDEFRAG, you should run DBCC SHOWCONTIG after DBCC INDEXDEFRAG.

Placement of the data on disk can contribute to disk performance. Data on the small-scale environment was spread across only two physical spindles (33.5 GB of total space) that were empty at the time the database was built. Total size for all data files was between 22 GB and 30 GB. Originally, when the database was created, the data resided close to the beginning of the data files, nearer the outer portion of the physical spindles. DBCC INDEXDEFRAG defragments the data in place, so the position of the data on disk stays close to its original position. Because DBCC DBREINDEX rebuilds the indexes completely, it must first allocate new space in the file before releasing the old index pages. This allocation results in the data being moved further into the data files and nearer the inner rim of the disks, resulting in slightly decreased I/O throughput. In the benchmark testing of the small-scale environment, this decrease could be as much as 15 percent for reads.

Free space can also influence the effectiveness of DBCC DBREINDEX. Without large enough contiguous blocks of free space, DBREINDEX may be forced to reuse other areas of free space within the data files, resulting in indexes being rebuilt with a small amount of logical fragmentation. For more information about the free space required by DBCC DBREINDEX, see "DBCC DBREINDEX" later in this paper.

Determining the Amount of I/O Performed by a Query

Because I/O-intensive queries stand to gain the most from defragmenting, it is worth mentioning how to determine the amount of I/O performed by a particular query. The SET STATISTICS IO statement reports the number and type of reads issued by an instance of SQL Server to satisfy a particular query. This option can be set ON and OFF in SQL Query Analyzer as follows:

SET STATISTICS IO ON
GO
SELECT * FROM table_1
GO
SET STATISTICS IO OFF
GO

Sample Output

Table 'table_1'.
 Scan count 1,
 logical reads 12025,
 physical reads 0,
 read-ahead reads 11421.

Table 3 Description of the output values in a SET STATISTICS IO report

Value              Description
Scan count         Number of scans performed
logical reads      Number of pages read from the data cache
physical reads     Number of pages read from disk
read-ahead reads   Number of pages placed into the cache for the query

By looking at the number of physical and read-ahead reads, you can get an idea of the disk I/O that must be performed to satisfy the query. Both physical and read-ahead reads indicate that a page was read from disk. Usually, you should see more read-ahead reads than physical reads.

Note: When you attempt to get this information from a SQL Profiler trace, the reads column in SQL Profiler reports the number of logical reads, not physical reads.

In addition to reordering out-of-order pages, defragmenting indexes can also reduce the amount of I/O performed by queries by increasing page-density levels for the leaf pages of an index. Increased page density means that fewer total pages must be read by SQL Server to satisfy the same query, which results in better performance.

Effect of Fragmentation on Disk Throughput and the SQL Server Read-Ahead Manager

Fragmentation can adversely affect performance of read-intensive disk-bound workloads. You can use Windows Performance Monitor to gain some insight into the effect of fragmentation on your workload. Performance Monitor enables you to monitor disk activity and can be helpful in determining when you need to defragment.

To understand why fragmentation had such an effect on the DSS workload performance, it is important to understand how fragmentation affects the SQL Server read-ahead manager. For queries that scan one or more indexes, the SQL Server read-ahead manager is responsible for scanning ahead through the index pages and bringing additional data pages into the SQL Server data cache. The read-ahead manager dynamically adjusts the size of reads it performs based on the physical ordering of the underlying pages. When there is low fragmentation, the read-ahead manager can read larger blocks of data at a time, more efficiently using the I/O subsystem. As the data becomes fragmented, the read-ahead manager must read smaller blocks of data. The amount of read-aheads that can be issued is independent of the physical ordering of the data; however, smaller read requests take more CPU resources per block, resulting in less overall disk throughput.

In all cases, the read-ahead manager benefits performance; but, when fragmentation exists and the read-ahead manager is unable to read the larger block sizes, it can lead to a decrease in overall disk throughput. You can see this behavior by examining some of the Physical Disk counters exposed by the Performance Monitor. The following table lists and describes these counters.

Table 4 Performance Monitor Physical Disk counters

Avg. Disk sec/Read

This is a measure of disk latency. The tests showed that very high levels of fragmentation (greater than 30 percent) can lead to increased disk latency.

Disk Read Bytes/sec

This is a good measure of overall disk throughput. A downward trend over time for the same workload can be an indication that fragmentation is affecting performance.

Avg. Disk Bytes/Read

This is a measure of how much data is read for each read request issued. When index pages are contiguous, the SQL Server read-ahead manager can read larger chunks of data at a time, making more efficient use of the I/O subsystem. The tests showed a direct correlation between this value and the amount of fragmentation: as fragmentation levels increase, this value can decrease, reducing overall disk throughput.

Avg. Disk Read Queue Length

In general, the target is a sustained average below two per physical spindle. In the tests, this value tended to increase with increasing fragmentation, most likely because of the higher latency and lower overall disk throughput.

Figures 4 through 7 show the disk throughput and average read size during the DSS workloads as reported by Performance Monitor.

Figure 4: Disk throughput during the DSS workload on the small-scale environment. The higher values indicate better disk throughput.

Figure 5: Disk throughput during the DSS workload on the large-scale environment. The higher values indicate better disk throughput.

Figure 6: Average size of each disk read during the DSS workload run on the small-scale environment. The higher values indicate greater bytes per read.

Figure 7: Average size of each disk read during the DSS workload run on the large-scale environment. The higher values indicate greater bytes per read.

The previous figures show the effect that fragmentation can have on disk performance. Although there are differences in the results obtained with DBCC DBREINDEX and DBCC INDEXDEFRAG, notice that, consistently across all systems, the average size of each read and the overall disk throughput decrease as fragmentation grows. As you can see, defragmenting the indexes resulted in much improved disk throughput.

The numbers for average read size have been included to illustrate the effect fragmentation has on the ability of the read-ahead manager to read larger chunks of data. It is important to keep in mind, however, that a larger average read size does not always equate to greater overall disk throughput. Larger read sizes are issued to reduce the CPU overhead incurred for the data transfer. When indexes are not fragmented, data may be read just as fast with 64-KB reads as with 256-KB reads. This is especially true for larger systems on which the data is spread across many spindles. These particular results, and disk performance in general, can vary greatly from system to system for any number of reasons (for example, differences in I/O subsystems, workload characteristics, placement of data on disk, and so on). When monitoring your system, look for general downward trends in disk throughput and read size over long periods of time. This, in addition to the information provided by DBCC SHOWCONTIG, can help you determine when to defragment the indexes.

Fragmentation can also lead to increased disk latency. The tests showed, however, that only the highest fragmentation levels had a significant negative impact on disk latency, and this affected only the small-scale environment. Disk latency was significantly lower and never became an issue on the large-scale environment due to increased I/O performance provided by the SAN.

 

DBCC DBREINDEX vs. DBCC INDEXDEFRAG

In addition to using the CREATE INDEX statement to drop and re-create indexes, you can use the DBCC DBREINDEX and DBCC INDEXDEFRAG statements to help with index maintenance.

DBCC DBREINDEX

DBCC DBREINDEX can be used to rebuild one or more indexes for a specific table. DBCC DBREINDEX is an offline operation. While this operation is running, the underlying table is unavailable to users of the database. DBCC DBREINDEX rebuilds indexes dynamically. You do not have to know anything about the underlying table structure, nor any PRIMARY KEY or UNIQUE constraints; these are preserved automatically during the rebuild. DBCC DBREINDEX completely rebuilds the indexes, so it restores the page density levels to the original fillfactor (default); or you can choose another target value for the page density. Internally, running DBCC DBREINDEX is very similar to using Transact-SQL statements to drop and re-create the indexes manually.

There are two distinct advantages of running DBCC DBREINDEX over DBCC INDEXDEFRAG:

DBCC DBREINDEX rebuilds statistics automatically during the rebuild of the indexes; this can have dramatic improvements on workload performance.

DBCC DBREINDEX can take advantage of multiple-processor computers and can be significantly faster when rebuilding large or heavily fragmented indexes.

All work done by DBCC DBREINDEX occurs as a single, atomic transaction. The new indexes must be completely built and in place before the old index pages are released. Performing the rebuild requires adequate free space in the data file(s). Without enough free space in the data file(s), DBCC DBREINDEX may be unable to rebuild the indexes, or the indexes may be rebuilt with logical fragmentation values above zero. The amount of free space needed varies and depends on the number of indexes being created in the transaction. For clustered indexes, a good guideline is: Required free space = 1.2 * (average row size) * (number of rows).
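
As a worked example of this guideline (the orders table name and the figures are hypothetical), sp_spaceused reports the row count and data size of a table, from which the average row size can be estimated:

EXEC sp_spaceused 'orders'
GO
-- If the table held 10,000,000 rows averaging 200 bytes per row, the
-- guideline suggests roughly 1.2 * 200 * 10,000,000 = 2.4 GB of free
-- space in the data file(s) before DBCC DBREINDEX rebuilds the
-- clustered index.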

For nonclustered indexes, you can predict free space necessary by calculating the average row size of each row in the nonclustered index (length of the nonclustered key plus the length of clustering key or row ID). Then multiply that value by the number of rows. If you rebuild indexes for an entire table, you will need enough free space to build the clustered index and all nonclustered indexes. Similarly, if you rebuild a nonunique clustered index, you will also need free space for both the clustered and any nonclustered indexes. The nonclustered indexes are implicitly rebuilt because SQL Server must generate new unique identifiers for the rows. When you use DBCC DBREINDEX, it is good practice to specify the index you want to defragment. This gives you more control over the operations being performed and can help to avoid unnecessary work.
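
A minimal sketch of DBCC DBREINDEX usage follows; the table name, index name, and fillfactor of 90 are hypothetical:

-- Rebuild a single index, restoring page density to the requested fillfactor:
DBCC DBREINDEX ('dbo.orders', 'PK_orders', 90)
GO
-- Rebuild every index on the table; a fillfactor of 0 reuses the fillfactor
-- stored when each index was created:
DBCC DBREINDEX ('dbo.orders', '', 0)
GO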

DBCC INDEXDEFRAG

DBCC INDEXDEFRAG allows you to defragment a specific index. As with DBCC DBREINDEX, you do not have to know about the underlying table structure; however, with DBCC INDEXDEFRAG you cannot defragment all indexes with a single statement. You must run DBCC INDEXDEFRAG once for each index you want to defragment.
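
A minimal sketch follows; the Sales database, the orders table, and the index names are hypothetical:

-- One statement per index to defragment:
DBCC INDEXDEFRAG ('Sales', 'orders', 'PK_orders')
DBCC INDEXDEFRAG ('Sales', 'orders', 'IX_orders_custdate')
GO
-- DBCC INDEXDEFRAG does not update statistics the way DBCC DBREINDEX does,
-- so consider following it with UPDATE STATISTICS:
UPDATE STATISTICS orders
GO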

Unlike DBCC DBREINDEX, DBCC INDEXDEFRAG is an online operation; therefore, the table and indexes are available while the index is being defragmented. Another major difference is that DBCC INDEXDEFRAG can be stopped and restarted without losing any work. The entire DBCC DBREINDEX operation runs as one atomic transaction. This means if you stop DBCC DBREINDEX the entire operation is rolled back, and you must start over. However, if you stop DBCC INDEXDEFRAG it stops instantly and no work is lost, because each unit of work performed by DBCC INDEXDEFRAG occurs as a separate transaction.

DBCC INDEXDEFRAG consists of two phases:

1. Compact the pages, attempting to adjust the page density to the fillfactor that was specified when the index was created. DBCC INDEXDEFRAG attempts to raise the page-density level of pages to the original fillfactor; it does not, however, reduce the page density of pages that currently have a higher page density than the original fillfactor.

2. Defragment the index by shuffling the pages so that the physical ordering matches the logical ordering of the leaf nodes of the index. This is performed as a series of small discrete transactions; therefore, the work done by DBCC INDEXDEFRAG has a small impact on overall system performance. Figure 8 shows the page movements performed during the defragmentation phase of DBCC INDEXDEFRAG.

Figure 8: Page movements performed by DBCC INDEXDEFRAG within the data file

DBCC INDEXDEFRAG does not help to untangle indexes that have become interleaved within a data file. Likewise, DBCC INDEXDEFRAG does not correct extent fragmentation on indexes. Interleaving occurs when index extents (a group of eight index pages) for an index are not completely contiguous within the data file, leaving extents from one or more indexes intermingled in the file. Interleaving can occur even when there is no logical fragmentation, because all index pages are not necessarily contiguous, even when logical ordering matches physical ordering.

Even with this limitation, the tests showed that DBCC INDEXDEFRAG can be as effective at improving performance of workloads as DBCC DBREINDEX. In fact, the tests showed that even if you can rebuild indexes so that there is minimal interleaving, this does not have a significant effect on performance. Reducing the logical fragmentation levels had a much greater impact on workload performance. This is why it is recommended that you focus on logical fragmentation and page density levels when examining fragmentation on your indexes. Table 5 provides a summary of the differences between DBCC DBREINDEX and DBCC INDEXDEFRAG.

Table 5 Comparison of DBCC DBREINDEX to DBCC INDEXDEFRAG

Functionality                                        DBCC DBREINDEX                           DBCC INDEXDEFRAG
Online/Offline                                       Offline                                  Online
Faster when logical fragmentation is:                High                                     Low
Parallel processing                                  Yes                                      No
Compacts pages                                       Yes                                      Yes
Can be stopped and restarted without losing          No                                       Yes
work completed to that point
Able to untangle interleaved indexes                 May reduce interleaving                  No
Additional free space is required in the             Yes                                      No
data file for defragmenting
Faster on larger indexes                             Yes                                      No
Rebuilds statistics                                  Yes                                      No
Log space usage                                      High in full recovery mode (logs the     Varies based on the amount
                                                     entire contents of the index); low in    of work performed
                                                     bulk-logged or simple recovery mode
                                                     (logs only the allocation of space)
May skip pages on busy systems                       No                                       Yes

Performance: DBCC DBREINDEX vs. DBCC INDEXDEFRAG

The tests demonstrated that both DBCC DBREINDEX and DBCC INDEXDEFRAG can defragment indexes effectively and return the page density levels to near the original fillfactor of the table. Based on these results, your availability needs should determine which statement you should run.

If you have a period of time in which to rebuild the indexes offline, DBCC DBREINDEX generally rebuilds indexes faster than DBCC INDEXDEFRAG. DBCC DBREINDEX takes full advantage of multiprocessor systems by creating the index in parallel across the available processors. DBCC INDEXDEFRAG is designed to be less intrusive to a production environment and has little impact on workload performance. The tests demonstrated that even when multiple concurrent streams were performing DBCC INDEXDEFRAG in parallel, the DSS workload performance was never affected by more than 10 percent. However, this design causes DBCC INDEXDEFRAG to take significantly longer to complete on larger indexes. In addition, the length of time it takes DBCC INDEXDEFRAG to defragment varies depending on the load on the server at the time the statement is run.

Figure 9 shows a comparison between the performance of DBCC INDEXDEFRAG and DBCC DBREINDEX. The values shown are the total amount of time taken to run each against all indexes in the database on the small-scale environment (results on the large-scale environment are consistent with these, with DBCC INDEXDEFRAG running as much as eight times slower than DBCC DBREINDEX). As the fragmentation level and size of the indexes increase, DBCC DBREINDEX can rebuild the indexes much faster than DBCC INDEXDEFRAG.

Figure 9: Total run time to defragment all indexes in the database for the small-scale environment

Logging Considerations: DBCC DBREINDEX vs. DBCC INDEXDEFRAG

One last consideration is the difference in the amount of data that is written to the transaction log when using DBCC INDEXDEFRAG and DBCC DBREINDEX. The amount of information logged by DBCC INDEXDEFRAG depends on the level of fragmentation and the amount of work that is performed. The tests resulted in DBCC INDEXDEFRAG logging significantly less than DBCC DBREINDEX when the database was in full recovery mode. The amount of data logged by DBCC INDEXDEFRAG, however, can vary greatly. This is because the amount of work that DBCC INDEXDEFRAG performs when defragmenting indexes is dependent on the number of page movements and the amount of page compaction necessary. You can reclaim the log space used by DBCC INDEXDEFRAG by backing up the log because the work performed is a series of small transactions.

With respect to log usage, DBCC DBREINDEX behaves slightly differently from DBCC INDEXDEFRAG, with the biggest difference being the amount of log used when in bulk logged recovery mode. When in full recovery mode, DBCC DBREINDEX logs images of each index page, which does not occur when in bulk logged mode. For this reason, in full recovery mode, the log space required by DBCC DBREINDEX is roughly equivalent to the number of index pages multiplied by 8 KB. You can use DBCC SHOWCONTIG to determine the number of pages in a given index. For large-scale environments, consider changing the recovery mode to bulk logged when you run DBCC DBREINDEX. Then return to full recovery mode after index maintenance has been performed.
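
A minimal sketch of this recovery model switch follows; the Sales database, the table, and the backup paths are hypothetical. Backing up the log before and after the switch keeps the log chain usable:

BACKUP LOG Sales TO DISK = 'E:\backup\Sales_log_before.trn'
GO
ALTER DATABASE Sales SET RECOVERY BULK_LOGGED
GO
-- Index maintenance while the database is in bulk-logged recovery mode:
DBCC DBREINDEX ('dbo.orders', '', 0)
GO
ALTER DATABASE Sales SET RECOVERY FULL
GO
BACKUP LOG Sales TO DISK = 'E:\backup\Sales_log_after.trn'
GO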

Note: It is important to understand logging requirements on large-scale environments because rollbacks of long-running transactions can be expensive.

Figure 10 shows the differences in log space usage between DBCC INDEXDEFRAG and DBCC DBREINDEX for the medium fragmentation level on the small-scale environment. Log space usage can fluctuate greatly for DBCC INDEXDEFRAG; however, the results from the tests can be used for a general comparison between DBCC DBREINDEX and DBCC INDEXDEFRAG.

Figure 10: Total log space used by DBCC INDEXDEFRAG and DBCC DBREINDEX while defragmenting all indexes in the DSS database

 

Conclusion

Index fragmentation has very different effects on different workload types. Certain applications can achieve great performance benefits when indexes are defragmented. Understanding the application workload characteristics, system performance, and fragmentation statistics provided by SQL Server are key in making a good decision about when to defragment indexes. SQL Server provides several statements that you can use to correct fragmented indexes. The information in this paper can help you determine when and how you should defragment indexes to have the greatest impact on your workload performance.