
Friday, 7 November 2014

How to troubleshoot SQL Server performance issues



To troubleshoot performance issues, you must complete a series of steps to isolate and determine the cause of the problem. Possible causes include:
Blocking
System resource contention
Application design problems
Queries or stored procedures that have long execution times

Identifying these causes is typically very time consuming, and you may spend several days evaluating the collected information. The complexity of the performance analysis is not specific to a particular database product or operating system. All applications are subject to performance constraints because of resource, design, or usage factors.



To help identify and troubleshoot the problem, collect the following information at the same time and have the output readily available before you contact Microsoft Product Support Services (PSS):
Blocker script output
SQL Profiler trace log
SQL Server Performance Monitor log

Note If you do not collect this information when the performance problem occurs, you may have to gather all the information again. This can delay the troubleshooting process.

After the problem occurs, collect the following information and have it available:
sqldiag report
Microsoft Windows NT System and Application Event logs

In most scenarios, PSS requires this information to understand the environment and the nature of the performance issue. If any part of this information is not available, the troubleshooting process may be prolonged and identifying the performance issue may be delayed.

Even if you are currently not experiencing any performance issues, Microsoft recommends that you implement this process. If a performance issue occurs, you can capture the required information as soon as possible. Additionally, if you capture a baseline SQL Profiler log, SQL Server Performance Monitor log, and blocker script when the application is performing as expected, you can use that information for comparison when the application does not perform as expected.

If you gather this information in a high traffic SQL Server environment, you may experience some performance degradation. However, you must have this information to identify the cause of the problem and for troubleshooting purposes. The SQL Profiler trace has the most impact on performance. If the performance is severely degraded, you can customize the SQL Profiler trace by reducing the types of events that it captures. Limiting the SQL Profiler trace should provide some improvement. If you have questions or problems setting up and collecting the information, contact PSS.


Blocker script output

The blocker script is critical for identifying blocking scenarios. However, you can use the output from the script to troubleshoot performance problems even when blocking is not an issue. This output also helps to determine if queries are waiting on resources, such as file I/O, or if transactions are not being committed or rolled back as expected.
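If you just need a quick spot check before the full blocker script output is available, a minimal query against master..sysprocesses (the system table the SQL Server 2000-era blocker script is built around) can show which sessions are blocked and what they are waiting on. This is only a sketch, not a replacement for the full script:

-- Quick spot check for blocking (SQL Server 2000/2005 era)
SELECT spid,
       blocked,        -- spid of the blocking session (0 = not blocked)
       waittype,
       waittime,
       lastwaittype,
       waitresource,
       cmd,
       hostname,
       program_name
FROM master..sysprocesses
WHERE blocked <> 0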



SQL Profiler trace log

The SQL Profiler trace captures the activity on the computer running SQL Server. You can use this information to identify slow running queries and non-optimal execution plans. Additionally, SQL Profiler documents the series of events that occur before the performance problem and helps to identify its cause.
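If the trace is saved to a file, you can query it directly from T-SQL with fn_trace_gettable to pull out the longest-running batches. The file path below is only a placeholder; on SQL Server 2005 and later you can use sys.fn_trace_gettable instead of the older :: syntax:

-- Find the longest-running batches in a saved Profiler trace file
SELECT TOP 20
       TextData,
       Duration,      -- microseconds in SQL Server 2005+, milliseconds in 2000
       CPU,
       Reads,
       Writes,
       StartTime
FROM ::fn_trace_gettable('C:\PerfLogs\MyTrace.trc', DEFAULT)
WHERE EventClass = 12  -- SQL:BatchCompleted
ORDER BY Duration DESC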



Note If you have a SQL Profiler trace of a large workload, you can use the Index Tuning Wizard. The Index Tuning Wizard uses the SQL Server query optimizer to determine the optimal set of indexes for the specified queries. The Index Tuning Wizard is a very efficient tool to determine if the correct indexes exist in your database. By implementing the indexes that the wizard suggests, you may be able to increase the performance of your application.

For more information about how to use the Index Tuning Wizard, see the "Index Tuning Wizard" topic in SQL Server Books Online.


SQL Server Performance Monitor log

SQL Server is typically affected by the following bottlenecks:
CPU
Memory
File I/O
Locking, blocking, or deadlocking

You can use SQL Server Performance Monitor to identify how these potential bottlenecks may affect SQL Server. Additionally, you can use this log to identify when an external process is heavily using the computer running SQL Server and negatively impacting SQL Server performance.

Before you start SQL Server Performance Monitor, make sure that the disk counters are on. To do so, run diskperf from a command prompt. If the disk counters are not on, run diskperf -y and then restart the computer.

When you create a SQL Server Performance Monitor log, collect the following information:
Paging file
Process
Processor
All SQL Server counters
Memory
Threads
Logical disk
Physical disk
System

Note The default interval of 15 seconds should allow enough time to monitor the server; however, for some timing issues, you may have to reduce the collection interval.
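As a rough cross-check while the Performance Monitor log is running, the SQL Server counters in that list can also be read from T-SQL on SQL Server 2005 and later through sys.dm_os_performance_counters. This is a point-in-time sketch only, not a substitute for the logged counters:

-- Point-in-time view of selected SQL Server performance counters (SQL Server 2005+)
SELECT object_name,
       counter_name,
       instance_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Page life expectancy',
                       'Buffer cache hit ratio',
                       'Batch Requests/sec',
                       'User Connections')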



Note For more information about monitoring performance in SQL Server 2005, see the "Monitoring and Tuning for Performance" topic in SQL Server 2005 Books Online.

sqldiag utility

The sqldiag utility is provided with SQL Server. It collects valuable information about the configuration of the computer running SQL Server, the operating system, and the information that is reported to the SQL Server error logs. For information about how to use the sqldiag utility, see the "sqldiag Utility" topic in SQL Server Books Online.




Note In SQL Server 2005, the SQLdiag utility has changed significantly. The command line arguments for this utility are not compatible with SQL Server 2000. This utility may be changed, and applications or scripts that rely on its command line arguments or behavior may not work correctly in future releases. For more information, see the "SQLdiag Utility" topic in SQL Server 2005 Books Online.

Microsoft Windows NT System and Application Event logs

You can use the Windows NT system and application event logs to identify issues that you cannot see in other data. These logs help provide a complete view of server activity and provide a more complete understanding of the environment.


Where to save these files

The following Microsoft File Exchange FTP server permits you to send and receive files to and from PSS engineers:
ftp://ftppss.microsoft.com



For more information, read the instructions that are provided on the Microsoft File Exchange FTP Web site.

Thursday, 11 September 2014

The Top 5 Most Common SQL Server Performance Problems


I'll make this a countdown from 5 to 1, with 1 being the most common technical problem we see.

NUMBER 5: TEMPDB PAGELATCH CONTENTION

I'm starting to see this more and more. It is usually with a system making heavy use of tempdb for some type of ETL process. This is especially common if it is an ongoing "real-time" style ETL process. I'm not saying data loads are the cause of this by the way; I'm just relaying my observations.

The symptoms of this can vary, but some things are always the same. High PAGELATCH waits in tempdb and poor performance recorded for processes using tempdb. I'll typically follow the waits to Top SQL in Performance Advisor, and see lots of queries that use temp tables listed in Top SQL. These queries usually run in milliseconds, and should never be counted among the "Top SQL" for the server. This can have people feeling like these queries are a large part of the problem, but that is not necessarily the case at all. The queries are the victims of the real problem.

Once I'm suspecting this to be the case, I will usually jump to the Disk Activity tab in Performance Advisor to see how tempdb is configured. Most times I actually see the same thing: A busy tempdb with a single data file defined. From here I'll usually recommend reconfiguring tempdb, and direct them to a resource like Jonathan's article for more information.
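To confirm that the PAGELATCH waits really are on tempdb pages, and to see how many data files tempdb currently has, a sketch like the following (SQL Server 2005 and later) is usually enough. The resource_description prefix "2:" means database_id 2, which is tempdb; the file path and size in the ALTER DATABASE example are placeholders, and how many files to add should follow the guidance in the article referenced above:

-- Are the PAGELATCH waits on tempdb pages?
SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%'

-- How many data files does tempdb have?
SELECT name, physical_name, size
FROM tempdb.sys.database_files
WHERE type_desc = 'ROWS'

-- Example of adding another equally sized data file (path and size are placeholders)
ALTER DATABASE tempdb
ADD FILE (NAME = tempdev2,
          FILENAME = 'T:\TempDB\tempdev2.ndf',
          SIZE = 4096MB)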

NUMBER 4: EXPECTING AUTO UPDATE STATISTICS TO KEEP STATISTICS UPDATED

The problem here is that the thresholds for triggering auto statistics updates end up being the same in most cases, even for a very large table. Without going into a very deep explanation, the threshold is ~20% of the rows in the table. So on a really big table it takes a lot of data change to trigger an update. Kevin Kline has a nice, easy to follow explanation of this here as well.

The reason this makes the list is that DBAs are often surprised to find that the auto update isn't taking care of things the way the name implies. Many DBAs also believe statistics should be handled by their maintenance job, but when we look at that maintenance, it is doing index reorgs most of the time, and a reorg won't update the statistics either (though a rebuild will). I also want to note here that if you are using the fragmentation manager feature in SQL Sentry 7.0 and higher, you can keep a running history of when your indexes were reorganized rather than rebuilt. This can help you decide whether the problem you're seeing could be related to the auto update not happening.

The lesson here is really to keep an eye on statistics, and make sure they're updated regularly, especially on large tables, which are becoming more and more common as time goes on. Another option here can be to use trace flag 2371 to actually change the formula used to trigger the update. 
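A quick way to see whether the auto update is actually keeping up is to check when each statistic was last updated, and then update the stale ones yourself. In the sketch below, dbo.BigTable is a placeholder name, and the trace flag line is the runtime form of what is more commonly set as a -T2371 startup parameter:

-- When were statistics last updated? (SQL Server 2005+)
SELECT OBJECT_NAME(s.object_id) AS table_name,
       s.name AS stats_name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE OBJECTPROPERTY(s.object_id, 'IsUserTable') = 1
ORDER BY last_updated

-- Update statistics on a specific table, or run sp_updatestats for the whole database
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN
EXEC sp_updatestats

-- Trace flag 2371 (SQL Server 2008 R2 SP1 and later) lowers the auto-update threshold for large tables
DBCC TRACEON (2371, -1)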

NUMBER 3: THE CXPACKET WAIT TYPE

I would say that this is hands down the single most common wait type I see on larger SQL Server systems when someone asks me to look into query performance with them.

There is a lot of information out there on how to deal with this, but sadly I still see a lot of people make the initial assumption that the problem should be solved by having either the query or the entire server set MAXDOP to 1. More often than not the problem can be handled by proper indexing or statistics maintenance. It could also be that the plan cached for this query is just not optimal, and you can mark it for recompile using sp_recompile, set recompile at the query level, or just evict the plan using DBCC FREEPROCCACHE with a plan handle. It is best to exhaust these options before deciding to change MAXDOP to 1 because you could be throwing away a lot of processing power without realizing it.
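If, after indexing and statistics are addressed, you still need to rein in a single problem query, these are the kinds of targeted options mentioned above, sketched here with placeholder object and column names (dbo.SomeBigTable, col1, col2), rather than changing the server-wide MAXDOP:

-- Force recompilation of every plan that references a specific table or procedure
EXEC sp_recompile 'dbo.SomeBigTable'

-- Or recompile just one statement each time it runs
SELECT col1, col2
FROM dbo.SomeBigTable
WHERE col1 = 42
OPTION (RECOMPILE)

-- Or cap parallelism for this query only, instead of the whole server
SELECT col1, col2
FROM dbo.SomeBigTable
WHERE col1 = 42
OPTION (MAXDOP 1)

-- Find the plan_handle for the statement, then evict just that plan
SELECT qs.plan_handle, st.text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE st.text LIKE '%SomeBigTable%'
-- DBCC FREEPROCCACHE (<plan_handle from the query above>)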


NUMBER 2: MISUNDERSTOOD "TIMEOUT EXPIRED PRIOR TO THE COMPLETION OF…"

This one is huge. Outside of some edge-case behavior, there are two basic types of timeouts you *might* deal with for SQL Server: connection timeouts and operation (or query) timeouts. In both cases these values are set by the client connecting to SQL Server. On the server side there is a remote query timeout setting, but that is the edge case I mentioned, and I will leave it for another discussion.

I'm going to focus on operation timeouts, since they are the most common. Operation timeout errors from various software tools might be the most misunderstood situation I come across. The cause really boils down to one simple thing: the client executing the command has set a maximum amount of time that it will wait for the command to complete. If this maximum is reached before the command completes, the command is aborted and the client raises an error.

Many times the timeout error will induce a panic mode, because the error can look a bit intimidating. The reality is, though, that this is not much different than hitting the stop button in SSMS because the query was taking too long. In fact, it will show up exactly the same in a profiler trace with Error = 2 (Aborted).

So, what does a timeout like this really tell us? It tells us that queries are taking longer than expected. We should go into "performance tuning" mode rather than "something is broken" mode. The error information from the client is really just some good information on where you might start to focus your tuning efforts.

If you receive timeout errors from the SQL Sentry monitoring service, and one of the servers you are monitoring is the source, this is not telling you that SQL Sentry is broken. This is SQL Sentry telling you that this server is experiencing performance issues. Again, it is time for "performance tuning" mode. These errors could be easily consumed internally, and retried later, but this would be doing our customers a huge disservice. We believe that you should know about *any* potential problem on your monitored server, even if it is SQL Sentry encountering the problem.

Incidentally, this is true for SQL Sentry just as it is for any other system that uses an RDBMS for a repository: your SQL Sentry database needs some TLC now and again. Without it you may indeed experience some timeouts from your SQL Sentry client. We spend a lot of time tuning our queries for performance before they ever make it out the door, but proper maintenance will ensure they continue to run as intended.

NUMBER 1: MEMORY PRESSURE

This is the big one. As soon as Kevin mentioned wanting this list it's the first thing that popped into my head. Not only because I see it so often, but also because it is so often mistaken for poor disk performance.

There are lots of caches in SQL Server, but the most well-known is the data cache (aka buffer pool). The easiest way to describe the data cache is that it is the data stored in memory, rather than persisted to disk. Being able to store lots of data in memory long term is desirable because working with data in memory is generally much quicker than having to perform physical I/Os.

I could turn this post into a very long discussion on memory pressure in SQL Server at this point, but I promise I will try to avoid that. There is already a ton of information available on this subject, and that is not really the intent of this post. What I will say is that, usually, memory pressure manifests as a few different symptoms. When viewed individually, some of these symptoms can lead you to incorrect, and sometimes costly, conclusions.

The two misleading symptoms are that you may start to see higher than normal latency across the disk subsystem, and you may start to see abnormally high waits related to disk activity. If you look at nothing but these two symptoms, you may come to the conclusion that you need to work on your disk system.

This is why being presented with all relevant metrics on one dashboard is so important. You have to look at the bigger picture, and having the memory-related data available along with the disk activity and waits helps to paint a clearer picture of what is really going on.

Typically what I'll see (along with the disk waits and disk latency) is a PLE (Page Life Expectancy) that is fairly low for this server. I describe it this way because what counts as good or bad for this value really "depends". The larger your buffer cache is, the higher your "critical" threshold for PLE will be: the more data there is to churn in and out of the buffer, the worse off you will be when the churn actually happens. Another consideration is NUMA. The way the PLE counter is calculated can make this value alone very misleading when multiple NUMA nodes are involved, as Paul Randal describes in his blog post "Page Life Expectancy isn't what you think...". Luckily, in SQL Sentry 7.0 and higher you can see PLE for the individual NUMA nodes in history mode, which makes this a bit less of a problem.
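If you want to spot-check PLE outside of SQL Sentry, both the server-wide value and the per-NUMA-node values are exposed through sys.dm_os_performance_counters. A minimal sketch:

-- Page Life Expectancy: overall (Buffer Manager) and per NUMA node (Buffer Node)
SELECT object_name,
       instance_name,   -- node number for the Buffer Node object
       cntr_value AS page_life_expectancy_seconds
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Page life expectancy'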

I'll usually also see consistently higher lazy writer activity and SQL Server page faults (SQL Server going to disk). Sometimes I'll see what I call buffer tearing: the data buffer size rises and falls frequently, creating a jagged (or torn) edge on the history chart in Performance Advisor. Finally, I may also see an abnormally large plan cache reducing the memory available for the data cache.

All of these things together spell memory pressure, and there are various ways to deal with them, but the important thing to note is that this is not a disk issue. That's not to say your disk system is necessarily wonderful either, but I wouldn't call up your SAN admin and order a bunch of new hardware based on this situation. Once you get the memory pressure under control, SQL Server will not need to go to disk as much, and the few symptoms related to disk may disappear entirely!

The moral here is really to always consider the full picture of performance, because looking at one thing out of context could severely limit your options for a solution.



HONORABLE MENTION: SQL SERVER AGENT HISTORY RETENTION SETTINGS SET TO UNLIMITED

We see this enough to include it in this list, and I think anyone that uses SQL Server Agent should be aware of it.

In SQL Server Agent Properties, under History, you can adjust retention settings.



For some reason, I've seen quite a few people set this to unlimited by unchecking both checkboxes. If you do this, and you use Agent jobs frequently, eventually you're going to run into problems with job history in msdb, because these tables aren't indexed very well. The settings shown above are generally fine for most cases, and if you're using SQL Sentry Event Manager, you're keeping this information in the SQL Sentry database anyway, so retaining it in msdb is just redundant.
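If history has already piled up in msdb because retention was left unlimited, you can trim it manually before tightening the settings; the cutoff date below is only an example:

-- How much job history has accumulated?
SELECT COUNT(*) AS history_rows
FROM msdb.dbo.sysjobhistory

-- Purge history older than a chosen cutoff date (example date only)
EXEC msdb.dbo.sp_purge_jobhistory @oldest_date = '2014-01-01'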
CONCLUSION

So there are my (current) top 5 most common SQL Server performance issues/topics. For #4 and #5, I actually had to run some numbers to find out what they were, but for the top three, I knew without having to consider it much at all.

Thanks for reading!