Performance troubleshooting on Microsoft SQL Server
Gridlock
If Microsoft SQL Server in your IT center supports infrastructure services such as virtualization and IP address management (IPAM), as well as databases for applications and business processes, a performance bottleneck in the SQL Server area is almost always a critical issue. For one thing, delays to data queries can be very specifically translated into dollars and cents from the management perspective. Moreover, all areas of IT – server, storage, network, and applications themselves – are involved in accessing SQL Server. As the person responsible for the affected SQL Server, you would thus do well to be proactive from the outset by identifying any problems as quickly as possible and helping others understand any difficulties. However, before you pounce on SQL Server, make sure the problem lies within your sphere of influence.
A typical SQL-based application has several layers – the application itself, with the application layer and the client front end – each of which can be the source of poor performance. A further sticking point could be the network between client and server. If the SQL queries provide a large volume of data, transmitting the data to the client could be a cause of the perceived delay – even if SQL Server has processed the results quickly.
The database also can be the cause of poor performance. If you have a table with many records, incorrect or missing indexing cannot possibly be compensated for by the available hardware resources and will affect the performance of the selection queries. SQL Server makes very complex use of the hardware resources provided to it, and configuration errors could easily occur. Also, a typical SQL Server is likely to provide databases for numerous applications in several instances. What is rarely tested before rolling out a new application or a major upgrade of an existing application is whether the use of resources still leaves enough room to maneuver.
For further diagnostics, it is important to find out where the performance bottleneck originated. Thus, you should first ask the appropriate department where the information causing the suspected performance bottleneck comes from. If you only have one user complaining about a "slow database server," it is quite possible that the application they use has a performance problem that is not related to SQL Server, or that the application is simply badly programmed.
Tools for Troubleshooting
The most important tools to use to diagnose problems in the SQL Server environment are provided by SQL Server and Windows themselves. For example, SQL Management Studio provides access to the server's system tables and logs, as well as diagnostic tools such as SQL Activity Monitor, SQL Server Profiler, and Database Tuning Advisor. PerfMon provides information about the performance behavior of various SQL Server and Windows components. If you need to generate data collector sets and evaluate PerfMon results, the free PAL (Performance Analysis for Logs) proves to be an excellent choice [1]. Windows Task Manager also provides a valuable service, although the Process Explorer by Microsoft Sysinternals [2] is considerably more powerful.
If a secondary, network-specific diagnosis is required, Wireshark or Microsoft Message Analyzer (formerly NetMon) jump into the breach. DiskPart and FSUtil give you the lowdown on some typical misconfigurations in the disk subsystem. If the affected SQL Servers are virtualized, you need a good basic understanding of how resources are allocated on the hypervisor. In this case, you also need tools to measure the performance data from there.
Depending on the direction in which your troubleshooting takes you, it might also be necessary to visualize the results (e.g., to present to other teams or the management). Spreadsheet programs such as Microsoft Excel are well suited to visualizing tabular data, but you can also import the data into SQL Server and access Reporting Services or even Microsoft Power BI.
If the results of your diagnosis force you and your database developers to optimize the database schema and queries, I recommend the popular Unified Plan Explorer [3] tool, which the creator, SQL Sentry, now offers free of charge.
Determine the Baseline, Identify Bottlenecks
To begin, it is a good idea to know the performance behavior of your SQL Server. Record a baseline for the key resources (CPU, RAM, network, hard disk) – that is, a performance profile of the server in a state considered to be normal in ongoing operations. You can use PerfMon for this step, or you could use your monitoring system, assuming that it is designed for long-term storage of performance indicators. If you are then confronted with questions about possible changes in SQL Server's performance in the future, you can quickly compare the new data with the historic data.
You can quickly examine the behavior of an interactive application with the help of SQL Server Profiler. To begin, make sure that the time on the user's workstation and on SQL Server are in sync to the extent possible. Launch Profiler and select the events to be tracked so that TSQL events are recorded. If you know that the application in question tends to work with stored procedures rather than SQL instructions, add these events, too. Filter the trace when recording, preferably by user, workstation, database, or application if these parameters are unambiguously known.
Let the user perform the action that seems slow to them, and make a precise note of the timing from the user's perspective. In the Profiler trace, you will be able to identify clearly when the user input raises a query on SQL Server and how long it takes from the perspective of SQL Server. If a considerable amount of time occurs between user input and the SQL query (or between the end of the SQL query and the output in the application), the problem, at least partly, lies with the application, which needs to be addressed by its developers. The same thing applies if you notice in Profiler that the submitted queries cause errors on SQL Server.
One more question needs to be clarified as soon as possible: Does the observed performance bottleneck only occur when reading from the database? Only when writing? Or for any kind of access? This information will be enormously helpful in the further diagnosis.
Slow SQL Server
If the bottleneck is SQL Server, then you need to draw up a plan for further diagnosis. SQL Server uses four types of resources: CPU, RAM, storage, and network. The following sequence is recommended for performance troubleshooting:
- Network: Problems with the network can also cause bottlenecks in the CPU or RAM.
- CPU.
- RAM: Not enough RAM will almost always lead to increased disk I/O levels.
- Disk: If you have ruled out the other factors as the cause, take a closer look at the disks.
A bottleneck in one resource does not automatically mean that the problem can be resolved by providing more of that resource.
Before checking anything else, make sure SQL Server has no obvious problems or misconfigurations. The following list can serve as a starting point, and you can add your own known incidents:
- Status of hardware, virtualization, and operating system (OS): Do suspicious entries exist in the Event Viewer?
- For virtualized servers, is the CPU, RAM, or both oversubscribed on the hypervisor?
- Configuration and status of network connections: Check the teaming, speed, and flow control.
- Has a tempdb database filled the system logs, disk logs, or hard disk?
- Have you defined and enabled exclusions from the antivirus scan [4]?
- Is the maximum available RAM for the SQL instance (or the sum total of all instances in case of several instances) configured such that the OS still has enough RAM left? For Windows Server 2012, the OS should have at least 2GB of RAM available, or 4GB on a physical system with a large amount of RAM.
- In Hyper-V virtualization does the dynamic memory configuration comply with best practices [5]?
- Have you taken into account and checked the alignment of disk volumes [6] [7]? Are the volumes formatted with a 64KB file allocation unit?
- Does the distribution of the database files (on SQL Server 2016 and for tempdb instances) correspond to the performance profile of the disk volumes?
- Are full backups and database or index reorganizations running at peak times?
- If the default CPU configuration of SQL Server has been changed, does the currently active configuration still match the number of instances and their use?
If you have eliminated all possible sources of error, but the observed problem still persists, you need to perform more in-depth performance troubleshooting.
Buy this article as PDF
(incl. VAT)
Buy ADMIN Magazine
Subscribe to our ADMIN Newsletters
Subscribe to our Linux Newsletters
Find Linux and Open Source Jobs
Most Popular
Support Our Work
ADMIN content is made possible with support from readers like you. Please consider contributing when you've found an article to be beneficial.