« Previous 1 2
Performance troubleshooting on Microsoft SQL Server
Gridlock
Evaluating Performance Details
You should gain a quick overview of the server's resource consumption and compare it with the baseline, assuming you have one. To do so, launch PerfMon and add the following performance counters to the real-time display (Figure 1):
- Memory Available MBytes and Page Faults/sec
- Network Adapter Bytes Sent/sec and Bytes Received/sec
- Physical Disk: Avg. Disk Queue Length for each volume
- Processor Information: % User time and % Processor Time
- System: Processor Queue Length
- TCPv4: Segments/sec and Segments Retransmitted/sec
- TCPv6: The same as TCPv4 if TCPv6 is used in production. Now is the time to find out which version is used – some systems automatically agree to use TCPv6 if it is technically possible.
Even better, define a data collector with these values to log data over a slightly longer period of time. Watch out for the following phenomena:
- Network: Bytes Sent/sec and Bytes Received/sec approaching the available bandwidth. Keep in mind that the bandwidth is specified in bits per second. Thus, for a Gigabit Ethernet interface, the critical value is 131,072 bytes per second. If the measured values are permanently close to the upper limit, then the diagnosis is available in the network bandwidth, but it is also critical to look at the size of the returned results.
- Network/TCP: Retransmitted segments make up too large a proportion of the total transferred segments on your network. Typical values fluctuate between 0.5 and 2 percent. Ask your network specialists to be sure.
- CPU: The utilization of CPU on a single physical server that approaches the 100 percent mark is not fundamentally worrying, but if you see the CPU queue length permanently exceeding the limit of 1, you should be worried. In virtualized environments, this value can also be distorted by CPU scheduling on the hypervisor.
- RAM: A system with no free RAM is not good. If the Available MBytes value is clearly and permanently less than 256, action is required. You can also check the rate of Page Faults/sec , which then shoots up. Because this metric has no generally accepted thresholds, you should compare it with your own baseline, if possible.
- Physical disk queue length: If this classic metric is permanently greater than 1, you definitely have a problem, even though you cannot be sure at this juncture where the currently observed performance bottleneck originates.
Observe performance for a while and remember the troubleshooting sequence "network, CPU, RAM, disk" to help you to exclude normal parameters at an early stage and concentrate on what is essential.
Conclusions
Armed with the right tools, you can identify and resolve performance issues accurately with SQL Server. In the second article of this series, I will take a look at the network, RAM usage, and the hard disk drives. Finally, I will look into visualizing and evaluating the results.
Infos
- PAL on GitHub: https://github.com/clinthuffman/PAL
- Process Explorer: https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer
- Plan Explorer: https://www.sentryone.com/plan-explorer
- Antivirus exclusions: https://blog.sqlauthority.com/2015/05/19/sql-server-antivirus-exclusions-best-practices-with-sql-server/
- Dynamic memory: https://msdn.microsoft.com/en-us/library/hh372970.aspx
- MSDN: Partition alignment: https://blogs.msdn.microsoft.com/jimmymay/2014/03/14/disk-partition-alignment-it-still-matters-dpa-for-windows-server-2012-sql-server-2012-and-sql-server-2014/
- TechNet: Partition alignment: https://technet.microsoft.com/en-us/library/dd758814.aspx
« Previous 1 2
Buy this article as PDF
(incl. VAT)