Monitor and optimize Fibre Channel SAN performance
Tune Up
In the past, spinning hard disks were often the bottleneck for fast data processing, but in the age of hybrid and all-flash storage systems, the bottlenecks are shifting to other locations on the storage area network (SAN). In this article, I look at where it makes sense to influence the data stream and how to detect potential bottlenecks at an early stage, identifying the performance parameters that matter most within a Fibre Channel SAN and showing approaches to optimization.
The Fibre Channel (FC) protocol is connectionless and transports data packets in buffer-to-buffer (B2B) mode. Two endpoints, such as a host bus adapter (HBA) and a switch port, negotiate the number of FC frames the receiver can hold in its input buffer; this number is granted to the sender as buffer credits, allowing it to transmit that many frames over the network without having to wait for each individual data packet to be acknowledged (Figure 1).
For each data packet sent, the buffer credit count is reduced by one, and for each data packet acknowledged by the other party, it increases by one. The remote station sends a receive ready (R_RDY) message to the sender as soon as the frames have been processed and new data can be accepted. If all buffer credits are used up and the sender does not receive this R_RDY message, no further data packets are transmitted until it arrives. Actual flow control of the data is handled by the higher-level SCSI protocol.
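To make the mechanism concrete, the following minimal Python sketch models the credit accounting on a single link. The class name, the negotiated credit count, and the simplified R_RDY handling are assumptions for illustration, not part of the actual FC link layer.

```python
# Minimal model of buffer-to-buffer credit accounting (illustration only).
class B2BLink:
    def __init__(self, negotiated_credits):
        self.credits = negotiated_credits   # granted by the receiver at login

    def send_frame(self):
        """The sender may transmit only while credits remain."""
        if self.credits == 0:
            return False                    # must wait for an R_RDY
        self.credits -= 1                   # one credit consumed per frame
        return True

    def receive_r_rdy(self):
        """The receiver has freed a buffer and returned an R_RDY."""
        self.credits += 1

link = B2BLink(negotiated_credits=8)
sent = sum(link.send_frame() for _ in range(10))
print(f"Frames sent before stalling: {sent}")       # 8 - then the sender waits
link.receive_r_rdy()
print("After one R_RDY, one more frame may go:", link.send_frame())
```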
Suppose a server writes data over a Fibre Channel SAN to a remote storage system; the FC frames are forwarded in B2B mode at several points along the way – whenever an HBA or a storage port communicates with a switch port, or two switches exchange data with each other over one or more parallel Inter-Switch Link (ISL) connections. With this FC transport layer method – service class 3 (connectionless without acknowledgement), optimized for mass storage data – many FC devices can communicate in parallel with high bandwidth. However, this type of communication also has weaknesses, which quickly become apparent in certain configurations.
Backlog from Delayed R_RDY Messages
One example of this type of backlog is an HBA or storage port that, because of a technical defect or driver problem, does not return R_RDY messages to the sender, or only returns them after a delay. In turn, the transmission of new frames is delayed. Incoming data is then buffered and consumes the available buffer credits. The backlog gradually spreads farther back and uses up the buffer credits of the other FC ports along the route.
Shared connections are hit especially hard: All SAN devices that communicate over the same ISL connection are negatively affected because no buffer credits are available to them during this period. A single slow-drain device can thus cause a massive drop in the performance of many SAN devices (fabric congestion). Although most FC switch manufacturers have now developed countermeasures against such fabric congestion, they only take effect once the problem has already occurred and are only available for the newer generations of SAN components.
To detect fabric congestion at an early stage, you need to monitor at least the ISL ports on the SAN for such situations. One indicator of this kind of bottleneck is an increase in the zero buffer credit values at the ISL ports. These values indicate how often a port had to wait 2.5µs for an R_RDY message to arrive before further frames could be sent. If this counter grows to a value in the millions within a few minutes, caution is advised. In such critical cases, the counters for "link resets" and "C3 timeouts" at the affected ISL ports usually grow as well.
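A monitoring check along these lines might compare two snapshots of the per-port zero buffer credit counters and flag rapid growth. The following Python sketch assumes you can pull the counters from your switches (via CLI, SNMP, or REST, depending on the vendor); the threshold, port names, and counter values are made up for illustration.

```python
# Compare two snapshots of zero-buffer-credit counters and flag ports whose
# counter grew by millions within the sampling interval.
GROWTH_LIMIT = 1_000_000   # increase per interval considered critical (assumption)

def check_zero_credit_growth(previous, current, interval_s):
    """Return the ports whose zero-credit counter grew suspiciously fast."""
    suspects = []
    for port, value in current.items():
        delta = value - previous.get(port, value)
        if delta > GROWTH_LIMIT:
            suspects.append((port, delta))
            print(f"WARNING: {port} logged {delta:,} zero-credit events in "
                  f"{interval_s}s - check link resets and C3 timeouts")
    return suspects

# Example with two snapshots taken five minutes apart (values invented);
# in practice, you would pull them from the switch via CLI, SNMP, or REST.
before = {"isl0": 12_000_000, "isl1": 4_500_000}
after  = {"isl0": 19_800_000, "isl1": 4_500_900}
check_zero_credit_growth(before, after, interval_s=300)
```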
Data Rate Mismatches
A similar effect can occur if a large volume of data is transferred at different speeds between endpoints on the SAN. For example, if the HBA operates at a bandwidth of 8Gbps while the front-end port on the storage system operates at 16Gbps, the storage port can process the data almost twice as fast as the HBA. Conversely, at full transfer rate, the storage system returns twice the volume of data than the HBA can process in the same time.
Buffering the received frames nibbles away at the buffer credits there, too, which can cause a backlog and fabric congestion given a continuously high data transfer volume. The situation becomes even more drastic with high data volumes when 4 and 32Gbps ports are combined. Such effects typically occur at high data rates on the ports of the nodes with the lowest bandwidth in the data stream.
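A back-of-the-envelope calculation shows how quickly such a mismatch exhausts the buffer credits on the slower path. The credit count, frame size, and effective throughput figures in this Python sketch are assumptions for illustration; real values depend on your switches, HBAs, and frame mix.

```python
# How fast does a speed mismatch eat up the buffer credits? (rough estimate)
CREDITS          = 32      # B2B credits granted on the congested link (assumed)
FRAME_BYTES      = 2148    # roughly a full-size FC frame including headers
FAST_PORT_MB_S   = 1600    # ~16Gbps storage port, effective MB/s (assumed)
SLOW_PORT_MB_S   = 800     # ~8Gbps HBA, effective MB/s (assumed)

surplus_mb_s = FAST_PORT_MB_S - SLOW_PORT_MB_S   # data arrives faster than it drains
buffer_bytes = CREDITS * FRAME_BYTES
fill_time_us = buffer_bytes / (surplus_mb_s * 1e6) * 1e6

print(f"Buffered capacity: {buffer_bytes / 1024:.1f} KiB")
print(f"Credits exhausted after roughly {fill_time_us:.0f} microseconds")
```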
Additionally, the fan-in ratio of servers to the storage port can be too high (i.e., more data arrives from the servers than the storage port can process). My recommendation is therefore to set the HBA and storage port to a uniform speed and, depending on the data transfer rates, to maintain a moderate fan-in ratio between servers and the storage port where possible.
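To get a feel for the fan-in ratio, you can compare the aggregate nominal HBA bandwidth of the servers mapped to a front-end port with the bandwidth of the port itself. The server names, bandwidth figures, and the acceptable ratio in the following sketch are assumptions – derive your own limit from the transfer rates you actually measure.

```python
# Rough fan-in check for one storage front-end port.
STORAGE_PORT_GBPS = 16
MAX_FAN_IN_RATIO  = 6      # assumed "moderate" oversubscription limit

# Nominal HBA bandwidth of the servers mapped to this port (example values).
hba_gbps_per_server = {"srv01": 8, "srv02": 8, "srv03": 16, "srv04": 8, "srv05": 16}

aggregate = sum(hba_gbps_per_server.values())
ratio = aggregate / STORAGE_PORT_GBPS
print(f"Aggregate HBA bandwidth: {aggregate}Gbps -> fan-in ratio {ratio:.1f}:1")
if ratio > MAX_FAN_IN_RATIO:
    print("Fan-in too high: spread the servers over more front-end ports")
else:
    print("Fan-in within the assumed limit")
```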
To reduce data traffic over the ISLs fundamentally, configure your servers for cross-site mirroring (e.g., with the Logical Volume Manager) such that the hosts only read locally and only access both storage systems when writing. With a high read rate, this approach immensely reduces ISL data traffic and thus the risk of potential bottlenecks.
Overcrowded Queue Slows SAN
The SCSI protocol also has ways to accelerate the data flow. The Command Queuing and I/O Queuing methods supported by SCSI-3 achieve a significant increase in performance. For example, a server connected to the SAN can send several SCSI commands in parallel to a logical unit number (LUN) of a storage system. When the commands arrive, they are placed in a queue until it is their turn to be processed. Especially for random I/O operations, this arrangement offers a significant performance gain.
The number of I/O operations that can be buffered in this queue is known as the queue depth. Important values include the maximum queue depth per LUN and per front-end port of a storage array. These values are usually fixed in the storage system and cannot be changed. On the server side, however, you can specify the maximum queue depth in the HBA settings or in its driver. Make sure that the sum of the queue depths of all LUNs on a front-end port does not exceed its maximum permitted queue depth. If, for example, 100 LUNs are mapped to an array port and addressed by their servers with a queue depth of 16, the maximum queue depth value at the array port must be greater than 1,600. If, on the other hand, the maximum value of a port is only 1,024, the connected servers can only work with a queue depth of 10 for these LUNs. It makes sense to ask the vendor about the limits and optimum settings for the queue depth.
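This arithmetic is easy to script so you can recheck it whenever the mapping changes. The following Python sketch simply encodes the calculation from the example above; the function names are illustrative.

```python
# Queue depth sanity check: the summed LUN queue depths behind a front-end
# port must not exceed the port's maximum queue depth.
def max_lun_queue_depth(port_max_qd, lun_count):
    """Largest per-LUN queue depth that keeps the port within its limit."""
    return port_max_qd // lun_count

def port_is_overcommitted(port_max_qd, lun_count, lun_qd):
    """True if the summed LUN queue depths exceed the port maximum."""
    return lun_count * lun_qd > port_max_qd

# 100 LUNs addressed with a queue depth of 16 need a port maximum above 1,600;
# a port that only allows 1,024 outstanding commands limits each LUN to 10.
print(port_is_overcommitted(port_max_qd=1024, lun_count=100, lun_qd=16))  # True
print(max_lun_queue_depth(port_max_qd=1024, lun_count=100))               # 10
```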
If a front-end port is overloaded because of incorrect settings and too many parallel I/O operations, and all queues are used up, the storage array sends a Queue_Full or Device_Busy message back to the connected servers, which triggers a complex recovery mechanism that usually affects all servers connected to this front-end port. A balanced queue depth configuration, on the other hand, can often squeeze that extra bit of performance out of servers and storage. If the mapped servers or the number of visible LUNs change significantly, you need to update your calculations to prevent gradual overloading.