You can identify cluster performance issues when a cluster component goes into contention. The performance of volume workloads that use the component slows down and their response time for client requests increases, which triggers an incident in Performance Manager.
A component that is in contention cannot perform at an optimal level. Its performance declines, and other cluster components and workloads, called victims, might experience increased response times. To bring a component out of contention, you must reduce its workload or increase its ability to handle more work, so that performance can return to normal levels. Because Performance Manager collects and analyzes workload activity in five-minute intervals, it detects contention only when a cluster component is consistently overused. Transient spikes of overuse that last for only a short time within the five-minute interval are not detected.
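The effect of the five-minute interval can be illustrated with a minimal sketch. It assumes, purely for illustration, that overuse is judged from the average utilization across one interval. The 85% threshold, the per-second sampling, and the function names are hypothetical and are not taken from Performance Manager.

```python
from statistics import mean

INTERVAL_SECONDS = 300  # five-minute collection interval


def interval_average(samples):
    """Average the per-second utilization samples of one interval."""
    return mean(samples)


def is_overused(samples, threshold=0.85):
    """Hypothetical check: flag the interval only if its average
    utilization exceeds the (assumed) threshold."""
    return interval_average(samples) > threshold


# A 10-second spike at 100% utilization in an otherwise 40% busy interval.
spike = [1.0] * 10 + [0.4] * (INTERVAL_SECONDS - 10)

# Sustained 95% utilization for the whole interval.
sustained = [0.95] * INTERVAL_SECONDS

print(is_overused(spike))      # False: the spike averages out
print(is_overused(sustained))  # True: consistent overuse is flagged
```

In this sketch the short spike raises the interval average to only about 42%, so it goes unnoticed, whereas sustained overuse is flagged.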
For example, a storage aggregate might be under contention because one or more workloads on it are competing for their I/O requests to be fulfilled. Other workloads on the aggregate can be impacted, causing their performance to decrease. To reduce the amount of activity on the aggregate, you can take steps such as moving one or more workloads to a less busy aggregate, which lessens the overall workload demand on the current aggregate. For a QoS policy group, you can adjust the throughput limit, or move workloads to a different policy group, so that the workloads are no longer being throttled.
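The two policy group options can be sketched as follows. This is a simplified model, assuming that a policy group is throttled whenever the combined demand of its workloads exceeds the throughput limit. The workload names, IOPS figures, and the `is_throttled` helper are hypothetical.

```python
def is_throttled(demand_iops, limit_iops):
    """True when the combined demand of the group's workloads
    exceeds the group's throughput limit (simplified assumption)."""
    return sum(demand_iops.values()) > limit_iops


# Hypothetical workloads in one policy group and their demand in IOPS.
group = {"vol_a": 3000, "vol_b": 2500, "vol_c": 1500}
limit = 5000

print(is_throttled(group, limit))  # True: 7000 > 5000

# Option 1: raise the throughput limit of the policy group.
print(is_throttled(group, 8000))  # False: 7000 <= 8000

# Option 2: move one workload to a different policy group.
remaining = {name: iops for name, iops in group.items() if name != "vol_b"}
print(is_throttled(remaining, limit))  # False: 4500 <= 5000
```

Either change brings the combined demand back under the limit in this model. Which one fits depends on whether the limit or the workload placement is the real constraint.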
Performance Manager monitors the following cluster components to alert you when they are in contention:
- Network
- Represents the wait time of I/O requests by the iSCSI protocols or the Fibre Channel protocols (FCP) on the cluster. The wait time is the time spent waiting for iSCSI Ready to Transfer (R2T) or FCP Transfer Ready (XFER_RDY) transactions to complete before the cluster can respond to an I/O request. If the network component is in contention, it means high wait time at the block protocol layer is impacting the response time of one or more workloads.
- Network Processing
- Represents the software component in the cluster involved with I/O processing between the protocol layer and the cluster. The node handling network processing might have changed since the incident was detected. If the network processing component is in contention, it means high utilization at the network processing node is impacting the response time of one or more workloads.
- Policy Group
- Represents the Storage Quality of Service (QoS) policy group of which the workload is a member. If the policy group component is in contention, it means all workloads in the policy group are being throttled by the set throughput limit, which is impacting the response time of one or more of those workloads.
- Cluster Interconnect
- Represents the cables and adapters with which clustered nodes are physically connected. If the cluster interconnect component is in contention, it means high wait time for I/O requests at the cluster interconnect is impacting the response time of one or more workloads.
- Data Processing
- Represents the software component in the cluster involved with I/O processing between the cluster and the storage aggregate that contains the workload. The node handling data processing might have changed since the incident was detected. If the data processing component is in contention, it means high utilization at the data processing node is impacting the response time of one or more workloads.
- MetroCluster Resources
- Represents the MetroCluster resources, including NVRAM and interswitch links (ISLs), used to mirror data between clusters in a MetroCluster configuration. If the MetroCluster component is in contention, it means high write throughput from workloads on the local cluster or a link health issue is impacting the response time of one or more workloads on the local cluster. If the cluster is not in a MetroCluster configuration, this icon is not displayed.
- Aggregate or SSD Aggregate
- Represents the storage aggregate on which the workloads are running. If the aggregate component is in contention, it means high utilization on the aggregate is impacting the response time of one or more workloads. An "Aggregate" consists of all HDDs, or a mix of HDDs and SSDs (a Flash Pool aggregate). An "SSD Aggregate" consists of all SSDs (an all-flash aggregate).
Note: When viewing SSD Aggregates, bully and shark workloads are not currently displayed, and utilization charts are unavailable.