Manual Pages



NAME

na_fcstat - Commands for Fibre Channel stats functions

SYNOPSIS

fcstat link_stats [ channel_name ]

fcstat fcal_stats [ channel_name ]

fcstat device_map [ channel_name ]

DESCRIPTION

Use the fcstat command to show (a) link statistics maintained for all drives on a Fibre Channel loop, (b) internal statistics kept by the Fibre Channel driver, and (c) a tenancy and relative physical position map of drives on a Fibre Channel loop.

SUB-COMMANDS: link_stats

All disk drives maintain counts of useful link events. The link_stats sub-command displays these counts, which can help isolate problems on the loop. Refer to the event descriptions and the example below for more information.

link failure count
The drive notes a link failure event if it cannot synchronize its receiver PLL for a time greater than R_T_TOV, usually on the order of milliseconds. A link failure is therefore a loss of sync that lasted long enough for the drive to initiate a Loop Initialization Primitive (LIP). Refer to loss of sync count below.

underrun count
Underruns are detected by the Host Adapter (HA) during a read request. The disk sends data to the HA through the loop; if any frames are corrupted in transit, the HA discards them and therefore receives less data than expected. The driver reports the underrun condition and retries the read. The cause of the underrun lies downstream of the disk being read and upstream of the HA.

loss of sync count
The drive notes a loss of sync event if it loses PLL synchronization for a time period less than R_T_TOV and thereafter manages to resynchronize. The cause generally lies upstream of the reporting disk, anywhere up to and including the previous active component in the loop. Disks on shelf borders tend to see higher loss of sync counts than disks that are not on a border.

invalid CRC count
Every frame received by a drive contains a checksum that covers all data in the frame. If the checksum does not match when the frame is received, the invalid CRC counter is incremented and the frame is dropped. Generally, the disk that reports the CRC error is not at fault; rather, a component between the Host Adapter (which originated the write request) and the reporting drive has corrupted the frame.

frame in count / frame out count
These counts represent the total number of frames received and transmitted by a device on the loop. The number of frames received by the Host Adapter is equal to the sum of all of the frames transmitted from all of the disks. Similarly, the number of frames transmitted by the Host Adapter is equal to the sum of all frames received by all of the disks.

The occurrence of any of the error events may result in loop disruption. A link failure is considered the most serious since it may indicate a transmitter problem that is affecting loop signal integrity upstream of the drive. These events will typically result in frames being dropped and may result in data underruns or SCSI command timeouts.

Note that loop disruptions of this type, even though potentially resulting in data underruns and/or SCSI command timeouts, will not result in data corruption. The host adapter driver will detect such events and will retry the associated commands. The worst-case effect is a negligible drop in performance.

All drive counters are persistent across node reboots and drive resets and can only be cleared by power-cycling the drives. Host adapter counters, for example underruns, are reset with each reboot.
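
Because drive counters persist until the drive is power-cycled, the growth of a counter between two readings is often more informative than its absolute value. The following is a minimal Python sketch of that comparison. It assumes the link_stats output has already been collected into dictionaries; the field names and the "after" values are hypothetical and are not produced by fcstat itself.

```python
# Sketch only: compare two hypothetical snapshots of link_stats counters.
# The dictionary layout and field names are assumptions for illustration.
ERROR_FIELDS = ("link_failure", "underrun", "loss_of_sync", "invalid_crc")


def counter_deltas(before, after):
    """Return {loop_id: {field: increase}} for error counters that grew."""
    deltas = {}
    for loop_id, new in after.items():
        old = before.get(loop_id, {})
        grown = {}
        for field in ERROR_FIELDS:
            diff = new.get(field, 0) - old.get(field, 0)
            # A negative difference means the counter was reset in between
            # (for example, host adapter counters after a reboot); skip it.
            if diff > 0:
                grown[field] = diff
        if grown:
            deltas[loop_id] = grown
    return deltas


# "before" uses values from the example in this page; "after" is invented.
before = {
    "4.43": {"loss_of_sync": 7, "invalid_crc": 11},
    "4.44": {"loss_of_sync": 16, "invalid_crc": 0},
}
after = {
    "4.43": {"loss_of_sync": 7, "invalid_crc": 15},
    "4.44": {"loss_of_sync": 16, "invalid_crc": 0},
}

print(counter_deltas(before, after))  # {'4.43': {'invalid_crc': 4}}
```

Only counters that increased are reported, so a drive with a large but static historical count does not draw attention.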

SUB-COMMANDS: fcal_stats

The Fibre Channel host adapter driver maintains statistics on various error conditions, exception conditions, and handler code paths executed. In general, interpreting the fields requires an understanding of the internal workings of the driver. However, some of the counts kept on a per-drive basis (for example, device_underrun_cnt, device_overrun_cnt, and device_timeout_cnt) may help identify potentially problematic drives.

Counts are not persistent across node reboots.

SUB-COMMANDS: device_map

A Fibre Channel loop, as the name implies, is a logically closed loop from a frame transmission perspective. Consequently, signal integrity problems caused by a component upstream will be seen as problem symptoms by components downstream.

The relative physical position of drives on a loop is not necessarily directly related to their loop IDs (which are in turn determined by the drive shelf IDs). The device_map sub-command is therefore helpful in determining relative physical position on the loop.

Two pieces of information are displayed: (a) the relative physical position on the loop, as if the loop were one flat space, and (b) the mapping of devices to shelves, to aid in quick correlation of disk ID with shelf tenancy.

EXAMPLE OF USE

Diagnosing a possible problem using fcstat

Suppose a running node shows symptoms of loop signal integrity problems. For example, the syslog shows SCSI commands being aborted (and retried) due to frame parity/CRC errors.

To isolate the faulty component on this loop, we collect the output of link_stats and device_map.

toaster> fcstat link_stats 4

  Loop        Link  Underrun   Loss of   Invalid    Frame In   Frame Out
   ID      Failure     count      sync       CRC       count       count
             count               count     count
  4.29          0         0       180         0         787        2277
  4.28          0         0        26         0         787        2277
  4.27          0         0         3         0         787        2277
  4.26          0         0        13         0         788        2274
  4.25          0         0        27         0         779        2269
  4.24          0         0         2         0         787        2277
  4.23          0         0        11         0         786        2274
  4.22          0         0        83         0         786        2274
  4.21          0         0         3         0         786        2274
  4.20          0         0        11         0         786        2274
  4.19          0         0        14         0         779        2277
  4.18          0         0        26         0         786        2274
  4.17          0         0        10         0         787        2274
  4.16          0         0        90         0         779        2269
  4.45          0         0        12         0      183015      179886
  4.44          0         0        16         0     1830107    17990797
  4.43          0         0         7        11     1829974    17988806
  4.42          0         0        13        33     1968944    18123526
  4.41          0         0        14        23     1843636    17989836
  4.40          0         0        13        11     1828782    17990036
  4.39          0         0        14       138     4740596    18459648
  4.38          0         0        11        27     1832428    17133866
  4.37          0         0        43        22     1839572    17994200
  4.36          0         0        13       130     4740446    18468932
  4.35          0         0        11        23     1844301    17994200
  4.34          0         0        14        25     1832428    17133866
  4.33          0         0        26        29     1839572    17894220
  4.32          0         0       110        31     1740446    18268912
  4.61          0         0        50        23     1844301    17994200
  4.60          0         0        12        21     1830150    18188148
  4.59          0         0        16        19     1830107    17990997
  4.58          0         0         7        27     1829974    17988904
  4.57          0         0        13        25     1968944    18123526
  4.50          0         0        14        19     1843636    17889830
  4.49          0         0        13        22     1828782    18090042
  4.48          0         0       114       130     4740596    18459648
  4.ha          0         0         1         0   396255820    51468458

toaster> fcstat device_map 4
  Loop Map for channel 4:

  Translated Map: Port Count 37
                    7  29  28  27  26  25  24  23  22  21  20  19  18  17  16  45
                   44  43  42  41  40  39  38  37  36  35  34  33  32  61  60  59
                   58  57  50  49  48
  Shelf mapping:
                  Shelf 1:  29  28  27  26  25  24  23  22  21  20  19  18  17  16
                  Shelf 2:  45  44  43  42  41  40  39  38  37  36  35  34  33  32
                  Shelf 3:  61  60  59  58  57  XXX XXX XXX XXX XXX XXX 50  49  48

From the output of device_map we see the following:

Drive 29 is the first component on the loop immediately downstream from the host adapter. (Note that the host adapter port (7) will always appear first on the position map.)

Shelf 3 has 6 slots that do not have any disks; these are represented by `XXX'. A slot shown as `BYP' is bypassed by an embedded switched hub (ESH).

Shelf 1 is connected to shelf 2 between drives 16 and 45. Shelf 2 is connected to shelf 3 between drives 32 and 61.
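
The observations above can also be derived mechanically. The following Python sketch parses text laid out like the device_map output shown earlier and reports the empty slots and the shelf connection points. The parser is an illustration written against this one sample; the function names are hypothetical and other output layouts may need adjustments.

```python
# Sketch only: derive shelf boundaries from device_map-style text.
SAMPLE = """\
  Loop Map for channel 4:

  Translated Map: Port Count 37
                    7  29  28  27  26  25  24  23  22  21  20  19  18  17  16  45
                   44  43  42  41  40  39  38  37  36  35  34  33  32  61  60  59
                   58  57  50  49  48
  Shelf mapping:
                  Shelf 1:  29  28  27  26  25  24  23  22  21  20  19  18  17  16
                  Shelf 2:  45  44  43  42  41  40  39  38  37  36  35  34  33  32
                  Shelf 3:  61  60  59  58  57  XXX XXX XXX XXX XXX XXX 50  49  48
"""


def parse_device_map(text):
    """Return (loop_order, shelves) from one channel's device_map text."""
    loop_order, shelves = [], {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Translated Map"):
            section = "map"
        elif stripped.startswith("Shelf mapping"):
            section = "shelves"
        elif section == "shelves" and stripped.startswith("Shelf "):
            label, slots = stripped.split(":", 1)
            shelves[int(label.split()[1])] = slots.split()
        elif section == "map" and stripped:
            loop_order.extend(int(tok) for tok in stripped.split())
    return loop_order, shelves


def shelf_boundaries(loop_order, shelves):
    """Return (last_drive, first_drive) pairs where the loop changes shelf."""
    shelf_of = {int(slot): num
                for num, slots in shelves.items()
                for slot in slots if slot.isdigit()}
    boundaries, prev = [], None
    for dev in loop_order:
        if dev not in shelf_of:
            continue  # the host adapter port is not in any shelf
        if prev is not None and shelf_of[prev] != shelf_of[dev]:
            boundaries.append((prev, dev))
        prev = dev
    return boundaries


loop_order, shelves = parse_device_map(SAMPLE)
empty = {num: slots.count("XXX") for num, slots in shelves.items()}

print(empty)                                  # {1: 0, 2: 0, 3: 6}
print(shelf_boundaries(loop_order, shelves))  # [(16, 45), (32, 61)]
```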

From the output of link_stats we can see the following:

Drive 4.29, which is connected directly to the host adapter, has a higher loss of sync count (180) than most other drives. Since every node reboot involves re-initialization of the host adapters, we expect the first drive on the loop to see a higher loss of sync count.

Disks 4.16 through 4.29 are probably spares as they have relatively small frame counts.

CRC errors are first reported by drive 4.43. Assuming that there is only one cause of all the CRC errors, then the failing component is located between the Host Adapter and drive 4.43.
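
The search for the first drive reporting CRC errors can be sketched in Python as follows. The sketch assumes the row layout of the link_stats output shown earlier and takes the loop order from the translated map. Both inputs below are excerpts (intermediate drives are omitted for brevity), and the function names are hypothetical.

```python
# Sketch only: walk the loop in physical order and find the first drive
# with a non-zero invalid CRC count.
import re

# Matches rows such as "  4.43   0   0   7   11   1829974   17988806".
ROW = re.compile(r"^\s*(\w+)\.(\w+)((?:\s+\d+){6})\s*$")
FIELDS = ("link_failure", "underrun", "loss_of_sync",
          "invalid_crc", "frame_in", "frame_out")

# Excerpt of the link_stats output from the example.
LINK_STATS = """\
  4.29          0         0       180         0         787        2277
  4.16          0         0        90         0         779        2269
  4.45          0         0        12         0      183015      179886
  4.44          0         0        16         0     1830107    17990797
  4.43          0         0         7        11     1829974    17988806
  4.42          0         0        13        33     1968944    18123526
  4.ha          0         0         1         0   396255820    51468458
"""

# Excerpt of the translated map from device_map, in loop order.
LOOP_ORDER = [7, 29, 16, 45, 44, 43, 42]


def parse_link_stats(text):
    """Return {device: {field: count}} keyed by the part after the dot."""
    stats = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counts = [int(tok) for tok in match.group(3).split()]
            stats[match.group(2)] = dict(zip(FIELDS, counts))
    return stats


def first_crc_reporter(loop_order, stats):
    """Return the first device in loop order with invalid_crc > 0."""
    for dev in loop_order:
        row = stats.get(str(dev))
        if row and row["invalid_crc"] > 0:
            return dev
    return None


stats = parse_link_stats(LINK_STATS)
print(first_crc_reporter(LOOP_ORDER, stats))  # 43
```

The host adapter row is keyed as `ha' in the output, so loop ID 7 has no matching row and is skipped by the search.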

Since drive 4.43 is in shelf 2, it is possible that the errors are being caused by faulty components connecting the shelves. To isolate the problem, we want to see whether it is related to any of the shelf connection points. We can do this by running a disk write test on the first shelf of disks using the following command. (This command is only available in maintenance mode, so it will be necessary to reboot.)

*> disktest -W -s 4:1

       where:
       -W       Write workload, since CRC errors only occur on writes
       -s 4:1   Test only shelf 1 on adapter 4

If errors are seen when testing shelf 1, then it is likely that the faulty component is either the cable or the I/O module between the host adapter and the first drive. If no errors are seen when testing shelf 1, then the test should be run on shelf 2. If errors are seen when testing shelf 2, the faulty component could be the connection between shelves 1 and 2. A plan of action would involve (a) replacing the cables between shelves 1 and 2, or between the HA and shelf 1, and (b) replacing the I/O modules at the faulty connection point.
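
The decision steps in the paragraph above can be written down as a small Python function. This is only a restatement of that reasoning under stated assumptions: the function name and its boolean inputs are hypothetical, and the final case (no errors on either shelf) is not covered by this page, so the sketch simply reports that no conclusion is drawn.

```python
# Sketch only: encode the shelf-by-shelf isolation steps described above.
def next_step(shelf1_errors, shelf2_errors=None):
    """Map disktest results to the suspect named in the text.

    shelf1_errors: True if the write test on shelf 1 showed errors.
    shelf2_errors: True/False once shelf 2 has been tested, None before.
    """
    if shelf1_errors:
        return "suspect cable or I/O module between HA and first drive"
    if shelf2_errors is None:
        return "run the write test on shelf 2"
    if shelf2_errors:
        return "suspect connection between shelf 1 and shelf 2"
    # Not described in this page; labeled as an assumption.
    return "no conclusion from shelves 1 and 2"


print(next_step(shelf1_errors=False))
print(next_step(shelf1_errors=False, shelf2_errors=True))
```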

Example of a link status for Shared Storage configurations

The following link status shows a Shared Storage configuration:

ferris> fcstat link_stats

  Targets on channel 4a:
  Loop                Link  Underrun   Loss of   Invalid    Frame In   Frame Out
   ID              Failure     count      sync       CRC       count       count
                     count               count     count
  4a.80                  1         0         9         0           0           0
  4a.81                  1         0         3         0           0           0
  4a.82                  1         0        13         0           0           0
  4a.83                  1         0         3         0           0           0
  4a.84                  1         0         3         0           0           0
  4a.86                  1         0         3         0           0           0
  4a.87                  1         0         3         0           0           0
  4a.88                  1         0         3         0           0           0
  4a.89                  1         0         3         0           0           0
  4a.91                  1         0        10         0           0           0
  4a.92                  1         0         3         0           0           0
  4a.93                  1         0       264         0           0           0
  Initiators on channel 4a:
  Loop                Link  Underrun   Loss of   Invalid    Frame In   Frame Out
   ID              Failure     count      sync       CRC       count       count
                     count               count     count
  4a.0 (self)            0         0         0         0           0           0
  4a.7 (toaster)         0         0         0         0           0           0

From the output of link_stats we see the following:

The local node has a loop id of 0 on this loop, and the node named toaster has a loop id of 7 on this loop.
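
The initiator rows carry the node name in parentheses after the loop id. As a minimal Python sketch, assuming the row layout shown in the output above, the loop ids and node names can be extracted like this (the function name is hypothetical):

```python
# Sketch only: extract initiator loop ids and names from link_stats rows.
import re

# Matches rows such as "  4a.7 (toaster)   0   0   0   0   0   0".
INITIATOR = re.compile(r"^\s*(\w+)\.(\d+)\s+\(([^)]+)\)")

SAMPLE = """\
  Initiators on channel 4a:
  4a.0 (self)            0         0         0         0           0           0
  4a.7 (toaster)         0         0         0         0           0           0
"""


def parse_initiators(text):
    """Return {loop_id: node_name} for rows that name an initiator."""
    found = {}
    for line in text.splitlines():
        match = INITIATOR.match(line)
        if match:
            found[int(match.group(2))] = match.group(3)
    return found


print(parse_initiators(SAMPLE))  # {0: 'self', 7: 'toaster'}
```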

Example of a device map for Shared Storage configurations

The following device map shows a Shared Storage configuration:

ferris> fcstat device_map

  Loop Map for channel 4a:
  Translated Map: Port Count 14
                    0  80  81  82  83  84  86  87  88  89  91  92  93   7
  Shelf mapping:
                  Shelf 5:  93  92  91 XXX  89  88  87  86 XXX  84  83  82  81  80


  Initiators on this loop:
                    0 (self)  7 (toaster)

From the output of device_map we see the following:

Channel 4a is attached to shelf 5. Two slots in shelf 5 do not have disks; these are represented by `XXX'.

Two nodes are connected to this loop. The loop id of the local node is 0, and the loop id of the node `toaster' is 7.

Example of a device map for switch attached drives

The following device map shows a configuration where a set of shelves is connected via a switch:

toaster> fcstat device_map

  Loop Map for channel 9:
  Translated Map: Port Count 43
                    7  32  33  34  35  36  37  38  39  40  41  42  43  44  45  16
                   17  18  19  20  21  22  23  24  25  26  27  28  29  64  65  66
                   67  68  69  70  71  72  73  74  75  76  77
  Shelf mapping:
                  Shelf 1:  29  28  27  26  25  24  23  22  21  20  19  18  17  16
                  Shelf 2:  45  44  43  42  41  40  39  38  37  36  35  34  33  32
                  Shelf 4:  77  76  75  74  73  72  71  70  69  68  67  66  65  64



  Loop Map for channel sw2:0:
  Translated Map: Port Count 15
                  126  93  92  89  91  90  88  87  86  85  84  83  80  82  81

  Shelf mapping:
                  Shelf 5:  93  92  91  90  89  88  87  86  85  84  83  82  81  80

From the output of device_map we see the following:

The first set of shelves is connected to a host adapter in slot 9.

The disks of shelf 5 are connected via port 0 of the switch `sw2'. The switch port is shown as 126 and appears first in the translated map.

HA CONSIDERATIONS

Statistics are maintained symmetrically for primary and partner loops.

