Expedite the troubleshooting process for the most common issues by reviewing and answering the following questions for your product.
General Troubleshooting Questions
- Is the problem with the Cisco switches or with the management software (DCNM)?
- If a Cisco switch is involved, what type of switch is it (for example, MDS 9000 series)?
- If the management software is involved, what version of DCNM is running? (For management software issues, go to the Management Software Cases section for the Data Center Network Manager top ten questions.)
- If Network Time Protocol (NTP) is not being used, please open a connection to the host, switch, and array and record the current time on each:
- Host time:
- Switch time:
- Array time:
- What are the host, switch, and storage ports in the path(s) in question?
- If hosts are impacted, please provide their WWNs and the switch port numbers the hosts are attached to.
- Please provide the switch serial numbers (SNs), switch names, and the NX-OS (Cisco) version running on each switch.
- Please provide the storage device model, the serial number (if it is Hitachi storage), the port numbers from the storage's perspective, and the port numbers from the switch's perspective.
- If the issue spans multiple fabrics, please verify the model and serial number of all involved switches in every fabric.
- Please provide a SAN diagram detailing the paths that are involved, including the host(s), edge devices, core devices, and array(s).
- Please provide a screenshot and/or output of the error you are receiving, including the timestamp if available.
- Please identify any recent changes made to the SAN or fabric, including zoning changes, within 2 hours of the problem start time, and provide the reason for each change.
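When NTP is not in use, the host, switch, and array times gathered above are mainly useful for lining up log entries from the three devices. The sketch below is one minimal way to turn those readings into offsets relative to the switch clock. It assumes the times were recorded as ISO 8601 strings at roughly the same moment; the function name `clock_offsets` and the sample timestamps are hypothetical, and any delay between taking the three readings shows up as error in the result.

```python
from datetime import datetime


def clock_offsets(times: dict[str, str], reference: str = "switch") -> dict[str, float]:
    """Return each device's clock offset in seconds relative to `reference`.

    `times` maps a device label to an ISO 8601 timestamp string
    (a hypothetical input format; record the times however is convenient).
    A positive offset means that device's clock is ahead of the reference.
    """
    parsed = {name: datetime.fromisoformat(ts) for name, ts in times.items()}
    ref = parsed[reference]
    return {name: (t - ref).total_seconds() for name, t in parsed.items()}


# Hypothetical readings taken from the host, switch, and array.
observed = {
    "host": "2024-05-01T10:15:42",
    "switch": "2024-05-01T10:14:30",
    "array": "2024-05-01T10:16:05",
}
print(clock_offsets(observed))  # {'host': 72.0, 'switch': 0.0, 'array': 95.0}
```

To express a host or array log timestamp on the switch clock, subtract that device's offset from it (here, a host event logged at 10:20:00 happened at about 10:18:48 switch time).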
Performance Cases:
Host Situations
- If hosts are impacted, please provide their WWNs and the port numbers the hosts are attached to.
- Please identify any recent changes made to the SAN or fabric, including zoning changes, within 2 hours of the problem start time.
- Has any maintenance been done in the data center (DC) near the connections between the host and switch?
- Please collect 'show tech-support' and 'show logging onboard' from the switches involved. Once complete, run 'clear counters interface all', 'clear ips-stats all', and 'debug system internal clear-counters all' to clear all port statistics on the switches. Then wait 1 to 24 hours and collect another 'show tech-support'. Choose the interval between collections based on how severe the problem is.
- If known, please provide the zoning for the impacted host or hosts.
- Have the SFP RX and TX power levels been checked by running 'show interface fcx/x transceiver details'?
- Have cables been checked or cleaned since the problem started?
- Has workload changed on the host or hosts in the environment recently?
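Because the counters are cleared between the two collections, the error counters in the second 'show tech-support' cover only the wait interval, and that interval can be anywhere from 1 to 24 hours. Normalizing to errors per hour makes collections of different lengths comparable. The sketch below assumes the counter values have already been transcribed from the output into a dictionary; the function name, interface names, counter names, and values are all hypothetical, and no parsing of real 'show tech-support' output is attempted.

```python
def rank_interfaces(
    snapshot: dict[str, dict[str, int]], hours_since_clear: float, counter: str
) -> list[tuple[str, float]]:
    """Rank interfaces by `counter` events per hour since the counters were cleared.

    `snapshot` maps an interface name to its counters from the second collection.
    Interfaces that do not report the counter are treated as zero.
    """
    if hours_since_clear <= 0:
        raise ValueError("hours_since_clear must be positive")
    rates = [
        (ifname, counters.get(counter, 0) / hours_since_clear)
        for ifname, counters in snapshot.items()
    ]
    # Highest rate first, so the worst ports are reviewed first.
    return sorted(rates, key=lambda item: item[1], reverse=True)


# Hypothetical counters transcribed from the second collection, taken 4 hours after clearing.
snapshot = {
    "fc1/5": {"crc_errors": 48, "link_failures": 3},
    "fc1/6": {"crc_errors": 0},
    "fc2/1": {"crc_errors": 8, "link_failures": 1},
}
print(rank_interfaces(snapshot, 4.0, "crc_errors"))
# [('fc1/5', 12.0), ('fc2/1', 2.0), ('fc1/6', 0.0)]
```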
Storage Situations
- Please provide the storage device model, the serial number (if it is Hitachi storage), the port numbers from the storage's perspective, and the port numbers from the switch's perspective.
- Please collect 'show tech-support' from the switches involved. Once complete, run 'clear counters interface all' and 'debug system internal clear-counters all' to clear all port statistics on the switches. Then wait 1 to 24 hours and collect another 'show tech-support'. Choose the interval between collections based on how severe the problem is.
- Has any maintenance been done in the data center (DC) near the connections between the storage and switch?
- Is this a mainframe issue? If so, the port information is the middle two digits of the FCID (00XX00).
- Has the number of hosts zoned to the impacted ports increased recently? Has the workload increased?
- Have the SFP RX and TX power levels been checked by running 'show interface fcx/x transceiver details'?
- Have cables been checked or cleaned since the problem started?
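An FCID is a 24-bit address made of three bytes (domain, area, port), usually shown as six hex digits. For mainframe cases, the port information is the middle byte (the XX in 00XX00). The sketch below shows one way to pull that byte out. The function names and the sample FCID are hypothetical; it assumes the FCID is given as a hex string with or without a "0x" prefix.

```python
def fcid_fields(fcid: str) -> tuple[int, int, int]:
    """Split a 24-bit FCID such as '0x2a1f00' into (domain, area, port) byte values."""
    value = int(fcid, 16)  # int() accepts an optional '0x' prefix when base is 16
    if not 0 <= value <= 0xFFFFFF:
        raise ValueError(f"FCID out of 24-bit range: {fcid}")
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


def mainframe_port(fcid: str) -> str:
    """Return the middle byte of the FCID (the XX in 00XX00) as two hex digits."""
    _, area, _ = fcid_fields(fcid)
    return f"{area:02X}"


print(fcid_fields("0x2a1f00"))    # (42, 31, 0)
print(mainframe_port("0x2a1f00"))  # 1F
```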
CPU
- What are the IP addresses and names of the management server devices attempting to communicate with this switch?
- Have any changes to SNMP been made within 24 hours of the problem starting?
- Please provide the output of the 'top' command from the affected switch.
FCIP Cases
- Has a Remote Copy Planning and Design Study (RCP&D) ever been performed for this specific solution? If yes, please upload it to TUF as soon as possible.
- If RCP&D was not performed, who is responsible for architecting and/or implementing this solution?
- When was this FCIP replication solution deployed?
- What is the longest period of time this solution has worked?
- What is the distance between the two sites?
- Who is the TELCO that provides the leased lines/circuits for the WAN?
- How many leased lines/circuits comprise the WAN links for this solution?
- How much bandwidth was identified by the vendor of the FCIP replication solution for this solution to work?
- What is the guaranteed bandwidth of the WAN circuits from the TELCO, i.e., the Committed Information Rate (CIR), for this solution?
- Are the WAN links in this deployment shared with any other (non-replication) IP traffic, or is the bandwidth dedicated ONLY to the replication traffic for THIS solution?
- If the bandwidth is not dedicated to this remote copy solution, is the replication traffic separated from other IP traffic in its own VLAN, or has QoS been configured on the WAN router to segregate it from non-replication IP traffic?
- Has the workload of replication changed since the solution was deployed across the FCIP link? If WAN link is not dedicated to this solution, has the other traffic workload changed?
- What WAN and LAN equipment lies in the data path between the initiator(s) and target(s) at each site (e.g., Ethernet routers, switches, traffic-shaping devices)? Please be specific.
- What is the maximum MTU size the WAN supports?
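The distance and CIR answers above can be combined into a rough estimate of round-trip time and of how much data must be in flight to keep the link full (the bandwidth-delay product). The sketch below is a back-of-envelope calculation only, not a Cisco sizing procedure. It assumes the common rule of thumb of about 5 microseconds of one-way propagation delay per kilometer of fiber; real circuits are usually longer than the straight-line distance and WAN equipment adds latency, so treat the results as lower bounds. The function names and the 1000 km / 100 Mbps inputs are hypothetical.

```python
FIBER_DELAY_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber (rule of thumb)


def estimated_rtt_ms(distance_km: float) -> float:
    """Round-trip propagation delay only; ignores routing detours and equipment latency."""
    return 2 * distance_km * FIBER_DELAY_US_PER_KM / 1000.0


def bandwidth_delay_product_bytes(bandwidth_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to keep the link full: bandwidth x RTT."""
    bits_in_flight = bandwidth_mbps * 1_000_000 * rtt_ms / 1000.0
    return bits_in_flight / 8


rtt = estimated_rtt_ms(1000)                     # hypothetical 1000 km between sites
bdp = bandwidth_delay_product_bytes(100, rtt)    # hypothetical 100 Mbps CIR
print(rtt, bdp)  # 10.0 125000.0
```

If the amount of outstanding data the FCIP configuration allows is well below this figure, the link cannot be filled regardless of the CIR.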
Management Software Cases:
Cisco Data Center Network Manager (DCNM) General Questions
- What version of DCNM are you running?
- What host OS (and version) is DCNM running on, and is it running on a VM?
- If the issue can be recreated, please collect a screenshot of it and describe the actions performed that caused the error.
- Do you have any other applications or servers running on the DCNM server?
- Are you connecting remotely or via a web browser?
- At what time and date did you first see the issue occur?
SNMP
- What are the IP addresses of the switch and the management server(s), including Remote Ops (Hi-Track)?
- Is DNS being used in this setup?
- If so, is the domain name fully qualified?
- Have any firewall rules been changed recently on the IP network?
- Has SNMP worked in the past? If so, when did it last work?
Hardware:
SFP
- Have the SFP RX and TX power levels been checked with the 'show interface fc slot/port transceiver details' command?
- RX and TX values below -5 dBm can be problematic (depending on whether the SFP is shortwave).
- Loose, dirty, or damaged cables, or weak/failing HBAs or SFPs on the end devices, can cause low RX power. If you see low RX power, check all connections between the switch and the end device.
- If TX power is below -5 dBm and you see a problem with the port, the SFP likely needs to be replaced.
- It is recommended that you also run 'show interface fc slot/port' to check the physical layer error counters.
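The RX/TX guidance above can be applied mechanically once the readings from 'show interface fc slot/port transceiver details' have been collected. The sketch below assumes the readings were transcribed by hand into (RX dBm, TX dBm) pairs per interface; the function name, interface names, and values are hypothetical, and the -5 dBm default is the rule of thumb stated above (mainly for shortwave optics), not a vendor-specific alarm threshold.

```python
THRESHOLD_DBM = -5.0  # rule of thumb from this article; mainly applies to shortwave SFPs


def flag_low_power(
    readings: dict[str, tuple[float, float]], threshold: float = THRESHOLD_DBM
) -> dict[str, list[str]]:
    """Return, per interface, which of RX/TX fall below `threshold` dBm.

    `readings` maps an interface name to (rx_dbm, tx_dbm).
    Interfaces with both values at or above the threshold are omitted.
    """
    flagged: dict[str, list[str]] = {}
    for ifname, (rx_dbm, tx_dbm) in readings.items():
        issues = []
        if rx_dbm < threshold:
            issues.append("RX low: check cables, connections, and the end device HBA/SFP")
        if tx_dbm < threshold:
            issues.append("TX low: if the port has problems, the SFP likely needs replacing")
        if issues:
            flagged[ifname] = issues
    return flagged


# Hypothetical readings (RX dBm, TX dBm).
readings = {"fc1/1": (-2.3, -2.1), "fc1/2": (-7.8, -2.4), "fc1/3": (-6.0, -6.5)}
for ifname, issues in flag_low_power(readings).items():
    print(ifname, issues)
```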
Blades
- What error messages are being observed? Please provide a screenshot of the error message.
- Could the power source feeding the switch power supplies be causing issues?
- Are any blades reported as faulty or powered off in the output of the 'show module' command?
- Please provide the output of 'show module'.
Additional Notes
- See Cisco Data Collection for instructions on collecting logs.
CXone Metadata
Tags: Cisco,triage,Triage Questions
PageID: 24970