Replacement procedure overview
This section provides information on the requirements and considerations for replacing nodes.
Replacing a power supply
LED indicators on each PSU indicate the PSU status.
Procedure
Remove the power cord from the PSU.
Move the retaining latch to the right (you may hear a slight click if the PSU moves when the latch disengages).
Using the handle on the PSU, pull the PSU out from the back of the server until you can completely remove the PSU from the chassis.
Insert the replacement PSU. When the PSU is fully inserted, the retaining latch should click into position all the way to the left.
If the PSU that is not being replaced is receiving mains power when the replacement PSU is fitted, the fan on the replacement PSU becomes active.
Connect the power cord to the back of the PSU.
The PSU should start as soon as the power connection is made. If the PSU does not start immediately, make sure the main power circuit is live and that the other end of the power cable is connected to a live outlet.
Field replaceable units
FRUs include the following components:
- Whole node (except rail kit, bezel and PSU)
- SSDs
- Fans
- Bezel
- Power Supply Units (PSUs)
- SFP+ port adapters
Some components are also hot-swappable. See Hot-swappable components for details.
Recovering or replacing a drive
Some drive failures require drive replacement; others require only a recovery process. Use the recovery process to ensure that all partitions are recovered before proceeding with any further drive recovery or replacement procedures. Unless you are certain the drive has failed, perform a drive recovery.
Drives can fail for a number of reasons, including corrupt sectors or erroneous blocks of data. Typically, the RAID controller handles these types of errors and they do not cause the server to fail.
More serious errors may cause a drive failure, causing one or both drives to fall out of the RAID. Should one partition of a drive fail, attempt a disk recovery. If a partition fails repeatedly, replace the drive. If all the partitions fall out of RAID, replace the failed drive.
- Failed drives are hot-swappable, so a failed drive can be replaced without shutting down the server. However, there are serious risks in trying to swap a drive that has not failed.
- Do not assume that a drive is faulty because its red LED is illuminated. The red LED is also illuminated during a RAID rebuild or recovery. If the drive fails and must be replaced, remove it from the server.
- If the drive shows signs of failure (through warning events in the event log), the drive can be replaced as it is hot-swappable.
- Do not pull out a drive that is in a known good configuration. Doing so can potentially lead to data corruption.
- Unless you are certain the drive has failed, perform a disk recovery.
- Drive redundancy is unsupported if the drive is removed from the server.
- The new drive does not require the same capacity as the drive being replaced.
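The recover-or-replace guidance above can be summarized as a small decision function. This is a minimal sketch, not a tool that ships with the server: the names DriveState, choose_action, and repeat_limit are hypothetical, and because the text does not say how many failures count as "repeatedly", the default threshold of 2 is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    RECOVER = "perform a drive recovery"
    REPLACE = "replace the drive"


@dataclass
class DriveState:
    """Hypothetical summary of what the event log reports for one drive."""
    total_partitions: int      # partitions on the drive that belong to the RAID
    failed_partitions: int     # partitions currently out of the RAID
    repeat_failures: int = 0   # times a partition failed again after a recovery


def choose_action(state: DriveState, repeat_limit: int = 2) -> Action:
    """Apply the guidance: recover by default, replace only on clear failure.

    repeat_limit is an assumed threshold for "fails repeatedly".
    """
    # All partitions have fallen out of the RAID: replace the failed drive.
    if state.total_partitions > 0 and state.failed_partitions >= state.total_partitions:
        return Action.REPLACE
    # A partition keeps failing after recovery: replace the drive.
    if state.repeat_failures >= repeat_limit:
        return Action.REPLACE
    # Otherwise (including when failure is uncertain), attempt a recovery first.
    return Action.RECOVER
```

For example, choose_action(DriveState(total_partitions=4, failed_partitions=1)) returns Action.RECOVER, which matches the rule that a recovery is attempted unless the drive has clearly failed.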
Replacing a fan
Procedure
Remove the front bezel. The fan assemblies are now visible.
Identify the fan to be replaced.
Fans are labeled on the chassis and are numbered 1 and 2, with fan 1 on the left and fan 2 on the right. Refer to the fan status LEDs on the front panel of the server (behind the bezel) to see which fan has failed. In the following figure, number 1 indicates the status LED for fan 1 (the left-side fan), and number 2 indicates the status LED for fan 2 (the right-side fan).
Fan status LEDs
Fan status LED descriptions
Item    Description
1       Fan 1 status LED
2       Fan 2 status LED
Remove the faulty fan by loosening the thumbscrews (turning them counter-clockwise) until they are loose, then pulling the fan unit straight out of the chassis. (The fan lead connector disengages automatically as you remove the fan assembly.)
Put the new fan assembly into place.
Gently press the fan assembly back into the chassis. The fan electrical connector is aligned automatically when the fan is fully inserted into the chassis.
Secure the fan assembly in position by tightening the thumbscrews (turning them clockwise).
Replace the front bezel.
Server replacement requirements
Consider the following server replacement requirements:
- Much of the process required for a server replacement is the same as what is covered in installation and configuration training.
- Determine which replacement scenario is being encountered. The replacement process is different for each scenario.
You can use a keyboard, video, and mouse (KVM) device or a serial cable to connect to the serial port. Bring these with you in case they are needed when the unit arrives. If you connect to the serial port, use the following terminal client settings:
- 115,200 b/s
- 8 data bits
- 1 stop bit
- No parity
- No flow control
- VT100 emulation
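The serial settings listed above can be captured in one place so that they are not retyped on site. The sketch below is illustrative only: SerialSettings and its methods are hypothetical names, and it uses only the Python standard library. The pyserial_kwargs method returns values laid out for the keyword arguments of the third-party pyserial package's serial.Serial constructor, on the assumption that pyserial is the client library in use; VT100 emulation is a terminal-program setting and is recorded only for reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialSettings:
    """Serial console settings from the list above (names are hypothetical)."""
    baud_rate: int = 115200      # 115,200 b/s
    data_bits: int = 8           # 8 data bits
    stop_bits: int = 1           # 1 stop bit
    parity: str = "N"            # no parity
    flow_control: bool = False   # no flow control
    emulation: str = "VT100"     # set in the terminal program, not on the port

    def shorthand(self) -> str:
        """Return the conventional compact notation, for example '115200 8N1'."""
        return f"{self.baud_rate} {self.data_bits}{self.parity}{self.stop_bits}"

    def pyserial_kwargs(self) -> dict:
        """Keyword arguments for serial.Serial, assuming pyserial is used."""
        return {
            "baudrate": self.baud_rate,
            "bytesize": self.data_bits,
            "parity": self.parity,
            "stopbits": self.stop_bits,
            "xonxoff": self.flow_control,   # software flow control off
            "rtscts": self.flow_control,    # hardware flow control off
        }


CONSOLE = SerialSettings()
print(CONSOLE.shorthand())  # prints "115200 8N1"
```

If pyserial is installed, the port could then be opened with serial.Serial(device, **CONSOLE.pyserial_kwargs()), where device is the name of the local serial device and depends on the laptop and adapter in use.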
Swapping components
The server can be replaced onsite; however, some components are not included with the replacement server. You must remove those components from the original server and install them in the replacement server. At least three parts must be reused in the replacement server.
The components that can be swapped include:
- Power supplies
- Bezel
- Rack mounting guides
MAC ID and license keys
The replacement server has a new MAC ID, which means that you are required to have new license keys regardless of whether you are replacing a single node or a complete cluster.
As part of the field replacement process, Hitachi Vantara recommends that you obtain temporary keys to enable quick delivery and implementation. However, any temporary keys must eventually be replaced with a permanent key. This is required for all field scenarios, except when replacing a single node in a cluster.
Previous backups
A system backup preserves two critical components of information:
- SMU configuration
- Server configuration
The backup for an embedded SMU differs in form from the backup for an external SMU. Depending on the severity of the replacement scenario, different limitations might apply to the system recovery.
Upgrades
Replacement servers can be a revision below or above the firmware level expected at the customer site. An upgrade is therefore typically required during the replacement process; the upgrade procedure is not covered in this document. It is assumed that all service personnel performing a replacement have been trained and know where to obtain this information within their respective organizations.