How To Debug CCI Issues – Version 1.3
Introduction
This short document describes some of the many ways CCI can break. It should be used as a list of the things to check with a customer whenever CCI problems are raised.
Check: This tag is used to highlight what to look for.
Installation and Environment Variables
UNIX
You would think it is easy to get this right, but we see many errors caused by bad installation. Here is what should be done. Let us assume a non-default installation, as that is the easiest for the user to get wrong.
Let us assume we want to install CCI in /opt/HORCM
1 Copy the installation file to the hard disk somewhere. It is called RMHORC. Let’s copy it to /var/tmp
2 cd /opt
Change directory to where you want /HORCM created.
3 cpio -idmu < /var/tmp/RMHORC
This will copy all the files in the RMHORC “package” to /opt/HORCM
Check: We have seen cases where users build their own installation “packages” for HORCM – and then copy the files from one Host to another. In UNIX particularly this is dangerous. CCI needs a “hidden” directory called .uds or it will not start.
- In CCI 01-16-03 and below it was in /var/tmp
- In CCI 01-17-03 and above it is in /yourdirectory/HORCM
This directory contains UNIX “pipes” when the instances are started. The “pipes” are deleted when the instance stops. Thus, you will see this:
root@SYD-E250-1:/opt/HORCM/.uds#:ls -al
total 4
drwxrwxrwx 2 root sys 512 Feb 22 15:26 .
dr-xr-xr-x 12 root sys 512 Feb 22 15:04 ..
root@SYD-E250-1:/opt/HORCM/.uds#:horcmstart.sh 4
starting HORCM inst 4
HORCM inst 4 starts successfully.
root@SYD-E250-1:/opt/HORCM/.uds#:ls -al
total 6
drwxrwxrwx 3 root sys 512 Feb 22 16:29 .
dr-xr-xr-x 12 root sys 512 Feb 22 15:04 ..
drwxrwxrwx 2 root other 512 Feb 22 16:29 .lcmcl04
srwxrwxrwx 1 root other 0 Feb 22 16:29 .lcmep04
Always check for this directory if you have a case where CCI does not start.
4 ln -s /opt/HORCM /HORCM
Check: You must create a link or the install in the next step will fail.
5 /HORCM/horcminstall.sh
Check: You must do this on UNIX to create links to the CCI commands.
6 raidqry -h
Here is what you see if the user has done everything right.
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC
-h Help/Usage
-I[#] Set to HORCMINST#
-IH[#] or -ITC[#] Set to HORC mode [and HORCMINST#]
-IM[#] or -ISI[#] Set to MRCF mode [and HORCMINST#]
-z Set to the interactive mode
-zx Set to the interactive mode and HORCM monitoring
-q Quit(Return to main())
-g Specify for getting all group name on local
-l Specify the local query
-r <group> Specify the remote query
-f Specify display for floatable host
Check: Always get the user to run this command and send you the output.
- If the user is using an old version of CCI, ask why. 01-19-03/04 or later is preferred for many reasons which will be discussed later.
- In some special cases, like HPtM, a specific level of CCI may be stated in the ECN or Release Notes. In this case, it may be advisable to stick with that level. Also, the microcode ECNs for 9900V and USP always recommend a CCI level.
- However, in my experience, CCI is always backwards compatible, and the developer has confirmed this, so one should always use at least the minimum level stated in ECNs.
- Any command this user issues will be assumed to be TrueCopy (see HORC in the Usage line above). If the user is trying to perform a ShadowImage operation, you now know why it is failing.
- No instance has been set. Here is what you see if the instance variable has been set:
root@SYD-E250-1:/opt/HORCM/.uds#:export HORCMINST=4
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]
-h Help/Usage
If the user is trying to control Instance 1, now you know why it is failing.
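When the user sends the raidqry output as text, the three things checked above (CCI level, mode and instance) can be pulled out with a small filter. This is only a sketch, assuming the output has the layout shown in the sessions above; raidqry_summary is a made-up helper name, not a CCI command.

```shell
# Summarise "raidqry -h" output read from stdin.
# Prints three lines: version=..., mode=..., instance=...
# (raidqry_summary is a hypothetical helper, not part of CCI.)
raidqry_summary() {
    awk '
        /^Ver&Rev:/ { version = $2 }
        /^Usage/ {
            target = $NF                      # e.g. HORC, HORC[4], HOMRCF[4]
            mode = target
            sub(/\[.*/, "", mode)             # drop the bracketed instance
            inst = "none"
            if (match(target, /\[[0-9]+\]/))
                inst = substr(target, RSTART + 1, RLENGTH - 2)
        }
        END {
            print "version=" version
            print "mode=" mode
            print "instance=" inst
        }
    '
}
```

On the host itself this would be run as raidqry -h | raidqry_summary. A mode of HORC means commands are treated as TrueCopy, HOMRCF means ShadowImage, and instance=none means HORCMINST has not been set.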
Windows
Installation is easier. Double click the EXE and follow the bouncing ball. I always recommend taking the default of C:\HORCM. The directory is only about 10 MB in size so we are not likely to fill the drive.
Check: The same rule applies as for UNIX. Always ask for raidqry output.
Environment Variables
Here is a ShadowImage example.
C:\HORCM\ETC>set horcminst=4
C:\HORCM\ETC>set horcc_mrcf=1
C:\HORCM\ETC>raidqry -h
Model : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]
-h Help/Usage
-I[#] Set to HORCMINST#
-IH[#] or -ITC[#] Set to HORC mode [and HORCMINST#]
-IM[#] or -ISI[#] Set to MRCF mode [and HORCMINST#]
-z Set to the interactive mode
Check: horcminst is case insensitive on Windows. It is case sensitive on UNIX, where it must be coded as HORCMINST.
Change mode of operation to TrueCopy
C:\HORCM\ETC>set horcc_mrcf=
C:\HORCM\ETC>raidqry -h
Model : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]
Now try this on UNIX. Change mode of operation:
root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=1
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]
root@SYD-E250-1:/opt/HORCM/.uds#:export HORCC_MRCF=
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HOMRCF[4]
Check: This does not work. Setting HORCC_MRCF to an empty value on UNIX leaves the mode as HOMRCF. You must unset the variable, like this:
root@SYD-E250-1:/opt/HORCM/.uds#:unset HORCC_MRCF
root@SYD-E250-1:/opt/HORCM/.uds#:raidqry -h
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Usage : raidqry [options] for HORC[4]
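The sessions above suggest that on UNIX CCI only looks at whether HORCC_MRCF exists, not at its value. That is my reading of the output, not something confirmed by the developer. The sketch below shows how to tell the three states apart in the shell; mrcf_state is a made-up helper name.

```shell
# Report how HORCC_MRCF looks in the current environment.
# (mrcf_state is a hypothetical helper, not part of CCI.)
mrcf_state() {
    if [ -z "${HORCC_MRCF+x}" ]; then
        echo "unset"            # variable is absent from the environment
    elif [ -z "$HORCC_MRCF" ]; then
        echo "set-but-empty"    # what "export HORCC_MRCF=" leaves behind
    else
        echo "set:$HORCC_MRCF"
    fi
}
```

If the user reports TrueCopy commands failing and this prints set-but-empty, ask them to run unset HORCC_MRCF and try again.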
Sending logs to GSC
If you have to escalate the problem to GSC, we will need the complete set of HORCM logs and all the HORCM CONF files. In general, the preferred method of doing this is to run “getconfig”. These executables/scripts are available in the TUF Knowledge area.
If for any reason you do not run these scripts, you must zip up all the LOG directories underneath the HORCM directory. Never pick and choose which log to upload – many of them have the same name – and GSC may need to refer to all of them!
In addition, the factory always asks for the output from these commands (Windows only):
inqraid -CLI -fgx $Phys
inqraid -CLI -fgvx $Vol
inqraid -CLI -fgx $LETALL
Finding Command Devices
You cannot create a HORCM CONF file or check it for accuracy without running INQRAID commands for UNIX/Windows and RAIDSCAN commands for Windows.
UNIX
Check: Get the user to send you the result of this command:
root@SYD-E250-1:/opt/HORCM/.uds#:ls /dev/rdsk/* | inqraid -CLI -fxg
DEVICE_FILE  PORT      SERIAL  LDEV  CTG  H/M/12  SSID  R:Group  PRODUCT_ID
...
c2t0d16s2    CL1-A-4   10111   0     -    -       -     -        OPEN-V-CM
c2t2d36s2    -         -       -     -    -       -     -        -
c2t4d0s2     CL1-A-20  20169   43    -    -       -     -        OPEN-V-CM
c2t6d0s2     CL1-A-11  80025   31F   -    -       -     -        OPEN-V-CM
c2t6d14s2    -         -       -     -    -       -     -        -
c3t2d128s2   CL2-A-6   10262   2180  -    -       -     -        OPEN-V-CM
c3t3d4s2     CL2-A-7   3157    1     -    -       -     -        DF600F-CM
...
Here are 5 command devices (the PRODUCT_ID ends in -CM). Two are of interest here: c2t6d0s2 is on a USP and c3t3d4s2 is on a 9570V. If the user wants to use the first of these, they need to code:
/dev/rdsk/c2t6d0s2
in the HORCM CONF file.
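With a long device list it is easy to miss a command device. As a rough sketch, the filter below keeps only the rows whose PRODUCT_ID ends in -CM; it assumes the column layout shown above, and list_cmddevs is a made-up helper name.

```shell
# Print device file, serial and LDEV for every command device found in
# "inqraid -CLI" output on stdin (the last column ends in "-CM").
# (list_cmddevs is a hypothetical helper, not part of CCI.)
list_cmddevs() {
    awk '$NF ~ /-CM$/ { print $1, $3, $4 }'
}
```

On the host this would be run as ls /dev/rdsk/* | inqraid -CLI -fxg | list_cmddevs. The name to code in the HORCM CONF file is /dev/rdsk/ followed by the device file in the first column.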
Windows
Check: Get the user to send you the result of these commands:
C:\HORCM\ETC>raidscan -x findcmddev h0,20
cmddev of Ser# 10111 = \\.\PhysicalDrive2
cmddev of Ser# 10111 = \\.\PhysicalDrive5
cmddev of Ser# 41 = \\.\PhysicalDrive7
cmddev of Ser# 10262 = \\.\PhysicalDrive8
cmddev of Ser# 80025 = \\.\PhysicalDrive10
cmddev of Ser# 20169 = \\.\PhysicalDrive11
cmddev of Ser# 20169 = \\.\Volume{3c107ab6-7dbf-11db-a1ed-000e0c6abf1d}
Check: Do not use ANY of these names! If you find a user using this syntax, ask that it be changed.
- Harddisk numbers can change after a reboot.
- GUIDs can change in an MS Cluster environment after a reboot.
- Do yourself a favour: do not use these names.
Use the INQRAID output below to work out the serial number and LDEV instead:
C:\HORCM\ETC>inqraid $Phys -CLI
DEVICE_FILE  PORT   SERIAL  LDEV  CTG  H/M/12  SSID  R:Group  PRODUCT_ID
Harddisk0    -      -       -     -    -       -     -        00JS-22MHB0
Harddisk1    -      -       -     -    -       -     -        00JS-22MHB0
Harddisk2    CL1-A  10111   515   -    -       -     -        OPEN-V-CM
Harddisk3    CL1-A  10111   1920  -    s/s/ss  9997  5:03-02  OPEN-V
Harddisk4    CL1-A  10111   768   -    s/s/ss  9993  5:06-02  OPEN-V
Harddisk5    CL1-A  10111   1856  -    -       -     -        OPEN-V-CM
Harddisk6    CL1-A  10111   2632  -    P/s/ss  999A  5:06-02  OPEN-V
Harddisk7    CL1-A  41      0     -    -       -     -        DF600F-CM
Harddisk8    CL1-A  10262   16    -    -       -     -        OPEN-V-CM
Harddisk9    CL1-A  10262   8320  -    s/s/ss  2000  5:01-05  OPEN-V
Harddisk10   CL1-A  80025   784   -    -       -     -        OPEN-V-CM
Harddisk11   CL1-A  20169   13    -    -       -     -        OPEN-V-CM
\\.\CMD syntax has been around since 01-17-03/05. There is no reason not to use it! If the user is running 01-17-03/05 or below, get them to use 01-19-03/04 or higher.
In this case, for Harddisk8/USP 10262, the correct syntax in the HORCM CONF file is:
\\.\CMD-10262-16 or even
\\.\CMD-10262-16-CL1-A-12 if you know this is HSD 12 - or
\\.\CMD-10262-16-CL1-A or, for slack people,
\\.\CMD-10262-16 is my preferred coding technique – as this takes care of multipath environments as well.
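If the user has sent you the inqraid $Phys -CLI output as text, the serial-plus-LDEV form can be produced mechanically. This is a sketch under two assumptions: the listing has the column layout shown above, and the LDEV is used exactly as it appears in that listing, as in the Harddisk8 example. cmd_names is a made-up helper name and can be run on any machine with a POSIX shell.

```shell
# Turn "inqraid $Phys -CLI" output on stdin into \\.\CMD-<serial>-<LDEV>
# names, one per distinct command device.
# (cmd_names is a hypothetical helper, not part of CCI.)
cmd_names() {
    awk '$NF ~ /-CM$/ { print $3 "-" $4 }' | sort -u |
    while IFS= read -r id; do
        # The prefix is single-quoted so the backslashes stay literal.
        printf '%s%s\n' '\\.\CMD-' "$id"
    done
}
```

The sort -u step drops duplicates, so a command device seen down several paths is listed once.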
Starting HORCM Instances
There are so many ways for this to fail that I could write a book on this topic.
So, always take the easy way out. Send the user a deck that is bound to work. If it does not, then you have very little to debug. Here is such a deck – HORCM4.CONF:
UNIX
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
10.129.2.53 11004 1000 3000
HORCM_CMD
#dev_name dev_name dev_name dev_name
/dev/rdsk/c2t6d0s2
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
HORCM_INST
#dev_group ip_address service
There are only 3 things to check:
- Is the IP address correct? Note: You can use “localhost” here, but this will not work for TC environments using 2 different CCI servers.
- Is 11004 a “free” UDP port? Almost certainly it is.
- Is the CMDDEV right? You can tell that from the commands we have already issued.
UNIX HORCM CONF files are kept in /etc
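The three values to check can be lifted straight out of the file the user sends you. The sketch below is only an illustration: horcm_conf_summary is a made-up helper name, and it assumes section headers start with HORCM_ and comment lines start with #, as in the deck above.

```shell
# Print the IP address, service/port and command device(s) coded in a
# HORCM CONF file given as $1.
# (horcm_conf_summary is a hypothetical helper, not part of CCI.)
horcm_conf_summary() {
    awk '
        /^HORCM_/        { section = $1; next }   # remember current section
        /^#/ || NF == 0  { next }                 # skip comments and blanks
        section == "HORCM_MON" { print "ip=" $1; print "service=" $2 }
        section == "HORCM_CMD" { print "cmddev=" $1 }
    ' "$1"
}
```

For the deck above this prints the IP address 10.129.2.53, the service 11004 and the command device /dev/rdsk/c2t6d0s2, which can then be compared with the inqraid output already collected.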
Windows
Here is HORCM8.CONF for Windows:
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
10.129.3.127 11008 1000 3000
HORCM_CMD
#dev_name dev_name dev_name dev_name
\\.\CMD-10262-16
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
HORCM_INST
#dev_group ip_address service
Use the same logic as for UNIX. Windows HORCM CONF files are in C:\WINDOWS
Other recommendations:
- HDvM uses HORCM CONF files called HORCM900.CONF to HORCM988.CONF for temporary HORCM CONF files. Do not use these numbers yourself.
- I suggest that you use 0-799 for user created files and 800-899 for HDvM created permanent HORCM CONF files.
- I also suggest a port numbering convention of 11000 + x, where x is the number in HORCMx.CONF (so HORCM8.CONF uses port 11008). This means that you will need to “reserve” UDP ports 11000 to 11899 for HORCM CONF usage.
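The numbering suggestions above can be written down as a tiny helper. This is a sketch of my suggested convention only, not something CCI enforces; horcm_port is a made-up name.

```shell
# Map a HORCM instance number to a UDP port using the 11000 + x convention.
# Refuses 900-988, which HDvM uses for temporary HORCM CONF files.
# (horcm_port is a hypothetical helper, not part of CCI.)
horcm_port() {
    case $1 in
        ''|*[!0-9]*|0?*)
            echo "instance must be a number without leading zeros" >&2
            return 2 ;;
    esac
    if [ "$1" -ge 900 ] && [ "$1" -le 988 ]; then
        echo "instance $1 is in the range HDvM uses for temporary files" >&2
        return 1
    fi
    echo $((11000 + $1))
}
```

For example, horcm_port 4 gives the port used in the UNIX deck above and horcm_port 8 gives the one used in the Windows deck.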
Updating the “Services” file
Many people code HORCM CONF files like this:
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
10.129.3.127 horcm8 1000 3000
In this case, the UDP port – horcm8 – must be defined in the “Services” file:
Windows C:\WINDOWS\system32\drivers\etc\services
UNIX /etc/services
Like this:
horcm0 11000/udp
horcm1 11001/udp
…
horcm8 11008/udp
horcm9 11009/udp
“blank line”
Check: Under Windows, if there is no blank line after horcm9 (in this example) that definition will be ignored! PS No blank lines at the end of the HORCM CONF file, please.
Check: If you have 2 CCI servers, using horcm8 and horcm9, for example, then both horcm8 and horcm9 have to be defined in both servers!
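Both checks can be roughed out against a copy of the Services file. This is a sketch with a loud assumption: I treat "no blank line after the last entry" as "the file does not end with a newline", which is the usual cause but may not be the only one. check_horcm_service is a made-up helper name.

```shell
# Check that horcm<N> is defined as a UDP service in a services file and
# that the file ends with a newline.
# Usage: check_horcm_service 8 [/path/to/services]
# (check_horcm_service is a hypothetical helper, not part of CCI.)
check_horcm_service() {
    inst=$1
    file=${2:-/etc/services}
    if ! grep -Eq "^horcm${inst}[[:space:]]+[0-9]+/udp" "$file"; then
        echo "horcm${inst} is not defined as a UDP service in $file" >&2
        return 1
    fi
    # Command substitution strips a trailing newline, so any output here
    # means the last line of the file is not terminated.
    if [ -n "$(tail -c 1 "$file")" ]; then
        echo "$file does not end with a newline" >&2
        return 1
    fi
}
```

With 2 CCI servers, run it for both instance numbers against the Services file from each server.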
Reading the LOGS
Windows
Let’s start with Windows first this time.
In our example – we used Instance 8 – so you will find the log here:
C:\HORCM\log8\curlog\horcm_ml_acer510_log.txt
because this server is called ml_acer510.
Let us examine it in detail:
*****************************************************************************
- HORCM STARTUP LOG - Thu Feb 22 18:03:08 2007
*****************************************************************************
18:03:08-39210-05000- horcmgr started on Thu Feb 22 18:03:08 2007
18:03:08-39210-05000- _spawnvp() horcmd_08 using horcmgr [CWD=C:\HORCM\ETC]
18:03:08-3d090-07240- Fibre address conversion TBL has been set to '2'
P.P. : RAID Manager for WindowsNT
Model : RAID-Manager/WindowsNT
Ver&Rev: 01-19-03/04
Release: Production(GA)
ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.
HORCM(ml_acer510 7240) started by Administrator (0) on Thu Feb 22 18:03:08 2007
Lots of useful information here: the CCI version, the server name and process ID, and the user who started HORCM.
18:03:08-3d090-07240- horcmd_08 started on Thu Feb 22 18:03:08 2007
18:03:08-3d090-07240- [horcmcfgrdf] access(conf_file) OK.
18:03:08-3d090-07240- [horcmcfgrdf] access(check) OK.
18:03:08-3d090-07240- [horcmcfgrdf] open(conf_file) OK.
18:03:08-3d090-07240- [horcmcfgetent] fseek(top) OK.
18:03:08-40b28-07240- converted CMDDEV filename \\.\CMD-10262-16 to \\.\PhysicalDrive8
Here is where \\.\CMD syntax is converted to a physical drive number.
18:03:08-40b28-07240- [horcmcfgetent] read(conf_file) OK.
18:03:08-40b28-07240- [horcmcfgrdf] close(conf_file) OK.
18:03:08-40b28-07240- [horcmcfgrdf] check(conf) OK.
18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-40b28-07240- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....
Here is the USP serial number.
[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-40b28-07240- [HORCMCFGRDF] SLPR is supported.
18:03:08-40b28-07240- ***** SLPR bitmap *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012db18]0000: 80000000 00000000 00000000 00000000 ................
[0x0012db28]0010: 00000000 00000000 00000000 00000000 ................
18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-40b28-07240- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....
[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-40b28-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-40b28-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-40b28-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-40b28-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-40b28-07240- [HORCMCFGRDF] SLPR bitmap is checked.
18:03:08-40b28-07240- [horcmcfgrdf] horccmddev(0) OK.
18:03:08-40b28-07240- [HORCRELOWNLBA] floatable LBA(e011) is released.ID:0:\\.\PhysicalDrive8
18:03:08-40b28-07240- [horcread] cmddevopen() start
18:03:08-40b28-07240- [horcread] cmddevopen() finished
18:03:08-449a8-07240- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d8d4]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0x0012d8e4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0x0012d8f4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0x0012d904]0030: f0f0f0f0 f0f0f0f1 f0f2f6f2 00040d09 000000010262....
[0x0012d914]0040: 50090100 00040000 00040004 00040004 P...............
[0x0012d924]0050: ffffffff ffffffff 00060006 00060006 ................
[0x0012d934]0060: 00070007 00070007 000f0c00 00000000 ................
[0x0012d944]0070: 00000000 ef00e011 08030100 01004000 ..............@.
[0x0012d954]0080: 38000400 04400100 01000400 00ff0100 8....@..........
[0x0012d964]0090: 80000000 00000000 00000000 00000000 ................
[0x0012d974]00a0: 00000000 00000000 00000000 00000000 ................
[0x0012d984]00b0: 00800012 000e0002 00000000 00000000 ................
[0x0012d994]00c0: 00000000 00000000 00000000 00000000 ................
[0x0012d9a4]00d0: 00000000 00000000 00000000 00000000 ................
[0x0012d9b4]00e0: 00000000 00000000 00000000 00000000 ................
[0x0012d9c4]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012d9d4]0100: 0004ffff 00060007 ffffffff ffff000b ................
[0x0012d9e4]0110: ffff000d 000e000f 00100011 00120013 ................
[0x0012d9f4]0120: 00140015 00160017 9914ffff 001a001b ................
[0x0012da04]0130: 001c001d 001e001f 00200021 00220023 ......... .!.".#
[0x0012da14]0140: 20002001 00260027 00280029 002a002b . ..&.'.(.).*.+
[0x0012da24]0150: 002c002d ffff002f 00300031 00320033 .,.-.../.0.1.2.3
[0x0012da34]0160: 00340035 00360037 00380039 003a003b .4.5.6.7.8.9.:.;
[0x0012da44]0170: 003c003d 003e003f 00400041 00420043 .<.=.>.?.@.A.B.C
18:03:08-449a8-07240- [HORCREAD]: maxldev = 16384, unitnum = 256
18:03:08-449a8-07240- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
18:03:08-449a8-07240- [HORCREAD]: Number of used instance(s) = 17, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
18:03:08-449a8-07240- [HORCREAD] execute-test read is done.:\\.\PhysicalDrive8
18:03:08-449a8-07240- [horcmcfgrdf] seldevdata() OK.
18:03:08-449a8-07240- MON(HORC): Size of memory allocation for CONFIG_DB = 64 bytes.
18:03:08-449a8-07240- MON:HORCM_CMD=\\.\PhysicalDrive8[Fibre][AL-PA=0xef -> C=5,T=1] port=CL1-A , targ=1, lun=12
Here is the AL-PA for the Port, and the Port, target ID and LUN.
18:03:08-449a8-07240- MON(HORC):number of Mus = 0
18:03:08-449a8-07240- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0 bytes.
18:03:08-449a8-07240- MON(HOMRCF):number of Mus = 0
18:03:10-d1b78-05000- horcmgr : executed 'CreateProcess(raidscan.exe -pi $PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.
I have quoted this in full for a reason. This is what you should expect to see if it all works. If it does not work, at least you can compare the user log with this one.
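The serial number in the horcread dump sits in the first three words at offset 0030 and is stored as EBCDIC digits (0xF0 to 0xF9), which is why the CHAR column can show it as text. If you only have the hex, the sketch below decodes it; decode_serial is a made-up helper name and it handles digits only.

```shell
# Decode EBCDIC digits (0xF0-0xF9) copied from a horcread hex dump.
# Usage: decode_serial 'f0f0f0f0 f0f0f0f1 f0f2f6f2'
# (decode_serial is a hypothetical helper, not part of CCI.)
decode_serial() {
    printf '%s\n' "$1" | tr -d ' ' | sed 's/[fF]\([0-9]\)/\1/g'
}
```

Strip the leading zeros from the result to get the serial number as it appears in the inqraid output.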
UNIX
Here is the output for a Solaris server called SYD-E250-1. This is the log for instance 4. As we installed CCI in /opt/HORCM, the log is here:
root@SYD-E250-1:/opt/HORCM/log4/curlog#:ls -al
total 28
drwxr-xr-x 3 root other 512 Mar 7 16:29 .
drwxr-xr-x 4 root other 512 Mar 7 16:29 ..
-rw-r--r-- 1 root other 10274 Mar 7 16:30 horcm_SYD-E250-1.log
drwxr-xr-x 2 root other 512 Mar 7 16:29 horcmlog_SYD-E250-1
*****************************************************************************
- HORCM STARTUP LOG - Wed Mar 7 16:29:59 2007
*****************************************************************************
16:29:59-cac9d-11271- horcmgr started on Wed Mar 7 16:29:59 2007
16:29:59-cd940-11271- execvp() horcmd_04 using /etc/horcmgr [CWD=/]
16:29:59-e99c5-11272- Fibre address conversion TBL has been set to '1'
P.P. : RAID Manager for Solaris
Model : RAID-Manager/Solaris
Ver&Rev: 01-19-03/04
Release: Production(GA)
ALL Rights Reserved, Copyright (c) 1998-2006, Hitachi, Ltd.
HORCM(SYD-E250-1 11272) started by root (0) on Wed Mar 7 16:30:00 2007
16:30:00-11d9d-11272- horcmd_04 started on Wed Mar 7 16:30:00 2007
16:30:00-17e65-11272- [horcmcfgrdf] access(conf_file) OK.
16:30:00-1c076-11272- [horcmcfgrdf] access(check) OK.
16:30:00-1e127-11272- [horcmcfgrdf] open(conf_file) OK.
16:30:00-29cf3-11272- [horcmcfgetent] fseek(top) OK.
16:30:00-31d0e-11272- [horcmcfgetent] read(conf_file) OK.
16:30:00-34856-11272- [horcmcfgrdf] close(conf_file) OK.
16:30:00-389cb-11272- [horcmcfgrdf] check(conf) OK.
16:30:00-4a34c-11272- [horcmcfgrdf] horccmddev(0) OK.
16:30:00-5ac7f-11272- [horcread] cmddevopen() start
16:30:00-63837-11272- [horcread] cmddevopen() finished
16:30:00-6e384-11272- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....
NSC55 with a Serial Number of 80025
[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:01-ae6ea-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:01-b1cea-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:01-b5e34-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:01-c2226-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2
Here is the CMDDEV
16:30:01-c636e-11272- [HORCMCFGRDF] SLPR is supported.
16:30:01-ca4bf-11272- ***** SLPR bitmap *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfce08]0000: 80000000 00000000 00000000 00000000 ................
[0xffbfce18]0010: 00000000 00000000 00000000 00000000 ................
16:30:01-dad71-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2
16:30:01-deb6b-11272- [horcread] cmddevopen() start
16:30:01-e2d12-11272- [horcread] cmddevopen() finished
16:30:01-e7502-11272- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....
[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:02-62fd9-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:02-6712a-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:02-6b268-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:02-77659-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2
16:30:02-7b7d2-11272- [HORCMCFGRDF] SLPR bitmap is checked.
16:30:02-7f90c-11272- [horcmcfgrdf] horccmddev(0) OK.
16:30:02-85faf-11272- [HORCRELOWNLBA] floatable LBA(e00c) is released.ID:0:/dev/rdsk/c2t6d0s2
16:30:02-89c66-11272- [horcread] cmddevopen() start
16:30:02-8de05-11272- [horcread] cmddevopen() finished
16:30:02-925ff-11272- ***** horcread *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcb94]0000: 48495441 43484920 4f50454e 2052454d HITACHI OPEN REM
[0xffbfcba4]0010: 4f544520 434f5059 20535953 54454d20 OTE COPY SYSTEM
[0xffbfcbb4]0020: 44415441 20545950 45203031 00000000 DATA TYPE 01....
[0xffbfcbc4]0030: f0f0f0f0 f0f0f0f8 f0f0f2f5 00070d09 000000080025....
[0xffbfcbd4]0040: 50090500 00020000 00050005 00050005 P...............
[0xffbfcbe4]0050: 00040004 00040004 00060006 00060006 ................
[0xffbfcbf4]0060: 00070007 00070007 000f2a00 00000000 ..........*.....
[0xffbfcc04]0070: 00000000 b200e00c 08030100 01004000 ..............@.
[0xffbfcc14]0080: f8000400 04400100 01000400 00ff0100 8....@..........
[0xffbfcc24]0090: 80000000 00000000 00000000 00000000 ................
[0xffbfcc34]00a0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc44]00b0: 0080000e 00080002 00000000 00000000 ................
[0xffbfcc54]00c0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc64]00d0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc74]00e0: 00000000 00000000 00000000 00000000 ................
[0xffbfcc84]00f0: 00000000 00000000 00000000 00000000 ................
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0xffbfcc94]0100: 00050004 00060007 00080009 0060ffff .............`..
[0xffbfcca4]0110: ffffffff ffffffff ffff000d ffffffff ................
[0xffbfccb4]0120: 0020ffff ffffffff ffffffff ffffffff . ..............
[0xffbfccc4]0130: ffffffff ffffffff ffffffff ffffffff ................
[0xffbfccd4]0140: 0039ffff ffffffff ffffffff ffffffff .9..............
[0xffbfcce4]0150: 0028ffff ffffffff ffff002d ffffffff .(.........-....
[0xffbfccf4]0160: ffffffff 00320033 ffffffff ffffffff .....2.3........
[0xffbfcd04]0170: ffffffff ffffffff ffffffff ffff000a ................
16:30:02-e7a8a-11272- [HORCREAD]: maxldev = 16384, unitnum = 256
16:30:02-ebbdb-11272- [HORCREAD]: maxhorc = 4, maxmrcf = 64, maxlun = 1024, maxctg = 256, maxjnlg = 256, mixport =1, slprflag = 1
16:30:02-efd23-11272- [HORCREAD]: Number of used instance(s) = 13, Number of attached instance(s) = UNKNOWN, Number of same instance(s) = 1
16:30:03-07ece-11272- [HORCREAD] execute-test read is done.:/dev/rdsk/c2t6d0s2
16:30:03-0e0d4-11272- [horcmcfgrdf] seldevdata() OK.
16:30:03-12354-11272- MON(HORC): Size of memory allocation for CONFIG_DB = 64 bytes.
16:30:03-16392-11272- MON:HORCM_CMD=/dev/rdsk/c2t6d0s2[Fibre][AL-PA=0xb2 -> C=2,T=32] port=CL1-A , targ=32, lun=42
Here is the AL-PA for the Port, and the Port, target ID and LUN.
16:30:03-1a4ba-11272- MON(HORC):number of Mus = 0
16:30:03-1e633-11272- MON(HOMRCF): Size of memory allocation for CONFIG_DB = 0 bytes.
16:30:03-2275a-11272- MON(HOMRCF):number of Mus = 0
16:30:07-b3adf-11271- horcmgr : executed 'system(/bin/ls /dev/rdsk/* | /HORCM/usr/bin/raidscan -find inst)'.
Audit Logging
Check: Always set full logging if possible. This was introduced with 01-17-03/05 – but it is disabled by default. The environment variable is HORCC_LOGSZ. If this environment variable is not set, only errors are logged. With this variable set, successful commands are logged as well – very useful if you need to know what was typed and when.
However, only the input is logged, not the output! So, always cut and paste the entire Command Prompt session and send that to GSC as well.
Check: Can the user reproduce this problem at will? If so, get them to stop CCI, delete the LOGx directories and then start CCI and issue the command that fails. This will make reading the LOGx files much easier as the only messages in the logs will be what you want to look at.
Windows
TSTART.BAT BAT file to start CCI and set the correct options for TC
@echo off
rem
rem Batch file to start HORCM for TrueCopy operations
rem
rem turn on CCI logging for 01-17-03/05 or later
set HORCC_LOGSZ=2048
rem
raidscan -x findcmddev h0,20
set horcmfctbl=2
rem set instance to match your naming convention for the PVOL instance
set horcminst=0
rem next line with a value for SI only!
set horcc_mrcf=
horcmstart 0 1
TSTOP.BAT BAT file to stop CCI
@echo off
rem
rem Batch file to stop HORCM after TrueCopy operations
rem
horcmshutdown 0 1
set horcmfctbl=
set horcminst=
set horcc_mrcf=
set HORCC_LOGSZ=
UNIX
Check: Always ask the user to “cut and paste” the command line input and output – you need to know what they typed and what the result was.
root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmstart.sh 4
starting HORCM inst 4
HORCM inst 4 starts successfully.
root@SYD-E250-1:/opt/HORCM/log4/curlog#:export HORCC_LOGSZ=2048
root@SYD-E250-1:/opt/HORCM/log4/curlog#:raidscan -p CL1-A
…
root@SYD-E250-1:/opt/HORCM/log4/curlog#:horcmshutdown.sh 4
inst 4:
HORCM Shutdown inst 4 !!!
root@SYD-E250-1:/opt/HORCM/log4/curlog#:cd ..
root@SYD-E250-1:/opt/HORCM/log4#:ls -al
total 10
drwxr-xr-x 4 root other 512 Mar 7 16:50 .
dr-xr-xr-x 12 root sys 512 Feb 22 15:04 ..
drwxr-xr-x 3 root other 512 Mar 7 16:49 curlog
-rw-r--r-- 1 root other 289 Mar 7 16:51 horcc_SYD-E250-1.log
drwxr-xr-x 3 root other 512 Mar 7 16:29 tmplog
Here are the contents of LOG file: horcc_SYD-E250-1.log
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:50:36 2007
CMDLINE : raidscan -p CL1-A
16:50:37-450c6-11368- [raidscan][exit(0)]
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:51:53 2007
CMDLINE : /usr/bin/horcctl -S
16:51:54-0f8cf-11376- [horcctl][exit(0)]
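When a customer sends a long command log, it can help to reduce it to one line per command. The following is a minimal sketch, assuming the log always has the layout shown above (a CMDLINE line followed later by an [exit(n)] line). The function name and regular expressions are illustrative only and are not part of CCI.

```python
import re

# Patterns inferred from the sample log above; other CCI levels may differ.
CMDLINE_RE = re.compile(r"^CMDLINE\s*:\s*(.+)$")
EXIT_RE = re.compile(r"\[(\w+)\]\[exit\((\d+)\)\]")


def summarize_command_log(text):
    """Return a list of (command line, exit code) pairs found in the log."""
    results = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        match = CMDLINE_RE.match(line)
        if match:
            current = match.group(1)
            continue
        match = EXIT_RE.search(line)
        if match and current is not None:
            results.append((current, int(match.group(2))))
            current = None
    return results


SAMPLE = """\
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:50:36 2007
CMDLINE : raidscan -p CL1-A
16:50:37-450c6-11368- [raidscan][exit(0)]
COMMAND NORMAL : EUserId for HORC[4] : root (0) Wed Mar 7 16:51:53 2007
CMDLINE : /usr/bin/horcctl -S
16:51:54-0f8cf-11376- [horcctl][exit(0)]
"""

if __name__ == "__main__":
    for cmd, rc in summarize_command_log(SAMPLE):
        print(f"exit={rc:<3} {cmd}")
```

Any non-zero exit code in the summary tells you which section of the full log to read first.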
Command Device Reject
Most CCI errors are self-explanatory – however, this one is usually impossible for the user to debug.
Here is a simple ShadowImage example:
HORCM8.CONF
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
localhost 11008 1000 3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
#
VG01 LDEV49 CL1-A-1 1 7 0
HORCM_INST
#dev_group ip_address service
#
VG01 localhost 11009
HORCM9.CONF
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
localhost 11009 1000 3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
#
VG01 LDEV49 CL1-A-1 1 8 0
HORCM_INST
#dev_group ip_address service
#
VG01 localhost 11008
Check: Is the user using “good syntax”?
- Even though this is a 9500V, users should always use Port-HSD-LUN syntax
- I strongly recommend not using Port-LDEV syntax – where is the cross-check?
- Is MU# specified for ShadowImage? On some levels of CCI, this is mandatory. However, you should specify it anyway as this is Best Practice.
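Some of these checks can be done before CCI is even started. The following is a rough sketch, assuming the two CONF files have the simple layout shown above (section names starting with HORCM_, comment lines starting with #). It checks that each HORCM_INST entry points at the other instance's HORCM_MON service, that the group exists on both sides, and that every HORCM_DEV row has an MU# column. The function names are hypothetical and this is not a replacement for raidscan or pairdisplay.

```python
def parse_horcm_conf(text):
    """Split a HORCM CONF file into {section name: [fields of each row]}."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("HORCM_"):
            current = line
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split())
    return sections


def cross_check(local_text, remote_text, label):
    """Return a list of problems found in local_text when paired with remote_text."""
    local = parse_horcm_conf(local_text)
    remote = parse_horcm_conf(remote_text)
    problems = []
    remote_service = remote["HORCM_MON"][0][1]
    remote_groups = {row[0] for row in remote.get("HORCM_DEV", [])}
    for row in local.get("HORCM_INST", []):
        group, service = row[0], row[2]
        if service != remote_service:
            problems.append(f"{label}: {group} uses service {service}, "
                            f"but the remote HORCM_MON uses {remote_service}")
        if group not in remote_groups:
            problems.append(f"{label}: {group} is not defined on the remote side")
    for row in local.get("HORCM_DEV", []):
        if len(row) < 6:  # dev_group dev_name port# TargetID LU# MU#
            problems.append(f"{label}: {row[0]}/{row[1]} has no MU#")
    return problems


HORCM8 = r"""
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
localhost 11008 1000 3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
VG01 LDEV49 CL1-A-1 1 7 0
HORCM_INST
#dev_group ip_address service
VG01 localhost 11009
"""

HORCM9 = r"""
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
localhost 11009 1000 3000
HORCM_CMD
\\.\CMD-977-5
HORCM_DEV
#dev_group dev_name port# TargetID LU# MU#
VG01 LDEV49 CL1-A-1 1 8 0
HORCM_INST
#dev_group ip_address service
VG01 localhost 11008
"""

if __name__ == "__main__":
    for problem in cross_check(HORCM8, HORCM9, "HORCM8") + cross_check(HORCM9, HORCM8, "HORCM9"):
        print(problem)
```

For the two files above the sketch prints nothing, which is consistent with the fact that the failure described in this section is not a CONF file problem.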
How to check if the HORCM CONF files are correct:
C:\HORCM\ETC>raidscan -p CL1-A-1 -m 0
PORT# /ALPA/C,TID#, LU#..Num(LDEV#....)...P/S, Status, LDEV#,P-Seq#,P-LDEV#
CL1-A-1/ef/ 5, 1, 0-0 .1(13)............S-VOL PAIR 13, ----- 10
CL1-A-1/ef/ 5, 1, 1-0 .1(29)............P-VOL PSUS 29, 977 309
CL1-A-1/ef/ 5, 1, 2-0 .1(48)............P-VOL PSUS 48, 977 300
CL1-A-1/ef/ 5, 1, 3-0 .1(309)...........S-VOL SSUS 309, ----- 29
CL1-A-1/ef/ 5, 1, 4-0 .1(310)...........S-VOL SSUS 310, ----- 29
CL1-A-1/ef/ 5, 1, 5-0 .1(308)...........S-VOL SSUS 308, ----- 24
CL1-A-1/ef/ 5, 1, 6-0 .1(305)...........S-VOL SSUS 305, ----- 1
CL1-A-1/ef/ 5, 1, 7-0 .1(49)............SMPL ---- -----, ----- -----
CL1-A-1/ef/ 5, 1, 8-0 .1(50)............SMPL ---- -----, ----- -----
C:\HORCM\ETC>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU-M) ,Seq#,LDEV#.P/S,Status, Seq#,P-LDEV# M
VG01 LDEV49(L) (CL1-A-1, 1, 7-0 ) 977 49.SMPL ----,----- ----- -
VG01 LDEV49(R) (CL1-A-1, 1, 8-0 ) 977 50.SMPL ----,----- ----- -
- Check the PVOL and SVOL LDEV numbers (in this case, 49 and 50)
- Check the Port/HSD/LUN (in this case, CL1-A-1 – LUNs 7 and 8)
But now the failure:
C:\HORCM\ETC>paircreate -g VG01 -vl
paircreate: [EX_CMDRJE] An order to the control/command device was rejected
Refer to the command log(C:\HORCM\log8\horcc_hp2k5_log.txt) for details.
And in the log we see this:
COMMAND ERROR : EUserId for HOMRCF[8] : Administrator (0) Wed Mar 07 17:02:30 2007
CMDLINE : paircreate -g VG01 -vl
17:02:30-9a8a8-12452- ERROR:cm_sndrcv[rc < 0 from HORCM]
17:02:30-9e728-12452- [paircreate] L_CMD(CREATEPAIR) ERROR :rc = -35
17:02:30-9e728-12452- [paircreate][exit(221)]
[EX_CMDRJE] An order to the control/command device was rejected
[Cause ]: An order to the command(control) device failed,or was rejected.
[Action]:Please confirm the following items.If this trouble doesn't resolve,then collect HORCM error logs(HORCM_LOG=C:\HORCM\log8\curlog) and Remote HORCM logs,and send them to service personnel.
(1) Check if the 'HORC or HOMRCF function' is installed in the RAID.
(2) Check if the RCP and LCP are installed in the RAID.
(3) Check if the path between the RAID CUs is established by using the SVP.
(4) Check if the pair target volume is an appropriate status.
Yes, the numbers -35 and 221 are meaningless on their own. If this is a RAID subsystem, check the SSBLOGS on the SVP. However, for DF, the SSB is logged in CCI.
Contents of C:\HORCM\log8\curlog\horcmlog_servername\horcm_log.txt
17:02:30-9a8a8-14140- SCSI : Check Condition.
17:02:30-9a8a8-14140- ***** SCSI SENSE DATA *****
---ADDR--- -OFF- 0-1-2-3- 4-5-6-7- 8-9-A-B- C-D-E-F- ------CHAR------
[0x0012f2b4]0000: 70000500 00000038 8400000d 961c0000 p......8........
[0x0012f2c4]0010: 00001000 00000000 00000000 00000000 ................
[0x0012f2d4]0020: 00000000 00000000 00000000 00000000 ................
[0x0012f2e4]0030: 00000000 00000000 00000000 00000000 ................
[0x0012f2f4]0040: 00000000 00000000 00000000 00000000 ................
[0x0012f304]0050: 00000000 00000000 00000000 00000000 ................
17:02:30-9a8a8-14140- SKEY = 0x05
17:02:30-9a8a8-14140- ASC = 0x96
17:02:30-9a8a8-14140- SSB = 0x8400,000d
The timestamp 17:02:30-9a8a8 is the cross-check: it matches the first error line in the command log. Next, it is not obvious, but the error code is:
961C 000D
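The code can be read straight out of the first row of the sense data dump. The following is a minimal sketch, assuming the dump is standard fixed-format SCSI sense data (response code 0x70): sense key in the low nibble of byte 2, ASC in byte 12, ASCQ in byte 13, and the SSB in bytes 8 to 11. Building the lookup key from ASC, ASCQ and the low 16 bits of the SSB is my reading of this one example, not something taken from the manual. The function names are hypothetical.

```python
def decode_fixed_sense(hex_words):
    """Decode the first 16 bytes of a fixed-format sense data dump."""
    data = bytes.fromhex("".join(hex_words.split()))
    if data[0] & 0x7F not in (0x70, 0x71):
        raise ValueError("not fixed-format sense data")
    return {
        "skey": data[2] & 0x0F,                   # sense key
        "asc": data[12],                          # additional sense code
        "ascq": data[13],                         # additional sense code qualifier
        "ssb": int.from_bytes(data[8:12], "big"), # shown by CCI as SSB
    }


def detail_code(sense):
    """Assumption: the manual's table is keyed by ASC+ASCQ and the low half of the SSB."""
    return f"{sense['asc']:02X}{sense['ascq']:02X} {sense['ssb'] & 0xFFFF:04X}"


if __name__ == "__main__":
    # First row of the dump in horcm_log.txt above.
    sense = decode_fixed_sense("70000500 00000038 8400000d 961c0000")
    print(f"SKEY = 0x{sense['skey']:02x}")
    print(f"ASC  = 0x{sense['asc']:02x}")
    print(f"SSB  = 0x{sense['ssb']:08x}")
    print(detail_code(sense))
```

The decoded SKEY, ASC and SSB values agree with the three summary lines that CCI prints after the dump, which is a useful sanity check that you are reading the right row.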
Now, get hold of the latest AMS CCI manual which contains Appendix A.4
A.4 How to Read Detailed Error Log Codes
and this subsection
A.4.4 Sense Code and Detail Code
Beware – some versions of this manual do not contain these sections. Find one that does.
Table A.5 Sense Codes and Detailed Codes
…
Error Contents Recommended Action
961C 000C The S-VOL is a Sub LU of a unified LU. Check the status of the LU.
961C 000D The default controllers controlling the P-VOL and S-VOL are not the same….
961C 000E The P-VOL is a Cache Residency LU. Check the status of …
In this case, the PVOL and SVOL default controllers are not the same.
“Old Syntax” HORCM CONF Files
This problem only applies to RAID subsystems from 9900V onwards. With 7700E and 9900, there were no Host Storage Domains (HSD) so all LUNs were on the “real” port. With 9900V, USP etc the LUNs are normally considered to be attached to “logical” ports – which are called HSD or Host Groups.
However, it is still possible to use the “old” syntax. This always causes confusion after a while as LUNs get added and deleted from various HSD. Here is an example:
- Imagine that 3 HSD are created on an empty port – HSD 1,2 and 3
- Each HSD has 3 LUNs added – numbered as 0, 1 and 2
If this is done in sequence, HSD 1 has “absolute” LUNs 0-2, HSD 2 has “absolute” LUNs 3-5 and HSD 3 has “absolute” LUNs 6-8
Now imagine that the following actions have been performed some time later:
- Delete HSD 2
- Add HSD 4 with LUNs 0 and 1
And then you allocate LUN 3 to HSD 1 and 3. If you did not know that the previous changes had been made, it would be impossible for you to “guess” that:
- HSD 1 LUN 3 was “absolute” LUN 5
- HSD 3 LUN 3 was “absolute” LUN 9
Even worse, you have no way of looking at the LUN allocations via Storage Navigator as that only shows “relative” LUN numbers.
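The example above can be replayed with a few lines of code. This is a sketch under one assumption that I have inferred from the example itself: each new LUN receives the lowest “absolute” LUN number that is currently free on the port. The Port class and its method names are hypothetical.

```python
class Port:
    """Toy model of one physical port and its "absolute" LUN numbers."""

    def __init__(self):
        self.absolute = {}  # absolute LUN -> (HSD number, relative LUN)

    def add_lun(self, hsd, relative):
        # Assumption: the lowest free absolute LUN number is always used.
        number = 0
        while number in self.absolute:
            number += 1
        self.absolute[number] = (hsd, relative)
        return number

    def delete_hsd(self, hsd):
        for number in [n for n, owner in self.absolute.items() if owner[0] == hsd]:
            del self.absolute[number]


def replay_example():
    port = Port()
    for hsd in (1, 2, 3):            # HSD 1, 2 and 3, each with LUNs 0-2
        for relative in (0, 1, 2):
            port.add_lun(hsd, relative)
    port.delete_hsd(2)               # frees absolute LUNs 3-5
    port.add_lun(4, 0)               # HSD 4 reuses absolute LUNs 3 and 4
    port.add_lun(4, 1)
    hsd1_lun3 = port.add_lun(1, 3)
    hsd3_lun3 = port.add_lun(3, 3)
    return hsd1_lun3, hsd3_lun3


if __name__ == "__main__":
    first, second = replay_example()
    print(f"HSD 1 LUN 3 is absolute LUN {first}")
    print(f"HSD 3 LUN 3 is absolute LUN {second}")
```

The result depends entirely on the order of the earlier changes, which is exactly why nobody can “guess” the absolute numbers afterwards.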
In a recent case, 47 S-VOL LUNs were deleted by mistake from a HSD. When the mistake was noticed, the same 47 S-VOL LUNs were added back in the “same order”. However, a subsequent pairdisplay showed the following:
TC-WRP 1003-108A(L) (CL2-F , 0, 45)32179 10b5.S-VOL PAIR ASYNC , 0 102e
TC-WRP 1003-108A(R) (CL1-C , 0, 4)32208 1003.P-VOL PAIR ASYNC , 0 108a - (1)
TC-WRP 1004-108B(L) (CL2-F , 0, 46)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1004-108B(R) (CL1-C , 0, 5)32208 1004.P-VOL PAIR ASYNC , 0 108b -
TC-WRP 1005-108C(L) (CL2-F , 0, 47)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1005-108C(R) (CL1-C , 0, 6)32208 1005.P-VOL PAIR ASYNC , 0 108c -
TC-WRP 1006-108D(L) (CL2-F , 0, 48)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1006-108D(R) (CL1-C , 0, 7)32208 1006.P-VOL PAIR ASYNC , 0 108d -
TC-WRP 1007-108E(L) (CL2-F , 0, 49)32179 108a.S-VOL PAIR ASYNC , 0 1003 - (2)
TC-WRP 1007-108E(R) (CL1-C , 0, 8)32208 1007.P-VOL PAIR ASYNC , 0 108e - (3)
...
What can you tell from the display above? Firstly, the pairdisplay was issued by the “DR” CCI server – as (L) refers to the S-VOL. Next, we have obvious mismatches – the (L) lines that show only dashes. What is less obvious is that the pairs on the lines marked (1), (2) and (3) are also invalid. Indeed:
(3) P-VOL is 10:07 and the associated S-VOL is 10:8E
(2) S-VOL is 10:8A and the associated P-VOL is 10:03
This entry does not go with (3)
(1) This is the associated P-VOL for (2)
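This kind of cross-check can be automated. The following is a rough sketch, assuming the pairdisplay layout shown above, where the last LDEV column of each line is the partner LDEV. For every pair name it checks that the LDEV on the (L) line is the partner reported on the (R) line, and vice versa. The regular expression is fitted to this sample only and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Fitted to the sample above; other pairdisplay options change the columns.
ROW_RE = re.compile(
    r"^(?P<group>\S+)\s+(?P<name>\S+)\((?P<side>[LR])\)\s+\([^)]*\)\s*"
    r"(?P<serial>\d+)\s+(?P<ldev>[0-9A-Fa-f]+|-+)\.(?P<role>\S+)"
    r".*?,\s*\S+\s+(?P<partner>\S+)"
)


def inconsistent_pairs(pairdisplay_text):
    """Return the pair names whose (L) and (R) lines do not describe each other."""
    rows = defaultdict(dict)
    for line in pairdisplay_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            rows[match["name"]][match["side"]] = (
                match["ldev"].lower(), match["partner"].lower())
    bad = []
    for name, sides in rows.items():
        if "L" not in sides or "R" not in sides:
            continue
        (l_ldev, l_partner), (r_ldev, r_partner) = sides["L"], sides["R"]
        if l_ldev != r_partner or r_ldev != l_partner:
            bad.append(name)
    return bad


SAMPLE = """\
TC-WRP 1003-108A(L) (CL2-F , 0, 45)32179 10b5.S-VOL PAIR ASYNC , 0 102e
TC-WRP 1003-108A(R) (CL1-C , 0, 4)32208 1003.P-VOL PAIR ASYNC , 0 108a - (1)
TC-WRP 1004-108B(L) (CL2-F , 0, 46)32179 -----.---- ---- ------,----- ----- -
TC-WRP 1004-108B(R) (CL1-C , 0, 5)32208 1004.P-VOL PAIR ASYNC , 0 108b -
TC-WRP 1007-108E(L) (CL2-F , 0, 49)32179 108a.S-VOL PAIR ASYNC , 0 1003 - (2)
TC-WRP 1007-108E(R) (CL1-C , 0, 8)32208 1007.P-VOL PAIR ASYNC , 0 108e - (3)
"""

if __name__ == "__main__":
    for name in inconsistent_pairs(SAMPLE):
        print(f"{name}: (L) and (R) do not match")
```

Every pair in the sample is reported, including 1003-108A and 1007-108E, which look healthy at first glance because both of their lines show PAIR status.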
Here is an excerpt from the “old” HORCM CONF file – using “absolute” LUN numbers:
TC-WRP 1003-108A CL2-F 0 45
TC-WRP 1004-108B CL2-F 0 46
TC-WRP 1005-108C CL2-F 0 47
TC-WRP 1006-108D CL2-F 0 48
And here is the same excerpt after the file has been changed to use HSD syntax:
TC-WRP 1003-108A CL2-F-2 0 6
TC-WRP 1004-108B CL2-F-2 0 7
TC-WRP 1005-108C CL2-F-2 0 8
TC-WRP 1006-108D CL2-F-2 0 9
As you can see, the new HORCM CONF file is easier to understand and compare with Storage Navigator.
By the way, here is how you find out the “absolute” and “relative” LUN numbers:
raidscan -p CL2-F -fx
...
CL2-F /88/ 3, 0, 49.1(108a)..........S-VOL PAIR ASYNC 108a, ----- 1003
CL2-F /88/ 3, 0, 50.1(108b)..........S-VOL PAIR ASYNC 108b, ----- 1004
CL2-F /88/ 3, 0, 51.1(108c)..........S-VOL PAIR ASYNC 108c, ----- 1005
...
raidscan -p CL2-F-2 -fx
...
CL2-F-2 /88/ 3, 0, 6.1(108a)..........S-VOL PAIR ASYNC 108a, ----- 1003
CL2-F-2 /88/ 3, 0, 7.1(108b)..........S-VOL PAIR ASYNC 108b, ----- 1004
CL2-F-2 /88/ 3, 0, 8.1(108c)..........S-VOL PAIR ASYNC 108c, ----- 1005
...
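When there are many LUNs, the two listings can be joined on the LDEV number to produce the conversion table for the CONF file. The following is a minimal sketch, assuming the raidscan line layout shown above (it does not handle the -m output format, where the LU# column carries an MU suffix). The function name and regular expression are illustrative only.

```python
import re

# PORT /ALPA/ C, TID, LU.Num(LDEV)... as in the raidscan samples above.
ROW_RE = re.compile(r"^(\S+)\s*/\w+/\s*\d+,\s*\d+,\s*(\d+)\.\d+\((\w+)\)")


def luns_by_ldev(raidscan_text):
    """Return {LDEV: LUN} for every device line in a raidscan listing."""
    table = {}
    for line in raidscan_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:
            table[match.group(3).lower()] = int(match.group(2))
    return table


ABSOLUTE = """\
CL2-F /88/ 3, 0, 49.1(108a)..........S-VOL PAIR ASYNC 108a, ----- 1003
CL2-F /88/ 3, 0, 50.1(108b)..........S-VOL PAIR ASYNC 108b, ----- 1004
CL2-F /88/ 3, 0, 51.1(108c)..........S-VOL PAIR ASYNC 108c, ----- 1005
"""

RELATIVE = """\
CL2-F-2 /88/ 3, 0, 6.1(108a)..........S-VOL PAIR ASYNC 108a, ----- 1003
CL2-F-2 /88/ 3, 0, 7.1(108b)..........S-VOL PAIR ASYNC 108b, ----- 1004
CL2-F-2 /88/ 3, 0, 8.1(108c)..........S-VOL PAIR ASYNC 108c, ----- 1005
"""


def join_on_ldev(absolute_text, relative_text):
    """Return {LDEV: (absolute LUN, relative LUN)} for LDEVs seen in both listings."""
    absolute = luns_by_ldev(absolute_text)
    relative = luns_by_ldev(relative_text)
    return {ldev: (absolute[ldev], relative[ldev])
            for ldev in absolute if ldev in relative}


if __name__ == "__main__":
    for ldev, (abs_lun, rel_lun) in join_on_ldev(ABSOLUTE, RELATIVE).items():
        print(f"LDEV {ldev}: CL2-F LUN {abs_lun} -> CL2-F-2 LUN {rel_lun}")
```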
Secured CMDDEV and HORCMPERM Implications
If you use a “normal”, i.e. non-secured, CMDDEV, you can control CCI for any LUNs on any Host. This also means that you can destroy anyone’s data by using SI to copy your LUNs over the top of their LUNs.
For this reason, you normally only let the Storage Administrator have access to a “normal” CMDDEV – and you always give normal users access to a Secured CMDDEV.
You can tell if a CMDDEV is secured as follows:
C:\HORCM\ETC>horcmstart 0
starting HORCM inst 0
HORCM inst 0 starts successfully.
C:\HORCM\ETC>set horcminst=0
C:\HORCM\ETC>horcctl -D
Current control device = \\.\PHYSICALDRIVE1
This is a “normal” CMDDEV. For this test, I also had access to a secured CMDDEV – and it is possible to swap between them as follows:
C:\HORCM\ETC>horcctl -C
Changed control device(\\.\PHYSICALDRIVE1 -> \\.\PHYSICALDRIVE10*
C:\HORCM\ETC>horcctl -D
Current control device = \\.\PHYSICALDRIVE10*
The asterisk means that the CMDDEV is secured. Normally, of course, you would not give any user access to different types of CMDDEV as that will cause problems.
To test what will happen before giving a secured CMDDEV to a user, you can set the HORCMPROMOD environment variable as follows:
C:\HORCM\etc>set HORCMPROMOD=1
C:\HORCM\ETC>horcmstart 410
starting HORCM inst 410
HORCM inst 0 starts successfully.
C:\HORCM\ETC>set horcminst=410
C:\HORCM\etc>horcctl -D
Current control device = \\.\PhysicalDrive53
Note, however, that this does not affect the horcctl display!
Here is some pairdisplay output when HORCMPROMOD is not set on any CCI server.
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01 d0(L) (CL2-D , 1, 410)77010027 410.P-VOL PAIR NEVER ,75010010 410 -
VG01 d0(R) (CL1-A , 1, 410)75010010 410.S-VOL PAIR NEVER ,----- 410 -
VG01 d1(L) (CL2-D , 1, 411)77010027 411.P-VOL PAIR NEVER ,75010010 411 -
VG01 d1(R) (CL1-A , 1, 411)75010010 411.S-VOL PAIR NEVER ,----- 411 -
VG01 d2(L) (CL2-D , 1, 412)77010027 412.P-VOL PAIR NEVER ,75010010 412 -
VG01 d2(R) (CL1-A , 1, 412)75010010 412.S-VOL PAIR NEVER ,----- 412 -
VG01 d3(L) (CL2-D , 1, 413)77010027 413.P-VOL PAIR NEVER ,75010010 413 -
VG01 d3(R) (CL1-A , 1, 413)75010010 413.S-VOL PAIR NEVER ,----- 413 -
VG01 d4(L) (CL2-D , 1, 414)77010027 414.P-VOL PAIR NEVER ,75010010 414 -
VG01 d4(R) (CL1-A , 1, 414)75010010 414.S-VOL PAIR NEVER ,----- 414 -
As you can see, LDEVs 410-414 on an AMS1000 (SN begins with 770x) are paired with LDEVs 410-414 on an AMS500 (SN begins with 750x).
Here is the same pairdisplay output when HORCMPROMOD has been set on both CCI servers.
C:\HORCM\etc>set HORCMPROMOD=1
C:\HORCM\etc>horcmstart 410
starting HORCM inst 410
HORCM inst 410 starts successfully.
C:\HORCM\ETC>set horcminst=410
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01 d0(L) (CL2-D , 1, 410)77010027 410.P-VOL PAIR NEVER ,75010010 410 -
VG01 d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01 d1(L) (CL2-D , 1, 411)77010027 411.P-VOL PAIR NEVER ,75010010 411 -
VG01 d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01 d2(L) (CL2-D , 1, 412)77010027 412.P-VOL PAIR NEVER ,75010010 412 -
VG01 d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01 d3(L) (CL2-D , 1, 413)77010027 413.P-VOL PAIR NEVER ,75010010 413 -
VG01 d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01 d4(L) (CL2-D , 1, 414)77010027 414.P-VOL PAIR NEVER ,75010010 414 -
VG01 d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -
As you can see, the local CCI instance (L) has access to all its LUNs/LDEVs. However, the DR CCI server (R) has no access to LDEVs 410-414.
If you attempt to do any commands such as pairsplit, the following will happen:
C:\HORCM\etc>pairsplit -g VG01
pairsplit: [EX_ENPERM] Permission denied with the LDEV
Refer to the command log(C:\HORCM\log410\horcc_Verdande_log.txt) for details.
You can use HORCMPERM*.CONF (* is the instance number) to further limit CCI access. HORCMPERM*.CONF does not give you access to LDEVs that you are not allowed to process. It removes access to LDEVs that you are allowed to process but do not wish to process.
How does this work? Let’s start instance 410 with HORCMPROMOD=1 and no HORCMPERM.CONF file. At the bottom of the start up log you will see this:
11:01:48-518b0-02092- HORCM has been set to the PROTECT MODE on 'ENV'.
...
11:01:50-e2900-01428- horcmgr : executed 'CreateProcess(raidscan.exe -pi $PhysicalDrive -find inst -z2w=NUL -z1w=NUL)',exit = 0.
As you can see, raidscan is called internally with an argument of -pi $PhysicalDrive (all physical drives) – thus allowing all LUNs on this server to be accessed.
Now let’s stop horcm and define a file as follows:
C:\HORCM\etc>type C:\WINDOWS\HORCMPERM410.CONF
hd0-56
C:\HORCM\etc>
HORCMPERM410.CONF contains a list of every device that we wish to be able to access via CCI. Here is the resultant pairdisplay after a restart of horcm:
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01 d0(L) (CL2-D , 1, 410)77010027 410.P-VOL PAIR NEVER ,75010010 410 -
VG01 d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01 d1(L) (CL2-D , 1, 411)77010027 411.P-VOL PAIR NEVER ,75010010 411 -
VG01 d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01 d2(L) (CL2-D , 1, 412)77010027 412.P-VOL PAIR NEVER ,75010010 412 -
VG01 d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01 d3(L) (CL2-D , 1, 413)77010027 *****.---- ---- ------,----- ----- -
VG01 d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01 d4(L) (CL2-D , 1, 414)77010027 *****.---- ---- ------,----- ----- -
VG01 d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -
The d3(L) and d4(L) lines show what has changed: they now display asterisks instead of LDEVs 413 and 414. Here is the bottom of the start up log file.
11:08:03-7d3e8-02408- horcmgr : executed 'CreateProcess(raidscan.exe -find inst -z0r=C:\WINDOWS\horcmperm410.conf -z2w=NUL -z1w=NUL)',exit = 0.
As you can see, raidscan has been called internally and is using the list of disks in HORCMPERM410.CONF to determine which LDEVs can be accessed. Here is some inqraid output:
C:\HORCM\etc>inqraid $LETALL -CLI
DEVICE_FILE PORT SERIAL LDEV CTG H/M/12 SSID R:Group PRODUCT_ID
E:\Vol13\Dsk54 CL2-D 77010027 410 - P/s/ss 0000 A:07-00 DF600F
F:\Vol14\Dsk55 CL2-D 77010027 411 - P/s/ss 0000 A:07-00 DF600F
Q:\Vol11\Dsk12 CL1-B 3157 169 - P/s/ss 0000 5:02-00 DF600F
G:\Vol15\Dsk56 CL2-D 77010027 412 - P/s/ss 0000 A:07-00 DF600F
R:\Vol12\Dsk13 CL1-B 3157 170 - P/s/ss 0000 5:02-00 DF600F
H:\Vol16\Dsk57 CL2-D 77010027 413 - P/s/ss 0000 A:07-00 DF600F
I:\Vol17\Dsk58 CL2-D 77010027 414 - P/s/ss 0000 A:07-00 DF600F
J:\Vol2\Dsk0 - - - - - - - ST336754LC
The Dsk57 and Dsk58 lines show that LDEVs 413 and 414 are Physical Drives 57 and 58 – and as we only allowed access to Physical Drives 0-56, this explains why the pairdisplay has changed.
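The same comparison can be scripted when the drive list is long. This is a sketch only: it assumes that a HORCMPERM entry has the form hdN or hdN-M, as in the one-line file above, and that the Windows drive number is the DskN suffix of the inqraid DEVICE_FILE column. The full HORCMPERM syntax may allow more than this, and the function names are hypothetical.

```python
import re


def allowed_drives(horcmperm_text):
    """Expand entries such as 'hd0-56' into a set of physical drive numbers."""
    allowed = set()
    for token in horcmperm_text.split():
        match = re.fullmatch(r"hd(\d+)(?:-(\d+))?", token, re.IGNORECASE)
        if match:
            low = int(match.group(1))
            high = int(match.group(2) or low)
            allowed.update(range(low, high + 1))
    return allowed


def drive_number(device_file):
    """Extract N from an inqraid DEVICE_FILE such as 'H:\\Vol16\\Dsk57'."""
    match = re.search(r"Dsk(\d+)$", device_file)
    return int(match.group(1)) if match else None


# (DEVICE_FILE, LDEV) pairs for serial 77010027, copied from the inqraid output above.
INQRAID_ROWS = [
    (r"E:\Vol13\Dsk54", 410),
    (r"F:\Vol14\Dsk55", 411),
    (r"G:\Vol15\Dsk56", 412),
    (r"H:\Vol16\Dsk57", 413),
    (r"I:\Vol17\Dsk58", 414),
]


def blocked_ldevs(horcmperm_text, rows):
    """Return the LDEVs whose physical drive is not listed in HORCMPERM."""
    allowed = allowed_drives(horcmperm_text)
    return [ldev for device, ldev in rows if drive_number(device) not in allowed]


if __name__ == "__main__":
    print(blocked_ldevs("hd0-56", INQRAID_ROWS))
```

With the file contents hd0-56 the sketch reports LDEVs 413 and 414, which are the two (L) entries that show asterisks in the pairdisplay.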
Note that it is possible to “fix” this “mistake” by manual use of the raidscan command as follows:
C:\HORCM\etc>echo hd57-58 | raidscan -find inst
DEVICE_FILE Group PairVol PORT TARG LUN M SERIAL LDEV
Harddisk57 VG01 d3 CL2-D 1 413 0 77010027 413
Harddisk57 VG01 d3 CL2-D 1 413 - 77010027 413
Harddisk58 VG01 d4 CL2-D 1 414 0 77010027 414
Harddisk58 VG01 d4 CL2-D 1 414 - 77010027 414
C:\HORCM\etc>pairdisplay -g VG01
Group PairVol(L/R) (Port#,TID, LU),Seq#,LDEV#.P/S,Status,Fence,Seq#,P-LDEV# M
VG01 d0(L) (CL2-D , 1, 410)77010027 410.P-VOL PAIR NEVER ,75010010 410 -
VG01 d0(R) (CL1-A , 1, 410)75010010 *****.---- ---- ------,----- ----- -
VG01 d1(L) (CL2-D , 1, 411)77010027 411.P-VOL PAIR NEVER ,75010010 411 -
VG01 d1(R) (CL1-A , 1, 411)75010010 *****.---- ---- ------,----- ----- -
VG01 d2(L) (CL2-D , 1, 412)77010027 412.P-VOL PAIR NEVER ,75010010 412 -
VG01 d2(R) (CL1-A , 1, 412)75010010 *****.---- ---- ------,----- ----- -
VG01 d3(L) (CL2-D , 1, 413)77010027 413.P-VOL PAIR NEVER ,75010010 413 -
VG01 d3(R) (CL1-A , 1, 413)75010010 *****.---- ---- ------,----- ----- -
VG01 d4(L) (CL2-D , 1, 414)77010027 414.P-VOL PAIR NEVER ,75010010 414 -
VG01 d4(R) (CL1-A , 1, 414)75010010 *****.---- ---- ------,----- ----- -
Of course, you are unlikely to fix such an issue with raidscan. You would normally fix HORCMPERM*.CONF and then stop and restart horcm.
“Basic” HORCM CONF problems
When HORCM will not start, you strip the CONF file back to the bare essentials – and then change one thing at a time. Sometimes even this fails. Here are the most common reasons.
HORCM_MON
#ip_address service poll(10ms) timeout(10ms)
10.129.3.127 11042 1000 3000
HORCM_CMD
#dev_name
# CMDDEV0 - USP600 - SN 10111 - \\.\CMD-10111-4
The above file is correct – let us make some simple changes to break it.
1 Wrong IP Address
Change 10.129.3.127 to 10.129.2.127. A simple typo, but here is what you get:
Windows
[System Call Error]
SysCall: bind
WSAerr : 10049(0x00002741) (See winsock2.h)
ErrInfo: Internal Error
ErrTime: Mon Sep 08 12:43:03 2008
SrcFile: shorcmc.c
SrcLine: 2405
ERROR:cmr_repcre[scmcrepcr fail]
Of course, it is the “Internal Error” that confuses most people here. The real error is in the line above. It is the result of a standard call to an OS socket service, in this case Winsock.
Here is the relevant section from winsock2.h:
#define WSABASEERR 10000
…
#define WSAEADDRNOTAVAIL (WSABASEERR+49)
Here is a useful web page – and some useful information:
http://www.sockets.com/err_lst1.htm
WSAEADDRNOTAVAIL (10049) Cannot assign requested address.
Berkeley description: Normally results from an attempt to create a socket with an address not on this machine.
So, the error is obvious when you know where to look. The problem is, not many people know where to look!
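The lookup is simple arithmetic: subtract WSABASEERR (10000) from the WSAerr value and find the result in winsock2.h. The following is a minimal sketch of that step. The name table holds only the two codes that appear in this document, so anything else must still be looked up by hand. The function name is hypothetical.

```python
import re

WSABASEERR = 10000

# Only the offsets quoted in this document; winsock2.h defines many more.
WINSOCK_NAMES = {
    13: "WSAEACCES",
    49: "WSAEADDRNOTAVAIL",
}

WSAERR_RE = re.compile(r"WSAerr\s*:\s*(\d+)\(0x([0-9a-fA-F]+)\)")


def decode_wsaerr(line):
    """Return (code, symbolic name) for a 'WSAerr :' line, or None if absent."""
    match = WSAERR_RE.search(line)
    if not match:
        return None
    code = int(match.group(1))
    if code != int(match.group(2), 16):
        raise ValueError("decimal and hexadecimal values disagree")
    name = WINSOCK_NAMES.get(code - WSABASEERR, "unknown, see winsock2.h")
    return code, name


if __name__ == "__main__":
    print(decode_wsaerr("WSAerr : 10049(0x00002741) (See winsock2.h)"))
```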
UNIX
UNIX error messages are not only different, they are different on each platform! Here is the same error for Solaris:
[System Call Error]
SysCall: bind
Errorno: 126 (Cannot assign requested address)
ErrInfo: Internal Error
ErrTime: Tue Sep 2 11:45:40 2008
SrcFile: shorcmc.c
SrcLine: 2427
ERROR:cmr_repcre[scmcrepcr fail]
Here is a useful web page:
http://www.ioplex.com/~miallen/errcmpp.html
The relevant line for this error says:
Error | AIX 4.3,5.1 | HP-UX 11.22 | Solaris 9,10
EADDRNOTAVAIL | 68 Can't assign requested address | 227 Can't assign requested address | 126 Can't assign requested address
Once again, this is not the most intuitive error I have seen.
2 Invalid CMDDEV
Here is what you get if you change the CMDDEV to \\.\CMD-10111-42
12:52:23-16b48-04004- horcread():cannot open command device:\\.\CMD-10111-42
12:52:23-16b48-04004- [WARNING] This device(\\.\CMD-10111-42) is not ready for receiving a command.
12:52:23-16b48-04004- No device is ready for receiving a command in 1 line from HORC_CMD.
12:52:23-16b48-04004- ERROR:horcm_cfg_create
12:52:28-0b3b0-01136- horcmgr:Failed to connect to HORCM.
Here I think it is pretty obvious what the problem is.
3 Invalid service name
Change 11042 to “horcm42”
17:29:02-d59f8-02260- [horcmcfgrdf] open(conf_file) OK.
17:29:02-d59f8-02260- ERROR: A wrong ipaddr or servicename line exists in HORCM_MON: line 4
17:29:02-d59f8-02260- 10.129.3.127 horcm42 1000 3000
17:29:02-d59f8-02260- [horcmcfgrdf] close(conf_file) OK.
17:29:02-d59f8-02260- ERROR:horcm_cfg_create
Once again, it is more obvious what is wrong.
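You can check a service name on the Host before starting HORCM. The following is a small sketch that uses the standard resolver call, socket.getservbyname, which consults the operating system's services database. I am assuming here that HORCM resolves the name in the same way, so treat a failure as a strong hint rather than proof. The function name is hypothetical.

```python
import socket


def resolve_service(service, proto="udp"):
    """Return the port number for a HORCM_MON service field, or None if unknown."""
    if service.isdigit():
        return int(service)          # a plain port number needs no lookup
    try:
        return socket.getservbyname(service, proto)
    except OSError:
        return None                  # name is not defined for this protocol


if __name__ == "__main__":
    for candidate in ("11042", "horcm42"):
        port = resolve_service(candidate)
        if port is None:
            print(f"{candidate}: not defined in the services database")
        else:
            print(f"{candidate}: port {port}")
```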
4 UDP port which is in use
Change 11042 to 1030. This is not a “sensible” port number. It was chosen to cause an error.
[System Call Error]
SysCall: bind
WSAerr : 10013(0x0000271d) (See winsock2.h)
ErrInfo: Internal Error
ErrTime: Mon Sep 08 17:39:46 2008
SrcFile: shorcmc.c
SrcLine: 2405
ERROR:cmr_repcre[scmcrepcr fail]
Here is the relevant section from winsock2.h:
#define WSAEACCES (WSABASEERR+13)
The following web page has more information:
http://www.sockets.com/err_lst1.htm
WSAEACCES (10013) Permission denied.
Berkeley description: An attempt was made to access a file in a way forbidden by its file access permissions.
However, in this case, that is hardly descriptive of the problem. Of course, if one had access to a command prompt, one could do this:
C:\HORCM\ETC>netstat -a -p UDP
Active Connections
Proto Local Address Foreign Address State
UDP ml_acer510:microsoft-ds *:*
UDP ml_acer510:isakmp *:*
UDP ml_acer510:1030 *:*
…
UDP ml_acer510:54323 *:*
It is not likely that you will be this lucky.
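If netstat is not available, a direct test is to try to bind the port yourself. The following is a minimal sketch using the standard socket module. It only says whether this process can bind the UDP port right now, so a False result can also mean a permissions problem (for example, a port below 1024 on UNIX when not running as root). The function name is hypothetical.

```python
import socket


def udp_port_is_free(port, host="0.0.0.0"):
    """Return True if a UDP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
    except OSError:
        return False
    else:
        return True
    finally:
        sock.close()


if __name__ == "__main__":
    for port in (11042, 1030):
        state = "free" if udp_port_is_free(port) else "in use or not permitted"
        print(f"UDP port {port}: {state}")
```

Run it with HORCM stopped. Otherwise the port you are checking will, correctly, be reported as in use by HORCM itself.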