Category Archives: RAC

[FATAL] [INS-44000] Passwordless SSH connectivity is not setup

Faced this while running installer for setting up a 2 node RAC setup (version 19.8) on an Oracle SuperCluster. The error reported in the log is:

[FATAL] [INS-44000] Passwordless SSH connectivity is not setup from the local node node1 to the following nodes:
[node2]
[INS-06006] Passwordless SSH connectivity not set up between the following node(s): [node2]

From the error it appears that the ssh is not setup between two nodes but actually that is not the case. Here the error message is bit misleading. It turned out to be an issue with scp with openssh version 8.x. Running the setup with -debug option gives the clue:

<protocol error: filename does not match request>

The reason is a new check introduced in openssh version 8.x. It is explained here, here and here. MOS note 2555697.1 also talks about it.

Workaround is to pass the -T option to scp to ignore the new checks. You can rename the scp binary to something like scp.original and create a new shell script there like this:

cd /usr/bin
mv scp scp.original
vi scp
/usr/bin/scp.original -T $*
chmod 555 scp

This time, the install should succeed. You can revert the changes back once the install is done.

PRVF-4657 : Name resolution setup check for “db-scan” (IP address: x.x.x.101) failed

A quick note about an error I faced while running root.sh on an Exadata machine. The configuration tools failed with the following error:

Error is PRVF-4657 : Name resolution setup check for "db-scan" (IP address: x.x.x.101) failed

I did nslookup on the scan name and it all seemed good. So why the error ? After spending another 5 minutes, I looked at /etc/hosts and there was it. Someone had populated /etc/hosts of DB nodes with all the hostnames entries including the scan name. Something like:

x.x.x.101	db-scan.example.com	db-scan
x.x.x.102	db-scan.example.com	db-scan
x.x.x.103	db-scan.example.com	db-scan

As /etc/hosts can return only one IP against a hostname whereas for scan, DNS is supposed to return 3 IPs, hence the problem. The solution is to comment out the scan name entries in /etc/hosts on all the db nodes and let the system do the name resolution via the DNS.

root.sh fails with CRS-2101:The OLR was formatted using version 3

Got this while trying to install 11.2.0.4 RAC on Redhat Linux 7.2. root.sh fails with a message like

ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2017-11-09 15:43:37.883:
[client(37246)]CRS-2101:The OLR was formatted using version 3.

This is bug 18370031. Need to apply the patch before running root.sh.

Failed to create voting files on disk group RECOC1

Long story short, faced this issue while running OneCommand for one Exadata system. The root.sh step (Initialize Cluster Software) was failing with the following error on the screen

Checking file root_dm01dbadm02.in.oracle.com_2017-04-27_18-13-27.log on node dm01dbadm02.somedomain.com
Error: Error running root scripts, please investigate…
Collecting diagnostics…
Errors occurred. Send /u01/onecommand/linux-x64/WorkDir/Diag-170427_181710.zip to Oracle to receive assistance.

Doesn’t make much sense. So let us check the log file of this step

2017-04-27 18:17:10,463 [INFO][  OCMDThread][        ClusterUtils:413] Checking file root_dm01dbadm02.somedomain.com_2017-04-27_18-13-27.log on node inx321dbadm02.somedomain.com
2017-04-27 18:17:10,464 [INFO][  OCMDThread][        OcmdException:62] Error: Error running root scripts, please investigate…
2017-04-27 18:17:10,464 [FINE][  OCMDThread][        OcmdException:63] Throwing OcmdException… message:Error running root scripts, please investigate…

So we need to go to root.sh log file now. That shows

Failed to create voting files on disk group RECOC1.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
Voting file add failed
2017/04/27 18:16:37 CLSRSC-261: Failed to add voting disks

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 2068.
The command ‘/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/root
crs.pl ‘ execution failed

Makes some senses but we can’t understand what happened while creating Voting files on RECOC1. Let us check ASM alert log also

NOTE: Creating voting files in diskgroup RECOC1
Thu Apr 27 18:16:36 2017
NOTE: Voting File refresh pending for group 1/0x39368071 (RECOC1)
Thu Apr 27 18:16:36 2017
NOTE: Attempting voting file creation in diskgroup RECOC1
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM03
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM03
ERROR: Voting file allocation failed for group RECOC1
Thu Apr 27 18:16:36 2017
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_228588.trc:
ORA-15274: Not enough failgroups (5) to create voting files

So we can see the issue here. We can look at the above trace file also for more detail.

Now to why did this happen ?

The RECOC1 is a HIGH redundancy disk group which means that if we want to place Voting files there, it should have at least 5 failure groups. In this configuration there are only 3 cells and that doesn’t meet the minimum failure groups condition (1 cell = 1 failgroup in Exadata).

Now to how did it happen ?

This one was an Exadata X3 half rack and we planned to deploy it (for testing purpose) as 2 quarter racks : 1st cluster with db1, db2 + cell1, cell2, cell3 and 2nd cluster with db3, db4 + cell4, cell5, cell6, cell7. All the disk groups to be in High redundancy.

Before a certain 12.x Exadata software version it was not even possible to have all disk groups in High redundancy in a quarter rack as to have Voting disk in a High redundancy disk group you need to have a minimum of 5 failure groups (as mentioned above). In a quarter rack you have only 3 fail groups. With a certain 12.x Exadata software version a new feature quorum disks was introduced which made is possible to have that configuration. Read this link for more details. Basically we take a slice of disk from each DB node and add it to the disk group where you want to have the Voting file. 3 cells + 2 disks from DB nodes makes it 5 so all is good.

Now while starting with the deployment we noticed that db node1 was having some hardware issues. As we needed the machine for testing so we decided to build the first cluster with 1 db node only. So the final configuration of 1st cluster had 1 db node + 3 cells. We imported the XML back in OEDA, modified the cluster 1 configuration to 1 db node and generated the configuration files. That is where the problem started. The RECO disk group still was High redundancy but as we had only 1 db node at this stage so the configuration was not even a candidate for quorum disks. Hence the above error. Changing DBFS_DG to Normal redundancy fixed the issue as when DBFS_DG is Normal redundancy, OneCommand will place the Voting files there.

Ideally it shouldn’t happened as OEDA shouldn’t allow a configuration that is not doable. The case here is that as originally the configuration was having 2 db nodes + 3 cells so High redundancy for all disk groups was allowed in OEDA. While modifying the configuration when one db node was removed from the cluster, OEDA probably didn’t run the redundancy check on disk groups and it allowed the go past that screen. If you try to create a new configuration with 1 db node + 3 cells, it will not allow you to choose High redundancy for all disk groups. DBFS will remain in Normal redundancy. You can’t change that.

Oracle RAC 12.1 – lsnodes exited with code 9

I was trying to do a 2 node RAC setup on Solaris 11.3 where Oracle Solaris Cluster 4.3 was already configured. Installed was running but the Cluster Node Information screen was appearing like this

error

The install log shows this:

INFO: Checking cluster configuration details

INFO: Found Vendor Clusterware. Fetching Cluster Configuration

INFO: Executing [/tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes]

with environment variables {TERM=xterm, LC_COLLATE=, SHLVL=3, JAVA_HOME=, XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt, SSH_CLIENT=172.16.64.55 56370 22, LC_NUMERIC=, LC_MESSAGES=, MAIL=/var/mail/oracle, PWD=/export/software/grid/grid, XTERM_VERSION=XTerm(320), WINDOWID=2097165, LOGNAME=oracle, _=*50727*/export/software/grid/grid/install/.oui, NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat, SSH_CONNECTION=172.16.64.55 56370 172.16.72.18 22, OLDPWD=/export/oracle, LC_CTYPE=, CLASSPATH=, PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install, LC_ALL=, DISPLAY=localhost:10.0, LC_MONETARY=, USER=oracle, HOME=/export/oracle, XTERM_SHELL=/bin/bash, XAUTHORITY=/tmp/ssh-xauth-mlq21a/xauthfile, A__z=”*SHLVL, XTERM_LOCALE=en_US.UTF-8, TZ=localtime, LC_TIME=, LANG=en_US.UTF-8}

INFO: Starting Output Reader Threads for process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

INFO: The process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes exited with code 9

So we can see the problem. lsnodes is not able to list the nodes. Let us try to run that command manually.

-bash-4.1$ export PATH=PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install

-bash-4.1$ /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

ld.so.1: lsnodes: fatal: libskgxn2.so: open failed: No such file or directory

Killed

-bash-4.1$

So looks like it is not able to find this library called libskgxn2.so. If we do a find for this file name we can see that it is present in this directory /usr/cluster/lib/sparcv9/libskgxn2.so .

Some googling and MOS searches revealed that it expects the library to be present at /opt/ORCLcluster/lib. This directory doesn’t exist here. As a workaround we can create this directory manually and create symbolic link to file libskgxn2.so

The lsnodes command worked fine after this workaround and installer also shows both the nodes listed.