An Oracle blog – Amardeep Sidhu

November 18, 2017

root.sh fails with CRS-2101:The OLR was formatted using version 3

Filed under: RAC,Troubleshooting — Sidhu @ 4:03 pm
Tags: ,

Got this while trying to install 11.2.0.4 RAC on Redhat Linux 7.2. root.sh fails with a message like

ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2017-11-09 15:43:37.883:
[client(37246)]CRS-2101:The OLR was formatted using version 3.

This is bug 18370031. Need to apply the patch before running root.sh.

April 28, 2017

Failed to create voting files on disk group RECOC1

Filed under: Exadata,RAC — Sidhu @ 2:31 pm
Tags: , ,

Long story short, faced this issue while running OneCommand for one Exadata system. The root.sh step (Initialize Cluster Software) was failing with the following error on the screen

Checking file root_dm01dbadm02.in.oracle.com_2017-04-27_18-13-27.log on node dm01dbadm02.somedomain.com
Error: Error running root scripts, please investigate…
Collecting diagnostics…
Errors occurred. Send /u01/onecommand/linux-x64/WorkDir/Diag-170427_181710.zip to Oracle to receive assistance.

Doesn’t make much sense. So let us check the log file of this step

2017-04-27 18:17:10,463 [INFO][  OCMDThread][        ClusterUtils:413] Checking file root_dm01dbadm02.somedomain.com_2017-04-27_18-13-27.log on node inx321dbadm02.somedomain.com
2017-04-27 18:17:10,464 [INFO][  OCMDThread][        OcmdException:62] Error: Error running root scripts, please investigate…
2017-04-27 18:17:10,464 [FINE][  OCMDThread][        OcmdException:63] Throwing OcmdException… message:Error running root scripts, please investigate…

So we need to go to root.sh log file now. That shows

Failed to create voting files on disk group RECOC1.
Change to configuration failed, but was successfully rolled back.
CRS-4000: Command Replace failed, or completed with errors.
Voting file add failed
2017/04/27 18:16:37 CLSRSC-261: Failed to add voting disks

Died at /u01/app/12.1.0.2/grid/crs/install/crsinstall.pm line 2068.
The command ‘/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/root
crs.pl ‘ execution failed

Makes some senses but we can’t understand what happened while creating Voting files on RECOC1. Let us check ASM alert log also

NOTE: Creating voting files in diskgroup RECOC1
Thu Apr 27 18:16:36 2017
NOTE: Voting File refresh pending for group 1/0x39368071 (RECOC1)
Thu Apr 27 18:16:36 2017
NOTE: Attempting voting file creation in diskgroup RECOC1
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM01
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM02
NOTE: voting file allocation (replicated) on grp 1 disk RECOC1_CD_00_DM01CELADM03
NOTE: voting file allocation on grp 1 disk RECOC1_CD_00_DM01CELADM03
ERROR: Voting file allocation failed for group RECOC1
Thu Apr 27 18:16:36 2017
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_ora_228588.trc:
ORA-15274: Not enough failgroups (5) to create voting files

So we can see the issue here. We can look at the above trace file also for more detail.

Now to why did this happen ?

The RECOC1 is a HIGH redundancy disk group which means that if we want to place Voting files there, it should have at least 5 failure groups. In this configuration there are only 3 cells and that doesn’t meet the minimum failure groups condition (1 cell = 1 failgroup in Exadata).

Now to how did it happen ?

This one was an Exadata X3 half rack and we planned to deploy it (for testing purpose) as 2 quarter racks : 1st cluster with db1, db2 + cell1, cell2, cell3 and 2nd cluster with db3, db4 + cell4, cell5, cell6, cell7. All the disk groups to be in High redundancy.

Before a certain 12.x Exadata software version it was not even possible to have all disk groups in High redundancy in a quarter rack as to have Voting disk in a High redundancy disk group you need to have a minimum of 5 failure groups (as mentioned above). In a quarter rack you have only 3 fail groups. With a certain 12.x Exadata software version a new feature quorum disks was introduced which made is possible to have that configuration. Read this link for more details. Basically we take a slice of disk from each DB node and add it to the disk group where you want to have the Voting file. 3 cells + 2 disks from DB nodes makes it 5 so all is good.

Now while starting with the deployment we noticed that db node1 was having some hardware issues. As we needed the machine for testing so we decided to build the first cluster with 1 db node only. So the final configuration of 1st cluster had 1 db node + 3 cells. We imported the XML back in OEDA, modified the cluster 1 configuration to 1 db node and generated the configuration files. That is where the problem started. The RECO disk group still was High redundancy but as we had only 1 db node at this stage so the configuration was not even a candidate for quorum disks. Hence the above error. Changing DBFS_DG to Normal redundancy fixed the issue as when DBFS_DG is Normal redundancy, OneCommand will place the Voting files there.

Ideally it shouldn’t happened as OEDA shouldn’t allow a configuration that is not doable. The case here is that as originally the configuration was having 2 db nodes + 3 cells so High redundancy for all disk groups was allowed in OEDA. While modifying the configuration when one db node was removed from the cluster, OEDA probably didn’t run the redundancy check on disk groups and it allowed the go past that screen. If you try to create a new configuration with 1 db node + 3 cells, it will not allow you to choose High redundancy for all disk groups. DBFS will remain in Normal redundancy. You can’t change that.

March 28, 2017

Oracle RAC 12.1 – lsnodes exited with code 9

Filed under: RAC — Sidhu @ 10:01 pm
Tags: , ,

I was trying to do a 2 node RAC setup on Solaris 11.3 where Oracle Solaris Cluster 4.3 was already configured. Installed was running but the Cluster Node Information screen was appearing like this

error

The install log shows this:

INFO: Checking cluster configuration details

INFO: Found Vendor Clusterware. Fetching Cluster Configuration

INFO: Executing [/tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes]

with environment variables {TERM=xterm, LC_COLLATE=, SHLVL=3, JAVA_HOME=, XFILESEARCHPATH=/usr/dt/app-defaults/%L/Dt, SSH_CLIENT=172.16.64.55 56370 22, LC_NUMERIC=, LC_MESSAGES=, MAIL=/var/mail/oracle, PWD=/export/software/grid/grid, XTERM_VERSION=XTerm(320), WINDOWID=2097165, LOGNAME=oracle, _=*50727*/export/software/grid/grid/install/.oui, NLSPATH=/usr/dt/lib/nls/msg/%L/%N.cat, SSH_CONNECTION=172.16.64.55 56370 172.16.72.18 22, OLDPWD=/export/oracle, LC_CTYPE=, CLASSPATH=, PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install, LC_ALL=, DISPLAY=localhost:10.0, LC_MONETARY=, USER=oracle, HOME=/export/oracle, XTERM_SHELL=/bin/bash, XAUTHORITY=/tmp/ssh-xauth-mlq21a/xauthfile, A__z=”*SHLVL, XTERM_LOCALE=en_US.UTF-8, TZ=localtime, LC_TIME=, LANG=en_US.UTF-8}

INFO: Starting Output Reader Threads for process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

INFO: The process /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes exited with code 9

So we can see the problem. lsnodes is not able to list the nodes. Let us try to run that command manually.

-bash-4.1$ export PATH=PATH=/usr/bin:/usr/ccs/bin:/usr/bin:/bin:/export/software/grid/grid/install

-bash-4.1$ /tmp/OraInstall2017-03-28_12-50-48PM/ext/bin/lsnodes

ld.so.1: lsnodes: fatal: libskgxn2.so: open failed: No such file or directory

Killed

-bash-4.1$

So looks like it is not able to find this library called libskgxn2.so. If we do a find for this file name we can see that it is present in this directory /usr/cluster/lib/sparcv9/libskgxn2.so .

Some googling and MOS searches revealed that it expects the library to be present at /opt/ORCLcluster/lib. This directory doesn’t exist here. As a workaround we can create this directory manually and create symbolic link to file libskgxn2.so

The lsnodes command worked fine after this workaround and installer also shows both the nodes listed.

September 13, 2016

addNode.sh, failed root.sh and IB listener

Filed under: Exadata,RAC — Sidhu @ 7:44 pm
Tags: , ,

So this customer has an Exadata quarter rack and they have an IB listener configured on both DB nodes (for DB connections from a multi-racked Exalogic system). We were adding a new DB node to this rack. So just followed the standard procedure of creating users, directories etc on the new node, setting up ssh equivalence and running addNode.sh. All went fine but root.sh failed. Little looking into the logs revealed that it failed while running srvctl start listener –n <node_name>

If we manually run this command, it will immediately reveal what the problem is. It is not able to start IB listener on the new node as the IB VIP doesn’t yet exist. It could happen for any of the additional networks added.

There is a MOS note that describes this exact situation but the solution that it gives is to remove the additional listener, complete addNode.sh & root.sh and add the additional listener back. That wasn’t possible in this case. After little bit of googling I stumbled upon this post by Jeremy Schneider. His colleague solved this problem with a very simple and clever workaround. Before root.sh prepares to run srvctl start listener command, run the add VIP command from another Window Winking smile. Additional network would have already got added when root.sh runs on the new node.

To be able to perform this trick, you have to have the hosts file updated with the new VIP name and IP and be ready with the command to add the VIP. While root.sh is running, it will show a message like “there is already an active cluster, restarting to join”, immediately start trying to run srvctl add vip command in another window. The moment CRS, comes up the command will succeed. Immediately after that root.sh is going to run srvctl start listener command, and this time it shouldn’t fail as the VIP is already added.

Another small mistake we made was not updating the cellip.ora on the new node before running root.sh. That caused the root.sh to fail as it couldn’t talk to ASM running on existing cell nodes. Updating cellip.ora with the existing storage node IPs fixed the problem.

Theme: Rubric. Get a free blog at WordPress.com

%d bloggers like this: