dbnodeupdate.sh appears to be stuck

I was patching an Exadata db node from 18.1.5.0.0.180506 to 19.3.2.0.0.191119. More than an hour had passed and dbnodeupdate.sh appeared to be stuck. Trying to ssh to the node gave “connection refused”, and the console showed this output (some output removed for brevity):

[  458.006444] upgrade[8876]: [642/676] (72%) installing exadata-sun-computenode-19.3.2.0.0.191119-1...
<>
[  459.991449] upgrade[8876]: Created symlink /etc/systemd/system/multi-user.target.wants/exadata-iscsi-reconcile.service, pointing to /etc/systemd/system/exadata-iscsi-reconcile.service.
[  460.011466] upgrade[8876]: Looking for unit files in (higher priority first):
[  460.021436] upgrade[8876]: /etc/systemd/system
[  460.028479] upgrade[8876]: /run/systemd/system
[  460.035431] upgrade[8876]: /usr/local/lib/systemd/system
[  460.042429] upgrade[8876]: /usr/lib/systemd/system
[  460.049457] upgrade[8876]: Looking for SysV init scripts in:
[  460.057474] upgrade[8876]: /etc/rc.d/init.d
[  460.064430] upgrade[8876]: Looking for SysV rcN.d links in:
[  460.071445] upgrade[8876]: /etc/rc.d
[  460.076454] upgrade[8876]: Looking for unit files in (higher priority first):
[  460.086461] upgrade[8876]: /etc/systemd/system
[  460.093435] upgrade[8876]: /run/systemd/system
[  460.100433] upgrade[8876]: /usr/local/lib/systemd/system
[  460.107474] upgrade[8876]: /usr/lib/systemd/system
[  460.114432] upgrade[8876]: Looking for SysV init scripts in:
[  460.122455] upgrade[8876]: /etc/rc.d/init.d
[  460.129458] upgrade[8876]: Looking for SysV rcN.d links in:
[  460.136468] upgrade[8876]: /etc/rc.d
[  460.141451] upgrade[8876]: Created symlink /etc/systemd/system/multi-user.target.wants/exadata-multipathmon.service, pointing to /etc/systemd/system/exadata-multipathmon.service.

There was not much I could do, so I just waited. I also created an SR with Oracle Support, and they suggested the same thing: wait. After some time it started moving again and eventually completed successfully. When the node finally came up, I found an NFS mount entry in /etc/rc.local, and that was what had caused the problem. For the second node we commented it out, and everything went smoothly. It is important to comment out all NFS entries before patching to avoid such issues. I had commented out the ones in /etc/fstab, but the one in rc.local was an unexpected one.
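
Here is a minimal pre-patching check, a sketch that assumes the NFS mounts are defined only in /etc/fstab and /etc/rc.local (adjust if yours live elsewhere):

# Look for NFS references in both files; anything found should be
# unmounted and commented out for the duration of the patching.
grep -inE 'nfs' /etc/fstab /etc/rc.local

# Unmount any NFS filesystems that are currently mounted.
umount -a -t nfs,nfs4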

AVDF installation error

I was installing Database Firewall version 12.2.0.11.0 on a Dell x86 machine (with 5 * 500 GB local HDDs configured in RAID 10) and the installation completed successfully. Later on, I came to know that this version doesn’t support the host monitor functionality on Windows hosts; the latest version that supports it is 12.2.0.10.0. So I had to download and install 12.2.0.10.0 instead. The installation started fine, but it failed with an error:

Exception occured

anaconda 13.21.263 exception report

File "/usr/lib/anaconda/storage/devices.py",

OSError: [Errno 2] No such file or directory:
'/dev/sr0'

From the script it was calling, i.e. devices.py, I guessed it had something to do with the storage. Maybe it was not able to handle something that had been created by the newer version’s installation. So I removed the RAID configuration and created it again. After this, the installation went through without any issues.

OGB Appreciation Day : Thank you community ! (#ThanksOGB)

OGB Appreciation Day was started by Tim Hall in 2016, and this is my thank-you-community post. There are so many experts posting on Oracle-related forums, writing blog posts and sharing their scripts with everyone. All of you are doing a great job. I would like to mention three names especially:

Tim Hall : Tim is a legend ! I don’t consider something a new feature until Tim writes about it 😀

Jonathan Lewis : I don’t think there is anyone on this planet who has even once worked on a performance problem and hasn’t gained something from the knowledge shared by him on forums or in one of the blog posts.

Tanel Poder : His hacking sessions, blog posts and scripts are awesome. And ashtop is amazing man !

Thank you, all of you. We learn from every one of you. Keep rocking !

Understanding grid disks in Exadata

The use of Exadata storage cells seems to be a very poorly understood concept. A lot of people are confused about how exactly ASM makes use of the disks from the storage cells. Many folks assume there is some sort of RAID configured in the storage layer, whereas there is nothing like that. I will try to explain some of these concepts in this post.

Let’s take the example of an Exadata quarter rack, which has 2 db nodes and 3 storage nodes (node means a server here). A few things to note:

  • The space for installing binaries on the db nodes comes from the local disks installed in the db nodes (4 * 600 GB, expandable to 8, configured in RAID 5). In case you are using OVM, the same disks are used for keeping configuration files, virtual disks for the VMs, etc.
  • All of the ASM space comes from the storage cells. The minimum configuration is 3 storage cells.

So let’s try to understand what makes up a storage cell. There are 12 disks in each storage cell (the latest X7 cells come with 10 TB disks). As I mentioned above, there are 3 storage cells in the minimum configuration, so we have a total of 36 disks. There is no RAID configured in the storage layer; all the redundancy is handled at the ASM level. So, to create a disk group:

  • First of all, cell disks are created on each storage cell. One physical disk makes one cell disk, so a quarter rack has 36 cell disks.
  • To divide the space into the various disk groups (by default only two disk groups are created, DATA and RECO; you can choose how much space to give to each of them), grid disks are created. A grid disk is a partition on a cell disk, a slice of the disk in other words. A slice from each cell disk must be part of both disk groups; something like DATA having 18 of the 36 disks and RECO having the other 18 is not supported. Let’s say you decide to allocate 5 TB to the DATA grid disks and 4 TB to the RECO grid disks (out of 10 TB on each disk, approximately 9 TB is what you get as usable). So you will divide each cell disk into two parts of 5 TB and 4 TB, and you end up with 36 slices of 5 TB each and 36 slices of 4 TB each (see the sketch after this list).
  • The DATA disk group will be created using the 36 slices of 5 TB, where the grid disks from each storage cell constitute one failgroup.
  • Similarly, the RECO disk group will be created using the 36 slices of 4 TB.
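
Here is a minimal sketch of those steps, with illustrative prefixes and sizes. The cellcli commands run on each storage cell (shown here via dcli from a db node, assuming a cell_group file listing the cells), and the disk groups are then created from an ASM instance on a db node.

# Create one cell disk per physical disk, then carve each cell disk into
# a DATA grid disk and a RECO grid disk (RECO takes the remaining space).
dcli -g cell_group -l celladmin cellcli -e "create celldisk all harddisk"
dcli -g cell_group -l celladmin cellcli -e "create griddisk all harddisk prefix=DATA, size=5T"
dcli -g cell_group -l celladmin cellcli -e "create griddisk all harddisk prefix=RECO"

# From an ASM instance on a db node (SQL*Plus as sysasm), the disk string
# matches the grid disks from all three cells, and ASM automatically places
# each cell's grid disks into their own failgroup:
#   create diskgroup DATA normal redundancy disk 'o/*/DATA*'
#     attribute 'cell.smart_scan_capable'='TRUE';
#   create diskgroup RECO normal redundancy disk 'o/*/RECO*'
#     attribute 'cell.smart_scan_capable'='TRUE';

With normal redundancy, ASM then mirrors every extent across grid disks in two different cells, which is why no RAID is needed at the storage layer.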

What we have discussed above is a quarter rack scenario with High Capacity (HC) disks. There can be somewhat different configurations too:

  • Instead of HC disks, you can have the Extreme Flash (EF) configuration, which uses flash cards in place of disks. Everything remains the same except the count: instead of 12 HC disks there are 8 flash cards per cell.
  • With X3, I think, Oracle introduced an eighth rack configuration. In an eighth rack configuration the db nodes come with half the cores (of the quarter rack db nodes) and the storage cells come with 6 disks in each cell, so here you would have only 18 disks in total. Everything else works in the same way.

I hope this clarified some of the doubts about grid disks.


ORA-04080: trigger ‘PRICE_HISTORY_TRIGGERV1’ does not exist

This is actually a dumb one. I was disabling triggers in a schema and ran this SQL to generate the disable statements (example from here):

HR@test> select 'alter trigger '||trigger_name|| ' disable;' from user_triggers where table_name='PRODUCT';

'ALTERTRIGGER'||TRIGGER_NAME||'DISABLE;'
--------------------------------------------------------------------------------
alter trigger PRICE_HISTORY_TRIGGERv1 disable;

HR@test> alter trigger PRICE_HISTORY_TRIGGERv1 disable;
alter trigger PRICE_HISTORY_TRIGGERv1 disable
*
ERROR at line 1:
ORA-04080: trigger 'PRICE_HISTORY_TRIGGERV1' does not exist


HR@test>

WTF ? It is there, but the disable didn’t work. I was in a hurry, tried to connect through SQL Developer and disable it there, and it worked ! Double WTF ! Then I spotted the problem: someone had created the trigger with one letter of the name in lowercase. So to make it work, we need to use double quotes.

HR@test> alter trigger "PRICE_HISTORY_TRIGGERv1" disable;

Trigger altered.

HR@test>
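
To avoid hitting this in the future, the generator query itself can wrap the trigger name in double quotes; this is just a small variation on the query above:

-- Quoting the name in the generated statement makes it work regardless of
-- how the trigger name was cased when it was created.
select 'alter trigger "' || trigger_name || '" disable;'
from   user_triggers
where  table_name = 'PRODUCT';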

This is one of the reasons why you shouldn’t use case-sensitive names in Oracle: it is a stupid thing to do.