Amardeep Sidhu's Tech blog

Unable to connect to the database with SQLPlus

I was working on configuring a new database for backup to ZDLRA and hit this issue while testing a controlfile backup via Enterprise Manager -> Schedule backup. It could happen in any environment. Unable to connect to the database with SQLPlus, either because the database is down or due to an environment issue such as incorrectly specified... If the database is up, check the database target monitoring properties and verify that the Oracle Home value is correct. The 2nd line clearly tells the problem but since the Cluster Database status in EM was green, so it took me a while to figure it out. Issue turned out to be a missing / in the end of ORACLE_HOME specified in monitoring configuration of the cluster database. The DB home specified was /u01/app/oracle/product/11.2.0.4/dbhome_1 instead of /u01/app/oracle/product/11.2.0.4/dbhome_1 /. ...

File system already present at specified mount point /dbfs_direct

It was actually funny. So thought about posting it that sometimes how we can miss the absolute basics. This customer is using a virtualized Exadata with multiple VMs. One VM hosts the database meant to be used for dbfs and another VM connects to this DB over IB to mount dbfs file system using dbfs_client. One day VMs were rebooted and due to some reason the dbfs filesystem didn’t mount on startup. It went on for few days and they couldn’t mount it. One day I got a chance to look at it and the error they were facing was: ...

Database performance degradation due to multipath issues

To put it in bit of an Indian context, database is not your daughter-in-law that you can blame it for every performance issue that occurs in the environment. But it does happen. Most of the time it is the database that is blamed for all such issues. Many times, the issues are in some other layer like OS, network or storage. Faced this issue recently at one of the customer sites where performance in one of the databases went down suddenly. It was a 2 node RAC on 12.1.0.2 running on Linux 7 using some kind of Hitachi SSD storage array. There were no changes as per DBA, application, OS and storage teams. But something must have changed somewhere. Otherwise why would performance degrade just like that. I & my colleague checked some details and found that something happened in the morning a day before. Starting from that point in time, the execution time for all the commonly run queries shot up. Generally speaking, when all the queries are doing bad and you are sure that nothing has been changed on the database side, the reasons could be outside the database. But being a DBA, it is not easy to prove that. We took AWRs from good and bad times and the wait events section looked like this: ...

ZDLRA patching

To be honest, Fernando Simon has already documented all the steps needed in ZDLRA patching . So this post is more like a reference post for me and it points to the links on his blog. One thing he could change though are the post titles. He also agrees ;) https://twitter.com/amardeep_sidhu/status/1370304085245661192 ZDLRA patching is broadly divided into two parts. First part is where you patch the RA library and Grid & DB homes. Second part includes compute node & storage cell image patch and patches for IB/RoCE switches. Second part is exactly similar to Exadata except that it is bit restricted in terms of image versions that you can use. Only the versions that are certified for ZDLRA can be used. Also the RA library version and the Exadata image version should be compatible with each other. So if you are planning to patch only one part; RA library or the image, make sure that both the components stay compatible. The MOS note that has all these details is 1927416.1. This note should be the first place to go when you are planning to patch a ZDLRA. The steps for upgrade/patch, image patching are given in MOS note 2028931.1. There is another note 2639262.1 that discusses some of the known issues that you may face while doing the patching. It is important to review all three notes before you plan to patch. ...

[FATAL] [INS-44000] Passwordless SSH connectivity is not setup

Faced this while running installer for setting up a 2 node RAC setup (version 19.8) on an Oracle SuperCluster. The error reported in the log is: [FATAL] [INS-44000] Passwordless SSH connectivity is not setup from the local node node1 to the following nodes: [node2] [INS-06006] Passwordless SSH connectivity not set up between the following node(s): [node2] From the error it appears that the ssh is not setup between two nodes but actually that is not the case. Here the error message is bit misleading. It turned out to be an issue with scp with openssh version 8.x. Running the setup with -debug option gives the clue: ...

Doing an Exadata mixed cells config with OEDA

Earlier versions of OEDA didn’t allow you to have mixed cells in the configuration i.e. High Capacity (HC) and Extreme Flash (EF). The way to deal with that configuration was that deploy the system with either HC or EF cells and then manually configure the remaining cells. I am not sure when did it change but the newer versions allow you have mixed type of cells in a single OEDA configuration. Once you select the hardware, there is an additional option called Enable Additional Storage, where you can select the other type of cells. The minimum number of cells has to be three to use this option. Also the cells that are at the bottom of the rack physically should be selected as main storage and the other cells should be added as additional storage as that is how OEDA builds the configuration files. ...

Implementing ZDLRA – Part 2

In part 1, we discussed few things that you should take care before implementation of a ZDLRA. In this post, we will discuss few more things that you should review before or at the time of implementation: If you are getting two ZDLRAs (one each for primary and standby sites), there are two ways they can be deployed. One scenario is where all the primary databases (or the database that have no standby) backup to RA at the primary site and then the data is replicated from primary RA to RA at the standby site. This works well for the DBs that have no standby database. For the DBs where there is a standby database, there is a better architecture that can be deployed. In that scenario, primary databases backup to primary RA and the standby databases backup to standby RA. That saves you all the traffic over replication network. Oracle has published a whitepaper on how to do this configuration. Few of the instructions in this paper are a bit dated but it gives a good overall idea of how to do the implementation. Keep an eye on the features supported for different DB versions. An interesting one is that real-time redo shipping from standby databases is supported on 12c+ databases only. It is not supported for 11g. There could be other similar things. MOS note 1995866.1 has these details. Depending upon the ZDLRA software version being deployed, it may need a minimum version of EM and the ZDLRA plugin. MOS note 2542836.1 has these details. Make sure after discovering the the primary and standby databases in EM, their primary-standby relationship is reflected. Real-time redo sent to ZDLRA is compressed but the archive logs backup will be compressed only if you use compression in the RMAN command. It is always good to include backup archivelog command with daily incremental job to make sure that no archive log is missed. Many of the environments have separate networks for backup traffic. Make sure the backup traffic to ZDLRA uses DB server’s backup network. If that is not the case, you may need to add an explicit route on DB server for ZDLRA client/VIP/scan IPs. There are going to be different users that you will need to use: one OS user for deploying the EM agent, one DB user that will be used to run the backups. Depending upon your environment, it may oracle OS user, SYS DB user or could be some other named user created for this purpose. In next few posts, we will discuss some of the issues I have faced while doing ZDLRA implementation for some customers. ...

PRVF-4657 : Name resolution setup check for “db-scan” (IP address: x.x.x.101) failed

A quick note about an error I faced while running root.sh on an Exadata machine. The configuration tools failed with the following error: Error is PRVF-4657 : Name resolution setup check for "db-scan" (IP address: x.x.x.101) failed I did nslookup on the scan name and it all seemed good. So why the error ? After spending another 5 minutes, I looked at /etc/hosts and there was it. Someone had populated /etc/hosts of DB nodes with all the hostnames entries including the scan name. Something like: ...

Implementing ZDLRA – Part 1

Zero Data Loss Recovery Appliance (ZDLRA) is Oracle’s solution for database backups. It has many advantages over other backup solutions that are available in the market. This post has a brief introduction to ZDLRA and few links for further reading. This is a quick post about few of things that you should keep in mind if you are planning to get a ZDLRA (RA in short). Of course, there is a lot more that is needed while executing the whole plan, but these are some of the basics: ...

Using Secure Fabric for network isolation in KVM environments on Exadata

Exadata storage software version 20.1 introduces a new feature called “Secure Fabric” for KVM based multi cluster deployments (Exadata X8M). It enables network isolation between multiple tenants (i.e. KVM VMs based RAC clusters). This feature aligns with Infiniband Partitioning on OVM based systems. There are customers who in such scenarios want that VMs of one RAC shouldn’t be able to see traffic of the other RAC VMs. This feature achieves that. Similar to Pkeys in IB switches, here it uses a double VLAN tagging system where the first tag identiefies the network partition and the second tag is used to denote membership level of the VM. Exadata documention has more details. ...