I was working on configuring a new database for backup to ZDLRA and hit this issue while testing a controlfile backup via Enterprise Manager -> Schedule backup. It could happen in any environment.
Unable to connect to the database with SQLPlus, either because the database is down or due to an environment issue such as incorrectly specified...
If the database is up, check the database target monitoring properties and verify that the Oracle Home value is correct.
The 2nd line clearly tells the problem but since the Cluster Database status in EM was green, so it took me a while to figure it out. Issue turned out to be a missing / in the end of ORACLE_HOME specified in monitoring configuration of the cluster database. The DB home specified was /u01/app/oracle/product/184.108.40.206/dbhome_1 instead of /u01/app/oracle/product/220.127.116.11/dbhome_1/.
On the server the bash_profile has the home set as /u01/app/oracle/product/18.104.22.168/dbhome_1. When I tried to connect as sysdba there, it gave an error TNS : lost contact. Then I set the environment with .oraenv and I was able to connect. /etc/oratab had correct home specified as /u01/app/oracle/product/22.214.171.124/dbhome_1/. After comparing the value of ORACLE_HOME in these two cases, the issue was identified. Then I updated the ORACLE_HOME value in the target monitoring configuration in Enterprise Manager and it worked as expected.
To be honest, Fernando Simon has already documented all the steps needed in ZDLRA patching . So this post is more like a reference post for me and it points to the links on his blog. One thing he could change though are the post titles. He also agrees 😉
ZDLRA patching is broadly divided into two parts. First part is where you patch the RA library and Grid & DB homes. Second part includes compute node & storage cell image patch and patches for IB/RoCE switches. Second part is exactly similar to Exadata except that it is bit restricted in terms of image versions that you can use. Only the versions that are certified for ZDLRA can be used. Also the RA library version and the Exadata image version should be compatible with each other. So if you are planning to patch only one part; RA library or the image, make sure that both the components stay compatible. The MOS note that has all these details is 1927416.1. This note should be the first place to go when you are planning to patch a ZDLRA. The steps for upgrade/patch, image patching are given in MOS note 2028931.1. There is another note 2639262.1 that discusses some of the known issues that you may face while doing the patching. It is important to review all three notes before you plan to patch.
The RA library patching part can be considered of two different types. This is an important difference. Make sure that you follow the right set of commands. When you are jumping between major versions say going from 12.x to 19.x, it is called an upgrade and the commands are like racli upgrade appliance –step=1. Fernando talks about this in detail in this post.
On the other hand, when you are not jumping between versions; say going from 19.x to 19.x only, it is called patching and the commands are like racli patch appliance –step=1. Fernando has discussed this in detail in this post.
The Exadata bit (image & switches patching) of it is exactly the same as we do in Exadata. Fernando talks about this in this post.
The RA library patching bit is pretty much automated and works fine most of the time. If you hit an issue, you may find the solution/workaround documented in one of the MOS notes.
Happy patching !
In part 1, we discussed few things that you should take care before implementation of a ZDLRA. In this post, we will discuss few more things that you should review before or at the time of implementation:
- If you are getting two ZDLRAs (one each for primary and standby sites), there are two ways they can be deployed. One scenario is where all the primary databases (or the database that have no standby) backup to RA at the primary site and then the data is replicated from primary RA to RA at the standby site. This works well for the DBs that have no standby database. For the DBs where there is a standby database, there is a better architecture that can be deployed. In that scenario, primary databases backup to primary RA and the standby databases backup to standby RA. That saves you all the traffic over replication network. Oracle has published a whitepaper on how to do this configuration. Few of the instructions in this paper are a bit dated but it gives a good overall idea of how to do the implementation.
- Keep an eye on the features supported for different DB versions. An interesting one is that real-time redo shipping from standby databases is supported on 12c+ databases only. It is not supported for 11g. There could be other similar things. MOS note 1995866.1 has these details.
- Depending upon the ZDLRA software version being deployed, it may need a minimum version of EM and the ZDLRA plugin. MOS note 2542836.1 has these details.
- Make sure after discovering the the primary and standby databases in EM, their primary-standby relationship is reflected.
- Real-time redo sent to ZDLRA is compressed but the archive logs backup will be compressed only if you use compression in the RMAN command. It is always good to include backup archivelog command with daily incremental job to make sure that no archive log is missed.
- Many of the environments have separate networks for backup traffic. Make sure the backup traffic to ZDLRA uses DB server’s backup network. If that is not the case, you may need to add an explicit route on DB server for ZDLRA client/VIP/scan IPs.
- There are going to be different users that you will need to use: one OS user for deploying the EM agent, one DB user that will be used to run the backups. Depending upon your environment, it may oracle OS user, SYS DB user or could be some other named user created for this purpose.
In next few posts, we will discuss some of the issues I have faced while doing ZDLRA implementation for some customers.
PS: Fernando Simon has written some brilliant posts related to ZDLRA on his blog. I highly recommend to review all of them. Brilliant stuff.
Zero Data Loss Recovery Appliance (ZDLRA) is Oracle’s solution for database backups. It has many advantages over other backup solutions that are available in the market. This post has a brief introduction to ZDLRA and few links for further reading. This is a quick post about few of things that you should keep in mind if you are planning to get a ZDLRA (RA in short). Of course, there is a lot more that is needed while executing the whole plan, but these are some of the basics:
- The very first thing is capacity planning. Depending upon the number & sizes of the DBs that you plan to backup, you need to choose the required configuration. In most cases, an Oracle guy would be doing this for you but you should actively participate in the exercise by providing all the necessary information so that the calculations can be as accurate as possible.
- Another things that plays an important role in deciding the capacity needed is the retention period i.e. period for which you would like to keep the backups in RA. More the number of days, more is the space that you will need.
- Another important thing to consider is whether you are getting only one RA (for primary or standby site) or getting two of them i.e. one each for primary and standby site. Both scenarios need different type of configurations (including the bandwidth requirements between primary and standby sites) so it needs to be planned accordingly.
- One more aspect you need to consider is long term retention. It could be Oracle Cloud object storage or some tape solution.
- Once you have enabled DB backups to ZDLRA, you will need to stop all other backups. Plan that accordingly. Oracle provides way to run the legacy and ZDLRA backups together but that is for short duration i.e. when you are migrating from legacy backups to ZDLRA. That is not really a way to run 2 backup strategies together for long term.
In the next post, will talk about few more things that are important at the time of actual implementation.