A customer is using an Exadata X8M-2 machine with multiple VMs (hence multiple clusters). I was working on adding a new storage cell to the configuration. After creating griddisks on the new cell and updating cellip.ora on all the VMs, I noticed that none of the clusters was able to see the new griddisks. I checked the usual suspects like if asm_diskstring was set properly, private network subnet mask on new cell was same as the old ones. All looked good. I started searching about the issue and stumbled upon some references mentioning ASM scoped security. I checked on one of the existing cells and that actually was the issue. The existing nodes had it enabled while the new one hadn’t. Running this command on an existing cell
We need to enable ASM scoped security on the new cell as well. There are three things that need to be done. We need to copy /etc/oracle/cell/network-config/cellkey.ora from an existing cell to the new cell, assign the key to the cell and then assign keys to the different ASM clusters. We can use these commands to do it
cellcli -e ASSIGN KEY FOR CELL 'c25a62472a160e28bf15a29c162f1d74'
cellcli -e ASSIGN KEY FOR ASMCLUSTER 'cluster1'='fa292e11b31b210c4b7a24c5f1bb4d32';
cellcli -e ASSIGN KEY FOR ASMCLUSTER 'cluster2'='b67d5587fe728118af47c57ab8da650a';
Once this is done, we need to tag the griddisks for appropriate ASM clusters. If the griddisks aren’t created yet, we can use this command to do it
cellcli -e CREATE GRIDDISK ALL HARDDISK PREFIX=sales, size=75G, availableTo='cluster1'
If the griddisks are already created, we can use the alter command to make this change
cellcli -e alter griddisk griddisk0,gridisk1,.....griddisk11 availableTo='cluster1';
Once this is done, we should be able to see new griddisks as CANDIDATE in v$asm_disk
A customer who is using an Exadata X8M-2 with multiple VMs had Smokescreen deployed in their company recently and they reported an issue that one of the Smokescreen decoy servers in their DC was seeing traffic from one of the Exadata VMs on a certain port. That was rather confusing as that port was the database listener port on that VM and why would a VM with Oracle RAC deployed try to access any random IP on the listener port. Also it was happening only for this VM. Nothing for so many other VMs.
We were just looking at the things and my colleague said that he had seen this IP somewhere and he started looking through the emails. In a minute, we found the issue as he found this IP mentioned in one of the emails. This was the VIP of this VM from where the traffic was reported to be originating. While reserving IPs for Smokescreen decoy servers, someone made the mess and re-used the IP that was already used for one of the VIPs of this RAC system !
It was actually funny. So thought about posting it that sometimes how we can miss the absolute basics. This customer is using a virtualized Exadata with multiple VMs. One VM hosts the database meant to be used for dbfs and another VM connects to this DB over IB to mount dbfs file system using dbfs_client. One day VMs were rebooted and due to some reason the dbfs filesystem didn’t mount on startup. It went on for few days and they couldn’t mount it. One day I got a chance to look at it and the error they were facing was:
File system already present at specified mount point /dbfs_direct
If you are familiar with Unix, this clearly indicates some problem with the directory where it is trying to mount the file system. I checked that and there were some files in /dbfs_direct. I moved those files and it was able to mount. Issue resolved.
Then after closing the session, I was thinking that what could have happened as this dbfs mount point has been in use for long. Then it struck me. It was being used to take some RMAN backups and the path was hard coded in the scripts. When it didn’t mount after the reboot (i don’t know why), someone ran that script and whatever directories it didn’t find, it probably created that complete directory structure and tried to write to a log file. Once /dbfs_direct had those files, it anyway was not going to mount.
Earlier versions of OEDA didn’t allow you to have mixed cells in the configuration i.e. High Capacity (HC) and Extreme Flash (EF). The way to deal with that configuration was that deploy the system with either HC or EF cells and then manually configure the remaining cells.
I am not sure when did it change but the newer versions allow you have mixed type of cells in a single OEDA configuration. Once you select the hardware, there is an additional option called Enable Additional Storage, where you can select the other type of cells. The minimum number of cells has to be three to use this option. Also the cells that are at the bottom of the rack physically should be selected as main storage and the other cells should be added as additional storage as that is how OEDA builds the configuration files.
Once this is selected, on the Diskgroups screen, select Diskgroup layout as custom and you can create multiple diskgroups and select cells for each diskgroup (as EF & HC cells can’t be part of the same diskgroup).
Once the configuration is generated, it can be deployed with OneCommand without any manual intervention. A small feature but makes life easier by getting rid of all the manual steps.
A quick note about an error I faced while running root.sh on an Exadata machine. The configuration tools failed with the following error:
Error is PRVF-4657 : Name resolution setup check for "db-scan" (IP address: x.x.x.101) failed
I did nslookup on the scan name and it all seemed good. So why the error ? After spending another 5 minutes, I looked at /etc/hosts and there was it. Someone had populated /etc/hosts of DB nodes with all the hostnames entries including the scan name. Something like:
As /etc/hosts can return only one IP against a hostname whereas for scan, DNS is supposed to return 3 IPs, hence the problem. The solution is to comment out the scan name entries in /etc/hosts on all the db nodes and let the system do the name resolution via the DNS.