Tag Archives: Performance

Database performance degradation due to multipath issues

To put it in a bit of an Indian context, the database is not your daughter-in-law that you can blame for every performance issue that occurs in the environment. But it does happen; most of the time it is the database that gets blamed for such issues. Many times the issue is actually in some other layer like the OS, network or storage.

Faced this issue recently at one of the customer sites where performance of one of the databases went down suddenly. It was a 2-node RAC on 12.1.0.2 running on Linux 7 using a Hitachi SSD storage array. According to the DBA, application, OS and storage teams there had been no changes, but something must have changed somewhere; otherwise why would performance degrade just like that? My colleague and I checked some details and found that something had happened in the morning a day before: starting from that point in time, the execution time for all the commonly run queries shot up. Generally speaking, when all the queries are performing badly and you are sure that nothing has changed on the database side, the reason could be outside the database. But being a DBA, it is not easy to prove that. We took AWR reports from the good and bad periods and the wait events section looked like this:

Now there is something clearly and terribly wrong with the details in the second snippet, and at first look it appears to be an IO issue. Av Rd(ms) in the File IO Stats section of the AWR reports was also showing really bad numbers for most of the data files, which had been fine two days earlier.
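Apart from AWR, a quick OS-level check can help corroborate that the latency is coming from below the database. Something along these lines (a rough sketch; the interval, count and availability of the sysstat package are assumptions about the environment):

iostat -x 5 3    # extended device stats; the await / r_await columns show average IO wait in ms
sar -d -p 1 5    # per-device latency from sysstat, if installed

If the wait times are high for the devices backing the ASM disks while the database is mostly just waiting on IO, that is a strong hint that the problem lies in the storage or in the path to it.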

The conference calls continued and we were not getting anywhere. The storage team, as usual, said that everything was fine and there were no issues. Finally the discussion moved to multipathing and the teams started checking in that direction. There were errors like this in /var/log/messages:

multipathd: asm!.asm_ctl_vbg1: failed to get path uid
multipathd: asm!.asm_ctl_vbg6: failed to get path uid
multipathd: asm!.asm_ctl_vbg9: failed to get path uid

That meant there was a problem with one of the paths from the database nodes to the storage. They disabled the bad path on both DB nodes and voila! IO performance was back on track. It was the multipathing that needed to be fixed.
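For reference, the kind of commands that can be used to check path status and manually fail a suspect path look roughly like this (sdX is a placeholder; the actual remediation here was carried out by the OS and storage teams):

multipath -ll                       # show multipath topology and per-path state (active/failed)
multipathd -k"show paths"           # per-path status as seen by the multipath daemon
multipathd -k"fail path sdX"        # manually mark the suspect path as failed
multipathd -k"reinstate path sdX"   # bring the path back once the SAN side is fixed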

So it is not always the database. It is unfair to always blame the DBA!

Tuning GoldenGate Extract Pump performance

Just a quick note/post about the significance of the COMPRESS and TCPBUFSIZE parameters for the performance of a GoldenGate Extract Pump process. COMPRESS compresses the outgoing blocks, which helps make better use of the bandwidth from source to target; GoldenGate uncompresses the blocks before writing them to the remote trail file on the target. Compression ratios of 4:1 or better can be achieved. Of course, using COMPRESS may result in increased CPU usage on both sides.

TCPBUFSIZE controls the size of the TCP socket buffer that the Extract Pump uses. If the bandwidth allows, it is a good idea to send larger packets, so depending upon the available bandwidth one can experiment with values of TCPBUFSIZE. At one of the client sites I saw a big improvement in performance after setting TCPBUFSIZE: a 10 MB trail file that was taking almost a minute to transfer started getting through in a few seconds. The documentation (http://docs.oracle.com/cd/E35209_01/doc.1121/e29399.pdf, page 313) provides the method to calculate the optimum value of TCPBUFSIZE for your environment.
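To give a rough idea of the calculation (the numbers here are made up for illustration): the documented approach boils down to the bandwidth-delay product, i.e. measure the round-trip time with ping and multiply it by the available bandwidth in bytes per second. For a 100 Mbit/s link with a 30 ms round trip, that is (100,000,000 / 8) * 0.03 ≈ 375,000 bytes, which would be the starting point for TCPBUFSIZE.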

While using TCPBUFSIZE, TCPFLUSHBYTES (at least equal to the value of TCPBUFSIZE) also needs to be set. It controls the size of the buffer that collects the data before it is sent to the target.

These parameters can be used as follows:

RMTHOST <target host>, MGRPORT <port>, COMPRESS, TCPBUFSIZE 10000, TCPFLUSHBYTES 10000
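To put this line in context, a pump parameter file using these options might look roughly like the following (the group name, target host, port, trail path and schema are made-up placeholders for illustration):

EXTRACT pmp1
-- PASSTHRU can be used when the pump does no filtering or mapping
PASSTHRU
-- COMPRESS, TCPBUFSIZE and TCPFLUSHBYTES are all options of RMTHOST
RMTHOST tgthost, MGRPORT 7809, COMPRESS, TCPBUFSIZE 10000, TCPFLUSHBYTES 10000
RMTTRAIL ./dirdat/rt
TABLE scott.*;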

Also see Metalink note 1071892.1.