To put it in bit of an Indian context, database is not your daughter-in-law that you can blame it for every performance issue that occurs in the environment. But it does happen. Most of the time it is the database that is blamed for all such issues. Many times, the issues are in some other layer like OS, network or storage.
Faced this issue recently at one of the customer sites where performance in one of the databases went down suddenly. It was a 2 node RAC on 22.214.171.124 running on Linux 7 using some kind of Hitachi SSD storage array. There were no changes as per DBA, application, OS and storage teams. But something must have changed somewhere. Otherwise why would performance degrade just like that. I & my colleague checked some details and found that something happened in the morning a day before. Starting from that point in time, the execution time for all the commonly run queries shot up. Generally speaking, when all the queries are doing bad and you are sure that nothing has been changed on the database side, the reasons could be outside the database. But being a DBA, it is not easy to prove that. We took AWRs from good and bad times and the wait events section looked like this:
Now there is something clearly and terribly wrong with the details in the second snippet and in the first look it appears to be an IO issue. Av Rd(ms) in the File IO Stats section of the AWR reports was also showing really bad numbers for most of the data files, which have been fine two days ago.
The conference calls continued and we were not reaching anywhere. Storage team as usual said that everything was fine and there were no issues. Finally the discussion moved to multipathing and the teams started checking in that direction. There were errors like this in /var/log/messages
multipathd: asm!.asm_ctl_vbg1: failed to get path uid
multipathd: asm!.asm_ctl_vbg6: failed to get path uid
multipathd: asm!.asm_ctl_vbg9: failed to get path uid
That meant there was a problem with one of the paths from the database nodes to storage. They disabled the bad path for both the DB nodes and voila ! IO performance was back on track. It was multipathing that needed to be fixed.
So it is always not the database. It is unfair to always blame the DBA !
It was a 10g (10.2.0.5 on HP-UX 11.23 RISC) database which was recently upgraded from 126.96.36.199. The CPU and memory utilization was going really high. After tuning few of the queries coming in top, CPU usage was coming within accetable limits but the memory usage was still high. There was a total of 16 GB of RAM on the server and the usage was above 90%, constantly. One of the reasons behind high usage was increase in the SGA size. It was increased from 2.5 GB (in 9i) to around 5 GB (in 10g). Another major chunk was being eaten by OS buffer cache. While looking at the memory usage with kmeminfo:
Buffer cache = 1048448 4.0g 25% details with -bufcache
In HP-UX, The memory allocated to (dynamic) buffer cache is controlled by two parameters dbc_min_pct and dbc_max_pct. It can vary between dbc_min_pct and dbc_max_pct percent of the total RAM. They default to 5 and 50 respectively. For a system that is running an Oracle database value of 50 for dbc_max_pct is way too high. That means half of the memory is going to be allocated to OS buffer cache. As Oracle has got its own buffer cache so the OS cache is not of much use. As mentioned in the metalink note 726652.1, the value of dbc_max_pct can be safely lowered without impacting the Oracle database performance. In many of the threads (on HP website) people have suggested the value of 10 for db_max_pct. Not sure if it is more like a thumb rule but in the same metalink note (726652.1) it is mentioned that if %rcache in sar -b is above 90, that means your OS buffer cache is adequately sized.
After setting the value of dbc_max_pct to 15 (It will be changed to 10, finally), around 1.6 GB more memory was freed. Also there was no impact on the database or OS performance. Here are few of the metalink notes and threads on HP-UX website that talk about these parameters in detail: