Blog Aggregator - Amardeep Sidhu

  • Tags

  • Archives

  • Meta

  •  

    October 2007
    M T W T F S S
    « Sep   Nov »
    1234567
    891011121314
    15161718192021
    22232425262728
    293031  

Archive for October 16th, 2007

16 Oct

Data Guard in SYNC mode – does my DB suffer from network weakness ?

My Grid Control console is often polluted by warnings about wait events of the network event class :

Is my Data Guard setup really suffering from poor performance?

I operate with the Maximum Availability mode.

We will investigate why the Grid Control warnings occur. The database is up since about 3 days (72 hours). First of all try to identify the wait classes which generated the most important wait times:


set lines 130
select * from V$SYSTEM_WAIT_CLASS
order by 5
/

WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS          TOTAL_WAITS TIME_WAITED
————- ———– ——————- ———– ———–
   4217450380           1 Application                1472         434
   3875070507           4 Concurrency                1178        1338
   3290255840           2 Configuration              1097        5250
   3386400367           5 Commit                     3512        6981
   4166625743           3 Administrative             4906      111527
   2000153315           7 Network                 2549790      325016
   1740759767           8 User I/O                1384276      427440
   4108307767           9 System I/O              3151272      646415
   1893977003           0 Other                    251373     2207566
   2723168908           6 Idle                    3720999   639262672

In the database, after a performance alert occurred in Grid Control, we can identify the problem in V$WAITCLASSMETRIC_HISTORY :


select BEGIN_TIME, END_TIME, DBTIME_IN_WAIT, TIME_WAITED WAIT_COUNT
  from V$WAITCLASSMETRIC_HISTORY
 where WAIT_CLASS# = 7; # 7 for network

BEGIN_TIME           END_TIME               DBTIME_IN_WAIT TIME_WAITED WAIT_COUNT
——————– ——————–   ————– ———– ———-
25-SEP-2007 11:00:54 25-SEP-2007 11:01:54   21.0387822     2143.1923        19262
25-SEP-2007 10:59:54 25-SEP-2007 11:00:54   51.2713653     2147.2517        19047
25-SEP-2007 10:58:54 25-SEP-2007 10:59:54   55.8887214     2480.2726        18405
25-SEP-2007 10:57:53 25-SEP-2007 10:58:54   44.2017456     2374.9075        21223
25-SEP-2007 10:56:53 25-SEP-2007 10:57:53   33.1204214     2676.6979        28336
25-SEP-2007 10:55:53 25-SEP-2007 10:56:53   37.3184263     3498.7629        30965
25-SEP-2007 10:54:54 25-SEP-2007 10:55:53   27.779576      3130.5472        29403
25-SEP-2007 10:53:54 25-SEP-2007 10:54:54   26.117611      2752.5801        23430


Now we will try to analyze in depth the events mentioned in Grid control :


"Database Time Spent Waiting (%)" is at 88.26568 for event class "Network"

Let’s have a look at the primary site. According to the white paper presented in the Metalink note 387174.1, the event “LNS wait on SENDREQ” covers the sequence of the following actions (when a commit transaction is propagated to the standby database).

(Extract of the Metalink note:)
The RFS process receives the redo being sent by LNS and completes the I/O to the standby redo log (important to understand the analysis below).

V$SYSTEM_EVENT is the view which will allow us to identify the current waits events :


select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT
from v$system_event
where event like '%LNS%'
   or event like '%LGWR%'
/

EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
————————- ———– ————– ———– ————
. . .
LNS wait on SENDREQ            125643              0      156673         1.25
. . .


On the standby side we perform almost the same select (a few seconds later) , to identify the LNS/RFS related wait times :

select EVENT, TOTAL_WAITS, TOTAL_TIMEOUTS, TIME_WAITED, AVERAGE_WAIT

  from v$system_event
 where event like '%RFS%'
 order by 4
/

EVENT                     TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT
————————- ———– ————– ———– ————
. . .
RFS write                      125270              0      121825          .97
. . .

We clearly observe that out of the 156673 ms waited because of "LNS wait on SENDREQ" (on the primary, see above for detailed description of this event), 121825 ms are due to the waits on writing to the standby redo log files on the standby side ("RFS write")

In the whole propagation/synchronization process between primary and standby the most important part is simply the writing on the standby redo log files (normal I/Os to keep the transactions on the standby side) and not the network as expected.

It is worth to mention that the "redo write time" on the primary is 128333 ms which means that the 121825 ms of waits for the RFS writes are fully acceptable (normal I/O operations/performance)

During these 3 days while the database was up and running having waits of about 120000 to 130000 ms (120 to 130 s) is absolutely negligible!

As a conclusion: although the most important waits on the primary database are related to the network, there are negligible in absolute values. 90% of 120 seconds of waits in 3 days is still only 1,66 second of wait per hour in average

More details on www.trivadis.com -> publications :

http://www.trivadis.com/Images/Dataguard_network_tuning_tvd_layout_tcm16-16758.pdf

 

16 Oct

My top 10 Oracle 11g New Features Part 5: DDL-logging in the alert log in 11g

This week I deliver for Oracle Belgium in Brussels.
It is an 11g New Features for Administrators course.
I have shown some nice features this afternoon
which are not really documented yet
and want to share some of it here.
As of Oracle database 11g we can have loggin information for DDL operations in the alert log files. I say files in […]

© 2008 Blog Aggregator - Amardeep Sidhu | Entries (RSS) and Comments (RSS)

Powered by Wordpress, design by Web4 Sudoku, based on Pinkline by GPS Gazette