Tuning GoldenGate Extract Pump performance

Just a quick note about the significance of the COMPRESS and TCPBUFSIZE parameters for the performance of a GoldenGate Extract Pump process. COMPRESS compresses the outgoing blocks, which makes better use of the bandwidth between source and target; GoldenGate uncompresses the blocks before writing them to the remote trail file on the target. Compression ratios of 4:1 or better can be achieved. Of course, using COMPRESS may result in increased CPU usage on both sides.

TCPBUFSIZE controls the size of the TCP socket buffer that the Extract uses. If the bandwidth allows, it is a good idea to send larger packets, so depending upon the available bandwidth one can experiment with the value of TCPBUFSIZE. At one of the client sites, I saw a great increase in performance after setting TCPBUFSIZE: a trail file (10 MB in size) that was taking almost a minute to transfer started getting through in a few seconds after setting this parameter. The documentation (http://docs.oracle.com/cd/E35209_01/doc.1121/e29399.pdf, page 313) provides the method to calculate the optimum value of TCPBUFSIZE for your environment.
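
The method in the documentation boils down to a bandwidth-delay product. As a rough illustration with made-up numbers (measure your own round-trip time with ping and use your actual link bandwidth):

round-trip time (from ping)         = 0.08 seconds
network bandwidth                   = 100,000,000 bits/second
bandwidth x delay                   = 100,000,000 * 0.08 = 8,000,000 bits
divide by 8 to get bytes            = 1,000,000 bytes
so TCPBUFSIZE (and TCPFLUSHBYTES)   would be set to around 1000000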

While using TCPBUFSIZE, a value for TCPFLUSHBYTES (at least equal to the value of TCPBUFSIZE) also needs to be set. TCPFLUSHBYTES sizes the buffer that collects the data before it is sent to the target.

These parameters can be used as follows:

RMTHOST <target host>, MGRPORT <manager port>, COMPRESS, TCPBUFSIZE 10000, TCPFLUSHBYTES 10000
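
For context, here is a minimal sketch of a complete pump parameter file using these options (the process name, host, port, trail and table names are just placeholders):

EXTRACT PMPTEST
RMTHOST targethost, MGRPORT 7809, COMPRESS, TCPBUFSIZE 1000000, TCPFLUSHBYTES 1000000
RMTTRAIL ./dirdat/rt
PASSTHRU
TABLE SCOTT.*;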

Also see Metalink note 1071892.1.

DML and HCC – Exadata

Hybrid Columnar Compression (HCC) is an awesome new feature in Exadata that can save a lot of storage space in your environment. This whitepaper on the Oracle website explains the feature in detail, and Uwe Hesse has an excellent how-to post on his blog where you can see the compression levels one can achieve by making use of HCC. It is a very simple feature to use, but one needs to be aware of a few things before using HCC extensively, otherwise all your storage calculations may go weird. Here are a few things to keep in mind:

    • HCC works with direct path loads only; that includes CTAS, running impdp with ACCESS_METHOD=DIRECT, and direct path (APPEND) inserts. If you insert data using a conventional insert, it will not be HCC compressed.
    • It is best suited for tables that aren’t going to be updated once loaded. There are some complications (next point) that arise if DML is going to be run on HCC compressed data.
    • At the block level, HCC stores data as compression units, where a compression unit is a set of blocks. If some rows stored with HCC are updated, they need to be decompressed first, and in that case the database has to read the whole compression unit, not a single block. Once you update data stored with HCC, it is moved out of HCC compression. To HCC compress it again you will need to run ALTER TABLE table_name MOVE COMPRESS FOR ... with the HCC level you are using (also see Metalink note 1332853.1), as shown in the sketch after this list. So if the tables you are planning to use HCC on undergo frequent DML, HCC may not be the best fit for that scenario: not only does it add the overhead of running an ALTER TABLE ... MOVE every time updates happen, it can also throw off your storage space calculations.
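
As a rough sketch of what this looks like in practice (table, column and index names here are just examples, and QUERY HIGH is only one of the possible HCC levels):

-- direct path load into an HCC compressed table
CREATE TABLE sales_hcc COMPRESS FOR QUERY HIGH
AS SELECT * FROM sales;

-- conventional DML on HCC data forces decompression of the affected compression units
UPDATE sales_hcc SET status = 'PROCESSED' WHERE sale_id = 42;
COMMIT;

-- re-compress into HCC; MOVE marks the table's indexes UNUSABLE, so rebuild them
ALTER TABLE sales_hcc MOVE COMPRESS FOR QUERY HIGH;
ALTER INDEX sales_hcc_pk REBUILD;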

OGG-01004 Aborted grouped transaction on ‘, Database error 1403 ()

The last post was just a quick one like that. It was this GoldenGate issue that woke me up from a deep sleep to write a post after a long time 😛 .

Well, it was a simple schema-to-schema replication setup using GoldenGate. We were using the SCN method (Metalink Doc ID 1276058.1 & 1347191.1) to do the initial load so that there would be no overlapping of transactions and the Replicat would run with minimum issues. Even after following this method, the Replicat was hitting:

2011-10-31 19:25:17  WARNING OGG-01004  Aborted grouped transaction on 'SCHEMA.TABLE', Database error 1403 ().

2011-10-31 19:25:17  WARNING OGG-01003  Repositioning to rba 3202590 in seqno 1.

2011-10-31 19:25:18  WARNING OGG-01154  SQL error 1403 mapping SCHEMA.TABLE TO SCHEMA.TABLE.

2011-10-31 19:25:18  WARNING OGG-01003  Repositioning to rba 3468713 in seqno 1.

If we managed to bypass this error somehow, it hit:

2011-10-24 19:58:15  WARNING OGG-00869  OCI Error ORA-00001: unique constraint (SCHEMA.UK) violated (status = 1), SQL <INSERT INTO "SCHEMA"."TABLE" (<INSERT HERE>

2011-10-24 19:58:15  WARNING OGG-01004  Aborted grouped transaction on 'SCHEMA.TABLE', Database error 1 (OCI Error ORA-00001: unique constraint (SCHEMA.UK) violated (status = 1), SQL <INSERT HERE>).

2011-10-24 19:58:15  WARNING OGG-01003  Repositioning to rba 1502788 in seqno 3.

2011-10-24 19:58:15  WARNING OGG-01154  SQL error 1 mapping SCHEMA.TABLE to SCHEMA.TABLE OCI Error ORA-00001: unique constraint (SCHEMA.UK) violated (status = 1), SQL <INSERT HERE>.

2011-10-24 19:58:15  WARNING OGG-01003  Repositioning to rba 1502788 in seqno 3.

Error 1403 (ORA-01403: no data found) means that GoldenGate couldn’t find the record it wanted to update.

Error 1 (ORA-00001: unique constraint violated) means that the record GoldenGate tried to insert was already there.

In our case, since we had used the SCN method, neither of these was expected, so these weird errors left us totally confused. Some people suggested that expdp was not taking a consistent image and some transactions were getting overlapped (picked up by both expdp and the GG extract trail). We took the database down and repeated the exercise, but oops! it hit almost the same errors again. So it was not about consistency, for sure.

Until then we hadn’t been examining the contents of the discard file very seriously; as the errors were pretty simple, we kept suspecting that some transactions were getting overlapped. Now it was high time to take some help from the discard file as well 😉 . We took the before/after image of the record from the discard file and checked it in the target database, and the values in one or two columns were different (that is why GG couldn’t find the record). The new values were the actual hint towards the solution: it was a table storing mail requests and their statuses, and the update GG was trying to run was changing the status from NOT-SENT to SENT, but on the target the status was already set to ‘ORA-something……’. So the clue was that something must have run on the target itself and changed this record, and now GG was not able to find it and was abending with 1403. A select * from dba_jobs cleared it all up: while doing the initial load with expdp/impdp, the jobs also got imported and some of them were in a not-broken state. They were firing according to their schedule and making changes to the data in the target. So before GG came along to update/insert a record, the job had already done its thing and the Replicat was hitting these different errors. We did the initial load again (this time using flashback_scn on the running database), disabled all the jobs and started the Replicat. It went through without any errors.
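
For reference, a query along these lines is what gave it away (run against the target):

-- imported jobs that are not broken will fire on their next_date and change data on the target
SELECT job, schema_user, what, broken, next_date
FROM   dba_jobs
ORDER  BY next_date;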

So, things to take care of in such cases:

1) Disable all the triggers on the target side (or exclude triggers while running expdp)

2) Look for and disable any scheduled jobs (could be dba_jobs, dba_scheduler_jobs or cron); see the sketch below
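
For both of the points above, something like the following works on the target (all names are placeholders; DBMS_JOB.BROKEN has to be run as the job owner):

-- 1) disable a trigger that came across with the import
ALTER TRIGGER schema_name.trigger_name DISABLE;

-- 2) mark an old-style job as broken so it stops firing (job number comes from dba_jobs)
EXEC DBMS_JOB.BROKEN(123, TRUE);
COMMIT;

-- or simply keep the job queue switched off while the Replicat catches up
ALTER SYSTEM SET job_queue_processes = 0;

-- and disable any scheduler jobs
EXEC DBMS_SCHEDULER.DISABLE('SCHEMA_NAME.JOB_NAME');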

Happy GoldenGate’ing !

expdp not consistent

Came across this small oddity: the 10.2 and 11.2 documentation gives the impression that expdp by default takes a consistent image of the database, but actually that is not so. You need to use FLASHBACK_SCN/FLASHBACK_TIME for that. Metalink doc 377218.1 explains the scenario.
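
For example, grab the current SCN on the source and pass it to expdp (directory, dump file and schema names are placeholders):

SQL> SELECT current_scn FROM v$database;

$ expdp system directory=DATA_PUMP_DIR dumpfile=schema.dmp logfile=schema.log schemas=SCHEMA flashback_scn=1234567890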

dbc_min_pct and dbc_max_pct in HP-UX

It was a 10g database (10.2.0.5 on HP-UX 11.23 RISC) which had recently been upgraded from 9.2.0.8. The CPU and memory utilization was going really high. After tuning a few of the top queries, CPU usage came within acceptable limits, but the memory usage was still high. There was a total of 16 GB of RAM on the server and the usage was constantly above 90%. One of the reasons behind the high usage was the increase in SGA size: it had been increased from 2.5 GB (in 9i) to around 5 GB (in 10g). Another major chunk was being eaten by the OS buffer cache. Looking at the memory usage with kmeminfo:

Buffer cache        =  1048448    4.0g  25%  details with -bufcache

In HP-UX, the memory allocated to the (dynamic) buffer cache is controlled by two parameters, dbc_min_pct and dbc_max_pct: it can vary between dbc_min_pct and dbc_max_pct percent of the total RAM, and they default to 5 and 50 respectively. For a system running an Oracle database, a dbc_max_pct value of 50 is way too high; it means up to half of the memory can be allocated to the OS buffer cache, and since Oracle has its own buffer cache, the OS cache is not of much use. As mentioned in Metalink note 726652.1, the value of dbc_max_pct can be safely lowered without impacting Oracle database performance. In many of the threads on the HP website people have suggested a value of 10 for dbc_max_pct. Not sure if that is more of a rule of thumb, but the same Metalink note (726652.1) mentions that if %rcache in sar -b is above 90, your OS buffer cache is adequately sized.
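
On 11.23 these are dynamic tunables, so the change can be made online with kctune; something like the following should work (syntax from memory, so verify against your release; on older releases you may have to go through SAM/kmtune):

# check the current setting
kctune dbc_max_pct

# lower the ceiling for the dynamic buffer cache
kctune dbc_max_pct=15

# then watch the buffer cache read hit ratio (%rcache)
sar -b 5 10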

After setting dbc_max_pct to 15 (it will be changed to 10 eventually), around 1.6 GB more memory was freed, and there was no impact on database or OS performance. Here are a few of the Metalink notes and threads on the HP website that talk about these parameters in detail:

Oracle Shadow Processes Are Taking Too Much Memory (Doc ID 434535.1)

How OS Buffer Cache Size Affects Db Performance (Doc ID 726652.1)

Commonly Misconfigured HP-UX Kernel Parameters (Doc ID 68105.1)

http://forums11.itrc.hp.com/service/forums/questionanswer.do?admit=109447626+1306231311459+28353475&threadId=1266914

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=727618

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=467288

http://forums11.itrc.hp.com/service/forums/questionanswer.do?threadId=750342