Capgemini Oracle Blog

Capgemini Oracle Blog

Opinions expressed on this blog reflect the writer’s views and not the position of the Capgemini Group

Alternate title: 

Of ECIDs and Histograms;
Oracle SOA Suite Metadata Repository Performance Management

In my job as system administrator/DBA, projects related to Oracle’s SOA Suite I put a lot of emphasis on configuration of the environment, like the OS, the Oracle database, WebLogic, OSB, and other products. Part of the Oracle SOA Suite is the Oracle Metadata Repository, where all metadata and run-time data of applications is stored. This repository is often forgotten in performance management , but deserves special attention when dealing with performance improvements.
This blog is part of a series of blogs around Oracle SOA Suite Metadata Repository Performance Management. In this blog we will show how a performance gain of factor 15 was achieved while querying the Metadata Repository.

Terminology
Oracle Metadata Repository
Oracle Metadata Repository is an Oracle database that contains additional schemas to support Oracle Fusion Middleware and its components, for design and run-time management of the applications. Oracle SOA Suite mainly uses two components:
  • The Metadata Store (MDS) to store all SCA artifacts
  • The Dehydration Store (SOAINFRA) to persist composite instances and to audit composite instances
ECIDs
ECID, or Execution Context ID, is a unique identifier to correlate events or requests associated with the same transaction across several components. As a message is passed from composite to composite, the ECID is passed within each message. In other words, ECIDs are used to track a message flow that crosses instances of different composite applications.

Histograms & buckets
Histograms are a special type of column statistics that provide more detailed information about the data distribution in a table column. Histograms help the Oracle Optimizer in deciding whether to use an index or a fulltable scan. These are most useful for a column that is included in a WHERE clause and the data distribution is skewed. A histogram sorts values into buckets to help estimate the cardinality. Oracle uses a maximum of 254 buckets.

Case
Investigating performance issues around purging the Oracle Metadata Repository, I stumbled over the following simple query that was executed by the SOA Suite while running an application:

SELECT ID, CONVERSATION_ID, ...
  FROM COMPOSITE_INSTANCE
 WHERE (ECID = :1)
 ORDER BY CREATED_TIME DESC;


This is a simple query with a simple WHERE clause returning 22 records. Nothing special. And as an index on column ECID with lots of distinct values exists, I expected the Oracle Optimizer to execute the query using an Index Range Scan.
To my surprise the Oracle Optimizer chose to use a Full Table Scan (see Explain Plan 1 below). The query took 2.79 seconds.

Id Operation Name Rows Bytes Cost (%CPU)
0 SELECT STATEMENT   124K 13M 27781   (1)
1  SORT ORDER BY   124K 13M 27781   (1)
*2   TABLE ACCESS FULL COMPOSITE_INSTANCE 124K 13M 25764   (1)
Explain Plan 1: TABLE ACCESS FULL


A quick test where I used a hint (/*+ index(composite_instance composite_instance_ecid) */) to force the usage of the index in the select statement shows the following explain plan. The query took 0.19 seconds to complete.

Id Operation Name Rows Bytes Cost (%CPU)
0 SELECT STATEMENT   5 575 8  (13)
1  SORT ORDER BY   5 575 8  (13)
2   TABLE ACCESS BY INDEX ROWID COMPOSITE_INSTANCE 5 575 7  (0)
*3     INDEX RANGE SCAN COMPOSITE_INSTANCE_ECID 5   3  (0)
Explain Plan 2: INDEX RANGE SCAN


So my gut feeling was right, an execution plan using the index is much faster. Using the index makes the query 15x faster over the full table scan! And using the index used a lot less I/O than the full table scan (18 consistent gets vs. 77,705 consistent gets).
 
So why didn’t Oracle use the index?
To answer this question, some investigating needs to be done.
Examining Explain Plan 1, using the full table scan, we see that the Oracle Optimizer expected the query to return approximately 124,000 records. In reality, the query returned only 22 records. The Oracle Optimizer is depending on statistics to come up with an execution plan.
Investigating the statistics showed:

  • Table holds 1,490,123 records in 44,827 blocks.
  • Index on ECID holds 1,464,478 keys of which 356,992 distinct values.
    BLevel of 2 and a Clustering Factor of 1,251,361.
  • A height-balanced histogram is created on the ECID column.
So, on average, for every value on the ECID column, 4.1 records are returned by the query. Looking at Explain Plan 2, using the index, the Oracle Optimizer estimated 5 records to be returned. This value is in the same order of magnitude.
But wait a minute, why is a histogram created for the ECID column if it’s holding vast numbers of distinct values? Histograms sort distinct values into a maximum of 254 buckets. So histograms are useful in case of a limited set of distinct values. The number of distinct values (356,992) for the ECID column vastly exceeds the maximum number of buckets.
Examining the histogram (see Histogram Statistics below), shows only 9 buckets out of a maximum of 254 buckets are being used, meaning bucket compression is being used, and thus the Oracle histogram sees only 9 (!) distinct values. No wonder, the Oracle Optimizer chose a Full Table Scan to execute the query as the cardinality estimate is completely off.
 
SELECT endpoint_number, endpoint_value
  FROM USER_HISTOGRAMS
 WHERE table_name = 'COMPOSITE_INSTANCE' AND column_name = 'ECID'
 ORDER BY endpoint_number;

ENDPOINT_NUMBER ENDPOINT_VALUE
11 2,5130E+35
59 2,6065E+35
62 2,6593E+35
65 2,9797E+35
82 5,0465E+35
84 5,1082E+35
85 5,1516E+35
97 5,1520E+35
254 5,2021E+35

Histogram Statistics
Let’s remove the histogram and see what execution path the Oracle Optimizer favors now.
We remove the histogram, using:

exec dbms_stats.delete_column_stats('SOAINFRA','COMPOSITE_INSTANCE', -'ECID',col_stat_type=>'HISTOGRAM')

and rerun the query.
 

Id Operation Name Rows Bytes Cost (%CPU)
0 SELECT STATEMENT   5 575 8  (13)
1  SORT ORDER BY   5 575 8  (13)
2   TABLE ACCESS BY INDEX ROWID COMPOSITE_INSTANCE 5 575 7  (0)
*3     INDEX RANGE SCAN COMPOSITE_INSTANCE_ECID 5   3  (0)

Explain Plan 3: Histogram is removed


By removing the histogram the Oracle Optimizer now uses the Index Range Scan on the ECID column to execute the query, resulting in a 15x faster performance and very little I/O.
Coming back to the question earlier why Oracle didn’t use the index at hand, the answer is the presence of the histogram.

But why did Oracle create a histogram in the first place?
To answer this question we need to dive deeper into histograms. The histogram statistics showed a cardinality mismatch. Somehow Oracle is fooled to believe there are very few distinct values stored in this ECID column.
Looking at the limitations of histograms:

  • A maximum of 254 buckets are used to distribute the distinct values
  • Histograms store only the first 32 characters of a character string.
The maximum number of buckets puts a limitation on the number of distinct values useful in a histogram. Both limitations are in place to limit resource usage while calculating the histogram statistics.
As the histogram is calculated on the ECID column let’s look at the data. It is filled by a monotonically increasing GUID lookalike. The average length of the ECID column values is 62 bytes. The GUID lookalike is increased on the right side. The data looks like this:

0f34473c0736d22b:-655b1c3e:14b7fc3be94:-8000-000000000003d039
0f34473c0736d22b:-655b1c3e:14b7fc3be94:-8000-000000000003d9c2
0f34473c0736d22b:-655b1c3e:14b7fc3be94:-8000-000000000003e39e
0f34473c0736d22b:-655b1c3e:14b7fc3be94:-8000-00000000000465bc


As Oracle histograms only use the first 32 characters to find distinct values, this explains why so few distinct values are found, making the creation of a histogram
plausible.
So the length of the data (62 characters) and the monotonic right-side increase of the data fool Oracle into creating a histogram.

How to prevent Oracle from creating the histogram?
We can drop the histogram as explained above. But the histogram is created during the Automatic Optimizer Statistics Collection. So you need to prevent the statistic gathering job from recreating the histogram.
You can do this using the statement below.

exec dbms_stats.set_table_prefs('SOAINFRA','COMPOSITE_INSTANCE', -'METHOD_OPT','FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 1 ECID);

Setting the column size to 1, prevents the creation of histograms.

Conclusion
  • ECIDs fool Oracle into creating histograms by their length and their monotonic right-sided increase.
  • Histograms on the ECID column impacts performance negatively.
  • Removing the histogram on the ECID column lead to a performance boost of factor 15.
  • Remove histograms from ECID columns and prevent them from being recreated.
Hope you liked it and can use it to improve the performance of your SOA Suite application.


Paul Wever is Oracle DBA and Weblogic configuration specialist at Capgemini.
 

About the author

Léon Smiers

Leave a comment

Your email address will not be published. Required fields are marked *.