Capgemini Oracle Blog

Capgemini Oracle Blog

Improving the performance of pivot view in OBIEE 10G

Categories : BITechnical

Overview We will look at an approach to improve the performance of the pivot view of OBIEE. Pivot view is the name of a class of representation in which we slice and dice data based on various dimensions. We arrange one dimension on the x axis and another on y axis. The data is then aggregated based on both these dimensions and presented in a grid format Intended audience This blog will be useful to anyone who intends to improve the performance of the pivot view of OBIEE Assumptions 1. The version of database server configured in the connection pool of OBIEE’s repository is Oracle 11GR1 or higher 2. The number of distinct values in the dimension placed on x axis will not change frequently in future. In case it does, we will have to change the OBIEE repository and the pivot view in web catalog Pivot data – A general discussion Pivoting data has always been one of the typical requirements in business intelligence. It is a method of displaying data in a more intuitive way. For example, consider the following data Name                Dept no                       Color of the shirt Tom                   40                                 Red Dick                   40                                 Blue Harry                 50                                 Blue Mary                  50                                  Red Julia                  50                                  Red If we wanted to find out the number of employees per department per color of shirt then, we could represent the data in either of the following 2 ways:

1. Color of the shirt/Dept no                             40                                     50

Red                                                                   1                                         2

Blue                                                                  1                                         1

2. Count                  Dept no               Color of the shirt 1                            40                        Red 1                            40                        Blue 2                            50                        Red 1                            50                        Blue It's easy to see that the first representation of the aggregation is easier to read, it is called the pivot view.

OBIEE’s algorithm to generate pivot view To generate the pivot view, OBIEE first collects the entire set of data and then the presentation server organizes it in the desired format to display it This method suffers on 2 accounts: 1 a. The number of logical i/o from the DB is huge because OBIEE retrieves the entire result set. The huge number of i/o can be proved by a number of techniques, such as the trace of the query and runstats.sql (given by Tom Kyte). Explain plan with statistics can also give us the info about consistent gets (set autotrace on explain statistics). The plan with statics is at the bottom of the article b. Because all the rows are fetched from the database, the number of round trips from the db server to the client is huge 2. a. All these rows will have to be sent to the presentation server. This will involve an added delay.

Modus operandi 1. We will create 2 answers that show the pivoted data using 2 techniques and measure the performance of each step in each technique. 2. We will use OBIEE's session log to find the time taken by the query to execute on the Database and the time taken for the generation of the pivot view. To do this test, I am using the following query to generate 100,000 records

SELECT DBMS_RANDOM.STRING ('u', 10) AS employee_name, CEIL (DBMS_RANDOM.VALUE (1, 10)) * 10 AS dept_number, DECODE (FLOOR (DBMS_RANDOM.VALUE (1, 5)), 1, 'Blue', 2, 'Green', 3, 'Purple', 4, 'Red' ) AS shirt_color, 1 employee_count FROM DUAL CONNECT BY ROWNUM <= 100000

Proposed solution Oracle 11G has a new function called pivot and I intend to leverage it for my benefit. My physical and BMM models are weird but those are not the focus of this paper. I have created my physical and BMM is such a way that my queries fetch data from a single physical source. The intention is to not add any extra load on the database just because OBIEE needs one fact and one dimension table to execute a query. My Physical Layer:


1. The dummy table is an opaque view with the following query

select 1 from dual 2. Employees is an actual table generated with the query written above 3. Pivoted_table is again an opaque view with the following query

select * from

(select dept_number,shirt_color,employee_count from employees )

pivot count(employee_count) for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red)

) The above query is doing most of the trick. In this query, I am counting employees and then grouping then on the shirt_color field. Anything in the 'in' clause here becomes the column of my pivoted table. So the output of this query will have 5 columns, namely: dept_number, Blue, Green, Purple and Red.

Since we know the column names generated by the pivot query, we can create the same for our opaque view as shown below

My BMM Layer: I have kept the same model in BMM as well but I have created Right outer joins here. Creating a Right outer join ensures that all data from Pivoted_table and Employees is selected irrespective of the records in the dummy table.

So my BMM looks like the following

Both the joins here are right outer joins

My Presentation Layer: My presentation layer does not have any manipulation. I have dragged my business model to the presentation layer My web catalog: I have created 2 answers, one each on Pivoted_table and Employees tables The answer on Pivoted_table with table view looks like the following

Now my other answer is on Employees table. I have selected the pivot view in the compound layout as shown below


Performance comparison Below are the screenshots of the 2 session logs. The 1st one is of the answer built on Pivoted_Table and the 2nd is on Employees table. As we can see the 1st only returned the 9 desired records. This is a huge benefit because we are reducing the traffic all across the channel. Right from the Db server to the repository to the presentation server to the engine generating the table view. The 2nd log shows that 100000 records were returned

Now let's dig a little deeper. Let's bifurcate the time into 2 parts 1. Time taken to execute the query 2. Time taken to generate the chart

I am pasting only the relevant portions of the session log.

Log of the answer on Pivoted_table: +++Administrator:2f0000:2f0004:----2012/01/15 02:37:12 -------------------- Sending query to database named Pivot Pool (id: <>): select *from(select dept_number,shirt_color,employee_count from employees)

pivot(count(employee_count) for shirt_color in ('Blue' as Blue,'Green' as Green,'Purple' as Purple,'Red' as Red)) +++Administrator:2f0000:2f0004:----2012/01/15 02:37:12 -------------------- Execution Node: <>, Close Row Count = 9, Row Width = 32 bytes +++Administrator:2f0000:2f0004:----2012/01/15 02:37:12 -------------------- Execution Node: <> Sort, Close Row Count = 9, Row Width = 32 bytes +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Query Status: Successful Completion +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Rows 9, bytes 288 retrieved from database query id: <> +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Physical query response time 0 (seconds), id <> +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 0, DB-connect time 0 (seconds) +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Rows returned to Client 9 +++Administrator:2f0000:2f0004:----2012/01/15 02:37:13 -------------------- Logical Query Summary Stats: Elapsed time 1, Response time 0, Compilation time 0 (seconds)

Log of the answer on Employees: +++Administrator:2f0000:2f0003:----2012/01/15 02:32:57 -------------------- Sending query to database named Pivot Pool (id: <>): select T3334.SHIRT_COLOR as c1, T3334.DEPT_NUMBER as c2, T3334.EMPLOYEE_NAME as c3 from EMPLOYEES T3334 +++Administrator:2f0000:2f0003:----2012/01/15 02:33:09 +++Administrator:2f0000:2f0003:----2012/01/15 02:34:16 -------------------- Execution Node: <> DbGateway Exchange, Close Row Count = 100000, Row Width = 8048 bytes +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Query Status: Successful Completion +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Rows 100000, bytes 804800000 retrieved from database query id: <> +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Query Status: Successful Completion +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Rows 100000, bytes 804800000 retrieved from database query id: <> +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Physical query response time 11 (seconds), id <> +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Physical Query Summary Stats: Number of physical queries 1, Cumulative time 11, DB-connect time 0 (seconds) +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Rows returned to Client 100000 +++Administrator:2f0000:2f0003:----2012/01/15 02:34:17 -------------------- Logical Query Summary Stats: Elapsed time 81, Response time 14, Compilation time 0 (seconds)

Let's compare the important timing metrics of the 2 answers

Physical Query Response Time: The time for a query to be processed in the back-end database. Cumulative time: The sum of time for all physical queries for a request (that is, the sum of all back-end database processing times and DB-connect times). DB-Connect time: The time taken to connect to the back-end database. Elapsed time: The time that has elapsed from when the logical query is presented to the BI Server until the result is returned to the user. Elapsed time can never be less than response time, because elapsed time takes into account the small extra time between the logical queries being presented to the BI Server to the start of preparation of the query. In cases where this delta time is negligible, the elapsed time equals the response time. Response time: The time taken for the logical query to prepare, execute, and fetch the last record. This matches the TOTAL_TIME_SEC that is logged in usage tracking. Compilation time: The time taken to compile the logical query.

Let me also share the plan and the statistics

Conclusion We have seen the method to use the pivot function of Oracle 11G database in OBIEE. It helps us by pushing the calculations on the database server and hence improves performance by reducing the data flow throughout the system and by reducing the amount of computation at the presentation server. It not only improves the response time but also reduces resource consumption on the BI server. We are using a data set of 100,000 records and we see that the performance imporvement is 10x. The performance benefit will be bigger for larger data sets and will be lesser for smaller ones. Performance benefit will also depend on the java memory heap and and the overall load on the webserver hosting OBIEE's presentation. It will also depend on the physical memory and processor of the box hosting OBIEE's presentation. For realistic scenarios, you should see a benefit of atleast 7x using this method


About the author

1 Comment Leave a comment
Hi, We are facing performance issues at answers levels, We are not using Pivot Tables. In Dev instance,report is rendering in seconds.We moved same rpd and catalog to QA instance,we are facing problem there.In DB,Query takes 17Sec to render,Indexes are already created in DB.Please give some solutions. Thanks in advacne

Leave a comment

Your email address will not be published. Required fields are marked *.