Tuesday, November 18, 2008

Eliminate the duplicate rows

How do I eliminate duplicate rows?
SQL> delete from table_name where rowid not in (select max(rowid) from table_name group by duplicate_values_field_name);
or
SQL> delete from table_name ta where rowid > (select min(rowid) from table_name tb where ta.duplicate_values_field_name = tb.duplicate_values_field_name);
Example.
Table Emp
Empno Ename
101 Scott
102 Jiyo
103 Millor
104 Jiyo
105 Smith
delete from emp a where rowid > ( select min(rowid) from emp b where a.ename = b.ename);
The output looks like:
Empno Ename
101 Scott
102 Jiyo
103 Millor
105 Smith

Oracle Interview questions and answers

1). Is there a limit on the size of a PL/SQL block?
Answer: Currently, the maximum parsed/compiled size of a PL/SQL block is 64K and the maximum code size is 100K. You can run the following select statement to query the size of an existing package or procedure:
SQL> select * from dba_object_size where name = 'procedure_name';

2). Can one read/write files from PL/SQL?
Answer: Included in Oracle 7.3 is a UTL_FILE package that can read and write files. The directory you intend writing to has to be in your INIT.ORA file (see the UTL_FILE_DIR=... parameter). Before Oracle 7.3 the only means of writing a file was to use DBMS_OUTPUT with the SQL*Plus SPOOL command.
DECLARE
fileHandler UTL_FILE.FILE_TYPE;
BEGIN
fileHandler := UTL_FILE.FOPEN('/home/oracle/tmp', 'myoutput','W');
UTL_FILE.PUTF(fileHandler, 'Value of func1 is %s\n', func1(1));
UTL_FILE.FCLOSE(fileHandler);
END;

Monday, August 4, 2008

Importing dump file to Oracle

You can import an emp.dmp file into Oracle using the following command-line syntax.

C:\>imp userid/password@ora10 file=emp.dmp tables=test fromuser=lispadmin touser=lisp

If Oracle is installed on the C drive, run the command from c:\.
The tables=test clause is used when you want to import only specific tables.

Wednesday, July 16, 2008

Oracle 11g SQL New Features

"pivot" SQL clause - The new "pivot" SQL clause will allow quick rollup, similar to an MS-Excel pivot table, where you can display multiple rows on one column with SQL. MS SQL Server 2005 also introduced a pivot clause. Laurent Schneider notes that the new SQL "pivot" syntax is great for converting rows-to-columns and columns-to-rows.


The /*+result_cache*/ SQL hint - This suggests that the result data will be cached in the data buffers, and not the intermediate data blocks that were accessed to obtain the query results. You can cache SQL and PL/SQL results for super-fast subsequent retrieval. The "result cache" ties into the "scalable execution" concept. There are three areas of the result cache:

The SQL query result cache - This is an area of SGA memory for storing query results.


The PL/SQL function result cache - This result cache can store the results from a PL/SQL function call.


The OCI client result cache - This cache retains results from OCI calls, both for SQL queries or PL/SQL functions.
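As a minimal sketch of the hint described above (reusing the EMP table from the examples elsewhere on this page; on a second execution the cached result is returned without revisiting the underlying blocks):

SELECT /*+ result_cache */ deptno, AVG(sal)
FROM emp
GROUP BY deptno;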


Scalable Execution - This 11g feature consists of a number of features, the first of which is query results caching; this feature automatically caches the results of an SQL query as opposed to the data blocks normally cached by the buffer cache, and works both client (OCI) and server side - this was described as "buffer cache taken to the next level". The DBA sets the size of the results cache and turns the feature on at a table level with the command "alter table DEPT cache results"; the per-process cache is shared across multiple sessions and, at the client level, is available with all 11g OCI-based clients. Source: Mark Rittman


XML SQL queries - Oracle 11g will support query mechanisms for XML, including XQuery and SQL/XML, emerging standards for querying XML data stored inside tables.


SQL Replay - Similar to the Database Replay feature, but this only captures and applies the SQL workload, not the total workload. Source: Dr. Tim Hall

Improved optimizer statistics collection speed - Oracle 11g has improved dbms_stats performance, allowing for an order of magnitude faster CBO statistics creation. Oracle 11g has also separated out the "gather" and "publish" operations, allowing CBO statistics to be retained for later use. Also, Oracle 11g introduces multi-column statistics to give the CBO the ability to more accurately select rows when the WHERE clause contains multi-column conditions or joins.

SQL Execution Plan Management - Oracle 11g SQL will allow you to fix execution plans (explain plans) for specific statements, regardless of statistics or database version changes. See Inside the 11g SQL Performance Advisor.

Dynamic SQL - DBMS_SQL is here to stay. It's faster and is being enhanced. DBMS_SQL and NDS can now accept CLOBs (no more 32K limit on NDS). A ref cursor can become a DBMS_SQL cursor and vice versa. DBMS_SQL now supports user-defined types and bulk operations. Source: Lewis Cunningham
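A hedged sketch of the ref cursor interchange: the 11g function DBMS_SQL.TO_REFCURSOR converts an executed DBMS_SQL cursor into a ref cursor (the EMP table is assumed from the examples elsewhere on this page):

DECLARE
c INTEGER;
rc SYS_REFCURSOR;
l_rows INTEGER;
l_ename VARCHAR2(30);
BEGIN
c := DBMS_SQL.OPEN_CURSOR;
DBMS_SQL.PARSE(c, 'select ename from emp', DBMS_SQL.NATIVE);
l_rows := DBMS_SQL.EXECUTE(c);
rc := DBMS_SQL.TO_REFCURSOR(c); -- the DBMS_SQL cursor becomes a ref cursor
LOOP
FETCH rc INTO l_ename;
EXIT WHEN rc%NOTFOUND;
DBMS_OUTPUT.PUT_LINE(l_ename);
END LOOP;
CLOSE rc; -- close through the ref cursor, not through DBMS_SQL
END;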


SQL Performance Advisor - You can tell 11g to automatically apply SQL profiles for statements where the suggested profile gives three times better performance than the existing statement. The performance comparisons are done by a new administrative task during a user-specified maintenance window.

Improved SQL Access Advisor - The 11g SQL Access Advisor gives partitioning advice, including advice on the new interval partitioning. Interval partitioning is an automated version of range partitioning, where new equally-sized partitions are automatically created when needed. Both range and interval partitions can exist for a single table, and range partitioned tables can be converted to interval partitioned tables.

Oracle 11g PL/SQL New Features

PL/SQL "continue" keyword – Now PL/SQL will allow a continue in a loop similar to one provided by C language, which can skip an iteration to bypass any “else” Boolean conditions. Below is an excellent PL/SQL example of the PL/SQL continue clause in-action:
BEGIN
FOR i IN 1..10
LOOP
DBMS_OUTPUT.PUT_LINE('i='||TO_CHAR(i));
IF ( i = 9 ) THEN
CONTINUE;
END IF;
DBMS_OUTPUT.PUT_LINE('Only if i is not equal to 9');
END LOOP;
END;

Disabled state for PL/SQL - Another 11g new feature is a "disabled" state for PL/SQL (as opposed to "enabled" and "invalid" in dba_objects).

Easy PL/SQL compiling - Native Compilation no longer requires a C compiler to compile your PL/SQL. Your code goes directly to a shared library.
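A minimal sketch of switching native compilation on for a session (PLSQL_CODE_TYPE is the controlling parameter; my_proc is a hypothetical procedure name):

SQL> alter session set plsql_code_type = 'NATIVE';
SQL> alter procedure my_proc compile;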

Improved PL/SQL stored procedure invalidation mechanism - A new 11g feature is fine-grained dependency tracking, reducing the number of objects which become invalid as a result of DDL.

Scalable PL/SQL - The next scalable execution feature is automatic creation of "native" PL/SQL (and Java code), with just one parameter for each type with an "on/off" value. This apparently provides a 100% performance boost for pure PL/SQL code, and a 10%-30% performance boost for code containing SQL.

Enhanced PL/SQL warnings - The 11g PL/SQL compiler will issue a warning for a "when others" with no raise.
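For instance, with compiler warnings enabled, a handler like the one below draws the new warning (procedure p is a throwaway example; in 11g the warning is PLW-06009):

SQL> alter session set plsql_warnings = 'ENABLE:ALL';
CREATE OR REPLACE PROCEDURE p IS
BEGIN
NULL;
EXCEPTION
WHEN OTHERS THEN
NULL; -- no RAISE here: the 11g compiler flags this handler
END;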

Stored Procedure named notation - Named notation is now supported when calling a stored procedure from SQL.
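As an illustration, using the FindCourse function defined in a post further down this page (the course name is made up):

SQL> select FindCourse(name_in => 'Trigonometry') from dual;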

Tuesday, July 15, 2008

Oracle/PLSQL: Creating Functions

In Oracle, you can create your own functions.

The syntax for a function is:

CREATE [OR REPLACE] FUNCTION function_name
[ (parameter [,parameter]) ]
RETURN return_datatype
IS | AS
[declaration_section]
BEGIN
executable_section
[EXCEPTION
exception_section]
END [function_name];


When you create a procedure or function, you may define parameters. There are three types of parameters that can be declared:

IN - The parameter can be referenced by the procedure or function. The value of the parameter cannot be overwritten by the procedure or function.

OUT - The parameter cannot be referenced by the procedure or function, but the value of the parameter can be overwritten by the procedure or function.

IN OUT - The parameter can be referenced by the procedure or function, and the value of the parameter can be overwritten by the procedure or function.
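A minimal sketch putting all three modes together (demo_modes is a hypothetical procedure, not part of the original examples):

CREATE OR REPLACE PROCEDURE demo_modes
( p_in IN NUMBER, -- readable only
p_out OUT NUMBER, -- writable; its initial value is not readable
p_in_out IN OUT NUMBER ) -- readable and writable
IS
BEGIN
p_out := p_in * 2;
p_in_out := p_in_out + p_in;
END;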


The following is a simple example of a function:

CREATE OR REPLACE Function FindCourse
( name_in IN varchar2 )
RETURN number
IS
cnumber number;

cursor c1 is
select course_number
from courses_tbl
where course_name = name_in;

BEGIN

open c1;
fetch c1 into cnumber;

if c1%notfound then
cnumber := 9999;
end if;

close c1;

RETURN cnumber;

EXCEPTION
WHEN OTHERS THEN
raise_application_error(-20001,'An error was encountered - '||SQLCODE||' -ERROR- '||SQLERRM);
END;

This function is called FindCourse. It has one parameter called name_in and it returns a number. The function returns the course number if it finds a match based on the course name. Otherwise, it returns 9999.


You could then reference your new function in an SQL statement as follows:

select course_name, FindCourse(course_name) as course_id
from courses_tbl
where subject = 'Mathematics';

Change password for a user in Oracle

How do I change the password for a user in Oracle?
Answer: To change a user's password in Oracle, you need to execute the alter user command.

The syntax for changing a password is:

alter user user_name identified by new_password;

user_name is the user whose password you wish to change.

new_password is the new password to assign.


For example:

If you wanted to reset the password for a user named SCOTT and set the new password to TIGER, you would run the following command:

alter user scott identified by tiger;

Monday, July 14, 2008

Difference between PRIMARY KEY and UNIQUE KEY

Primary Key

1) There can be only one primary key in a table.
2) A primary key creates a unique index (in SQL Server, a clustered index by default).
3) A primary key enforces both UNIQUE and NOT NULL constraints.
4) A primary key can be referenced by a foreign key constraint.

Unique Key

1) More than one unique key can be created in a table.
2) A unique key creates a non-clustered (ordinary) unique index.
3) A unique key enforces uniqueness only; columns under a unique key may contain NULLs.
4) A unique key can also be referenced by a foreign key constraint.
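A minimal sketch showing both constraints side by side (demo_keys is a hypothetical table):

CREATE TABLE demo_keys
( id NUMBER CONSTRAINT demo_pk PRIMARY KEY, -- unique + not null
code VARCHAR2(10) CONSTRAINT demo_uk UNIQUE, -- unique, NULLs allowed
name VARCHAR2(40) );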

What are Pseudocolumns

A pseudocolumn is like a table column, but is not actually stored in the table.
You can select from pseudocolumns, but you cannot insert, update, or delete their values.

Examples:
CURRVAL, USER, NEXTVAL, ROWNUM, ROWID, LEVEL

CURRVAL and NEXTVAL are used with sequences to retrieve the current and next sequence values.

The usage is as follows:
Suppose abc is the sequence name. If we want to see the CURRVAL of the sequence, we issue the following SQL statement:

Select abc.currval From dual;

If we want to see the next value of the sequence, we issue the command:
Select abc.nextval From dual;
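For completeness, a minimal sketch of creating such a sequence (abc is just the placeholder name used above); note that NEXTVAL must be referenced at least once in a session before CURRVAL can be selected:

SQL> create sequence abc start with 1 increment by 1;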

Thursday, July 10, 2008

Difference between Delete and Truncate

Deleting records from a table is a DML (Data Manipulation Language) operation.
The deleted rows are recorded in the undo (rollback) segments so they can be retrieved again if the operation is rolled back, and the changes are also written to the redo logs.

Syntax of the DELETE operation:
DELETE FROM tablename
WHERE condition;

TRUNCATE is a DDL (Data Definition Language) operation. When we TRUNCATE a table, all the records get deleted in one go, not record by record. Also, the deleted records generate almost no undo or redo.
When we have millions of records to delete, TRUNCATE is the best option.

Syntax of the TRUNCATE operation:
TRUNCATE TABLE tablename;

Recording every deleted row one by one takes time, which makes the DELETE operation slower than TRUNCATE.
One must take care: data, once truncated, cannot be rolled back.

Analytical Functions

Analytic functions have been available since Oracle 8.1.6. They are designed to address problems such as Top-N queries, calculating running totals, computing moving averages, finding percentages within a group, and many others. Analytic functions make these problems easy to code and fast to run.

Analytic functions calculate an aggregate value based on a group of records. They differ from aggregate functions in that they return multiple records for each group. The group of rows is called a window and is defined by the analytic clause. For each row, a "sliding" window of rows is defined. The window determines the range of rows used to perform the calculations for the "current row". Window sizes can be based on either a physical number of rows or a logical interval such as time.

Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.

The Syntax

Analytic-Function(Argument1,Argument2,....)
OVER (
Query-Partition-Clause
Order-By-Clause
Windowing-Clause
)

Specify the name of an analytic function; Oracle provides many analytic functions, such as AVG, CORR, COVAR_POP, COVAR_SAMP, COUNT, CUME_DIST, DENSE_RANK, FIRST, FIRST_VALUE, LAG, LAST, LAST_VALUE, LEAD, MAX, MIN, NTILE, PERCENT_RANK, PERCENTILE_CONT, PERCENTILE_DISC, RANK, RATIO_TO_REPORT, STDDEV, STDDEV_POP, STDDEV_SAMP, SUM, VAR_POP, VAR_SAMP, and VARIANCE.

Arguments

Analytic functions take 0 to 3 arguments.

Query-Partition-Clause

The PARTITION BY clause logically breaks a single result set into N groups, according to the criteria set by the partition expressions. The words "partition" and "group" are used synonymously here. The analytic functions are applied to each group independently; they are reset for each group.

Order-By-Clause

The ORDER BY clause specifies how the data is sorted within each group (partition). This will definitely affect the outcome of any analytic function.

Windowing-Clause

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. This clause can be used to have the analytic function compute its value based on any arbitrary sliding or anchored window within a group. More information on windows can be found in the Windows section below.

Example: Calculate a running Total

This example shows the cumulative salary within a department row by row, with each row including a summation of the prior rows' salaries.

set autotrace on
break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999
column seq format 999

SELECT ename "Ename", deptno "Deptno", sal "Sal",
SUM(sal)
OVER (ORDER BY deptno, ename) "Running Total",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename) "Dept Total",
ROW_NUMBER()
OVER (PARTITION BY deptno
ORDER BY ENAME) "Seq"
FROM emp
ORDER BY deptno, ename
/

Ename Deptno Sal Running Total Dept Total Seq
------ ------ ------ ------------- ---------- ----
CLARK 10 2450 2450 2450 1
KING 5000 7450 7450 2
MILLER 1300 8750 8750 3

ADAMS 20 1100 9850 1100 1
FORD 3000 12850 4100 2
JONES 2975 15825 7075 3
SCOTT 3000 18825 10075 4
SMITH 800 19625 10875 5

ALLEN 30 1600 21225 1600 1
BLAKE 2850 24075 4450 2
JAMES 950 25025 5400 3
MARTIN 1250 26275 6650 4
TURNER 1500 27775 8150 5
WARD 1250 29025 9400 6

Execution Plan
---------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 WINDOW (SORT)
2 1 TABLE ACCESS (FULL) OF 'EMP'
Statistics
---------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads
0 redo size
1658 bytes sent via SQL*Net to client
503 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
14 rows processed

The example shows how to calculate a "Running Total" for the entire query. This is done using the entire ordered result set, via SUM(sal) OVER (ORDER BY deptno, ename).

Further, we were able to compute a running total within each department, a total that is reset at the beginning of the next department. The PARTITION BY deptno in that SUM(sal) caused this to happen; a partitioning clause was specified in the query in order to break the data up into groups.

The ROW_NUMBER() function is used to sequentially number the rows returned in each group, according to our ordering criteria (a "Seq" column was added in order to display this position).

The execution plan shows that the whole query is performed with only 3 consistent gets, something that is hard to match with standard SQL or even PL/SQL.

Top-N Queries

How can we get the Top-N records by some set of fields?
Prior to having access to these analytic functions, questions of this nature were extremely difficult to answer.

There are some problems with Top-N queries however; mostly in the way people phrase them. It is something to be careful about when designing reports. Consider this seemingly sensible request:

I would like the top three paid sales reps by department

The problem with this question is that it is ambiguous because of repeated values: there might be four people who all make the same salary, so what should we do then?

Let's look at some examples, all using the well-known table EMP.

Example 1

Sort the sales people by salary from greatest to least. Give the first three rows. If there are fewer than three people in a department, this will return fewer than three records.

set autotrace on explain
break on deptno skip 1

SELECT * FROM (
SELECT deptno, ename, sal, ROW_NUMBER()
OVER (
PARTITION BY deptno ORDER BY sal DESC
) Top3 FROM emp
)
WHERE Top3 <= 3
/

DEPTNO ENAME SAL TOP3
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3

20 SCOTT 3000 1
FORD 3000 2
JONES 2975 3

30 BLAKE 2850 1
ALLEN 1600 2
TURNER 1500 3

9 rows selected.

Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT)
3 2 TABLE ACCESS (FULL) OF 'EMP'

This query works by sorting each partition (or group, which is the deptno) in descending order based on the salary column, and then assigning a sequential row number to each row in the group as it is processed. A WHERE clause is then applied to keep just the first three rows of each partition.

Example 2

Give me the set of sales people who make the top 3 salaries - that is, find the set of distinct salary amounts, sort them, take the largest three, and give me everyone who makes one of those values.

SELECT * FROM (
SELECT deptno, ename, sal,
DENSE_RANK()
OVER (
PARTITION BY deptno ORDER BY sal desc
) TopN FROM emp
)
WHERE TopN <= 3
ORDER BY deptno, sal DESC
/

DEPTNO ENAME SAL TOPN
---------- ---------- ---------- ----------
10 KING 5000 1
CLARK 2450 2
MILLER 1300 3

20 SCOTT 3000 1 <--- !
FORD 3000 1 <--- !
JONES 2975 2
ADAMS 1100 3

30 BLAKE 2850 1
ALLEN 1600 2
TURNER 1500 3


10 rows selected.

Execution Plan
--------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 VIEW
2 1 WINDOW (SORT PUSHED RANK)
3 2 TABLE ACCESS (FULL) OF 'EMP'

Here the DENSE_RANK function was used to get the top three salaries. We assigned the dense rank to the salary column and sorted it in a descending order.

The DENSE_RANK function computes the rank of a row in an ordered group of rows. The ranks are consecutive integers beginning with 1. The largest rank value is the number of unique values returned by the query. Rank values are not skipped in the event of ties. Rows with equal values for the ranking criteria receive the same rank.

The DENSE_RANK function does not skip numbers and will assign the same number to those rows with the same value. Hence, after the result set is built in the inline view, we can simply select all of the rows with a dense rank of three or less, this gives us everyone who makes the top three salaries by department number.

Windows

The windowing clause gives us a way to define a sliding or anchored window of data, on which the analytic function will operate, within a group. The default window is an anchored window that simply starts at the first row of a group and continues to the current row.

We can set up windows based on two criteria: RANGES of data values or ROWS offset from the current row. It can be said that the existence of an ORDER BY in an analytic function adds a default window clause of RANGE UNBOUNDED PRECEDING, which means: take all rows in our partition that came before us, as specified by the ORDER BY clause.
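To make this concrete, the two window specifications below are equivalent (a sketch against the EMP table used throughout):

SUM(sal) OVER (ORDER BY hiredate)
SUM(sal) OVER (ORDER BY hiredate RANGE UNBOUNDED PRECEDING)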

Let's look at an example with a sliding window within a group and compute the sum of the current row's SAL column plus the previous 2 rows in that group. If we need a report that shows the sum of the current employee's salary together with the preceding two salaries within a department, it would look like this.

break on deptno skip 1
column ename format A6
column deptno format 999
column sal format 99999

SELECT deptno "Deptno", ename "Ename", sal "Sal",
SUM(SAL)
OVER (PARTITION BY deptno
ORDER BY ename
ROWS 2 PRECEDING) "Sliding Total"
FROM emp
ORDER BY deptno, ename
/

Deptno Ename Sal Sliding Total
------ ------ ------ -------------
10 CLARK 2450 2450
KING 5000 7450
MILLER 1300 8750

20 ADAMS 1100 1100
FORD 3000 4100
JONES 2975 7075 ^
SCOTT 3000 8975 |
SMITH 800 6775 \-- Sliding Window

30 ALLEN 1600 1600
BLAKE 2850 4450
JAMES 950 5400
MARTIN 1250 5050
TURNER 1500 3700
WARD 1250 4000

The partition clause makes the SUM(sal) be computed within each department, independent of the other groups; the SUM(sal) is 'reset' as the department changes. The ORDER BY ename clause sorts the data within each department by ENAME; this allows the window clause, ROWS 2 PRECEDING, to access the 2 rows prior to the current row in a group in order to sum the salaries.

For example, note that the SLIDING TOTAL value for SMITH is 6775, which is the sum of 800, 3000, and 2975. That is simply SMITH's row plus the salaries from the preceding two rows in the window.

Range Windows

Range windows collect rows together based on a WHERE clause. If I say 'range 5 preceding', for example, this will generate a sliding window that has the set of all preceding rows in the group such that they are within 5 units of the current row. These units may be either numeric comparisons or date comparisons, and it is not valid to use RANGE with datatypes other than numbers and dates.

Example

Count the employees who were hired within the 100 days preceding each employee's own hiredate. The range window goes back 100 days from the current row's hiredate and then counts the rows within this range. The solution is to use the following window specification:

COUNT(*) OVER (ORDER BY hiredate ASC RANGE 100 PRECEDING)

column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column cnt heading "Cnt" format 99

SELECT ename, hiredate, hiredate-100 hiredate_pre,
COUNT(*)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) cnt
FROM emp
ORDER BY hiredate ASC
/

Name Hired Hired-100 Cnt
-------- ---------- ---------- ---
SMITH 17-DEC-80 08-SEP-80 1
ALLEN 20-FEB-81 12-NOV-80 2
WARD 22-FEB-81 14-NOV-80 3
JONES 02-APR-81 23-DEC-80 3
BLAKE 01-MAY-81 21-JAN-81 4
CLARK 09-JUN-81 01-MAR-81 3
TURNER 08-SEP-81 31-MAY-81 2
MARTIN 28-SEP-81 20-JUN-81 2
KING 17-NOV-81 09-AUG-81 3
JAMES 03-DEC-81 25-AUG-81 5
FORD 03-DEC-81 25-AUG-81 5
MILLER 23-JAN-82 15-OCT-81 4
SCOTT 09-DEC-82 31-AUG-82 1
ADAMS 12-JAN-83 04-OCT-82 2

We ordered the single partition by hiredate ASC. If we look, for example, at the row for CLARK, we can see that his hiredate was 09-JUN-81, and 100 days prior to that is 01-MAR-81. If we look at who was hired between 01-MAR-81 and 09-JUN-81, we find JONES (hired: 02-APR-81) and BLAKE (hired: 01-MAY-81). That makes 3 rows including the current row, which is what we see in the "Cnt" column of CLARK's row.

Compute average salary for defined range

As an example, compute for each employee the average salary of the people hired within the 100 days preceding their hiredate. The query looks like this:

column ename heading "Name" format a8
column hiredate heading "Hired" format a10
column hiredate_pre heading "Hired-100" format a10
column avg_sal heading "Avg-100" format 999999

SELECT ename, hiredate, sal,
AVG(sal)
OVER (
ORDER BY hiredate ASC
RANGE 100 PRECEDING
) avg_sal
FROM emp
ORDER BY hiredate ASC
/

Name Hired SAL Avg-100
-------- ---------- ---------- -------
SMITH 17-DEC-80 800 800
ALLEN 20-FEB-81 1600 1200
WARD 22-FEB-81 1250 1217
JONES 02-APR-81 2975 1942
BLAKE 01-MAY-81 2850 2169
CLARK 09-JUN-81 2450 2758
TURNER 08-SEP-81 1500 1975
MARTIN 28-SEP-81 1250 1375
KING 17-NOV-81 5000 2583
JAMES 03-DEC-81 950 2340
FORD 03-DEC-81 3000 2340
MILLER 23-JAN-82 1300 2563
SCOTT 09-DEC-82 3000 3000
ADAMS 12-JAN-83 1100 2050

Look at CLARK again, since we now understand his range window within the group. We can see that the average salary of 2758 is equal to (2975+2850+2450)/3: the average of the salaries for CLARK and the rows preceding CLARK, those of JONES and BLAKE. The data must be sorted in ascending order for this to work.

Row Windows

Row windows are physical units: a physical number of rows to include in the window. For example, you can calculate the average salary of a given record together with the (up to 5) employees hired before or after them, as follows:

set numformat 9999
SELECT ename, hiredate, sal,
AVG(sal)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) AvgAsc,
COUNT(*)
OVER (ORDER BY hiredate ASC ROWS 5 PRECEDING) CntAsc,
AVG(sal)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) AvgDes,
COUNT(*)
OVER (ORDER BY hiredate DESC ROWS 5 PRECEDING) CntDes
FROM emp
ORDER BY hiredate
/

ENAME HIREDATE SAL AVGASC CNTASC AVGDES CNTDES
---------- --------- ----- ------ ------ ------ ------
SMITH 17-DEC-80 800 800 1 1988 6
ALLEN 20-FEB-81 1600 1200 2 2104 6
WARD 22-FEB-81 1250 1217 3 2046 6
JONES 02-APR-81 2975 1656 4 2671 6
BLAKE 01-MAY-81 2850 1895 5 2675 6
CLARK 09-JUN-81 2450 1988 6 2358 6
TURNER 08-SEP-81 1500 2104 6 2167 6
MARTIN 28-SEP-81 1250 2046 6 2417 6
KING 17-NOV-81 5000 2671 6 2392 6
JAMES 03-DEC-81 950 2333 6 1588 4
FORD 03-DEC-81 3000 2358 6 1870 5
MILLER 23-JAN-82 1300 2167 6 1800 3
SCOTT 09-DEC-82 3000 2417 6 2050 2
ADAMS 12-JAN-83 1100 2392 6 1100 1

The window consists of up to 6 rows: the current row and the five rows "in front of" it, where "in front of" is defined by the ORDER BY clause. With ROW partitions, we do not have the limitation of RANGE partitions; the data may be of any type and the ORDER BY may include many columns. Notice that we selected a COUNT(*) as well. This is useful to demonstrate how many rows went into making up a given average. We can see clearly that for ALLEN's record, the average salary computation for people hired before him used only 2 records, whereas the computation for salaries of people hired after him used 6.

Accessing Rows Around Your Current Row

Frequently you want to access data not only from the current row but also from rows "in front of" or "behind" it. For example, suppose you need a report that shows, by department, all of the employees; their hire date; how many days before the last hire was; and how many days after the next hire was.

Using straight SQL this query would be difficult to write, and its performance would once again be questionable. The approach typically taken in the past was either to "select a select" or to write a PL/SQL function that would take some data from the current row and "find" the previous and next rows' data. This worked, but introduced large overhead into both the development of the query and its run-time execution.

Using analytic functions, this is easy and efficient to do.

set echo on

column deptno format 99 heading Dep
column ename format a6 heading Ename
column hiredate heading Hired
column last_hire heading LastHired
column days_last heading DaysLast
column next_hire heading NextHire
column days_next heading NextDays

break on deptno skip 1

SELECT deptno, ename, hiredate,
LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) last_hire,
hiredate - LAG(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) days_last,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) next_hire,
LEAD(hiredate,1,NULL)
OVER (PARTITION BY deptno
ORDER BY hiredate, ename) - hiredate days_next
FROM emp
ORDER BY deptno, hiredate
/

Dep Ename Hired LastHired DaysLast NextHire NextDays
--- ------ --------- --------- -------- --------- --------
10 CLARK 09-JUN-81 17-NOV-81 161
KING 17-NOV-81 09-JUN-81 161 23-JAN-82 67
MILLER 23-JAN-82 17-NOV-81 67

20 SMITH 17-DEC-80 02-APR-81 106
JONES 02-APR-81 17-DEC-80 106 03-DEC-81 245
FORD 03-DEC-81 02-APR-81 245 09-DEC-82 371
SCOTT 09-DEC-82 03-DEC-81 371 12-JAN-83 34
ADAMS 12-JAN-83 09-DEC-82 34

30 ALLEN 20-FEB-81 22-FEB-81 2
WARD 22-FEB-81 20-FEB-81 2 01-MAY-81 68
BLAKE 01-MAY-81 22-FEB-81 68 08-SEP-81 130
TURNER 08-SEP-81 01-MAY-81 130 28-SEP-81 20
MARTIN 28-SEP-81 08-SEP-81 20 03-DEC-81 66
JAMES 03-DEC-81 28-SEP-81 66

The LEAD and LAG routines can be considered a way to "index into your partitioned group". Using these functions you can access any individual row. Notice, for example, in the above printout that the record for KING includes the LAST_HIRE data from the prior row and the NEXT_HIRE data from the next row. We can easily access fields in the records preceding or following the current record in an ordered partition.

LAG

LAG ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )

LAG provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LAG provides access to a row at a given physical offset prior to that position.

If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the window. If you do not specify default, then its default value is null.

The following example provides, for each clerk in the EMP table, the salary of the clerk hired just before:

SELECT ename,hiredate,sal,
LAG(sal, 1, 0)
OVER (ORDER BY hiredate) AS PrevSal
FROM emp
WHERE job = 'CLERK';

Ename Hired SAL PREVSAL
------ --------- ----- -------
SMITH 17-DEC-80 800 0
JAMES 03-DEC-81 950 800
MILLER 23-JAN-82 1300 950
ADAMS 12-JAN-83 1100 1300

LEAD

LEAD ( value_expr [, offset] [, default] )
OVER ( [query_partition_clause] order_by_clause )

LEAD provides access to more than one row of a table at the same time without a self join. Given a series of rows returned from a query and a position of the cursor, LEAD provides access to a row at a given physical offset beyond that position.

If you do not specify offset, then its default is 1. The optional default value is returned if the offset goes beyond the scope of the table. If you do not specify default, then its default value is null.

The following example provides, for each employee in department 30, the hire date of the employee hired just after:

SELECT ename, hiredate,
LEAD(hiredate, 1)
OVER (ORDER BY hiredate) AS NextHired
FROM emp WHERE deptno = 30;

Ename Hired NEXTHIRED
------ --------- ---------
ALLEN 20-FEB-81 22-FEB-81
WARD 22-FEB-81 01-MAY-81
BLAKE 01-MAY-81 08-SEP-81
TURNER 08-SEP-81 28-SEP-81
MARTIN 28-SEP-81 03-DEC-81
JAMES 03-DEC-81

Determine the First Value / Last Value of a Group

The FIRST_VALUE and LAST_VALUE functions allow you to select the first and last rows from a group. These rows are especially valuable because they are often used as the baselines in calculations.

Example

The following example selects, for each employee in each department, the name of the employee with the lowest salary.

break on deptno skip 1

SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal ASC) AS MIN_SAL_HAS
FROM emp
ORDER BY deptno, ename;

DEPTNO ENAME SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
10 CLARK 2450 MILLER
KING 5000 MILLER
MILLER 1300 MILLER

20 ADAMS 1100 SMITH
FORD 3000 SMITH
JONES 2975 SMITH
SCOTT 3000 SMITH
SMITH 800 SMITH

30 ALLEN 1600 JAMES
BLAKE 2850 JAMES
JAMES 950 JAMES
MARTIN 1250 JAMES
TURNER 1500 JAMES
WARD 1250 JAMES

The following example selects, for each employee in each department, the name of the employee with the highest salary.

SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (PARTITION BY deptno
ORDER BY sal DESC) AS MAX_SAL_HAS
FROM emp
ORDER BY deptno, ename;

DEPTNO ENAME SAL MAX_SAL_HAS
---------- ---------- ---------- -----------
10 CLARK 2450 KING
KING 5000 KING
MILLER 1300 KING

20 ADAMS 1100 FORD
FORD 3000 FORD
JONES 2975 FORD
SCOTT 3000 FORD
SMITH 800 FORD

30 ALLEN 1600 BLAKE
BLAKE 2850 BLAKE
JAMES 950 BLAKE
MARTIN 1250 BLAKE
TURNER 1500 BLAKE
WARD 1250 BLAKE

The following example selects, for each employee in department 30, the name of the employee with the lowest salary, using an inline view:

SELECT deptno, ename, sal,
FIRST_VALUE(ename)
OVER (ORDER BY sal ASC) AS MIN_SAL_HAS
FROM (SELECT * FROM emp WHERE deptno = 30);

DEPTNO ENAME SAL MIN_SAL_HAS
---------- ---------- ---------- -----------
30 JAMES 950 JAMES
MARTIN 1250 JAMES
WARD 1250 JAMES
TURNER 1500 JAMES
ALLEN 1600 JAMES
BLAKE 2850 JAMES

Crosstab or Pivot Queries

A crosstab query, sometimes known as a pivot query, groups your data in a slightly different way from those we have seen so far. A crosstab query can be used to get a result with three rows (one for each project), each row having three columns (the first listing the project and then one column for each year), like this:

Project 2001 2002
ID CHF CHF
-------------------------------
100 123.00 234.50
200 543.00 230.00
300 238.00 120.50

Example

Let's say you want to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department, and the row would have 4 columns: the DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using analytic functions this is almost easy; without analytic functions it was virtually impossible.

SELECT deptno,
MAX(DECODE(seq,1,ename,null)) first,
MAX(DECODE(seq,2,ename,null)) second,
MAX(DECODE(seq,3,ename,null)) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
GROUP BY deptno
/

DEPTNO FIRST SECOND THIRD
---------- ---------- ---------- ----------
10 KING CLARK MILLER
20 SCOTT FORD JONES
30 BLAKE ALLEN TURNER

Note the inner query, which assigns a sequence number to each employee by department number in order of salary (shown standalone below with the alias RowNr):

SELECT deptno, ename, sal,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) RowNr
FROM emp;

DEPTNO ENAME SAL ROWNR
---------- ---------- ---------- ----------
10 KING 5000 1
10 CLARK 2450 2
10 MILLER 1300 3
20 SCOTT 3000 1
20 FORD 3000 2
20 JONES 2975 3
20 ADAMS 1100 4
20 SMITH 800 5
30 BLAKE 2850 1
30 ALLEN 1600 2
30 TURNER 1500 3
30 WARD 1250 4
30 MARTIN 1250 5
30 JAMES 950 6

The DECODE in the outer query keeps only rows with sequences 1, 2 or 3 and assigns them to the correct "column". The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand if you see the resultset without the aggregate function MAX grouped by deptno.

SELECT deptno,
DECODE(seq,1,ename,null) first,
DECODE(seq,2,ename,null) second,
DECODE(seq,3,ename,null) third
FROM (SELECT deptno, ename,
row_number()
OVER (PARTITION BY deptno
ORDER BY sal desc NULLS LAST) seq
FROM emp)
WHERE seq <= 3
/

    DEPTNO FIRST      SECOND     THIRD
---------- ---------- ---------- ----------
        10 KING
        10            CLARK
        10                       MILLER
        20 SCOTT
        20            FORD
        20                       JONES
        30 BLAKE
        30            ALLEN
        30                       TURNER

The MAX aggregate function is applied within each GROUP BY group (DEPTNO). In any given DEPTNO above, only one row will have a non-null value for FIRST; the remaining rows in that group will always be NULL. The MAX function picks out the non-null row and keeps it for us. Hence, the GROUP BY and MAX collapse our resultset, removing the NULL values from it and giving us what we want.

Wednesday, July 9, 2008

Index-Organized Tables

An index-organized table (IOT) is like a regular table with an index on one or more of its columns, but instead of maintaining two separate segments for the table and the B*-tree index, the Oracle server maintains one single B*-tree structure which contains:
• The primary key value
• Other (non-key) column values for the row

The B*-tree structure, which is based on the primary key of the table, is organized like an index.
The leaf blocks in this structure contain the rows instead of the ROWIDs. This means that the rows in the index-organized table are always maintained in the order of the primary key.

Because index-organized tables do not have real ROWIDs, the concept of logical ROWIDs is introduced in Oracle9i. This allows you to create additional indexes on index-organized tables; this was not possible in Oracle8 release 8.0.
Because large rows of an index-organized table might destroy the dense and efficient storage of the B*-tree structure, in Oracle9i you can store part of the row in another segment, called an overflow area. This is discussed on the following pages.

IOT Performance Characteristics

• B*-tree stores full rows
• Sorted rows
• Logical ROWIDs
• Fast, key-based access to table data

You can access an index-organized table by using either the primary key or a combination of columns that constitute the leading part of the primary key.
IOTs provide fast, key-based access for queries involving exact match (equality operator) or range searches on the primary key.

Before Oracle9i, any other access to rows in an index-organized table required a full scan of the B*-tree structure. That is why Oracle9i introduced the concept of logical ROWIDs, which you can use to create additional indexes on an index-organized table. Logical ROWIDs internally have the UROWID datatype.

Because the rows are ordered in the IOT, full scans on an index-organized table return rows in a primary key sequence.
Because there is no duplication of primary key values (compared with regular tables: index segment and data segment), IOTs use less storage. Index organization is useful for a table that is frequently accessed using the primary key and has only a few, relatively short nonkey columns.

Note: The statement in the fourth paragraph about returning the rows in primary key sequence is only true for serial execution; it is no longer true with parallel execution. Also, the first bullet above is not always true: the concept of row overflow is introduced on the following pages.

IOT Limitations

• Must have a primary key
• Cannot be part of an index cluster or hash cluster
• Cannot contain LONG columns (although LOB columns are allowed)

• Index-organized tables must have a primary key. This is the unique identifier and is used as the basis for ordering; there is no ROWID to act as a unique identifier in the B*-tree structure.
• Index-organized tables cannot be part of an index cluster or a hash cluster.
• Index-organized tables cannot include LONG columns, but they can contain LOB columns.
• Since index-organized tables are B*-tree structures, they are subject to fragmentation as a result of incremental updating.
• You can use the ALTER TABLE … MOVE command to rebuild the index-organized table:
ALTER TABLE iot_tablename MOVE [OVERFLOW...];
• Specifying the optional OVERFLOW clause causes the overflow segment to be rebuilt, as well. Overflow segments are explained on the following pages.

Note: Oracle9i removed the IOT restrictions regarding secondary indexes and additional UNIQUE constraints. The absence of real ROWIDs may be seen as an IOT limitation, although the implementation of logical ROWIDs (with the UROWID data type) resolved all related shortcomings. Note that the MOVE option for the ALTER TABLE statement is new in Oracle9i.

When to Use Index-Organized Tables

Note that:
• IOTs are transparent to users and applications
• IOTs are supported by Oracle9i utilities

• Content-based information applications: text, images, and sound storage or retrieval

• Applications that use complex data objects have many different indexing requirements. For example, information-retrieval applications may need a keyword search or a search for similar words.
• Index-organized tables are particularly useful for applications accessing and manipulating complex data. Examples are information-retrieval, spatial, and OLAP applications.
• No special considerations exist for using most SQL statements against an index-organized table. The structure of the table should be completely transparent to the user.
• The Oracle9i utilities also support index-organized tables. SQL*Loader, using direct path mode, can load data directly into an index-organized table. This kind of loading can be quicker than loading into a standard table and then building the indexes afterwards.


Creating Index-Organized Tables

CREATE TABLE table-name
( column_definitions
[,constraint_definitions] )
ORGANIZATION INDEX
[ block_util_parameters ]
[ storage_clause ]
[ TABLESPACE tablespace ]
[ PCTTHRESHOLD integer
[ INCLUDING column_name ] ]
[ OVERFLOW segment_attr_clause ]


• The ORGANIZATION INDEX clause of the CREATE TABLE statement specifies that you create an index-organized table.
• You must specify a primary key constraint when creating index-organized tables. If you try to create an index-organized table without a primary key, the following error is generated:
ORA-25175: no PRIMARY KEY constraint found
• PCTTHRESHOLD specifies the percentage of space reserved for an index-organized table row.
• PCTTHRESHOLD defaults to 50 and must be a value from 1 to 50.
• If a row exceeds the size calculated based on this value, all columns after the column named in the INCLUDING clause are moved to the overflow segment.
• If OVERFLOW is not specified, then rows exceeding the threshold are rejected.
• INCLUDING column_name specifies a column at which to divide an index-organized table row into index and overflow portions. All columns that follow column_name are stored in the overflow data segment. If this is not specified and a row size exceeds PCTTHRESHOLD, all columns except the primary key columns are moved to the overflow area. The column is either the name of the last primary key column or any nonprimary key column.
• OVERFLOW specifies that index-organized table rows exceeding the specified threshold be placed in the data segment defined by segment_attr_clause, which specifies the tablespace, storage, and block utilization parameters.
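As a concrete sketch of the syntax above (the table, column, and tablespace names are made up for illustration):

CREATE TABLE countries
( country_id CHAR(2) CONSTRAINT countries_pk PRIMARY KEY
, country_name VARCHAR2(40)
, notes VARCHAR2(200) )
ORGANIZATION INDEX
PCTTHRESHOLD 20
INCLUDING country_name
OVERFLOW TABLESPACE users;

Here a row whose size exceeds 20 percent of the block has every column after country_name (that is, the notes column) moved to the overflow segment in the USERS tablespace.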

IOT Row Overflow

• Large rows in an index-organized table might destroy the dense B*-tree storage. You can overcome this problem by using an overflow area. Note that you need additional I/Os to retrieve these large rows, because the rows are stored in two pieces. This results in a decrease in performance compared to an IOT with shorter records that are mostly stored entirely in the IOT segment alone.
• When an index-organized table is created by specifying an OVERFLOW clause, the following three segments are created in the database:
• A logical table with the name defined in the CREATE TABLE clause
• An index with the same name as the primary key constraint
• A table to accommodate the overflow row pieces (The name of this table is SYS_IOT_OVER_n, where n is the OBJECT_ID of the index-organized table as seen from USER_OBJECTS.)

Note: If you create an index-organized table without specifying an OVERFLOW clause, only the first two segments are created. Give the primary key constraint a name, so that the index segment receives a meaningful name (as opposed to a system-generated one).

Retrieving IOT Information from the Data Dictionary

Use the following query to list the index-organized tables and information related to their structure:
SQL> select t.table_name as "IOT"
2 , o.table_name as "Overflow"
3 , i.index_name as "Index"
4 , o.tablespace_name as "Overflow TS"
5 , i.tablespace_name as "Index TS"
6 , i.pct_threshold as "Threshold"
7 from user_tables t
8 , user_tables o
9 , user_indexes i
10 where t.table_name = o.iot_name
11 and t.table_name = i.table_name;

Top-N SQL

• The idea behind the Top-N SQL feature is that you do not need to sort a full set if, for example, you are only interested in the four highest (or lowest) values.
• If a set is too big to be sorted in memory, performance degrades significantly. This is caused by the I/O to and from temporary storage of intermediate results on disk.
• If you only want to know the four highest values, you only need an array with four slots to scan the set and keep the four highest values in that array.
• The WHERE clause of the statement below is merged into the in-line view to prevent the full EMPLOYEES table from being sorted by SALARY. This appears in the execution plan as follows:

SQL> select *
2 from (select emp_id
3 , last_name
4 , first_name
5 , salary
6 from employees
7 ORDER BY salary desc)
8 where ROWNUM <= 5;

Execution Plan
------------------------------------------
SELECT STATEMENT Optimizer=CHOOSE
COUNT (STOPKEY)
VIEW
SORT (ORDER BY STOPKEY)
TABLE ACCESS (FULL) OF 'EMPLOYEES'

Tuesday, July 8, 2008

SQL Trace and TKPROF

• Configure SQL Trace to collect session statistics
• Enable SQL Trace and locate your trace files
• Format trace files using TKPROF
• Interpret the output of the TKPROF command

SQL Trace Facility

• Is set at the instance or session level
• Gathers statistics for SQL statements
• Produces output that can be formatted by TKPROF

A good way to compare two execution plans is to run them and compare the statistics to see which one performs better. You can use the SQL Trace facility to obtain performance information. SQL Trace writes its output to a file, and you use TKPROF to format it.
The SQL Trace facility:
• Can be switched on for an instance or a session
• Reports on volume and time statistics for the PARSE, EXECUTE, and FETCH phases
• Produces output that can be formatted by TKPROF
When the SQL trace facility is enabled for a session, Oracle generates a trace file containing statistics for traced SQL statements for that session. When the SQL trace facility is enabled for an instance, Oracle creates a separate trace file for each process.
Note: SQL Trace involves some overhead, so you normally do not want to enable SQL Trace at the instance level.

How to Use the SQL Trace Facility

• Set the initialization parameters.
• Switch on tracing.
• Run the application.
• Format the trace file.
• Interpret the output.

You must complete five steps to use the SQL Trace tool:
1. Set the appropriate initialization parameters.
2. Switch on SQL Trace.
3. Run the application (and switch off tracing when done).
4. Format the trace file produced by SQL Trace with TKPROF.
5. Interpret the output and tune the SQL statements when needed.
Running SQL Trace increases system overhead. Use SQL Trace only when required, and use it at the session level rather than at the instance level.
Note: On certain platforms, it is necessary to exit your session after step 3 to get the full trace file on disk. Sometimes the last part is missing from a trace file; exiting your session resolves that problem. STAT lines in the trace file are only written when the corresponding cursor is closed.

Initialization Parameters

Several initialization parameters relate to SQL Trace.

TIMED_STATISTICS :

• The SQL Trace facility provides a variety of information about processes, optionally including timing information.
• If you want timing information, you must turn on this parameter. You can do so for the entire database by setting the following initialization parameter in the parameter file before starting up and opening the database:
TIMED_STATISTICS = TRUE
• The parameter also can be set dynamically for a particular session with the following command:
SQL> alter session set timed_statistics=true;
• The timing statistics have a resolution of one-hundredth of a second. This means that any operation that finishes quickly may not be timed accurately, such as simple queries that execute quickly.
• Having TIMED_STATISTICS turned on affects performance slightly because the Oracle server must do some additional work. Therefore, this parameter is commonly turned off until specifically desired for tuning purposes. However, keeping TIMED_STATISTICS turned on could make trace files more useful for support engineers upon a system crash.
Note: In the parameter summary shown later in this section, the first value in each braced list is the default.

MAX_DUMP_FILE_SIZE and USER_DUMP_DEST

These two parameters control the size and destination of the output file:
MAX_DUMP_FILE_SIZE = n
USER_DUMP_DEST = directory_path
• The MAX_DUMP_FILE_SIZE default value is 500, and this parameter value is expressed in operating system blocks.
• MAX_DUMP_FILE_SIZE can also be changed at the session level by using the ALTER SESSION command.
• The default USER_DUMP_DEST location varies by operating system; it normally is the default destination for system dumps on your system.
• USER_DUMP_DEST is a system level parameter that cannot be changed at the session level. It can only be changed dynamically by a database administrator using the ALTER SYSTEM command.
• Obtain Information about Parameter Settings. You can display current parameter values by querying the V$PARAMETER view:
SQL> select name, value
2 from v$parameter
3 where name like '%dump%';
Note: The directory path specifications that show up in the VALUE column are operating system dependent.
• In Oracle9i you can also use the "show parameter" command; all commands that were specific to the Server Manager tool are now available in SQL*Plus, provided you have the correct privileges.
• If the MAX_DUMP_FILE_SIZE is reached, *** Trace File Full *** is written to the trace file. No notification or error message is displayed, and the background process stops writing to the trace file.
• Note that there is an additional initialization parameter, SQL_TRACE, that influences SQL Trace. However, it is not a parameter to use when preparing for SQL Trace, but rather when enabling SQL Trace, which is why it is introduced in the next section. Its use at the instance level is discouraged; enable SQL Trace at the session level instead.
• Note that when you are using SQL Trace in Multi-Threaded Server (MTS), the shared server processes will create trace files in the BACKGROUND_DUMP_DEST, not in the USER_DUMP_DEST directory.

TIMED_STATISTICS = {false|true}
MAX_DUMP_FILE_SIZE = {n|500}
USER_DUMP_DEST = directory_path

• MAX_DUMP_FILE_SIZE is measured in operating system blocks
• Default USER_DUMP_DEST: background dump destination

Switching On SQL Trace for an Instance

• For an instance, set the following parameter:
SQL_TRACE = TRUE

• For your current session:
SQL> alter session set sql_trace = true;
SQL> execute dbms_session.set_sql_trace(true);

• For any session:
SQL> execute dbms_system.set_sql_trace_in_session
2 (session_id, serial_id, true);

• In order to switch on SQL Trace for an entire instance, set the initialization parameter:
SQL_TRACE = TRUE
• Switching On SQL Trace for a Session. You can use a SQL command to switch on SQL Trace for your session.
SQL> alter session set sql_trace = true;
• You can also use a supplied package to switch on SQL Trace for your session. This is useful when you want to turn SQL Trace on and off from within a PL/SQL unit.
SQL> execute dbms_session.set_sql_trace(true);
• You can also switch on SQL Trace for another user's session with a supplied package.
SQL> execute dbms_system.set_sql_trace_in_session
2 (session_id, serial_id, true);
• In this procedure call, session_id and serial_id are the values in the SID and SERIAL# columns of V$SESSION, a data dictionary view commonly used by database administrators.
• Switching Off SQL Trace. When you have finished tuning, you can switch off SQL Trace by using any of the above methods, substituting the word FALSE for TRUE. If you switch on SQL Trace for a single session, exiting that session also turns off SQL Trace.
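For example, to switch tracing off again for your own session:

SQL> alter session set sql_trace = false;
SQL> execute dbms_session.set_sql_trace(false);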

Finding Your Trace Files

• Look in the directory specified by USER_DUMP_DEST.
• If only a few people are running SQL Trace, look for the most recent time stamp.
• Otherwise, consider writing a script.

SQL> @readtrace.sql

SQL> execute gettrace;
PL/SQL procedure successfully completed.

Identifying your trace file in the directory specified by USER_DUMP_DEST is usually a matter of finding the trace file with the most recent time stamp. However, this task becomes more difficult when several users generate trace files at the same time.
When several users create trace files simultaneously, the readtrace.sql script can be useful. It creates a procedure that opens your current trace file using the UTL_FILE package. The default output file name is username.trc, but you can also specify a name yourself.
SQL> @readtrace.sql
SQL> alter session set sql_trace = true;
SQL> select * from dual;
SQL> execute gettrace('output_filename');

Note :
The readtrace.sql script is not a standard Oracle utility that comes with the software; it was developed and provided only for the course these notes are based on.
There is also a script gt.sql that you can use to hide the execute gettrace statement; it accepts the output_filename as an argument.

Formatting Your Trace Files

OS> tkprof tracefile outputfile [options]

• TKPROF command examples:
OS> tkprof
OS> tkprof ora_815.trc run1.txt
OS> tkprof ora_815.trc run2.txt sys=no
sort=execpu print=3


Use the TKPROF command to format your trace files into a readable output.
OS> tkprof tracefile outputfile [options]
tracefile The name of the trace output file (input for TKPROF)
outputfile The name of the file to store the formatted results
When the TKPROF command is entered without any arguments, it generates a usage message together with a description of all TKPROF options.
Note: Trace files generated immediately after instance startup contain data that reflects the activity of the startup process. In particular, they reflect a disproportionate amount of physical I/O activity as caches in the SGA are filled. For the purposes of tuning, you should ignore such trace files.
These are just some examples; the full TKPROF syntax will be explained later.
In previous releases, TKPROF had a version component in the command name on Windows platforms (for example, tkprof73 or tkprof80). With Oracle9i, there is no tkprof81 command anymore.

SORT = option
PRINT = n
EXPLAIN = username/password
INSERT = filename
SYS = NO
AGGREGATE = NO
RECORD = filename
TABLE = schema.tablename
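A sketch combining several of these options (the file names and the scott/tiger account are illustrative only):

OS> tkprof ora_815.trc run3.txt explain=scott/tiger sys=no sort=fchela print=5

This formats the trace file, generates execution plans by connecting as scott, suppresses recursive SYS statements, sorts the statements by elapsed fetch time, and prints only the first five of them.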

Output of the TKPROF command

• Text of the SQL statement
• Trace statistics (for statement and recursive calls) separated into three SQL processing steps:


PARSE Translates the SQL statement into an execution plan
EXECUTE Executes the statement
(This step modifies the data for INSERT, UPDATE, and DELETE statements.)
FETCH Retrieves the rows returned by a query (Fetches are performed only for SELECT statements.)

The TKPROF output lists the statistics for a SQL statement by the SQL processing step.
The step for which each row contains statistics is identified by the value of the call column.

Seven categories of trace statistics:

Count - Number of times the statement was parsed, executed, or fetched
CPU - Number of CPU seconds used to process the call
Elapsed - Total number of seconds to execute the call
Disk - Number of physical blocks read
Query - Number of logical buffers read for consistent read
Current - Number of logical buffers read in current mode
Rows - Number of rows processed by the fetch or execute call

Output of the TKPROF command

The TKPROF output also includes the following:
• Recursive SQL statements
• Library cache misses
• Parsing user ID
• Execution plan
• Optimizer mode or hint

Recursive Calls : Sometimes in order to execute a SQL statement issued by a user, the Oracle server must issue additional statements. Such statements are called recursive SQL statements. For example, if you insert a row into a table that does not have enough space to hold that row, the Oracle server makes recursive calls to allocate the space dynamically. Recursive calls are also generated when data dictionary information is not available in the data dictionary cache and must be retrieved from disk. If recursive calls occur while the SQL trace facility is enabled, TKPROF marks them clearly as recursive SQL statements in the output file. You can suppress the listing of recursive calls in the output file by setting the SYS=NO command-line parameter. Note that the statistics for recursive SQL statements are always included in the listing for the SQL statement that caused the recursive call.

Library Cache Misses : TKPROF also lists the number of library cache misses resulting from parse and execute steps for each SQL statement. These statistics appear on separate lines following the tabular statistics.

Parsing User ID : This is the ID of the last user to parse the statement.

Execution Plan : If you specify the EXPLAIN parameter on the TKPROF command line, TKPROF uses the EXPLAIN PLAN command to generate the execution plan of each SQL statement traced. TKPROF also displays the number of rows processed by each step of the execution plan. Note: Be aware that the execution plan is generated at the time that the TKPROF command is run and not at the time the trace file was produced. This could make a difference if, for example, an index has been created or dropped since tracing the statements.

Optimizer Mode or Hint : This indicates the optimizer hint used during the execution of the statement. If there is no hint, then it shows the optimizer mode used. Hints and optimizer modes are discussed in more detail in the “Influencing the Optimizer” lesson.

• Note that Oracle9i TKPROF output also shows the row source operators (RSO) before the (optional) full execution plan. These row source operators are derived from the STAT lines in the trace file without accessing the database, as opposed to the additional execution plan generated with the EXPLAIN option.

TKPROF Output Example : No Index

SQL> select status from registrations
2 where class_id = 155801 and stud_id = 7586;

call     count    cpu  elapsed   disk  query  current   rows
-------  -----  -----  -------  -----  -----  -------  -----
Parse        1   0.54     0.56      1      0        0      0
Execute      1   0.01     0.01      0      0        0      0
Fetch        1   0.52     1.47   1159   1160        0      1
-------  -----  -----  -------  -----  -----  -------  -----
total        3   1.07     2.04   1160   1160        0      1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 75

The example in the slide shows one row being fetched from the registrations table. It requires 1,160 reads and 0.52 seconds of CPU fetch time. The statement is executed through a full table scan of the REGISTRATIONS table, as you could see using the EXPLAIN command-line option. The statement needs to be optimized.

TKPROF Output Example : Unique Index

SQL> select status from registrations
2 where class_id = 155801 and stud_id = 7586;

call     count    cpu  elapsed   disk  query  current   rows
-------  -----  -----  -------  -----  -----  -------  -----
Parse        1   0.10     0.10      0      0        0      0
Execute      1   0.00     0.00      0      0        0      0
Fetch        1   0.00     0.00      0      4        0      1
-------  -----  -----  -------  -----  -----  -------  -----
total        3   0.10     0.10      0      4        0      1

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 75

The same row is retrieved, but in this case I/O is substantially reduced to only four blocks read. In addition, CPU time is reduced to 0.00 seconds. These results are achieved because the statement uses a unique index.
You can achieve significant improvements in performance by sensible indexing. Identify areas for potential improvement using the SQL Trace facility.

Note: Indexes should not be built until required. Indexes slow down the processing of INSERT, UPDATE, and DELETE commands, because references to rows need to be added, changed, or removed. Unused indexes should be removed. If possible, process all application SQL through EXPLAIN PLAN and remove any indexes that are not referenced.
This demonstration can also be done using SQL*Plus AUTOTRACE or TKPROF:
SQL> alter table registrations disable primary key cascade;
SQL> set autotrace traceonly
SQL> @d04-17
SQL> alter table registrations enable primary key;
SQL> @d04-17


Some TKPROF Interpretation Pitfalls

• Read consistency trap
• Schema trap
• Time trap
• Trigger trap

• Read Consistency Trap : Sometimes other transactions hold uncommitted changes against a table. This increases the number of block visits, because additional blocks must be constructed and visited for read consistency.

• Schema Trap : The TKPROF statistics show a high number of block visits while the execution plan indicates an index access. Especially if the output shows a nonzero value for current, the table was probably accessed by a full table scan (compare the current columns in the two previous examples). The index was probably built after the trace file was produced, or the table and column statistics may have been recomputed since.

• Time Trap : If a particular DML statement shows high timing statistics, the explanation may be interference from another transaction holding (shared) locks on the table. That is why CPU time is a better indicator than elapsed time.

• Trigger Trap : The resources reported for a statement include those for all of the SQL issued while the statement was being processed. Therefore, they include any resources used within a trigger, along with the resources used by any other recursive SQL (such as SQL used for space allocation).

SQL Statement Processing

• Shared SQL areas
• SQL statement processing phases
• Sharing cursors
• SQL coding standards

Shared SQL Areas
• The shared pool is part of the System Global Area (SGA). The shared pool contains the dictionary cache and the library cache. Shared SQL areas are part of the library cache. The library cache also contains metadata.
• The shared pool is automatically maintained by an aging mechanism. This aging mechanism uses a least recently used (LRU) algorithm to determine what has been there longest and ages it out when space is required.
• The database administrator (DBA) can change the available dictionary and shared SQL areas space by modifying the SHARED_POOL_SIZE initialization parameter. The DBA would do this as part of an overall database tuning effort, which would be subject to various constraints.
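As a sketch of that last point (the size value is purely illustrative, and the dynamic resize assumes an Oracle9i instance started with an spfile), the DBA could inspect and adjust the parameter from SQL*Plus:

SQL> show parameter shared_pool_size

SQL> alter system set shared_pool_size = 64M;

Whether such a change helps depends on the overall SGA budget; enlarging the shared pool at the expense of the buffer cache can hurt more than it helps.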

Cursors
• Inside the shared SQL area, each SQL statement is parsed in its own part, known as a context area or cursor.
• Each cursor holds the following information:
• The parsed statement (static, dynamic, and recursive SQL, plus program units such as procedures and database triggers)
• The execution plan
• A list of referenced objects
• If two users issue the same SQL statement, then they will use the same cursor.
• The statement is reparsed if the representation in the shared pool is invalid. This happens, for example, if a data definition language (DDL) statement such as ALTER TABLE was used on one of the objects in the statement, or if a dependent table is analyzed.
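A minimal sketch of such an invalidation, reusing the REGISTRATIONS table from the earlier examples (the added column is hypothetical):

SQL> select status from registrations
  2  where class_id = 155801 and stud_id = 7586;   -- statement is parsed and cached

SQL> alter table registrations add (notes varchar2(100));  -- DDL invalidates dependent cursors

SQL> select status from registrations
  2  where class_id = 155801 and stud_id = 7586;   -- identical text, but must be reparsed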


Processing Phases
• The four most important phases in SQL statement processing are parsing, binding, executing, and fetching.
• The reverse arrows indicate processing scenarios; for example, FETCH—(RE)BIND—EXECUTE—FETCH.
• The FETCH phase only applies to queries.

Note :
• There is no real REBIND phase; the same pointer is simply reused for a subsequent EXECUTE. The value found at the corresponding memory address is used; Oracle9i uses a “bind by reference” technique.
• Three more phases are missing from this slide: DEFINE, DESCRIBE, and PARALLELIZE.
• DEFINE and DESCRIBE are not very important for tuning. PARALLELIZE is not mentioned here.

SQL Statement Processing Phases – Parse

PARSE Phase

• Searches for the statement in the shared pool
• Checks syntax
• Checks semantics and privileges
• Merges view definitions and subqueries
• Determines execution plan


During the PARSE phase, the Oracle9i Server does the following:
• Searches for the statement in the shared pool
• Checks the statement syntax, given the grammar and specifications of the SQL language
• Checks the semantics, ensuring that objects referenced in the SQL statement are valid and satisfy security constraints
• Determines whether the process issuing the statement has appropriate privileges to execute it
• Transforms a SQL statement on a view into an equivalent SQL statement on its underlying definition, and attempts to simplify a statement with a subquery by rewriting it into a join
• Determines and stores the execution plan, or uses an existing execution plan, if possible


BIND Phase

• The Oracle Server checks the statement for references of bind variables.
• The Oracle Server assigns or reassigns a value to each variable.

Note: This phase order implies that the Oracle Server does not know bind variable values when optimizing a statement. This enables a fast rebind-execute without the need for reparsing, thus saving time and memory; a disadvantage is that it is impossible for the optimizer to estimate predicate selectivity. This will be discussed in more detail in the “Collecting Statistics” lesson.


EXECUTE Phase

• Applies the execution plan
• Performs necessary I/O and sorts for data manipulation language (DML) statements


• The Oracle9i Server applies the parse tree to the data buffers.
• Multiple users can share the same parse tree.
• The Oracle9i Server performs physical reads or logical reads/writes for DML statements and also sorts the data when needed.


FETCH Phase

• Retrieves rows for a query
• Sorts for queries when needed
• Uses an array fetch mechanism

• The Oracle9i Server retrieves rows for a SELECT statement during the FETCH phase. Each fetch typically retrieves multiple rows, using an array fetch.
• Each Oracle tool offers its own ways of influencing the array size; in SQL*Plus you do so by using the ARRAYSIZE setting:
SQL> show arraysize
arraysize 15
SQL> set arraysize 1
With this setting, SQL*Plus will process one row at a time. The default value is 15.


Sharing Cursors

When a SQL statement is found in the shared SQL area, then the PARSE phase is curtailed and the existing cursor is used.

Benefits

• Reduces parsing and saves time
• Dynamically adjusts memory to the SQL being executed
• Improves memory usage


Shared Cursor Requirements

Cursors can be shared only by SQL statements that have the following identical elements:
• Text
  • Uppercase and lowercase
  • White space (spaces, tabs, carriage returns)
  • Comments
• Referenced objects
• Bind variable data types

• Only identical SQL statements can use the same cursor.
• The text of the SQL statements must be exactly the same, including case, spaces, tabs, carriage returns, and comments.
• The objects referenced in the SQL statements must resolve to the same objects in the database.
• The types of the bind variables used in the SQL statements must be the same.
Note: Before sending SQL statements to the Oracle9i Server, most Oracle tools (such as PL/SQL, the precompilers, and Oracle Developer) preprocess SQL statements to make them as identical as possible by removing comments, squeezing white space, and converting to uppercase or lowercase. SQL*Plus, however, sends SQL statements to the Oracle9i Server in the same format as they are entered.

Writing SQL to Share Cursors

If the case or the amount of white space is different, then the statements are not identical.

SQL> select * from employees where emp_id = 70727;

SQL> select * from EMPLOYEES where EMP_ID = 70727;

If the objects belong to different users, then the statements are not identical.

SQL> select * from employees where EMP_ID = 70727;   -- issued by the first user

SQL> select * from employees where EMP_ID = 70727;   -- issued by a second user with an EMPLOYEES table of their own


SQL statements must be identical to be able to share cursors. Note that sharing statements is unimportant in a decision support system (DSS) environment, because most statements will be different anyway.
• Identical Case : The first two examples in the slide are not identical. Notice the case difference for the table and column names. Because of this case difference, the statements are not identical and thus cannot share a SQL area.

• Identical Objects : Even when two statements look identical, if the objects actually refer to different database objects, then the two statements are not identical. In the last two examples in the slide, the statements are issued by two different users who each have their own EMPLOYEES table. Thus the statements are not identical and cannot share a SQL area.


Bind Variables

Suppose you enter the following two statements:

select * from employees where emp_id = :c

select * from employees where emp_id = :d

Both statements are translated into:

select * from employees where emp_id = :b1


If two bind variables have different data types, then the statements are not identical. If the bind variable data types match but their names are not identical, as in the example above, there is no problem, because bind variables will be renamed internally. The first variable is always called :b1, the second is :b2, and so on.

Note that the script actually uses the PL/SQL environment of SQL*Plus. SQL*Plus itself does not translate bind variable references in SQL statements.
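For reference, this is roughly how a bind variable is declared and assigned in SQL*Plus (the name and value are illustrative; the EXEC step is the PL/SQL part):

SQL> variable c number
SQL> exec :c := 70727

SQL> select * from employees where emp_id = :c;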
Note that 8.1.6 comes with a powerful (but also dangerous) feature to force cursor sharing, allowing you to set CURSOR_SHARING=FORCE at the session or instance level. This replaces all literals in your SQL statements with system-generated bind variables.
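At the session level this is a one-liner (a sketch; test carefully before enabling it instance-wide, since it changes how every statement is optimized):

SQL> alter session set cursor_sharing = force;

After this, statements that differ only in their literal values share a single cursor, because the literals are replaced by system-generated bind variables in the shared SQL text.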

Writing SQL to Share Cursors

Create generic code using the following:
• Stored procedures and packages
• Database triggers
• Referenced Oracle Developer Forms procedures
• Any other library routines and procedures

Develop coding conventions for SQL statements in ad hoc queries, SQL scripts, and Oracle Call Interface (OCI) calls.
Using Generic Shared Code
• Write and store procedures that can be shared across applications.
• Use database triggers.
• Write referenced triggers and procedures when using Oracle Developer.
• Write library routines and procedures in other environments.


Writing to Format Standards
• Develop format standards for all statements, including those in PL/SQL code.
• Develop rules for use of uppercase and lowercase.
• Develop rules for use of white space (spaces, tabs, carriage returns).
• Develop rules for use of comments, preferably keeping them out of the SQL statements themselves.
• Use the same names to refer to identical database objects.

Follow SQL coding standards:
• Case
• White space
• Comments
• Object references
• Bind variables

Monitoring Shared Cursors

• V$LIBRARYCACHE provides general information about the library cache.
• Information about individual SQL statements is contained in the following views:
  • V$SQLTEXT
  • V$SQLTEXT_WITH_NEWLINES
  • V$SQL_BIND_DATA
  • V$SQL
  • V$SQLAREA


When there is no room to parse a statement in the shared SQL area, the oldest cursor is closed and its space is reused. If the original statement is needed again, the server must parse it again. The shared pool should be large enough to keep the number of statements that are parsed more than once to a minimum.
You can monitor your system to see how often the server cannot find a statement in memory. You can use Oracle Enterprise Manager (OEM) or query the appropriate data dictionary views: V$LIBRARYCACHE and V$SQLAREA.

Querying V$ views is usually a DBA task, and you may not have access to these data dictionary views on every system. The following pages give examples of V$LIBRARYCACHE and V$SQLAREA usage only.
Another interesting view is V$DB_OBJECT_CACHE.
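For example (a sketch only; the EXECUTIONS threshold is arbitrary), you could list the most frequently executed objects in the library cache:

SQL> select owner, name, type, loads, executions
  2  from v$db_object_cache
  3  where executions > 100
  4  order by executions desc;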
Note that V$SQL lists all copies of identical SQL statements (child cursors) on a separate line, whereas V$SQLAREA groups those statements together.


The V$LIBRARYCACHE View

NAMESPACE    The name of the library cache area
GETS         Total number of requests (lookups)
GETHITS      The number of times that an object's handle was found in memory
GETHITRATIO  The ratio of GETHITS to GETS
PINS         The number of times that a PIN was requested
PINHITS      The number of times that all the pieces of the object were found in memory
PINHITRATIO  The ratio of PINHITS to PINS
RELOADS      The number of library cache misses

This view holds information about library cache management. Values for PINHITRATIO and GETHITRATIO close to 1 indicate good library cache performance. Note that this slide does not show all columns of the V$LIBRARYCACHE view.
The V$LIBRARYCACHE view is used in the query below to check on the amount of caching.
SQL> select gethitratio, pinhitratio
2 from v$librarycache
3 where namespace = 'SQL AREA';
GETHITRATIO PINHITRATIO
----------- -----------
0.94 0.95

Instead of the PINHITRATIO, you can also use the sum(reloads)/sum(pins) ratio to determine when you should increase the size of the shared pool.
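That ratio is a one-line query (a sketch; a value close to 0 is what you want to see):

SQL> select sum(reloads)/sum(pins) "Reload ratio"
  2  from v$librarycache;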

Shared Cursor Use

SQL_TEXT Text of the SQL statement
VERSION_COUNT Number of versions of this cursor
LOADS Number of times the cursor has been loaded
INVALIDATIONS Number of times the contents have been invalidated
PARSE_CALLS Number of times a user has called this cursor
SORTS Number of sorts performed by the statement
COMMAND_TYPE Command type
PARSING_USER_ID Parsing user ID (SYS = 0)


The V$SQLAREA view holds information about all shared cursors in the cache.
VERSION_COUNT > 1  Indicates that the same text is used by different users on their own version of a table
LOADS > 1          Indicates cursor reloads after aging out or cursor invalidation
COMMAND_TYPE       1: CREATE TABLE
                   2: INSERT
                   3: SELECT
                   6: UPDATE
                   7: DELETE
Note: Only the most important columns of the V$SQLAREA view are listed above.
V$SQL provides more detailed information than V$SQLAREA, because it does not group by the SQL statement text.
For a full list of command type values, see description of the V$SESSION view in Oracle9i Server Reference.

Monitoring Shared Cursor Use

• One load per statement is ideal.
• One load per version/invalidation is acceptable.
• More than one load per version indicates a potential benefit from increasing the shared pool size.

In the best case scenario, there should be one version of each statement that is never invalidated or aged out.
If the number of loads is significantly higher than the sum of the versions and invalidations, especially if the number of loads is similar to the number of calls, then the cursor has probably been reloaded because of aging, and the system may benefit from increasing the size of the shared pool.

The statement below excludes information about recursive SQL (parsing user SYS: user_id = 0) and displays only SELECT commands (command type 3).
In its output, there are two versions of the first statement, probably because they reference two different EMPLOYEES objects. However, each version has been loaded only once. The statement has been issued three times (PARSE_CALLS).
There is only one version of the second statement, but it has been loaded twice, having been invalidated once (probably by some DDL on the table or a related index).

Note: Oracle SQL Analyze, a component of the Oracle Enterprise Manager (OEM) Tuning Pack, offers an excellent graphical interface on top of V$SQLAREA. This diagnostic tool is introduced in the “Oracle SQL Analyze” lesson.
V$SQL would distinguish between the two versions of the first SQL statement.

SQL> select sql_text, version_count, loads
2 , invalidations, parse_calls, sorts
3 from v$sqlarea
4 where parsing_user_id > 0
5 and command_type = 3
6 order by sql_text;


                      VERSION         INVALI-  PARSE
SQL_TEXT              COUNT    LOADS  DATIONS  CALLS  SORTS
--------------------  -------  -----  -------  -----  -----
select *                    2      2        0      3      0
from employees
where EMP_ID = 70727

select *                    1      2        1      4      0
from employees
where emp_id = 70727

The Tuning Methodology or Steps

1. Tune business function
2. Tune data design
3. Tune process design
4. Tune SQL statements
5. Tune physical structure
6. Tune memory allocation
7. Tune I/O
8. Tune memory contention
9. Tune operating system

The list in the slide shows the phases of the development cycle in which tuning can be applied. Following the steps in this order is recommended for the following reasons:
• The earlier the step, the higher the potential for improving performance.
• The earlier the step, the higher the cost for performing or reworking it later. For example, changes to the data structures and application code after the initial design tend to be expensive and require the additional expense of retesting the affected components.
• Decisions made at one step may influence subsequent steps. For example, the SQL statement that you write in Step 4 will have significant bearing on parsing and caching issues that are addressed in Step 6.
The more extensively you use object-oriented design techniques and multitiered architecture, the better the chances that you will safely achieve any application changes at a reasonable cost.

Note: On the other hand, object-oriented design techniques such as encapsulation tend to make you lose control over events occurring at a lower level.

Tuning SQL Statements

• Tune the schema
  • Add indexes
  • Create index-organized tables
  • Create clusters
• Choose the language: SQL or PL/SQL
• Design for the reuse of SQL optimization
• Design and tune SQL statements
• Maximize performance with the optimizer

During the steps within SQL statement tuning, analysis techniques should be used frequently to determine goals and progress.
• Tuning the Schema : The designer is responsible for determining which tables should be used, their contents, and similar issues. The application developer might then need to decide when to use indexing, when to use index organized tables, and when to use clustering. Denormalization may be needed for good performance, especially in data warehouse environments. The decisions that are made at this step greatly affect the performance of any SQL statements that would use these structures.
• Choosing a Language (SQL or PL/SQL) : In some cases, you may achieve better performance by using the PL/SQL language to perform a task.
• Designing for Reuse of SQL Parsing, Optimization, and Tuning Efforts : The Oracle9i Server is often able to reuse some of its efforts at parsing and optimization when it sees an identical statement repeated. Therefore, creating SQL statements that appear identical can improve performance. It also allows the application developer to focus and tune individual SQL statements that are used repeatedly.
• Designing and Tuning SQL Statements : There is a wide variety of methods for designing high-performance SQL statements. Knowing the overall functions of the optimizer helps to show where tuning efforts can be effective. Consider rewriting SQL statements into semantically equivalent statements. Also, use the correct language (SQL or PL/SQL) in each situation.
• Achieving Maximum Performance with the Optimizer : To make the best use of the Oracle cost-based optimizer, you must understand how it chooses access paths to data. You must assist the cost-based optimizer by using the ANALYZE command and by sometimes providing it hints about the best access path. The rule-based optimizer is also available.
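As a minimal sketch of that last point, statistics for the cost-based optimizer can be gathered with the ANALYZE command (the REGISTRATIONS table is reused from the earlier examples; on later releases the DBMS_STATS package is the preferred route):

SQL> analyze table registrations compute statistics;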


• Be proactive: work from the top down.
• If you must be reactive, work from the bottom up, using these hints:
  • Establish quantifiable objectives.
  • Create a minimum repeatable test.
  • Ask questions of affected users, and avoid preconceptions.
  • Test hypotheses and keep notes.
  • Stop when you meet the target.


• Quantifiable Objectives : Establish quantifiable objectives, and avoid vague ones. An example of an acceptable objective would be the following: “We must be able to support up to 20 operators, each entering 20 orders per hour, and the picking lists must be produced within 30 minutes of the end of the shift.”
• Minimum Repeatable Test : If you are trying to cut a four-hour run down to two hours, repeated timings take too long. Perform your initial trials against a test environment similar to the real one (or provide input for a restrictive condition, such as processing one department instead of 500 of them). The ideal test case should run for more than one minute so that improvements are demonstrated intuitively and can be measured using timing features.

• Tuning a Specific Problem for an Application Already in Production : Often, performance specialists are summoned late in the life cycle of a project, when the system is in production and performing unacceptably. In this situation, start at the bottom of the method list and work up. The items at the end are the cheapest and fastest to perform. If they do not solve the problem, you will have to work back “uphill,” and this starts to increase costs and time delays.

Temporary Tables

From the Oracle9i Server onwards, you can create temporary tables. Temporary tables can improve performance significantly by holding temporary data for reuse within your transaction or session. A temporary table has the following properties:
• Temporary table data is only visible within its defined scope; the scope can be defined to be a session or a transaction.
• The definition of a global temporary table is visible to all sessions. In contrast, the definition of a local temporary table does not persist at the end of the session that creates it. Note: The Oracle9i Server currently does not support local temporary tables.
• Temporary table data is stored within the sort space used by the session. If sort space is not sufficient to accommodate the data, space is allocated in the user’s temporary tablespace.
• Indexes on temporary tables have the same scope and duration as the table they correspond to.
• Triggers and views can be defined on temporary tables. However, a view cannot be defined joining a temporary and a permanent table.
• The CREATE GLOBAL TEMPORARY TABLE AS SELECT command can be used to create a temporary table and insert data into it (see the sketch after this list).
• Definitions of temporary tables can be exported and imported.
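A minimal sketch of that AS SELECT form (table and column names are illustrative; ON COMMIT PRESERVE ROWS is used because transaction-scoped rows would disappear as soon as the DDL commits):

SQL> CREATE GLOBAL TEMPORARY TABLE emp_temp_copy
  2  ON COMMIT PRESERVE ROWS
  3  AS SELECT emp_id, last_name, salary FROM employees;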

Creating Temporary Tables

The example shows the command to create a temporary table where rows are only visible within the transaction that created them.
The clauses that control the duration of the rows are:
• ON COMMIT DELETE ROWS to specify that rows are only visible within the transaction (This is the default behavior).
• ON COMMIT PRESERVE ROWS to specify that rows are visible for the entire session.

Although the command does not create extents for the table, a session can query the table without receiving an error.
The view DBA_TABLES contains columns to indicate that the table is a temporary table. The column DURATION shows whether data in the table persists for a whole session or only within the transaction.

Note :There is a bug in 8.1.5 that shows the DURATION to be SYS$TRANSACTION even if the table is defined with ON COMMIT PRESERVE ROWS. The data in the table, however, is visible throughout the session.

SQL> CREATE GLOBAL TEMPORARY TABLE emp_temp
2 ( emp_id number(7)
3 , last_name varchar2(24)
4 , salary number(7,2) )
5 ON COMMIT DELETE ROWS;

SQL> select table_name, temporary, duration
2 from dba_tables
3 where table_name = 'EMP_TEMP';
TABLE_NAME           TEMPORARY  DURATION
-------------------- ---------  ---------------
EMP_TEMP             Y          SYS$TRANSACTION

Saturday, July 5, 2008

Clusters and Index Clusters

A cluster is a structure that is used to physically locate rows together because they share common column values.
There are two types of clusters: index clusters and hash clusters.
The columns used to cluster the rows are called the cluster key; the cluster key may consist of one or more columns.

Tables in a cluster have columns that correspond to the cluster key.
Clustering is a mechanism that is transparent to the applications using the tables. Data in a clustered table can be manipulated as though it were stored in a regular table.
Updating one of the columns in the cluster key may cause the Oracle server to physically relocate the row.

The cluster key is independent of the primary key. The tables in a cluster can have a primary key, which may be the cluster key or a different set of columns.
Clusters can waste disk space if the cluster key distribution is skewed. Choosing an appropriate cluster size is one of the most important issues when you consider clustering your tables.


Cluster Example

If they are stored as regular tables, CLASSES and REGISTRATIONS are placed in different segments. This means that the tables use their own set of blocks—a block that stores rows from the CLASSES table does not contain data from the REGISTRATIONS table, and vice versa.

Because registrations are usually accessed by CLASS_ID, the developer may cluster both tables or just the REGISTRATIONS table on the CLASS_ID column.
If the tables CLASSES and REGISTRATIONS are stored in a cluster, they share the same cluster segment; see the example on the slide.

A block in this segment stores rows from both tables. This means that a full table scan on the CLASSES table takes more time, because its rows are interspersed with the rows from the REGISTRATIONS table.

If a table is stored in a cluster, the cluster becomes the physical unit of storage, and the table is a logical entity; that is, the clustering is transparent to the users and applications.
Note : These two tables are not ideal for clustering; the slide is meant only for illustration purposes. Note that this is an index cluster.

Index Clusters

An index cluster uses a special index, known as the cluster index, to maintain the data within the cluster.
In the example of the previous page, when a user inserts a new registration, the cluster index ensures that the new registration is placed in the same block as the other registrations with the same CLASS_ID.

Choose Appropriate Columns for the Cluster Key

Choose cluster key columns carefully.
If you use multiple columns in queries that join the tables, make the cluster key a composite key.
In general, the same column characteristics that make a good index apply for cluster indexes.
A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one or two data blocks.
Too many rows per cluster key value can require extra searching to find rows for that key.
Cluster key values that are too general (such as MALE and FEMALE) result in excessive searching and can result in poor performance.

Note : The “Choose Appropriate Columns for the Cluster Key” paragraph is also true for hash clusters.

Index Clusters: Performance Characteristics

Index clusters can save disk space, because each cluster key value is stored only once for all the rows that have the same key value.
When accessing rows with a common value, disk I/O is reduced and access time improves.
Tables that are often joined using a foreign key relationship are good choices for an index cluster. Because rows with a given key value are stored together, using a cluster can reduce the number of blocks read to satisfy such a request.

In some situations, it is useful to place only the detail table in a cluster. Consider a case where registration information is always accessed by STUD_ID, and full table scans of the EMPLOYEES table are frequent. In this case, placing only the REGISTRATIONS table in an index cluster may be the best solution, so that full table scans of the EMPLOYEES table do not suffer any performance degradation.

The cluster index must be available to store, access, or maintain data in an index cluster. The cluster index points to the block that contains the rows with a given key value.
The structure of a cluster index is similar to that of a normal index.
Although a normal index does not store NULL key values, cluster indexes do store NULL keys.

There is only one index entry for each key value in the cluster index. Therefore, cluster indexes are likely to be smaller than a normal B*-tree index on the same set of data.
To store or retrieve rows from a cluster, the Oracle server uses the cluster index to locate the first row that corresponds to the given key value and then retrieves the rows for the given key.

How to Create Index Clusters

Create the cluster.
CREATE CLUSTER cluster-name ( clu-column-definition[s] )
SIZE integer [K|M]
TABLESPACE... STORAGE(...) INDEX;

Create the cluster index.
CREATE INDEX cluster-index
ON CLUSTER cluster-name
TABLESPACE... STORAGE(...);

Create tables in the cluster.
CREATE TABLE table-name
( column-definitions [,constr-definitions] )
CLUSTER cluster-name(column[s]);

You can create tables in the cluster before creating the index. However, all three steps must be completed before you can insert data into a clustered table.

Create a Cluster : SIZE specifies the space required by all rows corresponding to a key value, in bytes, kilobytes, or megabytes; INDEX specifies that it is an index cluster. If SIZE is not defined, it defaults to the size of one block. The Oracle server uses this value to estimate the maximum number of key values that can be accommodated in each block of the cluster.

Create a Cluster Index : The key columns do not need to be specified for a cluster index because they have already been defined while creating the cluster. For optimal performance, place the cluster index in a tablespace different from that used for creating the cluster.

Note : A cluster index cannot be unique or include a column defined as LONG.

Create Tables in a Cluster : The CLUSTER clause of the CREATE TABLE command specifies that the table must be placed in a cluster. The cluster columns must have exactly the same data type as the corresponding cluster key columns defined in step 1.
Note: A table that is placed in a cluster cannot have any physical attributes of its own because it is not a segment by itself but is a part of the cluster.
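Putting the three steps together for the CLASSES/REGISTRATIONS example used earlier (a sketch only; the cluster name, SIZE value, and column definitions are illustrative):

SQL> CREATE CLUSTER class_reg (class_id NUMBER(7))
  2  SIZE 2K INDEX;

SQL> CREATE INDEX class_reg_idx
  2  ON CLUSTER class_reg;

SQL> CREATE TABLE registrations
  2  ( class_id NUMBER(7)
  3  , stud_id  NUMBER(7)
  4  , status   VARCHAR2(10) )
  5  CLUSTER class_reg (class_id);

Here SIZE 2K tells the server to expect about 2 KB of registration rows per CLASS_ID value, following the earlier guideline that the rows for one key value should fill roughly one or two blocks.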