Clusters and Index Clusters ~ OracleSource

Saturday, July 5, 2008

Clusters and Index Clusters

A cluster is a structure that is used to physically locate rows together because they share common column values.
There are two type of clusters: index clusters and hash clusters.
The columns used to cluster the rows are called the cluster key:
The cluster key may consist of one or more columns.

Tables in a cluster have columns that correspond to the cluster key.
Clustering is a mechanism that is transparent to the applications using the tables. Data in a clustered table can be manipulated as though it were stored in a regular table.
Updating one of the columns in the cluster key may cause the Oracle server to physically relocate the row.

The cluster key is independent of the primary key. The tables in a cluster can have a primary key, which may be the cluster key or a different set of columns.
Clusters can waste disk space if the cluster key distribution is skewed. Choosing an appropriate cluster size is one of the most important issues when you consider to cluster your tables.


Cluster Example

If they are stored as regular tables, CLASSES and REGISTRATIONS are placed in different segments. This means that the tables use their own set of blocks—a block that stores rows from the CLASSES table does not contain data from the REGISTRATIONS table, and vice versa.

Because registrations are usually accessed by CLASS_ID, the developer may cluster both tables or just the REGISTRATIONS table on the CLASS_ID column.
If the tables CLASSES and REGISTRATIONS are stored in a cluster, they share the same cluster segment; see the example on the slide.

A block in this segment stores rows from both tables. This means that a full table scan on the CLASSES table takes more time, because its rows are interspersed with the rows from the REGISTRATIONS table.

If a table is stored in a cluster, the cluster becomes the physical unit of storage, and the table is a logical entity; that is, the clustering is transparent to the users and applications.
Note : These two tables are not ideal for clustering; the slide is meant only for illustration purposes. Note that this is an index cluster.

Index Clusters

An index cluster uses a special index, known as the cluster index, to maintain the data within the cluster.
In the example of the previous page, when a user inserts a new registration, the cluster index ensures that the new registration is placed in the same block as the other registrations with the same CLASS_ID.

Choose Appropriate Columns for the Cluster Key

Choose cluster key columns carefully.
If you use multiple columns in queries that join the tables, make the cluster key a composite key.
In general, the same column characteristics that make a good index apply for cluster indexes.
A good cluster key has enough unique values so that the group of rows corresponding to each key value fills approximately one or two data blocks.
Too many rows per cluster key value can require extra searching to find rows for that key.
Cluster key values that are too general (such as MALE and FEMALE) result in excessive searching and can result in poor performance.

Note : The “Choose Appropriate Columns for the Cluster Key” paragraph is also true for hash clusters.

Index Clusters: Performance Characteristics

Index clusters can save disk space, because each cluster key value is stored only once for all the rows that have the same key value.
When accessing rows with a common value, disk I/O is reduced and access time improves.
Tables that are often joined using a foreign key relationship are good choices for an index cluster. Because rows with a given key value are stored together, using a cluster can reduce the number of blocks read to satisfy such a request.

In some situations, it is useful to place only the detail table in a cluster. Consider a case where registration information is always accessed by STUD_ID, and full table scans of the EMPLOYEES table are frequent. In this case, placing only the REGISTRATIONS table in an index cluster may be the best solution, thus the full table scans of the EMPLOYEES table do not suffer any performance degradation.

The cluster index must be available to store, access, or maintain data in an index cluster. The cluster index points to the block that contains the rows with a given key value.
The structure of a cluster index is similar to that of a normal index.
Although a normal index does not store NULL key values, cluster indexes do store NULL keys.

There is only one index entry for each key value in the cluster index. Therefore, cluster indexes are likely to be smaller than a normal B*-tree index on the same set of data.
To store or retrieve rows from a cluster, the Oracle server uses the cluster index to locate the first row that corresponds to the given key value and then retrieves the rows for the given key.

How to Create Index Clusters

Create the cluster.
CREATE CLUSTER cluster-name ( clu-column-definition[s] ) SIZE integer [K|M]
TABLESPACE... STORAGE(...) INDEX;
CREATE TABLE table-name
( column-definitions [,constr-definitions] ) CLUSTER cluster-name(column[s]);


Create the cluster index.
CREATE INDEX cluster-index
ON CLUSTER cluster-name
TABLESPACE... STORAGE(...);

Create tables in the cluster.

You can create tables in the cluster before creating the index. However, all three steps must be completed before you can insert data into a clustered table.

Create a Cluster :SIZE specifies the space required by all rows corresponding to a key value in bytes, kilobytes, or megabytes; INDEX specifies that it is an index cluster. If SIZE is not defined, it defaults to the size of one block. The Oracle server uses this value to estimate the maximum number of key values that can be accommodated in each block of the cluster.

Create a Cluster Index : The key columns do not need to be specified for a cluster index because they have already been defined while creating the cluster. For optimal performance, place the cluster index in a tablespace different from that used for creating the cluster.

Note : A cluster index cannot be unique or include a column defined as LONG.

Create Tables in a Cluster : The CLUSTER clause of the CREATE TABLE command specifies that the table must be placed in a cluster. The cluster columns must have exactly the same data type as the corresponding cluster key columns defined in step 1.
Note: A table that is placed in a cluster cannot have any physical attributes of its own because it is not a segment by itself but is a part of the cluster.

No comments: