Oracle Database INTERVAL Partitioning and Virtual Columns

I recently bumped into this issue, which prevents you from INSERTing a date of 31-12-9999 into an INTERVAL partitioned table because the partition’s LESS THAN value would be larger than the largest Oracle DATE.

When I spoke with Jeff Moss, he mentioned that we might be able to get round the problem by adding a virtual column to the table and partitioning on that. I didn’t find a satisfactory solution using this method, but understanding some of the optimiser’s quirks and abilities when using partitions and virtual columns was rather interesting.

Starting with the following table structure:

The purpose here is to create a tuple timestamped table for storing historical and current data. The data_from_date represents the date the row is known to be valid from, and the data_to_date is when the row is known to have changed or been removed entirely from the source system. The current row, the one that is still available in the source system, would not have an ‘end date’, so its data_to_date would either be NULL or a made-up ‘high’ date, such as DATE ‘9999-12-31’.

And this is where the problems start.

Winding back a little, and forgetting the high date scenario for a moment, let’s take a look at how queries use the partitions. Generally, queries would use a specific date that’s between the data_from_date and data_to_date to get a view of the data for a point in time.

As you can see, the partitions are being pruned where possible, starting at partition 14. Side note: because this table is INTERVAL partitioned and there is no maximum value set for our data_to_date, Oracle may have to look at all the partitions after partition 14. Potentially, this could be all the way up to 1048575, which is the maximum partition number that could exist. It’s a shame the optimiser isn’t a little more intelligent here – I know there aren’t 1048575 partitions and so should the optimiser!

Anyway, back to the matter at hand. Inserting a high date into this table results in an error (described further here).

A potential solution to the problem was to add a virtual column to the table that translates the high date into something that would go into a monthly partition. There are a bunch of options, to name a few:

  • TRUNC the date to ‘MM’ and subtract 1
  • Subtract 31 days from the date
  • TRUNC the date to ‘IQ’ – this would give quarterly partitions and not monthly partitions, but let’s not rule it out.
  • TO_CHAR the date to YYYYMM, then CAST that as a NUMBER and partition on the number.
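
To get a feel for what each of these options does to the high date, here is a rough Python sketch (not Oracle SQL). Python’s datetime.date happens to share the same upper limit of 9999-12-31, so it makes a convenient stand-in. The function names are made up for illustration, and the quarter function simply truncates to the first day of the calendar quarter.

```python
from datetime import date, timedelta

HIGH_DATE = date(9999, 12, 31)

def trunc_month_minus_1(d):
    """Analogue of TRUNC(d, 'MM') - 1: the last day of the previous month."""
    return d.replace(day=1) - timedelta(days=1)

def minus_31(d):
    """Analogue of d - 31."""
    return d - timedelta(days=31)

def trunc_quarter(d):
    """First day of the calendar quarter containing d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return d.replace(month=first_month, day=1)

def yyyymm_number(d):
    """Analogue of CAST(TO_CHAR(d, 'YYYYMM') AS NUMBER)."""
    return d.year * 100 + d.month

candidates = {
    "trunc month, minus 1": trunc_month_minus_1,
    "minus 31": minus_31,
    "trunc quarter": trunc_quarter,
    "yyyymm number": yyyymm_number,
}

# Show the value each expression would hand to the partitioning key.
for name, fn in candidates.items():
    print(f"{name:22} -> {fn(HIGH_DATE)}")
```

Every date-based option moves the high date back into a month before December 9999, so the partition it lands in has an upper bound that still fits inside the valid date range. Whether the optimiser can then prune on the expression is a separate question.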

I’m not going to explore all of these options, but the set up script for testing further scenarios is here. The test SQL will be:

Truncate the month and subtract 1

This was my first test, as it would give me the previous month’s value to partition on, which would fit in nicely without causing the ORA-01841 error. The virtual column expression is as follows:

The result wasn’t a success with a query plan of:

Minus 31

Next, I simplified the expression by subtracting 31 days from the date being inserted. This removes the TRUNC ‘complication’ and produces a less complex expression.

Again, the result was a failure with a query plan of:

Does it ever work?

Yes! The following expressions work but, as you’ll notice, have mixed results in solving our initial problem. It seems that as soon as any arithmetic operator is used, the optimiser cannot work out that partition pruning can occur. Using TRUNC, however, which to me seems a more complex function, works perfectly. Interestingly, when it does work, the filter section of the query plan shows an additional predicate that your query doesn’t specify but that is used to partition prune.

Here are some of the virtual columns that do partition prune but might not solve the original issue.

 


Oracle Database Interval Partition Set Up Script

This script is used to set up the test table for the following blog posts:

 


Oracle Database INTERVAL Partitioning Date Error With 31-12-9999

You may have seen the below error when using dates at the extremes with an Oracle Database. This post focuses on the scenario where you’re inserting a high date, such as 31-12-9999, into an INTERVAL partitioned table. High dates such as this are often used in warehouse environments where from and to dates denote the period of time the data was valid for, with a high date marking the current data row.

Take the following table and partition definition:

Each row inserted will go into a partition for that year-month combination. This can be useful for high volume data that’s generally queried for a month or so at a time.

If, however, you try to insert the following row, you’ll get an error:

This is due to the way the Oracle Database defines a partition. An INTERVAL RANGE partition in an Oracle Database is defined as accepting dates LESS THAN a specific value. The below example would store any dates up to the clock ticking over into May 2016.

When partitioned monthly, the LESS THAN value for inserting DATE ‘9999-12-31’ would have to be DATE ‘10000-01-01’ and that is not a valid date value. Oracle Database dates must be between the year -4713 and +9999 inclusive as explained by the error.
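
The boundary arithmetic can be sketched in a few lines of Python. This is only an analogy, not what Oracle runs internally: Python’s datetime.date also tops out at 9999-12-31, so asking for the first day of the following month fails in much the same way. The function name monthly_upper_bound is made up for illustration.

```python
from datetime import date

def monthly_upper_bound(d):
    """Exclusive upper bound of d's monthly partition: the first day of the next month."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)

# A date in April 2016 belongs to the partition holding values LESS THAN 1 May 2016.
print(monthly_upper_bound(date(2016, 4, 15)))  # 2016-05-01

# The high date would need a bound in the year 10000, which cannot be represented.
try:
    monthly_upper_bound(date(9999, 12, 31))
except ValueError as exc:
    print("no valid upper bound:", exc)
```

Any date in November 9999 or earlier still has a representable upper bound, which is why the workarounds all involve moving the high date back by at least a month.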

The solution? Either use a non-INTERVAL partition strategy, reduce the date being INSERTed or experiment with a Virtual Column and partition on that.


Create a Simple Cassandra Cluster With 3 Nodes

Category : How-to

Apache Cassandra™ is a massively scalable open source NoSQL database. Cassandra was built from day one to be clustered and ticks the usual clustering boxes: no single point of failure, and capacity and throughput that scale with cluster size. This guide will look at creating a basic three node cluster within a single data centre.

It is assumed that the following is already in place:

  • Cassandra 3.2 is installed on 3 nodes.
  • Each node can communicate freely with the other nodes. Take a look here if you use a firewall between nodes.
  • The IP addresses of each node are known.
  • No data is stored on the 3 Cassandra instances.
  • The default file paths are used, for example from a yum or apt-get install.

Clear existing Cassandra data

If you’ve already started your Cassandra instance, you’ll need to stop it and remove the data it contains. The main reason is that the cluster_name needs to be the same on all nodes, and it’s best to choose one yourself rather than use the default, Test Cluster.

Set the Cassandra cluster configuration

Cassandra is configured using various files in the /etc/cassandra directory. The cassandra.yaml file contains most of the Cassandra configuration, such as ports used, file locations and seed node IP addresses. Edit your cassandra.yaml file and fill in the details as below. The example below is cut down for brevity; your file may contain many other settings.

  • cluster_name can be anything you choose to describe the cluster. Spaces are allowed, but make sure you wrap the whole name in quotes. All members of this cluster must have the same name.
  • num_tokens is the number of virtual nodes within a Cassandra instance. This is used to partition the data and spread it throughout the cluster. A good starting point is 256.
  • seeds are the IP addresses of the cluster’s seed servers. Seed nodes are used as known places where cluster information (such as a list of nodes in the cluster) can be obtained. They are not a single point of failure, as all active nodes have this information; they are just known locations that can be relied on to have the information while other machines come and go. It’s recommended that there are 3 seed nodes per data centre.
  • listen_address is the IP address that Cassandra will listen on for internal (Cassandra to Cassandra) communication. Cassandra will try to guess your machine’s IP address if you leave it blank, but it’s best to specify it yourself. This will change on each node.
  • rpc_address is the IP address that Cassandra will listen on for client based communication, such as through the CQL protocol. This will change on each node.
  • endpoint_snitch is the ‘snitch’ used by Cassandra. A snitch is what tells Cassandra which data center and rack a node belongs to within a cluster. There are various types that could be used here, however describing them is outside the scope of this guide.
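
To make it clearer which of these settings are shared and which change per node, here is a small Python sketch that prints the relevant cassandra.yaml fragment for each node. The cluster name, the IP addresses and the choice of the first two nodes as seeds are all made up for illustration. GossipingPropertyFileSnitch is also an assumption on my part, chosen because it reads the cassandra-rackdc.properties file.

```python
# Hypothetical values: substitute your own cluster name and addresses.
CLUSTER_NAME = "Example Cluster"
NODE_IPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
SEEDS = NODE_IPS[:2]  # assumption: the first two nodes act as seeds

TEMPLATE = """\
cluster_name: '{cluster_name}'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "{seeds}"
listen_address: {ip}
rpc_address: {ip}
endpoint_snitch: GossipingPropertyFileSnitch
"""

def render_node_settings(ip):
    """Return the cassandra.yaml fragment for the node with the given IP."""
    return TEMPLATE.format(
        cluster_name=CLUSTER_NAME,
        seeds=",".join(SEEDS),
        ip=ip,
    )

for ip in NODE_IPS:
    print(f"# --- settings for {ip} ---")
    print(render_node_settings(ip))
```

Only listen_address and rpc_address differ between the three fragments; cluster_name, num_tokens, the seed list and the snitch are identical everywhere.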

Example for node 1:

Example for node 2:

Example for node 3:

Cassandra is built to be fault tolerant and will distribute data to try to minimize the risk of a failure causing loss of data or any downtime. Cassandra therefore has the concepts of a node, a rack and a data centre. Where possible, Cassandra will ensure that the data and its backups are stored on a different rack and in a different data centre, so that a failure, even at data centre level, isn’t catastrophic.

Edit the cassandra-rackdc.properties file on each node and set the dc and rack attributes. For this example we’ll assume everything is in the same dc, dc1, however two nodes will be on rack1 and one node will be on rack2. The names themselves are irrelevant; just come up with a naming standard that helps you understand where the Cassandra instance actually is. Everything here is case sensitive, so be sure you’re consistent.
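
As a rough illustration, the following Python sketch prints the two lines each node’s cassandra-rackdc.properties file would contain for the layout described above. The node labels, and which two nodes share rack1, are assumptions made for the example.

```python
DC = "dc1"

# Assumption: nodes 1 and 2 share rack1, node 3 sits on rack2.
RACKS = {"node1": "rack1", "node2": "rack1", "node3": "rack2"}

def rackdc_properties(node):
    """Return the cassandra-rackdc.properties content for the given node label."""
    return f"dc={DC}\nrack={RACKS[node]}\n"

for node in RACKS:
    print(f"# {node}")
    print(rackdc_properties(node))
```

Because the values are case sensitive, generating them from one place like this is an easy way to keep the spelling consistent across nodes.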

Example for node 1:

Example for node 2:

Example for node 3:

Finally, remove the config file cassandra-topology.properties from the /etc/cassandra directory, as it’s not used with our configuration.

Starting your Cassandra cluster

The final steps are to start your cluster and connect to it.

First off, start your seed instances that were specified in the cassandra.yaml config file. Once these are up and running you can start the remaining nodes.

Once all of your services are started you can use the nodetool status command to check the status of your nodes. Run this from any Cassandra server. As you can see with the below output, all three servers are available in the uk_dc data centre on rack1 and rack2.

Connect to your Cassandra Cluster

Once all of your servers have started, your cluster is ready to use! Each node will have the cqlsh utility installed, which you can use to interact with your Cassandra cluster. You’ll need to use one of the IP addresses Cassandra is listening on (set with rpc_address in cassandra.yaml).