Wednesday, February 10, 2010

Data Archival Mechanisms

The first step in planning a data archival is to decide on a data retention period. It is always better to select the retention period such that no data is required from the archival db and live db together for any of the operations in the application. The user should also be prevented from entering old dates before the retention period in the report selection screens.

Common Archival mechanisms used are

CDC (Change Data Capture)
Oracle Streams(built on top of CDC)

CDC - When any insert/update/delete event happens in the prod db, it allows us to update the info in another db. It is possible to write scripts to catch the event, process the data and update in the archive db.This will not affect the performance of the prod db much as its not done synchronously like triggers. When we write script to copy the data to archive db, we need to exclude the delete events. Another scheduled job will have to be written to remove the old data from the live db.

Oracle Streams - is built on top of CDC. It does the propagation of data to the archival db automatically. Here we can copy all the data changes from live db excluding the delete events through some configurations. The scripts for deleting old data from live db will have to be written and run as a scheduled job.

Both these methods keeps archived data as well as the operational data in the archive db. Production db will have only the operational data. There may be a slight delay in synchronizing the production db and archive db which is neglegible.

These features[CDC/Streams] are not available in all the version/edition of oracle. So before deciding upon a archival method, need to ensure that the feature is available in the oracle version used.

Another method for archiving is to write scripts to move the data to archive db and run the script as a scheduled job.

There may be a situation where the customer have a requirement which needs data from both production db as well as archival db. for eg: statistical reports. In case of CDC or Streams the archival db contains the entire data ie the production data plus the old data. In this case for statistical reports which requires old data, can connect to the archive db to show the reports. The db connection can be selected on the basis of the date range entered in the screen to achieve this.

When the sql script method is used for archival, we will have to connect to both the dbs, when generating reports with a date range selection. Instead of connecting to both dbs from the application we can create views which takes a union of data from the production db and archive db. Parameterized views can be used to increase performance in such a scenario.

Labels: ,


This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]