High availability with Oracle Standard Edition
Simple Continuous Operation
If your task is to establish a high-availability (HA) environment in the Oracle universe, Oracle offers you its "Maximum Availability Architecture," consisting of RAC (Real Application Clusters), Data Guard, and GoldenGate. However, this package is made up of different solutions for various problems, which leads to questions regarding the need and suitability of the various products for your company. Keeping the definition of high availability in mind, you need to ask the following for your database environment: What should be protected?
High availability is thus divided into several considerations: Does an application need to be safeguarded against failure or – more likely in many cases – does protection need to be provided against potential loss of data? This takes you to the technical differences between RAC and Data Guard. Simply setting up RAC does not provide any protection against the data center failing, for example, because of a fire. This additionally requires a standby solution for which you can set up a Data Guard configuration.
Unfortunately, however, Data Guard can only be used in the Enterprise Edition. A nice note can be found in the Data Guard documentation: It may be possible to set up a standby database environment manually in the Standard Edition itself by moving archived redo logfiles onto the standby site via the command line and then importing them by script, but, of course, not with the monitoring and management capabilities that Data Guard offers.
Standard Edition and High Availability
Is high availability then possible only using the Enterprise Edition? A look at Oracle GoldenGate, which can also be used with the Standard Edition, shows several additional features compared with Data Guard. These include cross-platform and cross-version use and bidirectional replication. However, these features are not needed for the example here – a high-availability environment with the Standard Edition.
It's high time to think outside the box and address various third-party solutions if the Standard Edition is sufficient for your purposes, but you do not want to do without affordable protection against failure and data loss. In addition to the well-known SharePlex from Quest or Dell, Dbvisit has become a talking point of late with its standby product. Additionally, Dbvisit has developed its own replication solution in the form of Replicate. This tool presents an alternative to Oracle Streams, which you should take a look at because streams were set to a status of "deprecated" in database release 12c R1, alongside Advanced Replication. Thus, the feature is still included in 12c, but Oracle has discontinued development. It is thus only one step ahead of "desupported" and ultimately – along with the limited usability in the Standard Edition – a reason why it probably should be dropped.
Defining Tolerable Data Loss
Replication and standby provide different solutions because, apart from mirrored disks, redundant data storage through replication – logically or physically – as a standby is possible. Unfortunately, the terminology and its use also often causes confusion, especially with Data Guard. The differences lie in the usability of the replica. Although a "Physical Standby" is built exclusively for a failure event, the target database can be fully used with logical replication (e.g., as a reporting database) (Figure 1).
Multiple solutions for lossless replication while simultaneously using the replica are available when using the Enterprise Edition with Active Data Guard and Logical Standby. However, the challenge with the Standard Edition is much greater. Solutions such as Dbvisit Standby, which are based on a redo log shipping, may allow elegant and quick-to-implement high availability; however, they have the disadvantage that data loss of several minutes (about 15 minutes in practice) must be accepted if there is an error, and the replica cannot be used during normal operation.
The most important question when analyzing an HA environment with the Standard Edition is therefore: How much data loss can be tolerated? If the number of transactions is on the low side, a standby solution where the redo log files are transmitted every five minutes might come into question. Be careful: This can quickly become a bottleneck in batch runs or other major actions. Thus, you should refrain from physical replication if the maximum data loss must be less than 15 minutes. Here logical replication demonstrates its strengths, because it can reduce data loss to a minimum when there is a failover.
No recovery of the archived redo log files occurs with logical replication, as opposed to the Physical Standby; instead, the DML and DDL commands are restored directly from the redo logfiles and run on the standby server. Data Guard now goes in other directions and writes directly via Log Writer from the primary database in parallel to separate standby redo logs on the target side.
Completely normal recovery is then performed from this in the Physical Standby version; with Logical Standby, the same is done by a SQL Apply with the database open. The minimal data loss with logical replication is limited to transactional data that has not yet been sent from the primary site to the replication site until the failover. This means that everything that landed in the redo logfiles and was applied also exists in the replicated database. There are two conceptual differences when transmitting the data: SharePlex and Dbvisit Replicate transmit the data before the commit occurs, whereas GoldenGate transmits the data together with the commit. A single "commit" could still be carried out if an error occurs, but larger transactions that have been transferred as a whole are unlikely to be successful.
Challenges in the HA Environment
Although a physical standby is set up quickly and no further work is generally required (in addition to monitoring), this is a separate database with logical replication that must be managed separately and backed up for productive use. The backup should not be an unsolvable problem. The fact that changes to the SYS and SYSTEM schemes were not included in the logical replication, however, creates more difficulties. The same applies to ROWID
s.
Other problems could arise from specific data types that are not supported by a replication solution such as XMLType
. Triggers that might now fire twice represent another task. Additionally, there's a problem with sequences, which are often used as unique IDs by some applications, in that an uninterrupted numerical order cannot be guaranteed – with or without replication. Simply rolling back a transaction creates a gap. Despite this fact, many applications rely on sequences or the primary key (which may be guaranteed to be unique but is not definitely continuous) that could cause discrepancies in a failover.
Another problem that must be avoided involves conflicts that can arise if constraints on the replication site prevent data from being inserted from the primary side. This problem, however, can only occur if changes have been made by third parties in the replication database. Thus, you must ensure appropriate access protection.
The last major challenge is to keep the complexity as low as possible and to develop a workable system that both meets the requirements and remains manageable. Of course, you cannot disregard human error in all considerations regarding high availability and so need to implement a robust backup strategy for this case, too.
Buy this article as PDF
(incl. VAT)