EnkiBackupDesign
Overview
Requirements for backing up different types of Enki-based repositories vary based on the repository type. XMI export is a lowest common denominator, but for large repositories it can be very slow. Alternatively, Enki/Netbeans can be backed up by making a copy of the underlying storage file(s) and Enki/Hibernate can be backed up using common database tools.
This document describes an API and implementation that abstracts the process of backing up an Enki-based repository so that the caller need not know how the underlying repository is configured.
API
The class EnkiMDRepository provides an extension point where additions to the traditional Netbeans MDRepository can be made.
Two new methods are added to EnkiMDRepository to provide catalog backup and restore:
import java.io.OutputStream;
import java.io.InputStream;
import java.io.IOException;
/**
* Backs up the given extent, writing a single file to the given output stream.
* The file written may be a multi-file archive, such as a ZIP, JAR or TAR file.
* Typically the file is uncompressed, allowing the caller to apply their choice
* of compression.
*
* @param extentName name of the extent to back up
* @param stream an OutputStream to which a single file's data is written
* @throws IOException if there is an error writing to the stream
*/
public void backupExtent(String extentName, OutputStream stream) throws IOException;
/**
* Restores the contents of the given input stream to the named extent. The
* input stream is assumed to contain a single file's data. The input stream's
* close method is <b>not</b> called, thereby allowing the extent to be restored
* from a single entry in a multi-file archive, such as a ZIP, JAR or TAR file.
*
* @param extentName name of the extent to be created and restored to
* @param metaPackageExtentName name of the extent describing the
* new extent's metamodel
* @param metaPackageName name of the root package for the new extent
* @param stream an InputStream from which a single file's data is read
* @throws IOException if there is an error reading from the stream
*/
public void restoreExtent(
String extentName,
String metaPackageExtentName,
String metaPackageName,
InputStream stream) throws IOException;
Implementations of the API may throw additional runtime exceptions when a backup or restore operation cannot execute because of non-I/O errors. For example, some repository implementations may place restrictions on when a backup may occur. Others may be able to detect a restore-time mismatch between backed up data and pre-configured storage.
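As a usage illustration of the two methods above, the following is a minimal sketch of a caller that chooses gzip compression itself. The ExtentBackup interface is a hypothetical stand-in that mirrors only the two method signatures; the class and helper names are assumptions, not part of Enki.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BackupCallerSketch {
    /** Hypothetical stand-in for the two methods added to EnkiMDRepository. */
    public interface ExtentBackup {
        void backupExtent(String extentName, OutputStream stream) throws IOException;

        void restoreExtent(
            String extentName,
            String metaPackageExtentName,
            String metaPackageName,
            InputStream stream) throws IOException;
    }

    /** Backs up an extent into memory, applying the caller's choice of gzip. */
    public static byte[] backupCompressed(ExtentBackup repos, String extentName)
            throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The caller owns the stream, so the caller closes it.
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            repos.backupExtent(extentName, gzip);
        }
        return sink.toByteArray();
    }

    /** Restores an extent from bytes produced by backupCompressed. */
    public static void restoreCompressed(
            ExtentBackup repos,
            byte[] backup,
            String extentName,
            String metaPackageExtentName,
            String metaPackageName) throws IOException {
        try (GZIPInputStream gzip =
                new GZIPInputStream(new ByteArrayInputStream(backup))) {
            repos.restoreExtent(
                extentName, metaPackageExtentName, metaPackageName, gzip);
        }
    }
}
```

Passing a plain FileOutputStream or FileInputStream instead of the gzip wrappers would give an uncompressed backup with no change to the repository side.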
Enki/Hibernate Implementation
Backup
Enki/Hibernate code generation stores DDL for each model or model plug-in's schema alongside the existing model configuration data. Two files are generated and kept with the model:
- DDL to drop each table and view related to the model or model plug-in. If the database supports it, the drop statements will be conditional.
- DDL to create each table and view related to the model or model plug-in.
The backup method for Enki/Hibernate uses a java.util.zip.ZipOutputStream wrapped around the caller's output stream to store multiple files as part of the backup itself. The ZipOutputStream is set to store only, allowing the caller to compress it as desired.
--Zfong 16:37, 17 November 2008 (EST) Is it possible to use another compression program instead of zip, or for the backup to be uncompressed? E.g., LucidDB hot backup currently optionally supports gzip compression. In the future, we may also want to support other programs like bzip2.
- --Szuercher 12:47, 19 November 2008 (EST) The backup is actually uncompressed, it just uses the ZIP format to store multiple files since the JDK provides ZIP streams. Enki/Hibernate's backup format is meant to be opaque. The idea is that the code that calls backup/restore can pass any Input/OutputStream it desires. I expect LucidDB hot backup will pass either a FileInput/OutputStream (no compression) or a GzipInput/OutputStream wrapped around a FileInput/OutputStream (when compressing). That lets LucidDB (or any other hypothetical caller) choose the compression algorithm without Enki's participation.
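To illustrate the store-only ZIP container discussed above, here is a minimal sketch that writes two uncompressed entries to a caller-supplied stream. The entry names (descriptor.properties, data.txt) and the class name are assumptions made for illustration; the actual Enki/Hibernate entry names are not given in this document. Note that java.util.zip requires the size and CRC-32 of a STORED entry to be set before the entry is written.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class StoredZipSketch {
    /** Adds one uncompressed (STORED) entry; size and CRC are set up front. */
    static void putStored(ZipOutputStream zip, String name, byte[] data)
            throws IOException {
        CRC32 crc = new CRC32();
        crc.update(data);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());

        zip.putNextEntry(entry);
        zip.write(data);
        zip.closeEntry();
    }

    /** Writes descriptor and data as two stored entries (names are hypothetical). */
    public static void writeBackup(OutputStream out, byte[] descriptor, byte[] data)
            throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        putStored(zip, "descriptor.properties", descriptor);
        putStored(zip, "data.txt", data);
        // finish() completes the archive without closing the caller's stream.
        zip.finish();
    }
}
```

Because this sketch needs the CRC before writing, it buffers each entry in memory; an implementation handling large repositories would more likely stage the data or use a deflater level of zero instead.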
For each backup, two files are stored:
- A backup descriptor which contains enough information to guarantee that the appropriate metamodel is available upon restoration. The file will also contain the extent's name, minimum MOF ID, package version information and any extent annotation that has been made.
- The actual data stored in the repository, excluding provider-specific tables.
The backup descriptor is a simple properties file:
enki.extent=FarragoCatalog
enki.extent.annotation=foobar
enki.metamodel=FarragoMetamodel
enki.minMofId=1000
enki.model.packageVersion=1.0
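Since the descriptor is a plain properties file, it can be read with java.util.Properties. The following is a minimal sketch, assuming the five keys shown above are all required; the class and method names are hypothetical and the real restore code may validate the descriptor differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class BackupDescriptorSketch {
    // Keys taken from the example descriptor; treating all as required is an assumption.
    static final String[] REQUIRED_KEYS = {
        "enki.extent",
        "enki.extent.annotation",
        "enki.metamodel",
        "enki.minMofId",
        "enki.model.packageVersion",
    };

    /** Parses descriptor text and fails if any expected key is missing. */
    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        for (String key : REQUIRED_KEYS) {
            if (props.getProperty(key) == null) {
                throw new IllegalArgumentException("missing descriptor key: " + key);
            }
        }
        return props;
    }

    /** Returns the minimum MOF ID recorded in the backup. */
    public static long minMofId(Properties props) {
        return Long.parseLong(props.getProperty("enki.minMofId"));
    }
}
```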
Restoration requires:
- The given metamodel is available on the JVM's classpath
- The backup's package version is compatible with the version of the Enki library in use
The data file contains table data in a format that allows streaming of LOB data to MySQL. This eliminates any issues related to the MySQL maximum packet size configured on the server. For each table, the file contains a line indicating the name of a table, followed by its columns:
SMPL_Sample_Driver,mofId,name,points
The table and column names are ready to be quoted as necessary for a specific database dialect. Also note that the table and column names have already been through JMI's name mangling algorithm, which will strip out any commas (in addition to most other punctuation), so there's no need to escape commas embedded in the table or column names.
Each table description is followed by a number indicating how many rows to insert, which is then followed by one line of comma-separated values for each row to insert. Data is output in UTF-8 format and uses backslashes to escape embedded end-of-line or single-quote characters. String values are enclosed in single quotes, but do not use SQL-style quote doubling to escape quotes. 64-bit integer values are suffixed with "L", 16-bit integers are suffixed with "s", and regular precision floating-point numbers are suffixed with "f". Null values are specified by the bare word null, which is distinct from a string value containing the word null ('null'). Example data:
2
123L,'Bob\'s\nname is very, very weird',2.5f
4567L,'Alice',null
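The row format above can be decoded with a small hand-written scanner. The sketch below is illustrative only and rests on a few assumptions not stated in this document: that \n inside a string denotes a newline, that any other escaped character stands for itself, and that unsuffixed numbers are 32-bit integers or doubles depending on whether they contain a decimal point. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RowParserSketch {
    /** Splits one data line into typed values; malformed input raises a runtime exception. */
    public static List<Object> parseRow(String line) {
        List<Object> values = new ArrayList<>();
        int i = 0;
        while (true) {
            if (i < line.length() && line.charAt(i) == '\'') {
                StringBuilder text = new StringBuilder();
                i++; // skip opening quote
                while (line.charAt(i) != '\'') {
                    char c = line.charAt(i++);
                    if (c == '\\') {
                        // Assumption: \n is a newline, anything else is literal.
                        char escaped = line.charAt(i++);
                        text.append(escaped == 'n' ? '\n' : escaped);
                    } else {
                        text.append(c);
                    }
                }
                i++; // skip closing quote
                values.add(text.toString());
            } else {
                // Bare tokens never contain commas, so scan to the next one.
                int end = line.indexOf(',', i);
                if (end < 0) {
                    end = line.length();
                }
                values.add(parseBare(line.substring(i, end)));
                i = end;
            }
            if (i >= line.length()) {
                return values;
            }
            if (line.charAt(i) != ',') {
                throw new IllegalArgumentException("expected comma at offset " + i);
            }
            i++; // skip separator
        }
    }

    /** Converts an unquoted token to null, Boolean, or a number chosen by suffix. */
    static Object parseBare(String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("empty value");
        }
        if (token.equals("null")) {
            return null;
        }
        if (token.equals("true") || token.equals("false")) {
            return Boolean.valueOf(token);
        }
        String digits = token.substring(0, token.length() - 1);
        switch (token.charAt(token.length() - 1)) {
            case 'L': return Long.valueOf(digits);
            case 's': return Short.valueOf(digits);
            case 'f': return Float.valueOf(digits);
            default:
                // Assumption: unsuffixed numbers are int or double.
                return token.indexOf('.') >= 0
                    ? (Object) Double.valueOf(token)
                    : (Object) Integer.valueOf(token);
        }
    }
}
```

Commas inside quoted strings are preserved because the scanner only treats a comma as a separator when it is outside a string.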
Restore
Repository restore uses a java.util.zip.ZipInputStream wrapped around the caller's input stream to retrieve multiple files.
The following steps are taken to restore a backup:
- Find metamodel. The first file will always be a backup descriptor which contains enough information to guarantee that the appropriate metamodel is available upon restoration. If the metamodel cannot be found, restoration fails.
- Check package version. The backup's package version is compared to those compatible with the version of Enki currently in use. If the backup is incompatible, restoration fails.
- Delete old type/MOF ID mappings. Delete all type/MOF ID mappings related to any existing copy of the model.
- Remove existing objects. Any objects already in the extent are removed.
- Restore data. The data file is read.
- For each table, prepare an insert statement using the table name and columns and the necessary dialect-specific quoting. Then apply the data rows in turn. If any row does not contain exactly the right number of values, the restoration fails. If any insertion fails, restoration fails. Row values are set on the prepared statement by detecting integer type (integral number + suffix), floating point type (floating point number + suffix), boolean values (true, false), string values (enclosed in single quotes), or nulls.
- During restoration, columns named mofId are treated specially. MOF ID values are replaced with newly generated values to ensure that no collisions occur in the repository. At the start of restoration, the next MOF ID is requested and a delta between the new MOF ID and the minimum MOF ID in the backup is computed. Replacement MOF ID values are then computed by adding the delta to the backup's MOF ID for any given row. This allows restoration to be ignorant of foreign key relationships between tables so long as the backup stores data in the correct order.
- Update MOF ID table. The repository's MOF ID table is updated to ensure that any MOF IDs generated in the future are greater than any value in use in the model.
- Update type/MOF ID mapping. The repository's type/MOF ID mapping table is updated to reflect the newly restored data.
- Update extent. If the extent did not previously exist, create an entry in the extent table. Otherwise, update the annotation to reflect the backup's annotation.
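Two of the steps above lend themselves to a short illustration: building the per-table insert statement and remapping MOF IDs by a fixed delta. The following is a minimal sketch with hypothetical class and method names; the quote string is passed in by the caller because the dialect-specific quoting rules are not specified here.

```java
import java.util.Collections;
import java.util.List;

public class RestoreSketch {
    /** Builds a parameterized INSERT for one table using the given quote string. */
    public static String insertSql(String table, List<String> columns, String quote) {
        StringBuilder sql = new StringBuilder("INSERT INTO ");
        sql.append(quote).append(table).append(quote).append(" (");
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(quote).append(columns.get(i)).append(quote);
        }
        sql.append(") VALUES (");
        // One placeholder per column, bound later from each data row.
        sql.append(String.join(", ", Collections.nCopies(columns.size(), "?")));
        return sql.append(")").toString();
    }

    /** Delta between the repository's next MOF ID and the backup's minimum MOF ID. */
    public static long mofIdDelta(long nextMofId, long backupMinMofId) {
        return nextMofId - backupMinMofId;
    }

    /** Replacement MOF ID for a value read from a mofId column. */
    public static long remap(long backupMofId, long delta) {
        return backupMofId + delta;
    }
}
```

For instance, if the repository's next MOF ID is 5000 and the backup's minimum is 1000, the delta is 4000 and a backed-up MOF ID of 1234 is restored as 5234. Because every ID shifts by the same amount, references between rows remain consistent without inspecting foreign keys.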
Missing Repository-Specific Storage
There are some issues around restoring a database without the necessary provider-specific tables. Presently, the HibernateMDRepository instance will fail during construction if it cannot find these tables. It is expected that the necessary provider-specific tables are created by an external process.
Existing Extent
Restoring to a database which already contains a version of the same extent will cause the old extent to be lost.
DDL
Although the metamodel contains scripts to drop and create a metamodel's schema, they are not used during the restore process. The original design called for the extent to be dropped and re-created via these scripts, but this prevents multiple extents using the same metamodel from existing in the same database. Instead, we delete the objects for the given extent individually.
TODO: Need to make it possible to distinguish the type lookup entries from multiple extents when they reference the same object names.
Enki/Netbeans Implementation
Backup
Uses javax.jmi.xmi.XmiWriter to export an XMI file containing the contents of the named extent directly to the caller's output stream.
Restore
Uses javax.jmi.xmi.XmiReader to import an XMI file containing the contents for the named extent directly from the caller's input stream.
TBD: Handle the old Farrago bug where database data, when inserted into the catalog (typically as histogram values for column statistics), must be filtered to prevent XML encoding errors. Could move these brains into Enki, since it's really an XMI bug that the generated XML isn't escaped properly.