Common Archive Observation Model (CAOM-2.2)
The Common Archive Observation Model (CAOM) is a general purpose data model for use as the core data model of an astronomical data centre. The purpose of this model is to standardize the core parts of every archive implementation so that a common set of software tools can be used across archives. This set of tools includes simple data access mechanisms (web pages), discovery agents, advanced query capabilities (typically as IVOA web services), and advanced applications. In addition, we have implemented CAOM in a cross-archive data warehouse which will hold metadata for all public and proprietary data; the common model is used within the data warehouse and thus defines the portion of the archive database that is available through all applications.
While the previous version (CAOM-1.0) was very successful in enabling the CADC to serve a wide variety of data, some particular data structures were awkward to describe in a meaningful way. CAOM-2.0 has been developed to address such issues and cover new scenarios for data products.
CAOM-2.2 Software
resource | description |
---|---|
Table Access Protocol (TAP) | a web service for ad-hoc querying, including both CAOM and the IVOA ObsCore-1.1 Data Model |
Simple Image Access (SIA) | a web service for finding calibrated images (uses TAP and CAOM) |
AdvancedSearch | a user interface for querying CAOM (uses TAP) |
CAOM operations services | a set of high level web services to support use of CAOM: drill down to downloads using IVOA DataLink, cutouts on data files using IVOA SODA, get detailed metadata of a CAOM observation, get all files in a product (CAOM Plane) from a single download URL |
CAOM observation repository | web service to support ingestion of metadata into CAOM at CADC (requires authentication) |
caom2 | java library for manipulating CAOM observations, including reading and writing XML documents; source code |
fits2caom2 | java command-line tool for extracting metadata from FITS files, reads FITS file(s) and writes an XML document; source code |
pyCAOM2 | python library for manipulating CAOM observations, including reading and writing XML documents; source code |
caom2repoClient | python command-line tool for interacting with the CAOM observation repository, supports get, put, update, delete (requires authentication); source code |
cadcWCS | java library for accessing wcslib; source code |
wcsLibJNI | JNI binding library for wcslib; source code |
Core Classes
The core CAOM classes describe the observation, data products, and access to the physical artifacts (typically files), as well as the provenance of the data products. In addition, we also show the (loosely coupled) metadata access control classes.
The UML diagrams specify through the cardinality whether an object or field may be null or not.
- no visible cardinality means 1: the value is mandatory
- 0..1 compositions (solid black diamond at start) are equivalent to member variables; the value is optional
- n..* compositions signify a collection which is never null, but may be empty for n = 0
While there are a few methods shown in the model, these are to signify value conversions (e.g. wavelength to frequency) that require access to other internal fields; this is a reminder/hint to persistence implementations that the values returned from methods should also be stored for efficient querying (for example, by the TAP service).
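As a rough illustration of the point about methods, the sketch below uses a hypothetical `EnergyBounds` class (not the actual caom2 or pyCAOM2 API) whose frequency width is derived from wavelength bounds, and a helper that writes the computed value into the stored row so that a query service does not have to recompute it. The class, method, and column names are assumptions made for this example.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT = 299_792_458.0  # m/s


@dataclass(frozen=True)
class EnergyBounds:
    """Hypothetical stand-in: energy bounds stored as wavelength in metres."""
    lower: float  # shortest wavelength, m
    upper: float  # longest wavelength, m

    def freq_width(self) -> float:
        """Width of the bounds in Hz, computed from the wavelength fields."""
        return SPEED_OF_LIGHT / self.lower - SPEED_OF_LIGHT / self.upper


def to_row(bounds: EnergyBounds) -> dict:
    """Flatten the state and the method result into one persisted row."""
    return {
        "energy_bounds_lower": bounds.lower,
        "energy_bounds_upper": bounds.upper,
        # Value returned by a method, stored so it can be queried directly.
        "energy_freq_width": bounds.freq_width(),
    }


row = to_row(EnergyBounds(lower=0.01, upper=0.02))
```

A persistence layer following this pattern would store `energy_freq_width` alongside the wavelength columns, so a range query on frequency width needs no per-row arithmetic.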

Changes
- 2016-04-07 (red outline): made Artifact.productType mandatory; added mandatory Artifact.releaseType
- 2014-11-18: added Observation.requirements with a Status flag; added Plane.qualityFlag; moved all vocabularies (enumerations) to a separate diagram (below)
- 2014-02-10: added restwav (observed rest wavelength) to computed plane energy metadata
- 2013-11-26: added moving flag to Target class, added TargetPosition class
- 2013-09-08: added PolarizationState literals: POLI (polarized intensity), FPOLI (fractional polarized intensity), POLA (linear polarized angle), EPOLI (elliptical polarized intensity), CPOLI (circular polarized intensity), and NPOLI (non-polarized intensity)
- 2013-06-18: added CATALOG, NOISE, and WEIGHT as additional ProductType literals (used to all be classified as AUXILIARY)
- 2012-11-20: added timesys, trefpos, and mjdref to TemporalWCS (from the draft FITS TimeWCS paper); time values in the CoordAxis1D are still restricted to MJD only as they are numbers and we would need another field to allow JD; if mjdref is null, the values are absolute (otherwise they are relative to mjdref)
- 2012-07-31: renamed DataProcess to Provenance, changed Plane.position.timeDependent to be a Boolean (null allowed)
- 2012-06-20: removed partName - partNumber distinction and just use a required Part.name field
- 2012-06-20: made Observation.target.type optional, removed ProcessMetaReadAccess since provenance is now part of one plane (access is controlled by PlaneMetaReadAccess), added Observation.type (typically OBSTYPE from FITS) and Observation.intent (science or calibration)
- 2012-06-07: removed Plane.observable since it is not sensible to summarise multiple Chunk.observable values and chunks with one observable will usually leave it blank
- 2012-03-26: added various ASMD fields, refactored Metric to be a fixed set of Plane.metrics.* fields
- 2012-03-15: added cardinality to keywords fields to show they are lists of strings; changed Plane.observable to have a simple name and units instead of re-using Axis; removed Target.position field
- 2012-02-17: changed Plane.dataType to Plane.dataProductType
- 2012-02-15: refactored provenance to be a single class owned by and characterising a single plane; made Metric.error a field (nullable), but utility of current design is under consideration; specified cardinality of keywords to be [0..*] to signify a list of strings; refactored Observable, but current design is under consideration; removed Plane.dataStatus and the DataStatus enumerated type (no use case); refactored Part so that partName and partNumber are immutable state
- 2012-01-27: Added convenience methods to Energy to return the width of the energy bounds and the sample size in frequency (the bounds and sampleSize fields are wavelength)
All the mini-vocabularies defined in the CAOM model are shown below. More detailed definitions and actual literal values used when stored or serialised follow.

Changes
- 2016-04-07 (red outline): added ProductType.THUMBNAIL (subtype of preview); added ReleaseType vocabulary
- 2014-11-18: separate vocabulary (enumerations) diagram, added Status and Quality; removed ProductType.CATALOG value
Observation
An observation is the primary top-level class for empirical data. A simple observation is the result of one use or invocation of a telescope and instrument, while a composite observation is a collection of simple observations that have been combined to create a new data product. Simple observations have one or more data products (planes). The metadata for an observation is concerned with describing the data acquisition: the proposal, telescope, instrument, and target. The algorithm for a simple observation is always exposure while the algorithm for a composite observation describes the semantics of the grouping or composition involved.
Enumerated type values are generally lower case strings equivalent to the names used in the model, but there are a few exceptions where numeric values or mixed-case strings from an external source are used.
Enumeration / model field | serialised value | description |
---|---|---|
ObservationIntentType | | describes the intent of the observation |
SCIENCE | science | |
CALIBRATION | calibration | |
TargetType | | describes the type of target (e.g. FITS OBSTYPE keyword) |
OBJECT | object | |
FIELD | field | |
Status | | flag to characterise the satisfaction of requirements |
FAIL | fail | |
Plane
A plane is one (of several) data product(s) that are created as part of an observation. Each simple observation typically has one raw plane created by the observing process itself and may have one or more additional planes that are produced by subsequent data processing. The metadata for a plane characterises the data product: the coverage and sampling in position, energy, time, and polarization, a description of the observed quantity (usually something proportional to flux). This metadata is generally computed from the artifact metadata and is used primarily in data discovery.
Enumeration / model field | serialised value | description |
---|---|---|
CalibrationLevel | | from IVOA ObsCoreDM-1.0 |
RAW_INSTRUMENTAL | 0 | raw data, instrumental format |
RAW_STANDARD | 1 | raw data, standard format |
CALIBRATED | 2 | calibrated data, standard reductions applied |
PRODUCT | 3 | product, advanced processing applied |
DataProductType | | from IVOA ObsCoreDM-1.0 |
IMAGE | image | |
SPECTRUM | spectrum | |
TIMESERIES | timeseries | |
VISIBILITY | visibility | |
EVENTLIST | eventlist | |
CUBE | cube | |
CATALOG | catalog | custom data product type |
EnergyBand | | from IVOA Resource Metadata for the Virtual Observatory |
RADIO | Radio | nu < 30GHz (lambda > 10mm) |
MILLIMETER | Millimeter | 0.1-10mm |
INFRARED | Infrared | 1-100um |
OPTICAL | Optical | 300-1000nm |
UV | UV | 100-300nm |
EUV | EUV | 10-100nm |
XRAY | X-ray | 0.12-120keV |
GAMMARAY | Gamma-ray | energy > 120keV |
Quality | | flag data quality |
JUNK | junk | this data is bad for everything |
The serialised values for Polarization (as listed in the PolarizationState type) are the identical upper case letter(s).
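The split between model field names and serialised values can be sketched with Python enumerations. This is only an illustration and is not taken from pyCAOM2: the class layout and the `serialise`/`deserialise` helper names are assumptions, while the member names and values are copied from the table above.

```python
from enum import Enum


class CalibrationLevel(Enum):
    # Numeric serialised values (the exception to the lower-case rule).
    RAW_INSTRUMENTAL = 0
    RAW_STANDARD = 1
    CALIBRATED = 2
    PRODUCT = 3


class EnergyBand(Enum):
    # Mixed-case serialised values taken from an external vocabulary.
    RADIO = "Radio"
    MILLIMETER = "Millimeter"
    INFRARED = "Infrared"
    OPTICAL = "Optical"
    UV = "UV"
    EUV = "EUV"
    XRAY = "X-ray"
    GAMMARAY = "Gamma-ray"


def serialise(member: Enum):
    """Return the value written to storage or XML for a model field."""
    return member.value


def deserialise(enum_type, value):
    """Look up the model field for a stored value; raises ValueError if unknown."""
    return enum_type(value)
```

Keeping the serialised value separate from the member name means code can refer to `EnergyBand.XRAY` while stored records carry `X-ray`.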
Artifact
An artifact is one physical product or resource (typically a file) that is part of a plane. Planes in an Observation with intent=science should always have at least one science artifact and may have other types of (associated) artifacts. Planes in an Observation with intent=calibration should always have at least one calibration artifact and may have other types of (associated) artifacts. While science Observations (with science Artifacts) could also contain calibration Artifacts, this should only be done if those calibrations are to be used with that science data only; normally science and calibration files are part of different Observations.
Enumeration / model field | serialised value | description |
---|---|---|
ProductType | | file type classification |
SCIENCE | science | science data |
CALIBRATION | calibration | files needed to calibrate science data |
INFO | info | additional information file(s), usually text |
PREVIEW | preview | non-science quality rendition of the data for preview |
THUMBNAIL | thumbnail | small non-science quality rendition of the data for preview |
NOISE | noise | associated noise files needed for analysis |
WEIGHT | weight | associated weight files needed for analysis |
AUXILIARY | auxiliary | other associated files needed for analysis |
ReleaseType | | artifact release permission type |
DATA | data | release depends on parent Plane.dataRelease and PlaneDataReadAccess |
META | meta | release depends on parent Plane.metaRelease and PlaneMetaReadAccess |
Part
A part is one qualitatively defined subsection of an artifact. For example, if the artifact is a multi-extension FITS file, it would have one part for each extension. If the artifact is a tar or zip file, it would have one part for each contained file.
Chunk
A chunk is a quantitatively defined subsection of a part. The chunk is characterised by world coordinate system (WCS) metadata plus an extra axis to describe different observables (the measured values) stored within the data. Different chunks can be defined which vary only in the range of coordinate values they include. For example, if a single data array contains different observable quantities, one can define a chunk for each slice through the array, with each slice having a different product type. One can also use chunks to define arbitrary tiles in a large data array; this is useful if there is no WCS solution to describe the mapping of sky to pixel coordinates but one still wants to be able to extract smaller sections of the data (e.g. one chunk).
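The Artifact, Part, and Chunk nesting described above might be sketched as follows. These dataclasses are simplified, hypothetical stand-ins rather than the caom2 or pyCAOM2 classes: the field names, the `tile` helper, and the example URI are all assumptions. The helper shows the tiling use case by splitting one pixel axis into contiguous chunks.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chunk:
    # Hypothetical fields: 1-based inclusive pixel range along one axis.
    first_pixel: int
    last_pixel: int
    product_type: Optional[str] = None


@dataclass
class Part:
    name: str  # e.g. the FITS extension name or number
    chunks: List[Chunk] = field(default_factory=list)


@dataclass
class Artifact:
    uri: str
    product_type: str
    release_type: str
    parts: List[Part] = field(default_factory=list)


def tile(part: Part, axis_length: int, tile_size: int) -> Part:
    """Append chunks that cover pixels 1..axis_length in tiles of tile_size."""
    for start in range(1, axis_length + 1, tile_size):
        end = min(start + tile_size - 1, axis_length)
        part.chunks.append(Chunk(first_pixel=start, last_pixel=end))
    return part


# A two-extension file (made-up URI): one part per extension, each tiled.
artifact = Artifact("ad:EXAMPLE/file.fits", "science", "data")
for ext in ("1", "2"):
    artifact.parts.append(tile(Part(name=ext), axis_length=2048, tile_size=1024))
```

With this layout a client that only needs one tile can select a single chunk instead of reading the whole part.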
Metric
A metric is a measured quantity that describes a data product. Metrics are inherently very flexible and can be used to derive interesting and useful metadata about data products, usually by analysing the actual content and computing some aggregate information. For example, one could compute the number (or density) of point and extended sources in an image or the signal-to-noise ratio of a specific type and brightness of source. Metrics describe the content of the data product.
Provenance
The provenance of a data product describes the data processing that transformed the input(s) into the output. For example, a raw plane is processed to produce a calibrated plane using a specific recipe. This is considered provenance since the primary goal or view of this metadata is to understand how a specific plane was produced and to trace backwards from an output, through the data process, to the input(s). The complete provenance may include multiple loops until it reaches a plane which does not have a provenance: a raw plane in a simple observation.
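Tracing backwards from an output to its inputs can be sketched as a graph walk. The `Plane` and `Provenance` classes here are minimal, hypothetical stand-ins (the real model carries far more state), and identifying input planes by a plain string ID is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Provenance:
    name: str  # name of the process or recipe
    inputs: List[str] = field(default_factory=list)  # IDs of input planes


@dataclass
class Plane:
    plane_id: str
    provenance: Optional[Provenance] = None  # None for a raw plane


def trace_to_raw(plane_id: str, planes: Dict[str, Plane]) -> List[str]:
    """Follow provenance inputs until planes without provenance are reached."""
    raw, seen, stack = set(), set(), [plane_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        prov = planes[current].provenance
        if prov is None:
            raw.add(current)  # no provenance: the trace ends here
        else:
            stack.extend(prov.inputs)
    return sorted(raw)


planes = {
    "raw1": Plane("raw1"),
    "raw2": Plane("raw2"),
    "cal1": Plane("cal1", Provenance("calibrate", ["raw1"])),
    "cal2": Plane("cal2", Provenance("calibrate", ["raw2"])),
    "stack": Plane("stack", Provenance("coadd", ["cal1", "cal2"])),
}
```

Calling `trace_to_raw("stack", planes)` walks through both calibrated planes and stops at the two raw planes.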
ReadAccess and Subclasses
Access control is declared as special permission classes where the existence of instances grants read permission on instances of associated assets. In addition to the permissions, assets also include a release date (e.g. metaRelease in the Plane class) that specifies when the asset is public and permission is not required. For example, instances of the PlaneMetaReadAccess class grant read permission on the metadata held in an instance of the Plane class (where the assetID of the permission matches the ID of the asset). Access control is implemented as a query transformation (e.g. in a TAP service) that connects the asset and the permission and allows one to implement and protect proprietary metadata within CAOM.
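The rule described above (public after the release date, otherwise readable only if a matching permission instance exists) can be written as a small predicate. This is a sketch under stated assumptions: the `group_id` field, the function name, and treating a null release date as "not yet public" are all choices made for illustration. A TAP service would express the same condition as a join between the asset and permission tables rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class PlaneMetaReadAccess:
    asset_id: int  # ID of the Plane this permission applies to
    group_id: str  # hypothetical: the group that is granted read permission


def can_read_plane_meta(
    plane_id: int,
    meta_release: Optional[datetime],
    grants: Iterable[PlaneMetaReadAccess],
    user_groups: Set[str],
    now: datetime,
) -> bool:
    """Return True if the plane metadata is public or a permission matches."""
    # Public once the release date has passed; no permission is required.
    if meta_release is not None and meta_release <= now:
        return True
    # Otherwise the existence of a matching permission instance grants access.
    return any(
        g.asset_id == plane_id and g.group_id in user_groups for g in grants
    )
```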
IVOA Publisher Dataset Identifiers
The IVOA publisher dataset identifier (publisherDID) is a Uniform Resource Identifier (URI). In CAOM, this is a reproducible, globally-unique identifier, constructed as follows:
ivo://{authority}/{collection}/{observationID}/{productID}
where the curly braces { } denote values from the model (or a registered IVOA Authority ID in the case of {authority}). Within one authority (data centre), the collection and observationID from the Observation class and the productID from the Plane class make up the publisherDID for a single CAOM data product. It is not clear exactly how one resolves a publisherDID, but what one can do is strip the path off and look up the authority in an IVOA registry and then look up specific types of IVOA services owned by that authority, such as the proposed DataLink service.
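Building and splitting a publisherDID according to the pattern above might look like the following sketch. The function names are hypothetical, the authority and collection values in the usage lines are made up, and the code assumes that none of the components contains a `/` character.

```python
from typing import Tuple

PREFIX = "ivo://"


def make_publisher_did(
    authority: str, collection: str, observation_id: str, product_id: str
) -> str:
    """Assemble ivo://{authority}/{collection}/{observationID}/{productID}."""
    return f"{PREFIX}{authority}/{collection}/{observation_id}/{product_id}"


def split_publisher_did(did: str) -> Tuple[str, str, str, str]:
    """Return (authority, collection, observationID, productID)."""
    if not did.startswith(PREFIX):
        raise ValueError(f"not an ivo:// identifier: {did}")
    parts = did[len(PREFIX):].split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected four non-empty components: {did}")
    return parts[0], parts[1], parts[2], parts[3]


def authority_uri(did: str) -> str:
    """Strip the path, leaving the part one would look up in a registry."""
    return PREFIX + split_publisher_did(did)[0]


# Made-up example values, for illustration only.
did = make_publisher_did("example.org", "EXAMPLE", "obs123", "raw")
```

Because the identifier is built only from stored model values, the same observation and plane always yield the same publisherDID.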
Data Types (Discovery)
The following data types are used in the computed characterisation of the Plane class to support data discovery queries.

Changes
- 2013-09-08: added PolarizationState literals: POLI (polarized intensity), FPOLI (fractional polarized intensity), POLA (linear polarized angle), EPOLI (elliptical polarized intensity), CPOLI (circular polarized intensity), and NPOLI (non-polarized intensity)
- 2012-02-15: made Location contain coordinates rather than be a subclass for symmetry
- added Box datatype
- changed some fields to methods to correctly specify required and immutable state
Data Types (WCS)
The following data types are derived from FITS WCS (TODO: ref), but also include more general ability to describe coverage when per-pixel coordinate system information is not available (for example, raw data that is not in FITS format) or is not practical to store in the database (for example, non-linear spectral WCS solution stored in the data array or table of a FITS file).

The PolarizationState enumeration that describes polarization of a Plane is computed from the FITS WCS description of the axis (CTYPEi=STOKES). The numeric values used in the FITS WCS representation are related to the string values following FITS WCS Paper 1 (Table 7) and some extensions extracted from a FITS mailing list discussion. The mapping is as follows:
PolarizationState | numeric value | description/source |
---|---|---|
I | 1 | FITS WCS Paper 1 |
Q | 2 | FITS WCS Paper 1 |
U | 3 | FITS WCS Paper 1 |
V | 4 | FITS WCS Paper 1 |
POLI | 5 | linear polarized intensity sqrt(Q^2 + U^2), code used in AIPS |
FPOLI | 6 | fractional linear polarization POLI/I, code used in AIPS |
POLA | 7 | linear polarization angle 1/2 arctan(U,Q), code used in AIPS |
EPOLI | 8 | elliptical polarization intensity sqrt(Q^2 + U^2 + V^2) |
CPOLI | 9 | circular polarization intensity \|V\| |
NPOLI | 10 | unpolarized intensity I - EPOLI |
RR | -1 | FITS WCS Paper 1 |
LL | -2 | FITS WCS Paper 1 |
RL | -3 | FITS WCS Paper 1 |
LR | -4 | FITS WCS Paper 1 |
XX | -5 | FITS WCS Paper 1 |
YY | -6 | FITS WCS Paper 1 |
XY | -7 | FITS WCS Paper 1 |
YX | -8 | FITS WCS Paper 1 |
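The mapping in the table can be applied to a linear STOKES axis as in the sketch below. The dictionary is copied from the table; the function name and signature are assumptions, and the code uses the usual linear FITS relation value = CRVAL + CDELT * (pixel - CRPIX) with 1-based pixels, rounding each value to the nearest integer code.

```python
from typing import List

# Numeric FITS WCS codes mapped to PolarizationState names (from the table).
POLARIZATION_STATES = {
    1: "I", 2: "Q", 3: "U", 4: "V",
    5: "POLI", 6: "FPOLI", 7: "POLA", 8: "EPOLI", 9: "CPOLI", 10: "NPOLI",
    -1: "RR", -2: "LL", -3: "RL", -4: "LR",
    -5: "XX", -6: "YY", -7: "XY", -8: "YX",
}


def stokes_axis_to_states(
    crval: float, crpix: float, cdelt: float, naxis: int
) -> List[str]:
    """List the polarization state of each pixel along a linear STOKES axis."""
    states = []
    for pixel in range(1, naxis + 1):
        value = crval + cdelt * (pixel - crpix)
        code = round(value)
        if code not in POLARIZATION_STATES:
            raise ValueError(f"no PolarizationState for value {value}")
        states.append(POLARIZATION_STATES[code])
    return states
```

For example, an axis with CRVAL=1, CRPIX=1, CDELT=1 and four pixels covers the four Stokes parameters, while CRVAL=-5 with CDELT=-1 steps through the linear feed products.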
Changes
- 2012-11-20: added timesys and trefpos to TemporalWCS (from the draft FITS TimeWCS paper); time values in the CoordAxis1D are still restricted to MJD only as they are numbers and we would need a boolean to allow JD; example usage from draft paper: CTYPE=TIME and TIMESYS=UTC is equivalent to CTYPE=UTC and TIMESYS=null
- 2012-03-15: changed Slice to have methods that reflect immutable state; changed SpectralWCS.specsys to be a method since a value is mandatory; removed extraneous get methods from CoordAxis1D and CoordAxis2D
- 2012-02-22: changed Error class to CoordError to avoid annoying code conflict with java.lang.Error since that package is always in scope
- 2012-02-15: removed get methods from CoordAxis1D and CoordAxis2D that were utilities for computing information and not signifying state; renamed bounds and footprint to range and bounds respectively; changed CoordAxis1D.axis from field to getAxis method (not nullable)
- 2012-01-27: added support for a circular footprint (in addition to polygon) to better capture a single beam in radio data; still need to determine the impact of the use of polymorphism on persistence implementation
- changed some fields to methods to correctly specify required and immutable state
Persistence Interfaces
One of the goals for CAOM is to enable instances of observations to be harvested to alternate databases. To support this, all CAOM entities have a unique identifier to support fine-grained updating of existing records and a last-modified timestamp to support incremental harvesting (normally only detect and harvest changed entities).
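How the identifier and the timestamp work together during incremental harvesting can be sketched as below. This is a hypothetical illustration and not the CADC harvester: the `CaomEntity` fields, the use of a UUID as the identifier type, and the dictionary standing in for the destination database are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, Optional
from uuid import UUID


@dataclass
class CaomEntity:
    id: UUID  # unique identifier (UUID type is an assumption)
    last_modified: datetime
    payload: str  # stand-in for the real entity state


def harvest(
    source: Iterable[CaomEntity],
    destination: Dict[UUID, CaomEntity],
    last_harvest: Optional[datetime] = None,
) -> Optional[datetime]:
    """Copy entities changed since last_harvest; return the new high-water mark."""
    newest = last_harvest
    for entity in source:
        if last_harvest is not None and entity.last_modified <= last_harvest:
            continue  # unchanged since the previous run
        # The unique ID lets us update one existing record in place.
        destination[entity.id] = entity
        if newest is None or entity.last_modified > newest:
            newest = entity.last_modified
    return newest
```

The returned timestamp is passed back in on the next run, so only entities modified after it are copied again.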

Changes
- 2012-02-15: DataProcess was renamed to Provenance and is no longer a distinct CaomEntity
Important Changes Since CAOM-1.0
Below are the main differences between the previous and current versions of CAOM. This is not a complete changelog, but it does try to capture the main differences.
- metadata that used to be specified for SimpleObservation or CompositeObservation is moved into the base Observation class: for simple observations these values retain the original meaning, while for composites the values simply mean that all the members had the same value (practically, CompositeObservations are almost always made from data with the same telescope and instrument and often the same target and proposal)
- the Artifact class is normalised into three separate classes: Artifact, Part, and Chunk: Artifact is a whole thing (typically a file), Part is a qualitative subsection (typically a single FITS extension), and Chunk is a quantitatively defined subsection (typically a single row or column from a table or data array, but could be extended to support an arbitrary list of ranges of values if this actually occurs in reality)
- the Metric class is a part of the plane with specific metrics defined
- WCS classes are no longer entities that can be shared by multiple Artifacts
- add support for non-science artifacts to be included within a plane instead of as separate, related planes
- added an ObservableAxis to support cases where different quantities are stored in different sections of a data array: the use cases that drive this also require the Chunk and typically each Chunk is a different observable slice in the data array, but it could be used as a tiling mechanism
- integrated provenance model
- integrated metadata access control model
- renamed the permanent observation identifier from collectionID to observationID
- added a permanent product identifier (productID) to the Plane class
- expanded the WCS classes to include less detailed characterisation when a complete WCS solution is not available or practical
- added <<computed>> stereotype so that computed (aggregate) metadata is clearly marked
- made the rules for specifying whether null values are allowed or not follow clearly from the model