Common Archive Observation Model (CAOM-2.2)

The Common Archive Observation Model (CAOM) is a general purpose data model for use as the core data model of an astronomical data centre. The purpose of this model is to standardize the core parts of every archive implementation so that a common set of software tools can be used across archives. This set of tools includes simple data access mechanisms (web pages), discovery agents, advanced query capabilities (typically as IVOA web services), and advanced applications. In addition, we have implemented CAOM in a cross-archive data warehouse which will hold metadata for all public and proprietary data; the common model is used within the data warehouse and thus defines the portion of the archive database that is available through all applications.

While the previous version (CAOM-1.0) was very successful in enabling the CADC to serve a wide variety of data, some particular data structures were awkward to describe in a meaningful way. CAOM-2.0 has been developed to address such issues and cover new scenarios for data products.

CAOM-2.2 Software

resource                     description
Table Access Protocol (TAP)  a web service for ad-hoc querying, including both CAOM and the IVOA ObsCore-1.1 Data Model
Simple Image Access (SIA)    a web service for finding calibrated images (uses TAP and CAOM)
AdvancedSearch               a user interface for querying CAOM (uses TAP)
CAOM operations services     a set of high level web services to support use of CAOM: drill down to downloads using IVOA DataLink, cutouts on data files using IVOA SODA, get detailed metadata of a CAOM observation, get all files in a product (CAOM Plane) from a single download URL
CAOM observation repository  web service to support ingestion of metadata into CAOM at CADC (requires authentication)
caom2                        Java library for manipulating CAOM observations, including reading and writing XML documents; source code available
fits2caom2                   Java command-line tool for extracting metadata from FITS files, reads FITS file(s) and writes an XML document; source code available
pyCAOM2                      Python library for manipulating CAOM observations, including reading and writing XML documents; source code available
caom2repoClient              Python command-line tool for interacting with the CAOM observation repository, supports get, put, update, delete (requires authentication); source code available
cadcWCS                      Java library for accessing wcslib; source code available
wcsLibJNI                    JNI binding library for wcslib; source code available

Core Classes

The core CAOM classes describe the observation, data products, and access to the physical artifacts (typically files), as well as the provenance of the data products. In addition, we also show the (loosely coupled) metadata access control classes.

The UML diagrams specify through the cardinality whether an object or field may be null or not.

While there are a few methods shown in the model, these are to signify value conversions (e.g. wavelength to frequency) that require access to other internal fields; this is a reminder/hint to persistence implementations that the values returned from methods should also be stored for efficient querying (for example, by the TAP service).
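
As a minimal sketch of that hint, the following shows a value type whose conversion method result is written out alongside the stored fields, so a query service can filter on frequency without recomputing it. The class name `WavelengthInterval`, its fields, and the record keys are hypothetical and are not taken from the CAOM model.

```python
from dataclasses import dataclass

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # exact by SI definition


@dataclass
class WavelengthInterval:
    """Hypothetical stand-in for an energy bounds value, in metres."""
    lower_m: float
    upper_m: float

    def frequency_bounds_hz(self):
        # The shortest wavelength maps to the highest frequency, so the bounds swap.
        return (SPEED_OF_LIGHT_M_PER_S / self.upper_m,
                SPEED_OF_LIGHT_M_PER_S / self.lower_m)

    def to_row(self):
        """Flatten stored fields and method results into one record for storage."""
        freq_lower_hz, freq_upper_hz = self.frequency_bounds_hz()
        return {
            "wavelength_lower_m": self.lower_m,
            "wavelength_upper_m": self.upper_m,
            "frequency_lower_hz": freq_lower_hz,
            "frequency_upper_hz": freq_upper_hz,
        }
```

A persistence layer built this way would store all four values, so a frequency-based search becomes a plain column comparison.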

[Figure: core CAOM model]

All the mini-vocabularies defined in the CAOM model are shown below. More detailed definitions and actual literal values used when stored or serialised follow.

[Figure: CAOM vocabularies]

Observation

An observation is the primary top-level class for empirical data. A simple observation is the result of one use or invocation of a telescope and instrument, while a composite observation is a collection of simple observations that have been combined to create a new data product. Simple observations have one or more data products (planes). The metadata for an observation is concerned with describing the data acquisition: the proposal, telescope, instrument, and target. The algorithm for a simple observation is always "exposure", while the algorithm for a composite observation describes the semantics of the grouping or composition involved.

Enumerated type values are generally lower case strings equivalent to the names used in the model, but there are a few exceptions where numeric values or mixed-case strings from an external source are used.

Enumeration            model field  serialised value  description
ObservationIntentType                                 describes the intent of the observation
                       SCIENCE      science
                       CALIBRATION  calibration
TargetType                                            describes the type of target (e.g. FITS OBSTYPE keyword)
                       OBJECT       object
                       FIELD        field
Status                                                flag to characterise the satisfaction of requirements
                       FAIL         fail
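
The mapping between model fields and serialised values can be sketched with Python enumerations, using the member name as the model field and the member value as the serialised string. This is an illustration only and is not the API of the caom2 or pyCAOM2 libraries; the helper names `serialise` and `deserialise` are hypothetical.

```python
from enum import Enum


class ObservationIntentType(Enum):
    SCIENCE = "science"
    CALIBRATION = "calibration"


class TargetType(Enum):
    OBJECT = "object"
    FIELD = "field"


class Status(Enum):
    FAIL = "fail"


def serialise(member):
    """Return the string written when the value is stored or serialised."""
    return member.value


def deserialise(enum_type, text):
    """Look up a member by serialised value; raises ValueError if unknown."""
    return enum_type(text)
```

For example, `deserialise(TargetType, "field")` returns `TargetType.FIELD`, while an unrecognised string is rejected instead of being stored silently.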

Plane

A plane is one (of several) data product(s) that are created as part of an observation. Each simple observation typically has one raw plane created by the observing process itself and may have one or more additional planes that are produced by subsequent data processing. The metadata for a plane characterises the data product: the coverage and sampling in position, energy, time, and polarization, and a description of the observed quantity (usually something proportional to flux). This metadata is generally computed from the artifact metadata and is used primarily in data discovery.

Enumeration       model field       serialised value  description
CalibrationLevel                                      from IVOA ObsCoreDM-1.0
                  RAW_INSTRUMENTAL  0                 raw data, instrumental format
                  RAW_STANDARD      1                 raw data, standard format
                  CALIBRATED        2                 calibrated data, standard reductions applied
                  PRODUCT           3                 product, advanced processing applied
DataProductType                                       from IVOA ObsCoreDM-1.0
                  IMAGE             image
                  SPECTRUM          spectrum
                  TIMESERIES        timeseries
                  VISIBILITY        visibility
                  EVENTLIST         eventlist
                  CUBE              cube
                  CATALOG           catalog           custom data product type
EnergyBand                                            from IVOA Resource Metadata for the Virtual Observatory
                  RADIO             Radio             nu < 30GHz (lambda > 10mm)
                  MILLIMETER        Millimeter        0.1-10mm
                  INFRARED          Infrared          1-100um
                  OPTICAL           Optical           300-1000nm
                  UV                UV                100-300nm
                  EUV               EUV               10-100nm
                  XRAY              X-ray             0.12-120keV
                  GAMMARAY          Gamma-ray         energy > 120keV
Quality                                               flag data quality
                  JUNK              junk              this data is bad for everything

The serialised values for Polarization (as listed in the PolarizationState type) are the identical upper case letter(s).

Artifact

An artifact is one physical product or resource (typically a file) that is part of a plane. Planes in an Observation with intent=science should always have at least one science artifact and may have other types of (associated) artifacts. Planes in an Observation with intent=calibration should always have at least one calibration artifact and may have other types of (associated) artifacts. While science Observations (with science Artifacts) could also contain calibration Artifacts, this should only be done if those calibrations are to be used with that science data only; normally science and calibration files are part of different Observations.
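
The "at least one matching artifact" rule can be expressed as a small consistency check. The sketch below uses simplified, hypothetical `Artifact` and `Plane` classes holding only the fields needed for the check, and relies on the fact that the serialised intent values (science, calibration) coincide with the serialised product type values of the same name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Artifact:
    uri: str
    product_type: str  # serialised ProductType value, e.g. "science"


@dataclass
class Plane:
    product_id: str
    artifacts: List[Artifact] = field(default_factory=list)


def planes_missing_primary_artifact(intent, planes):
    """Return the product IDs of planes with no artifact matching the intent.

    intent is the serialised observation intent: "science" or "calibration".
    """
    return [
        plane.product_id
        for plane in planes
        if not any(a.product_type == intent for a in plane.artifacts)
    ]
```

An empty result means every plane satisfies the rule; any returned product ID points at a plane that only carries associated artifacts such as previews.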

Enumeration  model field  serialised value  description
ProductType                                 file type classification
             SCIENCE      science           science data
             CALIBRATION  calibration       files needed to calibrate science data
             INFO         info              additional information file(s), usually text
             PREVIEW      preview           non-science quality rendition of the data for preview
             THUMBNAIL    thumbnail         small non-science quality rendition of the data for preview
             NOISE        noise             associated noise files needed for analysis
             WEIGHT       weight            associated weight files needed for analysis
             AUXILIARY    auxiliary         other associated files needed for analysis
ReleaseType                                 artifact release permission type
             DATA         data              release depends on parent Plane.dataRelease and PlaneDataReadAccess
             META         meta              release depends on parent Plane.metaRelease and PlaneMetaReadAccess

Part

A part is one qualitatively defined subsection of an artifact. For example, if the artifact is a multi-extension FITS file, it would have one part for each extension. If the artifact is a tar or zip file, it would have one part for each contained file.
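
For the zip case, the part names could be derived directly from the archive listing. The following is a rough sketch using only the Python standard library; the function name `part_names_for_zip` is hypothetical and the choice of the member path as the part name is an assumption.

```python
import io
import zipfile


def part_names_for_zip(data):
    """Return one part name per file contained in a zip artifact.

    data is the raw bytes of the zip file; directory entries are skipped
    because they do not hold any content of their own.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [info.filename for info in archive.infolist() if not info.is_dir()]
```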

Chunk

A chunk is a quantitatively defined subsection of a part. The chunk is characterised by world coordinate system (WCS) metadata plus an extra axis to describe different observables (the measured values) stored within the data. Different chunks can be defined which vary only on the range of coordinate values they include. For example, if a single data array contains different observable quantities, one can define a chunk for each slice through the array, with each slice having a different product type. One can also use chunks to define arbitrary tiles in a large data array; this is useful if there is no WCS solution to describe the mapping of sky to pixel coordinates but one still wants to be able to extract smaller sections of the data (e.g. one chunk).
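
The tiling case can be sketched as a generator of pixel ranges, one per chunk. The function name `tile_ranges` and the choice of square tiles are assumptions made for illustration; the ranges are 1-based and inclusive, following the FITS pixel convention.

```python
def tile_ranges(naxis1, naxis2, tile_size):
    """Yield (x1, x2, y1, y2) pixel ranges that cover an naxis1 x naxis2 array.

    Ranges are 1-based and inclusive. Tiles on the upper edges are clipped
    so that no range extends past the array.
    """
    for y1 in range(1, naxis2 + 1, tile_size):
        for x1 in range(1, naxis1 + 1, tile_size):
            x2 = min(x1 + tile_size - 1, naxis1)
            y2 = min(y1 + tile_size - 1, naxis2)
            yield (x1, x2, y1, y2)
```

Each yielded range would become the pixel bounds of one chunk, so a cutout request can be served from a single tile without a sky-to-pixel solution.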

Metric

A metric is a measured quantity that describes a data product. Metrics are inherently very flexible and can be used to derive interesting and useful metadata about data products, usually by analysing the actual content and computing some aggregate information. For example, one could compute the number (or density) of point and extended sources in an image or the signal-to-noise ratio of a specific type and brightness of source. Metrics describe the content of the data product.

Provenance

The provenance of a data product describes the data processing that transformed the input(s) into the output. For example, a raw plane is processed to produce a calibrated plane using a specific recipe. This is considered provenance since the primary goal or view of this metadata is to understand how a specific plane was produced and to trace backwards from an output, through the data process, to the input(s). The complete provenance may include multiple loops until it reaches a plane which does not have a provenance: a raw plane in a simple observation.
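
Tracing backwards from an output to its inputs amounts to a graph walk over the provenance links. In the sketch below the provenance is reduced to a plain dictionary from a plane identifier to the identifiers of its input planes; this representation and the function name `trace_inputs` are hypothetical simplifications.

```python
def trace_inputs(plane_id, provenance_inputs):
    """Return every plane that plane_id was derived from, directly or not.

    provenance_inputs maps a plane identifier to the list of its input plane
    identifiers. Planes without provenance (raw planes) have no entry, which
    is where the walk stops.
    """
    found = []
    pending = [plane_id]
    while pending:
        current = pending.pop()
        for parent in provenance_inputs.get(current, []):
            if parent not in found:  # guard against visiting a plane twice
                found.append(parent)
                pending.append(parent)
    return found
```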

ReadAccess and Subclasses

Access control is declared as special permission classes where the existence of instances grants read permission on instances of associated assets. In addition to the permissions, assets also include a release date (e.g. metaRelease in the Plane class) that specifies when the asset is public and permission is not required. For example, instances of the PlaneMetaReadAccess class grant read permission on the metadata held in an instance of the Plane class (where the assetID of the permission matches the ID of the asset). Access control is implemented as a query transformation (e.g. in a TAP service) that connects the asset and the permission and allows one to implement and protect proprietary metadata within CAOM.
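
One possible shape of such a query transformation is sketched below with an in-memory SQLite database. Only Plane, PlaneMetaReadAccess, metaRelease and assetID come from the model; the column names planeID and groupID, the idea of granting permission to groups, and the storage of dates as ISO 8601 text are assumptions made for the example.

```python
import sqlite3


def visible_plane_ids(conn, group_ids, now):
    """Return IDs of planes whose metadata is public or readable by a group.

    A plane is public once its metaRelease date has passed; otherwise a
    matching PlaneMetaReadAccess row is required. Dates are ISO 8601 strings,
    which compare correctly as text.
    """
    sql = "SELECT p.planeID FROM Plane p WHERE p.metaRelease <= ?"
    params = [now]
    if group_ids:
        placeholders = ",".join("?" for _ in group_ids)
        sql += (
            " OR EXISTS (SELECT 1 FROM PlaneMetaReadAccess a"
            " WHERE a.assetID = p.planeID"
            " AND a.groupID IN (" + placeholders + "))"
        )
        params.extend(group_ids)
    sql += " ORDER BY p.planeID"
    return [row[0] for row in conn.execute(sql, params)]
```

The caller's original query is left untouched in spirit: the extra condition is appended by the service, so a plane with a missing or future release date is only returned to members of a permitted group.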

IVOA Publisher Dataset Identifiers

The IVOA publisher dataset identifier (publisherDID) is a Uniform Resource Identifier (URI). In CAOM, this is a reproducible, globally-unique identifier, constructed as follows:

ivo://{authority}/{collection}/{observationID}/{productID}

where the curly braces { } denote values from the model (or a registered IVOA Authority ID in the case of {authority}). Within one authority (data centre), the collection and observationID from the Observation class and the productID from the Plane class make up the publisherDID for a single CAOM data product. It is not clear exactly how one resolves a publisherDID, but what one can do is strip the path off and look up the authority in an IVOA registry and then look up specific types of IVOA services owned by that authority, such as the proposed DataLink service.
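
The construction and the "strip the path off" step can be sketched as follows. The function names are hypothetical, the authority example.org used in testing is a placeholder, and the sketch assumes that none of the identifier values contains a "/" character.

```python
from urllib.parse import urlsplit


def make_publisher_did(authority, collection, observation_id, product_id):
    """Build a publisherDID from the four values named in the template."""
    return f"ivo://{authority}/{collection}/{observation_id}/{product_id}"


def split_publisher_did(publisher_did):
    """Return (authority, collection, observationID, productID)."""
    parts = urlsplit(publisher_did)
    if parts.scheme != "ivo":
        raise ValueError("not an ivo:// identifier: " + publisher_did)
    path = parts.path.lstrip("/").split("/")
    if len(path) != 3 or not all(path):
        raise ValueError("expected collection/observationID/productID")
    return (parts.netloc, path[0], path[1], path[2])


def authority_uri(publisher_did):
    """Strip the path, leaving the part to look up in an IVOA registry."""
    return "ivo://" + split_publisher_did(publisher_did)[0]
```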

Data Types (Discovery)

The following data types are used in the computed characterisation of the Plane class to support data discovery queries.

[Figure: core CAOM model]

Data Types (WCS)

The following data types are derived from FITS WCS (TODO: ref), but also include a more general ability to describe coverage when per-pixel coordinate system information is not available (for example, raw data that is not in FITS format) or is not practical to store in the database (for example, a non-linear spectral WCS solution stored in the data array or table of a FITS file).

[Figure: core CAOM model]

The PolarizationState enumeration that describes polarization of a Plane is computed from the FITS WCS description of the axis (CTYPEi=STOKES). The numeric values used in the FITS WCS representation are related to the string values following FITS WCS Paper 1 (Table 7) and some extensions extracted from a FITS mailing list discussion. The mapping is as follows:

PolarizationState  numeric value  description/source
I                  1              FITS WCS Paper 1
Q                  2              FITS WCS Paper 1
U                  3              FITS WCS Paper 1
V                  4              FITS WCS Paper 1
POLI               5              linear polarized intensity sqrt(Q^2 + U^2), code used in AIPS
FPOLI              6              fractional linear polarization POLI/I, code used in AIPS
POLA               7              linear polarization angle 1/2 arctan(U,Q), code used in AIPS
EPOLI              8              elliptical polarization intensity sqrt(Q^2 + U^2 + V^2)
CPOLI              9              circular polarization intensity |V|
NPOLI              10             unpolarized intensity I - EPOLI
RR                 -1             FITS WCS Paper 1
LL                 -2             FITS WCS Paper 1
RL                 -3             FITS WCS Paper 1
LR                 -4             FITS WCS Paper 1
XX                 -5             FITS WCS Paper 1
YY                 -6             FITS WCS Paper 1
XY                 -7             FITS WCS Paper 1
YX                 -8             FITS WCS Paper 1
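
As an illustration of how the states might be computed from a STOKES axis, the sketch below encodes the mapping above as an integer enumeration and evaluates a linear axis at each pixel. The function name `states_from_stokes_axis` is hypothetical, and the sketch assumes a simple linear axis (value = CRVAL + CDELT * (pixel - CRPIX)) whose values land on integers.

```python
from enum import IntEnum


class PolarizationState(IntEnum):
    I = 1
    Q = 2
    U = 3
    V = 4
    POLI = 5
    FPOLI = 6
    POLA = 7
    EPOLI = 8
    CPOLI = 9
    NPOLI = 10
    RR = -1
    LL = -2
    RL = -3
    LR = -4
    XX = -5
    YY = -6
    XY = -7
    YX = -8


def states_from_stokes_axis(crval, cdelt, naxis, crpix=1.0):
    """List the states sampled by a linear STOKES axis, one per pixel.

    Raises ValueError if a pixel maps to a value with no defined state.
    """
    return [
        PolarizationState(round(crval + cdelt * (pixel - crpix)))
        for pixel in range(1, naxis + 1)
    ]
```

The member names double as the serialised values, so `[s.name for s in states_from_stokes_axis(1.0, 1.0, 4)]` gives the upper case strings for a four-plane I, Q, U, V axis.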

Persistence Interfaces

One of the goals for CAOM is to enable instances of observations to be harvested to alternate databases. To support this, all CAOM entities have a unique identifier to support fine-grained updating of existing records and a last-modified timestamp to support incremental harvesting (normally only detect and harvest changed entities).
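
A minimal sketch of incremental harvesting built on those two fields is shown below. The `Entity` class, its field names, and the use of a dictionary as the destination store are hypothetical; a real harvester would read from and write to databases.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entity:
    id: uuid.UUID
    last_modified: datetime
    payload: str


def harvest(source, destination, since):
    """Copy entities modified after `since` into destination, keyed by ID.

    Keying by the unique identifier makes the update fine-grained: a changed
    entity replaces its earlier copy instead of creating a duplicate.
    Returns the newest timestamp seen, to be passed as `since` next time.
    """
    newest = since
    for entity in source:
        if entity.last_modified > since:
            destination[entity.id] = entity
            newest = max(newest, entity.last_modified)
    return newest
```

Running `harvest` again with the returned timestamp copies nothing unless an entity has been modified in the meantime.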

[Figure: core CAOM model]

Important Changes Since CAOM-1.0

Below are the main differences between the previous and current versions of CAOM. This is not a complete changelog, but it does try to capture the main differences.