Data Preservation Policy
This document sets out the digital preservation and protection policy of the Canadian Astronomy Data Centre (CADC). The CADC is a National Research Council (NRC) data centre hosted by the Herzberg Astronomy and Astrophysics Research Centre (HAA) and operated in partnership with the Canadian Space Agency (CSA) and Shared Services Canada (SSC). The CADC mandate is to ensure, in a sustainable way, the availability and longevity of the digital information assets associated with Canada's astronomical observatories by addressing the factors that risk making them unusable or inaccessible.
We aim to:
- Maintain the integrity of the data by regularly auditing checksums of data files.
- Ensure all data access is secured to the relevant level of information security.
- Ensure data are accompanied by sufficient documentation to enable their re-use for analytical and research purposes.
- Ensure data are checked and validated according to appropriate data and documentation ingestion procedures.
- Ensure data are catalogued according to appropriate metadata standards.
- Provide suitable storage media for long-term data management, migrating data to new media as needed.
- Ensure that data are stored in sufficiently secure facilities to prevent loss due to catastrophic events.
- Ensure we have sufficient rights to preserve data and distribute it under an appropriate licence.
Retention policy varies for different data sets. Generally:
- Observations by Canada's telescopic facilities, as unrepeatable measurements of the sky, are retained indefinitely.
- Model data, which can have a limited shelf-life, may be withdrawn. To date they have been kept indefinitely, as storage capacity has grown more rapidly than the volume of model data stored within the CADC.
- Third party data, where there is another primary archive, may be kept as a rolling archive of the most recent data, or reviewed if usage falls.
- Data that take significant resources to keep, such as very large data sets, may need special consideration.
- Exceptionally, data may be withdrawn for a number of reasons such as being made redundant by recalibration of the same input/raw data sets. CADC does not preserve versions of processed data.
External Policy Considerations
As a core activity in the CADC, preservation does not exist in isolation. It needs to take account of:
- The NRC and Government of Canada Data Policy, in particular policies concerning accessibility of public science data (e.g. Open science - helping make science accessible for all Canadians - Science.gc.ca).
- The data policies of our external partners.
- Good practice developed by other repositories and standards bodies, for example Core Trust Seal certification, the International Virtual Observatory Alliance (IVOA) and the Open Archival Information System (OAIS) reference model.
The preservation strategy of the CADC is to maintain a flexible preservation system that can evolve to meet the demands of changing technology and developing user expectations. The CADC has chosen to implement a preservation strategy based upon open and available file formats. The same ingestion procedure is used for all data resources, and no judgement is made on the scholarly value of the data sets once they have been identified as suitable for deposit with the CADC. All data sets accepted for deposit must be accompanied by supporting documentation of sufficient quality to enable re-use over the long term. To reduce the risk of obsolescence, files are accepted only in non-proprietary formats.
Migration to new media is performed as technologies progress, but the data files themselves are left unaltered whenever possible. File checksums are recomputed and compared with the stored values to verify that nothing has changed.
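As an illustration only, a checksum audit of this kind can be sketched in a few lines of Python. This is not the CADC's actual tooling: the function names `file_checksum` and `audit`, the manifest layout (file name mapped to expected digest) and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit(manifest, root, algorithm="sha256"):
    """Compare stored checksums with freshly computed ones.

    manifest (hypothetical layout) maps relative file names to expected hex digests.
    Returns a dict mapping each file name to "ok", "mismatch" or "missing".
    """
    results = {}
    for name, expected in manifest.items():
        path = Path(root) / name
        if not path.is_file():
            results[name] = "missing"
        elif file_checksum(path, algorithm) == expected:
            results[name] = "ok"
        else:
            results[name] = "mismatch"
    return results
```

Run after a media migration or on a regular schedule, any entry that is not "ok" would flag a file to restore from another copy.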
Online storage for the data centre repository is administered by a dedicated IT infrastructure team drawn from both the CADC and our SSC partners. The environmental conditions in which the storage media operate are tightly controlled to reduce vulnerability. Data for which the CADC is the primary or sole repository are mirrored on spinning disk to an external, geographically distinct location; these data are served to users as part of normal operations. In addition, these data are backed up continuously to tape libraries. A copy of the archive is kept securely off site and forms a key component of the CADC's disaster recovery and business continuity procedures, providing for recovery of data and infrastructure under commonly anticipated threats (e.g. technical failure, human error). The system also ensures the safety of the data in the event of a more serious incident, for example if the buildings housing the data centre and/or major IT infrastructure were to be rendered inoperable.
The preservation of the CADC data relies on the servers and networks it uses. Currently we use the Government of Canada's research network, maintained by SSC, and the university research network maintained by CANARIE. We also make use of storage infrastructure provided by the Digital Research Alliance of Canada as part of our ongoing partnership with that organization. Our internal data centre, as well as those of our partners, is maintained to modern physical and electronic security standards.
Funding and resource planning
The CADC is, and always has been, dependent on funding from the NRC and the CSA. The CADC is currently funded via NRC A-base funding allocated to the Herzberg Astronomy and Astrophysics Research Centre, through an MOU (renewed every 5 years) with the Canadian Space Agency, and via storage hardware contributions (renewed every 3 years) from the Digital Research Alliance of Canada. Each of these partners is federally funded and committed to Research Data Management and the Government of Canada’s open science policies.
Resource management for preservation of digital resources includes:
- technical infrastructure, including equipment purchases, maintenance and upgrades, software/hardware obsolescence monitoring, network connectivity etc.
- financial planning, including strategy, financing of the CADC and commitment to long-term funding
- staffing infrastructure, including recruitment, induction and ongoing staff training
The CADC Archive makes every effort to remain up-to-date with any relevant technological advances to ensure continued access to its collections. The CADC also implements a programme of continual improvement in how users interact with the data centre, for example, improved deposit and request functions for users.