Direct Data Service

The data service allows you to download files directly from the CADC archive. If you know the name of a file and the name of the archive, you can use the CADC cadc-data client to download the file. You can also use a simple URL to download a file. This URL can be used with command line clients like wget or curl. All of these clients can be incorporated in scripts.

Basic Usage

Downloading files with the cadc-data client

The cadc-data client is a Python application for accessing the data Web Service at the CADC.

cadc-data will:

  • retrieve one or more files at a time from an archive.
  • upload files.
  • discover information about files.
  • automatically discover the data service URLs and fail over to another URL if an error occurs while transferring a file.
  • automatically retry on transient errors.
  • check that the md5 checksum of the downloaded file matches the md5 checksum on the server to ensure the integrity of the file.

A simple example using cadc-data:

cadc-data get HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz

will download the file hst_05476_4r_wfpc2_total_pc_drz.fits.gz from the HLADR2 archive to the current directory.
If the file is gzip compressed, using the -z option will automatically decompress the file.

To save the file under a different filename, use the -o <new filename> option.

If the data you are downloading isn't public, you can use your CADC username and password.
You can use your username on the command line with the --user option:

cadc-data get --user USERNAME HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz

You will be prompted to enter your password.
Or you can use the --netrc-file option to specify a .netrc file containing your username and password.

cadc-data get --netrc-file NETRC_FILE HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz

If you have an X.509 certificate for authentication, you can pass it with the --cert option.

cadc-data get --cert CERT_FILE HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz

cadc-data can also be used in scripts. It returns a non-zero exit status when an error occurs during execution.
For example:

#!/bin/bash
archive=IRIS
for file in I001B3H0.fits I016B4H0.fits
do
    echo "getting $archive $file"
    cadc-data get "$archive" "$file" && echo "done" || echo "failed"
done
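
The same loop can be driven from Python by checking the exit status of each call. This is a sketch under the assumption that cadc-data is installed and on the PATH; the helper names are hypothetical, and the runner argument exists only so the logic can be exercised without running the real client.

```python
import subprocess


def build_get_command(archive, filename, quiet=True):
    """Assemble the argument list for `cadc-data get`."""
    command = ["cadc-data", "get"]
    if quiet:
        command.append("--quiet")  # documented as -q, --quiet
    command += [archive, filename]
    return command


def download_all(archive, filenames, runner=subprocess.run):
    """Run one `cadc-data get` per file and return the names that failed."""
    failed = []
    for filename in filenames:
        result = runner(build_get_command(archive, filename))
        if result.returncode != 0:  # non-zero exit status signals an error
            failed.append(filename)
    return failed
```

For example, download_all("IRIS", ["I001B3H0.fits", "I016B4H0.fits"]) returns an empty list when both downloads succeed.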

Commonly used options for cadc-data

  • -u, --user=USER name of user to authenticate. Note: the application prompts for the corresponding password.
  • --cert CERT location of your X509 certificate to use for authentication (unencrypted, in PEM format).
  • --netrc-file NETRC_FILE netrc file to use for authentication.
  • --fhead return the FITS header information.
  • -z, --decompress decompress the data (gzip only).
  • -o, --output OUTPUT space-separated list of destination files (quotes required for multiple elements).
  • --cutout [CUTOUT [CUTOUT ...]] specify one or multiple extension and/or pixel range cutout operations to be performed. Use cfitsio syntax.
  • -q, --quiet run quietly.

The data service URL

A data service URL is used for downloading data by direct transfer only.
Here is an example of a data service URL:

https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/1722795p.fits[24][520:990,2420:2782]

The URL has four parts:

Element Description
data service resource https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/: The base URL identifying the resource of the data service. The data/pub ending gives you access to all public files. If the file is proprietary, the service redirects to data/auth and challenges you for your CADC username and password, which can be given via the usual pop-up in a browser or via command line options with wget and curl. Other resources are described below in Data Service Resources.
archive CFHT: Identifies the data archive (the CADC archives are listed in archives.txt).
fileID 1722795p.fits: Identifies the file.
options [24][520:990,2420:2782]: Following the filename you can add options, in this case a cutout.

This URL (and the others below) can be used with command line web clients (e.g. wget, curl) or in scripts (e.g. with the Requests library in Python).
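
The four parts can be joined programmatically. The sketch below only builds the string; the function name and the options argument are illustrative, and the options value is appended to the fileID unchanged, as in the cutout suffix form.

```python
BASE_URL = "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub"


def data_service_url(archive, file_id, options="", base_url=BASE_URL):
    """Join the four parts: resource, archive, fileID and options."""
    return "{}/{}/{}{}".format(base_url.rstrip("/"), archive, file_id, options)


# Parts taken from the table above.
url = data_service_url("CFHT", "1722795p.fits", "[24][520:990,2420:2782]")
```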

Downloading files using wget or curl

A simple example with wget:

wget https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz

A simple example with curl:

curl -O -J -L https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz

The options -O -J make curl save the file locally, using the server-specified Content-Disposition filename if available and otherwise a filename extracted from the URL (instead of writing the data to STDOUT). The -L option makes curl follow any redirects.

If the data you are downloading isn't public, you will need your CADC username and password. With wget use:

wget --user=fred --password='Pas$w0rD' https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz

With curl use:

curl -u 'fred:Pas$w0rD' -O -J -L https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz

There are several other options for both commands:

Commonly used options for wget

  • --user=username --password=password specify username and password.
  • -nv non-verbose. wget sends a lot of information to STDOUT. If you are running wget in a script, you want this option.
  • -q quiet mode.
  • -t, --tries=NUMBER set number of retries to NUMBER (5 recommended).
  • --waitretry=SECONDS wait 1..SECONDS between retries of a retrieval. By default, wget will assume a value of 10 seconds.
  • -N, --timestamping Turn on time-stamping and download only missing or updated files.
  • --content-disposition Forces wget to give the proper name to the downloaded file.
  • --certificate=file Use the certificate in file for authentication.

Commonly used options for curl

  • -O save the file locally with the same name as the remote version.
  • -J use the server-specified Content-Disposition filename.
  • -L follow redirects.
  • -u username:password specify username and password. If you just specify the username, curl will ask you for your password.
  • -s make curl run quietly. If you are running curl in a script, you want this option.
  • --retry NUMBER set number of retries to NUMBER (5 recommended).
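
For scripts that do not shell out to wget or curl, an authenticated download can be sketched with the Python standard library. This assumes the service accepts HTTP Basic credentials sent with the first request, as curl -u does; the function names are illustrative, and nothing here has been verified against the live service.

```python
import base64
import shutil
import urllib.request


def authenticated_request(url, username, password):
    """Build a GET request carrying HTTP Basic credentials."""
    token = base64.b64encode("{}:{}".format(username, password).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def download(request, destination, timeout=60):
    """Stream the response body to a local file; urlopen follows redirects."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        with open(destination, "wb") as out:
            shutil.copyfileobj(response, out)
```

Reading the password from a prompt (for example with getpass.getpass) avoids leaving it in shell history or in the script itself.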

Cutouts

A cutout can be performed when downloading a file. The cutout parameters are:

Number Parameter Value explanation
one or more cutout [extension number][image section] When requesting a file of type FITS, a number of cutout parameters may be included so that only these cutouts are retrieved. We are using a subset of the CFITSIO image section specification for cutout specification. Please note that single cutout parameters can also be requested as a suffix in the file ID element of the URL.

Cutout Syntax: Examples

Image Section Explanation
[1:512:2,2:512:2] Open a 256x256 pixel image consisting of the odd numbered columns (1st axis) and the even numbered rows (2nd axis) of the image in the primary array of the file.
[*,512:256] Open an image consisting of all the columns in the input image, but only rows 256 through 512. The image will be flipped along the 2nd axis since the starting pixel is greater than the ending pixel.
[*:2,512:256:2] Same as above but keeping only every other row and column in the input image.
[-*,*] Copy the entire image, flipping it along the first axis.
[3][1:256,1:256] Opens a subsection of the image that is in the 3rd extension of the file.
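
To check how many pixels an image section selects before requesting it, the arithmetic can be sketched as below. The sketch assumes 1-based, inclusive ranges as in the examples above, handles only the pixel part (not a leading extension such as [3]), and uses hypothetical function names.

```python
def axis_length(spec, full_length):
    """Pixels selected along one axis by a spec such as
    '1:512:2', '512:256', '*', '-*' or '*:2'."""
    parts = spec.split(":")
    if parts[0] in ("*", "-*"):
        # Whole axis (possibly flipped), with an optional step.
        step = int(parts[1]) if len(parts) > 1 else 1
        return (full_length - 1) // step + 1
    start, end = int(parts[0]), int(parts[1])
    step = int(parts[2]) if len(parts) > 2 else 1
    # Ranges are inclusive; a reversed range selects the same pixel count.
    return abs(end - start) // step + 1


def section_shape(section, full_shape):
    """Shape of the cutout for a section like '[1:512:2,2:512:2]'."""
    axes = section.strip("[]").split(",")
    return tuple(axis_length(a, n) for a, n in zip(axes, full_shape))
```

With a 512x512 input image, section_shape("[1:512:2,2:512:2]", (512, 512)) gives (256, 256), matching the first row of the table.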

Cutout Examples

  1. Single Extension Cutout
    cadc-data get --output 806045o-cutout1.fits --cutout [1] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout1.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1]"
  2. Pixel Coordinate Cutout
    cadc-data get --output D3.IQ.R.9979_10490_10573_11084.fits --cutout [9979:10490,10573:11084] CFHTSG D3.IQ.R.fits
    curl --location-trusted -g -o D3.IQ.R.9979_10490_10573_11084.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHTSG/D3.IQ.R.fits[9979:10490,10573:11084]"
  3. Extension and Pixel Coordinate Cutout
    cadc-data get --output 806045o-cutout2.fits --cutout [1][1:100,1:200] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout2.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1][1:100,1:200]"
  4. Multiple Extension Cutout
    cadc-data get --output 806045o-cutout3.fits --cutout [1][2] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout3.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1]&cutout=[2]"
  5. Multiple Extension Cutout with Pixel Coordinates
    cadc-data get --output 806045o-cutout4.fits --cutout [1][10:120,20:30] [2][10:120,20:30] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout4.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1][10:120,20:30]&cutout=[2][10:120,20:30]"
  6. Single Extension Cutout (Shortcut version)
    cadc-data get --output 806045o-cutout5.fits --cutout [1] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout5.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o[1]"
  7. Extension and Pixel Coordinate Cutout (Shortcut version)
    cadc-data get --output 806045o-cutout6.fits --cutout [1][1:100,1:200] CFHT 806045o
    curl --location-trusted -g -o 806045o-cutout6.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o[1][1:100,1:200]"
  8. Alternatively, it is possible to specify a cutout by RA and Dec, using a slightly different service:
    curl -L -O -J "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/sync?id=ad:CFHTSG/D2.I.fits&Circle=150.570478+2.172356+0.01"
    Where the numbers are RA, Dec and size, all in degrees. Remember that a "+" (plus sign) in a URL means " ", a blank space.
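
The cutout URLs in examples 1 to 5 follow one pattern: one cutout query parameter per requested cutout. A minimal sketch of building them in Python, with an illustrative function name; the brackets, colons and commas are left unescaped so that the result matches the URLs shown above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub"


def cutout_url(archive, file_id, cutouts, base_url=BASE_URL):
    """Build a URL with one `cutout` query parameter per requested cutout."""
    # `safe` keeps the cfitsio punctuation readable instead of %-encoded.
    query = urlencode([("cutout", c) for c in cutouts], safe="[]:,*-")
    return "{}/{}/{}?{}".format(base_url, archive, file_id, query)
```

For instance, cutout_url("CFHT", "806045o", ["[1]", "[2]"]) reproduces the URL of example 4.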

FITS Header retrieval

Note: this option cannot be combined with the cutout options.

Using cadc-data to download a FITS header

cadc-data has a --fhead option for downloading FITS header information.

Example

cadc-data get --fhead IRIS I001B3H0.fits

Using a data service URL to download a FITS header

When using a data service URL the fhead parameter is used for downloading FITS header information:

Number Parameter Value explanation
one or more fhead true When requesting a file of type FITS, providing the parameter fhead=true will result in the download of the header information of the file only.

Examples
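
A minimal sketch of such a URL in Python, assuming fhead is passed as an ordinary query parameter in the same way as cutout; the helper names are illustrative and the fetch function is not verified against the live service.

```python
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub"


def header_url(archive, file_id, base_url=BASE_URL):
    """URL that requests only the FITS header of a file."""
    return "{}/{}/{}?{}".format(base_url, archive, file_id,
                                urlencode({"fhead": "true"}))


def fetch_header(archive, file_id, timeout=60):
    """Download the header text (performs a network request)."""
    with urllib.request.urlopen(header_url(archive, file_id),
                                timeout=timeout) as response:
        return response.read().decode("ascii", errors="replace")
```

The equivalent curl call would pass the same URL, quoted, with the -L option.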

Advanced usage using the cadc-data client

To download a file using the cadc-data client, use the archive and file name.

cadc-data get IRIS I001B3H0.fits

If cadc-data is unable to download the file, a message is returned describing the error(s) encountered.

cadc-data info FOO foo.fits

returns:

ERROR:cadc-data:File name foo.fits not found in archive FOO
ERROR:cadc-data:Finished with 1 error(s)

To upload a file, use the cadc-data put command.

cadc-data put CFHT newFile

Use the cadc-data info command to retrieve metadata for a file.

cadc-data info IRIS I001B3H0.fits

    File I001B3H0.fits:
        archive: IRIS
       encoding: None
        lastmod: Tue, 25 Jul 2006 23:15:19 GMT
         md5sum: 2ada853a8ae135e16504aeba4e47489e
           name: I001B3H0.fits
           size: 1008000
           type: application/fits
        umd5sum: 2ada853a8ae135e16504aeba4e47489e
          usize: 1008000

Metadata information returned by cadc-data info:

Name Explanation
archive The archive name
encoding The type of encoding (typically compression) used (optional)
lastmod Date of the last file modification (optional: not present when modified during delivery)
md5sum The MD5 digest of the contents of the file.
name Contains a suggested filename for clients that will write the file
size Size of the file as delivered
type The mimetype of the file (optional: only present if type is known)
umd5sum The MD5 digest of the contents of the file when uncompressed. (optional: not present when modified during delivery)
usize The size of the uncompressed file, in bytes (optional: not present when modified during delivery)

Advanced usage using a data service URL

Uploading a file

To upload a file to the data service, you must have permission to write to the target archive. An upload is done by performing an HTTP PUT to the URL identifying the file, and supplying the file data in the accompanying input stream of the request. If successful, an HTTP 201 response code will be returned.

Upload example:
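
A sketch of the PUT described above, using the Python standard library. The Content-Type value is an assumption, the function names are illustrative, and authentication (required for write access) is left out for brevity.

```python
import urllib.request


def upload_request(url, payload, content_type="application/fits"):
    """Build an HTTP PUT request whose body is the file content."""
    request = urllib.request.Request(url, data=payload, method="PUT")
    # Assumed header value; urllib would otherwise default to a form type.
    request.add_header("Content-Type", content_type)
    return request


def upload(request, timeout=60):
    """Send the request; True when the service answers 201 (Created)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 201
```

The payload is the raw bytes of the file, for example open("newFile", "rb").read() for small files.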

Data Service Resources

Resource Description
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub Public data file transfer resource. /pub over HTTP does not gather user credentials, so if downloading a non-public file or uploading to a non-public folder, you will be redirected to https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/auth and challenged for a userid/password.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/auth Authenticated data file transfer resource. This resource will challenge for a CADC userid/password for authentication and authorization.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub SSL data file transfer resource. A client certificate must be used to connect to this SSL-based resource. You will be authorized based on the credentials in the certificate.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/transfer Transfer negotiation endpoint for uploads and downloads.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/transfer Transfer negotiation endpoint that takes client certificates for authentication and authorization.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/auth/transfer Transfer negotiation endpoint that takes userid/password for authentication and authorization.
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/availability Resource that can be used to check the availability of the data service. Performing an HTTP GET to this resource will produce an XML document describing the state of the service.

Data transfer techniques

  • Direct download: Perform an HTTP GET to /data/pub/<archive>/<file> and receive a redirect to the preferred download location.
  • Direct upload: Perform an HTTP PUT to /data/pub/<archive>/<file> and upload directly to the stream.
  • Negotiated download: HTTP POST a transfer document to /data/transfer (or /data/auth/transfer) and receive a transfer document with multiple download locations included.
  • Negotiated upload: HTTP POST a transfer document to /data/transfer (or /data/auth/transfer) and receive a transfer document with multiple upload locations included.

Authentication and Authorization

If you try to access a non-public file, you will be required to authenticate either with a CADC user ID and password or through a client certificate over SSL. If the authentication (login) fails, you will get an HTTP 401 (Unauthorized) response. If you successfully authenticate but are not allowed to access the file, you will get an HTTP 403 (Forbidden) response. If the file does not exist, you will get an HTTP 404 (Not Found) response.
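
In a script, these three status codes can be told apart as sketched below. The function names are hypothetical; the opener argument defaults to urllib's urlopen and exists so the logic can be tried without network access.

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Translate the documented status codes into a short message."""
    messages = {
        401: "authentication failed",
        403: "authenticated, but not allowed to access the file",
        404: "file not found",
    }
    return messages.get(code, "status {}".format(code))


def probe_status(url, opener=urllib.request.urlopen, timeout=60):
    """Issue a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises for 4xx/5xx; the code is carried on the exception.
        return error.code
```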

Checking for file availability and access

To check whether a file exists and whether you have access to it, you can use wget or curl to perform an HTTP HEAD request to the same URL that you would use to download the file. This HEAD request allows you to confirm the file's existence and your authorization, and to gather basic metadata about the file.

To view the HTTP headers with curl, use curl --location --head or curl -L -I.

With wget, use wget --server-response --spider.

Headers prefixed with X- are custom CADC headers; all others are standard HTTP 1.1 headers.

HTTP Header Explanation
Content-Type The mimetype of the file (optional: only present if type is known)
Content-Encoding The type of encoding (typically compression) used (optional)
Content-Disposition Contains a suggested filename for clients that will write the file
Content-Length Size of the file as delivered
Content-MD5 The MD5 digest of the contents of the file
Last-Modified Date of the last file modification (optional: not present when modified during delivery)
X-Uncompressed-Length The size of the uncompressed file, in bytes (optional: not present when modified during delivery)
X-Uncompressed-MD5 The MD5 digest of the contents of the file when uncompressed. (optional: not present when modified during delivery)
X-CADC-Stream The name of the Stream to use when performing a PUT request. (optional: Default Stream is used when none specified.)
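
The headers above can be collected into the same field names that cadc-data info prints. The mapping between header names and field names below is inferred from the matching descriptions in the two tables, not stated by the service, and the function name is illustrative.

```python
def file_metadata(headers):
    """Pick the documented fields out of a mapping of response headers.

    `headers` can be a plain dict or the `headers` attribute of a
    urllib response (which is case-insensitive; a dict is not).
    """
    def as_int(name):
        value = headers.get(name)
        return int(value) if value is not None else None

    return {
        "type": headers.get("Content-Type"),
        "encoding": headers.get("Content-Encoding"),
        "size": as_int("Content-Length"),
        "md5sum": headers.get("Content-MD5"),
        "lastmod": headers.get("Last-Modified"),
        "usize": as_int("X-Uncompressed-Length"),
        "umd5sum": headers.get("X-Uncompressed-MD5"),
    }
```

Optional headers that the service omits come back as None.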

Data service and file names

You can use the Content-Disposition header returned by the data service to have wget write the downloaded file under the name it is stored with in the archive: pass the --content-disposition flag. You may also want to use the --no-clobber option to avoid overwriting files you have already downloaded. With curl, the -O -J combination shown earlier has the same effect; alternatively, you can retrieve the HTTP headers for the file, parse the Content-Disposition value for the file name, then retrieve the file and save it under that name.

For URLs that specify a cutout, the suggested filename in the Content-Disposition header will include an extra part so that different cutouts from the same file have different filenames. This extra part is intended to be somewhat human readable, though many characters are replaced with an underscore (_) to be more compatible with the Internet and with file systems. The extra part is consistent between requests with the same cutout parameters.
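
In a Python script, the suggested filename can be extracted from the header value with the standard library's email header parser. This is a sketch: the function name is illustrative, and the exact form of the header the service sends is not shown in this document.

```python
from email.message import Message


def suggested_filename(content_disposition, fallback=None):
    """Extract the filename parameter from a Content-Disposition value."""
    message = Message()
    message["Content-Disposition"] = content_disposition
    # get_filename() returns `failobj` when no filename parameter is present.
    return message.get_filename(failobj=fallback)
```

Passing the last path segment of the URL as the fallback mirrors what curl -O does when the header is absent.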

Contact CADC for Assistance

For help and support with the data service, please email cadc@nrc.ca