Direct Data Service
The data service allows you to download files directly from the CADC archive. If you know the name of a file and the name of its archive, you can use the CADC cadc-data client to download the file. You can also download a file with a simple URL, which works with command line clients such as wget and curl. All of these clients can be incorporated in scripts.
With cadc-data you can:
- retrieve one or more files at a time from an archive.
- upload files.
- discover information about files.
- automatically discover the data service URLs and fail over to another URL if an error occurs while transferring a file.
- automatically retry on transient errors.
- check that the MD5 checksum of the downloaded file matches the MD5 checksum on the server, ensuring the integrity of the file.
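The MD5 check that cadc-data performs can also be reproduced in a script when you download with wget or curl instead. The sketch below is an illustration under stated assumptions, not part of cadc-data: the function name verify_md5 is hypothetical, it assumes GNU coreutils md5sum (Linux), and the expected digest is a 32-digit hex value you already have, for example the md5sum field reported by cadc-data info.

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: returns 0 when FILE's MD5 digest equals EXPECTED_MD5
# (32 hex digits, compared case-insensitively), 1 on a mismatch, and 2 when
# FILE cannot be read. Assumes GNU coreutils md5sum (on macOS: md5 -q FILE).
verify_md5() {
    local file=$1 expected actual
    [ -r "$file" ] || return 2
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # md5sum prints "<digest>  <filename>"; keep only the digest, lower-cased
    actual=$(md5sum "$file" | awk '{ print tolower($1) }')
    [ "$actual" = "$expected" ]
}

# Example:
#   verify_md5 I001B3H0.fits 2ada853a8ae135e16504aeba4e47489e && echo "checksum OK"
```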
A simple example using cadc-data:
cadc-data get HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz
This downloads the file hst_05476_4r_wfpc2_total_pc_drz.fits.gz from the HLADR2 archive to the current directory.
If the file is gzip compressed, the -z option automatically decompresses it.
To save the file under a different filename, use the -o <new filename> option.
If the data you are downloading isn't public, you can authenticate with your CADC username and password.
To give your username on the command line, use the --user option:
cadc-data get --user USERNAME HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz
You will be prompted to enter your password.
Alternatively, use the --netrc-file option to specify a .netrc file containing your username and password:
cadc-data get --netrc-file NETRC_FILE HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz
If you have an X.509 certificate for authentication, use the --cert option to specify the certificate file:
cadc-data get --cert CERT_FILE HLADR2 hst_05476_4r_wfpc2_total_pc_drz.fits.gz
cadc-data can also be used in scripts. It returns a non-zero exit status when an error
occurs during execution.
#!/bin/bash
archive=IRIS
for file in I001B3H0.fits I016B4H0.fits
do
    echo "getting $archive $file"
    cadc-data get "$archive" "$file" && echo "done" || echo "failed"
done
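Because the script decides what to do based only on the exit status, the same idea can be wrapped in a small retry helper. This is a minimal sketch; the names retry and RETRY_DELAY are hypothetical, and since cadc-data already retries transient errors on its own, this only adds a script-level second chance.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Hypothetical helper: re-runs COMMAND until it exits with status 0 or
# MAX_ATTEMPTS attempts have been made. It relies only on the exit status.
retry() {
    local max=$1 attempt=1
    shift
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempt(s): $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        # pause between attempts; RETRY_DELAY (seconds) is an assumed knob, default 5
        sleep "${RETRY_DELAY:-5}"
    done
}

# Example:
#   retry 3 cadc-data get IRIS I001B3H0.fits || echo "failed"
```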
cadc-data options:
|-u, --user=USER||Name of the user to authenticate. Note: the application prompts for the corresponding password.|
|--cert CERT||Location of your X.509 certificate to use for authentication (unencrypted, in PEM format).|
|--netrc-file NETRC_FILE||The .netrc file to use for authentication.|
|--fhead||Return the FITS header information.|
|-z, --decompress||Decompress the data (gzip only).|
|-o, --output OUTPUT||Space-separated list of destination files (quotes required for multiple elements).|
|--cutout [CUTOUT [CUTOUT ...]]||Specify one or more extension and/or pixel range cutout operations to be performed, using cfitsio syntax.|
|-q, --quiet||Run quietly.|
A data service URL is used for downloading data by direct transfer only.
Here is an example of a data service URL:
https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/1722795p.fits[520:990,2420:2782]
The URL has four parts:
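|data service resource||https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub: Identifies the data service resource (see Data Service Resources).|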
|archive||CFHT: Identifies the data archive (the CADC archives are listed in archives.txt).|
|fileID||1722795p.fits: Identifies the file.|
|options||[520:990,2420:2782]: Following the filename you can add options, in this case a cutout.|
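The four parts can be assembled with a small shell function. This sketch assumes the public /data/pub resource used in the examples on this page; data_url and DATA_BASE are hypothetical names, and the options part is appended verbatim.

```shell
# Base URL of the public data service resource used throughout this page
# (an assumption; override DATA_BASE to target another resource).
DATA_BASE=${DATA_BASE:-https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub}

# data_url ARCHIVE FILE_ID [OPTIONS]
# Hypothetical helper: prints DATA_BASE/ARCHIVE/FILE_ID followed by OPTIONS
# (for example a cutout such as [520:990,2420:2782]) appended verbatim.
data_url() {
    printf '%s/%s/%s%s\n' "$DATA_BASE" "$1" "$2" "${3:-}"
}

# Example (requires network access); -g stops curl from treating [...] as a glob:
#   curl --location-trusted -g -O -J "$(data_url CFHT 1722795p.fits '[520:990,2420:2782]')"
```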
A simple example with curl:
curl -O -J -L https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz
The -O and -J options make curl save the file locally using the server-specified Content-Disposition filename if available, or else a filename extracted from the URL, instead of writing it to standard output.
The -L option makes curl follow any redirects.
If the data you are downloading isn't public, you will need your CADC username and password. With wget and curl (quote the password so that the shell does not expand characters such as $):
wget --user=fred --password='Pas$w0rD' https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz
curl -u 'fred:Pas$w0rD' -O -J -L https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz
There are several other useful options for both commands.
wget options:
|--user=username --password=password||Specify the username and password.|
|-nv||Non-verbose output. wget prints a lot of progress information by default; if you are running wget in a script, you want this option.|
|-t, --tries=NUMBER||Set the number of retries to NUMBER (5 recommended).|
|--waitretry=SECONDS||Wait 1..SECONDS between retries of a retrieval. By default, wget assumes a value of 10 seconds.|
|-N, --timestamping||Turn on time-stamping and download only missing or updated files.|
|--content-disposition||Name the downloaded file using the Content-Disposition header sent by the server.|
|--certificate=file||Use the client certificate stored in file.|
curl options:
|-O||Save the file locally with the same name as the remote version.|
|-J||Use the server-specified Content-Disposition filename.|
|-u username:password||Specify the username and password. If you specify only the username, curl will prompt for your password.|
|-s||Make curl run quietly. If you are running curl in a script, you want this option.|
|--retry NUMBER||Set the number of retries to NUMBER (5 recommended).|
A cutout can be performed when downloading a file. The cutout parameters are:
|one or more||cutout||[extension number][image section]||When requesting a file of type FITS, a number of cutout parameters may be included so that only these cutouts are retrieved. We are using a subset of the CFITSIO image section specification for cutout specification. Please note that single cutout parameters can also be requested as a suffix in the file ID element of the URL.|
Some example image sections (CFITSIO syntax):
|[1:512:2,2:512:2]||Open a 256x256 pixel image consisting of the odd numbered columns (1st axis) and the even numbered rows (2nd axis) of the image in the primary array of the file.|
|[*,512:256]||Open an image consisting of all the columns in the input image, but only rows 256 through 512. The image will be flipped along the 2nd axis since the starting pixel is greater than the ending pixel.|
|[*:2,512:256:2]||Same as above but keeping only every other row and column in the input image.|
|[-*,*]||Copy the entire image, flipping it along the first axis.|
|[3][1:256,1:256]||Open a subsection of the image that is in the 3rd extension of the file.|
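When several cutouts are requested in one URL, each becomes its own cutout query parameter. A minimal sketch of building that query string, assuming the ?cutout=...&cutout=... form; the function name cutout_query is hypothetical, and the extension numbers in the example line are arbitrary.

```shell
# cutout_query CUTOUT [CUTOUT ...]
# Hypothetical helper: joins cutout specifications into a query string of the
# form ?cutout=...&cutout=..., one cutout parameter per specification.
# Brackets are left unencoded, so pass the URL to curl with -g (--globoff).
cutout_query() {
    local sep='?' query='' c
    for c in "$@"; do
        query="$query${sep}cutout=$c"
        sep='&'
    done
    printf '%s\n' "$query"
}

# Example (requires network access; extensions [1] and [2] are arbitrary):
#   curl --location-trusted -g -o 806045o-cutouts.fits \
#     "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o$(cutout_query '[1]' '[2]')"
```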
- Single Extension Cutout
cadc-data get --output 806045o-cutout1.fits --cutout '[1]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout1.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1]"
- Pixel Coordinate Cutout
cadc-data get --output D3.IQ.R.9979_10490_10573_11084.fits --cutout '[9979:10490,10573:11084]' CFHTSG D3.IQ.R.fits
curl --location-trusted -g -o D3.IQ.R.9979_10490_10573_11084.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHTSG/D3.IQ.R.fits[9979:10490,10573:11084]"
- Extension and Pixel Coordinate Cutout
cadc-data get --output 806045o-cutout2.fits --cutout '[1][1:100,1:200]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout2.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1][1:100,1:200]"
- Multiple Extension Cutout
cadc-data get --output 806045o-cutout3.fits --cutout '[1]' '[2]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout3.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1]&cutout=[2]"
- Multiple Extension Cutout with Pixel Coordinates
cadc-data get --output 806045o-cutout4.fits --cutout '[1][10:120,20:30]' '[2][10:120,20:30]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout4.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1][10:120,20:30]&cutout=[2][10:120,20:30]"
- Single Extension Cutout (Shortcut version)
cadc-data get --output 806045o-cutout5.fits --cutout '[1]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout5.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o[1]"
- Extension and Pixel Coordinate Cutout (Shortcut version)
cadc-data get --output 806045o-cutout6.fits --cutout '[1][1:100,1:200]' CFHT 806045o
curl --location-trusted -g -o 806045o-cutout6.fits "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o[1][1:100,1:200]"
- Alternatively, it is possible to specify a cutout by RA and Dec, using a slightly different service:
curl -L -O -J "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/sync?id=ad:CFHTSG/D2.I.fits&Circle=150.570478+2.172356+0.01"
The numbers are the RA, Dec and radius of the circle, all in degrees. Remember that a "+" (plus sign) in a URL means " ", a blank space.
Note: this option cannot be combined with the cutout options.
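The RA/Dec URL can be generated the same way. This is a sketch: circle_cutout_url is a hypothetical name, and it assumes the id=...&Circle=RA+DEC+RADIUS form of the caom2ops example above, with all three numbers in degrees and the third read as the circle's radius.

```shell
# circle_cutout_url URI RA DEC RADIUS
# Hypothetical helper: prints a caom2ops sync URL for a circular cutout.
# The "+" characters encode the spaces separating RA, Dec and radius (degrees).
circle_cutout_url() {
    printf '%s?id=%s&Circle=%s+%s+%s\n' \
        'https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/caom2ops/sync' "$1" "$2" "$3" "$4"
}

# Example (requires network access):
#   curl -L -O -J "$(circle_cutout_url ad:CFHTSG/D2.I.fits 150.570478 2.172356 0.01)"
```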
cadc-data has a --fhead option for downloading FITS header information:
cadc-data get --fhead IRIS I001B3H0.fits
When using a data service URL, the fhead parameter is used for downloading FITS header information:
|one or more||fhead||true||When requesting a file of type FITS, providing the parameter fhead=true returns the FITS header information.|
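A header-only URL can be produced with a one-line helper. This sketch assumes that fhead=true is passed as a query parameter on the /data/pub URL, which is our reading of the parameter table above; fhead_url is a hypothetical name.

```shell
# fhead_url ARCHIVE FILE_ID
# Hypothetical helper: prints a /data/pub URL carrying fhead=true as a query
# parameter (assumed form), asking for the FITS header information only.
fhead_url() {
    printf '%s/%s/%s?fhead=true\n' \
        'https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub' "$1" "$2"
}

# Example (requires network access):
#   curl -s -L "$(fhead_url IRIS I001B3H0.fits)"
```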
View meta-data (headers) of a CFHT image extension cutout
curl -v --location-trusted -g --head "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o?cutout=[1]"
View meta-data (headers) of a CFHT image extension cutout (Shortcut version)
curl -v -L -g --head "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/806045o[1]"
To download a file using the cadc-data client, use the archive and file name.
cadc-data get IRIS I001B3H0.fits
If cadc-data is unable to download the file, a message is returned describing the error(s) encountered.
cadc-data info FOO foo.fits
ERROR:cadc-data:File name foo.fit not found in archive FOO
ERROR:cadc-data:Finished with 1 error(s)
To upload a file, use the cadc-data put command:
cadc-data put CFHT newFile
Use the cadc-data info command to retrieve metadata for a file:
cadc-data info IRIS I001B3H0.fits
File I001B3H0.fit:
archive: IRIS
encoding: None
lastmod: Tue, 25 Jul 2006 23:15:19 GMT
md5sum: 2ada853a8ae135e16504aeba4e47489e
name: I001B3H0.fits
size: 1008000
type: application/fits
umd5sum: 2ada853a8ae135e16504aeba4e47489e
usize: 1008000
Metadata information returned by cadc-data info:
|archive||The archive name|
|encoding||The type of encoding (typically compression) used (optional)|
|lastmod||Date of the last file modification (optional: not present when modified during delivery)|
|md5sum||The MD5 digest of the contents of the file.|
|name||Contains a suggested filename for clients that will write the file|
|size||Size of the file as delivered|
|type||The mimetype of the file (optional: only present if type is known)|
|umd5sum||The MD5 digest of the contents of the file when uncompressed. (optional: not present when modified during delivery)|
|usize||The size of the uncompressed file, in bytes (optional: not present when modified during delivery)|
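Individual fields can be pulled out of the cadc-data info output with awk. This sketch assumes each field is printed on its own line as name: value (possibly indented); info_field is a hypothetical name.

```shell
# info_field FIELD   (reads cadc-data info output on stdin)
# Hypothetical helper: prints the value of the first line of the form
# "FIELD: value", ignoring leading indentation. "md5sum" does not match
# "umd5sum" because the field name must start the (de-indented) line.
info_field() {
    awk -v key="$1" '
        { sub(/^[ \t]+/, "") }                                  # drop indentation
        index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
    '
}

# Example (requires network access and cadc-data):
#   cadc-data info IRIS I001B3H0.fits | info_field md5sum
```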
To upload a file to the data service, you must have permission to write to the target archive. An upload is done by performing an HTTP PUT to the URL identifying the file, and supplying the file data in the accompanying input stream of the request. If successful, an HTTP 201 response code will be returned.
- HTTP PUT to: https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/newFile
curl -T /path/to/newFile "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/CFHT/newFile"
Data Service Resources
|/data/pub||Public data file transfer resource. /pub over HTTP does not gather user credentials, so if downloading a non-public file or uploading to a non-public folder, you will be redirected to|
||Authenticated data file transfer resource. This resource will challenge for a CADC userid/password for authentication and authorization.|
||SSL data file transfer resource. A client certificate must be used to connect to this SSL-based resource. You will be authorized based on the credentials in the certificate.|
||Transfer negotiation endpoint for uploads and downloads.|
||Transfer negotiation endpoint that takes client certificates for authentication and authorization.|
|/data/auth/transfer||Transfer negotiation endpoint that takes userid/password for authentication and authorization.|
||Resource that can be used to check the availability of the data service. Performing an HTTP get to this resource will produce an XML document describing the state of the service.|
- Direct download: perform an HTTP GET to /data/pub/<archive>/<file> and receive a redirect to the preferred download location.
- Direct upload: perform an HTTP PUT to /data/pub/<archive>/<file> and send the file data directly in the request stream.
- Negotiated download: HTTP POST a transfer document to a transfer negotiation endpoint (e.g. /data/auth/transfer) and receive a transfer document with multiple download locations included.
- Negotiated upload: HTTP POST a transfer document to a transfer negotiation endpoint (e.g. /data/auth/transfer) and receive a transfer document with multiple upload locations included.
If you try to access a non-public file, you will need to authenticate either with a CADC User ID and password or with a client certificate over SSL. If the authentication (login) fails, you will get an HTTP 401 (Unauthorized) response. If you authenticate successfully but are not allowed to access the file, you will get an HTTP 403 (Forbidden) response. If the file does not exist, you will get an HTTP 404 (Not Found) response.
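In a script, the status code can be captured with curl's -w '%{http_code}' write-out variable and turned into a readable message. The helper name explain_status is hypothetical; the messages paraphrase the codes described on this page, plus the standard 200 OK.

```shell
# explain_status HTTP_CODE
# Hypothetical helper mapping the data service status codes to messages.
explain_status() {
    case $1 in
        200) echo "OK" ;;
        201) echo "Created: upload succeeded" ;;
        401) echo "Unauthorized: authentication failed" ;;
        403) echo "Forbidden: authenticated but not allowed to access the file" ;;
        404) echo "Not Found: the file does not exist" ;;
        *)   echo "Unexpected HTTP status $1" ;;
    esac
}

# Example (requires network access); -I sends a HEAD request, -L follows
# redirects, and -w prints the final status code:
#   code=$(curl -s -L -I -o /dev/null -w '%{http_code}' \
#       "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/IRIS/I001B3H0.fits")
#   explain_status "$code"
```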
To simply check whether a file exists and that you have access to it, you can use wget or curl to perform an HTTP HEAD request to the same URL that you would use to download the file. This HEAD request allows you to confirm the file's existence and your authorization, and to gather basic metadata about the file.
To view the HTTP headers with curl, use curl --location --head or its short form, curl -L -I.
With wget, use wget --server-response --spider.
Headers prefixed with X- are custom CADC headers; all others are standard HTTP headers:
|Content-Type||The mimetype of the file (optional: only present if type is known)|
|Content-Encoding||The type of encoding (typically compression) used (optional)|
|Content-Disposition||Contains a suggested filename for clients that will write the file|
|Content-Length||Size of the file as delivered|
|Content-MD5||The MD5 digest of the contents of the file|
|Last-Modified||Date of the last file modification (optional: not present when modified during delivery)|
|X-Uncompressed-Length||The size of the uncompressed file, in bytes (optional: not present when modified during delivery)|
|X-Uncompressed-MD5||The MD5 digest of the contents of the file when uncompressed. (optional: not present when modified during delivery)|
|X-CADC-Stream||The name of the Stream to use when performing a PUT request. (optional: Default Stream is used when none specified.)|
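To use these headers in a script, the value of one header can be extracted from the output of curl -s -L -I. With -L, curl prints the headers of every response in the redirect chain, so this sketch keeps the last match. header_value is a hypothetical name; matching is case-insensitive and carriage returns are stripped.

```shell
# header_value NAME   (reads an HTTP header dump on stdin)
# Hypothetical helper: prints the value of the last NAME header found and
# exits non-zero if the header is absent.
header_value() {
    awk -v name="$1" '
        BEGIN { name = tolower(name) }
        {
            sub(/\r$/, "")                       # HTTP lines end in CRLF
            colon = index($0, ":")
            if (colon > 0 && tolower(substr($0, 1, colon - 1)) == name) {
                value = substr($0, colon + 1)
                sub(/^[ \t]+/, "", value)
                found = 1                        # later matches overwrite earlier ones
            }
        }
        END { if (found) print value; else exit 1 }
    '
}

# Example (requires network access):
#   curl -s -L -I "https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/IRIS/I001B3H0.fits" \
#       | header_value Content-Length
```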
You can use the Content-Disposition header returned with the file to have wget save the download under the name the file is stored with in the archive, by using wget's --content-disposition flag. You might also want to use the --no-clobber option to avoid overwriting files you've already downloaded. With curl, the equivalent is the -J option combined with -O, as in the curl examples above; alternatively, you could retrieve the HTTP headers for the file, parse the Content-Disposition header for the filename, then retrieve the file and save it under that name.
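If you take the manual route with curl, the filename can be parsed out of the Content-Disposition value with sed. This is a minimal sketch: disposition_filename is a hypothetical name, and only the simple filename= form is handled (not the extended filename*= form). Any directory part is stripped so the name cannot point outside the current directory.

```shell
# disposition_filename   (reads a Content-Disposition value or header line on stdin)
# Hypothetical helper: prints the filename= parameter without surrounding
# quotes and without any leading directory components.
disposition_filename() {
    sed -n 's/.*filename="\{0,1\}\([^";]*\)"\{0,1\}.*/\1/p' | sed 's#.*/##'
}

# Example (requires network access):
#   url="https://www.cadc-ccda.hia-iha.nrc-cnrc.gc.ca/data/pub/HLADR2/hst_05476_4r_wfpc2_total_pc_drz.fits.gz"
#   name=$(curl -s -L -I "$url" | tr -d '\r' | grep -i '^content-disposition:' \
#       | tail -n 1 | disposition_filename)
#   curl -s -L -o "$name" "$url"
```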
For URLs that specify a cutout, the suggested filename in the Content-Disposition header includes an extra part so that different cutouts from the same file have different filenames. This extra part is intended to be somewhat human readable, though many characters are replaced with an underscore (_) to make it more compatible with Internet and file system conventions. The extra part is consistent between requests with the same cutout parameters.