You will see several different colors of text used in this document. These were chosen to help the reader differentiate between the following types of parameter sources:
GREEN_ENTRIES - come from the process resource file for a particular pipeline process
ORANGE_ENTRIES - come from entries in the CDBS release catalog
TAN_ENTRIES - come from the OPUS_DEFINITIONS_DIR:opus_login.csh login script or from the OPUS architecture itself
Hopefully the use of these different colors will make the source of each parameter more clear.
The CDBS (Calibration DataBase System) Reference File Retrieval Pipeline was constructed using the OPUS pipeline architecture. There are a number of online documents and a FAQ (start from the FAQ) describing the OPUS architecture which contain valuable information for the operator of this pipeline. That information will not be duplicated here.
The CDBS Pipeline consists of a set of stages (processes) that will accept reference file deliveries from CDBS, submit the files to the HST archive, copy these deliveries to remote sites, install these deliveries on disk, and return a feedback signal to CDBS indicating the completion of the delivery. A reference file delivery consists of a catalog file listing the rootnames of the calibration files and some other coded information, and a set of reference file images and/or tables in FITS format. The feedback signal is a database update of selected fields in the CDBS database. This feedback signal enables a reference file for use in the pipeline.
One of the pipeline processes will need to get files from a CDBS workstation. File transfers via FTP will occur between the pipeline system and the science cluster where reference file deliveries are made, so network connectivity disruptions will likely cause problems in the CDBS pipeline processing.
The feedback signal for successful processing of a CDBS installation is communicated via an update of database fields in the CDBS database. Disruptions in database availability will affect this function, and in turn, affect the capability for enabling a reference file for pipeline use.
The process CDBPOL polls a remote CDBS directory for new catalog releases, and creates an OSF on an OPUS blackboard for each new release. This process feeds data to the CDBS pipeline, and therefore does not show up in the pipeline stages listed in the OMG. Consider it the "jump start" process for the pipeline.
The “visible” CDBS Pipeline consists of the following stages (visible as columns on the OMG, the Observation Manager graphical display):
Copy a CDBS delivery (catalog, reference files) from a remote system to a local disk
Prepare a CDBS delivery for archiving by creating OSFs for each dataset
Generate archive requests for each dataset in a CDBS delivery
Await the archive response for each archived reference file
Create a tar package of a CDBS delivery
Send the tar package to remote site #1
Send the tar package to remote site #2
Pause to allow the remote sites to process the delivery
Install new reference files/tables in the reference file directories, converting FITS to GEIS if needed
Notify the CDBS system via a database update that pipeline installation is complete
Delete obsolete reference files from disk
Delete the catalog and logs after a successful installation
The CDBTRG process does not show up in the pipeline stages either. Its function is to retrigger deliveries in the CDBDEL stage to make sure catalogs are "old enough" before deletions proceed.
There are three additional processes, cdbnewexp, cdbnewref, and cdbswitch, that run periodically as time pollers. These update the archive catalog BESTREF and BESTSWITCH fields. There are no OSFs associated with these processes. Some will send e-mail to a configurable distribution list indicating the work performed during each run.
Trigger: Delta-time poller (awakens at periodic time interval)
Summary:
Awakens periodically (DELTA_TIME) to check the CDBS release directory (REMOTE_FILESPEC) on a disk local to a CDBS machine (REMOTE_MACHINE) for CDBS deliveries (REMOTE_FILEMASK). For each delivery found, an attempt is made to create an OSF on an OPUS blackboard (OSF_PROCESSING, OSF_DATA_ID, OSF_DCF_NUM), providing an OSF does not already exist for that delivery.
This process identifies CDBS deliveries awaiting pick up on a remote CDBS system by registering them through OSFs on an OPUS blackboard. Once OSFs are created, the deliveries will proceed down the CDBS pipeline. Failures in this process will stop new CDBS deliveries from entering the pipeline.
The current trigger for noticing a new delivery is the existence of the CDBS-provided catalog file (REMOTE_FILEMASK). CDBS follows a standard delivery procedure where they first copy the reference files and tables to the retrieval interface directory (REMOTE_FILESPEC), followed by the catalog. Since the catalog is the last file delivered, the delivery should be complete once it is available.
Before creating a new OSF for a delivery, this process checks for an existing OSF with the same OSF_DATASET name (the name of the CDBS release catalog is used for this value). If an OSF with this name already exists, then no action is taken. This is necessary since this process does not attempt to rename or delete deliveries that it finds available, so it would pick up the same delivery repeatedly until the CDBCPY stage retrieves it and deletes it from the CDBS directory (REMOTE_FILESPEC).
If the very first remote login by this process fails, it will immediately exit (i.e. go ABSENT). After the first successful login, it is assumed that the set-up parameters are correct, and subsequent failed logins will be reported and counted, but the process will not exit until a maximum number of consecutive login errors (MAX_LOGIN_ERRORS) is reached. If errors have accumulated but a successful connection then occurs, the error count is reset to zero. This allows the program to recover from intermittent network connectivity failures while still detecting systematic problems that arise after the task was started.
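The behavior described above amounts to a consecutive-failure counter that is reset on any success. The following is a minimal Python sketch of that logic; poll_remote_directory, osf_exists, and create_osf are hypothetical stand-ins for the actual OPUS and FTP tools, and the parameter values come from the process resource file.

import sys
import time

def cdbpol_loop(poll_remote_directory, osf_exists, create_osf,
                max_login_errors, delta_time_seconds):
    """Sketch of the CDBPOL polling loop: duplicate-OSF check plus
    consecutive-login-error counting.  The three callables are
    hypothetical stand-ins for the real OPUS/FTP tools."""
    consecutive_errors = 0
    first_attempt = True
    while True:
        try:
            # list REMOTE_FILESPEC on REMOTE_MACHINE for files matching REMOTE_FILEMASK
            for catalog in poll_remote_directory():
                if not osf_exists(catalog):      # skip deliveries that already have an OSF
                    create_osf(catalog)          # new OSF, status OSF_PROCESSING
            consecutive_errors = 0               # a successful login resets the error count
            first_attempt = False
        except ConnectionError:                  # login/connection failure (illustrative)
            if first_attempt:
                sys.exit(1)                      # very first login failed: exit (go ABSENT)
            consecutive_errors += 1
            if consecutive_errors > max_login_errors:
                sys.exit(1)                      # MAX_LOGIN_ERRORS exceeded: exit
        time.sleep(delta_time_seconds)           # wait for the next DELTA_TIME interval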
The PMG will prompt the user for a password for the remote CDBS system at the time this process is started.
OPUS Resource File Entries
PASSWORD_PROMPT
reminds the user at task start-up which password they are expected to provide
DELTA_TIME
time interval (DDD:HH:MM:SS) at which this process awakens to search for new deliveries
MAX_LOGIN_ERRORS
Maximum number of consecutive login errors (after the first) allowed before the process exits
REMOTE_FILESPEC
remote directory on CDBS system that is searched for new deliveries
REMOTE_FILEMASK
filemask used to search the directory REMOTE_FILESPEC
REMOTE_MACHINE
name of CDBS system that will be connected to for the remote file search
REMOTE_LOGIN
login id used to connect to the remote system
REMOTE_TYPE
Type of remote system (UNIX or VMS)
USE_RSH_RCP
Flag indicating if rsh/rcp should be used in place of FTP
OSF_PROCESSING
status code used to set the OSFs that are created for new CDBS deliveries
OSF_DATA_ID
value of the OSF data_id field when new OSFs are created
OSF_DCF_NUM
value of the OSF dcf number field when new OSFs are created
Success Conditions
No deliveries found on the CDBS system OR
An OSF is created on an OPUS blackboard for each new CDBS delivery, using the catalog rootname as the OSF dataset name.
Error Conditions
Failure to connect to remote CDBS system with the first connection attempt.
Failure to connect to the remote CDBS system more than MAX_LOGIN_ERRORS consecutive times.
Failure to create OSFs on an OPUS blackboard.
Trigger: OSF stage CP = w
Summary:
Attempts to copy a release catalog and a set of reference files from a remote CDBS system (REMOTE_MACHINE, FIREWALL, REMOTE_FILESPEC, REMOTE_LOGIN) to a local directory (LOCAL_FILESPEC) for the CDBS delivery indicated by the triggering OSF (OSF_DATASET). The catalog is copied back first, and then is read to obtain the rootnames of the reference files to copy back.
A CDBS release catalog will be named with a filename that is unique to a particular delivery. It will follow the form, opus_NNNNN_I.cat, which identifies release number NNNNN containing, at the very least, files for instrument I (catalogs can contain entries for multiple instruments, so the instrument character "I" provides limited information). The file contains entries in three columns, providing the reference file rootname, the type of operation needed (insertion or deletion: I or D), and an indicator of whether the entry describes a reference file image or a table (R or T). Here is a sample catalog:
H3L1118TY I R
H3L1119EY I R
H3L11196Y I R
This process first attempts to copy back the release catalog identified by the OSF_DATASET from the triggering OSF. If successful, the catalog is opened and read, and for each rootname listed, a file copy from the remote CDBS system is made. Multiple files on the remote system can match each rootname (wildcard characters "*.*" are appended to the rootname before the copy), but at least one file must be found for each rootname entry, otherwise the delivery is considered incomplete and the OSF.CP = 'e'.
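For reference, here is a minimal Python sketch of how a release catalog of this form can be parsed and how the wildcard copy check can be expressed, assuming the three whitespace-separated columns described above; copy_from_remote is a hypothetical stand-in for the actual FTP copy.

def parse_catalog(catalog_path):
    """Return a list of (rootname, operation, kind) tuples from a CDBS
    release catalog, where operation is I/D and kind is R/T."""
    entries = []
    with open(catalog_path) as cat:
        for line in cat:
            fields = line.split()
            if len(fields) == 3:
                entries.append((fields[0], fields[1], fields[2]))
    return entries

def copy_inserts(entries, copy_from_remote):
    """Copy every INSERT entry back, wildcarding the rootname ("*.*" is
    appended so that all files for that rootname are matched)."""
    for rootname, operation, kind in entries:
        if operation == 'I':
            copied = copy_from_remote(rootname + '*.*')   # hypothetical FTP helper
            if not copied:                                # no files matched: incomplete delivery
                raise RuntimeError('incomplete delivery: ' + rootname)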
To log the operations on each release catalog throughout the CDBS pipeline, a trailer file is created with the filename opus_NNNNN_I.trl. It is created in the same local directory that the files are copied to (LOCAL_FILESPEC). It is appended to by each processing stage and deleted at the end of all processing by CDBCLN. This trailer is viewable through the OMG/Utilities/View Trailer menu selection.
The user must provide a login password for the remote CDBS system at the time this process is started by the PMG.
Examination of the process resource file will show that this process is both OSF and time triggered. The time trigger is built with a very long (999 day) delta time interval that effectively guarantees that the process will trigger once when it is initially started, and from then on is only OSF triggered. This trick is used as a way to verify that the password provided for the remote CDBS system is exercised right away, to avoid coming back with a "bad password" error hours or days later when the first real OSF event occurs.
OPUS Resource File Entries
PASSWORD_PROMPT
reminds the user at task start-up which password they are expected to provide
REMOTE_FILESPEC
remote directory on a CDBS system that holds the catalog and files
LOCAL_FILESPEC
local directory that the catalog and files are copied to
REMOTE_MACHINE
name of CDBS system that will be connected to for the remote file copy
REMOTE_LOGIN
login id used to connect to the remote system
FILE_TYPE
Ascii or binary file transfer
PUT_OR_GET
Put or get files to/from the remote system
USE_RCP_RSH
Flag indicating whether rsh/rcp should be used in place of FTP
DISK_SLOP_PERC
extra percentage of free disk space that must exist on LOCAL_FILESPEC before files are copied back
DELETE_AFTER_COPY
flag indicating that a successfully copied delivery (catalog, files) is to be deleted from the remote system
ROLLBACK_PARTIAL_COPY
flag indicating that a partial copy is considered a failure, and to remove any files copied that are not part of a complete delivery
Success Conditions
A CDBS release catalog and a set of reference files and tables will be copied from the remote CDBS system to a local directory.
A trailer file (ASCII log file) is created for each catalog retrieved from the remote system.
The original delivery on the remote CDBS system will be deleted.
OSF.CP = c
Error Conditions
OSF.CP = e
Failure to connect to the remote CDBS system
Failure to copy any part of the delivery from the remote CDBS system
Failure to read the CDBS release catalog
Trigger: OSF stage AR = w
Summary
This process will prepare a CDBS delivery for archiving by creating OSFs for each catalog INSERT (new ref file) and creating a link to the archive disk area.
This task moves to the directory where the CDBS delivery files exist (INPATH) and extracts all "insert" entries (contain " I ") from the release catalog (OSF_DATASET). A hard link is created for each reference file in the delivery to the directory from which the files will be archived (ARCHIVE_DIR). Using a hard link allows pipeline installation and archiving to proceed independently without requiring the extra disk space used by a file copy. If archiving finishes first, it can delete its link, and if pipeline processing finishes first, it can delete its link. The last link deleted results in freeing the disk space used. Neither affects the other. An Observation Status File (OSF) is created for each insertion entry in the catalog. The OSF is set-up to trigger the pipeline stage that generates an archive request (OSF_STATUS). The DATA_ID field of the OSF is set to the intended archive class appropriate for the reference file. This is determined by reading the XTENSION header keyword value from the 1st image extension in the reference file’s FITS header. If the value exists and contains the string “TABLE” within it, the DATA_ID is set to CTB (i.e. a match occurs for keyword values TABLE and BINTABLE). If the keyword value does not contain “TABLE”, or cannot be located at all, the DATA_ID is set to CDB.
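The DATA_ID decision described above reduces to a single keyword check on the first extension. Below is a sketch of that check in Python, assuming the astropy FITS library is available; the actual task may read the header differently, so treat this as illustrative only.

from astropy.io import fits

def archive_class_for(ref_file):
    """Return 'CTB' for reference tables and 'CDB' for reference images,
    based on the XTENSION keyword of the first extension."""
    try:
        xtension = fits.getval(ref_file, 'XTENSION', ext=1)
    except (KeyError, IndexError, OSError):
        return 'CDB'                      # keyword or extension missing: treat as image
    if 'TABLE' in str(xtension).upper():  # matches both TABLE and BINTABLE
        return 'CTB'
    return 'CDB'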
NOTE: The OSFs that are created by this stage are for the individual reference file datasets (e.g. u2440101t). The OSF that triggered this process to begin is for the entire CDBS delivery (e.g. opus_10998_n). So when viewing OSFs using the OMG, there will be two different “flavors” of OSFs on the same blackboard. The dataset OSFs will be of the “CDB” or “CTB” data_id and only use the RQ, RS, and CL stages. The delivery OSFs will be of the “REF” class and use all of the other stages, as well as CL. This distinction is quite obvious since the names of the reference files are in no way similar to the names of the deliveries.
OPUS Resource File Entries
INPATH
directory where CDBS release catalog, trailer, and FITS files are found
ARCHIVE_DIR
Directory where reference files will be archived from
OSF_STATUS
The string of characters used to initialize the OSF created for each reference file in a delivery.
Success Conditions
OSFs will be created for each reference file being INSERTED into the pipeline, and will be set up to trigger archiving, OSF.RQ = w
For the delivery OSF, OSF.AR = c
For the delivery OSF, OSF.TA = w
Error Conditions
Failure to move to the directory holding the CDBS delivery (INPATH)
Failure to read the release catalog listing the reference file datasets.
Failure to create hard links from the INPATH to the ARCHIVE_DIR for each reference file in the delivery.
Failure to create OSFs for each catalog insertion
OSF.AR = e
Trigger: OSF stage RQ = w
Summary
This process will generate an archive request for archiving a CDBS reference file dataset.
This task runs the generic OPUS archive request task “genreq”. A request for archiving is generated for the triggering OSF and an entry is created in the DSQUERY::OPUS_DB.dads_archive database relation documenting the archive request.
OPUS Resource File Entries
MISSION – name of the mission being archived
LOG_DIR – the directory where archive log files will be created for this archive class
DSQUERY – database server for the OPUS database
OPUS_DB – name of the OPUS database
INGEST_PATH_NAME – name of the Ingest path
INGEST_STAGE_ID1 – stage name of the first stage in the Ingest pipeline
INGEST_STAGE_ID2 – stage name of the second stage in the Ingest pipeline
INGEST_STAGE_STATUS1 – OSF status value for the first stage in the Ingest pipeline
INGEST_STAGE_STATUS2 – OSF status value for the second stage in the Ingest pipeline
INGEST_TABLE_FILTER – required by Ingest for ASN processing, but not used for CDBS Ingest
INGEST_PATH_ROOT – root directory of the Ingest path
DATA_DELETE_LOCATION.04 – points to the archive log file directory so that the CDBCLN stage can delete the logs
DATA_DELETE_FILTER.04 – filemask for picking up archive log files so that CDBCLN can delete them
CDB.DATASET_DIR – the directory where the CDBS reference files for archiving can be found
CDB.TRACK_EXT – (Y/N) when Y, every extension is saved in the archive_files relation
CDB.DATASET_FILTER – the filter used to locate dataset names for archiving
CDB.PROCESS_TYPE – the type of archive request (GENERIC or types requiring special handling like STIS_ASN, HST_CAL, etc.)
(This entire set of "CDB." resource entries is repeated for the CTB archive class.)
Success Conditions
An archive request is submitted for this dataset to AREQ_DIR.
An entry is made in the DSQUERY::OPUS_DB.dads_archive database table.
OSF.RQ = c
OSF.RS = w
Error Conditions
Failure to generate the archive request.
Failure to create the database entry.
OSF.RQ = e
Duplicate archive request
OSF.RQ = d
Trigger: OSF – ‘w’ or ‘b’ in RS stage
Summary:
This stage will process the response from the Ingest pipeline, which is either successful ingest or failed ingest.
This task is triggered by the remote Ingest pipeline when a CDBS file has been ingested. Two different triggerings are possible, one for successful ingest, one for failed ingest. The triggering OSF is marked complete or error, depending on the triggering status. For CDBS deliveries, the corresponding DSQUERY::OPUS_DB.dads_archive record is updated when ingest was successful.
OPUS Resource File Entries
UPDATE_OPUS.{cdb|ctb} – indicates that dads_archive is to be updated upon successful ingest (archive classes = cdb, ctb)
VIRTUAL_FILES.{cdb|ctb} – indicates if copying virtual files to cache is needed (NOT used for CDBS)
DATASET_DIR.{cdb|ctb} – location of virtual files
DATASET_FILTER.{cdb|ctb} – filemask for virtual files
MISSION – name of the mission to archive under (HST for CDBS)
LOG_DIR – directory where archive log files are written
MSG_RPT_LVL – message report level in use. Set equal to JMSG_REPORT_LEVEL to use the path entry
DB_TYPE – type of database (currently = SYBASE)
OPUS_DB_SERVER, OPUS_DB – specify the database to be used for the dads_archive table update
ARCH_DB_SERVER, ARCH_DB – specify the archive database to use
Success Conditions
Success trigger from Ingest pipeline.
Update to DSQUERY::OPUS_DB.dads_archive database record indicating successful archiving.
OSF.RS = c
Error Conditions
Failure trigger from Ingest pipeline.
OSF.RS = f
Failure to update the DSQUERY::OPUS_DB.dads_archive record.
OSF.RS = e
Trigger: OSF stage TA = w
Summary
This process will create a gzipped tar package of the CDBS delivery including all inserted reference files and the catalog.
This task moves to the directory where the CDBS delivery files exist (CDB_IN_DIR) and extracts all "insert" entries (contain " I ") from the release catalog (OSF_DATASET, CDB_CATALOG_FILE_EXT). This list of entries is used to create a tar file containing the release catalog and all of the FITS files to be inserted for this delivery. The resulting tar file is then compressed using gzip. Hard links are then created from the gzipped tar file to the two outgoing directories for file transfers to the remote sites (OUTGOING1_DIR, OUTGOING2_DIR). Using hard links allows each site to independently obtain the delivery without one affecting the other, and using minimal disk space.
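A minimal Python sketch of the package-and-link step follows, using only the standard library; the function name and arguments mirror the resource entries above but are assumptions, not the actual task's interface.

import os
import tarfile

def build_delivery_package(catalog_root, insert_files, in_dir,
                           outgoing1_dir, outgoing2_dir):
    """Create <catalog_root>.tar.gz in in_dir containing the release catalog
    and all INSERT FITS files, then hard-link it into the two outgoing
    directories for the remote sites."""
    tar_name = os.path.join(in_dir, catalog_root + '.tar.gz')
    with tarfile.open(tar_name, 'w:gz') as tar:       # gzip-compressed tar in one step
        tar.add(os.path.join(in_dir, catalog_root + '.cat'),
                arcname=catalog_root + '.cat')
        for fits_file in insert_files:
            tar.add(os.path.join(in_dir, fits_file), arcname=fits_file)
    # hard links let each remote site be served independently without
    # duplicating the disk space used by the package
    for outdir in (outgoing1_dir, outgoing2_dir):
        os.link(tar_name, os.path.join(outdir, os.path.basename(tar_name)))
    return tar_name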
Upon completion this task triggers the next two pipeline stages to run in parallel. These stages copy the gzipped tar file to remote sites #1 and #2.
OPUS Resource File Entries
CDB_IN_DIR
directory where CDBS release catalog and FITS files are found
CATALOG_FILE_EXT
file extension appended to catalog rootname to form complete release catalog filename
OUTGOING1_DIR
Directory where gzipped tar file will await push to remote site #1
OUTGOING2_DIR
Directory where gzipped tar file will await push to remote site #2
DATA_DELETE_LOCATION
Directory where the cleandata process will look for files to delete
DATA_DELETE_FILTER
Filespec used by cleandata to look for files to delete
Success Conditions
A gzipped tar file package containing the release catalog and FITS files for each catalog insert is created in CDB_IN_DIR
Hard links to the gzipped tar file are created in OUTGOING1_DIR and OUTGOING2_DIR for remote site distribution.
OSF.P1 = w
OSF.P2 = w
Error Conditions
Failure to move to the directory holding the CDBS delivery (CDB_IN_DIR)
Failure to create or gzip the tar file of the CDBS delivery
Failure to create hard links to the gzipped tar file
OSF.TA = e
Trigger: OSF stage P1 = w
Summary
This process will FTP the gzipped tar package to a remote machine.
This task moves to the directory where the CDBS delivery files exist (CDB_IN_DIR) and verifies that the gzipped compressed tar file (OSF_DATASET, CDB_CATALOG_FILE_EXT) created during CDBTAR exists there. If it does, and the DO_COPY resource parameter is set to "YES", it is transferred to a remote machine (RMT_MACHINE, RMT_LOGIN, RMT_DIRECTORY) using the netcopy FTP tool. The file transfer is performed as a put/rename so that the remote machine can poll for the file's existence without picking up an incomplete version of the file. The user will be required to provide a login password for the remote system (PASSWORD_PROMPT) when this process is started by the PMG.
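The put/rename pattern mentioned above can be sketched with Python's standard ftplib module; the actual task uses the OPUS netcopy tool, so the names and arguments below are illustrative only.

from ftplib import FTP

def put_then_rename(host, login, password, local_file, remote_dir, remote_name):
    """Upload under a temporary name, then rename, so that a poller on the
    remote side never sees a partially transferred file."""
    temp_name = remote_name + '.part'
    with FTP(host, login, password) as ftp:
        ftp.cwd(remote_dir)
        with open(local_file, 'rb') as data:
            ftp.storbinary('STOR ' + temp_name, data)   # binary transfer (FILE_TYPE)
        ftp.rename(temp_name, remote_name)              # "publish" the completed file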
Examination of the process resource file will show that this process is both OSF and time triggered. The time trigger is built with a very long (999 day) delta time interval that effectively guarantees that the process will trigger once when it is initially started, and from then on is only OSF triggered. This trick is used as a way to verify that the password provided for the remote system is exercised right away, to avoid coming back with a "bad password" error hours or days later when the first real OSF event occurs.
NOTE: The current pipeline design has this process, the copy to remote site #1, triggering the “pause” stage, in which time is given for the remote sites to install the CDBS delivery. ST-ECF is currently set-up as remote site #1, and their copies most often take more time to complete than copies to remote site #2, CADC. That is why the pause is triggered following this stage, and not P2. We want to allow the pause following the completion of the longest copy, under normal conditions.
OPUS Resource File Entries
DO_COPY
must be set = "YES" in order for remote copy to occur, otherwise the copy
is skipped
(this allows an easy method for deactivating remote copy with changing
the pipeline structure)
CDB_IN_DIR
directory where CDBS release catalog and FITS files are found
CDB_CATALOG_FILE_EXT
file extension appended to catalog rootname to form complete release catalog filename
RMT_MACHINE
name of the remote system
RMT_MACHINE_TYPE
UNIX or VMS
RMT_LOGIN
login name to use to connect to the remote system
RMT_DIRECTORY
destination directory on the remote system for the tar package
DISK_SLOP_PERC
Percentage of extra free disk space required for a copy to LOCAL disk to occur
FILE_TYPE
File transfer type of ASCII or BINARY
PUT_OR_GET
Put or get of files to/from remote site
PASSWORD_PROMPT
text that will be displayed when the PMG prompts for a login password for the remote system at process start-up
Success Conditions
The compressed tar package is transferred to a remote system.
OSF.P1 = c
OSF.PS = w
Error Conditions
Failure to login to the remote system
Failure to move to the directory holding the CDBS delivery (CDB_IN_DIR)
Failure to transfer the tar package to the remote system
OSF.P1 = e
OSF.PS = w
NOTE: Failure of this pipeline stage will trigger the PS stage in the pipeline automatically. It was determined at design time that a failure to transfer a delivery to remote site #1 should not hold up installation at STScI. Such a failure also shrinks the "grace period" for remote site #2 to obtain and install the delivery: normally site #2 has the full transfer time to site #1 plus the pause interval to get its own copy of the delivery transferred and installed, but in this case the failure eliminates or shortens the transfer time to site #1.
Trigger: OSF stage P2 = w
Summary
This process will transfer the compressed tar package of the CDBS delivery to a remote machine.
This task is identical to CDBPS1 except that the parameters for login to the remote system are changed to reflect remote site #2. The exact same code is used for CDBPS1 and CDBPS2.
This task moves to the directory where the CDBS delivery files exist (CDB_IN_DIR) and verifies that the gzipped compressed tar file (OSF_DATASET, CDB_CATALOG_FILE_EXT) created during CDBTAR exists there. If it does, and the DO_COPY resource parameter is set to "YES", it is transferred to a remote machine (RMT_MACHINE, RMT_LOGIN, RMT_DIRECTORY) using the netcopy FTP tool. The file transfer is performed as a put/rename so that the remote machine can poll for the file's existence without picking up an incomplete version of the file. The user will be required to provide a login password for the remote system (PASSWORD_PROMPT) when this process is started by the PMG.
Examination of the process resource file will show that this process is both OSF and time triggered. The time trigger is built with a very long (999 day) delta time interval that effectively guarantees that the process will trigger once when it is initially started, and from then on is only OSF triggered. This trick is used as a way to verify that the password provided for the remote system is exercised right away, to avoid coming back with a "bad password" error hours or days later when the first real OSF event occurs.
Note: See the CDBPS1 description for reasons why this task, CDBPS2, does not trigger any other task when it completes. In short, this task is a dead end and does not control further pipeline processing.
OPUS Resource File Entries
DO_COPY
must be set = "YES" in order for remote copy to occur, otherwise the copy
is skipped
(this allows an easy method for deactivating remote copy with changing
the pipeline structure)
CDB_IN_DIR
directory where CDBS release catalog and FITS files are found
CDB_CATALOG_FILE_EXT
file extension appended to catalog rootname to form complete release catalog filename
RMT_MACHINE
name of the remote system
RMT_MACHINE_TYPE
UNIX or VMS
RMT_LOGIN
login name to use to connect to the remote system
RMT_DIRECTORY
destination directory on the remote system for the tar package
DISK_SLOP_PERC
Percentage of extra free disk space required for a copy to LOCAL disk to occur
FILE_TYPE
File transfer type of ASCII or BINARY
PUT_OR_GET
Put or get of files to/from remote site
PASSWORD_PROMPT
text that will be displayed when the PMG prompts for a login password for the remote system at process start-up
Success Conditions
The gzipped tar file package containing the release catalog and FITS files for each catalog insert is transferred to a remote system.
The compressed tar package is deleted following successful transfer to the remote system.
OSF.P2 = c
Error Conditions
Failure to login to the remote system
Failure to move to the directory holding the CDBS delivery (CDB_IN_DIR)
Failure to transfer the tar package to the remote system
OSF.P2 = e
Trigger: OSF stage PS = w, DK = _
Summary
This process will pause for a configured time interval to allow the remote sites to receive and install this CDBS delivery.
This task will enter a sleep state (WAIT_SECONDS) to allow both remote sites (from CDBPS1 and CDBPS2) to receive and install the delivery in their reference file areas. The remote sites preferred this "uncoupled" approach, as opposed to sending a confirmation file back to STScI that they successfully installed the delivery.
The triggering includes DK = _ so that if the P1 stage needs to be retriggered, PS will not retrigger the DK, FB, and later stages. So, if P1 is retriggered, don't be surprised to find that the PS stage contains 'w' until the OSF is cleaned off.
OPUS Resource File Entries
WAIT_SECONDS
time in seconds this task will pause to allow the remote sites to install the delivery
Success Conditions
OSF.PS = c
OSF.DK = w
Error Conditions
None expected
Trigger: OSF stage DK = w
Summary
Attempts to install a CDBS delivery on disk. For each INSERT entry in the release catalog, a verification is made that the reference files and/or tables exist in the incoming directory (CDB_RETRIEVAL_IN_DIR) or in the pipeline reference directories. After this check, each reference file or table in the incoming directory is installed in the appropriate pipeline reference directory. If a file conversion is needed for this instrument (CDB_PIPELINE_CAL_*), the STSDAS task strfits is invoked from a shell script (CDB_IRAF_PARM_DIR).
Each catalog entry contains an indication of the instrument affected by that entry (the last character in the rootname before the last underscore provides the indication). The third column of the catalog entry contains an indication as to whether this entry is a reference file image or table. These two tidbits are combined to form the name of the pipeline reference file directory. For example, the catalog entries
H3R1620FU I R
F3O1418MM I T
indicate a WFPC-2 reference file image (U), which will be installed in the uref directory, and a multi-instrument table (M), which will be installed in the mtab directory. Here is the complete list of pipeline reference file directories:
nref - NICMOS reference files
jref - ACS reference files
oref - STIS reference files
uref - WFPC-2 reference files
xref - FOC reference files
yref - FOS reference files
zref - HRS reference files
mtab - multiple-instrument reference tables
ttab - throughput tables
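A Python sketch of the directory selection implied above follows. Only the U -> uref and M -> mtab mappings are confirmed by the example; the remaining single-character entries are assumptions based on the directory list just given and on the opus_login.csh variables later in this document, and should be checked against the installed configuration.

# Inferred mapping from the instrument character (last character of the
# rootname before the last underscore) to the pipeline reference directory.
# Only 'u' -> uref and 'm' -> mtab are confirmed by the example above; the
# rest are assumptions.
REF_DIR_FOR_CHAR = {
    'n': 'nref',   # NICMOS
    'j': 'jref',   # ACS
    'o': 'oref',   # STIS
    'u': 'uref',   # WFPC-2
    'x': 'xref',   # FOC
    'y': 'yref',   # FOS
    'z': 'zref',   # HRS
    'm': 'mtab',   # multi-instrument tables
    't': 'ttab',   # throughput tables
}

def reference_directory(rootname):
    """Return the pipeline reference directory name for a catalog rootname."""
    instrument_char = rootname.strip().rsplit('_', 1)[0][-1].lower()
    return REF_DIR_FOR_CHAR[instrument_char]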
Disk space checks are made for each pipeline reference directory before installing any new reference files or tables. If insufficient disk space exists, then OSF.DK = 'e'.
RESTRICTION: This process MUST be run on the same architecture as the calibration process for instruments that use GEIS-format reference files. GEIS is an architecture-specific format, so when GEIS files are created by this process they must be created on the same architecture that will eventually be used to read them. Instruments that use FITS reference files in calibration are NOT affected by this restriction.
MORE RESTRICTIONS: The IRAF strfits command is run by this task in a shell script to perform FITS-to-GEIS conversion if needed. The conversion is run from the directory specified by the CDB_TEMP_DIR resource value, and strfits apparently creates some temporary work files in that location. IRAF cannot handle a stretched or concealed logical in the definition of CDB_TEMP_DIR. You must provide a full directory specification for this item in either the resource file or the path file.
Note: When strfits is run, it will not overwrite existing output files. Under normal pipeline operation, the same files will not be installed multiple times, but it is conceivable that someone could attempt to run this process for files that are already installed. What happens in this case is that strfits gives the new files "mangled" names (it usually appends something like "_01" to the name). These files will not be used by pipeline processing. The old files remain the active ones. If you want the new files to be the active ones, either rename them after this process runs, or delete the original files before running this process.
OPUS Resource File Entries
CDB_CATALOG_IN_DIR
directory where the release catalog is found
CATALOG_FILE_EXT
File extension for catalogs
CDB_RETRIEVAL_IN_DIR
directory where incoming reference files and tables are found
CDB_IRAF_PARM_DIR
directory where an IRAF default parameter file is found when strfits is run under VMS
CDB_DISK_SPACE_%SLOP
percentage of extra free disk space required in each pipeline directory before installation proceeds
CDB_PIPELINE_CAL_*
file format expected by pipeline processes for this reference file type
CDB_TEMP_DIR
location used to perform strfits conversion of FITS files to GEIS (see MORE RESTRICTIONS above)
Success Conditions
New reference files and tables are copied (if calibration wants FITS format) or converted (if calibration wants GEIS format) and installed in the pipeline reference file directories.
OSF.DK = c
OSF.FB = w
Error Conditions
Failed to find all catalog entries in either the incoming or pipeline reference file directories (i.e. missing files).
Failed to move a FITS file to a pipeline reference file directory.
Failed to convert a FITS reference file to GEIS for pipeline use.
OSF.DK = e
Trigger: OSF - 'w' in FB stage
Summary:
This stage will update records in the CDBS database, letting CDBS know that the delivery was installed in both the archive and the pipeline.
The purpose of this task is to verify that reference files in a delivery have completed installation both on disk and in the archive. Once verified, the reference files are marked ready for use in the CDBS database (by updating the opus_load_date and archive_date fields as shown below). The pre-archive and OTFR pipeline REF* processes will only write reference file names that have been marked ready for use into header keywords for calibration.
This task parses the CDBS delivery number from the triggering OSF (e.g. triggering dataset opus_10992_n gives delivery number 10992) and opens the release catalog for the delivery (INPATH) to obtain the reference file rootnames. The task parses each rootname to determine the instrument, and constructs a filespec for that instrument’s reference file disk area. A search is performed of the disk area to verify that each rootname in the release catalog has at least one reference file on disk. If the disk verification fails, the task exits with an error status (OSF_ERROR). If it succeeds, the task updates the CDBS database (CDBS_SERVER, CDBS_DB) for that delivery to indicate that the disk installation has completed in OPUS operations. The table updated depends on the instrument affected by the delivery, but will look generally like this:
update SI_file set opus_load_date = TODAY where delivery_number = NNNNN
where SI is an instrument prefix (like "acs" or "stis"), TODAY is today's date, and NNNNN is the delivery number. If the database update fails for some reason, the task will sleep for a configured interval (DB_RETRY_INTERVAL) and then try the update again, assuming that there was a temporary database connection issue that might later be resolved. This retry will occur a fixed number of times before the task gives up and exits with an error status (OSF_ERROR).
The task then performs a similar verification and database update for the archiving portion of the CDBS delivery. Each reference file rootname is used to query the OPUS_SERVER::OPUS_DB:dads_archive table to verify that a successful archive ingest response was received for the ingest request. If the ingest verification fails for any of the reference file rootnames, the task exits with an error status (OSF_ERROR). If it succeeds, the task updates the CDBS database (CDBS_SERVER, CDBS_DB) for the delivery to indicate that the archive installation has completed. The table updated again depends on the instrument affected by the delivery, but will generally look like this:
update SI_file set archive_date = TODAY where delivery_number = NNNNN
The parameters are defined in the same way as in the opus_load_date update query. Again, if the database update fails for some reason, the task will sleep and try the update again later, assuming there was a temporary database problem that might later be resolved.
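The retry behavior around each CDBS database update can be sketched as below, assuming a DB-API style connection; the SQL string follows the form shown above, and the retry count is illustrative rather than the task's actual value.

import time

def update_with_retry(connection, sql, retry_interval, max_retries=5):
    """Apply a CDBS feedback update, sleeping DB_RETRY_INTERVAL seconds and
    retrying after a failure before finally giving up (OSF_ERROR)."""
    for attempt in range(max_retries):
        try:
            cursor = connection.cursor()
            cursor.execute(sql)     # e.g. "update acs_file set opus_load_date = <TODAY>
            connection.commit()     #       where delivery_number = 10992"
            return True
        except Exception:           # temporary database problem: wait and retry
            time.sleep(retry_interval)
    return False                    # caller sets OSF.FB = e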
OPUS Resource File Entries
INPATH – directory where reference file delivery catalogs and trailers can be found
DB_RETRY_INTERVAL – time (in seconds) before the task will attempt to retry a failed database query
CDBS_SERVER, CDBS_DB – database server and database name for the CDBS database
OPUS_SERVER, OPUS_DB – database server and database name for the OPUS database
Success Conditions
The CDBS_SERVER::CDBS_DB.{SI}_file.opus_load_date and CDBS_SERVER::CDBS_DB.{SI}_file.archive_date fields in the instrument-specific CDBS "*_file" table are updated for this delivery.
OSF.FB = c
OSF.DE = w
Error Conditions
Failure to find reference files for one or more catalog entries in the instrument's reference file disk area
Failure to parse and translate the reference file rootname into an instrument's reference file disk area
OSF.FB = e
Failure to update the CDBS database entry (will retry a number of times before giving up)
OSF.FB = e
Files not yet archived
OSF.FB = z (retry later, as retriggered by CDBTRG)
Trigger: OSF - 'w' in DE stage
Summary:
This stage will perform deletions of obsolete reference files and tables from the pipeline reference file directories, as indicated by DELETION entries in a CDBS release catalog, if the catalog is "old enough". Running this process frees up disk space by deleting reference files that have been superseded and should never be used again.
If a catalog contains DELETION (D) entries, this stage will eventually delete the obsolete reference files from disk. If a catalog does not contain any DELETION entries, then this stage exits right away (NO_DELETIONS_NEEDED) and moves on to the next stage.
There used to be a restriction on running this process, since it could delete reference files from disk that might be used by observations nearing calibration in the pipeline. The restriction is now lifted with the implementation of a delay in the deletion of any reference files. When a catalog is triggered for processing in this pipeline stage, the MIN_DELETION_AGE resource parameter is read and compared against the modification time of the release catalog. If the release catalog is not yet sufficiently old, the OSF for this catalog will be set to "sleep" (WAIT_SOME_MORE). The CDBTRG process will later retrigger this OSF to perform the catalog-age test again.
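The age test described above is a simple comparison of the catalog's modification time against MIN_DELETION_AGE. A minimal sketch follows; the function name is hypothetical, and the caller is assumed to set OSF.DE = z (sleep) when it returns False so that CDBTRG can retrigger the test later.

import os
import time

def catalog_old_enough(catalog_path, min_deletion_age_days):
    """Return True when the release catalog is old enough for its DELETION
    entries to be applied."""
    age_seconds = time.time() - os.path.getmtime(catalog_path)
    return age_seconds >= min_deletion_age_days * 86400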
If the catalog is old enough, the process proceeds with the deletion stage. The release catalog is read, and for each DELETION entry the process attempts to locate and delete reference files matching that rootname in the corresponding pipeline reference directory. If a catalog contains no DELETION entries, then the process takes no action.
OPUS Resource File Entries
CATALOG_FILE_EXT
File extension for release catalogs
CDB_DELETION_DIR
directory where release catalog is found
MIN_DELETION_AGE
age in days that a release catalog must be before any file deletions it prescribes can be performed
Success Conditions
Release catalog not old enough for deletions to be applied
OSF.DE = z (sleep)
Any DELETION entries in the release catalog are removed from the pipeline reference file directories OR
All attempted deletions were for files that were already missing on disk
OSF.DE = c
OSF.CL = w
No DELETION entries exist in the release catalog
OSF.DE = n
OSF.CL = w
Error Conditions
Failure to delete a reference file from a pipeline reference directory.
OSF.DE = e
Trigger: Delta-time trigger, with default value = 30 minutes
Summary:
This process awakens on a time-trigger (at DELTA_TIME interval) and checks for OSFs in need of retriggering. The old and new OSF status values are specified through resource file entries (SEARCH_OSF.stage, SET_OSF.stage).
This process is part of the CDBS pipeline, but does not have a dedicated pipeline stage in which it runs. It awakens periodically on a time-trigger and checks for OSFs that have certain values in certain pipeline stages, as specified in resource entries. For example,
ENV.SEARCH_OSF.DE = z
ENV.SET_OSF.DE = w
These pairs of resource file entries will cause the retrigger process to look first for any OSF with DE = z, and if any are found the 'z' is changed to 'w', and so on for any other pairs that may be added.
A shell script (retrigger.csh) actually performs the retriggering behavior using the pipeline stage file and generic OPUS tools, so this retriggering behavior could be applied in any pipeline that might need it. It is not in any way specific to the CDBS pipeline.
Currently the resource entries are set-up to look for retries in the FB stage (for deliveries that have not yet completed archiving) and the DE stage (for deliveries with deletion entries that are not yet old enough).
OPUS Resource File Entries
DELTA_TIME
interval at which this process awakens (DDD:HH:MM:SS, DDD=day, HH=hour, MM=minute, SS=second)
ENV.SEARCH_OSF.stage
ENV.SET_OSF.stage
Define the OSF search condition and replacement condition for retriggering an OSF in a particular stage.
Multiple pairs of conditions are allowed. Every SEARCH must have a SET.
Success Conditions
Locates OSFs matching the search condition and applies the OSF changes indicated by the set condition OR
One or more OSFs unable to be updated (could be due to parallel processing, so this is not considered an error) OR
No OSFs match the search condition.
Error Conditions
Failure to locate and read the pipeline stage file for the current path.
Failure to test the blackboard for search OSFs.
Trigger: OSF = 'w' in CL stage
Summary:
This stage is performed once an installation is complete and successful, and cleans up the leftover files that were used during processing, including the catalog, gzipped tar file, trailer file, and archive log.
This task uses the “cleandata” utility to clean up files that are no longer needed once a CDBS delivery has been installed. The cleandata task will read these entries from the following process resource files to determine what needs clean-up:
cdbcpy.resource
DATA_DELETE_LOCATION.01 = cdbs_input_dir
DATA_DELETE_FILTER.01 = <OSF_DATASET>.cat
DATA_DELETE_LOCATION.02 = cdbs_input_dir
DATA_DELETE_FILTER.02 = <OSF_DATASET>.trl
cdbtar.resource
DATA_DELETE_LOCATION.03 = cdbs_input_dir
DATA_DELETE_FILTER.03 = <OSF_DATASET>.tar.gz
cdbreq.resource
DATA_DELETE_LOCATION.04 = cdbs_archive_log_dir
DATA_DELETE_FILTER.04 = <OSF_DATASET>_*.log
Each of the indicated locations is checked for files matching the filter, using OSF_DATASET from the triggering OSF (either the name of the CDBS delivery, like opus_11011_n, or the name of the reference file, like m4329823n_drk.fits), and any matching files found are deleted. CDBS delivery OSFs will only find matches in the cdbs_input_dir, while reference file OSFs will only find matches in the cdbs_archive_log_dir.
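Conceptually, the clean-up amounts to iterating over the numbered DATA_DELETE_LOCATION/DATA_DELETE_FILTER pairs, substituting the triggering OSF_DATASET into each filter, and deleting the matches. A minimal Python sketch, with the resource values passed in as an ordinary list of pairs (the actual cleandata utility reads them from the resource files):

import glob
import os

def clean_delivery(osf_dataset, delete_rules):
    """delete_rules is a list of (location, filter) pairs gathered from the
    DATA_DELETE_LOCATION.nn / DATA_DELETE_FILTER.nn resource entries."""
    removed = []
    for location, file_filter in delete_rules:
        pattern = os.path.join(location,
                               file_filter.replace('<OSF_DATASET>', osf_dataset))
        for path in glob.glob(pattern):
            os.remove(path)
            removed.append(path)
    return removed   # an empty list is still a success ("no files found to delete")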
Note: The locations for the remote site deliveries are not included in the cleanup done by this task (OUTGOING1_DIR, OUTGOING2_DIR). Those will automatically get cleaned up by CDBPS1 and CDBPS2 once their copies to the remote sites are successful, so the clean-up done by this task has no effect on the capability to retry failed copies to the remote sites.
Success Conditions
Removes the catalog, trailer, archive log files, and gzipped tar file for a delivery that was successfully installed OR
No files found to delete.
OSF.CL = c
Error Conditions
Failure to delete existing files.
OSF.CL = e
Trigger: time poller awakens nightly at 06:00 GMT
Summary:
This process updates the empty “best” reference file fields in the archive catalog tables for any new exposures for supported BESTREF instruments (SUPPORTED_INSTR) that have been ingested into the archive.
This task awakens nightly on a time trigger and checks the archive catalog (ARCH_SERVER, ARCH_DB) for any newly archived exposures for supported BESTREF instruments (SUPPORTED_INSTR) with NULL “best” reference file fields in the database. A separate database table exists for each instrument
acs_ref_data
stis_ref_data
nicmos_ref_data
wfpc2_ref_data
and each contains fields indicating the “best” reference file values for that instrument. ACS for example contains at least these fields
acr_best_biasfile
acr_best_darkfile
acr_best_pfltfile
Etc.
A database query is run against each instrument “ref_data” table to locate any records with at least one NULL “best” reference file field. If no NULL entries are found, the task prints a message and exits. This is the normal case for a period where no new exposures for a particular instrument have been ingested. If any new exposures were found, the bestref algorithm is run against them, obtaining header keyword values from archive catalog tables, applying the file selection rules from the OPUS_DEFINITIONS_DIR:reference_file_defs.xml file, and writing out any “best” field changes needed to a file of SQL. If any SQL update statements were written, these are then applied to the archive catalog (ARCH_SERVER, ARCH_DB) and the file of SQL is date-time stamped, saved (in directory OUTPATH), and e-mailed to the configured list (OPERATOR_MAIL_LIST).
OPUS Resource File Entries
OUTPATH – location for output log files
SUPPORTED_INSTR – list of instruments supported by BESTREF
OPERATOR_MAIL_LIST – list of e-mail addresses that want to know when BESTREF has been run
ARCH_SERVER – the archive database server
ARCH_DB – the archive database (i.e. the catalog)
CDBS_SERVER – the CDBS database server
CDBS_DB – the CDBS database
KW_SERVER – the keyword database server
KW_DB – the keyword database
Trigger: time poller awakens nightly at 06:30 GMT
Summary:
This process updates the “best” reference file fields in the archive catalog tables for all exposures affected by the delivery of a new reference file.
NOTE: This task runs an algorithm that is basically the inverse of cdbnewexp. Given a new reference file, this task needs to determine which exposures already in the archive should be using it, while cdbnewexp, given a newly ingested exposure, needs to determine which existing reference files apply.
This task awakens nightly on a time trigger and updates the contents of the CDBS_SERVER::CDBS_DB.new_cal_files table, setting the timestmp field for all present records to the current date, signaling that these records will be processed in this run. Any records coming in to new_cal_files while this task is running will be skipped this time and processed at the next event trigger. The names of all of the new reference files and their expansion numbers (CDBS lingo for the number of modes covered by this particular file type) are retrieved from CDBS_SERVER::CDBS_DB.new_cal_files for those entries with non-NULL timestmp fields.
For each new reference file, the set of mode selection parameters and their values, including the useafter_date, are read from the CDBS_SERVER::CDBS_DB.{si}_row database table, where {si} is a science instrument prefix like “acs” or “stis”. These mode values will be matched against currently archived exposures to search for exposures that should be using this new reference file. The set of exposures to update is found by using the useafter_date of the new reference file as a start date and the useafter_date of any subsequent reference file covering the same mode as a stop date to form a bracketing date range. If there is no subsequent version, the current date is used as the stop date to complete the date range. With this date range and the mode information, an SQL database update statement is created for the ARCH_SERVER::ARCH_DB.{si}_ref_data table to update the appropriate “best” reference file field for all pertinent exposures to the new reference file name. The SQL statement is written to an output file, for later application to the database, and includes SQL code for “chunking” the number of rows (UPDATE_LIMIT) that will be affected by the update operation. This “chunking” is necessary to avoid causing problems with database replication by updating hundreds of thousands of rows at one time. All of the necessary rows do get updated in this “chunking” scheme, they just get updated in row-limited sets. Once all new reference files and expansion numbers have been processed, the file of SQL written (if any) is applied to the archive database (ARCH_SERVER, ARCH_DB), date-time stamped, saved (in directory OUTPATH), and e-mailed to the configured “wants-to-know” list (OPERATOR_MAIL_LIST).
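The row-limited "chunking" can be sketched as a loop that repeats the same UPDATE with the server's row limit in force until no more rows are affected. The Sybase "set rowcount" statement is used here since DB_TYPE = SYBASE, but the exact SQL emitted by the task may differ; the function and its arguments are illustrative.

def chunked_update(connection, update_sql, update_limit):
    """Apply update_sql repeatedly, at most update_limit rows per pass, so a
    single huge update does not disrupt database replication.  Assumes the
    WHERE clause in update_sql excludes rows already set to the new value,
    so each pass makes progress."""
    cursor = connection.cursor()
    cursor.execute("set rowcount %d" % update_limit)   # Sybase per-session row limit
    total = 0
    while True:
        cursor.execute(update_sql)   # e.g. update acs_ref_data set acr_best_darkfile = ...
        connection.commit()
        if cursor.rowcount <= 0:     # no rows left that still need the new value
            break
        total += cursor.rowcount
    cursor.execute("set rowcount 0") # remove the per-session limit
    connection.commit()
    return total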
A special list of the new reference file names that have just been applied against the archive catalog is created and mailed to a distribution list for the STIS (STIS_MAIL_LIST), since that instrument team is interested in knowing when new reference file deliveries have been processed.
Finally, all entries in CDBS_SERVER::CDBS_DB.new_cal_files with non-NULL timestmp values are copied to the CDBS_SERVER::CDBS_DB.new_cal_files_history table and deleted from new_cal_files. This ensures that the new reference files just processed will not be picked up again on the next triggering event.
OPUS Resource File Entries
OUTPATH – location for output log files
SUPPORTED_INSTR – list of instruments supported by BESTREF
UPDATE_LIMIT – maximum number of records updated in a "chunk" (to avoid replication problems). All updates are performed, just in chunks.
STIS_MAIL_LIST – list of e-mail addresses that want to know when new STIS reference files have been processed
OPERATOR_MAIL_LIST – list of e-mail addresses that want to know when BESTREF has been run
ARCH_SERVER – the archive database server
ARCH_DB – the archive database (i.e. the catalog)
CDBS_SERVER – the CDBS database server
CDBS_DB – the CDBS database
Trigger: time poller awakens nightly at 07:00 GMT
Summary:
This process updates the empty “best” calibration switch fields in the archive catalog tables for any new exposures for supported BESTSWITCH instruments (SUPPORTED_INSTR) that have been ingested into the archive.
This task awakens nightly on a time trigger and checks the archive catalog (ARCH_SERVER, ARCH_DB) for any newly archived exposures for supported BESTSWITCH instruments (SUPPORTED_INSTR) with NULL “best” calibration switch fields in the database. A separate database table exists for each instrument
acs_ref_data
stis_ref_data
wfpc2_ref_data
and each contains fields indicating the “best” calibration switch values for that instrument. ACS for example contains at least these fields
acr_best_biascorr
acr_best_darkcorr
acr_best_flatcorr
Etc.
A flatfile exists for each supported BESTSWITCH instrument listing the names of the calibration switches that should be checked. The files are named like
OPUS_DEFINITIONS_DIR:acs.switches
The file is located for each instrument and then a database query is run against each instrument “ref_data” table to locate any records with at least one NULL “best” calibration switch field. If no NULL entries are found, the task prints a message and goes on to the next supported instrument. This is the normal case for a period where no new exposures for a particular instrument have been ingested. If any new exposures were found they are written to a disk file and fed to the “kwset” tool, which obtains header keyword values from archive catalog tables, applies keyword rules for setting switch values, and writes out any “best” calibration switch changes needed to a file of SQL. If any SQL update statements were written, these are then applied to the archive catalog (ARCH_SERVER, ARCH_DB) and the file of SQL is date-time stamped, saved (in directory OUTPATH), and e-mailed to the configured list (EMAIL_LIST).
OPUS Resource File Entries
OUTPATH – location for output log files
SUPPORTED_INSTR – list of instruments supported by BESTSWITCH
DB_RETRY_INTERVAL – the time (in seconds) before retrying a failed database access
UPDATE_LIMIT – maximum number of records updated in a "chunk" (to avoid replication problems). All updates are performed, just in chunks.
ARCH_SERVER – the archive database server
ARCH_DB – the archive database (i.e. the catalog)
KW_SERVER – the keyword database server
KW_DB – the keyword database
The OPUS architecture provides a way for the same set of processes to be run against different sets of data and disk directories through the use of a path file. A path file (described in detail in the OPUSFAQ) can be thought of as a configuration set up for running a set of data. When you start a set of pipeline processes running under OPUS, you must tell it which path to use.
Process resource files are usually set up to "pull" certain parameter values "through a path file". This means that in the process resource file, instead of directly providing a value for a parameter, the value is given as the name of an entry in a path file. By using different path files, you can change the values of certain items used by the processes in a pipeline.
In the CDBS default path file, here are some example parameters that are referenced by existing entries in the process resource files (this is not a complete path file):
! CDBS database used to set feedback indication and read for BESTREF/BESTSWITCH
CDBS_DB = cdbs_test
CDBS_SERVER = ROBBIE
! ARCH and KW databases used to run nightly BESTREF/BESTSWITCH
ARCH_DB = dadsdev
ARCH_SERVER = ROBBIE
KW_DB = keyworddev
KW_SERVER = ROBBIE
cdbs_input_dir = /data/cdbs1/opus_release  !local directory for new deliveries
cdbs_archive_dir = !<directory where files are put for archiving>
cdbs_archive_log_dir = !<directory for CDBS archive log files>
outgoing_remote1_dir = !<directory for outgoing gzipped tar pkg to remote site 1>
outgoing_remote2_dir = !<directory for outgoing gzipped tar pkg to remote site 2>
cdbs_machine = area51  !remote CDBS machine
cdbs_machine_type = UNIX  !type of CDBS machine (VMS or UNIX)
cdbs_login = mswam  !login to use for CDBS machine
cdbs_prompt = area51:mswam  !reminder for PASSWORD prompt
cdbs_rmt_directory = /home/mswam/cdbs_data/  !remote directory where files are found
!
! These two sets of entries describe the remote connection parameters
! for delivering CDBS reference files to ST-ECF (remote1) and CADC (remote2)
!
remote1_prompt = ST_ECF:cdbs
remote1_machine = empty  ! <e.g. tib.stsci.edu>
remote1_login = empty  ! <e.g. opus>
remote1_directory = empty  ! <e.g. /data/cdbs1/opus_release/>
!
remote2_prompt = CADC:cdbs
remote2_machine = empty  ! <e.g. tib.stsci.edu>
remote2_login = empty  ! <e.g. opus>
remote2_directory = empty  ! <e.g. /data/cdbs1/opus_release/>
!
! This value is the period of time to pause after pushing the CDBS delivery to
! the second remote site. This will allow both remote sites time
! to install the reference files before the archive database is triggered
! to start calling for them.
!
wait_seconds = 1200  ! currently 20 minutes, at their request
The OPUS architecture provides a way to define a set of environment variables across all processes that run within it. There is a special command file script, opus_login.csh, that is run prior to starting any process within OPUS. All variables defined in opus_login.csh will be available to processes run under OPUS.
These are the variables defined in the default opus_login.csh file that are used by the CDBS pipeline:
setenv IRAFBIN - points to the TABLES software tree for VMS
setenv TABLESDISK - points to the TABLES software tree for UNIX
setenv nref - location of NICMOS reference files
setenv jref - location of ACS reference files
setenv oref - location of STIS reference files
setenv uref - location of WFPC-2 reference files
setenv xref - location of FOC reference files
setenv yref - location of FOS reference files
setenv zref - location of HRS reference files
setenv mtab - location of multiple-instrument reference tables
setenv ttab - location of throughput tables
The CDBS operator will hopefully not need to attend to many pipeline problems. Here are a few known problems that could arise, and what actions can be taken by the operator to recover from them:
Failure to Connect to Remote CDBS System
Several pipeline stages (CDBPOL, CDBCPY, CDBFB) could fail if the network connection to the remote CDBS system, or to one of the remote mirror sites, is unavailable, or if the login account information provided (login name, password) is incorrect.
Action:
Retriggering (setting the OSF column to 'w') or restarting the process once the network connection is available again, or after providing the proper login information, should correct these problems.
Incomplete CDBS delivery found
If CDBS puts an incomplete delivery into the release area (files referenced in the release catalog that aren't on disk), the CDBCPY stage will generate an error.
Action:
It is possible, if CDBS does not follow their normal release procedure, for the catalog file to be made available while the reference files (which can be large) are still being copied into the release directory. CDBCPY triggers on the existence of the catalog file, so if the timing is just right, it is possible for it to trigger when all files are not yet available. You can retrigger the process (setting the OSF column to 'w') to see if the delivery was eventually made complete by CDBS. If it still fails with the same error, CDBS will need to be contacted to find out why the delivery is not complete.
Insufficient Disk Space in the Pipeline Reference File Directories
CDBDSK can fail if there is insufficient disk space in the pipeline reference file directories to install a delivery.
Action:
If the calibration stages of the pipeline are quiet or can be made so, try running CDBDEL to get back disk space being taken up by obsolete reference files. If this does not bring back sufficient free space the options are more limited (deletion of reference files that could be called for in the future, obtaining more disk space, etc.).