Data Management System Association Project

Introduction

 

The Proposal Transformation (TRANS) program is being enhanced with new rules so that more NICMOS and ACS exposures are combined into associations. These new rules will take effect in the next proposal cycle. Because old datasets can be reprocessed in the Data Management System (DMS) On The Fly Reprocessing (OTFR) pipeline, the new association rules should be applied to the old datasets as well. This document describes a project to reprocess old NICMOS and ACS datasets that were not associated and to create the new association products defined under the broader rules. Because this activity involves DMS software rather than TRANS software, it is called the DMS Association Project. The project does not require any changes to the Science Planning and Scheduling System (SPSS) database.

 

Overview

 

For the OTFR pipeline to generate associated datasets, database records for the association products must be present in the Observation Processing Unified System (OPUS) database tables asn_association, asn_members, asn_product_link, and qolink_sms. Normally, these tables are populated during processing of the Mission Schedule in the DMS PASS Data Receipt (PDR) pipeline, based on the SPSS database table qeassociation. The Mission Schedule includes the HST Vehicle Maintenance activities as well as the Science Mission Schedule (SMS). For DMS associations, a new tool is required to insert these records. As a result, the SPSS qeassociation table will no longer contain a complete set of associations, while the OPUS database tables will contain data for both TRANS associations and DMS associations.

 

For the user community to identify the newly associated products in the Archive so that they can be regenerated in OTFR, the Data Archive and Distribution System (DADS) catalog tables must reflect all characteristics of associated datasets. This is accomplished by reprocessing, reingesting, and re-cataloging the new associations. A special OPUS reprocessing pipeline and a special DMS Reingest pipeline are already established for reprocessing activities, so no extra software or configuration is required for these steps. A new tool is required, however, to identify the lists of science Post-Observation Data (POD) telemetry files needed for reprocessing.

 

After the new associations have been cataloged in the DADS database tables, some old records created during the original ingest must be cleaned out. These are records that should exist only for non-associated datasets and association products, yet after the new associations are cataloged they will still be present under the dataset names of the individual association member exposures. Such records can cause problems for users, because the individual exposures of an association cannot be processed by OTFR.

New OPUS database table

 

Only one new OPUS database table is used for DMS associations. This table, dms_asn_id, allows OPUS to easily distinguish between TRANS associations and DMS associations. It contains a single field, the name of the DMS association. The table is needed by the new tools that manage the other OPUS association tables, and it can also be used to report the progress of reprocessing the new associations. At the end of all reprocessing, this table may be deleted.

 

New DMS Association tools

 

There are four new tools, run by OPUS operations staff, that insert the new database records and collect data for reprocessing. The tools will be delivered in the OPUS configuration directory /dbms/tools, from which operations staff can copy them into the override bin/sparc_solaris directory used for temporary software.

 

The four tools, dms_asn_sms.pl, dms_asn_insert.pl, dms_asn_pods.pl, and dms_asn_delete.pl, are described in greater detail in the following sections.

 The dms_asn_sms.pl tool

 

The Science Mission Schedule contains the plan for executing the science observations over a short time interval, usually a week. To avoid problems in linking sequential schedules, any set of observations that use the same guide stars and are combined into an “obset” must be completed before the end of the SMS. The end of an SMS is therefore also the end of any “obset”. A base-lined SMS is one that has never been rescheduled by a later “replan” SMS, so the end time of a base-lined SMS is guaranteed to be a boundary for any “obset”. The dms_asn_insert.pl tool needs a time boundary that never falls inside an “obset”, and it uses the end time of a base-lined SMS to obtain such a boundary.

 

The interactive tool dms_asn_sms.pl displays the SMS ID values, along with the exact end time of each SMS, for base-lined SMS files whose IDs start with the characters supplied in its only argument. The first two characters of any SMS ID are the last two digits of the year, and the next three characters are the zero-filled day-of-year. To list all of the base-lined SMS ID values for 2005, use the following command:

 

>dms_asn_sms.pl 05

 

To list the SMS ID values near the beginning of 2004 (days 001 through 099), use the following command:

 

>dms_asn_sms.pl 040
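The SMS ID prefix convention can be illustrated with a small Python sketch. It decodes the year and day-of-year from the first five characters of an SMS ID and filters a list by prefix, as the tool's argument does. The sample IDs are hypothetical (only their first five characters follow the documented format), and the sketch assumes all years fall in 2000 through 2099.

```python
from datetime import date, timedelta


def sms_id_date(sms_id: str) -> date:
    """Return the calendar date encoded in the first five characters of an SMS ID.

    Characters 1-2 are the last two digits of the year and characters 3-5 are the
    zero-filled day-of-year. Assumes the year lies in 2000-2099.
    """
    year = 2000 + int(sms_id[0:2])
    day_of_year = int(sms_id[2:5])
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def match_prefix(sms_ids, prefix):
    """Return the SMS IDs that start with `prefix`, mirroring the tool's argument."""
    return [sms_id for sms_id in sms_ids if sms_id.startswith(prefix)]


# Hypothetical SMS IDs; the trailing letter stands in for whatever follows the date.
ids = ["04012A", "04095B", "04100C", "05003A"]
print(match_prefix(ids, "040"))  # ['04012A', '04095B']
print(match_prefix(ids, "05"))   # ['05003A']
print(sms_id_date("04095B"))     # 2004-04-04
```

A prefix of "040" therefore matches only days 001 through 099 of 2004; "04100C" (day 100) is excluded.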

 

The most recent SMS should never be used. Candidate SMS values should be a few weeks old so that all Observation Monitoring System (OMS) data obtained from HST engineering data have been processed and ingested into DADS. The dms_asn_insert.pl tool uses the DADS oms_summary table to identify secondary parallel (coordinated) observations, which are excluded from new NICMOS associations.
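The age requirement can be expressed as a simple filter. In this Python sketch, the three-week margin (MIN_AGE) and the sample end times are assumptions standing in for "a few weeks" and for the end times that dms_asn_sms.pl displays. The function returns the base-lined SMS IDs that are old enough to serve as boundaries, ordered by end time.

```python
from datetime import datetime, timedelta

# Assumption: "a few weeks" is modeled as three weeks; the real margin is an operations choice.
MIN_AGE = timedelta(weeks=3)


def eligible_sms(sms_end_times, now, min_age=MIN_AGE):
    """Return base-lined SMS IDs whose end time is at least `min_age` before `now`.

    `sms_end_times` maps SMS ID -> end time. Results are ordered by end time so the
    last element is the newest SMS that is old enough to use as a boundary.
    """
    cutoff = now - min_age
    ordered = sorted(sms_end_times.items(), key=lambda item: item[1])
    return [sms_id for sms_id, end_time in ordered if end_time <= cutoff]


# Hypothetical end times for three consecutive weekly schedules.
end_times = {
    "05010A": datetime(2005, 1, 17),
    "05017A": datetime(2005, 1, 24),
    "05024A": datetime(2005, 1, 31),
}
print(eligible_sms(end_times, now=datetime(2005, 2, 15)))  # ['05010A', '05017A']
```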

 

The dms_asn_insert.pl tool

 

The interactive dms_asn_insert.pl tool, the primary tool of this project, takes two arguments: the previous SMS ID (optional) and the end SMS ID. It inserts all the new database records that define the new DMS associations. The tool searches the SPSS and OPUS databases for all NICMOS and ACS exposures that were successfully processed from the end time of the previous SMS up to the end time of the end SMS. It filters out unsuitable records and finds sets of at least two exposures with the same characteristics. Each such set is assigned a new, unique association name.

 

An association name consists of five parts: a one-character instrument code, a three-character program ID, a two-character “obset” ID, a two-character association number, and the constant “0”. Partly because of this naming convention, associations never cross “obset” boundaries, although one “obset” may contain more than one new association. The unique association number is obtained by finding the maximum association number among the existing associations in each “obset” and incrementing that base-36 number for each new association name. New associations very commonly occur in “obsets” that already contain TRANS associations. Only one association product is generated for each new association. For each new DMS association, new records must be added to the OPUS tables asn_association, asn_members, asn_product_link, and qolink_sms, and the association name must also be recorded in dms_asn_id.
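The naming rule and the base-36 increment can be sketched in Python as follows. The lowercase digit alphabet, and starting at "01" in an “obset” that has no existing associations, are assumptions; the program and “obset” IDs in the example are hypothetical, while the instrument code "j" is the usual HST prefix for ACS datasets.

```python
# Base-36 digits; lowercase is an assumption matching lowercase HST dataset names.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: int, width: int = 2) -> str:
    """Format `value` as a zero-filled base-36 string of exactly `width` digits."""
    if not 0 <= value < 36 ** width:
        raise ValueError(f"association number {value} does not fit in {width} base-36 digits")
    digits = ""
    for _ in range(width):
        value, remainder = divmod(value, 36)
        digits = DIGITS[remainder] + digits
    return digits


def new_asn_names(instrument, program_id, obset_id, existing_numbers, count):
    """Return `count` new association names for one obset.

    `existing_numbers` holds the two-character association numbers already used in
    the obset (TRANS or DMS); numbering continues after the largest of them.
    """
    start = max((int(num, 36) for num in existing_numbers), default=0)
    return [
        f"{instrument}{program_id}{obset_id}{to_base36(start + i)}0"
        for i in range(1, count + 1)
    ]


# Hypothetical obset "01" of program "8bt" that already has TRANS associations 01 and 09.
print(new_asn_names("j", "8bt", "01", ["01", "09"], 2))  # ['j8bt010a0', 'j8bt010b0']
```

Because the number is treated as base 36, the association after "09" is "0a" rather than "10", and every generated name has nine characters.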

The dms_asn_pods.pl tool

 

The interactive dms_asn_pods.pl tool takes two arguments: the start day and the end day. It performs a query that joins the dms_asn_id table with the asn_association table, selecting the associations whose asn_association.last_exp_date falls from the start day up to, but not including, the end day. For these associations, the asn_product_link table is used to find all the exposure members. For each exposure member, the podnames table and the “executed” table are used to get the names of the POD files to be reprocessed. All the POD file names are written to a text file, one name per line.
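The query can be sketched with Python's built-in sqlite3 module. The column names (association_id, exposure_name, pod_name), the ISO date strings, and the sample rows are all hypothetical, and the additional lookup through the “executed” table is not modeled because its columns are not described here. The half-open date comparison mirrors "from the start day up to, but not including, the end day"; using DISTINCT, so that a POD file shared by several exposures is listed once, is an assumption.

```python
import sqlite3

POD_QUERY = """
    SELECT DISTINCT p.pod_name
    FROM dms_asn_id d
    JOIN asn_association a ON a.association_id = d.association_id
    JOIN asn_product_link l ON l.association_id = a.association_id
    JOIN podnames p ON p.exposure_name = l.exposure_name
    WHERE a.last_exp_date >= ? AND a.last_exp_date < ?
    ORDER BY p.pod_name
"""


def pod_names(conn, start_day, end_day):
    """Return POD names for DMS associations with start_day <= last_exp_date < end_day."""
    return [name for (name,) in conn.execute(POD_QUERY, (start_day, end_day))]


def write_pod_list(names, path):
    """Write one POD file name per line, as the tool's output file does."""
    with open(path, "w") as out:
        for name in names:
            out.write(name + "\n")


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dms_asn_id (association_id TEXT PRIMARY KEY);
        CREATE TABLE asn_association (association_id TEXT, last_exp_date TEXT);
        CREATE TABLE asn_product_link (association_id TEXT, exposure_name TEXT);
        CREATE TABLE podnames (exposure_name TEXT, pod_name TEXT);
        INSERT INTO dms_asn_id VALUES ('n9ab020a0');
        INSERT INTO asn_association VALUES
            ('n9ab020a0', '2005-01-03 10:00:00'),
            ('n9ab02010', '2005-01-03 09:00:00');
        INSERT INTO asn_product_link VALUES
            ('n9ab020a0', 'n9ab02abq'), ('n9ab020a0', 'n9ab02acq'),
            ('n9ab02010', 'n9ab02aaq');
        INSERT INTO podnames VALUES
            ('n9ab02abq', 'pod_b'), ('n9ab02acq', 'pod_a'), ('n9ab02aaq', 'pod_c');
    """)
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    # 'n9ab02010' is a TRANS association (no dms_asn_id row), so pod_c is not selected.
    print(pod_names(demo, "2005-01-03", "2005-01-04"))  # ['pod_a', 'pod_b']
```

The join with dms_asn_id is what keeps TRANS associations, whose exposures were already processed correctly, out of the reprocessing list.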

 

The operator can adjust the end day as needed until the number of POD files is within the practical range for a large batch of reprocessing. The output file uses a format suitable for a single DADS Distribution request. Neither the start day nor the end day needs to fall on an SMS boundary, so operations staff have complete flexibility to limit a POD reprocessing batch to any number of days. This flexibility is important because known pipeline congestion problems can occur if too many datasets are reprocessed concurrently.
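The operator's adjustment of the end day amounts to growing the batch one day at a time while the POD count stays within a limit. The Python sketch below automates that idea under stated assumptions: the per-day POD counts and the max_pods limit are hypothetical inputs (in practice the counts would come from trial runs of dms_asn_pods.pl), and the start day is always included even if it alone exceeds the limit, so the batch never stalls.

```python
from datetime import date, timedelta


def choose_end_day(start_day, pods_per_day, max_pods):
    """Return (end_day, total) for the batch [start_day, end_day).

    `pods_per_day` maps a date to the number of POD files for DMS associations whose
    last exposure falls on that date; days missing from the map count as zero.
    The batch grows one day at a time while the running total stays <= max_pods.
    """
    last_day = max(pods_per_day, default=start_day)
    total = pods_per_day.get(start_day, 0)   # the start day is always included
    end_day = start_day + timedelta(days=1)  # exclusive end, as in dms_asn_pods.pl
    while end_day <= last_day and total + pods_per_day.get(end_day, 0) <= max_pods:
        total += pods_per_day.get(end_day, 0)
        end_day += timedelta(days=1)
    return end_day, total


# Hypothetical daily POD counts.
counts = {
    date(2005, 1, 3): 120,
    date(2005, 1, 4): 300,
    date(2005, 1, 5): 250,
    date(2005, 1, 6): 400,
}
print(choose_end_day(date(2005, 1, 3), counts, max_pods=700))
# (datetime.date(2005, 1, 6), 670)
```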

 

The dms_asn_delete.pl tool

 

The interactive dms_asn_delete.pl tool takes one argument: the DMS association to be deleted. It is used when processing problems indicate that the member exposures are incompatible for a DMS association. Because such problems must not recur in OTFR processing, the database records of failed DMS associations must be deleted. Only associations that have a record in dms_asn_id can be deleted.
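The guard described above, deleting only associations that appear in dms_asn_id, can be sketched in Python with sqlite3. The list of tables comes from this document, but keying every table by a single association_id column is a simplifying assumption, and the sample rows are hypothetical. All deletions run in one transaction so that a failure leaves no partially deleted association.

```python
import sqlite3

# OPUS tables named in this document; the shared association_id column is assumed.
ASN_TABLES = ("asn_members", "asn_product_link", "qolink_sms",
              "asn_association", "dms_asn_id")


def delete_dms_association(conn, asn_id):
    """Delete every record of DMS association `asn_id`; refuse TRANS associations."""
    found = conn.execute(
        "SELECT 1 FROM dms_asn_id WHERE association_id = ?", (asn_id,)
    ).fetchone()
    if found is None:
        raise ValueError(f"{asn_id} has no dms_asn_id record; not deleting")
    with conn:  # commits if every DELETE succeeds, rolls back otherwise
        for table in ASN_TABLES:
            conn.execute(f"DELETE FROM {table} WHERE association_id = ?", (asn_id,))


def build_delete_demo_db():
    """In-memory tables holding one DMS and one TRANS association (made-up names)."""
    conn = sqlite3.connect(":memory:")
    for table in ASN_TABLES:
        conn.execute(f"CREATE TABLE {table} (association_id TEXT)")
    for table in ASN_TABLES[:-1]:
        conn.executemany(f"INSERT INTO {table} VALUES (?)",
                         [("n9ab020a0",), ("n9ab02010",)])
    conn.execute("INSERT INTO dms_asn_id VALUES ('n9ab020a0')")
    conn.commit()
    return conn


def count_rows(conn, asn_id):
    """Total rows across ASN_TABLES that still mention `asn_id`."""
    return sum(
        conn.execute(f"SELECT COUNT(*) FROM {t} WHERE association_id = ?",
                     (asn_id,)).fetchone()[0]
        for t in ASN_TABLES
    )


if __name__ == "__main__":
    db = build_delete_demo_db()
    delete_dms_association(db, "n9ab020a0")
    print(count_rows(db, "n9ab020a0"), count_rows(db, "n9ab02010"))  # 0 4
```

Calling delete_dms_association(db, "n9ab02010") raises ValueError, because that TRANS association has no dms_asn_id record.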

 

Project Implementation

 

Three Problem Reports (PRs) track the analysis, development, testing, and installation of the changes required for this project. Each PR affects a different operational subsystem, and for this project each PR can be implemented independently with no coordination of installation. Each PR is described briefly below and has a link to the actual PR for readers viewing this document online.

 

The OPUS changes

 

The four tools described above and the file that defines the new OPUS database table dms_asn_id will be implemented under OPUS PR.54219. Under this PR, the tools will be tested and installed in the dbms/tools directory. When the build is installed, an empty dms_asn_id table will be created in the operational OPUS database.

 

The DADS changes

 

DADS PR.54290 initiates an analysis of DADS tools that might still use the SPSS database table qeassociation. Because qeassociation does not contain the additional DMS associations, only the OPUS database table asn_members should be used to validate association content.

 

 The ARCH changes

 

In the Problem Reporting System, the ARCH subsystem is used for repairs to the Archive (DADS) database. ARCH PR.54291 requires the development of a repair tool to clean up the DADS database tables that will contain unwanted records left over from the original cataloging of the reprocessed datasets.

Rules for DMS Associations

 

The exposure characteristics for both ACS and NICMOS are all taken from SPSS database fields, most of which were populated by the TRANS program. These fields must be the same for each exposure included in an association.  They are listed in the following table.

 

Field Name   Table Name     Comment
----------   ------------   ----------------------
program_id   qobservation   standard identifier
obset_id     qobservation   standard identifier
config       qelogsheet     from proposal logsheet
opmode       qelogsheet     from proposal logsheet
coord_id     qobservation   aperture name
targname     qelogsheet     from proposal logsheet
sp_1         qelogsheet     from proposal logsheet
sp_2         qelogsheet     from proposal logsheet
sp_3         qelogsheet     from proposal logsheet
sp_4         qelogsheet     from proposal logsheet
minwave      qelogsheet     from proposal logsheet
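The matching requirement amounts to grouping candidate exposures by the values of these fields and keeping groups with at least two members. This Python sketch assumes each exposure is a flat record (a dict) carrying the listed fields plus a hypothetical "name" key for the exposure's dataset name; the sample records are made up. Because program_id and obset_id are part of the key, no group can cross an “obset”.

```python
from collections import defaultdict

# Fields from the table above; all must match within one DMS association.
KEY_FIELDS = ("program_id", "obset_id", "config", "opmode", "coord_id",
              "targname", "sp_1", "sp_2", "sp_3", "sp_4", "minwave")


def candidate_sets(exposures):
    """Group exposures by their KEY_FIELDS values; keep groups of two or more.

    Returns a list of sorted exposure-name lists, one per candidate association.
    """
    groups = defaultdict(list)
    for exposure in exposures:
        key = tuple(exposure[field] for field in KEY_FIELDS)
        groups[key].append(exposure["name"])
    return [sorted(names) for names in groups.values() if len(names) >= 2]


# Hypothetical exposures: the first two match on every field, the third differs in targname.
base = dict.fromkeys(KEY_FIELDS, "")
base.update(program_id="9ab", obset_id="02", config="NIC2", opmode="MULTIACCUM",
            targname="NGC-1068")
exposures = [
    dict(base, name="n9ab02abq"),
    dict(base, name="n9ab02acq"),
    dict(base, name="n9ab02adq", targname="NGC-4151"),
]
print(candidate_sets(exposures))  # [['n9ab02abq', 'n9ab02acq']]
```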

The candidates for inclusion in DMS associations are filtered by the database query rules listed in the following table.

Scope    Rule
------   ------------------------------------------------------------------------------------------
ACS      qobservation.si_id must be “ACS”
NICMOS   qobservation.si_id must be “NIC”
Both     qobservation.pred_strt_tm must be within the desired time range
Both     qobservation.control_id must not be blank (i.e., not a download record)
Both     qobservation.mt_flag must be blank (i.e., not a moving target)
Both     qolink_sms.status must be “E” (successfully executed and archived)
Both     no asn_members record may exist (i.e., not already in any association)
ACS      qelogsheet.opmode must not be “ACQ” for any exposure in the “obset” that includes this exposure
NICMOS   qelogsheet.opmode must not be “ACQ” for this exposure
ACS      qelogsheet.targname must not be “EARTH_CALIB”, “DARK”, “NONE”, “BIAS”, “TUNGSTEN”, or “DEUTERIUM”
NICMOS   ARCH_DB..oms_summary.oss_parallel must not be “SECONDARY” (i.e., not a coordinated parallel observation)
Both     after the above filters are applied, at least two exposures must share the characteristics listed in the previous table
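The per-exposure rules in this table can be read as a single predicate, sketched in Python below. The flat record layout (a dict holding the named column values plus a hypothetical in_association flag standing for the asn_members check), the half-open time window, and the precomputed acq_obsets set used for the ACS “obset”-wide rule are all assumptions. The final rule, at least two matching exposures, is applied afterwards, when the surviving exposures are grouped by the characteristics in the first table.

```python
from datetime import datetime

# ACS target names excluded by the rules table.
ACS_EXCLUDED_TARGETS = {"EARTH_CALIB", "DARK", "NONE", "BIAS", "TUNGSTEN", "DEUTERIUM"}


def is_candidate(exp, start, end, acq_obsets):
    """Return True if exposure record `exp` passes every per-exposure rule.

    `acq_obsets` is a set of (program_id, obset_id) pairs containing an ACQ exposure.
    The time window is treated as start <= pred_strt_tm < end (an assumption).
    """
    si_id = exp["si_id"]
    if si_id not in ("ACS", "NIC"):
        return False
    if not start <= exp["pred_strt_tm"] < end:
        return False
    if not exp["control_id"].strip():   # blank control_id: a download record
        return False
    if exp["mt_flag"].strip():          # non-blank mt_flag: a moving target
        return False
    if exp["status"] != "E":            # qolink_sms.status: not executed and archived
        return False
    if exp["in_association"]:           # an asn_members record already exists
        return False
    if si_id == "ACS":
        if (exp["program_id"], exp["obset_id"]) in acq_obsets:
            return False                # an ACQ anywhere in the obset excludes it
        return exp["targname"] not in ACS_EXCLUDED_TARGETS
    # NICMOS: only this exposure's opmode matters, and coordinated parallels are excluded.
    return exp["opmode"] != "ACQ" and exp["oss_parallel"] != "SECONDARY"


# Hypothetical NICMOS exposure record and selection window.
nic = {"si_id": "NIC", "pred_strt_tm": datetime(2005, 1, 3, 10), "control_id": "1234",
       "mt_flag": " ", "status": "E", "in_association": False, "program_id": "9ab",
       "obset_id": "02", "opmode": "MULTIACCUM", "targname": "NGC-1068",
       "oss_parallel": ""}
window = (datetime(2005, 1, 1), datetime(2005, 1, 8))
print(is_candidate(nic, *window, acq_obsets=set()))                                  # True
print(is_candidate(dict(nic, oss_parallel="SECONDARY"), *window, acq_obsets=set()))  # False
```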

 

The new asn_members records have a field called member_type. There are four different values for member_type as specified in the following table.

 

Instrument   Member     member_type
----------   --------   -----------
ACS          Product    PROD-DITH
ACS          Exposure   EXP-DITH
NICMOS       Product    PROD-TARG
NICMOS       Exposure   EXP-TARG
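When the new asn_members records are built, the member_type follows directly from this table. The Python sketch below shows that lookup. Representing a member record as a (member name, member_type) pair is a simplification, since the other asn_members columns are not described here, and the names in the example are placeholders.

```python
# member_type values from the table above, keyed by (instrument, member kind).
MEMBER_TYPES = {
    ("ACS", "Product"): "PROD-DITH",
    ("ACS", "Exposure"): "EXP-DITH",
    ("NICMOS", "Product"): "PROD-TARG",
    ("NICMOS", "Exposure"): "EXP-TARG",
}


def member_rows(instrument, product_name, exposure_names):
    """Return (member name, member_type) pairs for one new DMS association:
    the single association product first, then each exposure member."""
    rows = [(product_name, MEMBER_TYPES[(instrument, "Product")])]
    rows.extend((name, MEMBER_TYPES[(instrument, "Exposure")]) for name in exposure_names)
    return rows


print(member_rows("NICMOS", "product_1", ["exp_1", "exp_2"]))
# [('product_1', 'PROD-TARG'), ('exp_1', 'EXP-TARG'), ('exp_2', 'EXP-TARG')]
```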