The reingest procedure transfers datasets into the reingest pipeline. The reingest pipeline is a special ingest pipeline. It is set up so that it can override the value that the primary ingest pipeline uses for the archive_data_set_all.ads_data_source field, which lets the operator indicate the nature of the reprocessing. The reingest pipeline is also used for ingest and catalog regression tests, in which the ads_data_source field is always REGR.
These tools were developed over multiple build releases under separate OPRs. Each OPR contains test plans that give examples of how to use these tools. A Web link will be provided for each OPR so that this additional information is easily available.
The procedure to reingest datasets usually involves three steps:
None of the tools will work if any OSFs are left in the reingest pipeline. This restriction prevents confusion about whether processing has completed for all datasets. In an ingest pipeline, the ingdel task removes any OSF that has a completion status (c) in its last column. To delete an OSF left over from a previous failure that requires reingesting, use the OMG for the reingest path to set that OSF's last column to c.
The data to be reingested must be segregated into directories that contain only the data allowed for each tool. Multiple datasets can be loaded into each directory, and a reingest task usually requires only one directory. The kinds of data that need separate directories are shown in Tables 1 and 2. Association files can go into the same directory as the files for the members of that association.
The OPUS 2010.4 changes should allow the data to be ingested under either the name that comes out of the originating OPUS pipeline or the name that DADS delivers. In other words, the data format does not matter. For example, SMS data can be in either the yd5a11550.pod format or the yt5813360_pod.fits format. PDQ data in a text file named o6n901fbq.pdq should work, and PDQ data that has already been converted to FITS and named o6n901fbq_pdq.fits will also work. Likewise, OMS FITS data named o6n901fbq.jif will work just as well as OMS FITS data named o6n901fbq_jif.fits.
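A small Python sketch can illustrate why the two naming conventions are interchangeable. The helpers normalize_name and same_dataset are hypothetical and are not part of the OPUS tools. The sketch assumes only two patterns. In the pipeline style, the data type is the file extension (o6n901fbq.pdq). In the DADS style, the data type is the suffix between the last underscore and .fits (o6n901fbq_pdq.fits). Under those assumptions, both spellings reduce to the same (rootname, type) pair.

```python
import re

# Hypothetical helper, not part of the OPUS tools. Assumed conventions:
#   DADS style:     <rootname>_<type>.fits   e.g. o6n901fbq_pdq.fits
#   pipeline style: <rootname>.<type>        e.g. o6n901fbq.pdq
DADS_STYLE = re.compile(r"^(?P<root>[a-z0-9]+)_(?P<kind>[a-z0-9]+)\.fits$")
PIPELINE_STYLE = re.compile(r"^(?P<root>[a-z0-9]+)\.(?P<kind>[a-z0-9]+)$")


def normalize_name(filename: str) -> tuple:
    """Return (rootname, type) for a file named in either convention."""
    name = filename.lower()
    # The patterns do not overlap: pipeline-style roots contain no underscore.
    for pattern in (DADS_STYLE, PIPELINE_STYLE):
        match = pattern.match(name)
        if match:
            return match.group("root"), match.group("kind")
    raise ValueError(f"unrecognized file name: {filename}")


def same_dataset(a: str, b: str) -> bool:
    """True when two file names refer to the same rootname and data type."""
    return normalize_name(a) == normalize_name(b)
```

For example, same_dataset("o6n901fbq.jif", "o6n901fbq_jif.fits") is true, because both names reduce to ("o6n901fbq", "jif").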
There are two additional tools that were intended for regression testing of the ingest pipeline and the cataloging processes: ingest_regr_datasets.csh and catalog_regr_datasets.csh. It is not clear whether these tools have ever been used, whether they are still needed, or even whether they would still work, but they were not deleted from the build tree. Apart from those two scripts, AUTO regr_00202 now tests all of these tools using real data. It skips DADS by using the nsa_bypass.pl tool discussed below.
Because each tool handles data with special requirements, each tool has a slightly different interface. The tools share many common subroutines, and their output should look similar. Each tool can be called with no arguments to get a usage description. The following is a combined description of all the tools.
Table 1. Tool usage by class or data_id

CLASS | Data ID | Tools |
---|---|---|
CAL | many | ingest_hst_cal_oms.pl catalog_hst_cal_oms.pl (all CAL data can be in one directory in a single request) |
OMS | fgs, fas | ingest_hst_cal_oms.pl catalog_hst_cal_oms.pl (all OMS data can be in one directory in a single request) |
Table 2. Tool usage by data_id

CLASS | Data ID | Tools |
---|---|---|
EPC | epc | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
MSC | msc | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
MTL | mtl | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
ORB | orb | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
POD | pod | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
PRB | prb | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
PDQ | pdq | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
SMS | sms | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
TVI | tvi | ingest_hst_non_cal.pl catalog_hst_non_cal.pl (this type of data is waiting for PR 48609) |
TVL | tvl | ingest_hst_non_cal.pl catalog_hst_non_cal.pl (this type of data is waiting for PR 48609) |
ACC | oma | ingest_hst_non_cal.pl catalog_hst_non_cal.pl (this type of data is deprecated by PR 65747) |
ACM | acm | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
ANC | anc | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
AST | ast | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
CDB | cdb | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
CTB | ctb | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
DIA | adm, cdm, ndm, sdm, wdm (dia to mix and match) | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
DMP | dmp | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
EDT | edi, edl, edn, edo, edu (edt to mix and match) | ingest_hst_non_cal.pl catalog_hst_non_cal.pl |
DLG | dlg | ingest_dads_logs.pl (no catalog required) |
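When reingests are scripted, Tables 1 and 2 can be transcribed into a lookup. The following Python sketch is illustrative only, and the tables remain the authoritative reference. The names CAL_OMS_TOOLS, NON_CAL_TOOLS, DLG_TOOLS, tools_for_class and resolve_data_id are hypothetical. One design choice follows the "mix and match" notes in the DIA and EDT rows: member data_ids such as sdm or edl resolve to the umbrella strings dia and edt, so that mixed files of those classes share one input directory.

```python
# Tool pairs (ingest, catalog) transcribed from Tables 1 and 2.
# None means no catalog step is required (the DLG row).
CAL_OMS_TOOLS = ("ingest_hst_cal_oms.pl", "catalog_hst_cal_oms.pl")
NON_CAL_TOOLS = ("ingest_hst_non_cal.pl", "catalog_hst_non_cal.pl")
DLG_TOOLS = ("ingest_dads_logs.pl", None)

# Umbrella data_ids that allow mixing DIA or EDT data_ids in one directory.
UMBRELLA_IDS = {
    "dia": {"adm", "cdm", "ndm", "sdm", "wdm"},
    "edt": {"edi", "edl", "edn", "edo", "edu"},
}

# Single-valued non-CAL data_ids from Table 2. tvi/tvl are waiting for
# PR 48609 and oma is deprecated by PR 65747; they are kept for completeness.
NON_CAL_IDS = {
    "epc", "msc", "mtl", "orb", "pod", "prb", "pdq", "sms", "tvi", "tvl",
    "oma", "acm", "anc", "ast", "cdb", "ctb", "dmp",
}


def tools_for_class(archive_class: str) -> tuple:
    """Return the tool pair for an archive class handled by the -c tools."""
    if archive_class.upper() in ("CAL", "OMS"):
        return CAL_OMS_TOOLS
    raise ValueError(f"{archive_class} is not handled by the -c tools")


def resolve_data_id(data_id: str) -> tuple:
    """Return (input-directory key, tool pair) for a non-CAL data_id.

    For the non-CAL tools the key doubles as the -d value; the DLG tool
    takes no -d option, so its key only names the input directory.
    """
    key = data_id.lower()
    if key == "dlg":
        return key, DLG_TOOLS
    for umbrella, members in UMBRELLA_IDS.items():
        if key == umbrella or key in members:
            return umbrella, NON_CAL_TOOLS
    if key in NON_CAL_IDS:
        return key, NON_CAL_TOOLS
    raise ValueError(f"unknown data_id: {data_id}")
```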
Each of the tools now has a -i <input_dir> option, so the location of the input data can be specified instead of forcing the user to work from that directory. If this option is not provided, the data are assumed to be in the current working directory.
The CAL archive_class has a many-to-one relationship with data_ids (many data_ids map to CAL). Because the same ingest and catalog steps are done for all CAL data_ids, CAL data from multiple instruments can be mixed together in one input directory. The ingest and catalog steps for OMS data are similar to those for CAL data, so there is one set of tools for both of these archive classes:
>ingest_hst_cal_oms.pl -c <archive_class> -o <data_source> -i <input_dir>
>catalog_hst_cal_oms.pl -c <archive_class> -i <input_dir>
Most of the data_ids used as input to these tools correspond to archive classes. The exceptions are the DIA and EDT archive classes, each of which maps to many data_ids. However, as of OPUS 2010.4, the strings dia and edt will work as the 'data_id' input for those types of data.
>ingest_hst_non_cal.pl -d <data_id> -o <data_source> -i <input_dir>
(see original PR 51848)
>catalog_hst_non_cal.pl -d <data_id> -i <input_dir>
>ingest_dads_logs.pl -o <data_source> -i <input_dir>
(see original PR 52338). This tool is used for dlg data.
For all tools above:
setenv MSG_REPORT_LEVEL MSG_ALL
>ingest_regr_datasets.csh <path> (see original OPR 51879)
>catalog_regr_datasets.csh <path>
The two tools above still exist, but it is not clear that they are ever used. PR 59963 changed them only to remove calls to tools that no longer exist.
Note that the FUSE tools were removed since we do not expect to reingest FUSE again.
From what I can glean, this set of tools has traditionally been used by DADS Ops personnel. ARCINS, by contrast, has traditionally been used by OPUS Ops personnel, for example to get failed data into the archive under the PRB archive class.
It needs to be tested in this context, but I believe ARCINS could be replaced by this set of tools.
In the following example, the ./data directory holds association n3uy01060 and its member files:

n3uy01060.tra n3uy01060_asn.fits n3uy01061.tra
n3uy01phq.tra n3uy01phq_cal.fits n3uy01phq_ima.fits n3uy01phq_raw.fits n3uy01phq_spt.fits n3uy01phq_trl.fits n3uy01phq_trl.txt
n3uy01piq.tra n3uy01piq_cal.fits n3uy01piq_raw.fits n3uy01piq_spt.fits n3uy01piq_trl.fits n3uy01piq_trl.txt
n3uy01pjq.tra n3uy01pjq_cal.fits n3uy01pjq_ima.fits n3uy01pjq_raw.fits n3uy01pjq_spt.fits n3uy01pjq_trl.fits n3uy01pjq_trl.txt
n3uy01pkq.tra n3uy01pkq_ima.fits n3uy01pkq_raw.fits n3uy01pkq_spt.fits n3uy01pkq_trl.fits n3uy01pkq_trl.txt
n3uy01plq.tra n3uy01plq_cal.fits n3uy01plq_ima.fits n3uy01plq_raw.fits n3uy01plq_spt.fits n3uy01plq_trl.fits
ingest_hst_cal_oms.pl -c CAL -o DLSG -i ./data >& n3uy01060_ingest.out
Note: capturing the output is strongly recommended in case questions arise later about the results.
catalog_hst_cal_oms.pl -c CAL -i ./data >& n3uy01060_catalog.out
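The same commands can also be assembled, and their output captured, from Python rather than typed at a csh prompt. This is a minimal sketch under stated assumptions. build_command and run_tool are hypothetical helpers. The flag order simply follows the usage lines given earlier (-c or -d, then -o, then -i). run_tool assumes the tools are on PATH, and it assumes that setting MSG_REPORT_LEVEL=MSG_ALL in the child environment matches the setenv shown earlier. The tools' exit-status conventions are not described in this document, so the return code is passed back to the caller without interpretation.

```python
import os
import subprocess


def build_command(tool, *, archive_class=None, data_id=None,
                  data_source=None, input_dir=None):
    """Assemble argv for one reingest tool, using only the documented flags."""
    cmd = [tool]
    if archive_class is not None:   # -c: CAL/OMS tools
        cmd += ["-c", archive_class]
    if data_id is not None:         # -d: non-CAL tools
        cmd += ["-d", data_id]
    if data_source is not None:     # -o: ingest tools only
        cmd += ["-o", data_source]
    if input_dir is not None:       # -i: if omitted, the tool uses the cwd
        cmd += ["-i", str(input_dir)]
    return cmd


def run_tool(cmd, log_path):
    """Run cmd with MSG_REPORT_LEVEL=MSG_ALL, writing stdout and stderr to
    log_path (like csh '>&'). Returns the process exit code unchanged."""
    env = dict(os.environ, MSG_REPORT_LEVEL="MSG_ALL")
    with open(log_path, "w") as log:
        result = subprocess.run(cmd, env=env, stdout=log,
                                stderr=subprocess.STDOUT, check=False)
    return result.returncode
```

For example, run_tool(build_command("ingest_hst_cal_oms.pl", archive_class="CAL", data_source="DLSG", input_dir="./data"), "n3uy01060_ingest.out") mirrors the ingest command in the example above.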