CLEANDATA

Summary

This command deletes the data files associated with the processing stages recorded in an OSF's status.
 

Description

This command deletes the data files in a path that are related to an OSF. It does so by cross-referencing the OSF components with the process resource files for the path. By default, the cleandata task uses every process resource file found in OPUS_DEFINITIONS_DIR whose SYSTEM and CLASS keywords match the values of the SYSTEM and CLASS keywords in the cleandata resource file.

The user can override the CLASS keyword with one or more CLASS_GROUPING.nn keywords. To select all classes, set the keyword CLASS_GROUPING.01 to ‘*’. The cleandata task uses these keywords to select only process resource files with matching SYSTEM and CLASS keyword values. SYSTEM and CLASS are required keywords in every process resource file. Any error in loading the path's process resource files causes the command to exit with an error status.

After the task has read the process resource files that match the SYSTEM and CLASS values, the OSF components are used to identify the data files that are ready for deletion.
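The selection rule can be sketched in a few lines of Python. This is only an illustration and not the cleandata implementation: the function name, the dictionary layout, and the SYSTEM and CLASS values are hypothetical, and exact string comparison is an assumption. For simplicity, the sketch accepts ‘*’ in any grouping value.

```python
def select_resources(resources, system, class_value, class_groupings=()):
    """Pick the process resource files whose SYSTEM and CLASS match.

    class_groupings stands in for the CLASS_GROUPING.nn values; when any
    are given they replace class_value, and '*' selects every class.
    """
    wanted = set(class_groupings) if class_groupings else {class_value}
    match_all = "*" in wanted
    return [
        res for res in resources
        if res["SYSTEM"] == system and (match_all or res["CLASS"] in wanted)
    ]


# Hypothetical process resource files, reduced to the two required keywords.
resources = [
    {"name": "proc_a", "SYSTEM": "SYS1", "CLASS": "cal"},
    {"name": "proc_b", "SYSTEM": "SYS1", "CLASS": "sci"},
    {"name": "proc_c", "SYSTEM": "SYS2", "CLASS": "cal"},
]

default_selection = select_resources(resources, "SYS1", "cal")
all_classes = select_resources(resources, "SYS1", "cal", class_groupings=["*"])
```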

 

First, the status component of the OSF is used to determine which stages in the pipeline have been run. Any stage whose status value is neither ‘n’ nor ‘_’ is checked. The process resource files are searched for tasks that execute these stages. A process resource file's OSF_PROCESSING keyword identifies the pipeline stage that its task executes.
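As a minimal sketch of the status check, the following Python function returns the positions of the stages that would be checked. The function name and the zero-based indexing are assumptions made for illustration. The status string is the one used in the interactive example in this document.

```python
def stages_to_check(status_columns):
    """Return the zero-based positions of stages whose status is neither 'n' nor '_'."""
    return [
        index
        for index, status in enumerate(status_columns)
        if status not in ("n", "_")
    ]


checked = stages_to_check("ccccccc____c")
print(checked)
```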

Next, the OSF component DATA_ID is compared to the data_id value(s) of the process resource file. The data_id values in a process resource file are found in the OSF_TRIGGERnn.DATA_ID keyword(s). If no such keyword is found in the process resource file, the process is assumed to execute on all data_id values.
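The data_id comparison, including the "no keyword means all data_id values" rule, might look like the sketch below. The function name is hypothetical, and exact, case-sensitive comparison is an assumption that the text does not confirm.

```python
def process_handles_data_id(trigger_data_ids, osf_data_id):
    """Decide whether a process applies to an OSF's data_id.

    trigger_data_ids holds the values of the OSF_TRIGGERnn.DATA_ID keywords
    found in one process resource file (empty if the keyword is absent).
    """
    if not trigger_data_ids:
        # No DATA_ID keyword: the process is assumed to run on all data_id values.
        return True
    return osf_data_id in trigger_data_ids
```

For instance, a process with no DATA_ID trigger keyword applies to an OSF with data_id "nic", while a process restricted to some other data_id does not.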

Finally, the location and name of related data files are built using the optional process resource file keywords. The default filter may be specified in the cleandata resource file with the optional keyword DEFAULT_FILTER. If DEFAULT_FILTER is not specified, then “<OSF_DATASET>.*” is used as the default. The default filter is overridden by the optional process resource file keywords OUTPATH_FILTER or DATA_DELETE_FILTER.nn.

The location of files to be deleted is found in the optional process resource file keywords OUTPATH or DATA_DELETE_LOCATION.nn. If a non-default filter is needed for OUTPATH it is specified by the OUTPATH_FILTER keyword. If a non-default filter is needed for any of the DATA_DELETE_LOCATION.nn keywords, the keyword DATA_DELETE_FILTER.nn is used with a matching value of the integer “nn”.
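The pairing of locations with filters can be sketched as follows. This is an illustration under stated assumptions, not the actual implementation: the keywords of one process resource file are assumed to be available as a plain dictionary, the function name is hypothetical, and the location values (OUT_DIR, TMP_DIR) are placeholders.

```python
def deletion_targets(keywords, default_filter="<OSF_DATASET>.*"):
    """Return (location, filter) pairs for one process resource file."""
    targets = []
    if "OUTPATH" in keywords:
        # OUTPATH_FILTER overrides the default filter for OUTPATH.
        targets.append(
            (keywords["OUTPATH"], keywords.get("OUTPATH_FILTER", default_filter))
        )
    prefix = "DATA_DELETE_LOCATION."
    for key in sorted(keywords):
        if key.startswith(prefix):
            nn = key[len(prefix):]
            # DATA_DELETE_FILTER.nn with the same nn overrides the default.
            chosen = keywords.get("DATA_DELETE_FILTER." + nn, default_filter)
            targets.append((keywords[key], chosen))
    return targets


# Hypothetical keyword values for one process resource file.
example = {
    "OUTPATH": "OUT_DIR",
    "DATA_DELETE_LOCATION.01": "TMP_DIR",
    "DATA_DELETE_FILTER.01": "<OSF_DATASET>_tmp*",
}
pairs = deletion_targets(example)
```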

The OSF's dataset_name is substituted for the string <OSF_DATASET> wherever it appears in the OUTPATH_FILTER or DATA_DELETE_FILTER.nn keyword values. If the output files use only part of the OSF's dataset name, the optional filter syntax <OSF_DATASET_nn> is used. The ‘nn’ can be one or two digits, and it indicates the number of characters of the dataset name to use in the filter. The OUTPATH and DATA_DELETE location and filter keywords are never placed in the resource file that runs cleandata, where only the optional DEFAULT_FILTER keyword may be found.
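The placeholder substitution can be illustrated with a short Python sketch. It assumes that the ‘nn’ count refers to the leading characters of the dataset name, which the text implies but does not state outright. The function name and the ".fits" pattern are hypothetical.

```python
import re

# Matches <OSF_DATASET> and <OSF_DATASET_nn>, where nn is one or two digits.
_PLACEHOLDER = re.compile(r"<OSF_DATASET(?:_(\d{1,2}))?>")


def expand_filter(filter_value, dataset_name):
    """Replace dataset placeholders in a filter with the OSF's dataset name."""

    def substitute(match):
        count = match.group(1)
        if count is None:
            return dataset_name
        # Assumption: nn selects the first nn characters of the name.
        return dataset_name[: int(count)]

    return _PLACEHOLDER.sub(substitute, filter_value)


full = expand_filter("<OSF_DATASET>.*", "no5h905040")
partial = expand_filter("<OSF_DATASET_6>*.fits", "no5h905040")
```

Here `full` becomes `no5h905040.*` and `partial` becomes `no5h90*.fits`.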

Process resource files that contain no OSF_PROCESSING keyword are ignored. Process resource files that contain no data location information (OUTPATH, OUTPATH_FILTER, DATA_DELETE_LOCATION.nn or DATA_DELETE_FILTER.nn) are skipped. If OUTPATH is found but no OUTPATH_FILTER is present, then the default filter is assumed. The filter names use the filename meta-characters supported by the file system. For example, ‘*’ matches zero or more characters, ‘?’ matches a single character, and ‘[abc…]’ matches any one of the characters enclosed in the square brackets.
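The meta-characters can be tried out with Python's standard fnmatch module, used here only as a stand-in for the matching done by the file system; the two may differ in detail. The file names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical file names in a data directory.
names = [
    "no5h905040.fits",
    "no5h905040_raw.fits",
    "no5h905041.fits",
    "other.log",
]


def matching(pattern):
    """Return the names that match a shell-style filter pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


star = matching("no5h905040.*")        # '*' matches zero or more characters
question = matching("no5h90504?.fits") # '?' matches exactly one character
bracket = matching("no5h90504[01]*")   # '[01]' matches either '0' or '1'
```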

The process that runs cleandata should be run in a pipeline, so that the substantial initialization overhead is paid only once. For testing by developers, however, an interactive mode is supported that simulates its operation for a particular path and cleandata process resource file. In the interactive mode, pipeline OSFs are not used, and optional arguments replace the information that would have been extracted from an OSF. The interactive mode tests the filter syntax and the completeness of the resource files that define the files to be deleted.

The cleandata resource file

The cleandata process requires a resource file even when run interactively. Every pipeline that uses cleandata will have at least one uniquely named resource file that uses an OSF trigger based on the stage title in the pipeline stage file. This trigger should always have a data_id restriction. If multiple data_id values must be supported in the same resource file, then a separate trigger must be defined for each data_id. The required OSF status keywords for the cleandata resource file are OSF_PROCESSING, OSF_SUCCESS and OSF_ERROR. The optional keywords, DEFAULT_FILTER and CLASS_GROUPING.nn, are described above. The task line of the resource file should look as follows:

 cleandata -p $PATH_FILE -r process

Interactive Usage

cleandata -p path_name -r process -d rootname -i data_id -o status_columns

For example, the following command deletes all files in the red path that are related to the OSF whose dataset_name is "no5h905040", whose data_id is "nic", and whose status column values are "ccccccc____c".

cleandata -p red -r niccln -d no5h905040 -i nic -o ccccccc____c

All arguments are required.

Input

Resource files in the path.

Either an OSF or, in interactive mode, the optional command-line arguments that replace it.

Output

Informational messages describing the files being deleted in the pipeline.

Required Interactive Command-Line Arguments

path_name

The path containing the directory definitions required by the pipeline resource files for the SYSTEM value in the cleandata process resource file.

 

process

The name of the cleandata process resource file (without the extension ‘.resource’) that runs the cleandata task.

 

rootname

The rootname of the simulated OSF.
 

data_id

The data id of the simulated OSF.
 

status_columns 

The status component of the simulated OSF.