Getref Design

Bernie Simon

5 September 97

1: Overview

Getref is designed as a replacement for the existing CDBS task of the same name. As such, it has the same functionality as the existing task: it prints the best reference files to use for a given observation. It differs from the current version by being written entirely in C and by using the new CDBS database exclusively. It extends the functionality of the current version of getref by adding a cgi interface and optionally formatting its output as an html page. This will allow off-site users to run getref, which they have not been able to do until now. In addition, the new version of getref will generate two kinds of output: the traditional output and the detail output.

The cgi interface will be set up so the user can generate both kinds of output using links in the html output pages generated by getref.

2: Input

The parameters to getref are:

The way the input parameters are formatted depends on how the task is run. If the task is run using the cgi interface, the input will be formatted according to the usual cgi convention. The task will be able to read input using either the get or post methods. Input of root name values from a file will be disallowed when using the cgi interface, as there is no way to read the file over the net.

If the task is run from the command line, the input will be formatted to resemble iraf command input:

getref  reffile=
Unlike iraf, the task arguments can appear in any order. Two task parameters in the current version of getref, long and old, will not be in the new version. The first has been replaced by the detail output and the second is no longer supported, as the old cdbs database will no longer be supported.

3: Output

Output from getref will either be formatted as an ascii table or an html table, depending on whether the task is invoked from the command line or by the cgi interface. The traditional output table will contain the following columns:

In the html version of the table the values in the reference file type column will be links to generate the corresponding detail information. There will also be a form to perform another traditional query.

The detail output will contain selected fields in the file and row level relations. The following fields will be selected from the file level relation:

The observation mode and pedigree information (pedigree, observation start date, and observation end date) will be selected from the row level relation. The information in the row level relation corresponding to the observation mode of the observation will be printed.

4: Main Flow

The main procedure of getref is fairly simple. Here is the pseudocode:

Parse input and set input type (command line or html)
Determine query type (traditional or detail)
Check value of root against query type and input type 
IF traditional query THEN
    Do traditional query, save results in table
ELSIF detail query THEN
    Do detail query, save results in table
END
Print table according to input type

Because the format of the input and output varies according to the input type, the queries get their input and generate their output using intermediate formats. Both formats use code already written for cdbs. The input parameters are stored as an associative list (alist) and the output from the queries is stored in an in-memory table. The code which builds the associative list from command line arguments is already written and tested and can be found in main.c and option.c in the cdbslib library. Similarly, the code to manipulate in-memory tables can be found in table.c.

5: Parse input

The code to parse the command line input is mostly written. The sole exception is that the current code only places optional parameters in the associative list, so the following additional section of code is necessary to add the rootname to the alist.

root[0] = EOS;
for (iarg = 1; iarg < argc; iarg ++) {
    if (iarg > 1)
        safecat (root, ",", SZ_PARAM);
    safecat (root, argv[iarg], SZ_PARAM);
}
put_option ("root", root);

The functions safecat and put_option can be found in safecat.c and option.c in the cdbslib library.

The code to parse the cgi form input and write it into the associative list will be placed in function get_form_input. Here is an abbreviated version of the code for this function.

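void get_form_input()
{
    /* Declarations are omitted in this abbreviated version.
       buf is assumed to hold SZ_BUF+1 characters. */
    method = getenv ("REQUEST_METHOD");
    if (method == NULL) {
        /* Not run by a web server: input comes from the command line */
        put_option ("HTML", "no");
        return;
    } else if (strcmp (method, "GET") == 0) {
        /* Form data is passed in the query string */
        input = getenv ("QUERY_STRING");
        safecopy (buf, input, SZ_BUF);
        parse_cgi (buf);
        put_option ("HTML", "yes");
    } else if (strcmp (method, "POST") == 0) {
        /* Form data is read from standard input */
        len_input = getenv ("CONTENT_LENGTH");
        len = atoi (len_input);
        if (len > SZ_BUF)
            len = SZ_BUF;
        rdlen = fread (buf, 1, len, stdin);
        buf[rdlen] = EOS;    /* fread does not terminate the string */
        parse_cgi (buf);
        put_option ("HTML", "yes");
    }
}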

The function checks whether input is coming from a cgi script and sets the option HTML accordingly. It reads the cgi form input into the variable buf and calls parse_cgi to convert the buffer into the associative list. An abbreviated version of the code for parse_cgi is:

void parse_cgi (char *buf)
{
    strsplit ('&', buf, param, MAXPARAM);
    for (ipar = 0; param[ipar][0] != EOS; ipar ++) {
        strsplit ('=', param[ipar], pair, 2);
        decode_cgi (pair[0]);
        decode_cgi (pair[1]);
        put_option (pair[0], pair[1]);
    }
}

Strsplit is an existing function found in strsplit.c in the cdbslib library. It splits a string into fields using its first argument as a delimiter. Decode_cgi is a new function which converts + to blanks and hexadecimal constants into characters.

6: Query type and Root check

The query type is determined by the presence of reffile. If reffile is present, the query type is detail, and if it is absent, the query type is traditional.

If the option HTML is set to yes, the task checks to see if the value of root starts with an @. If so, the task terminates with an error, as files cannot be read across the net. If the query type is not traditional, the task checks to see if the value of root starts with an @ or contains a comma. If so, the task terminates with an error, since only the traditional query can process more than one root name.
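
The two checks can be sketched as follows. This is only an illustration: the function names query_type and check_root, the constants, and the message texts are assumptions, and the real task would fetch reffile and root from the associative list and report errors through the cdbs error routines.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constants for the two query types */
enum { QUERY_TRADITIONAL, QUERY_DETAIL };

/* The query is a detail query if reffile was supplied.
   reffile is NULL or empty when the parameter is absent. */
int query_type (const char *reffile)
{
    if (reffile != NULL && reffile[0] != '\0')
        return QUERY_DETAIL;
    return QUERY_TRADITIONAL;
}

/* Return NULL if root is acceptable, otherwise an error message.
   html is nonzero when the option HTML is yes. */
const char *check_root (const char *root, int html, int qtype)
{
    assert (root != NULL);

    /* A file of root names cannot be read across the net */
    if (html && root[0] == '@')
        return "root names cannot be read from a file over the net";

    /* Only the traditional query handles more than one root name */
    if (qtype != QUERY_TRADITIONAL &&
        (root[0] == '@' || strchr (root, ',') != NULL))
        return "only the traditional query accepts several root names";

    return NULL;
}
```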

7: Traditional query

The traditional query is actually done as several queries, since the user may request information about more than one observation set. So the main procedure for the traditional query is structured as a loop preceded with the initialization and followed by the cleanup. The pseudocode for the function is:

Create and initialize output table
Get value of root
WHILE another observation set read from root DO
    Initialize query
    Allocate query variables
    Bind query variables
    Build query string
    Execute query string
    Write query variables into table
    Free query variables
    Close query
ENDWHILE
IF output table is empty THEN
    stop with error "observation not found in db"
ENDIF
Return pointer to output table

Since the parameter root may contain more than one observation set name, the task will require an iterator to retrieve one observation set name at a time from the parameter. To complicate things more, the information may be contained in a file or read from a string. In order to disguise these complications, the task will use the following structure for the iterator:

typedef struct {
    int type;
    union {
        FILE *file;
        char *str;
    } ptr;
} root_iterator;

The type field contains a value indicating whether the iterator is reading its information from a file or a string. The ptr field contains either the file pointer, if input is coming from a file, or a pointer to the next unread character in the string, if input is coming from a string.

There are three functions associated with the iterator: start_read_root, read_root, and end_read_root. The start function allocates the iterator structure, checks the first character of root to see if it names a file, and either opens the file or sets the string pointer to the first character of root. The read function checks the type of the iterator. If it is a file, it reads a line from the file; if it is a string, it reads characters from the string until the next comma or the end of the string. If the pointer is already at the end of the file or string, the function returns a null pointer. Otherwise, the function checks for an extension on the name and removes it if found. The end function closes the file if there was one and releases the structure. The interface to these functions is:

root_iterator *start_read_root (char *root);
char *read_root (root_iterator *rp);
void end_read_root (root_iterator *rp);
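
A possible implementation of the three functions is sketched below. Several details are assumptions made for the sketch: the constants ROOT_FILE and ROOT_STRING, the size SZ_ROOT, returning the name in a static buffer that is overwritten by the next call, and treating everything after the last period as the extension.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SZ_ROOT 63                  /* hypothetical maximum name length */
enum { ROOT_FILE, ROOT_STRING };    /* hypothetical values of type */

typedef struct {
    int type;
    union {
        FILE *file;
        char *str;
    } ptr;
} root_iterator;

/* Returns NULL if memory runs out or the file cannot be opened */
root_iterator *start_read_root (char *root)
{
    root_iterator *rp;

    assert (root != NULL);
    if ((rp = malloc (sizeof *rp)) == NULL)
        return NULL;

    if (root[0] == '@') {
        rp->type = ROOT_FILE;
        rp->ptr.file = fopen (root + 1, "r");
        if (rp->ptr.file == NULL) {
            free (rp);
            return NULL;
        }
    } else {
        rp->type = ROOT_STRING;
        rp->ptr.str = root;
    }
    return rp;
}

/* Returns the next name with its extension removed, or NULL at the end */
char *read_root (root_iterator *rp)
{
    static char name[SZ_ROOT+1];
    size_t len, ncopy;
    char *dot;

    if (rp->type == ROOT_FILE) {
        if (fgets (name, sizeof name, rp->ptr.file) == NULL)
            return NULL;
        name[strcspn (name, "\r\n")] = '\0';    /* drop the newline */
    } else {
        if (*rp->ptr.str == '\0')
            return NULL;
        len = strcspn (rp->ptr.str, ",");
        ncopy = (len < SZ_ROOT) ? len : SZ_ROOT;
        memcpy (name, rp->ptr.str, ncopy);
        name[ncopy] = '\0';
        rp->ptr.str += len;
        if (*rp->ptr.str == ',')
            rp->ptr.str ++;                     /* skip the separator */
    }

    if ((dot = strrchr (name, '.')) != NULL)
        *dot = '\0';
    return name;
}

void end_read_root (root_iterator *rp)
{
    if (rp->type == ROOT_FILE)
        fclose (rp->ptr.file);
    free (rp);
}
```

With this interface the loop in the traditional query does not need to know where the names come from: it calls read_root until a null pointer is returned.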

There is a separate database table containing the information for each instrument. The column names and number of columns differ for each table. To simplify matters, the information needed to perform each query will be stored in arrays of strings indexed by instrument. The following information will be stored for each instrument:

Since the number and names of the fields to be bound for the traditional query vary according to instrument, they must be allocated dynamically. The query string also (obviously) varies with the names of the fields. The query to retrieve the information will look like:

select $oldfile1, $oldfile2, ... $bestfile1, $bestfile2 ... 
from $dbtable where $obsetfield = '$obsetname'

The words prefixed by dollar signs in the query represent variables, whose values will vary according to the query performed.

A function, build_query, will be written to simplify building the query. The function will take a variable length argument list. The first argument will be the query format and the remaining arguments will be the strings used to fill in the format. Arguments are written into the format wherever a caret appears in the format. Arrays will be supported by having the format contain repeating text, surrounded by braces, and optional text, surrounded by brackets, which is output for every element of the array except the first. The format for the above query is

select {[,] ^}, {[,] ^} from ^ where ^ = '^'

The function build_query will be implemented using a state machine and a stack. Text will be written to an internal buffer. When the function finds a blank, it will check to see if the buffer is nearly full. If so, it will flush the buffer to the database server. This allows the function to construct an arbitrarily long query. The function will be a somewhat enhanced version of adbsetfmt.x rewritten in C.
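
To make the format rules concrete, here is a much simplified sketch of the expansion. It is not the planned implementation: it writes into a buffer supplied by the caller instead of flushing to the database server, it uses a plain scanner rather than a state machine and a stack, and it does not handle nested braces. The output buffer arguments, the use of null terminated string arrays for the repeated parts, and the helper names append and expand_item are all assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Append len characters of text to out; return 0 if they do not fit */
static int append (char *out, size_t maxch, const char *text, size_t len)
{
    size_t used = strlen (out);

    if (used + len >= maxch)
        return 0;
    memcpy (out + used, text, len);
    out[used + len] = '\0';
    return 1;
}

/* Expand the text between braces once for a single array element.
   Text in brackets is skipped for the first element. */
static int expand_item (char *out, size_t maxch, const char *body,
                        size_t len, const char *elem, int first)
{
    int optional = 0;
    size_t i;

    for (i = 0; i < len; i ++) {
        if (body[i] == '[') {
            optional = 1;
        } else if (body[i] == ']') {
            optional = 0;
        } else if (optional && first) {
            continue;
        } else if (body[i] == '^') {
            if (! append (out, maxch, elem, strlen (elem)))
                return 0;
        } else if (! append (out, maxch, body + i, 1)) {
            return 0;
        }
    }
    return 1;
}

/* A caret takes a (const char *) argument; a brace group takes a
   null terminated (const char **) array. Returns 0 on failure. */
int build_query (char *out, size_t maxch, const char *fmt, ...)
{
    va_list args;
    const char *p, *end, *arg, **list;
    int i, ok = 1;

    assert (out != NULL && maxch > 0 && fmt != NULL);
    out[0] = '\0';

    va_start (args, fmt);
    for (p = fmt; ok && *p != '\0'; p ++) {
        if (*p == '{') {
            list = va_arg (args, const char **);
            if ((end = strchr (p, '}')) == NULL) {
                ok = 0;
                break;
            }
            for (i = 0; ok && list[i] != NULL; i ++)
                ok = expand_item (out, maxch, p + 1,
                                  (size_t) (end - p - 1), list[i], i == 0);
            p = end;
        } else if (*p == '^') {
            arg = va_arg (args, const char *);
            ok = append (out, maxch, arg, strlen (arg));
        } else {
            ok = append (out, maxch, p, 1);
        }
    }
    va_end (args);
    return ok;
}
```

Called with the format shown above, two arrays of field names, and the table name, key field, and observation set name, the sketch produces a single select statement in which the field names of each array are separated by commas.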

Executing the query will retrieve either zero rows or one row, since the observation set name is a unique key to the database table. However, the output contains one row for each type of reference file, assuming any results were retrieved from the database. The output table has four columns: the observation set name (repeated for each of the rows retrieved), the header keyword name, the old reference file name, and the new reference file name. If the database is being queried from a cgi form, the header keyword name should be a link to the corresponding historical query. So the code will check the value of HTML and, if it is yes, will write the html code for the link to the second column instead of the keyword name.

8: Detail query

There will be two forms of the detail query. The first will use the observation mode, as determined from the root name, to select the proper row from the row level relation. The second will not use the root name, but will select an essentially arbitrary row from the row level relation. The reason for the second form of the detail query is to allow information about the comparison file to be viewed when the observation mode of the comparison file does not match the mode of the observation. The html output from the detail query will include a link so that another detail query can be done on the comparison file. Since the results of this query will also have a link to its comparison file, the whole chain of comparison files will be viewable. The second form of the detail query will also be used if the user does not supply a value for root.

The pseudocode for the detail query is:

Create and initialize output table
Get value of root and refname
Get observation mode for root (first query)
IF root is not blank THEN
    Get row and file level info selected on refname and 
    observation mode (second query)
    Write results into output table
    IF no results THEN
        Get row and file level info selected on refname 
        only (third query)
        Write results into output table
    ENDIF
ELSE
    Get row and file level info selected on refname 
    only (third query)
    Write results into output table
ENDIF
Return pointer to table

The output of the detail query will be written into the table in two columns, the first containing the field name and the second the field value.

The following additional information is required for each instrument to support the detail query:

The first query will be packaged as a separate function. The function call will look like:

void qry_obsmode (char *reffile, char **cdbsnames, 
                  char **dadsnames, int maxname);

The actual number of strings returned is indicated by writing a null string at the end of the output arrays. The query that retrieves the observation mode from the database looks like:

select a.modes from cdbs_ops..cdbs_mode a where
a.instrument = '$inst' and a.file_select = 'Y' and
a.reference_file_type in (select distinct
b.reference_file_type from cdbs_ops..hrs_file b where
b.file_name = '$reffile')

This retrieves the cdbs observation mode names. The corresponding names of the DADS tables will be retrieved from an array to fill the second output array.

The first form of the query to return the detail information looks something like:

select a.file_name, a.reference_file_type, b.$obsmode1,
b.$obsmode2, ... , a.expansion_number, a.delivery_number,
a.useafter_date, a.general_availability_date,
a.opus_load_date, a.archive_date, a.reject_flag,
a.comparison_file_name, b.pedigree,
b.observation_begin_date, b.observation_end_date, a.comment
from cdbs_ops..$filetab a, cdbs_ops..$rowtab b,
dadsops..$dadstab c where a.file_name = '$refname' and
b.file_name = '$refname' and a.expansion_number =
b.expansion_number and b.$obsmode1 = c.$obsmode1 and
b.$obsmode2 = c.$obsmode2 and .... and c.hsr_data_set_name
= '$root'

If this query does not return any results, getref will do the second form of the detail query, which eliminates the observation mode condition of the first form.

select a.file_name, a.reference_file_type, b.$obsmode1,
b.$obsmode2, ... , a.expansion_number, a.delivery_number,
a.useafter_date, a.general_availability_date,
a.opus_load_date, a.archive_date, a.reject_flag,
a.comparison_file_name, b.pedigree,
b.observation_begin_date, b.observation_end_date, a.comment
from cdbs_ops..hrs_file a, cdbs_ops..hrs_row b where
a.file_name = '$refname' and b.file_name = '$refname' and
a.expansion_number = b.expansion_number

The output fields produced by both forms of the query are identical.

9: Output

There are two types of output from getref: error output, produced when the task encounters an error condition, and the normal task output. Since getref will be runnable from a cgi form, it is important that error output be translated into html for display on the web browser. Currently all error output goes to the function cdbs_message in cdbserror.c. This function will be modified so that it is possible to post an error message handler to replace the function. If an error message handler is posted, it will be called with the same arguments as the current function:

void cdbs_message (const int flag, const char *message)

Getref will post a message handler which checks flag to see if the message is an error message (as opposed to a warning or informational message) and, if so, formats the message as an html page.
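
One way the handler mechanism might look is sketched here. Everything except the signature of cdbs_message is an assumption: the posting function post_message_handler, the flag value MSG_ERROR, the handler name html_message, and the fallback of printing to stderr, which only stands in for whatever cdbs_message does today.

```c
#include <assert.h>
#include <stdio.h>

#define MSG_ERROR 1    /* hypothetical flag value for an error message */

typedef void (*message_handler) (const int flag, const char *message);

static message_handler posted_handler = NULL;

/* Hypothetical function to post a handler, or remove it with NULL */
void post_message_handler (message_handler handler)
{
    posted_handler = handler;
}

/* Stand-in for cdbs_message: defer to the posted handler if any */
void cdbs_message (const int flag, const char *message)
{
    assert (message != NULL);

    if (posted_handler != NULL)
        posted_handler (flag, message);
    else
        fprintf (stderr, "%s\n", message);
}

/* Handler getref might post: print error messages as an html page,
   escaping the characters that are special in html */
void html_message (const int flag, const char *message)
{
    const char *p;

    if (flag != MSG_ERROR)
        return;

    printf ("Content-type: text/html\n\n");
    printf ("<html><head><title>getref error</title></head><body><p>");
    for (p = message; *p != '\0'; p ++) {
        switch (*p) {
        case '<': fputs ("&lt;", stdout);  break;
        case '>': fputs ("&gt;", stdout);  break;
        case '&': fputs ("&amp;", stdout); break;
        default:  putchar (*p);            break;
        }
    }
    printf ("</p></body></html>\n");
}
```

Getref would post the handler once, after it has determined that it is running from a cgi form, so that command line users continue to see the existing messages.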

The principal output of getref is the table generated from the database query. If the value of HTML is yes, the table is converted into an html table. Otherwise output is written as ascii. The ascii output will have a comment line (preceded by a # character) containing the table column names followed by one line per row of the table. Columns within the row will be separated by a tab character. Ascii output produced from the traditional query should be identical to the short form output from the current version of getref.
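
The ascii format can be illustrated with a short sketch. It does not use the in-memory table code in table.c, whose interface is not described here; instead it takes the column names and cell values as plain arrays of strings. The function name print_ascii_table, the blank after the # character, and the use of tabs between the column names are assumptions.

```c
#include <assert.h>
#include <stdio.h>

/* Print a table as ascii: one comment line starting with '#' that
   holds the column names, then one line per row with the columns
   separated by tabs. cell holds nrow*ncol strings stored row by row.
   Returns the number of lines written. */
int print_ascii_table (FILE *out, const char **colname, int ncol,
                       const char **cell, int nrow)
{
    int irow, icol;

    assert (out != NULL && ncol > 0 && nrow >= 0);

    fputc ('#', out);
    for (icol = 0; icol < ncol; icol ++)
        fprintf (out, "%s%s", icol > 0 ? "\t" : " ", colname[icol]);
    fputc ('\n', out);

    for (irow = 0; irow < nrow; irow ++) {
        for (icol = 0; icol < ncol; icol ++)
            fprintf (out, "%s%s", icol > 0 ? "\t" : "",
                     cell[irow * ncol + icol]);
        fputc ('\n', out);
    }
    return nrow + 1;
}
```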

To simplify the job of creating html pages, html page templates for the pages that getref will produce will be stored in the cdbs data directory. The template pages will contain html comments that will serve as variable names. When the template is printed the comments will be replaced by text generated by getref. The function supporting template printing is:

char *html_template (FILE *fp)

It takes a file pointer to the template and prints the contents to stdout until it finds a variable. It then returns the name of the variable to the calling function, which uses the information to print its own output to stdout. The calling function calls html_template in a while loop so that successive variables can be filled. When the entire template has been printed, html_template returns a null pointer, ending the while loop.
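
A sketch of html_template under some assumptions follows. It assumes that every html comment in the template is a variable, that the variable name is the comment text with surrounding blanks removed, and that names fit in SZ_VARNAME characters (a hypothetical limit). The name is returned in a static buffer that is overwritten by the next call.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

#define SZ_VARNAME 63    /* hypothetical limit on variable name length */

char *html_template (FILE *fp)
{
    static const char opener[] = "<!--";
    static char name[SZ_VARNAME+1];
    size_t matched = 0, n = 0;
    int c, dashes = 0;
    char *start;

    assert (fp != NULL);

    /* Copy template text to stdout until a comment opener is found */
    while (matched < 4) {
        c = getc (fp);
        if (c == EOF) {
            fwrite (opener, 1, matched, stdout);
            return NULL;
        }
        if (c == opener[matched]) {
            matched ++;
        } else {
            /* False start: print the characters that were held back */
            fwrite (opener, 1, matched, stdout);
            matched = (c == '<') ? 1 : 0;
            if (matched == 0)
                putchar (c);
        }
    }

    /* Collect the comment text up to the closing "-->" */
    while ((c = getc (fp)) != EOF) {
        if (c == '>' && dashes >= 2)
            break;
        dashes = (c == '-') ? dashes + 1 : 0;
        if (n < SZ_VARNAME)
            name[n ++] = (char) c;
    }
    if (c == EOF)
        return NULL;    /* unterminated comment: treat as end of template */

    /* Trim the closing dashes and the surrounding white space */
    while (n > 0 && (name[n-1] == '-' ||
                     isspace ((unsigned char) name[n-1])))
        n --;
    name[n] = '\0';
    for (start = name; isspace ((unsigned char) *start); start ++)
        ;
    return start;
}
```

The caller compares each returned name against the variables it knows how to fill, prints the corresponding text to stdout, and calls html_template again until it returns a null pointer.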

10: Upreffile interface

The task upreffile takes the output produced by getref and uses it to update the header of the observation so that it can be recalibrated. Since the traditional output of the new version of getref is unchanged, upreffile will continue to work with the new version of getref. But since the current version of getref is only available locally and we want to make the functionality available to all users of stsdas, upreffile will be rewritten as a cl script which calls hedit to update the header. The script will be downloadable from the web page containing the cgi form for getref so that users can install it at their site.

