Frequently Asked Questions

OPUS Applications

What are the advantages of using OPUS?
What kind of processes can be run in the pipeline?
Can a script read from STDIN (standard input)?
Can I use command line arguments for my tasks?
How are the values of environment variables set?
What is the difference between an external and an internal poller?
How do I create a C++ internal poller?
Can I create an internal poller in a language besides C++?
Is there any way to run my tasks on other (unsupported) platforms?
How do I add a processing step to a pipeline?
Are there any limitations on naming a new task?
What are some of the common gotchas I should beware of when I write my OPUS tasks?
What kind of message reporting does OPUS provide?

OPUS Applications

What are the advantages of using OPUS?

OPUS is designed to help you distribute processing over a network of machines and to allow you to start up a number of separate instances of any task. The central objective is throughput: performing the analysis of a number of independent datasets robustly and efficiently.

A useful feature of this system is the ability to monitor the status of both datasets and processes.

What kind of processes can be run in the pipeline?

Any process that can be run from a shell script: either a simple shell script or an executable invoked from a shell script. Those executables can be written in any language, including IRAF and IDL.

To take full advantage of the OPUS environment, tasks can be written as internal pollers built with the OAPI (the OPUS Application Programming Interface).

The default size of the "PROCESS" field in a process status entry is 15 characters, which effectively limits your process names to 15 characters or fewer. However, OPUS supports changes to the process status entry structure, including the size of this field.

Can a script read from STDIN (standard input)?

No.

Pipeline tasks run in the background, so the script should get its arguments from the command line, from environment variables, or from the process resource file. All keywords prefixed with ENV. in the process resource file become environment variables for external pollers, so these values are easily accessible.

Can I use command line arguments for my tasks?

Yes.

This is a standard way to get information into your task. OPUS pipeline tasks run in the background, but command line arguments can be specified in the process resource file. This is a convenient way to use the same task for slightly different functions.
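As a minimal sketch, a Python task can gather its inputs from the command line and the environment instead of STDIN. The `--mode` option and the `OUTDIR` variable are hypothetical. `OUTDIR` stands in for a value supplied through an ENV. keyword in the process resource file.

```python
import argparse
import os


def parse_task_inputs(argv=None, environ=None):
    """Collect task inputs from the command line and the environment (never STDIN)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Example OPUS pipeline task")
    # Hypothetical option: lets one script serve slightly different functions.
    parser.add_argument("--mode", choices=["full", "quick"], default="full")
    args = parser.parse_args(argv)
    # Hypothetical variable, e.g. defined by an ENV.OUTDIR resource file keyword.
    outdir = environ.get("OUTDIR", ".")
    return args.mode, outdir
```

Called as `parse_task_inputs()`, it reads `sys.argv` and `os.environ`. Passing explicit lists and dicts, as in `parse_task_inputs(["--mode", "quick"], {"OUTDIR": "/tmp/out"})`, makes the task easy to test outside the pipeline.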

How are the values of environment variables set?

The environment variables used by the OPUS managers (the Observation Manager and the Process Manager) are not the same as those used by the pipeline processes. The pipeline processes must first have their environment variables set by your opus_login.csh file.

Keywords in the process resource file that are prefixed with ENV. are defined, together with their values, as environment variables for external pollers. The ENV. prefix does not become part of the variable name: for example, a keyword ENV.OUTDIR would become the environment variable OUTDIR. These environment variables are available only in the shell in which your task is run. This mechanism is not needed for internal pollers, since they have programmatic access to resource file values.
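The ENV. rule can be modeled with a short parser. This is a simplified, hypothetical Python sketch, because OPUS performs the mapping itself. It assumes one `KEYWORD = value` pair per line and strips trailing `!` comments, as in the resource file examples in this FAQ. It does not handle values that themselves contain `!`.

```python
def env_from_resource_lines(lines):
    """Map ENV.-prefixed resource keywords to environment variable names.

    The ENV. prefix is dropped, so "ENV.OUTDIR = /data/out" yields
    OUTDIR=/data/out. Keywords without the prefix are ignored.
    """
    env = {}
    for raw in lines:
        # Drop a trailing "!" comment, then surrounding whitespace.
        line = raw.split("!", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("ENV."):
            env[key[len("ENV."):]] = value
    return env
```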

The OPUS system defines additional environment variables when a task is started in response to an event. First, every task has access to the EVENT_TYPE variable, which takes one of the following three values:

   EVENT_FILE_EVENT, EVENT_OSF_EVENT, or EVENT_TIME_EVENT

Each of these values corresponds to the type of trigger that caused the event. The number of items in the event is placed in the EVENT_NUM variable. Unless you have configured your application to handle more than one item per event, EVENT_NUM will be 1.

Tasks that are triggered by a file event additionally have access to the EVENT_NAME variable:

   EVENT_NAME      The filename which triggers the event.

If EVENT_NUM is greater than 1, then additional environment variables of the form EVENT_NAME1, EVENT_NAME2, etc. are defined.
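A Python task might collect the file names of an event as shown below. This is a sketch under one stated uncertainty: the text above does not say whether the unnumbered EVENT_NAME repeats one of the numbered variables. The helper therefore takes EVENT_NAME first, appends the numbered variables in numeric order, and skips duplicates.

```python
import os
import re


def file_event_names(environ=None):
    """Return the file names carried by a file event (empty for other event types)."""
    environ = os.environ if environ is None else environ
    if environ.get("EVENT_TYPE") != "EVENT_FILE_EVENT":
        return []
    names = []
    if "EVENT_NAME" in environ:
        names.append(environ["EVENT_NAME"])
    # Numbered variables (EVENT_NAME1, EVENT_NAME2, ...) appear when EVENT_NUM > 1.
    numbered = []
    for key, value in environ.items():
        match = re.fullmatch(r"EVENT_NAME(\d+)", key)
        if match:
            numbered.append((int(match.group(1)), value))
    for _, value in sorted(numbered):
        if value not in names:  # assumes file names within one event are distinct
            names.append(value)
    return names
```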

Tasks that are triggered by an OSF have access to all the information in that OSF. The task can also get the name of the OSF trigger defined in the task's resource file. The OSF trigger names are OSF_TRIGGER1, OSF_TRIGGER2, etc. Here is the list of the environment variables defined for each OSF event:

   OSF_DATASET     The name of the exposure that triggered the task.
   OSF_DATA_ID     The type of the exposure (by default, a 3-character descriptor).
   OSF_DCF_NUM     An arbitrary sequence number.
   OSF_START_TIME  The time the exposure started in the pipeline.
   OSF_EVENT       The OSF trigger name from the resource file.

As with file events, if EVENT_NUM is greater than 1, there will be additional environment variables with a numeric suffix (e.g., OSF_DATA_ID1) for each item in the event.
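Reading one item's OSF fields is then a matter of looking up the listed names with an optional numeric suffix. Here is a hypothetical Python helper, sketched under the naming scheme described above:

```python
import os

# OSF event variables listed above; numbered items add a suffix (OSF_DATA_ID1, ...).
OSF_FIELDS = ("OSF_DATASET", "OSF_DATA_ID", "OSF_DCF_NUM", "OSF_START_TIME", "OSF_EVENT")


def osf_item(environ=None, suffix=""):
    """Return one event item's OSF fields; suffix "1" selects OSF_DATASET1, etc."""
    environ = os.environ if environ is None else environ
    return {field: environ.get(field + suffix) for field in OSF_FIELDS}
```

`osf_item()` returns the unsuffixed fields, and `osf_item(suffix="1")` returns the fields ending in 1. Fields that are not defined come back as `None` instead of raising an error.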

Time events have no event-related environment variables defined other than EVENT_TYPE.

You can use the values of these environment variables as command line arguments to the tasks you write, or in the bodies of the tasks themselves. See the path file section for more details on the relationship between path file variables and the environment variables from process resource files.

Additional task-related environment variables that are available to each process include:

   PROCESS_NAME    The name of the OPUS task.
   PATH_FILE       The full path-name of the path file.
   PATH_BASENAME   The rootname.extension of the path file.
   PATH_BASEROOT   The rootname of the path file.
   TIME_STAMP      The encoded time stamp for the process start time.
   OPUS_LOG_FILE   The full path-name of the process log file.

What is the difference between an external and an internal poller?

External polling processes are programs or scripts that have no knowledge of how the OPUS blackboard works. These processes are invoked through the OPUS task XPOLL (eXternal POLLer). Most of the sample pipeline applications are external pollers. The g2f task is the only exception; it was implemented using the OAPI.

The OPUS system uses information in the process resource file to decide when to activate a process. For external pollers, xpoll responds to an event by spawning the associated process. That process then uses its exit status code to report back to xpoll how successfully it processed the event. xpoll maps the code to specific keyword values in the process resource file and informs the OPUS system of the disposition of the event. The OPUS system starts external pollers each time work is required. They then exit, to be started again later when more work is needed.

Internal polling processes, like g2f, are programs written with knowledge of how the OPUS blackboard works. They are typically processes with significant start-up overhead (e.g., reading a database). Such a process performs the start-up work once and then enters a polling loop to wait for pipeline events. It stays active as it polls for and processes events. Internal pollers are built using the OAPI to communicate with the OPUS system, and they can respond to a reinitialization command.
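The structure described above can be sketched independently of the OAPI: pay the start-up cost once, then loop. In this Python sketch the four callables are hypothetical stand-ins for services that a real internal poller obtains through the OAPI. The `max_polls` parameter exists only so that the loop can be exercised in a test.

```python
from typing import Callable, Optional


def run_internal_poller(
    startup: Callable[[], dict],
    next_event: Callable[[], Optional[str]],
    process: Callable[[dict, str], None],
    reinit_requested: Callable[[], bool],
    max_polls: Optional[int] = None,
) -> int:
    """Do the start-up work once, then poll for and process events.

    All four callables are hypothetical stand-ins for OAPI services.
    Returns the number of events processed.
    """
    state = startup()  # significant one-time overhead, e.g. reading a database
    handled = 0
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        if reinit_requested():
            state = startup()  # redo start-up work on a reinitialization command
        event = next_event()  # None: no work this time (a real poller would wait)
        if event is None:
            continue
        process(state, event)
        handled += 1
    return handled
```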

How do I create a C++ internal poller?

C++ internal pollers can be written via the OAPI and linked using the provided OPUS libraries. To see how this is done, simply follow the example C++ internal poller which comes with OPUS.

Can I create an internal poller in a language besides C++?

Yes.

Internal pollers are often written in C++, but can be written in other languages, with some restrictions. Included with OPUS are two examples of non-C++ internal pollers, one written in Python and one in Java.

Is there any way to run my tasks on other (unsupported) platforms?

Yes.

In general, OPUS tasks (processes) run either on the same platform as the blackboard servers or on another node that is cross-mounted to that platform. In either case, the operating system of every node used should be on the list of supported and tested operating systems for OPUS.

However, it is now possible to run certain types of internal pollers on essentially any platform (e.g., Mac OS X). This requires neither a complete port of OPUS to that platform nor any special disk mounting. This FAQ information applies to tasks written in any language that supports CORBA. For simplicity, this example discusses Java internal pollers, since Java is widely available and, in releases before Java SE 11 (which removed the CORBA modules from the JDK), has CORBA support built in. Such a Java task can run on any platform that provides both Java and a remote-shell capability such as SSH.

This may be useful to you if:

Some of your processing is better suited to a node or operating system on which OPUS is not yet supported, or
You have a large amount of processing to do and would like to spread it across a network which includes operating systems on which OPUS is not yet supported, or
You would simply like to employ the CPU power of some extra nodes which are not cross-mounted to the disk(s) containing your OPUS installation.

Caveats:

This does not mean that all of OPUS can be run anywhere; this applies only to certain OPUS tasks. There must still be a node with a supported operating system on which OPUS is installed and on which the OPUS servers are running.
You should be familiar with configuring/running your own internal poller.
You should be aware that non-C++ internal pollers (that is, users of the OpusUser IDL interface) do not currently have the same depth of access to the OAPI as OAPI-linked C++ internal pollers.
You must at least be able to run a remote-shell command (e.g. ssh) on the node in question without being prompted for a password. This is configurable on most flavors of Unix. It is even possible on Windows with an SSH daemon service running, though we have not yet tested that.
If this mechanism is being used because the node in question does not have disk access to where the servers (and possibly your mission data) are, then clearly the task in question should not require such disk access. However, many tasks might not need such disk access. For example, perhaps the task performs its work via a database - available across the network. Or, maybe the task itself first collects its data from across the network (or internet) before starting.

There are many possible reasons to use this kind of task. The way your task runs and what it does are limited only by your pipeline needs.

How is this done? There are notes at the end of the Java Internal Poller Example which describe the necessary steps.

How do I add a processing step to a pipeline?

There are three things that are required: a new script or OAPI application for that step, a new corresponding process resource file, and an update to the pipeline.stage file.

Script: This is the shell script which either does the work or which runs (one or more) tasks that process the data. It is better, however, to have more steps in the pipeline than to have a single script perform more than one function.
Process Resource File: This file describes the task to OPUS and defines the necessary directory and polling parameters. The TASK line in the process resource file refers to the name of your new script. For example, to insert a task called listhd (included with the sample pipeline) that lists the FITS headers to an ASCII file, the TASK line would be:
```
   TASK = < xpoll -p $PATH_FILE -r listhd >
```
You then need to define the triggers for this process. In the simplest case, the task will be triggered by a flag in the Observation Status File. You need to select a two-character mnemonic for your task; this will be the OSF stage title which the pipeline.stage file uses as the name of the column on the Observation Manager's display. We'll use "LH" in this case. You might just make the trigger:
```
   OSF_RANK = 1            ! First Trigger
   OSF_TRIGGER1.LH = w     ! Need a "Wait" flag in LH column
```
You can then tell OPUS how to set the flags during processing, when processing has completed normally, and when the task encountered an error with the observation:
```
   OSF_PROCESSING.LH  = p          ! Set the processing flag to "Processing"
   OSF_OK.LH          = c          ! Completed header listing
   OSF_OK.NX          = w          ! Waiting for the next step NX
   OSF_ERROR.LH       = e          ! Error:  Set the error flag
```
The link between the return status of the script and these flags is defined by the XPOLL_STATE keywords in this file:
```
   XPOLL_STATE.01 = OSF_OK
   XPOLL_STATE.03 = OSF_ERROR
```
Thus, when your process completes with "exit 1", OPUS applies the OSF_OK settings. There may be more than one of these, as with OSF_OK.LH and OSF_OK.NX above. When the task encounters an error and completes with "exit 3", OPUS applies the OSF_ERROR settings.
There are a number of standard resource file lines which should also be present; see the more complete discussion of process resource files.
Because your new task will likely fit in the pipeline after an existing task, you will probably want to modify the preceding task's process resource file so, on normal completion of that first task, the Observation Status File flag is set properly for your new process.
Stage file: This defines the names of the columns on the OMG (Observation Manager) display. If this is the third step in the pipeline, you just need to include a new set of three lines like:
```
   STAGE03.TITLE = LH
     STAGE03.DESCRIPTION = "List Headers"
     STAGE03.PROCESS01 = listhd
```
In addition, remember to increment the NSTAGE line in this file.
```
   NSTAGE = 6
```
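The exit-status convention from the XPOLL_STATE example can be wrapped in a small helper. Here is a Python sketch that assumes the resource file maps 1 to OSF_OK and 3 to OSF_ERROR, as in the example above. The name `list_headers` is a hypothetical placeholder for the task's real work. Note that exit status 1 signals success under this mapping, unlike the usual Unix convention of 0.

```python
# Exit codes matching the example resource file:
#   XPOLL_STATE.01 = OSF_OK     XPOLL_STATE.03 = OSF_ERROR
EXIT_OK = 1     # success under this mapping (not 0, as Unix convention suggests)
EXIT_ERROR = 3


def run_task(work) -> int:
    """Run the task's work and return an exit code that xpoll maps to OSF flags."""
    try:
        work()
    except Exception as exc:  # any failure still ends with a known status
        print(f"ERROR: {exc}", flush=True)
        return EXIT_ERROR
    return EXIT_OK


def list_headers() -> None:
    """Hypothetical placeholder for the real work of the listhd step."""
    print("listing FITS headers", flush=True)
```

A script would end with `sys.exit(run_task(list_headers))` (after `import sys`), so that xpoll receives 1 on normal completion and 3 on any error.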

Are there any limitations on naming a new task?

Yes.

The name of the task is used in the construction of the process status entry, and that entry has a fixed, limited number of characters to hold the task name. The default value for that limit, however, can be changed (see "PROCESS.SIZE").
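A naming check can be scripted before a new task is added. In this Python sketch, the 15-character default comes from the discussion of the PROCESS field. Pass a different `limit` if your installation has changed PROCESS.SIZE. The function name is hypothetical.

```python
DEFAULT_PROCESS_SIZE = 15  # default width of the PROCESS field, in characters


def fits_process_field(name: str, limit: int = DEFAULT_PROCESS_SIZE) -> bool:
    """True if a task name is non-empty and fits in the PROCESS field."""
    return 0 < len(name) <= limit
```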

What are some of the common gotchas I should beware of when I write my OPUS tasks?

Besides the traditional dangers of memory leaks and unclosed files, you should be attuned to the possibility that many copies of your task might be running simultaneously. It is therefore important to open files for reading only when possible (in C, use 'r', not 'r+'), to expect collisions when updating databases, and always to terminate with a known status.
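Two of these habits can be sketched in Python: reading shared input read-only, and creating output with Python's exclusive-create mode (`"x"`). With exclusive creation, a second copy of the task fails visibly instead of overwriting the first copy's file. This is one possible technique, not an OPUS requirement, and the file names are hypothetical.

```python
import os
import tempfile


def read_input(path: str) -> str:
    """Read shared input without ever opening it for update."""
    # "r", not "r+": many copies of the task may read this file at once.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()


def write_output(path: str, text: str) -> bool:
    """Create path exclusively; return False if another copy already created it."""
    try:
        # "x" raises FileExistsError rather than overwriting an existing file.
        with open(path, "x", encoding="utf-8") as f:
            f.write(text)
    except FileExistsError:
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "dataset.lis")
        print(write_output(out, "first"))   # True
        print(write_output(out, "second"))  # False: the first copy's file is kept
        print(read_input(out))              # first
```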

Status messages to the standard output device will automatically be kept in a process log file. It is extraordinarily useful to write to the log file both wisely and often to document the actions taken by a pipeline task.
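Standard output ends up in the process log. A small helper that stamps each line with the time and the PROCESS_NAME variable therefore keeps the log readable. This is a sketch, and the line format is an arbitrary choice, not an OPUS convention.

```python
import os
import sys
from datetime import datetime, timezone


def log(message: str, stream=None) -> str:
    """Write a timestamped status line to standard output (kept in the process log)."""
    stream = sys.stdout if stream is None else stream
    name = os.environ.get("PROCESS_NAME", "unknown-task")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp} {name}: {message}"
    # Flush immediately so the line reaches the log even if the task dies later.
    print(line, file=stream, flush=True)
    return line
```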

Also keep in mind that an external polling process can see process resource file keywords and values through its environment only for keywords prefixed with ENV. in the resource file.

What kind of message reporting does OPUS provide?

The severity of the OPUS messages reported to any of the log files can be selected with the environment variable MSG_REPORT_LEVEL. This allows the user to specify which types of messages should be reported. The number of 'Debug' messages can be quite large, and during normal operations 'Informational' messages (and those more severe) are usually sufficient. The user can set the report level to the following values: MSG_ALL, MSG_DIAG, MSG_INFO, MSG_WARN, MSG_ERROR, MSG_FATAL, MSG_NONE. The report levels are cumulative. For example, the MSG_WARN level reports MSG_WARN, MSG_ERROR and MSG_FATAL messages. The default report level is MSG_INFO.

Note that when the report level is set to MSG_NONE, no log files are produced.
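The cumulative rule can be modeled in a few lines. This Python sketch models the documented behavior and is not OPUS's implementation. It assumes the levels are ordered exactly as listed above, from MSG_ALL (report everything) to MSG_NONE (report nothing). It also assumes that an unrecognized MSG_REPORT_LEVEL value falls back to the default.

```python
import os

# Report levels in the order listed above, least to most severe.
LEVELS = ["MSG_ALL", "MSG_DIAG", "MSG_INFO", "MSG_WARN", "MSG_ERROR", "MSG_FATAL", "MSG_NONE"]


def current_report_level(environ=None) -> str:
    """Read MSG_REPORT_LEVEL; unset or unrecognized values fall back to MSG_INFO."""
    environ = os.environ if environ is None else environ
    level = environ.get("MSG_REPORT_LEVEL", "MSG_INFO")
    return level if level in LEVELS else "MSG_INFO"


def is_reported(message_level: str, report_level: str) -> bool:
    """True if a message at message_level passes the cumulative report level."""
    if report_level == "MSG_NONE":
        return False  # nothing is reported (and no log files are produced)
    return LEVELS.index(message_level) >= LEVELS.index(report_level)
```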

If the OPUS task in question is an internal poller, there is more information here regarding message reporting.
