[O1.2] Rapid Development for Distributed Computing, with Implications for the Virtual Observatory Michael S. Noble Compute nodes will require enabling technologies to participate in a virtual observatory (VO), yet much of the available GRID software exhibits the classic buy-in problem: a) it is distributed in relatively large packages that require regular updating as new versions are deployed, with platform-specific binaries for each CPU architecture; b) it involves steep learning curves; and c) it can require significant institutional commitment to install, utilize, and maintain. These factors create a formidable entry barrier for metacomputing newcomers, a class into which a large majority of the astronomy community currently falls. Recent work, however, shows that combining Java tuplespaces with network class loaders can greatly simplify the configuration and management of distributed computations within heterogeneous networks. When added to the semantic clarity of tuplespace programming, such an approach could shrink the buy-in cost of VO participation considerably, at the very least the up-front cost of merely dabbling in metacomputing, and could also simplify the iterative redeployment of experimental infrastructure and services that will prove necessary as the VO evolves. To bolster this argument we outline an architecture which shows promise as a means of accessing diverse data and services through a virtual metacomputer interface. Early prototypes suggest that the approach offers a number of attractive VO-relevant features: (1) it loosely couples service provider nodes, accessor nodes, and users across networks of all scales; (2) it permits dynamic joining and leaving of service providers and accessors; (3) it hides much of the complexity of distributed programming behind a clean and simple interface; (4) it provides intrinsic scalability and fault tolerance, and comparatively simple replication; and (5) it requires low institutional or individual buy-in, relative to other GRID toolkits, for initial VO participation.
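The tuplespace coordination model invoked above can be conveyed in a few lines. The following minimal, single-process Python sketch illustrates only the write/take semantics on which such architectures rest; the work described uses Java tuplespaces (e.g. JavaSpaces), which layer leases, transactions, and network class loading on top, and every name below is invented for the example.

import threading

class TupleSpace:
    # A minimal in-process tuple space: write/take with blocking matches.
    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def _match(self, template, tup):
        # A template matches a tuple of equal length whose fields are
        # equal, with None acting as a wildcard.
        return len(template) == len(tup) and all(
            t is None or t == f for t, f in zip(template, tup))

    def take(self, template):
        # Block until a matching tuple exists, then remove and return it.
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._match(template, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()

# A master writes task tuples and takes result tuples; workers do the
# reverse. Neither needs to know where, or even when, the other runs.
space = TupleSpace()
space.write(("task", 42))
task = space.take(("task", None))      # -> ("task", 42)
space.write(("result", task[1] ** 2))
print(space.take(("result", None)))    # -> ("result", 1764)

Even at this scale the loose coupling claimed for metacomputing is visible: producer and consumer share no reference to each other, only a pattern of tuples.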
[O1.3] eSTAR: Building an Observational GRID Alasdair Allan, University of Exeter Tim Naylor, University of Exeter Iain Steele, Liverpool John Moores University Dave Carter, Liverpool John Moores University Jason Etherton, Liverpool John Moores University Chris Mottram, Liverpool John Moores University The eSTAR Project is a programme to build a prototype robotic telescope network to design and test the infrastructure and software which could be used in larger-scale projects. The network consists of a number of autonomous telescopes, and associated rapid data reduction pipelines, connected together using GLOBUS middleware. Intelligent agents carry out resource discovery, submit observing requests, and analyse the reduced data returned by the telescope nodes. The agents are capable of carrying out data mining and cross-correlation tasks using online catalogues and databases and, if necessary, requesting follow-up observations from the telescope nodes. We discuss the design of the eSTAR software and its implications with respect to the GRID. [O1.4] AstroComp: A Web Portal for High-Performance Astrophysical Computing on a Grid of Supercomputers Paola Di Matteo, Univ. of Rome "La Sapienza" (Italy) Paolo Miocchi, Univ. of Rome "La Sapienza" (Italy) Vincenzo Antonuccio-Delogu, Astroph. Obs. of Catania (Italy) Ugo Becciani, Astroph. Obs. of Catania (Italy) Roberto Capuzzo Dolcetta, Univ. of Rome "La Sapienza" (Italy) Alessandro Costa, Astroph. Obs. of Catania (Italy) Vittorio Rosato, ENEA - Rome (Italy) AstroComp is a project (initially funded by the Italian National Research Council, CNR) aiming to create a portal for handling and using high-performance numerical tools for astrophysics on a grid of supercomputers. The main motivation of the project is to construct a portal which allows a repository of computational codes and common databases to be set up, making them available and usable, through a user-friendly graphical web interface, to the entire national (and international) community. AstroComp will allow the scientific community to benefit from many different numerical tools implemented on high-performance computing (HPC) resources, both for theoretical astrophysics and cosmology and for the storage and analysis of astronomical data, without the need for specific training, know-how or experience in either computational techniques or database construction and management methods. I will illustrate some examples of practical utilization of the present version of the AstroComp portal in the framework of numerical simulations of globular cluster dynamics. I will show how to handle the various aspects related to performing and managing a typical N-body simulation. A prototype of the portal can be visited at http://www.astrocomp.it/ [O10.1] The Design of the MicroObservatory Network of Educational Telescopes Philip M. Sadler Many students have a deep interest in astronomy, but a limited opportunity to use telescopes to explore the heavens. The MicroObservatory Network of automated telescopes is designed to provide access to classroom teachers who wish their students to conduct projects over the World-Wide Web. The intuitive interface makes it easy for even 10-year-olds to take pictures. Our telescopes can be remotely pointed and focused; filters, field of view, and exposure times can be changed easily. Images are archived at the website, along with sample challenges and a user bulletin board, all of which encourage collaboration among schools. Wide geographic separation of instruments provides access to distant night skies during local daytime. Operational since 1995, we have learned much about remote troubleshooting, designing for unattended use, and acquiring the kinds of images that students desire. This network can be scaled up from its present capability of 240,000 images each year to provide telescope access for all U.S. students with an interest in astronomy. Our WWW address is: http://mo-www.harvard.edu/MicroObservatory/ [O10.2] Mirage: A Tool for Interactive Pattern Recognition from Multimedia Data Tin Kam Ho, Bell Laboratories Many data mining queries in astronomy involve an identification of objects that are similar or discernible in different aspects such as spectral shapes and features, light curves, morphology, positional proximity, or other derived attributes. Analyses need to go beyond conventional clustering algorithms that stop at computing a single proximity structure according to a specific criterion. We describe Mirage, a software tool designed for interactive exploration of the correlation of multiple partitional or hierarchical cluster structures arising in different contexts. The tool shows projected images of point classes and traversals of proximity structures in one, two, or higher-dimensional subspaces, in linked views of tables, histograms, scatter plots, parallel coordinates, or over an image background.
It also provides facilities for arbitrary plot configuration, manual or automatic classification, and intuitive graphical querying. We show applications of Mirage to find robust designs of optical devices, verify the consistency of DLS catalogs, and examine spectral classes from the IRAS LRS. [O10.3] Montage: An On-Demand Image Mosaic Service for the NVO G. B. Berriman D. Curkendall J. Good J. Jacob D. S. Katz T. Prince R. Williams Montage will deliver a generalized toolkit for generating on-demand, science-grade custom astronomical image mosaics. "Science-grade" in this context requires that terrestrial and instrumental features are removed from images in a way that can be described quantitatively. "Custom" refers to user-specified parameters of projection, coordinates, size, rotation and spatial sampling, and whether the drizzle algorithm should be invoked. The greatest value of Montage will be its ability to support the analysis of images at multiple wavelengths, by delivering them on a common projection, coordinate system and spatial sampling, thereby allowing them to be analyzed as if they were part of the same multi-wavelength image. Montage will be deployed as a compute-intensive service through existing portals. It will be integrated into the emerging NVO architecture, and run operationally on the Teragrid, where it will process the 2MASS, DPOSS and SDSS image data sets. The software will also be portable and publicly available. [O10.4] Architecture for All-Sky Browsing of Astronomical Datasets Joseph C. Jacob, Jet Propulsion Laboratory, California Institute of Technology Gary Block, Jet Propulsion Laboratory, California Institute of Technology David W. Curkendall, Jet Propulsion Laboratory, California Institute of Technology A new architecture for all-sky browsing of astronomical datasets has been designed and implemented in the form of a graphical front-end to the yourSky custom mosaicking engine. With yourSky, any part of the sky can be retrieved as a single FITS image with user-specified parameters such as coordinate system, projection, resolution and data type [1]. The simple HTML form interface to yourSky has been supplemented with a graphical interface that allows: (i) all-sky, web-based pan and zoom; (ii) interactive, multi-spectral viewing; (iii) symbol overlays from object catalogs; (iv) invocation of the yourSky mosaicking engine once a desired view has been selected; (v) image pixel to sky coordinate conversions; and (vi) user control over display view size. The image viewed by the user at each instant is rendered from a collection of overlapping image plates of limited size for each zoom level. These plates are constructed in a Tangent Plane projection with tangent points selected at Hierarchical Triangular Mesh [2] vertices. The plate sizes are selected to limit the maximum distortion from image projection while providing sufficient overlap between neighboring plates that the sub-image in the view window is wholly contained within a single plate. This results in astrometric accuracy, rapid response time, and efficient data storage. Although the current implementation is interoperable with the yourSky mosaic engine, we believe this architecture should be applicable to front ends for other space science applications and, with little modification, to planetary science applications as well. References: [1] J.C. Jacob, R. Brunner, D.W. Curkendall, S.G. Djorgovski, J.C. Good, L. Husman, G. Kremenek and A. Mahabal, "yourSky: Rapid Desktop Access to Custom Astronomical Image Mosaics", to appear in Proceedings of SPIE Astronomical Telescopes and Instrumentation: Virtual Observatories, Waikoloa, HI, Aug. 2002. [2] P.Z. Kunszt, A.S. Szalay, I. Csabai and A.R. Thakar, "The Indexing of the SDSS Science Archive", Proceedings of ADASS IX, Kona, HI, Oct. 1999.
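The tangent-plane plates described above use the standard gnomonic (TAN) projection, whose distortion grows with distance from the tangent point; this is what drives the plate-size limit. A brief sketch of the projection follows, using the textbook formulas rather than the yourSky code itself:

import numpy as np

def gnomonic(ra, dec, ra0, dec0):
    # Project sky coordinates onto a plane tangent at (ra0, dec0).
    # Angles in radians; returns plane coordinates in radians.
    cos_c = (np.sin(dec0) * np.sin(dec)
             + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0))
    x = np.cos(dec) * np.sin(ra - ra0) / cos_c
    y = (np.cos(dec0) * np.sin(dec)
         - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0)) / cos_c
    return x, y

# A point 1 degree from the tangent point: the projected radius is
# tan(separation), so the radial scale error is already ~0.01% here
# and grows rapidly with plate size.
x, y = gnomonic(np.radians(1.0), 0.0, 0.0, 0.0)
print(np.degrees(np.hypot(x, y)))  # ~1.0001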
Mahabal, "yourSky: Rapid Desktop Access to Custom Astronomical Image Mosaics", to appear in Proceedings of SPIE Astronomical Telescopes and Instrumentation: Virtual Observatories, Waikoloa, HI, Aug., 2002. [2] P.Z. Kunszt, A.S. Szalay, I. Csabai and A.R. Thakar, "The Indexing of the SDSS Science Archive", Proceedings of ADASS IX, Kona, HI, Oct., 1999. [O10.5]ÊScoping the UK's Virtual Observatory: AstroGrid's Key Science Drivers and Impact on System Design. Nicholas A Walton, Institute of Astronomy, Cambridge, UK Andrew Lawrence, Institute for Astronomy, Edinburgh, UK Tony Linde, Dept Physics & Astronomy, Leicester, UK AstroGrid (see http://www.astrogrid.org), a UK eScience project with collaborating groups drawn from the major UK data archive centres, is creating the UK's first virtual observatory. AstroGrid is aiming to support a broad spectrum of astronomical activity, with an initial emphasis on meeting the needs expressed by the UK community. AstroGrid is aiming to balance the scientific requirements of a wide community which spans the Astronomy, Solar and STP areas. This paper discusses how the AstroGrid project has captured a well scoped set of key science drivers. These represent a number of currently topical scientific areas where access to capabilities promised by a virtual observatory will make a significant impact on the researchers abilities to progress in these areas. At the same time, the tools and facilities developed to support these drivers will be employable in addressing a wider range of astronomical science problems. The drivers cover aspects of astronomy ranging from facilitating access to large scale survey data sets to speed the discovery of galaxy clusters, through understanding the onset of solar magnetic storms via the analysis of multi-sourced STP data sets. This paper further describes the process by which the AstroGrid architecture has been formulated. Note is made of the interactive mechanisms provided by the project, linked from http://www.astrogrid.org. In particular the AstroGrid Wiki at http://wiki.astrogrid.org is highlighted as an excellent medium with which to capture user requirements and to disseminate back project developments via, for instance, meeting notes, architecture and other discussions. The paper closes by noting how the science drivers have been used to determine the system design required to deliver the capabilities that AstroGrid will provide to its community by the end of its three year project lifetime. [O10.6]ÊToward a Minimal Buy-in Virtual Observatory Eric Mandel Paul S. Grant William Joye Minimal buy-in software seeks to hide the complexity of complex software from users. It strikes a balance between the extremes of full functionality (in which you can do everything, but it is hard to do anything in particular) and naive simplicity (in which it is easy to do the obvious things, but you can't do anything interesting). Design decisions must be made up-front in order to achieve this balance. We report on recent efforts to implement a minimal buy-in Virtual Observatory that allows users to view and analyze remotely-located data from their local site. The ds9 image display program is used as the front-end graphical interface to display small (100Kb) but fully functional FITS images representing large (300Mb or more) data sets. Analysis requests made via the ds9 Analysis menu are processed on back-end chroot'ed analysis servers, which can send images, plot results or text/html back to the user for display. 
Threaded queues and Linux Virtual Server technology distribute the analysis load in an extensible manner. In this way, our architecture provides low-end PC users with sophisticated analysis support on very large data sets with minimal transfer of data and software. [O2.1] Photometric and Astrometric Calibration of the Southern H-alpha Sky Survey Atlas (SHASSA) Peter R. McCullough, STScI John Gaustad, Swarthmore Wayne Rosing, Google The Southern H-Alpha Sky Survey Atlas (SHASSA) is the primary data product of a robotic wide-angle imaging survey of the southern sky ($\delta = +15\degr$ to $-90\degr$) at 656.3 nm wavelength, the H$\alpha$ emission line of hydrogen. This presentation will focus on the photometric and astrometric calibration of the images in the Atlas. [O2.2] The XMM-Newton Serendipitous Sky Survey M. G. Watson, University of Leicester, UK, on behalf of the XMM-Newton SSC XMM-Newton provides a powerful facility for X-ray surveys by virtue of its high sensitivity and large field of view, coupled with excellent hard X-ray response. Over the course of each year's observations around 100 sq. deg. of the sky is covered by XMM-Newton observations, yielding a serendipitous catalogue of ~50,000 X-ray sources extending to faint X-ray fluxes. This paper will describe two projects being undertaken by the XMM-Newton Survey Science Centre (SSC) which are designed to maximise the value of the XMM-Newton serendipitous sky survey: (i) a programme of identification and follow-up of significant samples of sources serendipitously discovered in XMM-Newton observations. This project, underway since mid-2000, involves a substantial programme of spectroscopic identifications, coupled with extensive deep optical/infrared imaging of XMM-Newton fields. (ii) the compilation of a comprehensive serendipitous source catalogue from all XMM-Newton observations, emphasising sensitivity but with particular attention paid to uniformity and quality control. Specialised data processing for this catalogue started in April 2002, and the first instalment will be publicly released in late 2002. The main properties of the catalogue will be presented. [O2.3] The ISOCAM Parallel Mode Survey S. Ott R. Siebenmorgen N. Schartel T. V During most of ESA's ISO mission, the mid-infrared camera ISOCAM continued to observe the sky, mainly around 6.7$\mu$m with a pixel field of view of 6$''$, in its so-called ``parallel mode'' while another instrument was prime. This permitted an unbiased survey of limited areas of the infrared sky, albeit with varying depth and wavelength per field due to the different instrumental configurations used and the highly variable time spent per pointed observation. Dedicated calibration, data reduction and source extraction methods were developed to analyse these serendipitously recorded data: 37000 individual pointings, taken during 6700~hours of observation. Using sophisticated cleaning and merging algorithms, over 42~square degrees of the sky --- roughly one per mille of the celestial sphere --- could be processed and catalogued. We will give an overview of the data processing and results of this recently finished project, and outline the scientific potential of the generated data set. For the final point source catalogue, around 30000 sources are expected. Their mid-infrared fluxes go down to 0.5~mJy, with a median of 2.7~mJy for sources outside the galactic plane, and 6.3~mJy for sources inside the galactic plane.
We will announce the release date of the ISOCAM Parallel Point Source Catalogue and of all calibrated ISOCAM parallel images to the general community at this conference, and hope this will become an attractive and valuable resource for all mid-infrared research activities and a major legacy of the ISO mission. [O3.2] James Webb Space Telescope Science and Operations Center Joe Pollizzi, STScI The newly named James Webb Space Telescope is more than just a replacement for the current Hubble Space Telescope. The JWST is the successor mission to Hubble, which means its goals are to be as successful in advancing our understanding of the universe, but in ways that are beyond Hubble's capabilities, and to build on Hubble's legacy of success at only a fraction of its cost. These are indeed tall goals. Certainly advances in technology will buy some of that capability, as will the L2 orbit of JWST, which will remove it from many of the visibility and other environmental constraints that Hubble experiences. But we believe the proper application of the lessons we have learned in operating Hubble will be as significant to JWST's success as these more evident changes. Foremost among these lessons is taking only the best of Hubble's ground systems and building upon them. This talk presents some of our thinking on how we'll use Hubble's systems and the kinds of changes we are planning in preparing the Science and Operations Center for the JWST. The talk will first present an overview of the James Webb Space Telescope and highlight some of its features and differences from HST. It will then discuss how we intend to take advantage of these changes in simplifying the ground systems. Next we present our plans for where we will use existing systems and where we will acquire or build new components. The talk will conclude with a general overview of the planned S&OC system and our concepts of how it will support the JWST. [O3.3] Adaptive Optics at the VLT: NAOS-CONICA Chris Lidman Wolfgang Brandner NAOS-CONICA is the first adaptive optics instrument to be offered to the community at the ESO VLT. This instrument is capable of diffraction-limited imaging, spectroscopy, polarimetry and coronagraphy in the 1 to 5 micron wavelength region. In this talk I will provide a description of the instrument and summarize NAOS-CONICA "end-to-end" operations. [O4.1] Data Management for the LSST Andrew Connolly The Large-aperture Synoptic Survey Telescope (LSST) represents the next generation of wide-field survey telescopes. With repeated scans of the sky on timescales ranging from a few minutes to several years, it will be one of the first of the wide-area survey facilities to open up the study of the time domain in astrophysics. The scientific returns from such an approach are numerous, ranging from the dynamics of near-Earth asteroids and Kuiper-belt objects through to the detection of intermediate- and high-redshift supernovae. We will discuss here the impact that such a diverse range of scientific questions will have on the analysis and management of the data flow from such a telescope. We will focus on a number of areas that must be addressed for the successful operation of the LSST.
These will include: a) the computational challenge presented by a data rate in excess of 300GB per hour; b) the impact of the optical design on the photometric accuracy and image quality of the system, together with how this translates into implementing efficient techniques for measuring the differences between repeated observations or for co-adding multiple images to construct a deep image of the sky; and c) the impact on the design of the software of the requirement that we be able to detect variability over a very broad range of temporal scales (from almost real time through to several years). Throughout, we will discuss the current state of the art in analysis software and algorithms and how they might be expected to scale with the increase in computational resources over the coming decade. We will then identify which computational data management challenges we must address in the near future.
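On point b), one widely used baseline for co-adding registered exposures is the inverse-variance weighted mean, sketched below. This is a generic illustration, not the LSST pipeline, whose algorithms the abstract leaves open:

import numpy as np

def coadd(images, variances):
    # Inverse-variance weighted co-add of registered exposures.
    # images, variances: arrays of shape (n_exposures, ny, nx).
    # Returns the co-added image and its per-pixel variance.
    w = 1.0 / np.asarray(variances)          # weight by 1/sigma^2
    wsum = w.sum(axis=0)
    stack = (w * np.asarray(images)).sum(axis=0) / wsum
    return stack, 1.0 / wsum                 # variance of weighted mean

# Two noisy exposures of the same flat field, sigma = 1.0 and 2.0:
rng = np.random.default_rng(0)
imgs = [10 + rng.normal(0, s, (64, 64)) for s in (1.0, 2.0)]
deep, var = coadd(imgs, [np.full((64, 64), s**2) for s in (1.0, 2.0)])
print(var[0, 0])  # 0.8 = 1/(1/1 + 1/4), below either input variance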
[O4.3] The Raptor Real-Time Processing Architecture Mark Galassi, Los Alamos National Laboratory D. Starr, Los Alamos National Laboratory K. Borozdin, Los Alamos National Laboratory D. Casperson, Los Alamos National Laboratory K. McGowan, Los Alamos National Laboratory W. T. Vestrand, Los Alamos National Laboratory R. White, Los Alamos National Laboratory P. Wozniak, Los Alamos National Laboratory J. Wren, Los Alamos National Laboratory The primary goal of Raptor is ambitious: to identify interesting optical transients from very wide field of view telescopes in real time, and then to quickly point the higher-resolution Raptor ``fovea'' cameras and spectrometer at the location of the optical transient. Any application of real-time search and time-domain mapping of the sky is possible with Raptor, including the very interesting real-time search for orphan optical counterparts of Gamma Ray Bursts. The sequence of steps (data acquisition, basic calibration, source extraction, astrometry, relative photometry, the smarts of transient identification and elimination of false positives, telescope pointing feedback...) is implemented with a ``component'' approach. All basic elements of the pipeline functionality have been written from scratch or adapted (as in the case of SExtractor for source extraction) to form a consistent modern API operating on memory-resident images and source lists. The result is a pipeline which meets our real-time requirements and which can easily operate as a monolithic or distributed processing system. Finally, the Raptor architecture is entirely based on free software (sometimes referred to as "open source" software). In this paper we also discuss the interplay between various free software technologies in this type of astronomical problem. [O4.3] Microlensing Surveys: Exploring the Time Domain Kem H. Cook and the MACHO Collaboration In the last decade a number of different projects have been mounted to detect and follow the progress of gravitational microlensing by compact objects, an extremely rare phenomenon. These projects, driven by the need to monitor millions of potential source stars, have opened the time domain in wide-field astronomy. One of the original projects was the MACHO Project, a survey to determine whether there is a significant baryonic component to the dark matter in the halo of the Milky Way. The MACHO Project collected 8 years and 7.3 Tbytes of data on 99 square degrees toward the Magellanic Clouds and the bulge of the Milky Way. Half-square-degree fields were sampled, simultaneously in two bands, roughly every three days, and light curves for about 55 million stars to a depth of about magnitude 21 have been collected in a photometry database. This database has been analyzed for microlensing, and about 500 events toward the Bulge and about two dozen toward the Magellanic Clouds have been detected. We have also identified about 500,000 variable stars. These have been analyzed, yielding new results in the astrophysics of pulsating stars, new categories of stellar variability, and such disparate detections as new high proper motion stars and new quasars. I will present, from an astronomer's perspective, some of the data management issues encountered in the MACHO Project, a survey which pushed the boundaries of available technology. I will also recount some of the lessons learned from MACHO's and other microlensing surveys' experience in data mining in, and providing public access to, large image and photometry databases. [O4.4] The Subaru Telescope Software Trinity System Ryusuke Ogasawara, Subaru Telescope, National Astronomical Observatory of Japan Yoshihiro Chikada, Radio Astronomy Division, National Astronomical Observatory of Japan Yasuhide Ishihara, Fujitsu Ltd. Atsushi Kawai, Fujitsu America Incorporation Kenji Kawarai, Fujitsu Ltd. George Kosugi, Subaru Telescope, National Astronomical Observatory of Japan Yoshihiko Mizumoto, Optical Infrared Astronomy Division, National Astronomical Observatory of Japan Junichi Noumaru, Subaru Telescope, National Astronomical Observatory of Japan Toshiyuki Sasaki, Subaru Telescope, National Astronomical Observatory of Japan Tadafumi Takata, Subaru Telescope, National Astronomical Observatory of Japan Masafumi Yagi, Optical Infrared Astronomy Division, National Astronomical Observatory of Japan Michitoshi Yoshida, Okayama Astrophysical Observatory, National Astronomical Observatory of Japan The Subaru Telescope is an optical-infrared telescope with an 8.2m monolithic mirror located at the summit of Mauna Kea, Hawaii, USA, funded entirely by the Japanese government's Ministry of Education, Culture, Sports, Science, and Technology. The Subaru Telescope began operation in January 1999, and opened its Open Use program for astronomers all over the world in December 2000. The Subaru Telescope Software Trinity System, which consists of the Subaru Observation Software (SOSS), the Subaru Telescope Archive System (STARS), and the Data Analysis System Hierarchy (DASH), supports the Subaru Data Flow: observation, archiving and analysis. Observation on the Subaru Telescope is operated by SOSS. In SOSS, the telescope and instruments are defined as external modules, and interface methods for sending commands to and receiving status from those modules are defined. Quick analysis tools and utilities for preparation of the observation procedure are also implemented in SOSS. The Observation Dataset created during the observation procedure by SOSS defines the relation of various categories of FITS frames, such as calibration frames, standard stars, and object frames. FITS frames are transferred to the Hilo Data Center and archived automatically to the tape library system. STARS supports online registration of observation data in close relation with SOSS, and retrieval with DASH as well as a WWW interface for astronomers.
As a first in the history of Japanese astronomy, the Subaru Telescope began a challenging project to develop a new platform to support pipeline analysis of data taken by the Subaru Telescope. The DASH project, based on object-oriented methods and CORBA, began in 1996 and was completed in March 2002. Thus all data obtained by the Subaru Telescope will be reusable by observers preparing new observation proposals for the Subaru Telescope. Even during an observation, the Subaru Telescope Software Trinity System can be used to find optimum observing parameters to achieve the best quality; this is how quality control of the observation data is realized at the Subaru Telescope with the Software Trinity System. The basic concept of supporting the Subaru Data Flow with the Software Trinity, the detailed software methodology we have chosen for development, the status of current operations, and the upgrade plan for the future will be presented. [O6.1] SkyQuery - A Prototype Distributed Query Web Service for the VO Tamas Budavari, Tanu Malik, Alex Szalay, Ani Thakar, The Johns Hopkins University Jim Gray, Microsoft Research We present SkyQuery, a prototype distributed query and cross-matching service for the VO community. SkyQuery enables astronomers to run combined queries on existing heterogeneous astronomy archives. It provides a simple, user-friendly interface to run distributed queries over the federation of registered astronomical archives in the VO. SkyQuery not only provides location transparency, but also takes care of vertical fragmentation of the data and runs the query efficiently to minimize query execution costs. The SkyQuery client connects to the portal, which is an XML Web Service. The portal farms the query out to the individual archives, which are also accessible via Web Services called SkyNodes. The cross-matching algorithm is run recursively on each SkyNode. Each archive is a relational DBMS with an HTM (Hierarchical Triangular Mesh) index built in for fast spatial lookups. The results of the distributed query are returned as an XML DataSet that is automatically rendered by the client. The SkyQuery client web application also displays the image cutout corresponding to the query result. The importance of a service like SkyQuery for the worldwide astronomical community cannot be overstated: scientific data on the same astronomical objects residing in various archives are mapped in different wavelength ranges and look very different due to the different errors, instrument sensitivities and other peculiarities of the data acquisition and calibration processes used for each archive. Our cross-matching algorithm performs a probabilistic spatial join across multiple catalogs. This is far from a solved problem in astronomy - indeed, this type of cross-matching is currently often done by eye, one object at a time. Even if we built a static cross-identification table for a set of archives, it would become obsolete by the time we finished building it - the exponential rate of growth of astronomical data means that a dynamic cross-identification mechanism like SkyQuery is the only viable option. Finally, it should be noted that finding non-matches (dropouts) between datasets - objects that exist in some of the catalogs but not in others - is often as important as finding matches, and SkyQuery provides that capability.
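The probabilistic spatial join at the heart of SkyQuery can be illustrated in a simplified two-catalog form: treat each catalog position as carrying its own astrometric error, and accept a pair whose angular separation is small compared with the combined error. The sketch below shows only this acceptance test; the recursive SkyNode algorithm and its treatment of priors and dropouts are not reproduced here, and all function names are illustrative:

import math

def ang_sep(ra1, dec1, ra2, dec2):
    # Angular separation in arcsec between two positions in degrees
    # (haversine formula, numerically stable at small separations).
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h))) * 3600

def is_match(p1, p2, sigma1, sigma2, nsigma=3.0):
    # Accept the pair if the separation is within nsigma of the
    # quadrature-summed positional errors (all in arcsec).
    sep = ang_sep(p1[0], p1[1], p2[0], p2[1])
    return sep <= nsigma * math.hypot(sigma1, sigma2)

# Two sources 0.4" apart with 0.1" and 0.3" errors: matched at 3 sigma,
# since 0.4 <= 3 * sqrt(0.1^2 + 0.3^2) = 0.95.
print(is_match((180.0, 0.0), (180.0, 0.0001111), 0.1, 0.3))  # True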
[O6.2] Why Indexing the Sky is Desirable Patricio F. Ortiz, Leicester University "Indexing the sky" is a database-oriented term for a partitioning scheme of the celestial sphere, adopted to achieve better performance in queries that involve finding close neighbours (cone searches, cross-correlations amongst catalogues, etc.). Several schemes have been proposed (HTM, HEALPix, "quasi-equal area tiles", cubic projection, etc.), but their use has largely been kept "hidden" rather than opened to wider use. The scientific value of the internal indexation files is much higher, though, as they keep track of the source density of catalogues, which makes it possible to answer a family of questions not easily handled by a standard DB system and provides an unusual visual aid: a snapshot of the locations of the sources listed in any catalogue. The pros and cons of adopting a VO-oriented indexation scheme are analyzed. [O6.3] Quantum Topic Maps: A Physicist's View of the Information Universe Nikita Ogievetsky This is a continuation of work on RDF Topic Maps presented at Extreme 2001 [1] and KT2002 [2]. Quantum Topic Maps provide a very concise and intuitive way to represent experimental data. By experiment we mean here any examination or inquiry in experimental physics or the other natural sciences, or indeed any type of real-world investigation. It will be shown how Quantum Topic Maps can be validated against a DAML+OIL ontology. [1] http://www.cogx.com/xtm2rdf/extreme2001 [2] http://www.cogx.com/kt2002/ [O6.4] A New Way of Joining Source Catalogs using a Relational DBMS Clive Page, University of Leicester As part of the AstroGrid and AVO projects we have been examining the facilities of a number of free and commercial DBMSs for astronomical data processing, especially the handling of large source catalogs. One particularly important operation is the cross-matching of sources in different catalogs, an important precursor to a wide range of data mining operations. This operation, sometimes called the fuzzy join, is difficult because it needs a match of spatial coordinates within the combined error radius. Unfortunately spatial indexing rarely comes as standard, and even where it does (e.g. in PostgreSQL), R-trees do not cope well with the singularities in spherical-polar coordinate systems. A new algorithm is proposed here which makes use of a pixelation of the sky using, for example, HTM or HEALPix. This allows the use of a simple equi-join on integers, well within the capability of SQL on any relational DBMS. Using PostgreSQL it has been possible to compare the new PCODE method with the traditional approach based on R-tree indexing. A number of other results from our evaluations are also reported.
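The pixelation idea reduces the fuzzy join to machinery every relational DBMS already has. The sketch below demonstrates the shape of the technique using Python's built-in sqlite3 and a deliberately crude integer binning in place of HTM or HEALPix; table and column names are invented, and the neighbouring-pixel joins needed for sources near bin edges are omitted for brevity:

import sqlite3

def pcode(ra, dec, scale=10):
    # Crude pixel code: integer bin of (ra, dec) at 0.1-degree
    # resolution. A real implementation would use HTM or HEALPix.
    return int(dec * scale + 90 * scale) * (360 * scale) + int(ra * scale)

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE cat_a (id INTEGER, ra REAL, dec REAL, pcode INTEGER);
    CREATE TABLE cat_b (id INTEGER, ra REAL, dec REAL, pcode INTEGER);
    CREATE INDEX idx_a ON cat_a(pcode);
    CREATE INDEX idx_b ON cat_b(pcode);
""")
for table, rows in [("cat_a", [(1, 180.02, 45.03)]),
                    ("cat_b", [(7, 180.021, 45.031), (8, 120.0, -30.0)])]:
    db.executemany(f"INSERT INTO {table} VALUES (?,?,?,?)",
                   [(i, ra, dec, pcode(ra, dec)) for i, ra, dec in rows])

# The fuzzy join becomes an integer equi-join plus a refinement step
# (here a simple flat-sky distance test, adequate for a sketch).
matches = db.execute("""
    SELECT a.id, b.id FROM cat_a a JOIN cat_b b ON a.pcode = b.pcode
    WHERE (a.ra - b.ra)*(a.ra - b.ra)
        + (a.dec - b.dec)*(a.dec - b.dec) < 1e-5
""").fetchall()
print(matches)  # [(1, 7)]

The point of the design is that the equi-join on pcode is served by ordinary B-tree indexes, sidestepping both R-trees and the polar singularities entirely.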
[O6.5] A Bit of GLUe for the VO: Aladin Experience Pierre Fernique, CDS Andre Schaaff, CDS Francois Bonnarel, CDS Thomas Boch, CDS Aladin is now widely known as a tool to display and cross-match heterogeneous data and images, anticipating future VO portals. It offers transparent access to Simbad, VizieR, NED, SkyView, SuperCosmos, NVSS and FIRST, as well as archive logs such as CFHT, Chandra, HST, HUT, ISO, IUE and Merlin. For each of these servers, Aladin knows how to access it, the required query syntax, the list of query parameters, and the fastest mirror site. This knowledge database is automatically updated by taking advantage of the GLU system on which Aladin is based. We present in this article how the GLU system allows Aladin to integrate several image and data servers into a single interface. We describe how it works, how it is updated and how it is implemented in this Java applet context. We also present the evolution we foresee for the GLU system in order to interact with emerging web services standards such as UDDI and WSDL. [O6.6] Interoperability of the ISO Data Archive and the XMM-Newton Science Archive Christophe Arviset John Dowson Jose Hernandez Pedro Osuna Aurele Venet The ISO Data Archive (IDA) and the XMM-Newton Science Archive (XSA) have been developed by the Science Operations and Data Systems Division in Villafranca, Spain. They are both built using the same flexible and modular 3-tier architecture (Data Products and Database, Business Logic, User Interface). This open architecture, together with Java and XML technology, has helped in making the IDA and XSA interoperable with other archives and applications. Interoperability from these archives to external archives has been achieved through target name resolution with NED and SIMBAD, access to electronic articles through ADS, and access to IRAS data through the IRSA server. Moreover, direct access to ISO and XMM-Newton data is provided, bypassing the standard user interface: the observation/exposure log is given to external archives or applications together with a mechanism to access the data via a Java Server Page. Later developments will also be described, in particular the so-called Postcard and Product Server, which is currently available from the ADS WWW pages (which then give access to the data from the articles), the CDS/VizieR catalogue service, the IRSA ISO Visualizer, and the HEASARC archive. The ISO Data Archive can be accessed at http://www.iso.vilspa.esa.es/ida and the XMM-Newton Science Archive at http://xmm.vilspa.esa.es/xsa [O7.1] Data Management for the VO Patrick Dowler, CADC The Canadian Astronomy Data Centre has developed a general-purpose scientific data warehouse system and an API for accessing it. The Catalog API defines a general mechanism for exploring and querying scientific content using a constraint-based design. The API provides access to separate but related catalogs and allows for entries in one catalog to be related to (usually derived from) entries in another catalog. The purpose of the API is to provide storage-neutral and content-neutral access methods to scientific data. The API defines a network-accessible Jini service. We have implemented several instances of the warehouse to store related catalogs: the pixel catalog provides uniform access to our many archival data holdings, the source catalog stores the results of image analysis, and the processing catalog stores metadata describing exactly how sources are extracted from pixel data so that all results are reproducible. Thus, entries in the source catalog are connected to entries in the processing and pixel catalogs from which they are derived. [O7.2] The NRAO End-to-End (e2e) Project Tim Cornwell John Benson Boyd Waters Honglin Ye, NRAO The NRAO End-to-End (e2e) project has the goal of providing automated, streamlined handling of radio observations on NRAO telescopes all the way from proposal submission to archive access. Thus e2e will ease the use of NRAO telescopes both for expert radio astronomers and for novices. The latter is particularly important in attracting new people to the use of NRAO telescopes. E2e must include new capabilities in the areas of proposal submission and handling, scripting of observations, scheduling (both conventional and dynamic), pipeline processing, and archive access.
The project was initiated in July 2001 and has just completed the first cycle of development. To track and minimize the risk in our software development, we have chosen to adopt a spiral model, whereby a complete development cycle (from inception to testing and deployment) is completed in 9 months and then repeated, hopefully learning more and more as we proceed. We expect to complete seven such cycles in the project, delivering new capabilities with each cycle. The resources available are limited, thus placing a premium on careful costing, planning and scheduling, as well as on reuse. We are endeavoring to reuse as much as possible, and so much of our work has been based on AIPS++. With this approach, a prototype archive has been completed with about 1 FTE-year of effort. We are placing an emphasis on early and frequent deployment, and so the archive prototype will be deployed for use with the VLA later this year, with deployment for the GBT and VLBA planned for 2003. In the area of database access and presentation, we have developed a Calibrator Source tool that can be used by astronomers to find suitable calibrator sources for synthesis observations. This too will be deployed later this year. [O7.3] Data Organization in the SDSS Data Release 1 Ani Thakar, JHU Alex Szalay, JHU Jim Gray, Microsoft BARC Chris Stoughton, FNAL The first official public data release from the Sloan Digital Sky Survey (Data Release 1 or DR1) is scheduled for January 2003. Due to the unprecedented size and complexity of the datasets involved, we face unique challenges in organizing and distributing the data to a large user community. We discuss the data organization, the archive loading and backup strategy, and the data mining tools that we plan to offer to the public and the astronomical community, in the overall context of large databases and the VO. It was originally thought that the catalog data would be a fraction of the size of the raw data, which is expected to be several TB. However, with the multiple versions and data products of the catalog data that will be simultaneously maintained and distributed, it now appears that the size of the catalog data will indeed be comparable to that of the raw data, and organizing and loading it will be quite a daunting task. The DR1 archive will be organized in multiple Microsoft SQL Server relational databases residing on a Windows cluster and logically linked to each other. There will be two calibrations (reruns) of the primary dataset available at any given time: the "target" rerun, from which the spectroscopic targets were selected, and the "best" rerun, which is usually the latest and greatest rerun. The third dataset will be the spectra. In addition to the live datasets, there will be hot spares and offline backups, and a legacy database will preserve all versions of the data served to date. The raw data is stored at FermiLab on a Linux cluster, so it must be loaded across a Linux/Windows interface. We have attempted to automate the loading and validation process as much as possible using a combination of perl scripts on the Linux side and VB scripts and DTS packages on the Windows side. Each step of the loading and validation process is logged in a log database. A separate poster discusses the SDSS DR1 storage configuration. To facilitate data mining in the DR1 archive, we have a variety of interfaces available that allow users to run sophisticated SQL queries on the datasets as well as browse the data using web-based explore and navigation tools.
Additionally, we have built a Hierarchical Triangular Mesh (HTM) spatial index into the SQL Server databases for fast spatial lookups, and constructed a neighbors table for fast nearest-neighbor searches. [O7.4] HDX Data Model - FITS, NDF and XML implementation David Giaretta, Starlink Mark Taylor, Starlink Peter Draper, Starlink Norman Gray, Starlink Brian McIlwrath, Starlink A highly adaptable data model, HDX, based on the concepts embodied in FITS and various proposed XML-based formats, as well as Starlink's NDF and HDS, will be described, together with the Java software that has been developed to support it. This follows on from the presentation given at ADASS 2001. The aim is to provide a flexible model which is compatible with FITS, can be extended to accommodate VO requirements, and yet maintains enough mandatory structure to make application-level interoperability relatively easy. The implementation provides HDX factories and lower-level data access classes that allow a great deal of flexibility; in particular, single FITS files can be regarded as HDX files, as can complex structures made up of XML, FITS and HDS components. It can also deal with large, distributed datasets. [O8.1] New Science with LIGO: Past, Present and Future Kent Blackburn, California Institute of Technology Significant analysis of data from LIGO's laser interferometric gravitational wave detector project began with a 25-hour stretch of data collected from its 40-meter prototype instrument located on the Caltech campus in 1994. Since then, construction of LIGO's 4-kilometer laser interferometric gravitational wave observatories in the states of Louisiana and Washington has been completed. Beginning in the spring of 2000, a series of multi-day engineering runs using these new multi-kilometer interferometers has collected 40 terabytes of data. The seventh of these engineering runs, also known as the upper limits run, collected data from both observatories continuously for a 17-day span in late 2001 and early 2002. During this upper limits run, data were analyzed in near real time for instrumental effects, terrestrial effects, and astrophysical bursts and binary inspirals. After the data were collected, analyses for stochastic and periodic gravitational wave signals began. To handle the large data analysis requirements of LIGO data, each of the observatories, along with MIT and Caltech, has been equipped with a distributed computer system known as LDAS. These systems use custom software to integrate concurrent job control, parallel compute clusters and databases, managing the continuous data analysis requirements in near real time or better. LIGO will begin its first scientific run in late summer of 2002. LDAS again will be utilized to carry out scientific searches for gravitational waves as LIGO endeavors to open this new window on the universe. [O8.2] AstroVirgil: Interactive X-ray Analysis for EPO and First Look Steve McDonald, University of Massachusetts at Boston and Silicon Spaceships Srikanth Buddha, University of Massachusetts at Boston This paper reviews AstroVirgil, a user-friendly program for the analysis of Chandra event files. AstroVirgil integrates photon filtering and visualization into a single GUI-based tool. Photons can be filtered based on spatial position (in multiple coordinate systems), photon energy level (using multiple measures) or time of arrival, using various custom GUI panels. Filtered photons can be displayed as images, spectra or light curves.
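Filtering of this kind reduces to boolean masks over an event table followed by histogramming into the chosen display, as the following NumPy sketch shows. It is illustrative only: AstroVirgil itself is a Java application, and the column names here are invented.

import numpy as np

# A toy Chandra-like event list: one row per photon.
rng = np.random.default_rng(1)
n = 10_000
events = {
    "x": rng.uniform(0, 1024, n),       # detector position (pixels)
    "y": rng.uniform(0, 1024, n),
    "energy": rng.exponential(2.0, n),  # keV
    "time": np.sort(rng.uniform(0, 1000, n)),  # seconds
}

# Filter: a spatial box, an energy band, and a time interval,
# combined as one boolean mask over the photon table.
mask = ((events["x"] > 200) & (events["x"] < 800)
        & (events["energy"] > 0.5) & (events["energy"] < 7.0)
        & (events["time"] < 500))

# The three standard displays are just histograms of the survivors:
image, _, _ = np.histogram2d(events["x"][mask], events["y"][mask], bins=64)
spectrum, _ = np.histogram(events["energy"][mask], bins=100, range=(0, 10))
lightcurve, _ = np.histogram(events["time"][mask], bins=100, range=(0, 1000))
print(mask.sum(), image.sum())  # photons surviving all three cuts

Keeping the full photon table in memory is what makes re-filtering interactive, and is also the source of the memory costs the paper discusses.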
Each display can be adjusted and improved using a variety of GUI controls. Many existing Chandra tools use a command-line interface. This paper reviews some of the performance and memory consequences of processing events in memory rather than through disk files. It is hoped that a user-friendly, GUI-based, platform-independent tool can reach a broader community than traditional "high-end" Chandra tools. The initial evaluation from both amateur astronomers and the educational community will be discussed. AstroVirgil is a GPLed pure-Java program. It is built on top of JSky, a collection of reusable Java components developed at ESO and first described at ADASS '99. It is available at www.SiliconSpaceships.com. [O8.3] An End-to-End Architecture for Science Goal Driven Observing Anuradha Koratkar Sandy Grosvenor Jeremy Jones Karl Wolf Many of the upcoming missions will not only have better detectors, greater on-board storage capacity, and on-board processing capabilities, they will also generate vast volumes of data. Although significant research and development efforts are underway to increase download capacities, it is prudent to use the available bandwidth efficiently. The transmission efficiency of large data volumes is critical because, even when high bandwidth is available, it will come at a cost. The cost of downlink time and the limitations of bandwidth will end the era in which all exposure data are downloaded and all data processing is performed on the ground. In addition, observing campaigns involving inherently variable targets will need scheduling flexibility to focus observing time and data download on exposures that are scientifically interesting. The ability to quickly recognize and react to such events by re-prioritizing the observing schedule will be an essential characteristic for maximizing scientific returns. It will also be a step towards increasing spacecraft autonomy, a major goal of NASA's strategic plan. The science goal monitoring (SGM) system is a proof-of-concept effort to address these challenges. SGM will have an interface to help capture higher-level science goals from the scientists and translate them into a flexible observing strategy that SGM can execute and monitor. We are developing an interactive distributed system that will use on-board processing and storage, combined with event-driven interfaces with ground-based processing and operations, to enable fast re-prioritization of observing schedules and to minimize time spent on non-optimized observations. This talk will focus on our strategy for developing SGM and the technical challenges that we have encountered. The SGM architecture and interfaces are designed for easy adaptability to other observing platforms, including ground-based systems, and to work with different scheduling and pipeline processing systems. [O8.4] Small Theory Data and the Virtual Observatory Jonathan McDowell, SAO The integration of large theoretical simulation archives with the VO has been widely discussed. I suggest it is also important to include smaller theoretical datasets and functional relationships in a structured way, and outline some possible standards. First, I discuss metadata for simulations by drawing an analogy with X-ray spectral analysis, a domain in which complex new theoretical models have been rapidly integrated with the standard data analysis tools via a simple parameterized-function description. This paradigm can easily be extended to image simulations.
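The parameterized-function description referred to here amounts to publishing, for each model, a small block of parameter metadata plus a callable from (parameters, grid) to predicted values. The sketch below shows one way such a description might look; the names and structure are hypothetical, mirroring the spirit, though not the letter, of the X-ray local-model conventions:

from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class Parameter:
    name: str
    unit: str
    default: float
    lo: float          # hard lower bound
    hi: float          # hard upper bound

@dataclass
class SpectralModel:
    # A theory "dataset" reduced to metadata plus a callable: enough
    # for a fitting tool to discover, display and evaluate it generically.
    name: str
    parameters: Sequence[Parameter]
    evaluate: Callable[[Sequence[float], Sequence[float]], list]

def powerlaw(params, energies):
    # Photon flux density ~ E^-gamma on an energy grid in keV.
    norm, gamma = params
    return [norm * e ** (-gamma) for e in energies]

model = SpectralModel(
    name="powerlaw",
    parameters=[Parameter("norm", "ph/cm^2/s/keV @ 1 keV", 1.0, 0.0, 1e6),
                Parameter("gamma", "", 1.7, -3.0, 10.0)],
    evaluate=powerlaw,
)

grid = [0.5, 1.0, 2.0, 5.0]
print(model.evaluate([1.0, 1.7], grid))  # model flux at each grid energy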
Secondly, I address the issue of resource discovery for tabular and functional theoretical and phenomenological results such as extinction laws, luminosity functions, isochrones, and distance indicators. A structured extension of the CDS concept of UCDs could make tabular data of this kind easily available not only to astronomers but also to interoperable software. This project is supported by the Chandra X-ray Center under NASA contract NAS8-39073. [O8.5] Federating Catalogs and Interfacing Them with Archives: A VO Prototype Douglas J. Mink Michael J. Kurtz A common scientific requirement is to perform a joint query on two or more remote catalogs, then use the resulting combined catalog as input to query an archive or catalog. We have developed techniques which enable the routine federation of several of the largest astrometric and photometric catalogs, from either in-house or remote copies, and the use of this federated output to query the several archives of spectral and imaging data which we either manage or maintain local copies of. Allowing the federation of arbitrary sections of large catalogs with user-defined match criteria, and then allowing this result to be used to query several large archives of spectral and imaging data (also subject to user constraints), is a key goal of all VO projects. The problems we have solved in developing our methods will also have to be addressed by any VO project which delivers similar capabilities. [O9.2] Uncertainty Estimation & Propagation in SIRTF Pipelines Mehrdad Moshir, SSC/Caltech/JPL John Fowler, SSC/Caltech/JPL David Henderson, SSC/Caltech/JPL In the course of reducing raw data from SIRTF into properly calibrated science products, many automated pipelines are utilized. In a typical pipeline, instrumental signatures are successively removed and previously computed calibration values are applied. For such a large-scale automated process one needs to assess quantitatively the results of data reduction to facilitate quality assessment, for example to verify that requirements are met. Furthermore, higher-level science products such as point source extractions or mosaics depend on trustworthy estimates of the uncertainties in the data. In addition, it is essential that the end user is supplied with statistically meaningful measures of confidence in the quoted fluxes or positions to allow full scientific utilization. For these reasons all of the SIRTF pipelines have been designed to estimate and propagate uncertainties at each step. This paper will discuss the methods that we have adopted for estimating and propagating uncertainties. Our approach has been based on sound statistical reasoning while taking into account the implications of inherent uncertainties in the characterization of the instrumental signatures that we are trying to remove.
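The standard machinery behind such step-by-step propagation is first-order (delta-method) error propagation. The abstract does not spell out SIRTF's exact estimators, so the following is the textbook form: if a pipeline step computes $y = f(x_1, \ldots, x_n)$ from inputs with variances $\sigma_{x_i}^2$, then to first order

\[ \sigma_y^2 = \sum_{i=1}^{n} \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2 + 2 \sum_{i<j} \frac{\partial f}{\partial x_i} \frac{\partial f}{\partial x_j} \, \mathrm{cov}(x_i, x_j). \]

For example, a dark-subtraction step $y = x - d$ with independent errors gives $\sigma_y^2 = \sigma_x^2 + \sigma_d^2$: the variances add in quadrature, and each pipeline stage can carry such a variance plane forward alongside its data.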