TSR 2009-01: Pysynphot Commissioning Report

Authors: Victoria G. Laidler (SSB) and the INS commissioning team
(Elizabeth Barker, Luigi Bedin, Francesca Boffi, Howard Bushouse, Rosa Diaz,
Danny Lennon, Charles Proffitt, Michael Wolfe)

Date: 14 August 2009

Abstract
--------

The pysynphot package is being developed by SSB as a re-implementation of, and
replacement for, the STSDAS.SYNPHOT package that is widely used for synthetic
photometry. Version 0.6 of pysynphot has been commissioned by a detailed
comparison of its results to SYNPHOT results. Several bugs were identified in
SYNPHOT, and a number of bugs were identified and fixed in pysynphot. As a
result of this process, pysynphot has been determined to produce answers that
are at least as accurate as, and in some data domains more accurate than,
those of SYNPHOT. A final round of testing was conducted with the Exposure
Time Calculator, and pysynphot v0.61 has been certified for use with ETC 18.0.

Goals and methodology
---------------------

The purpose of this commissioning process was to demonstrate to the instrument
teams, and by extension to other potential pysynphot users, that the answers
produced by pysynphot are at least as accurate as those produced by SYNPHOT.
Because SYNPHOT has known deficiencies, and pysynphot was developed in part to
address those deficiencies, we began by comparing SYNPHOT and pysynphot
results, and then examined cases that failed this comparison to determine
whether the differences could be explained by known differences between the
two packages. If so, the outcome was deemed acceptable; if not, further
investigation was undertaken to determine which package was less correct,
producing either a bug report and corresponding corrective action for
pysynphot, or a bug report and no further action for SYNPHOT.

Test software
.............

Software was developed to execute the SYNPHOT and pysynphot commands and
perform the comparisons. This software evolved over time as our comparisons
were refined, and as failed tests were traced to bugs in the test software.
The details of this effort are not included in this report, but are recorded
in the change log for the test software, which is located in the subversion
repository with the pysynphot code under test/commissioning. The test sets are
also stored in this repository.

Test sets
---------

The largest test set was derived from the existing set of ETC regression tests
for ACS, NICMOS, STIS, and WFC3, as part of the preparation for moving these
ETCs to use pysynphot. (No similar set was derived for COS because its ETC was
already using pysynphot and its regression tests had already passed.) Each ETC
test case resulted in multiple (py)synphot calls; after removing obvious
duplicates in function/obsmode/spectrum space, these became the ETC-based test
set.

For each test case, comprising a (SYNPHOT task, obsmode, spectrum) tuple,
several quantities were compared. Each of these comparisons was tracked as a
distinct test, and the quantities to be compared (and thus the set of tests
performed) naturally varied depending on the SYNPHOT task. Initial and
intermediate quantities were compared as well as final quantities; for
example, a test case involving the countrate functionality generated tests
that compared not only the final scalar count rate, but also the throughput
for the specified obsmode, the spectrum alone, and the spectrum-throughput
combination, both in counts and in photlam (the flux unit used internally by
both SYNPHOT and pysynphot). This procedure ensured that any problems could be
quickly localized and more easily investigated.
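As a concrete illustration, the sketch below shows roughly how the pysynphot
side of such a countrate-style test case maps onto these quantities. This is
not the commissioning test code itself: the obsmode and blackbody source are
arbitrary placeholders, and a working pysynphot installation with the CDBS
data files is assumed::

    import pysynphot as S

    bp = S.ObsBandpass('acs,wfc1,f555w')   # placeholder obsmode
    sp = S.BlackBody(5500)                 # placeholder source spectrum
    obs = S.Observation(sp, bp)            # spectrum-throughput combination

    thru = bp.throughput                   # throughput array (testthru)
    spec = sp.flux                         # spectrum alone, in photlam (testcrspec)
    comb_photlam = obs.flux                # combination in photlam (testcrphotlam)
    obs.convert('counts')
    comb_counts = obs.binflux              # binned combination in counts (testcrcounts)
    rate = obs.countrate()                 # scalar count rate (testcountrate)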
The table below summarizes the set of tests performed for test cases involving
a given SYNPHOT task.

Table of tests run as a function of testcase type:

============= ======== ======== ========= ============== =========
(test type)   calcphot calcspec countrate specsourcerate thermback
============= ======== ======== ========= ============== =========
Array tests:
testthru      X                 X         X
testcrspec    X        X        X         X              X
testcrphotlam                   X         X
testcrcounts                    X         X
Scalar tests: ---      ---      ---       ---            ---
testcountrate                   X         X
testefflam    X                 X         X
testthermback                                            X
============= ======== ======== ========= ============== =========

Additional smaller test sets were defined:

- from first principles
- from cases with reported problems identified by INS
- for functionality that is not exercised by the ETC, including more general
  renormalization and the effective stimulus (effstim) calculation provided by
  the SYNPHOT.calcphot task

However, the resulting test set was overwhelmingly large, primarily due to the
extremely large size of the WFC3 regression test set. Therefore, after the
initial testing period, the review procedure was modified to investigate only
cases for which the throughput or count rate comparisons failed, and to review
the other quantities only as part of an ongoing investigation. This was
necessary to reduce the amount of work to a level that better matched the
available resources.

The final test set included 15701 tests. Of these, 1456 constituted a
representative subset that received more careful scrutiny, as did all
testcountrate and testthru tests. Additionally, any "extreme" failures in a
given run were reviewed. The breakdown of tests is summarized in the table
below.

Table of final review test set:

============== =======================
Source         Total
============== =======================
Total tests    15701 (8632 from WFC3)
Repr. subset   1456
testcountrate  1769
testthru       3218
============== =======================

Comparing quantities
--------------------

While comparing scalar quantities, such as the count rate and effective
wavelength, was relatively straightforward, writing tests to compare array
results from SYNPHOT and pysynphot was difficult. The problem had two
elements:

- producing the arrays to be compared on the same wavelength array
- devising a comparison that is reasonably insensitive to numerical noise

Producing arrays on the same wavelength array
.............................................

Because pysynphot was deliberately designed to correct some of SYNPHOT's
deficiencies in wavelength array handling, the spectral arrays produced by the
two packages will necessarily differ unless special steps are taken to force
them onto the same wavelength array. This was accomplished in a slightly
different fashion for the various tests, as follows:

testcrspec:
    This test uses the SYNPHOT.countrate task with a box(R/2, R) bandpass,
    where R is the full wavelength range over which the source spectrum is
    defined, used to simulate the obsmode, in order to produce the source
    spectrum unconvolved with the instrumental obsmode. The countrate task
    uses a wavecat file to look up the optimally binned wavelength array based
    on the obsmode; for this test, a special-purpose wavecat was produced for
    each run to force the use of the wavelength array of the spectrum computed
    by pysynphot.

testthru:
    This test samples the pysynphot-computed throughput array on the
    wavelength array used for the SYNPHOT-computed throughput.

testcrcounts, testcrphotlam:
    For these tests, the wavelength array from the SYNPHOT-computed spectrum
    is used as the binned wavelength set when constructing the pysynphot
    Observation.
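For illustration, a minimal sketch of the testthru and testcrcounts/
testcrphotlam strategies is shown below. It is not the actual test code: it
reuses the placeholder obsmode and source spectrum from the earlier sketch,
and the stand-in for the SYNPHOT wavelength array (which in the real tests
was read from the SYNPHOT output for the same case) is also a placeholder::

    import numpy as np
    import pysynphot as S

    # Placeholder inputs for one test case
    bp = S.ObsBandpass('acs,wfc1,f555w')
    sp = S.BlackBody(5500)
    wave_syn = np.linspace(3000.0, 11000.0, 2000)   # stand-in for the SYNPHOT wavelengths

    # testthru: sample the pysynphot-computed throughput on the SYNPHOT
    # wavelength array (simple linear interpolation shown here)
    thru_on_syn = np.interp(wave_syn, bp.wave, bp.throughput)

    # testcrcounts / testcrphotlam: use the SYNPHOT wavelengths as the binned
    # wavelength set when constructing the pysynphot Observation
    obs = S.Observation(sp, bp, binset=wave_syn)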
Performing the array comparisons
................................

Constructing a meaningful comparison of numerical array data that is sensitive
to significant differences, but insensitive to insignificant ones, is a
difficult task. This was the most volatile element of the commissioning test
software, as we experimented with various approaches; details can be seen in
the revision log. This document describes the final approach used.

- After confirming that the arrays are the same size, the two endpoints at
  each end of the array were excluded from the comparison. This step was
  introduced when it was observed that the endpoints were often significantly
  discrepant due to a difference in wavelength array handling.

- The significant elements of each array were identified by selecting elements
  greater than some small fraction of the array maximum. The value of this
  fraction (SIGTHRESH) was tunable for test runs but was typically on the
  order of 0.01. If the two arrays had different sets of significant elements,
  a flag was set, and the test continued using the set of elements that were
  significant in the SYNPHOT array.

- From the set of significant elements, those for which the SYNPHOT array is
  nonzero were further selected. Using this subset of elements, the
  discrepancy array (test - ref)/ref was constructed, where test is the
  pysynphot array and ref is the SYNPHOT array.

- If any elements of the discrepancy array exceeded the threshold THRESH (also
  tunable, typically set to 0.01), the test failed.

- If more than SUPERTHRESH (typically 20%) of the elements of the discrepancy
  array exceeded the threshold, the test was considered an "extreme failure".

- The min, max, mean, and standard deviation of the discrepancy array were
  also reported, as was the number of 5-sigma outliers.
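The sketch below paraphrases this procedure in numpy. It is a simplified
reconstruction for illustration, not the actual commissioning test code; in
particular, the treatment of discrepancies as absolute values and the exact
definition of the 5-sigma outlier count are assumptions::

    import numpy as np

    def compare_arrays(ref, test, sigthresh=0.01, thresh=0.01, superthresh=0.20):
        """Compare a pysynphot array (test) against a SYNPHOT array (ref)."""
        ref = np.asarray(ref, dtype=float)
        test = np.asarray(test, dtype=float)
        if ref.shape != test.shape:
            raise ValueError("arrays must be the same size")

        # Exclude the two endpoints at each end of the arrays
        ref, test = ref[2:-2], test[2:-2]

        # Select the significant elements of each array; if the two sets
        # differ, flag it and continue with the SYNPHOT (ref) set
        sig_ref = ref > sigthresh * ref.max()
        sig_test = test > sigthresh * test.max()
        sig_mismatch = not np.array_equal(sig_ref, sig_test)

        # Fractional discrepancy over the significant, nonzero ref elements
        idx = sig_ref & (ref != 0)
        disc = (test[idx] - ref[idx]) / ref[idx]

        # Pass / fail / extreme-failure decision
        bad = np.abs(disc) > thresh
        if not bad.any():
            status = 'pass'
        elif bad.mean() > superthresh:
            status = 'extreme failure'
        else:
            status = 'fail'

        stats = {'min': disc.min(), 'max': disc.max(),
                 'mean': disc.mean(), 'std': disc.std(),
                 'n_5sig': int((np.abs(disc - disc.mean()) > 5 * disc.std()).sum()),
                 'sig_mismatch': sig_mismatch}
        return status, stats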
Diagnostic plots
................

An important part of the evaluation procedure involved the use of diagnostic
plots produced from the FITS files containing the SYNPHOT and pysynphot
spectra (sampled on the SYNPHOT wavelength array) calculated for each test.
For each pair of spectra, three plots were produced:

- a simple overplot of both spectra
- the ratio of the results, 1 - (SYNPHOT/pysynphot)
- the differences, scaled by the maximum value in the SYNPHOT spectrum

For convenience, the log file from the test was reproduced on the same page as
these three plots. Only non-zero array elements were used in the division, and
it was assumed that the spectra were indeed on the same wavelength range.
Comparisons in which the maximum difference was greater than some preset
percentage were flagged. Any problems, and some additional information, were
listed in an output file. Earlier versions of this code experimented with
excluding varying numbers of points from the ends of the array to try to trap
and ignore spurious differences, but the final version plots the full
wavelength range.

.. figure:: sample_diag.png
   :alt: A sample diagnostic plot from a STIS case

   A sample diagnostic plot for a failed STIS test case.

Errors identified during the commissioning
------------------------------------------

Identified SYNPHOT errors
.........................

The following problems with SYNPHOT were either previously known or were
identified during this commissioning process. Tests exhibiting these problems
were either deemed acceptable, or reworked to avoid the failure so that a
meaningful comparison could be performed:

- Poor performance around very narrow lines or very steep gradients
  (acceptable discrepancy)
- Spectrum differences shortward of 900 A due to intersection with the Vega
  wavelength array (acceptable discrepancy; tests modified to avoid failing)
- Spectrum differences because SYNPHOT drops any wavelength bins outside the
  range defined by the renormalization bandpass (acceptable discrepancy)
- Poor performance in renormalization around suddenly sparse wavelength arrays
  (reworked tests)
- Bug in calcspec not present in countrate, documented in TSR 2008-01
  (reworked tests)

Identified and repaired pysynphot errors
........................................

The following problems with pysynphot were identified and fixed during this
commissioning process:

- Incorrect calculation of interpolated spectral elements (#134)
- Incorrect tapering behavior (#149)
- FITS files written to single precision by default (#146)
- Incorrect parsing of obsmode keywords that look like numbers (#125)
- Incorrect parsing of band() arguments (#126)
- Implement gal3 extinction law (#127/123)
- Improve behavior in case of partial overlap (#157)
- Handle partial overlap cases in the etc.py interface (#158)

Identified data issues
......................

The following problems with the SYNPHOT data files managed by CDBS were
identified and resolved during the commissioning process:

Unpinned throughput files
~~~~~~~~~~~~~~~~~~~~~~~~~

Because SYNPHOT uses the intersection of the component wavelength arrays,
throughput files that simply run out of data at one or both ends are treated
by SYNPHOT as if they have zero throughput outside the defined range.
Pysynphot uses the union, rather than the intersection, and extrapolates
missing data. To resolve this source of (sometimes large) differences between
SYNPHOT and pysynphot, the instrument teams determined which throughput files
should be "pinned" to zero at one or both ends (for example, mirrors should
not be pinned; ordinary filters should be pinned at both ends; longpass
filters should be pinned at one end). The final set of commissioning tests was
run against a special version of CDBS that contained the corrected files.
These files were delivered to CDBS once the commissioning process was
complete.

Graph table and WFC3 thermal
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A problem was detected with the graph table that produced incorrect thermal
results for WFC3, because there was no entry in the THCOMPNAME column for the
wfc3_ir_cor component. Because of the different algorithms used to traverse
the graph table, pysynphot was immune to this error but SYNPHOT was not. Thus
pysynphot correctly included this element in the optical path for the thermal
background calculation, while SYNPHOT incorrectly omitted it, and the two
packages therefore produced different thermal background rates for WFC3
observing modes.

Results
-------

In most cases, agreement between SYNPHOT and pysynphot is quite good. This is
illustrated by the histogram below, which shows the (syn - pysyn)/syn
discrepancy for the scalar quantity countrate. The peak, centered at zero, is
quite narrow, with a small number of cases discrepant by less than 1%, and a
handful of larger outliers.

.. figure:: countrate.png
   :alt: Histogram of countrate results showing narrow peak

   (syn - pysyn)/syn for countrate tests, full range shown.

.. figure:: countrate_zoomed.png
   :alt: Zoomed version of the histogram

   Same histogram, zoomed to show the structure around the peak.
Disagreements are generally due to differences in interpolation, for example:

- extremely narrow emission lines
- high-frequency features in spectra or bandpasses
- spectra that are poorly sampled
- steep gradients
- differences in endpoint handling

Some differences also arise from differences in extrapolation, as discussed
above under Identified data issues. For all cases in which there is a
significant difference, pysynphot produces a more accurate result than
SYNPHOT.

The appendices include instrument-specific discussions of the testing process
and results.

Coda: Certification of pysynphot for use with the ETC
-----------------------------------------------------

The Exposure Time Calculator is a primary user of the SYNPHOT package, and so
there was additional concern to ensure that pysynphot and the ETC functioned
together correctly. Therefore, as a coda to the commissioning process, a set
of certification tests was run against a version of ETC 17.4. This
certification process involved replacing the use of SYNPHOT with pysynphot and
running the ETC regression tests themselves (including those for COS, which
had been using pysynphot v0.4 but not the version under test). Some of these
tests failed, as expected, due to the differences between SYNPHOT and
pysynphot. These failures were examined in light of the known results from the
commissioning tests, and all were either small enough to be acceptable or
adequately explained by the known differences between SYNPHOT and pysynphot.

During this analysis, a potential problem for slitless spectroscopy was
identified because of the way that the ETC uses the pysynphot .countrate()
method in these cases. Namely, it used the optimally binned wavelength array
(specified by the wavecat lookup table) for the source, but the native
wavelength array (the union of the wavelength arrays of all throughput and
spectrum components) for the sky. In order to expedite the certification
process and facilitate an early delivery of the ETC, the pysynphot code in the
ETC interface was modified so that countrate would use the optimally binned
wavelength array for all calculations. While this may not be the most correct
thing to do, it is at least consistent with SYNPHOT's behavior. The discussion
of the scientifically correct behavior was deferred until the instrument teams
have more time to consider the question.

This modified version of pysynphot was designated v0.61, and the release for
use by the ETC was prepared from the tagged version
https://svn.stsci.edu/svn/ssb/astrolib/tags/pysynphot_v0.61_ETC18.0. Further
detail about the certification process is stored in the pysyncert_17_4 branch
of the ETC repository
(https://svn.stsci.edu/svn/ssb/extic/branches/pysyncert_17_4/).

Acknowledgements
----------------

The authors thank Donald McLean for assistance with the ETC regression tests,
and Mark Sienkiewicz for assistance with the infrastructure for test
management and reporting, both of which were necessary for the successful
completion of this project.