1: %Begun 30 Dec 1999
2: \documentstyle[12pt,aasms4]{article}
3: \begin{document}
4: \def\arcsec{\ifmmode^{\prime\prime}\;\else$^{\prime\prime}\;$\fi}
5: \def\arcmin{\ifmmode^{\prime}\;\else$^{\prime}\;$\fi}
6: \title{The Number of Publications Used as a Metric of the NOAO WIYN Queue 
7: Experiment}
8: 
9: 
10: \author{Philip Massey, Mary Guerrieri,
11: and Richard R. Joyce}
12: \affil{Kitt Peak National Observatory, National Optical Astronomy
13: Observatory\altaffilmark{1}
14: \\ P.O. Box 26732, Tucson, AZ 85726-6732}
15: 
16: 
17: 
18: 
19: \altaffiltext{1}{Operated by AURA
20: under cooperative agreement with the
21: National Science Foundation.}
22: \begin{abstract}
23: 
24: We use the number of papers published in 1998 and 1999 to test the
25: hypothesis that the queue observing mode at WIYN leads to a
26: significantly higher scientific throughput than classical mode
27: observing.  We use the papers published from the 4-m, and papers
28: published from the non-queue WIYN time as controls, requiring only that
29: the data be obtained after 1996 August 1, at which time the WIYN queue
30: was in its third full semester of operation, and the WIYN instruments
31: functional and stable.  The number of papers published from the queue
32: data is actually 1.5 times smaller (on a per night basis) than from the
33: 4-m, and roughly comparable to (but lower than) the number published
34: from non-queue WIYN time.  Thus neither comparison offers any {\it
35: support} for the hypothesis that queue leads to a higher scientific
36: throughput.  The number of papers is relatively small, but the
37: statistics are sufficiently robust to {\it reject} the possibility that
38: queue observing at WIYN leads to a factor of 1.5 enhancement in
39: publication rate with a 99.3\%  confidence in comparison to the 4-m,
40: and with an 89.9\% confidence in comparison with non-queue WIYN time.
41: We consider several explanations, and urge that other observatories
42: planning to employ the queue mode include some controls to provide
43: an objective evaluation of its success. \end{abstract}
44: 
45: \keywords{PACS codes 95.45, 95.55; instrumentation: miscellaneous---methods: 
46: miscellaneous---sociology of astronomy}
47: 
48: \section{Introduction}
49: 
50: The 3.5-m Wisconsin-Indiana-Yale-NOAO (WIYN) telescope was dedicated on
51: 1994 October 15, and shared-risk observing began in 1995 March.  NOAO's share
52: of the time is 40\%, and nearly all of this has been carried out in ``queue" mode, where the observations 
53: from highly ranked proposals are placed in a queue and executed during
54: nights assigned to the queue program.  The observations are carried out by
55: highly experienced professionals, who are extremely familiar with
56: the instrumentation, without the direct assistance of the proposing astronomer.
57: A small fraction of the NOAO time
58: is scheduled out in ``classical mode", with the
59: observers present at the telescope.  The time allocated to the
60: university consortium members (roughly 60\%) 
61: is all carried out in classical mode.
62: 
63: 
64: The goal
65: of the NOAO WIYN queue experiment was 
66: eloquently described by Silva \& De Young (1996) as an empirical test
67: of ``the hypothesis that in the face of a high over-subscription rate, the
68: science throughput of WIYN can be maximized by executing
69: the most highly ranked science programs first, completing datasets in a timely
70: manner, allowing a larger range of program lengths, and matching the
71: observing program to the observing conditions on an observation-by-observation
72: basis."  
73: 
74: The WIYN queue has often been described as an ``experiment" at least in part
75: because other observatories are considering scheduling some or all of their time
76: in this mode, and NOAO staff have felt that what we can learn from the WIYN
77: queue will be useful to others.
78: In an era that sees both the proliferation of very large
79: ($\ge$8 m) telescopes and ever-tightening financial resources, observatories
80: are scrambling to understand how to maximize their scientific return.
81: 
82: Queue observing offers a variety of theoretical advantages, as nicely
83: summarized by Mountain (1996) and
84: Boroson et al.\ (1998). For very highly ranked programs that require
85: rare conditions, queue observing may be the only practical way to acquire such data.  Queue observing naturally allows synoptic observations, and such
86: scheduling easily accommodates target-of-opportunity requests, such as optical
87: follow-ups of gamma-ray bursts or supernovae. Furthermore, as instrumentation
88: becomes more complex, queue observing carried out by dedicated observers
89: may result in more efficient use of telescope time than if the observations
90: were carried out by visitors who use the equipment only occasionally.
91: This contention is partially supported by evidence that 
92: observers collect less data on the first night of an observing run than
93: on subsequent nights (Bohannan 1998).
94: 
95: However, there are obvious down-sides to the queue mode.  The astronomer is not
96: present at the telescope, and therefore cannot make real-time decisions
97: concerning the data.  Serendipity is eliminated, as are the risky programs 
98: many of us have snuck in during gaps in our main observing program, and which
99: have sometimes led to the more interesting results.
100: Some of us suspect, rightly or wrongly, 
101: that we could better carry out our own observations.
102: And, there is not the same strong sense of
103: ``data ownership" that comes
104: with having carried out the observations ourselves:  the memory of a night
105: may provide details that are relevant to the interpretation of the reduced
106: data, as well as providing an emotional impetus for seeing the project
107: through to its completion.  
108: 
109: There is also a non-negligible expense of
110: running a queue, which is offset to some degree
111: by the smaller support
112: required for visiting astronomers.
113: 
114: Boroson (1996) has described a simulation program that can be used to
115: test how successfully programs are completed in a queue mode
116: vs.\ a classical mode, using Monte Carlo sampling of characteristic 
117: observing conditions (weather, seeing) for the site. Boroson 
118: et al.\ (1998) used this simulation program to compare queue mode and classical
119: scheduling 
120: for two actual semesters (1997) of WIYN programs, concluding that 
121: queue scheduling at WIYN has led to a significant gain in efficiency
122: and scientific effectiveness. 
123: 
124: 
125: Now that the queue experiment has run for several years, we thought it would be worth examining the gain using some real-world measure.
126: As emphasized by Boroson et al.\ (1998), much of the argument about observing
127: modes can be emotional.  We seek some metric that we can use to {\it test}
128: the {\it hypothesis} enunciated above that the queue observing mode leads
129: to significant improvement in the {\it science throughput}.  One such
130: simple metric is the number of refereed papers published.  This may not be
131: as meaningful in its long-term impact on astronomy as, say, 
132: the number of important new discoveries, 
133: but at least it has the advantage of being quantifiable, and, if the
134: experimental and control samples are well matched, equitable and fair.
135: 
136: 
137: We choose to compare the number of papers produced by the WIYN queue to the 
138: following two controls, 
139: each with its advantages and disadvantages:
140: \begin{enumerate}
141: \item The number of papers produced by observations made over the same time
142: period with the Mayall 4-m telescope.
143: \item The number of papers produced by observations made over the same time
144: period by non-queue use of WIYN; i.e., primarily the time used by the 
145: consortium universities.
146: 
147: \end{enumerate}
148: 
149: The first comparison has the primary advantage that both the 4-m and WIYN
150: proposals have undergone similar scrutiny by
151: the same time allocation committees (TACs), 
152: which often consider such factors as the past track-record of the proposers 
153: as well as the 
154: scientific excellence of the proposals.  Thus proposers to the 4-m and
155: WIYN will feel similar pressures to publish in a timely manner, and the
156: feasibility of the proposals has been carefully evaluated.  Users of the
157: university time may choose to undertake 
158: longer-term projects, leading ultimately
159: to more important results, but not possessing the same rapid turnaround
160: from observing to publication.  
161: We offer the second comparison as there may be differences in the actual
162: on-sky performance of the two telescopes that would affect the results:
163: the 4-m is a mature telescope, possibly with fewer teething problems than
164: the newer WIYN.
165:  
166: If the queue leads to significantly higher 
167: scientific throughput, then we expect that the number of papers published using
168: data obtained via the queue should be significantly
169: greater than those produced by the
170: control samples, after normalization on the basis of the number of scheduled
171: nights.
172: 
173: \section{The Data Set}
174: 
175: All of the 1998 and 1999 issues of the main US astronomy journals
176: were examined
177: for papers which used 4-m and/or WIYN observations.  The complete list of 135
178: papers is given in Table~A1 of the Appendix.
179: 
180: In order to make a fair 
181: comparison, we restricted ourselves to only those papers for which the data
182: were obtained in semester ``1996B" or later (i.e., after 1996 August 1).
183: This was the third full semester of WIYN queue time, and the first semester in
184: which both the imager and fiber positioner were fully functional. 
185: (A non-linearity problem with the S2KB imager chip was discovered and
186: fixed during the 1996A semester, and a mechanical problem which compromised the
187: positioning accuracy of the Hydra fiber positioner was fixed in 1996 March.)
188: 
189: We list in Table~1 the number of papers published during 1998 and 1999.
190: Six papers used both 4-m {\it and} WIYN data; we chose to count each of
191: these papers separately for both telescopes, depending upon the date
192: on which the data were obtained for the telescope under consideration; i.e.,
193: if the data for WIYN were obtained in 1996B or later, but the 4-m data were
194: obtained prior to 1996, it would count as a WIYN publication but not as a
195: 4-m paper. There were six papers in our list in which the data
196: collected were such a minor component of the
197: paper that we chose not to count the paper at all; only one of these used
198: the WIYN queue, and in that case the data had been published previously by
199: the original proposers.
200: 
201: \section{Results}
202: 
203: 
204: \subsection{Comparison of the WIYN and the 4-m}
205: 
206: In order to make a valid comparison, we must first take into account that not
207: as much time is scheduled for the WIYN queue as for the 4-m.  We expect the
208: ratio to be about 40\%, as NOAO receives 
209: 40\% of the time on WIYN, and almost
210: all of this goes to the queue.  However, the 4-m is shut down during July
211: and August, while WIYN continues to operate; on the other hand, there
212: are more engineering nights scheduled at WIYN. One could use the 
213: total number of clear hours spent observing as
214: the normalization, but these data are hard to extract reliably.  Instead, we
215: took the final observing schedules for semesters 1996B, 1997A, 1997B, 1998A, and
216: 1998B, and simply counted the number of nights assigned to the WIYN queue,
217: and to science operations at the 4-m. (For the latter, we included half-night
218: instrument ``checkout" nights, as much of this time is typically 
219: returned to the observers
220: scheduled on the second half; full-night ``check" nights and engineering
221: nights were excluded.  We excluded all engineering nights scheduled at WIYN,
222: although occasionally queue observations are obtained during such time.)  
223: The numbers of nights so scheduled for the WIYN
224: queue and for the 4-m are 260 and 656 respectively; i.e.,
225: the number of nights scheduled to the WIYN queue turned out to be 39.6\% of the
226: nights scheduled at the 4-m.
227: 
228: If the hypothesis described above is
229: correct, we would expect the number of publications based upon WIYN queue data
230: to be significantly greater than 40\% of those produced by the 4-m.
231: Instead, we find in Table~1 that there
232: were only 9 papers produced by WIYN queue data as opposed to 34 papers
233: produced by the 4-m; i.e., 26\%.  Thus there are actually
234: 1.5 times fewer papers published (on a per night basis) based on
235: queue WIYN data relative to those based on 4-m data. This comparison does not
236: support the hypothesis of greater science throughput by the WIYN queue.
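
As a check on this arithmetic, the comparison can be written in terms of per-night publication rates.  The symbols $r_{\rm Q}$ and $r_{\rm 4m}$ are introduced here purely for illustration; the counts are those quoted above (9 papers in 260 nights for the WIYN queue, and 34 papers in 656 nights for the 4-m):
$$r_{\rm Q} = \frac{9}{260} \approx 0.035~{\rm papers~night^{-1}}, \qquad r_{\rm 4m} = \frac{34}{656} \approx 0.052~{\rm papers~night^{-1}},$$
$$\frac{r_{\rm 4m}}{r_{\rm Q}} = \frac{34/656}{9/260} \approx 1.5 .$$
Equivalently, at the 4-m rate one would expect roughly $0.396 \times 34 \approx 13.5$ papers from the 260 queue nights, compared with the 9 observed.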
237: 
238: Can we rule out the hypothesis given the small number statistics?  If we assume
239: the simplest model that a 1$\sigma$ uncertainty in the number of publications
240: $N$ is simply $\sqrt{N}$, then the 1$\sigma$ error on the 0.26 ratio of
241: WIYN to 4-m publications is 0.13.  What does it mean for there to be a 
242: ``significant" enhancement in the scientific throughput?  Boroson et al.\ (1998)
243: discuss how their simulation predicts this will depend upon program type,
244: TAC grade, and so on, and that overall about 2.5 times as many programs will be completed by queue observing as with classical observing.  We take here
245: a more conservative approach: certainly a 50\% increase (a factor of 1.5)
246: would be cause for celebration. Were this enhancement present, we would expect
247: there to be 1.5 $\times$ 39.6\% = 59.4\% as many WIYN queue papers as 4-m
248: papers.  We observe 0.26$\pm$0.13.
249: We thus can reject such an increase at a +2.5$\sigma$ level; i.e., with a 99.3\%
250: confidence.\footnote{The rejection probability corresponding to +2.5$\sigma$
251: was found by  $$1.0-0.5 \times (1.0-A_G(|x-\mu| /\sigma)),$$ where  
252: $A_G$ is the
253: integral probability of the normal distribution with a mean of $\mu$ and
254: a standard deviation of $\sigma$; see, for example, Fig.~C-2
255: in Bevington (1969).}
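
The quoted uncertainty and significance can be reproduced as follows.  This is a sketch under an assumption: the text does not state how the two $\sqrt{N}$ errors were combined, and linear addition of the fractional errors is assumed here because it recovers the quoted 0.13 (addition in quadrature would instead give about 0.10).  Writing $R = 9/34 \approx 0.265$ for the observed ratio and $\sigma_R$ for its uncertainty (both symbols are introduced only for this illustration),
$$\sigma_R \approx R \left( \frac{\sqrt{9}}{9} + \frac{\sqrt{34}}{34} \right) \approx 0.265 \times (0.333 + 0.171) \approx 0.134,$$
$$\frac{0.594 - R}{\sigma_R} \approx \frac{0.594 - 0.265}{0.134} \approx 2.5 .$$
Carrying the unrounded deviation of about $2.46\sigma$ through the expression in the footnote, with $A_G \approx 0.986$, gives $1.0 - 0.5 \times (1.0 - 0.986) \approx 0.993$, consistent with the confidence quoted above.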
256: 
257: \subsection{Comparison of Queue vs.\ Non-Queue Time at WIYN}
258: 
259: Of the 731 nights scheduled for science at WIYN during 1996B through 1998B,
260: we find that 260 nights were scheduled for queue observations (35.6\%),
261: 27 nights were scheduled for NOAO classical observations (3.7\%), and 444 as
262: university time (60.7\%).  If queue observing produced
263: a significantly higher scientific throughput, we would expect significantly
264: more than 36\% of the papers produced by WIYN data to be based on data obtained
265: with the queue.  Instead, of the 28 total WIYN
266: papers in our sample, 9
267: (32\%) were produced from queue data. This is 
268: essentially
269: the same fraction of time on WIYN used by the queue (36\%), and therefore does
270: not suggest that queue provides a significant advantage. 
271: 
272: While the data fail to offer any support for the hypothesis, at what level
273: can we reject the claim, given our limited statistics? Using the same argument
274: as above that we would hope for a factor of 1.5 enhancement over the non-queue
275: publication rate, we can ask at what level can we exclude the queue publications
276: amounting to 1.5$\times$ 35.6\% = 53.4\% of the total.  The uncertainty
277: in our 0.32 ratio is 0.17.  Thus we can exclude a 50\% enhancement at
278: the +1.3$\sigma$ level; i.e., with an 89.8\% confidence. 
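
The quoted uncertainty and deviation can be reproduced as follows, assuming (the text does not say so explicitly) that the fractional $\sqrt{N}$ errors on the two counts are added linearly, which recovers the quoted 0.17.  Writing $R = 9/28 \approx 0.321$ for the observed ratio and $\sigma_R$ for its uncertainty (symbols introduced only for this illustration),
$$\sigma_R \approx R \left( \frac{\sqrt{9}}{9} + \frac{\sqrt{28}}{28} \right) \approx 0.321 \times (0.333 + 0.189) \approx 0.168,$$
$$\frac{0.534 - R}{\sigma_R} \approx \frac{0.534 - 0.321}{0.168} \approx 1.3 .$$
For a deviation of this size the one-sided normal probability is approximately 0.90.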
279: 
280: Nevertheless, it is clear that queue observing does
281: fare better in this comparison than it did in comparison to the 4-m control,
282: although still failing to produce a higher number of publications. 
283: Several explanations
284: come to mind.
285: One possibility is that the 4-m simply operates more efficiently
286: than WIYN (at least in the time period when most of the data were acquired), 
287: and that
288: it was thus easier to obtain usable data at the 4-m.
289: It is possible that review of queue proposals by an
290: outside TAC leads to a higher publication rate than time used by the
291: universities, who have a preallocated amount of time, which is divided up
292: internally.  (As suggested earlier, the university time may be spent on 
293: longer-term programs than the NOAO portion.)  Finally, the 4-m supports a wider
294: complement of instrumentation (such as infrared imaging and spectroscopy) than WIYN,
295: which plausibly provides greater coverage of astronomical disciplines and
296: thus involvement in a wider variety of publications.
297: 
298: Although the numbers are small, the very high publication rate
299: for NOAO time that is scheduled {\it classically} at WIYN suggests that it
300: may be the TAC process rather than the telescope or instrumentation
301: which explains why the queue
302: does better in this comparison than it does in comparison to the 4-m: 
303: 14\% of the WIYN papers were produced by the small
304: (3.7\%) time allocated to non-university classical observing. 
305: The classically scheduled NOAO time 
306: undergoes the same rigorous review as the queue
307: proposals, and thus is under the same pressure to publish rapidly.
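
One way to make this comparison explicit is to divide each category's share of the WIYN papers by its share of the scheduled nights, so that a value of unity corresponds to the average WIYN publication rate.  The symbols $f_{\rm Q}$, $f_{\rm C}$, and $f_{\rm U}$ (queue, NOAO classical, and university time) are introduced here for illustration, and the university share of the papers (about 54\%) is inferred as the remainder, on the assumption that each paper is attributed to a single scheduling mode:
$$f_{\rm Q} \approx \frac{0.32}{0.356} \approx 0.9, \qquad f_{\rm C} \approx \frac{0.14}{0.037} \approx 3.8, \qquad f_{\rm U} \approx \frac{0.54}{0.607} \approx 0.9 .$$
Since 14\% of 28 papers corresponds to only about four papers, the $\sqrt{N}$ uncertainty on $f_{\rm C}$ is of order 50\%.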
308: 
309: \section{Discussion}
310: 
311: Arguably, the WIYN queue has been as well run as it is possible for
312: any queue to be.  A survey of astronomers who had proposed for
313: queue time suggests that people were very satisfied with the quality of the
314: data they received (Boroson et al.\ 1998); some might expect maintaining
315: data quality to be the hardest part of a queue.
316: Yet the evidence so far fails to support the suggestion that queue
317: observing leads to a higher scientific throughput, at least as measured by the
318: number of publications.  Why does this differ from the dramatic predictions of simulations that suggest that a much higher percentage of programs should be
319: completed by the queue mode? 
320: 
321: We have read through the papers based upon the WIYN queue data and have several observations of our own to offer.  First, let us consider the advantage that
322: queue offers in providing easy ``target of opportunity" (TOO) observations.
323: Of the full set of 11 papers (ignoring the 1996B cutoff), four rely on the
324: TOO advantage of queue for optical followup of 
325: gamma-ray bursts (Galama et al.\ 1998) or supernovae
326: (Jha et al.\ 1999; Perlmutter et al.\ 1999; and Riess et al.\ 1998).  
327: Although WIYN played a role in these important studies, our examination of
328: these papers suggests that it was a relatively minor role, with the
329: majority of the data coming from elsewhere.   For instance,
330: there are considerably
331: more
332: data from the CTIO 4-m (which is classically scheduled) than from WIYN
333: in the Riess et al.\ (1998) study.
334: Inspection of these papers suggests that
335: there is no lack of ways for large groups to acquire such data. 
336: The number of authors on these four papers ranges from
337: 17 to 42, with the large number of participants being a reflection of
338: the degree (and method?) of telescope access. 
339: Thus TOO use of WIYN may not be more significant simply
340: because there
341: are other ways of obtaining such data.
342: 
343: One of the other purported advantages for queue observing is the ability to 
344: take
345: advantage of particularly good
346: conditions, and indeed some programs may not be
347: completed any other way.  However, this advantage is larger the greater the
348: range of conditions. 
349: For instance, if the frequency histogram of delivered image quality (DIQ)
350: is very sharply peaked,
351: then queue offers less of an advantage, as all programs will obtain something
352: like the median seeing.  At WIYN the median DIQ (at {\it R})
353: is
354: 0.8 arcseconds, and 0.6 arcsecond or better images are achieved 18\% of the
355: time (Green 1999).  
356: Of the 11 queue papers listed in Table~A1, 
357: Armandroff, Jacoby, \& Davies (1999) is one of the clearest examples of taking
358: advantage of the queue to obtain the best DIQ. 
359: The study utilized sub-arcsecond conditions (0.8 arcsec at {\it B},
360: 0.6 arcsec at {\it V}, and 0.7 arcsec at {\it I}) for deep imaging of a newly
361: discovered dwarf member of the Local Group, Andromeda~VI, after confirming its
362: nature using imaging at the 4-m.  Nevertheless, these DIQ 
363: values are not all that
364: different from the median values.
365: 
366: However, it may be that the sociological issues raised in the introduction dominate.  The
367: use of queue may reduce the sense of ``data ownership," and given
368: situations of ``data saturation," we are more likely to publish the
369: data more rapidly if we have acquired them ourselves.  
370: The use of ``queue mode" on {\it HST} has been perceived as being highly
371: successful, 
372: although a meaningful control sample is hard to find for comparison; however,
373: one important difference comes to mind, namely that observing time (to US
374: proposers) usually comes with grants, providing a financial incentive to produce
375: results rapidly, coupled with a 1-year proprietary
376: period for unique data.  An additional consideration is that {\it HST} supplies
377: the user with fully reduced data, unlike WIYN, which provides basic calibration
378: data and requested standard observations, but which does not attempt a 
379: ``pipe-line" reduction. However, our own experience with {\it HST} data is
380: that customized reductions are often needed in order to provide the data most
381: meaningful for a particular application.
382: 
383: Finally, it may be that we simply have not been sufficiently patient.  As is evident from the 4-m publications, only one-third of the 4-m papers in
384: the past two years relied purely on ``new" data (i.e., all data obtained in
385: the past 3.5 years).  While our control samples explicitly took this into account,
386: we are nevertheless comparing numbers that are on the
387: tails of the distribution of how quickly data find their way into the
388: literature.  This may be particularly true if the datasets from the WIYN
389: queue were to be larger than those in the control samples, or if they
390: take longer to reduce.
391: Current plans call for discontinuing the WIYN queue at the
392: end of semester 2000A, but continuing to provide some synoptic and target of opportunity service observing beyond that.  It will be interesting to
393: re-examine the literature five years from now
394: using data obtained in 1996B-2000A
395: as the selection criterion.
396: 
397: We note that the quantity we would most like to measure is ``quality", but
398: this is of course harder to do in an objective manner.  Citation rates might
399: provide one means, but not enough time has passed for these to be meaningful.
400: Counting the number of papers is some measure of the ``output" of a telescope,
401: but it is not necessarily the best; it does have the advantage of being
402: objective and reproducible, qualities usually assumed to be desirable
403: in any experiment.
404: 
405: 
406: Nevertheless, our results suggest that it 
407: may benefit observatories to evaluate
408: their queue programs using some external measure, such as the number of
409: publications, if suitable controls can be defined.
410: 
411: \acknowledgments
412: Helmut Abt, 
413: Dave De Young, David Sawyer, 
414: Dave Silva, and Sidney C. Wolff were kind enough to 
415: provide thoughtful comments
416: on the manuscript.  We also benefited from conversations with Taft Armandroff,
417: Bruce Bohannan, and
418: Abi Saha on the issues of queue observing.
419: 
420: \section{Appendix}
421: 
422: In Table~A1 we present the list of papers published in the
423: {\it Astronomical Journal}, the {\it Astrophysical Journal} (Parts 1 and 2), 
424: and the {\it Publications of the Astronomical Society of the Pacific}
425: during 1998 and 1999 that used data from the 4-m and/or WIYN.  We list
426: the dates of the first data obtained (from the relevant telescope).  Often
427: this information was directly obtained from the paper, but in many cases we
428: had to contact the authors, or inspect the observing schedule or list of
429: queue programs to determine the actual date or semester.
430: 
431: 
432: 
433: %\input{tabs.tex}
434: 
435: \begin{references}
436: \reference {} Armandroff, T. E., Jacoby, G. H., \& Davies, J. E. 1999, AJ, 118,
437: 1220
438: 
439: \reference {} Bevington, P. R. 1969, Data Reduction and Error Analysis for
440: the Physical Sciences (New York: McGraw-Hill)
441: 
442: \reference {} Bohannan, B. 1998, SPIE, 3349, 30
443: 
444: \reference {} Boroson, T. 1996, in New Observing Modes for the
445: Next Century, ed.\ T. Boroson, J. Davies, \& I. Robson (San Francisco: ASP), 13
446: 
447: \reference {} Boroson, T., Harmer, D. L., Saha, A., Smith, P. S., Willmarth, D. W., \& Silva, D. R. 1998, SPIE, 3349, 41
448: 
449: \reference {} Galama, T. J. et al.\ (16 additional authors) 1998, ApJ, 497, L13
450: 
451: \reference {} Green, R. 1999, NOAO Newsletter, 60, 38
452: 
453: \reference {} Jha, S. et al.\ (41 additional authors) 1999, ApJS, 125, 73
454: 
455: \reference {} Mountain, M. 1996, in New Observing Modes for the
456: Next Century, ed.\ T. Boroson, J. Davies, \& I. Robson (San Francisco: ASP), 235
457: 
458: \reference {} Perlmutter, S. et al.\ (31 additional authors) 1999, ApJ, 517, 565
459: 
460: \reference {} Riess, A. G. et al.\ (19 additional authors) 1998, AJ, 116, 1009
461: 
462: \reference {} Silva, D., \& De Young, D. 1996, NOAO Newsletter, 45, 36
463: 
464: \end{references}
465: 
466: 
467: \end{document}
468: 
469: