Northeast Fisheries Science Center Reference Document 07-01
Accuracy and Precision Exercises Associated
with 2006 TRAC Production Aging
by Sandra J. Sutherland,
Nina L. Shepherd, Sarah E. Pregracke,
and John M. Burnett
National Marine Fisheries Serv., Woods Hole Lab., 166 Water St., Woods Hole MA
publication date January 2007;
web version posted January 23, 2007
Sutherland SJ, Shepherd NL, Pregracke SE, Burnett JM.
2007. Accuracy and precision exercises associated
with 2006 TRAC production aging. US Dep Commer, Northeast
Fish Sci Cent Ref Doc 07-01; 20 p.
Information Quality Act Compliance: In accordance with section 515 of Public Law 106-554, the Northeast Fisheries Science Center completed both technical and policy reviews for this report. These predissemination reviews are on file at the NEFSC Editorial Office.
INTRODUCTION
In production aging programs, age reader accuracy can be thought of
as how often the “right” age is obtained, and precision
as how often the “same” age is obtained (Campana 2001). It
is possible that, over time, an age reader may inadvertently change
the criteria that are used for determining ages, thereby introducing
a bias into the age data. This bias can be measured with accuracy
tests, which consist of the age reader blindly examining known- or
consensus-aged fish from established reference collections. An
age reader may also make occasional mistakes, which introduce random
errors into the data. The magnitude of this error can be measured
with precision tests, in which the age reader blindly re-ages
fish that they have already aged. Both accuracy and precision
must be considered within a quality-control monitoring program.
Acceptable levels of aging accuracy and precision are influenced by
factors such as species, age structure, and age reader experience. Although
percent agreement is strongly affected by these factors, the staff
of the Fishery Biology Program at the Northeast Fisheries Science Center
(NEFSC) have long considered levels above 80% to be acceptable. The
total coefficient of variation (CV) is less affected by these factors
and, thus, is a better measure of aging error. In many aging
labs around the world, total CVs of under 5% are considered acceptable
among species of moderate longevity and aging complexity (Campana 2001),
such as the species considered here.
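The two summary statistics used throughout this report can be sketched in Python as follows. This is a minimal sketch, assuming two paired readings per fish and assuming that the "total CV" is the mean of per-fish CVs (100 × standard deviation / mean of each fish's readings, as reviewed by Campana 2001); the NEFSC's exact computation may differ. The function names and the example ages are hypothetical, and the 80% and 5% defaults simply mirror the criteria stated above.

```python
import math
from typing import Sequence


def _check_pairs(first: Sequence[int], second: Sequence[int]) -> None:
    # Each position must refer to the same fish in both readings.
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("readings must be non-empty and paired fish by fish")


def percent_agreement(first: Sequence[int], second: Sequence[int]) -> float:
    """Percentage of fish given the same age in both readings."""
    _check_pairs(first, second)
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


def total_cv(first: Sequence[int], second: Sequence[int]) -> float:
    """Mean of the per-fish coefficients of variation, in percent.

    Assumes the per-fish CV is 100 * SD / mean of that fish's two
    readings and that ages are non-negative.
    """
    _check_pairs(first, second)
    cvs = []
    for a, b in zip(first, second):
        fish_mean = (a + b) / 2
        if fish_mean == 0:
            # Both readings are age 0, so they agree; treat the CV as 0.
            cvs.append(0.0)
            continue
        # Sample standard deviation of R = 2 readings (divisor R - 1 = 1).
        sd = math.sqrt((a - fish_mean) ** 2 + (b - fish_mean) ** 2)
        cvs.append(100.0 * sd / fish_mean)
    return sum(cvs) / len(cvs)


def meets_criteria(first, second, min_agreement=80.0, max_cv=5.0) -> bool:
    """True when agreement is above min_agreement and total CV is under max_cv."""
    return (percent_agreement(first, second) > min_agreement
            and total_cv(first, second) < max_cv)


# Hypothetical example: ten fish, one read a year older the second time.
first = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
second = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
print(percent_agreement(first, second))  # 90.0
print(round(total_cv(first, second), 2))
print(meets_criteria(first, second))  # True
```

Because a one-year difference weighs less on an old fish than on a young one, the CV responds to relative rather than absolute disagreement, which is one reason it is less sensitive to age structure than percent agreement.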
For over 35 years, scientists at the NEFSC Fishery Biology Program have regularly conducted production aging (determining the ages of large numbers of samples over a short period of time, using established methods; Penttila and Dery 1988) for the species assessed by the Transboundary Resources Assessment Committee (TRAC). Historically, our approach
to age-data quality control and assurance has been a two-reader system. In
this approach, there are both a primary and a secondary age reader
for each species. The primary age reader conducts all production
aging, and the secondary age reader then ages a portion of those same
samples using similar methods. The ages determined by the two
readers are compared, and if they agree sufficiently (above 80% agreement),
the production ages are considered valid. If not, the sources
of disagreement must first be resolved. This interreader approach
is still used in the course of training new readers in order to ensure
consistency in application of aging criteria and in inter-laboratory
sample exchanges. Budgetary and staffing constraints have made
this approach less feasible, however, by reducing the number of species
for which there are two competent age readers at this laboratory.
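The two-reader check described above amounts to comparing the primary reader's production ages with the secondary reader's ages for the same fish. The sketch below is illustrative only: the function name, the dictionary-of-fish-IDs input, and the fish IDs in the example are assumptions, not the program's actual software. It applies the strict "above 80% agreement" rule from the text and lists the fish whose ages would need to be resolved.

```python
from typing import Dict, List, Tuple


def two_reader_check(
    primary: Dict[str, int],
    secondary: Dict[str, int],
    threshold: float = 80.0,
) -> Tuple[bool, float, List[str]]:
    """Compare a secondary reader's ages with the primary reader's
    production ages for the fish that both readers aged.

    Returns (ages_valid, percent_agreement, ids_to_resolve).
    """
    shared = sorted(set(primary) & set(secondary))
    if not shared:
        raise ValueError("no fish were aged by both readers")
    to_resolve = [fid for fid in shared if primary[fid] != secondary[fid]]
    agreement = 100.0 * (len(shared) - len(to_resolve)) / len(shared)
    # Production ages stand only when agreement is above the threshold;
    # otherwise the listed disagreements must be resolved first.
    return agreement > threshold, agreement, to_resolve


# Hypothetical IDs: the secondary reader re-ages five of six production fish.
primary = {"F1": 3, "F2": 4, "F3": 5, "F4": 2, "F5": 7, "F6": 1}
secondary = {"F1": 3, "F2": 5, "F3": 5, "F4": 2, "F5": 7}
print(two_reader_check(primary, secondary))  # (False, 80.0, ['F2'])
```

Exactly 80% agreement fails this check because the criterion is agreement above 80%.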
In the past few years, the NEFSC Fishery Biology Program has updated its approach to quality control and assurance. Intrareader tests of aging accuracy and precision, as described above, allow us to quantify the inherent aging error and bias in the ages determined by each of our staff members. These values provide a measure of the reliability of the production age data used in stock assessments, and they may be incorporated directly into population models to account for aging error.
In conjunction with implementation of these tests, we have begun to
establish reference collections of age samples for each species. These
collections are necessary to evaluate aging accuracy. Fish of
known age are difficult to obtain, so we have focused on assembling
collections from age samples which have been included in aging exchanges
with other laboratories. From those samples, we have selected
those fish for which multiple experienced age readers agree on the
age (see Silva et al. 2004 for more details).
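Selecting reference fish from exchange samples can be expressed as a simple filter. This sketch assumes that "agree" means every participating reader assigned the same age; the function name, the `min_readers` parameter, and the input layout are hypothetical, and Silva et al. (2004) describe the criteria actually used.

```python
from typing import Dict, List


def build_reference_collection(
    exchange_ages: Dict[str, List[int]],
    min_readers: int = 2,
) -> Dict[str, int]:
    """Return {fish_id: consensus_age} for fish on which all readers agreed.

    exchange_ages maps each fish ID to the ages assigned by the
    experienced readers in an exchange; fish read by fewer than
    min_readers readers, or with any disagreement, are left out.
    """
    reference = {}
    for fish_id, ages in exchange_ages.items():
        # A single distinct age means every reader agreed.
        if len(ages) >= min_readers and len(set(ages)) == 1:
            reference[fish_id] = ages[0]
    return reference


# Hypothetical exchange readings for four fish.
exchange = {"A": [4, 4, 4], "B": [5, 6, 5], "C": [3], "D": [2, 2]}
print(build_reference_collection(exchange))  # {'A': 4, 'D': 2}
```

A majority-vote rule would retain more fish (fish "B" here) at the cost of a less certain reference age; the unanimous rule is the conservative choice for an accuracy standard.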
As in past years, exercises were undertaken to estimate the accuracy
and/or precision of U.S. production aging for the 2006 TRAC assessments
(Legault et al. 2006; Gavaris et al. 2006; Van Eeckhaute and Brodziak
[in press]) of Georges Bank stocks of cod (Gadus
morhua), haddock (Melanogrammus
aeglefinus), and yellowtail flounder (Limanda
ferruginea). This report lists the results of those exercises.
METHODS
In all cases, the primary age reader for each species conducted the
production aging and completed all accuracy and precision exercises. Subsamples
were randomly selected to be re-aged in order to test age-reader accuracy
(versus the reference collections) or precision (versus samples previously
aged by that reader). When re-aging fish, the age reader had
knowledge of the same data as during production aging (i.e. fish length,
date captured, and area captured) but no knowledge of previous age
estimates. During age-testing exercises, no attempts were made
to improve results with repeated readings. There was also no
attempt to revise the production ages in cases where differences occurred.
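The blind design of the re-aging exercises can be illustrated by drawing a random subsample and giving the reader only the data available during production aging. The record layouts (`AgedFish`, `BlindRecord`), the seeded random draw, and the example values are assumptions for illustration; the production ages would be held back separately for scoring.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgedFish:
    fish_id: str
    length_cm: float
    date_captured: str
    area: str
    production_age: int


@dataclass(frozen=True)
class BlindRecord:
    """What the reader sees when re-aging: no previous age estimate."""
    fish_id: str
    length_cm: float
    date_captured: str
    area: str


def draw_blind_subsample(fish: List[AgedFish], n: int, seed: int = 0) -> List[BlindRecord]:
    """Randomly select n fish without replacement and drop their production ages."""
    rng = random.Random(seed)
    chosen = rng.sample(fish, n)
    return [BlindRecord(f.fish_id, f.length_cm, f.date_captured, f.area) for f in chosen]


# Hypothetical production samples: 40 fish, re-age a blind subsample of 10.
samples = [AgedFish(f"H{i}", 30.0 + i, "2006-03-15", "Georges Bank", i % 9 + 1)
           for i in range(40)]
blind = draw_blind_subsample(samples, 10)
```

Fixing the seed makes the draw reproducible, so the same subsample can later be matched against the withheld production ages.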
Results are presented in terms of percentage agreement, total coefficient
of variation (CV), age-bias plots, and age-frequency tables (Campana
et al. 1995; Campana 2001). In the precision exercises, a Bowker’s
test (Bowker 1948; Hoenig et al. 1995) was also used to test for deviations
from symmetry in any case where the percent agreement fell below 90%. This
test can be used to objectively detect a strong bias when comparing
two sets of ages.
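Bowker's test compares each pair of mirror-image off-diagonal cells in the age-by-age contingency table of the two readings. The statistic is the sum over cell pairs of (n_ij − n_ji)² / (n_ij + n_ji), referred to a chi-square distribution whose degrees of freedom equal the number of cell pairs with nonzero counts (Hoenig et al. 1995). The Python sketch below computes the chi-square tail probability from the standard incomplete-gamma recurrence so that it needs only the standard library; with SciPy available, `scipy.stats.chi2.sf` would serve the same purpose. Function names and example ages are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df.

    Starts from df = 1 or 2 and applies
    Q(x; v + 2) = Q(x; v) + (x/2)^(v/2) e^(-x/2) / Gamma(v/2 + 1).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, v = math.exp(-x / 2), 2
    else:
        q, v = math.erfc(math.sqrt(x / 2)), 1
    while v < df:
        q += math.exp((v / 2) * math.log(x / 2) - x / 2 - math.lgamma(v / 2 + 1))
        v += 2
    return q


def bowker_test(first: Sequence[int], second: Sequence[int]) -> Tuple[float, int, float]:
    """Bowker's (1948) test of symmetry for paired age readings.

    Returns (chi-square statistic, degrees of freedom, P-value).
    """
    counts = Counter(zip(first, second))  # (age1, age2) -> number of fish
    ages = sorted(set(first) | set(second))
    stat, df = 0.0, 0
    for i, a in enumerate(ages):
        for b in ages[i + 1:]:
            n_ab, n_ba = counts[(a, b)], counts[(b, a)]
            if n_ab + n_ba > 0:  # empty cell pairs carry no information
                stat += (n_ab - n_ba) ** 2 / (n_ab + n_ba)
                df += 1
    if df == 0:
        return 0.0, 0, 1.0  # perfect agreement: nothing to test
    return stat, df, chi2_sf(stat, df)


# Hypothetical readings of six fish.
first = [2, 2, 2, 3, 3, 4]
second = [3, 3, 2, 2, 3, 4]
stat, df, p = bowker_test(first, second)
```

The test reacts to asymmetry (more fish read older than younger, or vice versa) rather than to the amount of disagreement, which is why it complements percent agreement and CV.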
Age-reader accuracy was determined for both cod and haddock from a random subsample drawn from the corresponding NEFSC otolith reference collection. For cod, this exercise was done after the completion of production aging. For haddock, exercises were completed both before and after production aging. Accuracy for yellowtail flounder aging was not assessed, because the reference collection for that species is not yet complete.
For all three species, age-reader precision was estimated from blind
second readings of subsamples from each NEFSC survey (autumn 2005 and
spring 2006). Similar precision tests were conducted on 2005 NEFSC commercial port samples; for haddock, these were further broken down by commercial quarter.
RESULTS AND DISCUSSION
The total sample sizes associated with the accuracy and precision
exercises were N = 225, 483, and 183 for cod, haddock, and yellowtail
flounder, respectively. Results for cod are presented in Figures 1–4, haddock in Figures 5–12, and yellowtail flounder in Figures 13–15. Results of the three accuracy tests are summarized in Table 1, and all precision exercise results are shown in Table 2. Bowker’s test was run for three of the haddock precision exercises and two of the yellowtail flounder precision exercises; in no case did this test reveal a significant deviation from symmetry (Table 2).
For cod, the accuracy estimate was high (87% agreement), and the total
CV (3.9%) was low. There was a mild tendency toward overaging
(Figure 1). This accuracy has dropped slightly from last year
(91% agreement and 1.5% CV, Sutherland et al. 2006), when another age
reader conducted the production aging. Cod precision levels were
high, ranging from 94 to 98% agreement and from 0.2 to 1.2% CV (Figures
2, 3, and 4). No bias was apparent in these exercises. The high levels
of both accuracy and precision indicate that the cod age reader
has maintained a reliable level of aging capability.
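A tendency toward overaging, like that noted for cod, is the kind of pattern an age-bias plot displays (Campana et al. 1995). A rough tabular version groups the test ages by reference age and reports the mean test age and its offset from the reference age. This sketch omits the confidence intervals usually drawn on such plots; its function name and example ages are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def age_bias_table(
    reference_ages: Sequence[int], test_ages: Sequence[int]
) -> Dict[int, Tuple[int, float, float]]:
    """Map each reference age to (n fish, mean test age, mean test age - reference age).

    Positive offsets suggest overaging and negative offsets underaging.
    """
    if len(reference_ages) != len(test_ages):
        raise ValueError("ages must be paired fish by fish")
    by_age: Dict[int, List[int]] = defaultdict(list)
    for ref, test_age in zip(reference_ages, test_ages):
        by_age[ref].append(test_age)
    table = {}
    for age in sorted(by_age):
        mean_test = fmean(by_age[age])
        table[age] = (len(by_age[age]), mean_test, mean_test - age)
    return table


# Hypothetical accuracy test on six reference fish.
reference = [2, 2, 3, 3, 3, 4]
test_readings = [2, 3, 3, 4, 4, 4]
for age, (n, mean_test, offset) in age_bias_table(reference, test_readings).items():
    print(age, n, round(mean_test, 2), round(offset, 2))
```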
For haddock, both accuracy estimates were high (96 and 92% agreement,
total CVs of 1.0 and 1.1%, Figures 5 and 6), indicating that the application
of aging criteria has not changed in the past year. Precision
levels ranged from 85 to 97% agreement and from 0.6 to 2.2% CV (Figures
7–12), indicating that age determinations were consistent. No
bias was apparent in any of these exercises. Although this year’s precision
results are lower than those in 2005 (median of 95% agreement and 0.7%
CV, Sutherland et al. 2006), these precision levels are well within
accepted limits. The high accuracy estimates and consistently
high precision results indicate that the haddock age reader is continuing
to provide reliable ages.
Precision levels for yellowtail flounder ranged from 82 to 90% agreement and from 1.6 to 5.1% CV (Figures 13–15). In no case did the production and test ages differ by more than one year. There may have been a weak bias toward underaging in the precision exercise on autumn survey samples, but it was not significant (Bowker’s test, P > 0.05). Overall, these precision levels are higher than they were last year, when the current age reader was still in training (73% agreement and 6.1% CV for U.S. samples, Sutherland et al. 2006). These high precision levels, together with the improvement since last year, indicate that the new age reader has attained a reliable level of aging capability.
None of the U.S. accuracy or precision measures for these three species fell below acceptable in-house levels in the past year’s production aging; in most cases, these levels were exceeded. Therefore, recent U.S. age determinations are considered to be reliable.
REFERENCES CITED
Bowker AH. 1948. A test for symmetry in contingency tables. J
Am Stat Assoc. 43:572–574.
Campana SE. 2001. Accuracy, precision, and quality control
in age determination, including a review of the use and abuse of age
validation methods. J Fish Biol. 59:197–242.
Campana SE, Annand MC, McMillan JI. 1995. Graphical and
statistical methods for determining the consistency of age determinations. Trans
Am Fish Soc. 124:131–138.
Gavaris S, O'Brien L, Hatt B, Clark K. 2006. Assessment
of eastern Georges Bank cod for 2006. TRAC Ref Doc. 2006/05;
48 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.html.
Hoenig JM, Morgan MJ, Brown CA. 1995. Analysing differences
between two age determination methods by tests of symmetry. Can
J Fish Aquat Sci. 52:364–368.
Legault CM, Stone HH, Clark KJ. 2006. Stock assessment
of Georges Bank yellowtail flounder for 2006. TRAC Ref Doc. 2006/01;
66 p. Available at http://www.mar.dfo-mpo.gc.ca/science/trac/trac.html.
Penttila J, Dery LM. 1988. Age determination methods for northwest
Atlantic species. NOAA Tech Rep NMFS 72; 135 p. Available
Silva V, Munroe N, Pregracke SE, Burnett J. 2004. Age
structure reference collections: the importance of being earnest. In: Johnson
DL, Finneran TW, Phelan BA, Deshpande AD, Noonan CL, Fromm S, Dowds
DM, compilers. Current fisheries research and future ecosystems
science in the Northeast Center: collected abstracts of Northeast Fisheries
Science Center's Eighth Science Symposium, Atlantic City, New Jersey,
February 3-5, 2004. Northeast Fish Sci Cent Ref Doc. 04-01; p.
Sutherland SJ, Munroe N, Silva V, Pregracke S, Burnett J. 2006. Accuracy
and precision exercises associated with 2005 TRAC production aging. Northeast
Fish Sci Cent Ref Doc. 06-27; 17 p.
Van Eeckhaute L, Brodziak J. In press. Assessment
of haddock on eastern Georges Bank. TRAC Ref Doc. 2006/06. Available