Estimating Greenhouse Gas
The Washington Post and information
A Page One story by Washington Post staff writer David A. Fahrenthold says carbon dioxide
emissions in the Washington, DC, area increased 13.4% from 2001 to 2005.
The article clearly is intended to influence public policy. But
there are significant problems with this estimate that are not disclosed in the
article. The federal Information Quality Act does not apply to the Washington Post, but it would apply to any
federal agency that attempted either to take action based on these figures or even to
report them in a manner suggesting that it thought they were valid. (Congress is
exempt from the statutory requirement to only disseminate scientific and
statistical data that meet applicable information quality standards. Unlike
Executive branch agencies, of course, Congress is never regarded as an
authoritative body for scientific or statistical information.)
Let's compare the data reported by Fahrenthold with the information quality standards
that apply to federal agencies.
Here's what Fahrenthold tells us about how he derived his
The Post estimate began with
data on miles traveled by cars and trucks in local jurisdictions and the amount
of kilowatt hours used by utility customers.
Then, using methods from the
U.S. Energy Information Administration, those figures were used to calculate the
total amount of carbon dioxide emitted from vehicles and power-plant
smokestacks. [See the chart for details.]
The figures from those
calculations leave out greenhouse gases from other sources, such as agriculture,
planes, boats and oil furnaces. Those missing figures could account for half of
The chart referred to in square brackets above is
found only in the print edition and is titled "The Rapid Rise of Emissions." It
contains the following reported data, but in graphical
Rise of Emissions"
("The rate of increase was calculated by The
using data from governments, environmental groups and
[Chart columns include "Cars and Trucks*"; data values not recoverable.]
* Arlington County not included in Virginia Suburbs.
** Frederick County not included in Maryland Suburbs. Only partial data available for Stafford, Fauquier, Calvert,
Montgomery and Prince George's counties.
SOURCE: Staff reporting, Washington Post, April 29, 2007 Print Edition, A16
These data do not adhere to the minimum information
quality standards that would apply if they had been disseminated by the federal
TRANSPARENCY AND REPRODUCIBILITY
Federal information quality guidelines
require government agencies to practice transparency and reproducibility when they disseminate
statistical information. Transparency
means fully revealing all sources and methods. Reproducibility means providing enough
information that a qualified third party would obtain essentially the same
answer. The Post's data do not satisfy
either of these requirements.
The Post's choice of data is not transparent, and
Fahrenthold only hints at his sources.
At least one of his acknowledged sources -- "environmental groups" -- has a
policy interest in maximizing the reported percentage increase in CO2 emissions.
It is possible that they did not bias their data in accordance with these policy
interests. However, Fahrenthold does not inform readers of this potential
conflict of interest, nor does he reveal whether the Post performed due diligence to verify the
validity and reliability of their data. It appears that the Post simply accepted
their data without question.
The Post acknowledges that its data are
incomplete in two ways -- first, by not counting all emissions from categories that
it included, and second, by excluding some source categories entirely. When data are
incomplete, inferences about them should be made with caution. Instead, the
Post mentions these defects but draws
inferences as if these defects were minor.
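To illustrate why incomplete coverage calls for caution, here is a minimal sketch of how the growth of total emissions depends on the sources that were left out. Only the 13.4% figure comes from the article. The covered share of one half and the 0% to 20% growth range for uncounted sources are hypothetical inputs chosen for illustration, and `total_growth` is a name introduced here.

```python
def total_growth(observed_growth, covered_share, missing_growth):
    """Growth of total emissions when only part of the base-year total is observed.

    covered_share is the fraction of base-year emissions captured by the data;
    the remainder is assumed to grow at missing_growth.
    """
    return covered_share * observed_growth + (1.0 - covered_share) * missing_growth


observed = 0.134      # the Post's reported 2001-2005 increase
covered_share = 0.5   # ASSUMPTION: the data cover half of base-year emissions

# ASSUMPTION: the uncounted sources grew somewhere between 0% and 20%.
low = total_growth(observed, covered_share, 0.00)
high = total_growth(observed, covered_share, 0.20)

print(f"Total growth could range from {low:.1%} to {high:.1%}")
```

Under these assumed inputs the plausible range for the total is wide, which is the sense in which the acknowledged defects are not minor.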
With regard to its analytic
methods, the Post also reveals nothing
of importance. Presumably, the Post
performed a simple subtraction of 2001 from 2005 values and assumed the
resulting difference to be an unbiased
estimate. An unbiased estimate is one
that is just as likely to overestimate the true but unknown value as to
underestimate it. But simple subtraction yields an unbiased estimate of the difference only under
certain restrictive conditions. These conditions might apply, but we don't know because the Post
did not reveal its sources and methods. They include:
- All definitions must be
identical for 2001 and 2005. Any change in definitions means that the
data are not comparable across years, and the result of subtraction is
uninterpretable. Apples cannot be subtracted from oranges.
- Data that were missing
in one year must be missing from both years. Counties partially
counted or missing in 2001 must be either missing or excluded in 2005, and
vice versa. Where coverage was partial in 2001, it must be identically partial
in 2005.
- The methods used to
estimate values for 2001 must be the same methods used for estimating values
for 2005. Any change in methods introduces an unexplained discrepancy in
the reported difference.
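The second condition above can be sketched with a small example. All counties and emissions values below are made up, and every county is assumed to grow by exactly 5%; the point is only to show how a coverage mismatch between years distorts a simple subtraction.

```python
# HYPOTHETICAL emissions (arbitrary units) for three made-up counties.
emissions_2001 = {"County A": 100.0, "County B": 80.0, "County C": 50.0}

TRUE_GROWTH = 0.05  # ASSUMPTION: every county grows by exactly 5%
emissions_2005 = {
    name: value * (1 + TRUE_GROWTH) for name, value in emissions_2001.items()
}


def percent_change(base, later, base_counties, later_counties):
    """Percentage change between totals summed over the listed counties."""
    base_total = sum(base[c] for c in base_counties)
    later_total = sum(later[c] for c in later_counties)
    return later_total / base_total - 1.0


all_counties = list(emissions_2001)

# Same coverage in both years: the subtraction recovers the true growth.
consistent = percent_change(emissions_2001, emissions_2005, all_counties, all_counties)

# County C is missing from the 2001 data but counted in 2005.
mismatched = percent_change(
    emissions_2001, emissions_2005, ["County A", "County B"], all_counties
)

print(f"Consistent coverage: {consistent:.1%}")
print(f"Mismatched coverage: {mismatched:.1%}")
```

With consistent coverage the result equals the assumed 5%; with the mismatch, most of the apparent increase is an artifact of the added county rather than a change in emissions.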
This leads to the Post's second procedural failure. The Post's calculations are not reproducible by a
qualified independent third party. Fahrenthold reports that "Jonathan Cogan, a
spokesman for the [Department of Energy's] Energy Information Administration
reviewed The Post's calculations and said the agency's formulas appeared to have
been used correctly." The extent of this external review is unclear -- was it
limited to fidelity to EIA formulae, or did it also include a review of the
Post's input data? (By responding to the Post's request, Cogan put EIA in the position
of violating the spirit of the law by implicitly conveying its endorsement. He
did not violate the letter of the law because statements made by agency
spokesmen are exempt.)
The depth of Cogan's review notwithstanding, the
reproducibility requirement in federal information quality standards can't be
satisfied by reliance on a hand-picked third party. Satisfying the
reproducibility requirement can be achieved only by disclosure.
Federal information quality guidelines require
federal agencies to ensure that statistical information intended to influence
policy be objective.
Substantive objectivity means that information
must be "accurate, reliable, and unbiased." "In a scientific, financial, or
statistical context, the original and supporting data shall be generated, and
the analytic results shall be developed, using sound statistical and research
methods." We've already documented why the Post's estimates are unlikely to be
substantively objective. If a federal agency disseminated statistical
information this way, it would be presumptively in violation of the law. So
we'll focus on presentational
objectivity, which applies even if substantive objectivity is
Presentational objectivity means that information must be "presented in an accurate,
clear, complete, and unbiased manner," including "within a proper context" that
may include "other information" necessary "to ensure an accurate, clear,
complete, and unbiased presentation," including sources and supporting data and
models "so that the public can assess for itself whether there may be some
reason to question the objectivity of the sources."
An elementary principle of information quality is to present quantitative
measurements or estimates at a level of precision consistent with that of the
measurement instruments and analytic tools. In this case, Fahrenthold presents
estimates of percentage change with three significant figures, with the last
digit measuring tenths of a percentage point. This implies that Fahrenthold's estimate
of the percentage change in CO2 emissions is accurate to within 0.05 percentage points. Given
just the acknowledged missing data, that level of precision is technically
infeasible; he would be fortunate if his first digit were significant. But by
using three significant digits, Fahrenthold falsely implies that he knows much
more about CO2 emissions, and their changes over time, than is justified by his
data. Presentational objectivity is never served by misleading the users of
information about its precision even when the information is
It is unclear by how much, if at all, CO2 emissions actually rose because Fahrenthold
chose a problematic baseline. The year 2001 was unusual in many respects, most
notably a weak recession and the coordinated
terrorist attacks of September 11. The average annual change in CO2 emissions
likely would be different -- and in particular, smaller -- if Fahrenthold had
chosen as a baseline a comparable date in the previous business cycle.
It is also unclear what to make of estimates for the Virginia Suburbs that exclude
Arlington County, the jurisdiction closest to the District of Columbia. This
difficulty is exacerbated by missing data from two exurban Virginia counties
(Stafford and Fauquier). Arlington, Stafford and Fauquier counties represent 9%,
5% and 3%, respectively, of the estimated 2005 population of the Virginia
Suburbs. That is, data are incomplete or excluded with respect to 17% of the
suburban Virginia population.
Figures for the Maryland Suburbs are even
more problematic. Fahrenthold reports that there are data missing from
Montgomery and Prince George's counties, and he excludes Frederick County. These
counties represent 33%, 30% and 3%, respectively, of the Maryland Suburbs. Data
are incomplete or excluded with respect to 66% of the suburban Maryland population.
Howard County, located midway between Washington and
Baltimore, is also excluded by the Post.
Had Howard County been included, the population for the Maryland Suburbs would
have been about 10% greater.
- Invalid inferences from the data
Fahrenthold reports that District of Columbia officials took credit
for their apparently lower rate of increase in CO2 emissions:
The brightest news came from the
District, where emissions grew 6.7 percent. D.C. officials said they think the
relatively low increase is partly a sign of changing behavior: Residents were
leaving their cars at home and walking, biking or taking public transit.
But Fahrenthold did not point out that
DC's population had declined about 4% during this period, whereas the population
of Suburban Virginia and Suburban Maryland increased about 11% and 10%,
respectively. Adjusting for DC's population decline, Fahrenthold's figures, if
true, would mean DC's CO2 emissions rose 11% per capita.
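The per capita adjustment can be checked with a short calculation. This is a minimal sketch using only the two DC figures quoted above (emissions up 6.7%, population down about 4%); `per_capita_change` is a name introduced here for illustration.

```python
def per_capita_change(emissions_change, population_change):
    """Change in emissions per person, given fractional changes in
    total emissions and in population over the same period."""
    return (1.0 + emissions_change) / (1.0 + population_change) - 1.0


# DC figures quoted in the text: emissions +6.7%, population about -4%.
dc = per_capita_change(0.067, -0.04)

print(f"DC per capita change: {dc:.1%}")
```

Dividing the growth factors, rather than subtracting the percentages, is the choice made here; for changes of this size the two approaches give nearly the same answer.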
The entire picture changes when population changes are taken into account. When
Fahrenthold's (unverified) estimates of percentage changes in CO2 emissions from
2001 to 2005 are divided by the Census Bureau's (validated) estimates of
population changes from 2000 to 2005, DC's performance is the worst in the
region rather than the best:
To be clear, we hesitate to draw any
inferences from Fahrenthold's data. We doubt they are useful for any public
policy purpose. Most importantly, both his inferences about the absolute change
in CO2 emissions in the Washington metropolitan area and his comparisons across
jurisdictions are unsupported by his own data.
[Table: Adjusting for Population Changes. Columns: Change in CO2 Emissions Reported by the Washington Post; Change in CO2 Emissions Reported by the Washington Post Adjusted for Population. Values not recoverable.]
- Invalid inferences beyond the data
The primary message of Fahrenthold's article is that CO2 emissions in
the Washington metropolitan area are "rapidly rising." But Fahrenthold reports
data from just two dates. Even if these data were accurate to three significant
figures, it would be technically impossible to discern acceleration. The most
that Fahrenthold could legitimately report is the average annual change.
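Two observations determine only an average rate of change. As a minimal sketch, the reported 13.4% increase from 2001 to 2005 can be converted into an average annual change in two common ways; the choice between a simple and a compound average is an assumption made here, not something the article specifies.

```python
total_change = 0.134   # the Post's reported increase, 2001 to 2005
years = 2005 - 2001    # four annual intervals between the two observations

# Simple average: spread the total evenly across the years.
simple_annual = total_change / years

# Compound average: the constant yearly rate that yields the same total.
compound_annual = (1.0 + total_change) ** (1.0 / years) - 1.0

print(f"Simple average annual change:   {simple_annual:.2%}")
print(f"Compound average annual change: {compound_annual:.2%}")
```

Either way the result is a single average rate. Nothing in two data points can show whether that rate was speeding up or slowing down within the period.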
- Information quality defects lead others to draw invalid inferences
Information quality principles
matter for many reasons, but one key reason is that when poor quality
information is disseminated, others are led to draw invalid inferences. These
invalid inferences often find their way into public policy unless they are
successfully corrected before decisions are made.
A plausible explanation
for the invalid inferences made by the anonymous DC government officials cited
by Fahrenthold is that Fahrenthold himself premised his request for a reaction
on invalid inferences about the data. When pressed for a reaction, public
officials may offer answers that are consistent with other data at their
disposal. Alternatively, they may give an explanation that is either
self-serving or what they think the reporter wants to hear. (Sometimes these are
the same thing.) It's possible that DC officials have data supporting their
suggestion that DC's allegedly lower rate of increase in CO2 emissions is a "sign
of changing behavior." But it's more plausible that they didn't want to
attribute the lower rate to a decline in the District's population, with which
they would be familiar and which would not be interpreted favorably by a reporter
whose narrative is that regional CO2 emissions are "rapidly rising."
Frank O'Donnell's claim that "sprawl is causing a big increase in greenhouse
gases" is most plausibly related to the public policy positions he and his
organization advocate. Because they are opposed to what they call "suburban
sprawl," sprawl is a convenient inference from Fahrenthold's data that also fits
the reporter's likely narrative.
If sprawl were actually the culprit,
then one would expect to find that commuting times are significantly higher for
jurisdictions farther away from the District. The available data don't support
that inference. Average commute times reported by the Census Bureau are not
nearly as different across the region as one would expect if sprawl were the
underlying cause of rising CO2 emissions. For Virginia, average commute times
vary from 27.3 minutes (Arlington County) to 37.7 minutes (Stafford County). But
Arlington is located adjacent to the District and Stafford is about 45 miles
southwest. A 10-minute difference in average commuting time seems much less than
one would expect if proximity to the District reduced CO2 emissions from
commuting. For Maryland the range is 29.2 minutes (St. Mary's County) to 39.8
minutes (Calvert County) -- again, a range of about 10 minutes.
The average commuting time for residents of the District was almost 30 minutes
in 2000. The higher population density of the District apparently does not
translate into a significantly reduced commute. When DC's figure is treated as a
baseline and subtracted from the averages for the other jurisdictions, the range
in net average commuting times in Virginia becomes -2.4 to 8 minutes, and the range in
Maryland becomes -0.5 to 10.1 minutes. People in the Washington metropolitan area don't
all work in the District, and they choose places to live based on many criteria
other than the length of their commute. But their average commute is remarkably
stable irrespective of where they live.
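The net commuting ranges above can be reproduced with a short calculation. The county figures are those quoted in the text. The DC baseline of 29.7 minutes is not stated explicitly; it is inferred here from the quoted net ranges and from the description "almost 30 minutes," so treat it as an assumption.

```python
DC_BASELINE = 29.7  # ASSUMPTION: inferred from the net ranges quoted in the text

# Shortest and longest average commutes (minutes) quoted for each state.
commutes = {
    "Virginia": {"Arlington County": 27.3, "Stafford County": 37.7},
    "Maryland": {"St. Mary's County": 29.2, "Calvert County": 39.8},
}

net_ranges = {}
for state, counties in commutes.items():
    # Subtract the DC baseline, rounding away floating-point noise.
    net = [round(minutes - DC_BASELINE, 1) for minutes in counties.values()]
    net_ranges[state] = (min(net), max(net))
    print(f"{state}: {net_ranges[state][0]} to {net_ranges[state][1]} minutes")
```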
Of all the errors in
Fahrenthold's story, surely the most pernicious is the claim that CO2 emissions
are "rising rapidly." As we've already noted, a rate of acceleration cannot be
discerned from two static observations. But this narrative is clearly an
appealing one for those who are predisposed to believe that "the problem" of
anthropogenic global climate change is "getting worse." This narrative is often
expressed by Post reporters and the
newspaper's editorial board. The Post should make a diligent effort to
understand information quality principles and apply them to the newspaper's work
products, especially when a story appears to conform to the revealed biases of
its reporters and editors.
Population Statistics, 2000-2005
[Table data not recoverable. Notes: figures estimated by Census Bureau (see data quality note), except those estimated by the Washington Post (no data quality disclosed).]