Scientific Stewardship in the Open Data and Big Data Era — Roles and Responsibilities of Stewards and Other Major Product Stakeholders
From: D-Lib Magazine
Ge Peng1, Nancy A. Ritchey2, Kenneth S. Casey2, Edward J. Kearns2, Jeffrey L. Privette2, Drew Saunders2, Philip Jones3, Tom Maycock1, and Steve Ansari2
Ensuring and improving quality and usability is an important part of scientific stewardship of digital environmental data products, but the roles of the responsible parties — those who manage quality and usability — have been evolving over time and have not always been clearly defined. Recognizing that in the Open Data and Big Data era, effective long-term scientific stewardship of data products requires an integrated and coordinated team effort of experts in multiple knowledge domains — data management, science, and technology — we introduce the following stewardship roles for each of these domains: data steward, scientific steward, and technology steward. This article defines their roles and high-level responsibilities as well as the responsibilities of other major product stakeholders, including data originators and distributors. Defining roles and formalizing responsibilities will facilitate the process of curating and communicating quality information to users. Clearly defined roles will allow effective cross-disciplinary communication and better resource allocation for data stewardship, supporting organizations in meeting the challenges of stewarding digital environmental data products in the Open Data and Big Data era.
Ensuring data quality and improving availability and accurate representation of data and information are critical for informed, sound decision-making. Data and information quality management has always been a critical part of data management. However, for environmental data products that are produced or stewarded using federal funding, emerging non-functional requirements (such as correctness, usability, integrity, scalability, traceability, and interoperability) and the advent of Big Data (characterized by massive data volumes, vast variety and complexity of data types, and low data latency) have dramatically changed the way digital datasets are managed (Laney, 2001; Miller, 2013; Gurin, 2014; Shueh, 2014; Saey, 2015).
On the policy side, to be compliant with the U.S. Information Quality Act (U.S. Public Law 106-554, 2001), many U.S. federal agencies require their data providers to have comprehensive plans for managing and/or sharing non-restricted data and results in a timely manner, working with designated data centers or repositories (e.g., National Science Foundation (NSF), 2011; National Aeronautics and Space Administration (NASA), 2011; 2014; National Oceanic and Atmospheric Administration (NOAA), 2011; U.S. Geological Survey (USGS), 2015). A summary of many federal funding agencies and their data access and sharing policies can be found here. (See Section 2 for definitions and scopes of terms used in this article.)
3.3.1 Role of data stewards
The role of data stewards has previously been defined as leading governance practices and providing guidelines on governance (Khatibloo et al., 2014; Information Management, 2014; Chatfield and Selbach, 2011). From the scientific data stewardship perspective, data stewards are responsible for ensuring compliance with data management standards, including community standards on data quality metadata and policies such as the U.S. Information Quality Act (U.S. Public Law 106-554, 2001) and the Open Data Policy (OMB, 2013). They also provide data management guidance to other stewards, documentation and metadata team members, and other key stakeholders, and help them define data management requirements.