North American Regional Climate Change Assessment Program
NARCCAP Operational Data Management Plan
 

Version 1.5 -- September 28, 2007

Overview of Plan

This plan is divided into two phases. Phase 1 of the NARCCAP Data Management Plan establishes the shortest critical path to our top priority: releasing RCM data to the broad community. It will allow us to work through every stage of data processing, refine the procedures, and deliver scientific value in the shortest possible timeframe. Phase 1 will rely on existing, installed, and operational computational, storage, and Earth System Grid (ESG) systems. Initial services will include registration, browse, search, file-based access, fast multi-file download, aggregation and subsetting, and an archive.

Phase 2 will begin with the installation and configuration of ESG distributed software components at LLNL/PCMDI. We intend to carry out this phase in parallel with Phase 1. The work will deal primarily with data transport systems, storage management systems, and software and security infrastructure and policies. Once these new capabilities have been integrated and tested with PCMDI systems and storage, we will be positioned to begin publishing NARCCAP datasets at PCMDI. In parallel with this work, we will begin to extend the system from file-based access to "virtual" datasets, which will let users request distributed data products by spatial, temporal, and variable subsets.

General Roles and Responsibilities

IOWA

  • Specification and refinement of NARCCAP data format and metadata requirements.
  • Initial data quality control (QC) inspection
  • Approval to move publishing process forward.

NCAR

  • Software development for ESG adaptation for NARCCAP (CISL)
  • Full dataset QC inspection (ISSE)
  • Archiving of QC'd datasets on NCAR Mass Store System (MSS) (CISL)
  • Publication of QC'd datasets into ESG, including transfer of datasets to NCAR and PCMDI disk storage resources
  • Notification of availability to community

LLNL/PCMDI

  • Liaison for exchanging data via shippable disk arrays, upload of submitted datasets
  • Provide NCAR quality-control staff with access to PCMDI storage systems to verify data before archiving
  • Install all software that NCAR quality-control staff need to verify the data at PCMDI
  • Archive all data at NERSC HPSS
  • Installation, configuration, and support of ESG distributed components
  • Provisioning of ESG-connected online storage

LLNL

  • Provisioning of CMOR code for NARCCAP RCM applications and for CAM3 timeslice modeling activities.

Description of NARCCAP Data Process

The process in a nutshell:

  • Modeling centers will send sample data to Iowa State after the data have been processed with CMOR or an equivalent.
  • Data will undergo an initial quality-control check at Iowa State.
  • Modeling groups will relay their datasets to PCMDI via shippable storage arrays.
  • PCMDI will upload datasets from shippable disk arrays to local staging PCMDI rotating storage.
  • PCMDI will make a copy of all incoming datasets on the NERSC HPSS for purposes of disaster recovery.
  • Datasets will be QC'd by NCAR staff at PCMDI.
  • QC'd datasets will be archived to the NCAR Mass Storage System (MSS).
  • QC'd datasets will be published to the Earth System Grid (ESG), making them transparently available through the www.earthsystemgrid.org interface. Early datasets will reside on NCAR disk; later datasets will reside on PCMDI disk.
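
The sequence above can be read as an ordered workflow. The following is a minimal sketch in Python, not part of any NARCCAP or ESG tooling; the stage names and the `next_stage` helper are hypothetical labels chosen here to mirror the bullets in the list.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Hypothetical labels for the steps listed above, in order."""
    SAMPLE_SENT_TO_IOWA = 1
    INITIAL_QC_AT_IOWA = 2
    SHIPPED_TO_PCMDI = 3
    UPLOADED_TO_PCMDI_DISK = 4
    COPIED_TO_NERSC_HPSS = 5
    QC_BY_NCAR_AT_PCMDI = 6
    ARCHIVED_ON_NCAR_MSS = 7
    PUBLISHED_TO_ESG = 8


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after publication."""
    members = list(Stage)  # Enum members iterate in definition order
    index = members.index(stage)
    return members[index + 1] if index + 1 < len(members) else None
```

A tracker built on this idea would advance a dataset one stage at a time and post to the mailing list at each transition.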

We have established a "NARCCAP Data" mailing list. Modeling groups will use it to initiate and follow through on the submission process, and NARCCAP data team members will use it to post notifications about each step and its progress, as outlined below. At each stage of the process, collaborators must post to the NARCCAP Data mailing list: narccap-data@mesonet.agron.iastate.edu. This ensures that everyone is aware of progress, problems, and issues as they arise.

Step 1: Modeling groups submit sample data to Iowa State

If a modeling group is preparing output from runs subsequent to the NCEP-driven runs, skip to Step 3.

Modeling groups will prepare output for the variables specified at http://narccap.ucar.edu/data/output_archive.html.

Using CMOR or an equivalent process, modeling groups will produce datasets for publication according to the NARCCAP requirements specified at http://narccap.ucar.edu/data/output_requirements.html.

If preparing output from the NCEP-driven runs, send Iowa State, via ftp, one file in standard NARCCAP format for one variable from each of the NARCCAP archive tables:

Table 1 - Daily maximum temperature (tasmax)
Table 2 - Precipitation (pr)
Table 3 - 500 hPa geopotential height (zg500)
Table 4 - Surface altitude (orog)
Table 5 - Temperature (ta)

(Note: these tables are not the same as the CMOR tables)
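
As an illustration of the sample-submission requirement, here is a minimal Python sketch that reports which of the five sample variables are still missing. The variable names come from the list above; the `missing_samples` helper and its input format (a list of variable names) are assumptions made for this example, not part of the NARCCAP procedure.

```python
# One sample variable per NARCCAP archive table, as listed above.
SAMPLE_VARIABLES = {
    1: "tasmax",  # daily maximum temperature
    2: "pr",      # precipitation
    3: "zg500",   # 500 hPa geopotential height
    4: "orog",    # surface altitude
    5: "ta",      # temperature
}


def missing_samples(submitted):
    """Return the sample variables absent from `submitted`, in table order.

    `submitted` is any iterable of variable names (hypothetical input format).
    """
    present = set(submitted)
    return [name for name in SAMPLE_VARIABLES.values() if name not in present]
```

For example, `missing_samples(["tasmax", "pr"])` lists the variables for Tables 3 through 5.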

Step 2: Iowa State reviews submitted data

Following a process similar to that used for AR4, Iowa will undertake an initial review of model output, and interact with each modeling group as needed to arrive at correct datasets according to NARCCAP standards. This will include evaluating some diagnostics and reviewing the metadata. Communications in this activity will all be posted to the mailing list, so that all parties are aware of any problems or workflow issues that arise.

Upon successful completion of this step, Iowa will either approve the submission or iterate the process further with the modeling group. Once the submission has been approved, the data can move on to step 3.

Step 3: Modeling group announces data ready, PCMDI ships disk

When a modeling group has data that is ready for submission, they will announce it on the narccap-data mailing list. At this point, Tony Hoang will ship a disk array to the modeling group for them to load with their data. Note that Dean Williams will help coordinate activities at PCMDI.

Step 4: Modeling groups ship data back to LLNL/PCMDI

The modeling group will receive the disk array, load their correctly formatted and structured output data upon it, and ship it back to Tony Hoang at PCMDI.

Step 5: PCMDI uploads submitted data

Upon receipt of the shippable disk array, PCMDI will upload datasets onto rotating storage at PCMDI (onto disk on the machine climate.llnl.gov) and archive a copy of each dataset, as submitted by the modeling group, to the NERSC HPSS for purposes of disaster/failure recovery. The current plan is to treat these copies as dark/unpublished archives. Once upload of the contents of the disk array is successfully complete, PCMDI will notify the mailing list that the dataset is available, and will provide a pointer to its location.

Step 6: NCAR performs QC on submission

QC will be conducted in situ at PCMDI, on climate.llnl.gov. This provides the closest connection to the original datasets in the event that file corruption or other problems are detected. Seth McGinnis and Larry McDaniel will perform the final QC of the datasets, resulting in a product that is ready to be published into the ESG system.

What will happen when problems are found in the submission depends on their nature. In the case of problems that are simple, isolated, and easy to fix, the QC team will fix them. For more complicated or pervasive problems, the modeling team will be responsible for fixing them. The QC team will communicate the problem directly to the modeling team, and the modeling team will coordinate with Dean Williams to arrange for the retransfer of the corrected datasets to PCMDI via ftp or shipping of another disk, as appropriate. The QC team will also summarize the problem for the mailing list to aid other modeling groups in avoiding the same problem.

Step 7: NCAR archives datasets, readies them for publishing

Once a dataset has passed final QC, the QC team will announce on the mailing list that it is ready. Chi-Fan Shih will then transfer the QC'd dataset over the network, using DataMover, from PCMDI to the NARCCAP staging space on the NCAR SAN and make an archival copy on the NCAR MSS, where it will be stored for at least five years, as required by the grant contract. If the data is part of the first 10 TB (approximately) of output, it will reside on disk at NCAR, on datazone.ucar.edu, and Chi-Fan will also copy it into the appropriate location on datazone. The remainder of the data will reside on disk at PCMDI, on climate.llnl.gov; the QC team will simply copy that data from scratch space into the appropriate location.
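
The placement rule in this step can be sketched as a simple threshold check. This is a hedged illustration only: the plan gives the limit as approximately 10 TB, so the hard cutoff, the decimal definition of a terabyte, and the `disk_host` helper are all assumptions made for this example.

```python
TB = 10**12  # assumption: decimal terabytes; the plan does not specify

# Assumption: a hard cutoff. The plan says only "approximately" 10 TB.
NCAR_DISK_BUDGET_BYTES = 10 * TB


def disk_host(bytes_already_placed: int) -> str:
    """Pick the host for the next dataset, given the volume already placed."""
    if bytes_already_placed < NCAR_DISK_BUDGET_BYTES:
        return "datazone.ucar.edu"  # early output resides on NCAR disk
    return "climate.llnl.gov"  # later output resides on PCMDI disk
```

In practice the boundary would be a judgment call by the data team rather than an exact byte count.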

Step 8: NCAR publishes datasets to ESG

Luca Cinquini will set up the initial re-engineering of the ESG publishing infrastructure for NARCCAP. Once a dataset has been positioned in its final place of residence, Seth and Larry will publish it into the ESG system, whereupon it will be available for the NARCCAP community to download.

Data will reside in a directory structure organized as [regional-model]/[driver]/[present|future] (e.g., MM5/CCSM/future or RegCM3/NCEP/present). These directories will be presented to end-users as a table of RCM/GCM combinations that link to the appropriate catalogs of data.
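
The directory convention above can be expressed as a small path-building function. This is a minimal Python sketch; the `dataset_dir` helper is hypothetical, and the model and driver names are passed through unchecked because the plan does not enumerate the allowed spellings.

```python
from pathlib import PurePosixPath

# The only two period labels named in the directory convention above.
PERIODS = ("present", "future")


def dataset_dir(regional_model: str, driver: str, period: str) -> PurePosixPath:
    """Build the relative path [regional-model]/[driver]/[present|future]."""
    if period not in PERIODS:
        raise ValueError(f"period must be one of {PERIODS}, got {period!r}")
    return PurePosixPath(regional_model) / driver / period
```

For example, `dataset_dir("MM5", "CCSM", "future")` yields the relative path `MM5/CCSM/future`, matching the first example in the text.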

Note: Initially, test datasets and early results will be published at NCAR as part of Phase 1 activities. The idea is to work through the entire process as quickly as possible, and to get data products out to the community in as streamlined a fashion as possible. Once the Phase 2 integration of PCMDI systems is complete, published data will begin to flow there as well.

Timeslice Data

The GFDL timeslice data will be served directly by GFDL. It is currently available at http://www.gfdl.noaa.gov/~bw/narccap/. Only data for the NARCCAP region is available. (QC for this dataset is still under discussion.)

The CAM3 timeslice data from Phil Duffy will reside on climate alongside the other datasets. The entire global dataset will be made available. Phil will work with the QC team to perform an initial check of the data before postprocessing. Otherwise, the timeslice data will be treated like other model datasets with regard to QC and publishing. Arrangements for transfer of the timeslice data as necessary will be coordinated by Phil, Dean, and Chi-Fan.

 
©2007 UCAR
The National Center for Atmospheric Research is sponsored by the National Science Foundation. Any opinions, findings and conclusions or recommendations expressed in this material do not necessarily reflect the views of the National Science Foundation.