• ARCHIVE LOCATION & CATALOGING SYSTEM •
In contrast to more conventional libraries, our Spatial Data Archive houses data that starts out as digital files, so the speed and capacity advantages of digital data are built in. But we face the same challenge as any digital archive: how to make the data holdings readily available to users and keep them secure at the same time.
Master Archive Location & Configuration
The master archive consists of digital folders and files stored on a set of hard drives. As of December 2012, the drives in question are installed in the PC assigned to the Biodiversity Center's GIS/Mapping Specialist. The PC is physically located in the Biodiversity Center GIS Lab (Rm 238 - Mary Ann Cofrin Hall).
The archive organization is based on assembling related databases into second-level folders called "volumes", which are described in more detail below. A backup copy of each volume is maintained on optical media, with the discs stored on shelves in clearly marked file boxes. As of December 2012, the master archive backup is physically located in the Biodiversity Center GIS Lab (Rm 238 - Mary Ann Cofrin Hall). Backing up is done on a volume-by-volume basis. When an archived database is changed or a new database is added to the archive, the revised version of the affected volume is burned to disc and placed in the appropriate file box.
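Because backups are made one volume at a time, the practical question is which volumes have changed since their last disc was burned. The Python sketch below is one hedged way to answer it and is not a tool the archive actually uses. It assumes the layout described on this page (archive root, then top folders, then volumes) and treats a file's modification time as the sign of a change; the function name and the `last_backup_time` parameter are hypothetical.

```python
import os
from pathlib import Path


def volumes_needing_backup(archive_root, last_backup_time):
    """List volumes (second-level folders) holding any file modified
    after `last_backup_time`, given in seconds since the epoch."""
    stale = []
    root = Path(archive_root)
    # First level: top folders that group related volumes.
    for top in sorted(p for p in root.iterdir() if p.is_dir()):
        # Second level: the volumes themselves.
        for volume in sorted(p for p in top.iterdir() if p.is_dir()):
            if _has_newer_file(volume, last_backup_time):
                stale.append(volume.name)
    return stale


def _has_newer_file(folder, timestamp):
    """Return True if any file under `folder` is newer than `timestamp`."""
    for dirpath, _dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if os.path.getmtime(os.path.join(dirpath, filename)) > timestamp:
                return True
    return False
```

A check based on modification times will not notice a database that was deleted from a volume, so it is a starting point rather than a complete audit.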
Master Archive Basic Organization
We chose not to invest in the usual "card catalog" database to communicate the location of individual items among our library holdings. Instead we use the most transparent folder-naming scheme that we could come up with to organize the drives that contain our data. This seems like the best use of our very limited staff time but is still something of an experiment. Hopefully the scheme that we came up with is sufficient for users to find items of interest in a reasonable amount of time. Criteria used in development of the system are listed below:
• The system must allow for orderly archiving of 100,000 spatial databases
• Spatial databases can range in size from < 1 MB to > 5 GB
• All current and future database formats must be accommodated
• The system must leave original database file names unchanged and not rely on file names for cataloging
• The system must be transparent enough to be used by GIS novices.
Second-level subfolders called "volumes" are the primary organizational tool. As of November 2012, ~20,000 spatial databases have been assigned to one or another of ~250 volumes. Here's a summary of the practices:
• A volume is a second tier folder. New ones are created as needed. Volume folder names are written in CamelCase.
• Volume folder names always end with "_". Underscores are not used anywhere else in the naming system.
• Related volumes are grouped under top folders with names consisting of a volume prefix and some explanatory text.
• Databases are in third-level folders with names consisting of the volume name plus a short modifier.
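The naming rules above are regular enough to check mechanically. The Python sketch below is only an illustration, not part of the archive's tooling. The folder names in it (such as `07NatureReserves_`) are hypothetical, and it assumes that the short modifier is appended directly after the volume name's trailing underscore.

```python
import re

# Assumed pattern: letters and digits followed by exactly one trailing
# underscore, since underscores are not used anywhere else in the scheme.
VOLUME_RE = re.compile(r"[A-Za-z0-9]+_")


def is_volume_name(name):
    """Return True if `name` looks like a volume folder name."""
    return VOLUME_RE.fullmatch(name) is not None


def split_database_folder(name):
    """Split a third-level folder name into (volume name, modifier).

    Assumes the modifier directly follows the volume's underscore.
    Raises ValueError if the name does not fit that assumption.
    """
    volume, separator, modifier = name.partition("_")
    if not separator or not volume or not modifier or "_" in modifier:
        raise ValueError(f"not a database folder name: {name!r}")
    return volume + "_", modifier


# Hypothetical names, for illustration only.
print(is_volume_name("07NatureReserves_"))
print(split_database_folder("07NatureReserves_StateParks"))
```

Because the underscore appears exactly once, it works as a reliable separator between the volume name and the modifier, which is what lets the original database file names stay untouched.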
The screen capture below shows what a small excerpt from the archive looks like in Windows Explorer:
Volume Names Based On Topic Category
The example above illustrates the portion of the library whose organization is based on topic categories. Volume names begin with a two-digit number that corresponds to one of 20 categories established by the International Organization for Standardization (ISO) in the ISO 19115 standard. "07" in the example signifies data from the "Environment" category, which is further defined in the standard as "Environmental resources, protection and conservation". The standard enumerates the following examples: "environmental pollution, waste storage and treatment, environmental impact assessment, monitoring environmental risk, nature reserves, landscape, water quality, air quality, environmental modeling". Check out the following references for more information on ISO 19115.
• Metadata Quick Guide (technical paper published by Redlands Institute)
• ISO Topic Categories (guidance document published by NASA)
Examples of data types cataloged by topic category:
• Federal watershed boundary data (Category 12)
• LANDSAT scenes (Category 10)
• Federal weather and climate data (Category 04)
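For anyone scripting against the archive, the two-digit prefix can be turned back into a category name with a small lookup. This Python sketch is an illustration under stated assumptions: it lists only the four categories mentioned on this page, the labels for 04, 10 and 12 are taken from the ISO 19115 topic category code list rather than from this page, and the volume names are hypothetical.

```python
import re

# Only categories mentioned on this page are listed; the remaining
# topic categories are left out rather than guessed at.
TOPIC_CATEGORIES = {
    "04": "climatologyMeteorologyAtmosphere",
    "07": "environment",
    "10": "imageryBaseMapsEarthCover",
    "12": "inlandWaters",
}


def topic_category(volume_name):
    """Return the topic category for a volume name, or None.

    None means the name has no two-digit prefix (for example a volume
    cataloged by geography) or the prefix is not in the table above.
    """
    match = re.match(r"[0-9]{2}", volume_name)
    if match is None:
        return None
    return TOPIC_CATEGORIES.get(match.group())


# Hypothetical volume names, for illustration only.
print(topic_category("07NatureReserves_"))
print(topic_category("BrownCounty_"))
```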
Volume Names Based On Geography
About half the archive is cataloged by geography rather than by topic category. Why do we have two cataloging schemes running in parallel? Because as the design evolved it made sense to do it that way. The screen capture below shows what a small excerpt from the archive looks like in Windows Explorer:
Examples of data types cataloged by geography:
• Data obtained from county governments (parcel data, some roads, local land use, etc.)
• Imagery mosaicked by county (some orthophotos and topo maps)
• Data relevant to a particular study site or area of interest (e.g. Niagara Escarpment)