Use of vocabularies
Controlled vocabularies are standardised sets of terms that define the terminology used to describe and organise data. Using them in metadata and for data labelling removes the ambiguities associated with data markup and enables records to be interpreted by computers. This opens up many possibilities for computer-aided manipulation, distribution and long-term reuse of datasets.
The datasets used in CLIPC are very diverse in origin, including satellite and in-situ observational data, climate model data and re-analysis data. Although they are often already harmonised within their own domain, their syntax and semantics differ. Using controlled vocabularies is an important step towards harmonised discovery of, and access to, these datasets.
A controlled vocabulary usually contains the following information for each term:
- Key: a unique permanent identifier for the term, often designed for computer storage rather than human readability
- Term: the text string representing the term in human-readable form
- Abbreviation: a concise text string representing the term in human-readable form where space is limited
- Definition: a full description of what is meant by the term
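The four fields above map naturally onto a small record type. The following is a minimal sketch in Python, not the format used by any particular vocabulary server; the class name, the keys `EX0001` and `EX0002`, and the definitions are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VocabularyTerm:
    key: str           # unique permanent identifier, meant for machines
    term: str          # human-readable label
    abbreviation: str  # short label for use where space is limited
    definition: str    # full description of what the term means


# Hypothetical entries; the keys and wording are illustrative only.
TERMS = [
    VocabularyTerm("EX0001", "Air temperature", "AirTemp",
                   "Temperature of the air near the surface."),
    VocabularyTerm("EX0002", "Precipitation amount", "Precip",
                   "Amount of water reaching the surface as rain or snow."),
]

# Index the vocabulary by key so that software can resolve identifiers quickly.
VOCABULARY = {t.key: t for t in TERMS}


def lookup(key: str) -> VocabularyTerm:
    """Return the term record for a key, raising KeyError if it is unknown."""
    return VOCABULARY[key]
```

Software stores and exchanges only the key, while the term, abbreviation and definition are looked up when something has to be shown to a person.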
Many of the datasets used in CLIPC already have controlled vocabularies. CLIPC also incorporates datasets that do not, so new vocabularies have been created for them. In addition, new high-level controlled vocabularies have been created that define the broad concepts used within this portal to improve data discovery.
The new controlled vocabularies defined for CLIPC have been incorporated into the NERC (Natural Environment Research Council) Vocabulary Server (NVS). The NVS is hosted at the British Oceanographic Data Centre (BODC) but holds vocabularies covering most areas of Earth System Science, e.g. meteorology, oceanography and the Climate and Forecast (CF) conventions.
One way computers benefit from controlled vocabularies is in the mathematical processing of values (e.g. addition) taken from different datasets. For example, one dataset may have a field labelled "Air temperature at the surface" while another has "Air temperature at 1.5 m" or even "TAS". To the human eye the similarity is obvious, but a computer cannot interpret these as the same thing unless every possible label is hard-coded into its software. If the data are marked up with the same terms, this problem is resolved.
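A minimal sketch of this idea in Python follows. It assumes each field carries a vocabulary key alongside its free-text label; the record layout, the key `EX0001`, the function name and the values are all hypothetical.

```python
# Two datasets that label the same quantity differently but share a key.
dataset_a = [
    {"label": "Air temperature at the surface", "key": "EX0001",
     "values": [280.0, 281.5]},
]
dataset_b = [
    {"label": "TAS", "key": "EX0001", "values": [279.0, 280.5]},
]


def add_matching_fields(a, b):
    """Add values element-wise for fields that share a vocabulary key.

    The free-text labels are never compared; only the keys are.
    """
    b_by_key = {field["key"]: field for field in b}
    result = {}
    for field in a:
        other = b_by_key.get(field["key"])
        if other is not None:
            result[field["key"]] = [
                x + y for x, y in zip(field["values"], other["values"])
            ]
    return result


combined = add_matching_fields(dataset_a, dataset_b)
# combined == {"EX0001": [559.0, 562.0]}
```

Because the match is made on the key, the code needs no list of alternative spellings, and a third dataset with yet another label would work unchanged as long as it uses the same key.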
In the real world, it is not always possible or agreeable for data providers to use the same terms. In such cases, controlled vocabularies can serve as a common reference to which data centres map their equivalent terms. This is often done using the Simple Knowledge Organization System (SKOS), which permits a variety of mappings between terms.
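Such mappings can be sketched as simple (local term, relation, vocabulary concept) triples. The example below uses plain Python rather than an RDF library. The five relation names are the standard SKOS mapping properties, but the `centreA:`, `centreB:`, `centreC:` and `vocab:` identifiers and the helper function are assumptions made for illustration.

```python
from enum import Enum


class Mapping(Enum):
    """The SKOS mapping properties."""
    EXACT = "skos:exactMatch"
    CLOSE = "skos:closeMatch"
    BROAD = "skos:broadMatch"
    NARROW = "skos:narrowMatch"
    RELATED = "skos:relatedMatch"


# Hypothetical mappings from data-centre terms to a shared vocabulary concept.
MAPPINGS = [
    ("centreA:air_temp_surface", Mapping.EXACT, "vocab:EX0001"),
    ("centreB:TAS", Mapping.EXACT, "vocab:EX0001"),
    # narrowMatch: the vocabulary concept is narrower than the local term.
    ("centreC:temperature", Mapping.NARROW, "vocab:EX0001"),
]


def local_terms_for(concept, allowed=(Mapping.EXACT,)):
    """List local terms mapped to a concept by one of the allowed relations."""
    return [
        local
        for local, relation, target in MAPPINGS
        if target == concept and relation in allowed
    ]


exact_only = local_terms_for("vocab:EX0001")
with_narrow = local_terms_for(
    "vocab:EX0001", allowed=(Mapping.EXACT, Mapping.NARROW)
)
```

Keeping the relation type explicit lets an application decide how strict to be: only exact matches might be trusted for numerical processing, while looser relations could still be useful for discovery.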
At the NVS all the vocabularies are fully versioned and a permanent record is kept of all changes made.