Antelope Release 5.5 Linux 2.6.32-220.el6.x86_64 2015-04-21

 

NAME

Metadata - A C++ handler for variable metadata in an object

SYNOPSIS

#include "Metadata.h"
using namespace std;
using namespace SEISPP;

class Metadata
{
public:
        Metadata(){};
	Metadata(Pf*);
	Metadata(Pf *pfin, string tag);
	Metadata(string) throw(MetadataParseError);
	Metadata(string,MetadataList&);
	Metadata(DatabaseHandle&,
		MetadataList&,AttributeMap&) throw(MetadataError);

        double get_double(string) throw(MetadataGetError);
        int get_int(string)throw(MetadataGetError);
        string get_string(string)throw(MetadataGetError);
        bool get_bool(string);
        void put(string,double);
        void put(string,int);
        void put(string,bool);
        void put(string,string);
        void put(string,char *);
	void remove(string);
	friend ostream& operator<<(ostream&,Metadata&);
	MetadataList keys();
};

typedef struct Metadata_typedef {
	string tag;
	enum MDtype mdt;
} Metadata_typedef;

typedef list<Metadata_typedef> MetadataList;
void copy_selected_metadata(Metadata& mdin, Metadata& mdout,
	MetadataList& mdlist) throw(MetadataError);
MetadataList pfget_mdlist(Pf *pf,string tag);
Pf *Metadata_to_pf(Metadata& md);

SUPPORT


Contributed code: NO BRTT support.
THIS PIECE OF SOFTWARE WAS CONTRIBUTED BY THE ANTELOPE USER COMMUNITY. BRTT DISCLAIMS ALL OWNERSHIP, LIABILITY, AND SUPPORT FOR THIS PIECE OF SOFTWARE.

FOR HELP WITH THIS PIECE OF SOFTWARE, PLEASE CONTACT THE CONTRIBUTING AUTHOR.

DESCRIPTION

Metadata is a proverbial problem in data processing. By metadata I mean a set of ancillary parameters that are associated with a data object. Some parameters are essential for describing an object while others are not always needed. The latter are what I mean by metadata. Dealing with this causes major portability and consistency issues in data processing. The traditional solution is using a fixed name space through parameters stored as headers. The Metadata object provides a more flexible and extensible approach to dealing with this problem.

The Metadata class in this library generalizes the idea of a header used to store metadata. Rather than burden code with different sets of extracted metadata for each algorithm, the model here is to place the metadata in a private namespace. This is a classic use of data hiding and abstraction of the interface to a concept: here the pool of metadata is tagged with a name key. The basic idea is that a data object used in some data processing algorithm will include a Metadata object as a component of a higher order object. A good example of the intended use is this fragment of the TimeSeries object definition in SEISPP, which inherits Metadata:

class TimeSeries : public BasicTimeSeries, public Metadata
{
public:
	vector<double> s;
        ...  more components of object definition
};
This allows a generalized header to be attached to the data.

The Metadata space is created by one of several constructors. Metadata(char*) uses Antelope's pfcompile function to append parameter-file-like input data supplied in the form of a string (char * and string forms are supplied for convenience). The Pf constructor, Metadata(Pf*), is similar but appends the contents of the Pf to the Metadata object's namespace.

There is a variant of the parameter file constructor with the calling sequence Metadata(Pf *pfin, string tag). This allows extraction of a nested Pf item labelled with "tag" that contains other parameters within the tagged item. The tagged item must be an &Arr{ parameter file entry or its contents cannot be parsed correctly. Here is an example of how this is used in a program called pwstack to define parameters for two kinds of data objects called a "RectangularSlownessGrid" and a "TopMute":

P_wave_slowness_grid &Arr{
    uxlow = 0.5
    uylow = 0.5
    nux = 51
    nuy = 51
    dux = 0.2
    duy = 0.2
}
Data_TopMute &Arr{
    TimeReferenceType relative
    Zero_End_Time 1.0
    End_Time 2.0
}
Stack_TopMute &Arr{
    TimeReferenceType relative
    Zero_End_Time 2.0
    End_Time 3.0
}

where in this case constructors for these unrelated data objects use a Metadata constructor to parse the parameter file for the tags named "P_wave_slowness_grid" for the RectangularSlownessGrid object and two different TopMute objects tagged with "Data_TopMute" and "Stack_TopMute".
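
The effect of selecting by tag can be pictured with a small stand-alone sketch. It does not call pfcompile or any other Antelope routine. The names ParamPool, TaggedPools, parse_arr_blocks, and pool_get_double are hypothetical and exist only for this illustration, and treating a lone "=" as a separator is an assumption of the sketch, not a statement about how pf files are parsed. The point is only that each tag yields its own independent pool of key/value pairs, so two TopMute definitions can reuse the same key names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-ins, not Antelope types: one pool of key/value
// pairs per tag, roughly what each tagged constructor call would see.
typedef std::map<std::string, std::string> ParamPool;
typedef std::map<std::string, ParamPool> TaggedPools;

// Parse lines of the form "tag &Arr{", "key value" (or "key = value"),
// and "}". Nesting, comments, and &Tbl entries are not handled.
TaggedPools parse_arr_blocks(const std::string& text)
{
    TaggedPools pools;
    std::istringstream in(text);
    std::string line, current_tag;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first, second;
        if (!(tokens >> first)) continue;            // blank line
        if (first == "}") { current_tag.clear(); continue; }
        tokens >> second;
        if (second == "&Arr{") {                     // start of a tagged block
            current_tag = first;
            pools[current_tag];                      // create an empty pool
            continue;
        }
        if (current_tag.empty()) continue;           // text outside any block
        if (second == "=") tokens >> second;         // assumption: "=" is a separator
        pools[current_tag][first] = second;
    }
    return pools;
}

// Read one parameter from one tagged pool as a double.
// std::map::at throws std::out_of_range when the tag or key is missing.
double pool_get_double(const TaggedPools& pools, const std::string& tag,
                       const std::string& key)
{
    std::istringstream value(pools.at(tag).at(key));
    double result = 0.0;
    value >> result;
    return result;
}

// Usage: the two mute definitions share key names but stay separate.
void example_top_mutes()
{
    const std::string pf_text =
        "Data_TopMute &Arr{\n"
        "    TimeReferenceType relative\n"
        "    Zero_End_Time 1.0\n"
        "    End_Time 2.0\n"
        "}\n"
        "Stack_TopMute &Arr{\n"
        "    TimeReferenceType relative\n"
        "    Zero_End_Time 2.0\n"
        "    End_Time 3.0\n"
        "}\n";
    TaggedPools pools = parse_arr_blocks(pf_text);
    assert(pool_get_double(pools, "Data_TopMute", "End_Time") == 2.0);
    assert(pool_get_double(pools, "Stack_TopMute", "End_Time") == 3.0);
}
```

In the real library the same separation is obtained by passing a different tag to Metadata(Pf *pfin, string tag) for each object being constructed.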

The database constructor allows loading metadata from a database. Its calling sequence is

Metadata(DatabaseHandle& dbh,
	MetadataList& mdl,AttributeMap& am)
where dbh is a generic database handle. In the current implementation this is a pure abstraction and the only available constructor assumes dbh is a Datascope database. That is, the first thing that happens in this constructor is that dbh is downcast to a DatascopeHandle object (see dbpp.3) which contains a datascope Dbptr that is manipulated as usual.

The attributes that are to be extracted from the database are driven by this object:


class AttributeMap
{
public:
        map<string,AttributeProperties> attributes;

	AttributeMap();
        AttributeMap(Pf *pf,string name);
        AttributeMap(string schema);
        AttributeMap(const AttributeMap&);
        AttributeMap& operator=(const AttributeMap&);
};
noting that the STL map container holds the following objects:
class AttributeProperties
{
public:
        string db_attribute_name;
        string db_table_name;
        string internal_name;
        MDtype mdt;
        bool is_key;
        AttributeProperties();
        AttributeProperties(string);// main constructor parses string
        AttributeProperties(const AttributeProperties&);
        AttributeProperties& operator=(const AttributeProperties&);
};
This STL map provides a general way to map data from an external namespace to an internal one. For example, externally a database attribute might be referred to by a name like "arrival.time", but internally you might want to refer to it as "atime" for simplicity or to mesh with some other naming convention. Bear in mind that it is the internal name that will be needed to retrieve the correct information from a Metadata object. This could be cumbersome baggage, but it is assumed this can be hidden from most users by defining the mapping in the global parameter file for the application. The Pf constructor for the AttributeMap object, which is the one that would normally be used to build this, looks for a Tbl pf list with the key defined by the string that is the second argument of the constructor. This is expected to be followed by a list of entries, each pairing an internal name with its external (database) name. For example, if we set the key to "Sample_AttributeMap" we would want an entry like this in the parameter file:
Sample_AttributeMap &Tbl{
#internal_name	db_attribute_name	db_table_name	MDtype	is_key
sta		sta			wfdisc		string	yes
chan		chan			wfdisc		string	yes
wfstime		time			wfdisc		real	yes
Ptime		time			arrival		real
Stime		time			arrival		real
wfetime		endtime			wfdisc		real
nsamp		nsamp			wfdisc		integer
samprate	samprate		wfdisc		real
wfdir		dir			wfdisc		string
wfdfile		dfile			wfdisc		string
}
Note that the order of the tokens is fixed, and the top row is a comment that documents this rigid order. Tokens are separated by standard Unix white space. When the constructor reads this data, each line must supply the first four tokens or the constructor will throw a MetadataError exception object (see below). The is_key field is optional, and when it is missing the constructor sets the is_key boolean to false. Note that in the current implementation with Datascope the is_key field is relevant only to integer keys that require a dbnextid(3) call to maintain the integrity of the index. Datascope will complain about noninteger key problems, but maintaining noninteger keys is a more complicated problem that is viewed as the application's problem.
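
The row format can be illustrated with a short stand-alone sketch. It is not the AttributeProperties(string) constructor from the library. SketchType, SketchProperties, and parse_attribute_row are hypothetical names, the enumerators are placeholders because the real MDtype values are not listed on this page, and std::runtime_error stands in for MetadataError. The sketch assumes four required tokens and an optional fifth, which is what the sample table above shows.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Placeholder for MDtype; the real enumerator names are not shown here.
enum SketchType { SKETCH_REAL, SKETCH_INTEGER, SKETCH_STRING };

// Mirrors the public fields of AttributeProperties for illustration.
struct SketchProperties
{
    std::string internal_name;
    std::string db_attribute_name;
    std::string db_table_name;
    SketchType mdt;
    bool is_key;
};

// Parse one white-space separated row in the fixed order
// internal_name db_attribute_name db_table_name MDtype [is_key].
SketchProperties parse_attribute_row(const std::string& row)
{
    std::istringstream in(row);
    SketchProperties p;
    std::string type_name, key_flag;
    if (!(in >> p.internal_name >> p.db_attribute_name
             >> p.db_table_name >> type_name))
        throw std::runtime_error("row needs at least four tokens: " + row);

    if (type_name == "real") p.mdt = SKETCH_REAL;
    else if (type_name == "integer") p.mdt = SKETCH_INTEGER;
    else if (type_name == "string") p.mdt = SKETCH_STRING;
    else throw std::runtime_error("unknown type name: " + type_name);

    // A missing fifth token leaves is_key false.
    p.is_key = (in >> key_flag) && key_flag == "yes";
    return p;
}

// Usage: one row with the key flag and one without.
void example_rows()
{
    SketchProperties sta = parse_attribute_row("sta\tsta\twfdisc\tstring\tyes");
    assert(sta.is_key && sta.mdt == SKETCH_STRING);

    SketchProperties ptime = parse_attribute_row("Ptime\ttime\tarrival\treal");
    assert(!ptime.is_key && ptime.db_table_name == "arrival");
}
```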

The AttributeMap should normally be static; it defines the fixed mapping of an internal namespace to a collection of metadata stored in an external database. Hence, the AttributeMap has an intrinsic database model for the data it is indexing. That is, don't expect it to be capable of defining anything that cannot be stored in an Antelope database. If you need additional capabilities the interface allows it, but only simple types are currently supported in the MDtype definition.

Most users will wish to use the default constructor for the AttributeMap or the one with a single string argument schema. The default constructor loads a file from the global pf data area for Antelope and assumes a mapping appropriate for the css3.0 schema. The parameterized version scans for the name set in schema instead of the default css3.0. You should check the file (seispp_attribute_maps.pf) for the full definition of namespace mapping including aliases. That is, the standard namespace uses Antelope-oriented naming for the internal names (e.g. wfdisc.time for the time field in wfdisc), but it also includes some useful aliases.

The AttributeMap should normally be loaded from a parameter file early in a program's initialization phase. It should define the entire namespace of parameters of interest. The information actually passed in and out of a program is controlled by a MetadataList object. MetadataList objects might commonly be constructed using different sets of names for input and output. These are easily constructed from a parameter file using the function pfget_mdlist defined as:

MetadataList pfget_mdlist(Pf *pf, string pftag);
where pf is an Antelope Pf handle (see man pf(3)) and pftag is a string that identifies a tag to an &Tbl entry in a parameter file. For example, to select only entries from the wfdisc table for the example AttributeMap defined above, one could set pftag="Input_mdlist" and place the following in the parameter file used for initialization:
Input_mdlist &Tbl{
sta		string
chan		string
wfstime		real
wfetime		real
nsamp		integer
samprate	real
wfdir		string
wfdfile		string
}

Putting all this together, the AttributeMap and MetadataList are used together in the database constructor: the AttributeMap defines the namespace mapping from external (database-centric) names to a set of internal names, while the MetadataList passed to that constructor defines what metadata to attempt to extract from the database. The database handle, dbh, is expected to point to one row of a database view. This can be a join of several tables, as the table names are resolved through the AttributeMap.
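
The division of labor between the two objects can be sketched without any database at all. The code below is not the library's constructor: ExternalName, NameMap, ViewRow, and load_selected are hypothetical, the view row is modeled as a map keyed by "table.attribute", all values are kept as strings for brevity, and std::runtime_error stands in for MetadataError. It shows only the lookup order: the list says which internal names are wanted, and the map says where each one lives.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <stdexcept>
#include <string>

// Where one internal name lives in the database (hypothetical).
struct ExternalName
{
    std::string table;
    std::string attribute;
};

// Plays the role of the AttributeMap: internal name -> external location.
typedef std::map<std::string, ExternalName> NameMap;
// Plays the role of one row of a joined view, keyed by "table.attribute".
typedef std::map<std::string, std::string> ViewRow;

// Copy only the requested internal names out of the row.
std::map<std::string, std::string> load_selected(
    const ViewRow& row, const std::list<std::string>& wanted,
    const NameMap& names)
{
    std::map<std::string, std::string> result;
    for (const std::string& internal : wanted) {
        NameMap::const_iterator entry = names.find(internal);
        if (entry == names.end())
            throw std::runtime_error("no mapping for " + internal);
        const std::string column =
            entry->second.table + "." + entry->second.attribute;
        ViewRow::const_iterator cell = row.find(column);
        if (cell == row.end())
            throw std::runtime_error("view has no column " + column);
        result[internal] = cell->second;
    }
    return result;
}

// Usage: two "time" attributes from different tables stay distinct.
void example_load()
{
    NameMap names;
    names["sta"] = ExternalName{"wfdisc", "sta"};
    names["wfstime"] = ExternalName{"wfdisc", "time"};
    names["Ptime"] = ExternalName{"arrival", "time"};

    ViewRow row;   // one row of a wfdisc/arrival join (made-up values)
    row["wfdisc.sta"] = "AAK";
    row["wfdisc.time"] = "100.0";
    row["arrival.time"] = "105.5";

    std::list<std::string> wanted;
    wanted.push_back("sta");
    wanted.push_back("Ptime");

    std::map<std::string, std::string> md = load_selected(row, wanted, names);
    assert(md.size() == 2);
    assert(md["Ptime"] == "105.5");
}
```

Keeping the list separate from the map is what lets one static namespace definition serve programs that read or write different subsets of attributes.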

More limited parameters can be placed in the Metadata object one at a time with the put functions. (There is only one function name because C++ allows overloading.) Single entries in the Metadata object can be deleted with the remove function. In all cases the string in the function defines the key used to access that parameter.

Metadata are retrieved by the get_type functions (get_double, get_int, get_string, and get_bool). The get routines will throw an exception if the requested parameter is not found in the Metadata space. As a result, all get calls should be wrapped in a try block with a catch clause like the following:

try
{
	// series of metadata get requests
}
catch ( MetadataError& me)
{
	me.log_error();
	// error handling code
}
The catch block can handle this error appropriately as some metadata requests require different actions. As in all proper error handlers the program can abort, set a default and try to continue, or something else.
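
The "set a default and try to continue" option can be sketched as follows. This is a stand-alone illustration, not SEISPP code: SketchMetadata and SketchGetError are hypothetical stand-ins for Metadata and MetadataGetError, only the double and int overloads are modeled, and get_double_or is a helper invented for this example.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Stand-in for MetadataGetError; the real class is defined by SEISPP.
class SketchGetError : public std::runtime_error
{
public:
    explicit SketchGetError(const std::string& key)
        : std::runtime_error("metadata key not found: " + key) {}
};

// Stand-in for the Metadata container, restricted to double and int.
class SketchMetadata
{
public:
    void put(const std::string& key, double value) { reals_[key] = value; }
    void put(const std::string& key, int value) { ints_[key] = value; }
    void remove(const std::string& key) { reals_.erase(key); ints_.erase(key); }

    double get_double(const std::string& key) const
    {
        std::map<std::string, double>::const_iterator it = reals_.find(key);
        if (it == reals_.end()) throw SketchGetError(key);
        return it->second;
    }
    int get_int(const std::string& key) const
    {
        std::map<std::string, int>::const_iterator it = ints_.find(key);
        if (it == ints_.end()) throw SketchGetError(key);
        return it->second;
    }

private:
    std::map<std::string, double> reals_;
    std::map<std::string, int> ints_;
};

// Return the stored value, or the caller's default when the key is absent.
double get_double_or(const SketchMetadata& md, const std::string& key,
                     double fallback)
{
    try {
        return md.get_double(key);
    } catch (const SketchGetError&) {
        return fallback;
    }
}

// Usage: a present key returns its value, a missing key the default.
void example_defaults()
{
    SketchMetadata md;
    md.put("samprate", 40.0);
    md.put("nsamp", 1024);
    assert(md.get_int("nsamp") == 1024);
    assert(get_double_or(md, "samprate", 1.0) == 40.0);
    assert(get_double_or(md, "calib", 1.0) == 1.0);
}
```

A default is only appropriate for parameters that are genuinely optional; for required ones, letting the exception propagate to a handler that aborts is usually the safer choice.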

A copy constructor and an assignment operator are provided to allow depositing Metadata objects into STL containers. An output function is supplied through the "<<" friend function. The output of this method is a parameter file. A corresponding input function was intentionally not included in the class definition.

LIBRARY

$(STOCKLIBS)

SEE ALSO

pf(3), pf(5), pfecho(1),
http://geology.indiana.edu/pavlis/software/seispp/html/index.html

BUGS AND CAVEATS

The AttributeMap object adds complexity to something that is already a bit messy. Applications using these functions should strive to hide this element of the implementation from normal use. In most cases this is expected to mean you will build a static AttributeMap pf description for all programs using this library and a particular database schema. In the same way the mdlist can and should be prepared in a standard pf file for a program, placed in the standard Antelope location for pf files, and not be advertised to the user.

AUTHOR

Gary L. Pavlis
Indiana University
pavlis@indiana.edu


Antelope User Group Contributed Software