• Antelope Release 5.5 Mac OS X 10.8.5 2015-04-21

 

NAME

TimeSeries - generic data object to hold scalar time series data

SYNOPSIS

#include "TimeSeries.h"
using namespace SEISPP;
class TimeSeries: public BasicTimeSeries, public Metadata

SUPPORT


Contributed code: NO BRTT support.
THIS PIECE OF SOFTWARE WAS CONTRIBUTED BY THE ANTELOPE USER COMMUNITY. BRTT DISCLAIMS ALL OWNERSHIP, LIABILITY, AND SUPPORT FOR THIS PIECE OF SOFTWARE.

FOR HELP WITH THIS PIECE OF SOFTWARE, PLEASE CONTACT THE CONTRIBUTING AUTHOR.

DESCRIPTION

A TimeSeries object encapsulates the concept of a scalar time series. A scalar time series is defined as a BasicTimeSeries with the dependent variable being a scalar. That is, a TimeSeries describes a function of one variable defined on a regular mesh of points defined by a start time, sample rate, and the number of samples (see BasicTimeSeries(3)).

A TimeSeries is a derived class inheriting members from two very different data objects. The BasicTimeSeries (3) object contains basic parameters and functions common to any data type sampled on a 1D regular mesh. The Metadata(3) object is used to provide what is effectively a variable length header with parameters that can be extracted by name. The namespace for the parameter definitions and their internal meaning is largely implementation dependent although a few items (see below) have frozen names that are utilized by the Metadata-based constructor (see below). The Metadata concept and how it is expected to be applied is described in more detail in the Metadata(3) man page.

The TimeSeries object itself was designed to be lightweight. That is, the aim was to make the number of member functions and required data as small as reasonably possible. Note that parameters common to all time-series type data are inherited from the BasicTimeSeries object (i.e. sample rate, start time, etc.). As a result the actual TimeSeries object definition is thus quite small. It is:

class TimeSeries: public BasicTimeSeries , public Metadata
{
public:
        vector<double>s;
        TimeSeries();
        TimeSeries(int nsin);
        TimeSeries(Metadata& md,bool load_data);
        TimeSeries(Dbptr db, MetadataList& mdl, AttributeMap& am);
        TimeSeries(const TimeSeries&);
        TimeSeries& operator=(const TimeSeries&);
        void zero_gaps();
};

That is, the object mainly defines a set of constructors and operators (see below) and a specific implementation of the BasicTimeSeries virtual function zero_gaps. Most of its contents are acquired by inheritance.

The primary data item added by the TimeSeries object is the actual sample data. This is stored in the Standard Template Library (STL) vector and has the name "s". The STL vector was chosen over a regular C vector for all the standard reasons given for the STL. An important element that may not be known by all potential users is that the standards definition for the STL requires that the elements of a vector be stored in contiguous memory. This means the symbol s can be treated (with care) almost like a regular C pointer to an array of doubles. To be more precise it means we can do things like this:

TimeSeries x(10);
double xC[10];
for(int i=0;i<10;++i)x.s[i]=static_cast<double>(i);
dcopy(10,&(x.s[0]),1,xC,1);

and xC will contain a copy of the contents of the x.s vector. This mechanism should allow a large class of existing time series algorithms (including FORTRAN ones like the original BLAS used in the example above) to be used by simply passing the address of the first sample of s to the desired function. The STL vector, however, improves the robustness of code that uses it as it provides basic functionality like index range checking. Be warned though that inverse of the above is evil and virtually guaranteed to cause problems. That is, you can read the contents of s as a continuous vector, but writing to a container using that type of algorithm is dangerous.

CONSTRUCTORS

  • (1)
    TimeSeries(); This, the default constructor, does almost nothing. It initializes all member data parameters to zero, sets the length of s to zero, and marks the data dead.
  • (2)
    TimeSeries(int nsin); This acts like the default constructor except it reserves nsin slots in the s vector.
  • (3)
    TimeSeries(TimeSeries&); Standard copy constructor.
  • (4)
    TimeSeries(Metadata& md,bool load_data); This should be viewed as a general constructor driven by a Metadata object. It provides a fairly general mechanism for file-based input of time series data, but it has a frozen name space for a set of parameters it needs to extract from the parent Metadata object. This constructor first clones the Metadata contents (using the Metadata copy constructor) of the object md that it is passed. It then extracts data for a BasicTimeSeries from the input Metadata object using this map of external names to internal symbols:
    
    Symbol	Metadata name
    dt	samprate
    t0	starttime
    ns	nsamp
    
    
  • The load_data boolean field on this constructor determines whether or not the constructor should attempt to read data into the s array. This switch was a design feature to allow flexibility in input. The concept is that simple input (defined as a raw fread of a vector of doubles stored in parent host binary format) is handled by the allowing this constructor to directly and blindly read such data. Other formats are assumed to require more work with a specialized function. In those cases the expectation is that the user would call this constructor with load_data false, read the input data into a buffer, and convert and store the results in the s vector. This model is applicable to any format I know of so I viewed this generic approach preferable to adding a long string of constructors for the plethora of formats that exist for time series data. When load_data is true a second set of parameters are extracted from the Metadata object, md. These must be present of the constructor will fail. The required symbols are:
    
    Keyword		Definition
    TimeReferenceType	tref field of BasicTimeSeries
    datatype		Data type ala CSS3.0 dtype attribute
    foff			File offset in bytes to first sample
    dir			Directory where data will be found
    dfile			File name to read
    
    
  • Note that tref is assumed to be defined by a string field. If TimeReferenceType=="relative" relative times are assumed, otherwise the time standard is assumed to be absolute (epoch) times. The "datatype", "dir", and "dfile" fields define the format and path description for the file that contains the data of interest. Note that this simple mechanism does not currently support any data gap definitions. The current implementation is limited in capabilities due to the design concept for this constructor described above. Specifically, if datatype is anything but the host float format (t4 for Suns and u4 for Intel boxes) this constructor will throw a SeisppError (3) exception. If datatype is valid the constructor will attempt to open the file dir/dfile, seek to foff, read ns samples from the file with fread, convert to doubles, and close the file. If any of the I/O operations fail the constructor will throw a SeisppError exception. An error handler wrapped around this constructor should be aware that if any SeisppError exception is thrown the object should be considered to have valid Metadata but the sample data are not valid. In particular note that in this situation the s vector will have ns slots reserved for data, but the sample data are not valid. This constructor may also throw a MetadataError object (see Metadata(3)). If a MetadataError is caught the object should be discarded as it should be viewed as trash. A MetadataError indicates one of the required parameters for the constructor was not found in the Metadata space.
  • (5)
    TimeSeries(Dbptr db,MetadataList& mdl,AttributeMap& am); This is a generic database-driven constructor for segmented data. That is, it makes an implicit assumption the data being requested are defined in single, discrete segments indexed by a database (e.g. event-based seismograms). The current implementation uses an Antelope database pointer as Dbptr but the user should recognize that this is not a requirement of the interface. Dbptr should be viewed as a database handle that points at one row of a database that defines TimeSeries data. What that handle points to is implementation dependent. For this implementation it is an Antelope database pointer and it MUST point to one and only one row of a database view. (i.e. the row field must not be something like dbALL). The database attributes to be extracted from this database row are controlled by mdl and am in the manner described in detail in Metadata(3). Briefly, mdl and am control how attributes in the database are mapped to an internal Metadata namespace. The list of attributes to be extracted from the database are driven by mdl. The basic algorithm is that for each element of mdl an associated attribute is extracted from the unique database row defined by db. This implementation then uses the Antelope trace library to read the data and define any data gaps. This means any trace format Antelope supports can be read and loaded with this constructor. It also means that the mechanisms used by trgaps(3) to define gaps in external data representations will also work and the TimeSeries object that is constructed will have such data gaps correctly defined. This constructor will throw a SeisppError if there are any problems. An error handler should catch this exception, call log_error(), and discard such data or abort.

OPERATORS

Current the = operator is defined and should behave as expected.

NON-MEMBER FUNCTION

The SEISPP library has a number of functions that operate on TimeSeries objects. Thes are documented elsewhere. Here I describe only generic functions that are not seismo-centric.

The following is the prototype for a Datascope database output function:

void dbsave(TimeSeries& ts,Dbptr db,string table,
	MetadataList& md, AttributeMap& am)
                throw(SeisppError);

This function saves a TimeSeries object to a Datascope database. The function is fairly general, which complicates the interface. The function takes the contents of ts and writes results to database db storing metadata in table (usually wfdisc but not required). The attributes to be stored in wfdisc are extracted from the Metadata(3) components of the ts object. The list of names to be saved is stored in md and the AttributeMap object, am, describes how these names are mapped to external attribute names (see Metadata(3) for details on this process -- dbsave uses the metadata function with similar arguments).

The process of saving the data contents of the ts object has two parts. First, the time series vector (the main data) is saved as an external file. The file name is derived from the "dir" and "dfile" metadata fields of ts. The output routines that do this always operate in append mode. That is, if dir/dfile already exist, the data from ts are appended onto that file and foff is set appropriately. Be warned that a variable foff must be defined in mdl or this information will be lost and the calling program could have mysterious behaviour. The current implementation is somewhat rigid in it's viewpoint on output. That is, it only writes data in host (32 bit)floating point format. This was a choice driven by the viewpoint that the SEISPP library was an offline processing library and the added complexity of the multiplicity of external formats should be handled ONLY on input as a raw data problem. The viewpoint is that dbsave output is a scientific result or an intermediate calculation where format is not an issue.

Note the function will throw a SeisppError if there are any problems in saving the object.

EXAMPLE

This small test probram illustrates the use of the database constructor and the assignment operator.

#include <string>
#include "pf.h"
#include "db.h"
#include "seispp.h"
using namespace SEISPP;
int main()
{
	string pfname("test_ts");
        string amname="AttributeMap";
        int i;
        int ierr;

        Pf *pf;

        if(pfread(const_cast<char *>(pfname.c_str()),&pf))
        {
                cerr << "Pfread error"<<endl;
                exit(-1);
        }
        try {
               	cout << "Building Attribute map"<<endl;
		// define namespace mapping using pf
                AttributeMap am(pf,amname);
		// use pf to define list of attributes to be obtained
		// from the database
               	MetadataList mdl=pfget_mdlist(pf,"metadata_input");
                Dbptr db;
                char *dbname="testdb";
                ierr=dbopen(dbname,"r+",&db);
                db=dblookup(db,0,"wfdisc",0,0);
                db.record = 0;
		TimeSeries ts(db,mdl,am);
                cout << "Metadata contents of Time Series Read"<<endl;
                cout << (dynamic_cast<Metadata &>(ts1));
                cout << "Data values of trace data"<<endl;
                for(i=0;i<ts1.ns;++i) cout << ts1.s[i] <<endl;
		cout << "Testing assignment operator" << endl;
                TimeSeries tscopy;
		tscopy=ts;
	}
        catch (MetadataError mess)
        {
                mess.log_error();
                exit(-1);
        }
        catch (SeisppError smess)
        {
                smess.log_error();
                exit(-1);
        }

}

LIBRARY

-lseispp

SEE ALSO

BasicTimeSeries(3), SeisppError(3), Metadata(3), ThreeComponentSeismogram(3),
http://geology.indiana.edu/pavlis/software/seispp/html/index.html

BUGS AND CAVEATS

There are a few obvious operators that can and should eventually be added. These include adding two time series (binary + operator) and multiplication by a scalar. Everything else I can think of in standard time series analysis can and should be viewed as procedural.

AUTHOR

Gary L. Pavlis
Indiana University
pavlis@indiana.edu

Antelope User Group Contributed Software
Printer icon