Antelope Release 5.5 Mac OS X 10.8.5 2015-04-21

 

NAME

db2pfstream - build database view and send metadata as a pfstream

SYNOPSIS

db2pfstream  db file [-n -V -v -pf pffile -sift expression]

SUPPORT


Contributed code: NO BRTT support.
THIS PIECE OF SOFTWARE WAS CONTRIBUTED BY THE ANTELOPE USER COMMUNITY. BRTT DISCLAIMS ALL OWNERSHIP, LIABILITY, AND SUPPORT FOR THIS PIECE OF SOFTWARE.

FOR HELP WITH THIS PIECE OF SOFTWARE, PLEASE CONTACT THE CONTRIBUTING AUTHOR.

DESCRIPTION

db2pfstream is a generalized input function for a processing system based on the concept of a pfstream. A pfstream is a way to encapsulate pieces of metadata that can be attached to an arbitrary data object to define it. This program builds a (potentially long) stream of these pf blocks and writes the results to an output file. That file would normally be a fifo, so that the output of this program feeds directly into a secondary processing algorithm without taking up disk space for the pf data. This is intended as a simple data model for parallel processing algorithms that are data driven, as most seismic applications are.

The db and file parameters are required. db is assumed to be an Antelope database, and file is the output file name (it can be, and normally would be, a fifo created with mkfifo). By default the program expects a parameter file called db2pfstream.pf to exist in PFPATH; that file controls most of what this program does (see below).
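A downstream consumer reading from the fifo might look like the Python sketch below. It is a hypothetical helper, not part of Antelope. It assumes, from the example in FILES, that each pf block ends with a line containing only __EOF__ and that __EOI__ marks the end of input; pfstream(5) is the authoritative description of the format.

```python
from typing import Iterator, TextIO


def read_pf_blocks(stream: TextIO) -> Iterator[str]:
    """Yield the text of each pf block until __EOI__ or end of stream.

    Assumption: blocks are terminated by a line holding only __EOF__,
    and a line holding only __EOI__ ends the whole stream.
    """
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        marker = line.strip()
        if marker == "__EOI__":
            break                       # end-of-input marker: stop reading
        if marker == "__EOF__":
            yield "\n".join(lines)      # one complete pf block
            lines = []
        else:
            lines.append(line)
    # Any trailing text without an __EOF__ terminator is discarded.
```

To read from a fifo, pass an open file object such as open("/tmp/db2pfstream.fifo"); that path is hypothetical and must match the file argument given to db2pfstream. Keeping the fifo open and reading until __EOI__ avoids the open/read/close pattern discussed under BUGS AND CAVEATS.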

OPTIONS

FILES

Here is an example pfstream output file with 6 rows and 2 attributes (columns). This example has no grouping defined.

example &Arr{
    ensemble    &Arr{
        00000   &Arr{
            arid        2378
            orid        316
        }
        00001   &Arr{
            arid        2379
            orid        316
        }
        00002   &Arr{
            arid        2380
            orid        316
        }
        00003   &Arr{
            arid        2381
            orid        316
        }
        00004   &Arr{
            arid        2382
            orid        316
        }
        00005   &Arr{
            arid        2383
            orid        316
        }
    }
    ensemble_keys       &Tbl{
        gridid
    }
}
__EOF__
__EOI__

For more details see pfstream(5).
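The nesting in this output can be recovered with a small parser. The Python sketch below is a hypothetical illustration, not the Antelope pf library: it handles only the subset of pf syntax that appears in the example (key/value lines, &Arr{ and &Tbl{ blocks, closing braces, and the __EOF__/__EOI__ markers) and keeps every value as a string.

```python
from typing import Any, Dict, List, Union


def parse_pf_subset(text: str) -> Dict[str, Any]:
    """Parse the small pf subset shown in the FILES example into dicts/lists.

    Supported: 'key value', 'key &Arr{', 'key &Tbl{', '}', one entry per
    line inside a Tbl, and __EOF__/__EOI__ (parsing stops at either).
    Not supported: quoting, &Literal{ blocks, comments, or Arr/Tbl blocks
    nested directly inside a Tbl.
    """
    root: Dict[str, Any] = {}
    stack: List[Union[Dict[str, Any], List[Any]]] = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in ("__EOF__", "__EOI__"):
            break                            # end of this pf block
        if line == "}":
            stack.pop()                      # close the innermost Arr/Tbl
            continue
        top = stack[-1]
        if isinstance(top, list):
            top.append(line)                 # each Tbl line is one entry
            continue
        parts = line.split(None, 1)
        value = parts[1] if len(parts) == 2 else ""
        if value in ("&Arr{", "&Tbl{"):
            child: Union[Dict[str, Any], List[Any]] = {} if value == "&Arr{" else []
            top[parts[0]] = child
            stack.append(child)              # descend into the new block
        else:
            top[parts[0]] = value            # simple key/value, kept as text
    return root
```

Applied to the example above, parse_pf_subset(text)["example"]["ensemble"]["00000"]["arid"] returns "2378".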

PARAMETER FILE

The input parameter file can get fairly long. This abbreviated example builds part of an arrival view used in catalog preparation. Check the standard db2pfstream.pf for a larger example:


# Example parameter file
sleep_time 30
dbprocess_list	&Tbl{
    dbopen event
    dbjoin origin
    dbjoin assoc
    dbjoin arrival
}
ensemble_keys	&Tbl{
    evid
}
ensemble_mode	true
group_keys	&Tbl{
}
passthrough	&Tbl{
    evname string evname
    origin.jdate	int	origin.jdate
    origin.nass	int	nass
    origin.ndef	int	ndef
    origin.ndp	int	ndp
    grn	int 	grn
    srn	int	srn
    etype	string 	etype
    review	string	review
    depdp	real	depdp
    dtype	string	dtype
    mb	real	mb
    mbid	int	mbid
    ms	real	ms
    msid	int	msid
    ml	real	ml
    mlid	int	mlid
    auth	string	auth
    origin.auth string origin.auth
    commid	int	commid
    origin.commid	int	origin.commid
    arrival.commid	int	arrival.commid
    algorithm string algorithm
    belief	double  belief
    assoc.delta double delta
    assoc.seaz  double seaz
    assoc.esaz  double esaz
    assoc.timeres  double timeres
    assoc.timedef  string timedef
    assoc.azdef  string azdef
    assoc.slodef  string slodef
    assoc.azres	double azres
    assoc.slores	double slores
    assoc.emares	double emares
    assoc.wgt double wgt
    assoc.vmodel string vmodel
    arrival.jdate int arrival.jdate
    iphase	string	iphase
    stassid int stassid
    chanid int chanid
    stype string stype
    azimuth double azimuth
    delaz double delaz
    slow double slow
    delslo double delslo
    ema double ema
    rect double rect
    amp double amp
    per double per
    logat double logat
    clip string clip
    fm string fm
    snr double snr
    qual string qual
    arrival.auth string arrival.auth
}
require	&Tbl{
    evid	int	evid
    orid	int	orid
    prefor	int	prefor
    origin.lat	real	origin.lat
    origin.lon	real	origin.lon
    origin.depth	real	origin.z
    origin.time	time	origin.time
    arid	int	arid
    phase	string	phase
    sta	string	sta
    chan	string	chan
    arrival.time	time	arrival.time
    deltim	real	deltim
}
virtual_table_name	test_arrival_view

dbprocess_list is a Tbl list that is passed directly to dbprocess to build the working view for this program. This is a VERY important parameter in two ways. First, it defines the set of joins needed to assemble the complete suite of attributes passed into the output stream. Second, it must sort the view properly to produce the right grouping when ensemble output is requested (see below).
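As an illustration of the sort requirement, the hypothetical Python helper below builds a sort statement to append to dbprocess_list, with the ensemble keys first followed by any additional group keys. It assumes dbprocess accepts a "dbsort key ..." statement; check the dbprocess documentation for your Antelope release before relying on it.

```python
from typing import List


def build_sort_statement(ensemble_keys: List[str], group_keys: List[str]) -> str:
    """Return a 'dbsort' line ordering rows by ensemble keys, then group keys.

    Keys already listed are not repeated, so group_keys that begin with the
    ensemble keys (as recommended) simply extend the sort.
    """
    ordered: List[str] = []
    for key in list(ensemble_keys) + list(group_keys):
        if key not in ordered:
            ordered.append(key)
    return "dbsort " + " ".join(ordered)
```

For example, build_sort_statement(["evid"], ["evid", "sta"]) returns "dbsort evid sta".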

The boolean ensemble_mode controls the basic output mode. If this parameter is false, the program writes pf's in single-object (row) mode; that is, one block of parameters is written to the output stream for each row of the input database view. If ensemble_mode is true, the output consists of blocks of parameters with repeating names, enclosed in the parameter file block:

ensemble &Arr{
   ...
}
See the FILES example above for a complete ensemble. When ensemble_mode is true, this program searches for two lists called ensemble_keys and group_keys. The ensemble_keys list defines the grouping that makes up one data object to be processed by a downstream algorithm. That is, an ensemble is the basic unit of granularity of the algorithm that uses these data, and it defines the outer blocking of the pfstream. (If ensemble_mode is false, each ensemble is assumed to contain only one element, i.e. one database row.) The group_keys list is optional and defines a secondary grouping within each ensemble. A typical example is an ensemble of three-component seismograms, where the gather defines the ensemble and the grouping defines the set of three components for each station.

It is VERY important that dbprocess_list, ensemble_keys, and group_keys be internally consistent and sensible for the input database. The sort order produced by dbprocess_list must match the grouping, and the ensemble and group keys must agree with each other, or chaos can result. Keep two rules in mind. First, dbprocess_list should sort the data by the combined ensemble and group keys, with the ensemble keys first. Second, if group keys are used, make SURE the group_keys list contains the ensemble keys as its first entries. This is necessary because the program simply calls dbgroup twice: once with the ensemble keys and, when requested, a second time with the group keys. The group keys are therefore assumed to define a finer grouping than the ensemble keys.
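The two-pass grouping described above can be imitated in Python to show why the sort order and the key prefix matter. This sketch is an analogy, not a call into Datascope, and the function name and row layout are hypothetical: itertools.groupby merges only adjacent rows with equal keys, so unsorted input splits one ensemble into several, which is the kind of chaos the rules above guard against.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def group_rows(rows: Sequence[Row], ensemble_keys: List[str],
               group_keys: List[str]) -> List[List[List[Row]]]:
    """Return ensembles, each a list of groups, each a list of rows.

    Mimics two grouping passes: first on ensemble_keys, then (if given)
    on group_keys. groupby only merges *adjacent* rows with equal keys,
    so rows must already be sorted by the combined keys, ensemble first.
    """
    if not ensemble_keys:
        raise ValueError("ensemble_keys must not be empty")
    if group_keys and list(group_keys[:len(ensemble_keys)]) != list(ensemble_keys):
        raise ValueError("group_keys must begin with the ensemble_keys")
    ens_key = itemgetter(*ensemble_keys)
    # Without group keys, each ensemble forms a single group.
    grp_key = itemgetter(*(group_keys or ensemble_keys))
    ensembles: List[List[List[Row]]] = []
    for _, ens_rows in groupby(rows, key=ens_key):
        groups = [list(members) for _, members in groupby(ens_rows, key=grp_key)]
        ensembles.append(groups)
    return ensembles
```

With rows sorted by evid and sta, group_rows(rows, ["evid"], ["evid", "sta"]) yields one ensemble per event, each holding one group per station.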

virtual_table_name is the tag assigned to the outer block that defines an ensemble. Because each ensemble carries its own tag, multiple ensembles with different tags can be embedded in a single pfstream block. This provides a fairly general way to map from one name space, or one database schema, to another. In the example described in the FILES section above, this tag was set to example.
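To make the role of the tag concrete, the hypothetical Python function below formats one ensemble in the layout of the FILES example, with virtual_table_name supplied as tag. It is a sketch only: indentation is cosmetic, values are written with str(), and the real program's output may differ in detail.

```python
from typing import Dict, List, Sequence


def format_ensemble(tag: str, rows: Sequence[Dict[str, object]],
                    ensemble_keys: List[str]) -> str:
    """Format one ensemble like the FILES example, ending with __EOF__."""
    out = [f"{tag} &Arr{{", "    ensemble    &Arr{"]
    for i, row in enumerate(rows):
        out.append(f"        {i:05d}   &Arr{{")    # zero-padded member index
        for name, value in row.items():
            out.append(f"            {name}        {value}")
        out.append("        }")
    out.append("    }")
    out.append("    ensemble_keys       &Tbl{")
    out.extend(f"        {key}" for key in ensemble_keys)
    out.append("    }")
    out.append("}")
    out.append("__EOF__")                              # end of this pf block
    return "\n".join(out) + "\n"
```

Calling format_ensemble("example", rows, ["gridid"]) with the six arid/orid rows reproduces the structure of the FILES example, apart from whitespace and the trailing __EOI__.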

sleep_time sets how long the program sleeps before closing its output file. This should be set to the maximum expected execution time for downstream processing of one data object. It can be set to zero if the output is an ordinary file.
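The writer side of this behavior can be sketched as follows. This is a hypothetical Python illustration, not db2pfstream itself. It assumes each block already ends with its __EOF__ line and that __EOI__ is written once at the end, as in the FILES example.

```python
import time
from typing import Iterable


def write_pfstream(path: str, blocks: Iterable[str], sleep_time: float) -> None:
    """Write pf blocks to path (a fifo or ordinary file), then sleep.

    Holding the write end open for sleep_time gives a reader that opens,
    reads one block, and closes a chance to reopen the fifo and drain the
    last block before the writer closes its end.
    """
    with open(path, "w") as out:       # on a fifo, blocks until a reader opens
        for block in blocks:
            out.write(block)
            out.flush()                # push each block through promptly
        out.write("__EOI__\n")         # assumed end-of-input marker
        out.flush()
        time.sleep(sleep_time)         # zero is fine for an ordinary file
```

For an ordinary output file, write_pfstream("out.pf", blocks, 0) returns as soon as the data are written.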

SEE ALSO

pfstream2db(1), pfstream(3), pfstream(5)

BUGS AND CAVEATS

The sleep_time parameter is unquestionably a kludge. It is an inelegant workaround for a problem with fifos when downstream processes operate in feeding mode (i.e. open, read one block, close). It is necessary because if db2pfstream closes the fifo while the other end does not have it open for reading, the last block of data is dropped. This could probably be done better using the bidirectional communication capabilities of named pipes described in streamio(7).

The interaction of the dbprocess_list, ensemble_keys, and group_keys parameters is complex, and their consistency perhaps should be enforced by the program. As it stands, it is up to the user to understand the database and the grouping process well enough to guarantee that it all works correctly. In practice this should not be a serious burden, because stock pf files should be built for any application intended to receive a pfstream.

AUTHOR

Gary L. Pavlis, Indiana University (pavlis@indiana.edu)
Antelope User Group Contributed Software