Antelope Release 5.5 Linux 2.6.32-220.el6.x86_64 2015-04-21

 

NAME

bulletin2orb, bulletin.pm - collect catalog information from remote institutions and place it in /pf/orb2dbt packets

SYNOPSIS

bulletin2orb [-v] [-V] [-1] [-p parameter_file] [-s seconds] orb

SUPPORT


Contributed code: NO BRTT support.
THIS PIECE OF SOFTWARE WAS CONTRIBUTED BY THE ANTELOPE USER COMMUNITY. BRTT DISCLAIMS ALL OWNERSHIP, LIABILITY, AND SUPPORT FOR THIS PIECE OF SOFTWARE.

FOR HELP WITH THIS PIECE OF SOFTWARE, PLEASE CONTACT THE CONTRIBUTING AUTHOR.

DESCRIPTION

Background

Association of events with externally published bulletins is a necessary part of the analyst review process with dbloc2(1) or real-time event processing using dborigin2orb(1). Whether or not this external association happens is often limited by the ability of the analyst/operator to write a parser to collect these catalogs from other network operators via a timely automated routine. The distribution format and mechanism vary from institution to institution. Examples of distribution methods include: a plain text file available via http, an html-augmented file available via http, XML files, a file retrievable via ftp, RSS feeds, KML mappable files, finger quake (mostly deprecated), etc. Methods supported by bulletin2orb(1) and the associated bulletin.pm are discussed in the Bulletin Collection Types section. Formats of the web pages or retrieved files vary widely and are discussed in the Supported Bulletins and File Formats sections.

Prior to the creation of bulletin2orb and the corresponding module bulletin.pm, I had to maintain individual scripts for collection of each external bulletin. Without any modules in place, each script recreated techniques to collect the bulletin, parse information out of it, and place that information in the appropriate css3.0 tables. This led to redundant code that was hard to maintain. Furthermore, sharing those scripts outside of our group was problematic as they often had to be tweaked yearly or more often as URLs, ftp locations, or html styles changed. A more reasonable approach to maintaining the scripts and sharing the information gleaned from them (i.e. the css3.0 origin tables) was needed. The need for ease of maintenance led to the development of the more generic bulletin2orb script and associated bulletin.pm routines along with the service of providing a single access point for external bulletin information.

The approach taken with bulletin2orb is to have a single data center, currently the ANF (Array Network Facility) at UCSD, maintain all of the parsing scripts in bulletin.pm. That same data center runs various instances of the bulletin2orb bulletin collection script and serves the data to external users via /pf/$BULL/orb2dbt packets. External clients collect whichever bulletins they are interested in by using orb2orb with a match statement, and write individual bulletins out to local disk via orb2dbt; an example follows. bulletin2orb is a script best used in-house at the ANF, but has been shared with the community as a courtesy.
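For example, a client interested only in the NEIC bulletins might run something like the following pair of commands; the orb names and database name here are placeholders for your installation, not the actual ANF export orb name:

% orb2orb -m '/pf/NEIC.*' anfexport:bulletins localhost:myorb
% orb2dbt localhost:myorb mybulletindb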

For information on how to collect bulletins from the orb maintained at the ANF, please review the documentation provided with the bulletins rtdemo(1). You can adjust the provided demo so that only bulletins of interest to your specific project are collected. You should not have to run the bulletin2orb program!

General Methodology

The procedural flow for bulletin2orb is that a specific method is chosen that determines how the bulletin is to be collected. Depending upon the format of the returned data (i.e. single lines of text containing all information, multi-line text, downloaded file, etc.), a parser and, for some bulletins, an extractor must also be specified. The bulletins to be collected by bulletin2orb are selected via bulletin2orb.pf. The script runs in daemon mode (unless the -1 option is selected) and processes each bulletin collection routine sequentially. The next pass over all bulletins begins after the program sleeps for -s seconds. For the non-searchable, non-ftp'd, rapidly supplied bulletins, checks are made against the previous collection run to avoid sending duplicate /pf/$BULL/orb2dbt packets into the orb. However, you will see duplicate information sent for any bulletin that is returned via the ftp, dbsubset, or search* methods. Duplication also occurs if the bulletin2orb process is restarted. At the ANF, multiple instances of bulletin2orb are running, with update intervals of 10 minutes, 60 minutes, 24 hours, and monthly. Running separate instances at appropriate intervals is recommended so that no undue load of excessive data requests is placed on the regional network servers. A minimal sketch of the collection loop follows.
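The sketch below is for orientation only and is not the actual bulletin2orb code; collect_bulletin and send_packets are hypothetical stand-ins for the bulletin.pm routines and the packet-building code, and the option handling is abbreviated.

    use strict;
    use Datascope;      # pfget

    # hypothetical sketch of the daemon loop described above
    my $sleeptime   = 600;    # -s seconds between passes
    my $single_pass = 0;      # set by the -1 option
    my $bulls = pfget("bulletin2orb", "bulletins");   # ref to the bulletins &Arr

    while (1) {
        foreach my $bull (sort keys %$bulls) {
            my $task    = $bulls->{$bull};            # method, parser, extractor, ...
            my @origins = collect_bulletin($task);    # dispatch on $task->{method}
            send_packets($bull, @origins);            # /pf/$src/orb2dbt packets
        }
        last if $single_pass;                         # -1: run once, then exit
        sleep($sleeptime);
    }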

Bulletin Collection Types - aka method

The perl module bulletin.pm was created to modularize the different techniques used to collect bulletin information from external sources. If additional methods of collection are required (i.e. RSS feed, KML files, etc.) or new data formats for collected bulletin information are discovered, new subroutines will need to be developed and added to bulletin.pm. The currently supported types of collection methods are described below. The method selected for a particular bulletin is specified in the method parameter value of bulletin2orb.pf for each bulletin collection task. DO NOT change these unless you are certain you know what you are doing. Only the methods listed below are currently supported; specifying any other method in the pf file causes the script to die. The actual subroutine name in bulletin.pm is collect_method, where method is the value chosen in the pf (for example, the ftp method is handled by collect_ftp); a sketch of this dispatch follows.
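As an illustration of that naming convention, a collection routine could be dispatched by name roughly as follows; the mechanics here are a sketch under that assumption, not the module's actual code.

    # hypothetical dispatch on the method parameter from bulletin2orb.pf
    my $method  = $task->{method};                   # e.g. "ftp", "htmltagged"
    my $subname = "bulletin::collect_" . $method;    # e.g. bulletin::collect_ftp

    no strict 'refs';
    die("unsupported method: $method\n") unless defined &{$subname};
    my @lines = &{$subname}($task);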

Supported Bulletins

Many institutions have multiple methods for collection (i.e. a web page with rapid solutions, a searchable bulletin, and/or a downloadable ftp file). You may or may not want to collect all types of bulletins from a regional network. My goal was to produce parsers and collect information from a wide variety of bulletins in a manner that is sustainable and able to expand as new bulletins are discovered. The end-user can pick and choose the best solution from the origin table generated by orb2dbt(1). The table below shows the responsible institution or network, whether or not non-reviewed events may be included (if known), how frequently the bulletin is updated by the responsible data center, and the method, parser, and extractor.

Inst/Network     NonRvwdEvts?  UpdateInt     method       parser        extractor
NEIC-CUBE        maybe         Rapid         text         qedCUBE       -
NEIC-QED         maybe         Rapid         htmltagged   recenteqs     neicWWW
EMSC             maybe         Rapid         htmltagged   EMSC          EMSC
AEIC             maybe         Rapid         htmlnotags   finger        finger
CERI-NMSN        maybe         Rapid         htmltagged   recenteqs     qedWWW
Lamont-LCSN      maybe         Rapid         htmltagged   recenteqs     qedWWW
MTECH-auto       yes           Rapid         htmlnotags   mtechAUTO     mtechAUTO
MTECH-rev        no            Rapid         htmlnotags   recenteqs     simsum
MTECH-file       no            Never         file         mtech_hypo71  mtech_hypo71
PNSN             maybe         Rapid         htmltagged   recenteqs     qedWWW
NRCAN-PGC        maybe         Rapid         htmlnotags   recenteqs     simsum
NRCAN-GSC        maybe         Rapid         htmlnotags   recenteqs     simsum
UNR-NBE          yes           Rapid         htmlnotags   NBEwww        NBEwww
UUSS-combo       maybe         Rapid         htmltagged   recenteqs     qedWWW
UUSS-daily       maybe         Rapid         htmlnotags   recenteqs     simsum
YELL-daily       maybe         Rapid         htmlnotags   recenteqs     simsum
NEIC-PDE         no            Monthly       ftp          ehdf          -
NEIC-QEDweekly   no            D/Weekly      ftp          ehdf          -
PNSN-rev         no            Quarterly(?)  ftp          uwcard        -
UUSS-lists       no            Quarterly(?)  htmlnotags   uussLIST      uussLIST
YELL-lists       no            Quarterly(?)  htmlnotags   uussLIST      uussLIST
AEIC             maybe         Quarterly(?)  search_qf    AEIC          AEIC
GSmines          no            Daily(?)      ftp          mchdr         mchdr
GSmines-monthly  no            Weekly(?)     ftp          mchdr         mchdr
ANF              maybe         Rapid**       dbsubset     dbsubset      -
BC-NESN          maybe         Rapid**       search_post  NESN          NESN
NBE-search       maybe         Rapid**       search_post  NBEsearch     NBEsearch
NCSN-search      maybe         Rapid**       search_post  HYPO2000      HYPO2000
SCSN-search      maybe         Rapid**       search_qf    HYPO2000      HYPO2000


Some bulletins are updated rapidly, within a few minutes to hours of the event origin time. Bulletins of this type which can be collected via bulletin2orb include: NEIC-CUBE, NEIC-QED, EMSC, UUSS (combo and daily), YELL-daily, Lamont-LCSN, CERI-NMSN, PNSN, NRCAN (PGC and GSC), MTECH (automatic and reviewed), and UNR-NBE. These rapidly updating bulletins can be requested fairly often without putting a high processing load on either the remote distributor or the host running bulletin2orb.

Some bulletins are updated every few hours or daily, include less frequently updated reviewed solutions, or require submissions to a search page. These bulletins should not have a collection attempt made every 600 seconds. Instead you should run a separate instance of bulletin2orb which overrides the default sleep window with a larger value for -s, or run a multiple-times-per-day (or per-week) cron job. Bulletins that you might wish to attempt to collect multiple times per day include: ANF, BC-NESN, NBE-search, NCSN-search, and SCSN-search. Bulletins that you might wish to collect daily include: AEIC, NEIC-QEDweekly, and GSmines. Bulletins that you might wish to attempt to collect a few times per month include: NEIC-PDE, PNSN-rev, UUSS-lists, and YELL-lists. You might also consider recollecting many of the searchable bulletins once a month with a longer time window supplied to the search. In this way you might catch events that have been reprocessed or added after analyst review of back-logged data.

NOTE: There will be duplicate origin data written to the export orb for many of these bulletins. This is especially true for any of the searchable bulletins as each successive search returns the same data. Attempts have been made to limit the duplication of event data for those bulletins that have a rapid update interval and are not results from a search. I expect that the end users' orb2dbt(1) process will deal with any duplication of data.

Supported File Formats

There are a multitude of different file formats that have been developed by individual institutions. Some are more consistent and better documented than others. Listed below are a few that currently have parsers written for them with links to documentation and pickup location when available. The parsers are in bulletin.pm and appear as parse_parser.

The CUBE format provided by the NEIC combines solutions from many of the ANSS regional networks. The method used for this bulletin is text; the parser is qedCUBE (a trivial screen for CUBE event records is sketched after the URLs below). Descriptions of the format can be found here:

http://earthquake.usgs.gov/regional/neic/qdds/cube_fmt.php
ftp://ehzftp.wr.usgs.gov/QDM/docs/CUBE.html
http://neic.cr.usgs.gov/neis/qdds/cube_fmt.html

The actual URL for the bulletin is:

http://earthquake.usgs.gov/eqcenter/catalogs/merged_catalog.cube
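Event records in the retrieved file begin with an "E" (compare the linestart comments in the parameter file below), so a first-pass screen over the downloaded lines can be as simple as this sketch; it is not the qedCUBE parser itself:

    # keep only CUBE event ("E...") records from the retrieved file
    my @events = grep { /^E/ } @cube_lines;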

Many of the regional networks provide a web page that has a map along with a list of earthquakes. There is no formal format description that I have found. In general, the listings seem to have a magnitude (no magtype), a date, lat, lon, depth, and location. The majority of the ANSS networks include links in their earthquake lists. Those that have these embedded links use the htmltagged method with the recenteqs parser and the qedWWW or neicWWW extractor. This type of bulletin collection needs to have the time zone TZ, extractor, and defaultmagtype specified in the parameter file; a sketch of the local-time conversion implied by TZ follows the URL list below. Some regional networks provide a web page map and listing which has no html links. These bulletins use the htmlnotags method and either recenteqs or an institution-specific parser (i.e. mtechAUTO, NBEwww, uussLIST). An extractor needs to be specified and varies depending on the institution which produces the bulletin. The actual URLs for the bulletins:


http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_all.php
http://www.seis.utah.edu/req2webdir/recenteqs/Quakes/quakes0.html
http://www.ldeo.columbia.edu/LCSN/recenteqs/Quakes/quakes0.html
http://folkworm.ceri.memphis.edu/recenteqs/Quakes/quakes0.html
http://www.pnsn.org/recenteqs/Quakes/quakes0.htm
http://mbmgquake.mtech.edu/earthworm/reviewed_locations.html
http://mbmgquake.mtech.edu/earthworm/automatic_locations.html
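Because these pages report times in local zones, the TZ parameter has to drive a conversion back to a UTC epoch. Below is a minimal sketch of that conversion; the "YYYY/MM/DD HH:MM:SS" timestamp format is an assumption for the example only, as the actual pages vary.

    use POSIX qw(mktime tzset);

    # convert a recenteqs-style local timestamp to a UTC epoch,
    # interpreting it in the zone given by the TZ pf parameter
    sub local2epoch {
        my ($str, $tz) = @_;      # e.g. ("2015/01/15 10:23:45", "US/Pacific")
        local $ENV{TZ} = $tz;
        tzset();
        my ($Y, $M, $D, $h, $m, $s) =
            $str =~ m{(\d{4})/(\d{2})/(\d{2})\s+(\d{2}):(\d{2}):(\d{2})};
        return mktime($s, $m, $h, $D, $M - 1, $Y - 1900, 0, 0, -1);
    }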

The EMSC format provided by the European-Mediterranean Seismological Centre reports solutions from many of the regional and national networks found in Europe, around the Mediterranean, western Asia, and northern Africa along with solutions from the NEIC. The method used for this bulletin is htmltagged. The parser used for this bulletin is EMSC. I have found no formal write-up of the format.

The actual URL for the bulletin:


http://www.emsc-csem.org/index.php?page=current&sub=list

The format handled by the HYPO2000 parser is a well documented format used by the SCSN and NCSN. This is one of the output format options available from the searchable bulletins. Unfortunately, there are slight differences between how Berkeley and Caltech use the format, but both should be able to use HYPO2000 for the parser and extractor. The SCSN bulletin collection uses the search_qf method while the NCSN bulletin collection uses the search_post method; a sketch of the search_post idea follows the URLs below. Format descriptions can be found here:

http://www.data.scec.org/catalog_search/docs/2000hyposum_format.txt
http://www.ncedc.org/ncedc/documents.html#catalog_formats

The actual URL for the bulletin search program is:

http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
http://www.ncedc.org/cgi-bin/catalog-search2.pl
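For orientation, the search_post style of collection amounts to submitting the search form and reading back the catalog text. A minimal sketch follows; the form field names and time strings are placeholders, since each institution's search page defines its own.

    use LWP::UserAgent;

    # submit a catalog search form; field names below are placeholders
    my ($mintime, $maxtime) = ("2015/01/01,00:00:00", "2015/01/08,00:00:00");
    my $ua  = LWP::UserAgent->new(timeout => 60);
    my $res = $ua->post("http://www.ncedc.org/cgi-bin/catalog-search2.pl",
        { format => "hypo2000", start_time => $mintime, end_time => $maxtime });
    die("search failed: ", $res->status_line, "\n") unless $res->is_success;
    my @lines = split(/\n/, $res->decoded_content);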

The NESN, run by Boston College, has its own unique format for the earthquake bulletin information that is displayed. In general, the data is tab separated and contains three possible magnitude types: Mn, Mc, and Ml. Both a searchable bulletin and a list (with no tags) are available, but the searchable mechanism is the preferred collection method, since the no-search option returns all events since 1990. The searchable bulletin uses the search_post method and NESN for the parser and extractor.

The actual URL for the bulletin search program is:
http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl

Although I thought it was deprecated and not in use anywhere, I managed to find a few places where you could still get a web based listing in the old finger format. A parser and extractor have been written and tested successfully for the AEIC instance. Note that this is only applicable to web accessible plain text versions of a finger listing that report event times in UTC.

The actual URL for the finger listing for which this extractor works is:
http://www.aeic.alaska.edu/cgi-bin/quake_finger.pl

The AEIC, run by the University of Alaska, Fairbanks, has its own unique format for the earthquake bulletin information that is displayed. The data returned from the searchable bulletin is white space separated, and contains three possible magnitude types, mb, ML, and MS. The searchable bulletin uses the search_qf method and AEIC for the parser and extractor.

The actual URL for the bulletin search program is:
http://www.aeic.alaska.edu/cgi-bin/db2catalog.pl

The Nevada Broadcast of Earthquakes (NBE) has a tagged web format similar to that used in the recenteqs display of other regional networks. No specific format description is available. However, a different parser and extractor are used: NBEwww.

The actual URL for the bulletin is:
http://www.seismo.unr.edu/Catalog/nbe.html

Also available from UNR is a searchable catalog. Like the NESN and NCSN bulletins, the search_post method is used. The parser and extractor are unique to this institution and are currently called NBEsearch.

The actual URL for the bulletin search program is:
http://www.seismo.unr.edu/cgi-bin/catalog-search

The ehdf format is a well documented format used by the NEIC/USGS. Format descriptions can be found here:

ftp://hazards.cr.usgs.gov/weekly/ehdf.txt

The weekly QED bulletin is available via the ftp method using the ehdf parser.

The ftp location for the QED weekly files is:
ftp://hazards.cr.usgs.gov/weekly

However, in the pf file, the leading ftp:// is excluded from the ftphost parameter, and /weekly is the value used for ftpdir. The ftpmatch should be set to ehdf.*. A sketch of how these ftp parameters drive retrieval follows.
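Below is a minimal sketch of the ftp retrieval implied by the ftphost, ftpdir, ftpmatch, account, and localdir parameters, using Net::FTP. The logic is illustrative only, not the bulletin.pm code.

    use Net::FTP;

    # fetch files matching ftpmatch from ftphost/ftpdir into localdir
    my $ftp = Net::FTP->new($task->{ftphost})
        or die("cannot connect to $task->{ftphost}\n");
    $ftp->login("anonymous", $task->{account}) or die("login failed\n");
    $ftp->cwd($task->{ftpdir})                 or die("cwd failed\n");
    foreach my $file (grep { /$task->{ftpmatch}/ } $ftp->ls()) {
        $ftp->get($file, "$task->{localdir}/$file")
            or warn("get $file failed\n");
    }
    $ftp->quit();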

The UW card format is a well documented format used by the PNSN of UW. Format descriptions can be found here:

ftp://ftp.ess.washington.edu/pub/seis_net/README.cardformat
http://www.pnsn.org/INFO_GENERAL/PNSN_QUARTERLY_EQ_CATALOG_KEY.html

The PNSN reviewed bulletin is available via the ftp method using the uwcard parser.

The ftp location for the PNSN reviewed files is:
ftp://ftp.ess.washington.edu/pub/seis_net

However, in the pf file, the leading ftp:// is excluded in the ftphost parameter and pub/seis_net is the value used for the ftpdir. The ftpmatch should be set to loc.[0-9].*.

The mchedr format is a well documented format used by the NEIC/USGS. It is another example of a card format where the first one or two characters of the line determine what type of information follows. This parser was written to collect the USGS mine explosion bulletin. The QED weekly bulletin could also be collected in this format, but the ehdf format is preferred, mostly for historic reasons. For the USGS mine explosion bulletin, the mchdr parser and mchdr extractor are used. Format descriptions can be found here:

ftp://hazards.cr.usgs.gov/weekly/mchedr.txt

The ftp location for the recent mine explosion files is:

ftp://hazards.cr.usgs.gov/explosions/mchedrexp.dat

However, in the pf file, the leading ftp:// is excluded in the ftphost parameter and explosions is the value used for the ftpdir. The ftpmatch should be set to mchedrexp.dat. Historic information, from 1997 forward, is available from:

ftp://hazards.cr.usgs.gov/mineblast/

OPTIONS

-v
	Verbose.

-V
	Very verbose.

-1
	Run a single collection pass and exit, rather than running as a daemon.

-p parameter_file
	Use parameter_file rather than the default bulletin2orb.pf.

-s seconds
	Number of seconds to sleep between collection passes when running as a daemon. The default is 600 seconds.

ENVIRONMENT

Must have sourced $ANTELOPE/setup.csh or $ANTELOPE/setup.sh. Your version of Antelope needs to be 4.11 (fully patched) or higher.

PARAMETER FILE

The bulletin2orb.pf parameter file contains a series of bulletins (in an associative array). The bulletins array has one or more bulletin specifications. Each bulletin specification consists of an associative array with the bulletin name (user specified) as the associative array key. Bulletin parameters can vary depending on the method and/or the parser. Changes to the method, parser, extractor, url, ftphost, ftpdir, ftpmatch, or linestart are NOT RECOMMENDED!

In general, only change the src, localdir, and ndays. Leave the remainder as is or unexpected results may occur.



#
# bulletin2orb.pf
#
# Not all of these should be run under a
# single instance of bulletin2orb.
# Many should be run with a longer default
# interval between collection attempts
#
# --J.Eakins  10/7/2009

bulletins &Arr{

#
# These bulletins can be collected using the default
# settings (-s 600).
#
#  Create a bulletin2orb_rapid.pf between START/END
#

# START

  AEIC-finger &Arr{	# quick, but not so accurate in time/space solutions from AEIC
	method		htmlnotags
	parser		finger
	extractor	finger
	src		AEIC/finger	# srcname will be /pf/$src/orb2dbt
	auth		AEIC      	# auth == $auth
	url		http://www.aeic.alaska.edu/cgi-bin/quake_finger.pl
	defaultmagtype	ml		#
  }

  EMSC-www &Arr{
        method          htmltagged              # text file available via http
        extractor       EMSC
        parser          EMSC
        src             EMSC	     # srcname will be /pf/$src/orb2dbt
        auth            EMSC            #
        url             http://www.emsc-csem.org/index.php?page=current&sub=list
  }

  MTECH-auto    &Arr{	# automatic solutions from Montana Tech
	method		htmlnotags
	parser		mtechAUTO
	extractor	mtechAUTO
	src		MTECH/A  	# srcname will be /pf/$src/orb2dbt
	auth		MTECH		#  auth == $auth . "$evid"
	url		http://mbmgquake.mtech.edu/earthworm/automatic_locations.html
	defaultmagtype	md		# not coded yet
  }

  MTECH-rev     &Arr{	# reviewed solutions from Montana Tech
	method		htmlnotags
	parser		recenteqs
	extractor	simsum
	src		MTECH/R		# srcname will be /pf/$src/orb2dbt
	auth		MTECH_R		# auth == $auth
	url		http://mbmgquake.mtech.edu/earthworm/reviewed_locations.html
	defaultmagtype	ml		# not coded yet
  }

  UNR-NBE    	&Arr{	# both prelim and reviewed solutions from the Nevada Broadcast of Earthquakes
	method		htmlnotags
	extractor	NBEwww		# extract_nbeWWW
	parser		NBEwww		# parse_nbeWWW
	src		NBE		# srcname will be /pf/$src/orb2dbt
	auth		NBE 		# auth == $auth	. "$NBE_evid"
	url		http://www.seismo.unr.edu/Catalog/nbe.html
	defaultmagtype	ml		# not coded yet, "local magnitudes"
  }

  NEIC-CUBE &Arr{	# combined quick bulletin from NEIC.  Includes solutions from multiple regional networks
	method		text		# text file available via http
	parser		qedCUBE
	src		NEIC/CUBE	# srcname will be /pf/$src/orb2dbt
	auth		USGS		# qedCUBE will actually have auth == "USGS:$contributor"
	url		http://earthquake.usgs.gov/eqcenter/catalogs/merged_catalog.cube
  }

  NEIC-QED &Arr{	# Quick Earthquake determination list from NEIC/USGS
	method		htmltagged 	# WWW screen scrape w/ tags
	parser		recenteqs
	extractor	neicWWW
	TZ		UTC
	src		NEIC/qed	# srcname will be /pf/$src/orb2dbt
	auth		QED 		# auth == $auth
	url		http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_all.php
	defaultmagtype	M 		#  md if mag< 1.9, ml if mag >=1.9, ???
  }

  Lamont-LCSN &Arr{	# Lamont Cooperative Seismo Network (Eastern US)
	method		htmltagged 	# WWW screen scrape w/ tags
	parser		recenteqs
	extractor	qedWWW
	TZ		US/Eastern
	src		LCSN		# srcname will be /pf/$src/orb2dbt
	auth		LCSN		# auth == $auth
	url		http://www.ldeo.columbia.edu/LCSN/recenteqs/Quakes/quakes0.html
	defaultmagtype	md		#  md if mag< 1.9, ml if mag >=1.9, ???
  }

  CERI-NMSN   &Arr{
	method		htmltagged
	parser		recenteqs
	extractor	qedWWW
	TZ		US/Central
	src		NMSN		# srcname will be /pf/$src/orb2dbt
	auth		CERI		# auth == $auth
	url		http://folkworm.ceri.memphis.edu/recenteqs/Quakes/quakes0.html
	defaultmagtype	md		#  md if mag< 1.9, ml if mag >=1.9, ???
  }

  PNSN &Arr{
	method		htmltagged
	parser		recenteqs
	extractor	qedWWW
	TZ		US/Pacific
	src		PNSN/A	 	# srcname will be /pf/$src/orb2dbt
	auth		PNSN_A		# auth == $auth
	url		http://www.pnsn.org/recenteqs/Quakes/quakes0.htm
	defaultmagtype	md		#  md if mag< 1.9, ml if mag >=1.9, ???
  }

  UUSS-combo   &Arr{
	method		htmltagged
	parser		recenteqs
	extractor	qedWWW
	TZ		US/Mountain
	src		UUSS/combo 	# srcname will be /pf/$src/orb2dbt
	auth		UUSS		# auth == $auth
	url		http://www.seis.utah.edu/req2webdir/recenteqs/Quakes/quakes0.html
	defaultmagtype	md		# md if mag< 1.9, ml if mag >=1.9, ???
  }

  NRCAN-PGC 	&Arr{
	method		htmlnotags
	parser		recenteqs
	extractor	simsum
	src		NRCAN/PGC	# srcname will be /pf/$src/orb2dbt
	auth		PGC  	 	# auth == $auth
### If you want more events, try the year long grab
###http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?maptype=1y&tpl_region=west
	url	 	http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?tpl_region=west
	defaultmagtype	M 		# not coded yet
  }

  NRCAN-GSC 	&Arr{
	method		htmlnotags
	parser		recenteqs
	extractor	simsum
	src		NRCAN/GSC	# srcname will be /pf/$src/orb2dbt
	auth		GSC  		# auth == $auth
	url	 	http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?tpl_region=east
	defaultmagtype	M 		# not coded yet
  }

# END

#
# these should probably not be checked every 600 seconds
# I suggest putting them in a separate pf and running a bulletin2orb
# that has a -s of 3600*3 (every 3 hours)
#

#
#  Create a bulletin2orb_multi.pf between START/END
#
# START

# searchable catalogs need ndays defined

  AEIC-search      &Arr{
	method		search_qf
	parser		AEIC	# calls will be to postqf_AEIC, extract_AEIC, parse_AEIC
	extractor	AEIC	# AEIC
	src		AEIC	# srcname will be /pf/$src/orb2dbt
	auth		AEIC		# auth == $auth
	url		http://www.aeic.alaska.edu/cgi-bin/db2catalog.pl
	ndays		90		# used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
  }

  SCSN-search      &Arr{
	method		search_qf
	parser		HYPO2000	# calls will be to postqf_HYPO2000, extract_HYPO2000, parse_HYPO2000
	extractor	HYPO2000	# extract_HYPO2000
	src		SCSN		# srcname will be /pf/$src/orb2dbt
	auth		SCSN		# auth == $auth	. "$evid"
	url		http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
	ndays		7 		# used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
  }

   NCSN-search    &Arr{
 	method		search_post
 	parser		HYPO2000
 	extractor	HYPO2000	# extract_HYPO2000
 	src		NCSN	  	# srcname will be /pf/$src/orb2dbt
 	auth		NCSN		# auth == $auth	. "$ncsn_evid" . "$rev_info"
 	url		http://www.ncedc.org/cgi-bin/catalog-search2.pl
 	ndays		7 		# used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
   }

   BC-NESN       &Arr{
 	method		search_post
 	parser		NESN
 	extractor	NESN    	# extract_NESN
 	src		NESN		# srcname will be /pf/$src/orb2dbt
 	auth		NESN		# auth == $auth
 	url		http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl
	ndays		31 		# used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
   }

  NBE-search  	&Arr{
	method		search_post
	parser		NBEsearch
	extractor	NBEsearch	# extract_NBE (different from extract_NBEwww)
	src		NBE		# srcname will be /pf/$src/orb2dbt
	auth		UNR_NBE		# auth == $auth
	url		http://www.seismo.unr.edu/cgi-bin/catalog-search
	ndays		7  		# used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
  }

 ANF_rt &Arr{
        method          dbsubset
        parser          dbsubset
        src             ANF             # srcname will be /pf/$src/orb2dbt
        auth            ANF             # auth will be filled in with origin.auth after authsubset
        db              /path/to/anf/rt/db/usarray
        authsubset      auth=~/ANF.*/
        ndays           7               # used to set min/max time for search. Search range w/o enddate:  now-86400*ndays::now
   }

 ANF_arch       &Arr{
        method          dbsubset
        parser          dbsubset
        src             ANF             # srcname will be /pf/$src/orb2dbt
        auth            ANF             # auth will be filled in with origin.auth after authsubset
        db              /path/to/anf/archived/db/usarray
        authsubset      auth=~/ANF.*/
        enddate         2/1/2008        # used to set endtime for search.  Without enddate, endtime == now()
        ndays           31              # used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
   }

  UUSS-daily &Arr{	# utah daily updated solutions
	method		htmlnotags
	parser		recenteqs
	extractor	simsum
	TZ		UTC
	src		UUSS/utah	# srcname will be /pf/$src/orb2dbt
	auth		UUSS		# auth == $auth
	url		http://www.quake.utah.edu/ftp/DATA_REQUESTS/RECENT_EQS/utah.list
	defaultmagtype	md		# md if mag< 1.9, ml if mag >=1.9, ???
  }

  YELL-daily &Arr{	# yellowstone daily updated solutions
	method		htmlnotags
	parser		recenteqs
	extractor	simsum
	TZ		UTC
	src		UUSS/yellowstone	# srcname will be /pf/$src/orb2dbt
	auth		UUSS		# auth == $auth
	url		http://www.quake.utah.edu/ftp/DATA_REQUESTS/RECENT_EQS/yellowstone.list
	defaultmagtype	md		# md if mag< 1.9, ml if mag >=1.9, ???
  }

  NEIC-QEDweekly &Arr{	# NEIC's more reviewed QED solutions (not quite PDE quality)
	method		ftp
	parser		ehdf
	src		NEIC/qedw	# srcname will be /pf/$src/orb2dbt
	auth		QED_weekly	# auth == $auth
	ftphost		hazards.cr.usgs.gov	# remote host for ftp pickup
	ftpdir		/weekly/	# remote directory where files are kept
	ftpmatch	ehdf.* 		# match string for remote ftp files
	linestart	GS  		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/qed_weekly	# local directory where retrieved files are kept
  }

  PNSN-rev &Arr{	# reviewed solutions by UW/PNSN
	method		ftp
	parser		uwcard
	src		PNSN/R  	# srcname will be /pf/$src/orb2dbt
	auth		PNSN_R  	# auth == $auth
	ftphost		ftp.ess.washington.edu	# remote host for ftp pickup
	ftpdir		pub/seis_net/	# remote directory where files are kept
	ftpmatch	loc.[0-9].* 		# match string for remote ftp files
	linestart	A   		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	linelength	40  		# reject lines that are shorter than linelength
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/pnsn_reviewed # local directory where retrieved files are kept
	defaultmagtype	md		#
  }

  GSmines   &Arr{
	method		ftp
	parser		mchdr
	extractor	mchdr
	src		NEIC/mines	# srcname will be /pf/$src/orb2dbt
	auth		NEIC_mines	# auth == $auth
	ftphost		hazards.cr.usgs.gov	# remote host for ftp pickup
	ftpdir		explosions	# remote directory where files are kept
	ftpmatch	mchedrexp.dat 		# match string for remote ftp files
	linestart	HY|E  		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/current_mines # local directory where retrieved files are kept
  }

  GSmines-monthly &Arr{
	method		ftp
	parser		mchdr
	extractor	mchdr
	src		NEIC/mines	# srcname will be /pf/$src/orb2dbt
	auth		NEIC_mines	# auth == $auth
	ftphost		hazards.cr.usgs.gov	# remote host for ftp pickup
	ftpdir		mineblast	# remote directory where files are kept
	ftpmatch	ex.dat 		# match string for remote ftp files
	linestart	HY|E  		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/monthly_mines # local directory where retrieved files are kept
  }

# END

#
# These should probably not be run from a daemonized bulletin2orb.
# I suggest putting them in a separate pf and running bulletin2orb
# as a monthly cronjob with the -1 option used
#

#
#  Create a bulletin2orb_monthly.pf between START/END
#
# START

  PNSN-rev &Arr{
	method		ftp
	parser		uwcard
	src		PNSN/R    	# srcname will be /pf/$src/orb2dbt
	auth		PNSN_R    	# auth == $auth
	ftphost		ftp.ess.washington.edu	# remote host for ftp pickup
	ftpdir		pub/seis_net/	# remote directory where files are kept
	ftpmatch	loc.[0-9].* 		# match string for remote ftp files
	linestart	A   		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	linelength	40  		# reject lines that are shorter than linelength
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/pnsn_reviewed # local directory where retrieved files are kept
	defaultmagtype	md		#
  }

  UUSS-lists &Arr{	# Reviewed(?) Utah region events
	method		htmlnotags
	parser		uussLIST
	extractor	uussLIST
	src		UUSS/utah	# srcname will be /pf/$src/orb2dbt
	auth		UUSS      	# auth == $auth
	url		http://www.quake.utah.edu/EQCENTER/LISTINGS/UTAH/equtah_2009
  }

  YELL-lists &Arr{	# Reviewed(?) Yellowstone region events
	method		htmlnotags
	parser		uussLIST
	extractor	uussLIST
	src		UUSS/yellowstone	# srcname will be /pf/$src/orb2dbt
	auth		UUSS      	# auth == $auth
	url		http://www.quake.utah.edu/EQCENTER/LISTINGS/OTHER/yell_2009
  }

# END
#

#
#  Create a bulletin2orb_archived.pf between START/END
#
# START
#
# collect the databases you have locally and push to orb for downstream collection
# should probably run infrequently
#

  2008PDE &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/NEIC/2008PDE		# srcname will be /pf/$src/orb2dbt
	auth		dummy		# auth will be filled in with origin.auth after authsubset
	db 		archived_catalogs/qed/qed_2008
	authsubset	auth=~/.*/
  }

  2009PDE &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/NEIC/2009PDE		# srcname will be /pf/$src/orb2dbt
	auth		dummy		# auth will be filled in with origin.auth after authsubset
	db 		archived_catalogs/pde/pde_2009
	authsubset	auth=~/.*/
  }

  2009PNSN &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/PNSN/2009		# srcname will be /pf/$src/orb2dbt
	auth		dummy		# auth will be filled in with origin.auth after authsubset
	db 		archived_catalogs/pnsn/pnsn_2009
	authsubset	auth=~/.*/
  }

  2009UUSS &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/UUSS/2009	# srcname will be /pf/$src/orb2dbt
	auth		dummy     	# auth == $auth
	db 		archived_catalogs/utah/utah_2009
	authsubset	auth=~/.*/
  }

  2009SCSN      &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/SCSN/2009	# srcname will be /pf/$src/orb2dbt
	auth		dummy     	# auth == $auth
	db 		archived_catalogs/cit/cit_2009
	authsubset	auth=~/.*/
  }

  2009NCSN    &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/NCSN/2009	# srcname will be /pf/$src/orb2dbt
	auth		dummy     	# auth == $auth
	db 		archived_catalogs/ncec/ncec_2009
	authsubset	auth=~/.*/
  }

  2009NESN       &Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/NESN/2009	# srcname will be /pf/$src/orb2dbt
	auth		dummy     	# auth == $auth
	db 		archived_catalogs/nesn/nesn_2009
	authsubset	auth=~/.*/
  }

  2009NBE  	&Arr{
	method		dbsubset
	parser		dbsubset
	src		archived/NBE/2009 	# srcname will be /pf/$src/orb2dbt
	auth		dummy     	# auth == $auth
	db 		archived_catalogs/unr/unr_2009
	authsubset	auth=~/.*/
  }

# END
#

#
#  Create a bulletin2orb_monthly.pf between START/END
#
# START
#
# these bulletins are either updated infrequently
# or take a longer time for requests to process and should probably
# be run as an infrequent cron job
#

  NEIC-PDE &Arr{
	method		ftp
	parser		ehdf
	src		NEIC/PDE	# srcname will be /pf/$src/orb2dbt
	auth		PDE		# auth == $auth
	ftphost		hazards.cr.usgs.gov	# remote host for ftp pickup
	ftpdir		/pde/	# remote directory where files are kept
	ftpmatch	ehdf2008.*|ehdf2009.* 		# match string for remote ftp files
	linestart	GS  		# match for start of event line ("GS" for ehdf, "E" for CUBE, etc.)
	account		jeakins@ucsd.edu	# email address for anonymous ftp
	localdir	savefiles/pde	# local directory where retrieved files are kept
  }

  SCSN-longsearch      &Arr{
	method		search_qf
	parser		HYPO2000	# calls will be to postqf_HYPO2000, extract_HYPO2000, parse_HYPO2000
	extractor	HYPO2000	# extract_HYPO2000
	src		SCSN		# srcname will be /pf/$src/orb2dbt
	auth		SCSN		# auth == $auth	. "$evid"
	url		http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
#       enddate         4/1/2008        # used to set endtime for search.  Without enddate, endtime == now()
	ndays		90		# used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
  }

   NCSN-longsearch    &Arr{
 	method		search_post
 	parser		HYPO2000
 	extractor	HYPO2000	# extract_HYPO2000
 	src		NCSN		# srcname will be /pf/$src/orb2dbt
 	auth		NCSN		# auth == $auth	. "$ncsn_evid" . "$rev_info"
 	url		http://www.ncedc.org/cgi-bin/catalog-search2.pl
 	ndays		90 		# used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
   }

   NESN-longsearch   &Arr{
 	method		search_post
 	parser		NESN
 	extractor	NESN    	# extract_NESN
 	src		NESN		# srcname will be /pf/$src/orb2dbt
 	auth		NESN		# auth == $auth
 	url		http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl
 	ndays		90 		# used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
   }

  NBE-longsearch  	&Arr{
	method		search_post
	parser		NBEsearch
	extractor	NBEsearch	# extract_NBE (different from extract_NBEwww)
	src		NBE		# srcname will be /pf/$src/orb2dbt
	auth		UNR_NBE		# auth == $auth
	url		http://www.seismo.unr.edu/cgi-bin/catalog-search
	ndays		180 		# used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
  }

 ANF-longsubset	&Arr{
	method		dbsubset
	parser		dbsubset
	src		ANF		# srcname will be /pf/$src/orb2dbt
	auth		ANF		# auth will be filled in with origin.auth after authsubset
	db 		/path/to/usarray/db/usarray
	authsubset	auth=~/ANF.*/
	ndays		90 		# used to set min/max time for search.  Search range w/o enddate:  now-86400*ndays::now
   }

# END
#
#
# This is an example of a single-use update for a non-daemonized bulletin2orb.
# In this case, the user has a single local file that needs to be converted.
#
# I suggest putting this collection in a separate pf and running bulletin2orb
# as a command line process with the -1 option used
#

#
#  Create a bulletin2orb_once.pf between START/END
#
# START

  MTECH-file    &Arr{   # file from mtech
        method          file
        parser          mtech_hypo71
        extractor       mtech_hypo71
        linestart       [0-9]
        src             flatfile        # srcname will be /pf/$src/orb2dbt
        auth            MTECH           #  auth == $auth . "$evid"
        file            /file/location/2004-2009.qks
  }
# END
}


Searchable bulletins and those using the dbsubset method need to have an ndays parameter to define the time range of the search.

Searchable bulletins and those using the dbsubset method can have an enddate parameter to define the ending time for the search. If not specified, the search looks for ndays of data prior to now().
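A sketch of the window computation described in the pf comments (now-86400*ndays::now), assuming the Datascope Perl interface:

    use Datascope;      # str2epoch, now

    # derive the search window from ndays and the optional enddate
    my $maxtime = defined $task->{enddate} ? str2epoch($task->{enddate}) : now();
    my $mintime = $maxtime - 86400 * $task->{ndays};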

Whether or not a particular bulletin needs an extractor specified depends on the format and method used to collect it. Don't change the defaults.

Bulletins using the ftp method need to have an ftphost, ftpdir, ftpmatch, linestart, account, and localdir specified.

EXAMPLE

See rtdemo(1) bulletins for methodology for collecting the bulletins from the server provided by the ANF.

For earthquake bulletins that are updated rapidly and require no search of the remote site:


% bulletin2orb -p pf/bulletin2orb_rapid $ORB 

For earthquake bulletins that are updated frequently but may require searches or other retrieval mechanisms that are CPU intensive (on the remote side), collect only once an hour:


% bulletin2orb -s 3600 -p pf/bulletin2orb_multi $ORB 

For earthquake bulletins that are updated less frequently, maybe once or twice a day, but contain multiple months of data, or otherwise don't need to be collected rapidly, collect once per day:


% bulletin2orb -s 86400 -p pf/bulletin2orb_daily $ORB 

For earthquake bulletins that are updated infrequently, maybe once or twice a month, or for re-collecting long stretches of data from searchable bulletins, collect via a cron job run monthly. Use the non-daemon mode of operation.


% bulletin2orb -1 -p pf/bulletin2orb_monthly $ORB 
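For the monthly case, a crontab entry along these lines could drive the non-daemon run; the Antelope path, pf path, and orb name are placeholders for your installation:

# run at 03:15 on the 1st of each month; adjust paths and orb for your site
15 3 1 * * /bin/csh -c "source /opt/antelope/5.5/setup.csh ; bulletin2orb -1 -p pf/bulletin2orb_monthly myhost:myorb"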

DIAGNOSTICS

If you select a method, parser, or extractor that is not defined in bulletin.pm, the script fails in unexpected ways.

Error messages like:

Use of uninitialized value $p in hash dereference at bin/bulletin2orb line 120.
Use of uninitialized value $m in hash dereference at bin/bulletin2orb line 121.
Use of uninitialized value $parsed_info{"or_time"} in string at bin/bulletin2orb line 349.
imply that critical information is not being retrieved from a bulletin collected via the ftp method. It is highly likely that the input you are trying to read contains a short line placed by the originating institution to signify a comment, a non-located event, or a deleted event (bulletin2orb has no provisions for dealing with deleted events). You can review the retrieved file to see what might be going on. To avoid the short lines in the input file, specify a linelength parameter in the appropriate bulletin collection task section of the parameter file. Lines shorter than linelength are rejected, as in the sketch below.
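The screen implied by linelength amounts to something like the following inside the line-reading loop; this is a sketch of the idea, not the script's actual code:

    # skip comment/unlocated/deleted-event stubs shorter than linelength
    next if defined $task->{linelength} && length($line) < $task->{linelength};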

SEE ALSO

mchedr2db(1)
pde2origin(1)
rtdemo(1)

BUGS AND CAVEATS

The most important caveat: Garbage In = Garbage Out.

I am still finding new and interesting ways to cause this program to fail. Consider this an early beta release... I suspect that there may be memory issues when this program is run for a long time. No long-term testing has been completed yet.

The client must use Antelope version 4.11 (fully patched). Previous versions of orb2dbt(1) do not write out origin rows for non-associating events.

There is no subsetting of the data collected from the remote sites. For instance, if you are only interested in teleseismic events but are collecting the NEIC-CUBE bulletin, there is no way to subset the incoming list of events: this program returns all events that are reported, including the local ones.

If you are collecting many bulletins, or are running the script for the first time and have to collect many of the ftp files, it can take significantly longer than the default 600 seconds between bulletin collection passes.

The dbsubset method is less than ideal as it does not produce an exact duplicate of the data that was in the input db. Other methods have been put into contrib to help with complete database replication.

Although this is a vast improvement over previous procedures to collect external bulletins, it is still a very complex procedure. Further attempts at documentation would probably be a good idea. I expect to present something about this at a future Antelope User Group Meeting, so a presentation will be available at some point.

After years of trying to collect random formats of bulletin data produced by multiple sources and finally making the "one script to rule them all", I am reminded of a phrase that my daughter learned in kindergarten: "You get what you get, and you don't throw a fit." It seems appropriate for both the author and endusers to keep in mind.

If you have a favorite bulletin that does not currently have a method for collection or parsing, feel free to contact me to see if I would consider writing a parser for it. However, see the previous caveat...

AUTHOR

Jennifer Eakins
jeakins@ucsd.edu
ANF
University of California, San Diego

Antelope User Group Contributed Software