bulletin2orb [-v] [-V] [-1] [-p parameter_file] [-s seconds] orb
Association of events with externally published bulletins is a necessary part of the analyst review process with dbloc2(1) or of real-time event processing using dborigin2orb(1). Whether this external association happens is often limited by the ability of the analyst/operator to write a parser that collects these catalogs from other network operators via a timely, automated routine. The distribution format and mechanism vary from institution to institution. Examples of distribution methods include: a plain text file available via http, an html-formatted page, XML files, a file retrievable via ftp, an RSS feed, KML map files, finger quake (mostly deprecated), etc. Methods supported by bulletin2orb(1) and the associated bulletin.pm are discussed in the Bulletin Collection Types section. Formats of the web pages or retrieved files vary widely and are discussed in the Supported Bulletins and File Formats section.
Prior to the creation of bulletin2orb and the corresponding module bulletin.pm, I had to maintain individual scripts for collection of each external bulletin. Without any modules in place, each script recreated techniques to collect the bulletin, parse information out of it, and place that information in the appropriate css3.0 tables. This led to redundant code that was hard to maintain. Furthermore, sharing those scripts outside of our group was problematic as they often had to be tweaked yearly or more often as URLs, ftp locations, or html styles changed. A more reasonable approach to maintaining the scripts and sharing the information gleaned from them (i.e. the css3.0 origin tables) was needed. The need for ease of maintenance led to the development of the more generic bulletin2orb script and associated bulletin.pm routines along with the service of providing a single access point for external bulletin information.
The approach taken with bulletin2orb is to have a single data center, currently the Array Network Facility (ANF) at UCSD, maintain all of the parsing routines in bulletin.pm. That same data center runs various instances of the bulletin2orb collection script and serves the data to external users as /pf/$BULL/orb2dbt packets. External clients collect whichever bulletins they are interested in by running orb2orb with a match statement and write individual bulletins out to local disk via orb2dbt. bulletin2orb is a script best used in-house at the ANF, but it has been shared with the community as a courtesy.
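For example, a downstream site interested only in the NEIC-derived bulletins could copy just those packets with something along the lines of the command below. The orb names are placeholders and the match expression is only an assumption based on the /pf/$src/orb2dbt srcname convention; adjust both for your site:

% orb2orb -m '/pf/NEIC.*/orb2dbt' anf_export_orb:port my_local_orb:port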
For information on how to collect bulletins from the orb maintained at the ANF, please review the documentation provided with the bulletins demo of rtdemo(1). You can adjust the provided demo so that only bulletins of interest to your specific project are collected. You should not have to run the bulletin2orb program yourself!
The procedural flow of bulletin2orb is as follows: for each bulletin, a specific method is chosen that determines how the bulletin is collected. Depending on the format of the returned data (i.e. single lines of text containing all information, multi-line text, a downloaded file, etc.), a parser and, for some bulletins, an extractor must also be specified. The bulletins to be collected by bulletin2orb are selected via bulletin2orb.pf. The script runs in daemon mode (unless the -1 option is selected) and processes each bulletin collection task sequentially. The next pass over all bulletins begins after the program sleeps for -s seconds. For the non-searchable, non-ftp'd, rapidly updated bulletins, checks are made against the previous collection run to avoid sending duplicate /pf/$BULL/orb2dbt packets into the orb. However, duplicate information is sent for any bulletin that is retrieved via the ftp, dbsubset, or search* methods. Duplication also occurs if the bulletin2orb process is restarted. At the ANF, multiple instances of bulletin2orb are running, with update intervals of 10 minutes, 60 minutes, 24 hours, and monthly. Running separate instances at appropriate intervals is recommended so that no undue load from excessive data requests is placed on the regional network servers.
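The overall flow can be summarized with the short Perl sketch below. This is a simplified illustration, not the actual bulletin2orb code; the helper subroutines, the %bulletins hash, and the variable names are stand-ins for what the real script builds from bulletin2orb.pf and its command line options.

use strict;
use warnings;

# Illustrative sketch only -- not the actual bulletin2orb code.
my %bulletins = ();     # one entry per task in bulletin2orb.pf
my $single    = 0;      # set by -1
my $sleeptime = 600;    # set by -s (default 600 seconds)

sub collect        { return () }   # would dispatch to collect_<method>
sub parse          { return () }   # would dispatch to parse_<parser>
sub seen_before    { return 0 }    # dedup check against the previous pass
sub send_pf_packet { }             # would write a /pf/$src/orb2dbt packet

while (1) {
    foreach my $task ( sort keys %bulletins ) {
        my @raw     = collect( $bulletins{$task} );
        my @origins = parse( $bulletins{$task}, @raw );
        foreach my $origin (@origins) {
            next if seen_before( $task, $origin );   # avoid duplicates
            send_pf_packet( $task, $origin );
        }
    }
    last if $single;        # -1: one pass, then exit
    sleep $sleeptime;       # -s: wait before the next pass
}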
The perl module bulletin.pm was created to modularize the different techniques used to collect bulletin information from external sources. If additional collection methods are required (i.e. RSS feed, KML files, etc.) or new data formats for collected bulletin information are discovered, new subroutines will need to be developed and added to bulletin.pm. The currently supported collection methods are described below. The method selected for a particular bulletin is specified in the method parameter of bulletin2orb.pf for each bulletin collection task. DO NOT change these unless you are certain you know what you are doing. Only the methods listed below are currently supported; specifying any other method in the pf file causes the script to die. The actual subroutine name in bulletin.pm is collect_method, where method is the value given in the parameter file.
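Internally the method name maps directly onto a subroutine name, which is why an unsupported method has nothing to dispatch to and the script dies. A minimal Perl sketch of that kind of dispatch is shown below; the real bulletin.pm may structure this differently, and the collect_* stubs here are placeholders:

use strict;
use warnings;

# Hypothetical stubs standing in for the real collection routines.
sub collect_text       { print "collect via a simple http text retrieval\n" }
sub collect_ftp        { print "collect via anonymous ftp\n" }
sub collect_htmltagged { print "collect via an html screen scrape with tags\n" }

# Map the pf "method" value onto collect_<method>, or die if it is unsupported.
sub collect {
    my ( $method, @args ) = @_;
    my $sub = __PACKAGE__->can( "collect_" . $method );
    die "Unsupported method '$method' specified in the pf file\n" unless $sub;
    return $sub->(@args);
}

collect("ftp");        # dispatches to collect_ftp
# collect("rss");      # would die: Unsupported method 'rss' specified in the pf file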
Many institutions have multiple methods for collection (i.e. a web page with rapid solutions, a searchable bulletin, and/or a downloadable ftp file). You may or may not want to collect all types of bulletins from a regional network. My goal was to produce parsers and collect information from a wide variety of bulletins in a way that is sustainable and can expand as new bulletins are discovered. The end-user can pick and choose the best solution from the origin table generated by orb2dbt(1). The table below shows the responsible institution or network, whether non-reviewed events may be included (if known), how frequently the bulletin is updated by the responsible data center, and the method, parser, and extractor used.
Inst/Network | Non-reviewed events? | Update interval | method | parser | extractor |
NEIC-CUBE | maybe | Rapid | text | qedCUBE | - |
NEIC-QED | maybe | Rapid | htmltagged | recenteqs | neicWWW |
EMSC | maybe | Rapid | htmltagged | EMSC | EMSC |
AEIC | maybe | Rapid | htmlnotags | finger | finger |
CERI-NMSN | maybe | Rapid | htmltagged | recenteqs | qedWWW |
Lamont-LCSN | maybe | Rapid | htmltagged | recenteqs | qedWWW |
MTECH-auto | yes | Rapid | htmlnotags | mtechAUTO | mtechAUTO |
MTECH-rev | no | Rapid | htmlnotags | recenteqs | simsum |
MTECH-file | no | Never | file | mtech_hypo71 | mtech_hypo71 |
PNSN | maybe | Rapid | htmltagged | recenteqs | qedWWW |
NRCAN-PGC | maybe | Rapid | htmlnotags | recenteqs | simsum |
NRCAN-GSC | maybe | Rapid | htmlnotags | recenteqs | simsum |
UNR-NBE | yes | Rapid | htmlnotags | NBEwww | NBEwww |
UUSS-combo | maybe | Rapid | htmltagged | recenteqs | qedWWW |
UUSS-daily | maybe | Rapid | htmlnotags | recenteqs | simsum |
YELL-daily | maybe | Rapid | htmlnotags | recenteqs | simsum |
NEIC-PDE | no | Monthly | ftp | ehdf | - |
NEIC-QEDweekly | no | D/Weekly | ftp | ehdf | - |
PNSN-rev | no | Quarterly(?) | ftp | uwcard | - |
UUSS-lists | no | Quarterly(?) | htmlnotags | uussLIST | uussLIST |
YELL-lists | no | Quarterly(?) | htmlnotags | uussLIST | uussLIST |
AEIC | maybe | Quarterly(?) | search_qf | AEIC | AEIC |
GSmines | no | Daily(?) | ftp | mchdr | mchdr |
GSmines-monthly | no | Weekly(?) | ftp | mchdr | mchdr |
ANF | maybe | Rapid** | dbsubset | dbsubset | - |
BC-NESN | maybe | Rapid** | search_post | NESN | NESN |
NBE-search | maybe | Rapid** | search_post | NBEsearch | NBEsearch |
NCSN-search | maybe | Rapid** | search_post | HYPO2000 | HYPO2000 |
SCSN-search | maybe | Rapid** | search_qf | HYPO2000 | HYPO2000 |
There are a multitude of different file formats that have been developed by individual institutions. Some are more consistent and better documented than others. Listed below are a few that currently have parsers written for them, with links to documentation and pickup locations when available. The parsers are in bulletin.pm and appear as parse_parser, where parser is the value listed in the table above.
The CUBE format provided by the NEIC combines solutions from many of the ANSS regional networks. The method used for this bulletin is text. The parser used for this bulletin is qedCUBE. Descriptions of the format can be found here:
http://earthquake.usgs.gov/regional/neic/qdds/cube_fmt.php
ftp://ehzftp.wr.usgs.gov/QDM/docs/CUBE.html
http://neic.cr.usgs.gov/neis/qdds/cube_fmt.html
The actual URL for the bulletin is:
http://earthquake.usgs.gov/eqcenter/catalogs/merged_catalog.cube
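As an illustration of the text method, a bulletin like this one can be fetched over http and reduced to its event lines before parsing. The sketch below is only an assumption about the approach (bulletin.pm may fetch and filter differently); the leading "E" corresponds to the start-of-event-line convention noted for CUBE in the parameter file comments:

use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical sketch: fetch the text bulletin and keep only event lines.
# CUBE event records begin with "E" (compare the linestart parameter used
# for the ftp bulletins in the parameter file below).
my $url  = 'http://earthquake.usgs.gov/eqcenter/catalogs/merged_catalog.cube';
my $body = get($url) or die "could not retrieve $url\n";

my @events = grep { /^E/ } split /\n/, $body;
printf "retrieved %d candidate event lines\n", scalar @events;
# each line would then be handed to the qedCUBE parser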
Many of the regional networks provide a web page with a map and a list of earthquakes. I have found no formal format description. In general, the listings contain a magnitude (with no magtype), a date, latitude, longitude, depth, and a location description. The majority of the ANSS networks include html links in their earthquake lists. Bulletins with these embedded links use the htmltagged method with the recenteqs parser and the qedWWW or neicWWW extractor. This type of bulletin collection needs the time zone TZ, extractor, and defaultmagtype specified in the parameter file (a sketch of the time-zone handling follows the URL list below). Some regional networks provide a web page map and listing with no html links. These bulletins use the htmlnotags method and either the recenteqs parser or an institution-specific parser (i.e. mtechAUTO, NBEwww, uussLIST). An extractor must be specified and varies depending on the institution that produces the bulletin. The actual URLs for the bulletins are:
http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_all.php
http://www.seis.utah.edu/req2webdir/recenteqs/Quakes/quakes0.html
http://www.ldeo.columbia.edu/LCSN/recenteqs/Quakes/quakes0.html
http://folkworm.ceri.memphis.edu/recenteqs/Quakes/quakes0.html
http://www.pnsn.org/recenteqs/Quakes/quakes0.htm
http://mbmgquake.mtech.edu/earthworm/reviewed_locations.html
http://mbmgquake.mtech.edu/earthworm/automatic_locations.html
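Because most of these pages report event times in local time, the TZ value from the parameter file has to be applied when converting a listed time to an epoch time. The helper below is a minimal sketch of one way to do that in plain Perl; it is an assumption for illustration, and bulletin.pm may handle time zones differently (the date format in the regular expression is also just an example):

use strict;
use warnings;
use POSIX qw(tzset mktime);

# Hypothetical helper: convert a local "YYYY/MM/DD HH:MM:SS" string,
# interpreted in the bulletin's TZ (e.g. "US/Pacific"), to epoch seconds.
sub local2epoch {
    my ( $tz, $datestr ) = @_;
    my ( $yr, $mo, $dy, $hr, $mn, $sc ) =
        $datestr =~ m{^(\d{4})/(\d{2})/(\d{2})\s+(\d{2}):(\d{2}):(\d{2})$}
        or die "unparseable time string: $datestr\n";
    local $ENV{TZ} = $tz;    # apply the TZ value from the pf entry
    tzset();
    return mktime( $sc, $mn, $hr, $dy, $mo - 1, $yr - 1900 );
}

print local2epoch( "US/Pacific", "2009/10/07 12:34:56" ), "\n";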
The EMSC format provided by the European-Mediterranean Seismological Centre reports solutions from many of the regional and national networks found in Europe, around the Mediterranean, western Asia, and northern Africa along with solutions from the NEIC. The method used for this bulletin is htmltagged. The parser used for this bulletin is EMSC. I have found no formal write-up of the format.
The actual URL for the bulletin is:
http://www.emsc-csem.org/index.php?page=current&sub=list
The HYPO2000 summary format is well documented and is used by both the SCSN and NCSN. It is one of the output format options available from their searchable bulletins. Unfortunately, there are slight differences between how Berkeley and Caltech use the format, but both should be able to use HYPO2000 for the parser and extractor. The SCSN bulletin collection uses the search_qf method while the NCSN bulletin collection uses the search_post method. Format descriptions can be found here:
http://www.data.scec.org/catalog_search/docs/2000hyposum_format.txt
http://www.ncedc.org/ncedc/documents.html#catalog_formats
The actual URLs for the bulletin search programs are:
http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
http://www.ncedc.org/cgi-bin/catalog-search2.pl
The NESN, run by Boston College, has its own unique format for the earthquake bulletin information it displays. In general, the data is tab separated and contains three possible magnitude types: Mn, Mc, and Ml. Both a searchable bulletin and a plain listing (with no tags) are available, but the search mechanism is the preferred collection method because the plain listing returns all events since 1990. The searchable bulletin uses the search_post method and NESN for the parser and extractor.
The actual URL for the bulletin search program is:
http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl
Although I thought it was deprecated and not in use anywhere, I managed to find a few places where you could still get a web based listing in the old finger format. A parser and extractor have been written and tested successfully for the AEIC instance. Note that this is only applicable to web accessible plain text versions of a finger listing that report event times in UTC.
The actual URL for the finger listing for which this extractor works is:
http://www.aeic.alaska.edu/cgi-bin/quake_finger.pl
The AEIC, run by the University of Alaska, Fairbanks, has its own unique format for the earthquake bulletin information that is displayed. The data returned from the searchable bulletin is white space separated, and contains three possible magnitude types, mb, ML, and MS. The searchable bulletin uses the search_qf method and AEIC for the parser and extractor.
The actual URL for the bulletin search program is:
http://www.aeic.alaska.edu/cgi-bin/db2catalog.pl
The Nevada Broadcast of Earthquakes (NBE) has a tagged web format similar to that used in the recenteqs displays of other regional networks. No specific format description is available. However, a different parser and extractor are used: NBEwww.
The actual URL for the bulletin is:
http://www.seismo.unr.edu/Catalog/nbe.html
Also available from UNR is a searchable catalog. Like the NESN and NCSN bulletins, the search_post method is used. The parser and extractor are unique to this institution and are currently called NBEsearch.
The actual URL for the bulletin search program is:
http://www.seismo.unr.edu/cgi-bin/catalog-search
The ehdf format is a well documented format used by the NEIC/USGS. Format descriptions can be found here:
ftp://hazards.cr.usgs.gov/weekly/ehdf.txt
The weekly QED bulletin is available via the ftp method using the ehdf parser.
The ftp location for the QED weekly files is:
ftp://hazards.cr.usgs.gov/weekly
However, in the pf file, the leading ftp:// is excluded in the ftphost parameter and /weekly is the value used for the ftpdir. The ftpmatch should be set to ehdf.*.
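The ftp method's parameters map naturally onto an anonymous ftp session. The sketch below shows one way those parameters could drive such a retrieval using the standard Net::FTP module; it is an illustration under that assumption, not the actual bulletin.pm code, and the email address and local directory are placeholders:

use strict;
use warnings;
use Net::FTP;
use File::Spec;

# Hypothetical sketch driven by pf-style parameters for an ftp bulletin.
my %pf = (
    ftphost  => 'hazards.cr.usgs.gov',
    ftpdir   => '/weekly/',
    ftpmatch => 'ehdf.*',
    account  => 'someone@example.com',    # email used as the anonymous password
    localdir => 'savefiles/qed_weekly',   # assumed to already exist
);

my $ftp = Net::FTP->new( $pf{ftphost} ) or die "cannot connect to $pf{ftphost}\n";
$ftp->login( 'anonymous', $pf{account} ) or die "anonymous login failed\n";
$ftp->cwd( $pf{ftpdir} ) or die "cannot cd to $pf{ftpdir}\n";

# retrieve every remote file whose name matches ftpmatch
foreach my $file ( grep { /^$pf{ftpmatch}$/ } $ftp->ls() ) {
    my $local = File::Spec->catfile( $pf{localdir}, $file );
    $ftp->get( $file, $local ) or warn "failed to retrieve $file\n";
}
$ftp->quit;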
The UW card format is a well documented format used by the PNSN at the University of Washington. Format descriptions can be found here:
ftp://ftp.ess.washington.edu/pub/seis_net/README.cardformat
http://www.pnsn.org/INFO_GENERAL/PNSN_QUARTERLY_EQ_CATALOG_KEY.html
The PNSN reviewed bulletin is available via the ftp method using the uwcard parser.
The ftp location for the PNSN reviewed files is:
ftp://ftp.ess.washington.edu/pub/seis_net
However, in the pf file, the leading ftp:// is excluded in the ftphost parameter and pub/seis_net is the value used for the ftpdir. The ftpmatch should be set to loc.[0-9].*.
The mchedr format is another well documented format from the NEIC/USGS. It is another example of a card format where the first one or two characters of each line determine what type of information follows. This parser was written to collect the USGS mine explosion bulletin. The QED weekly bulletin could also be collected in this format, but the ehdf format is preferred, mostly for historic reasons. For the USGS mine explosion bulletin, the ftp method is used with the mchdr parser and mchdr extractor. Format descriptions can be found here:
ftp://hazards.cr.usgs.gov/weekly/mchedr.txt
The ftp location for the recent mine explosion files is:
ftp://hazards.cr.usgs.gov/explosions/mchedrexp.dat
However, in the pf file, the leading ftp:// is excluded in the ftphost parameter and explosions is the value used for the ftpdir. The ftpmatch should be set to mchedrexp.dat. Historic information, from 1997 forward, is available from:
ftp://hazards.cr.usgs.gov/mineblast/
In general, only change the src, localdir, and ndays parameters. Leave the remainder as-is or unexpected results may occur.
#
# bulletin2orb.pf
#
# Not all of these should be run under a
# single instance of bulletin2orb.
# Many should be run with a longer default
# interval between collection attempts
#
# --J.Eakins 10/7/2009

bulletins &Arr{

#
# These bulletins can be collected using the default
# settings (-s 600).
#
# Create a bulletin2orb_rapid.pf between START/END
#
# START

AEIC-finger &Arr{       # quick, but not so accurate in time/space solutions from AEIC
    method          htmlnotags
    parser          finger
    extractor       finger
    src             AEIC/finger     # srcname will be /pf/$src/orb2dbt
    auth            AEIC            # auth == $auth
    url             http://www.aeic.alaska.edu/cgi-bin/quake_finger.pl
    defaultmagtype  ml              #
}

EMSC-www &Arr{
    method          htmltagged      # text file available via http
    extractor       EMSC
    parser          EMSC
    src             EMSC            # srcname will be /pf/$src/orb2dbt
    auth            EMSC            #
    url             http://www.emsc-csem.org/index.php?page=current&sub=list
}

MTECH-auto &Arr{        # automatic solutions from Montana Tech
    method          htmlnotags
    parser          mtechAUTO
    extractor       mtechAUTO
    src             MTECH/A         # srcname will be /pf/$src/orb2dbt
    auth            MTECH           # auth == $auth . "$evid"
    url             http://mbmgquake.mtech.edu/earthworm/automatic_locations.html
    defaultmagtype  md              # not coded yet
}

MTECH-rev &Arr{         # reviewed solutions from Montana Tech
    method          htmlnotags
    parser          recenteqs
    extractor       simsum
    src             MTECH/R         # srcname will be /pf/$src/orb2dbt
    auth            MTECH_R         # auth == $auth
    url             http://mbmgquake.mtech.edu/earthworm/reviewed_locations.html
    defaultmagtype  ml              # not coded yet
}

UNR-NBE &Arr{           # both prelim and reviewed solutions from the Nevada Broadcast of Earthquakes
    method          htmlnotags
    extractor       NBEwww          # extract_nbeWWW
    parser          NBEwww          # parse_nbeWWW
    src             NBE             # srcname will be /pf/$src/orb2dbt
    auth            NBE             # auth == $auth . "$NBE_evid"
    url             http://www.seismo.unr.edu/Catalog/nbe.html
    defaultmagtype  ml              # not coded yet, "local magnitudes"
}

NEIC-CUBE &Arr{         # combined quick bulletin from NEIC. Includes solutions from multiple regional networks
    method          text            # text file available via http
    parser          CUBE
    src             NEIC/CUBE       # srcname will be /pf/$src/orb2dbt
    auth            USGS            # qedCUBE will actually have auth == "USGS:$contributor"
    url             http://earthquake.usgs.gov/eqcenter/catalogs/merged_catalog.cube
}

NEIC-QED &Arr{          # Quick Earthquake determination list from NEIC/USGS
    method          htmltagged      # WWW screen scrape w/ tags
    parser          recenteqs
    extractor       neicWWW
    TZ              UTC
    src             NEIC/qed        # srcname will be /pf/$src/orb2dbt
    auth            QED             # auth == $auth
    url             http://earthquake.usgs.gov/eqcenter/recenteqsww/Quakes/quakes_all.php
    defaultmagtype  M               # md if mag< 1.9, ml if mag >=1.9, ???
}

Lamont-LCSN &Arr{       # Lamont Cooperative Seismo Network (Eastern US)
    method          htmltagged      # WWW screen scrape w/ tags
    parser          recenteqs
    extractor       qedWWW
    TZ              US/Eastern
    src             LCSN            # srcname will be /pf/$src/orb2dbt
    auth            LCSN            # auth == $auth
    url             http://www.ldeo.columbia.edu/LCSN/recenteqs/Quakes/quakes0.html
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

CERI-NMSN &Arr{
    method          htmltagged
    parser          recenteqs
    extractor       qedWWW
    TZ              US/Central
    src             NMSN            # srcname will be /pf/$src/orb2dbt
    auth            CERI            # auth == $auth
    url             http://folkworm.ceri.memphis.edu/recenteqs/Quakes/quakes0.html
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

PNSN &Arr{
    method          htmltagged
    parser          recenteqs
    extractor       qedWWW
    TZ              US/Pacific
    src             PNSN/A          # srcname will be /pf/$src/orb2dbt
    auth            PNSN_A          # auth == $auth
    url             http://www.pnsn.org/recenteqs/Quakes/quakes0.htm
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

UUSS-combo &Arr{
    method          htmltagged
    parser          recenteqs
    extractor       qedWWW
    TZ              US/Mountain
    src             UUSS/combo      # srcname will be /pf/$src/orb2dbt
    auth            UUSS            # auth == $auth
    url             http://www.seis.utah.edu/req2webdir/recenteqs/Quakes/quakes0.html
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

NRCAN-PGC &Arr{
    method          htmlnotags
    parser          recenteqs
    extractor       simsum
    src             NRCAN/PGC       # srcname will be /pf/$src/orb2dbt
    auth            PGC             # auth == $auth
    ### If you want more events, try the year long grab
    ###http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?maptype=1y&tpl_region=west
    url             http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?tpl_region=west
    defaultmagtype  M               # not coded yet
}

NRCAN-GSC &Arr{
    method          htmlnotags
    parser          recenteqs
    extractor       simsum
    src             NRCAN/GSC       # srcname will be /pf/$src/orb2dbt
    auth            GSC             # auth == $auth
    url             http://earthquakescanada.nrcan.gc.ca/recent/maps-cartes/index-eng.php?tpl_region=east
    defaultmagtype  M               # not coded yet
}

# END

#
# these should probably not be checked every 600 seconds
# I suggest putting them in a separate pf and run a bull2orb
# that has a -s of 3600*3 (every 3 hours)
#
#
# Create a bulletin2orb_multi.pf between START/END
#
# START

# searchable catalogs need ndays defined

AEIC-search &Arr{
    method          search_qf
    parser          AEIC            # calls will be to postqf_AEIC, extract_AEIC, parse_AEIC
    extractor       AEIC            # AEIC
    src             AEIC            # srcname will be /pf/$src/orb2dbt
    auth            AEIC            # auth == $auth
    url             http://www.aeic.alaska.edu/cgi-bin/db2catalog.pl
    ndays           90              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

SCSN-search &Arr{
    method          search_qf
    parser          HYPO2000        # calls will be to postqf_HYPO2000, extract_HYPO2000, parse_HYPO2000
    extractor       HYPO2000        # extract_HYPO2000
    src             SCSN            # srcname will be /pf/$src/orb2dbt
    auth            SCSN            # auth == $auth . "$evid"
    url             http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
    ndays           7               # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

NCSN-search &Arr{
    method          search_post
    parser          HYPO2000
    extractor       HYPO2000        # extract_HYPO2000
    src             NCSN            # srcname will be /pf/$src/orb2dbt
    auth            NCSN            # auth == $auth . "$nscn_evid" . "$rev_info"
    url             http://www.ncedc.org/cgi-bin/catalog-search2.pl
    ndays           7               # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

BC-NESN &Arr{
    method          search_post
    parser          NESN
    extractor       NESN            # extract_NESN
    src             NESN            # srcname will be /pf/$src/orb2dbt
    auth            NESN            # auth == $auth
    url             http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl
    ndays           31              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

NBE-search &Arr{
    method          search_post
    parser          NBEsearch
    extractor       NBEsearch       # extract_NBE (different from extract_NBEwww)
    src             NBE             # srcname will be /pf/$src/orb2dbt
    auth            UNR_NBE         # auth == $auth
    url             http://www.seismo.unr.edu/cgi-bin/catalog-search
    ndays           7               # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

ANF_rt &Arr{
    method          dbsubset
    parser          dbsubset
    src             ANF             # srcname will be /pf/$src/orb2dbt
    auth            ANF             # auth will be filled in with origin.auth after authsubset
    db              /path/to/anf/rt/db/usarray
    authsubset      auth=~/ANF.*/
    ndays           7               # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

ANF_arch &Arr{
    method          dbsubset
    parser          dbsubset
    src             ANF             # srcname will be /pf/$src/orb2dbt
    auth            ANF             # auth will be filled in with origin.auth after authsubset
    db              /path/to/anf/archived/db/usarray
    authsubset      auth=~/ANF.*/
    enddate         2/1/2008        # used to set endtime for search.  Without enddate, endtime == now()
    ndays           31              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

UUSS-daily &Arr{        # utah daily updated solutions
    method          htmlnotags
    parser          recenteqs
    extractor       simsum
    TZ              UTC
    src             UUSS/utah       # srcname will be /pf/$src/orb2dbt
    auth            UUSS            # auth == $auth
    url             http://www.quake.utah.edu/ftp/DATA_REQUESTS/RECENT_EQS/utah.list
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

YELL-daily &Arr{        # yellowstone daily updated solutions
    method          htmlnotags
    parser          recenteqs
    extractor       simsum
    TZ              UTC
    src             UUSS/yellowstone        # srcname will be /pf/$src/orb2dbt
    auth            UUSS            # auth == $auth
    url             http://www.quake.utah.edu/ftp/DATA_REQUESTS/RECENT_EQS/yellowstone.list
    defaultmagtype  md              # md if mag< 1.9, ml if mag >=1.9, ???
}

NEIC-QEDweekly &Arr{    # NEIC's more reviewed QED solutions (not quite PDE quality)
    method          ftp
    parser          ehdf
    src             NEIC/qedw       # srcname will be /pf/$src/orb2dbt
    auth            QED_weekly      # auth == $auth
    ftphost         hazards.cr.usgs.gov     # remote host for ftp pickup
    ftpdir          /weekly/        # remote directory where files are kept
    ftpmatch        ehdf.*          # match string or remote ftp files
    linestart       GS              # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/qed_weekly    # local directory where retrieved files are kept
}

PNSN-rev &Arr{          # reviewed solutions by UW/PNSN
    method          ftp
    parser          uwcard
    src             PNSN/R          # srcname will be /pf/$src/orb2dbt
    auth            PNSN_R          # auth == $auth
    ftphost         ftp.ess.washington.edu  # remote host for ftp pickup
    ftpdir          pub/seis_net/   # remote directory where files are kept
    ftpmatch        loc.[0-9].*     # match string or remote ftp files
    linestart       A               # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    linelength      40              # reject lines that are shorter than linelength
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/pnsn_reviewed # local directory where retrieved files are kept
    defaultmagtype  md              #
}

GSmines &Arr{
    method          ftp
    parser          mchdr
    extractor       mchdr
    src             NEIC/mines      # srcname will be /pf/$src/orb2dbt
    auth            NEIC_mines      # auth == $auth
    ftphost         hazards.cr.usgs.gov     # remote host for ftp pickup
    ftpdir          explosions      # remote directory where files are kept
    ftpmatch        mchedrexp.dat   # match string or remote ftp files
    linestart       HY|E            # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/current_mines # local directory where retrieved files are kept
}

GSmines-monthly &Arr{
    method          ftp
    parser          mchdr
    extractor       mchdr
    src             NEIC/mines      # srcname will be /pf/$src/orb2dbt
    auth            NEIC_mines      # auth == $auth
    ftphost         hazards.cr.usgs.gov     # remote host for ftp pickup
    ftpdir          mineblast       # remote directory where files are kept
    ftpmatch        ex.dat          # match string or remote ftp files
    linestart       HY|E            # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/monthly_mines # local directory where retrieved files are kept
}

# END

#
# These should probably not be run from a daemonized bulletin2orb.
# I suggest putting them in a separate pf and run bulletin2orb
# as a monthly cronjob with the -1 option used
#
#
# Create a bulletin2orb_multi.pf between START/END
#
# START

PNSN-rev &Arr{
    method          ftp
    parser          uwcard
    src             PNSN/R          # srcname will be /pf/$src/orb2dbt
    auth            PNSN_R          # auth == $auth
    ftphost         ftp.ess.washington.edu  # remote host for ftp pickup
    ftpdir          pub/seis_net/   # remote directory where files are kept
    ftpmatch        loc.[0-9].*     # match string or remote ftp files
    linestart       A               # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    linelength      40              # reject lines that are shorter than linelength
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/pnsn_reviewed # local directory where retrieved files are kept
    defaultmagtype  md              #
}

UUSS-lists &Arr{        # Reviewed(?) Utah region events
    method          htmlnotags
    parser          uussLIST
    extractor       uussLIST
    src             UUSS/utah       # srcname will be /pf/$src/orb2dbt
    auth            UUSS            # auth == $auth
    url             http://www.quake.utah.edu/EQCENTER/LISTINGS/UTAH/equtah_2009
}

YELL-lists &Arr{        # Reviewed(?) Yellowstone region events
    method          htmlnotags
    parser          uussLIST
    extractor       uussLIST
    src             UUSS/yellowstone        # srcname will be /pf/$src/orb2dbt
    auth            UUSS            # auth == $auth
    url             http://www.quake.utah.edu/EQCENTER/LISTINGS/OTHER/yell_2009
}

# END

#
#
# Create a bulletin2orb_archived.pf between START/END
#
# START
#
# collect the databases you have locally and push to orb for downstream collection
# should probably run infrequently
#

2008PDE &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/NEIC/2008PDE   # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth will be filled in with origin.auth after authsubset
    db              archived_catalogs/qed/qed_2008
    authsubset      auth=~/.*/
}

2009PDE &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/NEIC/2009PDE   # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth will be filled in with origin.auth after authsubset
    db              archived_catalogs/pde/pde_2009
    authsubset      auth=~/.*/
}

2009PNSN &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/PNSN/2009      # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth will be filled in with origin.auth after authsubset
    db              archived_catalogs/pnsn/pnsn_2009
    authsubset      auth=~/.*/
}

2009UUSS &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/UUSS/2009      # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth == $auth
    db              archived_catalogs/utah/utah_2009
    authsubset      auth=~/.*/
}

2009SCSN &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/SCSN/2009      # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth == $auth
    db              archived_catalogs/cit/cit_2009
    authsubset      auth=~/.*/
}

2009NCSN &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/NCSN/2009      # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth == $auth
    db              archived_catalogs/ncec/ncec_2009
    authsubset      auth=~/.*/
}

2009NESN &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/NESN/2009      # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth == $auth
    db              archived_catalogs/nesn/nesn_2009
    authsubset      auth=~/.*/
}

2009NBE &Arr{
    method          dbsubset
    parser          dbsubset
    src             archived/NBE/2009       # srcname will be /pf/$src/orb2dbt
    auth            dummy           # auth == $auth
    db              archived_catalogs/unr/unr_2009
    authsubset      auth=~/.*/
}

# END

#
#
# Create a bulletin2orb_monthly.pf between START/END
#
# START
#
# these bulletins are either updated infrequently
# or take a longer time for requests to process and should probably
# be run as an infrequent cron job
#

NEIC-PDE &Arr{
    method          ftp
    parser          ehdf
    src             NEIC/PDE        # srcname will be /pf/$src/orb2dbt
    auth            PDE             # auth == $auth
    ftphost         hazards.cr.usgs.gov     # remote host for ftp pickup
    ftpdir          /pde/           # remote directory where files are kept
    ftpmatch        ehdf2008.*|ehdf2009.*   # match string or remote ftp files
    linestart       GS              # match for start of event line ("GS" for ehdr, "E" for CUBE, etc.)
    account         jeakins@ucsd.edu        # email address for anonymous ftp
    localdir        savefiles/pde   # local directory where retrieved files are kept
}

SCSN-longsearch &Arr{
    method          search_qf
    parser          HYPO2000        # calls will be to postqf_HYPO2000, extract_HYPO2000, parse_HYPO2000
    extractor       HYPO2000        # extract_HYPO2000
    src             SCSN            # srcname will be /pf/$src/orb2dbt
    auth            SCSN            # auth == $auth . "$evid"
    url             http://www.data.scec.org/cgi-bin/catalog/catalog_search.pl
#   enddate         4/1/2008        # used to set endtime for search.  Without enddate, endtime == now()
    ndays           90              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

NCSN-longsearch &Arr{
    method          search_post
    parser          HYPO2000
    extractor       HYPO2000        # extract_HYPO2000
    src             NCSN            # srcname will be /pf/$src/orb2dbt
    auth            NCSN            # auth == $auth . "$nscn_evid" . "$rev_info"
    url             http://www.ncedc.org/cgi-bin/catalog-search2.pl
    ndays           90              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

NESN-longsearch &Arr{
    method          search_post
    parser          NESN
    extractor       NESN            # extract_NESN
    src             NESN            # srcname will be /pf/$src/orb2dbt
    auth            NESN            # auth == $auth
    url             http://quake.bc.edu:8000/cgi-bin/NESN/print_catalog.pl
    ndays           90              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

NBE-longsearch &Arr{
    method          search_post
    parser          NBEsearch
    extractor       NBEsearch       # extract_NBE (different from extract_NBEwww)
    src             NBE             # srcname will be /pf/$src/orb2dbt
    auth            UNR_NBE         # auth == $auth
    url             http://www.seismo.unr.edu/cgi-bin/catalog-search
    ndays           180             # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

ANF-longsubset &Arr{
    method          dbsubset
    parser          dbsubset
    src             ANF             # srcname will be /pf/$src/orb2dbt
    auth            ANF             # auth will be filled in with origin.auth after authsubset
    db              /path/to/usarray/db/usarray
    authsubset      auth=~/ANF.*/
    ndays           90              # used to set min/max time for search.  Search range w/o enddate: now-86400*ndays::now
}

# END

#
#
# This is an example of a single-use update for a non-daemonized bulletin2orb.
# In this case, the user has a single local file that needs to be converted.
#
# I suggest putting this collection in a separate pf and run bulletin2orb
# as a command line process with the -1 option used
#
#
# Create a bulletin2orb_once.pf between START/END
#
# START

MTECH-file &Arr{        # file from mtech
    method          file
    parser          mtech_hypo71
    extractor       mtech_hypo71
    linestart       [0-9]
    src             flatfile        # srcname will be /pf/$src/orb2dbt
    auth            MTECH           # auth == $auth . "$evid"
    file            /file/location/2004-2009.qks
}

# END

}
Searchable bulletins and those using the dbsubset method need to have an ndays parameter to define the time range of the search.
Searchable bulletins and those using the dbsubset method can have an enddate parameter to define the ending time for the search. If not specified, search looks for ndays of data prior to now().
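In other words, the effective search window is enddate minus 86400*ndays up to enddate, or now minus 86400*ndays up to now when no enddate is given. A small Perl illustration of that arithmetic is shown below; it assumes the Antelope Datascope perl module for str2epoch, strtime, and now, and is a sketch rather than the script's actual code:

use strict;
use warnings;
use Datascope;    # assumes the Antelope Datascope perl module is installed

# Sketch of the search-window arithmetic implied by ndays and enddate.
my $ndays   = 7;             # from the pf entry
my $enddate = "2/1/2008";    # optional; leave undefined to search up to now()

my $endtime   = defined $enddate ? str2epoch($enddate) : now();
my $starttime = $endtime - 86400 * $ndays;

printf "search window: %s to %s\n", strtime($starttime), strtime($endtime);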
Whether or not a particular bulletin needs an extractor specified depends on the format and method used to collect it. Don't change the defaults.
Bulletins using the ftp method need to have an ftphost, ftpdir, ftpmatch, linestart, account, and localdir specified.
See the bulletins demo of rtdemo(1) for the methodology for collecting bulletins from the server provided by the ANF.
For earthquake bulletins that are updated rapidly and require no search of the remote site:
% bulletin2orb -p pf/bulletin2orb_rapid $ORB
For earthquake bulletins that are updated frequently but may require searches or other retrieval mechanisms that are CPU intensive (on the remote side), collect only once an hour:
% bulletin2orb -s 3600 -p pf/bulletin2orb_multi $ORB
For earthquake bulletins that are updated less frequently, maybe once or twice a day, but contain multiple months of data, or otherwise don't need to be collected rapidly, collect once per day:
% bulletin2orb -s 86400 -p pf/bulletin2orb_daily $ORB
For earthquake bulletins that are updated infrequently, maybe once or twice a month, or for re-collecting long stretches of data from searchable bulletins, collect via a cron job run monthly. Use the non-daemon mode of operation.
% bulletin2orb -1 -p pf/bulletin2orb_monthly $ORB
If you select a method, parser, or extractor that is not defined in bulletin.pm, the script fails in unexpected ways.
Error messages like:

Use of uninitialized value $p in hash dereference at bin/bulletin2orb line 120.
Use of uninitialized value $m in hash dereference at bin/bulletin2orb line 121.
Use of uninitialized value $parsed_info{"or_time"} in string at bin/bulletin2orb line 349.

imply that critical information is not being retrieved from the bulletin collected via the ftp method. It is highly likely that the input you are trying to read is a short line placed by the originating institution to signify a comment, a non-located event, or a deleted event (bulletin2orb has no provisions to deal with deleted events). You can review the retrieved file to see what might be going on. To avoid the short lines in the input file, specify a linelength parameter in the appropriate bulletin collection task section of the parameter file. Lines shorter than linelength are rejected.
mchedr2db(1) pde2origin(1) rtdemo(1)
The most important caveat: Garbage In = Garbage Out.
I am still finding new and interesting ways to cause this program to fail. Consider this an early beta release... I suspect that there may be memory issues when this program is run for a long time. No long-term testing has been completed yet.
The client must use Antelope version 4.11 (fully patched). Previous versions of orb2dbt(1) do not write out origin rows for non-associating events.
There is no subsetting of the data collected from the remote sites. For instance, if you are only interested in teleseismic events but are collecting the NEIC-CUBE bulletin, there is no way to subset the incoming list of events: this program returns all events that are reported, including the local ones.
If you are collecting many bulletins, or are running the script for the first time and have to retrieve many of the ftp files, a single collection pass can take significantly longer than the default 600 second interval between passes.
The dbsubset method is less than ideal as it does not perform an exact duplication of the data that was in the input db. Other methods have been put into contrib to help with complete database replication.
Although this is a vast improvement over previous procedures to collect external bulletins, it is still a very complex procedure. Further attempts at documentation would probably be a good idea. I expect to present something about this at a future Antelope User Group Meeting so a presentation will be available at some point.
After years of trying to collect random formats of bulletin data produced by multiple sources and finally making the "one script to rule them all", I am reminded of a phrase that my daughter learned in kindergarten: "You get what you get, and you don't throw a fit." It seems appropriate for both the author and endusers to keep in mind.
If you have a favorite bulletin that does not currently have a method for collection or parsing, feel free to contact me to see if I would consider writing a parser for it. However see previous caveat...
Jennifer Eakins
jeakins@ucsd.edu
ANF
University of California, San Diego