NAME
rsync_baler - Sync a remote baler directory to a local copy
SYNOPSIS
rsync_baler [-n] [-h] [-v] [-d] [-p pf_file]
[-m email ] [-s select_regex ] [-r reject_regex ]
SUPPORT
Contributed code: NO BRTT support.
THIS PIECE OF SOFTWARE WAS CONTRIBUTED BY THE ANTELOPE USER COMMUNITY. BRTT DISCLAIMS ALL OWNERSHIP, LIABILITY, AND SUPPORT FOR THIS PIECE OF SOFTWARE.
FOR HELP WITH THIS PIECE OF SOFTWARE, PLEASE CONTACT THE CONTRIBUTING AUTHOR.
SUPPORT
Contributed: NO BRTT support -- please contact author.
DESCRIPTION
rsync_baler is designed to create and maintain a local archive
of miniseed files from balers with ftp and http access (Baler44).
The script will get the list of active stations and IPs from a database.
The required tables are deployment, stabaler and staq330 from a css3.0 database.
After creating the list of targets the script will spawn multiple copies
of itself to connect to each station and download the data.
Each child will connect to a single station and a single directory archive.
Multiple connection utilities are used during the retrieval process.
The list in order is:
-
http => wget, curl
File::Fetch::BLACKLIST = [qw/LWP ncftp lftp lynx iosock/]
The system will connect to the stations and will download as many files as
possible in the allowed time (set in the parameter file).
The downloads are done in alphabetical order.
Before runnning the script will review every entry on the databases and fix
any errors. Each file in the directory that does not match the name of the file gets thrown
into a trash folder in the archive. The rest of the files in the folder are verified on
the database. If missing from the database with an entry in the status field of "downloaded"
then one is created for it. Then the script verifies every entry on the database with status
"downloaded" and checks for the file in the directory. The the script verifies that every file
contains a checksum. If missing one then the system will try to get the respective checksum
file and will update the table with the results. If we have a success match then the value
of the checksum gets updated on the table. If the file is missing then we have the word
"missing" on the database. If there is an error then we have "error" flag on the database.
If the file is missing in the directory the entry on the database with value "downloaded"
gets deleted.
If we have a reject subset from command-line
then we use that string. If we don't then
we look into the parameter file for a possible
definition set on days of the week. This is
related to the second process that we run
over the files and parses the data. We want
to avoid collision with that process.
if ( $pf{avoid_on_day_of_week} ) {
$opt_r ||= $pf{avoid_on_day_of_week}{epoch2str( now(), "%A" )} ;
}
OPTIONS
-
-n
Null/Dry run. No updates and no downloads.
-
-v
Verbose.
-
-d
Debugging mode.
-
-h
Help. Prints a script synopsis to screen.
-
-x
Ignore all database flags for max attempts on files.
-
-s
Select station regex
-
-r
Reject station regex
-
-p parameter_file
Parameter file name to use.
-
-m email1,email2,email3
Email (or list of emails) address to send log at the end of script.
NOTES
The code will get the list of stations and the ips from the information
present in tables [deployment, stabaler, staq330] (all of them
are required for the process to run).
Fork of child process is done with this method:
undef $pid ;
$pid = open($archive{$station}{fh}, -| ) ;
if ($pid) {
fcntl($archive{$station}{fh},F_SETFL, O_NONBLOCK) ;
$archive{$station}{fh}->autoflush(1) ;
fork_debug($parent,"Got pid $pid for $station") if $opt_d ;
push @procs, $pid ;
$archive{$station}{pid} = $pid ;
}
BUGS AND CAVEATS
Needs to have valid stabaler and staq330 tables. Will use stations reported to be of
type "PacketBaler44" on table stabaler and will get the public IP of the station from
staq330 table. The port for the connections will be set on the parameter file.
AUTHOR
Juan C. Reyes <reyes@ucsd.edu>
Antelope User Group Contributed Software