User manual

The command line interface to the FileCatalog

The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the Perl module used to manipulate it are described later in this document. Below you will find the description of the command line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments, it prints the following usage message:

% get_file_list.pl [-all] -keys keyword[,keyword,...]              \    
  [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \    
  [-onefile] [-o outputfile]
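
As a minimal sketch of a typical call, the script below asks for the path and filename of up to ten files of type daq_reco_dst stored on NFS. The keyword values are illustrative assumptions, and the query is only executed when get_file_list.pl is found on the PATH.

```shell
# Illustrative values only; any valid keyword/value pair can be substituted.
keys='path,filename'                        # fields to return
cond='storage=NFS,filetype=daq_reco_dst'    # restrictions on the selection

# Show the command line that is about to be issued.
cmdline="get_file_list.pl -keys '$keys' -cond '$cond' -limit 10"
echo "$cmdline"

# Run the query only where the utility is installed.
if command -v get_file_list.pl >/dev/null 2>&1; then
    get_file_list.pl -keys "$keys" -cond "$cond" -limit 10
fi
```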

Command line options

The command line options are described below:

-all	use all entries regardless of availability flag. Default is to show only available=1
-alls	use all entries regardless of sanity flag, default is to show sanity=1 unless the sanity flag was used as a key
-onefile	A special mode of operation; returns a list of files, but gives only one location (the one with highest persistence) for each file, even if the database has many.
-keys	Specify what data you want to get from the database. A list of valid keywords, separated by commas, should follow this parameter. See the examples for more clarification. See also the description of aggregate functions for some more sophisticated tricks.
-cond	Specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanations.
-start #	Specify the record number # to start from; the default is to start with the first record. Together with -limit, this can be used to retrieve the data in chunks.
-limit #	Limit the number of records returned (default 100; a value of 0 indicates an unlimited number of records).
-rlimit #	Limit the number of unique LFNs returned. Note that the number of output lines may exceed the rlimit value. Using -rlimit switches the -limit logic off; the two cannot be used at the same time.
-delim <string>	Specify the characters that will separate the fields in the output (default: "::")
-V	print the module version and leave
-as <scope>, -as <site:scope>	Connect to the FileCatalog database as specified. Scopes are {Admin|User}. The site should be specified for a multi-site deployment.
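
The chunked retrieval mentioned for -start and -limit can be sketched as a loop. This is only a sketch under stated assumptions: -start is taken to be the number of records to skip (so 0 is the first record), which the text above does not confirm; FC_CMD and fetch_all are hypothetical names introduced here, not part of the utility.

```shell
# FC_CMD is a hypothetical hook: it lets the loop be tried against a stand-in command.
FC_CMD=${FC_CMD:-get_file_list.pl}

# fetch_all KEYS COND CHUNK: print every matching record, CHUNK records per query.
fetch_all() {
    start=0    # assumption: -start counts records to skip, so 0 is the first record
    while :; do
        chunk=$("$FC_CMD" -keys "$1" -cond "$2" -start "$start" -limit "$3") || return 1
        [ -n "$chunk" ] || break                  # nothing left to read
        printf '%s\n' "$chunk"
        n=$(printf '%s\n' "$chunk" | grep -c '')  # number of lines in this chunk
        [ "$n" -lt "$3" ] && break                # short chunk: this was the last one
        start=$((start + $3))
    done
}

# Run only where the utility (or a stand-in) is available.
if command -v "$FC_CMD" >/dev/null 2>&1; then
    fetch_all 'path,filename' 'storage=NFS' 500
fi
```

If the first record turns out to be numbered 1 rather than 0, only the initial value of start needs to change.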

Supported comparison or selection operators

<=	Less than or equal to
<	Less than
>=	Greater than or equal to
>	Greater than
<>	Not equal to
!=	Not equal to
=	Equal to
!~	Not containing (i.e. do not match)	strings
~	Containing (i.e. approximately matching)	strings
[]	In range
][	Outside the range
%	Modulo	integer
%%	Not Modulo	integer
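
The operators above are written directly between keyword and value inside the -cond list. The following sketch assembles such a list; join_conds is a hypothetical helper introduced here, and the selection values are illustrative assumptions.

```shell
# join_conds EXPR...: join the expressions with commas into one -cond value.
join_conds() {
    ( IFS=,; printf '%s\n' "$*" )
}

# Illustrative selection: daq_reco_dst files from run 12114010 onwards,
# st_physics files only, skipping files with zero events.
cond=$(join_conds 'filetype=daq_reco_dst' 'runnumber>=12114010' 'sname2=st_physics' 'events<>0')
echo "$cond"

# The quotes matter: unquoted, the shell would treat > and < as redirections.
if command -v get_file_list.pl >/dev/null 2>&1; then
    get_file_list.pl -keys 'path,filename' -cond "$cond"
fi
```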

Logical operators

The following logical operators can also be used in a query. They apply within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}

||	Logical OR	Strings or numbers
&&	Logical AND	Strings or numbers

Note that the logical AND operator returns an empty selection in most cases (for example, a runnumber cannot be equal to both Value1 and Value2 at the same time). It was added for later extensions of the database: selections based on metadata such as triggered events (many triggers in a file) would be a case where this operator is useful.
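
A short sketch of the OR form, following the keyword=Value1 LogicalOperator Value2 pattern: any_of is a hypothetical helper introduced here, and the second run number is an illustrative assumption.

```shell
# any_of KEYWORD VALUE...: build "keyword=V1||V2||..." for use inside -cond.
any_of() {
    key=$1
    out="$key=$2"
    shift 2
    for v in "$@"; do
        out="$out||$v"
    done
    printf '%s\n' "$out"
}

cond=$(any_of runnumber 12114010 12114011)
echo "$cond"

# Quote the condition: an unquoted || would be interpreted by the shell itself.
if command -v get_file_list.pl >/dev/null 2>&1; then
    get_file_list.pl -keys 'runnumber,filename' -cond "$cond"
fi
```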

The aggregate functions

These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See the examples for a description of how to use them.

sum	The sum of the values
avg	The average of the values
min	The minimum of the values
max	The maximum of the values
orda	Sort the output in ascending order by this keyword
ordd	Sort the output in descending order by this keyword
count	The count for a given selection
grp	Group the output - put all the records with the same value for a given keyword together. This is required in conjunction with any aggregate functions used in a multiple keyword syntax context.
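
To illustrate what grp combined with sum computes, the sketch below emulates the grouping with awk over a few hypothetical runnumber::size records. It shows the semantics only; it does not use the utility's own aggregate syntax, and the record values are invented for illustration.

```shell
# Hypothetical "runnumber::size" records, as if returned with the default delimiter.
records='12114010::100
12114010::250
12114011::75'

# Emulate grouping by runnumber and summing size within each group.
totals=$(printf '%s\n' "$records" |
    awk -F'::' '{ sum[$1] += $2 } END { for (r in sum) print r "::" sum[r] }' |
    sort)
echo "$totals"
```

Each run number appears once in the result, paired with the total size of its files.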

Keyword list

Here are the keywords that can be used in this context. The keywords follow a color scheme, as follows:
Keywords in blue are currently supported by the database schema but unused by the production scripts and therefore are not filled (or ill-filled).
Keywords in aqua are automatically updated (there is no need to reset).
Keywords in magenta are filled but an update may be needed (do not strongly rely on their value).

keyword	Notes	Meaning
site		The site where the data is stored, e.g. `BNL`, `LBL`
sitecmt		The site comment string
siteloc		A full string describing the site location in the world
storage		The storage medium, e.g. `HPSS`, `NFS`, `local` disk. Note that `local` disk storage does not allow for a unique file location; one must also select on `node`.
node		The name of the node where data is stored (necessary to locate `local` disk storage)
path		the path to a specific copy of the file
filename		The name of the data file
sname1		The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002"
sname2		The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to select only st_physics files and reject "st_physics_adc" files.
filetype		The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc.
extension		The extension of the file - directly connected to type (each file type has an associated extension)
events		Number of events or entries in the file
size		The size of the data file
fileseq		The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files.
stream		The file stream if applicable (default is 0)
md5sum	Early stage db fill did not update this field. It may return 0.	The file's md5 checksum
production		The production tag with which a given file was produced. Can also be "raw" or "simulation"
library		The library version this file was produced with
trgsetupname	Used to encode the path in production	The name of the online trigger setup
trgname		The name of one trigger in a collection of triggers associated with a runnumber.
trgcount		The event count having the associated trgname for a given runnumber
trgword	This is available for Year4 data and beyond for DAQ files	The trigger word associated to one trigger in a collection
trgversion		The trigger word version associated to a trgname
trgdefinition		The trigger definition of one trigger in a collection
runtype		The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets
configuration		The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but neither of the two alone is).
geometry		The geometry definition for a given simulation set.
runnumber		The number of the run. Arbitrary for simulations.
runcomments		The comments for a given run.
collision		The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200"
datetaken	The format was corrupted during the conversion from the old to the new Catalog. It can be (and will be) recovered.	The date the data was taken. Arbitrary for simulation.
magscale		The name of the magnetic field scale, e.g. FullField
magvalue		The actual magnetic field value
filecomment		The comment to the file.
owner		The owner of the file.
protection	Subject to changes	The protection or read/write permissions, given in a format similar to UNIX 'ls -l'
available		Is the file available? (0 if one cannot get it from HPSS or the file disappeared from disk)
persistent		Is the file persistent?
createtime	Only HPSS files have a createtime which is not subject to changes	The time the file was created. Format is YYYYmmddHHMMSS
inserttime		The time the file's data was inserted into the database.
simcomment		The comments for the simulation
generator		The event generator name
genversion		Event generator version
gencomment		Event generator comments
genparams		Event generator params
tpc		was the TPC in the data stream when specific data was taken?
svt		was the SVT in the data stream when specific data was taken?
tof		was the TOF in the data stream when specific data was taken?
emc		was the B-EMC in the data stream when specific data was taken?
eemc		was the E-EMC in the data stream when specific data was taken?
fpd		was the FPD in the data stream when specific data was taken?
ftpc		was the FTPC in the data stream when specific data was taken?
pmd		was the PMD in the data stream when specific data was taken?
rich		was the RICH in the data stream when specific data was taken?
ssd		was the SSD in the data stream when specific data was taken?
bbc		was the BBC in the data stream when specific data was taken?
bsmd		was the Barrel EMC SMD in the data stream when specific data was taken?
esmd		was the End-Cap SMD in the data stream when specific data was taken?
zdc		was the Zero-Degree Calorimeter in the data stream when specific data was taken?
tpx		was the tpx (tpc-X) information in the data stream when data was taken?
fgt		was the Forward Gem Tracker information saved in this data stream?
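
Because `local` storage needs `node` to pin down a file, a query for local files would typically request node, path and filename together. The sketch below turns one such record into a single node:path/filename string. The sample line, host name and path are hypothetical, and the field order is assumed to follow the order given to -keys.

```shell
# Keys are requested in the order they are needed for the full location.
keys='node,path,filename'
cond='storage=local'

# Hypothetical output line using the default "::" delimiter.
line='node01.example.org::/scratch/data::st_physics_12114010_raw_4040002.daq'

# Turn node::path::filename into node:path/filename.
location=$(printf '%s\n' "$line" | awk -F'::' '{ print $1 ":" $2 "/" $3 }')
echo "$location"

# Apply the same transformation to real output where the utility is installed.
if command -v get_file_list.pl >/dev/null 2>&1; then
    get_file_list.pl -keys "$keys" -cond "$cond" |
        awk -F'::' '{ print $1 ":" $2 "/" $3 }'
fi
```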

The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).

flid	Access the FileLocation ID of the FileLocations table
fdid	Access the FileData ID of the FileData table
rfdid	Access the FileData ID of the FileLocations table
pcid	Access the ProductionCondition ID of the ProductionConditions table
rpcid	Access the ProductionCondition ID of the FileData table
rpid	Access the runParam ID of the RunParams table
rrpid	Access the runParam ID of the FileData table
ftid	Access the FileType ID of the FileTypes table
rftid	Access the FileType ID of the FileData table
stid	Access the storageType ID of the StorageTypes table
rtid	Access the storageType ID of the FileLocations table
ssid	Access the storageSite ID of the StorageSites table
rssid	Access the storageSite ID of the FileLocations table
tcfdid	Access the FileData ID of the TriggerCompositions table
tctwid	Access the TriggerWords ID of the TriggerCompositions table
twid	Access the TriggerWords ID of the TriggerWords table
dcid	Access the detectorConfiguration ID of the DetectorConfigurations table
rdcid	Access the detectorConfiguration ID of the RunParams table

lgnm	An aggregate keyword returning an equivalence to the logical name
lgpth	An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location)
fulld	An aggregate keyword returning a string completely defining all meta-data for real data
fulls	An aggregate keyword returning a string completely defining all meta-data for simulation data

Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.

keyword	Notes	Meaning
simulation		Is the data a simulation?
nounique	In script mode, this keyword is set to 0 (i.e. unique fields), which may tremendously slow down your scripts. In the user interface `get_file_list.pl`, however, it is set to 1 by default (does not ensure unique fields).	Should the module return all fields instead of only unique selected fields?
noround		Turns off rounding of magfield and collision energy.
startrecord		The Perl module will skip the first startrecord records and start returning data beginning from the next one.
limit		The Perl module will return at most limit records.