The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the Perl module used to manipulate it are described later in this document. Below you will find the description of the command-line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments, it prints the following usage message:
% get_file_list.pl [-all] -keys keyword[,keyword,...] \
    [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \
    [-onefile] [-o outputfile]
The command line options are described below:
-all | Use all entries regardless of the availability flag. The default is to show only entries with available=1. |
-alls | Use all entries regardless of the sanity flag. The default is to show only entries with sanity=1, unless the sanity flag was used as a key. |
-onefile | A special mode of operation; returns a list of files, but gives only one location (the one with highest persistence) for each file, even if the database has many. |
-keys | Specify what data you want to get from the database. A list of valid keywords, separated by commas (as in the usage message above), should follow this parameter. See the examples for more clarification. See also the description of aggregate functions for some more sophisticated tricks. |
-cond | Specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters for the shell, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanation. |
-start # | Specify the record number # to start from. The default is to start with the first record. Together with -limit, this can be used to get the data in chunks. |
-limit # | Limit the number of records returned (default 100; a value of 0 indicates an unlimited number of records). |
-rlimit # | Limit the number of unique LFN returned (note that the number of output lines may be larger than the rlimit value). Using -rlimit switches the -limit logic off; the two cannot be used at the same time. |
-delim <string> | Specify the characters that will separate the fields in the output (default: "::"). |
-V | Print the module version and exit. |
-as <scope>, -as <site:scope> | Connect to the FileCatalog database as specified. Valid scopes are Admin or User. The site should be specified for a multi-site deployment. |
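As an illustration of how these options combine, the sketch below assembles a query in a shell script and prints the command instead of running it. Only options listed above are used. The condition values (NFS, daq_reco_dst) and the output file name are placeholders, and the comma-separated list syntax is taken from the usage message.

```shell
#!/bin/sh
# Dry run: build a get_file_list.pl command line and print it.
# The values below are placeholders, not taken from a real catalog.
keys="path,filename,events"
cond="storage=NFS,filetype=daq_reco_dst"
outfile="files.txt"   # hypothetical output file name

# The -cond list is wrapped in single quotes so the shell leaves
# operator characters untouched when the command is actually run.
cmd="get_file_list.pl -keys $keys -cond '$cond' -limit 50 -delim '::' -o $outfile"
echo "$cmd"
```

Printing the command first is a convenient way to check the quoting before sending the query to the database.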
The following operators can be used in the -cond expressions:
<= | Not greater than | |
< | Less than | |
>= | Not less than | |
> | Greater than | |
<> | Not equal to | |
!= | Not equal to | |
= | equal to | |
!~ | Not containing (i.e. do not match) | strings |
~ | Containing (i.e. approximately matching) | strings |
[] | In range | |
][ | Outside the range | |
% | Modulo | integer |
%% | Not Modulo | integer |
The following logical operators can also be used in a query. They apply within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}
|| | Logical OR | Strings or numbers |
&& | Logical AND | Strings or numbers |
Note that the logical AND operator returns an empty selection in most cases (for example, a runnumber cannot be equal to both Value1 and Value2 at the same time). It was added for later extensions of the database: selections based on metadata such as triggered events (many triggers in a file) would be a case where this operator is useful.
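The operators above can be combined in a single -cond list. The following sketch only prints such a query. The run numbers, the event threshold and the "reco" substring are made-up illustration values, and the expressions follow the keyword, operator, value form described for -cond.

```shell
#!/bin/sh
# Dry run: print a query whose conditions use comparison, matching
# and logical OR operators. All values are invented for illustration.
cond='runnumber=12114010||12114011,events>1000,filetype~reco'

# Without the single quotes, the shell would treat > as a redirection
# and || as a command separator.
cmd="get_file_list.pl -keys runnumber,filename,events -cond '$cond'"
echo "$cmd"
```

This shows why the condition list should always be single-quoted: several of the operators are also shell metacharacters.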
These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See the examples for a description of how to use them.
sum | The sum of the values |
avg | The average of the values |
min | The minimum of the values |
max | The maximum of the values |
orda | Sort the output in ascending order by this keyword |
ordd | Sort the output in descending order by this keyword |
count | The count for a given selection |
grp | Group the output: put all the records with the same value for a given keyword together. This is required in conjunction with any aggregate function used in a multiple-keyword syntax context. |
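To make the effect of grp combined with sum concrete, the sketch below reproduces the same kind of computation with awk on a few made-up records in a runnumber::size layout, using the default "::" delimiter. It illustrates what grouping and summing mean for the result. It does not show the query syntax, and the sample values are invented.

```shell
#!/bin/sh
# Hypothetical "runnumber::size" records using the default "::" delimiter.
sample='12114010::100
12114010::250
12114011::75'

# Group the records by runnumber and sum the size within each group.
totals=$(printf '%s\n' "$sample" |
  awk -F'::' '{ sum[$1] += $2 } END { for (r in sum) print r "::" sum[r] }' |
  sort)
echo "$totals"
```

Each distinct runnumber ends up on one line with the total of its sizes, which is the shape of result one would expect from a grouped aggregate.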
Here are the keywords that can be used in this context. The keywords follow a color scheme:
Keywords in blue are currently supported by the database schema but unused by the production scripts, and are therefore not filled (or ill-filled).
Keywords in aqua are automatically updated (there is no need to reset them).
Keywords in magenta are filled, but an update may be needed (do not rely strongly on their value).
keyword | Notes | Meaning |
site | | The site where the data is stored, e.g. BNL, LBL |
sitecmt | | The site comment string |
siteloc | | A full string describing the site location in the world |
storage | | The storage medium, e.g. HPSS, NFS, local disk. Note that local disk storage does not allow for a unique file location; one must also select on node. |
node | | The name of the node where the data is stored (necessary to locate local disk storage) |
path | | The path to a specific copy of the file |
filename | | The name of the data file |
sname1 | | The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002" |
sname2 | | The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to select only "st_physics" files and reject "st_physics_adc" files. |
filetype | | The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc. |
extension | | The extension of the file, directly connected to the type (each file type has an associated extension) |
events | | Number of events or entries in the file |
size | | The size of the data file |
fileseq | | The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files. |
stream | | The file stream if applicable (default is 0) |
md5sum | Early-stage database fills did not update this field. It may return 0. | The file's md5 checksum |
production | | The production tag with which a given file was produced. Can also be "raw" or "simulation". |
library | | The library version this file was produced with |
trgsetupname | Used to encode the path in production | The name of the online trigger setup |
trgname | | The name of one trigger in a collection of triggers associated with a runnumber |
trgcount | | The count of events having the associated trgname for a given runnumber |
trgword | Available for Year4 data and beyond, for DAQ files | The trigger word associated with one trigger in a collection |
trgversion | | The trigger word version associated with a trgname |
trgdefinition | | The trigger definition of one trigger in a collection |
runtype | | The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets |
configuration | | The detector configuration name. A detector configuration is a combination of detectors that were present during data taking in a given run. Note that the combination configuration/geometry is unique (but neither of the two alone is). |
geometry | | The geometry definition for a given simulation set |
runnumber | | The number of the run. Arbitrary for simulations. |
runcomments | | The comments for a given run |
collision | | The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200" |
datetaken | The format was corrupted during the conversion from the old to the new Catalog. It can be (and will be) recovered. | The date the data was taken. Arbitrary for simulation. |
magscale | | The name of the magnetic field scale, e.g. FullField |
magvalue | | The actual magnetic field value |
filecomment | | The comment attached to the file |
owner | | The owner of the file |
protection | Subject to change | The protection or read/write permissions, given in a format similar to UNIX 'ls -l' |
available | | Is the file available? (0 if one cannot get it from HPSS or the file disappeared from disk) |
persistent | | Is the file persistent? |
createtime | Only HPSS files have a createtime that is not subject to change | The time the file was created. Format is YYYYmmddHHMMSS. |
inserttime | | The time the file data was inserted into the database |
simcomment | | The comments for the simulation |
generator | | The event generator name |
genversion | | The event generator version |
gencomment | | The event generator comments |
genparams | | The event generator parameters |
tpc | | Was the TPC in the data stream when specific data was taken? |
svt | | Was the SVT in the data stream when specific data was taken? |
tof | | Was the TOF in the data stream when specific data was taken? |
emc | | Was the B-EMC in the data stream when specific data was taken? |
eemc | | Was the E-EMC in the data stream when specific data was taken? |
fpd | | Was the FPD in the data stream when specific data was taken? |
ftpc | | Was the FTPC in the data stream when specific data was taken? |
pmd | | Was the PMD in the data stream when specific data was taken? |
rich | | Was the RICH in the data stream when specific data was taken? |
ssd | | Was the SSD in the data stream when specific data was taken? |
bbc | | Was the BBC in the data stream when specific data was taken? |
bsmd | | Was the Barrel EMC SMD in the data stream when specific data was taken? |
esmd | | Was the End-Cap SMD in the data stream when specific data was taken? |
zdc | | Was the Zero-Degree Calorimeter in the data stream when specific data was taken? |
tpx | | Was the TPX (tpc-X) information in the data stream when the data was taken? |
fgt | | Was the Forward GEM Tracker information saved in this data stream? |
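A common use of the path and filename keywords is to rebuild the full location of a file. The sketch below works on one made-up output record, as it might look for -keys path,filename with the default "::" delimiter. The directory and the ".daq" extension are invented for illustration.

```shell
#!/bin/sh
# One hypothetical output record: path and filename separated by "::".
record='/star/data/example::st_physics_12114010_raw_4040002.daq'

# Split the record at the first "::" and join both parts with a slash.
path=${record%%::*}
name=${record#*::}
fullpath="$path/$name"
echo "$fullpath"
```

Since -delim sets the separator between output fields, passing "/" as the delimiter with the same two keywords should produce the joined form directly, without any post-processing.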
The following keywords are for either internal use or specific management purposes. They have no meaning to users (but are unique).
flid | Access the FileLocation ID of the FileLocation table |
fdid | Access the FileData ID of the FileData table |
rfdid | Access the FileData ID of the FileLocation table |
pcid | Access the ProductionCondition ID of the ProductionConditions table |
rpcid | Access the ProductionCondition ID of the FileData table |
rpid | Access the runParam ID of the runParams table |
rrpid | Access the runParam ID of the FileData table |
ftid | Access the FileType ID of the FileTypes table |
rftid | Access the FileType ID of the FileData table |
stid | Access the storageType ID of the StorageTypes table |
rtid | Access the storageType ID of the FileLocations table |
ssid | Access the storageSite ID of the StorageSites table |
rssid | Access the storageSite ID of the FileLocations table |
tcfdid | Access the FileData ID of the TriggerCompositions table |
tctwid | Access the TriggerWords ID of the TriggerCompositions table |
twid | Access the TriggerWords ID of the TriggerWords table |
dcid | Access the detectorConfiguration ID of the DetectorConfigurations table |
rdcid | Access the detectorConfiguration ID of the RunParams table |

lgnm | An aggregate keyword returning an equivalent of the logical name |
lgpth | An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location) |
fulld | An aggregate keyword returning a string completely defining all metadata for real data |
fulls | An aggregate keyword returning a string completely defining all metadata for simulation data |
Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.
keyword | Notes | Meaning |
simulation | | Is the data a simulation? |
nounique | In script mode, this keyword is set to 0 (i.e. unique fields), which may slow down your scripts tremendously. In the get_file_list.pl user interface, however, it is set to 1 by default (unique fields are not ensured). | Should the module return all fields instead of only unique selected fields? |
noround | | Turns off rounding of magfield and collision energy |
startrecord | | The Perl module will skip the first startrecord records and start returning data beginning with the next one |
limit | | The Perl module will return at most limit records |
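The startrecord and limit keywords, like the -start and -limit options of get_file_list.pl, allow a large selection to be read in chunks. The sketch below prints the commands for such a chunked read without running them. It assumes that -start takes the number of records to skip, as startrecord does, and the total of 250 matching records is a made-up value.

```shell
#!/bin/sh
# Dry run: print the commands that would fetch 250 records in chunks of 100.
# Assumption: -start is the number of records to skip, like startrecord.
total=250    # hypothetical number of matching records
chunk=100    # matches the default -limit
start=0
cmds=""
while [ "$start" -lt "$total" ]; do
  line="get_file_list.pl -keys filename -start $start -limit $chunk"
  echo "$line"
  cmds="$cmds$line;"
  start=$((start + chunk))
done
```

In practice the total is usually not known in advance, so a real script would more likely stop when a chunk returns fewer lines than the requested limit.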