User manual

The command line interface to the FileCatalog

The FileCatalog database contains information about all files used in production for the STAR experiment. The database itself and the PERL module used to manipulate it are described later in this document. Below you will find the description of the command line utility used to access data in this database.
The utility is called get_file_list.pl. If issued without any arguments it will print the following usage message:

% get_file_list.pl [-all] -keys keyword[,keyword,...] \
  [-cond keyword=value[,keyword=value,...]] [-start #] [-limit #] [-delim $St] \
  [-onefile] [-o outputfile]
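When get_file_list.pl is driven from another program, it can help to assemble its arguments as a list. The Python sketch below does this for the options shown in the usage message. The helper name build_get_file_list_args is hypothetical (it is not part of the FileCatalog distribution), and it only produces "=" conditions joined with commas, following the synopsis.

```python
import shlex


def build_get_file_list_args(keys, cond=None, start=None, limit=None,
                             delim=None, all_entries=False, onefile=False,
                             output=None):
    """Assemble an argument list for get_file_list.pl (hypothetical helper).

    keys: keywords, joined with commas for -keys.
    cond: mapping of keyword -> value, rendered as keyword=value and
          joined with commas for -cond (only '=' is produced here).
    """
    args = ["get_file_list.pl"]
    if all_entries:
        args.append("-all")
    args += ["-keys", ",".join(keys)]
    if cond:
        args += ["-cond", ",".join(f"{k}={v}" for k, v in cond.items())]
    if start is not None:
        args += ["-start", str(start)]
    if limit is not None:
        args += ["-limit", str(limit)]
    if delim is not None:
        args += ["-delim", delim]
    if onefile:
        args.append("-onefile")
    if output is not None:
        args += ["-o", output]
    return args


if __name__ == "__main__":
    argv = build_get_file_list_args(
        ["path", "filename"],
        cond={"filetype": "daq_reco_dst", "runnumber": 12114010},
        limit=10,
    )
    # shlex.join (Python 3.8+) renders the list as a shell command line,
    # quoting any argument that needs it.
    print(shlex.join(argv))
```

Passing such a list to Python's subprocess.run (without shell=True) hands each argument to the program unchanged, so the single quotes recommended for -cond are only needed when the command is typed in a shell.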

Command line options

The command line options are described below:

-all use all entries regardless of the availability flag. The default is to show only entries with available=1.
-alls use all entries regardless of the sanity flag. The default is to show only entries with sanity=1, unless sanity was used as a key.
-onefile a special mode of operation: returns a list of files, but gives only one location (the one with the highest persistence) for each file, even if the database holds many.
-keys specify what data you want to get from the database. A list of valid keywords, separated by commas, should follow this parameter. See the examples for clarification, and the description of the aggregate functions for some more sophisticated tricks.
-cond specify the conditions limiting the returned dataset. A list of valid expressions (each consisting of a valid keyword, a valid operator and a value), separated by commas, should follow this parameter. Since some of the operators are special characters, the list of expressions should always be enclosed in single quotes. Again, see the examples for more explanation.
-start # specify the record number # to start from. The default is to start with the first record; together with -limit this can be used to retrieve the data in chunks.
-limit # limit the number of records returned (default 100; a value of 0 means an unlimited number of records).
-rlimit # limit the number of unique LFNs (logical file names) returned. Note that the number of output lines may exceed the rlimit value. Using -rlimit switches the -limit logic off; the two cannot be used at the same time.
-delim <string>
specify the characters that separate the fields in the output (default: "::")
-V print the module version and exit
-as <scope>
-as <site:scope>
connect to the FileCatalog database as specified. Valid scopes are {Admin|User}. The site should be specified for a multi-site deployment.
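The fields of each output line appear in the order given to -keys, separated by the -delim string. A minimal Python sketch of reading that output back, assuming exactly one record per line and a delimiter that never occurs inside a value; the function name parse_records and the sample values are illustrative only.

```python
def parse_records(text, keys, delim="::"):
    """Turn get_file_list.pl output into a list of dicts, one per line.

    Assumes each non-blank line holds the -keys fields in order,
    separated by `delim` (the -delim value, "::" by default).
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split(delim)
        if len(fields) != len(keys):
            # A mismatch usually means the delimiter also occurs in a value.
            raise ValueError(
                f"expected {len(keys)} fields, got {len(fields)}: {line!r}")
        records.append(dict(zip(keys, fields)))
    return records


if __name__ == "__main__":
    sample = "/some/path::file_a.daq\n/other/path::file_b.daq\n"  # made-up output
    for rec in parse_records(sample, ["path", "filename"]):
        print(rec["path"] + "/" + rec["filename"])
```

Raising an error on a field-count mismatch, rather than silently regrouping fields, makes a badly chosen delimiter show up immediately.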


 

Supported comparison or selection operators

<= Not greater than
< Less than
>= Not less than
> Greater than
<> Not equal to
!= Not equal to
= Equal to
!~ Not containing (i.e. not matching) the string
~ Containing (i.e. approximately matching) the string
[] In range
][ Outside the range
% Modulo integer
%% Not modulo integer
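Because operators such as < and <= share a prefix, an expression has to be matched against the longest operator first. The Python sketch below splits a single -cond expression into keyword, operator and value this way. It is an illustration of the operator table, not the module's actual parser, and it assumes keywords consist only of letters, digits and underscores. Characters such as <, >, ~ and ! are also the reason the -cond list must be single-quoted in a shell.

```python
import re

# Comparison operators from the table above. Sorting longest first
# makes "<=" and "<>" win over "<", ">=" over ">", and "%%" over "%".
OPERATORS = sorted(["<=", "<", ">=", ">", "<>", "!=", "=", "!~", "~",
                    "[]", "][", "%%", "%"], key=len, reverse=True)

# Assumption: keywords are made of letters, digits and underscores.
KEYWORD = re.compile(r"[A-Za-z0-9_]+")


def split_condition(expr):
    """Split one -cond expression into (keyword, operator, value)."""
    m = KEYWORD.match(expr)
    if not m:
        raise ValueError(f"no keyword at the start of {expr!r}")
    keyword, rest = m.group(0), expr[m.end():]
    for op in OPERATORS:
        if rest.startswith(op):
            return keyword, op, rest[len(op):]
    raise ValueError(f"no known operator after {keyword!r} in {expr!r}")


if __name__ == "__main__":
    for expr in ["runnumber<=12114010", "filename!~adc", "runnumber%%2"]:
        print(split_condition(expr))
```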

 

Logical operators

The following logical operators can also be used in a query. They apply within a -cond expression, in the form keyword=Value1 LogicalOperator Value2 {LogicalOperator Value3 ...}

|| Logical OR (strings or numbers)
&& Logical AND (strings or numbers)


Note that the logical AND operator will return an empty selection in most cases (for example, a runnumber cannot equal Value1 and Value2 at the same time). It was added for future extensions of the database: selections based on meta-data such as triggered events (a file contains many triggers) would be a case where this operator applies.
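The remark about && can be made concrete with a small model: a keyword such as runnumber holds a single value per file, while trigger meta-data holds many. The Python sketch below uses hypothetical helpers (join_values, condition_matches) and equality conditions only; the trigger names are placeholders. It shows why keyword=V1&&V2 cannot match a single-valued field but can match a multi-valued one.

```python
def join_values(keyword, values, logic="||"):
    """Render keyword=V1||V2||... for use in -cond (hypothetical helper)."""
    return f"{keyword}={logic.join(str(v) for v in values)}"


def condition_matches(record_values, wanted, logic="||"):
    """Model of keyword=V1||V2 (any) and keyword=V1&&V2 (all).

    record_values: set of values a file has for the keyword; one value
    for runnumber, possibly several for trigger meta-data.
    """
    wanted = set(wanted)
    if logic == "||":
        return bool(record_values & wanted)  # at least one listed value
    if logic == "&&":
        return wanted <= record_values       # every listed value present
    raise ValueError(f"unsupported logical operator {logic!r}")


if __name__ == "__main__":
    print(join_values("runnumber", [12114010, 12114011]))
    # A single run number can never equal two values at once.
    print(condition_matches({12114010}, [12114010, 12114011], "&&"))
    # A file carrying several triggers can satisfy an AND of two of them.
    print(condition_matches({"trigA", "trigB", "trigC"}, ["trigA", "trigB"], "&&"))
```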

The aggregate functions

These are special aggregate functions. They can be used in conjunction with any keyword that describes some data. Note that most of them only make sense for numerical values. See the examples for a description of how to use them.

sum

The sum of the values

avg

The average of the values

min

The minimum of the values

max

The maximum of the values

orda

Sort the output in ascending order by this keyword

ordd

Sort the output in descending order by this keyword

count

The count for a given selection

grp

Group the output: put all the records with the same value for a given keyword together. This is required whenever an aggregate function is used together with other keywords (multiple-keyword syntax).
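The effect of grp combined with an aggregate can be pictured as grouping rows and reducing each group. The Python sketch below models grp on runnumber together with sum on events over an in-memory list. It does not query the database, the rows are made-up numbers, and the result is ordered by the group key much as orda would order it. On the command line this presumably corresponds to something like -keys 'grp(runnumber),sum(events)', but that function(keyword) form is an assumption here; the examples remain the reference for the exact syntax.

```python
from collections import defaultdict


def group_and_sum(rows, group_key, value_key):
    """Model of grp(group_key) with sum(value_key) on in-memory rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    # Ascending order of the group key, similar to what orda would give.
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    rows = [  # made-up numbers, one entry per file
        {"runnumber": 2, "events": 7},
        {"runnumber": 1, "events": 10},
        {"runnumber": 1, "events": 5},
    ]
    print(group_and_sum(rows, "runnumber", "events"))
```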


 

Keyword list

Here are the keywords that can be used in the -keys and -cond contexts. The keywords follow a color scheme:
Keywords in blue are currently supported by the database schema but not used by the production scripts, and are therefore unfilled (or poorly filled).
Keywords in aqua are automatically updated (there is no need to reset them).
Keywords in magenta are filled but may need updating (do not rely strongly on their values).

keyword         Meaning and notes

site            The site where the data is stored, e.g. BNL, LBL
sitecmt         The site comment string
siteloc         A full string describing the site location in the world
storage         The storage medium, e.g. HPSS, NFS, local disk. Note that local disk storage does not provide a unique file location; one must also select on node.
node            The name of the node where the data is stored (necessary to locate local disk storage)
path            The path to a specific copy of the file
filename        The name of the data file
sname1          The (short) name of the data file with the extensions removed, e.g. "st_physics_12114010_raw_4040002"
sname2          The (short) name of the data file with only the file name prefix remaining, e.g. "st_physics". Useful, for example, to isolate st_physics files while rejecting "st_physics_adc" files.
filetype        The type of the file, e.g. "daq_reco_dst", "MC_fzd", etc.
extension       The extension of the file, directly tied to the file type (each file type has an associated extension)
events          Number of events or entries in the file
size            The size of the data file
fileseq         The file sequence as determined during data taking by DAQ. Arbitrary for simulation and processed files.
stream          The file stream, if applicable (default is 0)
md5sum          The file's md5 checksum. Note: early stages of the database fill did not update this field; it may return 0.
production      The production tag with which a given file was produced. Can also be "raw" or "simulation".
library         The library version this file was produced with
trgsetupname    The name of the online trigger setup. Note: used to encode the path in production.
trgname         The name of one trigger in a collection of triggers associated with a runnumber
trgcount        The number of events with the associated trgname for a given runnumber
trgword         The trigger word associated with one trigger in a collection. Note: available for Year4 data and beyond, for DAQ files.
trgversion      The trigger word version associated with a trgname
trgdefinition   The trigger definition of one trigger in a collection
runtype         The type of the run, e.g. "physics", "laser", "pulser", "pedestal", "test", but also "simulation" for simulated datasets
configuration   The detector configuration name. A detector configuration is the combination of detectors that were present during data taking in a given run. Note that the configuration/geometry combination is unique (but neither of the two alone is).
geometry        The geometry definition for a given simulation set
runnumber       The number of the run. Arbitrary for simulations.
runcomments     The comments for a given run
collision       The collision type, specified in the form <first particle><second particle><collision energy>, e.g. "AuAu200"
datetaken       The date the data was taken. Arbitrary for simulation. Note: the format was corrupted during the conversion from the old to the new Catalog; it can (and will) be recovered.
magscale        The name of the magnetic field scale, e.g. FullField
magvalue        The actual magnetic field value
filecomment     The comment for the file
owner           The owner of the file
protection      The protection or read/write permissions, given in a format similar to UNIX 'ls -l'. Note: subject to change.
available       Is the file available? (0 if it cannot be retrieved from HPSS or the file disappeared from disk)
persistent      Is the file persistent?
createtime      The time the file was created, in the format YYYYmmddHHMMSS. Note: only HPSS files have a createtime that is not subject to change.
inserttime      The time the file data was inserted into the database
simcomment      The comments for the simulation
generator       The event generator name
genversion      The event generator version
gencomment      The event generator comments
genparams       The event generator parameters
tpc             Was the TPC in the data stream when the data was taken?
svt             Was the SVT in the data stream when the data was taken?
tof             Was the TOF in the data stream when the data was taken?
emc             Was the B-EMC in the data stream when the data was taken?
eemc            Was the E-EMC in the data stream when the data was taken?
fpd             Was the FPD in the data stream when the data was taken?
ftpc            Was the FTPC in the data stream when the data was taken?
pmd             Was the PMD in the data stream when the data was taken?
rich            Was the RICH in the data stream when the data was taken?
ssd             Was the SSD in the data stream when the data was taken?
bbc             Was the BBC in the data stream when the data was taken?
bsmd            Was the Barrel EMC SMD in the data stream when the data was taken?
esmd            Was the End-Cap SMD in the data stream when the data was taken?
zdc             Was the Zero-Degree Calorimeter in the data stream when the data was taken?
tpx             Was the tpx (TPC-X) information in the data stream when the data was taken?
fgt             Was the Forward GEM Tracker information saved in this data stream?
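The difference between sname1, sname2 and a plain substring match can be approximated as follows. This Python sketch assumes that all extensions start at the first "." and that the prefix kept by sname2 ends just before the first all-digit token (the run number). The catalog computes these fields itself, so treat this only as an illustration; the file names in it are hypothetical examples.

```python
def sname1(filename):
    """Approximate sname1: drop everything from the first '.' onward."""
    return filename.split(".", 1)[0]


def sname2(filename):
    """Approximate sname2: keep the tokens before the first all-digit one."""
    prefix = []
    for token in sname1(filename).split("_"):
        if token.isdigit():
            break  # assumed to be the run number
        prefix.append(token)
    return "_".join(prefix)


if __name__ == "__main__":
    names = ["st_physics_12114010_raw_4040002.daq",      # hypothetical
             "st_physics_adc_12114010_raw_4040002.daq"]  # hypothetical
    for name in names:
        print(name, "->", sname1(name), sname2(name), "st_physics" in name)
```

A substring test for "st_physics" is true for both names, whereas sname2 separates them into "st_physics" and "st_physics_adc", which is why an equality condition on sname2 isolates the st_physics files.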


 

The following keywords are for internal use or specific management purposes. They have no meaning for users (but are unique).

flid      Access the FileLocation ID of the FileLocation table
fdid      Access the FileData ID of the FileData table
rfdid     Access the FileData ID of the FileLocation table
pcid      Access the ProductionCondition ID of the ProductionConditions table
rpcid     Access the ProductionCondition ID of the FileData table
rpid      Access the runParam ID of the runParams table
rrpid     Access the runParam ID of the FileData table
ftid      Access the FileType ID of the FileTypes table
rftid     Access the FileType ID of the FileData table
stid      Access the storageType ID of the StorageTypes table
rtid      Access the storageType ID of the FileLocations table
ssid      Access the storageSite ID of the StorageSites table
rssid     Access the storageSite ID of the FileLocations table
tcfdid    Access the FileData ID of the TriggerCompositions table
tctwid    Access the TriggerWords ID of the TriggerCompositions table
twid      Access the TriggerWords ID of the TriggerWords table
dcid      Access the detectorConfiguration ID of the DetectorConfigurations table
rdcid     Access the detectorConfiguration ID of the RunParams table
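Reading the list above, the r-prefixed keywords appear to hold the ID of a row in another table (for example, rfdid is a FileData ID stored in the FileLocation table). Under that assumption, the Python sketch below shows how one FileData entry can own several FileLocations entries, which is the situation -onefile collapses to a single location. The rows are invented for illustration and the function name is hypothetical.

```python
def locations_by_file(file_data, file_locations):
    """Group FileLocations rows under their FileData row via rfdid == fdid.

    Raises KeyError if a location refers to an unknown fdid.
    """
    files = {fd["fdid"]: {**fd, "locations": []} for fd in file_data}
    for loc in file_locations:
        files[loc["rfdid"]]["locations"].append(loc["flid"])
    return files


if __name__ == "__main__":
    file_data = [{"fdid": 1, "filename": "file_a.daq"},
                 {"fdid": 2, "filename": "file_b.daq"}]
    file_locations = [{"flid": 10, "rfdid": 1},   # e.g. a copy in HPSS
                      {"flid": 11, "rfdid": 1},   # e.g. a copy on disk
                      {"flid": 12, "rfdid": 2}]
    for fdid, entry in locations_by_file(file_data, file_locations).items():
        print(fdid, entry["filename"], entry["locations"])
```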

 

 

lgnm      An aggregate keyword returning the equivalent of the logical name
lgpth     An aggregate keyword returning a logical path (a string which uniquely characterizes the file's location)
fulld     An aggregate keyword returning a string completely defining all meta-data for real data
fulls     An aggregate keyword returning a string completely defining all meta-data for simulation data

 

Here are the keywords not connected to a specific field in the database. They change the behaviour of the module itself.

keyword         Meaning and notes

simulation      Is the data a simulation?
nounique        Should the module return all fields, instead of only unique selected fields? Note: in script mode (using the PERL module directly), this keyword defaults to 0 (unique fields only), which may slow down your scripts tremendously. In the get_file_list.pl user interface, however, it defaults to 1 (uniqueness is not enforced).
noround         Turns off rounding of the magnetic field and collision energy values.
startrecord     The PERL module will skip the first startrecord records and start returning data from the next one.
limit           The PERL module will return at most limit records.
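A minimal Python model of how nounique, startrecord and limit shape a result list. It assumes that uniqueness is applied before records are skipped and counted (as with SQL SELECT DISTINCT ... LIMIT) and that limit=0 means no limit, as documented for the -limit option; neither is confirmed for the module itself. The chunks helper, which produces successive (startrecord, limit) windows for reading data in pieces, is hypothetical; the total it needs could come from a count query.

```python
def select(rows, startrecord=0, limit=100, nounique=1):
    """Model of nounique, startrecord and limit on a list of hashable rows.

    Assumptions: duplicates are removed (nounique=0) before paging, and
    limit=0 means "no limit", as documented for the -limit option.
    """
    if not nounique:
        seen = set()
        unique_rows = []
        for row in rows:
            if row not in seen:  # keep the first occurrence only
                seen.add(row)
                unique_rows.append(row)
        rows = unique_rows
    rows = rows[startrecord:]  # skip the first startrecord records
    return rows if limit == 0 else rows[:limit]


def chunks(total, size):
    """Yield (startrecord, limit) windows covering `total` records."""
    for start in range(0, total, size):
        yield start, size


if __name__ == "__main__":
    rows = [("a",), ("a",), ("b",), ("c",)]
    print(select(rows, nounique=0))
    for start, size in chunks(len(rows), 2):
        print(select(rows, startrecord=start, limit=size))
```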