Data Carousel Quick Start/Tutorial

Under:

The Function of the Data Carousel

The Data Carousel should be used by STAR users to retrieve data from HPSS. The purpose of the data carousel is to organize the requests of users to retrieve data and to prevent chaos. User file requests are stored in a MySQL database. The Data Carousel consists of a collection of perl scripts (written by Jérôme Lauret) which provides the user with a simple way to submit requests. Keeping track of the number of requests made by the physics analysis and/or hardware/software groups ("accounting") is made by aa server which takes care of submitting requests according to needs.
In addition, the Data Carousel will warn you if a file you are requesting has already been requested by another user, so you do not waste the bandwidth in trying to restore something which is already there.

IMPORTANT Note: If you have an AFS HOME directory, you will NOT be able to use the Data Carousel tool due to a Unix-to-AFS poor hand-shaking (authentication failure). The solution : move back to NFS !!! There are more and more tools we discover on a monthly basis which does not work with AFS home directories ...

Getting Started

The client-perl scripts can be executed from any of the rcas nodes.

To do this, you need to create a file containing your requests from HPSS. There are several format you may use. All of the above example will assume you want to restore files from HPSS into a disk path /star/rcf/test/carousel/. You can of course restore files only to directories you actually have access to so, adapt the examples accordingly.

  • In its simpliest form, and if you want to restore a list of files in the SAME directory, you may write a file.lis composed of one HPSS file specification per line
    HPSSFile
    
    For example, this file may contain the following lines
    % echo "/home/starreco/hpsslogs/logfile01_9903260110"  >file.lis
    % echo "/home/starreco/hpsslogs/logfile01_9903260110" >>file.lis
    
    Note at this stage that the HPSS file name may be specified as a relative path. The default prepended path is /home/starreco and the above file could have been written like this
    % echo "hpsslogs/logfile01_9903260110"  >file.lis
    % echo "hpsslogs/logfile01_9903260110" >>file.lis
    
    However, it is a good idea to write the HPSS file names with no ambiguities.

     

  • The second format is used whenever you want to restore a list of files but in diverse directories. In this case, the file2.lis is composed of key pairs
    HPSSFile TargetFiles
    

    For example, file2.lis may contain the following request:

    /home/starreco/reco/central/P01hb/2000/08/st_physics_1235013_raw_0002.dst.root /star/rcf/test/carousel/New/jeromel/bla.dst
    
    In this example, input and output file do not necessarily match. It is your entire choice to organized how you want to save those files. Also, it is to be noted that the current version of the Data Carousel will actually create the target directory if it does not exists. So, beware of the potential mess you may create if you miss-type the output file names ...

    As another example, file3.lis contains 4 files to be transferred:

    /home/starreco/reco/central/P01hb/2000/08/st_physics_1235013_raw_0001.event.root /star/rcf/test/carousel/New/jeromel/physics/st_physics_1235013_raw_0001.event.root
    /home/starreco/reco/central/P01hb/2000/08/st_physics_1235013_raw_0001.hist.root /star/rcf/test/carousel/New/jeromel/physics/st_physics_1235013_raw_0001.hist.root
    /home/starreco/reco/central/P01hb/2000/08/st_physics_1235013_raw_0001.runco.root /star/rcf/test/carousel/New/jeromel/physics/st_physics_1235013_raw_0001.runco.root
    /home/starreco/reco/central/P01hb/2000/08/st_physics_1235013_raw_0001.tags.root /star/rcf/test/carousel/New/jeromel/physics/st_physics_1235013_raw_0001.tags.root
    

Note that the full documentation specifies that the TargetFiles should be specified in the form pftp://user@node.domain.zone/UnixVisiblePathFromThatNode/Filename where user is your user name, and node.domain.zone being the node where to restore the file to. However, la later version of the Data Carousel (V01.150 or up) will add "smartness" in choosing the node to connect to in order to access the disk you want. But if you want to use the full syntax (actually the preferred unbreakable method), you MUST specify the machine where the disk physically sits. Be aware that doing otherwise will create unnecessary NFS traffic and will slow down your file restoration.

Running the Scripts

Now you are ready to execute hpss_user.pl (this should already be in your path at this stage). You should execute it from a cas node.
Following our examples, you would issue one of the following

% hpss_user.pl -r /star/rcf/test/carousel/New/ -f file.lis
% hpss_user.pl /home/starreco/hpsslogs/logfile01_9903271502 /star/rcf/test/carousel/New/logfile01_9903271502
% hpss_user.pl -f file2.lis
$ hpss_user.pl -f file3.lis

The first and second line are both equivalent. They both restore the file /home/starreco/hpsslogs/logfile01_9903271502 in /star/rcf/test/carousel/New/ ; the first using a file list, the second a request fully specified from the command line. Whenever -r is specified, this option MUST be specified prior to the -f option.

NOTE hpss_user.pl script will alter your ~/.shosts file and your ~/.netrc file. If these files do not already exist in your area, they will be automatically created. If they do already exist in your area, then they will simply be updated. The .shosts file must contain an entry to allow starrdat to remotly access your account, and the .netrc file allows you to access HPSS (via pftp) without prompting you for a username and password.

 

What Happens Next?

On the server end (rmds05), starrdat runs the server script. This script is invoked every 10 minutes by a cron job. The server script inspects the MySQL database and provides a list of input and output files to the script ORNL batch system (adapted by Tom Throwe) and report the operation status in its own internal accounting tables. In short, this process restores the files from HPSS.

 

Viewing your requests in MySQL database

In order to inspect the submissions, you may use a script which inspects the content of the Accounting table. The last column of that table reflects the Success of the reason for a failure.

User hooks

General introduction

The Data Carousel may call user scripts or user hooks at each file transfer. The mechanism works as follow :

  • A directory named $HOME/carousel or $HOME/.carousel must first exist for this feature to work.
  • 2 user hooks script may be used. They have fixed names (you CANOT change those names). They are named beforeFtp.pl and afterFtp.pl. They will be called prior and after each file restoration. They are user based, that is, only applicable to you and what you ask for.
  • Both MUST be perl scripts. They DO NOT have to start with any perl command since the content of those files will be parsed, interpreted and executed externally by the main process (within an eval() statement).
  • They MUST return a value which will be interpreted as an error level or status. A value of 0 is considered as being normal (nothing happened, 0 errors). In the case of beforeFtp.pl, any other value will make the Data Carousel porcess to abort the transaction. So, it is of the outmost importance that you take good care of having this script well debugged ... otherwise, you will lose file restore operations.
    aftreFtp.pl user hook, if used tio return a non zero value, currently have no effect whatsoever on the Data Carousel process itself. However, you should also know that this script is called ONLY if everything went accordign to plan during the transfer. If something went wrong, the Data Carousel will revert to its normal error trap mechanism and will skip the execution of the post user hook.
  • Finally, it is noteworthy to mention th at ANY of the calls to those hooks will be killed after some time-out. So, using sleep(), long loops, pseudo-interractive process under the Data Carousel process and other fun you may come up with will not do you any good and only lead to killed transactions.

What are those hooks for ?

Those user hooks have been provided to help user execute commands/action before and after the file transfer. Amongst the possibilities are actions like

  • Send yourself an Email whe a file is restored (or any other method of keeping track of what's going on).
  • Checking if the disk where you want to restore your file has enough space to do so.
  • Starting an LSF batch after the file has been successfully restored
  • .... many other usage ...

There a certain number of variables you may use in both of those scripts and which will be known globally :

  • $inFile the HPSS input file name
  • $outFile the output file name
  • $fsize the file size. BEWARE : The currently broken ORNL batch system is not reporting this size and it may appear as -1 from within beforeFtp.pl
  • $dirnameThe directory name (extracted from $outFile)
  • $HOME your home directory

An example of beforeFtp.pl script follows. Here, we check the available space on the target disk and rebuild a list of files we cannot retore due to space restrictions. This list will be palattable to the Data Carousel for later re-submit ...

# Solaris only. You should write a more fancy script for a
# full proof remaining disk size procedure.
$result = `/bin/df -k $dirname | awk '{print $2}' | tail -1`;

if ($fsize == -1){
   # Here, we return 0 but you can also take a guesstimate
   # of the minimal required space and save time.
   return 0;
} else {
   # The file size is known ...
   if ($result < $fsize){
      # will skip this file, keep the list though
      # so we can re-submit later on ...
      open(FO,">>$HOME/skipped.lis");
      print FO "$inFile $outFile\n";
      close(FO);
      return 1;
   } else {
      return 0;
   }
}

In this afterFtp.pl script, we keep trakc of what is restored (note that only successful restored call this user hook):

if ( open(FO,">>$HOME/success.lis")){   # keep track of files restored  
   print FO "$outFile restored on ".localtime()."\n";
   close(FO);
}
chmod(oct(775),$outFile);               # make this file group rw :-) 

1;                                      # always return success - be aware that failure <=> retry 

Feel free to try and correct those examples yourself ...

 

Final comment

Note that those hooks are there to help you accomplish tasks which otherwise would required external scripts. But please, think carefully of the script you are writting and keep in mind that they will be executed for each file restored ... For example, a script doing a du -k instead of a df will be disastrous on the NFS server's load. Or a script touching or doing a stat() of all files in a directory tree for each file restored from HPSS would be an equally bad and considered as sabotage... In other words, keep your user hooks lite weighted.