A first simple and immediate consideration is to minimize tape mount and dismount operations, which cause latencies and therefore performance drops. Since we use the DataCarousel for most restore operations, let us summarize its features.
The DataCarousel (DC) is an HPSS front end whose main purpose is to coordinate requests from many uncorrelated clients. Its main assumption is that all requests are asynchronous: you make a request from one client and it is satisfied “later” (as soon as possible). In other words, the DC aggregates all requests from all clients (many users could be considered as separate clients), re-orders them according to policies, and possibly merges multiple requests for the same source into one request to the mass storage. The DC system itself is composed of a light client program (script), a plug-and-play, policy-based server component (script) and a permanent process (compiled code) interfacing with the mass storage using HPSS API calls (this component is known as the “Oakridge Batch”, although its current code content has little to do with the original idea from the Oak Ridge National Laboratory). Client and server interact via a database component that isolates them completely from each other (although they share the same API, a Perl module).
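The aggregation of requests for the same source can be illustrated with a minimal sketch. This is not the DataCarousel code (which is a Perl module backed by a database); the function name, the client names and the paths below are hypothetical and only show the idea of merging duplicate requests while remembering who is waiting for each file.

```python
from collections import OrderedDict

def aggregate_requests(requests):
    """Merge requests for the same source into one mass storage request.

    `requests` is an iterable of (client, source_path) pairs in arrival
    order. Returns an OrderedDict mapping each source path to the list
    of clients waiting for it. Hypothetical helper, not the DC API.
    """
    merged = OrderedDict()
    for client, source in requests:
        waiting = merged.setdefault(source, [])
        if client not in waiting:
            waiting.append(client)
    return merged

# Three client requests, two of them for the same file (made-up paths).
pending = [
    ("alice", "/hpss/run1/file_a"),
    ("bob", "/hpss/run1/file_b"),
    ("carol", "/hpss/run1/file_a"),
]
merged = aggregate_requests(pending)
for source, clients in merged.items():
    print(source, "requested by", ", ".join(clients))
```

Only two requests would reach the mass storage in this example, and both clients waiting for the shared file would be served by the same restore.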
Policies may throttle the amount of data by group (quota, bandwidth percentage per user, etc., i.e. fair-share of the request queue) but may also perform tape access optimization such as grouping requests by tape ID (for an equivalent share, all requests for the same tape are grouped together regardless of the time at which each request was made or its position in the request queue). The policy could be anything one can come up with based on the available information: historical usage, currently pending requests, and the characteristics of those requests (file type, user, class of service, file size, ...). The DC then submits bundles of requests to the daemon component; each bundle contains N files and is known as a “job”. The DC submits K of those jobs before stopping and observing the mass storage behavior: if the jobs go through, more are submitted; otherwise, the server either stops or proceeds with a recovery procedure and consistency checks (it assumes that no reaction and no unit of work being performed is a sign of MSS failure). In other words, the DC is also error resilient and recovers from intrinsic HPSS failures (which are monitored). Whenever files are moved from tape to cache in the MSS, a callback to the DC server is made and a captive account connection is initiated to pull the file out of the mass storage cache to more permanent storage.
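As an illustration of the tape-ID grouping policy and of the bundling into jobs, here is a small sketch. It assumes each pending request is a (file, tape ID) pair; the function names, file names and tape labels are hypothetical, and the real policies are plug-ins of the DC server rather than this code.

```python
def group_by_tape(requests):
    """Regroup queued requests so files on the same tape are contiguous.

    `requests` is a list of (path, tape_id) pairs in queue order. Tapes
    keep the order of their first appearance in the queue; files keep
    their relative order within a tape.
    """
    by_tape = {}
    for path, tape_id in requests:
        by_tape.setdefault(tape_id, []).append(path)
    return [path for files in by_tape.values() for path in files]

def make_jobs(files, files_per_job):
    """Cut the ordered file list into jobs of at most `files_per_job` files (N)."""
    return [files[i:i + files_per_job]
            for i in range(0, len(files), files_per_job)]

# Made-up queue: requests for tapes T01 and T02 are interleaved.
queue = [("f1", "T02"), ("f2", "T01"), ("f3", "T02"),
         ("f4", "T01"), ("f5", "T03")]
ordered = group_by_tape(queue)
jobs = make_jobs(ordered, files_per_job=2)   # N = 2 for the example
first_batch = jobs[:2]                       # K = 2 jobs submitted first
print(ordered)
print(first_batch)
```

With the grouping applied, each of the first two jobs touches a single tape, whereas the original queue order would have alternated between tapes.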
While the policy is clearly a source of optimization (as far as the user is concerned), from a DataCarousel “post policy” perspective, at least N*K files are being requested at any point in time. In reality, more jobs are submitted, and the consumption of this “overflow” of jobs is used to monitor whether the MSS is alive. The N*K files represent a total number of files which should match the number of threads allowed by the daemon. The current settings are K=50 and N=15, with an overflow allowed up to 25. The daemon itself can treat requests simultaneously according to a “depth”. Those calls to HPSS are however only advisory. The depth is set to 30 for the DST COS and 20 for the Raw COS. The deeper the request queue, the more files are requested simultaneously, but this means that the daemon also has to start more threads, as previously noted. Those parameters have been shown to influence performance to some extent (within 10%), with however a large impact on response time: the larger the request stack, the “less instantaneous” the response from a user's perspective (since the request queue is longer).
The daemon is able to organize X requests into a list sorted by tape ID and number of requests per tape. A few strategies are available to alter the performance. We chose to enable “start with the tape with the largest number of files requested”. In addition, since our queue depth is rather small compared to the ideal number of files (K) per job, we order the files requested by the user by tape ID. Both optimizations are in place and lead to a 20% improvement under realistic usage (bulk restore, Xrootd, other user activities).
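The “start with the tape with the largest number of files requested” strategy can be sketched as a sort on the per-tape request count. The daemon itself is compiled code; the Python below is only an illustration, and the function name and sample data are assumptions.

```python
from collections import Counter

def order_by_busiest_tape(requests):
    """Order (path, tape_id) requests so the busiest tape comes first.

    Tapes are ranked by the number of files requested from them (largest
    first), ties are broken by tape ID, and the sort is stable, so files
    keep their relative order within a tape.
    """
    counts = Counter(tape_id for _, tape_id in requests)
    return sorted(requests, key=lambda r: (-counts[r[1]], r[1]))

# Made-up request list: T02 holds three files, T01 two, T03 one.
requests = [("f1", "T01"), ("f2", "T02"), ("f3", "T02"),
            ("f4", "T03"), ("f5", "T02"), ("f6", "T01")]
for path, tape_id in order_by_busiest_tape(requests):
    print(tape_id, path)
```

Serving the busiest tape first maximizes the number of files restored per mount, at the cost of delaying requests for tapes that hold only a few of the requested files.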
Optimization based on tape ID would need to be better quantified (graph, average restore rate) for several classes of files and usage. TBD.
The tape ID program is a first implementation returning partial information. In particular, MSS failures are not currently handled, which leads to the tape ID being set to -1 (since there is no way to tell whether this is an error, a file missing in HPSS, or even a file known to the MSS metadata server but located on a bad tape). Work in progress.
The queue depth parameters should be studied and adjusted according to the K and N values. However, this would need to respect the machine and hardware capabilities. The more powerful the machine the better, but this is likely a matter of fine tuning. This needs to be done with great care as the hardware is also shared by multiple experiments. Ideally, the compiled daemon should auto-adjust to the DC API settings (and respect command line parameters for queue depth). TBD.
Currently, the daemon threads handling the HPSS API calls and those handling the callbacks share the same pool. This reduces the number of threads available for communication with the mass storage and therefore causes performance fluctuations (callback threads could get “stuck” or come in “waves”; we observed a cosine-like behavior perhaps related to this issue). TBD.
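One possible remedy is to give the API calls and the callbacks separate thread pools, so that slow callbacks cannot starve the threads talking to the mass storage. The sketch below shows the idea with Python's standard library; the daemon is compiled code, and `stage_file`, `handle_callback` and the pool sizes are hypothetical placeholders rather than actual HPSS or DC calls.

```python
from concurrent.futures import ThreadPoolExecutor

def stage_file(path):
    """Placeholder for the HPSS API staging call (hypothetical)."""
    return path

def handle_callback(path):
    """Placeholder for the callback work, e.g. pulling the file from cache."""
    return f"retrieved {path}"

def restore(paths, api_threads=4, callback_threads=2):
    """Run staging and callbacks in two independent pools.

    A stalled callback only occupies a thread of the callback pool, so
    the number of threads available for staging stays constant.
    """
    with ThreadPoolExecutor(max_workers=api_threads) as api_pool, \
         ThreadPoolExecutor(max_workers=callback_threads) as callback_pool:
        staged = [api_pool.submit(stage_file, p) for p in paths]
        done = [callback_pool.submit(handle_callback, f.result())
                for f in staged]
        return [d.result() for d in done]

print(restore(["file_a", "file_b", "file_c"]))
```

Whether two fixed-size pools are the right split for the daemon would need to be measured; the sketch only shows that the two kinds of work need not compete for the same threads.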
Average (bytes) | Average (MB) | File type
943240627 | 899 | MC_fzd |
666227650 | 635 | MC_reco_geant |
561162588 | 535 | emb_reco_event |
487783881 | 465 | online_daq |
334945320 | 319 | daq_reco_laser |
326388157 | 311 | MC_reco_dst |
310350118 | 295 | emb_reco_dst |
298583617 | 284 | daq_reco_event |
246230932 | 234 | daq_reco_dst |
241519002 | 230 | MC_reco_event |
162678332 | 155 | MC_reco_root_save |
93111610 | 88 | daq_reco_MuDst |
52090140 | 49 | MC_reco_MuDst |
17495114 | 16 | MC_reco_minimc |
14982825 | 14 | daq_reco_emcEvent |
14812257 | 14 | emb_reco_geant |
12115661 | 11 | scaler |
884333 | 0 | daq_reco_hist |
This section seems rather academic considering the improvement perspectives of the previous sections.
In this section, we will discuss optimization based on file size, perhaps isolated by PVR or COS. This will be possible in future runs but would lead to a massive repackaging of files and data from past years.
Further reading:
The following is the man page for htar.
NAME
    htar - HPSS tar utility

PURPOSE
    Manipulates HPSS-resident tar-format archives.

SYNOPSIS
    htar -{c|t|x|X} -f Archive [-?] [-B] [-E] [-L inputlist] [-h] [-m] [-o] [-d debuglevel] [-p] [-v] [-V] [-w] [-I {IndexFile | .suffix}] [-Y [Archive COS ID][:Index File COS ID]] [-S Bufsize] [-T Max Threads] [Filespec | Directory ...]

DESCRIPTION
    htar is a utility which manipulates HPSS-resident archives by writing files to, or retrieving files from the High Performance Storage System (HPSS). Files written to HPSS are in the POSIX 1003.1 "tar" format, and may be retrieved from HPSS, or read by native tar programs.

    For those unfamiliar with HPSS, an introduction can be found on the web at http://www.sdsc.edu/hpss

    The local files used by the htar command are represented by the Filespec parameter. If the Filespec parameter refers to a directory, then that directory, and, recursively, all files and directories within it, are referenced as well.

    Unlike the standard Unix "tar" command, there is no default archive device; the "-f Archive" flag is required.

Archive and Member files
    Throughout the htar documentation, the term "archive file" is used to refer to the tar-format file, which is named by the "-f filename" command line option. The term "member file" is used to refer to individual files contained within the archive file.

WHY USE HTAR
    htar has been optimized for creation of archive files directly in HPSS, without having to go through the intermediate step of first creating the archive file on local disk storage, and then copying the archive file to HPSS via some other process such as ftp or hsi. The program uses multiple threads and a sophisticated buffering scheme in order to package member files into in-memory buffers, while making use of the high-speed network striping capabilities of HPSS.

    In most cases, it will be significantly faster to use htar to create a tar file in HPSS than to either create a local tar file and then copy it to HPSS, or to use tar piped into ftp (or hsi) to create the tar file directly in HPSS.

    In addition, htar creates a separate index file (see next section), which contains the names and locations of all of the member files in the archive (tar) file. Individual files and directories in the archive can be randomly retrieved without having to read through the archive file. Because the index file is usually smaller than the archive file, it is possible that the index file may reside in HPSS disk cache even though the archive file has been moved offline to tape; since htar uses the index file for listing operations, it may be possible to list the contents of the archive file without having to incur the time delays of reading the archive file back onto disk cache from tape.

    It is also possible to create an index file for a tar file that was not originally created by htar.

HTAR Index File
    As part of the process of creating an archive file on HPSS, htar also creates an index file, which is a directory of the files contained in the archive. The Index File includes the position of member files within the archive, so that files and/or directories can be randomly retrieved from the archive without having to read through it sequentially. The index file is usually significantly smaller in size than the archive file, and may often reside in HPSS disk cache even though the archive file resides on tape. All htar operations make use of an index file.

    It is also possible to create an index file for an archive file that was not created by htar, by using the "Build Index" [-X] function (see below).

    By default, the index filename is created by adding ".idx" as a suffix to the Archive name specified by the -f parameter. A different suffix or index filename may be specified by the "-I" option, as described below.

    By default, the Index File is assumed to reside in the same directory as the Archive File. This can be changed by specifying a relative or absolute pathname via the -I option. The Index file's relative pathname is relative to the Archive File directory unless an absolute pathname is specified.

HTAR Consistency File
    HTAR writes an extra file as the last member file of each Archive, with a name similar to:

        /tmp/HTAR_CF_CHK_64474_982644481

    This file is used to verify the consistency of the Archive File and the Index File. Unless the file is explicitly specified, HTAR does not extract this file from the Archive when the -x action is selected. The file is listed, however, when the -t action is selected.

Tar File Restrictions
    When specifying path names that are greater than 100 characters for a file (POSIX 1003.1 USTAR format), remember that the path name is composed of a prefix buffer, a / (slash), and a name buffer. The prefix buffer can be a maximum of 155 bytes and the name buffer can hold a maximum of 100 bytes. Since some implementations of TAR require the prefix and name buffers to terminate with a null ('\0') character, htar enforces the restriction that the effective prefix buffer length is 154 characters (+ trailing zero byte), and the name buffer length is 99 bytes (+ trailing zero byte). If the path name cannot be split into these two parts by a slash, it cannot be archived. This limitation is due to the structure of the tar archive headers, and must be maintained for compliance with standards and backwards compatibility. In addition, the length of a destination for a hard or symbolic link (the 'link name') cannot exceed 100 bytes (99 characters + zero-byte terminator).

HPSS Default Directories
    The default directory for the Archive file is the HPSS home directory for the DCE user. An absolute or relative HPSS path can optionally be specified for either the Archive file or the Index file. By default, the Index file is created in the same HPSS directory as the Archive file.

Use of Absolute Pathnames
    Although htar does not restrict the use of absolute pathnames (pathnames that begin with a leading "/") when the archive is created, it will remove the leading / when files are extracted from the archive. All extracted files use pathnames that are relative to the current working directory.

HTAR USAGE
    Two groups of flags exist for the htar command: "action" flags and "optional" flags. Action flags specify the operation to be performed by the htar command, and are specified by one of the following:

        -c, -t, -x, -X

    One action flag must be selected in order for the htar command to perform any useful function.

File specification (Filespec)
    A file specification has one of the following forms:

        WildcardPath or Pathname or Filename

    WildcardPath is a path specification that includes standard filename pattern-matching characters, as specified for the shell that is being used to invoke htar. The pattern-matching characters are expanded by the shell and passed to htar as command line arguments.

Action Flags
    Action flags defined for htar are as follows:

    -c
        Creates a new HPSS-resident archive, and writes the local files specified by one or more File parameters into the archive. Warning: any pre-existing archive file will be overwritten without prompting. This behavior mimics that of the AIX tar utility.

    -t
        Lists the files in the order in which they appear in the HPSS-resident archive. Listable output is written to standard output; all other output is written to standard error.

    -x
        Extracts the files specified by one or more File parameters from the HPSS-resident archive. If the File parameter refers to a directory, the htar command recursively extracts that directory and all of its subdirectories from the archive.

        If the File parameter is not specified, htar extracts all of the files from the archive. If an archive contains multiple copies of the same file, the last copy extracted overwrites all previously extracted copies. If the file being extracted does not already exist on the system, it is created. If you have the proper permissions, then the htar command restores all files and directories with the same owner and group IDs as they have on the HPSS tar file. If you do not have the proper permissions, then files and directories are restored with your owner and group IDs.

    -X
        Builds a new index file by reading the entire tar file. This operation is used either to reconstruct an index for tar files whose Index File is unavailable (e.g., accidentally deleted), or for tar files that were not originally created by htar.

Options
    -?
        Displays htar's verbose help.

    -B
        Displays block numbers as part of the listing (-t option). This is normally used only for debugging.

    -d debuglevel
        Sets debug level (0 - N) for htar. 0 disables debug, 1 - n enable progressively higher levels of debug output. 5 is the highest level; anything > 5 is silently mapped to 5. 0 is the default debug level.

    -E
        If present, specifies that a local file should be used for the file specified by the "-f Archive" option. If not specified, then the archive file will reside in HPSS.

    -f Archive
        Uses Archive as the name of archive to be read or written. Note: This is a required parameter for htar, unlike the standard tar utility, which uses a built-in default name.

        If the Archive variable specified is - (minus sign), the tar command writes to standard output or reads from standard input. If you write to standard output, the -I option is mandatory, in order to specify an Index File, which is copied to HPSS if the Archive file is successfully written to standard output. [Note: this behavior is deferred - reading from or writing to pipes is not supported in the initial version of htar].

    -h
        Forces the htar command to follow symbolic links as if they were normal files or directories. Normally, the tar command does not follow symbolic links.

    -I index_name
        Specifies the index file name or suffix. If the first character of the index_name is a period, then index_name is appended to the Archive name, e.g. "-f the_htar -I .xndx" would create an index file called "the_htar.xndx". If the first character is not a period, then index_name is treated as a relative pathname for the index file (relative to the Archive file directory) if the pathname does not start with "/", or an absolute pathname otherwise.

        The default directory for the Index file is the same as for the Archive file. If a relative Index file pathname is specified, then it is appended to the directory path for the Archive file. For example, if the Archive file resides in HPSS in the directory "projects/prj/files.tar", then an Index file specification of "-I projects/prj/files.old.idx" would fail, because htar would look for the file in the directory "projects/prj/projects/prj". The correct specification in this case is "-I files.old.idx".

    -L InputList
        Writes the files and directories listed in the "InputList" file to the archive. Directories named in the InputList file are not treated recursively. For directory names contained in the InputList file, the tar command writes only the directory entry to the archive, not the files and subdirectories rooted in the directory. Note that "home directory" notation ("~") is not expanded for pathnames contained in the InputList file, nor are wildcard characters, such as "*" and "?".

    -m
        Uses the time of extraction as the modification time. The default is to preserve the modification time of the files. Note that the modification time of directories is not guaranteed to be preserved, since the operating system may change the timestamp as the directory contents are changed by extracting other files and/or directories. htar will explicitly set the timestamp on directories that it extracts from the Archive, but not on intermediate directories that are created during the process of extracting files.

    -o
        Provides backwards compatibility with older versions (non-AIX) of the tar command. When this flag is used for reading, it causes the extracted file to take on the User and Group ID (UID and GID) of the user running the program, rather than those on the archive. This is the default behavior for the ordinary user. If htar is being run as root, use of this option causes files to be owned by root rather than the original user.

    -p
        Says to restore fields to their original modes, ignoring the present umask. The setuid, setgid, and sticky bit permissions are also restored to the user with root user authority.

    -S bufsize
        Specifies the buffer size to use when reading or writing the HPSS tar file. The buffer size can be specified as a value, or as kilobytes by appending any of "k", "K", "kb", or "KB" to the value. It can also be specified as megabytes by appending any of "m" or "M" or "mb" or "MB" to the value, for example, 23mb.

    -T max_threads
        Specifies the maximum number of threads to use when copying local member files to the Archive file. The default is defined when htar is built; the release value is 20. The maximum number of threads actually used is dependent upon the local file sizes, and the size of the I/O buffers. A good approximation is usually

            buffer size / average file size

        If the -v or -V option is specified, then the maximum number of local file threads used while writing the Archive file to HPSS is displayed when the transfer is complete.

    -V
        "Slightly verbose" mode. If selected, file transfer progress will be displayed in interactive mode. This option should normally not be selected if verbose (-v) mode is enabled, as the outputs for the two different options are generated by separate threads, and may be intermixed on the output.

    -v
        "Verbose" mode. For each file processed, displays a one-character operation flag, and lists the name of each file. The flag values displayed are:

            "a" - file was added to the archive
            "x" - file was extracted from the archive
            "i" - index file entry was created (Build Index operation)

    -w
        Displays the action to be taken, followed by the file name, and then waits for user confirmation. If the response is affirmative, the action is performed. If the response is not affirmative, the file is ignored.

    -Y auto | [Archive CosID][:IndexCosID]
        Specifies the HPSS Class of Service ID to use when creating a new Archive and/or Index file. If the keyword auto is specified, then the HPSS hints mechanism is used to select the archive COS, based upon the file size. If -Y cosID is specified, then cosID is the numeric COS ID to be used for the Archive File. If -Y :IndexCosID is specified, then IndexCosID is the numeric COS ID to be used for the Index File. If both COS IDs are specified, the entire parameter must be specified as a single string with no embedded spaces, e.g. "-Y 40:30".

HTAR Memory Restrictions
    When writing to an HPSS archive, the htar command uses a temporary file (normally in /tmp) and maintains in memory a table of files; you receive an error message if htar cannot create the temporary file, or if there is not enough memory available to hold the internal tables.

HTAR Environment
    HTAR should be compiled and run within a non-DCE HPSS environment.

Miscellaneous Notes
    1. The maximum size of a single Member file within the Archive is approximately 8 GB, due to restrictions in the format of the tar header. HTAR does not impose any restriction on the total size of the Archive File when it is written to HPSS; however, space quotas or other system restrictions may limit the size of the Archive File when it is written to a local file (-E option).

    2. HTAR will optionally write to a local file; however, it will not write to any file type except "regular files". In particular, it is not suitable for writing to magnetic tape. To write to a magnetic tape device, use the "tar" or "cpio" utility.

Exit Status
    This command returns the following exit values:

        0    Successful completion.
        >0   An error occurred.

Examples
    1. To write the file1 and file2 files to a new archive called "files.tar" in the current HPSS home directory, enter:

        htar -cf files.tar file1 file2

    2. To extract all files from the project1/src directory in the Archive file called proj1.tar, and use the time of extraction as the modification time, enter:

        htar -xm -f proj1.tar project1/src

    3. To display the names of the files in the out.tar archive file within the HPSS home directory, enter:

        htar -tvf out.tar

Related Information
    For file archivers: the cat command, dd command, pax command.

    For HPSS file transfer programs: pftp, nft, hsi

    File Systems Overview for System Management in AIX Version 4 System Management Guide: Operating System and Devices explains file system types, management, structure, and maintenance.

    Directory Overview in AIX Version 4 Files Reference explains working with directories and path names.

    Files Overview in AIX Version 4 System User's Guide: Operating System and Devices provides information on working with files.

    HPSS web site at http://www.sdsc.edu/hpss

Bugs and Limitations
    - There is no way to specify relative Index file pathnames that are not rooted in the Archive file directory without specifying an absolute path.

    - The initial implementation of HTAR does not provide the ability to append, update or remove files. These features, and others, are planned enhancements for future versions.
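The path name restriction described under "Tar File Restrictions" in the man page above can be checked before repackaging files, which matters when many long paths are involved. The sketch below is not part of htar; the function name is hypothetical, and it simply applies the stated limits (154 bytes for the prefix, 99 bytes for the name, split at a slash).

```python
PREFIX_MAX = 154  # effective prefix buffer length enforced by htar
NAME_MAX = 99     # effective name buffer length enforced by htar

def split_for_ustar(path):
    """Return (prefix, name) if `path` fits the tar header, else None.

    Short paths fit entirely in the name buffer. Longer paths must be
    split at a slash into a prefix and a name that both respect the
    limits; if no slash allows this, the path cannot be archived.
    """
    raw = path.encode("utf-8")          # the limits are expressed in bytes
    if len(raw) <= NAME_MAX:
        return ("", path)
    for i, byte in enumerate(raw):
        if byte != ord("/"):
            continue
        prefix, name = raw[:i], raw[i + 1:]
        if 0 < len(prefix) <= PREFIX_MAX and 0 < len(name) <= NAME_MAX:
            return (prefix.decode("utf-8"), name.decode("utf-8"))
    return None

# Made-up paths: one that can be split and one that cannot.
ok_path = "d" * 120 + "/" + "f" * 50
bad_path = "x" * 200
print(split_for_ustar(ok_path) is not None)
print(split_for_ustar(bad_path))
```

Running such a check over a candidate file list before calling htar would flag the paths that cannot be archived, instead of discovering them during the transfer.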