Migration and notes from V01.265 to V01.275

This document is intended only for FileCatalog managers who have previously deployed an earlier version of the API and the older database table layout. It is NOT intended for users.

Reasoning for this upgrade and core of the upgrade

One of the major problems with the preceding database layout started to show itself when we reached 4 million entries (for some reason, we seem to have magic numbers). A dire restriction was the presence of the fields 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB) and sorting and queries became slow and IO demanding (regardless of our careful indexing). The main action was to move both fields to separate tables. This change requires a two-step modification:

  1. reshape of the database (leaving the old fields in place), deployment of the database API in cross mode support
  2. run of the normalization scripts filling the new tables and fields, deployment of the final API and drop of the obsolete columns (+ index rebuild)

The steps are described in more detail below ...
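
For illustration, once normalized, the information previously held in the 'path' and 'nodename' columns is recovered by joining FileLocations with the two new tables (a sketch using the table and column names created in the steps below):

    SELECT FileLocations.fileLocationID, FilePaths.filePathName, Hosts.hostName
      FROM FileLocations, FilePaths, Hosts
     WHERE FileLocations.filePathID = FilePaths.filePathID
       AND FileLocations.hostID     = Hosts.hostID;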

Step by step migration instructions

The migration has to be made in several steps for safety and the least interruption of service (although this is a pain for the manager). Note that you can do it much faster by cutting the Master/slave relationship, disabling all daemons auto-updating the database, proceeding with the table reshape and normalization script execution, dropping and rebuilding the indexes, deploying the point-of-no-return API and restoring the Master/slave relation.
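
If you choose the fast path, cutting the Master/slave relationship amounts to pausing replication on the slave (a sketch; on older MySQL servers the syntax is SLAVE STOP / SLAVE START instead):

    > STOP SLAVE;
    # ... table reshape, normalization scripts and index rebuild on the Master ...
    > START SLAVE;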

This upgrade is best done with perl 5.8 or later. Note that this transition will be the LAST one supporting perl 5.6 (get ready for a perl upgrade on your cluster).

We will assume you know how to connect to your database from an account able to create and manipulate any table in the FileCatalog database.

Steps in Phase I

  1. Create the following tables
      DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths
      (
        filePathID                    BIGINT         NOT NULL         AUTO_INCREMENT,
        filePathName                  VARCHAR(255)   NOT NULL         UNIQUE,
        filePathCount                 INT,
        PRIMARY KEY                   (filePathID)
      ) TYPE=MyISAM;
    
      DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts 
      (
        hostID      smallint(6) NOT NULL auto_increment,
        hostName    varchar(30) NOT NULL default 'localhost',
        hostIDate   timestamp(14) NOT NULL,
        hostCreator varchar(15) NOT NULL default 'unknown',
        hostCount   int(11) default NULL,
        hostComment text,
        PRIMARY KEY (hostID),
        UNIQUE KEY  hostName (hostName)
      ) TYPE=MyISAM;
    
    
  2. Modify some tables and recreate one
         
         ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`;
         ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`;
         UPDATE `FileLocations` SET hostID=0;
    
         # note that I did that one from the Web interface (TBC)
         INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node'); 
    
         ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` );
    
         ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName` ;
         ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate` ;
         ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`;
    
         ALTER TABLE `StorageSites` ADD  `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation` ;
         ALTER TABLE `StorageSites` ADD  `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate` ;
         ALTER TABLE `StorageSites` DROP `storageComment`;
         ALTER TABLE `StorageSites` ADD  `storageSiteComment` TEXT AFTER `storageSiteCount`;
    
         ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName` ;
         ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate` ;
    
    
         ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension` ;
         ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate` ;
         ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`;
    
    
         ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition` ;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupCount`   INT AFTER `triggerSetupCreator`;
         ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT  AFTER `triggerSetupCount`;
    
         ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate` ;
         ALTER TABLE `EventGenerators` ADD `eventGeneratorCount`   INT AFTER `eventGeneratorCreator`;
    
         ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName` ;
         ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate` ;
    
         ALTER TABLE `ProductionConditions` DROP `productionComments`; 
         ALTER TABLE `ProductionConditions` ADD  `productionConditionIDate`   TIMESTAMP NOT NULL AFTER `libraryVersion`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`;
         ALTER TABLE `ProductionConditions` ADD  `productionConditionComment` TEXT AFTER `productionConditionCount`;
    
    
    
         #
         # This table was not shaped as a dictionary so needs to be re-created
         # Hopefully, it was not filled before (but it will be this year)
         #
         DROP TABLE IF EXISTS TriggerWords; CREATE TABLE TriggerWords
         (
            triggerWordID           MEDIUMINT       NOT NULL        AUTO_INCREMENT,
            triggerWordName         VARCHAR(50)     NOT NULL,
            triggerWordVersion      CHAR(6)         NOT NULL DEFAULT "V0.0",
            triggerWordBits         CHAR(6)         NOT NULL,  
            triggerWordIDate        TIMESTAMP       NOT NULL,
            triggerWordCreator      CHAR(15)        DEFAULT 'unknown' NOT NULL,
            triggerWordCount        INT,
            triggerWordComment      TEXT,
            UNIQUE   TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits),
            PRIMARY KEY             (triggerWordID)
         ) TYPE=MyISAM;
  3. Deploy the new API CVS version 1.62 of FileCatalog.pm
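
     For example, from a checkout of the module (a sketch; adjust to your local CVS repository layout):

       % cvs update -r 1.62 FileCatalog.pm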

  4. Run the following utility scripts

    util/path_convert.pl
    util/host_convert.pl

    Note that those scripts use a new method, $fC->connect_as("Admin"), which assumes that the Master Catalog will be accessed using the XML connection description. Also, the line

    use lib "/WhereverYourModuleAPIisInstalled";

    should be replaced by the appropriate path for your site (or test area). Finally, the scripts use API CVS version 1.62, which supports the transitional keywords Xpath and Xnode, allowing us to transfer the information from the old fields to the new tables.

  5. Check that the Hosts table was filled properly and automatically with Creator/IDate
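
     For example, the following query should show a non-empty Creator and IDate for each entry:

       SELECT hostID, hostName, hostCreator, hostIDate FROM Hosts;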
  6. Paranoia step: re-run the scripts from step 4

    At this stage, ideally, nothing should happen (as you have already modified the records).
    A few tips before doing that:
    • % fC_cleanup.pl -modif node=localhost -cond node='' -doit
      would hopefully do nothing, but if you have messed something up in the past, hostName would be NULL and the above would be necessary.
    • After a full update, the following queries should return NOTHING
      % get_file_list.pl -keys flid -cond rfpid=0 -all -alls -as Admin
      % get_file_list.pl -keys flid -cond rhid=0 -all -alls -as Admin

      Those are equivalent to the SQL statements
      >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.filePathID = 0 LIMIT 0, 100
      >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.hostID = 0 LIMIT 0, 100 
    If either query returns anything, contact me for further investigation and database repairs. As a side note, the -as keyword was introduced recently and you should update your get_file_list.pl script if it does not support it.
  7. Make a backup copy of the database for security (optional but safer). Backup can be done by either a mysqldump or, more trivially, a cp -r of the database directory.
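
     For example (a sketch; adjust the database name and paths to your site, and stop the server before copying the raw files):

       % mysqldump --opt FileCatalog > FileCatalog_backup.sql
       # or, with mysqld stopped,
       % cp -r /var/lib/mysql/FileCatalog /some/backup/area/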
  8. Leave it running for a few days (should be fine) for confidence consolidation ;-)

You are ready for phase II. Hang on tight now ...

Steps in Phase II

Those steps are now VERY intrusive and potentially destructive. Be careful from here on ...

  1. Stop all daemons and be sure that, during the rest of the operations, NO command attempts to manipulate the database. If you want to shield your users from the upgrade, stop all Master/slave relations.
  2. Connect to the master FileCatalog as administrator for that database and execute the following SQL commands
      > ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID);
      > ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID);
    
      # drop the columns not in use anymore / should also get rid of the associated
      # indexes.
      > ALTER TABLE `FileLocations` DROP COLUMN nodeName;
      > ALTER TABLE `FileLocations` DROP COLUMN filePath;
    
      # "rename" index / was created with a name difference to avoid clash for transition
      # now renamed for consistency
      > ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX  FL_FilePathIndex (filePathID);
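
      # Optional sanity check (an assumption on my part, not part of the
      # original procedure): the dropped columns should be gone and the
      # FL_FilePathIndex / FL_HostIndex indexes present
      > SHOW COLUMNS FROM `FileLocations`;
      > SHOW INDEX FROM `FileLocations`;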
  3. OK, you should be done. Deploy CVS version 1.63 (or above) of FileCatalog.pm, which corresponds to the FileCatalog API version V01.275 and above ... (by the way, get_file_list.pl -V gives the API version).

A few notes

  • The new API is XML-connection aware via a non-mandatory module named XML::Simple. You should install that module, but there are some limitations if you are using perl 5.6, i.e. you MUST use a schema with ONLY one choice per category (Admin, Master or User).
  • Your scripts will likely need to change if your Database Master and Slave are not on the same node (i.e. the administration account for the FileCatalog can be used only on the database Master, and the regular user account on the Slave). There are a few forms of this, such as the one below:
    # Load the API first (adjust the path for your site, as noted earlier)
    use lib "/WhereverYourModuleAPIisInstalled";
    use FileCatalog;

    $fC = FileCatalog->new();

    # Get connection fills the blanks while reading from XML
    # However, USER/PASSWORD presence are re-checked
    #$fC->debug_on();
    ($USER,$PASSWD,$PORT,$HOST,$DB) = $fC->get_connection("Admin");
    $port = $PORT if ( defined($PORT) );
    $host = $HOST if ( defined($HOST) );
    $db   = $DB   if ( defined($DB) );

    if ( defined($USER) ){   $user = $USER;}
    else {                   $user = "FC_admin";}

    if ( defined($PASSWD) ){ $passwd = $PASSWD;}
    else {                   print "Password for $user : ";
                             chomp($passwd = <STDIN>);}

    #
    # Now connect using a fully specified user/passwd/port/host/db
    #
    $fC->connect($user,$passwd,$port,$host,$db);

    or counting on the full definition in the XML file

    $fC    = FileCatalog->new();
    $fC->connect_as("Admin");
  • Note a small future convenience when XML is ON: connect_as() not only selects who you want to connect as, but also where. In fact, the proper syntax is intent=SITE::User (for example, BNL::Admin is valid, as well as LBL::User). This is only partly supported however.
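    For example (both forms cited above):

      $fC->connect_as("BNL::Admin");   # Admin intent at the BNL site
      $fC->connect_as("LBL::User");    # regular User intent at LBL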
  • The new version of the API automatically adds information to the dictionary tables. In particular, the account under which a new dictionary value was inserted (Creator) and the insertion date (IDate) are filled in automatically. A side effect is that the new API is NOT compatible with the previous database table layout (no backward support will be attempted).