This document is intended for FileCatalog managers only who have previously deployed an earlier version of API and older database table layout. It is NOT intended for users.
Reasoning for this upgrade and core of the upgrade
One of the major problem with the preceding database layout started to show itself when we reached 4 Million entries (for some reason, we seem to have magic numbers). A dire restriction was the presence of the field 'path' and 'nodename' in the FileLocations table. This table became unnecessarily large (of the order of GB) and sorting and queries would become slow and IO demanding (regardless of our careful indexing). The main action was to move both field to separate tables. This change requires a two step modification :
The steps are more carefully described below ...
Step by step migration instructions
Has to be made in several steps for safety a least interruption of service (although a pain to the manager). Note that you can do that much faster by cutting the Master/slave relationship, disabling all daemons auto-updating the database, proceed with table reshape and normalization script execution, drop and rebuild index, deploy the point-of-no-return API and restore Master/slave relation).
This upgrade is best if you have perl 5.8 or upper. Note that this transition will be the LAST one using perl 5.6 (get ready for a perl upgrade on your cluster).
We will assume you know how to connect to your database from an account able to manipulate and create any tables in the FileCatalog database.
DROP TABLE IF EXISTS FilePaths; CREATE TABLE FilePaths ( filePathID BIGINT NOT NULL AUTO_INCREMENT, filePathName VARCHAR(255) NOT NULL UNIQUE, filePathCount INT, PRIMARY KEY (filePathID) ) TYPE=MyISAM; DROP TABLE IF EXISTS Hosts; CREATE TABLE Hosts ( hostID smallint(6) NOT NULL auto_increment, hostName varchar(30) NOT NULL default 'localhost', hostIDate timestamp(14) NOT NULL, hostCreator varchar(15) NOT NULL default 'unknown', hostCount int(11) default NULL, hostComment text, PRIMARY KEY (hostID), UNIQUE KEY hostName (hostName) ) TYPE=MyISAM;
ALTER TABLE `FileLocations` ADD `filePathID` bigint(20) NOT NULL default '0' AFTER `fileDataID`; ALTER TABLE `FileLocations` ADD `hostID` bigint(20) NOT NULL default '1' AFTER `protection`; UPDATE TABLE `FileLocations` SET hostID=0; # note that I did that one from the Web interface (TBC) INSERT INTO Hosts VALUES(0,'localhost',NOW()+0,'',0,'Any unspecified node'); ALTER TABLE `FileLocations` ADD INDEX ( `filePathID` ) ALTER TABLE `FilePaths` ADD `filePathIDate` TIMESTAMP NOT NULL AFTER `filePathName` ; ALTER TABLE `FilePaths` ADD `filePathCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `filePathIDate` ; ALTER TABLE `FilePaths` ADD `filePathComment` TEXT AFTER `filePathCount`; ALTER TABLE `StorageSites` ADD `storageSiteIDate` TIMESTAMP NOT NULL AFTER `storageSiteLocation` ; ALTER TABLE `StorageSites` ADD `storageSiteCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageSiteIDate` ; ALTER TABLE `StorageSites` DROP `storageComment`; ALTER TABLE `StorageSites` ADD `storageSiteComment` TEXT AFTER `storageSiteCount`; ALTER TABLE `StorageTypes` ADD `storageTypeIDate` TIMESTAMP NOT NULL AFTER `storageTypeName` ; ALTER TABLE `StorageTypes` ADD `storageTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `storageTypeIDate` ; ALTER TABLE `FileTypes` ADD `fileTypeIDate` TIMESTAMP NOT NULL AFTER `fileTypeExtension` ; ALTER TABLE `FileTypes` ADD `fileTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `fileTypeIDate` ; ALTER TABLE `FileTypes` ADD `fileTypeComment` TEXT AFTER `fileTypeCount`; ALTER TABLE `TriggerSetups` ADD `triggerSetupIDate` TIMESTAMP NOT NULL AFTER `triggerSetupComposition` ; ALTER TABLE `TriggerSetups` ADD `triggerSetupCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `triggerSetupIDate`; ALTER TABLE `TriggerSetups` ADD `triggerSetupCount` INT AFTER `triggerSetupCreator`; ALTER TABLE `TriggerSetups` ADD `triggerSetupComment` TEXT AFTER `triggerSetupCount`; ALTER TABLE `EventGenerators` ADD `eventGeneratorIDate` TIMESTAMP NOT NULL AFTER `eventGeneratorParams` ; ALTER TABLE `EventGenerators` ADD `eventGeneratorCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `eventGeneratorIDate` ; ALTER TABLE `EventGenerators` ADD `eventGeneratorCount` INT AFTER `eventGeneratorCreator`; ALTER TABLE `RunTypes` ADD `runTypeIDate` TIMESTAMP NOT NULL AFTER `runTypeName` ; ALTER TABLE `RunTypes` ADD `runTypeCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `runTypeIDate` ; ALTER TABLE `ProductionConditions` DROP `productionComments`; ALTER TABLE `ProductionConditions` ADD `productionConditionIDate` TIMESTAMP NOT NULL AFTER `libraryVersion`; ALTER TABLE `ProductionConditions` ADD `productionConditionCreator` CHAR( 15 ) DEFAULT 'unknown' NOT NULL AFTER `productionConditionIDate`; ALTER TABLE `ProductionConditions` ADD `productionConditionComment` TEXT AFTER `productionConditionCount`; # # This table was not shaped as a dictionary so needs to be re-created # Hopefully, was not filled prior (but will be this year) # DROP TABLE IF EXISTS TriggerWords; CREATE TABLE TriggerWords ( triggerWordID MEDIUMINT NOT NULL AUTO_INCREMENT, triggerWordName VARCHAR(50) NOT NULL, triggerWordVersion CHAR(6) NOT NULL DEFAULT "V0.0", triggerWordBits CHAR(6) NOT NULL, triggerWordIDate TIMESTAMP NOT NULL, triggerWordCreator CHAR(15) DEFAULT 'unknown' NOT NULL, triggerWordCount INT, triggerWordComment TEXT, UNIQUE TW_TriggerCharacteristic (triggerWordName, triggerWordVersion, triggerWordBits), PRIMARY KEY (triggerWordID) ) TYPE=MyISAM;
Deploy the new API CVS version 1.62 of FileCatalog.pm
util/path_convert.pl
util/host_convert.pl
Note that those scripts use a new method $fC->connect_as("Admin"); which assumes that the Master Catalog will be accessed using the XML connection description. Also, it should be obvious that
use lib "/WhereverYourModulAPIisInstalled"; should be replaced by the appropriate path for your site (or test area). Finally, it uses API CVS version 1.62 which supports Xpath and Xnode transitional keywords allowing us to transfer the information from one field to one table.
>SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.filePathID = 0 LIMIT 0, 100 >SELECT FileLocations.fileLocationID FROM FileLocations WHERE FileLocations.hostID = 0 LIMIT 0, 100
You are ready for phase II. Hang on tight now ...
Those steps are no VERY intrusive and potentially destructive. Be careful from here on ...
> ALTER TABLE `FileLocations` ADD INDEX FL_HostIndex (hostID); > ALTER TABLE `FileLocations` DROP INDEX `FL_FileLocationUnique`, ADD UNIQUE (fileDataID, storageTypeID, filePathID, storageSiteID, hostID); # drop the columns not in use anymore / should also get rid of the associated # indexes. > ALTER TABLE `FileLocations` DROP COLUMN nodeName; > ALTER TABLE `FileLocations` DROP COLUMN filePath; # "rename" index / was created with a name difference to avoid clash for transition # now renamed for consistency > ALTER TABLE `FileLocations` DROP INDEX `filePathID`, ADD INDEX FL_FilePathIndex (filePathID);
OK, you should be done. Deploy either CVS version 1.63 which correspond to the FileCatalog API version V01.275 and above ... (by the way, get_file_list.pl -V gives the API version).
# Get connection fills the blanks while reading from XML # However, USER/PASSWORD presence are re-checked #$fC->debug_on(); ($USER,$PASSWD,$PORT,$HOST,$DB) = $fC->get_connection("Admin"); $port = $PORT if ( defined($PORT) ); $host = $HOST if ( defined($HOST) ); $db = $DB if ( defined($DB) ); if ( defined($USER) ){ $user = $USER;} else { $user = "FC_admin";} if ( defined($PASSWD) ){ $passwd = $PASSWD;} else { print "Password for $user : "; chomp($passwd = );} # # Now connect using a fully specified user/passwd/port/host/db # $fC->connect($user,$passwd,$port,$host,$db);
or counting on the full definition in the XML file
$fC = FileCatalog->new(); $fC->connect_as("Admin");