STAR DB API v2

This page holds STAR database API v2 progress. Here are major milestones :

Components Overview (UML)
Core Components technology choice:
1. [done] New version of XML configuration file + XML Schema for verification (.xsd available, see attachment);
2. [done] New XML parser library :
  - [done] rapidXML (header-only, fast, non-validating parser) - will use this one as a start;
  - [not done] xerces-c (huge, validating, slow parser);
3. [done] Encryption support for XML configuration :
  - [done] XOR-SHIFT-BASE64;
  - [done] AES128, AES256;
  - [CYPHER-TYPE]-V1 means that [CYPHER-TYPE] was used with parameter set 1 (e.g.: predefined AES key 'ABC');
4. [done] Threads support for Load Balancer, Data preload, Cache search
  - pthreads - popular, widely used;
5. [done] Database Abstraction Layer reshape :
  - OpenDBX (MySQL, Postresql, Sqlite3, Oracle) - API level abstraction, will require "personalized" sql queries for our system;
6. [done] In-memory cache support [requires C++ -> DB -> C++ codec update]:
  - memcached (local/global, persistent, with expiration time setting);
  - simple hashmap (local - in memory cache, per job, non-persistent);
7. [in progress] Persistent cache support :
  - hypercache (local, persistent, timestamped, file-based cache);
Supported Features list
- OFFLINE DB: read-heavy
- ONLINE DB: write-heavy
AOB

--------------------------------------------------------------------------------------------------------------
New Load Balancer (abstract interface + db-specific modules) :

dbServers.xml, v1, v2 xml config support;
db load : number of running threads;
db load : response time;
should account for slave lag;
should allow multiple instances with different tasks and databases (e.g. one for read, another for write);
should randomize requests if multiple "free" nodes found;

OPEN QUESTIONS

Should we support <databases></databases> tag with new configuration schema?

<StDbServer>
<server> run2003 </server>
<host> onldb.starp.bnl.gov </host>
<port> 3501 </port>
<socket> /tmp/mysql.3501.sock </socket>
<databases> RunLog, Conditions_rts, Scalers_onl, Scalers_rts </databases>
</StDbServer>

Hypercache ideas and plans

--- HYPERCACHE ---

Definitions :
1. persistent representation of STAR Offline db on disk;
2. "database on demand" feature;

Each STAR Offline DB request is :

fetched from original db server by timestamp (or run number);
stored on disk in some format for later use;
subsequent requests check disk cache first, try db next, eventually building local db cache accessible by db name/table + timestamp;

3. data on disk is to be partitioned by :
a) "db path" AND ("validity time" OR "run number");

POSSIBLE IMPLEMENTATIONS:
a) local sqlite3 database + data blobs as separate files, SINGLE index file like "/tmp/STAR/offline.sqlite3.cache" for ALL requests;
b) local sqlite3 database + data blobs as separate files, MULTIPLE index files, one per request path. Say, request is "bemc/mapping/2003/emcPed", therefore, we will have "/tmp/STAR/sha1(request_path).sqlite3.cache" file for all data entries;
c) local embedded MySQL server (possibly, standalone app) + data blobs as separate files;
d) other in-house developed solution;

-----------------------------------------------------------------------------------------------------------------------------

SQLITE 3 database table format : [char sha1( path_within_subsystem ) ] [timestamp: beginTime] [timestamp: endTime] [seconds: expire] [char: flavor]

SQLITE 3 table file : /tmp/STAR_OFFLINE_DB/ sha1 ( [POOL] / [DOMAIN] / [SUBSYSTEM] ) / index.cache

SQLITE 3 blob is located at : /tmp/STAR_OFFLINE_DB/ sha1 ( [POOL] / [DOMAIN] / [SUBSYSTEM] ) / [SEGMENT] / sha1( [path_within_subsystem][beginTime][endTime] ).blob.cache

[SEGMENT] = int [0]...[N], for faster filesystem access

database access abstraction libraries overview

Name	Backends	C/C++	Linux/Mac version	Multithreading lib/drivers	RPM available	Performance	Licence
OpenDBX	Oracle,MySQL, PostgreSQL, Sqlite3 + more	yes/yes	yes/yes	yes/yes	yes/authors	fast, close to native drivers	LGPL
libDBI	MySQL, PostgreSQL, Sqlite3	yes/external	yes/yes	yes/some	yes/Fedora	fast, close to native drivers	LGPL/GPL
SOCI	Oracle,MySQL, PostgreSQL	no/yes	yes/yes	no/partial	yes/Fedora	average to slow	Boost
unixODBC	ALL known RDBMS	yes/external	yes/yes	yes/yes	yes/RHEL	slow	LGPL

While other alternatives exist ( e.g. OTL,QT/QSql ), I'd like to keep abstraction layer as thin as possible, so my choice is OpenDBX. It supports all databases we plan to use in a mid-term (MySQL, Oracle, PostreSQL, Sqlite3) and provides both C and C++ APIs in a single package with minimal dependencies on other packages.