STAR DB API v2

STAR DB API v2

This page holds STAR database API v2 progress. Here are major milestones :

  1. Components Overview (UML)
  2. Core Components technology choice:
    1. [done] New version of XML configuration file + XML Schema for verification (.xsd available, see attachment);
    2. [done] New XML parser library :
      • [done] rapidXML (header-only, fast, non-validating parser) - will use this one as a start;
      • [not done] xerces-c (huge, validating, slow parser);
    3. [done] Encryption support for XML configuration :
      • [done] XOR-SHIFT-BASE64;
      • [done] AES128, AES256;
      • [CYPHER-TYPE]-V1 means that [CYPHER-TYPE] was used with parameter set 1 (e.g.: predefined AES key 'ABC');
    4. [done] Threads support for Load Balancer, Data preload, Cache search
      • pthreads - popular, widely used;
    5. [done] Database Abstraction Layer reshape :
      • OpenDBX (MySQL, Postresql, Sqlite3, Oracle) - API level abstraction, will require "personalized" sql queries for our system;
    6. [done] In-memory cache support [requires C++ -> DB -> C++ codec update]:
      • memcached (local/global, persistent, with expiration time setting);
      • simple hashmap (local - in memory cache, per job, non-persistent);
    7. [in progress] Persistent cache support :
      • hypercache (local, persistent, timestamped, file-based cache);
  3. Supported Features list
    • OFFLINE DB: read-heavy
    • ONLINE DB: write-heavy
  4. AOB

--------------------------------------------------------------------------------------------------------------
New Load Balancer (abstract interface + db-specific modules) :

  • dbServers.xml, v1, v2 xml config support;
  • db load : number of running threads;
  • db load : response time;
  • should account for slave lag;
  • should allow multiple instances with different tasks and databases (e.g. one for read, another for write);
  • should randomize requests if multiple "free" nodes found;

OPEN QUESTIONS

Should we support <databases></databases> tag with new configuration schema?

<StDbServer>
<server> run2003 </server>
<host> onldb.starp.bnl.gov </host>
<port> 3501 </port>
<socket> /tmp/mysql.3501.sock </socket>
<databases> RunLog, Conditions_rts, Scalers_onl, Scalers_rts </databases>
</StDbServer>

Hypercache ideas and plans

--- HYPERCACHE ---

Definitions :
1. persistent representation of STAR Offline db on disk;
2. "database on demand" feature;

Each STAR Offline DB request is :

  1.   fetched from original db server by timestamp (or run number);
  2.   stored on disk in some format for later use;
  3.   subsequent requests check disk cache first, try db next, eventually building local db cache accessible by db name/table + timestamp;

3. data on disk is to be partitioned by :
 a) "db path" AND ("validity time" OR "run number");

POSSIBLE IMPLEMENTATIONS:
a) local sqlite3 database + data blobs as separate files, SINGLE index file like "/tmp/STAR/offline.sqlite3.cache" for ALL requests;
b) local sqlite3 database + data blobs as separate files, MULTIPLE index files, one per request path. Say, request is "bemc/mapping/2003/emcPed", therefore, we will have "/tmp/STAR/sha1(request_path).sqlite3.cache" file for all data entries;
c) local embedded MySQL server (possibly, standalone app) + data blobs as separate files;
d) other in-house developed solution;

-----------------------------------------------------------------------------------------------------------------------------

SQLITE 3 database table format : [char sha1( path_within_subsystem ) ] [timestamp: beginTime] [timestamp: endTime] [seconds: expire] [char: flavor]

SQLITE 3 table file : /tmp/STAR_OFFLINE_DB/ sha1 ( [POOL] / [DOMAIN] / [SUBSYSTEM] ) / index.cache

SQLITE 3 blob is located at : /tmp/STAR_OFFLINE_DB/ sha1 ( [POOL] / [DOMAIN] / [SUBSYSTEM] ) / [SEGMENT] / sha1( [path_within_subsystem][beginTime][endTime] ).blob.cache

[SEGMENT] = int [0]...[N], for faster filesystem access

database access abstraction libraries overview

Name Backends C/C++ Linux/Mac version Multithreading lib/drivers RPM available Performance Licence
OpenDBX Oracle,MySQL, PostgreSQL, Sqlite3 + more yes/yes yes/yes yes/yes yes/authors fast, close to native drivers LGPL
libDBI MySQL, PostgreSQL, Sqlite3 yes/external yes/yes yes/some yes/Fedora fast, close to native drivers LGPL/GPL
SOCI Oracle,MySQL, PostgreSQL no/yes yes/yes no/partial yes/Fedora average to slow Boost
unixODBC ALL known RDBMS yes/external yes/yes yes/yes yes/RHEL slow LGPL

 

While other alternatives exist ( e.g. OTL,QT/QSql ), I'd like to keep abstraction layer as thin as possible, so my choice is OpenDBX. It supports all databases we plan to use in a mid-term (MySQL, Oracle, PostreSQL, Sqlite3) and provides both C and C++ APIs in a single package with minimal dependencies on other packages.