Pico DST issues

 I want to collect in one place all Pico DST issues, requirements, and estimates.

Pico DST issues to discuss (numbers are taken from 2016 AuAu200 sample):
I. Micro => Pico: To access the whole STAR data set on disk we need a reduction factor of ~10.
    a. 1 M events stored as MuDst take 1 TB on disk; 1 B events require 1 PB. 
In Runs 14 (3.3 B with HFT) and 16 (7.6 B with HFT) we took in total 10.9 B events.
    b. STAR has ~12 PB of distributed disk and 1 PB of gpfs disk (in total: pwg 0.4 PB + pwd_tasks 0.1 PB + 0.5 PB for the rest).
     see slide 14 of
    c. The total statistics with HFT "in" is 3.3 B events for Run 14 and 7.6 B for Run 16. The usable statistics are ~1 B events for Run 14 and ~1.3 B events for Run 16.
    d. The main limiting factor in the Micro to Pico DST conversion is HPSS. 
The limit of 20 MB/s * 60 * 60 * 24 = 1.72 TB/day corresponds to a MuDST => PicoDST conversion rate of ~2 M events/day.
    e. To solve this problem we need to apply selection criteria:
       1. Define trigger sets for the analysis,
       2. Reject bad runs,
       3. Select good events (Vpd vertex match, ...),
       4. Select good tracks (no. of fit points, DCA of track to primary vertex < 50 cm (reduces size by a factor of 2), ...).
II. The track parameter covariance matrix contains essential information. A comparison of open charm reconstruction shows
that using the covariance matrix (KFParticle) increases the significance in practically all channels by a factor of ~1.5. 
Since significance scales as the square root of the statistics, this effectively increases the statistics by a factor of ~2.
III. KFParticle and KFParticleFinder have been sped up in their matrix calculations (SIMD via the Vc library, compiler optimization, ...)  
to allow analysis in a very CPU-efficient way.
  a. -O2 with a recent gcc compiler (> 4.8) speeds up the code by a factor of 2 with respect to the non-optimized option;
   see x86 https://drupal.star.bnl.gov/STAR/blog/fisyak/compare-cpu-event-gcc-482-492-521-and-icc-1700-compilers-2013-pp510-w-sample-hlt-farm
IV. The main limiting factor for analysis is reading from disk. We have 15 k nodes for processing, but the limited number of disk servers 
limits our ability to read from disk (~20 MB/s per server).
V. Branching: To reduce reading we can use TTree branch selection. Because our TTree is completely split, the branches have the granularity of 
a single PicoDST variable. For simple tasks you can use TTreeIter, developed by Victor about 20 years ago: $STAR/StRoot/StarRoot/TTreeIter.txt
VI. ROOT provides packing of variables while writing a TTree to file. See  
void WriteFloat16(Float_t* f, TStreamerElement* ele = 0) 
and especially the plot of the number of significant digits for the different packing modes.
   This means that in StPicoTrack, for example, instead of doing the packing by hand like
  Short_t  mNSigmaPion;       // nsigmaPi * 100
  Short_t  mNSigmaKaon;       // nsigmaK * 100
  Short_t  mNSigmaProton;     // nsigmaP * 100
  Short_t  mNSigmaElectron;   // nsigmaE * 100
inline Float_t StPicoTrack::nSigmaPion() const { return mNSigmaPion / 100.f; }
inline Float_t StPicoTrack::nSigmaKaon() const { return mNSigmaKaon / 100.f; }
inline Float_t StPicoTrack::nSigmaProton() const { return mNSigmaProton / 100.f; }
inline Float_t StPicoTrack::nSigmaElectron() const { return mNSigmaElectron / 100.f; }
 you can just define 
  Float16_t  mNSigmaPion;       //[-327.67,327.67,16]  nsigmaPi
  Float16_t  mNSigmaKaon;       //[-327.67,327.67,16] nsigmaK
  Float16_t  mNSigmaProton;     //[-327.67,327.67,16] nsigmaP
  Float16_t  mNSigmaElectron;   //[-327.67,327.67,16] nsigmaE
 (note: no "!" in the comment, since "//!" marks a member as transient and it would not be streamed at all) and ROOT will take care of the packing and unpacking, and you will eliminate a lot of senseless code.