General information

Job openings

STAR scientific and IT opportunities: job openings and positions.

Thank you for your interest in working with STAR.

 


High Performance Computing Postdoctoral Scholar


Berkeley Lab’s Physics Division has two High Performance Computing Postdoctoral Scholar openings. Under the supervision of High Performance Computing experts and Computational Physicists, this role will develop and evaluate new software workflows that exploit the capabilities of the High Performance Computing facilities at the National Energy Research Scientific Computing Center (NERSC). The scholar may also participate in a new research initiative to blend Advanced Scientific Research Computing tools and facilities into High-Energy Physics software to optimize high-dimensional parameter fitting and the tuning of simulations to data. The research will include designing, implementing, and validating a new chain integrating existing HEP tools with advanced optimization tools and approximation techniques, enabling new advances in Monte Carlo simulation predictions at the Large Hadron Collider.

 

Specific Responsibilities:

  • Work on the High Performance Computing (HPC) facilities at NERSC to evaluate existing software workflows, implement new workflows for running experimental software on HPC facilities, and analyze workflows from the ATLAS, ALICE, LUX, LZ, and Daya Bay experiments.

  • Develop and implement new Monte Carlo event generator tuning tools for experiments like ATLAS at the LHC, including the extension of existing tools with more efficient optimization algorithms, tunes against new regions of phase space, and the use of new computational techniques to automate the tuning of many-parameter numerical models.  Explore the inclusion of detector simulation, a computationally expensive process, directly in the tuning process.

  • Conduct original research independently and in collaborations.

  • Interact with LBNL and other investigators working on similar and related scientific problems.

  • Interact with the experimental High Energy and Nuclear Physics communities and the experimental communities involved in the work.

  • Report results to supervisor.
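The generator-tuning workflow described in the responsibilities above can be sketched as follows. This is a minimal, self-contained illustration of surrogate-based chi-squared tuning in the spirit of tools like Professor, not STAR or ATLAS code: the two-parameter "generator", its observable bins, the pseudo-data, and the uncertainties are all invented stand-ins.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical 2-parameter "generator": a cheap stand-in for an expensive
# Monte Carlo event generator producing three observable bins.
def generator(p):
    return np.array([p[0] ** 2 + p[1],
                     p[0] + 2.0 * p[1] ** 2,
                     p[0] * p[1] + 1.0])

true_p = np.array([1.5, -0.7])           # invented "true" tune
data = generator(true_p)                 # pseudo-data to tune against
sigma = np.full_like(data, 0.05)         # assumed per-bin uncertainties

# Step 1: run the "generator" at sampled parameter points and fit a
# per-bin quadratic surrogate of its response.
samples = rng.uniform(-2.0, 2.0, size=(50, 2))

def features(p):                         # quadratic basis in (p0, p1)
    return np.array([1.0, p[0], p[1], p[0] ** 2, p[1] ** 2, p[0] * p[1]])

X = np.array([features(p) for p in samples])
Y = np.array([generator(p) for p in samples])
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # shape (6, 3)

# Step 2: minimize the chi2 of the cheap surrogate against the data,
# instead of calling the expensive generator inside the optimizer.
def chi2(p):
    pred = features(p) @ coef
    return float(np.sum(((pred - data) / sigma) ** 2))

best = minimize(chi2, x0=np.array([1.0, -1.0]), method="Nelder-Mead")
print(best.x)   # should land near the "true" tune
```

Replacing the generator with a polynomial surrogate is what makes many-parameter tunes tractable: the optimizer may evaluate the objective thousands of times, which would be prohibitive against the full simulation.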

Required Qualifications:

  • Ph.D. in Physics, Computer Science, or a related field.

  • Experience with Physics or Nuclear Science software development, workflows, or production.

  • Proficiency with programming languages, including C/C++ and Python.

  • Demonstrated ability to conduct original research independently and as a team member.

  • Good communication and organizational skills.

  • Ability to work as a team member to accomplish goals.

 

Additional Desired Qualifications:

  • Experience in any of the following areas: the ROOT software framework, job scheduling, batch system operation, data management, HPC systems, NUMA and MIC system architectures, and software performance evaluation and optimization.

  • Knowledge of event generators (Pythia8, Herwig7, Sherpa), generator tuning tools (Professor, Rivet), detector simulation (Geant4, Delphes, PGS), and physics analysis in LHC experiments.

 

The following application materials must be submitted through Academic Jobs Online:

  1. Curriculum Vitae.

  2. Cover Letter.

  3. Statement of Interest.

  4. Three letters of reference (to be uploaded to AJO by the referees).



[Original link and more information]

 


 

 

Other links

Meetings, meeting sessions and Reviews


Starting May 2020, after the S&C re-organization under the new STAR management, the S&C management team holds a weekly meeting on Wednesdays from 12:00 to 13:00. The meeting currently takes place on BlueJeans: https://bluejeans.com/727703210.

============Before May 2020 ====================
The S&C group has weekly meetings on Wednesdays from 12:00 to 13:00 (noon to 1 PM) in building 510a, room 1-189 at BNL.
Additional regular meetings include:
  • A Grid operation and activity meeting on Thursdays, 13:00 to 14:00, building 510a, room 1-189 at BNL.
  • Before and during the run, a Friday "run preparation meeting" or "run status meeting" targeted toward the core team, the DAQ, Slow Control and Trigger groups, as well as the software sub-system coordinators.

Phone bridges are provided for these meetings and announced on the mailing lists.

Reviews

2008 - EMC calibration workshop

The EMC calibration workshop was launched with a start-up email to starmail on 9/4/2008 at 10:31. A copy resides below. The review report was requested to be delivered by October 29th, 2008.

See also for further references:

 

 

 

Dear STAR collaborators,

	Driven by findings and (constructive) criticisms of
the past operation workshop on the core team's attention to EMC
Physics, I have flagged the calorimetric physics issues as high
priority for the core team to address and  a key project for
the year.

	As corrective action, I proposed and discussed internally
to the S&C project (including the software coordinators) the
need for a workshop which high-level goal would be to define,
discuss, map the Physics goals and deliverables of our research
program to EMC tasks timelines, milestones and effort levels
needed to reach those goals (by physics topics, analysis). To
answer the requirements needs, and thanks to Bedanga for being
on board with this, specific questions will be asked of the PWG
(via the PWGC). Questions will serve as direct input to the
workshop.


---
	The workshop will take place at BNL on September
29 and 30th (just after the analysis meeting) in the ITD seminar
room (Bldg 515) and I have asked (and charged) Gene Van Buren,
calibration coordinator, to help with the logistic and
post-workshop report.



---
	As a product of the workshop, I have asked Gene Van Buren,
in consultation with the EMC sub-system management (Will Jacobs
as project coordinator and Matthew Walker as software coordinator)
to steer an effort to write a summary report which would emphasize
the roadmap ahead, and the (human) resources available/needed as
well as what we would not be able to accomplish shall we have
missing workforce. Such document would then serve as a clear
statement for an incoming operation workshop where the needs
would be re-iterated, quantification in our hands. The report
would be due by October 17th, well in time for the operation
workshop in November.

	Many thanks to Will Jacobs for his understanding, Matthew
Walker for his prompt response and assistance and Gene Van Buren
for pulling the troops and generating interest as well as taking
this task on the S&C global plan to the level of seriousness it
deserves.

	With hopes you will attend, participate in providing
feedback (through your PWG) and make this workshop a success.
	

-- 
                 Dr. Jerome LAURET
                 RHIC/STAR Software and Computing project Leader
      ,,,,,      Physics Department, Brookhaven National Laboratory
     ( o o )     Bldg 510a, Upton, NY 11973
 ---m---U---m---------------------------------------------
 E-mail: jlauret@bnl.gov

 

2008 - TOF software review

The Time Of Flight sub-system was called for a software readiness and integration review in the Fall of 2008. The committee's charges are available below in the related documents section.

Related documents follow:

 

 

2011 - Sti, CA and Stv tracking component review

This page keeps information related to the 2011 tracking component review. The review will cover the state of the Cellular Automaton (CA) seed-finding component as well as the Virtual Monte-Carlo based tracker (Stv) and their relevance to STAR's future needs in terms of tracking capabilities.

Project goals

After a successful review of the ITTF/Sti tracker in 2004, the STAR collaboration approved the move to the new framework, bringing at the time unprecedented new capabilities to the experiment and to physics analysis. Sti allowed the STAR reconstruction approach to integrate other detector sub-systems into its tracking by providing methods to incorporate simple geometry models and to extrapolate tracks to the non-TPC detector planes, thereby correlating information across detector sub-systems. In 2005, STAR production switched to an Sti-based chain, and we have run in this mode ever since.

However, careful architecture considerations revealed a few areas where improvements seemed needed. Those are:

  • The need to maintain two different geometry models (one for reconstruction, one for simulation) increases the workforce load at a time when STAR is both active and ambitious in its future program while running thin on detector sub-system code developers. Beyond workforce considerations:
    • The two separate geometries have consequences for embedding and simulation, and hence for our ability to bring efficiency corrections to the next level of accuracy.
    • Material budgets were found to be ill-accounted for in reconstruction (dead material was not properly modeled in the Sti framework). The use of a common geometry model would have removed this issue.
  • Sti has some tracking restrictions: geometries made of planes and volumes perpendicular to the beam cannot be treated due to a technical choice (detector elements are organized in planes parallel to the beam, and sub-systems are assumed to be composed of elements replicated in phi). This would preclude tracking in detectors such as the FGT.
    • Our goal was to create an extended set of functionalities providing a truly complete integrated tracking approach, allowing the inclusion of hit information from other detectors (a key goal being the inclusion of detector hits placed in the forward direction).
  • The use of Monte-Carlo-based propagators would allow better access to energy loss (Eloss), better predictors, and track swimming allowing tracking in a non-constant B field (also not possible in Sti).

Additional considerations for the future of STAR were:

  • A single yet flexible geometry model would allow STAR to be ready for GeantX (5 and TGeo based)
  • A flexible geometry model would allow STAR to better cope with the STAR-to-eSTAR migration (geometry morphing)
  • A revitalized framework would allow addressing long-standing issues with the event model in simulation
    • While STAR has a FORTRAN-based approach allowing the integration of some event generators, many newer ones are pure C++ based, making their integration into the STAR simulation framework difficult. A new generic model would allow a "plug-and-play" approach.
    • Support for non-perfect (misaligned) geometries has been lacking in the simulation framework and would be advisable
  • Novel algorithms have appeared in the community, leveraging and harvesting the power of multi-core and many-core architectures. Investigating speed and efficiency gains and evaluating the best use of resources is necessary for STAR's demanding physics program. Equally important, these new algorithms (Cellular Automaton being one) open the door to online (GPU-based) tracking.

 

Based on those considerations, several projects were launched and encouraged:

  • CA-based tracking - the study of the CBM/ALICE Cellular Automaton algorithm for seed finding was launched in collaboration with our GSI colleagues. Multi-core aware, this simple algorithm is thought to provide speed gains over the existing seed finding. Further work (such as an online HLT) could spring from this evaluation if successful. The algorithm was shown to be portable to STAR, thanks to Yuri Fisyak and Ivan Kisel's team, and the product of this evaluation is to be tested.
  • The VMC project - a three-part project (VMC tracking, VMC geometry, VMC simulation framework). The VMC geometry (a.k.a. AgML) has rapidly matured under the care of Jason Webb. The VMC tracking (a.k.a. Stv) has been developed by Victor Perevoztchikov and is thought to provide efficiency equal to Sti's (as well as implement all the features listed above).

We propose to review the AgML, CA, and Stv components of our framework reshape.

 

NB: Beyond the scope of this review, a key goal for VMC is to allow the inclusion of newer Geant versions and hence to get ready to step away from Geant3 (barely maintainable) and the FORTRAN baggage (Zebra and portability issues on 64-bit architectures), and to remove the need for a special version of ROOT (root4star) hard-binding ROOT and STAR-specific runtime non-dynamic libraries.

 

Why a review?

  • All R&D projects are reviewed in STAR
    • The initial approach was to proceed with a "soft" PWG evaluation, but (on second thought) that was not really an option …
    • An internal STAR review process should (and will) be established
  • Advantages
    • A review process provides strong and independent backing of the projects
    • A review process provides an independent set of guidance to management (S&C and PWG) on the path forward
    • Collaboration-wide scrutiny and endorsement across the PWGs lessen the risks of finding problems later
  • Reminder: ITTF / Sti was not carried out without problems
    • The Sti review missed the UPC PWG’s feedback; problems found a posteriori diverted attention and workforce to solving them
    • Problems are seen in HBT and fluctuation analyses when Run 4 is compared to Run 10
      • HBT issues were not seen at the Sti evaluation – is it an analysis problem? Something else?
  • A review will also provide a good time to re-establish a solid baseline and get feedback from the PWGs on open issues, if any
    • This is all the more important as STAR is moving forward to a new set of detectors and high-precision measurements

 

 

Review charges

See attachment at the bottom of this page.

 

Review committee

Status:

  • 2011/08/12 Intent of a review brought to management (charges to be written). The action item was to suggest a set of names for the committee.
  • 2011/08/18 Committee members suggestions provided at management meeting. Spokesperson decides he will contact chair.
  • 2011/09/02 Charges sent to management for comments, along with a note that the charges may be long (the text is for both the committee and the reviewees). No feedback beyond the self-provided note.
  • 2011/10/07 Chair contacted - process of selecting committee being worked out (Spokesperson or)
  • 2011/10/13 Spokesperson delegates committee forming to the review Chair (Olga Evdokimov), S&C Leader (Jerome Lauret) and PAC (Xin Dong)
  • 2011/10/15 Committee assembled
  • 2011/10/31 Draft agenda made
  • 2011/11/01 Agenda presented and feedback requested
  • 2011/11/08 Final agenda crystallized

Members:

  • Olga Evdokimov (chair)
  • Claude Pruneau
  • Jim Thomas
  • Renee Fatemi                [EVO]
  • Aihong Tang
  • Thomas Ullrich              [EVO]
  • Jan Balewski                 [EVO]
  • Anselm Vossen

The agenda is ready and available at a restricted-access node.

Material

Below is a list of cross-references to other documents:

Follow-up of Stv performances

This page will list material for the follow-up review of Stv performances.

Material

Other information

Review findings and key points/problems

The review recommendations for foci were:

  • Hit residuals and track chi2 should be the first problems to be addressed
  • Tracking inefficiencies, pT and charge dependences for reconstructed tracks, and charge separation for high pT tracks need to be reevaluated after Stv tuning and optimization.
  • Implementation of the track extensions to other detector volumes is necessary for the Stv to become the truly integrated STAR tracker.
  • Extension to the TOF detector is recommended to be next on the priority list

 

The review finding details included

  • No documentation is available on the method (in general) or on its specific implementation - this needs to be addressed
    • Appropriate documentation needs to be created for the track predictors to other detector elements, with the detailed workflow of the algorithm to be worked out with the relevant subsystem groups
  • CPU resources: Stv uses twice the CPU resources - the committee recommends addressing this matter before a full deployment is attempted (and along with development)
     
  • The ability of the Stv to properly account for the material budget is unclear at the moment, as performance tests uncovered significant losses in the dead material
  • The importance of optimizing the PPV vertex finder (VF) within Stv (work toward optimization should be pursued by developers from both the S&C and VF teams)
  • The implementation of the track extension method into volumes other than the TPC was not reviewed by this committee. The validity of an extrapolated covariance matrix was not discussed. Integration of all sub-systems, for example FGT and TOF, into the tracking model should become a major focus of the project
  • Integration of forward detector geometries has not been shown
  • The possibility to have track fits using different mass assumptions has not been presented
     
  • Performance results were not all understood:
    • The pT dependence of the tracking reconstruction efficiency is not entirely understood (degraded resolution of charge-sign reconstruction)
    • The ψ resolutions and pulls were found to be slightly worse for Stv compared to Sti.
    • Track-by-track comparison of the Stv and Sti trackers on real data has shown similar momentum resolution and pulls for inclusive tracks; systematic and opposite shifts for positive and negative tracks were observed between tracks found by the two trackers. The discrepancies are shown to grow with transverse momentum.
  • Insufficient supporting evidence was presented with regard to Stv’s ability to handle low-multiplicity, high-pile-up events - this issue should be studied in detail and presented at the next review, before Stv deployment
  • Physics evaluation specifics
    • Stv and StvCA have been reported to have lower efficiencies and unusual phi distributions by the UPC
    • The efficiencies of Stv and StvCA look significantly lower than those from Sti from Jetcorr correlation studies
    • The widths of Lambda and AntiLambda become broader in Stv and StvCA data samples, compared with Sti samples
    • Mass shifts were observed between the Stv and Sti tracker family
    • Spin group reported major losses in jet reconstruction efficiency (factor of 2) due to the loss of low pT tracks in Stv
    • The charge sign discrimination at high pT deteriorates significantly while switching from Sti to Stv.
    • Jetcorr tests uncovered Stv problems with saving global tracks to the MuDst format; they also report incorrect results for pT and yT correlations in Stv

Specific example of the DCA distribution problem (reminder sent 4/13/2012)

 

 

2021 TPC calibration review


 

Organization

In May 2020, the STAR management team decided to reorganize the STAR software and computing (S&C) activities. The new S&C organization includes an S&C management team, which oversees S&C-related issues, together with six sub-groups. Please see below the new organization chart, the subgroup leaders, and the relevant mailing lists.



The S&C management team members:

Ivan Kisel (Frankfurt)

Gene van Buren (BNL)

Jason Webb (BNL), Xianglei Zhu (Tsinghua)

Dmitri Smirnov (BNL), Grigory Nigmatkulov (MEPhI)

Dmitry Arkhipkin (BNL)

Jerome Lauret (BNL), Jeff Landgraf (BNL)

Ashik Ikbal Sheikh (KSU)

Xin Dong (LBNL), Lijuan Ruan (BNL)

Torre Wenaus (ex. off.)


Gene van Buren  -  NPPS STAR point-of-contact

Gene van Buren  -   SDCC liaison 




Below is a brief description of the responsibilities of each sub-group:

  • Tracking:
    • Maintain / develop tracking software
    • Online/offline tracking merging
  • Calibration/Production:
    • Data calibrations (coordinating subsystems)
    • Production library build and maintenance
    • Real data production and data management
  • Simulation/Embedding:
    • GEANT geometry maintenance and development
    • Event generator integration
    • Embedding software maintenance and development
    • Simulation/embedding production
  • Software infrastructure:
    • Offline software code review, integration and maintenance
    • StEvent/MuDst/picoDst maintenance
    • Management of OS, compilers, etc., and karma permissions
    • Coordination of efforts for bug fixing
  • Database:
    • Online databases and maintenance
    • Offline databases (Calibrations, Geometry, RunLog)
    • FileCatalog databases
    • STAR phonebook / Drupal modules
  • Experimental Computer Support and Online Computing:
    • Offline/online computing support for the experiment
    • Cyber security


The principal members of the S&C structured team are listed below.

  • Data readiness sub-group  - Coordinated by Gene Van Buren (BNL)
    • Calibration Coordinator (+) - Gene Van Buren (BNL)
      (Note: this position can be filled by a remote participant, allowing time release for calibration R&D)
    • Database administration and support (R&D falls under infrastructure) - Dmitry Arkhipkin (BNL)
    • Quality Assurance Coordinator - Lanny Ray (UTA)
      • Online QA - Jeffery Landgraf (BNL)
      • Offline QA - Alexander Jentsch (UTA)
         
  • Offline software - activities are overseen by the co-leaders
    • Offline reconstruction and simulation - Transitional Structure - Coordinated by Jason Webb (BNL)
      • Reconstruction leader (+) and Simulation leader (+)
        This area is currently co-led by Victor Perevoztchikov and Jason Webb (BNL), with Victor emphasizing reconstruction and Jason simulation.
      • Jason Webb (BNL) - Simulation support specialist (Start November 2nd 2009)
      • General offline software & sub-system support - Dmitri Smirnov (BNL)
         
    • Embedding Coordinator (+) - Xianglei Zhu (Tsinghua)
      • Embedding Deputies - Xionghong He (IMP, Lanzhou), Maowu Nie (Shandong Univ.)
      • Embedding helpers
        • LFS/UPC:     Yi Fang (Tsinghua), Yiding Han (Rice Univ.)
        • HP:              Diptanil Roy (Rutgers)
        • CF:               Yongcong Xu (CCNU)
        • ColdQCD:    Hannah Harrison (Kentucky)
          • Members before 08/2022
          • Embedding Deputy, base QA support (for NERSC) - Derek Anderson (TAMU)
          • Embedding Deputy (Other) - Ning Yu (CCNU)
          • Embedding Helpers (see the Embedding structure for more information)
            • Spin:           Amilkar Quintero (Temple), Joe Kwasizur (CEEM)
            • HF:              Yuanjing Ji (USTC), Robert Líčeník (CTU)
            • LFS/UPC:     Leszek Adamczyk (AGH), Dave Stewart (Yale)
            • Jet-like-corr:Prabhat Bhattarai (UTA), Zillay Khan (UIC)
            • Bulk-cor:      Jinlong Zhang (), Shu He (CCNU)
  • Data production and library - Coordinated by Lidia Didenko (BNL) -> Amol Jaikar (BNL)
    • Production coordinator (+) and Software Librarian () - Lidia Didenko (BNL) -> Amol Jaikar (BNL, 1/2 time)
    • Distributed production support - Levente Hajdu (BNL) @ 1/3rd time
  • Infrastructure, software infrastructure, Middleware support, technology evaluation and integration - Coordinated by Jérôme Lauret (BNL)
    • ROOT development, visualization & software architecture, R&D and support - [X]
    • Levente Hajdu (BNL) - (1/3rd) technology provisioning and development of tools (for local or distributed computing)
    • Computer Operation and user support
      • Wayne Betts (BNL) - Computer support
      • Michael Poat (BNL) - Computer support: BNL user's laptop and desktop (purchase, setup, OS upgrades and software) + visitors laptops (troubleshooting, system setup for BNL compliance, cyber-security)
    • Grid Operations and OSG activities - Coordinated by Douglas Olson (LBNL) and Wayne Betts (BNL)
      • [X] - Activity Coordinator
      • Wayne Betts (BNL) - Grid Operation coordinator and distributed facility point of contact
      • Levente Hajdu (BNL) - (1/3rd) Grid Technology support 
         
  • Cyber Security - STAR has been authorized to deploy, implement and maintain its own tools and hardware that comply with the spirit of the law (but are least intrusive to STAR workflows and operations).
    Several points of contact are crucial to STAR's smooth operation.
    • Cyber Security System Owner - Jerome Lauret  (please contact for matters of policies and divergence to the CS rules)
    • Cyber Security Information Systems Security Officer - Wayne Betts (please contact for any issues with implementation of the Cyber security tools, including STAR's approach)
    • Cyber Security trusted system admin - Michael Poat (versed in CS implementation and policies and trusted by the CS team for implementation)

 

Other supporting efforts & members

  • Related to Software infrastructure
    • Thomas Ullrich (BNL / Yale) - StEvent support -> Jason Webb (BNL) (03/2022)
    • Daniel Brandenburg (Rice) - MuDST support
    • Grigory Nigmatkulov (MEPhI) - picoDST support
    • Dmitri Smirnov (BNL) - Vertex Finder support   [NB: formally a PWG activity]
    • Hongwei Ke (BNL) - Web master and Web support (+)
       
  • Related to Computing and facilities Operations - Off-site facilities - This section includes central and/or distributed facility support 
    • Jeff Porter & Jan Balewski (LBNL) - NERSC/PDSF support -> Irakli Chakaberia (LBNL) (06/2021)

The software sub-system coordinators (+) in each specialized area are as follows:

Detector sub-systems:

  • TPC Software – Yuri Fisyak (BNL)
  • GMT Software –  Grigory Nigmatkulov (UIC)
  • DAQ Software – Jeff Landgraf (BNL)
  • BEMC Software - Charles Clark (Temple) 
  • EEMC: Zilong Chang (Indiana Univ.) (03/2022)
  • FMS/FPS Software - Oleg Eyser (BNL)
  • bTOF/VPD Software - Chenliang Jin (Rice)
  • eTOF Software - Yannik Soehngen (Heidelberg)
  • MTD Software - Rongrong Ma (BNL) 
  • Trigger Detectors (BBC, FPD, CTB, ZDC, MWC, ...) - Akio Ogawa (BNL)
  • HFT Software - Xin Dong (LBNL)
  • HLT Software - Hongwei Ke (BNL) -> Diyu Shen (Fudan) (04/2022)
  • PP2PP/RP Software - Yip Kin (BNL)
  • EPD - Prashant Shanmuganathan (Lehigh)
  • Forward Upgrade - Daniel Brandenburg (BNL). Forward Upgrade includes FCS, sTGC and FST. PoCs are
    • FCS - Akio Ogawa (BNL)
    • sTGC - Daniel Brandenburg (OSU)
    • FST - Shenghui Zhang/Zhenyu Ye (UIC)


The sub-systems below are no longer supported in STAR (detector systems physically removed) - green sub-systems have no software support, blue ones still have some support:

  • eSTAR R&D - Ming Shao (USTC)
  • TPC Software – Richard Witt (Yale, USNA) & Yuri Fisyak (BNL)
  • FGT Software - Xuan Li (Temple)
  • FTPC Software - Janet Seyboth (MPI)
  • PMD Software - Rashmi Raniwala (U. Rajasthan)
  • RICH Software - Boris Hippolyte (Yale)
  • L3 Software - Thorsten Kollegger (Frankfurt)
  • SVT Software - Helen Caines (Yale)
  • SSD Software - Jonathan Bouchet (KSU)
  • HCAL Software - Wangmei Zha (USTC)

The names below reflect the list of software coordinators while diverse projects were in their R&D phase. These projects moved to full project status in 2007:

  • Offline Heavy Flavor Tracker Software - Andrew Rose (LBL)
  • Offline Inner Silicon Tracker Software - Mike Miller (MIT)
  • Offline Hybrid Pixel Detector Software - Sevil Salur (Yale)

The computing and software effort is closely associated with the Physics Working Groups. STAR physics analysis software runs within the context of the computing infrastructure, taking the DST as input. The physics working groups have responsibility for the development of physics analysis software. The STAR Physics Analysis Coordinator acts as coordinator between the PWGs and computing. The PAC's responsibilities are described here.

Sooraj Radhakrishnan is the current STAR Physics Analysis Coordinator.

 


STAR Software & Computing is headed by
Dr. Jérôme Lauret and Dr. Gene Van Buren located at the Brookhaven National Laboratory.

S&C Team November 2018
From left to right: Gene Van Buren, Dmitri Smirnov, Lidia Didenko, Jason Webb, Victor Perevoztchikov,
Dmitry Arkhipkin, Michael Poat,
Levente Hajdu, Amol Jaikar, Wayne Betts, Jérôme Lauret

The S&C management structure is as below. Unless otherwise specified, [X] indicates an activity area whose overall coordinator is missing and whose work is co-led (either internally absorbed or the activity dropped).

  • Jérôme Lauret (BNL)    - Software & Computing project Leader    
  • Gene Van Buren (BNL) - Software & Computing project co-leader

 

 

Analysis Coordinator

STAR Physics Analysis Coordinator

Responsibilities:

1) To work with the physics working group convenors and as appropriate the Software and Computing Project Leader, Simulations Leader, Reconstruction Software Leader, Offline Production Leader, Software Infrastructure Leader, and Run-Time Committee to determine the physics analysis and simulation software needs. To act as an interface between the physics working group convenors and the STAR Software and Computing Project on matters of physics software and computing and consult as needed with the Spokesperson on priorities for this software.

2) To work with the physics working group convenors and the STAR Software Project Leaders to facilitate the development and integration of physics analysis software in a way that is compatible with the overall STAR software approach. In so doing, the quality and performance of the reconstruction and simulation codes should be primary considerations.

3) To represent the physics working groups in discussions, with the software project leaders, on the physics analysis tasks to be performed during event reconstruction and at each stage of analysis. This will require that the physics analysis coordinator maintain an overall perspective of the status and availability of physics analysis and simulation software.

4) To facilitate input and communication between the physics working groups and the Simulations Leader on issues of determining and implementing the tradeoffs in the simulation capability versus physics.

5) To work with the Simulations Leader to make efficient use of the computing resources for the simulations needed by each of the physics working groups and to coordinate the physics working groups' input on design tradeoffs in the simulations with respect to general performance and overall capabilities.

6) To work with the Reconstruction Leader to establish requirements for DSTs and event reconstruction functionality.

Desired Skills and Abilities of the STAR Physics Analysis Coordinator:

1. well versed in STAR's physics program with a strong interest in physics, software and computing.

2. active in physics analysis, as an active developer and user of analysis codes.

3. strong in computing, able and willing to be an active participant in the computing group designing and developing the analysis software and the computing framework that supports it, and able to assess the quality and approach of the upstream reconstruction and simulation codes and give feedback.

4. direct experience in OO/C++ preferred.

5. be able to communicate well.

6. be able to commit a large fraction of time to this job and to have a presence at BNL as needed to interface with the software project leaders and the physics working group convenors.

Torre's statement on the job:
"A principal early role of the physics analysis coordinator would be to help assemble the physics analysis program for the mock data challenges, going well beyond the broad strokes of what physics should be looked at to developing the program to put in place the physics analysis software needed to execute it, software layered over a physics analysis infrastructure and toolset that the Analysis Coordinator should play a strong role in designing and ideally developing. Besides assembling the disparate needs of the PWGs to scope out and assign the design and implementation job, there is a lot of commonality in their needs that needs to be coordinated."

Calibration Coordinator

Position and responsibilities Description: Calibration coordinator

The STAR Calibration Coordinator's primary mission is the delivery of the calibration constants necessary to bring the data to the expected level of quality in support of the scientific program. The Calibration Coordinator is expected to work in concert with the calibration expert(s) designated by each sub-system software coordinator to bring the data to the required level of accuracy and quality. Before and during periods of data taking and data production, this may be achieved by organizing calibration readiness meetings, communicating with the calibration experts, and/or preparing, summarizing and developing a calibration plan and schedule as required. He/she would interact with the experts to understand their problems and work toward the elimination of repetitive tasks through automation (support for online calibration, fast-offline, etc.). He/she will pro-actively act as the liaison (and main point of contact) between the production, reconstruction, database and other coordinators and the sub-system experts within this realm of expertise.

Authorities

  • To achieve objectives, he/she has the authority to directly request highly prioritized productions.

  • The Calibration Coordinator's priorities and schedule take precedence over the individual sub-systems' calibration needs.

  • The Calibration Coordinator may request progress reports from the sub-systems' designated calibration experts.

  • In order to make the best use of the global STAR calibration organization, he/she should be informed of any on-going independent calibration efforts and techniques being developed within the collaboration. Such work should have his/her final approval before integration into the STAR framework.

Responsibilities

He/she is responsible for identifying key milestones, determining immediate and future needs, and communicating critical project issues in a timely fashion.
He/she is expected to be a central point of contact for users' needs within the area of expertise, respond to user problems, explain technologies and methodologies, and guide or mentor individuals as appropriate.

Skills

The STAR Calibration Coordinator is expected to demonstrate an in-depth understanding of the fundamentals of requirement specification, design, coding, and testing of the technologies, methodologies and computational techniques related to the calibration needs of the STAR experiment. He/she should have a good understanding of current and future applications and technologies, and the faculty to learn, apply and implement new and emerging techniques and concepts very quickly.

Written by Jerome Lauret, S&C Leader 2003

 

Embedding structure

Organization, responsibilities, authorities and policies related to Embedding

Due to the increasing demands of the STAR collaboration, associated with the need for redundancy and more cohesion in the embedding activity, the embedding structure will move toward a distributed (computing) model paradigm with a structured set of responsibilities.

Organization

The embedding activity is a cross between the Simulation and Reconstruction activities and an important part of our data mining and data analysis process. Hence, the embedding structure described below is part of the Software & Computing (S&C) project structure.
The embedding activity will be led by the Embedding Coordinator (EC), helped by Embedding Deputies (ED), whose responsibilities and authorities are described below. It is understood that each Physics Working Group (PWG) may assign a contact person, or Embedding Helper (EH), to help run an embedding series. The EC, ED and EH constitute the embedding team and core structure.
Embedding tasks will be created consistent with Appendix A, “Initiating Embedding requests”.

 

The EC

The primary mission of the EC is to organize the work and the set of QA results related to each embedding series, and to communicate to the collaboration the progress and difficulties encountered in the data production process. The EC is the interface to many areas in STAR; to efficiently achieve the goals of the function, his/her responsibilities and authorities are described below.

The EC responsibilities are:

  • The EC will work in close relation with the S&C Leader, the Physics Analysis Coordinator (PAC) and the PWG Conveners (PWGC) to define and facilitate the processing of the embedding requests needed by the collaboration to carry out its scientific goals
  • The EC will work with the PWG EH to gather and keep up-to-date a set of macros and code aimed at providing appropriate QA for the PWG of interest
  • The EC is responsible for the communication with the PWGC whenever clarifications of embedding requests are needed (physics process and/or intent unclear). This communication should happen only when the PWG EH feels it is best to properly reflect the matters at hand; communications to/from the PWG are otherwise expected to flow through the EH.
  • The EC will communicate to the S&C Leader any issues related to the resources needed to accomplish the mission.
    • This may include discussions involving task lists and priorities, assignment of additional workforce, additional computational resource allocations, or difficulties encountered during the embedding process.
    • The discussions may be substantiated by a summary of the resources needed, in the form of a written document (provided by the EC) as requested.
  • The EC is responsible for regularly bringing to the collaboration, via the weekly S&C meeting, summaries of the tasks at hand and presenting the priorities or priority changes as applicable. All modified embedding requests are expected to be announced at an S&C meeting as a summary, in addition to the communication with the PWGC. The frequency of such interventions is at the discretion of the EC but should be no less than once a month
  • The EC will be responsible for communicating with the ED and organizing the work and tasks with the ED, who form the core workforce of the embedding effort
  • The EC is expected to handle and resolve with the ED any issues (technical, resource) which may be raised by the EH, and to help toward their resolution and understanding

 

The EC authorities are

  • Assign tasks to the ED as applicable
  • Request highly prioritized resource allocations to perform tasks
  • Directly contact and discuss issues related to embedding requests with the PWGC, PAC and S&C Leader as applicable
  • Authoritative decisions are granted to the EC in the following areas:
    • Declare an embedding request improper and re-direct it to the Simulation Leader
    • Re-prioritize or close embedding tasks and requests upon PWGC feedback or lack thereof
    • Similarly, close an incomplete request upon lack of PWGC feedback
    • Re-assign a request to low priority should clarifications of an embedding request not be provided
    • Declare an embedding series good or bad, based upon quantitative arguments backed by proper embedding-related QA
  • May request discussions and/or an arbitration ruling from the PAC and S&C Leader should a contention arise with the PWG(C) of interest

Conflict resolution

  • In the event an embedding request is re-directed, re-prioritized, closed, or declared 'bad', and should the decision be contested by the PWGC, the following applies:
    • The responsibility to present a case, and the burden of proof, are on the PWGC
    • Explicit discussion with, and approval of, both the S&C Leader and the Physics Analysis Coordinator (PAC) is necessary for re-considering the EC decision

 

The ED

The Embedding Deputies' roles and responsibilities are those previously expected of the PDSF Tier1 center in 2005 and as defined in wording in 2007. An Embedding Deputy is intimately tied to a site's resources or to a specific, well-defined task (such as performing the base QA).

Those responsibilities include

  • Respond to the EC's requests for assigned tasks or work units and provide estimates of the time needed to complete the tasks and the feasibility of accomplishing the goals
  • Prepare the embedding jobs and ensure the full workflow is working for achieving the assigned tasks. Run a sample and help validate the production chain
  • Help resolve technical details as they appear and file/track trouble tickets as applicable
  • Keep the EC informed of progress, issues and difficulties encountered during the setup of the embedding process, including resource allocation difficulties (if any) as well as technical difficulties
  • Maintain the embedding framework consistent with a distributed model and provide feedback on improvements/fixes suitable for all sites
  • Keep the embedding-related documentation up-to-date
  • Work closely with the EH as applicable to accomplish the tasks at hand

 

The authorities of the ED include

  • Manage assigned local resources as necessary to perform tasks
  • Reshuffle and assign portions of resources to the EH as seen fit to perform tasks (tasks passed to the EH could include running a portion of the jobs and managing batch processing relevant to the PWG's EH)
  • Request EC intervention whenever necessary
  • Directly communicate with the EH on issues related to geometry tags, magnetic fields and other parameters wherever applicable (see Appendix B), while summarizing the outcome to the EC
  • Request feedback from the PWG EH on the adequacy of the samples provided against the PWG's requests. It is expected that the EH will communicate with the PWG (PWG-specific QA may be performed) and provide an explicit answer to the ED, accompanied by material demonstrating the inadequacy of the sample or documenting its adequacy as applicable.

 

The EH

The Embedding Helpers are individuals recruited from within the PWGs and hence are STAR collaborators helping with the running of the embedding and carrying some of its burden as a general service. The EH is part of a workforce supplement provided by the PWG, and the expectation is for an EH to serve for a minimum of two years, during which knowledge build-up and stability in our procedures and communication can be achieved. A PWG may or may not provide an EH, understanding however that the lack of an EH may result in delays in the delivery of results.

The EH's responsibility is to carry the communication of all issues to/from the PWG and the embedding team, and to see that the requests made by the PWG are carried through to completion. To this end, the EH are expected to work closely with the ED and EC to perform the embedding tasks related to their respective PWG, consistent with the principles stated above. Examples of duties:

  • Help clarify embedding requests and communicate with the PWG in case of questions or lack of clarity
  • Whenever an embedding request is determined valid by the EC and assigned to an ED, the EH will work under the guidance and supervision of the assigned ED. The EH is then expected to carry out the tasks assigned (or delegated) by the ED and possibly carry out production-related work to complete the requests on behalf of their related PWG. It is understood that resource management and allocation remain the prime responsibility and call of the ED.
  • Communicate pro-actively with the PWG and work internally to the PWG structure to carry out QA testing on the samples provided to satisfy the PWG requests.
  • Provide a clear and documented response to the ED on the adequacy (or lack thereof) of the produced sample for the request.

Appendix

Appendix A – Initiating Embedding requests

Embedding requests and needs will be initiated and discussed within either a PWG or an R&D working group, under the supervision of, respectively, the PWGC of interest or the R&D simulation coordinator. The following caveats apply.

Pure embedding will be requested by the PA of a pending publication/paper, or whenever the accuracy needs to be as close as possible to the real data. All other cases will be reviewed by the EC and may be transformed into an enriched-sample simulation (“injection” simulation) wherever applicable and re-directed to the Simulation Leader, consistent with the EC authorities.

After discussions within the PWG, embedding requests will be recorded by one of the PWGC via a provided interface whose purpose will be to keep track of all requests and allow for prioritization and an overview of status and progress. The PWGC may designate an embedding point of contact as the PWGC representative for embedding. In such a case, the communication will be carried through the embedding point of contact.

No embedding requests outside of the provided framework and interface shall be satisfied, and no “pending discussions” without an actual request will be considered. It is also understood that an ill-defined request may be closed by the EC, consistent with the feedback-gathering policy described in the EC responsibilities and authorities section. In such an event, the request slot is not re-usable by the PWG.

Unless specified otherwise, with priorities assigned explicitly by the PAC, EC or S&C Leader, the requests will be considered on a first-in, first-done basis and upon availability of resources.
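
The scheduling discipline above — first-in, first-done order, with an explicitly assigned priority taking precedence — can be sketched as a simple priority queue. This is an illustrative sketch only; the class, method and request names are hypothetical and not part of any STAR tool.

```python
import heapq
import itertools

class EmbeddingRequestQueue:
    """Illustrative sketch of the Appendix A ordering policy: requests are
    served first-in, first-done, unless an explicit priority (lower number
    = more urgent) has been assigned by the PAC, EC or S&C Leader."""

    DEFAULT_PRIORITY = 100  # hypothetical default for unprioritized requests

    def __init__(self):
        self._counter = itertools.count()  # arrival order; breaks priority ties
        self._heap = []

    def submit(self, request_id, priority=None):
        prio = self.DEFAULT_PRIORITY if priority is None else priority
        heapq.heappush(self._heap, (prio, next(self._counter), request_id))

    def next_request(self):
        """Return the next request to process, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

q = EmbeddingRequestQueue()
q.submit("spectra-request")            # arrives first, no explicit priority
q.submit("flow-request")               # arrives second, no explicit priority
q.submit("paper-request", priority=1)  # explicitly prioritized (e.g. by the PAC)
print(q.next_request())  # paper-request
print(q.next_request())  # spectra-request
```

The tuple `(priority, arrival, id)` guarantees that among requests with the same priority, the earlier submission is served first, matching the "first-in, first-done" default.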

Appendix B – ED responsibilities and authorities regarding geometry tag and field settings

Upon request from the PWGC, the embedding team will ensure that the proper information is obtained for the production series the embedding requests relate to. For a given production, the field setting and geometry tag used will be checked for consistency; consistency must hold among the simulation, the real data and the intended study. The geometry tag will be acquired from the production options page maintained by the production coordinator [currently Real Data production option].
The ED will be responsible for carrying out those checks and will pass on functional macros to the EH assigned to assist them. If a setting is inaccurate (field or geometry tag), the EH shall immediately inform the PWG and the EC and request explicit confirmation. An embedding series shall not be started nor run without acknowledgment of the adequacy of the related field or geometry tag as chosen by the PWGC or PWG point of contact.
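
A minimal sketch of the consistency check described above — comparing an embedding request's field setting and geometry tag against the production options recorded for the corresponding real-data series. All function names, dictionary keys and tag values here are hypothetical illustrations, not actual STAR macros or production tags.

```python
def check_embedding_consistency(request, production_options):
    """Illustrative sketch: verify that an embedding request's field setting
    and geometry tag match those recorded for the real-data production it
    relates to.  Returns a list of mismatches; an empty list means the
    series may proceed (pending PWGC acknowledgment)."""
    problems = []
    # production_options is assumed keyed by production series tag, e.g.
    # {"P08ic": {"field": "FullMagField", "geometry": "y2007g"}}
    opts = production_options.get(request["production"])
    if opts is None:
        return ["unknown production series: %s" % request["production"]]
    if request["field"] != opts["field"]:
        problems.append("field mismatch: %s vs %s"
                        % (request["field"], opts["field"]))
    if request["geometry"] != opts["geometry"]:
        problems.append("geometry tag mismatch: %s vs %s"
                        % (request["geometry"], opts["geometry"]))
    return problems

options = {"P08ic": {"field": "FullMagField", "geometry": "y2007g"}}
req = {"production": "P08ic", "field": "FullMagField", "geometry": "y2006g"}
print(check_embedding_consistency(req, options))
# a geometry-tag mismatch is reported; the EH must then inform the PWG and EC
```

Any non-empty result would trigger the confirmation step above: the EH informs the PWG and the EC, and the series is held until the adequacy of the tag is acknowledged.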

Jérôme Lauret, STAR Software & computing Leader 07/05/16

Jérôme Lauret, STAR Software & computing Leader 09/10/20, revised for expanding and clarifying the EH role

 

Generic Software Coordinator

General Position Description: detector sub-system software coordinator


Each detector sub-system must designate a software sub-system coordinator, who then becomes the main contact person for developing and maintaining the software written to bring that sub-system's data to a Physics-usable form, at the level of accuracy and quality required for carrying out the STAR Physics program involving that particular sub-system. Additional manpower for the development of the software may be allocated within the sub-system's group or requested by the sub-system software coordinator (as a service/community task).

As all realize that there is no Physics without data reduction (via code/software), the software coordinator is a cornerstone of the sub-system's group. His/her main responsibilities are:

  1. The development and maintenance of the data acquisition reader for the offline chain, the detector geometry in the appropriate frameworks (Geant, reconstruction), the calibration database layout and content, at least one response simulator suitable for the simulation and embedding chains, as well as the tracking-specific software if applicable.
  2. To disseminate in the collaboration the information related to the sub-system of interest, especially how to use its data in Physics studies: this may be accomplished via documentation, development of analysis APIs, regular updates and presentations at collaboration meetings, or, when asked by the S&C Leader, a progress and/or readiness status report made regularly and pro-actively at the weekly S&C meeting
  3. Similarly, to pro-actively bring forward to the S&C Leader issues and show-stoppers pertaining to the sub-system - examples span delays in calibration procedures, resource needs, and issues with data quality. Case-based presentations at the S&C meeting are highly encouraged and welcome
  4. To work in close relation with the database, reconstruction, simulation and calibration leaders/coordinators when appropriate
  5. To ensure compliance of his/her code with the STAR coding standard. Each new piece of code is expected to be brought forward to a peer-review process where the code's standards, functionality and adequacy of documentation will be judged by peer developers.
  6. To participate in the development of innovative projects aimed at enhancing the Physics capabilities of the experiment as a whole. This may include participation in the development of, or support for, new tracking methods, better frameworks, database evolution, etc.

In order to bring the sub-system data closer to readiness, he/she

  • Has the authority to request highly prioritized productions within the scope of efficiency, alignment or calibration studies, or any study going toward the convergence, consolidation or strengthening of the Physics results. The software sub-system coordinator may designate a point of contact handling calibration production requests (in such a case, the POC should be clearly specified).

  • May request allocation of the resources necessary to accomplish the tasks outlined above.

  • Has the ultimate and final authority to organize the work at hand within his/her sub-system's realm - for example, partitioning calibration, simulation and other tasks as necessary and depending on available manpower.

  • Is, unless indicated otherwise, the point of contact for modification of any code pertaining to the sub-system (others proposing modifications must inform the software sub-system coordinator).

  • Is expected to communicate to the S&C leader concerns and issues which may be or become obstacles in achieving the above mission.

Written by Jerome Lauret, S&C Leader 2003

 

Reconstruction Leader

Responsibilities Description: Reconstruction Leader

General

The STAR Reconstruction Leader is responsible for maintaining, developing and expanding the STAR reconstruction code and framework. Reconstruction includes:

  • all detector / sub-system space point and/or physics quantity reconstruction
  • Any related data-structure handling, from the loading of the raw data to the final physics-usable DST (or derivative)
  • Global tracking
  • dE/dx and particle identification
  • global event summary and hand-shaking with calibration constants and procedure
  • code applying and/or performing corrections
  • ...

The development of detector-specific microscopic (slow) and parametrized (fast) response simulators will be planned through discussions at Software & Computing meetings, in conjunction with the Simulation and Database Leaders and the affected sub-system detector coordinators and experts. The same applies to the cross-discipline (Reconstruction/Simulation) area known in STAR as embedding.

The reconstruction leader is expected to

  • Take the lead on the study and evaluation of new tracking techniques and framework enhancements as needed by the STAR Physics program and future development
  • Be aware of methodologies used in other High Energy or Nuclear Physics experiments and have a deep understanding of their applicability and limitations
  • Conceive new ideas and attempt to convince colleagues and the scientific community of their validity, significance and importance, and if need be, document and publish these ideas
  • Bring innovative solutions to attention and make recommendations on problems
  • Elaborate and present schedules for the deployment of solutions whenever accepted/approved
  • Deploy, implement or integrate cost- and time-effective options/solutions in consideration of the research needs and schedules
  • Complete responsibilities on time and according to the STAR program planning
  • At every step, provide and maintain up-to-date documentation and offer support to users and developers of the STAR reconstruction software
  • Obtain user feedback, diagnose problems and make software and documentation modifications as necessary.

In this substantial task, the Reconstruction Leader will be assisted by one expert per detector sub-system, as designated by the detector software sub-system coordinator. He/she will provide these experts guidance on integrating the sub-system-specific code within the STAR reconstruction framework and global tracking. Further manpower may come through reconstruction projects (a new tracking software is an example) which, upon completion, would fall under the Reconstruction Leadership.

The Reconstruction Leader will be further assisted by the STAR Calibration Coordinator and the Production Coordinators. He/she should respond and attend to the Calibration Coordinator's findings and requests for the integration of new algorithms or techniques specific to the Calibration Coordinator's area of expertise. In such a case, they will work closely together until task completion, within the scope and planning defined above. The Reconstruction Leader may request highly prioritized productions directly from the Production Coordinator(s) in order to resolve or evaluate a question pertinent to the reconstruction area.

However, to ensure a smooth execution of global planning and complete transparency between the areas of reconstruction, simulation and calibration, schedules and priorities should be brought to the attention of the STAR S&C Leader and further discussed in Software & Computing meetings prior to execution or deployment.

In the absence of the STAR S&C Leader and deputies, should the schedule and task priorities be left unclear, the Reconstruction Leader's judgment on the production schedule will take precedence over all others.

Reconstruction deputy

One or more reconstruction deputies may be assigned by the S&C Leader to further assist the Reconstruction Leader.

A reconstruction deputy's task is to effectively take the lead on a specific, defined project. Within the scope of this project, the reconstruction deputy has the same authorities and responsibilities as the Reconstruction Leader. They are expected to work closely with one another until the completion of the defined task. In the absence of the Reconstruction Leader, such a deputy will take full responsibility over the reconstruction software in all areas, including his/her assigned project. Should several deputies be in office, the choice will be left to the Reconstruction Leader (or follow the chain of the S&C organization).

Furthermore, a reconstruction deputy may represent the reconstruction activities and progress at Collaboration and/or Analysis meetings and, therefore, should remain informed of activities within this area of expertise.

Written by Jerome Lauret, S&C Leader 2003

 

Simulation Leader

Responsibilities Description: Simulation Leader

General

The STAR Simulation Leader is responsible for maintaining, developing and expanding the STAR simulation framework. His/her role is to analyze, design, formulate, implement and maintain the consistency of the simulation software, packages and toolkit solutions that support the STAR research needs and/or respond to problems in support of the scientific program. The areas under the Simulation Leader's responsibility include:
  • GEANT (or any other detector description and simulation toolkit), geometry and physics process modeling (i.e. describing the passage of elementary particles through matter)
  • Any related data-structure handling
  • Simulation framework response simulators, including user APIs, framework, ...
  • Event generators (in conjunction with the Event Generator Coordinator)
  • ...
The development of detector-specific microscopic (slow) and parametrized (fast) response simulators will be planned through discussions at Software & Computing meetings, in conjunction with the Reconstruction and Database Leaders and the affected sub-system detector coordinators and experts. The same applies to the cross-discipline (Simulation/Reconstruction) area known in STAR as embedding.

Specifics (current as per 2003) and future

Within the current STAR simulation framework, the Simulation Leader is expected to attend to the development, testing and maintenance of the existing geometry and materials database and the related GEANT simulation software necessary to simulate the response of the STAR detector, used to interpret, without discontinuities, ongoing and forthcoming research data from the STAR experiment at RHIC. He/she will be expected to
  • coordinate with the sites' production coordinators the usage of computing farms at several institutes for Monte-Carlo production
  • respond to requests from the physics working groups
  • provide guidance and expertise to the sub-system software coordinators' designated experts when it comes to modeling their detector sub-system geometry description
  • communicate with scientists and engineers to determine, solve and implement solutions to their problems
  • plan and prepare the production and transmission of the STAR simulation data
  • participate in the experiment's physics program.
However, the STAR experiment and collaboration being a living body with evolving needs, the primary role and responsibility of this position is to ensure the timely development of the software capability necessary to produce and interpret STAR research data and to respond to the program's medium- to long-term needs. To achieve this mission, the Simulation Leader must
  • Stay current with state-of-the-art technology, survey the research literature, and evaluate existing methodologies
  • Follow the development of models, tools and toolkits for the simulation of geometry/material/detector response and modeling relevant to the STAR Collaboration's effort, determine the feasibility of approaches, and develop and integrate new solutions within the STAR simulation software framework when applicable
  • At every step, provide and maintain up-to-date documentation and offer support to users and developers of the STAR simulation software
  • Obtain user feedback, diagnose problems and make software or documentation modifications as necessary.

Authorities

  • Manage assigned resources as seen fit to complete research goals on time and to appropriate standards
  • To achieve objectives, he/she has the authority to directly organize highly prioritized productions within the allotted resources
  • The STAR Simulation Leader is the ultimate point of contact and organizer of the STAR simulation framework and software; therefore, any STAR simulation work and activities should be clearly stated and indicated to the Simulation Leader.
  • Any simulation work should have his/her final approval before integration into the STAR framework.

Responsibilities

He/she is responsible for identifying key milestones, determining immediate and future needs, and communicating critical project issues in a timely fashion.
He/she is expected to be a central point of contact for users' needs within the area of expertise, respond to user problems, explain technologies and methodologies, and guide or mentor individuals as appropriate.

Skills

He/she is expected to demonstrate an in-depth understanding of the fundamentals of requirement specification, design, coding, and testing of the technologies, methodologies and computational techniques related to the simulation needs of the STAR experiment. He/she should act as an architect for future needs and therefore have a good understanding of current and future applications and technologies, and the faculty to learn, apply and implement new and emerging techniques and concepts very quickly. The Simulation Leader should have a PhD in physics and several years of post-doctoral experience in the field of Heavy Ion physics, a strong background in programming using C++, FORTRAN, and GEANT3 and/or GEANT4, and good communication skills.
Written by Jerome Lauret, S&C Leader 2003

Requirements, plans, projects and proposals

General

The pages and documents in this section are a mix of resource and design requirement documents, proposals, planning and assessments. Links to other documents (like meetings, evaluations, reviews) may be included, making the pages here a single point of reference for the S&C project resource requirement design.

 

Computing requirements

 

 

 

2002

Every year, the 4 RHIC experiments, along with the RCF, assemble a task force to discuss and plan the computing resource allocation. In STAR, the FY03/FY04 effort was led by Jérôme Lauret with help from Jeff Porter. We meant for this work to be publicly available.

Most documents are from 2002 but are in effect in 2003.

 

2003

 

 

2005

This page is a placeholder to import the projects launched in 2005.

 

 

 

Content Management system in STAR

Introduction

This project started in 2005 as a service task aimed at providing a seamless port of the online Web server for document self-maintenance and easy access; the initial description follows. It was motivated by the poor maintenance and long-term support of the pages available online and the need for quick page creation to keep help, instructions and procedures up to date in a multi-user, multi-group environment. Also, we imagined that shift crews could drop comments on existing pages, and hoped for the documentation of our operations to become more interactive and iterative, with an immediate-feedback process. Plone was envisioned at the time, but the task was opened to an evaluation based on the requirements provided below.

Initial service task

This task would include the evaluation and deployment of a content management system (CMS) on the online Web server. While most CMSes use a virtual file system, the ability to manage web content through a database is of particular interest: this approach would allow for automatic Web mirroring and recovery. We propose the task include

  • A smooth port of the main page available at http://online.star.bnl.gov/ to the CMS-based system. This may include the look and feel, color scheme, etc., and at minimum the content. For example, the main tools should appear as a left menu for easy navigation, as a general web template.
  • The evaluation of the database approach should be made and in place before a transition from the old-style to a CMS-based online web server.
  • The CMS layout should provide accessible branches for the detector sub-systems, with each sub-system able to manage its branch separately. The branches will be most helpful for keeping documentation on the diverse subsystems and will provide a remotely editable mechanism allowing easy management and modification.
  • Depending on time and experience, the development could be applied to the offline Web server to some extent. The areas that could benefit most are the tutorial area and the QA area. Within the Plone system, users would be able to leave comments, add new tutorials, etc.; a manager would then organize the content or make it available to the public (flexibility of access to be discussed).

Project timelines and project facts

Facts:

  • We wanted a framework rather than a tool
  • There were more than 762 mainstream CMSes at the start of the project
  • The "best" of today is not the best of tomorrow
  • We therefore decided to start the project with a focus on requirements rather than on a specific solution

 Timelines:

  • 2005 – Dmitry Arkhipkin took this task and communicated with Dan Magestro from the beginning
    • The project evolved toward an offline Web server scope rather than an online one. The goal was to similarly address the obsolescence of the offline Web server:
      • more than half of the links were dead (65%)
      • most documents were obsolete
      • the AFS tree had become inextricable and did not scale (strategies to "hold" it together included the creation of separate AFS volumes for load-balancing reasons; ACLs became hard to maintain)
  • 2006 – Beta (rapidly adopted by STAR users) – Version 4.6
  • 2007 – Zbigniew Chajecki (development & integration)

Project requirements

The following requirements were set for the project:

Technical requirements

  • Database storage support
    • Preferably MySQL or Postgres
    • Any other with a "driver"
  • Replication
  • Flexible
    • Can be extended by modules or plug-in
    • Programmatic language
  • Modular design
    • Can reshape look-and-feel
    • Granularity
    • Layout
      • Trees, collaborative work
    • Individual accounts
    • Authorization
      • Privileges and/or ACL-based system
      • Group management (virtual or group-based privileges)
  • Self-maintained
    • Layouts by sections
    • Link auto-update (a page move should auto-update cross-references)

STAR requirements

  • Easy page creation
    • Web based editor – accessibility from anywhere
    • WYSIWYG (What You See Is What You Get) editor - helps novice and advanced users. A feature, not a mandate (plain HTML allowed)
    • Assisted help with layout & design
      • i.e. no special HTML, XHTML, XML knowledge required
  • Support for
    • Attachments, images,  …
    • Auto-updating search (i.e. the search index is self-maintained, without the need for external scripts or programs)
  • Community (popular) tools supported - main trends correspond to real modern-world needs
    • Blogs, Comments
    • Polls, Calendar, Conferences, Meetings …
    • ... many more via modules (a-la-perl extension)
  • Group management
    • PWG, Sub-systems, Activities, …
  • Powerful search (feature rich selector as far as possible)
  • Common visual theme for all pages (auto)
  • Community support (leverage development from a wider base support)

Functional requirements

The following functional requirements were either requested or desired for a smooth (sup)port of the previous deployment.

  • Page content must be allowed to be public or to require authentication
  • Meetings/agenda should allow for non-public sections. Public content should remain at the strict minimal "advertisement" level and not reveal information internal to STAR.
    • (comments? summary? section support?)
  • Talks provided as attachments should not be public (PWG requested)
  • Groups should extend beyond sub-systems and PWG (technical groups, sub-groups)

Related presentation


 

2006

To be transferred from the old site

2007


Inner Silicon Tracking

ID  Task Name                                         Duration   Start        Finish       Resource Names
 1
 2  TPC checks                                        7 days     Fri 2/9/07   Mon 2/19/07
 3  Laser drift+T0                                    7 days     Fri 2/9/07   Mon 2/19/07  Yuri[50%]
 4  SSD shift + East/West TPC tracks                  3 days     Fri 2/9/07   Tue 2/13/07  Spiros[25%]
 5  SVT alignment                                     7 days?    Tue 2/20/07  Wed 2/28/07
 6  SVT+SSD (cone) for each wafer                     1 wk       Tue 2/20/07  Mon 2/26/07  Ivan,Richard
 7  Shell/Sector for each magnetic field setting      1 day?     Tue 2/27/07  Tue 2/27/07
 8  Ladder by ladder                                  1 day?     Wed 2/28/07  Wed 2/28/07
 9  Using TPC+SSD, determine the SVT drift velocity   7 days     Fri 2/9/07   Mon 2/19/07  Ivan
10  Drift velocity                                    12 days    Fri 2/9/07   Mon 2/26/07
11  High-stat sample processing preview               7 days     Fri 2/9/07   Mon 2/19/07  Vladimir
12  Final evaluation                                  5 days     Tue 2/20/07  Mon 2/26/07  Vladimir
13
14  Online QA (offline QA)                            7 days     Fri 2/9/07   Mon 2/19/07  Ivan,Helen
15
16  Hit error calculation final pass                  1 wk       Fri 2/9/07   Thu 2/15/07  Victor
17  Self-alignment                                    3 wks      Fri 2/16/07  Thu 3/8/07   Victor
18  Alignment-related code in place for library       1 wk       Fri 2/9/07   Thu 2/15/07  Yuri[10%],Victor[10%]
19
20  Tasks without immediate dependencies              60 days    Fri 2/9/07   Thu 5/3/07
21  Cluster (SVT+SSD) and efficiency studies          1.5 mons   Fri 2/9/07   Thu 3/22/07  Artemios,Jonathan
22  Slow/Fast simulators reshape                      3 mons     Fri 2/9/07   Thu 5/3/07   Jonathan,Polish students x2,Stephen
23
24
25  Cu+Cu re-production                               87.5 days  Fri 3/9/07   Tue 7/10/07
26  Cu+Cu 62 GeV production                           3 wks      Fri 3/9/07   Thu 3/29/07
27  Cu+Cu 200 GeV production                          72.5 days  Fri 3/30/07  Tue 7/10/07
28  cuProductionMinBias (30 M)                        8.5 wks    Fri 3/30/07  Tue 5/29/07
29  cuProductionHighTower (17 M)                      6 wks      Tue 5/29/07  Tue 7/10/07

 

Multi-core CPU era task force

Introduction

On 7/12/2007 23:42, a task force was assembled to evaluate the future of the STAR software
and its evolution in the unavoidable multi-core era of hardware realities.

The task force was composed of: Claude Pruneau (Chair), Andrew Rose, Jeff Landgraf, Victor Perevozchikov, and Adam Kocoloski. The task force was later joined by Alex Withers from the RCF, as the local support personnel were interested in this activity.

The charges and background information are attached at the bottom of this page.

The initial Email announcement launching the task force follows:

Startup Email (7/12/2007 23:42)

Date: Thu, 12 Jul 2007 23:42:40 -0400
From: Jerome LAURET <jlauret@bnl.gov>
To: pruneau claude <aa7526@wayne.edu>, Andrew Rose <AARose@lbl.gov>, 
 Jeff Landgraf <jml@bnl.gov>,
 Victor Perevozchikov <perev@bnl.gov>, Adam Kocoloski <kocolosk@MIT.EDU>
Subject: Multi-core CPU era task force

        Dear Claude, Adam, Victor, Jeff and Andrew,

        Thank you once again for volunteering to participate to
serve on a task force aimed to evaluate the future of our software
and work habits in the un-avoidable multi-core era which is upon
us. While I do not want to sound too dire, I believe the emergence
of this new direction in the market has potentials to fundamentally
steer code developers and facility personnel into directions they
would not have otherwise taken.

        The work and feedback you would provide on this task force
would surely be important to the S&C project as depending on
your findings, we may have to change the course of our "single-thread"
software development. Of course, I am thinking of the fundamental
question in my mind: where and how could we make use of threading
if at all possible or are we "fine" as it is and should instead
rely on the developments made in areas such as ROOT libraries.

        In all cases, out of your work, I am seeking either
guidance and recommendation as per possible improvements and/or
project development we would need to start soon to address the
identified issues or at least, a quantification of the "acceptable
loss" based on cost/performance studies. As a side note, I have
also been in discussion with the facility personnel and they may
be interested in participating to this task force (TBC) so, we
may add additional members later.


        To guide this review, I include a background historical
document and initial charges. I would have liked to work more on
the charges (including adding my expectations of this review as
stated in this Email) but I also wanted to get them out of the
door before leaving for the V-days. What would be great would
be that, during my absence, you start discussing the topic and
upon my return, I would like to discuss with you on whether or
not you have identified key questions which are not in the charges
but need addressing. I would also like by then to identify a chair
for this task force  - the chair would be calling for meetings,
coordinate the discussions and organize the writing of a report
which ultimately, will be the result of this task force.

        Hope this will go well,

        Thank you again for being on board and my apologies for
dropping this and leaving at the same time.


-- 
              ,,,,,
             ( o o )
          --m---U---m--
              Jerome

-

Follow up EMail (8/3/2007 15:34)

 

Date: Fri, 03 Aug 2007 15:34:56 -0400
From: Jerome LAURET <jlauret@bnl.gov>
CC: pruneau claude <aa7526@wayne.edu>, Andrew Rose <AARose@lbl.gov>, 
 Jeff Landgraf <jml@bnl.gov>,
 Victor Perevozchikov <perev@bnl.gov>, Adam Kocoloski <kocolosk@MIT.EDU>, 
 Alexander Withers <alexw@bnl.gov>
BCC: Tim Hallman <hallman@bnl.gov>
Subject: Multi-core CPU era task force


        Dear all,

        First of all, I would like to mention that I am very pleased
that Claude came forward and offered to be the chair of this task force.
Claude's experience will certainly be an asset in this process. Thank
you.

Second news: after consulting with Micheal Ernst (Facility director
for the RACF) and Tony Chan (Linux group manager) as well as Alex
Withers from the Linux group, I am pleased to mention that Alex
has kindly accepted to serve on this task force. Alex's experience
in the facility planing and work on batch system as well as aspects
of how to make use of the multi-core trends in the parallel nascent
era of virtualization may shade some lights on issues to identify
and bring additional concepts and recommendations as per adapting
our framework and/or software to take best advantage of the multi-core
machines. I further discussed today with Micheal Ernst of the
possibility to have dedicated hardware shall testing be needed for
this task force to complete their work - the answer was positive
(and Alex may help with the communication in that regard).

        Finally, as Claude has mentioned, I would very much like for
this group to converge so a report could be provided by the end of
October at the latest (mid-October best). This time frame is not
arbitrary but is at the beginning of the fiscal year and at the
beginning of the agency solicitations for new ideas. A report by
then would allow shaping development we may possibly need for our
future.


        With all the best for your work,

 

Background work

The following documents were produced by the task-force members and archived here for historical purpose (and possibly providing a starting point in future).

CPU and memory usage on the farm - Alex Withers

Opteron (CPU / memory)

Xeon (CPU / memory)

CAS & CRS CPU usage, month and year

 

 

Outcome & Summary

A reminder as per the need for a report was sent on 10/3/2007 to the chair (with a side-track discussion on other issues which seemed to have taken attention). To accommodate for the busy times, a second reminder was sent on 11/19/2007 with a new due date of the end of November. Subsequent reminders were sent on 12/10/2007 and 1/10/2008.

The task force has not delivered the report as requested. A summary was sent in an Email as follows:

... a summary of the activities/conclusions of the committee.


... during the first meeting, all participants agreed that if
there was anything to be done, it would be on reconstruction. Members
of the committee felt that GEANT related activities are not in the
purview of STAR and should not be STAR's responsibility.  In view also
of what we did next it also appears that not much would actually be
gained.  We also discussed (1st meeting) the possibility of
multi-threading some aspects of user analysis, e.g. I/O, and perhaps some
aspects of processing.  Here people argued that there is too much
variability in the types of analyses carried out by STAR users. And it is not
clear that multi-threading would be in any way faster - while adding much
complexity to infrastructure - if not to the user code.


Members of the committee thus decided to consider reconstruction processes only.

In subsequent meetings, we realized (based on some reference tests
conducted in industry) that perhaps not much would be gained if a
given node (say 4 cores) can be loaded with 4 or 5 jobs simultaneously,
provided sufficient RAM is available to avoid memory swapping to
disk.

Alex and Andrew carried out some tests. Alex's tests were not really
conclusive because of various problems with RCF. Andrew's tests however
clearly demonstrated that the wall clock time essentially does not
change whether you execute 1 or 4 jobs on a 4-core node. So the effective
throughput of a multicore node scales essentially with the number of
cores. No need for complexity involving multithreading.  Instant
benefits.
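
Andrew's 1-vs-4-jobs observation is easy to reproduce in miniature. The sketch below is illustrative only: the `burn` loop is a hypothetical CPU-bound stand-in for a reconstruction job (not STAR code), and the ratio printed depends on how many free cores the node actually has.

```python
import time
from multiprocessing import Pool

def burn(n: int) -> int:
    """CPU-bound stand-in for one reconstruction job (hypothetical workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def wall_time(n_jobs: int, work: int = 500_000) -> float:
    """Wall-clock seconds to run n_jobs identical CPU-bound jobs in parallel."""
    start = time.perf_counter()
    with Pool(processes=n_jobs) as pool:
        pool.map(burn, [work] * n_jobs)
    return time.perf_counter() - start

if __name__ == "__main__":
    t1, t4 = wall_time(1), wall_time(4)
    # On a node with at least 4 free cores, the ratio stays close to 1,
    # i.e. aggregate throughput scales with the number of cores.
    print(f"1 job: {t1:.3f}s  4 jobs: {t4:.3f}s  ratio: {t4 / t1:.2f}")
```

If the ratio stays near 1, the node's throughput scales with the core count without any multithreading of the jobs themselves, which is the committee's conclusion.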

Cost:   PDSF and RCF are already committed, according to Alex and
Andrew, to the purchase of multicore machines. This decision is driven
in part by cost effectiveness and by power requirements: one 4-core
machine consumes less power, and is less expensive, than four 1-core
machines. Additionally, that's where the whole computing industry is
going...


So it is clear the benefits of multicore technology are real and immediate without
invocation of multithreading.

Possible exceptions to this conclusion would be for online
processing of data for trigger purposes, or perhaps for fast diagnostics
of the quality of the data. Diagnostics (in STAR) are usually based on
a fairly large dataset, so the advantages of multi-threading are dubious
at best in this case because the throughput for one event is then
irrelevant - it is the aggregate throughput that matters.

Online triggering is then the only justifiable case for use of
multithreading.  Multithreading would in principle enable faster
throughput for each event, thereby enabling sophisticated algorithms.
This is however a very special case and it is not clear that adapting
the whole STAR software for this purpose is a worthy endeavor - that's
your call.

I should say in closing that the mood of the committee was overall
quite pessimistic from the onset. Perhaps a different group of people
could provide  a slightly different point of view - but I really doubt
it.

 

 

2008

Background information

 

Projects and proposals

This page will hold either requirements documents or project descriptions for R&D-related activity in S&C (or defined activities, hopefully in progress).

 

  1. We proposed an R&D development within the ROOT framework to support full schema evolution as described in the project description
  2. We worked on the elaboration of the KISTI proposal to join STAR
  3. We supported a network upgrade for the RCF backbone (sent via Michael Ernst to Tom Ludlam, Physics Department chair at BNL, on April 16th 2008; discussed at the Spokesperson meeting on April 4th 2008)
  4. For the 2008 Network Requirements Workshop for the DOE/SC Nuclear Physics Program Office, we provided background material as below (STAR sections and summary; the Phenix portion was taken verbatim from their contribution)
  5. Trigger emulation / simulation framework: discussions from 20070528 and following Emails.

 

 

Ongoing activities

Internal projects and sub-systems task lists

 

Tasks and projects

Computing operation: IO performance measurements

Goals:

  • Provide a documented reference of IO performance tests made toward several configurations in both disk formatting and RAID-level space, under non-constrained hardware considerations.
  • This baseline would help in making future configuration choices when it comes to hardware provisioning of servers (services) such as database servers, grid gatekeepers, network IO doors, etc...

Steps and tasks:

  1. Survey community work on the topic of IO performance of drives especially topics concerning
    1. Effect of disk format on performance
    2. Effect of parallelism on performance
    3. Effect of software RAID (Linux) on performance and responsiveness (load impact on the node under stress)
    4. Software RAID level and performance impacts
    5. Kernel parameter tweaks impacting IO performance (good examples are efforts of DAQ group, review consequence)
       
  2. Prepare a baseline IO test suite for measuring IO performance (read and write) under two modes. A possible test suite could follow what was used in the IO performance page; other tools are welcome based upon the survey recommendations.
    • single stream IO
    • multi stream IO (parallel IO)
       
  3. Use a test node and measure IO performance under the diverse reviewed configurations. A few constraints on the choice of hardware are needed to avoid biasing the performance results:
    • The node should have sufficient memory to accommodate the tests (2 GB of memory or more is assumed to be sufficiently large for any test)
    • OS must support software RAID
    • Disks used for the test should be isolated from system drive to avoid performance degradation
    • Node should have more than two drives (including system disk) and ideally, at least 4 (3+1)
       
  4. Present results as a function of disk formatting, RAID level and/or number of drives, in both absolute values (values for each configuration) and differentials (gain when moving from one configuration to another).
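
The two test modes of step 2 can be sketched with a small Python harness. This is illustrative only: the file sizes, stream counts, and the `write_stream`/`measure` helpers are assumptions, not the actual test suite; a tool recommended by the survey (dd, IOzone, etc.) would replace this in practice.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = b"\0" * (1 << 20)  # 1 MiB write unit

def write_stream(path: str, mib: int) -> None:
    """Sequentially write `mib` MiB to `path` and force it to disk."""
    with open(path, "wb") as f:
        for _ in range(mib):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())

def measure(n_streams: int, mib_per_stream: int, directory: str) -> float:
    """Aggregate write rate (MiB/s) for n_streams concurrent sequential writers."""
    paths = [os.path.join(directory, f"io_test_{i}.dat") for i in range(n_streams)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        list(pool.map(write_stream, paths, [mib_per_stream] * n_streams))
    elapsed = time.perf_counter() - start
    for p in paths:
        os.remove(p)
    return n_streams * mib_per_stream / elapsed

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        single = measure(1, 64, d)   # single-stream IO
        multi = measure(4, 16, d)    # multi-stream (parallel) IO
        print(f"single-stream: {single:.0f} MiB/s, 4-stream: {multi:.0f} MiB/s")
```

Running the same pair of measurements across each disk format and RAID configuration gives directly comparable absolute and differential numbers for step 4.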

Status: See results on Disk IO testing, comparative study 2008.

Opened project and activities listing

A summary of ongoing and incoming projects was sent to the software coordinators for feedback. The document refers to projects listed in this section under Projects and proposals.

The list below does NOT include general tasks such as those described as part of the S&C core team roles defined in the Organization job descriptions documents. Examples would be global tracking with Silicon including the HFT, geometry maintenance and updates, or calibration and production tasks as typically carried out for the past few years. Neither does this list include improvements we need in areas such as online computing (many infrastructure issues, including networking, an area of responsibility which has been unclear at best), nor activities such as the development and enhancement of the Drupal project (requirements and plans sent here).

The list includes:

  • Closer look at Calorimetry issues, if any (follow-up to 2007 operation workshop feedback that calibration was too "TPC-centric" and did not address Physics qualities). Proposed a workshop with goals to:
    • gather requirements from the PWGs (statements from the operation workshop in 2007 seemed to have taken the EMC coordinators by surprise as per what resolution was needed to achieve Physics goals)
    • discuss with experts technical details and implementation, unrolling / deployment and timing
  • Status: Underway, see report from a review as PSN0465 : EMC Calibrations Workshop report, fall 2008

  • Db related: load balancing improvements, monitoring and performance measurements, resource discovery, distributed database
    Status: underway.
     
  • Trigger simulations - (some fleshed out in May 2007, as mentioned in this S&C meeting and attached below). The general idea was to provide a framework allowing trigger emulation / simulation offline for studying rejection/selection effects, either by applying trigger algorithms to real data (minimum bias), via true simulation, or by re-applying trigger algorithms to a triggered sample (higher threshold for example)
    Status: nowhere close to where it should be
    References: trigger simulation discussions meeting notes and Email communications.
     
  • Embedding framework reshape.
    Status: underway (need full eval with SVT and SSD integrated)
     
  • Unified online/offline framework, including integration of the online reader offline and of offline tools online (leveraging knowledge, minimizing work). This task would address comments and concerns that whenever code is developed online (for PPlot purposes, for example), it also needs to be developed offline within separate and very different reader approaches. At a higher level, a dramatic memory overwrite occurred offline in early 2007 due to the lack of synchronization between structure sizes (the information did NOT propagate and was not adjusted offline by the software sub-system coordinator of interest; an entire production had to be re-run).
    Status: tasked and underway, first version delivered in 2008, usage of "cons" and regression testing in principle in place (TBC in 2009 run)
     
  • EventDisplay revisited
    Status: underway (are we done? need new review follow-up after the pre-review meeting made in 2007)
     
  • VMC - realistic geometry / geometry description
    Status: Project on hold due to reconstruction issues, resumed July 2008.
     
  • Forward tracking (radial field issue). May have importance for FGT project upon schedule understanding.
    Status: depend on previous item and would be tasked whenever forward tracking need would be better defined.
     
  • Old framework cleanup, table cleanup, drop old formats and historical baggage. In principle a framework task, this is bound to introduce instabilities during which assembling a production library would be challenging. This needs to be tasked outside major development projects.
    Status: only depends on production for the Year 7/8 start-up
     
  • Multi-core CPU era - The task force assembled in 2007 (Multi-core CPU era task force) reached the unfortunate conclusion that the work would be too hard, hence not necessary. Unfortunately, market developments and aggressive company progression toward ever more packed CPUs and cores indicate that the future must integrate this new paradigm. First attempts should target the "obvious".
    Status: First status and proposal made at ACAT08 (changing chains to accommodate for possible parallelism). Investigated possibility of parallelism at library level and core algorithm (tracking). Talks at ACAT08 very informative.
     
  • Automated QA (project draft available, Kolmogorov etc... discussed and summarized here)
    Status: no project drafted yet, only live discussions and Email communications.
     
  • Automated calibration. The main project objective is to move toward a more automated calibration framework whereby migration from one chain to another (distortion correction) would be triggered by a criterion (resolution, convergence) rather than a manual change. This work may leverage the FastOffline framework (which was a first attempt to make automated calibration a reality; it is currently modified by hand and the trigger mechanism is not present / implemented)
    Status: Project description available . Summer 08 service task.
     
  • IO schema evolution (reduction of file size by dropping redundant variables but with full transparency to users)
    Status: Project started as planned on July 16th with goals drafted on page Projects and proposals. Project deliverables were achieved (tested from a custom ROOT version now in the ROOT main CVS). Future release will include a fully functional schema evolution as specified in our document. Integration will be needed.
    Project team: Jerome Lauret (coordination), Valeri Fine (STAR testing), Philippe Canal (ROOT team)

     
  • Distributed storage improvement (efficient dynamic disk population). This project would aim to restore the dynamic disk population of datasets on distributed disk, as well as a prioritization mechanism (and possibly bandwidth throttling) so users cannot over-subscribe storage, which in the past caused massive delete/restore cycles dropping efficiency.
    Status: under-graduate thesis done; a model to improve IO in/out of HPSS is defined and needs implementation.
     
  • Efficient multi-site data transfer (coordination of data movement). This project aims to address multi-Tier2 data transfer support and to help organize / best utilize the bandwidth out of BNL. A second part of this project aims at data placement on the Grid, whereby a "task" working on a dataset is scheduled to make use of files already staged at sites, or to pre-stage or migrate files from any site to any site (a bit ambitious).
    Status: Project started as a computer science PhD program (thesis submitted). Work is scheduled over a 3-year period and deliverables would need to be put in perspective with the Grid project deliverables.

     
  • Distributed production and monitoring system, job monitoring, centralized production requests interface
    Status: work tasked within the production team.
     
  • FileCatalog improvement. The FileCatalog in STAR was developed from in-house knowledge and support (starting from service work). The catalog now holds 15 million records (scalability beyond this is a concern) and its access is possibly inefficient. An initial design diverging from the Meta-Data Catalog / File Catalog / Replica Catalog separation allowed for a quick start and the development of additional infrastructure, but it has also led to the replication of the Meta-Data information, making it hard to maintain consistency of the catalogs across sites. Federating the catalogs and using all sites' information simultaneously has been marginal to not possible, making a global namespace (replicas) not possible. The lack of this component will directly affect grid realities.
    Status: Ongoing (see Catalog centralized load management, resolving slow queries).

Wish list (for now):

  • Online tracking & High Level Trigger. This may depend on a trigger simulation framework (it would have benefited from it for sure) or may be an opportunity to revive the issue and shape a new, focused (and reduced in scope) project.
    Status: How to fit this additional activity is under debate. First discussion held at BNL on 2008/07/10 and followed later by additional meetings. This activity moved to the "upgrade" activity.

 

STAR/RCF resource plans

 

 

General fund

 The level of funding planned for 2008 was:

  • According to the RHIC mid-term strategic planning for 2006-2011 document, the budget for 2008 was projected to be 2140 k$ (table 7-2), with a note that an additional 2 M$ would be needed between FY08 and FY10 (to accommodate network infrastructure, storage robotics and silo expansion, and general infrastructure changes)
  • The budget planned for FY08 in FY07 was 2.5 M$, accounting for recovering by 0.5 M$ the shortfalls already present from past years
  • The current budget available is 1.7 M$ with a 1.5 M$ usable base fund.

External funds

Following the previous years' "outsourcing" of funds approach, a note was sent to the STAR collaboration (Subject: RCF requirements & purchase) on 3/31/2008 12:18. The pricing offered was 4.2 $/GB, i.e. 4.3 k$/TB of usable space. Based on the 2007 RCF requirement learning experience (pricing was based on the vendor's total space rather than usable space), the price was firmed, fixed and guaranteed as "not higher than 4.2 $/GB" by facility director Michael Ernst at the March 27th liaison meeting.
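
As a small arithmetic check, the quoted 4.2 $/GB converts to the per-TB prices in the external-funds table (a sketch; the per-institution TB figures are copied from that table):

```python
# Quoted storage pricing: 4.2 $/GB, with 1024 GB per TB of usable space.
price_per_gb = 4.2
price_per_tb = price_per_gb * 1024      # 4300.8 $/TB

# TB requested per institution, as listed in the external-funds table.
requests_tb = {"UCLA": 1, "Rice": 1, "LBNL": 4, "VECC": 1, "UKY": 1}
total_tb = sum(requests_tb.values())    # 8 TB
total_cost = total_tb * price_per_tb    # 34406.4 $

print(f"{price_per_tb:.1f} $/TB, total {total_tb} TB -> {total_cost:.1f} $")
```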

The institutions external fund profile for 2008 is as follows:

 

STAR external funds
Institution   Paying account   TB requested   Price ($)
UCLA          UCLA             1               4300.8
Rice          Rice             1               4300.8
LBNL          LBNL             4              17203.2
VECC          BNL              1               4300.8
UKY           UKY              1               4300.8
Totals                         8              34406.4

Penn State University provided (late) funds worth 1 TB.

 

*** WORK IN PROGRESS ***

Requirements 

The requirements for FY08 are determined based on 

The initial STAR requirements provided for the RHIC mid-term strategic plan can be found here

 

STAR resource requirements FY05-FY12

 

The initial raw data projection was 870 TB (+310 TB).

The RAW data volume taken by STAR in FY08 (shorter run) is given by the HPSS usage (RAW COS) as shown below:


A total of 165 TB was accumulated, far below the expected data projections (by a factor of 2). The run was however declared as meeting (to exceeding) goals compared to the STAR initial BUR.

Some notes:

  • STAR made extensive use this year of fast triggers
     
  • Based on those numbers, we assumed that
    • The CPU requirement of 1532 kSI2k (+1071 kSI2k) would scale equally, hence a minimal requirement of +215 kSI2k should be accounted for
    • A bigger pool of distributed storage would allow for more flexibility: it would allow re-considering multiple (if not most of) the datasets to be placed on disk in the Xrootd pool, and it would allow (modulo expanding beyond the 1.2 replication baseline) better load balancing of the resources.
    • The distributed disk planning accounted for 365 TB of storage (1-pass production, small fraction of past results on disk). We targeted 800 TB of disk space (about twice the initial amount).

Allocations within total budgets

scenario B = scenario A + external funds

 

Experiment Parameters STAR STAR
  Scenario A Scenario B
Sustained d-Au Data Rate (MB/sec) 70 70
Sustained p-p Data Rate (MB/sec) 50 50
Experiment Efficiency (d-Au) 90% 90%
Experiment Efficiency (p-p) 90% 90%
Estimated d-Au Raw Data Volume (TB) 130.8 130.8
Estimated p-p Raw Data Volume (TB) 41.5 41.5
Estimated Raw Data Volume (TB) 172.3 172.3
<d-Au Event Size> (MB) 1 1
<p-p Event Size> (MB) 0.4 0.4
Estimated Number of Raw d-Au Events 137,168,640 137,168,640
Estimated Number of Raw p-p Events 108,864,000 108,864,000
d-Au Event Reconstruction Time (sec) 9 9
p-p Event Reconstruction Time (sec) 16 16
SI2000-sec/event d-Au 5202 5202
SI2000-sec/event p-p 9248 9248
CPU Required (kSI2000-sec) 1.7E+9 1.7E+9
CRS Farm Size if take 1 Yr. (kSI2k) 54.6 54.6
CRS Farm Size if take 6 Mo. (kSI2k) 109.1 109.1
     
Estimated Derived Data Volume (TB) 200.0 200.0
Estimated CAS Farm Size (kSI2k) 400.0 400.0
     
Total Farm Size (1 Yr. CRS) (kSI2k) 454.6 454.6
Total Farm Size (6 Mo. CRS) (kSI2k) 509.1 509.1
     
Current Central Disk  (TB) 82 82
Current Distributed Disk (TB) 527.5 527.5
Current kSI2000 1819.4 1819.4
     
Central Disk to retire (TB) 0 0
# machines to retire from CAS 0 0
# machines to retire from CRS 128 128
Distributed disk to retire (TB) 27.00 27.00
CPU to retire (kSI2k) 120.00 120.00
     
Central Disk (TB) 49.00 57.00
     
Cost of Central Disk $205,721.60 $239,308.80
Cost of Servers to support Central Disk    
     
Compensation Disk entitled (TB) 0.00 0.00
Amount (up to entitlement) (TB) 0.00 0.00
Cost of Compensation Disk $0 $0
Remaining Funds $0 $0
     
Compensation count (1U, 4 GB below) 5 5
Compensation count (1U, 8 GB below) 0 0
CPU Cost $27,500 $27,500
Distributed Disk 27.8 27.8
kSI2k 114.5 114.5
     
     
# 2U, 8 cores, 5900 GB disk, 8 GB RAM 27 27
# 2U, 8 cores, 5900 GB disk, 16 GB RAM 0 0
CPU Cost $148,500 $148,500
Distrib. Disk on new machines (TB) 153.9 153.9
kSI2k new 618.2 618.2
Total Disk (TB) 813.2 821.2
Total CPU (kSI2000) 2432.1 2432.1
Total Cost $354,222 $387,809
Outside Funds Available $0 $34,406
Funds Available $355,000 $355,000
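
The CPU entries in the table above can be cross-checked with a short script. This is a sketch: the implied 578 SI2k per-CPU rating comes from the ratios 5202/9 = 9248/16 in the table, and the year length used is an assumption.

```python
# Cross-check of the Scenario A raw-data and CPU arithmetic (values from the table).
MB_PER_TB = 1024 ** 2
SEC_PER_YEAR = 365 * 24 * 3600

events_dau = 137_168_640          # estimated raw d-Au events, 1 MB each
events_pp = 108_864_000           # estimated raw p-p events, 0.4 MB each

raw_dau_tb = events_dau * 1.0 / MB_PER_TB     # ~130.8 TB
raw_pp_tb = events_pp * 0.4 / MB_PER_TB       # ~41.5 TB

cpu_si2k_sec = events_dau * 5202 + events_pp * 9248   # ~1.7e12 SI2000-sec
farm_1yr_ksi2k = cpu_si2k_sec / SEC_PER_YEAR / 1000   # ~54.6 kSI2k
farm_6mo_ksi2k = 2 * farm_1yr_ksi2k                   # ~109.1 kSI2k

print(f"raw: {raw_dau_tb:.1f} + {raw_pp_tb:.1f} TB, "
      f"farm (1 yr): {farm_1yr_ksi2k:.1f} kSI2k, "
      f"farm (6 mo): {farm_6mo_ksi2k:.1f} kSI2k")
```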

 

Post purchase actions

BlueArc disk layout before the new storage commissioning

 

All figures are in TB. "Space allocated" is the sum of the hard quotas handed out on a file system; "Available Space" is the file system capacity. A "?" means the hard quota was not known.

STAR-FS01 (BlueArc physical storage BA01) - Space allocated: 16.50, Available Space: 19.00

  Name                      Path                      Hard Quota
  star_institutions_bnl     /star_institution/bnl     3.50
  star_institutions_emn     /star_institution/emn     1.60
  star_institutions_iucf    /star_institution/iucf    0.80
  star_institutions_ksu     /star_institution/ksu     0.80
  star_institutions_lbl     /star_institution/lbl     9.80

STAR-FS02 - Space allocated: 17.22, Available Space: 19.75

  star_data03               /star_data03              1.80
  star_data04               /star_data04              1.00
  star_data08               /star_data08              1.00
  star_data09               /star_data09              1.00
  star_data16               /star_data16              1.66
  star_data25               /star_data25              0.83
  star_data26               /star_data26              0.84
  star_data31               /star_data31              0.83
  star_data36               /star_data36              1.66
  star_data46               /star_data46              6.60

STAR-FS03 (BlueArc physical storage BA02) - Space allocated: 18.51, Available Space: 21.40

  star_data05               /star_data05              2.24
  star_data13               /star_data13              1.79
  star_data34               /star_data34              1.79
  star_data35               /star_data35              1.79
  star_data48               /star_data48              6.40
  star_data53               /star_data53              1.50
  star_data54               /star_data54              1.50
  star_data55               /star_data55              1.50

STAR-FS04 - Space allocated: 16.86, Available Space: 19.45

  star_data18               /star_data18              1.00
  star_data19               /star_data19              0.80
  star_data20               /star_data20              0.80
  star_data21               /star_data21              0.80
  star_data22               /star_data22              0.80
  star_data27               /star_data27              0.80
  star_data47               /star_data47              6.60
  star_institutions_mit     /star_institutions/mit    0.96
  star_institutions_ucla    /star_institutions/ucla   1.60
  star_institutions_uta     /star_institutions/uta    0.80
  star_institutions_vecc    /star_institutions/vecc   0.80
  star_rcf                  /star_rcf                 1.10

STAR-FS05 (BlueArc physical storage BA4) - Space allocated: 1.042, Available Space: 2.05

  star_emc                  /star_emc                 ?
  star_grid                 /star_grid                0.05
  star_scr2a                /star_scr2a               ?
  star_scr2b                /star_scr2b               ?
  star_starlib              /star_starlib             0.02
  star_stsg                 /star_stsg                ?
  star_svt                  /star_svt                 ?
  star_timelapse            /star_timelapse           ?
  star_tof                  /star_tof                 ?
  star_tpc                  /star_tpc                 ?
  star_tpctest              /star_tpctest             ?
  star_trg                  /star_trg                 ?
  star_trga                 /star_trga                ?
  star_u                    /star_u                   0.97
  star_xtp                  /star_xtp                 0.002

STAR-FS06 - Space allocated: 14.94, Available Space: 16.90

  star_data01               /star_data01              0.83
  star_data02               /star_data02              0.79
  star_data06               /star_data06              0.79
  star_data14               /star_data14              0.89
  star_data15               /star_data15              0.89
  star_data38               /star_data38              1.79
  star_data39               /star_data39              1.79
  star_data40               /star_data40              1.79
  star_data41               /star_data41              1.79
  star_data43               /star_data43              1.79
  star_simu                 /star_simu                1.80

STAR-FS07 - Space allocated: 16.40, Available Space: 19.15

  star_data07               /star_data07              0.89
  star_data10               /star_data10              0.89
  star_data12               /star_data12              0.76
  star_data17               /star_data17              0.89
  star_data24               /star_data24              0.89
  star_data28               /star_data28              0.89
  star_data29               /star_data29              0.89
  star_data30               /star_data30              0.89
  star_data32               /star_data32              1.75
  star_data33               /star_data33              0.89
  star_data37               /star_data37              1.66
  star_data42               /star_data42              1.66
  star_data44               /star_data44              1.79
  star_data45               /star_data45              1.66
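
As a sanity check, the per-volume hard quotas on a file system should sum to its "Space allocated" figure. A minimal sketch of that check (quota figures copied from the layout above for STAR-FS01 and STAR-FS02 only; volumes with unknown quotas would simply be skipped):

```python
# Per-volume hard quotas (TB) copied from the layout table above.
hard_quotas_tb = {
    "STAR-FS01": [3.50, 1.60, 0.80, 0.80, 9.80],
    "STAR-FS02": [1.80, 1.00, 1.00, 1.00, 1.66, 0.83, 0.84, 0.83, 1.66, 6.60],
}

def allocated_tb(fs):
    """Total hard quota handed out on one file system, rounded to 0.01 TB."""
    return round(sum(hard_quotas_tb[fs]), 2)
```

allocated_tb("STAR-FS01") gives 16.5 and allocated_tb("STAR-FS02") gives 17.22, matching the "Space allocated" column.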


Reshape proposal

Action effects are given as +/- impacts, in TB, per file system (FS01-FS07) and on the new SATA storage. Checkpoint rows give the cumulative net effect.

2008/08/15  Move/backup data25, 26, 31, 36 to SATA: FS02 +4.56, SATA -4.56
2008/08/18  Drop 25, 26, 31, 36 from FS01 and expand on SATA to 5 TB: SATA -15.84
2008/08/22  Shrink 46 to 5 TB, move to SATA and make it available at 5 TB: FS02 +6.60, SATA -5.00
2008/08/19  Move institutions/ksu and institutions/iucf to FS02: FS01 +1.60, FS02 -1.60
2008/08/19  Expand ksu and iucf to 2 TB: FS02 -0.80
2008/08/22  Move institutions/bnl to FS02: FS01 +3.50, FS02 -3.50
            Expand bnl to 4 TB: FS02 -0.50
            Expand lbl by 4.2 TB (i.e. 14 TB): FS01 -4.20
            Expand emn to 2 TB: FS01 -0.40
            Expand data03 to 2.5 TB: FS02 -0.70
            Expand data04 to 2 TB: FS02 -1.00
            Expand data08 to 2 TB: FS02 -1.00
            Expand data16 to 2 TB: FS02 -0.34
            Expand data09 to 2 TB: FS02 -1.00
Checkpoint: FS01 +0.50, FS02 +0.72, FS03 0.00, FS04 0.00, FS05 0.00, FS06 0.00, FS07 0.00, SATA -25.40

2008/08/22  Shrink data48 to 5 TB, move to SATA: FS03 +6.40, SATA -5.00
            Expand data05 to 3 TB: FS03 -0.76
            Expand 13, 34, 35, 53, 54 and 55 to 2.5 TB: FS03 -5.13
2008/08/22  Shrink and move data47 to SATA: FS04 +6.60, SATA -5.00
            Move 18, 19, 20, 21 to SATA: FS04 +3.40, SATA -3.40
            Expand data18, 19, 20, 21 to 2.5 TB: SATA -6.60
            Add to FS02 institutions/uky at 1 TB: FS04 -1.00
            Add to FS02 institutions/psu at 1 TB: FS04 -1.00
            Add to FS02 institutions/rice at 1 TB: FS04 -1.00
            Expand vecc to 2 TB: FS04 -1.20
            Expand ucla to 3 TB: FS04 -1.40
            Expand 22 and 27 to 1.5 TB: FS04 -1.40
            Expand /star/rcf to 3 TB: FS04 -1.90
Checkpoint: FS01 +0.50, FS02 +0.72, FS03 +0.51, FS04 +1.10, FS05 0.00, FS06 0.00, FS07 0.00, SATA -45.40

            Free (HPSS archive) emc, src2a, src2b, stsg, timelapse, tof: FS05 0.00
            Free (HPSS archive) tpc, tpctest, trg, trga: FS05 0.00
            Move 40, 41, 43 to SATA: FS06 +5.37, SATA -5.37
            Expand 01 to 2 TB: FS06 -1.17
            Expand 02 to 2 TB: FS06 -1.21
            Expand star_simu to 3 TB: FS06 -1.20
Checkpoint: FS01 +0.50, FS02 +0.72, FS03 +0.51, FS04 +1.10, FS05 0.00, FS06 +1.79, FS07 0.00, SATA -50.77
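
The checkpoint figures can be cross-checked by summing the per-action impacts for a given column. A minimal sketch (impacts for FS02 and SATA copied from the first block of actions above):

```python
# Impacts (TB) on STAR-FS02 and on SATA from the first block of actions.
fs02_impacts = [4.56, 6.60, -1.60, -0.80, -3.50, -0.50,
                -0.70, -1.00, -1.00, -0.34, -1.00]
sata_impacts = [-4.56, -15.84, -5.00]

def checkpoint(impacts):
    """Net effect in TB, rounded to 0.01."""
    return round(sum(impacts), 2)
```

checkpoint(fs02_impacts) reproduces the +0.72 of the first checkpoint row, and checkpoint(sata_impacts) reproduces -25.40.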


 


Missing information and progress records:

  • 2008/08/14 13:44 - Answer from the RCF: the above plan is approved (with a comment that it seemed easy)
    • Two caveats: an ETA cannot be provided until the migration starts (one test example) to get a more accurate estimate
    • While virtual mount points are swapped from one storage pool to another, there may be a fluke in access (institutions will need informing / production disks will be handled by hard dismount)

  • 2008/08/14 13:42 - Sent an Email requesting information regarding the disk manager and/or policies for PSU, UKY and RICE - Email sent to the council rep and/or designated rep on August 14th 2008
    • Answer from UKY 2008/08/14 13:56 >> Disk space manager = Renee Fatemi, policy = MIT policy
    • Answer from PSU 2008/08/15 16:07 >> Policy is standard

  • 2008/08/18
    • Achieved actions are marked in italic + date
    • Dates in italic are ongoing actions
    • If two dates appear, the first is the start of the action and the second the end

 

 

 

2009

Requirements and resource planning for 2009.

CPU and bulk storage purchase 2009

The assumed CPU profile will be:

  • 2 GB of memory per core
  • Nearly 6 TB of disk space per node
  • Several CPU models will be investigated for the best price/performance ratio (bulk purchase pricing matters in this purchase, hence coordination between STAR/PHENIX is likely needed) - currently being considered are
    • Xeon 5550 @ 3350 SI2k (scenario A)
    • Xeon 5560 @ 3526 SI2k (scenario B)

The share between space and CPU is as below within the following caveats:

  • The additional massive amount of storage (+170 TB for production) requires a secondary Titan head and the proper network switches. The total cost is projected to be ~ 50k$ and we agreed to leave a ~ 20k$ unspent fund to move in this direction (cost shared with the facility budget)

 

Experiment Parameters                      Scenario A    Scenario B

Central Disk (TB) - Institution            20.00         20.00
Type Institution (Index from C&C)          11            11
Cost of Central Disk for Institution       $62,441.47    $62,441.47
Central Disk (TB) - NexSan-Production      0.00          0.00
Type NS-Prod (Index from C&C)              13            13
Cost of NexSan-Production                  $0.00         $0.00
Central Disk (TB) - Production             170.00        170.00
Type of Production (Index from C&C)        12            12
Cost of Production Disk                    $136,374.27   $136,374.27
Total Size of new Central Disk (TB)        190.00        190.00
Total Cost of Central Disk                 $198,815.74   $198,815.74
Cost of Servers to support Central Disk    -             -

Compensation Disk entitled (TB)            0.00          0.00
Amount (up to entitlement) (TB)            0.00          0.00
Cost of Compensation Disk                  $0            $0
Remaining Funds                            $0            $0
Compensation count (1U, 4 GB below)        0             0
Compensation count (1U, 8 GB below)        0             0
CPU Cost (compensation)                    $0            $0
Distributed Disk                           0.0           0.0
kSI2k                                      0.0           0.0

CPU Type (Index from Constants&Costs)      2             5
# 2U, 55xx, 5700 GB disk, 24 GB            74            72
CPU Alternative (not used)                 0             0
CPU Cost                                   $429,126      $427,680
Distrib. Disk on new machines (TB)         421.8         410.4
kSI2k new                                  1983.2        2031.0

Total Disk (TB)                            1393.8        1382.4
Total CPU (kSI2000)                        4303.2        4351.0
Total Cost                                 $627,942      $626,496
Outside Funds Available                    $62,441       $62,441
Funds Available                            $588,000      $588,000
Unspent Funds                              $22,500       $23,946
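
For reference, the "Unspent Funds" line follows directly from the rows above: all available funds (STAR funds plus outside contributions) minus the total purchase cost. A minimal sketch:

```python
def unspent_funds(funds, outside, total_cost):
    """Unspent = all available funds minus the total purchase cost ($)."""
    return funds + outside - total_cost

# Scenario A: $588,000 + $62,441 - $627,942 = $22,499; the table's $22,500
# differs by $1 because the outside-funds figure is truncated to whole dollars.
```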

 

 

Disk space for FY09

Institution disk space

The below is what was gathered from the call sent to starsoft, "Inquiry - institutional disk space for FY09" (with delay, a copy was sent to starmail on the 14th of April 2009). The deadline was the end of Tuesday the 14th of April 2009; feedback was accepted until Wednesday the 15th (anything afterward could have been ignored).

 

Institution  TB  Confirmed
LBNL         5   April 21st 17:30
BNL hi       2   [self]
BNL me       1   [self]
NPI/ASCR     3   April 22nd 05:54
UCLA         1
Rice         4   April 21st 18:47
Purdue       1   April 22nd 15:12
Valpo        1   April 22nd 17:59
MIT          2   April 22nd 15:56
Total        20

The pricing on the table is as initially advertised, i.e. a BlueArc Titan 3200 based solution at 4.3 k$/TB for Fibre Channel based storage. For a discussion of Fibre Channel versus SATA, please consult this posting in starsoft. A quick performance overview of the Titan 3200 is shown below:

Titan 3200
  IOPS:                 200,000
  Throughput:           up to 20 Gbps (2.5 GB/sec)
  Scalability:          up to 4 PB in a single namespace
  Ethernet ports:       2 x 10GbE or 6 x GbE
  Fibre Channel ports:  eight 4Gb
  Clustering ports:     two 10GbE

The solution enables over 60,000 user sessions and thousands of compute nodes to be served concurrently.

The first scalability statement (4 PB in a single namespace) is far beyond RHIC/STAR needs, but the second (concurrent sessions and compute nodes) is easily reached in the RCF environment.

Production space

A SATA based solution will be priced at 2.2 k$/TB. While the price is lower than the Fibre Channel solution (and may be tempting), this solution is NOT recommended for institutional disk, as its scalability for read IO at the level we are accustomed to is doubtful ("doubtful" is probably an understatement, as we know from our experience five years ago that we would have to apply IO throttling).

As space for production, however (and considering resource constraints demanding cheaper solutions, coupled with an Xrootd fast-IO based aggregation solution which will remain the primary source of data access for users), the bet is that it will work if used as buffer space: production jobs write locally to the worker nodes, then move files to central disk at the end as an additional copy alongside an HPSS data migration. There will be minimal guarantees of read performance for analysis on those "production reserved" storage areas.

One unit of Thumper at 20k$ / 33 TB usable will also be purchased and tried out in a special context. This solution is even less scalable and hence requires a reduced number of users and IO. The space targeted for this lower end may include (TBC):

  • data06 & data07 (2 TB) - reserved for specific projects and not meant for analysis; performance would not be an issue
  • data08 (2 TB) - meant for Grid; IO is minimal there, but we may need to measure data transfers compatible with KISTI based production
  • /star/rcf (5 TB) - production log space (delayed IO, mostly a one-time saving, and will be fine)

Final breakdown

 

Post procurement 1 space topology

Following the Disk space for FY09, here is the new space topology and space allocation. 

BlueArc01

STAR-FS01                   Space (TB)
star_institutions_emn       2.0
star_institutions_lbl       14.0
star_institutions_lbl_prod  5.0
star_institutions_mit       3.0
star_institutions_rice      5.0

STAR-FS02                   Space (TB)
star_data03                 2.5
star_data04                 2.0
star_data08                 2.0
star_data09                 2.0
star_institutions_bnl       6.0
star_institutions_bnl_me    1.0
star_institutions_iucf      1.0
star_institutions_ksu       1.0
star_institutions_npiascr   3.0
star_institutions_valpo     1.0

BlueArc02

STAR-FS03                   Space (TB)
star_data05                 3.0
star_data13                 2.5
star_data34                 2.5
star_data35                 2.5
star_data53                 2.5
star_data54                 2.5
star_data55                 2.5

STAR-FS04                   Space (TB)
star_data22                 2.0
star_data27                 1.5
star_institutions_psu       1.0
star_institutions_purdue    1.0
star_institutions_ucla      4.0
star_institutions_uky       1.0
star_institutions_uta       1.0
star_institutions_vecc      2.0
star_rcf                    3.0

BlueArc04

STAR-FS05                   Space (TB)
star_grid                   0.5
star_starlib                0.25
star_u                      1.6

STAR-FS06                   Space (TB)
star_data01                 2.2
star_data02                 2.2
star_data06                 1.0
star_data14                 1.0
star_data15                 1.0
star_data16                 2.0
star_data38                 2.0
star_data39                 2.0
star_simu                   3.0

STAR-FS07                   Space (TB)
star_data07                 0.89
star_data10                 0.89
star_data12                 0.76
star_data17                 0.89
star_data24                 0.89
star_data28                 0.89
star_data29                 0.89
star_data30                 0.89
star_data32                 1.75
star_data33                 0.89
star_data37                 1.66
star_data42                 1.66
star_data44                 1.79
star_data45                 1.66

 

Projects & proposals

This page is under construction. Most projects are still under the Projects and proposals page and have not been revised.

  • Supplemental funds were requested from DOE to help with infrastructure issues for both STAR & PHENIX (and in prediction of a difficult FY10 funding cycle). The document is attached below as Supplemental-justification-v0 7.jl_.pdf
  • CloudSpan: Enabling Scientific Computing Across Cloud and Grid Platforms proposal was granted a Phase-I SBIR. This proposal is made in collaboration with Virkaz Tech.
  • Customizable Web Service for Efficient Access to Distributed Nuclear Physics Relational Databases proposal was granted a Phase-II award.

2010


CPU and bulk storage purchase 2010

 

Institutional disk space summary

Announcement for institutional disk space was made in starmail on 2010/04/26 12:31.

To date, the following requests were made (either in $ or in TB):

 

Institution  Contact                           Date              $ (k$)  TB equivalent  Final cost
LBNL         Hans Georg Ritter                 2010/04/26 15:24  20      5              $17,006.00
ANL          Harold Spinka                     2010/04/26 16:29  -       1              $3,401.00
UCLA         Huan Huang                        2010/04/26 16:29  -       1              $3,401.00
UTA          Jerry Hoffmann                    2010/04/27 14:59  -       1              $3,401.00
NPI          Michal Sumbera & Jana Bielcikova  2010/04/20 10:00  30      8              $27,210.00
PSU          Steven Heppelmann                 2010/04/29 16:00  -       1              $3,401.00
BNL          Jamie Dunlop                      2010/04/29 16:45  -       5              $17,006.00
IUCF         Will Jacobs                       2010/04/29 20:18  -       2              $6,802.00
MIT          Bernd Surrow                      2010/05/08 18:07  -       2              $6,802.00
                                               Totals                    24             $88,430.00

The storage cost for 2010 was estimated at 3.4 k$/TB. Detailed pricing below.

 

Central storage cost estimates

Since the existing storage is stretched in terms of number of servers and scalability, we would (must) buy a pair of Mercury servers, which recently cost us $95,937. The storage itself would be based on recent pricing, i.e. a recent configuration was quoted as: (96) 1 TB SATA drives at $85,231 + $2,500 installation, yielding 54 TB usable. STAR's target is 50 TB for production + 5 + 10 TB for institutions (it will fit and can be slightly expanded). The total cost is hence:

$95,937 + $85,231 + $2,500 = $183,668 / 54 TB ≈ $3,401/TB

Detailed cost projections may indicate (depending on global volume) possibly better pricing: the installation price (half a day of work for a BlueArc technician) is fixed, and each server pair could hold more than the planned storage (hence the cost for two servers is also fixed). Below are a few configurations:

                       54 TB      27 TB        54+27 TB
Service installation   2,500      2,500        2,500
Storage cost           85,231     42,615.5     127,846.5
Two servers            95,937     95,937       95,937
Price with servers     183,668    141,052.5    226,283.5
Price per TB           3,401.3    5,224.2      3,187.1
Price per MB           0.003244   0.004982     0.003039
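
The per-TB figures follow from spreading the fixed costs (server pair and installation) over each configuration's usable capacity. A minimal sketch of that arithmetic:

```python
SERVER_PAIR = 95_937    # fixed cost: two Mercury servers ($)
INSTALL = 2_500         # fixed cost: BlueArc technician installation ($)

def price_per_tb(storage_cost, usable_tb):
    """Total price (fixed costs + storage) divided by usable capacity."""
    return round((SERVER_PAIR + INSTALL + storage_cost) / usable_tb, 1)

# 54 TB configuration: price_per_tb(85_231, 54)   -> 3401.3
# 27 TB configuration: price_per_tb(42_615.5, 27) -> 5224.2
```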

 

CPU estimates, choices, checks

Projected / allowed additional CPU need based on funding guidance (see CSN0474: The STAR Computing Resource Plan, 2009): 7440 kSI2k / 2436 kSI2k - projected to be a 43% shortage
Projected distributed storage under the same conditions (the dd model has hidden assumptions): 417 TB / 495 TB - projected, at the acquired level, to be 130% off the optimal solution

The decision was to go for 1U machines and switch to the 2 TB Hitachi HUA722020ALA330 SATA 3.0 Gbps drive to compensate for the drive space loss (4 slots instead of 6 in a 2U). The number of network ports was verified to be adequate for the projection below. The 1U configuration allows recovering more CPU power / density. Also, the goal is to move to a boosted memory configuration and enable hyper-threading, growing from 8 batch slots to a consistent 16 slots per node (so another x2, although the performance scaling will not be x2 due to the nature of hyper-threading). Finally, it was decided NOT to retire the older machines this year but to keep them running until next year.

Planned numbers

  • Distributed storage addition: 1009.4 TB
    • Only 3/4 of this space is usable, times 90% for high watermarking; hence we end up with 681 TB of new storage. The assumption is that one of the 2 TB disks will go to support production and user analysis (the likely proper number is 1 TB, hence a 16% effect and margin, TBC).
    • The total required space for considering all production passes within a year is 1440 TB.
    • The accumulated total usable distributed storage is 277 TB - the total space is hence planned to be 958 TB with the assumptions above (possibly a 30% shortfall, or only 15% if we recover 1 TB from the OS+TEMP disk).
    • Conclusion: distributed storage will remain constrained as planned (not all productions will be available, but nearly all).
  • The total centralized disk needed for 2010 was 50 TB. The final number will be an 81 TB unit - 24 TB for institutional support = 57 TB of storage.
    • Conclusion: the central storage will have a small margin of flexibility, allowing expansion of simu space and other similar areas.
  • The total CPU required was projected to be 11634 kSI2k.
    • Within the current procurement, the total CPU will reach 8191 kSI2k with 1U nodes (it would have been 6827 kSI2k for 2U nodes).
    • Our shortfall will be ~ 30% off the theoretical projected needs. The initial projection was a fall of 43% (so a 13% gain by balancing cost between storage, memory and CPU).
    • Assuming hyper-threading allows for at least a gain factor of x1.4 (TBC, but evidence through beta-testing indicates this is likely), the shortfall may be as little as 16%. This number is within a reachable enhanced duty factor.
    • Conclusion: the shortfall, if the initial projections remain accurate, is assumed to be from 16 to 30%.
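
The distributed-storage bullets above reduce to a short calculation, using the usable fraction (3/4) and high-watermark factor (90%) assumed in the text:

```python
raw_new_tb = 1009.4          # raw distributed storage added
USABLE_FRACTION = 3 / 4      # only 3/4 of the raw space is usable
WATERMARK = 0.90             # high-watermark factor

usable_new_tb = round(raw_new_tb * USABLE_FRACTION * WATERMARK)   # -> 681
existing_tb = 277
total_tb = usable_new_tb + existing_tb                            # -> 958
required_tb = 1440
# About a 33% shortfall against the 1440 TB target, close to the ~30% quoted.
shortfall_pct = round(100 * (1 - total_tb / required_tb))
```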

Reality checks:

 

Accounting check - post install

Since we have had many problems with mismatches between purchased and provided space with the RCF in past years, keeping track of the space accounting is a good idea. Below is an account of where the space went (it should total to 55 TB of production space and 26 TB of institution space).

Disk                   Initial space  Final size  Total
lbl_prod               5              5           10
lbl                    14             0           14
anl                    0              1           1
mit                    3              2           5
bnl                    6              5           11
iucf                   1              2           3
npiascr                3              8           11
psu                    1              1           2
ucla                   4              1           5
uta                    1              1           2
Total added                           26
data08                 2              2.5         4.5
data09                 2              3           5
data22                 2              3.5         5.5
data23                 5              0.5         5.5
data27                 1.5            4           5.5
data11 (gone in 2009)                 5           5
data23 (gone in 2009)                 5           5
data85 to 89           N/A            5*5         25
data90                 N/A            6           6
Total added so far                    54.5
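
A quick cross-check of the totals above ("Total added" sums the final sizes; the per-row "Total" column is initial plus final):

```python
# Final sizes (TB) copied from the accounting table above.
institution_final = [5, 0, 1, 2, 5, 2, 8, 1, 1, 1]
production_final = [2.5, 3, 3.5, 0.5, 4, 5, 5, 25, 6]

institution_total = sum(institution_final)   # matches "Total added" (26)
production_total = sum(production_final)     # matches "Total added so far" (54.5)
```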

 

There should be a 0.5 TB unallocated here and there.

 

2011

 

Quick notes - initial purchase cycle process

  • Discussion on procurements re-opened on 2011/05/31. Discussed with the facility director STAR's dire need for institutional disk space (following some chat from the last Collaboration Meeting) - the general commitment was to try to find a viable payment solution (still unclear how)
  • Initial call for institutional disk space needs made on 2011/06/01 - only a couple of days for feedback (the requirements total is needed immediately, as agreed with the facility director). Feedback provided so far:
    • 2011/06/01 12:17: UCLA + 2 TB
    • 2011/06/01 12:20: Purdue + 3 TB
    • 2011/06/02 11:07: NPI/ASCR + 7 TB
    • 2011/06/02 21:16: LBL + 7 TB
  • Other notes
    • 2011/06/02 03:33: Valpo U offered to pay back this year what they owed last year (1 TB equivalent)
    • Exceeded deadline (low priority requests with even fewer guarantees):
      • 2011/06/06 15:32: IUCF + 2 TB
      • 2011/06/06 16:14: BNL + 5 TB
    • Exceeded submission of the requirements request
      • 2011/06/14 10:54: UTA + 1.5 TB

 

2011/06 - FY11 procurement Status

  • Requirements turned in to the RCF on 2011/06/09 - institutions in blue above were included but the institution in red was NOT (for obvious date reasons)
  • The purchase for the Linux farm went out on the 10th
  • Requests for storage quotes went out on the 13th
    • Any other requests will need to be re-addressed at a later time: 2x 26 TB cabinets were ordered, exceeding the storage request from STAR of 36 TB total for FY11

Summary of requests and dues:

 

Institution   Space (TB)  Estimated cost *  Charged & date            Left over due +
UCLA          2           $4,802.48         $4,800 - ????/??/??       $3
Purdue        3           $7,203.72         $7,203.72 - 2013/02/20    $0
NPI/ASCR      7           $16,808.67        $15,000 - 2011/12/09      $1,809
LBNL          7           $16,808.67        $14,160 - 2011/06/15      $2,649
IUCF          2           $4,802.48         $6,802 - 2012/06/11       (past unpaid due added)
BNL           5           $12,006.20        (internal)                $0
Grand totals  26          $62,432.22

* ATTENTION: estimated cost based on initial purchase cost estimates. The final price may vary.
+ Unless a number appears in this column, the estimated cost is due in full. Multiple charges may apply to a given institution (until total dues are collected).
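
The estimates above are consistent with a single per-TB rate (about $2,401.24/TB, as implied by the 2 TB rows), with the left-over due being the estimate minus what was charged so far. A minimal sketch, assuming that flat rate:

```python
RATE_PER_TB = 4802.47808 / 2   # $/TB implied by the 2 TB rows above

def estimated_cost(tb):
    """Estimated cost in $ for a given number of TB."""
    return round(tb * RATE_PER_TB, 5)

def left_over_due(tb, charged):
    """Remaining due in whole dollars."""
    return round(estimated_cost(tb) - charged)

# left_over_due(7, 15_000) -> 1809 (NPI/ASCR); left_over_due(7, 14_160) -> 2649 (LBNL)
```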

 

Acquired otherwise:

 

                        Additional  Total after purchase  % increase  S&C plan 2008  Deficit *
Central space (prod)    10.00       345.00                2.90%       377.00         -8.49%
Distributed disk space  430.47      1456.47               29.56%      2659.50        -45.24%
kSI2K farm              3045.00     6900.00               44.13%      30115.00       -77.09%

* ATTENTION: Note-A: deficit assumes a similar run plan - U+U was suggested for FY11 ; Note-B: increase in number of events is not helping; Note-C: if we are lucky and size / events is smaller than projected, the distributed disk may be fine.
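
The percentage columns above can be reproduced as follows (the increase is taken relative to the total after purchase, and the deficit relative to the 2008 S&C plan):

```python
def pct_increase(additional, total_after):
    """Share of the post-purchase total contributed by the new purchase (%)."""
    return round(100 * additional / total_after, 2)

def deficit(total_after, plan):
    """Shortfall of the post-purchase total relative to the plan (%)."""
    return round(100 * (total_after - plan) / plan, 2)

# pct_increase(430.47, 1456.47) -> 29.56 ; deficit(345, 377) -> -8.49
```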

 

Disk space status on 2011/08/04

Storage

Mount     Y2010  Current  Requested  Status / Comment
ANL       1      1
BNL       11     16       +5         Taken care of
BNL_ME    1      1
EMN       2      2
IUCF      3      5        +2         Taken care of
KSU       1      1
LBL       14     14       +5         Taken care of
LBL_PROD  10     12       +2         Taken care of
MIT       5      5
NPIASCR   11     18       +7         Taken care of
PSU       2      2
PURDUE    1      4        +3         Taken care of
RICE      5      5
UCLA      5      7        +2         Taken care of
UKY       1      1
UTA       2      2
VALPO     1      1
VECC      2      2

 

2012

The 2012 budget did not allow for flexible choices of hardware or storage. The RHIC experiments were not asked for a partitioning (a 1/2 and 1/2 split was made between STAR and PHENIX, essentially covering new farm nodes). Storage was handled via a replacement of old storage by newer storage media (and we doubled our space).

Since several institutional disk space bills were pending (unpaid), that possibility did not offer itself either. See the 2011 requirements for where we were.

2013

Requirements and plans for 2013

The RCF budget was minimal - no external disk purchase was carried out; essentially, "infrastructure" related money (HPSS silo expansion) took the core budget, modulo some left for COU purchases.

Software effort level by sub-systems


Entries in blue map to the physics program.
N/A indicates a constant effort level for the duration of the project.

Sub-system  Task description  Approximate time  Start time needed  Core FTE  Sub-sys FTE
HFT Geometry / alignment studies
Includes dev geometry development, developing alignment procedures, infrastructure support for code and alignment,
12 months 2012-10-01 0.2 2*0.5=1.0
HFT Survey, Db work and maintenance 12 months 2012-10-01 0.1 0.5
HFT Detector operations. Includes monitoring QA, calibrations, alignment for PXL, IST,SSD Each Run 2013-03-01 0.1 3*0.5=1.5
HFT Tracking studies, Stv integration and seed finder studies 12 months 2012-10-01 0.4 0.5
HFT Cluster/Hit reconstruction: DAQ for PXL, SSD, IST and definition of base structures but also development of Fast simulator 12 months 2012-10-01 0.2 1.0
HFT Decay vertex reconstruction, development of secondary vertex fitting methods and tools 8 months 2012-12-01 0.1 0.3
HFT General help N/A 2012-09-01 0.1 0.2
FGT Tracking with Stv including integration of FGT and EEMC, ... ECAL
W program requires good charge separation. Requirements for other physics goals like direct photon in the EEMC, delta-G, jets, IFF  and x-sections have to be investigated and likely range from crude track reconstruction for vetoing to optimal momentum reconstructions
8 months 2012-12-01 0.6 0.2
FGT Vertexing for forward physics 2 months 2013-04-01 0.2 0.3
FGT Alignment study, improvements 8 months 2012-12-01 0.2 0.5
FGT Improvements and tuning (Cluster finding, ...) 3 months 2013-01-01 0.0 0.3
FGT Tuning simulation to data, comparison studies using VMC 10 months 2012-12-01 0.3 1.0
FGT MuDST related work 1 month 2012-02-01 0.1 0.1
FGT Miscellaneous maintenance N/A   0.1 0.2
FMS Database interface, client and maintenance N/A   0.1 0.2
FMS Better simulation for the FMS, VMC based 6 months   0.2 0.4
TPC Calibration and alignment efforts: space charge and grid leak distortions and calculating correction factors, twist correction work, alignment (sector to sector as well as inner to outer), T0 and gain determinations, and dE/dx calibration 22 months 2013-01-01 0.5 1.5
TPC Calibration maintenance (methods developed converged, documented and automated) N/A 2015-01-01 0.3 0.7
TPC Calibration R&D: alignment and distortions 8 months 2012-07-01 0.2 0.3
TPC Understanding aging effects 20 months 2012-07-01 0.5 0.0
TPC iTPC upgrade efforts as well as contingency planning for existing TPC. Design and construction of a sector removal tool. 20 months 2012-07-01 0.5 1.5
UPG Geometry implementation for ETTR, FCS, VFGT 6 months 2012-07-01 0.2 0.5
UPG Event generator integration and simulator development (initial effort for generator, effort for proposal, longer term efforts as needed) 12 months 2012-07-01 0.2 0.5
EEMC Calibration support for physics readiness, software adjustments and maintenance N/A 2013-01-01 0.1 0.3
EEMC SMD calibration related software development coming year 2013-01-01 0.0 0.1
EEMC EEMC alignment work, development of better methods 12 months 2013-01-01 0.0 0.5
EEMC Cluster MIPS studies 6 months 2013-01-01 0.0 0.2
TOF Calibration support, software and database maintenance. Provide final parameters for TOF-based PID, and status tables for BTOF in PPV
 
per run 2013-01-01 0.2 0.5
TOF Separate TOF and VPD slow simulators 2 months 2013-01-01 0.2 0.5
TOF Simulation studies, mixermaker 6 months 2013-01-01 0.1 1.0
TOF Geometry maintenance 2 months 2013-01-01 0.2 0.2
MTD Calibration support, software maintenance. Provide final parameters for MTD-based muon ID per run 2013-01-01 0.1 1.0
MTD Simulation studies & development: simulation maker. 6 months 2013-01-01 0.2 1.0
MTD Software development: calibration maker 6 months 2013-01-01 0.2 1.0
MTD Geometry maintenance 2 months 2013-01-01 0.2 0.2
MTD Database maintenance & development 2 months 2013-01-01 0.1 0.5

2014

The base budget was sufficient to purchase network equipment needed to move to 10 GbE, a first wave of HPSS upgrade (disk cache, drive for Run 14 bandwidth requirements), refresh of BlueArc storage (end of warranties) and purchase of the GPFS system (with supplemental funds argued by STAR). The remainder went into purchasing an equal amount of CPU to be shared between STAR and PHENIX (TBC).

2015

Budget initially thought to be allocated for the RCF for equipment growth and refresh was not provided. Only emergency purchases and minor refresh were done (like replacing dying drives on 4-6 years old hardware to keep it alive) from a farm/processing perspective.

The latest computing resource needs are available at PSN0622: Computing projections revisited 2014-2019. Even under a modest budget, the resources were deemed insufficient to meet the need for timely scientific throughput. A new projection based on the non-funding profile of FY15 is not available at this time.

Tasks in S&C


If you are an enthusiastic junior and intend to fulfill your service task duties as defined by our publication policy, please contact the S&C Leader directly for any of the tasks below. Also, please make sure to specify your current skills and what you would like to do, learn and acquire as new skills. Some of these tasks have a point of contact - please refer to the table for more details.

You may also consider serving as an embedding helper or deputy, or attend to long term services such as being a software sub-system coordinator or maintaining the analysis IO layers (MuDST and picoDST). Just ask what positions are open in those maintenance categories. We not only welcome your help but rely on it, as the S&C team does not have a growth model built in (it does not grow with the number of participants, sub-systems, components or years of data to support). We can only do more with YOUR help.

Title, affected area, task description & goal
Skill required
POC Taker or assignee
Status (date)
Title: Evaluate the use of XCache as a possible improvement over Xrootd access.
Affected area: user analysis

Task description:
Xrootd access has recently been improved by reducing the IO operations per second (very much like access to GPFS): entire files are transferred to local disk and read locally. Why was this done? The Xrootd storage model has changed - it used to be widely distributed, leveraging our compute farm storage (hence scaling as the farm grew), but the concentration of storage onto a few large data servers caused bottlenecks. However, a caching layer could reduce the access load on those data servers. Multiple issues will arise and questions need to be asked:
- Can we leverage caching in the first place? [Do we have dataset re-use over a 24 hour period, for example?]
- XCache infrastructure - is it flexible? Can we scale over many smaller nodes? [We do not want to displace the bottleneck from a few Xrootd data servers to... a few caches.]
- If we deploy, what are the measures of success? [Cache hits must minimally confirm our estimate of the benefit; what is the scale of impact?]
Skill required: nothing special. POC: J. Lauret. Taker: A. Jaikar, L. Hajdu. Status: Opened 2019/08.
Title: Evaluation of a new "forum" based mailing list system
Affected area: Communication, exchange


Task description:
 
Skill required: (unspecified). POC: J. Lauret, RACF. Taker: J. Lauret, W. Betts. Status: Opened.
Title: GMT software integration
Affected area: Calibrations


Task description:
Clean up existing code library for the GMT and bring it through code peer review.
Skill required: C++. POC: G. Van Buren. Taker: ?. Status: Untaken 2019/08.
Title: Collider Performance impact on STAR data
Affected area: Calibrations


Task description:
Use whatever tools we can (e.g. scalers, DCAs) to look for datasets impacted by collider performance similar to what was seen with the Booster Main Magnet for Run 18 AuAu27.
Skill required: C++. POC: G. Van Buren. Taker: Yue-Hang Leung. Status: Taken 2019/09.
Title: Integration of automated run-by-run Offline QA
Affected area: QA


Task description:
Missing from Offline QA has been run-by-run (i.e. time and/or run dependence) QA plots. Significant work has been done to generate plots, but the final integration and interface to these plots needs to be completed.
Skill required: open (e.g. develop web interface). POC: G. Van Buren. Taker: ?. Status: Untaken 2019/08.
Title: Web Master
Affected area: All web content


Task description:
None. Skill set includes PHP and a sense of organization.
 
Skill required: PHP. POC: J. Lauret. Taker: Daniel Nemes + David Stewart. Status: posted 2019/08, Taken 2019/10.