Thank you for your interest in working with STAR.
Berkeley Lab’s Physics Division has two High Performance Computing Postdoctoral Scholar openings. Under the supervision of High Performance Computing experts and Computational Physicists, the scholar will develop and evaluate new software workflows that exploit the capabilities of the High Performance Computing facilities at the National Energy Research Scientific Computing Center (NERSC). The scholar may also participate in a new research initiative to blend Advanced Scientific Computing Research tools and facilities into High-Energy Physics software in order to optimize high-dimensional parameter fitting and the tuning of simulation to data. The research will include designing, implementing, and validating a new chain that integrates existing HEP tools with advanced optimization tools and approximation techniques, enabling new advances in Monte Carlo simulation predictions at the Large Hadron Collider.
Specific Responsibilities:
Work on the High Performance Computing (HPC) facilities at NERSC to evaluate existing software workflows and implement new ones for running experimental software on HPC facilities, and analyze workflows from the ATLAS, ALICE, LUX, LZ, and Daya Bay experiments.
Develop and implement new Monte Carlo event generator tuning tools for experiments such as ATLAS at the LHC, including extending existing tools with more efficient optimization algorithms, tuning against new regions of phase space, and utilizing new computational techniques to automate the tuning of many-parameter numerical models. Explore including detector simulation, a computationally expensive process, directly in the tuning process.
Conduct original research independently and in collaborations.
Interact with LBNL and other investigators working on similar and related scientific problems.
Interact with the experimental High Energy and Nuclear Physics communities and the experimental communities involved in the work.
Report results to supervisor.
Required Qualifications:
Ph.D. in Physics, Computer Science, or related fields.
Experience with Physics or Nuclear Science software development, workflows, or production.
Proficiency with programming languages including C/C++ and Python.
Demonstrated ability to conduct original research independently and as a team member.
Good communication and organizational skills.
Ability to work as a team member to accomplish goals.
Additional Desired Qualifications:
Experience in any of the following areas: the ROOT software framework, job scheduling, batch system operation, data management, HPC systems, NUMA and MIC system architectures, software performance evaluation and optimization.
Knowledge of event generators (Pythia8, Herwig7, Sherpa), generator tuning tools (Professor, Rivet), detector simulation (Geant4, Delphes, PGS), physics analysis in LHC experiments.
The following application materials must be submitted through Academic Jobs Online:
Curriculum Vitae.
Cover Letter.
Statement of Interest.
3 Letters of reference (to be uploaded on AJO by referee).
The EMC calibration workshop was launched with a start-up Email to starmail on 9/4/2008 10:31. A copy resides below. The review report was requested to be delivered by October 29th, 2008.
For further references, see also:
Dear STAR collaborators, Driven by findings and (constructive) criticisms of the past operation workshop on the core team's attention to EMC Physics, I have flagged the calorimetric physics issues as high priority for the core team to address and a key project for the year. As corrective action, I proposed and discussed internally to the S&C project (including the software coordinators) the need for a workshop which high-level goal would be to define, discuss, map the Physics goals and deliverables of our research program to EMC tasks timelines, milestones and effort levels needed to reach those goals (by physics topics, analysis). To answer the requirements needs, and thanks to Bedanga for being on board with this, specific questions will be asked of the PWG (via the PWGC). Questions will serve as direct input to the workshop. --- The workshop will take place at BNL on September 29 and 30th (just after the analysis meeting) in the ITD seminar room (Bldg 515) and I have asked (and charged) Gene Van Buren, calibration coordinator, to help with the logistic and post-workshop report. --- As a product of the workshop, I have asked Gene Van Buren, in consultation with the EMC sub-system management (Will Jacobs as project coordinator and Matthew Walker as software coordinator) to steer an effort to write a summary report which would emphasize the roadmap ahead, and the (human) resources available/needed as well as what we would not be able to accomplish shall we have missing workforce. Such document would then serve as a clear statement for an incoming operation workshop where the needs would be re-iterated, quantification in our hands. The report would be due by October 17th, well in time for the operation workshop in November. Many thanks to Will Jacobs for his understanding, Matthew Walker for his prompt response and assistance and Gene Van Buren for pulling the troops and generating interest as well as taking this task on the S&C global plan to the level of seriousness it deserves. With hopes you will attend, participate in providing feedback (through your PWG) and make this workshop a success. -- Dr. Jerome LAURET RHIC/STAR Software and Computing project Leader ,,,,, Physics Department, Brookhaven National Laboratory ( o o ) Bldg 510a, Upton, NY 11973 ---m---U---m--------------------------------------------- E-mail: jlauret@bnl.gov
The Time Of Flight sub-system was called for a software readiness and integration review in the Fall of 2008. The committee's charges are available below in the related documents section.
Related documents follow:
This page will keep information related to the 2011 tracking component review. The review will cover the state of the Cellular Automaton (CA) seed-finding component as well as the Virtual Monte-Carlo based tracker (Stv) and their relevance to STAR's future needs in terms of tracking capabilities.
After a successful review of the ITTF/Sti tracker in 2004, the STAR collaboration approved the move to the new framework, which brought at the time unprecedented new capabilities to the experiment and to physics analysis. Sti allowed the STAR reconstruction approach to integrate other detector sub-systems into its tracking by providing methods to integrate simple geometry models and to extrapolate tracks to the non-TPC detector planes, thereby correlating information across detector sub-systems. In 2005, STAR production switched to a Sti-based production and we have run in this mode ever since.
However, careful architecture considerations revealed a few areas where improvements seemed needed. Those are:
Additional considerations for the future of STAR were
Based on those considerations, several projects were launched and encouraged
We propose to review the AgML, CA and Stv components of our framework reshape.
NB: Beyond the scope of this review, a key goal for VMC is to allow the inclusion of newer Geant versions and hence to get ready to step away from Geant3 (barely maintainable) and the FORTRAN baggage (Zebra and portability issues on 64-bit architectures), and to remove the need for a special version of root (root4star) hard-binding root to the STAR-specific, non-dynamic runtime libraries it needs.
See attachment at the bottom of this page.
Status:
Members:
The agenda is ready and available (the link is access-restricted).
Below is a list of cross-references to other documents:
This page will list material for the follow-up review of Stv performances.
Other information
The review recommendations for foci were
The review finding details included
Specific example of the DCA distribution problem (reminder sent 4/13/2012)
In May 2020, the STAR management team decided to reorganize the STAR software and computing (S&C) activities. The new S&C organization includes an S&C management team, which oversees S&C related issues, together with six sub-groups. Please see below for the new organization chart, the subgroup leaders, and the relevant mailing lists.
The S&C management team members:
Ivan Kisel (Frankfurt)
Gene van Buren (BNL)
Jason Webb (BNL), Xianglei Zhu (Tsinghua)
Dmitri Smirnov (BNL), Grigory Nigmatkulov (MEPhI)
Dmitry Arkhipkin (BNL)
Jerome Lauret (BNL), Jeff Landgraf (BNL)
Ashik Ikbal Sheikh (KSU)
Xin Dong (LBNL), Lijuan Ruan (BNL)
Torre Wenaus (ex. off.)
Gene van Buren - SDCC liaison
Below is a brief description of the responsibilities of each sub-group:
Tracking:
- maintain / develop tracking software
- online/offline tracking merging

Calibration/Production:
- data calibrations (coordinating subsystems)
- production library build and maintenance
- real data production and data management

Simulation/Embedding:
- GEANT geometry maintenance and development
- Event generator integration
- Embedding software maintenance and development
- Simulation/embedding production

Software infrastructure:
- Offline software code review, integration and maintenance
- StEvent/MuDst/picoDst maintenance
- management of OS, compilers, etc., and karma permissions
- coordination of efforts for bug fixing

Database:
- Online databases and maintenance
- Offline databases (Calibrations, Geometry, RunLog)
- FileCatalog databases
- STAR phonebook / drupal modules

Experimental Computer Support and Online Computing:
- Offline/online computing support for the experiment
- Cyber security
The principal members of the S&C structured team are listed below.
Other supporting efforts & members
The Software Sub-system coordinators (+) in each specialized area are as follows :
The below sub-systems are no longer supported in STAR (detector system physically removed) - green are sub-systems with no software support, blue are the ones with some support:
The names below reflect the list of software coordinators while the diverse projects were in their R&D phase. The projects moved to full projects in 2007.
The computing and software effort is closely associated with the Physics Working Groups. STAR physics analysis software runs within the context of the computing infrastructure, taking the DST as input. The physics working groups have responsibility for the development of physics analysis software. The STAR Physics Analysis Coordinator acts as coordinator between the PWGs and computing. The PAC's responsibilities are described here.
Sooraj Radhakrishnan is the current STAR Physics Analysis Coordinator.
STAR Software & Computing is headed by Dr. Jérôme Lauret and Dr. Gene Van Buren located at the Brookhaven National Laboratory.
The S&C management structure is as below. Unless otherwise specified, [X] indicates an activity area whose overall coordinator has been missing, the area being either co-led, internally absorbed, or dropped.
1) To work with the physics working group convenors and as appropriate the Software and Computing Project Leader, Simulations Leader, Reconstruction Software Leader, Offline Production Leader, Software Infrastructure Leader, and Run-Time Committee to determine the physics analysis and simulation software needs. To act as an interface between the physics working group convenors and the STAR Software and Computing Project on matters of physics software and computing and consult as needed with the Spokesperson on priorities for this software.
2) To work with the physics working group convenors and the STAR Software Project Leaders to facilitate the development and integration of physics analysis software in a way that is compatible with the overall STAR software approach. In so doing, the quality and performance of the reconstruction and simulation codes should be primary considerations.
3) To represent the physics working groups in discussions, with the software project leaders, on the physics analysis tasks to be performed during event reconstruction and at each stage of analysis. This will require that the physics analysis coordinator maintain an overall perspective of the status and availability of physics analysis and simulation software.
4) To facilitate input and communication between the physics working groups and the Simulations Leader on issues of determining and implementing the tradeoffs in the simulation capability versus physics.
5) To work with the Simulations Leader to make efficient use of the computing resources for the simulations needed by each of the physics working groups and to coordinate the physics working groups' input on design tradeoffs in the simulations with respect to general performance and overall capabilities.
6) To work with the Reconstruction Leader to establish requirements for DSTs and event reconstruction functionality.
1. well versed in STAR's physics program with a strong interest in physics, software and computing.
2. active in physics analysis, as an active developer and user of analysis codes.
3. strong in computing, able and willing to be an active participant in the computing group designing and developing the analysis software and the computing framework that supports it, and able to assess the quality and approach of the upstream reconstruction and simulation codes and give feedback.
4. direct experience in OO/C++ preferred.
5. be able to communicate well.
6. be able to commit a large fraction of time to this job and to have a presence at BNL as needed to interface with the software project leaders and the physics working group convenors.
Torre's statement on the job:
"A principal early role of the physics analysis coordinator would be to help assemble the physics analysis program for the mock data challenges, going well beyond the broad strokes of what physics should be looked at to developing the program to put in place the physics analysis software needed to execute it, software layered over a physics analysis infrastructure and toolset that the Analysis Coordinator should play a strong role in designing and ideally developing. Besides assembling the disparate needs of the PWGs to scope out and assign the design and implementation job, there is a lot of commonality in their needs that needs to be coordinated."
Position and responsibilities Description: Calibration coordinator
The STAR Calibration Coordinator's primary mission is the delivery of the calibration constants necessary to bring the data to the expected level of quality in support of the scientific program. The STAR Calibration Coordinator is expected to work in concert with the calibration expert(s) designated by the STAR sub-system software coordinators to bring the data to a level of accuracy and quality in support of the scientific program. Before and during periods of data taking and data production, this may be achieved by organizing calibration readiness meetings, communicating with the calibration experts, and/or preparing, summarizing and developing a calibration plan and schedule as required. He/she would interact with them to understand their problems and seek to eliminate mindless tasks through automation (support for online calibration, fast-offline, etc.). He/she will be responsible for pro-actively acting as the liaison (and main point of contact) between the production, reconstruction, database or other coordinators and the sub-system experts within the realm of expertise.
Authorities
To achieve objectives, he/she has the authority to directly request highly prioritized productions.
The Calibration Coordinator's priorities and schedule take precedence over the individual sub-systems' calibration needs.
The Calibration Coordinator may request progress status from the sub-systems' designated calibration experts.
In order to make the best use of the global STAR calibration organization, he/she should be informed about any ongoing independent calibration efforts and techniques being developed within the collaboration. Such work should have his/her final approval before integration into the STAR framework.
Responsibilities
He/she is responsible for identifying key milestones, determining immediate and future needs, and communicating critical project issues in a timely fashion.
He/she is expected to be a central point of contact for users' needs within the area of expertise, respond to user problems, explain technologies and methodologies, and guide or mentor individuals as appropriate.
Skills
The STAR Calibration Coordinator is expected to demonstrate an in-depth understanding of the fundamentals of requirement specification, design, coding, and testing of technologies, methodologies and computational techniques related to the calibration needs of the STAR experiment. He/she should have a good understanding of current and future applications and technology and the faculty to learn, apply and implement new and emerging techniques and concepts very quickly.
Due to the increasing demands of the STAR collaboration, associated with the need for redundancy and more cohesion in the embedding activity, the embedding structure will move toward a distributed (computing) model paradigm with a structured set of responsibilities.
The embedding activity is a cross between the Simulation and Reconstruction activities and an important part of our data mining and data analysis process. Hence, the embedding structure described below is part of the Software & Computing (S&C) project structure.
The embedding activity will be led by the Embedding Coordinator (EC), assisted by Embedding Deputies (ED), whose responsibilities and authorities are described below. It is understood that each Physics Working Group (PWG) may assign a contact person, an Embedding Helper (EH), to help run an embedding series. The EC, ED and EH constitute the embedding team and core structure.
Embedding tasks will be created consistent with Appendix A “Initiating Embedding requests”.
The primary mission of the EC is to organize the work and the set of QA results related to each embedding series and to communicate to the collaboration the progress and difficulties encountered in the data production process. The EC is the interface to many areas in STAR; to efficiently achieve the goals prescribed by the function, his/her responsibilities and authorities are described below:
The EC responsibilities are:
The EC authorities are
Assign tasks as applies to the ED
Conflict resolution
The Embedding Deputies' role and responsibilities are those previously expected of the PDSF Tier1 center in 2005 and as defined in the 2007 wording. An Embedding Deputy is intimately tied to a site's resources or to a specific, well-defined task (such as performing the base QA).
Those responsibilities include
The authorities of the ED include
The Embedding Helpers are individuals recruited from within the PWGs and hence are STAR collaborators helping with the running of the embedding and carrying some of its burden as a general service. The EH is part of a workforce supplement provided by the PWG, and the expectation is for an EH to serve for a minimum of two years, during which knowledge build-up and stability in our procedures and communication can be achieved. A PWG may or may not provide an EH, understanding however that the lack of an EH may result in delays in the delivery of results.
The EH's responsibility is to carry the communication of all issues to/from the PWG and the embedding team and to see that the requests made by the PWG are carried to their end. To this end, the EH are expected to work closely with the ED and EC to perform the embedding tasks related to their respective PWG, consistent with the principles stated above. Examples of duties:
Embedding requests and needs will be initiated and discussed within either a PWG or an R&D working group, under the supervision of the PWGC of interest or the R&D simulation coordinator respectively. The following caveats apply.
Pure embedding will be requested by the PA of a pending publication / paper or whenever the accuracy needs to be as close as possible to the real data. All other cases will be reviewed by the EC and may be transformed into an enriched-sample simulation (“injection” simulation) wherever it applies and re-directed to the simulation leader, consistent with the EC authorities.
After discussions within the PWG, embedding requests will be recorded by one of the PWGC via a provided interface whose purpose will be to keep track of all requests and allow for priorities and an overview of status and progress. The PWGC may designate an embedding point of contact as the PWGC representative for embedding. In such a case, the communication will be carried through the embedding point of contact.
No embedding requests outside of the provided framework and interface shall be satisfied, and no “pending discussions” without an actual request will be considered. It is also understood that an ill-defined request may be closed by the EC, consistent with the feedback-gathering policy described in the EC responsibilities and authorities section. In such an event, the request slot is not re-usable by the PWG.
Unless specified otherwise and priorities are assigned explicitly by the PAC, EC or S&C leader, requests will be considered on a first-in, first-done basis and upon availability of resources.
Upon request from the PWGC, the embedding team will ensure that the proper information is obtained for the production series the embedding request relates to. For a given production, the field setting and geometry tag used will be checked for consistency: consistency must hold between simulation, real data and the intended study. The geometry tag will be acquired from the production options page maintained by the production coordinator [currently Real Data production option].
The ED will be responsible for carrying out those checks and will pass on functional macros to the EH assigned to assist them. If a setting is inaccurate (field or geometry tag), the EH shall immediately inform the PWG and the EC and request explicit confirmation. An embedding series shall not be started nor run without acknowledgment that the related field and geometry tags are adequate, as chosen by the PWGC or PWG point of contact.
Jérôme Lauret, STAR Software & computing Leader
General Position Description: detector sub-system software coordinator
Each detector sub-system must designate or identify a Software Sub-system Coordinator, who then becomes the main contact person for developing and maintaining the software written to bring the data for that sub-system to a Physics-usable form, at the level of accuracy and expectations required for carrying out the part of the STAR Physics program involving that particular sub-system. Additional manpower for the development of the software may be allocated within the sub-system's group or requested by the sub-system software coordinator (aka a service/community task).
As all realize that there is no Physics without data reduction (via code/software), the Software Coordinator is therefore a cornerstone of the sub-system's group. His/her main responsibilities are:
In order to bring the sub-system data closer to readiness, he/she
Has the authority to request highly prioritized productions within the scope of efficiency, alignment or calibration studies, or any study going toward the convergence, consolidation or strengthening of the Physics results. The software sub-system coordinator may designate a point of contact handling calibration production requests (in such a case, the POC should be clearly specified).
May request allocation of resources necessary to accomplish the tasks outlined above.
Has the ultimate and final authority to organize the work at hand within his/her sub-system realm. For example, partitioning of calibration, simulation and other tasks as necessary and depending on available manpower.
Is, unless indicated otherwise, the point of contact for modification of any code pertaining to the sub-system (others proposing modifications must inform the software sub-system coordinator).
Is expected to communicate to the S&C leader concerns and issues which may be or become obstacles in achieving the above mission.
Responsibilities Description: Reconstruction Leader
General
The STAR Reconstruction Leader is responsible for maintaining, developing and expanding the STAR reconstruction code and framework. Reconstruction here includes
The development of detector-specific microscopic (slow) and parametrized (fast) response simulators will be planned through discussions and advance planning at Software & Computing meetings, in conjunction with the Simulation and Database Leaders and the affected sub-system detector coordinators and experts. The same applies to the cross-discipline (Reconstruction/Simulation) area known in STAR as embedding.
The reconstruction leader is expected to
The Reconstruction Leader's tremendous task will therefore be assisted by one expert per detector sub-system, as designated by the detector software sub-system coordinator. He/she will provide guidance to this expert on integrating the sub-system specific code within the STAR reconstruction framework and global tracking. Further manpower may come through reconstruction projects (a new tracking software is an example) which, upon completion, would fall under the Reconstruction Leadership.
The Reconstruction Leader will be further assisted by the STAR Calibration Coordinator and the Production Coordinators. He/she should respond and assist with the Calibration Coordinator's findings and requests for integration of new algorithms or techniques specific to the Calibration Coordinator's area of expertise. In such a case, they will work closely together until task completion, within the scope and planning defined above. The Reconstruction Leader may request highly prioritized productions directly from the Production Coordinator(s) in order to resolve or evaluate a question pertinent to the reconstruction area.
However, to ensure smooth execution of global planning and complete transparency between the areas of reconstruction, simulation and calibration, schedules and priorities should be brought to the attention of the STAR S&C leader and further discussed in Software & Computing meetings prior to execution or deployment.
In the absence of the STAR S&C leader and deputies, should the schedule and task priorities be left unclear, the Reconstruction Leader's judgment on the production schedule will take precedence over all others.
Reconstruction deputy
One or more reconstruction deputies may be assigned by the S&C Leader to further assist the Reconstruction Leader in his/her task.
A reconstruction deputy's task is to effectively take the lead on a specific project as defined. Within the scope of this project, the reconstruction deputy has the same authorities and responsibilities as the Reconstruction Leader. They are expected to work closely with one another until the completion of the defined task. In the absence of the Reconstruction Leader, such a deputy will take full responsibility for the reconstruction software in all areas, including his assigned project. Should several deputies be in office, the choice will be left to the Reconstruction Leader (or follow the chain of the S&C organization).
Furthermore, “a” reconstruction deputy may represent the reconstruction activities and progress at Collaboration and/or Analysis meetings and therefore, should remain informed of activities within this area of expertise.
Responsibilities Description: Simulation Leader
General
The STAR Simulation Leader is responsible for maintaining, developing and expanding the STAR simulation framework. His role is to analyze, design, formulate, implement and maintain the consistency of the simulation software, packages and toolkit solutions to support the STAR research needs and/or in response to problems in support of the scientific program. The areas under the Simulation Leader's responsibility are:
Specifics (current as per 2003) and future
Within the current STAR simulation framework, the Simulation Leader is expected to attend to the development, testing and maintenance of the existing geometry and materials database and the related GEANT simulation software necessary to simulate the response of the STAR detector, used to interpret, without discontinuities, ongoing and forthcoming research data from the STAR Experiment at RHIC. He/she will be expected to
Authorities
Responsibilities
He/she is responsible for identifying key milestones, determining immediate and future needs, and communicating critical project issues in a timely fashion.
Skills
He/she is expected to demonstrate an in-depth understanding of the fundamentals of requirement specification, design, coding, and testing of technologies, methodologies and computational techniques related to the simulation needs of the STAR experiment. He/she should act as an architect for future needs and therefore have a good understanding of current and future applications and technology and the faculty to learn, apply and implement new and emerging techniques and concepts very quickly. The Simulation Leader would have a PhD in physics and several years of post-doctoral experience in the Heavy Ion field, a strong background in programming using C++, FORTRAN, and GEANT3 and/or GEANT4, and good communication skills.

The pages and documents in this section are a mix of resource or design requirement documents, proposals, planning and assessments. Links to other documents may be made (like meetings, evaluations, reviews), making the pages here a one-stop shop for the S&C project resource requirement design.
Every year, the 4 RHIC experiments along with the RCF assemble a task force to discuss and plan the computing resource allocation. In STAR, FY03/FY04 was led by Jérôme Lauret with help from Jeff Porter. We meant for this work to be publicly available.
Most documents are from 2002 but are in effect in 2003.
This page is a placeholder to import the projects launched in 2005.
This project started in 2005 as a service task aimed at providing a seamless port of the online Web server for document self-maintenance and easy access. The initial description follows. It was motivated by the poor maintenance and long-term support of the pages available online and the need for quick page creation to keep help, instructions and procedures up to date in a multi-user, multi-group environment. Also, we imagined that shift crews could drop comments on existing pages, and hoped for the documentation of our operation to become more interactive and iterative, with an immediate feedback process. Plone was envisioned at the time, but the task was opened to an evaluation based on the requirements provided below.
This task would include the evaluation and deployment of a content management system (CMS) on the online Web server. While most CMSs use a virtual file system, the ability to manage web content through a database is of particular interest. In particular, this approach would allow for automatic Web mirroring and recovery. We propose that the task include
Facts:
Timelines:
The following requirements were set for the project:
The following functional requirements were either requested or desired for a smooth (sup)port of previous deployment.
To be transferred from the old site
ID | Task Name | Duration | Start | Finish | Resource Names | |
---|---|---|---|---|---|---|
1 | ||||||
2 | TPC checks | 7 days | Fri 2/9/07 | Mon 2/19/07 | ||
3 | Laser drift+T0 | 7 days | Fri 2/9/07 | Mon 2/19/07 | Yuri[50%] | |
4 | SSD shift + East/West TPC tracks | 3 days | Fri 2/9/07 | Tue 2/13/07 | Spiros[25%] | |
5 | SVT alignment | 7 days? | Tue 2/20/07 | Wed 2/28/07 | |
6 | SVT+SSD (cone) for each wafer | 1 wk | Tue 2/20/07 | Mon 2/26/07 | Ivan,Richard | |
7 | Shell/Sector for each magnetic field settings | 1 day? | Tue 2/27/07 | Tue 2/27/07 | ||
8 | Ladder by Ladder | 1 day? | Wed 2/28/07 | Wed 2/28/07 | ||
9 | Using TPC+SSD, Determining the SVT Drift velocity | 7 days | Fri 2/9/07 | Mon 2/19/07 | Ivan | |
10 | Drift velocity | 12 days | Fri 2/9/07 | Mon 2/26/07 | ||
11 | High stat sample processing preview | 7 days | Fri 2/9/07 | Mon 2/19/07 | Vladimir | |
12 | Final evaluation | 5 days | Tue 2/20/07 | Mon 2/26/07 | Vladimir | |
13 | ||||||
14 | Online QA (offline QA) | 7 days | Fri 2/9/07 | Mon 2/19/07 | Ivan,Helen | |
15 | ||||||
16 | Hit error calculation final pass | 1 wk | Fri 2/9/07 | Thu 2/15/07 | Victor | |
17 | Self-Alignment | 3 wks | Fri 2/16/07 | Thu 3/8/07 | Victor |
18 | Code in place for library - alignment related | 1 wk | Fri 2/9/07 | Thu 2/15/07 | Yuri[10%],Victor[10%] |
19 | ||||||
20 | Tasks without immediate dependencies | 60 days | Fri 2/9/07 | Thu 5/3/07 | ||
21 | Cluster (SVT+SSD) and efficiency studies | 1.5 mons | Fri 2/9/07 | Thu 3/22/07 | Artemios,Jonathan | |
22 | Slow/Fast simulators reshape | 3 mons | Fri 2/9/07 | Thu 5/3/07 | Jonathan,Polish students x2,Stephen | |
23 | ||||||
24 | ||||||
25 | Cu+Cu re-production | 87.5 days | Fri 3/9/07 | Tue 7/10/07 | ||
26 | Cu+Cu 62 GeV production | 3 wks | Fri 3/9/07 | Thu 3/29/07 | ||
27 | Cu+Cu 200 GeV production | 72.5 days | Fri 3/30/07 | Tue 7/10/07 | ||
28 | cuProductionMinBias (30 M) | 8.5 wks | Fri 3/30/07 | Tue 5/29/07 | |
29 | cuProductionHighTower (17 M) | 6 wks | Tue 5/29/07 | Tue 7/10/07 |
On 7/12/2007 23:42, a task force was assembled to evaluate the future of the STAR software and its evolution in the unavoidable multi-core era of hardware realities.
The task force was composed of: Claude Pruneau (Chair), Andrew Rose, Jeff Landgraf, Victor Perevozchikov, and Adam Kocoloski. The task force was later joined by Alex Withers from the RCF, as the local support personnel were interested in this activity.
The charges and background information are attached at the bottom of this page.
The initial Email announcement launching the task force follows:
Date: Thu, 12 Jul 2007 23:42:40 -0400 From: Jerome LAURET <jlauret@bnl.gov> To: pruneau claude <aa7526@wayne.edu>, Andrew Rose <AARose@lbl.gov>, Jeff Landgraf <jml@bnl.gov>, Victor Perevozchikov <perev@bnl.gov>, Adam Kocoloski <kocolosk@MIT.EDU> Subject: Multi-core CPU era task force Dear Claude, Adam, Victor, Jeff and Andrew, Thank you once again for volunteering to participate to serve on a task force aimed to evaluate the future of our software and work habits in the un-avoidable multi-core era which is upon us. While I do not want to sound too dire, I believe the emergence of this new direction in the market has potentials to fundamentally steer code developers and facility personnel into directions they would not have otherwise taken. The work and feedback you would provide on this task force would surely be important to the S&C project as depending on your findings, we may have to change the course of our "single-thread" software development. Of course, I am thinking of the fundamental question in my mind: where and how could we make use of threading if at all possible or are we "fine" as it is and should instead rely on the developments made in areas such as ROOT libraries. In all cases, out of your work, I am seeking either guidance and recommendation as per possible improvements and/or project development we would need to start soon to address the identified issues or at least, a quantification of the "acceptable loss" based on cost/performance studies. As a side note, I have also been in discussion with the facility personnel and they may be interested in participating to this task force (TBC) so, we may add additional members later. To guide this review, I include a background historical document and initial charges. I would have liked to work more on the charges (including adding my expectations of this review as stated in this Email) but I also wanted to get them out of the door before leaving for the V-days. Would would be great would be that, during my absence, you start discussing the topic and upon my return, I would like to discuss with you on whether or not you have identified key questions which are not in the charges but need addressing. I would also like by then to identify a chair for this task force - the chair would be calling for meetings, coordinate the discussions and organize the writing of a report which ultimately, will be the result of this task force. Hope this will go well, Thank you again for being on board and my apologies for dropping this and leaving at the same time. -- ,,,,, ( o o ) --m---U---m-- Jerome -
Date: Fri, 03 Aug 2007 15:34:56 -0400 From: Jerome LAURET <jlauret@bnl.gov> CC: pruneau claude <aa7526@wayne.edu>, Andrew Rose <AARose@lbl.gov>, Jeff Landgraf <jml@bnl.gov>, Victor Perevozchikov <perev@bnl.gov>, Adam Kocoloski <kocolosk@MIT.EDU>, Alexander Withers <alexw@bnl.gov> BCC: Tim Hallman <hallman@bnl.gov> Subject: Multi-core CPU era task force Dear all, First of all, I would like to mention that I am very pleased that Claude came forward and offered to be the chair of this task force. Claude's experience will certainly be an asset in this process. Thank you. Second news: after consulting with Micheal Ernst (Facility director for the RACF) and Tony Chan (Linux group manager) as well as Alex Withers from the Linux group, I am pleased to mention that Alex has kindly accepted to serve on this task force. Alex's experience in the facility planing and work on batch system as well as aspects of how to make use of the multi-core trends in the parallel nascent era of virtualization may shade some lights on issues to identify and bring additional concepts and recommendations as per adapting our framework and/or software to take best advantage of the multi-core machines. I further discussed today with Micheal Ernst of the possibility to have dedicated hardware shall testing be needed for this task force to complete their work - the answer was positive (and Alex may help with the communication in that regard). Finally, as Claude has mentioned, I would very much like for this group to converge so a report could be provided by the end of October at the latest (mid-October best). This time frame is not arbitrary but is at the beginning of the fiscal year and at the beginning of the agency solicitations for new ideas. A report by then would allow shaping development we may possibly need for our future. With all the best for your work,
The following documents were produced by the task-force members and are archived here for historical purposes (and to possibly provide a starting point in the future).
CPU and memory usage on the farm - Alex Withers
Opteron (CPU / memory)
Xeon (CPU / memory)
CAS & CRS CPU usage, month and year
A reminder about the need for a report was sent on 10/3/2007 to the chair (with a side-track discussion on other issues which seemed to have taken attention). To accommodate the busy times, a second reminder was sent on 11/19/2007 with a new due date at the end of November. Subsequent reminders were sent on 12/10/2007 and 1/10/2008.
The task force has not delivered the report as requested. A summary was sent in an Email as follows:
... a summary of the activities/conclusions of the committee. ... during the first meeting, all participants agreed that if there was anything to be done, it would be on reconstruction. Members of the committee felt that GEANT related activities are not in the perview of STAR and should not be STAR's responsibility. In view also of what we did next it also appears that not much would actually be gained. We also discussed (1st meeting) the possibility of multi-treading some aspects of user analysis. e.g. io, and perhaps some aspects of processing. Here people argued that there is too much variability in type of analyses carried by STAR users. And it is not clear that multi-treading would be in anyway faster - while adding much complexity to infrastructure - if not to the user code. Members of the committee thus decided to consider reconstruction processes only. In subsequent meetings, we realized (based on some references test conducted in the Industry) that perhaps not much would be gained if a given node (say 4 cores) can be loaded with 4 or 5 jobs simultaneously and provided sufficient RAM is available to avoid memory swapping to disk. Alex, and Andrew carried some tests. Alex's test were not really conclusive because of various problems with RCF. Andrew's test however clearly demonstrated that the wall clock time essentially does not change if you execute 1 or 4 jobs on a 4-core node. So the effective throughput of a multicore node scales essentially with the number of cores. No need for complexity involving multithreading. Instant benefits. Cost: PDSF and RCF are already committed according to Alex and Andrew to the purchase of multicore machines. This decision is driven in part by cost effectiveness and by power requirements. 1 four core machine consumes less power, and is less expensive than 4 1-core machine. Additionally, that's where the whole computing industry is going... So it is clear the benefits of multicore technology are real and immediate without invocation of multitreading. Possible exceptions to this conclusion would be for online processing of data for trigger purposes or perhaps for fast diagnostic of the quality of the data. Diagnostics (in STAR) are usually based on a fairly large dataset so the advantage of multi-threading are dubious at best in this case because the througput for one event is then irrelevant - and it is the aggregate throuput that matters. Online triggering is then the only justifiable case for use of multithreading. Multithreading would in principle enable faster throughput for each event thereby enabling sophisticated algorithms. This is however a very special case and it is not clear that adapting the whole star software for this purpose is a worthy endeavor - that's your call. I should say in closing that the mood of the committee was overall quite pessimistic from the onset. Perhaps a different group of people could provide a slightly different point of view - but I really doubt it.
This page will either have requirements documents or project descriptions for R&D-related activities in S&C (or defined activities hopefully in progress).
Goals:
Steps and tasks:
Status: See results on Disk IO testing, comparative study 2008.
A summary of ongoing and incoming projects was sent to the software coordinators for feedback. The document refers to projects listed in this section under Projects and proposals.
The list below does NOT include general tasks such as those described as part of the S&C core team roles defined in the Organization job description documents. Examples would be global tracking with silicon including the HFT, geometry maintenance and updates, or calibration and production tasks as typically carried out for the past few years. Neither does this list include improvements we need in areas such as online computing (many infrastructure issues, including networking, an area of responsibility which has been unclear at best), nor activities such as the development and enhancement of the Drupal project (requirements and plans sent here).
The list includes:
Wish list (for now):
The level of funding planned for 2008 was:
Following the previous years' approach of "outsourcing" funds, a note was sent to the STAR collaboration (Subject: RCF requirements & purchase) on 3/31/2008 12:18. The pricing offered was 4.2 $/GB, i.e. 4.3 k$/TB of usable space. Based on the 2007 RCF requirement learning experience (pricing was based on the vendor's total space rather than usable space), the price was firmed, fixed and guaranteed as "not higher than 4.2 $/GB" by the facility director Michael Ernst at the March 27th liaison meeting.
The institutions external fund profile for 2008 is as follows:
STAR external funds | | |
Institution | Paying account | TB requested | Price ($)
UCLA | UCLA | 1 | 4300.8
Rice | Rice | 1 | 4300.8
LBNL | LBNL | 4 | 17203.2
VECC | BNL | 1 | 4300.8
UKY | UKY | 1 | 4300.8
Totals | | 8 | 34406.4
Penn State University provided (late) funds for 1 TB worth of space.
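For reference, the arithmetic behind the quoted prices can be written out as a minimal Python sketch (assuming the advertised 4.2 $/GB and 1 TB = 1024 GB, which reproduces the per-institution figures in the table above):

```python
# FY08 institutional purchases at 4.2 $/GB (assuming 1 TB = 1024 GB).
price_per_tb = 4.2 * 1024                 # 4300.8 $/TB, i.e. ~4.3 k$/TB

requests_tb = {"UCLA": 1, "Rice": 1, "LBNL": 4, "VECC": 1, "UKY": 1}
costs = {inst: round(tb * price_per_tb, 1) for inst, tb in requests_tb.items()}

print(costs)                              # LBNL -> 17203.2 $, others -> 4300.8 $
print(sum(requests_tb.values()), round(sum(costs.values()), 1))  # 8 TB, 34406.4 $
```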
*** WORK IN PROGRESS ***
The requirements for FY08 are determined based on
The initial STAR requirements provided for the RHIC mid-term strategic plan can be found here
The initially projected raw data volume was 870 TB (+310 TB).
The RAW data volume taken by STAR in FY08 (a shorter run) is given by the HPSS usage (RAW COS) as shown below:
A total of 165 TB was accumulated, far below the expected data projections (by a factor of 2). The run was however declared as meeting (to exceeding) its goals compared to the STAR initial BUR.
Some notes:
scenario B = scenario A + external funds
Experiment Parameters | STAR Scenario A | STAR Scenario B
Sustained d-Au Data Rate (MB/sec) | 70 | 70 |
Sustained p-p Data Rate (MB/sec) | 50 | 50 |
Experiment Efficiency (d-Au) | 90% | 90% |
Experiment Efficiency (p-p) | 90% | 90% |
Estimated d-Au Raw Data Volume (TB) | 130.8 | 130.8 |
Estimated p-p Raw Data Volume (TB) | 41.5 | 41.5 |
Estimated Raw Data Volume (TB) | 172.3 | 172.3 |
<d-AU Event Size> (MB) | 1 | 1 |
<p-p Event Size> (MB) | 0.4 | 0.4 |
Estimated Number of Raw d-Au Events | 137,168,640 | 137,168,640 |
Estimated Number of Raw p-p Events | 108,864,000 | 108,864,000 |
d-AU Event Reconstruction Time (sec) | 9 | 9 |
p-p Event Reconstruction Time (sec) | 16 | 16 |
SI2000-sec/event d-Au | 5202 | 5202 |
SI2000-sec/event p-p | 9248 | 9248 |
CPU Required (kSI2000-sec) | 1.7E+9 | 1.7E+9 |
CRS Farm Size if take 1 Yr. (kSI2k) | 54.6 | 54.6 |
CRS Farm Size if take 6 Mo. (kSI2k) | 109.1 | 109.1 |
Estimated Derived Data Volume (TB) | 200.0 | 200.0
Estimated CAS Farm Size (kSI2k) | 400.0 | 400.0 |
Total Farm Size (1 Yr. CRS) (kSI2k) | 454.6 | 454.6 |
Total Farm Size (6 Mo. CRS) (kSI2k) | 509.1 | 509.1 |
Current Central Disk (TB) | 82 | 82 |
Current Distributed Disk (TB) | 527.5 | 527.5 |
Current kSI2000 | 1819.4 | 1819.4 |
Central Disk to retire (TB) | 0 | 0 |
# machines to retire from CAS | 0 | 0
# machines to retire from CRS | 128 | 128 |
Distributed disk to retire (TB) | 27.00 | 27.00 |
CPU to retire (kSI2k) | 120.00 | 120.00 |
Central Disk (TB) | 49.00 | 57.00 |
Cost of Central Disk | $205,721.60 | $239,308.80 |
Cost of Servers to support Central Disk | ||
Compensation Disk entitled (TB) | 0.00 | 0.00 |
Amount (up to entitlement) (TB) | 0.00 | 0.00 |
Cost of Compensation Disk | $0 | $0 |
Remaining Funds | $0 | $0 |
Compensation count (1U, 4 GB below) | 5 | 5 |
Compensation count (1U, 8 GB below) | 0 | 0 |
CPU Cost | $27,500 | $27,500 |
Distributed Disk | 27.8 | 27.8 |
kSI2k | 114.5 | 114.5 |
# 2U, 8 cores, 5900 GB disk, 8 GB RAM | 27 | 27 |
# 2U, 8 cores, 5900 GB disk, 16 GB RAM | 0 | 0 |
CPU Cost | $148,500 | $148,500 |
Distrib. Disk on new machines (TB) | 153.9 | 153.9 |
kSI2k new | 618.2 | 618.2 |
Total Disk (TB) | 813.2 | 821.2 |
Total CPU (kSI2000) | 2432.1 | 2432.1 |
Total Cost | $354,222 | $387,809 |
Outside Funds Available | $0 | $34,406 |
Funds Available | $355,000 | $355,000 |
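To make the derivation of the table explicit, here is a minimal Python sketch reproducing a few of its derived rows; the event counts, event sizes and SI2000-sec/event figures are copied from the table, while the 1 TB = 1024² MB and 1 year = 3.1536e7 s conventions are assumptions that happen to match the quoted values to within rounding:

```python
# Cross-check of a few FY08 planning entries.
dau_events, pp_events = 137_168_640, 108_864_000
dau_size_mb, pp_size_mb = 1.0, 0.4              # <event size> in MB
dau_si2k, pp_si2k = 5202, 9248                  # SI2000-sec per event

raw_dau_tb = dau_events * dau_size_mb / 1024**2     # ~130.8 TB
raw_pp_tb = pp_events * pp_size_mb / 1024**2        # ~41.5 TB

cpu_ksi2k_sec = (dau_events * dau_si2k + pp_events * pp_si2k) / 1e3
crs_1yr = cpu_ksi2k_sec / 3.1536e7                  # farm size if spread over 1 year
crs_6mo = 2 * crs_1yr                               # ... or over 6 months

print(f"raw data: d-Au {raw_dau_tb:.1f} TB, p-p {raw_pp_tb:.1f} TB")
print(f"CPU {cpu_ksi2k_sec:.1e} kSI2000-sec -> CRS {crs_1yr:.1f} / {crs_6mo:.1f} kSI2k")
```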
Name | File System | Path | Hard Quota (TB) | Space allocated (TB) | Available Space (TB) | BlueArc Physical storage |
star_institutions_bnl | STAR-FS01 | /star_institution/bnl | 3.50 | 16.50 | 19.00 | BA01 | |
star_institutions_emn | STAR-FS01 | /star_institution/emn | 1.60 | ||||
star_institutions_iucf | STAR-FS01 | /star_institution/iucf | 0.80 | ||||
star_institutions_ksu | STAR-FS01 | /star_institution/ksu | 0.80 | ||||
star_institutions_lbl | STAR-FS01 | /star_institution/lbl | 9.80 | ||||
star_data03 | STAR-FS02 | /star_data03 | 1.80 | 17.22 | 19.75 | ||
star_data04 | STAR-FS02 | /star_data04 | 1.00 | ||||
star_data08 | STAR-FS02 | /star_data08 | 1.00 | ||||
star_data09 | STAR-FS02 | /star_data09 | 1.00 | ||||
star_data16 | STAR-FS02 | /star_data16 | 1.66 | ||||
star_data25 | STAR-FS02 | /star_data25 | 0.83 | ||||
star_data26 | STAR-FS02 | /star_data26 | 0.84 | ||||
star_data31 | STAR-FS02 | /star_data31 | 0.83 | ||||
star_data36 | STAR-FS02 | /star_data36 | 1.66 | ||||
star_data46 | STAR-FS02 | /star_data46 | 6.60 | ||||
star_data05 | STAR-FS03 | /star_data05 | 2.24 | 18.51 | 21.40 | BA02 | |
star_data13 | STAR-FS03 | /star_data13 | 1.79 | ||||
star_data34 | STAR-FS03 | /star_data34 | 1.79 | ||||
star_data35 | STAR-FS03 | /star_data35 | 1.79 | ||||
star_data48 | STAR-FS03 | /star_data48 | 6.40 | ||||
star_data53 | STAR-FS03 | /star_data53 | 1.50 | ||||
star_data54 | STAR-FS03 | /star_data54 | 1.50 | ||||
star_data55 | STAR-FS03 | /star_data55 | 1.50 | ||||
star_data18 | STAR-FS04 | /star_data18 | 1.00 | 16.86 | 19.45 | ||
star_data19 | STAR-FS04 | /star_data19 | 0.80 | ||||
star_data20 | STAR-FS04 | /star_data20 | 0.80 | ||||
star_data21 | STAR-FS04 | /star_data21 | 0.80 | ||||
star_data22 | STAR-FS04 | /star_data22 | 0.80 | ||||
star_data27 | STAR-FS04 | /star_data27 | 0.80 | ||||
star_data47 | STAR-FS04 | /star_data47 | 6.60 | ||||
star_institutions_mit | STAR-FS04 | /star_institutions/mit | 0.96 | ||||
star_institutions_ucla | STAR-FS04 | /star_institutions/ucla | 1.60 | ||||
star_institutions_uta | STAR-FS04 | /star_institutions/uta | 0.80 | ||||
star_institutions_vecc | STAR-FS04 | /star_institutions/vecc | 0.80 | ||||
star_rcf | STAR-FS04 | /star_rcf | 1.10 | ||||
star_emc | STAR-FS05 | /star_emc | ? | 1.042 | 2.05 | BA4 | |
star_grid | STAR-FS05 | /star_grid | 0.05 | ||||
star_scr2a | STAR-FS05 | /star_scr2a | ? | ||||
star_scr2b | STAR-FS05 | /star_scr2b | ? | ||||
star_starlib | STAR-FS05 | /star_starlib | 0.02 | ||||
star_stsg | STAR-FS05 | /star_stsg | ? | ||||
star_svt | STAR-FS05 | /star_svt | ? | ||||
star_timelapse | STAR-FS05 | /star_timelapse | ? | ||||
star_tof | STAR-FS05 | /star_tof | ? | ||||
star_tpc | STAR-FS05 | /star_tpc | ? | ||||
star_tpctest | STAR-FS05 | /star_tpctest | ? | ||||
star_trg | STAR-FS05 | /star_trg | ? | ||||
star_trga | STAR-FS05 | /star_trga | ? | ||||
star_u | STAR-FS05 | /star_u | 0.97 | ||||
star_xtp | STAR-FS05 | /star_xtp | 0.002 | ||||
star_data01 | STAR-FS06 | /star_data01 | 0.83 | 14.94 | 16.90 | ||
star_data02 | STAR-FS06 | /star_data02 | 0.79 | ||||
star_data06 | STAR-FS06 | /star_data06 | 0.79 | ||||
star_data14 | STAR-FS06 | /star_data14 | 0.89 | ||||
star_data15 | STAR-FS06 | /star_data15 | 0.89 | ||||
star_data38 | STAR-FS06 | /star_data38 | 1.79 | ||||
star_data39 | STAR-FS06 | /star_data39 | 1.79 | ||||
star_data40 | STAR-FS06 | /star_data40 | 1.79 | ||||
star_data41 | STAR-FS06 | /star_data41 | 1.79 | ||||
star_data43 | STAR-FS06 | /star_data43 | 1.79 | ||||
star_simu | STAR-FS06 | /star_simu | 1.80 | ||||
star_data07 | STAR-FS07 | /star_data07 | 0.89 | 16.40 | 19.15 | ||
star_data10 | STAR-FS07 | /star_data10 | 0.89 | ||||
star_data12 | STAR-FS07 | /star_data12 | 0.76 | ||||
star_data17 | STAR-FS07 | /star_data17 | 0.89 | ||||
star_data24 | STAR-FS07 | /star_data24 | 0.89 | ||||
star_data28 | STAR-FS07 | /star_data28 | 0.89 | ||||
star_data29 | STAR-FS07 | /star_data29 | 0.89 | ||||
star_data30 | STAR-FS07 | /star_data30 | 0.89 | ||||
star_data32 | STAR-FS07 | /star_data32 | 1.75 | ||||
star_data33 | STAR-FS07 | /star_data33 | 0.89 | ||||
star_data37 | STAR-FS07 | /star_data37 | 1.66 | ||||
star_data42 | STAR-FS07 | /star_data42 | 1.66 | ||||
star_data44 | STAR-FS07 | /star_data44 | 1.79 | ||||
star_data45 | STAR-FS07 | /star_data45 | 1.66 |
Action effect (+/- impact in TB unit):

Date | Action | FS01 | FS02 | FS03 | FS04 | FS05 | FS06 | FS07 | SATA
---|---|---|---|---|---|---|---|---|---
2008/08/15 | Move/backup data25, 26, 31, 36 to SATA | | 4.56 | | | | | | -4.56
2008/08/18 | Drop 25, 26, 31, 36 from FS01 and expand on SATA to 5 TB | | | | | | | | -15.84
2008/08/22 | Shrink 46 to 5 TB, move to SATA and make it available at 5 TB | | 6.60 | | | | | | -5.00
2008/08/19 | Move institutions/ksu and institutions/iucf to FS02 | 1.60 | -1.60 | | | | | |
2008/08/19 | Expand ksu and iucf to 2 TB | | -0.80 | | | | | |
2008/08/22 | Move institutions/bnl to FS02 | 3.50 | -3.50 | | | | | |
 | Expand bnl to 4 TB | | -0.50 | | | | | |
 | Expand lbl by 4.2 TB (i.e. 14 TB) | -4.20 | | | | | | |
 | Expand emn to 2 TB | -0.40 | | | | | | |
 | Expand data03 to 2.5 TB | | -0.70 | | | | | |
 | Expand data04 to 2 TB | | -1.00 | | | | | |
 | Expand data08 to 2 TB | | -1.00 | | | | | |
 | Expand data16 to 2 TB | | -0.34 | | | | | |
 | Expand data09 to 2 TB | | -1.00 | | | | | |
 | Checkpoint | 0.50 | 0.72 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -25.40

Date | Action | FS01 | FS02 | FS03 | FS04 | FS05 | FS06 | FS07 | SATA
---|---|---|---|---|---|---|---|---|---
2008/08/22 | Shrink data 48 to 5 TB, move to SATA | | | 6.40 | | | | | -5.00
 | Expand data05 to 3 TB | | | -0.76 | | | | |
 | Expand 13, 34, 35, 53, 54 and 55 to 2.5 TB | | | -5.13 | | | | |
2008/08/22 | Shrink and move data47 to SATA | | | | 6.60 | | | | -5.00
 | Move 18, 19, 20, 21 to SATA | | | | 3.40 | | | | -3.40
 | Expand data18, 19, 20, 21 to 2.5 TB | | | | | | | | -6.60
 | Add to FS02 a institutions/uky at 1 TB | | | | -1.00 | | | |
 | Add to FS02 a institutions/psu at 1 TB | | | | -1.00 | | | |
 | Add to FS02 a institutions/rice at 1 TB | | | | -1.00 | | | |
 | Expand vecc to 2 TB | | | | -1.20 | | | |
 | Expand ucla to 3 TB | | | | -1.40 | | | |
 | Expand 22 and 27 to 1.5 TB | | | | -1.40 | | | |
 | Expand /star/rcf to 3 TB | | | | -1.90 | | | |
 | Checkpoint | 0.50 | 0.72 | 0.51 | 1.10 | 0.00 | 0.00 | 0.00 | -45.40

Date | Action | FS01 | FS02 | FS03 | FS04 | FS05 | FS06 | FS07 | SATA
---|---|---|---|---|---|---|---|---|---
 | Free (HPSS archive) emc, src2a, src2b, stsg, timelapse, tof | | | | | 0.00 | | |
 | Free (HPSS archive) tpc, tpctest, trg, trga | | | | | 0.00 | | |
 | Move 40, 41, 43 to SATA | | | | | | 5.37 | | -5.37
 | Expand 01 to 2 TB | | | | | | -1.17 | |
 | Expand 02 to 2 TB | | | | | | -1.21 | |
 | Expand star_simu to 3 TB | | | | | | -1.20 | |
 | Checkpoint | 0.50 | 0.72 | 0.51 | 1.10 | 0.00 | 1.79 | 0.00 | -50.77
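As a sanity check, the per-filesystem impacts listed above can be summed with a short Python sketch; the running totals reproduce the final Checkpoint row (all deltas in TB, copied from the tables; FS07 is never touched and stays at 0.00):

```python
# Accumulate the (+/-) TB impacts of each action per filesystem.
actions = [
    ("FS02", +4.56), ("SATA", -4.56),            # move/backup data25, 26, 31, 36
    ("SATA", -15.84),                            # expand 25, 26, 31, 36 on SATA to 5 TB
    ("FS02", +6.60), ("SATA", -5.00),            # shrink data46, move to SATA
    ("FS01", +1.60), ("FS02", -1.60),            # move ksu + iucf to FS02
    ("FS02", -0.80),                             # expand ksu and iucf
    ("FS01", +3.50), ("FS02", -3.50),            # move bnl to FS02
    ("FS02", -0.50), ("FS01", -4.20), ("FS01", -0.40),
    ("FS02", -0.70), ("FS02", -1.00), ("FS02", -1.00), ("FS02", -0.34), ("FS02", -1.00),
    ("FS03", +6.40), ("SATA", -5.00),            # shrink data48, move to SATA
    ("FS03", -0.76), ("FS03", -5.13),
    ("FS04", +6.60), ("SATA", -5.00),            # shrink and move data47 to SATA
    ("FS04", +3.40), ("SATA", -3.40),            # move data18-21 to SATA
    ("SATA", -6.60),                             # expand data18-21 to 2.5 TB
    ("FS04", -1.00), ("FS04", -1.00), ("FS04", -1.00),   # uky, psu, rice
    ("FS04", -1.20), ("FS04", -1.40), ("FS04", -1.40), ("FS04", -1.90),
    ("FS05", 0.00), ("FS05", 0.00),              # HPSS archiving frees ~0 TB
    ("FS06", +5.37), ("SATA", -5.37),            # move data40, 41, 43 to SATA
    ("FS06", -1.17), ("FS06", -1.21), ("FS06", -1.20),
]

totals = {}
for fs, delta in actions:
    totals[fs] = round(totals.get(fs, 0.0) + delta, 2)

print(totals)
# expected: FS01 0.50, FS02 0.72, FS03 0.51, FS04 1.10, FS05 0.00, FS06 1.79, SATA -50.77
```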
Missing information and progress records:
Requirements and resource planning for 2009.
The assumed CPU profile will be:
The share between space and CPU is as below within the following caveats:
Experiment Parameters | Scenario A | Scenario B
Central Disk (TB) - Institution | 20.00 | 20.00
Type Institution (Index from C&C) | 11 | 11
Cost of Central Disk for Institution | $62,441.47 | $62,441.47
Central Disk (TB) - NexSan-Production | 0.00 | 0.00
Type NS-Prod (Index from C&C) | 13 | 13
Cost of NexSan-Production | $0.00 | $0.00
Central Disk (TB) - Production | 170.00 | 170.00
Type of Production (Index from C&C) | 12 | 12
Cost of Production Disk | $136,374.27 | $136,374.27
Total Size of new Central Disk (TB) | 190.00 | 190.00
Total Cost of Central Disk | $198,815.74 | $198,815.74
Cost of Servers to support Central Disk | |
Compensation Disk entitled (TB) | 0.00 | 0.00
Amount (up to entitlement) (TB) | 0.00 | 0.00
Cost of Compensation Disk | $0 | $0
Remaining Funds | $0 | $0
Compensation count (1U, 4 GB below) | 0 | 0
Compensation count (1U, 8 GB below) | 0 | 0
CPU Cost | $0 | $0
Distributed Disk | 0.0 | 0.0
kSI2k | 0.0 | 0.0
CPU Type (Index from Constants&Costs) | 2 | 5
# 2U, 55xx, 5700 GB disk, 24 GB | 74 | 72
CPU Alternative (not used) | 0 | 0
CPU Cost | $429,126 | $427,680
Distrib. Disk on new machines (TB) | 421.8 | 410.4
kSI2k new | 1983.2 | 2031.0
Total Disk (TB) | 1393.8 | 1382.4
Total CPU (kSI2000) | 4303.2 | 4351.0
Total Cost | $627,942 | $626,496
Outside Funds Available | $62,441 | $62,441
Funds Available | $588,000 | $588,000
Unspent Funds | $22,500 | $23,946
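As a quick cross-check of the cost roll-up, a minimal Python sketch using the inputs above (the 5.7 TB of distributed disk per node is read off the "5700 GB disk" machine description):

```python
# Reproduce a few derived FY09 numbers per scenario from the table inputs.
scenarios = {
    "A": {"machines": 74, "cpu_cost": 429_126},
    "B": {"machines": 72, "cpu_cost": 427_680},
}
central_disk_cost = 62_441.47 + 136_374.27   # institution + production central disk
outside_funds = 62_441.47
funds_available = 588_000

for name, s in scenarios.items():
    distrib_disk_tb = s["machines"] * 5.7            # 5700 GB of local disk per node
    total_cost = s["cpu_cost"] + central_disk_cost
    unspent = funds_available + outside_funds - total_cost
    print(name, round(distrib_disk_tb, 1), round(total_cost), round(unspent))
# expected: A 421.8 627942 22500, B 410.4 626496 23946
```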
The below is what was gathered from the call sent to starsoft, "Inquiry - institutional disk space for FY09" (with delay, a copy was sent to starmail on the 14th of April 2009). The deadline was the end of Tuesday, April 14th 2009; feedback was accepted until Wednesday the 15th (anything afterward could have been ignored).
Institution | # TB | confirmed |
LBNL | 5 | April 21st 17:30 |
BNL hi | 2 | [self] |
BNL me | 1 | [self] |
NPI/ASCR | 3 | April 22nd 05:54 |
UCLA | 1 | |
Rice | 4 | April 21st 18:47 |
Purdue | 1 | April 22nd 15:12 |
Valpo | 1 | April 22nd 17:59 |
MIT | 2 | April 22nd 15:56 |
Total | 20 |
The pricing on the table is as initially advertised, i.e. a BlueArc Titan 3200 based solution at 4.3 k$/TB for fiber-channel based storage. For a discussion of fiber channel versus SATA, please consult this posting in starsoft. A quick performance overview of the Titan 3200 is shown below:
Titan 3200 | |
IOPS | 200,000 |
Throughput | Up to 20Gbps (2.5 GB/sec) |
Scalability | Up to 4PB in a single namespace |
Ethernet Ports | 2 x 10GbE or 6 x GbE |
Fibre Channel Ports | Eight 4Gb |
Clustering Ports | Two 10GbE
Solution enables over 60,000 user sessions and thousands of compute nodes to be served concurrently.
The first scalability figure is well beyond RHIC/STAR's needs, but the second is easily reached in the RCF environment.
A SATA-based solution would be priced at 2.2 k$/TB. While the price is lower than the fibre channel solution (and may be tempting), this solution is NOT recommended for institutional disk, as its scalability for read IO at the level we are accustomed to is doubtful (doubtful is probably an understatement: we know from our experience of five years ago that we would have to apply IO throttling).
As production space, however (considering that resource constraints demand cheaper solutions, coupled with an Xrootd fast-IO aggregation layer that will remain the primary source of data access for users), the bet is that it will work if used as buffer space: production jobs write locally to the worker nodes, then move the files to central disk at the end as an additional copy, alongside an HPSS data migration. There will be minimal guarantees of read performance for analysis on this "production reserved" storage.
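A minimal sketch (Python) of that buffer-space pattern; the paths, directory layout, and use of the hsi HPSS client are illustrative assumptions, not an existing STAR production script:

```python
# Illustrative sketch (not a STAR production script) of the buffer-space pattern described above:
# a production job writes its output locally on the worker node, then copies it to the central
# "production reserved" storage as an additional copy and archives it to HPSS.
# The paths and the use of the `hsi` HPSS client here are assumptions for illustration.
import shutil
import subprocess
from pathlib import Path

def finalize_output(local_file: Path, central_dir: Path, hpss_dir: str) -> None:
    # 1) Copy the locally written output to the central (SATA) production buffer.
    central_copy = central_dir / local_file.name
    shutil.copy2(local_file, central_copy)

    # 2) Archive the same file to HPSS (hypothetical path; `cput` is a conditional put).
    subprocess.run(["hsi", f"cput {local_file} : {hpss_dir}/{local_file.name}"], check=True)

    # 3) Only then remove the local copy from the worker node scratch area.
    local_file.unlink()

# Example call (hypothetical paths):
# finalize_output(Path("/tmp/job1234/st_physics.MuDst.root"),
#                 Path("/star/data85"), "/home/starreco/out")
```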
One unit of Thumper, at 20 k$ / 33 TB usable, will also be purchased and tried out in a special context. This solution is even less scalable and hence requires a reduced number of users and IO. The space targeted for this lower-end option may include (TBC):
Following the Disk space for FY09, here is the new space topology and space allocation.
BlueArc01 | BlueArc02 | BlueArc04 | |||
---|---|---|---|---|---|
STAR-FS01 | Space | STAR-FS03 | Space | STAR-FS05 | Space |
star_institutions_emn | 2.0 | star_data05 | 3.0 | star_grid | 0.5 |
star_institutions_lbl | 14.0 | star_data13 | 2.5 | star_starlib | 0.25 |
star_institutions_lbl_prod | 5.0 | star_data34 | 2.5 | star_u | 1.6 |
star_institutions_mit | 3.0 | star_data35 | 2.5 | ||
star_institutions_rice | 5.0 | star_data53 | 2.5 | STAR-FS06 | Space |
star_data54 | 2.5 | star_data01 | 2.2 | ||
STAR-FS02 | Space | star_data55 | 2.5 | star_data02 | 2.2 |
star_data03 | 2.5 | star_data06 | 1.0 | ||
star_data04 | 2.0 | STAR-FS04 | Space | star_data14 | 1.0 |
star_data08 | 2.0 | star_data22 | 2.0 | star_data15 | 1.0 |
star_data09 | 2.0 | star_data27 | 1.5 | star_data16 | 2.0 |
star_institutions_bnl | 6.0 | star_institutions_psu | 1.0 | star_data38 | 2.0 |
star_institutions_bnl_me | 1.0 | star_institutions_purdue | 1.0 | star_data39 | 2.0 |
star_institutions_iucf | 1.0 | star_institutions_ucla | 4.0 | star_simu | 3.0 |
star_institutions_ksu | 1.0 | star_institutions_uky | 1.0 | ||
star_institutions_npiascr | 3.0 | star_institutions_uta | 1.0 | STAR-FS07 | Space |
star_institutions_valpo | 1.0 | star_institutions_vecc | 2.0 | star_data07 | 0.89 |
star_rcf | 3.0 | star_data10 | 0.89 | ||
star_data12 | 0.76 | ||||
star_data17 | 0.89 | ||||
star_data24 | 0.89 | ||||
star_data28 | 0.89 | ||||
star_data29 | 0.89 | ||||
star_data30 | 0.89 | ||||
star_data32 | 1.75 | ||||
star_data33 | 0.89 | ||||
star_data37 | 1.66 | ||||
star_data42 | 1.66 | ||||
star_data44 | 1.79 | ||||
star_data45 | 1.66 |
This page is under construction. Most projects are still under the Projects and proposals page and have not been revised.
Announcement for institutional disk space was made in starmail on 2010/04/26 12:31.
To date, the following requests were made (either in $ or in TB):
Institution | Contact | Date | $ (k$) | TB equivalent | Final cost |
---|---|---|---|---|---|
LBNL | Hans Georg Ritter | 2010/04/26 15:24 | 20 | 5 | $17,006.00 |
ANL | Harold Spinka | 2010/04/26 16:29 | - | 1 | $3,401.00 |
UCLA | Huan Huang | 2010/04/26 16:29 | - | 1 | $3,401.00 |
UTA | Jerry Hoffmann | 2010/04/27 14:59 | - | 1 | $3,401.00 |
NPI | Michal Sumbera & Jana Bielcikova | 2010/04/20 10:00 | 30 | 8 | $27,210.00 |
PSU | Steven Heppelmann | 2010/04/29 16:00 | - | 1 | $3,401.00 |
BNL | Jamie Dunlop | 2010/04/29 16:45 | - | 5 | $17,006.00 |
IUCF | Will Jacobs | 2010/04/29 20:18 | - | 2 | $6,802.00 |
MIT | Bernd Surrow | 2010/05/08 18:07 | - | 2 | |
Totals | | | | 24 | |
The storage cost for 2010 was estimated at 3.4 k$/TB. Detailed pricing is below.
Since the existing storage is stretched in terms of number of servers and scalability, we would (in fact must) buy a pair of Mercury servers, which recently cost us $95,937. The storage itself would be based on recent pricing, i.e. a recent configuration quoted as (96) 1 TB SATA drives at $85,231 + $2,500 installation, yielding 54 TB usable. STAR's target is 50 TB for production plus 5+10 TB for institutions (it will fit and can be slightly expanded). The total cost is hence:
$95,937 + $85,231 + $2,500 = $183,668; $183,668 / 54 TB ≈ $3,401/TB
Detailed cost projections may indicate (depending on the total volume) possibly better pricing: the installation charge (half a day of work for a BlueArc technician) is fixed, and each server pair could hold more than the planned storage (hence the cost of the two servers is also fixed). A few configurations are listed below, followed by a short sketch of the arithmetic.
Configuration | 54 TB | 27 TB | 54+27 TB |
---|---|---|---|
Storage cost | 85,231 | 42,615.5 | 127,846.5 |
Service installation | 2,500 | 2,500 | 2,500 |
Two servers | 95,937 | 95,937 | 95,937 |
Price with servers | 183,668 | 141,052.5 | 226,283.5 |
Price per TB | 3,401.3 | 5,224.2 | 3,187.1 |
Price per MB | 0.003244 | 0.004982154 | 0.003039447 |
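A minimal sketch (Python) of the arithmetic behind the table; the per-MB figures appear to assume a binary TB-to-MB conversion (1 TB = 1024 x 1024 MB), which is taken as an assumption here, and price_breakdown is just an illustrative helper name:

```python
# Minimal sketch reproducing the per-TB and per-MB pricing above. The per-MB figures in the
# table appear to assume a binary conversion (1 TB = 1024*1024 MB); that assumption is used here.
def price_breakdown(storage_cost: float, usable_tb: float,
                    servers: float = 95_937.0, install: float = 2_500.0) -> tuple:
    total = storage_cost + install + servers
    per_tb = total / usable_tb
    per_mb = total / (usable_tb * 1024 * 1024)
    return total, per_tb, per_mb

# 54 TB usable configuration quoted above: (96) 1 TB SATA drives at $85,231 + installation + servers.
total, per_tb, per_mb = price_breakdown(85_231.0, 54.0)
print(f"${total:,.0f} total, ${per_tb:,.1f}/TB, ${per_mb:.6f}/MB")
# -> $183,668 total, $3,401.3/TB, $0.003244/MB
```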
Projected versus allowed additional CPU need based on funding guidance (see CSN0474: The STAR Computing Resource Plan, 2009): 7440 kSI2k / 2436 kSI2k - projected to be a 43% shortage.
Projected distributed storage under the same conditions (the dd model has hidden assumptions): 417 TB / 495 TB - at the acquired level, projected to be 130% off the optimal solution.
The decision was to go for 1U machines and to switch to the 2 TB Hitachi HUA722020ALA330 SATA 3.0 Gbps drive to compensate for the drive-space loss (4 slots instead of 6 in a 2U). The number of network ports was verified to be adequate for the projection below. The 1U configuration allows recovering more CPU power / density. Also, the goal is to move to a boosted memory configuration and enable hyper-threading, growing from 8 batch slots to a consistent 16 slots per node (another x2, although the performance scaling will not be x2 due to the nature of hyper-threading; see the sketch below). Finally, it was decided NOT to retire the older machines this year but to keep them running until next year.
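A minimal sketch of the slot-versus-throughput trade-off just described; the hyper-threading efficiency factor used here is purely an illustrative assumption (the text only states that the gain is less than x2), not a measured STAR benchmark:

```python
# Illustrative sketch of the batch-slot vs. throughput trade-off discussed above.
# The hyper-threading efficiency factor is an assumption for illustration only.
def node_capacity(physical_slots: int = 8, ht_enabled: bool = True,
                  ht_efficiency: float = 1.3) -> tuple:
    """Return (batch slots, relative throughput) for one node."""
    if ht_enabled:
        return physical_slots * 2, ht_efficiency  # 16 slots, but less than 2x the work done
    return physical_slots, 1.0

slots, throughput = node_capacity()
print(f"{slots} batch slots per node, ~{throughput:.1f}x throughput vs. 8 non-HT slots")
```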
Planned numbers
Reality checks:
Since we have had many problems in past years with mismatches between purchased and provided space at the RCF, keeping track of the space accounting is a good idea. Below is an account of where the space went (it should total 55 TB of production space and 26 TB of institution space).
Disk | Initial space (TB) | Added (TB) | Total (TB) |
lbl_prod | 5 | 5 | 10 |
lbl | 14 | 0 | 14 |
anl | 0 | 1 | 1 |
mit | 3 | 2 | 5 |
bnl | 6 | 5 | 11 |
iucf | 1 | 2 | 3 |
npiascr | 3 | 8 | 11 |
psu | 1 | 1 | 2 |
ucla | 4 | 1 | 5 |
uta | 1 | 1 | 2 |
Total added | 26 | ||
data08 | 2 | 2.5 | 4.5 |
data09 | 2 | 3 | 5 |
data22 | 2 | 3.5 | 5.5 |
data23 | 5 | 0.5 | 5.5 |
data27 | 1.5 | 4 | 5.5 |
data11 | (gone in 2009) | 5 | 5 |
data23 | (gone in 2009) | 5 | 5 |
data85 to 89 | N/A | 5*5 | 25 |
data90 | N/A | 6 | 6 |
Total added so far | 54.5 |
There should be a 0.5 TB unallocated here and there.
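As a quick cross-check of the accounting above, a minimal sketch (Python) summing the "Added" column; since the table lists data23 twice, the second entry gets a distinct key here:

```python
# Minimal sketch checking the space accounting above: the "Added" column should total
# 26 TB for institution disks and ~54.5 TB for production space (values from the tables).
institution_added = {"lbl_prod": 5, "lbl": 0, "anl": 1, "mit": 2, "bnl": 5,
                     "iucf": 2, "npiascr": 8, "psu": 1, "ucla": 1, "uta": 1}
production_added = {"data08": 2.5, "data09": 3, "data22": 3.5, "data23": 0.5,
                    "data27": 4, "data11": 5, "data23 (second entry)": 5,
                    "data85-89": 5 * 5, "data90": 6}

print(sum(institution_added.values()))  # 26
print(sum(production_added.values()))   # 54.5
```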
Summary of requests and dues:
Institution | Space (TB) | Estimated cost * | Charged & date | Left over due + |
UCLA | 2 | 4802.47808 | 4,800$ - ????/??/?? | 3$ |
Purdue | 3 | 7203.71712 | 7,203.72$ - 2013/02/20 | 0$ |
NPI/ASCR | 7 | 16808.67328 | 15,000$ - 2011/12/09 | 1,809$ |
LBNL | 7 | 16808.67328 | 14,160$ - 2011/06/15 | 2,649$ |
IUCF | 2 | 4802.47808 | 6,802$ - 2012/06/11 | (past unpaid due added) |
BNL | 5 | 12006.1952 | (internal) | 0$ |
Grand totals | 26 | 62432.21504 |
* ATTENTION: estimated cost based on initial purchase cost estimates. The final price may vary.
+ Unless a number appears in this column, the estimated cost is due in full. Multiple charges may apply to a given institution (until the total dues are collected).
Acquired otherwise:
 | Additional | Total after purchase | % increase | S&C plan 2008 | Deficit * |
Central space (prod) | 10.00 | 345.00 | 2.90% | 377.00 | -8.49% |
Distributed disk space | 430.47 | 1456.47 | 29.56% | 2659.50 | -45.24% |
kSI2K farm | 3045.00 | 6900.00 | 44.13% | 30115.00 | -77.09% |
* ATTENTION: Note-A: the deficit assumes a similar run plan - U+U was suggested for FY11; Note-B: the increase in the number of events is not helping; Note-C: if we are lucky and the size per event is smaller than projected, the distributed disk may be fine.
Mount | Y2010 | Current | Requested | Status / Comment |
---|---|---|---|---|
ANL | 1 | 1 | ||
BNL | 11 | 16 | +5 | Taken care of |
BNL_ME | 1 | 1 | |
EMN | 2 | 2 | |
IUCF | 3 | 5 | +2 | Taken care of |
KSU | 1 | 1 | |
LBL | 14 | 14 | +5 | Taken care of |
LBL_PROD | 10 | 12 | +2 | Taken care of |
MIT | 5 | 5 | |
NPIASCR | 11 | 18 | +7 | Taken care of |
PSU | 2 | 2 | |
PURDUE | 1 | 4 | +3 | Taken care of |
RICE | 5 | 5 | |
UCLA | 5 | 7 | +2 | Taken care of |
UKY | 1 | 1 | ||
UTA | 2 | 2 | ||
VALPO | 1 | 1 | ||
VECC | 2 | 2 |
The 2012 budget did not allow for flexible choices of hardware or storage. The RHIC experiments were not asked about partitioning (a 1/2 and 1/2 split was applied for STAR and PHENIX, and it essentially covered new farm nodes). Storage was handled via a replacement of old storage by newer storage media (and we doubled our space).
Since several institutional disk space bills were pending (unpaid), that possibility did not present itself either. See the 2011 requirements for where we stood.
Requirements and plans for 2013
The RCF budget was minimal - no external disk purchase was carried out; essentially, "infrastructure"-related money (HPSS silo expansion) took the core budget, modulo some funds left for CPU purchases.
Sub-system | Task description | Approximate time | Start time needed | Core FTE | Sub-sys FTE |
HFT | Geometry / alignment studies. Includes dev geometry development, developing alignment procedures, and infrastructure support for code and alignment | 12 months | 2012-10-01 | 0.2 | 2*0.5=1.0 |
HFT | Survey, Db work and maintenance | 12 months | 2012-10-01 | 0.1 | 0.5 |
HFT | Detector operations. Includes monitoring QA, calibrations, alignment for PXL, IST,SSD | Each Run | 2013-03-01 | 0.1 | 3*0.5=1.5 |
HFT | Tracking studies, Stv integration and seed finder studies | 12 months | 2012-10-01 | 0.4 | 0.5 |
HFT | Cluster/Hit reconstruction: DAQ for PXL, SSD, IST and definition of base structures but also development of Fast simulator | 12 months | 2012-10-01 | 0.2 | 1.0 |
HFT | Decay vertex reconstruction, development of secondary vertex fitting methods and tools | 8 months | 2012-12-01 | 0.1 | 0.3 |
HFT | General help | N/A | 2012-09-01 | 0.1 | 0.2 |
FGT | Tracking with Stv including integration of FGT and EEMC, ... ECAL W program requires good charge separation. Requirements for other physics goals like direct photon in the EEMC, delta-G, jets, IFF and x-sections have to be investigated and likely range from crude track reconstruction for vetoing to optimal momentum reconstruction | 8 months | 2012-12-01 | 0.6 | 0.2 |
FGT | Vertexing for forward physics | 2 months | 2013-04-01 | 0.2 | 0.3 |
FGT | Alignment study, improvements | 8 months | 2012-12-01 | 0.2 | 0.5 |
FGT | Improvements and tuning (Cluster finding, ...) | 3 months | 2013-01-01 | 0.0 | 0.3 |
FGT | Tuning simulation to data, comparison studies using VMC | 10 months | 2012-12-01 | 0.3 | 1.0 |
FGT | MuDST related work | 1 month | 2012-02-01 | 0.1 | 0.1 |
FGT | Miscellaneous maintenance | N/A | | 0.1 | 0.2 |
FMS | Database interface, client and maintenance | N/A | | 0.1 | 0.2 |
FMS | Better simulation for the FMS, VMC based | 6 months | | 0.2 | 0.4 |
TPC | Calibration and alignment efforts: space charge and grid leak distortions and calculating correction factors, twist correction work, alignment (sector to sector as well as inner to outer), T0 and gain determinations, and dE/dx calibration | 22 months | 2013-01-01 | 0.5 | 1.5 |
TPC | Calibration maintenance (methods developed converged, documented and automated) | N/A | 2015-01-01 | 0.3 | 0.7 |
TPC | Calibration R&D: alignment and distortions | 8 months | 2012-07-01 | 0.2 | 0.3 |
TPC | Understanding aging effects | 20 months | 2012-07-01 | 0.5 | 0.0 |
TPC | iTPC upgrade efforts as well as contingency planning for existing TPC. Design and construction of a sector removal tool. | 20 months | 2012-07-01 | 0.5 | 1.5 |
UPG | Geometry implementation for ETTR, FCS, VFGT | 6 months | 2012-07-01 | 0.2 | 0.5 |
UPG | Event generator integration and simulator development (initial effort for generator, effort for proposal, longer term efforts as needed) | 12 months | 2012-07-01 | 0.2 | 0.5 |
EEMC | Calibration support for physics readiness, software adjustments and maintenance | N/A | 2013-01-01 | 0.1 | 0.3 |
EEMC | SMD calibration related software development | coming year | 2013-01-01 | 0.0 | 0.1 |
EEMC | EEMC alignment work, development of better methods | 12 months | 2013-01-01 | 0.0 | 0.5 |
EEMC | Cluster MIPS studies | 6 months | 2013-01-01 | 0.0 | 0.2 |
TOF | Calibration support, software and database maintenance. Provide final parameters for TOF-based PID, and status tables for BTOF in PPV | per run | 2013-01-01 | 0.2 | 0.5 |
TOF | Separate TOF and VPD slow simulators | 2 months | 2013-01-01 | 0.2 | 0.5 |
TOF | Simulation studies, mixermaker | 6 months | 2013-01-01 | 0.1 | 1.0 |
TOF | Geometry maintenance | 2 months | 2013-01-01 | 0.2 | 0.2 |
MTD | Calibration support, software maintenance. Provide final parameters for MTD-based muon ID | per run | 2013-01-01 | 0.1 | 1.0 |
MTD | Simulation studies & development: simulation maker. | 6 months | 2013-01-01 | 0.2 | 1.0 |
MTD | Software development: calibration maker | 6 months | 2013-01-01 | 0.2 | 1.0 |
MTD | Geometry maintenance | 2 months | 2013-01-01 | 0.2 | 0.2 |
MTD | Database maintenance & development | 2 months | 2013-01-01 | 0.1 | 0.5 |
The base budget was sufficient to purchase the network equipment needed to move to 10 GbE, a first wave of HPSS upgrades (disk cache, drives for Run 14 bandwidth requirements), a refresh of BlueArc storage (end of warranties), and the GPFS system (with supplemental funds argued for by STAR). The remainder went into purchasing an equal amount of CPU to be shared between STAR and PHENIX (TBC).
The budget initially thought to be allocated to the RCF for equipment growth and refresh was not provided. From a farm/processing perspective, only emergency purchases and minor refreshes were done (like replacing dying drives on 4-6 year old hardware to keep it alive).
The latest computing resource needs estimate is available in PSN0622: Computing projections revisited 2014-2019. Even under a modest budget, the resources were deemed insufficient to meet the need for timely scientific throughput. A new projection based on the non-funding profile of FY15 is not available at this time.
Title, affected area, task description & goal | Skill required | POC | Taker or assignee | Status (date) |
---|---|---|---|---|
Title: Evaluate the use of XCache as a possible improvement over Xrootd access. Affected area: user analysis. Task description: Xrootd access has recently been improved by reducing the IO operations per second (very much like access to GPFS): entire files are transferred to local disk and read locally. Why was this done? The Xrootd storage model has changed: it used to be widely distributed, leveraging our compute farm storage (hence scaling as the farm grew), and the concentration of storage on a few large data servers caused bottlenecks. A caching layer, however, could improve access and reduce the load on those data servers. Multiple issues will arise and questions need to be asked: (a) Can we leverage caching in the first place? [do we have dataset re-use over a 24-hour period, for example? - see the sketch after this table] (b) Is the XCache infrastructure flexible? Can we scale over many smaller nodes? [we do not want to displace the bottleneck from a few Xrootd data servers to ... a few caches] (c) If we deploy, what are the measures of success? [cache hits must minimally confirm our estimate of the benefit; what is the scale of the impact?] | Nothing special | J. Lauret | A. Jaikar, L. Hajdu | Opened 2019/08 |
Title: Evaluation of a new "forum"-based mailing list system. Affected area: Communication, exchange. Task description: | | J. Lauret | RACF, J. Lauret, W. Betts | Opened |
Title: GMT software integration. Affected area: Calibrations. Task description: Clean up the existing code library for the GMT and bring it through code peer review. | C++ | G. Van Buren | ? | Untaken 2019/08 |
Title: Collider performance impact on STAR data. Affected area: Calibrations. Task description: Use whatever tools we can (e.g. scalers, DCAs) to look for datasets impacted by collider performance, similar to what was seen with the Booster Main Magnet for Run 18 AuAu27. | C++ | G. Van Buren | Yue-Hang Leung | Taken 2019/09 |
Title: Integration of automated run-by-run Offline QA. Affected area: QA. Task description: Run-by-run (i.e. time- and/or run-dependent) QA plots have been missing from Offline QA. Significant work has been done to generate the plots, but the final integration and interface to these plots need to be completed. | open (e.g. develop web interface) | G. Van Buren | ? | Untaken 2019/08 |
Title: Web Master. Affected area: All web content. Task description: None. Skill set includes PHP and a sense of organization. | PHP | J. Lauret | Daniel Nemes + David Stewart | Opened 2019/08, Taken 2019/10 |
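For the XCache item above, the 24-hour re-use question can be explored with a minimal sketch of the kind shown below; the access-log format (one "timestamp file-path" entry per access) and the reuse_fraction helper are assumptions for illustration, not an existing STAR or Xrootd tool:

```python
# Illustrative sketch (assumed log format) estimating dataset re-use over a 24-hour window,
# i.e. the fraction of file accesses that hit a file already read earlier the same day.
# A high re-use fraction would suggest an XCache layer could actually absorb load.
from collections import defaultdict
from datetime import datetime

def reuse_fraction(log_lines):
    """log_lines: iterable of 'ISO-timestamp<space>file-path' strings."""
    seen_per_day = defaultdict(set)  # day -> set of files already accessed that day
    repeats, total = 0, 0
    for line in log_lines:
        stamp, path = line.split(maxsplit=1)
        day = datetime.fromisoformat(stamp).date()
        total += 1
        if path in seen_per_day[day]:
            repeats += 1             # a cache could have served this repeated access
        seen_per_day[day].add(path)
    return repeats / total if total else 0.0

# Example with a hypothetical log:
sample = ["2019-08-01T10:00:00 /star/data85/file1.MuDst.root",
          "2019-08-01T14:30:00 /star/data85/file1.MuDst.root",
          "2019-08-01T15:00:00 /star/data86/file2.MuDst.root"]
print(reuse_fraction(sample))  # 0.333... -> one of the three accesses was a repeat
```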