Embedding library release policy


Embedding library consolidation

General

The topic of a policy was brought up on 2011/03/30. Looking at what we can improve, it seemed to me that a lot of time is spent trying to figure out whether the libraries at BNL and PDSF are alike, pointing at the need for stronger QA on library deployment and, perhaps, at the need to have embedding libraries compiled and assembled at BNL.

Message to Terry/Renee follows

Subject: New rule to be put in place
CC: Lidia D., Gene V.B.

Email was sent on 2011/04/03 after consulting with the S&C team (in general agreement with a move forward).

	Dear Renee & Terry,

	At the internal S&C meeting I called for last Wednesday, I
brought the delicate topic of whether or not the current software
support scheme for the embedding is working out. My personal observation
is that a lot of discussions and tracing has to do with figuring out
(still) what was run, where, under which conditions etc ... Especially,
we tend to find offsets between PDSF and BNL libraries making it hard
for us to easily debug.


	I would hence like to crank the knob of "policy" one level
higher and set as rule that

(a) embedding libraries will need (MUST) be assembled at BNL and
    tested at BNL

(b) offsite (PDSF) will copy / clone the BNL codes and the BNL code
    will remain the primary test copy. In other words, I would like
    custom patches added to libraries at PDSF to cease for the benefit
    of additional library support at BNL ...

    Note 1: naming conventions and release policies would need to
    be sorted out but if the principle is agreed upon, I will work
    on that and we will move on.

    Note 2: whenever PDSF will deploy an embedding library a.1 Lidia
    will run a test suite and compare to the results provided at BNL.
    They will need to match EXACTLY to have the OK to proceed. a.2 I
    will put in place a "md5sum" like mechanism for a library - this
    will be run regularly and if an offset is detected, by policy,
    we would declare ANY production done with the "tainted" remote
    library as invalid/void [zero risk approach]

(c) All codes should be strictly CVS controlled (macros, prepMaker
    etc ...) hence, karma list should be reviewed thoroughly (again,
    by policy we should decide who has prime responsibility for each
   code and area)

	At the end, we will gain confidence of similar code between
BNL and PDSF (or elsewhere) and we will be able to remove one more
ambiguity which tends to cloud the findings in many tickets ("do we
have the same code?").

	Before shaping this into a final policy, I would like your
feedback. Policies can always be stronger / softer depending on
rational arguments - the above is a baseline.
	
	Thank you,


Additional comment I made:

We have to have the absolute assurance of all components of our
production workflow (whether data production, simulation or embedding)
strictly CVS-controlled so there is zero discussion on the where/what.
In other words, all tickets should ideally start with "I am running
from library=$VERSION, using this command line and this is what I see"
and a BNL check from code developers should be able to reproduce the
issue right away [this is simply not the case now]. Any other problem
analysis welcomed.

Initial thoughts

A policy should hence contain the following elements as base operational requirements:

  • All code necessary for running the embedding should be strictly under revision control, whether code or macros. Embedding libraries would be assembled at BNL first by the STAR production coordinator and librarian (PCL).

    • A library following a naming convention along the path SLYYS_embed should be created, where YY and S are respectively the year and sub-version of the production the embedding library relates to.

      • For example, SL10h_embed would be a tree representing the embedding code series for SL10h data production related activities

    • A tag shall be released for each embedding library and follow the convention used by the Library release structure and policy.

      • In other words, tags should follow the form SLYYS_embed_# where # is an incremental number from 1 to N for each subsequent patch level. There is no need for a _# postfix for the first release (and _1 will be considered the first patch level if needed)

  • Upon validation of such a library, a release tag will be issued and an OK to deploy given by the PCL.

    • Validation should happen at BNL based on a set of nightly jobs representative of our knowledge of standard embedding productions, with results compared to previous BNL-based embedding for the same test suite and changes explained (level 1 QA-ing / incremental changes).

    • Validation should include a set of standard QA macros provided by the embedding team. One may envision that, rather than looking at many plots, one would leverage the offline QA framework to spot changes by comparing to references. A Kolmogorov test could be leveraged as well (level 1 QA / deviation from reference)

    • Each library should also carry a source-code-based, checksum-like number uniquely identifying a set of source codes (TBD). This checksum would be used to quickly re-validate remote libraries (level 0 / checksum)

  • The library should then be deployed at the remote site and re-validated remotely by the PCL. Libraries must pass the validation.

    • Validation would include a full test suite as used at BNL to validate a given library – comparisons shall be between the remote library and the BNL one (level 2 QA-ing / site differences)

    • The same applies with any test suite based on Kolmogorov (level 2 / deviation from reference and in this case, reference = BNL base library)

    • Level 0 validation must pass (checksum)

    • Without validation, the library should NOT be used for official embedding production, no matter what

      • Hint: in other words, rather than spending time on hacks and tweaks, time should be spent on understanding the core source of the differences.

  • Active embedding libraries should be regularly checked for validity. Should a later check fail the basic validity check, (a) a full QA suite must be run again and (b) a stop of production must be issued on all open requests pertaining to the affected library

    • Only a level 0 checksum validation is required – if it fails, level 1 validations must be carried out and the reason for the differences understood.

    • Note: same as in the previous hint – local hacks, even if working, shall not be the focus but a re-sync with the BNL repository should be the focus of efforts and attention.

  • All code and macros should work identically at all sites (implying macros should work at BNL and STAR's primary embedding site)

    • NB: one may argue, as pushed forward by Lee Barnby, that the steering code and scripts should also work independently of the sites they run on – the idea at the time: leverage SUMS for job submission and develop an XML template.
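
As an illustration of the tag naming convention in the list above (SLYYS_embed, then SLYYS_embed_# for patch levels), the sketch below parses a tag and derives the next patch-level tag. It is only a sketch under stated assumptions: the helper names parse_embed_tag and next_patch_tag are hypothetical, and the sub-version S is assumed to be a single lowercase letter, as in SL10h.

```python
import re

# Assumption: YY is a two-digit year and S a single lowercase letter (as in SL10h).
# The optional _# postfix is a patch level starting at 1.
TAG_RE = re.compile(r"^SL(?P<year>\d{2})(?P<sub>[a-z])_embed(?:_(?P<patch>[1-9]\d*))?$")


def parse_embed_tag(tag):
    """Return (year, sub_version, patch_level), or None if the tag is malformed.

    Patch level 0 stands for the first release, which carries no _# postfix.
    """
    match = TAG_RE.match(tag)
    if match is None:
        return None
    patch = int(match.group("patch")) if match.group("patch") else 0
    return int(match.group("year")), match.group("sub"), patch


def next_patch_tag(tag):
    """Return the tag for the next patch level of the same embedding library."""
    parsed = parse_embed_tag(tag)
    if parsed is None:
        raise ValueError("not an embedding tag: %r" % tag)
    year, sub, patch = parsed
    return "SL%02d%s_embed_%d" % (year, sub, patch + 1)
```

With this reading of the convention, a plain production tag such as SL10h is rejected, and the first patch of SL10h_embed is SL10h_embed_1.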

 

This being said, the reason for having codes in private directories was not clear. Before drafting any final policies, understanding the true reason behind it is fundamental (is there any real need, or is it a question of organization, information flow or lack thereof from the analysis to S&C?).

 

Follow-up #1 2011/04/05 (triggered by a PDSF library issue)

Subject: Re: PDSF library discussion for embedding
CC: Jeff P., Joanna K., Eric H., Lidia D.

	At this point in time, my proposal and decision (after
consulting with the team) is indeed to strengthen the centralization of
libraries and code in STAR. We will now have ANY libraries used for ANY
productions validated at BNL - remote site code will be compared to the
BNL version (a "repository") and if they do not match [technical detail
on what that means not the object of this EMail] we will not run. Lidia
already has in place a test suite comparison (leveraging Grid based job
submission) and  a few more tricks will be added to be able to validate
remote sites.

	Jeff's and Lidia's reports stress the point I have brought
to Terry and Renee earlier and the reason behind this proposal. In all
good faith, mistakes are still being made and complete assurance of the
base code is essential.
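
The comparison of remote site code to the BNL version (the level 0, "md5sum"-like check) could be sketched as below. The actual mechanism is left TBD in the policy, so everything here is an assumption made for illustration: the function names tree_checksum and level0_ok, the choice of SHA-256, and the file patterns treated as source code.

```python
import hashlib
from pathlib import Path


def tree_checksum(root, patterns=("*.cxx", "*.h", "*.C")):
    """Return one hex digest over the relative paths and contents of all
    matching source files under root (the patterns are an assumed selection)."""
    root = Path(root)
    # Relative paths are sorted so the digest does not depend on the directory
    # traversal order or on where the tree is installed at a given site.
    rel_paths = sorted(
        {p.relative_to(root).as_posix()
         for pattern in patterns
         for p in root.rglob(pattern)
         if p.is_file()}
    )
    digest = hashlib.sha256()
    for rel in rel_paths:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")
        digest.update((root / rel).read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def level0_ok(reference_checksum, site_root):
    """True if the tree at site_root matches the reference (BNL) checksum."""
    return tree_checksum(site_root) == reference_checksum
```

The reference value would be computed once on the validated BNL tree and published with the tag; a remote site recomputes it locally, and any mismatch means "we will not run" until the difference is understood.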

Discussion on 2011/04/27

 Attending: Jerome (chair), Renee, Lidia

The meeting was meant to discuss the FTPC embedding, but much of it started with the policy. One off-Email discussion was to evaluate what is really the root cause of having "private" codes compiled (still) to run embedding and why (even more odd) private macros.

  • Especially, the famous Embedding PrepMaker seems to be tweaked regularly and resides in private directories ...
  • First question would be: Why do we need to re-compile this? It should not.
    • People requested things which were not in the PrepEmbedMaker like cuts on vertex, etc ...
    • Jerome argues that discussions should take place on needs prior to running embedding [we should not be reactive but rather be pro-active]. In other words, some discussion should happen before we even think (at the last moment) of which cuts we need.
      Example: we know that we may have pileup in Run 11 and Run 12. We could ask right away what does this imply for analysis …
  • If a discussion is carried prior, the Embedding interface parameter set (cuts) available could be revised frequently based on evolution of requests.
    • We should always aim to focus on a final goal of "full automation" of the embedding workflow. How realistic that is is another story, but it forces us to also think in advance of what we need (the interface is one thing, there are plenty of details and requirements)
    • Jerome suggests that perhaps we can request from the PAC the set of parameters used in analysis which would transpire & propagate into the embedding ... This bullet statement of a policy would imply a frequent polling of the PWG for their ongoing analysis (best knowledge) the same way the Software coordinators are polled for their code readiness (why should it be different - we may not be 100% ready but this systematic polling allows faster convergence and reduces the amount of patching and work to be done later).
  • Generally, we seem to agree with the idea of preparation
  • Jerome noted that Lidia has defined the tag name at this stage - the test suite and validation are nearly all in place. It seems best to identify the problems as we go before drafting a final text.
    • Right now, understanding (still) what are the components subject to late patches and understanding why remains an open task and an open issue.

Renee brought a possible issue

  • Special case: concerns the FTPC embedding
    • We have been on this for a while … NFit distribution not as it should be (gain table tweak needed?)
    • Idea: implement ad-hoc a way to get hits “right” … In contact with Janet about this … there is some private code tweaking efficiencies.
    • Jerome is super-concerned about what "tweaking" to get the "right" signal means when efficiency calculation is concerned. What does this mean?? We tweak to get what? The result we want to see? The result we believe to be correct ... because?
    • Renee promised to review the issue and come back with material to discuss...

Use cases - cases where we had to backward patch libraries

Case 1

Pi/K/P in 7.7GeV BES :  Lokesh found 3mm dcaz offset which corresponds to a 0.5 time bucket shift in StTpcRSMaker.  
Yuri fixed this and this was our very first update to SL10c.  This resulted in the SL10h_embed release at PDSF.
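
The correspondence between a 0.5 time bucket shift and a ~3 mm offset along z can be checked with a short calculation. The two numerical inputs below are assumed nominal TPC values for illustration and are not taken from this page: a drift velocity of about 5.45 cm/us and a sampling clock of about 9.4 MHz. The function name is hypothetical.

```python
# Assumed nominal values (not from this document):
DRIFT_VELOCITY_CM_PER_US = 5.45   # TPC electron drift velocity, cm per microsecond
SAMPLING_FREQ_MHZ = 9.4           # TPC readout sampling clock, MHz


def drift_offset_mm(time_buckets):
    """Distance along the drift (z) direction covered in the given number of time buckets."""
    bucket_us = 1.0 / SAMPLING_FREQ_MHZ            # duration of one time bucket, microseconds
    offset_cm = time_buckets * bucket_us * DRIFT_VELOCITY_CM_PER_US
    return offset_cm * 10.0                        # cm -> mm


half_bucket_offset = drift_offset_mm(0.5)          # just under 3 mm with the values above
```

Under these assumptions one time bucket lasts roughly 106 ns, so half a bucket corresponds to a little under 3 mm of drift, consistent with the dcaz offset Lokesh observed.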

This case would be mitigated by the presence of a proper QA suite able to catch such problems (covered by policy).

  • Test of pi+/pi- should be done before the library is officially built. 
  • Standard pi/k/p base checks move back to the core group and are documented as standard tests for library release.
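
A distribution-level check of the kind the policy mentions (a Kolmogorov test against a reference) is one way a QA suite could flag an offset like the one in this case. The sketch below is a plain two-sample Kolmogorov-Smirnov comparison written for illustration; it is not the offline QA framework's implementation, the function names are hypothetical, and the 5% significance level is an assumed choice. The threshold uses the usual large-sample approximation.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of the two samples."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    # The largest gap between two step functions occurs at one of the data points.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def distributions_differ(sample_a, sample_b, alpha=0.05):
    """True if the KS distance exceeds the large-sample critical value at level alpha."""
    n, m = len(sample_a), len(sample_b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(sample_a, sample_b) > critical
```

In use, sample_a would be a reference distribution (for example dcaz from a validated library or from data) and sample_b the same quantity from the library under test; a True result would call for an explanation before the tag is released.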

Case 2

D0/D0bar in AuAu : Reconstructed tracks in embedding had less SVT hits than in data

This took 8 months to resolve and resulted in modifications to the  StSVTSimulatorMaker. 
In order to have a meaningful embedding at all we had to use checked out code (and patch).

A complete test of a library would require a test of all detector simulators to be used for embedding in those requests.  Another consideration is to realize that SVT based embedding was simply not ready for use. In such a case, the policy allows the release of a consistent new tag for embedding libraries (the final must be consistent with everything else).

Case 3

When we were ready to set up to run the Xi  on 11W17,  the following steps were taken:

cvs co StRoot/pams/sim/gstar
->  cascade decay mode with 100 % BR
cvs co StRoot/StarClassLibrary
->  cascade geant id

Comment: Concerted efforts to get these changes into CVS for documentation and reproducibility were made.  
However, these were put in the library AFTER the SL10h that we are using now for embedding.

The policy proposes that we pro-actively gather information from the PWGs (via the PAC) about what we will be needing for embedding for a given set of analyses (tied to a production library) and make sure all needed elements are present in the embedding library BEFORE the final tag is made.

This will make the code more rigid but also more reproducible. 

  • Concern: If at the end something is missing, what do we do? Say "sorry but we have to run without that cut or that feature because it is not ready"?
  • Answer: this is what is already happening for data production and, while the procedure took time to be accepted, it is now (a) widely understood that code needs to be ready before a library is assembled, (b) well understood [and actually requested in some cases] that we have homogeneity of production as well as consolidation and (c) understood that recent features are implemented for a next wave of production or forward libraries.
    Analysis preparation should not differ.

 

Possible path forward - policy retuning

A concern was raised during the elaboration of the policy that coordination through the PAC of the PWGC (sometimes not on top of all analysis requirements) may range from hard to impractical. The possibilities on the table are listed below (initial thoughts and alternative):

  • Analysis requirements are gathered by the PAC in discussion with the PWGC
  • Each PWG has an "analysis requirement" Point of Contact - those POC would be communicating directly to the S&C structure and help define the needs
    • NB: interestingly, it often happens that a single or a handful of individuals drive new requirements when it comes to extension of the framework including IO, SUMS, etc ...

 

STAR Analysis Meeting presentation

Effort to understand how to improve further indicates avoidable issues
- Embedding libraries get off-sync with BNL, ticket resolution becomes nebulous (can’t
  reproduce, why, what’s different, e…)
- Proposal
  * All code will be strictly CVS controlled (not always the case)
  * Embedding libraries would be re-centralized at BNL (assembled, tested, QA-ed, patched, retested
    at BNL) – only a BNL GO <=> install at remote site
  * Offsite then clone the BNL library – differential tests run. If remote library does not pass the tests,
    it is not used until understood. Iterate …
  * Some tests will be non-stop testing (checksum like) – alarm will be raised if divergence appears
     (possible dashboard a-la AutoBuild)
  * Patches are still allowed but need to be propagated back to BNL for a centralized re-test and re-QA
  * See analysis and more details on “Embedding library release policy”
- Catch 22: often patches are made due to a “late” discovery of analysis cuts and needs …
  * Propose to (a) have PAC gather requirements from PWGC or (b) PWG assign a POC for analysis
    requirement
  * Believe that most likely workable path forward is (b) [similar to sub-system software + tend to be
    always identifiable individuals who answer and/or are close to the details]

    Requested feedback from PAC / PWGC – ongoing discussion (1 support
    for policy along (b)) – no answer = OK to proceed (by mid-June)

We are now mid-June ...