AgML Checksum Nightly Tests


04/17/2014

Lidia deployed a nightly checksum of the ROOT geometry as part of the nightly tests.  The resulting checksums, and the number of volumes which match the template checksums, are recorded in the AgML_checksum table of the LibraryJobs database.


04/18/2014

Checksums were evaluated after last night's autobuild.  Comparing the checksums in the database shows clear differences between the 04/17 and 04/18 values.  Jerome suggested looking at the build configuration... indeed, we see that the checksum changes depending on whether we run the tests at 64 bit vs 32 bit, and optimized vs non optimized.  So... is this a real problem, or just machine precision / roundoff error?  To investigate...
Create three geometries (y2014a) under the three different conditions -- optimized vs nonoptimized
at 32 bit, and nonoptimized at 64 bit.  Then evaluate the checksum for each of the three conditions
using each of the three geometries as input.  (9 values in total...)

tag=y2014a-dbg-32b

1) Run checksum in DEV, non optimized, 32 bits         HALL: 0521e4659c57773fdf15e554bdd5b6a9
2) Run checksum in DEV, non optimized, 64 bits         HALL: f33eaae5cd1b645726fe0924ddf2f2c7
3) Run checksum in DEV,     optimized, 32 bits         HALL: 0521e4659c57773fdf15e554bdd5b6a9

tag=y2014a-opt-32b

4) Run checksum in DEV, non optimized, 32 bits         HALL: bb586bb43e452c825aa16cb95f7684e3
5) Run checksum in DEV, non optimized, 64 bits         HALL: 6ac0a6be950c16e98cdd4f2b4e1fc7ec
6) Run checksum in DEV,     optimized, 32 bits         HALL: bb586bb43e452c825aa16cb95f7684e3

tag=y2014a-dbg-64b

7) Run checksum in DEV, non optimized, 32 bits         HALL: cc029dc9b3fbad8982195e8fe85f0c3e
8) Run checksum in DEV, non optimized, 64 bits         HALL: 79ee228c82c31f7cc562f5f1afc4fabb
9) Run checksum in DEV,     optimized, 32 bits         HALL: cc029dc9b3fbad8982195e8fe85f0c3e


What do we see?

o AgML checksum evaluation does not depend on optimization at 32 bits... we see no change when
  the same geometry is input.  Compare 1 and 3.

o AgML checksum evaluation does depend on 32 bits vs 64 bits.  For the same input file, we get
  a different checksum.  Compare 1 and 2 (or 2 and 3, since 1 and 3 agree).

o The geometry which is produced is bitwise different between optimized and non optimized code.
  Compare 1 and 4.

o The geometry which is produced is bitwise different between 32 and 64 bit.  Compare 1 and 7.
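
The four observations can be checked mechanically against the nine HALL checksums listed above.  A minimal sketch; the configuration labels ("dbg-32", "dbg-64", "opt-32") and the function names are made up for this note, only the checksum strings come from the runs:

```python
# HALL checksums copied from runs 1-9 above.
# Key: (geometry tag suffix, configuration used to evaluate the checksum).
RUNS = {
    ("dbg-32b", "dbg-32"): "0521e4659c57773fdf15e554bdd5b6a9",  # run 1
    ("dbg-32b", "dbg-64"): "f33eaae5cd1b645726fe0924ddf2f2c7",  # run 2
    ("dbg-32b", "opt-32"): "0521e4659c57773fdf15e554bdd5b6a9",  # run 3
    ("opt-32b", "dbg-32"): "bb586bb43e452c825aa16cb95f7684e3",  # run 4
    ("opt-32b", "dbg-64"): "6ac0a6be950c16e98cdd4f2b4e1fc7ec",  # run 5
    ("opt-32b", "opt-32"): "bb586bb43e452c825aa16cb95f7684e3",  # run 6
    ("dbg-64b", "dbg-32"): "cc029dc9b3fbad8982195e8fe85f0c3e",  # run 7
    ("dbg-64b", "dbg-64"): "79ee228c82c31f7cc562f5f1afc4fabb",  # run 8
    ("dbg-64b", "opt-32"): "cc029dc9b3fbad8982195e8fe85f0c3e",  # run 9
}

GEOMETRIES = ("dbg-32b", "opt-32b", "dbg-64b")


def evaluators_agree(config_a, config_b):
    """True if two evaluation configs give the same checksum for every input geometry."""
    return all(RUNS[(g, config_a)] == RUNS[(g, config_b)] for g in GEOMETRIES)


def geometries_agree(geom_a, geom_b, config):
    """True if two geometry files give the same checksum under one evaluation config."""
    return RUNS[(geom_a, config)] == RUNS[(geom_b, config)]


if __name__ == "__main__":
    print("evaluation, opt vs dbg at 32 bit:", evaluators_agree("dbg-32", "opt-32"))
    print("evaluation, 32 vs 64 bit:        ", evaluators_agree("dbg-32", "dbg-64"))
    print("geometry, dbg vs opt:            ", geometries_agree("dbg-32b", "opt-32b", "dbg-32"))
    print("geometry, 32 vs 64 bit:          ", geometries_agree("dbg-32b", "dbg-64b", "dbg-32"))
```

Keeping the evaluation config fixed while varying the input file (and vice versa) is what separates "the checksum code is sensitive" from "the geometry file is different".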


04/18/2014
Let's restrict ourselves to 32bit opt geometry vs 32bit debug geometry.  Looking at the volume by 
volume checksums we see most checksums differ.  In particular we have the magnet showing differences
in volume PAWT...

[Checksum Mismatch PAWT 2314db7fc691d8a8278b6b0fc0a8ebdb 062183dc2da4dc8c2cbc360311424782]

PAWT has no children, so it will be easy to see whether it has real differences or not.
Checksums are evaluated recursively down the volume tree.  A volume's checksum depends on
the checksums and positions of its daughters, plus the checksums of its material, medium
parameters and shape.
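
To make the recursion concrete, here is a rough sketch of a checksum with that dependency structure.  This is not the AgML code: the Volume class, the byte packing, and the choice of MD5 (guessed only from the 32 hex digit values above) are all assumptions for illustration.

```python
import hashlib
import struct
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical stand-in for a geometry volume (not a ROOT class)."""
    name: str
    material: tuple          # e.g. (A, Z, density, radlen, intlen)
    medium: tuple            # tracking medium parameters
    shape: tuple             # shape parameters, e.g. (rmin, rmax, dz)
    daughters: list = field(default_factory=list)  # (Volume, (x, y, z)) pairs


def _pack(values):
    """Pack numbers as raw little-endian doubles, so every bit counts."""
    return struct.pack(f"<{len(values)}d", *values)


def checksum(volume):
    """Hash own material, medium and shape, then each daughter's checksum and position."""
    h = hashlib.md5()
    h.update(_pack(volume.material))
    h.update(_pack(volume.medium))
    h.update(_pack(volume.shape))
    for daughter, position in volume.daughters:
        h.update(checksum(daughter).encode("ascii"))
        h.update(_pack(position))
    return h.hexdigest()
```

With this structure a one-bit change in a leaf volume changes the checksum of every ancestor up to HALL, so a leaf with no daughters is the natural place to start looking.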

Comparing geometry created under debug vs optimized...

Geometry.y2014a-dbg-32b.root                    Geometry.y2014a-opt-32b.root

PAWT 

Material parameters:

A: (const Double_t)1.43218804507728983e+01      (const Double_t)1.43218804507729001e+01
Z: (const Double_t)7.21671159149344277e+00      (const Double_t)7.21671159149344366e+00
D: (const Double_t)1.00000000000000000e+00      (const Double_t)1.00000000000000000e+00
R: (const Double_t)3.57585804664578362e+01      (const Double_t)3.57585804664578362e+01
I: (const Double_t)7.55166047475367748e+01      (const Double_t)7.55166047475367606e+01

Medium parameters:
0                                               0
1                                               1
20                                              20
20                                              20
10                                              10
0                                               0
0.01                                            0.01
...

Shape (tube) parameters:

Rmin: (const Double_t)2.66607147216796875e+02    (const Double_t)2.66607147216796875e+02
Rmax: (const Double_t)2.68107147216796875e+02    (const Double_t)2.68107147216796875e+02
Dz:   (const Double_t)7.50000000000000000e-01    (const Double_t)7.50000000000000000e-01

So this looks like an issue with materials.  In this specific case it is a mixture:

Mixture MagpGeo_Water Material MagpGeo_Water last touched in block PAWT of module MagpGeo   Aeff=14.3219 Zeff=7.21671 rho=1 radlen=35.7586 intlen=75.5166 index=25
   Element #0 : H  Z=  1.00 A=  1.01 w= 0.112 natoms=2
   Element #1 : O  Z=  8.00 A= 16.00 w= 0.888 natoms=1
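
One way to judge whether the PAWT material differences are just roundoff is to count how many representable doubles separate each pair.  A minimal sketch; ulp_distance is a helper written for this note, and it assumes finite, non-negative values (true for these parameters):

```python
import struct


def ulp_distance(a, b):
    """Count the representable doubles separating a and b.

    Assumes both values are finite and non-negative, so the IEEE-754 bit
    patterns order the same way as the values themselves.
    """
    ia = struct.unpack("<q", struct.pack("<d", a))[0]
    ib = struct.unpack("<q", struct.pack("<d", b))[0]
    return abs(ia - ib)


# PAWT material parameters copied from the comparison above: (debug, optimized).
PAWT = {
    "A": (1.43218804507728983e+01, 1.43218804507729001e+01),
    "Z": (7.21671159149344277e+00, 7.21671159149344366e+00),
    "D": (1.00000000000000000e+00, 1.00000000000000000e+00),
    "R": (3.57585804664578362e+01, 3.57585804664578362e+01),
    "I": (7.55166047475367748e+01, 7.55166047475367606e+01),
}

for name, (dbg, opt) in PAWT.items():
    print(f"{name}: {ulp_distance(dbg, opt)} ULP apart")
```

A distance of a few ULPs would point at roundoff rather than a real change in the material definition; a large distance would not.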



HOLE:

Material (same order as for PAWT: A, Z, D, R, I; ** marks the row that differs):
     14.60999999999999943157 14.60999999999999943157
     7.299999999999999822364 7.299999999999999822364
   0.00120499999999999990799 0.00120499999999999990799
     30412.60851290595383034 30412.60851290595383034
     70037.71747003440395929 70037.71747003438940737      **
Medium:
                           0
                           1
                          20
                          20
                          10
                           0
Shape:
*** Shape TGeoBBox: TGeoBBox ***
    dX =  1666.45996
    dY =    45.72000
    dZ =   899.15997
    origin: x=    0.00000 y=    0.00000 z=    0.00000

Medium and shape the same...

So... it looks like

(1) Evaluation of mixtures causes issues ...
(2) Evaluation of derived material properties (intlen) causes problems ...
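
The log does not establish the mechanism, but one plausible source is that the two builds evaluate the same sums in a different order or with different intermediate precision.  Floating point addition is not associative, so that alone can flip the last bit.  A generic illustration (the numbers are arbitrary, not taken from the geometry):

```python
import math

terms = [0.1, 0.2, 0.3]

# Same three terms, two groupings of the additions.
left_to_right = (terms[0] + terms[1]) + terms[2]
right_to_left = terms[0] + (terms[1] + terms[2])

print("groupings agree:", left_to_right == right_to_left)
print("difference:     ", abs(left_to_right - right_to_left))

# math.fsum returns the correctly rounded sum, so it does not depend on order.
print("fsum agrees:    ", math.fsum(terms) == math.fsum(reversed(terms)))
```

A weighted average over mixture components, or a derived quantity like intlen, involves exactly this kind of accumulated arithmetic.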


TCOO is also a mixture 

Material (same order: A, Z, D, R, I):
     25.78537218018238874606 25.78537218018238874606
     13.35006757009759326138 13.35006757009759326138
     2.321549999999999780442  2.321549999999999780442
     9.393737860498498903894  9.393737860498497127537
     44.18165789569228252276 44.18165789569227541733


04/21/2014
First attempt.  Apply a decimal shift and truncation algorithm to material properties, i.e. shift the decimal point 4 places past the most significant figure, then cast to an integer.  The float 0.000012345 becomes the integer 12345.
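
A minimal sketch of the shift and truncate step as described; truncate_key and its places argument are names made up here, and the actual implementation may differ in detail:

```python
import math


def truncate_key(value, places=4):
    """Shift the decimal point `places` digits past the leading digit, then truncate.

    Keeps places + 1 significant digits as an integer.  Zero maps to 0.
    """
    if value == 0.0:
        return 0
    exponent = math.floor(math.log10(abs(value)))   # position of the leading digit
    scaled = value * 10.0 ** (places - exponent)
    return int(scaled)                               # int() truncates toward zero


# PAWT values for A from the debug and optimized geometries now map to the same key.
print(truncate_key(1.43218804507728983e+01), truncate_key(1.43218804507729001e+01))

# Caveat: truncation has boundaries.  Two values that straddle one still differ.
print(truncate_key(1.0), truncate_key(0.999999999))
```

The second pair is the weak point of any truncation scheme: values that differ only by roundoff can still land on opposite sides of a digit boundary, however few digits are kept.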

This results in a new checksum for y2014a, which we add to the StarDb/AgMLChecksum DB...

[Checksum Validation y2014a] HALL 0521e4659c57773fdf15e554bdd5b6a9 0521e4659c57773fdf15e554bdd5b6a9
[Checksum Validation y2014a] HALL Geometry Checksums Agree
[Checksum Validation y2014a] Total number of volumes:     4872
[Checksum Validation y2014a] Number of same volumes:      4872
[Checksum Validation y2014a] Number of different volumes: 0

Running the code under optimized library...

... seems to have failed... far worse than I would have thought...
[Checksum Validation y2014a] HALL bff661296520a246e71cb84bfb2aa332 0521e4659c57773fdf15e554bdd5b6a9
[Checksum Validation y2014a] HALL Geometry Checksums Mismatch
[Checksum Mismatch HOLE 8fbd050895a1346fcf0acc0b300bbd25 daa7de5141cbb27b0c87ff12b1758670]
...
[Checksum Mismatch PVAG 1a5dfa3e8335be8b1b26cba917acfef1 843e645af96c85f6389496437ec22fce]
[Checksum Validation y2014a] Total number of volumes:     4872
[Checksum Validation y2014a] Number of same volumes:      1
[Checksum Validation y2014a] Number of different volumes: 4871
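
For reference, the volume-by-volume bookkeeping behind a report like this can be sketched as below.  The function name and the dict inputs are assumptions; the column order (current checksum first, reference second) is a guess from the HALL line, and the line formats are copied from the log.

```python
def compare_checksums(tag, current, reference):
    """Compare per-volume checksums and build report lines in the style of the log.

    current and reference map volume name -> checksum string.
    A volume missing from current is counted as different.
    Returns (number same, number different, list of report lines).
    """
    lines = []
    same = 0
    for name in sorted(reference):
        ours = current.get(name, "missing")
        if ours == reference[name]:
            same += 1
        else:
            lines.append(f"[Checksum Mismatch {name} {ours} {reference[name]}]")
    total = len(reference)
    lines.append(f"[Checksum Validation {tag}] Total number of volumes:     {total}")
    lines.append(f"[Checksum Validation {tag}] Number of same volumes:      {same}")
    lines.append(f"[Checksum Validation {tag}] Number of different volumes: {total - same}")
    return same, total - same, lines


if __name__ == "__main__":
    # HOLE and PVAG values are from the log; "SAME" is a made-up matching entry.
    current = {"HOLE": "8fbd050895a1346fcf0acc0b300bbd25",
               "PVAG": "1a5dfa3e8335be8b1b26cba917acfef1",
               "SAME": "00000000000000000000000000000000"}
    reference = {"HOLE": "daa7de5141cbb27b0c87ff12b1758670",
                 "PVAG": "843e645af96c85f6389496437ec22fce",
                 "SAME": "00000000000000000000000000000000"}
    for line in compare_checksums("y2014a", current, reference)[2]:
        print(line)
```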

Try making checksum completely insensitive to materials...
Then I get only 1656 volumes showing differences.

Looking deeper... it looks like even shapes are a factor:
e.g., in ECAL ESCI --

((TGeoTrd1 *)esci->GetShape())->GetDx2()

(const Double_t)4.31181628415218388e+00     (debug)
(const Double_t)4.31181588431256291e+00     (optimized)

So... looks like all parameters need to be degraded slightly... 5 digits past decimal.  Maybe down to 4 digits past decimal.

With all parameters @ 5 sig figures, down to 376 volumes difference.  Mostly ESMD strips, but a few others.  (Note -- material @ 3 sig figs.)
Adjust so that material params @ 5 sig figs as well... still 376 volumes.  So let's find out what's up...

DEBUG--
matrix pos_THX1_in_BBC1_1 - tr=1  rot=0  refl=0  scl=0
  1.000000    0.000000    0.000000    Tx =   9.640015
  0.000000    1.000000    0.000000    Ty =  50.091003
  0.000000    0.000000    1.000000    Tz =   0.000000
Info in <TGeoNodeMatrix::InspectNode>: Mother volume BBC1
OPTIMIZED--
matrix pos_THX1_in_BBC1_1 - tr=1  rot=0  refl=0  scl=0
  1.000000    0.000000    0.000000    Tx =   9.640015
  0.000000    1.000000    0.000000    Ty =  50.091000
  0.000000    0.000000    1.000000    Tz =   0.000000
Info in <TGeoNodeMatrix::InspectNode>: Mother volume BBC1

So... we *can* have slight differences in volume positions... in this case, the triple hex module w/in the inner BBC annulus.  It's off by a small amount in Y...
... but we're already shifting/truncating to take care of that.




At this point, we're pretty much going to need to degrade sensitivity to all parameters, not just materials.  So I would rather just restrict the checksum test to 
a single compilation configuration (e.g. 32bits debug) OR define a separate test for each config.
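
The second option (a separate test per config) amounts to keying the reference checksum on both the geometry tag and the build configuration.  A sketch only: the table layout, the config labels and the validate function are hypothetical, and the values are the HALL checksums from runs 1, 6 and 8 of the 04/18 entry, where geometry and evaluation used the same configuration (run 6 pairs an optimized geometry with optimized evaluation at 32 bits).

```python
# Hypothetical reference table: (geometry tag, build config) -> HALL checksum.
REFERENCE = {
    ("y2014a", "dbg-32b"): "0521e4659c57773fdf15e554bdd5b6a9",  # run 1
    ("y2014a", "opt-32b"): "bb586bb43e452c825aa16cb95f7684e3",  # run 6
    ("y2014a", "dbg-64b"): "79ee228c82c31f7cc562f5f1afc4fabb",  # run 8
}


def validate(tag, config, checksum):
    """Compare a checksum only against the reference for the same build config."""
    expected = REFERENCE.get((tag, config))
    if expected is None:
        return "no reference"
    return "agree" if checksum == expected else "mismatch"


if __name__ == "__main__":
    # The same geometry tag validates per config instead of against one global value.
    print(validate("y2014a", "opt-32b", "bb586bb43e452c825aa16cb95f7684e3"))
    print(validate("y2014a", "dbg-32b", "bb586bb43e452c825aa16cb95f7684e3"))
```

This keeps the checksum bitwise exact within a config, at the cost of maintaining one reference per config for every geometry tag.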