In search of 64-bit slowness

 Following recent tests by Dmitri and me that showed no library dependence in the 64-bit slowness, I decided to have a closer look at where 64-bit was running slowly. Using simple greps of the log files, I found three notable places in the code:
  1. StMuDstMaker
  2. Sti
  3. StTpcHitMaker
This is demonstrated in the following plots of the ratio of CPU time reported at the end of the BFC chain in different makers and sub-chains in optimized 64-bit divided by 32-bit for different nightly tests. Colors indicate the absolute amount of time spent in that maker/sub-chain, from very little (purple) to a lot (red). So the most important ones are those with a ratio above 1.0 who aren't purple, consistently across datasets. The multiple data points at each maker/sub-chain are different days on which the nightly tests ran, giving some sense of the variability/uncertainty.

notes:
1) Sub-chains appear twice...first as a link in the big full chain, and then again at the beginning of its own sub-links.
2) The value "1" on the x-axis was for the whole big full chain, but I excluded it so that it didn't dominate the color scheme in my plots.

Dataset Ratios Legend
2014AuAu200
     2	StRtsReaderMaker::rts_reader
     3	StAnalysisMaker::analysis
     4	StTreeMaker::outputStream
     5	StEventQAMaker::EventQA
     6	StMuDstMaker::MuDst
     7	StStrangeMuDstMaker::strangeMuDst
     8	StTagsMaker::tags
     9	StHighPtTagsMaker::HighPtTags
    10	StHeavyTagMaker::HeavyTag
    11	StEventCompendiumMaker::StEventCompendiumMaker
    12	StMtdCalibMaker::mtdCalib
    13	StMtdMatchMaker::MtdMatch
    14	StBTofCalibMaker::btofCalib
    15	StBTofMatchMaker::btofMatch
    16	StdEdxY2Maker::dEdxY2
    17	StiMaker::Sti
    18	StGenericVertexMaker::GenericVertex
    19	Stl3RawReaderMaker::l3RawReader
    20	StMaker::BTofChain
    21	StEpcMaker::epc
    22	StPreEclMaker::preecl
    23	StEmcRawMaker::emcRaw
    24	StZdcVertexMaker::StZdcVertexMaker
    25	StIstHitMaker::ist_hit
    26	StIstClusterMaker::ist_cluster
    27	StIstRawHitMaker::ist_raw_hit
    28	StPxlHitMaker::pxl_hit
    29	StPxlClusterMaker::pxl_cluster
    30	StPxlRawHitMaker::pxl_raw_hit
    31	StMaker::tpcChain
    32	StMaker::MtdChain
    33	StEventMaker::0Event
    34	StTriggerDataMaker::trgd
    35	StIstDbMaker::istDb
    36	StPxlDbMaker::pxlDb
    37	StEEmcDbMaker::eeDb
    38	StSstDbMaker::sstDb
    39	StTpcDbMaker::tpcDB
    40	StDetectorDbMaker::detDb
    41	StMagFMaker::MagField
    42	St_db_Maker::db
    43	StIOMaker::inputStream
    44	StMaker::BTofChain
    45	StVpdCalibMaker::vpdCalib
    46	StBTofHitMaker::tof_raw
    47	StMaker::tpcChain
    48	StTpcHitMover::tpc_hit_mover
    49	StTpcHitMaker::tpc_hits
    50	StMaker::MtdChain
    51	StMtdHitMaker::mtd_raw
    52	StIOMaker::inputStream
    53	StDAQMaker::inputStream_DAQ
2015pAu200
     2	StRtsReaderMaker::rts_reader
     3	StAnalysisMaker::analysis
     4	StTreeMaker::outputStream
     5	StEventQAMaker::EventQA
     6	StMuDstMaker::MuDst
     7	StStrangeMuDstMaker::strangeMuDst
     8	StTagsMaker::tags
     9	StHighPtTagsMaker::HighPtTags
    10	StHeavyTagMaker::HeavyTag
    11	StEventCompendiumMaker::StEventCompendiumMaker
    12	StMtdMatchMaker::MtdMatch
    13	StBTofCalibMaker::btofCalib
    14	StBTofMatchMaker::btofMatch
    15	StdEdxY2Maker::dEdxY2
    16	StiMaker::Sti
    17	StGenericVertexMaker::GenericVertex
    18	StFmsFpsMaker::FmsFps
    19	StFmsPointMaker::StFmsPointMaker
    20	StFpsRawHitMaker::FpsHit
    21	StFmsHitMaker::StFmsHitMaker
    22	Stl3RawReaderMaker::l3RawReader
    23	StMaker::BTofChain
    24	StEpcMaker::epc
    25	StPreEclMaker::preecl
    26	StEmcRawMaker::emcRaw
    27	StZdcVertexMaker::StZdcVertexMaker
    28	StSstPointMaker::SstPoint
    29	StSstDaqMaker::sst_raw
    30	StIstHitMaker::ist_hit
    31	StIstClusterMaker::ist_cluster
    32	StIstRawHitMaker::ist_raw_hit
    33	StPxlHitMaker::pxl_hit
    34	StPxlClusterMaker::pxl_cluster
    35	StPxlRawHitMaker::pxl_raw_hit
    36	StMaker::tpcChain
    37	StMaker::MtdChain
    38	St_pp2pp_Maker::PP2PP
    39	StEventMaker::0Event
    40	StTriggerDataMaker::trgd
    41	StIstDbMaker::istDb
    42	StPxlDbMaker::pxlDb
    43	StFmsDbMaker::fmsDb
    44	StEEmcDbMaker::eeDb
    45	StSstDbMaker::sstDb
    46	StTpcDbMaker::tpcDB
    47	StDetectorDbMaker::detDb
    48	StMagFMaker::MagField
    49	St_db_Maker::db
    50	StIOMaker::inputStream
    51	StMaker::BTofChain
    52	StVpdCalibMaker::vpdCalib
    53	StBTofHitMaker::tof_raw
    54	StMaker::tpcChain
    55	StTpcHitMover::tpc_hit_mover
    56	StTpcHitMaker::tpc_hits
    57	StMaker::MtdChain
    58	StMtdHitMaker::mtd_raw
    59	StIOMaker::inputStream
    60	StDAQMaker::inputStream_DAQ
2016AuAu200
     2	StRtsReaderMaker::rts_reader
     3	StAnalysisMaker::analysis
     4	StTreeMaker::outputStream
     5	StEventQAMaker::EventQA
     6	StPicoDstMaker::PicoDst
     7	StMuDstMaker::MuDst
     8	StStrangeMuDstMaker::strangeMuDst
     9	StTagsMaker::tags
    10	StHighPtTagsMaker::HighPtTags
    11	StHeavyTagMaker::HeavyTag
    12	StEventCompendiumMaker::StEventCompendiumMaker
    13	StMtdMatchMaker::MtdMatch
    14	StBTofCalibMaker::btofCalib
    15	StBTofMatchMaker::btofMatch
    16	StdEdxY2Maker::dEdxY2
    17	StiMaker::Sti
    18	StGenericVertexMaker::GenericVertex
    19	Stl3RawReaderMaker::l3RawReader
    20	StMaker::BTofChain
    21	StEpcMaker::epc
    22	StPreEclMaker::preecl
    23	StEmcRawMaker::emcRaw
    24	StZdcVertexMaker::StZdcVertexMaker
    25	StSstPointMaker::SstPoint
    26	StSstDaqMaker::sst_raw
    27	StIstHitMaker::ist_hit
    28	StIstClusterMaker::ist_cluster
    29	StIstRawHitMaker::ist_raw_hit
    30	StPxlHitMaker::pxl_hit
    31	StPxlClusterMaker::pxl_cluster
    32	StPxlRawHitMaker::pxl_raw_hit
    33	StMaker::tpcChain
    34	StMaker::MtdChain
    35	StEventMaker::0Event
    36	StTriggerDataMaker::trgd
    37	StIstDbMaker::istDb
    38	StPxlDbMaker::pxlDb
    39	StEEmcDbMaker::eeDb
    40	StSstDbMaker::sstDb
    41	StTpcDbMaker::tpcDB
    42	StDetectorDbMaker::detDb
    43	StMagFMaker::MagField
    44	St_db_Maker::db
    45	StIOMaker::inputStream
    46	StMaker::BTofChain
    47	StVpdCalibMaker::vpdCalib
    48	StBTofHitMaker::tof_raw
    49	StMaker::tpcChain
    50	StTpcHitMover::tpc_hit_mover
    51	StTpcHitMaker::tpc_hits
    52	StMaker::MtdChain
    53	StMtdHitMaker::mtd_raw
    54	StIOMaker::inputStream
    55	StDAQMaker::inputStream_DAQ
2017AuAu54
     2	StRtsReaderMaker::rts_reader
     3	StAnalysisMaker::analysis
     4	StTreeMaker::outputStream
     5	StEventQAMaker::EventQA
     6	StMuDstMaker::MuDst
     7	StStrangeMuDstMaker::strangeMuDst
     8	StTagsMaker::tags
     9	StHighPtTagsMaker::HighPtTags
    10	StHeavyTagMaker::HeavyTag
    11	StEventCompendiumMaker::StEventCompendiumMaker
    12	StMtdMatchMaker::MtdMatch
    13	StBTofCalibMaker::btofCalib
    14	StBTofMatchMaker::btofMatch
    15	StdEdxY2Maker::dEdxY2
    16	StiMaker::Sti
    17	StGenericVertexMaker::GenericVertex
    18	StFmsPointMaker::StFmsPointMaker
    19	StFpsRawHitMaker::FpsHit
    20	StFmsHitMaker::StFmsHitMaker
    21	Stl3RawReaderMaker::l3RawReader
    22	StMaker::BTofChain
    23	StEpcMaker::epc
    24	StPreEclMaker::preecl
    25	StEmcRawMaker::emcRaw
    26	StZdcVertexMaker::StZdcVertexMaker
    27	StMaker::tpcChain
    28	StMaker::MtdChain
    29	StEventMaker::0Event
    30	StTriggerDataMaker::trgd
    31	StFmsDbMaker::fmsDb
    32	StEEmcDbMaker::eeDb
    33	StTpcDbMaker::tpcDB
    34	StDetectorDbMaker::detDb
    35	StMagFMaker::MagField
    36	St_db_Maker::db
    37	StIOMaker::inputStream
    38	StMaker::BTofChain
    39	StVpdCalibMaker::vpdCalib
    40	StBTofHitMaker::tof_raw
    41	StMaker::tpcChain
    42	StTpcHitMover::tpc_hit_mover
    43	StTpcHitMaker::tpc_hits
    44	StMaker::MtdChain
    45	StMtdHitMaker::mtd_raw
    46	StIOMaker::inputStream
    47	StDAQMaker::inputStream_DAQ
2018AuAu27
     2	StRtsReaderMaker::rts_reader
     3	StAnalysisMaker::analysis
     4	StTreeMaker::outputStream
     5	StEventQAMaker::EventQA
     6	StPicoDstMaker::PicoDst
     7	StMuDstMaker::MuDst
     8	StStrangeMuDstMaker::strangeMuDst
     9	StTagsMaker::tags
    10	StHighPtTagsMaker::HighPtTags
    11	StHeavyTagMaker::HeavyTag
    12	StEventCompendiumMaker::StEventCompendiumMaker
    13	StMtdMatchMaker::MtdMatch
    14	StBTofCalibMaker::btofCalib
    15	StBTofMatchMaker::btofMatch
    16	StdEdxY2Maker::dEdxY2
    17	StiMaker::Sti
    18	StGenericVertexMaker::GenericVertex
    19	Stl3RawReaderMaker::l3RawReader
    20	StMaker::BTofChain
    21	StEpcMaker::epc
    22	StPreEclMaker::preecl
    23	StEmcRawMaker::emcRaw
    24	StZdcVertexMaker::StZdcVertexMaker
    25	StMaker::tpcChain
    26	StEpdHitMaker::epdHit
    27	StMaker::MtdChain
    28	StEventMaker::0Event
    29	StTriggerDataMaker::trgd
    30	StEpdDbMaker::epdDb
    31	StEEmcDbMaker::eeDb
    32	StTpcDbMaker::tpcDB
    33	StDetectorDbMaker::detDb
    34	StMagFMaker::MagField
    35	St_db_Maker::db
    36	StIOMaker::inputStream
    37	StMaker::BTofChain
    38	StVpdCalibMaker::vpdCalib
    39	StBTofHitMaker::tof_raw
    40	StMaker::tpcChain
    41	StTpcHitMover::tpc_hit_mover
    42	StTpcHitMaker::tpc_hits
    43	StMaker::MtdChain
    44	StMtdHitMaker::mtd_raw
    45	StIOMaker::inputStream
    46	StDAQMaker::inputStream_DAQ


Thinking that StTpcHitMaker may be the simplest (and best known to me) of those three key makers, I used TStopwatch to time various portions of the code. I found in particular that two places were using a lot of CPU:
  1. GetNextDaqElement() calls
  2. GetInputDS("StEvent") calls
It turns out the GetInputDS() calls were excessive in number: it only needed to be called once per event, but I was seeing over 1000 calls per event. I discussed a solution for this with the TPC group and then committed the fix.

The GetNextDaqElement() calls go through StMaker::FindDataSet(), which I then also populated with TStopwatch to determine slow points in 64-bit. I found two:
  1. A call to TString::Contains()
  2. A call to TDataSet::Find()
The call to TString::Contains() uses only a single character in the search and can be easily replaced with a strchr() call that is much faster. I suggest to do this [done]. However, both of these calls feature text string (character array) comparisons. There are reasons to expect this not to be faster in 64-bit because bytes are compared at offsets from the 32-bit or 64-bit word boundaries, which likely requires additional overhead.

But I wondered which function exactly is our biggest problem, so I copied the TString::Contains() [really TString::Index()] function code to a local function. Not slower in 64-bit!!! So I then copied all of TString locally, renamed it, and used the local class in this one place in StMaker::FindDataSet(). Again, not slower in 64-bit!!!

The implication, then is that the compiled TString code from ROOT provided on RACF is slower than a local compile of the same code.

I have now implemented a very simple test, not using any STAR-specific software, just ROOT as provided on RACF and its TString::Contains() function, to demonstrate that 64-bit optimization isn't as it should be.

Here are the results of several tests:

SL17g_test on SL6 in a Singularity container on RACF:
unoptimized	optimized
32-bit	2.79	0.95
64-bit	2.91	2.85
SL17g_test on SL7 on RACF:
unoptimized	optimized
32-bit	2.81	0.58
64-bit	2.72	2.70
DEV on SL7 on RACF:
unoptimized	optimized
32-bit	2.94	0.58
64-bit	2.80	2.80
DEV in CVMFS on RACF:
unoptimized	optimized
32-bit	2.81	0.59
64-bit	2.80	0.55
These demonstrate that only the ROOT compiled under CVMFS has a fast 64-bit TString::Contains().

-Gene

__________

Important observation brought up in discussions today: the 64-bit optimized version of root actually wasn't available on AFS, as noted in the comment below from Jerome! When setting the environment to use 64-bit optimized on RACF, the unoptimized 64-bit root was being run, but this wasn't obvious to a user. So optimized STAR-specific software was being run in combination with unoptimized ROOT software.