In search of 64-bit slowness
Following recent tests by Dmitri and me that showed no library dependence in the 64-bit slowness, I decided to have a closer look at where 64-bit was running slowly. Using simple greps of the log files, I found three notable places in the code:
notes:
1) Sub-chains appear twice...first as a link in the big full chain, and then again at the beginning of its own sub-links.
2) The value "1" on the x-axis was for the whole big full chain, but I excluded it so that it didn't dominate the color scheme in my plots.
Thinking that StTpcHitMaker may be the simplest (and best known to me) of those three key makers, I used TStopwatch to time various portions of the code. I found in particular that two places were using a lot of CPU:
The GetNextDaqElement() calls go through StMaker::FindDataSet(), which I then also populated with TStopwatch to determine slow points in 64-bit. I found two:
But I wondered which function exactly is our biggest problem, so I copied the TString::Contains() [really TString::Index()] function code to a local function. Not slower in 64-bit!!! So I then copied all of TString locally, renamed it, and used the local class in this one place in StMaker::FindDataSet(). Again, not slower in 64-bit!!!
The implication, then is that the compiled TString code from ROOT provided on RACF is slower than a local compile of the same code.
I have now implemented a very simple test, not using any STAR-specific software, just ROOT as provided on RACF and its TString::Contains() function, to demonstrate that 64-bit optimization isn't as it should be.
Here are the results of several tests:
SL17g_test on SL6 in a Singularity container on RACF:
-Gene
__________
Important observation brought up in discussions today: the 64-bit optimized version of root actually wasn't available on AFS, as noted in the comment below from Jerome! When setting the environment to use 64-bit optimized on RACF, the unoptimized 64-bit root was being run, but this wasn't obvious to a user. So optimized STAR-specific software was being run in combination with unoptimized ROOT software.
- StMuDstMaker
- Sti
- StTpcHitMaker
notes:
1) Sub-chains appear twice...first as a link in the big full chain, and then again at the beginning of its own sub-links.
2) The value "1" on the x-axis was for the whole big full chain, but I excluded it so that it didn't dominate the color scheme in my plots.
Dataset | Ratios | Legend |
---|---|---|
2014AuAu200 |
2 StRtsReaderMaker::rts_reader 3 StAnalysisMaker::analysis 4 StTreeMaker::outputStream 5 StEventQAMaker::EventQA 6 StMuDstMaker::MuDst 7 StStrangeMuDstMaker::strangeMuDst 8 StTagsMaker::tags 9 StHighPtTagsMaker::HighPtTags 10 StHeavyTagMaker::HeavyTag 11 StEventCompendiumMaker::StEventCompendiumMaker 12 StMtdCalibMaker::mtdCalib 13 StMtdMatchMaker::MtdMatch 14 StBTofCalibMaker::btofCalib 15 StBTofMatchMaker::btofMatch 16 StdEdxY2Maker::dEdxY2 17 StiMaker::Sti 18 StGenericVertexMaker::GenericVertex 19 Stl3RawReaderMaker::l3RawReader 20 StMaker::BTofChain 21 StEpcMaker::epc 22 StPreEclMaker::preecl 23 StEmcRawMaker::emcRaw 24 StZdcVertexMaker::StZdcVertexMaker 25 StIstHitMaker::ist_hit 26 StIstClusterMaker::ist_cluster 27 StIstRawHitMaker::ist_raw_hit 28 StPxlHitMaker::pxl_hit 29 StPxlClusterMaker::pxl_cluster 30 StPxlRawHitMaker::pxl_raw_hit 31 StMaker::tpcChain 32 StMaker::MtdChain 33 StEventMaker::0Event 34 StTriggerDataMaker::trgd 35 StIstDbMaker::istDb 36 StPxlDbMaker::pxlDb 37 StEEmcDbMaker::eeDb 38 StSstDbMaker::sstDb 39 StTpcDbMaker::tpcDB 40 StDetectorDbMaker::detDb 41 StMagFMaker::MagField 42 St_db_Maker::db 43 StIOMaker::inputStream 44 StMaker::BTofChain 45 StVpdCalibMaker::vpdCalib 46 StBTofHitMaker::tof_raw 47 StMaker::tpcChain 48 StTpcHitMover::tpc_hit_mover 49 StTpcHitMaker::tpc_hits 50 StMaker::MtdChain 51 StMtdHitMaker::mtd_raw 52 StIOMaker::inputStream 53 StDAQMaker::inputStream_DAQ |
|
2015pAu200 |
2 StRtsReaderMaker::rts_reader 3 StAnalysisMaker::analysis 4 StTreeMaker::outputStream 5 StEventQAMaker::EventQA 6 StMuDstMaker::MuDst 7 StStrangeMuDstMaker::strangeMuDst 8 StTagsMaker::tags 9 StHighPtTagsMaker::HighPtTags 10 StHeavyTagMaker::HeavyTag 11 StEventCompendiumMaker::StEventCompendiumMaker 12 StMtdMatchMaker::MtdMatch 13 StBTofCalibMaker::btofCalib 14 StBTofMatchMaker::btofMatch 15 StdEdxY2Maker::dEdxY2 16 StiMaker::Sti 17 StGenericVertexMaker::GenericVertex 18 StFmsFpsMaker::FmsFps 19 StFmsPointMaker::StFmsPointMaker 20 StFpsRawHitMaker::FpsHit 21 StFmsHitMaker::StFmsHitMaker 22 Stl3RawReaderMaker::l3RawReader 23 StMaker::BTofChain 24 StEpcMaker::epc 25 StPreEclMaker::preecl 26 StEmcRawMaker::emcRaw 27 StZdcVertexMaker::StZdcVertexMaker 28 StSstPointMaker::SstPoint 29 StSstDaqMaker::sst_raw 30 StIstHitMaker::ist_hit 31 StIstClusterMaker::ist_cluster 32 StIstRawHitMaker::ist_raw_hit 33 StPxlHitMaker::pxl_hit 34 StPxlClusterMaker::pxl_cluster 35 StPxlRawHitMaker::pxl_raw_hit 36 StMaker::tpcChain 37 StMaker::MtdChain 38 St_pp2pp_Maker::PP2PP 39 StEventMaker::0Event 40 StTriggerDataMaker::trgd 41 StIstDbMaker::istDb 42 StPxlDbMaker::pxlDb 43 StFmsDbMaker::fmsDb 44 StEEmcDbMaker::eeDb 45 StSstDbMaker::sstDb 46 StTpcDbMaker::tpcDB 47 StDetectorDbMaker::detDb 48 StMagFMaker::MagField 49 St_db_Maker::db 50 StIOMaker::inputStream 51 StMaker::BTofChain 52 StVpdCalibMaker::vpdCalib 53 StBTofHitMaker::tof_raw 54 StMaker::tpcChain 55 StTpcHitMover::tpc_hit_mover 56 StTpcHitMaker::tpc_hits 57 StMaker::MtdChain 58 StMtdHitMaker::mtd_raw 59 StIOMaker::inputStream 60 StDAQMaker::inputStream_DAQ |
|
2016AuAu200 |
2 StRtsReaderMaker::rts_reader 3 StAnalysisMaker::analysis 4 StTreeMaker::outputStream 5 StEventQAMaker::EventQA 6 StPicoDstMaker::PicoDst 7 StMuDstMaker::MuDst 8 StStrangeMuDstMaker::strangeMuDst 9 StTagsMaker::tags 10 StHighPtTagsMaker::HighPtTags 11 StHeavyTagMaker::HeavyTag 12 StEventCompendiumMaker::StEventCompendiumMaker 13 StMtdMatchMaker::MtdMatch 14 StBTofCalibMaker::btofCalib 15 StBTofMatchMaker::btofMatch 16 StdEdxY2Maker::dEdxY2 17 StiMaker::Sti 18 StGenericVertexMaker::GenericVertex 19 Stl3RawReaderMaker::l3RawReader 20 StMaker::BTofChain 21 StEpcMaker::epc 22 StPreEclMaker::preecl 23 StEmcRawMaker::emcRaw 24 StZdcVertexMaker::StZdcVertexMaker 25 StSstPointMaker::SstPoint 26 StSstDaqMaker::sst_raw 27 StIstHitMaker::ist_hit 28 StIstClusterMaker::ist_cluster 29 StIstRawHitMaker::ist_raw_hit 30 StPxlHitMaker::pxl_hit 31 StPxlClusterMaker::pxl_cluster 32 StPxlRawHitMaker::pxl_raw_hit 33 StMaker::tpcChain 34 StMaker::MtdChain 35 StEventMaker::0Event 36 StTriggerDataMaker::trgd 37 StIstDbMaker::istDb 38 StPxlDbMaker::pxlDb 39 StEEmcDbMaker::eeDb 40 StSstDbMaker::sstDb 41 StTpcDbMaker::tpcDB 42 StDetectorDbMaker::detDb 43 StMagFMaker::MagField 44 St_db_Maker::db 45 StIOMaker::inputStream 46 StMaker::BTofChain 47 StVpdCalibMaker::vpdCalib 48 StBTofHitMaker::tof_raw 49 StMaker::tpcChain 50 StTpcHitMover::tpc_hit_mover 51 StTpcHitMaker::tpc_hits 52 StMaker::MtdChain 53 StMtdHitMaker::mtd_raw 54 StIOMaker::inputStream 55 StDAQMaker::inputStream_DAQ |
|
2017AuAu54 |
2 StRtsReaderMaker::rts_reader 3 StAnalysisMaker::analysis 4 StTreeMaker::outputStream 5 StEventQAMaker::EventQA 6 StMuDstMaker::MuDst 7 StStrangeMuDstMaker::strangeMuDst 8 StTagsMaker::tags 9 StHighPtTagsMaker::HighPtTags 10 StHeavyTagMaker::HeavyTag 11 StEventCompendiumMaker::StEventCompendiumMaker 12 StMtdMatchMaker::MtdMatch 13 StBTofCalibMaker::btofCalib 14 StBTofMatchMaker::btofMatch 15 StdEdxY2Maker::dEdxY2 16 StiMaker::Sti 17 StGenericVertexMaker::GenericVertex 18 StFmsPointMaker::StFmsPointMaker 19 StFpsRawHitMaker::FpsHit 20 StFmsHitMaker::StFmsHitMaker 21 Stl3RawReaderMaker::l3RawReader 22 StMaker::BTofChain 23 StEpcMaker::epc 24 StPreEclMaker::preecl 25 StEmcRawMaker::emcRaw 26 StZdcVertexMaker::StZdcVertexMaker 27 StMaker::tpcChain 28 StMaker::MtdChain 29 StEventMaker::0Event 30 StTriggerDataMaker::trgd 31 StFmsDbMaker::fmsDb 32 StEEmcDbMaker::eeDb 33 StTpcDbMaker::tpcDB 34 StDetectorDbMaker::detDb 35 StMagFMaker::MagField 36 St_db_Maker::db 37 StIOMaker::inputStream 38 StMaker::BTofChain 39 StVpdCalibMaker::vpdCalib 40 StBTofHitMaker::tof_raw 41 StMaker::tpcChain 42 StTpcHitMover::tpc_hit_mover 43 StTpcHitMaker::tpc_hits 44 StMaker::MtdChain 45 StMtdHitMaker::mtd_raw 46 StIOMaker::inputStream 47 StDAQMaker::inputStream_DAQ |
|
2018AuAu27 |
2 StRtsReaderMaker::rts_reader 3 StAnalysisMaker::analysis 4 StTreeMaker::outputStream 5 StEventQAMaker::EventQA 6 StPicoDstMaker::PicoDst 7 StMuDstMaker::MuDst 8 StStrangeMuDstMaker::strangeMuDst 9 StTagsMaker::tags 10 StHighPtTagsMaker::HighPtTags 11 StHeavyTagMaker::HeavyTag 12 StEventCompendiumMaker::StEventCompendiumMaker 13 StMtdMatchMaker::MtdMatch 14 StBTofCalibMaker::btofCalib 15 StBTofMatchMaker::btofMatch 16 StdEdxY2Maker::dEdxY2 17 StiMaker::Sti 18 StGenericVertexMaker::GenericVertex 19 Stl3RawReaderMaker::l3RawReader 20 StMaker::BTofChain 21 StEpcMaker::epc 22 StPreEclMaker::preecl 23 StEmcRawMaker::emcRaw 24 StZdcVertexMaker::StZdcVertexMaker 25 StMaker::tpcChain 26 StEpdHitMaker::epdHit 27 StMaker::MtdChain 28 StEventMaker::0Event 29 StTriggerDataMaker::trgd 30 StEpdDbMaker::epdDb 31 StEEmcDbMaker::eeDb 32 StTpcDbMaker::tpcDB 33 StDetectorDbMaker::detDb 34 StMagFMaker::MagField 35 St_db_Maker::db 36 StIOMaker::inputStream 37 StMaker::BTofChain 38 StVpdCalibMaker::vpdCalib 39 StBTofHitMaker::tof_raw 40 StMaker::tpcChain 41 StTpcHitMover::tpc_hit_mover 42 StTpcHitMaker::tpc_hits 43 StMaker::MtdChain 44 StMtdHitMaker::mtd_raw 45 StIOMaker::inputStream 46 StDAQMaker::inputStream_DAQ |
Thinking that StTpcHitMaker may be the simplest (and best known to me) of those three key makers, I used TStopwatch to time various portions of the code. I found in particular that two places were using a lot of CPU:
- GetNextDaqElement() calls
- GetInputDS("StEvent") calls
The GetNextDaqElement() calls go through StMaker::FindDataSet(), which I then also populated with TStopwatch to determine slow points in 64-bit. I found two:
- A call to TString::Contains()
- A call to TDataSet::Find()
But I wondered which function exactly is our biggest problem, so I copied the TString::Contains() [really TString::Index()] function code to a local function. Not slower in 64-bit!!! So I then copied all of TString locally, renamed it, and used the local class in this one place in StMaker::FindDataSet(). Again, not slower in 64-bit!!!
The implication, then is that the compiled TString code from ROOT provided on RACF is slower than a local compile of the same code.
I have now implemented a very simple test, not using any STAR-specific software, just ROOT as provided on RACF and its TString::Contains() function, to demonstrate that 64-bit optimization isn't as it should be.
Here are the results of several tests:
SL17g_test on SL6 in a Singularity container on RACF:
unoptimized optimized 32-bit 2.79 0.95 64-bit 2.91 2.85SL17g_test on SL7 on RACF:
unoptimized optimized 32-bit 2.81 0.58 64-bit 2.72 2.70DEV on SL7 on RACF:
unoptimized optimized 32-bit 2.94 0.58 64-bit 2.80 2.80DEV in CVMFS on RACF:
unoptimized optimized 32-bit 2.81 0.59 64-bit 2.80 0.55These demonstrate that only the ROOT compiled under CVMFS has a fast 64-bit TString::Contains().
-Gene
__________
Important observation brought up in discussions today: the 64-bit optimized version of root actually wasn't available on AFS, as noted in the comment below from Jerome! When setting the environment to use 64-bit optimized on RACF, the unoptimized 64-bit root was being run, but this wasn't obvious to a user. So optimized STAR-specific software was being run in combination with unoptimized ROOT software.
Groups:
- genevb's blog
- Login or register to post comments