BSMD Status Monitoring Documentation

Update Feb 2016 -- Kolja:
These steps will serve as documentation for starting BSMD Status Monitoring each year. This is code that creates summaries and pedestal PDF files on this website.

1. Log onto the online machines. From your terminal, try this (insert your username):

ssh -X -A username@rssh.rhic.bnl.gov
ssh -X -A username@stargw.starp.bnl.gov
ssh -X -A onlmon@onl02.starp.bnl.gov
 
In principle, it doesn't matter which onl machine you use, but it's very helpful to know where your code is running, and traditionally we use onl02. If the login fails, please visit this website: www.star.bnl.gov/starkeyw/ and request access to onlldap.starp.bnl.gov as the username "onlmon".


2. Check out the monitoring software and set up directories

In /ldaphome/onlmon/ (your home directory) execute:

mkdir bsmdYYYY/ (YYYY = current year)
cd bsmdYYYY

# REPLACE WITH CVS CO

# cp -r bsmdMonitoringCode/* .


End Update Feb 2016 -- Kolja

Ideally, ignore the instructions below. I'm keeping them for now while I rebuild the code and the instructions.

These steps will serve as documentation for starting BSMD Status Monitoring each year. This is code that creates summaries and pedestal PDF files on this website. The procedure has some problems, which I note in the steps below. So far the code runs despite these issues, but they should be resolved at some point.

The first steps are to log onto the online machines (I like to use onl02). From your terminal, try this (insert your username):

ssh -X -A username@rssh.rhic.bnl.gov
ssh -X -A username@stargw.starp.bnl.gov
ssh -X -A onlmon@onl02.starp.bnl.gov

If the login fails, please visit this website: www.star.bnl.gov/starkeyw/ and request access to the onlXX nodes as the username "onlmon".

Once you have access, log in and proceed to the directory /ldaphome/onlmon/, which is the home directory when you log in. Here you should find a directory named bsmdMonitoringCode/, which serves as a clean backup of all the code without any output. In the /ldaphome/onlmon/ directory, execute these commands:

mkdir bsmdYYYY/ (YYYY = current year)
cp -r bsmdMonitoringCode/* bsmdYYYY/
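
The two commands above can be wrapped in a small guard so that an existing bsmdYYYY directory is never overwritten by accident. This is only a sketch in sh/bash syntax (the onlmon account appears to use csh, so it would have to be run with sh); the function name new_bsmd_dir and its arguments are hypothetical and not part of the monitoring code.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: new_bsmd_dir BASE_DIR YEAR
# Copies BASE_DIR/bsmdMonitoringCode into a fresh BASE_DIR/bsmdYEAR.
new_bsmd_dir() {
    base="$1"
    year="$2"
    src="$base/bsmdMonitoringCode"
    dest="$base/bsmd$year"

    if [ ! -d "$src" ]; then
        echo "backup directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi

    # "$src"/. also copies hidden files, unlike the "*" glob used above.
    mkdir "$dest" && cp -r "$src"/. "$dest"/
}
```

For example, "new_bsmd_dir /ldaphome/onlmon 2016" would create /ldaphome/onlmon/bsmd2016 from the backup, and a second call with the same year would stop with an error instead of copying over existing output.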

There also needs to be a web directory set up. There should be a directory /onlineweb/www/bsmdStatusYYYY/. If it's not yet set up, email Wayne Betts or Jerome Lauret to request it. Also, there should be a directory /onlineweb/www/bsmdStatus that is soft linked to the /onlineweb/www/bsmdStatusYYYY/ directory.
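
Before starting the monitoring, it may help to confirm that the web directory and the soft link described above are in place. The following is a minimal sketch in sh/bash syntax; the function name check_web_link is hypothetical, and the directory names are the ones given in the paragraph above.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: check_web_link WEB_ROOT YEAR
# Prints "ok" if WEB_ROOT/bsmdStatus is a soft link to WEB_ROOT/bsmdStatusYEAR.
check_web_link() {
    webroot="$1"
    year="$2"
    target="$webroot/bsmdStatus$year"
    link="$webroot/bsmdStatus"

    if [ ! -d "$target" ]; then
        echo "missing directory: $target"
        return 1
    fi
    if [ ! -L "$link" ]; then
        echo "not a symbolic link: $link"
        return 1
    fi
    # Compare the physical paths that the link and the target resolve to.
    if [ "$(cd "$link" 2>/dev/null && pwd -P)" != "$(cd "$target" && pwd -P)" ]; then
        echo "$link does not point to $target"
        return 1
    fi
    echo "ok"
}
```

Running "check_web_link /onlineweb/www 2016" prints "ok" only when both the year directory and the link are correct; any other message indicates which piece is missing.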

Now you must update the year to the current running year. When the code was backed up, the year was 2015, so search for that year using the following commands:

grep -r --color "bsmd2015" *
grep -r --color "bsmdStatus2015" *

Update Feb 2016 -- Kolja:
Clean up first using make clean, then replace all instances using

find . -type f | xargs perl -pi -e 's/bsmd2015/bsmd2016/g'
find . -type f | xargs perl -pi -e 's/bsmdStatus2015/bsmdStatus2016/g'
perl -pi -e 's/2015/2016/g' *html

--- End Update

From these two results, you should see several files that need updating. Change all instances of the above to the current year. You may want to do one final grep for just 2015:

grep -r -I --color "2015" *

to see if there are any final instances that need updating. The -I flag (uppercase "i") suppresses output for binary files; everything else that still matches should be pretty obviously unrelated.
Be sure to update index.html to your address, so someone can contact you in the event of a problem.
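
As a quick check after the search-and-replace steps above, the leftover files can be listed by name only. This is a sketch in sh/bash syntax; the function name list_stale_year is hypothetical, and it relies on the same grep -I option used above.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: list_stale_year DIR OLD_YEAR
# Prints every text file under DIR that still mentions bsmdOLD_YEAR or
# bsmdStatusOLD_YEAR; prints nothing when the replacement is complete.
list_stale_year() {
    dir="$1"
    old="$2"
    # -r: recurse, -I: skip binary files, -l: print file names only.
    grep -rIl -e "bsmd$old" -e "bsmdStatus$old" "$dir"
}
```

For example, "list_stale_year . 2015" run from the new bsmdYYYY directory should print nothing once every path has been updated.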

The final update you need to make is in the .cshrc file in the /ldaphome/onlmon/ directory.

emacs -nw /ldaphome/onlmon/.cshrc

Change LD_LIBRARY_PATH to include the current year's library directory, /ldaphome/onlmon/bsmdYYYY/lib. Note that this is an issue: one shouldn't have to edit the .cshrc file, and some part of the compilation needs updating so that it no longer depends on it. After changing the LD_LIBRARY_PATH variable, be sure to log out and log back into the online machine (or execute "source ~/.cshrc").
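
To confirm that the new setting is active after logging back in, one can test whether the expected directory appears as a complete entry in the colon-separated path. This is a sketch in sh/bash syntax; the function name has_lib_path is hypothetical.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: has_lib_path PATH_LIST DIRECTORY
# Succeeds if DIRECTORY is one complete entry of the colon-separated PATH_LIST.
has_lib_path() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, has_lib_path "$LD_LIBRARY_PATH" /ldaphome/onlmon/bsmd2016/lib succeeds only if that exact directory is listed, so a leftover bsmd2015 entry is not mistaken for the new one.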

Now you're ready to compile the code. Starting in your new /ldaphome/onlmon/bsmdYYYY directory, execute the following:

cd StRoot/RTS/src/RTS_EXAMPLE
make clean
make

The build generally fails on the first try, but running "make" a second time has always worked. This is a problem and should be investigated.
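
Until the underlying cause is found, the run-it-twice workaround can be written down explicitly. The following is a sketch in sh/bash syntax of a generic retry wrapper; the function name retry is hypothetical, and it does nothing to fix the actual build problem.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have failed.
retry() {
    attempts="$1"
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed: $*" >&2
        n=$((n + 1))
    done
    return 1
}
```

With this in place, "retry 2 make" runs make once and, only if that fails, a second time. If the second attempt also fails, the error is real and should be read carefully.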

Update Feb 2016 -- Kolja:

This is legacy code that I'm not going to touch. However, I edited the makefile to suppress the warnings for mostly harmless problems.
Be sure to at least use the Makefile from

/ldaphome/onlmon/bsmd2016/StRoot/RTS/src/RTS_EXAMPLE

or start the whole procedure above from /ldaphome/onlmon/bsmd2016/ by default.
Some frightening warnings remain, which an expert should investigate...

--- End Update

Now go back to the /ldaphome/onlmon/bsmdYYYY/ directory and compile there:

cd /ldaphome/onlmon/bsmdYYYY
make clean
make

This wraps up the preparation of the code. Starting from the backup directory there should be no issues, but some always tend to pop up. Hopefully it goes well! If everything has worked up to this point, run the code with the following command:

nohup python runOnlineBsmdPSQA.py -n 1000000 -m /evp/ >& monitoringReport.txt &

If the code ever stops, be sure to remove the monitoringReport.txt file before trying this command again, or it won't run. Also, because the code runs in the background under nohup, you'll need a way to stop it. Remember which node you were logged into when you started it (I like onl02) and execute:

ps -daf | grep "runOnlineBsmdPSQA"

If the code is running, this will show its process ID, which you can use to stop the code with

kill XXXXX

where XXXXX is the process ID. If you would like to learn more about the code, there is further documentation here:
drupal.star.bnl.gov/STAR/blog/wleight/2009/sep/01/run-9-bsmd-online-monitoring-documentation
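
An alternative to searching the process list is to record the process ID when the code is started. The following is a sketch in sh/bash syntax; the function name is_running and the file name bsmd.pid are hypothetical, and the start command in the comments is the one given above rewritten for sh instead of csh.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: is_running PID_FILE
# Succeeds if PID_FILE names a process that is still alive.
is_running() {
    pidfile="$1"
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example start in sh/bash syntax (commented out so that pasting this block
# does not launch anything); bsmd.pid is a hypothetical file name:
#   rm -f monitoringReport.txt
#   nohup python runOnlineBsmdPSQA.py -n 1000000 -m /evp/ > monitoringReport.txt 2>&1 &
#   echo $! > bsmd.pid
# Later, on the same node:
#   is_running bsmd.pid && kill "$(cat bsmd.pid)"
```

The PID file is only meaningful on the node where the code was started, so it is still worth remembering which onl machine that was.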

Go to the STAR online web server to check if your plots are being put on the webpage: online.star.bnl.gov/bsmdStatus. Also check the ped text files to make sure they aren't all zero. If any of the PDF or Summary links are empty, then there are issues you should tend to immediately.
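
Part of this check can be done from the command line by listing any zero-length files in the web directory. This is a sketch in sh/bash syntax; the function name list_empty_files is hypothetical, and it assumes that the PDF and summary files are written under /onlineweb/www/bsmdStatus. It does not check whether the pedestal values themselves are all zero.

```shell
# Hypothetical helper, not part of the monitoring code.
# Usage: list_empty_files DIR
# Prints the path of every zero-length regular file under DIR.
list_empty_files() {
    # -H makes find follow DIR itself if it is a soft link.
    find -H "$1" -type f -size 0
}
```

For example, "list_empty_files /onlineweb/www/bsmdStatus" should print nothing while the monitoring is healthy; any path it prints is an empty output file worth looking at.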

The plots under the PDF link should look something like the examples here: http://drupal.star.bnl.gov/STAR/blog-entry/wleight/2009/jan/07/bprs-bsmd-online-qa

If the script stops, restart it with the same command used previously.