SRM/DRM Testing June/July 2007

Charge

We had a discussion with Arie Shoshani and his group about
the use of SRM (client and site caching) in our analysis
scenario. We agreed to proceed with the following plan,
which gives us the best chance of achieving the milestone we
have with the OSG.
- First of all, we will try to restore the SRM service at both
LBNL and BNL. This will require:
* Disk space for the SRM cache at LBNL - 500 GB is plenty
* Disk space for the SRM cache at BNL - same size is fine

- We hope to hand a transfer test over to the OSG troubleshooting
team, who will stress-test the data transfer as we have defined it, i.e.:
* size test and long-term stability: we would like to define a test
where each job transfers 500 MB of data from LBNL to BNL,
with 100 jobs submitted at a time and the test running
for at least a few days
* we would like to be sure the test includes bursts of
100 transfer requests per minute to SRM
+ the success metrics:
. how many times the service had to be restarted
. % success of data transfers
+ we need to document the setup, i.e. the number of streams
(MUST be greater than 1)
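The success metrics above can be tracked with very little code. Below is a minimal Python sketch, assuming transfer outcomes and the number of service restarts are collected elsewhere (for example from job and service logs); `TransferRecord`, the function names and the even spacing of burst submissions within a minute are hypothetical choices, not part of the plan. It also spells out the volume of one wave: 100 jobs × 500 MB = 50 GB, well within the 500 GB cache proposed for each site.

```python
from dataclasses import dataclass

# Numbers taken from the test definition; everything else is illustrative.
FILE_SIZE_MB = 500            # data transferred by each job (LBNL -> BNL)
JOBS_PER_WAVE = 100           # jobs submitted at a time
BURST_REQUESTS_PER_MIN = 100  # burst rate of transfer requests to SRM


@dataclass
class TransferRecord:
    """Hypothetical outcome of one transfer, e.g. parsed from job logs."""
    job_id: int
    succeeded: bool


def volume_per_wave_gb(jobs: int = JOBS_PER_WAVE, size_mb: int = FILE_SIZE_MB) -> float:
    """Data moved by one wave of jobs, in GB (1 GB = 1000 MB)."""
    return jobs * size_mb / 1000.0


def burst_offsets_s(requests: int = BURST_REQUESTS_PER_MIN) -> list[float]:
    """Submission offsets in seconds that spread one burst evenly over a minute."""
    return [i * 60.0 / requests for i in range(requests)]


def check_setup(streams: int) -> None:
    """Reject a documented setup that uses a single stream."""
    if streams <= 1:
        raise ValueError("number of streams MUST be greater than 1")


def success_metrics(records: list[TransferRecord], service_restarts: int) -> dict:
    """The two metrics from the charge: % transfer success and service restarts."""
    total = len(records)
    succeeded = sum(1 for r in records if r.succeeded)
    return {
        "transfers": total,
        "succeeded": succeeded,
        "percent_success": 100.0 * succeeded / total if total else 0.0,
        "service_restarts": service_restarts,
    }
```

For instance, 100 records with 10 failed transfers and 2 restarts yield a `percent_success` of 90.0 and a `service_restarts` of 2.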

- Once this test is declared successful, we will use
the deployment for our simulation production in real
production mode; the milestone would then be half
achieved.

- To fully complete our milestone, we need to reach one
additional site. The question was: which one?
* Our plan is to move to SRM v2.2 for this test. This
is the most economical path in terms of manpower and
OSG deliverables, and it allows minimal reshuffling of
manpower and current assignments, hence increasing our
chances of success.
* FermiGrid would not have SRM 2.2, however
=> We would then use UIC for this, possibly leveraging OSG
manpower to help set up a fully working
environment.

Our contact people would be

- Doug Olson for LBNL, working with Alex Sim, Andrew Rose
and Eric Hjort (whenever necessary)
* The work with the OSG troubleshooting team will be
coordinated from the LBNL side
* We hope Andrew/Eric will work with Alex to
set up the test described above

- Wayne Betts for access to the infrastructure at BNL
(with assistance from everyone to clean up the disk space if needed)

- Olga Barannikova will be our contact for UIC; we will
come back to this later, following the strawman plan
above

As a reminder, I have discussed with Ruth the fact that at
this stage, after many years of work that are now bringing
exciting and encouraging signs of success (the recent production
stability being one), I have no intention to move, re-scope
or re-schedule our milestone. Success on this milestone is the path
forward to making Grid computing part of our plans for the future.
As the purpose of our visit was understood and help is being
mobilized, we clearly see that success is reachable.

I count on all of you for full assistance with
this process.

Thank you,

--
,,,,,
( o o )
--m---U---m--
Jerome

Test Plan (Alex S., 14 June)

Hi all,

The following plan will be carried out for the STAR SRM test by the SDM group with
BeStMan SRM v2.2.
In the meantime, Andrew Rose will reproduce the successful analysis case
that Eric Hjort ran previously.

1. small local setup
1.1. A small number of analysis jobs will be submitted directly to the PDSF job
queue.
1.2. Each job will transfer files from datagrid.lbl.gov via gsiftp into the
PDSF project working cache.
1.3. A fake analysis will be performed to produce a result file.
1.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to datagrid.lbl.gov via gsiftp.

2. small remote setup
2.1. A small number of analysis jobs will be submitted directly to the PDSF job
queue.
2.2. Each job will transfer files from stargrid?.rcf.bnl.gov via gsiftp into
the PDSF project working cache.
2.3. A fake analysis will be performed to produce a result file.
2.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to stargrid?.rcf.bnl.gov via gsiftp.

3. large local setup
3.1. About 100-200 analysis jobs will be submitted directly to the PDSF job
queue.
3.2. Each job will transfer files from datagrid.lbl.gov via gsiftp into the
PDSF project working cache.
3.3. A fake analysis will be performed to produce a result file.
3.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to datagrid.lbl.gov via gsiftp.

4. large remote setup
4.1. About 100-200 analysis jobs will be submitted directly to the PDSF job
queue.
4.2. Each job will transfer files from stargrid?.rcf.bnl.gov via gsiftp into
the PDSF project working cache.
4.3. A fake analysis will be performed to produce a result file.
4.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to stargrid?.rcf.bnl.gov via gsiftp.

5. small remote SUMS setup
5.1. A small number of analysis jobs will be submitted to SUMS.
5.2. Each job will transfer files from stargrid?.rcf.bnl.gov via gsiftp into
the PDSF project working cache.
5.3. A fake analysis will be performed to produce a result file.
5.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to stargrid?.rcf.bnl.gov via gsiftp.

6. large remote SUMS setup
6.1. About 100-200 analysis jobs will be submitted to SUMS.
6.2. Each job will transfer files from stargrid?.rcf.bnl.gov via gsiftp into
the PDSF project working cache.
6.3. A fake analysis will be performed to produce a result file.
6.4. The job will invoke srm-client to call BeStMan to transfer the result
file out to stargrid?.rcf.bnl.gov via gsiftp.

7. Have Andrew and Lidia use setup #6 to test with real analysis jobs.
8. Deploy setup #5 at UIC and test.
9. Deploy setup #6 at UIC and test.
10. Have Andrew and Lidia use setup #9 to test with real analysis jobs.
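The per-job steps shared by setups 1 to 6 (stage in via gsiftp, fake analysis, stage out through the SRM client) could be scripted roughly as below. This is a minimal Python sketch, not the SDM group's actual job script: the `srm-copy` command name for the BeStMan client, the form of its target URL (a real BeStMan setup may expect an srm:// URL pointing at the BeStMan server instead), the stream count of 4, the example paths and all function names are assumptions. Only `globus-url-copy` and its `-p` (parallel streams) option are standard Globus tooling. By default the script only prints the transfer commands.

```python
import subprocess
import tempfile
from pathlib import Path

# Illustrative assumptions; none of these values come from the test plan.
SRM_CLIENT = "srm-copy"  # assumed name of the BeStMan client command
STREAMS = 4              # parallel gsiftp streams; the charge requires more than 1


def stage_in_cmd(host: str, remote_path: str, cache_dir: Path,
                 streams: int = STREAMS) -> list[str]:
    """Step x.2: pull one input file via gsiftp into the PDSF working cache."""
    if streams <= 1:
        raise ValueError("number of streams MUST be greater than 1")
    local = cache_dir.resolve() / Path(remote_path).name
    return ["globus-url-copy", "-p", str(streams),
            f"gsiftp://{host}{remote_path}", f"file://{local}"]


def fake_analysis(inputs: list[Path], result: Path) -> Path:
    """Step x.3: stand-in analysis that records each input's size in bytes."""
    result.write_text("".join(f"{p.name} {p.stat().st_size}\n" for p in inputs))
    return result


def stage_out_cmd(host: str, result: Path, remote_dir: str) -> list[str]:
    """Step x.4: hand the result file to the SRM client (assumed CLI and URL form)."""
    return [SRM_CLIENT, f"file://{result.resolve()}",
            f"gsiftp://{host}{remote_dir}/{result.name}"]


def plan_job(host: str, remote_inputs: list[str], cache_dir: Path,
             remote_out_dir: str, result_name: str = "result.txt") -> list[list[str]]:
    """All transfer commands for one job: stage-ins first, stage-out last."""
    cmds = [stage_in_cmd(host, r, cache_dir) for r in remote_inputs]
    cmds.append(stage_out_cmd(host, cache_dir / result_name, remote_out_dir))
    return cmds


def _run(cmd: list[str], dry_run: bool) -> None:
    if dry_run:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)


def run_job(host: str, remote_inputs: list[str], cache_dir: Path,
            remote_out_dir: str, dry_run: bool = True,
            result_name: str = "result.txt") -> None:
    """Run one job end to end; with dry_run=True only print the transfers."""
    *stage_ins, stage_out = plan_job(host, remote_inputs, cache_dir,
                                     remote_out_dir, result_name)
    for cmd in stage_ins:
        _run(cmd, dry_run)
    if not dry_run:  # inputs exist only after a real stage-in
        inputs = [cache_dir / Path(r).name for r in remote_inputs]
        fake_analysis(inputs, cache_dir / result_name)
    _run(stage_out, dry_run)


if __name__ == "__main__":
    # Dry run with hypothetical paths; a scratch directory stands in for the cache.
    with tempfile.TemporaryDirectory() as scratch:
        run_job("datagrid.lbl.gov", ["/hypothetical/in1.root"],
                Path(scratch), "/hypothetical/out")
```

Passing `dry_run=False` would execute the transfers and requires the Globus and BeStMan clients on the worker node. For the remote setups the host argument would be the BNL node (stargrid?.rcf.bnl.gov in the plan); for the SUMS setups the same script would simply be submitted through SUMS instead of directly to the PDSF queue.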

Any questions? I'll let you know when things are in progress.

-- Alex
asim at lbl dot gov