Cloud computing - FastOffline run 11

 A summary of the STAR FastOffline real-time Run 11 data processing on Cloud VMs.

Deployment of STAR software, initially on NERSC/Magellan:

  1.  STAR VM w/ SL10k: euca-run-instances -t c1.xlarge emi-48080D8D
  2.  STAR VM w/ SL11a: euca-run-instances -t c1.xlarge emi-6F2A0E46
  3.  STAR VM w/ SL11b: euca-run-instances -t c1.xlarge emi-FA4D10D5
  4.  STAR VM w/ SL11c: euca-run-instances -t c1.xlarge emi-6E5B0E5C
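The launch step above can be sketched as a small command builder; only the `-t c1.xlarge` type and the image ids come from the list, while the wrapper function and the use of the `-n` instance-count option are illustrative assumptions:

```python
# Hedged sketch of the VM launch step: building the euca-run-instances call
# for one of the images listed above. Only -t and the image id come from the
# list; the -n instance-count flag and this wrapper are assumptions.
def run_instances_cmd(emi_id, instance_type="c1.xlarge", count=1):
    cmd = ["euca-run-instances", "-t", instance_type, emi_id]
    if count > 1:
        # euca2ools accepts an instance count via -n (assumption noted above)
        cmd[1:1] = ["-n", str(count)]
    return cmd
```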

 Eventually we operated a coherent cluster of over 100 VMs drawn from three resource pools, although never all three at once at full load.

Data processing scheme:

  1. daq files were dispatched by STAR FastOffline to the 3 TB /star/data13/ disk at RCF
  2. globus-online was used to serially transfer the daq files to the 20 TB /scratch disk at NERSC
  3. scp was used to transfer daq files to the VMs in parallel, and to transfer the produced MuDst+StEvent files back to the /scratch disk at NERSC
  4. globus-online pulled the MuDst+StEvent files back to /star/data13
  5. FastOffline cataloged the produced files from data13 and saved them in HPSS @ RCF
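The parallel scp stage (step 3) can be sketched as follows; the helper only builds the command lines, and the host names, paths, and round-robin batching scheme are illustrative assumptions, not the production script:

```python
# Hedged sketch of step 3: pushing daq files to a VM with several concurrent
# scp streams. Hosts, paths and the batching scheme are illustrative.
def scp_push_cmds(daq_files, vm_host, dest_dir, streams=4):
    """Round-robin the daq files into `streams` batches, one scp command
    per batch, so the batches can run as parallel scp processes."""
    batches = [daq_files[i::streams] for i in range(streams)]
    return [["scp", *batch, f"{vm_host}:{dest_dir}"]
            for batch in batches if batch]
```

Launching each command in its own process (e.g. with subprocess.Popen) gives the parallel transfer; the return trip for the produced MuDst+StEvent files works the same way in reverse.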

A STAR DB snapshot was generated every 2 hours and uploaded to the VMs once per day, at a time fixed per VM but chosen at random. Every VM ran 5 independent local DB instances, launched on consecutive days.
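The "fixed random time" upload can be sketched by mapping each VM identifier to a fixed minute of the day, so the daily uploads are staggered but reproducible; the hashing scheme below is an assumption for illustration:

```python
# Hedged sketch: each VM derives a fixed but effectively random minute of the
# day for its daily DB snapshot upload, spreading the load on the snapshot
# server. The hash-based mapping is an assumption for illustration.
import hashlib

def upload_minute_of_day(vm_id: str) -> int:
    """Deterministically map a VM identifier to a minute in [0, 1440)."""
    h = int(hashlib.sha1(vm_id.encode()).hexdigest(), 16)
    return h % (24 * 60)
```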

A single BFC job lasted up to 3 days.

 

Summary of performance (estimates, accurate to ~10-20%)

  • # of processed daq files: ~18,000
  • data volume pushed: RCF --> NERSC ~70 TB, sent back ~40 TB
  • # of simultaneous jobs varied from 160 to 750
  • continuous processing over 4 months
  • total CPU harvest of ~25,000 CPU days, or ~70 CPU years
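A quick consistency check of the quoted estimates (the ~10-20% accuracy applies throughout; the 30.5-day month is an assumption):

```python
# Sanity-check the performance summary above using only its own numbers.
cpu_days = 25_000
cpu_years = cpu_days / 365            # ~68.5, matching the quoted ~70 CPU years

wall_days = 4 * 30.5                  # ~122 days of continuous processing
avg_jobs = cpu_days / wall_days       # ~205 simultaneous jobs on average,
                                      # inside the quoted 160-750 range
print(round(cpu_years, 1), round(avg_jobs))
```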

 


Fig 1. Example of peak load on VMs


Fig 2. Total load on VMs over last 3 months