scheduler tricks

  1.  To see full details of the job queue do:
    • condor_q -long -submitter balewski
  2. To see all machines available to STAR do
    • condor_status -constraint 'Turn_Off==False' 
  3. STAR queue limits:
    +---------------------------------------------Queues-----------------------------------------------+
    +-------------------------+------+--------+-----------+--------+-------+--------+------------------+
    | ID                      | Name | Type   | TimeLimit | MaxMem | Local | S.O.P. | Cluster          |
    +-------------------------+------+--------+-----------+--------+-------+--------+------------------+
    | bnl_condor_short_quick  |      | CONDOR | 180min    | 350MB  | N     | 1      | rcrs.rcf.bnl.gov |
    | BNL_condor_medium_quick |      | CONDOR | 300min    | 400MB  | N     | 50     | rcrs.rcf.bnl.gov |
    | bnl_condor_long_quick   |      | CONDOR | 14400min  | 440MB  | N     | 100    | rcrs.rcf.bnl.gov |
    +-------------------------+------+--------+-----------+--------+-------+--------+------------------+
  4. Condor priority can be checked with condor_userprio

    % condor_userprio -allusers | grep "balewski"
    balewski@bnl.gov                       179.52

    With no activity your priority value is 5, and it grows from there...
  5. To see running jobs submitted from different rcas6nnn machines: 
    • condor_q -submitter balewski | grep "running" ; date
  6. You can remove all the jobs by using
    % condor_vacate -fast
    wait a few minutes
    % condor_rm $USER

    Alternatively, use the
    -pool condor02.rcf.bnl.gov:9664
    qualifier to be sure to remove jobs even if they were not
    submitted from the node where you are now.
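    The steps above can be combined into one recipe (a sketch: the pool
    address is the one quoted above, and the two-minute wait is an
    arbitrary choice standing in for "wait a few minutes"):

    % condor_vacate -fast
    % sleep 120      # give the jobs a few minutes to exit
    % condor_rm -pool condor02.rcf.bnl.gov:9664 $USER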
  7. To tell the scheduler to split files:
    1. <job datasetSplitting="eventBased" maxEvents="2000">

    2. root4star -q -b userMacro.C\($EVENTS_START,$EVENTS_STOP,\"$FILELIST\"\)

    (Note: there must be no spaces inside the escaped parentheses, or the
    shell will split the macro call into separate arguments.)
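    Putting the two pieces together, a minimal SUMS job description might
    look like the sketch below. Only the datasetSplitting/maxEvents
    attributes and the $EVENTS_START/$EVENTS_STOP/$FILELIST variables come
    from the example above; the catalog query, log path, and nFiles value
    are placeholders you would replace with your own:

    <?xml version="1.0" encoding="utf-8" ?>
    <job datasetSplitting="eventBased" maxEvents="2000">
      <command>root4star -q -b userMacro.C\($EVENTS_START,$EVENTS_STOP,\"$FILELIST\"\)</command>
      <stdout URL="file:/star/u/$USER/log/$JOBID.log" />
      <input URL="catalog:star.bnl.gov?filetype=daq_reco_MuDst,storage=local" nFiles="100" />
    </job>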
  8. Job priority, by Leve:
    This is not really a SUMS issue, it's SOFI. It would depend on a few things:
    1) The resource requirements of the jobs.
    2) Your Condor user priority, which can be checked with condor_userprio.
    3) The queue the jobs are in.
    4) Also, if the queue gets very long, Condor will only check a section 
    of the jobs at once to try to match them to nodes, so it will depend 
    on which section it's matching. This is why we fear users submitting 
    very large numbers of jobs.
    5) The priority of the jobs relative to each other; this can force jobs 
    to start in the order you want. It can be set in SUMS or in the .condor 
    file. This will not affect overall priority.
    
    <ResourceUsage>
      <Priority>n</Priority>
    </ResourceUsage>

    See: http://www.star.bnl.gov/public/comp/Grid/scheduler/manual.htm
    See Section 3.4, Resource requirements: element and sub-elements
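    For point 5, the relative ordering can also be set directly in the 
    .condor submit file with the priority command (a sketch; the executable 
    name is a placeholder):

    universe   = vanilla
    executable = runMacro.csh
    priority   = 10
    queue

    Higher priority values are matched first among your own jobs only; this 
    does not change your standing against other users, which is governed by 
    the condor_userprio value above.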