Group-based quota on the PWG disk

General

Following a thread on how to allocate disk space to users (see the BA to GPS plan), here is a description of how to allocate disk space using group-based rather than user-based quotas.
  • First, the file system where this mechanism is applied is assumed to support secondary groups and group quotas. GPS supports those features.
     
  • Second, the primary source of information is the STAR PhoneBook.
    • Note: STAR members who have "forgotten" to identify themselves to our STAR collaboration record keeper will not be able to benefit from the group-based quota.

Observations and guidance

A few preliminary remarks:
  • A given user belongs to one and only one institution.
    Bottom line: the 1:1 user->institution association should be verified so that a given user cannot be assigned more than one secondary group generated by this scheme.
     
  • A given user may move from institution A to institution B, but this is infrequent. When it happens, his/her quota contribution should go to the new institution B, and the amount previously added to A should be removed.
    Bottom line: group quotas should be recomputed at each pass.
     
  • Each institution is identified by a unique "short name"; this field can reliably be used to create a pseudo-group in Unix-land. Institutions may however appear, disappear or even change name (rarely, but this happened a couple of times in 15 years of STAR).
    Bottom line: the list of groups generated by this scheme should be "guessable" (derived deterministically from the PhoneBook), and groups that do not match the list coming from the PhoneBook should be removed.
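The first and third bottom lines can be checked mechanically. Below is a minimal Python sketch, assuming the PhoneBook has already been loaded into a hypothetical mapping from institution short name to member names; `group_name` and `duplicated_members` are illustrative helpers, and the path tag `star_pwg` follows the volumeq_{path}_{short name} convention proposed in the workflow below.

```python
from __future__ import annotations

from collections import defaultdict


def group_name(path_tag: str, short_name: str) -> str:
    """Deterministic ("guessable") group name, e.g. ("star_pwg", "AGH") -> volumeq_star_pwg_AGH."""
    return f"volumeq_{path_tag}_{short_name}"


def duplicated_members(institutions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every member listed under more than one institution.

    A non-empty result means the 1:1 user->institution assumption is violated,
    and those entries should be fixed in the PhoneBook before groups are assigned.
    """
    seen: defaultdict[str, list[str]] = defaultdict(list)
    for short_name, members in institutions.items():
        for member in members:
            seen[member].append(short_name)
    return {m: sorted(insts) for m, insts in seen.items() if len(insts) > 1}
```

Because group names depend only on the short name, a renamed institution simply yields a new name on the next pass, and the old name drops out of the expected set.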

Code-less workflow description

  • We provide a public list of institutions and their members in XML format.
     
  • An example is shown below:
    <institution name="AGH">
       <members>
         <member isAuthor="yes">Adamczyk Leszek</member>
         <member isAuthor="yes">Fulek Lukasz</member>
         <member isAuthor="yes">Kycia Radoslaw</member>
         <member isAuthor="yes">Pawlik Bogdan</member>
         <member isAuthor="yes">Sikora Rafal</member>
         <member isAuthor="no">Chwastowski Janusz</member>
         <member isAuthor="no">Przybycien Mariusz</member>
         <member isAuthor="no">Turnau Jacek</member>
       </members>
    </institution>
  • I propose we use generic secondary group names: for /gpfs01/star/pwg, we would generate the secondary group name volumeq_star_pwg_AGH. This convention would allow:
    (1) extending this scheme to all experiments and all volume quotas
    (2) checking that a user is assigned one and only one volumeq_star_pwg_* secondary group
     
  • The group-quota policy is as follows:
    • The quota value is proportional to the number of authors: each author brings a quota of X to the group (institution). In the example above, AGH has 5 authors, so 5*X is given to volumeq_star_pwg_AGH.
    • However, all members, authors or not, would be assigned to volumeq_star_pwg_AGH and share the group quota defined by the institution's number of authors. For example, if X = 800 GB, all 8 members of AGH would share a group quota of 3.9 TB on volumeq_star_pwg_AGH (5*800/1024).
       
  • Workflow: whenever an institution is parsed
    1. Derive the deterministic secondary group name volumeq_{path}_{short name}. If this secondary group does not exist, create it and log a message for tracking purposes.
    2. Save the group name in a list {G} while the XML block <institutions></institutions> is parsed (OUTER LOOP 1).
    3. Count the number of authors N; group volumeq_{path}_{short name} should be assigned a quota of N*X. If its quota is not equal to N*X, adjust it and log a message. If it is already N*X, nothing needs to change.
    4. Save a list {U} of users/members while the XML block <members></members> is parsed.
    5. For each user in {U} (INNER LOOP 2):
      • translate the user name into a uid. If the first name / last name lookup fails to resolve to a valid uid, log a warning and move on to the next user (there is nothing more we can do).
      • verify that the uid is a member of volumeq_{path}_{short name}; if not, add the user to that secondary group.
      • verify that the user is NOT a member of any other secondary group matching the pattern volumeq_{path}_*; if he is, remove the user from that group.
    6. After the XML block <institutions></institutions> has been parsed, retrieve all groups matching the pattern volumeq_{path}_* that exist in LDAP. A group that exists but is NOT in the list {G} is an orphan group: remove it and log a message. Alternatively, check that no users are still associated with the orphan group before removing it.
  • If a user belongs to STAR but is NOT assigned a volumeq_* secondary group, a user-based quota of X/2 should be applied to that user. It is not yet clear how to handle this, however. Here are a few cases:
    • If group quotas take precedence over user quotas, it is safe to apply an X/2 user quota to every user and let the logic above take over.
    • However, if user quotas supplement or override group quotas, this would not work. The alternative would be to add a final check that ALL STAR users are assigned at least one secondary group quota. A user with none is an orphan user, and a user-based quota of X/2 should be set for that user.
    • Note that whenever a user is assigned a group quota, his/her user quota should then be removed to ensure consistency.
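The workflow above could be prototyped as a function that computes a plan of changes rather than touching LDAP directly. This is a minimal Python sketch under stated assumptions: the PhoneBook export is wrapped in an <institutions> element, the current LDAP groups and quotas have already been read into plain dictionaries, and names are resolved through a hypothetical `name_to_uid` lookup. `Plan`, `plan` and all parameter names are illustrative, and the commands that would actually apply the plan are site-specific and not shown. The membership and orphan checks are restricted to the volumeq_{path}_ prefix so that groups belonging to other volumes are left alone, and orphan users get the X/2 fallback from the "final check" case above.

```python
from __future__ import annotations

import logging
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

log = logging.getLogger("volumeq")


@dataclass
class Plan:
    """Changes needed to bring LDAP groups and quotas in line with the PhoneBook."""
    create_groups: list[str] = field(default_factory=list)
    set_group_quota: dict[str, float] = field(default_factory=dict)  # group -> GB
    add_member: list[tuple[str, str]] = field(default_factory=list)  # (uid, group)
    remove_member: list[tuple[str, str]] = field(default_factory=list)
    remove_groups: list[str] = field(default_factory=list)
    user_quota: dict[str, float] = field(default_factory=dict)  # uid -> GB (X/2 fallback)
    clear_user_quota: list[str] = field(default_factory=list)  # uids now covered by a group


def plan(xml_text: str, path_tag: str, x_gb: float,
         name_to_uid: dict[str, str],    # hypothetical "Last First" -> RCF uid lookup
         groups: dict[str, set[str]],    # current LDAP state: group -> member uids
         group_quota: dict[str, float],  # current group quotas in GB
         star_uids: set[str]) -> Plan:   # every STAR account known to the system
    prefix = f"volumeq_{path_tag}_"
    p = Plan()
    expected: set[str] = set()  # the list {G}
    grouped: set[str] = set()   # uids that received a group
    root = ET.fromstring(xml_text)
    for inst in root.iter("institution"):  # OUTER LOOP 1
        group = prefix + inst.get("name", "")
        expected.add(group)
        if group not in groups:  # step 1
            log.info("creating group %s", group)
            p.create_groups.append(group)
        members = inst.findall("./members/member")  # the list {U}
        n_authors = sum(1 for m in members if m.get("isAuthor") == "yes")
        if group_quota.get(group) != n_authors * x_gb:  # step 3
            log.info("setting quota of %s to %s GB", group, n_authors * x_gb)
            p.set_group_quota[group] = n_authors * x_gb
        for m in members:  # INNER LOOP 2 (step 5)
            name = (m.text or "").strip()
            uid = name_to_uid.get(name)
            if uid is None:
                log.warning("no uid for %r, skipping", name)
                continue
            grouped.add(uid)
            if uid not in groups.get(group, set()):
                p.add_member.append((uid, group))
            for other, uids in groups.items():  # other groups of the same volume
                if other.startswith(prefix) and other != group and uid in uids:
                    p.remove_member.append((uid, other))
    for g in groups:  # step 6: orphan groups
        if g.startswith(prefix) and g not in expected:
            log.info("removing orphan group %s", g)
            p.remove_groups.append(g)
    for uid in sorted(star_uids - grouped):  # orphan users get X/2
        p.user_quota[uid] = x_gb / 2
    p.clear_user_quota = sorted(grouped)
    return p
```

With the AGH example and X = 800 GB, the plan sets volumeq_star_pwg_AGH to 5*800 = 4000 GB (about 3.9 TB), independent of the number of non-authors. Keeping the plan separate from its execution also makes a dry run easy, which helps when quotas are recomputed at every pass.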

2015/03/26 Follow-up with the RCF

  • STAR will provide a list of UIDs instead of a list of first/last names. In other words, we will sort out ourselves the matching between our members' names and their RCF user names, which would simplify the RCF's work. If any "insider knowledge" is required to match a user's first/last name to an RCF account, STAR is best placed to make this adjustment.
    • Note [internal STAR discussion]: we would proceed in a few passes.
      • Pass 1: grab all user IDs in rhstar, then scan and match the names in LDAP against the PhoneBook using an approximate match.
        If they match, initialize the PhoneBook field RCFUid.
        If a user ID does not match, save the name found in LDAP in list 1.
      • Pass 2: select users where RCFUid = '' in the PhoneBook. Create list 2 (with full names).
        Check for overlaps between list 1 and list 2 and fix them by hand wherever possible.
      • Pass 3: we send the information to our users and ask them to verify their records.
         
  • To avoid users who have an RCF account but do not appear in our PhoneBook (later updates), we would simplify the scheme through a reinforced STAR policy (the policy already exists but is not enforced).
    NB: This would create an incentive for users to ensure they are recorded in our PhoneBook.
    • If a user does not appear in our PhoneBook, he will not be assigned a secondary group and hence will not have access to the storage resources managed by the group quota mechanism.
    • If such a user is an author, he will not bring a quota credit to his institution.
       
  • ...
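Passes 1 and 2 of the internal STAR discussion above could be prototyped with Python's standard difflib module. The sketch assumes the rhstar account names have already been exported from LDAP into a hypothetical uid -> full name mapping (for example from the GECOS field). `normalize`, `match_names` and the 0.85 cutoff are illustrative choices, not an agreed procedure, and every match would still be confirmed by users in Pass 3.

```python
from __future__ import annotations

import difflib


def normalize(name: str) -> str:
    """Lower-case and sort the name tokens so "Leszek Adamczyk" == "Adamczyk Leszek"."""
    return " ".join(sorted(name.lower().replace(",", " ").split()))


def match_names(ldap: dict[str, str], phonebook: list[str], cutoff: float = 0.85):
    """Approximate matching of LDAP accounts against PhoneBook names (Pass 1 / Pass 2).

    ldap:      hypothetical uid -> full name for the accounts in rhstar
    phonebook: full names of PhoneBook entries whose RCFUid is still empty
    Returns (matches, list1, list2):
      matches: uid -> PhoneBook name, candidates for initializing RCFUid
      list1:   (uid, LDAP name) pairs with no PhoneBook match
      list2:   PhoneBook names left without a uid, to compare by hand with list1
    Assumes no two PhoneBook entries normalize to the same string.
    """
    remaining = {normalize(n): n for n in phonebook}
    matches: dict[str, str] = {}
    list1: list[tuple[str, str]] = []
    for uid, full_name in sorted(ldap.items()):
        best = difflib.get_close_matches(normalize(full_name), list(remaining), n=1, cutoff=cutoff)
        if best:
            # each PhoneBook entry can be claimed by at most one account
            matches[uid] = remaining.pop(best[0])
        else:
            list1.append((uid, full_name))
    return matches, list1, sorted(remaining.values())
```

Sorting the name tokens makes the comparison insensitive to first/last name order. A lower cutoff catches more spelling variants at the cost of more false matches to review by hand.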