Sharing of Computational Resources

Information about how computational resources on the central HPC clusters of the University are shared among users.

General and system-specific terms for shared access and resource usage

Computing time on the central HPC systems is a scarce resource. We try to balance several competing goals as far as possible: easy access, in particular for HPC newcomers; efficient use of the available computing time; undisturbed allocation of resources for high-end users; and a reasonable distribution of computing time. To this end, we provide a diverse environment that combines different approaches and that partly depends on system-specific rules.

Bonna:

  • Computational resources are shared as equally as possible (see below) among research groups. When a user fills out the registration form for access to Bonna and thereby claims to be affiliated with a research group, the head of that group receives a confirmation request by e-mail at the address associated with their Uni-ID. The group head confirms by replying to this e-mail. Only then is the access request forwarded to the Bonna admins.
  • Access is limited in time. As a rule of thumb, students receive access for six months, fully graduated researchers for twelve months, and professors for sixty months. Accounts can be extended via the corresponding form when they have expired or are about to expire. Moreover, accounts are deactivated when the respective person leaves the university. See this flowchart for an illustration of the account lifecycle.

Marvin:

  • Computational resources are shared as equally as possible (see below) among research groups. To this end, research group heads are asked to grant permission to a user who has filled out the (prospective) registration form for access to Marvin and who thereby claims to be affiliated with the respective research group. Permission is granted via a self-service portal. To use it, research group heads are kindly asked to register once using the Marvin registration form (unless they have already done so to get access to the system). Group leaders may delegate group management. Information on Marvin group management can be found in our wiki.
  • Access is limited in time. As a rule of thumb, students receive access for six months, fully graduated researchers for twelve months, and professors for sixty months. Accounts can be extended via the corresponding form when they have expired or are about to expire.

Bender:

  • Computational resources are shared as equally as possible (see below) among users. Students are granted access for a thesis, course, or project once this has been confirmed by the head of the research group responsible for them. To this end, the research group head receives a confirmation request by e-mail at the address associated with their Uni-ID and must reply to it before the user's access request is forwarded to the Bender admins.

General information about the sharing of computational resources

As described for each system above, either each research group or each user receives, in principle, an equal share of the total computing time (group share or user share). This share is not a static quota but a dynamic priority measure. Based on these shares, Slurm schedules submitted jobs as follows:

  • Jobs can always be submitted, irrespective of the state of the share. Jobs will always run as long as computational resources are available that have not been reserved for higher-priority jobs. In particular, "unused" computing time is assigned whenever this is reasonably possible (that is, when requested and available resources match).
  • Running jobs use up the share according to the requested resources (CPU cores and RAM). Requesting just a single core but all of the available memory on a node is equivalent to requesting all cores.
  • When a share is exhausted, jobs are still queued but are scheduled with a lower priority than any job associated with a share that is not yet exhausted. In particular, this ensures that occasional users will usually be able to start small jobs fairly promptly.
  • Group shares recover over time (with the present settings, about 20 days without running jobs will fully recover a share).
  • When the number of research groups or users on the system changes, this is reflected in the size of all shares.
  • Please note that non-interference and "schedulability" of jobs are further supported by job partitions with different, system-specific properties (for example, a maximum job "length").
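
The two rules about share usage and exhausted shares can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the actual Slurm configuration of any of the systems: the node size (`NODE_CORES`, `NODE_MEM_GB`), the function names, and the "larger of core and memory fraction" charging rule are hypothetical choices made so that a single core with all of a node's memory is charged like the full node. Share recovery over time and job partitions are left out.

```python
from dataclasses import dataclass

# Hypothetical node size, chosen only for illustration.
NODE_CORES = 96
NODE_MEM_GB = 1024.0


@dataclass
class Job:
    name: str
    group: str
    cores: int
    mem_gb: float


def charged_cores(job: Job) -> float:
    """Core-equivalents charged to the share (assumed rule).

    The memory request is converted into the number of cores that
    corresponds to the same fraction of the node; the larger of the
    two values is charged.
    """
    mem_as_cores = job.mem_gb / NODE_MEM_GB * NODE_CORES
    return float(max(job.cores, mem_as_cores))


def queue_order(jobs, usage, share):
    """Order pending jobs for this toy model.

    Jobs of groups whose usage is still below the share come first;
    jobs of groups with an exhausted share follow. Within each class
    the submission order is kept.
    """
    def key(indexed_job):
        index, job = indexed_job
        exhausted = usage.get(job.group, 0.0) >= share
        return (exhausted, index)

    return [job for _, job in sorted(enumerate(jobs), key=key)]


if __name__ == "__main__":
    mem_heavy = Job("mem-heavy", "heavy-group", cores=1, mem_gb=1024.0)
    small = Job("small", "occasional-group", cores=4, mem_gb=8.0)

    print(charged_cores(mem_heavy))  # 96.0: charged like all cores of the node
    print(charged_cores(small))      # 4.0: the core request dominates

    # Hypothetical usage values in arbitrary units; the share is 100 units.
    usage = {"heavy-group": 120.0, "occasional-group": 5.0}
    for job in queue_order([mem_heavy, small], usage, share=100.0):
        print(job.name)
```

In this model the job of the occasional group is placed ahead of the job of the group that has used up its share, even though it was submitted later. Both jobs remain in the queue and will run once resources are free.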
