MemoryManager

class cubie.memory.mem_manager.MemoryManager(totalmem: int = None, registry: dict[int, ~cubie.memory.mem_manager.InstanceMemorySettings] = NOTHING, stream_groups: StreamGroups = NOTHING, mode: str = 'passive', allocator: FakeBaseCUDAMemoryManager = <class 'cubie.cuda_simsafe.FakeNumbaCUDAMemoryManager'>, auto_pool: list[int] = NOTHING, manual_pool: list[int] = NOTHING, queued_allocations: Dict[str, ~typing.Dict] = NOTHING)[source]

Bases: object

Singleton interface coordinating GPU memory allocation and stream usage.

Parameters:
  • totalmem (int) – Total GPU memory in bytes. Determined automatically when omitted.

  • registry (dict[int, cubie.memory.mem_manager.InstanceMemorySettings]) – Registry mapping instance identifiers to their memory settings.

  • stream_groups (cubie.memory.stream_groups.StreamGroups) – Manager for organizing instances into stream groups.

  • mode (str) – Memory management mode, either "passive" or "active".

  • allocator (cubie.cuda_simsafe.FakeBaseCUDAMemoryManager) – Memory allocator class registered with Numba.

  • auto_pool (list[int]) – List of instance identifiers using automatic memory allocation.

  • manual_pool (list[int]) – List of instance identifiers using manual memory allocation.

  • queued_allocations (Dict[str, Dict]) – Queued allocation requests organized by stream group.

Notes

The manager accepts ArrayRequest objects and returns ArrayResponse instances that reference allocated arrays and chunking information. Active mode enforces per-instance VRAM proportions while passive mode mirrors standard allocation behaviour using chunking only when necessary.
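
Examples

A minimal end-to-end sketch. The names mgr (the process-wide manager instance) and component (an object registered with it) are illustrative:

>>> import numpy as np
>>> mgr.register(component)  # joins the automatic allocation pool
>>> d_arr = mgr.allocate(
...     shape=(1024, 32),
...     dtype=np.float32,
...     memory_type="device",
...     stream=mgr.get_stream(component),
... )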

_add_auto_proportion(instance: object) float[source]

Add an instance to the auto allocation pool with equal share.

Parameters:

instance – Instance to add to auto allocation pool.

Returns:

Proportion assigned to this instance.

Return type:

float

Raises:

ValueError – If available auto-allocation pool is less than minimum required size.

Notes

Splits the non-manually-allocated portion of VRAM equally among all auto-allocated instances. Triggers rebalancing of the auto pool.
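
For example, with manual instances holding 0.4 of VRAM and three instances in the auto pool, each auto instance receives an equal share of (1.0 - 0.4) / 3 = 0.2 (values illustrative).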

_add_manual_proportion(instance: object, proportion: float) None[source]

Add an instance to the manual allocation pool with the specified proportion.

Parameters:
  • instance – Instance to add to manual allocation pool.

  • proportion – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If manual proportion would exceed total available memory or leave insufficient memory for auto-allocated processes.

Warns:

UserWarning – If manual proportion leaves less than 5% of memory for auto allocation.

Notes

Updates the instance’s proportion and cap, then rebalances the auto pool. Enforces minimum auto pool size constraints.

Return type:

None

_check_requests(requests: dict[str, ArrayRequest]) None[source]

Validate that all requests are properly formatted.

Parameters:

requests – Dictionary of requests to validate.

Raises:

TypeError – If requests is not a dict or contains invalid ArrayRequest objects.

Return type:

None

_rebalance_auto_pool() None[source]

Redistribute available memory equally among auto-allocated instances.

Notes

Calculates the available proportion after manual allocations and divides it equally among all instances in the auto pool. Updates both proportion and cap for each auto-allocated instance.

Return type:

None

allocate(shape: tuple[int, ...], dtype: Callable, memory_type: str, stream: cuda.cudadrv.driver.Stream = 0) object[source]

Allocate a single C-contiguous array with specified parameters.

Parameters:
  • shape – Shape of the array to allocate.

  • dtype – Constructor returning the precision object for the array elements.

  • memory_type – Type of memory: “device”, “mapped”, “pinned”, or “managed”.

  • stream – CUDA stream for the allocation. Defaults to 0.

Returns:

Allocated GPU array.

Return type:

object
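
Examples

A usage sketch, with mgr and component as in the class-level example; shapes are illustrative:

>>> import numpy as np
>>> d_state = mgr.allocate((256, 64), np.float32, "device",
...                        stream=mgr.get_stream(component))
>>> h_view = mgr.allocate((256, 64), np.float32, "mapped")  # host-visible mapped memory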

allocate_all(requests: dict[str, ArrayRequest], instance_id: int, stream: cuda.cudadrv.driver.Stream) dict[str, object][source]

Allocate multiple arrays based on a dictionary of requests.

Parameters:
  • requests – Dictionary mapping labels to array requests.

  • instance_id – ID of the requesting instance.

  • stream – CUDA stream for the allocations.

Returns:

Dictionary mapping labels to allocated arrays.

Return type:

dict of str to object
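
Examples

A sketch, assuming requests is a prebuilt dict mapping labels to ArrayRequest objects; passing id(component) as the instance identifier is an assumption here, and the "state" label is illustrative:

>>> arrays = mgr.allocate_all(requests, instance_id=id(component),
...                           stream=mgr.get_stream(component))
>>> d_state = arrays["state"]  # allocated array for the "state" request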

allocate_queue(triggering_instance: object) None[source]

Process all queued requests for a stream group with coordinated chunking.

Chunking is always performed along the run axis when memory constraints require splitting the batch.

Parameters:

triggering_instance – The instance that triggered queue processing.

Notes

Processes all pending requests in the same stream group, applying coordinated chunking based on available memory. Calls allocation_ready_hook for each instance with their results.

Return type:

None

property auto_pool_proportion

Total proportion of VRAM currently distributed automatically.

cap(instance: object) int | None[source]

Get the maximum allocatable bytes for an instance.

Parameters:

instance – Instance to query.

Returns:

Maximum allocatable bytes for this instance.

Return type:

int or None

change_stream_group(instance: object, new_group: str) None[source]

Move instance to another stream group.

Parameters:
  • instance – Instance to move.

  • new_group – Name of the new stream group.

Return type:

None

compute_chunked_shapes(requests: dict[str, ArrayRequest], chunk_size: int) dict[str, Tuple[int, ...]][source]

Compute per-array chunked shapes based on available memory.

Parameters:
  • requests – Dictionary mapping labels to array requests.

  • chunk_size – Length of chunked arrays along the run axis.

Returns:

Mapping from array labels to their per-chunk shapes.

Return type:

dict[str, tuple[int, …]]

Notes

Unchunkable arrays retain their original shape.
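
For example, assuming the run axis is the leading axis, a chunkable request of shape (1000, 64) with chunk_size=250 yields a per-chunk shape of (250, 64), while an unchunkable request of the same shape is returned unchanged.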

create_host_array(shape: tuple[int, ...], dtype: type, memory_type: str = 'pinned', like: ndarray | None = None) ndarray[source]

Create a C-contiguous host array.

Parameters:
  • shape – Shape of the array to create.

  • dtype – Data type for the array elements.

  • memory_type – Memory type for the host array. Must be "pinned" or "host". Defaults to "pinned".

  • like – A source array to copy data from. If provided, the new array contains the same data as like; otherwise it is filled with zeros.

Returns:

C-contiguous host array.

Return type:

numpy.ndarray
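
Examples

A sketch, with mgr as in the class-level example; src stands for an existing host array:

>>> import numpy as np
>>> h_buf = mgr.create_host_array((1024, 32), np.float32)  # pinned, zero-filled
>>> h_copy = mgr.create_host_array(src.shape, src.dtype, like=src)  # copies src's data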

free(array_label: str) None[source]

Free an allocation by label across all instances.

Parameters:

array_label – Label of the allocation to free.

Return type:

None

free_all() None[source]

Free all allocations across all registered instances.

Return type:

None

from_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]

Copy data from device arrays using the instance’s stream.

Parameters:
  • instance – Instance whose stream to use for copying.

  • from_arrays – Source device arrays to copy from.

  • to_arrays – Destination arrays to copy to.

Return type:

None

get_available_memory(group: str) int[source]

Get available memory for an entire stream group.

Parameters:

group – Name of the stream group.

Returns:

Available memory in bytes for the group.

Return type:

int

Warns:

UserWarning – If group has used more than 95% of allocated memory.

get_chunk_parameters(requests: Dict[str, Dict], axis_length: int, stream_group: str) Tuple[int, int][source]

Calculate number of chunks and chunk size for a dict of array requests.

Chunking is performed along the run axis only.

Parameters:
  • requests – Dictionary mapping instance IDs to their array requests.

  • axis_length – Unchunked length of the chunking axis.

  • stream_group – Name of the stream group making the request.

Returns:

Length of chunked axis and number of chunks needed to fit the request.

Return type:

tuple of int

Warns:

UserWarning – If request exceeds available VRAM by more than 20x.

get_memory_info() tuple[int, int][source]

Get free and total GPU memory information.

Returns:

(free_memory, total_memory) in bytes.

Return type:

tuple of int
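
Examples

A sketch, with mgr as in the class-level example:

>>> free, total = mgr.get_memory_info()
>>> free / total  # fraction of VRAM currently free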

get_stream(instance: object) object[source]

Get the CUDA stream associated with an instance.

Parameters:

instance – Instance to retrieve the stream for.

Returns:

CUDA stream associated with the instance.

Return type:

object

get_stream_group(instance: object) str[source]

Get the name of the stream group for an instance.

Parameters:

instance – Instance to query.

Returns:

Name of the stream group.

Return type:

str

invalidate_all() None[source]

Call each invalidate hook and release all allocations.

Return type:

None

is_grouped(instance: object) bool[source]

Check if instance is grouped with others in a named stream.

Parameters:

instance – Instance to check.

Returns:

True if instance shares a stream group with other instances.

Return type:

bool

property manual_pool_proportion

Total proportion of VRAM currently assigned manually.

proportion(instance: object) float[source]

Get the maximum proportion of VRAM allocated to an instance.

Parameters:

instance – Instance to query.

Returns:

Proportion of VRAM allocated to this instance.

Return type:

float

queue_request(instance: object, requests: dict[str, ArrayRequest]) None[source]

Queue allocation requests for batched stream group processing.

Parameters:
  • instance – The instance making the request.

  • requests – Dictionary mapping labels to array requests.

Notes

Requests are queued per stream group, allowing multiple components to contribute to a single coordinated allocation that can be optimally chunked together.

Return type:

None
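
Examples

A coordination sketch. component_a and component_b share a stream group, and requests_a/requests_b are prebuilt ArrayRequest dicts (all names illustrative); results are delivered through each instance's allocation_ready_hook:

>>> mgr.queue_request(component_a, requests_a)
>>> mgr.queue_request(component_b, requests_b)
>>> mgr.allocate_queue(component_a)  # chunks and allocates both queues together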

register(instance: object, proportion: float | None = None, invalidate_cache_hook: Callable = <function placeholder_invalidate>, allocation_ready_hook: Callable = <function placeholder_dataready>, stream_group: str = 'default') None[source]

Register an instance and configure its memory allocation settings.

Parameters:
  • instance – Instance to register for memory management.

  • proportion – Proportion of VRAM to allocate (0.0 to 1.0). When omitted, the instance joins the automatic allocation pool.

  • invalidate_cache_hook – Function to call when CUDA memory system changes occur.

  • allocation_ready_hook – Function to call when allocations are ready.

  • stream_group – Name of the stream group to assign the instance to.

Raises:

ValueError – If instance is already registered or proportion is not between 0 and 1.

Return type:

None
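
Examples

A sketch; worker and its hook methods are illustrative names:

>>> mgr.register(
...     worker,
...     proportion=0.25,  # manual pool; omit to join the auto pool
...     invalidate_cache_hook=worker.on_invalidate,
...     allocation_ready_hook=worker.on_ready,
...     stream_group="solvers",
... )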

registry: dict[int, InstanceMemorySettings]

Registry mapping instance identifiers to their memory settings.

reinit_streams() None[source]

Reinitialise all streams after a CUDA context reset.

Return type:

None

set_allocator(name: str) None[source]

Set the external memory allocator in Numba.

Parameters:

name – Memory allocator type. Accepted values are "cupy_async" to use CuPy’s AsyncMemoryPool, "cupy" to use MemoryPool, and "default" for Numba’s default manager.

Raises:

ValueError – If allocator name is not recognized.

Warns:

UserWarning – A change to the memory manager requires the CUDA context to be closed and reopened. This invalidates all previously compiled kernels and allocated arrays, requiring a full rebuild.

Return type:

None
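
Examples

A sketch; note the rebuild consequence described in the warning above:

>>> mgr.set_allocator("cupy_async")  # switch to CuPy's AsyncMemoryPool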

set_auto_limit_mode(instance: object) None[source]

Convert a manual-limited instance to auto allocation mode.

Parameters:

instance – Instance to convert to auto mode.

Raises:

ValueError – If instance is already in auto allocation pool.

Return type:

None

set_limit_mode(mode: str) None[source]

Set the memory allocation limiting mode.

Parameters:

mode – Either "passive" or "active" memory management mode.

Raises:

ValueError – If mode is not “passive” or “active”.

Return type:

None

set_manual_limit_mode(instance: object, proportion: float) None[source]

Convert an auto-limited instance to manual allocation mode.

Parameters:
  • instance – Instance to convert to manual mode.

  • proportion – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If instance is already in manual allocation pool.

Return type:

None

set_manual_proportion(instance: object, proportion: float) None[source]

Set manual allocation proportion for an instance.

If instance is currently in the auto-allocation pool, shift it to manual.

Parameters:
  • instance – Instance to update proportion for.

  • proportion – New proportion between 0 and 1.

Raises:

ValueError – If proportion is not between 0 and 1.

Return type:

None
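
Examples

A sketch, with component as in the class-level example:

>>> mgr.set_manual_proportion(component, 0.25)  # shifts component to the manual pool if needed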

stream_groups: StreamGroups

Manager for organizing instances into stream groups.

sync_stream(instance: object) None[source]

Synchronize the CUDA stream for an instance.

Parameters:

instance – Instance whose stream to synchronize.

Return type:

None

to_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]

Copy data to device arrays using the instance’s stream.

Parameters:
  • instance – Instance whose stream to use for copying.

  • from_arrays – Source arrays to copy from.

  • to_arrays – Destination device arrays to copy to.

Return type:

None

totalmem: int

Total GPU memory in bytes.