MemoryManager

class cubie.memory.mem_manager.MemoryManager(totalmem: int = None, registry: dict[int, ~cubie.memory.mem_manager.InstanceMemorySettings] = NOTHING, stream_groups: StreamGroups = NOTHING, mode: str = 'passive', allocator: FakeBaseCUDAMemoryManager = <class 'cubie.cuda_simsafe.FakeNumbaCUDAMemoryManager'>, auto_pool: list[int] = NOTHING, manual_pool: list[int] = NOTHING, queued_allocations: Dict[str, ~typing.Dict] = NOTHING)[source]

Bases: object

Singleton interface coordinating GPU memory allocation and stream usage.

Parameters:
  • totalmem (int) – Total GPU memory in bytes. Determined automatically when omitted.

  • registry (dict[int, cubie.memory.mem_manager.InstanceMemorySettings]) – Registry mapping instance identifiers to their memory settings.

  • stream_groups (cubie.memory.stream_groups.StreamGroups) – Manager for organizing instances into stream groups.

  • mode (str) – Memory management mode, either "passive" or "active".

  • allocator (cubie.cuda_simsafe.FakeBaseCUDAMemoryManager) – Memory allocator class registered with Numba.

  • auto_pool (list[int]) – List of instance identifiers using automatic memory allocation.

  • manual_pool (list[int]) – List of instance identifiers using manual memory allocation.

  • queued_allocations (Dict[str, Dict]) – Queued allocation requests organized by stream group.

Notes

The manager accepts ArrayRequest objects and returns ArrayResponse instances that reference allocated arrays and chunking information. Active mode enforces per-instance VRAM proportions while passive mode mirrors standard allocation behaviour using chunking only when necessary.
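
Examples

A minimal end-to-end sketch. The names mgr (the process-wide manager instance) and component (an object registered with it) are illustrative:

>>> import numpy as np
>>> mgr.register(component)  # joins the automatic allocation pool
>>> d_arr = mgr.allocate(
...     shape=(1024, 32),
...     dtype=np.float32,
...     memory_type="device",
...     stream=mgr.get_stream(component),
... )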

_add_auto_proportion(instance: object) float[source]

Add an instance to the auto allocation pool with equal share.

Parameters:

instance – Instance to add to auto allocation pool.

Returns:

Proportion assigned to this instance.

Return type:

float

Raises:

ValueError – If available auto-allocation pool is less than minimum required size.

Notes

Splits the non-manually-allocated portion of VRAM equally among all auto-allocated instances. Triggers rebalancing of the auto pool.
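
For example, with manual instances holding 0.4 of VRAM and three instances in the auto pool, each auto instance receives an equal share of (1.0 - 0.4) / 3 = 0.2 (values illustrative).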

_add_manual_proportion(instance: object, proportion: float) None[source]

Add an instance to the manual allocation pool with the specified proportion.

Parameters:
  • instance – Instance to add to manual allocation pool.

  • proportion – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If manual proportion would exceed total available memory or leave insufficient memory for auto-allocated processes.

Warns:

UserWarning – If manual proportion leaves less than 5% of memory for auto allocation.

Notes

Updates the instance’s proportion and cap, then rebalances the auto pool. Enforces minimum auto pool size constraints.

Return type:

None

_check_requests(requests: dict[str, ArrayRequest]) None[source]

Validate that all requests are properly formatted.

Parameters:

requests – Dictionary of requests to validate.

Raises:

TypeError – If requests is not a dict or contains invalid ArrayRequest objects.

Return type:

None

_rebalance_auto_pool() None[source]

Redistribute available memory equally among auto-allocated instances.

Notes

Calculates the available proportion after manual allocations and divides it equally among all instances in the auto pool. Updates both proportion and cap for each auto-allocated instance.

Return type:

None

allocate(shape: tuple[int, ...], dtype: Callable, memory_type: str, stream: cuda.cudadrv.driver.Stream = 0) object[source]

Allocate a single C-contiguous array with specified parameters.

Parameters:
  • shape – Shape of the array to allocate.

  • dtype – Constructor returning the precision object for the array elements.

  • memory_type – Type of memory: “device”, “mapped”, “pinned”, or “managed”.

  • stream – CUDA stream for the allocation. Defaults to 0.

Returns:

Allocated GPU array.

Return type:

object
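
Examples

A usage sketch, with mgr and component as in the class-level example; shapes are illustrative:

>>> import numpy as np
>>> d_state = mgr.allocate((256, 64), np.float32, "device",
...                        stream=mgr.get_stream(component))
>>> h_view = mgr.allocate((256, 64), np.float32, "mapped")  # host-visible mapped memory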

allocate_all(requests: dict[str, ArrayRequest], instance_id: int, stream: cuda.cudadrv.driver.Stream) dict[str, object][source]

Allocate multiple arrays based on a dictionary of requests.

Parameters:
  • requests – Dictionary mapping labels to array requests.

  • instance_id – ID of the requesting instance.

  • stream – CUDA stream for the allocations.

Returns:

Dictionary mapping labels to allocated arrays.

Return type:

dict of str to object
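
Examples

A sketch, assuming requests is a prebuilt dict mapping labels to ArrayRequest objects; passing id(component) as the instance identifier is an assumption here, and the "state" label is illustrative:

>>> arrays = mgr.allocate_all(requests, instance_id=id(component),
...                           stream=mgr.get_stream(component))
>>> d_state = arrays["state"]  # allocated array for the "state" request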

allocate_queue(triggering_instance: object) None[source]

Process all queued requests for a stream group with coordinated chunking.

Chunking is always performed along the run axis when memory constraints require splitting the batch.

Parameters:

triggering_instance – The instance that triggered queue processing.

Notes

Processes all pending requests in the same stream group, applying coordinated chunking based on available memory. Calls allocation_ready_hook for each instance with their results.

Return type:

None

property auto_pool_proportion

Total proportion of VRAM currently distributed automatically.

cap(instance: object) int | None[source]

Get the maximum allocatable bytes for an instance.

Parameters:

instance – Instance to query.

Returns:

Maximum allocatable bytes for this instance.

Return type:

int or None

change_stream_group(instance: object, new_group: str) None[source]

Move instance to another stream group.

Parameters:
  • instance – Instance to move.

  • new_group – Name of the new stream group.

Return type:

None

compute_chunked_shapes(requests: dict[str, ArrayRequest], chunk_size: int) dict[str, Tuple[int, ...]][source]

Compute per-array chunked shapes based on available memory.

Parameters:
  • requests – Dictionary mapping labels to array requests.

  • chunk_size – Length of chunked arrays along the run axis.

Returns:

Mapping from array labels to their per-chunk shapes.

Return type:

dict[str, tuple[int, …]]

Notes

Unchunkable arrays retain their original shape.
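
For example, assuming the run axis is the leading axis, a chunkable request of shape (1000, 64) with chunk_size=250 yields a per-chunk shape of (250, 64), while an unchunkable request of the same shape is returned unchanged.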

create_host_array(shape: tuple[int, ...], dtype: type, memory_type: str = 'pinned', like: ndarray | None = None) ndarray[source]

Create a C-contiguous host array.

Parameters:
  • shape – Shape of the array to create.

  • dtype – Data type for the array elements.

  • memory_type – Memory type for the host array. Must be "pinned" or "host". Defaults to "pinned".

  • like – A source array to copy data from. If provided, the new array contains the same data as like; otherwise it is filled with zeros.

Returns:

C-contiguous host array.

Return type:

numpy.ndarray
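
Examples

A sketch, with mgr as in the class-level example; src stands for an existing host array:

>>> import numpy as np
>>> h_buf = mgr.create_host_array((1024, 32), np.float32)  # pinned, zero-filled
>>> h_copy = mgr.create_host_array(src.shape, src.dtype, like=src)  # copies src's data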

free(array_label: str) None[source]

Free an allocation by label across all instances.

Parameters:

array_label – Label of the allocation to free.

Return type:

None

free_all() None[source]

Free all allocations across all registered instances.

Return type:

None

from_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]

Copy data from device arrays using the instance’s stream.

Parameters:
  • instance – Instance whose stream to use for copying.

  • from_arrays – Source device arrays to copy from.

  • to_arrays – Destination arrays to copy to.

Return type:

None

get_available_memory(group: str) int[source]

Get available memory for an entire stream group.

Parameters:

group – Name of the stream group.

Returns:

Available memory in bytes for the group.

Return type:

int

Warns:

UserWarning – If group has used more than 95% of allocated memory.

get_chunk_parameters(requests: Dict[str, Dict], axis_length: int, stream_group: str) Tuple[int, int][source]

Calculate number of chunks and chunk size for a dict of array requests.

Chunking is performed along the run axis only.

Parameters:
  • requests – Dictionary mapping instance IDs to their array requests.

  • axis_length – Unchunked length of the chunking axis.

  • stream_group – Name of the stream group making the request.

Returns:

Length of chunked axis and number of chunks needed to fit the request.

Return type:

tuple of int

Warns:

UserWarning – If request exceeds available VRAM by more than 20x.

get_memory_info() tuple[int, int][source]

Get free and total GPU memory information.

Returns:

(free_memory, total_memory) in bytes.

Return type:

tuple of int
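
Examples

A sketch, with mgr as in the class-level example:

>>> free, total = mgr.get_memory_info()
>>> free / total  # fraction of VRAM currently free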

get_stream(instance: object) object[source]

Get the CUDA stream associated with an instance.

Parameters:

instance – Instance to retrieve the stream for.

Returns:

CUDA stream associated with the instance.

Return type:

object

get_stream_group(instance: object) str[source]

Get the name of the stream group for an instance.

Parameters:

instance – Instance to query.

Returns:

Name of the stream group.

Return type:

str

invalidate_all() None[source]

Call each invalidate hook and release all allocations.

Return type:

None

is_grouped(instance: object) bool[source]

Check if instance is grouped with others in a named stream.

Parameters:

instance – Instance to check.

Returns:

True if instance shares a stream group with other instances.

Return type:

bool

property manual_pool_proportion

Total proportion of VRAM currently assigned manually.

proportion(instance: object) float[source]

Get the maximum proportion of VRAM allocated to an instance.

Parameters:

instance – Instance to query.

Returns:

Proportion of VRAM allocated to this instance.

Return type:

float

queue_request(instance: object, requests: dict[str, ArrayRequest]) None[source]

Queue allocation requests for batched stream group processing.

Parameters:
  • instance – The instance making the request.

  • requests – Dictionary mapping labels to array requests.

Notes

Requests are queued per stream group, allowing multiple components to contribute to a single coordinated allocation that can be optimally chunked together.

Return type:

None
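
Examples

A coordination sketch. component_a and component_b share a stream group, and requests_a/requests_b are prebuilt ArrayRequest dicts (all names illustrative); results are delivered through each instance's allocation_ready_hook:

>>> mgr.queue_request(component_a, requests_a)
>>> mgr.queue_request(component_b, requests_b)
>>> mgr.allocate_queue(component_a)  # chunks and allocates both queues together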

register(instance: object, proportion: float | None = None, invalidate_cache_hook: Callable = <function placeholder_invalidate>, allocation_ready_hook: Callable = <function placeholder_dataready>, stream_group: str = 'default') None[source]

Register an instance and configure its memory allocation settings.

Parameters:
  • instance – Instance to register for memory management.

  • proportion – Proportion of VRAM to allocate (0.0 to 1.0). When omitted, the instance joins the automatic allocation pool.

  • invalidate_cache_hook – Function to call when CUDA memory system changes occur.

  • allocation_ready_hook – Function to call when allocations are ready.

  • stream_group – Name of the stream group to assign the instance to.

Raises:

ValueError – If instance is already registered or proportion is not between 0 and 1.

Return type:

None
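
Examples

A sketch; worker and its hook methods are illustrative names:

>>> mgr.register(
...     worker,
...     proportion=0.25,  # manual pool; omit to join the auto pool
...     invalidate_cache_hook=worker.on_invalidate,
...     allocation_ready_hook=worker.on_ready,
...     stream_group="solvers",
... )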

registry: dict[int, InstanceMemorySettings]

Registry mapping instance identifiers to their memory settings.

reinit_streams() None[source]

Reinitialise all streams after a CUDA context reset.

Return type:

None

set_allocator(name: str) None[source]

Set the external memory allocator in Numba.

Parameters:

name – Memory allocator type. Accepted values are "cupy_async" to use CuPy’s AsyncMemoryPool, "cupy" to use MemoryPool, and "default" for Numba’s default manager.

Raises:

ValueError – If allocator name is not recognized.

Warns:

UserWarning – A change to the memory manager requires the CUDA context to be closed and reopened. This invalidates all previously compiled kernels and allocated arrays, requiring a full rebuild.

Return type:

None
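
Examples

A sketch; note the rebuild consequence described in the warning above:

>>> mgr.set_allocator("cupy_async")  # switch to CuPy's AsyncMemoryPool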

set_auto_limit_mode(instance: object) None[source]

Convert a manual-limited instance to auto allocation mode.

Parameters:

instance – Instance to convert to auto mode.

Raises:

ValueError – If instance is already in auto allocation pool.

Return type:

None

set_limit_mode(mode: str) None[source]

Set the memory allocation limiting mode.

Parameters:

mode – Either "passive" or "active" memory management mode.

Raises:

ValueError – If mode is not “passive” or “active”.

Return type:

None

set_manual_limit_mode(instance: object, proportion: float) None[source]

Convert an auto-limited instance to manual allocation mode.

Parameters:
  • instance – Instance to convert to manual mode.

  • proportion – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If instance is already in manual allocation pool.

Return type:

None

set_manual_proportion(instance: object, proportion: float) None[source]

Set manual allocation proportion for an instance.

If instance is currently in the auto-allocation pool, shift it to manual.

Parameters:
  • instance – Instance to update proportion for.

  • proportion – New proportion between 0 and 1.

Raises:

ValueError – If proportion is not between 0 and 1.

Return type:

None
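
Examples

A sketch, with component as in the class-level example:

>>> mgr.set_manual_proportion(component, 0.25)  # shifts component to the manual pool if needed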

stream_groups: StreamGroups

Manager for organizing instances into stream groups.

sync_stream(instance: object) None[source]

Synchronize the CUDA stream for an instance.

Parameters:

instance – Instance whose stream to synchronize.

Return type:

None

to_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]

Copy data to device arrays using the instance’s stream.

Parameters:
  • instance – Instance whose stream to use for copying.

  • from_arrays – Source arrays to copy from.

  • to_arrays – Destination device arrays to copy to.

Return type:

None

totalmem: int

Total GPU memory in bytes.