MemoryManager
- class cubie.memory.mem_manager.MemoryManager(totalmem: int = None, registry: dict[int, InstanceMemorySettings] = NOTHING, stream_groups: StreamGroups = NOTHING, mode: str = 'passive', allocator: FakeBaseCUDAMemoryManager = <class 'cubie.cuda_simsafe.FakeNumbaCUDAMemoryManager'>, auto_pool: list[int] = NOTHING, manual_pool: list[int] = NOTHING, queued_allocations: Dict[str, Dict] = NOTHING)[source]
Bases: object
Singleton interface coordinating GPU memory allocation and stream usage.
- Parameters:
totalmem (int) – Total GPU memory in bytes. Determined automatically when omitted.
registry (dict[int, cubie.memory.mem_manager.InstanceMemorySettings]) – Registry mapping instance identifiers to their memory settings.
stream_groups (cubie.memory.stream_groups.StreamGroups) – Manager for organizing instances into stream groups.
_mode (str) – Memory management mode, either "passive" or "active".
_allocator (cubie.cuda_simsafe.FakeBaseCUDAMemoryManager) – Memory allocator class registered with Numba.
_auto_pool (list[int]) – List of instance identifiers using automatic memory allocation.
_manual_pool (list[int]) – List of instance identifiers using manual memory allocation.
_queued_allocations (Dict[str, Dict]) – Queued allocation requests organized by stream group.
Notes
The manager accepts ArrayRequest objects and returns ArrayResponse instances that reference allocated arrays and chunking information. Active mode enforces per-instance VRAM proportions, while passive mode mirrors standard allocation behaviour, using chunking only when necessary.
- _add_auto_proportion(instance: object) float[source]
Add an instance to the auto allocation pool with equal share.
- Parameters:
instance – Instance to add to auto allocation pool.
- Returns:
Proportion assigned to this instance.
- Return type:
float
- Raises:
ValueError – If available auto-allocation pool is less than minimum required size.
Notes
Splits the non-manually-allocated portion of VRAM equally among all auto-allocated instances. Triggers rebalancing of the auto pool.
- _add_manual_proportion(instance: object, proportion: float) None[source]
Add an instance to the manual allocation pool with the specified proportion.
- Parameters:
instance – Instance to add to manual allocation pool.
proportion – Memory proportion to assign (0.0 to 1.0).
- Raises:
ValueError – If manual proportion would exceed total available memory or leave insufficient memory for auto-allocated processes.
Warning
- UserWarning
If manual proportion leaves less than 5% of memory for auto allocation.
Notes
Updates the instance’s proportion and cap, then rebalances the auto pool. Enforces minimum auto pool size constraints.
- Return type:
None
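The constraint checks above can be sketched as follows. This is an illustrative reconstruction, not cubie's implementation: the function name, the `manual` dictionary, and the `MIN_AUTO_SHARE` constant are assumptions standing in for MemoryManager's internal state.

```python
import warnings

# Assumed minimum share reserved for the auto pool; the real constant
# lives inside MemoryManager.
MIN_AUTO_SHARE = 0.01


def add_manual_proportion(manual: dict, instance: str,
                          proportion: float) -> None:
    """Assign a manual share, enforcing the documented constraints."""
    remaining = 1.0 - sum(manual.values()) - proportion
    if remaining < MIN_AUTO_SHARE:
        # Mirrors the documented ValueError for oversubscription.
        raise ValueError(
            "manual proportion leaves insufficient memory for "
            "auto-allocated processes"
        )
    if remaining < 0.05:
        # Mirrors the documented UserWarning at the 5% threshold.
        warnings.warn(
            "less than 5% of memory left for auto allocation", UserWarning
        )
    manual[instance] = proportion
```

After a successful call, the auto pool would be rebalanced over the `remaining` share.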
- _check_requests(requests: dict[str, ArrayRequest]) None[source]
Validate that all requests are properly formatted.
- Parameters:
requests – Dictionary of requests to validate.
- Raises:
TypeError – If requests is not a dict or contains invalid ArrayRequest objects.
- Return type:
None
- _rebalance_auto_pool() None[source]
Redistribute available memory equally among auto-allocated instances.
Notes
Calculates the available proportion after manual allocations and divides it equally among all instances in the auto pool. Updates both proportion and cap for each auto-allocated instance.
- Return type:
None
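The redistribution arithmetic described in the notes reduces to an equal split of whatever manual allocations leave behind. A minimal sketch (hypothetical function, standing in for the registry update the real method performs):

```python
def rebalance_auto_pool(manual_total: float, auto_pool: list) -> dict:
    """Split the non-manual share of VRAM equally across the auto pool.

    manual_total is the summed proportion of all manual instances;
    the returned dict maps each auto instance to its new proportion.
    """
    available = 1.0 - manual_total
    share = available / len(auto_pool)
    return {instance: share for instance in auto_pool}
```

In the real manager each instance's byte cap is then recomputed as its proportion times total VRAM.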
- allocate(shape: tuple[int, ...], dtype: Callable, memory_type: str, stream: cuda.cudadrv.driver.Stream = 0) object[source]
Allocate a single C-contiguous array with specified parameters.
- Parameters:
shape – Shape of the array to allocate.
dtype – Constructor returning the precision object for the array elements.
memory_type – Type of memory: “device”, “mapped”, “pinned”, or “managed”.
stream – CUDA stream for the allocation. Defaults to 0.
- Returns:
Allocated GPU array.
- Return type:
object
- Raises:
ValueError – If memory_type is not recognized.
NotImplementedError – If memory_type is “managed” (not supported).
- allocate_all(requests: dict[str, ArrayRequest], instance_id: int, stream: cuda.cudadrv.driver.Stream) dict[str, object][source]
Allocate multiple arrays based on a dictionary of requests.
- Parameters:
requests – Dictionary mapping labels to array requests.
instance_id – ID of the requesting instance.
stream – CUDA stream for the allocations.
- Returns:
Dictionary mapping labels to allocated arrays.
- Return type:
dict of str to object
- allocate_queue(triggering_instance: object) None[source]
Process all queued requests for a stream group with coordinated chunking.
Chunking is always performed along the run axis when memory constraints require splitting the batch.
- Parameters:
triggering_instance – The instance that triggered queue processing.
Notes
Processes all pending requests in the same stream group, applying coordinated chunking based on available memory. Calls allocation_ready_hook for each instance with their results.
- Return type:
None
- property auto_pool_proportion
Total proportion of VRAM currently distributed automatically.
- cap(instance: object) int | None[source]
Get the maximum allocatable bytes for an instance.
- Parameters:
instance – Instance to query.
- Returns:
Maximum allocatable bytes for this instance.
- Return type:
int or None
- change_stream_group(instance: object, new_group: str) None[source]
Move instance to another stream group.
- Parameters:
instance – Instance to move.
new_group – Name of the new stream group.
- Return type:
None
- compute_chunked_shapes(requests: dict[str, ArrayRequest], chunk_size: int) dict[str, Tuple[int, ...]][source]
Compute per-array chunked shapes based on available memory.
- Parameters:
requests – Dictionary mapping labels to array requests.
chunk_size – Length of chunked arrays along the run axis.
- Returns:
Mapping from array labels to their per-chunk shapes.
- Return type:
dict of str to tuple of int
Notes
Unchunkable arrays retain their original shape.
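The shape computation can be sketched as below. This is a simplified stand-in, assuming the run axis is axis 0 and representing each request as a `(shape, unchunkable)` pair; the real ArrayRequest carries more fields.

```python
def chunked_shapes(requests: dict, chunk_size: int,
                   run_axis: int = 0) -> dict:
    """Map each array label to its per-chunk shape."""
    shapes = {}
    for label, (shape, unchunkable) in requests.items():
        if unchunkable:
            # Unchunkable arrays retain their original shape.
            shapes[label] = shape
        else:
            new = list(shape)
            new[run_axis] = min(chunk_size, shape[run_axis])
            shapes[label] = tuple(new)
    return shapes
```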
- create_host_array(shape: tuple[int, ...], dtype: type, memory_type: str = 'pinned', like: ndarray | None = None) ndarray[source]
Create a C-contiguous host array.
- Parameters:
shape – Shape of the array to create.
dtype – Data type for the array elements.
memory_type – Memory type for the host array. Must be "pinned" or "host". Defaults to "pinned".
like – A source array to copy data from. If provided, the new array contains the same data as like; otherwise it is filled with zeros.
- Returns:
C-contiguous host array.
- Return type:
numpy.ndarray
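The `like` semantics can be sketched with plain NumPy (pinned allocation omitted; the function name is hypothetical):

```python
import numpy as np


def make_host_array(shape, dtype, like=None):
    """Create a C-contiguous host array, zero-filled or copied from like."""
    arr = np.zeros(shape, dtype=dtype, order="C")
    if like is not None:
        arr[...] = like  # copy source data into the new array
    return arr
```

The real method additionally routes "pinned" requests through page-locked allocation so host-to-device copies can be asynchronous.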
- free(array_label: str) None[source]
Free an allocation by label across all instances.
- Parameters:
array_label – Label of the allocation to free.
- Return type:
None
- from_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]
Copy data from device arrays using the instance’s stream.
- Parameters:
instance – Instance whose stream to use for copying.
from_arrays – Source device arrays to copy from.
to_arrays – Destination arrays to copy to.
- Return type:
None
- get_available_memory(group: str) int[source]
Get available memory for an entire stream group.
- Parameters:
group – Name of the stream group.
- Returns:
Available memory in bytes for the group.
- Return type:
int
Warning
- UserWarning
If group has used more than 95% of allocated memory.
- get_chunk_parameters(requests: Dict[str, Dict], axis_length: int, stream_group: str) Tuple[int, int][source]
Calculate number of chunks and chunk size for a dict of array requests.
Chunking is performed along the run axis only.
- Parameters:
requests – Dictionary mapping instance IDs to their array requests.
axis_length – Unchunked length of the chunking axis.
stream_group – Name of the stream group making the request.
- Returns:
Length of chunked axis and number of chunks needed to fit the request.
- Return type:
tuple of int
Warning
- UserWarning
If request exceeds available VRAM by more than 20x.
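The chunk arithmetic reduces to a back-of-envelope calculation like the following. This is an assumed simplification (the real method sums bytes across every request and dtype in the group):

```python
import math


def chunk_parameters(bytes_per_run: int, axis_length: int,
                     available_bytes: int) -> tuple:
    """Return (chunk_size, num_chunks) so each chunk fits in memory."""
    total = bytes_per_run * axis_length
    # Number of pieces the run axis must be split into.
    num_chunks = max(1, math.ceil(total / available_bytes))
    # Length of the run axis within each chunk.
    chunk_size = math.ceil(axis_length / num_chunks)
    return chunk_size, num_chunks
```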
- get_stream(instance: object) object[source]
Get the CUDA stream associated with an instance.
- Parameters:
instance – Instance to retrieve the stream for.
- Returns:
CUDA stream associated with the instance.
- Return type:
object
- get_stream_group(instance: object) str[source]
Get the name of the stream group for an instance.
- Parameters:
instance – Instance to query.
- Returns:
Name of the stream group.
- Return type:
str
- invalidate_all() None[source]
Call each invalidate hook and release all allocations.
- Return type:
None
- is_grouped(instance: object) bool[source]
Check if instance is grouped with others in a named stream.
- Parameters:
instance – Instance to check.
- Returns:
True if instance shares a stream group with other instances.
- Return type:
bool
- property manual_pool_proportion
Total proportion of VRAM currently assigned manually.
- proportion(instance: object) float[source]
Get the maximum proportion of VRAM allocated to an instance.
- Parameters:
instance – Instance to query.
- Returns:
Proportion of VRAM allocated to this instance.
- Return type:
float
- queue_request(instance: object, requests: dict[str, ArrayRequest]) None[source]
Queue allocation requests for batched stream group processing.
- Parameters:
instance – The instance making the request.
requests – Dictionary mapping labels to array requests.
Notes
Requests are queued per stream group, allowing multiple components to contribute to a single coordinated allocation that can be optimally chunked together.
- Return type:
None
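The queue-then-flush pattern behind queue_request and allocate_queue can be sketched as below. Names and structure are hypothetical; in cubie the flush step is where coordinated chunking and the allocation_ready_hook callbacks happen.

```python
from collections import defaultdict


class RequestQueue:
    """Accumulate per-group requests, then process them together."""

    def __init__(self):
        # group name -> {label: request}
        self._queued = defaultdict(dict)

    def queue_request(self, group: str, requests: dict) -> None:
        """Merge a component's requests into its group's pending batch."""
        self._queued[group].update(requests)

    def allocate_queue(self, group: str) -> dict:
        """Flush the group's batch; coordinated chunking would go here."""
        pending = self._queued.pop(group, {})
        return {label: ("allocated", req) for label, req in pending.items()}
```

Batching per group is what lets arrays from several components share a single chunk size along the run axis.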
- register(instance: object, proportion: float | None = None, invalidate_cache_hook: Callable = <function placeholder_invalidate>, allocation_ready_hook: Callable = <function placeholder_dataready>, stream_group: str = 'default') None[source]
Register an instance and configure its memory allocation settings.
- Parameters:
instance – Instance to register for memory management.
proportion – Proportion of VRAM to allocate (0.0 to 1.0). When omitted, the instance joins the automatic allocation pool.
invalidate_cache_hook – Function to call when CUDA memory system changes occur.
allocation_ready_hook – Function to call when allocations are ready.
stream_group – Name of the stream group to assign the instance to.
- Raises:
ValueError – If instance is already registered or proportion is not between 0 and 1.
- Return type:
None
- reinit_streams() None[source]
Reinitialise all streams after a CUDA context reset.
- Return type:
None
- set_allocator(name: str) None[source]
Set the external memory allocator in Numba.
- Parameters:
name – Memory allocator type. Accepted values are "cupy_async" to use CuPy's AsyncMemoryPool, "cupy" to use MemoryPool, and "default" for Numba's default manager.
- Raises:
ValueError – If allocator name is not recognized.
Warning
- UserWarning
A change to the memory manager requires the CUDA context to be closed and reopened. This invalidates all previously compiled kernels and allocated arrays, requiring a full rebuild.
- Return type:
None
- set_auto_limit_mode(instance: object) None[source]
Convert a manual-limited instance to auto allocation mode.
- Parameters:
instance – Instance to convert to auto mode.
- Raises:
ValueError – If instance is already in auto allocation pool.
- Return type:
None
- set_limit_mode(mode: str) None[source]
Set the memory allocation limiting mode.
- Parameters:
mode – Either "passive" or "active" memory management mode.
- Raises:
ValueError – If mode is not “passive” or “active”.
- Return type:
None
- set_manual_limit_mode(instance: object, proportion: float) None[source]
Convert an auto-limited instance to manual allocation mode.
- Parameters:
instance – Instance to convert to manual mode.
proportion – Memory proportion to assign (0.0 to 1.0).
- Raises:
ValueError – If instance is already in manual allocation pool.
- Return type:
None
- set_manual_proportion(instance: object, proportion: float) None[source]
Set manual allocation proportion for an instance.
If instance is currently in the auto-allocation pool, shift it to manual.
- Parameters:
instance – Instance to update proportion for.
proportion – New proportion between 0 and 1.
- Raises:
ValueError – If proportion is not between 0 and 1.
- Return type:
None
- stream_groups: StreamGroups
- sync_stream(instance: object) None[source]
Synchronize the CUDA stream for an instance.
- Parameters:
instance – Instance whose stream to synchronize.
- Return type:
None
- to_device(instance: object, from_arrays: list[object], to_arrays: list[object]) None[source]
Copy data to device arrays using the instance’s stream.
- Parameters:
instance – Instance whose stream to use for copying.
from_arrays – Source arrays to copy from.
to_arrays – Destination device arrays to copy to.
- Return type:
None