cubie.memory.mem_manager

Comprehensive GPU memory management system for cubie.

This module provides a singleton interface for managing GPU memory allocation, stream coordination, and memory pool management. It integrates with CuPy memory pools (if desired) and provides passive or active splitting of VRAM amongst different processes/stream groups. All allocation requests made through this interface are “chunked” to fit the allotted memory.
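
For orientation, a hedged end-to-end sketch of this flow follows. The module-level singleton handle (`default_memmgr`), the `component` object, and ArrayRequest’s constructor fields are assumptions; consult the cubie source for the real entry points.

    import numpy as np
    from cubie.memory.array_requests import ArrayRequest

    mgr = default_memmgr                        # hypothetical singleton handle
    mgr.register(component, proportion=None,    # None -> equal auto share
                 stream_group="default")
    requests = {
        "state": ArrayRequest(shape=(100, 10_000, 8), dtype=np.float32,
                              memory_type="device"),  # fields assumed
    }
    mgr.single_request(component, requests, chunk_axis="run")
    # Results arrive via the instance's allocation_ready_hook (ArrayResponse).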

Functions

get_total_request_size(request)

Calculate the total memory size of a request in bytes.

placeholder_dataready(response)

Default placeholder data ready hook that performs no operations.

placeholder_invalidate()

Default placeholder invalidate hook that performs no operations.

Classes

InstanceMemorySettings(proportion, ...)

Memory registry information for a registered instance.

MemoryManager(totalmem, registry, ...)

Singleton interface for managing GPU memory allocation and stream coordination.

class cubie.memory.mem_manager.InstanceMemorySettings(proportion: float = 1.0, allocations: dict = NOTHING, invalidate_hook: ~typing.Callable[[None], None] = <function placeholder_invalidate>, allocation_ready_hook: ~typing.Callable[[~cubie.memory.array_requests.ArrayResponse], None] = <function placeholder_dataready>, cap: int = None)[source]

Bases: object

Memory registry information for a registered instance.

Parameters:
  • proportion (float, default 1.0) – Proportion of total VRAM assigned to this instance.

  • allocations (dict, default empty dict) – Dictionary of current allocations keyed by label.

  • invalidate_hook (callable, default placeholder_invalidate) – Function to call when CUDA memory system changes occur.

  • allocation_ready_hook (callable, default placeholder_dataready) – Function to call when allocations are ready.

  • cap (int or None, default None) – Maximum allocatable bytes for this instance.

proportion

Proportion of total VRAM assigned to this instance.

Type:

float

allocations

Dictionary of current allocations keyed by array label.

Type:

dict

invalidate_hook

Function to call when CUDA memory system changes.

Type:

callable

allocation_ready_hook

Function to call when allocations are ready.

Type:

callable

cap

Maximum allocatable bytes for this instance.

Type:

int or None

Properties
allocated_bytes

Total number of bytes across all allocated arrays for the instance.

Type:

int

Notes

The allocations dictionary serves both as a “keepalive” reference and a way to calculate total allocated memory. The invalidate_hook is called when the allocator/memory manager changes, requiring arrays and kernels to be re-allocated or redefined.
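
A minimal sketch of the keepalive pattern, assuming allocated_bytes sums each stored array’s nbytes (a NumPy array stands in for a device array):

    import numpy as np
    from cubie.memory.mem_manager import InstanceMemorySettings

    settings = InstanceMemorySettings(proportion=0.5)
    arr = np.zeros((100, 32), dtype=np.float64)  # stand-in for a device array
    settings.add_allocation("state", arr)        # keepalive reference by label
    print(settings.allocated_bytes)              # 100 * 32 * 8 = 25600 bytes
    settings.free("state")                       # drop the reference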

add_allocation(key, arr)[source]

Add an allocation to the instance’s allocations dictionary.

Parameters:
  • key (str) – Label for the allocation.

  • arr (array-like) – The allocated array.

Notes

This overwrites any previous allocation stored under the same key. Overwriting works as intended, but it suggests the previous batch was not properly deallocated, so a warning is emitted in this case.

property allocated_bytes

Calculate total allocated bytes across all arrays.

Returns:

Total bytes allocated for this instance.

Return type:

int

allocation_ready_hook: Callable[[ArrayResponse], None]
allocations: dict
cap: int
free(key)[source]

Free an allocation by key.

Parameters:

key (str) – Label of the allocation to free.

Notes

Emits a warning if the key is not found in allocations.

free_all()[source]

Drop all references to allocated arrays.

Return type:

None

invalidate_hook: Callable[[None], None]
proportion: float
class cubie.memory.mem_manager.MemoryManager(totalmem: int = None, registry: dict[int, ~cubie.memory.mem_manager.InstanceMemorySettings] = NOTHING, stream_groups: ~cubie.memory.stream_groups.StreamGroups = NOTHING, mode: str = 'passive', allocator: ~cubie.cudasim_utils.FakeBaseCUDAMemoryManager = <class 'cubie.cudasim_utils.FakeNumbaCUDAMemoryManager'>, auto_pool: list[int] = NOTHING, manual_pool: list[int] = NOTHING, stride_order: tuple[str, str, str] = ('time', 'run', 'variable'), queued_allocations: ~typing.Dict[str, ~typing.Dict] = NOTHING)[source]

Bases: object

Singleton interface for managing GPU memory allocation and stream coordination.

Provides memory management for cubie with support for passive or active memory limiting modes. In passive mode, it simply provides chunked allocations based on available memory. In active mode, it manages VRAM proportions between instances with support for manual and automatic allocation.

Parameters:
  • totalmem (int, optional) – Total GPU memory in bytes. If None, will be determined automatically.

  • registry (dict of int to InstanceMemorySettings, optional) – Registry of instances and their memory settings.

  • stream_groups (StreamGroups, optional) – Manager for organizing instances into stream groups.

  • mode (str, default "passive") – Memory management mode, either “passive” or “active”.

  • allocator (BaseCUDAMemoryManager, default NumbaCUDAMemoryManager) – The memory allocator to use.

  • auto_pool (list of int, optional) – List of instance IDs using automatic memory allocation.

  • manual_pool (list of int, optional) – List of instance IDs using manual memory allocation.

  • stride_order (tuple of str, default ("time", "run", "variable")) – Default stride ordering for 3D arrays.

  • queued_allocations (dict of str to dict, optional) – Queued allocation requests organized by stream group.

totalmem

Total GPU memory in bytes.

Type:

int

registry

Registry of instances and their memory settings.

Type:

dict of int to InstanceMemorySettings

stream_groups

Manager for organizing instances into stream groups.

Type:

StreamGroups

Notes

The MemoryManager accepts ArrayRequest objects and returns ArrayResponse objects with references to allocated arrays and chunking information. Each instance is assigned to a stream group for coordinated operations.

In active management mode, instances can be assigned specific VRAM proportions or automatically allocated equal shares of available memory.
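
The two modes in brief, as a usage sketch (`mgr` stands for the singleton here and in the examples below):

    mgr.set_limit_mode("passive")  # chunk requests against currently free VRAM
    mgr.set_limit_mode("active")   # enforce per-instance proportions and caps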

_add_auto_proportion(instance)[source]

Add an instance to the auto allocation pool with equal share.

Parameters:

instance (object) – Instance to add to auto allocation pool.

Returns:

Proportion assigned to this instance.

Return type:

float

Raises:

ValueError – If the available auto-allocation pool is smaller than the minimum required size.

Notes

Splits the non-manually-allocated portion of VRAM equally among all auto-allocated instances. Triggers rebalancing of the auto pool.

_add_manual_proportion(instance: object, proportion: float)[source]

Add an instance to the manual allocation pool with specified proportion.

Parameters:
  • instance (object) – Instance to add to manual allocation pool.

  • proportion (float) – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If manual proportion would exceed total available memory or leave insufficient memory for auto-allocated processes.

Warning

UserWarning

If manual proportion leaves less than 5% of memory for auto allocation.

Notes

Updates the instance’s proportion and cap, then rebalances the auto pool. Enforces minimum auto pool size constraints.

_check_requests(requests)[source]

Validate that all requests are properly formatted.

Parameters:

requests (dict) – Dictionary of requests to validate.

Raises:

TypeError – If requests is not a dict or contains invalid ArrayRequest objects.

_rebalance_auto_pool()[source]

Redistribute available memory equally among auto-allocated instances.

Notes

Calculates the available proportion after manual allocations and divides it equally among all instances in the auto pool. Updates both proportion and cap for each auto-allocated instance.
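
The rebalancing rule is plain arithmetic; illustratively (all names hypothetical):

    totalmem = 8 * 2**30                       # say, 8 GiB of VRAM
    manual_proportions = [0.4, 0.2]            # manually pinned instances
    auto_pool_size = 2                         # instances sharing the rest

    available = 1.0 - sum(manual_proportions)  # 0.4 left for the auto pool
    per_instance = available / auto_pool_size  # each auto instance gets 0.2
    cap = int(per_instance * totalmem)         # per-instance cap in bytes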

allocate(shape, dtype, memory_type, stream=0, strides=None)[source]

Allocate a single array with specified parameters.

Parameters:
  • shape (tuple of int) – Shape of the array to allocate.

  • dtype (numpy.dtype) – Data type for array elements.

  • memory_type (str) – Type of memory: “device”, “mapped”, “pinned”, or “managed”.

  • stream (Stream, default 0) – CUDA stream for the allocation.

  • strides (tuple of int, optional) – Custom strides for the array.

Returns:

Allocated GPU array.

Return type:

array
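
A hedged usage sketch (`mgr` as above):

    import numpy as np

    host_buf = mgr.allocate(
        shape=(1000, 64),
        dtype=np.float64,
        memory_type="pinned",  # or "device", "mapped", "managed"
    )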

allocate_all(requests, instance_id, stream)[source]

Allocate multiple arrays based on a dictionary of requests.

Parameters:
  • requests (dict of str to ArrayRequest) – Dictionary mapping labels to array requests.

  • instance_id (int) – ID of the requesting instance.

  • stream (Stream) – CUDA stream for the allocations.

Returns:

Dictionary mapping labels to allocated arrays.

Return type:

dict of str to array

allocate_queue(triggering_instance: object, limit_type: str = 'group', chunk_axis: str = 'run')[source]

Process all queued requests for a stream group with coordinated chunking.

Parameters:
  • triggering_instance (object) – The instance that triggered queue processing.

  • limit_type (str, default "group") – Limiting strategy: “group” for aggregate limits or “instance” for individual instance limits.

  • chunk_axis (str, default "run") – Axis along which to chunk arrays if needed.

Notes

Processes all pending requests in the same stream group, applying coordinated chunking based on the specified limit type. Calls allocation_ready_hook for each instance with their results.
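
A sketch of the coordinated flow; the `solver` and `observer` components and their request dictionaries are hypothetical (see single_request below for request construction):

    mgr.queue_request(solver, solver_requests)      # component 1 queues
    mgr.queue_request(observer, observer_requests)  # component 2 queues
    # Process everything queued for the shared stream group in one pass:
    mgr.allocate_queue(observer, limit_type="group", chunk_axis="run")
    # Each instance's allocation_ready_hook receives its ArrayResponse.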

property auto_pool_proportion

Get total proportion of VRAM automatically distributed.

Returns:

Sum of all automatic allocation proportions.

Return type:

float

cap(instance)[source]

Get the maximum allocatable bytes for an instance.

Parameters:

instance (object) – Instance to query.

Returns:

Maximum allocatable bytes for this instance.

Return type:

int

change_stream_group(instance, new_group)[source]

Move instance to another stream group.

Parameters:
  • instance (object) – The instance to move.

  • new_group (str) – Name of the new stream group.

chunk_arrays(requests: dict[str, ArrayRequest], numchunks: int, axis: str = 'run')[source]

Divide array requests into smaller chunks along a specified axis.

Parameters:
  • requests (dict of str to ArrayRequest) – Dictionary mapping labels to array requests.

  • numchunks (int) – Number of chunks to divide arrays into.

  • axis (str, default "run") – Axis name along which to chunk the arrays.

Returns:

New dictionary with modified array shapes for chunking.

Return type:

dict of str to ArrayRequest

Notes

The axis must match a label in the stride ordering. Chunking is done conservatively with ceiling division to ensure no data is lost.
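
The conservative rule in isolation (pure arithmetic, not the manager’s actual code):

    from math import ceil

    n_runs, numchunks = 1000, 3
    chunk_len = ceil(n_runs / numchunks)  # 334: three chunks cover >= 1000 runs
    # A ("time", "run", "variable") request of shape (T, 1000, V)
    # becomes (T, 334, V) per chunk.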

free(array_label: str)[source]

Free an allocation by label across all instances.

Parameters:

array_label (str) – Label of the allocation to free.

free_all()[source]

Free all allocations across all registered instances.

from_device(instance: object, from_arrays: list, to_arrays: list)[source]

Copy data from device arrays using the instance’s stream.

Parameters:
  • instance (object) – Instance whose stream to use for copying.

  • from_arrays (list) – Source device arrays to copy from.

  • to_arrays (list) – Destination arrays to copy to.

get_available_group(group: str)[source]

Get available memory for an entire stream group.

Parameters:

group (str) – Name of the stream group.

Returns:

Available memory in bytes for the group.

Return type:

int

Warning

UserWarning

If group has used more than 95% of allocated memory.

get_available_single(instance_id)[source]

Get available memory for a single instance.

Parameters:

instance_id (int) – ID of the instance to check.

Returns:

Available memory in bytes for this instance.

Return type:

int

Warning

UserWarning

If instance has used more than 95% of allocated memory.

get_chunks(request_size: int, available: int = 0)[source]

Calculate number of chunks needed for a memory request.

Parameters:
  • request_size (int) – Total size of the request in bytes.

  • available (int, default 0) – Available memory in bytes.

Returns:

Number of chunks needed to fit the request.

Return type:

int

Warning

UserWarning

If request exceeds available VRAM by more than 20x.
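
A plausible reading of the computation (an assumption, but consistent with the ceiling-division note under chunk_arrays):

    from math import ceil

    request_size = 10 * 2**30                   # 10 GiB requested
    available = 4 * 2**30                       # 4 GiB permitted
    numchunks = ceil(request_size / available)  # 3 chunks fit the request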

get_memory_info()[source]

Get free and total GPU memory information.

Returns:

(free_memory, total_memory) in bytes.

Return type:

tuple of int

get_stream(instance)[source]

Get the CUDA stream associated with an instance.

Parameters:

instance (object) – The instance to get the stream for.

Returns:

CUDA stream associated with the instance.

Return type:

Stream

get_stream_group(instance)[source]

Get the name of the stream group for an instance.

Parameters:

instance (object) – Instance to query.

Returns:

Name of the stream group.

Return type:

str

get_strides(request)[source]

Calculate memory strides for a given access pattern (stride order).

Parameters:

request (ArrayRequest) – Array request to calculate strides for.

Returns:

Stride tuple for the array, or None if no custom strides needed.

Return type:

tuple or None

Notes

Only 3D arrays get custom stride optimization. 2D arrays use default strides as they are not performance-critical.
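
To illustrate what a stride order means for a 3D array, here is the standard row-major stride computation under the default (“time”, “run”, “variable”) ordering; this is a sketch, not the manager’s exact code:

    import numpy as np

    shape = {"time": 100, "run": 64, "variable": 8}
    order = ("time", "run", "variable")  # last-listed axis varies fastest
    itemsize = np.dtype(np.float64).itemsize

    strides, step = {}, itemsize
    for name in reversed(order):
        strides[name] = step
        step *= shape[name]
    print([strides[n] for n in order])   # [4096, 64, 8] bytes per step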

invalidate_all()[source]

Call each registered instance’s invalidate hook and free all allocations.

is_grouped(instance)[source]

Check if instance is grouped with others in a named stream.

Parameters:

instance (object) – Instance to check.

Returns:

True if instance shares a stream group with other instances.

Return type:

bool

property manual_pool_proportion

Get total proportion of VRAM currently manually assigned.

Returns:

Sum of all manual allocation proportions.

Return type:

float

proportion(instance)[source]

Get the maximum proportion of VRAM allocated to an instance.

Parameters:

instance (object) – Instance to query.

Returns:

Proportion of VRAM allocated to this instance.

Return type:

float

queue_request(instance, requests: dict[str, ArrayRequest])[source]

Queue allocation requests for batched stream group processing.

Parameters:
  • instance (object) – The instance making the request.

  • requests (dict of str to ArrayRequest) – Dictionary mapping labels to array requests.

Notes

Requests are queued per stream group, allowing multiple components to contribute to a single coordinated allocation that can be optimally chunked together.

register(instance, proportion: float | None = None, invalidate_cache_hook: ~typing.Callable = <function placeholder_invalidate>, allocation_ready_hook: ~typing.Callable = <function placeholder_dataready>, stream_group: str = 'default')[source]

Register an instance and configure its memory allocation settings.

Parameters:
  • instance (object) – The instance to register for memory management.

  • proportion (float, optional) – Proportion of VRAM to allocate (0.0 to 1.0). If None, instance will be automatically assigned an equal portion with other auto-assigned instances.

  • invalidate_cache_hook (callable, optional) – Function to call when CUDA memory system changes occur.

  • allocation_ready_hook (callable, optional) – Function to call when allocations are ready.

  • stream_group (str, default "default") – Name of the stream group to assign the instance to.

Raises:

ValueError – If instance is already registered or proportion is not between 0 and 1.
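
A registration sketch with both hooks supplied; the `solver` object and hook bodies are hypothetical:

    def on_invalidate():
        pass  # drop cached kernels and device arrays here

    def on_ready(response):
        pass  # keep references from the ArrayResponse; note chunk info

    mgr.register(solver, proportion=0.25,
                 invalidate_cache_hook=on_invalidate,
                 allocation_ready_hook=on_ready,
                 stream_group="integrators")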

registry: dict[int, InstanceMemorySettings]
reinit_streams()[source]

Reinitialize all streams after CUDA context reset.

set_allocator(name: str)[source]

Set the external memory allocator in Numba.

Parameters:

name (str) – Memory allocator type:
  • “cupy_async” – Use CuPy’s MemoryAsyncPool (experimental).

  • “cupy” – Use CuPy’s MemoryPool.

  • “default” – Use Numba’s default memory manager.

Raises:

ValueError – If allocator name is not recognized.

Warning

UserWarning

A change to the memory manager requires the CUDA context to be closed and reopened. This invalidates all previously compiled kernels and allocated arrays, requiring a full rebuild.
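
Usage sketch; note the rebuild cost described in the warning:

    mgr.set_allocator("cupy")  # route allocations through CuPy's MemoryPool
    # Invalidate hooks fire; kernels recompile and arrays reallocate on next use.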

set_auto_limit_mode(instance)[source]

Convert a manual-limited instance to auto allocation mode.

Parameters:

instance (object) – Instance to convert to auto mode.

Raises:

ValueError – If instance is already in auto allocation pool.

set_global_stride_ordering(ordering: tuple[str, str, str])[source]

Set the global memory stride ordering for arrays.

Parameters:

ordering (tuple of str) – Tuple containing ‘time’, ‘run’, and ‘variable’ in desired order.

Raises:

ValueError – If ordering doesn’t contain exactly ‘time’, ‘run’, and ‘variable’.

Notes

This invalidates all current allocations as arrays need to be reallocated with new stride patterns.

set_limit_mode(mode: str)[source]

Set the memory allocation limiting mode.

Parameters:

mode (str) – Either “passive” or “active” memory management mode.

Raises:

ValueError – If mode is not “passive” or “active”.

set_manual_limit_mode(instance: object, proportion: float)[source]

Convert an auto-limited instance to manual allocation mode.

Parameters:
  • instance (object) – Instance to convert to manual mode.

  • proportion (float) – Memory proportion to assign (0.0 to 1.0).

Raises:

ValueError – If instance is already in manual allocation pool.

set_manual_proportion(instance: object, proportion: float)[source]

Set manual allocation proportion for an instance.

If the instance is currently in the auto-allocation pool, it is shifted to manual.

Parameters:
  • instance (object) – Instance to update the proportion for.

  • proportion (float) – New proportion between 0 and 1.

Raises:

ValueError – If proportion is not between 0 and 1.

single_request(instance: object | int, requests: dict[str, ArrayRequest], chunk_axis: str = 'run')[source]

Process a single allocation request with automatic chunking.

Parameters:
  • instance (object or int) – The requesting instance or its ID.

  • requests (dict of str to ArrayRequest) – Dictionary mapping labels to array requests.

  • chunk_axis (str, default "run") – Axis along which to chunk if memory is insufficient.

Raises:

TypeError – If requests is not a dict or contains invalid ArrayRequest objects.

Notes

This method calculates available memory, determines chunking needs, allocates arrays with optimal strides, and calls the instance’s allocation_ready_hook with the results.
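
A one-shot request sketch; ArrayRequest’s constructor fields are assumed from the allocate() signature (the real definition lives in cubie.memory.array_requests):

    import numpy as np
    from cubie.memory.array_requests import ArrayRequest

    requests = {
        "state": ArrayRequest(shape=(100, 4096, 8), dtype=np.float32,
                              memory_type="device"),  # fields assumed
    }
    mgr.single_request(solver, requests, chunk_axis="run")
    # If the request exceeds available memory, arrays come back chunked
    # along the "run" axis via the instance's allocation_ready_hook.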

stream_groups: StreamGroups
sync_stream(instance)[source]

Synchronize the CUDA stream for an instance.

Parameters:

instance (object) – Instance whose stream to synchronize.

to_device(instance: object, from_arrays: list, to_arrays: list)[source]

Copy data to device arrays using the instance’s stream.

Parameters:
  • instance (object) – Instance whose stream to use for copying.

  • from_arrays (list) – Source arrays to copy from.

  • to_arrays (list) – Destination device arrays to copy to.

totalmem: int
cubie.memory.mem_manager.get_total_request_size(request: dict[str, ArrayRequest])[source]

Calculate the total memory size of a request in bytes.

Parameters:

request (dict of str to ArrayRequest) – Dictionary of array requests to sum.

Returns:

Total size in bytes across all requests.

Return type:

int
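
The likely computation, standalone: element count times itemsize, summed over requests (an assumption about ArrayRequest internals):

    import numpy as np

    shapes = {"state": (100, 4096, 8), "obs": (100, 4096, 2)}
    dtype = np.dtype(np.float32)
    total = sum(int(np.prod(s)) * dtype.itemsize for s in shapes.values())
    print(total)  # 13_107_200 + 3_276_800 = 16_384_000 bytes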

cubie.memory.mem_manager.placeholder_dataready(response: ArrayResponse) → None[source]

Default placeholder data ready hook that performs no operations.

Parameters:

response (ArrayResponse) – Array response object (unused).

Return type:

None

cubie.memory.mem_manager.placeholder_invalidate() → None[source]

Default placeholder invalidate hook that performs no operations.

Return type:

None