cubie.memory.mem_manager
Comprehensive GPU memory management system for cubie.
This module provides a singleton interface for managing GPU memory allocation, stream coordination, and memory pool management. It integrates with CuPy memory pools (if desired) and provides passive or active splitting of VRAM among different processes/stream groups. All allocation requests made through this interface are “chunked” to fit the allotted memory.
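For example, a minimal end-to-end sketch of the request flow described above. The MemoryManager constructor, register, and single_request signatures are taken from this page; the ArrayRequest field names and the Solver object are illustrative assumptions.

```python
import numpy as np
from cubie.memory.mem_manager import MemoryManager
from cubie.memory.array_requests import ArrayRequest  # field names assumed

mgr = MemoryManager()  # singleton interface

class Solver:
    def on_ready(self, response):
        # ArrayResponse carries the allocated arrays and chunking info.
        self.buffers = response

solver = Solver()
mgr.register(solver, allocation_ready_hook=solver.on_ready,
             stream_group="solvers")

# Request a (time, run, variable) state array; the manager chunks it
# along "run" if it does not fit the allotted memory.
request = {"state": ArrayRequest(shape=(1000, 256, 3),
                                 dtype=np.float64,
                                 memory_type="device")}
mgr.single_request(solver, request, chunk_axis="run")
```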
Functions
- get_total_request_size: Calculate the total memory size of a request in bytes.
- placeholder_dataready: Default placeholder data ready hook that performs no operations.
- placeholder_invalidate: Default placeholder invalidate hook that performs no operations.
Classes
- InstanceMemorySettings: Memory registry information for a registered instance.
- MemoryManager: Singleton interface for managing GPU memory allocation and stream coordination.
- class cubie.memory.mem_manager.InstanceMemorySettings(proportion: float = 1.0, allocations: dict = NOTHING, invalidate_hook: Callable[[None], None] = <function placeholder_invalidate>, allocation_ready_hook: Callable[[ArrayResponse], None] = <function placeholder_dataready>, cap: int = None)[source]
Bases: object
Memory registry information for a registered instance.
- Parameters:
proportion (float, default 1.0) – Proportion of total VRAM assigned to this instance.
allocations (dict, default empty dict) – Dictionary of current allocations keyed by label.
invalidate_hook (callable, default placeholder_invalidate) – Function to call when CUDA memory system changes occur.
allocation_ready_hook (callable, default placeholder_dataready) – Function to call when allocations are ready.
cap (int or None, default None) – Maximum allocatable bytes for this instance.
- invalidate_hook
Function to call when CUDA memory system changes.
- Type:
callable
- allocation_ready_hook
Function to call when allocations are ready.
- Type:
callable
Notes
The allocations dictionary serves both as a “keepalive” reference and a way to calculate total allocated memory. The invalidate_hook is called when the allocator/memory manager changes, requiring arrays and kernels to be re-allocated or redefined.
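A short sketch of this keepalive pattern, using NumPy arrays as stand-ins for device arrays; the constructor, method, and property are the ones documented here.

```python
import numpy as np
from cubie.memory.mem_manager import InstanceMemorySettings

settings = InstanceMemorySettings(proportion=0.5)
# Each allocation is held by label, keeping the array alive and
# letting allocated_bytes sum over the arrays' byte sizes.
settings.add_allocation("state", np.zeros((100, 3), dtype=np.float64))
settings.add_allocation("observables", np.zeros((100, 2), dtype=np.float64))
print(settings.allocated_bytes)  # 2400 + 1600 = 4000 bytes
```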
- add_allocation(key, arr)[source]
Add an allocation to the instance’s allocations dictionary.
- Parameters:
key (str) – Label for the allocation.
arr (array-like) – The allocated array.
Notes
Overwriting a previous allocation with the same key works as intended, but usually indicates that the previous batch was not properly deallocated, so a warning is emitted in this case.
- property allocated_bytes
Calculate total allocated bytes across all arrays.
- Returns:
Total bytes allocated for this instance.
- Return type:
int
- allocation_ready_hook: Callable[[ArrayResponse], None]
- class cubie.memory.mem_manager.MemoryManager(totalmem: int = None, registry: dict[int, InstanceMemorySettings] = NOTHING, stream_groups: StreamGroups = NOTHING, mode: str = 'passive', allocator: FakeBaseCUDAMemoryManager = <class 'cubie.cudasim_utils.FakeNumbaCUDAMemoryManager'>, auto_pool: list[int] = NOTHING, manual_pool: list[int] = NOTHING, stride_order: tuple[str, str, str] = ('time', 'run', 'variable'), queued_allocations: Dict[str, Dict] = NOTHING)[source]
Bases: object
Singleton interface for managing GPU memory allocation and stream coordination.
Provides memory management for cubie with support for passive or active memory limiting modes. In passive mode, it simply provides chunked allocations based on available memory. In active mode, it manages VRAM proportions between instances with support for manual and automatic allocation.
- Parameters:
totalmem (int, optional) – Total GPU memory in bytes. If None, will be determined automatically.
registry (dict of int to InstanceMemorySettings, optional) – Registry of instances and their memory settings.
stream_groups (StreamGroups, optional) – Manager for organizing instances into stream groups.
mode (str, default "passive") – Memory management mode, either “passive” or “active”.
allocator (BaseCUDAMemoryManager, default NumbaCUDAMemoryManager) – The memory allocator to use.
auto_pool (list of int, optional) – List of instance IDs using automatic memory allocation.
manual_pool (list of int, optional) – List of instance IDs using manual memory allocation.
stride_order (tuple of str, default ("time", "run", "variable")) – Default stride ordering for 3D arrays.
queued_allocations (dict of str to dict, optional) – Queued allocation requests organized by stream group.
- registry
Registry of instances and their memory settings.
- Type:
dict of int to InstanceMemorySettings
- stream_groups
Manager for organizing instances into stream groups.
- Type:
StreamGroups
Notes
The MemoryManager accepts ArrayRequest objects and returns ArrayResponse objects with references to allocated arrays and chunking information. Each instance is assigned to a stream group for coordinated operations.
In active management mode, instances can be assigned specific VRAM proportions or automatically allocated equal shares of available memory.
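A sketch of the active-mode bookkeeping; the method names are the ones documented on this page, and the registered objects are illustrative stand-ins.

```python
from cubie.memory.mem_manager import MemoryManager

mgr = MemoryManager()
mgr.set_limit_mode("active")

a, b, c = object(), object(), object()
mgr.register(a, proportion=0.5)  # manual pool: fixed 50% of VRAM
mgr.register(b)                  # auto pool
mgr.register(c)                  # auto pool

# The remaining 0.5 is split equally, so b and c each hold 0.25;
# changing a's manual proportion later triggers a rebalance.
```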
- _add_auto_proportion(instance)[source]
Add an instance to the auto allocation pool with equal share.
- Parameters:
instance (object) – Instance to add to auto allocation pool.
- Returns:
Proportion assigned to this instance.
- Return type:
float
- Raises:
ValueError – If available auto-allocation pool is less than minimum required size.
Notes
Splits the non-manually-allocated portion of VRAM equally among all auto-allocated instances. Triggers rebalancing of the auto pool.
- _add_manual_proportion(instance: object, proportion: float)[source]
Add an instance to the manual allocation pool with specified proportion.
- Parameters:
instance (object) – Instance to add to the manual allocation pool.
proportion (float) – Proportion of total VRAM (between 0 and 1) to assign.
- Raises:
ValueError – If manual proportion would exceed total available memory or leave insufficient memory for auto-allocated processes.
Warning
- UserWarning
If manual proportion leaves less than 5% of memory for auto allocation.
Notes
Updates the instance’s proportion and cap, then rebalances the auto pool. Enforces minimum auto pool size constraints.
- _rebalance_auto_pool()[source]
Redistribute available memory equally among auto-allocated instances.
Notes
Calculates the available proportion after manual allocations and divides it equally among all instances in the auto pool. Updates both proportion and cap for each auto-allocated instance.
- allocate(shape, dtype, memory_type, stream=0, strides=None)[source]
Allocate a single array with specified parameters.
- Parameters:
shape (tuple of int) – Shape of the array to allocate.
dtype (numpy.dtype) – Data type for array elements.
memory_type (str) – Type of memory: “device”, “mapped”, “pinned”, or “managed”.
stream (Stream, default 0) – CUDA stream for the allocation.
strides (tuple of int, optional) – Custom strides for the array.
- Returns:
Allocated GPU array.
- Return type:
array
- Raises:
ValueError – If memory_type is not recognized.
NotImplementedError – If memory_type is “managed” (not yet supported).
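A minimal call sketch, assuming a MemoryManager instance mgr as in the earlier examples:

```python
import numpy as np

# Allocate a 3D device array on the default stream; "managed" would
# raise NotImplementedError per the docs above.
arr = mgr.allocate(
    shape=(1000, 32, 8),
    dtype=np.float64,
    memory_type="device",
)
```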
- allocate_all(requests, instance_id, stream)[source]
Allocate multiple arrays based on a dictionary of requests.
- allocate_queue(triggering_instance: object, limit_type: str = 'group', chunk_axis: str = 'run')[source]
Process all queued requests for a stream group with coordinated chunking.
- Parameters:
triggering_instance (object) – Instance whose stream group’s queue is processed.
limit_type (str, default "group") – Scope of the memory limit used for chunking.
chunk_axis (str, default "run") – Axis along which arrays are chunked.
Notes
Processes all pending requests in the same stream group, applying coordinated chunking based on the specified limit type. Calls allocation_ready_hook for each instance with their results.
- property auto_pool_proportion
Get total proportion of VRAM automatically distributed.
- Returns:
Sum of all automatic allocation proportions.
- Return type:
float
- chunk_arrays(requests: dict[str, ArrayRequest], numchunks: int, axis: str = 'run')[source]
Divide array requests into smaller chunks along a specified axis.
- Parameters:
requests (dict of str to ArrayRequest) – Array requests to chunk, keyed by label.
numchunks (int) – Number of chunks to divide each array into.
axis (str, default "run") – Stride-order label of the axis to chunk along.
- Returns:
New dictionary with modified array shapes for chunking.
- Return type:
dict of str to ArrayRequest
Notes
The axis must match a label in the stride ordering. Chunking is done conservatively with ceiling division to ensure no data is lost.
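Illustrative shape arithmetic for the ceiling division described above, assuming the default ("time", "run", "variable") ordering:

```python
import math

shape = (100, 1000, 3)      # (time, run, variable)
numchunks = 3
# The "run" axis is divided conservatively: ceil(1000 / 3) = 334,
# so 3 chunks of 334 cover all 1000 runs with a small overshoot.
chunked_shape = (100, math.ceil(1000 / numchunks), 3)  # (100, 334, 3)
```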
- free(array_label: str)[source]
Free an allocation by label across all instances.
- Parameters:
array_label (str) – Label of the allocation to free.
- from_device(instance: object, from_arrays: list, to_arrays: list)[source]
Copy data from device arrays using the instance’s stream.
- get_available_group(group: str)[source]
Get available memory for an entire stream group.
- Parameters:
group (str) – Name of the stream group.
- Returns:
Available memory in bytes for the group.
- Return type:
int
Warning
- UserWarning
If group has used more than 95% of allocated memory.
- get_available_single(instance_id)[source]
Get available memory for a single instance.
- Parameters:
instance_id (int) – ID of the instance to check.
- Returns:
Available memory in bytes for this instance.
- Return type:
int
Warning
- UserWarning
If instance has used more than 95% of allocated memory.
- get_chunks(request_size: int, available: int = 0)[source]
Calculate number of chunks needed for a memory request.
- Parameters:
request_size (int) – Total size of the request in bytes.
available (int, default 0) – Available memory in bytes.
- Returns:
Number of chunks needed to fit the request.
- Return type:
int
Warning
- UserWarning
If request exceeds available VRAM by more than 20x.
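A worked example of the chunk count, as an arithmetic sketch of what the docstring describes:

```python
import math

request_size = 12 * 2**30   # 12 GiB requested
available = 4 * 2**30       # 4 GiB available
chunks = math.ceil(request_size / available)  # 3 chunks
```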
- get_stream(instance)[source]
Get the CUDA stream associated with an instance.
- Parameters:
instance (object) – The instance to get the stream for.
- Returns:
CUDA stream associated with the instance.
- Return type:
Stream
- get_strides(request)[source]
Calculate memory strides for a given access pattern (stride order).
- Parameters:
request (ArrayRequest) – Array request to calculate strides for.
- Returns:
Stride tuple for the array, or None if no custom strides needed.
- Return type:
tuple or None
Notes
Only 3D arrays get custom stride optimization. 2D arrays use default strides as they are not performance-critical.
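An illustrative reconstruction (not the library’s exact code) of how a stride ordering maps to byte strides for a 3D array whose axes are (time, run, variable):

```python
def strides_for(shape, itemsize, order=("time", "run", "variable")):
    """Lay memory out so the last label in `order` varies fastest."""
    labels = ("time", "run", "variable")
    strides = [0, 0, 0]
    step = itemsize
    for label in reversed(order):       # fastest-varying axis first
        axis = labels.index(label)
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)

# A (100 time, 8 run, 3 variable) float64 array, "variable" fastest:
strides_for((100, 8, 3), 8)  # -> (192, 24, 8)
```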
- property manual_pool_proportion
Get total proportion of VRAM currently manually assigned.
- Returns:
Sum of all manual allocation proportions.
- Return type:
- queue_request(instance, requests: dict[str, ArrayRequest])[source]
Queue allocation requests for batched stream group processing.
- Parameters:
instance (object) – Instance submitting the requests.
requests (dict of str to ArrayRequest) – Allocation requests keyed by label.
Notes
Requests are queued per stream group, allowing multiple components to contribute to a single coordinated allocation that can be optimally chunked together.
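A sketch of the queue-then-allocate flow, continuing the module-level example’s mgr and hedged ArrayRequest construction:

```python
import numpy as np
from cubie.memory.array_requests import ArrayRequest  # fields assumed

class Component:
    def on_ready(self, response):
        self.buffers = response

a, b = Component(), Component()
mgr.register(a, allocation_ready_hook=a.on_ready, stream_group="sim")
mgr.register(b, allocation_ready_hook=b.on_ready, stream_group="sim")

# Both components queue requests; one allocate_queue call chunks and
# allocates everything in the "sim" stream group together.
mgr.queue_request(a, {"state": ArrayRequest(shape=(1000, 256, 3),
                                            dtype=np.float64,
                                            memory_type="device")})
mgr.queue_request(b, {"observables": ArrayRequest(shape=(1000, 256, 2),
                                                  dtype=np.float64,
                                                  memory_type="device")})
mgr.allocate_queue(a, limit_type="group", chunk_axis="run")
```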
- register(instance, proportion: float | None = None, invalidate_cache_hook: Callable = <function placeholder_invalidate>, allocation_ready_hook: Callable = <function placeholder_dataready>, stream_group: str = 'default')[source]
Register an instance and configure its memory allocation settings.
- Parameters:
instance (object) – The instance to register for memory management.
proportion (float, optional) – Proportion of VRAM to allocate (0.0 to 1.0). If None, instance will be automatically assigned an equal portion with other auto-assigned instances.
invalidate_cache_hook (callable, optional) – Function to call when CUDA memory system changes occur.
allocation_ready_hook (callable, optional) – Function to call when allocations are ready.
stream_group (str, default "default") – Name of the stream group to assign the instance to.
- Raises:
ValueError – If instance is already registered or proportion is not between 0 and 1.
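A hedged registration sketch showing both hooks; the hook signatures follow the placeholder functions documented on this page, and the Component class is illustrative.

```python
class Component:
    def invalidate(self):
        # Called when the allocator/context changes: drop compiled
        # kernels and array references so they are rebuilt.
        self.kernel = None
        self.buffers = None

    def on_ready(self, response):
        # response is an ArrayResponse with arrays and chunk info.
        self.buffers = response

comp = Component()
mgr.register(
    comp,
    proportion=0.25,                       # manual 25% share
    invalidate_cache_hook=comp.invalidate,
    allocation_ready_hook=comp.on_ready,
    stream_group="integrators",
)
```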
- registry: dict[int, InstanceMemorySettings]
- set_allocator(name: str)[source]
Set the external memory allocator in Numba.
- Parameters:
name (str) – Memory allocator type:
- “cupy_async”: use CuPy’s MemoryAsyncPool (experimental)
- “cupy”: use CuPy’s MemoryPool
- “default”: use Numba’s default memory manager
- Raises:
ValueError – If allocator name is not recognized.
Warning
- UserWarning
A change to the memory manager requires the CUDA context to be closed and reopened. This invalidates all previously compiled kernels and allocated arrays, requiring a full rebuild.
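For example, switching to the CuPy pool; per the warning above, do this before compiling kernels or allocating arrays:

```python
mgr.set_allocator("cupy")   # swaps in CuPy's MemoryPool as allocator
```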
- set_auto_limit_mode(instance)[source]
Convert a manual-limited instance to auto allocation mode.
- Parameters:
instance (object) – Instance to convert to auto mode.
- Raises:
ValueError – If instance is already in auto allocation pool.
- set_global_stride_ordering(ordering: tuple[str, str, str])[source]
Set the global memory stride ordering for arrays.
- Parameters:
ordering (tuple of str) – Tuple containing ‘time’, ‘run’, and ‘variable’ in desired order.
- Raises:
ValueError – If ordering doesn’t contain exactly ‘time’, ‘run’, and ‘variable’.
Notes
This invalidates all current allocations as arrays need to be reallocated with new stride patterns.
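For example, to make “time” the fastest-varying axis (last in the ordering), at the cost of invalidating current allocations:

```python
mgr.set_global_stride_ordering(("run", "variable", "time"))
```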
- set_limit_mode(mode: str)[source]
Set the memory allocation limiting mode.
- Parameters:
mode (str) – Either “passive” or “active” memory management mode.
- Raises:
ValueError – If mode is not “passive” or “active”.
- set_manual_limit_mode(instance: object, proportion: float)[source]
Convert an auto-limited instance to manual allocation mode.
- Parameters:
instance (object) – Instance to convert to manual mode.
proportion (float) – Proportion of total VRAM (between 0 and 1) to assign.
- Raises:
ValueError – If instance is already in manual allocation pool.
- set_manual_proportion(instance: object, proportion: float)[source]
Set manual allocation proportion for an instance.
If the instance is currently in the auto-allocation pool, it is shifted to manual.
- Parameters:
instance (object) – Instance to update the proportion for.
proportion (float) – New proportion between 0 and 1.
- Raises:
ValueError – If proportion is not between 0 and 1.
- single_request(instance: object | int, requests: dict[str, ArrayRequest], chunk_axis: str = 'run')[source]
Process a single allocation request with automatic chunking.
- Parameters:
instance (object or int) – Instance, or instance ID, making the request.
requests (dict of str to ArrayRequest) – Allocation requests keyed by label.
chunk_axis (str, default "run") – Axis along which to chunk if the request exceeds available memory.
- Raises:
TypeError – If requests is not a dict or contains invalid ArrayRequest objects.
Notes
This method calculates available memory, determines chunking needs, allocates arrays with optimal strides, and calls the instance’s allocation_ready_hook with the results.
- stream_groups: StreamGroups
- sync_stream(instance)[source]
Synchronize the CUDA stream for an instance.
- Parameters:
instance (object) – Instance whose stream to synchronize.
- cubie.memory.mem_manager.get_total_request_size(request: dict[str, ArrayRequest])[source]
Calculate the total memory size of a request in bytes.
- Parameters:
request (dict of str to ArrayRequest) – Allocation requests keyed by label.
- Returns:
Total size of all requested arrays in bytes.
- Return type:
int
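The total is the sum over requests of the element count times the item size; a quick arithmetic sketch:

```python
import numpy as np
from math import prod

# One (1000, 256, 3) float64 array: 1000*256*3 elements * 8 bytes
nbytes = prod((1000, 256, 3)) * np.dtype(np.float64).itemsize
# nbytes == 6_144_000
```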
- cubie.memory.mem_manager.placeholder_dataready(response: ArrayResponse) → None[source]
Default placeholder data ready hook that performs no operations.
- Parameters:
response (ArrayResponse) – Array response object (unused).
- Return type:
None