GPU Memory Management

CuBIE manages GPU memory (VRAM) automatically, but understanding the available options lets you run larger batches and avoid out-of-memory errors.

Default Behaviour

By default, CuBIE uses Numba's built-in CUDA allocator. Each call to solve() allocates the required device arrays, launches the kernel, copies the results back to the host, and then frees the memory.
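This allocate-run-free lifecycle can be sketched with a toy allocator in pure Python (no GPU needed; DeviceAllocator, its counters, and solve_once are illustrative stand-ins, not part of CuBIE's API):

```python
# Toy model of the default per-call lifecycle: allocate, compute, copy, free.
# Everything here is illustrative; CuBIE's real allocator lives in Numba.

class DeviceAllocator:
    """Counts allocations to show that each solve pays the full cost."""
    def __init__(self):
        self.allocs = 0
        self.frees = 0

    def alloc(self, nbytes):
        self.allocs += 1
        return bytearray(nbytes)  # stand-in for a device buffer

    def free(self, buf):
        self.frees += 1

def solve_once(allocator, batch_bytes):
    buf = allocator.alloc(batch_bytes)   # device arrays for this call
    result = len(buf)                    # stand-in for kernel + copy-back
    allocator.free(buf)                  # memory is returned immediately
    return result

alloc = DeviceAllocator()
for _ in range(3):
    solve_once(alloc, 1024)
print(alloc.allocs, alloc.frees)  # 3 3: every call allocates and frees
```

Because nothing is retained between calls, three solves mean three full allocation/free cycles, which is the overhead the pooled allocators below avoid.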

CuPy Memory Pools

Repeatedly allocating and freeing GPU memory has overhead. CuPy provides memory pools that recycle allocations across calls:

solver = qb.Solver(system, algorithm="dormand_prince_54",
                    memory_settings={"allocator": "cupy"})

Available allocators:

"default"

Numba’s built-in allocator.

"cupy"

CuPy synchronous memory pool. Reduces allocation overhead between successive solves.

"cupy_async"

CuPy asynchronous memory pool. Can overlap allocation with computation on supported hardware.

The "cupy" and "cupy_async" allocators require CuPy, which is not installed with CuBIE (e.g. pip install cupy-cuda12x for CUDA 12.x).
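The benefit of a pool can be illustrated with a minimal free-list in pure Python (this mirrors the idea behind CuPy's pool, not its implementation; MemoryPool here is hypothetical):

```python
# Minimal free-list pool: freed buffers are kept and handed back to later
# requests of the same size, so only the first solve pays allocation cost.

class MemoryPool:
    def __init__(self):
        self.free_lists = {}   # size -> list of reusable buffers
        self.fresh_allocs = 0  # how many requests actually hit the allocator

    def alloc(self, nbytes):
        bucket = self.free_lists.setdefault(nbytes, [])
        if bucket:
            return bucket.pop()        # recycled: no new allocation
        self.fresh_allocs += 1
        return bytearray(nbytes)       # stand-in for a device buffer

    def free(self, buf):
        self.free_lists.setdefault(len(buf), []).append(buf)

pool = MemoryPool()
for _ in range(5):                     # five successive "solves"
    buf = pool.alloc(1024)
    pool.free(buf)
print(pool.fresh_allocs)  # 1: four of the five calls reused the buffer
```

As long as successive solves request similarly sized batches, the pool turns repeated allocations into cheap reuse, which is why the CuPy allocators help most when you call solve() many times.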

VRAM Limits

CuBIE estimates the available VRAM and sizes the batch accordingly. You can override the proportion of VRAM that CuBIE is allowed to use:

solver = qb.Solver(system, algorithm="dormand_prince_54",
                    memory_settings={"mem_proportion": 0.7})

Set a lower proportion if other processes share the GPU.
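The effect of mem_proportion on batch sizing can be sketched as follows (the byte counts and the max_runs_per_chunk helper are hypothetical; CuBIE's actual estimate depends on the system, algorithm, and output settings):

```python
def max_runs_per_chunk(total_vram_bytes, mem_proportion, bytes_per_run):
    """How many runs fit in the slice of VRAM the solver may use."""
    budget = int(total_vram_bytes * mem_proportion)
    return budget // bytes_per_run

GiB = 1024 ** 3
# 8 GiB card, 70% budget, ~2 MiB of state/workspace per run (made up).
runs = max_runs_per_chunk(8 * GiB, 0.7, 2 * 1024 ** 2)
print(runs)  # 2867
```

Lowering mem_proportion shrinks the budget linearly, so a batch that no longer fits in one chunk is simply split across more chunks rather than failing.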

Automatic Chunking

When the estimated memory requirement of a batch exceeds the available VRAM, CuBIE automatically splits the batch into chunks and processes them sequentially. The results are concatenated transparently, so you always get a single SolveResult.
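The split-solve-concatenate pattern can be sketched in NumPy (solve_chunk is a stand-in for an on-device solve; CuBIE performs the equivalent splitting internally):

```python
import numpy as np

def solve_chunk(chunk):
    """Stand-in for one on-device solve over a sub-batch."""
    return chunk * 2.0

def solve_batched(initial_values, max_runs_per_chunk):
    """Split a batch that exceeds VRAM into chunks, solve them
    sequentially, and concatenate so the caller sees one result."""
    chunks = [
        solve_chunk(initial_values[i:i + max_runs_per_chunk])
        for i in range(0, len(initial_values), max_runs_per_chunk)
    ]
    return np.concatenate(chunks)

batch = np.arange(10.0)           # 10 runs, but only 4 fit at once
result = solve_batched(batch, 4)  # 3 chunks of sizes 4, 4, 2
print(result)                     # same as solving the whole batch at once
```

The chunked result is identical to what a single oversized launch would produce, which is why chunking can stay invisible to the caller.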

Stream Groups

For advanced use, CuBIE supports running multiple chunks concurrently on different CUDA streams via stream groups. This can hide data-transfer latency behind compute. Configure via memory_settings={"stream_group": ...} on the Solver.
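The latency-hiding idea can be simulated with threads standing in for CUDA streams (pure Python; real stream groups schedule device transfers and kernels, not sleeps):

```python
import threading
import time

def process_chunk(chunk_id, results):
    """Stand-in for one chunk's H2D copy + kernel + D2H copy."""
    time.sleep(0.01)
    results[chunk_id] = chunk_id * 10

results = {}
start = time.perf_counter()
threads = [threading.Thread(target=process_chunk, args=(i, results))
           for i in range(4)]      # four chunks on four "streams"
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(sorted(results.items()))     # all four chunks finished
# Overlapped, 4 chunks take roughly 1 chunk's wall time, not 4x.
```

Sequential processing would take about four chunk-times here; overlapping them collapses that toward one, which is the effect stream groups aim for on real hardware.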