GPU Memory Management
CuBIE manages GPU memory (VRAM) automatically, but understanding the available options lets you run larger batches and avoid out-of-memory errors.
Default Behaviour
By default, CuBIE uses the Numba CUDA allocator. Each call to solve() allocates the required device arrays, runs the kernel, copies the results back to the host, then frees the memory.
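A minimal sketch of that lifecycle, assuming a system object built elsewhere and a solve() call that takes a host batch of initial states (the call signature here is an assumption, not confirmed by this page):

import cubie as qb

# `system` and `batch` are placeholders for a model and a host array
# of initial states defined elsewhere.
solver = qb.Solver(system, algorithm="dormand_prince_54")

# Each call allocates device arrays, launches the kernel, copies the
# results back to the host, then frees the device memory, so
# back-to-back calls pay the allocation cost every time.
result_a = solver.solve(batch)
result_b = solver.solve(batch)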
CuPy Memory Pools
Repeatedly allocating and freeing GPU memory incurs overhead. CuPy provides memory pools that recycle allocations across successive calls:
solver = qb.Solver(system, algorithm="dormand_prince_54",
                   memory_settings={"allocator": "cupy"})
Available allocators:
"default"Numba’s built-in allocator.
"cupy"CuPy synchronous memory pool. Reduces allocation overhead between successive solves.
"cupy_async"CuPy asynchronous memory pool. Can overlap allocation with computation on supported hardware.
CuPy must be installed separately (pip install cupy-cuda12x).
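As a sketch, switching pools is a one-entry change to memory_settings; the import guard below is illustrative and not part of CuBIE:

import cubie as qb

try:
    import cupy  # noqa: F401  (the pool allocators need CuPy at runtime)
except ImportError as exc:
    raise RuntimeError("CuPy is required, e.g. pip install cupy-cuda12x") from exc

# `system` is a placeholder for a model built elsewhere.
solver = qb.Solver(system, algorithm="dormand_prince_54",
                   memory_settings={"allocator": "cupy_async"})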
VRAM Limits
CuBIE estimates the available VRAM and sizes the batch accordingly. You can override the proportion of VRAM that CuBIE is allowed to use:
solver = qb.Solver(system, algorithm="dormand_prince_54",
                   memory_settings={"mem_proportion": 0.7})
Set a lower proportion if other processes share the GPU.
Automatic Chunking
When a batch is too large to fit in VRAM in one go, CuBIE automatically splits it into chunks and processes them sequentially. The results are concatenated transparently: you always get a single SolveResult.
No configuration is needed: chunking kicks in whenever the estimated memory requirement exceeds the available VRAM.
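As a sketch, an oversized batch looks exactly like a small one from the caller's side; the names and the solve() signature below are placeholders, as before:

import numpy as np
import cubie as qb

n_states = 3  # placeholder: the state dimension of `system`

solver = qb.Solver(system, algorithm="dormand_prince_54",
                   memory_settings={"mem_proportion": 0.7})

# A large batch of initial states. If it exceeds the VRAM budget,
# CuBIE splits it into chunks, solves them one after another, and
# concatenates the outputs.
big_batch = np.random.default_rng(0).random((5_000_000, n_states))

result = solver.solve(big_batch)  # still a single SolveResult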
Stream Groups
For advanced use, CuBIE supports running multiple chunks concurrently on different CUDA streams via stream groups. This can hide data-transfer latency behind compute. Configure it via memory_settings={"stream_group": ...} on the Solver.
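The concrete value expected for "stream_group" is not shown in this excerpt, so the sketch below passes an opaque placeholder; treat both the value and its type as assumptions:

import cubie as qb

# `my_group` stands in for whatever stream-group value CuBIE expects;
# it is a placeholder, not a documented API. `system` is defined
# elsewhere, as in the earlier examples.
solver = qb.Solver(system, algorithm="dormand_prince_54",
                   memory_settings={"stream_group": my_group})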