Buffer Registry
CuBIE centralises GPU memory management through the
buffer_registry module. Every CUDA-generating component
registers its memory requirements, and the registry computes a layout
that is allocated once at kernel launch time.
CUDABuffer Descriptors
Each buffer is described by a CUDABuffer:
name: Unique identifier (scoped to its parent component).
size: Number of elements.
location: "shared" (on-chip and fast, typically limited to about 48 KB per block) or "local" (per-thread; held in registers where the compiler can, otherwise in off-chip local memory cached in L1).
persistent: If True, the buffer survives across steps (e.g. state arrays). Non-persistent buffers can be aliased.
aliases: Name of another buffer that this one can share storage with, provided their lifetimes do not overlap.
precision: Element dtype (np.float32 or np.float64).
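The fields above can be mirrored in a plain Python dataclass to make their roles concrete. This is a minimal sketch, not CuBIE's CUDABuffer class: the name `BufferSpec`, the `nbytes` property and the `can_alias` helper are hypothetical, and buffer lifetimes are modelled as half-open step intervals purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass(frozen=True)
class BufferSpec:
    """Hypothetical stand-in for a CUDABuffer descriptor (illustration only)."""
    name: str
    size: int                      # number of elements, not bytes
    location: str                  # "shared" or "local"
    precision: type = np.float64   # np.float32 or np.float64
    persistent: bool = False
    aliases: Optional[str] = None  # name of a buffer this one may share storage with

    def __post_init__(self):
        if self.location not in ("shared", "local"):
            raise ValueError(f"unknown location {self.location!r}")
        if self.size < 0:
            raise ValueError("size must be non-negative")

    @property
    def nbytes(self) -> int:
        # Element count times the byte width of the element dtype.
        return self.size * np.dtype(self.precision).itemsize


def can_alias(a: BufferSpec, b: BufferSpec, life_a: tuple, life_b: tuple) -> bool:
    """Assumed rule: two non-persistent buffers may share storage only if
    their half-open lifetimes [start, end) do not overlap."""
    if a.persistent or b.persistent:
        return False
    return life_a[1] <= life_b[0] or life_b[1] <= life_a[0]
```

Persistent buffers are excluded outright in this sketch because their contents must survive from one step to the next, so nothing else can safely reuse their storage.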
Registering Buffers
Components register buffers in their build() method:
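```python
from cubie.buffer_registry import buffer_registry

# Inside a component's build() method: `self` is the registering component,
# and the buffer name "stage_k" is scoped to it.
buffer_registry.register(
    name="stage_k",
    parent=self,
    size=self.n_states * self.n_stages,  # element count, not bytes
    location="shared",
    precision=self.precision,
)
```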
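Because names are scoped to their parent component, a registry can key its entries by the pair (parent, name), so two components may each register a buffer called "stage_k" without clashing. The sketch below illustrates that bookkeeping under this assumption; `ToyRegistry`, its `requests` method and the `DummyStep` component are hypothetical and do not reproduce buffer_registry's internals.

```python
from collections import defaultdict


class ToyRegistry:
    """Hypothetical registry: records buffer requests per parent component."""

    def __init__(self):
        # parent -> {buffer name -> request dict}; names are scoped per parent.
        self._entries = defaultdict(dict)

    def register(self, name, parent, size, location, precision, persistent=False):
        scope = self._entries[parent]
        if name in scope:
            raise KeyError(f"{name!r} already registered for {parent!r}")
        scope[name] = dict(size=size, location=location,
                           precision=precision, persistent=persistent)

    def requests(self, parent, location):
        # Dicts preserve insertion order, so a later layout pass can assign
        # offsets deterministically in registration order.
        return [(n, e) for n, e in self._entries[parent].items()
                if e["location"] == location]


class DummyStep:
    """Hypothetical component mirroring the register() call above."""

    def __init__(self, n_states, n_stages, precision):
        self.n_states, self.n_stages = n_states, n_stages
        self.precision = precision

    def build(self, registry):
        registry.register(name="stage_k", parent=self,
                          size=self.n_states * self.n_stages,
                          location="shared", precision=self.precision)
```

Registering only records requirements; no memory is touched at this stage, which is what lets the layout be computed once after every component has reported in.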
Allocators
After all components have registered, the registry computes offsets and produces allocator callables:
get_allocator(name, parent): Returns a function that, given a base pointer and thread index, returns a typed array slice for the named buffer.
get_toplevel_allocators(kernel): Returns (shared_allocator, local_allocator) for the top-level kernel launch.
get_child_allocators(parent, child): Delegates a region of the parent’s allocation to a child component.
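Conceptually, an allocator closes over a precomputed offset and size and slices a flat base array, and child delegation hands a sub-range of the parent's region to the child, which then slices it the same way. The sketch below emulates this on the host with NumPy arrays standing in for a block's shared memory. `compute_offsets`, `make_allocator` and `child_region` are hypothetical helpers, back-to-back packing in registration order is an assumption, and the thread-index argument of the real device-side allocators is omitted.

```python
import numpy as np


def compute_offsets(requests):
    """Pack buffers back to back in registration order.

    requests: list of (name, n_elements). Returns ({name: offset}, total).
    """
    offsets, cursor = {}, 0
    for name, size in requests:
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


def make_allocator(offset, size):
    # The closure captures the layout decision once; each call only slices,
    # so no memory is allocated per use.
    def allocate(base):
        return base[offset:offset + size]
    return allocate


def child_region(base, offset, size):
    """Delegate a sub-range of the parent's buffer to a child component."""
    return base[offset:offset + size]
```

Because NumPy basic slices are views, writes through any allocator's result land in the single shared base array, mirroring how every buffer lives inside one allocation made at launch.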
Layout Queries
buffer_registry.shared_buffer_size(parent): Total shared memory bytes for a component and its children.
buffer_registry.local_buffer_size(parent): Total local memory bytes (non-persistent).
buffer_registry.persistent_local_buffer_size(parent): Total local memory bytes (persistent).
Use these to verify that your component’s memory footprint is reasonable, for example that the shared-memory total fits within the device’s per-block limit.
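The three size queries can be pictured as summing element count times itemsize over a component and, recursively, its children, filtered by location and persistence. The sketch below assumes a simple tree of hypothetical `Component` objects, assumes children's requirements are summed, and uses 48 KB as the per-block shared-memory budget; it is not CuBIE's implementation.

```python
from dataclasses import dataclass, field

import numpy as np

SHARED_LIMIT_BYTES = 48 * 1024  # assumed per-block shared-memory budget


@dataclass
class Component:
    """Hypothetical component; buffers are (size, location, dtype, persistent)."""
    buffers: list = field(default_factory=list)
    children: list = field(default_factory=list)


def _bytes(comp, location, persistent):
    # persistent=None means "count both persistent and non-persistent".
    own = sum(size * np.dtype(dtype).itemsize
              for size, loc, dtype, pers in comp.buffers
              if loc == location and (persistent is None or pers == persistent))
    return own + sum(_bytes(c, location, persistent) for c in comp.children)


def shared_bytes(comp):
    return _bytes(comp, "shared", None)


def local_bytes(comp):
    return _bytes(comp, "local", False)


def persistent_local_bytes(comp):
    return _bytes(comp, "local", True)
```

Splitting local memory into persistent and non-persistent totals matches the distinction drawn earlier: only the non-persistent portion is a candidate for aliasing.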