cubie.batchsolving.BatchSolverKernel

Batch Solver Kernel Module.

This module provides the BatchSolverKernel class, which manages GPU-based batch integration of ODE systems using CUDA. The kernel handles the distribution of work across GPU threads and manages memory allocation for batched integrations.

Created on Tue May 27 17:45:03 2025

@author: cca79

Classes

BatchSolverKernel(system[, algorithm, duration, ...])

CUDA-based batch solver kernel for ODE integration.

class cubie.batchsolving.BatchSolverKernel.BatchSolverKernel(system, algorithm: str = 'euler', duration: float = 1.0, warmup: float = 0.0, dt_min: float = 0.01, dt_max: float = 0.1, dt_save: float = 0.1, dt_summarise: float = 1.0, atol: float = 1e-06, rtol: float = 1e-06, saved_state_indices: NDArray[np.int64] = None, saved_observable_indices: NDArray[np.int64] = None, summarised_state_indices: ArrayLike | None = None, summarised_observable_indices: ArrayLike | None = None, output_types: list[str] = None, precision: type = np.float64, profileCUDA: bool = False, memory_manager: MemoryManager = default_memmgr, stream_group: str = 'solver', mem_proportion: float | None = None)[source]

Bases: CUDAFactory

CUDA-based batch solver kernel for ODE integration.

This class builds and holds the integrating kernel and interfaces with lower-level modules including loop algorithms, ODE systems, and output functions. The kernel function accepts single or batched sets of inputs and distributes those amongst the threads on the GPU.

Parameters:
  • system (object) – The ODE system to be integrated.

  • algorithm (str, default='euler') – Integration algorithm to use.

  • duration (float, default=1.0) – Duration of the simulation.

  • warmup (float, default=0.0) – Warmup time before the main simulation.

  • dt_min (float, default=0.01) – Minimum allowed time step.

  • dt_max (float, default=0.1) – Maximum allowed time step.

  • dt_save (float, default=0.1) – Time step for saving output.

  • dt_summarise (float, default=1.0) – Time step for saving summaries.

  • atol (float, default=1e-6) – Absolute tolerance for adaptive stepping.

  • rtol (float, default=1e-6) – Relative tolerance for adaptive stepping.

  • saved_state_indices (NDArray[np.int_], optional) – Indices of state variables to save.

  • saved_observable_indices (NDArray[np.int_], optional) – Indices of observable variables to save.

  • summarised_state_indices (ArrayLike, optional) – Indices of state variables to summarise.

  • summarised_observable_indices (ArrayLike, optional) – Indices of observable variables to summarise.

  • output_types (list[str], optional) – Types of outputs to generate. Default is ["state"].

  • precision (type, default=np.float64) – Numerical precision to use.

  • profileCUDA (bool, default=False) – Whether to enable CUDA profiling.

  • memory_manager (MemoryManager, default=default_memmgr) – Memory manager instance to use.

  • stream_group (str, default='solver') – CUDA stream group identifier.

  • mem_proportion (float, optional) – Proportion of GPU memory to allocate.

Notes

This class is one level down from the user, managing sanitised inputs and handling the machinery of batching and running integrators. It does not handle:

  • Integration logic/algorithms - these are handled in SingleIntegratorRun and below

  • Input sanitisation / batch construction - this is handled in the solver API

  • System equations - these are handled in the system model classes

The class runs the loop device function on a given slice of its allocated memory and serves as the distributor of work amongst the individual runs of the integrators.
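
A construction sketch, assuming the import path shown in the signature above and a previously built cubie system model; the system object and the chosen values are illustrative, not a verified recipe:

    import numpy as np
    from cubie.batchsolving.BatchSolverKernel import BatchSolverKernel

    # `system` stands in for an ODE system object built with cubie's system
    # model classes; constructing it is outside the scope of this sketch.
    kernel = BatchSolverKernel(
        system,
        algorithm="euler",        # fixed-step Euler integration
        duration=10.0,            # total simulated time
        dt_min=0.001,
        dt_max=0.01,
        dt_save=0.1,              # save outputs every 0.1 time units
        dt_summarise=1.0,         # summarise once per 1.0 time units
        output_types=["state"],   # the documented default
        precision=np.float32,     # trade precision for GPU throughput
    )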

_on_allocation(response)[source]

Handle memory allocation response.

Parameters:

response (ArrayResponse) – Memory allocation response containing chunk information.

property active_output_arrays: ActiveOutputs

Get active output arrays.

Returns:

Active output arrays configuration.

Return type:

ActiveOutputs

Notes

Exposes the _active_outputs attribute from the child OutputArrays object.

property algorithm

Get the integration algorithm.

Returns:

The integration algorithm being used.

Return type:

str

property atol

Get absolute tolerance.

Returns:

Absolute tolerance for the solver.

Return type:

float

build()[source]

Build the integration kernel.

Returns:

The built integration kernel.

Return type:

CUDA device function

build_kernel()[source]

Build and compile the CUDA integration kernel.

Returns:

Compiled CUDA kernel function for integration.

Return type:

function

Notes

This method creates a CUDA kernel that:

  1. Distributes work across GPU threads
  2. Manages shared memory allocation
  3. Calls the underlying integration loop function
  4. Handles output array indexing and slicing

The kernel uses a 2D thread block structure where:

  • x-dimension handles intra-run parallelism
  • y-dimension handles different runs
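
The indexing scheme can be pictured with a minimal numba.cuda sketch; this illustrates the layout described above and is not the kernel that build_kernel() actually emits (argument names are placeholders):

    from numba import cuda

    @cuda.jit
    def batch_kernel_sketch(inits, params, output):
        # x-dimension: cooperating threads within a single integration run
        thread_in_run = cuda.threadIdx.x
        # y-dimension: which run of the batch this row of the block handles
        run = cuda.blockIdx.x * cuda.blockDim.y + cuda.threadIdx.y
        if run >= inits.shape[0]:
            return
        # The real kernel would slice shared memory and the output arrays
        # for this run here and call the integration loop device function.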

property device_observable_summaries_array

Get device observable summaries array.

Returns:

Device observable summaries array.

Return type:

object

Notes

Exposes the observable_summaries attribute from the child OutputArrays object.

property device_observables_array

Get device observables array.

Returns:

Device observables array.

Return type:

object

Notes

Exposes the observables attribute from the child OutputArrays object.

property device_state_array

Get device state array.

Returns:

Device state array.

Return type:

object

Notes

Exposes the state attribute from the child OutputArrays object.

property device_state_summaries_array

Get device state summaries array.

Returns:

Device state summaries array.

Return type:

object

Notes

Exposes the state_summaries attribute from the child OutputArrays object.

disable_profiling()[source]

Disable CUDA profiling for the solver.

Notes

This stops profiling the performance of the solver on the GPU, which speeds execution back up.

property dt_max

Get maximum step size.

Returns:

Maximum step size allowed for the solver.

Return type:

float

property dt_min

Get minimum step size.

Returns:

Minimum step size allowed for the solver.

Return type:

float

property dt_save

Get save time step.

Returns:

Time step for saving output.

Return type:

float

Notes

Exposes the dt_save attribute from the child SingleIntegratorRun object.

property dt_summarise

Get summary time step.

Returns:

Time step for saving summaries.

Return type:

float

Notes

Exposes the dt_summarise attribute from the child SingleIntegratorRun object.

property duration

Get simulation duration.

Returns:

Duration of the simulation.

Return type:

float

enable_profiling()[source]

Enable CUDA profiling for the solver.

Notes

This will allow you to profile the performance of the solver on the GPU, but will slow things down. Consider disabling optimisation and enabling debug and line info for profiling.
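
A minimal usage sketch, bracketing a run so an external profiler (for example Nsight Systems) captures only the region of interest; kernel, inits, params and forcing are assumed to exist as in the run() example further down:

    kernel.enable_profiling()
    kernel.run(inits, params, forcing, duration=10.0)
    kernel.disable_profiling()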

property fixed_step_size

Get the fixed step size.

Returns:

Fixed step size for the solver.

Return type:

float

Notes

Exposes the step_size attribute from the child SingleIntegratorRun object.

property forcing_vectors

Get forcing vectors array.

Returns:

The forcing vectors array.

Return type:

array_like

property initial_values

Get initial values array.

Returns:

The initial values array.

Return type:

array_like

property kernel

Get the device function kernel.

Returns:

The compiled CUDA device function.

Return type:

object

property mem_proportion

Get the memory proportion.

Returns:

The memory proportion the solver is assigned.

Return type:

float

property memory_manager

Get the memory manager.

Returns:

The memory manager the solver is registered with.

Return type:

MemoryManager

property observable_summaries

Get observable summaries array.

Returns:

The observable summaries array.

Return type:

array_like

property observables

Get observables array.

Returns:

The observables array.

Return type:

array_like

property ouput_array_sizes_2d

Get 2D output array sizes.

Returns:

The 2D output array sizes for a single run.

Return type:

object

property output_array_heights

Get output array heights.

Returns:

Output array heights information.

Return type:

object

Notes

Exposes the output_array_heights attribute from the child SingleIntegratorRun object.

property output_array_sizes_3d

Get 3D output array sizes.

Returns:

The 3D output array sizes for a batch of runs.

Return type:

object

property output_heights

Get output array heights.

Returns:

Output array heights from the child SingleIntegratorRun object.

Return type:

OutputArrayHeights

Notes

Exposes the output_array_heights attribute from the child SingleIntegratorRun object.

property output_length

Get number of output samples per run.

Returns:

Number of output samples per run.

Return type:

int

property output_stride_order

Get output stride order.

Returns:

The axis order of the output arrays.

Return type:

str

property output_types

Get output types.

Returns:

Types of outputs generated.

Return type:

list[str]

Notes

Exposes the output_types attribute from the child SingleIntegratorRun object.

property parameters

Get parameters array.

Returns:

The parameters array.

Return type:

array_like

property precision

Get numerical precision type.

Returns:

Numerical precision type (e.g., np.float64).

Return type:

type

Notes

Exposes the precision attribute from the child SingleIntegratorRun object.

property rtol

Get relative tolerance.

Returns:

Relative tolerance for the solver.

Return type:

float

run(inits, params, forcing_vectors, duration, blocksize=256, stream=None, warmup=0.0, chunk_axis='run')[source]

Execute the solver kernel for batch integration.

Parameters:
  • inits (array_like) – Initial conditions for each run. Shape should be (n_runs, n_states).

  • params (array_like) – Parameters for each run. Shape should be (n_runs, n_params).

  • forcing_vectors (array_like) – Forcing vectors for each run.

  • duration (float) – Duration of the simulation.

  • blocksize (int, default=256) – CUDA block size for kernel execution.

  • stream (object, optional) – CUDA stream to use. If None, uses the solver’s assigned stream.

  • warmup (float, default=0.0) – Warmup time before the main simulation.

  • chunk_axis (str, default='run') – Axis along which to chunk the computation (‘run’ or ‘time’).

Notes

This method performs the main batch integration by:

  1. Setting up input and output arrays
  2. Allocating GPU memory in chunks
  3. Executing the CUDA kernel for each chunk
  4. Handling memory management and synchronization

The method automatically adjusts block size if shared memory requirements exceed GPU limits, warning the user about potential performance impacts.
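
A usage sketch, continuing from the construction example earlier; the array shapes follow the parameter descriptions above, while the forcing-vector shape is an assumption:

    import numpy as np

    n_runs, n_states, n_params = 1024, 3, 4
    rng = np.random.default_rng(0)
    inits = rng.random((n_runs, n_states), dtype=np.float32)
    params = rng.random((n_runs, n_params), dtype=np.float32)
    forcing = np.zeros((n_runs, 1), dtype=np.float32)   # assumed shape

    kernel.run(
        inits,
        params,
        forcing,
        duration=10.0,
        blocksize=256,      # default CUDA block size
        warmup=1.0,         # settle transients before saving output
        chunk_axis="run",   # chunk GPU memory along the run axis
    )

    states = kernel.state   # saved state trajectories after the run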

property save_time

Get save time array.

Returns:

Time points for saved output.

Return type:

array_like

Notes

Exposes the save_time attribute from the child SingleIntegratorRun object.

property saved_observable_indices

Get saved observable indices.

Returns:

Indices of observable variables to save.

Return type:

NDArray[np.int_]

Notes

Exposes the saved_observable_indices attribute from the child SingleIntegratorRun object.

property saved_state_indices

Get saved state indices.

Returns:

Indices of state variables to save.

Return type:

NDArray[np.int_]

Notes

Exposes the saved_state_indices attribute from the child SingleIntegratorRun object.

property shared_memory_bytes_per_run

Get shared memory bytes per run.

Returns:

Shared memory bytes required per run.

Return type:

int

Notes

Exposes the shared_memory_bytes attribute from the child SingleIntegratorRun object.

property shared_memory_elements_per_run

Get shared memory elements per run.

Returns:

Number of shared memory elements required per run.

Return type:

int

Notes

Exposes the shared_memory_elements attribute from the child SingleIntegratorRun object.
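
A back-of-envelope check of how these figures bound the block size, assuming the 2D layout described under build_kernel() and the common 48 KB default of dynamic shared memory per block (actual limits vary by GPU and configuration):

    blocksize = 256                                        # hypothetical
    runs_per_block = blocksize // kernel.threads_per_loop
    bytes_per_block = kernel.shared_memory_bytes_per_run * runs_per_block
    assert bytes_per_block <= 48 * 1024, "reduce blocksize"

Note that run() performs an equivalent adjustment automatically, warning when it has to shrink the block size.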

property state

Get state array.

Returns:

The state array.

Return type:

array_like

property state_summaries

Get state summaries array.

Returns:

The state summaries array.

Return type:

array_like

property stream

Get the assigned CUDA stream.

Returns:

The CUDA stream assigned to the solver.

Return type:

Stream

property stream_group

Get the stream group.

Returns:

The stream group the solver is in.

Return type:

str

property summaries_buffer_sizes

Get summaries buffer sizes.

Returns:

Summaries buffer sizes information.

Return type:

object

Notes

Exposes the summaries_buffer_sizes attribute from the child SingleIntegratorRun object.

property summaries_length

Get number of summary samples per run.

Returns:

Number of summary samples per run.

Return type:

int

property summarised_observable_indices

Get summarised observable indices.

Returns:

Indices of observable variables to summarise.

Return type:

ArrayLike

Notes

Exposes the summarised_observable_indices attribute from the child SingleIntegratorRun object.

property summarised_state_indices

Get summarised state indices.

Returns:

Indices of state variables to summarise.

Return type:

ArrayLike

Notes

Exposes the summarised_state_indices attribute from the child SingleIntegratorRun object.

property summary_legend_per_variable

Get summary legend per variable.

Returns:

Summary legend per variable information.

Return type:

object

Notes

Exposes the summary_legend_per_variable attribute from the child SingleIntegratorRun object.

property system

Get the ODE system.

Returns:

The ODE system being integrated.

Return type:

object

Notes

Exposes the system attribute from the SingleIntegratorRun instance.

property system_sizes

Get system sizes.

Returns:

System sizes information.

Return type:

object

Notes

Exposes the system_sizes attribute from the child SingleIntegratorRun object.

property threads_per_loop

Get threads per loop.

Returns:

Number of threads per integration loop.

Return type:

int

Notes

Exposes the threads_per_loop attribute from the child SingleIntegratorRun object.

update(updates_dict=None, silent=False, **kwargs)[source]

Update solver configuration parameters.

Parameters:
  • updates_dict (dict, optional) – Dictionary of parameter updates.

  • silent (bool, default=False) – If True, suppresses error messages for unrecognized parameters.

  • **kwargs – Additional parameter updates passed as keyword arguments.

Returns:

Set of recognized parameter names that were updated.

Return type:

set

Raises:

KeyError – If unrecognized parameters are provided and silent=False.

Notes

This method attempts to update parameters in both the compile settings and the single integrator instance. Unrecognized parameters are collected and reported as an error unless silent mode is enabled.
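
A usage sketch; the keys mirror the constructor arguments documented above:

    # Recognised keys are applied and returned as a set.
    changed = kernel.update({"duration": 20.0, "dt_save": 0.05})
    # e.g. changed == {"duration", "dt_save"}

    # Unrecognised keys raise KeyError unless silent=True.
    kernel.update(not_a_real_setting=1.0, silent=True)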

property warmup

Get warmup time.

Returns:

Warmup time of the simulation.

Return type:

float

property warmup_length

Get number of warmup samples.

Returns:

Number of warmup samples.

Return type:

int