cubie.batchsolving.BatchSolverKernel
Batch Solver Kernel Module.
This module provides the BatchSolverKernel class, which manages GPU-based batch integration of ODE systems using CUDA. The kernel handles the distribution of work across GPU threads and manages memory allocation for batched integrations.
Created on Tue May 27 17:45:03 2025
@author: cca79
Classes

BatchSolverKernel – CUDA-based batch solver kernel for ODE integration.
- class cubie.batchsolving.BatchSolverKernel.BatchSolverKernel(system, algorithm: str = 'euler', duration: float = 1.0, warmup: float = 0.0, dt_min: float = 0.01, dt_max: float = 0.1, dt_save: float = 0.1, dt_summarise: float = 1.0, atol: float = 1e-06, rtol: float = 1e-06, saved_state_indices: NDArray[np.int64] = None, saved_observable_indices: NDArray[np.int64] = None, summarised_state_indices: ArrayLike = None, summarised_observable_indices: ArrayLike = None, output_types: list[str] = None, precision: type = np.float64, profileCUDA: bool = False, memory_manager: MemoryManager = default_memmgr, stream_group: str = 'solver', mem_proportion: float = None)[source]
Bases:
CUDAFactory
CUDA-based batch solver kernel for ODE integration.
This class builds and holds the integrating kernel and interfaces with lower-level modules including loop algorithms, ODE systems, and output functions. The kernel function accepts single or batched sets of inputs and distributes those amongst the threads on the GPU.
- Parameters:
system (object) – The ODE system to be integrated.
algorithm (str, default='euler') – Integration algorithm to use.
duration (float, default=1.0) – Duration of the simulation.
warmup (float, default=0.0) – Warmup time before the main simulation.
dt_min (float, default=0.01) – Minimum allowed time step.
dt_max (float, default=0.1) – Maximum allowed time step.
dt_save (float, default=0.1) – Time step for saving output.
dt_summarise (float, default=1.0) – Time step for saving summaries.
atol (float, default=1e-6) – Absolute tolerance for adaptive stepping.
rtol (float, default=1e-6) – Relative tolerance for adaptive stepping.
saved_state_indices (NDArray[np.int_], optional) – Indices of state variables to save.
saved_observable_indices (NDArray[np.int_], optional) – Indices of observable variables to save.
summarised_state_indices (ArrayLike, optional) – Indices of state variables to summarise.
summarised_observable_indices (ArrayLike, optional) – Indices of observable variables to summarise.
output_types (list[str], optional) – Types of outputs to generate. Default is [“state”].
precision (type, default=np.float64) – Numerical precision to use.
profileCUDA (bool, default=False) – Whether to enable CUDA profiling.
memory_manager (MemoryManager, default=default_memmgr) – Memory manager instance to use.
stream_group (str, default='solver') – CUDA stream group identifier.
mem_proportion (float, optional) – Proportion of GPU memory to allocate.
Notes
This class is one level down from the user, managing sanitised inputs and handling the machinery of batching and running integrators. It does not handle:
- Integration logic/algorithms – these are handled in SingleIntegratorRun and below
- Input sanitisation / batch construction – this is handled in the solver API
- System equations – these are handled in the system model classes
The class runs the loop device function on a given slice of its allocated memory and serves as the distributor of work amongst the individual runs of the integrators.
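The per-run slicing described above can be sketched in plain numpy. This is an illustrative model, not cubie's actual implementation: it assumes the default stride order ('time', 'run', 'variable'), under which each integrator run writes into an independent 2D window of the shared batch output buffer.

```python
import numpy as np

# With stride order ('time', 'run', 'variable'), a batch output buffer is a
# 3D array; each run owns the (time, variable) slice at its run index.
n_time, n_runs, n_vars = 5, 4, 3
output = np.zeros((n_time, n_runs, n_vars))

def run_slice(output, run_idx):
    """Return the (time, variable) view that a single run writes into."""
    return output[:, run_idx, :]

# Writing through one run's view leaves every other run's slice untouched.
run_slice(output, 2)[:] = 1.0
```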
- _on_allocation(response)[source]
Handle memory allocation response.
- Parameters:
response (ArrayResponse) – Memory allocation response containing chunk information.
- property active_output_arrays: ActiveOutputs
Get active output arrays.
- Returns:
Active output arrays configuration.
- Return type:
ActiveOutputs
Notes
Exposes the _active_outputs attribute from the child OutputArrays object.
- property algorithm
Get the integration algorithm.
- Returns:
The integration algorithm being used.
- Return type:
str
- property atol
Get absolute tolerance.
- Returns:
Absolute tolerance for the solver.
- Return type:
float
- build()[source]
Build the integration kernel.
- Returns:
The built integration kernel.
- Return type:
CUDA device function
- build_kernel()[source]
Build and compile the CUDA integration kernel.
- Returns:
Compiled CUDA kernel function for integration.
- Return type:
function
Notes
This method creates a CUDA kernel that:

1. Distributes work across GPU threads
2. Manages shared memory allocation
3. Calls the underlying integration loop function
4. Handles output array indexing and slicing

The kernel uses a 2D thread block structure where:

- the x-dimension handles intra-run parallelism
- the y-dimension handles different runs
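The index arithmetic implied by that 2D layout can be shown in plain Python. The function below is a hypothetical sketch: a real CUDA kernel would read these values from the block and thread indices at launch, but the mapping from (block, thread) coordinates to a global run index is the same.

```python
# Hypothetical sketch of the 2D thread-block index mapping: the x-dimension
# covers threads cooperating within one run, the y-dimension enumerates runs.
def global_indices(block_idx_y, block_dim_y, thread_idx_y, thread_idx_x):
    run_idx = block_idx_y * block_dim_y + thread_idx_y  # which run this thread serves
    lane = thread_idx_x                                 # this thread's position within the run
    return run_idx, lane

# Block 1 of 8 runs each, thread y=3 -> global run index 11.
```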
- property device_observable_summaries_array
Get device observable summaries array.
- Returns:
Device observable summaries array.
- Return type:
Notes
Exposes the observable_summaries attribute from the child OutputArrays object.
- property device_observables_array
Get device observables array.
- Returns:
Device observables array.
- Return type:
Notes
Exposes the observables attribute from the child OutputArrays object.
- property device_state_array
Get device state array.
- Returns:
Device state array.
- Return type:
Notes
Exposes the state attribute from the child OutputArrays object.
- property device_state_summaries_array
Get device state summaries array.
- Returns:
Device state summaries array.
- Return type:
Notes
Exposes the state_summaries attribute from the child OutputArrays object.
- disable_profiling()[source]
Disable CUDA profiling for the solver.
Notes
This stops profiling the performance of the solver on the GPU, which speeds execution back up.
- property dt_max
Get maximum step size.
- Returns:
Maximum step size allowed for the solver.
- Return type:
float
- property dt_min
Get minimum step size.
- Returns:
Minimum step size allowed for the solver.
- Return type:
float
- property dt_save
Get save time step.
- Returns:
Time step for saving output.
- Return type:
float
Notes
Exposes the dt_save attribute from the child SingleIntegratorRun object.
- property dt_summarise
Get summary time step.
- Returns:
Time step for saving summaries.
- Return type:
float
Notes
Exposes the dt_summarise attribute from the child SingleIntegratorRun object.
- enable_profiling()[source]
Enable CUDA profiling for the solver.
Notes
This will allow you to profile the performance of the solver on the GPU, but will slow things down. Consider disabling optimisation and enabling debug and line info for profiling.
- property fixed_step_size
Get the fixed step size.
- Returns:
Fixed step size for the solver.
- Return type:
float
Notes
Exposes the step_size attribute from the child SingleIntegratorRun object.
- property forcing_vectors
Get forcing vectors array.
- Returns:
The forcing vectors array.
- Return type:
array_like
- property initial_values
Get initial values array.
- Returns:
The initial values array.
- Return type:
array_like
- property kernel
Get the device function kernel.
- Returns:
The compiled CUDA device function.
- Return type:
function
- property mem_proportion
Get the memory proportion.
- Returns:
The memory proportion the solver is assigned.
- Return type:
float
- property memory_manager
Get the memory manager.
- Returns:
The memory manager the solver is registered with.
- Return type:
MemoryManager
- property observable_summaries
Get observable summaries array.
- Returns:
The observable summaries array.
- Return type:
array_like
- property observables
Get observables array.
- Returns:
The observables array.
- Return type:
array_like
- property ouput_array_sizes_2d
Get 2D output array sizes.
- Returns:
The 2D output array sizes for a single run.
- Return type:
- property output_array_heights
Get output array heights.
- Returns:
Output array heights information.
- Return type:
Notes
Exposes the output_array_heights attribute from the child SingleIntegratorRun object.
- property output_array_sizes_3d
Get 3D output array sizes.
- Returns:
The 3D output array sizes for a batch of runs.
- Return type:
- property output_heights
Get output array heights.
- Returns:
Output array heights from the child SingleIntegratorRun object.
- Return type:
Notes
Exposes the output_array_heights attribute from the child SingleIntegratorRun object.
- property output_length
Get number of output samples per run.
- Returns:
Number of output samples per run.
- Return type:
int
- property output_stride_order
Get output stride order.
- Returns:
The axis order of the output arrays.
- Return type:
tuple of str
- property output_types
Get output types.
- Returns:
Types of outputs to generate.
- Return type:
list[str]
Notes
Exposes the output_types attribute from the child SingleIntegratorRun object.
- property parameters
Get parameters array.
- Returns:
The parameters array.
- Return type:
array_like
- property precision
Get numerical precision type.
- Returns:
Numerical precision type (e.g., np.float64).
- Return type:
type
Notes
Exposes the precision attribute from the child SingleIntegratorRun object.
- property rtol
Get relative tolerance.
- Returns:
Relative tolerance for the solver.
- Return type:
float
- run(inits, params, forcing_vectors, duration, blocksize=256, stream=None, warmup=0.0, chunk_axis='run')[source]
Execute the solver kernel for batch integration.
- Parameters:
inits (array_like) – Initial conditions for each run. Shape should be (n_runs, n_states).
params (array_like) – Parameters for each run. Shape should be (n_runs, n_params).
forcing_vectors (array_like) – Forcing vectors for each run.
duration (float) – Duration of the simulation.
blocksize (int, default=256) – CUDA block size for kernel execution.
stream (object, optional) – CUDA stream to use. If None, uses the solver’s assigned stream.
warmup (float, default=0.0) – Warmup time before the main simulation.
chunk_axis (str, default='run') – Axis along which to chunk the computation (‘run’ or ‘time’).
Notes
This method performs the main batch integration by:

1. Setting up input and output arrays
2. Allocating GPU memory in chunks
3. Executing the CUDA kernel for each chunk
4. Handling memory management and synchronization
The method automatically adjusts block size if shared memory requirements exceed GPU limits, warning the user about potential performance impacts.
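The chunked allocation step can be illustrated with a minimal sketch. This is assumed behaviour, not cubie's exact code: the batch is split along the 'run' axis into chunks whose device-array footprint fits within a memory budget, and the kernel is then launched once per chunk.

```python
# Minimal sketch (assumed, simplified) of chunking a batch along the 'run'
# axis so each chunk's device arrays fit within a memory budget.
def chunk_runs(n_runs, bytes_per_run, memory_budget):
    """Yield (start, stop) run ranges whose combined footprint fits the budget."""
    runs_per_chunk = max(1, memory_budget // bytes_per_run)
    for start in range(0, n_runs, runs_per_chunk):
        yield start, min(start + runs_per_chunk, n_runs)

chunks = list(chunk_runs(n_runs=10, bytes_per_run=400, memory_budget=1000))
# Two runs fit per chunk -> [(0, 2), (2, 4), (4, 6), (6, 8), (8, 10)]
```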
- property save_time
Get save time array.
- Returns:
Time points for saved output.
- Return type:
array_like
Notes
Exposes the save_time attribute from the child SingleIntegratorRun object.
- property saved_observable_indices
Get saved observable indices.
- Returns:
Indices of observable variables to save.
- Return type:
NDArray[np.int_]
Notes
Exposes the saved_observable_indices attribute from the child SingleIntegratorRun object.
- property saved_state_indices
Get saved state indices.
- Returns:
Indices of state variables to save.
- Return type:
NDArray[np.int_]
Notes
Exposes the saved_state_indices attribute from the child SingleIntegratorRun object.
Get shared memory bytes per run.
- Returns:
Shared memory bytes required per run.
- Return type:
int
Notes
Exposes the shared_memory_bytes attribute from the child SingleIntegratorRun object.
Get shared memory elements per run.
- Returns:
Number of shared memory elements required per run.
- Return type:
int
Notes
Exposes the shared_memory_elements attribute from the child SingleIntegratorRun object.
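These per-run shared memory figures feed the block-size adjustment that run() performs when shared memory requirements exceed GPU limits. The sketch below is illustrative only: the function name, the one-run-per-thread simplification, and the 48 KiB per-block limit are assumptions, not cubie's actual values.

```python
# Hedged sketch of reducing the block size until the block's total shared
# memory fits the per-block limit (48 KiB is a common, but assumed, figure).
def adjust_blocksize(blocksize, shared_bytes_per_run, max_shared_per_block=48 * 1024):
    # Simplification: assume one run per thread, so a block of `blocksize`
    # threads needs blocksize * shared_bytes_per_run bytes of shared memory.
    while blocksize > 1 and blocksize * shared_bytes_per_run > max_shared_per_block:
        blocksize //= 2
    return blocksize
```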
- property state
Get state array.
- Returns:
The state array.
- Return type:
array_like
- property state_summaries
Get state summaries array.
- Returns:
The state summaries array.
- Return type:
array_like
- property stream
Get the assigned CUDA stream.
- Returns:
The CUDA stream assigned to the solver.
- Return type:
Stream
- property stream_group
Get the stream group.
- Returns:
The stream group the solver is in.
- Return type:
str
- property summaries_buffer_sizes
Get summaries buffer sizes.
- Returns:
Summaries buffer sizes information.
- Return type:
Notes
Exposes the summaries_buffer_sizes attribute from the child SingleIntegratorRun object.
- property summaries_length
Get number of summary samples per run.
- Returns:
Number of summary samples per run.
- Return type:
int
- property summarised_observable_indices
Get summarised observable indices.
- Returns:
Indices of observable variables to summarise.
- Return type:
ArrayLike
Notes
Exposes the summarised_observable_indices attribute from the child SingleIntegratorRun object.
- property summarised_state_indices
Get summarised state indices.
- Returns:
Indices of state variables to summarise.
- Return type:
ArrayLike
Notes
Exposes the summarised_state_indices attribute from the child SingleIntegratorRun object.
- property summary_legend_per_variable
Get summary legend per variable.
- Returns:
Summary legend per variable information.
- Return type:
Notes
Exposes the summary_legend_per_variable attribute from the child SingleIntegratorRun object.
- property system
Get the ODE system.
- Returns:
The ODE system being integrated.
- Return type:
Notes
Exposes the system attribute from the SingleIntegratorRun instance.
- property system_sizes
Get system sizes.
- Returns:
System sizes information.
- Return type:
Notes
Exposes the system_sizes attribute from the child SingleIntegratorRun object.
- property threads_per_loop
Get threads per loop.
- Returns:
Number of threads per integration loop.
- Return type:
int
Notes
Exposes the threads_per_loop attribute from the child SingleIntegratorRun object.
- update(updates_dict=None, silent=False, **kwargs)[source]
Update solver configuration parameters.
- Parameters:
updates_dict (dict, optional) – Mapping of parameter names to new values.
silent (bool, default=False) – If True, do not raise on unrecognized parameters.
**kwargs – Parameter updates supplied as keyword arguments.
- Returns:
Set of recognized parameter names that were updated.
- Return type:
set
- Raises:
KeyError – If unrecognized parameters are provided and silent=False.
Notes
This method attempts to update parameters in both the compile settings and the single integrator instance. Unrecognized parameters are collected and reported as an error unless silent mode is enabled.
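The contract described above can be modelled with a small stand-in. This is an illustrative sketch of the documented behaviour, not cubie's implementation: recognized keys are applied, unrecognized keys raise KeyError unless silent is set, and the set of updated names is returned.

```python
# Simplified stand-in for the documented update() behaviour, with a plain
# dict playing the role of the compile settings.
def update(settings, updates_dict=None, silent=False, **kwargs):
    updates = {**(updates_dict or {}), **kwargs}
    recognized = {key for key in updates if key in settings}
    unrecognized = set(updates) - recognized
    if unrecognized and not silent:
        raise KeyError(f"Unrecognized parameters: {sorted(unrecognized)}")
    for key in recognized:
        settings[key] = updates[key]
    return recognized
```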