Nonlinear Solvers ================= Implicit integration methods (DIRK, FIRK) produce a system of nonlinear equations at each time step. This page describes how CuBIE solves those equations on the GPU. Why a Solver is Needed ---------------------- An implicit Runge--Kutta stage satisfies: .. math:: k_i = f\!\Bigl(t_n + c_i h,\; x_n + h \sum_j a_{ij}\, k_j\Bigr) Because :math:`k_i` appears on both sides, we cannot simply evaluate the right-hand side. Instead we must find the :math:`k_i` that make the equation hold---a root-finding problem: .. math:: G(K) = K - F(K) = 0 where :math:`K` stacks all stage vectors and :math:`F` wraps the right-hand-side evaluations. Newton Iteration ---------------- CuBIE uses a simplified Newton method. Starting from an initial guess :math:`K^{(0)}` (typically extrapolated from the previous step), each iteration solves: .. math:: M\, \Delta K = -G(K^{(m)}), \qquad K^{(m+1)} = K^{(m)} + \Delta K where :math:`M \approx \partial G / \partial K` is formed from the Jacobian of :math:`f` and the tableau coefficients. In the simplified variant the Jacobian is evaluated once at the start of the step rather than every iteration. Convergence is checked by monitoring :math:`\|\Delta K\| / \|K^{(m+1)}\|`. If the ratio stalls, the step is rejected and retried with a smaller :math:`h`. Krylov Linear Solver -------------------- The linear system :math:`M \Delta K = r` is large (:math:`s \times n` for an :math:`s`-stage method with :math:`n` states), and forming :math:`M` explicitly would require :math:`O(n^2)` storage per thread---prohibitive on a GPU. CuBIE therefore uses a *matrix-free Krylov* solver that needs only the action of :math:`M` on a vector: .. math:: v \mapsto M\, v = v - h\, (A \otimes I)\, (I \otimes J)\, v The Jacobian--vector product :math:`J\, v` is computed symbolically by CuBIE's code generator (see :doc:`jacobians`), so no matrix is ever stored. Preconditioning ^^^^^^^^^^^^^^^ Krylov methods converge faster when the system is *preconditioned*: instead of solving :math:`M x = b` directly, we solve :math:`P^{-1} M x = P^{-1} b` for a cheaply invertible :math:`P` that approximates :math:`M`. CuBIE uses a *Neumann-series preconditioner*: .. math:: P^{-1} \approx I + N + N^2 + \cdots + N^k where :math:`N = I - M` and :math:`k` is a low-order truncation (typically 1--3). This requires only repeated matrix--vector products and no factorisation, making it well suited to the GPU's throughput model. Return Code Encoding -------------------- CuBIE encodes the Newton iteration count into the upper 16 bits of each step's integer return code: .. code-block:: python iterations = (code >> 16) & 0xFFFF status = code & 0xFFFF # compare with IntegratorReturnCodes This allows post-solve analysis of solver effort without extra output buffers. See :doc:`/user_guide/results` for how to access iteration counters.