Making it Faster (basic)
========================

TL;DR
-----

To get the best performance from Cubie, try to:

- Solve many problems at once (thousands if possible).
- Reduce the number of variables and samples you save or summarise.
- Set all parameters that you're not changing between solves to be
  `constants`.
- Reuse existing Solvers.

Parallelism
-----------

If we compare a like-for-like implementation of an IVP integration using Cubie
vs using an optimised CPU-utilising library like SciPy, we find that Cubie is
some linear factor slower. This is expected - GPU hardware isn't optimised to
process single tasks quickly. Instead, it's optimised to process many tasks in
parallel. Except for a penalty transferring data in and out of the integration
functions, increasing the number of problems being solved at once by a factor
:math:`n` has little effect on the total time taken - you get :math:`n-1`
problems solved *almost* for free. The single best way to get a performance
gain from using Cubie is to solve more problems at once.

Memory
------

The big bottleneck in GPU computing is memory traffic. When you're completing
32,000 integrations at the same time, they all want to save a sample of their
state at the same time, and this puts a lot of pressure on the tubes between
the GPU and its memory. The more data you save, the slower it goes. Cubie has
three main levers you can pull to reduce memory traffic and speed up your
solves:

1. Reduce the number of variables you save. If you're solving a 10D system but
   only care about one variable, only save that one variable.
2. Reduce the number of samples you save. If you're solving a system for 1000
   time units but only care about the state at the end, only save the final
   state.
3. Use summary metrics. If you want to know the mean and standard deviation of
   a variable over the course of the solve, rather than save the whole history
   and process each dataset offline (slow!), use Cubie's built-in summary
   metrics to calculate these on the GPU during the solve. You don't need to
   save the state history at all!

Constants
---------

When you tell Cubie about your problem, you provide some symbols/variables
that are input-only - they don't change during the solve. If you're
brute-forcing a parameter study, you will want to be able to start an IVP from
a bunch of different values for some of these parameters. However, you may
have other parameters that you're not interested in changing between solves.
If you mark these as `constants` when defining your system of ODEs, Cubie puts
them in a different place in memory - rather than taking up space in the
scarce fast memory that needs to be able to change often, they go into the
compiled program itself. This means they require no memory traffic, and they
free up space to run more problems at once!

Reusing Solvers
---------------

Cubie uses Numba to just-in-time compile your system of ODEs and chosen
integration algorithm into a CUDA kernel. This compilation step takes time (up
to 30s!), so if you're only solving a few problems at once, it can dominate
the total time taken. The compiled kernel is cached, and as long as you're
running the same system with the same algorithm and saving at the same cadence
(more or less; there are some other settings you might change that could force
a recompile), Cubie will reuse the existing kernel and skip compilation for
the next run. For this reason, if you're going to run multiple batches with
the same system, instead of using the :func:`cubie.solve_ivp` function, create
a :class:`cubie.Solver` object and call its :meth:`solve` method multiple
times. Keeping a reference to the :class:`cubie.Solver` object means that
subsequent calls to :meth:`solve` will be much faster.
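
The compile-once, run-many pattern described above can be sketched in plain
Python. This is only an illustration: ``ToySolver``, its ``solve`` signature,
the simulated 0.2 s "compile" delay and the Euler "kernel" are all
hypothetical stand-ins and are not Cubie's API. The sketch assumes nothing
beyond the standard library.

```python
import time


class ToySolver:
    """Hypothetical stand-in for a solver that compiles a kernel on first use.

    NOT Cubie's API: it only mimics the compile-once, run-many pattern.
    """

    def __init__(self, compile_seconds=0.2):
        self.compile_seconds = compile_seconds
        self.compile_count = 0
        self._kernel = None  # built lazily on the first call to solve()

    def _compile(self):
        # Pretend that just-in-time compilation is expensive.
        time.sleep(self.compile_seconds)
        self.compile_count += 1

        # The "kernel" integrates dx/dt = -k * x with explicit Euler steps.
        def kernel(x0, k, dt, n_steps):
            x = x0
            for _ in range(n_steps):
                x += dt * (-k * x)
            return x

        return kernel

    def solve(self, x0_values, k, dt=0.01, n_steps=100):
        # Compile only if this object has no kernel yet, then reuse it.
        if self._kernel is None:
            self._kernel = self._compile()
        return [self._kernel(x0, k, dt, n_steps) for x0 in x0_values]


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Keep one solver object alive and call it for each batch.
solver = ToySolver()
first, t_first = timed(lambda: solver.solve([1.0, 2.0], k=0.5))
second, t_second = timed(lambda: solver.solve([3.0, 4.0], k=0.5))

print(f"first call:  {t_first:.3f} s (includes the simulated compile)")
print(f"second call: {t_second:.3f} s (kernel reused)")
print(f"compilations: {solver.compile_count}")
```

Creating a fresh ``ToySolver`` for every batch would pay the simulated compile
cost each time, which is the toy equivalent of discarding the solver object
between batches.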