.. _cuda:

.. py:currentmodule:: tissue_forge

GPU Acceleration
=================

Tissue Forge supports modular, runtime-configurable GPU acceleration of a simulation
using CUDA. Computational features of Tissue Forge that support GPU acceleration can be
configured, offloaded to a GPU, brought back to the CPU and reconfigured at any time
during a simulation. For Tissue Forge installations with GPU acceleration enabled,
no computations are performed on a GPU by default. Rather, GPU-supporting features of
Tissue Forge must be explicitly configured and offloaded to a GPU using their
corresponding interactive interface. This modular, configurable approach allows
fine-grained control of computations to achieve maximum performance for a given set of
hardware *and* a particular simulation.

For example, suppose a simulation begins with a few hundred particles. Such a simulation
would likely not benefit from GPU acceleration (and might even run slower on a GPU).
However, suppose that over the course of the simulation, hundreds of thousands of
:ref:`particles are created `. At some point, this simulation will run faster on a GPU.
Tissue Forge handles such a situation by allowing the computations of particle
interactions to be offloaded to a GPU mid-execution of the simulation
(and brought back to the CPU, should the particle number significantly decrease).

Deployment on a GPU is best accomplished when running Tissue Forge in
:ref:`windowless mode `, since real-time rendering of interactive Tissue Forge
simulations also utilizes available GPUs.

.. note::

    Tissue Forge currently supports acceleration using a single GPU.
    Future releases will support deploying computations on multiple GPUs
    by computational feature.

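
The mid-simulation switch described above can be sketched as a small helper that
moves a feature to or from a GPU according to the current particle count.
This is only an illustration under stated assumptions: the helper ``sync_device``,
the ``DeviceConfig`` protocol, the ``FakeConfig`` stand-in and the threshold of
50,000 particles are all hypothetical and not part of Tissue Forge. The only calls
it relies on are the ``on_device``, ``to_device`` and ``from_device`` methods of the
runtime control interfaces described later on this page, ::

    from typing import Protocol


    class DeviceConfig(Protocol):
        """Subset of a GPU runtime control interface used by this sketch."""

        def on_device(self) -> bool: ...
        def to_device(self) -> None: ...
        def from_device(self) -> None: ...


    def sync_device(config: DeviceConfig, num_particles: int, threshold: int = 50_000) -> bool:
        """Move a feature to the GPU at or above ``threshold`` particles, else to the CPU.

        The threshold is an assumed, hardware-dependent tuning value.
        Returns True if the feature should be on the GPU after the call.
        """
        want_gpu = num_particles >= threshold
        if want_gpu and not config.on_device():
            config.to_device()
        elif not want_gpu and config.on_device():
            config.from_device()
        return want_gpu


    class FakeConfig:
        """Hypothetical in-memory stand-in so the sketch runs without a GPU build."""

        def __init__(self) -> None:
            self._on_device = False

        def on_device(self) -> bool:
            return self._on_device

        def to_device(self) -> None:
            self._on_device = True

        def from_device(self) -> None:
            self._on_device = False


    config = FakeConfig()
    sync_device(config, num_particles=300)      # Few particles: stays on the CPU
    sync_device(config, num_particles=200_000)  # Many particles: moves to the GPU
    print(config.on_device())

In an actual simulation script, the object passed as ``config`` would be something like
``tf.Simulator.getCUDAConfig().engine``, and the helper would be called between
simulation steps with a particle count obtained however the script tracks it.
A suitable threshold depends on the hardware and should be found by measurement.
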
Tissue Forge includes a flag :py:attr:`has_cuda` to check whether GPU acceleration
is supported by the installation (``hasCuda`` in C++), ::

    import tissue_forge as tf

    print(tf.has_cuda)  # True if GPU acceleration is installed; False otherwise

GPU-Accelerated Simulator
^^^^^^^^^^^^^^^^^^^^^^^^^^

:py:attr:`Simulator` provides access to runtime control of GPU-accelerated simulation features.
Each GPU-accelerated simulation feature has its own runtime control interface for
configuring and deploying on a GPU. GPU runtime control of simulation modules can be
accessed directly from :py:attr:`Simulator`, ::

    cuda_config_sim: tf.cuda.SimulatorConfig = tf.Simulator.getCUDAConfig()

The returned :py:class:`cuda.SimulatorConfig` (``cuda::SimulatorConfig`` in C++)
provides convenient access to all current GPU-accelerated simulation features.

GPU-Accelerated Engine
^^^^^^^^^^^^^^^^^^^^^^^

Engine GPU acceleration is a GPU-accelerated simulation feature that offloads
nonbonded potential interactions, fluxes, particle sorting and space partitioning onto a GPU.
All runtime controls of engine GPU acceleration are available on
:py:class:`cuda.EngineConfig` (``cuda::EngineConfig`` in C++), which is an attribute
with name ``engine`` on :py:class:`cuda.SimulatorConfig`, ::

    cuda_config_engine = tf.Simulator.getCUDAConfig().engine  # Get engine CUDA runtime interface

Engine GPU acceleration can be enabled, disabled and customized during simulation
according to hardware capabilities and simulation state, ::

    cuda_config_engine.set_blocks(numBlocks=64)    # Set number of blocks
    cuda_config_engine.set_threads(numThreads=32)  # Set number of threads per block
    cuda_config_engine.to_device()                 # Send engine to GPU
    # Simulation code here...
    if cuda_config_engine.on_device():             # Check that engine is on GPU
        cuda_config_engine.from_device()           # Bring engine back from GPU

Setting a number of blocks specifies the maximum number of CUDA thread blocks that can be
deployed during a simulation step, which work on various engine tasks
(*e.g.*, calculating interactions among particles in a subspace of the simulation space).
Setting a number of threads per block specifies the number of threads launched per block
to work on each engine task.

Many Tissue Forge operations automatically update data when running on a GPU.
However, some operations (*e.g.*, :ref:`binding ` a :py:attr:`Potential`)
require manual refreshing of engine data for changes to be reflected when running on a GPU.
Engine GPU acceleration runtime control provides methods to explicitly tell Tissue Forge
to refresh data on a GPU at various levels of granularity, ::

    cuda_config_engine.refresh_potentials()           # Capture changes to potentials
    cuda_config_engine.refresh_fluxes()               # Capture changes to fluxes
    cuda_config_engine.refresh_boundary_conditions()  # Capture changes to boundary conditions
    cuda_config_engine.refresh()                      # Capture all changes

Refer to the :ref:`Tissue Forge API Reference ` for which operations automatically
update engine data on a GPU.

.. note::

    It is not always clear which changes are automatically detected by Tissue Forge
    when running on a GPU. When in doubt, refresh the data! A refresh comes with
    additional computational cost, but it needs to be performed only once all changes
    to simulation data have been made, and before the next simulation step is called.

GPU-Accelerated Bonds
^^^^^^^^^^^^^^^^^^^^^^

Bond GPU acceleration is a GPU-accelerated simulation feature that offloads
:ref:`bonded interactions ` onto a GPU.
All runtime controls of bond GPU acceleration are available on
:py:class:`cuda.BondConfig` (``cuda::BondConfig`` in C++), which is an attribute
with name ``bonds`` on :py:class:`cuda.SimulatorConfig`, ::

    cuda_config_bonds = tf.Simulator.getCUDAConfig().bonds  # Get bond CUDA runtime interface

The bond GPU acceleration runtime control interface is very similar to that of
engine GPU acceleration. Bond GPU acceleration can be enabled, disabled and customized
at any point in simulation, ::

    cuda_config_bonds.set_blocks(numBlocks=64)    # Set number of blocks
    cuda_config_bonds.set_threads(numThreads=32)  # Set number of threads per block
    cuda_config_bonds.to_device()                 # Send bonds to GPU
    # Simulation code here...
    if cuda_config_bonds.on_device():             # Check that bonds are on GPU
        cuda_config_bonds.from_device()           # Bring bonds back from GPU

Setting a number of blocks specifies the maximum number of CUDA thread blocks that can be
deployed during a simulation step, which calculate pairwise forces due to each bond.
Setting a number of threads per block specifies the number of threads launched per block
to work on force calculations.

Adding and destroying bonds both automatically update data while running on a GPU.
However, changes to bond properties (*e.g.*, half life) and to the bond potential
require manual refreshing of bond data for changes to be reflected when running on a GPU.
Bond GPU acceleration runtime control provides methods to explicitly tell Tissue Forge
to refresh data on a GPU at various levels of granularity, ::

    cuda_config_bonds.refresh_bond(bond)    # Capture changes to a bond
    cuda_config_bonds.refresh_bonds(bonds)  # Capture changes to multiple bonds
    cuda_config_bonds.refresh()             # Capture all changes

Angle GPU acceleration is a similar GPU-accelerated simulation feature that offloads
angle interactions onto a GPU.
The angle GPU acceleration runtime control interface is practically identical to that of
bond GPU acceleration (*e.g.*, ``refresh_angles`` for angle GPU acceleration is analogous to
``refresh_bonds`` for bond GPU acceleration). The angle GPU acceleration runtime control
interface is accessible on :py:class:`cuda.AngleConfig` (``cuda::AngleConfig`` in C++),
which is available as an attribute with name ``angles`` on :py:class:`cuda.SimulatorConfig`, ::

    cuda_config_angles = tf.Simulator.getCUDAConfig().angles  # Get angle CUDA runtime interface

Refer to the :ref:`Tissue Forge API Reference ` for which operations automatically
update bond and angle data on a GPU.
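
Because the engine, bond and angle interfaces share the same configure, offload and
retrieve pattern, several features can be handled together. The following is a minimal
sketch of one way to do so with a context manager. The names ``on_gpu`` and ``StubConfig``
are hypothetical and not part of Tissue Forge; the sketch assumes only the ``set_blocks``,
``set_threads``, ``to_device``, ``on_device`` and ``from_device`` methods shown above,
and uses a stand-in object so that it runs without a GPU build, ::

    from contextlib import contextmanager


    @contextmanager
    def on_gpu(*configs, num_blocks=64, num_threads=32):
        """Offload each given feature for the duration of a ``with`` block."""
        moved = []
        try:
            for config in configs:
                config.set_blocks(numBlocks=num_blocks)     # Set number of blocks
                config.set_threads(numThreads=num_threads)  # Set number of threads per block
                config.to_device()                          # Send feature to GPU
                moved.append(config)
            yield
        finally:
            # Bring back, in reverse order, every feature that is still on the GPU
            for config in reversed(moved):
                if config.on_device():
                    config.from_device()


    class StubConfig:
        """Hypothetical stand-in for a runtime control interface."""

        def __init__(self):
            self.blocks = None
            self.threads = None
            self._on_device = False

        def set_blocks(self, numBlocks):
            self.blocks = numBlocks

        def set_threads(self, numThreads):
            self.threads = numThreads

        def to_device(self):
            self._on_device = True

        def on_device(self):
            return self._on_device

        def from_device(self):
            self._on_device = False


    engine, bonds, angles = StubConfig(), StubConfig(), StubConfig()
    with on_gpu(engine, bonds, angles):
        pass  # Simulation code here...

With a GPU-enabled installation, the stand-ins would be replaced by the ``engine``,
``bonds`` and ``angles`` attributes of ``tf.Simulator.getCUDAConfig()``. The ``finally``
clause brings features back to the CPU even if the simulation code raises an exception.
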