FLOW MODELLING AROUND AN OSCILLATING OBJECT IN A CHANNEL BY THE LATTICE BOLTZMANN METHOD ON GPU DEVICES

This paper addresses the evaluation of fluid flow behaviour around an oscillating object in a channel. Simulations were performed with the lattice Boltzmann method using our own program. Flows over a circular cylinder with radius R = 0.0625 at different Reynolds numbers were modelled to verify the computation algorithm. The implementation was designed to run on CPU and GPU devices and is based on the TensorFlow framework.

3. Lattice Boltzmann method description and implementation. In general, the motion of a viscous fluid can be fully described by the Navier-Stokes equations, a system of partial differential equations that mathematically expresses conservation of momentum and conservation of mass for Newtonian fluids. Despite their strength and power, the Navier-Stokes equations are hard to implement and parallelize on modern devices due to their continuum nature, so other approaches more suitable for discrete space were developed.
One such algorithm is the lattice Boltzmann method. The main idea behind it is to represent the fluid on a mesoscopic level with a few assumptions and approximations. It models the fluid as consisting of fictive particles, and such particles perform consecutive propagation and collision processes over a discrete lattice. These particles describe the fluid through the discrete-velocity distribution function f_i(x, t), which represents the density of particles with velocity c_i = (c_ix, c_iy, c_iz) at position x and time t. The mass density and momentum density at (x, t) can be found through weighted sums known as moments of f_i [14]:

ρ(x, t) = Σ_i f_i(x, t),    ρu(x, t) = Σ_i c_i f_i(x, t).

The discrete lattice Boltzmann equation reads

f_i(x + c_i Δt, t + Δt) = f_i(x, t) + Ω_i(x, t),

where Ω_i is the collision operator. While there are many different collision operators available, the simplest one that can be used for Navier-Stokes simulations is the Bhatnagar-Gross-Krook (BGK) operator:

Ω_i = −(f_i − f_i^eq) Δt / τ.

It relaxes the populations towards an equilibrium f_i^eq at a rate determined by the relaxation time τ. The equilibrium is given by

f_i^eq(x, t) = w_i ρ (1 + (u·c_i)/c_s² + (u·c_i)²/(2c_s⁴) − (u·u)/(2c_s²)),

with the weights w_i specific to the chosen velocity set and the lattice speed of sound c_s. The equilibrium depends only on the local quantities density ρ and fluid velocity u. These are calculated from the local values of f_i, with the fluid velocity found as u(x, t) = ρu(x, t) / ρ(x, t). During implementation, the fully discretized Boltzmann equation with the BGK collision operator can be separated into two isolated steps: collision and streaming. Boundary conditions are handled additionally.
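As an illustration, the moments and the BGK equilibrium above can be computed for the common D2Q9 velocity set as follows. This is a minimal NumPy sketch, not the paper's actual code; the array layout (direction index first) and function names are our assumptions:

```python
import numpy as np

# D2Q9 velocity set and weights (c_s^2 = 1/3 in lattice units)
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)

def moments(f):
    """Density rho and velocity u from populations f of shape (9, ny, nx)."""
    rho = f.sum(axis=0)
    u = np.einsum('id,iyx->dyx', c, f) / rho     # momentum density / density
    return rho, u

def equilibrium(rho, u):
    """BGK equilibrium; 3, 4.5 and 1.5 are 1/c_s^2, 1/(2 c_s^4), 1/(2 c_s^2)."""
    cu = np.einsum('id,dyx->iyx', c, u)          # c_i . u for every direction
    usq = (u ** 2).sum(axis=0)                   # u . u
    return w[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2 - 1.5 * usq)
```

Note that for a fluid at rest (u = 0, ρ = 1) the equilibrium reduces to the weights w_i themselves, which makes a convenient initial condition.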
The first part is collision (or relaxation):

f_i*(x, t) = f_i(x, t) − (f_i(x, t) − f_i^eq(x, t)) Δt / τ,

where f_i* represents the distribution function after collisions and is found from f_i and f_i^eq.
The second part is streaming, when we move the computed lattice variables to the neighbouring lattice nodes:

f_i(x + c_i Δt, t + Δt) = f_i*(x, t).
Boundary conditions are handled before or after streaming by reassigning lattice entries with the values of the reversed (opposite) directions, i.e. the bounce-back rule.
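Putting the collision step, the streaming step, and the bounce-back rule together, a full update can be sketched as below. This is our own self-contained NumPy illustration (periodic edges, full bounce-back at obstacle nodes), not the paper's implementation:

```python
import numpy as np

# D2Q9 lattice: velocities, weights, and the index of each opposite direction
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])

def step(f, tau, solid):
    """One BGK update on populations f of shape (9, ny, nx).
    solid is a boolean (ny, nx) obstacle mask treated with full bounce-back."""
    rho = f.sum(axis=0)
    u = np.einsum('id,iyx->dyx', c, f) / rho
    cu = np.einsum('id,dyx->iyx', c, u)
    feq = w[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu**2
                                    - 1.5 * (u ** 2).sum(axis=0))
    f_star = f - (f - feq) / tau                 # collision: local, algebraic
    f_new = np.empty_like(f_star)                # streaming: pure data movement
    for i, (cx, cy) in enumerate(c):
        f_new[i] = np.roll(np.roll(f_star[i], cy, axis=0), cx, axis=1)
    f_new[:, solid] = f_star[opp][:, solid]      # bounce-back at solid nodes
    return f_new
```

Because the collision is elementwise and the streaming is just array shifts, this loop body maps directly onto vectorized tensor operations.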
Because the collision is a simple algebraic local operation and the streaming is pure data movement between arrays, the whole algorithm can be implemented on devices with a high degree of parallelism. Different frameworks can be used for that: OpenMP, CUDA, etc. For our implementation we chose TensorFlow. TensorFlow is an end-to-end open-source platform for machine learning with a comprehensive, flexible ecosystem of tools, libraries and community resources. More importantly for our project, this framework allows us to define an execution graph in Python and later compile it to a C/C++ executable with native CUDA support, or to JavaScript with web-browser compatibility. Thus we can reuse the same code base, with various optimizations under the hood, for different devices and use cases.
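To make the device-portability point concrete, here is a minimal sketch (not the authors' actual code) of how the streaming step maps onto TensorFlow ops: each population is shifted along its lattice velocity with tf.roll, so the whole update becomes a graph of tensor operations that TensorFlow can place on a CPU or a CUDA GPU without code changes:

```python
import tensorflow as tf

# D2Q9 velocities as plain Python tuples (cx, cy); our naming, for illustration
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]

@tf.function  # traced once into a graph; runs on whatever device is available
def stream(f):
    """Periodic streaming of populations f with shape (9, ny, nx)."""
    return tf.stack([
        tf.roll(f[i], shift=[cy, cx], axis=[0, 1])
        for i, (cx, cy) in enumerate(C)
    ])
```

The collision step is a pointwise tensor expression and composes with this in the same graph; compilation to a native executable or to JavaScript then reuses the identical definition.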

Experiments.
To verify our implementation, a few experiments with already known results were performed. We tested the flow around a fixed circular cylinder with radius R = 0.0625 and the same object oscillating with an amplitude equal to its radius and frequencies of 0.05 Hz, 0.1 Hz and 0.2 Hz. In addition, other amplitudes were considered and a CPU/GPU performance analysis was carried out. As shown in figure 2, the flow at Re = 10 is symmetric, while a Kármán vortex street appears at Reynolds numbers starting from Re = 60; the vortex shedding frequency increases with increasing Reynolds number. These results correspond to other full-scale and numerical experiments [15, 16].

Flow around an oscillating circular cylinder.
For this experiment we consider the same object oscillating along the Y axis with frequencies of 0.05 Hz, 0.1 Hz and 0.2 Hz and an amplitude equal to the object radius 0.0625 (fig. 4). The Reynolds number was fixed at 100. The simulation results showed that the vortex zones in the flow around a rigidly fixed cylinder have a symmetric shape. Oscillations of the cylinder violate this structure: symmetry is broken when vortices form around the cylinder, and the diameter of the vortex zones increases. In the near part of the vortex wake, several vortex bunches form on both sides of the cylinder, but on a smaller scale compared to the stationary cylinder. Moreover, the scale of these formations decreases with increasing vibration frequency, and further downstream they merge into structures of a larger scale. Similar results were obtained by Sheng Bao et al. [12].
To extend our results, experiments with larger amplitudes were considered as well (fig. 5, 6).

Performance analysis.
For the performance analysis we ran the experiment with the fixed object at different domain scales. Evaluation was made on a server with an Intel(R) Xeon(R) CPU E5-1630 v4 @ 3.70GHz and a GeForce GTX 1060 6GB GPU. It was observed that while the GPU vs CPU difference is subtle at small scales, it becomes crucial at larger scales.

Conclusion.
In this work it was shown that the lattice Boltzmann method is well suited for simulating different flow conditions. The implementation was verified against well-known experiments with fixed objects at different Reynolds numbers. Additional research was carried out for oscillating objects with various frequencies and amplitudes.
Moreover, it was shown that the algorithm can benefit greatly from parallel GPU utilization with the help of modern frameworks. A twofold speedup was achieved simply by moving exactly the same code from CPU to GPU devices.