Hashing grid cell indices based on grid dimensions #

Description #

This strategy is based on this NVIDIA article. The idea being that instead of storing indices of particles in a grid data structure, you can convert these 3-valued indices to single hashes. These hashes can then be used to sort the particle data so that the particle data is ordered based on their grid cell hash index. This is beneficial for GPUs, which is why it’s mentioned in the above article, but is also useful for CPUs as it iterating over the particle pairs more cache-friendly.

Read the above article for details. The code makes use of the C++ sort function available in the algorithm header.

Additional notes #

I managed to obtain another 1.4x speedup for 1E5 points over The half-search cell-list pair search when using no optimisations. A speedup of 4x over the previous version when compiling all the codes with -O3 (using total program time, measured roughly with time).

The significant speedup obtained is for a few reasons:

When searching adjacent cells, points in 3 consecutive cells can be searched in a single loop.
- This makes the search much more cache-friendly.
Branching is reduced due to
- the removed if statement when searching through the cell of interest, and
- the fewer number of loops.
In the case of compiling with more aggressive optimisations, the new structure is better able to leverage the compiler’s automatic SIMD instructions.

The main downside of the code is that now the points are ordered differently from the input. If the order of the points is not critical, then it is better to maintain this new order,as data locality would be much better. However, if necessary, reordering the points can be reversed easily, and with only a modest time penalty.

Another benefit is the lower memory usage by the algorithm. This is because it doesn’t need to allocate the cells array which is maxpercell*ngridx(1)*ngridx(2)*ngridx(3).

Code (Fortran and C++) #

Save the C++ file as sort.cpp, and Fortran code as cll3.F90. Compile the code with

g++ -c sort.cpp
gfortran -o cll3.x cll3.F90 sort.o -lstdc++

// sort.cpp
// C++ source code containing STL sort - modified for sort by indices

#include <algorithm>

extern "C" {
    
    void sort_hashes(int n, int* hashes, int* idx) {

        // filling idx with numbers 0:n-1
        for (int i = 0; i < n; ++i) {
            idx[i] = i;
        }

        // sorting idx based on hashes
        std::sort(idx, idx + n, 
            [&](const int& a, const int& b) {
                return (hashes[a] < hashes[b]);
            }
        );

        // adapting idx for 1-based indexing
        for (int i = 0; i < n; ++i) {
            idx[i]++;
        }
    }

}

! cll3.F90
! Module to perform direct search for pairs of points within a fixed cutoff.
! cellList subroutine
!   dim:      An integer defining dimension of the coordinates.
!   npoints:  An integer with number of points.
!   maxpercell: An integer with max number of points in a cell.
!   x:        An real/real(f) array of dim x points dimension containing the list of points to find pairs of.
!   cutoff:   The distance (same type as x) which defines a pair
!   maxnpair: An integer of the maximum number of pairs expected.
!   npairs:   An integer with the number of pairs found.
!   pair_i:   An integer array of maxnpairs length with the "left-sided" point in a pair.
!   pair_j:   An integer array of maxnpairs length with the "right-sided" point in a pair.

module cellList_m

   use iso_fortran_env, only: error_unit, real32, real64
   use iso_c_binding

   public

#ifdef SINGLEPRECISION
   integer, parameter:: f = real32
#else
   integer, parameter:: f = real64
#endif

   interface sort_hashes

      subroutine sort_hashes(n, hashes, idx) bind(C)

         use iso_c_binding

         integer(c_int), value, intent(in):: n
         integer(c_int), intent(in):: hashes(*)
         integer(c_int), intent(out):: idx(*)

      end subroutine sort_hashes

   end interface sort_hashes

contains

   !---------------------------------------------------------------------------
   function coordsToHash(n, dim, x, minx, ngridx, cutoff) result(gridhash)
      ! converts raw 2/3D positions to gridhashes based on the grid dimensions
      ! described by minx (lowest grid coordinate), and ngridx (number of grid
      ! cells in xyz direction).

      implicit none
      integer, intent(in):: n, dim, ngridx(3)
      real(f), intent(in):: x(dim, n), minx(dim), cutoff
      integer:: i, icell(3)
      integer(c_int):: gridhash(n)

      if (dim == 2) icell(3) = 1

      do i = 1, n
         icell(1:dim) = int((x(1:dim, i) - minx(1:dim))/cutoff) + 1
         gridhash(i) = ngridx(1)*ngridx(2)*(icell(3) - 1) + ngridx(1)*(icell(2) - 1) + icell(1)
      end do

   end function coordsToHash

   !---------------------------------------------------------------------------
   function hashIncrement(ngridx, dcellx, dcelly, dcellz) result(dhash)
      ! calculates the change in hash, given a change in xyz cell indices

      implicit none
      integer, intent(in):: ngridx(3), dcellx, dcelly, dcellz
      integer:: dhash

      dhash = ngridx(1)*ngridx(2)*dcellx + ngridx(1)*dcelly + dcellz

   end function hashIncrement

   !---------------------------------------------------------------------------
   subroutine rearrange(n, dim, idx, gridhash, x)
      ! rearranges gridhash and x, using the indices obtained from a sort by
      ! key.

      implicit none
      integer, intent(in):: n, dim, idx(n)
      integer(c_int), intent(inout):: gridhash(n)
      real(f), intent(inout):: x(dim, n)
      integer(c_int), allocatable:: tmpgridhash(:)
      real(f), allocatable:: tmpx(:, :)
      integer:: i

      allocate (tmpgridhash(n), tmpx(dim, n))

      do i = 1, n
         tmpgridhash(i) = gridhash(idx(i))
         tmpx(:, i) = x(:, idx(i))
      end do

      gridhash(:) = tmpgridhash(:)
      x(:, :) = tmpx(:, :)

   end subroutine rearrange

   !---------------------------------------------------------------------------
   subroutine find_starts(n, ngridx, gridhash, starts)
      ! finds the starting index for each grid cell. See "Building the grid
      ! using Sorting" section in https://developer.download.nvidia.com/assets/cuda/files/particles.pdf

      implicit none
      integer, intent(in):: n, ngridx(3)
      integer(c_int), intent(in):: gridhash(n)
      integer, intent(out):: starts(:)
      integer:: i

      starts(1:gridhash(1)) = 1

      do i = 2, n
         if (gridhash(i) /= gridhash(i - 1)) starts(gridhash(i - 1) + 1:gridhash(i)) = i
      end do

      starts(gridhash(n) + 1:ngridx(1)*ngridx(2)*ngridx(3)) = n + 1

   end subroutine find_starts

   !---------------------------------------------------------------------------
   subroutine update_pair_list(cutoff, jstart, jend, i, x, npairs, pair_i, pair_j)
      ! subroutine to loop over sequential grid cells and check if j point is
      ! within a given i point

      implicit none
      integer, intent(in):: jstart, jend, i
      real(f), intent(in):: cutoff, x(:, :)
      integer, intent(inout):: npairs, pair_i(:), pair_j(:)
      integer:: j
      real(f):: r2

      do j = jstart, jend
         r2 = sum((x(:, i) - x(:, j))**2)
         if (r2 <= cutoff*cutoff) then
            npairs = npairs + 1
            pair_i(npairs) = i
            pair_j(npairs) = j
         end if
      end do

   end subroutine update_pair_list

   !---------------------------------------------------------------------------
   subroutine cellList(dim, npoints, x, cutoff, maxnpair, npairs, pair_i, pair_j)
      ! main subroutine to perform pair search

      implicit none
      integer, intent(in):: dim, npoints, maxnpair
      real(f), intent(in):: cutoff
      real(f), intent(inout):: x(dim, npoints)
      integer, intent(out)::  npairs, pair_i(maxnpair), pair_j(maxnpair)
      integer:: i, hashi, hashj, ngridx(3)
      real(f):: minx(3), maxx(3)
      integer(c_int), allocatable:: gridhash(:), idx(:), starts(:)

      ! determine grid min-max coordinates, with concessions made for dim=2 case
      minx(1:dim) = minval(x, dim=2)
      if (dim == 2) minx(3) = 0.d0
      maxx(1:dim) = maxval(x, dim=2)
      if (dim == 2) maxx(3) = 0.d0
      
      ! creating buffer layers
      minx(:) = minx(:) - 2.d0*cutoff
      maxx(:) = maxx(:) + 2.d0*cutoff

      ! determining no. of grid cells and adjusting maximum extent
      ngridx(:) = int((maxx(:) - minx(:))/cutoff) + 1
      maxx(:) = maxx(:) + ngridx(:)*cutoff

      ! convert coordinates to cell grid hashes
      allocate (gridhash(npoints))
      gridhash = coordsToHash(npoints, dim, x, minx, ngridx, cutoff)

      ! get sorting indices to reorder points by ascending grid hash
      allocate (idx(npoints))
      call sort_hashes(npoints, gridhash, idx)

      ! reorder points (x, gridhash) by ascending grid hash
      call rearrange(npoints, dim, idx, gridhash, x)

      ! find starting points of each grid cell
      allocate (starts(product(ngridx)))
      call find_starts(npoints, ngridx, gridhash, starts)

      ! perform pair search
      npairs = 0
      do hashi = gridhash(1), gridhash(npoints) ! for all relevant grid cell points
         do i = starts(hashi), starts(hashi + 1) - 1 ! loop over all points in the cell
            ! loop over "other" points in cell + next cell (1st adjacent tell: top of hashi cell)
            call update_pair_list(cutoff, i + 1, starts(hashi + 2) - 1, i, x, npairs, pair_i, pair_j)
            ! loop over 2nd-4rd adjacent cells (bottom-south-west to top-south-west)
            hashj = hashi + hashIncrement(ngridx, -1, -1, -1)
            call update_pair_list(cutoff, starts(hashj), starts(hashj + 3) - 1, i, x, npairs, pair_i, pair_j)
            ! loop over 5th-7th adjacent cells (bottom-centre-west to top-centre-west)
            hashj = hashi + hashIncrement(ngridx, -1, 0, -1)
            call update_pair_list(cutoff, starts(hashj), starts(hashj + 3) - 1, i, x, npairs, pair_i, pair_j)
            ! loop over 8th-10th adjacent cells (bottom-north-west to top-north-west)
            hashj = hashi + hashIncrement(ngridx, -1, 1, -1)
            call update_pair_list(cutoff, starts(hashj), starts(hashj + 3) - 1, i, x, npairs, pair_i, pair_j)
            ! loop over 11th-13th adjacent cells (bottom-south-centre to top-south-centre)
            hashj = hashi + hashIncrement(ngridx, 0, -1, -1)
            call update_pair_list(cutoff, starts(hashj), starts(hashj + 3) - 1, i, x, npairs, pair_i, pair_j)
         end do
      end do

   end subroutine cellList

end module cellList_m

#ifndef NOMAIN
program main

   use cellList_m, only: f, cellList

   implicit none
   integer, parameter:: n = 1e2, dim = 3, maxnpair = 60*n ! estimated using
   ! 2x the coaxial spacing if the points were arranged in a square
   real(f), parameter:: cutoff = 2*n**(-1.d0/dim)
   real(f):: x(dim, n)
   integer, allocatable:: pair_i(:), pair_j(:)
   integer:: npairs, k

   allocate (pair_i(maxnpair), pair_j(maxnpair))

   ! initialize positions with pseudo-random numbers
   call random_number(x)

   write (*, '(A)') 'Executing cellList'

   ! finding pairs
   call cellList(dim, n, x, cutoff, maxnpair, npairs, pair_i, pair_j)

   write (*, '(2x, A,I4,A)') 'Found ', npairs, ' pairs'
   write (*, '(2x, A)') 'First and last 5 pairs of points found:'
   write (*, '(2x, 4(A4, 1x))') 'Pair', 'i', 'j'
   do k = 1, npairs
      if (k <= 5 .or. k > npairs - 4) write (*, '(2x, 3(I4, 1x))') k, &
         pair_i(k), pair_j(k)
      if (k == 6) write (*, '(2x, A)') '...'
   end do

end program main
#endif