
Cuda aware code

Sanfilippo edited this page Apr 27, 2021 · 2 revisions

Structures aware of where they are instantiated

Can we make a structure aware of whether it is instantiated in a device or in a host context?

The idea is to make a resource-owning structure automatically allocate its resources on the host or on the device, depending on the context in which the structure is instantiated.

  template <bool OnDevice=
#ifdef __CUDA_ARCH__
	    true
#else
	    false
#endif
	    >
  struct CudaAwareStruct
  {
    void bar() const
    {
    }
  };
  
  void test()
  {
    CudaAwareStruct foo;
    
    foo.bar(); //error: a variable declared with an auto type specifier cannot appear in its own initializer
  }

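The code compiles with a plain C++ compiler (in C++17 mode: CudaAwareStruct foo; omits the template argument list, so it relies on class template argument deduction), but nvcc reports the odd error above, apparently because its front end mishandles that deduction.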
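A minimal sketch of a possible workaround, assuming the error is indeed triggered by the deduction: writing the empty template argument list explicitly as CudaAwareStruct<> means no deduction is needed. The names compiledForDevice, isOnDevice and testExplicitDefault are illustrative additions, not part of the original test, and this has not been checked against every nvcc version.

```cpp
// Sketch: same idea as CudaAwareStruct above, but with an explicit "<>" at
// the point of use, so no class template argument deduction takes place.

// Illustrative constant (not in the original test): true only while nvcc
// compiles a device pass, false in the host pass and with plain C++.
#ifdef __CUDA_ARCH__
constexpr bool compiledForDevice = true;
#else
constexpr bool compiledForDevice = false;
#endif

template <bool OnDevice = compiledForDevice>
struct CudaAwareStruct
{
  static constexpr bool isOnDevice = OnDevice;

  void bar() const
  {
  }
};

inline bool testExplicitDefault()
{
  CudaAwareStruct<> foo;  // "<>" instead of relying on deduction
  foo.bar();
  return decltype(foo)::isOnDevice;
}
```

Note that the default argument is still fixed per compilation pass: in the device pass even a host function such as testExplicitDefault sees CudaAwareStruct<true>. The CUDA C++ Programming Guide requires that the signatures of __global__ functions, and the template arguments of kernels launched from the host, do not depend on __CUDA_ARCH__, so a type like this must never reach a kernel signature.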

Another test, using a static constexpr member variable:

#ifdef __CUDA_ARCH__
  struct CudaAwareStruct
  {
    static constexpr bool IsOnDevice=true;
  };
#else
  struct CudaAwareStruct
  {
    static constexpr bool IsOnDevice=false;
  };
#endif
  
  static_assert(not CudaAwareStruct::IsOnDevice,"Host sees the device version");

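The code compiles with a plain C++ compiler, but the assertion fails when compiling with nvcc. This is because nvcc compiles the file more than once: a host pass, where __CUDA_ARCH__ is not defined, and a device pass for each target architecture, where it is. The static_assert is checked in every pass and fails in the device pass, so the message "Host sees the device version" is misleading: it is the device pass that sees the device version. An #ifdef __CUDA_ARCH__ at namespace scope therefore selects by compilation pass, not by the context in which the structure is instantiated.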
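The usual way around this is to move the #ifdef __CUDA_ARCH__ inside the body of a __host__ __device__ function: the host pass compiles the branch that runs on the CPU and the device pass the branch that runs on the GPU, so the answer follows the execution space rather than the pass in which a declaration is parsed. A minimal sketch, where HOST_DEVICE is a hypothetical helper macro that lets the same code build with nvcc and with a plain C++ compiler:

```cpp
// Hypothetical helper macro: __host__ __device__ under nvcc, empty otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

struct CudaAwareStruct
{
  // The #ifdef is resolved separately in each pass, but each branch is
  // only ever executed in the matching execution space.
  HOST_DEVICE static constexpr bool isOnDevice()
  {
#ifdef __CUDA_ARCH__
    return true;   // device pass: this code runs on the GPU
#else
    return false;  // host pass: this code runs on the CPU
#endif
  }
};

// Namespace-scope checks are evaluated in every pass, so a host-only
// assertion must be guarded to keep the device pass from failing.
#ifndef __CUDA_ARCH__
static_assert(!CudaAwareStruct::isOnDevice(), "Host sees the host version");
#endif
```

Called from inside a __global__ or __device__ function, CudaAwareStruct::isOnDevice() yields true instead.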

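Is the copy constructor called when an object is captured by an extended lambda? (Extended lambdas, i.e. __device__ lambdas defined in host code, require nvcc's --extended-lambda flag.)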

template <typename IMin,
	  typename IMax,
	  typename F>
__global__
void cuda_generic_kernel(const IMin min,
			 const IMax max,
			 F f)
{
  const auto i=min+blockIdx.x*blockDim.x+threadIdx.x;
  if(i<max)
    f(i);
}

struct A //: no_copy
{
  A()
  {
  }
  
  A(const A&) = delete;
};

void testNoCopiable()
{
  const int min=0;
  const int max=2;
  const int length=max-min;
  const int nthreads=128;
  
  const dim3 block_dimension(nthreads);
  const dim3 grid_dimension((length+block_dimension.x-1)/block_dimension.x);
  
  A b;
  // The launch configuration is <<<grid, block>>>
  cuda_generic_kernel<<<grid_dimension,block_dimension>>>(min,max,
							  [=] __device__(const int& index) mutable
							  {
							    if(index<max)
							      {
								b; // naming b in the body makes [=] capture it by copy
							      }
							  });
  cudaDeviceSynchronize();
}

We get the error:

error: call to deleted constructor of 'A'
                                                          [=] __device__(const int& index) mutable
                                                           ^
lambda.cpp:158:3: note: 'A' has been explicitly marked deleted here
  A(const A&) = delete;
  ^

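This means that the copy constructor is called: [=] captures b by copy, so the closure object needs A's copy constructor. We therefore need to delete the copy constructor of resource-managing data, so that such implicit copies become compile errors instead of silently duplicating the resource or, for a shallow copy, releasing it twice when both objects are destroyed.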
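One possible pattern, sketched here under our own assumptions rather than taken from the tests above: the owner deletes its copy operations and hands out a small, trivially copyable view holding only a pointer and a size, and the lambda captures the view instead of the owner. OwningBuffer, View and fillWithIndices are hypothetical names. To stay runnable with a plain C++ compiler the sketch allocates host memory with new[]; a device version would instead allocate with cudaMalloc and mark the lambda __device__.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper macro: __host__ __device__ under nvcc, empty otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical resource owner: its destructor frees the buffer, so a
// shallow copy would free it twice. Copying is therefore deleted.
class OwningBuffer
{
public:
  // Non-owning view: trivially copyable, safe to capture by value.
  struct View
  {
    int* data;
    std::size_t size;

    HOST_DEVICE int& operator[](std::size_t i) const
    {
      assert(i < size);  // bounds check in debug builds
      return data[i];
    }
  };

  explicit OwningBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}
  ~OwningBuffer() { delete[] data_; }

  OwningBuffer(const OwningBuffer&) = delete;
  OwningBuffer& operator=(const OwningBuffer&) = delete;

  View view() const { return View{data_, size_}; }
  std::size_t size() const { return size_; }

private:
  std::size_t size_;
  int* data_;
};

static_assert(!std::is_copy_constructible<OwningBuffer>::value,
              "the owner cannot be captured by copy");
static_assert(std::is_trivially_copyable<OwningBuffer::View>::value,
              "the view can be captured by copy");

// Hypothetical usage: [=] copies only the view, never the owner.
inline void fillWithIndices(const OwningBuffer& buffer)
{
  const OwningBuffer::View v = buffer.view();
  auto f = [=](std::size_t i) { v[i] = static_cast<int>(i); };
  for (std::size_t i = 0; i < v.size; ++i)
    f(i);
}
```

Because the view owns nothing, it must not outlive the OwningBuffer it was taken from; for a kernel launch this means the owner has to stay alive until the kernel has finished, e.g. until after cudaDeviceSynchronize().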
