Fat pointer in CUDA: a raw device pointer paired with its length.
Create a copy of a host array on the device.
Create an uninitialized array of n elements (T.sizeof * n bytes) on the device.
Create a fat pointer from a raw pointer and its length.
Destructor; calls cuMemFree to release the device memory.
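
The members listed above can be sketched roughly as follows. This is a minimal C++ illustration under stated assumptions, not the library's implementation: `DevicePtr`, `RawPtr`, the `raw*` helpers, and the `USE_CUDA_DRIVER` macro are hypothetical names chosen for this example. When built with `-DUSE_CUDA_DRIVER`, the helpers call the CUDA driver API (`cuMemAlloc`, `cuMemcpyHtoD`, `cuMemcpyDtoH`, `cuMemFree`) and assume a CUDA context is already current. Without the macro, a host-memory stand-in is used so the sketch runs on a machine with no GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <stdexcept>
#include <type_traits>
#include <utility>
#include <vector>

#ifdef USE_CUDA_DRIVER  // hypothetical opt-in macro; link with -lcuda
#include <cuda.h>
using RawPtr = CUdeviceptr;
inline void check(CUresult r) {
    if (r != CUDA_SUCCESS) throw std::runtime_error("CUDA driver call failed");
}
// Note: cuMemAlloc rejects a request of 0 bytes.
inline RawPtr rawAlloc(std::size_t bytes) { RawPtr p = 0; check(cuMemAlloc(&p, bytes)); return p; }
inline void rawFree(RawPtr p) { cuMemFree(p); }
inline void rawToDevice(RawPtr dst, const void* src, std::size_t bytes) { check(cuMemcpyHtoD(dst, src, bytes)); }
inline void rawToHost(void* dst, RawPtr src, std::size_t bytes) { check(cuMemcpyDtoH(dst, src, bytes)); }
#else  // assumption: host memory stands in for device memory
using RawPtr = void*;
inline RawPtr rawAlloc(std::size_t bytes) {
    RawPtr p = std::malloc(bytes ? bytes : 1);
    if (!p) throw std::bad_alloc();
    return p;
}
inline void rawFree(RawPtr p) { std::free(p); }
inline void rawToDevice(RawPtr dst, const void* src, std::size_t bytes) { if (bytes) std::memcpy(dst, src, bytes); }
inline void rawToHost(void* dst, RawPtr src, std::size_t bytes) { if (bytes) std::memcpy(dst, src, bytes); }
#endif

// Fat pointer: a raw pointer plus the number of T elements it refers to.
template <typename T>
class DevicePtr {
    static_assert(std::is_trivially_copyable<T>::value, "T must be byte-copyable");

public:
    // Create a fat pointer from a raw pointer and its length (takes ownership).
    DevicePtr(RawPtr raw, std::size_t length) : ptr_(raw), length_(length) {
        assert(raw != RawPtr{} || length == 0);
    }

    // Create an uninitialized array of sizeof(T) * n bytes.
    explicit DevicePtr(std::size_t n) : ptr_(rawAlloc(sizeof(T) * n)), length_(n) {}

    // Create a copy of a host array.
    explicit DevicePtr(const std::vector<T>& host) : DevicePtr(host.size()) {
        rawToDevice(ptr_, host.data(), sizeof(T) * length_);
    }

    // Destructor releases the allocation (cuMemFree in the CUDA build).
    ~DevicePtr() {
        if (ptr_ != RawPtr{}) rawFree(ptr_);
    }

    // Non-copyable so the same allocation is never freed twice.
    DevicePtr(const DevicePtr&) = delete;
    DevicePtr& operator=(const DevicePtr&) = delete;

    // Moving transfers ownership and leaves the source empty.
    DevicePtr(DevicePtr&& other) noexcept
        : ptr_(std::exchange(other.ptr_, RawPtr{})),
          length_(std::exchange(other.length_, 0)) {}

    // Copy the contents back into a host vector.
    std::vector<T> toHost() const {
        std::vector<T> out(length_);
        rawToHost(out.data(), ptr_, sizeof(T) * length_);
        return out;
    }

    std::size_t length() const { return length_; }
    RawPtr data() const { return ptr_; }

private:
    RawPtr ptr_;
    std::size_t length_;
};
```

In this sketch, constructing `DevicePtr<float> d(host)` from a `std::vector<float>` copies the data across, `d.toHost()` copies it back, and the allocation is released when `d` goes out of scope. Deleting the copy operations is one way to avoid a double free; the documented type may handle copies differently.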