used when loc == CTC_CPU: the maximum number of threads the computation may use
used when loc == CTC_GPU: the stream in which the kernels are launched
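
The two descriptions above imply that only one of the two settings is meaningful at a time, selected by `loc`. A minimal sketch of that pattern follows. It is a stand-in, not the library's actual declaration: the type name `example_options`, the member names `num_threads` and `stream`, the helper functions, and the use of an opaque `void *` in place of a real GPU stream type are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the compute-location values named in the text. */
typedef enum { CTC_CPU = 0, CTC_GPU = 1 } compute_location;

/* Hypothetical options record: only the union member matching `loc` is valid. */
typedef struct {
    compute_location loc;
    union {
        unsigned int num_threads; /* read only when loc == CTC_CPU */
        void *stream;             /* read only when loc == CTC_GPU (opaque handle) */
    } u;
} example_options;

/* Build options for a CPU run limited to `max_threads` threads. */
static example_options make_cpu_options(unsigned int max_threads) {
    example_options o;
    o.loc = CTC_CPU;
    o.u.num_threads = max_threads;
    return o;
}

/* Build options for a GPU run that launches kernels in `stream`. */
static example_options make_gpu_options(void *stream) {
    example_options o;
    o.loc = CTC_GPU;
    o.u.stream = stream;
    return o;
}

/* Return the CPU thread limit, or 0 when the options target the GPU. */
static unsigned int cpu_thread_limit(const example_options *o) {
    assert(o != NULL);
    return o->loc == CTC_CPU ? o->u.num_threads : 0u;
}

/* Return the GPU stream handle, or NULL when the options target the CPU. */
static void *gpu_stream(const example_options *o) {
    assert(o != NULL);
    return o->loc == CTC_GPU ? o->u.stream : NULL;
}
```

The accessors check `loc` before reading the union, so a caller never interprets a thread count as a stream handle or the reverse.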
See Implementation