Source for the fix

The problem seems to occur only when compiling for 64-bit architectures. I have run into it myself and usually ended up copying the data to the host before doing the reduce operation.

The fix is very simple. Open:

thrust/detail/backend/cuda/reduce_by_key.inl

Find the line that reads:

typedef typename thrust::iterator_traits<InputIterator1>::difference_type IndexType;

and change it to:

typedef unsigned int IndexType;

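reduce_by_key should now work on the device. Note that this hard-codes a 32-bit unsigned index type, so it assumes the input has fewer than 2^32 elements.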