The problem seems to occur only when compiling for 64-bit architectures. I have run into it myself and usually ended up copying the data to the host before doing the reduce operation.
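The host-side workaround runs the reduction on the CPU instead of the GPU. Assuming the library in question is Thrust, this means copying a `thrust::device_vector` into a `thrust::host_vector` (plain assignment performs the copy) and calling `thrust::reduce_by_key` on the host copy. The sketch below reproduces what that call computes with its default `equal_to` predicate and `plus` operator, written in plain C++ so it runs without a GPU. `host_reduce_by_key` is a hypothetical helper for illustration, not part of Thrust or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a library API): mimics thrust::reduce_by_key with
// the default predicate (equal_to) and reduction operator (plus).
// Each run of *consecutive* equal keys collapses into one output key whose
// value is the sum of that run's values. Non-adjacent duplicate keys are NOT
// merged, so sort by key first if you want one output per distinct key.
template <typename K, typename V>
std::pair<std::vector<K>, std::vector<V>>
host_reduce_by_key(const std::vector<K>& keys, const std::vector<V>& values)
{
    // Every key needs a matching value.
    assert(values.size() >= keys.size());

    std::vector<K> out_keys;
    std::vector<V> out_values;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out_keys.empty() && out_keys.back() == keys[i]) {
            // Same key as the previous element: extend the current run.
            out_values.back() += values[i];
        } else {
            // Key changed (or first element): start a new run.
            out_keys.push_back(keys[i]);
            out_values.push_back(values[i]);
        }
    }
    return {out_keys, out_values};
}
```

For example, keys `{1, 1, 2, 2, 2, 1}` with values `{1, 2, 3, 4, 5, 6}` reduce to keys `{1, 2, 1}` and values `{3, 12, 6}`. The trailing `1` forms its own segment because it is not adjacent to the first run. This matches how `reduce_by_key` treats keys.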
The fix is very simple. Open up:
Find the line that states:
and change it to:
Reduce by key should now work on the device.