Add an overload that takes a stream to use when the data is fetched. Any data movement in the call would use that stream. This may make it easier for users to integrate the call into their own processing. An explicit stream can be required in a couple of scenarios: streams are tied to the thread and GPU on which they were created, so changing either one requires using a different stream.
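A minimal C++ sketch of the requested overload pattern follows. All names in it (`device_data`, `fetch`, `stream`, `default_stream`) are hypothetical and do not come from any existing library. `stream` stands in for a CUDA stream handle such as `cudaStream_t`, and the device-to-host copy is simulated on the host so the sketch runs without a GPU. It shows the new overload taking the stream explicitly, and the existing stream-less overload forwarding to a default stream so current callers are unaffected.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle (e.g. cudaStream_t).
// It only records the operations enqueued on it, so the sketch runs without a GPU.
struct stream {
  std::string name;
  std::vector<std::string> log;  // names of operations enqueued on this stream
};

// Hypothetical default stream used by the existing, stream-less overload.
stream& default_stream() {
  static stream s{"default", {}};
  return s;
}

// Hypothetical container whose data conceptually lives on the device.
class device_data {
 public:
  explicit device_data(std::vector<int> values) : values_(std::move(values)) {}

  // New overload: every data movement performed by this call is enqueued on `s`.
  std::vector<int> fetch(stream& s) const {
    std::vector<int> host(values_.size());
    // Stand-in for something like
    //   cudaMemcpyAsync(host.data(), dev_ptr, bytes, cudaMemcpyDeviceToHost, s);
    // A real implementation would also synchronize `s` (e.g. cudaStreamSynchronize)
    // before the host buffer is read.
    std::copy(values_.begin(), values_.end(), host.begin());
    s.log.push_back("copy_to_host");
    return host;
  }

  // Existing overload keeps its behavior by forwarding to the default stream.
  std::vector<int> fetch() const { return fetch(default_stream()); }

 private:
  std::vector<int> values_;
};
```

Keeping the stream-less overload as a thin forwarder means existing code keeps compiling and behaving the same, while a caller running on a different thread or GPU can pass a stream it created there.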