[feature-request] Please improve multi-GPU training by running a separate process per GPU

Instead of using the Leela client with lc0's GPU multiplexing, I recommend rewriting the client so that it detects the number of GPUs and runs a completely separate lc0 process on each GPU.

For example, a server with 3 RTX GPUs could contribute to Leela's training faster this way than with the current lc0 multiplexing.
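A minimal sketch of the idea, assuming NVIDIA GPUs with `nvidia-smi` on the PATH, and assuming that pinning each process to one device through the CUDA `CUDA_VISIBLE_DEVICES` environment variable is enough to keep the processes independent. The function names and the worker command are hypothetical, not part of the existing client:

```python
import os
import subprocess


def count_gpus(listing: str) -> int:
    """Count devices in `nvidia-smi -L` output, which prints one 'GPU N: ...' line per GPU."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def detect_gpus() -> int:
    """Ask nvidia-smi how many GPUs are present; return 0 if it is missing or fails."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return 0
    return count_gpus(result.stdout)


def per_gpu_env(gpu_index: int, base=None) -> dict:
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base is None else base)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch_workers(command: list, num_gpus: int) -> list:
    """Start one fully independent process per GPU, each seeing only its own device."""
    return [subprocess.Popen(command, env=per_gpu_env(i)) for i in range(num_gpus)]
```

Usage would look like `procs = launch_workers(["./client"], detect_gpus())` followed by `p.wait()` for each process, where `./client` is a hypothetical placeholder for the real client command. Because every process sees exactly one GPU, no multiplexing is needed inside any single lc0 instance.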

What do you think?

-Technologov, 7.2.2019.