Instead of using leela-client with lc0 GPU multiplexing, I would recommend rewriting the client so that it detects the number of GPUs and runs a completely separate lc0 process on each GPU.
On a server with 3 RTX GPUs, for example, this could train Leela faster than the current lc0 multiplexing does.
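
A rough sketch of the idea in Python, not a real client. It assumes NVIDIA GPUs with `nvidia-smi` on the PATH, and it pins each process to one GPU through the `CUDA_VISIBLE_DEVICES` environment variable. The function names and the `./client` command are placeholders I made up for illustration; the real per-GPU command would be whatever the client uses to start lc0.

```python
import os
import subprocess
from typing import List, Mapping


def count_gpus(listing: str) -> int:
    """Count GPUs in the text printed by `nvidia-smi -L` (one 'GPU n: ...' line per GPU)."""
    return sum(1 for line in listing.splitlines() if line.startswith("GPU "))


def worker_envs(num_gpus: int, base_env: Mapping[str, str]) -> List[dict]:
    """Build one environment per GPU, each exposing only that GPU to its process."""
    envs = []
    for gpu in range(num_gpus):
        env = dict(base_env)  # copy so the workers do not share state
        env["CUDA_VISIBLE_DEVICES"] = str(gpu)
        envs.append(env)
    return envs


def launch_workers(command: List[str]) -> List[subprocess.Popen]:
    """Start one fully separate process per detected GPU."""
    listing = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    ).stdout
    envs = worker_envs(count_gpus(listing), os.environ)
    return [subprocess.Popen(command, env=env) for env in envs]


if __name__ == "__main__":
    # "./client" is a hypothetical placeholder for the real per-GPU command.
    workers = launch_workers(["./client"])
    for worker in workers:
        worker.wait()
```

With this layout each process only ever sees a single GPU, so no multiplexing is needed inside lc0 itself. Whether it is actually faster would have to be measured.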
What do you think?
-Technologov, 7.2.2019.