Contributor

@solin319 solin319 commented Apr 23, 2018

Bind every node's listening port to 0.0.0.0.
This lets all nodes use the cluster IP in Kubernetes.
When launching distributed jobs with Kubernetes, DMLC_NODE_HOST and DMLC_PS_ROOT_URI can then be set to the cluster IP.
@mli

@sswv

sswv commented Apr 23, 2018

In my Kubernetes-based MXNet cluster, I ran into the same problem with the cluster IP. A cluster IP can be reached through virtual routing, but a socket cannot bind to it. So it does not work well to use DMLC_NODE_HOST both for binding on the server and for connecting from the client. I made some hacky modifications in my MXNet cluster to work around it.

I think it is necessary to distinguish "the IP for socket binding at the server" from "the IP for access by the client". Simply using 0.0.0.0 when binding the socket at the server is a good idea. TensorFlow does the same in its GrpcServer.
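To illustrate the split between the bind address and the access address, here is a minimal Python sketch. It is not the code from this pull request: the helper names `start_listener` and `advertised_address` are hypothetical, and the loopback fallback is only an assumption so the example runs on a single machine. The listener binds to 0.0.0.0, while the address given to clients is read from DMLC_NODE_HOST.

```python
import os
import socket


def start_listener(port=0):
    """Bind a TCP listener on all interfaces (hypothetical helper).

    Binding to 0.0.0.0 works even when the address that clients use
    (for example a Kubernetes cluster IP) is not assigned to any
    local interface. Port 0 asks the OS for a free port.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def advertised_address(port, default_host="127.0.0.1"):
    """Return the (host, port) that clients should connect to.

    The host comes from DMLC_NODE_HOST; the loopback default is an
    assumption for local testing only.
    """
    return os.environ.get("DMLC_NODE_HOST", default_host), port


if __name__ == "__main__":
    srv, port = start_listener()
    print("listening on", srv.getsockname())
    print("clients should connect to", advertised_address(port))
    srv.close()
```

With this separation, DMLC_NODE_HOST only needs to be routable from the clients; it never has to be an address the server process can bind to.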

@coldsheephot

I think this is necessary when using k8s to create an MXNet distributed job.
