Running MPI on a LAN cluster with different usernames









I have two machines with different usernames: assume user1@master and user2@slave. I would like to run an MPI job across the two machines, but I have been unsuccessful so far. I have set up passwordless SSH between the two machines. Both machines have the same version of Open MPI, and PATH and LD_LIBRARY_PATH are set correspondingly on both.



Open MPI is installed under /home/$USER/.openmpi on each machine, and the program I want to run is inside ~/folder.
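For reference, the exports on each machine would look roughly like the sketch below (exactly which startup file they live in is an assumption; one known pitfall is that the stock Debian/Ubuntu ~/.bashrc returns early for non-interactive shells, so exports placed below that guard are never seen by the ssh session mpirun opens):

# Sketch: put these near the top of each user's shell startup file
# so they also take effect for non-interactive ssh sessions.
export PATH="$HOME/.openmpi/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/.openmpi/lib:$LD_LIBRARY_PATH"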



My /etc/hosts file on both machines:



master x.x.x.110
slave x.x.x.111
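(For reference, /etc/hosts entries are conventionally written with the IP address first; a sketch of the equivalent entries, keeping the x.x.x placeholders:)

x.x.x.110 master
x.x.x.111 slave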


My ~/.ssh/config file on user1@master:



Host slave
User user2
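(A slightly fuller ~/.ssh/config sketch for comparison; HostName and IdentityFile here are assumptions and would need to match the real address and key file:)

Host slave
    HostName slave
    User user2
    IdentityFile ~/.ssh/id_rsa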


I then run the following command on user1@master from inside ~/folder:



$ mpiexec -n 1 ./program : -np 1 -host slave -wdir /home/user2/folder ./program


I get the following error:



bash: orted: command not found
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
one or more nodes. Please check your PATH and LD_LIBRARY_PATH
settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
(e.g., on Cray). Please check your configure cmd line and consider using
one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
lack of common network interfaces and/or no route found between
them. Please check network connectivity (including firewalls
and network routing requirements).
--------------------------------------------------------------------------
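(For what it's worth, the effect of --enable-orterun-prefix-by-default can also be approximated per run with mpirun's --prefix option, or by invoking mpirun through its absolute path; a sketch follows, with the caveat that --prefix assumes the same install prefix on every node, which is not true here since the two home directories differ:)

$ $HOME/.openmpi/bin/mpirun --prefix $HOME/.openmpi -n 1 ./program : -n 1 -host slave -wdir /home/user2/folder ./program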


Edits



If I use a hostfile with contents:



localhost
user2@slave


along with the --mca argument, I get the following error:



$ mpirun --mca plm_base_verbose 10 -n 5 --hostfile hosts.txt ./program
[user:29277] mca: base: components_register: registering framework plm components
[user:29277] mca: base: components_register: found loaded component slurm
[user:29277] mca: base: components_register: component slurm register function successful
[user:29277] mca: base: components_register: found loaded component isolated
[user:29277] mca: base: components_register: component isolated has no register or open function
[user:29277] mca: base: components_register: found loaded component rsh
[user:29277] mca: base: components_register: component rsh register function successful
[user:29277] mca: base: components_open: opening plm components
[user:29277] mca: base: components_open: found loaded component slurm
[user:29277] mca: base: components_open: component slurm open function successful
[user:29277] mca: base: components_open: found loaded component isolated
[user:29277] mca: base: components_open: component isolated open function successful
[user:29277] mca: base: components_open: found loaded component rsh
[user:29277] mca: base: components_open: component rsh open function successful
[user:29277] mca:base:select: Auto-selecting plm components
[user:29277] mca:base:select:( plm) Querying component [slurm]
[user:29277] mca:base:select:( plm) Querying component [isolated]
[user:29277] mca:base:select:( plm) Query of component [isolated] set priority to 0
[user:29277] mca:base:select:( plm) Querying component [rsh]
[user:29277] mca:base:select:( plm) Query of component [rsh] set priority to 10
[user:29277] mca:base:select:( plm) Selected component [rsh]
[user:29277] mca: base: close: component slurm closed
[user:29277] mca: base: close: unloading component slurm
[user:29277] mca: base: close: component isolated closed
[user:29277] mca: base: close: unloading component isolated
[user:29277] *** Process received signal ***
[user:29277] Signal: Segmentation fault (11)
[user:29277] Signal code: (128)
[user:29277] Failing at address: (nil)
[user:29277] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f4226242f20]
[user:29277] [ 1] /lib/x86_64-linux-gnu/libc.so.6(__libc_malloc+0x197)[0x7f422629b207]
[user:29277] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__nss_lookup_function+0x10a)[0x7f422634d06a]
[user:29277] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__nss_lookup+0x3d)[0x7f422634d19d]
[user:29277] [ 4] /lib/x86_64-linux-gnu/libc.so.6(getpwuid_r+0x2f3)[0x7f42262e7ee3]
[user:29277] [ 5] /lib/x86_64-linux-gnu/libc.so.6(getpwuid+0x98)[0x7f42262e7498]
[user:29277] [ 6] /home/.openmpi/lib/openmpi/mca_plm_rsh.so(+0x477d)[0x7f422356977d]
[user:29277] [ 7] /home/.openmpi/lib/openmpi/mca_plm_rsh.so(+0x67a7)[0x7f422356b7a7]
[user:29277] [ 8] /home/.openmpi/lib/libopen-pal.so.40(opal_libevent2022_event_base_loop+0xdc9)[0x7f4226675749]
[user:29277] [ 9] mpirun(+0x1262)[0x563fde915262]
[user:29277] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f4226225b97]
[user:29277] [11] mpirun(+0xe7a)[0x563fde914e7a]
[user:29277] *** End of error message ***
Segmentation fault (core dumped)


I do not get any ssh/orted info as asked, but maybe that is because I am mistyping the --mca command?
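(A diagnostic sketch, independent of the thread above: checking what mpirun's non-interactive ssh session actually sees on the slave, assuming the same ~/.openmpi layout exists there. If orted does not show up in the output, the remote PATH is the problem.)

$ ssh slave 'echo $PATH; which orted; ls -l ~/.openmpi/bin/orted'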










  • It seems you are mixing mpiexec syntax (e.g. -n 1) with mpirun syntax (e.g. -np 1). Try using the full path to mpirun (e.g. /opt/openmpi-3.1.3/bin/mpirun) and see whether that helps. Another option is to configure with --enable-mpirun-prefix-by-default and rebuild/install Open MPI.
    – Gilles Gouaillardet
    Nov 10 at 16:46










  • @GillesGouaillardet I tried what you suggested, but it did not solve the problem. To try to work around it, I reinstalled Open MPI on both machines in the same directory (/home/.openmpi) and set up an NFS-shared folder with the code. This did not change the error output; I get exactly the same thing as before.
    – John.Ludlum
    Nov 11 at 8:00










  • Did you configure with --enable-mpirun-prefix-by-default? If you run mpirun --mca plm_base_verbose 10 ..., it should print the ssh ... orted ... command line that is run under the hood, and you can try running it manually (it could be a permission issue on the Open MPI libs). Have you tried a hostfile with the user@host syntax instead of tweaking your .ssh/config?
    – Gilles Gouaillardet
    Nov 11 at 20:02










  • @GillesGouaillardet Yes, I tried both --enable-mpirun-prefix-by-default and --enable-orterun-prefix-by-default (as suggested in the error message) with no change in the output. For the --mca command, see the edits. I tried using a hostfile too. Thanks for all your suggestions! The problem still exists, but at least I'm learning quite a bit!
    – John.Ludlum
    Nov 13 at 7:23










  • This is the log of an mpirun crash, and that should never happen! Can you run mpirun --mca plm_base_verbose 10 ... without the hostfile (relying on your ssh config instead)?
    – Gilles Gouaillardet
    Nov 13 at 21:21














