If you wish to run interactively but are constrained by the limits on CPUs, CPU time or memory, you may run a small interactive job requesting the resources you need.
By doing that, you will get a dedicated allocation of CPUs and memory to run your application interactively. There are several ways to do this, depending on your use case:
If you have a single script or command you wish to run interactively, one way to do this through the batch system is with a direct call to srun from within a session on the login node. It feels as if you were running locally, but the command is actually executed in a job with dedicated resources:
```
$ cat myscript.sh
#!/bin/bash
echo "This is my super script"
echo "Doing some heavy work on $HOSTNAME..."

$ ./myscript.sh
This is my super script
Doing some heavy work on at1-11...

$ srun ./myscript.sh
This is my super script
Doing some heavy work on at1-105...
```
In that example the submitted job runs with the default settings (default QoS, just 1 CPU and default memory). You can of course pass additional options to srun to customise the resources allocated to this interactive job. For example, to run with 4 CPUs, 12 GB of memory and a limit of 6 hours:
```
$ srun -c 4 --mem=12G -t 06:00:00 ./myscript.sh
```
Check man srun for a complete list of options.
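If you want a full interactive shell through srun rather than running a single script, one option is to request a pseudo-terminal. This is a minimal sketch using standard Slurm options, not the ecinteractive tool described below:

```
# Ask Slurm for a pseudo-terminal so the shell behaves interactively;
# the resource flags are the same ones used in the example above.
$ srun -c 4 --mem=12G -t 06:00:00 --pty /bin/bash
```

Exiting that shell ends the job and releases the allocation.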
However, you may want a more convenient way to set up and manage your interactive sessions. To facilitate that task, we provide the ecinteractive tool. Its main features are the following:
```
$ ecinteractive -h
Usage :  /usr/local/bin/ecinteractive [options] [--]

    -d|desktop     Submits a vnc job (default is interactive ssh job)
    -j|jupyter     Submits a jupyter job (default is interactive ssh job)

    More Options:
    -h|help        Display this message
    -v|version     Display script version
    -p|platform    Platform (default aa. Choices: aa, ab, ac, ad)
    -u|user        ECMWF User (default user)
    -A|account     Project account
    -c|cpus        Number of CPUs (default 2)
    -m|memory      Requested Memory (default 8G)
    -s|tmpdirsize  Requested TMPDIR size (default 3 GB)
    -t|time        Wall clock limit (default 06:00:00)
    -k|kill        Cancel any running interactive job
    -q|query       Check running job
    -o|output      Output file for the interactive job (default /dev/null)
    -x             set -x
```
You can get an interactive shell running on an allocated node within the Atos HPCF by just calling ecinteractive. With no options, it will use the default settings, which are:
| Resource | Default value |
|---|---|
| Cpus | 2 |
| Memory | 8 GB |
| Time | 6 hours |
| TMPDIR size | 3 GB |
If you need more resources, you may use the ecinteractive options when creating the job. For example, to get a shell with 4 CPUs and 16 GB of memory for 12 hours:
```
[user@aa6-100 ~]$ ecinteractive -c4 -m 16G -t 12:00:00
Submitted batch job 10225018
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225018  RUNNING  aa6-104    12:00:00    11:59:55   4         16G         ssdtmp:3G
To cancel the job:
    /usr/local/bin/ecinteractive -k
Last login: Mon Dec 13 09:39:09 2021
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_093914.794, PID: 1736962, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/8/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left: 11:59:54
[user@aa6-104 ~]$
```
If you log out, the job continues to run until it is explicitly cancelled or reaches its time limit.
Once you have an interactive job running, you may reattach to it, or open several shells within that job, by calling ecinteractive again.
If you already have a job running, ecinteractive will always attach you to that one regardless of the resource options you pass. If you wish to run a job with different settings, you will have to cancel the existing one first (see the sketch after the example below).
```
[user@aa6-100 ~]$ ecinteractive
Using interactive job:
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225018  RUNNING  aa6-104    12:00:00    11:57:56   4         16G         ssdtmp:3G
WARNING: Your existing job 10225018 may have a different setup than requested. Cancel the existing job and rerun if you with to run with different setup
To cancel the job:
    /usr/local/bin/ecinteractive -k
Last login: Mon Dec 13 09:39:14 2021 from aa6-100.bullx
[ECMWF-INFO-z_ecmwf_local.sh] /usr/bin/bash INTERACTIVE on aa6-104 at 20211213_094114.197, PID: 1742608, JOBID: 10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCH=/ec/res4/scratch/user
[ECMWF-INFO-z_ecmwf_local.sh] $PERM=/ec/res4/perm/user
[ECMWF-INFO-z_ecmwf_local.sh] $HPCPERM=/ec/res4/hpcperm/user
[ECMWF-INFO-z_ecmwf_local.sh] $TMPDIR=/etc/ecmwf/ssd/ssd1/tmpdirs/user.10225018
[ECMWF-INFO-z_ecmwf_local.sh] $SCRATCHDIR=/ec/res4/scratchdir/user/8/10225018
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_TMPDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] $_EC_ORIG_SCRATCHDIR=N/A
[ECMWF-INFO-z_ecmwf_local.sh] Job 10225018 time left: 11:57:54
[user@aa6-104 ~]$
```
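If you do need a different setup, the workflow is to cancel the existing job and submit a new one. As a sketch, using only the options documented in the help output above (the resource values here are purely illustrative):

```
# Cancel the existing interactive job first
[user@aa6-100 ~]$ ecinteractive -k

# Then submit a new one with the desired setup (illustrative values)
[user@aa6-100 ~]$ ecinteractive -c 8 -m 32G -t 12:00:00
```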
You may query ecinteractive for existing interactive jobs, from within or outside the job. This may be useful to see how much time is left:
```
[user@aa6-100 ~]$ ecinteractive -q
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225018  RUNNING  aa6-104    12:00:00    11:55:40   4         16G         ssdtmp:3G
```
Logging out of the interactive shells spawned through ecinteractive will not cancel the job. Once you have finished working with it, you should cancel it with:
```
[user@aa6-100 ~]$ ecinteractive -k
cancelling job 10225018...
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225018  RUNNING  aa6-104    12:00:00    11:55:34   4         16G         ssdtmp:3G
Cancel job_id=10225018 name=user-ecinteractive partition=inter [y/n]? y
Connection to aa-login closed.
```
If you need to run graphical applications, you can do so through standard X11 forwarding.
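For instance, assuming you enabled X11 forwarding when connecting to the Atos HPCF and that the forwarding is propagated into the interactive session, something along these lines should display a window back on your local screen (the hostnames are illustrative):

```
# On your local machine: connect with X11 forwarding enabled (hostname illustrative)
local$ ssh -X user@hpc-login

# On the login node: start the interactive job, then run a graphical application
[user@aa6-100 ~]$ ecinteractive
[user@aa6-104 ~]$ xclock    # the window should appear on your local display
```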
Alternatively, you may use ecinteractive to open a basic window manager running on the allocated interactive node. It will open a VNC client on your end that connects to the desktop running on that node:
```
[user@aa6-100 ~]$ ecinteractive -d
Submitted batch job 10225277
Waiting 5 seconds for the job to be ready...
Using interactive job:
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225277  RUNNING  aa6-104    6:00:00     5:59:55    2         8G          ssdtmp:3G
To cancel the job:
    /usr/local/bin/ecinteractive -k
Attaching to vnc session...
To manually re-attach: vncviewer -passwd ~/.vnc/passwd aa6-104:9598
```
You can also use ecinteractive to open up a Jupyter Lab instance very easily:
```
[user@aa6-100 ~]$ ./ecinteractive -j
Using interactive job:
CLUSTER  JOBID     STATE    EXEC_HOST  TIME_LIMIT  TIME_LEFT  MAX_CPUS  MIN_MEMORY  TRES_PER_NODE
aa       10225277  RUNNING  aa6-104    6:00:00     5:58:07    2         8G          ssdtmp:3G
To cancel the job:
    ./ecinteractive -k
Attaching to Jupyterlab session...
To manually re-attach go to http://aa6-104.ecmwf.int:33698/?token=b1624da17308654986b1fd66ef82b9274401ea8982f3b747
```