I have an Arvados cluster running on Kubernetes, and I’m trying to use arvados-cwl-runner
from outside the cluster. However, the Docker image upload fails because the client cannot resolve the keep-store hostnames:
$ arvados-cwl-runner --debug --create-workflow bwa-mem.cwl
INFO /usr/local/bin/arvados-cwl-runner 2.1.0, arvados-python-client 2.1.0, cwltool 3.0.20200807132242
INFO Resolved 'bwa-mem.cwl' to 'file:///home/mludwig/arvados-test/bwa-mem.cwl'
DEBUG Parsed job order from command line: {
"id": "bwa-mem.cwl",
"PL": null,
"group_id": null,
"read_p1": null,
"read_p2": null,
"reference": null,
"sample_id": null
}
INFO Using cluster 3rzp3 (https://10.8.47.219:444)
INFO Uploading Docker image quay.io/biocontainers/bwa:0.7.17--ha92aebf_3
2020-10-26 19:18:36 arvados.arv_put[3387992] INFO: Resuming upload from cache file /home/mludwig/.cache/arvados/arv-put/14d1decaacaf50a6bea32c5847b666b1
0M / 94M 0.0% 2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: {'3rzp3-bi6l4-soaotya77wddkrk': OrderedDict([('href', '/keep_services/3rzp3-bi6l4-soaotya77wddkrk'), ('kind', 'arvados#keepService'), ('etag', 'b0m1700vrmft9o2ctazbs8o33'), ('uuid', '3rzp3-bi6l4-soaotya77wddkrk'), ('owner_uuid', '3rzp3-tpzed-000000000000000'), ('created_at', '2020-10-21T10:22:32.155265000Z'), ('modified_by_client_uuid', '3rzp3-ozdt8-dy6wkmdfw1qsr7g'), ('modified_by_user_uuid', '3rzp3-tpzed-000000000000000'), ('modified_at', '2020-10-21T10:22:32.358468000Z'), ('service_host', 'arvados-keep-store-0.arvados-keep-store'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://arvados-keep-store-0.arvados-keep-store:25107/')]), '3rzp3-bi6l4-t3ed9j6d21r50gc': OrderedDict([('href', '/keep_services/3rzp3-bi6l4-t3ed9j6d21r50gc'), ('kind', 'arvados#keepService'), ('etag', '1pb5lilpukxzjlnl69h305up0'), ('uuid', '3rzp3-bi6l4-t3ed9j6d21r50gc'), ('owner_uuid', '3rzp3-tpzed-000000000000000'), ('created_at', '2020-10-21T10:22:33.118259000Z'), ('modified_by_client_uuid', '3rzp3-ozdt8-dy6wkmdfw1qsr7g'), ('modified_by_user_uuid', '3rzp3-tpzed-000000000000000'), ('modified_at', '2020-10-21T10:22:33.320829000Z'), ('service_host', 'arvados-keep-store-1.arvados-keep-store'), ('service_port', 25107), ('service_ssl_flag', False), ('service_type', 'disk'), ('read_only', False), ('_service_root', 'http://arvados-keep-store-1.arvados-keep-store:25107/')])} (X-Request-Id: req-z1ms5bxcoeon2tmc8kv6)
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: b09fae35d758cd937a8271dc579a7217+31714304: ['http://arvados-keep-store-0.arvados-keep-store:25107/', 'http://arvados-keep-store-1.arvados-keep-store:25107/']
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: Pool max threads is 2
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: Request: PUT http://arvados-keep-store-0.arvados-keep-store:25107/b09fae35d758cd937a8271dc579a7217
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: Request: PUT http://arvados-keep-store-1.arvados-keep-store:25107/b09fae35d758cd937a8271dc579a7217
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: Request fail: PUT http://arvados-keep-store-1.arvados-keep-store:25107/b09fae35d758cd937a8271dc579a7217 => <class 'arvados.errors.HttpError'>: (0, "(6, 'Could not resolve host: arvados-keep-store-1.arvados-keep-store')")
2020-10-26 19:18:36 arvados.keep[3387992] DEBUG: Request fail: PUT http://arvados-keep-store-0.arvados-keep-store:25107/b09fae35d758cd937a8271dc579a7217 => <class 'arvados.errors.HttpError'>: (0, "(6, 'Could not resolve host: arvados-keep-store-0.arvados-keep-store')")
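To confirm that the failure is purely name resolution on the client side, the lookup can be reproduced without Arvados at all. This is a minimal sketch using only the Python standard library. The function name `unresolvable_hosts` and its `resolve` parameter are my own and not part of the Arvados SDK; the URLs are the service roots copied from the debug output above.

```python
import socket
from urllib.parse import urlsplit


def unresolvable_hosts(service_roots, resolve=socket.getaddrinfo):
    """Return the hostnames in service_roots that fail DNS resolution.

    `resolve` defaults to socket.getaddrinfo and can be swapped out
    for testing; it is called as resolve(host, port).
    """
    failed = []
    for root in service_roots:
        parts = urlsplit(root)
        host = parts.hostname
        # Fall back to the scheme's default port if the URL has none.
        port = parts.port or (443 if parts.scheme == "https" else 80)
        try:
            resolve(host, port)
        except socket.gaierror:
            # Same class of failure as curl's "Could not resolve host".
            failed.append(host)
    return failed


# Service roots taken from the arvados.keep DEBUG lines above.
KEEP_ROOTS = [
    "http://arvados-keep-store-0.arvados-keep-store:25107/",
    "http://arvados-keep-store-1.arvados-keep-store:25107/",
]
```

Calling `unresolvable_hosts(KEEP_ROOTS)` on the cwl-runner host should, if my reading of the log is right, list both keep-store names, while the same call from inside a pod should return an empty list.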
DNS resolution for those names works from inside both the api-server and keep-proxy pods:
root@arvados-api-server-fd84b99dc-px94g:/# nslookup arvados-keep-store-0.arvados-keep-store
Server: 169.254.25.10
Address: 169.254.25.10#53
Name: arvados-keep-store-0.arvados-keep-store.arvados-demo.svc.cluster.local
Address: 10.233.79.167
root@arvados-api-server-fd84b99dc-px94g:/# nslookup arvados-keep-store-1.arvados-keep-store
Server: 169.254.25.10
Address: 169.254.25.10#53
Name: arvados-keep-store-1.arvados-keep-store.arvados-demo.svc.cluster.local
Address: 10.233.120.139
root@arvados-keep-proxy-545cd4b664-5x7v5:/# nslookup arvados-keep-store-0.arvados-keep-store
Server: 169.254.25.10
Address: 169.254.25.10#53
Name: arvados-keep-store-0.arvados-keep-store.arvados-demo.svc.cluster.local
Address: 10.233.79.167
root@arvados-keep-proxy-545cd4b664-5x7v5:/# nslookup arvados-keep-store-1.arvados-keep-store
Server: 169.254.25.10
Address: 169.254.25.10#53
Name: arvados-keep-store-1.arvados-keep-store.arvados-demo.svc.cluster.local
Address: 10.233.120.139
The proxy is reachable from the cwl-runner host:
$ nmap 10.8.47.219 -p 25107
Starting Nmap 7.70 ( https://nmap.org ) at 2020-10-26 14:54 EDT
Nmap scan report for 10.8.47.219
Host is up (0.0012s latency).
PORT STATE SERVICE
25107/tcp open unknown
But the keep-store pods are of course not reachable from the cwl-runner host (their names only resolve inside the cluster), which I think is why the upload fails. Is there a way to make arvados-cwl-runner go through the keep-proxy instead?
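The keep service listing in the debug output above can also be checked mechanically for a proxy entry. The sketch below works on plain dictionaries trimmed down from that log. `split_keep_services` and `service_root` are hypothetical helper names, and the assumption that a keep-proxy would be registered with `service_type` set to `'proxy'` is mine; the log itself only shows `'disk'` entries.

```python
def service_root(record):
    """Build the URL a client would use for one keep service record."""
    scheme = "https" if record.get("service_ssl_flag") else "http"
    return f"{scheme}://{record['service_host']}:{record['service_port']}/"


def split_keep_services(records):
    """Separate records into (proxies, stores) by service_type.

    Assumption: proxies carry service_type == 'proxy'; everything
    else (for example 'disk') is treated as a direct keep store.
    """
    proxies = [r for r in records if r.get("service_type") == "proxy"]
    stores = [r for r in records if r.get("service_type") != "proxy"]
    return proxies, stores


# The two records the client received, reduced to the relevant fields.
RECORDS = [
    {
        "uuid": "3rzp3-bi6l4-soaotya77wddkrk",
        "service_host": "arvados-keep-store-0.arvados-keep-store",
        "service_port": 25107,
        "service_ssl_flag": False,
        "service_type": "disk",
    },
    {
        "uuid": "3rzp3-bi6l4-t3ed9j6d21r50gc",
        "service_host": "arvados-keep-store-1.arvados-keep-store",
        "service_port": 25107,
        "service_ssl_flag": False,
        "service_type": "disk",
    },
]

proxies, stores = split_keep_services(RECORDS)
```

With the records from my log, `proxies` comes out empty and `stores` holds the two cluster-internal hosts, so the client is only ever offered addresses it cannot resolve from outside.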
I also tried running the same command from inside the cluster (on the shell-server pod, after installing cwl-runner), which succeeded, so I’m fairly certain the keep-store pods being inaccessible from outside is the problem.
root@arvados-shell-server-8645664676-n4fwl:/home/mludwig# arvados-cwl-runner --create-workflow bwa-mem.cwl
INFO /usr/local/bin/arvados-cwl-runner 2.1.0, arvados-python-client 2.1.0, cwltool 3.0.20200807132242
INFO Resolved 'bwa-mem.cwl' to 'file:///home/mludwig/bwa-mem.cwl'
INFO Using cluster 3rzp3 (https://10.8.47.219:444)
INFO ['docker', 'pull', 'lh3lh3/bwa']
Using default tag: latest
latest: Pulling from lh3lh3/bwa
Image docker.io/lh3lh3/bwa:latest uses outdated schema1 manifest format. Please upgrade to a schema2 image for better future compatibility. More information at https://docs.docker.com/registry/spec/deprecated-schema-v1/
d56ac91634e2: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:ecb80258bdaebe4d42445eb34adea936c929b3a3439bea154a128939f7cce95d
Status: Downloaded newer image for lh3lh3/bwa:latest
docker.io/lh3lh3/bwa:latest
INFO Uploading Docker image lh3lh3/bwa:latest
2020-10-26 22:51:11 arvados.arv_put[4377] INFO: Creating new cache file at /root/.cache/arvados/arv-put/f0bdda26db6887d5ca906aef92ccdac0
1M / 1M 100.0% 2020-10-26 22:51:11 arvados.arv_put[4377] INFO:
2020-10-26 22:51:11 arvados.arv_put[4377] INFO: Collection saved as 'Docker image lh3lh3 bwa:latest sha256:c66bf'
3rzp3-4zz18-9cx44yx9hl4ccr3
2020-10-26 22:51:11 cwltool[4377] INFO: ['docker', 'pull', 'arvados/jobs:2.1.0']
2.1.0: Pulling from arvados/jobs
8559a31e96f4: Pull complete
6880da06a4a9: Pull complete
72c96cad4268: Pull complete
8acf86f98e38: Pull complete
0ce8c1e0dd01: Pull complete
2b381ae22fdd: Pull complete
824ec0548c57: Pull complete
0720cb34bd6e: Pull complete
f0a6d2641296: Pull complete
e928bba34ab6: Pull complete
14a1bd0a41d0: Pull complete
Digest: sha256:33484303914787c57b8796511c9c394926f1986e832f5936a5c99c93661afaf7
Status: Downloaded newer image for arvados/jobs:2.1.0
docker.io/arvados/jobs:2.1.0
2020-10-26 22:51:22 arvados.cwl-runner[4377] INFO: Uploading Docker image arvados/jobs:2.1.0
2020-10-26 22:51:38 arvados.arv_put[4377] INFO: Creating new cache file at /root/.cache/arvados/arv-put/d8b92189bf59bcc03a995eac3c2725f0
239M / 239M 100.0% 2020-10-26 22:51:41 arvados.arv_put[4377] INFO:
2020-10-26 22:51:41 arvados.arv_put[4377] INFO: Collection saved as 'Docker image arvados jobs:2.1.0 sha256:e7866'
3rzp3-4zz18-zas3cq19mkkpwf2
3rzp3-7fd4e-pfbb2hwxokpmsbv
2020-10-26 22:51:42 cwltool[4377] INFO: Final process status is success
I’d rather not run the user shell server inside the cluster: a container isn’t really suited to it (it needs a cron job for the login-sync plus an SSH server), and it’s less secure.