Debugging Arvados deployed with Salt

As discussed on Gitter I’m having some trouble getting workflows to run from the Workbench. I have a crunch-dispatch-local.service as follows:

########################################################################
# File managed by Salt at <salt://arvados/dispatcher/service/files/default/crunch-dispatch-local-service.tmpl>.
# Your changes will be overwritten.
########################################################################
[Unit]
Description=Arvados Crunch Dispatcher for LOCAL service
Documentation=https://doc.arvados.org/
After=network.target

# systemd==229 (ubuntu:xenial) obeys StartLimitInterval in the [Unit] section
StartLimitInterval=0

# systemd>=230 (debian:9) obeys StartLimitIntervalSec in the [Unit] section
StartLimitIntervalSec=0

[Service]
Type=simple
EnvironmentFile=-/etc/arvados/environment
ExecStart=/usr/bin/crunch-dispatch-local -poll-interval=1 -crunch-run-command=/usr/bin/crunch-run
# Set a reasonable default for the open file limit
LimitNOFILE=65536
Restart=always
RestartSec=1
LimitNOFILE=1000000

# systemd<=219 (centos:7, debian:8, ubuntu:trusty) obeys StartLimitInterval in the [Service] section
StartLimitInterval=0

[Install]
WantedBy=multi-user.target

and /etc/arvados/environment as follows:

ARVADOS_API_HOST=sanbi.arvados.sanbi.ac.za
ARVADOS_API_HOST_INSECURE=1
ARVADOS_API_TOKEN=changeme_system_root_token 

and the error that keeps cycling in my syslog is:

Nov 18 18:24:06 arvados systemd[1]: Started Arvados Crunch Dispatcher for LOCAL service.
Nov 18 18:24:06 arvados crunch-dispatch-local[9796]: {"level":"info","msg":"crunch-dispatch-local 2.1.0 started","time":"2020-11-18T18:24:06.668847709Z"}
Nov 18 18:24:06 arvados arvados-controller[3028]: {"PID":3028,"RequestID":"req-4khgiez4l7ozjrnwoeef","level":"info","msg":"request","remoteAddr":"127.0.0.1:45358","reqBytes":0,"reqForwardedFor":"127.0.0.1","reqHost":"sanbi.arvados.sanbi.ac.za","reqMethod":"GET","reqPath":"arvados/v1/api_client_authorizations/current","reqQuery":"","time":"2020-11-18T18:24:06.678752178Z"}
Nov 18 18:24:06 arvados arvados-controller[3028]: {"PID":3028,"RequestID":"req-4khgiez4l7ozjrnwoeef","level":"info","msg":"response","remoteAddr":"127.0.0.1:45358","reqBytes":0,"reqForwardedFor":"127.0.0.1","reqHost":"sanbi.arvados.sanbi.ac.za","reqMethod":"GET","reqPath":"arvados/v1/api_client_authorizations/current","reqQuery":"","respBody":"{\"errors\":[\"Not logged in (req-4khgiez4l7ozjrnwoeef)\"],\"error_token\":\"1605723846+4bd3eab8\"}","respBytes":91,"respStatus":"Unauthorized","respStatusCode":401,"time":"2020-11-18T18:24:06.693454796Z","timeToStatus":0.014530,"timeTotal":0.014659,"timeWriteBody":0.000129}
Nov 18 18:24:06 arvados crunch-dispatch-local[9796]: {"level":"fatal","msg":"\"error getting my token UUID: arvados API server error: Not logged in (req-4khgiez4l7ozjrnwoeef) (401: 401 Unauthorized) returned by sanbi.arvados.sanbi.ac.za\"","time":"2020-11-18T18:24:06.694966035Z"}
Nov 18 18:24:06 arvados systemd[1]: crunch-dispatch-local.service: Main process exited, code=exited, status=1/FAILURE
Nov 18 18:24:06 arvados systemd[1]: crunch-dispatch-local.service: Failed with result 'exit-code'.
Nov 18 18:24:07 arvados systemd[1]: crunch-dispatch-local.service: Service hold-off time over, scheduling restart.
Nov 18 18:24:07 arvados systemd[1]: crunch-dispatch-local.service: Scheduled restart job, restart counter is at 25.
Nov 18 18:24:07 arvados systemd[1]: Stopped Arvados Crunch Dispatcher for LOCAL service.
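
For anyone following along: the fatal entry is easier to spot if you parse the JSON payloads out of the syslog lines. Here is a minimal Python sketch, assuming each line looks like the ones above (a syslog prefix followed by at most one JSON object). The function name `fatal_messages` is made up for illustration, and the sample message is abbreviated from the log above.

```python
import json


def fatal_messages(lines):
    """Return the "msg" field of every JSON log entry whose level is "fatal"."""
    found = []
    for line in lines:
        start = line.find("{")  # the JSON payload follows the syslog prefix
        if start == -1:
            continue  # plain systemd lines carry no JSON payload
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # something brace-like that is not valid JSON
        if isinstance(entry, dict) and entry.get("level") == "fatal":
            found.append(entry.get("msg", ""))
    return found


# Sample lines copied (and, for the fatal one, abbreviated) from the log above.
sample = [
    "Nov 18 18:24:06 arvados systemd[1]: Started Arvados Crunch Dispatcher for LOCAL service.",
    'Nov 18 18:24:06 arvados crunch-dispatch-local[9796]: {"level":"info","msg":"crunch-dispatch-local 2.1.0 started","time":"2020-11-18T18:24:06.668847709Z"}',
    'Nov 18 18:24:06 arvados crunch-dispatch-local[9796]: {"level":"fatal","msg":"error getting my token UUID (abbreviated)","time":"2020-11-18T18:24:06.694966035Z"}',
]

for msg in fatal_messages(sample):
    print(msg)
```

In practice you would feed it the lines of your syslog file instead of `sample`.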

btw the workflow that I am testing is https://github.com/pvanheus/lukasa/blob/main/protein_evidence_mapping.cwl with appropriate inputs…

ARVADOS_API_TOKEN=changeme_system_root_token 

@pvanheus, after some research, it looks like you might have discovered a bug. Please try a 41-character, alphanumeric-only system root token (e.g. CvlebKKXnykZwE3f3n7NdMRQI2iSjlobf87EUebpW)
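
If it helps, here is one way to generate such a token. This is only a sketch using Python's standard `secrets` module. The 41-character, letters-and-digits rule simply mirrors the suggestion above and is not a documented limit, and the helper names are made up for illustration.

```python
import secrets
import string

# ASCII letters and digits only: no underscores or other punctuation.
ALPHABET = string.ascii_letters + string.digits


def new_token(length=41):
    """Return a random token drawn only from ASCII letters and digits."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def matches_suggestion(token, length=41):
    """Check a token against the suggestion above (an assumption, not a spec)."""
    return len(token) == length and all(c in ALPHABET for c in token)


if __name__ == "__main__":
    print(new_token())
    # The placeholder from the environment file contains underscores:
    print(matches_suggestion("changeme_system_root_token"))
```

The placeholder `changeme_system_root_token` fails this check on both counts: it has underscores and is shorter than 41 characters.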

pvanheus: I uploaded and merged a few changes to the provision.sh script and salt formula:

I hope you can give it a try. Let me know if you have further issues.

Hi! Salt deploy is now working.

The SSL certificates in use by the services are signed by an issuer identified as ArvadosFormula. Is the cert for this CA available on the server somewhere so that I can import it into my browser?

Thanks,
Peter

Hi @pvanheus. The cert is a classic self-signed certificate generated with a salt script.

If I’m not wrong, in most browsers, when you get a warning about the certificate being self-signed, you go to ‘advanced / accept certificate’ and that would be it. The browser will remember your choice and won’t ask you again for the site.

Hi! Unfortunately that is not quite enough. The workbench talks to keep, and even after accepting the certificate in Firefox, uploads will fail because the certificate for the keep server is not accepted. My workaround for that has been to examine the Firefox console at upload time, find the URL which Firefox refuses to connect to (the keep one) and open that in the browser so that an exception can be made. Then everything works.

Peter

Does it work if you go (in Firefox) to Preferences -> Security -> View Certificates -> Authorities -> Import and add the self-signed certificate file?

If not, we might have to split it into a private root certificate (that you can install in the browser) and one or more service certificates signed by the root.

@Javier_Bertoli The reason why he can’t just accept the certificate in Firefox is that it doesn’t give you a popup with the option to accept/override when the browser accesses the URL from an AJAX call or a redirect. In those cases it just silently fails, and you never get the opportunity to accept the certificate. The only way I’ve found that works reliably is to install the private root certificate.

I updated the arvados-formula's helper scripts to add a self-signed CA and create the certificates with it, as @tetron suggested, to fix the AJAX issue.

I also updated the provision.sh script and documentation to leave the CA’s cert in the script dir so you can add it to your workstation easily.

Pushed a branch (https://github.com/arvados/arvados/tree/17177-use-newly-created-ca) which is still pending review and merging, but you can try it if you wish.
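
Once you have the CA cert on your workstation, you can also check from outside the browser that each service presents a certificate signed by it. Below is a rough Python sketch using only the standard `ssl` and `socket` modules. The CA file name and the keep host name in the usage comment are placeholders, not names taken from the formula.

```python
import socket
import ssl


def make_context(ca_file=None):
    """Build a verifying TLS context, optionally trusting one extra CA file."""
    ctx = ssl.create_default_context()
    if ca_file is not None:
        ctx.load_verify_locations(cafile=ca_file)
    return ctx


def verifies(host, port, ctx, timeout=5.0):
    """Return True if the TLS handshake with host:port passes verification."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return True
    except OSError:
        # Covers connection failures and ssl.SSLError (a subclass of OSError),
        # including certificate verification errors.
        return False


# Example usage (placeholder names, adjust to your deployment):
#   ctx = make_context("arvados-ca.pem")
#   for host in ("sanbi.arvados.sanbi.ac.za", "keep.example.invalid"):
#       print(host, verifies(host, 443, ctx))
```

Any host that prints `False` here is a likely candidate for the silent AJAX failures described above.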