Arvados 2.6.3 released

The Arvados team is pleased to announce Arvados 2.6.3. This release includes a number of important bugfixes and performance improvements, particularly when Crunch is running a large number of containers. We recommend that new installations use 2.6.3 and that existing installations running 2.6.2 or earlier upgrade to it. See Upgrading Arvados for upgrade instructions. For a full list of features and improvements introduced in the 2.6 Arvados release, see the 2.6.0 release notes.

Crunch

crunch-run now retries all of its Arvados API calls, waiting up to ten minutes in the worst case, to provide improved resiliency when the API server is busy or unresponsive. #20540
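To illustrate the idea of retrying within a fixed time budget, here is a minimal Python sketch. It is not crunch-run's implementation (crunch-run is written in Go). The function name, the one-second starting delay, the doubling factor, and the choice to retry only on ConnectionError are all assumptions made for the example. Only the ten-minute ceiling comes from the note above.

```python
import time


def call_with_retries(request, max_total_wait=600.0, first_delay=1.0,
                      growth=2.0, sleep=time.sleep, clock=time.monotonic):
    """Call request() until it succeeds or the time budget is used up.

    Hypothetical helper: the delay schedule is an assumption, not
    crunch-run's actual schedule.
    """
    deadline = clock() + max_total_wait
    delay = first_delay
    while True:
        try:
            return request()
        except ConnectionError:
            remaining = deadline - clock()
            if remaining <= 0:
                # Budget exhausted: give up and surface the last error.
                raise
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay *= growth
```

Passing sleep and clock as parameters keeps the helper easy to exercise with a fake clock instead of waiting for real time to pass.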

All Arvados API calls in crunch-run now select only the fields they need, improving performance. #20541

crunch-run logs more progress information as it uploads output files to Keep. This is intended to help alleviate concerns that a container is hung after the main process has finished, and help identify cases where containers upload more than intended. #20561

The Docker watchdog timeouts in crunch-run have been increased to accommodate specific situations where the watchdog command itself becomes unresponsive. #20595

Cloud Dispatcher

arvados-dispatch-cloud supports a new configuration setting Containers.CloudVMs.InstanceInitCommand. Administrators can put custom shell commands in this setting, and they will run after Crunch’s own dispatch script. #20520

arvados-dispatch-cloud supports a new configuration flag Containers.CloudVMs.DeployPublicKey to decide whether or not it authorizes its own SSH key for newly created compute nodes. You can set this flag to false if you provision the dispatcher’s SSH key another way, like preinstalling it on the compute node image. #20485

arvados-dispatch-cloud no longer terminates instances or unlocks containers when it reaches its internal request concurrency limit. #20511

arvados-dispatch-cloud no longer terminates instances when there are more supervisor containers running than the cluster’s configured SupervisorFraction allows. #20511

Fixed a bug in arvados-dispatch-cloud where supervisor containers in the Running state were not correctly counted against the cluster’s configured SupervisorFraction. #20511

SDKs

Python SDK client constructors like arvados.api() now accept a num_retries argument that sets a floor for all Arvados API requests made with this client. This lets clients define their preferred retry limit once, rather than passing it to each API call. #12684
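One way to read "floor" is that a request never gets fewer retries than the client was constructed with, while an individual call can still ask for more. The sketch below models that reading in plain Python. The helper effective_retries is hypothetical and is not part of the SDK; it only illustrates the arithmetic.

```python
def effective_retries(client_floor, per_call=None):
    """Hypothetical helper: the retry count a single request ends up with.

    client_floor stands for the num_retries value given to the client
    constructor; per_call stands for a value given to one API call.
    """
    if per_call is None:
        # Nothing requested for this call: the client-wide value applies.
        return client_floor
    # A per-call value can raise the count but not lower it below the floor.
    return max(client_floor, per_call)


print(effective_retries(5))      # no per-call value
print(effective_retries(5, 2))   # per-call value below the floor
print(effective_retries(5, 8))   # per-call value above the floor
```

Under this reading, the three calls yield 5, 5, and 8 retries respectively.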

Related to that change, client constructors now ensure that log messages from googleapiclient.http appear while the discovery document is first fetched, so you are told about early problems that are being retried. If you want to handle logging yourself, ensure googleapiclient.http or a parent logger has a log handler installed before you construct Arvados API clients. #20613
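If you prefer to manage logging yourself, a handler can be attached with the standard library's logging module before any client is constructed. The following is a minimal sketch. The function name and the message format are illustrative choices, not something the SDK requires.

```python
import logging


def install_http_log_handler():
    """Attach a stream handler to the googleapiclient.http logger.

    Illustrative helper: call it before constructing Arvados API clients.
    """
    logger = logging.getLogger("googleapiclient.http")
    handler = logging.StreamHandler()  # writes to stderr by default
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
    )
    logger.addHandler(handler)
    return handler


handler = install_http_log_handler()
```

Installing the handler on a parent logger, such as the root logger, would satisfy the same condition.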

When you issue an Arvados API request through the Go SDK and do not save the result, the SDK automatically adds a select argument set to ["uuid"] to your request. This minimizes the size of the API response, improving performance. #20541
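The effect on the request parameters can be pictured with a small Python model. This is not the Go SDK's code. The function name is hypothetical, and leaving an explicit select untouched is an assumption made for the example.

```python
def request_params(params, result_wanted):
    """Hypothetical model of the parameters sent with one API request."""
    params = dict(params)  # work on a copy; leave the caller's dict alone
    if not result_wanted and "select" not in params:
        # The caller discards the response, so ask for the smallest useful one.
        params["select"] = ["uuid"]
    return params


print(request_params({"ensure_unique_name": True}, result_wanted=False))
```

When the result is wanted, the parameters pass through unchanged.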

There are a number of changes to the way arvados.Client in the Go SDK retries requests (#20511):

  • The client now starts with a limit of 8 concurrent requests that grows as those requests succeed, rather than starting with unlimited concurrency that gets tamped down on failure.
  • When used by an internal service, the maximum request concurrency is 25% of the cluster’s configured MaxConcurrentRequests.
  • Fixed a bug where the concurrency limit was increased more than intended after a successful request.
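The growth behavior described in the list above can be sketched as a small Python model. It is not the Go SDK's implementation. The class name and the growth step of one slot per success are assumptions. The starting limit of 8 and the 25% cap for internal services come from the list above.

```python
class GrowingLimit:
    """Hypothetical model of a concurrency limit that grows on success."""

    def __init__(self, max_concurrent_requests, initial=8):
        # Cap for an internal service: 25% of MaxConcurrentRequests.
        self.cap = max(1, max_concurrent_requests // 4)
        # Start low instead of starting unlimited.
        self.limit = min(initial, self.cap)

    def on_success(self):
        # Assumed growth step: one more slot per success, up to the cap.
        if self.limit < self.cap:
            self.limit += 1


limit = GrowingLimit(max_concurrent_requests=64)
for _ in range(3):
    limit.on_success()
print(limit.limit, limit.cap)
```

With MaxConcurrentRequests set to 64, the model starts at 8, reaches 11 after three successes, and stops growing at 16.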

Workbench 2

The browser title is now updated to reflect the current navigation path. #19369

The “Subprocesses” and “All Processes” lists will no longer refresh continuously in the background. #20449

Certain permission denied errors are now handled correctly so they no longer crash the application. #20538

CLI tools

When a container fails, arvados-cwl-runner selects different logs to report based on whether the container failed to start, reported its own failure, or was terminated by a signal. The wording of some log messages has been clarified as well. All these changes are intended to help users diagnose problems faster. #20531

The default number of retries for Arvados API requests in all Python tools has been increased to 10, to provide improved resiliency when the API server is unresponsive. In the worst case, a single API request that keeps failing may be retried for about 35 minutes. #12684
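As a rough check on that figure, suppose the wait before retry n is at most 2**n seconds, with no cap. That schedule is an assumption made for this arithmetic, not a statement of the SDK's exact formula, and it ignores the time each request itself takes.

```python
def worst_case_wait(num_retries, base=2.0):
    """Total seconds slept if retry n waits base**n seconds (assumed schedule)."""
    return sum(base ** n for n in range(1, num_retries + 1))


total = worst_case_wait(10)
print(f"{total:.0f} s = {total / 60:.1f} min")  # prints "2046 s = 34.1 min"
```

Under that assumption, ten retries add up to 2046 seconds of waiting, which is consistent with the "about 35 minutes" worst case.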

Security

The Arvados API controller now drops the If-None-Match header from requests forwarded to the Rails API server. Arvados does not support this header, but a security vulnerability was recently discovered in Rails’ support for it (CVE-2023-22795). Dropping the header defends against exploits without affecting users. #20545
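The defense amounts to removing one header before a request is passed along. The Python sketch below shows that idea only. The controller itself is written in Go, and the function name and the dictionary representation of headers are assumptions made for the example.

```python
def drop_header(headers, name="If-None-Match"):
    """Return a copy of headers without the named header.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    target = name.lower()
    return {k: v for k, v in headers.items() if k.lower() != target}


incoming = {"Accept": "application/json", "if-none-match": 'W/"abc"'}
print(drop_header(incoming))  # prints {'Accept': 'application/json'}
```

Because the original mapping is copied rather than modified, the incoming request stays intact for logging or inspection.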

Documentation

A new page in the Administration Guide documents the /_inspect/requests API endpoint of many Arvados services. #20229

Administrators can set Containers.Logging.LimitLogBytesPerJob to zero in their configuration to disable realtime container logging. This has long been supported, and is now documented in the configuration reference. #20433