The Arvados team is pleased to announce Arvados 2.6.3. This release includes a number of important bugfixes and performance improvements, particularly when Crunch is running a large number of containers. We recommend that new and existing installations of 2.6.2 or earlier upgrade to 2.6.3. See Upgrading Arvados for upgrade instructions. For a full list of features and improvements introduced in the 2.6 Arvados release, see the 2.6.0 release notes.
Crunch
crunch-run now retries all of its Arvados API calls, waiting up to ten minutes in the worst case, to provide improved resiliency when the API server is busy or unresponsive. #20540
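As a rough illustration of retrying with an overall time budget, here is a minimal Python sketch. It is not crunch-run's implementation (crunch-run is not written in Python). The function name, the one-second starting delay, the doubling factor, and the choice of ConnectionError as the retryable error are all assumptions made for the example. Only the ten-minute ceiling comes from the note above.

```python
import time


def call_with_retries(request, max_wait=600.0, first_delay=1.0, growth=2.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call request() until it succeeds or max_wait seconds have elapsed.

    Every name and default here is hypothetical, apart from the
    ten-minute (600 second) budget mentioned in the release note.
    """
    deadline = clock() + max_wait
    delay = first_delay
    while True:
        try:
            return request()
        except ConnectionError:
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # out of time: surface the last error to the caller
            # Never sleep past the deadline.
            sleep(min(delay, remaining))
            delay *= growth
```

The clock and sleep parameters exist only so the loop can be exercised without real waiting.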
All Arvados API calls in crunch-run now select only the fields they need, improving performance. #20541
crunch-run logs more progress information as it uploads output files to Keep. This is intended to help alleviate concerns that a container is hung after the main process has finished, and help identify cases where containers upload more than intended. #20561
The Docker watchdog timeouts in crunch-run have been increased to accommodate specific situations where the watchdog command itself becomes unresponsive. #20595
Cloud Dispatcher
arvados-dispatch-cloud supports a new configuration setting Containers.CloudVMs.InstanceInitCommand. Administrators can write custom shell commands in this setting, and they will run after Crunch’s own dispatch script. #20520
arvados-dispatch-cloud supports a new configuration flag Containers.CloudVMs.DeployPublicKey to decide whether or not it authorizes its own SSH key for newly created compute nodes. You can set this flag to false if you provision the dispatcher’s SSH key another way, like preinstalling it on the compute node image. #20485
arvados-dispatch-cloud no longer terminates instances or unlocks containers when it reaches its internal request concurrency limit. #20511
arvados-dispatch-cloud no longer terminates instances when there are more supervisor containers running than the cluster’s configured SupervisorFraction allows. #20511
Fixed a bug in arvados-dispatch-cloud where supervisor containers in the Running state were not correctly counted against the cluster’s configured SupervisorFraction. #20511
SDKs
Python SDK client constructors like arvados.api() now accept a num_retries argument that sets a floor on the number of retries for all Arvados API requests made with this client. This lets clients define their preferred retry limit once, rather than passing it to each API call. #12684
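To make the "floor" wording concrete, the sketch below models it with a small stand-in class. RetryingClient and effective_retries are hypothetical names invented for this illustration and are not part of the Arvados SDK. The assumption is that a floor means an individual call may request more retries than the client-wide value, but never ends up with fewer.

```python
class RetryingClient:
    """Hypothetical stand-in for an API client; not the Arvados SDK."""

    def __init__(self, num_retries):
        # Client-wide floor, set once at construction time.
        self.num_retries = num_retries

    def effective_retries(self, per_call_retries=0):
        # Assumed meaning of "floor": a call can raise the retry count,
        # but cannot lower it below the client-wide value.
        return max(self.num_retries, per_call_retries)


client = RetryingClient(num_retries=5)
print(client.effective_retries())    # uses the floor
print(client.effective_retries(8))   # a larger per-call value wins
```

With the real SDK, the corresponding call would presumably look like arvados.api('v1', num_retries=5). Check the SDK reference for the exact signature.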
Related to that change, client constructors will ensure that logs for googleapiclient.http appear when you first fetch the discovery document to let you know about early problems that are being retried. If you want to handle logging yourself, just ensure googleapiclient.http or a parent logger has a log handler installed before you construct Arvados API clients. #20613
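One way to install your own handler, using only the Python standard library, is sketched below. The handler type, the format string, and the WARNING level are illustrative choices, not SDK requirements. The logger name googleapiclient is the parent of googleapiclient.http, so records propagate to it by default.

```python
import logging

# Attach a handler to the parent logger. Records emitted by
# "googleapiclient.http" propagate up to it by default.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))

parent_logger = logging.getLogger("googleapiclient")
parent_logger.addHandler(handler)
parent_logger.setLevel(logging.WARNING)  # assumed level; adjust as needed

# Construct Arvados API clients only after this point.
```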
When you issue an Arvados API request through the Go SDK, and do not save the result, the SDK automatically adds the select argument to your request set to ["uuid"]. This minimizes the size of the API response, improving performance. #20541
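The idea can be modeled in a few lines. The sketch below is written in Python for consistency with the other examples in these notes, and it is not the Go SDK's code. The helper name request_params and the save_result flag are hypothetical. Leaving a caller-supplied select untouched is also an assumption.

```python
def request_params(params, save_result):
    """Return the parameters to send for one API request (hypothetical helper)."""
    params = dict(params)  # copy so the caller's dict is not mutated
    if not save_result and "select" not in params:
        # The response will be discarded, so ask for the smallest useful one.
        params["select"] = ["uuid"]
    return params


print(request_params({"limit": 1}, save_result=False))
print(request_params({"limit": 1}, save_result=True))
```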
There are a number of changes to the way arvados.Client in the Go SDK retries requests (#20511):
- The client now starts with a limit of 8 concurrent requests that grows as those requests succeed, rather than starting with unlimited concurrency that gets tamped down on failure.
- When used by an internal service, the maximum request concurrency is 25% of the cluster’s configured MaxConcurrentRequests.
- Fixed a bug where the concurrency limit was increased more than intended after a successful request.
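The grow-on-success behavior described in this list can be sketched as a small model. This is written in Python for illustration only and does not reproduce the Go client's algorithm. The class and function names are hypothetical, and the growth rule (one extra slot per success) and the rounding of the 25% cap are assumptions. Only the starting limit of 8 and the 25% figure come from the notes.

```python
class ConcurrencyLimit:
    """Hypothetical model of a request limit that grows on success."""

    def __init__(self, initial=8, maximum=None):
        self.maximum = maximum
        self.limit = initial if maximum is None else min(initial, maximum)

    def on_success(self):
        # Assumed growth rule: one extra slot per successful request,
        # never exceeding the configured maximum.
        self.limit += 1
        if self.maximum is not None:
            self.limit = min(self.limit, self.maximum)


def internal_service_maximum(max_concurrent_requests):
    # 25% of the cluster's MaxConcurrentRequests; rounding down and
    # keeping at least one slot are assumptions of this sketch.
    return max(1, max_concurrent_requests // 4)


limit = ConcurrencyLimit(maximum=internal_service_maximum(64))
for _ in range(20):
    limit.on_success()
print(limit.limit)  # capped at the internal-service maximum
```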
Workbench 2
The browser title is now updated to reflect the current navigation path. #19369
The “Subprocesses” and “All Processes” lists will no longer refresh continuously in the background. #20449
Certain permission denied errors are now handled correctly so they no longer crash the application. #20538
CLI tools
When a container fails, arvados-cwl-runner selects different logs to report based on whether the container failed to start, reported its own failure, or was terminated by a signal. The wording of some log messages has been clarified as well. All these changes are intended to help users diagnose problems faster. #20531
The default number of retries for Arvados API requests in all Python tools has been increased to 10, to provide improved resiliency when the API server is unresponsive. In the worst case, a single API request that keeps failing may be retried for about 35 minutes. #12684
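For a sense of where a figure like 35 minutes can come from, the arithmetic below adds up the waits for 10 retries under an exponential backoff. The 2-second starting delay and the doubling factor are assumptions chosen for illustration and are not documented values. Time spent on the requests themselves is not counted.

```python
def worst_case_wait(num_retries, first_delay=2.0, growth=2.0):
    """Total seconds spent waiting if every retry is needed.

    first_delay and growth are assumptions, not documented values.
    """
    return sum(first_delay * growth ** i for i in range(num_retries))


total = worst_case_wait(10)
print(f"{total:.0f} seconds, about {total / 60:.0f} minutes")
```

Under these assumptions the waits sum to 2 + 4 + ... + 1024 = 2046 seconds, or roughly 34 minutes, which is in line with the figure quoted above.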
Security
The Arvados API controller now drops the If-None-Match header from requests forwarded to the Rails API server. Arvados does not support this header, but a security vulnerability was recently discovered in Rails’ support for it (CVE-2023-22795). Dropping the header defends against exploits without affecting users. #20545
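The mitigation amounts to filtering one header out of each request before it is forwarded. The Python sketch below illustrates that idea only and is not the controller's code. The helper name and the use of a plain dict for headers are assumptions.

```python
def drop_if_none_match(headers):
    """Return a copy of headers without If-None-Match (hypothetical helper)."""
    # HTTP header names are case-insensitive, so compare in lower case.
    return {name: value for name, value in headers.items()
            if name.lower() != "if-none-match"}


incoming = {
    "Authorization": "Bearer example-token",
    "If-None-Match": '"abc123"',
    "Accept": "application/json",
}
print(drop_if_none_match(incoming))
```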
Documentation
A new page in the Administration Guide documents the /_inspect/requests API endpoint of many Arvados services. #20229
Administrators can set Containers.Logging.LimitLogBytesPerJob to zero in their configuration to disable realtime container logging. This has long been supported, and is now documented in the configuration reference. #20433