Arvados 2.6.2 released

The Arvados team is pleased to announce Arvados 2.6.2. This release includes a number of important bug fixes and performance improvements, particularly when Crunch is running a large number of containers. We recommend that new installations use 2.6.2 and that existing installations running 2.6.1 or earlier upgrade to it. See Upgrading Arvados for upgrade instructions. For a full list of features and improvements introduced in the 2.6 Arvados release, see the 2.6.0 release notes.

Container scalability improvements

The API server now uses an SQL function to cascade container record updates, to reduce lock contention and improve performance. #20472

The API server no longer locks container database records for updates that do not affect other records, to reduce lock contention and improve performance. #20447

Once arvados-dispatch-cloud locks a container, it will prioritize finishing that container over starting new containers with higher priority that appear in the queue. This can help reduce thrashing of cloud resources at the cost of making the dispatcher less responsive to high-priority requests that appear in the queue. #20457
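The ordering rule above can be illustrated with a small Python sketch. arvados-dispatch-cloud itself is written in Go and its scheduler considers much more than this, so treat the following as an illustration of the rule only. The `QueuedContainer` record and the `start_order` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass
class QueuedContainer:
    # Hypothetical, simplified view of a queued container record.
    uuid: str
    priority: int
    locked: bool  # True once this dispatcher has locked the container


def start_order(queue):
    """Return containers in the order this sketch would try to start them.

    Containers the dispatcher has already locked come first, regardless of
    priority; within each group, higher priority goes first.
    """
    return sorted(queue, key=lambda c: (not c.locked, -c.priority))


queue = [
    QueuedContainer("ctr-a", priority=500, locked=False),
    QueuedContainer("ctr-b", priority=100, locked=True),
    QueuedContainer("ctr-c", priority=900, locked=False),
]
print([c.uuid for c in start_order(queue)])  # ['ctr-b', 'ctr-c', 'ctr-a']
```

Here the locked, low-priority container `ctr-b` is started before the newer high-priority `ctr-c`. That is the trade-off described above: less thrashing, but slower response to high-priority arrivals.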

arvados-dispatch-cloud now uses less memory by clearing fields from the container records it holds in memory after they are no longer needed. #20457

When arvados-cwl-runner fails to load container records, it now logs a warning-level message indicating that this is a temporary, recoverable problem. Previously it logged an exception traceback at the error level, which overstated the severity of the problem. #20432

Most Arvados services can now log their request queue to a JSON file when they are running at over 90% of capacity. This feature is intended to help diagnose performance and scaling problems. The location of these JSON files is configured through the SystemLogs.RequestQueueDumpDirectory setting. #20475
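The threshold behavior can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how the Arvados services implement it. The function `maybe_dump_queue`, the output file name, the request fields, and the convention that an empty directory disables dumping are all hypothetical. "Capacity" is assumed here to be the maximum number of requests the service will hold.

```python
import json
import os
import tempfile


def maybe_dump_queue(pending, capacity, dump_dir):
    """Write pending requests to a JSON file if load is over 90% of capacity.

    pending: list of JSON-serializable dicts, one per queued request.
    capacity: maximum number of requests the service will hold (an
    assumption made for this sketch).
    Returns the path written, or None if nothing was dumped.
    """
    # Hypothetical: this sketch treats an empty dump_dir as "feature disabled".
    if not dump_dir:
        return None
    # Integer comparison avoids floating-point rounding: len/capacity > 0.9.
    if len(pending) * 10 <= capacity * 9:
        return None
    path = os.path.join(dump_dir, "request-queue.json")  # hypothetical file name
    with open(path, "w") as f:
        json.dump(pending, f, indent=2)
    return path


with tempfile.TemporaryDirectory() as d:
    light = [{"method": "GET", "path": "/arvados/v1/containers"}] * 5
    heavy = light * 2  # 10 requests, 100% of capacity
    print(maybe_dump_queue(light, 10, d))  # None: only 50% of capacity
    with open(maybe_dump_queue(heavy, 10, d)) as f:
        print(len(json.load(f)))  # 10
```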

Fixed a bug in arvados-dispatch-cloud where, when both the maximum number of containers and the maximum number of supervisor containers were running, queued supervisor containers were not counted toward the dispatcher’s “overquota” metric. This made graphs of the metric jump up and down in confusing ways as supervisor containers finished and new ones replaced them. #20457

Other bug fixes

Fixed a bug in arv-mount that would cause it to crash if its internal block cache was thrashing (for example, because multiple processes were requesting data in parallel faster than it could be downloaded). #20422

Fixed a bug in arvados-cwl-runner that would cause it to crash when the first filename loaded from the workflow was a strict prefix of others. #20462

The HTTP download function in arvados-cwl-runner now detects when connections are still open but no longer receiving data, so it can retry or abort as appropriate. #20257
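One common way to detect a connection that is still open but no longer delivering data is to track the time of the last received byte and give up once it exceeds a limit. The Python sketch below shows that general technique. It is not the code arvados-cwl-runner uses. The names `copy_with_stall_detection` and `StalledDownload`, and the `read_chunk` contract, are hypothetical. A fake clock is injected so the example runs without a network.

```python
import itertools
import time


class StalledDownload(Exception):
    """Raised when a connection is still open but has stopped delivering data."""


def copy_with_stall_detection(read_chunk, write, stall_timeout, clock=time.monotonic):
    """Copy chunks from read_chunk() to write() until end of stream.

    read_chunk() must return bytes with data, b"" at end of stream, or None
    when a short per-read timeout expired with no data (hypothetical contract
    for this sketch). If no data arrives for more than stall_timeout seconds,
    StalledDownload is raised so the caller can retry or abort.
    Returns the total number of bytes copied.
    """
    last_progress = clock()
    total = 0
    while True:
        chunk = read_chunk()
        now = clock()
        if chunk is None:
            # Connection is open but idle: check how long it has been idle.
            idle = now - last_progress
            if idle > stall_timeout:
                raise StalledDownload(f"no data received for {idle:.1f} s")
            continue
        if chunk == b"":
            return total  # normal end of stream
        write(chunk)
        total += len(chunk)
        last_progress = now


# Simulated stream with a fake clock that advances one second per call.
ticks = itertools.count()
chunks = iter([b"abc", None, b"de", b""])
out = bytearray()
print(copy_with_stall_detection(lambda: next(chunks), out.extend, 5,
                                clock=lambda: next(ticks)))  # 5
```

In real use, `read_chunk` might wrap a socket read that has a short timeout and return None when `socket.timeout` is raised, so the idle check runs even while the peer sends nothing.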

Workbench 2 now requests only the data it needs to list project contents, improving response time. #20469

Fixed a bug where Workbench 2 would crash when trying to load container logs before any had been recorded. #20452

Fixed a bug in Workbench 2 where, after a user opened a registered workflow to view it, its inputs and outputs would disappear shortly afterward. #20487

Fixed a bug in Workbench 2 where, when paging rapidly through a long list, it would sometimes force the user back to an earlier page. #20377

Workbench 2 now renders optional array inputs for processes. #20493

Improved the performance of Workbench 2’s rendering of process inputs. #20424

New APIs

The Python SDK has a new http_to_keep module that provides a high-level interface to mirror an HTTP resource in a collection. This is the same code used by arvados-cwl-runner. #20257

The groups.contents API method now supports a select argument like many other methods. You can specify qualified attribute names like collections.name to retrieve an attribute only on a specific kind of object. #20470, #20527
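The qualified-name rule can be illustrated with a short Python sketch that decides which attributes apply to each kind of object. This is not the API server's implementation. The function `attributes_for_kind` is hypothetical, and the rule that unqualified names apply to every kind is an assumption inferred from the description above.

```python
def attributes_for_kind(select, kind):
    """Return the attributes to retrieve for objects of the given kind.

    Assumed semantics for this sketch: an unqualified name such as "uuid"
    applies to every kind of object, while a qualified name such as
    "collections.name" applies only to objects whose kind matches the prefix.
    """
    attrs = []
    for entry in select:
        prefix, sep, attr = entry.partition(".")
        if not sep:
            attrs.append(entry)  # unqualified: applies to all kinds
        elif prefix == kind:
            attrs.append(attr)  # qualified and matching this kind
    return attrs


select = ["uuid", "collections.name", "collections.portable_data_hash"]
print(attributes_for_kind(select, "collections"))  # ['uuid', 'name', 'portable_data_hash']
print(attributes_for_kind(select, "groups"))  # ['uuid']
```

With the Python SDK, such a request would presumably be issued along the lines of `api.groups().contents(uuid=project_uuid, select=select).execute()`, where `project_uuid` is a placeholder for a real project UUID.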