Reading a gzipped file from an Arvados collection in R

Hello,

Using ArvadosR, I cannot manage to read gzipped files stored on Arvados from within an R session.

Example:

library(ArvadosR)

arv <- Arvados$new()
collection <- Collection$new(arv, "<my_collection_uuid>")
fileListing <- collection$getFileListing()
arvadosFile <- collection$get(fileListing[1])

arvadosFile
Type:          "ArvadosFile"
Name:          "myFile.txt.gz"
Relative path: "/myFile.txt.gz"
Collection:    "<my_collection>"

None of the following attempts works, since the functions that decompress the file expect a character string.

arvConnection <- arvadosFile$connection("r")
mytable <- read.table(arvConnection)
mytable <- read.table(gzfile(arvadosFile$read("raw")))
mytable <- fread(arvadosFile$read("raw"))

Any help would be greatly appreciated!

I think the function you want is gzcon:

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/gzcon

Then you should be able to do something like this:

arvConnection <- gzcon(arvadosFile$connection("r"))
mytable <- read.table(arvConnection)
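
One caveat, as far as I know: gzcon() gives you a binary-mode connection, and the gzcon help page notes that read.table can only read from a text-mode connection. A minimal sketch of a workaround is to pull the decompressed lines out with readLines() first and then hand them to read.table(). The snippet below does not touch Arvados at all: the temp file and the rawConnection() are stand-ins I made up for illustration, so with ArvadosR you would presumably wrap arvadosFile$connection("r") instead.

```r
# Build a small gzipped table locally so the sketch runs without Arvados.
tmp <- tempfile(fileext = ".txt.gz")
out <- gzfile(tmp, "w")
writeLines(c("# a comment line", "a\tb", "1\t2", "3\t4"), out)
close(out)

# Stand-in (assumption) for the remote file: the compressed bytes as a raw vector.
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))

# gzcon() decompresses the binary stream; readLines() extracts the text lines.
gz <- gzcon(rawConnection(bytes))
lines <- readLines(gz)
close(gz)

# read.table() can now parse the lines as ordinary text.
mytable <- read.table(text = lines, header = TRUE, sep = "\t")
print(mytable)
```

Passing the lines through the text argument avoids having to open and close a separate textConnection().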

Hat tip to Stack Overflow:

@tetron

Thanks for your suggestion. Unfortunately, this does not solve the problem (nor do the other solutions proposed in the Stack Overflow thread). Depending on the file to be read, I get one of the following two error messages:

Error in read.table(arvConnection) : 
  incomplete final line found by readTableHeader on 'gzcon(https://collections..../my_file.gz)'

OR

Error in read.table(arvConnection) : no lines available in input 

I am sure these files are not empty, but they do contain # characters at the beginning of the first few rows. Could that be the cause of the problem?
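
For reference, read.table() uses comment.char = "#" by default, so lines starting with # are skipped rather than parsed. Whether that explains the errors above is not certain, but a small sketch on made-up data (the lines below are hypothetical, not taken from the real files) shows the difference between the default and comment.char = "":

```r
# Hypothetical content: two leading "#" lines followed by two data rows.
lines <- c("# generated by some tool",
           "# columns: id, value",
           "x1\t10",
           "x2\t20")

# Default behaviour: comment.char = "#", so the "#" lines are dropped.
skipped <- read.table(text = lines, sep = "\t")
print(nrow(skipped))

# comment.char = "" turns comment handling off, so "#" lines become data rows;
# fill = TRUE pads those shorter rows with empty fields.
kept <- read.table(text = lines, sep = "\t", comment.char = "", fill = TRUE)
print(nrow(kept))
```

If the # lines carry the column names, turning comment handling off (or using skip) is one way to get at them.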

I’m not really an R programmer but I think read.table is expecting a particular file format (tab separated values, maybe?). What are you trying to read in, exactly? If it is a CSV I think you need to use a different function (read.csv), and if you just want lines of text that’s another function (readLines). You might want to download the file and make sure it is properly formatted and that you know how to load it from a local file, and then apply that to loading it with ArvadosR.
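
A rough sketch of that local check, assuming the file has already been downloaded: here a temp file with made-up tab-separated content stands in for the downloaded copy (both are assumptions for illustration). Looking at the first few decompressed lines with readLines() shows what separator and header the file really has before choosing read.table arguments.

```r
# Stand-in (assumption) for a file downloaded from the collection.
local_path <- tempfile(fileext = ".txt.gz")
out <- gzfile(local_path, "w")
writeLines(c("# exported table", "id\tvalue", "x1\t10", "x2\t20"), out)
close(out)

# Peek at the first decompressed lines to see the actual layout.
con <- gzfile(local_path, "r")
head_lines <- readLines(con, n = 3)
close(con)
print(head_lines)

# Once the layout is known, read the whole file with matching arguments.
# read.table() opens and closes the gzfile() connection itself.
mytable <- read.table(gzfile(local_path), header = TRUE, sep = "\t")
print(mytable)
```

Once this works on the local copy, the same arguments can be reused when reading through ArvadosR.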