Use move_ds() to change the enclosing directory. write_ds() can also
move the files, but also condenses all simulation output in to a single
parquet file if multiple files are backing the mrgsimsds object. All
operations are made on the object in place; see Details.
Arguments
- x
an mrgsimsds object.
- path
the new directory location for backing files.
- id
a short name used to create data set files for the simulated output.
- sink
the complete path (including file name) for a single parquet file containing all simulated data; passed to
arrow::write_parquet().- ...
passed to
arrow::write_parquet(); files are always written in parquet format.
Details
There is an important distinction between write_ds() and move_ds() or
rename_ds() for multi-file objects. The backing files can be moved or
written easily, without much computational burden. For multi-file simulation
outputs, write_ds() will need to read each file and then write the data
out to a single file. Apache Arrow can do this very efficiently, but there
will still be an additional, potentially noticeable computational burden.
When dataset files are rewritten to a single file with write_ds(), those
files will no longer be cleaned up when the containing R object is finalized
upon garbage collection. When dataset files are moved outside of tempdir(),
those files, too, will no longer be cleaned up on garbage collection; but
file cleanup will continue to occur as long as the files remain under
tempdir(). No change in finalization behavior due to garbage collection
of the containing object will happen when files are renamed.
All three functions modify x in place and file ownership stays with x.
Examples
mod <- house_ds()
out <- lapply(1:3, \(x) { mrgsim_ds(mod, events = ev(amt = 100)) })
out <- reduce_ds(out)
rename_ds(out, id = "example-sims")
basename(out$files)
#> [1] "mrgsims-ds-example-sims-0001.parquet"
#> [2] "mrgsims-ds-example-sims-0002.parquet"
#> [3] "mrgsims-ds-example-sims-0003.parquet"
write_ds(out, sink = file.path(tempdir(), "example.parquet"))
basename(out$files)
#> [1] "example.parquet"
if (FALSE) { # \dontrun{
move_ds(out, path = "data/simulated")
} # }