Frequently asked questions

Which type of storage drive should I choose?

For most use cases, we recommend object storage. Object drives have several advantages:

  • Their capacity is unlimited and doesn't need to be fixed ahead of time.
  • Object drives can be accessed outside the Deep Origin OS. For example, they are ideal for uploading data from laboratory instruments.
  • Object drives are cheaper to operate: they are billed for actual usage rather than provisioned capacity, they can be accessed without running a workstation, and they cost less per unit of storage.

We recommend file and block storage for use cases that require high performance, such as distributed simulation.

Which throughput should I use for file drives?

For most use cases, we recommend choosing a throughput that will provide 150 MB/s per user.

First, estimate how many users will read and write the drive concurrently. Second, decide on the size of the drive you want to provision. Third, multiply the number of users by 150 MB/s and divide the result by the drive size to get the required throughput per TB. Finally, select the nearest throughput option that meets this requirement. For example, we recommend that a team of 5 people using a 10 TB drive choose 125 MB/s/TB throughput:

  • 5 users × 150 MB/s = 750 MB/s total throughput needed.
  • 750 MB/s ÷ 10 TB = 75 MB/s/TB throughput needed.
  • Select 125 MB/s/TB throughput.
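
The calculation above can be sketched as a small shell function. This is only an illustration: the function name `throughput_tier` and the list of options (125, 250, 500, and 1000 MB/s/TB) are assumptions rather than a confirmed list of what is offered, and the sketch assumes you want the smallest option that covers the per-TB requirement.

```shell
# Hypothetical helper: print the smallest throughput option (MB/s/TB)
# that covers the requirement for a given team and drive size.
# Usage: throughput_tier USERS DRIVE_SIZE_TB
throughput_tier() {
  users=$1
  size_tb=$2
  per_user=150                  # MB/s per user, as recommended above
  options="125 250 500 1000"    # assumed options; check what is actually offered

  total=$(( users * per_user ))
  # Round up so that a fractional requirement is never under-provisioned.
  per_tb=$(( (total + size_tb - 1) / size_tb ))

  for option in $options; do
    if [ "$per_tb" -le "$option" ]; then
      echo "$option"
      return 0
    fi
  done
  echo "no option covers $per_tb MB/s/TB" >&2
  return 1
}

throughput_tier 5 10   # the example above: 750 MB/s over 10 TB, prints 125
```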

For more information, see the documentation for AWS FSx for Lustre.

How can I change the ID, type, or throughput of a storage drive?

Once created, the ID, type, and throughput of a drive cannot be changed.

Follow these steps to copy the contents of a storage drive to a new drive with a different ID, type, and/or throughput:

  1. Create a new drive with the desired ID, type, and throughput.
  2. Mount the previous and new drives to the same workstation.
  3. Use the workstation to copy the files from the previous drive to the new drive.
  4. Delete the previous drive.
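
Step 3 could look like the following sketch. It assumes both drives are mounted under /share/, in the same way as the /share/my-drive example elsewhere on this page. The function name `copy_drive` and the drive names `old-drive` and `new-drive` are hypothetical placeholders.

```shell
# Hypothetical helper: copy everything from one mounted drive to another,
# then compare the two trees before the old drive is deleted.
# Usage: copy_drive SOURCE_DIR TARGET_DIR
copy_drive() {
  src=$1
  dst=$2
  # -a preserves permissions, timestamps, and symlinks where the target allows;
  # "$src"/. copies the directory's contents, including hidden files.
  cp -a "$src"/. "$dst"/ || return 1
  # diff -r exits non-zero if any file is missing or differs.
  diff -r "$src" "$dst"
}

# Example with placeholder drive names:
# copy_drive /share/old-drive /share/new-drive && echo "copy verified"
```

Delete the previous drive only after the comparison reports no differences.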

Can I control the permissions to files and directories within storage drives?

Currently, all of the members of the organization that owns the drive have full control over the files in the drive.

To avoid accidentally deleting important files, we recommend changing the ownership of critical files to root. For example, to protect the folder /share/my-drive/data/, run the following command:

sudo chown -R root:root /share/my-drive/data/

Files owned by root are protected from deletion by regular users. For example, the following command will fail with the error "Permission denied":

rm -rf /share/my-drive/data/

Files owned by root can still be deleted using sudo, as illustrated below:

sudo rm -rf /share/my-drive/data

Why are the disk sizes of the files in my file drives smaller than I expect?

Our file storage drives automatically compress each file, which increases the amount of data they can store without affecting performance. Consequently, the on-disk sizes of your files may be smaller than their uncompressed sizes. For example, a FASTQ file that is 1 GB uncompressed may occupy significantly less than 1 GB on disk.

To view the uncompressed and compressed sizes of your files and directories, run the following shell commands:

  • du: Display compressed sizes.
  • du --apparent-size: Display uncompressed sizes.
  • ls -l: Display uncompressed sizes.
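
To compare the two figures for a single path, the commands above can be combined as in the sketch below. The function name `show_sizes` is a hypothetical helper, and the `--apparent-size` flag assumes GNU du, which is standard on Linux.

```shell
# Hypothetical helper: print the on-disk (compressed) and apparent
# (uncompressed) size of a file or directory side by side.
# Usage: show_sizes PATH
show_sizes() {
  target=$1
  # du prints "<size><TAB><path>"; cut keeps only the size column.
  printf 'on disk:  %s\n' "$(du -sh "$target" | cut -f1)"
  printf 'apparent: %s\n' "$(du -sh --apparent-size "$target" | cut -f1)"
}

# Example with the folder used elsewhere on this page:
# show_sizes /share/my-drive/data/
```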

More information about the automated compression is available in the documentation for AWS FSx for Lustre.

How can I access my object drives outside the Deep Origin OS?

We plan to provide users with keys to read and write their object drives from outside the Deep Origin OS using S3 clients such as the AWS CLI, boto3, S3 Browser, and s5cmd. For immediate assistance accessing object drives outside the Deep Origin OS, please contact customer support.

Which data should I store with the data hub vs storage drives vs workstations?

We recommend using our data hub as the primary mechanism for storing your raw data and processed results. Our data hub has numerous data management features, including organizing, labeling, searching, sorting, and sharing data. A client CLI and a Python library are available for retrieving managed data into workstations.

We recommend using storage drives for files that are frequently needed for data processing with multiple workstations and/or by multiple users, such as a reference database that is needed for sequence alignment. While storage drives are typically more expensive than workstation persistent local storage, storage drives can be more cost-effective for files that would otherwise need to be replicated across multiple workstations, such as reference genomes and databases.

We recommend using the persistent local storage of individual workstations for temporary files that are only needed for data processing with a single workstation.

What is the most cost-effective way to access the data in my storage drives?

To access your drives cost-effectively, we recommend running workstations only when you need to read or write files.