
Storage Location Architecture

This document provides an in-depth architectural overview of the StorageLocation system in the Synapse Python Client. It explains the design decisions, class relationships, and data flows that enable flexible storage configuration.


Overview

The StorageLocation setting enables Synapse users to configure where files uploaded through Synapse are stored and downloaded from. By default, Synapse stores files in its internal S3 storage, but projects and folders can be configured to use external storage backends such as AWS S3 buckets, Google Cloud Storage, SFTP servers, or a local file server behind a proxy server.

Key Concepts

  • StorageLocationSetting: A configuration specifying file storage and download locations.
  • ProjectSetting: A configuration applied to projects that allows customization of file storage locations.
  • UploadType: An enumeration that defines the types of file upload destinations that Synapse supports.
  • STS Credentials: Temporary AWS credentials for direct S3 access.
  • StorageLocation Migration: The process of transferring the files associated with Synapse entities between storage locations while preserving the entities’ structure and identifiers.


Part 1: Data Model

This section covers the core classes, enumerations, and type mappings.


Domain Model

The following class diagram shows the core classes and their relationships in the StorageLocation system.

classDiagram
    direction TB

    class StorageLocation {
        +int storage_location_id
        +StorageLocationType storage_type
        +UploadType upload_type
        +str bucket
        +str base_key
        +bool sts_enabled
        +str banner
        +str description
        +str etag
        +str created_on
        +int created_by
        +str url
        +bool supports_subfolders
        +str endpoint_url
        +str proxy_url
        +str secret_key
        +str benefactor_id
        +store() StorageLocation
        +get() StorageLocation
        +fill_from_dict(dict) StorageLocation
    }

    class StorageLocationType {
        <<enumeration>>
        SYNAPSE_S3
        EXTERNAL_S3
        EXTERNAL_GOOGLE_CLOUD
        EXTERNAL_SFTP
        EXTERNAL_HTTPS
        EXTERNAL_OBJECT_STORE
        PROXY
    }

    class UploadType {
        <<enumeration>>
        S3
        GOOGLE_CLOUD_STORAGE
        SFTP
        HTTPS
        PROXYLOCAL
        NONE
    }

    class StorageLocationConfigurable {
        <<mixin>>
        +set_storage_location(storage_location_id) ProjectSetting
        +get_project_setting(setting_type) ProjectSetting
        +delete_project_setting(setting_id)
        +get_sts_storage_token(permission, output_format) dict
        +index_files_for_migration(dest_storage_location_id, db_path) MigrationResult
        +migrate_indexed_files(db_path) MigrationResult
    }

    class Project {
        +str id
        +str name
        +str description
    }

    class Folder {
        +str id
        +str name
        +str parent_id
    }

    class UploadDestinationListSetting {
        <<dataclass>>
        concreteType
        id
        projectId
        settingsType
        etag
        locations
    }

    class ProjectSetting {
        <<dataclass>>
        concreteType
        id
        projectId
        settingsType
        etag
    }
    StorageLocation --> StorageLocationType : storage_type
    StorageLocation --> UploadType : upload_type
    StorageLocationConfigurable <|-- Project : implements
    StorageLocationConfigurable <|-- Folder : implements
    StorageLocationConfigurable ..> ProjectSetting : returns
    StorageLocationConfigurable ..> UploadDestinationListSetting : uses


Key Components

| Component | Description |
| --- | --- |
| [synapseclient.models.StorageLocation] | The model representing a storage location setting in Synapse |
| [synapseclient.models.StorageLocationType] | Enumeration defining the supported storage backend types |
| [synapseclient.models.UploadType] | Enumeration defining the upload protocol for each storage type |
| [synapseclient.models.mixins.StorageLocationConfigurable] | Mixin providing storage management methods to entities |
| [synapseclient.models.mixins.UploadDestinationListSetting] | Dataclass defining the upload destination list setting containing storage location IDs |
| [synapseclient.models.mixins.ProjectSetting] | Dataclass defining the base project setting structure |


Storage Type Mapping

Each StorageLocationType maps to a specific REST API concreteType and has a default UploadType. This mapping allows the system to parse responses from the API and construct requests.

flowchart LR
    subgraph StorageLocationType
        SYNAPSE_S3["SYNAPSE_S3"]
        EXTERNAL_S3["EXTERNAL_S3"]
        EXTERNAL_GOOGLE_CLOUD["EXTERNAL_GOOGLE_CLOUD"]
        EXTERNAL_SFTP["EXTERNAL_SFTP"]
        EXTERNAL_HTTPS["EXTERNAL_HTTPS"]
        EXTERNAL_OBJECT_STORE["EXTERNAL_OBJECT_STORE"]
        PROXY["PROXY"]
    end

    subgraph concreteType
        S3SLS["S3StorageLocationSetting"]
        ExtS3SLS["ExternalS3StorageLocationSetting"]
        ExtGCSSLS["ExternalGoogleCloudStorageLocationSetting"]
        ExtSLS["ExternalStorageLocationSetting"]
        ExtObjSLS["ExternalObjectStorageLocationSetting"]
        ProxySLS["ProxyStorageLocationSettings"]
    end

    subgraph UploadType
        S3["S3"]
        GCS["GOOGLECLOUDSTORAGE"]
        SFTP["SFTP"]
        HTTPS["HTTPS"]
        PROXYLOCAL["PROXYLOCAL"]
    end

    SYNAPSE_S3 --> S3SLS --> S3
    EXTERNAL_S3 --> ExtS3SLS --> S3
    EXTERNAL_GOOGLE_CLOUD --> ExtGCSSLS --> GCS
    EXTERNAL_SFTP --> ExtSLS --> SFTP
    EXTERNAL_HTTPS --> ExtSLS --> HTTPS
    EXTERNAL_OBJECT_STORE --> ExtObjSLS --> S3
    PROXY --> ProxySLS --> PROXYLOCAL
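The mapping in the diagram can be sketched as a plain lookup table. The fully qualified prefix below is the standard Synapse REST namespace for these concreteTypes; the helper function itself is illustrative and is not the client's actual implementation:

```python
CONCRETE_TYPE_PREFIX = "org.sagebionetworks.repo.model.project."

# storage type name -> (concreteType suffix, default upload type)
STORAGE_TYPE_MAP = {
    "SYNAPSE_S3": ("S3StorageLocationSetting", "S3"),
    "EXTERNAL_S3": ("ExternalS3StorageLocationSetting", "S3"),
    "EXTERNAL_GOOGLE_CLOUD": ("ExternalGoogleCloudStorageLocationSetting", "GOOGLECLOUDSTORAGE"),
    "EXTERNAL_SFTP": ("ExternalStorageLocationSetting", "SFTP"),
    "EXTERNAL_HTTPS": ("ExternalStorageLocationSetting", "HTTPS"),
    "EXTERNAL_OBJECT_STORE": ("ExternalObjectStorageLocationSetting", "S3"),
    "PROXY": ("ProxyStorageLocationSettings", "HTTPS"),
}

def concrete_type_for(storage_type: str) -> str:
    """Return the fully qualified concreteType for a storage type name."""
    suffix, _upload_type = STORAGE_TYPE_MAP[storage_type]
    return CONCRETE_TYPE_PREFIX + suffix
```

This kind of table is what lets the client both build request bodies (name → concreteType) and parse responses (concreteType → name).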


Storage Type Attributes

Different storage types support different configuration attributes:

Common attributes (all setting types):

| Attribute | Type | Required |
| --- | --- | --- |
| concreteType | string (enum) | ✓ |
| storageLocationId | integer (int32) | |
| uploadType | string | |
| banner | string | |
| description | string | |
| etag | string | |
| createdOn | string | |
| createdBy | integer (int32) | |

Type-specific attributes:

| Attribute | Type | S3 | ExternalS3 | ExternalObjectStore | External (SFTP/HTTPS) | ExternalGoogleCloud | Proxy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| baseKey | string | ✓ | ✓ | | | ✓ | |
| stsEnabled | boolean | ✓ | ✓ | | | | |
| bucket | string | | ✓ (required) | ✓ (required) | | ✓ (required) | |
| endpointUrl | string | | ✓ | ✓ (required) | | | |
| url | string | | | | ✓ (required) | | |
| supportsSubfolders | boolean | | | | ✓ | | |
| proxyUrl | string | | | | | | ✓ (required) |
| secretKey | string | | | | | | ✓ (required) |
| benefactorId | string | | | | | | ✓ (required) |

Summary by type

| Setting type | Description | Type-specific attributes |
| --- | --- | --- |
| S3StorageLocationSetting | Default Synapse storage on Amazon S3. | baseKey, stsEnabled |
| ExternalS3StorageLocationSetting | External S3 bucket connected with Synapse (Synapse-accessed). | bucket (required), baseKey, stsEnabled, endpointUrl |
| ExternalObjectStorageLocationSetting | S3-compatible object storage not accessed by Synapse. | bucket (required), endpointUrl (required) |
| ExternalStorageLocationSetting | SFTP or HTTPS upload destination. | url (required), supportsSubfolders |
| ExternalGoogleCloudStorageLocationSetting | External Google Cloud Storage bucket connected with Synapse. | bucket (required), baseKey |
| ProxyStorageLocationSettings | HTTPS proxy for all upload/download operations. | proxyUrl (required), secretKey (required), benefactorId (required) |
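The required-attribute rules above can be encoded as a small validator. The `build_storage_location_body` helper is hypothetical (not part of synapseclient), and the short concreteType suffixes are used for readability; the real API expects the fully qualified names:

```python
# Required fields per setting type, taken from the tables above.
REQUIRED_FIELDS = {
    "S3StorageLocationSetting": [],
    "ExternalS3StorageLocationSetting": ["bucket"],
    "ExternalObjectStorageLocationSetting": ["bucket", "endpointUrl"],
    "ExternalStorageLocationSetting": ["url"],
    "ExternalGoogleCloudStorageLocationSetting": ["bucket"],
    "ProxyStorageLocationSettings": ["proxyUrl", "secretKey", "benefactorId"],
}

def build_storage_location_body(setting_type: str, **fields) -> dict:
    """Build a creation request body, raising if a required field is missing."""
    missing = [f for f in REQUIRED_FIELDS[setting_type] if f not in fields]
    if missing:
        raise ValueError(f"{setting_type} requires: {', '.join(missing)}")
    return {"concreteType": setting_type, **fields}

body = build_storage_location_body(
    "ExternalS3StorageLocationSetting", bucket="my-bucket", baseKey="team-a/"
)
```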


Choosing a Storage Type

Use this decision tree to select the appropriate storage type for your use case:

flowchart TB
    Start{Need custom storage?}
    Start -->|No| DEFAULT[Use default Synapse storage]
    Start -->|Yes| Q1{Want Synapse to<br/>manage storage?}

    Q1 -->|Yes| DEFAULT[Use default Synapse storage]
    Q1 -->|No| Q2{What storage<br/>backend?}

    Q2 -->|AWS S3| Q3{Synapse accesses<br/>bucket directly?}
    Q2 -->|Google Cloud| EXTERNAL_GOOGLE_CLOUD[Use EXTERNAL_GOOGLE_CLOUD]
    Q2 -->|SFTP Server| EXTERNAL_SFTP[Use EXTERNAL_SFTP]
    Q2 -->|Proxy Server| PROXY[Use PROXY]
    Q2 -->|S3-compatible<br/>store| EXTERNAL_OBJECT_STORE[Use EXTERNAL_OBJECT_STORE]

    Q3 -->|Yes| Q4{Need STS<br/>credentials?}
    Q3 -->|No| EXTERNAL_OBJECT_STORE

    Q4 -->|Yes| EXTERNAL_S3_STS[Use EXTERNAL_S3<br/>with sts_enabled=True]
    Q4 -->|No| EXTERNAL_S3[Use EXTERNAL_S3]

    EXTERNAL_S3 --> Benefits2[Benefits:<br/>- Use your own bucket<br/>- Control access & costs<br/>- Optional STS]
    EXTERNAL_S3_STS --> Benefits2
    EXTERNAL_GOOGLE_CLOUD --> Benefits3[Benefits:<br/>- GCP native<br/>- Use existing GCS buckets]
    EXTERNAL_SFTP --> Benefits4[Benefits:<br/>- Legacy systems<br/>- Synapse never touches data]
    EXTERNAL_OBJECT_STORE --> Benefits5[Benefits:<br/>- OpenStack, MinIO, etc<br/>- Synapse never touches data]
    PROXY --> Benefits6[Benefits:<br/>- Custom access control<br/>- Data transformation]
    DEFAULT --> Benefits0[Benefits:<br/>- No configuration needed<br/>- Synapse-managed S3]
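The decision tree above can be written as a small function. This is a hypothetical encoding for illustration, not a client API; the parameter names are invented here:

```python
def choose_storage_type(
    custom: bool,
    synapse_managed: bool = False,
    backend: str = "s3",        # "s3", "gcs", "sftp", "proxy", "s3-compatible"
    synapse_access: bool = True,
    need_sts: bool = False,
) -> str:
    """Walk the decision tree and return the recommended storage type."""
    if not custom or synapse_managed:
        return "SYNAPSE_S3"
    if backend == "gcs":
        return "EXTERNAL_GOOGLE_CLOUD"
    if backend == "sftp":
        return "EXTERNAL_SFTP"
    if backend == "proxy":
        return "PROXY"
    if backend == "s3-compatible" or not synapse_access:
        return "EXTERNAL_OBJECT_STORE"
    return "EXTERNAL_S3 (sts_enabled=True)" if need_sts else "EXTERNAL_S3"
```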


Entity Inheritance Hierarchy

Projects and Folders inherit storage configuration capabilities through the StorageLocationConfigurable mixin. This pattern allows consistent storage management across container entities.

classDiagram
    direction TB

    class StorageLocationConfigurable {
        <<mixin>>
        +set_storage_location()
        +get_project_setting()
        +delete_project_setting()
        +get_sts_storage_token()
        +index_files_for_migration()
        +migrate_indexed_files()
    }

    class Project {
        +str id
        +str name
        +str description
        +str etag
    }

    class Folder {
        +str id
        +str name
        +str parent_id
        +str etag
    }

    StorageLocationConfigurable <|-- Project
    StorageLocationConfigurable <|-- Folder

The mixin pattern allows Project and Folder to share storage location functionality without code duplication. Both classes inherit the same methods from StorageLocationConfigurable, matching the domain model in Part 1.
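The pattern can be sketched in a few lines. The classes below are simplified stand-ins (the real mixin calls the REST API; here the shared method only builds the endpoint path to show the inheritance):

```python
class StorageLocationConfigurable:
    """Sketch of the mixin: storage methods shared by container entities.
    Subclasses are expected to define an `id` attribute."""

    id: str

    def project_setting_path(self, setting_type: str = "upload") -> str:
        # The real method issues GET /projectSettings/{projectId}/type/{type}.
        return f"/projectSettings/{self.id}/type/{setting_type}"

class Project(StorageLocationConfigurable):
    def __init__(self, id: str):
        self.id = id

class Folder(StorageLocationConfigurable):
    def __init__(self, id: str):
        self.id = id

# Both container types inherit the identical method from the mixin.
project_path = Project("syn123").project_setting_path()
folder_path = Folder("syn456").project_setting_path()
```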




Part 2: Operation Flows

This section contains sequence diagrams for key operations.


Operation Flows

Store Operation

The store() method creates a new storage location in Synapse. Creating a storage location is idempotent per user. Repeating a creation request with the same properties will return the previously created storage location rather than creating a new one.

sequenceDiagram
    participant User
    participant StorageLocation
    participant _to_synapse_request as _to_synapse_request()
    participant API as storage_location_services
    participant Synapse as Synapse REST API

    User->>StorageLocation: store()
    activate StorageLocation

    StorageLocation->>_to_synapse_request: Build request body
    activate _to_synapse_request

    Note over _to_synapse_request: Validate storage_type is set
    Note over _to_synapse_request: Build concreteType from storage_type
    Note over _to_synapse_request: Determine uploadType
    Note over _to_synapse_request: Add type-specific fields

    _to_synapse_request-->>StorageLocation: Request body dict
    deactivate _to_synapse_request

    StorageLocation->>API: create_storage_location_setting(body)
    activate API

    API->>Synapse: POST /storageLocation
    activate Synapse

    Synapse-->>API: Response with storageLocationId
    deactivate Synapse

    API-->>StorageLocation: Response dict
    deactivate API

    StorageLocation->>StorageLocation: fill_from_dict(response)
    Note over StorageLocation: Parse storageLocationId
    Note over StorageLocation: Parse concreteType → storage_type
    Note over StorageLocation: Parse uploadType → upload_type
    Note over StorageLocation: Extract type-specific fields

    StorageLocation-->>User: StorageLocation (populated)
    deactivate StorageLocation
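The `fill_from_dict` step at the end of the flow can be sketched as follows. The field names mirror the REST response keys shown in the diagram; the function body is illustrative, not the client's actual parser:

```python
def fill_from_dict(response: dict) -> dict:
    """Parse a storage-location response into model-style attributes."""
    concrete = response.get("concreteType", "")
    # e.g. "org.sagebionetworks...ExternalS3StorageLocationSetting" -> suffix
    suffix = concrete.rsplit(".", 1)[-1]
    return {
        "storage_location_id": response.get("storageLocationId"),
        "storage_type_concrete": suffix,
        "upload_type": response.get("uploadType"),
        "bucket": response.get("bucket"),
        "sts_enabled": response.get("stsEnabled", False),
        "etag": response.get("etag"),
    }

parsed = fill_from_dict({
    "storageLocationId": 12345,
    "concreteType": "org.sagebionetworks.repo.model.project.ExternalS3StorageLocationSetting",
    "uploadType": "S3",
    "bucket": "my-bucket",
})
```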


STS Token Retrieval

STS (AWS Security Token Service) enables direct S3 access using temporary credentials.

When a Synapse client is constructed (Synapse.__init__), it creates an in-memory token cache:

  • self._sts_token_store = sts_transfer.StsTokenStore() (see synapseclient/client.py)

The store caches STS tokens per entity and permission so repeated access to the same storage location can reuse credentials without a round-trip to the REST API.
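The caching behavior can be sketched with a minimal class. This is not the real `StsTokenStore` (whose internals are not shown in this document); keys, field names, and the expiry threshold are illustrative:

```python
import time

class StsTokenCacheSketch:
    """Illustrative STS token cache keyed by (entity_id, permission)."""

    def __init__(self, min_remaining_life_s: float = 300.0):
        self._cache = {}
        self.min_remaining_life_s = min_remaining_life_s

    def get_token(self, entity_id: str, permission: str, fetch):
        key = (entity_id, permission)
        token = self._cache.get(key)
        if token and token["expiration"] - time.time() > self.min_remaining_life_s:
            return token  # cache hit: no REST round-trip
        token = fetch()   # cache miss or near expiry: GET /entity/{id}/sts
        self._cache[key] = token
        return token

calls = []
def fake_fetch():
    calls.append(1)
    return {"expiration": time.time() + 3600, "accessKeyId": "placeholder"}

store = StsTokenCacheSketch()
store.get_token("syn123", "read_write", fake_fetch)
store.get_token("syn123", "read_write", fake_fetch)  # served from cache
```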

sequenceDiagram
    participant User
    participant Entity as Folder/Project
    participant Mixin as StorageLocation
    participant STS as sts_transfer module
    participant Client as Synapse Client
    participant TokenStore as _sts_token_store (StsTokenStore)
    participant Synapse as Synapse REST API

    Note over Client,TokenStore: Client.__init__ creates self._sts_token_store = sts_transfer.StsTokenStore()

    User->>Entity: get_sts_storage_token(permission, output_format)
    activate Entity

    Entity->>Mixin: get_sts_storage_token_async()
    activate Mixin

    Mixin->>Client: Synapse.get_client()
    Client-->>Mixin: Synapse client instance

    Mixin->>STS: sts_transfer.get_sts_credentials()
    activate STS

    STS->>Client: syn._sts_token_store.get_token(...)
    activate Client
    Client->>TokenStore: get_token(entity_id, permission, min_remaining_life)
    activate TokenStore

    alt token cached and not expired
        TokenStore-->>Client: Cached token
    else cache miss or token expired
        TokenStore->>Synapse: GET /entity/{id}/sts?permission={permission}
        activate Synapse
        Synapse-->>TokenStore: STS credentials response
        deactivate Synapse
        TokenStore-->>Client: New token (cached)
    end
    deactivate TokenStore
    Client-->>STS: Token
    deactivate Client

    Note over STS: Parse credentials

    alt output_format == "boto"
        Note over STS: Format for boto3 client kwargs
        STS-->>Mixin: {aws_access_key_id, aws_secret_access_key, aws_session_token}
    else output_format == "json"
        Note over STS: Return JSON string
        STS-->>Mixin: JSON credentials string
    else output_format == "shell" / "bash"
        Note over STS: Format as export commands
        STS-->>Mixin: Shell export commands
    end
    deactivate STS

    Mixin-->>Entity: Formatted credentials
    deactivate Mixin

    Entity-->>User: Credentials
    deactivate Entity


Credential Output Formats

| Format | Description | Use Case |
| --- | --- | --- |
| boto | Dict with `aws_access_key_id`, `aws_secret_access_key`, `aws_session_token` | Pass directly to `boto3.client('s3', **creds)` |
| json | JSON string | Store or pass to external tools |
| shell / bash | `export AWS_ACCESS_KEY_ID=...` format | Execute in shell |
| cmd | Windows `SET` commands | Windows command prompt |
| powershell | PowerShell variable assignments | PowerShell scripts |
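A formatter along these lines produces the table's outputs. The function is a hedged sketch (not the client's `sts_transfer` code), assuming the raw credential keys `accessKeyId`, `secretAccessKey`, and `sessionToken`:

```python
import json

def format_sts_credentials(creds: dict, output_format: str):
    """Convert raw STS credentials into one of the documented output formats."""
    mapping = {
        "AWS_ACCESS_KEY_ID": creds["accessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["secretAccessKey"],
        "AWS_SESSION_TOKEN": creds["sessionToken"],
    }
    if output_format == "boto":
        return {
            "aws_access_key_id": creds["accessKeyId"],
            "aws_secret_access_key": creds["secretAccessKey"],
            "aws_session_token": creds["sessionToken"],
        }
    if output_format == "json":
        return json.dumps(mapping)
    if output_format in ("shell", "bash"):
        return "\n".join(f"export {k}={v}" for k, v in mapping.items())
    if output_format == "cmd":
        return "\n".join(f"SET {k}={v}" for k, v in mapping.items())
    if output_format == "powershell":
        return "\n".join(f'$Env:{k}="{v}"' for k, v in mapping.items())
    raise ValueError(f"unknown output_format: {output_format}")
```

The "boto" dict can be splatted straight into `boto3.client("s3", **creds)`.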



Part 3: Settings & Infrastructure

This section covers project settings, API architecture, and the async/sync pattern.


Project Setting Lifecycle

Project settings control which storage location(s) are used for uploads to an entity. The following state diagram shows the lifecycle of a project setting.

stateDiagram-v2
    [*] --> NoSetting: Entity created

    NoSetting --> Created: set_storage_location()
    note right of NoSetting: Inherits from parent or uses Synapse default

    Created --> Updated: set_storage_location() updates existing setting
    Updated --> Updated: set_storage_location() updates existing setting

    Created --> Deleted: delete_project_setting(project_setting_id)
    Updated --> Deleted: delete_project_setting(project_setting_id)

    Deleted --> NoSetting: Returns to default (inherits from parent)

    state NoSetting {
        [*] --> Inherited
        Inherited: No project setting exists
        Inherited: Uses parent or Synapse default (ID=1)
    }

    state Created {
        [*] --> Active
        Active: concreteType = UploadDestinationListSetting
        Active: locations = [storage_location_id]
        Active: settingsType = "upload"
        Active: projectId = entity.id
        Active: Has id and etag
    }

    state Updated {
        [*] --> Modified
        Modified: concreteType = UploadDestinationListSetting
        Modified: locations = [new_id, ...] (max 10)
        Modified: settingsType = "upload"
        Modified: etag updated (OCC)
    }


Setting Types

| Type | Purpose | Status |
| --- | --- | --- |
| upload | Configures upload destination storage location(s) | Supported |

Other setting types may be added in the future.



API Layer Architecture

The storage location services module provides async functions that wrap the Synapse REST API endpoints. This layer handles serialization and error handling.

flowchart TB
    subgraph "Model Layer"
        SL[StorageLocation]
        SLCM[StorageLocation Mixin]
    end

    subgraph "API Layer"
        create_sls[create_storage_location_setting]
        get_sls[get_storage_location_setting]
        get_ps[get_project_setting]
        create_ps[create_project_setting]
        update_ps[update_project_setting]
        delete_ps[delete_project_setting]
    end

    subgraph "REST Endpoints"
        POST_SL["POST /storageLocation"]
        GET_SL["GET /storageLocation/{id}"]
        GET_PS["GET /projectSettings/{id}/type/{type}"]
        POST_PS["POST /projectSettings"]
        PUT_PS["PUT /projectSettings"]
        DELETE_PS["DELETE /projectSettings/{id}"]
    end

    SL --> create_sls --> POST_SL
    SL --> get_sls --> GET_SL

    SLCM --> get_ps --> GET_PS
    SLCM --> create_ps --> POST_PS
    SLCM --> update_ps --> PUT_PS
    SLCM --> delete_ps --> DELETE_PS


REST API Reference

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /storageLocation | Create a new storage location setting |
| GET | /storageLocation/{id} | Retrieve a storage location by ID |
| GET | /projectSettings/{projectId}/type/{type} | Get project settings for an entity |
| POST | /projectSettings | Create a new project setting |
| PUT | /projectSettings | Update an existing project setting |
| DELETE | /projectSettings/{id} | Delete a project setting |


Async/Sync Pattern

The StorageLocation system follows the Python client's @async_to_sync pattern, providing both async and sync versions of all methods.
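A minimal sketch of such a decorator is below. This is an illustration of the pattern, not the client's actual `@async_to_sync` implementation (which also handles running event loops and other edge cases):

```python
import asyncio

def async_to_sync(cls):
    """For every *_async coroutine method, generate a sync wrapper
    that runs it to completion on an event loop."""
    for name in list(vars(cls)):
        if name.endswith("_async"):
            sync_name = name[: -len("_async")]
            if not hasattr(cls, sync_name):
                def make_wrapper(async_name):
                    def wrapper(self, *args, **kwargs):
                        return asyncio.run(getattr(self, async_name)(*args, **kwargs))
                    return wrapper
                setattr(cls, sync_name, make_wrapper(name))
    return cls

@async_to_sync
class FolderSketch:
    # Hypothetical stand-in: returns its argument instead of calling Synapse.
    async def set_storage_location_async(self, storage_location_id: int) -> int:
        return storage_location_id
```

After decoration, `FolderSketch().set_storage_location(7)` runs the async implementation synchronously, while async callers can still `await set_storage_location_async(7)` directly.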

flowchart LR
    subgraph "User Code"
        SyncCall["folder.set_storage_location()"]
        AsyncCall["await folder.set_storage_location_async()"]
    end

    subgraph "@async_to_sync Decorator"
        Wrapper["Sync wrapper"]
        AsyncMethod["Async implementation"]
    end

    subgraph "Event Loop"
        RunSync["wrap_async_to_sync()"]
        AsyncIO["asyncio"]
    end

    SyncCall --> Wrapper
    Wrapper --> RunSync
    RunSync --> AsyncIO
    AsyncIO --> AsyncMethod

    AsyncCall --> AsyncMethod


Method Pairs

| Sync Method | Async Method |
| --- | --- |
| StorageLocation.store() | StorageLocation.store_async() |
| StorageLocation.get() | StorageLocation.get_async() |
| StorageLocation.setup_s3() | StorageLocation.setup_s3_async() |
| folder.set_storage_location() | folder.set_storage_location_async() |
| folder.get_project_setting() | folder.get_project_setting_async() |
| folder.delete_project_setting() | folder.delete_project_setting_async() |
| folder.get_sts_storage_token() | folder.get_sts_storage_token_async() |
| folder.index_files_for_migration() | folder.index_files_for_migration_async() |
| folder.migrate_indexed_files() | folder.migrate_indexed_files_async() |



Part 4: Migration

This section covers the file migration system.


Migration Flow

File migration is a two-phase process. The first phase indexes all candidate files into a SQLite database. The second phase performs an asynchronous, batched migration that reuses already-copied file handles where possible, respects concurrency limits, snapshots affected tables when requested, and updates entities and table cells via transactional table operations, recording per-item status back in the SQLite database.
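The per-item bookkeeping can be sketched with an in-memory SQLite database. The real schema used by synapseutils is not shown in this document, so the table and column names below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Phase 1 (indexing) records one row per migratable item.
conn.execute(
    """CREATE TABLE migrations (
        entity_id TEXT, item_type TEXT,
        from_file_handle_id TEXT, to_file_handle_id TEXT,
        status TEXT)"""
)
conn.execute(
    "INSERT INTO migrations VALUES ('syn123', 'FILE', 'fh1', NULL, 'INDEXED')"
)

# Phase 2 (migration) copies the file handle, then marks the row
# MIGRATED (or ERRORED) so interrupted runs can resume where they left off.
conn.execute(
    "UPDATE migrations SET to_file_handle_id='fh2', status='MIGRATED' "
    "WHERE entity_id='syn123' AND status='INDEXED'"
)
status = conn.execute(
    "SELECT status FROM migrations WHERE entity_id='syn123'"
).fetchone()[0]
```

Because status lives in the database rather than in memory, `migrate_indexed_files` can be re-run against the same `db_path` to retry errored or remaining items.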

sequenceDiagram
    participant User
    participant Entity as Project/Folder
    participant IndexFn as index_files_for_migration
    participant DB as SQLite Database
    participant MigrateFn as migrate_indexed_files
    participant Synapse as Synapse REST API

    Note over User,Synapse: === Phase 1: Index Files ===
    User->>Entity: index_files_for_migration_async
    activate Entity

    Entity->>IndexFn: index_files_for_migration_async(dest_id, source_ids, file_version_strategy, include_table_files)
    activate IndexFn

    IndexFn->>Synapse: Verify user owns destination storage location
    Synapse-->>IndexFn: OK / error

    IndexFn->>DB: Create/open DB + ensure schema
    IndexFn->>DB: Store migration settings (root_id, dest_id, source_ids, file_version_strategy, include_table_files)

    alt Entity is Project/Folder (container)
        IndexFn->>Synapse: get_children(parent, include_types)
        Synapse-->>IndexFn: Child references (folders/files/tables)

        loop For each child (bounded concurrency)
            IndexFn->>Synapse: get_async(child, downloadFile=false)
            Synapse-->>IndexFn: Child entity
            IndexFn->>IndexFn: _index_entity_async(child)
        end

        IndexFn->>DB: Mark container as indexed (PROJECT/FOLDER)

    else Entity is File
        alt file_version_strategy = new / latest / all
            IndexFn->>Synapse: Get file handle metadata (and versions if needed)
            Synapse-->>IndexFn: File handle(s)
            IndexFn->>DB: Insert/append FILE migration rows (INDEXED and ALREADY_MIGRATED)
        else file_version_strategy = skip
            Note over IndexFn: Skip file entities
        end

    else Entity is Table (include_table_files=true)
        IndexFn->>Synapse: get_columns(table_id)
        Synapse-->>IndexFn: Column list
        IndexFn->>Synapse: Query rows for FILEHANDLEID columns (+ rowId,rowVersion)
        Synapse-->>IndexFn: Row results (fileHandleId values)
        loop For each row + file-handle cell (bounded concurrency)
            IndexFn->>Synapse: get_file_handle_for_download(fileHandleId, objectType=TableEntity)
            Synapse-->>IndexFn: File handle
            IndexFn->>DB: Insert TABLE_ATTACHED_FILE migration row (or ALREADY_MIGRATED)
        end
    end

    opt continue_on_error=true
        Note over IndexFn,DB: Indexing errors are recorded in DB instead of aborting
    end

    IndexFn-->>Entity: MigrationResult (db_path)
    deactivate IndexFn

    Entity-->>User: MigrationResult
    deactivate Entity

    Note over User,Synapse: === Phase 2: Migrate Files ===
    User->>Entity: migrate_indexed_files / migrate_indexed_files_async (db_path)
    activate Entity

    Entity->>MigrateFn: Start migration
    activate MigrateFn

    MigrateFn->>DB: Open DB, ensure schema, load settings
    MigrateFn->>User: Confirm migration (unless force=True)
    Note over MigrateFn,DB: If not confirmed, abort and return

    loop While there are indexed items
        MigrateFn->>DB: Query next batch (respecting pending/completed handles & concurrency)

        loop For each item in batch
            MigrateFn->>MigrateFn: Skip if key or file handle already pending

            MigrateFn->>DB: Check if destination file handle already exists
            alt Existing copy found
                Note over MigrateFn,DB: Reuse existing to_file_handle_id
            else No existing copy
                MigrateFn->>Synapse: Copy file to new storage (bounded concurrency)
                Synapse-->>MigrateFn: New to_file_handle_id
            end

            alt Item is FILE (entity)
                alt file_version_strategy = new (version is None)
                    MigrateFn->>Synapse: Create new file version with new file handle
                else specific version
                    MigrateFn->>Synapse: Update existing version's file handle
                end
            else Item is TABLE_ATTACHED_FILE
                alt create_table_snapshots=True
                    MigrateFn->>Synapse: Create table snapshot
                end
                MigrateFn->>Synapse: Update table cell via transactional table update (PartialRowSet/TableUpdateTransaction)
            end

            MigrateFn->>DB: Update row status to MIGRATED/ERRORED
        end

    end

    MigrateFn-->>Entity: MigrationResult (migrated counts)
    deactivate MigrateFn

    Entity-->>User: MigrationResult
    deactivate Entity


Migration Strategies

| Strategy | Description |
| --- | --- |
| new | Migrate the most recent version by creating a new file version with the migrated file handle (default) |
| all | Migrate every version of each file |
| latest | Migrate only the most recent version, updating its file handle in place |
| skip | Do not migrate file entities (useful when migrating only table-attached files) |



Learn More

| Resource | Description |
| --- | --- |
| Storage Location Tutorial | Step-by-step guide to using storage locations |
| StorageLocation API Reference | Complete API documentation |
| [StorageLocation Mixin][synapseclient.models.mixins.StorageLocation] | Mixin methods for Projects and Folders |
| Custom Storage Locations (Synapse Docs) | Official Synapse documentation |