fortran-s3-accessor

Brief description

fortran-s3-accessor provides a simple, direct interface for accessing S3-compatible object storage from Fortran programs. The library is designed for scientific computing workflows that need to read and write data to cloud object storage with minimal friction.

Key Features

High-performance streaming: Direct memory streaming via POSIX popen() eliminates disk I/O overhead
Direct HTTP-based S3 operations: GET, PUT, DELETE, and HEAD requests via curl
Dual interface design:
Low-level s3_http module for direct S3 operations
High-level s3_io module providing familiar Fortran I/O patterns (open/read/write/close)
URI support: Work with s3://bucket/key URIs for seamless cross-bucket operations
Public bucket access: Read from public S3 buckets without authentication
Cross-platform compatibility: Native streaming on Linux/macOS/Unix, automatic fallback on Windows
Comprehensive testing: Mock-based testing framework with 22+ test cases
Zero dependencies: Uses only standard Fortran 2008 and system curl
Production ready: Designed for integration with scientific computing workflows (FESOM, climate models, etc.)

Architecture

The library follows a layered architecture with three core modules:

Streaming Module: `curl_stream`

Provides C interoperability layer for high-performance streaming:

stream_command_output(command, output, exit_status): Stream command output directly to memory via popen()
is_streaming_available(): Platform detection for streaming capability
Uses POSIX C bindings (popen, fread, pclose) for zero-copy streaming
Automatic fallback to temp file method on non-POSIX platforms (e.g., Windows)

Core Module: `s3_http`

Provides direct curl-based HTTP access to S3 operations:

s3_config: Configuration type containing bucket name, region, endpoint, credentials, and protocol settings
s3_init(config): Initialize the library with an S3 configuration
s3_get_object(key, content): Download object content from S3
s3_put_object(key, content): Upload object content to S3 (requires authentication)
s3_object_exists(key): Check if an object exists using HTTP HEAD requests
s3_delete_object(key): Delete an object from S3 (requires authentication)

URI-based Operations

For convenience, the library provides URI-aware versions of all operations:

s3_get_uri(uri, content): Get object using s3://bucket/key format
s3_put_uri(uri, content): Put object using URI
s3_exists_uri(uri): Check existence using URI
s3_delete_uri(uri): Delete object using URI

These functions automatically parse the bucket name from the URI and temporarily switch contexts when accessing different buckets.

High-level Module: `s3_io`

Provides Fortran-like I/O interface built on top of s3_http:

s3_open(unit, key, mode, iostat): Open an S3 object for reading or writing
s3_close(unit, iostat): Close the S3 object (uploads on write mode)
s3_read_line(unit, line, iostat): Read a line from the object
s3_write_line(unit, line, iostat): Write a line to the object buffer
s3_rewind(unit, iostat): Rewind read position to beginning

This module manages up to 100 concurrent file handles with internal buffering.

Testing

The library includes comprehensive testing infrastructure using the test-drive framework:

Test Coverage

Mock-based testing: PATH manipulation replaces curl with controllable mock script
Error condition testing: Network failures, authentication errors, HTTP error codes, malformed responses
Edge case testing: Empty keys, very long keys, special characters, boundary conditions
Uninitialized state testing: Separate executable ensures proper isolation
Protocol testing: Both HTTP and HTTPS, various S3-compatible endpoints
URI parsing tests: Complex URIs, edge cases in bucket/key extraction

Running Tests

# Run all tests with mock curl
PATH="test/scripts:$PATH" fpm test

# Tests are defined in test/test_s3_http.f90
# Mock responses stored in test/data/responses/

Use Cases

Scientific Data Access

Access climate model output, observational data, and analysis results stored in S3:

Read NetCDF files from ESGF data nodes
Process CMIP6 climate model output
Access NOAA, NASA, and other public scientific datasets
Integrate with existing Fortran scientific workflows

Public Bucket Integration

The library excels at reading from public S3 buckets commonly used in scientific computing:

NOAA Global Forecast System (GFS) data
NASA Earth observation data
CMIP6 climate model archives
Genomics datasets (1000 Genomes, NIH)

Performance

Direct Memory Streaming (v1.1.0+)

The library uses POSIX popen() for high-performance streaming on supported platforms:

Linux/macOS/Unix: Network → Memory (direct streaming, minimal overhead)
Windows: Network → /tmp → Memory (automatic fallback)

This eliminates disk I/O bottlenecks present in v1.0.0, providing production-grade performance for HPC and scientific computing workflows.

Limitations

URL encoding: Special characters in S3 keys are currently untested; use alphanumeric keys with underscores for safety
Authentication: Uses simplified credential checking; production write operations require AWS Signature v4 (planned)
Windows streaming: Uses temporary file fallback; native streaming planned for v1.2.0
Binary data: Current implementation optimized for text; binary support exists but is less tested
Dependencies: Requires system curl command to be available in PATH
Parallel I/O: No built-in support for concurrent multi-threaded access

Examples

Example 1: Basic Object Download

Download a text file from a public S3 bucket:

program basic_download
    use s3_http
    implicit none

    type(s3_config) :: config
    character(len=:), allocatable :: content
    logical :: success

    ! Configure for public NOAA bucket
    config%bucket = 'noaa-gfs-bdp-pds'
    config%region = 'us-east-1'
    config%endpoint = 's3.amazonaws.com'
    config%use_https = .true.
    call s3_init(config)

    ! Download README
    success = s3_get_object('README.md', content)
    if (success) then
        print *, 'Downloaded ', len(content), ' bytes'
        print *, content
    else
        print *, 'Download failed'
    end if
end program basic_download

Example 2: Check Object Existence

Verify an object exists before attempting to download:

program check_exists
    use s3_http
    implicit none

    type(s3_config) :: config
    logical :: exists

    config%bucket = 'my-bucket'
    config%use_https = .true.
    call s3_init(config)

    exists = s3_object_exists('data/input.nc')
    if (exists) then
        print *, 'File exists, proceeding with download...'
    else
        print *, 'File not found, using default data'
    end if
end program check_exists

Example 3: Using Fortran I/O Interface

Read an S3 object line by line like a regular file:

program read_lines
    use s3_http
    use s3_io
    implicit none

    type(s3_config) :: config
    integer :: unit, iostat, line_count
    character(len=1024) :: line

    ! Initialize
    config%bucket = 'my-data-bucket'
    call s3_init(config)

    ! Open S3 object
    call s3_open(unit, 'data/measurements.csv', 'read', iostat)
    if (iostat /= 0) stop 'Failed to open file'

    ! Read all lines
    line_count = 0
    do
        call s3_read_line(unit, line, iostat)
        if (iostat /= 0) exit
        line_count = line_count + 1
        print *, 'Line ', line_count, ': ', trim(line)
    end do

    call s3_close(unit, iostat)
    print *, 'Read ', line_count, ' lines'
end program read_lines

Example 4: URI-based Access Across Buckets

Use s3:// URIs to seamlessly access multiple buckets:

program uri_access
    use s3_http
    implicit none

    type(s3_config) :: config
    character(len=:), allocatable :: content
    logical :: success

    ! Initialize with default bucket
    config%bucket = 'default-bucket'
    config%use_https = .true.
    call s3_init(config)

    ! Access different bucket via URI
    success = s3_get_uri('s3://other-bucket/path/to/data.txt', content)
    if (success) then
        print *, 'Content from other-bucket: ', content
    end if

    ! Original bucket still accessible
    success = s3_get_object('local-data.txt', content)
end program uri_access

Example 5: NetCDF Climate Data

Download and process NetCDF files from ESGF climate data archives:

program climate_data
    use s3_http
    use netcdf
    implicit none

    type(s3_config) :: config
    character(len=:), allocatable :: nc_data
    character(len=*), parameter :: climate_uri = &
        's3://esgf-world/CMIP6/CMIP/AWI/AWI-ESM-1-1-LR/piControl/r1i1p1f1/fx/areacella/gn/v20200212/' // &
        'areacella_fx_AWI-ESM-1-1-LR_piControl_r1i1p1f1_gn.nc'
    logical :: success

    ! Configure for public ESGF bucket
    config%use_https = .true.
    call s3_init(config)

    ! Download NetCDF file
    success = s3_get_uri(climate_uri, nc_data)
    if (success) then
        ! Write to temp file and process with NetCDF library
        open(10, file='/tmp/climate.nc', form='unformatted', access='stream')
        write(10) nc_data
        close(10)

        ! Now use NetCDF library to read the file
        print *, 'Climate data downloaded, ready for processing'
    end if
end program climate_data

Example 6: Writing Data (with Authentication)

Upload data to S3 (requires AWS credentials):

program write_data
    use s3_http
    implicit none

    type(s3_config) :: config
    character(len=:), allocatable :: data
    logical :: success

    ! Configure with credentials
    config%bucket = 'my-output-bucket'
    config%access_key = 'AKIAIOSFODNN7EXAMPLE'
    config%secret_key = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
    config%use_https = .true.
    call s3_init(config)

    ! Create data
    data = 'simulation results: temp=25.3, pressure=1013.2'

    ! Upload to S3
    success = s3_put_object('results/output.txt', data)
    if (success) then
        print *, 'Data uploaded successfully'
    else
        print *, 'Upload failed'
    end if
end program write_data

Getting Started

Clone the repository: bash git clone https://github.com/pgierz/fortran-s3-accessor.git
Build with FPM: bash fpm build fpm test # Run tests
Or build with CMake: bash mkdir build && cd build cmake .. -DCMAKE_BUILD_TYPE=Release make -j
Run examples: bash fpm run test_simple # Basic operations fpm run --example s3_netcdf_example # NetCDF example

Integration

To use in your project with FPM, add to fpm.toml:

[dependencies]
fortran-s3-accessor = { git = "https://github.com/pgierz/fortran-s3-accessor.git" }

Then in your code:

use s3_http        ! For direct S3 operations
use s3_io          ! For Fortran I/O interface

fortran-s3-accessor

Brief description

Key Features

Architecture

Streaming Module: `curl_stream`

Core Module: `s3_http`

URI-based Operations

High-level Module: `s3_io`

Testing

Test Coverage

Running Tests

Use Cases

Scientific Data Access

Public Bucket Integration

Performance

Direct Memory Streaming (v1.1.0+)

Limitations

Examples

Example 1: Basic Object Download

Example 2: Check Object Existence

Example 3: Using Fortran I/O Interface

Example 4: URI-based Access Across Buckets

Example 5: NetCDF Climate Data

Example 6: Writing Data (with Authentication)

Getting Started

Integration

Developer Info

Paul Gierz

Source Files

Modules

Procedures

Derived Types

fortran-s3-accessor

Brief description

Key Features

Architecture

Streaming Module: curl_stream

Core Module: s3_http

URI-based Operations

High-level Module: s3_io

Testing

Test Coverage

Running Tests

Use Cases

Scientific Data Access

Public Bucket Integration

Performance

Direct Memory Streaming (v1.1.0+)

Limitations

Examples

Example 1: Basic Object Download

Example 2: Check Object Existence

Example 3: Using Fortran I/O Interface

Example 4: URI-based Access Across Buckets

Example 5: NetCDF Climate Data

Example 6: Writing Data (with Authentication)

Getting Started

Integration

Developer Info

Paul Gierz

Source Files

Modules

Procedures

Derived Types

Streaming Module: `curl_stream`

Core Module: `s3_http`

High-level Module: `s3_io`