File version control with custom names

We will go through a use case that has recently come up to create name for a version of a file. I have decided to use s3 to accomplish this and we will learn how to do this

We will go through a use case that has recently come up to create name for a version of a file. I have decided to use s3 to accomplish this and we will learn how to do this.

Use case

As a user I should be able to save the file with a version name when there are any changes to it. As a user I should be able to see the list of versions of a file. As a user I should be able to read the specific version of a file.

Install

Idea

S3 keeps versions of a file when ever we update them and I have decided that instead of re-inventing the wheel i’ll just use this to accomplish this. However, there is a gotcha! S3 has version id to each version that was created and we need to come up with a solution to persist the names of the versions created so that human/user can see the name instead of version id that he doesn’t understand. Since we are already using s3 lets also persist the names with in s3 as well. I’ve created a meta file that has the data to understand versions that were created for a file. Each user has his own folder where he saves file and has meta file in the same folder.

Modules

Let’s break the use case into small and tiny modules/functions that can acheive a specific goal. I’ve identified the following

  • get_meta_file
  • update_meta_file
  • get_file
  • save_file
  • list_versions
  • sort_versions

Tests

Let’s do this in a test driven development approach by writing up some tests in tests.py

First thing first, lets write our conftest

import boto3
import pytest

from moto import mock_s3


@pytest.fixture(scope="function")
def _s3():
    with mock_s3():
        conn = boto3.resource("s3", region_name="us-east-1")
        conn.create_bucket(Bucket="bucket.proxyroot.com")
        versioning = conn.BucketVersioning("bucket.proxyroot.com")
        versioning.enable()
        yield {"client": boto3.client("s3", region_name="us-east-1"), "resource": conn}

test_get_meta_file

When a file path is passed to the function it should return the versions of a file name if it exists otherwise it returns empty list.

from version import (
    list_versions,
    sort_versions,
    get_meta_file,
    save_file,
)


def test_get_meta_file(_s3):
    assert get_meta_file("proxyroot/test.txt") == []

list_versions

Given a path to a file, returns list of versions of that file

def test_list_versions_happy_path(_s3, mocker):
    _s3["client"].put_object(
        Bucket="bucket.proxyroot.com", Key="proxyroot/test.txt", Body="scan stl file",
    )
    expected = [{"created": mocker.ANY, "id": mocker.ANY}]

    assert list_versions("proxyroot/test.txt") == expected

    _s3["client"].put_object(
        Bucket="bucket.proxyroot.com", Key="proxyroot/test.txt", Body="scan stl file",
    )
    expected = [
        {"created": mocker.ANY, "id": mocker.ANY},
        {"created": mocker.ANY, "id": mocker.ANY},
    ]
    assert list_versions("proxyroot/test.txt") == expected

sort_versions

Given a list of versions sort version sorts based on the created timestamp

def test_sort_versions():
    versions = [
        {"id": "kjksdf", "created": "April 23, 2019"},
        {"id": "asdf", "created": "April 21, 2019"},
    ]
    expected = [
        {"id": "asdf", "created": "April 21, 2019"},
        {"id": "kjksdf", "created": "April 23, 2019"},
    ]
    assert sort_versions(versions) == expected

save_file

Save a file with the content which creates versions and saves it in meta file. Returns versions list in the end.

def test_save_file(_s3, mocker):
    versions = save_file("proxyroot", "test2.txt", "Hello World", "first")

    assert versions == [
        {"created": mocker.ANY, "name": "first"},
    ]

    versions = save_file("proxyroot", "test2.txt", "Hello World\nHello World", "second")

    assert versions == [
        {"created": mocker.ANY, "name": "first"},
        {"created": mocker.ANY, "name": "second"},
    ]

Implementation

import boto3
import botocore
import json

from datetime import datetime
from dateutil import parser

S3_BUCKET = "bucket.proxyroot.com"


def get_meta_file(folder):
    """
    Given a folder (proxyroot), returns meta file contents

    :return: list<dict>
    """
    key = f"{folder}/meta.json"
    S3 = boto3.resource("s3")
    content_object = S3.Object(S3_BUCKET, key)
    try:
        object = content_object.get()
    except botocore.exceptions.ClientError:
        return []
    return json.loads(object["Body"].read().decode("utf-8"))


def list_versions(key: str):
    """
    Given a string key (proxyroot/test.txt), returns list of versions

    :return: list<dict>
    """
    S3 = boto3.resource("s3")
    versions = S3.Bucket(S3_BUCKET).object_versions.filter(Prefix=key)
    results = []
    for version in versions:
        obj = version.get()
        results.append(
            {"id": obj.get("VersionId"), "created": f'{obj.get("LastModified")}'}
        )
    return results


def sort_versions(versions: str):
    """
    Sort list of dictionaries based on created timestamp in the dictionary

    :return: list<dict>
    """
    return sorted(versions, key=lambda item: parser.parse(item["created"]))


def update_meta_file(folder, name, version_id):
    """
    Given a folder and name of version with version id. Saves the version into meta file

    :return: list<dict>
    """
    S3 = boto3.resource("s3")
    key = f"{folder}/meta.json"
    versions = get_meta_file(folder)
    versions.append({"name": name, "created": f"{datetime.now()}"})
    object = S3.Object(S3_BUCKET, key)
    object.put(Body=json.dumps(versions))
    return versions


def save_file(folder, file_name, content, name):
    """
    Given a folder, file_name and content and version name, saves the file in s3 and updates meta file

    :return: list<dict>
    """
    S3 = boto3.resource("s3")
    key = f"{folder}/{file_name}"
    object = S3.Object(S3_BUCKET, key)
    object.put(Body=content)
    versions = sort_versions(list_versions(key))
    return update_meta_file(folder, name, versions[-1]["id"])

Test

Let’s run the tests

$ poetryrun pytest test.py -s -vv
=============================================== test session starts ================================================
platform darwin -- Python 3.6.8rc1, pytest-5.4.2, py-1.8.1, pluggy-0.13.1 -- /Users/proxyroot/Library/Caches/pypoetry/virtualenvs/s3versioning-py3.6/bin/python
cachedir: .pytest_cache
rootdir: /Users/proxyroot/personal/s3versioning
plugins: mock-3.1.0
collected 4 items                                                                                                  

test.py::test_get_meta_file PASSED
test.py::test_list_versions_happy_path PASSED
test.py::test_sort_versions PASSED
test.py::test_save_file PASSED

================================================ 4 passed in 0.45s =================================================

Hooray! :tada:

This is exactly what we can do to persist the names of the versions in s3. No other persistence is required apart from what we already used here S3.

Alternative persistence layer

You can also accomplish this in a few other ways.

  • Github (It already has commit hash that you can use to get the file and save the file. You can also persist names in a gist or meta file in the same repo)
  • Google Buckets (Well by the name of it copy.deepcopy(“Amazon S3 Bucket”) == “Google Buckets”)

Well any service that provides versioning can be used but these are the things that I feel are more reliable, fast and easy to do.

Note

I haven’t implemented the get_file function. I’ll leave it for you to write for yourself. You can find the code for the above at s3versioning

Comments

Add a Comment