Beetroot

Photo Idempotency in DynamoDB

Create a stable photoId from S3 uploads and write exactly one record per photo into DynamoDB.

Goal

When a photo is uploaded to s3://beetroot-raw/photos-raw/, the ingestion Lambda should:

  1. Extract bucket and key from the S3 event
  2. Ignore any uploads not under photos-raw/
  3. Compute a stable photoId
  4. Write one item into the Photos table
  5. Avoid double-processing if the same upload triggers more than once

Why idempotency matters

S3 events can sometimes be delivered more than once. If we write to DynamoDB without protection, we may create duplicates and corrupt our data.

We’ll solve this by using:

  • A deterministic photoId (same file path → same ID)
  • A conditional DynamoDB write (attribute_not_exists(photoId))

How we avoid duplicates (idempotency)

We make uploads safe to retry by doing two things:

  • Deterministic photoId: the ID is generated from bucket + key, so the same S3 path always produces the same photoId.

  • Conditional write: DynamoDB writes the item only if it doesn’t already exist, enforced with attribute_not_exists(photoId). If the same upload triggers again, we skip it instead of creating a duplicate.

Add Environment Variables (Lambda Console)

Go to Lambda → beetroot-ingest → Configuration → Environment variables and add:

  • PHOTOS_TABLE = Photos
  • RAW_PREFIX = photos-raw/

Why add environment variables?

Hard-coding resource names makes code harder to reuse. Environment variables keep it clean and configurable.
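As a quick illustration of the fallback behavior (using hypothetical variable names here so the demo can't collide with real configuration):

```python
import os

# os.environ.get returns the default when the variable is not set,
# so the code still runs with sensible values in a fresh environment.
# PHOTOS_TABLE_DEMO and RAW_PREFIX_DEMO are made-up names for this demo.
table = os.environ.get("PHOTOS_TABLE_DEMO", "Photos")
prefix = os.environ.get("RAW_PREFIX_DEMO", "photos-raw/")
print(table, prefix)  # Photos photos-raw/
```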

Lambda Code

What this code does

  • It reads the S3 event
  • Generates a stable photoId
  • Inserts a record into DynamoDB only if it doesn’t already exist

Part 1: Imports

This section brings in everything we need:

  • standard Python utilities (JSON, hashing, timestamps)
  • URL decoding for S3 keys
  • AWS SDK (boto3) + error type for clean handling

import json
import os
import hashlib
from datetime import datetime, timezone
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError

Why unquote_plus?

S3 object keys in event payloads can be URL-encoded (for example, spaces may appear as +). We decode them so we always hash and store the real path.
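For example, here is how unquote_plus restores an encoded key (the filename is made up for illustration):

```python
from urllib.parse import unquote_plus

# "+" becomes a space and %XX escapes are decoded,
# giving back the real object path.
encoded = "photos-raw/family+photo%281%29.jpg"
decoded = unquote_plus(encoded)
print(decoded)  # photos-raw/family photo(1).jpg
```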

Part 2: AWS clients + env vars

We create the DynamoDB resource once (outside the handler) so it can be reused across invocations.

We also read configuration from environment variables (with safe defaults).

ddb = boto3.resource("dynamodb")

# Use env var if present, otherwise default to "Photos"
PHOTOS_TABLE = os.environ.get("PHOTOS_TABLE", "Photos")
RAW_PREFIX = os.environ.get("RAW_PREFIX", "photos-raw/")

photos_table = ddb.Table(PHOTOS_TABLE)

Part 3: photoId (Idempotency key)

We generate a stable ID based on the S3 object path. Same bucket + key will always produce the same ID.

def make_photo_id(bucket: str, key: str) -> str:
    """
    Deterministic, stable photoId.
    We use SHA-256(bucket/key) but keep only first 20 hex chars (short + stable).
    """
    raw = f"{bucket}/{key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:20]

Why this matters

S3 can sometimes trigger the same upload more than once. A deterministic photoId lets us detect “this is the same file again”.

raw = f"{bucket}/{key}".encode("utf-8")
photo_id = hashlib.sha256(raw).hexdigest()[:20]

Example:

  • bucket = "beetroot-raw"
  • key = "photos-raw/group1.jpg"

Example output:

  • photoId = "a1f09c2b7e3d4a91b6c2"

It will be the same every time for the same bucket/key.
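You can verify the determinism locally; the helper is reproduced here so the snippet stands alone:

```python
import hashlib


def make_photo_id(bucket: str, key: str) -> str:
    # Same helper as in the Lambda: SHA-256 of "bucket/key",
    # truncated to the first 20 hex characters.
    raw = f"{bucket}/{key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:20]


a = make_photo_id("beetroot-raw", "photos-raw/group1.jpg")
b = make_photo_id("beetroot-raw", "photos-raw/group1.jpg")
print(a == b)   # True: same input, same ID
print(len(a))   # 20
```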

Part 4: Handler entry + read event records

The Lambda handler receives the event payload. For S3 triggers, it includes a list called Records.

def lambda_handler(event, context):
    records = event.get("Records", [])
    if not records:
        print("No Records found; nothing to do.")
        return {"statusCode": 200, "body": "no records"}

Why we check Records

  • If the function is triggered manually (Test button), there may be no S3 records.
  • This avoids errors and keeps the handler safe.
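A stripped-down stand-in for that guard (handle is a hypothetical name for this demo; the real handler performs the same check):

```python
def handle(event):
    # Bail out cleanly when the payload has no S3 records,
    # e.g. when the function is invoked from the console Test button.
    records = event.get("Records", [])
    if not records:
        return {"statusCode": 200, "body": "no records"}
    return {"statusCode": 200, "body": f"{len(records)} record(s)"}


print(handle({}))  # {'statusCode': 200, 'body': 'no records'}
```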

Part 5: Parse bucket + key

We loop through each record (sometimes there can be more than one). Then we extract the bucket and object key.

for r in records:
    s3 = r.get("s3", {})
    bucket = s3.get("bucket", {}).get("name")
    key = s3.get("object", {}).get("key")

    if not bucket or not key:
        print("Skipping record: missing bucket/key")
        continue

    # S3 keys in events are URL-encoded sometimes
    key = unquote_plus(key)

    # Only process uploads under photos-raw/
    if not key.startswith(RAW_PREFIX):
        print(f"Skipping key not under RAW_PREFIX: {key}")
        continue
bucket = s3.get("bucket", {}).get("name")
key = s3.get("object", {}).get("key")
key = unquote_plus(key)

if not key.startswith(RAW_PREFIX):
    continue

Example event record:

  • bucket name: beetroot-raw
  • key: photos-raw/family+photo.jpg

After decoding:

  • key becomes: photos-raw/family photo.jpg

Prefix check:

  • passes because it starts with photos-raw/
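The same walkthrough as runnable code:

```python
from urllib.parse import unquote_plus

RAW_PREFIX = "photos-raw/"

# The example key from above: "+" decodes to a space.
key = unquote_plus("photos-raw/family+photo.jpg")
print(key)                           # photos-raw/family photo.jpg
print(key.startswith(RAW_PREFIX))    # True: gets processed

# A key under a different prefix (made up for illustration) is skipped.
other = unquote_plus("thumbnails/family+photo.jpg")
print(other.startswith(RAW_PREFIX))  # False: gets skipped
```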

Part 6: Build the DynamoDB item

We create a record to store metadata about the uploaded photo.

photo_id = make_photo_id(bucket, key) 
uploaded_at = datetime.now(timezone.utc).isoformat()

item = {
    "photoId": photo_id,
    "s3Bucket": bucket,
    "s3Key": key,
    "uploadedAt": uploaded_at,
}

What gets stored

  • photoId: stable ID (for idempotency)
  • s3Bucket + s3Key: where the photo lives
  • uploadedAt: timestamp for debugging and ordering
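The resulting item serializes to something like the following (the photoId value is the illustrative one from the earlier example, and the timestamp changes on each run):

```python
import json
from datetime import datetime, timezone

item = {
    "photoId": "a1f09c2b7e3d4a91b6c2",  # illustrative value
    "s3Bucket": "beetroot-raw",
    "s3Key": "photos-raw/group1.jpg",
    "uploadedAt": datetime.now(timezone.utc).isoformat(),
}
print(json.dumps(item, indent=2, sort_keys=True))
```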

Part 7: Conditional write

This is where we prevent duplicates.

We write the item only if it does not already exist.

try:
    photos_table.put_item(
        Item=item,
        ConditionExpression="attribute_not_exists(photoId)", 
    )
    print(f"Photos: inserted photoId={photo_id} key={key}")
except ClientError as e:
    code = e.response.get("Error", {}).get("Code", "Unknown")
    if code == "ConditionalCheckFailedException":
        print(f"Photos: already exists, skipping photoId={photo_id} key={key}")
        continue
    print("DynamoDB put_item failed:", str(e))
    raise
  1. First upload (record does not exist yet)
photos_table.put_item(
    Item=item,
    ConditionExpression="attribute_not_exists(photoId)",
)
  • bucket = "beetroot-raw"
  • key = "photos-raw/group1.jpg"
  • DynamoDB write succeeds
  • Logs show: inserted
  2. Duplicate upload (same photo triggers again)
if code == "ConditionalCheckFailedException":
    print("already exists, skipping")
  • bucket = "beetroot-raw"
  • key = "photos-raw/group1.jpg"

The same S3 key triggers again, so we attempt to write an item with the same photoId, which already exists in DynamoDB.

  • DynamoDB throws ConditionalCheckFailedException
  • We treat it as a normal "skip" (not a crash)
  • Logs show: already exists
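The two scenarios can be simulated locally with a plain dict standing in for the table; this only illustrates the attribute_not_exists semantics, it is not the DynamoDB API:

```python
table = {}  # stands in for the Photos table, keyed by photoId


def conditional_put(item):
    # Mimic ConditionExpression="attribute_not_exists(photoId)":
    # insert only when the key is absent, otherwise skip.
    if item["photoId"] in table:
        return "already exists, skipping"
    table[item["photoId"]] = item
    return "inserted"


item = {"photoId": "a1f09c2b7e3d4a91b6c2", "s3Key": "photos-raw/group1.jpg"}
print(conditional_put(item))  # inserted
print(conditional_put(item))  # already exists, skipping
```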

Part 8: Return response

Finally, we return a normal success response.

return {"statusCode": 200, "body": "ingest lambda with s3 trigger ok"}

Why always return 200?

For S3 triggers, Lambda retries on errors. Returning success after handling duplicates avoids unnecessary retries.

Full Lambda Code

Paste this code into your Lambda function, then click Deploy to save it.

beetroot-ingest/lambda_function.py
import json
import os
import hashlib
from datetime import datetime, timezone
from urllib.parse import unquote_plus

import boto3
from botocore.exceptions import ClientError


ddb = boto3.resource("dynamodb")

# Use env var if present, otherwise default to "Photos"
PHOTOS_TABLE = os.environ.get("PHOTOS_TABLE", "Photos")
RAW_PREFIX = os.environ.get("RAW_PREFIX", "photos-raw/")

photos_table = ddb.Table(PHOTOS_TABLE)


def make_photo_id(bucket: str, key: str) -> str:
    """
    Deterministic, stable photoId.
    We use SHA-256(bucket/key) but keep only first 20 hex chars (short + stable).
    """
    raw = f"{bucket}/{key}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:20]


def lambda_handler(event, context):
    records = event.get("Records", [])
    if not records:
        print("No Records found; nothing to do.")
        return {"statusCode": 200, "body": "no records"}

    for r in records:
        s3 = r.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")

        if not bucket or not key:
            print("Skipping record: missing bucket/key")
            continue

        # S3 keys in events are URL-encoded sometimes
        key = unquote_plus(key)

        # Only process uploads under photos-raw/
        if not key.startswith(RAW_PREFIX):
            print(f"Skipping key not under RAW_PREFIX: {key}")
            continue

        photo_id = make_photo_id(bucket, key)
        uploaded_at = datetime.now(timezone.utc).isoformat()

        item = {
            "photoId": photo_id,
            "s3Bucket": bucket,
            "s3Key": key,
            "uploadedAt": uploaded_at,
        }

        try:
            photos_table.put_item(
                Item=item,
                ConditionExpression="attribute_not_exists(photoId)",
            )
            print(f"Photos: inserted photoId={photo_id} key={key}")
        except ClientError as e:
            code = e.response.get("Error", {}).get("Code", "Unknown")
            if code == "ConditionalCheckFailedException":
                print(f"Photos: already exists, skipping photoId={photo_id} key={key}")
                continue
            print("DynamoDB put_item failed:", str(e))
            raise

    return {"statusCode": 200, "body": "ingest lambda with s3 trigger ok"}

Step 3: Test (two quick runs)

Upload a new file (insert)

Upload one new photo:

aws s3 cp ./v2-test-photos/group2.jpg s3://beetroot-raw/photos-raw/.jpg --region us-east-1

In CloudWatch logs, you should see:

  • Photos: inserted photoId=...

Upload same key again (skip)

Upload the same key again (same destination path):

aws s3 cp ./v2-test-photos/group2.jpg s3://beetroot-raw/photos-raw/.jpg --region us-east-1

In logs, you should see:

  • Photos: already exists, skipping ...

Where to confirm the record was written

Go to DynamoDB → Tables → Photos → Explore items and confirm a new item exists with:

  • photoId
  • s3Bucket
  • s3Key
  • uploadedAt

Common mistakes

If you see errors such as a missing table name, ensure:

  • PHOTOS_TABLE exists in the Lambda environment variables
  • Your code uses a safe fallback: os.environ.get("PHOTOS_TABLE", "Photos")

If the Lambda logs show “Skipping key not under RAW_PREFIX”, confirm:

  • The upload path starts with photos-raw/
  • RAW_PREFIX is set correctly (or left as default)

A full SHA-256 hex is 64 chars. We intentionally shorten it using:

  • hashlib.sha256(...).hexdigest()[:20]

This keeps the ID deterministic while remaining easy to read in DynamoDB and logs.
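A quick check of the lengths:

```python
import hashlib

# Full SHA-256 digest vs. the shortened photoId.
full = hashlib.sha256(b"beetroot-raw/photos-raw/group1.jpg").hexdigest()
print(len(full))       # 64 hex chars for the full digest
print(len(full[:20]))  # 20 hex chars after shortening
```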
