People Clusters in DynamoDB

Write Persons + Occurrences records to list people and fetch all photos where a person appears.

Goal

This phase provides the data behind the People UI, which shows:

People grid → list people with a thumbnail + photoCount
Person detail → click a person → fetch all photos where they appear

We achieve this by writing two kinds of records:

Persons table: one item per personId
Occurrences table: one item per (personId, photoId) pair

What is a personId in this workshop?

We use Rekognition's FaceId as personId. That means if Rekognition says “this is the same face,” you get the same personId.

What changes in this phase

Until now, we were processing photos and generating thumbs. Now we also save the relationships so data is queryable later.

You could:

Detect faces
Crop and upload thumbnails
Search/Index faces (get personId)

But DynamoDB didn't store:

“How many photos does this person appear in?”
“Which photos belong to this person?”

For every detected face you now also write:

Persons → increment photoCount + set a cover thumbnail (repThumbKey)
Occurrences → store one row per photo, linking the personId to each photoId it appears in

Environment variables

PERSONS_TABLE = Persons
OCCURRENCES_TABLE = Occurrences

Step 1: Add table handles

PERSONS_TABLE_NAME = os.environ.get("PERSONS_TABLE", "Persons")
OCCURRENCES_TABLE_NAME = os.environ.get("OCCURRENCES_TABLE", "Occurrences")

persons_table = ddb.Table(PERSONS_TABLE_NAME)
occ_table = ddb.Table(OCCURRENCES_TABLE_NAME)

Step 2: Store `boundingBox` safely

boto3 does not accept Python floats for DynamoDB number attributes; it expects Decimal. So we convert the bounding box into a “DynamoDB friendly” map.

from decimal import Decimal

def as_decimal_bbox(bbox: dict) -> dict:
    return {
        "Left": Decimal(str(bbox.get("Left", 0.0))),
        "Top": Decimal(str(bbox.get("Top", 0.0))),
        "Width": Decimal(str(bbox.get("Width", 0.0))),
        "Height": Decimal(str(bbox.get("Height", 0.0))),
    }

Example Rekognition bbox:

{"Left": 0.12, "Top": 0.08, "Width": 0.30, "Height": 0.40}

Same values, but stored as Decimal so DynamoDB saves them reliably.
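
To see why the helper passes each value through str() first, the short sketch below compares the two ways of building a Decimal from a float. It uses only the standard library, and the sample value 0.12 is taken from the example bbox above.

```python
from decimal import Decimal

value = 0.12  # a Python float, like the ratios in a Rekognition bbox

# Converting the float directly carries over its binary rounding error.
direct = Decimal(value)

# Going through str() keeps the short decimal form you actually want to store.
via_str = Decimal(str(value))

print(direct == Decimal("0.12"))   # False
print(via_str == Decimal("0.12"))  # True
print(via_str)                     # 0.12
```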

Why store boundingBox at all?

Later, a UI can draw a rectangle over the original photo, or you can re-crop using the same bbox.
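
As a rough illustration of that later use, the sketch below turns a stored bbox back into pixel coordinates. The helper name stored_bbox_to_pixels and the 1000 x 800 image size are assumptions made for this example; they are not part of the workshop code.

```python
from decimal import Decimal

def stored_bbox_to_pixels(bbox: dict, img_w: int, img_h: int) -> tuple:
    """Convert a ratio-based bbox (Decimal values) into a pixel box (x1, y1, x2, y2)."""
    # Decimal * int stays exact, so no float rounding sneaks in here.
    x1 = int(bbox["Left"] * img_w)
    y1 = int(bbox["Top"] * img_h)
    x2 = int((bbox["Left"] + bbox["Width"]) * img_w)
    y2 = int((bbox["Top"] + bbox["Height"]) * img_h)
    # Clamp to the image, in case a box extends slightly past an edge.
    x1, y1 = max(0, x1), max(0, y1)
    x2, y2 = min(img_w, x2), min(img_h, y2)
    return x1, y1, x2, y2

# Hypothetical item as it would come back from DynamoDB (numbers are Decimal).
stored = {
    "Left": Decimal("0.12"),
    "Top": Decimal("0.08"),
    "Width": Decimal("0.30"),
    "Height": Decimal("0.40"),
}
print(stored_bbox_to_pixels(stored, 1000, 800))  # (120, 64, 420, 384)
```

The returned tuple has the same (x1, y1, x2, y2) shape that the face loop passes to im.crop, so the same box can be used to re-crop or to draw a rectangle.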

Step 3: Update `Persons` table

Create the person if it does not exist yet; in both cases, increment photoCount

What we want in `Persons`

personId (PK)
createdAt (set once)
repThumbKey (the “cover photo” set once)
photoCount (increment each time we see this person in a new photo)

def upsert_person(person_id: str, rep_thumb_key: str, created_at: str):
    persons_table.update_item(
        Key={"personId": person_id},
        UpdateExpression=(
            "SET createdAt = if_not_exists(createdAt, :ca), "
            "repThumbKey = if_not_exists(repThumbKey, :rt) "
            "ADD photoCount :inc"
        ),
        ExpressionAttributeValues={
            ":ca": created_at,
            ":rt": rep_thumb_key,
            ":inc": Decimal(1),
        },
    )

Example:

person_id = 0c7d...
rep_thumb_key = faces-thumbs/0c7d.../ab12_face_1.jpg
created_at = 2026-01-21T...

In Persons table:

If the person is new → item is created
createdAt is set once
repThumbKey is set once
photoCount increases by 1

Why use if_not_exists?

If the person appears again later, we do not want to overwrite their original createdAt or change the “cover thumbnail”.
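
The behavior of that update expression can be mimicked locally with a plain dictionary. This is only a simulation to make the semantics visible, not how DynamoDB is implemented; simulate_upsert and the sample ids and keys are made up for the example.

```python
def simulate_upsert(store: dict, person_id: str, rep_thumb_key: str, created_at: str) -> dict:
    """Mimic the Persons update expression on an in-memory dict."""
    item = store.setdefault(person_id, {"personId": person_id})
    item.setdefault("createdAt", created_at)        # if_not_exists(createdAt, :ca)
    item.setdefault("repThumbKey", rep_thumb_key)   # if_not_exists(repThumbKey, :rt)
    item["photoCount"] = item.get("photoCount", 0) + 1  # ADD photoCount :inc
    return item

store = {}
simulate_upsert(store, "p1", "thumbs/a.jpg", "2026-01-21T10:00:00Z")
simulate_upsert(store, "p1", "thumbs/b.jpg", "2026-01-22T09:00:00Z")

print(store["p1"]["createdAt"])    # 2026-01-21T10:00:00Z (kept from the first call)
print(store["p1"]["repThumbKey"])  # thumbs/a.jpg (kept from the first call)
print(store["p1"]["photoCount"])   # 2
```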

Step 4: Write `Occurrences` table (person ↔ photo link)

Capture `confidence`

Before writing an item into the Occurrences table, extract the per-face confidence score from Rekognition's FaceDetails. You'll use this value later when you call write_occurrence(...).

thumb_keys = []

for idx, fd in enumerate(face_details, start=1):
    bbox = fd.get("BoundingBox", {})
    confidence = fd.get("Confidence")

    x1, y1, x2, y2 = bbox_to_pixels(bbox, img_w, img_h)

    face_im = im.crop((x1, y1, x2, y2))

What we want in `Occurrences`

One row per person per photo:

personId (Partition Key)
photoId (Sort Key)
photoBucket, photoKey
thumbKey
boundingBox
confidence

def write_occurrence(
    person_id: str,
    photo_id: str,
    photo_bucket: str,
    photo_key: str,
    thumb_key: str,
    bbox: dict,
    confidence: float | None,
):
    item = {
        "personId": person_id,
        "photoId": photo_id,
        "photoBucket": photo_bucket,
        "photoKey": photo_key,
        "thumbKey": thumb_key,
        "boundingBox": as_decimal_bbox(bbox),
    }
    # boto3 rejects raw floats, so store confidence only as a Decimal,
    # and leave the attribute out entirely when there is no score.
    if confidence is not None:
        item["confidence"] = Decimal(str(confidence))

    occ_table.put_item(Item=item)

Example values:

personId = 0c7d...
photoId = fe40...
photoKey = photos-raw/st5.jpg
thumbKey = faces-thumbs/0c7d.../fe40..._face_2.jpg

In Occurrences table:

Query by personId → returns all photos in which that person appears
Each row links the person to one photo + one thumbnail

Why make Occurrences a separate table?

Because “Photos for a person” becomes a clean DynamoDB query: PK = personId → returns all related photoIds fast.
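
A minimal sketch of that query is shown below. The function name photos_for_person is an assumption for this example; it expects a boto3 Table resource such as occ_table and follows LastEvaluatedKey so that results spanning several pages are all returned.

```python
def photos_for_person(table, person_id: str) -> list:
    """Return every Occurrences item for one personId, following pagination."""
    items = []
    kwargs = {
        "KeyConditionExpression": "personId = :pid",
        "ExpressionAttributeValues": {":pid": person_id},
    }
    while True:
        resp = table.query(**kwargs)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            return items
        # More results remain: continue from where the last page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

In the Lambda or a notebook this would be called as photos_for_person(occ_table, person_id); each returned item carries the photoKey and thumbKey the Person detail screen needs.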

Step 5: Call these functions

Add all the function definitions before lambda_handler.
Once you have person_id and thumb_key for a face, add these two writes at the end of the face loop:
- update/create the person
- write the occurrence row

    # continuing from the face loop...
    thumb_keys.append(thumb_key)

    # increment photoCount (for "most frequent" People grid)
    # set repThumbKey once
    upsert_person(
        person_id=person_id,
        rep_thumb_key=thumb_key,
        created_at=now_iso,
    )

    # record that this person appears in this photo
    write_occurrence(
        person_id=person_id,
        photo_id=photo_id,
        photo_bucket=bucket,
        photo_key=key,
        thumb_key=thumb_key,
        bbox=bbox,
        confidence=confidence,
    )

print(f"Wrote Persons/Occurrences for photoId={photo_id}")

Test

`Persons` table

Upload 2 photos with the same 3 people.

In DynamoDB → Persons:

You should still see 3 items (not 6)
photoCount should increase (e.g., 1 → 2)
repThumbKey should be present

`Occurrences` table

Pick one personId and query by it.

In DynamoDB → Occurrences:

Partition key = that personId
You should see 2 rows with different photoId

Common questions

We use put_item for Occurrences. If the same person-photo pair is processed again, overwriting is okay in this workshop because it keeps the latest stored metadata.

The People grid needs a “most frequent” ordering. Incrementing photoCount gives you that number without scanning the entire Occurrences table.

The UI needs one thumbnail to represent the person. We pick the first one we ever stored and keep it stable using if_not_exists.
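
For the “most frequent” ordering, one simple approach at workshop scale is to read the Persons table and sort by photoCount in Python. The sketch below assumes a boto3 Table resource such as persons_table; most_frequent_people and its limit parameter are names chosen for this example.

```python
def most_frequent_people(table, limit: int = 20) -> list:
    """Read all Persons items and return the ones with the highest photoCount first."""
    people = []
    kwargs = {}
    while True:
        resp = table.scan(**kwargs)
        people.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if not last_key:
            break
        # The scan was paginated: fetch the next page.
        kwargs["ExclusiveStartKey"] = last_key
    people.sort(key=lambda p: p.get("photoCount", 0), reverse=True)
    return people[:limit]
```

This scans Persons (one small item per person), not Occurrences, which is why keeping photoCount on the person item pays off. For a much larger dataset a different access pattern would likely be needed.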

Suggestions (optional)

If you later notice photoCount is increasing too fast, it usually means the same photo is being reprocessed under a different key (or idempotency isn't working).
If you want “one occurrence per face” (instead of per person-photo), you can later fold a face index into the sort key (for example photoId#faceIndex), since a DynamoDB table key has only a partition key and a sort key. For beginners, person-photo is simpler.
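
One possible guard against double counting when the same person-photo pair is reprocessed is a conditional write: create the Occurrences row only if it does not exist yet, and increment photoCount only when the row was new. The sketch below is an assumption-laden illustration, not the workshop's code. write_occurrence_once is a hypothetical helper, and the error check is duck-typed to keep the example dependency-free; with boto3 the error raised is botocore.exceptions.ClientError.

```python
def write_occurrence_once(occ_table, item: dict) -> bool:
    """Write the item only if its key is new. Return True if written, False if it existed."""
    try:
        occ_table.put_item(
            Item=item,
            # Fails when an item with the same personId + photoId already exists.
            ConditionExpression="attribute_not_exists(personId)",
        )
        return True
    except Exception as err:
        # boto3 errors expose the service error code under response["Error"]["Code"].
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
```

With this variant you would write the occurrence first and call upsert_person only when the function returns True, which is the reverse of the order used in Step 5.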
