People Clusters in DynamoDB
Write Persons + Occurrences records to list people and fetch all photos where a person appears.
Goal
This phase builds the data behind the UI screen, which shows:
- People grid → list people with a thumbnail + photoCount
- Person detail → click a person → fetch all photos where they appear
We achieve this by writing two kinds of records:
- Persons table: one item per personId
- Occurrences table: one item per (personId, photoId) pair
What is a personId in this workshop?
We use Rekognition's FaceId as personId. That means if Rekognition says
“this is the same face,” you get the same personId.
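As a rough illustration of how a personId could be derived, the sketch below picks the best-matching FaceId from a dictionary shaped like the response of Rekognition's SearchFacesByImage (a FaceMatches list whose entries carry Similarity and Face.FaceId). The helper name pick_person_id and the 90.0 similarity threshold are assumptions made for this example, not part of the workshop code.

```python
from typing import Optional


def pick_person_id(search_response: dict, min_similarity: float = 90.0) -> Optional[str]:
    """Return the FaceId of the most similar match, or None if nothing qualifies."""
    matches = search_response.get("FaceMatches", [])
    # Keep only matches at or above the (assumed) similarity threshold.
    good = [m for m in matches if m.get("Similarity", 0.0) >= min_similarity]
    if not good:
        return None
    best = max(good, key=lambda m: m["Similarity"])
    return best["Face"]["FaceId"]


sample_response = {
    "FaceMatches": [
        {"Similarity": 91.2, "Face": {"FaceId": "face-b"}},
        {"Similarity": 98.7, "Face": {"FaceId": "face-a"}},
    ]
}
person_id = pick_person_id(sample_response)
no_match = pick_person_id({"FaceMatches": []})
```

When the helper returns None, the face has no known match, which is the case where a new face would be indexed and its new FaceId used as the personId.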
What changes in this phase
Until now, we processed photos and generated thumbnails. Now we also save the relationships so the data is queryable later.
Before this phase, you could:
- Detect faces
- Crop and upload thumbnails
- Search/Index faces (get personId)
But DynamoDB could not answer:
- “How many photos does this person appear in?”
- “Which photos belong to this person?”
For every detected face you now also write:
- Persons → increment photoCount + set a cover thumbnail (repThumbKey)
- Occurrences → store one row linking personId to every photoId it appears in
Environment variables
PERSONS_TABLE=Persons
OCCURRENCES_TABLE=Occurrences
Step 1: Add table handles
PERSONS_TABLE_NAME = os.environ.get("PERSONS_TABLE", "Persons")
OCCURRENCES_TABLE_NAME = os.environ.get("OCCURRENCES_TABLE", "Occurrences")
persons_table = ddb.Table(PERSONS_TABLE_NAME)
occ_table = ddb.Table(OCCURRENCES_TABLE_NAME)
Step 2: Store boundingBox safely
boto3 does not accept Python floats when writing to DynamoDB; numbers must be passed as Decimal.
So we convert the bounding box into a “DynamoDB friendly” map.
from decimal import Decimal
def as_decimal_bbox(bbox: dict) -> dict:
    return {
        "Left": Decimal(str(bbox.get("Left", 0.0))),
        "Top": Decimal(str(bbox.get("Top", 0.0))),
        "Width": Decimal(str(bbox.get("Width", 0.0))),
        "Height": Decimal(str(bbox.get("Height", 0.0))),
    }
Example Rekognition bbox:
{"Left": 0.12, "Top": 0.08, "Width": 0.30, "Height": 0.40}
Same values, but stored as Decimal so DynamoDB saves them reliably.
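The conversion goes through str() rather than passing the float straight to Decimal. The short sketch below shows the difference for one of the example values; it uses only the standard library and is meant as an illustration, not as workshop code.

```python
from decimal import Decimal

value = 0.12

# Building a Decimal straight from a float keeps the float's binary rounding error.
from_float = Decimal(value)

# Going through str() keeps the short decimal form you see when printing the float.
from_str = Decimal(str(value))

print(from_float == Decimal("0.12"))  # False
print(from_str == Decimal("0.12"))    # True
```

Storing the str()-based value means the number read back from DynamoDB matches what Rekognition appeared to return, instead of a long binary expansion.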
Why store boundingBox at all?
Later, a UI can draw a rectangle over the original photo, or you can re-crop using the same bbox.
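Because the bounding box values are ratios of the image size, re-cropping or drawing a rectangle means converting them back to pixels. A minimal sketch of that conversion follows. The function name bbox_to_pixel_box is hypothetical (the workshop's own bbox_to_pixels helper may behave differently), and the clamping to the image bounds is an assumption added for safety.

```python
from decimal import Decimal


def bbox_to_pixel_box(bbox: dict, img_w: int, img_h: int) -> tuple:
    """Convert a ratio-based bounding box into (x1, y1, x2, y2) pixel coordinates."""
    # Values read back from DynamoDB are Decimal, so convert to float first.
    left = float(bbox.get("Left", 0.0))
    top = float(bbox.get("Top", 0.0))
    width = float(bbox.get("Width", 0.0))
    height = float(bbox.get("Height", 0.0))

    # Clamp to the image so a box that spills past the edge still crops cleanly.
    x1 = max(0, round(left * img_w))
    y1 = max(0, round(top * img_h))
    x2 = min(img_w, round((left + width) * img_w))
    y2 = min(img_h, round((top + height) * img_h))
    return x1, y1, x2, y2


stored = {
    "Left": Decimal("0.12"),
    "Top": Decimal("0.08"),
    "Width": Decimal("0.30"),
    "Height": Decimal("0.40"),
}
pixel_box = bbox_to_pixel_box(stored, 1000, 500)
print(pixel_box)  # (120, 40, 420, 240)
```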
Step 3: Update Persons table
Create a new person if not found, else increment photoCount
What we want in Persons
- personId (PK)
- createdAt (set once)
- repThumbKey (the “cover photo”, set once)
- photoCount (increment each time we see this person in a new photo)
def upsert_person(person_id: str, rep_thumb_key: str, created_at: str):
    persons_table.update_item(
        Key={"personId": person_id},
        UpdateExpression=(
            "SET createdAt = if_not_exists(createdAt, :ca), "
            "repThumbKey = if_not_exists(repThumbKey, :rt) "
            "ADD photoCount :inc"
        ),
        ExpressionAttributeValues={
            ":ca": created_at,
            ":rt": rep_thumb_key,
            ":inc": Decimal(1),
        },
    )
Example:
- person_id=0c7d...
- rep_thumb_key=faces-thumbs/0c7d.../ab12_face_1.jpg
- created_at=2026-01-21T...
In Persons table:
- If the person is new → item is created
- createdAt is set once
- repThumbKey is set once
- photoCount increases by 1
Why use if_not_exists?
If the person appears again later, we do not want to overwrite their original
createdAt or change the “cover thumbnail”.
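To make the SET ... if_not_exists and ADD behaviour concrete, the sketch below imitates the update expression on a plain Python dict. It is only a local simulation for reasoning about the expression (no DynamoDB call is made), and simulate_upsert is a hypothetical name used for this example.

```python
def simulate_upsert(table: dict, person_id: str, rep_thumb_key: str, created_at: str) -> dict:
    """Imitate the UpdateExpression used by upsert_person on an in-memory dict."""
    # update_item creates the item when the key does not exist yet.
    item = table.setdefault(person_id, {"personId": person_id})
    # SET attr = if_not_exists(attr, :value): only written when the attribute is missing.
    item.setdefault("createdAt", created_at)
    item.setdefault("repThumbKey", rep_thumb_key)
    # ADD photoCount :inc: starts from 0 when the attribute is missing, then adds.
    item["photoCount"] = item.get("photoCount", 0) + 1
    return item


fake_table = {}
simulate_upsert(fake_table, "p1", "thumbs/first.jpg", "2026-01-21T10:00:00Z")
result = simulate_upsert(fake_table, "p1", "thumbs/second.jpg", "2026-01-22T09:30:00Z")
print(result)
```

After the second call, createdAt and repThumbKey still hold the values from the first call, while photoCount has moved from 1 to 2.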
Step 4: Write Occurrences table (person ↔ photo link)
Capture confidence
Before writing an item into the Occurrences table, extract the per-face confidence score from Rekognition's FaceDetails. You'll use this value later when you call write_occurrence(...).
thumb_keys = []
for idx, fd in enumerate(face_details, start=1):
    bbox = fd.get("BoundingBox", {})
    confidence = fd.get("Confidence")
    x1, y1, x2, y2 = bbox_to_pixels(bbox, img_w, img_h)
    face_im = im.crop((x1, y1, x2, y2))
What we want in Occurrences
One row per person per photo:
- personId (Partition Key)
- photoId (Sort Key)
- photoBucket, photoKey
- thumbKey
- boundingBox
- confidence
def write_occurrence(
    person_id: str,
    photo_id: str,
    photo_bucket: str,
    photo_key: str,
    thumb_key: str,
    bbox: dict,
    confidence: float | None,
):
    item = {
        "personId": person_id,
        "photoId": photo_id,
        "photoBucket": photo_bucket,
        "photoKey": photo_key,
        "thumbKey": thumb_key,
        "boundingBox": as_decimal_bbox(bbox),
    }
    # Only store confidence when Rekognition returned one, converted to
    # Decimal because boto3 does not accept floats
    if confidence is not None:
        item["confidence"] = Decimal(str(confidence))
    # put_item overwrites any existing row for the same (personId, photoId)
    occ_table.put_item(Item=item)
Example values:
- personId=0c7d...
- photoId=fe40...
- photoKey=photos-raw/st5.jpg
- thumbKey=faces-thumbs/0c7d.../fe40..._face_2.jpg
In Occurrences table:
- Query by personId → returns all photos where that person appears
- Each row links the person to one photo + one thumbnail
Why make Occurrences a separate table?
Because “Photos for a person” becomes a clean DynamoDB query: PK = personId
→ returns all related photoId values without scanning the table.
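A query by partition key could look roughly like the sketch below. It assumes occ_table behaves like the boto3 Table resource used in this workshop, whose query method accepts KeyConditionExpression, ExpressionAttributeValues and ExclusiveStartKey and returns Items plus an optional LastEvaluatedKey. The function name photos_for_person and the FakeOccurrencesTable stand-in are hypothetical; the stand-in exists only so the example runs without AWS.

```python
def photos_for_person(occ_table, person_id: str) -> list:
    """Collect every Occurrences item for one personId, following pagination."""
    items = []
    kwargs = {
        "KeyConditionExpression": "personId = :pid",
        "ExpressionAttributeValues": {":pid": person_id},
    }
    while True:
        page = occ_table.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Resume the query where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


class FakeOccurrencesTable:
    """Local stand-in for a DynamoDB table that returns two pages of results."""

    def query(self, **kwargs):
        if "ExclusiveStartKey" not in kwargs:
            return {
                "Items": [{"personId": "p1", "photoId": "a"}],
                "LastEvaluatedKey": {"personId": "p1", "photoId": "a"},
            }
        return {"Items": [{"personId": "p1", "photoId": "b"}]}


photo_ids = [row["photoId"] for row in photos_for_person(FakeOccurrencesTable(), "p1")]
print(photo_ids)  # ['a', 'b']
```

The pagination loop matters once a person appears in many photos, because a single query response is capped in size and the remaining rows arrive on later pages.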
Step 5: Call these functions
- Add all the function definitions before lambda_handler
- Once you have person_id and thumb_key for a face, add these two writes at the end of the face loop:
  - update/create the person
  - write the occurrence row
# continuing from the face loop...
thumb_keys.append(thumb_key)
# increment photoCount (for "most frequent" People grid)
# set repThumbKey once
upsert_person(
    person_id=person_id,
    rep_thumb_key=thumb_key,
    created_at=now_iso,
)
# record that this person appears in this photo
write_occurrence(
    person_id=person_id,
    photo_id=photo_id,
    photo_bucket=bucket,
    photo_key=key,
    thumb_key=thumb_key,
    bbox=bbox,
    confidence=confidence,
)
print(f"Wrote Persons/Occurrences for photoId={photo_id}")
Test
Persons table
Upload 2 photos with the same 3 people.
In DynamoDB → Persons:
- You should still see 3 items (not 6)
- photoCount should increase (e.g., 1 → 2)
- repThumbKey should be present
Occurrences table
Pick one personId and query by it.
In DynamoDB → Occurrences:
- Partition key = that personId
- You should see 2 rows with different photoId values
Common questions
We use put_item for Occurrences. If the same person-photo pair is processed
again, overwriting is okay in this workshop because it keeps the latest stored
metadata.
The People grid needs a “most frequent” ordering. Incrementing photoCount
gives you that number without scanning the entire Occurrences table.
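As an illustration of the “most frequent” ordering, the sketch below sorts Persons items by photoCount in application code, since DynamoDB does not sort by a non-key attribute on its own. It assumes the items have already been read from the Persons table (for a small workshop table, a scan would do); most_frequent_people and the limit of 20 are hypothetical choices for this example.

```python
from decimal import Decimal


def most_frequent_people(person_items: list, limit: int = 20) -> list:
    """Order Persons items by photoCount, highest first."""
    ranked = sorted(person_items, key=lambda item: item.get("photoCount", 0), reverse=True)
    return ranked[:limit]


# photoCount comes back from boto3 as Decimal, which sorts like any other number.
people = [
    {"personId": "p1", "photoCount": Decimal(2)},
    {"personId": "p2", "photoCount": Decimal(7)},
    {"personId": "p3", "photoCount": Decimal(4)},
]
ordered_ids = [p["personId"] for p in most_frequent_people(people)]
print(ordered_ids)  # ['p2', 'p3', 'p1']
```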
The UI needs one thumbnail to represent the person.
We pick the first one we ever stored and keep it stable using if_not_exists.
Suggestions (optional)
- If you later notice photoCount is increasing too fast, it usually means the same photo is being reprocessed under a different key (or idempotency isn't working).
- If you want “one occurrence per face” (instead of per person-photo), you can add a third key such as faceIndex later, but for beginners, person-photo is simpler.
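One possible way to guard photoCount against reprocessing is to make the Occurrences write conditional and only increment the count when the row is new. The sketch below is an optional variation, not the workshop's code: write_occurrence_once is a hypothetical helper, and the fake table and fake error exist only so the example runs without AWS. It relies on put_item accepting a ConditionExpression and on the failed condition surfacing as an error whose response carries the code ConditionalCheckFailedException.

```python
def write_occurrence_once(occ_table, item: dict) -> bool:
    """Write the occurrence only if this (personId, photoId) pair is not stored yet.

    Returns True when the row was created, False when it already existed.
    """
    try:
        occ_table.put_item(
            Item=item,
            # Fails the write when an item with the same key already exists.
            ConditionExpression="attribute_not_exists(personId)",
        )
        return True
    except Exception as exc:
        # boto3 raises botocore's ClientError, which carries the error code in .response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise


class FakeConditionalError(Exception):
    """Stand-in for the error raised when the condition fails."""
    response = {"Error": {"Code": "ConditionalCheckFailedException"}}


class FakeTable:
    """Stand-in table that rejects a second write for the same key."""

    def __init__(self):
        self.rows = {}

    def put_item(self, Item, ConditionExpression=None):
        key = (Item["personId"], Item["photoId"])
        if key in self.rows:
            raise FakeConditionalError()
        self.rows[key] = Item


table = FakeTable()
row = {"personId": "p1", "photoId": "a"}
first = write_occurrence_once(table, row)
second = write_occurrence_once(table, row)
print(first, second)  # True False
```

With this variation, the caller would run upsert_person only when the helper returns True, so reprocessing the same photo leaves photoCount unchanged. The trade-off is that a reprocessed photo no longer refreshes the stored occurrence metadata.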