How Let's Mack Works

Let's Mack generates a 3–5 second AI kissing video from your photo in 30–60 seconds. This page explains in plain language what happens during that time: what the AI does, what it doesn't do, and why we're not a deepfake tool. For the safety and policy framing, see Trust & Safety; for definitions, see the Glossary.

The 4-step pipeline

Step 1: Photo upload + screening

You upload one photo (single-photo mode) or two photos (Two-Photo Mode). Before any generation runs, every uploaded image passes through a screening layer that checks for minors, explicit content, and policy violations. A photo that fails screening is rejected immediately and is not stored.
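The screening gate can be pictured as a check that runs before anything else and never keeps a rejected upload. The sketch below is illustrative only: `ScreeningResult`, `screen_upload`, `handle_upload`, and the boolean flags standing in for classifier outputs are hypothetical names, not Let's Mack's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScreeningResult:
    accepted: bool
    reason: Optional[str] = None  # set only when the upload is rejected


def screen_upload(has_minor: bool, is_explicit: bool, violates_policy: bool) -> ScreeningResult:
    """Hypothetical gate: the three flags stand in for real classifier outputs."""
    if has_minor:
        return ScreeningResult(False, "minor detected")
    if is_explicit:
        return ScreeningResult(False, "explicit content")
    if violates_policy:
        return ScreeningResult(False, "policy violation")
    return ScreeningResult(True)


def handle_upload(photo: bytes, working_store: list, **flags: bool) -> ScreeningResult:
    """Screen first; only an accepted photo is handed on for generation."""
    result = screen_upload(**flags)
    if result.accepted:
        # Assumption: a temporary working buffer used during generation.
        working_store.append(photo)
    return result
```

A rejected photo never reaches the working buffer, which mirrors the "rejected immediately and not stored" behaviour described above.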

Step 2: Face detection & analysis

The AI identifies and maps the faces in the uploaded photo. For single-photo mode, it locates two faces in one image. For Two-Photo Mode, it identifies one face per photo and prepares them for composition. Mapping includes pose estimation, gaze direction, lighting analysis, and skin-tone matching.

Step 3: Motion generation conditioned on style prompt

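This is the core of the Let's Mack model. The AI generates realistic facial motion (lip movement, head tilting, slight body movement) conditioned on the prompt for the style you chose. In Two-Photo Mode, a composition step first places both faces into a shared scene.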

Each style is its own prompt — Movie Kiss tells the model to render dramatic Hollywood angles and warm lighting; NPC Energy tells it to render stiff video-game motion. That's why each Let's Mack style looks completely different rather than being a "filter" applied to a fixed kiss.
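One way to picture "each style is its own prompt" is a lookup from style name to prompt text, where every generation request carries a different prompt. The prompt wording below is taken from the FAQ on this page; the dictionary layout, the `build_generation_request` function, and the request shape are assumptions made for illustration.

```python
# Prompt text for these two styles comes from the FAQ on this page.
# Everything else (names, payload shape) is hypothetical.
STYLE_PROMPTS = {
    "Movie Kiss": "dramatic Hollywood angle, warm lighting, wind-blown hair",
    "NPC Energy": "stiff video-game motion, blank expression, polygon flickers",
}


def build_generation_request(style: str) -> dict:
    """Return a hypothetical payload for one model invocation."""
    if style not in STYLE_PROMPTS:
        raise KeyError(f"unknown style: {style}")
    return {"style": style, "prompt": STYLE_PROMPTS[style]}
```

Because the prompt changes per request, two styles never share a fixed base animation, which is the difference from a filter.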

Step 4: Video synthesis

The motion sequence is rendered into a complete 3–5 second clip blending the original image with the generated motion and any style-specific ambient effects (rain, sparkles, snow, glitches, etc.). The clip is encoded and delivered to your account.
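The ordering of the four steps, including the extra composition step that the FAQ describes for Two-Photo Mode, can be sketched as a simple stage list. The `pipeline_stages` function and the stage labels are hypothetical and only illustrate the order of operations.

```python
def pipeline_stages(two_photo_mode: bool) -> list[str]:
    """Return the stage order for one generation (illustrative labels only)."""
    stages = ["screening", "face detection and analysis"]
    if two_photo_mode:
        # Per the FAQ, both faces are placed into a shared scene
        # before motion generation starts.
        stages.append("composition")
    stages += ["motion generation", "video synthesis"]
    return stages
```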

What the AI does NOT do

Let's Mack explicitly avoids three things that other AI video tools do:

It does not face-swap

Face swap apps replace one face with another. Let's Mack keeps the faces from your photo as the same identities throughout the output. Two-Photo Mode places each person's face into a shared scene as themselves — your face stays your face; the other person's face stays theirs.

It does not synthesize speech

Lip sync apps animate a face's mouth to match a target audio track. Let's Mack outputs are silent and the kissing motion is not coupled to any audio. We do not put words in anyone's mouth.

It does not start from real video and modify what happened

Deepfake tools take an existing video of a real person and modify what they did. Let's Mack starts with a still photograph and generates new motion on top of it in a deliberately stylized look. The output is not "this person doing X in real life"; it is "this stylized AI rendering of an animated kiss in [Telenovela / NPC Energy / Rain Kiss / etc.] aesthetic."

For the deeper distinction between AI kissing videos, deepfakes, face swaps, and lip sync see the Glossary.

How long does this take

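End-to-end, from "tap Generate" to "video ready": typically 30–45 seconds in single-photo mode and 40–60 seconds in Two-Photo Mode, where the composition step adds 10–15 seconds.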
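The timing figures quoted in the FAQ are consistent with each other: adding the 10–15 second composition step to the 30–45 second single-photo range gives the 40–60 second Two-Photo range. A minimal sketch of that arithmetic, with hypothetical constant and function names:

```python
SINGLE_PHOTO_SECONDS = (30, 45)  # typical range quoted in the FAQ
COMPOSITION_SECONDS = (10, 15)   # extra time for the Two-Photo Mode composition step


def estimated_range(two_photo_mode: bool) -> tuple[int, int]:
    """Return the (low, high) end-to-end estimate in seconds."""
    low, high = SINGLE_PHOTO_SECONDS
    if two_photo_mode:
        low += COMPOSITION_SECONDS[0]
        high += COMPOSITION_SECONDS[1]
    return (low, high)
```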

What hardware runs the model

Let's Mack runs on cloud GPU infrastructure (NVIDIA H100 / A100 class). You need no local hardware beyond a modern web browser on your phone or computer. GPU cost is also why free videos are finite: each generation uses roughly a minute of real GPU time.

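Privacy of your photos

Uploaded photos are processed during generation and discarded shortly after. We retain account metadata (style picked, date, etc.) but never the source images. Generated videos are private to your account by default. We don't sell user photos and don't use them to train third-party models. See the Privacy Policy for details.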

The 36-style catalog

Each style is a separate AI generation, not a filter applied to a fixed kiss. That's why Movie Kiss looks photoreal and dramatic while NPC Energy looks glitchy and stiff: they are different model invocations with different prompts. All 36 styles can be browsed in the dedicated style hubs.

Get started

Try Let's Mack free — 3 videos on signup, no credit card required. Pick Classic for your first video to see the photoreal output, then explore the catalog.

Frequently Asked Questions

How does Let's Mack generate AI kissing videos?

Four steps: (1) photo upload and screening for minors, explicit content, and policy violations; (2) face detection and pose / lighting analysis; (3) motion generation conditioned on the chosen style prompt (in Two-Photo Mode, a composition step first places both faces into a shared scene); (4) video synthesis blending the original image with the generated motion. End-to-end takes 30–60 seconds.

Is Let's Mack a deepfake tool?

No. We don't face-swap (the faces in your output are the faces in your input), don't synthesize speech (output is silent), and don't start from real video and modify what happened (we start from still photos and generate new stylized motion on top). See the Glossary for the full distinction between AI kissing videos and deepfakes.

How long does AI kiss video generation take?

30–45 seconds typical for single-photo mode, 40–60 seconds for Two-Photo Mode (the composition step adds 10–15 seconds). Real-time progress is shown in the UI.

What hardware does Let's Mack run on?

Cloud GPU infrastructure (NVIDIA H100 / A100 class). No local hardware is required beyond a modern web browser on your phone or computer. GPU cost is also why free videos are finite: each video uses roughly a minute of real GPU time.

Are uploaded photos kept on Let's Mack servers?

No. Photos are processed during generation and discarded shortly after. We retain account metadata (style picked, date, etc.) but never the source images. Generated videos are private to your account by default. We don't sell user photos and don't use them to train third-party models. See the Privacy Policy for details.

Why do different styles look so different?

Each of the 36 styles is a separate AI generation with its own prompt, not a filter applied to a fixed kiss. Movie Kiss tells the model "dramatic Hollywood angle, warm lighting, wind-blown hair." NPC Energy tells it "stiff video-game motion, blank expression, polygon flickers." That's why each style produces a completely different visual.

Can the AI lip-sync to audio?

No. Let's Mack outputs are silent. The kissing motion is generated independently of any audio track. We are not a lip-sync product. Users typically pair the silent video with their own TikTok / Reels audio overlay after download.
