How Let's Mack Works
Let's Mack generates a 3–5 second AI kissing video from your photo in 30–60 seconds. Here's a plain-language explanation of what happens in that window — what the AI does, what it doesn't do, and why we're not a deepfake tool. For the safety / policy framing see Trust & Safety; for definitions see the Glossary.
The 4-step pipeline
Step 1: Photo upload + screening
You upload a photo (single-photo mode) or two photos (Two-Photo Mode). Before any generation runs, every uploaded image passes through a screening layer that checks for minors, explicit content, and policy violations. Failures are rejected immediately and the photo is not stored.
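Conceptually, the screening gate works like this. The sketch below is purely illustrative — the function names and the string-matching "checks" are hypothetical stand-ins for real classifiers, not Let's Mack's actual code; the point is only that every check runs before generation, and a failure stops everything.

```python
# Hypothetical sketch of the pre-generation screening gate.
# The detectors below are toy stand-ins for real classifiers.

def detect_minor(image: bytes) -> bool:
    return b"minor" in image          # stand-in for an age-estimation model

def detect_explicit(image: bytes) -> bool:
    return b"explicit" in image       # stand-in for an explicit-content model

def screen_upload(image: bytes) -> bool:
    """Return True only if every policy check passes.
    A rejected image never reaches generation and is not stored."""
    if detect_minor(image) or detect_explicit(image):
        return False
    return True
```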
Step 2: Face detection & analysis
The AI identifies and maps the faces in the uploaded photo. For single-photo mode, it locates two faces in one image. For Two-Photo Mode, it identifies one face per photo and prepares them for composition. Mapping includes pose estimation, gaze direction, lighting analysis, and skin-tone matching.
Step 3: Motion generation conditioned on style prompt
This is the core of the Let's Mack model. The AI generates realistic facial motion — lip movement, head tilting, slight body movement — conditioned on:
- The style prompt you picked from the 36-style catalog (e.g., Movie Kiss, Rain Kiss, NPC Energy, FBI Open Up).
- The faces extracted in Step 2.
- For Two-Photo Mode: a composition step that places the two faces into a shared scene before motion synthesis.
Each style is its own prompt — Movie Kiss tells the model to render dramatic Hollywood angles and warm lighting; NPC Energy tells it to render stiff video-game motion. That's why each Let's Mack style looks completely different rather than being a "filter" applied to a fixed kiss.
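The "each style is its own prompt" idea can be pictured as a simple lookup: choosing a style selects a different prompt for the same underlying model. The mapping below is a conceptual sketch — the prompt wording echoes the style descriptions on this page, but the structure and function names are hypothetical, not the production system.

```python
# Illustrative style -> prompt mapping. Each style is a separate model
# invocation with its own prompt, not a post-filter on a fixed kiss.

STYLE_PROMPTS = {
    "Movie Kiss": "dramatic Hollywood angle, warm lighting, wind-blown hair",
    "NPC Energy": "stiff video-game motion, blank expression, polygon flickers",
    "Rain Kiss":  "heavy rain, wet hair, moody ambient lighting",
}

def build_generation_request(style: str, faces: list) -> dict:
    # A different style means a different prompt fed to the model,
    # which is why two styles produce completely different videos.
    return {"prompt": STYLE_PROMPTS[style], "faces": faces}
```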
Step 4: Video synthesis
The motion sequence is rendered into a complete 3–5 second clip blending the original image with the generated motion and any style-specific ambient effects (rain, sparkles, snow, glitches, etc.). The clip is encoded and delivered to your account.
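The four steps above can be sketched end-to-end as one function. Everything here is a toy stand-in — the helper names and return values are invented for illustration and are not Let's Mack's implementation; the sketch only shows the order of operations, including the extra composition step in Two-Photo Mode.

```python
# Conceptual outline of the 4-step pipeline. All helpers are hypothetical.

def passes_screening(photo: str) -> bool:
    return "blocked" not in photo                       # Step 1 stand-in

def analyze_faces(photo: str) -> dict:
    return {"photo": photo, "pose": "estimated",
            "lighting": "analyzed"}                     # Step 2 stand-in

def compose_scene(faces: list) -> dict:
    return {"faces": faces, "shared_scene": True}       # Two-Photo composition

def generate_motion(scene: dict, prompt: str) -> dict:
    return {"scene": scene,
            "motion": f"conditioned on: {prompt}"}      # Step 3 stand-in

def render_clip(motion: dict) -> dict:
    return {"clip_seconds": 4, "motion": motion}        # Step 4 stand-in

def generate_kiss_video(photos: list, style_prompt: str) -> dict:
    for p in photos:                                    # Step 1: screening
        if not passes_screening(p):
            raise ValueError("upload rejected by screening")
    faces = [analyze_faces(p) for p in photos]          # Step 2: analysis
    scene = compose_scene(faces) if len(faces) == 2 else faces[0]
    motion = generate_motion(scene, style_prompt)       # Step 3: motion
    return render_clip(motion)                          # Step 4: synthesis
```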
What the AI does NOT do
Let's Mack explicitly avoids three things that AI video tools are often assumed to do:
It does not face-swap
Face swap apps replace one face with another. Let's Mack keeps the faces from your photo as the same identities throughout the output. Two-Photo Mode places each person's face into a shared scene as themselves — your face stays your face; the other person's face stays theirs.
It does not synthesize speech
Lip sync apps animate a face's mouth to match a target audio track. Let's Mack outputs are silent and the kissing motion is not coupled to any audio. We do not put words in anyone's mouth.
It does not start from real video and modify what happened
Deepfake tools take an existing video of a real person and modify what they did. Let's Mack starts with a still photograph and generates new motion on top of it, with a deliberately stylized look. The output is not "this person doing X in real life"; it is "this stylized AI rendering of an animated kiss in [Telenovela / NPC Energy / Rain Kiss / etc.] aesthetic."
For the deeper distinction between AI kissing videos, deepfakes, face swaps, and lip sync see the Glossary.
How long does this take
End-to-end, from "tap Generate" to "video ready":
- Single-photo mode: 30–45 seconds typical, 60 seconds at peak load.
- Two-Photo Mode: 40–60 seconds (the composition step adds 10–15 seconds).
- Real-time progress is shown in the UI. You're not staring at a loading spinner without context.
What hardware runs the model
Let's Mack runs on cloud GPU infrastructure (NVIDIA H100 / A100 class). No local hardware is required beyond a modern web browser on your phone or computer. The GPU cost is what makes free videos finite — each generation consumes real GPU minutes.
Privacy of your photos
- Photos are not stored permanently. The image you upload is processed during generation and discarded shortly after. We retain only the metadata needed for your account history (style picked, date, etc.) — never the source image.
- Generated videos are private to your account by default. They are not public and are not shared anywhere unless you explicitly choose to share them.
- We do not sell user photos. We do not use uploaded photos to train models for resale or third-party use.
- For full policy details see Privacy Policy.
The 36-style catalog
Each style is a separate AI generation, not a filter applied to a fixed kiss. That's why Movie Kiss looks photoreal and dramatic while NPC Energy looks glitchy and stiff — they're literally different model invocations with different prompts. Browse all 36 in the dedicated style hubs:
- Romantic / occasion (8): Classic, Movie Kiss, Rain Kiss, Sunset, Fairy Tale, Slow Mo, Prom Night, Surprise
- Holiday (2): Mistletoe, Midnight
- Stylized maximalist (4): Anime, Telenovela, Bollywood, Italian Chef
- TikTok comedy (8): Caught in 4K, NPC Energy, FBI Open Up, Awkward, Tickle Fight, Photobomb, 3AM Energy, Zoom Call
- Sci-fi / chaos (4): Glitch in the Matrix, Time Traveler, Zero Gravity, Speed Run
- Sports / event (3): JumboTron, WWE Kiss, Victory Kiss
- Distinctive aesthetic (4): Underwater, Horror Kiss, Silent Film, Cooking Show
- Angst / drama trope (2): Plot Twist, Enemies to Lovers
- Absurdist (1): Nature Doc
Get started
Try Let's Mack free — 3 videos on signup, no credit card required. Pick Classic for your first video to see the photoreal output, then explore the catalog.
Frequently Asked Questions
How does Let's Mack generate AI kissing videos?
Four steps: (1) photo upload + screening for minors / explicit content, (2) face detection and pose / lighting analysis, (3) motion generation conditioned on the chosen style prompt — for Two-Photo Mode, a composition step places both faces into a shared scene first, (4) video synthesis blending the original image with the generated motion. End-to-end takes 30–60 seconds.
Is Let's Mack a deepfake tool?
No. We don't face-swap (the faces in your output are the faces in your input), don't synthesize speech (output is silent), and don't start from real video and modify what happened (we start from still photos and generate new stylized motion on top). See the Glossary for the full distinction between AI kissing videos and deepfakes.
How long does AI kiss video generation take?
30–45 seconds typical for single-photo mode, 40–60 seconds for Two-Photo Mode (the composition step adds 10–15 seconds). Real-time progress is shown in the UI.
What hardware does Let's Mack run on?
Cloud GPU infrastructure (NVIDIA H100 / A100 class). No local hardware is required beyond a modern web browser on your phone or computer. The GPU cost per generation is what makes free videos finite — each video consumes real GPU minutes.
Are uploaded photos kept on Let's Mack servers?
No. Photos are processed during generation and discarded shortly after. We retain account metadata (style picked, date, etc.) but never the source images. Generated videos are private to your account by default. We don't sell user photos and don't use them to train third-party models. See the Privacy Policy for details.
Why do different styles look so different?
Each of the 36 styles is a separate AI generation with its own prompt, not a filter applied to a fixed kiss. Movie Kiss tells the model "dramatic Hollywood angle, warm lighting, wind-blown hair." NPC Energy tells it "stiff video-game motion, blank expression, polygon flickers." That's why each style produces a completely different visual.
Can the AI lip-sync to audio?
No. Let's Mack outputs are silent. The kissing motion is generated independently of any audio track. We are not a lip-sync product. Users typically pair the silent video with their own TikTok / Reels audio overlay after download.