This post is Part 2 of my faceless AI experiment.

Once the visual and text foundation of my faceless Instagram channel was set, I wanted to push the boundaries into video and audio. Part of my core value offering for these Reels was creating entirely unique songs.

Given that I work at DistroKid, I already had a massive inherent motivation to explore AI music creation and distribution.

The Audio Stack: Suno AI & Voiceovers

Suno AI became my go-to model, and I am now a very happy Suno Pro customer. Seeing a raw idea turn into a fully produced track in seconds is one of the best technological leaps I’ve witnessed. Whenever I talk about AI amplifying human creativity, this is exactly what I mean.

For voiceovers and lyrics, I experimented with a few other tools:

  • ElevenLabs: Evaluated for voiceovers, but honestly, I didn’t love the output for my specific use case.
  • Lyric Generators: Used a few video-to-lyric generators (ElevenLabs, Higgsfield) and was quite happy with the results, though writing a custom Python script for this feels like the better long-term play.
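A custom script for the lyric side could start very small. The sketch below is only an illustration under loud assumptions: it takes a song's lyric lines and a track duration, spreads the lines evenly across that duration (a placeholder, since real timing would have to come from the audio), and writes them out in the SRT subtitle format, which a tool such as ffmpeg can later burn into a video. The names `LyricLine`, `spread_evenly`, and `to_srt` are hypothetical and not part of any tool mentioned above.

```python
from dataclasses import dataclass


@dataclass
class LyricLine:
    start: float  # seconds from the beginning of the track
    end: float    # seconds from the beginning of the track
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def spread_evenly(lyrics: list[str], duration: float) -> list[LyricLine]:
    """Assumption: every non-empty lyric line gets an equal share of the track."""
    lines = [line.strip() for line in lyrics if line.strip()]
    if not lines:
        return []
    slot = duration / len(lines)
    return [
        LyricLine(start=i * slot, end=(i + 1) * slot, text=text)
        for i, text in enumerate(lines)
    ]


def to_srt(lines: list[LyricLine]) -> str:
    """Render timed lyric lines as the text of an SRT subtitle file."""
    blocks = [
        f"{index}\n"
        f"{format_timestamp(line.start)} --> {format_timestamp(line.end)}\n"
        f"{line.text}"
        for index, line in enumerate(lines, start=1)
    ]
    return "\n\n".join(blocks) + "\n" if blocks else ""


if __name__ == "__main__":
    # Made-up lyrics and duration, purely for demonstration.
    demo = spread_evenly(["First line", "Second line", "Third line"], 12.0)
    print(to_srt(demo))
```

Even spacing will drift against the actual vocals, so the natural next step would be replacing `spread_evenly` with timestamps taken from the track itself.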

The Expensive Reality of AI Video

Next came automated video generation, and this is where things got token-heavy and expensive.

I registered on Google Cloud. For quick prototyping, aistudio.google.com is fantastic because it gives you access to Google’s advanced models on a pay-per-token basis. My monthly costs hovered around 20-30€ for API usage.

Because raw AI video generation is so expensive, I tried a workaround: using JavaScript animation libraries and screen-recording them into MP4s. It was a solid budget option, but the output quality didn’t satisfy me, and I lacked the time to fine-tune the code.

To give my faceless channel a “face”, I had Gemini write a prompt for a fictional character, generated the image, and used Higgsfield.ai to lip-sync it. The result? Lip-syncing ate through my tokens instantly. I easily spent ~10€ in a single day.

I eventually tested HeyGen as an alternative to ElevenLabs and Higgsfield, and it was decent. I didn’t get around to trying Dzine.ai. Ultimately, lip-sync generation is complicated, but with enough time, writing and fine-tuning a custom script is probably the most sustainable route.
