I had a blank white wall in my Zoom background at my new Seattle home, and it was ugly.
There is already a bookshelf back there, but it felt like a place where I should put some art. I’ve been using and buying generative art for a little while, so my first thought was to hang up a Dalle-generated image. My immediate second thought was to find a way to change the image on some regular cadence, mostly so I wouldn’t be trapped by the fear of choosing just one prompt.
In this case, I’d want:
To input text
For the frame to show the generated image
I realized I shouldn’t be so selfish. Why not just let anyone input a prompt? It would be fun to see what photos my friends or coworkers decided to toss up there.
After this I was off to the races. I had a few conversations with the new o1 flavor of GPT via Perplexity, and decided on the following framework.
Hardware:
A Raspberry Pi
An old digital monitor I had lying around
A shelf that my gf bought but hadn’t used yet
Software:
1. Twilio to accept prompts
2. Dalle to handle photo generation
3. S3 to host the generated photos
4. A Python script on the Raspberry Pi that shows the photo on the monitor
5. A Flask API that accepts the prompt via a Twilio webhook + ngrok, generates the photo, and uploads it (aka connecting #1, #2, and #3)
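As a rough sketch of the glue logic in that Flask API (not the actual code from the project), the flow from incoming text to hosted image might look like the function below. It assumes the webhook's form fields arrive as a dict, with the message text under "Body" and the sender under "From", which is how Twilio names them. The three callables (`generate_image`, `upload_image`, `send_reply`) are hypothetical stand-ins for the Dalle call, the S3 upload, and the reply text.

```python
from typing import Callable, Mapping


def handle_incoming_text(
    form: Mapping[str, str],
    generate_image: Callable[[str], bytes],
    upload_image: Callable[[bytes], str],
    send_reply: Callable[[str, str], None],
) -> str:
    """Turn one incoming text into a hosted image and return its URL.

    `form` holds the webhook's form fields. The three callables are
    hypothetical placeholders for the real service calls.
    """
    prompt = form.get("Body", "").strip()
    sender = form.get("From", "")
    if not prompt:
        raise ValueError("empty prompt")

    image_bytes = generate_image(prompt)  # e.g. an image-generation API call
    url = upload_image(image_bytes)       # e.g. an upload to an S3 bucket
    send_reply(sender, url)               # text the result back to the sender
    return url
```

Passing the service calls in as arguments keeps the routing logic testable without touching Twilio, OpenAI, or AWS.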
I built this over the course of a weekend, and the results turned out, great?
Anyone can text a prompt to a number, and the generated photo will stay on the screen in my background until another person gives it a shot. Everyone on my team at work has this number, as do a bunch of my friends.
A few of my favorite Dalle prompts have been:
A toddler dressed as a rancher riding a triceratops lassoing a velociraptor that is trying to escape down a red rock canyon, dramatic cinematic heroic
In a cartoon style, cats wearing pinstripe suits and playing billiards
The words "Zachary Blackwood Was Here" with an otherwise spotless white background
I accidentally used Dalle 2 for a day or so, until I showed it to my coworker who told me to use a better model, and now the photos are pretty awesome. Thank god for progress.
Whenever Dalle finishes creating the image, I have my Flask API send a text to the user with the photo that it generated, along with the link.
The moderation problem
My friends wanted to also be able to upload their own photos directly to the photo frame. I thought it would be a fun idea, but I knew that a few of my friends would post unsavory photos and I really didn't want to have dick pics in the background of my work Zoom. I thought that would probably be a fireable offense, but I really wasn't sure.
Either way, it wasn’t that hard to implement some basic content moderation via AWS Rekognition (thanks to Thots for the idea), and now anyone can send a full photo to that number and it’ll pop up if it passes the moderation filters.
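For illustration, a moderation check along these lines could be a small function over the Rekognition response. This is a sketch under assumptions: the 60% confidence threshold is made up for the example, and the project's actual filter may look at specific label names instead. The response shape (a `ModerationLabels` list whose entries carry `Name` and `Confidence`) is what `detect_moderation_labels` returns.

```python
def is_safe(response: dict, min_confidence: float = 60.0) -> bool:
    """Return True when no moderation label reaches the threshold.

    `response` is the dict returned by a call such as
    boto3.client("rekognition").detect_moderation_labels(Image={"Bytes": data}).
    The default threshold is an assumption for this sketch.
    """
    labels = response.get("ModerationLabels", [])
    return all(label["Confidence"] < min_confidence for label in labels)
```

A photo would only be pushed to the frame when `is_safe(...)` returns True.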
Here is a photo from my friend Parth who sent a selfie.
If a user sends a photo that won’t look good stretched on the frame, the program on the Raspberry Pi will adjust it by adding black bars on the sides or the top and bottom to make it fit. Doesn’t look amazing, but there’s not much I could do there.
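The black-bar fit is just aspect-ratio arithmetic. Here is a minimal sketch, assuming the monitor's resolution is known (the 1920x1080 used in the comments is an assumption, and `letterbox` is a hypothetical helper name): scale the image by the smaller of the two width and height ratios, then split the leftover space evenly.

```python
def letterbox(img_w: int, img_h: int, screen_w: int, screen_h: int):
    """Return (new_w, new_h, pad_x, pad_y) for fitting an image on a screen.

    The image is scaled to fit entirely inside the screen without
    distortion; pad_x and pad_y are the black bar sizes on each side.
    """
    scale = min(screen_w / img_w, screen_h / img_h)
    new_w = round(img_w * scale)
    new_h = round(img_h * scale)
    pad_x = (screen_w - new_w) // 2
    pad_y = (screen_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


# A square 1000x1000 photo on a 1920x1080 screen gets bars on the sides.
print(letterbox(1000, 1000, 1920, 1080))
```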
Some problems I ran into:
Twilio responses
I can’t, for the life of me, get Twilio to respond to everyone who texts me a prompt. About 25% of the time, Twilio never responds, and it is quite difficult to figure out why. Am I getting shadowbanned? Does Twilio not love having me send photos to a bunch of random people? It feels reasonable that it wouldn’t work all the time.
File changes
Right now, the way that the Raspberry Pi knows the image has changed is stupid. I just reload the image from the S3 bucket on an interval (every 5 minutes). So every 5 minutes, you briefly (~3-5 seconds) get a terminal showing on the screen. I could probably find a better way of doing this, but I didn’t want to mess with any asynchrony as I was already in Python.
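One possible improvement that stays fully synchronous is to check a cheap version tag first and only redraw when it changes. The sketch below is not what the project does; `get_version` and `on_change` are hypothetical callables. In practice `get_version` could return the object's `ETag` from an S3 `head_object` call, and `on_change` would download and display the new image.

```python
import time
from typing import Callable, Optional


def watch_for_changes(
    get_version: Callable[[], str],
    on_change: Callable[[str], None],
    interval_s: float = 10.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a version tag and call on_change only when it differs.

    The first check always triggers on_change (the initial load).
    max_checks=None polls forever; a number is handy for testing.
    """
    last = None
    checks = 0
    while max_checks is None or checks < max_checks:
        current = get_version()
        if current != last:
            on_change(current)  # reload and redraw only when needed
            last = current
        checks += 1
        sleep(interval_s)
```

With this shape, the display would only be torn down and redrawn when someone actually sends a new prompt, so the terminal flash should appear far less often.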
OK, what did all this cost?
The entire software cost for this over the last month was a few dollars. I already had a number that I pay Twilio for (for another project), so the marginal cost of each text is tiny. The OpenAI bill was around a dollar for October for ~75 images. S3 bucket costs are functionally 0 for one photo that is regularly updated. And the content moderation API is functionally free (a tenth of a cent per image).
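To put the per-image numbers side by side, here is the arithmetic using the rough figures above. The photo count passed to `moderation_cost` is a hypothetical input, since the post doesn't say how many photos were uploaded.

```python
openai_bill_usd = 1.00            # "around a dollar" for October
images_generated = 75             # "~75 images"
moderation_usd_per_image = 0.001  # a tenth of a cent per image

# Approximate cost of one generated image, in dollars.
cost_per_generated_image = openai_bill_usd / images_generated


def moderation_cost(photo_count: int) -> float:
    """Moderation cost in dollars for a given number of uploaded photos."""
    return photo_count * moderation_usd_per_image


print(f"per generated image: ${cost_per_generated_image:.4f}")
print(f"moderating 100 photos (hypothetical count): ${moderation_cost(100):.2f}")
```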
The only bit of the hardware that I bought was the Raspberry Pi (~$35). If I had to buy everything, maybe the total cost would have been closer to $75, assuming I used Facebook Marketplace for an old monitor. Either way, extremely cheap.
~ Fin ~
I would never have realistically finished this project without the help of AI. I didn’t know much about Raspberry Pis, or Flask, or different moderation APIs. It quite literally made me not just 50%+ faster, but also outrageously more likely to try new side projects. This is an extremely basic realization, but why am I not using LLMs to accomplish all the side projects I have thought about and pushed away? Many such cases. The age of personalized software is here, if only I had some actually good ideas to take advantage of it.
The full code for the Flask API, which has almost all the logic, can be found here. You can see my conversation with Perplexity here (which is a little embarrassing. How much I didn’t know!).
Thanks to Wolfgang Männel for writing this blog post which Perplexity used to help me create this product. And of course, thanks to my beta users Rob Olsthoorn, Nikhil Thota, Parth Chopra, Kenton Prescott, Zachary Blackwood, Tyler Simons, and Zexin Jin.