I feel burned out on data science projects. In undergrad (aka, when I was trying to get a job), I would whip through data science projects like there was no tomorrow. When playing ping pong with my friends, I would record every game so I could make this. When working in (against?) student government, I wanted to be able to predict who was going to win each election, so I made a model that supposedly did that. And directly after college, while doing a 6-month project with a non-profit, I made a map of NYC’s best pizza by literally scraping all the data from a poorly configured AWS bucket. Who would have guessed that Barstool Sports can’t figure out AWS security?
But now, I haven’t been able to get much started on this front these past 9 months. There are a few good reasons for this: I’ve been writing a book on Streamlit (a new Python library which I think the majority of future data scientists will use), and I’ve also been teaching courses on how to interview as a data scientist with Interview Query and HiCounselor. Oh, and there’s also a pandemic, which is a bit of a downer. Data science projects haven’t been a priority for me.
However, my data science project draft folder is bursting at the seams, so I decided to try something out. I’m going to pitch these ideas to you all, the members of this exclusive (read: not popular) newsletter. Whichever idea is most popular, I’ll go for it and publish something by the end of April. And if I don’t, I’ll So here we go!
I love the podcast Invest Like The Best by Patrick O’Shaughnessy for its depth, subject experts, and most of all, thoughtful questions. I honestly feel like if every interviewer asked questions the way Patrick does, podcasts would be better. So my idea here is to make a bot that takes a few questions you have already come up with and makes them better in the style of Patrick himself. Half of this idea’s worth is in the name.
A year or so ago, I found a delightfully funny dataset released by the city of San Francisco that contained a huge amount of data on resident reports. This data has everything from loud noise complaints, to trash on the street, to even reports of poop on the ground. Given that this is San Francisco, we can’t really be sure about the origin of said feces, but there are so many data science ideas from this data. Could we create an app that takes in the address of a house you’re thinking about moving to, and calculates the average poop reports per square mile? Maybe we could also plot this on a distribution of all areas in SF? Where is the least shitty area of SF?
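To make the "poop reports per square mile" idea a bit more concrete, here is a minimal sketch of the core calculation. It assumes the address has already been geocoded to a latitude/longitude and that the reports have been filtered down to a list of coordinates; the function names, the quarter-mile radius, and the sample coordinates are all made up for illustration and are not from the actual San Francisco dataset.

```python
import math

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def reports_per_square_mile(home, reports, radius_miles=0.25):
    """Count reports within radius_miles of home, divided by the circle's area.

    home    -- (lat, lon) of the address (assumed to be geocoded already)
    reports -- iterable of (lat, lon) pairs, one per report
    """
    lat, lon = home
    nearby = sum(
        1 for r_lat, r_lon in reports
        if haversine_miles(lat, lon, r_lat, r_lon) <= radius_miles
    )
    area = math.pi * radius_miles ** 2
    return nearby / area


# Hypothetical coordinates, purely for illustration.
home = (37.7749, -122.4194)
reports = [
    (37.7749, -122.4194),  # right at the address
    (37.7759, -122.4194),  # a short walk north
    (37.8000, -122.4000),  # well outside the radius
]
density = reports_per_square_mile(home, reports)
```

Running the same function over a grid of points across the city would give the distribution to plot the address against.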
GPT-3 For Data Science Ideas
Other data scientists must have the same problem I do: not being able to decide on personal projects. I have access to the GPT-3 beta, so I could compile a list of data science projects, or allow the user to input their own list of personal projects, and then let GPT-3 suggest new ones. Eventually, generative language models like GPT-3 should be incredibly useful for problems like writer’s block, and this might be an interesting use case.
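One simple way this could work is to hand the model a numbered list of existing projects and leave the next number blank for it to fill in. The sketch below only builds that prompt text; it does not call GPT-3, and the function name and prompt wording are my own assumptions rather than anything the API requires.

```python
def build_idea_prompt(past_projects):
    """Build a numbered-list prompt that ends on an empty item to complete."""
    lines = ["Here is a list of data science project ideas:"]
    for i, project in enumerate(past_projects, start=1):
        lines.append(f"{i}. {project}")
    # Leave the next number open so the model continues the list.
    lines.append(f"{len(past_projects) + 1}.")
    return "\n".join(lines)


prompt = build_idea_prompt([
    "Track ping pong games with friends and model who wins",
    "Predict student government election results",
    "Map the best pizza in NYC",
])
```

The resulting string would then be sent to the model as the text to complete.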
Analyzing Your Spotify Listening History
I love little apps that allow users to analyze their own data in increasingly interesting ways (e.g. my Goodreads app), so this could be an extension of that. The Spotify API has a bunch of great info, like what songs you listen to and when, the category of those songs, and even some features they use to classify and recommend music like acousticness (pretty sure they made this word up) and tempo. This idea was inspired by my friend Frish, who wanted to do this for himself.
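As a rough sketch of the kind of analysis this app might do, the snippet below groups plays by hour of day and averages the tempo for each hour. It assumes the listening history has already been pulled down and flattened into a list of dicts; the `played_at` and `tempo` keys mirror field names in Spotify's responses, but the flattened shape, the function name, and the sample values are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def summarize_listening(plays):
    """Return {hour: {"plays": count, "avg_tempo": mean tempo}} for each hour seen."""
    tempos_by_hour = defaultdict(list)
    for play in plays:
        hour = datetime.fromisoformat(play["played_at"]).hour
        tempos_by_hour[hour].append(play["tempo"])
    return {
        hour: {"plays": len(tempos), "avg_tempo": mean(tempos)}
        for hour, tempos in sorted(tempos_by_hour.items())
    }


# Made-up sample data, not real listening history.
plays = [
    {"played_at": "2021-03-01T08:15:00", "tempo": 120.0},
    {"played_at": "2021-03-01T08:45:00", "tempo": 100.0},
    {"played_at": "2021-03-01T22:30:00", "tempo": 80.0},
]
summary = summarize_listening(plays)
```

The same grouping would work for acousticness or any other numeric feature by swapping the key.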
So those are a few of my ideas. Go ahead and reply to the email and let me know what you think! If you’re reading this on the web, you can email me at firstname.lastname@example.org or comment on this post. Looking forward to it.