The Medium Is the Message
On data apps, notebooks, and the almighty doc
Last week I was listening to a podcast with Andrew Sullivan and Johann Hari where they talked about the phrase “The medium is the message.” I’d heard this phrase before but never really understood it until they offered two examples:
A book sends the message that you should pay close attention to the subject for a long period of time.
A tweet sends the message that you can grok the subject in 280 characters.
I immediately got it. While my girlfriend and I were doing long distance, we would have conversations over text, FB Messenger, FaceTime, on the phone, over email, etc. Each of these conversations was fundamentally distinct because of the medium it happened on. We could have, at any time, consolidated channels and used the same words, but the message would have changed.
The same is true of data products. The number of mediums that data scientists use to present work is exploding and we have to be careful to choose the right one. I see three main categories that are currently used: notebooks, docs, and apps.
Sending someone a Jupyter Notebook (e.g. via Google Colab) sends the message, “Here, come into my workstation and see what I’ve tried already.” There might be some options to change a parameter or two, but if the viewer is not a data scientist, they’d probably struggle to sift the gold out of the work. It also sends the message that the viewer should see everything, regardless of the importance of any line of code or query. By default, the medium makes your data cleaning look just as important as the insightful graph at the end. This makes a notebook especially well suited to folks who have:
Experience in your data stack
Experience in your language of choice
Context on your specific problem
Experience with your notebook solution
Who are the people that reliably have those four characteristics? Your team’s data scientists. For them, this solution can work pretty well!
Some companies like Hex do an exceptional job at making the notebook sharable, but usually the end product of Hex is a fully fledged data app, so I don’t really include them in this category.
As a data scientist at Facebook, I wrote many docs. It was the primary form of output. I generally followed the format:
Text, text, text explaining what I did
A graph with a result (sometimes with an arrow pointing it out!)
More text giving a recommendation for the team
Some other graphs to anticipate questions from readers
A ton of caveats about the data used and the assumptions made
By default, docs are passive and non-interactive, so any follow-up question meant more work for me. My PM wanted to know how this graph looked a few months earlier? More work. Engineers wanted to know how a classification model I made would work on a different set of images? More work. Another DS on my team wanted to download the raw data? More work.
The benefit of the almighty doc is, however, that everyone can read it. You don’t need experience in my data stack, Python or R, or anything else. Docs existed to democratize information within an organization, and were the best tool to influence behavior or decisions. I saw hundreds of decisions influenced by docs, and maybe a half dozen influenced by notebooks, which is why I always opted for docs at Facebook (or I guess Meta now? whatever).
As much as data scientists would like to believe it, the people we’re trying to convince and influence are almost never other data scientists. They are PMs, Tech Leads, Engineers, and the execs who have the budget to pay for our ideas.
I now work at Streamlit, where everyone has some experience in data and data science, but that is the exception, not the rule.
On Data Apps
The problem with apps was always the effort. At Facebook, if I hadn’t finished an analysis within a week or two, I had to move mountains to draw the Eye of Sauron (the focus of the org) back to the subject. This problem only got bigger as I got more senior because the group of people I was trying to convince expanded rapidly. That always put apps out of the question.
Streamlit makes creating apps ridiculously fast, rendering this concern fairly obsolete. And thank goodness it does: as a medium, apps encourage a deep focus on a set of information that is smaller than a notebook’s and larger than a doc’s, while inviting the user to interact with it and learn in a structured manner. New options are popping up every day too (none, in my biased opinion, better than Streamlit), because I think folks are starting to realize that data apps are simply a superior medium for a huge variety of use cases.
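To make that concrete, here is a minimal sketch of what such an app might look like. Everything in it is an assumption made for illustration: the metric, the numbers, and the helper names (last_n_months, to_csv, render_app) are all hypothetical. It answers two of the follow-up questions from the doc era (how did this graph look a few months earlier, and can I download the raw data) without creating more work for the author.

```python
from datetime import date

# Hypothetical monthly metric, invented for illustration only.
MONTHLY_ACTIVE_USERS = {
    date(2021, 7, 1): 1200,
    date(2021, 8, 1): 1350,
    date(2021, 9, 1): 1280,
    date(2021, 10, 1): 1490,
    date(2021, 11, 1): 1610,
    date(2021, 12, 1): 1580,
}


def last_n_months(series, n):
    """Return the n most recent entries of a date-keyed dict, oldest first."""
    if n <= 0:
        return {}
    return {month: series[month] for month in sorted(series)[-n:]}


def to_csv(series):
    """Serialize the series as CSV text so a teammate can grab the raw data."""
    lines = ["month,users"]
    lines += [f"{month.isoformat()},{users}" for month, users in series.items()]
    return "\n".join(lines)


def render_app():
    # Imported here so the helpers above work even without Streamlit installed.
    import streamlit as st

    st.title("Monthly active users")
    # The viewer picks the time window instead of asking the author for a new graph.
    n = st.slider("Months to show", min_value=1,
                  max_value=len(MONTHLY_ACTIVE_USERS), value=3)
    window = last_n_months(MONTHLY_ACTIVE_USERS, n)
    st.line_chart({"users": list(window.values())})
    st.download_button("Download raw data", data=to_csv(window),
                       file_name="monthly_active_users.csv", mime="text/csv")


if __name__ == "__main__":
    try:
        render_app()
    except ImportError:
        print("Install Streamlit, then launch with: streamlit run app.py")
```

Saved as app.py (the filename is arbitrary) and launched with streamlit run app.py, the slider and download button take over the follow-up requests. The viewer still only sees the one graph I chose to put in front of them.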
The medium is the message, so send the best message possible.