Putting ChatGPT into perspective; Three Data Point Thursday
Andrew Ng puts ChatGPT and other LLMs into perspective. Companies are building valuable things with ChatGPT. Seldon merges MLOps, data-centric AI, and open source, and gets more funding.
I’m Sven, writing this to help you build things with data. Whether you’re a data PM, working inside a data startup, leading an internal data team, or investing in data companies, this is for you.
Let’s dive in!
Why LLMs are nothing fundamentally new.
Your task as a builder hasn’t changed.
Why is Seldon interesting? (not just for the 20M funding!)
Noteable plays with ChatGPT
What: Andrew Ng, AI sage, wrote a short article titled “AGI (Artificial General Intelligence) Progress Report” in response to the recent advances of ChatGPT and other LLMs.
My perspective: Andrew’s article is short enough for everyone to read it. I mean it. His view is simple and direct, and one I hold as well:
“The latest LLMs exhibit some superhuman abilities, just as a calculator exhibits superhuman abilities in arithmetic. At the same time, there are many things that humans can learn that AI agents today are far from being able to learn.”
“To be clear, though, in the past year, I think we’ve made one year of wildly exciting progress in what might be a 50- or 100-year journey. Benchmarking against humans and animals doesn’t seem to be the most useful question to focus on at the moment, given that AI is simultaneously far from reaching this goal and also surpasses it in valuable ways. I’d rather focus on the exciting task of putting these technologies to work to solve important applications while also addressing realistic risks of harm.”
The true question is not how exciting or dangerous new LLMs are but what you do with them. I can tell you what I think every single reader of this newsletter should do:
Please turn exciting new research and technology in data into usable products for us all.
That’s as true and hard for ChatGPT as it was for large-scale data processing (thank you, Databricks gang). You do that by:
Not getting distracted by the latest trends (“ChatGPT”)
Having a good process to integrate trends into your products (“How to NOT ChatGPT”)
Thinking about generative artificial intelligence as a whole, not just “Google Bard”
Not ignoring trends either! If people want to use generative AI as “inspiration” (a term used by a friend), they will. There is no sense in trying to “catch them”; instead, the question becomes how you use gAIs in a useful way to advance a certain discipline and increase value to everyone.
Key question: how do you use gAIs to advance a specific discipline, to increase value to everyone? How do you make their benefits available for everyone?
Seldon 20M Series B
What: Seldon just raised a 20M Series B. The company produces a fair amount of open source and describes its mission roughly as:
“AI is everywhere, and at Seldon, we’re enabling our customers to put AI in everything.”
My perspective: The company seems interesting because it looks like MLOps meets data-centric AI meets open source, a potent combination.
They try to achieve their mission by focusing on the bottleneck of “getting ML into production.”
They also push forward research.
“Our world-class R&D team is pioneering academic research in the MLOps field.”
Their research around explainability and ML systems running in production, in particular, could turn into a competitive advantage if they double down on it.
Stay tuned. Seldon, I’m watching you.
Build something with ChatGPT
What: The notebook company Noteable (noteable.io) jumped onto the ChatGPT train and wrote open-source tooling for IPython notebooks.
My perspective: The Noteable team hooks ChatGPT into the exceptions raised in your notebook, so that it can (try to) help you fix them.
Picture from Introducing Genai, Noteable.
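To make the idea concrete, here is a minimal Python sketch of the general pattern: catch an exception, turn it into a prompt, and hand that prompt to an LLM. This is not Noteable’s actual implementation. The names `build_fix_prompt` and `run_with_help` and the `ask_llm` callable are hypothetical, made up for illustration, and no real ChatGPT API is called.

```python
import traceback
from typing import Callable, Optional


def build_fix_prompt(exc: BaseException, source: str) -> str:
    """Turn an exception and the code that raised it into an LLM prompt."""
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return (
        "The following Python code raised an exception.\n\n"
        f"Code:\n{source}\n\n"
        f"Traceback:\n{trace}\n"
        "Explain the likely cause and suggest a fix."
    )


def run_with_help(source: str, ask_llm: Callable[[str], str]) -> Optional[str]:
    """Run `source`; if it fails, return the LLM's suggestion.

    `ask_llm` is a hypothetical callable (prompt -> answer). Plug in
    whatever LLM client you actually use. Returns None if the code ran
    without raising.
    """
    try:
        exec(source, {})  # run the snippet in a fresh, empty namespace
    except Exception as exc:
        return ask_llm(build_fix_prompt(exc, source))
    return None
```

Any function that takes a prompt string and returns text can be passed as `ask_llm`, for example a thin wrapper around whichever LLM client you use. Keeping the prompt building separate from the call makes the whole thing easy to test without a network connection.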
This might be useful, or it may not. I don’t know. But I know this integration is an excellent example of what to do with ChatGPT, as outlined in “How to NOT ChatGPT”. Noteable is either jumping on a recent trend or thinks this might be useful to users.
In both cases, integrating it into their current product and making it open source is a great idea. There’s no need to make a separate product, and there is no competitive advantage, but it might be helpful. So why not do it in the most publicly visible way, one that might even get contributions via open source?
New articles by me
I just published my 130th story on Medium: “5 Helpful Extract & Load Practices for High-Quality Raw Data”
Shameless plugs of things by me
Check out Data Mesh in Action (co-author, book)
and Build a Small Dockerized Data Mesh (author, liveProject in Python).
You can also find me on Medium, with unique content.
I created the “20 Point Questionnaire To Assess The Strength Of Your Data Startup Idea”.
Just recommend the ThDPTh and respond to this email with “SHARE: [link to your recommendation],” and you’ll receive this fabulous giveaway.
You can take a lot of shortcuts by reading pieces from people with real experience who can condense their wisdom into words.
And that’s what I’m collecting here: wisdom from other smart people.
You’re welcome to email me with questions or raise issues I should discuss. If you know a great topic, let me know about it.
If you feel like this might be worthwhile to someone else, pass it along; finding good reads is always a hard challenge, and they will appreciate it.
Until next week,