Three Data Point Thursday

🐰 #22 AI Whisky, Data Business Models, Data Version Control; ThDPTh #22 🐰

Sven Balnojan
Jun 3, 2021

How AI creates an award-winning whisky, how data companies make money, and which tools you can use to version data.

Data will power every piece of our existence in the near future. I collect “Data Points” to help understand & shape this future.

If you want to support this, please share it on Twitter, LinkedIn, or Facebook.

(1) 🔮 Data Open Source Business Models

I just stumbled across some weird data orchestrator business models, so I started researching…

I’m sharing this article because, as I said before, I believe the data space will be dominated by open source solutions pretty soon. As such, I think it’s interesting to understand how open source companies actually make money and survive, something we as end-users have a strong interest in. I don’t like using tools that won’t be supported anymore in 2–3 years.

The authors run TimescaleDB, an open-source database, so they are very aware that a lot of data companies go the open-source route. They provide a great list of big open-source data companies that made it, like CockroachDB, Elastic, Databricks, MongoDB, and many more.

They also shine a good light on the true importance of community building and on understanding why some business models are a better fit than others. I really enjoyed their take on it and will probably contribute something along the same lines soon.

5 ways open source software companies make money

A guide on how to evaluate the long-term sustainability of the business behind any open-source software you are using (or considering working on yourself).

blog.timescale.com

(2) 🔄 AI Whisky wins Gold

I sometimes like a sip of whisky. Now an AI-mixed whisky has won a bunch of awards, showcasing what the interaction of humans & machines might look like in the future.

“The work of a Master Blender is not at risk,” Angela states. “While the whisky recipe is created by AI, we still benefit from a person’s expertise and knowledge. We believe that the whisky is AI-generated, but human-curated. Ultimately, the decision is made by a person.”

It seems to be a good display of what I called “human aided machine engineering” and of what will be the target model for AI-human interaction for the foreseeable future. I think it’s also a great model to adopt when thinking about introducing AI into your products or processes. That, of course, has two sides.

The good: it means you can probably introduce AI into many more things than you previously thought. The bad: AI probably won’t take the whole process out of your hands, and as such, the value of introducing AI is probably lower than you think. I’m not sure whether this helps or confuses you, but at least it always keeps me thinking.

Mackmyra | Fourkind, part of ThoughtWorks | ThoughtWorks

Together with Fourkind, part of ThoughtWorks, Mackmyra created the world’s first whisky developed completely by machine learning. In an industry synonymous with deep-rooted tradition, human expertise and craftsmanship, what happens when 1,000-year-old techniques meet advanced 21st Century technology?

www.thoughtworks.com

(3) 📣 Data Version Control

I just had a discussion with a friend about data version control, and he pointed me to this comparison. The author, Guy Smoilovsky, provided a decent comparison of data version control tools in 2020, and to my knowledge, not much has changed in the space since (sadly). I still think some major innovation has to take place here, but so far there are really only two options for actually versioning data as code:

  • DVC

  • and lakeFS.

Git LFS is built for a different purpose and as such doesn’t work the way we need it to, e.g. to version control your database or the input of your machine learning models. It might still do for managing a large ML model, but that’s about it.

Other solutions like Pachyderm all come packaged with lots of baggage, and so are not really useful as standalone data version control. That leaves us with two options, DVC and lakeFS. DVC’s pipeline functionality is nice, but not a must-have. lakeFS really shines with branching etc., and I’m still looking forward to them implementing distributed version control features someday.

I don’t agree with everything the author says. In particular, I really do think we need just one tool to do data versioning, and as I said, DVC and lakeFS shine there. But still, the comparison is sound.
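
To make “versioning data as code” a bit more tangible, here is a rough Python sketch of the general idea behind such tools: the data itself goes into a content-addressed cache, and only a tiny pointer (a hash plus a file name) would be committed next to your code. This is an illustration under my own assumptions, not the actual file format or API of DVC or lakeFS; the names snapshot, restore, and demo, the pointer layout, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def snapshot(data_file: Path, cache_dir: Path) -> dict:
    """Store data_file in a content-addressed cache and return a small pointer.

    The pointer is what you would commit to Git instead of the data itself.
    (Hypothetical layout, chosen for illustration only.)
    """
    content = data_file.read_bytes()
    digest = hashlib.sha256(content).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / digest
    if not cached.exists():  # identical content is stored only once
        cached.write_bytes(content)
    return {"path": data_file.name, "sha256": digest, "size": len(content)}


def restore(pointer: dict, cache_dir: Path, target_dir: Path) -> Path:
    """Recreate the file a pointer describes by copying it out of the cache."""
    target = target_dir / pointer["path"]
    target.write_bytes((cache_dir / pointer["sha256"]).read_bytes())
    return target


def demo() -> tuple:
    """Create two versions of a tiny dataset, then check out the first again."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        data = root / "train.csv"
        cache = root / "cache"

        data.write_bytes(b"id,label\n1,cat\n")
        v1 = snapshot(data, cache)

        data.write_bytes(b"id,label\n1,cat\n2,dog\n")
        v2 = snapshot(data, cache)

        # Going back to version 1 only needs its pointer and the cache.
        restored = restore(v1, cache, root).read_bytes()
        return v1, v2, restored


if __name__ == "__main__":
    v1, v2, restored = demo()
    print(json.dumps(v1, indent=2))
    print(json.dumps(v2, indent=2))
    print(restored.decode())
```

The pointers are small enough to live in Git, so every commit of your code can pin exactly which version of the data it was run against. Branching and remote storage are the parts the real tools add on top of this.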

Comparing Data Version Control Tools — 2020 | by Guy Smoilovsky | Towards Data Science

An overview and comparison of tools for data version control in 2020

towardsdatascience.com

🎄 In Other News & Thanks

Thanks for reading this far! I’d also love it if you shared this newsletter with people who you think might be interested in it.

P.S.: I share things that matter, not the most recent ones. I share books, research papers, and tools. I try to provide a simple way of understanding all these things. I tend to be opinionated. You can always hit the unsubscribe button!

By Sven Balnojan

Data; Business Intelligence; Machine Learning, Artificial Intelligence; Everything about what powers our future.
