If AI is the latest arms race, data is its ammo. There are a number of ways that AI companies could pay publishers. Here's some speculation about what these business models might look like.
Hey Anant! Great article!
It got me wondering whether specific high-quality training data would be more valuable, and thus more economically viable, for smaller curated models than for the big providers (e.g., OpenAI). With GPT-4 being trained on trillions of tokens, it could be hard for a data provider like NYT to make sustainable revenue; however, if it were easy to license NYT data to produce a model that aims to be an expert essay editor, maybe that could work? Also, is the future business model of NYT and other content creators going to be similar to Reddit's shift toward charging for the data they produce?
I think the market is trying to figure out what it means to train models on narrower datasets. What would it look like to train a model only on code, for example? Would that be a better code-generating LLM? I think you're right that there might be an opportunity to create an LLM focused on, e.g., news.
As for business models: with fewer people clicking through to websites, ad revenue is sure to go down. That raises interesting questions about which types of business models will be viable. Obviously I don't know the answers here, but it's fun to conjecture!