The Desperate Hunt for Digital Data to Fuel the AI Revolution

Virginia Backaitis
Published in Digitizing Polaris
4 min read · Apr 8, 2024

In late 2021, OpenAI faced a supply problem. The artificial intelligence lab had exhausted every reservoir of reputable English-language text on the internet as it developed its latest A.I. system. It needed more data to train the next version of its technology — lots more.

So OpenAI researchers created a speech recognition tool called Whisper. “It could…
