Large language models have, up until now, been in a legal grey area when trained on ChatGPT output. Databricks seems to have figured out a way around this with Dolly 2.0, the successor to the ChatGPT-like large language model the company released just two weeks ago.

According to the company's official statement, Dolly 2.0 is the world's first open-source, instruction-following LLM fine-tuned on a transparent and openly available dataset. The differentiating factor between Dolly 2.0 and other 'open source' models is that it is available for commercial use without the need to pay for API access or share data with third parties.

The LLM, based on the EleutherAI Pythia model family, boasts an impressive 12 billion parameters and has been fine-tuned exclusively on an open-source corpus, databricks-dolly-15k. Databricks' employees generated this dataset, and its licensing terms allow it to be used, modified, and extended for any purpose, including academic or commercial applications.

There has been a wave of LLM releases that are considered open source by many definitions but are bound by industrial licences. The trailblazer was Meta's LLaMA, followed by Stanford's Alpaca, Koala, and Vicuna. The Stanford project's dataset of 52k questions and answers was generated from ChatGPT's outputs, but as per OpenAI's terms of use, you can't use output from its services to build models that compete with OpenAI.
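To make the dataset concrete, here is a minimal sketch of what a databricks-dolly-15k record looks like. The field names (instruction, context, response, category) match the dataset as published on Hugging Face; the record contents and the `to_prompt` helper below are illustrative assumptions, not Databricks' actual data or prompt template.

```python
# Illustrative sketch of the databricks-dolly-15k record layout.
# Field names follow the published dataset; the record itself is hand-written.
from typing import TypedDict


class DollyRecord(TypedDict):
    instruction: str  # the prompt written by a Databricks employee
    context: str      # optional reference text ("" when not applicable)
    response: str     # the human-written answer
    category: str     # task type, e.g. "open_qa" or "summarization"


record: DollyRecord = {
    "instruction": "What is Dolly 2.0?",
    "context": "",
    "response": "An open-source, instruction-following LLM from Databricks.",
    "category": "open_qa",
}


def to_prompt(r: DollyRecord) -> str:
    """Fold a record into a single training string (hypothetical template)."""
    parts = [f"Instruction: {r['instruction']}"]
    if r["context"]:
        parts.append(f"Context: {r['context']}")
    parts.append(f"Response: {r['response']}")
    return "\n".join(parts)


print(to_prompt(record))
```

Instruction-tuning pipelines generally concatenate these fields into one prompt string like this before tokenization; records without context simply omit that section.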