Llama and ChatGPT Are Not Open-Source


Social media and advertising technology company Meta recently released an update to its large language model, Llama. Llama 2 was released as an open-source project, giving users access to the model’s weights, evaluation code, and documentation. Meta says the open-source release is designed to make the model “accessible to individuals, creators, researchers, and businesses so they can experiment, innovate, and scale their ideas responsibly.”

In contrast with other open-source LLMs, and with open-source software more broadly, Llama 2 is considerably locked down. Although Meta has made its trained model public, it is not sharing the model’s training data or the code used to train it. While third parties have built applications that extend the base model, researchers and students have limited ability to study it in its current form.

In research presented at the ACM Conference on Conversational User Interfaces, a group of AI researchers from Radboud University in Nijmegen, Netherlands, argues that Llama 2 is not the only LLM whose claim to the label “open source” deserves scrutiny. In the paper, the researchers present a multidimensional rubric for evaluating a model’s openness, and use it to assess 15 open-source LLMs on various aspects of their availability, documentation, and access methods. The researchers have compiled the results of their assessments into an open table, which has since been extended to cover 21 open-source models. Smaller, research-focused models were included in the analysis when they were considered to be, as the paper’s preprint puts it, “open, sufficiently documented, and released under an open source license.”

“Meta using the term ‘open source’ for this is positively misleading.” —Mark Dingemanse, Radboud University

The researchers began the project after looking for AI models they could use in their own research and teaching. “If you write a research paper, you want the results to be reproducible for as long as possible,” says Andreas Liesenfeld, a co-author of the preprint and an assistant professor at Radboud. “That’s something you’d certainly look for when doing research with these tools, and something we didn’t see with, for instance, ChatGPT,” the chatbot interface built on OpenAI’s Generative Pre-trained Transformer (GPT) series of LLMs. Despite what its name might suggest, OpenAI closed off access to much of its research code after launching GPT-4 and receiving a large investment from Microsoft this year.

The Radboud University team’s evaluation is not optimistic about ChatGPT’s or Llama’s status as open source. (The full table included 20 entries at publication; only the highest- and lowest-ranked models are discussed here.) In fact, OpenAI’s ChatGPT is rated as the least open of all the models in the openness table the team developed. Each dimension receives one of three ratings: open, partial, or closed. ChatGPT is rated “closed” in every category except “model card,” a standard way of describing a model’s limitations and strengths, and “preprint,” whether a detailed research paper on the model exists; in those two categories it earns only a “partial” grade. Llama 2 sits second from the bottom of the overall ranking, with a score only marginally higher than ChatGPT’s.

AI’s Reproducibility Problems

Liesenfeld’s concerns about the reliability of research performed with ChatGPT have some concrete backing. A separate study from researchers at Stanford University and the University of California, Berkeley, recently showed that GPT-4’s and GPT-3.5’s performance on reasoning tasks changed between March and June of this year, mostly for the worse. The changes were not accompanied by any news or announcement from OpenAI, and they could impede the replication of research results derived from these models.

While Liesenfeld and his colleagues found that smaller, more research-oriented models tended to be more open than Llama 2 or ChatGPT, all the models they examined were closed in two significant ways. First, few models provided sufficient detail about the crucial refinement process in modern LLM development known as reinforcement learning from human feedback (RLHF). This process, which coaxes a language model into producing useful outputs from the statistical patterns absorbed during pretraining, is a key ingredient in modern LLM performance. It is also time-consuming, requiring human-in-the-loop evaluation of the model’s outputs during training.

The other major issue the researchers identify is how commercial LLM releases have eluded peer review. Although publishing a model’s architecture, training methods, and performance in peer-reviewed journals or conferences is standard practice in academic research, both ChatGPT and Llama 2 were released with only company-hosted preprints, likely to shield information about the models’ structure and training that is considered a trade secret.

While the light this project sheds on the varying openness of LLMs could push the field toward more open model-development practices, Liesenfeld and his colleagues remain unconvinced about the use of commercial models in academic research. Mark Dingemanse, a co-author of the report, offered a decidedly negative assessment of Llama 2: “Meta using the term ‘open source’ for this is positively misleading: there is no source to be found, the training data is entirely undocumented, and beyond the appealing charts the technical documentation is really quite inadequate. We don’t know why Meta is so determined to get everyone on board with this model, but the track record of their choices does not inspire confidence. Users beware.”
