The development of a new AI chatbot by a small Chinese company has sent stock market values tumbling and prompted some outlandish claims. What makes it so special?
The reason behind this turmoil? The "large language model" (LLM) that powers the app is reportedly far cheaper to train and run than its rivals, yet has reasoning capabilities comparable to those of US models such as OpenAI's o1.
Analysis
Dr Andrew Duncan is the director of science and innovation fundamental AI at the Alan Turing Institute in London, UK.
DeepSeek claims to have achieved this by deploying several technical strategies that reduced both the amount of memory needed to store the model (known as R1) and the processing time required to train it. According to DeepSeek, cutting these overheads translated into significant cost savings: R1's underlying base model, V3, reportedly required 2.788 million GPU hours to train (running across many graphics processing units, or GPUs), at an estimated cost of under $6m (£4.8m), compared with the more than $100m (£80m) that OpenAI boss Sam Altman says was needed to train GPT-4.
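As a rough sanity check (a back-of-the-envelope calculation using only the figures quoted above, not data from DeepSeek's paper), those two numbers imply an average price of around $2 per GPU hour:

```python
# Back-of-the-envelope check using only the figures quoted above.
gpu_hours = 2_788_000          # reported GPU hours to train V3
claimed_cost_usd = 6_000_000   # reported upper bound on training cost

# Implied average price per GPU hour, if both figures hold.
print(f"~${claimed_cost_usd / gpu_hours:.2f} per GPU hour")  # ~$2.15 per GPU hour
```

That is in the ballpark of commercial rental rates for data-centre GPUs, which is presumably the kind of assumption the estimate rests on.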
Despite the hit taken to Nvidia's market value, the DeepSeek models were trained on around 2,000 Nvidia H800 GPUs, according to one research paper released by the company. These chips are a modified version of the widely used H100 chip, built to comply with export rules for China. They were in good supply before the Biden administration tightened the restrictions further, effectively banning Nvidia from exporting the H800s to China in October 2023. Working within these constraints, DeepSeek has likely been forced to find innovative ways to make the most effective use of the resources it has at hand.
Reducing the computational costs of training and running models may also address concerns about AI's environmental impact. The data centres that house them have enormous appetites for electricity and water, largely to keep the servers cool. While most tech companies do not disclose the carbon footprints of their models, one recent estimate puts ChatGPT's monthly carbon dioxide emissions at the equivalent of 260 flights from London to New York. From an environmental standpoint, then, making AI models more efficient would be a welcome direction for the industry.
Of course, whether DeepSeek's models actually deliver real-world savings in energy remains to be seen, and it's also unclear whether cheaper, more efficient AI might simply lead to more people using it, and so an increase in overall energy consumption.
If nothing else, it could help to push sustainable AI up the agenda at the upcoming Paris AI Action Summit, so that the AI tools we use in the future are also kinder to the planet.
What has surprised many people is how quickly DeepSeek burst onto the scene with such a competitive large language model – the company was only founded by Liang Wenfeng in 2023, and he is now being hailed in China as something of an "AI hero".
The latest DeepSeek model is also notable in that its "weights" – the numerical parameters of the model obtained from the training process – have been openly released, along with a technical report describing the model's development process. This enables other groups to run the model on their own equipment and adapt it to other tasks.
It means that researchers can look under the model's hood to learn what it does, in contrast to OpenAI's o1 and o3, which are effectively black boxes. Some details are still missing, however – such as the datasets and code used to train the models – so groups of researchers are now trying to piece these together.
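To illustrate what that openness enables in practice, here is a minimal sketch of running openly released weights with the Hugging Face transformers library. The checkpoint name is an assumption for illustration, not a detail from the report, and a small distilled variant stands in for the full model:

```python
# Minimal sketch: running an openly released model via Hugging Face transformers.
# The checkpoint name below is assumed for illustration; the full R1 model is far
# too large for most hardware, so a small distilled variant is used instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, torch_dtype="auto", device_map="auto"  # device_map needs `accelerate`
)

prompt = "How many prime numbers are there between 1 and 20?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```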
Not all of DeepSeek's cost-cutting techniques are new either – some have been used in other LLMs. In 2023, Mistral AI openly released its Mixtral 8x7B model, which was on a par with the advanced models of the time. Both Mixtral and the DeepSeek models exploit the "mixture of experts" technique, where the model is built from a group of much smaller models, each having expertise in a specific domain. Given a task, the mixture model assigns it to the most qualified "expert".
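Purely as an illustration of the idea – a toy sketch in PyTorch, not DeepSeek's or Mistral's actual architecture, with invented names and sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfExperts(nn.Module):
    """Toy mixture-of-experts layer: a gating network scores the experts,
    and each input token is processed only by its top-k experts."""

    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The gate decides which experts see which tokens.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Score all experts, keep only the top-k per token.
        scores = self.gate(x)                              # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # (tokens, top_k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

tokens = torch.randn(16, 64)   # 16 tokens, 64-dim embeddings
layer = MixtureOfExperts(dim=64)
print(layer(tokens).shape)     # torch.Size([16, 64])
```

Because only the top-k experts run for each token, the compute per token scales with k rather than with the total number of experts – which is where the efficiency gain comes from.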
DeepSeek has even shared its unsuccessful attempts to improve LLM reasoning through other technical approaches, such as Monte Carlo Tree Search, a method long touted as a potential way to guide an LLM's reasoning process. Researchers will be using this information to investigate how the model's already impressive problem-solving capabilities can be enhanced further – improvements that are likely to end up in the next generation of AI models.
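To show the method in general terms, here is a toy Monte Carlo Tree Search in Python – not DeepSeek's experiment, and with a trivial number-target game standing in for the choice of reasoning steps:

```python
import math
import random

# Toy stand-in for a "reasoning step" search: starting at 0, choose steps
# of +1 or +2 and try to land exactly on TARGET within MAX_DEPTH moves.
TARGET, MAX_DEPTH, MOVES = 7, 5, (1, 2)

class Node:
    def __init__(self, state, depth, parent=None):
        self.state, self.depth, self.parent = state, depth, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def expand(self):
        self.children = [Node(self.state + m, self.depth + 1, self) for m in MOVES]

    def ucb(self, c=1.4):
        # Upper Confidence Bound: balance exploiting good nodes vs exploring new ones.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits
        )

def rollout(state, depth):
    # Random playout to a terminal state; reward 1 only for hitting TARGET exactly.
    while depth < MAX_DEPTH and state < TARGET:
        state += random.choice(MOVES)
        depth += 1
    return 1.0 if state == TARGET else 0.0

def mcts(iterations=500):
    root = Node(0, 0)
    for _ in range(iterations):
        # 1. Selection: descend via UCB until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: grow the tree at non-terminal leaves.
        if node.depth < MAX_DEPTH and node.state < TARGET:
            node.expand()
            node = random.choice(node.children)
        # 3. Simulation: estimate the leaf's value with a random rollout.
        reward = rollout(node.state, node.depth)
        # 4. Backpropagation: update statistics along the path to the root.
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    return max(root.children, key=lambda n: n.visits).state

print("most promising first step lands on:", mcts())
```

Applied to an LLM, the moves would be candidate reasoning steps and the rollout a judgement of whether the resulting chain of thought reaches a correct answer.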
What does all of this mean for the future of the AI industry?
DeepSeek may be showing that you don't need vast resources to build sophisticated AI models. I'd venture that we will increasingly see highly capable AI models being developed with ever fewer resources, as companies find ways to make model training and operation more efficient.
Until now, the AI landscape has been dominated by "Big Tech" companies in the US – Donald Trump has called the rise of DeepSeek "a wake-up call" for the US tech industry. But this development may not necessarily be bad news for the likes of Nvidia in the long term: as the cost of developing AI falls, businesses and governments will be able to adopt the technology more quickly. That will in turn drive demand for new products, and the chips that power them – and so the cycle continues.
Smaller companies like DeepSeek are set to play a growing part in creating AI tools that have the potential to make our lives easier. It would be a mistake to underestimate that.