
What is DeepSeek, the AI chatbot from China that is sending shockwaves through the tech world?

DeepSeek-R1, the latest in a series of models developed with fewer chips and at low cost, is challenging the dominance of OpenAI, Google, and Meta.

Chinese artificial intelligence (AI) lab DeepSeek's eponymous large language model (LLM) has stunned Silicon Valley by becoming one of the biggest competitors to US firm OpenAI's ChatGPT.

The latest DeepSeek models, released this month, are said to be both extremely fast and low-cost.

DeepSeek-R1, the latest of the models developed with fewer chips, is already challenging the dominance of giant players such as OpenAI, Google, and Meta, sending shares in chipmaker Nvidia plunging on Monday.

Here's what we know about the industry disruptor from China.

Where did DeepSeek come from?

The Hangzhou, China-based company was founded in July 2023 by Liang Wenfeng, an information and electronics engineer and graduate of Zhejiang University.

It was part of the incubation programme of High-Flyer, a fund Liang founded in 2015. Like other leading names in the industry, Liang aims to reach "artificial general intelligence", a level at which AI can match or surpass humans in a wide range of tasks.

Because DeepSeek operates independently, its funding model allows it to pursue ambitious AI projects without pressure from outside investors and to prioritise long-term research and development.

DeepSeek's team is made up of young graduates from China's top universities, with a company recruitment process that prioritises technical skills over work experience.

In short, the company is seen as bringing a fresh perspective to the development of artificial intelligence models.

DeepSeek's journey began in November 2023 with the launch of DeepSeek Coder, an open-source model designed for coding tasks.

This was followed by DeepSeek LLM, which aimed to compete with other major language models. DeepSeek-V2, released in May 2024, gained traction due to its strong performance and low cost.

It also forced other Chinese tech giants such as ByteDance, Tencent, Baidu, and Alibaba to lower the prices of their AI models.

What is the capacity of DeepSeek models?

DeepSeek-V2 was later followed by DeepSeek-Coder-V2, a more advanced model with 236 billion parameters.

Designed for complex coding prompts, the model has a large context window of up to 128,000 tokens.

A token is a unit of text. It can be a whole word, part of a word, or even a single character. For example, "Artificial intelligence is great!" might be split into five tokens: "Artificial", "intelligence", "is", "great" and "!".
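Real language models, DeepSeek's included, use learned subword tokenizers (such as byte-pair encoding), so their actual token counts will differ. The toy sketch below only illustrates the basic idea of splitting text into units; `toy_tokenize` is a hypothetical helper that treats each run of word characters and each punctuation mark as one token.

```python
import re

def toy_tokenize(text: str) -> list[str]:
    """Split text into word tokens and single-character punctuation tokens.

    A deliberately simplified, hypothetical tokenizer: production LLM
    tokenizers learn subword units and would split text differently.
    """
    # \w+ matches a run of letters/digits/underscores; [^\w\s] matches
    # any single character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = toy_tokenize("Artificial intelligence is great!")
print(tokens)       # ['Artificial', 'intelligence', 'is', 'great', '!']
print(len(tokens))  # 5
```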

The context window is the maximum length of input text, measured in tokens, that a model can process at once; for this model, that limit is 128,000 tokens.

A larger context window allows a model to understand, summarise or analyse longer texts. This is a great advantage, for example, when working on long documents, books, or complex dialogues.
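To make the limit concrete, here is a sketch of how an application might check whether a text fits in a 128,000-token window and, if not, cut it into window-sized chunks. The names `CONTEXT_WINDOW`, `toy_tokens`, `fits_in_context` and `split_into_windows` are hypothetical; tokens are approximated with a simple word-and-punctuation splitter rather than a real model tokenizer, and the sketch ignores the room a real prompt and reply also need inside the window.

```python
import re

CONTEXT_WINDOW = 128_000  # DeepSeek-Coder-V2's stated limit, in tokens

def toy_tokens(text: str) -> list[str]:
    # Toy approximation: one token per word or punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

def fits_in_context(text: str, window: int = CONTEXT_WINDOW) -> bool:
    """True if the whole text can be processed in a single pass."""
    return len(toy_tokens(text)) <= window

def split_into_windows(tokens: list[str], window: int = CONTEXT_WINDOW) -> list[list[str]]:
    """Cut a token list into consecutive chunks of at most `window` tokens."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]

print(fits_in_context("Artificial intelligence is great!"))  # True

# A hypothetical 300,000-token document must be split into three chunks.
long_document = " ".join(["word"] * 300_000)
print(fits_in_context(long_document))  # False
chunks = split_into_windows(toy_tokens(long_document))
print([len(c) for c in chunks])  # [128000, 128000, 44000]
```

Splitting a document loses any connections that span chunk boundaries, which is why a larger window helps when summarising or analysing long texts.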

The company's latest models DeepSeek-V3 and DeepSeek-R1 have further consolidated its position.

A 671-billion-parameter model, DeepSeek-V3 requires significantly fewer resources than its peers while performing impressively against rival models in various benchmark tests.

DeepSeek-R1, which was launched this month, focuses on complex tasks such as reasoning, coding, and maths. With these capabilities, it challenges o1, one of OpenAI's latest models.

Although DeepSeek has achieved significant success in a short time, the company is primarily focused on research and has no detailed plans for commercialisation in the near future, according to Forbes.

Is it free for the end user?

One of the main reasons DeepSeek has managed to attract attention is that it is free for end users.

This is the first such advanced AI system available to users for free. Other powerful systems such as OpenAI o1 and Claude Sonnet require a paid subscription, and some subscriptions even impose usage quotas.

Google Gemini is also available for free, but its free versions are limited to older models. DeepSeek has no such limits for now.

How to use it?

Users can access the DeepSeek chat interface for end users at "chat.deepseek". They simply type prompts into the chat screen; pressing the "search" button lets the chatbot search the internet.

There is also a "deep think" option for obtaining more detailed information on any subject. As well as giving more detailed answers, it can draw on more sites through the search engine. However, unlike ChatGPT, which relies only on certain sources, this feature may also surface false information from some small sites, so users should verify any information the chatbot provides.

Is it safe?

Another important question about using DeepSeek is whether it is safe. DeepSeek, like other services, requires user data, which is likely stored on servers in China.

As with any LLM, it is important that users do not give sensitive data to the chatbot.

Since DeepSeek is also open source, independent researchers can examine the model's code and try to determine whether it is secure. More detailed information on security concerns is expected to be released in the coming days.

What does open source mean?

The models, including DeepSeek-R1, have been released as largely open source. This means that anyone can access the model's code and use it to customise the LLM, although the training data remains proprietary.

OpenAI, on the other hand, released o1 as a closed model and sells access to it only through paid packages costing $20 (€19) to $200 (€192) per month.

How did it produce such a model despite US restrictions?

The company has also established strategic partnerships to enhance its technological capabilities and market reach.

One notable collaboration was with the US chip company AMD. According to Forbes, DeepSeek used AMD Instinct GPUs (graphics processing units) and ROCm software at key stages of model development, particularly for DeepSeek-V3.

MIT Technology Review reported that Liang had stockpiled a significant number of Nvidia A100 chips, a type now banned for export to China, long before the US imposed chip sanctions on China.

Chinese media outlet 36Kr estimates that the company has more than 10,000 of these chips in stock, while some estimates put the figure as high as 50,000.

Realising the importance of this stockpile for AI training, Liang founded DeepSeek and began using the chips alongside lower-power chips to improve his models.

The key point, however, is that Liang has found a way to build competent models with few resources.

US chip export restrictions forced DeepSeek developers to create smarter, more energy-efficient algorithms to compensate for their lack of computing power.

ChatGPT is thought to need 10,000 Nvidia GPUs to process training data. DeepSeek engineers say they achieved similar results with only 2,000 GPUs.
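Taking these reported figures at face value (both are estimates, and the GPU types and training durations involved may differ), the comparison works out as follows. `gpu_reduction` is a hypothetical helper written for illustration.

```python
def gpu_reduction(baseline: int, actual: int) -> tuple[float, float]:
    """Return (share of the baseline GPU count used, percentage reduction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    share = actual / baseline
    return share, (1 - share) * 100

# Figures as reported: ~10,000 GPUs for ChatGPT vs 2,000 for DeepSeek.
share, reduction = gpu_reduction(baseline=10_000, actual=2_000)
print(f"DeepSeek used {share:.0%} of the GPUs, {reduction:.0f}% fewer")
```

GPU count alone ignores chip type and training time, so this ratio is a rough indicator rather than a measure of total computing power used.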

What has the reaction to DeepSeek been?

Alexandr Wang, CEO of Scale AI, which provides training data to the AI models of major players such as OpenAI and Google, described DeepSeek's product as "an earth-shattering model" in a speech at the World Economic Forum (WEF) in Davos last week.

While DeepSeek has stunned American rivals, analysts are already warning about what its release will mean in the West.

"We should be alarmed. Chinese AI technology integrating further into the UK and Western society is not just a bad idea - it’s a reckless one," Ross Burley, Co-Founder of the Centre for Information Resilience, said.

"We’ve seen time and again how Beijing weaponises its tech dominance for surveillance, control, and coercion, both domestically and abroad. Whether it’s through spyware-laden devices, state-sponsored cyber campaigns, or the misuse of AI to suppress dissent, China’s track record demonstrates that its technology is an extension of its geopolitical strategy," he added.

"This might appear to be a benign Large Language Model, but we’ve already seen that the AI is suppressing information critical of the Chinese government".

Others agree that the release of its latest LLM is a political move, one that is likely to inflame already tense Sino-American relations.

"The technology innovation is real, but the timing of the release is political in nature," Gregory Allen, director of the Wadhwani AI Center at the Center for Strategic and International Studies, told the Associated Press.

Allen compared DeepSeek's announcement last week to the release of a new phone by US-sanctioned Chinese company Huawei during diplomatic talks over the Biden administration's export controls in 2023.

"Trying to show that the export controls are futile or counterproductive is a really important goal of Chinese foreign policy right now," Allen said.
