Meet ‘DarkBERT’: South Korea’s Dark Web AI Could Combat Cybercrime

A team of South Korean researchers has taken the unprecedented step of developing and training artificial intelligence (AI) on the so-called “Dark Web.” The Dark Web-trained AI, called DarkBERT, was set loose to trawl and index what it could find, with the aim of shedding light on ways to combat cybercrime.

The “Dark Web” is a section of the internet that remains hidden and cannot be accessed through standard web browsers. This part of the web is notorious for anonymous websites and marketplaces that facilitate illegal activities, such as drug and weapon trading and the sale of stolen data, and it serves as a haven for cybercriminals.

How Does DarkBERT Function?

DarkBERT is still a work in progress. The developers are refining the AI so that it adapts to the language used on the dark web, and the researchers are training the model on text gathered by crawling the Tor network, as sketched below.
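
For readers curious what crawling over Tor looks like in practice, the minimal Python sketch below fetches a single hidden-service page through a local Tor SOCKS proxy. It is not the researchers’ crawler: the proxy port, timeout, and the placeholder .onion address are all assumptions for illustration.

```python
# Minimal sketch of fetching one hidden-service page through Tor.
# Assumes Tor is running locally on port 9050 and that requests is
# installed with SOCKS support (pip install "requests[socks]").
# The .onion address below is a placeholder, not a real site.
import requests

TOR_PROXY = {
    "http": "socks5h://127.0.0.1:9050",   # socks5h lets Tor resolve .onion names
    "https": "socks5h://127.0.0.1:9050",
}

def fetch_onion_page(url: str, timeout: int = 60) -> str:
    """Download the raw HTML of a single hidden-service URL via Tor."""
    response = requests.get(url, proxies=TOR_PROXY, timeout=timeout)
    response.raise_for_status()
    return response.text

if __name__ == "__main__":
    html = fetch_onion_page("http://exampleonionaddress.onion/")
    print(f"Fetched {len(html)} characters")
```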

It has also been reported that the pre-training data will be carefully filtered and deduplicated. Data processing will also be built into the pipeline to identify threats or concerns in the sensitive information the crawl is expected to contain.
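
The exact filtering pipeline has not been published, but deduplicating a crawled corpus often comes down to hashing normalized page text and dropping repeats. The sketch below illustrates that idea only; the normalization rules are assumptions, not DarkBERT’s actual preprocessing.

```python
# Sketch of corpus deduplication: drop pages whose normalized text hashes
# to a value already seen. The normalization rules here are assumptions,
# not DarkBERT's published preprocessing.
import hashlib
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def deduplicate(pages: list[str]) -> list[str]:
    """Keep only the first occurrence of each (normalized) page."""
    seen: set[str] = set()
    unique_pages: list[str] = []
    for page in pages:
        digest = hashlib.sha256(normalize(page).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique_pages.append(page)
    return unique_pages

if __name__ == "__main__":
    corpus = ["Stolen data for sale", "stolen   data  for sale", "Forum rules"]
    print(deduplicate(corpus))  # the near-duplicate second page is dropped
```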

According to the team, their LLM was far better at making sense of the dark web than other models that were trained to complete similar tasks, including RoBERTa, which Facebook researchers designed back in 2019 to “predict intentionally hidden sections of text within otherwise unannotated language examples,” according to an official description.
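
That “fill in the hidden text” objective can be seen with the publicly released roberta-base checkpoint and the Hugging Face transformers library; DarkBERT’s own weights are not assumed here, and the example sentence is made up.

```python
# Illustration of the masked-language-modelling objective described above,
# using the public roberta-base checkpoint (not DarkBERT itself).
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# The model is asked to predict the intentionally hidden <mask> token.
for prediction in fill_mask("Stolen credit card data is sold on the <mask> web."):
    print(f"{prediction['token_str'].strip():>10}  score={prediction['score']:.3f}")
```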

“Our evaluation results show that DarkBERT-based classification model outperforms that of known pre-trained language models,” the researchers wrote in their paper.

According to the team, DarkBERT has the potential to be employed for diverse cybersecurity purposes, including identifying websites that sell ransomware or leak confidential data. Additionally, it can scour the numerous dark web forums that are updated daily and monitor them for any illegal exchange of information.
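
As a rough illustration of how an encoder like DarkBERT could back that kind of monitoring, the sketch below wraps a stand-in BERT checkpoint in a two-label page classifier. The checkpoint, label names, and example text are assumptions, and the classification head is untrained, so the output is for demonstration only.

```python
# Rough sketch of a DarkBERT-style page classifier. A public BERT checkpoint
# stands in for DarkBERT, the label names are hypothetical, and the
# classification head is untrained, so the output is illustrative only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"          # stand-in checkpoint, not DarkBERT
LABELS = ["benign", "leak-site"]          # hypothetical labels for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)
model.eval()

def classify_page(text: str) -> str:
    """Return the predicted label for one crawled page."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify_page("All company files will be published unless payment is made."))
```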

What’s next?

Development of DarkBERT is ongoing. The researchers plan to incorporate multiple languages into the pre-trained model, and they expect its performance to improve as more recent language models are used for pre-training and additional data is crawled.