Researchers at the Leiden Institute of Advanced Computer Science found thousands of repositories on GitHub that offer fake proof-of-concept (PoC) exploits for various vulnerabilities, some of them including malware.
GitHub is one of the largest code hosting platforms, and researchers use it to publish PoC exploits to help the security community verify fixes for vulnerabilities or determine the impact and scope of a flaw.
According to the researchers' technical paper, the probability of getting infected with malware instead of obtaining a PoC could be as high as 10.3%, excluding proven fakes and prankware.
Data collection and analysis
The researchers analyzed a little over 47,300 repositories advertising an exploit for a vulnerability disclosed between 2017 and 2021 using the following three mechanisms:
- IP address analysis: comparing the PoC publisher's IP address against public blocklists, VirusTotal, and AbuseIPDB.
- Binary analysis: running VirusTotal checks on the provided executables and their hashes.
- Hexadecimal and Base64 analysis: decoding obfuscated files before performing the binary and IP checks.
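The decode-then-check idea behind the third mechanism can be illustrated with a minimal Python sketch. It is not the researchers' tool: the regular expressions, the length thresholds, and the function names (`decode_candidates`, `extract_ips`) are assumptions chosen for illustration. The sketch looks for runs that resemble Base64 or hexadecimal, decodes them, and collects any valid IPv4 addresses, which could then be compared against blocklists.

```python
import base64
import binascii
import ipaddress
import re

# Candidate encoded runs; the minimum lengths are arbitrary illustrative thresholds.
BASE64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def decode_candidates(text):
    """Return decoded strings for runs in `text` that look like Base64 or hex."""
    decoded = []
    for match in BASE64_RE.findall(text):
        try:
            raw = base64.b64decode(match, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64 after all
        decoded.append(raw.decode("utf-8", errors="ignore"))
    for match in HEX_RE.findall(text):
        try:
            raw = bytes.fromhex(match)
        except ValueError:
            continue
        decoded.append(raw.decode("utf-8", errors="ignore"))
    return decoded


def extract_ips(text):
    """Collect valid IPv4 addresses from the text and from its decoded layers."""
    found = set()
    for chunk in [text] + decode_candidates(text):
        for candidate in IPV4_RE.findall(chunk):
            try:
                found.add(str(ipaddress.ip_address(candidate)))
            except ValueError:
                pass  # e.g. 999.1.1.1 matches the regex but is not an address
    return found
```

A single decoding pass like this only handles one layer of encoding, which is one reason heavily obfuscated code is harder to catch automatically.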
Of the 150,734 unique IPs extracted, 2,864 matched blocklist entries, 1,522 were detected as malicious in antivirus scans on VirusTotal, and 1,069 of them were present in the AbuseIPDB database.
The binary analysis examined a set of 6,160 executables and revealed a total of 2,164 malicious samples hosted in 1,398 repositories.
In total, 4,893 repositories out of the 47,313 tested were deemed malicious, with most of them concerning vulnerabilities from 2020.
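As a quick sanity check, the reported counts can be turned into percentages. The short Python sketch below uses only the figures quoted in this article; the helper name `share` is hypothetical.

```python
def share(part, whole):
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures as reported in the article.
malicious_repo_rate = share(4_893, 47_313)    # repositories deemed malicious
malicious_binary_rate = share(2_164, 6_160)   # executables flagged as malicious
blocklisted_ip_rate = share(2_864, 150_734)   # unique IPs matching blocklists

print(malicious_repo_rate, malicious_binary_rate, blocklisted_ip_rate)
```

The repository ratio works out to roughly 10.3%, in line with the infection probability cited from the paper.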
The report contains a small set of repositories with fake PoCs that delivered malware. However, the researchers shared with BleepingComputer at least 60 other examples that are still live and in the process of being taken down by GitHub.
Malware in the PoC
By looking closer into some of those cases, the researchers found a plethora of different malware and harmful scripts, ranging from remote access trojans to Cobalt Strike.
One interesting case is that of a PoC for CVE-2019-0708, commonly known as “BlueKeep”, which contains a base64-obfuscated Python script that fetches a VBScript from Pastebin.
In another case, the researchers spotted a fake PoC that was an info-stealer collecting system information, IP address, and user agent.
This PoC had been created earlier as a security experiment by another researcher, so finding it with the automated tool confirmed to the researchers that their approach worked.
One of the researchers, El Yadmani Soufian, who is also a security researcher at Darktrace, was kind enough to provide BleepingComputer with additional examples not included in the technical report, which are given below:
- PowerShell PoC containing a Base64-encoded binary flagged as malicious on VirusTotal.
- Python PoC containing a one-liner that decodes a Base64-encoded payload flagged as malicious on VirusTotal.
- Fake BlueKeep exploit containing an executable that is flagged by most antivirus engines as malicious and identified as Cobalt Strike.
- A script hidden inside a fake PoC, with inactive malicious components that could cause damage if its author so wishes.
How to stay safe
Blindly trusting a repository on GitHub from an unverified source is a bad idea because the content is not moderated, so it falls on users to review it before using it.
Software testers are advised to carefully scrutinize the PoCs they download and run as many checks as possible before executing them.
Soufian believes that all testers should follow these three steps:
- Carefully read the code you are about to run on your or your customer’s network.
- If the code is too obfuscated and would take too long to analyze manually, sandbox it in an isolated environment (e.g., an isolated virtual machine) and check your network for any suspicious traffic.
- Use open-source intelligence tools like VirusTotal to analyze binaries.
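For the third step, one low-risk way to start is to hash a downloaded binary locally and search for that hash, rather than executing the file. The Python sketch below is only an illustration of that idea; the function name `sha256_of_file` and the chunk size are assumptions, and it makes no network calls.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file without loading it fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex digest can be pasted into the VirusTotal search box to see whether the file has already been analyzed. A hash with no results does not mean the file is safe, only that it has not been seen before.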
The researchers have reported all the malicious repositories they discovered to GitHub, but it will take some time until all of them are reviewed and removed, so many still remain available to the public.
As Soufian explained, their study aims not just to serve as a one-time cleaning action on GitHub but to act as a trigger to develop an automated solution that could be used to flag malicious instructions in the uploaded code.
This is the first version of the team’s research, and they are working on improving their detector. Currently, the detection tool misses code with stronger obfuscation.