Authors
Carlos A. Rivera A.(1), Xinzhang Chen(1), Arash Shaghaghi1(1), Gustavo Batista(1), and Salil S. Kanhere(1)
- The University of New South Wales (UNSW) Sydney, Australia.
Abstract
The rapid integration of Internet of Things (IoT) devices into enterprise environments presents significant security challenges. Many IoT devices are released to the market with minimal security measures, often harbouring an average of 25 vulnerabilities per device. To enhance cybersecurity measures and aid system administrators in managing IoT patches more effectively, we propose an innovative framework that predicts the time it will take for a vulnerable IoT device to receive a fix or patch. We developed a survival analysis model based on the Accelerated Failure Time (AFT) approach, implemented using the XGBoost ensemble regression model, to predict when vulnerable IoT devices will receive fixes or patches. By constructing a comprehensive IoT vulnerabilities database that combines public and private sources, we provide insights into affected devices, vulnerability detection dates, published CVEs, patch release dates, and associated Twitter activity trends. We conducted thorough experiments evaluating different combinations of features, including fundamental device and vulnerability data, National Vulnerability Database (NVD) information such as CVE, CWE, and CVSS scores, transformed textual descriptions into sentence vectors, and the frequency of Twitter trends related to CVEs. Our experiments demonstrate that the proposed model accurately predicts the Time-2-Fix for IoT vulnerabilities, with data from VulDB and NVD proving particularly effective. Incorporating Twitter trend data offered minimal additional benefit. This framework provides a practical tool for organisations to anticipate vulnerability resolutions, improve IoT patch management, and strengthen their cybersecurity posture against potential threats.
Keywords
CyberSecurity in IoT | Time2Fix | Internet of Things (IoT) | Common Vulnerability Enumeration (CVE) | Common Weakness Enumeration (CWE) | MITRE Att&ck |
The Datasets
Our dataset was created by collecting vulnerability information on IoT devices from four sources: MITRE, NIST NVD, VulDB, and Twitter. MITRE provided information about common vulnerabilities and exposures (CVE) and their corresponding release dates. NIST delivers additional data to each CVE, such as the common vulnerability risk score (CVSS) and the common weaknesses enumeration (CWE). VulDB was used to obtain extensive information on the vulnerability (CVE), including risk scores (CVSS), exploits, attacks, weaknesses (CWEs), remediation actions, and dates corresponding to each element. Lastly, Twitter was used to identify social network trends related to each published vulnerability identifier (CVEID) and extracted only the trend counts provided by the platform. Table 1 provides a detailed breakdown of the features we extracted from each source.
We used the MITRE and NIST vulnerabilities databases (NIST NVD) to gather vulnerabilities. We filtered out records not related to hardware by identifying the type of affected product using the product code in the NVD (o for operating system, s for software, a for application, and h for hardware). We utilised the publication date of CVE as a reference to determine the Time-2-Fix for each vulnerability. However, we came across numerous records that resulted in a zero Time-2-Fix. This indicates that the date of the fix being published is the same as that of CVE. There could be various reasons behind this, and one of the most common reasons we found is the manufacturer's strategy, which involves publishing CVEs only after developing the fix. Due to this biased process, we conducted an in-depth analysis of the CVE data. We discovered that NVD labels the references attached to each CVE, and we found it convenient that one of the labels refers to PATCH (Vendor Advisory, Patch, Third Party Advisory). After developing Java code scripts, the dataset underwent filtration procedures, with the intention of retaining only those records that were accurately labelled as PATCH.
To accurately gauge the length of Time-2-Fix, we thoroughly investigated the reference links associated with each CVE. We meticulously searched for relevant information regarding the date the fix was made available. Once we obtained this information, we compared it to the date the vulnerability had been initially detected. By subtracting the latter from the former, we could precisely calculate the time (Time-2-Fix) it had taken for the device's manufacturer to address the reported vulnerability (CVE). The calculated Time-2-Fix is what we will use as the predicted feature.
Upon compiling all pertinent information, our final dataset consisted of 1,027 records. These records encompass many aspects, including vulnerabilities, exploits, remediation actions, risk levels, timelines, and Twitter trends regarding IoT devices. We thoroughly analysed this dataset to identify the optimal model for implementation. In the next section, we delve deeper into the chosen machine learning model, which accurately predicts Time-2-Fix.
Table 1. Dataset-0 features. Breakdown of the features provided by each source: MITRE (SRC-1), NIST (RC-2), VulDB (SRC-3), and Twitter (SRC-4).
| Source | Feature Name | Data Type | Unique Values | Details |
|---|---|---|---|---|
| SRC-1 | CVEID | Continuous | 1,022 (21-02/2023) | Vulnerability identifier. |
| SRC-2 | CVSS | Categorical | 100 | A real number range 0 to 10. |
| SRC-2 | Weakness ID (CWEID) | Categorical | 926 | Weakness identifier |
| SRC-3 | Title | Categorical | 885 | Vulnerability Title. |
| SRC-3 | Summary | Categorical | 271 | Vulnerability Summary. |
| SRC-3 | Affected | Categorical | 871 | Affected products description. |
| SRC-3 | Vulnerability | Categorical | 73 | Vulnerability details identifier. |
| SRC-3 | Impact | Categorical | 19 | Attack impact. |
| SRC-3 | Exploit | Categorical | 91 | Exploit description. |
| SRC-3 | Countermeasure | Categorical | 194 | Suggested countermeasure description. |
| SRC-3 | Sources | Categorical | 735 | Data sources description. |
| SRC-3 | Vendor | Categorical | 77 | Vendor name of affected product. |
| SRC-3 | Product Name | Categorical | 333 | Name of affected product. |
| SRC-3 | Component | Categorical | 360 | Name of affected component. |
| SRC-3 | Version | Categorical | 392 | Affected product version. |
| SRC-3 | Risk | Categorical | 3 | Vulnerability Risk (Low, Medium, High). |
| SRC-3 | Class | Categorical | 72 | Vulnerability Class Name. |
| SRC-3 | Attck | Categorical | 21 | Attck Identifier from MITRE Att&ck. |
| SRC-3 | CVE Assigned | Date | 235 | Date of the CVE assignment. |
| SRC-3 | VulDB Base Score | Categorical | 100 | CVSSv3 Base Score calculation by VulDB. |
| SRC-3 | Vulnerability Found | Date | 144 | Timestamp for reported vulnerability found. |
| SRC-3 | Advisory Disclosed | Date | 326 | Timestamp for reported vulnerability Disclosed by Advisory. |
| SRC-3 | CVE Reserved | Date | 235 | Timestamp CVE Reserved. |
| SRC-3 | NVD Disclosed | Date | 215 | Timestamp of Vulnerability Disclosure by NVD. |
| SRC-3 | VulDB Entry Created | Date | 171 | Timestamp of Vulnerability Entry Creation by VulDB. |
| SRC-3 | VulDB Entry Updated | Date | 1022 | Timestamp of Vulnerability Entry Last Updated by VulDB. |
| SRC-3 | Advisory Confirmation | Date | 3 | Advisory Confirmation Type (Confirmed, Not Defined, Uncorroborated. |
| SRC-3 | Exploit Publicity | Categorical | 3 | Publicity of the exploit (Public, Private, NA). |
| SRC-3 | Exploit Availability | Categorical | 3 | Availability of Exploit (1=Yes, 0=No, NA). |
| SRC-3 | Price 0day | Categorical | 4 | Known or Estimated 0-day Price of the Exploit. |
| SRC-3 | Price Today | Categorical | 4 | Known or Estimated Price of the Exploit as of Today (Daily Updated). |
| SRC-3 | EPSS Score | Categorical | 536 | Current Value of the Exploit Prediction Scoring System. |
| SRC-3 | EPSS Percentile | Categorical | 653 | Percentile of CVE within Current EPSS. |
| SRC-3 | Countermeasure | Categorical | 4 | Generic Remediation Level Description. |
| SRC-3 | Countermeasure Name | Categorical | 8 | Name of the Suggested Countermeasure. |
| SRC-3 | Upgrade Version | Categorical | 132 | First Known Unaffected Version(s). |
| SRC-4 | Tweets | Categorical | 54 | Tweet Count of a CVE. |
| SRC-4 | First Tweet Date | Date | 165 | Timestamp of the First Tweet. |
| SRC-4 | Retweet Count | Categorical | 44 | Retweets Average. |
| SRC-4 | Reply Count | Categorical | 21 | Reply-Count Average. |
| SRC-4 | Like Count | Categorical | 47 | Like-Count Average. |
| SRC-4 | Quote Count | Categorical | 19 | Quote-Count Average. |
| SRC-4 | Impression Count | Categorical | 18 | Impression-Count Average. |
The Time2Fix dataset can be found in the following link.
Datasets inquiries Please submit any inquiry related to the dataset to:
