Verizon released their yearly Data Breach Investigations Report (DBIR) and it wasn’t too long before I started getting asked about their “Vulnerabilities” section (page 13). After bringing up some highly questionable points about last year’s report regarding vulnerabilities, several people felt that the report did not stand up to scrutiny. With a few questions leveled at me, I was curious if Verizon and partners learned from last year.
This year’s vulnerability data was provided by Kenna Security (formerly Risk I/O), and Verizon “also utilized vulnerability scan data provided by Beyond Trust, Qualys and Tripwire in support of this section.” So the data isn’t from a single vendor, but at least four vendors, giving the impression that the data should be well-rounded, and have less questions than last year.
From the report:
Secondly, attackers automate certain weaponized vulnerabilities and spray and pray them across the internet, sometimes yielding incredible success. The distribution is very similar to last year, with the top 10 vulnerabilities accounting for 85% of successful exploit traffic. While being aware of and fixing these mega-vulns is a solid first step, don’t forget that the other 15% consists of over 900 CVEs, which are also being actively exploited in the wild.
This is not encouraging, as they have 10 vulnerabilities that account for an incredible amount of traffic, and the footnoted list of CVE IDs suggests the same problems as last year. And just like last year, the report does not explain the methodology for detecting the vulnerabilities, does not include details about the generation of the statistics, and provides a loose definition of what “successfully exploited” means. Without more detail it is impossible for others to reproduce their results, and extremely difficult to explain or disclaim them as a third party reading the report. Going to the Kenna Security page about this report doesn’t really yield much clarity, but does highlight another potential flaw in the methodology:
Kenna’s Chief Data Scientist Michael Roytman was the primary author of this year’s “Vulnerabilities” chapter, analyzing a correlated threat data set that spans 200M+ successful exploitations across 500+ common vulnerabilities and exposures from over 20,000 enterprises in more than 150 countries.
It’s subtle, but notice they went through a data set that spans exploitations across “500+ common vulnerabilities and exposures”, also known as CVE. If the data is only looking for CVEs, then there is an incredibly large bias at play from the start, since they are missing at least half of the disclosed vulnerabilities. More importantly, this becomes a game of fractions that the industry is keen to overlook at every opportunity:
- CVE represents approximately half of the disclosed vulnerabilities.
- Vulnerability scanners and IPS/IDS don’t have signatures for all CVE IDs, so they look for some fraction of CVE.
- Detection signatures are often flawed, leading to false positives and false negatives, meaning they are actually detecting a fraction of the CVE IDs they intend to.
Another crucial factor in how this data is generated is in the detection of the exploits. Of the four companies contributing data, one was founded in 2009 (Risk I/O / Kenna) and another in 2006 (Beyond Trust, in the context of this discussion). That leaves Qualys (founded 1999) and Tripwire (founded 1997), who are likely the sources of the signatures that detected the vulnerabilities. For those around in the late 90s, the vulnerability landscape was very different than today, and security products based on signatures back then are in some ways considered rudimentary compared to today. Over time, most security products do not revisit older signatures to improve them unless they have to, often due to customer demand. Newly formed companies basically never go back and write signatures for vulnerabilities from 1999. So it stands to reason that the detection of these issues are based on Qualys and/or Tripwire’s detection capabilities, and the signatures detecting these vulnerabilities are likely outdated and not as well-constructed as compared to their more recent signatures.
That leads us to ask, how many vulnerabilities are these companies really looking for? Where did the detection signatures originate and how accurate are they? While the DBIR does disclaim that the data used is a sample, they also admit “bias undoubtedly exists”. However, they don’t warn the reader of these extremely limiting caveats that put the entire data set into a perspective clearly showing strong bias. This, combined with the lack of detailed methodology for how these vulnerabilities are detected and correlated to measure ‘success’, ultimately means this data has little value other than for inclusion in pedestrian reports on vulnerabilities.
With that in mind, I can only go by what information is available. We’ll start with the concise list of the top 10 CVE IDs these four vulnerability intelligence providers say are being exploited the most, and Verizon labels as “successfully exploited”:
- 2015-03-05 – CVE-2015-1637 – Microsoft Windows Secure Channel (Schannel) RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2015-01-06 – CVE-2015-0204 – OpenSSL RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2012-02-25 – CVE-2012-1054 – Puppet k5login File Symlink File Overwrite Local Privilege Escalation
- 2011-07-19 – CVE-2011-0877 – Oracle Enterprise Manager Grid Control Instance Management Unspecified Remote Issue (2011-0877)
- 2004-02-10 – CVE-2003-0818 – Microsoft Windows ASN.1 Library (MSASN1.DLL) BER Encoding Handling Remote Integer Overflows
- 2002-01-15 – CVE-2002-0126 – BlackMoon FTP Server Multiple Command Remote Overflow
- 2001-12-26 – CVE-2002-0953 – PHPAddress globals.php LangCookie Parameter Remote File Inclusion
- 2001-12-20 – CVE-2001-0876 – Microsoft Windows Universal Plug and Play NOTIFY Directive URL Handling Remote Overflow
- 2001-04-13 – CVE-2001-0680 – QVT/Net / Term FTP Server LIST Command Traversal Remote File Access
- 1999-11-22 – CVE-1999-1058 – Vermillion FTPD Long CWD Command Handling Remote Overflow DoS
This list should raise serious red flags for anyone passingly familiar with vulnerabilities. Not only do we have very odd ‘top 10’ lists from last year and this year, but there is little overlap between them. How does 2015 show a top 10 list exploiting eight vulnerabilities with CVE identifiers between 1999 and 2002, meaning they had been exploited so much as many as thirteen years later, only to see them all drop off the list this year, to be replaced by new 15+ year old vulnerabilities? In addition to this oddity, there are more considerations leading to my top 10 list of questions about their list:
- How does a local vulnerability based on a symlink overwrite flaw (CVE-2012-1054) make it into a top 10 list of “85% of successful exploit traffic“?
- How does a local vulnerability in Puppet rank #3 on this list, given the install base of Puppet as compared to Adobe or Java?
- If they are detecting exploits on the wire, shouldn’t we see Java, Adobe Reader, and Adobe Flash somewhere on the list? The “Slow and steady—but how slow?” section even talks about time-to-exploit for Adobe.
- Why doesn’t this list remotely match US-CERT’s “Top 30 Targeted High Risk Vulnerabilities” that includes vulnerabilities back to 2006, but not a single one listed above?
- How does a vulnerability that by all accounts is so vague, that it has to be distinguished by the vendor issued CVE ID (CVE-2011-0877), have a signature and get exploited so much?
- How does a vulnerability in Oracle Enterprise Manager Grid Control show up as #4, when no Oracle Database vulnerabilities appear?
- How do you distinguish an FTP LIST command exploit from one vendor to another? (e.g. CVE-2005-2726, CVE-2002-0558, CVE-2001-0933, CVE-2001-0680) According to the one-liner methodology, this is done via pairing SIEM data, suggesting that BlackMoon and Vermillion are that popular today.
- Yet, how does a remote DoS in an Windows-based FTP program that doesn’t appear to have been distributed for a decade make it on this list? Are people really conducting targeted DoS attacks against this software?
- Is BlackMoon FTP Server really that prevalent to be exploited so often?
- Or is there a problem in generating this data, which would be more easily attributed to loose signatures detecting FTP attacks regardless of vendor?
Figure 12 in the report, which is described as “Count of CVEs exploited in 2015 by CVE publication date” is a curious thing to include, as the CVE publication date is very distinct from the vulnerability disclosure date. While a large percentage of CVE publication dates are within seven days of disclosure, many are not (e.g. CVE-2015-8852 disclosed 2015-03-23 and CVE publication on 2016-04-26). Enough to make this chart questionable as far as the insight it provides. Taking the data as presented, are they really saying that only ~ 73 vulnerabilities with a 2015 ID were successfully exploited in 2015 across “millions of successful real-world exploitations“? Given that 40 vulnerabilities were discovered in the wild, 33 of which have 2015 CVE IDs, that means that only ~40 other 2015 vulnerabilities were successfully exploited? If that is the takeaway, how is the security industry unable to stop the increasing wave of data breaches, the same kinds that led to this report? Something doesn’t add up here.
While people are cheering about the DBIR disclaiming there is sample bias (and not really enumerating it), they ignore the measurement bias, don’t speak to publication bias, don’t explain the attrition bias between 2015 and 2016, or potential chaining bias. As usual, the media is happy to glom onto such reports without asking any of these questions or providing critical analysis. As an industry, we need to keep challenging metrics and statistics to ensure they are not only accurate, but provide meaning that benefits us.
4/28/2016 – According to Gabe (@gdbassett), the list of CVEs in the DBIR is incorrect. He posted a new list of CVEs (mostly the same) via Twitter in a reply to Andreas Lindh who was surprised at the top ten list of vulnerabilities as well. Gabe also confirmed that afterwards they “compared the figure CVEs (listed above) against the raw data. After removing non-confirmed breaches, they match.” He went on to link to another source showing “data” about one of the CVEs, that really doesn’t mean anything without more context. Meanwhile, Michael Roytman, who did the vulnerability section of the report confirmed that he/Kenna would be responding to this blog with one of their own.
I hate to harp on simple transposition style mistakes in a report, but given the severity behind using numeric identifiers for vulnerabilities, it seems like that should have been double and triple-checked. Even then, I don’t understand how someone familiar with vulnerabilities could see either list and not ask many of the same questions I did, or provide more information in the report to back the claims. That said, let’s look at Gabe’s new list of CVEs. Bold and links are used to highlight the new ones:
- 2015-03-05 – 2015-1637 – Microsoft Windows Secure Channel (Schannel) RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2015-01-06 – 2015-0204 – OpenSSL RSA Temporary Key Handling EXPORT_RSA Ciphers Downgrade MitM (FREAK)
- 2004-02-10 – 2003-0818 – Microsoft Windows ASN.1 Library (MSASN1.DLL) BER Encoding Handling Remote Integer Overflows
- 2002-07-22 – 2002-1054 – Pablo FTP Server LIST Command Arbitrary Directory Listing Remote Information Disclosure
- 2002-01-15 – 2002-0126 – BlackMoon FTP Server Multiple Command Remote Overflow
- 2001-12-26 – 2002-0953 – PHPAddress globals.php LangCookie Parameter Remote File Inclusion
- 2001-12-20 – 2001-0877 – Microsoft Windows Universal Plug and Play NOTIFY Request Remote DoS
- 2001-12-20 – 2001-0876 – Microsoft Windows Universal Plug and Play NOTIFY Directive URL Handling Remote Overflow
- 2001-04-13 – 2001-0680 – QVT/Net / Term FTP Server LIST Command Traversal Remote File Access
- 1999-11-22 – 1999-1058 – Vermillion FTPD Long CWD Command Handling Remote Overflow DoS
The list gained another FTP server issue that doesn’t necessarily lead to privileges, and another remote denial of service attack, while losing the Puppet symlink (CVE-2012-1054) and the vague Oracle Enterprise Manager issue (CVE-2011-0877). All said and done, the list is just as confusing as before, perhaps more so. That gives us four FTP vulnerabilities, only one of which leads to remote code execution, and two denial of service attacks, that gain no real privileges for an attacker. As Andreas Lindh points out, that I failed to highlight, having a man-in-the-middle vulnerability occupy two spots on this list is also baffling in the context of the volume of attacks stated. Also note that with the addition of CVE-2002-1054 (Pablo FTP), there are now two vulnerabilities that appear on the DBIR 2015 and DBIR 2016 top ten CVE list.
Hopefully the forthcoming blog from Michael Roytman will shed some light on these issues.
Recently, the Verizon 2015 Data Breach Investigations Report (DBIR) was released to much fanfare as usual, prompting a variety of media outlets to analyze the analysis. A few days after the release, I caught a Tweet linking to a blog from @raesene (Rory McCune) that challenged one aspect of the report. On page 16 of the report, Verizon lists the top 10 CVE ID exploited.
The bottom of the table says “Top 10 CVEs Exploited” which can be interpreted many ways. The paragraph above qualifies it as the “ten CVEs account for almost 97% of the exploits observed in 2014“. This is problematic because neither fully explains what that means. Is this the top 10 detected from sensors around the world? Meaning the exploits were launched and detected, with no indication if they were successful? Or does this mean these are the top ten exploits that were launched and resulted in a successful compromise of some form?
This gets more confusing when you read McCune’s blog that lists out what those ten CVE correspond to. I’ll get to the one that drew his attention, but will start with a different one. CVE-1999-0517 is an identifier for “an SNMP community name is the default (e.g. public), null, or missing.” This is very curious CVE identifier related to SNMP as it is specific to a default string. Any device that has a default community name, but does not allow remote manipulation is a rather trivial information disclosure vulnerability. This begins to speak to what the chart means, as “exploit” is being used in a broad fashion and does not track to the usage many in our industry associate with it. McCune goes on to point another on their list, CVE-2001-0540 which is an identifier for a “memory leak in Terminal servers in Windows NT and Windows 2000 allows remote attackers to cause a denial of service (memory exhaustion) via a large number of malformed Remote Desktop Protocol (RDP) requests to port 3389“. This is considerably more specific, giving us affected operating systems and the ultimate impact; a denial of service. This brings us back to why this section is in the report when part (or all) of it has nothing to do with actual breaches. After these two, McCune points out numerous other issues that directly challenge how this data was generated.
Since Verizon does not explain their methodology in generating these numbers, we’re left with our best guesses and a small attempt at an explanation via Twitter, that leads to as many questions as answers. You can read the full Twitter chat between @raesene, @vzdbir, and @mroytman (RiskIO Data Scientist who provided the data to generate this section of the report). The two pages of ‘methodology’ Verizon provides in the report (pages 59-60) are too high-level to be useful for the section mentioned above. Even after the conversation, the ‘top 10 exploited’ is still highly suspicious and does not seem accurate. The final bit from Roytman may make sense in RiskIO’s world, but doesn’t to others in the vulnerability world.
That brings me to the first vulnerability McCune pointed out, that started a four hour rabbit hole journey for me last night. From his blog:
The best example of this problem is CVE-2002-1931 which gets listed at number nine. This is a Cross-Site Scripting issue in version 2.1.1 and 1.1.3 of a product called PHP Arena and specifically the pafileDB area of that product. Now I struggled to find out too much about that product because the site that used to host it http://www.phparena.net is now a gambling site (I presume that the domain name lapsed and was picked up as it got decent traffic). Searching via google for information, most of the results seemed to be from vulnerability databases(!) and using a google dork of inurl:pafiledb shows a total of 156 results, which seems low for one of the most exploited issues on the Internet.
This prompted me to look at CVE-2002-1931 and the corresponding OSVDB entry. Then I looked at the ‘pafiledb.php’ cross-site scripting issues in general, and immediately noticed problems:
This is a good testament to how far vulnerability databases (VDBs) have come over the years, and how our data earlier on is not so hot sometimes. Regardless of vulnerability age, I want database completeness and accuracy so I set out to fix it. What I thought would be a simple 15 minute fix took much longer. After reading the original disclosures of a few early paFileDB XSS issues, it became clear that the one visible script being tested was actually more complex, and calling additional PHP files. It also became quite clear that several databases, ours included, had mixed up references and done a poor job abstracting. Next step, take extensive notes on every disclosure including dates, exploit strings, versions affected, solution if available, and more. The rough notes for some, but not all of the issues to give a feel for what this entails:
Going back to this script being ‘more complex’, and to try to answer some questions about the disclosure, the next trick was to find a copy of paFileDB 3.1 or before to see what it entailed. This is harder than it sounds, given the age of the software and the fact the vendor site has been gone for seven years or more. With the archive finally in hand, it became more clear what the various ‘action’ parameters really meant.
Going through each disclosure and following links to links, it also became obvious that every VDB missed the inclusion of some disclosures, and/or did not properly abstract them. To be fair, many VDBs have modified their abstraction rules over the years, including us. So using today’s standards along with yesterday’s data, we get a very different picture of those same XSS vulnerabilities:
With this in mind, reconsider the Verizon report that says CVE-2002-1931 is a top 10 exploited vulnerability in 2014. That CVE is very specific, based on a 2002-10-20 disclosure that starts out referring to a vulnerability that we don’t have details on. Either it was publicly disclosed and the VDBs missed it, it was mentioned on the vendor site somewhere (and doesn’t appear to be there now, even with the great help of archive.org), or it was mentioned in private or restricted channels, but known to some. We’re left with what that Oct, 2002 post said:
We know it is “pafiledb.php” only because that is the base script that in turn calls others. In reality, based on the other digging, it is very likely making a call to includes/search.php which contains the vulnerability. Without performing a code-level analysis, we can only include a technical note with our submission and move on. Continuing the train of thought, what data collection methods are being performed that assign that CVE to an attack that does not appear to be publicly disclosed? The vulnerability scanners from around that time, many of which are still used today, do not specifically test for and exploit that issue. Rather, they look for the additional three XSS disclosed in the same post further down. All three “pafiledb.php” exploits that call a specific action that correspond to /includes/rate.php, /includes/download.php, and /includes/email.php. It is logical that intrusion detection systems and vulnerability scanners would be looking for those three issues (likely lumped into one ID), but not the vague “search string” issue.
McCune’s observation and singling this vulnerability out is spot on. While he questioned the data in a different way, my method gives additional evidence that the ‘top 10’ is built on faulty data at best. I hope that this blog is both educational from the VDB side of things, and further encourages Verizon to be more forthcoming with their methodology for this data. As it stands, it simply isn’t trustworthy.