Cybersecurity Emerges as a Big Data Problem


By TMCnet Special Guest Jay Desai, XtremeData, Inc. | September 04, 2013

In a connected world there are no boundaries. Desk-bound activities such as email are now performed on mobile devices. What once required a personal visit, such as banking or shopping, can now be done online from anywhere, at any time. Access, convenience and speed have created remarkable change in the way we consume and interact, and the march toward a connected, digital world is accelerating as our dependence on it grows. But digital interconnectedness is not free. The risks posed by cyber attacks, including security breaches, data loss, espionage, denial of service and malware, are both numerous and growing.

Over the past few years, businesses have deployed various kinds of security appliances in their data centers to protect against specific threats such as intrusions, denial of service, viruses, spam, data loss and compliance violations. Appliances provide real-time, in-stream filtering, detection and prevention capabilities, and have proven to be useful and cost-effective protection against isolated threats.

Security appliances are a growing market with many choices, and vendors are rapidly adding new functionality to strengthen in-stream capabilities. It is not unusual for a company to have dozens of these appliances in its data center. However, there are limitations. As security appliances mature, so do cyber criminals. Cyber attacks are becoming more sophisticated, with greater coordination, skill and reach to circumvent security appliances. One vulnerability is that appliances are “point solutions”: each appliance is aware only of what it sees and has no visibility into traffic elsewhere. Cyber criminals can quite easily exploit this limitation to wreak havoc.

Comprehensive Approach

To address many of these vulnerabilities we need to start with data. Each security appliance generates a significant amount of incident and log-level data every day. These logs contain valuable information about who did what, when and how. Today the data is used for operational reporting and then typically discarded.

Collecting granular data from every appliance in a central repository can provide a more comprehensive view of security threats. Combining real-time data with historical data in that repository gives an integrated view of threats that can be analyzed over time to support reporting, detection, forensics, prevention and prediction.
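To make the idea of an integrated view concrete, here is a minimal Python sketch of cross-appliance correlation. It assumes that events from every appliance have already been normalized into one common record; the `Event` fields, the appliance names and the sample addresses are all hypothetical and are not taken from any particular product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One normalized log record (hypothetical schema, for illustration only)."""
    appliance: str   # which appliance reported the event, e.g. "firewall"
    source_ip: str   # where the traffic came from
    timestamp: int   # seconds since the epoch
    action: str      # what the appliance observed


def sources_seen_by_multiple_appliances(events, min_appliances=2):
    """Return {source_ip: sorted appliance names} for every source that was
    reported by at least `min_appliances` distinct appliances."""
    seen = defaultdict(set)
    for event in events:
        seen[event.source_ip].add(event.appliance)
    return {
        ip: sorted(appliances)
        for ip, appliances in seen.items()
        if len(appliances) >= min_appliances
    }


# Made-up sample data: no single appliance sees the whole picture.
events = [
    Event("firewall", "203.0.113.7", 1000, "blocked_port_scan"),
    Event("spam_filter", "203.0.113.7", 1060, "phishing_email"),
    Event("ids", "198.51.100.4", 1100, "signature_match"),
    Event("ids", "203.0.113.7", 1120, "signature_match"),
]

result = sources_seen_by_multiple_appliances(events)
print(result)
```

In this sample, only `203.0.113.7` is flagged, because three different appliances each saw a piece of its activity. That kind of pattern is invisible to any one point solution.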

Building such a repository has been a monumental task because of the quantity, speed and diversity of the data. A typical enterprise can generate 1-10 terabytes of cyber security data per day, with peak ingest rates of 1-10 million records per second, coming from tens or hundreds of different appliances. For most enterprises, a centralized cyber security data repository would be their largest data environment. Until now, the lack of cost-effective hardware, storage and software has discouraged companies from building a cyber security database. Big data and cloud technology are rapidly changing this, and it is now possible to build and operate petabyte-scale data systems without big investments. As a result, savvy companies are beginning to invest in such repositories.
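A quick back-of-the-envelope calculation shows why these volumes reach petabyte scale. The sketch below uses the 1-10 terabytes per day quoted above. The 365-day retention window is an assumption chosen for illustration, and the figures are raw volumes that ignore compression, indexing and replication.

```python
def raw_storage_tb(daily_tb, retention_days):
    """Raw data volume in terabytes held for a given retention window."""
    return daily_tb * retention_days


RETENTION_DAYS = 365  # assumed retention window, for illustration only

for daily_tb in (1, 10):  # low and high ends of the quoted daily range
    total_tb = raw_storage_tb(daily_tb, RETENTION_DAYS)
    # Decimal units: 1 petabyte = 1,000 terabytes.
    print(f"{daily_tb} TB/day for {RETENTION_DAYS} days = "
          f"{total_tb} TB ({total_tb / 1000:.3f} PB)")
```

Under these assumptions, one year of data comes to between 365 TB and 3,650 TB, or roughly 0.4 to 3.7 petabytes before any overhead.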

The emerging landscape for cyber security comprises two complementary solutions. Each of these solutions serves defined requirements with different users, service levels and costs of ownership. Let us briefly examine each:

Scalable SQL data warehouse – This is ideal for an integrated cyber security repository that supports real-time data ingest along with interactive reporting and analytics. Such a repository will typically house real-time and historical data for a defined period of time (e.g. 12 months). By supporting complex data relationships across granular and aggregate-level data, it enables security analysts to perform rapid interactive reporting and analysis using industry-standard visualization and analytics tools. As the industry matures, users will also be able to leverage off-the-shelf, SQL-based cyber security analytics in addition to native functionality.
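As a rough illustration of the interactive reporting described here, the following Python sketch runs an aggregate SQL query over granular event rows. It uses the standard library's in-memory SQLite database purely as a stand-in for a scalable warehouse. The `security_events` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE security_events ("
    "appliance TEXT, source_ip TEXT, event_day TEXT, event_type TEXT)"
)

# Made-up granular rows from several appliances.
rows = [
    ("firewall", "203.0.113.7", "2013-09-01", "port_scan"),
    ("ids", "203.0.113.7", "2013-09-02", "signature_match"),
    ("spam_filter", "198.51.100.4", "2013-09-02", "phishing_email"),
    ("firewall", "203.0.113.7", "2013-08-15", "port_scan"),
]
conn.executemany("INSERT INTO security_events VALUES (?, ?, ?, ?)", rows)

# Aggregate view: events and distinct reporting appliances per source
# since a given day (ISO dates compare correctly as text).
query = """
    SELECT source_ip,
           COUNT(*) AS events,
           COUNT(DISTINCT appliance) AS appliances
    FROM security_events
    WHERE event_day >= ?
    GROUP BY source_ip
    ORDER BY events DESC, source_ip
"""
report = conn.execute(query, ("2013-09-01",)).fetchall()
for source_ip, events, appliances in report:
    print(source_ip, events, appliances)
```

The same granular rows can answer many other questions simply by changing the query, which is the main appeal of keeping detailed data in a SQL-accessible store.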

Hadoop – This is ideal for maintaining a permanent repository of data to support batch-mode cyber security analysis over longer time frames. Hadoop inherently offers scalable and resilient storage at a low cost. The Hadoop ecosystem also supports a variety of open source tools that can be leveraged by developers for pattern detection algorithms, machine learning, predictive analytics and more. 
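The batch-mode style of analysis can be sketched in the map-and-reduce pattern that Hadoop popularized. The Python below is a local simulation only: it does not call any Hadoop API, and the tab-separated log layout (date, source IP, event type) and the sample lines are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_event(line):
    """Map step: turn one tab-separated log line into ((source_ip, month), 1)."""
    event_date, source_ip, _event_type = line.rstrip("\n").split("\t")
    return (source_ip, event_date[:7]), 1  # "YYYY-MM-DD" -> "YYYY-MM"


def reduce_counts(pairs):
    """Reduce step: sum the counts for each key. Sorting here stands in for
    the shuffle phase that groups identical keys together."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: sum(count for _, count in group)
        for key, group in groupby(ordered, key=itemgetter(0))
    }


# Made-up log lines spanning a long time frame.
log_lines = [
    "2012-11-03\t203.0.113.7\tport_scan",
    "2012-11-19\t203.0.113.7\tport_scan",
    "2013-08-02\t203.0.113.7\tsignature_match",
    "2013-08-05\t198.51.100.4\tphishing_email",
]

monthly_counts = reduce_counts(map_event(line) for line in log_lines)
for (source_ip, month), count in sorted(monthly_counts.items()):
    print(source_ip, month, count)
```

Because each line is mapped independently and counts are combined per key, the same logic could in principle be spread across many machines and applied to years of archived logs.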

The combination of security appliances, SQL data warehouses and a Hadoop-based deep storage layer provides companies with a more complete solution to address cyber security and protect their digital franchises. A side benefit of building such a solution would be the insights gained about user behavior and the ability to spot anomalous patterns, even unrelated to security. Cybersecurity is both a threat and an opportunity.  

Jay Desai is co-founder of XtremeData, Inc.




Edited by Alisen Downey