June 23, 2016 by Canadian Underwriter
Big data is proving to be bad data for some surveyed enterprises, with almost nine in 10 reporting they believe they are flowing bad data into their data stores, note results of a global survey issued Wednesday by StreamSets.
The finding – part of a global survey of 314 data management professionals conducted by independent research firm Dimensional Research – illustrates the need for enterprises to adopt a data flow operations mindset, contends the San Francisco-headquartered company, which provides data ingest technology for the next generation of big data applications.
That conclusion may be further supported by the fact that only slightly more than one in 10 of the enterprises taking part in the survey consider themselves to be good at the key aspects of data flow performance management.
Specifically, StreamSets reports, 12% of respondents rate themselves as “good” or “excellent” across five key performance management areas, namely detecting the following events: pipeline down, throughput degradation, error rate increases, data value divergence and personally identifiable information violations.
Of those surveyed, 30% were from enterprises with 10,000-plus employees, 16% from enterprises with 5,000 to 10,000 employees, 29% from enterprises with 1,000 to 5,000 employees and 25% from enterprises with 500 to 1,000 employees. Three-quarters of respondents – from food and beverage, hospitality and entertainment, media and advertising, non-profit, retail, transportation, energy and utilities, telecommunications, government, services, education, healthcare, manufacturing and financial services sectors – were from the United States or Canada, 14% were from Europe, and the rest were from Asia, Middle East/Africa, Australia/New Zealand, and Mexico/Central America/South America.
StreamSets cautions that pervasive data pollution, which implies analytic results may be wrong, is leading to false insights that drive poor business decisions. “Even if companies can detect their bad data, the process of cleaning it after the fact wastes the time of data scientists and delays its use, which is deadly in a world increasingly reliant on real-time analysis,” the company contends.
In all, 68% of respondent enterprises cite ensuring data quality as a challenge in managing big data flows, making it the most commonly cited one; 74% report currently having bad data in their stores (despite cleansing throughout the data lifecycle); and only 34% rate themselves as “good” or “excellent” at detecting diverging data values in flow.
Of those surveyed, 44% felt weakest with performance degradation, 44% with error rate increases and 34% with detecting divergent data. “Detecting a ‘pipeline down’ event was the only metric where a large majority felt positively about their capabilities (66%),” notes the StreamSets statement.
“The study showed that enterprises of all sizes face challenges on a range of key data performance management issues from stopping bad data to keeping data flows operating effectively,” the company reports.
“In today’s world of real-time analytics, data flows are the lifeblood of an enterprise,” says Girish Pancha, CEO of StreamSets. “The industry has long been fixated on managing data at rest and this myopia creates a real risk for enterprises as they attempt to harness big and fast data. It is imperative that we shift our mindset towards building continuous data operations capabilities that are in tune with the time-sensitive, dynamic nature of today’s data,” Pancha says.
All that said, “enterprises overwhelmingly report that they struggle to manage their data flows. What is required is a new organizational discipline around performance management of data flows with the goal of ensuring that next-generation applications are fed quality data continuously,” the report states.
A chasm exists between the “problem-detection capabilities data experts have today and what they desire. This translates into a lack of real-time visibility and control of data flows, operations, quality and security,” the report states.
“For companies who use big data to optimize current business operations or to make strategic decisions, it is critical that they ensure their big data teams have real-time visibility and control over the data at all times,” it emphasizes.