Frequently Asked Questions
The digital revolution and the rise of big data have created an enormous volume of data for the average business. In a 2017 study, Data Age 2025: The Evolution of Data to Life-Critical, IDC predicted that global data will grow to 163 zettabytes (ZB) by 2025. One zettabyte equals one trillion gigabytes.
The emergence of big data has spawned a wide range of data types that companies must manage and secure. These data types include:
- Structured data, which is largely numeric and comes from transactional systems and technology tools like enterprise resource planning (ERP) systems.
- Unstructured data, which consists of assorted file types (including images, audio/video recordings and Microsoft Office files) that are not subject to rules.
- Semi-structured data, which represents a hybrid of these types, where the file may contain numeric information, but that data is hard to extract (for example, a Microsoft Excel spreadsheet).
Each of these data types poses unique challenges in terms of creating a data governance strategy that stores the information, protects privacy and security, and complies with government regulations about data.
Most businesses have a solid understanding of structured data, which usually has a row-column format and very explicit metadata elements, such as month/day/year. Largely numeric, structured data comes from transactional systems, databases and back-office applications (for example, ERP systems). While businesses have an overwhelming amount of structured data, they generally know how to manage, analyze and apply it because of how well it’s defined.
The bigger challenge for the majority of organizations lies in understanding and extracting value from unstructured data. Unstructured data comes in many formats, each with varying degrees of complexity, such as images, audio files, office or productivity files, and handwritten notes that have been scanned. This data can, and does, originate from anywhere: internally, externally, from third parties, via edge devices, and from other sources.
Because unstructured data is not governed by strict rules or shared formats, it can be difficult to manage and apply a consistent data governance strategy. Still, it can contain critical insights that organizations need to leverage in today’s highly competitive, always-on business world.
For example, consider an important customer’s complaint left on voicemail. Mining value from the audio file requires a software application capable of playing it, a person to physically listen to it, and another person to determine what information is valuable and what isn’t. Converting the audio to text as part of the data processing strategy creates a consistent view of the recording that can be interpreted as necessary by anyone with authorized access to the recording. It also allows the voicemail to be blended with other forms of analytics, without compromising the original source.
Other unstructured data that contains key insights could include the handwritten notes of a maintenance technician servicing an essential piece of production equipment. With regard to third-party data, a long-range weather forecast or a negative social media post by an influencer can significantly impact demand for some products. It’s easy to see the huge potential value of this type of data.
Finally, semi-structured data represents a hybrid of these two data types. This group could include Excel spreadsheets that contain important financial information, but the data itself is hard to extract. These data objects may have structure within them, but they lack the external structure needed for standard data management processes. Like unstructured data, these objects contain important insights that can be hard to extract and apply without an intelligent data governance strategy.
Semi-structured data refers to any information that uses a self-describing schema, such as XML or JSON. These types of data have an open-ended schema that enables application data flexibility. Sometimes, this type of data is combined with structured data to record additional properties for specific types of records within a structured data store.
The open-ended schema means that semi-structured data does not rely on the application that created it to define the embedded structure. For example, an Oracle database would be considered a structured data type. The rules governing the database are bound and applied by the application that creates the file, or, in this case, the database.
With semi-structured datasets, the definitions and constraints are embedded within the file, regardless of the application that created them. For example, XML files and cascading style sheets for web pages are both forms of semi-structured data. They can be created by almost any kind of application — such as Notepad, a website builder app or an Office app like Word — so there is no way for the application to apply structure or rules to these data types.
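To make "self-describing" concrete, here is a minimal Python sketch that reads a JSON document and discovers its field names and value types from the data alone, with no knowledge of the application that wrote it. The customer record and the `describe` helper are hypothetical and exist only for illustration.

```python
import json

# Hypothetical record; the field names and values are made up for illustration.
raw = '{"customer": {"id": 42, "name": "Acme"}, "tags": ["priority", "emea"]}'

record = json.loads(raw)


def describe(value, path=""):
    """Return (path, type name) pairs discovered from the data itself."""
    if isinstance(value, dict):
        pairs = []
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            pairs.extend(describe(child, child_path))
        return pairs
    # Lists and scalar values are reported as leaves in this sketch.
    return [(path, type(value).__name__)]


schema = describe(record)
for field_path, type_name in schema:
    print(field_path, type_name)
```

The structure reported here comes entirely from the file's own keys and nesting, which is the sense in which the definitions are embedded within the file.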
Semi-structured data is challenging for organizations to manage because it does not necessarily have the same level of organization and predictability as structured data. It does not reside in fixed fields or records. At the same time, it is more rigid than unstructured data, because it contains elements that can separate the data into various hierarchies (think of comma-delimited or tab-delimited files).
Unlike structured data, which represents data as a flat table, semi-structured data can contain n-level hierarchies of nested information. This means that it can be easy to apply standard data management processes to semi-structured data and to extract insights from it. The real issue is making sure your business has the tools and technology necessary to load the data into structured or unstructured data models, which can be managed via data governance.
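One common way to load nested data into a structured model is to flatten each record into a single row whose column names encode the original hierarchy. The Python sketch below shows one possible approach, not a prescribed method. The `flatten` helper, the dotted column-naming scheme, and the sample order are all assumptions made for illustration.

```python
def flatten(value, prefix=""):
    """Flatten nested dicts and lists into one flat {column: value} row."""
    row = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            row.update(flatten(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            row.update(flatten(child, f"{prefix}[{index}]"))
    else:
        # A scalar leaf becomes one column in the flat row.
        row[prefix] = value
    return row


# Hypothetical order with three levels of nesting.
order = {
    "order_id": 1001,
    "customer": {"name": "Acme", "address": {"city": "Osaka"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

row = flatten(order)
for column, cell in row.items():
    print(column, "=", cell)
```

Flattening this way keeps every value but produces a different set of columns for records with different shapes, so a real pipeline would still need rules for missing columns and repeated items.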
By far, the greatest challenge for businesses today is the explosive growth of unstructured data. In fact, 80% of all new data created today is unstructured. That’s more than most organizations can keep up with, and it indicates that companies are likely collecting information they’re not even aware of. This can make it extremely challenging to properly use and secure unstructured data, and it creates an element of risk, because the lack of awareness makes organizations prone to unintentionally violating the growing number of data privacy regulations.
The second challenge is knowing what to do with the data. Such a large and growing volume of data requires infrastructure to store and retain. Most organizations are not equipped for the significant administrative time and costs required to maintain the data. More importantly, the sheer volume of data makes it difficult for companies to extract the important strategic insights that do exist.
Data will never stop growing, its complexity will always vary, and the number of producers and consumers will remain seemingly endless. That’s why the answer lies in intelligent data governance, which means establishing policies and best practices aimed at cleansing, labeling, protecting, managing and ensuring accessibility to unstructured data — as well as structured and semi-structured data. With a well-defined data governance strategy in place, your company will be better prepared to deal with data growth, data quality, data relevance and data usability.
In simplest terms, intelligent data governance means bringing data under control, keeping it protected and enabling access to it, to carry out the top-level business strategy. But data governance also means knowing where data originated, where it is currently located, who can access it, what it contains, and how long it should be retained. Intelligent data governance also implies that trivial data is distinguished from strategically important information.
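The questions in this definition (origin, location, access, contents, retention, and strategic importance) can be pictured as fields on a per-asset metadata record. The Python sketch below is purely illustrative: the `GovernanceRecord` class, its field names, and the sample values are assumptions, and it does not represent any particular product's data model.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class GovernanceRecord:
    """Hypothetical governance metadata for one data asset."""
    origin: str            # where the data originated
    location: str          # where it is currently stored
    allowed_roles: set     # who can access it
    content_summary: str   # what it contains
    created: date
    retention_days: int    # how long it should be retained
    strategic: bool = False  # separates strategic data from trivial data

    def can_access(self, role):
        return role in self.allowed_roles

    def expires_on(self):
        return self.created + timedelta(days=self.retention_days)

    def is_expired(self, today):
        return today > self.expires_on()


# Made-up example: a transcribed customer voicemail.
voicemail = GovernanceRecord(
    origin="customer voicemail",
    location="archive/voicemail/transcripts",
    allowed_roles={"support", "compliance"},
    content_summary="transcribed customer complaint",
    created=date(2024, 1, 1),
    retention_days=365,
    strategic=True,
)

print(voicemail.can_access("support"))
print(voicemail.expires_on())
print(voicemail.is_expired(date(2025, 1, 1)))
```

Keeping these attributes alongside each asset is what lets retention and access policies be checked automatically rather than by hand.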
Once data is centralized and thoughtfully managed, its true strategic potential can be unleashed. Businesses can easily identify customer needs, anticipate emerging issues, explore new business opportunities and respond to regulatory inquiries. They can optimize the costs of storing and administering these information assets, while still allowing key stakeholders in the business to leverage data for improved decision-making.
When it comes to data governance, striking an appropriate balance is key. Data of all types must be closely managed, but the organization still needs to make it accessible, supporting the high degrees of flexibility and speed that are essential in today’s fast-moving world.
The good news is that there are innovative, automated solutions that can help streamline and accelerate the process of data governance, saving your organization valuable time and costs.
Hitachi Vantara literally wrote the book on data governance. With established leadership in data storage and management, the experts at Hitachi can make the complex task of data governance easy and straightforward, via automated solutions that help your company to:
- Ensure data quality.
- Make data identifiable.
- Centralize data and make it accessible.
By implementing automated solutions that cleanse, identify and centralize your structured, unstructured and semi-structured data, Hitachi can help you create a “single source of truth” that has enormous strategic value. You can gain new insights about your daily operations, your customers and trading partners, your finances and emerging trends that will impact your company and its financial results.
Intelligent data governance provides a range of strategic benefits for the typical company, including:
- Improved decision-making. Well-governed data is easier to access and apply, which means that stakeholders across the business can make decisions based on facts, not intuition or guesswork.
- Operational efficiencies. Critical data, including performance metrics, can be used to identify and address bottlenecks and inefficiencies in the way the company works every day. Having access to accurate, current data is essential to accomplishing this.
- Improved data understanding and lineage. Understanding the “data trail” and all accountabilities for data translates into timely responses to audits, more effective early case assessment activities, and a more proactive approach to preventing data corruption and breaches.
- Regulatory compliance. Increasingly, companies need to comply with complex privacy and security regulations related to the data they manage and store. Data governance is a critical aspect of ensuring and proving organizational alignment with the rules set forth in any applicable regulatory requirements.
- Increased revenues. Confidently armed with accurate, cleansed, real-time data, businesses can make better, faster decisions that positively impact sales and operating margins.
Because data volumes are growing exponentially, Hitachi Vantara recommends that your company review its data governance policies and practices on a quarterly basis. By looking at the “big picture” with regard to data every three months, your company can identify emerging trends, troubleshoot problems and ensure that data continues to function as a strategic resource.
In addition to establishing a cadence for data governance, Hitachi recommends that every organization include the position of chief data officer or CDO. Within the organization, the CDO serves as “the voice of the data,” protecting it and maximizing its strategic contribution on an ongoing basis.
An emerging concept, DataOps — or data operations — is enterprise-level data management for the artificial intelligence era. By implementing an overarching DataOps strategy, you can seamlessly connect your data consumers and creators, to rapidly find and use all the value in your data.
Data operations is not a product, service or solution. It’s a methodology, a technological and cultural change aimed at improving your organization’s use of data through better data quality, shorter cycle time and superior data management.
Since DataOps spans the entire cycle of gathering and applying information, it’s absolutely essential that your organization manages every type of data efficiently. By having data cleansed, well managed and immediately accessible, your DataOps initiative can be supported with the right information you need to make strategic decisions based on facts, not guesswork.
Because Hitachi Vantara has proven expertise in both DataOps and data governance, across every type of data, Hitachi is a natural partner. By instilling a data-driven culture and mindset, Hitachi can help make data a focus for your business every day.