Unstructured data is content generated by people. It is fundamentally different from the structured data found in databases and needs to be managed differently.
Unstructured data includes emails, letters, documents, social media content, images, audio and video recordings, presentations, and webpages. These data types obviously include a large amount of business-critical and valuable information.
See our FAQs for more on this.
Organisations are reporting a number of significant challenges in achieving an end-to-end DSAR solution, including:
Increasing volume
Unstructured content is growing rapidly in velocity and variety, with data volume doubling every two years*.
Complexity
Unstructured data increasingly comes in new forms and from many more sources. This puts pressure on legacy storage systems which don’t cope well with new data types.
Maintaining compliance
Increasing data protection regulation (e.g. the GDPR) creates compliance obligations. Non-compliance carries very real risks of financial penalties and reputational damage.
Data repositories
Cloud migration and efficient resource allocation mean organising and minimising data before moving it to tiered storage. To do this you need a clear and complete understanding of what data you have, where it is held and its value.
Oyster IMS partners with the world’s leading technology experts to provide bespoke unstructured data solutions for our clients.
Our software partners for unstructured data solutions include Micro Focus, whose ControlPoint software is one of the most powerful and effective data analysis tools available today.
Our software partners also include Repstor and OneTrust.
Book an online demo of our unstructured data solution
What is meant by unstructured data?
Unstructured data is the term used to denote all data that does not sit in the rows and columns of a database; that is, it is not part of an SQL or Oracle-type database. Examples include all types of office documents, as well as emails, scanned images, PDFs, video, CCTV/digital camera footage and even data from internet-connected devices and log files (although the latter two are more correctly referred to as “dark data”, in that they are often hidden or difficult to find).
Unstructured data usually has very limited metadata, or none at all. Paper documents will have no metadata; a spreadsheet will have date, title and author information; emails will contain more information such as “to”, “from”, “date” and “subject”, alongside the body of the message (for this reason emails are sometimes considered to be semi-structured data).
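To illustrate why emails are sometimes described as semi-structured, the following minimal sketch uses Python's standard `email` package to pull the header fields out of a message as metadata, leaving the body as free text. The sample message and the function name `extract_email_metadata` are invented for illustration and do not come from any particular product.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message, invented for illustration only.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 04 Mar 2024 09:30:00 +0000
Subject: Contract renewal

Hi Bob, the renewal terms are attached.
"""


def extract_email_metadata(raw: str) -> dict:
    """Return the header fields and body of a plain-text email as a dictionary."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "date": parsedate_to_datetime(msg["Date"]),  # parsed into a datetime
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),  # the unstructured part
    }


if __name__ == "__main__":
    for field, value in extract_email_metadata(RAW_EMAIL).items():
        print(f"{field}: {value}")
```

The header fields can be filtered and sorted much like database columns, whereas the body still needs content analysis. This sketch assumes a simple, single-part message; real mailboxes contain multipart messages and attachments that need further handling.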
Where do we find unstructured data?
Unstructured data is found in personal drives, shared drives, departmental drives, email systems, document and record management systems, cloud storage and anywhere else that data can be stored.
Moreover, the same data and information can be found in multiple locations as people store duplicate copies and versions locally, personally or in enterprise repositories. Many organisations think they know all their repositories, until somebody points out an offshore backup system, or data held in website databases.
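As a rough illustration of how duplicate copies spread across locations might be detected, the sketch below groups files by a hash of their contents. The function names and the idea of passing a list of root folders are assumptions made for this example; it finds only byte-identical copies, not edited versions of the same document, and it is not how any specific file analysis product works.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots: list[Path]) -> list[list[Path]]:
    """Group files under the given roots whose contents are byte-identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for root in roots:
        for path in root.rglob("*"):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates([Path("personal"), Path("shared")])` (hypothetical folder names) would return one list per set of identical files, which gives a first indication of how much storage is taken up by redundant copies.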
How do we understand unstructured data?
Understanding unstructured data requires a data discovery exercise. Organisations must first ask themselves “What information have we got and where is it?” in order to understand where their information repositories are located and to gain a high-level, and often very approximate, understanding of the information contained within each. This step requires obtaining an understanding of both the content and the context of this information.
There are many tools available for searching and extracting content, but without context it may be difficult to quantify the risk or understand the value. Context can come from the surrounding metadata: for example, knowing that this subject was being discussed in a conversation between two people around a certain date. Context can also come from using natural language processing and smart analysis to understand sentence construction (tools like IDOL can provide this).
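The first, high-level pass of the discovery exercise described above can be approximated with a simple inventory of a repository. The following sketch counts files and bytes per file extension under a folder; the function name `summarise_repository` and the choice of extension as the grouping key are assumptions for illustration, and a real discovery tool would go much further by examining content and context.

```python
from collections import Counter
from pathlib import Path


def summarise_repository(root: Path) -> dict:
    """Count files and total bytes per file extension under a root folder."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Normalise case so ".PDF" and ".pdf" are counted together.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in counts}
```

Run against each known repository, a summary like this answers the “what have we got and where is it?” question only approximately, but it helps decide where deeper analysis is worth the effort.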
Is unstructured data necessarily a problem?
Unstructured data is a problem wherever there is a requirement for any form of information production. Information production means creating a set of documents, data or information for litigation, for regulators, for compliance, for internal or external audit, or indeed for anyone who needs to see a set of information satisfying a certain set of criteria.
Moreover, organisations must also be able to make information sets available when requested as part of a data subject access request (DSAR). The growth in the volume of DSARs is causing some organisations particular issues at present.
Why is unstructured data considered risky?
The big issue is uncertainty: we don’t know what is in there.
Most content sitting in structured databases started life as unstructured data, arising from a conversation, phone call, email or form. This original unstructured data is still sitting there, somewhere, potentially discoverable and evidential. It may contain sensitive personal data, e.g. relating to medical records, religious beliefs, finances or children. It is in your system, and if your system gets hacked, that information is out in the wild. Especially if the data was not appropriately secured, this can mean you have broken the law. Not knowing what you hold also potentially opens you up to litigation, especially if there is a “smoking gun” document in your system.
How should I classify unstructured data?
The best way to understand unstructured data is to classify it in some way. Classification of data can be automatic at a high level but, for more granular classification, there is a need to engage with the business to understand the requirements of the organisation. Good technical capability and the right tools, combined with good business analysis, will lead to successful classification projects.
Most unstructured data tools allow you to build a “category” of information that combines some metadata elements with some elements of content. You can limit the category to certain people, email sets or document types. Rules can then be set up to act on categories of data, allowing automatic updating of categories and application of appropriate policies. Alerts are flagged immediately, so the data can be moved, deleted, preserved, or otherwise treated as necessary.
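The idea of a category that combines metadata elements with content elements, and a rule that acts on it, can be sketched as follows. The `Document` and `Category` classes, their fields and the action names are all hypothetical and chosen for this example; commercial tools define categories and policies through their own interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical record for one item of unstructured data."""
    name: str
    author: str
    doc_type: str
    text: str


@dataclass
class Category:
    """A category combining metadata filters with content keywords."""
    name: str
    action: str  # e.g. "preserve", "delete" or "move"
    doc_types: set = field(default_factory=set)  # empty means any type
    authors: set = field(default_factory=set)    # empty means any author
    keywords: set = field(default_factory=set)   # empty means any content

    def matches(self, doc: Document) -> bool:
        type_ok = not self.doc_types or doc.doc_type in self.doc_types
        author_ok = not self.authors or doc.author in self.authors
        text = doc.text.lower()
        content_ok = not self.keywords or any(
            keyword.lower() in text for keyword in self.keywords
        )
        return type_ok and author_ok and content_ok


def apply_rules(docs, categories):
    """Return (document name, category name, action) for every match."""
    return [
        (doc.name, cat.name, cat.action)
        for doc in docs
        for cat in categories
        if cat.matches(doc)
    ]
```

For example, a category named "Sensitive HR" limited to `docx` files containing the keywords "medical" or "salary", with the action "preserve", would match an offer letter that mentions salary but not a PDF canteen menu. Simple keyword matching is used here only to keep the sketch short.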
How can I search my unstructured data?
Tools like Windows Search or Outlook search will allow you to search within or across unstructured data sets. (Outlook has a better search tool than you might expect, offering the ability to perform metadata and content searches across both documents and attachments.)
The problem with many search tools is that they are of limited use beyond one-off searches: they do not offer a “what do I do with the results?” function. You will usually not be able to act on the results of your search by moving or deleting the documents. A better option is to use a dedicated file analysis or data discovery tool to analyse, reduce, classify and manage your unstructured data stores, as well as to carry out complex searches.
© Copyright 2024 Oyster IMS | Web design by Union 10 Design