If you Googled “metadata” and found this article, you have used metadata. When you bought your mom a gift from Amazon, you used metadata. Did you connect with a colleague through LinkedIn? The metadata was up and running. Your afternoon Spotify fix? Yes, you guessed it, you used metadata. Here we look at what metadata is, along with its characteristics, types, functions, and main examples.
What is metadata?
Metadata is data about data. In other words, it is information that is used to describe the data contained in something like a web page, document, or file. Another way to think of metadata is as a brief explanation or summary of what the data is.
A simple example of document metadata might include a collection of information such as the author, file size, the date the document was created, and keywords to describe the document. The metadata for a music file can include the name of the artist, the album, and the year it was released.
For computer files, the metadata may be stored within the file itself or elsewhere, as is the case with some EPUB book files that keep the metadata in an associated ANNOT file.
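As a rough illustration of metadata kept outside the file it describes, the sketch below writes a small JSON "sidecar" next to a text file. The `write_sidecar` helper, the `.meta.json` suffix, and all field values are assumptions made up for this example; they do not reflect how EPUB readers or any particular tool store their metadata.

```python
import json
import tempfile
from pathlib import Path


def write_sidecar(data_path: Path, metadata: dict) -> Path:
    """Store metadata in a separate JSON file next to the data file."""
    # Hypothetical naming convention: "<file name>.meta.json".
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    book = Path(tmp) / "novel.txt"
    book.write_text("Chapter 1 ...", encoding="utf-8")

    # Illustrative values only: author, creation date, keywords, file size.
    sidecar = write_sidecar(book, {
        "author": "A. Writer",
        "created": "2023-05-17",
        "keywords": ["fiction"],
        "size_bytes": book.stat().st_size,
    })
    sidecar_name = sidecar.name
    loaded = json.loads(sidecar.read_text(encoding="utf-8"))

print(sidecar_name, loaded["author"], loaded["keywords"])
```

The data file itself is never modified here; everything that describes it lives in the separate file.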
Metadata represents behind-the-scenes information that is used everywhere, by all industries, in multiple ways. It is ubiquitous in information systems, social media, websites, software, music services, and online retail. Metadata can be created manually to pick and choose what is included, but it can also be generated automatically based on the data.
On websites, metadata takes the form of HTML elements that visitors do not see but that communicate and clarify page information for search engines, which makes it critical to effective search engine optimization for retailers. These elements include page titles, description tags, and other protocols, and they can describe a page's purpose, features, and general content.
Metadata is a structured way of communicating information about a data set, and it is used in a variety of settings, with particular relevance to e-commerce businesses.
Etymology of the term
The term combines two words, one Greek and one Latin: the Greek “meta”, which means after or beyond, and the Latin “datum”, which means something given, that is, a piece of data. The expression metadata therefore means beyond the data.
According to this etymology, metadata is a set of data that describes the informative content of a resource, of files, or of their information. That is, it is information that describes other data. There is no single definition of metadata, however. It is also described as information about data, information about information, or data about information.
“Metadata” is a fairly new word (it appeared in the second half of the 20th century), while “data” dates back to the middle of the 17th century.
Among the main characteristics of metadata are the following:
- They are highly structured packages of information that explain the content, quality and characteristics of the data on the website.
- They are precise and in many cases short and made up of simple words.
- They offer access points to the information on the website.
- They encode the description of the website.
What are they for?
Metadata serves a variety of purposes, resource discovery being one of the most common. Here, it can be compared to effective cataloging, which includes identifying resources, defining them by criteria, gathering similar resources, and distinguishing between those that are different.
It is also an effective means of organizing electronic resources, which is an important use given the growth of Web-based resources. Typically, links to resources have been organized as lists and created as static web pages, with the names and resources encoded in HTML. However, a more efficient practice is to use metadata to create these pages. For web purposes, information may be extracted and reformatted through the use of software tools.
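A minimal sketch of that idea, assuming each resource is described by a small metadata record: the link list is generated from the records instead of being hand-written as static HTML. The record fields, the `example.org` URLs, and the `render_link_list` function are hypothetical.

```python
from html import escape

# Hypothetical metadata records describing web resources.
resources = [
    {"title": "Beta handbook", "url": "https://example.org/b", "subject": "cataloging"},
    {"title": "Alpha guide", "url": "https://example.org/a", "subject": "standards"},
]


def render_link_list(items):
    """Build an HTML list of links, sorted by the title field."""
    lines = ["<ul>"]
    for item in sorted(items, key=lambda r: r["title"]):
        href = escape(item["url"], quote=True)
        text = escape(item["title"])
        lines.append(f'  <li><a href="{href}">{text}</a></li>')
    lines.append("</ul>")
    return "\n".join(lines)


page = render_link_list(resources)
print(page)
```

Because the page is derived from the records, regrouping the links by another field such as `subject` only means changing the sort or filter, not editing the HTML by hand.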
Another use of metadata is as a means to facilitate interoperability and integration of resources. The use of metadata to describe resources allows their understanding by both humans and machines. This enables the most effective levels of interoperability, or how data is exchanged between many systems with disparate operating platforms, data structures, and interfaces. At the same time, it facilitates the search for resources on the network.
Metadata also facilitates digital identification through standard numbers that uniquely identify the resource defined by the metadata. In this line, another practice is to combine metadata to act as a set of identification data that differentiate objects or resources, supporting validation needs.
Finally, metadata is an important way to protect resources and their future accessibility. This is a critical concern given the fragility of digital information and its susceptibility to corruption or alteration. Archiving and preservation require metadata elements that trace the lineage of an object and describe its physical characteristics and behavior so that it can be replicated in future technologies.
Metadata is a tool that helps companies handling large amounts of information to organize that information and make users' work easier, increasing their productivity.
These are the main types of metadata:
According to its function
Depending on the function that these metadata have, they are divided into:
- Logical: data that explains how symbolic data can be used to draw inferences from logical results, so it supports comprehension.
- Symbolic: data that describes the sub-symbolic data, so it introduces meaning.
- Sub-symbolic: data that does not contain any information about its meaning.
According to its variability
In this case, metadata is divided into two types:
- Immutable: data that does not change regardless of which part of the resource is visible.
- Mutable: data that differs from one part of the resource to another.
According to its content
In this case, the metadata is partitioned by its content. Thus, the option is given to differentiate between the metadata that details the resource itself and the metadata that describes the content of that resource.
What is the life cycle of metadata?
Metadata has a life cycle made up of stages, each of which involves specific tasks. This life cycle can be divided into three phases:
- Creation: this is the stage in which the metadata is created. It can be done in different ways:
  - Manually: this can be a somewhat complicated procedure, although it depends on the format used and the volume involved. In any case, either of the other two forms of creation described below is used more often.
  - Automatically: in this case, the software gathers all the required information on its own, that is, without any external help. However, despite advances in the algorithms used, it is not feasible for the computer to extract every piece of metadata by itself. This form is therefore not the most appropriate either, although it is used frequently.
  - Semi-automatically: this is the ideal way to create metadata. A series of autonomous algorithms is set up and supported by the user. The software does not extract all the desired data by itself but relies on external help to do so.
- Manipulation: in this phase, changes are made to certain aspects. If the data in question changes, the metadata must change too. This is usually done easily and automatically, although human help is sometimes needed.
- Destruction: the last phase in the life of metadata is its destruction, and how to carry it out needs careful study. There are different ways to remove metadata. On some occasions, the metadata is deleted together with its resource. In other situations, the metadata is kept for different reasons, such as to track changes to a document.
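The three phases above can be sketched in a few lines, assuming metadata is held as a plain dictionary keyed by file name. The function names, the field names, and the choice of a SHA-256 checksum as an automatically derived field are all assumptions for illustration, not a description of any particular system.

```python
import hashlib


def create_metadata(content: bytes, user_fields: dict) -> dict:
    """Semi-automatic creation: derived fields plus fields a person supplies."""
    auto = {
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    return {**auto, **user_fields}


def update_metadata(meta: dict, new_content: bytes) -> dict:
    """Manipulation: recompute derived fields, keep the human-supplied ones."""
    refreshed = dict(meta)
    refreshed["size_bytes"] = len(new_content)
    refreshed["sha256"] = hashlib.sha256(new_content).hexdigest()
    return refreshed


store = {}  # hypothetical in-memory metadata store

# Creation: the software derives what it can, the user adds the rest.
store["notes.txt"] = create_metadata(b"first draft", {"author": "Sam", "keywords": ["draft"]})

# Manipulation: the data changed, so the derived metadata is refreshed.
store["notes.txt"] = update_metadata(store["notes.txt"], b"second, longer draft")

# Destruction: the metadata is removed along with its resource.
removed = store.pop("notes.txt")
print(removed["author"], removed["size_bytes"])
```

A system that keeps metadata after the resource is deleted, for example to track changes, would archive `removed` rather than discard it.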
How are they stored?
Metadata can be stored in a variety of places. When metadata relates to databases, it is often stored in tables and fields within the database itself.
Sometimes the metadata exists in a specialized document or database designed to store such data, called a data dictionary or metadata repository. There are also some specialized data file types that include both raw data and metadata.
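SQLite is one concrete case of a database that keeps metadata about its own tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `employees` table and its columns are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT NOT NULL, hired DATE)")

# sqlite_master is SQLite's own catalog table listing schema objects.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(employees)")]

print(tables)
print(columns)
conn.close()
```

No rows were inserted, yet the database can already report the table name, the column names, and their declared types. That information is the metadata.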
More generally, metadata can be stored anywhere (for example, in emails, questionnaires, data collection instructions, or spreadsheets).
Advantages of proper metadata management
Investing in metadata development can deliver benefits in three key areas:
- You can extend the longevity of data. The lifespan of a typical dataset can be very short, often because missing or unavailable relevant metadata renders it useless. When comprehensive metadata is developed and maintained, typical data degradation and entropy is counteracted.
- It also makes it easy to reuse and share data. Metadata is key to ensuring that highly detailed or complicated data is more easily interpreted, analyzed, and processed by the creator of the data and others.
- Metadata is essential for maintaining long-term historical records of data sets, compensating for inconsistencies that can occur in documentation of data, personnel, and methods. They can also allow data sets designed for one purpose to be reused for other, long-term purposes.
Developing and maintaining metadata can be an expensive proposition. There are costs associated with editing and publishing data and metadata, and long-term administration and maintenance can also be cumbersome. However, metadata is an investment that may not be optional in an era where information is an organization's lifeblood.
Here is a detailed example of metadata.
You just took a photo of a bear in the woods. You upload it to your computer and put it in your image database. To find it quickly, you’ll use the metadata descriptors to search for the photo in the future. This is especially important because you have a lot of other bear photos and you want to be able to remember specific ones.
The metadata helps narrow your search using descriptors that identify the image. First, the date the photo was taken and the author are recorded. The date gives you a good starting point for your search. Then some keywords like bear or forest can be attached to the image. This is your metadata. Using a combination of the metadata keywords, you will be able to find the exact images. These types of metadata fall under the “descriptive” category.
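The bear photo search can be sketched as a filter over a small catalog. The file names, dates, author, and the `find_photos` function below are invented for illustration; a real image database would have its own schema and query interface.

```python
from datetime import date

# Hypothetical photo catalog: each entry holds metadata only, not the image.
photos = [
    {"file": "IMG_0101.jpg", "taken": date(2023, 6, 3), "author": "Sam",
     "keywords": {"bear", "forest"}},
    {"file": "IMG_0102.jpg", "taken": date(2023, 6, 3), "author": "Sam",
     "keywords": {"bear", "river"}},
    {"file": "IMG_0207.jpg", "taken": date(2023, 8, 19), "author": "Sam",
     "keywords": {"deer", "forest"}},
]


def find_photos(catalog, taken=None, keywords=()):
    """Return file names whose metadata matches the date and all keywords."""
    wanted = set(keywords)
    return [
        p["file"]
        for p in catalog
        if (taken is None or p["taken"] == taken) and wanted <= p["keywords"]
    ]


matches = find_photos(photos, taken=date(2023, 6, 3), keywords=["bear", "forest"])
print(matches)
```

Combining the date with two keywords narrows three photos down to one, which is the point of descriptive metadata.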
Other examples of using metadata are as follows:
Metadata and website searches
The metadata embedded in websites is vitally important to the success of the site. It includes a website description, keywords, meta tags, and more, all of which play a role in search results.
Some common metadata terms used when creating a web page include meta title and meta description. The meta title briefly explains the topic of the page to help readers understand what they will get from the page if they open it. The meta description gives further, though still brief, information about the content of the page.
Both pieces of metadata show up in search engines so readers get a quick idea of what the page is about. The search engine uses this information to group similar items together so that when you search for a specific keyword or group of keywords, the results are relevant to your search.
The metadata of a web page can also include technical details, such as the language the page is written in or whether it is an HTML page.
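To make the meta title, meta description, and language concrete, here is a minimal sketch that pulls them out of a page with Python's standard `html.parser` module. The `MetaExtractor` class and the sample page are assumptions for this example, and search engines do considerably more than this.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collect the <title>, the meta description, and the html lang attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "html" and "lang" in attrs:
            self.meta["lang"] = attrs["lang"]
        elif tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.meta["description"] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append rather than assign.
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data


# Made-up sample page for illustration.
page = """<!DOCTYPE html>
<html lang="en">
<head>
<title>What Is Metadata?</title>
<meta name="description" content="A short guide to data about data.">
</head>
<body><p>Visible content goes here.</p></body>
</html>"""

parser = MetaExtractor()
parser.feed(page)
parser.close()
page_meta = parser.meta
print(page_meta)
```

None of the three extracted values is part of the visible body text, yet they are what a search result listing typically shows.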
Metadata for tracking
Retailers and online shopping sites use metadata to track consumer habits and movements. Digital marketers track each of your clicks and purchases, storing information about you such as the type of device you use, your location, the time of day, and any other data they may legally collect.
With this metadata they create a picture of your daily routine and interactions, your preferences, your associations and your habits, and they can use that picture to market their products.
Internet service providers, governments, and anyone else with access to large collections of metadata could use metadata from web pages, emails, and other places where users are online to monitor web activity.
Since metadata is a brief representation of the larger data, this information can be searched and filtered to find information on millions of users at once and track things like hate speech, threats, etc. Some governments have been known to collect this data, including not only web traffic, but also phone calls, location information, and more.
Metadata in computer files
Every file you save to your computer includes basic information about the file so the operating system understands how to handle it, and so you or someone else can quickly figure out what the file is from the metadata.
For example, in Windows, when you view the properties of a file, you can clearly see the name of the file, the type of file, where it is stored, when it was created and last modified, how much space it takes up on your hard drive, who the owner of the file is, and more.
The information can be used by the operating system as well as by other programs. For example, you can use a file search utility to quickly find all files on your computer that were created sometime today and are larger than 3 MB.
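A search like that can be approximated with the file system metadata Python exposes through `Path.stat()`. The sketch below is an assumption-laden stand-in for a real search utility: it uses the last-modified time (`st_mtime`) rather than the creation time, because creation time is not reported consistently across operating systems, and the `find_recent_large_files` function name is hypothetical.

```python
import tempfile
from datetime import datetime
from pathlib import Path

THREE_MB = 3 * 1024 * 1024


def find_recent_large_files(root, min_bytes=THREE_MB):
    """Return files under root modified since local midnight and larger than min_bytes."""
    midnight = datetime.now().replace(
        hour=0, minute=0, second=0, microsecond=0
    ).timestamp()
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()  # size and timestamps come from file metadata
        if info.st_size > min_bytes and info.st_mtime >= midnight:
            hits.append(path)
    return sorted(hits)


# Demo in a throwaway directory: one 4 MB file and one tiny file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "big.bin").write_bytes(b"\0" * (4 * 1024 * 1024))
    (Path(tmp) / "small.txt").write_text("tiny", encoding="utf-8")
    found_names = [p.name for p in find_recent_large_files(tmp)]

print(found_names)
```

The search never opens the files. It decides using only their size and timestamp.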
Metadata on social networks
Every time you make friends with someone on Facebook, listen to the music that Spotify recommends for you, post a status, or share someone’s tweet, the metadata works in the background.
Online metadata is useful in very specific social media situations, like when you’re looking for someone on Facebook. You can view a profile picture and a short description of the Facebook user to learn just the basics about them before you decide to friend them or send them a message.
Database and metadata management
Metadata in the world of database management can address the size and format or other characteristics of a data item. It is essential for interpreting the content of the data in the database. Extensible Markup Language (XML) is a markup language that defines data objects using a metadata format.
For example, if you have a data set with dates and names scattered all over it, you can’t tell what the data represents or what the columns and rows describe. With basic metadata like column names, you can quickly take a look at the database and understand what a particular set of data describes.
If there’s a list of names with no metadata to describe them, it could be anything, but when you add metadata at the top that says “Former Employee,” you now know that those names represent all employees who have been terminated. The date next to them can also be understood as something useful like “Date of Termination” or “Date of Hire”.
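The same point in code, as a small sketch with invented names and dates: the raw rows are just pairs of strings until a header supplies the column names.

```python
import csv
import io

# Raw data with no metadata: names and dates, but no indication of meaning.
raw = "Ana Ruiz,2021-03-31\nTom Berg,2022-11-15\n"

# Column names act as the metadata that gives each value its meaning.
header = ["Former Employee", "Date of Termination"]

rows = list(csv.reader(io.StringIO(raw)))
records = [dict(zip(header, row)) for row in rows]

print(rows[0])
print(records[0])
```

Swapping the second header for “Date of Hire” would leave the stored values untouched while completely changing what they are understood to mean.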
Tools to find metadata
Here are several tools that you can use to find metadata.
FOCA is a tool that is mainly used to find metadata and hidden information in documents. These documents can be on web pages and can be downloaded and analyzed with FOCA.
It is capable of analyzing a wide variety of documents, the most common being Microsoft Office, Open Office or PDF files, although it can also analyze Adobe InDesign or SVG files, for example.
These documents are found using three search engines: Google, Bing, and DuckDuckGo. Combining the results of the three engines yields a large number of documents. It is also possible to add local files to extract the EXIF information from graphic files, and a complete analysis of the information discovered via the URL is performed even before the file is downloaded.
Octopai is a centralized cross-platform metadata management automation solution that enables data and analytics teams to discover and control shared metadata.
The product performs metadata scanning by automatically collecting metadata from ETL, databases, and reporting tools. Metadata is stored and managed in a central repository, and an intelligent engine using hundreds of crawlers searches all metadata and returns results quickly.
Octopai is best used for use cases in business intelligence, governance, and data cataloging.
Infogix offers a set of built-in data governance capabilities including business glossaries, data cataloging, data lineage, and metadata management.
The tool also provides customizable dashboards and zero-code workflows that adapt as each organization’s data capability matures. Reference clients use Infogix for data governance and data value, compliance and risk management.
The product is also flexible and easy to use, supporting smaller data analysis jobs as well.
Collibra’s data dictionary documents an organization’s technical metadata and how it is used. It describes the structure of data, its relationship to other data, and its origin, format, and use.
The solution serves as a searchable repository for users who need to understand how and where data is stored and how it can be used. Users can also document roles and responsibilities and use workflows to define and map data. Collibra is unique in that the product was created with business end users in mind.
Alex is a technology-agnostic unified enterprise data catalog. It features a business glossary that allows users to define and maintain key business terms and link them to physical data assets, processes and results.
Policy-driven data quality combines data lineage with data profiling and intelligent labeling based on machine learning. Alex also offers smart tagging that helps users add business context to physical data assets. Deployment and integration are simple, and the product’s user interface is friendly to business users.
IBM InfoSphere Metadata Workbench
Business and data analysts use IBM’s InfoSphere Metadata Workbench to explore and analyze the relationships between information assets and the metadata repository. Its efficiency comes from its ability to provide impact analysis with an overview of the effects of changes in information management environments.
The above content published at Collaborative Research Group is for informational purposes only and has been developed by referring to reliable sources and recommendations from experts. We do not have any contact with official entities nor do we intend to replace the information that they emit.