What Is Data-to-text? Everything You Need to Know About It

By David Martinez De Lecea, COO, Narrativa

Currently, 2.5 quintillion bytes of data are created each day, and the global datasphere reached almost 40 zettabytes in 2018. Over 90 percent of the world's data was generated in the last two years alone.

Data generation is expected to increase at an exponential rate as the number of connected devices continues to grow, driven by technologies such as the Internet of Things (IoT) and 5G, as well as by the falling cost of electronic data storage.

Figure: Annual size of the global datasphere

Much of this data, which can contain very useful knowledge, remains unused and unanalyzed because humans cannot practically process such vast amounts of information.

What Is Data-to-text? An Introduction

Natural Language Generation (NLG) is a subarea of Natural Language Processing (NLP) focused on automatically producing understandable text in natural language. Data-to-text (D2T), a key component of NLG, covers the transformation of data into natural language.

The key difference between D2T and other areas of NLG is that the input is not itself natural language but non-linguistic electronic data. This data is frequently structured, i.e., stored as fixed fields within a record or file, as in relational databases or spreadsheets.
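As a minimal illustration of the structured case, the Python sketch below turns one spreadsheet-style record into a sentence. The record fields (`city`, `date`, `high_c`, `low_c`, `rain_mm`) and the function `describe_weather` are hypothetical, and a single fixed template is a deliberate simplification: real D2T systems involve much richer content selection and language variation. It only shows the basic idea of non-linguistic data in, natural-language text out.

```python
from typing import TypedDict


class WeatherRecord(TypedDict):
    """One hypothetical row of a structured weather table."""
    city: str
    date: str
    high_c: int
    low_c: int
    rain_mm: float


def describe_weather(row: WeatherRecord) -> str:
    """Realize one structured record as a natural-language sentence."""
    # Content determination: the rain clause depends on the value in the data.
    if row["rain_mm"] > 0:
        rain = f"with {row['rain_mm']:g} mm of rain"
    else:
        rain = "with no rain"
    # Surface realization: slot the field values into a fixed template.
    return (
        f"On {row['date']}, {row['city']} reached a high of {row['high_c']} °C "
        f"and a low of {row['low_c']} °C, {rain}."
    )


if __name__ == "__main__":
    row: WeatherRecord = {
        "city": "Madrid", "date": "12 May", "high_c": 24, "low_c": 13, "rain_mm": 0.0,
    }
    print(describe_weather(row))
```

Each new row in the table yields a new sentence without any additional writing effort, which is the property the rest of this article builds on.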

Thanks to recent and spectacular developments in deep learning, the data can also be unstructured. Unstructured data includes images and audio and video recordings, whose content can be converted to text and then used to produce long narratives with D2T techniques.

What Are the Applications of Data-to-text?

The potential applications of D2T are innumerable. In principle, any report or communication based on structured input data can be automated. Here are some examples of where the technology can be used:

  • Journalism. Automatically generate news articles. Data-heavy topics such as sports, weather or financial markets are very well suited to these applications. In fact, this is currently one of the areas with the highest penetration of D2T solutions. Narrativa currently produces around half a million articles a month in English, Arabic and Spanish, covering topics such as sports, finance and weather.

  • Business Intelligence tools. Enhance the normally graphical output of business intelligence tools for analysis and research. Numbers require calculation and graphs require interpretation before they yield meaning, so users benefit from a combination of graphs and natural language that brings to life the insights found in the data.

  • E-commerce. Generate descriptions of products sold online. This is particularly useful for marketplace businesses that may offer millions of different products, with high SKU rotation and in multiple languages. These product descriptions could even be tailored to the needs and interests of each individual customer, further enhancing the effectiveness of the already very successful recommendation systems of big e-commerce players.

  • Business management. Generate required business reports for management, customers or regulators. This saves considerable time and resources, so businesses can focus on decision-making rather than reporting.

  • Customer personalization. Create personalized customer communications. Reports can be generated for “an audience of one”, even in real time as users interact with the business through digital channels.
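To make the journalism case more concrete, here is a hedged Python sketch of how one match result might become a one-sentence news lead. The `MatchResult` fields, the team names, the margin thresholds that choose between "edged", "beat" and "thrashed", and the function `match_lead` are all illustrative assumptions, not Narrativa's method. Production systems add far richer content selection, phrasing variation and multilingual output.

```python
from dataclasses import dataclass


@dataclass
class MatchResult:
    """Hypothetical structured record for one finished match."""
    home: str
    away: str
    home_goals: int
    away_goals: int


def match_lead(m: MatchResult) -> str:
    """Turn one match result into a one-sentence news lead."""
    if m.home_goals == m.away_goals:
        return f"{m.home} and {m.away} drew {m.home_goals}-{m.away_goals}."
    # Work out winner and loser from the data instead of assuming the home side won.
    if m.home_goals > m.away_goals:
        winner, loser = m.home, m.away
        winner_goals, loser_goals = m.home_goals, m.away_goals
    else:
        winner, loser = m.away, m.home
        winner_goals, loser_goals = m.away_goals, m.home_goals
    # Lexical choice: the verb depends on the goal margin (thresholds are arbitrary).
    margin = winner_goals - loser_goals
    if margin == 1:
        verb = "edged"
    elif margin >= 3:
        verb = "thrashed"
    else:
        verb = "beat"
    return f"{winner} {verb} {loser} {winner_goals}-{loser_goals}."


if __name__ == "__main__":
    # Fictional teams and scores, used only to exercise each branch.
    results = [
        MatchResult("Northside", "Riverton", 2, 1),
        MatchResult("Lakeview", "Hillcrest", 0, 3),
        MatchResult("Eastport", "Westfield", 1, 1),
    ]
    # The same function applies unchanged to any number of records.
    for r in results:
        print(match_lead(r))
```

Because the wording is derived from the numbers, a feed of thousands of results produces thousands of leads with no extra editorial effort; the hard part in practice is designing rules (or models) that stay varied and correct across many such cases.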

What Are the Advantages of Data-to-text?

The use of D2T technologies presents some very clear benefits over more traditional approaches, namely:

  • Cost. D2T technologies can produce content at a cost at least 10 times lower than traditional human writing, and the savings are even greater in domains with highly skilled, highly paid staff. This even makes it possible to unearth content that would otherwise never be analyzed or commented on, due to lack of resources or the impracticality of writing for very small audiences. D2T makes it feasible to write narratives for an audience as small as one person.

  • Production time. Once the system has been set up, the conversion of data into text happens almost instantly. This is of particular interest in areas where it is important to report on recent information (e.g. news or personalized real-time reporting).

  • Accuracy. Computers do not make typos or transcription mistakes. As long as the input data is clean and correct, the interpretation and analysis they produce will be accurate.

  • Clarity. Information presented in natural language is easier to understand. In a 2017 experiment, researchers found that NLG content improved decision-making under uncertainty compared with state-of-the-art graphical representations.

  • Scale. As long as there is new data, there is no theoretical limit to how much output can be generated, so an unlimited number of users can benefit.

What Are the Disadvantages of Data-to-text?

There are, however, some limitations to the broad applicability of these solutions, including:

  • Lack of creativity. Machines cannot yet produce content as varied as humans can. Contrary to popular belief, this is not because computers are unable to create new ideas, but because of the limitations imposed by the data.

  • Limited scope. Currently, machines using D2T techniques can only talk about the information present in the input dataset. This contrasts sharply with a human's ability to draw on other sources, such as common industry knowledge or analogies.

  • Setup time. In order to deploy D2T technologies to new areas, techniques must be customized for the specific domain of interest.

  • Rigidity. State-of-the-art techniques require significant customization for each specific use case. Whenever there are significant changes in the data source or application requirements, the technology needs to be retrained and reconfigured.

Despite the attractiveness and effectiveness of D2T solutions, adoption of this technology is still quite limited. The reasons fall into two main categories:

  • Lack of awareness. Most people are not aware of the existence of D2T solutions and are very surprised by the quality of their output. In simple terms, these technologies are not often utilized because potential users do not know they exist.

  • Lack of input data. Not all knowledge domains have data of sufficient volume and quality to automatically produce interesting content. Even when the data does exist, the owner sometimes has neither the expertise nor the interest to exploit it, forgoing the opportunity to extract insights and value from that information.

At Plug and Play, we boost innovation by matching corporations and cutting-edge startups. Our Fintech accelerator is already working with some of the most exciting companies in the world. 
