

Visualization Techniques for Multidimensional Data

Visualization techniques refer to graphical representations of the data mined from texts. Examples include word clouds, tag clouds, histograms, scatter plots, line plots, box plots, network graphs, and other graphs. Visualization is commonly used for representing text mining analysis results from clustering, topical proximity analysis, sentiment analysis, or other analyses.
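
As a minimal illustration of the frequency counts that underlie word clouds and histograms, the sketch below computes term frequencies from a text and renders them as a text-based bar chart. The tokenizer and the tiny stop-word list are simplifying assumptions, not part of any method described above:

```python
import re
from collections import Counter

def term_frequencies(text, stopwords=frozenset({"the", "a", "of", "and", "in"})):
    """Tokenize on alphabetic runs and count terms, skipping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

def ascii_bar_chart(freqs, top_n=5, width=20):
    """Render the top_n most frequent terms as horizontal bars."""
    most_common = freqs.most_common(top_n)
    if not most_common:
        return ""
    peak = most_common[0][1]  # scale bars relative to the most frequent term
    lines = []
    for term, count in most_common:
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{term:>12} | {bar} {count}")
    return "\n".join(lines)

text = ("Text mining extracts patterns from text. "
        "Visualization of text mining results aids exploration.")
print(ascii_bar_chart(term_frequencies(text), top_n=3))
```

The same counts could instead be passed to a plotting or word-cloud library; only the rendering step would change.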

Visualization is a critical part of the text mining process, and of data mining in general.

As researcher Daniel Keim (2002) put it:

Visual data exploration aims at integrating the human in the data exploration process, applying its perceptual abilities to the large data sets available in today's computer systems. The basic idea of visual data exploration is to present the data in some visual form, allowing the human to get insight into the data, draw conclusions, and directly interact with the data.

... In addition to the direct involvement of the user, the main advantages of visual data exploration over automatic data mining techniques from statistics or machine learning are:

  • Visual data exploration can easily deal with highly nonhomogeneous and noisy data,
  • Visual data exploration is intuitive and requires no understanding of complex mathematical or statistical algorithms or parameters.

As a result, visual data exploration usually allows faster data exploration and often provides better results, especially in cases where automatic algorithms fail. (p. 1)

Keim (2002) also provides a comprehensive classification of visualization techniques, ordered from simpler to more sophisticated. Simpler methods, such as 2D/3D displays, are suited to small, low-dimensional data sets. Sophisticated techniques, such as geometrically transformed displays, icon-based displays, and dense pixel displays, are used for multidimensional, text, hypertext, and other data sets. Interaction and distortion techniques, such as dynamic projections, zooming, and filtering, are associated with these visualization methods. Keim's article (2002) includes illustrations of some of these techniques, and further illustrations of information visualization for text mining can be found in Professor Marti Hearst's book (2009).

Text Summarization

Text summarization consists of a reductive transformation of the source text to summary text through content reduction by selection and generalization of what is important in the source (Sparck-Jones, 1999).

The field of automated text summarization has been the subject of intense research over the past decades. It continues to receive ample attention given that text summarization (like semantic analysis) remains one of the most challenging tasks for computer systems. The growing interest and need for text summarization technologies have been driven by the exponential growth of unstructured texts on the internet and the recent advances in natural language processing and machine learning.

Automated text summarization systems provide many benefits. These include:

  • Reducing manual effort in reading or browsing documents,
  • Improving efficiency and effectiveness in the summarization process,
  • Facilitating document selection and indexing,
  • Enriching answers in question answering systems, and
  • Improving quality and consistency (by eliminating the errors or omissions that can occur in manual summarization).

In addition, automated text summarization has the potential benefit of removing human biases in the summarization process.

Additional considerations regarding automated text summarization are discussed hereafter.

Kumar et al. (2016) distinguish between two types of automatic summarization outputs: extractive or abstractive summaries.

  • Extractive summaries (or extracts) are produced by selecting meaningful sentences directly from the document. Important sentences can be determined automatically using criteria such as word frequency, sentence position, or the presence of title words, or by using a trained model (a machine learning approach).
  • Abstractive summaries (or abstractive summarizations) are produced when the selected document sentences are combined coherently and compressed to exclude unimportant sections of the sentences (Kumar et al., 2016). Building abstractive summaries requires highly sophisticated language modeling and is a more complex task for computer systems than building extractive summaries.
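
The extractive criteria described above (word frequency and sentence position) can be sketched as a simple scorer. The weights, the sentence splitter, and the scoring formula below are illustrative assumptions, not a production method or any specific published algorithm:

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Select the highest-scoring sentences and return them in document order.

    Score = average document-wide frequency of a sentence's words,
    plus a small bonus for early sentence position.
    """
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence, index):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        if not tokens:
            return 0.0
        avg_freq = sum(freq[t] for t in tokens) / len(tokens)
        position_bonus = 1.0 / (index + 1)  # earlier sentences score higher
        return avg_freq + position_bonus

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i], i), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

A trained model, as mentioned above, would replace the hand-written `score` function with learned weights over such features.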

As Professor Torres-Moreno points out in his book (2014), there are many variants of automatic text summarization, including generic or guided (i.e., personalized), single-document, multidocument, and multilingual summarization, as well as combinations of these. Each task poses different challenges that require different approaches. Additional variants described by Torres-Moreno include domain-specific summarization (e.g., chemistry, biomedical, legal), update summarization (i.e., new facts only), sentence compression and multi-sentence fusion, semantic summarization, and ultra-summarization (summarization of short texts). He also discusses many of the text summarization methods and algorithms available today, including machine learning approaches and, among them, artificial neural networks.

The use of deep learning in text summarization has gained significant interest recently and is a fertile topic for new research. As discussed in Chapter 1, deep learning is a subset of machine learning that uses artificial neural networks (ANNs) to discover patterns from data (here, texts). For example, researchers Sukriti Verma and Vagisha Nidhi (2019) proposed an extractive text summarization approach for factual reports using a deep learning model. The model explores various features (e.g., most frequently occurring words, sentence position, sentence length) to improve both the set of sentences selected for the summary and the summary's accuracy.

Zhang et al. (2016) developed a document summarization framework based on a convolutional neural network (CNN) model. The framework learns sentence features and performs sentence ranking jointly, using a regression process for sentence ranking and pre-trained word vectors. A convolutional neural network is a type of deep neural network initially used for computer vision tasks and more recently applied to natural language processing tasks. Deep learning approaches have also been explored for abstractive text summarization (Chopra et al., 2016; Nallapati et al., 2016; Rush et al., 2015; Song et al., 2019).

In conclusion, automatic summarization algorithms represent a vibrant and dynamic field of research. These algorithms are continually evolving to improve their performance and to address new challenges. Challenges include the systems’ ability to process increasingly large amounts of unstructured texts (blogs, emails, social media posts, and others) and other multimedia contents (images, audio, and video) in global, multilingual environments.
