III Information Seeking and Retrieval
Visual Data Mining in a Q&A Based Social Media Website
Jin Zhang and Yiming Zhao
Abstract Data mining methods and technologies have been applied to many social media environments but seldom to Q&A sites built on narrative information. This paper employs visual data mining techniques to examine how health care consumers use terms on Yahoo! Answers. Three months of data on the topic of diabetes in the health category of Yahoo! Answers were collected and analyzed. Terms from the collected data set were processed, validated, and classified. Two visualization methods, Multi-dimensional Scaling and Social Network Analysis, were employed to visualize the relationships among terms from two pairs of related categories ('Complication & Related Disease' and 'Medication'; 'Complication & Related Disease' and 'Sign & Symptom'). The term maps revealed patterns and knowledge such as “acarbose might cause a side effect of hives”, “antidepressant may increase the risk of developing diabetes”, and “there is a connection between imbalance and birth defects”. The results of this study can benefit both health consumers and medical professionals.
Keywords Data mining • Social media • Social Q&A • Term analysis • Visual analysis
Data mining is a knowledge discovery process that reveals hidden patterns and trends from an investigated data set, illustrates relationships among involved objects, and analyzes data in a holistic way. It is widely used in business, health, information sciences and other disciplines.
Information visualization techniques can project abstract and invisible items or objects in a data set onto a visual and observable space where relationships among the projected objects are displayed and people can explore and interact with them. Information visualization and data mining have a natural connection because they share a common purpose. Information visualization can be employed as an effective means for data mining.
Social media provides an interactive online environment where people can create interest groups, post and share opinions and ideas, discuss issues and concerns, and exchange relevant information in a variety of formats and ways. Beyond this interactive environment for users, social media also offers researchers dynamic, rich, and open datasets. Social media data has been applied to various domains, so it is no surprise that researchers use visual data mining techniques to address domain problems in a social media environment.
With the development of Web 2.0, people seek information from social media instead of relying entirely on experts on the Internet. This phenomenon is too widespread for its existence and influence to be denied. For instance, Yahoo! Answers is the most popular Internet reference site in America (Alexa 2013), and 16.64% of Yahoo users use Yahoo! Answers.
Yahoo! Answers is a social Question & Answer (Q&A) site in which questions are categorized and broadcast to the community. Any user can answer any question. Visitors to Q&A sites increasingly seek answers to a wide variety of questions, which are organized under topical categories. Questions and answers from users are organized, archived, and made searchable for other users (Rosenbaum and Shachaf 2010). Because of these characteristics, social Q&A sites are fertile ground for future studies in many respects (Harper et al. 2008).
Social media has become one of the most popular textual and visual data sources for studying individual behavior and dispersed information. Data mining methods and technologies have been applied to social media collections such as Flickr, YouTube, Twitter, and Facebook, but they have seldom been applied to Q&A sites based on narrative information.
Many technologies and tools are available for data mining in social media. Some of these social network data mining techniques generate very complex models that are hard to analyze and understand (Ferreira and Alves 2012). Visual data mapping, by contrast, is a simple, efficient, and effective mining technique that uses computing techniques to help people present, understand, and explore complex abstract information (Robertson et al. 1989). Visualization mapping is often employed to reveal connections and relationships among investigated objects, to analyze data, to explore and explain information, to predict trends, and to detect patterns (Zhang 2008). This study employed two visualization mapping methods to mine social media data.
The astonishing size of social media communities and the great diversity of information exchanged within them make these sites a valuable research setting for understanding the general public's online information seeking (Kim and Oh 2009). The interactions between users and social media include various user behaviors (Liu et al. 2012). Consumer health informatics aims to analyze and understand consumer behaviors and the knowledge contained in them. Social media provides consumer-generated data that researchers can use to investigate the consumers themselves.
This paper uses data mining technologies, especially information visualization techniques, to address the term usage behavior of health consumers. Two visual data mining techniques, Multi-dimensional Scaling (MDS) and Social Network Analysis (SNA), are employed to visually analyze the subject terms and their relationships under the diabetes topic “Complication & Related Disease” and its related topics in order to discover underlying patterns. The findings of this study can be used to better understand health consumer term usage behavior, and the approach provides a new research method for similar studies in consumer health informatics.
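As a rough illustration of the kind of input both methods work from, the following Python sketch builds term co-occurrence counts, a simple network degree measure (the sort of quantity SNA reports), and a cosine-based dissimilarity between terms (the sort of matrix entry MDS projects onto a map). It is a minimal sketch under stated assumptions, not the procedure used in this study: the toy posts, the terms "metformin" and "neuropathy", and all function names are hypothetical, and the choice of cosine dissimilarity is an assumption made only for illustration.

```python
from collections import Counter
from itertools import combinations
import math

# Hypothetical toy data (NOT from the study): each set holds the validated
# terms found in one Q&A post.
posts = [
    {"acarbose", "hives", "neuropathy"},
    {"acarbose", "hives"},
    {"metformin", "neuropathy"},
    {"metformin", "neuropathy", "hives"},
]

def cooccurrence(posts):
    """Count the number of posts in which each unordered term pair co-occurs."""
    pairs = Counter()
    for terms in posts:
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs

def degree(pairs):
    """Number of distinct co-occurring neighbors per term (a basic SNA measure)."""
    neighbors = {}
    for a, b in pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {term: len(others) for term, others in neighbors.items()}

def cosine_dissimilarity(posts, t1, t2):
    """1 - cosine similarity of two terms' binary post-occurrence vectors.

    Assumption: cosine is used here only as an example dissimilarity;
    a matrix of such values is the usual input to an MDS routine.
    """
    v1 = [1 if t1 in p else 0 for p in posts]
    v2 = [1 if t2 in p else 0 for p in posts]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(v1)) * math.sqrt(sum(v2))
    return 1.0 - dot / norm if norm else 1.0

pairs = cooccurrence(posts)
print(pairs[("acarbose", "hives")])   # posts mentioning both terms
print(degree(pairs))                  # neighbor counts per term
print(cosine_dissimilarity(posts, "acarbose", "metformin"))
```

In a term map, pairs with low dissimilarity would be drawn close together and terms with high degree would appear as hubs; which similarity measure and layout method are appropriate depends on the data and is not settled by this sketch.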