ARISE: Augmented Reality Information Search Engine (2010 - present)
(funded by A*STAR through the Human Sixth Sense Program at ADSC)
The ARISE project, which builds the Augmented Reality Information Search Engine, aims to exploit an emerging trend: the virtual world (the Internet cyberspace) and the real world (the geographic world) are increasingly merging into one. With the proliferation of the Internet, everything we find in the real world also appears in the virtual world. Conversely, with the prevalence of mobile devices, we are connected to the virtual world wherever and whenever we are in the real world. Our vision is thus to enable the seamless "mashing up" of the two worlds, augmenting our real-world experience with information organized from the virtual world. Our objective, as the figure above shows, is to develop the ARISE service, a set of search functions in the cloud, to provide various mobile apps with information about any real-world “entity” of interest, such as places (e.g., Bombay Grill restaurant), events (e.g., a Beyoncé concert), and situations (e.g., road construction at Ayer Rajah Ave).
Trust Network Mining (2008 - present)
Trust between a pair of users is an important piece of information in an online community (e.g., an e-commerce or product review site), where users may rely on trust information to make decisions. A directed link in a trust network indicates one node's trust in the other node. A common feature of a trust link in an online site is that it allows the trustor to monitor the trustee's activities. For example, in Twitter, a user's tweets are streamed to those who follow her. In Epinions, a user's written reviews are streamed to other users who trust her. The trend of social information processing means users increasingly rely not only on their own preferences, but also on those of the people they trust when making various decisions. In this research, we study various issues in trust networks, such as predicting missing trust links, identifying untrustworthy users (e.g., spammers), studying the correlation of item adoptions between trustee and trustor, as well as predicting such item adoptions based on the correlations.
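To make the task of predicting missing trust links concrete, here is a minimal sketch based on a simple transitivity heuristic: if u trusts v and v trusts w, then u -> w is a candidate link, scored by the number of such two-step paths. This heuristic, the function name predict_trust_links, and the toy user names are assumptions for illustration only, not the method used in this project.

```python
from collections import defaultdict


def predict_trust_links(trust_edges, top_k=3):
    """Rank missing directed links u -> w by the number of two-step
    trust paths u -> v -> w (an illustrative heuristic only)."""
    trusts = defaultdict(set)
    for trustor, trustee in trust_edges:
        trusts[trustor].add(trustee)

    scores = defaultdict(int)
    for u, direct in list(trusts.items()):
        for v in direct:
            # .get avoids creating new keys while we iterate
            for w in trusts.get(v, ()):
                if w != u and w not in direct:
                    scores[(u, w)] += 1

    # Highest path count first; ties broken alphabetically for determinism
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


# Hypothetical trust network: (trustor, trustee) pairs
edges = [("ann", "bob"), ("ann", "cat"), ("bob", "dan"),
         ("cat", "dan"), ("dan", "eve")]
predictions = predict_trust_links(edges)
```

In this toy network, ann -> dan ranks first because two of the users ann already trusts both trust dan.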
Mining User-Generated Data for Improving Search (2008 - 2010)
Web search has evolved far beyond the ten blue links. First, users now issue complex queries and demand answers, not just pages. For example, a query for a current movie is expected to return actual show times in nearby movie theaters. This is complicated by the fact that user queries are very informal (e.g., indy 4), while some answers coming from structured databases have formal representations (e.g., Indiana Jones and the Kingdom of the Crystal Skull). Second, users now pursue more complex tasks (e.g., holiday planning), involving related queries over a period of days, if not weeks. To better support users in such task-driven queries, the search engine seeks to organize user queries into sessions. This is difficult because related queries are frequently interleaved (due to users multi-tasking), and may not share text similarity. Third, users may want to customize their searches, for instance by focusing on specific domains (e.g., travel), or by mashing up personal data and web data. We observe that the users themselves (or more accurately the data generated by them) hold the key to these diverse challenges. In this research, we seek to mine user-generated data (e.g., query click log, query reformulation log) in order to derive synonyms for the names of long-tail entities (e.g., books, movies), as well as to organize the user queries into coherent sessions. We also experiment with an innovative system for users to build their own custom search applications (Symphony).
- Entity Synonyms for Structured Web Search, TKDE, to appear.
- Organizing User Search Histories, TKDE, in preprint.
- Homophily in the Digital World: A LiveJournal Case Study, IEEE Internet Computing, 2010.
- Fuzzy Matching of Web Queries to Structured Data, ICDE, 2010.
- Symphony: A Platform for Search-Driven Applications, ICDE (demo), 2010.
- Leveraging Social Context for Searching Social Media, SSM, 2008.
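As a rough illustration of how a query click log could surface informal synonyms such as "indy 4" for a formally named entity, the sketch below keeps the queries whose clicks mostly land on pages known to describe that entity. The function name candidate_synonyms, the min_share threshold, and the example.com URLs are all hypothetical; the actual techniques in the papers above are more involved.

```python
from collections import Counter, defaultdict


def candidate_synonyms(click_log, entity_urls, min_share=0.5):
    """Return {query: share} for queries whose clicks mostly land on
    pages assumed to describe the entity (illustrative heuristic)."""
    clicks_per_query = defaultdict(Counter)
    for query, url in click_log:
        clicks_per_query[query][url] += 1

    synonyms = {}
    for query, url_counts in clicks_per_query.items():
        total = sum(url_counts.values())
        on_entity = sum(n for url, n in url_counts.items()
                        if url in entity_urls)
        share = on_entity / total
        if share >= min_share:
            synonyms[query] = share
    return synonyms


# Hypothetical click log: (query, clicked URL) pairs
log = [
    ("indy 4", "example.com/crystal-skull"),
    ("indy 4", "example.com/crystal-skull"),
    ("indy 4", "example.com/indy-500"),
    ("indiana jones 4", "example.com/crystal-skull"),
    ("harrison ford", "example.com/ford-bio"),
]
result = candidate_synonyms(log, {"example.com/crystal-skull"})
```

Here "indy 4" and "indiana jones 4" are kept as candidates, while "harrison ford" is dropped because none of its clicks reach the entity's page.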
Measuring Behaviors in Online Rating Networks (2006 - 2008)
With the advent of Web 2.0, online rating has become an important feature in many applications that involve information (e.g., video, photo, and audio) sharing and social networking (e.g., blogging). In a collaborative rating system, a set of reviewers assign rating scores to a set of objects. As part of the evaluation analysis, we want to obtain fair reviews for all the given objects. However, because reviewers rate different subsets of objects, there is a lot of variance in how the rating scores are assigned. This assignment is affected by certain reviewer behaviors, such as bias (tendency to deviate from the norm), leniency (tendency to in-/deflate ratings), or dependency (tendency to over/under-rely on particular criteria). Interestingly, such behaviors are tightly inter-linked with properties of the objects, such as controversy (its natural variance), quality (its deserved score), as well as dependency (its important criteria). In this research, we employ a "data-driven" approach to comprehensively study the respective rating behaviors of reviewers and objects, as well as their mutual relationships.
- On Mining Rating Dependencies in Online Collaborative Rating Networks, PAKDD, 2009.
- Bias and Controversy in Evaluation Systems, TKDE, 2008.
- Summarizing Review Scores of Unequal Reviewers, SDM, 2007.
- A Multitude of Opinions: Mining Online Rating Data, NGDM, 2007.
- Bias and Controversy: Beyond the Statistical Deviation, KDD, 2006.
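For readers who want a concrete starting point, the sketch below computes the plain statistical versions of three notions from this project: controversy as the variance of an object's scores, leniency as a reviewer's mean signed deviation from each object's average, and bias as the mean absolute deviation. This is only a naive baseline under assumed definitions; it ignores the mutual dependency between reviewer and object properties that this research studies. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pvariance


def rating_statistics(ratings):
    """ratings: iterable of (reviewer, object, score) triples.
    Returns (bias, leniency, controversy) dictionaries computed as
    simple deviation statistics (a naive baseline)."""
    by_object = defaultdict(list)
    for _reviewer, obj, score in ratings:
        by_object[obj].append(score)

    consensus = {obj: mean(scores) for obj, scores in by_object.items()}
    controversy = {obj: pvariance(scores) for obj, scores in by_object.items()}

    deviations = defaultdict(list)
    for reviewer, obj, score in ratings:
        deviations[reviewer].append(score - consensus[obj])

    leniency = {r: mean(d) for r, d in deviations.items()}
    bias = {r: mean([abs(x) for x in d]) for r, d in deviations.items()}
    return bias, leniency, controversy


# Hypothetical ratings: three reviewers, two objects
ratings = [
    ("r1", "a", 4.0), ("r2", "a", 4.0), ("r3", "a", 1.0),
    ("r1", "b", 3.0), ("r2", "b", 3.0), ("r3", "b", 3.0),
]
bias, leniency, controversy = rating_statistics(ratings)
```

In this toy data, object "a" is controversial while "b" is not, and reviewer "r3" appears both the most biased and the least lenient. A naive measure like this cannot tell whether r3 is truly biased or merely rated a controversial object.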
Quality and Authority in Collaborative Authorship Networks (2006 - 2008)
Wikipedia has grown to be the world's largest free encyclopedia. Its articles are collaboratively written and maintained by a large number of volunteers online. Given its crowd-sourced nature, there is a large variance in quality among the articles. On one hand, some articles are considered the most authoritative on the Web, and are frequently ranked highly by search engines. On the other hand, other articles might have been edited by non-experts and inexperienced contributors, or may simply be incomplete. In this research, we seek to determine the quality of articles by mining their edit histories. An article is likely to have higher quality if it has been edited by more authoritative contributors. In turn, a contributor is likely to have higher authority if her contributions to higher-quality articles tend to survive the edits by other authoritative contributors. Our work is among the first to use edit histories and featured articles for quality measurement and evaluation on Wikipedia.
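The mutual reinforcement between article quality and contributor authority can be sketched as a simple fixed-point iteration, in the spirit of HITS-style link analysis. The sketch assumes each (contributor, article) pair has a count of words that survived later edits. The function name quality_authority, the input format, and the normalization are illustrative assumptions rather than the models developed in this project.

```python
def quality_authority(surviving_words, iterations=20):
    """surviving_words: {(contributor, article): surviving word count}.
    Alternately updates article quality and contributor authority,
    normalizing each to sum to 1 (an illustrative sketch only)."""
    articles = sorted({a for _, a in surviving_words})
    authors = sorted({u for u, _ in surviving_words})
    quality = {a: 1.0 for a in articles}
    authority = {u: 1.0 for u in authors}

    for _ in range(iterations):
        # Quality: surviving contributions weighted by author authority
        new_quality = {a: 0.0 for a in articles}
        for (u, a), words in surviving_words.items():
            new_quality[a] += words * authority[u]
        total = sum(new_quality.values())
        quality = {a: q / total for a, q in new_quality.items()}

        # Authority: surviving contributions weighted by article quality
        new_authority = {u: 0.0 for u in authors}
        for (u, a), words in surviving_words.items():
            new_authority[u] += words * quality[a]
        total = sum(new_authority.values())
        authority = {u: s / total for u, s in new_authority.items()}

    return quality, authority


# Hypothetical edit-history summary
contributions = {
    ("alice", "A"): 80, ("alice", "B"): 20,
    ("bob", "B"): 30, ("carol", "C"): 10,
}
quality, authority = quality_authority(contributions)
```

In this toy input, article "A" and contributor "alice" end up with the highest scores, since each reinforces the other.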
STEvent: Social Network Discovery from Spatio-Temporal Events (2004 - 2005)
Spatio-temporal data capturing the movement of individuals over space and time contains latent information on the potential associations among these individuals. For instance, a pair of individuals frequently participating in events together are likely to be associates in some way (e.g., classmates, colleagues, friends). In this work, we define an event as a spatio-temporal co-occurrence of several individuals, and we measure the likelihood of an event with the precision and uniqueness of the event's location and time. Our research focuses on how to efficiently mine such events and the social network links they support from usage logs of mobile and Internet technologies. Experiments on real data sets involving both real-world locations (wi-fi base stations) as well as virtual-world locations (web pages) show that such events can be mined efficiently, and that the discovered social links correlate with friendship indicators such as homophily.
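A minimal sketch of the idea follows, under simplifying assumptions: usage logs are bucketed by location and fixed time window, each bucket with at least two individuals is treated as an event, and each event adds weight to the links between its participants. Smaller events add more weight, as a crude stand-in for uniqueness. The function name mine_social_links, the one-hour window, and the 1/(n-1) weighting are hypothetical choices, not the measures defined in this work.

```python
from collections import defaultdict
from itertools import combinations


def mine_social_links(log, window_seconds=3600):
    """log: iterable of (person, location, timestamp_in_seconds).
    Returns {(person_a, person_b): weight} from co-occurrence events."""
    # Group people seen at the same location within the same time window
    buckets = defaultdict(set)
    for person, location, timestamp in log:
        buckets[(location, timestamp // window_seconds)].add(person)

    link_weight = defaultdict(float)
    for people in buckets.values():
        if len(people) < 2:
            continue  # a lone individual is not a co-occurrence event
        for pair in combinations(sorted(people), 2):
            # Crowded events are weaker evidence of an association
            link_weight[pair] += 1.0 / (len(people) - 1)
    return dict(link_weight)


# Hypothetical wi-fi access log
log = [
    ("ann", "lab-ap", 100), ("bob", "lab-ap", 900),
    ("ann", "cafe-ap", 4000), ("bob", "cafe-ap", 4100),
    ("cat", "cafe-ap", 4200),
    ("dan", "lib-ap", 100),
]
links = mine_social_links(log)
```

Here ann and bob share two events and get the strongest link, while dan, who never co-occurs with anyone, gets no links.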