Research Interests

My research interests are in the broad areas of Data Mining, Machine Learning and Artificial Intelligence. Specifically, I am interested in (Big) Data Exploration, Hidden Web Databases, Crowdsourcing, Social Content Mining and Social Networks.


Analytics over Hidden Databases

Structured hidden databases are widely prevalent on the Web. They provide restricted form-like search interfaces that allow users to execute search queries by specifying desired attribute values of the sought-after tuples, and the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable ranking function. The top-k output constraint prevents many interesting third-party (e.g., mashup) services from being developed over real-world web databases. This research involves developing effective techniques for retrieving more than the top-k tuples for any query and supporting additional rank-based analytics, such as estimating the rank of a tuple or comparing the ranks of two arbitrary tuples to determine which of them is ranked higher. Our techniques access the hidden structured databases via their public interfaces and operate without any knowledge of the underlying static ranking function.

  • Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das. Rank Discovery From Web Databases. In PVLDB 2013. [Paper] [Slides] [Poster] [BibTeX]
  • Saravanan Thirumuruganathan, Nan Zhang, Gautam Das. Breaking the Top-k Barrier of Hidden Web Databases. In ICDE 2013. [Paper] [Slides] [Poster] [BibTeX]
  • Weimo Liu, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das. Aggregate Estimation Over Dynamic Hidden Web Databases. In PVLDB 2014. [Paper] [Technical Report PDF] [BibTeX]
  • Weimo Liu, Saad Bin Suhaim, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das, Ali Jaoua. HDBTracker: Aggregate Tracking and Monitoring Over Dynamic Web Databases. In PVLDB 2014 (Demo paper). [Paper] [BibTeX]

Mining Online User-Item Interactions

The growing popularity of online collaborative content sites such as Netflix/MovieLens (movie ratings), Flickr (images), YouTube (videos), etc. has provided enormous data that lets us peer into the collective mind of customers. Knowing what items customers like and why they like them is essential for any successful business. Various user-item interactions such as visits, likes, +1s, ratings and reviews provide a rich window into what users like, but knowing why a user likes an item is much trickier, as few users leave elaborate comments explaining their preferences. While users are drawn to an item due to a subset of its features, a user-item interaction only provides an expression of user preference over the entire item, and not its component features. This project concerns developing data mining and exploration algorithms for performing aggregate analytics over user interactions (visits, likes, +1s, ratings, etc.) available from collaborative content sites, and using the resulting information to aid customer consumption decision making, rank features or identify frequently liked sets of features.

  • Saravanan Thirumuruganathan, Habibur Rahman, Sofiane Abbar, Gautam Das. Beyond Itemsets: Mining Frequent Featuresets over Structured Items. In PVLDB 2015. [Paper] [BibTeX]
  • Sofiane Abbar, Habibur Rahman, Saravanan Thirumuruganathan, Carlos Castillo and Gautam Das. Ranking Item Features by Mining Online User-Item Interactions. In ICDE 2014. [PDF] [Poster] [BibTeX]
  • Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, and Cong Yu. An Expressive Framework and Efficient Algorithms for the Analysis of Collaborative Tagging. In VLDB Journal Special Issue on Best Papers of VLDB 2012. [PDF] [BibTeX]
  • Mahashweta Das, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, and Cong Yu. Who Tags What? An Analysis Framework. In PVLDB 2012. [PDF] [BibTeX]
  • Saravanan Thirumuruganathan, Mahashweta Das, Shrikant Desai, Sihem Amer-Yahia, Gautam Das, and Cong Yu. MapRat: Meaningful Explanation, Interpretation and Geo-Visualization of Collaborative Rating. In PVLDB 2012. [PDF] [Poster] [BibTeX]

Analytics over Social Networks

Social networks such as Facebook and microblogging platforms such as Twitter have experienced phenomenal growth in popularity in recent years, making them attractive platforms for research in diverse fields from computer science to sociology. However, most of these platforms impose strict access restrictions (e.g., API rate limits) that prevent scientists with limited resources from leveraging the wealth of microblogs for analytics. In this project, we consider multiple novel problems such as enabling efficient aggregate estimation over social networks and microblog platforms. In addition, we investigate the feasibility of supporting complex queries over the limited search interfaces provided by these platforms and the various tradeoffs needed in enabling advertising over microblogs.

  • Saravanan Thirumuruganathan, Nan Zhang, Vagelis Hristidis, Gautam Das. Aggregate Estimation Over a Microblog Platform. In SIGMOD 2014. [Paper] [Slides] [Poster] [BibTeX]
  • Azade Nazi, Zhuojie Zhou, Saravanan Thirumuruganathan, Nan Zhang, and Gautam Das. Walk, Not Wait: Faster Sampling Over Online Social Networks. In PVLDB 2015. [Paper] [Technical Report] [BibTeX]
  • Azade Nazi, Saravanan Thirumuruganathan, Vagelis Hristidis, Nan Zhang, Khaled Shaban, and Gautam Das. Querying Hidden Attributes in Social Networks. In Third International Workshop on Intelligent Data Processing (IDP 2014), collocated with ICDM 2014. [Paper] [Slides] [BibTeX]
  • Milad Eftekhar, Saravanan Thirumuruganathan, Gautam Das, Nick Koudas. Price Trade-offs in Social Media Advertising. In COSN 2014 (ACM Conference on Online Social Networks). [Paper] [Slides] [BibTeX] [Technical Report PDF]

Knowledge Intensive Crowdsourcing

Crowdsourcing systems have gained popularity in a variety of domains. Next-generation crowdsourcing systems will be collaborative and knowledge-intensive in nature. They need to treat crowdsourcing not as a set of isolated optimization silos, but as an adaptive optimization problem, seamlessly handling the three main crowdsourcing processes (worker skill estimation, task assignment, task evaluation) and incorporating the uncertainty stemming from human factors. The main thrust of this project is to develop algorithms for such an adaptive, knowledge-intensive crowdsourcing scenario by quantifying and incorporating human factors into the three major crowdsourcing processes.

  • Senjuti Basu Roy, Ioanna Lykourentzou, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das. Crowds, not Drones: Modeling Human Factors in Interactive Crowdsourcing. In DBCrowd 2013 held in conjunction with VLDB 2013. [Paper] [Slides] [BibTeX]


In addition to the focussed projects above, I am also involved in multiple other cool, but smaller, projects. Taking part in them has introduced me to some awesome collaborators and new subfields! This is a catch-all place to list the publications arising from these projects.

  • Senjuti Basu Roy, Saravanan Thirumuruganathan, Sihem Amer-Yahia, Gautam Das, Cong Yu. Exploiting Group Recommendation Functions for Flexible Preferences. In ICDE 2014. [PDF] [Slides] [Poster] [BibTeX]


Teleherence uses web and phone technologies to optimize adherence to treatment. It calls the client at agreed-upon times, delivers reminders and messages, asks questions, graphs responses, sends desired alerts, and flags potential problems or opportunities using smart algorithms. It uses text-to-speech and speech recognition along with landline, cell, smartphone, SMS (texting), and VoIP phone technology. The system can also deliver pre-recorded audio files such as motivational messages from the care manager. Development is in partnership with Mental Health Mental Retardation of Tarrant County with support from the National Institutes of Health, National Library of Medicine.

  • Saravanan Thirumuruganathan, Manfred Huber. Building Bayesian Network based expert systems from rules. In SMC 2011. [PDF] [Slides] [BibTeX]
  • Saravanan Thirumuruganathan. Building Bayesian Network based expert systems from rules. MS Thesis. [Thesis]