EDBT/ICDT 2012 Tutorials

Indexing and mining topological patterns for drug discovery

Organizers: Sayan Ranu (University of California, Santa Barbara) and Ambuj Singh (University of California, Santa Barbara)

Duration: 1.5 hours

Abstract: Increased availability of large molecular repositories has created new challenges and opportunities for the application of mining and indexing techniques to problems in chemical informatics. The primary goal in the analysis of molecular databases is to infer biological activity from structural properties. Two of the most popular approaches to representing molecular topologies are graphs and 3D geometries. As a result, the problem of indexing and mining structural patterns maps to indexing and mining patterns from graph and 3D geometric databases.
In this tutorial, we will first introduce the problem of drug discovery and how computer science plays a critical role in that process. We will then proceed by introducing the problems of subgraph containment and subgraph similarity searches. Due to the NP-hardness of the problems, a number of heuristics have been designed in recent years and the tutorial will present an overview of those techniques. Next, we will introduce the problem of mining frequent subgraph patterns along with some of their limitations that ignited the interest in the problem of mining statistically significant subgraphs. After providing an in-depth survey of mining statistically significant subgraphs using two recent representative techniques, the tutorial will present the problem of analyzing 3D geometric structures of molecules. Finally, we will conclude by discussing some open computer science problems that can have a significant impact in the field of drug discovery.

Author affiliations:
Sayan Ranu is a Ph.D. candidate in the Department of Computer Science at the University of California, Santa Barbara. He works in the "Data Mining and Bioinformatics lab" under Professor Ambuj K. Singh. His research focuses on querying and mining molecular databases to predict biological activity from structural properties. He was awarded the "Distinguished Graduate Research Fellowship" at UCSB.
Ambuj K. Singh is a Professor of Computer Science in the Department of Computer Science and in the program of Biomolecular Sciences and Engineering at the University of California, Santa Barbara. He received his B.Tech. degree from the Indian Institute of Technology and a PhD degree from the University of Texas at Austin. His current research interests are in the areas of network science, chemoinformatics and bioinformatics, graph querying and mining, and databases. He has written over 160 technical papers.

Adaptive Indexing in Modern Databases

Organizers: Stratos Idreos (CWI, The Netherlands), Stefan Manegold (CWI, The Netherlands), and Goetz Graefe (HP Labs, Palo Alto)

Duration: 3 hours

Abstract: Physical design represents one of the hardest problems for database management systems. Without proper tuning, systems cannot achieve good performance. Traditional indexing creates indexes a priori assuming good workload knowledge and enough idle time. More recent approaches monitor the workload trends and create or drop indexes online, i.e., during query processing. Adaptive indexing takes another step towards completely automating the tuning process of a database system, by enabling incremental and partial online indexing. The main idea is that physical design changes continuously, adaptively, partially, incrementally and on demand while processing queries as part of the execution operators. As such it brings a plethora of opportunities for rethinking and improving every single corner of database system design.
We will analyze the research space between traditional indexing and adaptive indexing through several state-of-the-art indexing techniques, e.g., what-if analysis and soft indexes. We will discuss in detail adaptive indexing techniques such as database cracking, adaptive merging, sideways cracking and various hybrids that try to balance the online tuning overhead with the convergence speed to optimal performance. In addition, we will discuss how various aspects of modern database architectures, such as vectorization, bulk processing, column-store execution and storage, affect adaptive indexing. Finally, we will discuss several open research topics towards fully autonomous database kernels.
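To make the idea of indexing as a side effect of query processing concrete, the following is a minimal sketch in the spirit of database cracking. It is an illustration only, not the organizers' implementation: the class name `CrackedColumn`, its methods, and the sample data are hypothetical, and the partitioning step builds temporary lists instead of swapping values in place as a real column-store kernel would.

```python
import bisect


class CrackedColumn:
    """Toy cracker column (hypothetical): a copy of the data plus a cracker index.

    Invariant: for every i, all values before positions[i] are < pivots[i]
    and all values from positions[i] onward are >= pivots[i].
    """

    def __init__(self, values):
        self.data = list(values)  # copy that gets reorganized by queries
        self.pivots = []          # sorted pivot values seen so far
        self.positions = []       # positions[i]: first index with value >= pivots[i]

    def _crack(self, pivot):
        """Partition the one piece that contains `pivot` and record the split."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.positions[i]  # already cracked on this value
        lo = self.positions[i - 1] if i > 0 else 0
        hi = self.positions[i] if i < len(self.pivots) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.data[lo:hi] = smaller + larger  # only this piece is touched
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.positions.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high (assumes low <= high)."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
result = column.range_query(5, 13)
print(sorted(result))
```

Each query reorganizes only the pieces it touches, so later queries on nearby ranges scan less data; the column is never fully sorted unless the workload demands it.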

Author affiliations:
Stratos Idreos holds a tenure-track senior researcher position with CWI, the Dutch National Research Center for Mathematics and Computer Science. The main focus of his research is on adaptive query processing and database architectures, mainly in the context of column-stores. He also works on stream processing, distributed query processing and scientific databases. Idreos obtained his PhD from CWI and University of Amsterdam. In the past he has also been with the Technical University of Crete, Greece, and held research internship positions with Microsoft Research, Redmond, with EPFL, Switzerland and with IBM Almaden. Idreos won the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award for his thesis on database cracking, the 2011 ERCIM Cor Baayen award and the VLDB 2011 Challenges and Visions best paper award. In 2010 he was named a "Distinguished Scientist Excelling in Research abroad" by the Hellenic Ministry of National Defense.
Stefan Manegold is the group leader of the database architecture research group at CWI in Amsterdam, The Netherlands. He received his PhD from the University of Amsterdam, The Netherlands, in 2002 and his Master (Diplom) in computer science from the Technical University of Clausthal, Germany, in 1994. Manegold's research work comprises database architectures, query processing algorithms and data management on modern hardware, as well as leveraging column-store database technology for efficient and scalable XML / XQuery processing, with a particular focus on optimization, performance, benchmarking and testing. Manegold has co-authored more than 40 scientific publications, and recently received the VLDB 2009 10-year Best Paper Award together with his co-authors Peter Boncz and Martin Kersten. Stefan Manegold is a core member of the developer team of the open-source column-oriented database system MonetDB, co-founder of the DaMoN workshop series (co-located with SIGMOD since 2005), and co-chair of the Repeatability and Workability Evaluation for SIGMOD 2009 and 2010.
Goetz Graefe is an HP Fellow researching database issues, primarily transactional indexing and robust query processing. Various database products employ techniques from his Exodus, Volcano, and Cascades research projects. His best known works are in-depth surveys on query execution, sorting, and B-tree indexing.

Similarity in (Spatial, Temporal and) Spatio-Temporal Databases

Organizers: Dimitrios Gunopulos (University of Athens) and Goce Trajcevski (Northwestern University)

Duration: 3 hours

Abstract: An important problem that permeates a variety of application domains dealing with spatio-temporal data is assessing the similarity among mobile participants. The (location, time) information may be obtained from different sources (e.g., cameras, satellite imagery, GPS-enabled devices, and sensors), and different applications may have different categories of relevant or important queries. A similarity query aims to detect which pairs or groups of objects have motion features that are more similar to each other than to the respective features of the other objects in a given dataset. Although simple to define, the similarity query is essential for tasks like classification or clustering of spatio-temporal data which, in turn, are crucial for many (online or off-line) decision-making activities in diverse domains: from weather tracking and modelling to the control of chemical reactions.
The aim of this tutorial is to give an overview of the various challenges encountered when assessing the similarity of mobile entities and to present the corresponding solution techniques. After a motivational overview of the similarity-related issues in different application domains, the tutorial presents an overview of similarity-related problems and solutions in spatial and temporal databases. This is followed by the crux of the tutorial, which provides a thorough exposition of the various aspects of similarity in spatio-temporal settings. The final part of the tutorial presents an overview of the role of similarity in several data mining applications and in the context of wireless sensor networks, concluding with some challenging open issues and applications.

Author affiliations:
Dimitrios Gunopulos received his PhD from Princeton University in 1995. He is an Associate Professor in the Department of Informatics and Telecommunications, University of Athens. He has held positions at the Max-Planck Institute for Informatics, the IBM Almaden Research Center, and the Department of Computer Science and Engineering, University of California Riverside. His main research interests are in the areas of Data Mining, Web Mining, Knowledge Discovery in Databases, Databases, Sensor Networks, Peer-to-Peer systems, and Algorithms. He has co-authored a book and over 100 publications in refereed conferences and journals. He has served as a General co-Chair of the HDMS 2011 and the IEEE ICDM 2010 conferences, as a PC co-Chair of the ECML/PKDD 2011, IEEE ICDM 2008, ACM SIGKDD 2006, SSDBM 2003, and DMKD 2000 conferences, and as an Associate Editor of the IEEE TKDE, IEEE TPDS, ACM TKDD, and KAIS journals.
Goce Trajcevski received his PhD from the University of Illinois at Chicago in 2002 and has participated in both NSF and industry-funded research projects (with BEA Corp. and Northrop Grumman Corp.). His main research interests are in the areas of Mobile Data Management, Uncertainty Management and Sensor Networks. In addition to encyclopedia entries and book chapters, he has co-authored over 60 publications in refereed conferences and journals and has received two best paper awards (CoopIS 2000 and MDM 2010). He was part of the organizing committees of ACM SIGMOD 2006, IEEE MDM 2011, IEEE MDM 2012 and ACM GIS 2011 and has served on the program committees of numerous conferences and workshops.

Distributed Skyline Processing: a Trend in Database Research Still Going Strong

Organizers: Katja Hose (Max-Planck Institute for Informatics, Saarbrücken, Germany, hose@mpi-inf.mpg.de) and Akrivi Vlachou (Dept. of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway, vlachou@idi.ntnu.no)

Duration: 1.5 hours

Abstract: During the last decade, data management and storage have become distributed. In consideration of the huge amount of data available in such systems, advanced query operators, such as skyline queries, are necessary to help users process the data. Imagine, for instance, a user looking for a car, mobile phone, or hotel on the internet. The huge number of available offers makes it a tedious task to find a good trade-off between different (contradicting) criteria, e.g., minimum age and minimum price. Skylines, however, consist of interesting offers representing good trade-offs. The skyline operator was proposed about a decade ago, but research on skyline queries, especially in distributed scenarios, is still an ongoing process. Query processing in distributed environments poses inherent challenges and requires non-traditional techniques due to the distribution of content and the lack of global knowledge. In this tutorial, we will outline the objectives and the main principles that any distributed skyline approach has to fulfill, leading to useful guidelines for the design of efficient distributed skyline algorithms. More importantly, distributed processing of other query types shares the same objectives and principles; therefore, several of the guidelines are also applicable to other query types. Furthermore, this tutorial will provide a broad survey of the state of the art in distributed skyline processing, present a categorization of the existing approaches based on their characteristics, and point out open research challenges in distributed skyline processing.
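As an illustration of the skyline operator described in this abstract, here is a minimal sketch using the car example with (age, price) pairs, where smaller is better in both criteria. The function names and the sample offers are hypothetical. The `distributed_skyline` function is a naive baseline (each site computes a local skyline, and a coordinator merges them), not one of the surveyed algorithms; it relies on the fact that every global skyline point must also be in the local skyline of its own site.

```python
def dominates(a, b):
    """True if offer a is at least as good as b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Quadratic-time skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def distributed_skyline(partitions):
    """Naive baseline: local skylines per site, then one merge at a coordinator."""
    candidates = []
    for local_points in partitions:
        candidates.extend(skyline(local_points))  # computed at each site
    return skyline(candidates)                    # computed at the coordinator


# Hypothetical car offers as (age in years, price).
cars = [(1, 30000), (3, 18000), (5, 20000), (8, 9000), (8, 12000), (2, 30000)]

central = skyline(cars)
merged = distributed_skyline([cars[:3], cars[3:]])
print(central)
print(merged)
```

Only the local skylines travel to the coordinator, which is the basic reason distributed approaches can avoid shipping all data; the surveyed techniques go further by pruning and routing more selectively.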

Author affiliations:
Katja Hose is currently a post-doctoral researcher at the Max-Planck Institute for Informatics in Saarbrücken, Germany. She studied Computer Science at Ilmenau University of Technology, Germany and received her diploma in 2004. She joined the Databases & Information Systems Group at Ilmenau University of Technology as a research associate in the same year. She received her doctoral degree in Computer Science in 2009 and afterwards joined the Max-Planck Institute for Informatics in Saarbrücken. During her PhD studies she focused on distributed processing of skyline and top-k queries in schema-based P2P systems, efficient query routing, update strategies for routing indexes, heterogeneous data, and query rewriting using views. Her current research interests range from query processing and optimization in distributed systems, heterogeneous databases, and rank-aware query operators to information retrieval, linked data, and RDF query processing.
Akrivi Vlachou is currently a post-doctoral researcher at the Norwegian University of Science and Technology (NTNU) in collaboration with Athena Research and Innovation Center, Athens, Greece. She received her Ph.D. in 2008 from the Athens University of Economics and Business (AUEB), and her M.Sc. and B.Sc. degrees from the Department of Computer Science and Telecommunications of the University of Athens in 2003 and 2001, respectively. In her dissertation, she studied methods for efficient query processing for highly distributed data. She has received fellowships for post-doctoral studies from the European Research Consortium for Informatics and Mathematics (ERCIM) and from the Greek State Scholarship Foundation. She has published her research results in top-tier conferences and journals. Her research interests include query processing and data management in distributed systems, algorithms and query operators for large-scale data analysis, and spatial-keyword search over web-accessible data.
