32nd IEEE International Conference on Data Engineering

May 16-20, 2016 · Helsinki, Finland


It is with great pleasure that the organizers of the 32nd IEEE International Conference on Data Engineering invite you to take part in ICDE 2016 to be held in Helsinki, Finland, from May 16 to 20, 2016. As the capital of Finland, Helsinki is a vibrant city on the Baltic Sea renowned for its design and friendly atmosphere. The venue is at Aalto University's School of Business, which is located right in the city center.

Tutorials

  • Data Profiling
  • Ziawasch Abedjan, Lukasz Golab and Felix Naumann.
  • One of the crucial requirements before consuming a dataset in any application is to understand the dataset at hand and its metadata. The process of metadata discovery is known as data profiling. Profiling activities range from ad-hoc approaches, such as eyeballing random subsets of the data or formulating aggregation queries, to the systematic inference of structural information and statistics of a dataset using dedicated profiling tools. In this tutorial, we highlight the importance of data profiling as part of any data-related use case, and discuss the area of data profiling by classifying data profiling tasks and reviewing state-of-the-art data profiling systems and techniques. In particular, we discuss hard problems in data profiling, such as algorithms for dependency discovery and profiling algorithms for dynamic data and streams (a minimal sketch of these tasks follows this entry). We conclude with directions for future research in the area of data profiling.
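
To make the profiling tasks above concrete, here is a minimal, illustrative sketch in Python: it computes two basic single-column statistics and naively checks one candidate functional dependency. The table, column names, and values are hypothetical, and real profiling tools use far more efficient algorithms.

```python
# Hypothetical example rows; a real profiler would stream these from a file or table.
rows = [
    {"zip": "00100", "city": "Helsinki", "price": 12.5},
    {"zip": "00100", "city": "Helsinki", "price": 9.0},
    {"zip": "02150", "city": "Espoo",    "price": None},
]

def profile_column(rows, col):
    """Basic single-column profile: cardinality and null fraction."""
    values = [r[col] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "distinct": len(set(non_null)),
        "null_fraction": 1 - len(non_null) / len(values),
    }

def fd_holds(rows, lhs, rhs):
    """Naive check of the functional dependency lhs -> rhs:
    every lhs value must map to exactly one rhs value."""
    seen = {}
    for r in rows:
        key = r[lhs]
        if key in seen and seen[key] != r[rhs]:
            return False
        seen[key] = r[rhs]
    return True

print(profile_column(rows, "price"))  # {'distinct': 2, 'null_fraction': 0.333...}
print(fd_holds(rows, "zip", "city"))  # True: zip determines city in this sample
```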
  • Microblogs Data Management and Analysis
  • Amr Magdy and Mohamed Mokbel.
  • Microblog data, e.g., tweets, reviews, news comments, and social media comments, has gained considerable attention in recent years due to its popularity and rich content. Nowadays, microblog applications span a wide spectrum of interests, including detecting and analyzing events, user analysis for geo-targeted ads and political elections, and critical applications such as discovering health issues and supporting rescue services. Consequently, major research efforts are being spent on analyzing and managing microblog data to support different applications. This 1.5-hour tutorial gives an overview of microblog data analysis, management, and systems. The tutorial gives a comprehensive review of research efforts that analyze microblog content to build new functionality and use cases on top of it. In addition, the tutorial reviews existing research that proposes core data management components to support microblog queries at scale (a toy example of such a component follows this entry). Finally, the tutorial reviews system-level issues and ongoing work on supporting microblog data in the rising big data systems. Through its different parts, the tutorial highlights the challenges and opportunities in microblog data research.
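
As a purely illustrative taste of such a component, the following toy Python sketch indexes a stream of posts by keyword and answers top-k most-recent keyword queries; the data and the design are hypothetical simplifications, not the systems covered in the tutorial.

```python
import heapq
from collections import defaultdict

# Toy in-memory index for a microblog stream: per-keyword posting lists
# of (timestamp, post), supporting top-k most-recent keyword queries.
postings = defaultdict(list)

def ingest(timestamp, post):
    """Add one post to the posting list of each of its distinct tokens."""
    for token in set(post.lower().split()):
        postings[token].append((timestamp, post))

def topk_recent(keyword, k):
    """Return the k most recent posts containing the keyword."""
    return [post for _, post in heapq.nlargest(k, postings[keyword.lower()])]

ingest(1, "ICDE 2016 is in Helsinki")
ingest(2, "flying to Helsinki tomorrow")
ingest(3, "Helsinki weather looks great")
print(topk_recent("helsinki", 2))  # the two newest Helsinki posts
```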
  • The Era of Big Spatial Data
  • Ahmed Eldawy and Mohamed Mokbel.
  • In this tutorial, we present recent work in the database community on handling Big Spatial Data. This topic has become very active due to the recent explosion in the amount of spatial data generated by smartphones, satellites, and medical devices, among others. The tutorial goes beyond the use of existing systems as-is (e.g., Hadoop, Spark, or Impala), and digs deep into the core components of big data systems (e.g., indexing and query processing) to describe how they are designed to handle big spatial data; a toy version of the indexing component appears after this entry. During this 90-minute tutorial, we review the state-of-the-art work in the area of Big Spatial Data, classifying the existing research efforts according to implementation approach, underlying architecture, and system components. In addition, we provide case studies of full-fledged systems and applications that handle Big Spatial Data, allowing the audience to better comprehend the whole tutorial.
  • Blocking for Large-Scale Entity Resolution: Challenges, Algorithms, and Practical Examples
  • George Papadakis and Themis Palpanas.
  • Entity Resolution constitutes one of the cornerstone tasks for the integration of overlapping information sources. Due to its quadratic complexity, a large body of research has focused on improving its efficiency so that it scales to Web Data collections, which are inherently voluminous and highly heterogeneous. The most common approach for this purpose is blocking, which clusters similar entities into blocks so that pair-wise comparisons are restricted to the entities contained within each block (a minimal sketch of this idea follows this entry). In this tutorial, we take a close look at blocking-based Entity Resolution, starting from the early blocking methods that were crafted for database integration. We highlight the challenges posed by contemporary heterogeneous, noisy, voluminous Web Data and explain why they render these schema-based techniques inapplicable. We continue with the presentation of blocking methods that have been developed for large-scale and heterogeneous information and are suitable for Web Data collections. We also explain how their efficiency can be further improved by meta-blocking and parallelization techniques. We conclude with a hands-on session that demonstrates the relative performance of several state-of-the-art techniques. Participants will put into practice all the topics discussed in the theory part and get familiar with a reference toolbox that implements the most prominent techniques in the area and can be readily used to tackle Entity Resolution problems.
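
The core idea of blocking fits in a few lines. The sketch below implements simple token blocking over hypothetical entity descriptions: every token defines a block, and candidate pairs are generated only inside blocks. It illustrates the general scheme only, not any specific method covered in the tutorial.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical entity descriptions from two overlapping sources.
entities = {
    "e1": "samsung galaxy s6 smartphone",
    "e2": "galaxy s6 by samsung",
    "e3": "canon eos 70d camera",
    "e4": "eos 70d dslr canon",
}

# Token blocking: every distinct token defines a block containing all
# entities whose description mentions it.
blocks = defaultdict(set)
for eid, desc in entities.items():
    for token in set(desc.split()):
        blocks[token].add(eid)

# Candidate pairs are generated only within blocks, instead of comparing
# all n*(n-1)/2 entity pairs.
candidates = set()
for block in blocks.values():
    candidates.update(combinations(sorted(block), 2))

print(sorted(candidates))  # [('e1', 'e2'), ('e3', 'e4')]
```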
  • Accelerating Database Workloads by Software-Hardware-System Co-design
  • Rajesh R. Bordawekar and Mohammad Sadoghi.
  • The key objective of this tutorial is to provide a broad yet in-depth survey of the emerging field of co-designing software, hardware, and system components to accelerate enterprise data management workloads. The overall goal of this tutorial is two-fold. First, we provide a concise system-level characterization of different types of data management technologies, namely relational and NoSQL databases and data stream management systems, from the perspective of analytical workloads. Using this characterization, we discuss opportunities for accelerating key data management workloads through software and hardware approaches. Second, we dive deeper into hardware acceleration opportunities using Graphics Processing Units (GPUs) and Field-Programmable Gate Arrays (FPGAs) for the query execution pipeline. Furthermore, we explore other hardware acceleration mechanisms, such as single-instruction, multiple-data (SIMD) execution, which enables short-vector data parallelism (illustrated conceptually after this entry).
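
Python offers no direct SIMD intrinsics, so the sketch below uses NumPy's array-at-a-time operations merely as a conceptual stand-in for short-vector data parallelism: one operation is applied to many column values at once, rather than in a scalar loop. The column and predicate are hypothetical.

```python
import numpy as np

# A hypothetical 'price' column.
prices = np.random.default_rng(0).uniform(0, 100, size=100_000)

def scalar_count(values, threshold):
    """Scalar, one-value-at-a-time evaluation of the predicate v < threshold."""
    count = 0
    for v in values:
        if v < threshold:
            count += 1
    return count

def vectorized_count(values, threshold):
    """Data-parallel evaluation: the comparison is applied to the whole
    column at once, the way a SIMD lane applies one instruction to
    multiple values."""
    return int((values < threshold).sum())

assert scalar_count(prices, 42.0) == vectorized_count(prices, 42.0)
```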
  • Scalable Data Management: NoSQL Data Stores in Research and Practice
  • Felix Gessert and Norbert Ritter.
  • The unprecedented scale at which data is consumed and generated today has created a large demand for scalable data management and given rise to non-relational, distributed "NoSQL" database systems. Two central problems triggered this development: 1) the vast amounts of user-generated content in modern applications and the resulting request loads and data volumes, and 2) the desire of the developer community to employ problem-specific data models for storage and querying. To address these needs, various data stores have been developed by both industry and research, arguing that the era of one-size-fits-all database systems is over. The heterogeneity and sheer number of these systems - now commonly referred to as NoSQL data stores - make it increasingly difficult to select the most appropriate system for a given application. Therefore, these systems are frequently combined in polyglot persistence architectures to leverage each system in its respective sweet spot. This tutorial gives an in-depth survey of the most relevant NoSQL databases, provides a comparative classification, and highlights open challenges. To this end, we analyze the approach of each system to derive its scalability, availability, consistency, data modeling, and querying characteristics. We present how each system's design is governed by a central set of trade-offs over irreconcilable system properties (one classic example is sketched after this entry). We then cover recent research results in distributed data management to illustrate that some shortcomings of NoSQL systems can already be solved in practice, whereas other NoSQL data management problems pose interesting and unsolved research challenges.
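
One such classic trade-off can be stated in a single inequality: in Dynamo-style quorum replication, a read is guaranteed to observe the latest acknowledged write only if the read and write quorums overlap, i.e., R + W > N. A minimal sketch follows; the parameter names follow the usual convention and the example configurations are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Dynamo-style condition for strongly consistent reads: the read
    quorum (r) and write quorum (w) out of n replicas must intersect."""
    return r + w > n

# Hypothetical configurations for a 3-replica system:
print(quorums_overlap(3, 2, 2))  # True: overlapping quorums, consistent reads
print(quorums_overlap(3, 1, 1))  # False: lower latency, but reads may be stale
```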
  • Indoor Data Management
  • Hua Lu and Muhammad Aamir Cheema.
  • A large part of modern life is lived indoors, in homes, offices, shopping malls, universities, libraries, and airports. However, almost all existing location-based services (LBS) have been designed for outdoor space only, mainly because the global positioning system (GPS) and other positioning technologies cannot accurately identify locations in indoor venues. Some recent initiatives have started to cross this technical barrier, promising huge future opportunities for research organizations, government agencies, technology giants, and enterprising start-ups to exploit the potential of indoor LBS. Consequently, indoor data management has gained significant research attention in the past few years, and this interest is expected to surge in the upcoming years. It will enable a broad range of indoor applications, including emergency services, public services, in-store advertising, shopping, tracking, guided tours, and much more. In this tutorial, we first highlight the importance of indoor data management and the unique challenges that need to be addressed. Subsequently, we provide an overview of the existing research in indoor data management, covering modeling, cleansing, indexing, querying, and other relevant topics (a toy indoor model and query follow this entry). Finally, we discuss future directions in this important and growing research area, covering spatial-textual search, integrating outdoor and indoor spaces, handling uncertain indoor data, and mining and analytics for indoor positioning data.
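
As a toy illustration of indoor modeling and querying, the sketch below represents a hypothetical floor plan as an accessibility graph (rooms as nodes, doors as edges) and answers a shortest-route query by breadth-first search, ignoring geometric distance.

```python
from collections import deque

# Hypothetical floor plan: nodes are rooms, edges are doors between them.
doors = {
    "lobby":    ["corridor"],
    "corridor": ["lobby", "room101", "room102", "stairs"],
    "room101":  ["corridor"],
    "room102":  ["corridor"],
    "stairs":   ["corridor"],
}

def indoor_path(start, goal):
    """Breadth-first search over the accessibility graph: the shortest
    route in door-to-door hops."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in doors[path[-1]]:
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None  # goal unreachable from start

print(indoor_path("lobby", "room102"))  # ['lobby', 'corridor', 'room102']
```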
  • Scaling Up Truth Discovery: From Probabilistic Inference to Misinformation Dynamics
  • Laure Berti-Équille and Javier Borge-Holthoefer.
  • The evolution of the Web from a technology platform to a social ecosystem has resulted in unprecedented data volumes being continuously generated, exchanged, and consumed. User-generated content on the Web is massive, highly dynamic, and characterized by a combination of factual data and opinion data. False information, rumors, and fake content can easily spread across multiple sources, making it hard to distinguish between what is true and what is not. Truth discovery (also known as fact-checking) has recently gained a lot of interest from the Data Science communities. This tutorial attempts to cover all facets of the complex topic of truth-finding in Big Data. It provides a broad overview with new insights, highlighting the progress that has been made in information extraction, data and knowledge fusion, and the modeling of misinformation dynamics in complex (social) networks. We review in detail current models, algorithms, and techniques proposed by various research communities whose contributions converge towards the same goal of estimating the veracity of data at Web scale (a minimal example of this style of inference follows this entry).
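
To hint at the flavor of such inference, the sketch below runs a minimal fixed-point iteration that alternates between estimating source trustworthiness and the belief in conflicting claimed values. The claims and sources are made up, and the actual models covered in the tutorial are considerably richer.

```python
# Hypothetical conflicting claims: object -> {value: set of asserting sources}.
claims = {
    "capital_of_finland": {"Helsinki": {"s1", "s2"}, "Espoo": {"s3"}},
    "icde2016_city":      {"Helsinki": {"s1", "s3"}, "Oslo":  {"s2"}},
}

sources = {"s1", "s2", "s3"}
trust = {s: 0.5 for s in sources}  # start with uniform source trust

for _ in range(20):  # fixed-point iteration
    # Belief in a value = normalized sum of the trust of its supporters.
    belief = {}
    for obj, values in claims.items():
        scores = {v: sum(trust[s] for s in srcs) for v, srcs in values.items()}
        total = sum(scores.values())
        belief[obj] = {v: sc / total for v, sc in scores.items()}
    # Trust of a source = average belief in the values it asserts.
    for s in sources:
        asserted = [belief[obj][v]
                    for obj, values in claims.items()
                    for v, srcs in values.items() if s in srcs]
        trust[s] = sum(asserted) / len(asserted)

for obj, dist in belief.items():
    print(obj, max(dist, key=dist.get))  # most believed value per object
```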