Keynote: Prof. Martin Kersten — CWI, the Netherlands
to Purified Information
To truly understand the complexities of global warming, gene diversity, and the universe, the sciences increasingly rely on database technology to digest and manipulate large amounts of raw facts. This requires daily injection of terabytes, synergy with large existing science repositories, efficient support for the preferred computational paradigm (e.g. SQL, Python, R), and linked-in libraries (e.g. BLAS, GSL, NumPy). However, the staggering amount of science data collected creates a challenge that goes beyond merely buying more hardware or growing elastically in the cloud. Scientific database management systems must change fundamentally.
To cope with big data growth, a DBMS should provide a data freshness decay model that keeps it functioning within given storage, processing, and responsiveness bounds. After injection into a database, data should become subject to a data rotting process, or be distilled into meaningful half-products by a purification process. Data rotting uses data-agnostic techniques to remove data from the repository when the sustainability of the system is at risk. The countermeasure is to exploit domain knowledge to enable data purification, i.e. to replace raw data with sound statistical micro-models that reduce its resource claims. The project challenges the data durability axiom underlying all database systems. Instead, I posit that a DBMS may selectively forget raw data on its own initiative, ideally by harvesting micro-models and forgetting noisy facts. The experimental context is provided by emerging in-memory database technology, which offers a significant improvement over disk-based approaches.
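The purification idea can be illustrated with a toy sketch (hypothetical code, not MonetDB or any real system): raw readings are distilled into a tiny statistical micro-model, the raw facts are forgotten, and aggregate queries are then answered from the model alone.

```python
import statistics

# Toy illustration of "data purification": replace raw readings with a
# tiny statistical micro-model (here: count, mean, standard deviation),
# then answer aggregate queries from the model instead of the raw facts.
# Hypothetical sketch only -- not code from any actual DBMS.

raw_readings = [20.1, 20.3, 19.8, 20.0, 20.4, 19.9, 20.2]  # e.g. sensor temps

# Distil the raw data into a micro-model ...
micro_model = {
    "count": len(raw_readings),
    "mean": statistics.mean(raw_readings),
    "stdev": statistics.stdev(raw_readings),
}

# ... and let the raw facts "rot" (be forgotten).
del raw_readings

def approx_avg(model):
    """Answer an AVG() query from the micro-model, not the raw data."""
    return model["mean"]

def approx_sum(model):
    """Answer a SUM() query: mean * count reconstructs the total."""
    return model["mean"] * model["count"]

print(round(approx_avg(micro_model), 2))  # statistical answer, tiny footprint
```

The resource claim of the table shrinks from one cell per reading to a handful of model parameters, at the price of answering statistically rather than exactly.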
If successful, the research significantly reduces the resource requirements of scientific databases, provides fast and robust statistical query responses, and harvests use patterns by identifying the laws of the data. A new substrate is created for data-driven scientific discoveries.
Professor Martin Kersten is the recipient of the 2014 SIGMOD Edgar F. Codd Innovation Award for his influential contributions to advanced database architectures, most notably his pioneering work on columnar, in-memory, and hardware-conscious database technologies and their realization in the MonetDB system. Martin Kersten is a Research Fellow at CWI (the national research institute for mathematics and computer science in the Netherlands), a professor at the University of Amsterdam, and a co-founder of several database companies.
He has dedicated his scientific career to the development and dissemination of database systems and technology. Since the early nineties, he has developed MonetDB, an open-source, high-performance columnar database system. MonetDB contains innovations at all layers of a DBMS: a storage model based on vertical fragmentation, in-memory self-tuning indices, just-in-time query optimization, hardware-aware database structures, and a modern CPU-tuned vectorized query execution architecture. The technology was recognized with the VLDB 10-year Best Paper Award in 2009, the SIGMOD Best Paper runner-up award in 2009, and the VLDB Best Vision Paper Award in 2011. Most recently, the company MonetDB Solutions was established in 2013 to support commercial exploitation and drive the open-source product technology.
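The vertical-fragmentation idea behind a columnar store can be sketched in a few lines (an illustrative toy in Python, not actual MonetDB internals): each attribute lives in its own dense array, so a scan over one attribute touches only that array.

```python
# Toy contrast between a row store and a column store (vertical
# fragmentation). Illustrative sketch only -- not MonetDB internals.

# Row store: one record per tuple; a scan over one attribute still
# materializes every field of every row.
row_store = [
    {"id": 1, "name": "alpha", "price": 9.5},
    {"id": 2, "name": "beta",  "price": 3.0},
    {"id": 3, "name": "gamma", "price": 7.25},
]

# Column store: one dense array per attribute; a scan over "price"
# reads only the price array, which is also cache- and SIMD-friendly.
col_store = {
    "id":    [1, 2, 3],
    "name":  ["alpha", "beta", "gamma"],
    "price": [9.5, 3.0, 7.25],
}

def total_price_rows(rows):
    return sum(r["price"] for r in rows)  # walks whole tuples

def total_price_cols(cols):
    return sum(cols["price"])             # walks one dense column

assert total_price_rows(row_store) == total_price_cols(col_store)
print(total_price_cols(col_store))  # 19.75
```

The same answer comes out of both layouts; the difference is how much data the scan has to move through memory, which is where the hardware-conscious gains of columnar systems come from.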
In the last decade, Martin Kersten shifted his focus towards the requirements of scientific databases. Input from astronomy, seismology, and remote sensing applications led to enrichments of relational database technology: just-in-time access to scientific file repositories, a symbiosis between the relational query model and array-based processing, and support for statistics within the database kernel. These activities are part of the 10-year EU FET Human Brain Project.
Keynote: Dr. Götz Brasche — Central Software Institute, Huawei European Research Center, Germany
Are Databases Ready for the Cloudification of the Telecommunication Systems?
The mobile telecom industry has been growing at an unprecedented pace. It started in the 90s with mobile telephony and quickly expanded into a general application and service platform. The next big wave is expected with the advent of the Internet of Things (IoT): it is predicted that, within the next decade, the number of connected devices will grow from billions to tens of billions. Under this growth pressure and tough competition, the implementation of telecom systems is shifting away from dedicated hardware towards virtualized components. With the Network Function Virtualization (NFV) initiative, network components will be implemented as layers of distributed virtualized services. Physical routers, too, are being replaced with software entities following the Software Defined Networking (SDN) approach. The two approaches can also be combined, resulting in SDN over NFV.
Databases were already present in the first computerized switching centers. These service control databases were very often proprietary and highly specialized. Over time, demand for more general, commercially supported databases has surfaced. With the growing complexity of network services, the amount of data used in their execution (service control) is also growing rapidly. The introduction of NFV and SDN poses additional challenges for database technology. New service control databases are not only supposed to be very fast and very large. NFV imposes an environment of virtual clusters that are elastic (scaling in and out according to the load) and built of low-reliability components. Still, the database is expected to meet high availability requirements and varying consistency requirements. Thus, distributed databases enter a new era of having to deliver demanding service quality in a non-demanding environment. Also present are new trends towards memory-centered systems and towards new technologies such as non-volatile memory and transactional memory.
In the talk, we discuss in more detail the data management challenges posed by different telecom services, from IP session management to subscriber data management to SDN control. We also describe Huawei's plans to enter the era of network cloudification and to introduce NFV and SDN products. Databases will have their part in this development.
Dr. Götz Brasche is CTO IT R&D Europe and Director of the Central Software Institute at the European Research Center (ERC) of Huawei. He is responsible for the research and development of Huawei's IT, cloud and data center product portfolio and software platform research. Huawei, with its 170,000 employees and 76,000 R&D staff, is a world leader in mobile communication technologies, smartphones, and IT products. The ERC, part of the corporate R&D organization, is Huawei's central "innovation engine" in Europe, with 9 locations in 6 countries and currently about 1,200 employees. Dr. Brasche holds a master's degree in Computer Science with a minor in Business Administration and a Ph.D. in Electrical Engineering.
He joined Huawei in 2013 from Microsoft Research, where he had co-founded the European Microsoft Innovation Center (EMIC) in May 2003. At EMIC he was responsible for the cloud computing research engagements in EMEA. Dr. Brasche also initiated the establishment of the Microsoft Embedded Systems Development Center at EMIC at the beginning of 2008 to facilitate an integrated R&D approach for Microsoft in Europe. He represented Microsoft in the Joint Undertaking ARTEMIS, where he served on the Steering Board and co-chaired the chamber of industry members. In his prior roles at EMIC as Director of Embedded Systems R&D and Program Director, he was in charge of EMIC's collaborative R&D activities in the fields of embedded and mobile computing and of EMIC's overall research program management.
Before his career at Microsoft, Dr. Brasche held various management and research positions at Ericsson. As director of Ericsson's partner program, he helped initiate the mobile Internet market in Europe through the evaluation, design, and marketing of promising mobile applications and solutions. He was heavily involved in strategic business development and sales of solutions for the emerging Universal Mobile Telecommunications System (UMTS). While at Ericsson Eurolab Germany, one of Ericsson's major European research centers, he headed System Management, played a decisive role in the development and standardization of mobile communications systems, and was in charge of the pilot implementation of one of the first General Packet Radio Service (GPRS) networks worldwide. As a research assistant in the Department of Communication Networks at RWTH Aachen University, Dr. Brasche explored various aspects of embedded systems and the mobile Internet as early as 1992. In particular, he pioneered protocol architectures for packet-oriented speech and data services for existing and future mobile networks.
Keynote: Dr. Leo Kärkkäinen — Nokia Labs, Finland
Importance of Data to Artificial Intelligence. Deep Learning with Examples
Digitalisation has enabled the gathering and sharing of huge amounts of data while preserving the quality of the content. A lot of data has been structured and entered in a way that specifies relations within it. However, most digital content is free-form raw data, and this trend will only intensify as more and more connected devices upload digital reflections of modern life.
Big efforts have been made, and are ongoing, to give meaning and structure to raw data. In some cases this is done by crowdsourcing the annotation of content, be it images of faces, sensor data, or even medical data. With annotations, one can use a new breed of artificial intelligence, deep learning, to classify the data. This is what lies behind modern search, recommendation, classification, anomaly detection, and a vast number of other tasks related to data utilisation and semantic understanding.
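The role annotation plays can be made concrete with a toy sketch (a pure-Python perceptron, a minimal stand-in for the deep networks discussed here; the data points are invented for illustration): the human-supplied labels are exactly what lets the learner fit a decision rule.

```python
# Toy perceptron trained on annotated data -- a minimal stand-in for
# deep learning classifiers (illustrative only, pure Python).

# Crowd-sourced "annotations": each raw point carries a human label.
labeled_data = [
    ((2.0, 1.0), +1), ((3.0, 2.5), +1), ((2.5, 2.0), +1),
    ((0.5, 3.0), -1), ((1.0, 4.0), -1), ((0.0, 2.5), -1),
]

w = [0.0, 0.0]   # weights
b = 0.0          # bias

# Classic perceptron updates -- only possible because labels exist.
for _ in range(20):
    for (x1, x2), label in labeled_data:
        pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
        if pred != label:                  # misclassified: nudge weights
            w[0] += label * x1
            w[1] += label * x2
            b += label

def classify(x1, x2):
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print(classify(3.0, 1.0), classify(0.2, 3.5))  # 1 -1
```

Without the label column the update rule has nothing to compare against, which is why so much effort goes into annotation before any classification can happen.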
The talk will discuss the relation between raw data and deep learning algorithms in the industrial value chain and give examples of applications from industry. It is also important to foresee how the requirements for data storage and retrieval will change with these new ways of using the data.
Dr. Leo Kärkkäinen is Distinguished Research Leader at Nokia Labs. Since autumn 2015 Leo has been heading the Predictive Health Analytics team in the Digital Health Laboratory. He is a member of the laboratory's leadership team and also manages the activities for Nokia's university donations.
After obtaining his PhD in theoretical physics from the University of Helsinki in 1990, Leo held postdoctoral positions in Germany (Bielefeld), the USA (Tucson), and Denmark (Copenhagen). Since joining Nokia Research Center in 1996, he has worked as a leader of electro-acoustic and emerging technologies teams and in several leadership teams as a technology expert. He has been a member of the CEO (2009-2011) and CTO (2012-2014) Technology Councils of Nokia.
Leo is an Adjunct Professor of Theoretical Physics at the University of Helsinki and has authored more than 70 academic papers.
Keynote: Bruce Momjian — EnterpriseDB, USA
and Future Challenges
Relational databases are regularly challenged by new technologies that promise to make SQL obsolete, but these new approaches often fade into obscurity. NoSQL is one such technology, though it is actually four separate technologies: key-value stores, document databases, columnar stores, and graph databases. These NoSQL solutions are optimized for fast querying, auto-sharding, and flexible schemas. However, they lack many features of relational systems, such as a query language, optimization, and data integrity, making them unsuitable for complex applications. NoSQL works best for massive write scaling and simple data access patterns. Postgres has added JSON and variable-schema features to handle typical NoSQL data demands. Postgres also supports Foreign Data Wrappers to access NoSQL data stores when required.
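The flexible-schema idea can be sketched with a toy in-memory document store (pure Python, not Postgres itself; the table and column names in the comment are hypothetical): documents need not share a schema, which is what JSON columns bring to a relational system.

```python
# Toy "document store": rows need not share a schema, mirroring what
# JSON/JSONB columns allow inside Postgres. Illustrative sketch only.
# A roughly equivalent Postgres query (hypothetical table/column names):
#   SELECT data->>'name' FROM products WHERE data->>'color' = 'red';

products = [
    {"name": "mug",   "color": "red", "oz": 12},     # has "oz"
    {"name": "shirt", "color": "red", "size": "M"},  # has "size" instead
    {"name": "sock",  "color": "blue"},              # has neither
]

def find_by(docs, key, value):
    """Return names of documents where `key` equals `value`; documents
    missing the key are simply skipped -- there is no fixed schema."""
    return [d["name"] for d in docs if d.get(key) == value]

print(find_by(products, "color", "red"))  # ['mug', 'shirt']
```

The convenience comes at the cost the abstract mentions: nothing enforces that every product has a color, a size, or consistent types, which is the data-integrity gap relational systems normally close.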
Over its thirty years of development, Postgres has excelled at harnessing the creative talent of thousands of developers around the world. However, there are some difficult problems that have stumped our massive group. This talk explores how relational database systems like Postgres are adapting to handle NoSQL workloads, and covers some complex problems that continue to challenge the project.
Bruce Momjian is co-founder and core team member of the PostgreSQL Global Development Group, and has worked on PostgreSQL since 1996. Bruce is employed by EnterpriseDB. Previously, he was employed by SRA Japan and other PostgreSQL support companies. He has spoken at many international open-source conferences and is the author of PostgreSQL: Introduction and Concepts, published by Addison-Wesley.
Prior to his involvement with PostgreSQL, Bruce worked as a consultant, developing custom database applications for some of the world's largest law firms. As an academic, Bruce holds a Master's in Education, was a high school computer science teacher, and is currently an adjunct professor at Drexel University.
TCDE Impact Award: Prof. Michael J. Carey — University of California at Irvine, USA
Is the Fourth Time the Charm?
In the beginning was the Word, and the Word was with Codd, and the Word was Codd. The beginning - of the database field as we know it today, that is - was 1970. And of course, the Word was the relational model: rows and columns, a normalized (and flat) world. The spartan simplicity of that early relational model, together with the very idea of a logical data model, enabled revolutionary changes: declarative queries, transparent indexing, query optimization, and scalable run-time parallelism, among others. Also from this beginning grew a multi-billion dollar market for relational database systems and tools. And the rest, as they say, is history...
While relational languages and systems have served us incredibly well for nearly half a century, they have always been a bit of a "misfit" when it comes to application data and requirements, particularly for user-facing applications. Time and again this issue has been identified and "solved" - or not. Three notable attempts to address this "impedance mismatch" between applications and databases were object-oriented databases, object-relational databases, and XML databases. Each attempt, in the end, "fell flat" (so to speak) as compared to the success of relational databases. Another attempt appears to be happening today, in the so-called "Big Data" era, in the form of "NoSQL" databases. In this talk, the speaker will share his views about what went wrong the first three times, what lasting lessons resulted, and whether or not this fourth attempt might be the charm. He will also highlight some of the systems-related research challenges posed by this attempt, as well as some thoughts/pleas on how (or how not) to approach them.
Bio (with acknowledgements to Roberto Zicari for this bio):
Michael J. Carey is a Bren Professor of Information and Computer Sciences at UC Irvine. Before joining UCI in 2008, Carey worked at BEA Systems for seven years and led the development of BEA's AquaLogic Data Services Platform product for virtual data integration. He also spent a dozen years teaching at the University of Wisconsin-Madison, five years at the IBM Almaden Research Center working on object-relational databases, and a year and a half at e-commerce platform startup Propel Software during the infamous 2000-2001 Internet bubble.
Carey is an ACM Fellow, a member of the National Academy of Engineering, and a recipient of the ACM SIGMOD E.F. Codd Innovations Award. His current interests all center around data-intensive computing and scalable data management (a.k.a. Big Data).