HardBD & Active'18

Joint Workshop of HardBD (International Workshop on Big Data Management on Emerging Hardware)
and Active (Workshop on Data Management on Virtualized Active Systems)

To be Sponsored by and Held in Conjunction with ICDE 2018

April 16, 2018 in Paris, France


Description

HardBD : Data properties and hardware characteristics are two key aspects for efficient data management. A clear trend in the first aspect, data properties, is the increasing demand to manage and process Big Data in both enterprise and consumer applications, characterized by the fast evolution of Big Data Systems, such as Key-Value stores, Document stores, Graph stores, Spark, MapReduce/Hadoop, Graph Computation Systems, Tree-Structured Databases, as well as novel extensions to relational database systems. At the same time, the second aspect, hardware characteristics, is undergoing rapid changes, imposing new challenges for the efficient utilization of hardware resources. Recent trends include massive multi-core processing systems, high performance co-processors, very large main memory systems, persistent main memory, fast networking components, big computing clusters, and large data centers that consume massive amounts of energy. Utilizing new hardware technologies for efficient Big Data management is of urgent importance.

Active : Existing approaches to solving data-intensive problems often require data to be moved near the computing resources for processing. These data movement costs can be prohibitive for large data sets. One promising solution is to bring virtualized computing resources closer to the data, whether the data is at rest or in motion. The premise of active systems is a new holistic view of the system in which every data medium and every communication channel becomes compute-enabled. The Active workshop aims to study different aspects of the active systems' stack, understand the impact of active technologies (including but not limited to hardware accelerators such as SSDs, GPUs, FPGAs, and ASICs) on different application workloads over the lifecycle of data, and revisit the interplay between algorithmic modeling, compilers and programming languages, virtualized runtime systems and environments, and hardware implementations, for effective exploitation of active technologies.

HardBD & Active'18 : Both HardBD and Active are interested in exploiting hardware technologies for data-intensive systems. The aim of this one-day joint workshop is to bring together researchers, practitioners, system administrators, and others interested in this area to share their perspectives on exploiting new hardware technologies for data-intensive workloads and big data systems, and to discuss and identify future directions and challenges in this area. The workshop aims at providing a forum for academia and industry to exchange ideas through research and position papers.

Topics

Topics of interest include but are not limited to:

Systems Architecture on New Hardware
Data Management Issues in Software-Hardware-System Co-design
Main Memory Data Management (e.g., CPU Cache Behavior, SIMD, Lock-Free Designs, Transactional Memory)
Data Management on New Memory Technologies (e.g., SSDs, NVMs)
Active Technologies (e.g., GPUs, FPGAs, and ASICs) in Co-design Architectures
Distributed Data Management Utilizing New Network Technologies (e.g., RDMA)
Novel Applications of New Hardware Technologies in Query Processing, Transaction Processing, or Big Data Systems (e.g., Hadoop, Spark, NoSQL, NewSQL, Document Stores, Graph Platforms)
Novel Applications of Low-Power Modern Processors in Data-Intensive Workloads
Virtualizing Active Technologies on Cloud (e.g., Scalability and Security)
Benchmarking, Performance Models, and/or Tuning of Data Management Workloads on New Hardware Technologies

Submission Guidelines

We welcome submissions of original, unpublished research papers that are not being considered for publication in any other forum. Papers should be prepared in the IEEE format and submitted as a single PDF file. The paper length should not exceed 6 pages. The submission site is https://cmt3.research.microsoft.com/HardBDActive2018 .

Important Dates

Paper submission:	~~January 15, 2018 (Monday) 11:59:00 PM PT~~ January 21, 2018 (Sunday) 11:59:00 PM PT
Notification of acceptance:	~~February 5, 2018 (Monday)~~ February 10, 2018 (Saturday)
Camera-ready copies:	~~February 19, 2018 (Monday)~~ February 24, 2018 (Saturday)
Workshop:	April 16, 2018 (Monday)

Program

8:40-8:45am Welcome Messages

8:45-9:45am Session I: Keynote 1

Bitwise Dimensional Co-Clustering (BDCC): Exploiting Fine-Grained Persistent Memories for OLAP. (slides)
Peter Boncz (CWI & VU University Amsterdam)

9:45-10:30am Coffee Break

10:30-12:00pm Session II: Research Presentation 1

Exploiting Automatic Vectorization to Employ SPMD on SIMD Registers. Stefan Sprenger (Humboldt-Universität zu Berlin), Steffen Zeuch (DFKI Berlin), Ulf Leser (Humboldt-Universität zu Berlin) (slides)
Conflict Detection-based Run-Length Encoding -- AVX512-CD Instruction Set in Action. Annett Ungethüm, Johannes Pietrzyk, Patrick Damme, Dirk Habich, Wolfgang Lehner (TU Dresden) (slides)
Fused Table Scans: Using AVX-512 and JIT to Double the Performance of Consecutive Table Scans. Markus Dreseler, Jan Kossmann, Johannes Frohnhofen, Matthias Uflacker, Hasso Plattner (Hasso Plattner Institute) (slides)
An NVM-aware Storage Layout for Analytical Workloads. Philipp Götze, Stephan Baumann, Kai-Uwe Sattler (TU Ilmenau)

12:00-13:30 Lunch

13:30-15:00 Session III: Research Presentation 2

Workload-Driven Horizontal Partitioning and Partition Pruning for Large HTAP Systems. Martin Boissier (Hasso Plattner Institute), Daniel Kurzynski (SAP SE) (slides)
Life Cycle of Transactional Data in In-memory Databases. Amit Pathak, Aditya Gurajada, Pushkar Khadilkar (SAP) (slides)
Towards Batch-Processing on Cold Storage Devices. Ali Hadian, Thomas Heinis (Imperial College) (slides)
Efficient Stream Processing of Scientific Data. Thomas Lindemann, Jonas Kauke, Jens Teubner (TU Dortmund)

15:00-15:30 Coffee Break

15:30-16:30 Session IV: Keynote 2

Caching in the Memory Hierarchy: 5 Minutes Ought to Be Enough for Everybody. (slides)
Anastasia Ailamaki (EPFL)

16:30-17:00 Session V: Industry Talk

X-DB: the Next Generation Database System of Alibaba Group.
Tieying Zhang (Alibaba)

Keynote Talks

Bitwise Dimensional Co-Clustering (BDCC):
Exploiting Fine-Grained Persistent Memories for OLAP

Peter Boncz (CWI & VU University Amsterdam)

With the current dominance of flash and the trend towards even more fine-grained non-volatile memories (NVM), a lot of attention has rightfully been given to the implications of fine-grained persistence for OLTP. This keynote, however, focuses on OLAP, and describes a new processing and storage framework called Bitwise Dimensional Co-clustering (BDCC) that exploits the low-latency and small-granularity access capabilities of such modern storage.

Analytical workloads in data warehouses often include heavy joins where queries involve multiple fact tables in addition to the typical star patterns, dimensional grouping, and selections. Deeply clustered storage is an optimizing database design that avoids replication and thus keeps updates fast, yet accelerates all foreign key joins, efficiently supports grouping, pushes down most dimensional selections, and sometimes eliminates dimensional joins. The framework is made up of sideways information passing operators at execution, query optimization rules and costing, and an automated physical design algorithm that takes the schema as its only input.

While BDCC was designed for flash, its fine-grained philosophy matches up-and-coming NVM capabilities, and this talk is intended to inspire new research directions for analytics on these new memories.

Bio: Peter Boncz holds appointments as tenured researcher at CWI and full professor at VU University Amsterdam. His academic background is in core database architecture, with the architecture of MonetDB as the main topic of his PhD thesis; MonetDB won the 2016 ACM SIGMOD Systems Award. This work focused on architecture-conscious database research, which studies the interaction between computer architecture and data management techniques. His specific contributions are in cache-conscious join methods, query and transaction processing in columnar database systems, and vectorized query execution. He has a track record in bridging the gap between academia and commercial application, receiving the Dutch ICT Regie Award 2006 for his role in the CWI spin-off company Data Distilleries. In 2008 he founded a new CWI spin-off company called Vectorwise, dedicated to state-of-the-art business intelligence technology. He is also the co-recipient of the 2009 VLDB 10 Years Best Paper Award, and in 2013 received the Humboldt Research Award. He also provides advice on high-performance data architectures to Databricks (the creators of Apache Spark), who recently opened an Amsterdam R&D office to collaborate with CWI.

Caching in the Memory Hierarchy:
5 Minutes Ought to Be Enough for Everybody

Anastasia Ailamaki (EPFL)

In 1987, Jim Gray and Gianfranco Putzolu introduced the five-minute rule for trading memory to reduce disk I/O using the then-current price-performance characteristics of DRAM and Hard Disk Drives (HDD). Since then, the five-minute rule has gained widespread acceptance as an important rule of thumb in data engineering. In this talk, I will revisit the five-minute rule three decades after its introduction and use it to identify impending changes in today's multi-tier storage hierarchy given recent trends in the storage hardware landscape. I will investigate the impact of the five-minute rule, whether explicit or implicit, on the way we perform analytics. We will see that the rule applies both in the bottom tiers of the hierarchy, which are based on new Cold Storage Devices (CSD), and in main-memory databases, where researchers have been working on hot-cold data separation and on heterogeneity-aware caching techniques.

Bio: Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the Ecole Polytechnique Federale de Lausanne (EPFL) in Switzerland and the co-founder of RAW Labs SA, a Swiss company developing real-time analytics infrastructures for heterogeneous big data. Her research interests are in data-intensive systems and applications, and in particular (a) in strengthening the interaction between the database software and emerging hardware and I/O devices, and (b) in automating data management to support computationally demanding, data-intensive scientific applications. She has received an ERC Consolidator Award (2013), a Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), eight best-paper awards in database, storage, and computer architecture conferences, and an NSF CAREER award (2002). She received a Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She is an ACM fellow, an IEEE fellow, and an elected member of the Swiss National Research Council. She has served as a CRA-W mentor, and is a member of the Expert Network of the World Economic Forum.

Organizers

Shimin Chen, Chinese Academy of Sciences, chensm@ict.ac.cn
Xiaofeng Meng, Renmin University of China (RUC), mengxf@ruc.edu.cn
Mohammad Sadoghi, UC Davis, msadoghi@ucdavis.edu

PC Members

Manos Athanassoulis, Harvard University
Sebastian Breß, DFKI GmbH
Bingsheng He, National University of Singapore
Peiquan Jin, University of Science and Technology of China
Qiong Luo, Hong Kong University of Science and Technology
Roger Moussalli, Two Sigma
Ilia Petrov, Reutlingen University
Eva Sitaridi, Amazon
Xiaodong Zhang, Ohio State University

Sponsor
