HardBD & Active'20

Joint International Workshop on Big Data Management on Emerging Hardware
and Data Management on Virtualized Active Systems

To be Sponsored by and Held in Conjunction with ICDE 2020

April 20, 2020 in Dallas, Texas, USA

Update: Due to the COVID-19 situation, HardBD&Active 2020 will follow ICDE 2020's decision and be held online using Zoom. Registration for ICDE and the workshops is free.

The morning sessions will be in the HardBD & Active Zoom room. The afternoon session will be in the SMDB Zoom room.

Description

HardBD : Data properties and hardware characteristics are two key aspects for efficient data management. A clear trend in the first aspect, data properties, is the increasing demand to manage and process Big Data in both enterprise and consumer applications, characterized by the fast evolution of Big Data Systems, such as Key-Value stores, Document stores, Graph stores, Spark, MapReduce/Hadoop, Graph Computation Systems, Tree-Structured Databases, as well as novel extensions to relational database systems. At the same time, the second aspect, hardware characteristics, is undergoing rapid changes, imposing new challenges for the efficient utilization of hardware resources. Recent trends include massive multi-core processing systems, high performance co-processors, very large main memory systems, persistent main memory, fast networking components, big computing clusters, and large data centers that consume massive amounts of energy. Utilizing new hardware technologies for efficient Big Data management is of urgent importance.

Active : Existing approaches to solving data-intensive problems often require data to be moved near the computing resources for processing. These data movement costs can be prohibitive for large data sets. One promising solution is to bring virtualized computing resources closer to data, whether it is at rest or in motion. The premise of active systems is a new holistic view of the system in which every data medium and every communication channel becomes compute-enabled. The Active workshop aims to study different aspects of the active systems' stack, understand the impact of active technologies (including but not limited to hardware accelerators such as SSDs, GPUs, FPGAs, and ASICs) on different application workloads over the lifecycle of data, and revisit the interplay between algorithmic modeling, compilers and programming languages, virtualized runtime systems and environments, and hardware implementations, for effective exploitation of active technologies.

HardBD & Active'20 : Both HardBD and Active are interested in exploiting hardware technologies for data-intensive systems. The aim of this one-day joint workshop is to bring together researchers, practitioners, system administrators, and others interested in this area to share their perspectives on exploiting new hardware technologies for data-intensive workloads and big data systems, and to discuss and identify future directions and challenges in this area. The workshop aims at providing a forum for academia and industry to exchange ideas through research and position papers.

Topics

Topics of interest include but are not limited to:

Systems Architecture on New Hardware
Data Management Issues in Software-Hardware-System Co-design
Main Memory Data Management (e.g., CPU Cache Behavior, SIMD, Lock-Free Designs, Transactional Memory)
Data Management on New Memory Technologies (e.g., SSDs, NVMs)
Active Technologies (e.g., GPUs, FPGAs, and ASICs) in Co-design Architectures
Distributed Data Management Utilizing New Network Technologies (e.g., RDMA)
Novel Applications of New Hardware Technologies in Query Processing, Transaction Processing, or Big Data Systems (e.g., Hadoop, Spark, NoSQL, NewSQL, Document Stores, Graph Platforms, etc.)
Novel Applications of Low-Power Modern Processors in Data-Intensive Workloads
Virtualizing Active Technologies on Cloud (e.g., Scalability and Security)
Benchmarking, Performance Models, and/or Tuning of Data Management Workloads on New Hardware Technologies

Submission Guidelines

We welcome submissions of original, unpublished research papers that are not being considered for publication in any other forum. Papers should be prepared in the IEEE format and submitted as a single PDF file. The paper length should not exceed 6 pages. The submission site is https://cmt3.research.microsoft.com/HardBDActive2020.

Authors of a selection of accepted papers will be invited to submit an extended version to the Distributed and Parallel Databases (DAPD) journal.

Important Dates

Paper submission:	January 24, 2020 (Friday) 11:59:00 PM PT (extended from January 20, 2020 (Monday) 11:59:00 PM PT)
Notification of acceptance:	February 10, 2020 (Monday)
Camera-ready copies:	February 28, 2020 (Friday)
Workshop:	April 20, 2020 (Monday)

Program

Note: Times are displayed in CDT.

HardBD&Active Zoom Room: all morning sessions

8:45-9:00am CDT Welcome Messages

9:00-9:45am CDT Keynote 1

Software Hardware Co-Design for Cloud Native Database Systems
Feifei Li (Vice President of Alibaba Group, Professor at University of Utah)

9:45-10:15am CDT Joint Invited Talk with SMDB

AI-native Database
Guoliang Li (Tsinghua University)

10:15-10:30am CDT Break

10:30-11:30am CDT Research Presentation

On the Necessity of Explicit Cross-Layer Data Formats in Near-Data Processing Systems.
Tobias Vinçon (Reutlingen University), Arthur Bernhardt (Reutlingen University), Lukas Weber (TU Darmstadt), Andreas Koch (TU Darmstadt), Ilia Petrov (Reutlingen University)
Selective Caching: A Persistent Memory Approach for Multi-Dimensional Index Structures.
Muhammad Attahir Jibril (TU Ilmenau), Philipp Götze (TU Ilmenau), David Broneske (OvG University Magdeburg & Anhalt University of Applied Science), Kai-Uwe Sattler (TU Ilmenau)
A Persistent Memory-Aware Buffer Pool Manager Simulator for Multi-Tenant Cloud Databases.
Taras Basiuk (University of Oklahoma), Le Gruenwald (University of Oklahoma), Laurent d'Orazio (Rennes 1 University), Eleazar Leal (University of Minnesota)

11:30-12:00pm CDT Break (Please switch to the SMDB Zoom room for the afternoon session)

SMDB Zoom Room: afternoon session

12:00-12:45pm CDT Joint Keynote 2 with SMDB

AIOps with the Oracle Autonomous Database
Sandesh Rao (Oracle USA, Vice President, Cloud Diagnosability and RAC Assurance)

Joint Keynote Talks with SMDB 2020

Software Hardware Co-Design for Cloud Native Database Systems

Feifei Li
Vice President of Alibaba Group, Professor at University of Utah

Abstract: Cloud native database systems are becoming increasingly popular. They leverage the virtualized resource pool provided by the underlying cloud infrastructure to offer excellent elasticity, high availability, and scalability. Decoupling resource usage and management across the stack (e.g., compute and storage) is a critical path towards realizing cloud native properties. Software-hardware co-design plays an important role in this paradigm, for example kernel bypassing, RDMA for shared distributed storage, FPGA acceleration, NVM for a tiered memory hierarchy, and TEE for secure and trustworthy compute, to name a few. This talk shares our experience and lessons learned from using software-hardware co-design principles towards building cloud native database systems.

Bio: Feifei Li is currently a Vice President of Alibaba Group, ACM Distinguished Scientist, President of the Database Products Business Unit of Alibaba Cloud, and director of the Database and Storage Lab of DAMO Academy. He is a tenured full professor at the School of Computing, University of Utah (on leave). He has won multiple awards from NSF, ACM, IEEE, Visa, Google, HP, Microsoft, IBM, etc. He is a recipient of the ACM SoCC 2019 Best Paper Award (runner-up), IEEE ICDE 2014 10 Years Most Influential Paper Award, ACM SIGMOD 2016 Best Paper Award, ACM SIGMOD 2015 Best System Demonstration Award, and IEEE ICDE 2004 Best Paper Award. He has served as an associate editor, PC co-chair, and core committee member for many prestigious journals and conferences.

AIOps with the Oracle Autonomous Database

Sandesh Rao
Oracle USA
Vice President, Cloud Diagnosability and RAC Assurance

Abstract: Autonomous Database is one of the hottest Oracle products, and we have attempted to use machine learning for several aspects of the service. We will cover use cases for finding and troubleshooting anomalies at a scale of several petabytes a year, using a Log Anomaly timeline and semi-supervised machine learning techniques to reduce logs and match them in near real time. We will also cover how we detect changing workloads, use Z-scores to pinpoint faults, use time series analysis to find good times for backups or maintenance, and build models to detect performance tuning issues and perform root cause analysis, as well as fleet learning to apply knowledge of trends and issues across multiple symptoms affecting the fleet, including rediscovery. We will cover examples, code where applicable, and the frameworks we use for this.

Bio: Sandesh Rao is a VP running the AIOps Automation for the Autonomous Database Group at Oracle Corporation, specializing in using AI/ML for use cases ranging from predicting faults before they happen to anomaly detection within log and metrics data. His previous positions have focused on performance tuning, high availability, disaster recovery, and architecting cloud-based solutions using the Oracle Stack. With more than 18 years of experience working in the HA space and having worked on several versions of Oracle with different application stacks, he is a recognized expert in RAC, Database Internals, PaaS, SaaS, and IaaS solutions and in solving Big Data related problems. Most of his work involves working with customers in the implementation of public and hybrid cloud projects in the financial, retailing, scientific, insurance, biotech, and tech space. He is also responsible for developing assessments for best practices for the Oracle Grid Infrastructure 19c, including products like RAC (Real Application Clusters) and Storage (ASM, ACFS). More details: https://bit.ly/1UCL46K

Joint Invited Talk with SMDB 2020

AI-Native Database

Guoliang Li
Tsinghua University

Abstract: In the big data era, database systems face three challenges. Firstly, the traditional heuristics-based optimization techniques (e.g., cost estimation, join order selection, knob tuning) cannot meet the high-performance requirements of large-scale data, various applications, and diversified data. We can design learning-based techniques to make databases more intelligent. Secondly, many database applications require AI algorithms, e.g., image search in a database. We can embed AI algorithms into the database, utilize database techniques to accelerate AI algorithms, and provide AI capability inside databases. Thirdly, traditional databases focus on using general hardware (e.g., CPU), but cannot fully utilize new hardware (e.g., AI chips). Moreover, besides the relational model, we can utilize the tensor model to accelerate AI operations. Thus, we need to design new techniques to make full use of new hardware. To address these challenges, we design an AI-native database. On the one hand, we integrate AI techniques into databases to provide self-configuring, self-optimizing, self-healing, self-protecting, and self-inspecting capabilities for databases. On the other hand, we enable databases to provide AI capabilities using declarative languages, in order to lower the barrier to using AI. In this talk, I will introduce the five levels of AI-native databases and present the open challenges of designing an AI-native database. I will also take automatic database knob tuning, a deep reinforcement learning based optimizer, machine-learning based cardinality estimation, and an automatic index/view advisor as examples to showcase the superiority of AI-native databases.

Bio: Guoliang Li is a full Professor in the Department of Computer Science, Tsinghua University, Beijing, China. His research interests include AI-native databases, big data analytics and mining, crowdsourced data management, big spatio-temporal data analytics, and large-scale data cleaning and integration. He has published more than 100 papers in premier conferences and journals, such as SIGMOD, VLDB, ICDE, SIGKDD, SIGIR, TODS, VLDB Journal, and TKDE. He will be the General co-chair of SIGMOD 2021 and demo chair of VLDB 2021. He serves as an associate editor for IEEE Transactions on Knowledge and Data Engineering, VLDB Journal, ACM Transactions on Data Science, and IEEE Data Engineering Bulletin. He has received several best paper awards at top conferences, such as the CIKM 2017 best paper award, ICDE 2018 best paper candidate, KDD 2018 best paper candidate, DASFAA 2014 best paper runner-up, and APWeb 2014 best paper award. He received the VLDB Early Research Contribution Award 2017 and the IEEE TCDE Early Career Award 2014.

Organizers

Shimin Chen, Chinese Academy of Sciences, chensm@ict.ac.cn
Mohammad Sadoghi, UC Davis, msadoghi@ucdavis.edu
Khuzaima Daudjee, University of Waterloo, kdaudjee@uwaterloo.ca

PC Members

Manos Athanassoulis, Boston University
Bingsheng He, National University of Singapore
Peiquan Jin, University of Science and Technology of China
Wolfgang Lehner, TU Dresden
Yinan Li, Microsoft Research
Qiong Luo, Hong Kong University of Science and Technology
Stefan Manegold, CWI
Ilia Petrov, Reutlingen University
Eva Sitaridi, Amazon
Tianzheng Wang, Simon Fraser University
Xiaodong Zhang, Ohio State University
