
HardBD & Active'23

Joint International Workshop on Big Data Management on Emerging Hardware
and Data Management on Virtualized Active Systems

To be Sponsored by and Held in Conjunction with ICDE 2023

April 3, 2023 in Anaheim, California, USA

  Description

HardBD & Active'23 will be a one-day workshop co-located with ICDE'23. It aims to bring together researchers, practitioners, system administrators, and others interested in exploiting new hardware technologies for data-intensive workloads and big data systems, so that they can share their perspectives and identify future directions and challenges in this area. The workshop provides a forum for academia and industry to exchange ideas through research and position papers.

  Topics

 Topics of interest include but are not limited to:

  • Systems Architecture on New Hardware
  • Data Management in Software-Hardware-System Co-design
  • Main Memory Data Management (e.g., Multi-core, Cache, SIMD)
  • Data Management on New Memory Technologies (e.g., SSDs, NVMs)
  • Active Technologies (e.g., GPUs, FPGAs, and ASICs) in Co-design Architectures
  • Distributed Data Management Utilizing New Network Technologies (e.g., RDMA, CXL)
  • Novel Applications of New Hardware Technologies in Query Processing, Transaction Processing, or Big Data Systems (e.g., Hadoop, Spark, NoSQL, NewSQL, Document Stores, and Graph Platforms)
  • Virtualizing Active Technologies on Cloud (e.g., Scalability and Security)
  • Benchmarking, Performance Models, and/or Tuning of Data Management Workloads on New Hardware Technologies

  Submission Guidelines

We welcome submissions of original, unpublished research papers that are not being considered for publication in any other forum. Papers should be prepared in the IEEE format and submitted as a single PDF file. The paper length should not exceed 6 pages. The submission site is https://cmt3.research.microsoft.com/HardBDActive2023.

  Important Dates


Paper submission: January 18, 2023 (Wednesday) 11:59:00 PM PT (extended from January 11, 2023)
Notification of acceptance: February 1, 2023 (Wednesday)
Camera-ready copies: February 15, 2023 (Wednesday)
Workshop: April 3, 2023 (Monday)

  Program


10:30-12:30 PDT SMDB and HardBD&Active Joint Keynote Session I (SMDB Room)

12:30-14:00 PDT Lunch

14:00-14:10 PDT Opening and Introductions (HardBD&Active Room)

14:10-15:30 PDT Research Session (Session Chair: Suyash Gupta) (HardBD&Active Room)

15:30-16:00 PDT Coffee Break

16:00-18:00 PDT SMDB and HardBD&Active Joint Keynote Session II (Session Chair: Tianzheng Wang) (HardBD&Active Room)

18:00-18:05 PDT Closing (HardBD&Active Room)

  Keynote Talks


From Self-Managed Database Systems to Runtime-Intelligent Analytics


Anastasia Ailamaki
EPFL and Google, Inc.

Abstract: Self-managed database systems have been successful in addressing various challenges such as database tuning, optimization, and maintenance. However, the growth and heterogeneity of data, hardware, and applications reveal limitations which can only be addressed using real-time adaptive query engines. Just-in-time query execution using code generation presents a promising solution which enables efficient processing of complex queries while minimizing overhead and maintenance costs and allowing databases to be more dynamic, adaptive, and efficient. This talk presents an overview of approaches to just-in-time query execution including dynamic query planning, adaptive caching, self-optimized data pipelines, and machine learning-based techniques. We discuss the benefits and challenges of these approaches, as well as their practical applications.

Bio: Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, as well as the co-founder and Chair of the Board of Directors of RAW Labs SA, a Swiss company developing systems to analyze heterogeneous big data from multiple sources efficiently. She earned a Ph.D. in Computer Science from the University of Wisconsin-Madison in 2000. She has received the 2019 ACM SIGMOD Edgar F. Codd Innovations Award and the 2020 VLDB Women in Database Research Award. She is also the recipient of an ERC Consolidator Award (2013), the Finmeccanica endowed chair from the Computer Science Department at Carnegie Mellon (2007), a European Young Investigator Award from the European Science Foundation (2007), an Alfred P. Sloan Research Fellowship (2005), an NSF CAREER award (2002), and twelve best-paper awards at international scientific conferences. She received the 2018 Nemitsas Prize in Computer Science from the President of Cyprus and the 2021 ARGO Innovation Award from the President of the Hellenic Republic. She is an ACM Fellow, an IEEE Fellow, a member of the Academia Europaea, and an elected member of the Swiss, the Belgian, the Greek, and the Cypriot National Research Councils.


Self-Managing Database Capabilities in SQL Azure


Hanuma Kodavalla
Technical Fellow, Microsoft

Abstract: SQL Azure is a Database Platform as a Service offering from Microsoft that manages more than 11 million databases worldwide in all geographies. Managing at this scale requires automating many aspects of running a database system. This talk describes how SQL Azure automatically allocates required resources efficiently based on the workload, detects and recovers from query plan regressions, protects against security attacks, recovers from hardware defects, fine-tunes the locking mechanism based on the concurrent workload, reduces recovery time and failover impact, protects against disasters such as power failure, fire, and flood, manages storage and indexes, and deploys new versions of infrastructure and database software with minimum disruption. The talk also covers the monitoring and debugging facilities, some of the lessons learnt, and the challenges that remain in making a large and growing fleet of databases self-managing.

Bio: Hanuma Kodavalla is a Technical Fellow in the Azure Databases group at Microsoft, where he has been for twenty years. He previously worked at Data General, Digital Equipment Corporation, Oracle, Sybase, and Asera. For more than three decades, Hanuma has worked on many aspects of relational database systems and has been instrumental in architecting multiple commercial database systems for high performance and high availability. Hanuma received a BTech in Electronics and Communications in 1981 from the National Institute of Technology, Warangal, India, an MTech in Computer Science in 1983 from the Indian Institute of Technology, Chennai, India, and an MS in Computer Science in 1988 from the University of Massachusetts, Amherst, USA. He has a few publications in database conferences and many patents related to novel implementation techniques for online transaction processing and data warehousing in the areas of concurrency control, recovery, high availability, query processing, and security.


The Data Systems Grammar


Stratos Idreos
Harvard

Abstract: Data systems are everywhere. A data system is a collection of data structures and algorithms working together to achieve complex data processing tasks. For example, with data systems that utilize the correct data structure design for the problem at hand, we can reduce the monthly bill of large-scale data applications on the cloud by hundreds of thousands of dollars. We can accelerate data science tasks by dramatically speeding up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy. However, knowing the right data system design for any given scenario is a notoriously hard problem; there is a massive space of possible designs, while no single design is perfect across all data, queries, and hardware contexts. In addition, building a new system may take several years for any given (fixed) design.
      We will discuss our quest for the first principles of data system design. We will show that it is possible to reason about this massive design space. This allows us to create a self-designing data system that can take drastically different shapes to optimize for the workload, hardware, and available cloud budget using a grammar for data systems. These shapes include data structures, algorithms, and overall system designs which are discovered automatically and do not (always) exist in the literature or industry, yet they can be more than 10x faster.

Bio: Stratos Idreos is an associate professor of Computer Science at Harvard University, where he leads the Data Systems Laboratory. His research focuses on making it easy and even automatic to design workload- and hardware-conscious data structures and data systems, with applications to relational, NoSQL, and data science problems. For his Ph.D. thesis on adaptive indexing, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award from the European Research Consortium for Informatics and Mathematics. In 2015 he was awarded the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems, and in 2022 he received the ACM SIGMOD Test of Time award for the NoDB concept. Stratos is also a recipient of the National Science Foundation CAREER award and the Department of Energy Early Career award. Stratos was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022. He is the founding editor of the ACM/IMS Journal of Data Science and the chair of the ACM SoCC Steering Committee. Finally, Stratos received the 2020 ACM SIGMOD Contributions award for his work on reproducible research.


A Composable Era for Data Management


Pedro Pedreira
Meta

Abstract: The requirement for specialization in data management systems has evolved faster than our software development practices. After decades of organic growth, this situation has created a siloed landscape composed of hundreds of products developed and maintained as monoliths, with limited reuse between systems. This fragmentation has often forced us to reinvent the wheel, impacted our end users through SQL API inconsistencies, and ultimately slowed down innovation. In this talk, I will describe how the increasing popularity of open source projects aimed at standardizing different layers of the stack is changing how data management systems are developed, and outline a novel reference modular architecture. I will also discuss experiences with Velox and with componentizing one of the largest data warehouses in the world through the Shared Foundations effort at Meta, hoping to foster collaboration, motivate further research, and promote a more composable future for data management.

Bio: Pedro Pedreira is a Software Engineer at Meta. In his 10-year tenure, he has led a series of Data Infrastructure projects aimed at unifying and consolidating fragmented data management stacks. Currently, Pedro leads the Velox program, an effort to unify execution engines into an open-source library, spanning more than a dozen engines within Meta and beyond. In the past, he worked on log analytics engines (such as Scuba) and created Cubrick, an in-memory analytical DBMS. Pedro holds a PhD and an MS in Computer Science from the Federal University of Paraná in Brazil.


  Organizers


  PC Members


  • Manos Athanassoulis, Boston University
  • Bingsheng He, National University of Singapore
  • Peiquan Jin, University of Science and Technology of China
  • Wolfgang Lehner, TU Dresden
  • Sang Won Lee, Sungkyunkwan University
  • Yinan Li, Microsoft Research
  • Ilia Petrov, Reutlingen University
  • Thamir Qadah, Umm Al-Qura University
  • Xiaodong Zhang, Ohio State University
