Description
HardBD & Active'23 will be a one-day workshop co-located with ICDE'23. It aims
to bring together researchers, practitioners, system administrators, and others
interested in this area to share their perspectives on exploiting new hardware
technologies for data-intensive workloads and big data systems, and to discuss
and identify future directions and challenges in the area. The workshop
provides a forum for academia and industry to exchange ideas through research
and position papers.
Topics
Topics of interest include, but are not limited to:
- Systems Architecture on New Hardware
- Data Management in Software-Hardware-System Co-design
- Main Memory Data Management (e.g., Multi-core, Cache, SIMD)
- Data Management on New Memory Technologies (e.g., SSDs, NVMs)
- Active Technologies (e.g., GPUs, FPGAs, and ASICs) in Co-design Architectures
- Distributed Data Management Utilizing New Network Technologies (e.g., RDMA, CXL)
- Novel Applications of New Hardware Technologies in Query Processing, Transaction Processing, or Big Data Systems (e.g., Hadoop, Spark, NoSQL, NewSQL, Document Stores, Graph Platforms, etc.)
- Virtualizing Active Technologies on Cloud (e.g., Scalability and Security)
- Benchmarking, Performance Models, and/or Tuning of Data Management Workloads on New Hardware Technologies
Submission Guidelines
Important Dates
Paper submission: January 11, 2023 (Wednesday) 11:59:00 PM PT (extended to January 18, 2023 (Wednesday) 11:59:00 PM PT)
Notification of acceptance: February 1, 2023 (Wednesday)
Camera-ready copies: February 15, 2023 (Wednesday)
Workshop: April 3, 2023 (Monday)
Program
10:30-12:30 PDT SMDB and HardBD&Active Joint Keynote Session I (SMDB Room)
12:30-14:00 PDT Lunch
14:00-14:10 PDT Opening and Introductions (HardBD&Active Room)
14:10-15:30 PDT Research Session (Session Chair: Suyash Gupta) (HardBD&Active Room)
- An Empirical Performance Comparison between Matrix Multiplication Join and Hash Join on GPUs (25min).
Wenbo Sun, Asterios Katsifodimos, Rihan Hai (TU Delft)
- Adaptive Query Compilation with Processing-in-Memory (25min).
Alexander Baumstark, Muhammad Attahir Jibril, Kai-Uwe Sattler (TU Ilmenau)
- Sailfish: Exploring Heterogeneous Query Acceleration on Discrete CPU-FPGA Architecture (Short Presentation 12min).
Xing Wei, Yaofeng Tu, Yinjun Han, Zhenghua Chen, Xuecheng Qi, Daojun Hua (ZTE Corporation)
- Vertical Vectorized Hashing for Faster Group-By Aggregation (Short Presentation 12min).
Spoorthi Nijalingappa (University of Magdeburg), Bala Gurumurthy (University of Magdeburg), David Broneske (German Center for Higher Education Research and Science Studies), Gunter Saake (University of Magdeburg)
15:30-16:00 PDT Coffee Break
16:00-18:00 PDT SMDB and HardBD&Active Joint Keynote Session II (Session Chair: Tianzheng Wang) (HardBD&Active Room)
18:00-18:05 PDT Closing (HardBD&Active Room)
Keynote Talks
From Self-Managed Database Systems to Runtime-Intelligent Analytics
Anastasia Ailamaki
EPFL and Google, Inc.
Abstract:
Self-managed database systems have been successful in addressing various
challenges such as database tuning, optimization, and maintenance. However, the
growth and heterogeneity of data, hardware, and applications reveal limitations
which can only be addressed using real-time adaptive query engines.
Just-in-time query execution using code generation presents a promising
solution which enables efficient processing of complex queries while minimizing
overhead and maintenance costs and allowing databases to be more dynamic,
adaptive, and efficient. This talk presents an overview of approaches to
just-in-time query execution including dynamic query planning, adaptive
caching, self-optimized data pipelines, and machine learning-based techniques.
We discuss the benefits and challenges of these approaches, as well as their
practical applications.
Bio:
Anastasia Ailamaki is a Professor of Computer and Communication Sciences at the
École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, as well as the
co-founder and Chair of the Board of Directors of RAW Labs SA, a Swiss company
developing systems to analyze heterogeneous big data from multiple sources
efficiently. She earned a Ph.D. in Computer Science from the University of
Wisconsin-Madison in 2000. She has received the 2019 ACM SIGMOD Edgar F. Codd
Innovations Award and the 2020 VLDB Women in Database Research Award. She is
also the recipient of an ERC Consolidator Award (2013), the Finmeccanica
endowed chair from the Computer Science Department at Carnegie Mellon (2007), a
European Young Investigator Award from the European Science Foundation (2007),
an Alfred P. Sloan Research Fellowship (2005), an NSF CAREER award (2002), and
twelve best-paper awards at international scientific conferences. She has
received the 2018 Nemitsas Prize in Computer Science by the President of Cyprus
and the 2021 ARGO Innovation Award by the President of the Hellenic Republic.
She is an ACM fellow, an IEEE fellow, a member of the Academia Europaea, and an
elected member of the Swiss, the Belgian, the Greek, and the Cypriot National
Research Councils.
Self-Managing Database Capabilities in SQL Azure
Hanuma Kodavalla
Technical Fellow, Microsoft
Abstract:
SQL Azure is a Database Platform as a Service offering from Microsoft that
manages more than 11 million databases worldwide in all geographies. Managing
at this scale requires automating many aspects of running a database system.
This talk describes how SQL Azure automatically allocates required resources
efficiently based on the workload, detects and recovers from query plan
regressions, protects against security attacks, recovers from hardware defects,
fine-tunes the locking mechanism based on the concurrent workload, reduces
recovery time and failover impact, protects against natural disasters such as
power outages, fire, and flood, manages storage and indexes, and deploys new versions of
infrastructure and database software with minimum disruption. The talk also
covers the monitoring and the debugging facilities, some of the lessons learnt
and the challenges that remain in making a large and growing fleet of databases
self-managing.
Bio:
Hanuma Kodavalla is a Technical Fellow in the Azure Databases group at
Microsoft where he has been for twenty years. He previously worked at Data
General, Digital Equipment Corporation, Oracle, Sybase and Asera. For more than
three decades, Hanuma has worked on many aspects of relational database systems and
has been instrumental in architecting multiple commercial database systems for
high performance and high availability. Hanuma received a BTech in Electronics
and Communications in 1981 from the National Institute of Technology, Warangal,
India, an MTech in Computer Science in 1983 from the Indian Institute of Technology,
Chennai, India, and an MS in Computer Science in 1988 from the University of
Massachusetts, Amherst, USA. He has a few publications in database conferences
and many patents related to novel implementation techniques for online
transaction processing and data warehousing in the areas of concurrency
control, recovery, high-availability, query processing and security.
The Data Systems Grammar
Stratos Idreos
Harvard
Abstract:
Data systems are everywhere. A data system is a collection of data structures
and algorithms working together to achieve complex data processing tasks. For
example, with data systems that utilize the correct data structure design for
the problem at hand, we can reduce the monthly bill of large-scale data
applications on the cloud by hundreds of thousands of dollars. We can
accelerate data science tasks by dramatically speeding up the computation of
statistics over large amounts of data. We can train drastically more neural
networks within a given time budget, improving accuracy. However, knowing the
right data system design for any given scenario is a notoriously hard problem;
there is a massive space of possible designs, while no single design is perfect
across all data, queries, and hardware contexts. In addition, building a new
system may take several years for any given (fixed) design.
We will discuss our quest for the first principles of data
system design. We will show that it is possible to reason about this massive
design space. This allows us to create a self-designing data system that can
take drastically different shapes to optimize for the workload, hardware, and
available cloud budget using a grammar for data systems. These shapes include
data structure, algorithms, and overall system designs which are discovered
automatically and do not (always) exist in the literature or industry, yet they
can be more than 10x faster.
Bio:
Stratos Idreos is an associate professor of Computer Science at Harvard
University, where he leads the Data Systems Laboratory. His research focuses on
making it easy and even automatic to design workload- and hardware-conscious
data structures and data systems with applications on relational, NoSQL, and
data science problems. For his Ph.D. thesis on adaptive indexing, Stratos was
awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011
ERCIM Cor Baayen award from the European Research Consortium for Informatics and
Mathematics. In 2015 he was awarded the IEEE TCDE Rising Star Award from the
IEEE Technical Committee on Data Engineering for his work on adaptive data
systems, and in 2022 he received the ACM SIGMOD Test of Time award for the NoDB
concept. Stratos is also a recipient of the National Science Foundation Career
award and the Department of Energy Early Career award. Stratos was PC Chair of
ACM SIGMOD 2021 and IEEE ICDE 2022; he is the founding editor of the ACM/IMS
Journal of Data Science and the chair of the ACM SoCC Steering Committee.
Finally, Stratos received the 2020 ACM SIGMOD Contributions award for his work
on reproducible research.
A Composable Era for Data Management
Pedro Pedreira
Meta
Abstract:
The requirement for specialization in data management systems has evolved
faster than our software development practices. After decades of organic
growth, this situation has created a siloed landscape composed of hundreds of
products developed and maintained as monoliths, with limited reuse between
systems. This fragmentation has often forced us to reinvent the wheel, impacted
our end users through SQL API inconsistencies, and ultimately slowed down
innovation. In this talk, I will describe how the increasing popularity of open
source projects aimed at standardizing different layers of the stack is
changing how data management systems are developed, and outline a novel
reference modular architecture. I will also discuss our experiences with Velox and
with componentizing one of the largest data warehouses in the world through the
Shared Foundations effort at Meta, hoping to foster collaboration, motivate
further research, and promote a more composable future for data management.
Bio:
Pedro Pedreira is a Software Engineer at Meta. In his 10-year tenure, he has
led a series of Data Infrastructure projects aimed at unifying and
consolidating fragmented data management stacks. Currently, Pedro leads the
Velox program, which is an effort at unifying execution engines into an
open-source library, spanning more than a dozen engines within Meta and beyond.
In the past, he worked on log analytics engines (such as Scuba), and created
Cubrick, an in-memory analytical DBMS. Pedro holds a PhD and an MS in Computer
Science from the Federal University of Paraná, in Brazil.
Organizers
PC Members
- Manos Athanassoulis, Boston University
- Bingsheng He, National University of Singapore
- Peiquan Jin, University of Science and Technology of China
- Wolfgang Lehner, TU Dresden
- Sang Won Lee, Sungkyunkwan University
- Yinan Li, Microsoft Research
- Ilia Petrov, Reutlingen University
- Thamir Qadah, Umm Al-Qura University
- Xiaodong Zhang, Ohio State University