Description
HardBD: Data properties and hardware characteristics are two key aspects of efficient data management. A clear trend in the first aspect, data properties, is the increasing demand to manage and process Big Data in both enterprise and consumer applications, characterized by the fast evolution of Big Data systems such as key-value stores, document stores, graph stores, Spark, MapReduce/Hadoop, graph computation systems, and tree-structured databases, as well as novel extensions to relational database systems. At the same time, the second aspect, hardware characteristics, is undergoing rapid change, imposing new challenges for the efficient utilization of hardware resources. Recent trends include massive multi-core processing systems, high-performance co-processors, very large main memory systems, persistent main memory, fast networking components, big computing clusters, and large data centers that consume massive amounts of energy. Utilizing new hardware technologies for efficient Big Data management is therefore of urgent importance.

Active: Existing approaches to data-intensive problems often require data to be moved near the computing resources for processing. These data movement costs can be prohibitive for large data sets. One promising solution is to bring virtualized computing resources closer to the data, whether it is at rest or in motion. The premise of active systems is a new holistic view of the system in which every data medium and every communication channel becomes compute-enabled. The Active workshop aims to study different aspects of the active systems stack, understand the impact of active technologies (including but not limited to hardware accelerators such as SSDs, GPUs, FPGAs, and ASICs) on different application workloads over the lifecycle of data, and revisit the interplay among algorithmic modeling, compilers and programming languages, virtualized runtime systems and environments, and hardware implementations for the effective exploitation of active technologies.

HardBD & Active'21: Both HardBD and Active are interested in exploiting hardware technologies for data-intensive systems. The aim of this one-day joint workshop is to bring together researchers, practitioners, system administrators, and others interested in this area to share their perspectives on exploiting new hardware technologies for data-intensive workloads and big data systems, and to discuss and identify future directions and challenges in this area. The workshop aims to provide a forum for academia and industry to exchange ideas through research and position papers.
 
Topics

Topics of interest include but are not limited to:

- Systems Architecture on New Hardware
- Data Management Issues in Software-Hardware-System Co-design
- Main Memory Data Management (e.g., CPU Cache Behavior, SIMD, Lock-Free Designs, Transactional Memory)
- Data Management on New Memory Technologies (e.g., SSDs, NVMs)
- Active Technologies (e.g., GPUs, FPGAs, and ASICs) in Co-design Architectures
- Distributed Data Management Utilizing New Network Technologies (e.g., RDMA)
- Novel Applications of New Hardware Technologies in Query Processing, Transaction Processing, or Big Data Systems (e.g., Hadoop, Spark, NoSQL, NewSQL, Document Stores, Graph Platforms, etc.)
- Novel Applications of Low-Power Modern Processors in Data-Intensive Workloads
- Virtualizing Active Technologies on the Cloud (e.g., Scalability and Security)
- Benchmarking, Performance Models, and/or Tuning of Data Management Workloads on New Hardware Technologies
                
 
Submission Guidelines

We welcome submissions of original, unpublished research papers that are not being considered for publication in any other forum. Papers should be prepared in the IEEE format and submitted as a single PDF file. The paper length should not exceed 6 pages.

The submission site is https://cmt3.research.microsoft.com/HardBDActive2021.

Authors of a selection of accepted papers will be invited to submit an extended version to the Distributed and Parallel Databases (DAPD) journal.

Camera-Ready Instructions: ICDE'21 is using IEEE Conference Publishing Services (CPS) to collect camera-ready papers and copyright forms. Please submit your camera-ready paper and copyright form at the following link: https://ieeecps.org/#!/auth/login?ak=1&pid=7olafLcrkRpJt57rn5a8lW. Papers should be formatted in the IEEE format, and the paper length should not exceed 6 pages. Please also upload your camera-ready paper to the CMT site so that we can provide paper links in the workshop program.
 
Important Dates

Paper submission: January 25, 2021 (Monday), 11:59:00 PM PT (extended from January 18, 2021)
Notification of acceptance: February 15, 2021 (Monday) (extended from February 8, 2021)
Camera-ready copies: March 1, 2021 (Monday)
Workshop: April 19, 2021 (Monday)
 
Program

7:00-8:00am PDT: Research Session I

- GTraclus: A Local Trajectory Clustering Algorithm for GPUs. Hamza Mustafa (Microsoft), Clark Barrus (University of Oklahoma), Eleazar Leal (University of Minnesota Duluth), Le Gruenwald (The University of Oklahoma)
- Analysis of GPU-Libraries for Rapid Prototyping Database Operations. Harish Kumar Harihara Subramanian, Bala Gurumurthy, Gabriel Campero Durand, David Broneske, Gunter Saake (University of Magdeburg)
- Performance Analysis of Big Data ETL Process over CPU-GPU Heterogeneous Architectures (short talk). Suyeon Lee, Sungyong Park (Sogang University)
- An Investigation of Atomic Synchronization for Sort-Based Group-By Aggregation on GPUs (short talk). Bala Gurumurthy, David Broneske (University of Magdeburg), Martin Schäler (Salzburg University), Thilo Pionteck, Gunter Saake (University of Magdeburg)

8:00-8:10am PDT: Break
8:10-9:40am PDT: SMDB and HardBD&Active Joint Keynote Session I
9:40-9:50am PDT: Break
9:50-11:20am PDT: SMDB and HardBD&Active Joint Keynote Session II
11:20-11:30am PDT: Break
11:30am-12:30pm PDT: Research Session II
 
Joint Keynote Talks with SMDB 2021
 
AI's Enormous Potential for Database Simplification

Sam Lightstone
CTO, AI Strategy, IBM Data and AI

Abstract:
Research into self-managing databases exploded in the early 2000s with sizeable corporate efforts from IBM, Microsoft, and Oracle. In 2005 the SMDB Workgroup was founded by Sam Lightstone and Guy Lohman to bring together like-minded innovators from industry and academia. Now, as we enter the era of intelligent computing, AI offers itself as a catalyst for quantum gains in database simplification. In this session, Sam Lightstone will contrast the state of SMDB technology in 2005 with today's emerging potential for automation, semantic simplification, and the handling of new workloads.
 
 Bio: 
Sam Lightstone is IBM Chief Technology Officer for AI, an IBM Fellow, and a Master Inventor in the IBM Data and AI group. He is also chair of the Data and AI
Technical Team, the working group of IBM's technical executives in the
division. He has been the founder and co-founder of several large-scale
initiatives including AI databases, next generation data warehousing, data
virtualization, autonomic computing for data systems, serverless cloud SQL
query, and cloud native database services. He co-founded the IEEE Data
Engineering Workgroup on Self-Managing Database Systems. Sam has more than 65
patents issued and pending and has authored 4 books and over 30 papers. Sam's
books have been translated into Chinese, Japanese and Spanish. In his spare
time he is an avid guitar player and fencer. His Twitter handle is
"samlightstone".
 
 
 
 
OtterTune: An Automatic Database Configuration Tuning Service

Andy Pavlo
Associate Professor, Carnegie Mellon University & Co-founder, OtterTune

Abstract:
Database management systems (DBMSs) expose dozens of configurable knobs that
control their runtime behavior. Setting these knobs correctly for an
application's workload can improve the performance and efficiency of the DBMS.
But such tuning requires considerable effort from experienced administrators,
which does not scale to large DBMS fleets. This problem has led to research
on using machine learning (ML) to devise strategies to optimize DBMS knobs for
any application automatically. The OtterTune database tuning service from
Carnegie Mellon uses ML to generate and install optimized DBMS configurations.
OtterTune observes the DBMS's workload through its metrics and then trains
recommendation models that select better knob values. It then reuses these
models to tune other DBMSs more quickly. In this talk, I will present an
overview of OtterTune and discuss the challenges one must overcome to deploy an
ML-based service for DBMSs. I will also highlight the insights we learned from
real-world installations of OtterTune.
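
To make the loop concrete, below is a minimal sketch in Python of an observe-train-recommend tuning cycle of the kind the abstract describes. It is illustrative only, not OtterTune's actual implementation: the knob names, the throughput metric, and the nearest-neighbor surrogate with random candidate search are hypothetical stand-ins for the learned recommendation models (e.g., Gaussian process regressors) that a real tuning service would train.

import random

# Hypothetical knobs and value ranges; a real DBMS exposes dozens of these.
KNOB_RANGES = {
    "buffer_pool_mb": (128, 8192),
    "max_parallel_workers": (1, 64),
}

def random_config():
    """Sample one complete knob configuration uniformly at random."""
    return {k: random.randint(lo, hi) for k, (lo, hi) in KNOB_RANGES.items()}

def recommend(history, n_candidates=200):
    """Return the random candidate whose predicted performance is best.
    The 'model' is a trivial nearest-neighbor surrogate over the observed
    history; a real service would train a learned regressor instead."""
    def predicted_perf(cfg):
        _, nearest_perf = min(
            history,
            key=lambda h: sum((h[0][k] - cfg[k]) ** 2 for k in cfg),
        )
        return nearest_perf
    return max((random_config() for _ in range(n_candidates)), key=predicted_perf)

def tune(run_workload, iterations=10):
    """Observe-train-recommend loop. run_workload(config) is assumed to
    apply the knobs, replay the workload, and return a throughput number
    (higher is better)."""
    history = []
    config = random_config()
    for _ in range(iterations):
        perf = run_workload(config)      # observe: collect runtime metrics
        history.append((config, perf))   # train: grow the model's data
        config = recommend(history)      # recommend: pick the next config
    best_config, _ = max(history, key=lambda h: h[1])
    return best_config

The shape of the loop is the point: each iteration runs the workload under the current configuration, records the observed metric, and asks the model for a better configuration to try next; the accumulated history is also what would let a service of this kind tune other DBMSs more quickly.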
 
 Bio: 
Andy Pavlo is an Associate Professor of Databaseology in the Computer Science
Department at Carnegie Mellon University. His research interest is in database
management systems, specifically main memory systems, self-driving / autonomous
architectures, transaction processing systems, and large-scale data analytics.
At CMU, he is a member of the Database Group and the Parallel Data Laboratory.
He is also the co-founder and CEO of OtterTune.
 
 
 
 
Automatic Data Management and Storage Tiering with Oracle Database In-Memory

Shasank Chavan
Vice President of the Data and In-Memory Technologies Group, Oracle

Abstract:
Autonomous / self-driving databases utilize machine learning techniques to eliminate the manual labor associated with database tuning, security, backups, updates, and other routine management tasks traditionally performed by DBAs. This talk will focus specifically on how we implement a self-driving database with Oracle's Database In-Memory product to automatically tune for query optimization, memory management, storage management, and data tiering. We will first present Oracle's Database In-Memory architecture and the various features built for optimizing analytics and mixed-workload performance, and then describe in some detail the smarts we have built in to make it self-optimizing in our self-driving database.
 
 Bio: 
Shasank Chavan is the Vice President of the Data and In-Memory Technologies
group at Oracle. He leads an amazing team of brilliant engineers in the
Database organization who develop customer-facing, performance-critical
features for an In-Memory Columnar Store which, as Larry Ellison proclaimed,
"processes data at ungodly speeds". His team implements novel SIMD kernels and
hardware acceleration technology for blazing fast columnar data processing,
optimized data formats and compression technology for efficient in-memory
storage, algorithms and techniques for fast in-memory join and aggregation
processing, and optimized in-memory data access and storage solutions in
general. His team is currently hyper-focused on leveraging emerging hardware
technologies to build Oracle's next-generation, highly distributed, data
storage engine that powers the cloud. Shasank earned his BS/MS in Computer
Science at the University of California, San Diego. He has accumulated 20+
patents over a span of 22 years working on systems software technology.
 
 
 
 
Architectural Evolution of Amazon Redshift and Its Practical Usage of Machine Learning

Ippokratis Pandis
Amazon AWS

Abstract:
Amazon Redshift is Amazon's petabyte-scale managed cloud data warehouse. Every day, customers use Amazon Redshift to process multiple exabytes of data. In the first part of this talk, we are going to look a bit under the hood of Amazon Redshift and discuss how the team makes sure that Amazon Redshift maintains its price/performance leadership among cloud data warehouses. Further, we will talk about its architectural evolution, discussing features such as Managed Storage, Elastic Resize, Concurrency Scaling, DataSharing, Spectrum, and AQUA. In the second part, we are going to discuss how Amazon Redshift leverages machine learning to improve its global operation, to reduce the need for administrative operations by its customers, and to improve its performance.
 
 Bio: 
Ippokratis Pandis is a senior principal engineer at Amazon Web Services,
working in Amazon Redshift. Redshift is Amazon's fully managed, petabyte-scale
data warehouse service. Among others, Ippokratis is the architect of the
Spectrum, Concurrency Scaling, and DataSharing features of Redshift. Previously, Ippokratis held positions as a software engineer at Cloudera, where he worked on the Impala SQL-on-Hadoop query engine, and as a member of the research staff at the IBM Almaden Research Center, where he worked on the DB2 BLU product. Ippokratis received his PhD from the Electrical and Computer Engineering department at Carnegie Mellon University. He is the recipient of Best Demonstration Awards at ICDE 2006 and SIGMOD 2011, and of the Test-of-Time Award at EDBT 2019. He has served as PC chair of DaMoN 2014, DaMoN 2015, CloudDM 2016, and HPTS 2019.
 
 
Organizers
 
PC Members

- Manos Athanassoulis, Boston University
- Bingsheng He, National University of Singapore
- Peiquan Jin, University of Science and Technology of China
- Wolfgang Lehner, TU Dresden
- Yinan Li, Microsoft Research
- Qiong Luo, Hong Kong University of Science and Technology
- Ilia Petrov, Reutlingen University
- Eva Sitaridi, Amazon
- Tianzheng Wang, Simon Fraser University
- Xiaodong Zhang, Ohio State University