I am a postdoctoral researcher at the Data Systems Lab (DASLab) and the Institute for Applied Computational Science (IACS), both at Harvard University. I develop data structures that make the exploration of astronomical data tractable. I am also interested in adaptive database indexing and flash memory.
The slides from my recent workshop on NoSQL vs. NewSQL systems are here, and the tutorial on MongoDB is here.
Tractable Exploration of Astronomical Data
Our era’s scientific Wonders of the World, such as the Large Synoptic Survey Telescope (LSST), will produce vast quantities of data. The way this data is organized will determine how efficiently different kinds of queries can be processed. The problem is that a scientist does not necessarily know in advance which queries will run, because the next question about the data often depends on the results of a previous query. Being able to efficiently process a broad range of queries from an evolving scientific workload is therefore crucial. In this project, we develop data structures that enable a broader range of queries to be answered efficiently so that scientific exploration can remain tractable.
Adaptive Database Indexes
The rising proportion of writes in application workloads has renewed the popularity of write-optimized indexes such as LSM-trees.
The cost of such indexes is measured along three dimensions: the work needed to insert a key-value pair (write-amplification), the work needed for a lookup (read-amplification), and the amount of main memory and secondary storage needed to store the index (space-amplification). In this project, we identify the optimal trade-off curves among these cost metrics and show how to navigate them to find the best point for a given application.
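To make the tension between these metrics concrete, here is a minimal sketch that estimates write- and read-amplification for a toy LSM-tree under two common merge policies. It is not the model developed in this project: the function `lsm_costs`, its parameters, and the formulas are simplifying assumptions (worst-case counts, no Bloom filters or caching, space-amplification omitted).

```python
def lsm_costs(n_entries, buffer_entries, size_ratio, policy):
    """Rough worst-case cost estimates for a toy LSM-tree (illustrative only)."""
    if policy not in ("leveling", "tiering"):
        raise ValueError("policy must be 'leveling' or 'tiering'")
    if size_ratio < 2 or buffer_entries < 1:
        raise ValueError("size_ratio must be >= 2 and buffer_entries >= 1")

    # Each level is assumed to hold size_ratio times more entries than
    # the one above it, starting from the in-memory buffer.
    levels = 0
    capacity = buffer_entries
    while capacity < n_entries:
        capacity *= size_ratio
        levels += 1

    if policy == "leveling":
        # One sorted run per level: an entry may be rewritten up to
        # about size_ratio times per level, but a lookup probes only
        # one run per level.
        write_amp = size_ratio * levels
        read_amp = levels
    else:
        # Tiering: an entry is written once per level, but a lookup
        # may probe up to size_ratio - 1 runs per level.
        write_amp = levels
        read_amp = (size_ratio - 1) * levels

    return {"levels": levels, "write_amp": write_amp, "read_amp": read_amp}


if __name__ == "__main__":
    for policy in ("leveling", "tiering"):
        print(policy, lsm_costs(1_000_000, 1_000, 10, policy))
```

Under these assumptions, leveling pays more per insert to keep lookups cheap, while tiering does the opposite. Choosing the policy and the size ratio is one simple way of moving along such a trade-off curve.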
Predictable Flash Memory Performance
Although flash memory has become a popular secondary storage medium, its performance is difficult to predict. The reason is that internally, flash memory is subject to a complex set of constraints. Storage units must be erased before they are updated, erases operate at a coarser granularity than writes, and each storage unit can endure only a limited number of erases over its lifetime. A flash translation layer (FTL) hides these constraints and exposes a simple block interface to the application. The problem is that the FTL's behind-the-scenes work impacts performance. In this project, we redesign the FTL so as to eliminate bottlenecks and make the performance of flash devices predictable.
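The constraints above can be illustrated with a toy page-mapped FTL. This is a minimal sketch, not the design developed in this project: the class `ToyFTL`, its greedy victim selection, and the tiny device geometry are all assumptions chosen for readability. It shows how out-of-place updates and block-granularity erases cause extra internal writes that the application never issued.

```python
FREE, VALID, INVALID = "free", "valid", "invalid"


class ToyFTL:
    """Toy page-mapped FTL: out-of-place updates plus greedy garbage collection."""

    def __init__(self, num_blocks=4, pages_per_block=4):
        self.num_blocks = num_blocks
        self.ppb = pages_per_block
        n_pages = num_blocks * pages_per_block
        self.state = [FREE] * n_pages      # state of each physical page
        self.content = [None] * n_pages    # (logical page, data) per physical page
        self.mapping = {}                  # logical page -> physical page
        self.erase_counts = [0] * num_blocks
        self.host_writes = 0               # writes issued by the application
        self.flash_writes = 0              # page programs performed internally

    def _pages(self, block):
        return range(block * self.ppb, (block + 1) * self.ppb)

    def _program(self, lpn, data):
        p = self.state.index(FREE)         # a page can only be written while free
        self.state[p] = VALID
        self.content[p] = (lpn, data)
        self.mapping[lpn] = p
        self.flash_writes += 1

    def _garbage_collect(self):
        def invalid(block):
            return sum(self.state[p] == INVALID for p in self._pages(block))

        victim = max(range(self.num_blocks), key=invalid)
        if invalid(victim) == 0:
            raise RuntimeError("device is full of valid data")
        # Live pages must be saved, because erases work on whole blocks.
        survivors = [self.content[p] for p in self._pages(victim)
                     if self.state[p] == VALID]
        for p in self._pages(victim):
            self.state[p] = FREE
            self.content[p] = None
        self.erase_counts[victim] += 1     # each erase wears the block
        for lpn, data in survivors:
            self._program(lpn, data)       # relocation: extra internal writes

    def write(self, lpn, data):
        self.host_writes += 1
        old = self.mapping.get(lpn)
        if old is not None:
            self.state[old] = INVALID      # no in-place update: invalidate instead
        if FREE not in self.state:
            self._garbage_collect()
        self._program(lpn, data)

    def read(self, lpn):
        return self.content[self.mapping[lpn]][1]

    def write_amplification(self):
        return self.flash_writes / self.host_writes
```

For example, on a device with two blocks of two pages each, writing four logical pages and then updating one of them forces the toy FTL to erase a block and relocate a live page, so five application writes turn into six internal page programs. The latency of that fifth write is much higher than the others, which is the kind of unpredictability described above.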
My CV can be downloaded here.