Efficient Query Execution over Large Databases through Semantic Caching of Bitmap Indices
Sarah McClain and Prof. David Chiu (Advisor)
Jared Polonitza and Prof. David Chiu (Advisor)
Abstract: This paper establishes a transactional energy market in which any data center can participate. Like a stock-market exchange, we propose a framework in which data centers can trade energy usage (in the form of jobs) for monetary value. The proposed framework allows each data center to monitor multiple parameters, including the current energy prices, budgets, and job execution states. These parameters inform the construction of models to help participating data centers optimize various cost, profit, and job-performance objectives to manage the risks of market participation. In our feasibility study, market participants mutually benefit by increasing general revenue through either a reduction of energy costs and/or through the successful completion of more jobs. Using real energy-pricing data, in our simulated experiment of 100 participating data centers, we observe a significant cost reduction leading to an average increase of 17.8% profit margins.
Sarah Walling-Bell and Prof. David Chiu (Advisor)
Abstract: Most university neuroscience and psychology students are not required to learn programming in their curriculum, and so miss out on a fundamental aspect of modern brain and behavioral research: learning to organize and manipulate their stimuli algorithmically. This is the case in the University of Puget Sound (UPS) Electroencephalography (EEG) lab, where students researching how electrical activity in the brain responds to stimuli (e.g., images) are unable to program their own stimuli. In this research I developed a software tool that enables novice programmers to build EEG experiment stimuli scripts without first learning a programming language. Future development of this tool will aim to include an educational component by teaching users basic programming concepts.
Andrew Brandt and Prof. Courtney Thatcher (Advisor)
Abstract: The goal of this project was to find a set of requirements for planar graphs that would simplify the inclusion-exclusion principle calculations for counting problems, and to explore the relationships between sets and complexes in various examples to expand the technique to a larger set of examples. After researching planar graphs and working through many examples, we discovered a set of requirements for a planar graph associated to a family of sets that allows for the simplification of the inclusion-exclusion principle. We then proceeded to research simplicial complexes and intersection complexes associated to families of sets to expand and improve upon this application. After defining some new complexes and collapses, we were able to relate the weighted Euler characteristic to the order of the union of the family of sets, and hence simplify the inclusion-exclusion principle in a different way. While working with the new complexes, we discovered that we could use the generalized complexes from the second theorem to improve upon the conditions of the first.
Rachel Hirsch and Prof. David Chiu (Advisor)
Abstract: This project builds on previous research on the handling of Big Data, data management, and data compression algorithms, with the main focus being compression of live bitmap data. The project this research is building on relied primarily on test data rather than real-time data streaming. The primary part of this research investigates the optimal speed of streaming in bitmap data before compressing it using the already-existing Word-Aligned-Hybrid (WAH) bitmap compression algorithm.
Drew Kristensen and Prof. Adam Smith (Advisor)
Abstract: The ultimate goal of this study is to gain a better understanding of the affective state of mice, by analyzing mouse vocalizations. To achieve this goal, this study aimed to distinguish vocalizations from recordings using an artificial neural network. In order to analyze these calls, they must first be identified and characterized. While human technicians are very accurate at these classifications, manually annotating the ultrasonic vocalizations (USVs) is a time-intensive process and might be better if it were automated. This project relies on neural networks, a machine learning technique that employs weighted inputs with the ability to learn as a method of discerning calls from noise.
Imanuel Chen and Prof. Mike Spivey (Advisor)
Abstract: The binomial coefficients can be generalized to any class of function, which we call the generalized binomial coefficients. One interesting thing about the binomial coefficients is that they are always integral. But when are the generalized binomial coefficients integral? Multiplicative functions are functions that satisfy the properties f(ab)=f(a)f(b) when a and b are relatively prime, and f(1)=1. Michael Spivey and Tom Edgar proved a Corollary which determined when all multiplicative functions have integral generalized binomial coefficients. The goal was then to discover patterns and generalizations within this Corollary, and to determine when the generalized binomial coefficients of specific multiplicative functions are integral.
Thomas Gagne and Prof. Rob Beezer (Advisor)
Abstract: First developed in 2001, the Advanced Encryption Standard (AES) cipher is now one of the most com- monly used encryption algorithms worldwide. However, the algebraically simple description of the AES leads some cryptographers to question whether an algebraic weakness in the cipher exists, which would be fatal to the security of the AES. This summer, I studied the algebraic properties of the AES with the goal of designing a computational tool for researchers and algebraic cryptanalysts of the AES which would allow more rigorous study of the algebraic qualities of the AES. I accomplished this by implementing an algebraic representation of the cipher in the open source mathematical software system Sage. Over the course of this project, my examination of the algebraic properties of the cipher allowed me to create a generalized system of algebraic equations which behaved analogously to the entire cipher. I then used this system of equations as an underlying mathematical framework for an implementation of the AES cipher in Sage, which allowed this implementation to perform the cipher’s functions through strictly algebraic means. This allows the implementation to be used for close study of the algebraic properties of the individual steps of the cipher as well as the cipher as a whole in a powerful computing environment, making this tool a valuable addition to the Sage cryptography library.
Alexia Ingerson and Prof. David Chiu (Advisor)
Abstract: As technology continues to be fundamental to our society, the abundance of data has increased exponentially. Because computers’ hard drives are slow, the more data is stored, the longer it takes to access useful information. For this reason, it is imperative to use efficient data structures to provide fast data access. One such structure that is widely used is known as the bitmap index, which contains M rows and N columns of bits. Bitmaps, however, increase in volume at the same (or even higher) rate as the data set itself, so it is common to compress the bitmap, which can take an enormous amount of time. To combat this problem, we split the index into N columns, and dispatch a thread to compress each column separately. Because there is no data dependence between the columns, we were surprised that the exhibited speedup fell short of expectations over a multi-core processor. Our research investigates the data-access bottlenecks and explores ways to reorganize our data set to increase performance. We hypothesize that the contention for shared last-level cache and disk I/O among the threads led to slowdowns in execution. To this end, we reformatted the bitmap files into striped files, that is, instead of storing one column per file, each striped files contain multiple columns interleaved, thereby decreasing the latency of accessing files from different regions of disk. In addition to reducing the I/O time, striping the files also decreases the number of cache conflicts since all of the files being accessed at any one time are in similar locations in memory and, as a result, this causes the number of cache hits to increase. We ran our algorithms on several scientific data sets with varying sizes (7.1 MB - 10 GB) varying in cardinality (e.g., number of columns) and found that striping consistently increases performance with increased threading. We found that the addition of striping can improve speedup by up to 50% over the original unstriped method, achieving a more linear increase in speed with the addition of threads.
Chili Johnson and Prof. David Chiu (Advisor)
Abstract: Renewable energy sources such as wind and solar are unpredictable for power utilities, which must produce exactly as much power as is needed at any given time. To help manage the demand, some utilities have begun deploying real-time energy prices to their customers. Data centers, which often run Hadoop jobs on thousands of machines, have become some of the utilities' largest consumers. In fact, recent studies have shown that, when processing at full capacity, data centers can require as much power as a mid-sized U.S. city. By implementing a method in which data centers can offload their work to locations on different power grids, they can take advantage of the lower-priced energy and thereby minimize operational costs. To this end, we have designed and implemented a new mechanism directly within the Hadoop 2 codebase that allows users to pause, migrate, and resume a job at arbitrary points of execution. We have evaluated this scheme using popular applications and show that energy can be delayed and shifted to a different location with reasonable overheads. Our experiments justify the migration use-case, showing that it saves both energy and time over either restarting the job remotely or allowing it to complete locally.
Matthew Moreno and Prof. Adam Smith (Advisor)
Abstract: Quantification of the type and frequency of mouse ultrasonic vocalizations (USVs) can serve as an effective assay of mouse social and affective state. Identifying and characterizing USVs from raw recordings by hand is a slow and resource-intensive task. However, software packages developed for automated extraction and characterization of USVs that are currently available are difficult to use and are confounded by the presence of non-vocalization noise. To address these issues, an algorithm employing hidden Markov models (HMMs) has been developed by Dr. Smith of the University of Puget Sound in conjunction with collaborators from OHSU and UC Santa Cruz. The algorithm provides enhanced call-identification accuracy by recognizing and excluding time intervals with significant non-USV noise. Unfortunately, mouse vocalizations that occur in these time intervals, which could be readily identified by manual processing, are excluded. Thus, an algorithm to identify USV in the presence of non-USV noise might complement the HMM-based approach developed by Dr. Smith. To this end, several filtering algorithms inspired by the Sobel Edge detection method were developed and tested. Convolution of spectrographic data with a best fit Sobel matrix generated from manual annotations achieved some discrimination between true-call signal and noise signal. However, a flexible generic double-edge-detecting matrix proved to be the mot successful filter. It is hoped that the Sobel-inspired methods might be employed in conjunction with other USV- detection schemes to provide biological researches a more complete and reliable insight into the vocalizations of their experimental subjects.
Brandon Roberts and Prof. David Chiu (Advisor)
Abstract: To bill patients, hospitals and medical clinics must send insurance companies and medicaid/medicare a "billing code," which uniquely identifies the patient's diagnosis and treatment. The code's current format is known as ICD9, and is standardized by the World Health Organization (WHO). ICD9 had been used for a number of years, but insurance companies and medicaid/medicare are migrating to ICD10, a far more granular version of ICD9. If clinics do not switch over to ICD10 by October 2015, they simply won't get paid by insurances! Clearly, this has clinics worried about their level of preparedness. Medical billing is becoming more of a challenge with the added complexity of the new medical code base, ICD-10. The time and cost to find codes in medical manuals will increase drastically when ICD-10 becomes required. The objective of this research was to find a faster and more direct way to create patient bills using iOS.