- Results include:
Wang, Winston. August 15, 2022; May 2019
Proper characterization of a tumor is essential for informing treatment and assessing prognosis. The concentration of certain biomarkers, the presence or absence of specific mutations, and even the pattern of distribution of biomarkers throughout the tumor can be extremely important in determining the tumor's aggressiveness. For example, tumors that are more homogeneous tend to be less aggressive and carry a better prognosis than those that are more heterogeneous. We therefore seek to image multiple biomarkers in vivo using targeted dyes. However, imaging the tumor multiple times in series without an invasive biopsy is far too time-consuming: for these targeted dyes, imaging typically occurs 5-7 days after the imaging agent is administered, so imaging more than one or two biomarkers would be prohibitively lengthy and could delay treatment of the tumor. Instead, we can utilize unique surface-enhanced resonance Raman scattering (SERRS) particles to target distinct biomarkers. To determine the concentration of each particle at each point, the individual spectra must be deconvolved from the multiplexed spectrum. The conventional method for separating the spectra, nonnegative least squares (NNLS), has been successful for small numbers of spectra. However, NNLS must compute a pseudoinverse, and as the number of spectra increases, the condition number of that matrix grows quickly; past roughly five spectra, using NNLS to deconvolve the spectra becomes untenable. We therefore aim to use machine learning as an alternative to NNLS, with the potential to deconvolve more spectra accurately and quickly at run time.
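The NNLS unmixing step described above can be illustrated with a minimal sketch. All quantities here are synthetic and for illustration only: the reference spectra are random nonnegative vectors standing in for the pure SERRS particle spectra, and the concentrations are made up; the point is the structure of the problem (measured spectrum = reference matrix times nonnegative concentrations) and the conditioning concern the abstract raises.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Hypothetical reference spectra: each column stands in for one SERRS
# particle's pure Raman spectrum, sampled at 500 wavenumber bins.
n_bins, n_particles = 500, 4
S = np.abs(rng.normal(size=(n_bins, n_particles)))

# True (nonnegative) concentrations and the resulting multiplexed spectrum,
# with a small additive noise term.
c_true = np.array([0.5, 1.2, 0.0, 0.3])
measured = S @ c_true + 0.01 * rng.normal(size=n_bins)

# NNLS recovers the concentrations under a nonnegativity constraint.
c_est, residual = nnls(S, measured)
print(np.round(c_est, 2))

# The conditioning concern: as more (and more overlapping) reference
# spectra are added, the condition number of S grows and unmixing degrades.
print(f"condition number of S: {np.linalg.cond(S):.1f}")
```

With only four well-separated synthetic spectra the recovery is accurate; the abstract's claim is that as columns are added and grow more collinear, this same computation becomes ill-conditioned.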
Pierson, Emma. [Stanford, California] : [Stanford University], 2020
Recent work in algorithmic fairness has highlighted the ways in which machine learning and data science can exacerbate already profound social inequalities. While invaluable, this work should not cause us to lose sight of the more optimistic counterpoint: that machine learning and data science also have the potential to reduce social inequality if properly applied. This dissertation explores that potential. In the first half of the dissertation, we provide two examples illustrating how data science and machine learning can improve healthcare for underserved populations. We first develop a deep learning algorithm which identifies pain-relevant features in knee osteoarthritis x-rays that conventional severity measures overlook, but which help explain higher pain levels in black, lower-income, and lower-education patients. Second, we use data from a women's health app to decompose women's mood, behavior, and vital signs into four simultaneous cycles --- daily, weekly, seasonal, and menstrual --- and reveal that the menstrual cycle, though often invisible in past analyses, is the largest of the four cycles. In the second half of the dissertation, we provide two examples illustrating how data science and machine learning can detect bias in human decision-making, focusing on policing as an application domain. We first describe a new family of probability distributions and use them to accelerate a Bayesian test for discrimination by two orders of magnitude, allowing it to scale to much larger datasets. We then apply this test to a national dataset of traffic stops which we collect via public records requests and publicly release. The methods we develop are more broadly applicable to assessing bias in many other human decisions.
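The four-cycle decomposition mentioned in this abstract can be sketched with a simple harmonic-regression approach. Everything here is synthetic and hypothetical: the periods, amplitudes, and the least-squares method are illustrative assumptions, not the dissertation's actual model or data. The sketch only shows how overlapping periodic components at known periods can be separated and their amplitudes compared.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated hourly signal over two years containing four overlapping cycles
# (periods in hours). The "menstrual" component is given the largest
# amplitude here purely to mirror the abstract's finding; all numbers are
# made up for illustration.
periods = {"daily": 24.0, "weekly": 24 * 7.0,
           "menstrual": 24 * 29.5, "seasonal": 24 * 365.25}
true_amp = {"daily": 0.5, "weekly": 0.3, "menstrual": 1.0, "seasonal": 0.7}

t = np.arange(24 * 730, dtype=float)
y = sum(a * np.sin(2 * np.pi * t / periods[k]) for k, a in true_amp.items())
y += 0.2 * rng.normal(size=t.size)  # observation noise

# Decompose by least-squares regression on a sine/cosine pair per period.
X = np.column_stack([f(2 * np.pi * t / p)
                     for p in periods.values() for f in (np.sin, np.cos)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Amplitude of each cycle = sqrt(sin_coef^2 + cos_coef^2).
amps = {k: float(np.hypot(coef[2 * i], coef[2 * i + 1]))
        for i, k in enumerate(periods)}
print({k: round(v, 2) for k, v in amps.items()})
```

Because the regressors at distinct periods are nearly orthogonal over a long window, each cycle's amplitude is recovered even though the components overlap completely in time.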
Kim, Michael Pum-Shin. [Stanford, California] : [Stanford University], 2020
Algorithms make predictions about people constantly. The spread of such prediction systems---from precision medicine to targeted advertising to predictive policing---has raised concerns that algorithms may perpetuate unfair discrimination, especially against individuals from minority groups. While it is easy to speculate on the risks of unfair prediction, devising an effective definition of algorithmic fairness is challenging. Most existing definitions tend toward one of two extremes---individual fairness notions provide theoretically appealing protections but present practical challenges at scale, whereas group fairness notions are tractable but offer marginal protections. In this thesis, we propose and study a new notion---multi-calibration---that strengthens the guarantees of group fairness while avoiding the obstacles associated with individual fairness. Multi-calibration requires that predictions be well-calibrated, not simply on the population as a whole but simultaneously over a rich collection of subpopulations C. We specify this collection---which parameterizes the strength of the multi-calibration guarantee---in terms of a class of computationally-bounded functions. Multi-calibration protects every subpopulation that can be identified within the chosen computational bound. Despite such a demanding requirement, we show a generic reduction from learning a multi-calibrated predictor to (agnostic) learning over the chosen class C. This reduction establishes the feasibility of multi-calibration: taking C to be a learnable class, we can achieve multi-calibration efficiently (both statistically and computationally). To better understand the requirement of multi-calibration, we turn our attention from fair prediction to fair ranking.
We establish an equivalence between a semantic notion of domination-compatibility in rankings and the technical notion of multi-calibration in predictors---while conceived from different vantage points, these concepts encode the same notion of evidence-based fairness. This alternative characterization illustrates how multi-calibration affords qualitatively different protections than standard group notions.
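The core diagnostic behind multi-calibration---checking calibration not just overall but on each subpopulation in a collection C---can be sketched on toy data. This is not the thesis's algorithm: the data, the predictor, the subgroup-defining functions, and the binned calibration-error metric below are all illustrative assumptions. The sketch only demonstrates how a predictor can look roughly calibrated on the whole population while being badly miscalibrated on a subgroup that C can identify.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: two features, true outcome probabilities, binary outcomes.
n = 20000
x = rng.normal(size=(n, 2))
p_true = 1 / (1 + np.exp(-(x[:, 0] + 0.8 * x[:, 1])))
y = (rng.random(n) < p_true).astype(float)

# A predictor that ignores the second feature entirely.
p_hat = 1 / (1 + np.exp(-x[:, 0]))

# A small, hypothetical class C of subgroup-defining functions.
subgroups = {
    "all": np.ones(n, dtype=bool),
    "x1 > 0": x[:, 1] > 0,
    "x1 <= 0": x[:, 1] <= 0,
}

def calibration_error(pred, outcome, mask, n_bins=10):
    """Mean |E[outcome - pred]| over prediction bins, within `mask`."""
    pred, outcome = pred[mask], outcome[mask]
    bins = np.clip((pred * n_bins).astype(int), 0, n_bins - 1)
    errs = [abs(np.mean(outcome[bins == b] - pred[bins == b]))
            for b in range(n_bins) if np.any(bins == b)]
    return float(np.mean(errs))

for name, mask in subgroups.items():
    print(f"{name}: calibration error {calibration_error(p_hat, y, mask):.3f}")
```

The predictor's errors on the two x1-defined subgroups cancel in aggregate, so the population-level check passes far more easily than the subgroup checks---exactly the gap between plain calibration and the multi-calibration requirement.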