Presentation abstracts and slides

Stanford Research Computing

Mark Piercy, Research Computing Technical Liaison, Stanford Research Computing
10:00 - 10:30am


The Stanford Research Computing Center (SRCC) is a joint effort of the Dean of Research and IT Services to build and support a comprehensive program to advance computational research at Stanford. That includes offering and supporting traditional high-performance computing (HPC) systems, as well as systems for high-throughput and data-intensive computing. The SRCC also helps researchers transition their analyses and models from the desktop to more capable and plentiful resources, providing the opportunity to explore their data and answer research questions (on premises or in the cloud) at a scale typically not possible on desktops or departmental servers.

Keeping Stanford’s Research Mission Secure in an Era of Increasing Cyber Threats

Michael A. Timineri, Director of Information Security Consulting, Stanford Information Security Office
10:30 - 11:00am


While it may seem that Stanford stands apart in our teaching, research, and clinical care missions, the university is no stranger to data breaches and cyber attacks. In this presentation you will get an overview of the University's cybersecurity program, hear about cybersecurity incidents and threats, and learn what you need to do in your role as a researcher to keep your data safe.


DataCommons

Ramanathan V Guha, Google Fellow
11:00 - 11:30am

Publicly available data from open sources are a vital resource for students and researchers in a variety of disciplines. Unfortunately, processing these datasets to make them useful (scraping, cleaning, normalizing, joining) is tedious, error-prone, and has to be repeated by every group. DataCommons attempts to alleviate some of this pain by synthesizing a single Knowledge Graph from many different data sources. It links references to the same entities (such as cities, counties, and organizations) across different datasets to nodes on the graph, so that users can access data about a particular entity aggregated from different sources. Like the Web, the DataCommons graph is open: any user can contribute data or build applications powered by the graph. We are jump-starting the graph with data from publicly available sources such as the CDC, Census, BLS, and FBI, and are looking to engage with the academic community to take it further.
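The entity-linking idea can be sketched in a few lines: two sources refer to the same county under different keys, and resolving both keys to one canonical node lets a query see the combined facts. All keys, ids, and values below are invented for illustration and are not actual DataCommons identifiers or APIs.

```python
# Toy sketch of entity linking: merge facts about one real-world entity
# that two datasets reference under different keys. All keys and values
# here are made up for illustration.
census = {"county/42": {"name": "Santa Clara County", "population": 1900000}}
labor = {"Santa Clara County, CA": {"unemployment_rate": 2.6}}

# Map each source-specific key to a canonical node id (hand-curated here;
# real entity resolution is the hard part of building the graph).
links = {"county/42": "county/42", "Santa Clara County, CA": "county/42"}

graph = {}
for source in (census, labor):
    for key, facts in source.items():
        graph.setdefault(links[key], {}).update(facts)

print(graph["county/42"])  # facts from both sources hang off a single node
```

Once references are resolved, any application can query one node and see data aggregated across sources, which is the point of the shared graph.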

The intersection between repository and research computing

Hannah Frost, Service Manager, Stanford Digital Repository
11:30 - 12:00pm


Research computing results in new discoveries. New discoveries are worth sharing with present and future researchers. Find out how and where the Stanford Digital Repository and its services — search engine discovery, DOIs, archiving — fit into this process. This presentation is relevant to students, faculty, librarians, and anyone else responsible for the data and systems involved in research computing.

Rendering beautiful cover art using real data

Robin Betz, Graduate Student, Bioinformatics
Lightning Talk
1:45 - 1:55pm


Computational resources are useful for more than just data analysis and number crunching; they can also be used to create beautiful images for journal covers and publicity materials. In this talk, I will describe how to use Python, the rendering engine POV-Ray, parallelization on the Sherlock cluster, and a large molecular dynamics dataset to create an artistic rendering for a journal cover.
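As a rough sketch of this kind of pipeline, one can generate a POV-Ray scene file from coordinate data in Python and then hand the file to the renderer, one frame per cluster job. The coordinates, colors, and render command below are placeholder assumptions, not the speaker's actual workflow.

```python
# Sketch: turn (x, y, z, color) data into a minimal POV-Ray scene.
# Coordinates and colors are invented placeholders; a real pipeline
# would read them from a molecular dynamics trajectory.

def atoms_to_pov(atoms, radius=0.4):
    """Emit a POV-Ray scene with one sphere per atom."""
    lines = [
        "camera { location <0, 0, -10> look_at <0, 0, 0> }",
        "light_source { <10, 10, -10> color rgb <1, 1, 1> }",
    ]
    for x, y, z, (r, g, b) in atoms:
        lines.append(
            f"sphere {{ <{x}, {y}, {z}>, {radius} "
            f"pigment {{ color rgb <{r}, {g}, {b}> }} }}"
        )
    return "\n".join(lines)

scene = atoms_to_pov([(0.0, 0.0, 0.0, (1, 0, 0)),
                      (1.5, 0.0, 0.0, (0, 0, 1))])
# Rendering would then be a shell step, e.g. one frame per job in an array:
#   povray +Iframe.pov +W1920 +H1080 +A
```

Because each frame renders independently, a cluster job array parallelizes this trivially.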

Using parallel computing and optimization techniques to uncover the phenotypes involved in adaptation to novel environments

Grant Kinsler, Graduate Student, Biology
Lightning Talk
1:55 - 2:05pm


Recent technological advances in DNA barcoding allow us to track millions of independently evolving lineages. These new technologies have opened up the field of experimental evolution to quantitative study of how organisms adapt to new environments. In particular, they allow us to test and fit predictions of phenotypic models of evolution, including Fisher's geometric model, by measuring the fitness of adaptive mutants across a range of subtly changing environments. To do this, we rely on computational methods to (1) process next-generation sequencing data of these barcodes to quantify fitness, (2) simulate data to verify our inference methods, and (3) infer the parameters of these models using non-linear optimization techniques. We utilize Sherlock's ability to run many high-memory jobs for these purposes, and find that adapting populations utilize a small number of fitness-relevant phenotypes for early adaptation.
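The heart of step (1), estimating fitness from barcode counts, can be sketched as a log-linear fit of each lineage's frequency trajectory. The counts below are invented, and the real inference accounts for sampling noise and mean-fitness feedback that this toy version ignores.

```python
import numpy as np

def fitness(counts, totals, times):
    """Least-squares slope of log(frequency) versus time, i.e. the
    lineage's growth advantage per unit time in this toy model."""
    freqs = np.asarray(counts, float) / np.asarray(totals, float)
    slope, _ = np.polyfit(times, np.log(freqs), 1)
    return slope

times = [0, 1, 2, 3]
totals = [10000] * 4                      # reads sequenced per timepoint
adaptive = fitness([100, 150, 225, 338], totals, times)
neutral = fitness([100, 100, 100, 100], totals, times)
print(adaptive - neutral)                 # ~log(1.5): a 1.5x-per-cycle advantage
```

Repeating this fit for the same lineage across many subtly different environments yields the fitness profiles that phenotypic models such as Fisher's geometric model are then fit against.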

Genome analysis on the Sherlock cluster

Mark Kowarsky, Graduate Student, Physics
Lightning Talk
2:05 - 2:15pm


Over the years, Mark Kowarsky has developed a library of code for running analyses on high-performance computing infrastructure such as the Sherlock cluster. In this talk, he will describe the tools he has used and what he has learned about optimizing pipelines for quick, reproducible, and well-documented research.

Towards precision medicine: using Tuba-seq to predict patient outcome

Chuan Li, Postdoctoral Scholar, Biology
Lightning Talk
2:15 - 2:25pm

Lung cancer accounts for over a quarter of cancer deaths in the United States. No single lung cancer treatment is effective for every patient, because of the high heterogeneity across patients. Tuba-seq combines CRISPR/Cas9 genome editing with tumor barcoding to generate many genetic alterations in a pooled setting while simultaneously barcoding each tumor to uniquely identify the genetic perturbation and quantify tumor size. Using this powerful platform, we explored the drug responses of mice carrying various tumor suppressor mutations to multiple FDA-approved cancer treatments, generating the first pharmacogenomic map of lung cancer treatments.

Automated cell sorting of large-scale neural calcium imaging data

Biafra Ahanonu, Postdoctoral Research Fellow, Biology
Lightning Talk
2:25 - 2:35pm

Recent advances in large-scale calcium imaging allow neuroscientists to concurrently visualize the dynamics of thousands of individual neurons in live animals, but analysis of these datasets remains a bottleneck. We present novel computational approaches for cell sorting that improve the speed and accuracy of cell identification, activity trace reconstruction, and signal source classification. Our log-likelihood-based cell identification and activity trace reconstruction algorithm improves performance on both simulated and real datasets. We then use machine-learning-based classification approaches to distinguish actual neurons from other non-cellular signal sources in the calcium videos. These methods enabled fast and accurate cell identification when applied to calcium imaging datasets acquired in multiple brain regions. Overall, our computational pipeline provides a versatile, reliable, parallelizable, and scalable means of extracting cellular dynamics in a wide variety of calcium imaging studies. Thus, we expect its usage will improve the speed and accuracy of experiments relying on large-scale neural imaging. We have made part of our analysis pipeline available as Calcium Imaging Analysis.
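As a hedged illustration of the classification step (not the authors' actual model), even a minimal nearest-centroid classifier can separate candidate sources given a couple of summary features; the feature names and values below are invented.

```python
import math

# Toy sketch of neuron-vs-artifact classification. Each candidate source
# is reduced to two invented features: (signal-to-noise ratio, shape
# roundness). The real pipeline uses far richer features and models.

def nearest_centroid(train, sample):
    """Classify `sample` by the closest class centroid in feature space."""
    centroids = {}
    for label, points in train.items():
        n = len(points)
        centroids[label] = tuple(sum(p[i] for p in points) / n for i in (0, 1))
    return min(centroids, key=lambda lab: math.dist(sample, centroids[lab]))

train = {
    "neuron":   [(8.0, 0.9), (7.5, 0.8), (9.0, 0.85)],
    "artifact": [(2.0, 0.3), (3.0, 0.2), (2.5, 0.4)],
}
print(nearest_centroid(train, (7.0, 0.7)))   # classified as a neuron
```

Because each candidate source is scored independently, this stage parallelizes naturally across a video's thousands of candidates.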

Monte Carlo simulation of ballistic electron transport

Aaron Sharpe, Graduate Student, Applied Physics
Lightning Talk
2:35 - 2:45pm


In most conventional materials, electrons travel only very short distances before scattering. However, in nanostructures of ultraclean two-dimensional materials, electrons can travel long distances without scattering. This ballistic transport of electrons can result in drastically different voltage measurements compared to conventional diffusive transport. Here we present a Monte Carlo simulation platform for ballistic transport capable of predicting the voltage at a given contact of a patterned nanostructure. The predictions from this platform are consistent with transport measurements of PdCoO2 nanostructures.
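To make the idea concrete, here is a hedged toy version of such a simulation: electrons injected at the left edge of a rectangular channel fly in straight lines, reflect specularly off the top and bottom walls, and are absorbed either at the right contact or at a probe window on the top edge. The geometry and contact layout are invented assumptions, not the actual PdCoO2 devices.

```python
import math
import random

# Toy Monte Carlo for ballistic transport: straight-line flight between
# specular wall bounces, absorption at contacts. The 4x1 channel with a
# probe window on the top edge is an invented geometry.

def probe_fraction(width=4.0, height=1.0, probe=(1.5, 2.5),
                   n=10000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = 0.0, rng.uniform(0.0, height)       # inject at the left edge
        theta = rng.uniform(-1.4, 1.4)             # clip grazing angles (toy)
        vx, vy = math.cos(theta), math.sin(theta)  # vx > 0: forward-going
        while True:
            t_right = (width - x) / vx
            t_wall = math.inf if abs(vy) < 1e-12 else (
                (height - y) / vy if vy > 0 else -y / vy)
            if t_right <= t_wall:
                break                              # absorbed at right contact
            x += vx * t_wall
            top = vy > 0
            y = height if top else 0.0
            if top and probe[0] <= x <= probe[1]:
                hits += 1                          # absorbed at the top probe
                break
            vy = -vy                               # specular reflection
    return hits / n

frac = probe_fraction()  # the probe's collection fraction maps to a voltage
```

In a fuller version, each device geometry or contact configuration could run as an independent cluster job, since the trajectories require no communication.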