In this section you can find some of my open source work as well as some links to older in-depth analyses from my college days.
Frequentist approaches to AB and hypothesis testing are notoriously hard to understand, even for some statisticians. Bayesian methods tend to be more demanding computationally and mathematically, but in terms of interpretability they easily take the cake. Long gone are the days of ‘rejecting the null hypothesis with a p-value of .043’. Rejoice instead in phrases such as ‘The probability that A has a 3% lift over B is 96.4%’.
bayesAB is an R package which provides a suite of functions for a user to conduct and interpret the results of a count or proportion AB test in a Bayesian way. The package is meant to be used at all steps of the process - from choosing a prior, to interpreting final results, and then calculating lifts based on certain thresholds.
Non-technical users may simply use these methods as drop-in replacements for t.test and prop.test in R. Data-minded people may opt to read the help documentation and play with some of the helper functions.
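To make the ‘probability that A has a 3% lift over B’ phrasing concrete, here is a minimal sketch of the underlying idea for a proportion test. It is written in plain Python with only the standard library and is not bayesAB's API; the function name `prob_lift_exceeds`, its parameters, and the uniform Beta(1, 1) default prior are all assumptions made for illustration. It also assumes that ‘lift’ means relative lift, (A - B) / B.

```python
import random


def prob_lift_exceeds(successes_a, trials_a, successes_b, trials_b,
                      lift=0.03, prior_alpha=1.0, prior_beta=1.0,
                      n_draws=100_000, seed=0):
    """Monte Carlo estimate of P((p_A - p_B) / p_B > lift).

    Hypothetical helper for illustration only. Each arm gets an
    independent Beta posterior from a shared Beta prior.
    """
    rng = random.Random(seed)

    # A Beta prior combined with binomial data yields a Beta posterior:
    # add successes to alpha and failures to beta.
    a_alpha = prior_alpha + successes_a
    a_beta = prior_beta + (trials_a - successes_a)
    b_alpha = prior_alpha + successes_b
    b_beta = prior_beta + (trials_b - successes_b)

    hits = 0
    for _ in range(n_draws):
        p_a = rng.betavariate(a_alpha, a_beta)
        p_b = rng.betavariate(b_alpha, b_beta)
        # (p_a - p_b) / p_b > lift, rearranged to avoid the division.
        if p_a > p_b * (1.0 + lift):
            hits += 1

    # Fraction of joint posterior draws in which A beats B by the lift.
    return hits / n_draws
```

Calling `prob_lift_exceeds(120, 1000, 100, 1000)` returns a number between 0 and 1 that can be read directly as ‘the probability that A has at least a 3% lift over B’, given the data and the prior. A more informative prior can be passed via `prior_alpha` and `prior_beta`.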
- Scrape ESPN for all player photos
- Extract and normalize their faces
- Build a Convolutional Neural Network to match their faces with the sport they play
- Achieve 93% accuracy on holdout set
Old but Good (?)
Flyvis was my undergraduate statistics honors thesis project. Interactive visualization has always been a huge interest of mine. For this project, we showed how an interactive framework speeds up data analysis and helps unearth hidden insights by making the analysis itself iterative and interactive.
We explore the possibility of improving data analysis through the use of interactive visualization. Exploration of data and models is an iterative process. We hypothesize that dynamic, interactive visualizations greatly reduce the cost of new iterations and thus facilitate agile investigation and rapid prototyping. Our web-application framework, flyvis.com, offers evidence for such a hypothesis for a dataset consisting of airline on-time flight performance between 2006-2008. Utilizing our framework we are able to study the feasibility of modeling subsets of flight delays from temporal data, which fails on the full dataset.
Check out the website at flyvis.com (now dead) where you will find the web application along with the accompanying paper and poster.