I always get giddy when I can apply real statistics and math to problems in my life. Recently, I had an opportunity to apply the ‘Taxicab Problem’ to something that came up at work. Given that I work for a ridesharing platform and I was quite literally counting “taxis” (or at least cars meant to drive others around), this was doubly exquisite.

For the uninitiated, the Taxicab / German Tank problem is as follows:

Viewing a city from the train, you see a taxi numbered x. Assuming taxicabs are consecutively numbered, how many taxicabs are in the city?
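One standard frequentist answer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not necessarily the approach the full post takes: cabs are numbered 1 to N and sampled without replacement, and the function name `estimate_fleet_size` is hypothetical. If you see k cabs and the largest number is m, the minimum-variance unbiased estimate of the fleet size is m + m/k - 1.

```python
def estimate_fleet_size(observed):
    """Estimate N from serial numbers sampled without replacement from 1..N.

    Uses the classic estimator m + m/k - 1, where m is the largest number
    seen and k is the number of observations.
    """
    k = len(observed)
    if k == 0:
        raise ValueError("need at least one observed cab number")
    m = max(observed)
    return m + m / k - 1


# A single sighting of cab x gives 2x - 1.
single = estimate_fleet_size([60])
# More sightings shrink the gap between the maximum and the estimate.
several = estimate_fleet_size([19, 40, 42, 60])
```

With one sighting the estimate is roughly double the number you saw; each extra sighting pulls the estimate closer to the observed maximum.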

Read on →

bayesAB 0.7.0

Quick announcement that my package for Bayesian AB Testing, bayesAB, has been updated to 0.7.0 on CRAN. Some improvements on the backend as well as a few tweaks for a more fluid UX/API. Some links:

Now, on to the good stuff.

Why should we care about priors?

Most questions I’ve gotten since I released bayesAB have been along the lines of:

  • Why/how is Bayesian AB testing better than Frequentist hypothesis AB testing?
  • Why do I need priors?
  • Do I really really really need priors?
  • How do I choose priors?
Read on →

This is a port of the bayesAB vignette. Check the full vignette here. Check it out on GitHub here. A newer version has since been released.

Most A/B test approaches are centered around frequentist hypothesis tests that boil the experiment down to a single hard-to-interpret number (the p-value used to decide whether to reject the null). Oftentimes, the statistician or data scientist laying the groundwork for the A/B test has to run a power analysis to determine sample size and then interface with a Product Manager or Marketing Exec to relay the results. This quickly gets messy in terms of interpretability. More importantly, it is simply not as robust as Bayesian A/B testing, which uses informative priors and lets you inspect an entire distribution over a parameter, not just a point estimate.
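To make "an entire distribution over a parameter" concrete, here is a minimal Python sketch of a Beta-Binomial A/B comparison. It is not the bayesAB API: the function names, the uniform Beta(1, 1) prior, and the conversion counts are all made up for illustration. An informative prior would simply use different `prior_alpha` and `prior_beta` values.

```python
import random


def beta_posterior(prior_alpha, prior_beta, successes, trials):
    """Conjugate update: Beta prior plus Binomial data gives a Beta posterior."""
    return prior_alpha + successes, prior_beta + (trials - successes)


def prob_b_beats_a(post_a, post_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under the two posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_b) > rng.betavariate(*post_a)
        for _ in range(draws)
    )
    return wins / draws


# Hypothetical data: 40/1000 conversions for A, 60/1000 for B.
post_a = beta_posterior(1, 1, successes=40, trials=1000)
post_b = beta_posterior(1, 1, successes=60, trials=1000)
p_b_better = prob_b_beats_a(post_a, post_b)
```

The result is a direct statement ("the probability that B converts better than A") rather than a p-value, and the posterior draws can be reused for any other summary you care about.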

Read on →

My New Year’s resolution is to make more than one blog post in 2016. I’m halfway to my minimum goal as of January 2nd so things are looking good.


Twitter released a new R package earlier this year named AnomalyDetection (link to GitHub). The GitHub page goes into a bit more detail, but at a high level it uses Seasonal Hybrid ESD (S-H-ESD), which builds on the Generalized ESD (Extreme Studentized Deviate) test for outliers. S-H-ESD is particularly noteworthy since it can detect both local and global outliers. That is, it can detect outliers within local short-term seasonal trends, as well as global outliers that fall far above or below all other values.
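For intuition, the core loop of the plain Generalized ESD test can be sketched in Python. This is deliberately partial and is not the AnomalyDetection implementation: it only computes the sequence of test statistics, leaves out the comparison against t-distribution critical values, and skips the seasonal decomposition that S-H-ESD adds on top. The function name `esd_statistics` is hypothetical.

```python
import statistics


def esd_statistics(values, max_outliers):
    """Return [(R_i, removed_value), ...] for up to max_outliers rounds.

    Each round finds the point farthest from the current mean, records its
    distance in units of the sample standard deviation, and removes it.
    """
    remaining = list(values)
    results = []
    for _ in range(max_outliers):
        if len(remaining) < 3:
            break  # too few points left for a meaningful statistic
        mean = statistics.mean(remaining)
        sd = statistics.stdev(remaining)
        if sd == 0:
            break  # all remaining values are identical
        candidate = max(remaining, key=lambda v: abs(v - mean))
        results.append((abs(candidate - mean) / sd, candidate))
        remaining.remove(candidate)
    return results


stats = esd_statistics([10, 11, 9, 10, 12, 10, 11, 50], max_outliers=2)
```

In the full test, each statistic is compared with a critical value to decide how many of the removed points are actually outliers.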

Read on →

Let’s take $n$ distinct points on the real line:

Yay. We can now define the Lagrange Polynomials:
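As a rough Python sketch of the standard definition: the j-th Lagrange basis polynomial is the product over all m ≠ j of (x - x_m) / (x_j - x_m), so it equals 1 at x_j and 0 at every other node. The 0-based indexing and the function names here are assumptions for illustration and may not match the post's notation.

```python
from fractions import Fraction


def lagrange_basis(nodes, j, x):
    """Evaluate the j-th Lagrange basis polynomial (0-based) at x."""
    xj = nodes[j]
    value = Fraction(1)
    for m, xm in enumerate(nodes):
        if m != j:
            value *= Fraction(x - xm) / Fraction(xj - xm)
    return value


def interpolate(nodes, ys, x):
    """Evaluate the interpolating polynomial through (nodes[j], ys[j]) at x."""
    return sum(y * lagrange_basis(nodes, j, x) for j, y in enumerate(ys))


nodes = [0, 1, 2]
# Three samples of x**2 determine the parabola, so x = 3 gives 9.
value_at_3 = interpolate(nodes, [0, 1, 4], 3)
```

Using `Fraction` keeps the arithmetic exact, which makes the "1 at its own node, 0 at the others" property easy to check.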

Read on →

Although virtually obsolete, Roman Numerals are subtly embedded into our culture. From the Super Bowl and Olympics to royal titles, Roman Numerals refuse to be fully extinguished from our everyday lives. And that’s not without reason. All numbers are beautiful and Roman Numerals are no exception, even if they are written a little differently from their Arabic counterparts.

In this post, we’ll examine some fascinating properties of Roman Numerals - namely the lengths of Roman Numerals in succession.

First, we define a simple Arabic -> Roman Numeral converter. Start by creating two vectors, one for the 13 Roman symbols and another for their Arabic counterparts. Next, a simple for/while combination iterates through the vectors and chooses the appropriate Roman symbols while iteratively decreasing the input variable.
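The description above maps onto a short greedy loop. Here is one way it might look in Python, with lists standing in for the two vectors; the names `SYMBOLS`, `VALUES`, and `to_roman` are illustrative, not taken from the post.

```python
# The 13 Roman symbols (including subtractive pairs) and their Arabic values,
# ordered from largest to smallest.
SYMBOLS = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"]
VALUES = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1]


def to_roman(number):
    """Convert a positive integer to a Roman numeral string."""
    if number <= 0:
        raise ValueError("Roman numerals are only defined for positive integers")
    pieces = []
    # for: walk the symbols from largest to smallest
    for symbol, value in zip(SYMBOLS, VALUES):
        # while: keep taking this symbol as long as it still fits
        while number >= value:
            pieces.append(symbol)
            number -= value
    return "".join(pieces)


# Lengths of successive numerals, the quantity the post goes on to study.
lengths = [len(to_roman(n)) for n in range(1, 11)]
```

Because the subtractive pairs (CM, CD, XC, XL, IX, IV) are treated as symbols in their own right, the greedy loop needs no special cases.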

Read on →

Update 2017-09-26: Please don’t e-mail me asking to share the final model with you.

For one of my computational finance classes, I attempted to implement a Machine Learning algorithm in order to predict stock prices, namely S&P 500 Adjusted Close prices. In order to do this, I turned to Artificial Neural Networks (ANN) for a plethora of reasons. ANNs have been known to work well for computationally intensive problems where a user may not have a clear hypothesis of how the inputs should interact. As such, ANNs excel at picking up hidden patterns within the data so well that they often overfit!

Read on →

I’m gonna share a short code snippet that I thought was interesting. This post is inspired by one of my engineering computation classes at Rice. The program initializes a pair of coordinates ‘z’ and iteratively updates z by matrix multiplication based on some random number generation criteria. After each successive coordinate update, the new ‘z’ is plotted.
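The excerpt doesn't say which matrices or random criteria the program uses, so the following Python sketch is a guess at the general shape rather than a reproduction. It assumes the well-known Barnsley fern maps, each of which adds a translation vector on top of the matrix multiplication, and it leaves out the plotting step. The names `FERN_MAPS` and `iterate_points` are hypothetical.

```python
import random

# Each row is (a, b, c, d, e, f, probability) for the update
# z -> [[a, b], [c, d]] @ z + [e, f]. Barnsley fern coefficients (assumed).
FERN_MAPS = [
    (0.00, 0.00, 0.00, 0.16, 0.0, 0.00, 0.01),
    (0.85, 0.04, -0.04, 0.85, 0.0, 1.60, 0.85),
    (0.20, -0.26, 0.23, 0.22, 0.0, 1.60, 0.07),
    (-0.15, 0.28, 0.26, 0.24, 0.0, 0.44, 0.07),
]


def iterate_points(steps, seed=0):
    """Start at z = (0, 0) and apply a randomly chosen map at every step."""
    rng = random.Random(seed)
    weights = [row[6] for row in FERN_MAPS]
    x, y = 0.0, 0.0
    points = []
    for _ in range(steps):
        a, b, c, d, e, f, _prob = rng.choices(FERN_MAPS, weights=weights)[0]
        x, y = a * x + b * y + e, c * x + d * y + f
        points.append((x, y))
    return points


points = iterate_points(5000)
```

Scatter-plotting `points` with any plotting library should reveal the fern shape after a few thousand iterations.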

Read on →

I was inspired by a few animated gifs that I saw recently so I decided to make one of my own. For this project, I sought out a way to effectively visualize how McDonald’s expanded throughout the world. To do this, I created a heatmap of the world and, using animations, mapped out how McDonald’s became more popular over time.

The data I am using is from this Wikipedia page. It took a small amount of manual cleaning before I could import it into R just because some of the countries’ spellings from this article did not match with what is used in the R ‘maps’ package.

Read on →