Quant Interview Questions (part 1)

Zach McCormick
Quant Interview Questions (part 1)

I finally have a free moment (I should be doing work, but I don’t have any crazy deadlines to meet today), so I thought I would start posting the questions and answers to one of my recent job interviews, as I found them incredibly interesting and thought provoking. In fact, I would say that their interview process rekindled a love of mathematics that I once had, and lost in high school due to the sheer amount of busywork I was made to do in the “accelerated track” at my school.

Background

I interviewed for a developer position at one of the top quantitative/proprietary trading firms out there, and thus they are looking for the cream of the crop in terms of quantitative reasoning and programming ability. I had four interviews over the phone, of which the first three went well, but I struggled a bit with the fourth. I applied to a handful of other similar firms and have not heard back from any of them.

Difficulty of quant programming positions

For reference, they all required practically every math-related statistic about me they could before the interviews: my GPA is about 3.55 out of 4 at a top 20 school, I scored an 800 on the math part of the SAT, an 800 on the Math II SAT Subject Test, a 36 on the math part of the ACT, and a 168 out of 170 on the quantitative part of the GRE. All of this was required to submit an application to most firms. I applied to about 20 of these firms and was only approached by 1, so you have been warned! :-)

Question

The first question was “given an input stream, how would you collect a k-sized evenly-distributed sample of the input?”’

Constraints and clarification

First, to answer these types of questions, you need to fully understand the question and all of the constraints. First, I asked about data types — are these numbers? are they objects? — and the interviewer told me it didn’t matter, but we could pretend they are integers for now and move to the general case later. Next, I asked about total size — my intuition first says that I need to know the size of the sample space in order to sample it. He said “input stream” is meant to be like an iterator — you just keep calling next() until it stops returning results — so you don’t know the total size of the set. I next asked “okay, so if we start at 0, we always know how many we have at each step, right?” to which he answered, “yes.” At this point, I could restate the problem in a more explicit form: “Given an input stream of objects, how would you collect a k-sized evenly-distributed sample of the input, so that at any point in time, you have a k-sized evenly distributed sample of the input so far.**”

How to start

Next, I started typing out the code (we used a Google Docs-esque screen sharing thing to share coe). I knew that, given a constant K, I would always need to iterate through the first k items and add them to the sample. Based on the intuitive concept: If you have k items, then a k-sized sample includes all of the items.

Inductive step

Next, the hard step: how do I determine whether or not the k+1'th item goes into the sample set or does not? If it does, which item does it replace?

It turns out, as long as we keep a running count of how many items we’ve seen so far, let’s say y, then we just need to roll a y-sided die to determine whether or not it goes in the sample. If you roll a 1, 2, …, or a k, then it goes in the sample. If you roll a k, k+1, …, y, it does not. If it goes in the sample, then you roll a k-sided die to determine the index of the element it replaces. This can then be proven to work inductively and thus answers the theoretical question. Programmatically, I implemented this with the Random class of Java.

Concrete Example

Consider a 6-sized sample of natural numbers. We fill it at first with 1, 2, 3, 4, 5, and 6. Next we get a 7. We roll a 7-sided die (or in the program, we generate a random number from 1 to 7). We get a 5. Since 1 <= 5 <= 6, we put 7 in the sample. We roll a 6-sided die (generate a random number from 1 to 6). We get 4. So now we replace 4 in the set, so our sample looks like {1, 2, 3, 7, 5, 6}.

1, 2, 3, 4, 5, and 6 all had a 100% chance to be in the sample. With the 7th item, they all need to have a 6/7 chance of being in the sample for it to be evenly distributed. The roll for 7 makes sense, since it had a 6/7 chance of being in the sample just based on its roll. Anything currently in the sample, in our case 4, now has a 1/6 chance of not being in the sample. Thus, we get a (6/7)*(1/6)=1/7 chance of not being in the sample REGARDLESS of the result of the roll for 7, thus giving it a 6/7 chance of being in the sample, just like 7 has based on its roll.

Cool math, eh?

Zach McCormick

Zach McCormick

Zach is a software engineer and manager passionate about building and maintaining global-scale distributed systems. He has experience with distributed systems, web applications, and mobile applications across a variety of industries, including marketing automation, fintech, IoT, healthcare, and mobile cybersecurity, as well as across a variety of languages and technologies, including Python, Java, JavaScript, Ruby, PostgreSQL, MySQL, MongoDB, Redis, and others. He currently works for Braze.
Zach McCormick

Interested in chatting?

I'm always happy to chat about software engineering challenges of all sorts - architecture, organizational, or otherwise. Just drop me an email at zachary.tyler.mccormick@gmail.com.