Overly Complicated Excel: April 2013

Monday, April 29, 2013

Rate of the (Rate of the (Unemployment Rate))

This is the second of (possibly) several pieces I'll do on the unemployment rate. If you think I missed something important, by all means comment on it, but there is a chance it just hasn't come up yet.

A Little Bit of Math (and Philosophy)

A train leaves Chicago at 6 PM heading to LA (2000 miles away) at 60 miles per hour. A man, starting 1 hour later in LA, walks at 3 miles per hour towards Chicago along the train tracks. How far from Chicago do they meet?

Trick question. Nobody walks in LA.

Rates are funny things. We tend to think of them as units per time (miles/hour, beats/min, dollars/year), but really they are just units per different unit. (Technically they could be units per same unit but that would be quite boring.) So any fraction is a rate. The ratio 0.77 euro / dollar is an exchange rate. Similarly, any percent is a rate. Cent is 100 of something, so per-cent is 1/100 or .01 of something. You could say, "I pay 31% of my income in taxes," and I would say:

You need a better tax attorney.
Your Tax = 31 * 1/100 * Your Income

Because I could stick the equals sign in there to make this an equation, we can harness the power of algebra and calculus to affect both sides. We start with:

Your Tax = 31/100 * Your Income

If we add money to one side of the equation (you earn another $100 of Your Income), algebra says we must increase the other side to balance it (by increasing Your Tax by $31):

(Your Tax + 31) = 31/100 * (Your Income + 100)

Calculus says that we can do this for any size change in any variable:

(Your Tax) + (Change in Your Tax) = 31/100 * ((Your Income) + (Change in Your Income))

Which simplifies to:

(Change in Your Tax) = 31/100 * (Change in Your Income)

Often written:

d(Your Tax) = 31/100 * d(Your Income)

or 31% = d(Your Tax)/d(Your Income)
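If you would rather check that with numbers than with algebra, here is a minimal Python sketch. The starting income of $50,000 is a made-up value for illustration; the 31% and the extra $100 come from the example above.

```python
TAX_RATE = 31 / 100  # the 31% from the example above


def tax(income):
    """Flat-rate tax: Your Tax = 31/100 * Your Income."""
    return TAX_RATE * income


income = 50_000          # hypothetical starting income, not from the post
change_in_income = 100   # the extra $100 from the example

# Change in Your Tax when Your Income goes up by $100
change_in_tax = tax(income + change_in_income) - tax(income)

# Change in Your Tax per change in Your Income
rate = change_in_tax / change_in_income

print(round(change_in_tax, 2), round(rate, 2))
```

Swap in any starting income you like. Because the relationship is a straight line, the ratio should come out the same every time.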

Don't be afraid of the d() terminology. It's just an easy way of saying "small change in," and the ratio of two such changes is called a derivative. That last equation read out loud would sound like:

"Change in Your Tax per Change in Your Income is thirty-one percent"

And because we already agreed that fractions, ratios and percents were rates you could also say:

"Your Tax rate is 31% of Your Income"

The Unemployment Rate

So since we agreed that any percent or fraction is indeed a rate... (I assume we agreed. You must have agreed implicitly. I assume you would have said something if you didn't. Ok, you agree.) ... the unemployment rate (reported as "U3") is a real rate. It is defined by the form:

Unemployed People = U3 * Workforce

Where Unemployed People are strictly defined as 16 or older and actively searching for work. The Workforce is defined as Unemployed People + Employed People. Of course we (you implicitly) just agreed that we can do this too:

U3 = d(Unemployed People)/d(Workforce)

I like the mental image of a giant game of duck-duck-goose. If U3 is 10%, and our circle of people 16 and older (for some reason tricked into playing duck-duck-goose by the little kids) is 10 people, the chooser runs around the circle and selects one person. If there are 20 people in the circle, they select two. It doesn't matter how long it takes them to choose the person; they can pat you on the head as fast or slow as they like. Time is not in the equation at all. It's not that kind of rate.
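The duck-duck-goose picture can be written out as a small Python sketch. The function names here are my own, not anything official from the BLS, and the circle sizes are the ones from the paragraph above.

```python
def unemployed_count(u3_percent, workforce):
    """Unemployed People = U3 * Workforce, with U3 given in percent."""
    return u3_percent / 100 * workforce


def u3_rate(unemployed, employed):
    """U3 in percent, where Workforce = Unemployed People + Employed People."""
    workforce = unemployed + employed
    return 100 * unemployed / workforce


# A circle of 10 at 10% picks one goose; a circle of 20 picks two.
print(unemployed_count(10, 10), unemployed_count(10, 20))

# Going the other way: 1 unemployed and 9 employed is a 10% rate.
print(u3_rate(1, 9))
```

Notice that nothing in either function takes a time argument.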

Until it is.

Historical Data

While thinking about our current unemployment rate's extremely linear trajectory, I was curious as to whether this was a common phenomenon. Maybe it was just this recovery? Maybe it was just this portion of this recovery? The Bureau of Labor Statistics (BLS) has the monthly unemployment rate from their survey going back to 1948. I threw this into Excel and plotted it. (I also added a bar to denote recessions based on the National Bureau of Economic Research, though this doesn't play into the current post's theme.)

Oh, and you can track my work if you like.

What jumps out at you?

I was immediately drawn to the right side of the graph, with our recent unemployment jump and current BBQ recovery (low and slow). I also note that there's never really been a "golden decade" of low unemployment. It hit nice low levels in the early 50's, late 60's and late 90's, but has never been sustainably low. Maybe that is the first history lesson here: enjoy the good times, because they won't last. Our economy and job outlook have been consistently inconsistent.

So what else jumps out? Well, for one thing the graph kind of looks like a mountain range. And like mountains, the sides of many of the peaks differ in their steepness. The easier way to say this is that the slopes are different. A slope is a change in something with respect to a change in something else, which, you guessed it... is a rate.

Now what we are looking at is something (U3) that is changing over time. (My apologies if this is already obvious to you. I'm trying to get everyone up to speed together.) If there were an equation that modeled the following function, you could just use calculus to take the derivative:

(and there is an equation... it's just really complicated)

A much easier way to find slopes is to use the data we have (U3 for each given month), and assume that to get to the next data point (U3 for the next month) the slope is linear. Since there are no data points between them, for our purposes this is absolutely true. This makes it simple to calculate the slope as (Change in U3)/(Change in time) or if you like d's it is d(U3)/dt. If you are following along in my spreadsheet, Column A is Date (month) and B is U3. To this I add a Column C with the following equation (type the inside brackets portion):

(Cell C3) [=B3-B2]
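For anyone following along outside of Excel, the same column can be sketched in Python. The four U3 values below are made up for illustration and are not BLS data.

```python
def monthly_change(u3):
    """Month-over-month difference, like [=B3-B2] filled down Column C."""
    return [later - earlier for earlier, later in zip(u3, u3[1:])]


u3 = [5.0, 5.1, 5.4, 5.3]  # made-up monthly rates, oldest first
changes = monthly_change(u3)

# Round for display only; floating point leaves tiny leftovers.
print([round(c, 2) for c in changes])
```

The result has one fewer entry than the input, which is why the Excel column starts at C3 rather than C2.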

This is of course assuming your data moves forward in time as your column value increases. It doesn't really matter if you flipped the data, just flip your equations too. You might notice I leave out time (Column A) from the equation. Excel counts dates in days, and if I already know that the difference I am calculating is over 1 month then all I need to do is complete the column (drag down or double click the right bottom corner) and I have a nicely defined Column C of d(U3)/month. A graph of it looks like this:

This is what I would call "noisy data." It has lots of random fluctuations and when you get a data point like the one in January 1975 it is unclear if it is meaningful because the next month there was a much smaller increase in U3. What we need is a smoothing algorithm. The easiest way to smooth data is to take a simple moving average (SMA) of the trailing 12 months.

(Cell D13) [=AVERAGE(C2:C13)]

Complete the column and you have data that is much better behaved. There are clear trends that match up nicely with the original data. (Technically our SMA is 6 months behind though, which can be solved by plotting the data with an adjusted axis, shown below, or making your Cell D13 [=AVERAGE(C7:C18)]. For our purposes we are just looking at trends though.)
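Here is a rough Python version of the trailing average, assuming the data is a plain list ordered oldest to newest. To show the lag, I feed it a made-up steady ramp (0, 1, 2, ...) instead of real data.

```python
def trailing_sma(values, window=12):
    """Average of each value and the (window - 1) values before it."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append(sum(chunk) / window)
    return out


ramp = list(range(13))       # synthetic data: 0, 1, 2, ..., 12
smoothed = trailing_sma(ramp)

# The first average covers months 0..11, the second covers months 1..12.
print(smoothed)
```

On a steady ramp the trailing average reports the value from 5.5 months earlier, which is roughly the 6 month lag mentioned above.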

This looks much better! If you conditionally format the data for positive and negative (go to Format -> Conditional Formatting, and make two rules: [if cell is < 0, green] and [if cell is > 0, red]) you see long stretches of continuous data. The average for this data [=AVERAGE(D2:D784)] is almost 0 (though not quite, because we are at higher U3 than when the data start). The standard deviation [=STDEV(D2:D784)] is about 0.1, which makes sense. About 95% of the data are within the range of -0.2 to 0.2.
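The same three summary numbers can be sketched in Python with the standard library. The input list below is made up, so the outputs will not match the 0.1 and 95% quoted above; the point is only to show the calculation.

```python
import statistics


def summarize(values, band=0.2):
    """Mean, sample standard deviation, and share of values within +/- band."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation, like Excel's STDEV
    inside = sum(1 for v in values if -band <= v <= band) / len(values)
    return mean, sd, inside


sample = [-0.1, 0.0, 0.1, 0.3]  # made-up smoothed changes, not BLS data
mean, sd, inside = summarize(sample)
print(round(mean, 3), round(sd, 3), inside)
```

The band of 0.2 is two standard deviations when the standard deviation is about 0.1, which is where the "about 95%" rule of thumb comes from.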

What Does It Mean?

A couple notes on this graph. All the positive peaks are brief and sharp, while the negative peaks are mostly low and elongated (almost not peaks at all). Yes, there have been exceptions to this, but U3 has increased at a high rate (> 0.2% per month) about twice as often as it decreased at a high rate (< -0.2% per month). In the last 50 years we really have had only one deep valley (while we were recovering from a double dip recession in the early 80's). Most often what we see is a prolonged period of d(U3)/dt of about -0.05% per month.

Our most recent unemployment jump (2007-2008) and corresponding recovery (2009-now) fits this model exactly, and as I mentioned in a previous post, our current d(U3)/dt is about -0.06% per month. It is anyone's guess as to whether it will continue, but if you look at the chart, most of our history is a long slow grind towards full employment, punctuated by blips of uncontrolled unemployment spikes.

But Wait... There's More!

We have already found a value for the change in U3 with respect to time, and plotted it on a graph... against time. We could take the derivative again using our linear-by-subtraction method. In this case we take our SMA smoothed data and find the difference per month (Cell E13 [=D13-D12] and complete the column). Here I show the full plot and a zoomed in section where I only plot every 4th datapoint, along with the original U3 so you can see the correlation.

This data is already smoothed and still looks noisy, but the message I want you to take away from this is that the "change in the change" is almost always near zero. When a second derivative equals zero, the data series is a straight line (as opposed to curving up or down). This basically means that U3 is moving along not as a curvy line, but more like a series of straight lines. You can see this just by eye.
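If the "second derivative of a straight line is zero" claim feels like hand-waving, here is a small Python check. Both series are synthetic: one is a straight line, the other is a curve.

```python
def diff(values):
    """Difference between neighbors, same idea as [=D13-D12]."""
    return [b - a for a, b in zip(values, values[1:])]


line = [10 - 0.5 * t for t in range(6)]  # straight line heading down
curve = [t * t for t in range(6)]        # curving up

second_line = diff(diff(line))    # change in the change, straight line
second_curve = diff(diff(curve))  # change in the change, curve

print(second_line)
print(second_curve)
```

The straight line gives all zeros, while the curve gives a constant non-zero value.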

The portions of that graph that least match up to the straight black lines are where the second derivative is non-zero. You might notice that the upswings in U3 have a worse fit (by eye) than the slow descent. We can show this in the data too. First we will separate our second derivative data into two columns.

Up-swings in U3 (Cell F12) [=IF(D12>0,E12," ")]

Down-swings in U3 (Cell G12) [=IF(D12<0,E12," ")]

The average of each column is near zero, but more importantly, the standard deviation of the up-swings [=stdev(F12:F786)] is higher (0.031) than the down-swings [=stdev(G12:G786)] (0.024). If unemployment is going up, it is more likely to be accelerating (followed by a deceleration followed by a more steady descent).
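The two IF columns translate to a filter in Python. This is a sketch with made-up numbers, so the standard deviations it prints are not the 0.031 and 0.024 from the real data.

```python
import statistics


def split_by_direction(first_deriv, second_deriv):
    """Sort second-derivative values by the sign of the first derivative."""
    pairs = list(zip(first_deriv, second_deriv))
    ups = [e for d, e in pairs if d > 0]    # like Column F: U3 rising
    downs = [e for d, e in pairs if d < 0]  # like Column G: U3 falling
    return ups, downs


# Made-up values standing in for Columns D and E
first = [0.1, 0.2, -0.1, -0.05, 0.0]
second = [0.05, -0.04, 0.01, -0.01, 0.02]

ups, downs = split_by_direction(first, second)
print(statistics.stdev(ups), statistics.stdev(downs))
```

Rows where the first derivative is exactly zero land in neither list, matching the two Excel formulas.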

Take Home Message

Sixty-odd years of data seems like a lot, with nearly 800 data points to make our graphs, but really we are looking at only a handful of economic events. From the first derivative it looks like these events cause a rapid increase in U3, followed by a slow and steady decrease that has historically lasted two to eight years before a new challenge. For the large economic events such as recessions, you get brief second derivative blips to the positive then negative meaning the rate is no longer linear. When the graph does this it is the signal that the U3 rate is about to change very significantly. Of course with the smoothing in this model you get that signal 6 months after it has happened. Maybe it is the signal to run?

--------

My humor may be low, but I'm trying to keep the quality of my posts high. That means updating only once a week, or whenever I get a really good idea. This, in combination with the imminent demise of Google Reader, has led people to ask if I could email them when the blog is updated. A compromise that I think will work well is a subscription to this Google Group (an email list). Send an email to that link (overly-complicated-excel+subscribe@googlegroups.com) and you will be added to the list. The blog will email that list whenever posts are published. If you want your money back you can always unsubscribe the same way.

Tuesday, April 23, 2013

The Unemployment Line

This is the first of (possibly) several pieces I'll do on the unemployment rate. If you think I missed something important, by all means comment on it, but there is a chance it just hasn't come up yet.

If you saw this graph, what would your thoughts be?

a) Why doesn't he label his axes?
b) Pretty linear, with relatively few outliers and an R-squared of > 0.95.
c) Negative slope, decreasing at a rate of 0.002 somethings per something.
d) All of the above.
e) Seriously, why doesn't he label his axes?

As you may have guessed, the correct answer is d. The graph is not in fact a titration curve in a middle school science fair project (though that would explain the axes), but rather several years of unemployment data.

In October of 2009, the unemployment rate topped out at 10.0% and has been linearly decreasing ever since at the rate of around 0.002% per day. That slow and steady decrease has led to this being called the BBQ recovery: low and slow. These low rates of economic growth over a long period of time are frustrating for many, and may even be self-perpetuating... but we'll get to that.

Before I go any further with unemployment, I have to clear things up a bit. Unemployment is tough to talk about. It vies with the Dow Jones for being the most overrated statistic by the media. Calling it data is very generous.

It is from a survey (providing large error bars that are rarely reported).
It is a different survey than that of business job creation (providing confusion).
It is not supposed to be revised. Although sometimes it is.
It changes by such small amounts that rounding plays a large part in reporting.

Even the definition is weird. When I say "unemployment" I mean U3, which is "unemployed people" / "workforce". Workforce is defined as unemployed people plus employed people, so you can see the circular reasoning starting to take effect. Unemployed people have to be at least 16 years old and actively looking for work (within the last 4 weeks). If you're not looking for work you are just... there. Some people like talking about U6 (including discouraged and underemployed), but that starts to open lots of doors that I won't get into now.

Then you add the political aspect. This is not a political blog, so I'll actively try to show many sides of things (and vigorously poke fun at them), but you can imagine what gets twisted when there are many interpretations of a statistic.

Okay. Back to the BBQ recovery. The point of this post is just to point out the linearity of our downward trend and some possible consequences. It is important to note that over a longer time (pre-Fall 2009 or some unknown time in the future) the linearity goes away. Everything is linear if you zoom in close enough. I say this as a caveat to the whole idea of modeling the unemployment rate. Future posts will deal with whether or not this is a generalizable phenomenon. So. Let's get to it!

Track my work!

In November 2012, I started by making up a column of dates and the corresponding unemployment rates (BLS). I went back to October 2009 mainly because it was the highest unemployment rate in my lifetime, and I was interested in the influence of U3 on the election. A happy side effect is that Oct '09 marks the end of The Recession and start of The Recovery. I had always thought that U3 would be somewhat sinusoidal, but was startled to see how linear it was (see graph above). Now, if you extrapolate forward you get some interesting numbers!
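For the curious, here is a rough sketch of what "fit a line and extrapolate" means, written in Python with an ordinary least-squares fit. I am assuming that this is close to what an Excel linear trendline does. The data below is synthetic and perfectly linear (10.0 falling by 0.0625 per month), not the BLS numbers, so it is not expected to reproduce the forecasts in this post.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = list(range(6))                      # months since the start
rates = [10.0 - 0.0625 * m for m in months]  # synthetic, not BLS data

slope, intercept = fit_line(months, rates)

# Extrapolate forward to month 40 using the fitted line.
forecast = slope * 40 + intercept
print(round(slope, 4), round(intercept, 2), round(forecast, 2))
```

With real, noisy data the fitted slope and intercept will shift a little each time a new month is added, which is why the original guesses are worth writing down.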

March 2013- 7.63%
Two weeks ago the March numbers came out, and the actual U3 was 7.6%, so we are looking good so far. One possibility is to add data and keep updating the fit, but I will still keep the original guesses for posterity.

January 2014- 7.0%
A happy new year indeed, though this number wouldn't come out until the first week of February. A happy Groundhog Day indeed!

September 2014- 6.5%
An important marker! The Federal Reserve has stated that this number is its goal. Once U3 reaches 6.5% they will most likely stop quantitative easing. Maybe even before if it looks like they will get there anyways. I could do a whole post on this but basically it means bond rates will go up and so will mortgage rates. You might even earn a bit more on your bank account interest rates and CD rates, though I wouldn't hold my breath. There is one small problem with this though. As interest rates rise and bonds become more attractive, economic growth slows down because the Federal Reserve took all the free money away. Then again, that could prevent further bubbles, which is kind of the other half of the Federal Reserve's job. Nice job, Federal Reserve!

Unfortunately, slower growth might cause the U3 graph to become non-linear, throwing off all further predictions. But let's pretend it doesn't.

June 2015- 6.0%
We drop down to the rate we were at in July 2008 when the McCain team famously said that we were in a mental recession, and gave Obama a talking point for the next 4 months. (In fact we had been in recession for 6 months already and would be for another year.)

July 2016- 5.2%
We reach the average rate over the entire Bill Clinton - George W. Bush era. (Not that presidents really change the unemployment rate themselves.) Anyways, it's not a round number, but probably the most reasonable goal for a second term Obama to feel vindicated. Not that that will matter as we will have new candidates to ~~argue over the unemployment rate~~ engage in insightful political discussion. Did I mention that presidents don't affect unemployment?

As an aside, it seems odd that you will still know quite a few unemployed people (~ half as many as Oct '09). It just won't feel like it. It will feel like a 3-4% U3, because U6 will have dropped so much. Hopefully.

June 2018- 3.8%
We reach the lowest rate in my lifetime. President Joe Biden and VP Rand Paul give each other high-fives in the back halls of Congress. We are basically left with frictional unemployment (where everyone's dream job is out there waiting for them and they just haven't found it yet). What? We haven't broken our linear fit yet? Let's keep going!

October 2023- 0%
Now your dream job actively seeks you out. All work is volunteer because all goods are made by robots running on rainbows. President Rand Paul and VP Joe Biden (having voluntarily switched roles in 2020) are seen high-fiving robots in the halls of Congress.

So yeah. It can get pretty silly, but the bottom line is that the graph is linear for now. Until we lose linearity, the best guess for next month's U3 in this BBQ recovery will be this month's rate minus 0.06% (0.002% per day). Next installment of this series will deal with what history can tell us about when we lose linearity.
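The "this month minus 0.06" rule is simple enough to write down as a Python sketch. The function names are my own, and the 0.06 per month is just 0.002 per day times roughly 30 days.

```python
def next_month_guess(current_u3, monthly_drop=0.06):
    """Best guess for next month's U3 while the trend stays linear."""
    return current_u3 - monthly_drop


def months_until(current_u3, target_u3, monthly_drop=0.06):
    """Months of steady decline needed to get from current to target."""
    return (current_u3 - target_u3) / monthly_drop


# Starting from the actual March 2013 rate of 7.6%
print(round(next_month_guess(7.6), 2))

# Starting from the fitted March 2013 value of 7.63% and running to zero
print(round(months_until(7.63, 0.0), 1))
```

The second number is about 127 months, and 127 months after March 2013 is October 2023, which lines up with the robots and rainbows entry above.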

Monday, April 22, 2013

Welcome to My EXCELent Blog! (Too soon? Too soon.)

Yesterday I picked the wrong lane at the grocery checkout. I know, not earthshaking news. Actually, not news at all since I have never picked the right lane. At least that's how it feels. Picking the right lane would be news. Perhaps still not "earthshaking."

Wrong line calculation:
10 (years of shopping for myself) * 52 (weeks/year) * 1 (weekly shopping trip) = 520 trips

But my wife has done the grocery shopping without me many times:
6 (yr when trips fit her schedule better) * 52 (wk/yr) * 1 (trip/wk) * 80% (I couldn't go) = - 250 trips

But we shop for things other than groceries where we pick a line:
10 (years of shopping) * 52 (wk/yr) * 0.5 (trip/wk) = 260 trips

And I'm pretty good at convincing my wife to let me drive the cart:
520 - 250 + 260 = 530 trips
530 (trips) * 85% (of the time I pick the checkout lane) =
~450 times my wife could have leaned over and whispered, "You have chosen... poorly."

Right line calculation:
= 0
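For the record, the wrong line arithmetic above can be checked with a few lines of Python. The variable names are mine, and the rounding of 249.6 to 250 follows the numbers used in the calculation.

```python
solo_trips = 10 * 52 * 1        # years * weeks/year * trips/week = 520
wife_trips = 6 * 52 * 1 * 0.80  # about 249.6 trips I did not make
other_trips = 10 * 52 * 0.5     # non-grocery trips = 260

# The calculation above rounds 249.6 to 250 before subtracting.
total_trips = solo_trips - round(wife_trips, -1) + other_trips

# I pick the checkout lane 85% of the time.
wrong_picks = total_trips * 0.85

print(total_trips, wrong_picks)
```

That comes out to 530 trips and about 450 poorly chosen lanes.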

Compare that to the 150-200 earthquakes that have been "feel-able" in the northwest in that time and [insert "earthshaking" joke here].

We get a lot of sensory stuff thrown at us every day that makes us feel a certain way, and (at least for me) makes us ask questions. The nice thing about most questions, though, is that there is usually some data you can scrounge up and throw into an Excel model. Five minutes (or an hour or a week) later you have an answer. It might not be [omit "earthshaking" joke here], but it might inform you about why you felt what you did. Do I really pick the wrong line more often than the northwest gets earthquakes? Maybe. Does that validate how I feel about it? Definitely.

So yeah. That's what this blog is about. When my fish have babies and I worry that they will overrun the tank (and the world?) I model it in Excel. When talking heads say you can beat the market by "timing momentum," I model it in Excel. When I worry about retirement, housing, or why exponents act so crazy, I model it in Excel. When I worry about the debt to GDP ratio I DO NOT MODEL IT IN EXCEL. (Actually, I probably do.)

I hope you enjoy the blog. I know the in-text numbers are boring, and I'll be working on combining Excel screenshots with google-drive interactivity to minimize clutter.