Tools in Data Science

Visualizing Forecasts with Excel#

Excel File used in the video

You’ll learn to perform exploratory data visualization on time-series financial data using Excel, covering:

Visualizing Trends with Sparklines: Use PivotTables and Sparklines to create compact, in-cell charts that reveal the performance trend of various financial securities (currencies, indices, commodities) over time.
Forecasting and Measuring Volatility: Learn to calculate key metrics like the average value, use the GROWTH function to forecast the next day’s performance, and measure volatility using standard deviation.
Analyzing Relationships with Scatter Plots: Create scatter plots to visually investigate how two different securities move in relation to each other.
Quantifying Correlation with R-Squared: Add trendlines and display the R-squared value on a scatter plot to quantitatively measure how much of the movement in one variable is explained by another.
Using the Data Analysis ToolPak: Learn how to enable and use Excel’s powerful Data Analysis ToolPak to perform more advanced statistical analysis.
Creating a Correlation Matrix: Use the ToolPak to efficiently generate a complete correlation matrix, showing the correlation coefficient for every pair of securities in the dataset.
Interpreting a Correlation Matrix with Conditional Formatting: Apply color scales to the matrix to instantly identify strong positive correlations, negative correlations, and outliers.
Discovering Insights and Forming Hypotheses: Learn how to use these visualizations to spot patterns and form data-driven hypotheses, such as the high correlation between the currencies of neighboring countries.

Here’s another version of this analysis, delivered as part of a talk:

Transcript#

Data visualization can be a great way of exploring data, and Excel is actually a pretty fantastic tool to be able to do that. What I’ll be showing you in this session is how you can create simple visualizations in Excel to explore time series data and use that to find meaningful insights about the data itself.

Let’s start with this data. What we have is data about a series of securities. That includes currencies like the Australian Dollar, the Canadian Dollar, the Swiss Franc, and so on, as well as some indices, like the FTSE, the S&P, and the NASDAQ. This apart, we also have some commodities, like the price of gold or the price of silver, or the price of gold, or the price of platinum.

This information, that is, the values for each of these securities, we have for a period ranging from August—that’s somewhere around the 8th of August—all the way down to the 6th of September, which is a one-quarter, 90-day period.

Now, what might we want to see from all of this information? Well, one thing we might be curious about is what’s the trend? How is the Australian dollar going? How is NASDAQ trending, and so on? We may also want to see whether the value tomorrow is likely to be higher or lower. Which of these has a higher value today? Which of these fluctuates more? These are examples of questions that we may want answered. And the way in which we might answer them is potentially through a visualization that might look something like this.

Let’s zoom in. We have for each of these securities—let’s say the FTSE, the S&P, etc.—a sense of what the trend looks like. So, the FTSE looks like it dipped, rose, dipped again, and rose even further, which is almost exactly the same trend as the S&P and the NASDAQ, and is very similar to the trend exhibited by the Australian dollar as well. But it is slightly different from the Brazilian Real and is certainly different from something like the Hong Kong dollar, which seems to have a much more pronounced, discrete trend, or the Israeli Kroner, which has a very different pattern as well.

From this, we can see that the indices seem to be going fairly similarly, but the currencies are, though somewhat close, still fairly different from each other, and we can infer what the pattern is from those. We can also see that the Israeli Kroner is probably one of the very few that is actually dipping rather than rising compared to the others. We can also see what the average values of these indices or the exchange rates are. We can see how much they are likely to grow the next day—effectively a forecast. From this, we can quickly infer that the Brazilian Real is what’s forecast to grow the most tomorrow. And also from their variance as a percentage, we can see that the Brazilian Real also has had the highest percentage fluctuation. The Indian Rupee has also had a fairly high fluctuation but is not expected to grow as much tomorrow.

The range tells us another measure of fluctuation, which is the spread between the highest and the lowest values as a percentage. And we can see that the Brazilian Real has had a 6% spread. The NASDAQ has fluctuated by as much as 5%, but the Hong Kong Dollar is barely fluctuating at 0.04%.

How does one create a visual like this? We’ll start by creating a pivot table. So let’s select this data range, click on Insert, PivotTable from this table or range. And for the selection that we have, it says I can create it in a new worksheet or an existing one. Let’s create it in a new worksheet.

Now, what I want is for each of the securities, for each of the dates that we have—it says I already have a pivot table on this, do I want to replace it? No, I don’t want to replace it. Now, this has condensed the dates into months by adding a months column and under that, dates. I don’t really want to collapse it; I just want to see all the dates at one shot. And now I can get the rates for the individual dates. So now this gives me for each security, for each date, what the value is.

This allows me to start putting in a variety of trends. So, let’s start by adding a new column here. We’ll call this column “Security” and we’ll put in the values same as, let’s say, cell C5. That’s FTSE, and we’ll extend the selection from here all the way down to the bottom. Let’s get rid of the grand totals; we don’t really need those. Those don’t have a meaning. We’ll remove the grand totals here, and we’ll remove the grand totals here. Right-click, remove grand total. All right.

Now, let’s add a column for the trend. And we’ll call that column “Trend”. The way to insert this is using the Insert menu and the sparkline called “Line,” where we can pick any data range, such as from 8th of August for the FTSE all the way to the 6th of November. And that automatically inserts a sparkline. You can expand the cell and the sparkline fills the entire range, and this can now be extended all the way down to the last row, which fills all of these trend lines automatically, and you can make inferences about how these are moving. So you can see that platinum, for example, seems to have had a slightly bigger dip than silver has had.

We can also look at the average value. So that’s simply equals average of the data over this entire range. So we start with, let’s say, this is cell G5 all the way down to the end, which would be CS5. Looks like I made a slight mistake in that formula. So, yeah, there we go. Now that we have this, it can again be extended all the way down to the bottom, and we’ll have the average values for each of these. We can decide how exactly we want to format these cells depending on the actual numbers.

We can also predict what the value is going to be. So let’s call that “Forecast,” for which we can use a formula called GROWTH. This takes three things: the growth formula requires a set of known Y values (that is, these values) and a set of known X values, which are kind of like the dates, and the new value for which we want to predict. So let’s fill those in. What I’m going to do is take these dates’ values and convert them into numbers. So this can be formatted as a number that is the number of days since the 1st of January, 1980, and you’ll see that it increases by one for every successive date. So we can go all the way to the end and then take this data and then say for the next available date, that is the 7th of November, that’s simply this value plus one. And we want to predict for this as an X value.

So let’s do that. Equals GROWTH of the known Y values, which is simply the currency values, against the known X values, which is simply these values. And I’m going to press F4 to add a dollar against the formula so that when we copy-paste this down, these numbers don’t change. And that leaves us with the value that we want to predict, which is this particular value. So that’s predicting that the Australian dollar is going to be 0.96. Okay, but how much higher is that than the previous day? So let’s put in the value of the last day. To look at the growth, that’s going to be a bit over 1%, say 1.5% higher than the last day’s value. That’s the forecast for the next date. Let’s take this all the way down and we can see which of these securities is going to increase or decrease. We, unfortunately, aren’t able to do this for missing values, and since the FTSE, S&P, and the NASDAQ have weekends that don’t have values, we can’t use the growth formula. But for the rest, we can do this and apply a conditional formatting that gives us a color scale that tells us which of these values, in this case, the Brazilian Real, is going to be the one with the highest increase.

We can then similarly add one for what’s the variation. So let’s take the standard deviation of the same data, starting from here all the way down to the end. And that gives us what’s the standard deviation. But rather than have this as an absolute number, it’s probably better to divide this by the mean so that we get the deviation, the variance as a percentage of the mean. And that’s slightly more informative. It says this varies plus or minus a certain percentage. And again, with a certain amount of conditional formatting, what we’ll be able to do is see where the variation is higher. If you’re not comfortable with standard deviation, then you could just use min and max instead and see how much the spread of the value is. But if there are outliers, if there are very high and very low values, then you’re probably better off with something which does a little bit of outlier removal.

But what we have right now is a quick sense of how each of these securities are performing. And you can think of that as the first level of exploratory visualization of this data.

Now that we have this, another exploration you may want to do is, how do these securities move with each other? So, for instance, it looks like when the Australian dollar is moving up, the Brazilian Real is also kind of moving up. But how close are they? And it doesn’t look like the Canadian dollar is that close. So can we actually get a quantitative measure or an intuitive feel for how close these move together? Well, that’s the next level of visualization where we could take, for example, the correlation between the Australian dollar and the Brazilian Real and see how they tend to move together. And we find that the pattern’s pretty close.

In fact, we see that about 82%, or 83% almost, of the variation in the Brazilian Real is explainable by the Australian dollar’s movement. That’s pretty strong. But if I take the Canadian dollar, now we are taking the X-axis to be the Australian dollar. How much of the Australian dollar’s movement can explain the Canadian dollar’s movement? The answer is only about 26%. Not so strong. Now this is not exactly a perfect pattern. What about the Swiss Franc? 60%, slightly closer. So I can quantitatively say that the Australian dollar is actually slightly closer to the Swiss Franc than to the Canadian dollar. And we can keep moving these and figuring out. At 85%, the Chinese Yuan is even closer, closely determined by the Australian dollar. And we can look at any pair of currencies to see how well they are related.

So, how do we create this kind of a visualization? How do we also use this to spot the outliers? What are the specific points when we actually had an unusually high or an unusually low value? Well, all it takes is, pick a column, let’s say the Australian dollar in this case. We want to find the influence of the Australian dollar on, pick another column, let’s say the Swiss Franc. Now, go to Insert and click a scatter plot. That gives you the scatter plot right away, and it tells you that it’s the CHF that you’re predicting, but let’s ignore that for now. The next thing that you can do is right-click and add a trendline. A trendline can be linear, logarithmic, exponential, and so on. In this particular case, we’ll assume that we want a linear trendline. The variations are pretty small. And you can also add an equation and the R-square to this, which you can move around and format any way you like.

So, now we can do exactly the same thing that we did before, which is change any of the values and see how well one particular currency is able to predict the value of another currency. This gives us a good feel, therefore, for which of these currencies have a strong correlation or regression, that is, what is the strength of the impact of one on the other. But it would be good if I could just get the number without having to make this movement, which takes us to the next level of abstraction. Where, if I could just put in, for example, the slope or even the correlation coefficient. Let’s just say I want the correlation coefficient of, let’s make it a bit bigger, the Brazilian Real against the Australian dollar. So that’s equals correlation of, take all the values of the Brazilian Real against the Australian dollar. So that says 91% correlated. Great. What if I wanted to do this for the Canadian dollar? That’s 54%. That’s a lot less.

This is 51% Canadian dollar against the Australian dollar, and this is 78% Swiss Franc against the Australian dollar, and so on. And I can quickly see that so far, in fact, let’s add a conditional formatting around this to see which is the most correlated with the Australian dollar. It looks like at 92%, the Chinese Yuan, no, sorry, the Singaporean dollar is closest to the Australian dollar. Or no, sorry, I missed the New Zealand dollar. The New Zealand dollar is the closest to the Australian dollar with a 95% correlation.

Now, we got these values. What if we could get these values for any pair of currencies? I just have these for the Australian dollar. What if we could do this for any pair of currencies? Well, that you can do using a correlation matrix, and that’s what this shows. So if I take, for example, the Canadian dollar versus, let’s say, the Brazilian Real, that’s a 54% correlation. If I take the Hong Kong dollar versus, let’s say, the Chinese Yuan, that’s a 57% correlation. Now, from this, I can, with the color coding, see a few things. Wherever it’s red, the correlations are low or even negative. So the Pakistani Rupee, for example, seems to be anti-correlated with most securities, but moderately positively correlated with gold, silver, and platinum. We also see that the Israeli Kroner is not very strongly positively correlated with most securities, though it is fairly strongly positively correlated with the Pakistani Rupee here. So using a table like this, what we are able to do is figure out which securities move well with each other.

To apply correlations, you need the Data Analysis ToolPak, which you’ll find in the Data menu out here under Analysis, called Data Analysis. If you don’t find it, then you may want to go to the File menu, Options, and look for Add-ins. And in the Add-ins section, make sure that you have the Analysis ToolPak Excel add-in inserted. If that’s not available, not possible, or you’re using a version of Excel that just does not have it, not a problem. You can still apply the correlation formula manually, the way we just did here, and do it for all of the securities. But if you have the Analysis ToolPak, what I’m about to show you becomes a whole lot easier. So you click on Data Analysis and choose Correlation. There are several other kinds of analysis that you can do, but you choose Correlation, click OK. And the input range that we select will be, well, effectively the same data that we have here, each of the values as columns. And then we say labels are present in the first row, because they are present in the first row, and the data is grouped as columns, not rows. That’s how we have structured the data. And the output is going to go into a new worksheet. Click OK, and that gives us exactly the same data that we looked at earlier. What we’ll do is change these values, rather than numbers, to percentages, and that makes it a little easier to see. We’ll also apply a conditional formatting to this, and that can be a red-amber-green kind of structure. And we have exactly the same visual that we had before.

So, to summarize, what we did was took the underlying raw data. We then converted it to a set of exploratory visuals at the next level, which helped us see what the trends were and what the value is likely to be. We then started looking for patterns within these to see whether one particular security was more correlated with another. And we looked at which were securities that were most correlated, which were some of the outliers using scatter plots. And then we took it to the next level and started looking at whether there are patterns of correlations themselves, using a correlation matrix to see whether there are some securities that are consistently anti-correlated with others, like the Pakistani Rupee or the Israeli Kroner, and whether there are some securities that are very highly correlated with each other, like the Singapore dollar and the Malaysian Ringgit, or the Australian dollar and the New Zealand dollar, both of which happen to be neighbors in this particular case.

Where I want to take this is that Excel in itself is a very powerful tool for exploratory analysis and exploratory visualization. And you can take datasets like this time series data and create some fairly powerful insights just from that.