
Bank Stocks Progression – the financial crisis


In this data project we will focus on exploratory data analysis of stock prices. This will help you develop and practice your visualization and pandas skills. We'll also show you how to fetch data straight from Google Finance and use it as your data set.

So let’s get started…

NOTE: Choose this project only if you have at least a slight interest in finance and stocks. There may be terms you are unfamiliar with, but don't worry, we'll lead you in the right direction.


We’ll focus on bank stocks and see how they progressed throughout the financial crisis all the way to early 2016.

Get the Data

In this section we will learn how to use pandas to read data directly from Google Finance.

First we need to start with the proper imports, which we’ve already laid out for you here.

*Note: You'll need to install pandas-datareader for this to work! pandas-datareader allows you to read stock information directly from the internet. Install it with pip install pandas-datareader.

The Imports

Importing the libraries that we are going to use.

In [1]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
%matplotlib inline

Data

We need to get data using pandas datareader. We will get stock information for the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

Figure out how to get the stock data from Jan 1st 2006 to Jan 1st 2016 for each of these banks. Then we will set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This will involve a few steps:

  1. Use datetime to set start and end datetime objects.
  2. Figure out the ticker symbol for each bank.
  3. Figure out how to use datareader to grab info on the stock.

Use the pandas-datareader documentation page for hints and instructions (it should just be a matter of replacing certain values). Use Google Finance as a source, for example:

# Bank of America
BAC = data.DataReader("BAC", 'google', start, end)

Let's set the start date and end date; we will use these when retrieving the data.

In [2]:
start = datetime.datetime(2006, 1, 1)
end = datetime.datetime(2016, 1, 1)
In [3]:
# Bank of America
BAC = data.DataReader("BAC", 'google', start, end)

# CitiGroup
C = data.DataReader("C", 'google', start, end)

# Goldman Sachs
GS = data.DataReader("GS", 'google', start, end)

# JPMorgan Chase
JPM = data.DataReader("JPM", 'google', start, end)

# Morgan Stanley
MS = data.DataReader("MS", 'google', start, end)

# Wells Fargo
WFC = data.DataReader("WFC", 'google', start, end)

Once you get the data, you can also pull all the banks into a single data frame in one call, as shown below.

In [4]:
# Could also do this for a Panel Object
df = data.DataReader(['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC'],'google', start, end)

We will create a list of the ticker symbols (as strings) in alphabetical order. Call this list: tickers

In [5]:
tickers = ['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']

Use pd.concat to concatenate the bank dataframes together into a single data frame called bank_stocks, setting the keys argument equal to the tickers list. Also pay attention to the fact that we used axis=1; what does that mean? Comment down below!

In [6]:
bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC],axis=1,keys=tickers)
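If axis=1 has you wondering: it tells concat to glue the frames side by side as new columns (sharing the date index), whereas axis=0 would stack them as new rows; keys= adds an outer column level. A minimal sketch with two made-up tickers:

```python
import pandas as pd

# Two toy frames standing in for two banks (hypothetical tickers and prices)
a = pd.DataFrame({'Close': [10.0, 11.0]},
                 index=pd.to_datetime(['2006-01-03', '2006-01-04']))
b = pd.DataFrame({'Close': [20.0, 21.0]},
                 index=pd.to_datetime(['2006-01-03', '2006-01-04']))

# axis=1: side by side as columns (shared date index); axis=0 would stack rows.
# keys= adds an outer column level, giving MultiIndex columns like bank_stocks.
combined = pd.concat([a, b], axis=1, keys=['AAA', 'BBB'])
print(combined.columns.tolist())   # [('AAA', 'Close'), ('BBB', 'Close')]
```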

You must have figured out that the data frame is going to be multi-indexed, so here we will set the column name levels.

In [7]:
bank_stocks.columns.names = ['Bank Ticker','Stock Info']

Let's check the data frame now.

Notice how the code written above has affected the data frame.
PS: now you can see the multi-indexing.

In [8]:
bank_stocks.head()
Out[8]:
Bank Ticker  BAC                                    C                                   ...  MS                                   WFC
Stock Info   Open   High   Low    Close  Volume     Open   High   Low    Close  Volume  ...  Open   High   Low    Close  Volume   Open   High   Low    Close  Volume
Date
2006-01-03   46.92  47.18  46.15  47.08  16296700   490.0  493.8  481.1  492.9  1537660 ...  57.17  58.49  56.74  58.31  5377000   31.60  31.98  31.20  31.90  11016400
2006-01-04   47.00  47.24  46.45  46.58  17757900   488.6  491.0  483.5  483.8  1871020 ...  58.70  59.28  58.35  58.35  7977800   31.80  31.82  31.36  31.53  10871000
2006-01-05   46.58  46.83  46.32  46.64  14970900   484.4  487.8  484.0  486.2  1143160 ...  58.55  58.59  58.02  58.51  5778000   31.50  31.56  31.31  31.50  10158000
2006-01-06   46.80  46.91  46.35  46.57  12599800   488.8  489.0  482.0  486.2  1370250 ...  58.77  58.85  58.05  58.57  6889800   31.58  31.78  31.38  31.68  8403800
2006-01-09   46.72  46.97  46.36  46.60  15620000   486.0  487.4  483.0  483.9  1680740 ...  58.63  59.29  58.62  59.19  4144500   31.68  31.82  31.56  31.68  5619600

5 rows × 30 columns

EDA

Let’s explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs.
Make sure you have a clear understanding of how to use them, since that will be a major part of this project.
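As a warm-up, here is what .xs does on a tiny frame with the same two-level column layout as bank_stocks (the numbers are made up):

```python
import pandas as pd

# Tiny frame mirroring bank_stocks' two-level columns
cols = pd.MultiIndex.from_product([['BAC', 'C'], ['Open', 'Close']],
                                  names=['Bank Ticker', 'Stock Info'])
df = pd.DataFrame([[46.9, 47.1, 490.0, 492.9],
                   [47.0, 46.6, 488.6, 483.8]], columns=cols)

# .xs grabs one value of the 'Stock Info' level across every ticker
closes = df.xs(key='Close', axis=1, level='Stock Info')
print(closes.columns.tolist())   # ['BAC', 'C']
```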

What is the max Close price for each bank’s stock throughout the time period?

In [9]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').max()
Out[9]:
Bank Ticker
BAC     54.90
C      564.10
GS     247.92
JPM     70.08
MS      89.30
WFC     58.52
dtype: float64

Next we will create a new empty DataFrame called returns (percentage change). This dataframe will contain the returns for each bank's stock. Returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$
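As a quick sanity check with made-up prices, pandas' pct_change() computes exactly this formula:

```python
import pandas as pd

# Made-up closing prices to check pct_change() against the formula
prices = pd.Series([100.0, 102.0, 96.9])
by_method = prices.pct_change()            # (p_t - p_{t-1}) / p_{t-1}
by_formula = prices / prices.shift(1) - 1  # p_t / p_{t-1} - 1
print(by_method.round(6).tolist())         # [nan, 0.02, -0.05]
```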

In [10]:
returns = pd.DataFrame()

We can use pandas' pct_change() method on the Close column to create a column representing this return value. Here we write a for loop that, for each bank stock ticker, creates this returns column and sets it as a column in the returns DataFrame.

In [11]:
for tick in tickers:
    returns[tick+' Return'] = bank_stocks[tick]['Close'].pct_change()
returns.head()
Out[11]:
BAC Return C Return GS Return JPM Return MS Return WFC Return
Date
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.000951
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005714
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 0.000000

Now we create a pairplot using seaborn of the returns dataframe. Which stock stands out to you? Can you figure out why?

In [12]:
import seaborn as sns
# skip the first row, which is all NaN from pct_change()
sns.pairplot(returns[1:])
Out[12]:
<seaborn.axisgrid.PairGrid at 0x2a5ce53bd68>

Background on Citigroup’s Stock Crash available here.

You'll also see the enormous crash in value if you take a look at the stock price plot (which we do later in the visualizations).

Using this returns DataFrame, we try to find the dates on which each bank stock had its best and worst single-day returns. You should notice that 4 of the banks share the same day for the worst drop. Did anything significant happen that day?

Inauguration Day: a new president took the White House. Read more and share your learnings below.

In [13]:
# Worst Drop (4 of them on Inauguration day)
returns.idxmin()
Out[13]:
BAC Return   2009-01-20
C Return     2011-05-06
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]

You should have noticed that Citigroup’s largest drop and biggest gain were very close to one another, did anything significant happen in that time frame?

Citigroup had a stock split. So what is a stock split? Do share if you know about it.

In [14]:
# Best Single Day Gain
# citigroup stock split in May 2011, but also JPM day after inauguration.
returns.idxmax()
Out[14]:
BAC Return   2009-04-09
C Return     2011-05-09
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]
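If stock splits are new to you: in a 1-for-10 reverse split (which Citigroup did in May 2011), every ten old shares become one new share, so the quoted price jumps roughly tenfold overnight without any real gain. On unadjusted prices that shows up as a huge artificial "return", which is why the biggest "gain" lands right on the split date. A sketch with hypothetical closes:

```python
import pandas as pd

# Hypothetical unadjusted closes around a 1-for-10 reverse split:
# ten old shares become one new share, so the price jumps ~10x overnight
closes = pd.Series([4.52, 4.50, 45.00],
                   index=pd.to_datetime(['2011-05-05', '2011-05-06', '2011-05-09']))
raw_return = closes.pct_change()
print(round(raw_return.iloc[-1], 2))   # 9.0, an apparent +900% "gain"
```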

Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?

In [15]:
returns.std() # Citigroup is the riskiest - largest spread in the data
Out[15]:
BAC Return    0.036650
C Return      0.179969
GS Return     0.025346
JPM Return    0.027656
MS Return     0.037820
WFC Return    0.030233
dtype: float64
In [16]:
returns.loc['2015-01-01':'2015-12-31'].std() # Very similar risk profiles, but Morgan Stanley or BofA
Out[16]:
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64
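A common convention (assuming roughly 252 trading days per year) scales a daily-return standard deviation to an annual figure by multiplying by the square root of 252. For example, using BAC's 2015 daily figure from the output above:

```python
import math

# Convention (assumption: ~252 trading days/year): annualize a daily
# standard deviation by multiplying by sqrt(252)
daily_std = 0.016163                     # BAC's 2015 figure from above
annualized = daily_std * math.sqrt(252)
print(round(annualized, 4))              # ~0.2566, i.e. about 25.7% a year
```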

Here we create distplots using seaborn of the 2008, 2011 and 2015 returns for Morgan Stanley and Citigroup to see how the returns have changed over the years.

1. Morgan Stanley

In [17]:
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

sns.distplot(returns.loc['2008-01-01':'2008-12-31']['MS Return'], color='green', bins=50, ax=ax1)
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['MS Return'], color='green', bins=50, ax=ax2)
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['MS Return'], color='green', bins=50, ax=ax3)
Out[17]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a5d24a87f0>

2. Citigroup

In [18]:
f, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4), sharey=True)

sns.distplot(returns.loc['2008-01-01':'2008-12-31']['C Return'], color='green', bins=50, ax=ax1)
sns.distplot(returns.loc['2011-01-01':'2011-12-31']['C Return'], color='green', bins=50, ax=ax2)
sns.distplot(returns.loc['2015-01-01':'2015-12-31']['C Return'], color='green', bins=50, ax=ax3)
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a5d38d2438>

We get similar results: returns for Morgan Stanley are on the rise, whereas Citigroup's 2011 result is dramatic, and you have already read the reason why.


More Visualization

A lot of this project will focus on visualizations. Feel free to use any of your preferred visualization libraries to recreate the plots described below: seaborn, matplotlib, plotly and cufflinks, or just pandas.

Imports

In [19]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

Here we create a line plot showing the Close price for each bank for the entire index of time. Try using a for loop, or use .xs to get a cross section of the data.

In [20]:
for tick in tickers:
    bank_stocks[tick]['Close'].plot(figsize=(12,4),label=tick)
plt.legend()
Out[20]:
<matplotlib.legend.Legend at 0x2a5d385add8>
In [21]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').plot()
Out[21]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a5d3e45358>

You see all the line plots converging at a point near the end of 2008, and that marks the depth of the financial crisis.


Moving Averages

Let’s analyze the moving averages for these stocks in the year 2008.

Plotting the rolling 30 day average against the Close price for each bank's stock for the year 2008.

We will also plot the years 2011-2012 for Citigroup; it should show some variation, and we know why.

In [22]:
plt.figure(figsize=(12,6))
BAC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
BAC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='BAC CLOSE')
plt.legend()
  
Out[22]:
<matplotlib.legend.Legend at 0x2a5d2485668>
In [23]:
plt.figure(figsize=(12,6))
MS['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
MS['Close'].loc['2008-01-01':'2009-01-01'].plot(label='MS CLOSE')
plt.legend()
 
Out[23]:
<matplotlib.legend.Legend at 0x2a5d4045eb8>
In [24]:
plt.figure(figsize=(12,6))
WFC['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
WFC['Close'].loc['2008-01-01':'2009-01-01'].plot(label='WFC CLOSE')
plt.legend()
  
Out[24]:
<matplotlib.legend.Legend at 0x2a5d3ee3518>
In [25]:
plt.figure(figsize=(12,6))
JPM['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
JPM['Close'].loc['2008-01-01':'2009-01-01'].plot(label='JPM CLOSE')
plt.legend()
Out[25]:
<matplotlib.legend.Legend at 0x2a5d419c470>
In [26]:
plt.figure(figsize=(12,6))
GS['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
GS['Close'].loc['2008-01-01':'2009-01-01'].plot(label='GS CLOSE')
plt.legend()
Out[26]:
<matplotlib.legend.Legend at 0x2a5d424bfd0>
In [27]:
plt.figure(figsize=(12,6))
C['Close'].loc['2008-01-01':'2009-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
C['Close'].loc['2008-01-01':'2009-01-01'].plot(label='C CLOSE')
plt.legend()
 
Out[27]:
<matplotlib.legend.Legend at 0x2a5d3f5dd68>
In [28]:
plt.figure(figsize=(12,6))
C['Close'].loc['2011-01-01':'2012-01-01'].rolling(window=30).mean().plot(label='30 Day Avg')
C['Close'].loc['2011-01-01':'2012-01-01'].plot(label='C CLOSE')
plt.legend()
Out[28]:
<matplotlib.legend.Legend at 0x2a5d3f407f0>
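Before moving on, it may help to see rolling() on a miniature example: with window=3 (standing in for the 30 days above), the first two positions have no full window yet, so they come out as NaN:

```python
import pandas as pd

# Miniature of rolling(window=30): with window=3, the mean only appears
# once a full window of values is available
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
print(s.rolling(window=3).mean().tolist())   # [nan, nan, 2.0, 3.0, 4.0]
```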

Create a heatmap of the correlation between the stocks' Close prices.

A heatmap of the correlations will show how these bank stocks are related to each other in our data; we'll also turn on annot so that we can see the values.

Later, the cluster map will put our minds at rest by grouping the correlations into clusters, showing which bank moves with which.

In [29]:
sns.heatmap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x2a5d44c7b00>

Optional: Use seaborn’s clustermap to cluster the correlations together:

In [30]:
sns.clustermap(bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr(),annot=True)
Out[30]:
<seaborn.matrix.ClusterGrid at 0x2a5d565a048>

Great Job!

There are definitely a lot of finance-specific topics here, so don't worry if you didn't understand them all! The only thing you should be concerned with understanding is the basic pandas and visualization operations.

Feel free to comment your doubts below. Happy Learning 🙂


Tanishk Sachdeva
