Mock Data - Data Testing Tool - Pandas TimeSeries
Time Series can be useful to analyze financial data in different frequencies based on data distribution.
We can reload/repeat the data analysis process in Python and export the result to Excel or CSV on daily, weekly, monthly, quarterly or yearly basis.
Goals: Generate Mock Data
Overview Data Time Series Basics
- Example is Amazon stock daily closing prices from 1997 - 2021 in CSV format
- View date type: convert string to time sereies
- Set date as index : retrieve range of data
- Resample function : for plotting
- Assign date to the stock price without date column : with date frequency options
Mock Data
- Generate date range
- Use Numpy to geneerate random number
- Set index to match date and number
## Read the file from CSV format
df = pd.read_csv('data/Amazon.csv')
df.tail()
Find out Date type
- The Date type here is a string, not time series type
type(df.Date[0])
Convert string into time series type
- By adding Parse to read the CSV file
df = pd.read_csv('data/Amazon.csv', parse_dates=["Date"])
df.tail()
## Find out Date type again: the Date type is Timestamp which is one of the time series
type(df.Date[0])
Set index to Date by adding Index column (index_col) to Date
-The benefit of setting index is we can retrieve specific date ranges of Amazon stock prices
df = pd.read_csv('data/Amazon.csv', parse_dates=["Date"],index_col="Date")
df.index
## Retrieve September 2021 stock price
df["2021-09"].head()
Time Series enable us to do more financial analysis by few coding steps
- Get the average closing price of Amazon stock in March 2018
- Retrieve the specific date price
- Weekly, Monthly or Yearly average of the stock price
- Plot the stock price
## Get the average closing price in March 2018
df["2021-09"].Close.mean()
## Retrieving one specific day - can use loc slice function, but it's not necessary
df.loc["2021-09-15":"2021-09-15"]
## Retrieving with a date range
df["2021-09-10":"2021-09-14"]
## Yearly average of the stock price
df.Close.resample('Y').mean().head()
Resample data function
- Shift+tab to view all the possibilities of the Resample data function
- Without Resample, the chart is more grannuily
## Plot with resample - average monthly price
df.Close.resample('M').mean().plot()
## Plot without Resample
df.Close.plot()
Assigning date to a dataset with missing date
- The following example is the August 2021 Amazon stock prices but without any date
- Read the stock closing price csv file with the missing date column
- Assign the date to match the stock price
## Read August 2021 Amazon stock price csv file without date column
df1 = pd.read_csv('data\Amazon_No_Date.csv')
df1.tail()
Assign the date to match the data
- Use "rng" function to assign date to matach stock price
- Apply "B" (Business Day) as the frequency
- Shift+ tab twice on "B" to follow the link for more day function
- The Business day function does not consider holidays
rng = pd.date_range(start="8/1/2021", end = "8/31/2021", freq='B')
rng
Set the date column as the index
- You must specify "True" to modify the original dataframe
df1.set_index(rng,inplace=True)
df1.head(5)
## Average Amazon price between 8/1/21 - 8/18/21
df1["2021-08-01":"2021-08-18"].Close.mean()
Add weekends to the list
- The price would be the same as the previous business day's closing price
df1.asfreq('D', method='pad').head()
View stock data by weekly
- 8/1/21 is Sunday, the weekly business date starts at 8/2/21
df1.asfreq('W', method='pad')
Mock Data - Generating Dates and Random Data for Testing
- Set the date frequency to 'B' for business days
- Import numpy to generate random numbers
- Set the index to match the random numbers
Generating business days
- Generate 30 business days starting from 9/10/2021
rng = pd.date_range(start='9/10/2021', periods=30, freq='B')
rng
Import numpy to generate random numbers
- Generate random numbers between 1 to 20 for the length of 30 days
import numpy as np
np.random.randint(1,20,len(rng))