Database for Financial Analysis I - SQLite
After merging and reshaping the datasets, we can use SQLite to store, organize and manipulate data in smaller environments.
SQLite is a public-domain software package that provides a relational database management system.
SQLite is a public-domain software package that provides a relational database management system.
Goals: Automate Financial Analysis Process in SQLite
- Install Scientific Python "sqlite3" to convert Excel file into SQLite database in 8 lines of code
- Excel file will be updated after rerun/reconnect to SQLite
- Set index and modifying primary key in SQLite or in Python
- Create and save query result table in SQLite and Excel
- Create and alter tables with pandas
- Insert columns to the tables
- Use free graphical user interface (GUI) tools to run SQL queries on database
Connecting SQLite
- The first row in each sheet should have legal column names
- Excel tab name cannot have spaces, but underscores are acceptable
- Here is the conversion codes:
- Import library sqlite3
- Name of Excel .xlsx file. SQLite database will have the same name and extension .db (filename+".db")
- Use if_exists so we can repeat the process
Connect Excel file into SQlite
- The example file is a lsit of stocks with 4 tabs - market value, share, cost and combined
In [2]:
filename="data/SQLite/Stock"
con=sqlite3.connect(filename+".db")
wb=pd.read_excel(filename+'.xlsx',sheet_name=None)
for sheet in wb:
wb[sheet].to_sql(sheet,con, index=False, if_exists='replace')
con.commit()
#con.close()
Query the database
- The benefit of SQlite is we can set up unique identifer as primay key
- We can repeat/automate the query process when new data get updated
- Save the query result to SQLite or Excel
Join the table
- Option A : Inner join between the reinsurance, premium and loss tables
- Option B : Selecting row
Option A : Using SQL for Inner Join between three tables
In [3]:
sql_statement='''
select
m.Symbol as Market_Tab,
s.Symbol as Share_Tab,
c.Symbol as Cost_Tab
from Market as m
join Share as s on s.Symbol=m.Symbol
join Cost as c on c.Symbol=m.Symbol
'''
df = pd.read_sql(sql_statement,con)
df.to_sql('/SQlite/QueryResult', con,if_exists='replace')
df.to_excel('data/SQlite/QueryResult.xlsx')
Option B : Using Pandas issuing SQL command for Inner Join between three tables
In [4]:
sql_statement='select * from Market'
df_market = pd.read_sql(sql_statement,con)
df_market['Market_Tab']=df_market['Symbol']
sql_statement='select * from Share'
df_share = pd.read_sql(sql_statement,con)
df_share['Share_Tab']=df_share['Symbol']
sql_statement='select * from Cost'
df_cost = pd.read_sql(sql_statement,con)
df_cost['Cost_Tab']=df_cost['Symbol']
df_merge = pd.merge(df_market, df_share, on=['Symbol'], how='inner', suffixes=['_from_Market_Tab','_from_Share_Tab'])
df_merge = pd.merge(df_merge, df_cost, on=['Symbol'], how='inner', suffixes=['','_from_Cost_Tab'])
df_merge[['Market_Tab','Share_Tab','Cost_Tab']].head()
df_merge=df_merge[['Market_Tab','Share_Tab','Cost_Tab']]
print("rows=%s" % df_merge.shape[0])
df_merge.to_sql('/SQlite/QueryRresult2', con,if_exists='replace')
df_merge.to_excel('data/SQlite/QueryResult2.xlsx')
Out[4]:
DB Browser - a tool to visualize and access SQLite databases
- DB Browser for SQLite is a high quality, visual, open source tool used to create, design, and edit database files compatible with SQLite.
- Save the query result to SQLite
- Below is the screenshot of the database structures after connecting from Excel to SQLite by using Python
Insert a column to table
- Option A - use DML to add a column :-|
- Option B - use to_sql to add a column - much easier and more native to PANDAS way of doing things
View table before inserting a new column
In [5]:
sql_statement='''
select s.* from share as s
'''
df = pd.read_sql(sql_statement,con)
df.head()
Out[5]:
Option A - use DML to add a column :-|
In [6]:
## Obtain a Cursor object to execute SQL statements
cur=con.cursor()
## Add a new column to share table
addColumn = "ALTER TABLE Share ADD COLUMN BROKER varchar(32)"
cur.execute(addColumn)
sql_statement='''
select s.* from Share as s
'''
df = pd.read_sql(sql_statement,con)
df.head()
Out[6]:
Out[6]:
Option B - use to_sql to add a column - much easier and more native to PANDAS way of doing things
In [7]:
df['Broker2']='Fidelity LLC'
df.to_sql('Share',con,if_exists='replace')
df.head()
Out[7]:
In [9]:
## Rename the SQLite Table
renameTable = "ALTER TABLE Combined RENAME TO CombinedNew"
cur.execute(renameTable)
Out[9]:
Inserting a new column called "testing" to the SQlite
In [10]:
con = sqlite3.connect("data/SQLite/Stock.db")
cur = con.cursor()
addColumn = "ALTER TABLE Market ADD COLUMN Broker_Name varchar(32)"
cur.execute(addColumn)
Out[10]:
In [ ]:
## Closing the connection to the database file
con.close()