top of page

Use MySQL with Python: The Ultimate Combo for Data Analysts and Data Scientists

Updated: Feb 28



For Data Analysts and Data Scientists, Python and SQL are two of the most crucial languages.


We'll go over all you need to know to connect Python and SQL in this blog post.


You'll discover how to import data directly into your Pandas Dataframe from relational databases, save data from your Python program in a custom database, and handle any additional use cases you could think of.


In this blog we will cover:


  • Why learn how to use Python and SQL together?

  • Prerequisites for Python-MySQL connection

  • Connecting to MySQL Server in Python

  • Running MySQL Queries with Python

  • Extracting data from a MySQL Database

  • Creating a Pandas Dataframe by extracting data from MySQL


There is a ton of cool and valuable information there. Let's get started!


Why Python with SQL?


Python has many benefits for data analysts and data scientists. It is a very helpful tool for any data analyst because of the wide variety of open-source libraries available.

For data analysis, we have Pandas and NumPy, for visualisation, Matplotlib and Seaborn, and for machine learning applications, TensorFlow, Scikit-Learn, and PyTorch (plus many, many more).



Python is one of the fastest-growing programming languages available, and it's quite easy to notice why with its (relatively) simple learning curve and adaptability.


So it's important to consider where all of this data comes from if we're using Python for data analysis.

Despite the enormous range of sources for datasets, data is frequently kept in relational databases, especially in commercial enterprises. Relational databases are a very effective, potent, and often-used method for generating, reading, updating, and deleting data of all kinds.


The most popular relational database management systems (RDBMSs), including Oracle, MySQL, Microsoft SQL Server, PostgreSQL, and IBM DB2, all access and modify data using the Structured Query Language (SQL).


Keep in mind that because each RDBMS employs a somewhat different flavour of SQL, SQL code created for one RDBMS typically won't function properly in another without (mostly minimal) changes. However, the ideas, frameworks, and methods are substantially the same.






Getting Started (Prerequisites):

  • Python / Anaconda installed

  • IDE (we’re using Jupyter Notebook)

  • MySQL Installed

Note: You can download Anaconda and Jupyter here and MySQL from here.


We'll need to get them connected to each other after you've got those setup.


Installing the MySQL Connector Python library is necessary for that. To accomplish this, utilise pip or adhere to the instructions:

copy this code from here:

pip install mysql-connector-python


Make sure that you have pandas installed as well because we'll be using it.

copy this code from here:

pip install pandas


Importing Libraries:


Importing our libraries is the first step we want to take, as with every Python project.


In order to prevent problems for others reading or reviewing our code, it is excellent practice to import all of the libraries we want to utilise at the start of the project.


Only the pandas and MySQL Connector libraries will be used in this tutorial.

copy this code from here:

import mysql.connector

import pandas as pd


Connecting to MySQL Server:


By this point, we should have MySQL Community Server set up on our system. Now we need to write some code in Python that lets us establish a connection to that server.

copy this code from here:

conn = mysql.connector.connect(user='root', database='test_db',

password='your_password',

host="localhost")

cursor = conn.cursor()



To better grasp what is going on, let's examine this code line by line:


The first part tries to create a connection object to the server using the “mysql.connector.connect()” method using the details specified by the user in the arguments such as “user” which specifies username, “database” the name of the database we want to connect to, “password” that is the password of the MySQL server and lastly the “host” which is the MySQL server address.


We use this in practice by assigning the output of the function to a variable “conn”, which then becomes our connection object.


The second part creates a cursor object into a “cursor” variable. It is an object that is used to make the connection for executing SQL queries. It acts as middleware between MySQL database connection and SQL query. It is created after giving connection to MySQL database.



Inserting data into the database using INSERT:


The following code snippet executes the SQL query five times to generate data:


copy this code from here:

add_record = ("""INSERT INTO test_table

(id, value)

VALUES (%s, %s)""")


for i in range(5):

cursor.execute(add_record, (i, "Record " + str(i)))


conn.commit()

cursor.close()

conn.close()



First of all we assign our SQL command (explained in detail here) to a variable with an appropriate name. Note that this multi-line formatting is purely for the benefit of humans reading our code. Neither SQL nor Python 'care' if the SQL command is spread out like this. So as long as the syntax is correct, both languages will accept it. The same is true for the CAPITALISATION of operators in SQL. This is a widely-used convention that is strongly recommended, but the actual software that runs the code is case-insensitive because SQL is not case-sensitive, and will treat 'INSERT INTO test_table' and 'insert into test_table' as identical commands.

Next, we use a for loop to make the cursor object to execute our query 5 times. Then we commit the changes in the test_table using conn. commit().


Lastly, we close the cursor and connection to make the changes permanent.



Extract the data from a database in your MySQL server:


This example code extract data directly from the database using the connector:

copy this code from here:

conn = mysql.connector.connect(user='root', database='test_db',

password='your_password',

host="localhost")

cursor = conn.cursor()


query = "SELECT id, value FROM test_table"


cursor.execute(query)


for (id, value) in cursor:

print("ID={}, Value={}".format(id, value))


cursor.close()

conn.close()



Output:




Once again the first part tries to create a connection object to the server using the “mysql.connector.connect()” method.


Note: You don’t have to establish a connection in a Python file then you don’t have to do it again and again.


The second part creates a cursor object into a “cursor” variable for executing SQL queries.


Then we assign our SQL command (explained in detail here) to a variable with an appropriate name and execute it by using the cursor object.


Now we use a simple for loop to iterate through all the records in the given table in the database. Then we close the connection and cursor object.


But this is not an efficient way of extracting data from a table and we cannot use this for data analysis.



Creating a Dataframe table using Python’s Pandas library:


We can also use Pandas framework to create a DataFrame object. The performance will also be better instead of using the cursor. For Data Analysts using Python, pandas is our beautiful and trusted old friend. It's very simple to convert the output from our database into a DataFrame, and from there the possibilities are endless!


copy this code from here:

conn = mysql.connector.connect(user='root', database='test_db',

password='your_password',

host="localhost")


query = "SELECT id, value FROM test_table"


df = pd.read_sql(query, con=conn)

display(df)

conn.close()


Output:



The above code snippet first creates a database connection and then uses Pandas read_sql() API to retrieve data.

With just a few lines of code, we can easily extract all the data we can handle from the relational databases where it lives and pull it into our state-of-the-art data analytics pipelines. This is really helpful stuff.



CRUD:


Now as you have seen how you can manipulate MySQL databases with ease, you can also perform:

  • Create - entirely new databases, tables and records

  • Read - extract data from a database, and store that data in multiple formats

  • Update - make changes to existing records in the database

  • Delete - remove records which are no longer needed


These are fantastically useful things to be able to do.



Conclusion:


We have covered a lot of ground in this tutorial.


We have learned how to use Python and MySQL Connector to create an entirely new database in MySQL Server, create tables within that database, define the relationships between those tables, and populate them with data.


We have looked at how to extract data from existing databases and load them into pandas DataFrames, ready for analysis.


We hope this tutorial has helped you to see how we can use Python and SQL together to be able to manipulate data even more effectively! Join IOTA Academy to learn industry-ready Data Analysis and Data Science courses.






Recent Posts

See All
bottom of page