In this lesson, we will learn how to read and write files. This is a great tool to have in our toolbox, as many programs need to read data from .txt or .csv files before manipulating it. We will be working with two different file types here:
- .txt
- .csv
Reading .txt Files
We can think of a .txt file as a series of lines, each ending with the newline character \n. We will look at different ways of reading this file type.
with open("sampleText.txt") as file:
    # Read the whole file into a single string
    data = file.read()
    print(data)
# No file.close() needed: the with block has already closed the file
Hello World!
I am doing great today!
How are you doing?
I am also fine thanks!
Have a great day!
Thanks you too!
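Here the with statement opens the file and guarantees that it is closed again when the block ends, even if an error occurs inside the block. It does not catch errors by itself: if the file does not exist, open still raises a FileNotFoundError. Using with is considered best practice. Because the file is closed automatically, we do not need to call file.close() ourselves; an explicit file.close() is only required when a file is opened without with. Closing files promptly releases the operating system's file handle and makes sure that buffered writes reach the disk.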
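To see what with does for us, here is a minimal sketch. It assumes a throwaway file created in a temporary directory (the names demo.txt and missing.txt are made up for this illustration, so that the example runs on its own):

```python
import os
import tempfile

# Hypothetical sample file, created here so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w") as file:
    file.write("Hello World!\n")

with open(path) as file:
    data = file.read()
    inside = file.closed   # False: the file is still open inside the block

after = file.closed        # True: the file was closed when the block ended
print(inside, after)

# A missing file still raises FileNotFoundError; "with" does not hide it
try:
    with open(os.path.join(tmp_dir, "missing.txt")) as file:
        file.read()
    missing_error = None
except FileNotFoundError as error:
    missing_error = type(error).__name__
print(missing_error)
```

If we want the program to keep running when a file is missing, we have to wrap the with block in try/except ourselves, as in the second half of the sketch.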
Reading Lines
Remember how we said that a file can be thought of as lines that each end with the \n character? This is helpful because, instead of reading all the lines in a file, we can read only a certain portion. We can do this like so:
with open("sampleText.txt") as file:
    print(file.readline())
    print(file.readline())
Hello World!
I am doing great today!
Note how each call to the file.readline() method returns the next line in the file. Thus, each time we call it, we move down one line in the file.
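One detail worth knowing is what happens when we run out of lines. The sketch below assumes a small temporary file (demo.txt is a made-up name) and keeps calling readline() until it returns an empty string, which is how Python signals the end of the file:

```python
import os
import tempfile

# Hypothetical two-line file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as file:
    file.write("Hello World!\nI am doing great today!\n")

lines_seen = []
with open(path) as file:
    while True:
        line = file.readline()
        if line == "":                         # empty string means end of file
            break
        lines_seen.append(line.rstrip("\n"))   # drop the trailing newline

print(lines_seen)
```

Each line returned by readline() still carries its trailing \n, which is why the sketch strips it before storing the text.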
Reading Lines into Lists
We can also read the lines of a file and store them in a list. Essentially, we would be making a list of strings. This would look something like this:
lineList = []
with open("sampleText.txt") as file:
    # Iterating over the file yields one line at a time
    for x in file:
        lineList.append(x)
print(lineList)
['Hello World!\n', 'I am doing great today!\n', 'How are you doing?\n', 'I am also fine thanks!\n', 'Have a great day!\n', 'Thanks you too!\n']
Do you see how we get a list of strings? Also note the \n character at the end of each string.
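If the trailing \n characters get in the way, we can strip them while building the list. This is a minimal sketch that assumes a small temporary file (demo.txt is a made-up name); it also shows the built-in readlines() method, which produces the same kind of list as the loop in one call:

```python
import os
import tempfile

# Hypothetical sample file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as file:
    file.write("Hello World!\nHow are you doing?\n")

# readlines() returns every line, newline characters included
with open(path) as file:
    raw_lines = file.readlines()

# A list comprehension lets us strip the newline from each line as we go
with open(path) as file:
    clean_lines = [line.rstrip("\n") for line in file]

print(raw_lines)
print(clean_lines)
```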
Writing to .txt Files
Now the goal is to generate our own text and then write it to a file. We can either append to an existing file or create a new file. We will discuss both approaches.
Add to an existing file
Here, we are going to reference the sampleText.txt file we were working with previously.
with open("sampleText.txt", "a") as file:
    file.write("I am adding another line at the end of the file\nOh look this is another line\n")

# Now let's read the file to see what we wrote in it
with open("sampleText.txt", "r") as file:
    print(file.read())
Hello World!
I am doing great today!
How are you doing?
I am also fine thanks!
Have a great day!
Thanks you too!
I am adding another line at the end of the file
Oh look this is another line
Note the use of the "a" and "r" mode arguments. The "a" mode tells the open function that we will be appending to our file, so new content is only added at the end of the file. The "r" mode indicates that we will only be reading the file. If we try to write to it, the program will raise an error.
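The difference between the two modes can be checked with a short sketch. It assumes a temporary file (demo.txt is a made-up name) and relies on the fact that writing to a file opened for reading raises io.UnsupportedOperation:

```python
import io
import os
import tempfile

# Hypothetical file so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as file:
    file.write("first line\n")

# "a" keeps the existing content and adds to the end
with open(path, "a") as file:
    file.write("second line\n")

# "r" only allows reading, so write() fails
try:
    with open(path, "r") as file:
        file.write("this will fail\n")
    write_error = None
except io.UnsupportedOperation as error:
    write_error = error

with open(path) as file:
    contents = file.read()

print(contents)
print(type(write_error).__name__)
```

Note that "r" is also the default mode, which is why open("sampleText.txt") with no second argument works for reading.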
Overwrite the contents of a file
Here we will replace the contents of the file with entirely new content. We pass "w" to the open function to indicate that we are writing to the file. Opening a file with "w" erases whatever it contained before.
with open("sampleText.txt", "w") as file:
    file.write("I am erasing the old content\n")
    file.write("Oh look I can add a new line!\n")

with open("sampleText.txt", "r") as file:
    print(file.read())
I am erasing the old content
Oh look I can add a new line!
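It is worth stressing that the old content disappears the moment the file is opened with "w", before anything is written. A minimal sketch, assuming a temporary file (demo.txt is a made-up name):

```python
import os
import tempfile

# Hypothetical file with some existing content
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as file:
    file.write("old content\n")
size_before = os.path.getsize(path)

# Opening with "w" truncates the file even though we write nothing
with open(path, "w") as file:
    pass

size_after = os.path.getsize(path)
print(size_before, size_after)
```

So if we only want to add to a file, "a" is the safe choice; "w" should be reserved for cases where losing the old content is intended.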
Creating a new .txt File
There are two ways to create a new file. One uses the "w" mode that we employed earlier. If the file path we reference does not exist, a new file is created. If it does exist, the existing file is emptied and overwritten.
The other uses the "x" mode. If the file name we pass already exists, we get a FileExistsError instead of overwriting the file.
# Using the "w" mode
with open("newFile.txt", "w") as file:
    file.write("Oh look, I created a new file!")

with open("newFile.txt", "r") as file:
    print(file.read())
Oh look, I created a new file!
# Using the "x" mode
with open("anotherNewFile.txt", "x") as file:
    file.write("I created another new file!")

with open("anotherNewFile.txt", "r") as file:
    print(file.read())
I created another new file!
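Running the "x" example a second time fails, because the file now exists. The sketch below reproduces this in a temporary directory (used here only so the example does not touch our real files) and handles the error:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "anotherNewFile.txt")

# First attempt: the file does not exist yet, so "x" creates it
with open(path, "x") as file:
    file.write("I created another new file!")

# Second attempt: the file exists, so open raises FileExistsError
try:
    with open(path, "x") as file:
        file.write("This line is never written")
    second_attempt = "created"
except FileExistsError:
    second_attempt = "already exists"

with open(path) as file:
    contents = file.read()

print(second_attempt)
print(contents)
```

This makes "x" a good fit when overwriting an existing file by accident would be a problem.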
Reading .csv Files
Now we will be working with a common data storage format, the Comma Separated Values (CSV) file. Such a file can often be opened in Excel, where each data point appears in its own cell. One way of reading a csv file is to use the same techniques we used with the .txt files, although this is not the preferred approach. The sample file we will be working with comes from here.
import csv  # the csv module is used later on, when we write rows
with open("pokemon.csv", "r") as file:
    # Read the raw text, just as we did with the .txt files
    print(file.read())
Pokemon,Type
Bulbasaur,Grass
Ivysaur,Grass
Fearow,Normal
Ekans,Poison
Arbok,Poison
Pikachu,Electric
Raichu,Electric
Tynamo,Electric
Eelektrik,Electric
...
Diancie,Rock
Hoopa,Psychic
Volcanion,Fire
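Reading the raw text leaves us to split every line on commas ourselves. The csv module can do that for us. Here is a minimal sketch using csv.DictReader, assuming a small stand-in file (pokemon_sample.csv is a made-up name holding a few of the rows shown above) rather than the full pokemon.csv:

```python
import csv
import os
import tempfile

# Hypothetical stand-in for pokemon.csv so the example runs on its own
path = os.path.join(tempfile.mkdtemp(), "pokemon_sample.csv")
with open(path, "w", newline="") as file:
    file.write("Pokemon,Type\nBulbasaur,Grass\nIvysaur,Grass\nPikachu,Electric\n")

# DictReader uses the header row as keys and yields one dict per data row
with open(path, "r", newline="") as file:
    reader = csv.DictReader(file)
    rows = list(reader)

electric = [row["Pokemon"] for row in rows if row["Type"] == "Electric"]
print(rows[0]["Pokemon"], rows[0]["Type"])
print(electric)
```

Each row can now be accessed by column name, which is easier to read than indexing into a split string.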
This is a good point to start discussing the pandas library. Pandas is generally preferred for reading csv files because it loads the data directly into a DataFrame object that we can process more easily in Python. You can install pandas by running this command in your terminal after you have installed Python:
pip install pandas
or
pip3 install pandas
Luckily, Google Colab already has pandas pre-installed, so we have nothing to worry about. But what exactly is pandas? Pandas is a library with many functions for storing and working with data, specifically in DataFrames. This helps us reference and organize tabular data.
Here is some sample code that uses pandas to read a simple .csv file.
import pandas as pd
dataFrame = pd.read_csv("pokemon.csv")
print(dataFrame.head())
      Pokemon   Type
0   Bulbasaur  Grass
1     Ivysaur  Grass
2  Charmander   Fire
3  Charmeleon   Fire
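Once the data is in a DataFrame, we can start asking questions of it. The following is a small sketch, assuming a stand-in CSV held in memory with io.StringIO (the variable names are made up for this illustration) instead of the real pokemon.csv:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV containing the four rows shown above
sample_csv = io.StringIO(
    "Pokemon,Type\nBulbasaur,Grass\nIvysaur,Grass\nCharmander,Fire\nCharmeleon,Fire\n"
)
frame = pd.read_csv(sample_csv)

shape = frame.shape                                   # (rows, columns)
fire_names = frame[frame["Type"] == "Fire"]["Pokemon"].tolist()
type_counts = frame["Type"].value_counts().to_dict()

print(shape)
print(fire_names)
print(type_counts)
```

Filtering and counting like this would take noticeably more code with plain file reading, which is one reason pandas is popular for tabular data.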
Writing to .csv Files
Now that we have read the data, we can write it. I will show you two ways: one that uses the csv library and one that uses pandas.
Here is the first method:
import csv

# newline="" lets the csv module manage line endings itself
with open("pokemon.csv", "a", newline="") as file:
    pokemon_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    pokemon_writer.writerow(['Pickachu', 'Electric'])

with open("pokemon.csv", "r") as file:
    print(file.read())
Pokemon,Type
Bulbasaur,Grass
Ivysaur,Grass
Fearow,Normal
Ekans,Poison
Arbok,Poison
Pikachu,Electric
Raichu,Electric
Tynamo,Electric
Eelektrik,Electric
...
Diancie,Rock
Hoopa,Psychic
Volcanion,Fire
Pickachu,Electric
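The quotechar and quoting arguments only matter when a value contains the delimiter. The sketch below assumes a small stand-in file (pokemon_sample.csv) and a made-up value containing a comma, purely to show what csv.QUOTE_MINIMAL does:

```python
import csv
import os
import tempfile

# Hypothetical stand-in file so the example does not modify pokemon.csv
path = os.path.join(tempfile.mkdtemp(), "pokemon_sample.csv")
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Pokemon", "Type"])
    writer.writerow(["Bulbasaur", "Grass"])

# Append several rows at once with writerows
with open(path, "a", newline="") as file:
    writer = csv.writer(file, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows([
        ["Pikachu", "Electric"],
        ["Nidoran, female", "Poison"],   # made-up value containing a comma
    ])

# The raw text shows that only the value with a comma was quoted
with open(path, "r") as file:
    text_lines = file.read().splitlines()

# Reading back with csv.reader restores the original value
with open(path, "r", newline="") as file:
    rows = list(csv.reader(file))

print(text_lines[-1])
print(rows[-1])
```

With QUOTE_MINIMAL, quotes are added only where they are needed to keep a row unambiguous, so ordinary values are written unchanged.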
This second method is more flexible because we can reference the DataFrame as if it were a 2D array. This allows us to add Pokemon anywhere, not just at the end of the file.
import pandas as pd
dataFrame = pd.read_csv("pokemon.csv")
# Build a one-row DataFrame holding the new entry
newRow = pd.DataFrame([["Salamence", "Flying"]], columns=dataFrame.columns)
# Insert it at index position 2 by joining the rows before and after that position
dataFrame = pd.concat([dataFrame.iloc[:2], newRow, dataFrame.iloc[2:]], ignore_index=True)
print(dataFrame.head())
# We can now write the dataFrame back to the CSV file we read from
dataFrame.to_csv("pokemon.csv", index=False, header=True)
      Pokemon    Type
0   Bulbasaur   Grass
1     Ivysaur   Grass
2   Salamence  Flying
3  Charmander    Fire
4  Charmeleon    Fire
As shown above, iloc selects rows by integer position. Slicing with iloc[:2] and iloc[2:] and joining the pieces around the new row with pd.concat lets us insert the new information at any index in the file. This is different from assigning directly to dataFrame.iloc[2] or dataFrame.loc[2], which replaces the contents that already exist in that row.
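To make the difference between replacing and inserting concrete, here is a small sketch on a stand-in DataFrame (built in code so that pokemon.csv is not needed; the variable names are made up for this illustration). Assigning through iloc overwrites the row at that position, while an insertion has to be assembled from slices with pd.concat:

```python
import pandas as pd

# Hypothetical stand-in data so the example runs without pokemon.csv
frame = pd.DataFrame({
    "Pokemon": ["Bulbasaur", "Ivysaur", "Charmander", "Charmeleon"],
    "Type": ["Grass", "Grass", "Fire", "Fire"],
})

# Assignment through iloc overwrites the row at position 2
replaced = frame.copy()
replaced.iloc[2] = ["Salamence", "Flying"]

# Insertion: rows before position 2, the new row, then the remaining rows
new_row = pd.DataFrame([["Salamence", "Flying"]], columns=frame.columns)
inserted = pd.concat([frame.iloc[:2], new_row, frame.iloc[2:]], ignore_index=True)

print(len(replaced), replaced["Pokemon"].tolist())
print(len(inserted), inserted["Pokemon"].tolist())
```

The replaced frame keeps its four rows and loses Charmander, while the inserted frame grows to five rows and keeps every original entry.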