Lesson 2 - Reading and Writing Files

In this lesson, we will learn how to read and write files. This is a great tool to have in our toolbox, as many programs read data from .txt or .csv files and then manipulate it. We will be working with two different file types here:

  • .txt
  • .csv

Reading .txt Files

We can think about .txt files as being a series of lines. Each line ends with the newline character, \n. We will look at different ways of reading this file type.

with open("sampleText.txt") as file:
  data = file.read()
  print(data)

  Hello World!
  I am doing great today!
  How are you doing?
  I am also fine thanks!
  Have a great day!
  Thanks you too!

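Here the with statement keeps our syntax clean and automatically closes the file when the indented block ends, even if an error occurs partway through. This is part of best practices. Without with, a good habit is to always close the file after we are done with it by calling file.close(); inside a with block this call is unnecessary. Closing files releases the resources the operating system holds for them and makes sure written data is saved to disk. Note that with does not catch errors for us: if the file does not exist, open still raises a FileNotFoundError.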
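A minimal sketch of what the with statement does and does not do. It writes a throwaway file to a temporary folder so that it does not depend on sampleText.txt, then checks that the file was closed automatically. The helper read_text_or_none and the file names demo.txt and missing.txt are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it is missing."""
    try:
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        # The with statement does not catch this error; the try/except does
        return None


# Create a throwaway file so the example does not depend on sampleText.txt
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")
with open(demo_path, "w") as file:
    file.write("Hello World!\n")

with open(demo_path, "r") as file:
    text = file.read()

# Leaving the with block closed the file without calling file.close()
was_closed = file.closed
missing = read_text_or_none(os.path.join(demo_dir, "missing.txt"))

print(was_closed)
print(missing)
```

Wrapping open in try/except is one reasonable way to handle a missing file; whether to return None, print a message, or let the error stop the program depends on the program.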

Reading Lines

Remember how we first said that files can be thought of as lines that are each separated by the \n character? This is helpful for us because instead of reading all the lines in a file, we can break it down and only read a certain portion. We can do this like so:

with open("sampleText.txt") as file:
  print(file.readline())
  print(file.readline())
Hello World!

I am doing great today!

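Note how each call to the file.readline() method returns the next line in the file. Thus, each time we call it, we move down one line in the file. The blank lines in the output appear because readline() keeps the \n at the end of each line and print adds a newline of its own.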
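To see exactly what readline() hands back, the sketch below reads a throwaway two-line file (a stand-in for sampleText.txt, created in a temporary folder) and inspects each result with repr. The variable names are illustrative assumptions.

```python
import os
import tempfile

# Throwaway two-line file standing in for sampleText.txt
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as file:
    file.write("Hello World!\nI am doing great today!\n")

with open(demo_path) as file:
    first = file.readline()
    second = file.readline()
    third = file.readline()  # nothing is left to read at this point

print(repr(first))         # repr makes the trailing \n visible
print(first.rstrip("\n"))  # strip the newline to avoid a blank line
print(repr(third))         # an empty string signals the end of the file
```

Checking for an empty string is a common way to detect the end of a file when reading line by line.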

Reading Lines into Lists

We can also read the lines of a file and store them in a list. Essentially, we would be making a list of strings. This would look something like this:

lineList = []
with open("sampleText.txt") as file:
  # Looping over the file gives us one line at a time
  for line in file:
    lineList.append(line)
  print(lineList)
['Hello World!\n', 'I am doing great today!\n', 'How are you doing?\n', 'I am also fine thanks!\n', 'Have a great day!\n', 'Thanks you too!\n']

Do you see how we get a list of strings? Also note the \n character at the end of each string.
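If the trailing \n characters are not wanted, there are a few built-in alternatives to the loop. The sketch below compares readlines(), which keeps the newlines, with read().splitlines(), which drops them. It uses a throwaway file in a temporary folder rather than sampleText.txt, and the variable names are illustrative.

```python
import os
import tempfile

# Throwaway file standing in for sampleText.txt
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w") as file:
    file.write("Hello World!\nHow are you doing?\n")

# readlines() returns every line as a list, newlines included
with open(demo_path) as file:
    with_newlines = file.readlines()

# read().splitlines() returns the same lines without the newlines
with open(demo_path) as file:
    without_newlines = file.read().splitlines()

# A list comprehension can also strip the newlines after the fact
stripped = [line.rstrip("\n") for line in with_newlines]

print(with_newlines)
print(without_newlines)
```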

Writing to .txt Files

Now the goal is to generate our own text and then write it to a file. We can either append to an existing file, or create a new file. We will discuss both approaches.

Add to an existing file

Here, we are going to reference the sampleText.txt file we were working with previously.

with open("sampleText.txt", "a") as file:
  file.write("I am adding another line at the end of the file \n Oh look this is another line \n")

# Now let's read the file to see what we wrote in it
with open("sampleText.txt", "r") as file:
  print(file.read())
Hello World!
I am doing great today!
How are you doing?
I am also fine thanks!
Have a great day!
Thanks you too!
I am adding another line at the end of the file
 Oh look this is another line

Note the use of the "a" and "r" parameters. The "a" tells the open function that we will be appending to the file, meaning that we will only be adding content to the end of it. Using "r" indicates that we will only be reading the file; this is also the default when no mode is given. If we try to write to a file opened with "r", the program will raise an error.
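A small sketch of both behaviors, using a throwaway file in a temporary folder instead of sampleText.txt. In current Python versions, writing to a file opened with "r" raises io.UnsupportedOperation; the flag write_failed and the other names are illustrative assumptions.

```python
import io
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# "w" creates the file (this mode is covered later in the lesson)
with open(demo_path, "w") as file:
    file.write("first line\n")

# "a" keeps the existing content and adds to the end
with open(demo_path, "a") as file:
    file.write("second line\n")

# "r" only allows reading, so write() raises an error
write_failed = False
try:
    with open(demo_path, "r") as file:
        file.write("this will not be written\n")
except io.UnsupportedOperation:
    write_failed = True

with open(demo_path) as file:
    contents = file.read()

print(write_failed)
print(contents)
```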

Overwrite the contents of a file

Here we will be replacing the contents of the file with entirely new content. We will be using "w" in our open function to indicate that we are writing to it. Opening an existing file with "w" erases everything that was in it.

with open("sampleText.txt", "w") as file:
  file.write("I am erasing the old content \n")
  file.write("Oh look I can add a new line! \n")

with open("sampleText.txt", "r") as file:
  print(file.read())
I am erasing the old content
Oh look I can add a new line!

Creating a new .txt File

There are two ways that you can create a new file. One uses the "w" parameter that we employed earlier. This creates the file if the path we reference does not already exist. If it does exist, the contents of the existing file are erased and replaced with what we write. The other uses the "x" parameter, which only ever creates new files: if the file name we pass already exists, we get a FileExistsError.

# Using the "w" parameter
with open("newFile.txt", "w") as file:
  file.write("Oh look, I created a new file!")

with open("newFile.txt", "r") as file:
  print(file.read())
Oh look, I created a new file!

# Using the "x" parameter
with open("anotherNewFile.txt", "x") as file:
  file.write("I created another new file!")

with open("anotherNewFile.txt", "r") as file:
  print(file.read())
I created another new file!
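The difference between "x" and "w" only shows up when the file already exists. The sketch below creates a throwaway file in a temporary folder, tries to create it a second time with "x", and then overwrites it with "w". The flag already_existed and the other variable names are illustrative assumptions.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "newFile.txt")

# The file does not exist yet, so "x" creates it
with open(demo_path, "x") as file:
    file.write("original text")

# A second "x" on the same path fails instead of touching the file
already_existed = False
try:
    with open(demo_path, "x") as file:
        file.write("this is never written")
except FileExistsError:
    already_existed = True

with open(demo_path) as file:
    after_x = file.read()

# "w" on the same path succeeds and erases the old content
with open(demo_path, "w") as file:
    file.write("replacement")

with open(demo_path) as file:
    after_w = file.read()

print(already_existed, after_x, after_w)
```

Using "x" is a reasonable choice whenever silently overwriting an existing file would be a problem.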

Reading .csv Files

Now we will be working with a common data storage format, the Comma Separated Values (CSV) file. This file can often be opened in Excel to view each data point in its own cell. One way of reading a csv file is just to use the same techniques we used with the .txt files. This is not preferred though, because it leaves us to split each line into fields ourselves. The sample file we will be working with is pokemon.csv.


import csv  # not needed to read raw text; used later when we write rows

with open("pokemon.csv", "r") as file:
  # Read the raw text, exactly as we did with the .txt files
  print(file.read())
Pokemon,Type
Bulbasaur,Grass
Ivysaur,Grass
Fearow,Normal
Ekans,Poison
Arbok,Poison
Pikachu,Electric
Raichu,Electric
Tynamo,Electric
Eelektrik,Electric
...

Diancie,Rock
Hoopa,Psychic
Volcanion,Fire
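The csv library can also split each line into fields for us. As a hedged sketch, the example below uses csv.DictReader, which yields one dictionary per row keyed by the header line. It writes a small throwaway file containing a few rows from the sample so that it does not depend on pokemon.csv; the file name pokemon_demo.csv and the variable names are assumptions.

```python
import csv
import os
import tempfile

# Small throwaway CSV with a few rows taken from the sample file
demo_path = os.path.join(tempfile.mkdtemp(), "pokemon_demo.csv")
with open(demo_path, "w", newline="") as file:
    file.write("Pokemon,Type\nBulbasaur,Grass\nIvysaur,Grass\nPikachu,Electric\n")

# newline="" lets the csv library handle line endings itself
with open(demo_path, "r", newline="") as file:
    reader = csv.DictReader(file)
    rows = list(reader)  # each row is a dict keyed by the header names

for row in rows:
    print(row["Pokemon"], "is a", row["Type"], "type")

# Because the fields are already separated, filtering is straightforward
electric = [row["Pokemon"] for row in rows if row["Type"] == "Electric"]
print(electric)
```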

This is a good point to start discussing the pandas library. Pandas is generally preferred for reading csv files because it loads the data directly into a DataFrame object that we can more easily process in Python. You can install pandas by running this command in your terminal after you have installed Python:

pip install pandas

or

pip3 install pandas

Luckily, Google Colab already has pandas pre-installed, so we have nothing to worry about. But what exactly is pandas? Pandas is a library with many functions for storing and working with data, specifically in DataFrames. This helps us reference and organize tabular data.

Here is some sample code that uses pandas to read a simple .csv file.

import pandas as pd

dataFrame = pd.read_csv("pokemon.csv")

print(dataFrame.head())
      Pokemon    Type
0   Bulbasaur   Grass
1     Ivysaur   Grass
2  Charmander    Fire
3  Charmeleon    Fire
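Once the data is in a DataFrame, a few common operations become one-liners. The sketch below assumes pandas is installed and reads a handful of rows from an in-memory string through io.StringIO instead of pokemon.csv, so the row counts shown apply only to this small stand-in.

```python
import io

import pandas as pd

# A few rows from the sample, held in memory instead of a file
csv_text = (
    "Pokemon,Type\n"
    "Bulbasaur,Grass\n"
    "Ivysaur,Grass\n"
    "Pikachu,Electric\n"
    "Raichu,Electric\n"
)
data_frame = pd.read_csv(io.StringIO(csv_text))

print(data_frame.shape)          # (number of rows, number of columns)
print(list(data_frame.columns))  # column names come from the header line

# Keep only the rows whose Type column equals "Electric"
electric = data_frame[data_frame["Type"] == "Electric"]
electric_names = electric["Pokemon"].tolist()
print(electric_names)
```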

Writing to .csv Files

Now that we have read the data, we can write it. I will be showing you two ways: one that uses the csv library from before and one that uses pandas.

Here is the first method:

import csv

# newline="" lets the csv library control the line endings it writes
with open("pokemon.csv", "a", newline="") as file:
  pokemon_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
  pokemon_writer.writerow(['Pickachu', 'Electric'])

with open("pokemon.csv", "r") as file:
  print(file.read())
Pokemon,Type
Bulbasaur,Grass
Ivysaur,Grass
Fearow,Normal
Ekans,Poison
Arbok,Poison
Pikachu,Electric
Raichu,Electric
Tynamo,Electric
Eelektrik,Electric
...

Diancie,Rock
Hoopa,Psychic
Volcanion,Fire
Pickachu,Electric
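The quotechar and quoting arguments only matter when a field contains the delimiter itself. As an illustration, the sketch below writes a made-up row whose second field contains a comma and then reads the file back. The name "Exampleon", its type string, and the file name are hypothetical and are not part of the sample data.

```python
import csv
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "pokemon_demo.csv")

with open(demo_path, "w", newline="") as file:
    writer = csv.writer(file, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(["Pokemon", "Type"])
    # writerows writes several rows at once; the second row is made up
    writer.writerows([["Bulbasaur", "Grass"], ["Exampleon", "Grass, Poison"]])

# The raw text shows that only the field containing a comma was quoted
with open(demo_path, "r", newline="") as file:
    raw_text = file.read()
print(raw_text)

# csv.reader undoes the quoting and returns the original two fields
with open(demo_path, "r", newline="") as file:
    rows = list(csv.reader(file))
print(rows)
```

QUOTE_MINIMAL is the default quoting behavior of csv.writer, so passing it explicitly mainly documents the intent.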

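This second method is a lot more flexible because we can treat the DataFrame like a 2D array. This allows us to add Pokemon anywhere, not just at the end of the file.

import pandas as pd
dataFrame = pd.read_csv("pokemon.csv")

# Build a one-row DataFrame that holds the new Pokemon
newRow = pd.DataFrame([["Salamence", "Flying"]], columns=dataFrame.columns)

# Insert the new row at index position 2 by joining the rows before it,
# the new row, and the rows after it
dataFrame = pd.concat([dataFrame.iloc[:2], newRow, dataFrame.iloc[2:]], ignore_index=True)

print(dataFrame.head())

# We can now write the dataFrame back to the CSV file we read from

dataFrame.to_csv("pokemon.csv", index=False, header=True)
      Pokemon    Type
0   Bulbasaur   Grass
1     Ivysaur   Grass
2   Salamence  Flying
3  Charmander    Fire
4  Charmeleon    Fire

As shown above, iloc selects rows by integer position. Slicing with iloc[:2] and iloc[2:] splits the table in two so that pd.concat can place the new row between the halves, and ignore_index=True renumbers the rows afterwards. Assigning directly, as in dataFrame.iloc[2] = ["Salamence", "Flying"], would not insert a row; it would overwrite the row already at position 2. The loc indexer behaves the same way for existing rows but selects by label instead of position.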
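To make the difference between overwriting, appending, and inserting concrete, here is a hedged sketch on a three-row DataFrame built in memory (assuming pandas is installed). The variable names are illustrative. Assigning to iloc[2] replaces an existing row, assigning to loc with a label that does not exist yet appends a row, and pd.concat with two slices inserts a row in the middle.

```python
import pandas as pd

frame = pd.DataFrame(
    {
        "Pokemon": ["Bulbasaur", "Ivysaur", "Charmander"],
        "Type": ["Grass", "Grass", "Fire"],
    }
)

# Assigning to an existing position overwrites that row
overwritten = frame.copy()
overwritten.iloc[2] = ["Salamence", "Flying"]

# Assigning to a label that does not exist yet appends a new row
appended = frame.copy()
appended.loc[len(appended)] = ["Salamence", "Flying"]

# Joining slices around a one-row DataFrame inserts in the middle
new_row = pd.DataFrame([["Salamence", "Flying"]], columns=frame.columns)
inserted = pd.concat([frame.iloc[:2], new_row, frame.iloc[2:]], ignore_index=True)

print(overwritten["Pokemon"].tolist())
print(appended["Pokemon"].tolist())
print(inserted["Pokemon"].tolist())
```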
