Managing Files and Directories with Python

We cannot always avoid the details of file management, especially when analyzing raw data. It can come in the form of multiple files distributed over multiple directories. Accessing those files and the directory system can be a critical aspect of raw data processing. Luckily, Python has very convenient methods to handle files and directories in the os module. Here we will look at functions to look inspect the contents of directories (os.listdir, os.walk) and functions that help us manipulate files and directoris (os.rename, os.remove, os.mkdir).

Directory Contents

One of the most important functions to manage files is os.listdir(directory). It returns a list of strings that contains to the names of all directories and files in path.

import os

directory = r"C:\Users\Daniel\tutorial"

dir_content = os.listdir(directory)
# ['images', 'test_one.txt', 'test_two.txt']

Now that we know what is in our directory, how do we find out which of these are files and which are subdirectories? You might be tempted to just check for a period (.) inside the string because it separates the file from its extension. This method is error prone, because directories could contain periods and files don’t necessarily have extensions. It is much better to use os.path.isfile() and os.path.isdir().

files = [x for x in dir_content if os.path.isfile(directory + os.sep + x)]
# ['test_one.txt', 'test_two.txt']

dirs = [x for x in dir_content if os.path.isdir(directory + os.sep + x)]
# ['images']

Now we know the contents of directory. We have two files and one directory. In case you were wondering about os.sep, that is the directory separator of the operating system. On my Window 10 that is the '\'. What if we need both files that are in our directory and those that are in sub-directories? This is the perfect case to use os.walk(). It gives a convenient way to loop through a directory and all its sub-directories.

for root, dirs, files in os.walk(directory):

# C:\Users\Daniel\tutorial
# ['images']
# ['file_1.txt', 'file_2.txt']
# C:\Users\Daniel\tutorial\images
# []
# ['plot_1.png', 'plot_2.png']

By default os.walk() goes from top down. The first name root tells us the full path of the directory we are currently at. Printing root tells us that we start at the top directory. While root is a string, both dirs and files are lists. They tell us for the current root, which files and directories are there. For the first directory we already found out on our own that the contents are. Two text files and a sub-directory. Our loop next goes to the sub-directory images. In there are no more sub-directories but two image files. If there were more sub-categories at any level (directory or directory\images), os.walk would go through all of them. Next we will find out how to create/move/rename/delete both files and directories.

Manipulating Files and Directories

Let’s say I want to rename the .txt files. I don’t like the numbering and would prefer them to have a leading zero in case they go into the double digits. We can use os.rename for this job.

directory = r"C:\Users\Daniel\tutorial"

dir_content = os.listdir(directory)

txt_f = [x for x in dir_content
         if os.path.isfile(directory + os.sep + x) and ".txt" in x]
# ['file_1.txt', 'file_2.txt']
for f_name in txt_f:
    f_name_split = f_name.split("_")
    num = f_name_split[1].split(".")[0]
    new_name = f_name_split[0] + "_" + num.zfill(2) + ".txt"
    os.rename(directory + os.sep + f_name,
              directory + os.sep + new_name)
['file_01.txt', 'file_02.txt', 'images']

Now that we renamed our files, let’s create another sub-directory for these .txt files. To create new directories we use os.mkdir().

# ['file_01.txt', 'file_02.txt', 'images']

os.mkdir(directory + os.sep + 'texts')

# ['file_01.txt', 'file_02.txt', 'images', 'texts']

Now we need to move the .txt files. There is no dedicated move function in the os module. Instead we use rename but instead of changing the name of the file, we change the path to the directory.

9directory = r"C:\Users\Daniel\tutorial"
dir_content = os.listdir(directory)
txt_f = [x for x in dir_content
         if os.path.isfile(directory + os.sep + x) and ".txt" in x]

for f in txt_f:
    old_path = directory + os.sep + f
    new_path = directory + os.sep + "texts" + os.sep + f
    os.rename(old_path, new_path)

# ['images', 'texts']

# ['file_01.txt', 'file_02.txt']

Now our .txt files are in the ‘\texts’ sub-directory. Unfortunately there is no copy function in os. Instead we have to use another module called shutil. You can use a signature like this.

from shutil import copyfile
copyfile(source, destination)

Finally, to remove a file we simple use os.remove().

# ['file_01.txt', 'file_02.txt']


# ['file_02.txt']

And that’s it. You might have noticed that we did not cover how to create or read files. The os module is technically able to create and read files but in data science we usually depend on more high level interfaces to read files. For example, we might want to open a .csv file with pandas pd.read_csv. Using the lower level os functions will rarely be necessary. Thank you for reading and let me know if you have any questions.

In case you want to learn more about the os module, here are the os module docs.