How to merge multiple CSV files into one using Python

Introduction

In this tutorial I am going to show you how to merge multiple CSV files into one in Python. I am using two CSV files for this example. You may need to merge multiple CSV files into one in some situations for your business requirements.

Your different CSV files might have some common headers amongst them and when you merge those CSV files you want to put unique headers for the final CSV file. Under each header you will write values from different CSV files. If there is no value for a particular header in a CSV file then you may put empty or blank.

CSV file is where data are stored as comma separated value (CSV). Using CSV format has many advantages and some of them are given below:

  • CSV format is considered to be standard format
  • CSV is smaller in size and faster to handle
  • CSV is simple to implement and easy to parse
  • CSV is human readable and easy to edit manually
  • CSV is processed by almost all applications

Recommended Reading:

Prerequisites

Python 3.9.0 – 3.9.1

Merge Multiple CSV Files

Let’s consider the following contents from two separate CSV files. The csv1.csv file has the following content. This file has the following headers and values.

NAME,MIDDLENAME,SURNAME,AGE
Jason,Noname,Scarry,16

The csv2.csv file has the following content. This file has the following content with the headers and values. This file has extra field called EMAIL. Also this file has headers and values in different orders than the first csv file.

MIDDLENAME,NAME,AGE,SURNAME,EMAIL
,Fred,Unknown,Krueger,fred.krueger@email.com
Noname,Jason,16,Scarry,jason.scarry@email.com

The merged or the final csv file (csv.csv) will have the following content. The merged file has the unique header names combined from the above two files. This file might have the blank column value when any one of the above files does not have any value for a particular column or uncommon column.

NAME,MIDDLENAME,SURNAME,AGE,EMAIL

Jason,Noname,Scarry,16,

Fred,,Krueger,Unknown,fred.krueger@email.com

Jason,Noname,Scarry,16,jason.scarry@email.com

The root directory of the project is merge-multiple-csv-files-into-one-csv-file. Under this directory I am going to keep all the required files such as csv1.csv, csv2.csv, csv.csv (output file) and the Python script merge-csv-files.py.

The required code for merging two csv files is written into the file merge-csv-files.py as shown below. Let’s analyze the code.

As a first step I have imported the required module or package csv (import csv) for working with the csv file.

Next I declared few variables to store the headers and row data from both csv files.

Then I have read csv files one by one and stored the headers into variable csv1_header and csv2_header for csv1.csv and csv2.csv files, respectively.

Next I have determined the unique headers from both files using set which will be used in the final csv file csv.csv.

Then I am writing to the final csv file. I used DictWriter to write the headers of the csv file. Again I have read the first csv file using DictReader and iterating each row to write to the final csv file using writerow() function. I have shown how to write each column by accessing the name of the column or by accessing the index of the column. In the same way I have read the second csv file and written into the final csv file.

import csv

csv1_header = []
csv1_data = []

csv2_header = []
csv2_data = []

with open('csv1.csv') as csv1:
    reader = csv.reader(csv1)
    csv1_header = next(reader, None)

with open('csv2.csv') as csv2:
    reader = csv.reader(csv2)
    csv2_header = next(reader, None)

#print(csv1_header)
#print(csv2_header)

set_1 = set(csv1_header)
set_2 = set(csv2_header)

list_2_items_not_in_list_1 = list(set_2 - set_1)
csv_header = list(csv1_header) + list_2_items_not_in_list_1

#print(csv_header)

with open('csv.csv', 'w') as csvfile:
    fieldnames = csv_header
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()

    with open('csv1.csv') as csv1:
        reader = csv.DictReader(csv1)
        for row in reader:
            #writer.writerow({'NAME': row['NAME'], 'MIDDLENAME': row['MIDDLENAME'], 'SURNAME': row['SURNAME'], 'AGE': row['AGE']})
            writer.writerow({fieldnames[0]: row[fieldnames[0]], fieldnames[1]: row[fieldnames[1]], fieldnames[2]: row[fieldnames[2]], fieldnames[3]: row[fieldnames[3]]})
			
    with open('csv2.csv') as csv2:
        reader = csv.DictReader(csv2)
        for row in reader: 
            #writer.writerow({'NAME': row['NAME'], 'MIDDLENAME': row['MIDDLENAME'], 'SURNAME': row['SURNAME'], 'AGE': row['AGE'], 'EMAIL': row['EMAIL']})
            writer.writerow({fieldnames[0]: row[fieldnames[0]], fieldnames[1]: row[fieldnames[1]], fieldnames[2]: row[fieldnames[2]], fieldnames[3]: row[fieldnames[3]], fieldnames[4]: row[fieldnames[4]]})

Hope you got an idea how to read csv files and write to a single csv file using Python programming language without using any third party package or module.

Source Code

Download

Leave a Reply

Your email address will not be published. Required fields are marked *