How to Upload Files to Amazon S3 with Python - Codewolfy

Uploading files to Amazon S3 with Python is an essential skill if you want a robust, scalable way to manage application files. Amazon S3 (Simple Storage Service) offers secure, durable, and highly available object storage that is ideal for everything from website assets to large data backups. This tutorial walks you through the entire process, from installing the required tools to writing Python scripts for basic and advanced file uploads.

Why Use an Amazon S3 Bucket?

Amazon S3 is the industry standard for cloud object storage for several compelling reasons.

  • Durability and Availability: S3 is designed for 99.999999999% (11 nines) durability, meaning your files are exceptionally safe from loss. It automatically stores your data redundantly across multiple devices in different facilities.
  • Scalability: You can store a virtually unlimited amount of data. S3 scales seamlessly as your needs grow, so you never have to worry about running out of disk space.
  • Cost-Effectiveness: You pay only for the storage you actually use. Its tiered pricing allows you to move less frequently accessed data to a cheaper storage class to save money.
  • Security: S3 provides robust security features, including encryption for data at rest and in transit, access control lists (ACLs), and bucket policies for fine-grained protection.

Introducing Boto3: The AWS SDK for Python

To interact with AWS services like S3 from your Python code, you use an SDK (Software Development Kit). Boto3 is the official AWS SDK for Python, and it makes communicating with the S3 API incredibly simple. If you haven't installed it yet, install it with pip:

pip install boto3

Before moving further, make sure you have configured your AWS credentials. The easiest way is to install the AWS CLI and run aws configure. Boto3 will automatically use these credentials to authenticate your requests.
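
If you want a quick sanity check that Boto3 can actually find your credentials, a call to AWS STS works well. This is a minimal sketch and assumes your credentials are allowed to call sts:GetCallerIdentity:

import boto3

# Ask STS who we are authenticated as; this fails fast if credentials are missing or invalid
sts_client = boto3.client('sts')
identity = sts_client.get_caller_identity()
print(f"Authenticated as: {identity['Arn']}")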

Basic Python S3 Upload Example

Let’s start with a simple file upload to Amazon S3 using Python. In this example, we will establish a connection to the S3 bucket and upload a single file to it.

import boto3
from botocore.exceptions import NoCredentialsError

def upload_file_to_s3(file_name, bucket, object_name=None):
    """Upload a local file to an S3 bucket and return True on success."""
    # Default the S3 object key to the local file name
    if object_name is None:
        object_name = file_name

    s3_client = boto3.client('s3')
    try:
        # upload_file manages the transfer for us (including multipart for large files)
        s3_client.upload_file(file_name, bucket, object_name)
    except NoCredentialsError:
        print("Credentials not available")
        return False
    except Exception as e:
        print(f"An error occurred: {e}")
        return False
    return True

# --- Code to call the upload function ---
if __name__ == '__main__':
    # Define the local file path and the S3 bucket name
    local_file = 'sample.txt'
    bucket_name = 'your-s3-bucket-name'
    
    # Call the function to upload the file
    success = upload_file_to_s3(local_file, bucket_name)
    
    if success:
        print(f"Upload successful: {local_file} has been uploaded to {bucket_name}.")
    else:
        print("Upload failed.")

By default, the client uses the credentials configured on your local machine to upload the file. If you want to pass credentials explicitly instead, create the client as in the snippet below. Avoid hard-coding secrets in production code; prefer environment variables, named profiles, or IAM roles.

import boto3

s3_client = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY'
)
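
If you would rather not embed keys in code at all, a named profile from your AWS configuration is a safer option. The sketch below assumes a profile called my-profile has already been set up with aws configure --profile my-profile:

import boto3

# Create a session bound to a named profile instead of hard-coded keys
session = boto3.Session(profile_name='my-profile')
s3_client = session.client('s3')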

To upload multiple files, loop over a directory and upload the files one by one. Let’s create another function that reuses our upload helper for each file in a directory.

import os

def upload_directory_to_s3(directory_path, bucket):
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            local_path = os.path.join(root, file)
            # Create a relative path for the S3 object name
            relative_path = os.path.relpath(local_path, directory_path)
            s3_object_name = relative_path.replace("\\", "/")
            
            print(f"Uploading {local_path} to {bucket}/{s3_object_name}")
            upload_file_to_s3(local_path, bucket, s3_object_name)

The function walks the directory tree and uploads each file to the Amazon S3 bucket, preserving the relative folder structure in the object key. A minimal usage sketch is shown below.
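
This quick usage example assumes a local folder named assets and the same placeholder bucket name as before:

if __name__ == '__main__':
    # Upload every file under the 'assets' folder, keeping its relative path as the S3 key
    upload_directory_to_s3('assets', 'your-s3-bucket-name')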

Uploading Large Files in Chunks to Amazon S3

Sometimes large files need to be uploaded in chunks for better performance and reliability. Suppose you are building a service like Netflix, where entire movies are uploaded and streamed; sending such files in a single request is slow and fragile. S3's multipart upload API solves this by letting you upload a file as a series of parts. Let’s walk through an example that uploads a large file to a bucket this way.

import boto3
import os
import math

def upload_large_file_in_chunks(file_path, bucket_name, object_name):
    """Upload a file using S3's low-level multipart upload API."""
    s3_client = boto3.client('s3')
    # 5 MB is the minimum part size S3 allows for multipart uploads (except the last part)
    chunk_size = 5 * 1024 * 1024

    try:
        response = s3_client.create_multipart_upload(
            Bucket=bucket_name,
            Key=object_name
        )
        upload_id = response['UploadId']
        print(f"Multipart upload initiated with UploadId: {upload_id}")

        parts = []
        file_size = os.path.getsize(file_path)
        num_chunks = math.ceil(file_size / chunk_size)

        with open(file_path, 'rb') as f:
            for i in range(num_chunks):
                part_number = i + 1
                chunk = f.read(chunk_size)
                
                print(f"Uploading part {part_number}/{num_chunks}...")
                part_response = s3_client.upload_part(
                    Bucket=bucket_name,
                    Key=object_name,
                    PartNumber=part_number,
                    UploadId=upload_id,
                    Body=chunk
                )
                
                parts.append({
                    'PartNumber': part_number,
                    'ETag': part_response['ETag']
                })

        print("Completing multipart upload...")
        s3_client.complete_multipart_upload(
            Bucket=bucket_name,
            Key=object_name,
            UploadId=upload_id,
            MultipartUpload={'Parts': parts}
        )
        print("Large file uploaded successfully.")
        return True

    except Exception as e:
        print(f"An error occurred: {e}")
        if 'upload_id' in locals():
            print("Aborting multipart upload.")
            s3_client.abort_multipart_upload(
                Bucket=bucket_name,
                Key=object_name,
                UploadId=upload_id
            )
        return False


if __name__ == '__main__':
    file_to_upload = 'test.mov'
    bucket = 'your-s3-bucket-name'
    s3_object_key = 'videos/test.mov'
    
    if os.path.exists(file_to_upload):
        upload_large_file_in_chunks(file_to_upload, bucket, s3_object_key)
    else:
        print(f"Error: The file {file_to_upload} was not found.")

It splits the file into smaller chunks and uploads each chunk individually, which makes large transfers more reliable and lets you retry or parallelize individual parts.
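
For most workloads you do not need to manage parts by hand: Boto3's high-level upload_file performs multipart uploads transparently once a file crosses a size threshold, and you can tune that behaviour with a TransferConfig. A minimal sketch, assuming the same test.mov file and placeholder bucket name:

import boto3
from boto3.s3.transfer import TransferConfig

# Use multipart for anything over 8 MB, with 8 MB parts uploaded by up to 4 threads
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,
    multipart_chunksize=8 * 1024 * 1024,
    max_concurrency=4,
    use_threads=True
)

s3_client = boto3.client('s3')
s3_client.upload_file('test.mov', 'your-s3-bucket-name', 'videos/test.mov', Config=config)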

Conclusion

You now have the tools and knowledge to confidently upload files to S3 from your Python applications. We covered the basics of installing Boto3 and uploading a single file, expanded on that to upload an entire directory, and walked through chunked uploads with S3's multipart upload API (which Boto3's high-level upload_file can also handle transparently). Adding Amazon S3 to your projects gives you a scalable, secure, and affordable home for all types of files.