Apache Airflow + MongoDB

Run Mongo Docker Container

docker run -p 27018:27017 \\
-e MONGO_INITDB_ROOT_USERNAME=admin \\
-e MONGO_INITDB_ROOT_PASSWORD=password -d mongo \\

Test Mongo Connection

import pymongo
# Replace the uri string with your MongoDB deployment's connection string.
conn_str = "mongodb://admin:[email protected]:27018/admin?retryWrites=true&w=majority"
# set a 5-second connection timeout
client = pymongo.MongoClient(conn_str, serverSelectionTimeoutMS=5000)
try:
print(client.server_info())
except Exception as e:
print(e, "Unable to connect to the server.")

Setting up airflow connection

Under Airflow connection create a new Airflow connection with host, password and username

Mongo Hook

from airflow.providers.mongo.hooks.mongo import MongoHook
hook = MongoHook(mongo_conn_id='mongo')
client = hook.get_conn()
try:
print(client.server_info())
except Exception as e:
print(e, "Unable to connect to the server.")
Adding sample data to mongo
docker exec -it <container-id> -- /bin/bash
mongo -u admin
use notes
db.createCollection("notes")
db.notes.insertMany(
[
{ "note": "Create an identity", "_id":1 , "archived": false},
{ "note": "0.1% increment every day", "_id":2, "archived": true},
{ "note": "productivity is quality not quantity","_id":3, "archived": true}
]
)

Setting up s3 localstack

docker run --rm -it -p 4566:4566 localstack/localstack -e SERVICES=s3
Create a s3 bucket
aws --endpoint-url=http://localhost:4566 s3 mb s3://notes
Copy link
On this page
Run Mongo Docker Container
Test Mongo Connection
Setting up airflow connection
Mongo Hook
Setting up s3 localstack