Merging Python Modules
Airflow had many AWS providers that weren't following the latest conventions. The resolution involved merging multiple modules into a single Python file. I wrote a script to automate the task.
Recently I took up the humongous task of merging multiple Python modules into a single one for Apache Airflow. Airflow had many AWS providers that weren't following the latest conventions, so with issues #20139 and #20296 we set out to resolve that.
What lay ahead was a lineup of PRs, each of which involved the following steps:
Create a new Python module
Add the license agreement to it
Move all the classes from independent modules to the new one
Add imports from each independent module
Add a deprecation warning block to the old modules
Fix all the imports in test cases and examples
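To make the deprecation warning block concrete, here is a minimal sketch of a helper that generates the text for an old module. It is not the exact block used in the Airflow PRs; the helper name `build_deprecation_shim` and the module path `example_pkg.operators.emr` are hypothetical and only serve as illustration.

```python
from typing import List


def build_deprecation_shim(new_module: str, class_names: List[str]) -> str:
    """Return source text for an old module that re-exports from the new one.

    Hypothetical helper: the wording of the message is an assumption.
    """
    names = ", ".join(sorted(class_names))
    message = f"This module is deprecated. Please use `{new_module}`."
    return (
        f'"""{message}"""\n'
        "import warnings\n\n"
        # Re-export the moved classes so existing imports keep working.
        f"from {new_module} import {names}  # noqa\n\n"
        "warnings.warn(\n"
        f'    "{message}",\n'
        "    DeprecationWarning,\n"
        "    stacklevel=2,\n"
        ")\n"
    )


shim = build_deprecation_shim("example_pkg.operators.emr", ["EmrAddStepsOperator"])
print(shim)
```

The generated text is only a string here; writing it over the old module file is left to the caller.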
As you can see, this task soon became quite repetitive, since we were dealing with different AWS products like EMR, EKS, EC2, DMS, etc. Over time the process got boring and icky, so I did what any developer would do: I automated it. Along the way, I also learned quite a few things.
Reading all the top-level imports
A typical Python module can contain a variety of things: imports, global variables, functions, lambdas. In this case, though, it was only imports and classes. The first task is to load these imports and the classes' source code into Python objects, which are later written to the new module.
The ast module came in handy for parsing the Python module into a tree. This was my first time using ast, and I was pretty amazed by what it can do. So buckle up.
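As a rough illustration of the parsing step, the sketch below feeds a small inline source string to `ast.parse`. The sample source is made up for this example; for a real module you would read the file's text first and pass that in instead.

```python
import ast

# Made-up module source, standing in for the contents of a real file.
source = '''
import os
from typing import Optional


class Example:
    pass
'''

tree = ast.parse(source)

# tree.body holds the top-level statements in source order.
node_types = [type(node).__name__ for node in tree.body]
print(node_types)  # ['Import', 'ImportFrom', 'ClassDef']
```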
Next, we loop through the tree's top-level nodes and capture the classes and imports.
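A minimal sketch of that loop might look like the following. The sample source and the variable names `imports` and `classes` are assumptions for illustration, not taken from the actual script.

```python
import ast

# Made-up module source for illustration.
source = '''
import os
from typing import Optional

CONSTANT = 1


class First:
    pass


class Second:
    pass
'''

tree = ast.parse(source)

imports = []  # ast.Import / ast.ImportFrom nodes
classes = []  # ast.ClassDef nodes

for node in tree.body:
    if isinstance(node, (ast.Import, ast.ImportFrom)):
        imports.append(node)
    elif isinstance(node, ast.ClassDef):
        classes.append(node)
    # Any other top-level construct (here, CONSTANT = 1) is skipped.

print(len(imports), [cls.name for cls in classes])
```

Iterating over `tree.body` rather than using `ast.walk` keeps the capture limited to top-level statements, so imports nested inside functions or classes are left alone.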
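Import statements come in several variations: plain imports (import x), aliased imports (import x as y), and from-imports (from x import y), which may also be relative.
We construct the import strings on the fly and keep them handy so that all of these forms are handled.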
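One way to build those import strings from the ast nodes is sketched below. The function name `import_strings` is hypothetical, and emitting one line per imported name is a choice made for this sketch; the actual script may construct its strings differently.

```python
import ast
from typing import List, Union


def import_strings(node: Union[ast.Import, ast.ImportFrom]) -> List[str]:
    """Rebuild import statements from a node, one line per imported name."""
    lines = []
    for alias in node.names:
        # alias.asname is None unless the import uses "as".
        name = alias.name if alias.asname is None else f"{alias.name} as {alias.asname}"
        if isinstance(node, ast.ImportFrom):
            # node.level counts leading dots of a relative import;
            # node.module is None for statements like "from . import x".
            module = "." * node.level + (node.module or "")
            lines.append(f"from {module} import {name}")
        else:
            lines.append(f"import {name}")
    return lines


sample = "import os\nimport numpy as np\nfrom a.b import C, D as E\nfrom . import x\n"
result = [line for node in ast.parse(sample).body for line in import_strings(node)]
print("\n".join(result))
```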
Get Classes' source code
Let's shed some more light on the code section that captures the classes and their source code. I used a library called astunparse, which unparses an ast node back to its source code form. The astunparse module has a dispatch method which walks through the ast and unparses each node based on its type.
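The sketch below shows the general idea of turning class nodes back into source text. To keep it free of third-party dependencies it swaps in the standard library's `ast.unparse` (available from Python 3.9) instead of astunparse; with astunparse installed, `astunparse.unparse(node)` plays the same role. The sample class is made up.

```python
import ast

# Made-up class for illustration.
source = '''
class Greeter:
    """Say hello."""

    def greet(self, name):
        return f"Hello, {name}"
'''

tree = ast.parse(source)

# Map each top-level class name to its regenerated source code.
# ast.unparse is a stand-in here for astunparse.unparse.
class_sources = {
    node.name: ast.unparse(node)
    for node in tree.body
    if isinstance(node, ast.ClassDef)
}

print(class_sources["Greeter"])
```

Because the text is regenerated from the tree, comments and the original formatting are not preserved.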
In the future
The current script is pretty limited in its capabilities. I'm not going to work on these limitations until a need arises.
It works only on a module's import statements and classes. Any other Python construct is ignored.
Unparsing turns every docstring into a single line with single quotes.
The current script does not handle multiple imports.
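For the docstring limitation, one possible direction (not something the current script does) is to copy the original text of each class instead of unparsing it. The sketch below assumes Python 3.8 or later, where `ast.get_source_segment` is available; the sample class is made up.

```python
import ast

# Made-up class with a multi-line docstring.
source = '''class Greeter:
    """
    Say hello.

    Spans several lines.
    """

    def greet(self):
        return "hi"
'''

tree = ast.parse(source)
node = tree.body[0]

# Returns the exact source text the node was parsed from,
# so the docstring keeps its original quotes and line breaks.
segment = ast.get_source_segment(source, node)
print(segment)
```

One caveat with this approach: the segment starts at the `class` keyword, so decorators above the class are not included and would need separate handling.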
The snippet, along with an example directory, is available in my GitHub repository.