Coding to solve problems: Fixing up your gits with Python (Part 1)
With directory traversal!
Here’s a quick example of how I used a Python script to automagically fix a mistake with my git repos (that I had made with another Python script).
In this post, I go through my (stripped-down) process for getting to working code:
- identify the problem
- write out the steps to solve a single case manually
- write out dummy code for the solution
- start script with documentation
- figure out code for each step of the process
- add error checking
In reality, there was a lot of testing and iterating to get to working code, but this process minimized the false starts.
Before reading
This blog post presumes familiarity with the git commands:
- git clone (make a local copy of a repo)
- git remote -v (list remote repos associated with local copy)
- git push (send local changes to the remote repo)
- git remote set-url (change the url of the remote repo)
The post discusses the Python modules:
- os
- pygit2
- docopt
My problem: Wrong origin url on hundreds of repos
I built a tool to automate cloning repos from Github based on a keyword search, which was great.
However, my function clone_repos() set the remote ‘origin’ to be the git url (‘git://’) of the Github repo, not the clone url (‘https://’). That meant that I couldn’t git push my work.
And I had run that early version to generate hundreds of local repos, all with the wrong remote ‘origin’.
How to fix it manually
The manual fix to this problem would have been:
- traverse to the directory of the local repo
- look up the origin push url with git remote -v
- if the url begins with git://..., change it using git remote set-url origin https://...
(We could just change the push url by setting the --push flag.)
I’m sure there’s a command-line one-liner that combines find and bash -c to traverse, and sed and git to update the remote url. (Or even something that directly edits the .git/config files!) But I’m working on my python-fu, so here’s my approach.
Below, I’m simply going to start with dummy code and replace step-by-step with real code. My actual process involves lots of print statements and test modes to run the code as I figure out how to do each step.
Dummy code:
# get starting directory
# for every dir under starting directory:
    # if dir is a git repo:
        # fix its origin
        # ignore subdirectories
Solution part 1: Directory traversal
Directory traversal in Python is straightforward, thanks to os.walk. All we have to do to use os.walk for this problem is:
import os

topDir = '.'  # placeholder: get starting directory

for root, dirs, files in os.walk(topDir):  # for every dir under starting directory
    # if root is a git repo:
    #     fix its origin
    #     ignore subdirectories
    pass
Given a topDir, os.walk traverses each directory it finds below (root), descending recursively into each root’s subdirectories (dirs) in turn. For this problem, we’re not concerned with the files in each directory (files).
I’ve put comments in where we need to figure out more code. Let’s tell our code to ignore the subdirectories of gits. (We’ll figure out if a directory is a git next.) To do so, we need to empty out dirs, which is easy if you know how to edit it in-place using slice notation.
import os

topDir = '.'  # placeholder: get starting directory

for root, dirs, files in os.walk(topDir):
    # if root is a git repo:
    #     fix its origin
    dirs[:] = []  # ignore subdirectories (belongs under the 'if' once it exists)
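The in-place edit matters because os.walk keeps using the same list object it handed us. Assigning dirs = [] would only rebind our local name and the walk would carry on descending. Here is a minimal, stdlib-only sketch of the pruning behaviour on a throwaway directory tree. The helper walk_roots and the directory names are made up for illustration, and a directory named "a" stands in for "is a git repo".

```python
import os
import tempfile


def walk_roots(top, prune):
    """Return the directories os.walk visits under top, relative to top.

    prune(root) decides whether to stop descending below root.
    """
    visited = []
    for root, dirs, files in os.walk(top):
        visited.append(os.path.relpath(root, top))
        if prune(root):
            dirs[:] = []  # in-place edit: os.walk sees the emptied list
    return sorted(visited)


# Build a throwaway tree: top/a/deep and top/b
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "a", "deep"))
os.makedirs(os.path.join(top, "b"))

# Stop descending below any directory named "a".
pruned = walk_roots(top, lambda root: os.path.basename(root) == "a")
unpruned = walk_roots(top, lambda root: False)

print(pruned)    # ['.', 'a', 'b']
print(unpruned)  # also includes the nested a/deep directory
```

Note that pruning only works with the default top-down traversal, since that is the only mode in which os.walk looks at dirs again after yielding it.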
Solution part 2: Settings and documentation
How to get our starting dir? There are a bunch of simple ways to do so, but those would run into complications since we also want to access Github. Here’s a more complicated but robust solution.
I use docopt, a great Python package which lets you set command-line options using standard documentation text. It means I start with documentation for what I’m trying to do, and I’m already coding without even trying.
"""Usage:
  fixorigin.py DIR
  fixorigin.py (-h|--help)

Search for Github repositories in DIR; for every git found under DIR make sure remote origin is set to clone_url.

Arguments:
  DIR        root directory to search for gits

Options:
  -h --help  show this screen.
"""
import os

from docopt import docopt

if __name__ == '__main__':
    config = docopt(__doc__)
    topDir = config['DIR']  # get starting directory

    for root, dirs, files in os.walk(topDir):
        # if root is a git repo:
        #     fix its origin
        dirs[:] = []  # ignore subdirectories
Now our starting directory topDir will be set from the command line!
Solution part 3: Checking if a directory is a git repo
Python has a module called pygit2 that wraps much of the functionality of git (it is a set of bindings to the libgit2 library). As long as we call import pygit2 up top, this works:
repo_path = pygit2.discover_repository(root)  # look the path up once
if repo_path:  # if root is a git repo
    repo = pygit2.Repository(repo_path)
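One caveat: discover_repository also searches parent directories, so it reports a repo for any directory that merely sits inside one. That matters here if DIR itself lives inside a git repo. A stricter check, sketched below with the standard library only, asks whether the directory itself contains a .git entry. The helper name has_git_dir is hypothetical (it is not part of pygit2), the fake .git directory is only a marker for the demo, and this check would miss bare repos.

```python
import os
import tempfile


def has_git_dir(path):
    """True if path itself contains a .git entry (hypothetical helper).

    os.path.exists is used rather than isdir because .git can be a
    plain file, e.g. in submodules and linked worktrees.
    """
    return os.path.exists(os.path.join(path, ".git"))


# Throwaway layout: top/myrepo/.git and top/myrepo/src
top = tempfile.mkdtemp()
repo_dir = os.path.join(top, "myrepo")
os.makedirs(os.path.join(repo_dir, ".git"))  # marker only, not a real repo
os.makedirs(os.path.join(repo_dir, "src"))

print(has_git_dir(repo_dir))                       # True
print(has_git_dir(os.path.join(repo_dir, "src")))  # False: inside a repo, but not its root
print(has_git_dir(top))                            # False
```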
Solution part 4: Fixing the origin of a local repo
Here’s some dummy code:
def fix_remote_origin(repo):
    # clone_url = replace 'git://' at front with 'https://' in repo's origin_url
    # git remote set-url origin clone_url
What’s great is that the Python code for this is just as simple:
def fix_remote_origin(repo):
    # str.replace matches literally (no regex anchors); count=1 limits it to the first match
    clone_url = repo.remotes['origin'].url.replace('git://', 'https://', 1)
    repo.remotes.set_url('origin', clone_url)
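The url rewrite can be tried on its own, without a repo. str.replace does plain substring matching rather than regular expressions, so this sketch checks the prefix explicitly and only touches urls that really start with git://. The function name to_https and the example url are made up for illustration.

```python
def to_https(url):
    """Swap a leading git:// for https://; leave any other url untouched."""
    prefix = "git://"
    if url.startswith(prefix):
        return "https://" + url[len(prefix):]
    return url


# Hypothetical example urls
print(to_https("git://github.com/example/repo.git"))    # https://github.com/example/repo.git
print(to_https("https://github.com/example/repo.git"))  # unchanged
```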
Solution part 5: Add error-checking
Error checking is important!
Let’s:
- make sure that there is an ‘origin’ remote
- make sure that the ‘origin’ remote’s url starts with git://
def fix_remote_origin(repo):
    try:
        origin = repo.remotes['origin']
    except KeyError:  # no remote with that name
        print("Repo at {} does not have 'origin' remote.".format(repo.path))
    else:
        if origin.url.startswith('git://'):
            clone_url = origin.url.replace('git://', 'https://', 1)
            repo.remotes.set_url('origin', clone_url)
Note: we can’t use repo.remotes.has_key('origin') because repo.remotes doesn’t have that functionality. It’s a special object, not a dictionary.
Solution: Final result
All together:
"""Usage:
  fixorigin.py DIR
  fixorigin.py (-h|--help)

Search for Github repositories in DIR; for every git found under DIR make sure remote origin is set to clone_url.

Arguments:
  DIR        root directory to search for gits

Options:
  -h --help  show this screen.
"""
import os

import pygit2
from docopt import docopt


def fix_remote_origin(repo):
    """Point a git:// 'origin' remote at the matching https:// url."""
    try:
        origin = repo.remotes['origin']
    except KeyError:  # no remote with that name
        print("Repo at {} does not have 'origin' remote.".format(repo.path))
    else:
        if origin.url.startswith('git://'):
            clone_url = origin.url.replace('git://', 'https://', 1)
            repo.remotes.set_url('origin', clone_url)


if __name__ == '__main__':
    config = docopt(__doc__)
    topDir = config['DIR']  # get starting directory

    for root, dirs, files in os.walk(topDir):
        repo_path = pygit2.discover_repository(root)
        if repo_path:  # root is a git repo
            fix_remote_origin(pygit2.Repository(repo_path))
            dirs[:] = []  # ignore subdirectories
Awesome!
That was easy(ish).
But what if we really want to check on Github that the remote exists and we want to ask Github what the correct clone_url is, instead of assuming that we can simply replace git:// with https://?
That’s not so easy. And will be another blog post.