Coding to solve problems: Fixing up your gits with Python (Part 1)

Brad Johnson
5 min read · Oct 14, 2019

With directory traversal!

Here’s a quick example of how I used a Python script to automagically fix a mistake with my git repos (that I had made with another Python script).

In this post, I go through my (stripped-down) process for getting to working code:

  • identify the problem
  • write out the steps to solve a single case manually
  • write out dummy code for the solution
  • start script with documentation
  • figure out code for each step of the process
  • add error checking

In reality, there was a lot of testing and iterating to get to working code, but this process minimized the false starts.

Before reading

This blog post presumes familiarity with the git commands:

  • git clone (make a local copy of a repo)
  • git remote -v (list remote repos associated with local copy)
  • git push (send local changes to the remote repo)
  • git remote set-url (change the url of the remote repo)

The post discusses the Python modules:

  • os
  • pygit2
  • docopt

My problem: Wrong origin url on hundreds of repos

I built a tool to automate cloning repos from GitHub based on a keyword search, which was great.

However, my function clone_repos() set the remote ‘origin’ to the git url (‘git://’) of the GitHub repo, not the clone url (‘https://’). The git:// protocol is read-only, so I couldn’t git push my work.

And I had run that early version to generate hundreds of local repos, all with the wrong remote ‘origin’.

How to fix it manually

The manual fix to this problem would have been:

  • traverse to the directory of the local repo
  • look up the origin push url with git remote -v
  • if the url begins with git://..., change it using git remote set-url origin https://....
    (We could just change the push url by setting the --push flag.)
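
The manual steps above can be sketched as a small pure function. This is only an illustration, not the approach the rest of this post takes: plan_origin_fix is a hypothetical helper name, and it assumes the usual "name, tab, url, (fetch) or (push)" layout of git remote -v output. It builds the command to run rather than running it, so it is safe to try on sample text.

```python
def plan_origin_fix(remote_v_output):
    """Return the git command that would fix origin's push url, or None."""
    for line in remote_v_output.splitlines():
        # Each line looks like "<name>\t<url> (fetch)" or "<name>\t<url> (push)"
        parts = line.split()
        if len(parts) != 3:
            continue
        name, url, kind = parts
        if name == "origin" and kind == "(push)" and url.startswith("git://"):
            fixed = "https://" + url[len("git://"):]
            return ["git", "remote", "set-url", "origin", fixed]
    return None  # nothing to fix


# Sample text standing in for real `git remote -v` output (example URL is made up)
sample = (
    "origin\tgit://github.com/example/repo.git (fetch)\n"
    "origin\tgit://github.com/example/repo.git (push)\n"
)
print(plan_origin_fix(sample))
```

The returned list could be handed to subprocess.run() from inside the repo's directory, which is essentially the manual fix done by hand.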

I’m sure there’s a command-line one-liner that combines find and bash -c to traverse, and sed and git to update the remote url. (Or even something that directly edits the .git/config files!) But I’m working on my python-fu, so here’s my approach.
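
For the curious, the "directly edit .git/config" idea can be sketched as a text rewrite. This is a rough illustration under assumptions, not something this post relies on: rewrite_config_text is a hypothetical helper, it only transforms a string (reading and writing the actual file is left out), and it rewrites the url line of every remote in the text, not just origin.

```python
import re

# Matches the start of a "url = git://..." line.
# re.MULTILINE makes ^ match at the beginning of every line.
GIT_URL_LINE = re.compile(r"^([ \t]*url[ \t]*=[ \t]*)git://", re.MULTILINE)


def rewrite_config_text(config_text):
    """Return config_text with git:// remote urls switched to https://."""
    return GIT_URL_LINE.sub(r"\1https://", config_text)


# A made-up fragment in the usual .git/config layout
sample_config = (
    '[remote "origin"]\n'
    "\turl = git://github.com/example/repo.git\n"
    "\tfetch = +refs/heads/*:refs/remotes/origin/*\n"
)
print(rewrite_config_text(sample_config))
```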

Below, I’m simply going to start with dummy code and replace step-by-step with real code. My actual process involves lots of print statements and test modes to run the code as I figure out how to do each step.

Dummy code:

# get starting directory
# for every dir under starting directory:
#     if dir is a git repo:
#         fix its origin
#         ignore subdirectories

Solution part 1: Directory traversal

Directory traversal in python is straightforward, thanks to os.walk. All we have to do to use os.walk for this problem is:

import os

topDir = '.'  # placeholder: get starting directory
for root, dirs, files in os.walk(topDir):  # for every dir under starting directory
    # if root is a git repo:
    #     fix its origin
    #     ignore subdirectories
    pass  # placeholder until the steps above are filled in

Given a topDir, os.walk visits topDir and every directory below it (root), descending recursively into each root’s subdirectories (dirs) in turn. For this problem, we’re not concerned with the files in each directory (files).

I’ve put comments in where we need to figure out more code. Let’s tell our code to ignore the subdirectories of gits. (We’ll figure out if a directory is a git next.) To do so, we need to empty out dirs, which is easy if you know how to edit it in-place using slice notation. It has to be in-place: os.walk keeps using the same list object, so rebinding the name with dirs = [] would change nothing.

import os

topDir = '.'  # placeholder: get starting directory
for root, dirs, files in os.walk(topDir):
    is_git_repo = False  # placeholder: if root is a git repo
    if is_git_repo:
        # fix its origin
        dirs[:] = []  # ignore subdirectories
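
To see the in-place trick in isolation, here is a small self-contained sketch. It builds a throwaway directory tree and walks it twice: once emptying dirs with slice assignment, once rebinding the name. The directory names and the walk_roots helper are made up for the demo, and "a directory named repo" stands in for "a git repo".

```python
import os
import tempfile


def walk_roots(top, prune_in_place):
    """Return the directories visited (relative to top), stopping below any 'repo' dir."""
    visited = []
    for root, dirs, files in os.walk(top):
        visited.append(os.path.relpath(root, top))
        if os.path.basename(root) == "repo":
            if prune_in_place:
                dirs[:] = []  # empties the list object os.walk is holding
            else:
                dirs = []  # only rebinds the local name; os.walk never notices
    return sorted(visited)


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "repo", "src"))
    os.makedirs(os.path.join(top, "other"))
    pruned = walk_roots(top, prune_in_place=True)
    unpruned = walk_roots(top, prune_in_place=False)

print(pruned)    # the walk never descends into repo/src
print(unpruned)  # repo/src is still visited
```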

Solution part 2: Settings and documentation

How do we get our starting dir? There are a bunch of simpler ways to do so (hard-coding it, or reading sys.argv directly), but those would run into complications since we also want to access GitHub. Here’s a more complicated but robust solution.

I use docopt, a great Python package which builds your command-line interface from the usage text in your script’s docstring. It means I start with documentation for what I’m trying to do, and I’m already coding without even trying.

"""Usage:
    fixorigin.py DIR
    fixorigin.py (-h|--help)

Search for GitHub repositories in DIR; for every git found under DIR,
make sure remote origin is set to clone_url.

Arguments:
    DIR        root directory to search for gits

Options:
    -h --help  show this screen.
"""
import os

from docopt import docopt

if __name__ == '__main__':
    config = docopt(__doc__)
    topDir = config['DIR']  # get starting directory

    for root, dirs, files in os.walk(topDir):
        is_git_repo = False  # placeholder: if root is a git repo
        if is_git_repo:
            # fix its origin
            dirs[:] = []  # ignore subdirectories

Now our starting directory topDir will be set from the command line!
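
For comparison only, here is roughly the same one-argument interface sketched with argparse from the standard library. This is not what this post uses (docopt is), and parse_args is a hypothetical wrapper so the parser can be exercised with an explicit argument list instead of the real command line.

```python
import argparse


def parse_args(argv=None):
    """Parse a single positional DIR argument (argv=None means use sys.argv)."""
    parser = argparse.ArgumentParser(
        prog="fixorigin.py",
        description="For every git found under DIR, make sure remote origin is set to clone_url.",
    )
    parser.add_argument("DIR", help="root directory to search for gits")
    return parser.parse_args(argv)


args = parse_args(["some/dir"])  # stand-in for running: fixorigin.py some/dir
topDir = args.DIR
print(topDir)
```

The trade-off is the one described above: with argparse the help text is generated from code, while with docopt the code is generated from the help text.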

Solution part 3: Checking if a directory is a git repo

There is a Python package called pygit2 that wraps libgit2, a library implementing much of the functionality of git. It’s a third-party package, so it has to be installed separately. As long as we call import pygit2 up top, this works:

repo_path = pygit2.discover_repository(root)  # also searches root's parent directories
if repo_path:  # if root is a git repo
    repo = pygit2.Repository(repo_path)
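
One thing to keep in mind is that discover_repository also looks in parent directories, so a subdirectory inside a repo counts as "in a repo" too (which is why ignoring subdirectories matters). If you want to experiment without pygit2, a rough stdlib-only stand-in is to check for a .git entry directly. This is a simplification and an assumption on my part, not an equivalent: the helper name is hypothetical and it will not recognize bare repos.

```python
import os
import tempfile


def looks_like_git_worktree(path):
    """Heuristic: True if path directly contains a .git entry."""
    # .git is a directory in an ordinary clone, and a plain file in some setups
    # (submodules, linked worktrees), so test for existence rather than isdir.
    return os.path.exists(os.path.join(path, ".git"))


with tempfile.TemporaryDirectory() as top:
    repo_dir = os.path.join(top, "repo")
    sub_dir = os.path.join(repo_dir, "src")
    os.makedirs(os.path.join(repo_dir, ".git"))  # fake repo marker for the demo
    os.makedirs(sub_dir)
    results = (looks_like_git_worktree(repo_dir), looks_like_git_worktree(sub_dir))

print(results)
```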

Solution part 4: Fixing the origin of a local repo

Here’s some dummy code:

def fix_remote_origin(repo):
    # clone_url = replace 'git://' at front with 'https://' in repo's origin_url
    # git remote set-url origin clone_url
    pass

What’s great is that the Python code for this is just as simple:

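def fix_remote_origin(repo):
    origin_url = repo.remotes['origin'].url
    # str.replace() matches literal text, so no regex '^' anchor here;
    # count=1 limits the swap to the first occurrence
    clone_url = origin_url.replace('git://', 'https://', 1)
    repo.remotes.set_url('origin', clone_url)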
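
The url rewrite is worth checking on its own, because it is easy to get subtly wrong. str.replace() treats its arguments as literal text, so a regex anchor like '^' is searched for as an actual caret character. Here is a minimal sketch showing the difference, using re.sub for the anchored version; to_https is a hypothetical helper name and the url is a made-up example.

```python
import re


def to_https(url):
    """Swap a leading git:// scheme for https://; leave any other url alone."""
    return re.sub(r"^git://", "https://", url)


url = "git://github.com/example/repo.git"

# There is no literal '^' in the url, so this replace changes nothing.
literal_attempt = url.replace("^git://", "^https://")

print(literal_attempt)
print(to_https(url))
```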

Solution part 5: Add error-checking

Error checking is important!

Let’s:

  • make sure that there is an ‘origin’ remote
  • make sure that the ‘origin’ remote has git://

def fix_remote_origin(repo):
    try:
        origin = repo.remotes['origin']
    except KeyError:
        print(f"Repo at {repo.path} does not have 'origin' remote.")
    else:
        if origin.url.startswith('git://'):
            clone_url = origin.url.replace('git://', 'https://', 1)
            repo.remotes.set_url('origin', clone_url)

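Note: we can’t use repo.remotes.has_key('origin') because repo.remotes doesn’t have that functionality. It’s a special collection object, not a dictionary (and has_key() is gone from dictionaries in Python 3 anyway). That’s why the check is a try/except around the lookup.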
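
Since running this against hundreds of real repos is a bit nerve-wracking, it can help to try the logic on stand-ins first. The sketch below is only a test harness under assumptions: FakeRemote, FakeRemotes and FakeRepo are hypothetical classes that imitate just the parts of a pygit2 repo used here (lookup by name raising KeyError, set_url, and a path attribute), and this variant of fix_remote_origin returns True or False so the outcome is easy to check.

```python
class FakeRemote:
    def __init__(self, url):
        self.url = url


class FakeRemotes:
    """Hypothetical stand-in for repo.remotes: lookup by name, plus set_url()."""

    def __init__(self, urls):
        self._remotes = {name: FakeRemote(url) for name, url in urls.items()}

    def __getitem__(self, name):
        return self._remotes[name]  # raises KeyError for a missing remote

    def set_url(self, name, url):
        self._remotes[name].url = url


class FakeRepo:
    def __init__(self, urls, path="fake/.git/"):
        self.remotes = FakeRemotes(urls)
        self.path = path


def fix_remote_origin(repo):
    """Return True if origin was rewritten, False otherwise."""
    try:
        origin = repo.remotes["origin"]
    except KeyError:
        print(f"Repo at {repo.path} does not have 'origin' remote.")
        return False
    if not origin.url.startswith("git://"):
        return False
    repo.remotes.set_url("origin", "https://" + origin.url[len("git://"):])
    return True


wrong = FakeRepo({"origin": "git://github.com/example/repo.git"})
fine = FakeRepo({"origin": "https://github.com/example/repo.git"})
no_origin = FakeRepo({})
outcomes = [fix_remote_origin(r) for r in (wrong, fine, no_origin)]
print(outcomes)
```

Running the function a second time on the already-fixed repo does nothing, which is what you want from a script you might run more than once.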

Solution: Final result

All together:

"""Usage:
    fixorigin.py DIR
    fixorigin.py (-h|--help)

Search for GitHub repositories in DIR; for every git found under DIR,
make sure remote origin is set to clone_url.

Arguments:
    DIR        root directory to search for gits

Options:
    -h --help  show this screen.
"""
import os

import pygit2
from docopt import docopt


def fix_remote_origin(repo):
    try:
        origin = repo.remotes['origin']
    except KeyError:
        print(f"Repo at {repo.path} does not have 'origin' remote.")
    else:
        if origin.url.startswith('git://'):
            # str.replace() is literal (no regex '^'); count=1 touches only the scheme
            clone_url = origin.url.replace('git://', 'https://', 1)
            repo.remotes.set_url('origin', clone_url)


if __name__ == '__main__':
    config = docopt(__doc__)
    topDir = config['DIR']  # get starting directory

    for root, dirs, files in os.walk(topDir):
        repo_path = pygit2.discover_repository(root)
        if repo_path:  # if root is a git repo
            repo = pygit2.Repository(repo_path)
            fix_remote_origin(repo)
            dirs[:] = []  # ignore subdirectories

Awesome!

That was easy(ish).

But what if we really want to check on GitHub that the remote exists, and ask GitHub what the correct clone_url is, instead of assuming that we can simply replace git:// with https://?

That’s not so easy. And will be another blog post.

