Personalized Code Searches using OpenGrok

Many organizations implement a full-featured code search/intelligence tool (e.g., Sourcegraph or Atlassian Fisheye) that lets engineers search the enterprise code base.  Other orgs just use the native search of their hosted version control system, such as GitHub's or GitLab's search.  There are other commercial and open source ways to search code.

Most of us, however, do not regularly work on code spanning the enterprise and would prefer results related to our current activities.  You may also need to search the open source projects incorporated into your work.  If this describes you, running OpenGrok on your laptop with a limited set of projects might be a solution.

If you’ve never heard of OpenGrok, it is a fully functioning open source code search/intelligence tool that works with many version control systems and multiple languages.

Prerequisites

These instructions were tested using Docker Desktop and should work with other container runtimes (e.g., Colima, Podman, or containerd), though slight adjustments might be necessary.  A helpful page can be found here.

If necessary, install the container runtime of your choice.

Create Docker Container

OpenGrok automatically pulls changes and rebuilds its indexes, hourly by default.

While open source projects usually do not restrict read-only access to their code, organizations secure their code to protect their intellectual property.  Allowing OpenGrok to access a private GitHub repository requires providing it with a valid SSH private key associated with your GitHub account.  A customized Docker container is required to achieve this.

Clone OpenGrok

The entire OpenGrok code base is in GitHub.  Clone the repo.

scsosna@mymachine src % git clone https://github.com/oracle/opengrok.git

Change Directory

Your current working directory must be the just-cloned directory.

scsosna@mymachine src % cd opengrok

Copy SSH Key

These instructions are for accessing a git repository via SSH authentication, such as GitHub.

Copy the private key associated with your GitHub account from your .ssh directory into the opengrok directory.  For this example, the file is named id_ed25519.

scsosna@mymachine opengrok % cp ~/.ssh/id_ed25519 .
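Before building, it can be worth confirming that the copied key is readable only by your user, since ssh refuses to use private keys whose permissions are too open.  The following is a minimal sketch; key_is_private is a hypothetical helper name, not part of OpenGrok or OpenSSH.

```shell
# Hypothetical helper: succeeds only if the file exists and grants
# no permissions to group or others (for example mode 600 or 400).
key_is_private() {
    key_file="$1"
    [ -f "$key_file" ] || return 1
    # Characters 5-10 of the `ls -l` mode string are the group and other bits.
    group_other=$(ls -ld -- "$key_file" | cut -c5-10)
    [ "$group_other" = "------" ]
}
```

A possible use from the opengrok directory: key_is_private id_ed25519 || chmod 600 id_ed25519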

Modify Dockerfile

Two changes are required to the provided Dockerfile:

  • Remove unused version control systems: most common version control systems are supported, but unused ones may be removed from your rebuilt container;
  • Add the SSH key: the SSH key copied into the OpenGrok repository needs to be explicitly included into the built container. 

This git patch can be applied directly to the cloned OpenGrok repo.

diff --git a/Dockerfile b/Dockerfile
index ae0b52d76e..e1e3771a1a 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -55,9 +55,12 @@ RUN echo 'deb http://package.perforce.com/apt/ubuntu bionic release' > /etc/apt/
 # install dependencies and Python tools
 # hadolint ignore=DL3008,DL3009
 RUN apt-get update && \
-    apt-get install --no-install-recommends -y git subversion mercurial cvs cssc bzr rcs rcs-blame helix-p4d \
+    apt-get install --no-install-recommends -y git \
     unzip inotify-tools python3 python3-pip \
     python3-venv python3-setuptools openssh-client
+RUN mkdir -p /root/.ssh
+COPY id_ed25519 /root/.ssh/id_ed25519
+RUN chmod 700 /root/.ssh && chmod 600 /root/.ssh/id_ed25519
 
 # compile and install universal-ctags
 # hadolint ignore=DL3003,DL3008
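One way to apply the patch, assuming it has been saved to a file (opengrok-ssh.patch is a made-up name), is to dry-run it with git apply --check first.  If the upstream Dockerfile has changed since the patch was written, the mismatch is then reported before anything is modified.  apply_patch_checked is a hypothetical helper, sketched here:

```shell
# Hypothetical helper: verify that the patch applies cleanly, then apply it.
# `git apply --check` reports problems without touching any files.
apply_patch_checked() {
    patch_file="$1"
    git apply --check "$patch_file" && git apply "$patch_file"
}
```

From the cloned opengrok directory this might be invoked as: apply_patch_checked opengrok-ssh.patch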

Build

scsosna@mymachine opengrok % docker build -t opengrok .

Clone Repos

The container is started with a fully qualified path to the directory that OpenGrok should index.

I recommend using a dedicated, separate directory into which the repos of interest are cloned.  OpenGrok processes each subdirectory as a separate project, pulls recent changes, and reindexes.

scsosna@mymachine opengrok % cd ~/data/src
scsosna@mymachine src % mkdir repos
scsosna@mymachine src % cd repos
scsosna@mymachine repos % git clone git@github.com:spring-projects/spring-boot.git
scsosna@mymachine repos % git clone git@github.com:spring-projects/spring-framework.git
scsosna@mymachine repos % git clone git@github.com:spring-projects/spring-security.git
scsosna@mymachine repos % git clone git@github.com:square/retrofit.git
scsosna@mymachine repos % git clone git@github.com:scsosna99/neo4j-gradle-dependencies.git
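If the list of repos grows, the clone commands above could be wrapped in a small loop that skips anything already present, so it can be re-run safely.  This is only a sketch; repo_dir_name and clone_missing are hypothetical helper names.

```shell
# Derive the directory name git would create for a clone URL,
# e.g. git@github.com:square/retrofit.git -> retrofit
repo_dir_name() {
    basename "$1" .git
}

# Clone each URL given as an argument into the current directory,
# skipping any repo whose directory already exists.
clone_missing() {
    for url in "$@"; do
        dir=$(repo_dir_name "$url")
        if [ -d "$dir" ]; then
            echo "skip $dir (already cloned)"
        else
            git clone "$url" "$dir"
        fi
    done
}
```

Run from the repos directory, for example: clone_missing git@github.com:square/retrofit.git git@github.com:spring-projects/spring-boot.git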

Start OpenGrok

Run Container

The command is simple: provide the local directory in which the repos to index are located.  In my example, it’s ~/data/src/repos.

scsosna@mymachine repos % docker run -d -v ~/data/src/repos:/opengrok/src -p 8080:8080 opengrok

Note: -p 8080:8080 maps port 8080 on your machine to port 8080 in the container; the host port comes first, then the container port.  If 8080 is unavailable on your machine, choose another host port, preferably a non-privileged port above 1023, such as -p 1234:8080.
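The host port and the source directory are the two values most likely to differ between machines, so the run command could be wrapped as sketched below.  start_opengrok and the DOCKER override are hypothetical conveniences, and --name opengrok is an optional addition that gives the container a fixed name instead of a generated one such as bold_chandrasekhar.

```shell
# Hypothetical wrapper around the run command shown above.
#   $1 = host port (default 8080); the container side stays 8080
#   $2 = directory holding the cloned repos (default ~/data/src/repos)
# Set DOCKER=echo to print the arguments instead of starting a container.
start_opengrok() {
    host_port="${1:-8080}"
    src_dir="${2:-$HOME/data/src/repos}"
    "${DOCKER:-docker}" run -d --name opengrok \
        -v "$src_dir:/opengrok/src" \
        -p "$host_port:8080" \
        opengrok
}
```

For example, start_opengrok 9090 would publish OpenGrok at http://localhost:9090 while still indexing ~/data/src/repos.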

Update Known Hosts

Initially, GitHub’s host key is not in the container’s list of known hosts and needs to be confirmed.  OpenGrok will attempt its git pull, but the command fails because the host is still unrecognized.  From a shell inside the container, execute a manual git pull and approve the host key when prompted.  You’ll only need to do this once per container.

The docker ps command shows the names of running containers, in our example bold_chandrasekhar.  After opening a shell in the container, change to any source repo and execute a git pull.

scsosna@mymachine repos % docker ps                         
CONTAINER ID   IMAGE      COMMAND               CREATED          STATUS          PORTS                    NAMES
a77a49d1d675   opengrok   "/scripts/start.py"   58 seconds ago   Up 56 seconds   0.0.0.0:8080->8080/tcp   bold_chandrasekhar
scsosna@mymachine repos % docker exec -it bold_chandrasekhar bash
root@a77a49d1d675:/usr/local/tomcat# cd /opengrok/src
root@a77a49d1d675:/opengrok/src# cd user-service
root@a77a49d1d675:/opengrok/src/user-service# git pull
The authenticity of host 'github.com (140.82.112.3)' can't be established.
ED25519 key fingerprint is SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com' (ED25519) to the list of known hosts.
Already up to date.
root@a77a49d1d675:/opengrok/src/user-service# exit
scsosna@mymachine repos %
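A non-interactive alternative to answering the prompt might be to fetch the host key with ssh-keyscan inside the container and append it to known_hosts only if its fingerprint matches the one shown in the session above.  This is a sketch under that assumption: fingerprint_ok and trust_github are hypothetical helpers, and the expected fingerprint should be checked against the values GitHub publishes.

```shell
# Fingerprint copied from the session above; verify it against GitHub's
# published SSH key fingerprints before relying on it.
EXPECTED_FP='SHA256:+DiY3wvvV6TuJJhbpZisF/zLDA0zPMSvHdkr4UvCOqU'

# Succeeds when a line of `ssh-keygen -lf` output (bits, fingerprint,
# comment, type) carries the expected fingerprint in its second field.
fingerprint_ok() {
    [ "$(printf '%s\n' "$1" | awk '{print $2}')" = "$EXPECTED_FP" ]
}

# Fetch GitHub's ED25519 host key and trust it only if the fingerprint matches.
trust_github() {
    scanned=$(mktemp)
    ssh-keyscan -t ed25519 github.com > "$scanned" 2>/dev/null
    if fingerprint_ok "$(ssh-keygen -lf "$scanned")"; then
        cat "$scanned" >> "$HOME/.ssh/known_hosts"
        rm -f "$scanned"
    else
        echo "fingerprint mismatch; host key not added" >&2
        rm -f "$scanned"
        return 1
    fi
}
```

The functions would be pasted into the container shell opened with docker exec, followed by a call to trust_github.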

Restart Container

Since OpenGrok indexes every hour, you can wait an hour for the next re-indexing or restart the container to force an immediate re-index.

scsosna@mymachine repos % docker restart bold_chandrasekhar
bold_chandrasekhar
scsosna@mymachine repos %

Browse OpenGrok

Navigate to http://localhost:8080 to access OpenGrok:

Select the project(s) to include in your search:

Enter a search and review the results:

Conclusions

I’m a long-time advocate of OpenGrok but hadn’t used it recently, so I was pleasantly surprised to see how easy it is to set up in a local environment using Docker.

Using it locally in this manner has definitely helped productivity, for the reasons described: I’m only interested in a subset of the enterprise code base.

I have noticed that the indexes appear to become corrupted over time, which requires starting a fresh container, but that effort is fairly minor.