Elevator Pitch
With Go modules, the dependency trees of our projects have become … complicated. Other tools will show you your dependencies but not which things depend on you. To fix that, [tool] provides a graph database and an API for storing and querying module dependencies, both ancestors and descendants.
Description
Overview
At [Company], the move to Go Modules from our existing GOPATH-mode monorepo has brought with it some
pain points, especially around tracing descendant dependencies. In the GOPATH days, when our monorepo
lived under $GOPATH/src and all engineers always had a full copy of the entire codebase, it was a
straightforward grep command to find all imports of a given package to see what other packages depend on it.
Now that we’ve moved the majority of our development effort to modules, neither of those two conditions
is true. Engineers are no longer required to have the code under $GOPATH/src and almost no one has
the entire codebase locally anymore. We have dozens of functional teams working on hundreds of microservices,
so most developers have pared down their local workspace to only the things they’re directly working
on day to day.
An unfortunate side effect of this paradigm shift has been that there is no longer a direct way to see
which other modules depend on your work. Existing tools like go mod graph and go list -m all will
show you what modules you depend on - with some rough edges - and the pkg.go.dev site has an Imported By
view that shows what other packages depend on your code. The go tool won’t show which things depend
on you, though. The pkg.go.dev site can only show things that it knows about, so it won’t help for
private modules, and it doesn’t show you which versions of those other packages depend on your code.
In this talk, [name] will be showing off [tool], which [Company] built to fill in some of these
gaps. [tool] provides a graph of versioned links between modules along with an API to populate the
database and queries to answer questions like “Which of our modules depend on v1.42.0 of foo?” and
“How many things are still referencing v0.11.38 of bar?”.
Outline
Introduction (3 minutes)
Code searches were simpler in GOPATH mode, where all source code was found under a common parent folder.
[Company] also has a large number of microservices, hundreds in fact, along with nearly a hundred
“library” modules for various things. Add in the service-to-service dependencies and various third-party
OSS modules and the number of links between all of those things gets untenable pretty quickly.
Existing Tooling (3 minutes)
Unfortunately, the go CLI commands, the pkg.go.dev site, and OSS tools like goda and godepgraph
don’t quite cover the ground we need, specifically querying for downstream dependencies. The go
CLI, goda, and godepgraph all do an excellent job of surfacing which modules your code depends on
in multiple ways. The pkg.go.dev site also provides a nice Imported By view, but it shows which
packages depend on your code, not which modules, and doesn’t include the version(s) of those dependents.
[tool] To The Rescue
The Database (5 minutes)
For simplicity, [tool] uses a PostgreSQL database rather than an actual graph database like Neo4j,
Cayley, and the like. After some initial investigation, we found that the relatively small number of
entries (compared to other graph datasets) didn’t warrant a specialized graph database. Additionally,
our IT organization already has all of the necessary infrastructure in place to support PostgreSQL,
both in-house and hosted in AWS.
The Service (7 minutes)
Service Architecture
The [service name] service exposes a gRPC API, along with JSON/REST mappings using the gRPC Gateway
project for exposure to web-based consumers. Both endpoints, plus a very basic web UI for testing,
are served on a single port using cmux.
Service Operations
Updates to the graph are made by submitting a request with the name and version of a module, along with a list of its dependencies (name and version). This request is converted into a set of “edges” in the graph linking the name/version pairs. Because a tagged module version is frozen in Go, its dependency list does not change, so any existing data for the submitted module version is first removed and then replaced with the submitted edges.
Example update:
curl --request POST http://localhost/api/graph-update --header 'Content-Type: application/json' --data-raw '{"module": "github.com/example/foo@v1.2.3", "deps": ["github.com/rs/zerolog@v1.27.0", "golang.org/x/text@v0.0.0-...."]}'
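As a rough illustration of the update step, the following Go sketch decodes a body shaped like the example above and replaces the stored edges for that module version. It is not [tool]'s actual implementation: the type names (ModuleVersion, UpdateRequest, Graph), the Apply method, and the in-memory map standing in for the PostgreSQL edge table are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ModuleVersion identifies one version of a module (hypothetical type).
type ModuleVersion struct {
	Path, Version string
}

// UpdateRequest mirrors the JSON body of the example update; the Go field
// names are assumptions, the JSON keys come from the example.
type UpdateRequest struct {
	Module string   `json:"module"`
	Deps   []string `json:"deps"`
}

// parseModuleVersion splits "path@version" at the last "@".
func parseModuleVersion(s string) (ModuleVersion, error) {
	i := strings.LastIndex(s, "@")
	if i <= 0 || i == len(s)-1 {
		return ModuleVersion{}, fmt.Errorf("want path@version, got %q", s)
	}
	return ModuleVersion{Path: s[:i], Version: s[i+1:]}, nil
}

// Graph is an in-memory stand-in for the edge table.
type Graph struct {
	edges map[ModuleVersion][]ModuleVersion
}

// NewGraph returns an empty graph.
func NewGraph() *Graph {
	return &Graph{edges: make(map[ModuleVersion][]ModuleVersion)}
}

// Apply validates the request, then replaces every edge leaving req.Module
// with one edge per listed dependency.
func (g *Graph) Apply(req UpdateRequest) error {
	from, err := parseModuleVersion(req.Module)
	if err != nil {
		return err
	}
	deps := make([]ModuleVersion, 0, len(req.Deps))
	for _, d := range req.Deps {
		to, err := parseModuleVersion(d)
		if err != nil {
			return err
		}
		deps = append(deps, to)
	}
	g.edges[from] = deps // old edges for this module version are dropped here
	return nil
}

func main() {
	body := `{"module": "github.com/example/foo@v1.2.3", "deps": ["github.com/rs/zerolog@v1.27.0"]}`
	var req UpdateRequest
	if err := json.Unmarshal([]byte(body), &req); err != nil {
		fmt.Println("decode:", err)
		return
	}
	g := NewGraph()
	if err := g.Apply(req); err != nil {
		fmt.Println("apply:", err)
		return
	}
	for from, deps := range g.edges {
		for _, to := range deps {
			fmt.Printf("%s@%s -> %s@%s\n", from.Path, from.Version, to.Path, to.Version)
		}
	}
}
```

Validating every dependency before touching the map means a malformed request leaves the previously stored edges intact.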
Queries consist of a target module, an optional version, and a direction (ancestors or descendants). An “ancestor” query returns the set of modules that the target module depends on and a “descendant” query returns the set of modules that depend on the target. In either case, if no version is specified for the target, the most recent known version is used.
Example query:
curl --request GET 'http://localhost/api/graph-query?module=github.com%2Fexample%2Ffoo%40v1.2.3&mode=descendant'
The format of the response is one of:
- a JSON document containing the graph of dependencies as nested objects
- a DOT file describing the graph
- a rendered PNG or SVG image of the graph
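The two query directions described above can be sketched in Go as a breadth-first walk over the edge set. This is a minimal sketch under assumptions: the Edge and Direction types, the Query function, and the example.com module names are hypothetical, the edges live in a slice rather than in PostgreSQL, and the walk is assumed to be transitive (it follows links past the target's immediate neighbors).

```go
package main

import (
	"fmt"
	"sort"
)

// Direction selects which way a query walks the graph.
type Direction int

const (
	Ancestors   Direction = iota // modules the target depends on
	Descendants                  // modules that depend on the target
)

// Edge records that module version From depends directly on To.
// Both ends are "path@version" strings.
type Edge struct{ From, To string }

// Query returns every module version reachable from target in the given
// direction, sorted for stable output.
func Query(edges []Edge, target string, dir Direction) []string {
	// Build an adjacency list that points the way we want to walk.
	next := make(map[string][]string)
	for _, e := range edges {
		if dir == Ancestors {
			next[e.From] = append(next[e.From], e.To)
		} else {
			next[e.To] = append(next[e.To], e.From)
		}
	}

	// Breadth-first search, tracking visited nodes so shared dependencies
	// are only reported once.
	seen := map[string]bool{target: true}
	queue := []string{target}
	var out []string
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		for _, n := range next[cur] {
			if !seen[n] {
				seen[n] = true
				out = append(out, n)
				queue = append(queue, n)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := []Edge{
		{"example.com/svc@v1.0.0", "example.com/lib@v1.42.0"},
		{"example.com/api@v2.1.0", "example.com/lib@v1.42.0"},
		{"example.com/lib@v1.42.0", "example.com/util@v0.3.0"},
	}
	fmt.Println(Query(edges, "example.com/lib@v1.42.0", Descendants))
	fmt.Println(Query(edges, "example.com/lib@v1.42.0", Ancestors))
}
```

Because each edge carries the version on both ends, a descendant query for a specific version only reports the dependents that reference exactly that version.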
Updating the Graph (4 minutes)
The [tool] project also contains a CLI tool, [tool name]. This program walks each module repository
in [Company]’s VCS and, for each SemVer tag on each module, uses a combination of go mod graph and
go list -m all to build up the tree of that module’s dependencies. Then, for each node in the tree,
it submits a graph update to the [service name] service for that node and its immediate children.
In this way, we were able to build up the information needed for descendant queries using standard Go
tooling and a small amount of “elbow grease”. At [Company], we stopped the walk at our own service
and library modules, but [tool name] could query OSS repos and aggregate their dependencies just as easily.
Demo (1 minute)
Spin up [service name] with a pre-seeded database of selected modules and show the results of several queries.
Questions and Future Improvements (1-2 minutes, can cut or rush)
Talk about areas for improvement including:
- A nicer UI
- A GraphQL API
- An “updated at” timestamp for each module/version to skip processing for modules that haven’t changed
Questions from the audience
Notes
This talk will be relevant to all Gophers since we all have to work with modules and manage our dependencies. It will be especially useful to project maintainers who want to be conscientious about managing changes or to those who enjoy graph visualizations.
Why Now?
Go Modules has been enabled by default since Go 1.16. Practically all new Go code is using modules and an ever-growing percentage of the pre-modules Go ecosystem has been converted. With the move to modules, however, the matrix of dependencies for any given Go project becomes significantly more complex. There are gaps in the current language tooling that can be addressed in future Go releases but developers need answers today.
Why Me?
[name] has been writing Go full-time for 6 years and has been working with Go Modules since they were first introduced as vgo in 2018. For the last two years, [name] has been instrumental in converting a large codebase (300+ microservices and nearly 100 library modules) at [Company] over to Go Modules and working with the rest of their development team to help them along. During that time, they ran into and resolved many different issues related to both the new paradigms introduced by Go modules and to [Company]’s own pre-existing tech debt. [name] is also a contributor to several prominent open-source Go projects and a frequent contributor to conversations and debates in the Gophers Slack, so they are extremely familiar with the … changes … that Go Modules has introduced into the community.