Writing multi-package analysis tools for Go

In my posts about embedding in Go
last month, I provided multiple examples of different kinds of embeddings from
the Go standard library. How did I find these examples?

I wish I could say it all comes from a deep familiarity with the breadth and
depth of the standard library; instead, I combined the programming virtues of
laziness and impatience
and wrote a tool that found these examples for me.

In this post, I’m going to describe this tool and how you may go about writing
such tools of your own to analyze real-world Go codebases to glean any insights
you may be interested in.

The task

Let’s start by describing the requirement: we’re interested in finding all
instances of embeddings in Go code, and moreover – we’d like to know what kind
of embedding each one is and call it out in some way; i.e. distinguish
interface-in-interface embeddings from struct-in-struct embeddings, and so on.

I wrote earlier about the various compilation steps Go source code goes through.
Many of these are available for Go tool writers as well, and it’s worth spending
a bit of time thinking about the level of information we need for our tool. For
a deeper exploration of what it takes to analyze Go source code, I highly
recommend reading this document.

Just parsing the Go source code of a project won’t do, because we’ll need type
information. Take this example struct from part 3
of the embedding post:

type StatsConn struct {
	net.Conn

	BytesRead uint64
}

We can figure out that net.Conn is an embedding from parsing this code and
looking at the AST. But what kind of embedding is it? Is net.Conn an
interface or a struct? For this, we’ll have to run the AST through Go type
checking; moreover, in the general case this ought to be cross-package, or
even cross-module type checking because the embedded type net.Conn could
be defined in a different package or module. Therefore, our tool should
be able to perform cross-module type checking. If this sounds tricky, that’s
because it is! But worry not, Go has just the package to help us.
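To make the limitation concrete, here is a minimal sketch (not part of the tool itself) that only parses the StatsConn example with the standard go/parser package. The helper name embeddedFieldExprs and the constant statsConnSrc are made up for this illustration. The parser is enough to spot the unnamed field, but all it yields is the expression net.Conn; nothing in the AST says whether that name denotes an interface or a struct.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// statsConnSrc is the example struct from the text, wrapped in a package
// clause so that it parses as a complete file.
const statsConnSrc = `package demo

import "net"

type StatsConn struct {
	net.Conn

	BytesRead uint64
}
`

// embeddedFieldExprs (a hypothetical helper) parses src without any type
// checking and returns the type expression of every unnamed struct field.
func embeddedFieldExprs(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}

	var found []string
	ast.Inspect(file, func(n ast.Node) bool {
		st, ok := n.(*ast.StructType)
		if !ok {
			return true
		}
		for _, field := range st.Fields.List {
			if len(field.Names) == 0 { // no name: this field is an embedding
				// ExprString only renders the syntax; it needs no type info.
				found = append(found, types.ExprString(field.Type))
			}
		}
		return true
	})
	return found, nil
}

func main() {
	exprs, err := embeddedFieldExprs(statsConnSrc)
	if err != nil {
		panic(err)
	}
	// We learn the name of what is embedded, but not its kind.
	fmt.Println(exprs)
}
```

Note that the "net" import is never resolved here: parsing succeeds without looking at the net package at all, which is exactly why the kind of net.Conn stays unknown at this stage.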

x/tools/go/packages

Enter x/tools/go/packages, which I’ll refer
to as XTGP from this point on. This package is a one-stop-shop for loading
Go packages for analysis. It does all the heavy lifting for tool writers,
leaving us with just the “business logic” of the tool to write – the analysis
itself. For a given package, XTGP will:

- Parse and type check the package, providing access to the AST and full type
  information.
- Optionally load all of the package’s dependencies, type checking them as
  well.

XTGP is the newest (2018) in a sequence of similar packages, and
has by now replaced the other approaches as the “one true recommended way”
for multi-package analysis. It’s also used as the basis for x/tools/go/analysis, upon which tools
like go vet are now built. In this post I’ll show how to write my tool using
both “vanilla” XTGP and the go/analysis framework.

Finding embeddings

It’s time to show the code of the “find embeddings” tool. The full source code
is available on GitHub.
We’ll start with the setup for configuring XTGP:

import (
	"flag"
	"fmt"
	"go/token"
	"log"

	"golang.org/x/tools/go/packages"
)

// Load names, type information and syntax trees for the matched packages.
const mode packages.LoadMode = packages.NeedName |
	packages.NeedTypes |
	packages.NeedSyntax |
	packages.NeedTypesInfo

func main() {
	flag.Usage = func() {
		out := flag.CommandLine.Output()
		fmt.Fprintln(out, "usage: find-embeddings [options] <module dir>\n")
		fmt.Fprintln(out, "Options:")
		flag.PrintDefaults()
	}

	pattern := flag.String("pattern", "./...", "Go package pattern")
	flag.Parse()
	if flag.NArg() != 1 {
		log.Fatal("Expecting a single argument: directory of module")
	}

	var fset = token.NewFileSet()
	cfg := &packages.Config{Fset: fset, Mode: mode, Dir: flag.Args()[0]}
	pkgs, err := packages.Load(cfg, *pattern)
	if err != nil {
		log.Fatal(err)
	}

	for _, pkg := range pkgs {
		findInPackage(pkg, fset)
	}
}

The main entry point to XTGP is packages.Load, which takes a
packages.Config object for configuration. The most important field to pay
attention to is Mode, which
specifies what XTGP should load. It’s tempting to just ask for “everything”,
but this isn’t necessarily the best approach in the general case, as it may take
quite a while for large projects. For example, in our case we don’t need
NeedImports | NeedDeps, which would bring in the type-checked ASTs of all
the transitive dependencies of our code. This is an expensive operation, as you
can imagine! All we need for our tool is to look at dependencies sufficiently
to glean the type information of their exported types; luckily, in Go this
information is available cheaply (to support Go’s famously fast parallel
compilation process).
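The idea that a dependency only has to contribute type information can be sketched with the standard library alone. In the example below the type checker asks a types.Importer for the *types.Package of an imported path and never sees the dependency's AST. Everything here is an assumption made for illustration: mapImporter, typeCheck, embeddedKinds, the example.com import paths and both source snippets are hypothetical, and this is not a description of how XTGP works internally.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// mapImporter is a hypothetical in-memory types.Importer: it hands out
// already type-checked packages by import path.
type mapImporter map[string]*types.Package

func (m mapImporter) Import(path string) (*types.Package, error) {
	if pkg, ok := m[path]; ok {
		return pkg, nil
	}
	return nil, fmt.Errorf("package %q not found", path)
}

const depSrc = `package conn

type Conn interface {
	Read(p []byte) (int, error)
}
`

const hostSrc = `package host

import "example.com/conn"

type StatsConn struct {
	conn.Conn

	BytesRead uint64
}
`

// typeCheck parses and type checks a single-file package.
func typeCheck(fset *token.FileSet, path, src string, imp types.Importer) (*types.Package, error) {
	file, err := parser.ParseFile(fset, path+".go", src, 0)
	if err != nil {
		return nil, err
	}
	conf := types.Config{Importer: imp}
	return conf.Check(path, fset, []*ast.File{file}, nil)
}

// embeddedKinds type checks the dependency first, then the host package,
// and reports the kind of every embedded field of StatsConn.
func embeddedKinds() (map[string]string, error) {
	fset := token.NewFileSet()
	dep, err := typeCheck(fset, "example.com/conn", depSrc, nil)
	if err != nil {
		return nil, err
	}
	host, err := typeCheck(fset, "example.com/host", hostSrc,
		mapImporter{"example.com/conn": dep})
	if err != nil {
		return nil, err
	}

	st, ok := host.Scope().Lookup("StatsConn").Type().Underlying().(*types.Struct)
	if !ok {
		return nil, fmt.Errorf("StatsConn is not a struct")
	}
	kinds := make(map[string]string)
	for i := 0; i < st.NumFields(); i++ {
		f := st.Field(i)
		if !f.Embedded() {
			continue
		}
		switch f.Type().Underlying().(type) {
		case *types.Interface:
			kinds[f.Name()] = "interface"
		case *types.Struct:
			kinds[f.Name()] = "struct"
		}
	}
	return kinds, nil
}

func main() {
	kinds, err := embeddedKinds()
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds)
}
```

The host package is checked against the dependency's exported types only, which mirrors the reason the tool can leave NeedDeps out of its load mode.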

Once we have the packages loaded, we get a slice of packages.Package values,
through which we can perform our analysis. We invoke findInPackage for
each such package.

func findInPackage(pkg *packages.Package, fset *token.FileSet) {
	for _, fileAst := range pkg.Syntax {
		ast.Inspect(fileAst, func(n ast.Node) bool {
			if structTy, ok := n.(*ast.StructType); ok {
				findInFields(structTy.Fields, n, pkg.TypesInfo, fset)
			} else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
				findInFields(interfaceTy.Methods, n, pkg.TypesInfo, fset)
			}

			return true
		})
	}
}

This function has two important tasks:

1. Invoke ast.Inspect to run a visitor function on every AST node in the
   package. Our visitor focuses on either an *ast.StructType or
   *ast.InterfaceType to look deeper into struct/interface declarations.
2. Deal with a difference in how struct vs. interface fields are
   accessed (Fields field for *ast.StructType, Methods field
   for *ast.InterfaceType).

Let’s move on to findInFields:

func findInFields(fl *ast.FieldList, n ast.Node, tinfo *types.Info, fset *token.FileSet) {
	type FieldReport struct {
		Name string
		Kind string
		Type types.Type
	}
	var reps []FieldReport

	for _, field := range fl.List {
		// A field without names is an embedding.
		if field.Names == nil {
			tv, ok := tinfo.Types[field.Type]
			if !ok {
				log.Fatalf("type not found for %v", field.Type)
			}

			embName := fmt.Sprintf("%v", field.Type)

			_, hostIsStruct := n.(*ast.StructType)
			var kind string

			// The tag in parentheses reads <embedded kind>@<host kind>.
			switch typ := tv.Type.Underlying().(type) {
			case *types.Struct:
				if hostIsStruct {
					kind = "struct (s@s)"
				} else {
					kind = "struct (s@i)"
				}
				reps = append(reps, FieldReport{embName, kind, typ})
			case *types.Interface:
				if hostIsStruct {
					kind = "interface (i@s)"
				} else {
					kind = "interface (i@i)"
				}
				reps = append(reps, FieldReport{embName, kind, typ})
			default:
			}
		}
	}

	if len(reps) > 0 {
		// nodeString is a helper that is not shown in this excerpt.
		fmt.Printf("Found at %v\n%v\n", fset.Position(n.Pos()), nodeString(n, fset))

		for _, report := range reps {
			fmt.Printf("--> field '%s' is embedded %s: %s\n", report.Name, report.Kind, report.Type)
		}
		fmt.Println("")
	}
}

This function is conceptually simple; it iterates over a slice of fields,
focusing only on fields that are unnamed (i.e. embedded). For each field, it
looks at its underlying type [1] and its kind – is it a struct type, or an
interface type? This is where inter-package type analysis is critical, because
in the general case we have no way of knowing the type of fields without
understanding the types imported from other packages.
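The classification step can be tried in isolation with only the standard library. The sketch below is a simplified stand-in for the tool rather than its actual code: it type checks one in-memory file that has no imports (so no importer is needed) and applies the same Underlying() type switch. The function classifyEmbeddings, the constant demoSrc and the "X in Y" labels are made up for this example, and unlike the tool it only looks at named type declarations.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

const demoSrc = `package demo

type Reader interface{ Read(p []byte) (int, error) }
type Base struct{ ID int }

type Host struct {
	Reader
	Base
	Name string
}

type ReadCloser interface {
	Reader
	Close() error
}
`

// classifyEmbeddings (hypothetical) maps "HostType.EmbeddedType" to a label
// describing the kind of embedding, for every named type declared in src.
func classifyEmbeddings(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "demo.go", src, 0)
	if err != nil {
		return nil, err
	}

	// Ask the type checker to record the type of every type expression.
	info := &types.Info{Types: make(map[ast.Expr]types.TypeAndValue)}
	conf := types.Config{}
	if _, err := conf.Check("demo", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}

	kinds := make(map[string]string)
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok {
			return true
		}
		var fields *ast.FieldList
		var host string
		switch t := spec.Type.(type) {
		case *ast.StructType:
			fields, host = t.Fields, "struct"
		case *ast.InterfaceType:
			fields, host = t.Methods, "interface"
		default:
			return true
		}
		for _, field := range fields.List {
			if len(field.Names) != 0 {
				continue // named field or method, not an embedding
			}
			tv, ok := info.Types[field.Type]
			if !ok {
				continue
			}
			key := spec.Name.Name + "." + types.ExprString(field.Type)
			switch tv.Type.Underlying().(type) {
			case *types.Struct:
				kinds[key] = "struct in " + host
			case *types.Interface:
				kinds[key] = "interface in " + host
			}
		}
		return true
	})
	return kinds, nil
}

func main() {
	kinds, err := classifyEmbeddings(demoSrc)
	if err != nil {
		panic(err)
	}
	for key, kind := range kinds {
		fmt.Printf("%s: %s\n", key, kind)
	}
}
```

Reader and Base are named types, so the type switch only matches because Underlying() strips the name and exposes the interface or struct behind it.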

This is it! There’s a bit of extra logic in findInFields to collect all
embedded fields of a given struct/interface into a single place, but otherwise
it does what we need – including distinguishing between the kinds of embedding.
This simple tool can now be run on the Go standard library or real-world large
projects (like k8s or hugo) and report all the embeddings found therein.

Finding embeddings using go/analysis

The example shown above uses the “raw” XTGP API to load packages. An alternative
approach is to use the go/analysis framework, which saves us from some of
the boilerplate:

import (
	"go/ast"

	"golang.org/x/tools/go/analysis"
	"golang.org/x/tools/go/analysis/singlechecker"
)

var EmbedAnalysis = &analysis.Analyzer{
	Name: "embedanalysis",
	Doc:  "reports embeddings",
	Run:  run,
}

func main() {
	singlechecker.Main(EmbedAnalysis)
}

func run(pass *analysis.Pass) (interface{}, error) {
	for _, file := range pass.Files {
		ast.Inspect(file, func(n ast.Node) bool {
			if structTy, ok := n.(*ast.StructType); ok {
				findInFields(structTy.Fields, n, pass.TypesInfo, pass.Fset)
			} else if interfaceTy, ok := n.(*ast.InterfaceType); ok {
				findInFields(interfaceTy.Methods, n, pass.TypesInfo, pass.Fset)
			}

			return true
		})
	}

	return nil, nil
}

Note how short the main function becomes; by delegating to the
go/analysis framework, we no longer need to explicitly initialize
go/packages or handle command-line flags. The singlechecker helper
from go/analysis does this for us.

The rest of the code is very similar to the previous sample. run is the
moral equivalent of findInPackage and does pretty much the same work, except
that it has to operate on pass.Files instead of pkg.Syntax. It invokes
findInFields for every struct or interface, and this function is exactly
the same as shown above.

Conclusion

Given the two slightly different approaches to achieve the same goal, which one
should you choose?

Pros of using XTGP directly:

- More flexibility in how go/packages is configured and how the command-line
  interface (flags, etc.) for the tool is defined.
- Less magic and fewer black boxes in the process.

Pros of using go/analysis:

- Slightly less code to write; if you’re writing many different analyses, this
  may add up.
- Interoperability with other analyses; there’s a rich set of analysis passes
  available for use, and go/analysis makes it easy to chain passes and
  pass information between them.

Whichever way you choose, it’s comforting to know that Go has powerful tooling
support that makes it relatively easy to write tools to analyze whole codebases.
This tooling framework handles the most tedious part of tool-writing: figuring
out how the project is assembled from multiple modules and packages [2]. It
does the heavy lifting, leaving the tool writer with only the “business logic”
of the analysis to implement.

For writing the business logic, we have a fully type-checked AST at our
disposal. ASTs are the starting point for most real-world compilers, and if you
need some specialized IR – this can typically be constructed from a type-checked
AST. For example, if your analysis needs the program in SSA form you can use
x/tools/go/ssa to create SSA straight from the type-checked packages XTGP
returns. But… I’m getting carried away here, as this is a topic for another
time.

Happy tool-writing!

[1]
The concept of underlying types helps us see through named types or
aliases. For example, if we have var k Foo and elsewhere
type Foo int, then we know that the underlying type of k is
int.

[2]
In the C++ world this is similar to compilation databases.
