Passing callbacks and pointers to Cgo

Cgo enables Go programs to invoke C
libraries or any other library that exposes a C API. As such, it’s an important
part of a Go programmer’s toolbox.

Using Cgo can be tricky, however, especially when passing pointers and callback
functions between Go and C code. This post discusses an end-to-end
example that covers:

Basic usage of Cgo, including linking a custom C library into the Go binary.
Passing structs from Go to C.
Passing Go functions to C and arranging C to call them back later.
Safely passing arbitrary Go data to C code, which can later
pass it back to the Go callbacks it invokes.

This is not a tutorial for Cgo – before reading, you’re expected to have some
familiarity with its simpler use cases. Several useful tutorials and reference
pages are listed at the end of the post. The full source code for this example
is available on GitHub.

The problem – a C library that invokes multiple Go callbacks

Here is the header file of a fictional C library that works through some data
and invokes callbacks based on events:

typedef void (*StartCallbackFn)(void* user_data, int i);
typedef void (*EndCallbackFn)(void* user_data, int a, int b);

typedef struct {
    StartCallbackFn start;
    EndCallbackFn end;
} Callbacks;

// Processes the file and invokes callbacks from cbs on events found in the
// file, each with its own relevant data. user_data is passed through to the
// callbacks.
void traverse(char* filename, Callbacks cbs, void* user_data);

The callback signatures are made up, but demonstrate several important patterns
that are common in reality:

Every callback has its own type signature; here we’re using int parameters
for simplicity, but it could be anything else.

When only a small number of callbacks is involved, they could be passed into
traverse as separate parameters; however, often the number of callbacks
is large (say, more than 3) and then almost always a struct collecting
them is passed along. It’s common to allow the user to set some of the
callbacks to NULL to convey to the library that this particular event is
not interesting and no user code should be invoked for it.

Every callback gets an opaque user_data pointer passed through from
the call to traverse. It’s used to distinguish different traversals from
each other, and pass along user-specific state. traverse typically
passes user_data through without even trying to access it; since it’s
void*, it’s completely opaque to the library and the user code will cast
it to some concrete type inside the callback.

Our implementation of traverse is just a trivial simulation:

void traverse(char* filename, Callbacks cbs, void* user_data) {
    // Simulate some traversal that calls the start callback and then the end
    // callback, if they are defined.
    if (cbs.start != NULL) {
        cbs.start(user_data, 100);
    }
    if (cbs.end != NULL) {
        cbs.end(user_data, 2, 3);
    }
}

Our task is to wrap this library for usage from Go code. We’ll want to invoke Go
callbacks on traversal, without having to write any additional C code.

The Go interface

Let’s start by sketching what our interface would look like in Go. Here is
one way:

type Visitor interface {
	Start(int)
	End(int, int)
}

func GoTraverse(filename string, v Visitor) {
	// … implementation
}

The rest of the post shows a complete implementation using this approach.
However, it has some drawbacks:

When the number of callbacks we need to provide is large, writing
implementations of Visitor may be tedious if we’re only interested
in a couple of callbacks. This can be mitigated by providing a struct that
implements the complete interface with some defaults (say, no-ops); user
structs can then embed this default struct and not have to implement every
single method. Still, interfaces with many methods are often not a good Go
practice.

A more serious limitation is that it’s hard to convey to the C traverse
that we’re not interested in some callback. The object implementing
Visitor will, by definition, have an implementation for all the methods,
so there’s no easy way to tell if we’re not interested in invoking some of
them. This can have serious performance implications, because every event
then pays for a call from C into Go even when the Go method does nothing.
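
To make the embedding mitigation concrete, here is a minimal sketch in plain Go (no Cgo involved). NoopVisitor and EndSummer are hypothetical names invented for this illustration; they are not part of the accompanying code sample.

```go
package main

import "fmt"

// Visitor mirrors the interface sketched in the post.
type Visitor interface {
	Start(int)
	End(int, int)
}

// NoopVisitor is a hypothetical helper: it implements every Visitor method
// as a no-op so that other types can embed it.
type NoopVisitor struct{}

func (NoopVisitor) Start(int)    {}
func (NoopVisitor) End(int, int) {}

// EndSummer only cares about "end" events. Start is promoted from the
// embedded NoopVisitor, so it does not have to be written out.
type EndSummer struct {
	NoopVisitor
	Total int
}

// End overrides the embedded no-op and accumulates the event data.
func (s *EndSummer) End(a, b int) { s.Total += a + b }

// Compile-time check that *EndSummer satisfies Visitor.
var _ Visitor = (*EndSummer)(nil)

func main() {
	s := &EndSummer{}
	var v Visitor = s
	v.Start(100) // no-op inherited from NoopVisitor
	v.End(2, 3)
	fmt.Println(s.Total)
}
```

Note that this only saves typing. From the wrapper’s point of view, EndSummer still has a Start method, so the second drawback is unaffected.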

An alternative approach is to mimic what we have in C; that is, create a struct
collecting function objects:

type GoStartCallback func(int)
type GoEndCallback func(int, int)

type GoCallbacks struct {
	startCb GoStartCallback
	endCb   GoEndCallback
}

func GoTraverse(filename string, cbs *GoCallbacks) {
	// … implementation
}

This solves both drawbacks immediately: the default value of a function object
is nil, which can be interpreted by GoTraverse as “not interested in
this event”, in which case it can set the corresponding C callback to NULL. Since
Go function objects can be closures or bound methods, there’s no difficulty in
preserving state between the different callbacks.
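
The following sketch illustrates both points in plain Go. fakeTraverse is a hypothetical pure-Go stand-in for the C traverse, used only so the example runs without Cgo, and sumEnds is likewise invented for illustration.

```go
package main

import "fmt"

type GoStartCallback func(int)
type GoEndCallback func(int, int)

type GoCallbacks struct {
	startCb GoStartCallback
	endCb   GoEndCallback
}

// fakeTraverse is a pure-Go stand-in for the C traverse (an assumption made
// so the sketch runs without Cgo). It fires the same two events and skips
// nil callbacks, just as the C code skips NULL function pointers.
func fakeTraverse(cbs *GoCallbacks) {
	if cbs.startCb != nil {
		cbs.startCb(100)
	}
	if cbs.endCb != nil {
		cbs.endCb(2, 3)
	}
}

// sumEnds registers only an end callback. The closure captures sum, so state
// is preserved without any user_data-style parameter.
func sumEnds() int {
	sum := 0
	cbs := &GoCallbacks{
		endCb: func(a, b int) { sum += a + b },
	}
	fakeTraverse(cbs)
	return sum
}

func main() {
	fmt.Println(sumEnds())
}
```

Leaving startCb unset is all it takes to opt out of start events; a real wrapper could check for nil in the same way before filling in the C struct.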

The accompanying code sample has this alternative implementation available in a
separate directory,
but in the rest of the post we’re going to proceed with the more idiomatic
approach that uses a Go interface. For the implementation, it doesn’t really
matter which approach is chosen.

Implementing the Cgo wrapper

Cgo pointer passing rules
disallow passing Go function values directly to C, so to register callbacks we
need to create wrapper functions in C.

Moreover, we also can’t pass arbitrary pointers allocated in Go to C directly, because the
Go concurrent garbage collector is allowed to move data around. The Cgo Wiki page offers a workaround
using indirection. Here I’m going to use the
go-pointer package, which
accomplishes the same in a slightly more convenient and general way.

With this in mind, let’s get straight to the implementation. The code may appear
obscure at first, but it will all make sense soon. Here’s the code for
GoTraverse:

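import (
	"unsafe"

	gopointer "github.com/mattn/go-pointer"
)

func GoTraverse(filename string, v Visitor) {
	cCallbacks := C.Callbacks{}

	// Point the C callbacks at the C wrapper functions.
	cCallbacks.start = C.StartCallbackFn(C.startCgo)
	cCallbacks.end = C.EndCallbackFn(C.endCgo)

	// C.CString allocates with malloc, so the copy must be freed explicitly.
	var cfilename *C.char = C.CString(filename)
	defer C.free(unsafe.Pointer(cfilename))

	// Save returns an opaque value that stands in for v on the C side.
	p := gopointer.Save(v)
	defer gopointer.Unref(p)

	C.traverse(cfilename, cCallbacks, p)
}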

We start by creating the C Callbacks struct in Go code, and populating it.
Since we can’t assign Go functions to C function pointers, we’ll have these
wrappers, defined in a separate Go file [1]:

/*
extern void goStart(void*, int);
extern void goEnd(void*, int, int);

void startCgo(void* user_data, int i) {
    goStart(user_data, i);
}

void endCgo(void* user_data, int a, int b) {
    goEnd(user_data, a, b);
}
*/
import "C"

These are very thin wrappers that invoke Go functions – and we’ll have to write
one such C function per callback kind. We’ll see the Go functions goStart
and goEnd shortly.

After populating the C callback struct, GoTraverse converts the file name
from a Go string to a C string (the wiki has the
details). It then uses the go-pointer package to create a value that
represents the Go visitor and that we can pass to C. Finally, it calls traverse.

To complete the implementation, the code for goStart and goEnd is:

//export goStart
func goStart(user_data unsafe.Pointer, i C.int) {
	v := gopointer.Restore(user_data).(Visitor)
	v.Start(int(i))
}

//export goEnd
func goEnd(user_data unsafe.Pointer, a C.int, b C.int) {
	v := gopointer.Restore(user_data).(Visitor)
	v.End(int(a), int(b))
}

The export directives mean these functions are visible to C code; their
signatures should have C types or types convertible to C types. They act
similarly:

Unpack the visitor object from user_data

Invoke the appropriate method on the visitor

Callback flow in detail

Let’s examine the flow of callback calls for a “start” event to get a better
understanding of how the pieces are connected together.

GoTraverse assigns startCgo to the start pointer in the
Callbacks structure passed to traverse. Therefore, when traverse
encounters a start event, it will invoke startCgo. The parameters are
the user_data pointer passed in to traverse and the event-specific
parameters (a single int in this case).

startCgo is a shim around goStart, and calls it with the same
parameters.

goStart unpacks the Visitor implementation that was packed into
user_data by GoTraverse and calls the Start method from there,
passing it the event-specific parameters. All the code until this point is
provided by the Go library wrapping traverse; from here, we get to the
custom code written by the user of the API.

Tunneling Go pointers through C code

Another critical detail of this implementation is the trick we used to pack
a Visitor inside a void* user_data passed around to and from C
callbacks.

The Cgo documentation
states that:

Go code may pass a Go pointer to C provided the Go memory to which it points
does not contain any Go pointers.

But of course we can’t guarantee that arbitrary Go objects don’t contain any
pointers. Besides the obvious uses of pointers, function values, slices,
strings, interfaces and many other objects contain implicit pointers.

The limitation stems from the nature of the Go garbage collector, which runs
concurrently with other code and is allowed to move data around, invalidating
pointers from the point of view of C.

So what can we do? As mentioned above, the solution is indirection and the Cgo
Wiki offers a simple example. Instead of passing a pointer to C directly, we
keep the pointer in Go-land and find a way to refer to it indirectly; we could
use some numeric index, for example. This guarantees that all pointers remain
visible to the Go GC, yet we can keep some unique identifier in C-land that will
let us access them later.

This is what the go-pointer package does, by creating a map between
unsafe.Pointer (which maps directly to void* in Cgo calls to C) and
interface{}, essentially letting us store arbitrary Go data and providing a
unique ID (the unsafe.Pointer) to refer to it later. Why is
unsafe.Pointer used instead of an int as in the Wiki example? Because
opaque data is often represented with void* in C, so unsafe.Pointer is
something that maps to it naturally. With an int we’d have to worry about
casting in several additional places.
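
To show the indirection idea in isolation, here is a simplified sketch of such a registry in plain Go. It is not the go-pointer implementation: the registry type and its method bodies are assumptions written for this illustration, and it uses numeric keys (as in the Wiki example) rather than unsafe.Pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical, simplified stand-in for the indirection
// described above: Go values stay in a Go map, and only an opaque numeric
// key would ever cross into C.
type registry struct {
	mu     sync.Mutex
	next   uintptr
	values map[uintptr]interface{}
}

func newRegistry() *registry {
	return &registry{values: make(map[uintptr]interface{})}
}

// Save stores v and returns a non-zero key that identifies it.
func (r *registry) Save(v interface{}) uintptr {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.next++
	r.values[r.next] = v
	return r.next
}

// Restore returns the value stored under key, or nil if there is none.
func (r *registry) Restore(key uintptr) interface{} {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.values[key]
}

// Unref forgets the value so the garbage collector can reclaim it.
func (r *registry) Unref(key uintptr) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.values, key)
}

func main() {
	r := newRegistry()
	key := r.Save([]string{"state", "with", "pointers"})
	fmt.Println(r.Restore(key))
	r.Unref(key)
	fmt.Println(r.Restore(key) == nil)
}
```

The mutex matters because callbacks may arrive on different threads, and forgetting to call Unref leaks the stored value for the lifetime of the program. Since Go 1.17 the standard library also offers a similar facility, runtime/cgo.Handle (created with cgo.NewHandle, read with Value, released with Delete).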

What if there is no user_data?

Seeing how we use user_data to tunnel the user-specific Visitor
implementation through C code back to our generic callback, one may wonder –
what if there’s no user_data available?

It turns out, in most cases there is something like user_data, because
without it the original C API is flawed. Consider our traverse example
again, but this time without user_data:

typedef void (*StartCallbackFn)(int i);
typedef void (*EndCallbackFn)(int a, int b);

typedef struct {
    StartCallbackFn start;
    EndCallbackFn end;
} Callbacks;

void traverse(char* filename, Callbacks cbs);

Suppose we provide a callback as start:

void myStart(int i) {
    // …
}

Within myStart, we’re somewhat lost. We don’t know which traversal we
were invoked for – there could be many different traversals of different files
and data structures for different needs. We also don’t know where to record
the results of the event. The only recourse here is using global data; this
is a bad API!

Given such an API, we’re not really much worse off in Go-land. We can also rely
on global data to find the information relevant to this specific traversal,
and we can use the same go-pointer trick to store arbitrary Go objects in
this global data. But again, this situation is unlikely because the C API is
unlikely to omit this critical detail.
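
For illustration only, here is one way the global-data fallback could look, modeled in plain Go. Everything here is an assumption made for the sketch: traverseNoData stands in for a C traverse without user_data, goStartNoData and goEndNoData stand in for the exported callbacks, and a plain package-level variable is used instead of go-pointer.

```go
package main

import (
	"fmt"
	"sync"
)

type Visitor interface {
	Start(int)
	End(int, int)
}

// Package-level state standing in for the missing user_data. The mutex is
// held for the whole traversal, so only one traversal runs at a time.
var (
	traverseMu     sync.Mutex
	currentVisitor Visitor
)

// goStartNoData models an exported callback that receives no user_data and
// therefore has to consult the global to find its Visitor.
func goStartNoData(i int) { currentVisitor.Start(i) }

func goEndNoData(a, b int) { currentVisitor.End(a, b) }

// traverseNoData is a pure-Go stand-in for a C traverse without user_data.
func traverseNoData(v Visitor) {
	traverseMu.Lock()
	defer traverseMu.Unlock()
	currentVisitor = v
	defer func() { currentVisitor = nil }()

	goStartNoData(100)
	goEndNoData(2, 3)
}

// recorder is a small Visitor that logs the events it receives.
type recorder struct{ events []string }

func (r *recorder) Start(i int)  { r.events = append(r.events, fmt.Sprintf("start %d", i)) }
func (r *recorder) End(a, b int) { r.events = append(r.events, fmt.Sprintf("end %d %d", a, b)) }

func main() {
	r := &recorder{}
	traverseNoData(r)
	fmt.Println(r.events)
}
```

The cost of this design is visible in the lock: traversals are serialized, and a callback that starts another traversal would deadlock. That is exactly the kind of restriction a user_data parameter avoids.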

Links to additional resources

There’s a lot of information about using Cgo out there, some of it dated (before
the rules for passing pointers
were defined explicitly). Here’s a collection of links I found particularly
useful in preparing this post:

The official Cgo documentation is the source
of truth.

The Cgo page on the Wiki is
extremely useful.

Some details about the concurrent GC in Go.

Yasuhiro Matsumoto’s post on calling Go from C.

More details
on the pointer passing rules.

[1]
They are in a separate file because of a peculiarity of how Cgo
generates and compiles C code – more details on the Wiki.
The reason I’m not using the static inline trick for these functions
is that we have to take their address.
