The goal of this post is to show an example of how to refactor existing code to gain better
reusability, modularization, and generalization.
Initial Context
Recently I was working on a project to lint debug log statements. The project goals include out of the box
support for multiple popular logging libraries, as well as the ability to configure the linter to work
with any logging library.
For this linter, I want to automatically detect safe functions when possible. A reasonable starting point is linting zap.
For zap specifically, safe functions include zap.String
, zap.Int
, zap.Int64
, zap.Any
and so on.
The starting code provided below works for one case. But that’s not good enough. I know
that I’ll need to be able to work with an arbitrary number of logging libraries, and
there’s no way I’m maintaining code for every library.
Starting Code
From funcs package - included for reference
1
2
3
4
5
6
7
8
9
10
| // Description contains a description of a Function.
// Description is used instead of *types.Func for
// ease of configuration.
type Description struct {
// Package is the package the function is declared in.
// Should be the full package path.
Package string
// Name is the name of the function.
Name string
}
|
The actual zap field finding code. This is the code we’ll be iterating on.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
| import (
"debuglint/internal/funcs"
"go/types"
"github.com/hashicorp/go-set/v3"
"golang.org/x/tools/go/packages"
)
const (
zapPackagePath = "go.uber.org/zap"
zapFieldType = "go.uber.org/zap.Field"
)
// AllowedZapFnCalls returns the default zap function calls that are allowed.
func AllowedZapFnCalls() *set.Set[funcs.Description] {
pkgs, err := packages.Load(
&packages.Config{
Mode: packages.NeedTypes | packages.NeedTypesInfo,
},
zapPackagePath,
)
if err != nil {
panic(err)
}
return findZapFieldFunctions(pkgs)
}
// findZapFieldFunctions discovers all functions in the zap package that return zap.Field.
// These functions (like zap.String, zap.Int, etc.) are considered "safe" for use in
// debug log arguments since they don't perform expensive computations themselves.
// This enables automatic discovery rather than maintaining a hardcoded allowlist.
func findZapFieldFunctions(pkgs []*packages.Package) *set.Set[funcs.Description] {
s := set.New[funcs.Description](1)
for _, pkg := range pkgs {
scope := pkg.Types.Scope()
for _, name := range scope.Names() {
obj := scope.Lookup(name)
if fn, ok := obj.(*types.Func); ok {
results := fn.Signature().Results()
if results.Len() == 1 {
rType := results.At(0).Type().String()
if rType == zapFieldType {
s.Insert(
funcs.Description{
Package: zapPackagePath,
Name: fn.Name(),
},
)
}
}
}
}
}
return s
}
|
Issues and goal
To be clear, the above code works and is reasonably testable. It is by no means terrible code.
But there are some issues.
- In order to support slog or other logging libraries, the above code would need to either be refactored
or copy and pasted. Copy and paste is almost never a good pattern to follow.
- This code depends on internal data structures for this particular project. Not exactly a deal breaker, since
we are talking about code used internally for a specific project. But it does not need to be this way.
- The code depends on external data structures from the package set. This again is not a big deal, since
most of the project also uses set.Set. It just doesn’t need to be this way.
So let’s try and make our code a little more general and modular.
First Pass
Start With Tests
Seriously, you already had tests, right?
The first step should be to write tests for the outermost function whose behavior we do not want to change. Starting with well-tested code allows
us to make drastic changes with confidence.
In this case, that function is AllowedZapFnCalls
. This function happened to already be tested, and exported
functions should generally come with dedicated unit tests anyway.
Included here is a minimal test for correctness. This is by no means a complete set of tests, after all this post is not about testing your code.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
| func TestAllowedZapFnCalls(t *testing.T) {
t.Parallel()
tests := []struct {
name string
wantInclude []funcs.Description
wantNotInclude []funcs.Description
}{
{
name: "Spot check functions",
wantInclude: []funcs.Description{
{
Package: zapPackagePath,
Name: "Int",
},
{
Package: zapPackagePath,
Name: "String",
},
},
wantNotInclude: []funcs.Description{
{
Package: zapPackagePath,
Name: "LevelFlag",
},
},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
got := AllowedZapFnCalls()
// We use ContainsSlice instead of checking for all possible
// functions because the zap API can change. It's just not
// reasonable to check every single function.
assert.True(t, got.ContainsSlice(tt.wantInclude), "Got: %v\nWant:%v\n", got, tt.wantInclude)
assert.False(t, got.ContainsSlice(tt.wantNotInclude), "Got: %v\nWant:%v\n", got, tt.wantNotInclude)
})
}
}
|
Make the code re-usable within the same package
First we’re going to extract needed parameters, allowing the next
logging library we support to be implemented with a single function call.
Create a new file func_find.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
| package configs
import (
"debuglint/internal/funcs"
"go/types"
"github.com/hashicorp/go-set/v3"
"golang.org/x/tools/go/packages"
)
func findFuncsThatReturn(pkgPath string, returnType string) *set.Set[funcs.Description] {
pkgs, err := packages.Load(
&packages.Config{
Mode: packages.NeedTypes | packages.NeedTypesInfo,
},
pkgPath,
)
if err != nil {
panic(err)
}
return scanPkgForFuncs(pkgs, pkgPath, returnType)
}
// scanPkgForFuncs discovers all functions in the pkgPath package that return returnType and only returnType.
func scanPkgForFuncs(pkgs []*packages.Package, pkgPath string, returnType string) *set.Set[funcs.Description] {
s := set.New[funcs.Description](1)
for _, pkg := range pkgs {
scope := pkg.Types.Scope()
for _, name := range scope.Names() {
obj := scope.Lookup(name)
if fn, ok := obj.(*types.Func); ok {
results := fn.Signature().Results()
if results.Len() == 1 {
rType := results.At(0).Type().String()
if rType == returnType {
s.Insert(
funcs.Description{
Package: pkgPath,
Name: fn.Name(),
},
)
}
}
}
}
}
return s
}
|
Alter AllowedZapFnCalls
to use the new altered code.
1
2
3
4
| // AllowedZapFnCalls returns the default zap function calls that are allowed.
func AllowedZapFnCalls() *set.Set[funcs.Description] {
return findFuncsThatReturn(zapPackagePath, zapFieldType)
}
|
At this point, we now have a fairly general findFuncsThatReturn
function that can be reused
to support other log libraries. By providing both the package path and return type, we can
support packages such as slog. In fact, here’s a working slog implementation:
1
2
3
4
5
6
7
8
9
10
| const (
slogPackagePath = "log/slog"
slogFieldType = "log/slog.Attr"
)
// AllowedSlogCalls is the default set of allowed slog
// structured field calls.
func AllowedSlogCalls() *set.Set[funcs.Description] {
return findFuncsThatReturn(slogPackagePath, slogFieldType)
}
|
This refactoring is starting to look pretty good. Any new logging libraries will be quite easy to support.
This code is a reasonable place to stop much of the time. For many projects, if I were reviewing a PR with this code, I would happily approve the code!
However, this particular piece of code can be made more reusable without adding undue
burden. There’s no need for the code that does the searching to depend on set.Set
or funcs.Description
in any way.
We’re going to have to flip the dependency relationship.
Iterator approach - flipping the dependencies
The iter package introduced in go1.23 lets us remove the data type dependencies from our functionality.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
| func Returning(pkgPath string, returnType string) (iter.Seq[*types.Func], error) {
pkgs, err := packages.Load(
&packages.Config{
Mode: packages.NeedTypes | packages.NeedTypesInfo,
},
pkgPath,
)
if err != nil {
return nil, fmt.Errorf("load package %s: %w", pkgPath, err)
}
return scanPkgForFuncs(pkgs, returnType), nil
}
// scanPkgForFuncs discovers all functions in the loaded pkgs package that return returnType and only returnType.
func scanPkgForFuncs(pkgs []*packages.Package, returnType string) iter.Seq[*types.Func] {
return func(yield func(*types.Func) bool) {
for _, pkg := range pkgs {
scope := pkg.Types.Scope()
for _, name := range scope.Names() {
obj := scope.Lookup(name)
if fn, ok := obj.(*types.Func); ok {
if shouldYield(fn, returnType) && !yield(fn) {
return
}
}
}
}
}
}
func shouldYield(fn *types.Func, returnType string) bool {
results := fn.Signature().Results()
if results.Len() != 1 {
return false
}
rType := results.At(0).Type().String()
return rType == returnType
}
|
This change does several things.
Rename the function to Returning. At this point I’m planning to move this to a single purpose package called funcfind
.
funcfind.Returning
is nicer to use.
Utilizes iterators to yield a Seq of *types.Func
. Naturally, the calling code must also change a little
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
| func findStructuredLogFns(pkgPath string, returnType string) *set.Set[funcs.Description] {
fnIter, err := funcfind.Returning(pkgPath, returnType)
if err != nil {
panic(err)
}
s := set.New[funcs.Description](1)
for fn := range fnIter {
s.Insert(
funcs.Description{
Package: pkgPath,
Name: fn.Name(),
},
)
}
return s
}
|
The immediate downside of this change is the new need for iterating and building the collection outside of the
newly renamed Returning
function. That should be considered OK, since the core logic of Returning
is
now focused entirely on finding functions returning a specific type. The newly created funcfind package
is now doing one thing, and one thing only.
shouldYield
is pulling a bit of somewhat complex decision making into a single place. The change reduces the
cyclomatic complexity of the calling function, as well as making an easy place to make more complex decisions
should multiple return values ever be supported (They are now)
Tradeoffs
The iterator approach does have some tradeoffs. As previously mentioned, the caller must write
a little more code. Developers calling the package must also understand how iterators work.
The upsides, however, are significant. By using iterators we remove dependencies on external
data types and focus on the one thing this section of code is supposed to do. It’s potentially
more memory efficient, the caller can inspect a single item at a time instead of needing to
load the full data set into memory for processing and manipulation. It’s more flexible in that
the caller can decide when or if to continue processing the dataset. Iterators are also composable.
Here’s a quick example of finding function names that only start with String:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| funcs, _ := funcfind.Returning("go.uber.org/zap", "go.uber.org/zap.Field")
// onlyStartsWithString is a []string containing all functions that
// start with String
onlyStartsWithString := slices.Collect(
iter.Map(
iter.Filter(
funcs,
func(fn *types.Func) bool {
return strings.HasPrefix(fn.Name(), "String")
},
),
func (fn *types.Func) string {
return fn.Name()
}
)
)
|
Final Look
At this point we have very generic code providing function finding capabilities. In fact, it’s so generic I’ve published it as a standalone package