blog.kdgregory.com

Monday, December 23, 2013

Learning Go: Imports

I'm going to finish this series of posts with an interesting but, in my opinion, unfinished feature: third-party imports. Unless you like to reinvent wheels, your projects will depend on libraries written by other people. To help you do this, Go's import declaration allows you to reference packages stored in open source repositories.

For example, if you want to use the Go project's example square root implementation, you could write the following:

package main

import "fmt"
import "code.google.com/p/go.example/newmath"

func main() {
    fmt.Println("The square root of 2 is ", newmath.Sqrt(2))
}

If you just try to build this code, you'll get an error message indicating that the compiler can't find the package. Before you can use it, you have to retrieve it:

go get code.google.com/p/go.example/newmath

This is, to say the least, inconvenient: if you depend on multiple libraries, you have to manually retrieve each of them before you can build your own project. Fortunately, go get resolves transitive dependencies, and if your project is stored in one of the “standard” repositories you can leverage this feature to retrieve your project and its dependencies in one step. However, at least for GitHub projects, this technique doesn't clone the repository. If you want to make changes — or get updates from your dependencies — you need to manually clone.

A bigger problem is that there's no mechanism for versioning: you always get the trunk revision. If all of your libraries are backwards compatible, that may not be a problem. My experience suggests that's a bad thing to rely upon. In practice, it seems that most developers retrieve the libraries that they want to use, then check those libraries into their own version control system, creating a “locked” revision that they know supports their code.

As I said, I think the feature is unfinished. Versioning is a necessary first step, and should not be too difficult to add. But it's not enough. If you rely on retrieving packages from an open-source repository, you also rely on the package owner; one fit of pique, and your dependency could disappear. Along with versioning, Google needs to add its own repository for versioned artifacts, much like Maven Central, CPAN, or RubyGems.org. This could work transparently to the user: remote imports of public projects would be proxied through Google's server, and it would keep a copy of anything that had a version number.

Thursday, December 19, 2013

Learning Go: Goroutines and Multiple Cores

Here's a program that I've been using to explore Go concurrency. If you want to play along at home, you'll need to compile and run it locally; the Playground won't suffice.

package main

import (
    "fmt"
    "runtime"
    "syscall"
)

type Response struct {
    Received int
    Calculated int
    Handler int
    Tid int
}

func Listener(me int, in chan int, out chan Response, done chan int) {
    for val := range in {
        out <- Response{val, runCalc(me), me, syscall.Gettid()}
    }
    done <- me
}

func runCalc(num int) int {
    zz := 1
    for ii := 0 ; ii < 100000000 ; ii++ {
        zz += ii % num
    }
    return zz
}

func main() {
    fmt.Println("Main running on thread ", syscall.Gettid(), " numCPU = ", runtime.NumCPU())

    chSend := make(chan int, 100)
    chRecv := make(chan Response)
    chDone := make(chan int)

    listenerCount := 8
    for ii := 1 ; ii <= listenerCount ; ii++ {
        go Listener(ii, chSend, chRecv, chDone)
    }

    messageCount := 100
    for ii := 0 ; ii < messageCount ; ii++ {
        chSend <- ii
    }
    close(chSend)

    for listenerCount > 0 {
        select {
            case data := <- chRecv :
                fmt.Println("Received ", data.Received, ",", data.Calculated, " from ", data.Handler, " on thread ", data.Tid)
                messageCount--
            case lnum := <- chDone :
                fmt.Println("Received DONE from ", lnum)
                listenerCount--
        }
    }

    fmt.Println("Main done, outstanding messages = ", messageCount)
}

The short description of this program is that it kicks off a bunch of goroutines, then sends them CPU-intensive work. My goal in writing it was to explore thread affinity and communication patterns as the amount of work increased. Imagine my surprise when I saw the following output from my 4-core CPU:

go run multi_listener.go 
Main running on thread  27638 , numCPU =  8
Received  0 , 1  from  1  on thread  27640
Received  1 , 1  from  1  on thread  27640
Received  2 , 50000001  from  2  on thread  27640
Received  3 , 100000000  from  3  on thread  27640
Received  4 , 150000001  from  4  on thread  27640
Received  5 , 200000001  from  5  on thread  27640
Received  6 , 249999997  from  6  on thread  27640
Received  7 , 299999996  from  7  on thread  27640
Received  8 , 350000001  from  8  on thread  27640
Received  9 , 1  from  1  on thread  27640
Received  10 , 1  from  1  on thread  27640
Received  11 , 50000001  from  2  on thread  27640
Received  12 , 100000000  from  3  on thread  27640
Received  13 , 150000001  from  4  on thread  27640
Received  14 , 200000001  from  5  on thread  27640

The thread ID is always the same! And top confirmed that this wasn't a lie: one core was consuming 100% of the CPU, while the others were idle. It took some Googling to discover the GOMAXPROCS environment variable:

experiments, 505> export GOMAXPROCS=4
experiments, 506> go run multi_listener.go 
Main running on thread  27674 , numCPU =  8
Received  2 , 350000001  from  8  on thread  27677
Received  0 , 299999996  from  7  on thread  27678
Received  1 , 1  from  1  on thread  27674
Received  3 , 350000001  from  8  on thread  27677
Received  4 , 50000001  from  2  on thread  27679
Received  5 , 299999996  from  7  on thread  27678
Received  6 , 200000001  from  5  on thread  27674
Received  7 , 249999997  from  6  on thread  27677
Received  8 , 100000000  from  3  on thread  27679
Received  9 , 1  from  1  on thread  27678
Received  10 , 200000001  from  5  on thread  27674
Received  11 , 150000001  from  4  on thread  27677

This variable is documented in the runtime package docs, and also in the (28 page) FAQ. It's not mentioned in the Go Tour or tutorial.

I'm a bit taken aback that it's even necessary. However, the comment that goes along with the associated runtime method gives a hint: “This call will go away when the scheduler improves.” As of Go 1.2 the behavior remains: one of the quirks of using a young framework.

Wednesday, December 18, 2013

Learning Go: Slices

Slices are one of the stranger pieces of Go. They're like lists or vectors in other languages, but have some peculiar behaviors, particularly when multiple slices share the same backing array. I suspect that a lot of bugs will come from slices that suddenly stop sharing.

To explain, let's start with a simple slice example: creating two slices backed by an explicit array (you can run these examples in the Go Playground):

package main

import "fmt"

func main() {
    a := []int{1, 2, 3, 4, 5}
    s1 := a[1:4]
    s2 := s1[0:2]
    
    fmt.Println(a)
    fmt.Println(s1)
    fmt.Println(s2)
}

When you run this program, you get the following output (the slice operator is inclusive of its first parameter, exclusive of its second):

[1 2 3 4 5]
[2 3 4]
[2 3]

As I said, these slices share a backing array. A change to s2 will be reflected in s1 and a:

func main() {
    a := []int{1, 2, 3, 4, 5}
    s1 := a[1:4]
    s2 := s1[0:2]
    
    s2[0] = 99
    
    fmt.Println(a)
    fmt.Println(s1)
    fmt.Println(s2)
}

[1 99 3 4 5]
[99 3 4]
[99 3]

If you're used to slices from, say, Python, this is a little strange: Python slices are separate objects. A Go slice is more like a Java sub-list, sharing the same backing array. But wait, there's more! You can add items to the end of a slice:

func main() {
    a := []int{1, 2, 3, 4, 5}
    s1 := a[1:4]
    s2 := s1[0:2]
    
    s2 = append(s2, 101)
    
    fmt.Println(a)
    fmt.Println(s1)
    fmt.Println(s2)
}

Since s2 shares backing store with s1 and a, when you append a value to the former, the latter are updated as well:

[1 2 3 101 5]
[2 3 101]
[2 3 101]

But now what happens if we append a bunch of values to s2?

func main() {
    a := []int{1, 2, 3, 4, 5}
    s1 := a[1:4]
    s2 := s1[0:2]
    
    s2 = append(s2, 101)
    s2 = append(s2, 102)
    s2 = append(s2, 103)
    
    fmt.Println(a)
    fmt.Println(s1)
    fmt.Println(s2)
}

[1 2 3 101 102]
[2 3 101]
[2 3 101 102 103]

Did you see that one coming? Here's one more piece of code to ponder:

func main() {
    a := []int{1, 2, 3, 4, 5}
    s1 := a[1:4]
    s2 := s1[0:2]
    
    s2 = append(s2, 101)
    s2 = append(s2, 102)
    s2 = append(s2, 103)

    a[3] = 17
    
    fmt.Println(a)
    fmt.Println(s1)
    fmt.Println(s2)
}

[1 2 3 17 102]
[2 3 17]
[2 3 101 102 103]

As you can see, s2 no longer uses a as its backing array. Not only does a not reflect the last element added to the slice, but changing an element of a does not propagate to s2.

This behavior is hinted at in the slice internals documentation, which says that append() “grows the slice if a greater capacity is needed.” Since Go requires you to pay attention to return values, whatever code appended to the slice will always see the correct values. But if you have multiple slices that you think refer to the same backing array, the others won't.
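The missing piece is capacity: a slice can grow in place only until it reaches the end of its backing array, and after that append() has to allocate a new array and copy. The sketch below replays the appends from the earlier examples and records cap(s2) along the way. The function name capsWhileAppending is mine, and the capacity after reallocation is left unstated because the growth policy is an implementation detail.

```go
package main

import "fmt"

// capsWhileAppending (a hypothetical helper) replays the appends from the
// examples above and records cap(s2) at three points.
func capsWhileAppending() (before, full, after int) {
	a := []int{1, 2, 3, 4, 5}
	s1 := a[1:4]
	s2 := s1[0:2]

	// s2 starts at a[1], so it can grow through a[4].
	before = cap(s2)

	// These two values fit in the spare room, so they overwrite a[3] and a[4].
	s2 = append(s2, 101, 102)
	full = cap(s2)

	// No room left: append allocates a new, larger array and copies into it.
	s2 = append(s2, 103)
	after = cap(s2)
	return
}

func main() {
	before, full, after := capsWhileAppending()
	fmt.Println("capacity before appending:", before)
	fmt.Println("capacity once the shared array is full:", full)
	fmt.Println("capacity after the next append:", after)
}
```

The first two numbers are both 4, which is why the first appends showed up in a. The third is larger, and from that point on s2 lives in its own array.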

I can understand why you would want slices to share a backing array: they represent a view on that array. And I can understand the desire for expanding the slice via append(): it's the behavior that other languages provide in a list. But this blend seems to be the worst of both worlds, in that you never know whether changing a given slice will mutate other slices/arrays, or not. I recommend treading carefully.
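For what it's worth, here are two ways of treading carefully, as a sketch rather than a recommendation. The first copies the elements into a fresh array using the built-in copy(); the helper name detached is my own. The second uses the three-index slice expression added in Go 1.2, which sets the capacity equal to the length so that the first append has to reallocate.

```go
package main

import "fmt"

// detached (a hypothetical helper) returns a copy of s with its own backing
// array, so later writes or appends cannot touch the original.
func detached(s []int) []int {
	c := make([]int, len(s))
	copy(c, s)
	return c
}

func main() {
	a := []int{1, 2, 3, 4, 5}

	// Option 1: copy the elements into a fresh array.
	s := detached(a[1:3])
	s = append(s, 101)
	fmt.Println(a, s) // [1 2 3 4 5] [2 3 101]

	// Option 2: limit the capacity to the length with a three-index slice,
	// which forces the first append to allocate a new array.
	t := a[1:3:3]
	t = append(t, 101)
	fmt.Println(a, t) // [1 2 3 4 5] [2 3 101]
}
```

Note that the second option only protects against append: until the append happens, t still shares storage with a, so assigning to t[0] would change a[1].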