Thursday, December 19, 2013

Learning Go: Goroutines and Multiple Cores

Here's a program that I've been using to explore Go concurrency. If you want to play along at home, you'll need to compile and run it locally; the Playground won't suffice.

package main

import (
    "fmt"
    "runtime"
    "syscall"
)

type Response struct {
    Received int
    Calculated int
    Handler int
    Tid int
}

func Listener(me int, in chan int, out chan Response, done chan int) {
    for val := range in {
        out <- Response{val, runCalc(me), me, syscall.Gettid()}
    }
    done <- me
}

func runCalc(num int) int {
    zz := 1
    for ii := 0 ; ii < 100000000 ; ii++ {
        zz += ii % num
    }
    return zz
}

func main() {
    fmt.Println("Main running on thread ", syscall.Gettid(), " numCPU = ", runtime.NumCPU())

    chSend := make(chan int, 100)
    chRecv := make(chan Response)
    chDone := make(chan int)

    listenerCount:= 8
    for ii := 1 ; ii <= listenerCount ; ii++ {
        go Listener(ii, chSend, chRecv, chDone)
    }

    messageCount := 100
    for ii := 0 ; ii < messageCount ; ii++ {
        chSend <- ii
    }
    close(chSend)

    for listenerCount > 0 {
        select {
            case data := <- chRecv :
                fmt.Println("Received ", data.Received, ",", data.Calculated, " from ", data.Handler, " on thread ", data.Tid)
                messageCount--
            case lnum := <- chDone :
                fmt.Println("Received DONE from ", lnum)
                listenerCount--
        }
    }

    fmt.Println("Main done, outstanding messages = ", messageCount)
}

The short description of this program is that it kicks off a bunch of goroutines, then sends them CPU-intensive work. My goal in writing it was to explore thread affinity and communication patterns as the amount of work increased. Imagine my surprise when I saw the following output from my 4 core CPU:

go run multi_listener.go 
Main running on thread  27638 , numCPU =  8
Received  0 , 1  from  1  on thread  27640
Received  1 , 1  from  1  on thread  27640
Received  2 , 50000001  from  2  on thread  27640
Received  3 , 100000000  from  3  on thread  27640
Received  4 , 150000001  from  4  on thread  27640
Received  5 , 200000001  from  5  on thread  27640
Received  6 , 249999997  from  6  on thread  27640
Received  7 , 299999996  from  7  on thread  27640
Received  8 , 350000001  from  8  on thread  27640
Received  9 , 1  from  1  on thread  27640
Received  10 , 1  from  1  on thread  27640
Received  11 , 50000001  from  2  on thread  27640
Received  12 , 100000000  from  3  on thread  27640
Received  13 , 150000001  from  4  on thread  27640
Received  14 , 200000001  from  5  on thread  27640

The thread ID is's always the same! And top confirmed that this wasn't a lie: one core was consuming 100% of the CPU, while the others were idle. It took some Googling to discover the GOMAXPROCS environment variable:

experiments, 505> export GOMAXPROCS=4
experiments, 506> go run multi_listener.go 
Main running on thread  27674 , numCPU =  8
Received  2 , 350000001  from  8  on thread  27677
Received  0 , 299999996  from  7  on thread  27678
Received  1 , 1  from  1  on thread  27674
Received  3 , 350000001  from  8  on thread  27677
Received  4 , 50000001  from  2  on thread  27679
Received  5 , 299999996  from  7  on thread  27678
Received  6 , 200000001  from  5  on thread  27674
Received  7 , 249999997  from  6  on thread  27677
Received  8 , 100000000  from  3  on thread  27679
Received  9 , 1  from  1  on thread  27678
Received  10 , 200000001  from  5  on thread  27674
Received  11 , 150000001  from  4  on thread  27677

This variable is documented in the runtime package docs, and also in the (28 page) FAQ. It's not mentioned in the Go Tour or tutorial.

I'm a bit taken aback that it's even necessary, however the comment that goes along with the associated runtime method gives a hit: “This call will go away when the scheduler improves.” As of Go 1.2, the behavior remains, one of the quirks of using a young framework.

No comments: