Code Jam Concurrency in Golang

A typical Code Jam problem requires at least 2 solution submissions: a small set and a large set. Expect the large set to have more test cases, larger inputs, and an 8 minute time limit. Normally, this means that the brute force algorithm you wrote for the small set is useless on the large set.

Normally.

A previous post looked at the Bullseye problem in Rust. I’m going to revisit it with Go language. My base machine will be a fairly respectable “CC2 Cluster Compute” EC2 instance:

(cc2.8xlarge)
88 ECUs
16 Cores
60.5 GiB

The basic solver function

func solver(input *Input, c chan<- *Result) {
    paint_used := uint64(0)
    rings_drawn := uint64(0)

    for {
	    current_r := uint64(2)*rings_drawn + input.R
	    paint_needed := uint64(2)*current_r + uint64(1)
	    if paint_needed+paint_used > input.T {
		    break
	    }
	    paint_used += paint_needed
	    rings_drawn += uint64(1)
    }

    c <- &Result{input.Id, rings_drawn}
}

After porting the O(N) simplistic summing solution over from Rust and utilizing only one core, 1000 large test cases ran in 1280s. In the large set, there are a total of 6000 cases, 4000 of which involve large numbers, so roughly speaking, this solution would take 1 hour and 25 minutes. One of Go’s main features is its very simple parallelization via channels and goroutines. Goroutines let us fan-out our solver to multiple threads. Channels help us fan-in their responses to a single array in a threadsafe manner.

Here, we let the program use all logical processors. NumCPU detected 32 in this case (perhaps the 16 cores are hyperthreaded):

runtime.GOMAXPROCS(runtime.NumCPU())

Spawn one solver function per input. Note that the solver above does not return a value but writes it to the provided channel:

func spawnSolvers(inputs []*Input, c chan<- *Result) {
    for _, input := range inputs {
	    go solver(input, c)
    }
}

Returned values are collated into an answer slice and placed in order:

answers := make([]uint64, T)
for i := 0; i < T; i++ {
    res := <-c
    answers[res.Id] = res.Answer
}

Executing this brought us to 83 seconds: a 15x improvment. Not bad. There’s a limit to how much scaling can be done on a single machine, so our next step is to use a cluster. If we divide the test cases out to 2 machines, theoretically, we should cut that time in half.

Data is grouped into 2 batches in this case. I’ve tried to distribute the load evenly among the nodes to ensure that no one gets the lucky easy half:

for index, input := range inputs {
    x := index % nodeCount
    inputBatches[x] = append(inputBatches[x], input)
}

Kick off a sender for each batch and don’t wait for a response:

for index, inputBatch := range inputBatches {
    go sendBatchToNode(inputBatch, nodeUrls[index], c)
}

Batches are serialized and sent to the node. When the results come back, send them back over the Result channel one at a time:

func sendBatchToNode(inputs []*Input, url string, c chan<- *Result) {
    // Establish TCP connection to node. 
    conn, _ := net.Dial("tcp", url+port)
    // Send batch of inputs.
    gob.NewEncoder(conn).Encode(inputs)
    results := make([]*Result, len(inputs))
    // Wait for results and unpack.
    gob.NewDecoder(conn).Decode(&results)
    // Send results 1 at a time through the channel.
    for _, res := range results {
	    c <- res
    }
}

The Node Mode code. Listen for connection, deserialize the batch and spawn solvers. In Single mode we extracted the answer into a []uint64 but we need to return this batch to the Master node so we’ll keep it in a []*Result and let it sort things out:

func batchSolverNode() {
    // Wait for connection.
    l, _ := net.Listen("tcp", port)
    conn, _ := l.Accept()
    defer conn.Close()
    // Receive input batch.
    var inputs []*Input
    gob.NewDecoder(conn).Decode(&inputs)
    batchSize := len(inputs)
    // Create a channel for results to be returned to.
    c := make(chan *Result, batchSize)
    // Execute solution on the inputs.
    spawnSolvers(inputs, c)
    // Gather channel results into Result slice. Ignore order for now.
    results := make([]*Result, batchSize)
    for i := 0; i < batchSize; i++ {
	    res := <-c
	    results[i] = res
    }
    // Send back the result batch.
    gob.NewEncoder(conn).Encode(results)
}

Executing this on 2 machines yielded 42s, very close to our theory. Since I’d gone this far, why not see what 8 machines would do? Here are the results from all 4 executions:

Enabling multithreading and 8-way parallelization brought us from 1280s to 12s, a 100x improvement; very good. Now that we’ve got our cluster up and running, here are execution times for various size loads:

And finally, all 6000 cases of the large set executed in 43.38s. This is lower than the 4096 cases tested because for those tests I was repeating a single case which was perhaps more complex on average than the cases in the large set.

Possible enhancements:

Allow some tolerance for TCP delays using timeouts and retries.
Improve load balancing by dividing batches into smaller chunks. When a node becomes available, send it a new batch.
Provide failover. If one node refuses to connect, or if it doesn’t return its results within the expected time, strike it and give its work to another node.
Create logging for debugging remote node.
General error handling; currently, nothing is checked.

I didn’t find language in the Terms and Conditions that prohibited the use of clusters (of machines; clusters of humans are a no-no). However, Code Jam is clearly focused on elegant expression of a problem using code and on algorithmic efficiency. I’d guess that code executing on a single machine - multi-threaded or otherwise - falls within the spirit of the contest; the computing power of a single machine is fairly predictable. Once clusters are in play, there’s too much variability among contestants (e.g., access to server farm, finances to run EC2 cluster, using a stone-age language limited to single threading * cough * javascript). The playing field against an underprivileged coder becomes a little imbalanced. At any rate, the finals provide only a single machine with no internet, IIRC.

This was merely a POC to mess around with Go’s concurrency mechanisms, but for those of you who need just that extra edge, the code is up on github: clusterjam.go.

xhroot

Code Jam Concurrency in Golang

Comments