Code Jam Concurrency in Golang

A typical Code Jam problem requires at least 2 solution submissions: a small set and a large set. Expect the large set to have more test cases, larger inputs, and an 8 minute time limit. Normally, this means that the brute force algorithm you wrote for the small set is useless on the large set.

Normally.

A previous post looked at the Bullseye problem in Rust. I’m going to revisit it with Go language. My base machine will be a fairly respectable “CC2 Cluster Compute” EC2 instance:

• (cc2.8xlarge)
• 88 ECUs
• 16 Cores
• 60.5 GiB

After porting the O(N) simplistic summing solution over from Rust and utilizing only one core, 1000 large test cases ran in 1280s. In the large set, there are a total of 6000 cases, 4000 of which involve large numbers, so roughly speaking, this solution would take 1 hour and 25 minutes. One of Go’s main features is its very simple parallelization via channels and goroutines. Goroutines let us fan-out our solver to multiple threads. Channels help us fan-in their responses to a single array in a threadsafe manner.

Here, we let the program use all logical processors. NumCPU detected 32 in this case (perhaps the 16 cores are hyperthreaded):

Spawn one solver function per input. Note that the solver above does not return a value but writes it to the provided channel:

Returned values are collated into an answer slice and placed in order:

Executing this brought us to 83 seconds: a 15x improvment. Not bad. There’s a limit to how much scaling can be done on a single machine, so our next step is to use a cluster. If we divide the test cases out to 2 machines, theoretically, we should cut that time in half.

Data is grouped into 2 batches in this case. I’ve tried to distribute the load evenly among the nodes to ensure that no one gets the lucky easy half:

Kick off a sender for each batch and don’t wait for a response:

Batches are serialized and sent to the node. When the results come back, send them back over the Result channel one at a time:

The Node Mode code. Listen for connection, deserialize the batch and spawn solvers. In Single mode we extracted the answer into a []uint64 but we need to return this batch to the Master node so we’ll keep it in a []*Result and let it sort things out:

Executing this on 2 machines yielded 42s, very close to our theory. Since I’d gone this far, why not see what 8 machines would do? Here are the results from all 4 executions:

Enabling multithreading and 8-way parallelization brought us from 1280s to 12s, a 100x improvement; very good. Now that we’ve got our cluster up and running, here are execution times for various size loads:

And finally, all 6000 cases of the large set executed in 43.38s. This is lower than the 4096 cases tested because for those tests I was repeating a single case which was perhaps more complex on average than the cases in the large set.

Possible enhancements:

• Allow some tolerance for TCP delays using timeouts and retries.
• Improve load balancing by dividing batches into smaller chunks. When a node becomes available, send it a new batch.
• Provide failover. If one node refuses to connect, or if it doesn’t return its results within the expected time, strike it and give its work to another node.
• Create logging for debugging remote node.
• General error handling; currently, nothing is checked.

I didn’t find language in the Terms and Conditions that prohibited the use of clusters (of machines; clusters of humans are a no-no). However, Code Jam is clearly focused on elegant expression of a problem using code and on algorithmic efficiency. I’d guess that code executing on a single machine - multi-threaded or otherwise - falls within the spirit of the contest; the computing power of a single machine is fairly predictable. Once clusters are in play, there’s too much variability among contestants (e.g., access to server farm, finances to run EC2 cluster, using a stone-age language limited to single threading * cough * javascript). The playing field against an underprivileged coder becomes a little imbalanced. At any rate, the finals provide only a single machine with no internet, IIRC.

This was merely a POC to mess around with Go’s concurrency mechanisms, but for those of you who need just that extra edge, the code is up on github: clusterjam.go.

Time Complexity vs. Coding Complexity

photo by stevendepolo

Problem 1A in Code Jam 2013 involved finding the maximum number of rings that could be drawn when creating an archery target; the starting radius r and paint supply t vary.

I cranked out a naive implementation for the small problem set which looped through the formula for calculating paint used for each ring, summing it up along the way and comparing against the max paint supply. Since 1 mL of paint conveniently covers exactly pi cm2, pi was omitted. Given radius r, the paint required for the kth ring is found by:

This was also the first time I’d used Mozilla Research’s Rust, though I did gear up for the contest by preparing a vanilla template. As with any new language, the syntax isn’t completely cast in iron, and if you google around, you’ll see some examples with slight differences. All code samples are relevant for version 0.6:

Complexity is O(N) with N dependent upon t-r. This took care of the small dataset fine. However, for a small radius and lots of paint - up to 10^18 - this could take long. Indeed, my first test with large-dataset-type numbers crawled. I immediately looked for the fastest solution I could think of, which involved arithmetic series:

computing total paint for j rings

and solving for the positive root using the quadratic equation, given p as paint used:

And in code:

No built in BigInt sqrt, had to use Newton’s method:

The good news was that the solution was constant time, O(1). The bad news was computation, debugging and BigInt wrangling ate up too much time so I failed to submit before the contest ended.

Turns out there’s a better approach here. Part of an earlier calculation yielded the formula for computing how much total paint would be used for j number of rings. This could be used to approach the paint available t in faster than linear time. In fact, a binary search did occur to me at one point. It’s orders of magnitude faster, but given how slowly the O(N) solution was executing, I figured O(log N) wouldn’t make much difference. Sucker ran nearly as fast as the O(1) solution:

Given the time constraints in Code Jam, sometimes it’s difficult to stop and guage what’s going to be fast enough. Then again, I should be able to adapt binary search into any solution fast enough by now, that a quick check shouldn’t cost much more than a few minutes.

The Rust templates for all the code above are on github: Rust Code Jam templates

Accessing Redis From Google App Engine

Google App Engine release 1.7.7 includes a new Sockets API, however the documentation is a bit thin. Here’s an example that uses sockets to request a website (or at least the 1st 1000 chars):

import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.send('GET / HTTP/1.1\r\n\r\n')
website = s.recv(1000)
self.response.write(website)


The more important point here is that libraries that rely on socket can now be used. Indeed, the App Engine team has provided a demo that uses “nntplib”.

A question on Reddit spurred my curiousity and gave me a chance to try out Amazon EC2 for the first time. This is a small proof-of-concept that demonstrates accessing a remote Redis instance hosted on EC2 from App Engine.

Amazon EC2 Setup

This section is a dump of my notes from setting up EC2. Skip this if you already have a Redis server.

Create EC2 Instance

Navigate to the Amazon EC2 console Dashboard and click “Launch Instance”. Go with the “Classic Wizard” and choose a server. I used “Ubuntu Server 12.04.1 LTS, 64-bit”. For most of the way, I stuck to the defaults, so you’ll end up with a “T1 Micro (t1.micro)” instance type. Blow through the Launch Instances, Advance Instance Options, Storage Device Conf screens. Give a value to the Name key tag if you wish; I skipped.

Security

You’ll have the option to choose an existing Key-Pair if you’ve done this before. If not, Create a new one by giving it a name and click “Create & Download your Key Pair”. Keep track of that downloaded file; you won’t get another opportunity to download it. In fact, go ahead and copy it to the ~/.ssh folder of the machine you’ll be connecting from.

Next “Create a new Security Group” if you don’t already have one and give it a name and description.

• In the “Create a new rule” dropdown, select SSH and click “Add Rule”, so you can admin the box.

• For the next rule, keep it at “Custom TCP rule” and select a port range of 6379. Leave “Source” alone again, and “Add Rule”

Now “Continue” and “Launch”.

Note the name which looks something like “ec2-99-999-999-999.compute-1.amazonaws.com”

Install and Configure Redis

With your PEM file in the ~/.ssh folder[1]:

$ssh -i ~/.ssh/ec2_redis_keypair.pem root@ec2-99-999-999-999.compute-1.amazonaws.com @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: UNPROTECTED PRIVATE KEY FILE! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ Permissions 0664 for '/home/xhroot/.ssh/ec2_redis_keypair.pem' are too open. It is required that your private key files are NOT accessible by others. This private key will be ignored. bad permissions: ignore key: /home/xhroot/.ssh/ec2_redis_keypair.pem' Permission denied (publickey).  Ok: $ chmod 400 ~/.ssh/ec2_redis_keypair.pem
$ssh -i ~/.ssh/ec2_redis_keypair.pem root@ec2-99-999-999-999.compute-1.amazonaws.com Please login as the user "ubuntu" rather than the user "root".  Whoops, ok: $ ssh -i .ssh/ec2_redis_keypair.pem ubuntu@ec2-99-999-999-999.compute-1.amazonaws.com


Success! Now, install Redis:

$sudo apt-get install redis-server  [2] Notice that it’s only listening for connections on the localhost port: $ netstat -nlpt | grep 6379
(No info could be read for "-p": geteuid()=1000 but you should be root.)
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN      -


[3] Adjust the configuration file to permit remote connections:

$sudo vim /etc/redis/redis.conf  Comment out the line: #bind 127.0.0.1  It was on line 30 for me. Use 30Gi#, hit ESC, ZZ. [4] Restart Redis: $ sudo /etc/init.d/redis-server restart
Stopping redis-server: redis-server.
Starting redis-server: redis-server.


Check the port again:

$netstat -nlpt | grep 6379 (No info could be read for "-p": geteuid()=1000 but you should be root.) tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN -  You can now connect from the outside. Connect to Redis locally: $ redis-cli

redis 127.0.0.1:6379> set igor "IT WORKS!"
OK


I had a Redis client installed on a local Windows machine so I thought I’d mix it up by connecting from there.

Start -> Run -> c:\Program Files\redis\redis-cli -h ec2-99-999-999-999.compute-1.amazonaws.com

> get igor
"IT WORKS!"


Congratulations! You are the proud owner of a public, wide open, credential-free Redis instance. Fortunately, this article is merely a POC so we’ll be driving blindly past red flags for now.

Connecting from App Engine

Sample App Engine-Redis app: https://github.com/xhroot/appengine-redis

It’s not necessary to deploy this app to test it; the dev_appserver works fine. If you choose to deploy, note that “Sockets are available only for paid apps.” Create a new App Engine application and enable billing in the dashboard now to allow for the 15 minute delay before billing is active.

Clone the application above on your machine. In a separate folder, clone the Python Redis client:

git clone git://github.com/andymccurdy/redis-py.git


Copy the redis folder into the root of the App Engine application. Modify app.yaml to match your application name:

application: yourappname


Modify main.py to use your EC2 instance name:

REDIS_HOST = 'ec2-99-999-999-999.compute-1.amazonaws.com'


Again, you can either run this on the local dev_appserver or you can deploy it. Once running, you can fetch values with GETs. This method uses r.get() to retrieve values from the remote Redis installation:

$curl -w '\n' 'http://yourappname.appspot.com?igor' igor="IT WORKS!"<br>  Use PUT to add/update values. This method uses r.set(): $ curl -w '\n' 'http://yourappname.appspot.com' -X PUT -d 'proj=treadstone'

$curl -w '\n' 'http://yourappname.appspot.com?igor&proj' proj="treadstone"<br>igor="IT WORKS!"<br>  Use DELETE - r.delete() - to remove values: $ curl -w '\n' 'http://yourappname.appspot.com?igor' -X DELETE

\$ curl -w '\n' 'http://yourappname.appspot.com?igor&proj'


As mentioned earlier, port 6379 on the EC2 server is open to the public, which is not secure. There are some options:

• use an SSL proxy like stunnel
• use a Redis fork that has SSL built-in
• use another key/value store that has SSL support, such as MongoDB (requires a special build flag)

That’s it. Perhaps the main driver for having an external caching layer as opposed to using memcache is to have greater control over data eviction. Another reason might be that a distributed cache may be suitable for synchronizing across platforms. It’s always nice to have choices and to see more capabilities being added to this platform. Looking forward to seeing what’s new in the I/O release.

Server push (AKA long polling and Comet) is a relatively recent technology where servers initiate data transfer to connected browsers. Contrast this with the normal request-response interaction shown below:

For a long running job, the browser would have to constantly pester the server for updates. With server push, the browser could initiate the job, then sit back and let the server respond at its leisure:

For Google App Engine developers this is implemented by the Channel API. There are also third party services like Pusher and Beaconpush that provide this capability through their API. And recently, thanks to David Fowler and Damian Edwards, .NET developers can also integrate this technology into their projects using SignalR.

I’ve cooked up a small demo that uses SignalR to demonstrate one solution to the problem of conflicting updates. When multiple users act on the same set of information, it’s easy for it to get out of sync. This demo uses SignalR’s Hub to push updates to the users the moment they occur so they can immediately act on the changes.

Our application is a student registry form. Users can update a student’s name, GPA and enrollment status. A server section, visible only for demo purposes, shows us the state of the database. Because the most recent data is always pushed to the browser, we can take immediate action instead of discovering a save conflict at the end of a long form.

Shown right is your typical N-tier MVC stack. Browser requests are handled by the controller, data is passed to the service for the real legwork, the DB is consulted as necessary and the data is passed back up the chain to the browser. Fantastic. Where SignalR comes into play is when the StudentService decides a legitimate update has taken place. It then notifies the Hub that a new update has occurred and it should broadcast the new record to all listening browsers.

Let’s take a look at the setup.

The HubConnection establishes the base URL that the client side API will use to watch for updates. This is done in the controller and injected into the StudentService. Here’s the javascript setup:

This starts the hub on the client side. Once started, a unique id is assigned to this browser which comes in handy as we’ll see later. Next we look at the Hub,

and its corresponding javascript:

StudentHub derives from the abstract Hub. The hub only needs one method for this demo and that method sends out the updated data. It calls a javascript method called updateStudent and sends it one argument - transport. By the way, StatusTransport is a wrapper around the data to be sent - StudentViewModel - and is used to attach additional information about the request (e.g., operation status, error messages). In the javascript, the function name should match the method that is invoked from StudentHub; result is the javascript object representation of the transport object sent by StudentHub. When Clients.updateStudent executes on the server, all connected browsers will execute this javascript function locally.

Recall the video demo above. When the student is saved, all other browsers immediately show a dialog notifying them that an update recently occurred with the option to accept the recent changes. Note also that the browser that issues the save will receive an updated record from both the Ajax save response and the StudentHub. We use the ClientId to check for the identity of the sender and ignore the duplicate update when the client id is self.

To contrast this behavior with the conventional approach, disable the hub by commenting out this line in StudentService:

Now, concurrency violations will only detected at save time.

I’ve built this example as an MVC project, but Webforms could be used as well and I’ve included a WCF service as an example. To use it, in the Index.cshtml view, change the serviceUrl ternary to true.

To wrap up,

• Pros:
• Very responsive UX. Events can be reported within milliseconds of a database save.
• Allows interaction with other connected clients (e.g., customer service chat).
• Can be added to existing projects.
• Cons:
• Introduces some complexity in the code, especially on the client side. More states to keep track of.

The source for this demo is available on github: Registry.

Koderank

Koderank is an online whiteboard that provides an environment for quick and easy coding interviews. Interviewers can give candidates small coding exercises to gauge their abilities, and can view the code live as it is being typed. Voice chat is available through Twilio Client to allow the interviewer and candidate to converse during the session.

I originally created it for the Twilio Client launch contest and it was selected as one of five winning entries. The first iteration was done in Python/jQuery. As a learning exercise, I reimplemented it in Go language and Closure Library.

One of the fun things about using a language as new as Go is that a lot of third-party libraries haven’t been released yet, so you often get to be the pioneer, although it sometimes means a few arrows in your back. Here are a couple tools I had to create:

• Twilio Client Capability. Used for creating and packaging scope definitions. I ported some functions over from John Sheehan’s C# project.
• Base 62 UUID’s. This is based off a suggestion by Russ Cox for creating simple UUID’s. For URL compression, I wrote a method that converts it to base 62. It requires the math/big package to convert the 32 character hexadecimal number into a 23 character base 62 string.

Sometimes it’s nice not to have to write everything yourself, though. Kudos to the developers of the third party libraries that I used such as Diff-Match-Patch, CodeMirror, and GoJWT. They’re credited on the About page. In the same spirit, I’m opening the source to the Koderank website which you can now find on github: Koderank source. It runs on App Engine and uses the Channel API for sending data to the listening browser. The server and client-side are written in Go and Closure Library, respectively.

I’ve written previously about my experiences with Go and Closure.

Go Language

A welcome side-effect of exploring new languages is that they tend to force you to think differently about language design, at times bringing improvements to the languages you use daily. Here are some observations about Go:

• Declaration is very concise. := is used as declaration + assignment. var and a type are necessary if there is no assignment (and thus no type to infer from).
• Multiple assignment. This is not unique to Go, but it highlights what is so attractive about it: you can compose code in a way that is so natural that others find it easy to read. How do you swap 2 integers in C#?

In Go?

More importantly, multiple variables can be returned from a function. This eliminates having to create transport objects whose only purpose is to wrap multiple values.

• Implicit interfaces. Implement an interface without having to declare it.
• Strict compiler. Does not compile if imports or variables are unused, which has prevented many bugs. So much is caught by the compiler that I’ve been pleasantly surprised by how frequently I’ve gotten past compilation and the program ran without errors.
• Slices. Think of them as sub-array pointers.
• Error handling. Idiomatic Go encourages always checking for errors as soon as they may occur or be returned. If you use guard statements often, you’ll probably be OK with this style of coding.

Here’s a list of some great resources for learning Go: Go Resources.

Rangecheck

According to Oracle, Java’s rangeCheck is a formidable accomplishment. Here it is in LOLCODE:

Example usage:

I competed in Google Code Jam for the first time this year and really had fun with it. Some of the puzzles can seem challenging at first glance but usually break down into a main algorithm with a few checks for outliers and limits. It’s a pretty good test of how quickly you can translate a problem into code.

I used C++ primarily and Go language in one case since I knew it had a handy string.Map function. My solutions can be found on github.

One of the problems, named Kingdom Rush, provided a game scenario as input and required a result indicating the shortest number of turns to complete it.

On my 1st pass I implemented the obvious naive solution:

It passed the given samples, so I submitted and it failed. The grader doesn’t say why or give errors, so I had to find a case that broke the code. This is where pencil and paper are invaluable since at this point you have to simulate instances. Turns out, you can shorten your game by doing 2-star ratings where the 1-star rating is unlikely to be completed until late. I resubmitted with this modification and it passed.

What I learned:

• Have a pencil and paper handy; this is largely a math contest. Doodling tends to provide insight. As I was working on Hall of Mirrors my sketch reminded me of the Fly and Spider problem, which revealed a very reasonable approach.
• Hurry. Readability and elegance are for teams and longevity; this is an intense individual exercise where you only need 1 good run. Write the quickest thing that might work.
• Remember any initial assumptions. In this problem, I assumed that order did not matter. When the solution I had coded worked exactly as intended, but was still wrong, it took me valuable minutes to work back and realize that turns could be minimized with a deliberate selection order. Don’t mistake this for advice about not making assumptions; you should (see “Hurry”). Just remember what you glossed over when things don’t work.
• Break it down. The idea behind this is trite, yet why do large problems always daunt us as if they were monoliths that can only be tackled in one piece? While not being gimmicky, these problems often break down into known solveable chunks. At its core is usually something familiar.

Get Time of Last Wake From Sleep in Windows 7

Start -> Paste in “Search programs and files” box:

Here’s the breakdown:

cmd /k - This opens a command session, executes the command that follows, and keeps the window open. Useful if you’re entering commands in a Run dialog; unnecessary if you already have an active command prompt.

wevtutil - Windows event log read utility.

qe - Query events …

System - … specifically from the system event logs.

/q:"..." - XPath-like query used to traverse the XML.

/rd:true - Direction, “true” shows latest first.

/c:1 - Number of results to return.

/f:text - Format output as text, which is fairly readable in this case. Could also opt for “XML”.

This simply returns the last event with a source name of ‘Microsoft-Windows-Power-Troubleshooter’. Skim through the result to verify that it’s a return from sleep event. If not, increase the number of results to return using the /c parameter.

This can also be accessed via the GUI using Start -> Run -> eventvwr -> Windows Logs -> System -> Find -> “Microsoft-Windows-Power-Troubleshooter”.

I’ve found this command useful in consulting situations where I need to track time for billing purposes.

Go Resources

Below are some resources for learning/using Go, #golang:

• online
• print
• code samples, packages, libraries
• hosts
• general