orbitz, open source and me

2008-06-30

Some friends at work were recently interviewed by Matt Asay on the release of our monitoring software, ERMA, as open source — an unconventional move for corporate America.

We have a long and quiet relationship with open source at Orbitz. In the article Matt O’Keefe was kind enough to throw a compliment my way:

We have a history of contributing to other open-source projects. Brian Zimmer and others on the team have been very active in open-source projects.

You can read more coverage about open-sourcing ERMA here and be sure to check out the real-time visualization software we released, Graphite, as well.

Congrats Matt!

Categories : development

Locks, food and flowers.

2008-06-26

Sunday, another gloomy day this spring/summer, found us in Seattle checking out the Hiram M. Chittenden (Ballard) Locks and the neighboring Carl S. English, Jr. Botanical Garden.

locks

salmon ladder

baby seal

The locks were originally built so coal and timber could be easily transported by boat, but today salmon, seals and pleasure boating dominate the landscape.

A few boats were loaded into the locks while we watched, a couple apparently doing it for the first time given the general confusion around what to do. We started to watch the water rise but having risen in locks while kayaking, we knew the proceedings and left early to check out the botanic garden …

hot pink

orange poppy

which I found more interesting.

After we spent some time chasing squirrels and running through the lawns we went to the Ballard Market and Clover Toys, the kind of toy store we love with lots of European-imported toys and almost no plastic crap (though we did walk out with a new rubber duck named “Ben”). The owner of the store introduced us to Froebel Gifts which we had to buy because I loved them so, my daughter taking to them (Gift 1) as much as I.

I love farmers’ markets. In Chicago I would shop three days a week in the summer and arrive at work loaded down with bags of currants, blueberries, bread, … whatever was seasonal. The BI market is great but the Ballard Market is really my kind of affair, reminding me a little of the markets in Paris with a full complement of products on display.

For as often as we attend a farmers’ market I rarely take photos even though I feel it’s a wonderful display of color and textures — I rushed these photos not wanting to look like that-guy-with-the-camera. I can’t wait to go back and fortunately it doesn’t conflict with our local market.

morels

turnips

carrots & broccoli

Back home in the garden we have our own peonies opening slowly.

peony

Horses, guns and loud noises.

2008-06-24

In a departure from our usual naturalizing, I persuaded the family to attend the Battle of Port Gamble, a Civil War reenactment in Port Gamble, an event type we had never before attended.

equestrian parade

It started innocently enough with an Equestrian Parade over the battlefield.

Then the problems started. I should have realized the guns would be loud, but they were LOUD: the cannons literally shook the earth when they fired, startling my daughter to tears. My wife and daughter departed for an ice cream cone while I stayed behind to shoot some photos. I was not alone; I had no idea these events attracted such a large contingent of expensive camera gear.

the north shooting at the south

Let the battle begin.

girl

I love this photo. I don’t remember seeing the girl in the middle when I was shooting, but when I was going through the photos she really jumped out at me. I love how she’s standing so innocently amid the battle.

carnage

The battlefield at the end, as the Northern victory was secured.

fiddle

guitar

The camp after the battle was lively with food over open fires and musicians playing our favorite Southern music.

I’m happy we went, but I don’t think we’ll be going again, not because the event wasn’t well staged but because that’s enough loud noise for a while. Back to the woods, mountains and birds for us.

Dungeness Spit, Black Brants and Nash’s Organic Farm Stand.

2008-06-19

The would-be naturalists chose Dungeness Spit as this year’s Father’s Day destination. Some research on the web unearthed The 3 Crabs as a highly recommended crab shack right on the Sound with birding on the adjacent beach. Lunch plans decided, we packed the car and drove up to Dungeness.

We arrived at The 3 Crabs a bit early, lunch not served until 11:30am, so we tried to do some birding while we waited — the Sound did not cooperate.

no view

The 3 Crabs opened and we had crabcakes for lunch. While I like Dungeness crab, I generally finish wanting more, especially for the price, and this time was no exception. I’m still looking for a real crab shack — a dive.

The 3 Crabs

Fortunately the Sound cleared while we ate. If you look closely dead-center on the horizon you can see the New Dungeness Lighthouse. Compare this photo to the one above, taken maybe thirty minutes earlier; the weather on the coast is volatile.

departure

Somewhat disappointed, we left The 3 Crabs and drove to the Spit. Paying our three dollars we headed down the trail and stopped at a lookout which gave a great view of this unique formation.

the view down the Spit

A spit (courtesy Wikipedia):

A spit is a deposition landform found off coasts. At one end, spits connect to land, while at the far end they exist in open water. A spit is a type of bar or beach that develops where a re-entrant occurs, such as at cove’s headlands, by the process of longshore drift.

Basically, it’s a long, thin sandbar and in the case of the Dungeness Spit it protects the Dungeness National Wildlife Refuge. If you look at the extreme right of the horizon in the photo below you can see the Lighthouse, 5.5 miles down sand, stones and driftwood. This is one long finger.

the Spit wrapping its long way around

The Black Brant winters here and, while not yet threatened, it is being watched because of habitat loss.

black brant

On the way home we stopped at Nash’s Organic Produce in Dungeness. My daughter loved the broccoli and almost refused to hand it over to be weighed. The produce was delicious — and cheap! I can’t wait to go back. As unimpressed as I was with The 3 Crabs, I loved Nash’s!

Nash's Farm Stand

inside Nash's

SmugNDrag v1.4.

2008-06-17

I’m happy to announce SmugNDrag v1.4 has been released. The primary change is the addition of the Sparkle framework to automate version updates with a minor change to the UI. The new release is available here. Enjoy!

Categories : development   photography

Google Seattle Conference on Scalability — 2008 Edition.

2008-06-16

Overview

I was really looking forward to this conference based on my experience last year, with the likes of Jeff Dean and Marissa Mayer presenting. When I saw the original agenda for Saturday I was excited to see they expanded the number of talks at the expense of having to make a decision about which presentation to attend, a task at which I often feel I failed.

When I arrived, late, I was surprised to see they had decided to change the format: rather than having two tracks for each session, the presentations were shortened so everyone could attend every talk. I’m not sure how much notice the presenters were given of this decision, because a number had presentations well exceeding the diminished time frame. As a conference presenter myself, I know that a well-rehearsed presentation can be difficult to amend on the fly.

Communicating Like Nemo

I’m not sure what I was supposed to get out of this presentation. I understand that working under water places significant constraints on connectivity, bandwidth and other factors, but I didn’t feel like I really learned much about how these are being overcome. I did get to brush up on my PADI hand signals; it’s been a while since I last dove.

maidsafe

Since I arrived at the conference a bit late I was seated towards the rear of the room for the first two talks. The presenter chose to use the whiteboard as a primary presentation medium, which as a friend said demonstrates he really has confidence and knows his shit, but for me was unfortunate since I could barely hear the presentation or see the board. Since my mind was already deep in debugging objc’s forwardInvocation: I chose to leave the room and finish my work, in which I’m happy to report success. Afterwards, I learned this talk was pretty good if you could see and hear.

Chapel

Fantastic. This was the quality and topic of talk I was looking forward to seeing. Chapel is a new programming language coming out of Cray which:

supports a multithreaded parallel programming model at a high level by supporting abstractions for data parallelism, task parallelism, and nested parallelism. It supports optimization for the locality of data and computation in the program via abstractions for data distribution and data-driven placement of subcomputations.

It supports constructs within the language to create and execute arbitrarily nested tasks via a begin keyword and join on the results of those calculations via sync. Furthermore, it can execute the same tasks in parallel by using cobegin and coforall operations without changing the underlying code. This is an improvement over the current state-of-the-art MPI programming, which forces the developer to have intimate knowledge of both the high-level logic of the application and the distributed runtime, creating difficult-to-maintain code. Chapel also supports synchronization of tasks in a similar, data-driven manner.

In addition to the task and data parallelism, Chapel supports the idea of locales which can be CPUs, cores or separate machines entirely. Through lower level constructs such as locale and on, the developer can specify where tasks should run and how resources are accessed and utilized.
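As a loose analogy only (this is Python, not Chapel, and it says nothing about locales), the begin/join and coforall ideas can be sketched with the standard library's concurrent.futures. The function name square and the pool size are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """Stand-in for a unit of work; hypothetical, for illustration only."""
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # Loosely like Chapel's `begin`: start a task and carry on immediately.
    future = pool.submit(square, 7)

    # Loosely like `coforall`: one task per element, all allowed to run concurrently.
    squares = list(pool.map(square, range(5)))

    # Joining on the result, loosely like waiting on a sync variable.
    started = future.result()

print(started, squares)
```

The point of the comparison is that the task body (square) stays the same whether it runs once, in a loop, or in parallel; only the way it is launched changes.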

This is pretty exciting. I like the approach of high-level, don’t-worry-about-it language features with the ability to dig deeper if necessary. Unfortunately, I’m not sure this language will ever see a line of code from the likes of me given its intended problem domain and hardware.

Carmen

The scientific community is plagued by a number of issues regarding research such as a myriad of file formats, no central repository for data and limited data sharing and analysis. Carmen addresses some of these concerns through the implementation of a domain-specific cloud architecture. In many ways it looks and feels like EC2+AWS but it addresses the specific needs of the science community, such as the security model for collaboration and the cost structure of using the commercial clouds given the cost for data storage would be extraordinarily high.

In order to carry out experiments or analysis, data and services are uploaded to the cloud and then a workflow is created to integrate, via SOAP, the binary services (WARs, executables). During the runtime of the analysis, if additional services are required (based on numerous metrics) they are automatically created and deployed. This sounds a lot like a combination of EC2 and AppEngine.

The presenter also showed a photo of an exposed human brain from an operation — unexpected at a computer conference.

GIGA+

This was one of those talks that was, for me, better for the bits of take-away material than the actual product being presented. For example, when a node reaches storage capacity in GIGA+ it splits some of the data elsewhere. In order to achieve limited-to-no locking, each node keeps a table of where it sent data so every client doesn’t have to be updated right now but instead can be lazily updated. If a client makes a request to the old node because of a stale view of the world, the request is forwarded, à la HTTP, and the client updated. I also learned about extendible hashing and bitmap management of partition locations.

This could have been a more interesting talk, but a lot of assumptions about the operating environment were made, making it more or less unrealistic at the moment, such as: the network is always reliable, the configuration is static, no offline disconnected mode.
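Here is a minimal sketch of that forward-and-lazily-update idea as I understood it from the talk. It is not GIGA+ code: the Node and Client classes are hypothetical, and it tracks moved keys one by one rather than using hashed partitions and bitmaps.

```python
class Node:
    """A storage node that remembers where it sent keys after a split."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.moved = {}  # key -> Node that now owns it

    def split(self, other):
        """Move the upper half of the keys to `other` and record where they went."""
        keys = sorted(self.data)
        for key in keys[len(keys) // 2:]:
            other.data[key] = self.data.pop(key)
            self.moved[key] = other

    def get(self, key):
        """Return (value, owner); forward the request if the key has moved."""
        if key in self.moved:
            return self.moved[key].get(key)
        return self.data[key], self


class Client:
    """A client with a possibly stale view of which node owns each key."""

    def __init__(self, default):
        self.default = default
        self.view = {}  # key -> Node, filled in lazily

    def get(self, key):
        asked = self.view.get(key, self.default)
        value, owner = asked.get(key)
        if owner is not asked:
            # Lazy update: the reply tells the client who really owns the key.
            self.view[key] = owner
        return value


a, b = Node("a"), Node("b")
a.data.update({"k1": 1, "k2": 2, "k3": 3, "k4": 4})
client = Client(a)

a.split(b)                # nobody tells the client about the split
first = client.get("k4")  # stale view: a forwards the request to b
second = client.get("k4") # updated view: the client goes straight to b
```

No client is contacted at split time; each one learns about the move only when it next touches a key that moved.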

Google Maps Mobile

A light, but interesting overview of the problems facing mobile development: lots of OSs, form factors, bandwidth, available storage, security, localization, …

Wikipedia on Erlang

This talk should have replaced Erlang with DHT in the title, for it was really about replacing a typical large-scale MySQL cluster of databases with a DHT+transactions to implement a clone of Wikipedia. As far as I could tell, Erlang was used as a pseudo-message bus with more development in Java integrated with Erlang via JInterface. In the end, this looked like a similar implementation to SimpleDB or any of the other key-value stores.

NetWorkSpaces

NetWorkSpaces is a Python-implemented (Twisted and Zope) tuplespace integrated with R to provide parallel computation for the otherwise serial computational model of R. Given the almost commodity-like tuplespace environment, it seems the real advantage here is the integration with R and not the tuplespace itself (again see SimpleDB, …), though the presenter pointed out NWS would run on any platform which runs Python. The typical deployment is small, around 12-16 nodes, because that’s a normal installation more than a limitation of the architecture.

Shared Transactional Memory

A good, general overview of the problems facing language and hardware (Azul, Sun Rock) developers and engineers as they attempt to address transactional memory. I thought the presenter did a nice job of demonstrating the issues through code examples but as with any [H|S]TM presentation, it was light on answers and heavy on “that needs to be figured out”.

Conclusion

I was glad I went for the Chapel talk and enjoyed the Carmen, GIGA+ and STM talks.

One of the themes I took away was while cloud computing has become mainstream there’s a need to add the domain-specific abstraction on top of it, not too dissimilar really to the ever-growing popularity of DSLs implemented in mainstream languages.

I liked last year’s approach better: fewer talks, more time for each presentation, more polished speakers and more technical content; I also liked the move to Seattle from Bellevue.

pysmug, tag clouds, asynch IO and the SmugMug API.

2008-06-12

A question was asked on a dgrin thread about whether the SmugMug API supported building a tag cloud; it doesn’t. A responder suggested it would take far too long to generate one from the API since you’d have to trawl through every photo. This is indeed true, but you don’t have to do it serially. I consider the batchable interface for pysmug to be its selling point and building a tag cloud is the perfect demonstration.

In order to get the results for my 80+ albums and 3200+ photos I need to make one call to get the full list of albums and then one call each for every photo. If this were being done serially, I’d give up too, but under pysmug sits pycURL+libcurl, which is very fast at handling many, many simultaneous requests.

Here’s the code:

import collections

def tagcloud(self, kwfunc=None):
  """
  Compute the occurrence count for all keywords for all images in all albums.

  @keyword kwfunc: function taking a single string and returning a list of keywords
  @return: a tuple of (number of albums, number of images, {keyword: occurrences})
  """
  # Queue one images_get request per album on the batch.
  b = self.batch()
  albums = self.albums_get()["Albums"]
  for album in albums:
    b.images_get(AlbumID=album["id"], AlbumKey=album["Key"], Heavy=True)

  images = 0
  kwfunc = kwfunc or _kwsplit  # _kwsplit is the default keyword splitter (not shown here)
  cloud = collections.defaultdict(int)
  # Calling the batch runs the queued requests and yields (params, response) pairs.
  for params, response in b():
    album = response["Album"]
    images += album["ImageCount"]
    for image in album["Images"]:
      keywords = image["Keywords"].strip()
      if not keywords:
        continue  # skip images with no keywords
      for k in kwfunc(keywords):
        cloud[k] += 1

  return (len(albums), images, cloud)

The big win here is I’m not waiting on sum(response times) but rather on max(response times) because the requests are being handled asynchronously and the responses are coming back as soon as they’re ready. If I remove the use of the batchable and instead make the requests serially I wait much, much longer: batchables create the cloud in less than 30 seconds, serially it takes just under three minutes. This works out to around 110 requests/second for the batchable and 19 requests/second serially. I’d say that’s an impressive performance improvement.
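To make the sum-versus-max point concrete, here is a small self-contained sketch. It does not use pysmug or pycURL: it assumes Python 3's asyncio and replaces real HTTP calls with simulated sleeps, and the names fake_request, serial, batched and timed are invented for the example.

```python
import asyncio
import time


async def fake_request(delay):
    """Pretend to be an HTTP request that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return delay


async def serial(delays):
    # One request at a time: total time is roughly sum(delays).
    return [await fake_request(d) for d in delays]


async def batched(delays):
    # All requests in flight at once on a single thread:
    # total time is roughly max(delays).
    return await asyncio.gather(*(fake_request(d) for d in delays))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


delays = [0.05, 0.10, 0.15, 0.20]
serial_result, serial_time = timed(serial(delays))
batched_result, batched_time = timed(batched(delays))
print(f"serial ~{serial_time:.2f}s, batched ~{batched_time:.2f}s")
```

With real requests the gap depends on the server and the network, but the shape is the same: the serial run pays for every response time in turn, while the batched run mostly waits for the slowest one.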

This new method is available on tip and will be released with v0.5 (though it’s easily back-patched to v0.4). There are a number of other batchable examples in the SmugTool class.

I love asynchronous IO — concurrently handling many requests with a simple API makes me happy; using only one thread makes me happy too.