Archive Page 2


Vaporbase micro-CMS: Performance Improvement

(Warning: code talk ahead.)

When I started building the app I’m working on, I needed a quick CMS that didn’t impose its own parsing language, and I was just learning RoR, so I implemented the Vaporbase micro-CMS based on this great tutorial. While you wouldn’t confuse this with a pro-grade CMS, it works just fine for the most part, and doesn’t require you to learn a number of new things.

Since implementing it, I’d made a few minor cleanups, but nothing significant, until I had 60 pages in the CMS, and saw a problem: the list of pages was taking a very long time to load (21sec on avg on my dev machine).

Sure enough, there’s a problem: in the Edit and Show Tree Hierarchy section, there’s a line that says

<% unless index_item.children.nil? %>

Looks innocuous, right? Not exactly: for every single page in the tree, this runs a separate SQL query (a full table scan, though likely on a tiny table) to find out whether that item has any children. So building the page takes O(# pages in table) queries, and while the queries themselves aren’t that expensive, building the page still takes a while. If your tree is mostly flat – almost certainly the case for this kind of CMS – you’re wasting a lot of time.

Fortunately, there’s an easy solution: when you get the list of pages before creating the tree, figure out which pages have children, and then store that list to the side. It adds a bit of code, but not a lot.

# Original: from the index action in the controller
# This gets just the root nodes and then recurs down the tree
@pages = Page.find(:all, :conditions => ['parent_id IS NULL'], :order => :position)

# Replacement:
# build two arrays:
# @pages consists of all of the root nodes (since the view walks down the tree)
# @pageIDs_with_children consists of all pageIDs which are the parent for >=1 page

@pages = []
@pageIDs_with_children = []

# :order is carried over from the original query so the root nodes keep their position order
@all_pages = Page.find(:all, :order => :position)
@all_pages.each do |page|
  if page.parent_id.nil?
    @pages << page
  else
    @pageIDs_with_children << page.parent_id
  end
end

(Note that the list might have duplicates, but that doesn’t matter. If that makes you unhappy, check for existence in the else clause.)
Then replace the view line above with

<% if @pageIDs_with_children.index( %>

I saw a >80X performance improvement on this page with this change in development (from >20sec to ~0.25sec with 60 pages), and you’ve just replaced an expensive linear scaling step with a very inexpensive one.
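If you want to try the idea outside of Rails, here’s a minimal sketch in plain Ruby. `Page` here is a hypothetical Struct standing in for the ActiveRecord model, the sample data is made up, and I’m assuming the per-item check looks up the current item’s id. It also swaps the array for a Set, so duplicates disappear and the lookup doesn’t have to scan a list.

```ruby
require 'set'

# Hypothetical stand-in for the Page model: only the fields the tree view needs.
Page = Struct.new(:id, :parent_id, :title)

# One pass over all pages: collect the root nodes and the set of IDs
# that show up as some other page's parent_id.
def partition_pages(all_pages)
  roots = []
  ids_with_children = Set.new # a Set drops duplicates for free
  all_pages.each do |page|
    if page.parent_id.nil?
      roots << page
    else
      ids_with_children << page.parent_id
    end
  end
  [roots, ids_with_children]
end

# Made-up sample data: two root pages, one of which has a small subtree.
all_pages = [
  Page.new(1, nil, 'Home'),
  Page.new(2, nil, 'About'),
  Page.new(3, 1, 'News'),
  Page.new(4, 1, 'Archive'),
  Page.new(5, 3, 'Press')
]

roots, ids_with_children = partition_pages(all_pages)

# The per-item check the view needs, with no extra query per page.
has_children = lambda { |page| ids_with_children.include?(page.id) }
```

In the view, the check would then be something along the lines of `@pageIDs_with_children.include?(index_item.id)`, assuming `index_item` is the page currently being rendered.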


Search Evernote, not Google

I’ve played with 100 note-taking/history-recording tools over the years, and none of them have ever really taken off for me. I started playing with Evernote soon after the beta began, and while I loved the interface and the story, my uses were very occasional.

Now, however, I’m constantly in Evernote – it’s third only to my browser and Textmate – as I’m coding. 

Why? I’m using it as a programming information repository. Every day that I program in Ruby on Rails, I learn a few new things or change the way I do things – today, for example, I learned that interpolation is faster than concatenation, and yesterday I finally found a clear explanation on how yields work. What I do now is every time I find an interesting piece of information, I clip the whole page to Evernote, add a few tags, and then keep on going. Occasionally I’ll paste screenshots from the PDFs of some books I own as well, which then (usually) get magically transcribed for me.

I’ve realized that I was using Google to search for the same information over and over again, and then trying to remember which link I liked the best. With Evernote, I’ve already recorded that memory, so I just have one source. Every day a greater percentage of my programming-related searching is happening in Evernote.

I’ve thought about making a separate notebook for my Rails/Ruby content and making it public – if I do so, I’ll post it here – though obviously it’s not a replacement for an actual text, it just has pointers to things I’ve cared about. However, if a dozen people did this, I’d love to have links to all of their notebooks.


On Apple, Amazon, Reviewing, and Large Companies

Two interesting stories going around in the last week, which I find to be similar (even if their impact is different): Apple’s being raked over the coals for rejecting an iPhone app that duplicates Apple functionality, and Amazon’s been dealing with the customer review attack on Spore, including purging and then restoring all of the reviews.

As I’ve mentioned before, I used to manage the Amazon customer reviews business, and so I know very well what the current team is going through. My assumption is that the Apple app store review business has some similar processes and problems. Here are some things I learned while dealing with this:

You start with some philosophical rules, and you try to make them stick. Providing guidelines is the only way to start. Example philosophies for Amazon (made-up, these aren’t real, don’t quote them anywhere else) could include “our customer is the Amazon buyer” (so no, Ms. Vendor, we won’t take down the negative reviews of your book, even though you spend a lot of money on advertising with us), “we eliminate reviews with demonstrably false information”, and “fairness is more important than justice” (so if you generally write good reviews and then get caught plagiarizing once, you can be given more chances). 

All sensible on their face, and all make sense to folks who think in these kinds of abstractions all day – there may still be debate but these are good places to start.

There’s a clear chain of command for decisions. The escalation path from “customer service rep in her fourth week receives a review complaint in the mail queue” to “Jeff decides the review stays” should be very clear. (In my ~2 years dealing with customer reviews, btw, Jeff only engaged once on actual content, and the issue was much larger than just reviews (and he was getting hundreds of mails on this topic) – he generally trusted the heads of these teams to do the right thing as long as they could articulate the philosophy.)

All of this sounds good, of course, but then people get involved. And customer service reps are trying to interpret the philosophies (if they can find them among hundreds of pages of other rules), and some of them are judgment calls (what is “demonstrably false?” If I say “the defibrillator didn’t work and my dad died,” is someone going to check? Are comments on voting records trustworthy? etc.) that different people will make differently, and of course you don’t want Jeff or Steve Jobs or anyone making every decision.

So it’s messy, and when it’s messy, strange things happen – reviews appear and disappear, apps go away and come back (like Netshare), etc. 

This is a long way of saying that it’s entirely likely that the banning of Podcaster is a problem of human judgment in a theoretically well-structured system – not least because the decision seems inconsistent – and that the app could easily come back, not because of a correction of a philosophy, but because of a correction of a human error.

Now, it’s Apple’s responsibility to make that correction, and then to treat the errant employee with respect and look at how the company can do a better job. 



Programming: When you’re stuck, write it down

Hey. Long time, no blog. Been busy.

Thought I’d share a tip that I’ve used for many years when programming (or learning almost any new thing) and I’ve been using recently: when you’re stuck, write it down.

Say you’re trying to figure out how to do something in [pick a framework], and you’ve Googled the heck out of the most-likely search terms, and nothing’s coming up.

Then write down your question as if you were going to ask a teacher/email it to a friend/post to a Google group/etc. Write down all the details: explain the thing you’re trying to do, the problem you have, and the number of things you’ve tried. Be as clear as you can, but don’t worry about being concise.

Literally every single time I’ve ever done this – and my rule-of-thumb is to do it after ~1.5 days’ worth of trying to figure it out myself – I find a number of new avenues to try, and almost always solve the problem on my own.

Writing it down forces you to take the jumbled thoughts in your head (they probably weren’t jumbled when you started, but you’ve changed paths so many times now) and turn them into a narrative. The process makes visible paths that your random walk opened up but that you didn’t see.

(This has been a procrastination for completing a writeup of my own.)


Newcomer’s Guide to Foo Camp

This was my first year attending Foo Camp, Tim O’Reilly and O’Reilly Media’s annual gathering of new and old friends. (Thanks are certainly due to Jesse Robbins and Brady Forrest, who have helped me connect to the O’Reilly community and who no doubt helped make the invitation appear.) TechCrunch has a great summary of this year’s event, with some excellent comments from attendees.

While I’m only 24 hours home and still only quarter-brained, I thought I’d write some hints down for next year’s newbies. There are some great resources about Foo Camp online – the 2008 Foo Camp Wiki and Scott Berkun’s collection of articles are just two – but I thought I’d add a few things to help out.

  • Force yourself through the shellshock. It is strange showing up on the O’Reilly campus and seeing the folks you’ve been reading, listening to, or building things because of, all just chatting with each other. Being around people who are famous (at least to you, maybe not so much your mom) is strange, and it’s easy to seek out the few people you do know and chat with them.

    My advice: get through it as fast as possible and get to the other side. It took me ~2 hours to stop talking to just the Seattle people. There’s wine if you need it – I just forced it until it became natural in an hour. (Talking about something you know with someone you don’t really helps.) I met almost nobody who was too pretentious to talk to the more unknown people at the camp – there was an assumption that everybody had interesting things to talk about.

    It’s also worth remembering that almost everyone is going through that same thought process – one non-slouch-himself said that in every conversation, “I could always tell why that person was here, but I couldn’t tell why I was here.” I just called it “Impostor Syndrome Camp.”

    Introvert? Extrovert? Worrywart? Just fake it ’til you make it. The experience will be better for it.

  • Talk about something you know. Two things here: first, Talk. Do something: run a session, give a lightning talk, something. Be in production at least as much as you’re in marketing; produce at least as much as you consume, in the O’Reilly lingo.

    Second, make it about something you know. This may seem counter-intuitive, as Foo is supposed to include a lot of learning about things you don’t know, but 1) this isn’t the place to not know what you’re talking about, and 2) people do know who you are and want to hear about the things you do.

    I was fortunately dragged along on this one, when Steve Souders asked me to join him to talk about making faster web pages. I spoke about Jiffy, Steve talked about some of his new ideas, we had a lot of conversation with people from organizations big and small, and it tied me to something that was concrete for others and led to a number of followup conversations. (I also learned that Jiffy is spreading much faster than I knew… more examples to come on that later.)

    You don’t have to just talk about areas where you’re an expert: I also led a session on Unexpected Consequences of Software, which I could talk about intelligently but which others could know better, and it wasn’t bad – but it wasn’t as good, either. I also did a proto-Ignite talk on making your first open source project, which I’ll likely turn into a real Ignite talk for later in the year.

    Oh, and don’t stress about the signup board. I say the mad rush is overstated, and I know – I was pinned against it for about 5 minutes, and then just watched people for another ~20 (and the board still had plenty of space). There were still a few signup sessions open >24 hours after the conference started. If you want to talk about something, you’ll find people to talk with. (Of course, before you do, read Scott Berkun’s required reading on running an unconference session.)

  • Practice Serendipitous Session Selection. All of the advice said “go listen to things you don’t know anything about” – I didn’t even know how to pick those, and so a few times I found myself just wandering about for the first five minutes until a home felt right. Sunday morning, I just walked by some tents and saw Linda Stone leading a session with some people I had talked with earlier, and I just sat down. It ended up being about Attention Hacks – ways to improve your ability to focus – and let’s just say that we’re all ready to build some very cool things after that hour.

    I saw some other newcomers plan out their whole schedule in advance: my advice is pick the next session when it’s time to go – it enables more magical things to happen.

  • Play a little. Join a Werewolf game, play whatever crazy thing Jane or Robin (whose game I was too late to play 🙁 ) or Elan or someone else will bring, find a demo, something that forces you into a different kind of interaction. Does wonders for your confidence and creates memories (and YouTube videos). Every conversation doesn’t have to be dripping with meaning, and you aren’t wasting your time not having them.

That’s all. Thanks for the invite, O’Reilly folks, and I hope to join next year’s newcomers!


Speaking for the only-speaking

So I’m reading the New Yorker article on Sheldon Adelson’s political maneuverings and I came across the following passage:

“About three hours later DeLay calls and he tells Sheldon, ‘You’re in luck,’ ” he continued, “ ‘because we’ve got a military-spending bill. . . . We’re not going to be able to move the bill, so you tell your mayor that he can be assured that this bill will never see the light of day.’… (According to DeLay’s spokeswoman, DeLay does not recall the conversation and had no role in blocking the bill)

And my immediate reaction is this:

Tom DeLay has a spokeswoman?

Political opinions aside, what exactly does Tom DeLay need a spokeswoman for? Wikipedia says his current work includes a ghostwritten blog and a book he wrote with a professional writer. Otherwise, all he does is speak – to the press, at least, says Google News. Speaking is his job: he needs someone else to speak sometimes? Maybe this is some sort of post-Congressional pension – your own (or shared) spokesperson?

What does this person do all day? How often does she have to speak for the person who at this point just speaks? Does she speak for other people as well?

Perhaps this post will become the #1 SEO result for “Tom DeLay Spokeswoman,” and she’ll Google herself, and she’ll respond. We can only hope that I can corner that search niche.


Not-at-all-fun things to do with Twitter

Every once in a while I hit another breathless article about all the cool things you can do with Twitter. Most of them involve either following things I don’t actually care about or submitting obscure numbers for reasons unknown.

So here’s the start of a list of things that might seem less fun.

  • Get your stuff stolen. When I get into burglary, I’ll start with Twitter. Hey, Ian’s at a play! That’s plenty of time to take his MacBook and mardi gras beads.
  • Get spammed. OK, The Onion stopped, but still, it’s like double opt-in to friendly spam.
  • Be fooled by a fake celebrity. Ira Glass? Fake Ira Glass, currently dealing in non-sequiturs. (Some were likely devastated.)
  • Keep up with dictators.
    what are you doing?

    (Sadly, I believe Twitter has started blocking the creation of other dictators’ names: Pol Pot & Robert Mugabe were “unavailable” but have no pages. Probably that’s a good thing.)
  • Wait for pages to load. You already knew that one, but I was Internetally obligated to include it. Sorry, I don’t make the law.

Anything else to add?


Starting a Seattle CTO Support Group

I’ve been thinking for a while about trying to put together a Seattle “CTO Support Group” and have mentioned this to a few folks, and I’m finally kicking this off. I posted basically this message to the Seattle Tech Startups mailing list this past Friday, but I’m not sure who that hits and misses. I’ve had ~20 people express interest so far.

Opening thoughts:

  • Attendees are CTOs (or whatever title denotes “head of technology” – software and ops) at local companies/govts/etc. Starting point is that this is one person in a company; feel free to tell me that this doesn’t make sense for your organization, but the goal isn’t a replacement for other tech gatherings with more open invites.
  • Figure the makeup of the group will determine the balance between small/medium/large companies, private v. public, funded v. bootstrapped, etc. Early folks have been primarily (but not entirely) small startups.
  • Would have regular get-togethers – breakfast? lunch? – hosted by some company, ideally in a regular location (prob. in downtown Seattle or Pioneer Square) or in a rotating location.
  • Get-togethers would be 70% mingling, 30% presentation (probably from someone in the group) on an interesting topic – could be technical, could be managerial, could be look-at-the-industry. Assumption is that the content and topics are meant for hands-on leaders and are real-world, not stuff we’d read in a Gartner report if we read them anymore. Obviously interactive.
  • Assumption is that everything is under FrieNDA.

If you fit the description and would be interested enough to attend regularly, let me know (and if you have a preference b/w breakfast or lunch, or another bright idea, let me know that as well). There’s an “Email Me” link in the sidebar. I’ll likely host the first one in the next N weeks at WhitePages HQ in downtown Seattle (and probably could host regularly; we have a good space, depending on size).


Velocity: Wikipedia Operations Talk

A few interesting notes from the Wikipedia talk, which for my money was the most interesting ops talk of the conference so far, from a Wikipedia engineer and trustee:

  • Lots of talk at the beginning about how many mistakes they make and how often an employee takes the site down. Refreshing, if strange, to hear such a perspective. Hard to find volunteers who aren’t too scared about breaking the site. (Mentioned later that there are 6 site engineers, two of them volunteers. All of them were in the room.)
  • An early graph showed that the % of non-cached data was a super-tiny % of all traffic: even with what’s no doubt a very sparse hit matrix, it seems like they’re able to cache just about everything.
  • The speaker didn’t know what their uptime was – he said (without prompting) “99 point something” – and very much said that “lots of nines” wasn’t a priority for them. He was an engineer and not a manager type, but it was still an interesting perspective – obsess over efficient use of non-profit resources (and people), not about high availability. This theme continues throughout the talk.
  • “I don’t know how important the last few seconds of changes are” – people can just re-enter the changes.
  • When the site goes down, their Gone Fishin’ page shows a donation box, and they make a huge amount of money. It’s the most profitable time. “It’s sometimes better to torture people.”
  • They have an external developer community and like it when outside developers add features, but their optimization operator is //: always be ready to disable something. Heavy tasks get eliminated, delayed, reduced, and otherwise made smaller.
  • Some very funny/snarky comments, like pointing out how they split their database into multiple slaves based on use cases/languages, and then said something like “now that’s called sharding. We didn’t call it sharding, we just do things and then people come up with names for them.”
  • Uses Lucene for its search engine: since Lucene was based on Java, and Java wasn’t open-source friendly, they got it running on an open-source Java clone and on Mono (!). Now back to Lucene proper and very focused on keeping things open.
  • Every feature anyone builds needs to allow caching and think about invalidation. OK to think of the database as a cache if the in-memory cache hit rate will be very low. Memcached for an object cache with a lot of careful planning, aggressive Squid usage as an HTTP cache with a month-long TTL (and an app that purges outdated data), multiple layers and geographically distributed.
  • Have some old P4s that they won’t use b/c of their power usage and heat distribution. (We have some as well, but they aren’t sitting idle yet…)
  • Some crazy high numbers (80K SQL queries/sec, 50K pages/sec, >1TB of compressed revisions, etc.) that I didn’t catch – I’ll look for this talk and update it later.
  • “Please open source your stuff, so we can use it!”
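The long-TTL-plus-purge idea from the caching bullet can be sketched in a few lines of Ruby. This is a toy in-process cache, not how Squid or Memcached are actually set up at Wikipedia; `PurgeableCache` and its methods are hypothetical names made up for illustration.

```ruby
# Hypothetical toy cache: entries live for a long TTL, and the application
# purges an entry explicitly whenever the underlying data changes.
class PurgeableCache
  Entry = Struct.new(:value, :expires_at)

  # clock is injectable so expiry can be exercised without waiting.
  def initialize(ttl_seconds, clock = -> { Time.now.to_f })
    @ttl = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Return the cached value, or run the block to rebuild it on a miss or expiry.
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @entries[key] = Entry.new(value, @clock.call + @ttl)
    value
  end

  # Called by the application when an edit makes the cached copy outdated.
  def purge(key)
    @entries.delete(key)
  end
end

renders = 0
render = -> { renders += 1; "rendered page v#{renders}" }

cache = PurgeableCache.new(30 * 24 * 60 * 60) # roughly a month, in seconds

first  = cache.fetch('Main_Page', &render) # miss: renders the page
second = cache.fetch('Main_Page', &render) # hit: served from the cache
cache.purge('Main_Page')                   # an edit invalidates the entry
third  = cache.fetch('Main_Page', &render) # miss again: re-renders
```

The point of the design is that the TTL is only a backstop: freshness comes from the purge call, so the TTL can be very long and the hit rate stays high.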

All in all, a very memorable talk.


Velocity: Introducing Jiffy – Performance Tools for the Rest of Us

During my time at Amazon, I drank the performance obsession kool-aid. Geniuses like John did the hard-core data analysis to show that milliseconds matter to encourage customers to buy things. Using the great toolset Amazon had built for analyzing page performance, my teams worked to improve the response time at the average, the 99.9th percentile, etc.

Amazon’s challenges were very specific – pages were built by hundreds of service calls to hundreds of services across thousands of boxes, and so the page framework needed to be able to respond to failing or slowing services quickly, with appropriate backoff, etc. These were fun and heady problems to work on and follow.

When I left Amazon for WhitePages, I realized that the toolset I had grown to rely on both solved Amazon’s unique problems and was kept inside Amazon. I learned that other large companies had similar systems, and I could pay the usual suspects for some kinds of services, but there was nothing just off-the-shelf that could give me 10% of the information Amazon had.

Smaller web publishers, then, have the double-whammy of fewer tools to measure page performance and fewer engineers to build the tools to make that better.

That’s not right for the web, and so at WhitePages, we’ve built a toolset to help engineers build faster web sites, based on real-world data from real clients.

Today, we released that system as an open source project – Jiffy. Jiffy’s an end-to-end system for measuring page and component performance across real data.

I’ve blogged in detail on the WhitePages Developer Blog about Jiffy, and the code is immediately available for download. I announced the release at the O’Reilly Velocity conference this morning – here are the slides.

More to come about Velocity later, which has already had a few interesting moments, even if the room is very hard to make laugh. (I killed a few jokes before I started, likely a good call, since the ones I left fell flat.)
