Thursday, September 25, 2014

The Cinnamon Twist Alert - Handling Complex Boundary Conditions

The Old Ways Are Not Always Best


A new bug report came in. Once I had read through it, the problem was clear: our system did not fully support single-day batches of more than 999 transactions.

The problem boiled down to a single counter used to track transactions throughout the day. The field for the counter has a fixed width of three digits. The sequence counter field begins with 001 and increments by one for each transaction all the way up to 999. This value is used to help identify and correct communication errors. When a terminal attempts to send a request but encounters an error, we resubmit the transaction using the same sequence counter. If the other side receives two similar (duplicate) transaction records with matching sequence counters (more on this later), the earlier of the two submissions is reversed and is not funded.

The problem with the sequence counter becomes obvious when you think about what happens when the terminal needs to go beyond 999 transactions in a day. The sequence counter will spill over the three digits available, reusing a value from earlier in the same batch or the forbidden value 000. Our software was not so silly as to ignore this problem. The solution, as it was implemented, was to increment the batch number by one and reset the sequence counter for the newly-created batch to 001.

Unfortunately for us, one of our clients began frequently exceeding the magic 999 transactions per day and was experiencing problems with this approach. Processing multiple batches on the same day led to reconciliation and accounting issues, while causing delays in the deposits of funds to the client's bank account. Obviously, these were problems we wanted to deal with swiftly, once and for all.

Edit for clarity: Some have asked why I didn't simply increase the width of the sequence counter field to more than three digits. This width was defined in a third-party specification and was not under my control. My software had to deal with this somehow.

A New Approach


After consulting the documentation and our technical contacts and running through a couple of false starts, we formulated a new approach to the three-digit sequence counter problem. Instead of creating a new batch to deal with the overflow, we would simply roll the sequence counter from 999 back to 001 and continue processing everything normally.
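The rollover itself is tiny. Here is a minimal sketch of it in Python; the function names are mine for illustration and are not taken from our production code.

```python
MAX_SEQUENCE = 999  # largest value that fits the three-digit field; 000 is forbidden


def next_sequence(current: int) -> int:
    """Return the counter that follows `current`, rolling 999 back to 001."""
    if not 1 <= current <= MAX_SEQUENCE:
        raise ValueError(f"sequence counter out of range: {current}")
    # The modulo maps 999 to 0, and adding one lands on 1, which skips 000.
    return current % MAX_SEQUENCE + 1


def format_sequence(value: int) -> str:
    """Render the counter as the fixed-width, zero-padded field."""
    return f"{value:03d}"
```

With this in place, the transaction after 999 simply goes out as 001 in the same batch.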

Our biggest concern with this new approach was related to the special duplicate transaction checking mentioned previously. The duplicate checking logic considers the following criteria when determining whether a subsequent transaction request matches an earlier one:

  • Sequence Counter
  • Card Account Number
  • Total Dollar Amount

If all three of these values match, the earlier of the two transaction requests is silently reversed, leaving the transaction totals out of balance and the client short of money. For some silly reason, people get very upset when their money goes missing unexpectedly.
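To make the worry concrete, here is a small Python model of the duplicate check as I understood it from the documentation at this point. It is only a sketch: the class, field, and function names are hypothetical, and it says nothing about how the other side actually implements the check.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    sequence: int      # three-digit sequence counter, 1 to 999
    card_number: str   # card account number
    amount: Decimal    # total dollar amount

    def duplicate_key(self):
        """The three values compared by the duplicate check."""
        return (self.sequence, self.card_number, self.amount)


def find_reversed(transactions):
    """Return the earlier request of every matching pair, in arrival order."""
    last_seen = {}
    reversed_requests = []
    for txn in transactions:
        key = txn.duplicate_key()
        if key in last_seen:
            # A later request matched on all three values, so the earlier
            # request is the one that gets reversed.
            reversed_requests.append(last_seen[key])
        last_seen[key] = txn
    return reversed_requests
```

Under this model, two purchases on the same card for the same amount are harmless unless the counter has rolled over and landed on the same value again.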

Mr. Cinnamon Twist


To describe a plausible scenario where I thought this might actually happen, I decided to write a brief story about a man I dubbed Mr. Cinnamon Twist.

(Image caption: Cinnamon Bun, Cinnamon Twist ...whatever)

Mr. Cinnamon Twist is a businessman with a sweet tooth. Knowing he has a long day of meetings ahead of him, he stops in at the busy corner coffee shop looking for a morning treat. He spies a delicious, gooey cinnamon twist (with double frosting). His growling, empty stomach simply cannot resist. Leaving the shop with cinnamon twist in hand, he takes two bites and wraps up the rest as he hurries to catch a train heading downtown. Through the early morning, Mr. Twist savors his treat as he goes about his work. He prepares his materials for the big afternoon presentation for a prospective new client. After a long and stressful day of work, Mr. Cinnamon Twist boards the train heading towards home. Worn out and feeling exhausted, his mind wanders back to his early morning treat. He decides that he will treat himself to another (just this once) before dragging his tired body home.

In this scenario, it's plausible that our sweet-toothed protagonist used the same credit card to pay the same amount twice: once early in the day with a very low sequence number, and once late in the day after the counter had rolled over. If all the stars aligned and these transactions happened to reuse the exact same sequence counter, the first charge would be reversed, meaning that Mr. Cinnamon Twist magically received his first treat of the day without being charged for it. Great news for Mr. Twist, but not so good for the coffee shop, which would be out the cost of a scrumptious cinnamon-flavored treat.

Back Of The Envelope


The first problem with the above scenario is simply noticing it. Detecting this type of situation on the fly is hard enough. With the requirement to be fast, high volume, and redundant between multiple data centers, this becomes complicated very quickly. The second problem is how to correct the situation once an error has been identified. I could think of a few tricks that I might consider, but I saw no obvious trivial approach for this problem.

While trying to avoid tackling this complex condition, I paused to look at some data. How likely is the above scenario? I looked at some rough numbers to try to get an idea. I looked at the number of clients exceeding the magic 999 transaction limit. I looked at the number of transactions using the same card at the same merchant on the same day. Using the classic back-of-the-envelope approach, I calculated that we would likely only see this situation a handful of times in a year.

It seemed to me that we were looking at a lot of complicated and error-prone work to save the cost of a tray of delicious cinnamon treats each year.

The Compromise


As it turns out, there is a manual process available to correct these types of transactions. By picking up the phone and talking to a real live human being, we are able to manually single out a transaction request and force it through.

Knowing that this manual correction process was available and fearing the work required to fully automate every possibility, I proposed a compromise. We would create a scheduled script to run daily and search the database for requests matching the duplicate transaction scenario above. If any duplicates were found, we would fire off an email alert message (subject line: "Cinnamon Twist") identifying the key transaction details and describing the process for manual corrections. The worst case, I thought, was that the alert would fire too often and I would have to implement the complex solution later anyway. The best case, on the other hand, was that the alert would basically never fire, saving a great deal of time and effort.
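For the curious, the daily check can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the `transactions` table, its column names, amounts stored as integer cents, and the use of SQLite. The real script ran against our own schema and sent the message through our mail system, which this sketch leaves out.

```python
import sqlite3
from email.message import EmailMessage

# Hypothetical schema: transactions(business_date, sequence, card_number, amount)
DUPLICATE_QUERY = """
SELECT sequence, card_number, amount, COUNT(*) AS hits
FROM transactions
WHERE business_date = ?
GROUP BY sequence, card_number, amount
HAVING COUNT(*) > 1
"""


def find_duplicates(conn: sqlite3.Connection, business_date: str):
    """Return one row per group of requests sharing all three values."""
    return conn.execute(DUPLICATE_QUERY, (business_date,)).fetchall()


def build_alert(rows, business_date: str, recipient: str):
    """Build the alert email, or return None when there is nothing to report."""
    if not rows:
        return None
    msg = EmailMessage()
    msg["Subject"] = "Cinnamon Twist"
    msg["To"] = recipient
    lines = [f"Possible duplicate transactions on {business_date}:"]
    for sequence, card_number, amount, hits in rows:
        lines.append(
            f"  sequence {sequence:03d}, card ending {card_number[-4:]}, "
            f"amount {amount} cents, seen {hits} times"
        )
    lines.append("Follow the manual correction procedure for each entry.")
    msg.set_content("\n".join(lines))
    return msg
```

A scheduler such as cron would run this once a day. Returning None on a quiet day keeps the inbox empty, which was the whole hope.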

Sounding The Alarm


The first week after installing my script, there were still no email alerts. I was beginning to feel optimistic that we may never see the alert fire in practice. They say that trouble shows up when you least expect it. The day after sharing my optimism with my coworkers, we received our first Cinnamon Twist alert.

No problem, I thought. We followed the manual procedure only to discover that both transaction requests were good and no corrective action was required. This contradicted the documentation and our general understanding of how the system should work, but who am I to look a gift horse in the mouth?

Another week or two went by before the next alert fired. It seems that my back-of-the-envelope calculations were a bit off. We were receiving more alerts than I had expected. This alert, too, turned out to be a false positive when we followed up manually.

We asked our technical contacts for clarification. After our messages got passed around a few times, our contacts eventually got back to us saying that this behavior was by design. It seems that we had worried ourselves over a problem that didn't actually exist.

We disabled the Cinnamon Twist Alert script a short time later. My "lazy" approach had saved me from implementing a lot of complicated logic for no reason.

An Ounce Of Cure


What is my point? What can we learn from these events? Perhaps it's time to spin the wheel of morality to tell us the lesson we should learn.

Maybe I was lucky. My calculations turned out to be somewhat (but not excessively) optimistic. There was a risk that we would need to handle these manual corrections frequently, leaving me scrambling to implement a complex change to relieve pressure from the rest of my team as quickly as possible. My approach was a calculated gamble, but it paid dividends even larger than I had anticipated.

To me, this is a turnabout on the old adage saying, "an ounce of prevention is worth a pound of cure." In this case, an ounce of cure (the alert and manual correction) was quicker, safer, and simpler to implement than a pound of prevention (a fully automated solution). In rare cases, the easiest way to deal with complex boundary conditions is not to. Instead, find a way to look for the errors and tidy up after they happen. Don't forget to calculate the risk and the cost, but you may just discover that you were about to make much ado about nothing.

Cheers,

Joshua Ganes

Wednesday, September 17, 2014

Institutional Knowledge Is The Default

This article is a follow up to my previous post on the topic of institutional knowledge.

Do As I Say, Not As I Do


Please don't interpret my recent post as a claim of personal innocence when it comes to accumulating institutional knowledge. I have completed many projects in my time that are completely devoid of, or seriously lacking in, adequate documentation.

I realized long ago that the only way to avoid becoming emotionally paralyzed by constant feelings of inadequacy is to acknowledge my own shortcomings and work hard to improve myself day by day. By staying disciplined and focusing on continuous improvement, my recent projects have been more thoroughly documented than those from only a few years ago.

The Pit Of Despair


Eric Lippert writes about the pit of despair as a place where the traps are easy to fall into and difficult to climb out of. Unfortunately, institutional knowledge fits this description to a tee.

We constantly pick up valuable little nuggets of information as we go about our duties. Sometimes these are technical details about the systems we're working with. Other times, it may simply be the knowledge of who is already an expert in a given area. Tapping into the institutional knowledge of others can be more valuable than struggling to discover everything for yourself.

There is nothing wrong with this knowledge in and of itself. This knowledge can be used to unlock further discoveries and make key decisions that allow us to avoid disasters and achieve success. The problem is that the knowledge is trapped inside a lone individual's head. Without further action, we end up continuously accumulating more and more institutional knowledge. Institutional knowledge is the default and we must act deliberately if we intend to avoid it.

Why We Despair


Knowledge is tremendously valuable. As G.I. Joe has taught us, "knowing is half the battle." This is why distributing institutional knowledge is so important to any group of people working towards a common goal. When knowledge is trapped within a single mind, its potential is limited to that one individual. Time is wasted, uninformed decisions are made, and existing work is duplicated unnecessarily. From a business perspective, institutional knowledge is clearly bad for the bottom line.

I am about to draw a moral line in the sand. Neglecting to share institutional knowledge is regrettable, but intentionally hoarding knowledge to the detriment of the team in order to further one's own selfish ends is reprehensible. This is comparable to the salesman who viciously defends his "territory" from his coworkers to protect his own commissions. Not only does it reduce the collective effectiveness of the team, but it fosters an air of hostility and inhibits sharing important details needed to succeed.

Scaling The Walls


How then, do we climb out of the pit of despair and tiptoe around the pitfalls waiting to drag us back down? I'm no expert on this topic, but I'll share some of the things I do in my attempt to scale the walls and share my knowledge with my coworkers.

One of the best tools available at my workplace for sharing knowledge is our internal company wiki. Any pages I create on the wiki are immediately available to be searched, read, and modified by our entire company. These days, whenever I start a new project I will immediately create a new wiki page describing the basic purpose of the project and how it will work. As I continue to develop the project, I frequently edit the page with the most up-to-date understanding of the available details. As for my writing, I try to follow many of Joel Spolsky's excellent tips for writing functional specifications.

Another great way to ensure you're not accumulating institutional knowledge is to pay attention to the questions people ask. Sometimes people ask lazy questions. When they ask about something you've already covered, simply point them to the relevant documentation. If, on the other hand, they've done their homework and still require missing details or clarification, consider this a flaw in your documentation. Recognize the flaw, modify the documentation, and think about how to improve for the next time around.

By the same token, any time you find yourself asking for assistance, it's a likely sign that someone else has a collection of hidden institutional knowledge. Ask them if there's documentation, and suggest (or insist) that they write some. If nothing else, write down whatever lessons you've learned from your interactions.

Do You Validate?


Words of caution: just because you wrote some documentation, it doesn't mean that it's adequate.

When it comes to documentation, if I can't find it, it doesn't exist. You may have written a 500-page treatise covering every last detail of the use and maintenance of your paper clip system, including a full bibliography, glossary, and footnotes on every page. It provides me exactly zero benefit if I can't find the document after making an honest effort to search for it in all the expected places.

Just because instructions are clear to you, that doesn't necessarily mean that they will be clear to everyone. Each person is familiar with his own style. Things that appear straightforward to you may be ambiguous or unclear to others. Instructions that seem obvious to an experienced user may involve hidden steps unknown to a novice.

A great way to check for these flaws is to ask someone to validate your documentation for you. I find this particularly effective in the case of a documented procedure. In the spirit of hallway usability testing, ask a coworker to start from scratch and try to achieve your documented goal. Watch from a distance and note every time that they get stuck or confused. Later, add additional notes for clarification. Once another person can follow your documentation with minimal fuss, then you can be confident that someone else can perform the task when you're gone.

Still No Expert


As noted previously, I am not to be considered an expert in these matters. Listed above are some tips that I've found useful in sharing my institutional knowledge with my coworkers. What are the best tips and tricks you have for avoiding the pit of despair and sharing your own institutional knowledge? Tell me in the comments.

Cheers,

Joshua Ganes

Sunday, September 14, 2014

What People Say When You're Gone

Parental Leave


My wife and I are pleased to announce the birth of our second daughter, Isla. She was born at the very end of May and has been providing us with baby snuggles and depriving us of sleep ever since.

I was fortunate enough to be in a position to take a decent length of paternity leave to help my wife with our two young girls and to enjoy some time together as a family. We made good use of our time by showing off Isla to our friends and family scattered across western Canada.

If you are in a position where you can manage and afford to take parental leave, I would urge you to take hold of the opportunity. Getting away from my work routine for a while was a great way to recharge and reflect on my current situation and goals. The precious first months and years of a child's life pass just as fast as the cliches say. Slowing down to experience and savor this special time with my children while I'm able is a privilege that I wouldn't want to pass up.

Returning To Work


When I came back to work, my coworkers greeted me in a variety of ways. There were those who (hopefully) jokingly told me, "I thought you were fired." There were a lot of pleasant and generic "welcome back" or "how's the family?" responses. There were also a few who genuinely expressed that they missed me and how glad they were that I was back.

Of course, on an emotional level, we all want to be missed. It's a wonderful feeling to know that you're missed and appreciated while you're gone. From a personal standpoint, being missed is great. That got me to thinking about whether I want to be missed on a professional level as well.

Professionally Speaking


Imagine if not one colleague missed you during an extended absence. That would mean they don't need or desire your assistance to do their work, or worse, that you may actually stand in their way. Imagine if your boss didn't miss you either. It would mean that your job is irrelevant or that you are so unproductive that your absence is barely noticed. Either way, it sounds like your job security is in peril. Obviously, you want to be missed at least a bit.

To be successful professionally, you need to become indispensable to your team. I believe that there are two varieties of indispensability -- one good and one bad. Let me illustrate using a couple of examples and see if you agree.

Mr. Smith is indispensable to his team. When the OIU system goes down (as it often does), he is the one who knows just how to diagnose the problem and get things back up and running. Last month when he was on vacation, it took his coworker three days to fix the problem. Mr. Smith can usually sort out issues in a matter of hours. The operations team loves Mr. Smith, because he's always so quick to dive in and troubleshoot their problems as soon as they call.

Mr. Brown is indispensable to his team. He is always ready to lend his expertise to help a colleague solve a technical issue or discuss a design question. His software is always high quality, well documented, and easy to maintain. The junior developers love him because of his valuable mentoring. They prefer to maintain and enhance Mr. Brown's projects because the code is clear, well designed, and easy to work with.

You can probably see where I'm going with this. Mr. Smith and Mr. Brown are both considered indispensable for wildly different reasons. Mr. Smith uses something called "institutional knowledge". Over the years, he has become an expert in the internal systems of his company (institution). This knowledge, while valuable, can sometimes even be hoarded. With all of this valuable information held only in his own head, Mr. Smith essentially holds the information for ransom. He ensures his own job security while maintaining a charade of expertise and talent.

Mr. Brown, on the other hand, tries to offload institutional knowledge. Instead of hoarding it, he documents the details someplace where anyone can easily find it. Sure, it may take an intern new to the project some time to get up to speed, but that's only natural. Instead of banging his head against the wall or interrupting Mr. Brown with unending questions, our intern can simply read and reference the documentation as he stumbles his way through the project. This leaves Mr. Brown free to concentrate on his own work, while empowering others to do great things.

What I Hope They're Saying


I hope I was missed during my absence. I also hope that people weren't asking when I'll be back because I'm the only one who knows about a specific system. Instead, I hope that they were simply discussing their projects, confident in their understanding based on my documentation. I hope that my boss was missing me because I'm the best man for the task, not because everyone else is struggling to keep my work on track.

What do you want people to say when you're gone? Tell me in the comments.

Cheers,

Joshua Ganes