Tuesday, December 01, 2015

Security Month - Episode 3: Social Engineering

End Of The Month


This past month has sure flown past for me. I intended to create one more security month entry, but life got in the way and it will have to wait for another day. Today's post is based on a presentation I gave at work a few years ago on a topic that is probably the most important of all security: the human component. It doesn't matter how good your locks and alarm systems are if you neglect to turn on the alert and lock the front door as you leave your house. Similarly, it doesn't matter if your system is designed properly if your users do not know how to protect the security of the system and their own credentials.

The Conundrum


When designing a secure system, you must walk a fine line between burdening your users with a barrage of security measures and keeping attackers at bay. Legitimate users should be able to access the system quickly, easily, and with minimal interruptions while at the same time keeping it "impossible|" to access the software without permission.

Typical applications rely on single-factor authentication (username / password). When authentication relies only on a secret key (the password), when a password becomes compromised, the system becomes compromised. The more users a system has, the more opportunities exist that at least one user will become compromised.

Social Engineering


It's often simpler to manipulate people than machines. While machines may be rigid in terms of interactions and enforcement of rules, humans tend to be more flexible -- especially when subjected to emotional influences. Social engineering is the process of "hacking" people to gain access to secure systems. This may involve tricking or convincing them to give up sensitive information such as passwords, or manipulating users into performing actions on the attacker's behalf.

Mandatory xkcd Reference

https://xkcd.com/538/


Pretexting


In order to manipulate users into making rash and emotional decisions, attackers may pose as authority figures and create plausible urgent scenarios where sensitive information needs to be divulged. This may include authentication credentials, or other details to be used in subsequent attacks. It's important to inform users to never give out their credentials to an unknown party regardless of urgency.

Phishing


In a phishing attack, a hacker will send a link to an illegitimate web site impersonating a legitimate page. A classic example is an email claiming to be from PayPal, and telling you that your account may have been compromised. It asks you to visit a link and enter your username and password. Ironically for those who are duped, the very attempt to save their credentials is what ultimately divulged them.

It is very simple to copy a legitimate site using copy/paste tools and modify it for devious purposes. Many people are not savvy enough to know to look for a legitimate secure (https) address. Teaching users not to "click the dancing bunnies" and how to verify a legitimate web page is a battle we are constantly waging.


Baiting


A rarer and more targeted attack vector is baiting. In this attack, a hacker will leave a physical device such as a USB thumb drive somewhere to be found. They may leave it in a company parking lot with an enticing label in the hopes that curiosity gets the better of their victim. Upon inserting the device into a secure system, the payload is delivered to either steal information or run a program or script to do something on the attacker's behalf. Users must learn never to connect hardware of suspicious origin to a secure system.


Quid Pro Quo



A hacker may attempt to butter up users by giving them something, either physical or performing a service, in return for sensitive information. While this may come in the form of a bribe, it may be as simple as establishing a friendly relationship over several days and requesting a simple favor.

Perhaps the most devious attempt I've heard described is to call random employees at a large company posing as "technical support". Many non-technical people will accept support on their latest computer problem and can be tricked into entering commands on the attacker's behalf.

All Good Things...


If there's one lesson to be learned from this post, it's to always be on your guard. "Trust no one" is bad advice to live by, but essential advice when it comes to sensitive credentials. Computers do not have human judgement and will gladly accept validated credentials from the shadiest of characters. As is demonstrated so well by social engineering, humans and machines do not behave the same way and your interactions with them ought not follow the same rules.

That brings us to the end of Amish Programmer security month. I hope you learned something new, or at least refreshed yourself.

Here's wishing you a merry Christmas and a happy New Year. I hope you return next year for some non-security topics here at Amish Programmer.

Cheers,

Joshua Ganes

Monday, November 02, 2015

Security Month - Episode 2: Storing Passwords

Trust No One


Two weeks after being promoted from Junior to Intermediate Programmer at Paranoid Software Ltd. you've been tasked with implementing a secure sign-in system for the company's new up-and-coming website. Due to "government conspiracies" and myriad other reasons, your tinfoil-hat-wearing boss has mandated that you use no third-party libraries for passwords and authentication. Only his most trusted and loyal employees are qualified to create something so important and sensitive. He'll be reviewing your work at a later time in his secure bunker...

Not being allowed to research industry best practices on this subject via the corporate internet connection (again, "conspiracies"), you sit down to draw up designs for a new database table as follows:

UserNamePassword
WayneGretzkyGr8One99
SidneyCrosbySidTheKid87

Each user will sign using a unique user name and corresponding password. When the user submits the sign in form, the submitted credentials will be compared against the database values found in the table and access will be granted or denied accordingly. This design allows for passwords to be verified while remaining flexible enough to allow for additional new columns for features such as temporary passwords, expiration, and more.

Plain Text Woes


You set to work building a new sign in page based on this design. Everything is coming along fine until forgetful Tom drops by your desk after your coffee break. "Looking for something, Tom?", you ask casually. "Maybe...", says Tom, "... I can't remember what I'm looking for, but that's not the only reason I'm here..." Tom scratches his head for a moment before his face changes in a moment of sudden recollection. He begins to explain how he can only remember one password for everything, and he's afraid of what certain other employees might do if they copied his password out of the new database table. They could potentially use it to access his account on another system.

Tom's a nice enough guy with some serious clout and connections around the office. You don't want to upset him, so you revise your design as follows:

UserNameEncryptedPassword
WayneGretzkyTe8Bar99
SidneyCrosbyFvqGurXvq87

In this revised design, the passwords are stored encrypted using the "ultra secure" ROT-13 algorithm. When a user's sign in credentials are submitted, the database value is decrypted and compared against the submitted password to authenticate the user and grant or deny access to the system.

Clever Estelle


As your project nears completion it needs to pass a review from Estelle, the company's oldest programmer. Estelle got her start in the industry working with VAX machines in the '70s. Nobody knows the details of her top-secret projects, but her office is strewn with archaic servers and mainframes cobbled together with assorted wires and duct tape. She left a barely-legible handwritten note on your desk asking you to stop by for a review of your project.

A light blue haze of smoke oozes out from under the door as you approach and knock briskly. You hear a loud cough before Estelle's unusually deep and raspy voice beckons you to enter. As you enter, you notice the yellow-brown stained walls of her private office. Distracted, you trip over an old server, yanking loose a bundle of wires. "Don't worry", Estelle reassures you, "that machine deserves it!"

Estelle likes your design, but due to her advanced technical savvy, she's figured out how to decrypt the passwords stored in your database and suggests that you use one-way hashes instead. She hands you a ream of fan-fold paper retrieved from an old dot-matrix printer. "Here's some materials I've prepared for you," see grunts as she urges you towards the exit.

After reading her notes about a wide variety of one-way hash algorithms, you once again draft up a design for your database table:

UserNameHashedPassword
WayneGretzky$2a$12$/Ox24ETwuRKwqCASxD40x.O
7Y1HOuaLzVp7nwf6FwUQjnRGnJok0m
SidneyCrosby$2a$12$o7chUYiQh0htw463/m5df.o
lmVQzkIjINY9CyyQmtC7xHV53OxtEC

This time there is a subtle change to the password verification logic. Instead of decrypting the stored password, the submitted password is passed through a one-way hash (in this case bcrypt with 12 iterations) and the hashed values are compared to see if they match.

A good hash algorithm is easy to calculate, but nearly impossible to reverse. They also distribute hashes effectively to avoid collisions (two similar strings hashing to the same value), which makes them statistically secure for use in verifying that you have a matching password.

Tom's Back!


Having passed Estelle's second review, your project is mere days away from being installed. You're enjoying a well-earned break in the lunch room when Tom pokes his head in looking confused. "I'm looking for someone," says Tom, "but I can't quite remember who... Oh yeah, it's you!" You notice that Tom is holding the same dot-matrix print-out about hashes in his hand. He explains that he's found a way to break your password system. He's created a script to generate every string up to an arbitrary length, calculate its corresponding bcrypt hash, and store those in a new indexed database. With this upfront work complete, he can use the newly indexed table to perform a reverse password look-up from the hash of any password short enough to be covered by the script. He calls this generated table a "rainbow table". This could potentially be used to reveal thousands of user's passwords.

After racking your brain late into the evening, you modify your design one last time:

UserNameRandomSaltHashedSaltedPassword
WayneGretzkyPQw#{hC*bDvyHnhVeOGaVOnG
dcsiAYETeNBbannZbdtu?kLN
dDXjbfDe?slovDjJfw$m2uGf
ZRneg-7bDJeeVwSwDGEsonI4
Cini1p*zuJ&HwJRgC 3grzzU
c!EKxrSt
$2a$12$tasiR59CA7uefis0.
4r4CuTjo2gD7MkP3Mzzn77yr
/QWuRbV8kyf.
SidneyCrosby;ZjmYbJeNMstnqu\q|gbLpfs
zb@6PVyOoglbBuLUb-Hulgau
nwCR.qvCUikOEkEQroqi8lRY
J4gVA{VwlWwh0nyQnZG5JYCm
k.cStNNyy|hzgiIqPjSdTlkh
zxaZIi_C
$2a$12$o3aiDUMmSDqX9/fpA
SJ/VOB483GsYvtfESeZbrON.
Gz080gVtlBqW

In this final incarnation, each password gets assigned its own unique random "salt" value. This value is first concatenated with the password. This "salted" password value is then run through the hashing algorithm to populate the database. When the a user submits a password at sign in, this process is repeated with the submitted password. It is first concatenated ("salted") with the stored random salt, then hashed. The final result is compared the hash stored in the database to grant or deny access.

Your tinfoil-hat-wearing boss emerges from his bunker to shake your hand for a job well done and asks you if you wouldn't mind helping him recover his password...


Transition


The best way to secure your users' passwords is to protect them in such a way that even a programmer with full knowledge and access to the system cannot get at them. A corollary of this is that if you ever submit a lost password request to a website that returns your password in plain text, you know they're doing something wrong.

Several years ago I came across a collection of sites with multiple sign in systems that were at various stages of progression from the story above. Some were storing passwords in plain text, some were storing them using different forms of simple encryption, and others were hashing the passwords (using varying strengths of hashing algorithms) without salting them first. It was my job to bring these all up to snuff without breaking live systems with active users.

The plain-text passwords were the simplest. I already had the passwords -- I salted and hashed them and moved along.

The encrypted passwords were almost as simple. I decrypted the passwords to their plain text form, then treated them as in the plain-text case above.

The hashed passwords were the most challenging. I created an extra database column to list the hash algorithm and imported the hash values into the database with no salt value. When one of these users signed in and was authenticated, I could then use their submitted password value to populate a strong salted hash value on their very first visit under my new system.

There were also some accounts that were not so active. We conveniently had a 90-day password expiration policy. After 90 days, I ran a script to reset all of the remaining passwords to new temporary values and purge our system of all insecurely stored user credentials. Any users who had not signed in within 90 days would have to use our password recovery system to regain access to their accounts.

Take It Away


If there's one thing you should take away from this episode it's that you should do your best to store passwords such that they cannot even be recovered by someone with inside knowledge.

If you know of any code that stores passwords without securely salting and hashing them, please do your best to correct the situation. If that code belongs to you, please go fix it. If not, please write a friendly but firm reminder to get those passwords under control and securely stored.

Cheers,

Joshua Ganes

P.S. In case you missed the joke, ROT-13 just rotates the letters through the alphabet by 13 characters. Please don't use this for any form of security.

Tuesday, October 27, 2015

Security Month - Episode 1: Injection

Starting With A Bang


Injection attacks, in all their various flavors, are now the number one most exploited security flaw in software. Their impact can range from something as innocuous as displaying extra text on a web page to the full compromise of your host machine.

Setting A Bad Example


Do you see anything wrong with the following snippet of PHP code?
(Note: this is a simplified example for demonstration purposes only)

// Initializes necessary variables, functions, etc.
require_once( "important.libs.php" );


$accessLevel = GetAccessLevel();

<snip>

// Defaults to empty string if not present
$name = Posted( "name" );

// Find contacts with matching first name in database
$sql = "SELECT * FROM Contacts WHERE ".

       "FirstName = '$name' AND ".
       "AccessLevel <= $accessLevel";
$result = $db->query( $sql );
// We should really do some error checking here

// Print details for each match found
while( $row = $result->fetch_assoc() ) {

    $firstName = $row[ "FirstName" ];
    $lastName = $row[ "LastName" ];
    $phone = $row[ "Phone" ];

    print "Name: $firstName $lastName, Phone: $phone<br />";

}

<snip> 


As you might have guessed from the title of this article, this code is vulnerable to injection. In fact, it is subject to two common forms of injection: SQL injection, and cross-site scripting (XSS).

What Is Injection?


Injection happens every time we receive input from an untrusted source and submit it to an interpreter without first sanitizing the input. In the case of SQL injection, that interpreter is the database interpreting the text of our SQL query. In the case of cross-site scripting, the interpreter is the browser attempting to display our HTML page. Whatever the scenario, allowing potentially dangerous user input to be interpreted as a trusted command or request can have disastrous .consequences.

What Could Go Wrong?


In the scenario above, imagine that a hacker wants to get a list of all contacts in the database. They are currently restricted only to those contacts whose AccessLevel value is lower than a configured value. How could they use SQL Injection to gain access to everything?

What if, in the form that requests this page, they entered "' OR 1 --" as the name to search? What would our query look like now? (Assuming AccessLevel = 2)

SELECT * FROM Contacts WHERE FirstName = '' OR 1 --' AND AccessLevel <= 2

With a new terminating quote and comment, suddenly our query is changed to be much more permissive. The attacker has tricked our script into revealing more information than intended. What if instead of a SELECT, we were working with a DELETE query? The entire table full of valuable data might be lost.

The second vulnerability is to a cross-site scripting attack. What if, instead of entering their name, an attacker entered some raw HTML such as "<script>alert('XSS!')</script>"? Suddenly, they've inserted their own elements and even javascript code into your web page. If they can get this to happen on a page displayed to another user, the attacker can hijack your page to show you them anything they want and it will look like it's coming directly from your page.

It Could Be Worse


Imagine a different scenario where input is passed to an exec() call and run on the system. Now instead of limiting damage to the database system, a clever attacker can run commands directly on the server. Depending on the privilege of the process and security of the underlying system, the attacker could potentailly gain root access to the machine. They can use this to access sensitive data, change important settings, or even run any program of their choosing. The entire system can be compromised and placed at the mercy of a malicious hacker.

Don't Forget Secondary Sources


Here's one that can really trip you up. Imagine an application that performs a task based on values stored in the database. It's easy to forget that despite the fact that the database belongs to you, the source of your database values may not always be clean. "We've traced the call, and it's coming from inside the house." Always assume any source of data is dangerous unless you can verify its source and security.

How Can We Fix It?


The good news is that there is a simple way to protect yourself from injection attacks: sanitize user input using readily-available libraries.

If your language of choice is PHP, there's plenty of built in functions to safely encode all varieties of user input:
Please remember that this is not a comprehensive list. There are so many different languages and environments out there. Whenever you're wielding user input, be careful to consider whether your code may be subject to injection attacks and research appropriate measures to counter such attacks.

Where Do We Go From Here?


If you feel this post is yesterday's news, I hope you take it as a friendly reminder and refresher on the topic. If any of this is new to you, I suggest you take the next opportunity to study up on the subject. Read some other blog posts. Follow some examples online. Revisit a piece of recent code and try to break it (in a test environment please!). Once you've found some vulnerable code, learn the proper way to secure it and and run your tests again to prove that it's been fixed.

If there's one thing to take away from this it's this: always treat user input as suspect. Never let raw user input pass from your application to another system without sanitizing it first.

Now go and write some secure code.

Cheers,

Joshua Ganes

Thursday, October 22, 2015

Announcing Amish Programmer Security Month

From Humble Beginnings


I started programming from a young age. I distinctly recall being in grade 5 and being curious about how some of the games in our school's computer lab worked. Having absolutely zero clue, I checked out a book from the school library on programming for the Apple IIe. My first programming experience was typing out Apple BASIC programs straight from the book and watching them fail as my two-fingered hunt-and-peck typing resulted in multiple typos.

My exposure to computers was reasonably limited until my sister and I pooled our money to buy a 100MHz 486 (with TURBO on for maximum fastness). I'm pretty sure I got the better end of that deal as I began spending a lot of time both playing video games and experimenting with QBASIC and its elaborate but confusing help files.

In high school, I took a few courses in Turbo Pascal and C++ programming. This mostly involved running through a bunch of easy exercises in each chapter and then spending the rest of our class time playing in the lab or working on our frighteningly terrible video game group project. That Pascal text book may be the only high school textbook that I've ever read from cover to cover.

After puttinsg four years of university and another year's paid internship under my belt, I was sure that I knew almost everything I would ever need to know as a professional software developer. I had no idea just how much more I still had to learn...

What Do You Mean This Is Insecure?


With so many courses behind me, I was sure that I had covered most of the important topics. If security was so important and difficult, surely someone would have covered it in one of my course by now. I was surprised when one of my colleagues pointed out security flaws in my code. I did not even recognize it as a flaw when it was pointed out to me.

Over time, with the help of more experienced mentors, bloggers, and some job-sponsored training, I began to learn a lot more about secure coding techniques. For many of my experienced readers, these have likely become second nature. Still, everyone can stand a refresher on the basics now and again. For those who, like I once was, may still be in the dark about secure coding, you have a great deal to learn.

Amish Programmer Security Month


With that in mind, I declare the next month to be security month here at Amish Programmer. I will strive to get back on board the posting train and publish a brief post covering such security topics as injection, buffer overflows, and more.

These topics have all been covered in depth by others. I will try to present a fresh take on these topics using code snippets and straightforward examples. You can tune in in the coming weeks for a short discussion of several fundamental topics in secure coding. I hope to see you back here soon.

Cheers,

Joshua Ganes

Wednesday, October 14, 2015

Challenge Your Assumptions

Today's brief post was inspired by my recent road trip with my wife from here in Vancouver, BC through the states of Washington and Oregon.

When it comes to road trips, I'm all about efficiency. Pit stops run like clockwork. The guys from NASCAR have nothing on me. That's why I prefer self service with pay-at-the-pump when stopping for gas. My fingers fly with precision as I fetch my card from my wallet, insert it in the machine and punch my way through the menu to start up the pump and select my grade of gasoline to begin fueling.

On my first stop of the trip, everything was running smoothly right up until I was stopped in my tracks by the following prompt:

Enter Zip Code:

 

Cancel         OK

Unfortunately for me, as a Canadian, I don't have a numeric 5-digit zip code. Instead, I have a six character postal code with a mix of numbers and letters. Try though I might, I could not find a way to enter my postal code or skip this prompt.

Edit: I've since been told that you can simply enter the numeric portion of your postal code followed by zeros

After a few moments of poking around at the machine, I eventually resorted to pressing cancel and walked into the station to speak with the cashier. Within a few minutes I was back on track (a lap behind the race leaders).

My experience with the zip code prompt was hardly life altering. Still, it illustrates some of the problems that can come up when your software makes incorrect assumptions (e.g. all gas station customers have a zip code). Challenge your assumptions and do your best to consider all possible users. Question whether all the information you prompt for will be available or match your expected format. If not, your software should gently guide the user on how to proceed.

Cheers,

Joshua Ganes 

P.S. Did you see the seventh inning of today's Blue Jays vs. Rangers game? It was well worth a second look! Go Jays!

Wednesday, May 27, 2015

Any News? - Working With Third Parties

I Am A Rock


We programmers tend to be an independent bunch. Like a lone wolf, we often prefer our own company to that of others and like to work alone. It's been said that, "no man is an island." I have to agree (unless you happen to be Paul Simon). Even programmers need to interact with their coworkers, managers, other departments, and even third-party contacts to meet the day-to-day demands of their jobs.

While networking and politics may not come naturally to all developers, we're hardly worthy of our titles if we cannot find a way to communicate clearly and accurately. Programmers must express requirements, prerequisites, technical issues, and design ideas in ways that others can understand so that all players can do their jobs and assist us in completing our own work.

Have you ever found yourself in this situation? You're beginning to wrap up a project, but you notice a seemingly simple, yet annoying bug in the library you are working with. Being diligent, you follow up with the library provider to report the issue. You ask if this is a known issue. You ask if you're using their library correctly. You ask if there are any workarounds or alternative approaches they can suggest. You go ahead and make a note to yourself to revisit the problem once the provider has had time to review your questions. Fast forward a few weeks. You've cleaned and polished every other corner of your software, but you really need to address this one remaining issue before you can build a final software release. In the meantime, the only feedback you've received is that they're, "looking into it."

On Dependencies


A wise piece of advice for any project is, "eliminate the dependencies." Just as in the example above, making yourself dependent on someone else's assistance can be a very compromising position. Not being able to deliver a solution because you're waiting for someone else to act can be uncomfortable, awkward, and frustrating. If you can eliminate these dependencies, you may still run into roadblocks of your own, of course, but you will retain the power to work through them and implement a solution dictated by your own schedule and budget.

It's not really fair to say, "eliminate the dependencies", and leave it at that. For many developers, creating a dependency is not really a choice. Maybe your boss, team, or client is demanding a specific technical requirement. Maybe you can't afford any alternatives. Maybe, as is often the case for me, your work involves creating an interface between your own software and a third-party tool. In this case, if you eliminate the third-party tool, you eliminate the need for your own software entirely.

Now, before I go accusing anyone of wrongdoing, let's try the shoe on the other foot. The development team in charge of that third-party library has more than just one bug fix on their plate. They need to balance support and bug fixes against their company's need for revenue-generating new features and routine maintenance. They may have a significant collection of issues that go through a triage system, and your bug report hasn't yet make the cut. I'm sure there's several times when others were waiting on me to complete my fix before they could proceed. Any fingers I point here need to be pointing back at myself also.

Planning For Success


The key to managing the risks of creating project dependencies is to identify those dependencies and understand them. Do your best to perform a reasonable analysis of the risks (if possible). Try to understand the expected flow of communications between both parties. Manage expectations by discussing project timelines and turnaround times for both technical inquiries and potential bug fixes. Understand the implications for your project when you cannot get the other side to take quick action. Are there alternatives? What are they? Will they be just as risky? If the risks are too high, sometimes the only winning move is not to play. Don't allow yourself and your organization to be sucked into wasting resources on an endeavor that will never succeed.

One of the best ways to reduce delays due to third parties is to be prudent and up-front to find and report any issues. Fixing bugs takes time. The sooner a bug is reported, the sooner it will be fixed. Therefore, schedule any development and testing work dependent on third-party support at the earliest opportunity. As soon as an issue is discovered, take time to understand the problem in reasonable detail and communicate it as soon as possible. This will give everyone the most time to respond and allows you to proceed with other features until the bug can be resolved.

Once you have requested support from an outside party, follow up and keep in touch. Remember, the contact at the other end is a person too. Treat them with respect and professionalism. Keep your communications positive and inquisitive. Try not to accuse or pester. There are differing opinions on how frequently you should send messages, so I won't offer specific advice. Instead, I suggest you give your best effort to judge the tone of their replies to see if the other side has become annoyed. Never hesitate to follow up if a written deadline or estimate is missed. Your goal, again, is not to accuse, but to make sure you are fully informed and kept up to date.

If worst comes to worst, don't be afraid to escalate. Give your best effort to work through the default channels provided. If, however, you simply cannot get the answers you need, the time may come where you need to ascend the ladder of power. Do your best to do so as respectfully as possible. Ask your contact if there is an option to expedite or escalate the communication process. I have seen cases where two lower-rung employees went back and forth for weeks. A single brief phone call between two big wigs was all that was required to sort things out. Nobody wants to be disrespected and have someone go over their head. Reserve a power move like this as a last resort, but don't be afraid to pull the trigger if all other options have been exhausted.

The very last resort sometimes required implementing a heroic workaround. For this to work, you need to understand the third-party system well enough to automatically recognize the failure and wield the power to fix it through one or more additional steps. Rather than having the third-party implement a fix, you can choose to live with the existing bug, masking its existing by coding your own workaround fix. On a personal note, while this option is often the best business decision, it often feel like an enabler. We just let the bug continue to fester and bend over backwards to get past it rather than correcting it at its source.

Good Days And Bad Days


In my professional experience, I have worked with a fair number of third parties. Some have been eager and helpful. They are quick to respond and provide clear details needed to solve my problems. Others have been slow to respond and very terse and unclear in their replies. After working through a few issues with a contact, you will quickly develop an opinion of their competence. This can be used in further risk assessment. My advice is to go with your gut. So far, I have never once changed my opinion from that initial impression following subsequent communications.

What's your best or worst story of working with a third party? What went right and what went wrong? What would you do differently? Let me know in the comments.

Cheers,

Joshua Ganes

Monday, March 30, 2015

The Data Structure Barbecue Grill

Charcoal, Fire Starter, Match


Recently, a colleague asked me a question about the implementation of a hash table. I found that I had to stop and think for a moment. My explanation was by no means the smoothest delivery. I tripped over my words a few times and needed to catch myself and backtrack before settling on the correct details. This incident has convinced that it's time for a refresher on one of the fundamental components of computing science: data structures.

Rather than dusting off my old text books and violating a copyright or two, I thought I'd take a new approach inspired by a conversation with yet another colleague of mine. With spring in full bloom here in Vancouver, it's time to pull out a stiff wire brush and begin cleaning the grill. Consider this your invitation to the first ever data structure barbecue.

On A Stick


While my usual preference is for a nice steak, there is one barbecue item that is nearly as much fun as it is delicious: the kebab. The scrumptious kebab is a collection of seasoned meats and vegetables skewered onto a single stick.

How is this tantalizing treat comparable to a data structure? When building a kebab it's impossible to add new pieces to the middle, but poking a new item onto either end is easy. Once cooked, it's far simpler to remove and eat from either end than it is to chew off the center. This matches the properties of a double-ended queue. You can add or remove at either end of the queue, but the items remain in a fixed order and cannot be accessed until found at either end of the queue.

A neat twist on our double-ended kebab is a kebab with a handle on one end. This kebab can only be accessed from a single end; the exact properties of a stack. We push items on one at a time, keeping the initial order until removed. The first item added to the kebab when constructing it is the very last to be removed and eaten. We must wait until all other pieces are gone before finally arrivinge back where we started.

Intestinal Fortitude


Our second course here at the data structure barbecue is a juicy and meaty collection of grilled sausage links. Filled with ground seasoned meats, seasonings, and aromatics, these stuffed links are bursting with flavor. The links of this chain are joined to their nearest neighbors by a thin brand, creating a long collection of heartburn-inducing, artery-clogging satisfaction.

Most of you have probably already figured this one out. This closely resembles the good old-fashioned linked list. Starting at one end, we can journey from link to link all the way to the other end. The order is fixed, but severing just one connection breaks the list into two. Losing a handle on the second half causes the whole chain to be dropped on the floor causing a mess and making it unusable (a memory leak).

Fresh From Chilliwack


I made a special trip down the Fraser Valley for our final course here at the data structure barbecue. Today's finale is grilled corn on the cob. I hope you held onto the toothpick-like skewer from the kebabs, because you're about to sink your teeth into some fresh, sweet corn. This final starchy treat contains a fixed collection of ears of corn. You can start wherever you'd like and easily access any area on demand, choosing to stay in one area or revisit places you've already covered.

The cob of corn closely resembles the classic array. It's very rigid in total size and arrangement of values, but easy to access any part at any time. The constraints of an array are limiting, but enable incredibly quick access to any item within the data structure.

Where's The Hash Table?


Over there next to the table with the macaroni salad, the snacks and all the fixings...

I guess not all things are fit to be grilled. I'll cover hash tables here because they are the essential building block for more advanced data structures such as associative arrays.

A hash table is a clever way to treat non-ordinal values (e.g. strings) as indices. The first thing to do is to define a simple to compute hash function taking the key values as input and computing a numeric hash value. A very terrible hash function could simply return a fixed value (this would result in a data structure equivalent to a linked list in most implementations). An ideal hash function given any random input would output any hash value with an equal chance. Similar input values would result in very different hashes.

Each computed hash value corresponds to a bucket containing all previously stored values. Each bucket may contain many results with the same hash value. Though there are a variety of implementations, from here the collection should be small and much more manageable. Traversing the collection, we look for a matching index and the associated value. In this way, hash tables allow us to quickly store and access a wide variety of data types using limited memory and direct object comparisons.

Come Again Soon


I hope you all enjoyed the first ever data structure barbecue grill. Please feel free to take home seconds and share with your friends. I hope the analogies weren't too labored. Please don't try to push them too far; I'm sure they'll break.

What are your favorite clever data structure analogies? Let me know in the comments.

Cheers,

Joshua Ganes

Thursday, January 22, 2015

National Programming League Draft 2015

Did You Say Hockey?


One of the great things about publishing this blog is that it lets me write on topics I'm passionate about. While I blog primarily about software development, today I feel like discussing another passion of mine: hockey -- specifically the National Hockey League and the Edmonton Oilers. For those who say that hockey's not your game, please bear with me as I'm headed somewhere with this.

As a red-blooded Canadian, I've been raised in a hockey culture. Hockey in Canada is unavoidable. Like (American) football in the USA or soccer (football) in Europe, it permeates our culture. Nearly everyone is aware of the stars of the game and the latest exploits of the local team. For the die-hard fans, it goes much deeper and becomes a part of our identity. Unfortunately for me, the Oilers have been taking me through a bit of an identity crisis since I moved to Vancouver in 2007.

Office Pool


My love of hockey has led me to run the office hockey pool for both the regular season and the Stanley Cup playoffs. Our regular season pool is a draft pool where each owner drafts a fixed number of forwards, defencemen, and a goaltender. Each poolie collects points throughout the season based on his team and is allowed a limited number of trades. With more than half of this NHL season complete, it's interesting to see how our draft choices compare to the real numbers.

The first surprise is the busts: those players whose performance failed to live up to expectations. Prior to our pool draft, I distributed a spreadsheet of last season's top scorers. Most of our poolies based their picks (at least in part) on this list. Milan Lucic, and Jordan Eberle, and most notably James Neal were all dropped from our pool within the first few months due to their poor performance. It appears that a player's historical statistics are not always a sure indication of his future results.

Perhaps more surprising is the missed opportunities. Many players were overlooked during our draft and turned into great late-round steals. Several of the top players in the league were passed over and remained completely undrafted. This includes top-rate goaltender Pekka Rinne, and the current league-leading forward Jakub Voracek. The collective knowledge of our pool was not enough to identify these high-scoring gems before draft day.

Perhaps we're just terrible at hockey pools, you say. Maybe so, but before we wallow in self pity at our own ineptitude, let's see how the professionals fared.

The Real Deal


Each season, the NHL hosts its annual entry draft wherein each team attempts to select the league's next superstars. Teams spend massively in time and money to scout young prospects in preparation for the draft. Their one objective is to choose the best player available in each round. Excellent young draft picks develop into the core of an elite NHL squad, and are considered mandatory to build the next championship team. With so much on the line, how good is their track record?

When it comes to drafting forwards, it seems to me that the professionals have done a solid job. Of the top thirty scorers this season, the vast majority were selected in the first round, several were taken number one overall. There are a few notable exceptions such as Henrik Zetterberg and Joe Pavelski, each taken in the seventh round of their own draft year. As for goaltenders, the professional record appears much shakier. Many of the top goaltenders were selected well into the late rounds including two of this season's leaders in the wins column: Pekka Rinne and Jarolav Halak, who were selected in the eighth and ninth rounds respectively.

As for the busts, I feel the best place to look is the number one overall seed. The most coveted position in the draft, the privilege of the first overall pick allows one team first choice of hundreds of young up-and-coming players. While we can hardly expect the scouts to nail it every time, it seems fair to anticipate that anyone taken so early in the draft would surely develop into a NHL staple with consistent performance for years to come.

One of the all-time great busts was the number one selection of Alexandre Daigle by the Ottawa Senators in the 1993 NHL entry draft. After a promising rookie season, he consistently fell below expectations before bouncing from one team to another and leaving the NHL for Europe following the 2006 season.

Then there's my personal favorite draft bust: Patrik Stefan. Stefan was chosen first overall by the Atlanta Thrashers in the 1999 draft. After averaging less than half a point per game over his NHL career, the number one pick left the league after only seven seasons. He is now unfortunately best known for this debacle, missing a wide open empty net and allowing my Edmonton Oilers to tie the game with only seconds to left on the clock.

It appears that the NHL scouts, whose entire job consists of watching, interviewing, and evaluating young players still get it wrong sometimes. I give them all due credit for a tough job where the successes go mostly unnoticed and the failures expose them to public scrutiny, with the critics calling for their heads. Still, these scouts occasionally miss golden opportunities or unwittingly place their faith in the wrong player.

Programmer Draft


Imagine, if you will, that the first ever National Programming League (NPL) draft is a mere three weeks from today. Your company was lucky enough (or perhaps it just performed poorly last season) to be given the privilege of the first overall pick. On draft day you will have your choice of any active programmer in the world to build a your dream team around. How do you choose?

You could start by looking at famous programmers. There's not too many who have wide-reaching fame, but if you dig hard enough, you could probably come up with a list of maybe a few hundred programmers who could be recognized by several people they had never met. Does fame really equate with talent? Not where I come from. Certainly some programmers might be famous because of their talents, others may also be famous despite them.

Another place to look would be job sites and headhunters. Browsing through candidate profiles from a collection of sources, you could try to boil down all the varied types and qualities of information into a short list of candidates. From there you could further research each programmer on the short list and try to come up with your definitive number one pick.

An interesting approach would be to follow chains of references and recommendations. Start with a reasonably large collection of randomly selected programmers and ask each one to recommend the smartest programmers they know. By repeating this process recursively, you should eventually bump into the same highly-praised candidates time and again. Again, from that pool of recommended candidates, you could perform some final detailed analysis and evaluation to settle on one final choice.

With the strategies above, you could probably come away with a solid programmer, but how close could you come to identifying the true "best" programmer on earth? I think you might still miss by thousands of places.

I'll bet that some of my readers have even more clever schemes up their sleeves. Can you think of any better strategies, or are you just hoping that you'll be selected in the early rounds?

Programming Is Different


For hockey and other elite-level sports, all the work is done out in the open and under the public's watchful eye. Every shift, every hit, and every goal is recorded and tallied for later retrieval and analysis. We can watch healthy NHL players play 82 games each season and play back the tape in slow motion from every angle imaginable. Yet, somehow we still find it hard to predict tomorrow's score.

Unlike the world of sports, programming work is largely an independent activity and is typically done behind closed doors. Private companies protect their source code and their employee information. With open source, of course, you can see the fruits of their labor, but it is still very difficult to measure performance in a meaningful and statistically relevant way. There's no great programmer database where you can slice, dice, and graph statistics to your heart's content.

From my own experience, I know that employees who have worked at the same company for years can know little about their colleagues talents. Only by working closely with another developer on a variety of projects, can I even begin to gauge how truly talented (or not) they might be. How much harder would it be to evaluate the performance of someone I've never met?

To The Point


The thing that has me all fired up is that people sharing their "ideal" process for hiring a programmer.

At this point I must mention that my experience in this area is incredibly limited and I'm not really qualified to be giving out advice on hiring. (Now ignore this disclaimer and listen to me!)

I've seen this advice take many forms. Some propose a simple bullet-point test that can be completed in minutes, while others have devised an elaborate series of excruciating tests spanning multiple days and consisting of so much actual work that it might not even be legal anywhere on earth.
(P.S. please don't do this)

What I'm desperately trying to get across is that there is no such thing as the ideal hiring process for programmers.

There are certainly some tips and tricks that you can apply to weed out the riffraff and narrow the field to those with a higher chance of success. You can ask candidates to write simple functions on a whiteboard or using software. You could discuss the advantages of different data structures in a specific scenario. You might give candidates basic design problems and ask them to draw diagrams of their solution using boxes and arrows. You might discuss work history and how they've dealt with people problems. I'm sure if I sat here long enough, I could rattle off dozens more.

All of these hiring tips and tests will only get your so far, eventually you have to settle on a candidate who seems competent and talented given limited information. Even if you are lucky enough to hire an elite performer, it may take years before you can confirm that they are truly a cut above the rest.

Sorry To Disappoint 


Just as it is for the professional scouts, finding and hiring the right people for your technical team is an essential part of your product's success. I understand that this is an important problem, so I'm sorry that all I have to say is, "this problem has no ideal solution". I'll let you in on a secret though: if I ever discover a solution, it will be much too valuable to share -- at least until I can use it to earn some cold, hard cash.

Do you believe in an ideal hiring process? What tricks and tips do you use to identify promising candidates? Let me know in the comments.

Cheers,

Joshua Ganes