7 MAY 2012

Lisp and Brain Neurochemistry

Lisp changes the way you think.

I recently ran an experiment while solving a programming puzzle. I started off the object-oriented way, then abandoned the effort midway and coded the whole thing from scratch using a more data-first, functional approach.

The latter was influenced by Rich Hickey’s talks on Clojure, state, simplicity, and functional programming; and the amazing SICP lectures.

I was using Ruby, a hybrid language with both OO and functional features. Matz, Ruby's creator, has jokingly called it MatzLisp, and I feel it's been a great ally for me in making the OO to functional transition.

The lessons I learned while applying the functional mindset are priceless. My ability to write programs and solve problems is completely transformed.

I wanted to share my discoveries with other programmers wanting to bridge over to functional from OO programming; hopefully this will be useful.

The short version

Design programs data-first.

In other words, bring data and data structures to the forefront of your program in the way you declare, reason about, and manipulate data.

The longer version follows below.

Program = data + functions

When confronted with a problem for the first time, answer three questions first:

What is the source of the data you’re dealing with?
How is this data represented; what's the shape of the data?
What are the main functions needed to turn this data into a solution?

Programming is data transformation.
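
As a rough illustration of the three questions, here is a minimal Ruby sketch. The problem (finding the best-selling product), the ORDERS data, and the function names are all hypothetical, made up for this example.

```ruby
# Hypothetical problem: find the best-selling product in a list of orders.

# 1. Source of the data: hard-coded here; in practice a file, an API, or a database.
# 2. Shape of the data: an array of hashes, one hash per order.
ORDERS = [
  { product: "tea",    quantity: 2 },
  { product: "coffee", quantity: 5 },
  { product: "tea",    quantity: 4 }
].freeze

# 3. The functions that turn this data into a solution.

# Array of orders -> hash of total quantity per product.
def totals_by_product(orders)
  orders
    .group_by { |order| order[:product] }
    .transform_values { |group| group.sum { |order| order[:quantity] } }
end

# Hash of totals -> name of the product with the highest total.
def best_seller(totals)
  totals.max_by { |_product, quantity| quantity }.first
end

best_seller(totals_by_product(ORDERS)) # => "tea"
```

The program is nothing more than the data plus two transformations applied in sequence.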

Use realistic data sets

Run your program against a real-world data set as early as possible.

Let's make objects great again

An object, in Object-Oriented Programming (OOP), is defined by three things.

State: the current values of its mutable variables.
Behavior: the methods.
Identity: a fleeting thing of mystery, magic, and intrigue.

Here’s how to create better objects, step by step.

Step 1 — Eliminate mutable state

Get rid of mutable state first by eliminating the use of variables:

use constants in place of global variables
for all other variables, make your methods return new values instead of overwriting the existing ones once their value has changed through computation.
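
Both points can be sketched in a few lines of Ruby. The shopping cart, TAX_RATE, and the function names below are assumptions invented for illustration.

```ruby
# Hypothetical shopping-cart example.

# A constant in place of a global variable such as $tax_rate.
TAX_RATE = 0.2

# Returns a new array containing the extra item; the original cart is left alone.
def add_item(cart, item)
  cart + [item]
end

# Computes a new value from its inputs; nothing is overwritten.
def total_with_tax(cart)
  subtotal = cart.sum { |item| item[:price] }
  (subtotal * (1 + TAX_RATE)).round(2)
end

empty_cart = [].freeze # freezing makes accidental mutation raise an error
cart = add_item(empty_cart, { name: "book", price: 10.0 })

empty_cart.length    # => 0, the original is unchanged
total_with_tax(cart) # => 12.0
```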

This brings us directly to the next step: the methods.

Step 2 — Turn methods into functions

A method performs computation on some input (global or passed as argument), possibly mutating some global state (bad!) and possibly returning some value upon exit.

A function performs computation on some input, returning a value as result.

Re-design your methods by forcing them to return values — thus turning them into functions.

In other words, re-design them for chain-ability. If you can pipe the output of one method into some other method as input, you can create programs that don't mutate any state at all.

This is called referential transparency: being able to swap a function call with its return value, with no change to program behavior at all.
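
Here is a small sketch of what chainable functions can look like in Ruby. The word-counting task, the TEXT sample, and the function names are hypothetical; Array#tally assumes Ruby 2.7 or later.

```ruby
# Hypothetical text-processing pipeline built from small functions.

# String -> array of lowercase words.
def words(text)
  text.downcase.scan(/[a-z']+/)
end

# Array of words -> hash of word => number of occurrences.
def word_counts(word_list)
  word_list.tally
end

# Hash of counts -> the n most frequent words (ties broken alphabetically).
def most_common(counts, n)
  counts.sort_by { |word, count| [-count, word] }.first(n).map(&:first)
end

TEXT = "the cat and the hat and the bat"

# The output of each function is piped into the next one as input.
most_common(word_counts(words(TEXT)), 2) # => ["the", "and"]
```

Because none of these functions touches outside state, any call such as words(TEXT) could be replaced by the array it returns without changing the result.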

A function does nothing but explicitly operate on data, without any additional side effects. You should be able to reason about the way a function works, definitively and completely, just by inspecting its code visually. That's right; wave "the magic" good-bye. Magic belongs in Harry Potter books, not programming.

Step 3 — Manage your identity crisis

What's left by now is the object's identity.

This is probably the most subtle part. You have a choice to make:

your object is defined as the sum of its properties
your object gets its identity in some other way.

The former is called extensional identity; the latter, intensional identity.

When an object is defined as the sum of its properties or attributes, it's simply the sum of its functions; in other words, it's the interface the object presents to the world.

When an object gets its identity from some other property, it's usually the case that its identity is implicitly understood as the unique mix of variables, their values (state), and methods held by the object at any given time. In some cases, an object can also have a custom way of defining identity depending on the program domain.

This latter, fuzzier type of identity is typical of objects in OOP, while extensional identity is common in functional programming.

If you've gone through steps #1 and #2, you've already simplified your objects to the point where they act like containers, or namespaces, for functions. In other words, your objects now have extensional (attribute-based) identity by default. If you don't have a specific need to give your objects an identity based on some other criteria, you can leave them at that. More often than not, you'll find that to be enough.

If your objects need to differ on some other parameter, you can introduce intensional identity. But the best way to do so is to make intensional identity as explicit as possible. Taking a page out of database theory is one way. I've often given objects unique identifiers based on some function of their attributes (date, number, etc), effectively turning them into in-memory database records.

This makes object equivalence straightforward. To determine whether two objects are the same, just compare their unique ids and you're set.
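
A minimal sketch of this idea, assuming a hypothetical invoice record whose identity is derived from its date and number (all names below are invented for the example):

```ruby
# Hypothetical invoice records with an id derived from their attributes.

# Builds an id from the attributes that make an invoice unique in this domain.
def invoice_id(date, number)
  "#{date}-#{number}"
end

# Returns a frozen hash that acts like an in-memory database record.
def make_invoice(date, number, amount)
  { id: invoice_id(date, number), date: date, number: number, amount: amount }.freeze
end

# Two records are the same invoice when their ids match.
def same_invoice?(a, b)
  a[:id] == b[:id]
end

original  = make_invoice("2012-05-07", 42, 100)
corrected = make_invoice("2012-05-07", 42, 120) # amount changed, identity kept
other     = make_invoice("2012-05-08", 42, 100)

same_invoice?(original, corrected) # => true
same_invoice?(original, other)     # => false
original == corrected              # => false, the attributes differ
```

The last line shows the contrast: plain hash equality compares every attribute, while the explicit id compares only what the domain says matters.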

The bottom line: either your objects can be simply defined as the sum of base attributes; or their identity can be reduced to some function of these base attributes.

Step 4 — Congratulate yourself

You've now eliminated about 80% of the complexity inherent in OO programs. You have:

turned objects into records — simple containers of key-value pairs like JavaScript objects (JSON), or records in Clojure;
turned methods into functions.
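
To make the end result concrete, here is a before-and-after sketch. The bank account example, the Account class, and the Accounts module are hypothetical names chosen for illustration.

```ruby
# Hypothetical bank account, before and after the steps above.

# Before: state, behavior, and identity bundled in one mutable object.
class Account
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  def deposit(amount)
    @balance += amount # overwrites state in place
  end
end

# After: the record is plain data...
ACCOUNT = { owner: "alice", balance: 100 }.freeze

# ...and the former methods are functions grouped in a namespace.
module Accounts
  module_function

  # Returns a new record; the input record is left untouched.
  def deposit(account, amount)
    account.merge(balance: account[:balance] + amount)
  end
end

Accounts.deposit(ACCOUNT, 50) # => { owner: "alice", balance: 150 }
ACCOUNT[:balance]             # => 100
```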

What follows next are some general recommendations.

Primitive data is good

When solving a problem, make functions operate on primitive data types to the extent possible. Replace objects and records with their string or numeric representations. Powerful functions operating on primitive values are a great way to prototype any algorithm. They are a great way to come to the core of the problem quickly and head-on.
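
For instance, a prototype that adds up durations can work directly on strings and integers, with no Duration class in sight. The task and the function names here are assumptions made up for the sketch.

```ruby
# Hypothetical prototype: total duration of a list of "HH:MM" strings.

# "HH:MM" string -> integer number of minutes.
def to_minutes(hhmm)
  hours, minutes = hhmm.split(":").map(&:to_i)
  hours * 60 + minutes
end

# Integer number of minutes -> "HH:MM" string.
def to_hhmm(total_minutes)
  format("%02d:%02d", total_minutes / 60, total_minutes % 60)
end

DURATIONS = ["00:45", "01:30", "02:05"].freeze

to_hhmm(DURATIONS.sum { |duration| to_minutes(duration) }) # => "04:20"
```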

Use data structure literals

Favor data structure literals over data objects or variables.

Data structure literals (arrays, hashes, sets) make programs more declarative because they display data within the context of their containing structure. This makes for more transparent programs and easier reasoning. Use data structures to expose data, not hide it.
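
As a sketch, compare a table declared as a literal with the same data hidden behind setters or built up line by line. The shipping rates and the options_within function are hypothetical.

```ruby
# Hypothetical shipping rates, declared as a literal so the whole
# table is visible at a glance.
SHIPPING_RATES = {
  "standard" => { days: 5, price: 4.99 },
  "express"  => { days: 2, price: 9.99 },
  "same_day" => { days: 0, price: 19.99 }
}.freeze

# Names of the options deliverable within max_days, cheapest first.
def options_within(rates, max_days)
  rates
    .select { |_name, rate| rate[:days] <= max_days }
    .sort_by { |_name, rate| rate[:price] }
    .map(&:first)
end

options_within(SHIPPING_RATES, 2) # => ["express", "same_day"]
```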

Think in collections

Think in collections even when dealing with standalone elements; favor functions that take a collection and return a new one. Returning an empty array is better than returning null; an array with a single element is (usually) better than returning the element.
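
A short sketch of why this pays off, using a hypothetical user lookup (USERS and both function names are invented for the example):

```ruby
# Hypothetical user lookup.
USERS = [
  { name: "ann", role: "admin" },
  { name: "bob", role: "editor" }
].freeze

# Always returns an array: empty when nothing matches,
# one element when a single user matches.
def users_with_role(users, role)
  users.select { |user| user[:role] == role }
end

# Callers need no nil check; the same chain works for 0, 1, or many results.
def names_with_role(users, role)
  users_with_role(users, role).map { |user| user[:name] }
end

names_with_role(USERS, "admin") # => ["ann"]
names_with_role(USERS, "guest") # => []
```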

Develop programs bottom-up

At a fundamental level, any computer works by manipulating zeros and ones. Similarly, all that a program does under the hood is manipulate primitive values like strings and numbers (which are all really strings of character encodings).

By the same token, all an object needs to do is wrap a number of data structures and functions within a single namespace.

Most programmers (and I've been there myself) introduce abstract concepts such as objects early in the program's development, and then spend the rest of their time unravelling how the program primitives actually behave underneath all these layers of abstraction. This usually involves long cycles of debugging, testing, and refactoring.

There's a better way: do the opposite. Bring data to the forefront of your program, and make it as explicit, transparent, and concrete (the opposite of abstract) as possible.

When working on a problem, first prototype a solution using just data structures and primitive data types. Only introduce an object when you feel the need to create a dedicated namespace in which to group data and functions according to some custom criteria.

Utilize nested data structures

When solving a problem, go as far as you can using compositions of core data structures (array of hashes; hash of arrays; array of arrays).

You'll find that these standard library data structures, together with the operations they provide, are usually enough to solve a problem. This will make you think in a functional way because operating on collections calls for recursive procedures; the bread-and-butter functions (map, reduce, etc.) are functional by default.
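
Here is a sketch of how far a plain hash of arrays can go with nothing but standard library operations. The gradebook data and function names are hypothetical.

```ruby
# Hypothetical gradebook: a hash of arrays.
GRADES = {
  "ann" => [90, 80, 100],
  "bob" => [70, 85],
  "cy"  => []
}.freeze

# Hash of arrays -> hash of averages, skipping students with no grades.
def averages(grades)
  grades
    .reject { |_name, scores| scores.empty? }
    .transform_values { |scores| scores.sum.to_f / scores.length }
end

# Hash of averages -> array of names at or above the threshold.
def names_at_or_above(avgs, threshold)
  avgs.select { |_name, avg| avg >= threshold }.keys
end

# Hash of arrays -> sum of every score, using reduce.
def total_points(grades)
  grades.values.flatten.reduce(0) { |sum, score| sum + score }
end

averages(GRADES)                        # => { "ann" => 90.0, "bob" => 77.5 }
names_at_or_above(averages(GRADES), 80) # => ["ann"]
total_points(GRADES)                    # => 425
```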

Resort to creating objects only when this approach results in more complexity than needed.

For example: you have an array of arrays nested n levels deep. But then you discover that all or some of these arrays need to support an operation that isn't a part of the Array class interface. You can choose to wrap your arrays into an object or a struct that acts like a Decorator, extending the Array functionality with the operations you need — but still exposing native Array methods underneath.
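
One possible way to build such a wrapper in Ruby is SimpleDelegator from the standard library, which forwards any method it doesn't define to the wrapped object. The NestedList class and its depth operation are hypothetical, chosen only to illustrate the idea.

```ruby
require "delegate"

# Hypothetical wrapper that adds a `depth` operation to nested arrays
# while still exposing the native Array methods underneath.
class NestedList < SimpleDelegator
  # Returns how many levels of arrays are nested, counting the outer one.
  def depth
    measure(__getobj__)
  end

  private

  # Non-arrays have depth 0; an array is one level deeper than its deepest element.
  def measure(value)
    return 0 unless value.is_a?(Array)

    1 + (value.map { |element| measure(element) }.max || 0)
  end
end

list = NestedList.new([1, [2, [3, 4]], [5]])

list.depth   # => 3
list.length  # => 3, delegated to Array
list.flatten # => [1, 2, 3, 4, 5], delegated to Array
```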

Thinking functionally feels like mental shapeshifting. It'll make your programs better, as well.

Edited September 2022.