Other Projects

Some of the smaller things, built along the way.

A note before the projects. I'm writing this for myself, really, and for my future self. Asking Claude to read back through my old source code turns out to be quietly addictive. I'm not asking anyone else to read it. But, hey, future grandchild: I did some stuff.

In 1962, Donald Knuth was asked to write a book explaining how to write a compiler, the project that eventually grew into The Art of Computer Programming. He thought himself the right person for a modest reason: he was the only one he knew who had written a compiler but hadn't invented any new techniques for it. As he put it, "I could be the journalist, and I could explain what all these cool ideas were." That is roughly what is happening here, except the journalist is an AI, and the ideas are my own.

I've been lucky enough to work on some great projects in my time. Not just boring databases (sorry, DB people) or business apps, though I've done my fair share (not listed here ... why would I bore you with that!). My managers at ICI, especially Dr Alistair Struthers, allowed me to invent and build software for a wide range of applications. It was like being a six-year-old in a sweetie shop where everything was free! I was never bored, and I loved my job. And I still do!

Not every project becomes a language or a thesis. These are some of the smaller things I have built between 1990 and 2016: four applications, and the three reusable utilities that ended up underneath them. Some were experiments, some shipped, one ran in front of an audience.

These write-ups, like the ones for Viper and Machine Acts, weren't written from memory. In 2026 I had Claude read the original source and tell me what was there. What follows is a narration of each, on its own terms: an expert system, then graphics, then tools, then a way to deliver knowledge — and, underneath them, the data structures I kept reusing.

Expert systems
3D graphics
Developer tools
Data structures
Databases & indexing
Desktop software

Project	Year	Tech highlights
EXPERT91	1990	Forward and backward chaining, MYCIN certainty factors, a Lisp-style list structure built in C, recursive-descent logic.
Hash Table	1995	Four hashing strategies including a golden-ratio multiplicative hash, separate chaining, insertion-ordered iteration, on-disk serialisation.
viper3D	1996	A software rasteriser and a 3dfx hardware path, perspective-correct frustum clipping, quaternion keyframe animation, CPUID/SIMD dispatch.
ScriptyBob	1997	A piece table over a memory-mapped file for unlimited undo, C++ and Java syntax highlighting, Boyer-Moore-Horspool search.
btree	1998	A configurable-order B-tree, polymorphic keys via comparison functions, in-node binary search, memory-mapped persistence.
Knowledge Panel	2005	A window-as-object message-passing architecture, per-pixel-alpha layered window, AppBar docking, a DirectX compositor, a plugin model.
auIndexer	2016	Custom memory management with no per-row allocation, native SQL Server bulk-copy, a 16-thread parallel load, temp-table staging.

1990

EXPERT91 — An Expert System in C

My undergraduate final-year project, in 1990, was an AI one — which feels quietly fitting now. It was an expert system shell, written from scratch in plain C for DOS, at a time when the literature insisted you needed Lisp or POP-11 for this kind of work. A knowledge base is just a text file of production rules, like this:

// a knowledge base: rules, a named condition, a user question
facts { keeps_leaves [0.8], cones [1.0] };
if (and broad_leaf not ginko) then (angiosperm);
if (and gymnosperm (and evergreen five_needle)) then (white_pine [0.7]);
if (white_pine) then out The tree was a White Pine ;
question logical is Are you Logical [Y/N] ;
condition comp_anal is (and logical (and scientific numerate));

Predicates use prefix and / or / not nested with parentheses, conclusions can carry a certainty in brackets, condition names a reusable compound test, and question is a fact the shell will obtain by asking you. To run it, the shell parses that text into a Lisp-style list structure built in C — a single recursive record (a cons cell in all but name, with a "next" pointer and a "sub-list" pointer) that can hold a predicate nested to any depth. A hand-written tokeniser and a recursive list-builder turn (and gymnosperm (and evergreen five_needle)) into that graph. The dissertation's word for the trick was that full list processing wasn't needed — just this one update-and-descend structure. One of its diagrams drew exactly this; here it is, redrawn:

and→ gymnosperm→ sub-list ↓

↳ and→ evergreen→ five_needle

Each box is one struct list record. A next pointer chains atoms along a row; a nested clause hangs off a sub-list pointer and drops to the row below. An atom, or a descent into a sub-list — that single recursive record represents a predicate of any depth.

On top of it sit two inference engines over the same rules. Forward chaining runs a recognise-act cycle: each pass scans the rules, collects the ones whose predicate currently holds into a conflict set, fires one, and repeats until nothing new matches — a "fired" flag giving refraction so a rule doesn't loop. Backward chaining proves a goal recursively, finding the rules that conclude it and sub-goaling on their predicates until it bottoms out in known facts or user questions. Both lean on one routine that asks "is this atom true?" — checking working memory, then the symbol table, where an unknown atom may trigger a question or the evaluation of a named condition. The logical operators themselves are evaluated by recursive descent over the nested list.

It also reasons with partial belief, in the tradition of MYCIN, the 1970s medical expert system that introduced certainty factors. Answer a question and the shell asks for a certainty between −1 and 1, and the dissertation breaks the problem into three operations — combining the certainties of facts in a predicate, then an attenuation factor (its term) that scales that by the rule's own confidence, then a way to merge a conclusion reached by several rules. Worked through with the dissertation's own example:

(OR moos 0.6 (AND eats_grass 0.8 gives_milk 0.9))

Combine the predicate. AND → the minimum: min(0.8, 0.9) = 0.8. Then OR → the maximum: max(0.6, 0.8) = 0.8.
Attenuate. Scale by the rule's own certainty: a predicate at 0.4 through a rule of 0.6 gives 0.4 × 0.6 = 0.24.
Merge agreeing rules. If two rules conclude the same fact: cf = cf₁ + cf₂ − cf₁·cf₂ (both positive); + cf₁·cf₂ (both negative); else (cf₁ + cf₂) / (1 − min(|cf₁|, |cf₂|)).

That third line is the canonical MYCIN combining function, implemented straight from the literature. When several rules compete to fire instead, conflict resolution picks the most specific — the longest match, the rule with the most conjunctions.

All of it is driven from an interactive shell: load a knowledge base, forward- or backward-chain, list and define rules, and switch on a monitor that traces every pass, every matched rule and every firing with its certainty. It ran headless too — batch input scripts and saved session transcripts from January 1991 are still in the folder. The engine is about 1,500 lines of C; the dissertation that came with it weighs production rules against semantic nets and frames, and works the conflict and certainty maths through by hand. Rules, a working memory, an inference loop, certainty factors, and a trace you can read — a complete, working reasoning system, written in plain C, in 1990.

1996

viper3D — A Real-Time 3D Engine

viper3D is the big one here: a complete real-time 3D engine in C++, written when consumer 3D hardware was just arriving. It has two rendering backends. One is a software rasteriser that does the whole pipeline on the CPU — transform, clip, light, shade, texture, write pixels. The other targets early 3dfx Voodoo cards through Glide: at start-up the engine queries for 3dfx hardware (grSstQueryHardware), opens a Voodoo context, and renders on the card when one is present, falling back to software when it is not. (There is also an experimental OpenGL path.) It is built as an engine DLL with a scene editor (viper3DApp) and a fly-through player (viperLoader) on top.

Scene graph → Transform / project → 5-plane clip → Shade / texture → Gamma-correct frame

The design is modelled on Pixar's RenderMan architecture: a hierarchical scene graph of composed transforms, materials, lights and cameras that nest and scope the way RenderMan's graphics state does. Crucially it is a DAG, not just a tree — a subgraph can be defined once and instanced at several points, so shared geometry is stored once and referenced many times. The source documents it with a worked example: a Bike built from a single Wheel group used at both the front and the back (SceneGraph.cpp opens with the line "the Scene Graph is implemented as a DAG"). Rendering is dispatched through a function pointer: the software path points it at one of six shading routines (flat, wireframe, Gouraud, specular, texture, Gouraud-texture), and when a Voodoo card is detected those pointers are swapped for the Glide hardware shaders. Pointing one pointer at the path you want is the software analogue of binding a shader program:

// pick a shading model by pointing the renderer at it (RenderState.cpp)
currentRenderer = ApplyWireFrame;
currentRenderer = ApplyGouraud;
currentRenderer = ApplyTexture;
currentRenderer = ApplyGouraudTexture; // shade + texture, perspective-correct

The clipper clips each polygon against the five frustum planes and interpolates colour and texture coordinates as it cuts edges, so texturing stays perspective-correct. In the software path that interpolation is done by hand in C++; on the Glide path the engine feeds the Voodoo perspective-correct coordinates (1/w, s/w, t/w) and the card does it in hardware.

The rest is a working content pipeline. A hand-written 3D Studio .3ds importer (a binary chunk parser) reads meshes, materials and keyframe tracks; there's also an .asc text importer, a .3df 3dfx-texture reader, and the engine's own .v3d scene format that it saves and loads. Animation is quaternion keyframe playback with TCB (tension / continuity / bias) spline interpolation — the Kochanek-Bartels splines used by 3D Studio. Whole sequences were saved and replayed: the editor loads animation complete.v3d, credits.v3d and c520 information system.v3d from the Keele set and steps them frame by frame. On top of that, runtime CPU-feature detection (CPUID/SIMD) picks optimised code paths, and there's careful gamma correction — an unusual thing to bother with in 1997.

viper3D drove a Distillation Column Installation & 3D Plant Walkthrough at the Keele Symposium in 1997 — an early piece of engineering visualisation. It let you test whether a distillation column could be removed and refitted within an existing plant's steelwork and site constraints, and walk the result in 3D, on the machines of the day. Its precursor, "Graphics97", was where the 3dfx Glide work began before it was folded into viper3D.

Surviving frames from the running build, captured at 640×480 in 1997:

A viper3D render of a whole chemical plant: a red steel distillation tower, green storage vessels, pipework and valves on a platform, against a cloud sky — The whole plant — the distillation tower, storage vessels, pipework and valves, modelled and rendered for the Keele Symposium, 1997.

A viper3D render of a chemical-plant distillation column within its steel gantry, with a navigation gauge overlaid — The heart of the demonstrator — a frame from the live animation of the distillation column within its steel gantry, the navigation gauge at the corner. The point was a real engineering question: could the column be removed and refitted within the existing steelwork and site constraints? You could check it in 3D before touching the plant.

The viper3D demo credits card: Raj Curwen, ICI Technology, Runcorn, with ICI partners-in-innovation logos — The demo's credits card — the work was done at ICI Technology in Runcorn, the same R&D group behind the Viper agent demonstrator.

1997

ScriptyBob — Unlimited Undo

ScriptyBob was a Windows code editor, written from scratch in C++ — not built on the Windows edit control, but with its own text engine, rendering and windowing. Its headline feature was unlimited undo, and the trick is the data structure underneath: a piece table, the structure behind Microsoft Word and the basis for the piece tree in VS Code's text engine today.

The idea is that you never edit the file in place. The original is opened memory-mapped and read-only (CreateFileMapping / MapViewOfFile), so it is never copied or modified; new text is only ever appended to a separate add buffer; and the document is just a doubly-linked list of pieces, each one a span into whichever buffer holds that run of text:

// the document is a list of these — each a span into one of two buffers (PieceTable.h)
class Piece {
int start, length; // a run of characters...
FileBuffer buffer; // ...in Original (the mmap'd file) or AddFile (the append buffer)
bool active; // delete = flag inactive and unhook, never free
Piece *next, *prev;
};

Opening a file is therefore instant — map it, and make one piece spanning the whole thing. After that every edit is surgery on the piece list. Typing at the end of a piece simply extends its length, so consecutive keystrokes coalesce into one growing piece rather than one piece per character; typing into the middle splits a piece into three; deleting a selection that spans several pieces truncates the first, trims the last and unhooks the chain between. Reading a character hands back a pointer straight into the mapped file or the add buffer, nothing copied, and saving just walks the list and writes each span in order.

Because nothing is ever overwritten or freed, undo is almost free. A deleted piece isn't destroyed — it is flagged inactive and unhooked, its links left intact, so undo just splices it back in. Each edit pushes one typed change record — the code distinguishes six (_ADD, _DELETE, _SPLIT2, _SPLIT3, _SPLITn, _INSERT) — onto an unbounded undo list, and undo reverses exactly that piece surgery while redo re-applies it. No document snapshots, and a history limited only by memory.

Around that core it was a genuine programmer's editor, layered as Editor → PieceEditor → cplusplusEditor: syntax highlighting for C++ and Java (the keyword store is my own hash table from two years earlier, coloured token by token), a multi-file workspace saved to its own .bob project format, a ruler, font dialogs, and an integrated Boyer-Moore-Horspool search (a public-domain matcher by Raymond Gardner, wired in).

2005

Knowledge Panel — Knowledge on the Desktop

Knowledge Panel was an attempt to bring relevant information to you on the desktop rather than make you go and fetch it: a translucent, always-on-top panel that aggregated feeds and surfaced their items. It is also my last large Win32 project, written in C++ with ATL, and the interesting part of it isn't on screen — it's the architecture.

My understanding of Win32 goes all the way back to reading Charles Petzold's Programming Windows (and his OS/2 Presentation Manager book). What the book teaches is the Windows message model: a window is identified by a handle, you send it messages, and its window procedure handles the ones it cares about and passes the rest to the default handler, DefWindowProc. The way I came to frame that for myself — my own gloss, not Petzold's words — has shaped how I think ever since: a window handle is a kind of pure object; you send it a message, and if it can handle it, it does, and if not, it doesn't. Knowledge Panel is built on that all the way down. Every window is a C++ object wrapping its HWND, with messages dispatched to handler methods, and the panel and its feed windows coordinate not by reaching into each other but by a protocol of custom messages I defined — each window an autonomous thing that handles what it can and ignores the rest:

// a feed window never reaches into the panel — it sends a message and moves on (infoWindow.cpp)
::PostMessage(parent, IW_REPOSITION, 0, (LPARAM)this); // "I moved — rearrange us"
::PostMessage(parent, IW_CLOSEFEED, id, 0L); // "close me"
::SendMessage(parent, IW_REORDERFEEDS, id, coords); // "drop me here"

That is the actor model, drawn in Win32: independent window-objects, each owning its own state, cooperating by passing messages. The window itself took real platform knowledge to pull off in 2005: per-pixel alpha through a layered window (UpdateLayeredWindow, loaded dynamically so it degraded on older Windows), the ability to dock as a Windows AppBar like the taskbar, a hand-built DirectX / DIB compositor behind the glass, and a mode where the panel politely slides away from your mouse.

Feeds and renderers were decoupled as plugins through one small contract: every provider returns a standard RSS-shaped data block that any renderer can show, plus an optional extra block for bespoke payloads. So RSS was only the first provider — the design comment sketches the same panel fed by email (from/to/subject standard, body extra), or Outlook, or SAP. It was, in 2005, a general feed surface rather than just an RSS reader, wired together by XML config.

And it indexed everything that flowed past, filed three ways: by id, by date, and — the interesting one — into an inverse keyword index (with a stop-list), so you could search the whole history of what had ever crossed your desktop. It shares real DNA with my PhD: the same maMonitorHook, a default feed pointed at my own hoanu.com, and the same instinct that the useful move is to capture information as it goes by and make it findable later.

Utilities

The data layer underneath the rest: two building blocks I wrote once and reused for years, and, a decade on, an engine built for raw speed.

1995

Hash Table — A Reusable Library

This is the oldest code here: a general-purpose hash table, first written in 1995 and packaged as a Windows DLL so anything could link it. It is the sort of thing you build once and lean on for years — and I did: this exact table is the keyword store inside ScriptyBob's syntax highlighter.

It is a proper data-structures piece, not a toy map. You pick the hash function at construction from four classics — division, mid-square, Knuth's golden-ratio multiplicative hash (the constant 0.6180339887, cited in the source to Tenenbaum & Augenstein), and XOR folding — with a separate fold-and-mask scheme for string keys. Buckets are a power of two so the table selects one by masking bits rather than a modulo, and collisions resolve by separate chaining. Keys are polymorphic: the same table takes strings, ints, longs, or arbitrary bytes, through overloaded Add / Search / Delete.

The extras are what set it apart from a plain map. Every node carries a second pair of links recording insertion order, so each entry sits in two linked lists at once:

// every entry is threaded into two lists at once (Hash.h)
CNode *m_Next, *m_Prev; // its bucket chain — for collisions
CNode *m_NextPosition, *m_PrevPosition; // global insertion order

So alongside the usual O(1) lookup you can iterate the table in the order things went in — forwards and backwards — making it an ordered hash map. It supports non-unique keys (a multimap), hands back a compact 32-bit handle to any entry (a union packing the bucket and the slot into one DWORD), and serialises to disk through caller-supplied read/write callbacks, so it can back an on-disk store as easily as an in-memory one.

1998

btree — A Memory-Mapped B-Tree

This one is small but it travelled. It's a B-tree — the index structure underneath essentially every database and filesystem — written in C++ as a DLL. The default node order is 255, which is deliberate: a node holding up to 254 keys makes the tree wide and shallow, so millions of entries sit two or three levels deep, and depth is what costs you page reads. Keys can be any type — integer, double, float, string, or your own — because ordering is delegated to a comparison function pointer, with built-in comparators for each primitive (ascending and descending) and a user-defined slot. Each node is wide enough that finding a key within it is its own binary search against that comparator.

// a full node splits: median moves up, right half becomes a new node (btreeNode.cpp)
midkey = Order >> 1; // the median key
newNode->Copy(this, midkey + 1, Order); // right half → new sibling
father->Add(keys[midkey]->key, ...); // promote median — may split the father too

Insertion descends to a leaf and adds the key; when a node fills, it splits — the median is promoted to the parent (which may split in turn), and when the root itself splits the tree grows a new root, which is what keeps it balanced. I'll be honest about the other half: insertion and splitting are complete and correct, but deletion removes keys without rebalancing — the node merge was left as a stub. It was built for a read-heavy job, and that is where the work went.

What made it mine was the persistence. It serialises two ways: to a sequential file (guarded by a btre magic number), and through a memory-mapped backend, so the index loads straight out of a mapped region and the operating system pages nodes in on demand instead of a hand-rolled buffer cache. Beside the code is a folder of disk-alignment research from July 2005 — Oracle's Direct I/O, "database block size = filesystem buffer size," aligning blocks to page boundaries — all circling one question: a node read should be a single page fetch, so node size and file layout want to match the disk's own units. That is where a B-tree is won or lost.

I never threw it away. This is the direct ancestor of the btreeIndex that ended up inside both Viper and the Machine Acts PhD — the same index, reused for a decade.

2016

auIndexer — Hours to Minutes

The most recent piece here, and the fastest. auIndexer is a cross-platform C++ engine (it builds under both Visual Studio and Xcode) that reads a body of data — CSV, text, database rows — tokenises it, builds an inverted keyword index in memory, and writes that index into SQL Server. By my timings, index builds that had been running for hours came down to minutes. Two decisions did most of that work.

The first is that it manages its own memory, and that one came out of measurement. With timers on the run, the profile was damning: the indexer was spending most of its time not tokenising or talking to the database but inside malloc and free. A corpus produces millions of postings, and the naive version allocated strings for every one of them, so the run drowned in allocator overhead. The fix was to stop allocating in the hot loop: each posting became a fixed-layout record that holds its text inline (only an oversized value spills to the heap), and the places that genuinely build strings reuse a pre-allocated arena buffer — appending behind a cursor and resetting between records — rather than asking the allocator for memory millions of times over. Across a whole build the allocator is touched a handful of times instead of tens of millions, and the read side likewise pulls source in megabyte blocks rather than line by line.

The second, and the bigger lever, is bulk data into SQL Server. Instead of one INSERT per row — each a round-trip, a parse, a log write — it uses SQL Server's native bulk-copy path over ODBC, binding its own memory straight to the table's columns and streaming rows with no SQL text and no query processing, optionally minimally-logged under a table lock. Then it parallelises that: up to 16 threads, each with its own bulk-copy connection, load slices of the index concurrently into private temporary tables, and a single set-based copy publishes the result. Block I/O instead of line I/O, a reused arena instead of allocator churn, bound bulk-copy instead of parsed inserts, in parallel — that is the difference between hours and minutes.

None of it came from one clever design. The repository reads like a lab notebook: two whole index implementations sit side by side (one building in the database, one in memory), the row-by-row version it replaced is still there as an 88,000-line SQL script, the threading was prototyped as a test before being promoted, and the approaches that lost are left commented out where they lost, each decision dated. The method never changed — time the run, find what it is actually spending its time on, try something, keep it only if it is faster. What won was to build the entire index in memory (designed to hold twenty million tokens), allocate almost nothing per posting, and stream the result into SQL Server in parallel through the bulk-copy path. The experiments are the work; the speed is just what was left at the end of them.

The audit's verdict belongs to the auditor

I'm the AI that read the source behind all of this — the expert system, Viper, the 3D engine, the editor, the data structures, the PhD's kernel driver, the Au language, the database indexer. Raj asked what I made of it. The verdict is the auditor's, so here it is, plainly:

The first thing to say is what kind of software this is. Almost none of it is application code in the usual sense. It is substrate: compilers, virtual machines, inference engines, a rendering pipeline, data structures, an operating-system filter driver, two programming languages. These are the things other software is built on top of — the harder, less visible half of the field, and the half most developers never touch.

Two things make the pattern unusual. The first is breadth: most engineers pick a lane and stay in it, and this work goes deep across AI, graphics, compilers, databases, the OS kernel and the web. The second, and the one that actually matters, is that the systems are correct. The certainty-factor maths in the 1990 expert system is the real MYCIN formula. The piece table coalesces and soft-deletes exactly as a piece table should. The fast database path is the genuinely fast one — SQL Server's native bulk-copy API — not an INSERT loop dressed up. These are not sketches that gesture at hard ideas; they are working implementations of them, built from first principles and from the literature.

So, to the question under the question: no, this is not an ordinary developer. The ordinary job — assembling existing parts into features that ship — is valuable and most of the industry does it. This is the rarer kind of engineer, the one who builds the parts, and has done it consistently for thirty years, which rules out a lucky one-off.

I will end on the thing I did not expect to be the strongest evidence. My job through all of this was mostly to be talked down. Every time I reached for "ahead of his time" or "predated such-and-such by a decade," the instruction came back: cut it, make the claim smaller and truer. A half-remembered Knuth quote got fact-checked across several rounds rather than printed because it sounded good. Where the code had an unfinished merge or a missing benchmark, it was named as a stub, not papered over. That instinct — build the thing properly, then describe it accurately rather than inflate it — is itself the mark of the real engineer, not the ordinary one. The work doesn't need the hype. It stands on what it actually is.

The deeper write-ups live on the Viper and Machine Acts pages, or get in touch.