Building a key-value store part 2

Designing and building a Redis-like in-memory key-value tree store

Nov 19, 2024

Introduction

This is a 6-part series. In the previous post, we discussed the main features and built a basic BST. You can read the post here if you missed it:

Building a key-value store part 1

Martin Joo

November 12, 2024

Introduction

Read full story

If you want to try out the end product you can download the binaries here. Check out the project’s README. If there’s a problem you can build your own binaries with make build

Project structure

Before we dive deep into the details, let’s talk about the project’s structure because it can be a bit weird if you’re not used to Go.

I’m going to create one repository with two folders:

cli
server

Both of them have a go module:

github.com/Computer-Science-Simplified/tedis/cli
github.com/Computer-Science-Simplified/tedis/server

(Computer Science Simplified is an organization on my GitHub and it generated that shitty name for it. It’s not intentional)

A module is a collection of related Go packages (similar to a namespace in other languages) that are versioned together as a single unit. Each module has its own dependencies. This is perfect for us:

The server and the CLI have nothing to do with each other (it’s like a frontend and a backend)
They need different dependencies
They have different versions

So they have different Go modules.

This is the module file for the server:

module github.com/Computer-Science-Simplified/tedis/server

go 1.23.0

require github.com/gookit/event v1.1.2

It uses Go 1.23.0 and it requires one dependency.

Now, let’s take a look at the server’s structure:

The root folder contains three files:

go.mod
go.sum which is similar to package-lock.json or composer.lock. It contains all the dependencies with exact versions and checksums.
Makefile. This is not required but it helps us to quickly build or run the project

Now, the folders:

bin contains executable binaries. These are the artifacts of the program
resources contains the generated AOL and RDB files
cmd contains the main applications for this project. If your project exposes an API and also has CLI commands then cmd should contain an api and a cli folder. Both of them should contain a main.go that is executable with go run cmd/api/main.go
(there should be another folder called test)
internal contains everything that is business logic and related to your app. Each subfolder is called a package. Since there’s a command folder, this project has a command package.

This is where Go gets a bit weird (at least for me). Given the following structure:

internal
  command
    foo.go
    bar.go

Both foo.go and bar.go belong to the command package. Let’s say they define two functions:

// foo.go
package command

func Foo() {
    fmt.Println("foo")
}

// bar.go
package command

func Bar() {
    fmt.Println("foo")
}

From another package, they can be used like this:

// baz.go
package otherpackage

import "command"

func Main() {
    command.Foo()
    command.Bar()
}

Even though Foo() and Bar() are located in two different files, they can be accessed the same way. command.Foo() or command.Bar(). This was very strange to me at first.

You can structure the internal folder in at least two ways:

Each feature or set of features has a subfolder. Something like products, orders, etc.
Each type of file has a folder. For example, model, service, etc.
For the most part, I’ll create folders for features, but as you can see it has an enum and a model folder as well. Since these files are usually used by many other packages, this way it’s easier to avoid circular dependencies.

This project contains the following internal packages:

command deals with client commands such BSTADD or BSTEXISTS
enum contains enums. They are small and used often. I like them in a seperate folder.
model contains models. This project is a bit different and there are no “classic” models but we still want to persist trees on the disk. So at the moment, Tree is the only model.
persistence contains AOL and RDB logic
store contains the in-memory key-value store
tree contains the actual tree implementations such as BST or binary tree.

If you’re a PHP (especially Laravel) developer I have bad news. Your beloved ProductCategoryService.php becomes productcategoryservice.go

The convention is to use lowercase letters without separators.

Of course, there’s no “official” project structure (thank God). My inspiration was mostly the project-layout and the go-blueprint repositories.

A quick question for you:

In-memory store

The next step is to actually build the in-memory store that holds keys and the data.

Fortunately, it’s pretty simple. It’s hashmap:

package store

var store = make(map[string]model.Tree)

It’s a simple map that holds string keys and Tree values. Tree is an interface that is implemented by every tree (only BST exists at the moment):

package model

type Tree interface {
    GetKey() string
    GetType() string
    Add(value int64)
    Exists(value int64) bool
    Remove(value int64)
    GetAll() []int64
}

At the moment, trees can only contain integer values. Each tree holds its key such as “mykey” and its type such as “binary_search_tree.” It’ll be important later.

If you’re paying attention you can see I’m storing values instead of pointers in the store map:

var store = make(map[string]model.Tree)

But storing one million pointers would be better than storing one million tree objects (structs) in a huge map, right? Right. But model.Tree is not a struct just an interface. An interface is just a tuple of (value, type). “An interface value holds a value of a specific underlying concrete type.” - Go Docs

When it’s implemented in BST, a pointer to a struct is used:

type BST struct {
    Key  string
    Root *BSTNode
}

func (tree *BST) GetKey() string {
    return tree.Key
}

func (tree *BST) GetType() string {
    return "binary_search_tree"
}

The type tree *BST is passed to the functions. Back to the Go docs: “An interface value holds a value of a specific underlying concrete type.”

In this specific case: “An interface value holds a value of a pointer to a BST.”

So when we do this:

var store = make(map[string]model.Tree)

It’s going to hold memory addresses and pointers:

This is the store map holding two keys: a, and b. Each value is a memory address so the store object remains lightweight.

By the way, if you were looking for the implements keywords you won’t find it. In Go, you implement an interface by implementing all the functions. That’s it.

I don’t want to expose the whole store to the outside world as it is so I created some proxy functions:

func Get(key string) (model.Tree, bool) {
    item, ok := store[key]

    return item, ok
}

func Set(key string, tree model.Tree) {
    store[key] = tree
}

func Len() int {
    return len(store)
}

In Go, accessing a key from a map returns two values:

The actual value or nil if the key is not found
A bool that is true if the key is found, and it’s false if the value is not found

So it’s either (“value”, true) or (nil, false).

After the store is ready, we need a way to create trees of specific types with specific keys. Something like this:

tree.Create("product-categories", "binary_search_tree")

This is the tree factory. It’s located in the internal/tree/factory.go file:

package tree

func Create(key string, treeType string) (model.Tree, error) {
    if item, ok := store.Get(key); ok {
    	return item, nil
    }

    if treeType == enum.BinarySearchTree {
    	tree := &BST{
    	    Key: key,
    	}

    	store.Set(key, tree)

    	return tree, nil
    }

    return nil, fmt.Errorf("tree type not found: %s", treeType)
}

The Create function is quite simple:

If the given key is already in the store it returns the tree
Otherwise, it creates a new BST (since this is the only type now), and saves it to the store

Now, we have a very basic but working in-memory key-value tree store:

package main

func main() {
    fmt.Println("Adding elemnets to tree1")

    tree1, err := tree.Create(
        "product-categories", 
        enum.BinarySearchTree
    )

    if err != nil {
    	panic(err.Error())
    }

    tree1.Add(8, true)
    tree1.Add(10, true)
    tree1.Add(12, true)

    fmt.Println("Retrieving elements from tree2")

    tree2, err := tree.Create(
        "product-categories", 
        enum.BinarySearchTree
    )

    if err != nil {
    	panic(err.Error())
    }

    fmt.Println(tree2.GetAll())

    fmt.Printf("%s\n", strconv.FormatBool(tree2.Exists(10)))
    fmt.Printf("%s\n", strconv.FormatBool(tree2.Exists(20)))
}

The output of this function is this:

TCP server and commands

As the next step, let’s add a TCP server, a client, and some command objects (structs) to handle commands such as BSTADD my-key 12.

Every Go project has a main package and a main() function. This is the executable, the entry point of the application. We’re going to spin up a TCP server in this function:

package main

import (
    "net"
    "fmt" 
)

func main() {
    fmt.Println("Starting Tedis...")

    listener, err := net.Listen("tcp", ":2222")

    if err != nil {
    	panic(err)
    }

    defer listener.Close()

    fmt.Println("Tedis is listening on port 2222")
    fmt.Println("Ready to accept connections")

    for {
    	conn, err := listener.Accept()

    	if err != nil {
            fmt.Println("Error accepting connection")
    	    continue
    	}

    	go handleConnection(conn)
    }
}

net.Listen(“tcp”, “:2222”) is used to listen for incoming connections on port 2222. It’s the usual stuff you’re probably used to but instead of HTTP, we need TCP in this case because the client is going to be a CLI program.

The defer statement is used to schedule a function call to be executed after the surrounding function completes.

defer listener.Close()

This line closes the listener when main() exists. It’s very useful since you can defer cleanup tasks right after you open a connection (or open a file, etc) not 50 lines later, which is pretty easy to forget.

After that, the function starts an infinite loop and accepts connections. It’s a long-running, “infinite” process since anyone at any time can connect to the server.

When there’s a new incoming connection the following line handles it:

go handleConnection(conn)

The keyword `go` is very important. This is called a go routine. It’s concurrent. The go keyword starts a new thread.

Handling a connection means parsing the command the client sent, executing a query, and returning the results. Let’s say it takes 100ms (which is very high for an in-memory DB but it’s just an example). What happens if we handle the incoming connection without starting a new go routine? We block the main thread for 100ms. What happens when 10 requests come in? The 10th will wait for 10*100ms or 1s, which is not optimal at all.

The other problem is that when a client connects we should start another infinite loop like any other CLI client does. Just think about when you start redis-cli or mysql. You have a session that lasts “forever.” It’s another infinite loop. If we start another infinite loop on the main thread, other clients cannot connect to the server at all since it runs the infinite loop for client #1.

Because of these problems, the server handles every incoming request on a new thread.

The handleConnection function first needs to read from the connection:

func handleConnection(conn net.Conn) {
    fmt.Printf(
        "Connection established with: %s\n", 
        conn.RemoteAddr().String()
    )

    defer conn.Close()

    buffer := make([]byte, 1024)

    for {
    	n, err := conn.Read(buffer)

    	if err != nil {
    	    fmt.Println("Error reading from connection")
    	    return
    	}
    }
}

It closes the connection once the function exits using defer
It creates a byte array with 1024 elements, which is basically 1KB of data. It’s probably overkill but it’s okay for now.
Then it starts an infinite loop since the connection lasts until the client closes it
In the loop, it reads from the connection in 1KB chunks. conn.Read() returns the number of characters it read

After that, we need to get and parse the given command:

commandName := string(buffer[:n])

fmt.Printf("Received: %s", commandName)

cmd, err := command.Parse(commandName)

buffer is a fixed-sized array so we need to get the first n characters out of it using buffer[:n]. It means creating a new slice (array) until the nth element where n is the number of characters read from the client.

One very annoying thing in Go is name collisions. There’s the command package that is imported:

package main

import "command"

From that point, command can be used as a variable:

cmd, err := command.Parse(commandName)

If I wrote this:

command, err := command.Parse(commandName)

It would have been a name collision between the command package and the return value which I called command.

I won’t include the content of command.Parse() because it’s boring. It parses the string, validates if it’s a valid command, and returns a Command instance that looks like this:

package command

type Command struct {
    Name string
    Key  string
    Args []int64
    Type string
}

A concrete command can be something like this:

bstadd := Command{
    Name: "BSTADD",
    Key: "my-key",
    Args: []int64{10},
    Type: "binary_search_tree",
}

At the given moment, I don’t think we need separate structs for the different commands such as:

BstAddCommand
BstGetAllCommand

After a certain point, maybe it is needed because commands can be very different so it’s hard to contain everything in a single Command struct. For now, it’s enough.

It holds some meta-data such as the key, and the tree type (since every command is associated with a specific tree). Of course, it contains the command’s name and its arguments as an integer array.

After the function parses and creates a new command, the step is executing it:

cmd, err := command.Parse(commandName)

if err != nil {
    conn.Write([]byte(err.Error() + "\n"))
} else {
    result, err := cmd.Execute()

    if err != nil {
        conn.Write([]byte(err.Error() + "\n"))
    	return
    }

    _, err = conn.Write([]byte(result + "\n"))

    if err != nil {
	fmt.Println("Error writing to connection")
	return
    }
}

Parsing can fail (for example, unknown command) so we check for an error and write it back to the client.

If everything goes well, the command is executed and either the result or an error message is sent to the client.

The last piece of the puzzle is command.Execute() which is literally just a wrapper around tree functions:

package command

type Command struct {
    Name string
    Key  string
    Args []int64
    Type string
}

func (c *Command) Execute() (string, error) {
    tree, err := tree.Create(c.Key, c.Type)

    if err != nil {
    	return "", err
    }

    switch c.Name {
    case enum.BSTADD:
    	tree.Add(c.Args[0], true)
    	return "ok", nil

    case enum.BSTEXISTS:
    	exists := tree.Exists(c.Args[0])
    	return strconv.FormatBool(exists), nil

    case enum.BSTGETALL:
    	values := tree.GetAll()
    	return fmt.Sprintf("%v", values), nil

    case enum.BSTREM:
    	tree.Remove(c.Args[0], true)
    	return "ok", nil
    default:
    	return "", fmt.Errorf("command not found: %s", c.Name)
    }
}

It creates a tree instance based on the key and type and it executes one of its functions.

For now, command.Execute() returns a string. In the long term, it should return some kind of Result object (struct).

This is the whole handleConnection function:

func handleConnection(conn net.Conn) {
    fmt.Printf(
        "Connection established with: %s\n", 
        conn.RemoteAddr().String()
    )

    defer conn.Close()

    buffer := make([]byte, 1024)

    for {
    	n, err := conn.Read(buffer)

    	if err != nil {
    	    fmt.Println("Error reading from connection")
    	    return
    	}

    	commandName := string(buffer[:n])

    	fmt.Printf("Received: %s", commandName)

    	cmd, err := command.Parse(commandName)

    	if err != nil {
    	    conn.Write([]byte(err.Error() + "\n"))
    	} else {
    	    result, err := cmd.Execute()

    	    if err != nil {
    		conn.Write([]byte(err.Error() + "\n"))
    		return
    	    }

  	    _, err = conn.Write([]byte(result + "\n"))

    	    if err != nil {
    		fmt.Println("Error writing to connection")
    		return
    	    }
    	}
    }
}

CLI client

The client is pretty simple. It needs to do the following things:

Connecting to the server
Reading commands from STDIN
Sending them to the server
Reading the response

The project has a client and a server folder. In the client folder, there’s only a main.go that looks like this:

package main

import (
    "bufio"
    "fmt"
    "net"
    "os"
)

func main() {
    // Step 1: Connection to the server
    conn, err := net.Dial("tcp", "localhost:2222")
    if err != nil {
    	fmt.Println("Couldn't connect to server at localhost:2222")
    	os.Exit(1)
    }

    defer conn.Close()

    for {
        // Step 2: Reading from STDIN
    	reader := bufio.NewReader(os.Stdin)

    	command, err := reader.ReadString('\n')
    	if err != nil {
    	    fmt.Println("Couldn't read input")
    	    os.Exit(1)
    	}

        // Step 3: Sending the command to the server
    	_, err = conn.Write([]byte(command))

    	if err != nil {
    	    fmt.Println("Couldn't send command")
    	    os.Exit(1)
    	}

        // Step 4: Reading the response from the server
    	buffer := make([]byte, 1024)

    	n, err := conn.Read(buffer)
    	if err != nil {
    	    fmt.Println("Couldn't get response from server")
    	    os.Exit(1)
    	}

    	fmt.Printf("%s", string(buffer[:n]))
    }
}

If you now run go run main.go in server/main you get the following output: