
What is RLP?

Recursive Length Prefix (RLP) is a method of encoding data in a format that’s compact and easy to parse. It was developed specifically for Ethereum and ensures consistent and efficient serialization.

The general idea behind RLP is to recursively prefix data with a length. The best way to learn about RLP is to first see it in action visually.

Cat Encoded

To encode the string “cat”, we start with a prefix byte: the length (3) with its two highest bits set to the binary digits 10, which in this case gives 83 in hex. Then we follow it with the hex codes for the ASCII data for c, a and t, thus giving us the following hex sequence:

83 63 61 74

The leading hex digit 8 comes from setting the two highest bits of the prefix byte to 1 and 0, which signals that a string follows.

This works for strings up to 55 bytes long, which we call short strings.
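The short-string rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `encode_short_string` is made up for this example, and the sketch ignores the single-byte special case.

```python
def encode_short_string(data: bytes) -> bytes:
    """Prefix a payload of at most 55 bytes with 0x80 + its length."""
    if len(data) > 55:
        raise ValueError("the short-string form only covers payloads up to 55 bytes")
    # 0x80 is binary 10000000: the high bits 1 and 0 mark a string,
    # and the low six bits carry the length.
    return bytes([0x80 + len(data)]) + data


print(encode_short_string(b"cat").hex())  # 83636174
```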

Encoding Longer Strings

For the length byte, with the high two bits set to 10, that leaves 64 possibilities, with this range of binary values:

10000000 to 10111111

Of the values in this range, we reserve the upper 8 for longer strings. That leaves the lower 56 values for short strings, covering lengths from 0 to 55. (A length of 0 is the empty string, which is encoded as the single byte 80.) So here are the ranges:

10000000 to 10110111: Short strings (or 80 to B7 in hex)
10111000 to 10111111: Long strings (or B8 to BF in hex)
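As a rough illustration of these two ranges, the sketch below inspects a prefix byte with bit operations. The function name `classify_string_prefix` is hypothetical and only covers the string ranges described so far.

```python
def classify_string_prefix(first_byte: int) -> str:
    """Classify a prefix byte whose two high bits are 1 and 0."""
    if first_byte >> 6 != 0b10:
        return "not a string prefix"
    low_bits = first_byte & 0b00111111  # the remaining six bits, 0 to 63
    # The lower 56 values are short strings, the upper 8 are long strings.
    return "short string" if low_bits <= 55 else "long string"


print(classify_string_prefix(0x83))  # short string
print(classify_string_prefix(0xB8))  # long string
```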


Now let’s look at how we encode longer strings, meaning strings of 56 bytes or more. At this point, forget binary, and think in terms of hex. First write the length as a big-endian number using as few bytes as possible. If the length fits in 1 byte, add 1 to B7 to get B8, store that in the first byte, and follow it with the length byte. If the length is much longer, say 1000 bytes, which is 03E8 in hex and takes up 2 bytes (03 and E8), then instead you add 2 to B7 to get B9. So you would store B9, followed by 03, then E8 (most significant byte first, hence big endian), and then the string itself. In other words, the first byte is B7 plus the number of bytes the length value takes up.

Here’s a full table of lengths:

First byte    Bytes used to store the length
B8            1 byte
B9            2 bytes
BA            3 bytes
BB            4 bytes
BC            5 bytes
BD            6 bytes
BE            7 bytes
BF            8 bytes

So again, if your start byte is B8, then that’s followed by a single byte that holds the length of the string that follows, which means the length can be from 56 to 255. (Lengths of 55 or less must use the short form.)

If your start byte is B9, then that’s followed by two bytes that hold the length of the string. Two bytes can represent lengths from 256 up to 65,535.

Note: Technically, you could write any length from 0 to 255 using 2 bytes by leading with a 00 byte, but this is not allowed in RLP. RLP requires the minimum number of bytes possible.

Note: Yes, a string whose length takes up 8 bytes would be astronomically long. A 4-byte length already maxes out at 4,294,967,295 bytes, or about 4.29 gigabytes, and an 8-byte length can go up to 18,446,744,073,709,551,615. Ethereum nodes would certainly reject anything that large. However, technically the Ethereum specification allows for data that big; consider it future-proofing!
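The long-string steps above can be sketched as follows. The name `encode_long_string` is an assumption made for this example; the sketch only handles payloads of 56 bytes or more and does not enforce the 8-byte cap on the length field.

```python
def encode_long_string(data: bytes) -> bytes:
    """Encode a payload of 56+ bytes: prefix, big-endian length, then the data."""
    n = len(data)
    if n <= 55:
        raise ValueError("payloads of 55 bytes or fewer must use the short form")
    # Minimal big-endian representation of the length (no leading zero bytes).
    length_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # First byte is B7 plus the number of bytes the length takes up.
    return bytes([0xB7 + len(length_bytes)]) + length_bytes + data


encoded = encode_long_string(b"a" * 1000)
print(encoded[:3].hex())  # b903e8
print(len(encoded))       # 1003
```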

Special case: Single Byte Strings

Because one goal is to keep things as compact as possible, Ethereum gave us a special case: single-byte strings. If the string is a single byte whose high bit is not set (hex values 00 to 7f), you simply store the byte itself, with no prefix.

So, for example, if your string consists of the capital letter A, which is 41 in hex, then your entire RLP encoding would simply be:

0x41
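Putting the three string rules together, a string encoder might look like the sketch below. The name `encode_string` is hypothetical, and the code is an illustration under the rules described in this article rather than a vetted library.

```python
def encode_string(data: bytes) -> bytes:
    """Encode a byte string using the single-byte, short, or long form."""
    if len(data) == 1 and data[0] < 0x80:
        return data  # single byte 00 to 7f: the byte is its own encoding
    if len(data) <= 55:
        return bytes([0x80 + len(data)]) + data
    n = len(data)
    length_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0xB7 + len(length_bytes)]) + length_bytes + data


print(encode_string(b"A").hex())     # 41
print(encode_string(b"\x80").hex())  # 8180
```

Note that a single byte of 80 or above still needs the 81 prefix, since a bare byte in that range would be read as a length prefix.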

Encoding Lists

To encode a list, we use the same method as for strings, except we set both of the two high bits, meaning the ranges go from:

11000000 to 11110111: Short lists (or C0 to F7 in hex)
11111000 to 11111111: Long lists (or F8 to FF in hex)

Then from there, we follow the same procedure as above, except that the length we store is not the number of items but the length in bytes of the entire payload, that is, the concatenated encodings of all the items. Let’s consider a short list first, that is, a list whose payload is 55 bytes or fewer. What can go in it? Strings and other lists, including empty ones. (We’ll look at empty values shortly.) Let’s first just consider two strings in a list, cat and dog. Each string encodes to 4 bytes, so the payload is 8 bytes long and the first byte is c0 plus 8, or c8:

c8 83 63 61 74 83 64 6f 67
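The cat-and-dog list can be reproduced with a short sketch. Both function names are made up for this example, and the code is limited to short strings and short lists to keep it small.

```python
def encode_short_item(data: bytes) -> bytes:
    """Encode a string of at most 55 bytes (single-byte case included)."""
    if len(data) == 1 and data[0] < 0x80:
        return data
    if len(data) > 55:
        raise ValueError("this sketch only handles short strings")
    return bytes([0x80 + len(data)]) + data


def encode_short_list(items: list) -> bytes:
    """Encode a flat list of byte strings whose payload is at most 55 bytes."""
    payload = b"".join(encode_short_item(item) for item in items)
    if len(payload) > 55:
        raise ValueError("this sketch only handles short lists")
    # The prefix carries the payload length in bytes, not the item count.
    return bytes([0xC0 + len(payload)]) + payload


print(encode_short_list([b"cat", b"dog"]).hex())  # c88363617483646f67
```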

Lists of Lists

You can also encode lists within a list. Let’s consider the following list:

[ “cat”, [ “apple”, “banana” ], “dog”]

To encode this, we simply follow the above rules. The inner list, [“apple”, “banana”] gets encoded using the same rules as above. We encode “apple” and “banana” individually as 85 61 70 70 6c 65 and 86 62 61 6e 61 6e 61. Together as a list, their payload has length 13, and so the inner list becomes:

cd 85 61 70 70 6c 65 86 62 61 6e 61 6e 61

where the length (13) plus c0 gives us cd.

We encode cat and dog individually as we did before. Cat becomes 83 63 61 74 and dog becomes 83 64 6f 67. Now we combine all three items: cat, the inner list, and dog. Taken together they give us:

83 63 61 74    cd 85 61 70 70 6c 65 86 62 61 6e 61 6e 61    83 64 6f 67

That entire payload has length of 22. We add that to c0 to get d6. And that gives us the entire RLP encoding as:

d6 (payload length)

83 63 61 74 (“cat”)

cd 85 61 70 70 6c 65 86 62 61 6e 61 6e 61 (“apple”, “banana”)

83 64 6f 67 (“dog”)

Or combined into a single line:

d6 83 63 61 74 cd 85 61 70 70 6c 65 86 62 61 6e 61 6e 61 83 64 6f 67

And written as a single, long hex string:

0xd683636174cd856170706c658662616e616e6183646f67

From there, encoding lists of lists of lists, and so on, is simple, as you follow the same procedure.
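Because the procedure is the same at every level, a recursive function captures it naturally. The following is a minimal sketch, assuming items are either Python `bytes` or `list` objects; the names `encode_length` and `rlp_encode` are chosen for illustration.

```python
def encode_length(length: int, offset: int) -> bytes:
    """Build the prefix: offset is 0x80 for strings and 0xC0 for lists."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    # offset + 55 is B7 for strings and F7 for lists.
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Recursively encode a byte string or a (possibly nested) list."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        return encode_length(len(item), 0x80) + item
    if isinstance(item, list):
        payload = b"".join(rlp_encode(child) for child in item)
        return encode_length(len(payload), 0xC0) + payload
    raise TypeError("this sketch only encodes bytes and lists")


nested = [b"cat", [b"apple", b"banana"], b"dog"]
print("0x" + rlp_encode(nested).hex())
# 0xd683636174cd856170706c658662616e616e6183646f67
```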

Encoding Numbers (specifically Ethereum Values)

How do we encode numbers? Remember that RLP was built specifically for Ethereum, and most of the numbers you’ll encounter are currency values. The general approach to encoding Ethereum values is to express the amount in the smallest currency unit, so that it is a whole number with no decimal part, and then encode that number’s big-endian bytes as a string.

The smallest unit in Ethereum is 1 wei. There are 10^18 wei in a single ETH. Values are stored as whole numbers of wei.

So if you’re storing 1.000234567 ETH, you would first convert that to wei, which is:

1000234567000000000

Then you would convert that number to hex, which gives us:

0de18c0a0a1a0600

(Note that we need an even number of hex digits, so we pad a zero on the front.) Then from there you follow the same procedure as with strings. That’s 8 bytes, so we precede this with 88, giving us the following RLP encoding:

88 0d e1 8c 0a 0a 1a 06 00
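The number procedure can be sketched as: convert to the shortest big-endian byte string, then encode that as a string. The function name `encode_int` is hypothetical, the sketch is limited to values that fit in 55 bytes, and it treats zero as the empty byte string, which is an assumption based on the minimum-bytes rule rather than something spelled out above.

```python
def encode_int(value: int) -> bytes:
    """Encode a non-negative integer as its minimal big-endian bytes."""
    if value < 0:
        raise ValueError("only non-negative integers can be encoded")
    # bit_length() is 0 for zero, so zero becomes the empty byte string.
    raw = value.to_bytes((value.bit_length() + 7) // 8, "big")
    if len(raw) == 1 and raw[0] < 0x80:
        return raw
    if len(raw) > 55:
        raise ValueError("this sketch only handles integers up to 55 bytes")
    return bytes([0x80 + len(raw)]) + raw


one_eth_in_wei = 10**18
print(encode_int(one_eth_in_wei).hex())  # 880de0b6b3a7640000
print(encode_int(1024).hex())            # 820400
print(encode_int(15).hex())              # 0f
```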

Encoding Structs: Agree on the field order

Encoding structs requires agreement on both ends on the order of the named fields. The names of the fields don’t get encoded, only the values do. So if you want to encode the following struct:

{
  "gasPrice": 20000000000,
  "to": "0x0000000000000000000000000000000060138453",
  "value": 1000000000000000000
}

you would simply encode the values as a list:

[20000000000, "0x0000000000000000000000000000000060138453", 1000000000000000000]

The key is that when you decode it, you have to know the order of the fields: gasPrice, to, value.
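As a rough sketch of the idea, the code below encodes the struct above as a list in an agreed field order. The constant `FIELD_ORDER` and the helper names are assumptions made for this example, and treating the `to` field as a 20-byte address is a convention assumed here, not something the struct itself states.

```python
FIELD_ORDER = ("gasPrice", "to", "value")  # hypothetical order agreed by both ends


def encode_bytes(data: bytes) -> bytes:
    """Encode a byte string of at most 55 bytes."""
    if len(data) == 1 and data[0] < 0x80:
        return data
    if len(data) > 55:
        raise ValueError("this sketch only handles short strings")
    return bytes([0x80 + len(data)]) + data


def int_to_bytes(value: int) -> bytes:
    """Minimal big-endian bytes of a non-negative integer."""
    return value.to_bytes((value.bit_length() + 7) // 8, "big")


def encode_struct(struct: dict) -> bytes:
    """Encode only the values, in FIELD_ORDER; the field names are dropped."""
    raw = {
        "gasPrice": int_to_bytes(struct["gasPrice"]),
        "to": int(struct["to"], 16).to_bytes(20, "big"),  # assumed 20-byte address
        "value": int_to_bytes(struct["value"]),
    }
    payload = b"".join(encode_bytes(raw[name]) for name in FIELD_ORDER)
    if len(payload) > 55:
        raise ValueError("this sketch only handles short lists")
    return bytes([0xC0 + len(payload)]) + payload


tx = {
    "gasPrice": 20000000000,
    "to": "0x0000000000000000000000000000000060138453",
    "value": 1000000000000000000,
}
encoded = encode_struct(tx)
print(encoded.hex())
print(hex(encoded[0]))  # 0xe4, that is c0 plus the 36-byte payload
```

A decoder would read the three values back in the same order and reattach the names itself.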

Special Case: null and empty values

Sometimes when encoding objects, one member might be null. For that we need to encode an empty value. Technically, RLP doesn’t have a “null” value; instead, it provides two options: an empty string or an empty list. If you need to encode an empty string, you simply choose 0 for the length, and follow it by nothing, meaning you use:

80

Or for an empty list:

c0
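Both empty values fall out of the ordinary rules with a length of 0, as this small sketch shows. Mapping Python's `None` to the empty string is one possible convention assumed here for illustration; the function name `rlp_encode_small` is hypothetical and only covers payloads of 55 bytes or fewer.

```python
def rlp_encode_small(item) -> bytes:
    """Encode bytes, lists, or None (as an empty string); short forms only."""
    if item is None:
        item = b""  # assumed convention: a null member becomes the empty string
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item
        prefix_base, payload = 0x80, item
    elif isinstance(item, list):
        prefix_base = 0xC0
        payload = b"".join(rlp_encode_small(child) for child in item)
    else:
        raise TypeError("this sketch only encodes bytes, lists, and None")
    if len(payload) > 55:
        raise ValueError("this sketch only handles payloads up to 55 bytes")
    return bytes([prefix_base + len(payload)]) + payload


print(rlp_encode_small(b"").hex())                      # 80
print(rlp_encode_small([]).hex())                       # c0
print(rlp_encode_small([b"cat", None, b"dog"]).hex())   # c9836361748083646f67
```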