In Base64 encoding, the length of the encoded output must be a multiple of four characters. If it is not, the output is padded with additional pad characters (=). Upon decoding, these extra padding characters are discarded. To dig deeper into padding in Base64, check out this detailed answer on Stack Overflow.
Base64url encoding
October 17, 2014
It’s often more convenient to manage data in text format rather than as binary data (for example, in a string column in a database, or in a string rendered into an HTTP response). Common examples in security are digital signatures and encryption.
Base64 is a family of related encoding schemes that represent binary data as ASCII text by translating it into a base-64 representation. Base64 encoding schemes are generally used when binary data needs to be stored in, or transferred over, media that were designed to handle text.
Basics of Base64 Encoding
A Base64 string is pretty easy to identify:
VGhpcyBpcyB3aGF0IGJhc2U2NCBsb29rcyBsaWtlIGluIHRoZSB3aWxkLgo=
There are 64 characters in the Base64 “alphabet”, and an encoded string will contain a mixture of uppercase and lowercase letters, digits, and sometimes an “=” or two (never more than two) at the end.
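To see what the sample string above actually contains, it can be passed to a standard decoder. The snippet below is a small sketch using Python's built-in `base64` module; the variable names `sample` and `decoded` are only illustrative.

```python
import base64

# The sample Base64 string quoted in the text above.
sample = "VGhpcyBpcyB3aGF0IGJhc2U2NCBsb29rcyBsaWtlIGluIHRoZSB3aWxkLgo="

# b64decode returns bytes; decode them as ASCII to read the message.
decoded = base64.b64decode(sample).decode("ascii")

# repr() makes the trailing newline character visible.
print(repr(decoded))  # 'This is what base64 looks like in the wild.\n'
```

The single “=” at the end shows that the last group of the original data held two bytes instead of three.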
The Base64 alphabet contains 64 basic ASCII characters which are used to encode data. Yeah, that’s right, 64 characters are enough to encode any data of any length. The only drawback is that the size of the result increases by about 33%, because every 3 bytes of input become 4 characters of output. However, its benefits are much more important, at least because all these symbols are available in 7-bit and 8-bit character sets.
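The size increase follows directly from the 3-bytes-to-4-characters ratio. As a rough sketch (the helper name `encoded_length` is made up for this example, and padded output is assumed), the expected length can be computed and compared with what Python's `base64` module produces:

```python
import base64
import math


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (hypothetical helper)."""
    # Every started group of 3 input bytes yields 4 output characters.
    return 4 * math.ceil(n_bytes / 3)


for n in (1, 2, 3, 300, 1000):
    actual = len(base64.b64encode(bytes(n)))
    # Columns: input size, predicted length, actual length.
    print(n, encoded_length(n), actual)
```

For inputs whose length is a multiple of three, such as 300 bytes, the output is exactly one third larger (400 characters). Shorter inputs pay slightly more because of padding.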
Characters of the Base64 alphabet can be grouped into four groups:
- Uppercase letters (indices 0-25):
ABCDEFGHIJKLMNOPQRSTUVWXYZ
- Lowercase letters (indices 26-51):
abcdefghijklmnopqrstuvwxyz
- Digits (indices 52-61):
0123456789
- Special symbols (indices 62-63):
+/
It is very important to note that the Base64 letters are case sensitive. This means that, for example, when decoding the values “QQ”, “Qq”, “qq”, and “qQ” four different results are obtained.
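A quick way to check the case sensitivity is to decode the four values and compare the results. This is only a sketch: the values are padded with “==” so that Python's standard decoder accepts them, and the default (lenient) decoding mode is assumed. The name `results` is illustrative.

```python
import base64

# Decode each two-letter value from the text, padded to four characters.
results = {
    value: base64.b64decode(value + "==")
    for value in ("QQ", "Qq", "qq", "qQ")
}

for value, raw in results.items():
    # Show the decoded byte in hexadecimal for easy comparison.
    print(value, raw.hex())
```

Each value decodes to a different byte. For instance, “QQ” gives the byte for the letter “A”.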
For a better understanding, I grouped all characters into the Base64 table:
Index | Character |
---|---|
0 | A |
1 | B |
2 | C |
3 | D |
4 | E |
5 | F |
6 | G |
7 | H |
8 | I |
9 | J |
10 | K |
11 | L |
12 | M |
13 | N |
14 | O |
15 | P |
16 | Q |
17 | R |
18 | S |
19 | T |
20 | U |
21 | V |
22 | W |
23 | X |
24 | Y |
25 | Z |
Index | Character |
---|---|
26 | a |
27 | b |
28 | c |
29 | d |
30 | e |
31 | f |
32 | g |
33 | h |
34 | i |
35 | j |
36 | k |
37 | l |
38 | m |
39 | n |
40 | o |
41 | p |
42 | q |
43 | r |
44 | s |
45 | t |
46 | u |
47 | v |
48 | w |
49 | x |
50 | y |
51 | z |
Index | Character |
---|---|
52 | 0 |
53 | 1 |
54 | 2 |
55 | 3 |
56 | 4 |
57 | 5 |
58 | 6 |
59 | 7 |
60 | 8 |
61 | 9 |
Index | Character |
---|---|
62 | + |
63 | / |
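The same table can be rebuilt in a few lines of code, which is handy when you need to look up an index or a character programmatically. The sketch below uses Python's `string` module; the constant name `BASE64_ALPHABET` is chosen for this example.

```python
import string

# Concatenate the four groups in index order.
BASE64_ALPHABET = (
    string.ascii_uppercase    # indices 0-25
    + string.ascii_lowercase  # indices 26-51
    + string.digits           # indices 52-61
    + "+/"                    # indices 62-63
)

print(len(BASE64_ALPHABET))        # number of characters in the alphabet
print(BASE64_ALPHABET[16])         # character at index 16
print(BASE64_ALPHABET.index("z"))  # index of the character "z"
```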
In addition to these characters, the equal sign (=) is used for padding. That is, the equal sign has no index and is not involved in encoding the data. The padding character ensures that the length of a Base64 value is a multiple of 4 characters, and it is always appended at the end of the output. Nevertheless, the heart of the algorithm contains only 64 characters, and each of them has a unique index. Only the indices determine which characters are used to encode the data, and only thanks to them can you “recover” the original data. All indices are listed in the Base64 table above.
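To illustrate how padding depends on the input length, here is a short sketch that encodes one, two, and three bytes with Python's `base64` module. The dictionary name `padded` is just for this example.

```python
import base64

# Encode inputs of 1, 2, and 3 bytes and keep the results as text.
padded = {
    text: base64.b64encode(text).decode("ascii")
    for text in (b"A", b"AB", b"ABC")
}

for text, encoded in padded.items():
    # The output is always 4 characters long here; only the number
    # of "=" characters at the end changes.
    print(text, encoded, len(encoded))
```

One input byte needs two padding characters, two bytes need one, and three bytes need none.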
Given all of the above, a Base64 value can be defined using the following regular expression:
^[A-Za-z0-9+/]+={0,2}$
However, some standards allow and even require the use of multi-line values. In such cases, we need to supplement the list of characters, by allowing “Line Feed” and “Carriage Return”.
^[A-Za-z0-9+/\r\n]+={0,2}$
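The two expressions can be tried out as follows. This is a minimal sketch: the function name `looks_like_base64` and its `multiline` flag are invented for the example, and the carriage return and line feed are written as the escapes `\r` and `\n`. Note that these patterns only check which characters appear; they do not verify that the length is a multiple of four.

```python
import re

# Single-line form: alphabet characters followed by up to two "=".
SINGLE_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")
# Multi-line form: additionally allows carriage return and line feed.
MULTI_LINE = re.compile(r"^[A-Za-z0-9+/\r\n]+={0,2}$")


def looks_like_base64(value: str, multiline: bool = False) -> bool:
    """Check a value against one of the patterns (hypothetical helper)."""
    pattern = MULTI_LINE if multiline else SINGLE_LINE
    # fullmatch requires the whole string to match, including any
    # trailing newline that "$" alone would tolerate.
    return pattern.fullmatch(value) is not None


print(looks_like_base64("QUJD"))
print(looks_like_base64("QUJD!"))
print(looks_like_base64("QU\r\nJD"))
print(looks_like_base64("QU\r\nJD", multiline=True))
```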
The Base64 encoding algorithm converts any data into plain text. Technically, it regroups the bits of eight-bit bytes into six-bit groups. To understand how the encoding algorithm works, check the example below, which describes step by step how to manually encode strings to Base64 (if you are looking for an automatic converter, use the Base64 online encoder).
For example, you have the “ABC” string and want to convert it to Base64:
- First, you need to split the string letter by letter. Thus, you got 3 groups:
A
B
C
- Next you need to convert each group to binary. To do this, for each letter you need to find the corresponding binary value in the ASCII table. Thus, now you have 3 groups of ones and zeros:
01000001
01000010
01000011
- Now concatenate all the binary values together (that is, glue all the groups along and make sure you get a total of 24 characters):
010000010100001001000011
- Then, divide the resulting string into groups so that each one has 6 characters (if the last group has less than 6 characters, you need to fill it with zeros until it reaches the desired length). Well and good, now you have 4 groups:
010000
010100
001001
000011
- At this step you have to convert six-bit bytes into eight-bit bytes. To do this, prepend the prefix “00” (two zeros) in front of each group:
00010000
00010100
00001001
00000011
- Now convert each group from binary to decimal. If you did everything right, each group will be transformed into an integer as follows:
16
20
9
3
- The integers obtained in the previous step are called “Base64 indices”. They are easy to remember, because the numbering is zero-based and the first indices correspond to Latin letters in alphabetical order, starting with the letter “A” (i.e., A=0, B=1, C=2, D=3, and so on). For the complete list, see the Base64 Characters Table. So, look up each index and convert it to the corresponding character:
Q
U
J
D
- Finally, concatenate all the letters to get the Base64 string:
QUJD
To summarize, you learned that encoding “ABC” to Base64 yields the result “QUJD”. As you can see, this is a very simple process and you can encode text to Base64 even by hand. I hope that you managed to get the right encoding result. Otherwise, let me know and I will try to help you.
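The manual steps above can also be written down as a short program. The following is a sketch rather than a reference implementation: the function name `encode_by_hand` is made up, and the final padding step (appending “=” until the length is a multiple of 4) is added so that inputs shorter than three bytes match the standard output as well.

```python
import base64

# The 64 characters of the Base64 table, in index order.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_by_hand(data: bytes) -> str:
    """Encode bytes to Base64 by following the manual steps (hypothetical helper)."""
    # Convert each byte to 8 binary digits and glue the groups together.
    bits = "".join(f"{byte:08b}" for byte in data)
    # Fill the last group with zeros, then split into 6-character groups.
    bits += "0" * (-len(bits) % 6)
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
    # Convert each group from binary to its decimal Base64 index.
    indices = [int(group, 2) for group in groups]
    # Map every index to its character and concatenate the result.
    encoded = "".join(ALPHABET[index] for index in indices)
    # Pad with "=" so the length is a multiple of 4.
    return encoded + "=" * (-len(encoded) % 4)


print(encode_by_hand(b"ABC"))  # QUJD
# Cross-check against the standard library encoder.
print(base64.b64encode(b"ABC").decode("ascii"))
```

Prepending “00” to each six-bit group does not change its numeric value, so the code skips that step and converts the six-bit groups to integers directly.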