How to convert an integer to base64 in Python

I found a bunch of solutions on the web and noticed that they can produce entirely different results if you ignore a few assumptions.

First of all, the input to the base64 algorithm is an array of bytes, which in Python 2 is a plain string (str). Thus, before converting an integer to base64, you need to convert the integer to an array of bytes. To do so, you first need to decide on the byte order: big endian or little endian. The best solution is the Python module struct. It can convert an integer to binary format in either byte order, and it can also handle signed integers, different widths, etc.
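To make the byte-order point concrete, here is a small sketch. Note that it is written for Python 3 (where struct.pack returns bytes), not the Python 2.5 used in the rest of this post, and the variable names are just made up for illustration.

```python
import struct

n = 12345  # 0x3039

# Same integer, same width (unsigned 64-bit), two byte orders.
little = struct.pack('<Q', n)  # least significant byte first
big = struct.pack('>Q', n)     # most significant byte first

print(little.hex())  # 3930000000000000
print(big.hex())     # 0000000000003039

# '<q' is the signed 64-bit format; negative values use two's complement.
minus_one = struct.pack('<q', -1)
print(minus_one.hex())  # ffffffffffffffff
```

Since base64 encodes the bytes in the order they are given, the two byte orders produce different base64 strings for the same integer.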

I am using an unsigned long long, which has 64 bits, in little-endian order, so the simplest solution could be:

>>> import struct

>>> n = 12345

>>> struct.pack('<Q', n).encode("base64").strip()
'OTAAAAAAAAA='

However, the reason I need base64 here is that it is more compact than the decimal representation of integers. But the above example produces a lot of A's, which come from the zero bytes padded into the unsigned long long. Also, the above example uses the encode method provided by string to generate base64. Better base64 support comes from the base64 module, which also offers a URL-safe variant of base64.
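A quick sketch of both points, again in Python 3 rather than 2.5 (so base64.b64encode replaces the old str.encode("base64")); the sample bytes in the second half are chosen by me only because they hit the two symbols that differ between the alphabets.

```python
import base64
import struct

n = 12345

# Full 8-byte little-endian encoding: the six zero bytes become runs of 'A'.
full = base64.b64encode(struct.pack('<Q', n)).decode('ascii')
print(full)                    # OTAAAAAAAAA=
print(len(full), len(str(n)))  # 12 5

# The standard and URL-safe alphabets differ only in the last two symbols.
raw = b'\xfb\xff'
standard = base64.b64encode(raw).decode('ascii')
url_safe = base64.urlsafe_b64encode(raw).decode('ascii')
print(standard)  # +/8=
print(url_safe)  # -_8=
```

So for a small integer the naive encoding is actually longer than the decimal form (12 characters versus 5), which is what motivates stripping the zero bytes.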

The following example solves these problems. I also remove the padding ='s here.

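import base64
import struct

def encode(n):
  # Pack as a little-endian unsigned long long, then drop the trailing zero bytes.
  data = struct.pack('<Q', n).rstrip('\x00')
  if len(data)==0:
    # n == 0 strips down to nothing, so keep a single zero byte.
    data = '\x00'
  s = base64.urlsafe_b64encode(data).rstrip('=')
  return s

def decode(s):
  # Put back the '=' padding that encode() stripped.
  data = base64.urlsafe_b64decode(s + '==')
  # Pad with zeros back to 8 bytes before unpacking.
  n = struct.unpack('<Q', data + '\x00'* (8-len(data)) )
  return n[0]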

Note one subtlety when stripping zeros: how should the integer zero itself be handled? If we simply remove all trailing zeros, we end up with an empty string. Thus, I keep a single zero byte if the integer is zero. In decode, we have to pad the variable data with zeros back to 8 bytes so that it can be unpacked as an unsigned long long. Notice that the padding follows little-endian order too: the zeros go at the end, where the most significant bytes live.
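For readers on Python 3, the same idea might look roughly like the sketch below. This is my adaptation, not the code this post was written for: the names encode_int and decode_int are made up, byte literals replace plain strings, and I compute the exact number of '=' to restore instead of always appending '=='.

```python
import base64
import struct

def encode_int(n):
    # Pack as little-endian unsigned 64-bit, then drop the trailing zero bytes.
    data = struct.pack('<Q', n).rstrip(b'\x00')
    if not data:
        # Zero strips down to nothing, so keep a single zero byte.
        data = b'\x00'
    return base64.urlsafe_b64encode(data).rstrip(b'=').decode('ascii')

def decode_int(s):
    # Restore the '=' padding so the length is a multiple of 4 again.
    padded = s + '=' * (-len(s) % 4)
    data = base64.urlsafe_b64decode(padded)
    # Zero-fill on the right (the high bytes in little endian) up to 8 bytes.
    return struct.unpack('<Q', data.ljust(8, b'\x00'))[0]

print(encode_int(0))        # AA
print(encode_int(12345))    # OTA
print(decode_int('OTA'))    # 12345
```

Like the original, this sketch only covers values that fit in an unsigned 64-bit integer; struct.pack raises struct.error for anything negative or larger.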

Run some tests:

>>> print encode(0)
AA

>>> print decode('AA')
0

>>> print encode(12345)
OTA

>>> print decode('OTA')
12345

The code is for Python 2.5.

Finally, an interesting question: is it true that any string made of base64 characters is a valid base64 string? See the following example:

>>> print decode('100')

>>> print decode('101')

>>> print decode('102')

>>> print decode('103')

The results are the same. Why? I would say only 100 is a valid base64 string, but Python's base64 decoder tolerates the other three. We know that 4 base64 chars (24 bits) represent 3 bytes. If we have only 3 base64 chars (18 bits), they can only represent 2 bytes (16 bits). Thus, the least significant 2 bits of the third char, which is the only place where "101" through "103" differ from "100", are ignored by the base64 decoder.
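This can be checked directly on the decoded bytes. The sketch below is Python 3 and relies on the default, lenient mode of base64.b64decode; is_canonical is a hypothetical helper I made up, and re-encoding to compare is just one possible way to detect the non-canonical strings.

```python
import base64

# Decode the four 3-character strings; one '=' completes the 4-character group.
decoded = {s: base64.b64decode(s + '=') for s in ('100', '101', '102', '103')}
for s, data in decoded.items():
    print(s, data.hex())  # each line ends in d74d

def is_canonical(s):
    """Hypothetical helper: True if re-encoding the decoded bytes gives s back."""
    padded = s + '=' * (-len(s) % 4)
    return base64.b64encode(base64.b64decode(padded)).decode('ascii') == padded

print([s for s in decoded if is_canonical(s)])  # ['100']
```

The characters '0' to '3' are base64 values 52 to 55 (110100 to 110111), so they share the top four bits 1101 and differ only in the two bits that get thrown away.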
