How to convert an integer to base64 in Python

I found a bunch of solutions on the web and noticed that they can give entirely different results if you overlook a few assumptions.

First of all, the input to the base64 algorithm is an array of bytes, which in Python 2 is a plain string (str). Thus, before converting an integer to base64, you need to convert the integer to an array of bytes. To do so, you first need to decide the byte order: big endian or little endian. A good solution is the Python module struct. It can pack an integer into binary format in either byte order, and it can also handle signed integers, different widths, etc.
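As a small illustration of why the byte order matters, the sketch below packs the same integer in both orders and base64-encodes each result. It is written for Python 3, where struct.pack returns bytes, and the helper name pack_and_encode is made up for this example.

```python
import base64
import struct

def pack_and_encode(n, fmt):
    """Pack n with the given struct format, then base64-encode the bytes."""
    raw = struct.pack(fmt, n)
    return raw, base64.b64encode(raw).decode("ascii")

# '<Q' = little-endian unsigned 64-bit, '>Q' = big-endian unsigned 64-bit.
little_raw, little_b64 = pack_and_encode(12345, "<Q")
big_raw, big_b64 = pack_and_encode(12345, ">Q")

print(little_raw.hex(), little_b64)  # 3930000000000000 OTAAAAAAAAA=
print(big_raw.hex(), big_b64)        # 0000000000003039 AAAAAAAAMDk=
```

The two strings have nothing in common beyond their length, so the encoder and the decoder have to agree on the byte order.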

I am using an unsigned long long, which has 64 bits, in little-endian order. The simplest solution could be:

>>> import struct
>>> n = 12345
>>> struct.pack('<Q', n).encode("base64").strip()
'OTAAAAAAAAA='

However, the reason I need base64 here is that it is more compact than the decimal representation of an integer. The example above has a lot of A’s, which come from the zero bytes that pad the value out to a full unsigned long long. Also, the example uses the encode method of the string to generate base64. Better base64 support comes from the base64 module, which also provides a URL-safe variant.
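To make the size argument concrete, here is a rough Python 3 comparison of the decimal length of a number against the length of its unpadded base64 form once the zero bytes are dropped. The helper names are hypothetical, and int.to_bytes is used instead of struct only because it gives the shortest byte string directly.

```python
import base64

def minimal_bytes(n):
    """Shortest little-endian byte string for a non-negative int (at least 1 byte)."""
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "little")

def b64_length(n):
    """Length of the URL-safe base64 text for n with '=' padding removed."""
    return len(base64.urlsafe_b64encode(minimal_bytes(n)).rstrip(b"="))

# Print: the number, its decimal length, its base64 length.
for n in (12345, 2**32 - 1, 2**64 - 1):
    print(n, len(str(n)), b64_length(n))
```

The saving grows with the size of the number: 12345 goes from 5 characters to 3, while the largest 64-bit value goes from 20 characters to 11.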

The following example solves these problems. I also strip the trailing = padding characters.


import base64
import struct

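def encode(n):
  # Pack as a little-endian unsigned 64-bit value, then drop the
  # high-order zero bytes, which sit at the end in little endian.
  data = struct.pack('<Q', n).rstrip('\x00')
  if len(data) == 0:
    # n == 0 strips down to nothing; keep a single zero byte.
    data = '\x00'
  # URL-safe base64, without the trailing '=' padding.
  return base64.urlsafe_b64encode(data).rstrip('=')

def decode(s):
  # Python 2's decoder tolerates surplus '=' padding, so always append two.
  data = base64.urlsafe_b64decode(s + '==')
  # Restore the stripped high-order zero bytes before unpacking.
  n = struct.unpack('<Q', data + '\x00' * (8 - len(data)))
  return n[0]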

One detail when stripping zeros is how to handle the integer zero itself. If we simply remove all trailing zeros, we end up with an empty string. Thus, I keep a single zero byte if the integer is zero. In decode, we have to pad the variable data with zeros back to 8 bytes before unpacking it as an unsigned long long. Notice that the padding goes at the end, which matches the little-endian order.
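The point about the padding order can be checked with a short Python 3 sketch: the stripped bytes of 12345 only unpack back to 12345 when the zeros are appended at the end, which is where little endian keeps the high-order bytes. The variable names are just for illustration.

```python
import struct

# Bytes 0x39 0x30 remain after the six zero bytes are stripped.
stripped = struct.pack("<Q", 12345).rstrip(b"\x00")

# Zeros appended at the end: correct for little endian.
right = struct.unpack("<Q", stripped.ljust(8, b"\x00"))[0]

# Zeros prepended at the front: moves the value into the high-order bytes.
wrong = struct.unpack("<Q", stripped.rjust(8, b"\x00"))[0]

print(right)           # 12345
print(wrong == right)  # False
```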

Run some tests:

>>> print encode(0)
AA

>>> print decode('AA')
0

>>> print encode(12345)
OTA

>>> print decode('OTA')
12345

The code is for Python 2.5.
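On Python 3 the code above does not run as written, because struct.pack returns bytes rather than str and print is a function. A possible port is sketched below. It is an adaptation rather than the original code: the names encode_int and decode_int are made up here, and it computes the exact amount of = padding instead of always appending two.

```python
import base64
import struct

def encode_int(n):
    """Encode an unsigned 64-bit integer as unpadded URL-safe base64 text."""
    # Little-endian pack, then drop the high-order zero bytes at the end.
    # For n == 0 everything is stripped, so fall back to one zero byte.
    data = struct.pack("<Q", n).rstrip(b"\x00") or b"\x00"
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def decode_int(s):
    """Inverse of encode_int."""
    # Restore the '=' padding so the length is a multiple of 4.
    padded = s + "=" * (-len(s) % 4)
    data = base64.urlsafe_b64decode(padded)
    # Put the stripped zero bytes back at the end (little endian).
    return struct.unpack("<Q", data.ljust(8, b"\x00"))[0]

print(encode_int(0), decode_int("AA"))       # AA 0
print(encode_int(12345), decode_int("OTA"))  # OTA 12345
```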

Finally, an interesting question: is it true that any string built from base64 characters is a valid base64 string? See the following example:

>>> print decode('100')
19927

>>> print decode('101')
19927

>>> print decode('102')
19927

>>> print decode('103')
19927

The results are the same. Why? I would say only 100 is a valid base64 string, but Python’s base64 decoder tolerates the other three. We know that 4 base64 chars (24 bits) represent 3 bytes. If we have only 3 base64 chars (18 bits), they can only represent 2 bytes (16 bits). Thus, the least significant 2 bits of the third char are ignored by the decoder, and those are the only bits in which 100, 101, 102, and 103 differ. An encoder always sets these spare bits to zero, which is why only 100 is what an encoder would actually produce.
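The ignored bits can be made visible with a small Python 3 sketch. It assumes the default, lenient behaviour of the standard base64 module, and the names ALPHABET and decode_bytes are introduced only for this example.

```python
import base64

# URL-safe base64 alphabet: a character's index is its 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

def decode_bytes(s):
    """Decode unpadded URL-safe base64 text with the default, lenient decoder."""
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

for s in ("100", "101", "102", "103"):
    value = ALPHABET.index(s[-1])  # 6-bit value of the last character
    print(s, decode_bytes(s).hex(), format(value, "06b"))
```

All four lines show the same two bytes, d74d, while the last column runs through 110100, 110101, 110110, and 110111, so the inputs differ only in the two low bits that the decoder throws away.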

glassfish errors in netbeans

I don’t use an IDE since I am used to editing in vim. I tried to adopt eclipse for java work several times, but I just could not keep using it. However, I recently started to see if I can adopt netbeans for java ee development. My environment is netbeans + glassfish + maven.

I ran into several problems in the beginning:

(1) NetBeans: No suitable Deployment Server is defined for the project or globally.

I found the solution here: http://wiki.netbeans.org/JAXWSNB6Maven2GlassFishV2

In Netbeans:

  1. Right-click the project and select Properties. Navigate to the Run tab.
  2. In the Server field select GlassFish V2

(2) netbeans SEC5046: Audit: Authentication refused for [admin].

Why do I see this error while I can still successfully deploy my application to glassfish from netbeans?

I found the solution here: http://forums.java.net/jive/thread.jspa?threadID=35551

Simply remove the ~/.asadminpass file.

bit.ly

If someone (say google) wants to acquire twitter but worries about its fair market value, a cheaper alternative may be to buy bit.ly, assuming the goal of the acquisition is to get crowdsourced links in real time for building a better search engine. Bit.ly is the default link shortening service for twitter, and most outgoing links on twitter are shortened by it. My experience on twitter tells me that the most valuable tweets have links to useful sites, so bit.ly aggregates most of the useful sites shared by the community. More importantly, bit.ly is a redirection service: it monitors the click rate of the links, which indicates their value.

However, tradevibes shows that bit.ly is brought to you by the same people who built, acquired, or invested in twitter, so it may not be so easy to hijack twitter in this way. Still, bit.ly is a really useful link aggregation service. In fact, bit.ly already provides a search engine for discovering links in tweets. Bit.ly may be even more useful than social bookmarking sites, since people use shortened urls not only publicly but also in private emails.

So why do people use bit.ly? Twitter’s 140-char limit is one reason, but that’s not all. Tracking is another important reason. If you host a blog yourself, you may have some way to track the click rate. But if you share a photo on flickr, or an article you happen to find, how do you know how many people really click on what you shared? Even if you host your own blog, how do you know whether people really click the 3rd-party links on it? Bit.ly is the answer. The dashboard on bit.ly distinguishes it from other url shortening services. It tracks the entire click history, including the time histogram, referrers, and locations of clicks. That’s more advanced than some virtual hosting services or blog services.

What inspires me more is this. Large Internet services all have their own private redirection/tracking services. Now this kind of component, which users never perceive directly, can become an independent startup that other services build on top of. So the question is: what other components like this are missing on the Internet? A lot.
