bit.ly

If someone (say, Google) wants to acquire Twitter but worries about its fair market value, a cheaper alternative may be to buy bit.ly, assuming the goal of the acquisition is to get crowdsourced links in real time for building a better search engine. Bit.ly is the default link-shortening service for Twitter, and most outgoing links on Twitter are shortened by it. My experience on Twitter tells me that the most valuable tweets have links to useful sites, so bit.ly aggregates most of the useful sites shared by the community. More importantly, bit.ly is a redirection service: it monitors the click rate of each link, which indicates the value of the link.

However, TradeVibes shows that bit.ly is brought to you by the same people who built, acquired, or invested in Twitter, so it may not be so easy to hijack Twitter in this way. Still, bit.ly is a really useful link aggregation service. In fact, bit.ly already provides a search engine for discovering tweeted links. Bit.ly may be even more useful than social bookmarking sites, since people use shortened URLs not only publicly but also in private emails.

So why do people use bit.ly? Twitter’s 140-character limit is one reason, but that’s not all. Tracking is another important reason. If you host a blog yourself, you may have some way to track the click rate. But if you share a photo on Flickr, or an article you happened to find, how do you know how many people really click on what you shared? Even if you host your own blog, how do you know whether people really click the third-party links on it? Bit.ly is the answer. The dashboard distinguishes bit.ly from other URL-shortening services. It tracks the entire click history, including the time histogram, referrers, and locations of clicks. That’s more advanced than what some virtual hosting or blog services offer.

What inspires me more is this: large Internet services all have their own private redirection and tracking services. Now this kind of component, which users never perceive directly, can become an independent startup that other services build on top of. So the question is: what other components like this are missing on the Internet? A lot.

Posted in web

Char By Char Synchronization

Today Google is going to launch Google Wave to 100K users. Wave is a new form of communication channel that makes group email more like a wiki plus an instant-message conference. One important feature is character-by-character synchronization among all participants. An interview with Wave core engineer Dhanji Prasanna shows that the synchronization part can be traced back to a 1995 paper, High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System.

I am not sure whether char-by-char synchronization is really useful for email. I feel it would be the first thing I disable in my email client if it existed. I prefer the opposite direction: giving a second thought to everything I send. Isn’t that another Google project, undo send? (Actually, that dream may come true by replacing distributed emails with a central wave.) But I do see other scenarios where char-by-char synchronization is very interesting. Live collaboration in Google Docs really gave Google a huge edge over Microsoft. A new social answers startup, flusher, distinguishes itself from other answer sites with instant answers. Also, some professional traders really want to share their trading actions in a more live way. Social needs to go real time.

Posted in web

Connection Close in HttpClient

I recently made a mistake using Java Jakarta Commons HttpClient. I decided to dig deeper into the issue.

My code uses HttpClient to send HTTP requests to another machine. The load is very high. Over time, I often saw “Too many open files” exceptions in the logs, although the problem sometimes recovered on its own. Using netstat, I found a lot of TCP connections in the CLOSE_WAIT state on the machine. So the problem was that the application did not close its connections.

My code is very similar to the example in the HttpClient tutorial.

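import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

HttpClient client = new HttpClient();
GetMethod httpget = new GetMethod("http://www.whatever.com/");
try {
    client.executeMethod(httpget);
    ...
} finally {
    // Returns the connection to the connection manager; it does not close the socket.
    httpget.releaseConnection();
}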

The code calls releaseConnection at the end, as the tutorial specifies. But what does this method do? To understand it, we need to understand what is behind the HttpClient object. Each HttpClient has an HttpConnectionManager responsible for maintaining connections. If we don’t pass an HttpConnectionManager to the constructor, HttpClient instantiates a SimpleHttpConnectionManager by default. SimpleHttpConnectionManager maintains a single connection and can be used by only one thread at a time. Its main job is to keep the connection alive in case the next request goes to the same host. So the releaseConnection call above does not close the socket. If the next method to be executed targets a different host, the manager closes the prior connection at that time. Otherwise, it may reuse the connection.
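The reuse rule described above can be pictured with a toy model. This is only a sketch of the behavior as described in this post, not the library’s code: the class SingleConnectionModel and its methods are hypothetical names, and hosts are plain strings instead of real sockets.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, for illustration only) of a manager that
// caches exactly one connection and reuses it for the same host.
public class SingleConnectionModel {
    private String connectedHost;                       // host of the cached connection, or null
    private final List<String> log = new ArrayList<>(); // records what happened to the "socket"

    /** Obtains the connection for a request to the given host. */
    public void acquire(String host) {
        if (connectedHost == null) {
            log.add("open " + host);
        } else if (connectedHost.equals(host)) {
            log.add("reuse " + host);
        } else {
            // A different host: only now is the previous connection closed.
            log.add("close " + connectedHost);
            log.add("open " + host);
        }
        connectedHost = host;
    }

    /** Releasing only marks the connection as idle; nothing is closed. */
    public void release() {
        log.add("release (still open)");
    }

    public List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        SingleConnectionModel model = new SingleConnectionModel();
        model.acquire("a.example");
        model.release();
        model.acquire("a.example");
        model.release();
        model.acquire("b.example");
        for (String line : model.log()) {
            System.out.println(line);
        }
    }
}
```

In this model, a client that is released and then abandoned never reaches a "close" step, which is the situation that leaves a socket open.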

The mistake I made was creating HttpClient objects on demand instead of reusing a single instance, as the HttpClient documentation recommends. Each abandoned client still holds its open connection. If the peer closes the socket first (by sending FIN), the connection stays in the CLOSE_WAIT state on my side until my application closes the socket. CLOSE_WAIT is a state that does not time out. (It is not TIME_WAIT.) It is the application’s responsibility to close the socket. So how do we force HttpClient to close the socket? The HttpConnectionManager interface does not define a way to do it, but SimpleHttpConnectionManager has provided a shutdown method since version 3.1. So one possible way to close the connection is as follows.


HttpConnectionManager mgr = client.getHttpConnectionManager();
if (mgr instanceof SimpleHttpConnectionManager) {
    ((SimpleHttpConnectionManager)mgr).shutdown();
}
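The CLOSE_WAIT situation can also be reproduced with plain JDK sockets, without HttpClient. The following is a minimal sketch under that assumption; CloseWaitDemo and peerClosedButSocketStillOpen are hypothetical names chosen for illustration. A local server accepts a connection and closes it immediately, and the client observes end-of-stream while its own socket is still open.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {

    /**
     * Connects to a local server that closes its side immediately.
     * Returns true if the client saw end-of-stream (the peer's FIN) while
     * its own Socket had not yet been closed by the application.
     */
    public static boolean peerClosedButSocketStillOpen() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            try {
                // The "peer" accepts the connection and closes it at once, which sends FIN.
                server.accept().close();

                // read() returns -1 once the FIN has arrived. From this point on,
                // the client side of the connection sits in CLOSE_WAIT.
                int b = client.getInputStream().read();

                // The Socket object is still open: only close() releases the descriptor.
                return b == -1 && !client.isClosed();
            } finally {
                client.close(); // the application-level close that ends CLOSE_WAIT
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("peer closed, local socket still open: "
                + peerClosedButSocketStillOpen());
    }
}
```

If the close() in the finally block were skipped, the file descriptor would stay allocated until something else closed the socket, which is how descriptors can pile up under load.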

But why isn’t the problem deterministic? Shouldn’t it never recover once it starts to happen? The magic is Java garbage collection. I reproduced the effect by forcing a garbage collection, which cleaned up the CLOSE_WAIT connections. To be accurate, though, JVM garbage collection does not close sockets by itself; it only frees memory. It is the socket object that closes the underlying socket in its finalize method when it is collected.

SimpleHttpConnectionManager is not thread safe. If you need to maintain a reusable HttpClient instance shared by multiple threads, you should use MultiThreadedHttpConnectionManager. For example,


protected static HttpClient m_client = null;
static {
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    mgr.getParams().setDefaultMaxConnectionsPerHost(1000);
    mgr.getParams().setMaxTotalConnections(1000);
    m_client = new HttpClient(mgr);
}

To win a game that is impossible to win

To win a game that is impossible to win, you first need to change the rules.

When Microsoft was busy fixing IE’s security problems, Google introduced a browser that was the fastest at executing JavaScript. Who cared about differences in JavaScript execution speed at that time? But that was a new rule for comparing browsers. Gradually Chrome gave me the impression that it is “fast”.

After I upgraded to Firefox 3.5 beta, a browser with an unbelievably long startup time, I started to look for alternatives and vaguely remembered that Chrome was somehow “fast”. So I started to use Chrome and did feel that it is fast. All the web sites I regularly visit work very well in it.

Now Google is starting to promote HTML5 through its strategic products. IE, once the king of browsers, needs Google Chrome Frame to fully support HTML5.

Whether or not Google wins this game, what impressed me is the consistency of its strategy: it works to win a game from every angle, patiently, year after year.

Posted in web

One Success Factor of Social Network Services

I have always wondered what the essence of social networks is. How do you start a new social network service? What makes a social network service successful?

This summer is a singing season in China. Two singing competition series were broadcast live during the summer by two TV stations. One is aimed more at teenagers and resembles American Idol; I did not watch it. The other, named Do You Remember, is my favorite. This program invented a competition model that tests how accurately a singer can remember the lyrics of popular or classic songs while singing them well. Moreover, the producers keep inventing new rules for the competition. For example, in last week’s show they let the audience vote by cellphone message to determine the winners, but you could vote only during the 180-second window when the singer you supported was performing on stage. I can imagine that the fans could not leave the screen even for a moment for fear of missing the window. How long was the show? It lasted almost 4 hours.

TV as a media service is changing. TV stations built on traditional movies or TV series cannot compete with DVDs or the Internet, where the audience can watch any program at any time. This is especially true in China. So what are the top TV players in China doing? They are playing games with the audience, live social games like the shows above. Only in this way can they hold the audience’s attention for 4 hours continuously.

Internet media services face the same problem: given a set of users, how do you increase the time users spend on the service? As a consequence, content-based media services are losing users to social networks. How, then, do you create a successful social web site? I think one success factor is playing games with your users.

Yahoo recently launched the English version of Meme quietly and, oddly enough, a lot of Chinese users flooded in. Yahoo Meme played a very good game in this launch. First, it is an invite-only service. Notice that this differs from other Yahoo bucket tests, where Yahoo selectively invites users. Here, users have the right to invite others. This is an old game invented by Gmail. Even so, as a result, many Chinese users asked each other on Twitter for an invitation email so they could see this mysterious Meme. Second, after users log in, they immediately notice a re-post button on each post. Yes, this is the retweet, one of the most important features of Twitter. It is always true that the majority of social network users cannot generate novel content. Reposting at least allows them to participate in the game.

If a social network is a game, then what’s the essence of the game? There are many answers. One important one is creativity: not the game creator’s creativity, but the users’ creativity. A good game should arouse users’ interest in discovering or even inventing! Twitter is a good example of a social game. It took me almost half a year to understand Twitter better and better: at the beginning, how to find interesting users to follow; in the end, how to get others to follow me. Twitter is also a good example of user invention. An article in MIT Technology Review, “The Evolution of Retweeting”, described how Twitter users invented the reply, the hashtag, and RT. What Twitter did in this game was simply follow its users. Remember the most important lesson from the trading market? Follow the trend, don’t predict.

Let’s imagine what Yahoo would have done if it had acquired Twitter in 2007, when Twitter had not yet reached a critical mass of users. Assume Yahoo realized this could be the most important social network service of the following years and decided to boost it. First of all, integrate Twitter with the Yahoo homepage. If the result is not impressive after that, then what? Integrate with search results? Good idea. Let’s select the most valuable tweets and show them above the search results as shortcuts to get more traffic. Hmm, the result is still not good enough. OK, let’s use machine learning and NLP to understand users’ tweets and rewrite them as search queries, so that search results can satisfy our users. That final solution may never get a chance to show its results, since the product would be cut before then. So what’s wrong? Traffic acquisition is important, but it is still an old-school solution for content-based media services. It is like using Yahoo’s existing services as a truck to pull a rocket while the rocket’s fuel is never ignited.

Install pygtk

First, download gtk from

http://www.gtk.org/download-windows.html

I downloaded the all-in-one bundle

http://ftp.gnome.org/pub/gnome/binaries/win32/gtk+/2.16/gtk+-bundle_2.16.6-20090911_win32.zip

Unzip the file to c:\gtk

Then I modified the Windows environment variable PATH by appending C:\gtk\bin to it. (Right-click “My Computer”, click “Properties”, “Advanced”, “Environment Variables”, and edit “Path”.)

Go to http://www.pygtk.org/downloads.html and download the latest versions of pycairo, pygobject, and pygtk. (I am using Python 2.6.)

pycairo-1.4.12-2.win32-py2.6.exe

pygobject-2.14.2-2.win32-py2.6.exe

pygtk-2.12.1-3.win32-py2.6.exe

Run them one by one. Installing Python packages on Windows is very easy: each installer finds Python 2.6 in the registry and automatically installs the package into the Python lib directory.

Now when I run python, I can do


import gtk

If possible, I want to try http://code.google.com/p/pywebkitgtk/
However, it seems quite troublesome to build it on windows:
http://coding.derkeiler.com/Archive/Python/comp.lang.python/2008-10/msg01587.html
http://coding.derkeiler.com/Archive/Python/comp.lang.python/2008-10/msg01723.html
http://webkit.org/building/tools.html
http://www.python.org/doc/2.5.2/ext/win-cookbook.html

related http://aruiz.typepad.com/siliconisland/2006/12/allinone_win32_.html

Your Twitter Follower Statistics

Check your twitter statistics at TwitterCounter

They even provide an API http://twittercounter.com/pages/api

just try

curl "http://twittercounter.com/api/?username=your_twitter_account"

The statistics for my account were generated a month ago. You can also see your rank.

Also, you can see top 100 twitter users here

http://twittercounter.com/pages/100

Related Sites:

http://twitterank.com/

http://twitalyzer.com/twitalyzer/

Luke Wroblewski's blog about Facebook Design

Design at Facebook

http://www.lukew.com/ff/entry.asp?879

There are a lot of commonly appreciated points such as “Feedback is good. Get as much as you can and as early as possible. Helps save time.” and don’t fall in love with your design.

Some other more unique points:

“Writing front-end code ties design into the engineering culture of the company. Having a designer that can write code allows details to get figured out and quickly implemented.”

“There is no creative director at Facebook, instead use a bottoms up process to get alignment. ”

Mockups lie. They lack content and context. Need to use real content and page designs to understand how the design will work.

I like the last point.