Tuesday, November 13, 2012

The evolution of Recommender Systems

Recommender systems (or recommendation systems) are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they have not yet considered.

An important point to note is that some techniques work well with explicit ratings (e.g. of movies or music), while others do well with implicit feedback (e.g. page views or clicks).

Collaborative filtering (CF) methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it does not rely on machine analyzable content and therefore it is capable of accurately recommending complex items such as movies without requiring an "understanding" of the item itself.

First, there was 'Taste', a simple collaborative filtering engine that could predict what a user would like next, be it a movie, a book or a product. Later, Taste was incorporated into Apache Mahout. As of v0.7, Mahout provides standalone versions of the original item-item, user-user and slope-one recommenders, as well as a distributed version of item-item CF.

The following images demonstrate how Mahout's recommendation engine works:

S is the similarity matrix between items, U is the user’s preferences for items and R is the predicted recommendations
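
To make the roles of S, U and R concrete, here is a minimal sketch that computes R as the product of S and U for a single user. It assumes U is a plain vector of that user's ratings with 0 for unrated items; the class name, the method name and the numbers are made up for illustration, and Mahout's actual implementation differs in detail (for example in how scores are normalized).

```java
import java.util.Arrays;

class ItemBasedSketch {

    /** Scores every item for one user: r[i] = sum over j of s[i][j] * u[j]. */
    static double[] predict(double[][] s, double[] u) {
        double[] r = new double[s.length];
        for (int i = 0; i < s.length; i++) {
            for (int j = 0; j < u.length; j++) {
                r[i] += s[i][j] * u[j];
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical item-item similarity matrix for 3 items (made-up numbers).
        double[][] s = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.2},
            {0.0, 0.2, 1.0}
        };
        // The user rated item 0 with 4 and item 2 with 2; item 1 is unrated (0).
        double[] u = {4.0, 0.0, 2.0};
        System.out.println(Arrays.toString(predict(s, u)));
    }
}
```

The score for the unrated item 1 comes entirely from its similarity to the items the user did rate, which is the essence of item-based CF.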

Another approach is to use association rule mining (or market basket analysis) to compute interesting recommendations. But this technique does not generate personalized recommendations. Some researchers have combined ARM and CF to provide personalized recommendations.

The Netflix competition brought about a sea change in the way CF engines analyze and compute predictions. The Netflix Prize sought to substantially improve the accuracy of predictions about how much someone is going to enjoy a movie based on their movie preferences (ratings).

A very successful approach to this problem is the family of so-called latent factor models, which use Matrix Factorization.

In contrast to the neighborhood-based CF algorithms above, this technique builds a model during the learning phase by recognizing patterns in the training data. After the learning phase, the model is used to predict ratings for given queries. Many recent gains in prediction accuracy came from this approach. Specifically, these methods factorize the rating matrix into two low-rank matrices, a user profile and an item profile, i.e. they model the users and items as points in a k-dimensional feature space. An unknown rating can then simply be estimated by taking the dot product of the corresponding user and item feature vectors. These methods tend to take longer to train than neighborhood-based CF, but are reported to achieve better accuracy.

Mathematically speaking, the goal is to decompose the rating matrix A into two other matrices U and M whose product is a good approximation of A.

The matrix A can be very large, so the challenge is to find the decomposition efficiently. Various methods exist to calculate it.

A few published matrix factorization techniques include:
  • Regularized SVD - Regularized SVD (Singular Value Decomposition) minimizes the squared error between actual ratings and predicted estimates over all available votes. To control overfitting, it adds regularization terms for both user and item profiles. It uses gradient descent for the minimization. With a proper choice of parameters, this algorithm is known to achieve good accuracy.
  • Non-negative Matrix Factorization (NMF) - NMF also factorizes the rating matrix into user and item profiles, but it has one more restriction: both low-rank profile matrices may contain only non-negative values. This method uses multiplicative update rules to minimize the Euclidean distance or the Kullback-Leibler divergence between the actual ratings and the estimates.
  • Probabilistic Matrix Factorization (PMF) - PMF adopts a probabilistic linear model with Gaussian observation noise for representing latent features of both users and items.
  • Bayesian Probabilistic Matrix Factorization (BPMF) - This algorithm applies a fully Bayesian treatment to the PMF model, in which model capacity is controlled automatically by integrating over all model parameters and hyperparameters.
  • Non-linear Probabilistic Matrix Factorization (NLPMF) - This algorithm develops a non-linear probabilistic matrix factorization using Gaussian process latent variable models. The model is optimized using stochastic gradient descent, which makes it possible to apply Gaussian processes to data sets with millions of observations without approximate methods.
Based on feedback from Netflix Prize participants, a straightforward matrix factorization with stochastic gradient descent training worked well for predicting ratings. For item recommendation, weighted regularized matrix factorization (WR-MF), also known as weighted alternating least squares, and BPR-MF (Bayesian personalized ranking), which optimizes a ranking loss (AUC), do well.
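
As a rough illustration of the "straightforward matrix factorization with stochastic gradient descent" idea, here is a minimal sketch in the spirit of regularized SVD. It is not the code of any Netflix Prize team or of Mahout. The class name, the toy ratings and the hyperparameters (k = 2, learning rate 0.01, regularization 0.02) are assumptions chosen only to keep the example small.

```java
import java.util.Random;

class MatrixFactorizationSketch {
    private final int k;             // number of latent features
    private final double[][] userF;  // one k-dimensional profile per user
    private final double[][] itemF;  // one k-dimensional profile per item

    MatrixFactorizationSketch(int numUsers, int numItems, int k, long seed) {
        this.k = k;
        this.userF = randomMatrix(numUsers, k, new Random(seed));
        this.itemF = randomMatrix(numItems, k, new Random(seed + 1));
    }

    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] m = new double[rows][cols];
        for (double[] row : m) {
            for (int f = 0; f < cols; f++) {
                row[f] = 0.1 * rnd.nextDouble(); // small positive starting values
            }
        }
        return m;
    }

    /** Predicted rating = dot product of the user and item feature vectors. */
    double predict(int user, int item) {
        double sum = 0.0;
        for (int f = 0; f < k; f++) {
            sum += userF[user][f] * itemF[item][f];
        }
        return sum;
    }

    /** Each row of ratings is {user index, item index, rating}. */
    void train(double[][] ratings, int epochs, double learningRate, double lambda) {
        for (int e = 0; e < epochs; e++) {
            for (double[] r : ratings) {
                int u = (int) r[0];
                int i = (int) r[1];
                double err = r[2] - predict(u, i);
                for (int f = 0; f < k; f++) {
                    double pu = userF[u][f];
                    double qi = itemF[i][f];
                    // gradient step on squared error plus L2 regularization
                    userF[u][f] += learningRate * (err * qi - lambda * pu);
                    itemF[i][f] += learningRate * (err * pu - lambda * qi);
                }
            }
        }
    }

    /** Root mean squared error over the given ratings. */
    double rmse(double[][] ratings) {
        double sum = 0.0;
        for (double[] r : ratings) {
            double err = r[2] - predict((int) r[0], (int) r[1]);
            sum += err * err;
        }
        return Math.sqrt(sum / ratings.length);
    }

    public static void main(String[] args) {
        // Made-up ratings for 4 users and 3 items.
        double[][] ratings = {
            {0, 0, 5}, {0, 1, 3}, {1, 0, 4}, {1, 2, 1},
            {2, 1, 2}, {2, 2, 5}, {3, 0, 1}, {3, 2, 4}
        };
        MatrixFactorizationSketch mf = new MatrixFactorizationSketch(4, 3, 2, 42L);
        System.out.println("RMSE before training: " + mf.rmse(ratings));
        mf.train(ratings, 1000, 0.01, 0.02);
        System.out.println("RMSE after training:  " + mf.rmse(ratings));
        System.out.println("Predicted rating, user 0 / item 2: " + mf.predict(0, 2));
    }
}
```

On real data you would measure the error on held-out ratings rather than on the training set, since a model this flexible can fit a handful of training ratings almost perfectly.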

Mahout uses Alternating Least Squares with Weighted Lambda-Regularization (ALS-WR) to find the decomposition. It is an iterative algorithm and currently shows poor performance on Hadoop because of the enormous overhead of scheduling and checkpointing every single iteration.

The next wave of innovative techniques was brought out by the KDD Cup 2011 competition, similar in nature to the Netflix competition but offering a fraction of the prize money! The winners used a combination of techniques (an ensemble) including ALS, KNN, SGD and SVD++. More on it in the next blog article.

Finally, whatever algorithm you choose, error analysis can provide a lot of insight into an algorithm's behavior!

Thursday, October 18, 2012

Exploring Apache Shiro

I was looking at some implementations for "RememberMe" or persistent login functionality and came across Apache Shiro. From the project website:

"Apache Shiro is a powerful and easy-to-use Java security framework that performs authentication, authorization, cryptography, and session management. With Shiro’s easy-to-understand API, you can quickly and easily secure any application – from the smallest mobile applications to the largest web and enterprise applications."

It was an interesting find. For the first time, I saw a session management implementation that is simple and works outside a web container. Indeed the APIs were simple enough to use in an application. While the documentation is reasonably good, I could not find an end-to-end sample with a configuration. So here is something I wrote to play with the APIs.

This code implements a standalone username and password based authentication module. The dummy realm implementation simply returns an empty principal and credential (User Info) pair. So if you plan to use database-backed authentication, substitute the dummy implementation with, say, JDBC code, or use JdbcRealm.

1. Create shiro.ini that will be used to initialize Shiro's SecurityManager.

[main]
sessionManager = org.apache.shiro.session.mgt.DefaultSessionManager

# ensure the securityManager uses our native SessionManager
securityManager.sessionManager = $sessionManager

#set the sessionManager to use an enterprise cache for backing storage:
sessionDAO = org.apache.shiro.session.mgt.eis.EnterpriseCacheSessionDAO
securityManager.sessionManager.sessionDAO = $sessionDAO

cacheManager = org.apache.shiro.cache.ehcache.EhCacheManager
securityManager.cacheManager = $cacheManager

# Session validation
sessionValidationScheduler = org.apache.shiro.session.mgt.ExecutorServiceSessionValidationScheduler

# Session timeout  
securityManager.sessionManager.globalSessionTimeout = 3600000

# Default is 3,600,000 millis = 1 hour:
sessionValidationScheduler.interval = 3600000

sessionValidationScheduler.sessionManager = $sessionManager

# Auth
# set myRealm to the fully qualified class name of your Realm (the CustomRealm from step 2)
myRealm =
myRealmCredentialsMatcher = org.apache.shiro.authc.credential.AllowAllCredentialsMatcher
myRealm.credentialsMatcher = $myRealmCredentialsMatcher

#Remember Me
rememberMe = org.apache.shiro.web.mgt.CookieRememberMeManager
securityManager.rememberMeManager = $rememberMe




Note: We are using AllowAllCredentialsMatcher, which always returns true while matching credentials, so any password is accepted; do not use it outside a demo. The configuration also uses EhCache for storing sessions. Also note that DefaultSecurityManager does not have a default implementation for "RememberMe". Take a look at DefaultWebSecurityManager: it uses CookieRememberMeManager as its default implementation, which is useful in a webapp.

2. Custom Realm

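import org.apache.shiro.authc.AuthenticationException;
import org.apache.shiro.authc.AuthenticationInfo;
import org.apache.shiro.authc.AuthenticationToken;
import org.apache.shiro.authc.SimpleAuthenticationInfo;
import org.apache.shiro.authc.credential.CredentialsMatcher;
import org.apache.shiro.realm.AuthenticatingRealm;

public class CustomRealm extends AuthenticatingRealm {

 private CredentialsMatcher credentialsMatcher;

 public String getName() {
  return "CustomRealm";
 }

 // accept every token type; a real realm would check the token class
 public boolean supports(AuthenticationToken token) {
  return true;
 }

 public CredentialsMatcher getCredentialsMatcher() {
  return credentialsMatcher;
 }

 public void setCredentialsMatcher(CredentialsMatcher credentialsMatcher) {
  this.credentialsMatcher = credentialsMatcher;
 }

 // dummy lookup: returns an empty principal and empty credentials
 protected AuthenticationInfo doGetAuthenticationInfo(
   AuthenticationToken token) throws AuthenticationException {
  return new SimpleAuthenticationInfo("", "".toCharArray(), getName());
 }
}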

3. Auth Code:


import java.io.Serializable;

import org.apache.shiro.SecurityUtils;
import org.apache.shiro.authc.UsernamePasswordToken;
import org.apache.shiro.config.IniSecurityManagerFactory;
import org.apache.shiro.mgt.SecurityManager;
import org.apache.shiro.session.Session;
import org.apache.shiro.subject.Subject;
import org.apache.shiro.util.Factory;

public class ShiroAuthService {

 public ShiroAuthService() {
  // The constructor argument was lost from the original listing; pass the
  // location of the shiro.ini from step 1 here. With no argument, Shiro
  // looks for shiro.ini at the root of the classpath.
  Factory<SecurityManager> factory = new IniSecurityManagerFactory();
  SecurityManager securityManager = factory.getInstance();
  // Make the SecurityManager instance available to the entire application
  // via static memory:
  SecurityUtils.setSecurityManager(securityManager);
 }

 public void testAuth() {

  // simulate a username/password (plaintext) token created in response to
  // a login attempt:
  UsernamePasswordToken token = new UsernamePasswordToken("user", "secret");

  boolean loggedIn = false;
  Session session = null;
  Subject currentUser = SecurityUtils.getSubject();

  try {
   // authenticate against the configured realm; throws on failure
   currentUser.login(token);
   session = currentUser.getSession();
   System.out.println("Session Id: " + session.getId());
   loggedIn = true;
  } catch (Exception ex) {
   loggedIn = false;
  }

  if (loggedIn) {
   // rebuild a Subject from the session id alone, as a later request would
   Serializable sessionId = session.getId();
   Subject requestSubject = new Subject.Builder().sessionId(sessionId)
     .buildSubject();
   System.out.println("Is Authenticated = "
     + requestSubject.isAuthenticated());//Should return true
   System.out.println("Is Remembered = "
     + requestSubject.isRemembered());
  } else {
   System.out.println("Not logged in.");
  }
 }

 public static void main(String[] args) {
  new ShiroAuthService().testAuth();
 }
}

There are some other interesting features. It has a nice pluggable architecture wherein you can provide custom implementations of SessionManager, Realm, Caching, CredentialMatching, SessionDAO and "RememberMe".

To implement a custom matcher, go with "public class CustomMatcher extends CodecSupport implements CredentialsMatcher". Similarly, for a custom Realm implementation, use "public class CustomRealm extends AuthenticatingRealm", as the base class provides some useful functionality. Of course, you can also provide an implementation from scratch.

For more samples, see

Monday, September 3, 2012

Exploring Streaming Algorithms - Part 1

From Wikipedia - "Streaming algorithms are algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes (typically just one). These algorithms have limited memory available to them (much less than the input size) and also limited processing time per item."

More formally, the input is a sequence S = <a1, a2, . . . , am>, where the elements of the sequence (called tokens) are drawn from the universe [n] := {1, 2, . . . , n}. Note the two important size parameters: the stream length, m, and the universe size, n. Let s be the amount of memory (in bits) that the algorithm uses. Since m and n are to be thought of as “huge,” we want to make s much smaller than these; specifically, we want s to be sublinear in both m and n. The holy grail is to achieve:

s = O(log m + log n)

1. Finding Frequent Items

We have a stream S = <a1, . . . , am>, with each ai belonging to [n], and this implicitly defines a frequency vector f = (f1, . . . , fn). Note that f1 + · · · + fn = m. In the MAJORITY problem, the task is as follows:
if there exists a j such that fj > m/2, then output j; otherwise, output “No”. This can be generalized to the FREQUENT problem, with parameter k, as follows: output the set { j : fj > m/k }.
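
To pin down what FREQUENT asks for, here is a deliberately non-streaming baseline that builds the full frequency vector and then applies the definition. It needs memory proportional to the number of distinct tokens, which is exactly the cost a streaming algorithm tries to avoid. The class and method names are made up for this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class FrequentBaseline {

    /** Returns { j : f_j > m/k } by counting every token exactly. */
    static Set<Integer> frequent(int[] stream, int k) {
        Map<Integer, Integer> f = new HashMap<>();
        for (int token : stream) {
            f.merge(token, 1, Integer::sum); // f_j += 1
        }
        int m = stream.length;
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : f.entrySet()) {
            // f_j > m/k is the same as f_j * k > m, without integer-division rounding
            if ((long) e.getValue() * k > m) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] stream = {1, 2, 1, 3, 1, 2, 1}; // m = 7
        System.out.println("k=2 (MAJORITY): " + frequent(stream, 2));
        System.out.println("k=4:            " + frequent(stream, 4));
    }
}
```

With k = 2 this is the MAJORITY problem; an empty result corresponds to the answer “No”.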

Clearspring has open sourced a library, "Stream-lib", that is ideal for summarizing streams and for counting distinct elements (cardinality estimation). Here is some sample code to find the top 100 elements in a stream.

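import java.util.List;
import com.clearspring.analytics.stream.Counter;
import com.clearspring.analytics.stream.StreamSummary;

public static void main(String[] args) {

    long count = 0;
    // keep at most 1000 counters in the summary
    StreamSummary<String> topk = new StreamSummary<String>(1000);

    /* Read product(s) id from console or a stream e.g.
     *  300645, 301482, 286467, 282697, 282697, 301482, 286467, .....
     */
    List<String> productIds = readProductId();
    for (String productId : productIds) {
        topk.offer(productId); // feed each item into the summary
    }

    // print rank, item, estimated count and error bound for the top 100
    List<Counter<String>> counters = topk.topK(100);
    for (Counter<String> counter : counters) {
        System.out.println(String.format("%d %s %d %d", ++count,
                counter.getItem(), counter.getCount(),
                counter.getError()));
    }
}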

Sample Output:
Count ProductId Frequency Error
1     300645      231      0
2     282697      221      0
3     301482      105      0
4     295059      59       0
5     286467      58       0

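Finding frequent items and finding the Top K elements are related problems. The above code uses the Space-Saving algorithm, which is deterministic: every item whose true frequency exceeds m/k is guaranteed to be among the k counters, and each reported count overestimates the true count by at most the reported error. When the error is 0, as in the sample output, the count is exact.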

The algorithm basically works like this: The stream is processed one item at a time. A collection of k distinct items and their associated counters is maintained. If a new item is encountered and fewer than k items are in the collection, then the item is added and its counter is set to 1. If the item is already in the collection, its counter is increased by 1. If the item is not in the collection and the collection already has a size of k, then the item with lowest counter is removed and the new item is added, with its counter set to one larger than the previous minimum counter.

Here is some pseudo code to make this clearer:

SpaceSaving(k, stream):
collection = empty collection
for each element in stream:
    if element in collection:
        collection[element] += 1
    else if length of collection < k:
        add element to collection, collection[element] = 1
    else:
        current_minimum_element = element with lowest count value in collection
        current_minimum = collection[current_minimum_element]
        remove current_minimum_element from collection
        collection[element] = current_minimum + 1
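
The pseudo code translates almost line by line into Java. The following is a minimal sketch, not Stream-lib's implementation: the class name and methods are made up, and the minimum is found with a linear scan over the k counters, whereas a production implementation would use a data structure that finds the minimum in constant time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SpaceSavingSketch {
    private final int k;
    private final Map<String, Long> counters = new HashMap<>();

    SpaceSavingSketch(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
    }

    /** Processes one stream element. */
    void offer(String element) {
        Long current = counters.get(element);
        if (current != null) {
            counters.put(element, current + 1);   // already monitored
        } else if (counters.size() < k) {
            counters.put(element, 1L);            // room for a new counter
        } else {
            // evict the element with the lowest count (linear scan, O(k))
            String minElement = null;
            long min = Long.MAX_VALUE;
            for (Map.Entry<String, Long> e : counters.entrySet()) {
                if (e.getValue() < min) {
                    min = e.getValue();
                    minElement = e.getKey();
                }
            }
            counters.remove(minElement);
            counters.put(element, min + 1);       // inherits the old minimum
        }
    }

    /** Estimated count; 0 if the element is not currently monitored. */
    long estimate(String element) {
        return counters.getOrDefault(element, 0L);
    }

    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(counters);
    }

    public static void main(String[] args) {
        SpaceSavingSketch ss = new SpaceSavingSketch(2);
        for (String id : new String[] {"a", "a", "b", "c", "a"}) {
            ss.offer(id);
        }
        System.out.println(ss.snapshot());
    }
}
```

In the small run above, "c" arrives when both counters are taken, so it replaces "b" (the minimum, with count 1) and starts at 2. That inherited count is why Space-Saving counts can overestimate but never underestimate.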

Finding frequent elements has applications in network traffic monitoring, anomaly detection and DDoS detection. In DDoS detection, the top few most frequent IPs are continuously maintained for further action. The space of possible IP addresses is large, but usually only a subset of IPs is seen in an attack. In such cases, the space bound of the Space-Saving algorithm is a function of the number of distinct IPs that have occurred in the stream.

In part 2, we will look at counting distinct elements and other stream algorithms. 

Tuesday, August 28, 2012

The 'ETA' Rush

Estimated Time of Arrival or ETA is used in numerous contexts in everyday conversation. But it holds a special significance in the context of Project Management in IT.

Your boss has asked you to take the lead on an IT project in your company. Maybe you are a project manager, software architect or technical lead. If your project is important, your boss will be pressed hard to keep his superiors informed of its progress. And so it happens that ETAs for important milestones are key to publishing an excellent project status.

Smart managers consume status on important projects voraciously. Excellent status reporting means that managers are fully informed of your project's health and overall direction without having to get involved themselves. There is particular information your boss needs in order to show her boss that she is on top of things and able to run the show effectively.

So far so good. But what if the manager comes from a non-software background and suddenly the whole project starts revolving around ETAs? The technical specification document is replaced by a timeline sheet, the architecture document is no longer referred to, and in every team meeting the looming question is "What is the ETA for....?"

We are engineers. That doesn't just describe our training or job description; it's who we are. We look at the world through a scientific, analytical lens and when we identify problems, we prototype ways to solve them. Engineers have unique viewpoints and skills and, in my opinion, a unique responsibility to use our abilities for the good of all. While each task must be time-bound, it does not make sense to calculate and announce ETAs in a few seconds. Researching, prototyping or debugging an issue takes time before a date can be given.

Managers are like children and always want to know when they are going to get something. "Are we there yet? Are we there yet?"

So what happens to a project where the ETA rules? An ETA-driven project is a sure sign of very hard-to-meet deadlines and, mostly, poor project planning. Engineers get frustrated and lose interest in the project. Poor project management, software design and architecture are followed by hurried execution. In short, it is a disaster waiting to happen.

Dealing with 'ETA' craziness requires some tact. Provide a time and date for when the issue will be resolved. If you cannot, then provide a time and date for when you will get to the next step in the issue resolution process. If you cannot do that, then provide an ETA for your next updated status on the issue or in short an ETA for an ETA!

Software Engineer: Jack, a bull is coming in your direction.
Manager: What is the ETA?
Software Engineer: 0.5 millisecond.
Manager: ----@!$@*@!

Friday, August 10, 2012

A random conversation

A random conversation between a Team lead and a product manager

Lead: I got a new project request from Joey (Alliance manager). It is a big project.
Product Manager: Oh, what will you reply to Joey now?
Lead: Sir, please wait for your turn. You are in queue. We value our customers. Please do not disconnect. You will be attended to shortly. This call may be monitored for quality purpose.
Product Manager: :)

Sunday, July 22, 2012

Getting started: Infinispan as remote cache cluster

This guide will walk you through configuring and running Infinispan as a remote distributed cache cluster. There is straightforward documentation for running Infinispan in embedded mode. But there is no complete documentation for running Infinispan in client/server or remote mode. This guide helps bridge the gap.

Infinispan offers four modes of operation, which determine how and where the data is stored:
  • Local, where entries are stored on the local node only, regardless of whether a cluster has formed. In this mode Infinispan is typically operating as a local cache
  • Invalidation, where all entries are stored into a cache store (such as a database) only, and invalidated from all nodes. When a node needs the entry it will load it from a cache store. In this mode Infinispan is operating as a distributed cache, backed by a canonical data store such as a database
  • Replication, where all entries are replicated to all nodes. In this mode Infinispan is typically operating as a data grid or a temporary data store, but doesn't offer an increased heap space
  • Distribution, where entries are distributed to a subset of the nodes only. In this mode Infinispan is typically operating as a data grid providing an increased heap space
Invalidation, Replication and Distribution can all use synchronous or asynchronous communication.

Infinispan offers two access patterns, both of which are available in any runtime:
  • Embedded into your application code
  • As a Remote server accessed by a client (REST, memcached or Hot Rod)
In this guide, we will configure an Infinispan server with a Hot Rod endpoint and access it via a Java Hot Rod client. One reason to use the Hot Rod protocol is that it provides automatic load balancing and failover.

1. Download full distribution of Infinispan. I will use version 5.1.5.
2. Configure Infinispan to run in distributed mode. Create infinispan-distributed.xml.

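<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="urn:infinispan:config:5.1" xsi:schemaLocation="urn:infinispan:config:5.1">
  <global>
    <globalJmxStatistics enabled="true"/>
    <transport>
      <properties><property name="configurationFile" value="jgroups.xml"/></properties>
    </transport>
  </global>
  <default>
    <jmxStatistics enabled="true"/>
    <clustering mode="distribution">
      <hash numOwners="2"/>
    </clustering>
  </default>
  <namedCache name="myCache">
    <clustering mode="distribution">
      <hash numOwners="2"/>
    </clustering>
  </namedCache>
</infinispan>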

We will use JGroups to set up cluster communication. Copy etc/jgroups-tcp.xml as jgroups.xml.

3. Place infinispan-distributed.xml and jgroups.xml in bin folder. Start 2 Infinispan instances on the same or different machines.

Starting an Infinispan server is pretty easy. You need to download and unzip the Infinispan distribution and use the startServer script.

bin\startServer.bat --help // Print all available options
bin\startServer.bat -r hotrod -c infinispan-distributed.xml -p 11222
bin\startServer.bat -r hotrod -c infinispan-distributed.xml -p 11223

The 2 server instances will start talking to each other via JGroups.

4. Create a simple Remote HotRod Java Client.

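import java.net.URL;
import java.util.Map;

import org.infinispan.client.hotrod.RemoteCache;
import org.infinispan.client.hotrod.RemoteCacheManager;
import org.infinispan.client.hotrod.ServerStatistics;

public class Quickstart {

 public static void main(String[] args) {

  // Load the Hot Rod client properties from the classpath. The file name
  // was lost from the original listing; "hotrod-client.properties" is an
  // assumption, so use whatever name you give the file in step 5.
  URL resource = Thread.currentThread().getContextClassLoader()
    .getResource("hotrod-client.properties");
  RemoteCacheManager cacheContainer = new RemoteCacheManager(resource, true);

  //obtain a handle to the remote cache named "myCache"
  RemoteCache<String, String> cache = cacheContainer.getCache("myCache");

  //now add something to the cache and make sure it is there
  cache.put("car", "ferrari");
  if ("ferrari".equals(cache.get("car"))) {
   System.out.println("Found!");
  } else {
   System.out.println("Not found!");
  }

  //remove the data
  cache.remove("car");

  //Print cache statistics
  ServerStatistics stats = cache.stats();
  for (Map.Entry<String, String> stat : stats.getStatsMap().entrySet()) {
   System.out.println(stat.getKey() + " : " + stat.getValue());
  }

  // Print Cache properties
  System.out.println(cacheContainer.getProperties());
 }
}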

5. Define

infinispan.client.hotrod.server_list = localhost:11222;localhost:11223;
infinispan.client.hotrod.socket_timeout = 500
infinispan.client.hotrod.connect_timeout = 10

## below is connection pooling config
maxTotal = -1
maxIdle = -1
whenExhaustedAction = 1
testWhileIdle = true
minIdle = 1

See RemoteCacheManager for all available properties.

6. Run the client. You will see something like this on the console:

Jul 22, 2012 9:40:39 PM org.infinispan.client.hotrod.impl.protocol.Codec10 
INFO: ISPN004006: localhost/ sent new topology view (id=3) 
containing 2 addresses: [/, /]

hits : 3
currentNumberOfEntries : 1
totalBytesRead : 332
timeSinceStart : 1281
totalNumberOfEntries : 8
totalBytesWritten : 926
removeMisses : 0
removeHits : 0
retrievals : 3
stores : 8
misses : 0
{whenExhaustedAction=1, maxIdle=-1, infinispan.client.hotrod.connect_timeout=10, 
maxActive=-1, testWhileIdle=true, minEvictableIdleTimeMillis=1800000, maxTotal=-1, 
minIdle=1, infinispan.client.hotrod.server_list=localhost:11222;localhost:11223;, 
timeBetweenEvictionRunsMillis=120000, infinispan.client.hotrod.socket_timeout=500}

As you will notice, the cache server returns the cluster topology when the connection is established. You can start more Infinispan instances and notice that the cluster topology changes quickly.

That's it!

Some useful links:

Saturday, July 7, 2012

A Software Architect

A software architect lives to serve the engineering team -- not the other way around.

A software architect is a mentor.

A software architect is a student.

A software architect is the code janitor. Happily sweeping up after the big party is over.

A software architect helps bring order where there is chaos, guidance where there is ambiguity, and decisions where there is disagreement.

A software architect codes the parts of the system that are the most precious and understands them through and through.

A software architect creates a vocabulary to enable efficient communication across an entire company.

A software architect reads far more code than he or she writes -- catching bugs before they manifest as systems change.

A software architect provides technological and product vision without losing sight of the present needs.

A software architect admits when he or she is wrong and never gloats when right.

A software architect gives credit where it is due and takes pride simply in a job well done.

One word to sum it up: humility.

 Disclosure: Borrowed article from Thanks Chris for the wonderful thoughts. 

Saturday, June 30, 2012

Sports and Art

X-Treme Sports as seen from a cameraman's lens can be beautiful, like art.

Sunday, June 17, 2012

Shopping online still sucks

E-Commerce in India is almost a decade old. Yet the online shopping experience is scary at best. It is not for lack of shopping destinations, which are sprouting up like mushrooms, but because not all of them are edible.

Recently, I became more adventurous. No, I did not trek to the Himalayas; instead, I bought a piece of furniture online. My wife was longing to beautify the kitchen. The stars were aligned and I found the perfect "Chest of Drawers" at an amazing price on Pepperfry. Delivery was 20 days away and, very childlike, we waited for the drawer. In the meantime, I checked with my colleagues, hoping they were as daring. A few were, and their experience was good.

Then it came. Nicely packed, it was delivered to my door step. After cutting through 5 layers of tough packing material, I finally got to see the wood. Then something hit my hand. Upon digging, a triangular piece of wood came out.  I surely did order only 1 piece of furniture, and now I had 2! And then I found another one. The first piece had come out from the top corner but I struggled to find the origin of the second piece.

Some outstanding features of the drawer can only be described in pictures.

Broken piece

Wood sticking out

Crack on the left wall

An attempt to give an antique look.

In built lock!
The drawer had an automatic inbuilt locking mechanism. Unfortunately, I could not unlock it.

The drawer was clearly a discarded piece of furniture that could never sell. Most probably it had been gathering dust in some remote corner of Pepperfry's warehouse for the last decade. The drawer was promptly picked up the next day and a refund was due. After a month, it is still due because the furniture has not reached Pepperfry's warehouse.

We were left with a broken heart and a credit card bill to pay.

Monday, April 23, 2012

The Truth: How fantastic discounts are created!

categories. The discounts are too good to be true. The limited time deals can tempt men and women alike. But are the 50%+ discount really true? What about markup price? The following comparisons done on 23/4/2012 will shock you.

The markup price at FandU is 200% of the actual MRP.

The markup price at FandU is 182% of the actual MRP.

The above screenshots show two E-Commerce websites selling the same products but quoting different MRPs. There are many more products with such jacked-up MRPs. Now we know how such unbelievable discounts are CREATED! Well, it is a smart yet foolish strategy by such E-Commerce websites to mislead and outsmart customers.

Sunday, April 22, 2012

Creativity and Innovation at Work

This blog article is about innovation and creativity: what they are and how they can be applied at work and in personal life. The article also tries to provide some insights into being creative.

Creativity Defined

"Creativity is the act of turning new and imaginative ideas into reality. Creativity involves two processes: thinking, then producing. Innovation is the production or implementation of an idea. If you have ideas, but don’t act on them, you are imaginative but not creative.” — Linda Naiman

“A product is creative when it is (a) novel and (b) appropriate. A novel product is original not predictable. The bigger the concept, and the more the product stimulates further work and ideas, the more the product is creative.” —Sternberg & Lubart, Defying the Crowd

What is Innovation?

Innovation is the production or implementation of ideas.

Creativity at Work

According to the IBM 2010 Global CEO Study, which surveyed 1,500 Chief Executive Officers from 60 countries and 33 industries worldwide, CEOs believe that, “more than rigor, management discipline, integrity or even vision – successfully navigating an increasing complex world will require creativity.”

Creativity is a crucial part of the innovation equation. Creativity is a core competency for leaders and managers and one of the best ways to set your company apart and create a blue ocean.

Innovation can happen at various levels in an organization. Likewise, it can create a local or global impact. Whether it's local or global, there is definitely a meaningful impact. And it can in turn create a fly-wheel effect. A technical software or process innovation for example, can internally impact an engineering team. But the impact does not stop there. The innovation can itself attract more innovation or indirectly impact business. In this case, the engineering team can build better software which in turn positively impacts customer experience. For a company seeking profit, this results in improved top-line, part of which will be funneled into increased expenditure in innovation.

From a psychological point of view, creativity requires whole-brain thinking: right-brain imagination, artistry and intuition, plus left-brain logic and planning. So creativity and innovation can (and should) happen in every environment. People are often so occupied with their everyday work that they don't have the time to think of anything else. Often, they think that “people in management” are the ones who should think about new ideas.

Well, this is so wrong. Everyone can have ideas for improvement in everyday work and life. To let out the subdued creativity in ourselves, here are a few points that can help:

1.  Creative people are restless by nature (in a good way, of course!). They cannot sit still unless absorbed in something of their own interest. They are very energetic, and the whole process of creation gives them satisfaction. Connecting with such people can have a tremendous impact on oneself. Just as a magnet can magnetize iron, such is the effect of connecting with creative people.

A word of caution: Some people are bubbling with ideas all the time. They dish out ideas but never see them through, or just create an initial buzz and leave them midway. Most of their energy is spent either thinking about ideas or talking about them. And the positive energy goes to waste.

The same theory is applied at organization level. For innovation to flourish, organizations create an environment that fosters creativity; bringing together multi-talented groups of people who work in close collaboration together - exchanging knowledge, ideas and shaping the direction of the future.

2.  To be creative requires a moment of silence. At times when the brain is at rest or not solving a problem at hand but distracted by some other leisure activity, chances are more than a creative idea will strike or a solution will announce itself.

A friend of mine was struggling with a software issue for over 3 days. He had stopped eating, would not care of personal hygiene and continued to stare at the computer screen. Even sleep avoided him. Unable to solve it and the deadline the next day, he decided to goto a nearby store to get some biscuits. And there it happened. While crossing the busy road, he stopped for a moment and the sense of Eureka! befell on him. Lot of us have experienced it. And psychologist have done experiments to support this observation.

3. Not all ideas are worth pursuing. Let's face it, not all "wow" ideas are powerful enough to create a measurable impact. And this brings us to the point of objectively analyzing a novel idea or process. Some people go by a hunch, others by passion and a "let's give it a shot" attitude, while others derive a mathematical formula to prioritize ideas. Whatever the method, it requires patience and wisdom to analyze and prioritize ideas.

I once knew a senior manager who created an Excel spreadsheet to do some sort of impact analysis and to measure the risk associated with executing each idea. This way, he spent his limited time pursuing the most promising ideas and also understood their risk level.

4. Growing and extending creativity. Since you can't be creative all the time, connecting with others around you and encouraging them can be effective. But mere words of encouragement rarely work. Understanding the competencies and motivation of your colleagues and your subordinates, and ensuring they work on projects that map directly to their competencies, is a great way to extend creativity. When individuals have the freedom to choose the means to achieve the goal, imagination and creativity blossom.

If you have something to share, feel free to write to me.

Friday, March 23, 2012

A Manager's Diary

Software engineering as a profession is fun. The role shift from software engineer to people manager is quite commonplace in the Indian IT industry. Managing people brings forth its own set of challenges. For those with an MBA degree, the transition is relatively smooth. For others, the opportunity requires climbing the learning curve. I believe I’ve learned a thing or two from others' experiences and my own experiences in managing people. I hope you find them useful.

1)  Be empathetic: You need to be able to put yourself in other people’s shoes, to see things from their perspective, and to have an idea of what you would do if you were them. One way is to listen actively to your engineers without interrupting, to understand their concerns, and to ask open questions that let them tell you more about themselves instead of yes-and-no questions.

2) Communication - Lifeblood of a project: One of the major contributors to a project's success is communication within and outside the team. An average person is not a great communicator. In particular, a significant number of gifted software developers are introverts. Sometimes they have a lot going on inside, and little time to express it to others. As a manager you must overcome this obstacle by actively connecting with people, and helping them connect with each other. It takes effort to get communication going, but once the momentum is built up, things become much easier.

3) There’s always a right thing to do, and sometimes it’s uncomfortable: For instance, there are things at work that I would have done differently if I had been designing or coding a particular component. This doesn’t mean they would have turned out better or worse.

Last year, while I was busy on a project, my team was ready for another product release. The release bombed and we had to roll back. When I reconnected with my team to sync up on deployment architecture, they explained a number of deployment decisions they had made. My knee-jerk reaction to some of them was "this is not right". After thinking for a minute, it was clear that they had mostly done the right things given their time constraints and resources.

Trusting people to do the right thing is hard, although it gets easier when your team is absolutely awesome. As a manager you have to let people do things that take you out of your comfort zone. That’s fine, because they must own their work.

The point I was trying to make is that a stitch in time saves nine. Don’t succumb to the temptation of doing the easy thing, because many times it will cost you in the short and long term.

4) Help others succeed: One of the most rewarding aspects of my job has been to see people grow and evolve. It feels great when you see someone who was a shy programmer fresh out of school become a great presenter who can design a system and explain it to others. Or when you can help someone get to the next level and share some of your experiences with them.

As a manager, you have to continuously challenge your team by giving them a variety of problems and providing guidance. Let them evaluate the pros and cons and make an informed decision. Remember that engineers have their own way of doing things. Often they will take shortcuts during development, make a quick fix and move on, or simply reinvent the wheel. Such practices come back to haunt you later, when things do not work as expected in production or clients absolutely hate the application.

Yes, there will be some horrible mistakes and some stellar successes. Helping engineers understand their mistakes, and guiding them in the right direction instead of being critical, is one way to help them succeed.

It is something you have to genuinely want.

5) Be an umbrella: An important part of your job is to simply let people do their work. You report to people (your own boss, customers, shareholders) who want different things. In some cases their demands are contradictory, or change too often. Sometimes there are storms brewing in the upper atmosphere that could be averted. There is no need to burden your team with constantly changing weather they can do nothing about. Shield your people. Let them stand under your umbrella.

6) Be firm but nice: You know that you have the authority to call the shots. That is a powerful weapon, and you must use it wisely and sparingly. It’s much better when you can motivate people to agree on what to do. It’s true that it doesn’t always happen, but that should be your goal.

With valuable inputs from my friends and colleagues. Please feel free to share your experiences.

Sunday, February 19, 2012

BigMemory: Scaling vertically

Until recently, Moore's Law resulted in faster CPUs, but physical constraints (heat dissipation, for example) and computing requirements now force manufacturers to place multiple cores on a single CPU chip. Increases in memory, however, are unconstrained by this type of physical requirement. For instance, today you can purchase standard Von Neumann servers from Oracle, Dell and HP with up to 2 TB of physical RAM and 64 cores. Servers with 32 cores and 512 GB of RAM are certainly more typical, but it's clear that today's commodity servers are now “big iron” in their own right.

The following table shows the random access times for different storage technologies:

Storage Technology               Latency
Registers                        1-3 ns
CPU L1 Cache                     2-8 ns
CPU L2 Cache                     5-12 ns
Memory (RAM)                     10-60 ns
High-speed network gear          10,000-30,000 ns
Solid State Disk (SSD) Drives    70,000-120,000 ns
Hard Disk Drives                 3,000,000-10,000,000 ns

Since most enterprise applications tend to be I/O bound (i.e. they spend too much time waiting for data stored on disk), it follows that these applications would benefit greatly from the use of the lower-latency forms of storage at the top of this hierarchy. Specifically, today's time-sensitive enterprise applications would speed up significantly without much modification if they could replace all disk access with memory usage.

To drive this point home further, note that with modern network technology, latencies are at worst around 10,000-30,000 ns, with even lower latencies and higher speeds possible. This means that with the right equipment, accessing memory on other servers over the network is still much faster than reading from a local hard disk drive. All of this suggests that, as an enterprise architect or developer, your goal should be to keep as much of your application's data in memory as possible.
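The comparison can be made concrete with a little arithmetic. The sketch below simply divides figures taken from the latency table above; the class and constant names are made up for illustration.

```java
/** Compares latency figures (in nanoseconds) taken from the table above. */
final class LatencyRatios {
    static final long RAM_WORST_NS = 60;
    static final long NETWORK_WORST_NS = 30_000;
    static final long HDD_BEST_NS = 3_000_000;

    /** How many times slower slowNs is than fastNs (integer division). */
    static long timesSlower(long slowNs, long fastNs) {
        return slowNs / fastNs;
    }

    public static void main(String[] args) {
        // Even the fastest disk access is about 100 times slower than the slowest network hop.
        System.out.println("HDD (best) vs network (worst): "
                + timesSlower(HDD_BEST_NS, NETWORK_WORST_NS) + "x");
        // Local RAM is faster again by a wide margin.
        System.out.println("Network (worst) vs RAM (worst): "
                + timesSlower(NETWORK_WORST_NS, RAM_WORST_NS) + "x");
    }
}
```

With the table's numbers, a best-case disk read costs about as much as 100 worst-case network round trips.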

The original Java language and platform design took into account the problems developers had when manually managing memory in other languages. For instance, when memory management goes wrong, developers experience memory leaks (memory that is never de-allocated) or memory access violations caused by accessing memory that has already been de-allocated or by attempting to de-allocate memory more than once. To relieve developers of these potential problems, Java implemented automatic memory management of the Java heap with a garbage collector (GC). When a running program no longer requires specific objects, the garbage collector reclaims their memory within the Java heap. Manual memory management is largely a non-issue for Java developers, which results in greater productivity overall.

Garbage collection works reasonably well, but it becomes increasingly stressed as the size of the Java heap and numbers of live objects within it increase. Today, GC works well with an occupied Java heap around 3-4 GB in size, which also just happens to be the 32-bit memory limit.

The size limits imposed by Java garbage collection explain why 64-bit Java use remains a minority despite the availability of commodity 64-bit CPUs, operating systems and Java for half a decade. Attempts in Java to consume a heap beyond 3-4 GB in size can result in large garbage collection pauses (where application threads are stopped so that the GC can reclaim dead objects), unpredictable application response times and large latencies that can violate your application's service level agreements. With large occupied Java heaps, it's not uncommon to experience multi-second pauses, often at the most inopportune moments.

Solving the Java Heap/GC Problem

Large heaps are desirable in cases such as in-process caching and sessions storage. Both of these use cases use a map-like API where a framework allocates and de-allocates resources programmatically with puts and removes, opening up a way to constrain and solve the garbage collection problem.

The BigMemory implementation from Terracotta and Apache (incubated) is an all-Java implementation built on Java's advanced NIO technology. BigMemory is just a hair slower than the Java heap. It runs in-process with the JVM, so there is no management complexity, and it is pure Java, so there is no deployment complexity or sensitivity to JVM version. It creates a cache store in memory but outside the Java heap using direct byte buffers. Because the data is stored off heap, the garbage collector does not know about it and therefore does not collect it. Instead, BigMemory responds to put and remove requests to allocate and free memory in its managed byte buffer.

This lets you keep the Java heap relatively small (1-2GB in size), while using the maximum amount of objects within physical memory. As a result, BigMemory can create caches in memory that match physical RAM limits (i.e. 2TB today and more in the future), without the garbage collection penalties that usually come with a Java heap of that size. By storing your application's data outside of the Java heap but within RAM inside your Java process, you get all the benefits of in-memory storage without the traditional Java costs.
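To illustrate the idea (not BigMemory's actual implementation), here is a minimal sketch of an off-heap store. It assumes fixed-size slots inside a single direct buffer and keeps only a small key-to-slot index on the heap; the class name OffHeapCache and its methods are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy off-heap cache: values live in a direct buffer, only the index lives on the heap. */
final class OffHeapCache {
    private final ByteBuffer store;                               // memory outside the Java heap
    private final int slotSize;
    private final Map<String, Integer> index = new HashMap<>();   // key -> slot number
    private final Deque<Integer> freeSlots = new ArrayDeque<>();

    OffHeapCache(int slots, int slotSize) {
        this.slotSize = slotSize;
        this.store = ByteBuffer.allocateDirect(slots * slotSize);
        for (int i = 0; i < slots; i++) {
            freeSlots.push(i);
        }
    }

    /** Copies the value into a slot; returns false if it does not fit or no slot is free. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > slotSize - Integer.BYTES) {
            return false;
        }
        Integer slot = index.get(key);
        if (slot == null) {
            if (freeSlots.isEmpty()) {
                return false;
            }
            slot = freeSlots.pop();
            index.put(key, slot);
        }
        int base = slot * slotSize;
        store.putInt(base, bytes.length);                         // length prefix
        for (int i = 0; i < bytes.length; i++) {
            store.put(base + Integer.BYTES + i, bytes[i]);
        }
        return true;
    }

    /** Copies the value back onto the heap, or returns null if the key is unknown. */
    String get(String key) {
        Integer slot = index.get(key);
        if (slot == null) {
            return null;
        }
        int base = slot * slotSize;
        byte[] bytes = new byte[store.getInt(base)];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = store.get(base + Integer.BYTES + i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Frees the slot for reuse; no garbage collection is involved. */
    boolean remove(String key) {
        Integer slot = index.remove(key);
        if (slot == null) {
            return false;
        }
        freeSlots.push(slot);
        return true;
    }
}
```

Fixed-size slots sidestep fragmentation at the cost of wasted space; a real product would need a smarter allocator and would store serialized objects rather than strings.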

How does ByteBuffer help?

Prior to JDK 1.4, Java programmers had limited options: they could read data into a byte[] and use explicit offsets (along with bitwise operators) to combine bytes into larger entities, or they could wrap the byte[] in a DataInputStream and get automatic conversion without random access.

The ByteBuffer class arrived in JDK 1.4 as part of the java.nio package, and combines larger-than-byte data operations with random access. To construct a simple cache using ByteBuffer, see this article.
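As a small illustration of the difference, the sketch below reads the same big-endian int from a byte[] twice: once with explicit offsets and bitwise operators, and once through ByteBuffer's absolute getInt. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;

final class ByteBufferDemo {
    /** Pre-NIO style: combine four bytes (big-endian) into an int by hand. */
    static int readIntManually(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 24)
             | ((data[offset + 1] & 0xFF) << 16)
             | ((data[offset + 2] & 0xFF) << 8)
             |  (data[offset + 3] & 0xFF);
    }

    /** NIO style: a typed read at an arbitrary position (ByteBuffer is big-endian by default). */
    static int readIntWithBuffer(byte[] data, int offset) {
        return ByteBuffer.wrap(data).getInt(offset);
    }

    public static void main(String[] args) {
        byte[] data = {0, 0, 0, 0, 0x12, 0x34, 0x56, 0x78};
        System.out.println(Integer.toHexString(readIntManually(data, 4)));
        System.out.println(Integer.toHexString(readIntWithBuffer(data, 4)));
    }
}
```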

For those looking for in-depth explanation on the topic, read the article here. It is a long read but it is worth the information gain.

Note: Essentially, both these products manage a contiguous region of memory. Even though the approach described above avoids GC, fragmentation in any contiguous region eventually has a cost. A compaction cycle would happen far less often than a JVM garbage collection cycle, so while it would severely affect performance during the cycle, it would occur fairly rarely.

That brings up another topic: how does the non-heap memory for direct buffers get released? After all, there's no method to explicitly close or release them. The answer is that they get garbage collected like any other object, but with one twist: if you don't have enough virtual memory space or commit charge to allocate a direct buffer, that will trigger a full collection even if there's plenty of heap memory available.


It still makes sense to scale horizontally. Even so, you still leverage vertical scalability with BigMemory, which makes the distributed cache faster with higher density.

Further reading

Friday, February 17, 2012

Humor: Software

Software engineer 1: I am not able to connect to test database.
Please check the problem and restart it.

IT:  Issue resolved

Software engineer 2: What was the fix?

Software engineer 3:  Exorcism…

Keep smiling!
You may also like: Thirst

Sunday, February 12, 2012

Getting started with HTML5 WebSockets and Java - Part 1

Any technology around HTML5 seems to be a hot-button topic these days, and luckily I got an opportunity to take a deep dive into WebSockets. Be it canvas, geolocation, video playback, drag-and-drop or WebSocket, there is a lot of buzz around these upcoming technologies.

Some background on HTML5 WebSockets

HTML5 WebSocket defines a bi-directional, full-duplex communication channel that operates over a single TCP connection. The important thing to note is that the WebSocket API is being standardized by the W3C, and the WebSocket protocol has been standardized by the IETF as RFC 6455.

What this means is that there are a bunch of protocol versions, and today's browsers support only specific protocol versions. For example, Chrome 14, Firefox 7 and Internet Explorer 10 are currently the only browsers supporting the latest draft specification ("hybi-10") of the WebSocket protocol. The same goes for web servers. Different web servers are in varying stages of support for asynchronous messaging, with Jetty, Netty and Glassfish currently being the best options as they provide native WebSocket support.

Tomcat 7 does not support WebSockets yet. Check out the following issue tracker entry to learn more about the current state of affairs in Tomcat 7:

Socket.IO provides a default implementation for Node.JS.

It is expected that HTML5 WebSockets will replace the existing XHR approaches as well as Comet services with a new, flexible and ultra-high-speed bidirectional TCP socket communication technology.

Technical details about WebSocket
  • Uses WebSocket protocol instead of HTTP
  • True full duplex communication channel; UTF8 strings and binary data can be sent in any direction at the same time.
  • It is not a raw TCP socket
  • Connection established by "upgrading" (handshake) from HTTP to WebSocket protocol
  • Runs via port 80/443 and is firewall/proxy friendly
  • Supports WebSocket ws:// and secure WebSocket wss://
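To make the "upgrade" handshake a little more concrete: under RFC 6455 the client sends a Sec-WebSocket-Key header, and the server proves it understood the request by returning a Sec-WebSocket-Accept value derived from that key. The sketch below shows only that derivation (the class and method names are made up, and it needs Java 8 for java.util.Base64); a real server also has to validate the other handshake headers, and older draft versions used a different handshake.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

final class WebSocketHandshake {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Computes Sec-WebSocket-Accept = base64(SHA-1(key + GUID)). */
    static String acceptKey(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455, section 1.3
        System.out.println(acceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
    }
}
```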
Benefits of using WebSockets
  • Reduces network traffic: each message can have as little as 2 bytes of framing overhead
  • Low latency
  • No polling overhead
In tests run by Kaazing Corp, who have been closely involved in the specification process, it was found that "HTML5 Web Sockets can provide a 500:1 or - depending on the size of the HTTP headers - even a 1000:1 reduction in unnecessary HTTP header traffic and 3:1 reduction in latency".

In short: Web Sockets can make your applications faster, more efficient, and more scalable.
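The 2-byte overhead can be seen by building a frame by hand. The following sketch encodes an unmasked (server-to-client) text frame as laid out in RFC 6455, for payloads shorter than 126 bytes only; the class name is made up. Frames sent by a client additionally carry a 4-byte masking key, and longer payloads need an extended length field.

```java
import java.nio.charset.StandardCharsets;

final class TextFrame {
    /** Encodes a short, unmasked text frame: 2 header bytes followed by the UTF-8 payload. */
    static byte[] encode(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 125) {
            throw new IllegalArgumentException("longer payloads need an extended length field");
        }
        byte[] frame = new byte[2 + payload.length];
        frame[0] = (byte) 0x81;            // FIN bit set, opcode 0x1 (text)
        frame[1] = (byte) payload.length;  // mask bit clear, 7-bit payload length
        System.arraycopy(payload, 0, frame, 2, payload.length);
        return frame;
    }

    public static void main(String[] args) {
        byte[] frame = encode("Hello");
        System.out.println("payload bytes: 5, frame bytes: " + frame.length);
    }
}
```

Compare that with the hundreds of header bytes that accompany every HTTP request and response in a polling setup.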

The WebSocket Interface:

interface WebSocket {

  // ready state
  const unsigned short CONNECTING = 0;
  const unsigned short OPEN = 1;
  const unsigned short CLOSING = 2;
  const unsigned short CLOSED = 3;
  readonly attribute unsigned short readyState;

  // event handlers
  attribute Function onopen;
  attribute Function onmessage;
  attribute Function onerror;
  attribute Function onclose;

  // messaging
  boolean send(in data);
  void close();
};

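A typical JavaScript client:

// The host part of the URI is missing in this post; point it at your own WebSocket endpoint
var wsUri = "ws://";
var websocket;

function init() {
 testWebSocket();
}

function testWebSocket() {
 websocket = new WebSocket(wsUri);
 websocket.onopen = function(evt) { onOpen(evt) };
 websocket.onclose = function(evt) { onClose(evt) };
 websocket.onmessage = function(evt) { onMessage(evt) };
 websocket.onerror = function(evt) { onError(evt) };
}

function onOpen(evt) {
 doSend("WebSocket rocks");
}

function onClose(evt) {
 writeToScreen("DISCONNECTED");
}

function onMessage(evt) {
 // evt.data carries the message sent by the server
 writeToScreen('RESPONSE: ' + evt.data);
}

function onError(evt) {
 writeToScreen('ERROR: ' + evt.data);
}

function doSend(message) {
 writeToScreen("SENT: " + message);
 websocket.send(message);
}

function writeToScreen(message) {
 var pre = document.createElement("p");
 pre.innerHTML = message;
 document.body.appendChild(pre);
}

// Open the connection once the page has loaded
window.addEventListener("load", init, false);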

Getting started with WebSockets with Java backend

To build applications around WebSockets, I will focus on Jetty, Netty and Atmosphere. The focus will be on backend processing; jQuery or raw JavaScript can be used as the client. We will work with a sample chat application.

1. Jetty 8

Jetty is a Java-based HTTP server and servlet container. Jetty 8 is a Servlet 3.0 container and provides a WebSocket implementation, so it is possible to offer server push via both the HTTP and WebSocket protocols. Jetty provides its WebSocket implementation as a subclass of HttpServlet. Here is a Jetty server example:

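import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.util.UrlEncoded;
import org.eclipse.jetty.websocket.WebSocket;
import org.eclipse.jetty.websocket.WebSocketServlet;
import com.google.gson.Gson;

@WebServlet(urlPatterns = "/chat", asyncSupported = true)
public class ChatServlet extends WebSocketServlet {

        // Currently open WebSocket connections
        private Queue<ChatWebSocket> webSockets = new ConcurrentLinkedQueue<ChatWebSocket>();
        // Outgoing messages (the declaration is not shown in the original listing)
        private BlockingQueue<String> messages = new LinkedBlockingQueue<String>();

        // GET method is used to establish a stream connection
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                        throws ServletException, IOException {
                // body not shown in the original listing
        }

        // POST method is used to communicate with the server
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                        throws ServletException, IOException {
                // body not shown in the original listing
        }

        public WebSocket doWebSocketConnect(HttpServletRequest request, String protocol) {
                return new ChatWebSocket();
        }

        class ChatWebSocket implements WebSocket.OnTextMessage {

                Connection connection;

                public void onOpen(Connection connection) {
                        this.connection = connection;
                        webSockets.add(this);
                }

                public void onClose(int closeCode, String message) {
                        webSockets.remove(this);
                }

                public void onMessage(String queryString) {
                        // Parses query string
                        UrlEncoded parameters = new UrlEncoded(queryString);

                        Map<String, String> data = new LinkedHashMap<String, String>();
                        data.put("username", parameters.getString("username"));
                        data.put("message", parameters.getString("message"));

                        try {
                                messages.put(new Gson().toJson(data));
                        } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                        }
                }
        }
}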

The advantage of this approach is that WebSocket connections are terminated in the same rich application space provided by HTTP servers, so a WebSocket-enabled web application can be developed in a single environment rather than through collaboration between an HTTP server and a separate WebSocket server.

2. Atmosphere

Atmosphere is a WebSocket/Comet web framework that enables real-time web applications in Java. Atmosphere greatly simplifies real-time web application development and works with servlet containers that do not implement Servlet 3.0 but natively support Comet, such as Tomcat 6. Here is an example:

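import java.io.IOException;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.atmosphere.cpr.AtmosphereHandler;
import org.atmosphere.cpr.AtmosphereResource;
import org.atmosphere.cpr.AtmosphereResourceEvent;
import com.google.gson.Gson;

public class ChatAtmosphereHandler implements
                AtmosphereHandler {

        public void onRequest(AtmosphereResource resource)
                        throws IOException {
                HttpServletRequest request = resource.getRequest();
                HttpServletResponse response = resource.getResponse();

                // GET method is used to establish a stream connection
                if ("GET".equals(request.getMethod())) {
                        // Content-Type header (value assumed; the line is missing from the original listing)
                        response.setContentType("text/plain");
                        // Keep the response open so broadcasts can be streamed to it
                        resource.suspend();

                // POST method is used to communicate with the server
                } else if ("POST".equals(request.getMethod())) {
                        Map<String, String> data = new LinkedHashMap<String, String>();
                        data.put("username", request.getParameter("username"));
                        data.put("message", request.getParameter("message"));

                        // Broadcasts a message
                        resource.getBroadcaster().broadcast(new Gson().toJson(data));
                }
        }

        public void onStateChange(AtmosphereResourceEvent event)
                        throws IOException {
                // Nothing to send if the event carries no message
                if (event.getMessage() == null) {
                        return;
                }

                sendMessage(event.getResource().getResponse().getWriter(), event.getMessage().toString());
        }

        private void sendMessage(PrintWriter writer, String message) throws IOException {
                // default message format is message-size ; message-data ;
                writer.print(message.length());
                writer.print(";");
                writer.print(message);
                writer.print(";");
                writer.flush();
        }

        public void destroy() {
        }
}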


There are many resources on the web describing WebSockets, and many libraries trying to solve the application portability problem. To a developer trying to embrace the upcoming WebSocket technology, it can be confusing and overwhelming.

For others who want to integrate WebSockets into an existing application, there is a dilemma about choosing the framework and technology stack. Building applications around HTML5 WebSockets is going to be tricky for the next few months, until the API and protocols are standardized and the open source community provides native implementations. Using Atmosphere or jWebSocket makes sense, as they abstract out the underlying provider.

I will be writing more about Atmosphere and jWebSocket in my future blog posts.

Further reading:

Part 2: 

Thursday, February 9, 2012

Spring annotations and Ehcache

Ehcache-spring-annotations is a library that simplifies caching in Spring-based applications using the popular Ehcache library. In this article, I will present a simple way to integrate Ehcache into a Spring-based project.

Spring annotations are particularly useful when there is a need to cache methods of an application with minimal code changes and to use configuration to control the cache settings. In such cases, Ehcache Annotations can be used to dynamically configure caching of method return values.

For example, suppose you have a method: Product getProduct(long productId).

Once caching is added to this method, all calls to the method will be cached using the "productId" parameter as a key.

The steps described below work with Spring 3.1.0, Ehcache 2.5.1 and Ehcache-spring-annotations 1.2.0.

Step 1.

Configure maven to include the required libraries.

<!-- Include all spring dependencies -->

Step 2.

Configure Spring. You must add the following to your Spring configuration file in the beans declaration section:

<!-- Ehcache annotation config -->
<ehcache:annotation-driven cache-manager="ehCacheManager"/>

<bean id="ehCacheManager" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
 <property name="configLocation">
  <!-- Location assumed from Step 3; the value is missing from the original listing -->
  <value>/WEB-INF/ehcache.xml</value>
 </property>
</bean>

Step 3.

Configure ehcache.xml and put it in /WEB-INF/ or on the classpath.

<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd" updateCheck="false">

    <defaultCache eternal="false" maxElementsInMemory="1000"
        overflowToDisk="false" diskPersistent="false" timeToIdleSeconds="0"
        timeToLiveSeconds="600" memoryStoreEvictionPolicy="LRU"/>

    <cache name="product" eternal="false"
        maxElementsInMemory="100" overflowToDisk="false" diskPersistent="false"
        timeToIdleSeconds="0" timeToLiveSeconds="300"
        memoryStoreEvictionPolicy="LRU" />

</ehcache>


 If you are not familiar with configuring EhCache please read their configuration guide.
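To get a feel for what settings such as maxElementsInMemory, timeToLiveSeconds and the LRU eviction policy mean, here is a small plain-Java sketch of the same behaviour. It is only an illustration and not how Ehcache is implemented; the class LruTtlCache is hypothetical, and the clock is passed in explicitly to keep the example easy to test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Bounded cache with LRU eviction and a fixed time-to-live per entry. */
final class LruTtlCache<K, V> {
    private static final class Timed<V> {
        final V value;
        final long expiresAtMillis;

        Timed(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<K, Timed<V>> map;

    LruTtlCache(final int maxElements, long ttlSeconds) {
        this.ttlMillis = ttlSeconds * 1000;
        // accessOrder = true keeps the least recently used entry first
        this.map = new LinkedHashMap<K, Timed<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Timed<V>> eldest) {
                return size() > maxElements;   // evict once the limit is exceeded
            }
        };
    }

    synchronized void put(K key, V value, long nowMillis) {
        map.put(key, new Timed<V>(value, nowMillis + ttlMillis));
    }

    /** Returns the cached value, or null if it is absent or older than the TTL. */
    synchronized V get(K key, long nowMillis) {
        Timed<V> entry = map.get(key);
        if (entry == null) {
            return null;
        }
        if (nowMillis >= entry.expiresAtMillis) {
            map.remove(key);
            return null;
        }
        return entry.value;
    }
}
```

With new LruTtlCache<Long, String>(100, 300), entries behave roughly like those of the "product" cache above: at most 100 of them are kept, and each is dropped 300 seconds after it was stored.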

Step 4.

Add the annotation to the methods you would like to cache. Let's assume you are using the Product getProduct(long productId) method from above.

@Cacheable(cacheName = "product")
public List getProduct(long productId) {
        Query query = entityManager.createQuery(" from Product where productId = :productId");
        query.setParameter("productId", productId);
        return query.getResultList();
}

The @Cacheable annotation can be placed on a method of an interface, or on a public method of a class.

Note: The cache name should match the cache name defined in ehcache.xml. Multiple cache names can be defined in ehcache.xml.

What is the Spring annotations library doing in the background? The library's method interceptor is where all the work is done. It handles invocations of methods annotated with @Cacheable. If the key is not already present in Ehcache, it simply calls the method and stores the returned value.

//See if there is a cached result
final Element element = cache.getWithLoader(cacheKey, null, methodInvocation);
if (element != null) {
    final Object value = element.getObjectValue();

    final boolean ignoreValue = cacheInterceptor.preInvokeCachable(cache, methodInvocation, cacheKey, value);
    if (!ignoreValue) {
        return value;
    }
}

//No cached value or exception, proceed
final Object value;
try {
    value = methodInvocation.proceed();
} catch (Throwable t) {
    //Exception handling is not shown in this excerpt
    throw t;
}

//shouldCache is computed elsewhere in the interceptor (not shown in this excerpt)
if ((value != null || cacheableAttribute.isCacheNull()) && shouldCache) {
    cache.put(new Element(cacheKey, value));
}

return value;
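Stripped of the Ehcache and AOP plumbing, the pattern in the excerpt above is a plain look-aside cache: check the cache, call the real method only on a miss, then store the result. The following self-contained sketch shows that flow with a ConcurrentHashMap standing in for Ehcache; CachingWrapper is a made-up name, and unlike the library it never caches null results.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Wraps a function so that each key is computed at most once per cached value. */
final class CachingWrapper<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> target;

    CachingWrapper(Function<K, V> target) {
        this.target = target;
    }

    V invoke(K key) {
        // See if there is a cached result
        V cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        // No cached value: proceed to the real method
        V value = target.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }
}
```

Wrapping a lookup such as id -> "product-" + id and invoking it twice with the same id runs the lookup only once; the second call is served from the map.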

Simple, isn't it!

Sunday, February 5, 2012

Amazon's Jungle move

The rumors had been in the air for a long time, since as early as 2009. And it happened on Thursday, February 2, 2012, when Junglee, an Amazon-powered comparison shopping portal, was launched. There may have been last-minute glitches that pushed the release to February 2 instead of February 1.

Of the many products and services that Amazon owns, launching an aggregation and comparison portal is an interesting move. It had been tried by a few Indian portals in the past, but without great success. Amazon has not done it before in any other geography. One of the reasons could be the large number of shoppers visiting Amazon from India. By leveraging Junglee, Amazon can build a sizable customer base for its yet-to-be-launched marketplace.

For online shoppers, Junglee can turn out to be the starting point for the majority of transactions. Rich product descriptions, quality product reviews and product recommendations have always helped customers make informed purchase decisions.

Looking at Junglee, some immediate points can be made:
  • A huge product selection offered on a Beta website. No API support yet.
  • Of the 1.2 crore products listed, 90 lakh are books.
  • The product listings are mainly from off-line sellers. Very few of the sellers listed have a serious e-commerce presence.
With the law on multi-brand retail still pending, an Amazon e-commerce store in India is some distance away. But what will Amazon do with Junglee when that store is launched?

Indeed, Amazon is a marketplace where it competes with other online sellers on its own platform. But in the current setup, it is unlikely that Junglee will list prominent Indian e-retailers and drive traffic to them. Of course, Junglee will drive traffic to Amazon, apart from serving as the starting point for online shopping.

It could be a win-win situation for brick-and-mortar stores in the short term if the prices are competitive and product quality is good. Smaller cities have very few branded stores like Gitanjali, The Bombay Store and Fabindia. Shoppers from these cities are already embracing online shopping very quickly.

In the past, Amazon has invested in niche shopping portals and content websites, but comparison shopping portals are not really an Amazon thing. There seems to be a deeper agenda.

Amazon has some serious competition in India. And there is a lot at stake. India is the world’s third largest e-commerce market, trailing China and the U.S. Indian online sales have more than doubled from around $4 billion in 2009 to nearly $10 billion in 2011, according to The Economic Times of India. Nearly $350 million had been poured into 40 Indian e-commerce start-ups as of year-end 2011, compared to $43 million in 11 companies two years prior.

There was speculation that Amazon would buy out an Indian e-retailer, but it did not happen. With the amount of money some e-retailers have been throwing at advertising recently, it began to look as if the online commerce winner would be the one who promotes its brand the most on TV, in print and in other traditional media.

And this could be another reason for launching Junglee: to shake the tree and create some reasons to worry or even panic. Perhaps Amazon is targeting a specific competitor. True or not, there are bound to be some ripples. E-retailers with nothing new to offer will fold. Established names will strengthen their foothold.

Whatever it is, the landscape is going to be more competitive. Customers will be spoilt for choice and price will not be the key differentiator, as is the case today.

Monday, January 30, 2012

Managing your virtual social world trail

By staying signed in to Google while searching and browsing websites, a user leaves a digital trail behind. This trail is used by many companies to create a user profile. Richer user behavior profiles are very useful to advertisers, as they can place personalized ads on the websites you view.

There are some simple ways to stop this. The methods explained below will only work for advertisers who are part of the Network Advertising Initiative and follow the industry privacy standards for online advertising.

Note: Before following the steps described below, ensure that you are signed in.

Google

1. Visit ads on the web. If you do not want Google to 'Make the ads you see on the web more interesting', click on 'Opt Out'.

2. Visit ads preference. This is the profile Google created for you based on your search and browse history. Google even guessed your age and gender. You can delete all the information from here and 'Opt Out'.

You can also download the Advertising cookie opt-out plugin to permanently block personalized ads.

Other advertisers

1. Visit NAI. Using their online tool, you can examine your computer to identify those member companies that have placed an advertising cookie file on your computer. Don't be surprised to see 50 or more cookies from different advertisers. Now, did you know that? You can 'Select All' and 'Submit' the form to clear all the cookies.

2. Visit Aboutads. Again, you can 'opt out' of ad networks.

Facebook

1. Visit Account Settings. Click on the 'Apps' tab. Delete all infrequently used apps. For the remaining apps, set the appropriate privacy level by clicking on 'Edit'. For example, if you use 'Washington social post reader', by default all posts and activity from this app are visible on Facebook. To disable this, select 'Only Me' in privacy settings.

Orkut

Orkut was once very popular in India. If you do not use it anymore and have switched camps, you are not alone. But your profile, scraps, photos etc. are still public. To delete your Orkut account, visit this link.

If you would like to share more privacy tips, please leave a comment.

Also read: The Internet, social media and privacy.

The Internet, Social Media and Privacy

Very recently, Google signaled its intent to begin correlating data about its users' activities across all of its most popular services and across multiple devices. The goal: to deliver those richer behavior profiles to advertisers.

Likewise, Facebook announced it will soon make Timeline - the new, glitzier user interface for its service - mandatory. Timeline is designed to chronologically assemble, automatically display and make globally accessible the preferences, acquaintances and activities for most of Facebook's 800 million members.

Combined with the addition last week of some 60 apps specifically written for Timeline, consumers can provide a detailed account, often in real time, of the music they listen to, what they eat, where they shop - even where they jog.

The driver: advertising revenue. What this tells us is that there is a lot of money at stake here. The global on-line advertising market is expected to swell to $132 billion by 2015, up from $80 billion this year, according to eMarketer. As such, it is dangerous for two companies to hold so much personal data. What this also tells us is that there is a significant shift in the way we interact with the Internet and with social networks, or 'publics' in general terms, today. We are in the middle of three trends.

1. From Anonymity to Real identity

Social media has become a part of our daily lives. The things we do on social networking websites and mobile devices are increasingly about who we are.

2. From wisdom of crowd to wisdom of friends

Earlier, the Internet gave you information anonymously; it was not personalized for you. But these days, we are more influenced by the wisdom of our friends than by the wisdom of the crowd.

3. From being receivers to broadcasters

Going back in time, one had to be rich or powerful or famous in order to have a voice. But now, the power of being a broadcaster is with everyone.

How do these trends affect us? As with most changes, this freely available freedom of expression has its good and bad sides.

The Internet and social media have given us a powerful tool to speak out and be heard. Information is personalized and quickly available. Reaching out to unknown people is simple and quick. Collaborating and sharing ideas has never been so easy.

But more disquieting are the negatives. What is shocking is that some are not even aware of them. Those who are aware choose to silently ignore them. Some create barriers, requiring effort to understand the published information, but still go on to publish. Importantly, once information is published, regardless of our expectations, it is available and mostly remains that way. Even so, publishing personal information has become second nature to us. Teenagers are the most vulnerable to the negative effects of social networking.

These social networks (and malicious software, ISPs, etc.) keep track of all interactions on their sites and save them for later use. It is now possible to reconstruct a person's life without paying a dime or hiring a detective agency. Apps like Timeline are all you need.

A complete user profile can be created and sold to advertisers. Richer personal details are very beneficial to identity thieves and cyberspies, as well as to parties motivated to use such data unfairly against consumers, such as insurance companies, prospective employers, political campaigners and, lately, hacktivists.

With the advent of social networking websites, or 'publics' in general, we have become more uninhibited and often let others know more than is really required. Mature users practice self-governance, but a fair percentage of users do not. "I just checked into a restaurant!" - Well, good for you, but did you ever think about a possible security threat?

“The Breakup Notifier” is another example of a Facebook “cyberstalking” app that has recently been taken down. Essentially, the application notifies users when a person breaks up with their partner through Facebook, allowing users to instantly become aware of their friends' romantic activities. Thousands had used the app within 36 hours of its launch.

Facebook recently made sharing even easier by automatically sharing what you're doing on Facebook-connected apps. Instead of having to “Like” something to share it, you'll just need to click “Add to Timeline” on any website or app, and that app will have permission to share your activity with your Facebook friends. What activity, you may ask? It could be the news articles you read online, the videos you watch, the photos you view, the music you listen to, or any other action within the site or app. Facebook calls this auto-sharing “Gestures.” Be careful, for it may cause you embarrassment.

In web usage mining parlance, these companies are already using clickstream data for marketing (cloaking it under the term 'relevant content'), but now they are openly publishing this data for everyone to see. And that's called killing two birds with one stone.

As the commonly used phrase goes, 'Your reputation precedes you': knowing someone and forming opinions about them has become quicker.

Well, one thing is for sure: we will be served with relevant advertisements soon!
[For those who don't know, in September of 2003, adjacent to a New York Post article about a gruesome murder in which the victim’s body parts were stashed in a suitcase, Google listed an ad for suitcases. Since that incident, Google has improved its filters and automatically pulls ads from pages with disturbing content.]

Also read: Managing your virtual social world trail.

Sunday, January 29, 2012

Suffering-Oriented Programming

While exploring the Flume architecture, I came across a presentation called 'Become Efficient or Die: The Story of BackType' that coined a new term - 'suffering-oriented programming'. It is a simple concept which means:
  • Don’t add process until you feel the pain of not having it.
  • Don’t build new technology until you feel the pain of not having it.
  • First make it possible. Then, make it beautiful. Then, make it fast.
Growing from 2 people to 3 people.

Other interesting points from the presentation:

Over-engineering = Attempting to create beautiful software without a thorough understanding of the problem domain.
Premature optimization = Optimizing before creating “beautiful” design, creating unnecessary complexity.
Refactoring and reducing technical debt = Garbage collection for the code base.

Technical debt:
  • W needs to be refactored
  • X deploy should be faster
  • Y needs more unit tests
  • Z needs more documentation
Such issues are never high priority to work on, but they build up and slow you down.

The presentation is available here.

Tuesday, January 24, 2012

The changing nature of Capitalism

We love our iPhones and iPads.

And that's why it's disconcerting to remember that the low prices of our iPhones and iPads - and the super-high profit margins of Apple - are only possible because our iPhones and iPads are made with labor practices that would be illegal in the United States.

The manufacturing processes of Apple and other electronics companies have come into sharp focus of late, with the revelation of details about how difficult life is for the Chinese workers who make the world's gadgets.

Here are some details:
  • Foxconn, one of the companies that builds iPhones and iPads (and products for many other electronics companies), has a factory in Shenzhen that employs 430,000 people.
  • According to estimates, about 5% of the workers are underage.
  • The official work day in China is 8 hours long, but the standard shift is 12 hours. Generally, these shifts extend to 14-16 hours, especially when there's a hot new gadget to build. 
  • The workers stay in dormitories. There are 15 beds, stacked like drawers up to the ceiling, in a 12-by-12 cement cube of a room.
The workers are paid ~$1 per hour or less. Manufacturing an iPhone in the United States would cost about $65 more than manufacturing it in China, where it costs an estimated $8. This additional $65 would dent the profit Apple makes on each iPhone, but it wouldn't eliminate it. (The iPhone average selling price is about $600, and Apple's average gross margin is about 40%. So Apple's gross profit on each iPhone is probably in the neighborhood of $250.)
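The per-phone arithmetic above can be checked with a short sketch. It uses only the rough figures quoted in the paragraph ($600 average selling price, 40% gross margin, $65 extra cost for US manufacturing); the function names are made up for illustration, and it assumes the extra $65 comes straight out of gross profit.

```python
# Figures quoted in the paragraph above; all of them are rough estimates.
AVG_SELLING_PRICE = 600.0  # average iPhone selling price, USD
GROSS_MARGIN = 0.40        # Apple's average gross margin
EXTRA_US_COST = 65.0       # extra cost of manufacturing in the US, USD


def gross_profit(price: float, margin: float) -> float:
    """Gross profit per unit implied by a selling price and a gross margin."""
    return price * margin


def margin_after_extra_cost(price: float, margin: float, extra_cost: float) -> float:
    """Gross margin that remains if the per-unit cost rises by extra_cost."""
    return (gross_profit(price, margin) - extra_cost) / price


if __name__ == "__main__":
    profit_now = gross_profit(AVG_SELLING_PRICE, GROSS_MARGIN)
    profit_us = profit_now - EXTRA_US_COST
    margin_us = margin_after_extra_cost(AVG_SELLING_PRICE, GROSS_MARGIN, EXTRA_US_COST)
    print(f"Gross profit per iPhone (made in China): ${profit_now:.0f}")
    print(f"Gross profit per iPhone (made in the US): ${profit_us:.0f}")
    print(f"Gross margin if made in the US: {margin_us:.1%}")
```

With these inputs the gross profit works out to $240 per phone, consistent with the "neighborhood of $250" estimate, and US manufacturing would leave roughly $175 per phone, a gross margin of about 29%.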

But the reason Apple makes iPhones and iPads in China is not just about money. The real reasons are as follows:
  • Most of the components of iPhones and iPads - the supply chain - are now manufactured in China, so assembling the phones half-a-world away would create huge logistical challenges. It would also reduce flexibility - the ability to switch easily from one component supplier or manufacturer to another.
  • China's factories are now far bigger and more nimble than those in the United States. They can hire (and fire) tens of thousands of workers practically overnight. Because so many of the workers live on-site, they can also press them into service at a moment's notice. And they can change production practices and speeds extremely rapidly.
  • China now has a far bigger supply of appropriately-qualified engineers than the U.S. does - folks with the technical skills necessary to build complex gadgets but not so credentialed that they cost too much.
  • And, lastly, China's workforce is much hungrier and more frugal than its counterpart in the United States.

Marx made it clear that capitalism could not exist unless the worker produced a value greater than his or her own subsistence requirements. If a day's labor was required in order to keep a worker alive for a day, capital could not exist, for the day's labor would be exchanged for its own product, and capital would not be able to function as capital and consequently could not survive. If, however, a mere half-day's labor is enough to keep a worker alive during a whole day's labor, then surplus value results automatically.

This surplus value does not arise in exchange, but in production. Thus the aim of production, from the capitalist's standpoint, is to get surplus value out of each worker. This is what Marx meant by the "exploitation of labor." Exploitation exists because the extra value contributed by labor is expropriated by the capitalist. Surplus value arises not because the worker is paid less than he is worth but because he produces more than he is worth.

Karl Marx was right in claiming that globalization, unfettered financial capitalism, and redistribution of income and wealth from labor to capital, could lead capitalism to self-destruct. As he argued, unregulated capitalism can lead to regular bouts of over-capacity, under-consumption and the recurrence of destructive financial crises, fueled by credit bubbles and asset-price booms and busts.

Marx argued capitalism had an internal contradiction that would cyclically lead to crises, and that contradiction - at minimum - would place intense pressure on the economic system.

The "exploitation of labor" has effectively moved from the US to China, though its extent differs. And it would move to another country after China. The class struggle has also begun. High unemployment and stagnant wages, according to economist Nouriel "Dr. Doom" Roubini, were triggered by the failure of laissez-faire, unregulated capitalism and free markets.

Companies are motivated to minimize costs, save money, and stockpile cash, but this leaves less money in the hands of employees, which means they have less to spend and less flows back to companies, thereby weakening the capitalist system.

The bottom line is that iPhones and iPads cost what they do because they are built using labor practices that would be illegal in this country - because people in this country consider those practices grossly unfair.

That's not a value judgment. It's a fact.

So, next time you pick up your iPhone or iPad, ask yourself how you feel about that.