Friday, December 30, 2011

Time series and OpenTSDB


OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

This article will help you gather metrics and send them to an OpenTSDB server. Detailed instructions on setting up OpenTSDB can be found here.

First, register the metric.

./tsdb mkmetric order.count.1m

A simple MySQL collector.

cat > orders.sh <<\EOF
#!/bin/bash
set -e
while true; do
  echo "Querying orders...."
  mysql -u user -ppassword -Ddatabase --batch -N --execute "select count(orderdate) from ORDERS where orderdate > date_sub(now(), INTERVAL 1 MINUTE)" \
  | awk -F"\t" -v now="$(date +%s)" -v host="$(hostname)" '{ print "put order.count.1m " now " " $1 " host=" host }' \
  | nc -w 5 tsdb.host.name tsdb.host.port \
  || exit  # To handle SIGPIPE properly.
  sleep 60
done
EOF

chmod +x orders.sh
nohup ./orders.sh &

This shell script polls the MySQL database every minute, gets the order count for the last minute and publishes it to the TSDB server. Open the TSDB screen, select "order.count.1m" as the metric and you should see a graph like the following:


Most data collection work can be accomplished by writing simple shell or Python scripts. Here is a code snippet that can help you publish metrics using Java.


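import java.io.IOException;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.Socket;
import java.net.UnknownHostException;

public class OrderCountPublisher {

   // Usage: java OrderCountPublisher <tsdb-host> <tsdb-port>
   public static void main(String[] args) throws Exception {

       Socket echoSocket = null;
       PrintWriter out = null;

       String host = args[0];
       int port = Integer.parseInt(args[1]);
       // OpenTSDB expects at least one tag on every data point.
       String localHost = InetAddress.getLocalHost().getHostName();

       while (true) {
           try {
               if (echoSocket == null) {  // Only open the connection if necessary.
                    echoSocket = new Socket(host, port);
                    out = new PrintWriter(echoSocket.getOutputStream(), true);
               }

               int orderCount = getOrderCount(60); // Orders in the last 60 sec
               out.println("put order.count.1m " + (System.currentTimeMillis() / 1000)
                       + " " + orderCount + " host=" + localHost);
               // PrintWriter swallows write errors; checkError() flushes and reports them.
               if (out.checkError()) {
                   throw new IOException("write to TSD failed");
               }
           } catch (UnknownHostException e) {
               System.err.println("Don't know about host: " + host);
               return;
           } catch (IOException e) {
               System.err.println("Couldn't get I/O for the connection to: " + host);
               // Drop the connection so that the next iteration reconnects.
               if (out != null) out.close();
               if (echoSocket != null) {
                   try { echoSocket.close(); } catch (IOException ignored) { }
               }
               out = null;
               echoSocket = null;
           }

           System.out.println("Sleeping for 60 s");
           Thread.sleep(60000);
       }
   }

   public static int getOrderCount(int interval) {
       .........
       return orderCount;
   }
}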
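
Both collectors above write the same kind of line to the socket: the word put, then the metric name, a Unix timestamp in seconds, the value, and one or more tag=value pairs. Here is a minimal sketch of a helper that builds such a line. The class and method names (PutLine, format) are my own for illustration and are not part of OpenTSDB.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of OpenTSDB): builds one line of the
// telnet-style "put" command that the collectors write to the socket.
class PutLine {

    static String format(String metric, long epochSeconds, long value,
                         Map<String, String> tags) {
        if (tags.isEmpty()) {
            throw new IllegalArgumentException("at least one tag is required");
        }
        StringBuilder sb = new StringBuilder("put ")
                .append(metric).append(' ')
                .append(epochSeconds).append(' ')
                .append(value);
        // Copy into a TreeMap so the tags come out in a stable, sorted order.
        for (Map.Entry<String, String> tag : new TreeMap<>(tags).entrySet()) {
            sb.append(' ').append(tag.getKey()).append('=').append(tag.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> tags = new TreeMap<>();
        tags.put("host", "web01");
        // Prints: put order.count.1m 1325203200 42 host=web01
        System.out.println(format("order.count.1m", 1325203200L, 42, tags));
    }
}
```

Keeping the formatting in one place makes it harder to forget the tag or to send milliseconds instead of seconds.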

More collectors are available on GitHub.

There are many good things about TSDB like:
  • Simple to set up and get started.
  • Easy to build dashboards. Plot and compare multiple metrics on the same graph.
  • Simple HTTP API to query the time series database using "/q" option. It can produce results in HTML, JSON, PNG or plain text. This way, you can use the data for numerical analysis or use a different plotting library than the default Gnuplot used by OpenTSDB.
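
As a rough illustration of the "/q" option, here is a sketch that builds a query URL asking for plain-text output and pulls the values out of the response. It assumes the query parameters start, m (aggregator:metric) and ascii, and a response with one "metric timestamp value tag=value" data point per line, which is how I understand the current HTTP interface; check both against your OpenTSDB version. The class and method names are made up for this example, and the HTTP call itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: TsdbQuery is a made-up name, not an OpenTSDB class.
class TsdbQuery {

    // Builds a /q URL that asks for plain-text output (assumed parameters:
    // start, m=<aggregator>:<metric>, ascii).
    static String asciiUrl(String host, int port, String start,
                           String aggregator, String metric) {
        return "http://" + host + ":" + port + "/q?start=" + start
                + "&m=" + aggregator + ":" + metric + "&ascii";
    }

    // Extracts the values from a plain-text response, assuming one
    // "metric timestamp value tag=value ..." data point per line.
    static List<Double> values(String asciiBody) {
        List<Double> result = new ArrayList<>();
        for (String line : asciiBody.split("\n")) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 3) {
                result.add(Double.parseDouble(fields[2]));
            }
        }
        return result;
    }
}
```

With the values in a plain list, you can feed them to whatever analysis or plotting library you prefer.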
It would be great to see some enhancements like:
  • More aggregation functions like percentiles etc.
  • Better support for counters, sampling and debugging.
  • User guide.
  • Export to more formats like CSV, Excel etc.

StatsD is another great tool for metrics aggregation and plotting. More about it in my future blog post.

Tuesday, December 20, 2011

Globalization!

Shared by my friend:

My wife wanted new knitting needles, as aluminum needles, being relatively heavy, were proving to be painful for her. After some research, she decided on buying bamboo needles, which are lightweight. She could not find them online in India, so she ordered them from a UK-based e-commerce website and had them shipped to her friend in London.

She was super excited after her friend arrived and handed her the needles. She quickly opened the parcel and looked at the plastic wrapper to realize the needles were "Made in India"!


Well, the needles flew all the way from India to the UK and back!

Sunday, December 18, 2011

The Principles!

Robustness Principle: "be conservative in what you do, be liberal in what you accept from others".

This principle encourages excellence in design and maximizes interoperability. Its practical meaning is:
  • when designing software, be careful to be compliant with the protocols specified in the RFCs and STDs (Internet-related standards documents).
  • do not introduce features or proprietary extensions to protocols that will be incompatible with other products or systems that are compliant.
  • design your software to be tolerant of products that may not be completely compliant with Internet, Unix and Linux standards.
  • where the standards are vague, accept as wide a range of reasonable operation as possible.
The principle is still honored today and is frequently mentioned in modern day Internet standards documents (RFC 2015 is just one example). Compatibility is a tradition in the world of the Internet, Unix and now Linux.
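
To make the principle concrete, here is a toy sketch of my own (not taken from any RFC) that handles a "Name: value" header line. The parser tolerates odd letter case and stray whitespace, while the writer always emits a single canonical form. The class name HeaderLine and the choice of lower-case names are assumptions made for the example.

```java
import java.util.Locale;

// Toy illustration of the robustness principle on a "Name: value" line.
class HeaderLine {
    final String name;
    final String value;

    private HeaderLine(String name, String value) {
        this.name = name;
        this.value = value;
    }

    // Liberal in what we accept: any letter case, and stray whitespace
    // around the name, the colon and the value.
    static HeaderLine parse(String raw) {
        int colon = raw.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("not a header line: " + raw);
        }
        String name = raw.substring(0, colon).trim().toLowerCase(Locale.ROOT);
        if (name.isEmpty()) {
            throw new IllegalArgumentException("empty header name: " + raw);
        }
        String value = raw.substring(colon + 1).trim();
        return new HeaderLine(name, value);
    }

    // Conservative in what we send: one canonical form, always.
    String emit() {
        return name + ": " + value;
    }
}
```

Being liberal does not mean accepting anything: a line with no colon or no name is still rejected, because there is no reasonable way to interpret it.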

Douglas Engelbart's law of maturity: "The rate at which a person can mature is directly proportional to the embarrassment he can tolerate."

Excerpts from the book "The Engelbart Hypothesis: Dialogs with Douglas Engelbart"

"And then I’d realize, “Boy, that’s just the way I often sound.” When I was in the service I had time to think through a lot of things. I generated a sort of algorithm: the rate at which a person can mature is directly proportional to how much embarrassment he can tolerate. And I realized that embarrassment didn’t seem to bother me very much, because of my upbringing and the perspective I had about the world. Something Benjamin Franklin wrote was so beautiful, “You wouldn’t worry half so much about what other people thought about you if you realized how seldom they did,” and I’d say, “Oh, that’s right.” I seem to have a lot of intuitive capability. I just don’t mind at all not being able to explain to people how I reached something. It doesn’t bother me."

Locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources within relatively small time durations. Spatial locality refers to the use of data elements within relatively close storage locations.
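
A small sketch of my own shows both kinds. The two methods below add up the same matrix; only the traversal order differs. The running total is reused on every iteration (temporal locality), and the row-by-row version walks neighbouring elements of one row array (spatial locality), while the column-by-column version jumps to a different row array on every access. The class name is made up, and the timings printed by main will vary with the JVM and the hardware, so treat them as an experiment rather than a benchmark.

```java
import java.util.Arrays;

// Sums the same matrix twice; only the traversal order differs.
class LocalityDemo {

    // Row by row: consecutive accesses touch neighbouring elements.
    static long sumRowMajor(int[][] m) {
        long sum = 0;
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[r].length; c++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    // Column by column: each access lands in a different row array.
    // Assumes a rectangular, non-empty matrix.
    static long sumColumnMajor(int[][] m) {
        long sum = 0;
        for (int c = 0; c < m[0].length; c++) {
            for (int r = 0; r < m.length; r++) {
                sum += m[r][c];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 2048;
        int[][] m = new int[n][n];
        for (int[] row : m) {
            Arrays.fill(row, 1);
        }
        long t0 = System.nanoTime();
        long a = sumRowMajor(m);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(m);
        long t2 = System.nanoTime();
        System.out.printf("row-major:    %d in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("column-major: %d in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```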

Wednesday, December 14, 2011

New age and electronic music

New Age music is music of various styles intended to create artistic inspiration, relaxation, and optimism. It is used by listeners for yoga, massage, meditation and reading as a method of stress management or to create a peaceful atmosphere in their home or other environments, and is often associated with environmentalism and New Age spirituality.

'Beautiful' by Ryan Farish is pretty special. Listening to this album is like taking a tour through new age and electronic music. There’s some of Enigma’s arpeggio bells; a few chanting voices from Adiemus; some pleasant “whoosh” sounds from Jean Michel Jarre; and it’s all topped off with some tastefully applied pan flute and simple Jim Brickman piano melodies. And yet, I think this is pretty special stuff. The compositions are nice and elegant; the melodies are simple but enriched by some intricate electronic noodling. It's really kind of intoxicating.

Enjoy 2 beautiful videos from this album.



Friday, December 9, 2011

The wifey confusion!

Friend 1: How is wify doing ?
Friend 2: do not know.. its data card...
its uses different network... something like UMTS or HDPS
Friend 1: hahahaha, wify means wife dear, not wi fi!
Friend 2: Oh!



Thursday, December 8, 2011

Interesting times in Indian E-Retailing

These are interesting times for E-Commerce in India.

Companies are getting funded by the dozens, spending money on TVCs and burning investor money, distributing gift coupons as if they were cheaper than water, and what not. In short, e-commerce is catching up!

So, an e-commerce company X, famous for selling books online, recently started offering a replacement guarantee for electronic products.

Company Y, another e-commerce player in electronics space, responded with:

In these fascinating times, Taggle.com shut shop, stating competition to be too cut-throat for building a profitable business.

One e-commerce website, which had sold electronics and gadgets since its inception, entered another category - "Toys"!

A milestone day for India’s e-commerce industry was observed recently. While most sites sell mobiles, tablets, perfumes, etc., eBay went a step further and sold a ‘milk giving black young buffalo 20 liter per day’.

Apart from a free shipping offer (national courier), the deal also came with a 7-day exchange offer!


Deal websites are not just about deals anymore. They have entered mainstream e-retailing.

Very recently, Bollywood felt left out. Yes, Bollywood. After all, who would want to miss the e-commerce money-making race?

Karishma Kapoor recently invested in an e-commerce startup that operates a retail website for kids.  Ajay Devgan launched an e-ticketing and film entertainment site Ticketplease.com in January this year. Other investors in this venture include Sanjay Dutt, Bollywood producer Nitin Manmohan, Vatsal Seth, Ramakant Tibrewal, Vijay Jain and Amit Sharma of Roha JP Group. Seth has also set up Celebwears.com with actor Sohail Khan. Surely, Indian celebrities are now reaching beyond endorsements and turning into full-fledged angel investors focused on start-ups.

Finally, not all is well. dearflipkart.blogspot.com will tell you why. Interesting name for a blog, isn't it? Read the article titled Dear Flip****, Your Customer Service Has Started Sucking. Shoppers are unhappy to the extent that they have dedicated blogs and websites to voicing their frustration.

And it is just the beginning!

Thirst!

I wake up in the middle of the night, thirsty. I drink water. It doesn't quench my thirst. I drink more, and more, and more. But the thirst refuses to be quenched, like a wildfire refusing to be doused by the gallons of water dropped over it by a helicopter.

It finally dawns on me: I'm thirsty for real, but I'm drinking water in my dreams. I sigh, relieved. I decide to wake up, pick up the bottle, and quench my thirst for real this time.

-- From Rajbir Bhattacharjee's blog


Sunday, December 4, 2011

NoSQL - Right tool for the right job!

First of all, the name: NoSQL is not "Never SQL" or "No To SQL". It is generally expanded to "Not only SQL".

There is a crazy number of software products out there that are part of the 'NoSQL' movement. Some of the popular ones are listed below:

Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB
CouchDB, MongoDB
Neo4j, InfoGrid, InfiniteGraph
Cassandra, HBase, Riak
Oracle Coherence, db4o, ObjectStore, GemStone, Polar

NoSQL software can be broadly divided into 4 categories:
  1. Key-value stores
    • Based on Amazon's Dynamo paper
    • Collection of key-value pairs
    • Example: Voldemort,  Tokyo Cabinet
  2. BigTable clones
    • Based on Google's BigTable paper.
    • Big table, column families
    • Example: HBase, Hypertable,  Cassandra
  3. Document database
    • Inspired by Lotus Notes
    • Collection of documents, each of which is itself a collection of key-value pairs
    • Example: MongoDB, CouchDB
  4. Graph Databases
    • Inspired by Euler and graph theory
    • Nodes, relations and key-value on both.
    • Example: Neo4j, Sones, AllegroGraph
NoSQL is primarily about scalability. Generally, when we talk about scalability, we think about data size: from gigabytes to petabytes and maybe more. There is another dimension to scalability: the complexity of the data. So there are two axes to scalability: size and complexity.

Scaling size: Dealing with more and more information that is roughly similar in nature.
Scaling complexity: Dealing with data that is messier and more semi-structured. The categories above are mapped in the graph below.


As you can see, the focus of these products is very different.
Key-value stores: Simplistic data model that can massively scale.
Graph database: Capable of handling complex connected data; difficult to achieve horizontal scalability.

Typical applications:

Key-value store: Storing a customer's shopping cart in an e-commerce application. All lookups are by key (the customer id) and, in most cases, ad-hoc queries are not needed, which makes a key-value store an ideal home for shopping cart data.
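
A minimal sketch of the shopping cart case, using an in-memory map as a stand-in for a real key-value store. The class and method names are hypothetical; the point is only that every operation goes through the customer id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a key-value store (hypothetical names).
class CartStore {
    private final Map<String, List<String>> store = new HashMap<>();

    // Read the whole cart by key; an unknown customer has an empty cart.
    List<String> get(String customerId) {
        return store.getOrDefault(customerId, new ArrayList<>());
    }

    // Write the whole cart back under the same key.
    void put(String customerId, List<String> cart) {
        store.put(customerId, new ArrayList<>(cart));
    }

    // Read-modify-write: the only access path is the customer id.
    void addItem(String customerId, String sku) {
        List<String> cart = new ArrayList<>(get(customerId));
        cart.add(sku);
        put(customerId, cart);
    }
}
```

Nothing here ever asks "which carts contain this SKU?", and that missing query is exactly what lets the store stay simple and scale out.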

BigTable clones: You have a news site where any piece of content (articles, comments, author profiles) can be voted on, with an optional comment supplied on the vote. You create one store per user and one store per piece of content, using a UUID as the key (generating one for each piece of content and each user). The user's store holds every vote they have ever made, while the content store holds a copy of every vote that has been made on that piece of content. Overnight, you run a batch job that identifies the content each user has voted on and generates, for each user, a list of content that has high votes but which they have not voted on. You then push this list of recommended articles into the user's store.
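
The voting scheme can be sketched in a few lines, with two in-memory maps standing in for the per-user and per-content stores. This is a simplification of my own: a vote is just "user voted on content" (no score, no comment), "high votes" means at least minVotes votes, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class VoteBatch {
    // userId -> ids of content the user voted on (the per-user store)
    final Map<String, Set<String>> votesByUser = new HashMap<>();
    // contentId -> ids of users who voted on it (the per-content store)
    final Map<String, Set<String>> votesByContent = new HashMap<>();

    // Each vote is written twice, once under each key.
    void vote(String userId, String contentId) {
        votesByUser.computeIfAbsent(userId, k -> new HashSet<>()).add(contentId);
        votesByContent.computeIfAbsent(contentId, k -> new HashSet<>()).add(userId);
    }

    // The nightly job for one user: content with at least minVotes votes
    // that this user has not voted on yet, sorted by id.
    List<String> recommend(String userId, int minVotes) {
        Set<String> seen = votesByUser.getOrDefault(userId, new HashSet<>());
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : votesByContent.entrySet()) {
            if (e.getValue().size() >= minVotes && !seen.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Writing each vote twice is the price of never needing a join: both the "what did this user vote on" and the "who voted on this content" lookups are a single read by key.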

Document database: Documents are semi-structured data. Modeling product attributes using an RDBMS has always been a challenge. All products have a common set of attributes (e.g. title, sku, price) plus a category-specific set of attributes. Document databases solve this problem effortlessly. MongoDB is one step away from MySQL, providing ad-hoc queries, indexes and more, although at the time of writing its atomicity guarantees cover single documents rather than multi-document transactions.
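
Here is a rough sketch of the product case, with plain maps standing in for documents. The attribute names beyond title, sku and price, and the sample values, are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class ProductDocs {
    // Every product document carries the common attributes...
    static Map<String, Object> product(String title, String sku, double price) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", title);
        doc.put("sku", sku);
        doc.put("price", price);
        return doc;
    }

    public static void main(String[] args) {
        // ...and each category adds its own, with no schema change.
        Map<String, Object> book = product("Some Book", "BK-1", 299.0);
        book.put("author", "Some Author");
        book.put("pages", 320);

        Map<String, Object> phone = product("Some Phone", "PH-1", 9999.0);
        phone.put("screenInches", 3.5);

        System.out.println(book);
        System.out.println(phone);
    }
}
```

In a relational schema, the same thing usually ends up as either a wide table full of NULLs or a separate attribute table joined on every read.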

Graph database: Great for cases where data is connected, e.g. social applications like Twitter, Facebook and blogs.
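
To give a feel for connected data, here is a small sketch of a "who to follow" query over an in-memory adjacency map. It is a hand-rolled illustration with hypothetical names, not the API of any graph database; a real one would run this kind of traversal for you.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class SocialGraph {
    // Node -> the nodes it has a "follows" relation to.
    private final Map<String, Set<String>> follows = new HashMap<>();

    void follow(String from, String to) {
        follows.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
    }

    // Two-hop traversal: accounts followed by the people `user` follows,
    // excluding the user and anyone the user already follows.
    Set<String> suggestions(String user) {
        Set<String> direct = follows.getOrDefault(user, Collections.emptySet());
        Set<String> result = new TreeSet<>();
        for (String friend : direct) {
            result.addAll(follows.getOrDefault(friend, Collections.emptySet()));
        }
        result.removeAll(direct);
        result.remove(user);
        return result;
    }
}
```

In SQL the same question needs a self-join per hop, which is why deeply connected data is where graph databases earn their keep.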