
Sunday, February 19, 2012

BigMemory: Scaling vertically

Until recently, Moore's Law resulted in faster CPUs, but physical constraints (heat dissipation, for example) and computing requirements have pushed manufacturers to place multiple cores on a single CPU die. Increases in memory, however, are not constrained in the same way. For instance, today you can purchase standard Von Neumann servers from Oracle, Dell and HP with up to 2 TB of physical RAM and 64 cores. Servers with 32 cores and 512 GB of RAM are certainly more typical, but it's clear that today's commodity servers are now “big iron” in their own right.

The following table shows the random access times for different storage technologies:

Storage Technology               Latency
Registers                        1-3 ns
CPU L1 Cache                     2-8 ns
CPU L2 Cache                     5-12 ns
Memory (RAM)                     10-60 ns
High-speed network gear          10,000-30,000 ns
Solid State Disk (SSD) Drives    70,000-120,000 ns
Hard Disk Drives                 3,000,000-10,000,000 ns

Since most enterprise applications tend to be I/O bound (i.e. they spend too much time waiting for data stored on disk), it follows that these applications would benefit greatly from the use of the lower-latency forms of storage at the top of this hierarchy. Specifically, today's time-sensitive enterprise applications would speed up significantly without much modification if they could replace all disk access with memory usage.

To drive this point home further, note that with modern network technology, latencies are at worst around 10,000-30,000 ns, with even lower latencies and higher speeds possible. This means that with the right equipment, accessing memory on other servers over the network is still much faster than reading from a local hard disk drive. All of this suggests that as an enterprise architect or developer, your goal should be to keep as much of your application's data in memory as possible.

The original Java language and platform design took into account the problems developers had when manually managing memory in other languages. For instance, when memory management goes wrong, developers experience memory leaks (memory that is never de-allocated) or memory access violations caused by accessing memory that has already been de-allocated or by attempting to de-allocate memory more than once. To relieve developers of these potential problems, Java implemented automatic memory management of the Java heap with a garbage collector (GC). When a running program no longer requires specific objects, the Java garbage collector reclaims their memory within the Java heap. Manual memory management is largely no longer a concern for Java developers, which results in greater productivity overall.

Garbage collection works reasonably well, but it becomes increasingly stressed as the size of the Java heap and the number of live objects within it increase. Today, GC works well with an occupied Java heap of around 3-4 GB, which also happens to be roughly the addressable limit of a 32-bit process.

The size limits imposed by Java garbage collection explain why 64-bit Java use remains a minority despite the availability of commodity 64-bit CPUs, operating systems and Java for half a decade. Attempts in Java to consume a heap beyond 3-4 GB in size can result in large garbage collection pauses (where application threads are stopped so that the GC can reclaim dead objects), unpredictable application response times and large latencies that can violate your application's service level agreements. With large occupied Java heaps, it's not uncommon to experience multi-second pauses, often at the most inopportune moments.

Solving the Java Heap/GC Problem

Large heaps are desirable in cases such as in-process caching and session storage. Both of these use cases rely on a map-like API where a framework allocates and de-allocates resources programmatically with puts and removes, which opens up a way to constrain and solve the garbage collection problem.

BigMemory from Terracotta, and a similar implementation incubating at Apache, are all-Java implementations built on Java's NIO technology. BigMemory is just a hair slower than the Java heap. It runs in process with the JVM, so there is no management complexity, and it is pure Java, so there is no deployment complexity or sensitivity to JVM version. It creates a cache store in memory but outside the Java heap using direct byte buffers. Because the data is stored off heap, the garbage collector does not know about it and therefore does not collect it. Instead, BigMemory responds to put and remove requests by allocating and freeing memory in its managed byte buffer.

This lets you keep the Java heap relatively small (1-2GB in size), while using the maximum amount of objects within physical memory. As a result, BigMemory can create caches in memory that match physical RAM limits (i.e. 2TB today and more in the future), without the garbage collection penalties that usually come with a Java heap of that size. By storing your application's data outside of the Java heap but within RAM inside your Java process, you get all the benefits of in-memory storage without the traditional Java costs.
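To make the off-heap idea concrete, here is a minimal sketch of a map-like store that keeps values in a direct ByteBuffer and only a small index on the Java heap. It is not how BigMemory is implemented; the class name OffHeapStoreSketch and its methods are hypothetical, and the sketch never reuses freed space.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: values live off heap, only the index lives on heap. */
final class OffHeapStoreSketch {
    // Direct buffer: allocated outside the Java heap, so the GC does not scan its contents.
    private final ByteBuffer buffer;
    // On-heap index: key -> {offset, length} of the value inside the buffer.
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStoreSketch(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Appends the value to the buffer; returns false when there is no room left. */
    boolean put(String key, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > buffer.remaining()) {
            return false;
        }
        int offset = buffer.position();
        buffer.put(bytes);
        index.put(key, new int[] {offset, bytes.length});
        return true;
    }

    /** Copies the value back onto the heap, or returns null if the key is unknown. */
    String get(String key) {
        int[] slot = index.get(key);
        if (slot == null) {
            return null;
        }
        byte[] bytes = new byte[slot[1]];
        ByteBuffer view = buffer.duplicate(); // same memory, independent position
        view.position(slot[0]);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Forgets the key. A real store would also recycle the freed bytes. */
    boolean remove(String key) {
        return index.remove(key) != null;
    }

    public static void main(String[] args) {
        OffHeapStoreSketch store = new OffHeapStoreSketch(1024);
        store.put("session-1", "alice");
        System.out.println(store.get("session-1"));
        store.remove("session-1");
        System.out.println(store.get("session-1"));
    }
}
```

The point of the design is that the heap only holds the small index, while the bulk of the data sits in memory the collector never has to walk.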

How does ByteBuffer help?

Prior to JDK 1.4, Java programmers had limited options: they could read data into a byte[] and use explicit offsets (along with bitwise operators) to combine bytes into larger entities, or they could wrap the byte[] in a DataInputStream and get automatic conversion without random access.

The ByteBuffer class arrived in JDK 1.4 as part of the java.nio package, and combines larger-than-byte data operations with random access. To construct a simple cache using ByteBuffer, see this article.
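As a small illustration of the difference, the sketch below reads the same 4-byte integer twice: once by combining bytes by hand, as one had to before JDK 1.4, and once with ByteBuffer's absolute getInt. The class and method names are made up for this example.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch comparing manual byte[] decoding with ByteBuffer access. */
final class ByteBufferAccessSketch {

    /** Pre-1.4 style: combine four bytes (big-endian) with shifts and masks. */
    static int readIntManually(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 24)
             | ((data[offset + 1] & 0xFF) << 16)
             | ((data[offset + 2] & 0xFF) << 8)
             | (data[offset + 3] & 0xFF);
    }

    /** ByteBuffer style: absolute read at any offset (big-endian by default). */
    static int readIntWithBuffer(byte[] data, int offset) {
        return ByteBuffer.wrap(data).getInt(offset);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.putInt(0, 42);               // absolute write at offset 0
        buf.putLong(4, 1234567890123L);  // absolute write at offset 4

        byte[] raw = buf.array();
        System.out.println(readIntManually(raw, 0));   // prints 42
        System.out.println(readIntWithBuffer(raw, 0)); // prints 42
        System.out.println(buf.getLong(4));            // prints 1234567890123
    }
}
```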

For those looking for in-depth explanation on the topic, read the article here. It is a long read but it is worth the information gain.

Note: Essentially, both these products manage a contiguous region of memory. Even though the approach described above avoids GC, fragmentation in any contiguous region eventually has a cost. A compaction cycle would happen far less often than a JVM garbage collection cycle, so while it would hurt performance badly while it runs, it would occur fairly rarely.

That brings up another topic: how does the non-heap memory for direct buffers get released? After all, there's no method to explicitly close or release them. The answer is that they get garbage collected like any other object, but with one twist: if you don't have enough virtual memory space or commit charge to allocate a direct buffer, that will trigger a full collection even if there's plenty of heap memory available.

Conclusion

It still makes sense to scale horizontally. Even so, you can still leverage vertical scalability with BigMemory, which makes the distributed cache faster and denser.

Further reading
http://raffaeleguidi.wordpress.com/

Friday, February 17, 2012

Humor: Software

Software engineer 1: I am not able to connect to test database.
Please check the problem and restart it.

IT:  Issue resolved

Software engineer 2: What was the fix?

Software engineer 3:  Exorcism…





Keep smiling!

Sunday, February 12, 2012

Getting started with HTML5 WebSockets and Java - Part 1

Any technology around HTML5 seems to be a hot-button topic these days, and luckily I got an opportunity to take a deep dive into WebSockets. Be it canvas, geolocation, video playback, drag-and-drop or WebSocket, there is a lot of buzz around these upcoming technologies.

Some background on HTML5 WebSockets

HTML5 WebSocket defines a bi-directional, full-duplex communication channel that operates over a single TCP connection. The important thing to note is that the WebSocket API is being standardized by the W3C, and the WebSocket protocol has been standardized by the IETF as RFC 6455.

What this means is that there are a bunch of protocol versions, and today's browsers support only specific ones. For example, Chrome 14, Firefox 7 and Internet Explorer 10 are currently the only browsers supporting the latest draft specification ("hybi-10") of the WebSocket protocol. The same goes for web servers: they are in varying stages of support for asynchronous messaging, with Jetty, Netty and Glassfish currently being the best options that provide native WebSocket support.

Tomcat 7 does not support WebSockets yet. Check out the following issue tracker entry to learn more about the current state of affairs in Tomcat 7:

https://issues.apache.org/bugzilla/show_bug.cgi?id=51181

Socket.IO provides a default implementation for Node.JS.

It is expected that HTML5 WebSockets will replace the existing XHR approaches as well as Comet services with a new, flexible and very fast bidirectional TCP socket communication technology.

Technical details about WebSocket
  • Uses WebSocket protocol instead of HTTP
  • True full duplex communication channel; UTF8 strings and binary data can be sent in any direction at the same time.
  • It is not a raw TCP socket
  • Connection established by "upgrading" (handshake) from HTTP to WebSocket protocol
  • Runs via port 80/443 and is firewall/proxy friendly
  • Supports WebSocket ws:// and secure WebSocket wss://
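To illustrate the "upgrade" handshake, here is a minimal sketch of the one computation a server has to perform, following RFC 6455: it appends a fixed GUID to the client's Sec-WebSocket-Key header, hashes the result with SHA-1 and returns it Base64-encoded in Sec-WebSocket-Accept. The class name HandshakeSketch is hypothetical, the sample key is the one used in the RFC, and the sketch uses java.util.Base64, so it assumes Java 8 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Hypothetical sketch of the Sec-WebSocket-Accept computation from RFC 6455. */
final class HandshakeSketch {
    // Fixed GUID defined by RFC 6455 for the opening handshake.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Returns the value the server sends back in the Sec-WebSocket-Accept header. */
    static String acceptKey(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Sample key from RFC 6455, section 1.3.
        String clientKey = "dGhlIHNhbXBsZSBub25jZQ==";

        // The response that switches the connection from HTTP to WebSocket.
        System.out.println("HTTP/1.1 101 Switching Protocols");
        System.out.println("Upgrade: websocket");
        System.out.println("Connection: Upgrade");
        // Last line prints: Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
        System.out.println("Sec-WebSocket-Accept: " + acceptKey(clientKey));
    }
}
```

Because the exchange starts as an ordinary HTTP request on port 80 or 443, it passes through most firewalls and proxies before the connection switches protocols.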
Benefits of using WebSockets
  • Reduces network traffic: each message carries as little as 2 bytes of framing overhead
  • Low latency
  • No polling overhead
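To show where the 2 bytes come from, here is a minimal sketch that encodes a short, unmasked text frame as laid out in RFC 6455. The class name TextFrameSketch is hypothetical, and the sketch only handles payloads of up to 125 bytes. Frames sent from a browser to a server are additionally masked, which adds a 4-byte masking key.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: encodes a small server-to-client text frame (RFC 6455). */
final class TextFrameSketch {

    /** Builds a single unmasked text frame for payloads of at most 125 bytes. */
    static byte[] encodeSmallTextFrame(String text) {
        byte[] payload = text.getBytes(StandardCharsets.UTF_8);
        if (payload.length > 125) {
            throw new IllegalArgumentException("sketch handles payloads up to 125 bytes only");
        }
        byte[] frame = new byte[2 + payload.length];
        frame[0] = (byte) 0x81;           // FIN bit set, opcode 0x1 (text frame)
        frame[1] = (byte) payload.length; // MASK bit clear, 7-bit payload length
        System.arraycopy(payload, 0, frame, 2, payload.length);
        return frame;
    }

    public static void main(String[] args) {
        byte[] frame = encodeSmallTextFrame("Hello");
        // 5 payload bytes plus 2 header bytes
        System.out.println("frame length: " + frame.length);
    }
}
```

Compare this with an HTTP polling request, where every message carries a full set of request and response headers.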
In tests run by Kaazing Corp, who have been closely involved in the specification process, it was found that "HTML5 Web Sockets can provide a 500:1 or - depending on the size of the HTTP headers - even a 1000:1 reduction in unnecessary HTTP header traffic and 3:1 reduction in latency".

In short: Web Sockets can make your applications faster, more efficient, and more scalable.

The WebSocket Interface:



interface WebSocket {

....

//ready state
const unsigned short CONNECTING = 0;
const unsigned short OPEN = 1;
const unsigned short CLOSING = 2;
const unsigned short CLOSED = 3;

..
//Networking
attribute Function onopen;
attribute Function onmessage;
attribute Function onerror;
attribute Function onclose;
boolean send(in data);
void close();
};


A typical Javascript client:



var wsUri = "ws://echo.websocket.org/";
var websocket; // shared by the handlers below

function init() {
  testWebSocket();
}

// Opens the connection and wires up the four event handlers
function testWebSocket() {
  websocket = new WebSocket(wsUri);
  websocket.onopen = function(evt) { onOpen(evt); };
  websocket.onclose = function(evt) { onClose(evt); };
  websocket.onmessage = function(evt) { onMessage(evt); };
  websocket.onerror = function(evt) { onError(evt); };
}

function onOpen(evt) {
  writeToScreen("CONNECTED");
  doSend("WebSocket rocks");
}

function onClose(evt) {
  writeToScreen("DISCONNECTED");
}

// The echo server sends the message straight back; close after the first reply
function onMessage(evt) {
  writeToScreen("RESPONSE: " + evt.data);
  websocket.close();
}

function onError(evt) {
  writeToScreen("ERROR: " + evt.data);
}

function doSend(message) {
  writeToScreen("SENT: " + message);
  websocket.send(message);
}

function writeToScreen(message) {
  var pre = document.createElement("p");
  pre.innerHTML = message;
....
}

Getting started with WebSockets with Java backend

To build applications around WebSockets, I will focus on Jetty, Netty and Atmosphere. The focus will be on backend processing; jQuery or raw JavaScript can be used on the client. We will work with a sample chat application.

1. Jetty 8

Jetty is a Java-based HTTP server and servlet container. Jetty 8 is a Servlet 3.0 container and provides a WebSocket implementation, so it is possible to offer server push via both the HTTP and WebSocket protocols. Jetty exposes its WebSocket implementation as a subclass of HttpServlet. Here is a Jetty server example:



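import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.eclipse.jetty.util.UrlEncoded;
import org.eclipse.jetty.websocket.WebSocket;
import org.eclipse.jetty.websocket.WebSocketServlet;

import com.google.gson.Gson;

@WebServlet(urlPatterns = "/chat", asyncSupported = true)
public class ChatServlet extends WebSocketServlet {

        // Currently open WebSocket connections
        private final Queue<ChatWebSocket> webSockets = new ConcurrentLinkedQueue<ChatWebSocket>();

        // Assumption: the snippet uses "messages" without declaring it. A blocking
        // queue of outgoing JSON strings matches the put()/InterruptedException usage.
        private final BlockingQueue<String> messages = new LinkedBlockingQueue<String>();

        // GET method is used to establish a stream connection
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                        throws ServletException, IOException {
                //Implementation
        }

        // POST method is used to communicate with the server
        @Override
        protected void doPost(HttpServletRequest request, HttpServletResponse response)
                        throws ServletException, IOException {
                //Implementation
        }

        // Called by Jetty when a client asks to upgrade to the WebSocket protocol
        @Override
        public WebSocket doWebSocketConnect(HttpServletRequest request, String protocol) {
                return new ChatWebSocket();
        }

        class ChatWebSocket implements WebSocket.OnTextMessage {

                Connection connection;

                @Override
                public void onOpen(Connection connection) {
                        this.connection = connection;
                        webSockets.add(this);
                }

                @Override
                public void onClose(int closeCode, String message) {
                        webSockets.remove(this);
                }

                @Override
                public void onMessage(String queryString) {
                        // Parses query string
                        UrlEncoded parameters = new UrlEncoded(queryString);

                        Map<String, String> data = new LinkedHashMap<String, String>();
                        data.put("username", parameters.getString("username"));
                        data.put("message", parameters.getString("message"));

                        try {
                                messages.put(new Gson().toJson(data));
                        } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                        }
                }
        }
}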


The advantage of this approach is that WebSocket connections are terminated in the same rich application space provided by HTTP servers. A WebSocket-enabled web application can therefore be developed in a single environment rather than through collaboration between an HTTP server and a separate WebSocket server.

2. Atmosphere

Atmosphere is a WebSocket/Comet web framework that enables real-time web applications in Java. Atmosphere greatly simplifies real-time web application development and works with servlet containers that do not implement Servlet 3.0 but natively support Comet, such as Tomcat 6. Here is an example:



public class ChatAtmosphereHandler implements
                AtmosphereHandler {

        public void onRequest(AtmosphereResource resource)
                        throws IOException {
                HttpServletRequest request = resource.getRequest();
                HttpServletResponse response = resource.getResponse();

                request.setCharacterEncoding("utf-8");
                response.setCharacterEncoding("utf-8");

                // GET method is used to establish a stream connection
                if ("GET".equals(request.getMethod())) {
                        // Content-Type header
                        response.setContentType("text/plain");
                        resource.suspend();

                // POST method is used to communicate with the server
                } else if ("POST".equals(request.getMethod())) {
                        Map<String, String> data = new LinkedHashMap<String, String>();
                        data.put("username", request.getParameter("username"));
                        data.put("message", request.getParameter("message"));


                        // Broadcasts a message
                        resource.getBroadcaster().broadcast(new Gson().toJson(data));
                }
        }

        public void onStateChange(AtmosphereResourceEvent event)
                        throws IOException {
                if (event.getMessage() == null) {
                        return;
                }

                sendMessage(event.getResource().getResponse().getWriter(), event.getMessage().toString());
        }

        private void sendMessage(PrintWriter writer, String message) throws IOException {
                // default message format is message-size ; message-data ;
                writer.print(message.length());
                writer.print(";");
                writer.print(message);
                writer.print(";");
                writer.flush();
        }
}
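The sendMessage method above writes each message as message-size ; message-data ;. As a rough illustration of how a receiver could split such a stream back into messages, here is a minimal sketch. The class name SizePrefixedFraming and its methods are hypothetical and not part of Atmosphere; like the handler above, it counts the size in characters, not bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the "size;data;" framing used by sendMessage above. */
final class SizePrefixedFraming {

    /** Frames one message the same way sendMessage does. */
    static String encode(String message) {
        return message.length() + ";" + message + ";";
    }

    /** Splits a stream of frames into messages, ignoring a trailing partial frame. */
    static List<String> decode(String stream) {
        List<String> messages = new ArrayList<>();
        int pos = 0;
        while (pos < stream.length()) {
            int sep = stream.indexOf(';', pos);
            if (sep < 0) {
                break; // size field not complete yet
            }
            int size = Integer.parseInt(stream.substring(pos, sep));
            int start = sep + 1;
            int end = start + size;
            if (end >= stream.length()) {
                break; // payload or its closing ';' not complete yet
            }
            messages.add(stream.substring(start, end));
            pos = end + 1; // skip the closing ';'
        }
        return messages;
    }

    public static void main(String[] args) {
        String stream = encode("hi") + encode("a;b;c");
        System.out.println(stream);
        System.out.println(decode(stream));
    }
}
```

Because the size is sent first, the payload itself may contain semicolons without confusing the receiver.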


Conclusion

There are many resources on the web describing WebSockets, and many libraries trying to solve the application portability problem. To a developer trying to embrace the upcoming WebSocket technology, it can be confusing and overwhelming.

For others who want to integrate WebSockets into an existing application, there is a dilemma about choosing the framework and technology stack. Building applications around HTML5 WebSockets is going to be tricky for the next few months, until the API and protocols are standardized and the open source community provides native implementations. Using Atmosphere or jWebSocket makes sense because they abstract away the underlying provider.

I will be writing more about Atmosphere and jWebSocket in my future blog posts.

Further reading:

http://www.html5rocks.com/en/tutorials/websockets/basics/
http://websocket.org/
http://en.wikipedia.org/wiki/WebSockets

Part 2: 

Thursday, February 9, 2012

Spring annotations and Ehcache


Ehcache-spring-annotations is a library that simplifies caching in Spring-based applications using the popular Ehcache library. In this article, I will present a simple way to integrate Ehcache into a Spring-based project.

Spring annotations are particularly useful when there is a need to cache methods of an application with minimal code changes and to use configuration to control the cache settings. In such cases, Ehcache Annotations can be used to dynamically configure caching of method return values.

For example, suppose you have a method: Product getProduct(long productId).

Once caching is added to this method, the results of all calls to it will be cached using the "productId" parameter as the key.
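In plain Java, the effect is roughly what the following sketch does by hand: look up the key, and only call the expensive method on a miss. This is an illustration of the idea, not the library's code; the class ProductCacheSketch is hypothetical, a String stands in for Product, and a ConcurrentHashMap stands in for Ehcache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of caching a method's return value by its argument. */
final class ProductCacheSketch {
    // Stand-in for the Ehcache cache: productId -> cached result.
    private final Map<Long, String> cache = new ConcurrentHashMap<>();
    // Counts how often the expensive lookup actually runs.
    final AtomicInteger loads = new AtomicInteger();

    /** Stand-in for the real database query. */
    private String loadProduct(long productId) {
        loads.incrementAndGet();
        return "product-" + productId;
    }

    /** Returns the cached value when present, otherwise loads and stores it. */
    String getProduct(long productId) {
        String cached = cache.get(productId);
        if (cached != null) {
            return cached;
        }
        String value = loadProduct(productId);
        cache.put(productId, value);
        return value;
    }

    public static void main(String[] args) {
        ProductCacheSketch sketch = new ProductCacheSketch();
        sketch.getProduct(7L);
        sketch.getProduct(7L); // served from the cache
        System.out.println("loads: " + sketch.loads.get()); // prints loads: 1
    }
}
```

The annotation approach described in the steps that follow moves this bookkeeping out of the method body and into configuration.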

The steps described below work with Spring 3.1.0, Ehcache 2.5.1 and Ehcache-spring-annotations 1.2.0.

Step 1.

Configure Maven to include the required libraries.


<dependency>
 <groupId>net.sf.ehcache</groupId>
 <artifactId>ehcache</artifactId>
 <version>2.5.1</version>
 <type>pom</type>
</dependency>
<dependency>
 <groupId>com.googlecode.ehcache-spring-annotations</groupId>
 <artifactId>ehcache-spring-annotations</artifactId>
 <version>1.2.0</version>
 <type>jar</type>
 <scope>compile</scope>
 <exclusions>
  <exclusion>
   <groupId>org.springframework</groupId>
   <artifactId>spring-expression</artifactId>
  </exclusion>
 </exclusions>
</dependency>
<!-- Include all spring dependencies -->
<dependency>
 <groupId>org.springframework</groupId>
 <artifactId>spring-core</artifactId>
 <version>3.1.0.RELEASE</version>
 <type>jar</type>
 <scope>compile</scope>
</dependency>

Step 2.

Configure Spring. You must add the following to your Spring configuration file in the beans declaration section:


<!-- Ehcache annotation config -->
<ehcache:annotation-driven cache-manager="ehCacheManager"/>

<bean id="ehCacheManager" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean">
 <property name="configLocation">
  <value>/WEB-INF/ehcache.xml</value>
 </property>
</bean>

Step 3.

Configure ehcache.xml and put it in /WEB-INF/ or on the classpath.



<?xml version="1.0" encoding="UTF-8"?>
<ehcache xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:noNamespaceSchemaLocation="http://ehcache.org/ehcache.xsd" updateCheck="false">

    <defaultCache eternal="false" maxElementsInMemory="1000"
        overflowToDisk="false" diskPersistent="false" timeToIdleSeconds="0"
        timeToLiveSeconds="600" memoryStoreEvictionPolicy="LRU"/>

    <cache name="product" eternal="false"
        maxElementsInMemory="100" overflowToDisk="false" diskPersistent="false"
        timeToIdleSeconds="0" timeToLiveSeconds="300"
        memoryStoreEvictionPolicy="LRU" />

</ehcache>


If you are not familiar with configuring Ehcache, please read their configuration guide.

Step 4.

Add the annotation to the methods you would like to cache. Let's assume you are using the Product getProduct(long productId) method from above.



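@Cacheable(cacheName = "product")
public Product getProduct(long productId) {
    Query query = entityManager.createQuery("from Product where productId = :productId");
    query.setParameter("productId", productId);
    query.setMaxResults(1);
    List<?> results = query.getResultList();
    // Return null when no product matches instead of throwing
    return results.isEmpty() ? null : (Product) results.get(0);
}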


The @Cacheable annotation can be placed on a method of an interface, or on a public method of a class.

Note: The cache name should match the cache name defined in ehcache.xml. Multiple cache names can be defined in ehcache.xml.

What is the Ehcache Spring annotations library doing in the background?

EhCacheInterceptor.java is where all the work is done. It handles invocations of methods annotated with @Cacheable. If a value is already cached under the key, it returns that value; otherwise it calls the method and stores the result in Ehcache.



//See if there is a cached result
final Element element = cache.getWithLoader(cacheKey, null, methodInvocation);
if (element != null) {
    final Object value = element.getObjectValue();

    final boolean ignoreValue = cacheInterceptor.preInvokeCachable(cache, methodInvocation, cacheKey, value);
    if (!ignoreValue) {
        return value;
    }
}
....
....
//No cached value or exception, proceed
final Object value;
try {
    value = methodInvocation.proceed();
}
if ((value != null || cacheableAttribute.isCacheNull()) && shouldCache) {
     cache.put(new Element(cacheKey, value));
}

return value;

Simple, isn't it!

Sunday, February 5, 2012

Amazon's Jungle move

The rumors had been in the air for a long time, as early as 2009. And it happened on Thursday, February 2, 2012, when Junglee.com, an Amazon-powered comparison shopping portal, was launched. There may have been last-minute glitches that pushed the release from February 1 to February 2.

Of the many products and services that Amazon owns, launching an aggregation and comparison portal is an interesting move. It had been tried by a few Indian portals in the past, but with no great success. Amazon has not done it before in any other geography. One of the reasons could be the large number of shoppers visiting Amazon.com from India. By leveraging Junglee.com, Amazon can build a sizable customer base for its yet-to-be-launched marketplace.

For online shoppers, Junglee can turn out to be the starting point for the majority of transactions. Rich product descriptions, quality product reviews and product recommendations have always helped customers make informed purchase decisions.

Looking at Junglee.com, some immediate points can be made:
  • A huge product selection offered on a Beta website. No API support yet.
  • Of the 1.2 crore products listed, 90 lakh are books.
  • The product listings are mainly from off-line sellers. Very few of the sellers listed have a serious e-commerce presence.
With the law on multi-brand retail still pending, setting up an e-commerce store is some distance away for Amazon. But what will Amazon do with Junglee.com when Amazon.in is launched?

Indeed, Amazon.com is a marketplace and they compete with other online sellers on their own platform. But in the current setup, it is unlikely that Junglee will list prominent Indian e-retailers and drive traffic to them. Of course, Junglee will drive traffic to Amazon.in apart from serving as the starting point for online shopping.

It could be a win-win situation for brick-and-mortar stores in the short term if the prices are competitive and product quality is good. Smaller cities have very few branded stores like Gitanjali, The Bombay Store and Fabindia. Shoppers from these cities are already embracing online shopping very quickly.

In the past, Amazon has invested in niche shopping portals and content websites like shelfari.com, but comparison shopping portals are not really an Amazon thing. There seems to be a deeper agenda.

Amazon has some serious competition in India. And there is a lot at stake. India is the world’s third largest e-commerce market, trailing China and the U.S. Indian online sales have doubled from around $4 billion in 2009 to nearly $10 billion in 2011, according to The Economic Times of India. Nearly $350 million has been poured into 40 Indian e-commerce start-ups as of year-end 2011 compared to $43 million in 11 companies two years prior.

There was speculation that Amazon would buy out an Indian e-retailer, but it did not happen. With the amount of money some e-retailers have been throwing at advertising recently, it began to look as if the online commerce winner would be whoever promotes its brand the most on TV, in print and in other traditional media.

And this could be another reason for launching Junglee: to shake the tree and create some reasons to worry or even panic. Perhaps Amazon is targeting a specific competitor. True or not, there are bound to be some ripples. E-retailers with nothing new to offer will fold. Established names will strengthen their foothold.

Whatever it is, the landscape is going to be more competitive. Customers will be spoilt for choice and price will not be the key differentiator, as is the case today.