Posted by Hindol Datta
We are entering a new age in which we seek a finer understanding of the relationships between businesses and customers, organizations and employees, and products and how they are used, and of how the different aspects of a business connect to produce meaningful, actionable information. We are seeing a great deal of data, and the old tools for managing, processing and gathering insights from it, such as spreadsheets and SQL databases, do not scale to current needs. Thus, Big Data is becoming a framework for how to process, store and cope with the reams of data being collected.
According to IDC, it is imperative that organizations and IT leaders focus on the ever-increasing volume, variety and velocity of information that forms big data.
- Volume. Many factors contribute to the increase in data volume – transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, etc. In the past, excessive data volume created a storage issue. But with today’s decreasing storage costs, other issues emerge, including how to determine relevance amidst the large volumes of data and how to create value from data that is relevant.
- Variety. Data today comes in all types of formats – from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions. By some estimates, 80 percent of an organization’s data is not numeric! But it still must be included in analyses and decision making.
- Velocity. According to Gartner, velocity “means both how fast data is being produced and how fast the data must be processed to meet demand.” RFID tags and smart metering are driving an increasing need to deal with torrents of data in near-real time. Reacting quickly enough to deal with velocity is a challenge to most organizations.
SAS has added two additional dimensions:
- Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent, with periodic peaks. Is something big trending on social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage – especially with social media involved.
- Complexity. When you deal with huge volumes of data, it comes from multiple sources. It is quite an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control. Data governance can help you determine how disparate data relates to common definitions and how to systematically integrate structured and unstructured data assets to produce high-quality information that is useful, appropriate and up-to-date.
So to reiterate, Big Data is a framework stemming from the realization that data has gathered significant pace and that its growth has exceeded an organization's capacity to handle, store and analyze it in a manner that offers meaningful insights into the relationships between data points. I am calling this a framework, unlike other materials that call Big Data a consequence of organizations' inability to handle mass amounts of data. I refer to Big Data as a framework because it sets the parameters around an organization's decision as to when, and which, tools must be deployed to address data scalability issues.
Thus, to put the appropriate parameters around when an organization must consider Big Data as part of its analytics roadmap in order to better understand the patterns in its data, it has to answer the following ten questions:
- What are the different types of data that should be gathered?
- What are the mechanisms that have to be deployed to gather the relevant data?
- How should the data be processed, transformed and stored?
- How do we ensure that there is no single point of failure in data storage, and no data loss that may compromise data integrity?
- What are the models that have to be used to analyze the data?
- How are the findings of the data to be distributed to relevant parties?
- How do we assure the security of the data that will be distributed?
- What mechanisms do we create to implement feedback against the data to preserve data integrity?
- How do we morph the big data model into new forms that account for new patterns and reflect what is meaningful and actionable?
- How do we create a learning path for the big data model framework?
Some of the existing literature has commingled the Big Data framework with analytics. In fact, some of it goes so far as to assert that Big Data and predictive analytics should be viewed in the same vein. Nothing could be further from the truth!
There are several tools available in the market for doing predictive analytics against data sets that may not qualify for the Big Data framework. While I was the CFO at Atari, we deployed business intelligence tools using MicroStrategy, which had predictive modules. More recently, we explored SAS and Minitab for predictive analytics. In fact, even Excel can do multivariate analysis, ANOVA, regression and best-curve-fit analysis. These analytical techniques have been part of the analytics arsenal for a long time. Different data sizes may need different tools to produce relevant predictive analysis. This is a very important point, because companies that do not have Big Data ought to seriously reconsider their strategy of which tools and frameworks to use to gather insights. I have known companies that have gone the Big Data route even though all data points (excuse the pun), even after incorporating capacity and forecasts, suggest that alternative tools are more cost-effective than implementing Big Data solutions. Big Data is not a one-size-fits-all model, and it is an expensive implementation. However, for the right data size, which in this case would be very large, a Big Data implementation can be extremely beneficial and cost-effective in terms of total cost of ownership.
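To illustrate how modest data sizes do not need Big Data machinery, here is a minimal sketch of the kind of regression analysis mentioned above, done with nothing but the Python standard library. The spend and conversion figures are invented purely for illustration.

```python
# A minimal sketch: ordinary least-squares regression on a small
# dataset using only plain Python. No Big Data stack is required
# for data at this scale; Excel, Minitab or SAS would do as well.

def fit_line(xs, ys):
    """Return slope and intercept of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical marketing spend (in $k) vs. conversions
spend = [10, 20, 30, 40, 50]
conversions = [12, 25, 31, 44, 52]

slope, intercept = fit_line(spend, conversions)
print(round(slope, 2), round(intercept, 2))  # prints: 0.99 3.1
```

The point is not the arithmetic but the fit between tool and data size: when the whole dataset sits comfortably in one machine's memory, a spreadsheet or a few lines of code delivers the insight at a fraction of the total cost of ownership.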
Areas where the Big Data Framework can be applied!
Some areas lend themselves to the application of the Big Data Framework. I have identified broadly four key areas:
- Marketing and Sales: Consumer behavior, marketing campaigns, sales pipelines, conversions, marketing funnels and drop-offs, distribution channels are all areas where Big Data can be applied to gather deeper insights.
- Human Resources: Employee engagement, hiring, retention, the organizational knowledge base, the impact of cross-functional training, reviews and compensation plans are all elements that Big Data can surface. After all, companies generally invest over 60% of their resources in their people.
- Production and Operational Environments: Data growth, different types of data appended as the business learns about the consumer, concurrent usage patterns, traffic, web analytics are prime examples.
- Financial Planning and Business Operational Analytics: Predictive analytics around bottom-up sales, marketing campaign ROI, customer acquisition costs, earned media and paid media, margins by SKU and distribution channel, operational expenses, portfolio evaluation, risk analysis, etc., are some of the examples in this category.
Hadoop: A Small Note!
Hadoop is becoming a more widely accepted tool for addressing Big Data needs. It is an open-source implementation of ideas Google published (MapReduce and the Google File System), which Google devised so it could index the structured and text information it was collecting and present meaningful, actionable results to users quickly. Hadoop was further developed at Yahoo, which tweaked it for enterprise applications.
Hadoop runs on a large number of machines that don't share memory or disks, with the Hadoop software running on each of these machines. Thus, if you have, for example, over 10 gigabytes of data, you spread that data across the different machines, and Hadoop keeps track of where all of it resides. The individual servers or machines are called nodes, and the collection of nodes working together on the data is called a cluster. Each server operates on its own little piece of the data, and once the data is processed, the partial results are combined and delivered to the client as a unified whole. This method of mapping work out to the various nodes and then reducing the disparate partial results into one unified whole is MapReduce, a core mechanism of Hadoop. You will also hear of Hive, which is a data warehouse layer built on top of Hadoop: it lets you query the data, structured or unstructured, that Hadoop stores and processes, with Hadoop providing redundancy across the cluster and MapReduce producing the unified result.
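The map-then-reduce flow described above can be made concrete with a toy sketch, a classic word count in plain Python rather than actual Hadoop code: each simulated "node" maps over its own slice of the text, and a reduce step combines the partial results into one unified answer.

```python
# A toy illustration of the MapReduce idea (not real Hadoop code):
# map runs independently on each node's chunk; reduce merges the
# emitted (key, value) pairs into a single unified result.
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one node's slice of the text."""
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    """Combine all emitted pairs into final per-word counts."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Pretend the data has been spread across three nodes of a cluster
chunks = ["big data big", "data tools", "big tools tools"]

# Each node runs the map step on its own piece of the data...
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]

# ...then the shuffled results are reduced into one unified whole
print(reduce_phase(mapped))  # prints: {'big': 3, 'data': 2, 'tools': 3}
```

In real Hadoop the map tasks run in parallel on separate machines and the framework handles the shuffle, fault tolerance and data locality; the sketch only shows the logical shape of the computation.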
Personally, I have always been interested in Business Intelligence. I have always considered BI a stepping stone in the new age, a handy tool for truly understanding a business and developing financial and operational models that stay fairly close to the insights trending in the data. So my ear is always to the ground as I follow the developments in this area … and though I have not implemented a Big Data solution, I have been, and will continue to be, interested in seeing its applications in certain contexts and against the various use cases in organizations.
Posted by Hindol Datta
Social networking, as we understand it today, comprises three major elements:
The cross product of all of the above three elements lends the social network its gravitas and its effective reach into many aspects of a user's daily routine.
What has transpired over the last few years is the idea that social networks should be free. The fundamental presumption is that a free network would generate activity that gains traction and sufficient critical mass to justify its zero price. Once the network has successfully captured meaningful mind share, the next big step would be to harness it for revenue opportunities, which in turn fund quality content and greater expanse … and the cycle would be self-perpetuating, theoretically knowing no bounds.
All of the above is fine and dandy. To reiterate, this has been the traditional model, and it continues to be so overtly championed that a significant number of such networks still embrace it. But have we stepped back and reflected upon this thought that has crystallized into a de facto model over the last few years?
I think we have, and the market is responding accordingly. I am a firm believer in the market system, having been steeped in the traditional writings of Hayek, Schumpeter, Mises and the Austrian school of thought, whose thesis is that the price mechanism is the great leveler. By price I do not mean simply the numerical tag on a good or service: rather, it is the abstraction behind that definition that has huge relevance for the economics of social networks and their long-term virality and sustainability. Price is the measure of value determined by the laws of demand and supply. It is the spontaneous emergence of value created across the several ecosystems one may be immediately or indirectly connected to … yet no one could, or would, have enough information to gauge, or for that matter guide, the whole toward a directed end. The value imputed in the price changes: that much is obvious! It is governed by what Keynes once called the animal spirits. That fact still holds true, and these “spirits” can advance or collapse societies on short notice. So what are the implications, if you will, of this short diatribe of economic thought for social networks and the urban myth of FREE?
Free is good. Yet Milton Friedman harangued against that word: “There is no such thing as a free lunch,” he said. But we still believe that free is the price most acceptable to consumers, since it would create monetizable opportunities on pure volume. Freemium became the order of the day: get the barbarians through the gate, and then cross-sell and up-sell to your heart’s content. But all along that strain of logic, we have forgotten the Friedman adage – for indeed, the barbarians at the gate are paying a price: their time, their energy, their conversations, their rants and raves, their hopes, their dreams, all potential fodder in the “free” social network ecosystem. Our technologies are creating a petri dish, and the enablers are forcing themselves upon the sensibilities of these “barbarians”. The common thinking: we want them in, and once in, we can strafe their sensibilities with what we determine are contextual ads, offers and relevant value parameters that they would appreciate. But all of a sudden the market has caught on. The house is falling down with too many barbarians through the gate, and these “barbarians” care less and less. They become cliques and guilds in a closed environment; they become oblivious to the big brother; they revolt with their attention; they shut their minds, their hearts, and yes … they shut their wallets. Yet they remain, for the simple price of being part of small but meaningful conversations … a psychological need to belong, a sense of loyalty to their immediate virtual neighbors. But the noise around them is overwhelming and drowns out the tidbits of conversation … so a need emerges wherein these folks will pay for silence.
The conversations will continue: conversations enriched by the cultural diversity fuelled by the dissolution of geographical boundaries … conversations weighed by meaning and gravitas, with a modus operandi of respect and mutual understanding … it is my contention that people are willing to pay a price for silence, to avert the white noise and create more meaning.
So now emerge the niche networks! They offer insulation from irrelevance: people want to be relevant; they want their conversations to be heard. They want to reduce the exchange to fireside chats and measure the meaning over sips of coffee. Information and conversations are balkanized, and the quality is enhanced and made more uniform. The patterns of conversation become the springboard for native monetization. So we will focus on natively monetizing the buzz between connections; we will inspect the patterns of exchange and giving; we will attribute a value; and we believe that a small fee will do justice to the calculus of consent.