Category Archives: Analytics

Scaling Considerations in Complex Systems and Organizations: Implications

Scale represents size. In a two-dimensional world, it is a linear measurement that presents a nominal ordering of numbers. In other words, 4 is two times two and 6 is three times two, so the move from 4 to 6 represents an increase in scale by two. We will discuss various aspects of scale and the learnings that we can draw from it. However, before we go down this path, we would like to touch on resource consumption.

As living organisms, we consume resources. An average human being requires 2000 calories of food per day to sustain themselves. An average human being, by the way, is largely defined in terms of size. So it would be better put if we say that a 200 lb. person would require 2000 calories. However, if we were to regard a specimen that is 10X the size, or 2000 lbs., would it require 10X the calories to sustain itself? Conversely, if the specimen was 1/100th the size of the average human being, would it require 1/100th the calories to sustain itself? Do we, in other words, consume resources in linear proportion to our size? Are we operating in a simple linear world? And if not, what are the ramifications for science, physics, biology, organizations, cities, climate, etc.?

Let us digress a little bit from the above questions and lay out a few interesting facts. Almost half of the population in the world today lives in cities. This is compared to less than 15% of the world population that lived in cities a hundred years ago. It is anticipated that almost 75% of the world population will be living in cities by 2050. The number of cities will increase and so will their size. But for cities to increase in size and number, they require vast amounts of resources. In fact, the resource requirements in cities are far more extensive than in agrarian societies. If there is a limit to the resources from a natural standpoint – in other words, if the world is operating on a budget of natural resources – then would this mean that the growth of the cities will be naturally reined in? Will cities collapse because of a lack of resources to support their mass?

What about companies? Can companies grow infinitely? Is there a natural point where companies might hit their limit, beyond which growth would not be possible? Could a company collapse because the resources required to sustain its size would be compromised? Are there other factors aside from resource consumption that play into what might cap the growth and hence the size of the company? Are there overriding factors that would supersede the size-resource usage equation such that our worries could be safely set aside? Are cities and companies governed by some sort of metabolic rate, like the one that governs the sustenance of life?

Geoffrey West, a theoretical physicist, has touched on a lot of these questions in his book Scale: The Universal Laws of Growth, Innovation, Sustainability, and the Pace of Life in Organisms, Cities, Economies, and Companies. He says that a person requires about 90W (watts) of energy to survive. That is roughly one light bulb burning in your living room all day. That is our metabolic rate. However, just as man does not live by bread alone, an average person has to depend on a number of other artifacts that have agglomerated in bits and pieces to provide a quality of life that maximizes sustenance. The person has to have laws, electricity, fuel, automobiles, plumbing and water, markets, banks, clothes and phones, and has to engage with other folks in a complex social network to collaborate and compete to achieve their goals. Geoffrey West says that the average person requires almost 11,000W, or the equivalent of roughly 120 90W light bulbs. To put things in greater perspective, the social metabolic rate of 11,000W is almost equivalent to that of a dozen elephants. (An elephant requires 10X more energy than a human even though it might be 60X the size of a human being.) Thus, a major portion of our energy is diverted to maintain the social and physical networks that closely interplay to maintain our sustenance. And while we consume massive amounts of energy, we also create a massive amount of waste, and that is an inevitable outcome. This is called the entropy impact and we will touch on this in greater detail in later articles. Hence, our growth is not only constrained by our metabolic rate: it is further dampened by entropy, as described by the Second Law of Thermodynamics. And as a system ages, the impact of entropy increases manifold. Yes, it is true: once we get old, we are racing toward our death at a faster pace than when we were young. Our bodies exhibit fatigue faster than before.

Scaling refers to how a system responds when its size changes. As mentioned earlier, does scaling follow a linear model? Do we need to consume 2X resources if we increase the size by 2X? How does scaling impact a Complex Physical System (CPS) versus a Complex Adaptive System (CAS)? Will a 2X impact on the initial state create perturbations in a CPS model that are equivalent to 2X? How would this work on a CAS model where the complexity is far from defined and understood because these systems are continuously evolving? Does half as big require half as much or, conversely, does twice as big require twice as much? Once again, I have liberally dipped into this fantastic work by Geoffrey West to summarize, as best as possible, the definitions and implications. He shows that we cannot linearly extrapolate energy consumption and size: the world is smattered with evidence that undermines the linear extrapolation model. In fact, as you grow, you become more efficient with respect to energy consumption. The savings of energy due to growth in size is commonly called the economy of scale. His research also suggests two interesting results. When cities or social systems grow, they require an infrastructure to help with the growth. He discovered that it takes 85% resource consumption to grow the systems by 100%. Thus, there is a savings of 15%, which is slightly lower than what has been studied on the biological front, wherein organisms save 25% as they grow. He calls this sublinear scaling. In contrast, he also introduces the concept of superlinear scaling, wherein there are 15% increasing returns to scale when the city or a social system grows. In other words, if the system grows by 100%, the positive returns with respect to such elements as patents, innovation, etc. will grow by 115%. In addition, the negative elements also grow in an equivalent manner: crime, disease, social unrest, etc. Thus, the growth in cities is supported by an efficient infrastructure that generates increasing returns of good and bad elements.
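The two regimes can be sketched with a simple power law of the form Y = Y0 × N^β. This is only an illustration: it assumes that the 15% figures above translate into exponents of roughly 0.85 (infrastructure) and 1.15 (socioeconomic outputs), and the function name scaled_quantity and the baseline of 1.0 are hypothetical.

```python
def scaled_quantity(size_ratio: float, exponent: float, baseline: float = 1.0) -> float:
    """Return baseline * size_ratio ** exponent, a simple power-law scaling."""
    return baseline * size_ratio ** exponent


# Assumed exponents, read off the 15% savings / 15% increasing returns in the text.
INFRASTRUCTURE_EXPONENT = 0.85   # sublinear: economies of scale
SOCIOECONOMIC_EXPONENT = 1.15    # superlinear: increasing returns

doubling = 2.0  # the city doubles in size
linear = scaled_quantity(doubling, 1.0)
infrastructure = scaled_quantity(doubling, INFRASTRUCTURE_EXPONENT)
outputs = scaled_quantity(doubling, SOCIOECONOMIC_EXPONENT)

print(f"linear expectation: x{linear:.2f}")
print(f"infrastructure:     x{infrastructure:.2f}")
print(f"patents, crime etc: x{outputs:.2f}")
```

Under these assumed exponents, a city twice the size needs less than twice the infrastructure but generates more than twice the socioeconomic output, good and bad alike.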

Max Kleiber, a Swiss chemist, proposed Kleiber’s law in the 1930s. It sheds a lot of light on metabolic rate, understood as energy consumption per unit of time. As mass increases so does the overall metabolic rate, but the relation is not linear – it obeys a power law. The law states that a living organism’s metabolic rate scales to the ¾ power of its mass. If a cat has a mass 100 times that of a mouse, the cat will metabolize about 100^(3/4) ≈ 31.6 times more energy per day rather than 100 times more energy per day. Kleiber’s law has led to the metabolic theory of ecology, which posits that the metabolic rate of organisms is the fundamental biological rate that governs most observed patterns in our immediate ecology. There is some ongoing debate on the mechanism that allows metabolic rate to differ based on size. One mechanism is that smaller organisms have a higher surface area to volume ratio and thus need relatively more energy than large organisms, which have a lower surface area to volume ratio. This assumes that energy consumption occurs across surface areas. However, there is another mechanism that argues that energy consumption happens when energy needs are distributed through a transport network that delivers and synthesizes energy. Smaller organisms do not have as rich a network as large organisms, and thus use more energy per unit of mass than larger organisms. Either way, the implications are that body size and temperature (which is a result of internal activity) provide fundamental and natural constraints by which our ecological processes are governed. This leads to another concept called finite time singularity, which predicts that unbounded growth cannot be sustained because it would need infinite resources or some K factor that would allow it to increase. The K factor could be innovation, a structural shift in how humans and objects cooperate, or even a matter of jumping on a spaceship and relocating to Mars.
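The cat and mouse arithmetic can be checked with a few lines of code. This is a minimal sketch that takes the ¾ exponent from the text at face value; the function names are hypothetical.

```python
KLEIBER_EXPONENT = 0.75  # the 3/4 power stated by Kleiber's law


def relative_metabolic_rate(mass_ratio: float) -> float:
    """How many times more energy an animal uses if it is mass_ratio times heavier."""
    return mass_ratio ** KLEIBER_EXPONENT


def relative_rate_per_unit_mass(mass_ratio: float) -> float:
    """Energy use per unit of body mass, relative to the smaller animal."""
    return mass_ratio ** (KLEIBER_EXPONENT - 1)


cat_vs_mouse = relative_metabolic_rate(100)
per_unit_mass = relative_rate_per_unit_mass(100)

print(f"total energy use:      x{cat_vs_mouse:.2f}")
print(f"energy per unit mass:  x{per_unit_mass:.2f}")
```

The second number is the economy of scale in miniature: under this law, each unit of the larger animal’s body mass runs on roughly a third of the energy that a unit of the smaller animal’s does.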

We are getting bigger faster. That is real. The specter of a dystopian future hangs over us like the sword of Damocles. The thinking is that this rate of growth and scale is not sustainable since it is impossible to marshal the resources to feed the beast in an adequate and timely manner. But interestingly, if we were to dig deeper into history, these thoughts prevailed in earlier times as well, though perhaps at a different scale. In 1798 Thomas Robert Malthus famously predicted that short-term gains in living standards would inevitably be undermined as human population growth outstripped food production, thereby driving living standards back toward subsistence. Humanity thus was checkmated into an inevitable conclusion: a veritable collapse spurred by the tendency of population to grow geometrically while food production would increase only arithmetically. Almost two hundred years later, a group of scientists contributed to the 1972 book called Limits to Growth, which had refrains similar to Malthus: the population is growing, there are not enough resources to support the growth, and that would lead to the collapse of our civilization. However, humanity has negotiated those dark thoughts and we continue to prosper. Even if we are indeed governed by this finite time singularity, human ingenuity has largely won the day. Technology advancements, policy and institutional changes, new ways of collaboration, etc. have emerged to further delay this “inevitable collapse” that could be the result of more mouths to feed than is possible. What is true is that the need for new innovative models and new ways of doing things to solve the global challenges wrought by increased population and their correspondent demands will continue to increase at a quicker pace. One could thus argue that the increased pace of life would not be sustainable. However, that is not a plausible hypothesis based on our assessment of where we are and where we have been.

Let us turn our attention to a business. Do we want the business to grow, or do we want the business to scale? What is the difference? To grow means that your company is adding resources or infrastructure to handle increased demand, at a cost which is equivalent to the level of increased revenue coming in. Scaling occurs when the business is growing faster than the resources that are being consumed. We have already explored the outlier case in which you grow so big that you are crushed by your own weight. It is that fact which limits the growth of an organism regardless of issues related to scale. Similarly, one could conceivably argue that there are limits to the growth of a company, and might even turn to history and show that a lot of large companies of yesteryear have collapsed. However, it is also safe to say that large organizations today are larger by several factors than the largest organizations in the past, and that is largely on account of accumulated knowledge and new forms of innovation and collaboration that have allowed that to happen. In other words, the future bodes well for even larger organizations, and if those organizations indeed reach that gargantuan size, it is also safe to draw the conclusion that they will be consuming far fewer resources relative to current organizations, thus saving more energy and distributing more wealth to the consumers.

Thus, scaling laws limit growth when we assume that everything else is constant. However, if there is innovation that leads to structural changes in a system, then the limits to growth become variable. So how do we effect structural changes? What is the basis? What is the starting point? We look at modeling as a means to arrive at new structures that might allow systems to be shaped in such a manner that their growth is not limited by their own constraints of size and motion and temperature (in physics parlance). Thus, a system is modeled at a presumably small scale, but with the understanding that as the system increases in size, the inner workings of emergent complexity could be a problem. Hence, it would be prudent not to linearly extrapolate the model of a small system to that of a large one, but rather to exponentially extrapolate the complexity of the new system that would emerge. We will discuss this in later articles, but it would be wise to keep this as a mental note as we forge ahead and refine our understanding of scale and its practical implications for our daily consumption.

Model Thinking

Model Framework

The fundamental tenet of theory is the concept of “empiria”. Empiria refers to our observations. Based on observations, scientists and researchers posit a theory – it is part of scientific realism.

A scientific model is a causal explanation of how variables interact to produce a phenomenon, usually linearly organized. A model is a simplified map consisting of a few primary variables that are judged to have the most explanatory power for the phenomenon being observed. We discussed Complex Physical Systems and Complex Adaptive Systems earlier in this chapter. It is relatively easier to map CPS to models than CAS, largely because a model becomes very unwieldy as it starts to internalize more variables, especially if those variables have volumes of interaction between them. A simple analogy would be the use of multiple regression models: when you have a number of independent variables that are strongly correlated with each other, multicollinearity arises, and the model is not stable or does not have predictive value.
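The regression analogy can be made concrete with the variance inflation factor (VIF), a standard diagnostic for strongly correlated predictors. The sketch below is illustrative only: the data are made up, the function names are hypothetical, and it covers just the two-predictor case, where the VIF reduces to 1 / (1 − r²).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.1, 8.0]  # almost a copy of x1
x3 = [5, 1, 4, 2, 8, 3, 7, 6]                  # only loosely related to x1

vif_collinear = vif_two_predictors(x1, x2)
vif_loose = vif_two_predictors(x1, x3)

print(f"VIF for x1 with x2: {vif_collinear:.1f}")
print(f"VIF for x1 with x3: {vif_loose:.2f}")
```

A common rule of thumb treats a VIF above about 10 as a warning sign: the coefficient estimates become so sensitive to small changes in the data that the model loses stability.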

Research projects generally tend either to look at a single case study or to describe a number of similar cases that are logically grouped together. Constructing a simple model that can be general and applied to many instances is difficult, if not impossible. Variables are subject to a researcher’s lack of understanding of the variable or to the volatility of the variable. What further accentuates the problem is that the researcher misses the interaction of how the variables play against one another and the resultant impact on the system. Thus, our understanding of a system can be advanced through some sort of model mechanics, and yet we share the common belief that the task of building out a model to provide all of the explanatory answers is difficult, if not impossible. Despite our understanding of the limitations of modeling, we still develop frameworks and artifact models because we sense in them a set of indispensable tools to transmit the results of research to practical use cases. We boldly generalize our findings from empiria into general models that we hope will explain empiria best. And let us be mindful that it is possible, more so in CAS than in CPS, that we might have multiple models fighting over their explanatory powers simply because of the vagaries of uncertainty and stochastic variations.

Popper says: “Science does not rest upon rock-bottom. The bold structure of its theories rises, as it were, above a swamp. It is like a building erected on piles. The piles are driven down from above into the swamp, but not down to any natural or ‘given’ base; and when we cease our attempts to drive our piles into a deeper layer, it is not because we have reached firm ground. We simply stop when we are satisfied that they are firm enough to carry the structure, at least for the time being”. This leads to the satisficing solution: if a model can choose the least number of variables to explain the greatest amount of variation, the model is relatively better than other models that would select more variables to explain the same. In addition, there is always a cost-benefit analysis to be taken into consideration: if we add x number of variables to explain variation in the outcome but the result is not meaningfully different from that of a model with fewer than x variables, then one would want to fall back on the model with fewer variables because it is less costly to maintain.
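One common way to put this trade-off into numbers is adjusted R², which penalizes a model for every variable it adds. The text does not name this measure; it is swapped in here purely as an illustration, and the R² values, sample size and variable counts below are made up.

```python
def adjusted_r2(r2: float, n_obs: int, n_vars: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_vars - 1)


# Hypothetical models fitted to the same 50 observations.
small_model = adjusted_r2(r2=0.800, n_obs=50, n_vars=3)
large_model = adjusted_r2(r2=0.805, n_obs=50, n_vars=8)

print(f"3 variables: adjusted R^2 = {small_model:.3f}")
print(f"8 variables: adjusted R^2 = {large_model:.3f}")
```

Here the five extra variables lift raw R² by only half a point, and once the penalty is applied the smaller model comes out ahead, which is the satisficing argument in arithmetic form.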

Researchers must address three key elements in the model: time, variation and uncertainty. How do we craft a model which reflects the impact of time on the variables and the outcome? How do we present variations in the model? Different variables might vary differently, independent of one another. How do we present the deviation of the data in a parlance that allows us to make meaningful conclusions regarding the impact of the variations on the outcome? Finally, are the data being considered actual or proxy data? Are the observations approximate? How do we thus draw the model to incorporate the fuzziness: would confidence intervals on the findings be good enough?

Two other equally important concepts in model design are Descriptive Modeling and Normative Modeling.

Descriptive models aim to explain the phenomenon. They are bounded by that goal and that goal only.

There are certain types of explanations that they fall back on: explain by looking at data from the past and attempting to draw a cause and effect relationship. If the researcher is able to draw a complete cause and effect relationship that meets the test of time and independent tests to replicate the results, then the causality turns into law for the limited use-case or the phenomenon being explained. Another explanation method is to draw upon context: explaining a phenomenon by looking at the function that the activity fulfills in its context. For example, a dog barks at a stranger to secure its territory and protect the home. The third and more interesting type of explanation is generally called intentional explanation: the variables work together to serve a specific purpose and the researcher determines that purpose and thus, reverse engineers the understanding of the phenomenon by understanding the purpose and how the variables conform to achieve that purpose.

This last element also leads us to thinking through the other method of modeling – namely, normative modeling. Normative modeling differs from descriptive modeling because the target is not to simply just gather facts to explain a phenomenon, but rather to figure out how to improve or change the phenomenon toward a desirable state. The challenge, as you might have already perceived, is that the subjective shadow looms high and long and the ultimate finding in what would be a normative model could essentially be a teleological representation or self-fulfilling prophecy of the researcher in action. While this is relatively more welcome in a descriptive world since subjectivism is diffused among a larger group that yields one solution, it is not the best in a normative world since variation of opinions that reflect biases can pose a problem.

How do we create a representative model of a phenomenon? First, we weigh whether the phenomenon is to be understood as a mere explanation or whether to extend it to incorporate our normative spin on the phenomenon itself. It is often the case that we might have to craft different models and then weigh one against the other to find the one that best represents how the phenomenon can be explained. Some of the methods are fairly simple, as in bringing diverse opinions to a table and then agreeing upon one specific model. The advantage of such an approach is that it provides a degree of objectivism in the model – at least in so far as it removes the divergent subjectivity that weaves into the various models. Another alternative is to do value analysis, which is a mathematical method where the selection of the model is carried out in stages. You define the criteria of the selection and then the importance of the goal (if that be a normative model). Once all of the participants have a general agreement, then you have the makings of a model. The final method is to incorporate all of the outliers and the data points in the phenomenon that the model seeks to explain, and then offer a shared belief in those salient features of the model that would be best to apply to gain information about the phenomenon in a predictable manner.
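The staged selection in value analysis can be sketched as a weighted scoring table. This is one possible reading of the method, not a definitive procedure: the criteria, the weights, the 1 to 10 scores and the model names below are all hypothetical.

```python
# Stage 1: participants agree on the selection criteria and their importance.
# The weights are hypothetical and sum to 1.
criteria_weights = {
    "explanatory_power": 0.5,
    "simplicity": 0.3,
    "cost_to_maintain": 0.2,
}

# Stage 2: each candidate model is scored from 1 to 10 on every criterion
# (made-up scores, higher is better).
candidate_scores = {
    "model_a": {"explanatory_power": 8, "simplicity": 4, "cost_to_maintain": 5},
    "model_b": {"explanatory_power": 7, "simplicity": 8, "cost_to_maintain": 7},
}


def weighted_score(scores: dict, weights: dict) -> float:
    """Sum of criterion scores, each multiplied by its agreed weight."""
    return sum(weights[name] * scores[name] for name in weights)


# Stage 3: rank the candidates by their weighted score, best first.
ranked = sorted(
    candidate_scores,
    key=lambda name: weighted_score(candidate_scores[name], criteria_weights),
    reverse=True,
)

for name in ranked:
    print(name, round(weighted_score(candidate_scores[name], criteria_weights), 2))
```

The arithmetic is trivial by design. The real work sits in the first stage, where the participants have to agree on the criteria and how much each one matters.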

There are various languages that are used for modeling:

Written Language refers to the natural language description of the model. For example: if the price of butter goes up, the quantity of butter demanded will go down. Written language models can be used effectively to inform all of the other types of models that follow below. It often goes by the name of “qualitative” research, although we find that a bit limiting. Just a simple statement like “This model approximately reflects the behavior of people living in a dense environment …” could qualify as a written language model that seeks to shed light on the object being studied.

Icon Models refer to a pictorial representation and probably the earliest form of model making. It seeks to only qualify those contours or shapes or colors that are most interesting and relevant to the object being studied. The idea of icon models is to pictorially abstract the main elements to provide a working understanding of the object being studied.

Topological Models refer to how the variables are placed with respect to one another and thus help in creating a classification or taxonomy of the model. One can have logical trees, class trees, Venn diagrams, and other imaginative pictorial representations of fields to further shed light on the object being studied. In fact, pictorial representations must abide by constant scale, direction and placement. In other words, if the variables are placed on a different scale on different maps, it would be hard to draw logical conclusions by sight alone. In addition, if the placements are on different axes in different maps or have different vectors, it is hard to make comparisons and arrive at a shared consensus and a logical end result.

Arithmetic Models are what we generally fall back on most. The data is measured with an arithmetic scale and presented via tables, equations or flow diagrams. The nice thing about arithmetic models is that you can show multiple dimensions, which is not possible with other modeling languages. Hence, the robustness and the general applicability of such models are huge, and they are widely used as a key language for modeling.

Analogous Models refer to crafting explanations using the power of analogy. For example, when we talk about waves, we could be talking of light waves, radio waves, historical waves, etc. These metaphoric representations can be used to explain phenomena, but at best the explanatory power is nebulous, and it would be difficult to explain the variations and uncertainties between two analogous models. However, analogy is still used to transmit information quickly through verbal expressions like “Similarly”, “Equivalently”, “Looks like ..” etc. In fact, extrapolation is a widely used method in modeling and we would regard it as part of the analogous model to a great extent. That is because we time-box the variables in the analogous model to one instance and the extrapolated model to another instance and we tie them up with mathematical equations.

 

Disseminating financial knowledge to develop engaged organizations

Financial awareness of key drivers is becoming a paramount leading indicator of organizational success. For most, the finance department is a corner office service that offers ad hoc analysis on strategic and operational initiatives to a company, and provides an ex-post assessment of the financial condition of the company to a select few. There are some key financial metrics that one wants to measure across all companies and all industries without exception, but then there are unique metrics that reflect the key underlying drivers for organizational success. Organizations align their forays into new markets, new strategies and new ventures around a narrative that culminates in a financial metric or a proxy that illustrates opportunities lost or gained.

Having been cast in operational finance roles for a good length of my career, I have often encountered a high level of interest in learning financial concepts in areas such as engineering, product management, operations, sales, etc. I have to admit that I have been humbled by the fairly wide common-sense understanding of basic financial concepts that these folks have. However, in most cases, the understanding is less than skin deep, with misunderstandings that are meaningful. The good news is that I have also noticed a promising trend, namely … the questions are more thoroughly weighed by the “non-finance” participants, and there seems to be an elevated understanding of key financial drivers that translate to commercial success. This knowledge continues to accelerate … largely because of the convergence of areas around data science, analytics, assessment of personal ownership stakes, etc. But the passing of such information across these channels to the hungry recipients is not formalized. In other words, I posit that having a formal channel for inculcating financial education across the various functional areas would pay rich dividends for the company in the long run. Finance is a vast enough field that sharing general knowledge of concepts that go more than merely skin deep would also enable the finance group to engage in meaningful conversations with other functional experts, thus allowing the narrative around the numbers to be more wholesome. Thus, imparting the financial knowledge would be beneficial to the finance department as well.

To be effective in creating a formal channel for disseminating information about the key areas in finance that matter to the organization, it is important to understand the operational drivers. When I say operational drivers, I am expanding that to encompass drivers that may uniquely affect other functional areas. For example, sales may be concerned with revenue and margins, whereas production may be concerned with server capacity, work-in-process, throughput, etc. In the end, the financial metrics are derivatives. They are cross products of single or multiple drivers, and these are the elements that need to be fleshed out to effect a spirited conversation. That would then enable the production of a financial barometer that everyone in the organization can rally behind and understand, and more importantly … be able to assess how their individual contribution has advanced and will advance organization goals.

The Political Campaign Juggernaut – What Obamney campaigns can teach Organizations!

The Presidential election is tomorrow. I shall not disclose my position, but I am a San Francisco/Bay Area native. Any doubts about whom I am most likely inclined toward? Most likely not! But the campaign throughout the year got me thinking. Imagine … over $1.3B has been spent to either bash someone or to send a message out. Over $1.3B! I do not have the actual numbers, but what I do know is that about $1B was spent in 2008 and it is estimated that the total spend was at least 30% more for the 2012 campaign. That makes it one of the biggest annual marketing budgets. To put it in context, that is roughly 40% more than what Apple spent on advertising in 2011 ($933M).


We are expecting about 100M people to vote. 100M people to give a like to either party. Now look at it this way. $1.3B suggests that the total presidential campaign budget would translate to over 400M clicks (assuming $3 per click) or over 650 billion impressions (assuming $2 per 1000 impressions). Of course, that is not actually the case because there is payroll, organization expenses, etc. But you get the point. It is a big, big budget … and it is one of the very few budgets that tend to be managed very well. And for all that largesse, the figure does not take into account the volunteer base that goes into the campaigns.
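The back-of-the-envelope numbers above can be reproduced in a few lines. This is a sketch using only the figures quoted in this post; the variable names are hypothetical, and the per-voter figure is an extra derived from the same inputs.

```python
campaign_budget = 1_300_000_000         # $1.3B total spend, as quoted above
cost_per_click = 3.0                    # assumed $3 per click
cost_per_thousand_impressions = 2.0     # assumed $2 per 1000 impressions
expected_voters = 100_000_000           # about 100M voters

clicks = campaign_budget / cost_per_click
impressions = campaign_budget / cost_per_thousand_impressions * 1000
spend_per_voter = campaign_budget / expected_voters

print(f"clicks:          {clicks / 1e6:,.0f} million")
print(f"impressions:     {impressions / 1e9:,.0f} billion")
print(f"spend per voter: ${spend_per_voter:,.2f}")
```

Put per voter, the budget comes to about $13 for every expected vote, before counting volunteer time.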


Now the outcome associated with political campaigns is fairly concrete. Either you have put the money to good use, resulting in the election of the appropriate person, or the money spent has not been good enough. Who do you fire? The person who loses either moves shop from the White House or considers becoming the CEO of the next big thing – perhaps a public equity capital group. Either way, we can take some learnings from all that has transpired and apply them to organizations. Of course, most organizations do not have this massive budget, but regardless … they do have substantial marketing budgets, and so the question is: what can we learn from what we have seen in the political theater that would enable the organization to shape and landscape the customer and employee mindshare?

Here are a few key points:
1. Pounding the message: Organizations have to be focused on the end goal and ensure at all times that every message delivered serves a set of key objectives that enable organizational success. That means that there should be no ambiguity as to what the organization and its brand represent. Dilution of the message may open up pockets of undecided customers or employees that could vote with their wallets and their feet quite readily.
2. Creating advocacy groups: Organizations have to create and nurture product and message evangelists by placing these nodes across many fields where potential customers and employees may come in contact with the organization. That would mean almost all social media channels, offline channels, conferences, elicited testimonials, investor and public relations efforts, timing of special news releases, etc. Advocacy groups are a proxy for all channels that an organization must leverage.
3. Aspirational Inclinations: Sell a dream! Sell possibilities! Sell the Why Nots! People tend to converge upon a platform of optimism. Yet, organizations must also be able to short their competitor’s offerings or perhaps not mention them at all.
4. Polling the behavior: If you notice, political campaigns have taken a page out of the Lean Startup methodology. If polls go haywire … resources and messages are tweaked to create a semblance of stability and to get back to desired radar frequencies. Tweaking of the message and the presence of the messenger become important. This is field deployment of solutions based on what all the gathered data intelligence is telling you.
5. Super PACs and Angel Affiliates: You have limits, as do all organizations! No problem! Create evangelists that are not directly on the take. These are folks that will push your culture to the furthest corners of the globe. So recognize them and support them. They carry the torch since they fully believe in your mission and that your organization’s outcomes will impact them positively. How? Let them know. Drill. Baby. Drilllll the message.
6. Electoral College wins, not popular polls: Focus on the profitable customers; get the very best employees. Stratify your business so that you buy the win. You may not have the most likes but you would have had enough among the strata that truly matters.
7. Give the final reason: Give customers and employees a reason to vote. You want them to vote for you, but all the same you still want them to vote. You want the market of ideas to expand, even though they may serve competing visions in the tapestry of organizations in your space. But in trying to harness the turnout to the polls, you will have done as well as you can to draw them to your mojo.


See you, all possible voters, at the polls tomorrow. Applaud and keep the flames of democracy alive.

The Big Data Movement: Importance and Relevance today?

We are entering a new age in which we want a finer understanding of the relationships between businesses and customers, organizations and employees, and products and how they are used, as well as how different aspects of the business and the organization connect to produce meaningful, actionable information. We are seeing a lot of data, and the old tools for managing, processing and gathering insights from it, such as spreadsheets and SQL databases, do not scale to current needs. Thus, Big Data is becoming a framework for how to process, store and cope with the reams of data being collected.

According to IDC, it is imperative that organizations and IT leaders focus on the ever-increasing volume, variety and velocity of information that forms big data.

  • Volume. Many factors contribute to the increase in data volume – transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, etc. In the past, excessive data volume created a storage issue. But with today’s decreasing storage costs, other issues emerge, including how to determine relevance amidst the large volumes of data and how to create value from data that is relevant.
  • Variety. Data today comes in all types of formats – from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions. By some estimates, 80 percent of an organization’s data is not numeric! But it still must be included in analyses and decision making.
  • Velocity. According to Gartner, velocity “means both how fast data is being produced and how fast the data must be processed to meet demand.” RFID tags and smart metering are driving an increasing need to deal with torrents of data in near-real time. Reacting quickly enough to deal with velocity is a challenge to most organizations.

SAS has added two additional dimensions:

  • Variability. In addition to the increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks. Is something big trending on social media? Daily, seasonal and event-triggered peak data loads can be challenging to manage – especially with social media involved.
  • Complexity. When you deal with huge volumes of data, it comes from multiple sources. It is quite an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages or your data can quickly spiral out of control. Data governance can help you determine how disparate data relates to common definitions and how to systematically integrate structured and unstructured data assets to produce high-quality information that is useful, appropriate and up-to-date.

 

So to reiterate, Big Data is a framework stemming from the realization that data has gathered significant pace and that its growth has exceeded the capacity of an organization to handle, store and analyze the data in a manner that offers meaningful insights into the relationships between data points. I am calling this a framework, unlike other materials that call Big Data a consequence of the inability of organizations to handle mass amounts of data. I refer to Big Data as a framework because it sets the parameters around an organization’s decision as to when and which tools must be deployed to address data scalability issues.

Thus, to put the appropriate parameters around when an organization must consider Big Data as part of its analytics roadmap in order to understand the patterns of data better, it has to answer the following ten questions:

  1. What are the different types of data that should be gathered?
  2. What are the mechanisms that have to be deployed to gather the relevant data?
  3. How should the data be processed, transformed and stored?
  4. How do we ensure that there is no single point of failure in data storage and data loss that may compromise data integrity?
  5. What are the models that have to be used to analyze the data?
  6. How are the findings of the data to be distributed to relevant parties?
  7. How do we assure the security of the data that will be distributed?
  8. What mechanisms do we create to implement feedback against the data to preserve data integrity?
  9. How do we morph the big data model into new forms that account for new patterns to reflect what is meaningful and actionable?
  10. How do we create a learning path for the big data model framework?

Some of the existing literature has commingled the Big Data framework with analytics. In fact, the literature has gone on to make a rather assertive statement, i.e., that Big Data and predictive analytics should be looked upon in the same vein. Nothing could be further from the truth!

There are several tools available in the market to do predictive analytics against a set of data that may not qualify for the Big Data framework. While I was the CFO at Atari, we deployed business intelligence tools using MicroStrategy, and MicroStrategy had predictive modules. In my recent past, we had explored SAS and Minitab tools to do predictive analytics. In fact, even Excel can do multivariate analysis, ANOVA, regression analysis and best-fit curve analysis. These analytical techniques have been part of the analytics arsenal for a long time. Different data sizes may need different tools to instantiate relevant predictive analysis. This is a very important point because companies that do not have Big Data ought to seriously reconsider their strategy of what tools and frameworks to use to gather insights. I have known companies that have gone the Big Data route, although all data points (excuse my pun), even after incorporating capacity and forecasts, suggest that alternative tools are more cost-effective than implementing Big Data solutions. Big Data is not a one-size-fits-all model. It is an expensive implementation. However, for the right data size, which in this case would be a very large data size, a Big Data implementation would be extremely beneficial and cost-effective in terms of the total cost of ownership.
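To make the point about modest data sizes concrete, here is a minimal sketch of a best-fit line (ordinary least squares) in plain Python, with no Big Data tooling involved. The function names `fit_line` and `predict` and the spend and sales figures are made up purely for illustration; they are not drawn from any real company.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Forecast y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical figures: marketing spend (in $000s) and units sold
spend = [10, 20, 30, 40, 50]
units = [25, 45, 65, 85, 105]

slope, intercept = fit_line(spend, units)
forecast = predict(slope, intercept, 60)
print(f"units = {intercept:.1f} + {slope:.1f} * spend; forecast at 60: {forecast:.1f}")
```

Five rows fit comfortably in a spreadsheet, and the same holds for many thousands of rows, which is why the size of the data should drive the choice of tool.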

Areas where Big Data Framework can be applied!

Some areas lend themselves to the application of the Big Data Framework. I have broadly identified four key areas:

  1. Marketing and Sales: Consumer behavior, marketing campaigns, sales pipelines, conversions, marketing funnels and drop-offs, and distribution channels are all areas where Big Data can be applied to gather deeper insights.
  2. Human Resources: Employee engagement, employee hiring, employee retention, organization knowledge base, impact of cross-functional training, reviews, compensation plans are elements that Big Data can surface. After all, generally over 60% of company resources are invested in HR.
  3. Production and Operational Environments: Data growth, different types of data appended as the business learns about the consumer, concurrent usage patterns, traffic, web analytics are prime examples.
  4. Financial Planning and Business Operational Analytics: Predictive analytics around bottom-up sales, marketing campaign ROI, customer acquisition costs, earned media and paid media, margins by SKU and distribution channel, operational expenses, portfolio evaluation, risk analysis, etc., are some of the examples in this category.

Hadoop: A Small Note!

Hadoop is becoming a more widely accepted tool for addressing Big Data needs. It grew out of techniques that Google developed and published (the Google File System and MapReduce) so that it could index the structural and text information it was collecting and present meaningful and actionable results to users quickly. Hadoop itself was created as an open-source project by Doug Cutting and Mike Cafarella, and it was further developed at Yahoo, which tweaked it for enterprise applications.

Hadoop runs on a large number of machines that don’t share memory or disks, and the Hadoop software runs on each of them. Thus, if you have, for example, over 10 gigabytes of data, you take that data and spread it across different machines. Hadoop tracks where all of this data resides! The servers or machines are called nodes, and a group of nodes working together is called a cluster. Each server operates on its own little piece of the data, and once the data is processed, the results are delivered to the main client as a unified whole. That two-step process, in which each node first processes its own piece (the map step) and the partial results are then combined into one unified whole (the reduce step), is MapReduce, an important mechanism of Hadoop. You will also hear about something called Hive, which is a data warehouse layer that sits on top of Hadoop and lets you query the stored data with a SQL-like language. Hadoop itself stores the data, structured or unstructured, keeps redundant copies across the nodes of the cluster, and offers a unified result through the MapReduce function.
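The idea of each node working on its own piece and the results being combined can be sketched in a few lines of plain Python. This is only a toy simulation on one machine, not Hadoop’s actual API: the "nodes" are just list slices, and the function names (`split_across_nodes`, `map_on_node`, `reduce_counts`, `word_count`) and sample records are assumptions made up for illustration.

```python
from collections import Counter
from functools import reduce


def split_across_nodes(records, node_count):
    """Deal records out round-robin so each simulated node holds one piece."""
    return [records[i::node_count] for i in range(node_count)]


def map_on_node(piece):
    """Map step: a node counts the words in its own piece only."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


def word_count(records, node_count=3):
    """Spread the records, map on every node, then reduce to a single result."""
    pieces = split_across_nodes(records, node_count)
    partials = [map_on_node(piece) for piece in pieces]
    return reduce(reduce_counts, partials, Counter())


# Hypothetical input records
records = ["big data is big", "data moves fast", "big clusters process data"]
totals = word_count(records)
print(totals.most_common(2))
```

The answer does not depend on how many nodes the records are spread across, which is what lets the real system add machines as the data grows.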

Personally, I have always been interested in Business Intelligence. I have always considered BI, in the new age, a handy tool and a stepping stone to truly understanding a business and developing financial and operational models that stay fairly close to the trending insights that the data generates. So my ear is always to the ground as I follow the developments in this area … and though I have not implemented a Big Data solution, I have always been and will continue to be interested in seeing its applications in certain contexts and against the various use cases in organizations.