Data Science

Data Science is about making sense of data: using it to model and solve complex problems, drawing conclusions from hypotheses tested with the scientific method, and presenting compelling solutions by “telling a story with data.” It draws on skills and knowledge from several disciplines (data engineering, mathematics and statistics, modelling and programming, and visualization), aided by deep domain expertise. The picture illustrates our Data Science capabilities:

Our Data Science Services:

Consulting – Our Consulting offering brings expertise in mapping business problems to technical solutions that deliver business value. We specialize in high-level architecture, platform and design consulting for large-scale data processing, transformational BI and predictive analytics.

System Integration – Our System Integration services focus on building performant custom analytical systems, managing data efficiently, and maintaining cluster infrastructure that makes optimal use of a given computing environment.

Solution Development – Our Solution offerings come with frameworks and toolkits for solving challenges in a specific vertical or domain. They handle large datasets, a variety of data structures and varying latency requirements. These are complete solution builds that fit well with a customer’s data product strategy, and they use open-source accelerators that we have built.

Center of Excellence (CoE) – Our CoE offering is unique in that it shares our years of expertise in Big Data Analytics and Data Science. We put that experience to work helping nascent Data Science practices ramp up quickly on their Advanced Analytics journey or establish captive CoEs.

Statistical Analysis

Statistical analysis involves collecting and scrutinizing data samples drawn from a population, the full set of items from which samples can be drawn. A sample, in statistics, is a representative selection drawn from that population. The goal of statistical analysis is to identify trends, relationships and patterns.

Exploratory Data Analysis (EDA): EDA describes the nature of the data to be analyzed and explores the relationships between data elements. It also helps understand and quantify features of the population.
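
As a minimal sketch of the kind of summary EDA produces, the snippet below computes basic descriptive statistics for one column and the Pearson correlation between two columns, using only the Python standard library. The column names and values are hypothetical, chosen only for illustration.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (needs >= 2 values)
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: weekly advertising spend vs. weekly sales
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 128, 140, 155, 180]

print(describe(ad_spend))
print(round(pearson(ad_spend, sales), 3))
```

A correlation close to 1 only flags a strong linear association worth investigating; it does not by itself establish cause and effect.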

Hypothesis Testing: The purpose is to design experiments, measure the results and draw conclusions about the population based on the samples. This helps build a model that summarizes how the data relates to the underlying population and assess whether the evidence supports or refutes that model.
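
One way to test a hypothesis without assuming a particular distribution is a permutation test. The sketch below asks whether two groups from a hypothetical A/B experiment differ in their means: under the null hypothesis the group labels are interchangeable, so it shuffles the labels many times and counts how often a difference at least as large as the observed one appears by chance. The data, the `permutation_test` helper and its defaults are illustrative assumptions.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference (mean(a) - mean(b)) and an estimated
    p-value: the share of random relabelings whose absolute mean difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value away from exactly zero
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical experiment: average order value per user in each group
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2, 12.3, 11.7]
variant = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2, 12.6, 13.3]

diff, p = permutation_test(control, variant)
print(f"difference in means = {diff:.3f}, p-value = {p:.4f}")
```

A small p-value means the observed difference would be unusual if the null hypothesis were true; it is evidence against the null, not proof of the alternative.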

Bayesian Methods: This is a method of inference in which Bayes’ theorem is used to update the probability of a hypothesis as more evidence or information becomes available. It has a wide range of applications, including science, engineering, philosophy, medicine, sport and business.
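
A common textbook case of Bayesian updating is estimating an unknown success rate (for example, a conversion rate). With a Beta prior and binomial observations, Bayes’ theorem gives a Beta posterior in closed form, so each new batch of evidence just adds to two counts. The sketch below assumes a uniform Beta(1, 1) prior; the `BetaBelief` class is a hypothetical helper written for illustration.

```python
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown success probability."""
    alpha: float = 1.0  # prior pseudo-count of successes (1, 1 = uniform prior)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Beta prior x binomial likelihood -> Beta posterior (conjugate update)
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)


belief = BetaBelief()                              # uniform prior, mean 0.5
belief = belief.update(successes=7, failures=3)    # first batch of evidence
print(round(belief.mean, 3))                       # 0.667, posterior Beta(8, 4)
belief = belief.update(successes=2, failures=8)    # more evidence arrives
print(round(belief.mean, 3))                       # 0.455, posterior Beta(10, 12)
```

Because the posterior after one batch becomes the prior for the next, the estimate shifts as evidence accumulates, which is exactly the updating described above.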

Operations Research

Operations Research applies advanced analytical methods to help make better decisions in complex environments with many conflicting performance metrics and constrained resources.

Optimization: This involves finding optimal or near-optimal solutions to complex decision problems, that is, maximizing some real-world objective (profit, performance or yield) or minimizing one (loss, risk or cost).
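
A small, classic example of optimization under a resource constraint is the 0/1 knapsack problem: choose a subset of items that maximizes total value without exceeding a budget. The sketch below frames it as picking hypothetical projects within a fixed budget and solves it exactly with dynamic programming. Real problems of this kind are more often handed to a linear or integer programming solver; the project data here is invented.

```python
def knapsack(items, capacity):
    """Exact 0/1 knapsack by dynamic programming.

    items: list of (name, cost, value) tuples with integer costs.
    Returns (best_value, chosen_names) maximizing total value subject to
    total cost <= capacity.
    """
    n = len(items)
    # best[i][c] = best value achievable with the first i items and budget c
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, cost, value) in enumerate(items, start=1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip item i
            if cost <= c:                # option 2: take item i if it fits
                best[i][c] = max(best[i][c], best[i - 1][c - cost] + value)
    # Walk back through the table to recover which items were taken
    chosen, c = [], capacity
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            name, cost, _ = items[i - 1]
            chosen.append(name)
            c -= cost
    return best[n][capacity], sorted(chosen)


# Hypothetical projects: (name, cost in $k, expected profit in $k)
projects = [("A", 40, 90), ("B", 30, 60), ("C", 20, 50), ("D", 50, 100)]
print(knapsack(projects, capacity=90))  # (200, ['A', 'B', 'C'])
```

The table has (items + 1) x (capacity + 1) cells, so this exact approach suits modest integer budgets; larger problems typically call for solvers or heuristics that return near-optimal answers.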

Simulation: Simulation is the imitation of a real-world system. It needs a model that represents the key characteristics, behavior and function of the system. This model is used to explore what-if scenarios and understand the effects of alternative conditions and courses of action.
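
To illustrate a what-if simulation, the Monte Carlo sketch below estimates the chance of running out of stock over a 30-day horizon for several starting stock levels. The demand model is an assumption made for illustration: daily demand is drawn from a normal distribution, rounded to whole units, clipped at zero, and no restocking happens during the horizon.

```python
import random


def stockout_probability(stock, mean_demand, sd_demand, days=30, trials=2_000, seed=1):
    """Monte Carlo estimate of P(total demand over `days` exceeds `stock`).

    Assumed demand model: normal daily demand, rounded and clipped at zero;
    no restocking within the horizon.
    """
    rng = random.Random(seed)
    stockouts = 0
    for _ in range(trials):
        total = sum(max(0, round(rng.gauss(mean_demand, sd_demand))) for _ in range(days))
        if total > stock:
            stockouts += 1
    return stockouts / trials


# What-if: how does stock-out risk change with the starting stock level?
for stock in (280, 300, 320, 340):
    print(stock, stockout_probability(stock, mean_demand=10, sd_demand=3))
```

Reusing the same seed for every scenario (common random numbers) means each stock level is evaluated against the same simulated demand paths, so differences between scenarios reflect the decision rather than sampling noise.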

Forecasting: Forecasting is the process of making predictions about the future based on past and present data, most commonly by analyzing trends and seasonality. A forecast is usually accompanied by a margin of error or a confidence interval.
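
A minimal sketch of trend-based forecasting with an attached margin of error: fit a straight line to past values by ordinary least squares, extrapolate it, and report a rough band of plus or minus two residual standard deviations. This covers trend only (no seasonality), and the band is a simplifying approximation that ignores uncertainty in the fitted parameters; the sales figures are hypothetical.

```python
import statistics


def linear_trend_forecast(history, horizon):
    """Fit y = intercept + slope * t by least squares and extrapolate.

    Returns a list of (point_forecast, margin) pairs, where margin is an
    approximate +/- 2 residual-standard-deviation band.
    """
    n = len(history)
    if n < 3:
        raise ValueError("need at least 3 observations")
    t_mean = (n - 1) / 2
    y_mean = statistics.fmean(history)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(history)]
    # n - 2 degrees of freedom, because two parameters were estimated
    resid_sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [(intercept + slope * t, 2 * resid_sd) for t in range(n, n + horizon)]


monthly_sales = [120, 126, 131, 138, 142, 149, 155, 160]  # hypothetical
for point, margin in linear_trend_forecast(monthly_sales, horizon=3):
    print(f"{point:.1f} +/- {margin:.1f}")
```

For data with seasonality, methods such as Holt-Winters exponential smoothing or seasonal ARIMA models are the usual next step.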

Machine Learning (ML)

The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict the outcomes of interest.

Supervised Learning: The algorithm learns from past observations whose outcomes are known (labeled data) and builds a prediction model. The model is then used to predict the outcome for a new set of input conditions.
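
One of the simplest supervised learners is k-nearest neighbours: given labeled past observations, it predicts the label of a new input by majority vote among the k most similar past cases. The sketch below uses a hypothetical customer-churn dataset with two numeric features.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest neighbours.

    train: list of (feature_vector, label) pairs (past labeled observations).
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical past observations: (monthly_spend, support_calls) -> outcome
history = [
    ((20, 5), "churn"), ((25, 6), "churn"), ((22, 7), "churn"),
    ((80, 1), "stay"), ((75, 0), "stay"), ((90, 2), "stay"),
]
print(knn_predict(history, (24, 5)))  # churn
print(knn_predict(history, (85, 1)))  # stay
```

In practice features are usually rescaled to comparable ranges before computing distances, since otherwise the feature with the largest numeric range dominates.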

Unsupervised Learning: The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about it. Algorithms are left on their own to discover and present interesting structure in the data.
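
Clustering is a typical unsupervised task: no labels are given, and the algorithm groups similar points on its own. The sketch below is plain k-means (Lloyd's algorithm), which alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its assigned points. The points are hypothetical and the number of clusters k must be chosen in advance.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means clustering (Lloyd's algorithm) on 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct data points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two behavioural scores
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

k-means can converge to different solutions from different starting centroids, so it is common to run it several times with different seeds and keep the tightest result.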

Recommender Systems: These systems filter the available choices for a user and personalize recommendations based on the user’s preferences and propensities.
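
A common way to build such recommendations is item-based collaborative filtering: items are considered similar when the same users rate them similarly, and a user is recommended unseen items that resemble items they already rated highly. The sketch below uses cosine similarity over a tiny hypothetical ratings table; production systems add normalization, neighbourhood limits and far more data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two {user: rating} dictionaries."""
    common = set(u) & set(v)
    dot = sum(u[x] * v[x] for x in common)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, user, top_n=2):
    """Item-based collaborative filtering.

    ratings: {user: {item: rating}}. Each item the user has not rated is
    scored by its similarity to the items they did rate, weighted by rating.
    """
    # Pivot to {item: {user: rating}} so items can be compared to each other
    by_item = {}
    for u, items in ratings.items():
        for item, r in items.items():
            by_item.setdefault(item, {})[u] = r
    seen = ratings[user]
    scores = {}
    for candidate in by_item:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(by_item[candidate], by_item[item]) * r for item, r in seen.items()
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical ratings on a 1-5 scale
ratings = {
    "ann":  {"laptop": 5, "mouse": 4, "monitor": 5},
    "ben":  {"laptop": 4, "mouse": 5, "keyboard": 3},
    "cara": {"blender": 5, "toaster": 4},
    "dev":  {"laptop": 5, "monitor": 4, "keyboard": 5},
    "eli":  {"mouse": 4},
}
print(recommend(ratings, "eli"))  # ['laptop', 'monitor']
```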

Natural Language Processing (NLP)

NLP refers to methods for communicating with an intelligent system using a human language. It involves making computers perform useful tasks by automatically interpreting, and making decisions from, both speech and text.

Topic Mining: Topic mining identifies the key topics in a corpus of text. These techniques are used to surface trends, user preferences and new ideas discussed across a rapidly growing volume of information.
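
Full topic models such as Latent Dirichlet Allocation (LDA) are beyond a short sketch. As a simpler first step in the same direction, the snippet below scores terms with TF-IDF (term frequency times inverse document frequency), which highlights words that are frequent in one document but rare across the corpus. The stop-word list and the reviews are hypothetical.

```python
import math
import re
from collections import Counter

# Small hypothetical stop-word list; real pipelines use a fuller one
STOPWORDS = frozenset({"a", "and", "the", "is", "was", "of", "to", "in", "for", "on", "with"})


def top_terms(docs, n=3):
    """Return the n highest-scoring TF-IDF terms for each document."""
    tokenized = [
        [w for w in re.findall(r"[a-z]+", d.lower()) if w not in STOPWORDS]
        for d in docs
    ]
    # Document frequency: number of documents each term appears in
    df = Counter(term for doc in tokenized for term in set(doc))
    n_docs = len(docs)
    results = []
    for doc in tokenized:
        tf = Counter(doc)
        scores = {
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for stable output
        results.append(sorted(scores, key=lambda t: (-scores[t], t))[:n])
    return results


reviews = [  # hypothetical customer reviews
    "Battery life is short and the battery drains overnight",
    "Delivery was late and the courier lost the parcel",
    "The courier was friendly and delivery was on time",
]
print(top_terms(reviews, n=2))
```

Terms that occur in every document get an IDF of zero, so only distinguishing terms rise to the top; a topic model goes further by grouping co-occurring terms into shared topics.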

Entity Extraction: This seeks to locate mentions of pre-defined categories in a corpus of text, such as the names of persons, organizations and locations, expressions of time, quantities, monetary values and percentages.
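
Some entity types follow regular surface patterns and can be extracted with rules. The sketch below uses regular expressions for monetary values, percentages and ISO-style dates. The patterns are hypothetical and deliberately narrow; names of persons, organizations and locations generally require a trained named-entity recognition model rather than regular expressions.

```python
import re

# Hypothetical rule-based patterns for a few pattern-like entity types
PATTERNS = {
    "MONEY": r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s?(?:million|billion))?",
    "PERCENT": r"\d+(?:\.\d+)?%",
    "DATE": r"\b\d{4}-\d{2}-\d{2}\b",
}


def extract_entities(text):
    """Return (label, matched_text) pairs in order of appearance."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in re.finditer(pattern, text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]


text = "On 2023-04-01 revenue rose 12.5% to $4.2 million, while costs fell 3%."
print(extract_entities(text))
```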

Sentiment Analysis: Sentiment analysis systematically identifies, extracts, quantifies and studies affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, as well as to online and social media content.
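
The simplest form of sentiment analysis is lexicon-based scoring: each word carries a polarity, the polarities are summed, and a preceding negation flips the sign. The sketch below uses a tiny hypothetical lexicon; real systems rely on curated lexicons or trained models that handle context far better.

```python
import re

# Tiny hypothetical polarity lexicon and negation list
LEXICON = {"great": 2, "good": 1, "love": 2, "fast": 1,
           "bad": -1, "slow": -1, "terrible": -2, "broken": -2}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Sum word polarities, flipping the sign of a word preceded by a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, w in enumerate(words):
        polarity = LEXICON.get(w, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


for review in ["Great product, fast delivery!", "Not good, arrived broken.", "It arrived."]:
    s = sentiment_score(review)
    print(review, s, label(s))
```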

Deep Learning

Deep learning is the study of artificial neural networks with many layers, with applications in image processing and other complex tasks that simpler machine learning algorithms may not handle directly.

Neural Networks: These are algorithms that attempt to identify underlying relationships in a set of data using a process loosely modeled on the way the human brain operates. They can adapt to changing inputs without the need to redesign the output criteria.
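
The basic building block of a neural network is a single artificial neuron: a weighted sum of inputs passed through a nonlinear activation. The sketch below trains one sigmoid neuron with stochastic gradient descent on log-loss to learn logical OR, a toy problem chosen because one neuron can solve it. Deep networks stack many such units in layers and train them with backpropagation; the learning rate and epoch count here are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(samples, epochs=2000, lr=0.5, seed=0):
    """Train one sigmoid neuron with stochastic gradient descent on log-loss."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = out - target  # gradient of log-loss w.r.t. the weighted sum
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Logical OR: a linearly separable toy problem
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, b = train_neuron(data)
print([round(predict(w, b, x)) for x, _ in data])  # [0, 1, 1, 1]
```

Because the weights are learned from examples rather than hand-coded, retraining on new samples adjusts the neuron's behavior without redesigning its output rule.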

Semantic Learning: These models are used to develop intelligent systems that comprehend language. Semantic networks, in turn, are used to store and retrieve knowledge.
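
A semantic network stores knowledge as nodes connected by labeled relations. The minimal sketch below keeps (subject, relation, object) triples and answers queries by following "is_a" links, so facts attached to a general concept are inherited by more specific ones. The `SemanticNetwork` class, its relation names and the facts are hypothetical.

```python
from collections import defaultdict


class SemanticNetwork:
    """Minimal semantic network: nodes joined by labeled edges (triples)."""

    def __init__(self):
        self.edges = defaultdict(set)  # (subject, relation) -> {objects}

    def add(self, subject, relation, obj):
        self.edges[(subject, relation)].add(obj)

    def query(self, subject, relation):
        """Retrieve objects for (subject, relation), inheriting along 'is_a' links."""
        results, stack, seen = set(), [subject], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue  # guard against cycles in the is_a hierarchy
            seen.add(node)
            results |= self.edges.get((node, relation), set())
            stack.extend(self.edges.get((node, "is_a"), set()))
        return results


net = SemanticNetwork()
net.add("canary", "is_a", "bird")
net.add("bird", "is_a", "animal")
net.add("bird", "can", "fly")
net.add("animal", "can", "breathe")
net.add("canary", "color", "yellow")
print(net.query("canary", "can"))    # inherited from bird and animal
print(net.query("canary", "color"))  # stored directly on canary
```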

AI Bots: These are computer programs that conduct a conversation through speech or text. They are designed to convincingly simulate human behavior as a conversational partner and can use advanced AI techniques.

Visualization

Visualization amplifies cognition by aiding pattern detection and giving visual insight into large quantities of data. It helps us see data in context, analyze it and discover knowledge.

Interactive Graphs/Charts: These let users dynamically change filters and hierarchies and slice and dice the data to study it and gather insight.

Geographical or Geospatial: Geospatial visualization and analysis can represent many facets of geospatial data in a single diagram. In a retail setting, for example, a map can capture store locations, differences in market size by region, and region-specific pricing and compensation studies.

Networks and Time Series: Network graphs are used to study meaning and relationships within large bodies of contextual data, quantifying the relationships between entities of interest to a business. Time series graphs represent a KPI as a function of time.