We shall talk about Business Analytics here. We will spend a little time on each of these slides and make sure that you understand every bit of it, and we can pause the training in case there is any confusion. Nowadays we hear a lot of buzzwords like data mining, machine learning, artificial intelligence, robots and so on. I have seen many companies talking about robotic process automation too. So where do these come from? Companies list everything from statistical analysis to robotic process automation as offerings on their websites and boast about them. Let's not worry about what those different concepts are at this point. Let's first concentrate on what exactly exists in the market.
Let's understand what exists in the market, what the different areas of analytics and data visualization are, what data exists in each of those businesses, and why they matter. Let me give a small example. I worked in a very big retail organization called Target. I believe Target is the third or fourth largest retailer in the USA, and it recently entered Canada by taking over the store leases of a chain called Zellers. I shall try to give you my real-world perspective on what analytics is, and this covers a lot of the ground on this slide. Target, as the third or fourth biggest retail organization, has a humongous amount of data. But even if you take a retailer in the 35th or 36th position, or a healthcare company like Sanofi, a pharmaceutical company standing probably somewhere around the 40th or 50th position, they too have a humongous amount of information. When I met some of the business teams at Target and asked if we could do some analytics, data discovery and data management, they were quite surprised to learn that people were talking about their own products and business processes somewhere in Canada or somewhere in Jerusalem. They didn't even know that this data existed. For example, Target has a private brand called Archer Farms. This brand is spoken of in India, even though Target has never done retail business in India, so it was quite surprising for them to learn that data about them exists all over the world. Nowadays social media is so disruptive that almost every single human being is aware of it.
It wouldn't be surprising if, ten years from now, even people aged 90 or 95, on the verge of completing a century, are on social media. Given that data and social media are so pervasive and disruptive, it's quite important for us to look at the concept of data discovery and management, because data plays an important role across the entire gamut of the business process. We have not even reached analytics yet; we are just talking about data. So it is quite important for us to know how to acquire data, which is called data acquisition, and how to clean the data. You might have read in many books, journals, articles or news that about 75 to 80 percent of what we data scientists do is data management and discovery. Earlier, data management meant hitting the internal databases, and a lot of work was done there. But now social media and data have become so disruptive, and with Web 2.0 you have more data outside your organization than inside it. Since that external data is so disruptive, data management becomes very difficult: it is no longer 75 percent of the work; it is probably touching 90 percent for a data management consultant. You don't need to be an analytics consultant to be part of the data science acumen, or the 360-degree data science that I'm talking about. If you have the skills to write code and to perform data discovery, data acquisition, data extraction and management, I think you are the right fit for this kind of job.
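To make "how to clean the data" a little more concrete, here is a minimal Python sketch, assuming records pulled from several sources arrive as dictionaries with hypothetical `source` and `text` keys. It normalizes whitespace, drops empty records and removes case-insensitive duplicates; real cleaning pipelines do far more than this.

```python
import re


def clean_records(records):
    """Normalize whitespace, drop empty texts, and remove duplicate texts.

    Each record is a dict with hypothetical keys "source" and "text".
    The first occurrence of a duplicate text is kept.
    """
    seen = set()
    cleaned = []
    for rec in records:
        # Treat a missing or None text as empty, then collapse whitespace runs.
        text = re.sub(r"\s+", " ", rec.get("text") or "").strip()
        if not text:
            continue  # nothing to analyze in this record
        key = text.lower()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"source": rec.get("source", "unknown"), "text": text})
    return cleaned


raw = [
    {"source": "crm", "text": "  Archer Farms   coffee is great "},
    {"source": "twitter", "text": "archer farms coffee is great"},
    {"source": "twitter", "text": ""},
    {"source": "forum", "text": None},
    {"source": "forum", "text": "Delivery was late"},
]
print(clean_records(raw))
# [{'source': 'crm', 'text': 'Archer Farms coffee is great'}, {'source': 'forum', 'text': 'Delivery was late'}]
```

Keeping the first occurrence means the internal (CRM) copy wins over the social media duplicate here; which source should win is a business decision, not something the code can decide for you.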
We are talking about dashboards here. Data is so disruptive and present everywhere that it's quite important for us to combine all the data we have. Data in an organization is loosely coupled, which means people sitting in supply chain teams may not understand what is happening on social media. For example, if someone is tweeting bad things about your product, the supply chain team doesn't know about it, and they don't really care what is being said about the product; they only care whether the product is delivered as per the order. So it's quite important that all the teams are connected together, and this is why we talk about 360 degrees: so that there is no breakage, data leakage and so on. Everyone has to be aligned with the business process of the organization, and every team is part of the entire game that I'm talking about. So we need a dashboard that can connect all the dots in an organization. It should be interactive, so that someone can quickly view what is happening. For example, take an Amazon godown (warehouse), where goods are stored and then travel to another city. If people want to know what is happening, whether the truck is stuck somewhere, moving, or has been in an accident, the dashboard has to be interactive enough that they can quickly take an intelligent, well-informed decision.
We are talking about real-time analytics now. Suppose someone tweets, or writes a Facebook post, saying that they ordered a crib for their small baby just a couple of minutes ago and got an error message stating that the website is down. This could be someone in Namibia ordering a product to be delivered from the US. Between the people sitting in the US and the customer in Namibia there could be time-zone differences, so the backend team may not be watching when this happens. Without a real-time analytics algorithm that quickly alerts the backend team that there is a problem with the website and the customer is complaining about it, nobody would know. That is the kind of real-time analytics we are talking about.
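A minimal sketch of the alerting idea, under stated assumptions: posts arrive in time order as `(timestamp_in_seconds, text)` pairs, and a simple keyword match stands in for real complaint detection. The keyword list, the five-minute window and the threshold of three are hypothetical choices for illustration, not values from any real system.

```python
from collections import deque

# Hypothetical phrases that suggest customers are hitting a site failure.
OUTAGE_TERMS = ("website is down", "error message", "can't check out")


def outage_alerts(posts, window_seconds=300, threshold=3):
    """Yield (timestamp, count) when outage complaints within the window reach threshold.

    `posts` is an iterable of (timestamp_in_seconds, text) pairs in time order.
    """
    recent = deque()  # timestamps of matching complaints inside the window
    for ts, text in posts:
        lowered = text.lower()
        if not any(term in lowered for term in OUTAGE_TERMS):
            continue  # not a complaint about the site
        recent.append(ts)
        # Evict complaints older than the sliding window.
        while ts - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            yield ts, len(recent)


stream = [
    (0, "Just ordered a crib for my baby!"),
    (30, "Got an error message at checkout"),
    (90, "Website is down??"),
    (400, "website is down again"),
    (420, "Error message when paying"),
    (450, "The website is down, can't order"),
]
print(list(outage_alerts(stream)))  # [(450, 3)]
```

Using a sliding window rather than reacting to a single keyword hit keeps one stray complaint from paging the backend team, while a burst of similar complaints raises an alert within minutes.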
Now we have the concept of stakeholder 360. Since everyone is connected, data should flow freely among all the teams of an organization. It's not only data 360, the data discovery and management in the first green box on the slide; we are also talking about supplier 360, vendor 360 and customer 360. Your customer could be anywhere in the world, saying good or bad things about your product. Data is disruptive, so you need to be proactive, reactive and interactive on a real-time basis to understand what is happening, and that connectivity is quite important. As data scientists, I believe it's our primary role to be involved in that kind of analytics, which is important and consistent.
We are talking about consistent insights across the organization here. Let me give you an example of a product that was losing sales. At Target, we set up a Center of Excellence; the idea is that it monitors real-time analytics, gathers data across the organization and tries to connect the different dots. One team in the business monitored the product and discovered that it was losing sales over a period of time, but another team was not worried at all, because it never received those signals. If insights are not consistent across the organization, that's a problem. So how do we make insights consistent across the organization? You identify who the primary and secondary consumers of your insights are, and you make sure that you deliver the data and insights to them in the best possible manner.
Everything is data driven. Earlier, decisions were not taken based on data; people went by gut feeling. They knew that sales were on a declining trend, so they said, let's do something else, like start a new product, because the old product is not driving sales. It's no longer like that. We are now talking about data-driven, informed decision making, so everything depends on data. What makes that possible is the ability to churn through the data to the maximum extent, so that's very important.
You know, data never sleeps. As we all know, data is disruptive: it is extracted, derived and generated very actively. I have brought statistics from a company that continuously tracks this, called Domo; you can check the Wikipedia page on Domo to see what kind of business they do. A humongous amount of data is generated, and you can see some of the statistics right here. Take LinkedIn, for example: it is a professional network, and almost every day you see your network growing in size. I would say it's no longer just a professional network, since people also talk about personal stuff there, and the amount of information generated almost every second is humongous. There are newer social media platforms like Snapchat where people exchange digital content, and that content is one of the drivers that tells you how well your business is doing. If someone shares your business photos or product photos with a small caption below, and many comments flow in, that itself is an important data point for analyzing whether your business is doing well or not. That's why not just people like you and me but many businesses subscribe to digital content websites like Snapchat, Instagram and Tumblr, and use them to talk about and discover what is being said about their products. Twitter is another example, where many companies talk about their deals, coupons and so on. This is a great source of information from which data scientists can extract and derive a lot of insights. Name any website here, and a humongous amount of information is generated on it almost every minute, I would say every second, of the day.
Now, there is one gap (lacuna) that I believe we data scientists can definitely improve on over time: the interactivity between different digital devices and platforms. Something generated on Snapchat may not talk with LinkedIn. I know that some platforms, maybe Facebook and YouTube or Twitter and YouTube, are integrated, though I don't know much about that. But Skype, Snapchat, or Netflix, which is a video streaming service, may not be related much to the Weather Channel. So these digital sources are not really integrated among themselves, and the day is not far when we will see all these digital devices connected with each other; that would complete the concept of device 360. We haven't even completed device 360 and we have started talking about IoT and all that: one machine generates data, another machine receives and interprets it and then does some kind of analytics, which is often applied in healthcare. But I believe it may take some more time for device 360 to become a reality. There is something called machine-to-machine analytics that is coming, and some companies in India are working heavily on it, but that will take more time, and at the end of the day it will probably be complete analytics in itself.
We shall talk about Big Data now. I conduct a lot of webinars, and one question many people ask me is: how do you define Big Data? Is it just based on the volume of data being generated? Let me explain with an example. I have a laptop with 100 GB or maybe 1 terabyte (TB) of space. For me, data larger than 1 TB is Big Data, because it is too big for my laptop to accommodate. But that's not what Big Data is all about. Earlier I used to talk about only 4 V's; now we are talking about 8 V's. It's not just the volume of data flowing freely into your organization; it is also the value that you generate from the data. For example, there may be a huge number of tweets, but if none of them is valuable to your organization, if none of them talks about your business processes or how to transform them, they may not be important for you at all. Some organizations set a benchmark here: if less than about 60 to 70 percent of the data is of value to them, they treat it as having no value, and it may not be Big Data at all.
Another concept I'm talking about today is visualization: whether you can make some sense out of the data. Data may or may not be Big Data on that basis, and then there is variety. Remember I talked about digital content. Earlier we talked about text being transferred between devices, and now we are talking about transferring digital content. Probably after a few years there will be devices that transform data into some kind of signal that can be transmitted. So we are talking about signals as well as textual data, images, maps, blogs, binary large objects (BLOBs) and videos being transformed and transmitted across devices. If there is a platform that can manage this variety of information and process it in real time (remember the real-time analytics we discussed on the first slide), and make the business realize that this kind of data exists, then I believe that is Big Data. So data does not need to be so big that your machine cannot handle it. The reason I'm saying this is that we are now talking about cloud computing. The data is no longer hosted on your laptop or your local desktop; it is hosted on a cloud platform, and it is the headache of companies like Microsoft or Amazon to manage that humongous amount of data. They have lots of complex data warehouses, hardware and software to manage your data and the data of numerous other businesses. It's a big cloud platform.
We shall also talk about velocity: how quickly data is generated in real time, and what outlook it gives you for today. We are not only talking about today; we are also talking about yesterday, and there is an opportunity for you to be proactive about what could happen next. For example, suppose an air conditioner (AC) has been making some noise for the last week and its output, its throughput, is not sufficient. For some reason it starts giving you more warmth than it is supposed to. You can proactively think, or predict, that a few weeks from now this machine is going to die. But that is just your gut feeling. How much time it really takes to die is something that big data analytics can tell you.
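As a rough illustration of turning that gut feeling into an estimate, the sketch below fits a straight-line trend (ordinary least squares) to daily readings and extrapolates when it would cross a failure threshold. The readings (outlet air temperature in °C), the 22 °C threshold and the assumption that the degradation is linear are all hypothetical; real predictive maintenance typically uses richer models and many more signals.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_threshold(days, readings, failure_level):
    """Extrapolate the linear trend to estimate days left before failure_level.

    Returns None when the trend is flat or falling (no crossing predicted).
    A negative result means the trend line is already past the threshold.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (failure_level - intercept) / slope
    return crossing_day - days[-1]  # measured from the most recent reading


# Hypothetical daily outlet-air temperatures (°C) from an AC unit that is warming up.
days = [0, 1, 2, 3, 4, 5, 6]
outlet_temps = [12.0, 12.5, 13.0, 13.5, 14.0, 14.5, 15.0]
print(days_until_threshold(days, outlet_temps, failure_level=22.0))  # 14.0
```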
We will talk about viscosity: does the data really stick with you? That is, does it give you the value you intended it to give? Sometimes you apply analytics to some data, but it doesn't give you sufficient insights to act on. I will share an example a couple of slides down the line. Viscosity is all about whether the data can give you a sufficient basis for actionable decision making.
Then there is virality: whether your data can travel from one source to another destination, and how quickly. So it's not just velocity, how quickly data is generated within your own database; it is how far, in real time, a particular piece of data has traveled from its source to its destinations.
There are two metrics on this slide: (i) the business value that each of these buckets gives you, and (ii) the difficulty. There are many other metrics, but we will talk only about value and difficulty. You can see the 45° line running through the center of the chart, dividing it into four different areas, which we will call the D2 and P2 mechanism: Descriptive, Diagnostic, Predictive and Prescriptive. You can probably find the same thing in a different visualization on Wikipedia, Google or any other search engine, but Gartner was the first to come up with this idea. How do I differentiate between them? By the questions being asked. The first question is: what happened? That is what descriptive analytics is all about.
Say you are on the data science team, and your business comes to you and says: look, I have a problem; my sales are decreasing. They ask you what happened, because they have the data but don't know how to analyze it. So it is you who is going to drill down and see what really happened. You may build a basic dashboard, some pivot charts and so on: you ask them for some structured data, do some basic statistical dashboarding, and come up with answers about what happened. For example, you might come up with a high-level finding that some stores in Texas have lost sales; that's all you can say. That's basic descriptive analysis. You don't know what those stores in Texas are actually going through. You don't know whether the problem is with the customers or the sales representatives, or whether the store is hard to reach, perhaps situated on a hill where people can't easily walk in to make a purchase, or not available online. There could be any reason, but at a very high level, all you said is that the store in Texas, around the Bay Area, is losing sales. That's all you were able to tell.
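A descriptive "what happened?" step often boils down to aggregating and comparing. This minimal Python sketch assumes sales rows of the hypothetical form `(store_id, period, amount)` and reports which stores' sales fell between two periods, with the percentage change. It is the code equivalent of a pivot table, not a full dashboard, and the store IDs and figures are made up.

```python
from collections import defaultdict


def declining_stores(rows, before, after):
    """Return {store: percent_change} for stores whose sales fell between two periods.

    Each row is a hypothetical (store_id, period, amount) tuple.
    """
    totals = defaultdict(float)  # (store_id, period) -> total sales
    for store, period, amount in rows:
        totals[(store, period)] += amount
    result = {}
    for store in sorted({store for store, _, _ in rows}):
        prev, curr = totals[(store, before)], totals[(store, after)]
        if prev > 0 and curr < prev:
            result[store] = round(100 * (curr - prev) / prev, 1)
    return result


rows = [
    ("TX-Houston-01", "Q1", 120_000), ("TX-Houston-01", "Q2", 90_000),
    ("TX-Houston-02", "Q1", 110_000), ("TX-Houston-02", "Q2", 115_000),
    ("MN-Minneapolis-01", "Q1", 150_000), ("MN-Minneapolis-01", "Q2", 152_000),
]
print(declining_stores(rows, "Q1", "Q2"))  # {'TX-Houston-01': -25.0}
```

Note that this only says which store lost sales and by how much; it says nothing about why, which is exactly the limit of descriptive analysis.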
The next question is: why did it happen? This is diagnostic analysis, because you are trying to do a diagnosis. You double-click, because you now know there is a store in some area that is losing sales, and you ask why that happened. This is where you do some kind of comparison: there is another store in the same Texas Bay Area that is not losing sales, but this one is. So you do some competitive analysis of what happened and why it did or didn't happen at each store. There are some graphs that we will discuss further to help you understand this better.
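For the "why did it happen?" step, one simple starting point is to line up the losing store against a comparable peer store and rank metrics by how far apart they are. The metric names and values below are hypothetical, and a large gap only points to where to dig next; it does not prove a cause.

```python
def compare_stores(losing, peer):
    """Rank shared metrics by relative gap between a losing store and a peer store.

    Both arguments map a (hypothetical) metric name to a value. Returns tuples of
    (metric, losing_value, peer_value, relative_gap), largest absolute gap first.
    Metrics whose peer value is zero are skipped to avoid dividing by zero.
    """
    rows = []
    for metric in sorted(losing.keys() & peer.keys()):
        a, b = losing[metric], peer[metric]
        if b == 0:
            continue
        rows.append((metric, a, b, round((a - b) / b, 3)))
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


losing_store = {"footfall_per_day": 800, "avg_satisfaction": 2.9,
                "staff_on_shift": 12, "online_orders_per_day": 40}
peer_store = {"footfall_per_day": 850, "avg_satisfaction": 4.2,
              "staff_on_shift": 12, "online_orders_per_day": 45}
for row in compare_stores(losing_store, peer_store):
    print(row)
# first row printed: ('avg_satisfaction', 2.9, 4.2, -0.31)
```

In this made-up data, customer satisfaction stands out far more than footfall or staffing, which would send you to look at customer feedback for that store next.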
The third one is predictive analytics: what will happen if I just leave it like this? It means you have done enough groundwork in the first two analyses, which are analyses of historical data, to reach a conclusion about what will happen in the future if you do not take care of what happened in the past. That's predictive analytics. Say you were able to identify that some customers were dissatisfied with this particular store because the store manager was not good; I hate to say it, but he or she was using racist language. There is a real possibility of this; in fact, many such videos go viral on Facebook these days. Walmart is one example that came into the picture: in some stores, some of the store managers used racist words, and, sorry to say, slurs about religion as well. It was an unfortunate incident, but it was one of the reasons sales were on a downtrend: people stopped going to those stores. So it is important that we not only look at what happened in the past but also uncover what would happen in the future if sufficient measures are not taken. If you know that a particular store is doing this kind of thing, it should be stopped, because if you don't stop it, it could impact future sales. That's why you need some kind of predictive analytics. I just took one example to connect this, but there could be many dimensions to it.
And if you know that sales could improve over time when sufficient measures are taken, you also need to know how to make that happen; that's prescriptive analytics. Think of going to a doctor with knee pain. The doctor examines you and works out what is wrong, which is the diagnosis, just as diagnostic analytics is about diagnosis. The doctor may tell you how the condition is likely to develop, which is the prognosis, and that is closer to prediction. Then the doctor prescribes something. Prescriptive analytics is that prescription: it is about fixing the problem so that it doesn't happen again and you see an improvement, and about how to make that happen and realize it. The doctor gives you a medicine, and over a period of time your knee pain is gone. That is the kind of foresight we are talking about.
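Prescriptive analytics is often framed as an optimization: given candidate actions, pick the ones that give the best expected result within a constraint. This sketch brute-forces every subset of a few hypothetical actions, assuming each action's cost and estimated sales uplift are already known (for example, from the predictive step) and that uplifts simply add up, which is a simplification.

```python
from itertools import combinations


def best_plan(actions, budget):
    """Choose the subset of actions with the highest estimated uplift within budget.

    `actions` maps a hypothetical action name to (cost, estimated_uplift).
    Assumes uplifts are additive and tries every subset, so it is only
    suitable for a handful of candidate actions.
    """
    best, best_uplift = (), 0
    names = sorted(actions)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            cost = sum(actions[name][0] for name in combo)
            uplift = sum(actions[name][1] for name in combo)
            if cost <= budget and uplift > best_uplift:
                best, best_uplift = combo, uplift
    return best, best_uplift


actions = {
    "replace_store_manager": (20_000, 30_000),
    "staff_training": (8_000, 12_000),
    "enable_online_orders": (25_000, 18_000),
    "local_ad_campaign": (15_000, 9_000),
}
print(best_plan(actions, budget=35_000))
# (('replace_store_manager', 'staff_training'), 42000)
```

Brute force keeps the idea visible; with dozens of actions you would switch to a proper solver, since the number of subsets doubles with every action added.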
So if I break this entire Gartner analytics continuum into three areas, I would say: hindsight, which is your historical data; insight, which is where you double-click to see what is really happening and uncover the hidden patterns; and foresight, where you prescribe the measures to take. It is a great transformation from information to optimization. Why do I say information to optimization? Because optimization is only possible after you have understood enough of the data, so it comes after some time. First you gain experience with the data: you see that something bad has happened, and that is what information is all about. You do some descriptive analytics, some diagnostic analytics and some predictive analytics, and then you are able to say, “OK, this is what I need to do to optimize it.”
So, in the overall picture of big data analytics, I would say that any data scientist's journey falls into three different buckets. You can categorize each bucket further, drill down into it, or double-click each of them. I hate to call the entire gamut big data analytics, because it's not just analytics; it also covers the other two buckets. The first bucket is data, which is the most important one, and which we covered in the first few slides: how disruptive data is and so on. The next is analytics and the role it plays, which is what I believe the next few slides talk about. The third is visualization, and how important it is to visualize interactively and in real time to know what is happening, as in the Amazon trucks example I talked about. I will give you some more examples if time permits.
Okay, so we are also talking about a great deal of migration to the cloud. There are many people making a lot of money helping companies not only host their data but also do real-time analytics, in-memory computing and a lot of other things whose terms you will uncover slowly as you enter the data science field. You know, the formula for existence right now is not just E = mc². Energy is transformed from one form to another; in big data analytics, data is transformed into insights, so in that sense it is still E = mc², except that what is transformed is no longer energy but data. But we are also talking about E = M²C, which is about the efficiency I gain in migrating to the cloud. It is also important how quickly we analyze and how quickly I transform data from unstructured to structured, and many other things come into the picture too. We have a humongous amount of information, and it's time for us to make sure that the data is churned quickly, in real time, and efficiently; that's the reason we are talking about cloud here.
So, what are the different verticals? These are just some of them; I didn't mention a lot of other verticals. I remember connecting with a couple of detectives and the FBI, believe it or not! I don't quite know why I did it, but I was thinking that a lot of analytics could be done in forensics as well. So I connected with a couple of FBI detectives, some of them retired. What they asked me was: what kind of analytics can you do? What role can big data analytics play in forensic analytics? I was able to state some facts; cyberstalking, for example, is one of those areas.
So I remember connecting with the detectives, and when they asked what kind of analytics could play a role, cyberstalking was one example. Many people, women and men alike, are stalked on Facebook, receiving unwanted messages and irrelevant digital content. How do I find out who the stalker is, and how can I be pre-emptive and prevent such attacks? Analytics can definitely play a major role there. The verticals you can see on this slide are mostly on the mature, almost saturated side, but a lot of analytics and data discovery is still being done in them. Media analytics is not as mature when you compare it to retail and CPG, banking and finance, or telecommunications; even healthcare is not mature, and neither is reinsurance. But insurance, retail and CPG, banking and finance, and telecommunications are some of the verticals where analytics is really mature. Mature as in saturated, but of course there is huge potential to improve existing models, understand your data better, and fine-tune it for informed decision making.
We'll talk about one case in healthcare and probably one in media; the media case will be more contextual in nature, but for healthcare I will take you through a slide. I will pause here for a minute to take questions if you have any, and then we will go ahead. Yes, I am ready for any questions. Someone was asking, “Is there an HR vertical?” Yes, there definitely is an HR vertical. I will take that question first, before anyone types in another one. As I said, I worked at Target, and people were leaving the company, which was creating a lot of negative impact. We are also talking about the importance of women joining analytics companies and so on. So what are the possibilities in the HR vertical? There is a company called Glassdoor that collects a lot of data about employee-related queries and concerns. A huge amount of data is also generated internally in the organization. For example, when you leave a company, you are asked to take an exit interview, which has a big poll asking why you are satisfied or dissatisfied and why you left the company. The response rate is erratic most of the time: if you are getting a six-figure salary and are more attracted by the salary of the new job than by what your existing company gives you, you don't really write that in the response. You sometimes write something that is not really relevant; I am not generalizing.
But many people do take the survey and give responses in the exit interview, and that itself is a great opportunity for HR analytics and the HR business process to act upon. Glassdoor, as I said, is a platform where a lot is discussed about employee concerns, salary concerns, company concerns, and transformations from one area of business to another; mergers and acquisitions are one example. I know some companies that went through mergers and acquisitions, and when they merged into another company, many employees left. What kind of analytics could you do, as part of HR, for employees like that? What impact can mergers have? That's just one example, but we can talk about others as well.
Someone asks: “What is the background required for analytics? I have done honors in economics and after that worked in MIS reporting for one year.” You are from an honors economics background, and we have many streams of analytics where that matters. Time series forecasting, a core area of econometrics, is one of the areas where analytics is heavily used. I will double-click on each of them later, but since you mention MIS reporting, many econometrics and analytics tools involve a lot of reporting and visualization, which is what you already do. So you have a great future there, and there is no problem. Companies are short of people like you; there is a great need, I believe.
Next question: “When you say data is disruptive, what exactly does it mean?” See, there is a huge gap between the amount of analytics being done on data and the amount of data being generated; that's why I said data is disruptive. For example, the phrase “real-time analytics” has three words, and by the time I finish saying it, more real-time data has been generated. So it's important that we have a system that is real-time in nature and keeps the time lag small: it should be able to collect the data and derive analytics that can be acted upon in real time. Data is disruptive because it is so fast and also volatile. When I say volatile: someone posts a bad comment on Facebook and it is read by many people online. By the time you want to remove it, it has already been seen by many people, and you can't really do much about it. So it is disruptive in that sense.
Next: “What kind of data would you capture to understand the case of the store manager who used racist words?” There is unstructured data being monitored on websites like Yelp, and even on Facebook and Twitter.
There is also a huge number of in-store surveys, and you would look at that kind of unstructured data too. You would use NLP, for example with NLTK, to convert it into insights that tell you whether a racist word was used or not. Next question: explain the concept of velocity. Velocity, as you know, is the speed at which data is generated, and not only that, it is also about how quickly your analytics can act upon it. Suppose someone says something bad about a product. For example, a few years back there was the widely reported story of Target's analytics working out that a teenager was pregnant before her family knew. That story went viral; it is a security concern and a privacy concern. How quickly did that data go viral? How quickly was it captured by many people? And it was not acted upon in time. So velocity is about how quickly data is not only transmitted but also acted upon (or not); that is the concept of velocity. Another question: what field should we choose if we want to do an MS in business analytics? If you are a business analyst, you are the key person in business analytics. You are the one who goes to the client first; data scientists don't really go there. You are the one who connects different businesses and different dots, and you speak as the domain expert. The data scientists depend completely on what you bring from the client, so you are an important person, I believe. What kind of salary can a business analyst make a year? It depends on many parameters, including the vertical. In healthcare, a business analyst makes more than the same business analyst would in telecom. There are many reasons for this; healthcare matters more in every person's day-to-day life.
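The speaker mentions using NLP (for example NLTK) to decide whether a review contains racist language. As a hedged starting point, the sketch below is a simple keyword baseline in plain Python rather than an NLTK pipeline: it tokenizes a review and checks it against a lexicon. The terms "slur_a" and "slur_b" are hypothetical placeholders for a curated word list, and a real system would add context handling or a trained classifier on top.

```python
import re

# Hypothetical placeholder lexicon: "slur_a"/"slur_b" stand in for a curated list of offensive terms.
FLAGGED_TERMS = {"slur_a", "slur_b"}
# Words suggesting the comment is about store staff (illustrative, not exhaustive).
STAFF_TERMS = {"manager", "staff", "employee", "cashier"}


def flag_review(text: str) -> dict:
    """Keyword baseline: flag a review if it contains any term from the lexicon."""
    tokens = set(re.findall(r"[a-z_']+", text.lower()))
    hits = tokens & FLAGGED_TERMS
    return {
        "flagged": bool(hits),
        "terms": sorted(hits),
        "mentions_staff": bool(tokens & STAFF_TERMS),
    }


reviews = [
    "The store manager called me slur_a at the counter.",
    "Great prices, friendly cashier.",
]
for review in reviews:
    print(flag_review(review))
```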
I believe a business analyst in healthcare probably earns more, but there are many salary surveys available. How do you become a data scientist? If you are working in and out on data, I would say you are a data scientist, no doubt about it. You don't need to be a statistical analyst to be a data scientist; you already are one. What is the scope of a business analyst in various fields, and does it serve well? Definitely. As a business analyst, you are the hero or the heroine of this entire continuum. You are the one who converts the business problem into a statistical hypothesis, and it is the data scientist who comes back and answers it by doing statistical analysis. How do you identify junk data and weed it out? Many tools are already available that do this, though I would not say they are 100% accurate. DataFlux Studio from SAS is one of them; it knows how to clean data and identify junk data. We have also written many business rules: as a data scientist, I have written many contextual rules that are business oriented. Let me give you an example. One rule concerned the search term "Target India". When I search for "Target India", I obviously want all the comments about Target Corporation doing business in India. But the search also returned many results about Osama bin Laden targeting India. That is junk data for me, so I wrote some business rules, some contextual rules, that deleted those results. To the participant who is learning Python: you are on the right path to start a career in data science. Since you are from a finance background and are pursuing Python, there are many streams, such as machine learning, that you can definitely take up. You can also look into building a complete data acquisition platform using Python.
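A contextual rule like the "Target India" one can be sketched with regular expressions. This is an illustration under stated assumptions, not the speaker's actual rule set: the word lists are hypothetical, and the key trick is matching "Target" case-sensitively so that the verb "targeting" does not count as the brand.

```python
import re

# Hypothetical contextual rules, for illustration only (not the speaker's actual rule set).
MARKET = re.compile(r"\bindia\b", re.IGNORECASE)
# Case-sensitive on purpose: "Target" the brand, not "target"/"targeting" the verb.
BRAND = re.compile(r"\bTarget\b")
JUNK_CONTEXT = re.compile(r"\b(attack|terror\w*|militants?|missiles?|bin laden)\b", re.IGNORECASE)
BUSINESS_CONTEXT = re.compile(
    r"\b(stores?|retail|office|brand|corporation|business|hiring|opens?|launch\w*)\b",
    re.IGNORECASE,
)


def is_relevant(post: str) -> bool:
    """Keep posts about Target Corporation's business in India; drop junk matches."""
    if not MARKET.search(post) or not BRAND.search(post):
        return False
    if JUNK_CONTEXT.search(post):
        return False
    return bool(BUSINESS_CONTEXT.search(post))


posts = [
    "Target opens a new tech office in Bengaluru, India",
    "Osama bin Laden targeting India, report says",
    "Militants target India's border posts",
]
clean = [p for p in posts if is_relevant(p)]
print(clean)  # ['Target opens a new tech office in Bengaluru, India']
```

Requiring a positive business-context word, on top of excluding junk-context words, trades some recall for precision; which side to favor is a business decision.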
We are coming up with a new course, and I will explain what kind of relevance it has in finance.
How do I become a data scientist?
You are already a data scientist if you are dealing with data; you don't need to be a statistician for that. We will also talk about this question in the career path section.
I will not spend much time here; this is something we already covered. So what is the business value, and what is the capability maturity? We said we are talking about a data discovery engine: how can you create a data acquisition engine? How can you use artificial intelligence and machine learning to churn through the data and fine-tune it, and then how can you visualize it? I'll mention one small thing here before we move on to the next slide. My mentor always used to say that unless and until you sleep with the data, you don't make the right babies. So it is important to spend as much time as you can on data discovery before you do the other stuff. People who are already data scientists and already spend enough time on data acquisition are very fortunate.
I'll take just one minute on a case study on capital markets analysis. I picked a specific case; this is just an approach to show what kind of work you can do. Suppose you are investing in a company and want to know whether it is doing well. You look at websites like Bloomberg, check the market capitalization, the rate of return and so on; participants from a finance background will understand what I am talking about. But in capital markets, statistical analysis, artificial intelligence and machine learning play a major role, because a lot of unstructured data is available that structured databases like Bloomberg and Fortune finder can't really give you.
Here is an example. On this slide I quoted a comment made by a domain expert, the COO of the company, who said that the operating margin for Microsoft increased by 10% during quarter one, ending 2012, due to increased sales of the Microsoft Xbox. If you are interested in investing in Microsoft and you are only looking at structured data, you may not be able to decide whether to put your money into Microsoft. But if you see that Microsoft has done very well and its operating margin increased by 10%, you would immediately want to invest. So we, as data scientists, did that unstructured data analysis, and I was able to tell my audience that yes, you can invest in Microsoft, because it did very well on operating margin.
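Pulling a figure such as "operating margin increased by 10%" out of free text is a basic information-extraction task. The sketch below uses one regular expression; the pattern, the function name and the list of direction words are assumptions that cover only this style of sentence, and real earnings commentary would need a much richer extractor.

```python
import re
from typing import Optional

# Illustrative pattern; real filings and transcripts use far more varied phrasing.
MARGIN_PATTERN = re.compile(
    r"[Oo]perating margin(?: for (?P<company>[A-Z][\w.]*))?\s+"
    r"(?P<direction>increased|rose|grew|decreased|fell|declined)\s+by\s+"
    r"(?P<value>\d+(?:\.\d+)?)\s*%"
)
DOWN_WORDS = {"decreased", "fell", "declined"}


def extract_margin_change(text: str) -> Optional[dict]:
    """Return the company (if named) and the signed % change in operating margin, or None."""
    match = MARGIN_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group("value"))
    if match.group("direction") in DOWN_WORDS:
        value = -value
    return {"company": match.group("company"), "change_pct": value}


comment = ("He said that the operating margin for Microsoft increased by 10% "
           "during quarter one, ending 2012, due to increased Xbox sales.")
print(extract_margin_change(comment))  # {'company': 'Microsoft', 'change_pct': 10.0}
```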
In retail you can do a lot of work on product intelligence; I'm trying to quickly give a perspective on each area. In product intelligence: how are your products doing? Is there a declining trend or a rising trend? Can you do similar-product analysis? What questions are customers asking? Can you do some kind of new product development? Which product features matter most before someone makes an informed purchase? Can I use that as a recommendation for a feature scorecard analysis, or for new product development? Competitive intelligence: why are my products going down? Why are my competitors' products going down? Why is my product being added to wishlists more? Why is it in more demand than my competitors' product? How can I understand my competitors' customers compared to my own customers? That comes under customer intelligence but also falls under competitive intelligence. What is my competitors' strategy on social media, and what marketing instruments do they use, such as deals and coupons? What are the Twitter trends on deals, coupons and digital content? What kind of video did they make on YouTube, and why did it attract more viewers than my video? How can you analyze deals further? Merchandising analytics covers pricing, promotions, vendors and so on; this is more core retail analytics.
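For the declining or rising trend question in product intelligence, one simple approach is to fit a least-squares line to each product's sales and look at the sign of the slope. The product names, weekly figures and the tolerance of 0.5 units per period below are hypothetical.

```python
from typing import Sequence


def trend_slope(sales: Sequence[float]) -> float:
    """Ordinary least-squares slope of sales against period index 0, 1, 2, ..."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(sales) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(sales))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def classify_trend(sales: Sequence[float], tolerance: float = 0.5) -> str:
    """Label a series rising/declining/flat; the tolerance (units per period) is an assumption."""
    slope = trend_slope(sales)
    if slope > tolerance:
        return "rising"
    if slope < -tolerance:
        return "declining"
    return "flat"


weekly_units = {"product_a": [10, 12, 14, 16], "product_b": [20, 18, 16], "product_c": [5, 5, 5]}
for product, units in weekly_units.items():
    print(product, classify_trend(units))
```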
So you can do a lot of store segmentation, marketing mix modeling and store forecasting modeling. Some of these models are already available in the market.
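Store forecasting models range from simple to very sophisticated. As one minimal example of a time series forecasting model, here is simple exponential smoothing, where each new level is a weighted average of the latest actual value and the previous level. The sales figures and the smoothing factor alpha = 0.5 are illustrative assumptions.

```python
from typing import Sequence


def simple_exponential_smoothing(history: Sequence[float], alpha: float) -> float:
    """One-step-ahead forecast: level = alpha * actual + (1 - alpha) * previous level."""
    if not history:
        raise ValueError("history must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]  # initialise the level with the first observation
    for actual in history[1:]:
        level = alpha * actual + (1 - alpha) * level
    return level


weekly_store_sales = [10, 20, 30]  # hypothetical units sold per week
print(simple_exponential_smoothing(weekly_store_sales, alpha=0.5))  # 22.5
```

A larger alpha reacts faster to recent weeks; a smaller one smooths out noise. Trend and seasonality need richer models, such as Holt-Winters.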
Customer intelligence is a specific area that I double-clicked on, to show what kind of analytics we can do there. Take customer segmentation: these days we talk about personalization, big data personalization, which means you are able to identify each and every customer uniquely. No two customers are the same, right? If you target hundreds of thousands of customers with one single product, the return on investment is going to be very low, a small percentage. If you can segment customers finely, even down to identifying one single customer and customizing the product specifically for that person, that is what customer segmentation is all about. Identity resolution mapping means recognizing one customer across different platforms and social media channels. Your customer stopped shopping at your store and went to a competitor's store: why? What were the reasons? That is retail intelligence. Purchase confidence estimation means analyzing how confident your customer is after the purchase. Sentiment analysis asks why your customer is satisfied or dissatisfied, and how viral that has become, which is social listening. Demand estimation asks how you can estimate demand using some kind of time series forecasting model. I don't want to touch upon all of them.
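Customer segmentation is often started with clustering. The sketch below is plain k-means (Lloyd's algorithm) on two hypothetical features, annual spend and monthly visits; it is one common technique, not necessarily the one used in the speaker's projects, and the naive initialization is a simplification.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, max_iter: int = 20) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns a cluster label per point."""
    # Naive initialisation with the first k points; production code would use k-means++.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each customer joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


# Hypothetical customers: (annual spend in thousands, store visits per month).
customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 9), (9, 8), (8.5, 9.5)]
print(kmeans(customers, k=2))  # [0, 0, 0, 1, 1, 1]
```

In practice you would scale the features first, since k-means uses raw Euclidean distance, and try several values of k.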
Identity resolution, as I said, means identifying the same customer across Facebook, Pinterest, Snapchat and so on, so that you can combine all of them into one single platform. Then you know that the same customer who shops in your store is talking badly about your product on Facebook. How can you do that? Unless a customer comes to your Facebook page and mentions your product on your own social media page, you don't know what they are saying. With identity resolution you can go to their posts and analyze them, unless the posts are private.
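Identity resolution can be approximated by linking profiles that share a normalized identifier. This hedged sketch uses union-find over email addresses and phone numbers; the record fields, the 10-digit phone rule and the sample profiles are assumptions, and real systems also use fuzzy matching on names, handles and behavior, subject to privacy rules.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def normalize_email(email: Optional[str]) -> Optional[str]:
    return email.strip().lower() if email else None


def normalize_phone(phone: Optional[str]) -> Optional[str]:
    # Keep digits only and compare the last 10 (assumes 10-digit national numbers).
    digits = "".join(ch for ch in phone if ch.isdigit()) if phone else ""
    return digits[-10:] or None


def resolve_identities(records: List[Dict[str, str]]) -> List[List[str]]:
    """Group profiles that share a normalised email or phone number (union-find)."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen: Dict[tuple, int] = {}
    for i, record in enumerate(records):
        keys = [("email", normalize_email(record.get("email"))),
                ("phone", normalize_phone(record.get("phone")))]
        for key in keys:
            if key[1] is None:
                continue
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # merge the two groups
            else:
                first_seen[key] = i

    groups: Dict[int, List[str]] = defaultdict(list)
    for i, record in enumerate(records):
        groups[find(i)].append(record["source"])
    return list(groups.values())


profiles = [
    {"source": "store_loyalty", "email": "Priya@Example.com", "phone": "+91 98765 43210"},
    {"source": "facebook", "email": "priya@example.com"},
    {"source": "pinterest", "phone": "9876543210"},
    {"source": "snapchat", "email": "someone.else@example.com"},
]
print(resolve_identities(profiles))
```

Union-find matters here because matches are transitive: the Pinterest profile shares only a phone number with the loyalty record, yet it still ends up in the same group as the Facebook profile.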
Suppose someone posts negatively on your Facebook page. As a business, you can see the customer here saying, "we won't be spending any more money." That itself is a negative sentiment, and it is going viral. One gentleman posted something on Facebook; imagine he has 300 Facebook followers or friends, and it can spread quickly. People are so connected that a post like that gets replicated and goes viral. This video has had many views, which means it is not only the people writing here and their followers: everyone who viewed or shared it also becomes an important data point for you to measure how far the post has spread and how you might contain it.
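The comment "we won't be spending any more money" contains no obviously negative word, which is why negation handling matters in sentiment analysis. Below is a minimal lexicon-based scorer that flips the polarity of words appearing shortly after a negation. The word lists and the three-token window are illustrative assumptions; "spending" is deliberately treated as a positive purchase-intent word so that its negation reads as negative.

```python
import re

# Tiny illustrative lexicons; real sentiment models use far larger, validated resources.
POSITIVE = {"love", "great", "happy", "recommend", "spending", "buying", "return"}
NEGATIVE = {"bad", "terrible", "rude", "worst", "disappointed"}
NEGATIONS = {"not", "no", "never", "won't", "don't", "can't", "isn't", "wasn't"}
NEGATION_WINDOW = 3  # how many following tokens a negation word flips (an assumption)


def sentiment_score(text: str) -> int:
    """Sum of word polarities, flipping polarity for words shortly after a negation."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    score = 0
    remaining = 0
    for token in re.findall(r"[a-z']+", text):
        if token in NEGATIONS:
            remaining = NEGATION_WINDOW
            continue
        polarity = 1 if token in POSITIVE else -1 if token in NEGATIVE else 0
        if remaining > 0:
            polarity = -polarity
            remaining -= 1
        score += polarity
    return score


def label(text: str) -> str:
    score = sentiment_score(text)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


print(label("We won't be spending any more money."))  # negative
```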
Now media analytics: what kind of work can you do there? Take piracy analysis: even before a movie is pirated, you may be able to conclude that it has a high potential to be pirated, and act to stop that. How can you monetize the content, and how can you improve that monetization? How can you predict customer churn? How can you predict how audience interests change over time? Interests do change; for example, The Big Bang Theory went from a very good sitcom to an average sitcom over time.
In healthcare there are also many cognitive computing systems available. What kind of analytics can you bring to doctors and other medical practitioners? One example: suppose a drug is discovered somewhere in Australia for a cardiovascular condition such as myocardial infarction (a heart attack) or atrial fibrillation. How can that drug help a doctor in a remote area in West Bengal or Andhra Pradesh? Who tells that doctor that such a drug is available in the market and could help patients who are, for example, already suffering from diabetes? You need a very quick analytic system that can act upon that; I'm talking about clinical analytics, clinical trials and so on.
I won't double-click on this, but let me spend a few seconds on it. This is a typical methodology that data scientists working in healthcare analytics can follow. The example on the slide is De Quervain's tenosynovitis, an inflammation of the tendons on the thumb side of the wrist, and the slide shows the methodology we follow to diagnose that particular condition. Suppose you go to a doctor and say you have a fever; that fever could be a symptom of any of 10,000 diseases, and we don't know which one it is. So how can we create an intelligent analytic system that helps doctors and healthcare practitioners identify the right disease and the right drug?
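One very simple way to show why a single symptom is not enough is to rank candidate conditions by the overlap between the reported symptoms and each condition's symptom profile, here using Jaccard similarity. The conditions and profiles below are hypothetical placeholders, not real medical data, and this is only a toy stand-in for the kind of intelligent system described above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base with placeholder conditions; NOT real medical data.
PROFILES: Dict[str, Set[str]] = {
    "condition_A": {"fever", "cough", "fatigue"},
    "condition_B": {"fever", "rash", "joint pain"},
    "condition_C": {"fever", "wrist pain", "swelling"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_conditions(symptoms: Set[str],
                    profiles: Dict[str, Set[str]] = PROFILES) -> List[Tuple[str, float]]:
    """Rank candidate conditions by Jaccard similarity between reported and profiled symptoms."""
    scores = {name: jaccard(symptoms, profile) for name, profile in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Fever alone is ambiguous: every candidate scores the same.
print(rank_conditions({"fever"}))
# Adding a second symptom separates the candidates.
print(rank_conditions({"fever", "rash"})[0])
```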
Pharmacovigilance is another area; I don't want to spend much time on it, but the idea is the same. If a drug is administered to a patient, the doctor should know what other drugs the same patient has taken in the last 7 days. Suppose the patient is already taking aspirin, for example; I don't know if any of you know this. Aspirin acts as a blood thinner (technically, an antiplatelet agent), so if aspirin is already in the patient's system, some injections carry an increased risk of bleeding. How many doctors keep track of that? It is quite important; I'm just giving an example. How one symptom can transform into another, what side effects may occur if a drug is administered, and so on: a lot of analytics is being applied there.
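The 7-day lookback the speaker describes can be sketched as a check of every drug given within the window against an interaction table. The single table entry (aspirin with warfarin, a combination known to raise bleeding risk) is there only for illustration; it is not a clinical reference, and the function name and date handling are assumptions.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Illustrative interaction table for demonstration only; NOT a clinical reference.
INTERACTIONS = {
    frozenset({"aspirin", "warfarin"}): "increased bleeding risk",
}


def check_new_drug(new_drug: str, history: List[Tuple[str, date]], today: date,
                   lookback_days: int = 7) -> List[str]:
    """Flag known interactions between a new drug and drugs given within the lookback window."""
    cutoff = today - timedelta(days=lookback_days)
    alerts = []
    for drug, given_on in history:
        if given_on < cutoff:
            continue  # outside the lookback window
        reason = INTERACTIONS.get(frozenset({drug.lower(), new_drug.lower()}))
        if reason:
            alerts.append(f"{new_drug} + {drug} (given {given_on.isoformat()}): {reason}")
    return alerts


patient_history = [("Aspirin", date(2019, 2, 20)), ("Paracetamol", date(2019, 2, 1))]
print(check_new_drug("Warfarin", patient_history, today=date(2019, 2, 25)))
```

Using a frozenset as the key makes the lookup order-independent, so "aspirin then warfarin" and "warfarin then aspirin" hit the same entry.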
Now I will take a one-minute pause, and after it we will talk about the career path. Even if you already know that you are a data scientist, I think this career path will help you choose the right stream and will probably answer some of the questions I have not covered in the first few slides. So let's double-click on this and see what career paths you have and what we at GreyCampus are offering. There is a mandatory course at GreyCampus for everyone. If you are not yet in data analytics or data science, taking this mandatory course makes the transition from your existing career path to the new one quite easy. As I said, this mandatory course is module 01, runs for four weeks and is called Data Science Foundation. After that you have two streams, one for programmers and one for non-programmers. There is a small discrepancy on the slide: the right side is for non-programmers and the left side is for programmers, because machine learning and related topics assume some programming background. For example, one participant said she did a Python course on Coursera; she would obviously take the left-side stream, because after the Data Science Foundation she should take the next step, which is machine learning for beginners and advanced machine learning. So the left side is for programmers and the right side is for non-programmers. Module 2, again four weeks, is mostly on time series, classification techniques, regression and so on, focusing on statistical techniques, with a bit of Tableau as a visualization platform. You can select any one of them or build a customized program.
The advisors at GreyCampus will help you identify the right combination. The right side, as I said, is more for non-programmers, and you can select any of its courses. Tableau is a data visualization tool and is quite easy to use, which is why non-programmers can take it, though programmers can take it too. Once we get to data visualization, we will also have an introduction to bootstrapping, bagging and boosting. You don't need any programming knowledge to understand these, and there will be a demo as well. There is a 2-day demo given by the GreyCampus team, which will help you learn more about each of them. There will also be domain-specific content, for which you don't need to be a programmer either; we have chosen four verticals to help you understand where you fit in. You can select one or all of them. Similarly, the left side is exclusively for programmers. Machine learning for beginners covers supervised and unsupervised learning, structured and unstructured data and so on. You will learn about neural networks, kernel estimation, singular value decomposition (SVD) and support vector machines (SVMs), plus some deep learning, NLP and NLTK. Again, you can select one of them or choose a couple.
While you are taking a course, there is one small thing I want to add. I don't know if you have read the news: some machine learning tools are being built with Sanskrit as the de facto language. So it is not restricted to English; I believe there are around 25 to 28 languages in which the majority of machine learning software is being written. Our current president, Ram Nath Kovind, has also made this point: he said it would be suitable to develop some machine learning software in India with Sanskrit as the de facto language.
Attendee- Great, so that opens up jobs for Sanskrit scholars and anyone interested in Sanskrit. Surya asks: is data science helpful for someone with a mechanical engineering background?
Trainer- Take Jaguar, for example, a company I think was taken over by Tata. They have opened a big center of excellence in Jamshedpur, I believe, though I'm not really sure of the location, and they have a data science team that is growing and wants only mechanical engineers, because they are the ones who will convert domain knowledge into machine learning software. So in the right roles, mechanical engineers are definitely needed. That is one of the streams I can think of.
Attendee- Great, now that you mention it, I can think of examples too. I have worked in the aerospace industry, and I know for sure that Boeing has data science teams, and so does the air force. Right? So I guess there is ample opportunity everywhere.
Trainer- Yes, aerospace is also coming up. A lot of analytics is being done in aerospace as well, in terms of logistics and how good the connectivity is.
Trainer- Someone is asking about the opportunities in the finance industry.
Trainer- I will answer that, and then we will call it a day. In the finance industry, risk mitigation, fraud intelligence and the capital markets example I described are some of the examples I can think of. There are a huge number of opportunities in finance; every bank is now developing its own anti-money laundering software in-house. That is one area I can think of, and it comes under risk again: how volatile is the risk, and how can you mitigate it? So those are two or three examples.
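As a small illustration of the kind of rule found in anti-money laundering systems, the sketch below flags "structuring": several cash deposits just under a reporting threshold within a short window. The threshold of 10,000, the 90% band, the count of three and the 7-day window are hypothetical parameters, and real AML software combines many such rules with statistical models.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Iterable, Set, Tuple


def flag_structuring(deposits: Iterable[Tuple[str, float, date]], threshold: float = 10_000,
                     band: float = 0.9, min_count: int = 3, window_days: int = 7) -> Set[str]:
    """Flag accounts with several cash deposits just below a reporting threshold in a short window."""
    near_threshold = defaultdict(list)
    for account, amount, day in deposits:
        if band * threshold <= amount < threshold:
            near_threshold[account].append(day)

    flagged = set()
    for account, days in near_threshold.items():
        days.sort()
        start = 0
        for end in range(len(days)):
            # Shrink the window from the left until it spans at most window_days.
            while days[end] - days[start] > timedelta(days=window_days):
                start += 1
            if end - start + 1 >= min_count:
                flagged.add(account)
                break
    return flagged


sample = [
    ("ACC-1", 9_500, date(2019, 1, 1)), ("ACC-1", 9_800, date(2019, 1, 3)),
    ("ACC-1", 9_900, date(2019, 1, 5)),
    ("ACC-2", 9_500, date(2019, 1, 1)), ("ACC-2", 9_600, date(2019, 1, 20)),
    ("ACC-3", 15_000, date(2019, 1, 2)),
]
print(flag_structuring(sample))  # {'ACC-1'}
```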