Hello, ladies and gentlemen! Welcome to our live discussion about AI, risk management, and insurance. Today we are interviewing Neal Silbert. Neal is Vice President and General Manager of Insurance at Data Robot, an inventor and category leader in automated machine learning. Data Robot is used very heavily in the banking and insurance industries, and its founders came from the insurance industry.
Boris: So, Neal, can you briefly tell us the story of Data Robot?
Neal: Sure. Data Robot was founded in 2012 by two executives from Travelers Insurance. They left Travelers, and they had been competing very heavily in Kaggle competitions. Kaggle is a worldwide data science competition website. For example, Netflix might post a data set with a million-dollar prize and say that whoever makes the most accurate prediction of customers’ video preferences wins the million dollars.
They were Kaggle grandmasters, and they began communicating more and more with other grandmasters about techniques for rapidly creating very, very accurate predictive models using machine learning. As they competed, they collected more and more of these techniques and began assembling a vision of how they could very rapidly build winning machine learning models that make the best predictions.
Then they started thinking more broadly about how they could apply these techniques, based on their own insurance industry experience, in other corporate environments, and the idea of Data Robot was born. They then began collaborating more and more with other Kaggle grandmasters worldwide, and pretty soon they were on their way to building the Data Robot product.
So they spent three years secretly developing the product, and since 2015, when we very cautiously entered the market, Data Robot has grown significantly. It has been adopted very broadly in the banking and insurance industries, as well as in many other industries worldwide.
What makes Data Robot unique and special is that it is highly automated: it takes many of the leading data science best practices and applies them automatically. It also has guardrails, so very experienced data scientists can choose to use the Data Robot engine to create accurate, transparent machine learning models 5 to 15 times faster than they normally would.
New users who are not trained data scientists, such as actuaries, business analysts, quantitative analysts, or IT analysts, can use Data Robot without being data science experts because of all the automation and guardrails. They can actually produce accurate and understandable machine learning models.
Boris: Alright. Data scientist is a very cool job title right now, as opposed to a few years back, when these people were mostly called statisticians. Can you explain what is required of data scientists?
Neal: Sure. I think the difference between data scientists and statisticians is, in general, the depth of their programming skills. Traditional statisticians do program, but they often use statistics-specific software packages, anything ranging from SAS to MATLAB or SPSS, and they work very well within those packages. At times they would have to go outside them and do additional work, which made sense if they needed algorithms and things that weren’t available in the package.
With the significant growth in available computing power and the increased availability of unstructured data, we saw a significant rise in a different approach to statistics: machine learning. In traditional statistics, you would know a statistical method, prepare the data and optimize it for that individual method, establish your hypothesis, and test it, and that would work very well. Then, if you wanted to try other methods, you would learn them and apply them in a similar way.
With machine learning, you would run many different machine learning algorithms against your data, and that would produce models you could use. In general, it required a higher level of programming skill. If there is unstructured data, it might require very significant programming skills to acquire and accumulate the unstructured content. These machine learning models were in many ways developed outside of traditional statistical analysis; they came more out of the programming communities.
Then what we eventually saw was this combination of statistics and programming. That skill set, which heavily emphasizes the programming skills to acquire and “massage” data, plus machine learning, plus the programming skills to make it all work, is what we now see as the data scientist. So it’s a more programming-oriented role that doesn’t rely exclusively on statistics and brings in other techniques. That being said, more and more statisticians today are adopting these other techniques and capabilities because of the increased productivity and accuracy. They gain the ability to model dozens of methods and techniques at once, very rapidly, rather than trying them one by one to see which works.
So I would say that most statisticians are becoming data scientists, if they are not data scientists already.
Boris: Great answer, thank you. Can you explain what Data Robot offers the industry and give some examples of your customers’ use cases?
Neal: Data Robot offers an opportunity to revolutionize the way insurers do analysis. Currently, most insurance companies use very traditional statistical techniques on both the pricing side and the reserving side. They’re generally resource-bound: limited in the number of data scientists and actuaries they have to apply sophisticated machine learning, or even statistical techniques, to a lot of other problems. They simply don’t have enough people.
So when insurance companies start applying machine learning, they primarily apply it to pricing, because that affects the largest volume of money in the business. Secondarily, they apply traditional statistics and actuarial techniques to build their triangles for reserving. That area actually suffers, and it hasn’t received enough machine learning capability.
So what we’re seeing is that through automated machine learning, instead of being able to address only 10% or 15% of the critical problems in insurance, we can expand the aperture and address 85% or 90%. The key use cases being addressed include pricing, which is one of the starting points: companies want to do dynamic pricing and have more accurate risk segmentation and differentiation. They would like to see not only how to price a customer within a given bucket, or all the customers within a given bucket, but also how to price on an individual basis, using a much larger list of risk characteristics, to get a more accurate price. So if the average price for a pricing bucket is indexed at 1.0, they might find as much as 30% or 40% variance among the individual risks within that bucket. They’d like to capture and price that actual level of variance, 0.7 or 1.3 or 1.4, rather than just applying an average across the entire bucket. Does that make sense?
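To make the bucket-versus-individual pricing idea concrete, here is a minimal Python sketch. It assumes a hypothetical base rate and hypothetical model-estimated relativities (1.0 = bucket average); it only contrasts charging every risk the bucket average with charging each risk its own relativity, and it is not how Data Robot computes prices.

```python
# Minimal sketch: bucket-average pricing vs. individual relativity pricing.
# BASE_RATE and the relativities are hypothetical illustration values,
# not Data Robot output.
BASE_RATE = 1000.0  # hypothetical premium for a risk with relativity 1.0


def bucket_prices(relativities):
    """Traditional approach: every risk in the bucket pays the bucket average."""
    average = sum(relativities) / len(relativities)
    return [BASE_RATE * average for _ in relativities]


def individual_prices(relativities):
    """Individual approach: each risk pays the base rate times its own relativity."""
    return [BASE_RATE * r for r in relativities]


def mispricing(relativities):
    """Per-risk difference between the bucket price and the individual price.

    Positive values mean the risk is overcharged under bucket pricing;
    negative values mean it is undercharged (subsidized by the others).
    """
    return [b - i for b, i in zip(bucket_prices(relativities),
                                  individual_prices(relativities))]


if __name__ == "__main__":
    # Hypothetical model-estimated relativities for four risks in one bucket
    risks = [0.7, 1.0, 1.3, 1.0]
    print(bucket_prices(risks))      # every risk pays roughly 1000
    print(individual_prices(risks))  # roughly 700, 1000, 1300, 1000
    print(mispricing(risks))
```

In this toy bucket the mispricing sums to roughly zero: the bucket average overcharges the 0.7 risk and undercharges the 1.3 risk by the same amount, which is the cross-subsidy that individual pricing removes.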
Boris: Yes, absolutely.
Neal: May I continue for just a second?
Another area where insurance carriers always want more is underwriting, specifically risk selection analysis. When a new business application comes in, it is often very helpful, either for risk tiering purposes or to get an early indication of any potential complexity or potentially very high cost. If an insurance agent or broker is involved, they will often ask, after submitting a new business submission, “Where is this going to come in at?”
Before the underwriters have the chance to do their underwriting and pricing work, a risk selection model helps identify the primary issues and the approximate price, given the complexity of the risk submitted. This allows the underwriter to have a very consistent conversation with a broker or customer and give them an indication of where they’ll come in. Without a risk selection model, we’re relying on the very different levels of experience across all underwriters, so communications might be very inconsistent and sometimes wrong. Statistically, a risk selection model helps them be right more often.
On the reserving side, this has traditionally been handled by actuaries using chain ladder techniques to build their loss reserving triangles. The biggest challenge here isn’t so much speed as understanding changes. Reserving has usually been done on a top-down basis, so we can see reserves change from quarter to quarter, and we know that they are changing.
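For readers unfamiliar with the chain ladder technique mentioned here, the following is a minimal Python sketch of the textbook volume-weighted version. The triangle values are hypothetical, and the sketch omits tail factors and the actuarial judgment applied in practice.

```python
# Minimal chain-ladder sketch on a cumulative loss triangle.
# Assumptions (not from the interview): rows are accident years, columns are
# development periods, values are cumulative paid losses, factors are
# volume-weighted, and there is no tail factor beyond the last column.


def age_to_age_factors(triangle):
    """Volume-weighted development factor for each pair of adjacent periods."""
    n_periods = max(len(row) for row in triangle)
    factors = []
    for j in range(n_periods - 1):
        # Only accident years observed at both period j and j + 1 contribute
        rows = [row for row in triangle if len(row) > j + 1]
        numerator = sum(row[j + 1] for row in rows)
        denominator = sum(row[j] for row in rows)
        factors.append(numerator / denominator)
    return factors


def project_ultimates(triangle):
    """Develop each accident year's latest cumulative value to ultimate."""
    factors = age_to_age_factors(triangle)
    ultimates = []
    for row in triangle:
        ultimate = row[-1]
        # Apply the remaining factors from the row's latest period onward
        for factor in factors[len(row) - 1:]:
            ultimate *= factor
        ultimates.append(ultimate)
    return ultimates


def reserves(triangle):
    """Indicated reserve per accident year: ultimate minus latest cumulative."""
    return [u - row[-1] for u, row in zip(project_ultimates(triangle), triangle)]


if __name__ == "__main__":
    # Hypothetical triangle: older accident years have more development
    tri = [
        [100.0, 150.0, 165.0],
        [110.0, 168.0],
        [120.0],
    ]
    print(age_to_age_factors(tri))  # about [1.514, 1.1]
    print(project_ultimates(tri))   # about [165.0, 184.8, 199.89]
    print(reserves(tri))
```

Because this works on aggregated accident-year totals, it shows that reserves moved but not which claims moved them, which is the limitation Neal describes next.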
However, what we can’t easily do is answer the question of why they’re changing. From a traditional perspective, researching why reserves change, whether through negative or positive development, may take weeks or even longer to resolve. It’s a very difficult process. With machine learning, we can develop each individual claim that’s not yet fully developed or closed out to ultimate, and then roll them up.
That means that if there’s a change in your forecast development, you can immediately identify which claims are causing it, group them to see what trends or issues are emerging, and even analyze them to understand which claim or policy characteristics are driving it. There are roughly 150 to 200 primary use cases across P&C, maybe a good hundred in life insurance, and probably about 130 to 150 in disability, and the number of machine learning use cases and applications will grow. We see them in other lines of business and other insurance products. I just picked those three as examples.
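The claim-level roll-up described above can be sketched in Python as follows, assuming (hypothetically) that a model has already produced a predicted ultimate for every claim in the prior and current quarters. The claim ids, amounts, and region attribute are invented for illustration, and the per-claim prediction model itself is not shown.

```python
# Minimal sketch: attribute a quarter-over-quarter change in total projected
# ultimate losses to individual claims, then to groups of claims.
# Assumptions (hypothetical): each claim already has a model-predicted
# ultimate for the prior and current quarter, and a claim missing from one
# quarter counts as 0 in that quarter.
from collections import defaultdict


def claim_deltas(prior, current):
    """Change in predicted ultimate per claim id."""
    claim_ids = set(prior) | set(current)
    return {cid: current.get(cid, 0.0) - prior.get(cid, 0.0) for cid in claim_ids}


def top_drivers(deltas, n=3):
    """Claims with the largest absolute change, biggest first."""
    return sorted(deltas.items(), key=lambda item: abs(item[1]), reverse=True)[:n]


def change_by_group(deltas, attributes):
    """Sum claim-level changes by a claim attribute (e.g. a hypothetical region)."""
    totals = defaultdict(float)
    for cid, delta in deltas.items():
        totals[attributes.get(cid, "unknown")] += delta
    return dict(totals)


if __name__ == "__main__":
    prior = {"A": 10_000, "B": 5_000, "C": 2_000}
    current = {"A": 10_500, "B": 9_000, "C": 1_800, "D": 3_000}
    region = {"A": "north", "B": "south", "C": "north", "D": "south"}

    deltas = claim_deltas(prior, current)
    print(sum(deltas.values()))      # total change in projected ultimate
    print(top_drivers(deltas, n=2))  # claims B and D drive most of it
    print(change_by_group(deltas, region))
```

The group totals are where a reviewer would start asking which claim or policy characteristics explain the movement.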
There are also very horizontal use cases that are shared between other industries and insurance. Those would be things like customer churn. Life insurance is very, very focused on understanding whether a policy will lapse, and reinsurers need to know this when they buy books or blocks of business. So there are numerous use cases, and I think that as we continue to apply machine learning and explore its capabilities, we’re actually going to see the number of use cases multiply by 5 or 10.
Boris: Alright, thank you for this extensive answer. How are you different from established data analytics companies?
Neal: That’s a great question. Data Robot was the inventor of automated machine learning. Traditional analytics companies often use a single technique, like the generalized linear model introduced in 1972, which requires very substantial manual work to manage interactions: trial and error creating variables, seeing whether they work, looking into a variable and changing it to see if it works. It takes a very, very long time. Some companies take from 10 months to 3 years to create a new underwriting or pricing model, which is very, very long. When they start using automated machine learning, we can automate the analysis of interactions, and we can automate the engineering of the features used to build models. A feature might be, say, a ratio or calculation between two variables in your data set. With those features, the machine learning process becomes more accurate.
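As an illustration of automated feature engineering with ratios, here is a minimal Python sketch that generates every pairwise ratio between chosen numeric columns. The column names and policy data are hypothetical, and this brute-force enumeration is an assumed simplification, not Data Robot’s algorithm.

```python
# Minimal sketch of brute-force ratio feature generation: for every ordered
# pair of numeric columns, add a new "a_per_b" column. This is an assumed,
# simplified illustration, not Data Robot's actual feature engineering.
from itertools import permutations


def add_ratio_features(rows, numeric_cols):
    """Return copies of the rows with a ratio feature for each ordered column pair.

    A zero denominator yields None instead of raising ZeroDivisionError.
    The input rows are not modified.
    """
    enriched = []
    for row in rows:
        new_row = dict(row)
        for a, b in permutations(numeric_cols, 2):
            new_row[f"{a}_per_{b}"] = row[a] / row[b] if row[b] else None
        enriched.append(new_row)
    return enriched


if __name__ == "__main__":
    # Hypothetical commercial auto policies
    policies = [
        {"policy": "P1", "claims": 2, "vehicles": 4},
        {"policy": "P2", "claims": 0, "vehicles": 10},
    ]
    for row in add_ratio_features(policies, ["claims", "vehicles"]):
        print(row)
```

A real system would then need to screen these candidates, for example by checking whether they improve accuracy on holdout data; that step is omitted here.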
With traditional analytics companies, you have to do all that by hand. Data Robot does it automatically, and it explores a much wider variety of features. It also automatically explores a much wider variety of analytic methods and techniques. So instead of working through variations one by one, by hand, over many months or years, you can automate this very quickly. I’d say the biggest differentiator is applying automation to increase the variety of analytic techniques used, to increase the breadth of engineered features, which gives you better predictors, and to improve and correct the data so you don’t have to do it manually. That can result in 50% to 80% time savings, even for experienced people.
Boris: Alright, thank you. How is model risk monitored, so that managers cannot hide behind automated models and we don’t repeat the last crisis with such tools?
Neal: Sure. I think there’s something important to note when we talk about the past financial crisis: a lot of it centered on collateralized debt obligations. What we’ve heard and learned is that a lot of the fundamental data on those debt obligations was incorrect or even falsified. So the underlying data their models depended on was not necessarily accurate.
There were many other complex issues involved. When we talk about insurance risk pricing, which I think is the area you’d be most concerned about when comparing to the past, we have to understand that this is a very different environment. First of all, insurance risks are much more granular. We’re talking about insuring a car, your house, or a building. We’re not talking about insuring a lot of 100,000 cars or 100,000 buildings. Granted, reinsurance may do that, but primary insurance fundamentally doesn’t. There’s a price on the individual asset or thing being insured.
Second, and most important, there was a complete lack of transparency into what was happening with these leveraged products in the financial crisis. They used debt to aggregate large volumes, and there was very poor understanding of how they would behave. With automated machine learning, what is important to have, and what Data Robot has built in, is significant transparency: into the lift and improvement of the models being created; into which variables are important and predictive and which are not; into the interactions between those variables; and, very importantly, into the reasons behind each prediction, that is, why a given score was produced. Every time you make a prediction, Data Robot shows you which variables and values influenced it, pushing it either higher or lower.
This is critical. When you receive a letter from a regulator asking why you priced an insurance policy at a certain level, you can show them, first, the model documentation and how the model works; second, the theory and math behind everything, which we provide so it’s completely transparent; and third, for each individual policy or quote, which variables drove the quoted price up or down and which values did it. I think this transparency reduces the likelihood that we’ll have a similar problem in the future.
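To show what a per-prediction explanation can look like, here is a minimal Python sketch for a log-link GLM, a model form common in insurance pricing. The coefficients, variable names, and reference risk are hypothetical, and this exact multiplicative decomposition applies only to such GLMs; it is not a description of how Data Robot explains predictions from other model types.

```python
# Minimal sketch of per-prediction explanations for a log-link GLM pricing
# model. Each variable's effect is a multiplicative factor relative to a
# reference ("average") risk. The coefficients, variables, and reference
# risk below are hypothetical; Data Robot's own method is not reproduced.
import math


def explain_prediction(intercept, coefs, risk, reference):
    """Return (prediction, reference_prediction, factors) for a log-link GLM.

    prediction = exp(intercept + sum(coef * value)), and
    prediction == reference_prediction * product of all factors.
    """
    linear = intercept + sum(coefs[k] * risk[k] for k in coefs)
    ref_linear = intercept + sum(coefs[k] * reference[k] for k in coefs)
    factors = {k: math.exp(coefs[k] * (risk[k] - reference[k])) for k in coefs}
    return math.exp(linear), math.exp(ref_linear), factors


if __name__ == "__main__":
    coefs = {"driver_age": -0.01, "prior_claims": 0.25}
    quote = {"driver_age": 25, "prior_claims": 2}
    average_risk = {"driver_age": 45, "prior_claims": 0}
    pred, ref, factors = explain_prediction(6.0, coefs, quote, average_risk)
    print(f"reference {ref:.2f} -> quote {pred:.2f}")
    # List variables from largest to smallest effect; a factor > 1 pushes the price up
    ranked = sorted(factors.items(), key=lambda kv: abs(math.log(kv[1])), reverse=True)
    for name, f in ranked:
        direction = "up" if f > 1 else ("down" if f < 1 else "no effect")
        print(f"{name}: x{f:.3f} ({direction})")
```

Because the model is multiplicative, the reference prediction times the product of the factors reproduces the quoted prediction (up to floating-point rounding), which is one reason this kind of breakdown is easy to document.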
Boris: Alright, thank you. This is a very good answer. Last question: where do you see the future of machine learning, artificial intelligence, and risk management in insurance? What can we expect from you in the future?
Neal: It’s a great question. I think we are at the very beginning of the adoption of automated machine learning. We’re seeing a great explosion of access to advanced analytics and artificial intelligence through machine learning. So in the near future I think we’re going to see a significant increase in the number of non-traditional data scientists, whom we call citizen data scientists, analyzing data and making predictions. I think the role of established data scientists who specialize in this will evolve even further, and they’re going to grow. Data scientists will have increasingly large governance, coaching, enablement, and advisory roles. They’re going to work with a much larger number of citizen data scientists, helping them stay productive and giving them insights on how to make their analyses even better. I think we’re going to see software capabilities evolve to allow more and more communication about transparency, trusted AI, and trusted performance, and about how citizen data scientists, data scientists, regulators, and consumers can communicate and work together.
In terms of general features, I think we will see a continued broadening of the number of analytic methods and types of analyses being done. I think we will also see an expansion of the data types brought into many of these tools, so we’ll see wider and more varied types of data, not always structured data, brought into analyses to make better and broader predictions.
Traditionally, you may have very, very specific tools for analyzing certain types of data. But when we have to answer an important insurance or business problem, that problem can’t always be reduced to one data type. So we achieve much more lift when we blend multiple types of information or data into the same analysis.
An example of that is fraudulent claims analysis. When we do pure structured data analysis, we may see an initial accuracy of 60%, 70%, or 75%.
But when we start adding unstructured data, which might come from police reports, medical records, and other sources, you might see a 10% or 15% lift. The more types of data we can combine in the same analysis, such as structured data, unstructured data, and other new data types, the more I think we’ll continue to see increasingly large improvements in accuracy and new insights emerge.
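Blending data types can be as simple as appending text-derived features to the structured ones before modeling. The Python sketch below assumes a hypothetical keyword vocabulary, field names, and report text; a production fraud model would likely use much richer text processing, and this sketch does not reproduce the lift figures quoted above.

```python
# Minimal sketch of blending structured claim data with simple text features
# from an unstructured document (e.g. a police report) into one feature set.
# The keyword vocabulary, field names, and report text are hypothetical;
# a real system would use richer text processing than keyword counts.
import re
from collections import Counter


def text_features(text, vocabulary):
    """Count occurrences of each vocabulary word in the lowercased text."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {f"txt_{word}": counts[word] for word in vocabulary}


def blended_features(structured, text, vocabulary):
    """Combine structured fields and text-derived counts into one feature dict."""
    features = dict(structured)
    features.update(text_features(text, vocabulary))
    return features


if __name__ == "__main__":
    claim = {"claim_amount": 12_000, "days_since_policy_start": 20}
    report = "No witnesses. Vehicle towed; no witnesses could be located."
    print(blended_features(claim, report, ["witnesses", "towed", "cash"]))
```

The blended dictionary can then be fed to the same model as the structured fields alone, which is what lets the two data types be evaluated together.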
Boris: Alright, thank you. I wish you great success with your future growth plans. Over these years your company has grown quite significantly. I hope to hear from you again, and I hope that our members will hear from you and start using your AI and ML services.
Neal: Thank you. It’s been a real pleasure. I was just thinking, as we were going through this, that one of the big future transitions we can really expect to see is traditional demographic and historical data being complemented by real-time captured behavioral data. We see it in telematics data from vehicles, usually for auto or motor insurance, and we’re now seeing it in life insurance with biometric measurements of people’s activity and lifestyle. I think we’re going to see even more of it in commercial general liability (GL) and property insurance, where we look at monitoring the traffic and movement of people, safety situations, devices, construction, and the behavior and occurrences of both machinery and people.
We’re also seeing it in warranty insurance. So we’re going to see historical data complemented by extremely large volumes of what I would call sensor-based data, which really gives us insight into actual behavior. That’s another major change, and I think it’s something we’ll see more and more in the future.