An introduction to Reinforcement Learning – Part 2

Last updated on March 25, 2020
Reinforcement learning (RL) has proven its worth in a slew of AI-driven domains and is also starting to show successes in real-world applications.
The most popular application of deep reinforcement learning is Google DeepMind's AlphaGo, a program developed to master Go – widely regarded as the most challenging board game in the world – which it did, defeating the game's top human players. DeepMind then went on to create AlphaGo Zero, a version that trounced the original AlphaGo. In the real world, RL is playing a significant role in driving insightful decision-making across industries: optimizing manufacturing, solving supply-chain inventory problems, making healthcare safer, and helping cars learn to drive themselves, among many others.
That said, many of the advances in reinforcement learning have proved challenging to leverage in real-world systems. This is because they rest on a series of assumptions about well-defined environments that are rarely satisfied in practice.
In Part 1 of the 'Introduction to RL' series, we covered what RL is, how it differs from general-practice ML, and its associated benefits. In this blog, I would like to present the unique challenges that need to be addressed before RL can be effectively productionized in the real world.
Challenges of RL in the real world:
More recent work on RL has recognized that the poorly modeled realities of real-world systems are hampering the progress of real-world learning. While games and simple physical simulations have offered benchmark domains for many fundamental developments, it is crucial, as the field continues to mature, to develop more sophisticated learning environments for solving complex real-world problems. Domain experts have addressed, among others, issues such as limited exploration, unspecified reward functions, and adequately preparing the simulation environment – all of which are highly dependent on the tasks to be performed. These issues make it difficult for control systems to be grounded in the real, physical world.
Here are seven of the unique high-level challenges we can look at:
- Algorithm gap: RL systems take many trials to begin learning, are sensitive to several hyper-parameters, and find it challenging to balance exploiting what has been learned against exploring what remains unknown.
- Ineffective offline learning: Often training cannot be done online, so learning has to take place offline, using logs of the control system. Ideally, we would like to know whether the new system performs better than the old one, which requires 'off-policy evaluation' – estimating a policy's performance without running it on the real system.
- Limited samples for learning: Many real-world systems do not have separate training and evaluation environments. The agent is therefore likely to explore very little of the state space, which limits learning.
- Unclear reward functions: In most cases, the system or product owners are unable to present a clear picture of what needs to be optimized. Rewards are often multi-dimensional and therefore need to be evaluated through the distribution of behaviors they induce.
- Delayed rewards: Real systems often have delays in sensing state or receiving reward feedback. These reward delays can stretch over days or even weeks, causing challenges with time-bound objectives.
- Partially prohibitive exploration in commercial systems: Both offline and online learning are affected by this problem, leading to logging policies that lack explicit exploration.
- Human interpretability/explainability: Real-world systems operated or owned by humans need their operators to be continuously reassured about the controller's intentions, along with evolving insights into failure cases.
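The exploration-versus-exploitation balance mentioned in the first challenge is often handled with a simple heuristic such as epsilon-greedy action selection. The sketch below is illustrative only – the function names and the linear decay schedule are my own assumptions, not taken from any particular library:

```python
import random

def epsilon_greedy(q_values, epsilon):
    """With probability epsilon pick a random action (explore);
    otherwise pick the current best-known action (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Explore heavily early in training, then shift toward exploitation.
    Linearly interpolates epsilon from eps_start down to eps_end."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)
```

In practice the decay schedule itself is one of the sensitive hyper-parameters the bullet above refers to: decay too fast and the agent settles on a poor policy; too slow and it wastes costly real-world trials exploring.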
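Similarly, the 'off-policy evaluation' idea from the offline-learning challenge can be sketched with a minimal importance-sampling estimator: re-weight each logged trajectory by how likely the new (target) policy would have been to take the logged actions. All names here are hypothetical, and production systems typically use more robust variants (e.g. weighted or doubly robust estimators):

```python
def importance_sampling_estimate(trajectories, target_policy, behavior_policy):
    """Estimate the average return the target policy WOULD obtain,
    using only trajectories logged under the behavior policy.
    Each trajectory is a list of (state, action, reward) tuples;
    each policy maps (state, action) to the probability of that action."""
    total = 0.0
    for traj in trajectories:
        weight = 1.0
        ret = 0.0
        for state, action, reward in traj:
            # Up-weight actions the target policy prefers,
            # down-weight actions it would rarely take.
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += reward
        total += weight * ret
    return total / len(trajectories)
```

This is exactly the 'estimate performance without running it on a real system' step described above: no new interaction with the live system is needed, only the logs – though the estimate becomes very high-variance when the two policies differ substantially.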
Therefore, transferring an RL model out of its training environment into the real world can prove tricky, and tuning and scaling the neural networks that control the agent is another challenge. To address these problems, researchers are looking into several concepts such as intrinsic motivation, imitation learning, and hierarchical learning.
The excitement is justified: Driving RL to maturity
Despite its limitations, and with much of the research still in its infancy, RL has already delivered tremendous value and early wins in use cases spanning industrial robotics, financial services, healthcare, and drug design. With more maturity, it may soon become one of the most effective ways to interact with a customer.
In marketing, for example, a brand’s actions could include all the combinations of solutions, services, products, offers, and messaging – harmoniously integrated across different channels, with each message personalized down to the font, color, words, or images. In supply chains, RL can help decision-makers make better logistics decisions based on inventory forecasts, resource availability, and timely delivery of shipments at lower cost.
We shall dive into each of these challenges with ways to solve them in upcoming blogs in the RL series.