Research Interests

At present, my research focuses on (i) safe reinforcement learning, which aims to produce reliable solutions that avoid hazardous outcomes, and (ii) decision-making mechanisms in multi-agent systems, where multiple agents collaborate to complete a task. I believe that resolving these issues will help mitigate the risks associated with autonomous decision-making and will be a noteworthy advance in the applicability of intelligence in these systems.

If you are interested in possible applications of RL, our blog post illustrates one example. It is based on our work on using RL for a supply chain game called the Beer Game. Together with Opex Analytics, we have redesigned this game, which runs with an RL backend: https://beergame.opexanalytics.com/

Another example is in transportation. We study the Vehicle Routing Problem (VRP), which is a difficult combinatorial optimization problem. In this paper, we develop a framework for the VRP that is capable of solving a wide variety of combinatorial optimization problems using RL. According to the findings of this paper, our RL algorithm is competitive with state-of-the-art VRP heuristics in both solution quality and runtime, which represents progress toward solving the VRP with RL in real applications. Our code is open-sourced, and you can see a summary of the paper in the following video.

During the first two years of my Ph.D., I worked with Professor Alexander Stolyar on stochastic processes and optimal control. We developed a real-time optimal control scheme for the dynamic matching problem, with many possible applications such as real-time taxi allocation, Internet advertising, and assemble-to-order systems.

In the coming years, I expect that my research agenda will be closely tied to developments in AI. I consider myself a researcher who works on interdisciplinary topics, and my goal is to help close the gap between real-world applications and state-of-the-art architectures. In addition to developing improved methods for the problems that I have already investigated, I see large potential in manufacturing, the Internet of Things, healthcare, and finance. The question that I am trying to answer in all these domains is: can we equip these systems with more efficient designs and with safe, personalized behaviors?