Research team develops novel metric for evaluation of risk-return tradeoff in off-policy evaluation

24 April 2024

Machine Learning and AI, News

Reinforcement learning (RL) is a machine learning technique that trains software by mimicking the trial-and-error learning process of humans. It has demonstrated considerable success in many areas that involve sequential decision-making. However, training RL models with real-world online tests is often undesirable as it can be risky, time-consuming, and, importantly, unethical. Thus, using offline datasets that are naturally collected through past operations is becoming increasingly popular for training and evaluating RL and bandit policies.
