Seminar
ICIM Research Exchange Seminar (Thu, Mar 21)
Registered: 2024-03-12
Prof. Min-hwan Oh (Seoul National University)
|
2024-03-21 14:00-16:00
|
National Institute for Mathematical Sciences, Innovation Center for Industrial Mathematics (Pangyo)
Date/Time: Thursday, March 21, 2024, 14:00-16:00
Venue: Seminar Room, Innovation Center for Industrial Mathematics, Pangyo Techno Valley
National Institute for Mathematical Sciences, Room 231, Enterprise Support Hub, 815 Daewangpangyo-ro, Sujeong-gu, Seongnam-si, Gyeonggi-do
Free parking is provided for up to two hours.
Speaker: Prof. Min-hwan Oh (Seoul National University)
Topic: Cascading Contextual Assortment Bandits
The multi-armed bandit is a fundamental sequential decision-making problem that is often used to model interactions between users and a recommender agent. We propose a new combinatorial bandit model, the cascading contextual assortment bandit. This model generalizes both existing cascading bandits and assortment bandits, broadening their applicability in practice. For this model, we propose our first UCB-based algorithm, UCB-CCA. We prove that this algorithm achieves a T-step regret upper bound of O((d/κ)√T), sharper than existing bounds for cascading contextual bandits, by eliminating the dependence on the cascade length K. To improve the dependence on the problem-dependent constant κ, we introduce our second algorithm, UCB-CCA+, which leverages a new Bernstein-type concentration result. This algorithm achieves O(d√T) without dependence on κ in the leading term. We substantiate our theoretical claims with numerical experiments, demonstrating the practical efficacy of our proposed methods.
The seminar is scheduled to be streamed on YouTube.
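To give a feel for the setting the abstract describes, the toy simulation below pairs a generic LinUCB-style optimistic top-K selection rule with cascading click feedback, where the user scans the chosen assortment in order and clicks the first attractive item. This is only an illustrative sketch of the problem class, not the speaker's UCB-CCA or UCB-CCA+ algorithms; the feature dimension, exploration scale `alpha`, and logistic click model are all hypothetical choices made for the demo.

```python
import numpy as np

def run(policy, T=2000, n_items=20, K=4, d=5, seed=0):
    """Simulate a cascading assortment problem; return the total click count.

    policy="ucb"    -> LinUCB-style optimistic top-K selection (illustrative)
    policy="random" -> uniformly random size-K assortment (baseline)
    """
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=d)
    theta_true /= np.linalg.norm(theta_true)   # unknown preference vector

    V = np.eye(d)       # regularized Gram matrix (ridge prior)
    b = np.zeros(d)     # click-weighted feature sum
    alpha = 1.0         # exploration-bonus scale (hypothetical tuning)
    clicks = 0

    for _ in range(T):
        X = rng.normal(size=(n_items, d)) / np.sqrt(d)   # fresh item contexts
        if policy == "ucb":
            V_inv = np.linalg.inv(V)
            theta_hat = V_inv @ b                         # ridge estimate
            width = np.sqrt(np.einsum("id,dk,ik->i", X, V_inv, X))
            scores = X @ theta_hat + alpha * width        # optimistic scores
            chosen = np.argsort(-scores)[:K]              # greedy top-K assortment
        else:
            chosen = rng.permutation(n_items)[:K]
        # Cascade feedback: the user scans the list in order and
        # clicks the first item found attractive, then stops.
        for i in chosen:
            p = 1.0 / (1.0 + np.exp(-(3.0 * X[i] @ theta_true - 2.0)))
            clicked = rng.random() < p
            V += np.outer(X[i], X[i])    # update stats for every scanned item
            b += clicked * X[i]
            if clicked:
                clicks += 1
                break
    return clicks

if __name__ == "__main__":
    print("UCB clicks:   ", run("ucb"))
    print("random clicks:", run("random"))
```

Over a long horizon the optimistic policy accumulates substantially more clicks than the random baseline, which is the qualitative behavior that regret bounds of the O(d√T) type formalize.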