https://arxiv.org/abs/2203.06173

This is a paper from UC Berkeley.


People are saying this paper shows that the visual representation you get by training an MAE (masked autoencoder) on real-world images is very useful for object manipulation tasks.


To unpack that as far as I understand it: the image is split into patches the same way as for a ViT encoder, a randomly sampled mask hides part of the real input, and the encoder is trained on that masked input. (Apparently this also has the advantage of being cheap to train, since the encoder only has to process the unmasked patches. See the well-known 2021 paper by Kaiming He et al., https://arxiv.org/abs/2111.06377)
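For anyone who wants to see the masking step concretely, here is a minimal sketch in plain Python. It is my own toy illustration, not code from either paper: the helper names `patchify` and `random_mask`, the 8x8 toy image, and the patch size of 2 are all made up for the example. The 0.75 mask ratio is the default reported in the MAE paper linked above.

```python
import random

def patchify(image, patch):
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            # Flatten each tile row by row into a single list of pixel values.
            tile = [image[top + dy][left + dx]
                    for dy in range(patch) for dx in range(patch)]
            patches.append(tile)
    return patches

def random_mask(num_patches, mask_ratio, rng):
    """Return (visible_ids, masked_ids) with a random subset of patches hidden."""
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked

rng = random.Random(0)
image = [[row * 8 + col for col in range(8)] for row in range(8)]  # toy 8x8 "image"
patches = patchify(image, patch=2)                                 # 16 patches, 4 pixels each
visible_ids, masked_ids = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Only the visible patches would be fed to the encoder; the masked ones are
# what a decoder has to reconstruct during pre-training.
encoder_input = [patches[i] for i in visible_ids]
```

With a 75% mask ratio the encoder here sees only 4 of the 16 patches, which is where the training-cost saving comes from.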

The visual encoder trained this way is then frozen and used together with RL on motor control tasks, and it solves them well even without any task-specific fine-tuning.
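To make "frozen encoder" concrete, here is a rough toy sketch, again in plain Python and again entirely my own assumption rather than the paper's code. The "encoder" is just a fixed random linear map standing in for the pre-trained ViT, and the update is a plain squared-error gradient step standing in for the actual RL objective. The only point is that the policy weights change while the encoder weights never do.

```python
import random

rng = random.Random(0)

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

# Hypothetical stand-in for the pre-trained encoder: 6 "pixels" -> 3 features.
encoder_w = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(3)]
# Trainable policy head: 3 features -> 2 action dimensions, initialised to zero.
policy_w = [[0.0] * 3 for _ in range(2)]

def act(pixels):
    features = matvec(encoder_w, pixels)   # frozen: encoder_w is never updated
    return features, matvec(policy_w, features)

def update_policy(pixels, target_action, lr=0.01):
    """One gradient step on squared error (a stand-in for an RL loss)."""
    features, action = act(pixels)
    for i, (a, t) in enumerate(zip(action, target_action)):
        err = a - t
        for j, f in enumerate(features):
            # d(err^2)/d(policy_w[i][j]) = 2 * err * f
            policy_w[i][j] -= lr * 2 * err * f

def loss(pixels, target_action):
    _, action = act(pixels)
    return sum((a - t) ** 2 for a, t in zip(action, target_action))

encoder_before = [row[:] for row in encoder_w]
pixels = [0.2, 0.5, 0.1, 0.9, 0.3, 0.7]
target = [1.0, -1.0]

loss_before = loss(pixels, target)
for _ in range(50):
    update_policy(pixels, target)
loss_after = loss(pixels, target)
```

In a real setup the same idea would be expressed by excluding the encoder's parameters from the optimizer, so only the controller on top of the features is trained.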


Also, as in the picture attached to this post, the visual representation coming out of the encoder handles a variety of shapes and colors and covers a wide range of objects and environments, so it generalizes well.


The results decoded from the MAE's visual representation were quite impressive, so I'm sharing them here...


(I don't know much about generative models, so there's a good chance the description itself is wrong, haha. Corrections welcome~)