(This is lifted straight from a Twitter thread by the paper's author, Greg Yang, @TheGregYang: https://twitter.com/TheGregYang/status/1501294412126560257?s=20&t=QfvtD7a_iZ8eGPeWGex8fQ)


We all know it's hard to train a big model like GPT-3 on a single GPU.

But what if a new method made it possible, at least for the hyperparameter tuning?


If you want the details on this method, check the links below.

Paper: https://arxiv.org/abs/2203.03466

Code: https://github.com/microsoft/mup

Blog: https://microsoft.com/en-us/research/blog/%c2%b5transfer-a-technique-for-hyperparameter-tuning-of-enormous-neural-networks/


The idea is simple (according to the author). With µ-Parametrization (uP, and he insists you read it as "myu-P"), which the author proposed earlier, the activation scale stays roughly constant during training, close to what it was at initialization, regardless of model size. And apparently, under uP, models share the same optimal hyperparameters regardless of size (learning rate, learning rate schedule, initialization, and so on).



[Figure: activations stay at a constant scale with uP]
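To get a feel for why the scaling matters, here's a toy sketch (my own, not from the thread or the mup repo). It takes a zero-initialized linear readout of a given width, does one SGD step, and measures how far the output moves. The function name `readout_shift`, the ±1 inputs, and the `lr / width` rule are all assumptions made just for this illustration; real uP involves more than this one rule.

```python
import random


def readout_shift(width, lr, scale_lr_by_width, seed=0):
    """Change in a linear readout's output after one SGD step (toy model).

    Hypothetical helper for illustration only, not part of any library.
    """
    rng = random.Random(seed)
    # Inputs with O(1) coordinates: every entry is +1 or -1.
    x = [rng.choice((-1.0, 1.0)) for _ in range(width)]
    w = [0.0] * width  # zero-initialized readout weights
    grad_out = 1.0     # pretend dLoss/dOutput is 1

    # Assumed rule for this sketch: divide the learning rate by width.
    step = lr / width if scale_lr_by_width else lr

    before = sum(wi * xi for wi, xi in zip(w, x))
    # Plain SGD on the readout weights: dLoss/dw_i = grad_out * x_i.
    w = [wi - step * grad_out * xi for wi, xi in zip(w, x)]
    after = sum(wi * xi for wi, xi in zip(w, x))
    return after - before


if __name__ == "__main__":
    for width in (64, 256, 1024, 4096):
        fixed = readout_shift(width, 0.1, scale_lr_by_width=False)
        scaled = readout_shift(width, 0.1, scale_lr_by_width=True)
        print(f"width={width:5d}  fixed lr: {fixed:9.2f}  lr/width: {scaled:6.3f}")
```

With a fixed learning rate the shift grows in proportion to width, while dividing the learning rate by width keeps it the same size at every width. That's the flavor of "activation scale doesn't depend on model size", in the simplest setting I could think of.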




[Figure: with uP, the training loss bottoms out at the same learning rate regardless of model width (the optimum is stable)]


๊ทธ๋ž˜์„œ ์ด ์•„์ด๋””์–ด๋ฅผ ์ด์šฉํ•˜์—ฌ ์•„์ฃผ ์ž‘์€ ๋ฒ„์ „์˜ GPT-3๋ฅผ ํ•œ ์žฅ์˜ GPU์—์„œ ํ•™์Šต์„ ์‹œ์ผœ์„œ ์ ์ ˆํ•œ hyperparameter๋ฅผ ์ฐพ๋Š”๊ฑฐ์•ผ. ๋งŒ์•ฝ ์ฐพ์€ hyperparameter๊ฐ€ ์ž‘์€ ๋ชจ๋ธ์—์„œ optimal์— ๊ฐ€๊น๋‹ค๋ฉด ํฐ ๋ชจ๋ธ์—์„œ๋„ ๊ฑฐ์˜ optimal์— ๊ฐ€๊น๋‹ค๋Š”๊ฑฐ์ง€. ๊ทธ๋ž˜์„œ ์ €์ž๋Š” ์ด๊ฑธ *ยตTransfer*๋ผ๊ณ  ๋ถ€๋ฅด๊ธฐ๋กœ ํ–ˆ๋Œ€.






๊ทธ๋ž˜์„œ ์ €์ž๋Š” GPUํ•˜๋‚˜์—์„œ ํ•™์Šต ๊ฐ€๋Šฅํ•œ 4000๋งŒ๊ฐœ์˜ ์ž‘์€ ๋ชจ๋ธ์—์„œ ์–ป์€ hyperparameter๋ฅผ ยตTransferํ•ด์„œ 67์–ต๊ฐœ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ๊ฐ€์ง€๋Š” ํฐ ๋ชจ๋ธ์—์„œ ํ•™์Šต์‹œํ‚ค๋Š” ์‹คํ—˜์„ ํ–ˆ๊ณ  ์•„์ฃผ ์ ์€ cost๋กœ ์„ฑ๊ณต์ ์ธ ํ•™์Šต๊ฒฐ๊ณผ๋ฅผ ๋ƒˆ๋Œ€.





์ €์ž๊ฐ€ uP๋ฅผ ์ง์ ‘ ๋‹ค๋ฅธ ๋ชจ๋ธ์— ์ ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์ฝ”๋“œ๋ฅผ ๊นƒํ—™์— ์˜ฌ๋ ค๋†จ๊ณ  ์ž๊ธฐ๋„ค๋“ค ์ฝ”๋“œ๋ฅผ ์“ฐ๋Š”๊ฑธ ์ถ”์ฒœํ•œ๋‹ค๊ณ  ํ•˜๋‹ˆ ๊ด€์‹ฌ์žˆ์œผ๋ฉด ํ•œ ๋ฒˆ ์‚ฌ์šฉํ•ด๋ด~