Updated: 2020-09-07 13:59    Source: 乐鱼电竞

This article explains BERT, a well-known language model, in detail for NLP enthusiasts. It is organized into 4 parts that move from the basics to the finer points.
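1. Introduction to BERT
BERT is a pre-trained model proposed by Google AI Research in October 2018.
BERT stands for Bidirectional Encoder Representations from Transformers. On SQuAD1.1, a top-tier machine reading comprehension benchmark, BERT posted remarkable results: it surpassed human performance on both evaluation metrics. It also set state-of-the-art (SOTA) results on 11 different NLP tasks, including pushing the GLUE benchmark to 80.4% (an absolute improvement of 7.6%) and MultiNLI accuracy to 86.7% (an absolute improvement of 5.6%), which makes it a milestone model in the history of NLP.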
2. BERT's model architecture
Overall architecture: in the figure below, the leftmost diagram is BERT's architecture. It shows clearly that BERT is built by connecting Transformer Encoder blocks, which makes it a typical bidirectional encoding model.


3.1 Key points of BERT's training process
1) Four keywords: Pre-trained, Deep, Bidirectional Transformer, Language Understanding
a. Pre-trained: first of all, BERT is a pre-trained language model, so developers can build directly on the released weights instead of training from scratch.
BERT's two biggest highlights both lie in its pre-training tasks.
b. Deep
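BERT_BASE: Layer = 12, Hidden = 768, Head = 12, Total Parameters = 110M
BERT_LARGE: Layer = 24, Hidden = 1024, Head = 16, Total Parameters = 340M
By comparison, the original Transformer uses Layer = 6, Hidden = 2048 (its feed-forward inner size; its model dimension is 512), Head = 8, a shallower and wider design. BERT's results suggest that a deep, narrow model works better (broadly in line with the general conclusion in the CV field).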
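As a rough sanity check on the parameter totals quoted above, the count can be estimated from the layer and hidden sizes alone. This is a minimal sketch under assumptions that are not stated in the text: a vocabulary of 30522 tokens, 512 positions, 2 segment types, a feed-forward size of 4 x hidden, and a pooler layer on top of the encoder. The function name `bert_param_count` is hypothetical. The number of heads does not appear because splitting the hidden size across heads does not change the number of weights.

```python
def bert_param_count(layers, hidden, vocab_size=30522, max_positions=512, segments=2):
    """Rough parameter count for a BERT-style encoder.

    Assumed, not taken from the article: vocab_size, max_positions,
    segments, feed-forward size = 4 * hidden, and a final pooler layer.
    """
    ff = 4 * hidden
    layer_norm = 2 * hidden  # one scale and one bias vector
    # Word, position and segment tables, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + segments) * hidden + layer_norm
    # Query, key, value and output projections, each with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Two dense layers: hidden -> ff and ff -> hidden, each with a bias.
    feed_forward = (hidden * ff + ff) + (ff * hidden + hidden)
    per_layer = attention + layer_norm + feed_forward + layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


base = bert_param_count(layers=12, hidden=768)
large = bert_param_count(layers=24, hidden=1024)
print(f"BERT_BASE  ~ {base / 1e6:.0f}M parameters")
print(f"BERT_LARGE ~ {large / 1e6:.0f}M parameters")
```

Under these assumptions the estimate lands at roughly 109M and 335M, close to the rounded 110M and 340M figures listed above.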
c. Bidirectional Transformer: this is one of BERT's innovations; it is a bidirectional Transformer network.
BERT directly adopts the Encoder module of the Transformer architecture and drops the Decoder module, which gives it bidirectional encoding and strong feature extraction for free.
d. Language Understanding: the focus is on understanding language, not only on generating it (Language Generation).
3.2 BERT's input representation has 3 components (see the second figure above):
Token embedding tensor: word embeddings
Segment embedding tensor: segmentation embeddings
Position encoding tensor: position embeddings
The final embedding vector is the element-wise sum of these 3 vectors.
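The sum of the three embeddings can be sketched in a few lines of plain Python. Everything here is illustrative: the toy vocabulary, the hidden size of 8, and the randomly filled tables stand in for the learned lookup tables of a real model, and the helper names `make_table` and `embed` are hypothetical.

```python
import random

random.seed(0)
HIDDEN = 8  # toy size for readability; BERT_BASE uses 768


def make_table(rows, dim):
    """Random stand-in for a learned embedding table."""
    return [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(rows)]


vocab = {"[CLS]": 0, "[SEP]": 1, "my": 2, "dog": 3, "is": 4,
         "cute": 5, "he": 6, "likes": 7, "playing": 8}
word_emb = make_table(len(vocab), HIDDEN)  # one row per token id
segment_emb = make_table(2, HIDDEN)        # sentence A = 0, sentence B = 1
position_emb = make_table(16, HIDDEN)      # one row per position


def embed(tokens, segment_ids):
    """Add word, segment and position vectors element-wise for each token."""
    out = []
    for pos, (tok, seg) in enumerate(zip(tokens, segment_ids)):
        w, s, p = word_emb[vocab[tok]], segment_emb[seg], position_emb[pos]
        out.append([a + b + c for a, b, c in zip(w, s, p)])
    return out


tokens = ["[CLS]", "my", "dog", "is", "cute", "[SEP]", "he", "likes", "playing", "[SEP]"]
segment_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
vectors = embed(tokens, segment_ids)
print(len(vectors), len(vectors[0]))  # 10 8
```

Because the three vectors are added rather than concatenated, all three tables must share the same hidden size.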
3.3 BERT's pre-training introduces two core tasks (these two tasks are also the two biggest innovations of the original BERT paper)
a. Introducing Masked LM (language model training with masked tokens)
a.1 From the raw training text, 15% of the tokens are randomly selected as candidates for masking.
a.2 The data generator does not turn all of the selected tokens into [MASK]; instead, each one is handled in one of the following 3 ways:
a.2.1 With 80% probability, replace the token with the [MASK] token, e.g. my dog is hairy -> my dog is [MASK]
a.2.2 With 10% probability, replace the token with a random word, e.g. my dog is hairy -> my dog is apple
a.2.3 With 10% probability, keep the token unchanged, e.g. my dog is hairy -> my dog is hairy
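a.3 During training, the Transformer Encoder does not know which words it will be asked to predict, or which of the visible words are original and which were replaced by other words. This high degree of uncertainty forces the model to learn a distributed contextual representation for every token and to do its best to learn what the original language looks like. At the same time, because only 15% of the tokens in the raw text take part in masking, the expressiveness and the rules of the original language are not destroyed.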
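The 15% selection and the 80/10/10 split described in a.1 and a.2 can be sketched as follows. This is a simplified illustration, not the original data generator: it selects each token independently with probability 0.15 instead of picking exactly 15% per sequence, it does not protect special tokens, and the function name `mask_tokens` and the toy vocabulary are hypothetical.

```python
import random


def mask_tokens(tokens, vocab, select_prob=0.15, rng=random):
    """Return (corrupted tokens, labels); a label is None where nothing is predicted."""
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() >= select_prob:
            # Not selected: the token passes through and is not a prediction target.
            corrupted.append(tok)
            labels.append(None)
            continue
        labels.append(tok)  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            corrupted.append("[MASK]")           # 80%: replace with [MASK]
        elif roll < 0.9:
            corrupted.append(rng.choice(vocab))  # 10%: replace with a random word
        else:
            corrupted.append(tok)                # 10%: keep the token unchanged
    return corrupted, labels


rng = random.Random(42)
vocab = ["my", "dog", "is", "hairy", "apple", "cat", "runs"]
tokens = ["my", "dog", "is", "hairy"] * 5000
corrupted, labels = mask_tokens(tokens, vocab, rng=rng)
selected = [i for i, lab in enumerate(labels) if lab is not None]
masked = sum(corrupted[i] == "[MASK]" for i in selected)
print(f"selected share: {len(selected) / len(tokens):.3f}")
print(f"[MASK] share among selected: {masked / len(selected):.3f}")
```

On a long enough input the two printed shares should come out near 0.15 and 0.80. Only the selected positions carry a label, which is why just a fraction of each batch contributes to the loss.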
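b. Introducing Next Sentence Prediction (the next-sentence prediction task)
b.1 The goal is to serve NLP tasks such as question answering, inference, and sentence-topic relationships.
b.2 Every sentence in the training data takes part in this task. For each sentence pair (A, B):
·50% of the time, B is the sentence that actually follows A in the original text (labeled IsNext, a positive sample).
·50% of the time, B is a sentence drawn at random from the original text (labeled NotNext, a negative sample).
b.3 On this task, the BERT model reaches 97-98% accuracy on the test set.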
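The 50/50 construction of sentence pairs in b.2 can be sketched like this. It is a simplified illustration: negatives are drawn from the same small list of sentences, whereas a real pipeline would typically sample them from a larger corpus, and the function name `make_nsp_pairs` and the sample sentences are hypothetical.

```python
import random


def make_nsp_pairs(sentences, rng=random):
    """Build (A, B, label) triples from an ordered list of at least 3 sentences."""
    pairs = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            # Positive sample: B is the sentence that really follows A.
            pairs.append((a, sentences[i + 1], "IsNext"))
        else:
            # Negative sample: any sentence except A itself and the true next one.
            candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            pairs.append((a, sentences[rng.choice(candidates)], "NotNext"))
    return pairs


sentences = [
    "my dog is hairy",
    "he likes playing outside",
    "the weather is nice today",
    "we went to the park",
    "it started to rain",
]
pairs = make_nsp_pairs(sentences, rng=random.Random(0))
for a, b, label in pairs:
    print(f"{label:8s} | {a} -> {b}")
```

Excluding the true next sentence from the negative candidates keeps a NotNext pair from accidentally being a correct continuation.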
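3.4 Fine-tuning BERT-based models
You only need to plug the task-specific inputs and outputs into BERT; the Transformer's powerful attention mechanism can then model many downstream tasks (sentence-pair relationship classification, single-text topic classification, question answering (QA), and single-sentence tagging (named entity recognition)).
Some rules of thumb for fine-tuning:
batch size: 16, 32
epochs: 3, 4
learning rate: 2e-5, 5e-5
added fully connected layers: layers: 1-3, hidden_size: 64, 128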
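The rules of thumb above amount to a small search space, which can be enumerated as a grid. This is only a sketch of how one might organize the search; the key names `head_layers` and `head_hidden_size` and the helper `iter_configs` are hypothetical, and no particular training framework is assumed.

```python
from itertools import product

# Values taken from the rules of thumb above; the key names are illustrative.
search_space = {
    "batch_size": [16, 32],
    "epochs": [3, 4],
    "learning_rate": [2e-5, 5e-5],
    "head_layers": [1, 2, 3],       # number of added fully connected layers
    "head_hidden_size": [64, 128],  # width of the added layers
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(search_space))
print(len(configs))  # 48
print(configs[0])
```

With 48 combinations a full grid is still affordable for short fine-tuning runs; for larger spaces, sampling a subset of `configs` is a common alternative.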

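4. Strengths and weaknesses of the BERT model
Strengths: BERT is built on the Transformer and has strong language representation and feature extraction capabilities. It reached the state of the art on 11 NLP benchmark tasks, and it showed once again that bidirectional language models are more powerful.
Weaknesses:
1) Poor reproducibility: pre-training is essentially out of reach for most teams, so in practice you take the released model and use it as-is!
2) Because only 15% of the tokens in each batch take part in prediction during training, the model converges slowly and needs substantial compute!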
Further thoughts:
1) Deep learning is representation learning
·Across the 11 language tasks, BERT's basic recipe is the same: a bidirectional Transformer extracts the features, and a single fully connected linear layer is added on top for fine-tuning. Even with such a simple assembly, on NER (named entity recognition), a famously hard NLP task, BERT drops the CRF layer altogether and still far outperforms the BiLSTM + CRF combination. How do you argue with that?
2) Scale matters
Neither Masked LM nor Next Sentence Prediction is a new concept; both had been proposed in earlier models. But limited data and compute kept the world from seeing their potential, so those papers drew little attention. In Google's hands, with no shortage of budget, the same ideas produced a landmark paper!
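3) Follow-up research has shown what BERT learns at different layers.
·Lower layers capture phrase-structure information.
·Word- and character-level features show up in layers 3-4, syntactic features in layers 6-9, and sentence-level semantic features in layers 10-12.
·Subject-verb agreement features show up in layers 8-9 (a kind of syntactic information).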