Updated: 2020-11-03 17:38  Source: 乐鱼播客
InputFormat describes the format of a job's input data. It provides two functions:
· Data splitting: following some strategy, the input data is divided into a number of splits, which determines how many MapTasks run and which split each MapTask processes.
· Supplying input to the Mapper: given a split, it parses the split into individual key/value pairs.
Hadoop ships with an InputFormat abstract class, defined as follows:
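public abstract class InputFormat<K, V> {
    // Splits the input; one map task per InputSplit.
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;

    // Creates the reader that yields key/value pairs from one split.
    public abstract RecordReader<K, V> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException;
}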
As the code above shows, InputFormat defines two methods, getSplits() and createRecordReader(). The getSplits() method divides the input files into multiple splits, and createRecordReader() creates a RecordReader object that reads data from a split. The rest of this article focuses on getSplits().
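To make the two responsibilities concrete, here is a toy model in plain Java that does not use Hadoop at all. The names ToySplitter, Range, and readRecords are hypothetical and exist only for this illustration. It is also simplified: the real FileInputFormat accounts for block locations and other details that this sketch ignores.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: not Hadoop code, and all names here are hypothetical.
final class ToySplitter {
    /** Stand-in for an InputSplit: a byte range inside one file. */
    static final class Range {
        final long start;
        final long length;
        Range(long start, long length) { this.start = start; this.length = length; }
    }

    /** Role of getSplits(): describe the input as ranges of at most splitSize bytes. */
    static List<Range> getSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            // The last range may be shorter than splitSize.
            splits.add(new Range(offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    /** Role of a RecordReader: parse the text of one split into key/value pairs.
     *  Key = character offset of the line start, value = the line itself. */
    static List<Map.Entry<Long, String>> readRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            int end = text.indexOf('\n', pos);
            if (end < 0) {
                end = text.length(); // final line without a trailing newline
            }
            records.add(new AbstractMap.SimpleEntry<>((long) pos, text.substring(pos, end)));
            pos = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        for (Range r : getSplits(300, 128)) {
            System.out.println("split start=" + r.start + " length=" + r.length);
        }
        for (Map.Entry<Long, String> kv : readRecords("a b\nc d\n")) {
            System.out.println(kv.getKey() + " -> " + kv.getValue());
        }
    }
}
```

Note that the splitting step only computes ranges and never copies data, which is the sense in which the splitting is "logical".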
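The getSplits() method implements the logical splitting mechanism: files are not physically cut, and each split only records which part of the input it covers. The split size, splitSize, is determined by three values: minSize, maxSize, and blockSize.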
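minSize: the minimum value of splitSize, set by the parameter mapred.min.split.size (named mapreduce.input.fileinputformat.split.minsize in Hadoop 2 and later), which can be configured in mapred-site.xml. The shipped default is 0, and FileInputFormat treats the effective minimum as 1 byte.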
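maxSize: the maximum value of splitSize, set by the parameter mapreduce.input.fileinputformat.split.maxsize (formerly mapred.max.split.size), which can be configured in mapred-site.xml. It is unset by default, which FileInputFormat treats as Long.MAX_VALUE, so no upper limit applies. The similarly named mapreduce.jobtracker.split.metainfo.maxsize is a different setting: it caps the size of the split metadata file and defaults to 10,000,000 bytes.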
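blockSize: the size of a file storage block in HDFS, set by the parameter dfs.block.size (named dfs.blocksize in Hadoop 2 and later), which can be changed in hdfs-site.xml. The default is 128 MB.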
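The text above does not spell out how the three values are combined. In Hadoop's FileInputFormat the rule is max(minSize, min(maxSize, blockSize)). The sketch below reproduces that rule in plain Java. The class name SplitSizeSketch and the sample values in main are assumptions made for illustration only.

```java
// Plain-Java sketch of the split size rule; SplitSizeSketch is a hypothetical name.
final class SplitSizeSketch {
    /** splitSize = max(minSize, min(maxSize, blockSize)), all values in bytes. */
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb; // illustrative HDFS block size

        // No effective bounds: the split size equals the block size.
        System.out.println(computeSplitSize(1L, Long.MAX_VALUE, blockSize) / mb);

        // maxSize below blockSize: splits shrink, so more map tasks run.
        System.out.println(computeSplitSize(1L, 64 * mb, blockSize) / mb);

        // minSize above blockSize: splits grow, so fewer map tasks run.
        System.out.println(computeSplitSize(256 * mb, Long.MAX_VALUE, blockSize) / mb);
    }
}
```

In practice this means lowering maxSize below the block size increases the number of splits, while raising minSize above the block size reduces it.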