Updated: May 27, 2021, 17:33

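ORC and Parquet are both high-performance columnar storage formats. Compared with plain-text storage, both generally reduce storage footprint and improve query performance.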
1.Parquet
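(1) Parquet supports a nested data model, similar to Protocol Buffers. The schema of each data model contains multiple fields, and each field has three attributes: repetition, data type, and field name. The repetition can be one of three kinds: required (appears exactly once), repeated (appears zero or more times), or optional (appears zero or one time). The data type of each field falls into one of two categories: group (complex type) or primitive (basic type).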
(2) Parquet's data model has no dedicated complex structures such as Map or Array, but both can be expressed by combining repeated and group.
(3) Because the data model Parquet supports is fairly loose, a single record may contain deeply nested relationships. Maintaining a tree-like structure for every record could take up a lot of storage, so the Dremel paper proposed an efficient algorithm for storing nested data: the Striping/Assembly algorithm. With Striping/Assembly, Parquet can represent complex nested formats in less storage. In addition, the repetition level and definition level are usually small integers, so they can be compressed with RLE (run-length encoding) to reduce storage further.
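To make item (3) concrete, here is a minimal Python sketch of striping and assembly for the simplest possible case: a single column declared as `repeated int64 tags`. It is an illustration under simplifying assumptions, not Parquet's actual implementation. The function names `stripe`, `assemble`, and `rle` are made up for this example, only one nesting level is handled, and real Parquet writers use a hybrid RLE/bit-packing encoding rather than the plain (value, run length) pairs shown here.

```python
from itertools import groupby

# Assumed column schema (Parquet message syntax):
#   message Doc { repeated int64 tags; }
# Maximum repetition level = 1, maximum definition level = 1.


def stripe(records):
    """Shred a list of tag lists into (values, repetition levels, definition levels)."""
    values, rep, dfn = [], [], []
    for tags in records:
        if not tags:
            # Empty list: no value is stored, only a level pair marking
            # "new record, field not defined".
            rep.append(0)
            dfn.append(0)
            continue
        for i, value in enumerate(tags):
            values.append(value)
            rep.append(0 if i == 0 else 1)  # 0 starts a record, 1 continues its list
            dfn.append(1)                   # the repeated field is present
    return values, rep, dfn


def assemble(values, rep, dfn):
    """Rebuild the original records from the three column streams."""
    records, remaining = [], iter(values)
    for r, d in zip(rep, dfn):
        if r == 0:
            records.append([])              # repetition level 0 opens a new record
        if d == 1:
            records[-1].append(next(remaining))
    return records


def rle(levels):
    """Plain run-length encoding: a list of (level, run length) pairs."""
    return [(level, len(list(run))) for level, run in groupby(levels)]


records = [[1, 2, 3], [], [4]]
values, rep, dfn = stripe(records)
print(values)    # [1, 2, 3, 4]
print(rep)       # [0, 1, 1, 0, 0]
print(dfn)       # [1, 1, 1, 0, 1]
print(rle(rep))  # [(0, 1), (1, 2), (0, 2)]
print(assemble(values, rep, dfn) == records)  # True
```

The level streams contain only small integers with long runs of identical values, which is why run-length encoding shrinks them so effectively.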
Parquet files are stored in a binary format, so they cannot be read or modified directly as text. Parquet files are self-describing: each file contains both its data and its metadata.
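One way to see what "self-describing" means in practice is to look at how a reader locates the metadata. A Parquet file starts and ends with the 4-byte magic `PAR1`, and the 4 bytes before the trailing magic hold the little-endian length of the footer metadata. The sketch below splits such a layout into its data and metadata parts. It works on a synthetic byte string, not a real file: `split_parquet_footer`, `fake_data`, and `fake_meta` are hypothetical names, and the actual metadata (a Thrift-encoded structure in real files) is not parsed here.

```python
import struct

MAGIC = b"PAR1"


def split_parquet_footer(blob: bytes):
    """Return (data_bytes, metadata_bytes) from a Parquet-style byte layout."""
    if len(blob) < 12 or blob[:4] != MAGIC or blob[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic")
    # The 4 bytes before the trailing magic give the metadata length (little-endian).
    (meta_len,) = struct.unpack("<I", blob[-8:-4])
    meta_start = len(blob) - 8 - meta_len
    if meta_start < 4:
        raise ValueError("corrupt footer length")
    return blob[4:meta_start], blob[meta_start:-8]


# Synthetic stand-ins for the column data and the serialized file metadata.
fake_data = b"column-chunks..."
fake_meta = b"serialized-file-metadata"
blob = MAGIC + fake_data + fake_meta + struct.pack("<I", len(fake_meta)) + MAGIC

data, meta = split_parquet_footer(blob)
print(data == fake_data, meta == fake_meta)  # True True
```

Because the schema and other metadata travel inside the file itself, a reader needs nothing but the file to interpret its contents.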
2.ORC
(1) ORC files are self-describing. Their metadata is serialized with Protocol Buffers, and the data in the file is compressed as much as possible to reduce storage consumption;
(2) Like Parquet, ORC files are stored in a binary format, so they cannot be read directly as text. ORC files are also self-describing: they contain a large amount of metadata, all of which is serialized with Protocol Buffers;

(3) ORC merges multiple discrete ranges wherever possible to minimize the number of I/O operations;
(4) ORC uses more precise index information, so a read can start from any given row. Finer-grained statistics let a reader skip entire row groups. By default, ORC compresses every block of data and index information with ZLIB, so ORC files also take up less storage;
(5) Newer versions of ORC also support Bloom filters, which further improve the efficiency of predicate pushdown. Hive added support for this from version 1.2.0 onward.
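Items (4) and (5) describe how statistics and Bloom filters let a reader skip row groups during predicate pushdown. The following Python sketch shows the idea for an equality predicate. It is a toy model under stated assumptions, not ORC's real reader: the `BloomFilter` and `RowGroup` classes, the `scan_equals` function, and the filter sizes are all hypothetical, and each row group here holds only three values, far fewer than a real one.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=256, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, value):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


@dataclass
class RowGroup:
    rows: list
    minimum: int = field(init=False)
    maximum: int = field(init=False)
    bloom: BloomFilter = field(init=False)

    def __post_init__(self):
        # Statistics and the Bloom filter are computed once, at write time.
        self.minimum = min(self.rows)
        self.maximum = max(self.rows)
        self.bloom = BloomFilter()
        for value in self.rows:
            self.bloom.add(value)


def scan_equals(row_groups, target):
    """Return (matching rows, number of row groups actually read)."""
    matches, groups_read = [], 0
    for group in row_groups:
        if not (group.minimum <= target <= group.maximum):
            continue  # min/max statistics rule this group out
        if not group.bloom.might_contain(target):
            continue  # Bloom filter says the value is definitely absent
        groups_read += 1
        matches.extend(v for v in group.rows if v == target)
    return matches, groups_read


groups = [RowGroup([1, 5, 9]), RowGroup([20, 25, 30]), RowGroup([40, 50, 60])]
print(scan_equals(groups, 25))  # ([25], 1)
```

The min/max check alone eliminates the first and third groups. The Bloom filter matters for values that fall inside a group's range but are not actually present, such as 22 in the second group, where it can usually avoid the read as well.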