Updated: 2021-12-16 18:18   Source: 乐鱼电竞
Spark Streaming can ingest data from many kinds of sources, Kafka among them. To read from a data source you first have to establish a connection to it. This section introduces the two ways of connecting to Kafka.
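Receiver approach:
(1) KafkaUtils.createStream is the receiver-based way of consuming Kafka data. It is obsolete and no longer used in production;
(2) A Receiver runs as a long-lived Task on an Executor and waits for data. A single Receiver has low throughput, so you have to start several, merge their streams by hand (union), and only then process the data, which is cumbersome;
(3) If the machine hosting a Receiver goes down, data may be lost, so a WAL (write-ahead log) has to be enabled to keep the data safe, which lowers throughput even further;
(4) The Receiver approach connects to the Kafka queue through ZooKeeper and calls Kafka's high-level API. Offsets are stored in ZooKeeper and maintained by the Receiver;
(5) To avoid losing data, Spark also keeps its own copy of the offsets in the Checkpoint while consuming, so the two copies can become inconsistent;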
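The inconsistency described in point (5) can be illustrated with a toy model. This is only a sketch: two plain dictionaries stand in for ZooKeeper and the Spark checkpoint, the function and exception names are hypothetical, and the order of the two writes is an assumption. The point is simply that two stores updated one after the other cannot be updated atomically, so a crash in between leaves them disagreeing.

```python
# Toy model only: plain dicts stand in for ZooKeeper and the Spark checkpoint.

class CrashBeforeCheckpoint(Exception):
    """Simulated failure between the two offset writes."""


def consume_batch(zk_offsets, checkpoint_offsets, partition, until_offset, crash=False):
    """Record progress for one batch in both stores, one after the other."""
    # Step 1 (assumed order): the position is committed to ZooKeeper.
    zk_offsets[partition] = until_offset
    if crash:
        # The process dies before the second copy is written.
        raise CrashBeforeCheckpoint(partition)
    # Step 2: the same position is saved in the checkpoint.
    checkpoint_offsets[partition] = until_offset


zk = {0: 0}
checkpoint = {0: 0}

consume_batch(zk, checkpoint, partition=0, until_offset=100)
try:
    consume_batch(zk, checkpoint, partition=0, until_offset=200, crash=True)
except CrashBeforeCheckpoint:
    pass

print(zk[0], checkpoint[0])  # 200 100
```

After the simulated crash the two stores report different positions for the same partition, so a restart that trusts one of them will either skip or re-read records.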
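Direct approach:
(1) KafkaUtils.createDirectStream is the direct way. Every job in every Streaming batch calls the Simple Consumer API directly to fetch the data for its Topic. This is the most widely used approach and the one most often asked about in interviews;
(2) The Direct approach connects straight to the Kafka partitions and reads from each partition directly, which greatly increases parallelism;
(3) The Direct approach calls Kafka's low-level API, and the application stores and maintains offsets itself. By default Spark keeps them in the checkpoint, which removes the inconsistency with ZooKeeper;
(4) You can of course also maintain the offsets manually, for example by storing them in MySQL or Redis;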
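The per-partition reads and manual offset storage described above can be sketched without a cluster. The example below is an illustration under stated assumptions, not Spark's implementation: the standard-library sqlite3 module stands in for MySQL, the table layout and the helper names (offset_ranges, load_offsets, save_offsets) are hypothetical, and the log-end offsets are made-up numbers. Each range has the shape (partition, from offset, until offset), one per partition that has new data.

```python
import sqlite3


def offset_ranges(committed, latest):
    """One (partition, from_offset, until_offset) range per partition with new data."""
    return [
        (p, committed.get(p, 0), latest[p])
        for p in sorted(latest)
        if latest[p] > committed.get(p, 0)
    ]


def load_offsets(conn, topic):
    """Read the stored next-offset of every partition of a topic."""
    rows = conn.execute(
        "SELECT part, next_offset FROM offsets WHERE topic = ?", (topic,)
    )
    return dict(rows.fetchall())


def save_offsets(conn, topic, ranges):
    """Persist the end of every processed range in a single transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO offsets (topic, part, next_offset) VALUES (?, ?, ?)",
            [(topic, p, until) for p, _start, until in ranges],
        )


conn = sqlite3.connect(":memory:")  # stand-in for a MySQL connection
conn.execute(
    "CREATE TABLE offsets (topic TEXT, part INTEGER, next_offset INTEGER, "
    "PRIMARY KEY (topic, part))"
)

latest = {0: 120, 1: 80, 2: 0}  # hypothetical log-end offsets per partition
first_batch = offset_ranges(load_offsets(conn, "events"), latest)
# ... process the records covered by first_batch here, then record progress ...
save_offsets(conn, "events", first_batch)

# After a restart, the job resumes from the stored positions.
latest = {0: 150, 1: 80, 2: 5}
second_batch = offset_ranges(load_offsets(conn, "events"), latest)

print(first_batch)   # [(0, 0, 120), (1, 0, 80)]
print(second_batch)  # [(0, 120, 150), (2, 0, 5)]
```

Writing the results of a batch and its offsets in the same database transaction is one common way to avoid the two-copies problem, since both either commit or roll back together.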

Spark Streaming offers two sets of APIs for Kafka integration, because Kafka itself has two Consumer APIs. Documentation:
http://spark.apache.org/docs/2.4.5/streaming-kafka-integration.html
http://spark.apache.org/docs/latest/streaming-kafka-integration.html
Kafka 0.8.x - long obsolete
Built on the old Kafka API: Old Kafka Consumer API
Supports both Receiver (obsolete) and Direct modes
Kafka 0.10.x - used in current development
Built on the new Kafka API: New Kafka Consumer API
Supports Direct mode only

