
Specific Operations for Cleaning Data with Pandas

Updated: 2021-01-18 16:02  Source: 乐鱼电竞

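Data gathered in the collection stage almost always has some flaws and gaps, such as missing values, extreme values, and inconsistent formats. Data therefore has to be preprocessed before it is analyzed: cleaned, merged, reshaped, and transformed. Pandas provides many functions and methods designed specifically for preprocessing, for example to replace abnormal data, merge data, and reshape data.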

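Data cleaning is complex and tedious work, and it is also the most important step in the whole data analysis process. Its purpose is to improve data quality by cleaning up dirty data (here, data that is meaningless for the analysis, invalidly formatted, or outside the specified range) so that the original data is complete, unique, authoritative, valid, and consistent. Common data cleaning operations in pandas include handling null and missing values, handling duplicate values, handling outliers, and unifying data formats.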

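A null value generally means that the data is unknown, not applicable, or will be added later. A missing value means that the value of one or more attributes in a dataset is incomplete. Missing values arise for two main reasons, mechanical and human: mechanical causes are machine failures that prevent data from being collected or stored, while human causes are subjective mistakes or deliberate concealment.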

Ò»°ã¿ÕֵʹÓÃNone±íʾ£¬È±Ê§ÖµÊ¹ÓÃNaN±íʾ¡£PandasÖÐÌṩÁËһЩÓÃÓÚ¼ì²é»ò´¦Àí¿ÕÖµºÍȱʧֵµÄº¯Êý£¬ÆäÖУ¬Ê¹ÓÃisnull()ºÍnotnull()º¯Êý¿ÉÒÔÅжÏÊý¾Ý¼¯ÖÐÊÇ·ñ´æÔÚ¿ÕÖµºÍȱʧֵ£¬¶ÔÓÚȱʧÊý¾Ý¿ÉÒÔʹÓÃdropna()ºÍfillna()·½·¨¶Ôȱʧֵ½øÐÐɾ³ýºÍÌî³ä£¬ÏÂÃæÀ´Ò»Ò»½éÉÜ¡£

1. The isnull() function

The syntax of isnull() is as follows:

pandas.isnull(obj)

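The function takes a single parameter, obj, the object to check for null values. For a scalar it returns a single boolean: True if the value is null or missing, otherwise False. For an array-like object such as a Series or DataFrame it returns an object of the same shape containing one such boolean per element (NaN and None map to True; everything else maps to False). pd.isna() is an alias of this function.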

The following example demonstrates how to use isnull() to check for missing or null values:

In [1]: import numpy as np
        import pandas as pd
        from pandas import Series
        series_obj = Series([1, None, np.nan])
        pd.isnull(series_obj)    # check for null or missing values
Out[1]:
        0    False
        1     True
        2     True
        dtype: bool

In the example above, a Series containing the three values 1, None, and NaN is created first, and isnull() is then called to check its data: null or missing values map to True and all other values map to False. The output shows that the first value is valid, while the last two are null or missing.
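isnull() is not limited to Series. The sketch below (the DataFrame and its column names are hypothetical) shows that on a scalar it returns a single bool, and that on a DataFrame the boolean result can be summed to count the missing entries in each column, a common first step when auditing a dataset.

```python
import numpy as np
import pandas as pd

# On a scalar, pd.isnull() returns a single bool.
print(pd.isnull(None))    # True
print(pd.isnull(np.nan))  # True
print(pd.isnull(0))       # False

# On a DataFrame it returns a same-shaped DataFrame of booleans;
# summing it counts the missing entries per column.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                   "y": [None, None, "b"]})
missing_per_column = df.isnull().sum()
print(missing_per_column.tolist())  # [1, 2]
```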

2. The notnull() function

notnull() performs the same check as isnull(), namely whether the data contains null or missing values, but returns the opposite result: when it finds a null or missing value, notnull() returns False, whereas isnull() returns True.

Changing the isnull() call above to notnull() gives the following code:

In [2]: import numpy as np
        import pandas as pd
        from pandas import Series
        series_obj = Series([1, None, np.nan])
        pd.notnull(series_obj)    # check for values that are NOT null or missing

Out[2]:
        0     True
        1    False
        2    False
        dtype: bool

In this example, notnull() is used to check for null or missing values: any null or missing value maps to False and everything else maps to True. The output shows True for index 0, meaning that value is present, and False for indexes 1 and 2, meaning those values are null or missing.
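Because notnull() returns the exact complement of isnull(), its result is convenient as a boolean mask. A minimal sketch, using a Series like the one above (the extra value 4 is added for illustration):

```python
import numpy as np
import pandas as pd

series_obj = pd.Series([1, None, np.nan, 4])

# notnull() is the element-wise negation of isnull().
print((series_obj.notnull() == ~series_obj.isnull()).all())  # True

# Using the boolean result as a mask keeps only the non-missing values.
valid = series_obj[series_obj.notnull()]
print(valid.tolist())  # [1.0, 4.0]
```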

3. The dropna() method

The dropna() method removes rows or columns that contain null or missing values. Its syntax is as follows:

dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

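The meanings of some of its parameters are as follows:

(1) axis: whether to drop rows or columns. Possible values:

0 or 'index': drop rows that contain missing values. This is the default.

1 or 'columns': drop columns that contain missing values.

(2) how: the criterion for dropping. Possible values:

'any': the default. Drop the row or column if it contains any NaN value.

'all': drop the row or column only if all of its values are NaN.

(3) thresh: the minimum number of non-NaN values required to keep a row or column. For example, with thresh=2 a row or column is kept only if it has at least two non-NaN values.

(4) subset: the labels along the other axis in which to look for NaN values, e.g. the columns to consider when dropping rows.

(5) inplace: whether to operate on the original data. If True, the original data is modified directly (and the method returns None); if False (the default), a new object is returned and the original data is left unchanged.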
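To see how these parameters differ, the following sketch applies them to a small hypothetical DataFrame in which row 4 is entirely NaN and column B has only one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: non-NaN counts per row are 2, 1, 2, 3, 0.
df = pd.DataFrame({"A": [1.0, np.nan, np.nan, 4.0, np.nan],
                   "B": [2.0, 3.0, 5.0, 4.0, np.nan],
                   "C": [np.nan, np.nan, 6.0, 4.0, np.nan]})

print(df.dropna().index.tolist())              # [3]  (how='any' is the default)
print(df.dropna(how="all").index.tolist())     # [0, 1, 2, 3]
print(df.dropna(thresh=2).index.tolist())      # [0, 2, 3]
print(df.dropna(subset=["A"]).index.tolist())  # [0, 3]
# Along columns: keep only columns with at least 4 non-NaN values.
print(df.dropna(axis="columns", thresh=4).columns.tolist())  # ['B']
```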

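Suppose there is a table of book information with three columns: category, title, and author. In the row with index 0 the title is NaN, meaning that value is missing, and in the row with index 1 the author is None, meaning that value is null. Figure 1 shows the table before and after these null and missing values are removed.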

Figure 1  The table before and after removing null/missing values

The following example demonstrates how to use dropna() to remove null and missing values:

In [3]: import numpy as np
        import pandas as pd
        # columns: 类别 = category, 书名 = title, 作者 = author
        df_obj = pd.DataFrame({"类别": ['小说', '散文随笔', '青春文学', '传记'],
                               "书名": [np.nan, '《皮囊》', '《麦田结束时》', '《老舍自传》'],
                               "作者": ["老舍", None, "张其鑫", "老舍"]})
        df_obj

Out[3]:
              类别          书名     作者
        0     小说         NaN     老舍
        1   散文随笔       《皮囊》    None
        2   青春文学   《麦田结束时》    张其鑫
        3     传记     《老舍自传》     老舍
In [4]: df_obj.dropna()   # drop rows that contain null or missing values
Out[4]:
              类别          书名     作者
        2   青春文学   《麦田结束时》    张其鑫
        3     传记     《老舍自传》     老舍

The code above first creates a DataFrame containing null and missing values, then calls dropna() on it to filter out the null or missing values, keeping only complete rows.

The output shows that every row containing a null or missing value has been removed.
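By default dropna() removes a row as soon as any column is missing, which can be stricter than intended. As a sketch reusing the book data from the example, passing subset restricts the check to the title column (书名), so the row whose author (作者) is None is kept. Because inplace defaults to False, df_obj itself is unchanged.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame({"类别": ['小说', '散文随笔', '青春文学', '传记'],
                       "书名": [np.nan, '《皮囊》', '《麦田结束时》', '《老舍自传》'],
                       "作者": ["老舍", None, "张其鑫", "老舍"]})

# Only require a title: row 1 (author is None) survives, row 0 (no title) does not.
titled = df_obj.dropna(subset=["书名"])
print(titled.index.tolist())  # [1, 2, 3]

# dropna() returned a new object; the original still has all 4 rows.
print(len(df_obj))  # 4
```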

4. Filling null/missing values

There are many ways to fill missing and null values, such as manual entry, filling with a special value, and hot-deck imputation. In pandas, the fillna() method fills null or missing values. Its syntax is as follows:

fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None,
       **kwargs)

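The meanings of some of its parameters are as follows:

(1) value: the value used for filling.

(2) method: the fill method. Defaults to None; the following values are also supported:

pad/ffill: propagate the last valid value forward, i.e. replace a missing value with the value before it.

backfill/bfill: propagate the next valid value backward, i.e. replace a missing value with the value after it.

(3) limit: when method is given, the maximum number of consecutive missing values to fill. Defaults to None (no limit).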
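The effect of the fill methods and of limit can be sketched on a small hypothetical Series. This sketch calls the Series.ffill() and Series.bfill() methods, which produce the same result as fillna(method='ffill') and fillna(method='bfill') and are the forms recommended in recent pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan])

print(s.ffill().tolist())         # [1.0, 1.0, 1.0, 4.0, 4.0]
# The trailing NaN has no later valid value, so bfill leaves it missing.
print(s.bfill().tolist())         # [1.0, 4.0, 4.0, 4.0, nan]
# With limit=1, only the first NaN in each run of consecutive NaNs is filled.
print(s.ffill(limit=1).tolist())  # [1.0, 1.0, nan, 4.0, 4.0]
```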

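Note:

The method parameter cannot be used together with the value parameter. In pandas 2.1 and later, passing method to fillna() is deprecated; the ffill() and bfill() methods are the recommended equivalents.

The value used for filling can be a scalar or a dict, or a Series or DataFrame object.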
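As an illustration of a non-scalar fill value, the sketch below (with hypothetical data) passes a Series of column means: fillna() aligns the Series index with the column labels, so each column is filled with its own mean. This is one simple form of the special-value filling mentioned earlier, not the only reasonable choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1.0, np.nan, 3.0],
                   "B": [np.nan, 4.0, 8.0]})

# df.mean() is a Series indexed by column label (A -> 2.0, B -> 6.0),
# so each column's NaN is replaced by that column's mean.
filled = df.fillna(df.mean())
print(filled.to_numpy().tolist())  # [[1.0, 6.0], [2.0, 4.0], [3.0, 8.0]]
```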

Suppose a table contains some missing values. If they are replaced with the constant 66.0, the table before and after filling looks as shown in Figure 2.

Figure 2  Example of filling missing values

Example code that fills missing values with a constant:

In [5]: import numpy as np
        import pandas as pd
        df_obj = pd.DataFrame({'A': [1, 2, 3, np.nan],
                               'B': [np.nan, 4, np.nan, 6],
                               'C': ['a', 7, 8, 9],
                               'D': [np.nan, 2, 3, np.nan]})
        df_obj
Out[5]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [6]: df_obj.fillna(66.0)  # replace missing values with the number 66.0
Out[6]:
      A     B  C     D
0   1.0  66.0  a  66.0
1   2.0   4.0  7   2.0
2   3.0  66.0  8   3.0
3  66.0   6.0  9  66.0

Comparing the two outputs shows that when a single valid value is used as the replacement, every null or missing value in the object is replaced.
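The type of the fill value matters. The sketch below (reusing columns A and D of the example) compares the number 66.0 with the string '66.0': the number keeps the columns as float64, while the string is inserted as text, so the filled cells can no longer take part in arithmetic as floats.

```python
import numpy as np
import pandas as pd

df_obj = pd.DataFrame({"A": [1, 2, 3, np.nan],
                       "D": [np.nan, 2, 3, np.nan]})

as_float = df_obj.fillna(66.0)   # numeric fill value: dtype stays float64
as_text = df_obj.fillna("66.0")  # string fill value: filled cells hold text

print(as_float["A"].dtype)                 # float64
print(as_float.loc[3, "A"] + 1)            # 67.0
print(type(as_text.loc[3, "A"]).__name__)  # str
```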

To fill different columns with different values, for example filling the missing data in column A with the number 4.0 and in column B with 5.0, the result before and after filling is shown in Figure 3.

Figure 3  Filling specified columns

To replace missing values only in specified columns, pass a dict as the value argument of fillna(), where the keys are column labels and the values are the replacement values. Example code:

In [7]: import numpy as np
        import pandas as pd
        df_obj = pd.DataFrame({'A': [1, 2, 3, np.nan],
                               'B': [np.nan, 4, np.nan, 6],
                               'C': ['a', 7, 8, 9],
                               'D': [np.nan, 2, 3, np.nan]})
        df_obj

Out[7]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [8]: df_obj.fillna({'A': 4.0, 'B': 5.0})  # fill only columns A and B
Out[8]:
     A    B  C    D
0  1.0  5.0  a  NaN
1  2.0  4.0  7  2.0
2  3.0  5.0  8  3.0
3  4.0  6.0  9  NaN

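To replace missing values with neighboring data instead, for example filling the missing data in columns A to D from top to bottom, i.e. replacing each missing value with the value that precedes it in the same column, the result before and after filling is shown in Figure 4.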

Figure 4  Example of forward filling

To forward-fill missing data, pass "ffill" as the method argument of fillna(). Example code:

In [9]: import numpy as np
        import pandas as pd
        df = pd.DataFrame({'A': [1, 2, 3, None],
                           'B': [np.nan, 4, None, 6],
                           'C': ['a', 7, 8, 9],
                           'D': [None, 2, 3, np.nan]})
        df

Out[9]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  NaN  8  3.0
3  NaN  6.0  9  NaN
In [10]: # method= is deprecated in pandas 2.1+; df.ffill() gives the same result
         df.fillna(method='ffill')  # forward-fill null/missing values
Out[10]:
     A    B  C    D
0  1.0  NaN  a  NaN
1  2.0  4.0  7  2.0
2  3.0  4.0  8  3.0
3  3.0  6.0  9  3.0


