Python Regular Expressions Explained
Updated: October 19, 2018, 16:42. Source: 乐鱼播客
Python regular expressions
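A regular expression is a special sequence of characters that lets you check conveniently whether a string matches a given pattern.
Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.
The re module gives the Python language full regular expression support.
The compile function builds a regular expression object from a pattern string and optional flag arguments. That object has a set of methods for regular expression matching and substitution.
The re module also provides module-level functions that do the same work as these methods; they take the pattern string as their first argument.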
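As a minimal sketch of the two calling styles just described (the sample text and variable names are made up for illustration), a method on a compiled pattern and the matching module-level function should give the same result:

```python
import re

# Hypothetical sample data, chosen only for illustration.
text = "order 66 shipped"

# Style 1: compile once, then call methods on the pattern object.
digits = re.compile(r"\d+")
via_object = digits.search(text)

# Style 2: module-level function; the pattern string is the first argument.
via_module = re.search(r"\d+", text)

print(via_object.group())  # 66
print(via_module.group())  # 66
```

Either style works; the compiled form is mainly useful when the same pattern is applied many times.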
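Regular expression patterns
A pattern string uses a special syntax to describe a regular expression:
Letters and digits stand for themselves: a letter or digit in a pattern matches that same character in the string.
Most letters and digits take on a different meaning when preceded by a backslash.
Many punctuation characters have a special meaning and match themselves only when escaped.
A backslash itself must be escaped with another backslash.
Because regular expressions usually contain backslashes, it is best to write them as raw strings. For example, the pattern element r'\t' (equivalent to '\\t') matches a tab character.
The table below lists the special elements of the pattern syntax. Some of these elements change meaning when you pass optional flags along with the pattern.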
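The point about raw strings can be checked with a short sketch. The variable names and the sample path are assumptions made for illustration only:

```python
import re

# A raw string keeps the backslash, so the regex engine sees \t.
raw = r"\t"
escaped = "\\t"
print(raw == escaped)  # True
print(len(raw))        # 2: a backslash and the letter t

# Both spellings match a literal tab character in the subject string.
print(re.search(raw, "a\tb").span())  # (1, 2)

# Matching one literal backslash needs \\ in the pattern,
# which is r"\\" when written as a raw string.
windows_path = "C:\\temp"  # hypothetical sample, i.e. C:\temp
print(re.search(r"\\", windows_path).start())  # 2
```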
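Pattern Description
^ Matches the start of the string (with re.MULTILINE, also the start of each line).
$ Matches the end of the string or just before a trailing newline (with re.MULTILINE, also the end of each line).
. Matches any character except a newline; when the re.DOTALL flag is given, it matches any character including a newline.
[...] A set of characters, listed individually: [amk] matches 'a', 'm', or 'k'.
[^...] Any character not in the set: [^abc] matches any character other than a, b, or c.
re* Matches 0 or more repetitions of the preceding expression.
re+ Matches 1 or more repetitions of the preceding expression.
re? Matches 0 or 1 occurrence of the preceding expression, greedily. Appending ? to a quantifier (re*?, re+?, re??) makes it non-greedy.
re{n} Matches exactly n repetitions of the preceding expression. For example, o{2} does not match the "o" in "Bob", but it does match the two o's in "food".
re{n,} Matches at least n repetitions of the preceding expression. For example, o{2,} does not match the "o" in "Bob", but it matches all the o's in "foooood". "o{1,}" is equivalent to "o+", and "o{0,}" is equivalent to "o*".
re{n,m} Matches from n to m repetitions of the preceding expression, greedily.
a|b Matches a or b.
(re) Matches the expression inside the parentheses and also marks a group.
(?imx) Turns on the optional i, m, or x flags inline. In Python's re they apply to the whole pattern and belong at its start.
(?-imx) Turns off the i, m, or x flags (Perl syntax). Python's re accepts flag removal only in the scoped form (?-imx:re).
(?:re) Like (...), but does not create a group.
(?imx:re) Turns on the i, m, or x flags inside the parentheses only (Python 3.6 and later).
(?-imx:re) Turns off the i, m, or x flags inside the parentheses only (Python 3.6 and later).
(?#...) A comment.
(?=re) Positive lookahead assertion. It succeeds if the contained expression matches at the current position and fails otherwise. It consumes no characters: once the contained expression has been tried, the rest of the pattern is still matched from the same position.
(?!re) Negative lookahead assertion. The opposite of the positive form: it succeeds when the contained expression does not match at the current position.
(?>re) Atomic (independent) group that gives up backtracking. Python's re supports it from version 3.11.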
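\w Matches letters, digits, and the underscore.
\W Matches any character that is not a letter, digit, or underscore.
\s Matches any whitespace character; for ASCII text this is equivalent to [ \t\n\r\f\v].
\S Matches any non-whitespace character.
\d Matches any digit; for ASCII text this is equivalent to [0-9].
\D Matches any non-digit.
\A Matches only at the start of the string.
\Z Matches only at the end of the string. (In Perl, \Z also matches before a final newline; Python's \Z does not.)
\z Matches at the very end of the string in Perl. Python's re has traditionally not accepted \z; use \Z instead.
\G Matches at the position where the previous match finished (Perl). Python's re does not support \G.
\b Matches a word boundary, that is, the position between a word character and a non-word character. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\n, \t, etc. Match a newline, a tab, and so on.
\1...\9 Matches the contents of the nth group.
\10 Matches the contents of group 10. In Python's re a two-digit reference is always a group reference; an octal character code needs a leading 0 or three digits (for example \010).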
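A few of the table entries are easier to see in action. The following is only an illustrative sketch; the sample strings are invented:

```python
import re

# Greedy vs. non-greedy: .* takes as much as it can, .*? as little as it can.
html = "<b>one</b><b>two</b>"  # hypothetical sample text
print(re.search(r"<b>.*</b>", html).group())   # <b>one</b><b>two</b>
print(re.search(r"<b>.*?</b>", html).group())  # <b>one</b>

# Positive lookahead: digits only when "px" follows; "px" is not consumed.
print(re.findall(r"\d+(?=px)", "10px 20em 30px"))  # ['10', '30']

# Negative lookahead: digits not followed by "px" or by another digit.
print(re.findall(r"\d+(?!px|\d)", "10px 20em 30px"))  # ['20']

# Backreference: \1 must repeat whatever group 1 matched.
print(re.search(r"\b(\w+) \1\b", "it is is fine").group())  # is is
```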
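Regular expression examples
Literal characters
Example Description
python Matches "python".
Character classes
Example Description
[Pp]ython Matches "Python" or "python".
rub[ye] Matches "ruby" or "rube".
[aeiou] Matches any one of the letters inside the brackets.
[0-9] Matches any digit; the same as [0123456789].
[a-z] Matches any lowercase letter.
[A-Z] Matches any uppercase letter.
[a-zA-Z0-9] Matches any letter or digit.
[^aeiou] Matches any character other than a, e, i, o, u.
[^0-9] Matches any character that is not a digit.
Special character classes
Example Description
. Matches any single character except "\n". To match any character including '\n', pass the re.DOTALL flag or use a pattern such as '[\s\S]'. (Inside brackets a dot is literal, so '[.\n]' matches only a dot or a newline.)
\d Matches a digit character. Equivalent to [0-9] for ASCII text.
\D Matches a non-digit character. Equivalent to [^0-9] for ASCII text.
\s Matches any whitespace character, including space, tab, form feed, and so on. Equivalent to [ \f\n\r\t\v] for ASCII text.
\S Matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v] for ASCII text.
\w Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]' for ASCII text.
\W Matches any non-word character. Equivalent to '[^A-Za-z0-9_]' for ASCII text.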
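The behavior of the dot and of \d is easy to get wrong, so here is a small sketch that checks it under Python 3. The sample strings are assumptions chosen for illustration:

```python
import re

text = "line1\nline2"  # hypothetical two-line sample

# By default the dot stops at the newline.
print(re.match(r".*", text).group())  # line1

# With re.DOTALL the dot also matches "\n".
print(re.match(r".*", text, re.DOTALL).group() == text)  # True

# [\s\S] matches any character without needing a flag.
print(re.match(r"[\s\S]*", text).group() == text)  # True

# Inside brackets the dot is literal, so [.\n] matches only "." or "\n".
print(re.findall(r"[.\n]", "a.b\nc"))

# In Python 3, \d in a str pattern also matches non-ASCII decimal digits
# unless re.ASCII is given.
arabic_indic_three = "\u0663"
print(bool(re.match(r"\d", arabic_indic_three)))            # True
print(bool(re.match(r"\d", arabic_indic_three, re.ASCII)))  # False
```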
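The re module
1. Introduction to the re module
The re module is Python's module for working with regular expressions. Its functions compile a regular expression pattern into a pattern object, whose methods you can then call.
A note on efficiency: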
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:02
# @Author : Feng Xiaoqing
# @File : test.py
# @Function: -----------
import re
import timeit

# Precompiled pattern: the compile step runs once, in setup.
print(timeit.timeit(setup=r'''import re; reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')''', stmt='''reg.match('<h1>xxx</h1>')''', number=1000000))
# Module-level function: the pattern string is passed on every call.
print(timeit.timeit(setup='''import re''', stmt=r'''re.match(r'<(?P<tagname>\w*)>.*</(?P=tagname)>', '<h1>xxx</h1>')''', number=1000000))
reg = re.compile(r'<(?P<tagname>\w*)>.*</(?P=tagname)>')
reg.match('<h1>xxx</h1>')
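Output:
0.4229613832757271
1.0246964437151256
Recommended practice: create a pattern object first, then match through that object. In this run the precompiled version is more than twice as fast, because the module-level function has to look up the compiled pattern in the module's cache on every call.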
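The re.compile(pattern[, flags]) method
re.I (re.IGNORECASE): ignore case (the full name is given in parentheses, here and below)
re.M (re.MULTILINE): multi-line mode; changes the behavior of '^' and '$'
re.S (re.DOTALL): dot-matches-all mode; changes the behavior of '.'
re.L (re.LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale
re.U (re.UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode
re.X (re.VERBOSE): verbose mode. In this mode the pattern may span several lines, whitespace is ignored, and comments may be added.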
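To illustrate verbose mode, here is one possible pair of equivalent patterns, a compact form and the same pattern spread over several commented lines. The pattern and the sample text are assumptions picked for the demonstration:

```python
import re

# Compact form.
compact = re.compile(r"\d+\.\d*")

# The same pattern in verbose mode: whitespace is ignored and # starts a comment.
verbose = re.compile(r"""
    \d+   # the integer part
    \.    # the decimal point
    \d*   # the fractional digits, if any
""", re.X)

sample = "pi is about 3.14159"  # hypothetical sample text
print(compact.search(sample).group())  # 3.14159
print(verbose.search(sample).group())  # 3.14159
```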
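The compile function compiles a regular expression and produces a pattern (Pattern) object for use with methods such as match() and search().
Syntax:
re.compile(pattern[, flags])
Parameters:
• pattern : a regular expression in string form
• flags : optional; selects the matching mode, such as ignoring case or multi-line mode. The possible values are:
1. re.I ignores case
2. re.L makes the special classes \w, \W, \b, \B, \s, \S depend on the current locale (in Python 3 it is allowed only with bytes patterns)
3. re.M multi-line mode
4. re.S makes . match any character including a newline (without it, . does not match a newline)
5. re.U makes the special classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character properties database (the default for str patterns in Python 3)
6. re.X ignores whitespace and comments after # to improve readability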
Example
>>> import re
>>> pattern = re.compile(r'\d+')  # matches one or more digits
>>> m = pattern.match('one12twothree34four')  # tries at the start of the string: no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 2, 10)  # starts at the position of 'e': no match
>>> print(m)
None
>>> m = pattern.match('one12twothree34four', 3, 10)  # starts at the position of '1': matches
>>> print(m)  # a Match object is returned
<re.Match object; span=(3, 5), match='12'>
>>> m.group(0)  # the 0 may be omitted
'12'
>>> m.start(0)  # the 0 may be omitted
3
>>> m.end(0)  # the 0 may be omitted
5
>>> m.span(0)  # the 0 may be omitted
(3, 5)
In the session above, a successful match returns a Match object, where:
• group([group1, …]) returns the string matched by one or more groups; to get the whole matched substring, call group() or group(0);
• start([group]) returns the index in the whole string at which the group's match starts (the index of its first character); the argument defaults to 0;
• end([group]) returns the index in the whole string at which the group's match ends (the index of its last character plus 1); the argument defaults to 0;
• span([group]) returns (start(group), end(group)).
Here is another example:
Example
>>> import re
>>> pattern = re.compile(r'([a-z]+) ([a-z]+)', re.I)  # re.I ignores case
>>> m = pattern.match('Hello World Wide Web')
>>> print(m)  # the match succeeds and a Match object is returned
<re.Match object; span=(0, 11), match='Hello World'>
>>> m.group(0)  # the whole matched substring
'Hello World'
>>> m.span(0)  # the indices of the whole matched substring
(0, 11)
>>> m.group(1)  # the substring matched by the first group
'Hello'
>>> m.span(1)  # the indices of the first group's substring
(0, 5)
>>> m.group(2)  # the substring matched by the second group
'World'
>>> m.span(2)  # the indices of the second group's substring
(6, 11)
>>> m.groups()  # equivalent to (m.group(1), m.group(2), ...)
('Hello', 'World')
>>> m.group(3)  # there is no third group
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
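Methods of the re module:
(1) The re.match function
re.match tries to match a pattern at the start of the string; if the pattern does not match at the start, match() returns None.
Function syntax:
re.match(pattern, string, flags=0)
Parameters:
Parameter Description
pattern The regular expression to match.
string The string to match against.
flags Flag bits that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the list of optional flags under re.compile above.
On success, re.match returns a match object; otherwise it returns None.
You can use the match object's group(num) or groups() methods to retrieve the matched text.
Match object method Description
group(num=0) The string matched by the whole expression. group() also accepts several group numbers at once, in which case it returns a tuple of the corresponding values.
groups() Returns a tuple containing the strings of all groups, from group 1 up to the last group.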
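The two match object methods in the table can be sketched as follows. The date pattern and the input strings are hypothetical and serve only to show the return shapes:

```python
import re

# Hypothetical sample: a date in year-month-day form.
m = re.match(r"(\d{4})-(\d{2})-(\d{2})", "2018-10-19 16:42")

print(m.group())      # 2018-10-19  (whole match; same as m.group(0))
print(m.group(1))     # 2018
print(m.group(1, 3))  # ('2018', '19')  several group numbers give a tuple
print(m.groups())     # ('2018', '10', '19')

# A failed match returns None, so test the result before calling group().
miss = re.match(r"(\d{4})-(\d{2})-(\d{2})", "no date here")
print(miss is None)   # True
```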
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

print(re.match('www', 'www.runoob.com').span())  # matches at the start
print(re.match('com', 'www.runoob.com'))  # does not match at the start
The example above prints:
(0, 3)
None
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:55
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M | re.I)
if matchObj:
    print("matchObj.group() :", matchObj.group())
    print("matchObj.group(1) :", matchObj.group(1))
    print("matchObj.group(2) :", matchObj.group(2))
else:
    print("No match!!")
The example above prints:
matchObj.group() : Cats are smarter than dogs
matchObj.group(1) : Cats
matchObj.group(2) : smarter
(2) The re.search method
re.search scans the whole string and returns the first successful match.
Function syntax:
re.search(pattern, string, flags=0)
Parameters:
Parameter Description
pattern The regular expression to match.
string The string to search.
flags Flag bits that control how the regular expression is matched, such as case sensitivity or multi-line matching.
On success, re.search returns a match object; otherwise it returns None.
You can use the match object's group(num) or groups() methods to retrieve the matched text.
Match object method Description
group(num=0) The string matched by the whole expression. group() also accepts several group numbers at once, in which case it returns a tuple of the corresponding values.
groups() Returns a tuple containing the strings of all groups, from group 1 up to the last group.
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

print(re.search('www', 'www.runoob.com').span())  # matches at the start
print(re.search('com', 'www.runoob.com').span())  # matches away from the start
The example above prints:
(0, 3)
(11, 14)
Example
#!/usr/bin/python
import re

line = "Cats are smarter than dogs"
searchObj = re.search(r'(.*) are (.*?) .*', line, re.M | re.I)
if searchObj:
    print("searchObj.group() :", searchObj.group())
    print("searchObj.group(1) :", searchObj.group(1))
    print("searchObj.group(2) :", searchObj.group(2))
else:
    print("Nothing found!!")
The example above prints:
searchObj.group() : Cats are smarter than dogs
searchObj.group(1) : Cats
searchObj.group(2) : smarter
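The difference between re.match and re.search
re.match matches only at the start of the string; if the start of the string does not fit the regular expression, the match fails and the function returns None. re.search scans through the whole string until it finds a match.
Example
#!/usr/bin/python
import re

line = "Cats are smarter than dogs"
matchObj = re.match(r'dogs', line, re.M | re.I)
if matchObj:
    print("match --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")
matchObj = re.search(r'dogs', line, re.M | re.I)
if matchObj:
    print("search --> matchObj.group() :", matchObj.group())
else:
    print("No match!!")
The example above prints:
No match!!
search --> matchObj.group() : dogs
match starts at the beginning of the string; if nothing matches there, it returns None.
search tries the beginning first, then each following position in turn, and returns only the first result.
match does the least work because it tries a single starting position, but the pattern must then describe the string correctly from its very first character.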
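One detail worth checking is how the two functions interact with multi-line mode. The sketch below uses an invented two-line string:

```python
import re

text = "cats\ndogs"  # hypothetical two-line sample

# match tries only position 0, even in multi-line mode.
print(re.match(r"dogs", text, re.M))  # None

# search with ^ and re.M can succeed at the start of any line.
print(re.search(r"^dogs", text, re.M).span())  # (5, 9)

# Without re.M, ^ anchors search to the start of the string, like match.
print(re.search(r"^dogs", text))  # None
```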
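(3) The split method
re.split
The split method splits the string wherever the pattern matches and returns the pieces as a list. Its form is:
re.split(pattern, string[, maxsplit=0, flags=0])
Parameter Description
pattern The regular expression to match.
string The string to split.
maxsplit The maximum number of splits; maxsplit=1 splits once. The default, 0, means no limit.
flags Flag bits that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the list of optional flags under re.compile above.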
Example
>>> import re
>>> re.split(r'\W+', 'runoob, runoob, runoob.')
['runoob', 'runoob', 'runoob', '']
>>> re.split(r'(\W+)', ' runoob, runoob, runoob.')
['', ' ', 'runoob', ', ', 'runoob', ', ', 'runoob', '.', '']
>>> re.split(r'\W+', ' runoob, runoob, runoob.', maxsplit=1)
['', 'runoob, runoob, runoob.']
>>> re.split('a+', 'hello world')  # if the pattern matches nowhere, split leaves the string whole
['hello world']
split(string[, maxsplit])
Splits string wherever the pattern matches and returns a list. maxsplit sets the maximum number of splits; if it is omitted, every possible split is made.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

p = re.compile(r'\d+')
print(p.split('one1two2three3four4'))
Output:
['one', 'two', 'three', 'four', '']
(4) sub: search and replace
Python's re module provides re.sub for replacing matches in a string.
Syntax:
re.sub(pattern, repl, string, count=0, flags=0)
Parameters:
• pattern : the regular expression pattern string.
• repl : the replacement string; it may also be a function.
• string : the original string to be searched and replaced.
• count : the maximum number of replacements after matching; the default, 0, replaces every match.
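The count parameter and group references in repl can be tried out with a short sketch like this one. The sample text is made up:

```python
import re

text = "2018-04-29 and 2018-10-19"  # hypothetical sample text

# count limits how many matches are replaced (0, the default, means all).
print(re.sub(r"\d{4}", "YYYY", text))           # YYYY-04-29 and YYYY-10-19
print(re.sub(r"\d{4}", "YYYY", text, count=1))  # YYYY-04-29 and 2018-10-19

# In a replacement string, \1, \2, ... refer to the groups of each match.
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text))
# 29/04/2018 and 19/10/2018
```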
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:33
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

phone = "2004-959-559 # this is a foreign phone number"

# Remove the Python-style comment from the string
num = re.sub(r'#.*$', "", phone)
print("Phone number:", num)

# Remove everything that is not a digit (including the hyphens)
num = re.sub(r'\D', "", phone)
print("Phone number :", num)
The example above prints:
Phone number: 2004-959-559
Phone number : 2004959559
repl as a function
The following example multiplies each matched number in the string by 2:
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

# Multiply the matched number by 2
def double(matched):
    value = int(matched.group('value'))
    return str(value * 2)

s = 'A23G4HFD567'
print(re.sub(r'(?P<value>\d+)', double, s))
Output:
A46G8HFD1134
(5) The findall method
Finds every substring of the string that the regular expression matches and returns them as a list; if nothing matches, it returns an empty list.
Note: match and search find one match, while findall finds all of them.
Syntax:
findall(string[, pos[, endpos]])
Parameters:
• string : the string to search.
• pos : optional; the position in the string at which to start. Defaults to 0.
• endpos : optional; the position in the string at which to stop. Defaults to the length of the string.
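What findall returns depends on how many groups the pattern contains, which is easy to overlook. A minimal sketch, using an invented sample string:

```python
import re

text = "x=1, y=22, z=333"  # hypothetical sample text

# No groups: the list holds the whole matches.
print(re.findall(r"\w=\d+", text))      # ['x=1', 'y=22', 'z=333']

# One group: the list holds that group's text only.
print(re.findall(r"\w=(\d+)", text))    # ['1', '22', '333']

# Several groups: the list holds one tuple per match.
print(re.findall(r"(\w)=(\d+)", text))  # [('x', '1'), ('y', '22'), ('z', '333')]

# pos and endpos are available on the compiled pattern's method.
pattern = re.compile(r"\d+")
print(pattern.findall(text, 5))         # ['22', '333']
```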
Finding all the numbers in a string:
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

pattern = re.compile(r'\d+')  # find numbers
result1 = pattern.findall('runoob 123 google 456')
result2 = pattern.findall('run88oob123google456', 0, 10)
print(result1)
print(result2)
Output:
['123', '456']
['88', '12']
(6) The finditer method
Like findall, it finds every substring of the string that the regular expression matches, but it returns them as an iterator.
re.finditer(pattern, string, flags=0)
Parameters:
Parameter Description
pattern The regular expression to match.
string The string to search.
flags Flag bits that control how the regular expression is matched, such as case sensitivity or multi-line matching. See the list of optional flags under re.compile above.
Example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 22:15
# @Author : Feng Xiaoqing
# @File : test2.py
# @Function: -----------
import re

it = re.finditer(r"\d+", "12a32bc43jf3")
for match in it:
    print(match.group())
Output:
12
32
43
3
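Because finditer yields Match objects rather than plain strings, the position of each match is available as well. A brief sketch on the same input string (the `spans` list is an illustrative name, not part of the original example):

```python
import re

# Each item from finditer is a Match object, so positions are available too.
for m in re.finditer(r"\d+", "12a32bc43jf3"):
    print(m.group(), m.span())
# 12 (0, 2)
# 32 (3, 5)
# 43 (7, 9)
# 3 (11, 12)

# Collect only the positions of the matches.
spans = [m.span() for m in re.finditer(r"\d+", "12a32bc43jf3")]
```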
group()
group(0) group(1) group("tagname")
groups()
groupdict()
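The four calls listed above can be compared side by side. This sketch assumes a tag-matching pattern with a named group, modeled on the one used elsewhere in this article; the input string is invented:

```python
import re

# Pattern with a named group; the sample input is hypothetical.
m = re.search(r"<(?P<tagname>\w+)>(.*)</(?P=tagname)>", "ab<h1>xxx</h1>cd")

print(m.group())           # <h1>xxx</h1>
print(m.group(0))          # <h1>xxx</h1>
print(m.group(1))          # h1
print(m.group("tagname"))  # h1
print(m.groups())          # ('h1', 'xxx')
print(m.groupdict())       # {'tagname': 'h1'}
```

Note that groupdict() contains only the named groups, while groups() contains every group.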
findall
import re
p = re.compile(r'\d+')
print(p.findall('one1two2three3four4'))
Output:
['1', '2', '3', '4']
finditer
sub
split on \d+
'one1two2three3four4'
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/4/29 20:24
# @Author : fengxiaoqing
# @File : test.py
'''<h1>xxx</h1> Compare the efficiency of different matching approaches'''
import re
import timeit

# print(timeit.timeit(setup='''import re; reg = re.compile('<(?P<tagname>\w*)>.*</(?P=tagname)>')''', stmt='''reg.match('<h1>xxx</h1>')''', number=1000000))
# print(timeit.timeit(setup='''import re''', stmt='''re.match('<(?P<tagname>\w*)>.*</(?P=tagname)>', '<h1>xxx</h1>')''', number=1000000))
s = "ab<h1>xxx</h1>dsafasdf<html>sdfads</html>"
reg = re.compile(r"(<(?P<tag>\w+)>(.*)</(?P=tag)>)")
print(reg.match(s))  # None: s starts with "ab", not with a tag
print(reg.search(s).group(3))  # xxx: the text inside the first tag pair
print(reg.findall(s))  # one tuple of three groups per matched tag pair
# print(reg.findall(s)[1])
# print(reg.findall(s)[2])
# reg.split(s)
# reg.findall(s)
# reg.groups(s)
x = '1one2two3three4four'
reg1 = re.compile(r"\d")
print(reg1.findall(x))  # ['1', '2', '3', '4']
print(reg1.split(x))  # ['', 'one', 'two', 'three', 'four']
×÷ÕߣºÀÖÓã²¥¿ÍÈ˹¤ÖÇÄÜ+PythonÅàѵѧԺ
Ê×·¢£ºhttp://python.itcast.cn/