Ruby Study Notes: Regular Expressions

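1. Creating regular expressions

a) reg1 = /^[a-z]*$/ # write the pattern between two forward slashes; the literal evaluates to a Regexp object

b) reg2 = Regexp.new('^[a-z]*$') # build a Regexp object from a pattern string

c) reg3 = %r{^[a-z]*$} # use the %r prefix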
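
As a quick check that the three forms really are interchangeable, here is a minimal sketch. The method name build_patterns is made up for this illustration and is not part of Ruby.

```ruby
# Builds the same pattern in the three ways listed above.
# build_patterns is a hypothetical helper used only for this example.
def build_patterns
  literal   = /^[a-z]*$/              # literal between forward slashes
  from_new  = Regexp.new('^[a-z]*$')  # constructor taking a pattern string
  percent_r = %r{^[a-z]*$}            # %r form, handy when the pattern contains "/"
  [literal, from_new, percent_r]
end

patterns = build_patterns
puts patterns.all? { |re| re.is_a?(Regexp) }        # true
puts patterns.all? { |re| re == patterns.first }    # true
puts patterns.first.source                          # ^[a-z]*$
```

Two Regexp objects compare equal when their source text and options are the same, which is why all three compare equal here.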

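2. Matching: String and Regexp both support the following two methods

a) match method: returns a MatchData instance when the match succeeds; otherwise returns nil.

b) =~ operator: returns the index (an Integer) at which the match starts when it succeeds; otherwise returns nil.

Example:

puts( /abc/ =~ 'abc' ) #=> 0

puts( /abc/ =~ 'cdg' ) #=> nil (puts prints an empty line)

puts( /abc/.match('abc') ) #=> abc

puts( /abc/.match('cdg') ) #=> nil (puts prints an empty line)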
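
Because both methods return nil on failure, either one can be used directly as a condition. The following sketch shows that; describe_match is a name invented for this example.

```ruby
# Returns a short description of where (or whether) pattern matches text.
# describe_match is a hypothetical helper, not a Ruby built-in.
def describe_match(pattern, text)
  index = pattern =~ text        # Integer index, or nil on failure
  data  = pattern.match(text)    # MatchData, or nil on failure
  if index
    "matched #{data[0].inspect} at index #{index}"
  else
    "no match"
  end
end

puts describe_match(/abc/, 'xxabc')   # matched "abc" at index 2
puts describe_match(/abc/, 'cdg')     # no match
puts('xxabc' =~ /abc/)                # 2 (String#=~ works the same way)
```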

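3. Capture groups

In a Ruby regular expression you can capture one or more substrings by enclosing parts of the pattern in parentheses; the text matched by each parenthesized group is saved. The following pattern has two groups, (hi) and (h...o):

/(hi).*(h...o)/ =~ "The word 'hi' is short for 'hello'."

When the match succeeds, the captured values are assigned to variables (one per group in the pattern), which can be read as $1, $2, $3, and so on. After running the line above, $1 and $2 hold the two captures: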

print( $1, " ", $2, "\n" ) #=> hi hello

Note: if the pattern as a whole fails to match, none of these variables is set; $1, $2, and so on are all nil.
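
A small sketch of the same idea applied to a "key=value" line. The helper name parse_pair and the sample strings are assumptions made for this illustration.

```ruby
# Splits a "key=value" line using two capture groups.
# Returns [key, value] on success, or nil when the line does not match.
# parse_pair is a hypothetical helper used only for this example.
def parse_pair(line)
  if /(\w+)=(\w+)/ =~ line
    [$1, $2]        # $1 and $2 hold the two captured groups
  end
end

p parse_pair('color=red')    # ["color", "red"]
p parse_pair('no pair')      # nil

/(\w+)=(\w+)/ =~ 'no pair'   # a failed match leaves the group variables nil
p $1                         # nil
```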

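4. The MatchData type

As mentioned above, =~ returns an integer or nil, whereas the match method returns a MatchData object that holds the result of the match. At first glance it looks like a string:

puts( /cde/.match('abcdefg') ) #=> cde

puts( /cde/=~('abcdefg') ) #=> 2

In fact, it is an instance of the MatchData class that wraps the matched string:

p( /cde/.match('abcdefg') ) #=> #<MatchData "cde">

You can call to_a or captures on a MatchData object to get an array of its values: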

x = /(^.*)(#)(.*)/.match( 'def myMethod # This is a very nice method' )

x.captures.each{ |item| puts( item ) }

The code above prints:

def myMethod

#

 This is a very nice method

Note: captures and to_a differ slightly: to_a also includes the entire matched string as its first element.

x.captures #=>["def myMethod ","#"," This is a very nice method"]

x.to_a #=>["def myMethod # This is a very nice method","def myMethod ","#"," This is a very nice method"]
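
A MatchData object can also be indexed like an array: index 0 is the whole match and 1, 2, ... are the groups. The sketch below gathers these views in one place; match_parts and the sample text are made up for the example.

```ruby
# Collects the pieces of a MatchData object into a Hash for inspection.
# match_parts is a hypothetical helper used only for this example.
def match_parts(pattern, text)
  m = pattern.match(text)
  return nil unless m
  {
    whole:    m[0],        # entire matched text
    first:    m[1],        # first capture group
    captures: m.captures,  # groups only
    to_a:     m.to_a       # whole match followed by the groups
  }
end

parts = match_parts(/(\d+)-(\d+)/, 'pages 10-25 only')
p parts[:whole]      # "10-25"
p parts[:first]      # "10"
p parts[:captures]   # ["10", "25"]
p parts[:to_a]       # ["10-25", "10", "25"]
```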

5. The pre_match and post_match methods

a) pre_match (or $`): returns the text before the match

b) post_match (or $'): returns the text after the match

x = /#/.match( 'def myMethod # This is a very nice method' )

puts( x.pre_match ) #=> def myMethod

puts( x.post_match ) #=> This is a very nice method
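
Together with the match itself, these two methods cut a string into three parts. A minimal sketch follows, where split_around is a name invented for this example.

```ruby
# Splits text around the first match of pattern.
# Returns [before, matched, after], or nil if there is no match.
# split_around is a hypothetical helper used only for this example.
def split_around(pattern, text)
  m = pattern.match(text)
  m && [m.pre_match, m[0], m.post_match]
end

p split_around(/#/, 'x = 1 # set x')   # ["x = 1 ", "#", " set x"]
p split_around(/#/, 'no comment')      # nil

# The special variables give the same pieces for the most recent match.
/#/ =~ 'x = 1 # set x'
p $`   # "x = 1 "
p $'   # " set x"
```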

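6. Greedy matching

When a pattern could match substrings of different lengths at the same position, sometimes you want the shortest one and sometimes the longest one.

Matching as much text as possible is called greedy matching. The quantifiers * (0 or more) and + (1 or more) are greedy by default. Appending ? to a quantifier (for example *? or +?) makes it match as little as possible instead. On its own, ? means 0 or 1.

puts( /.*at/.match('The cat sat on the mat!') ) #=> The cat sat on the mat

puts( /.*?at/.match('The cat sat on the mat!') ) #=> The cat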
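
The difference matters most when a delimiter appears more than once. The sketch below pulls out quoted text both ways; first_quoted and its lazy: keyword are assumptions made for this illustration.

```ruby
# Returns the text captured between double quotes, using either a greedy
# or a lazy quantifier. first_quoted is a hypothetical helper.
def first_quoted(text, lazy:)
  pattern = lazy ? /"(.*?)"/ : /"(.*)"/
  m = pattern.match(text)
  m && m[1]
end

text = 'say "hi" and "bye"'
puts first_quoted(text, lazy: false)   # hi" and "bye
puts first_quoted(text, lazy: true)    # hi
```

The greedy form runs to the last closing quote in the string, while the lazy form stops at the first one.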

7. String methods

a) =~ and match: used the same way as on Regexp.

b) String#scan(pattern): finds every match in the string and returns all of them in an array, whereas match stops at the first one.

TESTSTR = "abc is not cba"

b = /[abc]/.match( TESTSTR ) #=> #<MatchData "a">

puts( "--scan--" )

a = TESTSTR.scan(/[abc]/) #=> Array: ["a", "b", "c", "c", "b", "a"]

You can also pass a block to scan:

a = TESTSTR.scan(/[abc]/){|c| print( c.upcase ) } #=> prints ABCCBA; with a block, scan returns the original string
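
When the pattern contains capture groups, scan returns one array of group values per match rather than the whole matched text. A minimal sketch, with all_pairs as a made-up helper name:

```ruby
# Returns every "key=value" pair found in text as [key, value] arrays.
# all_pairs is a hypothetical helper used only for this example.
def all_pairs(text)
  text.scan(/(\w+)=(\w+)/)
end

p 'a1 b22 c333'.scan(/\d+/)     # ["1", "22", "333"]
p all_pairs('x=1, y=2; z=3')    # [["x", "1"], ["y", "2"], ["z", "3"]]
```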

c) String#split(pattern): splits the string wherever pattern matches and returns an array; if the pattern is empty (//), the string is split into individual characters.

s = "def myMethod # a comment"

p( s.split( /m.*d/ ) ) #=> ["def ", " # a comment"]

p( s.split( /\s/ ) ) #=> ["def", "myMethod", "#", "a", "comment"]

p( s.split( // ) ) #=> ["d", "e", "f", " ", "m", "y", "M", "e", "t", "h", "o", "d", " ", "#", " ", "a", " ", "c", "o", "m", "m", "e", "n", "t"]
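
A regular expression separator lets split absorb optional whitespace, and a capture group in the separator keeps the separators in the result. The helper name fields below is an assumption made for this sketch.

```ruby
# Splits a comma-separated line, ignoring whitespace around the commas.
# fields is a hypothetical helper used only for this example.
def fields(line)
  line.split(/\s*,\s*/)
end

p fields('a , b,c ,  d')       # ["a", "b", "c", "d"]
p 'a1b22c'.split(/\d+/)        # ["a", "b", "c"]
p 'a1b22c'.split(/(\d+)/)      # ["a", "1", "b", "22", "c"]
```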

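d) String#slice(pattern): returns the matched substring and leaves the original string unchanged.

String#slice!(pattern): returns the matched substring and deletes it from the original string (the original is modified).

s = "def myMethod # a comment "

puts( s.slice( /m.*d/ ) ) #=> myMethod

puts( s ) #=> def myMethod # a comment

puts( s.slice!( /m.*d/ ) ) #=> myMethod

puts( s ) #=> def  # a comment (the spaces on both sides of the removed text remain)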
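
Since slice! modifies its receiver, it is often applied to a copy. The sketch below uses it to separate a trailing comment from a line of code; strip_comment is a name invented for this example.

```ruby
# Separates a trailing "# ..." comment from a line of code.
# Returns [code, comment]; comment is nil when the line has none.
# strip_comment is a hypothetical helper used only for this example.
def strip_comment(line)
  code = line.dup                    # copy, so the caller's string is untouched
  comment = code.slice!(/\s*#.*/)    # removes the match from code and returns it
  [code, comment]
end

original = 'x = 1 # set x'
p strip_comment(original)     # ["x = 1", " # set x"]
p original                    # "x = 1 # set x"
p original.slice(/#.*/)       # "# set x"
p strip_comment('y = 2')      # ["y = 2", nil]
```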

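8. Regular expression matching rules

Rule	Description
/a/	Matches the character a.
/\?/	Matches the literal character ?. Special characters include ^, $, ?, ., /, \, [, ], {, }, (, ), +, *.
.	Matches any single character (except a newline by default); for example, /a./ matches ab and ac.
/[ab]c/	Matches ac and bc. Square brackets define a character class, which may contain ranges, for example /[a-z]/ and /[a-zA-Z0-9]/.
/[^a-zA-Z0-9]/	Matches any single character that is not in the listed ranges.
/[\d]/	Matches any digit.
/[\w]/	Matches any letter, digit, or _.
/[\s]/	Matches a whitespace character, including space, tab, and newline.
/[\D]/, /[\W]/, /[\S]/	The negations of the three classes above.
?	Matches zero or one occurrence of the preceding element.
*	Matches zero or more occurrences of the preceding element.
+	Matches one or more occurrences of the preceding element.
/\d{3}/	Matches exactly 3 digits.
/\d{1,10}/	Matches 1 to 10 digits.
/\d{3,}/	Matches 3 or more digits.
/([A-Z]\d){5}/	Matches five consecutive repetitions of an uppercase letter followed by a digit, for example A1B2C3D4E5. (One uppercase letter followed by four digits would be /[A-Z]\d{4}/.)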
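
A few of the rules above can be tried out directly. This is a minimal sketch with some assumptions: the phone_like? helper and the sample strings are made up, the \A and \z anchors (start and end of string) are not in the table, and match? requires Ruby 2.4 or later.

```ruby
# Checks whether text has the form ddd-dddd (three digits, a dash, four digits).
# phone_like? is a hypothetical helper used only for this example.
def phone_like?(text)
  /\A\d{3}-\d{4}\z/.match?(text)
end

puts phone_like?('555-1234')                   # true
puts phone_like?('55-1234')                    # false
p 'a1 b22 c333'.scan(/\d{2,}/)                 # ["22", "333"]
p 'colour color'.scan(/colou?r/)               # ["colour", "color"]
p 'A1B2C3D4E5'.match?(/\A([A-Z]\d){5}\z/)      # true
p 'A1234'.match?(/\A([A-Z]\d){5}\z/)           # false
p 'A1234'.match?(/\A[A-Z]\d{4}\z/)             # true
```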