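As stated earlier, regex is primarily a linear, single-rule engine: you can choose between greedy and non-greedy capture, but you cannot have both at once. Also, most regex engines do not support overlapping matches (and even those that do fake it with substrings or forced head movement), because overlapping does not fit the regex model either.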
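To make the greedy/non-greedy limitation concrete, here is a small sketch using Python's standard `re` module. The lookahead trick shown here is one common way to fake overlapping matches; the sample string and the variable names are only illustrative. Each start position still yields at most one match, so neither pattern can return every substring that begins with `B` and ends with `A`.

```python
import re

data = "BADACBA"

# A capturing group inside a zero-width lookahead lets the engine report
# a match at every start position without consuming any characters.
lazy_matches = re.findall(r"(?=(B.*?A))", data)   # shortest match per start
greedy_matches = re.findall(r"(?=(B.*A))", data)  # longest match per start

print(lazy_matches)    # ['BA', 'BA']
print(greedy_matches)  # ['BADACBA', 'BA']
```

Note that `'BADA'` appears in neither list: for a given start position the pattern is either lazy or greedy, never both.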
If you are only looking for simple overlapping matches between two substrings, you can implement it yourself:
def find_substrings(data, start, end):
    result = []
    s_len = len(start)  # a shortcut for `start` length
    e_len = len(end)  # a shortcut for `end` length
    current_pos = data.find(start)  # find the first occurrence of `start`
    while current_pos != -1:  # loop while we can find `start` in our data
        # find the first occurrence of `end` after the current occurrence of `start`
        end_pos = data.find(end, current_pos + s_len)
        while end_pos != -1:  # loop while we can find `end` after the current `start`
            end_pos += e_len  # just so we include the selected substring
            result.append(data[current_pos:end_pos])  # add the current substring
            end_pos = data.find(end, end_pos)  # find the next `end` after the curr. `start`
        current_pos = data.find(start, current_pos + s_len)  # find the next `start`
    return result
Which will yield:
substrings = find_substrings("BADACBA", "B", "A")
# ['BA', 'BADA', 'BADACBA', 'BA']
But you will have to modify it for more complex matches.
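As one possible direction for such a modification, the sketch below swaps the literal `str.find` lookups for regular-expression delimiters. The function name `find_pattern_substrings` and its parameters are hypothetical, not part of the answer's original code, and it assumes both patterns match at least one character. Unlike the literal version, it advances one character at a time, so delimiters that overlap each other are also considered.

```python
import re

def find_pattern_substrings(data, start_pattern, end_pattern):
    """Return every substring that begins with a match of `start_pattern`
    and ends with a later match of `end_pattern` (hypothetical helper)."""
    start_re = re.compile(start_pattern)
    end_re = re.compile(end_pattern)
    result = []
    pos = 0
    while pos <= len(data):
        start_match = start_re.search(data, pos)  # next start delimiter
        if start_match is None:
            break
        end_search_pos = start_match.end()  # ends must come after the start
        while end_search_pos <= len(data):
            end_match = end_re.search(data, end_search_pos)
            if end_match is None:
                break
            result.append(data[start_match.start():end_match.end()])
            end_search_pos = end_match.start() + 1  # look for a later end
        pos = start_match.start() + 1  # look for a later start
    return result

print(find_pattern_substrings("BADACBA", "B", "A"))
# ['BA', 'BADA', 'BADACBA', 'BA']
print(find_pattern_substrings("x1ab22", r"\d", r"[a-z]"))
# ['1a', '1ab']
```

The cost grows with the number of start matches times the number of end matches, which is acceptable for short inputs but may need rethinking for large ones.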