kmp字符串匹配

问题:给定字符串text和模式串pattern,问pattern在text中出现过多少次?

KMP算法可以在线性时间内找出所有匹配,其优势在于最坏情况下的时间复杂度仍为O(n+m),其中n和m分别为text和pattern的长度。

#include <stdio.h>
#include <string.h>
char text[1000005], pattern[10005];
int next[10005];
void getnext(char *p, int m) {
    int i = 0, j = -1;
    next[0] = -1;
    while (i < m) {
        if (-1 == j || p[j] == p[i])
            i++, j++, next[i] = j;
        else j = next[j];
    }
}
int kmp(char *t, int n, char *p, int m) {
    int i = 0, j = 0, cnt = 0;
    getnext(p, m);
    while (i < n) {
        if (-1 == j || p[j] == t[i])
            i++, j++;
        else j = next[j];
        if (j == m) {
            cnt++; // find a match, index from i-j to i-1.
            j = 0; // or j = next[j] if overlap is ok
        }
    }
    return cnt;
}
int main() {
    while (~scanf("%s%s", text, pattern))
        printf("%d\n", kmp(text, strlen(text), pattern, strlen(pattern)));
    return 0;
}

关于next数组的理解可参考下图:

Table of Contents