霍洛维兹《数据结构基础》|数据结构基础（多维数组 | 字符串） FODS|霍洛维兹|数据结构

文章图片

前言：
最近在读霍罗维兹的《数据结构基础》（Fundamentals of Data Structures in C），本篇博客为阅读笔记和知识总结。

目录：
Ⅰ.多维数组的表示
0x00 定义
0x01 按行存储
Ⅱ.字符串
0x00概念
0x01字符串的抽象数据类型
0x02在C语言中的字符串
0x03 模式匹配
Ⅰ.多维数组的表示 (REPRESENTATION OF MULTIDIMENSIONAL ARRAYS)
0x00 定义
一个数组被声明为
文章图片
，那么数组中的元素数为
文章图片
.
表示多维数组的两种常见方式：① 按行存储 ② 按列存储。

0x01 按行存储
(row major order)
我们将二维数组
文章图片
解释为
文章图片
，这样每一行都包含
文章图片
元素。
我们假设
文章图片
是的地址，则
文章图片
的地址为
文章图片
.
为了表示三维数组
文章图片
，我们可以将其解释为
文章图片
个
文章图片
的二维数组。那么
文章图片
的地址为
文章图片
.
根据上述讨论，我们可以得到寻址公式 ——
声明为
文章图片
的数组中的元素
文章图片
.

文章图片
是
文章图片
的地址，
文章图片
的地址为：
【霍洛维兹《数据结构基础》|数据结构基础（多维数组 | 字符串）】
文章图片

Ⅱ.字符串 (STRINGS)
0x00概念
字符串是一个由零或者多个字符组成的有限序列，
文章图片
, 其中
文章图片
为字符。

0x01字符串的抽象数据类型

文章图片

0x02在C语言中的字符串
表示方式
在C语言中，我们将字符串表示为以零字符 ' \0 ' 结尾的字符数组。

#define MAX_SIZE 100 char s[MAX_SIZE] = "dog"; char t[MAX_SIZE] = "house";

文章图片

Alternative declaration：

char s[] = "dog"; char t[] = "house";

通过调用 strcat(s,t) 将这两个字符串串联起来，将结果存储在s中。
尽管 s 的长度增加了 5 个，但我们在 s 中没有额外的空间来存储额外的 5 个字符。
大多数 C 编译器只是简单地覆盖内存以容纳额外的五个字符。

例 2.2 [字符串插入]

# include # define MAX_SIZE 100 char string1 [MAX_SIZE], *s = string1; char string2 [MAX_SIZE], *t = string2;

文章图片

void strnins(char* s, char* t, int i) { /* insert string t into string s at position i */ char string[MAX_SIZE], * temp = string; if (i<0 && i>strlen(s)) { fprint(stderr, "Position is out of bounds "); exit(1); } if (!strlen(s)) strcpy(s, t); else if (strlen(t)) { strncpy(temp, s, i); strcat(temp, t); strcat(temp, (s + i)); strcpy(s, temp); } }

0x03 模式匹配
讨论一个非常精妙的字符串方法。给定字符串 pat、string，
要在 string 中查找模式串 pat，最简单的方法就是调用内部函数 strstr，给定以下语句：

char pat[MAX_SIZE], string[MAX_SIZE], * t;

为了确定 pat 是否在字符串中，我们可以：

if (t = strstr(string, pat)) printf("The string from strstr is: %s", t); else printf("The pattern was not found with strstr");

对于调用 t = strstr(string, pat) 的返回：
① pat 不在字符串中，则返回一个空指针；
② pat 在字符串中，则返回一个指向 pat 起始位置的指针。

开发属于我们自己的模式匹配函数的原因：
① 我们所使用的编译器可能没有strstr这个函数。
② 有几种不同的方法来实现模式匹配函数。

一个简单的匹配算法：

在字符串的每个位置 i，检查 k if pat == string[i+strlen(pat)-1]

如果 pat 不在字符串中，该算法的计算时间为

文章图片
.
其中 n 为 pat 的长度，m为字符串的长度。

优化点：
① 当 strlen(pat) 大于字符串的剩余字符数时退出。
② 在检查剩余字符之前，先检查 pat 的第一个和最后一个字符。

文章图片

[程序 2.13]

int nfind(chaqr* string, char* pat) { /* match the last character of the pattern first, and then match from the beginning */ int i, j, start = 0; int lasts = strlen(string) - 1; int lastp = strlen(pat) - 1; int endmatch = lastp; for (i = 0; endmatch <= lasts; endmatch++, start++) { if (string[endmatch] == pat[lastp]) for (j = 0, i = start; j < lastp && string[i] == pat[j]; i++, j++) ; if (j == lastp) return start; /* successful */ } return -1; }

分析：
对于 String = "aa...a" 和 pat = "aa...ab" ，计算时间为

文章图片
.
对于 String = "aa...a" 和 pa t= "aa...aba" ，计算时间仍然是

文章图片
.

KMP算法
当发生不匹配时，利用我们对模式中的字符和模式中发生不匹配的位置的了解，
来决定我们应该在哪里继续搜索。
我们想在字符串中搜索该模式，而不在字符串中向后移动。

文章图片

定义
模式串

文章图片
的失配函数

文章图片
定义为：
$霍洛维兹《数据结构基础》|数据结构基础（多维数组 | 字符串）$
文章图片

文章图片

模式匹配的规则
如果发现一个部分匹配，使得

文章图片
，
那么可以通过比较

文章图片
和

文章图片
来恢复匹配。
如果

文章图片
，那么我们可以通过比较

文章图片
和

文章图片
来继续。

申明

#include #include #define max_string_size 100 #define max_pattern_size 100 int pmatch(); void fail(); int failure[max_pattern_size]; char string[max_string_size]; char pat[max_pattern_size];

[程序 2.14]

int pmatch(char* string, char* pat) { /* Knuth, Morris, Pratt string matching algorithm */ int i = 0, j = 0; int lens = strlen(string); int lenp = strlen(pat); while (i < lens && j < lenp) { if (string[i] == pat[j]) { i++; j++; } else if (j == 0) i++; else j = failure[j - 1] + 1; } return ((j == lenp) ? (i - lenp) : -1); }

分析 pmatch 函数
while 循环被迭代，直到抵达字符串或模式的尾部。
每次迭代中，会发生以下三个动作之一：
① 增加 i
② 同时增加 i 和 j
③ 将 j 重置为失败 [ j - 1 ]+ 1
（这里不能超过 j 被语句 j++ 增加的次数）
请注意，j 的增量不能超过 m = strlen(string) 次，因此，pmatch 的复杂度为

文章图片
.

另一种定义方式
如果失配函数的计算时间是

文章图片
，
再加上 pmatch 的分析结果，这个模式匹配的总事件将正比于主串和字符长度之和。
所以失配函数的定义还可以用以下等价的公式表示：

文章图片

[程序 2.15]

void fail(char* pat) { /* compute the pattern’s failure function */ int i, n = strlen(pat); failure[0] = -1; for (j = 1; j < n; j++) { i = failure[j - 1]; while ((pat[j] != pat[i + 1]) && (i >= 0)) i = failure[i]; if (pat[j] == pat[i + 1]) failure[j] = i + 1; else failure[j] = -1; } }

分析 fail 函数：
在while循环的每次迭代中，i 的值都会减少（根据 f 的定义）。
变量 i 在 for 循环的每次迭代开始时被重置。
然而，它要么被重置为-1，要么被重置为比上一次迭代时的终端值大1的值。
由于 for循环只迭代了 n-1 次，i 的值最多有 n-1 的增量。
因此它不能被减去超过 n-1 次。因此，在整个算法中，while循环最多迭代n-1次，
因此 fail 的复杂度为