图像压缩——LZW压缩算法
文章首发于我的个人博客
1. LZW基础概念
之前提到的算术编码、霍夫曼编码等技术集中在消除编码的冗余上,而本文要讲的LZW编码 是一种针对空间冗余 的无误差压缩方法。
LZW算法o又叫“串表压缩算法”,就是通过建立一个将字符串和其对应的记号 构成的表(把已经出现过的字符串映射到记号上),用较短的代码来表示较长的字符串来实现压缩。
需要注意的是,LZW算法中字符串和记号的对应关系是在压缩的过程中动态生成的,并且隐含在压缩数据中,解压的时候也是一步一步还原编码并动态生成字典的过程。
2. LZW算法详解
2.1 LZW编码算法
编码器从字符串中不断地读入新的字符,并且将字符或字符串映射到新的记号,以动态地构建编码字典。在编码过程中,我们维护两个变量,即previous_char(表示目前已有的,但未被编码的字符)和current_char(表示刚刚读入的字符)。编码算法流程如下
-
(1)初始状态,
previous_char和current_char都是空的。 -
(2)读入新的字符current_char,并于previous合并成新的字符串
p_and_c(previous_char+current_car) -
(3)在编码字典里查找
p_and_c,如果:p_and_c在字典里,previous_char = p_and_cp_and_c不在字典里,将previous_char的标记输出;在字典中建立p_and_c的映射;更新previous_char = current_char。
-
(4)返回步骤(2),重复直至读完字符串。
2.2 Example
ababcababac
初始状态字典里有三个默认的映射
| String | Symbol |
|---|---|
| a | 0 |
| b | 1 |
| c | 2 |
开始编码
| previous_char | current_char | p_and_c | p_and_c in dict | Action | output |
|---|---|---|---|---|---|
| “” | a | a | Y | previous_char = a | - |
| a | b | ab | N | 添加映射{‘ab’, 3} previous_char = b | 0 |
| b | a | ba | N | 添加映射{‘ba’, 4} | 1 |
| a | b | ab | Y | previous_char = ab | - |
| ab | c | abc | N | 添加映射{‘abc’, 5} previous_char = c | 3 |
| c | a | ca | N | 添加映射{‘ca’, 6} previous_char = a | 2 |
| a | b | ab | Y | previous_char = ab | - |
| ab | a | aba | N | 添加映射{‘aba’, 7} previsou_char = a | 3 |
| a | b | ab | Y | previous_char = ab | - |
| ab | a | aba | Y | previous_char = aba | - |
| aba | c | abac | N | 添加映射{‘abac’, 8} previous_char = c | 7 |
| c | - | - | - | - | 2 |
编码结束,输出结果为0132372,编码过程中生成的字典为:
| String | Symbol |
|---|---|
| a | 0 |
| b | 1 |
| c | 2 |
| ab | 3 |
| ba | 4 |
| abc | 5 |
| ca | 6 |
| aba | 7 |
| abac | 8 |
2.2 LZW解码算法
解码器输入的是压缩后的数据。同时,我们将维护两个变量previous_symbol(目前已有的但未被处理的记号),current_symbol(当前读入的记号)。算法流程如下
-
(1)初始状态,
previous_symbol和current_symbol都为空。 -
(2)读入第一个记号,并赋给
current_symbol,并解码直接输出。因为第一个字符为单个字符,一定可以直接解码。 -
(3)赋值,
previous_symbol = current_symbol -
(4)在字典里查找
current_symbol,如果:-
current_symbol在字典里:- 将
current_symbol解码的结果输出 previous_char = dict.get(previous_symbol),current_char = dict.get(current_symbol)[0]。注意,这里current_char为current_symbol解码的第一个字符。- 将
p_and_c添加到新的映射
- 将
-
current_symbol不在字典里:previous_char = dict.get(previous_symbol),current_char = dict.get(previous_symbol)[0]。注意,这里current_char为previous_symbol解码的第一个字符。- 将
p_and_c添加到新的映射 - 输出
p_and_c
-
-
返回步骤(3),直至读完所有记号。
2.4 Example
0 1 3 2 3 7 2
初始状态字典里有三个默认的映射
| Symbol | String |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
开始解码:
首先读取第一个记号,current_symbol = 0,输出解码结果dict.get(current_symbol),即a赋值previous_symbol = current_symbol,然后循环读取之后的记号。
| previous_symbol | current_symbol | current_symbol in dict | Action | output |
|---|---|---|---|---|
| 0 | 1 | Y | previous_char = a current_char = b p_and_c = ‘ab’ 添加映射{3, ‘ab’} | b |
| 1 | 3 | Y | previous_char = b current_char = a p_and_c = ‘ba’ 添加映射{4. ‘ba’} | ab |
| 3 | 2 | Y | previous_char = ab current_char = c p_and_c = ‘abc’ 添加映射{5, ‘abc’} | c |
| 2 | 3 | Y | previous_char = c current_char = a p_and_c = ca 添加映射{6. ‘ca’} | ab |
| 3 | 7 | N | previous_char = ab current_char = a p_and_c = aba 添加映射{7, ‘aba’} | aba |
| 7 | 2 | Y | previous_char = aba current_char = c P_and_c = abac 添加映射{8, ‘abac’} | c |
解码结束,输出结果为ababcababac。解码过程中生成的字典为
| Symbol | String |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
| 3 | ab |
| 4 | ba |
| 5 | abc |
| 6 | ca |
| 7 | aba |
| 8 | abac |
2.5 代码实现(python)
def encode_init():
k = 0
char_num_dict = {}
for i in range(97,123):
char_num_dict[chr(i)] = k
k += 1
# print(char_num_dict.items())
return char_num_dict
def decode_init():
k = 0
num_char_dict = {}
for i in range(97,123):
num_char_dict[str(k)] = chr(i)
k += 1
# print(char_num_dict.items())
return num_char_dict
def LZW_encode(strList, char_num_dict):
symbol = 26 # 标记
previous_char = "" # 已有的,但未被编码的字符
current_char = "" # 当前读入的字符
p_and_c = ""
output = "" # 输出的编码序列
for s in strList:
current_char = s
p_and_c = previous_char + current_char
# print(previous_char, s)
if char_num_dict.get(p_and_c) != None:
previous_char = p_and_c
else:
output += str(char_num_dict.get(previous_char)) + '-'
char_num_dict[p_and_c] = symbol
symbol += 1
previous_char = current_char
output += str(char_num_dict[s[-1]])
print(char_num_dict.items())
return output
def LZW_decode(output, num_char_dict):
output_list = output.split('-')
print(output_list)
symbol = 26
previous_symbol = ""
current_symbol = ""
previous_char = ""
current_char = ""
decodeList = ""
decodeList += num_char_dict.get(output_list[0])
previous_symbol = output_list[0]
for o in output_list[1:]:
current_symbol = o
# print(previous_symbol, current_symbol)
if num_char_dict.get(current_symbol) != None:
decodeList += num_char_dict.get(current_symbol)
previous_char = num_char_dict.get(previous_symbol)
current_char = num_char_dict.get(current_symbol)[0]
num_char_dict[str(symbol)] = previous_char + current_char
symbol += 1
else:
previous_char = num_char_dict.get(previous_symbol)
current_char = num_char_dict.get(previous_symbol)[0]
num_char_dict[str(symbol)] = previous_char + current_char
symbol += 1
decodeList += previous_char + current_char
previous_symbol = current_symbol
# print(num_char_dict.items())
return decodeList
if __name__ == '__main__':
strList = input("请输入字符序列:").lower()
print('字符序列长度:', len(strList))
char_num_dict = encode_init()
output = LZW_encode(strList, char_num_dict)
print(output)
num_char_dict = decode_init()
# print(num_char_dict.items())
decodeList = LZW_decode(output, num_char_dict)
print(decodeList)
print("编码长度:", len(output) / 2)
python

