Advertisement

图像压缩——LZW压缩算法

阅读量:
文章首发于我的个人博客

1. LZW基础概念

之前提到的算术编码、霍夫曼编码等技术集中在消除编码的冗余上,而本文要讲的LZW编码 是一种针对空间冗余 的无误差压缩方法。

LZW算法o又叫“串表压缩算法”,就是通过建立一个将字符串和其对应的记号 构成的表(把已经出现过的字符串映射到记号上),用较短的代码来表示较长的字符串来实现压缩。

需要注意的是,LZW算法中字符串和记号的对应关系是在压缩的过程中动态生成的,并且隐含在压缩数据中,解压的时候也是一步一步还原编码并动态生成字典的过程。

2. LZW算法详解

2.1 LZW编码算法

编码器从字符串中不断地读入新的字符,并且将字符或字符串映射到新的记号,以动态地构建编码字典。在编码过程中,我们维护两个变量,即previous_char(表示目前已有的,但未被编码的字符)和current_char(表示刚刚读入的字符)。编码算法流程如下

  • (1)初始状态,previous_charcurrent_char都是空的。

  • (2)读入新的字符current_char,并于previous合并成新的字符串p_and_c(previous_char+current_car)

  • (3)在编码字典里查找p_and_c,如果:

    • p_and_c在字典里,previous_char = p_and_c
    • p_and_c不在字典里,将previous_char的标记输出;在字典中建立p_and_c的映射;更新previous_char = current_char
  • (4)返回步骤(2),重复直至读完字符串。

2.2 Example

ababcababac

初始状态字典里有三个默认的映射

String Symbol
a 0
b 1
c 2

开始编码

previous_char current_char p_and_c p_and_c in dict Action output
“” a a Y previous_char = a -
a b ab N 添加映射{‘ab’, 3} previous_char = b 0
b a ba N 添加映射{‘ba’, 4} 1
a b ab Y previous_char = ab -
ab c abc N 添加映射{‘abc’, 5} previous_char = c 3
c a ca N 添加映射{‘ca’, 6} previous_char = a 2
a b ab Y previous_char = ab -
ab a aba N 添加映射{‘aba’, 7} previsou_char = a 3
a b ab Y previous_char = ab -
ab a aba Y previous_char = aba -
aba c abac N 添加映射{‘abac’, 8} previous_char = c 7
c - - - - 2

编码结束,输出结果为0132372,编码过程中生成的字典为:

String Symbol
a 0
b 1
c 2
ab 3
ba 4
abc 5
ca 6
aba 7
abac 8
2.2 LZW解码算法

解码器输入的是压缩后的数据。同时,我们将维护两个变量previous_symbol(目前已有的但未被处理的记号),current_symbol(当前读入的记号)。算法流程如下

  • (1)初始状态,previous_symbolcurrent_symbol都为空。

  • (2)读入第一个记号,并赋给current_symbol,并解码直接输出。因为第一个字符为单个字符,一定可以直接解码。

  • (3)赋值,previous_symbol = current_symbol

  • (4)在字典里查找current_symbol,如果:

    • current_symbol在字典里:

      • current_symbol解码的结果输出
      • previous_char = dict.get(previous_symbol), current_char = dict.get(current_symbol)[0] 。注意,这里current_charcurrent_symbol解码的第一个字符。
      • p_and_c添加到新的映射
    • current_symbol不在字典里:

      • previous_char = dict.get(previous_symbol), current_char = dict.get(previous_symbol)[0] 。注意,这里current_charprevious_symbol解码的第一个字符。
      • p_and_c添加到新的映射
      • 输出p_and_c
  • 返回步骤(3),直至读完所有记号。

2.4 Example

0 1 3 2 3 7 2

初始状态字典里有三个默认的映射

Symbol String
0 a
1 b
2 c

开始解码:

首先读取第一个记号,current_symbol = 0,输出解码结果dict.get(current_symbol),即a赋值previous_symbol = current_symbol,然后循环读取之后的记号。

previous_symbol current_symbol current_symbol in dict Action output
0 1 Y previous_char = a current_char = b p_and_c = ‘ab’ 添加映射{3, ‘ab’} b
1 3 Y previous_char = b current_char = a p_and_c = ‘ba’ 添加映射{4. ‘ba’} ab
3 2 Y previous_char = ab current_char = c p_and_c = ‘abc’ 添加映射{5, ‘abc’} c
2 3 Y previous_char = c current_char = a p_and_c = ca 添加映射{6. ‘ca’} ab
3 7 N previous_char = ab current_char = a p_and_c = aba 添加映射{7, ‘aba’} aba
7 2 Y previous_char = aba current_char = c P_and_c = abac 添加映射{8, ‘abac’} c

解码结束,输出结果为ababcababac。解码过程中生成的字典为

Symbol String
0 a
1 b
2 c
3 ab
4 ba
5 abc
6 ca
7 aba
8 abac
2.5 代码实现(python)
复制代码
    def encode_init():
    	k = 0
    	char_num_dict = {}
    	for i in range(97,123):
    		char_num_dict[chr(i)] = k
    		k += 1
    
    	# print(char_num_dict.items())
    	return char_num_dict
    
    def decode_init():
    	k = 0
    	num_char_dict = {}
    	for i in range(97,123):
    		num_char_dict[str(k)] = chr(i)
    		k += 1
    
    	# print(char_num_dict.items())
    	return num_char_dict
    
    
    def LZW_encode(strList, char_num_dict):
    	symbol = 26		# 标记
    	previous_char = ""		# 已有的,但未被编码的字符
    	current_char = ""		# 当前读入的字符
    	p_and_c = ""
    	output = "" 			# 输出的编码序列
    	for s in strList:
    		current_char = s
    		p_and_c = previous_char + current_char
    		# print(previous_char, s)
    		if char_num_dict.get(p_and_c) != None:
    			previous_char = p_and_c
    		else:
    			output += str(char_num_dict.get(previous_char)) + '-'
    			char_num_dict[p_and_c] = symbol
    			symbol += 1 
    			previous_char = current_char
    	output += str(char_num_dict[s[-1]])
    
    	print(char_num_dict.items())
    	return output
    
    
    def LZW_decode(output, num_char_dict):
    	output_list = output.split('-')
    	print(output_list)
    
    	symbol = 26
    	previous_symbol = ""
    	current_symbol = ""
    
    	previous_char = ""
    	current_char = ""
    
    	decodeList = ""
    	decodeList += num_char_dict.get(output_list[0])
    	previous_symbol = output_list[0] 
    
    	for o in output_list[1:]:
    		current_symbol = o 
    		# print(previous_symbol, current_symbol)
    		if num_char_dict.get(current_symbol) != None:
    			decodeList += num_char_dict.get(current_symbol)
    
    			previous_char = num_char_dict.get(previous_symbol)
    			current_char = num_char_dict.get(current_symbol)[0]
    			num_char_dict[str(symbol)] = previous_char + current_char
    			symbol += 1 
    		else:
    			previous_char = num_char_dict.get(previous_symbol)
    			current_char = num_char_dict.get(previous_symbol)[0]
    			num_char_dict[str(symbol)] = previous_char + current_char
    			symbol += 1
    
    			decodeList += previous_char + current_char
    		previous_symbol = current_symbol
    
    	# print(num_char_dict.items())
    	return decodeList
    
    
    
    
    
    
    if __name__ == '__main__':
    	strList = input("请输入字符序列:").lower()
    	print('字符序列长度:', len(strList))
    	char_num_dict = encode_init()
    	output = LZW_encode(strList, char_num_dict)
    	print(output)
    
    	num_char_dict = decode_init()
    	# print(num_char_dict.items())
    	decodeList = LZW_decode(output, num_char_dict)
    	print(decodeList)
    
    	print("编码长度:", len(output) / 2)
    
    
    
    
    python
    
    
![](https://ad.itadn.com/c/weblog/blog-img/images/2025-07-14/JQ60zlUrT4jL53ByKStx1M8Nvgsa.png)

全部评论 (0)

还没有任何评论哟~