ValueError: ImageColorGenerator is smaller than the canvas


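The program ran fine at first and always produced a word cloud. Then I analyzed a document of more than 80,000 characters: with max_words set to 20 or more it raised this error, while 15 still worked. Swapping the background (mask) image made no difference.
I also tried the fixes suggested online, changing width and height and even switching to a much higher-resolution background image, and none of them helped.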
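
One detail that may explain why changing width and height had no effect: the wordcloud documentation states that when a mask is passed, width and height are ignored and the shape of the mask is used for the canvas instead. The sketch below only mirrors that documented rule in plain Python; `effective_canvas_size` is a hypothetical helper for illustration, not part of wordcloud.

```python
def effective_canvas_size(width, height, mask_shape=None):
    """Return the (width, height) a WordCloud canvas would end up with.

    Mirrors the documented rule only: with a mask, width/height are ignored
    and the mask's shape (rows, columns, ...) decides the canvas size.
    """
    if mask_shape is None:
        return width, height
    rows, cols = mask_shape[:2]      # a numpy image has shape (height, width[, channels])
    return cols, rows


# An RGB mask that is 600 pixels wide and 400 pixels high has shape (400, 600, 3).
print(effective_canvas_size(680, 680, mask_shape=(400, 600, 3)))
print(effective_canvas_size(680, 680))
```

So with `mask=mask` in the script, the `height=680, width=680` arguments should not change the canvas at all.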

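What finally fixed it was downgrading wordcloud. I had installed the latest version by default, 1.9.3; after downgrading to 1.8 the word cloud generated without errors. There is also no need to set width or height, the defaults are fine.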


    pip install wordcloud==1.8
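
Before and after the downgrade it can help to confirm which version is actually installed in the interpreter that runs the script. A minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version pip installed for this interpreter, or None.
print('wordcloud:', installed_version('wordcloud'))
```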

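Complete code for generating the word cloud:
Adapted from the article 用python实现词频分析+词云 ("Word frequency analysis + word cloud in Python")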


    import re                           # regular expressions
    import jieba                        # jieba Chinese word segmentation
    import jieba.posseg                 # part-of-speech tagging
    import collections                  # word frequency counting
    import numpy                        # array handling
    from PIL import Image               # image loading
    import wordcloud                    # word cloud rendering
    import matplotlib.pyplot as plt     # plotting (imported as plt)
    
    # Main settings
    Analysis_text = '分析文档.txt'        # document to analyze
    userdict = '用户词典.txt'             # user dictionary
    StopWords = '停用词库.txt'            # stop-word list
    number = 50                          # number of top words to report
    Output = '词频.txt'                   # output file
    background = '词频背景.jpg'           # background (mask) image
    
    # Read the input file
    with open(Analysis_text, 'r', encoding='UTF-8') as fn:
        string_data = fn.read()                          # read the whole file
    
    # Text preprocessing
    pattern = re.compile(r'\t|\n|\.|-|:|;|\)|\(|\?|"')   # tabs, newlines and some ASCII punctuation
    string_data = re.sub(pattern, '', string_data)       # strip every match
    
    # Adjust the dictionary on the fly
    jieba.suggest_freq('林小木', True)     # tune=True adjusts the frequency so this word is kept whole
    
    # Load the user dictionary
    jieba.load_userdict(userdict)
    
    # Segment the text
    seg_list_exact = jieba.cut(string_data, cut_all=False, HMM=True)    # accurate mode + HMM
    object_list = []
    
    # Remove stop words (low-value tokens such as punctuation and interjections)
    with open(StopWords, 'r', encoding='UTF-8') as meaninglessFile:
        stopwords = set(meaninglessFile.read().split('\n'))
    stopwords.add(' ')
    for word in seg_list_exact:         # iterate over the segmented words
        if word not in stopwords:       # keep only words that are not stop words
            object_list.append(word)
    
    # Word frequency statistics
    word_counts = collections.Counter(object_list)       # count every word
    word_counts_top = word_counts.most_common(number)    # the `number` most frequent words
    
    # Map jieba POS tags to Chinese POS names: short version
    En2Cn = {
    'a'    : '形容词',
    'ad'   : '形容词',
    'ag'   : '形容词',
    'al'   : '形容词',
    'an'   : '形容词',
    'b'    : '区别词',
    'bl'   : '区别词',
    'c'    : '连词',
    'cc'   : '连词',
    'd'    : '副词',
    'e'    : '叹词',
    'eng'  : '英文',
    'f'    : '方位词',
    'g'    : '语素',
    'h'    : '前缀',
    'i'    : '成语',
    'j'    : '简称略语',
    'k'    : '后缀',
    'l'    : '习用语',
    'm'    : '数词',
    'mq'   : '数量词',
    'n'    : '名词',
    'ng'   : '名词',
    'nl'   : '名词',
    'nr'   : '名词',
    'nr1'  : '名词',
    'nr2'  : '名词',
    'nrf'  : '名词',
    'nrfg' : '名词',
    'nrj'  : '名词',
    'ns'   : '名词',
    'nsf'  : '名词',
    'nt'   : '名词',
    'nz'   : '名词',
    'o'    : '拟声词',
    'p'    : '介词',
    'pba'  : '介词',
    'pbei' : '介词',
    'q'    : '量词',
    'qt'   : '量词',
    'qv'   : '量词',
    'r'    : '代词',
    'rg'   : '代词',
    'rr'   : '代词',
    'rz'   : '代词',
    'rzs'  : '代词',
    'rzt'  : '代词',
    'rzv'  : '代词',
    'ry'   : '代词',
    'rys'  : '代词',
    'ryt'  : '代词',
    'ryv'  : '代词',
    's'    : '处所词',
    't'    : '时间词',
    'tg'   : '时间词',
    'u'    : '助词',
    'ude1' : '助词',
    'ude2' : '助词',
    'ude3' : '助词',
    'udeng': '助词',
    'udh'  : '助词',
    'uguo' : '助词',
    'ule'  : '助词',
    'ulian': '助词',
    'uls'  : '助词',
    'usuo' : '助词',
    'uyy'  : '助词',
    'uzhe' : '助词',
    'uzhi' : '助词',
    'v'    : '动词',
    'vd'   : '动词',
    'vf'   : '动词',
    'vg'   : '动词',
    'vi'   : '动词',
    'vl'   : '动词',
    'vn'   : '动词',
    'vshi' : '动词',
    'vx'   : '动词',
    'vyou' : '动词',
    'w'    : '标点符号',
    'wb'   : '标点符号',
    'wd'   : '标点符号',
    'wf'   : '标点符号',
    'wj'   : '标点符号',
    'wh'   : '标点符号',
    'wkz'  : '标点符号',
    'wky'  : '标点符号',
    'wm'   : '标点符号',
    'wn'   : '标点符号',
    'wp'   : '标点符号',
    'ws'   : '标点符号',
    'wt'   : '标点符号',
    'ww'   : '标点符号',
    'wyz'  : '标点符号',
    'wyy'  : '标点符号',
    'x'    : '字符串',
    'xu'   : '字符串',
    'xx'   : '字符串',
    'y'    : '语气词',
    'z'    : '状态词',
    'un'   : '未知词',
    }
    
    # Map jieba POS tags to Chinese POS names: detailed version
    En2Cn_Pro = {
    'a'    : '形容词',
    'ad'   : '形容词-副形词',
    'ag'   : '形容词-形容词性语素',
    'al'   : '形容词-形容词性惯用语',
    'an'   : '形容词-名形词',
    'b'    : '区别词',
    'bl'   : '区别词-区别词性惯用语',
    'c'    : '连词',
    'cc'   : '连词-并列连词',
    'd'    : '副词',
    'e'    : '叹词',
    'eng'  : '英文',
    'f'    : '方位词',
    'g'    : '语素',
    'h'    : '前缀',
    'i'    : '成语',
    'j'    : '简称略语',
    'k'    : '后缀',
    'l'    : '习用语',
    'm'    : '数词',
    'mq'   : '数量词',
    'n'    : '名词',
    'ng'   : '名词-名词性语素',
    'nl'   : '名词-名词性惯用语',
    'nr'   : '名词-人名',
    'nr1'  : '名词-汉语姓氏',
    'nr2'  : '名词-汉语名字',
    'nrf'  : '名词-音译人名',
    'nrfg' : '名词-人名',
    'nrj'  : '名词-日语人名',
    'ns'   : '名词-地名',
    'nsf'  : '名词-音译地名',
    'nt'   : '名词-机构团体名',
    'nz'   : '名词-其他专名',
    'o'    : '拟声词',
    'p'    : '介词',
    'pba'  : '介词-“把”',
    'pbei' : '介词-“被”',
    'q'    : '量词',
    'qt'   : '量词-动量词',
    'qv'   : '量词-时量词',
    'r'    : '代词',
    'rg'   : '代词-代词性语素',
    'rr'   : '代词-人称代词',
    'rz'   : '代词-指示代词',
    'rzs'  : '代词-处所指示代词',
    'rzt'  : '代词-时间指示代词',
    'rzv'  : '代词-谓词性指示代词',
    'ry'   : '代词-疑问代词',
    'rys'  : '代词-处所疑问代词',
    'ryt'  : '代词-时间疑问代词',
    'ryv'  : '代词-谓词性疑问代词',
    's'    : '处所词',
    't'    : '时间词',
    'tg'   : '时间词-时间词性语素',
    'u'    : '助词',
    'ude1' : '助词-“的”“底”',
    'ude2' : '助词-“地”',
    'ude3' : '助词-“得”',
    'udeng': '助词-“等”“等等”“云云”',
    'udh'  : '助词-“的话”',
    'uguo' : '助词-“过”',
    'ule'  : '助词-“了”“喽”',
    'ulian': '助词-“连”',
    'uls'  : '助词-“来讲”“来说”“而言”“说来”',
    'usuo' : '助词-“所”',
    'uyy'  : '助词-“一样”“一般”“似的”“般”',
    'uzhe' : '助词-“着”',
    'uzhi' : '助词-“之”',
    'v'    : '动词',
    'vd'   : '动词-副动词',
    'vf'   : '动词-趋向动词',
    'vg'   : '动词-动词性语素',
    'vi'   : '动词-不及物动词（内动词）',
    'vl'   : '动词-动词性惯用语',
    'vn'   : '动词-名动词',
    'vshi' : '动词-“是”',
    'vx'   : '动词-形式动词',
    'vyou' : '动词-“有”',
    'w'    : '标点符号',
    'wb'   : '标点符号-百分号千分号，全角：％ ‰ 半角：%',
    'wd'   : '标点符号-逗号，全角：， 半角：,',
    'wf'   : '标点符号-分号，全角：； 半角： ; ',
    'wj'   : '标点符号-句号，全角：。',
    'wh'   : '标点符号-单位符号，全角：￥ ＄ ￡ ° ℃ 半角 $',
    'wkz'  : '标点符号-左括号，全角：（ 〔 ［ ｛ 《 【 〖 〈 半角：( [ { <',
    'wky'  : '标点符号-右括号，全角：） 〕 ］ ｝ 》 】 〗 〉 半角： ) ] } >',
    'wm'   : '标点符号-冒号，全角：： 半角： :',
    'wn'   : '标点符号-顿号，全角：、',
    'wp'   : '标点符号-破折号，全角：—— －－ ——－ 半角：—',
    'ws'   : '标点符号-省略号，全角：…… …',
    'wt'   : '标点符号-叹号，全角：！ 半角：!',
    'ww'   : '标点符号-问号，全角：？ 半角：?',
    'wyz'  : '标点符号-左引号，全角：“ ‘ 『',
    'wyy'  : '标点符号-右引号，全角：” ’ 』',
    'x'    : '字符串',
    'xu'   : '字符串-网址URL',
    'xx'   : '字符串-非语素字',
    'y'    : '语气词',
    'z'    : '状态词',
    'un'   : '未知词',
    }
    
    # Print to the console and write the "词频.txt" file
    print('\n词语\t词频\t词性')                                   # header: word, frequency, part of speech
    print('——————————')
    count = 0
    with open(Output, 'w', encoding='UTF-8') as fileOut:         # create the file, overwriting any existing one
        fileOut.write('词语\t词频\t词性\n')
        fileOut.write('——————————\n')
        for TopWord, Frequency in word_counts_top:               # each word with its frequency
            for POS in jieba.posseg.cut(TopWord):                # tag the word with its part of speech
                if count == number:
                    break
                pos_name = En2Cn.get(POS.flag, En2Cn['un'])      # tags missing from En2Cn fall back to '未知词'
                print(TopWord + '\t', str(Frequency) + '\t', pos_name)                   # print one row
                fileOut.write(TopWord + '\t' + str(Frequency) + '\t' + pos_name + '\n')  # write one row
                count += 1
    
    # Render the word cloud
    print('\n开始制作词云……')                        # status message
    mask_image = Image.open(background)
    mask_width, mask_height = mask_image.size
    print(f"Mask size: {mask_width}x{mask_height}")
    mask = numpy.array(mask_image)                  # background image as an array, used as the mask
    wc = wordcloud.WordCloud(
        font_path='C:/Windows/Fonts/simfang.ttf',   # font (FangSong here)
        background_color='white',                   # background color
        mask=mask,
        height=680,                                 # height and width are ignored by wordcloud when a mask is given
        width=680,
        max_words=number,                           # number of words to show
        max_font_size=150                           # largest font size
    )
    # word_counts.pop('',None)
    wc.generate_from_frequencies(word_counts)                                        # build the cloud from the frequency dict
    wc.recolor(color_func=wordcloud.ImageColorGenerator(mask))                       # color the words from the background image
    plt.figure('词云')
    plt.subplots_adjust(top=0.99,bottom=0.01,right=0.99,left=0.01,hspace=0,wspace=0) # trim the margins
    plt.imshow(wc, cmap=plt.cm.gray, interpolation='bilinear')                       # draw the word cloud
    plt.axis('off')                                                                  # hide the axes
    print('制作完成！')
    plt.show()
    # Keep the program from exiting as soon as it finishes
    input()
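
If downgrading is not an option, one other thing that might be worth trying is the `default_color` argument of `ImageColorGenerator`: a fallback color that is used, instead of raising this ValueError, when the area under a word falls outside the color image. This is an untested sketch, not something confirmed against the failing document. `build_color_func` and `fallback_rgb` are hypothetical names, and wordcloud is imported lazily so the helper can be defined even where the package is missing.

```python
def build_color_func(mask, fallback_rgb=(0, 0, 0)):
    """Return an image-based color function with a fallback color.

    mask: the same numpy array that is passed to WordCloud(mask=...).
    fallback_rgb: hypothetical parameter name; forwarded as default_color.
    """
    if len(fallback_rgb) != 3 or not all(0 <= c <= 255 for c in fallback_rgb):
        raise ValueError('fallback_rgb must be three values in the range 0..255')
    # Imported here so that defining the helper does not require wordcloud.
    from wordcloud import ImageColorGenerator
    return ImageColorGenerator(mask, default_color=fallback_rgb)


# Usage in the script above, replacing the wc.recolor(...) line:
# wc.recolor(color_func=build_color_func(mask))
```

Words that fall back to the default color will not match the background image, so this trades exact coloring for a run that completes.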
