AIGC: Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis

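Key points:

1: Downloading Kolors faster through a mirror in mainland China

2: Installing a specific version of diffusers

3: Batch image generation and GPU memory usage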

Project: GitHub - Kwai-Kolors/Kolors: Kolors Team

Paper: Kolors/imgs/Kolors_paper.pdf at master · Kwai-Kolors/Kolors · GitHub

Models:

Hugging Face: https://huggingface.co/Kwai-Kolors/Kolors-diffusers

ModelScope: https://modelscope.cn/models/Kwai-Kolors/Kolors

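Introduction:

Kolors is a large-scale pretrained text-to-image model based on latent diffusion, developed by the Kuaishou Kolors team. It was trained on a large dataset of billions of text-image pairs. According to the team, it outperforms existing open-source and closed-source models in visual quality, complex semantic understanding, and rendering of Chinese and English text. It accepts both Chinese and English prompts and is particularly strong at handling Chinese-specific content.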

Usage:

Diffusers (version diffusers==0.30.0.dev0; the commands below check out the v0.30.0 tag)

    git clone -b v0.30.0 https://github.com/huggingface/diffusers
    cd diffusers
    python3 setup.py install
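
Before running the pipeline, it can help to confirm that the installed diffusers build is at least the version named above. The sketch below is only an illustration: `MIN_RELEASE`, `release_tuple`, and `diffusers_is_recent_enough` are hypothetical names introduced here, and the comparison looks only at the leading numeric parts of the version string, so `0.30.0.dev0` is treated the same as `0.30.0`.

```python
from importlib import metadata

# Minimum release taken from the install step above (tag v0.30.0).
MIN_RELEASE = (0, 30, 0)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '0.30.0.dev0' -> (0, 30, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric piece such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def diffusers_is_recent_enough(version=None) -> bool:
    """Check a version string, or the installed diffusers package if none is given."""
    if version is None:
        try:
            version = metadata.version("diffusers")
        except metadata.PackageNotFoundError:
            return False  # diffusers is not installed at all
    return release_tuple(version) >= MIN_RELEASE
```

Calling `diffusers_is_recent_enough()` with no argument inspects the current environment; passing a string makes it easy to check a pinned requirement.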

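Notes:

1: KolorsPipeline uses EulerDiscreteScheduler as its default noise scheduler. With this scheduler, guidance_scale=5.0 and num_inference_steps=50 are recommended.
2: KolorsPipeline also supports EDMDPMSolverMultistepScheduler. With this scheduler, guidance_scale=5.0 and num_inference_steps=25 are recommended.
3: In addition to text-to-image generation, KolorsImg2ImgPipeline provides image-to-image generation.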
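
The recommendations in notes 1 and 2 can be kept in one place so that the sampling arguments follow whichever scheduler is active. This is a small sketch, not part of diffusers: `RECOMMENDED_SETTINGS` and `sampling_kwargs` are hypothetical names, and the values are simply the ones quoted in the notes.

```python
# Recommended settings from the notes above, keyed by scheduler class name.
RECOMMENDED_SETTINGS = {
    "EulerDiscreteScheduler": {"guidance_scale": 5.0, "num_inference_steps": 50},
    "EDMDPMSolverMultistepScheduler": {"guidance_scale": 5.0, "num_inference_steps": 25},
}


def sampling_kwargs(scheduler_name: str) -> dict:
    """Return a copy of the recommended keyword arguments for a scheduler."""
    try:
        return dict(RECOMMENDED_SETTINGS[scheduler_name])
    except KeyError:
        raise ValueError(
            f"No recommendation recorded for scheduler {scheduler_name!r}"
        ) from None
```

With a loaded pipeline this could be used as `pipe(prompt=prompt, **sampling_kwargs(type(pipe.scheduler).__name__))`.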

Model download:

    # Speed up downloads via a mirror in mainland China
    export HF_ENDPOINT=https://hf-mirror.com
    huggingface-cli download --resume-download Kwai-Kolors/Kolors-diffusers --local-dir weights/Kolors-diffusers
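
The same download can be driven from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; `configure_mirror` and `download_kolors` are hypothetical helper names. The `HF_ENDPOINT` variable is read when `huggingface_hub` is first imported, so the helper sets it before the import and will have no effect if the library was already imported earlier in the process.

```python
import os


def configure_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Point Hugging Face downloads at a mirror; call before importing huggingface_hub."""
    os.environ["HF_ENDPOINT"] = endpoint
    return endpoint


def download_kolors(
    local_dir: str = "weights/Kolors-diffusers",
    endpoint: str = "https://hf-mirror.com",
) -> str:
    """Download the Kolors-diffusers snapshot into local_dir and return its path."""
    configure_mirror(endpoint)
    # Imported lazily so that HF_ENDPOINT is already set when the library reads it.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Kwai-Kolors/Kolors-diffusers",
        local_dir=local_dir,
    )
```

Calling `download_kolors()` mirrors the shell commands above and places the weights in `weights/Kolors-diffusers`.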

Code:

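    import torch
    from diffusers import KolorsPipeline

    # Load the fp16 weights from the Hub; a local directory such as
    # "weights/Kolors-diffusers" from the download step can be passed instead.
    pipe = KolorsPipeline.from_pretrained(
        "Kwai-Kolors/Kolors-diffusers",
        torch_dtype=torch.float16,
        variant="fp16"
    ).to("cuda")

    # Prompt: a macro photo of a ladybug holding a sign that reads "可图"
    prompt = '一张瓢虫的照片，微距，变焦，高质量，电影，拿着一个牌子，写着"可图"'

    image = pipe(
        prompt=prompt,
        negative_prompt="",
        guidance_scale=5.0,
        num_inference_steps=50,
        generator=torch.Generator(pipe.device).manual_seed(66),  # fixed seed for reproducibility
    ).images[0]
    image.show()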

GPU memory usage
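
One way to measure peak GPU memory for a generation call is PyTorch's CUDA memory statistics. This is a minimal sketch, assuming a CUDA-enabled PyTorch install; `format_gib` and `measure_peak_memory` are hypothetical helper names, and the figure covers only memory allocated through PyTorch, so tools such as `nvidia-smi` will typically report more.

```python
def format_gib(num_bytes: float) -> str:
    """Format a byte count as GiB with two decimals."""
    return f"{num_bytes / 1024 ** 3:.2f} GiB"


def measure_peak_memory(run):
    """Call run() and return (result, peak bytes allocated by PyTorch on the current GPU)."""
    import torch  # imported lazily; requires a CUDA-enabled PyTorch install

    torch.cuda.reset_peak_memory_stats()  # start counting from this point
    result = run()
    torch.cuda.synchronize()  # wait for pending GPU work before reading the stats
    return result, torch.cuda.max_memory_allocated()
```

For example, `result, peak = measure_peak_memory(lambda: pipe(prompt=prompt))` followed by `print(format_gib(peak))` reports the peak for a single generation.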

Batch generation

Prepare prompts in advance to suit your own project's data and scenario, then generate the images in batch.
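
A simple way to do this is to loop over a prompt list and save one image per prompt. The sketch below is an assumption about how such a loop might look, not code from the Kolors project: `generate_batch`, the `0000.png` naming scheme, and the output directory are all hypothetical, and `pipe` is expected to behave like the `KolorsPipeline` object created earlier (callable with keyword arguments, returning an object with an `images` list).

```python
from pathlib import Path


def generate_batch(pipe, prompts, out_dir, **pipe_kwargs):
    """Run pipe once per prompt and save each image as <index>.png in out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Defaults follow the recommended settings for the default scheduler;
    # anything passed in pipe_kwargs overrides them.
    settings = {
        "negative_prompt": "",
        "guidance_scale": 5.0,
        "num_inference_steps": 50,
        **pipe_kwargs,
    }

    saved = []
    for index, prompt in enumerate(prompts):
        image = pipe(prompt=prompt, **settings).images[0]
        path = out_dir / f"{index:04d}.png"
        image.save(path)
        saved.append(path)
    return saved
```

Prompts could be read from a text file with one prompt per line and passed in as a list. Generating one prompt at a time keeps peak memory close to that of a single call, at the cost of throughput.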

Other:

1: Installing and using the Hugging Face CLI

Reference: "HuggingFace CLI command documentation" (huggingface-cli blog post)

2: Installing and using HF-Mirror

Reference: https://hf-mirror.com/
