AI FRONTIER

Qwen Releases the Qwen-Image-Edit Image Editing Model

Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency

QWEN CHAT GITHUB HUGGING FACE MODELSCOPE DISCORD

We are excited to introduce Qwen-Image-Edit, the image editing version of Qwen-Image. Built upon our 20B Qwen-Image model, Qwen-Image-Edit successfully extends Qwen-Image's unique text rendering capabilities to image editing tasks, enabling precise text editing. Furthermore, Qwen-Image-Edit simultaneously feeds the input image into Qwen2.5-VL (for visual semantic control) and the VAE Encoder (for visual appearance control), achieving capabilities in both semantic and appearance editing. To experience the latest model, visit Qwen Chat and select the "Image Editing" feature.

Semantic and Appearance Editing: Qwen-Image-Edit supports both low-level visual appearance editing (such as adding, removing, or modifying elements, requiring all other regions of the image to remain completely unchanged) and high-level visual semantic editing (such as IP creation, object rotation, and style transfer, allowing overall pixel changes while maintaining semantic consistency).

Precise Text Editing: Qwen-Image-Edit supports bilingual (Chinese and English) text editing, allowing direct addition, deletion, and modification of text in images while preserving the original font, size, and style.

Strong Benchmark Performance: Evaluations on multiple public benchmarks demonstrate that Qwen-Image-Edit achieves state-of-the-art (SOTA) performance in image editing tasks, establishing it as a powerful foundation model for image editing.
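
The appearance-editing requirement above (regions outside the edit stay unchanged) can be spot-checked after the fact. The following is a minimal sketch, not part of the Qwen release: it assumes images are plain nested lists of RGB tuples and that the caller supplies a boolean mask marking the region that was meant to change. The function name `changed_outside_mask` is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (assumed toy representation)


def changed_outside_mask(before: Image, after: Image, mask: List[List[bool]]) -> int:
    """Count pixels that differ where mask is False, i.e. outside the edited region."""
    if not (len(before) == len(after) == len(mask)):
        raise ValueError("images and mask must have the same height")
    changed = 0
    for row_b, row_a, row_m in zip(before, after, mask):
        if not (len(row_b) == len(row_a) == len(row_m)):
            raise ValueError("images and mask must have the same width")
        for pixel_b, pixel_a, inside in zip(row_b, row_a, row_m):
            if not inside and pixel_b != pixel_a:
                changed += 1
    return changed


# Toy 2x2 example: only the top-right pixel was supposed to change.
before = [[(0, 0, 0), (10, 10, 10)], [(20, 20, 20), (30, 30, 30)]]
after = [[(0, 0, 0), (255, 0, 0)], [(20, 20, 20), (30, 30, 30)]]
mask = [[False, True], [False, False]]

leaked = changed_outside_mask(before, after, mask)
```

A result of zero means the edit stayed inside the mask. A real model's output may not be bit-exact outside the edited area, so a tolerance-based comparison could be more practical than strict equality.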

One of the highlights of Qwen-Image-Edit lies in its powerful capabilities for semantic and appearance editing. Semantic editing refers to modifying image content while preserving the original visual semantics. To intuitively demonstrate this capability, let's take Qwen's mascot—Capybara—as an example:

As can be seen, although most pixels in the edited image differ from those in the input image (the leftmost image), the character consistency of Capybara is perfectly preserved. Qwen-Image-Edit's powerful semantic editing capability enables effortless and diverse creation of original IP content. Furthermore, on Qwen Chat, we designed a series of editing prompts centered around the 16 MBTI personality types. Leveraging these prompts, we successfully created a set of MBTI-themed emoji packs based on our mascot Capybara, effortlessly expanding the IP's reach and expression.

Another standout feature of Qwen-Image-Edit is its accurate text editing capability, which stems from Qwen-Image's deep expertise in text rendering. As shown below, the following two cases demonstrate Qwen-Image-Edit's performance in editing English text. Qwen-Image-Edit can also directly edit Chinese posters, modifying not only large headline text but also small and intricate text elements.

Isn't it amazing? With this chained, step-by-step editing approach, we can keep correcting character errors until the desired result is achieved. In the end, we obtained a completely correct calligraphy version of Lantingji Xu (Orchid Pavilion Preface)!
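
The chained approach described above amounts to feeding each edited result back in as the input to the next instruction. Here is a minimal sketch of that loop, assuming a caller-supplied `edit_fn(image, instruction)`. Both `chain_edits` and the string-based `toy_edit` stand-in are hypothetical and are not the Qwen-Image-Edit API.

```python
from typing import Callable, List, TypeVar

Img = TypeVar("Img")


def chain_edits(
    image: Img,
    instructions: List[str],
    edit_fn: Callable[[Img, str], Img],
) -> List[Img]:
    """Apply instructions one at a time; each edit starts from the previous result."""
    history = [image]
    for instruction in instructions:
        history.append(edit_fn(history[-1], instruction))
    return history


def toy_edit(image: str, instruction: str) -> str:
    """Stand-in editor: the 'image' is a string and 'a->b' fixes one wrong character."""
    wrong, right = instruction.split("->")
    return image.replace(wrong, right, 1)


steps = chain_edits("helko wurld", ["k->l", "u->o"], toy_edit)
final = steps[-1]
```

Keeping every intermediate result in `history` makes it possible to go back to an earlier step if a later correction introduces a new error.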

In summary, we hope that Qwen-Image-Edit can further advance the field of image generation, truly lower the technical barriers to visual content creation, and inspire even more innovative applications.

The official Qwen blog has announced Qwen-Image-Edit, aimed at high-quality image editing and text rendering scenarios.

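This official update from Tongyi Qianwen (Qwen) centers on the release of the Qwen-Image-Edit image editing model; its English title is "Qwen-Image-Edit: Image Editing with Higher Quality and Efficiency". The post focuses on multimodal capability, generation quality, and the scope of API availability, and it should be read alongside the official release to understand its impact on model usage and developer integration.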

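For users, the most valuable part of this kind of information is judging whether the new capability is already available, which tasks it suits, and which version, region, permission, or product-form restrictions may apply when calling it.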

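In the context of evaluating AI API relay services, the key questions are whether a provider genuinely supports the model or capability in question, and whether the model name, response behavior, latency, error messages, context limits, and pricing description corroborate one another.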

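For follow-up self-testing, more specific probe tasks can be designed around the Qwen-Image-Edit release: complex prompts, multi-turn conversations, tool calls, multimodal input, or coding tasks can all help distinguish real capability from a model list that exists only on a web page.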
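
One way to record such a probe and cross-check its signals is sketched below. Everything here is an assumption for illustration: the `ProbeResult` fields, the `inconsistencies` helper, the model-name string, and the 120-second latency threshold are all hypothetical and do not correspond to any specific relay service or to the official API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    requested_model: str           # model name sent in the request
    returned_model: Optional[str]  # model name echoed by the response, if any
    returned_image: bool           # whether the response contained an edited image
    latency_s: float               # wall-clock time for the request, in seconds
    error: Optional[str] = None    # error message, if the request failed


def inconsistencies(result: ProbeResult, max_latency_s: float = 120.0) -> List[str]:
    """List the signals in one probe that do not corroborate each other."""
    issues: List[str] = []
    if result.error is not None:
        issues.append(f"request failed: {result.error}")
    if result.returned_model is not None and result.returned_model != result.requested_model:
        issues.append(
            f"model mismatch: asked for {result.requested_model}, got {result.returned_model}"
        )
    if result.error is None and not result.returned_image:
        issues.append("no image returned for an image-editing request")
    if result.latency_s > max_latency_s:
        issues.append(f"latency {result.latency_s:.1f}s exceeds {max_latency_s:.1f}s")
    return issues


# Hypothetical observations from two providers.
ok = ProbeResult("qwen-image-edit", "qwen-image-edit", True, 14.2)
suspect = ProbeResult("qwen-image-edit", "some-other-model", False, 3.1)

ok_issues = inconsistencies(ok)
suspect_issues = inconsistencies(suspect)
```

An empty list only means this probe found nothing contradictory. It does not prove the provider serves the real model, so repeated probes with varied prompts remain useful.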

Source: Tongyi Qianwen (Qwen)