A New Era of NLP Ushered In by DeepSeek-V4: Transformers v5.8.0 Released

Hugging Face Transformers releases version v5.8.0, adding the DeepSeek-V4 model

Original article title: Transformers v5.8.0 Released: DeepSeek-V4 Model Added

Hugging Face Transformers Releases, May 5, 2026

RELEASE / Update

Field Note: Check Before Reading

Three-Line Summary

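The Hugging Face Transformers project has released v5.8.0
DeepSeek-V4 has been added as a new model
Per the release notes, DeepSeek-V4 is a Mixture-of-Experts (MoE) language model with architectural changes relative to DeepSeek-V3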

Who This Is Relevant To

Python engineers, NLP developers, machine learning researchers

Reliability Note

Official information from Hugging Face Transformers Releases

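Reading the Article

Drawing on the original article, this section lays out the key points, the editorial perspective, and the strengths and concerns in an easy-to-read order.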

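The Hugging Face Transformers project has released v5.8.0. The headline addition is DeepSeek-V4, which the release notes describe as the next-generation Mixture-of-Experts (MoE) language model from DeepSeek. Compared with DeepSeek-V3, it replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections for Manifold-Constrained Hyper-Connections (mHC), and bootstraps the first few MoE layers with a static token-id → expert-id hash table. The release notes describe the architecture only and do not report training-efficiency or accuracy figures.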
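
The release notes quoted at the end of this article say that DeepSeek-V4 bootstraps its first few MoE layers with a static token-id → expert-id hash table. As a rough illustration of what static routing means, the toy sketch below maps token ids to experts with a fixed lookup table instead of a learned router. The table construction (a plain modulo), the function names, and the sizes are assumptions made for illustration; none of them is taken from the DeepSeek-V4 implementation.

```python
from collections import Counter


def build_static_routing_table(vocab_size: int, num_experts: int) -> list:
    """Hypothetical helper: assign every token id to one expert, once, up front.

    A plain modulo stands in for whatever hash the real model uses.
    """
    return [token_id % num_experts for token_id in range(vocab_size)]


def route_tokens(token_ids: list, table: list) -> list:
    """Look up the expert for each token; no learned gating network is involved."""
    return [table[token_id] for token_id in token_ids]


# Toy sizes chosen only to keep the example readable.
table = build_static_routing_table(vocab_size=16, num_experts=4)
experts = route_tokens([3, 7, 8, 3], table)

# Count how many tokens each expert received.
load = Counter(experts)
print(experts, dict(load))
```

Because the table is fixed, the same token id always reaches the same expert, which is the property the word "static" points at; how the real table is built is not stated in the release notes.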

Editorial Comment

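The Hugging Face Transformers project regularly adds support for new model architectures, and v5.8.0 continues that pattern with DeepSeek-V4. The implementation covers DeepSeek-V4-Flash, DeepSeek-V4-Pro, and their -Base pretrained variants, so developers can now work with these models directly in the library.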

Assessment

Strengths

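The addition of DeepSeek-V4 extends the library's coverage of large MoE language models
Apart from the removal of the Apex integration, the release notes list no breaking changes to existing APIs
Community contributions are reflected in the release, including a fix that makes PreTrainedTokenizer.convert_ids_to_tokens with skip_special_tokens=True roughly 300x faster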

Concerns

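Running DeepSeek-V4 may require substantial compute resources
Migrating to the new version may cause compatibility problems in existing systems, notably for code that relies on Apex for mixed precision or fused ops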
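
One way to contain the migration risk is to check the installed library version before relying on anything introduced in this release. The sketch below is a minimal example under stated assumptions: MIN_VERSION reflects the v5.8.0 release discussed here, while the helper names and the simple numeric parsing (which ignores pre-release suffixes) are hypothetical and not part of Transformers.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_VERSION = (5, 8, 0)  # release that added DeepSeek-V4, per the release notes


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse leading numeric parts, padded to three, e.g. '5.8.0.dev0' -> (5, 8, 0)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # treat '5.8' as '5.8.0'
    return tuple(parts)


def meets_minimum(version_text: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Return True when the given version is at least the required minimum."""
    return parse_version(version_text) >= minimum


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


installed = installed_transformers_version()
if installed is None:
    print("transformers is not installed")
elif meets_minimum(installed):
    print(f"transformers {installed}: v5.8.0 features should be available")
else:
    print(f"transformers {installed}: upgrade before relying on v5.8.0 features")
```

A check like this only reports the version; it does not detect Apex usage, so code paths that depended on Apex still need to be reviewed by hand.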

Industry and Social Impact

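This release is expected to streamline work for NLP developers and researchers by making DeepSeek-V4 and the other newly added models available through a single library. That should lower the barrier to building more advanced, practical NLP applications, although actual training and inference performance will depend on the model variant and the hardware used.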

Sources

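The original article and the sources consulted for the deeper analysis. For community posts and preprints, the supporting evidence can be checked from here.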

Transformers v5.8.0 Released: DeepSeek-V4 Model Added

Hugging Face Transformers Releases

https://github.com/huggingface/transformers/releases/tag/v5.8.0

Keywords

Hugging Face Transformers, DeepSeek-V4, v5.8.0

About the AI Summary

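AI is used for the summary, classification, and interpretation in this article. We make an effort to verify the content, but it may contain mistranslations, misinterpretations, or missed updates to the original article. When making important decisions, be sure to check the original article as well.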

About breaking news: breaking-news items may be updated as a result of further research or body-text extraction. Initial summaries may contain errors or omissions.

Article Data

Source	Official information
Category	Release
Status	Breaking news
Origin	Hugging Face Transformers Releases
Published	2026-05-05

Original Article Description

Release v5.8.0

New Model additions

DeepSeek-V4

DeepSeek-V4 is the next-generation MoE (Mixture of Experts) language model from DeepSeek that introduces several architectural innovations over DeepSeek-V3. The architecture replaces Multi-head Latent Attention (MLA) with a hybrid local + long-range attention design, swaps residual connections for Manifold-Constrained Hyper-Connections (mHC), and bootstraps the first few MoE layers with a static token-id → expert-id hash table. This implementation covers DeepSeek-V4-Flash, DeepSeek-V4-Pro, and their -Base pretrained variants, which share the same architecture but differ in width, depth, expert count and weights.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/deepseek_v4) | Paper (https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf)

Add DeepSeek V4 (#45643) by @ArthurZucker

Gemma 4 Assistant

Gemma 4 Assistant is a small, text-only model that enables speculative decoding for Gemma 4 models using the Multi-Token Prediction (MTP) method and associated candidate generator. The model shares the same Gemma4TextModel backbone as other Gemma 4 models but uses KV sharing throughout the entire model, allowing it to reuse the KV cache populated by the target model and skip the pre-fill phase entirely. This architecture includes cross-attention to make the most of the target model's context, allowing the assistant to accurately predict more drafted tokens per drafting round.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/gemma4_assistant)

First model (#45788) by @SindhuRaghuram97

GraniteSpeechPlus

Granite Speech Plus is a variant of Granite Speech that enhances the projector by consuming the concatenation of the encoder's final hidden states with an arbitrary subset of its intermediate hidden states along the feature dimension. It is a multimodal speech-to-text model that can transcribe audio, provide speaker annotation and word level timestamps by responding to text prompts. The model inherits the same architecture components as Granite Speech including the speech encoder, query transformer projector, language model, and optional LoRA adapter.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/granite_speech_plus)

Support for a new Granite-Speech-Plus model (#45695) by @zvik

Granite4Vision

Granite Vision 4.1 is a vision-language model from IBM Research designed for enterprise-grade document data extraction. It specializes in chart extraction (Chart2CSV, Chart2Summary, Chart2Code), table extraction (JSON, HTML, OTSL), and semantic key-value pair extraction. The model builds on LLaVA-NeXT with architectural innovations including SigLIP2 Vision Encoder, Window Q-Former Projectors, and DeepStack Feature Injection with 8 vision-to-LLM injection points.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/granite4_vision)

Add Granite 4.1 Vision (granite4_vision) (#45597) by @artem-spector

EXAONE-4.5

EXAONE 4.5 is the first open-weight vision language model developed by LG AI Research, integrating a dedicated visual encoder into the existing EXAONE 4.0 framework to expand multimodal capabilities. The model features 33 billion parameters in total, including 1.2 billion parameters from the vision encoder, and achieves competitive performance in general benchmarks while outperforming similar-sized models in document understanding and Korean contextual reasoning. It builds on EXAONE 4.0 with key enhancements including an expanded vocabulary of 153,600 tokens, support for up to 256K token context windows, and a Multi-Token Prediction (MTP) mechanism.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/exaone4_5) | Paper (https://huggingface.co/papers/2604.08644) | Blog Post (https://www.lgresearch.ai/blog/view?seq=641)

Add EXAONE 4.5 implementations (#45471) by @nuxlear

PP-FormulaNet

PP-FormulaNet-L and PP-FormulaNet_plus-L are lightweight models designed for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. The models are part of the SLANet series and can be used for image-to-text tasks, specifically for detecting and processing mathematical formulas and table structures from images.

Links: Documentation (https://huggingface.co/docs/transformers/main/en/model_doc/pp_formulanet)

[Model] Add PP-FormulaNet Model Support (#45626) by @zhang-prog

Breaking changes

Apex integration has been removed from the library (including RMSNorm usage in T5 and related models), so users relying on Apex for mixed precision or fused ops should migrate to PyTorch's native equivalents instead.

🚨 Get rid of most Apex references (#45723) by @Rocketknight1

Tokenization

Fixed tokenizer mapping issues for DeepSeek R1 distilled (Qwen2) and DeepSeek OCR models, and resolved a significant performance regression in PreTrainedTokenizer.convert_ids_to_tokens where skip_special_tokens=True was rebuilding the special token set on every iteration, resulting in a ~300x speedup for that code path.

deepseek r1 distilled tokenizer fix for qwen2 mapping (#45741) by @itazap
DeepSeek OCR specifies an incorrect tokenizer class on the Hub (#45739) by @hmellor
PythonBackend slow tokenizer convert_ids_to_tokens fix (#45728) by @i3hz

Bugfixes and improvements

fix: correct spelling in continuous_api docstring (#45749) by @Dhruv908615
Fix link to modular transformers documentation (#45746) by @SangbumChoi
Gemma4: fix failed test cases (#45568) by @kaixuanliu
Fix CI: Allow more artifacts to be download in CI (#45785) by @ydshieh
Add concurrency to PR CI workflow file (pr-ci-caller.yml) (#45786) by @ydshieh
Reorder decorators for autodoc and dataclass (#45702) by @zucchini-nlp
Unwrap text_config in AutoModelFor*.from_config (#45770) by @jamesbraza
fix: Added Mps support in float fallback backends list (#45687) by @rigen1048
Github Actions PR CI (caller) (#45476) by @ydshieh
make sure we call check_auto in CI (#45775) by @tarekziade
Fix auto mapping script (#45774) by @Cyrilvallez
[MINISTRAL3] Fix conversion script yarn's apply_scale support. (#45744) by @juliendenize
[nemotron_h] respect _no_reinit flag on dt_bias and out_proj.weight (#45591) by @vai-minzhou
fix(utils): Resolve backbone utils test regressions (#45594) by @harshaljanjani
[CB] Better overall script and decode bucketting (#45653) by @remi-or
[docs] model testing (#45152) by @stevhliu
update dev (#45726) by @vasqu
Doc translate to Persian(farsi) (#45664) by @zeoses
[OAI Privacy Filter] Add integration test (#45725) by @vasqu
Speedup Qwen2VLImageProcessor (#45719) by @lgeiger
Remove dead beam-search dummies from dummy_pt_objects.py (#45722) by @jw9603
chore(typing): add ty type checking for 10 utility files (#45703) by @moonbogi
Llama3 video fix (#45040) by @sywangyi
Fix custom-module copies inheriting read-only permissions (#45686) by @nurpax
Python code in model docs (#45608) by @zucchini-nlp
fix failed test cases for blt model (#45596) by @kaixuanliu
chore(typing): add ty type checking for 3 pipeline files (#45667) by @moonbogi

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@artem-spector: Add Granite 4.1 Vision (granite4_vision) (#45597)
@SindhuRaghuram97: First model (#45788)
@nuxlear: Add EXAONE 4.5 implementations (#45471)
@ArthurZucker: Add DeepSeek V4 (#45643)
@remi-or: [CB] Better overall script and decode bucketting (#45653)
@zhang-prog: [Model] Add PP-FormulaNet Model Support (#45626)
@zvik: Support for a new Granite-Speech-Plus model (#45695)
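
The Tokenization note in the release description attributes the regression to rebuilding the special-token set on every iteration. The sketch below illustrates that general pattern and its fix in plain Python. It is not the Transformers source: the function names, the toy vocabulary, and the ids are hypothetical, and the example makes no attempt to reproduce the reported ~300x figure.

```python
def decode_slow(ids, vocab, special_tokens):
    """Anti-pattern: the set of special tokens is rebuilt for every id."""
    tokens = []
    for i in ids:
        special = set(special_tokens)  # rebuilt on each iteration
        token = vocab[i]
        if token in special:
            continue
        tokens.append(token)
    return tokens


def decode_fast(ids, vocab, special_tokens):
    """Fix: build the set once, before the loop, and reuse it."""
    special = set(special_tokens)
    return [vocab[i] for i in ids if vocab[i] not in special]


# Toy data, invented for this example.
vocab = {0: "<s>", 1: "hello", 2: "world", 3: "</s>"}
special_tokens = ["<s>", "</s>"]
ids = [0, 1, 2, 3]

slow_result = decode_slow(ids, vocab, special_tokens)
fast_result = decode_fast(ids, vocab, special_tokens)
print(slow_result, fast_result)
```

Both functions return the same tokens; the difference is only how often the set is constructed, which matters when the special-token list is long or the id sequence is large.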