Why Amazon EC2 Trn2 instances and UltraServers?
Amazon EC2 Trn2 instances, powered by 16 AWS Trainium2 chips, are purpose-built for generative AI and are the most powerful EC2 instances for training and deploying models with hundreds of billions to more than a trillion parameters. Trn2 instances offer 30-40% better price performance than the current generation of GPU-based EC2 P5e and P5en instances. With Trn2 instances, you can get state-of-the-art training and inference performance while lowering costs, so you can reduce training times, iterate faster, and deliver real-time, AI-powered experiences. You can use Trn2 instances to train and deploy models, including large language models (LLMs), multimodal models, and diffusion transformers, to build next-generation generative AI applications.
To lower training times and deliver breakthrough response times (per-token latency) for the most demanding, state-of-the-art models, you might need more compute and memory than a single instance can deliver. Trn2 UltraServers use NeuronLink, our proprietary chip-to-chip interconnect, to connect 64 Trainium2 chips across four Trn2 instances, quadrupling the compute, memory, and networking bandwidth available in a single node and offering breakthrough performance on AWS for deep learning and generative AI workloads. For inference, UltraServers help deliver industry-leading response times to create the best real-time experiences. For training, UltraServers boost model training speed and efficiency with faster collective communication for model parallelism compared to standalone instances.
You can easily get started on Trn2 instances and Trn2 UltraServers with native support for popular machine learning (ML) frameworks such as PyTorch and JAX.
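To make the 30-40% price-performance figure concrete, here is a minimal sketch of the arithmetic. It assumes "price performance" means work completed per dollar, which is a common reading but is not defined on this page. The function name `relative_cost` is hypothetical and only illustrates the calculation.

```python
def relative_cost(price_performance_gain: float) -> float:
    """Cost of a fixed workload relative to the baseline.

    Assumes price performance = work completed per dollar, so a
    fractional gain g means the same work costs 1 / (1 + g) as much.
    """
    if price_performance_gain <= -1.0:
        raise ValueError("gain must be greater than -1.0")
    return 1.0 / (1.0 + price_performance_gain)


# The two ends of the 30-40% range quoted above.
for gain in (0.30, 0.40):
    cost = relative_cost(gain)
    print(f"{gain:.0%} better price performance: "
          f"{cost:.1%} of baseline cost, {1 - cost:.1%} savings")
```

Under that assumption, a 30-40% gain in price performance corresponds to roughly 23-29% lower cost for the same workload, not 30-40% lower cost.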
Benefits
Features
Customer and partner testimonials
Here are some examples of how customers and partners plan to achieve their business goals with Amazon EC2 Trn2 instances.
- Anthropic
- Databricks
- poolside
- Itaú Unibanco
  Itaú Unibanco's purpose is to improve people's relationship with money, creating a positive impact on their lives while expanding their opportunities for transformation. At Itaú Unibanco, we believe that each customer is unique, and we focus on meeting their needs through intuitive digital journeys that leverage the power of AI to constantly adapt to their consumer habits.
- NinjaTech AI
  Ninja is an all-in-one AI agent for unlimited productivity: one simple subscription with unlimited access to the world's best AI models and top AI skills such as writing, coding, brainstorming, image generation, and online research. Ninja is an agentic platform that offers “SuperAgent”, which uses a mixture-of-agents approach with world-class accuracy comparable to, and in some categories better than, frontier foundation models. Ninja's agentic technology demands the highest-performance accelerators to deliver the unique real-time experiences our customers expect.
- Ricoh
  The RICOH machine learning team develops workplace solutions and digital transformation services designed to manage and optimize the flow of information across our enterprise solutions.
- PyTorch
- Refact.ai
  Refact.ai offers comprehensive AI tools such as code auto-completion powered by Retrieval-Augmented Generation (RAG), providing more accurate suggestions, and a context-aware chat using both proprietary and open-source models.
- Karakuri Inc.
- Stockmark Inc.
- Brave
- Anyscale
  Anyscale is the company behind Ray, an AI compute engine that fuels ML and generative AI initiatives for enterprises. With Anyscale's unified AI platform driven by RayTurbo, customers see up to 4.5x faster data processing, 10x lower-cost batch inference with LLMs, 5x faster scaling, 12x faster iteration, and cost savings of 50% for online model inference by optimizing resource utilization.
- Datadog
- Hugging Face
- Lightning AI
  Lightning AI, the creator of PyTorch Lightning and Lightning Studios, offers the most intuitive, all-in-one AI development platform for enterprise-grade AI. Lightning provides full-code, low-code, and no-code tools to build agents, AI applications, and generative AI solutions, lightning fast. Designed for flexibility, it runs seamlessly on your cloud or ours, leveraging the expertise and support of a developer community of more than 3 million.
- Domino Data Lab
Getting started
Product details
Instance size | Available in EC2 UltraServers | Trainium2 chips | Accelerator memory (TB) | vCPUs | Memory (TB) | Instance storage (TB) | Network bandwidth (Tbps) | EBS bandwidth (Gbps) |
trn2.48xlarge | No | 16 | 1.5 | 192 | 2 | 4 x 1.92 NVMe SSD | 3.2 | 80 |
trn2u.48xlarge | Yes (Preview) | 16 | 1.5 | 192 | 2 | 4 x 1.92 NVMe SSD | 3.2 | 80 |
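The per-instance figures in the table can be combined with the statement that an UltraServer connects four Trn2 instances to estimate UltraServer-level totals. The sketch below is illustrative only: the `InstanceSpec` class and `scale` function are hypothetical names, and it assumes each quantity scales linearly with the number of instances, as the "quadrupling" description suggests. Only the 64-chip total is stated directly on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceSpec:
    """Per-node figures taken from the product details table."""
    trainium2_chips: int
    accelerator_memory_tb: float
    network_bandwidth_tbps: float


# Values for trn2u.48xlarge as listed in the table.
TRN2U_48XLARGE = InstanceSpec(
    trainium2_chips=16,
    accelerator_memory_tb=1.5,
    network_bandwidth_tbps=3.2,
)


def scale(spec: InstanceSpec, instances: int) -> InstanceSpec:
    """Aggregate figures for several instances, assuming linear scaling."""
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return InstanceSpec(
        trainium2_chips=spec.trainium2_chips * instances,
        accelerator_memory_tb=spec.accelerator_memory_tb * instances,
        network_bandwidth_tbps=spec.network_bandwidth_tbps * instances,
    )


# An UltraServer spans four Trn2 instances.
ultraserver = scale(TRN2U_48XLARGE, instances=4)
print(ultraserver.trainium2_chips)  # 64, matching the chip count stated above
print(ultraserver.accelerator_memory_tb, ultraserver.network_bandwidth_tbps)
```

The memory and bandwidth totals this produces are derived estimates under the linear-scaling assumption, not published specifications.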