In the final installment of APMdigest's 2025 Predictions Series, industry experts offer predictions on how AI will evolve and impact technology and business in 2025. Part 5 covers the infrastructure and hardware supporting AI.
AI CHALLENGE: INFRASTRUCTURE
Looking ahead to 2025, the emphasis will move beyond simply adopting AI. The focus will shift to building and developing tech stacks that truly enhance AI capabilities. As models become more advanced, the demand for robust servers, high-performance accelerators, and extensive storage will increase. The real challenge isn't using AI tools and models; it's ensuring the infrastructure can realize AI's full potential while remaining flexible enough to adapt to each organization's specific needs. While many have overcome the initial barrier of building models, the focus must now shift to deploying and scaling these tools effectively. It's time to consider not just how to use AI, but what tools and systems will fuel and protect its next phase of growth.
Liat Mendelson Honderdors
Principal Product Manager, Leaseweb
AI CHALLENGE: RESILIENCE
2025 will reveal AI's resilience problem: Increased AI adoption and innovation will expose a resilience problem that must be addressed. Application, data center, and network outages are unfortunately an all-too-common occurrence, but how do these legacy IT challenges impact our new world of AI-powered everything? AI applications are highly resource-intensive, yet we've only scratched the surface, given AI's currently limited use within most organizations. As AI applications and models scale in adoption, we will see the detrimental impact of systems and tools that haven't prioritized ultra-resilience as a core design principle.
Karthik Ranganathan
Co-Founder and Co-CEO, Yugabyte
AI CHALLENGE: ENERGY CONSUMPTION
AI's massive energy consumption: Energy consumption is a pressing question, given the explosion of AI. Microsoft recently signed a deal to restart a reactor at the Three Mile Island nuclear plant, and we're likely to see hyperscalers effectively becoming utility companies to serve their energy needs. There may be greater focus on the sustainability of AI relative to the benefits it can offer. Spending on AI is projected to outstrip spending on broadband infrastructure, even though it's unclear what the outcome will be, and that is worth bearing in mind.
Gordon Van Huizen
SVP of Strategy, Mendix
AI CHALLENGE: THE NETWORK
Everyone's Worried About Power, But What About the Network? Concerns about the growing power demands of data centers have been dominating headlines lately. But as data centers scale to meet massive AI training and inference computing needs, available networking bandwidth and fiber resources will emerge as the next critical challenge in 2025. Fiber build-outs at the edge, higher-capacity subsea cables, and reduced network latency will become pressing concerns as data capacity pushes the limits of today's infrastructure. Network readiness is a crucial factor in the future of data center deployments.
Jason Carolan
Chief Innovation Officer, Flexential
AI CHALLENGE: INFRASTRUCTURE INVESTMENT
AI and power infrastructure investment and expansion remain a national priority: If AI compute scaling maintains its current growth trajectory, GPU clusters will surge in size from 100K+ GPUs to 1M+ GPUs, reaching gigawatt scale before 2030. The US is currently the global leader in AI, maintaining a computing, chip, and technology advantage over its nearest rival, China, with approximately twice as many installed computing servers.
However, since 2000 China has outpaced the US in adding power infrastructure (925 GW of new generation versus the US's 51 GW), primarily in support of its manufacturing base but readily able to pivot to support data center infrastructure. For the US to maintain its advantage, power infrastructure investment needs to expand materially, into the 100+ gigawatt range.
Thankfully, this appears to have become a bipartisan area of political concern, and I believe the prioritization of national economic and security interests will help accelerate infrastructure development. However, a question remains: will investment come fast enough to maintain the US technological advantage, or will innovation be bottlenecked by capital or regulatory constraints?
Tom Traugott
SVP of Strategy, EdgeCore Digital Infrastructure
AI INFRASTRUCTURE MARKET
Bigger players will continue to dominate the AI infrastructure market, buying up vast volumes of GPUs, and constraining where AI is developed. AI infrastructure will likely remain concentrated in the hands of a few dominant tech giants and cloud hyperscalers. Their massive investments in GPUs and other specialized hardware will create a significant barrier to entry for smaller players, requiring them to shift where they innovate and potentially limiting the diversity of AI applications.
David DeSanto
Chief Product Officer, GitLab
PUBLIC-PRIVATE PARTNERSHIPS
Public-Private Partnerships - The Key to AI Infrastructure Growth: The soaring energy needs of AI-driven data centers are capturing attention, but the real challenge lies in securing sustainable energy sources. Public-private partnerships will play a crucial role in addressing these energy demands and in developing alternative energy sources, like nuclear and natural gas, that are essential to powering future AI infrastructure.
Chris Downie
CEO, Flexential
SILICON DIVERSITY
Silicon Diversity Will Revolutionize AI Efficacy/ROI: A broader array of state-of-the-art GPUs will enable purpose-built AI models to drive the next wave of innovation in the enterprise. 2025 will see increased attention to matching AI workloads with optimal compute resources, driving exponential demand for specialized GPUs. Silicon diversity — the emergence of highly specialized AI compute chips — will provide tailored solutions for each stage of the AI model lifecycle. Organizations that embrace this diversity will enjoy enhanced AI capabilities at reduced costs. Those who fail to leverage silicon diversity will risk falling behind in both performance and cost efficiency.
Kevin Cochrane
CMO of Vultr
GPU MARKET SELF-CORRECTION
The GPU market will self-correct (in most places), allowing companies to better manage their AI-related costs and goals. The problem with AI and GPU usage is that the "super-chip" future will not be evenly distributed at first. Companies in Europe are more worried about the GPU shortage than companies in the United States, where there is greater capacity. Regional availability will be a longer-term problem, and even the option to route traffic across deployments to regions with spare GPU capacity gives some organizations pause. There may be regional data laws to consider, and even without them, it's still a mental shift for security architects: sending your data to a different region, even through a secure connection and within the same platform, is something they will need to vet and get comfortable with. As a result, we'll have to get creative, with cloud service providers and new players creating new chips to meet demand, alongside new models that are both highly capable and cheaper to run.
Baris Gultekin
Head of AI, Snowflake
ALT-CLOUD
AI - Catalyst for the Alt-cloud: AI will become smarter and more dependable in the next year, but businesses will require agile, scalable, open, composable ecosystems to unlock its full potential — something Big Tech's cloud titans aren't capable of delivering. Enterprises will increasingly look to alternative cloud providers to supply the kind of infrastructure that supports the rapid deployment of new AI models without skyrocketing overheads. These open ecosystems will supplant the monolithic, rigid, and costly single-vendor paradigm that has disproportionately favored enterprises operating closer to the traditional tech heartlands, leveling the playing field for AI innovation across all regions of the world.
J.J. Kardwell
CEO, Vultr
EDGE-FOCUSED ARCHITECTURE
The Rise of Agentic AI Will Push Us to the Edge: In 2025, agentic AI will leap from concept to necessity, quickly redefining enterprise automation. Self-directed AI applications will allow organizations to make real-time, data-driven decisions, particularly in sectors already making use of sovereign and private clouds. Expect early enterprise-level adopters to emerge where CapEx isn't an issue, deploying high-performance GPU and CPU clusters for mission-critical applications. Simultaneously, lighter agentic AI solutions will flourish through alternative cloud providers, enabling serverless inference at the edge and slashing costs and complexity. By outsourcing infrastructure management, businesses will be able to focus on optimizing the AI application layer, unlocking unparalleled productivity and customer engagement. To support the massive scale of AI inference required, organizations will increasingly deploy specialized models paired with vector databases and retrieval-augmented generation (RAG) at edge locations. This edge-focused architecture will deliver the ultra-low latency AI agents need to handle the volume of interactions that agentic AI at scale demands.
J.J. Kardwell
CEO, Vultr
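The edge pattern described above, a specialized model paired with a colocated vector database for RAG, can be sketched minimally. The sketch below is purely illustrative, not any vendor's API: `EdgeVectorStore` and its bag-of-words embedding are toy stand-ins for a real vector database and embedding model running at an edge site.

```python
import math
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

class EdgeVectorStore:
    """Toy in-memory stand-in for a vector database at an edge site."""

    def __init__(self):
        self.vocab = {}   # token -> dimension index
        self.docs = []    # list of (sparse_vector, original_text)

    def _embed(self, text):
        # Sparse bag-of-words vector; a real deployment would call a
        # small local embedding model here instead.
        counts = defaultdict(float)
        for tok in tokenize(text):
            dim = self.vocab.setdefault(tok, len(self.vocab))
            counts[dim] += 1.0
        return counts

    @staticmethod
    def _cosine(a, b):
        dot = sum(v * b.get(d, 0.0) for d, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def add(self, text):
        self.docs.append((self._embed(text), text))

    def search(self, query, k=2):
        # Rank indexed documents by cosine similarity to the query.
        q = self._embed(query)
        ranked = sorted(self.docs, key=lambda d: -self._cosine(q, d[0]))
        return [text for _, text in ranked[:k]]

# Documents indexed locally at the edge site.
store = EdgeVectorStore()
store.add("GPU cluster utilization is at 80 percent")
store.add("Edge site latency target is under 10 milliseconds")

# RAG step: retrieve context locally, then hand the assembled prompt to
# a locally hosted model (not shown) for low-latency inference.
question = "what is the latency target"
context = store.search(question, k=1)
prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
```

In a production edge deployment, the exact-scan search above would be replaced by an approximate-nearest-neighbor index, and the retrieved context would feed a specialized model served next to the data, which is what keeps round-trip latency low.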
AI DATA CENTERS
The Next Wave of Data Centers will be Built to Handle AI Demand: While AI has already driven significant demand for data centers, this is just the beginning. The next wave of data centers will focus on distributed architectures to handle AI inference at scale, which will require more robust hybrid cloud environments.
Chris Downie
CEO, Flexential
Designing AI data centers with resiliency at the core: Given the high power requirements of AI and GPU clusters, everything is now larger and denser, which introduces new challenges in designing "the modern data center." Facilities and campuses over the next 5-10 years will need significant upsizing, from the 100-300 megawatts typical of hyperscale campuses today to the gigawatt level corresponding to projected 1M+ GPU clusters. 2025 will be the year developers more broadly advance plans and designs that meet those requirements. Nonetheless, AI data centers must still be designed with uptime in mind to ensure GPUs don't suffer outages at the wrong time, whether mid-training or during an inference call. Some degree of failure is expected in large GPU clusters, but that failure is at the hardware level; a data center outage has the potential to be catastrophic, damaging very expensive GPUs in a thermal runaway event.
Tom Traugott
SVP of Strategy, EdgeCore Digital Infrastructure
NUCLEAR-POWERED AI DATA CENTERS
Nuclear-Powered AI Data Centers - A Sustainable Future: As AI models grow in complexity and scale, the demand for computing power will soar, and energy consumption along with it. Nuclear energy, a clean and reliable source of power, will emerge as a critical solution for powering AI data centers. This will not only address the energy needs of AI but also reduce carbon emissions and contribute to a sustainable future. Nuclear power will help maintain uninterrupted operation of AI data centers and will be able to scale to meet the needs of large-scale AI infrastructure.
Karthik Sj
GM of AI, LogicMonitor
ON-PREMISE AI DEPLOYMENTS
More companies will run customized AI models on-premise: In 2025, we will see a shift toward on-premise AI deployments. As open-source models become more cost-effective and accessible, organizations will increasingly opt to run customized versions within their own data centers. As a result, it will be cheaper, faster, and easier to own AI models and fine-tune them to individual needs. Companies will find they can combine their data with existing models and tailor the experience for their customers at a fraction of today's costs. Meanwhile, the increased compliance risks associated with AI will drive regulated industries, such as financial institutions and government agencies, to deploy models in air-gapped environments for greater control over data privacy and security, as well as reduced latency.
Emilio Salvador
VP of Strategy and Developer Relations, GitLab