Google launches Gemma 4, its big bet on open and local AI

  • Gemma 4 is a family of four open AI models based on Gemini 3 technology, with sizes ranging from E2B to 31B parameters.
  • The models combine high "parameter intelligence" with local execution on mobile, edge and proprietary infrastructures, thanks to context windows of up to 256K tokens.
  • The Apache 2.0 license allows unrestricted commercial use, strengthens digital sovereignty, and facilitates deployment in regulated environments in Europe.
  • Gemma 4 is multimodal (text, image, video and audio in small models), supports more than 140 languages ​​and is available in Google AI Studio, Hugging Face, Kaggle and Ollama.

Gemma 4 AI Model

Google has taken an important step in its strategy to open artificial intelligence With the launch of Gemma 4, a new family of models aims to combine high levels of reasoning with much more modest hardware requirements. The company presents this generation as a serious alternative for those who need to run advanced AI on their own infrastructure, from mobile devices to data centers.

Far from being a single model, Gemma 4 is a complete range of four open variantsDesigned for developers, businesses, and public entities that want more control over their data and deployments, the proposal fits particularly well with the requirements of Digital sovereignty and regulatory compliance in Europewhere the power to decide where the execution and where the data is stored is becoming increasingly important.

A family of four models focused on "parameter-based intelligence"

Gemma 4 model family

Gemma 4 has been built on the same technological foundation as Gemini 3But with a clear objective: to maximize what Google calls "parameter-based intelligence"Instead of competing solely on size, the company boasts of having achieved performance levels comparable to much larger systems in relatively compact models.

The family consists of four different sizes: Effective 2B (E2B), Effective 4B (E4B), a model of 26B with Mixture of Experts (MoE) architecture and a dense variant of 31B parametersThe latter is already located in the top 3 in Arena AI's ranking for open models, surpassing alternatives that multiply its number of parameters by twenty, something especially relevant for those looking to reduce GPU costs without sacrificing quality.

Model 26B MoE It is optimized to activate only a fraction of its parameters (around 3,8B) in inference, improving token generation speed and energy efficiency. In contrast, version 31B dense It is positioned as the preferred option for demanding fine-tuning tasks, complex orchestration, and intensive use in business or institutional environments.

Google emphasizes that, in terms of public benchmarks, these variants compete directly with heavier models from other providers, including those from Chinese manufacturers such as DeepSeek or Qwen, which in recent years had become strong in the open source ecosystem. Gemma 4's 31B is listed as the third best open model in Arena AI, while the 26B MoE also ranks highly.

From a business perspective, that relationship between size and performance implies less hardware expenditure, lower latency and the ability to run boundary models in a single NVIDIA H100 80GB GPUThis opens the door for medium-sized European companies to work with advanced AI without investing in disproportionate infrastructure.

Pocket-sized AI: mobile, IoT and edge computing

Gemma 4 on mobile devices

The smaller models, E2B and E4BThey are expressly designed to operate at the network edge, that is, in mobile devices, IoT and local hardwareGoogle notes that these variants are optimized to run on Android smartphones, Raspberry PiJetson Nano and other low-power systems, with very low latency and even without an internet connection.

In this segment, the priority is not just raw power, but the ability to offer multimodal functions and rapid response in resource-constrained environments. Gemma 4 edge models can handle text, images and video, and in the case of E2B and E4B they add native support for audioThis enables use cases such as local voice assistants, field image recognition, or real-time video analytics without the need to send data to the cloud.

The context window for these lightweight models reaches the 128.000 tokensThis is sufficient to process long documents, extensive conversations, or relevant code snippets in a single prompt. According to Google, this combination of broad context and local execution helps eliminate friction. privacy, connectivity and latencyThis is highly relevant for industrial, healthcare, or educational projects in Europe, where restrictions on data processing are becoming increasingly strict.

From the perspective of hardware manufacturers, Gemma 4 opens the door to integration Advanced AI directly into consumer productsFrom smartphones and tablets to medical devices and industrial sensors, the company has highlighted that these models are designed to work with chips from common Android ecosystem providers, such as Qualcomm and MediaTek, facilitating their widespread adoption.

Furthermore, the architecture of edge models leverages techniques such as Per-Layer Embeddings (PLE) to maximize the efficiency of parameter use, allowing for reasoning and context understanding at a much lower computational cost than usual in general-purpose models.

Multimodality, agents, and advanced developer support

Gemma's 4 multimodal capabilities

One of Gemma 4's strengths is its clear commitment to the agentic workflowsThe models are not limited to generating text: they natively integrate function calling, structured JSON output, and system instructionsThis allows the construction of autonomous agents that orchestrate various steps, call external APIs, and return results in formats easily integrated with enterprise applications.

Google insists that all models in the Gemma 4 family have been designed as high-level reasonerswith configurable thinking modes to adjust the depth of reasoning according to the task. This translates into better results in multi-stage reasoning, offline code generation and complex problem-solving, key aspects in corporate and public administration environments where reliability is required.

In the multimodal plane, the four models can process Text and images with different resolutions and aspect ratios, while the E2B and E4B variants expand that capacity to video and audioThis combination makes possible, for example, systems that analyze documents with graphics, industrial monitoring videos, or rich educational content, and generate contextual responses in real time.

The context window reaches the 256.000 tokens in the largest modelsThis allows users to upload entire code repositories, lengthy legal contracts, or large volumes of technical documentation in a single query. For support, consulting, or IT audit teams, this makes it easier to automate tasks that previously required many hours of manual review.

In terms of languages, Gemma 4 natively supports more than 140 languagesFor Europe, and specifically for Spain, this means that multilingual solutions can be developed that cover everything from the main EU languages ​​to less represented languages, helping to meet accessibility and inclusion goals in public and private services.

Cloud integration, digital sovereignty, and deployment in Europe

The deployment of Gemma 4 is not limited to on-premises hardware. Google has integrated these models into its cloud offering through Vertex A.I y Google Kubernetes Engine (GKE)allowing organizations to configure dedicated computing resources and scale inference workloads on demand. For regulated European sectors, this is combined with options for Sovereign Cloud and air-gapped or on-premise deployments, adjusted to the data residency requirements and compliance with the General Data Protection Regulation (GDPR).

The company highlights that the bfloat16 precision weights of the larger models can be run efficiently in a single 80GB NVIDIA H100 GPUreducing the barrier to entry for medium-sized companies or public institutions that want to maintain control of their infrastructure. In quantized versions, the models can also work in consumer hardware or workstations, expanding the range of possible deployments.

For technology managers in Spain and the rest of Europe, this combination of open model, controlled deployment, and sovereign cloud support It allows for the design of hybrid architectures: part of the intelligence can reside in local data centers, while other less sensitive workloads run in the public cloud, all while maintaining a common technological base.

In addition, Google offers a Agent Development Kit (ADK)A modular framework that simplifies creating, testing, and deploying Gemma 4-based agents. It also relies on services such as Cloud Run with NVIDIA RTX PRO 6000 GPUs (Blackwell) in serverless mode, which allows high-intensity pilot projects to be launched without the need to acquire your own hardware from day one.

In a European context where the debate on AI usually revolves around control, transparency, and auditability, the possibility of Deploy open models under Apache 2.0 in controlled infrastructures It is especially attractive to administrations, banks, insurance companies or companies in the health sector that need to reconcile innovation with strict regulatory frameworks.

Apache License 2.0, open ecosystem and community traction

If there is one aspect that has generated particular interest in the community, it is the decision to license Gemma 4 under Apache 2.0Previous versions of Gemma used custom licenses that raised legal questions for commercial products; now, with a standard open-source license, Developers and companies can modify, redistribute, and monetize models with much less friction.

This opening comes at a time when Google is trying regain ground in the open models ecosystemThis comes after a period in which alternatives like Meta's Llama or Chinese models (DeepSeek, Qwen, GLM, Minimax) had gained adoption rates. Influential voices in the sector, such as the co-founder of Hugging Face, have described the move as a "huge milestone" for local AI, highlighting that legal teams now have a much clearer framework for approving projects based on Gemma 4.

The ecosystem surrounding the Gemma family was already showing strength before this version. Google notes that previous generations exceed... 400 million downloads and that the community has created more than 100.000 variants adapted to different languages ​​and use cases. Among the most striking examples are models specialized in Bulgarian or cancer research tools such as Cell2Sentence-Scale developed at Yale University.

With Gemma 4, the company hopes that the "Gemmaverse" will expand even further, inviting... European startups, universities and research centers to create their own derivatives. The combination of a permissive license and open weights allows for the development of versions focused on specific sectors, such as healthcare, justice, Industry 4.0, or education, which can then be shared or marketed without too many restrictions.

For Spanish companies, this situation means that it is possible to build proprietary solutions on Gemma 4—such as internal assistants, corporate search engines, or advanced analytics systems—while maintaining control of the code, data, and infrastructure, something that fits well with the trend of strengthening the European technological sovereignty.

Use cases: from startups to large corporations

Gemma 4 has been presented with a wide range of potential applicationsIn the business world, models can be used to create multilingual virtual assistants capable of handling complex queries through advanced reasoning, or to automate code generation and review in development teams.

Larger models are geared towards tasks such as orchestration of agents, analysis of large volumes of documentationThis includes generating technical reports or assisting legal and compliance departments. The combination of broad context windows and multimodal support makes it easy for a single agent to work with contracts, emails, charts, monitoring system images, and audio recordings, all within the same workflow.

In education and the public sector, the ability to process text, images, and in some cases video and audio, allows for the creation of learning support platforms that generate summaries, step-by-step explanations, or materials adapted to different levels. Local implementation also helps to respect privacy requirements when working with sensitive data of minors or vulnerable groups.

In the startup arena, Gemma 4 can be the foundation of vertical products In fintech, digital health, logistics, or B2B SaaS, thanks to the flexibility offered by Apache 2.0, teams can do fine tuning of the model on their own data, deploy it on-premise or in the cloud and market the result without being tied to strict proprietary licenses.

Particularly interesting for Europe is the possibility of developing local AI solutions that respect national and community regulations, for example, by storing data in data centers located in European territory and keeping the models under the direct control of the organization, which may be key for projects linked to the future EU AI Regulation.

Where and how to access Gemma 4

Google has made the Gemma 4 weights available through various channels to facilitate their adoption by developers and researchers. The open weights can be downloaded from hugging face y GitHub, while use via interface and APIs is available in Google AI StudioIntegrations are also offered with Don'tDocker, Kaggle, and tools like LM Studio.

According to the company, Gemma 4 can be run locally on «billions of Android devices» and across a wide range of hardware: from Laptop GPUs and workstations, all the way to dedicated developer accelerators. This aligns with the strategy of extending advanced AI beyond large data centers, into end-user devices and edge computing environments.

For those who want to start with quick tests, the most direct option is to use Google AI Studio for the 26B and 31B models or the Google AI Edge Gallery in the case of the E2B and E4B variants. In parallel, developer communities on platforms like Hugging Face are already publishing adaptations and ready-to-use configurations for different environments.

In Spain and other European countries, it is expected that local integrators and managed service providers will begin to offer turnkey solutions based on Gemma 4, combining sovereign cloud deployments, support in Spanish and adaptation to specific sector regulations, such as those of financial services or healthcare.

Overall, the launch of Gemma 4 positions Google as one of the most relevant players in the field of open and locally executable AI modelsThis comes at a time when European industry is demanding tools that combine high performance, data control, and clear licensing frameworks to build long-term commercial products.

edge AI more privacy
Related article:
Edge AI and privacy: Powerful AI without giving away your data