Edge AI (https://edgeai.org/) – Advanced Local LLMs

Deepseek-R1 – 1.5B Parameters
https://edgeai.org/deepseek-r1-1-5b-parameters/ | Wed, 05 Mar 2025

DeepSeek-R1: Optimized Small Models for Edge AI

Introduction

DeepSeek-R1 represents a new generation of lightweight, high-performance reasoning models optimized for Edge AI applications. These models deliver strong reasoning, coding, and mathematical capabilities while remaining efficient enough for deployment in resource-constrained environments.

Why Small Models Matter for Edge AI

Edge AI demands models that strike a balance between computational efficiency and performance. Small-scale LLMs provide:

  • Lower Latency: Faster inference speeds for real-time applications.
  • Reduced Power Consumption: Ideal for battery-operated and embedded systems.
  • Compact Deployment: Can run on local devices without heavy cloud dependencies.

DeepSeek-R1 Small Model Variants

The DeepSeek team has distilled knowledge from larger models into smaller, dense models. These lightweight models leverage insights from extensive reasoning datasets, achieving strong benchmark results while being optimized for Edge AI use cases.

  • DeepSeek-R1-Distill-Qwen-1.5B (1.1GB)
    • Ideal for lightweight natural language processing and reasoning tasks.
    • Optimized quantization (Q4_K_M) for edge efficiency.
    • Run with: ollama run deepseek-r1:1.5b
  • DeepSeek-R1-Distill-Qwen-7B
    • Enhanced reasoning and comprehension with moderate computational requirements.
    • Suitable for embedded AI applications that require a balance of performance and efficiency.
    • Run with: ollama run deepseek-r1:7b
  • DeepSeek-R1-Distill-Llama-8B
    • Based on Llama 3.1, offering optimized performance for reasoning tasks.
    • Run with: ollama run deepseek-r1:8b
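
A deployment script might pick the variant by the memory available on the target device. The sketch below is a hypothetical helper, not part of DeepSeek or Ollama; the 1.1 GB figure comes from the list above, while the 7B and 8B sizes are rough estimates for Q4-quantized downloads.

```python
# Hypothetical helper: pick the largest DeepSeek-R1 distill that fits in a
# given memory budget. Ordered smallest to largest; sizes are approximate.
DEEPSEEK_R1_VARIANTS = [
    ("deepseek-r1:1.5b", 1.1),  # DeepSeek-R1-Distill-Qwen-1.5B (from the post)
    ("deepseek-r1:7b",   4.7),  # DeepSeek-R1-Distill-Qwen-7B (estimate)
    ("deepseek-r1:8b",   4.9),  # DeepSeek-R1-Distill-Llama-8B (estimate)
]

def pick_variant(available_gb: float) -> str:
    """Return the Ollama tag of the largest variant that fits the budget."""
    fitting = [tag for tag, size_gb in DEEPSEEK_R1_VARIANTS
               if size_gb <= available_gb]
    if not fitting:
        raise ValueError(f"no DeepSeek-R1 variant fits in {available_gb} GB")
    return fitting[-1]  # list is ordered smallest to largest

print(pick_variant(2.0))  # deepseek-r1:1.5b
```

For example, a device with 2 GB free would get the 1.5B tag, while an 8 GB single-board computer could take the 8B distill.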

Applications in Edge AI

These small models are particularly suited for:

  • On-Device Assistants: Running AI-powered assistants without cloud reliance.
  • Autonomous Systems: Integrating AI into robotics, drones, and IoT devices.
  • Security & Authentication: Deploying lightweight AI for on-device identity verification.
  • Industrial Edge Computing: Enhancing smart manufacturing and predictive maintenance.

Licensing & Flexibility

DeepSeek-R1 small models are open-source under the MIT License, allowing commercial use, modification, and further fine-tuning. The Qwen-based models originate from Qwen-2.5 (Apache 2.0 License), and Llama-derived models follow Meta’s licensing terms.

Conclusion

DeepSeek-R1’s distilled small models present a breakthrough for Edge AI, delivering robust performance with minimal computational overhead. These models enable AI at the edge: secure, fast, and efficient.

For more information and downloads, visit https://www.ollama.com/library/deepseek-r1:1.5b

Mistral – 7B Parameters
https://edgeai.org/mistral-7b-parameters/ | Tue, 25 Feb 2025

Mistral for Edge AI Applications

Mistral is a compact and powerful 7B parameter model, optimized for instruction following and text completion while being lightweight enough for Edge AI deployments. With an Apache 2.0 license, Mistral provides unrestricted flexibility for customization and integration into real-world applications.

Why Mistral for Edge AI?

  • Efficiency: Offers top-tier performance in a small model size, ideal for low-latency edge computing.
  • Benchmark Superiority: Outperforms Llama 2 13B across all major benchmarks and rivals CodeLlama 7B in coding tasks.
  • Versatile Deployment: Available in both instruct (optimized for guided interactions) and text (pure text completion) variants.
  • Function Calling Capabilities: Supports structured API interactions, making it suitable for intelligent automation and system integrations.

Model Versions

  • Mistral 0.3 (Latest) – Supports function calling for dynamic applications.
  • Mistral 0.2 – Minor update refining previous functionalities.
  • Mistral 0.1 – Initial release.

Function Calling for Edge AI

Mistral 0.3 enables function calling via Ollama’s raw mode, making it useful for real-world tasks such as:

  • Smart IoT Devices: Fetch real-time data and trigger automated responses.
  • Edge-Based Assistants: Process and retrieve localized information efficiently.
  • Industrial Automation: Execute structured commands in autonomous systems.

Example Function Call for Weather Retrieval

[AVAILABLE_TOOLS] [{"type": "function", "function": {"name": "get_current_weather", "description": "Get the current weather", "parameters": {"type": "object", "properties": {"location": {"type": "string", "description": "The city and state, e.g. San Francisco, CA"}, "format": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "The temperature unit to use. Infer this from the user's location."}}, "required": ["location", "format"]}}}][/AVAILABLE_TOOLS][INST] What is the weather like today in San Francisco [/INST]

Example Response:

[TOOL_CALLS] [{"name": "get_current_weather", "arguments": {"location": "San Francisco, CA", "format": "celsius"}}]
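
On the receiving side, the [TOOL_CALLS] payload is plain JSON and can be dispatched to local functions. The sketch below is a minimal, hypothetical dispatcher; `get_current_weather` here is a stand-in implementation, not a real weather client.

```python
import json

def dispatch_tool_calls(raw: str, tools: dict) -> list:
    """Parse a Mistral [TOOL_CALLS] response and invoke the named functions.

    `tools` maps tool names to Python callables.
    """
    payload = raw.removeprefix("[TOOL_CALLS]").strip()
    calls = json.loads(payload)
    return [tools[call["name"]](**call["arguments"]) for call in calls]

def get_current_weather(location: str, format: str) -> str:
    # Placeholder: a real implementation would query a weather service.
    return f"18 degrees {format} in {location}"

response = ('[TOOL_CALLS] [{"name": "get_current_weather", "arguments": '
            '{"location": "San Francisco, CA", "format": "celsius"}}]')
results = dispatch_tool_calls(response, {"get_current_weather": get_current_weather})
print(results[0])  # 18 degrees celsius in San Francisco, CA
```

The same pattern extends to IoT triggers or industrial commands: each tool name maps to a local callable, and the model's structured output drives the dispatch.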

Deployment and Integration

Mistral can be easily deployed on edge devices via command-line or API:

CLI Usage

ollama run mistral

API Example

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Summarize recent AI advancements"
}'
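
By default the /api/generate endpoint streams its answer as newline-delimited JSON objects, each carrying a "response" fragment, with "done": true on the final object. A minimal sketch of reassembling such a stream, using a hand-written sample rather than a live server:

```python
import json

# Hand-written stand-in for a captured Ollama generate stream (NDJSON).
sample_stream = "\n".join([
    '{"model": "mistral", "response": "Edge", "done": false}',
    '{"model": "mistral", "response": " AI", "done": false}',
    '{"model": "mistral", "response": " advances.", "done": true}',
])

def collect_response(ndjson: str) -> str:
    """Concatenate the "response" fragments of a streamed generate reply."""
    return "".join(json.loads(line)["response"]
                   for line in ndjson.splitlines() if line)

print(collect_response(sample_stream))  # Edge AI advances.
```

Passing `"stream": false` in the request body instead returns a single JSON object, which is simpler for batch-style edge jobs.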

Summary

Mistral’s compact size and powerful performance make it an ideal choice for Edge AI deployments, whether for industrial automation, IoT integration, or AI-powered assistants. Its ability to handle structured function calls enhances real-world usability, making it a key player in next-generation AI systems at the edge.

Phi-3 – 3.8B Parameters
https://edgeai.org/phi-3-3-8b-parameters/ | Tue, 25 Feb 2025

Phi-3 is a family of state-of-the-art open AI models developed by Microsoft, optimized for efficiency, strong reasoning, and long-context processing.

Model Variants

Model                Parameters  Context Window  Ollama Command
Phi-3 Mini           3.8B        4K tokens       ollama run phi3:mini
Phi-3 Medium         14B         4K tokens       ollama run phi3:medium
Phi-3 Medium (128K)  14B         128K tokens     ollama run phi3:medium-128k (requires Ollama 0.1.39+)
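
When prompt length varies, a small helper can pick the variant whose context window fits. This is a hypothetical sketch, not part of Ollama; the 4K and 128K windows are taken as 4,096 and 131,072 tokens.

```python
# Hypothetical helper: map Phi-3 tags to their context windows (in tokens)
# and fall back to the 128K variant only when the prompt demands it.
PHI3_CONTEXT = {
    "phi3:mini": 4_096,
    "phi3:medium": 4_096,
    "phi3:medium-128k": 131_072,
}

def phi3_for_context(prompt_tokens: int, prefer: str = "phi3:mini") -> str:
    """Return `prefer` if the prompt fits its window, else the 128K tag."""
    if prompt_tokens <= PHI3_CONTEXT[prefer]:
        return prefer
    if prompt_tokens <= PHI3_CONTEXT["phi3:medium-128k"]:
        return "phi3:medium-128k"
    raise ValueError("prompt exceeds every Phi-3 context window")

print(phi3_for_context(50_000))  # phi3:medium-128k
```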

Key Features

  • High Efficiency – Optimized for low-resource and latency-sensitive environments.
  • Strong Reasoning – Excels in math, logic, coding, and general knowledge.
  • Long Context Handling – Up to 128K tokens for deep context retention.
  • Optimized for Real-World Applications – Well-suited for chatbots, coding, research, and general AI tasks.

Technical Details

  • Architecture: Dense decoder-only Transformer.
  • Training Data: 3.3 trillion tokens, including high-quality educational data, synthetic “textbook-like” data, and supervised fine-tuning.
  • Post-Training: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) for better instruction adherence and safety.
  • Training Hardware: 512 H100-80G GPUs over 7 days.
  • Training Cutoff: October 2023 (Offline dataset).

Performance Benchmarks

  • Phi-3 Mini achieves state-of-the-art results among models <13B on reasoning and common sense benchmarks.
  • Phi-3 Medium (14B) outperforms Gemini 1.0 Pro.

Responsible AI Considerations

  • Primarily trained in English; performance may degrade in other languages.
  • Potential for bias, misinformation, and hallucinations—requires human oversight.
  • Not optimized for high-risk applications like legal, medical, or financial decisions.

Deployment & Usage

  • Run with Ollama:
      ollama run phi3:mini          # 3.8B model
      ollama run phi3:medium        # 14B model
      ollama run phi3:medium-128k   # 14B model with 128K context
  • Works with: PyTorch, DeepSpeed, FlashAttention, ONNX, Azure AI Studio.

License & Availability

License: MIT (open-source).

Resources:

  • Phi-3 Technical Report
  • Phi-3 on Hugging Face
  • Microsoft Blog on Phi-3

Llama 3.2 – 1B Parameters
https://edgeai.org/llama-3-2-1b-parameters/ | Tue, 25 Feb 2025

Llama 3.2: Small-Scale Multilingual Models for Edge AI

Introduction

Meta’s Llama 3.2 models bring the power of large language models (LLMs) to smaller, efficient architectures designed for Edge AI applications. With 1B and 3B parameter versions, these models enable robust multilingual dialogue, retrieval, and summarization while remaining computationally lightweight.

Why Llama 3.2 for Edge AI?

Deploying AI at the edge requires models that are efficient, responsive, and adaptable. Llama 3.2’s small-scale variants provide:

  • Multilingual Support: Optimized for multiple languages, including English, Spanish, French, German, Portuguese, Hindi, Italian, and Thai.
  • Optimized for Local Processing: Reduces dependency on cloud infrastructure, enabling real-time inference on edge devices.
  • Instruction-Tuned Performance: Excels in tasks like summarization, prompt rewriting, and tool use.

Llama 3.2 Small Model Variants

  • Llama 3.2 – 1B (Efficient Edge AI Model)
    • Competitive with other 1B-3B models in multilingual knowledge retrieval and personal information management.
    • Ideal for lightweight on-device applications that require local processing.
    • Run with: ollama run llama3.2:1b
  • Llama 3.2 – 3B (Balanced Performance Model)
    • Outperforms models like Gemma 2 (2.6B) and Phi 3.5-mini in summarization, instruction following, and tool use.
    • Stronger reasoning capabilities while maintaining efficiency for edge deployment.
    • Run with: ollama run llama3.2
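
A deployment might route requests between the two tags based on the variant descriptions above: the 1B model for lightweight retrieval and personal information management, the 3B model for summarization, instruction following, and tool use. The mapping below is a hypothetical policy sketch, not an official recommendation.

```python
# Hypothetical task-to-model routing derived from the variant notes above.
TASK_TO_TAG = {
    "retrieval": "llama3.2:1b",       # multilingual knowledge retrieval
    "personal_info": "llama3.2:1b",   # personal information management
    "summarization": "llama3.2",      # 3B default tag
    "instruction": "llama3.2",
    "tool_use": "llama3.2",
}

def llama32_tag(task: str) -> str:
    """Return the Ollama tag for a task, defaulting to the 3B model."""
    return TASK_TO_TAG.get(task, "llama3.2")

print(llama32_tag("retrieval"))  # llama3.2:1b
```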

Applications in Edge AI

Llama 3.2’s compact models are suited for:

  • On-Device Multilingual Assistants: Providing real-time translation, summarization, and Q&A capabilities.
  • Autonomous Systems: Supporting multilingual AI interactions in robotics, IoT, and industrial automation.
  • Personalized AI Agents: Running private, local AI assistants for knowledge management and information retrieval.
  • Secure & Offline AI Processing: Ensuring AI-driven decision-making without constant internet connectivity.

Licensing & Availability

Llama 3.2 is released under Meta’s Llama 3.2 Community License Agreement, with usage governed by Meta’s Acceptable Use Policy. The models are freely available for research and commercial applications, subject to compliance with the license terms.

Conclusion

Llama 3.2’s 1B and 3B parameter models provide an optimal balance of efficiency and performance for Edge AI applications. Their multilingual capabilities, instruction tuning, and lightweight architecture make them powerful tools for deploying AI beyond traditional cloud environments.

For more details and downloads, visit EdgeAI.org.

Qwen2.5 – 0.5B – 1.5B – 3B Parameters
https://edgeai.org/qwen2-5-0-5b-1-5b-3b-parameters/ | Tue, 07 Jan 2025

Qwen2.5 is the latest series of Qwen large language models, developed by Alibaba Cloud. It offers a range of base and instruction-tuned models, with sizes ranging from 0.5B to 72B parameters.

Key Features

  • Expanded Knowledge & Capabilities: Significantly improved in coding, mathematics, and structured data understanding (e.g., tables, JSON).
  • Instruction Following & Role-Play: Better at following diverse prompts and setting chatbot conditions.
  • Long-Context Support: Handles up to 128K tokens and generates up to 8K tokens.
  • Multilingual Proficiency: Supports 29+ languages, including Chinese, English, French, Spanish, German, Russian, Japanese, Arabic, and more.
  • License Information: Most models are Apache 2.0 licensed, except for the 3B and 72B models, which fall under the Qwen license.
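
One way to put the structured-data strength to work on the edge is to request JSON output and validate it before acting on it. The sketch below uses a hand-written stand-in reply, not a live Qwen2.5 response; the field names are illustrative.

```python
import json

# Stand-in for a model reply produced by a "respond only in JSON" prompt.
model_reply = '{"device": "sensor-12", "status": "ok", "temperature_c": 21.5}'

def parse_structured(reply: str, required: set) -> dict:
    """Parse a JSON model reply and check that required fields are present."""
    data = json.loads(reply)
    missing = required - data.keys()
    if missing:
        raise ValueError(f"model reply missing fields: {missing}")
    return data

record = parse_structured(model_reply, {"device", "status"})
print(record["status"])  # ok
```

Validating before use matters on the edge, where a malformed reply should fail fast rather than feed an automated control loop.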

Model Sizes & Usage

0.5B – 72B: Covers a broad range of applications, from lightweight edge AI to large-scale enterprise use.

7B Model (Default)

Optimized for instruction-following, structured data, and long-text generation.

Run locally via:

  ollama run qwen2.5

Other Model Sizes:

  • ollama run qwen2.5:0.5b
  • ollama run qwen2.5:1.5b
  • ollama run qwen2.5:3b

Gemma – 2B Parameters
https://edgeai.org/gemma-2b-parameters/ | Tue, 07 Jan 2025

Gemma is a lightweight, state-of-the-art open model developed by Google DeepMind. Inspired by Gemini, Gemma is optimized for efficiency and high-quality text generation.

Key Features

  • Available in 2B & 7B parameter sizes
  • Trained on diverse web documents, covering linguistic styles, coding syntax, and mathematical reasoning
  • Enhanced safety measures, including content filtering for sensitive and low-quality data
  • Optimized for Ollama 0.1.26+

Model Sizes & Usage

  • Run the default 7B model:
      ollama run gemma
  • Run the smaller 2B model:
      ollama run gemma:2b

Technical Details

  • License: Gemma Terms of Use (modified February 21, 2024)
  • Architecture: Gemma
  • Parameters: 8.54B (7B model)
  • Quantization: Q4_0 (5.0GB)

TinyLlama – 1.1B Parameters
https://edgeai.org/tinyllama-1-1b-parameters/ | Tue, 07 Jan 2025

The TinyLlama project is an open initiative aimed at training a compact 1.1B parameter Llama model on 3 trillion tokens. Designed for low-resource environments, TinyLlama offers efficient performance with minimal computational and memory requirements.

Key Features

  • Lightweight model with only 1.1B parameters
  • Optimized for efficiency in environments with restricted compute power
  • Trained on 3 trillion tokens for broad language understanding
  • Minimal memory footprint (~638MB)
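
The ~638MB figure is consistent with Q4_0's storage layout, which packs weights in 32-element blocks of 18 bytes (one 16-bit scale plus 32 four-bit values), i.e. 4.5 bits per weight on average. A quick back-of-envelope check:

```python
# Estimate the on-disk size of a 1.1B-parameter model quantized to Q4_0.
params = 1.1e9
bits_per_weight = 18 * 8 / 32           # 18-byte block per 32 weights = 4.5
size_mb = params * bits_per_weight / 8 / 1e6
print(round(size_mb))  # 619
```

The estimate lands at roughly 619 MB; the gap to the published 638 MB plausibly comes from tensors kept at higher precision plus file metadata.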

Usage

  • Run TinyLlama with Ollama:
      ollama run tinyllama

Technical Details

  • Architecture: Llama
  • Parameters: 1.1B
  • Quantization: Q4_0 (638MB)
  • System Role: General-purpose AI assistant
  • License: Apache 2.0 (open-source)

Additional Resources

Ollama Documentation

Project GitHub

Hugging Face

Gemma 2 – 2B Parameters
https://edgeai.org/gemma-2-2b-parameters/ | Tue, 07 Jan 2025

Efficient, High-Performance AI for EdgeAI Applications

Google’s Gemma 2 model delivers a balance of power and efficiency, making it a strong candidate for EdgeAI deployments. With a focus on small-scale, high-performance models, the 2B and 9B parameter versions provide cutting-edge natural language processing (NLP) while maintaining a manageable computational footprint.

Lightweight AI for Real-Time Edge Deployment

For EdgeAI applications, model size and efficiency are critical. The 2B and 9B variants of Gemma 2 are designed to operate in constrained environments without sacrificing performance. These models enable:

  • On-Device AI Processing: Run inference locally without relying on cloud services, reducing latency and improving privacy.
  • Low-Power AI Applications: Efficient enough for deployment on edge devices with limited resources, such as IoT devices, industrial sensors, and mobile applications.
  • Adaptive AI Capabilities: Optimize real-time decision-making and contextual processing for robotics, smart assistants, and embedded systems.

Key Features for EdgeAI

  • 2B Parameters: Ideal for ultra-low-resource applications requiring fast inference and minimal hardware requirements.
  • 9B Parameters: A more powerful yet efficient model suitable for advanced on-device processing, capable of handling complex queries and multi-step reasoning.
  • Optimized Quantization: Efficient quantization techniques, such as Q4_0, reduce memory footprint while maintaining accuracy.
  • Seamless Integration: Supports frameworks like LangChain and LlamaIndex, allowing easy deployment in edge environments.

Example Use Cases

  • Autonomous Systems: Enabling AI-driven decision-making in drones, robotics, and industrial automation.
  • Smart Devices & IoT: Powering voice assistants, predictive maintenance, and real-time anomaly detection on edge devices.
  • Healthcare AI: Running diagnostic assistance models in local healthcare facilities without requiring cloud access.

Deploying Gemma 2 for EdgeAI

Gemma 2 can be efficiently run using Ollama:

# Requires the langchain-community package and a local Ollama server.
from langchain_community.llms import Ollama

llm = Ollama(model="gemma2:2b")
response = llm.invoke("Explain the benefits of on-device AI.")

For more demanding edge applications, the 9B variant provides a balance between efficiency and performance:

llm = Ollama(model="gemma2:9b")
response = llm.invoke("How can AI optimize edge computing workflows?")

Advancing EdgeAI with Gemma 2

By leveraging Gemma 2’s compact yet capable architecture, EdgeAI solutions can achieve faster, more reliable, and scalable AI-driven automation. These models provide a foundation for deploying AI where it matters most—on the edge, closer to real-world interactions.
