
AjraSakha: AI-Powered Agricultural Advisory Platform

1. POP Processing Pipeline

Project Name: Agricultural Package of Practices (POP) PDF Processing Pipeline

Problem Statement

565 POP PDF documents covering 26 Indian states, in English and 22 Indic languages, are scattered and unstructured, and 283 of them (about 50%) contain garbled text. Farmers and advisors cannot search or access this knowledge efficiently.

Solution Overview

An automated multi-phase pipeline classifies, extracts, translates, and converts the PDFs into structured JSON/Markdown for the RAG-powered AjraSakha chatbot.

Work Done So Far

  • Phases 1 & 2 complete: 215 PDFs processed (English & Indic).

  • OCR, translation, Docling extraction implemented.

  • Preparation for garbled-text recovery (Phase 6): character encoding detection, text reconstruction, font normalization.
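
One way to flag garbled extractions before Phase 6 is a script-ratio check: text pulled from a PDF that uses a legacy, non-Unicode Indic font usually comes out as Latin characters. The sketch below is an assumption, not the pipeline's actual detector. The function names, the 0.6 threshold, and the Devanagari default are all hypothetical, and the garbled sample string is invented for illustration.

```python
import unicodedata

# Hypothetical threshold; it would need tuning against real POP PDFs.
MIN_SCRIPT_RATIO = 0.6


def script_ratio(text: str, script_prefix: str) -> float:
    """Fraction of alphabetic characters whose Unicode name starts with script_prefix."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    hits = sum(
        1 for ch in letters
        if unicodedata.name(ch, "").startswith(script_prefix)
    )
    return hits / len(letters)


def looks_garbled(text: str, script_prefix: str = "DEVANAGARI") -> bool:
    """Flag text whose share of expected-script letters is below the threshold.

    Empty text has a ratio of 0.0 and is therefore flagged as well.
    """
    return script_ratio(text, script_prefix) < MIN_SCRIPT_RATIO


if __name__ == "__main__":
    print(looks_garbled("गेहूं की बुवाई"))   # proper Unicode Devanagari
    print(looks_garbled("xsgw¡ dh cqokbZ"))  # invented legacy-font style output
```

A check like this only separates "expected script" from "something else"; it says nothing about which legacy encoding was used, so it would sit in front of the encoding detection step rather than replace it.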

Key Features

  • Multilingual PDF extraction (English + 22 Indic languages).

  • OCR with quality validation.

  • Automated JSON/Markdown output for chatbot ingestion.
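
As an illustration of the JSON/Markdown output step, the sketch below splits extracted Markdown into one JSON record per heading. The record fields (`source`, `chunk_id`, `heading`, `text`) and the function name are assumptions made for this example; the source does not describe the real schema, and the sample text is a placeholder.

```python
import json
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def markdown_to_chunks(markdown: str, source: str) -> list[dict]:
    """Split Markdown on headings and return one record per section."""
    # Each section is (heading, lines); the first holds any text before a heading.
    sections: list[tuple] = [(None, [])]
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            sections.append((match.group(2).strip(), []))
        else:
            sections[-1][1].append(line)

    records = []
    for heading, lines in sections:
        text = "\n".join(lines).strip()
        if heading is None and not text:
            continue  # skip an empty preamble
        records.append({
            "source": source,
            "chunk_id": len(records),
            "heading": heading,
            "text": text,
        })
    return records


if __name__ == "__main__":
    sample = "# Section A\n\nFirst paragraph.\n\n## Section B\n\nSecond paragraph.\n"
    # ensure_ascii=False keeps Indic text readable in the output file.
    print(json.dumps(markdown_to_chunks(sample, "example.pdf"),
                     ensure_ascii=False, indent=2))
```

Splitting on headings keeps each record aligned with a section of the original document, which tends to suit retrieval better than fixed-size windows, though the right granularity depends on the PDFs.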

Technologies Used

  • Extraction/OCR: Docling, PyMuPDF, Tesseract OCR, Google Cloud Vision

  • Translation: Google Translate API, IndicTrans2

  • Orchestration: Python CLI pipeline for automated processing

Future Scope

  • Garbled text recovery for remaining 283 PDFs.

  • Automated Q&A generation.

  • Semantic search across all 565 PDFs.

  • Integration with state agricultural portals.


2. MCP Servers for AjraSakha

Project Name: AjraSakha MCP (Model Context Protocol) Server Suite

Problem Statement

A monolithic backend cannot efficiently integrate FAQs, golden datasets, market data, and POP documents, and it runs into scalability, security, maintainability, and performance problems.

Solution Overview

Modular MCP server suite built on the FastMCP framework, providing specialized capabilities for chatbot integration.

MCP Servers Implemented

  • FAQ MCP Server: Semantic search over agricultural FAQs.

  • Golden Dataset MCP Server: Crop & state-specific Q&A retrieval.

  • Market Data MCP Server: Real-time commodity prices from eNAM.

  • Shared Functions & Constants: Utilities, configuration.
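
MCP clients invoke server tools through JSON-RPC 2.0 messages using the `tools/call` method. The sketch below builds such a request with the standard library only. The tool name `search_faq` and its arguments are hypothetical, since the source does not list the tool names the servers expose.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request, ensure_ascii=False)


if __name__ == "__main__":
    # Hypothetical tool and argument names, for illustration only.
    print(build_tool_call("search_faq", {"query": "wheat sowing time", "top_k": 3}))
```

Because every server accepts the same message shape, the chatbot can reach a new capability by changing only the tool name and arguments.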

Work Done So Far

  • 4 production MCP servers deployed.

  • Vector search integrated with MongoDB Atlas (BAAI/bge-large-en embeddings).

  • Dockerized deployment with a CI/CD pipeline and systemd service management.
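
The idea behind the vector search can be shown with a small in-memory ranking by cosine similarity. This is a sketch of the technique, not the deployed implementation: in MongoDB Atlas the ranking would run server-side against a vector search index, and the toy 3-dimensional vectors here stand in for real embedding vectors. All names and data are made up for the example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], docs: list[dict], k: int = 3) -> list[tuple]:
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine(query_vec, d["embedding"]), d["text"]) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [
        {"text": "a", "embedding": [1.0, 0.0, 0.0]},
        {"text": "b", "embedding": [0.0, 1.0, 0.0]},
        {"text": "c", "embedding": [1.0, 1.0, 0.0]},
    ]
    for score, text in top_k([1.0, 0.0, 0.0], docs, k=2):
        print(f"{text}: {score:.3f}")
```

A brute-force scan like this is linear in the number of documents; a managed vector index trades exactness for speed on larger collections.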

Key Features

  • Modular, secure, scalable backend.

  • Semantic search with vector embeddings.

  • Real-time market data integration.

  • Standardized MCP interface for easy expansion.

Technologies Used

  • Backend: Python, FastAPI, FastMCP

  • Database: MongoDB Atlas + Vector Search

  • Infrastructure: Docker, Docker Compose, systemd

  • LLMs: deepseek-r1:70b, qwen3:1.7b, Osmosis

Future Scope

  • Caching and query optimization.

  • Horizontal scaling & database sharding.

  • Real-time updates, WebSocket integration.

  • Additional MCP servers for weather, schemes, analytics.
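
For the caching item above, one possible starting point is a small time-to-live cache placed in front of slow lookups such as market price queries. This is a sketch under assumptions: the class name, the 60-second TTL, and the stand-in fetch function are hypothetical and not part of the current servers.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Cache values per key and recompute them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: reuse the cached value
        value = compute()
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)

    def fetch_price() -> dict:
        # Stand-in for a slow upstream call; the returned data is made up.
        return {"commodity": "example", "price": 0}

    print(cache.get_or_compute("example", fetch_price))
    print(cache.get_or_compute("example", fetch_price))  # served from the cache
```

The TTL bounds how stale a cached answer can be, which matters for real-time price data; a short TTL keeps answers current while still absorbing bursts of identical queries.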


Impact

  • Democratizes agricultural knowledge across languages.

  • Enables scalable, modular chatbot architecture.

  • Lays foundation for RAG-powered, AI-assisted advisory.

  • Reduces time to access verified agricultural guidance.

  • Facilitates rapid feature development and platform scalability.