Bazlur’s Blog Archive

Building Robust AI Applications with LangChain4j Guardrails and Spring Boot

2025-06-21T00:00:00+00:00

Building Robust AI Applications with LangChain4j Guardrails and Spring Boot

As AI applications become increasingly complex, ensuring that language models behave predictably and safely is paramount. LangChain4j’s guardrails feature provides a powerful framework for validating both the inputs and outputs of your AI services. This article demonstrates how to implement comprehensive guardrails in a Spring Boot application, with practical examples that you can adapt to your use cases.

📦 Complete source code available at : github.com/rokon12/guardrails-demo

Understanding LangChain4j Guardrails

In LangChain4j, guardrails are validation mechanisms that operate exclusively on AI Services, the framework’s high-level abstraction for interacting with language models. Unlike simple validators, guardrails provide sophisticated control over the entire AI interaction lifecycle.

Input Guardrails : Act as gatekeepers, validating user input before it reaches the LLM
1. Prevent prompt injection attacks
2. Filter inappropriate content
3. Enforce business rules
4. Sanitize and normalize input
Output Guardrails : Act as quality controllers, validating and potentially correcting LLM responses
1. Ensure a professional tone
2. Detect hallucinations
3. Validate response format
4. Enforce compliance requirements

This dual-layer approach ensures that your AI applications remain safe, compliant, and aligned with business requirements.

Setting Up a Spring Boot Project with LangChain4j

Let’s start by creating a Spring Boot application with the necessary dependencies. You can use Spring Initializr to bootstrap your project or create it directly in your IDE (IntelliJ IDEA, Eclipse, or VS Code).

🚀 Quick Start with Spring Initializr:

Go to start.spring.io

Choose: Maven/Gradle, Java 21+, Spring Boot 3.x

Add dependencies: Spring Web

Generate and import into your IDE

Add LangChain4j dependencies manually to your pom.xml or build.gradle


    
    
        org.springframework.boot
        spring-boot-starter-web
    
    
    
        org.springframework.boot
        spring-boot-starter-validation
    
    
    
    
        dev.langchain4j
        langchain4j
        1.1.0 
    
    
    
    
        dev.langchain4j
        langchain4j-open-ai
        1.1.0
    
    
    
    
        dev.langchain4j
        langchain4j-test
        1.1.0
        test 
    
    
    
    
        org.springframework.boot
        spring-boot-starter-actuator

Configure your application:

# application.yml
langchain4j:
  open-ai:
    chat-model:
      api-key: ${OPENAI_API_KEY} # 🔐 NEVER hardcode API keys - use environment variables
      model-name: gpt-4 # 💡 Consider cost vs performance when choosing models
      temperature: 0.7 # 🎲 Balance between creativity (1.0) and consistency (0.0)
      max-tokens: 1000 # 💰 Control costs by limiting response length
      timeout: 30s # ⏱️ Prevent hanging requests
      log-requests: true # 🔍 Enable for debugging, disable in production for performance
      log-responses: true

# Application-specific settings
app:
  guardrails:
    input:
      max-length: 1000 # 📏 Prevent resource exhaustion from large inputs
      rate-limit:
        enabled: true
        max-requests-per-minute: 10 # 🛡️ Protect against abuse and control costs
    output:
      max-retries: 3 # 🔄 Balance between reliability and latency

Implementing Input Guardrails

Input guardrails shield your application from malicious, inappropriate, or out-of-scope user inputs. Here are several practical examples.

Content Safety Input Guardrail

@Component
public class ContentSafetyInputGuardrail implements InputGuardrail {

    // 🚫 Customize this list based on your application's domain and risk profile
    private static final List PROHIBITED_WORDS = List.of(
            "hack", "exploit", "bypass", "illegal", "fraud", "crack", "breach",
            "penetrate", "malware", "virus", "trojan", "backdoor", "phishing",
            "spam", "scam", "steal", "theft", "identity", "password", "credential"
    );

    // 🎭 Detect obfuscated threats using regex patterns
    private static final List THREAT_PATTERNS = List.of(
            Pattern.compile("h[4@]ck", Pattern.CASE_INSENSITIVE), // Catches "h4ck", "h@ck"
            Pattern.compile("cr[4@]ck", Pattern.CASE_INSENSITIVE),
            Pattern.compile("expl[0o]it", Pattern.CASE_INSENSITIVE),
            Pattern.compile("byp[4@]ss", Pattern.CASE_INSENSITIVE),
            // 🎯 This pattern catches instruction-style prompts for malicious activities
            Pattern.compile("[\\w\\s]*(?:how\\s+to|teach\\s+me|show\\s+me)\\s+(?:hack|exploit|bypass)", Pattern.CASE_INSENSITIVE)
    );

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        String originalText = userMessage.singleText();
        String text = originalText.toLowerCase();

        // 📏 Length validation should be your first check for performance
        if (originalText.length() > 1000) {
            return failure("Your message is too long. Please keep it under 1000 characters.");
        }

        // 🔍 Check for prohibited words
        for (String word : PROHIBITED_WORDS) {
            if (text.contains(word)) {
                // ⚠️ Be careful not to reveal too much about your security measures
                return failure("Your message contains prohibited content related to security threats.");
            }
        }
        
        // 🎭 Check for obfuscated patterns
        for (Pattern pattern : THREAT_PATTERNS) {
            if (pattern.matcher(originalText).find()) {
                return failure("Your message contains potentially harmful content patterns.");
            }
        }

        return success();
    }
}

Smart Context-Aware Guardrail

This guardrail uses conversation history to make intelligent decisions:

@Component
@Slf4j
public class ContextAwareInputGuardrail implements InputGuardrail {
    
    private static final int MAX_SIMILAR_QUESTIONS = 3;
    private static final double SIMILARITY_THRESHOLD = 0.8; // 📊 Adjust based on your tolerance
    
    @Override
    public InputGuardrailResult validate(InputGuardrailRequest request) {
        ChatMemory memory = request.memory();
        UserMessage currentMessage = request.userMessage();
        
        // 💡 Always handle null cases gracefully
        if (memory == null || memory.messages().isEmpty()) {
            return success();
        }
        
        // Check for repetitive questions
        List previousQuestions = extractUserQuestions(memory);
        String currentQuestion = currentMessage.singleText();
        
        long similarQuestions = previousQuestions.stream()
            .filter(q -> calculateSimilarity(q, currentQuestion) > SIMILARITY_THRESHOLD)
            .count();
        
        if (similarQuestions >= MAX_SIMILAR_QUESTIONS) {
            // 📝 Log suspicious behavior for security monitoring
            log.info("User asking repetitive questions: {}", currentQuestion);
            return failure("You've asked similar questions multiple times. Please try a different topic or rephrase your question.");
        }
        
        // Check conversation velocity (potential abuse)
        if (isConversationTooFast(memory)) {
            return failure("Please slow down. You're sending messages too quickly.");
        }
        
        return success();
    }
    
    private List extractUserQuestions(ChatMemory memory) {
        return memory.messages().stream()
            .filter(msg -> msg instanceof UserMessage) // 🎯 Type-safe filtering
            .map(ChatMessage::text)
            .collect(Collectors.toList());
    }
    
    private double calculateSimilarity(String s1, String s2) {
        // 🧮 Simple Jaccard similarity - in production, use more sophisticated methods
        // Consider: Levenshtein distance, cosine similarity, or semantic embeddings
        Set set1 = new HashSet<>(Arrays.asList(s1.toLowerCase().split("\\s+")));
        Set set2 = new HashSet<>(Arrays.asList(s2.toLowerCase().split("\\s+")));
        
        Set intersection = new HashSet<>(set1);
        intersection.retainAll(set2);
        
        Set union = new HashSet<>(set1);
        union.addAll(set2);
        
        return union.isEmpty() ? 0 : (double) intersection.size() / union.size();
    }
    
    private boolean isConversationTooFast(ChatMemory memory) {
        // ⏱️ TODO: Implement timestamp checking
        // Check if user is sending messages too quickly (potential spam)
        List recentMessages = memory.messages();
        if (recentMessages.size() < 5) return false;
        
        // In a real implementation, you'd check timestamps
        // This is a simplified example
        return false;
    }
}

Intelligent Input Sanitizer

This guardrail not only validates but also improves input quality:

@Component
public class IntelligentInputSanitizerGuardrail implements InputGuardrail {
    
    // 🌐 Comprehensive URL pattern that handles most common URL formats
    private static final Pattern URL_PATTERN = Pattern.compile(
        "https?://[\\w\\-._~:/?#\\[\\]@!$&'()*+,;=.]+", 
        Pattern.CASE_INSENSITIVE
    );
    
    // 📧 Standard email pattern - consider RFC 5322 for stricter validation
    private static final Pattern EMAIL_PATTERN = Pattern.compile(
        "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", 
        Pattern.CASE_INSENSITIVE
    );

    @Override
    public InputGuardrailResult validate(UserMessage userMessage) {
        String text = userMessage.singleText();
        
        // 🔒 Remove potential PII for privacy compliance (GDPR, CCPA)
        text = EMAIL_PATTERN.matcher(text).replaceAll("[EMAIL_REDACTED]");
        
        // 🔗 Clean URLs but keep them for context
        text = URL_PATTERN.matcher(text).replaceAll("[URL]");
        
        // 📝 Normalize whitespace for consistent processing
        text = text.replaceAll("\\s+", " ").trim();
        
        // 🛡️ Remove potentially harmful characters while preserving meaning
        // These characters could be used for injection attacks
        text = text.replaceAll("[<>{}\\[\\]|\\\\]", "");
        
        // ✂️ Smart truncation that preserves sentence structure
        if (text.length() > 500) {
            text = smartTruncate(text, 500);
        }
        
        // 🔤 Fix common typos and normalize
        text = normalizeText(text);
        
        // ✅ Return the sanitized text, not just validation result
        return successWith(text);
    }
    
    private String smartTruncate(String text, int maxLength) {
        if (text.length() <= maxLength) return text;
        
        // 📍 Try to cut at sentence boundary for better readability
        int lastPeriod = text.lastIndexOf('.', maxLength);
        if (lastPeriod > maxLength * 0.8) { // 80% threshold ensures we don't cut too early
            return text.substring(0, lastPeriod + 1);
        }
        
        // 🔤 Otherwise, cut at word boundary
        int lastSpace = text.lastIndexOf(' ', maxLength);
        if (lastSpace > maxLength * 0.8) {
            return text.substring(0, lastSpace) + "...";
        }
        
        // ✂️ Last resort: hard cut
        return text.substring(0, maxLength - 3) + "...";
    }
    
    private String normalizeText(String text) {
        // 🔧 Fix common issues
        text = text.replaceAll("\\bi\\s", "I ");  // i -> I
        text = text.replaceAll("\\s+([.,!?])", "$1");  // Remove space before punctuation
        text = text.replaceAll("([.,!?])(\\w)", "$1 $2");  // Add space after punctuation
        
        return text;
    }
}

ProTip: Input sanitizers should be the last guardrail in your input chain. They clean and normalize input after all validation checks have passed.

Implementing Output Guardrails

Output guardrails ensure that LLM responses meet your quality standards and business requirements.

Professional Tone Output Guardrail

@Component
public class ProfessionalToneOutputGuardrail implements OutputGuardrail {

    // 🚫 Phrases that damage professional credibility
    private static final List UNPROFESSIONAL_PHRASES = List.of(
            "that's weird", "that's dumb", "whatever", "i don't know"
    );

    // ✨ Elements that enhance professional communication
    private static final List REQUIRED_ELEMENTS = List.of(
            "thank you",
            "please",
            "happy to help"
    );

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        String text = responseFromLLM.text().toLowerCase();

        // 🔍 Check for unprofessional language
        for (String unprofessionalPhrase : UNPROFESSIONAL_PHRASES) {
            if (text.contains(unprofessionalPhrase)) {
                // 🔄 Request reprompting with specific guidance
                return reprompt("Unprofessional tone detected",
                        "Please maintain a professional and helpful tone");
            }
        }

        // 📏 Enforce response length limits for better UX
        if (text.length() > 1000) {
            return reprompt("Response too long",
                    "Please keep your response under 1000 characters.");
        }

        // 🎯 Ensure professional courtesy is present
        boolean hasCourtesy = REQUIRED_ELEMENTS.stream()
                .anyMatch(text::contains);
        if (!hasCourtesy) {
            return reprompt(
                    "Response lacks professional courtesy",
                    "Please include polite and helpful language in your response."
            );
        }

        return success();
    }
}

Hallucination Detection Guardrail

@Component
public class ProfessionalToneOutputGuardrail implements OutputGuardrail {

    // 🚫 Phrases that damage professional credibility
    private static final List UNPROFESSIONAL_PHRASES = List.of(
            "that's weird", "that's dumb", "whatever", "i don't know"
    );

    // ✨ Elements that enhance professional communication
    private static final List REQUIRED_ELEMENTS = List.of(
            "thank you",
            "please",
            "happy to help"
    );

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        String text = responseFromLLM.text().toLowerCase();

        // 🔍 Check for unprofessional language
        for (String unprofessionalPhrase : UNPROFESSIONAL_PHRASES) {
            if (text.contains(unprofessionalPhrase)) {
                // 🔄 Request reprompting with specific guidance
                return reprompt("Unprofessional tone detected",
                        "Please maintain a professional and helpful tone");
            }
        }

        // 📏 Enforce response length limits for better UX
        if (text.length() > 1000) {
            return reprompt("Response too long",
                    "Please keep your response under 1000 characters.");
        }

        // 🎯 Ensure professional courtesy is present
        boolean hasCourtesy = REQUIRED_ELEMENTS.stream()
                .anyMatch(text::contains);
        if (!hasCourtesy) {
            return reprompt(
                    "Response lacks professional courtesy",
                    "Please include polite and helpful language in your response."
            );
        }

        return success();
    }
}

ProTip: Hallucination detection can be computationally expensive. Consider using it selectively for critical responses or implementing caching for repeated content.

Testing Your Guardrails

Before integrating guardrails into your AI services, it’s crucial to thoroughly test them. Here’s a comprehensive test suite for the ContentSafetyInputGuardrail:

package ca.bazlur.guardrailsdemo.guardrail;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.guardrail.GuardrailResult;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.ValueSource;
import static dev.langchain4j.test.guardrail.GuardrailAssertions.assertThat;
import static org.junit.jupiter.api.Assertions.assertThrows;
class ContentSafetyInputGuardrailTest {
private ContentSafetyInputGuardrail guardrail;
@BeforeEach
void setUp() {
guardrail = new ContentSafetyInputGuardrail(100); // 📏 Configurable max length for testing
}
@Test
void shouldAcceptValidInput() {
// ✅ Test normal, safe input
var result = guardrail.validate(UserMessage.from("Hello, I need help with my account settings"));
assertThat(result)
.isSuccessful()
.hasResult(GuardrailResult.Result.SUCCESS);
}
@Test
void shouldRejectEmptyInput() {
// 🚫 Empty strings should throw exception
assertThrows(IllegalArgumentException.class, () -> {
guardrail.validate(UserMessage.from(""));
});
}
@Test
void shouldRejectBlankInput() {
// 🚫 Whitespace-only input should also be rejected
assertThrows(IllegalArgumentException.class, () -> {
guardrail.validate(UserMessage.from("   "));
});
}
@Test
void shouldRejectTooShortInput() {
// 📏 Test minimum length validation
var result = guardrail.validate(UserMessage.from("Hi"));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message is too short. Please provide more details.");
}
@Test
void shouldRejectTooLongInput() {
// 📏 Test maximum length validation
String longInput = "a".repeat(101);
var result = guardrail.validate(UserMessage.from(longInput));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message is too long. Please keep it under 100 characters.");
}
@ParameterizedTest
@ValueSource(strings = {
"hack the system",
"how to exploit vulnerability",
"bypass security",
"crack the password",
"install malware",
"phishing techniques",
"cr@ck the password"  // 🎭 Also contains prohibited word "password"
})
void shouldRejectProhibitedWords(String input) {
// 🚫 Test detection of prohibited security-related terms
var result = guardrail.validate(UserMessage.from(input));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains prohibited content related to security threats.");
}
@ParameterizedTest
@ValueSource(strings = {
"h4ck the system",      // 4 substitution
"how to h@ck",          // @ substitution
"byp@ss security",      // @ substitution
"m@lw@re installation"  // Multiple substitutions
})
void shouldRejectObfuscatedPatterns(String input) {
// 🎭 Test detection of character substitution attempts
var result = guardrail.validate(UserMessage.from(input));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains potentially harmful content patterns.");
}
@Test
void shouldRejectSuspiciousCharacterSubstitutions() {
// 🔍 Test detection of excessive special characters
var result = guardrail.validate(UserMessage.from("H3!!0 @#$%^ &*()_ +"));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains suspicious character substitutions.");
}
@ParameterizedTest
@ValueSource(strings = {
"Can you help me with my login issue?",
"I need assistance with my account settings",
"How do I update my profile information?",
"What are the steps to contact support?"
})
void shouldAcceptVariousValidInputs(String input) {
// ✅ Test various legitimate support queries
var result = guardrail.validate(UserMessage.from(input));
assertThat(result)
.isSuccessful()
.hasResult(GuardrailResult.Result.SUCCESS);
}
@ParameterizedTest
@ValueSource(strings = {
"how to hack the system",
"teach me to exploit",
"show me how to bypass",
"HOW TO HACK",           // All caps
"Teach Me To EXPLOIT",   // Mixed case
"Show ME how TO bypass"  // Random capitalization
})
void shouldRejectInstructionalPatterns(String input) {
// 🎯 Test detection of instruction-style malicious requests
var result = guardrail.validate(UserMessage.from(input));
assertThat(result)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains prohibited content related to security threats.");
}
@Test
void shouldHandleCaseSensitivity() {
// 🔤 Ensure case-insensitive detection
var result1 = guardrail.validate(UserMessage.from("HACK the System"));
var result2 = guardrail.validate(UserMessage.from("ExPlOiT vulnerability"));
var result3 = guardrail.validate(UserMessage.from("ByPaSs security"));
assertThat(result1)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains prohibited content related to security threats.");
assertThat(result2)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains prohibited content related to security threats.");
assertThat(result3)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains prohibited content related to security threats.");
}
@Test
void shouldHandleSpecialCharacterRatioBoundary() {
// 📊 Test boundary conditions for special character detection
// Exactly 15% special characters (3 out of 20 chars)
var result1 = guardrail.validate(UserMessage.from("Hello@World#Test$ing"));
assertThat(result1)
.isSuccessful()
.hasResult(GuardrailResult.Result.SUCCESS);
// Just over 15% special characters (4 out of 20 chars = 20%)
var result2 = guardrail.validate(UserMessage.from("Hello@World#Test$ing%"));
assertThat(result2)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message contains suspicious character substitutions.");
}
@Test
void shouldHandleLengthBoundaries() {
// 📏 Test exact boundary conditions
// Exactly 5 characters (minimum allowed)
var result1 = guardrail.validate(UserMessage.from("Hello"));
assertThat(result1)
.isSuccessful()
.hasResult(GuardrailResult.Result.SUCCESS);
// 4 characters (too short)
var result2 = guardrail.validate(UserMessage.from("Help"));
assertThat(result2)
.hasFailures()
.hasResult(GuardrailResult.Result.FAILURE)
.hasSingleFailureWithMessage("Your message is too short. Please provide more details.");
// Exactly max length
var result3 = guardrail.validate(UserMessage.from("a".repeat(100)));
assertThat(result3)
.isSuccessful()
.hasResult(GuardrailResult.Result.SUCCESS);
}
}

💡 Testing Best Practices for Guardrails:

Test boundary conditions (minimum/maximum values)

Use parameterized tests for similar scenarios

Test both positive and negative cases

Verify exact error messages for better debugging

Test case sensitivity and special character handling

Use the GuardrailAssertions utility for cleaner test code

Creating AI Services with Guardrails

Now let’s combine our guardrails into comprehensive AI services.

@Component
public class ProfessionalToneOutputGuardrail implements OutputGuardrail {

    // 🚫 Phrases that damage professional credibility
    private static final List UNPROFESSIONAL_PHRASES = List.of(
            "that's weird", "that's dumb", "whatever", "i don't know"
    );

    // ✨ Elements that enhance professional communication
    private static final List REQUIRED_ELEMENTS = List.of(
            "thank you",
            "please",
            "happy to help"
    );

    @Override
    public OutputGuardrailResult validate(AiMessage responseFromLLM) {
        String text = responseFromLLM.text().toLowerCase();

        // 🔍 Check for unprofessional language
        for (String unprofessionalPhrase : UNPROFESSIONAL_PHRASES) {
            if (text.contains(unprofessionalPhrase)) {
                // 🔄 Request reprompting with specific guidance
                return reprompt("Unprofessional tone detected",
                        "Please maintain a professional and helpful tone");
            }
        }

        // 📏 Enforce response length limits for better UX
        if (text.length() > 1000) {
            return reprompt("Response too long",
                    "Please keep your response under 1000 characters.");
        }

        // 🎯 Ensure professional courtesy is present
        boolean hasCourtesy = REQUIRED_ELEMENTS.stream()
                .anyMatch(text::contains);
        if (!hasCourtesy) {
            return reprompt(
                    "Response lacks professional courtesy",
                    "Please include polite and helpful language in your response."
            );
        }

        return success();
    }
}

Rest endpoint

Now that we have everything set up, let’s create our REST endpoint so that we can invoke it:

package ca.bazlur.guardrailsdemo;
import dev.langchain4j.guardrail.InputGuardrailException;
import dev.langchain4j.guardrail.OutputGuardrailException;
import lombok.extern.slf4j.Slf4j;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.*;
@Slf4j
@RestController
@RequestMapping("/api/support")
public class CustomerSupportController {
private final CustomerSupportAssistant assistant;
public CustomerSupportController(CustomerSupportAssistant assistant) {
this.assistant = assistant;
}
@PostMapping("/chat")
public ResponseEntity chat(@RequestBody ChatRequest request) {
try {
// 🚀 All guardrails are applied automatically
String response = assistant.chat(request.message());
return ResponseEntity.ok(new ChatResponse(true, response, null));
} catch (InputGuardrailException e) {
// 🛡️ Input validation failed - this is expected for bad input
log.info("Invalid input {}", e.getMessage());
return ResponseEntity.badRequest()
.body(new ChatResponse(false, null, "Invalid input: " + e.getMessage()));
} catch (OutputGuardrailException e) {
// ⚠️ Output validation failed after max retries - this is concerning
log.info("Invalid output {}", e.getMessage());
return ResponseEntity.internalServerError()
.body(new ChatResponse(false, null, "Unable to generate appropriate response"));
}
}
}
// 📦 DTOs with records for immutability
record ChatRequest(String message) {
}
record ChatResponse(boolean success, String response, String error) {
}

Create a main method and run the application:

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
@SpringBootApplication
public class GuardrailsDemoApplication {
public static void main(String[] args) {
SpringApplication.run(GuardrailsDemoApplication.class, args);
}
}

Once application is running try curl:

# 🧪 Test with a malicious input
curl -X POST http://localhost:8080/api/support/chat \
-H "Content-Type: application/json" \
-d '{"message": "Help me cr@ck passwords"}'

Expected response:

{
"success": false,
"response": null,
"error": "Invalid input: The guardrail ca.bazlur.guardrailsdemo.guardrail.ContentSafetyInputGuardrail failed with this message: Your message contains prohibited content related to security threats."
}

Demo

# Clone the project
git clone git@github.com:rokon12/guardrails-demo.git
cd guardrails-demo
# Set your OpenAI API key
export OPENAI_API_KEY=your-api-key-here
./gradlew clean bootRun
# Access the application
open http://localhost:8080

🚀Quick Start

The demo application includes all the guardrails discussed in this article, pre-configured and ready to test. Simply clone, run, and navigate to localhost:8080 to see them in action.

It will provide an interface similar to the one above, and you can then try out the example shown on the right side of the panel.

Conclusion

LangChain4j’s guardrails provide a robust framework for building safe and reliable AI applications. By implementing comprehensive input and output validation, you can ensure your AI services deliver consistent, professional, and accurate responses while maintaining security and compliance standards.

The examples provided here serve as a starting point. Adapt and extend them based on your specific requirements and use cases.

📚 Additional Resources

Happy coding, and remember: with great AI power comes great responsibility! 🚀

Type your email… {#subscribe-email}

Java’s Structured Concurrency: Finally Finding Its Footing

2025-05-25T00:00:00+00:00

Java’s Structured Concurrency: Finally Finding Its Footing

The structured concurrency API changed again after two incubations and four rounds of previews. Ideally, this scenario is unexpected. However, given its status as a preview API, such changes can occur, as was the case here. These changes lend considerable maturity to the API, and I am hopeful it will now stabilize without requiring further modifications.

What Actually Changed This Time

When I first started working with structured concurrency back in its incubation phase, I was excited about the promise of cleaner concurrent code. The idea was simple: treat concurrent tasks like a structured block, where all spawned tasks complete before the block exits. It sounded perfect in theory, but the API continued to evolve, making it a bit frustrating to keep up with the changes. The latest iteration in JEP 505 brings some significant refinements that I believe finally put this feature on solid ground. The most notable change is the introduction of more flexible task handling and better integration with virtual threads. This article will detail the differences and explain the significance of these changes.

The Core Concept Remains Strong

Before diving into the changes, let’s establish what structured concurrency is trying to solve. In traditional concurrent programming, we often end up with scattered task management:

import java.util.Random;
import java.util.concurrent.*;

public class TraditionalConcurrencyExample {
  private static final Random random = new Random();

  private static String fetchUserData(String userId) throws InterruptedException {
    Thread.sleep(1000 + random.nextInt(2000)); // 1-3 seconds
    if (random.nextBoolean()) {
      throw new RuntimeException("User service unavailable");
    }
    return "UserData[" + userId + "]";
  }

  private static String fetchUserPreferences(String userId) throws InterruptedException {
    Thread.sleep(800 + random.nextInt(1500)); // 0.8-2.3 seconds
    if (random.nextBoolean()) {
      throw new RuntimeException("Preferences service down");
    }
    return "Preferences[" + userId + "]";
  }

  private static String combineUserInfo(String userData, String preferences) {
    return userData + " + " + preferences;
  }

  public static String getUserInfoTraditional(String userId) throws Exception {
    try (ExecutorService executor = Executors.newCachedThreadPool()) {
      Future future1 = executor.submit(() -> fetchUserData(userId));
      Future future2 = executor.submit(() -> fetchUserPreferences(userId));

      try {
        String userData = future1.get();
        String preferences = future2.get();
        return combineUserInfo(userData, preferences);
      } catch (Exception e) {
        // Cleanup is messy - what about the other task?
        System.out.println("Error occurred, attempting cleanup...");
        future1.cancel(true);
        future2.cancel(true);
        throw e;
      }
    }
  }

  void main() {
    for (int i = 0; i < 5; i++) {
      try {
        System.out.println("Attempt " + (i + 1) + ": " +
            getUserInfoTraditional("user123"));
      } catch (Exception e) {
        System.out.println("Attempt " + (i + 1) + " failed: " +
            e.getMessage());
      }
      System.out.println();
    }
  }
}

When you run this code, several issues typically emerge:

Complex error handling: If one task fails, we must manually cancel the other task. Otherwise, it will continue running despite no longer being required, leading to resource leakage.
Thread lifecycle management: You are responsible for the entire lifecycle of the threads.
Exception propagation: Checked exceptions tend to get wrapped awkwardly.
No guarantee of cleanup: If the main thread exits unexpectedly, tasks might continue running.

Structured concurrency aims to resolve these challenges.

The headline change: static factory methods

The most obvious tweak in JEP 505 is that you no longer call new StructuredTaskScope<>(). You open() one instead:

try (var scope = StructuredTaskScope.open()) {
    // ...
}

The zero-argument open() returns a scope that waits for all subtasks to succeed or any to fail—the default “all-or-fail” policy. If you need something fancier, call the overloaded open(joiner) variant and supply a custom completion policy via a Joiner (more on that in a minute). Why the factory? It packages sensible defaults and, critically, gives the implementation room to evolve without breaking your code. I find this change beneficial: using a single keyword is more concise, and it reduces potential complications.

Now let’s rewrite the previous example with the new API:

public static String getUserInfoTraditional(String userId) throws Exception {
  try (var scope = StructuredTaskScope.open()) {
    StructuredTaskScope.Subtask task1 = scope.fork(() -> fetchUserData(userId));
    StructuredTaskScope.Subtask task2 = scope.fork(() -> fetchUserPreferences(userId));

    scope.join();

    String userData = task1.get();
    String preferences = task2.get();

    return combineUserInfo(userData, preferences);
  }
}

The difference is striking. With structured concurrency, the cleanup is automatic and guaranteed. If any task fails, all other tasks in the scope are cancelled. If the scope exits (normally or exceptionally), all resources are cleaned up. This is comparable to having a try-with-resources mechanism for concurrent tasks.

This approach has several advantages I’ve come to appreciate:

Guaranteed cleanup: Tasks cannot outlive their scope.
Clear ownership: Tasks belong to a specific scope.
Exception safety: Failures are handled consistently.
Resource management: No thread pool management needed.
Composability: Scopes can be nested and combined.

Joiners: pick your success policy

A Joiner intercepts completion events and decides (1) whether to cancel siblings and (2) what join() should return. The JDK ships several factory helpers:

“First one wins” (aka racing a set of replicas)

try (var scope = StructuredTaskScope.open(
         Joiner.anySuccessfulResultOrThrow())) {

    urls.forEach(url -> scope.fork(() -> fetchFrom(url)));
    return scope.join();             // returns first successful String
}

“All must succeed and I want their results”

try (var scope = StructuredTaskScope.open(
         Joiner.allSuccessfulOrThrow())) {
    tasks.forEach(scope::fork);
    return scope.join()              // Stream>
                 .map(Subtask::get)
                 .toList();
}

These little helpers make common patterns—“race”, “gather”, “wait-for-all”—painless.

Rolling your own Joiner

Sometimes you need a custom policy. Suppose I want to collect every successful subtask but ignore failures:

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.StructuredTaskScope;
import java.util.stream.Stream;

void main() {

  List urls = List.of("https://bazlur.ca", "https://foojay.io", "https://github.com");

  try (var scope = StructuredTaskScope.open(new MyCollectingJoiner())) {
    urls.forEach(url -> scope.fork(() -> fetchFrom(url)));
    List fetchedContent = scope.join().toList();

    System.out.println("Total fetched content: " + fetchedContent.size());
  } catch (InterruptedException e) {
    throw new RuntimeException(e);
  }

}

private String fetchFrom(String url) {
  return "fetched from " + url + "";
}

class MyCollectingJoiner implements StructuredTaskScope.Joiner> {
  private final Queue results = new ConcurrentLinkedQueue<>();

  @Override
  public boolean onComplete(StructuredTaskScope.Subtask st) {
    if (st.state() == StructuredTaskScope.Subtask.State.SUCCESS)
      results.add(st.get());
    return false;
  }

  @Override
  public Stream result() {
    return results.stream();
  }
}

The interface is tiny—onFork, onComplete, and result()—yet powerful enough for most custom logic. To run this, we need JDK 25, and we can execute it from the CLI using the following command:

java --enable-preview CollectingJoiner.java.

Better cancellation and deadlines

Cancellation rules did not change in spirit, but the API got stricter. If the owner thread is interrupted before or during join(), the scope automatically cancels every unfinished subtask. Subtasks should promptly honor InterruptedException; otherwise, close() will block, waiting for them to complete. (If you’re calling blocking I/O, you’re fine; if you’re polling, remember to check Thread.currentThread().isInterrupted()).

Need a deadline? Pass a configuration lambda:

try (var scope = StructuredTaskScope.open(
         Joiner.anySuccessfulResultOrThrow(),
         cfg -> cfg.withTimeout(Duration.ofSeconds(2)))) {
    // ...
}

If the timeout fires, the scope cancels, and join() throws TimeoutException. In practice, I attach a timeout to every external call to keep runaway tasks under control.

You can also swap the default virtual-thread factory for one that sets names or thread-locals:

ThreadFactory tagged = Thread.ofVirtual().name("api-%d").factory();

try (var scope = StructuredTaskScope.open(
         Joiner.allSuccessfulOrThrow(),
         cfg -> cfg.withThreadFactory(tagged))) {
    // ...
}

Thread naming alone makes thread dumps far more readable.

Scoped values ride along

All subtasks inherit bindings for ScopedValues established in the parent thread. That means you can pass request context, security credentials, or MDC information without packing it into every lambda. Once you experience this capability, you’ll find it hard to revert to using ThreadLocal.

Guard-rails against misuse

StructuredTaskScope strictly enforces structure. If fork() is called from any thread other than the owner, a StructureViolationException is thrown. Forget the try-with-resources and let the scope escape the method? Same result. This approach is strict, but it effectively prevents accidental resource exhaustion (akin to ‘fork-bombs’).

Observability improvements

Thread dumps now include the scope tree, so tools can show parent–child relationships directly. When I run jcmd Thread.dump_to_file -format=json, every scope appears with its forked threads nested below the owner. Finding the straggler that pins your virtual thread pool becomes a two-second grep instead of a half-hour investigation.

Some more examples to try out

Example 1 – 360° Product View (Gather–Then–Fail)

A classic e-commerce endpoint where a single HTTP request must aggregate product core data, real-time inventory, and a personalized price. Each sub-service is invoked in parallel inside a StructuredTaskScope that enforces an all-or-nothing policy: any failure or exceeding the one-second deadline cancels the whole group and surfaces an error to the caller. The scope’s timeout, custom thread names, and allSuccessfulOrThrow() joiner encapsulate what is often a complex web of CompletableFuture wiring in three declarative lines.

import java.time.Duration;
import java.util.Random;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.ThreadFactory;

public class ThreeSixtyProductView {
  record Product(long id, String name) {}
  record Stock(long productId, int quantity) {}
  record Price(long productId, double amount) {}
  record ProductPayload(Product core, Stock stock, Price price) {}

  private static Product coreApi(long id) throws InterruptedException {
    Thread.sleep(100); // simulate latency
    return new Product(id, "Gadget‑" + id);
  }

  private static Stock stockApi(long id) throws InterruptedException {
    Thread.sleep(120);
    return new Stock(id, new Random().nextInt(100));
  }

  private static Price priceApi(long id) throws InterruptedException {
    Thread.sleep(150);
    return new Price(id, 99.99);
  }

  static ProductPayload fetchProduct(long id) throws Exception {
    ThreadFactory named = Thread.ofVirtual().name("prod-%d", 1).factory();

    try (var scope = StructuredTaskScope.open(
        StructuredTaskScope.Joiner.allSuccessfulOrThrow(),
        cfg -> cfg.withTimeout(Duration.ofSeconds(1))
            .withThreadFactory(named))) {

      StructuredTaskScope.Subtask core = scope.fork(() -> coreApi(id));
      StructuredTaskScope.Subtask stock = scope.fork(() -> stockApi(id));
      StructuredTaskScope.Subtask price = scope.fork(() -> priceApi(id));

      scope.join(); // throws on first failure / timeout
      return new ProductPayload(core.get(), stock.get(), price.get());
    }
  }

  void main() throws Exception {
    ProductPayload productPayload = fetchProduct(1L);
    System.out.println(productPayload);
  }
}

Example 2 – “Race the Mirrors” File Downloader

Large binaries are hosted on several CDN mirrors. Latency varies, so we fire requests to every mirror simultaneously and use Joiner.anySuccessfulResultOrThrow() to stream the first successful InputStream, cancelling the rest. Bandwidth and connection slots are freed instantly, and users perceive the fastest possible download without manual cancellation plumbing.

import java.io.*;
import java.net.URI;
import java.nio.file.*;
import java.util.List;
import java.util.Random;
import java.util.concurrent.StructuredTaskScope;

public class MirrorDownloaderDemo {
  void main() throws Exception {
    List mirrors = List.of(
        URI.create("https://mirror‑a.example.com"),
        URI.create("https://mirror‑b.example.com"),
        URI.create("https://mirror‑c.example.com"));

    Path target = Files.createFile(Path.of("download1.txt"));
    download(target, mirrors);
    System.out.println("Saved to " + target.toAbsolutePath());
  }

  static Path download(Path target, List mirrors) throws Exception {
    try (var scope = StructuredTaskScope.open(
        StructuredTaskScope.Joiner.anySuccessfulResultOrThrow())) {

      mirrors.forEach(uri -> scope.fork(() -> fetchFromMirror(uri)));
      try (InputStream in = scope.join()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
      return target;
    }
  }

  private static InputStream fetchFromMirror(URI uri) throws InterruptedException {
    Thread.sleep(50 + new Random().nextInt(300));
    String data = "Downloaded from " + uri + "\n";
    return new ByteArrayInputStream(data.getBytes());
  }
}

Example 3 – Batched Thumbnail Generator with Nested Scopes

A media pipeline step receives a directory of images. An outer scope iterates through the files, while an inner scope, for each image, fans out three resize tasks (small, medium, and large). The inner scope fails fast; if any resize fails, that image is skipped, but the outer batch continues unaffected. Nested scopes separate per-item consistency from batch-level throughput with minimal code.

import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.StructuredTaskScope;

public class ThumbnailBatchDemo {
  enum Size {SMALL, MEDIUM, LARGE}

  void main() throws Exception {
    Path tmpDir = Files.createTempDirectory("images");
    for (int i = 0; i < 3; i++) Files.createTempFile(tmpDir, "img" + i, ".jpg");
    processBatch(tmpDir);
  }

  static void processBatch(Path dir) throws IOException, InterruptedException {
    try (var batch = StructuredTaskScope.open()) {
      try (var files = Files.list(dir)) {
        files.filter(Files::isRegularFile)
            .forEach(img -> batch.fork(() -> handleOne(img)));
      }
      batch.join();
    }
  }

  private static void handleOne(Path image) {
    try (var scope = StructuredTaskScope.open(
        StructuredTaskScope.Joiner.allSuccessfulOrThrow())) {
      scope.fork(() -> resizeAndUpload(image, Size.SMALL));
      scope.fork(() -> resizeAndUpload(image, Size.MEDIUM));
      scope.fork(() -> resizeAndUpload(image, Size.LARGE));
      scope.join();
    } catch (Exception ex) {
      System.err.println("Skipping " + image.getFileName() + ": " + ex);
    }
  }

  private static Void resizeAndUpload(Path image, Size size) throws InterruptedException {
    Thread.sleep(80); // simulate resize
    Thread.sleep(40); // simulate upload
    System.out.println("Uploaded " + image.getFileName() + " [" + size + "]");
    return null;
  }
}

Example 4 – Real-Time Quote Service with Timed Fallback

A trading UI demands a quote within 30 ms. A custom joiner captures the first successful price from the primary market feed, with a scope-level timeout of 30 ms. If the feed stalls, scope.join() returns empty and the service instantly falls back to yesterday’s cached closing price. Callers always receive a value on time, and timeout logic lives in one declarative line.

import java.time.Duration;
import java.util.*;
import java.util.concurrent.StructuredTaskScope;
import java.util.concurrent.StructuredTaskScope.Subtask;

public class QuoteServiceDemo {
  void main() throws Exception {
    double q = quote("ACME");
    System.out.printf("Quote for ACME: %.2f%n", q);
  }

  static double quote(String symbol) throws InterruptedException {
    var firstSuccess = new StructuredTaskScope.Joiner>() {
      private volatile Double value;

      public boolean onComplete(Subtask st) {
        if (st.state() == Subtask.State.SUCCESS) value = st.get();
        return value != null;           // stop when we have one
      }

      public Optional result() {
        return Optional.ofNullable(value);
      }
    };

    try (var scope = StructuredTaskScope.open(firstSuccess,
        cfg -> cfg.withTimeout(Duration.ofMillis(30)))) {
      scope.fork(() -> marketFeed(symbol));
      Optional latest = scope.join();
      return latest.orElseGet(() -> cache(symbol));
    }
  }

  private static double marketFeed(String symbol) throws InterruptedException {
    long delay = new Random().nextBoolean() ? 20 : 60; // 50 % chance timeout
    Thread.sleep(delay);
    return 100 + new Random().nextDouble();
  }

  //for demo purposes only
  private static double cache(String symbol) {
    return 95.00;
  }
}

Final thoughts

These changes represent a significant maturation of the structured concurrency API. While I was initially frustrated by the frequent API changes, I now appreciate that the Java team took the time to get this right. The structured concurrency API we have today is significantly better than what we started with, and I’m confident it will serve as a solid foundation for concurrent programming in Java going forward. Want to dive deeper into the latest advancements in Java concurrency? To explore these topics further and master modern techniques, consider checking out the book “Modern Concurrency in Java” available on O’Reilly: https://learning.oreilly.com/library/view/modern-concurrency-in/9781098165406/

Type your email… {#subscribe-email}

Speaking at GeeCON 2025: A Memorable Kraków Experience

2025-05-25T00:00:00+00:00

Speaking at GeeCON 2025: A Memorable Kraków Experience

I had the pleasure of attending GeeCON 2025 in Kraków—my very first time at the conference. While the sessions were excellent, what truly stood out was the strong sense of community that made the experience special.

I was also lucky to have some great conversations beyond the tech. I had a wonderful discussion with Shaaf, ranging from history to politics, over dinner at a Turkish restaurant and then again at a Pakistani one the next day. Later, I spent time walking around the city with Mohamed Taman — we took photos in various poses and had fun soaking in Kraków’s atmosphere. That evening, we joined the speaker dinner, where we ended up discussing politics, World War history, technology, religion, and just about everything else. We returned to the hotel close to midnight — a long, engaging, and memorable evening.

Another fun moment: I had a nice chat with Heinz Kabutz at the hotel lobby. Both of us wanted to attend each other’s sessions, but unfortunately, they were scheduled at the same time. We laughed about it when Heinz jokingly predicted, “Your session will have 50 people, and mine will have 5!” — a classic, light-hearted moment of speaker camaraderie.

This year, I was fortunate to have two sessions accepted at GeeCON.

The first was “Java + LLMs: A Hands-on Guide to Building LLM Apps in Java with Jakarta.”

My co-speaker Shaaf and I presented in a movie theatre with a massive screen, which added an extra thrill to the experience. We demonstrated how Java developers can connect to LLMs using LangChain4j and shared a variety of practical techniques for building intelligent apps. The session drew a full house and was well-received, which was incredibly encouraging. Around 90-100 people joined the session.

Later in the day, I delivered another talk titled “Breaking Java Stereotypes: It’s Not Your Dad’s Language Anymore.”

This one was scheduled at the very end of the day, and I only had 20 minutes. By that point, both the audience and I were understandably fatigued from a long day of deep tech. Still, I gave it my all, and I hope I convinced a few attendees to see Java in a new light.

Outside the conference, Kraków itself left a lasting impression. I’m drawn to cities with rich historical backdrops, where the roads, ancient buildings, and even the pavement seem to hold layers of the past. It’s humbling to walk on ground that has witnessed the full spectrum of history, from golden ages to the turmoil of war, as this depth is what makes these places so distinct. This stands in stark contrast to many modern cities, which can feel uniform in their amenities.

Kraków, however, is captivating. Its forts, ancient architecture, and historic cobblestones create a remarkable aura. Although my visit lasted only a few days, as a traveller, I found the experience quite worthwhile. The city’s unique charm is something that will stay with me for a long time.

On a lighter note, I encountered a cultural quirk. As someone who drinks a lot of water, but almost never the sparkling kind, I was surprised by how ubiquitous sparkling water is in Poland. The question “Still or sparkling?” would come to you if you ask for water. So when I called room service, I made sure to be clear: “A large bottle of still water, please.” To my surprise, what arrived was a bottle that could only be described as small or, at best, a medium bottle. Our definitions of ‘large’ differed!

I look forward to the possibility of catching up with some of you again at a future GeeCON or somewhere else in the Java community! The sense of community and anticipation for future meetings is what makes these experiences truly special.

Type your email… {#subscribe-email}

Java + LLMs  + LangChain4j — 2025 Talk Series

2025-05-03T00:00:00+00:00

Java + LLMs  + LangChain4j — 2025 Talk Series

Shaaf and I have been heads‑down exploring how LangChain4j slots into everyday Java and Jakarta EE projects. Our experiments have grown into a full talk series.

You can find a list of delivered and upcoming talks on my conference page: https://bazlur.ca/conferences/

Why we’re doing this

LangChain4j gives Java devs RAG pipelines, vector‑store abstractions, and agent helpers without leaving the JVM.
Jakarta EE supplies the familiar plumbing—CDI, JPA, JAX‑RS—so LLM features drop into existing codebases instead of sitting in sidecars.
Together they let us prototype AI‑powered features (chat, summarization, semantic search), Function calling, MCP and many more. You can take them straight to production.

What the session covers

Quick introduction to LLM plumbing in Java
Prompt design patterns
Memory management techniques
Tool integration (function calling)
RAG (Retrieval‑Augmented Generation) end‑to‑end
vector stores
Model Context Protocol

Slides: https://speakerdeck.com/bazlur_rahman/java-plus-llms-a-hands-on-guide-to-building-llm-apps-in-java-with-jakarta-334970cb-c9e9-46ff-931b-65b0a7a50adb

Try the code

We built a progressive demo repo — https://github.com/learnj-ai/llm-jakarta .

We’re excited to keep refining these ideas and would love your feedback—see you at the next stop on the schedule!

Type your email… {#subscribe-email}

Chat with Your Knowledge Base: A Hands-On Java & LangChain4j Guide

2025-04-18T00:00:00+00:00

Chat with Your Knowledge Base: A Hands-On Java \& LangChain4j Guide

Disclaimer: This article details an experimental project built for learning and demonstration purposes. The implementation described is not intended as a production-grade solution. Some parts of the code were generated using JetBrains’ AI Agent, Junie.

Large Language Models (LLMs) like GPT-4, Llama, and Gemini have revolutionized how we interact with information. However, their knowledge is generally limited to the data they were trained on. What if you need an AI assistant that understands your specific domain knowledge – your company’s internal documentation, product specs, or operational data from a complex system?

This is where Retrieval-Augmented Generation (RAG) comes in. RAG enhances LLMs by providing them with relevant information retrieved from your specific knowledge sources before they generate a response. This allows them to answer questions based on data they weren’t originally trained on.

This article is a hands-on guide for Java developers looking to build such a system. We’ll walk through creating a simple application that allows you to “chat” with a custom knowledge base using Java and the LangChain4j library. LangChain4j simplifies the process of integrating LLMs and building AI applications within the Java ecosystem.

By the end of this guide, you’ll have built a basic RAG pipeline that:

Loads information from local text files representing your knowledge base.
Processes and stores this information in a way the LLM can access.
Uses an LLM (like OpenAI’s GPT or a local model via Ollama) combined with retrieved knowledge to answer your questions.

What is Retrieval-Augmented Generation (RAG)?

Imagine asking an LLM a question about a specific error code in your internal system. Without RAG, the LLM might guess or say it doesn’t know.

RAG changes this by adding a crucial step:

Retrieve: When you ask a question, the system first searches your specific knowledge base (documents, databases, etc.) for information relevant to your query.
Augment: This retrieved information (the “context”) is then added to your original question and sent as a more detailed prompt to the LLM.
Generate: The LLM uses both your question and the provided context to generate an informed answer.

Essentially, RAG gives the LLM the relevant “cheat sheet” just before it needs to answer your domain-specific question.

Why LangChain4j?

LangChain4j is a Java library inspired by the popular Python LangChain project. It provides helpful abstractions and tools to streamline the development of LLM-powered applications in Java. It simplifies tasks like:

Connecting to various LLM providers (OpenAI, Ollama, Gemini, etc.).
Managing prompts and chat memory.
Loading and transforming documents.
Integrating with embedding models and vector stores (essential for RAG).
Creating AI services and agents.

Using LangChain4j means you can focus more on your application’s logic rather than the boilerplate code often involved in API integrations and data handling for AI tasks.

The Scenario: Querying Operational Knowledge

For this demo, we won’t build a full-blown industrial system interface. Instead, we’ll simulate a knowledge base containing basic information about technical components, their status, and known issues or operational rules. This information will be stored in simple text files. Our goal is to build a chat interface that can answer questions based only on the information in these files, using RAG.

Prerequisites

Before we start coding, make sure you have the following installed:

Java Development Kit (JDK): Version 17 or later is recommended; JDK 21 or later is preferred.
Build Tool: Apache Maven or Gradle. We’ll use Maven examples here.
IDE: A Java IDE like IntelliJ IDEA, Eclipse, or VS Code with Java extensions.
LLM Access: You need a way to interact with a large language model (LLM). Choose one:

Option A (OpenAI): An API key from OpenAI. You can get one from their website. LangChain4j allows using “demo” as a key for basic, rate-limited testing.
Option B (Ollama – Local): Install Ollama on your machine. After installation, pull a model via the command line (e.g., ollama pull llama3 or ollama pull mistral). This allows you to run the LLM entirely locally.

Step 1: Project Setup (Maven)

Create a new Maven project in your IDE. Open the pom.xml file and add the necessary LangChain4j dependencies.


    dev.langchain4j
    langchain4j
    ${langchain4j.version}



    dev.langchain4j
    langchain4j-open-ai
    ${langchain4j.version}



    dev.langchain4j
    langchain4j-ollama
    ${langchain4j.version}

You can choose either the langchain4j-open-ai or langchain4j-ollama dependency.

Step 2: Creating the Knowledge Base Files

We need some raw data to feed our RAG system. Create a directory named src/main/resources in your project structure. Inside this directory, create two text files:

src/main/resources/components.txt :

Component ID: PUMP-001. Type: Centrifugal Pump. Status: Running. Connected to: VALVE-001, PIPE-002. Location: Sector A.
Component ID: VALVE-001. Type: Gate Valve. Status: Open. Connected to: PUMP-001, TANK-A. Location: Sector A.
Component ID: SENSOR-T1. Type: Temperature Sensor. Monitors: PUMP-001 Casing. Reading: 65C. Unit: Celsius. Location: Sector A.
Component ID: SENSOR-P1. Type: Pressure Sensor. Monitors: PIPE-002. Reading: 150. Unit: PSI. Location: Sector B.
Component ID: MOTOR-001. Type: Electric Motor. Status: Running. Drives: PUMP-001. Location: Sector A.

src/main/resources/knowledge.txt :

Fault ID: F001. Description: High Temperature on PUMP-001. Possible Causes: Low lubrication, bearing wear, blocked outlet VALVE-001. Recommended Action: Check lubrication levels and bearing condition.
Event ID: E001. Description: Pressure drop in PIPE-002 below 100 PSI. Related Components: PUMP-001, VALVE-001, SENSOR-P1. Possible Causes: Leak in PIPE-002, PUMP-001 failure, VALVE-001 partially closed.
Rule ID: R001. Condition: If SENSOR-T1 reading > 80C. Action: Generate HIGH_TEMP_ALERT for PUMP-001. Priority: High.
Maintenance Note M001: PUMP-001 bearings last replaced 6 months ago. Next inspection due in 1 month.
Safety Procedure S001: Before servicing PUMP-001, ensure MOTOR-001 is locked out and VALVE-001 is closed.

These files contain simple, factual statements about our simulated system.

Step 3: Ingesting the Knowledge (Building the RAG Pipeline)

Now, we write the Java code to load these files, process them, and store them in a way that’s searchable. This process involves:

Loading: Reading the content from the text files.
Splitting: Breaking down the documents into smaller, manageable chunks (or “segments”). This is important because LLMs have limits on how much text they can process at once, and smaller chunks often lead to more relevant retrieval.
Embedding: Converting each text segment into a numerical vector (an “embedding”) using an Embedding Model. These vectors capture the semantic meaning of the text. Similar concepts will have similar vectors.
Storing: Saving these embeddings along with their corresponding text segments in an “Embedding Store” (often a vector database, but we’ll use a simple in-memory store for this demo).

Create a new Java class, KnowledgeBaseIngestor.java:

package com.example; // Use your package name

package ca.bazlur.util;

import ca.bazlur.service.KnowledgeBaseService;
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentParser;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel; // Choose one
// import dev.langchain4j.model.ollama.OllamaEmbeddingModel; // Choose one
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;

import java.io.IOException;
import java.io.InputStream;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Objects;

public class KnowledgeBaseIngestor {

    /**
     * Loads documents from resource files, creates embeddings, and stores them in an in-memory store.
     *
     * @return An EmbeddingStore containing the processed knowledge base.
     * @throws URISyntaxException if the resource file paths are invalid.
     */
    public static EmbeddingStore ingestData() throws URISyntaxException, IOException {
        System.out.println("Starting knowledge base ingestion...");

        // --- 1. Load Documents ---
        Document componentsDoc = loadDocumentFromResource("components.txt", new TextDocumentParser());
        Document knowledgeDoc = loadDocumentFromResource("knowledge.txt", new TextDocumentParser());
        List documents = List.of(componentsDoc, knowledgeDoc);
        System.out.println("Documents loaded successfully.");

        // --- 2. Setup Embedding Model ---
        // Choose *one* embedding model provider:

        // Option A: OpenAI (Requires OPENAI_API_KEY environment variable or use "demo")
//      System.out.println("Initializing OpenAI Embedding Model...");
//      EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
//              .apiKey(System.getenv("OPENAI_API_KEY") != null ? System.getenv("OPENAI_API_KEY") : "demo")
//              .logRequests(true) // Optional: Log requests to OpenAI
//              .logResponses(true) // Optional: Log responses from OpenAI
//              .build();

        // Option B: Ollama (Requires Ollama server running locally)
        System.out.println("Initializing Ollama Embedding Model...");
        EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434") // Default Ollama URL
                .modelName("llama3")
                .build();
        System.out.println("Embedding Model initialized.");


        // --- 3. Setup Embedding Store ---
        // We use a simple in-memory store for this demo.
        // For persistent storage, explore options like Chroma, Pinecone, Weaviate, etc.
        System.out.println("Initializing In-Memory Embedding Store...");
        EmbeddingStore embeddingStore = new InMemoryEmbeddingStore<>();
        System.out.println("Embedding Store initialized.");

        // --- 4. Setup Ingestion Pipeline ---
        // Define how documents are split into segments (chunking strategy)
        // recursive(maxSegmentSize, maxOverlap) splits text recursively, trying to keep paragraphs/sentences together.
        // 300 characters per segment, 30 characters overlap between segments.
        DocumentSplitter splitter = DocumentSplitters.recursive(300, 30);
        System.out.println("Using recursive document splitter (300 chars, 30 overlap).");

        // EmbeddingStoreIngestor handles splitting, embedding, and storing.
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(splitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();

        // --- 5. Ingest Documents ---
        System.out.println("Ingesting documents into the embedding store...");
        ingestor.ingest(documents);
        System.out.println("Ingestion complete. Embedding store contains");

        return embeddingStore;
    }

    /**
     * Helper method to get the Path of a resource file.
     * Handles running from IDE and within a JAR file.
     * @param resourceName The name of the file in src/main/resources
     * @return The Path object for the resource.
     * @throws URISyntaxException If the resource URL is malformed.
     * @throws RuntimeException If the resource is not found.
     */
    private static Document loadDocumentFromResource(String resourceName, DocumentParser parser) throws IOException {
        try (InputStream inputStream = getResourceAsStream(resourceName)) {
            Objects.requireNonNull(inputStream, "Resource not found: " + resourceName);
            return parser.parse(inputStream);
        }
    }

    protected static InputStream getResourceAsStream(String resourceName) {
        return KnowledgeBaseService.class.getClassLoader().getResourceAsStream(resourceName);
    }

    public static void main(String[] args) {
        try {
            EmbeddingStore store = ingestData();
        } catch (URISyntaxException e) {
            System.err.println("Error finding resource files: " + e.getMessage());
            e.printStackTrace();
        } catch (Exception e) {
            System.err.println("An error occurred during ingestion: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Explanation of Key Classes:

TextDocumentParser: A simple parser for plain text files.
DocumentSplitters.recursive(): A strategy for splitting documents into segments, trying to respect sentence/paragraph boundaries. The numbers (e.g., 300, 30) control the maximum segment size and the overlap between segments.
EmbeddingModel (OpenAiEmbeddingModel / OllamaEmbeddingModel): The interface and implementations for converting text to embeddings. Note: For Ollama, using a dedicated embedding model like nomic-embed-text is generally better than using a chat model for embedding.
InMemoryEmbeddingStore: A basic implementation of EmbeddingStore that keeps data in memory. Suitable for demos, but data is lost when the application stops unless serialized.
EmbeddingStoreIngestor: Orchestrates the process of splitting documents, embedding the segments, and adding them to the embedding store.

Step 4: Building the Chat Interface (AiService)

Now we create the main application class that will handle user interaction. It will:

Initialize the knowledge base by calling our KnowledgeBaseIngestor.
Set up a Chat Language Model (the LLM that generates responses).
Set up a ContentRetriever that uses the embedding store to find relevant context for user queries.
Use LangChain4j’s AiServices to create a simple chat interface.
Optionally use ChatMemory to allow the assistant to remember the conversation history.

Create a new Java class, KnowledgeAssistant.java:

package ca.bazlur.util;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.ollama.OllamaChatModel;
import dev.langchain4j.model.ollama.OllamaEmbeddingModel;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.model.openai.OpenAiEmbeddingModel;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.store.embedding.EmbeddingStore;

import java.util.Scanner;

public class KnowledgeAssistant {

    interface Assistant {
        @SystemMessage("""
                    You are an AI assistant specialized in querying operational knowledge about technical systems
                    (components, status, faults, procedures). Answer user questions accurately and concisely, 
                    relying *strictly* on the information provided in the context. Do not use any prior knowledge or make assumptions.
                    """)
        String chat(String userMessage);
    }

    public static void main(String[] args) {
        try {
            // --- 1. Ingest Knowledge Base ---
            EmbeddingStore embeddingStore = KnowledgeBaseIngestor.ingestData();

            // --- 2. Setup Chat Model ---

            // Option A: OpenAI
            /*System.out.println("Initializing OpenAI Chat Model...");
            ChatLanguageModel chatModel = OpenAiChatModel.builder()
                    .apiKey(System.getenv("OPENAI_API_KEY") != null ? System.getenv("OPENAI_API_KEY") : "demo")
                    .modelName("gpt-4o") // Or gpt-4o, etc.
                    .logRequests(true)
                    .logResponses(true)
                    .build();
            // We also need the corresponding embedding model for the retriever
            EmbeddingModel embeddingModel = OpenAiEmbeddingModel.builder()
                    .apiKey(System.getenv("OPENAI_API_KEY") != null ? System.getenv("OPENAI_API_KEY") : "demo")
                    .logRequests(true)
                    .logResponses(true)
                    .build();
            */

            // Option B: Ollama
            System.out.println("Initializing Ollama Chat Model...");
            ChatLanguageModel chatModel = OllamaChatModel.builder()
                    .baseUrl("http://localhost:11434")
                    .modelName("llama3") // Or mistral, etc.
                    .build();
            // We also need the corresponding embedding model for the retriever
            EmbeddingModel embeddingModel = OllamaEmbeddingModel.builder()
                .baseUrl("http://localhost:11434")
                .modelName("llama3")
                .build();
            System.out.println("Chat Model initialized.");


            // --- 3. Setup Content Retriever (RAG) ---
            System.out.println("Initializing Content Retriever...");
            ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
                    .embeddingStore(embeddingStore)
                    .embeddingModel(embeddingModel) // Use the *same* embedding model used during ingestion
                    .maxResults(3) // Retrieve top 3 most relevant segments
                    .minScore(0.6) // Filter out segments with relevance score below 0.6
                    .build();
            System.out.println("Content Retriever initialized.");

            // --- 4. Setup Chat Memory (Optional) ---
            // This allows the assistant to remember previous parts of the conversation.
            ChatMemory chatMemory = MessageWindowChatMemory.withMaxMessages(10);
            System.out.println("Chat Memory initialized (window size 10).");

            // --- 5. Create the AiService ---
            // AiServices wires together the chat model, retriever, memory, etc.
            // It automatically implements the Assistant interface based on annotations and configuration.
            System.out.println("Creating AI Service...");
            Assistant assistant = AiServices.builder(Assistant.class)
                    .chatLanguageModel(chatModel)
                    .contentRetriever(contentRetriever)
                    .chatMemory(chatMemory)
                    .build();
            System.out.println("AI Service created. Assistant is ready.");

            // --- 6. Start Interactive Chat Loop ---
            Scanner scanner = new Scanner(System.in);
            System.out.println("\nAssistant: Hello! Ask me about the system components or known issues.");
            while (true) {
                System.out.print("You: ");
                String userQuery = scanner.nextLine();

                if ("exit".equalsIgnoreCase(userQuery)) {
                    System.out.println("Assistant: Goodbye!");
                    break;
                }

                String assistantResponse = assistant.chat(userQuery);
                System.out.println("Assistant: " + assistantResponse);
            }
            scanner.close();

        } catch (Exception e) {
            System.err.println("An error occurred during assistant setup or chat: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Explanation of Key Classes:

ChatLanguageModel (OpenAiChatModel / OllamaChatModel): Interface and implementations for the core LLM that generates responses.
EmbeddingStoreContentRetriever: An implementation of ContentRetriever specifically designed to work with an EmbeddingStore. It takes the user query, embeds it using the same EmbeddingModel used during ingestion, searches the EmbeddingStore for similar embeddings, and retrieves the corresponding text segments.
ChatMemory (MessageWindowChatMemory): Stores the history of the conversation. MessageWindowChatMemory keeps only the last N messages.
AiServices: A powerful factory in LangChain4j that creates an implementation of your defined interface (here, Assistant). It automatically handles:

Taking the user message.
(If ContentRetriever is provided) Retrieving relevant context.
(If ChatMemory is provided) Loading previous messages.
Constructing the final prompt (including context and history) for the ChatLanguageModel.
Getting the response from the LLM.
(If ChatMemory is provided) Saving the current exchange.
Returning the LLM’s response.

Step 5: Running and Testing

Set Environment Variable (if using OpenAI): Make sure your OPENAI_API_KEY environment variable is set.
Run Ollama (if using Ollama): Ensure your Ollama application is running in the background.
Compile: Use Maven to compile your project (e.g., mvn clean compile).
Run: Execute the KnowledgeAssistant class. You can run it from your IDE or use Maven to create an executable JAR (mvn clean package) and run it (java -jar target/knowledge-base-chat-1.0-SNAPSHOT.jar).

Once running, you should see the ingestion messages followed by the “Assistant: Hello!” prompt. Try asking questions based on the content of components.txt and knowledge.txt:

You: What is the status of PUMP-001?
You: Where is SENSOR-P1 located?
You: What are the possible causes of high temperature on PUMP-001?
You: What is rule R001?
You: Tell me about PUMP-001.
You: What is the safety procedure for PUMP-001?

Observe how the assistant’s answers are derived from the information you provided in the text files, demonstrating the RAG process in action.

Conclusion

Congratulations! You’ve built a basic Retrieval-Augmented Generation (RAG) application using Java and LangChain4j. You’ve seen how to load custom knowledge, process it into searchable embeddings, and create an AI assistant that leverages this specific information to provide relevant answers.

This approach of combining the power of LLMs with your domain-specific data opens up vast possibilities for building intelligent applications that truly understand your world.

For the complete source code, visit: https://github.com/rokon12/knowledge-base-chat

If you’re looking for more examples integrating LLMs with Java, especially within the Jakarta EE context, you might find this repository helpful: https://github.com/learnj-ai/llm-jakarta

Type your email… {#subscribe-email}

Building FormPilot: My Journey Creating an AI-Powered Form Filler with RAG, LangChain4j, and Ollama

2025-04-06T00:00:00+00:00

Building FormPilot: My Journey Creating an AI-Powered Form Filler with RAG, LangChain4j, and Ollama

Disclaimer: This article details an experimental project built for learning and demonstration purposes. The implementation described is not intended as production-grade solution. Some parts of the code were generated using JetBrains’ AI Agent, Junie.

Have you ever found yourself filling out the same information on web forms over and over again? Name, email, address, phone number… it’s tedious and time-consuming. As a developer who values efficiency, I decided to tackle this problem head-on by building FormPilot, an intelligent form filler that leverages the power of Large Language Models (LLMs) to fill out forms with contextually appropriate information automatically.

In this article, I’ll take you through my journey of creating FormPilot, a Chrome extension backed by a Java Spring Boot application that uses Retrieval-Augmented Generation (RAG), LangChain4j, and Ollama to fill out web forms intelligently. I’ll share the challenges I faced, the solutions I implemented, and the lessons I learned along the way.

The Inspiration

The idea for FormPilot came to me during a conference registration season. I found myself registering for multiple tech conferences, each with their own registration forms asking for the same information. As I filled out yet another form with my name, email, and bio for the fifth time, I thought, “There has to be a better way.”

Sure, there are password managers and form fillers out there, but they typically just save and replay the exact information you’ve entered before. What if we could create something smarter? Something that could help us understand the context of each field and generate appropriate values based on that context?

That’s when I decided to leverage my experience with Java and my interest in AI to build FormPilot.

The Architecture

I designed FormPilot with two main components:

A Chrome extension that detects forms on web pages and communicates with a local server
A Spring Boot application that uses RAG, LangChain4j, and Ollama to generate appropriate values for form fields

This architecture allows the extension to be lightweight while offloading the heavy lifting of AI processing to the local server. Here’s how the data flows through the system:

The Chrome extension detects a form on a web page
It extracts metadata about each form field (id, name, type, label, placeholder, etc.)
It sends this metadata to the local server
The server uses RAG and LangChain4j with Ollama to generate appropriate values for each field
The server returns these values to the extension
The extension fills the form with the generated values

Getting Started: Setting Up Your Environment

Before we dive into the specific code implementation for FormPilot’s backend, let’s set up the necessary tools: Ollama for running the language model locally and the initial Spring Boot project structure.

Part 1: Installing and Running Ollama Locally

Ollama enables you to run open-source large language models (LLMs) directly on your own machine. This is great for privacy (data stays local) and avoids API costs.

Download and Install Ollama:

Visit the official Ollama website: https://ollama.com/
Follow the instructions for your operating system:

macOS: Download the .zip file, unzip it, move the app to your Applications folder, and run it once. It installs the command-line tool and runs as a background service (check your menu bar).
Windows: Download and run the .exe installer. It typically sets up Ollama as a background service.
Linux: Use the provided installation script in your terminal:
```
curl -fsSL https://ollama.com/install.sh | sh
```
This usually sets up Ollama as a systemd service.

Verify Installation:

Open your terminal (or Command Prompt/PowerShell) and type: ollama --version
You should see the Ollama version number displayed.

Ensure Ollama is Running:

Ollama usually runs automatically in the background after installation on macOS and Windows.
On Linux, you can check its status: sudo systemctl status ollama. If needed, start it with sudo systemctl start ollama and enable it on boot with sudo systemctl enable ollama.
The Ollama server typically listens for requests on http://localhost:11434.

Install an LLM (e.g., Llama 3):

Use the Ollama command line to download the model mentioned in this article (llama3 or deepseek-llm:7b):
```
ollama run llama3
```

This command will first download the model files (this can take a while) and then load the model, giving you a chat prompt (>>>).
You can exit the chat prompt by typing /bye. The model is now downloaded and available for use by applications like your Spring Boot backend.
To see all locally downloaded models, use:

ollama list

Part 2: Creating the Spring Boot Project via Spring Initializr

Spring Initializr is a web tool that generates a basic Spring Boot project structure for you.

Visit Spring Initializr:

Go to https://start.spring.io/ in your browser and configure your project; for brevity, since this part is pretty much known by every spring developer, I am going to skip it. Download the structure and open it in your favourite IDE.

Make sure you have the following dependencies in your gradle.build or pom.xml.

dependencies {
    implementation 'dev.langchain4j:langchain4j-spring-boot-starter:1.0.0-beta2'
    implementation 'dev.langchain4j:langchain4j-open-ai-spring-boot-starter:1.0.0-beta2'
    implementation 'dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:1.0.0-beta2'
    implementation 'dev.langchain4j:langchain4j-ollama:1.0.0-beta2'

    implementation 'org.springframework.boot:spring-boot-starter-web'

    compileOnly 'org.projectlombok:lombok'
    developmentOnly 'org.springframework.boot:spring-boot-devtools'
    annotationProcessor 'org.projectlombok:lombok'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

With Ollama running and the basic Spring Boot project created, you’re ready to start adding the FormPilot-specific code.

Implementing RAG with LangChain4j

One of the most exciting parts of building FormPilot was implementing Retrieval-Augmented Generation (RAG) using LangChain4j. RAG is a technique that enhances LLM outputs by retrieving relevant information from a knowledge base before generating a response.

For FormPilot, I wanted the system to be able to fill forms with personalized information. I started by creating a simple text file (content.txt placed in src/main/resources) containing information about conference speakers, including names, bios, contact information, and session details.

Here’s how I implemented RAG in the RAGConfig class:

package ca.bazlur.formpilot.config;

import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.memory.chat.ChatMemoryProvider;
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.parser.TextDocumentParser;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.rag.content.retriever.ContentRetriever;
import dev.langchain4j.rag.content.retriever.EmbeddingStoreContentRetriever;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;

import java.io.IOException;

import static dev.langchain4j.data.document.loader.FileSystemDocumentLoader.loadDocument;

@Slf4j
@Configuration
public class RAGConfig {

    @Bean(name = "ollamaChatMemoryProvider")
    ChatMemoryProvider chatMemoryProvider(Tokenizer tokenizer) {
        return memoryId -> TokenWindowChatMemory.builder()
                .id(memoryId)
                .maxTokens(10_000, tokenizer)
                .build();
    }

    @Bean
    EmbeddingStore embeddingStore(EmbeddingModel embeddingModel, ResourceLoader resourceLoader) throws IOException {

        // Normally, you would already have your embedding store filled with your data.
        // However, for the purpose of this demonstration, we will:

        // 1. Create an in-memory embedding store
        EmbeddingStore embeddingStore = new InMemoryEmbeddingStore<>();

        // 2. Load an example document
        Resource resource = resourceLoader.getResource("classpath:content.txt");
        Document document = loadDocument(resource.getFile().toPath(), new TextDocumentParser());

        log.info("Document loaded: {}", document.metadata());

        // 3. Split the document into segments 2000 tokens each
        // 4. Convert segments into embeddings
        // 5. Store embeddings into embedding store
        // All this can be done manually, but we will use EmbeddingStoreIngestor to automate this:
        DocumentSplitter documentSplitter = DocumentSplitters.recursive(2000, 500);
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .documentSplitter(documentSplitter)
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .build();
        ingestor.ingest(document);

        return embeddingStore;
    }


    @Bean
    EmbeddingModel embeddingModel(){
        // not good but works for this demo
        return new AllMiniLmL6V2EmbeddingModel();
    }

    @Bean(name = "ollamaContentRetriever")
    ContentRetriever contentRetriever(EmbeddingStore embeddingStore, EmbeddingModel embeddingModel) {
        log.info("Creating ContentRetriever");
        // You will need to adjust these parameters to find the optimal setting,
        // which will depend on multiple factors, for example:
        // - The nature of your data
        // - The embedding model you are using
        int maxResults = 2;
        double minScore = 0.4;

        return EmbeddingStoreContentRetriever.builder()
                .embeddingStore(embeddingStore)
                .embeddingModel(embeddingModel)
                .maxResults(maxResults)
                .minScore(minScore)
                .build();
    }
}

This configuration:

Creates an in-memory embedding store.
Loads the content.txt document from the project’s resources.
Split the document into segments of 2000 characters with a 500-character overlap (adjust these values based on your content and model).
Converts these segments into vector embeddings using the configured EmbeddingModel and stores them.
Creates a ContentRetriever that will query this embedding store to find the most relevant text segments based on semantic similarity to the input query (e.g., a form field label).

The Magic of LangChain4j’s @AiService

One of the most elegant aspects of FormPilot is how it uses LangChain4j’s @AiService annotation to create a declarative interface for interacting with the LLM. This approach dramatically simplifies the code required to prompt the model and parse its response.

Here’s the FormAssistant interface:

package ca.bazlur.formpilot.service;

import ca.bazlur.formpilot.model.FormField;
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.service.spring.AiService;
import dev.langchain4j.service.spring.AiServiceWiringMode;

import java.util.List;
import java.util.Map;
import java.util.Objects;

@AiService
public interface FormAssistant {

    @SystemMessage("""
          You are an AI assistant acting *exclusively* as a JSON Form Filler.
          Your *only* task is to process a list of form fields and accompanying context (if provided) and output a *single, valid JSON object* representing the filled form.
          
          **CRITICAL OUTPUT REQUIREMENT:**
          Your entire response MUST be *only* the final JSON object.
          - Start immediately with `{`.
          - End immediately with `}`.
          - Do NOT include any introductory text, explanations, apologies, conversational filler, or markdown formatting *outside* the JSON structure itself.
          
          **Input You Will Receive:**
          1.  A list/description of form fields, each with metadata like: `id`, `name`, `type`, `label`, `placeholder`, `required`.
          2.  Optional: Text context or user-supplied data relevant to the form.
          
          **Processing Rules:**
          * **JSON Key:** Use the field's `id` as the key in the output JSON. If `id` is empty or missing, use `name`.
          * **Skipping Fields:** Ignore any field where both `id` and `name` are empty strings.
          * **Value Determination (Priority Order):**
              1.  **Context First:** Use any provided text context or user-supplied data to fill the fields. Match data to relevant field types/labels (e.g., use provided phone number for `tel` field, biography text for `textarea` bio field).
              2.  **Generation:** If context is insufficient for a field, generate a realistic and contextually appropriate value based on its `type`, `label`, and `placeholder`.
          * **Type-Specific Generation Rules:**
              * `email`: Generate a plausible email (e.g., `jane.doe@example.com`) if not in context.
              * `password`: Use a placeholder strong password (e.g., `P@ssw0rd123!`).
              * `date`: Generate a valid date (e.g., `1995-06-15`).
              * `number`: Generate a sensible number based on context (e.g., `85000` for annual income).
              * `tel`: Generate a plausible phone number (e.g., `+1-416-555-0199`) if not in context.
              * `textarea` (bios, comments, descriptions): Generate detailed, multi-sentence text relevant to the field's purpose, using context if available.
              * `select-one` (often paired):
                  * Fill the associated text input (`field-id-selectized`) with the *text label* (e.g., "Intermediate").
                  * Leave the `select-one` field (`field-id`) value as `""` (empty string) unless the specific *internal value* is known from context.
          * **Required Fields:** If `required: true` and no value can be determined from context, generate a placeholder value (e.g., `"N/A - Required"`, `"user@example.com"`, `"https://example.com/placeholder"`) instead of leaving it empty.
          * **Duplicate Sections:** Fill apparently duplicated sections (e.g., `User` vs. `ImpersonatedUser`) consistently with the same data unless context specifies otherwise.
      
          Now, process the following form fields and context, providing only the JSON output. 
          
          Return Map where the key is the field's `id` or `name` and the value is the filled value.
                   
          """)
    Map generateForm(@MemoryId String memoryId, @UserMessage List fields);
}

Define the FormField record/class (e.g., in the model package)

package ca.bazlur.formpilot.model;

import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

import java.util.ArrayList;
import java.util.List;

/**
 * Represents a form field from a webpage.
 * This data is sent from the Chrome extension to the server.
 */
@Data
@NoArgsConstructor
@AllArgsConstructor
public class FormField {
    private String id;
    private String name;
    private String type;
    private String label;
    private String placeholder;
    private boolean required;
    private String value;
    private List

With just this interface and the @AiService annotation (from dev.langchain4j.service.spring.AiService), the LangChain4j Spring Boot starter automatically creates a bean implementing this interface. When you call the generateForm method:

It constructs a prompt using the @SystemMessage and the provided fields list (formatted as the @UserMessage).
The process automatically incorporates the ContentRetriever bean (our RAG setup). LangChain4j handles querying the retriever and adding relevant context to the prompt sent to the LLM.
It sends the combined prompt to the configured LLM (Ollama in this case).
It parses the LLM’s JSON response back into a Map.

This declarative approach keeps the service layer clean and focuses on what needs to be done rather than the low-level details of LLM interaction and RAG integration.

Integrating with Ollama

I chose to use Ollama for the LLM backend, which allows for the local running of LLMs. This provides several advantages:

Privacy: Form data never leaves the user’s computer.
No API Costs: Users don’t need to pay for LLM API usage.
Offline Capability: The system can potentially work without an active internet connection (once models are downloaded).

Configuring LangChain4j to use your local Ollama instance is straightforward using Spring Boot’s application.properties (or application.yml) file located in src/main/resources.

Add the following properties:

# LangChain4j Configuration for Ollama

# Base URL for the Ollama API (default port is 11434)
langchain4j.chat-model.ollama.base-url=http://localhost:11434
# Specify the Ollama model to use for chat completions
langchain4j.chat-model.ollama.model-name=llama3
# Temperature: Controls randomness (0.0 = deterministic)
langchain4j.chat-model.ollama.temperature=0.0
# Timeout for API calls
langchain4j.chat-model.ollama.timeout=PT60S # 60 seconds

# Configure the embedding model (used for RAG)
# Use an Ollama model capable of generating embeddings
# Note: Not all models are fine-tuned for embeddings. Check Ollama docs.

langchain4j.embedding-model.ollama.base-url=http://localhost:11434/v1
langchain4j.embedding-model.ollama.model-name=deepseek-llm:7b

langchain4j.embedding-model.ollama.timeout=PT60S

Make sure you have downloaded the specified models using Ollama (e.g., ollama run llama3 or ollama run deepseek-llm:7b).

These properties tell the LangChain4j Spring Boot starter to configure both the chat model (for generating the form values) and the embedding model (for RAG) to use your local Ollama server.

Building the Chrome Extension

The Chrome extension acts as the user-facing part of FormPilot. It needs to:

Detect forms on the current web page.
Extract relevant information about each input field.
Send this information to the local Spring Boot backend.
Receive the generated values from the backend.
Fill out the form fields with the values you received.

I structured the extension with three main parts:

Content Script (content.js) : Injected into web pages to interact with the DOM (find forms, fill fields).
Background Script (background.js or Service Worker) : Acts as a central hub, handling communication with the backend server and managing the extension state.
Popup UI (popup.html and popup.js) : Provides a simple interface for the user (e.g., a button to trigger form filling).

The most challenging part was reliably detecting form fields and extracting useful metadata, especially labels, as HTML forms vary widely in structure. I implemented several strategies in content.js:

Look for a element with a for attribute matching the field’s ID.
Check if the field is nested inside a element.
Look for a sibling element.
Look at the aria-label or aria-labelledby attributes.
As a fallback, check for preceding non-interactive element text content near the field.

Here’s a simplified snippet illustrating field data extraction:

// content.js (Simplified Example)
function findFormsAndFields() {
  const forms = document.querySelectorAll('form');
  const allFieldsData = [];
  forms.forEach(form => {
    // Select common form input elements within the current form
    const elements = form.querySelectorAll('input:not([type="submit"]):not([type="button"]):not([type="reset"]):not([type="hidden"]):not([type="file"]), select, textarea');
    elements.forEach(element => {
      const fieldData = createFormFieldData(element);
      if (fieldData && !fieldData.value) { // Only include fields that are not already filled
        // Check if field is visible to the user
        if (element.offsetParent !== null) {
            allFieldsData.push(fieldData);
        }
      }
    });
  });
  return allFieldsData;
}
function getLabelForElement(element) {
    // 1. Check for 
    if (element.id) {
        const label = document.querySelector(`label[for="${element.id}"]`);
        if (label) return label.textContent.trim();
    }
    // 2. Check for parent 
    const parentLabel = element.closest('label');
    if (parentLabel) return parentLabel.textContent.replace(element.value || '', '').trim(); // Attempt to remove element's own text if nested
    // 3. Check aria-label
    const ariaLabel = element.getAttribute('aria-label');
    if (ariaLabel) return ariaLabel.trim();
    // 4. Check aria-labelledby
    const ariaLabelledBy = element.getAttribute('aria-labelledby');
    if (ariaLabelledBy) {
        const labelElement = document.getElementById(ariaLabelledBy);
        if (labelElement) return labelElement.textContent.trim();
    }
    // 5. Fallback: Preceding sibling text (simplified)
    let previous = element.previousElementSibling;
    if (previous && !['input', 'select', 'textarea', 'button', 'label'].includes(previous.tagName.toLowerCase())) {
        return previous.textContent.trim();
    }
    return ''; // No label found
}
function createFormFieldData(element) {
  // Basic check to skip elements we don't want to fill
  const type = element.type ? element.type.toLowerCase() : element.tagName.toLowerCase();
  if (type === 'hidden' || type === 'submit' || type === 'button' || type === 'reset' || type === 'file') {
    return null;
  }
  // Skip if element is disabled or readonly
  if (element.disabled || element.readOnly) {
      return null;
  }
  // Get the current value (to potentially avoid filling already completed fields)
  let value = '';
  if (type === 'checkbox' || type === 'radio') {
      value = element.checked ? 'true' : 'false'; // Represent boolean state
  } else {
      value = element.value || '';
  }
  return {
    id: element.id || '',
    name: element.name || '',
    type: type,
    label: getLabelForElement(element), // Use dedicated function for label finding
    placeholder: element.placeholder || '',
    required: element.required || false,
    value: value // Include current value
  };
}
// --- Communication Logic (Example using Chrome messaging) ---
// This function would be called, e.g., when the user clicks the extension icon
function requestFormFill() {
    const fieldsToSend = findFormsAndFields();
    if (fieldsToSend.length > 0) {
        // Send fields to the background script
        chrome.runtime.sendMessage({ action: "fillForm", fields: fieldsToSend }, (response) => {
            if (chrome.runtime.lastError) {
                console.error("FormPilot Error:", chrome.runtime.lastError.message);
                // Handle error (e.g., show message to user)
                return;
            }
            if (response && response.filledFields) {
                // Fill the actual form elements on the page
                fillFormOnPage(response.filledFields);
            } else {
                console.error("FormPilot: No filled fields received from backend or backend error.");
                // Handle error or empty response
            }
        });
    } else {
        console.log("FormPilot: No suitable empty fields found on the page.");
        // Optionally notify user
    }
}
function fillFormOnPage(filledFieldsMap) {
    // filledFieldsMap is the Map from the backend
    console.log("FormPilot: Received fields to fill:", filledFieldsMap);
    for (const [identifier, value] of Object.entries(filledFieldsMap)) {
        // Try finding element by ID first, then by name
        let element = document.getElementById(identifier) || document.querySelector(`[name="${identifier}"]`);
        if (element) {
            fillField(element, value);
        } else {
            console.warn(`FormPilot: Could not find element with id or name: ${identifier}`);
        }
    }
    console.log("FormPilot: Form filling attempt complete.");
}
function fillField(element, value) {
    const tagName = element.tagName.toLowerCase();
    const type = element.type ? element.type.toLowerCase() : tagName;
    console.log(`FormPilot: Filling field (id=${element.id}, name=${element.name}, type=${type}) with value: ${value}`);
    if (type === 'select' || type === 'select-one') {
        // Find the option with matching value or text (case-insensitive text match)
        const options = Array.from(element.options);
        let foundOption = options.find(opt => opt.value === value);
        if (!foundOption) {
            foundOption = options.find(opt => opt.text.toLowerCase() === value.toLowerCase());
        }
        if (foundOption) {
            element.value = foundOption.value;
        } else {
            console.warn(`FormPilot: Could not find option "${value}" for select field (id=${element.id}, name=${element.name})`);
            // Optionally try creating an option if allowed, or skip
        }
    } else if (type === 'checkbox') {
        // Check/uncheck based on truthiness (e.g., "true", "yes", "on", true)
        const shouldBeChecked = ['true', 'yes', 'on', '1'].includes(String(value).toLowerCase());
        element.checked = shouldBeChecked;
    } else if (type === 'radio') {
        // For radio buttons, find the one in the group with the matching value and check it
        const radioGroup = document.querySelectorAll(`input[type="radio"][name="${element.name}"]`);
        radioGroup.forEach(radio => {
            radio.checked = (radio.value === value);
        });
    } else {
        // For text inputs, textareas, email, password, number, date, etc.
        element.value = value;
    }
    // IMPORTANT: Trigger events to simulate user input, which many frameworks rely on
    triggerEvents(element);
}
function triggerEvents(element) {
    // Events commonly listened for by frameworks (React, Vue, Angular, etc.)
    const eventsToTrigger = [
        new Event('input', { bubbles: true, cancelable: true }),
        new Event('change', { bubbles: true, cancelable: true }),
        new FocusEvent('focus', { bubbles: true, cancelable: true }), // Simulate focus first
        new FocusEvent('blur', { bubbles: true, cancelable: true })   // Then blur
    ];
    // Dispatch focus first
    element.dispatchEvent(eventsToTrigger[2]);
    // Dispatch input and change
    element.dispatchEvent(eventsToTrigger[0]);
    element.dispatchEvent(eventsToTrigger[1]);
    // Dispatch blur last
    element.dispatchEvent(eventsToTrigger[3]);
}
// --- Listener for messages from background or popup ---
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
    if (request.action === "triggerFormFill") {
        console.log("FormPilot: Received triggerFormFill command.");
        requestFormFill();
        // Indicate async response if needed, though maybe not for this simple trigger
        // sendResponse({ status: "Form fill initiated" });
        return true; // Keep message channel open for potential async response from requestFormFill if it used sendResponse
    }
});
console.log("FormPilot Content Script Loaded.");

The background script (background.js) would handle the chrome.runtime.onMessage listener for the “fillForm” action, make a fetch request to the local Spring Boot server (http://localhost:8080/api/fill-form or similar), and send the response back to the content script. Remember to configure CORS in your Spring Boot application to allow requests from the Chrome extension’s origin.

Check out the chrome-extension: https://github.com/rokon12/form-pilot/tree/main/chrome-extension

Setting up the Spring Boot Server

Build the Spring Boot application:

./gradlew build

Run the Spring Boot application:

./gradlew bootRun

The server will start on port 8080. You can verify it’s running by visiting:

http://localhost:8080/api/form/health

Setting up the Chrome Extension

Open Google Chrome and navigate to: chrome://extensions/
Enable “Developer mode” by toggling the switch in the top right corner.
Click “Load unpacked” and select the chrome-extension directory from this project.
The extension should now be installed and visible in your Chrome toolbar.

Using the Demo Form

A demo form is included in this project to help you test the Smart Form Filler extension:

Create the demo form and open it in your browser: file:///path/to/FormPilot/demo/demo-form.html
Alternatively, you can serve the demo form using a simple HTTP server:

If you have Python installed

python -m http.server

Then visit http://localhost:8000/demo/demo-form.html in your browser.

With the Smart Form Filler extension installed and the local server running, the extension should automatically detect the form fields on the demo page.
You can manually trigger form filling in two ways:

Click the extension icon in your Chrome toolbar
Right-click on the page and select “Smart Form Filler” from the context menu, then choose “Fill Forms”

Testing with a Demo Form

To test FormPilot thoroughly, I created a comprehensive demo HTML form (demo.html) that included a wide variety of field types:

Text inputs (First Name, Last Name, Address Line 1)
Email input
Password input
Telephone input (tel)
Date input
Number input (Age)
Select dropdown (Country)
Radio buttons (T-Shirt Size)
Checkboxes (Interests)
Textarea (Comments/Bio)

Using this demo form locally allowed me to iterate quickly on both the backend logic (ensuring correct values were generated for each type) and the frontend JavaScript (ensuring fields were detected and filled correctly, including event triggering).

Challenges and Solutions

Building FormPilot wasn’t without its hurdles. Here are some key challenges and how I addressed them:

Challenge 1: Robust Form Field Detection \& Label Association

As mentioned, reliably finding input fields and their corresponding labels across diverse HTML structures is tricky.

Solution: Implemented a multi-strategy approach in content.js (checking for attribute, parent labels, aria-label, siblings) as shown in the snippet. Added checks for visibility (offsetParent !== null) and avoided filling already completed fields. It’s not perfect, but covers many common patterns.

Challenge 2: Handling Diverse Field Types Correctly

Simply setting element.value doesn’t work for all types (selects, checkboxes, radio buttons).

Solution: Created specific logic within the fillField function in content.js to handle different type attributes or tagName. For selects, it searches for matching option values or text. For checkboxes/radio buttons, it sets the checked property based on the received value.

Challenge 3: Triggering JavaScript Events for Framework Compatibility

Modern web apps heavily rely on JavaScript frameworks (React, Vue, Angular) that listen for input events (input, change, blur, focus) to update their internal state. Just setting the field’s value programmatically often doesn’t trigger these listeners.

Solution: Implemented the triggerEvents function in content.js to manually create and dispatch input, change, focus, and blur events on the element after setting its value. This significantly improves compatibility with framework-based forms.

Challenge 4: Backend Communication \& CORS

Chrome extensions have security restrictions. Content scripts cannot directly make cross-origin requests (like to http://localhost:8080).

Solution:

The content script sends a message to the background script.
The background script (which has fewer restrictions) uses the fetch API to call the Spring Boot backend.
Crucially, the Spring Boot application must be configured to handle Cross-Origin Resource Sharing (CORS) by sending the appropriate headers (e.g., Access-Control-Allow-Origin: chrome-extension://YOUR_EXTENSION_ID). This can be done globally using @Configuration and WebMvcConfigurer or on specific @RestController endpoints using @CrossOrigin.

Challenge 5: Prompt Engineering \& Model Compliance

Getting LLMs to follow detailed instructions exactly—especially when it comes to the output format and using logic like “try RAG first”—can be tricky.

Larger models like OpenAI’s GPT-4o usually follow detailed system instructions well, including sticking to strict JSON formats and trying RAG before generating answers. However, smaller models running locally with Ollama (like llama3 or deepseek-llm:7b) may not always do the same. They might add extra text outside the JSON, miss when they should use RAG, or create less accurate data for certain fields.

Solution: You often need to fine-tune the prompt to get better results with smaller models. This means making the system instructions clearer and simpler, giving good examples of the exact format you want, and even adding warnings like, “Don’t write anything before or after the JSON.” Sometimes, it also helps to add a cleanup step in your Java backend—like extracting just the JSON from the output—even if the model includes extra text. It’s a balance between how powerful the model is, how complex your prompt is, and how much cleanup you’re willing to do afterward.

Lessons Learned

This project was a fantastic learning experience, reinforcing several key concepts:

1. The Power of RAG for Contextual Personalization

Implementing RAG was the most impactful part. Instead of generic placeholders, FormPilot could potentially pull my actual bio or project details from the content.txt file when filling out a relevant field. This moves beyond simple auto-fill to context-aware generation, making the tool significantly more useful. The quality depends heavily on the relevance of the data source and the effectiveness of the embedding model and retrieval parameters.

2. LangChain4j Simplifies Java LLM Integration

LangChain4j, especially with its Spring Boot starter, abstracts away much of the complexity of interacting with LLMs and integrating RAG. The @AiService annotation is incredibly powerful, allowing a declarative approach that keeps the business logic clean and focuses on the desired interaction rather than the plumbing.

3. Local LLMs (Ollama) Offer Privacy \& Control

Using Ollama provided complete control over the model and ensured user data privacy, which is paramount when dealing with potentially sensitive form information. It also decouples the application from reliance on third-party API keys and costs, although it requires users to have sufficient hardware resources. The ease of switching models locally (just change application.properties and run ollama run ) is also a plus for experimentation.

4. Chrome Extension Development

Building the Chrome extension highlighted the importance of understanding the different script contexts (content, background, popup), their capabilities, and limitations (especially around DOM access and network requests). Proper event handling (triggerEvents) is essential for compatibility with modern web applications. Careful consideration of permissions and security (like CORS) is mandatory.

Conclusion

This a demonstration of how modern AI techniques like RAG can be combined with frameworks like LangChain4j and local LLMs via Ollama to create genuinely intelligent tools within the familiar Java ecosystem. It moves beyond simple automation to provide context-aware assistance for repetitive tasks like filling forms.

The journey highlighted the power of abstraction provided by libraries like LangChain4j and the increasing accessibility of running powerful LLMs locally. While challenges exist, particularly in reliably interacting with the diverse landscape of web forms, the potential for AI-driven productivity tools like FormPilot is immense.

The combination of a browser extension frontend and a local AI backend offers a compelling architecture for building privacy-preserving, intelligent applications.

If you’re interested in exploring the code further or contributing, you can find the project on GitHub form-pilot.

Feel free to fork it, experiment, and adapt it. I’d love to hear your feedback and suggestions!

Happy (and smarter) form filling!

Type your email… {#subscribe-email}

A Journey to DevNexus: Challenges, Friendships, and Java

2025-03-08T00:00:00+00:00

A Journey to DevNexus: Challenges, Friendships, and Java

I love participating in conferences—you get to meet so many industry and thought leaders in person, hang out with them, and build lasting bonds. That’s why I had always wanted to attend DevNexus, which isn’t far from where I live. Despite receiving an invitation, my trip kept getting delayed due to visa complications.

But finally, it happened. I was heading to Atlanta for DevNexus, one of the biggest Java conferences in this region.

The excitement was real, but of course, my journey had to come with a few unexpected twists.

A Rough Start at Pearson

When I arrived at Pearson Airport, I expected a smooth check-in and immigration process. Last year, I visited London, UK, and then Copenhagen; it was a super smooth experience with my new blue passport.

However, that didn’t happen. After passing through security, I was surprised to learn that US immigration takes place here in Pearson. After taking my fingerprints, I was asked some basic questions, such as where I was going, what the purpose of the travel was, how long I was staying, how much money I was carrying, etc., which I thought were standard questions.

However, I was pulled aside for additional questioning, asked to provide detailed information about my travel history, and had my bags thoroughly checked. I had to wait for an hour and a half before being called in, and the questioning itself took another half-hour. The whole process took so long that I missed my flight.

The airport is 40 km from home, and I was fasting. I couldn’t break my fast during the interrogation, and the back-and-forth journey, combined with a lack of sleep, left me exhausted and stressed.

Frustrated but with no other option, I returned home, spent a few hours there, and headed back to the airport at 4 a.m. the next morning. I was worried about the fact that now I would have to go through immigration again; what if they treated me the same way?

However, there were no issues this time—just a long sigh of relief as I finally boarded my flight to Atlanta.

When I arrived at the hotel, I started feeling a strong, burning sensation in my chest. I took a hot shower, which provided some relief, but the pain didn’t completely go away.

First Impressions of DevNexus

Walking into DevNexus felt incredible. It was even bigger than I had imagined, buzzing with energy and filled with familiar faces from the Java community.

One of the first people I met was Mike—Michael Redlich, InfoQ Java Queue Editor Lead. We had been working together remotely for a long time, but meeting face-to-face brought a different energy to our discussions. He is a very kind and cheerful person and got very excited seeing me in person.

Then I met my co-speaker, Shaaf, a longtime friend. Although we talk almost daily, I was eagerly looking forward to meeting him again in person.

We sat at the speaker lounge and planned our session together.

Although the pain in my chest was intense, practically unbearable, we decided to go ahead with our talk. Shaaf assured me that if I wasn’t able to present, he would take over. Many thanks to him; he supported me all the way in these two days in the conference.

The next day, I met the kind, brilliant Geertjan Wielenga, the community champion. We’ve known each other for several years now, and both contribute to Foojay. He gave me a tour of the entire conference, stopping by each booth. It was amusing to watch him introduce me to people—though many of whom I already knew online. Meeting them in person, however, was a completely different experience.

Then there was Bruno Souza, my mentor for a long time. The moment he saw me, he pulled me into a big hug and shouted, “Bazlur! How are you, man? You finally made it!” That moment alone made the trip feel worthwhile. Bruno is a very kind-hearted, incredibly warm, and exceptionally friendly person who always makes you feel special and valued. I just love him.

I also met Scott Wierschem. We’ve known each other for over seven years, but this was our first time meeting in real life. And then there were Ondro, Simon Martinelli, Andres Almiray, Steve Poole, Kito Mann, Kenneth Kousen, Elder Moraes, Frank Greco, and many more—each an incredible contributor to the Java ecosystem. Meeting them was an absolute privilege.

My Talk: Java + LLMs

The big moment for me was my talk: “Java + LLMs: A Hands-on Guide to Building LLM Apps in Java with Jakarta EE.”

The room was quite full, with a lot of happy attendees.

We went through our presentation, showcasing live demos and explaining how Java developers can integrate LLMs into their applications. Shaaf and I did a great job engaging with the audience for an hour, or at least I’d like to think so.

Slides:

After the talk, several people came up to share their thoughts and feedback. Some had more questions, while others simply appreciated the session.

In Closing..

One of the highlights of the conference was getting interviewed by Ari Waller for the Fika AI Interviews at DevNexus.We had a good conversation about Java, AI, and the evolving landscape of development. I’m really looking forward to seeing the interview go live!

To be honest, it wasn’t all smooth sailing. I had some unexpected health issues because of the stress I went through, which made things more challenging.

Despite the hurdles, the experience was incredible—meeting so many amazing people, delivering a successful talk, and being part of such an exciting event. Thanks to Pratik Patel and Vincent Mayers for arranging such an incredible conference.

As DevNexus wrapped up, I headed to the airport with my friend Shaaf. We shared one last iftar, prayed together, had our little chit-chat, and then said our goodbyes.

By 2 AM, I was finally back home in Toronto, greeted by my lovely wife, Tabassum—a warm and familiar welcome after a whirlwind trip.

Type your email… {#subscribe-email}

JetBrains Junie: My Firsthand Experience

2025-03-03T00:00:00+00:00

JetBrains Junie: My Firsthand Experience

LLMs are taking over the world—almost literally. Every day, we hear about new models popping up, pushing the boundaries of what AI can do. While some worry about it, others welcome it with open arms. The truth is that AI will inevitably take over certain tasks, much like cars replaced by horse-drawn carriages. But that doesn’t mean we’ll all be useless. Instead, it means we’ll have tools that enhance our productivity and free us from mundane tasks.

As a developer, I’m always on the lookout for anything that can streamline my workflow—something that automates repetitive coding tasks and reduces the time I spend on things that don’t necessarily require deep thinking. Let’s be honest: not every line of code we write is groundbreaking. A lot of it is boilerplate. And AI is proving to be a fantastic assistant in handling those parts.

I’ve been experimenting with several AI tools lately—ChatGPT, Gemini, Anthropic’s Claude, and many more. I also have access to GitHub Copilot and JetBrains AI Assistant. Both are excellent for auto-completing code and offering contextual suggestions. You can highlight a chunk of code and ask for improvements, especially in UI design. But they’re not fully autonomous.

But what if we had an AI agent that could autonomously take on entire tasks? One that I could point to my project and instruct to implement a feature. The agent would not only generate the code but also run tests while I brew my coffee.

Well, that’s exactly what I discovered with JetBrains’ new AI tool—Junie EAP.

I’ve been using it for a while now, and in this post, I want to share two specific use cases where it significantly enhanced my workflow.

Building a Ramadan Awareness Website

As Ramadan approached, I realized that while Muslims worldwide observe the month-long fast, many people—especially in the West—are still unfamiliar with its significance. That got me thinking: why not build a website to raise awareness?

Now, building a website from scratch is no small feat, especially for someone who isn’t a front-end expert. So, I turned to Junie. I already had all the content written in thecontent.md file, and I gave it this prompt something like the following, not exactly, as I don’t remember:

“Create a website about Ramadan to help my non-Muslim friends understand its significance. The content is in the content.md file—read it and use it for the site. The design should reflect Islamic art and culture, be responsive, visually appealing, and built as a single-page application using React.”

Of course, Junie didn’t complete the whole thing in one go. It required multiple follow-ups, refinements, and tweaks. However, it did an excellent job laying out the structure, creating reusable components, and producing a functional website. What would have taken me a week or more to build manually, I completed in a single day.

That’s a huge time saver.

Checkout the website: https://ramadan-facts.onrender.com/

You can scroll to see the whole website.

Of course, it wasn’t without hiccups. At one point, a loading dialogue made the entire page disappear. As someone who isn’t deeply familiar with CSS quirks, I struggled to fix it, and Junie wasn’t able to troubleshoot it effectively. But despite these minor setbacks, the tool proved invaluable.

Preparing for DevNexus

I’ll be speaking at DevNexus this week. For my demo, I built multiple projects that showcased step-by-step implementations with LangChain4j and Jakarta EE.

While I’m comfortable with backend development, frontend work is another story. I wanted to build a chatbot interface with a bit of animations and configure the LLM parameters via the UI. So I wouldn’t have to modify the code whenever I wanted to adjust settings.

Again, Junie came to the rescue. It helped me craft a sleek UI using Jakarta Faces with some advanced CSS tricks. The result? A fully functional chatbot interface with dynamic configuration settings—all done in a fraction of the time it would have taken me otherwise.

Also, I wanted to update the README. Well, who wants to write it manually when you have Junie? So I gave it this instruction:

“Can you scan the repository, identify key components, and update the README, including setup instructions, usage details, and any relevant documentation? Also, include a screenshot from the images folder in the README.”

It did an excellent job. Although I wanted all the screenshots, I asked to add one screenshot to the prompt, but it understood.

Checkout: https://github.com/rokon12/llm-jakarta

In Closing…

It turns out these tools, like Junie, can automate many coding tasks, create tests, and run them autonomously, and when they fail, it goes on to check the error, fix it, and run them again, but they still need human oversight. They struggle with complex issues, and when they hit a roadblock, I must step in.

The key takeaway? Don’t rely on AI mindlessly. Instead, use it as a powerful assistant to handle the tedious parts so you can focus on the creative, high-value aspects of your work.

And that, to me, is an exciting future.

A Note of Thanks

A huge thank you to JetBrains for allowing me to access Junie EAP. Those who are interested, check this page: https://www.jetbrains.com/junie/

Type your email… {#subscribe-email}

DIY JVM Part 1: Decoding the Magic – Parsing Java

2025-02-21T00:00:00+00:00

DIY JVM Part 1: Decoding the Magic – Parsing Java

It’s been about 15 years since I last touched C programming. Back in university, I remember writing a small C program for an assignment—just a simple parser that could scan a C source file and check if its structures were correctly formed. It was nothing fancy, just a mundane academic exercise, and to be honest, I never thought much about it afterward. That was the last time I wrote anything significant in C.

Since then, my career has been all about Java and JVM-based languages. C had faded into the background, a relic of my early programming days. But recently, I found myself feeling a bit nostalgic. C is often called the “mother of all programming languages,” and I started wondering—what would it be like to revisit it now, with all the knowledge and experience I’ve accumulated?

Of course, jumping back into C just for the sake of it didn’t make much sense. I wasn’t looking to relearn pointers and memory management from scratch. I wanted a project—something meaningful, something that would bridge my JVM expertise with this old friend from my past. And then it hit me: what if I built a barebones JVM in C? Nothing too ambitious—just a minimal interpreter that could read and execute Java class files. That way, I could explore the internals of the JVM while also refreshing my C skills.

And so, this DIY experiment began.

Understanding Java .class Files

When you compile a Java program, it produces a .class file—essentially a binary representation of Java bytecode. This file follows a strict format defined in the Java Virtual Machine Specification. At its core, it’s a structured stream of bytes, each part serving a specific purpose. If you’ve ever wondered what makes a .class file valid, here’s a quick breakdown:

Magic Number (4 bytes) : Every valid .class file starts with 0xCAFEBABE (big-endian format). If this signature doesn’t match, the JVM rejects the file.
Minor \& Major Version (2 + 2 bytes) : Indicates the Java version the file was compiled for. Java 8, for instance, has a major version of 52.
Constant Pool Count (2 bytes) : This tells us how many entries exist in the constant pool—a table of constants (strings, numbers, method references, etc.) used throughout the class file.
Constant Pool (variable size) : This is where the JVM stores symbolic references to class names, method names, field names, and other constants.
Access Flags (2 bytes) : Specifies whether the class is public, final, abstract, or an interface.
This Class (2 bytes) : A reference to the constant pool entry that holds the fully qualified name of the class.
Super Class (2 bytes) : Similar to This Class, but for the superclass. If the class extends another class, this field points to it; otherwise, it’s java.lang.Object.
Interfaces, Fields, Methods, and Attributes : These define what the class implements, the variables it declares, the methods it contains, and additional metadata like annotations and debugging information.

For example, if we take a hexdump of a .class file, it will look like this:

As you can see, it is just a stream of bytes, and the first 4 bytes are the magic number, cafe babe.

We don’t need to implement everything for our minimal JVM—just enough to parse and understand a .class file.

A Quick C Refresher

Before diving into the implementation, let’s quickly touch on some fundamental C concepts we’ll use in our parser.

Preprocessor Directives

In C, preprocessor directives (which begin with #) are handled before actual compilation. Common ones include:

#include 
#include 
#include 

#define MAX_CONSTANT_POOL_SIZE 32767

#include : Includes the standard I/O functions like printf and fopen.
#include : Includes general utilities like malloc/calloc and free.
#define MAX_SIZE 32767 : Defines a macro, replacing occurrences of MAX_SIZE with 32767 before compilation.

Structs and Header Files

To keep things clean, we’ll separate our code into header files (.h ) and source files (.c ). Our header file (diyjvm.h ) will contain:

typedef struct {
    uint32_t magic;
    uint16_t minor_version;
    uint16_t major_version;
    uint16_t constant_pool_count;
    cp_info *constant_pool;

    uint16_t access_flags;
    uint16_t this_class;
    uint16_t super_class;
    uint16_t interfaces_count;

    uint16_t fields_count;  // We'll skip storing the fields themselves for now
    uint16_t methods_count;
    method_info *methods;
} ClassFile;

We define data structures using struct and union keywords in C. It’s like class, but I guess it’s less classy or more!

It will also define constants like:

#define JAVA_MAGIC 0xCAFEBABE
#define MAX_STRING_LENGTH 65535

// Constant pool tags
#define CONSTANT_Class               7
#define CONSTANT_Fieldref            9
#define CONSTANT_Methodref           10
#define CONSTANT_InterfaceMethodref  11
#define CONSTANT_String              8
#define CONSTANT_Integer             3
#define CONSTANT_Float               4
#define CONSTANT_Long                5
#define CONSTANT_Double              6
#define CONSTANT_NameAndType         12
#define CONSTANT_Utf8                1

And function prototypes:

ClassFile *read_class_file(const char *filename);
void free_class_file(ClassFile *cf);

To make debugging easier, we use the following macro:

#define DEBUG_PRINT(fmt, ...)                                \
    do {                                                     \
        if (debug_mode) {                                    \
            fprintf(stderr, "[DEBUG] " fmt, ##__VA_ARGS__);  \
        }                                                    \
    } while (0)

This macro allows us to print debug messages when debug_mode is enabled. __VA_ARGS__ is a preprocessor macro that allows you to create variadic functions in C. Variadic functions are functions that can accept a variable number of arguments. If debug_mode is false, the macro does nothing.

By including diyjvm.h in our source files, we ensure we have access to these definitions.

Memory Management in C

Another important part of C is that once you allocate memory, it’s your responsibility to clean it up—no garbage collector here. Forgetting to free memory can lead to leaks, which may slow down or crash your program over time.

A macro like this helps handle free memory:

#define SAFE_FREE(p)            \
    do {                        \
        if ((p) != NULL) {      \
            free(p);            \
            (p) = NULL;         \
        }                       \
    } while (0)

Handling Endianness in .class File Parsing

One critical aspect of parsing Java class files in C is handling endianness. Java class files use big-endian format, meaning the most significant byte comes first. However, many modern systems, particularly x86-based machines, use little-endian format, where the least significant byte comes first. If we were to read multi-byte values directly, they would be interpreted incorrectly on little-endian systems.

To handle this, I need to write some function as follows:

static uint32_t read_u4(FILE *fp, bool *ok) {
    uint32_t value = 0;
    if (!safe_fread(&value, 4, 1, fp)) {
        *ok = false;
        return 0;
    }
    return __builtin_bswap32(value); // Convert from big-endian
}

he read_u4 function reads four bytes from the file and stores them in value. However, since value is stored in the system’s native endianness, we need to ensure that it is properly converted. The __builtin_bswap32 function swaps the byte order, converting from big-endian to little-endian where necessary. This ensures that the parsed values are correctly interpreted regardless of the underlying architecture.

Parsing a `.class` File

The read_class_file function is responsible for reading a Java class file and extracting its structural elements. It performs the following tasks:

Opens the specified class file for reading.
Reads and verifies the magic number (0xCAFEBABE).
Extracts the minor and major version numbers, ensuring they are within the supported range.
Parses the constant pool, allocating memory dynamically for its entries.
Reads access flags, class references, and superclass details.
Extracts interface and field metadata.
Skips over field and method attributes, ensuring the correct structure is maintained.
Allocates space for methods and parses their details, including code attributes.

Parts of this function are here:

ClassFile *read_class_file(const char *filename) {
    DEBUG_PRINT("Opening class file: %s\n", filename);

    FILE *file = fopen(filename, "rb");
    if (!file) {
        char error_msg[256];
        snprintf(error_msg, sizeof(error_msg), "Failed to open class file '%s'.", filename);
        ERROR_AND_CLEANUP(error_msg, { /* no cleanup needed here */ });
    }

    bool ok = true;
    ClassFile *cf = malloc(sizeof(ClassFile));
    if (!cf) {
        ERROR_AND_CLEANUP("Out of memory allocating ClassFile.", {
            fclose(file);
        });
    }
    memset(cf, 0, sizeof(*cf)); // zero out structure

    // Read magic
    cf->magic = read_u4(file, &ok);
    DEBUG_PRINT("Read magic number: 0x%08X\n", cf->magic);
    if (!ok || cf->magic != JAVA_MAGIC) {
        char error_msg[256];
        snprintf(error_msg, sizeof(error_msg),
                 "Invalid or missing magic number in '%s'.", filename);
        ERROR_AND_CLEANUP(error_msg, {
            free_class_file(cf);
            fclose(file);
        });
    }
    DEBUG_PRINT("Magic number verified successfully\n");

    // Read minor/major version
    cf->minor_version = read_u2(file, &ok);
    cf->major_version = read_u2(file, &ok);
    if (!ok) {
        ERROR_AND_CLEANUP("Could not read version numbers.", {
            free_class_file(cf);
            fclose(file);
        });
    }

    if (cf->major_version < 45 || cf->major_version > 69) {
        ERROR_AND_CLEANUP("Unsupported class file version.", {
            free_class_file(cf);
            fclose(file);
        });
    }

    // Read constant pool count
    cf->constant_pool_count = read_u2(file, &ok);
    DEBUG_PRINT("Constant pool count: %d\n", cf->constant_pool_count);
    if (!ok || cf->constant_pool_count > MAX_CONSTANT_POOL_SIZE) {
        ERROR_AND_CLEANUP("Invalid constant pool count.", {
            free_class_file(cf);
            fclose(file);
        });
    }

    cf->constant_pool = (cp_info *) calloc(cf->constant_pool_count, sizeof(cp_info));
    if (!cf->constant_pool) {
        ERROR_AND_CLEANUP("Out of memory allocating constant pool.", {
            free_class_file(cf);
            fclose(file);
        });
    }

....
.....
}

Project Structure

Our project follows a simple layout:

diyjvm/
├── include/
│   └── diyjvm.h
├── src/
│   └── main.c
└── test/
    └── HelloWorld.class

include/ : Holds header files.
src/ : Contains source code.
test/ : Includes sample .class files for testing.

Source Code:

The GitHub repository can be found here: https://github.com/rokon12/diyjvm/

Compiling and Running

To compile the program:

gcc -DDEBUG -Wall -Wextra -I./include src/main.c -o diyjvm

Then, to run it:

./diyjvm test/HelloWorld.class

This should print something like:

Class file: test/HelloWorld.class
Magic: 0xCAFEBABE
Version: 65.0
Constant pool entries: 29
Methods: 2

For debugging:

./diyjvm -d test/HelloWorld.class

This will output more detailed logs about how the file is being parsed.

[DEBUG] Initializing diyJVM...
[DEBUG] Opening class file: test/HelloWorld.class
[DEBUG] Read magic number: 0xCAFEBABE
[DEBUG] Magic number verified successfully
[DEBUG] Constant pool count: 29
[DEBUG] Reading constant pool entry with tag: 10
[DEBUG] Reading constant pool entry with tag: 7
[DEBUG] Reading constant pool entry with tag: 12
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 9
[DEBUG] Reading constant pool entry with tag: 7
[DEBUG] Reading constant pool entry with tag: 12
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 8
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 10
[DEBUG] Reading constant pool entry with tag: 7
[DEBUG] Reading constant pool entry with tag: 12
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 7
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Reading constant pool entry with tag: 1
[DEBUG] Methods count: 2
[DEBUG] Method[0]: access=0x0001, name_index=5, desc_index=6, attr_count=1
[DEBUG]  -> Found Code attribute
[DEBUG] Method[0], Code attribute, Sub-attribute 0: name_index=24, length=6
[DEBUG] Method[1]: access=0x0009, name_index=25, desc_index=26, attr_count=1
[DEBUG]  -> Found Code attribute
[DEBUG] Method[1], Code attribute, Sub-attribute 0: name_index=24, length=10
Class file: test/HelloWorld.class
Magic: 0xCAFEBABE
Version: 65.0
Constant pool entries: 29
Methods: 2
[DEBUG] Cleaning up diyJVM...

Wrapping Up

This is just the beginning. In this first part, we’ve laid the groundwork—understanding Java class files, refreshing some C fundamentals, and setting up our project. The real fun begins as we start parsing these files in earnest.

Next time, we’ll get our hands dirty with more features. Stay tuned!

Type your email… {#subscribe-email}

SummarizingTokenWindowChatMemory: Enhancing LLM’s Conversations with Efficient Summarization

2025-02-14T00:00:00+00:00

SummarizingTokenWindowChatMemory: Enhancing LLM’s Conversations with Efficient Summarization

LLM chat models have become an integral part of many applications today. We are all experimenting and exploring the best ways to utilize them effectively. For Java developers, LangChain4j has been an incredible tool in this journey.

By design, most available APIs, such as ChatGPT and Gemini, operate in a fire-and-forget mode. They don’t retain previous interactions, meaning every request is treated as a completely new one. However, for a smooth and engaging conversation, maintaining context is crucial—otherwise, the interaction becomes disjointed and frustrating. The common solution is to pass previous messages along with each new prompt, allowing the LLM to infer context. This is where the chat memory concept comes in.

That said, every conversation has a context window , which limits the maximum number of tokens that can be processed. We can’t pass an unlimited number of tokens in a prompt. As conversations grow longer, they consume more resources, making it necessary to find efficient ways to manage and evict older messages.

LangChain4j already provides two types of memory eviction policies:

MessageWindowChatMemory – This approach keeps track of a fixed number of recent messages, discarding the oldest ones when the limit is reached.
TokenWindowChatMemory – Instead of counting messages, this method monitors the total number of tokens used and removes older messages when the token count exceeds a predefined threshold.

These memory policies help optimize resource usage while ensuring conversations remain coherent and user-friendly.

While both approaches are effective, I wondered why not add a third approach —one that summarizes old messages rather than removing them. This idea led me to experiment and eventually implement SummarizingTokenWindowChatMemory .

SummarizingTokenWindowChatMemory

The core idea behind SummarizingTokenWindowChatMemory is straightforward:

Monitor Chat Tokens: The system tracks the number of tokens used in a conversation.
Trigger Summarization: The summarization process activates when the token count reaches a predefined threshold.
Generate a Summary: Instead of storing full conversation logs, a summarizer condenses key information into a succinct, meaningful summary.
Replace Old Messages: The summarized content replaces older messages, ensuring the conversation remains within token limits while maintaining continuity.

Let’s look at the code:

package ca.bazlur.chefbot.ai;

import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.internal.ValidationUtils;
import dev.langchain4j.memory.ChatMemory;
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.store.memory.chat.ChatMemoryStore;
import dev.langchain4j.store.memory.chat.InMemoryChatMemoryStore;
import lombok.extern.slf4j.Slf4j;

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

@Slf4j
public class SummarizingTokenWindowChatMemory implements ChatMemory {

    private final Object id;
    private final int maxTokens;
    private final Tokenizer tokenizer;
    private final ChatMemoryStore store;
    private final Summarizer summarizer;

    private SummarizingTokenWindowChatMemory(Builder builder) {
        this.id = ValidationUtils.ensureNotNull(builder.id, "id");
        this.maxTokens = ValidationUtils.ensureGreaterThanZero(builder.maxTokens, "maxTokens");
        this.tokenizer = ValidationUtils.ensureNotNull(builder.tokenizer, "tokenizer");
        this.store = ValidationUtils.ensureNotNull(builder.store, "store");
        this.summarizer = ValidationUtils.ensureNotNull(builder.summarizer, "summarizer");
    }

    @Override
    public Object id() {
        return id;
    }

    @Override
    public void add(ChatMessage message) {
        // Pull existing messages from store
        List messages = new ArrayList<>(store.getMessages(id));

        // If it's a system message, handle "replace existing system message" logic
        if (message instanceof SystemMessage) {
            Optional maybeSystem = findSystemMessage(messages);
            if (maybeSystem.isPresent()) {
                if (maybeSystem.get().equals(message)) {
                    // Same system message, do nothing
                    return;
                } else {
                    // Remove old system message so we can replace with new one
                    messages.remove(maybeSystem.get());
                }
            }
        }

        // Add the new message
        messages.add(message);

        // Enforce capacity by summarizing older messages if needed
        ensureSummarizedCapacity(messages);

        // Update store
        store.updateMessages(id, messages);
    }

    @Override
    public List messages() {
        // Return a copy of messages from store
        List messages = new ArrayList<>(store.getMessages(id));

        // (Optional) ensure capacity here again, if you want to guarantee it every time
        ensureSummarizedCapacity(messages);
        return messages;
    }

    @Override
    public void clear() {
        store.deleteMessages(id);
    }

    private void ensureSummarizedCapacity(List messages) {
        int currentTokenCount = tokenizer.estimateTokenCountInMessages(messages);
        if (currentTokenCount <= maxTokens) {
            return; // We are within capacity
        }

        // If we exceed tokens, let's summarize the older messages (except system msg & possibly the newest).
        // 1) Separate out the system message if present at index 0.
        // 2) Summarize everything from startIndex...up to near the end,
        //    leaving maybe the last user or assistant message "unsummarized" for context.
        // 3) Insert the summary as a single message, then re-check capacity.

        // First, handle any system message
        Optional maybeSystem = findSystemMessage(messages);
        maybeSystem.ifPresent(messages::remove);

        // Now we can work with the non-system messages
        int startIndex = 0;
        int endIndex = messages.size() - 1; // Leave the last message for context

        // Don't try to summarize if we have 2 or fewer messages
        if (endIndex - startIndex <= 1) {
            // If we can't summarize, fall back to just removing oldest messages
            removeOldestUntilFit(messages);
            // Re-add system message if we had one
            maybeSystem.ifPresent(messages::addFirst);
            return;
        }

        // Get the messages to summarize (everything except maybe system & last)
        List toSummarize = new ArrayList<>(messages.subList(startIndex, endIndex));

        // Generate the summary
        String summary = summarizer.summarize(toSummarize);

        // Replace the summarized messages with the summary
        messages.subList(startIndex, endIndex).clear();
        messages.add(startIndex, SystemMessage.from("Previous conversation summary: " + summary));

        // Re-add system message if we had one
        maybeSystem.ifPresent(messages::addFirst);

        // If we're still over capacity, remove oldest messages (after any system message)
        if (tokenizer.estimateTokenCountInMessages(messages) > maxTokens) {
            removeOldestUntilFit(messages);
        }
    }

    private void removeOldestUntilFit(List messages) {
        // Keep system message if present
        Optional maybeSystem = findSystemMessage(messages);
        maybeSystem.ifPresent(messages::remove);

        // Remove oldest messages until we're under token limit
        while (!messages.isEmpty() && tokenizer.estimateTokenCountInMessages(messages) > maxTokens) {
            messages.removeFirst();
        }

        // Re-add system message if we had one
        maybeSystem.ifPresent(messages::addFirst);
    }

    private Optional findSystemMessage(List messages) {
        if (!messages.isEmpty() && messages.getFirst() instanceof SystemMessage) {
            return Optional.of((SystemMessage) messages.getFirst());
        }
        return Optional.empty();
    }

    public static Builder builder() {
        return new Builder();
    }

    public static class Builder {
        private Object id;
        private Integer maxTokens;
        private Tokenizer tokenizer;
        private ChatMemoryStore store = new InMemoryChatMemoryStore();
        private Summarizer summarizer;

        public Builder id(Object id) {
            this.id = id;
            return this;
        }

        public Builder maxTokens(Integer maxTokens, Tokenizer tokenizer) {
            this.maxTokens = maxTokens;
            this.tokenizer = tokenizer;
            return this;
        }

        public Builder chatMemoryStore(ChatMemoryStore store) {
            this.store = store;
            return this;
        }

        public Builder summarizer(Summarizer summarizer) {
            this.summarizer = summarizer;
            return this;
        }

        public SummarizingTokenWindowChatMemory build() {
            return new SummarizingTokenWindowChatMemory(this);
        }
    }
}

The SummarizingTokenWindowChatMemory class manages chat messages, ensuring they stay within a specified token limit by summarizing older messages when necessary. It uses an OpenAiTokenizer to estimate token counts and a Summarizer to generate summaries. The class also handles system messages separately, preserving them during summarization. The `Builder` class provides a convenient way to construct SummarizingTokenWindowChatMemory instances with the required dependencies.

Now, let’s work on the Sumerizer.

The Summarizer Interface

At its core, OpenAILLMSummarizer implements the Summarizer interface:

import dev.langchain4j.data.message.ChatMessage;

import java.util.List;

public interface Summarizer {
    String summarize(List messages);
}

This provides a contract for any summarization strategy.

Summarization Logic

The main implementation, OpenAILLMSummarizer , is responsible for processing chat history and condensing it into a summary:

package ca.bazlur.chefbot.ai;

import dev.langchain4j.data.message.AiMessage;
import dev.langchain4j.data.message.ChatMessage;
import dev.langchain4j.data.message.SystemMessage;
import dev.langchain4j.data.message.UserMessage;
import dev.langchain4j.model.openai.OpenAiChatModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.V;

import java.util.List;

public class OpenAILLMSummarizer implements Summarizer {

    private final int desiredTokenLimit;
    private final SummarizerAssistant assistant;

    public OpenAILLMSummarizer(OpenAiChatModel openAiChatModel, int desiredTokenLimit) {
        this.desiredTokenLimit = desiredTokenLimit;
        assistant = AiServices.builder(SummarizerAssistant.class)
                .chatLanguageModel(openAiChatModel)
                .build();
    }

    @Override
    public String summarize(List messages) {
        StringBuilder promptBuilder = new StringBuilder("Summarize the following conversation: \n");
        for (ChatMessage msg : messages) {
            if (msg instanceof UserMessage) {
                promptBuilder.append("User: ").append(((UserMessage) msg).contents()).append("\n");
            } else if (msg instanceof AiMessage) {
                promptBuilder.append("Assistant: ").append(((AiMessage) msg).text()).append("\n");
            } else if (msg instanceof SystemMessage) {
                promptBuilder.append("System: ").append(((SystemMessage) msg).text()).append("\n");
            }
        }

        String summary = assistant.summarize(promptBuilder.toString(), desiredTokenLimit);
        return summary.trim();
    }

}

Constructs a structured prompt containing key user, AI, and system messages.
Invokes the SummarizerAssistant , which interacts with OpenAI’s language model to generate the summary.

LLM Summarization

The SummarizerAssistant provides an AI-driven summarization method:

interface SummarizerAssistant {

    @dev.langchain4j.service.UserMessage("""
    You are a helpful assistant summarizing past conversation turns for a chatbot.

    Your goal is to create a concise and informative summary of the provided conversation history, 
    focusing on key information relevant to continuing the conversation.  
    Pay close attention to user preferences, requests, and any decisions they have made.  
    Also note any specific topics or tasks discussed.  Be sure to retain information that might be needed 
    to fulfill a user request or provide a relevant response.

    The summary should be under {desiredTokenLimit} tokens and written in a clear, natural language style.  
    Avoid simply listing the turns.  Instead, synthesize the information into a coherent narrative.

    Conversation History:
    {message}
    """)
    String summarize(@V("message") String messages, @V("desiredTokenLimit") int desiredTokenLimit);
}

This mechanism:

Calls an LLM model (e.g., OpenAI’s GPT) to generate concise summaries.
Ensures that the summarized conversation remains within a predefined token limit.

Why This Matters

The SummarizingTokenWindowChatMemory approach effectively manages long conversations while staying within the constraints of an LLM’s context window. The chatbot can retain essential details by summarizing older exchanges, ensuring coherence without exceeding token limits. This results in a more fluid and engaging user experience, where the chatbot can “remember” and refer back to past interactions naturally. Additionally, it optimizes efficiency by reducing the amount of text processed with each request, potentially leading to faster response times and lower costs. This makes it particularly valuable when maintaining conversational history, which is crucial for customer support, complex problem-solving, or multi-turn discussions.

That said, this approach isn’t without its challenges. The very nature of summarization means that some details—especially infrequent ones—might get lost in the process. This could occasionally impact the chatbot’s ability to deliver the most precise or personalized responses. Moreover, the summarization step itself adds computational overhead, and its effectiveness heavily depends on the prompt design and the capabilities of the underlying LLM. Another potential issue is contextual drift —over multiple rounds of summarization, the chatbot’s understanding of the conversation could gradually shift away from its original meaning. These trade-offs must be carefully weighed when deciding whether this approach is the right fit for a given application.

Conclusion

To test its functionality, I created a simple chatbot that suggests recipes. The chatbot will ask you a few questions and recommend a recipe based on your responses. As the conversation progresses, the chatbot will start summarizing older messages to retain essential context while staying within the allowed token threshold if the token limit is exceeded.

The source code is here: https://github.com/rokon12/chefbot

Type your email… {#subscribe-email}

Bazlur’s Blog Archive

Building Robust AI Applications with LangChain4j Guardrails and Spring Boot

Building Robust AI Applications with LangChain4j Guardrails and Spring Boot

Understanding LangChain4j Guardrails

Setting Up a Spring Boot Project with LangChain4j

Implementing Input Guardrails

Content Safety Input Guardrail

Smart Context-Aware Guardrail

Intelligent Input Sanitizer

Implementing Output Guardrails

Professional Tone Output Guardrail

Hallucination Detection Guardrail

Testing Your Guardrails

Creating AI Services with Guardrails

Rest endpoint

Demo

Conclusion

Java’s Structured Concurrency: Finally Finding Its Footing

Java’s Structured Concurrency: Finally Finding Its Footing

What Actually Changed This Time

The Core Concept Remains Strong

The headline change: static factory methods

Joiners: pick your success policy

Rolling your own Joiner

Better cancellation and deadlines

Scoped values ride along

Guard-rails against misuse

Observability improvements

Some more examples to try out

Example 1 – 360° Product View (Gather–Then–Fail)

Example 2 – “Race the Mirrors” File Downloader

Example 3 – Batched Thumbnail Generator with Nested Scopes

Example 4 – Real-Time Quote Service with Timed Fallback

Final thoughts

Speaking at GeeCON 2025: A Memorable Kraków Experience

Speaking at GeeCON 2025: A Memorable Kraków Experience

Java + LLMs + LangChain4j — 2025 Talk Series

Java + LLMs + LangChain4j — 2025 Talk Series

Why we’re doing this

What the session covers

Try the code

Chat with Your Knowledge Base: A Hands-On Java & LangChain4j Guide

Chat with Your Knowledge Base: A Hands-On Java \& LangChain4j Guide

What is Retrieval-Augmented Generation (RAG)?

Why LangChain4j?

The Scenario: Querying Operational Knowledge

Prerequisites

Step 1: Project Setup (Maven)

Step 2: Creating the Knowledge Base Files

Step 3: Ingesting the Knowledge (Building the RAG Pipeline)

Step 4: Building the Chat Interface (AiService)

Step 5: Running and Testing

Conclusion

Building FormPilot: My Journey Creating an AI-Powered Form Filler with RAG, LangChain4j, and Ollama

Building FormPilot: My Journey Creating an AI-Powered Form Filler with RAG, LangChain4j, and Ollama

The Inspiration

The Architecture

Getting Started: Setting Up Your Environment

Part 1: Installing and Running Ollama Locally

Part 2: Creating the Spring Boot Project via Spring Initializr

Implementing RAG with LangChain4j

The Magic of LangChain4j’s @AiService

Integrating with Ollama

Building the Chrome Extension

Setting up the Spring Boot Server

Setting up the Chrome Extension

Using the Demo Form

If you have Python installed

Testing with a Demo Form

Challenges and Solutions

Challenge 1: Robust Form Field Detection \& Label Association

Challenge 2: Handling Diverse Field Types Correctly

Challenge 3: Triggering JavaScript Events for Framework Compatibility

Challenge 4: Backend Communication \& CORS

Challenge 5: Prompt Engineering \& Model Compliance

Lessons Learned

1. The Power of RAG for Contextual Personalization

2. LangChain4j Simplifies Java LLM Integration

3. Local LLMs (Ollama) Offer Privacy \& Control

4. Chrome Extension Development

Java + LLMs  + LangChain4j — 2025 Talk Series

Java + LLMs  + LangChain4j — 2025 Talk Series

Parsing a `.class` File