Chapter 2: Javadoc Test Parser
In the previous chapter, we saw how the jdoc-test
Gradle plugins act like a foreman, automating the process of running tests embedded in Javadocs. But how does the foreman actually find the instructions (the test code) written inside those comments? It needs a specialized tool to read and understand them.
The Problem: Finding Needles in a Haystack
Imagine your Java code is like a big library full of books (your .java
files). Your Javadoc comments are like notes written inside these books. Some of these notes are just explanations, while others contain special instructions – your tests!
How can we automatically find only the test instructions and ignore the rest? We need a way to tell our tool: “Search through all these Java files, look inside the Javadoc comments (/** ... */
), and only pull out the parts marked as tests.”
The Solution: The Javadoc Test Parser - Your Code Librarian
This is where the Javadoc Test Parser comes in. Think of it as a meticulous librarian specifically trained to find test code within your Javadocs.
- Its Goal: Scan Java source files and extract code snippets marked with special tags.
- The Special Tag: It looks for HTML-like tags:
<code lang="...">
. Thelang
attribute tells the parser what kind of code it is (e.g.,lang="spock"
orlang="gherkin"
). - The Process:
- The parser reads a Java file.
- It finds all Javadoc comments (
/** ... */
). - Inside each comment, it searches for
<code lang="...">
tags. - If a tag’s
lang
matches what it’s looking for (like “spock”), it carefully copies the text between the opening<code ...>
and the closing</code>
. - This extracted text (your test code) is then collected.
This extracted code is the raw material that will later be turned into actual runnable tests.
How It Works: An Example
Let’s look at a simple Java class with a Javadoc test:
Calculator.java
package com.example;
/**
* A simple calculator.
*
* <pre><code lang="spock"> // <-- The parser looks for this tag!
* def "should add two numbers"() {
* given:
* def calc = new Calculator()
* when:
* def result = calc.add(2, 3)
* then:
* result == 5
* }
* </code></pre> // <-- And extracts content until here
*/
public class Calculator {
/**
* Adds two integers.
* @param a the first number
* @param b the second number
* @return the sum
*/
public int add(int a, int b) {
return a + b;
}
// Another method without Javadoc tests...
public int subtract(int a, int b) {
return a - b;
}
}
Now, let’s imagine the Javadoc Test Parser (configured to look for lang="spock"
) processes this file.
Input: Calculator.java
(the file content above)
Processing:
- The parser finds the main Javadoc comment for the
Calculator
class. - Inside, it spots
<code lang="spock">
. Sincelang
matches “spock”, it gets interested. - It extracts everything between
<code ...>
and</code>
. - It might do a little cleanup, like removing the leading
*
often found on Javadoc lines.
Output: A list containing one string:
// This is the extracted string (example)
"""
def "should add two numbers"() {
given:
def calc = new Calculator()
when:
def result = calc.add(2, 3)
then:
result == 5
}
"""
This extracted string is exactly the Spock test code we wrote in the comment! The parser ignored the regular Javadoc text and the add
method’s Javadoc because they didn’t have the special <code lang="spock">
tag.
Under the Hood: The Parsing Machinery
How does the parser actually achieve this? It uses a couple of key components working together:
JavaFileParser
: This component is responsible for opening and reading the.java
file. It uses a standard Java parsing library (likejavaparser-core
) to understand the structure of the Java code and, importantly, to identify all the comments (both/** Javadoc */
and/* block */
comments).JdocParser
(specificallyRegexParser
): OnceJavaFileParser
finds a comment, it passes the text content of that comment to aJdocParser
. TheRegexParser
implementation uses Regular Expressions (powerful text-searching patterns) to find the specific<code lang="...">...</code>
structures within the comment text. It checks if thelang
attribute matches the desired language (e.g., “spock”). If it finds a match, it extracts the content.
Let’s look at a simplified view of the code involved:
Core Interface (JdocParser.java
):
package org.bool.jdoc.core;
import java.util.List;
// Defines the basic contract: take comment text, return code blocks
public interface JdocParser {
List<String> parse(String content);
}
This simple interface just says: “Give me some text (the comment content), and I’ll give you back a list of code strings I found inside.”
The File Reader (JavaFileParser.java
- Simplified):
package org.bool.jdoc.core;
// ... imports ...
public class JavaFileParser {
private final JavaParser javaParser; // Tool to read Java code structure
private final JdocParser jdocParser; // Tool to find code in comments
// Constructor sets up the tools
public JavaFileParser(String lang) {
this.javaParser = new JavaParser(/* ... */); // From javaparser library
this.jdocParser = new RegexParser(lang); // Our regex tool, configured for the language
}
// Main method to parse a file
public SpecSource parse(Path file) {
// 1. Use javaParser to read the file & find comments
ParseResult<CompilationUnit> result = javaParser.parse(file);
// (Error handling omitted for simplicity)
CompilationUnit unit = result.getResult().get();
CommentsCollection comments = result.getCommentsCollection().get();
// 2. Process all found comments
List<String> allCodeBlocks = comments.getComments().stream()
// Only look at Javadoc and block comments
.filter(c -> c instanceof JavadocComment || c instanceof BlockComment)
// 3. For each comment, use jdocParser to extract code
.flatMap(comment -> jdocParser.parse(comment.getContent()).stream())
// Ignore empty results
.filter(code -> !code.trim().isEmpty())
.collect(Collectors.toList());
// Return the findings
return new SpecSource(unit, allCodeBlocks);
}
}
This shows how JavaFileParser
orchestrates the process: use javaParser
to get comments, then loop through them, passing each comment’s content to jdocParser
(our RegexParser
) to do the detailed searching.
The Regex Finder (RegexParser.java
- Conceptual):
package org.bool.jdoc.core;
// ... imports ...
public class RegexParser implements JdocParser {
// Pattern to find "<code ...>" or "</code>"
private static final Pattern CODE_TAGS = Pattern.compile("<code(?: .*?)?>|<\\/code>");
// Filter to check the lang="..." attribute
private final Predicate<String> langFilter;
// Constructor takes a filter for the language
public RegexParser(String lang) {
// Creates a filter: does the tag contain 'lang="<lang>"'?
this.langFilter = Pattern.compile(String.format(" lang *= *\"%s\"", lang))
.asPredicate();
}
@Override
public List<String> parse(String commentContent) {
List<String> extractedCode = new ArrayList<>();
// 1. Use CODE_TAGS pattern to find all <code> or </code> tags
Matcher matcher = CODE_TAGS.matcher(commentContent);
// (Logic to pair opening/closing tags and check langFilter omitted)
// ... complex logic using the matcher ...
// 2. If an opening tag passes the langFilter:
// Extract text between opening and closing tag
// Clean up text (e.g., remove leading '*')
// Add cleaned text to extractedCode list
// ...
return extractedCode;
}
// Helper to remove leading asterisks from Javadoc lines
private String removeAsterisks(String content) {
// ... logic to strip leading "*" ...
return content; // Simplified for example
}
}
The RegexParser
uses patterns (CODE_TAGS
) to find the start and end points of code blocks and applies the langFilter
to ensure it only extracts blocks for the correct language.
Simplified Flow Diagram:
Let’s visualize the steps when a Gradle task (like generateSpockSpecs
from Chapter 1) uses the parser:
sequenceDiagram
participant Task as Gradle Task
participant JFP as JavaFileParser
participant JP Lib as javaparser library
participant RP as RegexParser
Task->>JFP: parse("Calculator.java")
JFP->>JP Lib: parse("Calculator.java")
JP Lib-->>JFP: Return AST & Comments
JFP->>RP: parse("/** ... <code lang='spock'>...</code> ... */")
RP->>RP: Find <code lang='spock'> tag
RP->>RP: Check lang attribute (matches "spock")
RP->>RP: Extract content between tags
RP->>RP: Clean up extracted content (remove *)
RP-->>JFP: Return [" def 'should add...'() { ... } "]
JFP-->>Task: Return SpecSource (containing extracted code)
This diagram shows the collaboration: The Task asks JavaFileParser
to parse the file. JavaFileParser
uses the javaparser library
to get comments and then asks RegexParser
to find the relevant code within those comments.
Conclusion
You’ve now learned about the Javadoc Test Parser, the essential component in jdoc-test
responsible for finding and extracting your test code from Javadoc comments. It acts like a librarian, scanning your Java files and using the special <code lang="...">
tag as a bookmark to identify and copy the test snippets you’ve written.
Key takeaways:
- The parser looks for
<code lang="...">
tags within Javadoc comments (/** ... */
). - The
lang
attribute (e.g., “spock”, “gherkin”) tells the parser which code blocks to extract. - It uses tools like
JavaFileParser
(to read files and find comments) andRegexParser
(to find tags and extract content within comments). - The output is a list of strings, each containing the raw code found inside a matching tag.
But what happens to this extracted code? Just having the text isn’t enough; we need to turn it into actual test files that tools like Spock or Cucumber can run. That’s the job of the next component we’ll explore.
In the next chapter, we’ll see how the BDD Spec Generator takes this extracted code and transforms it into runnable .groovy
or .feature
files.
Generated by AI Codebase Knowledge Builder