
Understanding N-grams: The Building Blocks of Keyword Density Analysis

When you analyze keyword density in your content, you’re actually working with something called n-grams. These are sequences of words that help reveal patterns, themes, and optimization opportunities in your text. Whether you’re optimizing for search engines, ensuring readability, or analyzing content themes, understanding n-grams is key to effective text analysis.

What Are N-grams?

N-grams are contiguous sequences of words (or characters) from a text sample. The “n” represents the number of items in each sequence. In text analysis, we typically focus on word-based n-grams:

  • Unigrams (1-grams): Single words like “content,” “marketing,” “analysis”
  • Bigrams (2-grams): Two-word phrases like “content marketing,” “keyword density”
  • Trigrams (3-grams): Three-word phrases like “search engine optimization,” “content marketing strategy”

Think of n-grams as the linguistic building blocks that help computers understand the structure and meaning of human language. They capture not just individual words, but the relationships and patterns between them.
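Word n-grams and character n-grams are the same sliding window applied to different units, so one generic helper can produce both. Here is a minimal sketch. The name `ngramsOf` is hypothetical, and splitting on single spaces is a simplification:

```javascript
// Return every contiguous run of n items from an array of words or characters.
function ngramsOf(items, n) {
  if (n < 1) return []; // a window needs at least one item
  const result = [];
  for (let i = 0; i + n <= items.length; i++) {
    result.push(items.slice(i, i + n));
  }
  return result;
}

const words = "content marketing strategy".split(" ");

// Word n-grams: join each window back into a phrase.
console.log(ngramsOf(words, 1).map(g => g.join(" ")));
// → ["content", "marketing", "strategy"]
console.log(ngramsOf(words, 2).map(g => g.join(" ")));
// → ["content marketing", "marketing strategy"]
console.log(ngramsOf(words, 3).map(g => g.join(" ")));
// → ["content marketing strategy"]

// Character n-grams: Array.from splits a string into its characters.
console.log(ngramsOf(Array.from("seo"), 2).map(g => g.join("")));
// → ["se", "eo"]
```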

Why N-grams Matter for Content Analysis

Beyond Single Words

While individual word frequency tells us something about content focus, n-grams reveal much more:

// Example text analysis
"Content marketing drives engagement. Good content marketing requires strategy."

// Unigrams might show:
"content": 2 occurrences
"marketing": 2 occurrences

// But bigrams reveal the relationship:
"content marketing": 2 occurrences (specific phrase)

This distinction is crucial because “content marketing” as a phrase carries different semantic weight than the words “content” and “marketing” appearing separately.
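Reproducing the counts above requires lowercasing the text and stripping punctuation first. Otherwise "Content" and "content" are counted separately. The sketch below counts with a JavaScript `Map`. The regex tokenizer is an ASCII-only assumption, and windows are allowed to cross the sentence boundary:

```javascript
// Naive tokenizer: lowercase, then keep runs of ASCII letters, digits, apostrophes.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

// Count every n-gram (as a space-joined phrase) in a token array.
function countNgrams(tokens, n) {
  const counts = new Map();
  for (let i = 0; i + n <= tokens.length; i++) {
    const gram = tokens.slice(i, i + n).join(" ");
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

const text = "Content marketing drives engagement. Good content marketing requires strategy.";
const tokens = tokenize(text);

const unigrams = countNgrams(tokens, 1);
const bigrams = countNgrams(tokens, 2);

console.log(unigrams.get("content"));          // → 2
console.log(unigrams.get("marketing"));        // → 2
console.log(bigrams.get("content marketing")); // → 2
console.log(bigrams.get("marketing drives"));  // → 1
```

Because the sentence boundary is ignored, this version also produces the bigram "engagement good." A sentence-aware tokenizer would split the text into sentences first.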

Capturing Long-tail Keywords

N-grams help identify long-tail keyword opportunities that single-word analysis might miss:

  • Unigram: “SEO” (broad, competitive)
  • Bigram: “SEO strategy” (more specific)
  • Trigram: “local SEO strategy” (even more specific, less competitive)
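One way to surface longer variants of a seed keyword is to collect every bigram and trigram that contains it. The function name `longTailVariants`, the sample sentence, and the ASCII-only tokenizer below are illustrative assumptions:

```javascript
// Collect bigrams and trigrams that contain a seed keyword, with their counts.
function longTailVariants(text, seed, maxN = 3) {
  const tokens = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const target = seed.toLowerCase();
  const counts = new Map();
  for (let n = 2; n <= maxN; n++) {
    for (let i = 0; i + n <= tokens.length; i++) {
      const span = tokens.slice(i, i + n);
      if (span.includes(target)) {
        const gram = span.join(" ");
        counts.set(gram, (counts.get(gram) || 0) + 1);
      }
    }
  }
  // Most frequent variants first.
  return [...counts].sort((a, b) => b[1] - a[1]);
}

const sample = "A local SEO strategy differs from a national SEO strategy.";
console.log(longTailVariants(sample, "SEO"));
```

Here "seo strategy" (2 occurrences) ranks first. It is followed by single-occurrence variants such as "local seo strategy" and "national seo strategy." Fragments like "a local seo" would disappear once stop words are filtered out.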

How N-gram Algorithms Work

Basic N-gram Extraction

The fundamental algorithm for extracting n-grams involves sliding a window of size ‘n’ across your text:

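// Slide a window of size n across the word array, one position at a time.
function extractNgrams(words, n) {
  const ngrams = [];
  if (n < 1) return ngrams; // no valid windows for n <= 0

  for (let i = 0; i <= words.length - n; i++) {
    const ngram = words.slice(i, i + n).join(' ');
    ngrams.push(ngram);
  }

  return ngrams;
}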

// Example with trigrams (n=3)
const words = ["search", "engine", "optimization", "improves", "visibility"];
const trigrams = extractNgrams(words, 3);
// Result: ["search engine optimization", "engine optimization improves", "optimization improves visibility"]

Frequency Counting and Density Calculation

Once n-grams are extracted, we count their occurrences and calculate density:

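// totalNgramCount defaults to the number of extracted n-grams (W - n + 1 for W words)
function calculateNgramDensity(ngrams, totalNgramCount = ngrams.length) {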
  const ngramCounts = {};
  
  // Count occurrences
  ngrams.forEach(ngram => {
    ngramCounts[ngram] = (ngramCounts[ngram] || 0) + 1;
  });
  
  // Calculate densities
  const densities = {};
  Object.entries(ngramCounts).forEach(([ngram, count]) => {
    densities[ngram] = {
      count: count,
      density: ((count / totalNgramCount) * 100).toFixed(2) + '%'
    };
  });
  
  return densities;
}
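As a formula, this density is the share of n-gram windows taken up by one n-gram g. This assumes totalNgramCount is the number of windows produced by the sliding-window extraction, which is W − n + 1 for a text of W words:

```latex
\text{density}(g) = \frac{\text{count}(g)}{W - n + 1} \times 100\%
```

Take "Content marketing drives engagement. Good content marketing requires strategy." With W = 9 and n = 2, there are 9 − 2 + 1 = 8 bigrams. "content marketing" occurs twice, so its density is 2 / 8 × 100% = 25.00%. Some tools divide by the total word count W instead. The two definitions agree for unigrams, where W − 1 + 1 = W.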

Implementation Challenges and Solutions

Handling Stop Words

One key challenge in n-gram analysis is dealing with stop words: common words like “the,” “and,” and “is” that add little semantic value. Consider these examples:

Without stop word filtering:

  • “the best content marketing strategy”
  • “is content marketing effective”

With stop word filtering:

  • “best content marketing strategy”
  • “content marketing effective”

The filtered versions focus on meaningful terms while preserving the essential semantic relationships between important words.
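Here is a minimal sketch of stop word filtering. It assumes a small hand-picked stop list (real lists are much longer) and an optional list of extra words for specialized topics. The names `DEFAULT_STOP_WORDS` and `removeStopWords` are hypothetical:

```javascript
// A tiny illustrative stop list; real lists contain hundreds of words.
const DEFAULT_STOP_WORDS = ["the", "and", "is", "a", "an", "of", "to", "in"];

// Remove default and caller-supplied stop words before extracting n-grams.
function removeStopWords(tokens, extraStopWords = []) {
  const stop = new Set(
    [...DEFAULT_STOP_WORDS, ...extraStopWords].map(w => w.toLowerCase())
  );
  return tokens.filter(t => !stop.has(t.toLowerCase()));
}

console.log(removeStopWords("the best content marketing strategy".split(" ")).join(" "));
// → "best content marketing strategy"
console.log(removeStopWords("is content marketing effective".split(" ")).join(" "));
// → "content marketing effective"

// Custom stop words for a specialized topic, e.g. a cooking blog.
console.log(removeStopWords("the recipe calls for butter".split(" "), ["recipe"]).join(" "));
// → "calls for butter"
```

Filtering before extraction has a side effect: words that were separated by a stop word become adjacent, so "state of the art" yields the bigram "state art." An alternative is to extract n-grams from the full text first, then discard any n-gram that begins or ends with a stop word.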

Text Preprocessing

Effective n-gram analysis requires careful text preprocessing:

  1. Normalization: Converting to lowercase for consistent matching
  2. Punctuation handling: Deciding whether to include or exclude punctuation
  3. Word boundaries: Properly identifying where words begin and end
  4. Special characters: Handling numbers, symbols, and formatting
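The four steps above can be sketched as a single regex-based tokenizer. This is a hedged illustration, not Gorby's implementation. The function name `preprocess` is hypothetical, and the Unicode property escapes `\p{L}` and `\p{N}` require the `u` flag:

```javascript
// Normalize and tokenize text for n-gram analysis.
function preprocess(text) {
  return (
    text
      // 1. Normalization: lowercase and turn curly apostrophes into straight ones.
      .toLowerCase()
      .replace(/\u2019/g, "'")
      // 2-4. Keep runs of letters and digits, allowing one inner apostrophe or
      // hyphen between runs ("don't", "twenty-one"); other symbols are dropped.
      .match(/[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*/gu) || []
  );
}

console.log(preprocess("Don’t stop: Twenty-one Ph.D. students read 3 books!"));
// → ["don't", "stop", "twenty-one", "ph", "d", "students", "read", "3", "books"]
```

The output shows where simple rules fall short: "Ph.D." is split into "ph" and "d." Abbreviations like this are one reason to use a dedicated NLP library for tokenization.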

Memory and Performance Considerations

N-gram analysis can be computationally intensive, especially for large texts. A few techniques keep it manageable:

  • Filtering by frequency: Only keep n-grams that appear at least a minimum number of times
  • Limiting results: Cap the number of n-grams reported to avoid overwhelming output
  • Efficient data structures: Count with a hash map (such as a JavaScript Map), so each update takes constant time on average
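The three techniques above combine naturally in one counting pass. Here is a minimal sketch. The function name `topNgrams` and the defaults `minCount = 2` and `limit = 10` are illustrative assumptions:

```javascript
// Count n-grams in a Map, keep those above a minimum frequency, and cap the output.
function topNgrams(tokens, n, { minCount = 2, limit = 10 } = {}) {
  if (n < 1) return [];
  const counts = new Map(); // phrase -> occurrences, O(1) average update
  for (let i = 0; i + n <= tokens.length; i++) {
    const gram = tokens.slice(i, i + n).join(" ");
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= minCount) // frequency threshold
    .sort((a, b) => b[1] - a[1]) // most frequent first
    .slice(0, limit); // limit the number of results
}

const tokens =
  "content marketing tips and content marketing tools for content marketing teams".split(" ");
console.log(topNgrams(tokens, 2));
// → [["content marketing", 3]]
```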

How Gorby Implements N-gram Analysis

In building Gorby’s text analysis tools, we use the Compromise JS library, a powerful natural language processing library that makes n-gram extraction both fast and accurate.

This library handles the complex work of parsing text and identifying word boundaries, which is harder than it might seem. Consider tokens like “don’t,” “twenty-one,” or “Ph.D.”: each raises its own question about what counts as a single “word.”

When you use our keyword density feature, you have several options to customize your analysis. You can exclude common stop words like “the,” “and,” and “is” to focus on more meaningful terms. If you’re writing in a specialized field, you can also add your own custom stop words. For example, if you’re writing about cooking, you might want to exclude words like “recipe” or “ingredient” to see what other patterns emerge.

One feature that many users find helpful is the ability to track specific words or phrases. If you’re working on a piece about “sustainable design,” you can monitor how often that exact phrase appears throughout your text, ensuring you’re covering the topic consistently without overdoing it.
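Tracking an exact phrase amounts to sliding a window the length of the phrase across the text and comparing whole tokens. Here is a minimal sketch. The name `countPhrase` and the ASCII-only tokenizer are assumptions, not Gorby's code:

```javascript
// Count exact occurrences of a phrase by comparing whole tokens, not substrings.
function countPhrase(text, phrase) {
  const tokenize = s => s.toLowerCase().match(/[a-z0-9']+/g) || [];
  const words = tokenize(text);
  const target = tokenize(phrase);
  if (target.length === 0) return 0;

  let count = 0;
  for (let i = 0; i + target.length <= words.length; i++) {
    // The window matches only if every token lines up with the target phrase.
    if (target.every((w, j) => words[i + j] === w)) count++;
  }
  return count;
}

const draft =
  "Sustainable design matters. Unsustainable design costs more, so plan sustainable design early.";
console.log(countPhrase(draft, "sustainable design")); // → 2

// A plain substring count also matches inside "unsustainable design".
console.log(draft.toLowerCase().split("sustainable design").length - 1); // → 3
```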

Why N-gram Analysis Matters for Writers

Understanding n-grams isn’t just useful for search engine optimization—though that’s certainly one application. Writers, editors, and content creators can use n-gram analysis to improve their work in several ways.

When you’re editing a long piece, n-gram analysis can reveal patterns you might not notice otherwise. Maybe you’re unconsciously repeating certain phrases, or perhaps you’re not using key terminology as consistently as you thought. These insights help you refine your writing to be more engaging and coherent.

For academic writing, n-gram analysis can help ensure you’re using discipline-specific terminology appropriately. In business writing, it can help you maintain the right tone and terminology for your audience.

Real-World Applications

Consider a travel blogger writing about Paris. Looking at unigrams might show frequent use of words like “Paris,” “museum,” and “restaurant.” But examining bigrams reveals more specific patterns: “Louvre Museum,” “French cuisine,” “Seine River.” Trigrams might uncover even more specific phrases like “Eiffel Tower views” or “Latin Quarter cafes.”

This type of analysis helps writers understand not just what they’re talking about, but how they’re talking about it. Are you providing specific, concrete details (evidenced by frequent longer n-grams) or staying at a general level (suggested when single words dominate)?

Getting Started with N-gram Analysis

If you’re curious about the patterns in your own writing, n-gram analysis is easier to try than you might think. You don’t need to understand algorithms or write code—tools like our keyword density calculator do the heavy lifting for you.

Start by analyzing a piece of writing you’ve recently completed. Look at the most frequent unigrams, bigrams, and trigrams. Do they reflect what you intended to emphasize? Are there surprising patterns or repetitions you hadn’t noticed?

The goal isn’t to optimize for any particular metric, but to better understand your own writing patterns and improve your communication with readers.

Try Advanced N-gram Analysis

Ready to see n-grams in action? Gorby’s keyword density analyzer provides instant n-gram analysis to get you started. For customizable settings like stop word configuration and keyword targets, try our keyword density feature in the main app. Premium users also get keyword highlighting directly in their text.

Whether you’re optimizing content for search engines, analyzing writing patterns, or ensuring balanced keyword distribution, understanding n-grams gives you the foundation for more effective text analysis.



Branimir Rijavec

Founder & Lead Developer

© 2025 Gorby. All rights reserved.