What is Google Indexing and How Does It Work?
Understand Google's crawling, indexing, and ranking pipeline. Learn about Googlebot, crawl budget, IndexNow, and how to get pages indexed faster.
Google processes pages in three stages: crawling (Googlebot fetches the HTML), indexing (the content is processed and stored in the search index), and ranking (indexed pages are scored against queries). If stage 1 or 2 fails, your page never appears in search. Use sitemaps, IndexNow, and the Indexing API to speed things up.
You published a page. But it's not showing up in Google search. Why? Because publishing and indexing are two completely different things. Google has to discover, crawl, and index your page before it can appear in any search result.
Here's how that pipeline works, and what you can do to speed it up.
The three-stage pipeline
Google processes web content in three stages:
- Crawling — Googlebot discovers and fetches your page's HTML
- Indexing — Google processes the HTML, extracts content, and stores it in the search index
- Ranking — When a user searches, Google retrieves and ranks relevant indexed pages
If your page fails at stage 1 or 2, it will never appear in stage 3. Most SEO advice focuses on ranking. But if you're not indexed, ranking optimization is irrelevant.
How Googlebot discovers pages
Googlebot finds your pages through:
- Sitemaps — Your sitemap.xml is the most direct way to tell Google which URLs exist
- Links — Googlebot follows links from already-indexed pages. Internal linking matters.
- Google Search Console — You can manually request indexing via the URL Inspection tool
- IndexNow — A protocol to instantly notify search engines of new or changed URLs
For new sites, the fastest path is: submit a sitemap to Google Search Console, then request indexing for your most important pages manually.
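A minimal sitemap looks like this (the URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/blog/new-post</loc>
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```

Only include canonical, indexable URLs — a sitemap full of redirects or noindexed pages sends mixed signals.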
Crawl budget and why it matters
Googlebot doesn't crawl every URL every day. It allocates a "crawl budget" to each site based on:
- Site size — larger sites get more budget
- Server speed — faster servers allow more crawling
- Content freshness — frequently updated pages get crawled more often
- Page importance — pages with more internal/external links get priority
For most sites under 10,000 pages, crawl budget is not a concern. But if you have duplicate URLs, infinite pagination, or thousands of thin pages, you're wasting budget on pages that shouldn't be crawled.
Optimize crawl budget by: keeping your sitemap clean (only include canonical URLs), blocking irrelevant paths in robots.txt, and fixing redirect chains.
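As an illustration, a robots.txt that blocks common budget-wasters might look like this (the paths are examples — only block what you genuinely don't want crawled):

```
User-agent: *
Disallow: /search
Disallow: /tag/
Disallow: /*?sort=

Sitemap: https://yoursite.com/sitemap.xml
```

Note that robots.txt controls crawling, not indexing — a blocked URL can still appear in results if other pages link to it.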
Why pages don't get indexed
Google may discover your page but choose not to index it. Common reasons:
- noindex directive — You're explicitly telling Google not to index the page
- Duplicate content — Google sees it as a copy of another page and picks the "canonical" version
- Low quality — Thin content, mostly boilerplate, or auto-generated pages
- Crawl errors — 404, 500, or timeout when Googlebot visits
- Blocked by robots.txt — You're preventing Googlebot from accessing the page
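For reference, a noindex directive usually appears as a meta tag in the page's head:

```html
<!-- A page with this tag will be crawled but not indexed -->
<meta name="robots" content="noindex">
```

The same directive can also be sent as an HTTP response header (X-Robots-Tag: noindex), which additionally applies to non-HTML files like PDFs. Check both places before concluding a page is eligible for indexing.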
Check the Coverage report in Google Search Console to see which pages are excluded and why. The "Excluded" tab is one of the most useful tools in GSC.
The Google Indexing API
The Indexing API is Google's fastest indexing channel. It signals to Google that a URL needs urgent re-crawling, though it is officially limited to pages with job posting or livestream structured data.
// Notify Google of a new or updated URL
POST https://indexing.googleapis.com/v3/urlNotifications:publish
{
"url": "https://yoursite.com/blog/new-post",
"type": "URL_UPDATED"
}
Setup requires a Google Cloud service account with the Indexing API enabled. The API is rate-limited (200 requests per day by default), so use it strategically for your most important pages.
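A sketch of that call in Python — the payload builder is real, runnable code, while the authenticated request (which needs the google-auth package and your service-account key file, both placeholders here) is shown in comments:

```python
# Sketch: publish a URL notification to the Google Indexing API.
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

def build_notification(url: str, update_type: str = "URL_UPDATED") -> dict:
    """Build the JSON body the Indexing API expects."""
    if update_type not in ("URL_UPDATED", "URL_DELETED"):
        raise ValueError(f"unknown notification type: {update_type}")
    return {"url": url, "type": update_type}

# Sending it requires OAuth with the indexing scope. With google-auth
# installed and a service-account key file, it looks roughly like:
#
# from google.oauth2 import service_account
# from google.auth.transport.requests import AuthorizedSession
# creds = service_account.Credentials.from_service_account_file(
#     "service-account.json",
#     scopes=["https://www.googleapis.com/auth/indexing"])
# AuthorizedSession(creds).post(ENDPOINT, json=build_notification(
#     "https://yoursite.com/blog/new-post"))
```

Given the 200-request daily quota, it makes sense to queue notifications and send only your highest-priority URLs.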
IndexNow: instant notification
IndexNow is a protocol supported by Bing, Yandex, and other search engines (not Google yet, though Google has been testing it). It lets you ping search engines immediately when a URL changes:
POST https://api.indexnow.org/indexnow
{
"host": "yoursite.com",
"key": "your-api-key",
"urlList": [
"https://yoursite.com/blog/new-post",
"https://yoursite.com/blog/updated-post"
]
}
Unlike sitemaps (which are passive — search engines check on their schedule), IndexNow is active — you push notifications in real time.
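A minimal sketch of building that submission in Python (the host, key, and URLs are placeholders — per the protocol, the key must match a text file you host at https://yoursite.com/your-api-key.txt so the search engine can verify site ownership):

```python
# Sketch: build an IndexNow batch submission body.
def build_indexnow_payload(host: str, key: str, urls: list[str]) -> dict:
    """Build the JSON body for POST https://api.indexnow.org/indexnow."""
    # Every submitted URL must belong to the host you are submitting for.
    for url in urls:
        if host not in url:
            raise ValueError(f"{url} does not belong to {host}")
    return {"host": host, "key": key, "urlList": urls}
```

A single POST can carry thousands of URLs, so batching changed URLs once per deploy is usually enough.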
Monitoring indexation status
You need to know which of your pages are indexed and which aren't. Monitor this by:
- Checking the Coverage report in Google Search Console
- Running site:yoursite.com searches to see the indexed page count
- Comparing sitemap URLs against indexed URLs
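The sitemap-vs-indexed comparison can be scripted. A sketch in Python using the standard library (the function names are illustrative; the indexed set would come from your own records, e.g. a Search Console export):

```python
# Sketch: find sitemap URLs that are missing from the index.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(sitemap_xml: str) -> set[str]:
    """Extract all <loc> values from a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc")}

def not_indexed(sitemap_xml: str, indexed: set[str]) -> set[str]:
    """URLs submitted in the sitemap but absent from the indexed set."""
    return sitemap_urls(sitemap_xml) - indexed
```

Running this on a schedule and alerting on growth in the missing set catches indexation regressions early.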
# Check indexation status with Indxel
$ npx indxel check
Indexation:
Submitted: 47 URLs
Indexed: 44 URLs (93.6%)
Pending: 2 URLs (submitted 3 days ago)
Excluded: 1 URL (noindex)
Indxel's indexation engine automates this entire workflow: it tracks which pages are indexed, auto-submits new URLs via the Indexing API and IndexNow, and retries pages that Google hasn't picked up within your configured threshold. No manual Search Console checks needed.
Getting indexed faster: a checklist
- Submit your sitemap to Google Search Console
- Manually request indexing for your top 10 pages via URL Inspection
- Set up IndexNow for instant notifications on content changes
- Ensure fast server response times (under 200ms TTFB)
- Build internal links to new pages from existing indexed pages
- Avoid noindex directives on pages you want indexed
- Remove duplicate content and set proper canonicals
- Monitor the Coverage report weekly for new exclusions
For a more detailed technical breakdown of each stage, read our complete guide on how Google indexing works.
Indexation is the foundation of SEO. Everything else — keywords, content, link building — only works once your pages are in the index.