Crawling
Crawl live websites. Validate every page. Detect cross-page SEO issues.
crawlSite(url, options?)
Crawl a live website by following internal links. Each page is audited with validateMetadata(). Returns per-page results and cross-page analysis.
Basic crawl
```typescript
import { crawlSite } from 'indxel'

const result = await crawlSite('https://mysite.com', {
  maxPages: 100,
  maxDepth: 5,
  delay: 200,
  strict: false,
  ignorePatterns: ['/admin/*', '/api/*'],
  onPageCrawled: (page) => console.log(`Crawled: ${page.url}`),
})
```
Options
| Parameter | Type | Description |
|---|---|---|
| maxPages | number | Maximum pages to crawl (default: 50) |
| maxDepth | number | Maximum link depth from start URL (default: 5) |
| delay | number | Milliseconds between requests (default: 200) |
| strict | boolean | Treat warnings as errors in scoring |
| ignorePatterns | string[] | Glob patterns for URLs to skip (e.g. '/admin/*') |
| onPageCrawled | (page) => void | Callback fired after each page is crawled |
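For a CI-style check, the same options can be tuned tighter. A minimal sketch; the failing threshold of 80 is an arbitrary choice, not an SDK default:
```typescript
import { crawlSite } from 'indxel'

// A stricter, smaller crawl suited to CI: warnings count against
// the score, and noisy sections are skipped entirely.
const ci = await crawlSite('https://mysite.com', {
  maxPages: 25,
  strict: true,
  ignorePatterns: ['/admin/*', '/api/*'],
})

// Fail the build on a weak average (threshold is an assumption).
if (ci.averageScore < 80) process.exit(1)
```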
Result
CrawlResult
```typescript
result.pages // CrawledPage[] — per-page data
result.averageScore // number — average score across all pages
result.totalPages // number
result.grade // string — overall grade
result.totalErrors // number
result.totalWarnings // number
result.durationMs // number — total crawl time
result.skippedUrls // string[] — URLs skipped by ignorePatterns
```
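Taken together, these fields support a one-line crawl summary. A minimal sketch, continuing from the crawlSite example above; the output formatting is illustrative:
```typescript
// Summarize a finished crawl using the fields above.
console.log(
  `Crawled ${result.totalPages} pages in ${result.durationMs}ms: ` +
    `grade ${result.grade}, average score ${result.averageScore}, ` +
    `${result.totalErrors} errors, ${result.totalWarnings} warnings`,
)
if (result.skippedUrls.length > 0) {
  console.log(`Skipped ${result.skippedUrls.length} URLs via ignorePatterns`)
}
```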
CrawledPage
Per-page data
```typescript
type CrawledPage = {
url: string
metadata: ResolvedMetadata
validation: ValidationResult // score, grade, errors, warnings
h1s: string[] // all H1 tags found
wordCount: number // text content word count
responseTimeMs: number
structuredDataTypes: string[] // JSON-LD @types found
}
```
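The per-page data makes it straightforward to single out problem pages. A minimal sketch, again continuing from the crawl above; the score cutoff of 70 is an arbitrary assumption, not an SDK threshold:
```typescript
// Flag pages that scored poorly or have an unusual H1 structure.
for (const page of result.pages) {
  if (page.validation.score < 70) {
    console.log(`${page.url}: scored ${page.validation.score} (${page.validation.grade})`)
  }
  if (page.h1s.length !== 1) {
    console.log(`${page.url}: found ${page.h1s.length} H1 tags`)
  }
}
```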
Cross-Page Analysis
The crawl result includes analysis that spans multiple pages:
CrawlAnalysis
```typescript
result.analysis.duplicateTitles // [{ title, urls }]
result.analysis.duplicateDescriptions // [{ description, urls }]
result.analysis.h1Issues // [{ url, issue: 'missing'|'multiple', count }]
result.analysis.brokenInternalLinks // [{ from, to, status }]
result.analysis.redirects // [{ url, chain }]
result.analysis.thinContentPages // [{ url, wordCount, isAppPage }]
result.analysis.orphanPages // string[] — pages not linked from any other page
result.analysis.slowestPages // [{ url, responseTimeMs }]
result.analysis.structuredDataSummary // [{ type, count }]
```
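A minimal sketch that turns two of these checks into actionable output, using the shapes shown in the comments above:
```typescript
// Report duplicated titles and broken internal links.
for (const { title, urls } of result.analysis.duplicateTitles) {
  console.log(`Duplicate title "${title}" on ${urls.length} pages:`, urls)
}
for (const { from, to, status } of result.analysis.brokenInternalLinks) {
  console.log(`Broken link: ${from} -> ${to} (HTTP ${status})`)
}
```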
fetchSitemap(url)
Fetch and parse a site's sitemap.xml.
```typescript
import { fetchSitemap } from 'indxel'
const result = await fetchSitemap('https://mysite.com')
result.found // boolean
result.urls // [{ loc, lastmod?, changefreq?, priority? }]
result.errors // string[]
```
compareSitemap(sitemapUrls, crawledUrls)
Compare sitemap URLs against crawled pages to find discrepancies.
```typescript
import { compareSitemap } from 'indxel'
const comparison = compareSitemap(sitemapUrls, crawledUrls)
comparison.inCrawlOnly // Pages missing from sitemap
comparison.inSitemapOnly // Sitemap URLs not reachable via links
comparison.issues // string[] — summary of problems
```
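Putting the two together: crawl the site, fetch its sitemap, and diff the results. A minimal sketch, assuming compareSitemap accepts plain arrays of URL strings, as its signature suggests:
```typescript
import { crawlSite, fetchSitemap, compareSitemap } from 'indxel'

const crawl = await crawlSite('https://mysite.com', { maxPages: 100 })
const sitemap = await fetchSitemap('https://mysite.com')

if (sitemap.found) {
  const comparison = compareSitemap(
    sitemap.urls.map((u) => u.loc), // <loc> values from sitemap.xml
    crawl.pages.map((p) => p.url),  // URLs actually reached by the crawl
  )
  console.log('Missing from sitemap:', comparison.inCrawlOnly)
  console.log('Unreachable via internal links:', comparison.inSitemapOnly)
}
```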
fetchRobots(url)
Fetch and parse robots.txt.
```typescript
import { fetchRobots } from 'indxel'
const result = await fetchRobots('https://mysite.com')
result.found // boolean
result.directives // [{ userAgent, allow, disallow }]
result.sitemapUrls // string[]
result.warnings // string[]
result.errors // string[]
```
checkUrlsAgainstRobots(directives, urls)
Check a list of URLs against parsed robots.txt directives to see which are blocked.
```typescript
import { checkUrlsAgainstRobots } from 'indxel'
const checks = checkUrlsAgainstRobots(directives, urls)
// [{ path: '/admin', blocked: true, blockedBy: 'Disallow: /admin' }]
```
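The two robots helpers compose naturally: fetch the live robots.txt, then test crawled URLs against its directives. A minimal sketch; passing full crawled URLs here is an assumption about what the helper accepts:
```typescript
import { crawlSite, fetchRobots, checkUrlsAgainstRobots } from 'indxel'

const crawl = await crawlSite('https://mysite.com')
const robots = await fetchRobots('https://mysite.com')

if (robots.found) {
  const checks = checkUrlsAgainstRobots(
    robots.directives,
    crawl.pages.map((p) => p.url), // assumption: full URLs are accepted
  )
  for (const check of checks.filter((c) => c.blocked)) {
    console.log(`${check.path} is blocked by "${check.blockedBy}"`)
  }
}
```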
verifyAssets(pages)
Verify that referenced assets (og:image, favicon) are accessible.
```typescript
import { verifyAssets } from 'indxel'
const result = await verifyAssets(crawlResult.pages)
result.totalChecked // number
result.totalBroken // number
result.checks // [{ url, type, ok, status?, error?, warning? }]
```
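A minimal sketch that surfaces only the failing checks, continuing from the verifyAssets call above and using the field shapes it returns:
```typescript
// List every referenced asset that failed verification.
if (result.totalBroken > 0) {
  for (const check of result.checks.filter((c) => !c.ok)) {
    console.log(`[${check.type}] ${check.url}: ${check.error ?? `HTTP ${check.status}`}`)
  }
}
```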
Rate limiting
Be respectful of the sites you crawl. Use the delay option to avoid overwhelming servers; the default is 200ms between requests.
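For slower or shared hosts, a gentler configuration is a sensible starting point; the figures below are suggestions, not SDK recommendations:
```typescript
// A polite crawl: one request per second, modest page budget.
const result = await crawlSite('https://mysite.com', {
  delay: 1000,
  maxPages: 50,
})
```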