A sitemap does not improve your rankings. It never has. What it does is help search engines discover your pages, which only matters if those pages deserve to be found in the first place.
Most sitemap guides bury this distinction under technical details. They explain the XML structure, list the attributes, show you how to submit. All useful information. But they skip the part that actually determines whether your sitemap helps or wastes crawl budget.
Here is what matters: two of the four sitemap elements are completely ignored by Google. One is required. One is valuable but frequently misused in ways that actively hurt your site. Understanding which is which separates effective sitemap strategy from busywork.
When Sitemaps Actually Help
Search engines find URLs in two ways: by following links from already-indexed pages and by reading sitemaps you provide. Neither guarantees indexing. Both improve the odds your pages get crawled.
Sitemaps provide real value in specific situations.
New sites with few external links have no entry point for crawlers. A sitemap submitted through Search Console creates that entry point when nothing else exists.
Large sites with deep page hierarchies risk having content buried too far from the homepage. That product page four clicks deep with minimal internal links might never get discovered through crawling alone. A sitemap ensures it appears on Google’s radar.
Sites with rapidly updating content benefit from accurate lastmod signals. News publishers and e-commerce sites with frequent inventory changes use sitemaps to communicate freshness. When the lastmod attribute reflects reality, crawlers can prioritize recently updated pages.
Sites with orphaned content face a structural problem. Pages that exist but are linked from nowhere else on your site will not be discovered through crawling. Sitemaps fill that gap, though fixing your internal linking is the better long-term solution.
Where sitemaps do not help: improving rankings (that depends on content quality and links), forcing indexing of low-quality pages (Google still evaluates the content), or replacing proper internal linking structure.
A 50-page site with solid navigation does not need sitemap optimization. A million-page e-commerce catalog cannot function without it.
The Four Elements and Which Ones Matter
Every sitemap URL entry can include four child elements. Two of them are worth your attention. Two are not.
loc (required): The full, absolute URL of the page. Must include protocol, domain, and path. Must match the canonical URL exactly. If your canonical uses trailing slashes, your sitemap URLs should too. This is the only required element and the foundation of your sitemap.
lastmod (valuable when accurate): The date the page content was last substantially modified. Format as YYYY-MM-DD or a full ISO 8601 datetime. This element influences crawl scheduling, but only when Google trusts it.
changefreq (ignored): How frequently the page is expected to change, from “always” to “never”. Google does not use this element. Their June 2023 announcement confirmed it explicitly: “Google still doesn’t use the changefreq or priority elements at all.”
priority (ignored): Relative importance within your own site, from 0.0 to 1.0. Also ignored by Google. Sites abused this element so consistently that the signal became meaningless. Setting everything to 1.0 told Google nothing useful.
Do not spend time configuring changefreq and priority. They exist in the protocol specification but have no effect on Google’s behavior. Focus your attention entirely on loc accuracy and lastmod honesty.
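For reference, both accepted lastmod formats are easy to produce. A tiny Python sketch, using nothing beyond the standard datetime module:

from datetime import datetime, timezone

now = datetime.now(timezone.utc)
print(now.strftime("%Y-%m-%d"))           # date-only form, e.g. 2024-11-15
print(now.isoformat(timespec="seconds"))  # full ISO 8601 form with UTC offset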
The lastmod Trust Problem
Here is where most sites get sitemaps wrong.
lastmod is valuable precisely because it helps Google prioritize crawling. A page with a recent lastmod gets revisited sooner than a page with an old one. This makes freshness signals work in your favor.
But lastmod only helps when it reflects reality.
What should update lastmod: meaningful content changes, significant structural changes to the page, adding or removing major sections, updated information that changes what the page communicates.
What should not update lastmod: template or layout changes affecting all pages, sidebar or footer updates, comment additions (unless comments are the primary content), regenerating the sitemap file itself, minor typo corrections.
The common failure mode: a CMS that updates lastmod whenever any template renders. Every page looks “fresh” constantly. Google learns quickly that your lastmod means nothing.
Google addressed this directly in their 2023 blog post: “If a page hasn’t changed in several years, but the lastmod element indicates a recent change, search engines may eventually stop trusting the last modified date of your pages.”
This is not a per-page problem. It is a site-wide trust problem. Once Google decides your lastmod signals are unreliable, the element stops helping you entirely.
Audit your sitemap generation logic. Compare lastmod dates against actual content modification dates for a sample of URLs. If they consistently mismatch, fix your generation before Google discounts your freshness signals.
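Here is a minimal Python sketch of that audit, assuming you can pull each URL's true modification timestamp from your CMS (the real_modified_dates mapping stands in for that lookup):

import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_lastmod(sitemap_path, real_modified_dates, tolerance=timedelta(days=1)):
    # Flag URLs whose sitemap lastmod disagrees with the actual content edit date.
    # real_modified_dates maps URL -> datetime of the last real content change,
    # sourced from your CMS or database (assumed, not shown here).
    mismatches = []
    for entry in ET.parse(sitemap_path).getroot().findall("sm:url", NS):
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if not loc or not lastmod or loc not in real_modified_dates:
            continue
        claimed = datetime.fromisoformat(lastmod[:10])  # compare on the date part only
        actual = real_modified_dates[loc]
        if abs(claimed - actual) > tolerance:
            mismatches.append((loc, lastmod, actual.date().isoformat()))
    return mismatches

If most sampled URLs land in the mismatch list, fix the generator, not the individual pages.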
Basic Sitemap Structure
The XML format is straightforward:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page-1/</loc>
    <lastmod>2024-11-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/page-2/</loc>
    <lastmod>2024-11-10</lastmod>
  </url>
</urlset>
Notice what is missing: changefreq and priority. Skip them. They add XML bloat without adding value.
A single sitemap file can contain a maximum of 50,000 URLs and must not exceed 50MB uncompressed. Most sites never hit these limits. Those that do need sitemap index files.
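The arithmetic for crossing that line is simple; a 120,000-URL catalog, for example, needs at least three sitemap files plus an index:

import math

MAX_URLS_PER_SITEMAP = 50_000  # protocol limit per file

total_urls = 120_000  # example catalog size
print(math.ceil(total_urls / MAX_URLS_PER_SITEMAP))  # 3 sitemap files needed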
Sitemap Index for Large Sites
When you exceed 50,000 URLs, coordinate multiple sitemaps through an index file:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemaps/products.xml</loc>
    <lastmod>2024-11-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/blog.xml</loc>
    <lastmod>2024-11-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemaps/locations.xml</loc>
    <lastmod>2024-11-10</lastmod>
  </sitemap>
</sitemapindex>
The index file itself can list up to 50,000 individual sitemaps, theoretically allowing 2.5 billion URLs.
Segmentation strategies that actually help:
By content type: Separate sitemaps for products, blog posts, category pages, and support articles. This organization helps identify patterns in Search Console reports and makes partial updates easier. A Nashville, TN-based home services company might have separate sitemaps for service pages, location pages covering Middle Tennessee, and blog content.
By update frequency: Static pages in one sitemap, frequently updated pages in another. Update the dynamic sitemap often; regenerate the static sitemap rarely.
By priority: Core pages in a separate sitemap from long-tail content. Monitor the core sitemap more closely.
Avoid unnecessary segmentation on small sites. If your site has 5,000 URLs, one sitemap file handles everything. Overcomplicating the structure adds maintenance burden without benefit.
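In code, segmentation is just routing each URL to a bucket and writing one file per bucket. A rough Python sketch, assuming path prefixes like /products/ and /blog/ (adjust the rules to your own URL structure):

from urllib.parse import urlparse

def classify(url):
    # Route a URL to a sitemap bucket by path prefix (assumed site layout).
    path = urlparse(url).path
    if path.startswith("/products/"):
        return "products"
    if path.startswith("/blog/"):
        return "blog"
    return "pages"

def segment(urls):
    # Group URLs into buckets; each bucket becomes its own sitemap file
    # (products.xml, blog.xml, pages.xml), each listed in the index.
    buckets = {}
    for url in urls:
        buckets.setdefault(classify(url), []).append(url)
    return buckets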
What Belongs in Your Sitemap
The sitemap should function as a curated list of your best content, not a comprehensive dump of every URL your CMS generates.
Include: pages with valuable original content, pages you want appearing in search results, canonical URLs only, pages returning 200 status codes, pages that pass your own quality threshold.
Exclude: pages with noindex tags (why list URLs you are telling Google not to index?), redirect URLs, 404 or 410 error pages, duplicate content pages, thin content pages you would not want ranking, utility pages like login, cart, and thank you pages, URLs blocked by robots.txt.
If 80% of your sitemap URLs should not be indexed, Google learns to treat your sitemap signals with skepticism.
A common mistake: automatically including every URL WordPress, Shopify, or other platforms generate. These systems create URLs for tags, archives, author pages, and attachment pages that might not deserve indexing. Audit what your automated sitemap actually contains.
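A rough hygiene check along those lines, using only the Python standard library (a production audit would throttle requests and handle more edge cases):

import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def audit_sitemap(sitemap_url):
    # Flag sitemap URLs that redirect, error out, or appear to carry noindex.
    with urllib.request.urlopen(sitemap_url) as resp:
        root = ET.parse(resp).getroot()
    problems = []
    for loc in root.findall(".//sm:loc", NS):
        url = loc.text.strip()
        try:
            with urllib.request.urlopen(url) as page:
                final_url = page.geturl()  # urllib follows redirects silently
                body = page.read(100_000).decode("utf-8", errors="replace")
        except urllib.error.URLError as err:
            problems.append((url, f"request failed: {err}"))
            continue
        if final_url != url:
            problems.append((url, f"redirects to {final_url}"))
        elif "noindex" in body.lower():
            problems.append((url, "possible noindex directive; verify manually"))
    return problems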
Specialized Sitemap Types
Beyond standard URL sitemaps, specialized formats exist for specific content.
Image sitemaps help Google discover images, especially important when images load dynamically or are not directly linked in HTML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/product-page/</loc>
    <image:image>
      <image:loc>https://example.com/images/product-photo.jpg</image:loc>
      <image:caption>Product description for image</image:caption>
    </image:image>
  </url>
</urlset>
Video sitemaps provide metadata that helps videos appear in video search results:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://example.com/video-page/</loc>
    <video:video>
      <video:thumbnail_loc>https://example.com/thumbs/video1.jpg</video:thumbnail_loc>
      <video:title>Video Title</video:title>
      <video:description>Video description</video:description>
      <video:content_loc>https://example.com/videos/video1.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>
News sitemaps are specifically for Google News publishers and should include only URLs for content published within the last 48 hours.
For most sites, the standard URL sitemap is sufficient. Image sitemaps help if you rely heavily on image search traffic. Video sitemaps matter if video content is central to your strategy.
Submitting Your Sitemap
Multiple submission methods exist. Use more than one for redundancy.
Search Console submission is the primary method. Navigate to Sitemaps in the left menu, enter your sitemap URL, and submit. Search Console then shows submission status, discovery statistics, and any errors.
Robots.txt declaration works as a passive submission method:
Sitemap: https://example.com/sitemap.xml
Any crawler reading your robots.txt will discover the sitemap location. This method requires no Search Console access and works for all compliant crawlers.
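You can verify the declaration the same way a crawler finds it, by scanning robots.txt for Sitemap lines. A small Python sketch:

import urllib.request

def sitemaps_from_robots(site):
    # Return every sitemap URL declared in the site's robots.txt.
    with urllib.request.urlopen(f"{site}/robots.txt") as resp:
        lines = resp.read().decode("utf-8", errors="replace").splitlines()
    return [line.split(":", 1)[1].strip()
            for line in lines
            if line.lower().startswith("sitemap:")]

print(sitemaps_from_robots("https://example.com"))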
Ping endpoints were historically used to notify search engines of updates. Google deprecated their ping endpoint in June 2023, and it began returning 404 errors by the end of that year. The primary reason: most ping submissions were spam. Bing still supports pinging at https://www.bing.com/ping?sitemap=YOUR_SITEMAP_URL, but for most sites, Search Console submission plus robots.txt declaration is sufficient.
For new sites or major sitemap updates, Search Console submission triggers the fastest processing. For ongoing maintenance, robots.txt declaration ensures crawlers always know where to find current sitemaps.
Common Errors and Fixes
Sitemap errors fall into three categories.
Format errors prevent parsing entirely. Invalid XML structure, missing closing tags, improper nesting, and character encoding issues cause rejection. Namespace errors happen when the xmlns declaration is missing or incorrect. Exceeding size limits (50,000 URLs or 50MB) causes partial processing at best. Validate XML before submission (a minimal check is sketched after the three categories).
Content errors involve problematic URLs. URLs returning non-200 status codes indicate broken links or redirect chains. URLs blocked by robots.txt create a contradiction: “please index this URL” in the sitemap, “don’t crawl this URL” in robots.txt. Canonical URL mismatches occur when sitemap URLs differ from declared canonical URLs. Mixed protocols (http URLs in a sitemap on an https site) cause issues.
Strategic errors mean the sitemap exists but does not help. Including noindexed URLs wastes sitemap space and sends mixed signals. Stale lastmod values that never change teach Google to ignore your freshness signals. Excessive URL counts dilute focus.
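A quick pre-submission check catches the format category. The Python sketch below only verifies well-formedness, the root element, and entry count; it does not validate against the full sitemap schema:

import sys
import xml.etree.ElementTree as ET

def check_sitemap(path, max_entries=50_000):
    try:
        root = ET.parse(path).getroot()  # raises ParseError on malformed XML
    except ET.ParseError as err:
        return f"Malformed XML: {err}"
    if not root.tag.endswith("}urlset") and not root.tag.endswith("}sitemapindex"):
        return f"Unexpected or un-namespaced root element: {root.tag}"
    if len(root) > max_entries:
        return f"Too many entries: {len(root)} (limit {max_entries})"
    return f"OK: {len(root)} entries"

print(check_sitemap(sys.argv[1]))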
Search Console’s sitemap report shows status, last read time, discovered URLs, and any errors. Monitor it after submissions and periodically for established sitemaps.
Generation and Maintenance
Static sitemaps work for small, stable sites. Generate once, update manually when content changes. Any site with regular updates needs automated generation.
WordPress plugins like Yoast SEO or Rank Math generate sitemaps automatically. Verify the default configuration matches your needs. Some plugins include URLs you would rather exclude.
Shopify generates sitemaps automatically at /sitemap.xml. You cannot directly customize it, but products, collections, blogs, and pages are included by default.
For large or complex sites, custom generation provides more control. Database-driven generation queries your content database directly. Crawl-based generation uses a crawler to discover your own URLs. Hybrid approaches combine both methods.
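A database-driven generator can stay small. A Python sketch assuming a pages table with url and content_modified_at columns and an indexable flag (all names illustrative):

import sqlite3
import xml.etree.ElementTree as ET

def generate_sitemap(db_path, out_path):
    # lastmod comes from the real content-change timestamp,
    # never from the time the sitemap was generated.
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT url, date(content_modified_at) FROM pages "
        "WHERE indexable = 1 ORDER BY url LIMIT 50000"
    ).fetchall()
    conn.close()
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for url, modified in rows:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = modified
    ET.ElementTree(urlset).write(out_path, encoding="UTF-8", xml_declaration=True)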
Schedule regeneration appropriately. A news site might regenerate every 15 minutes. An e-commerce site might regenerate nightly. A corporate site might regenerate weekly or on-demand when content is published.
Whatever approach you use, test lastmod accuracy. Compare sitemap dates against actual content changes. If your automation lies about freshness, you are actively harming your site’s trust signals.
Monitoring Performance
Sitemaps require ongoing attention.
Track the ratio of submitted to discovered URLs in Search Console. Major discrepancies suggest format issues or URL filtering.
Review coverage status of sitemap URLs. Are submitted URLs actually getting indexed? What reasons appear for exclusions?
Check the last read date in the sitemap report. If Google last read your sitemap weeks ago, investigate blocking issues.
Compare sitemap URLs against server log data. Are pages in your sitemap getting crawled? How frequently? Are crawlers requesting your sitemap regularly? (A rough log-comparison sketch appears below.)
Run quarterly audits. Compare current sitemap URLs against your content inventory. Are new pages being included? Have removed pages been cleaned out? Has lastmod accuracy degraded?
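For the log comparison mentioned above, a rough Python sketch assuming combined-format access logs (user-agent strings can be spoofed, so treat the results as approximate):

import re

GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
REQUEST_PATH = re.compile(r'"(?:GET|HEAD) (\S+)')

def crawled_paths(log_path):
    # Collect every path Googlebot requested, per the access log.
    paths = set()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            if GOOGLEBOT.search(line):
                match = REQUEST_PATH.search(line)
                if match:
                    paths.add(match.group(1))
    return paths

def never_crawled(sitemap_paths, log_path):
    # Sitemap paths that never show up in the crawl log.
    return set(sitemap_paths) - crawled_paths(log_path)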
The goal is not sitemap perfection for its own sake. The goal is reliable discovery of your important pages. Sitemaps are one tool toward that end, most valuable when combined with strong internal linking and content worth indexing.
Sources
- Google Search Central: Learn About Sitemaps – https://developers.google.com/search/docs/crawling-indexing/sitemaps/overview
- Google Search Central Blog: Sitemaps ping endpoint is going away (June 2023) – https://developers.google.com/search/blog/2023/06/sitemaps-lastmod-ping
- Sitemaps.org Protocol Specification – https://www.sitemaps.org/protocol.html
- Google Search Console Help: Sitemaps Report – https://support.google.com/webmasters/answer/7451001
- Bing Webmaster Tools: Sitemaps Documentation – https://www.bing.com/webmasters/help/Sitemaps-3b5cf6ed