DR Web Studio
© 2026 DR Web Studio. All rights reserved
What Is a Robots.txt File and Is Yours Blocking Google From Your Website?

May 19, 2026
13 min read
There is a file on your website right now that Google reads before it reads anything else. Before it looks at your homepage, before it crawls your service pages, before it evaluates your content for ranking — it reads this file and follows its instructions.

That file is called robots.txt. And for a significant number of Dominican business websites, that file is currently telling Google to stay away.

Not because the business owner intended it. Not because a developer made a strategic decision. Often because a single line was set during the website's development — when you do not want Google indexing an unfinished site — and was never changed when the site went live.

The result is a website that looks complete, loads correctly, and has been actively investing in SEO content — but is invisible to Google because the front door has a "do not enter" sign that nobody remembered to take down.

This article explains what robots.txt actually is, what it controls (and what it does not), the five most common mistakes that cause Dominican websites to block their own Google visibility, and how to check whether your file has a problem in under 60 seconds.

What Robots.txt Actually Is

A robots.txt file is a plain text file located at the root of your website — always at the URL yourdomain.com/robots.txt. It is publicly accessible, which means anyone (and any crawler) can read it.

The file uses a simple syntax to tell search engine bots (Googlebot, Bingbot, and others) which parts of your website they are allowed to crawl. "Crawling" means visiting and reading pages so they can be considered for inclusion in the search index. A page that cannot be crawled cannot be properly indexed, and a page that is not properly indexed rarely appears in Google search results with a useful listing.

The syntax is minimal. A robots.txt file has only a few types of lines:

User-agent: Specifies which crawler the following rules apply to. User-agent: * applies to all crawlers. User-agent: Googlebot applies only to Google's crawler.

Disallow: Specifies which URLs the crawler should not visit. Disallow: /admin/ means do not crawl anything in the admin directory. Disallow: / means do not crawl anything on the entire website.

Allow: Overrides a Disallow rule for specific pages within a blocked directory. Allow: /admin/public-page.html within a block on /admin/ allows that one specific page.

Sitemap: Tells crawlers where your XML sitemap is located — a helpful positive instruction that ensures Google finds all your important pages.

A minimal, correct robots.txt for most Dominican tourism websites looks like this:

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Sitemap: https://www.yourdomain.com/sitemap.xml

This file tells all crawlers: do not access the admin area (which should not be in search results anyway), but everything else is open. And here is where to find all the pages you should index.
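To make the structure concrete, here is a minimal TypeScript sketch of how the four line types could be read into groups. The names parseRobotsTxt, RobotsGroup, and ParsedRobots are illustrative assumptions, not part of any library, and the sketch ignores less common fields such as Crawl-delay.

```typescript
// One group = one or more User-agent lines followed by their rules.
interface RobotsGroup {
  userAgents: string[];
  allow: string[];
  disallow: string[];
}

interface ParsedRobots {
  groups: RobotsGroup[];
  sitemaps: string[];
}

// Hypothetical helper: splits a robots.txt file into groups and sitemap URLs.
function parseRobotsTxt(text: string): ParsedRobots {
  const result: ParsedRobots = { groups: [], sitemaps: [] };
  let current: RobotsGroup | null = null;
  let lastWasUserAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    // Drop comments (everything after '#') and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line

    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "sitemap") {
      // Sitemap lines are global and belong to no group.
      result.sitemaps.push(value);
    } else if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (current === null || !lastWasUserAgent) {
        current = { userAgents: [], allow: [], disallow: [] };
        result.groups.push(current);
      }
      current.userAgents.push(value);
      lastWasUserAgent = true;
    } else if (field === "allow" || field === "disallow") {
      lastWasUserAgent = false;
      // An empty value (for example "Disallow:") restricts nothing, so skip it.
      if (current !== null && value !== "") {
        if (field === "allow") current.allow.push(value);
        else current.disallow.push(value);
      }
    }
  }
  return result;
}
```

Run against the four-line file above, this yields a single group for * with two Disallow paths and one sitemap URL.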

That is all that most Dominican business websites need. The problem arises when the file says something very different from this.

What Robots.txt Does NOT Do (A Critical Distinction)

Before covering the common mistakes, there is a distinction that trips up even experienced developers and is the source of significant misunderstanding about what robots.txt controls.

Robots.txt controls crawling. It does not control indexing.

These are different things:

  • Crawling = Google visiting and reading a page
  • Indexing = Google including a page in its searchable database

A page blocked by robots.txt cannot be properly crawled. But Google can still index it — and show it in search results — if other websites link to it. In that case, Google knows the page exists from the inbound link, but because it cannot crawl it, the search result will show the URL without a meta description: a blank, stub entry with no content preview.

This is the source of the "Indexed, though blocked by robots.txt" error in Google Search Console — a page that appears in results but shows no snippet because Google cannot read its content.

The practical implication: if you want a page completely removed from Google search results, Disallow: in robots.txt is not sufficient. You need a noindex meta tag on the page itself. But for that noindex tag to work, Google must be able to crawl the page — which means the page must not be blocked in robots.txt.

This creates a specific trap for blocked pages: if you block a page in robots.txt and add a noindex tag, Google cannot see the noindex tag because it cannot access the page. The page may still appear in search results from link signals, with no description.

The correct approach for pages you genuinely want out of Google:

  1. Do not block them in robots.txt
  2. Add a <meta name="robots" content="noindex"> tag to the page's HTML
  3. Google crawls the page, sees the noindex instruction, and removes it from results
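The trap described above can be expressed as a small check. This is a rough TypeScript sketch under stated assumptions: hasNoindexMeta and classifyPage are hypothetical names, the HTML is inspected with simple regular expressions rather than a real parser, and noindex sent through an X-Robots-Tag HTTP header is not considered.

```typescript
// True when the HTML has a robots (or googlebot) meta tag whose content includes "noindex".
function hasNoindexMeta(html: string): boolean {
  const metaTags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of metaTags) {
    const isRobotsTag = /\bname\s*=\s*["'](robots|googlebot)["']/i.test(tag);
    const saysNoindex = /\bcontent\s*=\s*["'][^"']*\bnoindex\b/i.test(tag);
    if (isRobotsTag && saysNoindex) return true;
  }
  return false;
}

type PageVerdict =
  | "indexable"          // crawlable, no noindex
  | "will-be-removed"    // crawlable and noindex: the correct setup for removal
  | "blocked-only"       // not crawlable, may still appear as a bare URL
  | "noindex-is-hidden"; // the trap: Google cannot read the noindex tag

// blockedByRobotsTxt is supplied by the caller, for example from a manual check of the file.
function classifyPage(blockedByRobotsTxt: boolean, html: string): PageVerdict {
  const noindex = hasNoindexMeta(html);
  if (blockedByRobotsTxt && noindex) return "noindex-is-hidden";
  if (blockedByRobotsTxt) return "blocked-only";
  if (noindex) return "will-be-removed";
  return "indexable";
}
```

A page that comes back as "noindex-is-hidden" needs its robots.txt block lifted before the noindex tag can do anything.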

The Five Most Common Robots.txt Mistakes on Dominican Websites

Mistake 1 — The Development Mode Disaster

This is the most common and most damaging robots.txt problem on Dominican business websites — and it is almost invisible unless you know to look for it.

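When building a WordPress website, developers often check the "Search Engine Visibility" box in Settings → Reading: "Discourage search engines from indexing this site." In older WordPress versions (before 5.3), this made the site's generated robots.txt read as follows. Newer versions instead add a noindex robots meta tag to every page, which keeps the site out of Google just as effectively: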

User-agent: *
Disallow: /

This single directive — Disallow: / — tells every crawler to stay away from every page on the entire website. It is the correct thing to do during development, when you do not want an unfinished site appearing in search results.

The problem is that when the site launches, this setting is frequently never changed. The business owner sees a live, functional website and assumes Google will find it. The "Discourage search engines" checkbox remains checked. Months pass. The business wonders why they are not appearing in Google. Nobody thinks to check robots.txt.

This exact scenario is extremely common in the Dominican market, where websites are frequently handed off from developers to business owners without a comprehensive technical SEO checklist at launch. The site is "live" — but invisible to every search engine in the world.

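How to check: Visit yourdomain.com/robots.txt in your browser. If you see Disallow: / under User-agent: *, your entire website is blocked. On WordPress, also open your homepage's source (View Page Source) and search for noindex, since recent WordPress versions apply this setting through a robots meta tag rather than through robots.txt.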

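How to fix for WordPress: Go to Settings → Reading and uncheck "Discourage search engines from indexing this site." Then save. The setting itself takes effect immediately, but Google has to recrawl the site before pages start appearing, which can take days or weeks. Requesting indexing for your key pages in Search Console can speed this up.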
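For anyone who prefers to script the check, the following TypeScript sketch scans the text of a robots.txt file for a site-wide block. The function name findSiteWideBlocks is an assumption made for illustration. It only looks for Disallow: / (or the equivalent /*), does not evaluate Allow overrides, and does not detect a noindex meta tag in the page source, which needs a separate check.

```typescript
// Returns the user agents whose group contains a blanket "Disallow: /".
function findSiteWideBlocks(robotsTxt: string): string[] {
  const blocked: string[] = [];
  let agents: string[] = [];
  let collectingAgents = false;

  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;

    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();

    if (field === "user-agent") {
      // A User-agent line after a rule starts a new group.
      if (!collectingAgents) agents = [];
      agents.push(value);
      collectingAgents = true;
    } else if (field === "allow" || field === "disallow") {
      collectingAgents = false;
      if (field === "disallow" && (value === "/" || value === "/*")) {
        for (const agent of agents) {
          if (blocked.indexOf(agent) === -1) blocked.push(agent);
        }
      }
    }
  }
  return blocked;
}

// Example: the development-mode file shown above.
const devModeBlocks = findSiteWideBlocks("User-agent: *\nDisallow: /");
// devModeBlocks contains "*", meaning every crawler is blocked from the whole site.
```

An empty result means no group blocks the entire site, although narrower Disallow rules may still be present.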

Mistake 2 — Blocking CSS and JavaScript Files

A robots.txt file that blocks CSS stylesheets or JavaScript files does not just hide those files from Google — it prevents Google from rendering your pages correctly.

Google renders your website essentially like a browser would: it loads the HTML, then loads the CSS and JavaScript referenced in that HTML, and assembles the complete visual page. When CSS or JavaScript files are blocked in robots.txt, Google receives incomplete page rendering. It sees bare HTML without styling, without interactive elements, without the full content that JavaScript might render.

The consequence is that Google's assessment of the page (its content, its layout, its mobile usability) is based on a degraded version of what real visitors actually see. This can suppress rankings even for pages that are not themselves blocked.

What to check: Look for any Disallow: lines in your robots.txt that reference directories containing .css or .js files. Common problematic lines:

  • Disallow: /wp-content/ (blocks all WordPress media, themes, and plugins including CSS/JS)
  • Disallow: /assets/
  • Disallow: /static/

None of these should be blocked. If your robots.txt contains them, remove or modify the rules.
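A quick way to spot this problem is to compare the Disallow paths against the stylesheet and script URLs your pages load. The TypeScript sketch below assumes you have already collected both lists by hand. findBlockedAssets is a hypothetical helper, and it uses plain prefix matching only, so wildcard rules are not evaluated.

```typescript
interface BlockedAsset {
  asset: string; // path of the CSS or JS file
  rule: string;  // the Disallow path that blocks it
}

// Reports every .css or .js path that starts with one of the Disallow paths.
function findBlockedAssets(disallowRules: string[], assetPaths: string[]): BlockedAsset[] {
  const blocked: BlockedAsset[] = [];
  for (const asset of assetPaths) {
    // Ignore the query string when checking the file extension.
    const pathOnly = asset.replace(/\?.*$/, "");
    if (!/\.(css|js)$/i.test(pathOnly)) continue;

    for (const rule of disallowRules) {
      if (rule !== "" && asset.indexOf(rule) === 0) {
        blocked.push({ asset: asset, rule: rule });
        break; // one matching rule is enough to report the asset
      }
    }
  }
  return blocked;
}

// Example with illustrative paths (not taken from a real site).
const blockedAssets = findBlockedAssets(
  ["/wp-content/"],
  ["/wp-content/themes/site/style.css", "/wp-includes/js/jquery.js?ver=3.7"]
);
// blockedAssets has one entry: style.css, blocked by "/wp-content/".
```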

Mistake 3 — Confusing Robots.txt Blocking With Noindex

This is the conceptual mistake that leads many Dominican website owners and developers to believe their pages are hidden from Google when they are not — or to believe pages are accessible when they are blocked.

  • Disallow: in robots.txt: prevents crawling. Google cannot read the page content.
  • noindex meta tag on the page: prevents indexing. Google can read the page but will not show it in search results.

These are often used interchangeably by people who think they are equivalent. They are not.

Common errors that result from this confusion:

  • Adding Disallow: to block a page from search results, then wondering why it still appears (Google indexed it from a link before the block was added)
  • Adding both Disallow: and noindex to the same page. Google cannot see the noindex instruction because the page is blocked, and may still show the URL from link signals

The correct usage for each:

  • Use Disallow: for pages that should never be crawled and that you are not concerned about appearing in results from link signals (login pages, admin areas, internal search result pages, staging areas)
  • Use noindex for pages that should not appear in search results but where you need Google to crawl the page to see the instruction (thank-you pages, duplicate content pages, filtered navigation pages)

Mistake 4 — Staging Site Rules Deployed to Production

Another extremely common source of crawling disasters on Dominican websites: a robots.txt configuration built for a staging environment that gets deployed to the production site.

During development, it is correct practice to block crawlers on the staging site (staging.yourdomain.com) so that the unfinished version does not appear in Google alongside the live version. The staging robots.txt correctly contains Disallow: /.

When the site is deployed to production, if the robots.txt file is deployed along with the codebase without being updated, the production site inherits the staging block. The result is identical to Mistake 1: a live site that Google cannot crawl.

The difference from Mistake 1 is that this error can be more persistent — it may survive WordPress settings changes because the robots.txt is being served by the web server rather than generated by WordPress. It requires finding and editing or replacing the actual robots.txt file at the site root.

Prevention: Include robots.txt review in every deployment checklist. After any site migration or launch, verify yourdomain.com/robots.txt does not contain Disallow: /.

Mistake 5 — Wildcards That Block More Than Intended

Advanced robots.txt configurations using wildcard characters (*) and end-of-URL markers ($) can accidentally block patterns that were not intended.

A common example on tour operator or e-commerce sites that use URL parameters for filtering:

Disallow: /*?*

This is intended to block all URLs with query parameters (like filtered search results: tours/?category=snorkeling&price=low). The problem is that the wildcard can also match legitimate URLs that happen to contain a ? — including some booking systems and event registration URLs.

Another common example:

Disallow: /blog/

Added by a developer who wanted to block one specific blog category, this rule blocks the entire blog directory — including every individual blog post, every category page, and every tag page. All of it invisible to Google.

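How to test wildcard rules: Google has retired the standalone robots.txt tester. In Search Console, the robots.txt report (Settings → robots.txt) shows the file Google last fetched and any parsing problems, and the URL Inspection tool reports whether a specific URL is blocked. After any change to your robots.txt, inspect a handful of real URLs, including booking and blog URLs, to confirm they are still crawlable.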
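Wildcard rules can also be checked offline before they are published. The TypeScript sketch below follows Google's documented matching behavior as I understand it: * matches any run of characters, a trailing $ anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The names patternToRegExp, isAllowed, and Rule are illustrative assumptions, and other crawlers may resolve conflicts differently.

```typescript
interface Rule {
  type: "allow" | "disallow";
  pattern: string;
}

// Converts a robots.txt path pattern into a RegExp anchored at the start of the path.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.charAt(pattern.length - 1) === "$";
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const escaped = body
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp("^" + escaped + (anchored ? "$" : ""));
}

// path should include the query string, for example "/tours/?category=snorkeling".
function isAllowed(rules: Rule[], path: string): boolean {
  let best: Rule | null = null;
  for (const rule of rules) {
    if (rule.pattern === "" || !patternToRegExp(rule.pattern).test(path)) continue;
    const longer = best === null || rule.pattern.length > best.pattern.length;
    const tieForAllow =
      best !== null && rule.pattern.length === best.pattern.length && rule.type === "allow";
    if (longer || tieForAllow) best = rule;
  }
  // No matching rule means the path is crawlable.
  return best === null || best.type === "allow";
}

// The two examples from this section:
const queryRule: Rule[] = [{ type: "disallow", pattern: "/*?*" }];
const filterBlocked = !isAllowed(queryRule, "/tours/?category=snorkeling&price=low"); // intended
const bookingBlocked = !isAllowed(queryRule, "/booking?date=2026-06-01");             // collateral
const blogRule: Rule[] = [{ type: "disallow", pattern: "/blog/" }];
const postBlocked = !isAllowed(blogRule, "/blog/best-beaches-punta-cana");
```

All three flags come out true, which is the point: the rule aimed at filtered listings also catches a booking URL, and the /blog/ rule catches every post. The booking and blog paths here are made-up examples.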

What a Correct Robots.txt Does for Your SEO (The Positive Side)

Robots.txt is not just about avoiding mistakes. Used correctly, it actively helps your SEO by directing Google's crawl budget toward your most important pages.

For a Dominican tourism website, the pages that matter for SEO are: your homepage, your service pages, your blog posts, your about page, your contact page. The pages that do not need to be indexed are: admin areas, login pages, thank-you pages after form submissions, internal search result pages, and pages that exist for technical reasons but have no user value.

By explicitly blocking the no-value pages, you allow Google to concentrate its crawl activity on the pages that matter — which can modestly accelerate the indexing and ranking of those pages, particularly on newer sites or sites with large amounts of content.

The Sitemap directive is also a positive contribution: pointing Google directly to your sitemap file ensures Google has a complete list of your priority URLs rather than having to discover them through link-following alone.

A well-configured robots.txt for a Punta Cana tour operator with WordPress looks like:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /thank-you/
Allow: /wp-admin/admin-ajax.php

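Sitemap: https://www.yourdomain.com/sitemap.xml

This file blocks the admin area and low-value user-journey pages, leaves everything else open to every crawler, and points crawlers to the sitemap. It deliberately has no separate User-agent: Googlebot group: a crawler follows only the most specific group that names it, so a Googlebot group containing just Allow: / would make Googlebot ignore every Disallow rule listed under User-agent: *.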
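One detail is worth checking in any file that combines a User-agent: * group with a crawler-specific group. Under Google's documented behavior, a crawler obeys only the most specific group that names it and ignores the rest, so a Googlebot group containing only Allow: / lifts the wildcard group's Disallow rules for Googlebot. The TypeScript sketch below illustrates that selection. selectGroup and Group are hypothetical names, the prefix match on the crawler name is a simplification, and groups repeating the same user agent are not merged here.

```typescript
interface Group {
  userAgents: string[];
  disallow: string[];
}

// Picks the single group a crawler would follow: the longest matching
// user-agent token wins, and "*" is used only when nothing else matches.
function selectGroup(groups: Group[], crawler: string): Group | null {
  const name = crawler.toLowerCase();
  let wildcard: Group | null = null;
  let best: Group | null = null;
  let bestLength = 0;

  for (const group of groups) {
    for (const agent of group.userAgents) {
      const token = agent.toLowerCase();
      if (token === "*") {
        if (wildcard === null) wildcard = group;
      } else if (name.indexOf(token) === 0 && token.length > bestLength) {
        best = group;
        bestLength = token.length;
      }
    }
  }
  return best !== null ? best : wildcard;
}

// A wildcard group with Disallow rules plus a Googlebot group that only allows.
const exampleGroups: Group[] = [
  { userAgents: ["*"], disallow: ["/wp-admin/", "/cart/", "/checkout/"] },
  { userAgents: ["Googlebot"], disallow: [] },
];
const googlebotGroup = selectGroup(exampleGroups, "Googlebot");
const bingbotGroup = selectGroup(exampleGroups, "Bingbot");
// googlebotGroup has no Disallow rules; bingbotGroup is the wildcard group with three.
```

If Googlebot needs the same restrictions as everyone else, the simplest option is to leave out the crawler-specific group entirely.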

How to Check Your Robots.txt Right Now (60-Second Audit)

Step 1 — Find and read the file: Open a browser and go to yourdomain.com/robots.txt. Read it. If a group contains Disallow: / with nothing after the slash, and no Allow: rules that reopen parts of the site, every crawler that group applies to is blocked from your entire site.

Step 2 — Check in Google Search Console: In Search Console, navigate to Settings and open the robots.txt report. It shows the robots.txt files Google found for your site, when each was last fetched, and any errors or warnings. It does not test individual URLs; use the URL Inspection tool in Step 3 for that.

Step 3 — Use the URL Inspection tool: In Search Console, paste any of your important service page URLs into the URL Inspection tool (the search bar at the top). If the result reports "Blocked by robots.txt," Google is not allowed to crawl that URL. "URL is not on Google" on its own can have other causes, so open the page indexing details to see the stated reason.

Step 4 — Check the Page indexing report: In Search Console's Page indexing report (previously called Coverage), look through the reasons pages are not indexed for "Blocked by robots.txt." If you see important pages listed there, your robots.txt has an unintended block.

Step 5 — Google the site: Search site:yourdomain.com in Google. This shows a sample of the pages Google has indexed from your domain, and the count is approximate. If you have 50 pages but only 3 appear in this search, there is likely an indexing problem: either robots.txt blocking or noindex tags preventing the rest from appearing.

How Next.js and DR Web Studio Handle Robots.txt

For websites built on Next.js — the foundation of every DR Web Studio build — the robots.txt file is generated programmatically from a robots.ts file in the app directory. This gives it several advantages over a manually maintained static file:

It is much harder to deploy with development-mode blocking rules by accident, because the production configuration is explicitly separate from any development configuration. The Sitemap URL stays current because it references the canonical domain configured in the project's environment variables. It can be updated by changing a single configuration file rather than manually editing a text file that might be overwritten on the next deployment.
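As an illustration of the idea, not DR Web Studio's actual configuration, the logic behind such a file might look like the TypeScript sketch below. The interfaces are a local, reduced mirror of the shape Next.js expects from app/robots.ts (the real type is MetadataRoute.Robots from the next package), and buildRobots, SITE_ENV, and SITE_URL are hypothetical names chosen for this example.

```typescript
// Reduced local mirror of the object Next.js turns into robots.txt.
// In a real project, use: import type { MetadataRoute } from "next"
interface RobotsRule {
  userAgent: string;
  allow?: string;
  disallow?: string | string[];
}

interface RobotsConfig {
  rules: RobotsRule;
  sitemap?: string;
}

// Builds the rules from the deployment environment, so a staging build
// blocks crawlers and a production build cannot inherit that block.
function buildRobots(isProduction: boolean, siteUrl: string): RobotsConfig {
  if (!isProduction) {
    return { rules: { userAgent: "*", disallow: "/" } };
  }
  return {
    rules: { userAgent: "*", allow: "/", disallow: ["/admin/"] },
    // Strip trailing slashes so the sitemap URL has exactly one separator.
    sitemap: siteUrl.replace(/\/+$/, "") + "/sitemap.xml",
  };
}

// Wiring inside app/robots.ts would look roughly like this
// (SITE_ENV and SITE_URL are assumed environment variable names):
//
//   export default function robots(): MetadataRoute.Robots {
//     return buildRobots(process.env.SITE_ENV === "production", process.env.SITE_URL ?? "");
//   }

const productionRobots = buildRobots(true, "https://www.yourdomain.com/");
const stagingRobots = buildRobots(false, "https://staging.yourdomain.com");
```

Because the blocking rule exists only in the non-production branch, the staging block cannot reach the live site unless the environment flag itself is set incorrectly.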

For clients using Sanity CMS, the robots.ts configuration is set once at launch and never needs to be touched again — because content management happens through Sanity, not through file changes that could accidentally overwrite the robots.txt.

At DR Web Studio, robots.txt configuration is part of our standard launch checklist alongside sitemap submission, Search Console verification, and structured data testing. The file is reviewed and confirmed before any site goes live — because the cost of a misconfigured robots.txt, in lost Google visibility and delayed rankings, is too high to leave to chance.

If you want to verify that your current robots.txt is correctly configured and that no important pages are being inadvertently blocked, request a free consultation. We will run a complete technical SEO audit including robots.txt review, Search Console coverage analysis, and indexation status for your key pages — and show you exactly what Google can and cannot currently see on your website.

One line can undo months of SEO work. Two minutes of checking can confirm it hasn't.
