What does robots.txt do?

Robots.txt is a text file at yourdomain.com/robots.txt that tells search engine crawlers which parts of your website they are allowed to visit. It controls crawling — whether Google can read your pages — but not indexing directly. A page blocked in robots.txt can still appear in Google search results if other sites link to it, just without a meta description snippet.

How do I know if robots.txt is blocking my website from Google?

Visit yourdomain.com/robots.txt in your browser. If you see 'Disallow: /' with nothing after the slash, in a group that applies to Googlebot (such as 'User-agent: *'), your entire site is blocked. Also check Google Search Console's Page indexing report (formerly Coverage) for pages with 'Blocked by robots.txt' status, and use the URL Inspection tool to see whether specific pages are reported as 'Blocked by robots.txt'.

What is the difference between robots.txt Disallow and noindex?

Disallow in robots.txt prevents crawling — Google cannot read the page content. Noindex is a meta tag on the page that prevents indexing — Google can read the page but won't show it in search results. If you add both Disallow and noindex to the same page, Google cannot see the noindex instruction because the page is blocked, so the noindex has no effect. To properly remove a page from search results, allow it to be crawled and use noindex.

Why does my WordPress website have Disallow: / in robots.txt?

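WordPress has a setting under Settings → Reading called 'Discourage search engines from indexing this site.' When this box is checked (typically during development), WordPress tells search engines to stay away from the whole site. Versions before 5.3 do this by adding 'Disallow: /' to the robots.txt that WordPress generates; newer versions add a noindex robots meta tag to every page instead. Either way, if the setting was never unchecked when the site went live, your entire website stays out of Google. Uncheck this box and save to fix it. If 'Disallow: /' is still there afterwards, look for a physical robots.txt file at the site root or an SEO plugin that is writing the rule.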

What should a correct robots.txt look like for a Dominican business website?

A correct robots.txt for most Dominican tourism businesses blocks only admin areas and low-value pages, leaves everything else open to all crawlers including Googlebot, and includes a Sitemap directive. It should never contain 'Disallow: /' (which blocks the whole site) and should not block CSS or JavaScript files that Google needs to render pages correctly.

What Is a Robots.txt File and Is Yours Blocking Google From Your Website?

There is a file on your website right now that Google reads before it reads anything else. Before it looks at your homepage, before it crawls your service pages, before it evaluates your content for ranking — it reads this file and follows its instructions.

That file is called robots.txt. And for a significant number of Dominican business websites, that file is currently telling Google to stay away.

Not because the business owner intended it. Not because a developer made a strategic decision. Often because a single line was set during the website's development — when you do not want Google indexing an unfinished site — and was never changed when the site went live.

The result is a website that looks complete, loads correctly, and has been actively investing in SEO content — but is invisible to Google because the front door has a "do not enter" sign that nobody remembered to take down.

This article explains what robots.txt actually is, what it controls (and what it does not), the five most common mistakes that cause Dominican websites to block their own Google visibility, and how to check whether your file has a problem in under 60 seconds.

What Robots.txt Actually Is

A robots.txt file is a plain text file located at the root of your website — always at the URL yourdomain.com/robots.txt. It is publicly accessible, which means anyone (and any crawler) can read it.

The file uses a simple syntax to tell search engine bots (Googlebot, Bingbot, and others) which parts of your website they are allowed to crawl. "Crawling" means visiting and reading pages so they can be considered for inclusion in the search index. A page that cannot be crawled cannot be properly indexed, and a page that is not properly indexed cannot rank for its content in Google search results.

The syntax is minimal. A robots.txt file has only a few types of lines:

User-agent: Specifies which crawler the following rules apply to. User-agent: * applies to all crawlers. User-agent: Googlebot applies only to Google's crawler.

Disallow: Specifies which URLs the crawler should not visit. Disallow: /admin/ means do not crawl anything in the admin directory. Disallow: / means do not crawl anything on the entire website.

Allow: Overrides a Disallow rule for specific pages within a blocked directory. Allow: /admin/public-page.html within a block on /admin/ allows that one specific page.

Sitemap: Tells crawlers where your XML sitemap is located. It is a helpful positive instruction that makes it easier for Google to find all your important pages.

A minimal, correct robots.txt for most Dominican tourism websites looks like this:

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Sitemap: https://www.yourdomain.com/sitemap.xml

This file tells all crawlers: do not access the admin area (which should not be in search results anyway), but everything else is open. And here is where to find all the pages you should index.

That is all that most Dominican business websites need. The problem arises when the file says something very different from this.
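To see how a crawler would read the example file, here is a minimal sketch using Python's standard-library urllib.robotparser. The domain and the two test paths are placeholders chosen for illustration. This parser only approximates Google's matching (it does not understand * or $ wildcards inside paths), so treat the result as a quick sanity check rather than Google's verdict.

```python
from urllib.robotparser import RobotFileParser

# The example file from above, as a string (no network request is made).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Sitemap: https://www.yourdomain.com/sitemap.xml
"""

def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
BASE = "https://www.yourdomain.com"  # placeholder domain

# Hypothetical URLs: one public page and one admin page.
tours_allowed = rules.can_fetch("Googlebot", BASE + "/tours/snorkeling/")
admin_allowed = rules.can_fetch("Googlebot", BASE + "/wp-admin/options.php")
sitemaps = rules.site_maps()  # available in Python 3.8 and later

print(tours_allowed)  # True
print(admin_allowed)  # False
print(sitemaps)       # ['https://www.yourdomain.com/sitemap.xml']
```

The public page is crawlable, the admin page is not, and the Sitemap line is picked up separately from the crawl rules.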

What Robots.txt Does NOT Do (A Critical Distinction)

Before covering the common mistakes, there is a distinction that trips up even experienced developers and is the source of significant misunderstanding about what robots.txt controls.

Robots.txt controls crawling. It does not control indexing.

These are different things:

Crawling = Google visiting and reading a page
Indexing = Google including a page in its searchable database

A page blocked by robots.txt cannot be properly crawled. But Google can still index it — and show it in search results — if other websites link to it. In that case, Google knows the page exists from the inbound link, but because it cannot crawl it, the search result will show the URL without a meta description: a blank, stub entry with no content preview.

This is the source of the "Indexed, though blocked by robots.txt" error in Google Search Console — a page that appears in results but shows no snippet because Google cannot read its content.

The practical implication: if you want a page completely removed from Google search results, Disallow: in robots.txt is not sufficient. You need a noindex meta tag on the page itself. But for that noindex tag to work, Google must be able to crawl the page — which means the page must not be blocked in robots.txt.

This creates a specific trap for blocked pages: if you block a page in robots.txt and add a noindex tag, Google cannot see the noindex tag because it cannot access the page. The page may still appear in search results from link signals, with no description.

The correct approach for pages you genuinely want out of Google:

Do not block them in robots.txt
Add a <meta name="robots" content="noindex"> tag to the page's HTML
Google crawls the page, sees the noindex instruction, and removes it from results
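The trap described above can be detected mechanically. The following is a rough sketch, assuming you already have the robots.txt text and the page's HTML as strings; the function name noindex_status, the returned messages, and the sample URL are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]

def noindex_status(robots_txt, url, html, agent="Googlebot"):
    """Report whether a noindex tag on the page can actually be seen."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    finder = RobotsMetaFinder()
    finder.feed(html)
    has_noindex = "noindex" in finder.directives
    crawlable = rules.can_fetch(agent, url)
    if has_noindex and crawlable:
        return "noindex will be seen"
    if has_noindex and not crawlable:
        return "noindex is hidden by robots.txt"
    return "no noindex tag"

page = '<html><head><meta name="robots" content="noindex"></head></html>'
url = "https://www.yourdomain.com/thank-you/"  # placeholder URL

print(noindex_status("User-agent: *\nDisallow: /thank-you/\n", url, page))
print(noindex_status("User-agent: *\nDisallow: /admin/\n", url, page))
```

The first call reports the trap (the page is blocked, so the tag is never read); the second reports the working setup, where the page stays crawlable and the noindex tag can take effect.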

The Five Most Common Robots.txt Mistakes on Dominican Websites

Mistake 1 — The Development Mode Disaster

This is the most common and most damaging robots.txt problem on Dominican business websites — and it is almost invisible unless you know to look for it.

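When building a WordPress website, developers often check the "Search Engine Visibility" box in Settings → Reading: "Discourage search engines from indexing this site." In WordPress versions before 5.3, this puts the following into the robots.txt that WordPress generates (newer versions leave robots.txt alone and add a noindex robots meta tag to every page instead, which keeps the site out of Google just as effectively):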

User-agent: *
Disallow: /

This single directive — Disallow: / — tells every crawler to stay away from every page on the entire website. It is the correct thing to do during development, when you do not want an unfinished site appearing in search results.

The problem is that when the site launches, this setting is frequently never changed. The business owner sees a live, functional website and assumes Google will find it. The "Discourage search engines" checkbox remains checked. Months pass. The business wonders why they are not appearing in Google. Nobody thinks to check robots.txt.

This exact scenario is extremely common in the Dominican market, where websites are frequently handed off from developers to business owners without a comprehensive technical SEO checklist at launch. The site is "live" — but invisible to every search engine in the world.

How to check: Visit yourdomain.com/robots.txt in your browser. If you see Disallow: /, your entire website is blocked.

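How to fix for WordPress: Go to Settings → Reading and uncheck "Discourage search engines from indexing this site." Then save. The block is lifted immediately, but Google still has to recrawl the site before pages begin to appear, which can take anywhere from a few days to several weeks.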
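The browser check can also be scripted. This is a minimal sketch using only Python's standard library; the helper names fetch_robots_txt and blocks_whole_site are invented for this example, and yourdomain.com is a placeholder. The sample strings stand in for a development-mode file and a healthy one.

```python
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser

def fetch_robots_txt(domain: str) -> str:
    """Download https://<domain>/robots.txt and return it as text."""
    with urlopen(f"https://{domain}/robots.txt", timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")

def blocks_whole_site(robots_txt: str, agent: str = "Googlebot") -> bool:
    """True if the given crawler is not even allowed to fetch the homepage."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(agent, "/")

development_file = "User-agent: *\nDisallow: /\n"
healthy_file = "User-agent: *\nDisallow: /wp-admin/\n"

print(blocks_whole_site(development_file))  # True
print(blocks_whole_site(healthy_file))      # False
```

For a live site, the same check would be blocks_whole_site(fetch_robots_txt("yourdomain.com")), with your own domain substituted.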

Mistake 2 — Blocking CSS and JavaScript Files

A robots.txt file that blocks CSS stylesheets or JavaScript files does not just hide those files from Google — it prevents Google from rendering your pages correctly.

Google renders your website essentially like a browser would: it loads the HTML, then loads the CSS and JavaScript referenced in that HTML, and assembles the complete visual page. When CSS or JavaScript files are blocked in robots.txt, Google receives incomplete page rendering. It sees bare HTML without styling, without interactive elements, without the full content that JavaScript might render.

The consequence is that Google's assessment of the page (its content, its layout, its mobile usability) is based on a degraded version of what real visitors actually see. This can suppress rankings even for pages that are technically not blocked.

What to check: Look for any Disallow: lines in your robots.txt that reference directories containing .css or .js files. Common problematic lines:

Disallow: /wp-content/ (blocks all WordPress media, themes, and plugins including CSS/JS)
Disallow: /assets/
Disallow: /static/

None of these should be blocked. If your robots.txt contains them, remove or modify the rules.
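One way to audit this is to take the stylesheet and script URLs a page references and test each against the file. The sketch below assumes you have already collected those URLs; the function name blocked_assets and the sample paths are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

def blocked_assets(robots_txt, asset_urls, agent="Googlebot"):
    """Return the .css and .js URLs that the crawler may not fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    blocked = []
    for url in asset_urls:
        # Look at the path only, so "style.css?ver=1.2" still counts as CSS.
        path = urlsplit(url).path.lower()
        is_asset = path.endswith((".css", ".js"))
        if is_asset and not rules.can_fetch(agent, url):
            blocked.append(url)
    return blocked

robots_txt = "User-agent: *\nDisallow: /wp-content/\n"
assets = [
    "/wp-content/themes/tours/style.css?ver=1.2",
    "/wp-includes/js/jquery/jquery.min.js",
    "/wp-content/uploads/beach.jpg",
]

print(blocked_assets(robots_txt, assets))
# ['/wp-content/themes/tours/style.css?ver=1.2']
```

An empty list is the result you want: it means none of the listed rendering resources are hidden from the crawler.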

Mistake 3 — Confusing Robots.txt Blocking With Noindex

This is the conceptual mistake that leads many Dominican website owners and developers to believe their pages are hidden from Google when they are not — or to believe pages are accessible when they are blocked.

Disallow: in robots.txt: prevents crawling. Google cannot read the page content.
noindex meta tag on the page: prevents indexing. Google can read the page but will not show it in search results.

These are often used interchangeably by people who think they are equivalent. They are not.

Common errors that result from this confusion:

Adding Disallow: to block a page from search results, then wondering why it still appears (Google indexed it from a link before the block was added)
Adding both Disallow: and noindex to the same page, so that Google cannot see the noindex instruction because the page is blocked, and may still show the URL from link signals

The correct usage for each:

Use Disallow: for pages that should never be crawled and that you are not concerned about appearing in results from link signals (login pages, admin areas, internal search result pages, staging areas)
Use noindex for pages that should not appear in search results but where you need Google to crawl the page to see the instruction (thank-you pages, duplicate content pages, filtered navigation pages)

Mistake 4 — Staging Site Rules Deployed to Production

Another extremely common source of crawling disasters on Dominican websites: a robots.txt configuration built for a staging environment that gets deployed to the production site.

During development, it is correct practice to block crawlers on the staging site (staging.yourdomain.com) so that the unfinished version does not appear in Google alongside the live version. The staging robots.txt correctly contains Disallow: /.

When the site is deployed to production, if the robots.txt file is deployed along with the codebase without being updated, the production site inherits the staging block. The result is identical to Mistake 1: a live site that Google cannot crawl.

The difference from Mistake 1 is that this error can be more persistent — it may survive WordPress settings changes because the robots.txt is being served by the web server rather than generated by WordPress. It requires finding and editing or replacing the actual robots.txt file at the site root.

Prevention: Include robots.txt review in every deployment checklist. After any site migration or launch, verify yourdomain.com/robots.txt does not contain Disallow: /.

Mistake 5 — Wildcards That Block More Than Intended

Advanced robots.txt configurations using wildcard characters (*) and end-of-URL markers ($) can accidentally block patterns that were not intended.

A common example on tour operator or e-commerce sites that use URL parameters for filtering:

Disallow: /*?*

This is intended to block all URLs with query parameters (like filtered search results: tours/?category=snorkeling&price=low). The problem is that the wildcard can also match legitimate URLs that happen to contain a ? — including some booking systems and event registration URLs.

Another common example:

Disallow: /blog/

Added by a developer who wanted to block one specific blog category, this rule blocks the entire blog directory — including every individual blog post, every category page, and every tag page. All of it invisible to Google.

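How to test wildcard rules: Google has retired the standalone robots.txt tester. The robots.txt report in Search Console (Settings → robots.txt) now shows which robots.txt files Google fetched and any problems it found while reading them, but it does not test individual URLs. To check whether a specific URL is blocked, use the URL Inspection tool, and test your important URLs again after any change to robots.txt.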
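Wildcard rules can also be checked offline before they are published. Python's built-in urllib.robotparser does not interpret * and $, so the sketch below is a small hand-written matcher that follows the documented pattern rules (* matches any run of characters, a trailing $ anchors the end of the URL, everything else is a prefix match). The function names and sample paths are made up for illustration, and this is an approximation rather than Google's own matcher.

```python
import re

def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal text, then turn each * into "match anything".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def rule_matches(pattern: str, path: str) -> bool:
    """True if the rule pattern applies to the path (including any query string)."""
    return pattern_to_regex(pattern).match(path) is not None

# The query-parameter rule catches filtered listings, as intended...
print(rule_matches("/*?*", "/tours/?category=snorkeling&price=low"))  # True
# ...but also a hypothetical booking URL that should stay crawlable.
print(rule_matches("/*?*", "/book/?date=2025-01-15"))                 # True
print(rule_matches("/*?*", "/tours/snorkeling/"))                     # False

# A directory rule is a prefix match, so it covers every post under it.
print(rule_matches("/blog/", "/blog/best-beaches-punta-cana/"))       # True
```

Running a list of your real URLs through rule_matches for each Disallow pattern shows quickly whether a rule reaches further than intended.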

What a Correct Robots.txt Does for Your SEO (The Positive Side)

Robots.txt is not just about avoiding mistakes. Used correctly, it actively helps your SEO by directing Google's crawl budget toward your most important pages.

For a Dominican tourism website, the pages that matter for SEO are: your homepage, your service pages, your blog posts, your about page, your contact page. The pages that do not need to be indexed are: admin areas, login pages, thank-you pages after form submissions, internal search result pages, and pages that exist for technical reasons but have no user value.

By explicitly blocking the no-value pages, you allow Google to concentrate its crawl activity on the pages that matter — which can modestly accelerate the indexing and ranking of those pages, particularly on newer sites or sites with large amounts of content.

The Sitemap directive is also a positive contribution: pointing Google directly to your sitemap file ensures Google has a complete list of your priority URLs rather than having to discover them through link-following alone.

A well-configured robots.txt for a Punta Cana tour operator with WordPress looks like:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /thank-you/
Allow: /wp-admin/admin-ajax.php

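Sitemap: https://www.yourdomain.com/sitemap.xml

This file blocks the admin area and low-value user-journey pages, leaves everything else open to every crawler including Googlebot, and directs crawlers to the sitemap. It deliberately has no separate User-agent: Googlebot group. A crawler follows only the group that most specifically matches its name, so a Googlebot group containing Allow: / would make Google ignore every Disallow line in the User-agent: * group.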
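When a file contains more than one User-agent group, a crawler obeys only the group that best matches its name; the groups are not combined. The sketch below illustrates this with Python's urllib.robotparser and a hypothetical two-group file. The parser is an approximation of Google's behaviour, but it agrees with it on this point.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a general group plus a Googlebot-specific group.
TWO_GROUPS = """\
User-agent: *
Disallow: /cart/

User-agent: Googlebot
Allow: /
"""

rules = RobotFileParser()
rules.parse(TWO_GROUPS.splitlines())

# Googlebot uses its own group, so the /cart/ rule never applies to it.
googlebot_cart = rules.can_fetch("Googlebot", "/cart/")
# A crawler without its own group falls back to the * group.
bingbot_cart = rules.can_fetch("Bingbot", "/cart/")

print(googlebot_cart)  # True
print(bingbot_cart)    # False
```

If a crawler-specific group is ever needed, it has to repeat every Disallow line that should still apply to that crawler.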

How to Check Your Robots.txt Right Now (60-Second Audit)

Step 1: Find and read the file. Open a browser and go to yourdomain.com/robots.txt. Read it. If you see Disallow: / with nothing after the slash, in a group that applies to Googlebot (User-agent: * or User-agent: Googlebot) and with no Allow: rules reopening parts of the site, your entire site is blocked.

Step 2: Check in Google Search Console. In Search Console, go to Settings and open the robots.txt report. It shows the robots.txt file Google last fetched for your site, when it was fetched, and any warnings or errors found while reading it.

Step 3: Use the URL Inspection tool. In Search Console, paste any of your important service page URLs into the URL Inspection tool (the search bar at the top). If it shows "Blocked by robots.txt," that URL cannot be crawled. If it shows "URL is not on Google," the page is not indexed; open the details to see whether robots.txt or something else is the reason.

Step 4: Check the Page indexing report. In Search Console's Page indexing report (Indexing → Pages, formerly called Coverage), look through the reasons pages are not indexed for "Blocked by robots.txt." If you see important pages there, your robots.txt has an unintended block.

Step 5: Google the site. Search site:yourdomain.com in Google. This shows a sample of the pages Google has indexed from your domain. If you have 50 pages but only 3 appear in this search, there is likely an indexing problem: either robots.txt blocking or noindex tags preventing the rest from appearing.

How Next.js and DR Web Studio Handle Robots.txt

For websites built on Next.js — the foundation of every DR Web Studio build — the robots.txt file is generated programmatically from a robots.ts file in the app directory. This gives it several advantages over a manually maintained static file:

It is much harder to deploy with development-mode blocking rules by accident, because the production configuration is explicitly separate from any development configuration.
The Sitemap URL stays current because it references the canonical domain configured in the project's environment variables.
It can be updated by changing a single configuration file rather than manually editing a text file that might be overwritten on the next deployment.
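The underlying idea, generating the file from the deployment environment instead of hand-editing it, is not specific to Next.js. The sketch below expresses that principle in Python for illustration only. It is not the Next.js robots.ts API, and the function name, the environment labels, and the blocked path are assumptions.

```python
def build_robots_txt(environment: str, site_url: str) -> str:
    """Return robots.txt text for the given deployment environment."""
    if environment != "production":
        # Staging, preview, and local builds are closed to all crawlers.
        return "User-agent: *\nDisallow: /\n"
    base = site_url.rstrip("/")
    lines = [
        "User-agent: *",
        "Disallow: /admin/",
        # The sitemap URL is derived from the configured domain.
        f"Sitemap: {base}/sitemap.xml",
    ]
    return "\n".join(lines) + "\n"

production = build_robots_txt("production", "https://www.yourdomain.com/")
staging = build_robots_txt("staging", "https://staging.yourdomain.com")

print(production)
print(staging)
```

Because the blocking rule is tied to the environment name rather than to a file on disk, copying the codebase from staging to production does not carry the block along with it.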

For clients using Sanity CMS, the robots.ts configuration is set once at launch and never needs to be touched again — because content management happens through Sanity, not through file changes that could accidentally overwrite the robots.txt.

At DR Web Studio, robots.txt configuration is part of our standard launch checklist alongside sitemap submission, Search Console verification, and structured data testing. The file is reviewed and confirmed before any site goes live — because the cost of a misconfigured robots.txt, in lost Google visibility and delayed rankings, is too high to leave to chance.

If you want to verify that your current robots.txt is correctly configured and that no important pages are being inadvertently blocked, request a free consultation. We will run a complete technical SEO audit including robots.txt review, Search Console coverage analysis, and indexation status for your key pages — and show you exactly what Google can and cannot currently see on your website.

One line can undo months of SEO work. Two minutes of checking can confirm it hasn't.