

There is a file on your website right now that Google reads before it reads anything else. Before it looks at your homepage, before it crawls your service pages, before it evaluates your content for ranking — it reads this file and follows its instructions.
That file is called robots.txt. And for a significant number of Dominican business websites, that file is currently telling Google to stay away.
Not because the business owner intended it. Not because a developer made a strategic decision. Often because a single line was set during the website's development — when you do not want Google indexing an unfinished site — and was never changed when the site went live.
The result is a website that looks complete, loads correctly, and has been actively investing in SEO content — but is invisible to Google because the front door has a "do not enter" sign that nobody remembered to take down.
This article explains what robots.txt actually is, what it controls (and what it does not), the five most common mistakes that cause Dominican websites to block their own Google visibility, and how to check whether your file has a problem in under 60 seconds.
A robots.txt file is a plain text file located at the root of your website — always at the URL yourdomain.com/robots.txt. It is publicly accessible, which means anyone (and any crawler) can read it.
The file uses a simple syntax to tell search engine bots — Googlebot, Bingbot, and others — which parts of your website they are allowed to crawl. "Crawling" means visiting and reading pages so they can be considered for inclusion in the search index. A page that cannot be crawled cannot be properly indexed. A page that is not properly indexed does not appear in Google search results.
The syntax is minimal. A robots.txt file has only a few types of lines:
User-agent: Specifies which crawler the following rules apply to. User-agent: * applies to all crawlers. User-agent: Googlebot applies only to Google's crawler.
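Disallow: Specifies which URLs the crawler should not visit. Disallow: /admin/ means do not crawl anything in the admin directory. Disallow: / means do not crawl anything on the entire website. Rules match the beginning of the URL path and are case-sensitive, so Disallow: /tour also blocks /tours/ and /tour-packages/ but not /Tours/.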
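Allow: Overrides a Disallow rule for specific pages within a blocked directory. Allow: /admin/public-page.html alongside Disallow: /admin/ keeps that one page crawlable. When rules conflict, Google follows the most specific (longest) matching rule, so the order of the lines does not matter.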
Sitemap: Tells crawlers where your XML sitemap is located. It is a helpful positive instruction that helps Google find all your important pages.
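The User-agent line has a consequence that is easy to miss: a crawler follows only one group. Google's documentation says a crawler uses the group that names it most specifically and ignores all the others. It falls back to User-agent: * only when no group names it. The TypeScript sketch below illustrates that selection under simplifying assumptions. It uses exact, case-insensitive user-agent matching and merges repeated groups for the same name. parseGroups and rulesFor are illustrative names, not a real library.

```typescript
// One User-agent group: the crawler names it lists and the rule lines under it.
type Group = { agents: string[]; rules: string[] };

// Split robots.txt text into groups. Consecutive User-agent lines share one
// group; a User-agent line that comes after a rule line starts a new group.
function parseGroups(text: string): Group[] {
  const groups: Group[] = [];
  let current: Group | undefined;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue; // blank or malformed line
    const field = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (field === "user-agent") {
      if (!current || !lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if (field === "allow" || field === "disallow") {
      current?.rules.push(`${field}: ${value}`);
      lastWasAgent = false;
    }
    // Other fields, such as Sitemap, do not belong to any group.
  }
  return groups;
}

// Return the rules a crawler obeys: every group that names it, merged,
// or, only if no group names it, the User-agent: * groups.
function rulesFor(crawler: string, groups: Group[]): string[] {
  const named = groups.filter((g) => g.agents.includes(crawler.toLowerCase()));
  const chosen = named.length > 0 ? named : groups.filter((g) => g.agents.includes("*"));
  return chosen.flatMap((g) => g.rules);
}

const example = [
  "User-agent: *",
  "Disallow: /wp-admin/",
  "Disallow: /cart/",
  "",
  "User-agent: Googlebot",
  "Allow: /",
].join("\n");

const groups = parseGroups(example);
console.log(rulesFor("Googlebot", groups)); // only its own group: allow: /
console.log(rulesFor("Bingbot", groups)); // falls back to the * group's two Disallow rules
```

In this example, adding a Googlebot group silently removes the wp-admin and cart blocks for Google, because Googlebot no longer reads the * group at all.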
A minimal, correct robots.txt for most Dominican tourism websites looks like this:
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Sitemap: https://www.yourdomain.com/sitemap.xml
This file tells all crawlers: do not access the admin area (which should not be in search results anyway), but everything else is open. And here is where to find all the pages you should index.
That is all that most Dominican business websites need. The problem arises when the file says something very different from this.
Before covering the common mistakes, there is a distinction that trips up even experienced developers and is the source of significant misunderstanding about what robots.txt controls.
Robots.txt controls crawling. It does not control indexing.
These are different things:
A page blocked by robots.txt cannot be properly crawled. But Google can still index it — and show it in search results — if other websites link to it. In that case, Google knows the page exists from the inbound link, but because it cannot crawl it, the search result will show the URL without a meta description: a blank, stub entry with no content preview.
This is the source of the "Indexed, though blocked by robots.txt" warning in Google Search Console: a page that appears in results but shows no snippet because Google cannot read its content.
The practical implication: if you want a page completely removed from Google search results, Disallow: in robots.txt is not sufficient. You need a noindex meta tag on the page itself. But for that noindex tag to work, Google must be able to crawl the page — which means the page must not be blocked in robots.txt.
This creates a specific trap for blocked pages: if you block a page in robots.txt and add a noindex tag, Google cannot see the noindex tag because it cannot access the page. The page may still appear in search results from link signals, with no description.
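You can audit for this trap yourself, because you can still fetch a page's HTML even when robots.txt blocks Google from doing so. The hedged TypeScript sketch below checks a page's HTML for a robots noindex meta tag. It then combines that result with a flag saying whether robots.txt blocks the URL, which you would take from Search Console's URL Inspection tool or your own check. hasNoindexMeta, diagnose, and the outcome labels are illustrative names. The regex-based tag scan is a simplification, not a full HTML parser.

```typescript
// True if the HTML has a robots (or googlebot) meta tag whose content includes
// "noindex" or "none" (Google treats "none" as noindex, nofollow).
function hasNoindexMeta(html: string): boolean {
  const metaTags = html.match(/<meta\b[^>]*>/gi) ?? [];
  return metaTags.some((tag) => {
    const name = /\bname\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1]?.toLowerCase();
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag)?.[1]?.toLowerCase() ?? "";
    if (name !== "robots" && name !== "googlebot") return false;
    const directives = content.split(",").map((d) => d.trim());
    return directives.includes("noindex") || directives.includes("none");
  });
}

type Outcome = "indexable" | "crawl-blocked" | "noindex-honored" | "noindex-hidden";

// Combine the two signals the way this article describes.
function diagnose(blockedByRobots: boolean, html: string): Outcome {
  const noindex = hasNoindexMeta(html);
  if (blockedByRobots && noindex) return "noindex-hidden"; // the trap: Google never sees the tag
  if (blockedByRobots) return "crawl-blocked"; // URL can still be indexed from links
  if (noindex) return "noindex-honored"; // page drops out once Google recrawls it
  return "indexable";
}

const page = '<head><meta name="robots" content="noindex"></head>';
console.log(diagnose(true, page)); // noindex-hidden
console.log(diagnose(false, page)); // noindex-honored
```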
The correct approach for pages you genuinely want out of Google:
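Remove any Disallow: rule in robots.txt that blocks the page, so Google can crawl it.
Add a <meta name="robots" content="noindex"> tag to the page's HTML.
Mistake 1 is leaving the development-mode block in place after launch. It is the most common and most damaging robots.txt problem on Dominican business websites, and it is almost invisible unless you know to look for it.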
When building a WordPress website, developers often check the "Search Engine Visibility" box in Settings → Reading: "Discourage search engines from indexing this site." In WordPress versions before 5.3, this adds the following to the site's virtual robots.txt:
User-agent: *
Disallow: /
This single directive, Disallow: /, tells every crawler to stay away from every page on the entire website. Since WordPress 5.3, the same checkbox adds a noindex robots meta tag to every page instead. The mechanism differs, but the result is the same: the site stays out of Google. Either way, it is the correct setting during development, when you do not want an unfinished site appearing in search results.
The problem is that when the site launches, this setting is frequently never changed. The business owner sees a live, functional website and assumes Google will find it. The "Discourage search engines" checkbox remains checked. Months pass. The business wonders why they are not appearing in Google. Nobody thinks to check robots.txt.
This exact scenario is extremely common in the Dominican market, where websites are frequently handed off from developers to business owners without a comprehensive technical SEO checklist at launch. The site is "live" — but invisible to every search engine in the world.
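How to check: Visit yourdomain.com/robots.txt in your browser. If you see Disallow: / under User-agent: *, your entire website is blocked. On WordPress, also confirm the "Discourage search engines" box in Settings → Reading is unchecked, because newer versions apply this block through a noindex meta tag rather than robots.txt.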
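How to fix for WordPress: Go to Settings → Reading and uncheck "Discourage search engines from indexing this site." Then save. The setting changes immediately. However, Google generally caches robots.txt for up to 24 hours and then has to recrawl your pages, so expect a delay before they start appearing in search results.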
Mistake 2 is blocking CSS and JavaScript files. A robots.txt file that blocks stylesheets or scripts does not just hide those files from Google. It prevents Google from rendering your pages correctly.
Google renders your website much as a browser would. It loads the HTML, then the CSS and JavaScript referenced in that HTML, and assembles the complete page. When CSS or JavaScript files are blocked in robots.txt, Google renders an incomplete page: bare HTML without styling, without interactive elements, and without any content that JavaScript would have added.
The consequence is that Google's assessment of the page (its content, layout, and mobile usability) is based on a degraded version of what real visitors see. That can suppress rankings even though the page itself is not blocked.
What to check: Look for any Disallow: lines in your robots.txt that reference directories containing .css or .js files. Common problematic lines:
Disallow: /wp-content/ (blocks all WordPress media, themes, and plugins, including their CSS and JS)
Disallow: /assets/
Disallow: /static/
None of these should be blocked. If your robots.txt contains them, remove the rules or narrow them so they no longer cover stylesheet and script files.
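If you want to check a specific page rather than eyeball the file, the TypeScript sketch below can help. It lists the stylesheets and scripts a page references and reports which same-host ones fall under your Disallow prefixes. It assumes plain prefix rules (no * or $ wildcards) and scans tags with regular expressions rather than a real HTML parser. The function names and example URLs are hypothetical.

```typescript
// Collect the stylesheet and script URLs a page references, resolved
// against the page URL so relative paths become absolute.
function assetUrls(html: string, pageUrl: string): URL[] {
  const refs: string[] = [];
  for (const tag of html.match(/<link\b[^>]*>/gi) ?? []) {
    if (/\brel\s*=\s*["']?stylesheet/i.test(tag)) {
      const href = /\bhref\s*=\s*["']([^"']+)["']/i.exec(tag)?.[1];
      if (href) refs.push(href);
    }
  }
  for (const match of html.matchAll(/<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi)) {
    refs.push(match[1]);
  }
  return refs.map((ref) => new URL(ref, pageUrl));
}

// Report same-host asset paths that start with one of the Disallow prefixes.
function blockedAssets(html: string, pageUrl: string, disallowPrefixes: string[]): string[] {
  const host = new URL(pageUrl).host;
  return assetUrls(html, pageUrl)
    .filter((url) => url.host === host) // other hosts are governed by their own robots.txt
    .filter((url) => disallowPrefixes.some((prefix) => prefix !== "" && url.pathname.startsWith(prefix)))
    .map((url) => url.pathname);
}

const html = `
  <link rel="stylesheet" href="/wp-content/themes/tours/style.css">
  <script src="/wp-includes/js/jquery.min.js"></script>
  <script src="https://cdn.example.com/widget.js"></script>`;
console.log(blockedAssets(html, "https://www.yourdomain.com/tours/", ["/wp-content/"]));
// Only the theme stylesheet is reported; the CDN script lives on another host.
```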
Mistake 3 is confusing Disallow: with noindex. This conceptual mistake leads many Dominican website owners and developers to believe their pages are hidden from Google when they are not, or to believe pages are accessible when they are blocked.
Disallow: in robots.txt prevents crawling. Google cannot read the page content.
noindex meta tag on the page prevents indexing. Google can read the page but will not show it in search results.
These are often used interchangeably by people who think they are equivalent. They are not.
Common errors that result from this confusion:
Using Disallow: to remove a page from search results, then wondering why it still appears (Google may have indexed it before the block was added, and can keep listing it from links afterward)
Applying both Disallow: and noindex to the same page, so Google cannot see the noindex instruction because the page is blocked, and may still show the URL from link signals
The correct usage for each:
Disallow: for pages that should never be crawled and that you are not concerned about appearing in results from link signals (login pages, admin areas, internal search result pages, staging areas)
noindex for pages that should not appear in search results but that Google must crawl to see the instruction (thank-you pages, duplicate content pages, filtered navigation pages)
Mistake 4 is another extremely common source of crawling disasters on Dominican websites: a robots.txt configuration built for a staging environment that gets deployed to the production site.
During development, it is correct practice to block crawlers on the staging site (staging.yourdomain.com) so that the unfinished version does not appear in Google alongside the live version. The staging robots.txt correctly contains Disallow: /.
When the site is deployed to production, if the robots.txt file is deployed along with the codebase without being updated, the production site inherits the staging block. The result is identical to Mistake 1: a live site that Google cannot crawl.
The difference from Mistake 1 is that this error can be more persistent — it may survive WordPress settings changes because the robots.txt is being served by the web server rather than generated by WordPress. It requires finding and editing or replacing the actual robots.txt file at the site root.
Prevention: Include robots.txt review in every deployment checklist. After any site migration or launch, verify yourdomain.com/robots.txt does not contain Disallow: /.
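One way to make that checklist item automatic is a check that fails the deployment when robots.txt blocks the whole site. The TypeScript sketch below shows one way you might wire it, not a standard tool. It flags any group containing Disallow: / (or the equivalent Disallow: /*) and throws if that group applies to all crawlers or to Googlebot. siteWideBlocks and assertNoSiteWideBlock are illustrative names.

```typescript
// Return the user-agent names whose group contains a site-wide block:
// "Disallow: /" or the equivalent "Disallow: /*".
function siteWideBlocks(robotsTxt: string): string[] {
  const flagged = new Set<string>();
  let agents: string[] = [];
  let inRules = false; // true once the current group has at least one rule line
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const match = /^([A-Za-z-]+)\s*:\s*(.*)$/.exec(raw.replace(/#.*/, "").trim());
    if (!match) continue;
    const field = match[1].toLowerCase();
    const value = match[2].trim();
    if (field === "user-agent") {
      if (inRules) {
        agents = []; // a User-agent line after rules starts a new group
        inRules = false;
      }
      agents.push(value.toLowerCase());
    } else if (field === "allow" || field === "disallow") {
      inRules = true;
      if (field === "disallow" && (value === "/" || value === "/*")) {
        agents.forEach((agent) => flagged.add(agent));
      }
    }
  }
  return Array.from(flagged);
}

// Throw, failing the deploy step, if all crawlers or Googlebot are blocked site-wide.
function assertNoSiteWideBlock(robotsTxt: string): void {
  const hits = siteWideBlocks(robotsTxt).filter((agent) => agent === "*" || agent === "googlebot");
  if (hits.length > 0) {
    throw new Error(`robots.txt blocks the entire site for: ${hits.join(", ")}`);
  }
}

// Example: a staging file carried over to production fails the check.
try {
  assertNoSiteWideBlock("User-agent: *\nDisallow: /\n");
} catch (err) {
  console.log((err as Error).message); // robots.txt blocks the entire site for: *
}
```

In a pipeline, you would pass it the contents of the built file (for example, read with Node's fs.readFileSync) or of the live URL right after deployment, and let the thrown error stop the release. A Disallow: / under some other user-agent, such as a bot you chose to exclude, is still reported by siteWideBlocks but deliberately does not fail the check.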
Mistake 5 is writing rules that match more than intended. Advanced robots.txt configurations using wildcard characters (*) and end-of-URL markers ($) can accidentally block URLs that were never meant to be blocked.
A common example on tour operator or e-commerce sites that use URL parameters for filtering:
Disallow: /*?*
This is intended to block all URLs with query parameters, such as filtered search results like tours/?category=snorkeling&price=low. It does exactly that, and that is the problem. It blocks every URL containing a ?, including legitimate pages that depend on query strings, such as those generated by some booking systems and event registration tools.
Another common example:
Disallow: /blog/
Added by a developer who wanted to block one specific blog category, this rule blocks the entire blog directory, because a rule matches every URL that begins with its path. Every blog post, category page, and tag page under /blog/ becomes invisible to Google.
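How to test wildcard rules: In Google Search Console, the robots.txt report (Settings → robots.txt) shows the file Google last fetched and any rules it could not parse. The URL Inspection tool shows whether a specific live URL is blocked. Neither tool tests a draft. Before publishing, check proposed rules against your important URLs with a robots.txt testing tool, such as one built on Google's open-source robots.txt parser.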
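To test a draft before it goes live, you can evaluate it locally. This TypeScript sketch follows the matching rules Google documents:

- * matches any run of characters.
- A trailing $ anchors the end of the URL.
- The longest matching rule wins, and Allow wins a tie.

It simplifies in two ways. It treats the whole draft as a single User-agent group, and it does not normalize percent-encoding. The function names and example URLs are hypothetical.

```typescript
type Rule = { allow: boolean; pattern: string };

// Collect Allow/Disallow rules. Simplification: the whole draft is
// treated as one User-agent group.
function parseRules(robotsTxt: string): Rule[] {
  const rules: Rule[] = [];
  for (const raw of robotsTxt.split(/\r?\n/)) {
    const match = /^(allow|disallow)\s*:\s*(\S*)/i.exec(raw.replace(/#.*/, "").trim());
    if (match) rules.push({ allow: match[1].toLowerCase() === "allow", pattern: match[2] });
  }
  return rules;
}

// Turn a robots.txt path pattern into a RegExp anchored at the start of the path:
// "*" becomes ".*", a trailing "$" anchors the end, everything else is literal.
function patternToRegExp(pattern: string): RegExp {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  const source = body
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp("^" + source + (anchored ? "$" : ""));
}

// Longest matching pattern wins; on a tie, Allow wins; no match means allowed.
function isAllowed(path: string, rules: Rule[]): boolean {
  let best: Rule | undefined;
  for (const rule of rules) {
    if (rule.pattern === "") continue; // an empty "Disallow:" blocks nothing
    if (!patternToRegExp(rule.pattern).test(path)) continue;
    if (
      !best ||
      rule.pattern.length > best.pattern.length ||
      (rule.pattern.length === best.pattern.length && rule.allow)
    ) {
      best = rule;
    }
  }
  return best ? best.allow : true;
}

// Return the URLs the draft would block (path plus query string is matched).
function blockedUrls(robotsTxt: string, urls: string[]): string[] {
  const rules = parseRules(robotsTxt);
  return urls.filter((u) => {
    const { pathname, search } = new URL(u);
    return !isAllowed(pathname + search, rules);
  });
}

const draft = "User-agent: *\nDisallow: /*?*\nDisallow: /blog/\n";
console.log(
  blockedUrls(draft, [
    "https://www.yourdomain.com/book?tour=saona",
    "https://www.yourdomain.com/blog/best-beaches/",
    "https://www.yourdomain.com/tours/saona-island/",
  ]),
); // the booking URL and the blog post are blocked; the tour page is not
```

Running your real list of money pages (service pages, booking pages, top blog posts) through a check like this before publishing shows whether an overly broad rule would catch any of them.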
Robots.txt is not just about avoiding mistakes. Used correctly, it actively helps your SEO by directing Google's crawl budget toward your most important pages.
For a Dominican tourism website, the pages that matter for SEO are: your homepage, your service pages, your blog posts, your about page, your contact page. The pages that do not need to be indexed are: admin areas, login pages, thank-you pages after form submissions, internal search result pages, and pages that exist for technical reasons but have no user value.
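By explicitly blocking the no-value pages, you let Google concentrate its crawl activity on the pages that matter. On sites with large amounts of content, that can modestly speed up the crawling and indexing of those pages. Google describes crawl budget as a concern mainly for very large or rapidly changing sites, so for a typical small business site with a few dozen pages the benefit is small.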
The Sitemap directive is also a positive contribution: pointing Google directly to your sitemap file ensures Google has a complete list of your priority URLs rather than having to discover them through link-following alone.
A well-configured robots.txt for a Punta Cana tour operator with WordPress looks like:
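User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-login.php
Disallow: /cart/
Disallow: /checkout/
Disallow: /my-account/
Disallow: /thank-you/
Sitemap: https://www.yourdomain.com/sitemap.xml
This file blocks the admin area and the low-value user-journey pages. It leaves admin-ajax.php crawlable, because many WordPress themes and plugins call it from public pages. Everything else stays open to every crawler, including Googlebot, and the file points crawlers to the sitemap. A separate User-agent: Googlebot group with Allow: / is not needed and would backfire. A crawler obeys only the most specific group that matches it, so Googlebot would follow that group alone and ignore every Disallow: rule written for User-agent: *.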
Step 1 — Find and read the file: Open a browser and go to yourdomain.com/robots.txt. Read it. Look for Disallow: / on its own (a single slash with nothing after it) in the User-agent: * group or a User-agent: Googlebot group. If you find it, your entire site is blocked for Google. A Disallow: / under some other user-agent blocks only that crawler.
Step 2 — Check in Google Search Console: In Search Console, go to Settings and open the robots.txt report. It shows the robots.txt file Google last fetched for your site, when it was fetched, and any warnings or errors. It does not test individual URLs; for that, use the URL Inspection tool in Step 3.
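Step 3 — Use the URL Inspection tool: In Search Console, paste any of your important service page URLs into the URL Inspection tool (the search bar at the top). "URL is not on Google" means the page is not indexed, which has several possible causes. If the page indexing details report that crawling is blocked by robots.txt, robots.txt is the reason. Use "Test live URL" to check the page against your current robots.txt.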
Step 4 — Check the Pages report: In Search Console's Pages report (under Indexing; formerly the Coverage report), look in the "Why pages aren't indexed" table for "Blocked by robots.txt." If important pages appear there, your robots.txt has an unintended block.
Step 5 — Google the site: Search site:yourdomain.com in Google. This shows a sample of the pages Google has indexed from your domain. The count is approximate, so treat it as a rough signal. If you have 50 pages but only 3 appear in this search, there is likely an indexing problem, either robots.txt blocking or noindex tags keeping the rest out.
For websites built on Next.js — the foundation of every DR Web Studio build — the robots.txt file is generated programmatically from a robots.ts file in the app directory. This gives it several advantages over a manually maintained static file:
It is much harder to deploy by accident with development-mode blocking rules, because the production configuration is explicitly separate from any development configuration.
The Sitemap URL stays current because it references the canonical domain configured in the project's environment variables.
It can be updated by changing a single configuration file rather than manually editing a text file that might be overwritten on the next deployment.
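The TypeScript sketch below illustrates the logic a robots.ts file might contain. It shows the approach only and is not DR Web Studio's actual configuration. It assumes two hypothetical environment variables, DEPLOY_ENV and SITE_URL. It also declares local types that mirror the shape of Next.js's MetadataRoute.Robots, so it runs without Next.js installed.

```typescript
// Local stand-ins that mirror the shape of Next.js's MetadataRoute.Robots,
// so this sketch compiles and runs without Next.js installed.
type RobotsRule = {
  userAgent: string | string[];
  allow?: string | string[];
  disallow?: string | string[];
};
type Robots = { rules: RobotsRule | RobotsRule[]; sitemap?: string | string[] };

// Hypothetical environment variables: DEPLOY_ENV names the deployment
// ("production", "staging", "preview", ...) and SITE_URL is the canonical origin.
type Env = { DEPLOY_ENV?: string; SITE_URL?: string };

export function buildRobots(env: Env): Robots {
  if (!env.DEPLOY_ENV) {
    // Fail loudly: a silent default would either block production
    // or expose staging, the two mistakes described above.
    throw new Error("DEPLOY_ENV is not set; refusing to guess robots.txt rules");
  }
  if (env.DEPLOY_ENV !== "production") {
    // Staging, preview, and local builds: keep every crawler out.
    return { rules: { userAgent: "*", disallow: "/" } };
  }
  if (!env.SITE_URL) {
    throw new Error("SITE_URL must be set for production builds");
  }
  return {
    rules: { userAgent: "*", allow: "/" },
    // Built from the configured domain, so it stays in step with the real site.
    sitemap: new URL("/sitemap.xml", env.SITE_URL).toString(),
  };
}
```

In a Next.js project, app/robots.ts would then default-export a function that returns buildRobots(process.env), typed with MetadataRoute.Robots imported from "next". The design choice worth copying is that a missing DEPLOY_ENV stops the build. The alternative is silently falling back to one set of rules, which is exactly how the staging-to-production mistake happens.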
For clients using Sanity CMS, the robots.ts configuration is set once at launch and rarely needs to be touched again. Content management happens through Sanity, not through file changes that could accidentally overwrite the robots.txt.
At DR Web Studio, robots.txt configuration is part of our standard launch checklist alongside sitemap submission, Search Console verification, and structured data testing. The file is reviewed and confirmed before any site goes live — because the cost of a misconfigured robots.txt, in lost Google visibility and delayed rankings, is too high to leave to chance.
If you want to verify that your current robots.txt is correctly configured and that no important pages are being inadvertently blocked, request a free consultation. We will run a complete technical SEO audit including robots.txt review, Search Console coverage analysis, and indexation status for your key pages — and show you exactly what Google can and cannot currently see on your website.
One line can undo months of SEO work. Two minutes of checking can confirm it hasn't.