LANCDN

Cloudflare's AI-crawler robots.txt seems to conflict with the Pages mechanism in some cases?

    LANCDN · Jan 28 · 1718 views
    This topic was created 128 days ago; the information mentioned may have changed or developed since.

    Trigger conditions

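    • The Pages project has a custom domain on the root (apex) domain (subdomains don't seem to have this problem)
    • The deployed Pages site has no 404.html, but does have a normal index.html
    • In the dashboard, AI Crawl Control => Robots.txt => Cloudflare managed is turned on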

    Symptoms

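    • When you manually open xxx.com/robots.txt, the contents of index.html show up below Cloudflare's robots.txt template, as if Pages' default fallback logic ran as well (without a top-level 404.html, Pages treats the project as a single-page application and serves index.html for unmatched paths). It looks roughly like this: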
    # As a condition of accessing this website, you agree to abide by the following
    # content signals:
    
    ...
    
    # BEGIN Cloudflare Managed content
    
    User-agent: *
    Content-Signal: search=yes,ai-train=no
    Allow: /
    
    ...
    
    # END Cloudflare Managed Content
    
    <!DOCTYPE html>
    <html lang="zh">
    	...
    </html>
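To check whether a given domain is affected, one can fetch /robots.txt and look for an HTML document after the managed block. The following is a minimal Python sketch, not an official Cloudflare tool. It assumes that the end marker matches the `# END Cloudflare Managed Content` line in the sample above (matched case-insensitively) and that any `<!DOCTYPE html>` or `<html>` tag after it means the Pages fallback leaked in. `html_leaked_into_robots` and `fetch_robots` are hypothetical helper names.

```python
import re
import sys
import urllib.request

# End-of-block marker as seen in the sample output (assumption: exact wording may vary).
END_MARKER = re.compile(r"#\s*END Cloudflare Managed Content", re.IGNORECASE)

# Heuristic for "an HTML document starts here".
HTML_START = re.compile(r"<!doctype\s+html|<html[\s>]", re.IGNORECASE)


def html_leaked_into_robots(body: str) -> bool:
    """Return True if an HTML document appears after the managed robots.txt block."""
    m = END_MARKER.search(body)
    # If the managed block is absent, inspect the whole body instead.
    tail = body[m.end():] if m else body
    return HTML_START.search(tail) is not None


def fetch_robots(domain: str) -> str:
    """Fetch https://<domain>/robots.txt and return the body as text (hypothetical helper)."""
    req = urllib.request.Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": "robots-check/0.1"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    body = fetch_robots(sys.argv[1])
    print("HTML appended after managed block:", html_leaked_into_robots(body))
```

Usage: `python check_robots.py xxx.com`. The check is purely textual, so it only confirms the symptom; it says nothing about which of the trigger conditions is responsible.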
    