Enhance embed URL handling and validation system#7
Conversation
|
@cubic-dev-ai review this pull request |
@ShashankFC I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
23 issues found across 28 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="lib/tasks/disqus.thor">
<violation number="1" location="lib/tasks/disqus.thor:148">
P2: No error handling for remote URL fetch: `TopicEmbed.import_remote` calls `open(url).read` which will raise on network errors, 404s, or unreachable Disqus thread URLs. This will abort the entire import loop. Consider wrapping this call in a `begin/rescue` block so that a single failed URL doesn't stop the whole import.</violation>
<violation number="2" location="lib/tasks/disqus.thor:148">
P1: Loss of original creation date: `TopicEmbed.import_remote` does not forward `created_at` to `PostCreator`, so all imported Disqus threads will get the current timestamp instead of preserving their original `t[:created_at]` date. For a historical import tool, this is a significant data loss. Consider extending the `opts` hash and `TopicEmbed.import` to accept and pass through `created_at`.</violation>
</file>
<file name="app/jobs/scheduled/poll_feed.rb">
<violation number="1" location="app/jobs/scheduled/poll_feed.rb:29">
P0: **Bug (Ruby 3.3 incompatible):** `Kernel#open` no longer handles URIs since Ruby 3.0. This call will raise `Errno::ENOENT` trying to open the URL as a local file path. Use `URI.open` instead. Note that even `URI.open` accepts `"|command"` pipe syntax, so consider validating the URL scheme before opening.</violation>
<violation number="2" location="app/jobs/scheduled/poll_feed.rb:35">
P1: **Potential NoMethodError:** `i.content` may be `nil` for feed items that lack a `<content>` element (some feeds only have `<description>`). Calling `.scrub` on `nil` will raise `NoMethodError`. Use safe navigation or provide a fallback.</violation>
</file>
<file name="app/views/embed/best.html.erb">
<violation number="1" location="app/views/embed/best.html.erb:6">
P0: Syntax error: `end if` without a condition will raise a `SyntaxError`. This should be `end`, not `end if`. Ruby's `end if <condition>` is a modifier form that requires a boolean expression after `if`.</violation>
</file>
<file name="db/migrate/20131219203905_add_cook_method_to_posts.rb">
<violation number="1" location="db/migrate/20131219203905_add_cook_method_to_posts.rb:1">
P1: This migration will raise `StandardError` at runtime on Rails 8.0. Since Rails 5.1, you must specify a migration version (e.g., `ActiveRecord::Migration[8.0]`). Without it, Rails refuses to run the migration.</violation>
</file>
<file name="app/controllers/embed_controller.rb">
<violation number="1" location="app/controllers/embed_controller.rb:28">
P1: `X-Frame-Options: ALLOWALL` is not a valid directive per the HTTP spec. Browsers may ignore it (allowing framing by accident) or fall back to `DENY` (breaking embeds entirely). Since the goal is to allow cross-origin framing, delete the default Rails header instead.</violation>
</file>
<file name="lib/post_revisor.rb">
<violation number="1" location="lib/post_revisor.rb:85">
P1: Skipping all model validations via `skip_validations` is risky — the save result is not checked, so failures are silent. If `save` returns `false` (e.g., due to a DB constraint violation when validations are skipped), the code continues to `save_reply_relationships`, potentially causing inconsistent state. Consider at minimum checking the return value of `save` and handling failure, or using `save!` within the existing transaction block so failures trigger a rollback.</violation>
</file>
<file name="spec/jobs/poll_feed_spec.rb">
<violation number="1" location="spec/jobs/poll_feed_spec.rb:18">
P2: Test doesn't isolate the condition it claims to verify. Because `feed_polling_enabled?` is not stubbed to `true`, short-circuit evaluation means `poll_feed` is never reached regardless of `feed_polling_url`'s value. This test will pass even if the `feed_polling_url.present?` check is removed from the implementation. Stub the other two settings to their valid/truthy values so only the setting under test controls the outcome.</violation>
</file>
<file name="app/assets/javascripts/embed.js">
<violation number="1" location="app/assets/javascripts/embed.js:7">
P1: URL construction will break if `discourseUrl` doesn't end with a trailing `/`. For example, `https://forum.example.com` + `"embed/best?..."` produces the malformed `https://forum.example.comembed/best?...`.</violation>
<violation number="2" location="app/assets/javascripts/embed.js:17">
P1: Security: `indexOf` for postMessage origin validation is vulnerable to substring attacks. An attacker controlling a domain that is a prefix/substring of `discourseUrl` (e.g., `https://forum.example.co` vs `https://forum.example.com`) would bypass this check, allowing them to send arbitrary resize messages to manipulate the iframe.
Use exact origin comparison instead:</violation>
</file>
<file name="app/views/layouts/embed.html.erb">
<violation number="1" location="app/views/layouts/embed.html.erb:11">
P2: **Security concern**: `request.referer` is a user-controlled HTTP header being used as the `postMessage` `targetOrigin`. An attacker could craft requests with a malicious Referer, causing the iframe to send messages to an attacker-controlled origin. Consider validating the referer against a server-side allowlist of permitted embed origins before using it as the target origin.</violation>
<violation number="2" location="app/views/layouts/embed.html.erb:11">
P1: **Incorrect escaping context**: `<%= request.referer %>` uses HTML escaping inside a `<script>` tag, which corrupts the URL for JavaScript use (e.g., `&` becomes `&` — a literal string in JS, not an HTML entity). Additionally, `request.referer` can be `nil` when the header is absent.
Use `escape_javascript` (or `j`) for JavaScript string contexts, and handle the `nil` case:</violation>
</file>
<file name="db/migrate/20131223171005_create_top_topics.rb">
<violation number="1" location="db/migrate/20131223171005_create_top_topics.rb:3">
P0: `force: true` on `create_table` in a migration will **drop the existing table and all its data** before recreating it. This is dangerous in production — if the migration is re-run or the table already exists, all rows in `top_topics` are silently deleted. Remove `force: true`; if you need to handle an already-existing table, use a conditional check or a separate migration to modify the schema.</violation>
</file>
<file name="db/migrate/20131217174004_create_topic_embeds.rb">
<violation number="1" location="db/migrate/20131217174004_create_topic_embeds.rb:3">
P1: `force: true` will drop the `topic_embeds` table (and all its data) if it already exists before recreating it. This is dangerous in production and inconsistent with every other migration in the codebase. Remove `force: true` to let Rails raise an error instead of silently destroying data.</violation>
</file>
<file name="spec/controllers/embed_controller_spec.rb">
<violation number="1" location="spec/controllers/embed_controller_spec.rb:35">
P2: Assertions in `after` blocks are an anti-pattern. `after` hooks are meant for teardown/cleanup, not verification. If a test fails before the `get` request is made (e.g., a mock setup error), the `after` block still runs and tries to assert on the response, producing confusing cascading failures. Move these assertions into each `it` block instead.</violation>
</file>
<file name="app/models/post.rb">
<violation number="1" location="app/models/post.rb:133">
P0: **Security: XSS risk and plugin filter bypass.** Returning `raw` directly bypasses both HTML sanitization and the `:after_post_cook` plugin filter. Since the comment states this path is for RSS-imported posts (an external, untrusted source), unsanitized HTML could be rendered in users' browsers. At minimum, the raw HTML should still be passed through the plugin filter, and ideally through a sanitization step.</violation>
</file>
<file name="app/models/topic_embed.rb">
<violation number="1" location="app/models/topic_embed.rb:11">
P1: **Security: Incorrect regex anchor**. In Ruby, `^` matches the beginning of any *line*, not the beginning of the string. Use `\A` instead to properly validate that the URL starts with `http://` or `https://`. A string like `"malicious\nhttps://..."` would bypass this check.</violation>
<violation number="2" location="app/models/topic_embed.rb:13">
P1: **Security: XSS via unescaped URL in HTML**. The `url` is interpolated directly into an HTML `<a>` tag with single-quoted attributes and no HTML escaping. A URL containing `'` can break out of the `href` attribute. Use `ERB::Util.html_escape(url)` or `CGI.escapeHTML(url)` and prefer double-quoted attributes.</violation>
<violation number="3" location="app/models/topic_embed.rb:48">
P0: **Security: Command injection / SSRF via `open(url)`**. `Kernel#open` with user-supplied input can execute arbitrary commands if the string starts with `|`. Even with `OpenURI` loaded, this is an SSRF risk. Use `URI.open(url)` (Ruby 2.7+) or `URI.parse(url).open` with explicit scheme validation *before* opening, not after.</violation>
</file>
<file name="lib/topic_retriever.rb">
<violation number="1" location="lib/topic_retriever.rb:27">
P1: Race condition: `setnx` + `expire` are not atomic. If the process crashes between these two calls, the key will never expire, permanently blocking retrieval for this URL. Use a single atomic `SET` with `EX` and `NX` options instead.</violation>
<violation number="2" location="lib/topic_retriever.rb:49">
P2: Potential `NoMethodError` if `SiteSetting.embed_by_username` is nil or not configured — calling `.downcase` on `nil` will raise. Consider adding a guard or using safe navigation.</violation>
</file>
<file name="app/views/embed/loading.html.erb">
<violation number="1" location="app/views/embed/loading.html.erb:8">
P2: Unbounded infinite reload loop: this page unconditionally reloads every 30 seconds with no retry limit. If the loading condition never resolves (e.g., feed fetch fails permanently), the page will reload indefinitely, wasting server resources and bandwidth. Consider adding a maximum retry count using a URL parameter or cookie to cap the number of reloads.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) |
There was a problem hiding this comment.
P0: Bug (Ruby 3.3 incompatible): Kernel#open no longer handles URIs since Ruby 3.0. This call will raise Errno::ENOENT trying to open the URL as a local file path. Use URI.open instead. Note that even URI.open accepts "|command" pipe syntax, so consider validating the URL scheme before opening.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/jobs/scheduled/poll_feed.rb, line 29:
<comment>**Bug (Ruby 3.3 incompatible):** `Kernel#open` no longer handles URIs since Ruby 3.0. This call will raise `Errno::ENOENT` trying to open the URL as a local file path. Use `URI.open` instead. Note that even `URI.open` accepts `"|command"` pipe syntax, so consider validating the URL scheme before opening.</comment>
<file context>
@@ -0,0 +1,41 @@
+ return if user.blank?
+
+ require 'simple-rss'
+ rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
+
+ rss.items.each do |i|
</file context>
| <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- else %> | ||
| <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- end if %> |
There was a problem hiding this comment.
P0: Syntax error: end if without a condition will raise a SyntaxError. This should be end, not end if. Ruby's end if <condition> is a modifier form that requires a boolean expression after if.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/views/embed/best.html.erb, line 6:
<comment>Syntax error: `end if` without a condition will raise a `SyntaxError`. This should be `end`, not `end if`. Ruby's `end if <condition>` is a modifier form that requires a boolean expression after `if`.</comment>
<file context>
@@ -0,0 +1,30 @@
+ <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+ <%- else %>
+ <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+ <%- end if %>
+
+ <%= link_to(image_tag(SiteSetting.logo_url, class: 'logo'), Discourse.base_url) %>
</file context>
| class CreateTopTopics < ActiveRecord::Migration | ||
| def change | ||
| create_table :top_topics do |t| | ||
| create_table :top_topics, force: true do |t| |
There was a problem hiding this comment.
P0: force: true on create_table in a migration will drop the existing table and all its data before recreating it. This is dangerous in production — if the migration is re-run or the table already exists, all rows in top_topics are silently deleted. Remove force: true; if you need to handle an already-existing table, use a conditional check or a separate migration to modify the schema.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At db/migrate/20131223171005_create_top_topics.rb, line 3:
<comment>`force: true` on `create_table` in a migration will **drop the existing table and all its data** before recreating it. This is dangerous in production — if the migration is re-run or the table already exists, all rows in `top_topics` are silently deleted. Remove `force: true`; if you need to handle an already-existing table, use a conditional check or a separate migration to modify the schema.</comment>
<file context>
@@ -1,6 +1,6 @@
class CreateTopTopics < ActiveRecord::Migration
def change
- create_table :top_topics do |t|
+ create_table :top_topics, force: true do |t|
t.belongs_to :topic
</file context>
| create_table :top_topics, force: true do |t| | |
| create_table :top_topics do |t| |
| def cook(*args) | ||
| # For some posts, for example those imported via RSS, we support raw HTML. In that | ||
| # case we can skip the rendering pipeline. | ||
| return raw if cook_method == Post.cook_methods[:raw_html] |
There was a problem hiding this comment.
P0: Security: XSS risk and plugin filter bypass. Returning raw directly bypasses both HTML sanitization and the :after_post_cook plugin filter. Since the comment states this path is for RSS-imported posts (an external, untrusted source), unsanitized HTML could be rendered in users' browsers. At minimum, the raw HTML should still be passed through the plugin filter, and ideally through a sanitization step.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/models/post.rb, line 133:
<comment>**Security: XSS risk and plugin filter bypass.** Returning `raw` directly bypasses both HTML sanitization and the `:after_post_cook` plugin filter. Since the comment states this path is for RSS-imported posts (an external, untrusted source), unsanitized HTML could be rendered in users' browsers. At minimum, the raw HTML should still be passed through the plugin filter, and ideally through a sanitization step.</comment>
<file context>
@@ -124,6 +128,11 @@ def post_analyzer
def cook(*args)
+ # For some posts, for example those imported via RSS, we support raw HTML. In that
+ # case we can skip the rendering pipeline.
+ return raw if cook_method == Post.cook_methods[:raw_html]
+
+ # Default is to cook posts
</file context>
| require 'ruby-readability' | ||
|
|
||
| opts = opts || {} | ||
| doc = Readability::Document.new(open(url).read, |
There was a problem hiding this comment.
P0: Security: Command injection / SSRF via open(url). Kernel#open with user-supplied input can execute arbitrary commands if the string starts with |. Even with OpenURI loaded, this is an SSRF risk. Use URI.open(url) (Ruby 2.7+) or URI.parse(url).open with explicit scheme validation before opening, not after.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/models/topic_embed.rb, line 48:
<comment>**Security: Command injection / SSRF via `open(url)`**. `Kernel#open` with user-supplied input can execute arbitrary commands if the string starts with `|`. Even with `OpenURI` loaded, this is an SSRF risk. Use `URI.open(url)` (Ruby 2.7+) or `URI.parse(url).open` with explicit scheme validation *before* opening, not after.</comment>
<file context>
@@ -0,0 +1,82 @@
+ require 'ruby-readability'
+
+ opts = opts || {}
+ doc = Readability::Document.new(open(url).read,
+ tags: %w[div p code pre h1 h2 h3 b em i strong a img],
+ attributes: %w[href src])
</file context>
| poller.execute({}) | ||
| end | ||
|
|
||
| it "requires feed_polling_url" do |
There was a problem hiding this comment.
P2: Test doesn't isolate the condition it claims to verify. Because feed_polling_enabled? is not stubbed to true, short-circuit evaluation means poll_feed is never reached regardless of feed_polling_url's value. This test will pass even if the feed_polling_url.present? check is removed from the implementation. Stub the other two settings to their valid/truthy values so only the setting under test controls the outcome.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At spec/jobs/poll_feed_spec.rb, line 18:
<comment>Test doesn't isolate the condition it claims to verify. Because `feed_polling_enabled?` is not stubbed to `true`, short-circuit evaluation means `poll_feed` is never reached regardless of `feed_polling_url`'s value. This test will pass even if the `feed_polling_url.present?` check is removed from the implementation. Stub the other two settings to their valid/truthy values so only the setting under test controls the outcome.</comment>
<file context>
@@ -0,0 +1,40 @@
+ poller.execute({})
+ end
+
+ it "requires feed_polling_url" do
+ SiteSetting.stubs(:feed_polling_url).returns(nil)
+ poller.expects(:poll_feed).never
</file context>
| window.onload = function() { | ||
| if (parent) { | ||
| // Send a post message with our loaded height | ||
| parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>'); |
There was a problem hiding this comment.
P2: Security concern: request.referer is a user-controlled HTTP header being used as the postMessage targetOrigin. An attacker could craft requests with a malicious Referer, causing the iframe to send messages to an attacker-controlled origin. Consider validating the referer against a server-side allowlist of permitted embed origins before using it as the target origin.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/views/layouts/embed.html.erb, line 11:
<comment>**Security concern**: `request.referer` is a user-controlled HTTP header being used as the `postMessage` `targetOrigin`. An attacker could craft requests with a malicious Referer, causing the iframe to send messages to an attacker-controlled origin. Consider validating the referer against a server-side allowlist of permitted embed origins before using it as the target origin.</comment>
<file context>
@@ -0,0 +1,20 @@
+ window.onload = function() {
+ if (parent) {
+ // Send a post message with our loaded height
+ parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>');
+ }
+ }
</file context>
| controller.request.stubs(:referer).returns(embed_url) | ||
| end | ||
|
|
||
| after do |
There was a problem hiding this comment.
P2: Assertions in after blocks are an anti-pattern. after hooks are meant for teardown/cleanup, not verification. If a test fails before the get request is made (e.g., a mock setup error), the after block still runs and tries to assert on the response, producing confusing cascading failures. Move these assertions into each it block instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At spec/controllers/embed_controller_spec.rb, line 35:
<comment>Assertions in `after` blocks are an anti-pattern. `after` hooks are meant for teardown/cleanup, not verification. If a test fails before the `get` request is made (e.g., a mock setup error), the `after` block still runs and tries to assert on the response, producing confusing cascading failures. Move these assertions into each `it` block instead.</comment>
<file context>
@@ -0,0 +1,58 @@
+ controller.request.stubs(:referer).returns(embed_url)
+ end
+
+ after do
+ response.should be_success
+ response.headers['X-Frame-Options'].should == "ALLOWALL"
</file context>
| end | ||
|
|
||
| def fetch_http | ||
| user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first |
There was a problem hiding this comment.
P2: Potential NoMethodError if SiteSetting.embed_by_username is nil or not configured — calling .downcase on nil will raise. Consider adding a guard or using safe navigation.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At lib/topic_retriever.rb, line 49:
<comment>Potential `NoMethodError` if `SiteSetting.embed_by_username` is nil or not configured — calling `.downcase` on `nil` will raise. Consider adding a guard or using safe navigation.</comment>
<file context>
@@ -0,0 +1,55 @@
+ end
+
+ def fetch_http
+ user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
+ return if user.blank?
+
</file context>
|
|
||
| <script> | ||
| (function() { | ||
| setTimeout(function() { |
There was a problem hiding this comment.
P2: Unbounded infinite reload loop: this page unconditionally reloads every 30 seconds with no retry limit. If the loading condition never resolves (e.g., feed fetch fails permanently), the page will reload indefinitely, wasting server resources and bandwidth. Consider adding a maximum retry count using a URL parameter or cookie to cap the number of reloads.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At app/views/embed/loading.html.erb, line 8:
<comment>Unbounded infinite reload loop: this page unconditionally reloads every 30 seconds with no retry limit. If the loading condition never resolves (e.g., feed fetch fails permanently), the page will reload indefinitely, wasting server resources and bandwidth. Consider adding a maximum retry count using a URL parameter or cookie to cap the number of reloads.</comment>
<file context>
@@ -0,0 +1,12 @@
+
+<script>
+ (function() {
+ setTimeout(function() {
+ document.location.reload();
+ }, 30000);
</file context>
Test 4
Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/discourse-coderabbit#4