Enhance embed URL handling and validation system#4
Conversation
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
📝 WalkthroughWalkthroughThis pull request introduces comprehensive RSS feed embedding functionality. It adds feed-parsing gems, creates a new EmbedController with supporting views and stylesheets, implements a TopicEmbed model for managing embedded content, establishes scheduled and background jobs for polling feeds and retrieving topics, updates post rendering with a new cook method, and includes database migrations and configuration for the embedding feature. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler as Scheduler
participant PollFeed as PollFeed Job
participant Retriever as TopicRetriever
participant TopicEmbed as TopicEmbed Model
participant RemoteFeed as Remote Feed
participant PostCreator as PostCreator
participant DB as Database
Scheduler->>PollFeed: execute hourly
PollFeed->>PollFeed: check site settings enabled
PollFeed->>Retriever: new(feed_url, {no_throttle: true})
Retriever->>Retriever: validate host
Retriever->>Retriever: check Redis throttle (60s)
alt throttled or invalid
Retriever-->>PollFeed: skip
else valid & not throttled
Retriever->>TopicEmbed: check existing embed
alt embed exists
Retriever-->>PollFeed: skip
else no embed
Retriever->>RemoteFeed: fetch feed
RemoteFeed-->>Retriever: RSS/ATOM items
Retriever->>TopicEmbed: import_remote(user, url, ...)
TopicEmbed->>TopicEmbed: parse HTML content
TopicEmbed->>TopicEmbed: absolutize URLs
TopicEmbed->>TopicEmbed: compute SHA1 hash
TopicEmbed->>PostCreator: create post(raw_html)
PostCreator->>DB: insert topic & post
DB-->>TopicEmbed: record IDs
TopicEmbed->>DB: insert topic_embed
TopicEmbed-->>PollFeed: post created
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Actionable comments posted: 16
🤖 Fix all issues with AI agents
In @app/assets/javascripts/embed.js:
- Around line 5-12: The code assumes
document.getElementById('discourse-comments') returns an element and will throw
if null; modify the logic around the comments variable (result of
document.getElementById('discourse-comments')) to check for null before
creating/appending the iframe (id 'discourse-embed-frame'), and only build and
append the iframe if comments exists—optionally log or return early when
comments is null.
- Around line 15-24: The origin check in postMessageReceived uses
discourseUrl.indexOf(e.origin) which is vulnerable to substring matches; replace
it with a strict origin validation by comparing e.origin exactly to the expected
origin(s) (e.g. e.origin === discourseUrl) or by parsing both with the URL
constructor and comparing protocol + host (or maintaining an allowlist array and
checking membership), and then only proceed to set iframe.height when the strict
check passes; update references in postMessageReceived, discourseUrl and the
iframe.height assignment to use the new exact-origin logic.
In @app/controllers/embed_controller.rb:
- Around line 2-4: Replace deprecated filter methods: change uses of
skip_before_filter and before_filter in this controller to the Rails 5+
equivalents skip_before_action and before_action (e.g., update the lines
referencing skip_before_filter :check_xhr, skip_before_filter :preload_json, and
before_filter :ensure_embeddable to use skip_before_action and before_action
respectively) so the controller uses ensure_embeddable with the modern callbacks
API.
In @app/jobs/scheduled/poll_feed.rb:
- Line 29: The call using Kernel#open in the feed poll job should explicitly use
open-uri: add require 'open-uri' to the file and change the SimpleRSS.parse
invocation to use URI.open(SiteSetting.feed_polling_url) instead of open(...);
update the line referencing SimpleRSS.parse and SiteSetting.feed_polling_url so
it calls URI.open to satisfy Brakeman and make the URL fetch intent explicit.
- Around line 20-22: The method feed_key is defined but unused; either remove
the feed_key method entirely or, if it's intended for a future feature, add a
concise TODO above it explaining its purpose and expected usage (e.g., why it
computes "feed-modified:#{Digest::SHA1.hexdigest(SiteSetting.feed_polling_url)}"
and when it will be referenced). Ensure the change targets the feed_key method
and mentions the dependency on SiteSetting.feed_polling_url in the TODO so
future readers know the context.
- Around line 35-36: The code calls i.content.scrub which raises NoMethodError
when i.content is nil; guard against nil by using safe navigation or a fallback
before unescaping and scrubbing (e.g., get a safe_content using i.content&.scrub
|| "" or similar) and then pass CGI.unescapeHTML(safe_content) into
TopicEmbed.import so TopicEmbed.import(user, url, i.title, safe_content) always
receives a string.
In @app/models/topic_embed.rb:
- Around line 44-53: In TopicEmbed.import_remote, avoid using Kernel#open(url)
which allows shell execution and SSRF; instead parse and validate the URL (use
URI.parse) to ensure only http/https schemes (and optionally block
local/internal hosts), then fetch the body with a safe HTTP client (e.g.,
Net::HTTP or URI.open with strict scheme/host checks) with timeouts and redirect
handling, and rescue/fail gracefully; replace open(url).read with the validated
fetch and keep passing the fetched content into Readability::Document, and
ensure errors are logged/handled before calling TopicEmbed.import.
In @app/views/embed/best.html.erb:
- Line 6: The ERB template uses invalid Ruby syntax `<%- end if %>`; update the
template to close the if block correctly by replacing that token with a standard
ERB block end (`<%- end %>`) so the `if` opened earlier is properly terminated
and the view will render without a syntax error.
- Line 16: The avatar URL is inserted raw into an img src which can enable XSS
via crafted URLs; replace the raw interpolation of post.user.small_avatar_url
with a Rails helper that safely escapes/validates the URL (e.g., use
image_tag(post.user.small_avatar_url, alt: ...) or tag.img with url_for()) so
the value is escaped and properly encoded before rendering, ensuring you
reference post.user.small_avatar_url and the surrounding img element in your
change.
In @app/views/layouts/embed.html.erb:
- Around line 6-15: The view currently uses the user-controlled request.referer
as the postMessage target origin; instead compute and validate a safe origin in
the controller and pass it into the view (e.g., set an @embed_target_origin
after checking request.referer against your allowed embed hosts whitelist), then
use that variable in the embedded script instead of request.referer; if the
height is non-sensitive you may alternatively default to '*' when the referer is
not on the whitelist. Ensure the validation logic lives in the controller (or a
helper) and onlyTrustedOrigin/@embed_target_origin is used by the script that
calls parent.postMessage({type: 'discourse-resize', height:
document.body.offsetHeight}, @embed_target_origin).
In @db/migrate/20131219203905_add_cook_method_to_posts.rb:
- Around line 1-5: The migration AddCookMethodToPosts sets the new
posts.cook_method default to 1 which maps to :raw_html; update the add_column
call in the change method to use default: 0 so the enum value :regular is the
default for new posts (modify the add_column line in AddCookMethodToPosts
accordingly and run/redo the migration as needed).
In @db/migrate/20131223171005_create_top_topics.rb:
- Line 3: Remove the destructive `force: true` option from the `create_table
:top_topics` call so the migration does not drop/recreate the existing table;
instead create a new, non-destructive migration (e.g., a separate migration file
with `add_column`, `remove_column`, or `change_column`) to modify the
`top_topics` schema if needed, and ensure the original migration only creates
the table when it does not already exist.
In @lib/topic_retriever.rb:
- Around line 27-30: Replace the two-step $redis.setnx(...) followed by
$redis.expire(...) with a single atomic SET call using NX and EX options: call
$redis.set(retrieved_key, "1", nx: true, ex: 60) and treat a truthy/"OK" return
as the key being newly set (preserve the original behavior of returning false in
that case), otherwise proceed as when the key already existed; this eliminates
the race window between setnx and expire.
- Around line 40-43: The code currently calls Jobs::PollFeed.new.execute({})
synchronously inside the request flow, which blocks on I/O in poll_feed; instead
enqueue the poll job asynchronously (use the job's enqueue API such as
Jobs.enqueue(:poll_feed, ...) or Jobs::PollFeed.enqueue(...)) and remove the
direct new.execute call so the request does not wait for feed fetching; keep the
existing guard that checks TopicEmbed.where(embed_url: @embed_url).exists? and
ensure SiteSetting.feed_polling_enabled? still gates enqueuing.
In @spec/controllers/embed_controller_spec.rb:
- Around line 48-52: The test currently stubs TopicEmbed.topic_id_for_embed but
calls TopicView.new without returning an instance, so @topic_view becomes nil;
update the test to return a mock TopicView instance by creating a mock (e.g.,
mock_topic_view) and have TopicView.expects(:new).with(123, nil, {best:
5}).returns(mock_topic_view) so the controller assigns a real (mock) object to
@topic_view; ensure the mock responds to any methods the view expects.
🧹 Nitpick comments (18)
app/assets/stylesheets/embed.css.scss (1)
4-37: Consider using CSS variables for colors and responsive layout.The hard-coded color values (#ddd, #aaa, #4a6b82) and fixed width (92px) may cause maintenance overhead and responsiveness issues:
- Extract colors to variables for theming consistency
- The fixed 92px author width may truncate on narrow viewports
These are optional improvements for future maintainability.
db/migrate/20131217174004_create_topic_embeds.rb (1)
3-9: Consider removingforce: trueand adding an index ontopic_id.
force: truewill drop and recreate the table if it already exists, which could cause data loss if the migration is accidentally re-run. Unless this is intentional for development purposes, consider using a conditional check or removingforce: true.Consider adding an index on
topic_idto support efficient lookups when querying embeds by topic.♻️ Suggested improvements
class CreateTopicEmbeds < ActiveRecord::Migration def change - create_table :topic_embeds, force: true do |t| + create_table :topic_embeds do |t| t.integer :topic_id, null: false t.integer :post_id, null: false t.string :embed_url, null: false t.string :content_sha1, null: false, limit: 40 t.timestamps end + add_index :topic_embeds, :topic_id add_index :topic_embeds, :embed_url, unique: true end endlib/topic_retriever.rb (1)
48-53: Consider usingfind_byfor single record lookup.
User.where(...).firstis idiomatic butfind_byis clearer for single-record lookups and slightly more efficient.♻️ Suggested refactor
def fetch_http - user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first + user = User.find_by(username_lower: SiteSetting.embed_by_username.downcase) return if user.blank? TopicEmbed.import_remote(user, @embed_url) endapp/models/topic_embed.rb (4)
1-1: Userequireinstead ofrequire_dependencyfor gems.
require_dependencyis for Rails autoloading of application code. For gems like Nokogiri, use standardrequire 'nokogiri'.♻️ Suggested fix
-require_dependency 'nokogiri' +require 'nokogiri'
13-13: Mutating thecontentsparameter has side effects.The
<<operator modifies the original string passed in, which could surprise callers. Consider using string concatenation that creates a new string instead.♻️ Proposed fix to avoid mutation
- contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n" + contents = contents + "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
22-29: Missing error handling for failed post creation.If
creator.createfails (returns nil or a post with errors), the code still attempts to create aTopicEmbedrecord. Thepost.present?check helps, butcreator.errorsshould be logged or handled to aid debugging.♻️ Suggested improvement
creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html]) post = creator.create - if post.present? + if post.present? && post.errors.blank? TopicEmbed.create!(topic_id: post.topic_id, embed_url: url, content_sha1: content_sha1, post_id: post.id) + else + Rails.logger.warn("TopicEmbed import failed for #{url}: #{creator.errors.full_messages.join(', ')}") end
56-76: Consider extracting URL prefix building and element iteration.The static analysis flagged high complexity. The prefix-building logic and the nearly-identical loops for
aandimgtags could be simplified.♻️ Suggested refactor to reduce duplication
def self.absolutize_urls(url, contents) uri = URI(url) prefix = "#{uri.scheme}://#{uri.host}" prefix << ":#{uri.port}" if uri.port != 80 && uri.port != 443 fragment = Nokogiri::HTML.fragment(contents) - fragment.css('a').each do |a| - href = a['href'] - if href.present? && href.start_with?('/') - a['href'] = "#{prefix}/#{href.sub(/^\/+/, '')}" - end - end - fragment.css('img').each do |a| - src = a['src'] - if src.present? && src.start_with?('/') - a['src'] = "#{prefix}/#{src.sub(/^\/+/, '')}" - end - end + { 'a' => 'href', 'img' => 'src' }.each do |tag, attr| + fragment.css(tag).each do |el| + val = el[attr] + el[attr] = "#{prefix}/#{val.sub(/^\/+/, '')}" if val.present? && val.start_with?('/') + end + end fragment.to_html endapp/jobs/regular/retrieve_topic.rb (1)
1-1: Unused dependency:email/senderappears unreferenced.This
require_dependencyseems unnecessary for this job's functionality. Consider removing it if not needed.♻️ Proposed fix
-require_dependency 'email/sender' require_dependency 'topic_retriever'spec/jobs/poll_feed_spec.rb (1)
2-2: Unused require:jobs/regular/process_postappears unrelated.This require seems unnecessary for testing
Jobs::PollFeed. Consider removing it.♻️ Proposed fix
require 'spec_helper' -require_dependency 'jobs/regular/process_post'Gemfile (1)
209-211: Version pinning is optional for these gems. While pinning versions improves reproducibility, it's not the pattern used in this Gemfile—only 8 of 66 gems specify version constraints. These gems are appropriately configured withrequire: falsefor conditional loading. If version pinning becomes a project-wide practice, apply it consistently.config/routes.rb (1)
245-246: Route addition looks good.The route correctly maps to
embed#bestand follows the existing routing conventions in this file.However, per the static analysis hint, the codebase is running on Rails 3.2.12 which has a known RCE vulnerability (CVE-2014-0130) related to globbing routes. While this specific route doesn't use globbing, upgrading Rails should be prioritized to patch this vulnerability.
spec/models/topic_embed_spec.rb (1)
18-21: Tests use deprecated RSpec syntax.The tests use
shouldsyntax which is deprecated in favor ofexpect. Also,be_true(line 34) is deprecated in newer RSpec versions.This is a minor style concern and can be addressed in a follow-up if the project adopts newer RSpec conventions.
♻️ Modern RSpec syntax example
it "returns nil when the URL is malformed" do - TopicEmbed.import(user, "invalid url", title, contents).should be_nil - TopicEmbed.count.should == 0 + expect(TopicEmbed.import(user, "invalid url", title, contents)).to be_nil + expect(TopicEmbed.count).to eq(0) endSimilarly for
be_true→be trueoreq(true).spec/controllers/embed_controller_spec.rb (2)
35-38: Avoid assertions inafterblocks.Placing assertions in
afterhooks is unconventional and can mask failures. If anitblock fails before reaching thegetcall, theafterblock still runs and may report misleading errors. Move these assertions into each individual test.♻️ Suggested refactor
- after do - response.should be_success - response.headers['X-Frame-Options'].should == "ALLOWALL" - end - it "tells the topic retriever to work when no previous embed is found" do TopicEmbed.expects(:topic_id_for_embed).returns(nil) retriever = mock TopicRetriever.expects(:new).returns(retriever) retriever.expects(:retrieve) get :best, embed_url: embed_url + expect(response).to be_success + expect(response.headers['X-Frame-Options']).to eq("ALLOWALL") end it "creates a topic view when a topic_id is found" do TopicEmbed.expects(:topic_id_for_embed).returns(123) TopicView.expects(:new).with(123, nil, {best: 5}) get :best, embed_url: embed_url + expect(response).to be_success + expect(response.headers['X-Frame-Options']).to eq("ALLOWALL") end
10-10: Consider updating to modern RSpec syntax.The tests use the deprecated
shouldsyntax (e.g.,response.should_not be_success). Modern RSpec prefers theexpectsyntax for clarity and better error messages.# Instead of: response.should_not be_success # Use: expect(response).not_to be_successAlso applies to: 16-16, 26-26, 36-37
app/views/embed/best.html.erb (1)
20-20: Replace inline style with CSS class.Inline styles reduce maintainability. Consider using a CSS class for the clearfix pattern, which is already likely defined in the embed stylesheet.
♻️ Proposed refactor
- <div style='clear: both'></div> + <div class='clearfix'></div>Then define in CSS:
.clearfix { clear: both; }spec/components/topic_retriever_spec.rb (1)
21-25: Test may pass for the wrong reason.This test doesn't stub
SiteSetting.embeddable_host. Looking atlib/topic_retriever.rb,retrieveskipsperform_retrievewheninvalid_host?is true. Without stubbing the host setting,invalid_host?may return true because no host is configured, not because "not a url" is detected as invalid.Consider stubbing the host to ensure the test validates URL parsing specifically:
♻️ Proposed fix
it "does not call perform_retrieve when the embed url is not a url" do + SiteSetting.expects(:embeddable_host).returns("eviltrout.com") r = TopicRetriever.new("not a url") r.expects(:perform_retrieve).never r.retrieve endapp/controllers/embed_controller.rb (2)
24-31: Referer host comparison may be too strict.The exact string comparison
URI(request.referer || '').host != SiteSetting.embeddable_hostcould fail for legitimate embeds if:
- The referer has a
www.prefix butembeddable_hostdoesn't (or vice versa)- Case differences exist (e.g.,
Eviltrout.comvseviltrout.com)Consider normalizing both hosts before comparison.
♻️ Proposed improvement
def ensure_embeddable raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank? - raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host + referer_host = URI(request.referer || '').host.to_s.downcase.sub(/\Awww\./, '') + embeddable = SiteSetting.embeddable_host.to_s.downcase.sub(/\Awww\./, '') + raise Discourse::InvalidAccess.new('invalid referer host') if referer_host != embeddable response.headers['X-Frame-Options'] = "ALLOWALL" rescue URI::InvalidURIError raise Discourse::InvalidAccess.new('invalid referer host') end
12-20: Consider explicit return after render.When
render 'loading'is called on line 16, execution continues todiscourse_expires_inon line 19. While this works, the flow is clearer with an explicit early return. This also prevents accidental double-render if code is added later.♻️ Proposed refactor
if topic_id @topic_view = TopicView.new(topic_id, current_user, {best: 5}) else Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url) - render 'loading' + discourse_expires_in 1.minute + return render 'loading' end discourse_expires_in 1.minute endOr restructure to avoid duplication:
+ discourse_expires_in 1.minute + if topic_id @topic_view = TopicView.new(topic_id, current_user, {best: 5}) else Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url) - render 'loading' + return render 'loading' end - - discourse_expires_in 1.minute end
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Gemfile_rails4.lockis excluded by!**/*.lock
📒 Files selected for processing (27)
Gemfileapp/assets/javascripts/embed.jsapp/assets/stylesheets/embed.css.scssapp/controllers/embed_controller.rbapp/jobs/regular/retrieve_topic.rbapp/jobs/scheduled/poll_feed.rbapp/models/post.rbapp/models/topic_embed.rbapp/views/embed/best.html.erbapp/views/embed/loading.html.erbapp/views/layouts/embed.html.erbconfig/locales/client.en.ymlconfig/locales/server.en.ymlconfig/routes.rbconfig/site_settings.ymldb/migrate/20131210181901_migrate_word_counts.rbdb/migrate/20131217174004_create_topic_embeds.rbdb/migrate/20131219203905_add_cook_method_to_posts.rbdb/migrate/20131223171005_create_top_topics.rblib/post_creator.rblib/post_revisor.rblib/tasks/disqus.thorlib/topic_retriever.rbspec/components/topic_retriever_spec.rbspec/controllers/embed_controller_spec.rbspec/jobs/poll_feed_spec.rbspec/models/topic_embed_spec.rb
🧰 Additional context used
🧬 Code graph analysis (8)
spec/components/topic_retriever_spec.rb (1)
lib/topic_retriever.rb (1)
retrieve(8-10)
app/jobs/regular/retrieve_topic.rb (2)
app/jobs/scheduled/poll_feed.rb (1)
execute(14-18)lib/topic_retriever.rb (1)
retrieve(8-10)
app/jobs/scheduled/poll_feed.rb (1)
app/models/topic_embed.rb (1)
import(10-42)
db/migrate/20131219203905_add_cook_method_to_posts.rb (2)
db/migrate/20131217174004_create_topic_embeds.rb (2)
change(1-13)change(2-12)db/migrate/20131223171005_create_top_topics.rb (2)
change(1-23)change(2-22)
app/controllers/embed_controller.rb (2)
app/models/topic_embed.rb (1)
topic_id_for_embed(78-80)app/controllers/application_controller.rb (1)
discourse_expires_in(168-171)
spec/jobs/poll_feed_spec.rb (1)
app/jobs/scheduled/poll_feed.rb (1)
execute(14-18)
app/models/topic_embed.rb (3)
app/models/post.rb (2)
url(267-269)url(271-273)lib/post_creator.rb (2)
create(51-87)create(90-92)lib/post_revisor.rb (1)
revise!(11-22)
db/migrate/20131217174004_create_topic_embeds.rb (2)
db/migrate/20131219203905_add_cook_method_to_posts.rb (2)
change(1-5)change(2-4)db/migrate/20131223171005_create_top_topics.rb (2)
change(1-23)change(2-22)
🪛 Brakeman (7.1.1)
app/jobs/scheduled/poll_feed.rb
[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/
(File Access)
config/routes.rb
[medium] Rails 3.2.12 with globbing routes is vulnerable to directory traversal and remote code execution. Patch or upgrade to Rails 3.2.18
Type: Remote Code Execution
Confidence: Medium
More info: http://matasano.com/research/AnatomyOfRailsVuln-CVE-2014-0130.pdf
(Remote Code Execution)
🪛 RuboCop (1.82.1)
spec/components/topic_retriever_spec.rb
[convention] 5-6: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 43-44: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 45-46: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
app/jobs/regular/retrieve_topic.rb
[convention] 5-6: Extra empty line detected at module body beginning.
(Layout/EmptyLinesAroundModuleBody)
[convention] 21-22: Extra empty line detected at module body end.
(Layout/EmptyLinesAroundModuleBody)
spec/controllers/embed_controller_spec.rb
[convention] 4-5: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 30-31: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 54-55: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 57-58: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
spec/models/topic_embed_spec.rb
[convention] 4-5: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 12-13: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 43-44: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 45-46: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
[convention] 47-48: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
spec/jobs/poll_feed_spec.rb
[convention] 5-6: Extra empty line detected at block body beginning.
(Layout/EmptyLinesAroundBlockBody)
[convention] 39-40: Extra empty line detected at block body end.
(Layout/EmptyLinesAroundBlockBody)
app/models/topic_embed.rb
[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]
(Metrics/AbcSize)
[convention] 56-76: Assignment Branch Condition size for absolutize_urls is too high. [<9, 23, 10> 26.65/23]
(Metrics/AbcSize)
[convention] 56-76: Cyclomatic complexity for absolutize_urls is too high. [9/7]
(Metrics/CyclomaticComplexity)
[convention] 56-76: Perceived complexity for absolutize_urls is too high. [9/8]
(Metrics/PerceivedComplexity)
🔇 Additional comments (20)
db/migrate/20131210181901_migrate_word_counts.rb (1)
1-33: Formatting-only changes look fine, but file appears unrelated to PR objectives.The changes here are limited to whitespace adjustments (empty line on line 3 and end-of-file newline correction on line 33) with no functional impact on the migration logic.
However, this word counts migration is unrelated to the PR's stated purpose of enhancing embed URL handling and validation. Consider whether this formatting change was intentionally included or accidentally staged.
app/models/post.rb (2)
63-65: LGTM on the enum definition.The
cook_methodsenum provides a clean way to distinguish between regular post cooking and raw HTML passthrough for embedded content.
130-137: The migration default and enum implementation are actually correct. TheEnumclass defaultsstartto1(not0), soEnum.new(:regular, :raw_html)assigns:regular => 1and:raw_html => 2. The migration'sdefault: 1correctly sets new posts to the:regularcooking method, which is the intended behavior. The line 135 comment "Default is to cook posts" aligns with this implementation.Likely an incorrect or invalid review comment.
lib/post_revisor.rb (1)
85-86: Consistent with PostCreator pattern.The conditional validation bypass aligns with the same pattern in
PostCreator#save_post. This is appropriate for importing external content (e.g., RSS feeds) where standard validation rules may not apply.Ensure
skip_validationsis only set in trusted contexts (e.g., system-level imports) and never exposed to user input.lib/post_creator.rb (1)
216-216: LGTM.The
cook_methodassignment follows the established pattern for optional post attributes and integrates cleanly with the embedding workflow.config/site_settings.yml (1)
353-358: LGTM!The new embedding settings are well-structured and follow the existing patterns in the file. Default values are appropriate (feature disabled by default with empty configuration strings).
config/locales/client.en.yml (1)
1477-1477: LGTM!The new
embeddingcategory label is correctly added to the admin site settings categories, matching the new embedding section insite_settings.yml.config/locales/server.en.yml (2)
32-38: LGTM!The embed namespace translations are well-structured and provide clear, user-friendly strings for the embedding UI. The
imported_fromkey correctly uses Rails i18n interpolation syntax with%{link}.
768-772: LGTM!The site settings descriptions are clear and informative, properly explaining the purpose of each embedding configuration option.
app/jobs/regular/retrieve_topic.rb (1)
9-18: LGTM with minor suggestion.The job logic is correct: validates input, optionally resolves user, and delegates to
TopicRetriever. Theno_throttleflag based on staff status is a good UX touch.Consider using
User.find_by(id: args[:user_id])for clarity.spec/jobs/poll_feed_spec.rb (1)
8-37: Good coverage of execute gating conditions.The tests properly verify all three conditions that gate the
poll_feedcall. The test structure is clear and follows the implementation logic.Consider adding integration tests for
poll_feeditself (e.g., with a mocked HTTP response) to verify the actual feed processing behavior.lib/tasks/disqus.thor (1)
148-148: The review comment is based on an incorrect assumption about the data available in the Disqus export. The thread object (t) contains only metadata (:title,:link, and:posts), not the main thread content. The:cookedcontent exists only for individual posts withint[:posts], not for the thread itself. Since the main thread body content is not available in the Disqus data,TopicEmbed.import_remoteis necessary to fetch the article content from the source URL. The code correctly handles individual post content by usingp[:cooked]for replies in the subsequent loop.Likely an incorrect or invalid review comment.
app/views/embed/loading.html.erb (1)
1-4: Header implementation is clean.Uses proper Rails helpers for translation and link generation, which handle escaping correctly.
app/views/layouts/embed.html.erb (1)
1-5: Layout structure is appropriate for embed use case.Minimal HTML structure with embed-specific stylesheet is suitable for iframe embedding.
Also applies to: 16-20
spec/models/topic_embed_spec.rb (2)
23-44: Good test coverage for the import functionality.The tests appropriately verify:
- Post creation with valid URLs
- Raw HTML rendering mode
- Relative-to-absolute URL conversion
- Update/re-import behavior
5-8: Association and validation tests are appropriate.Uses shoulda matchers correctly for testing model associations and validations.
app/views/embed/best.html.erb (1)
19-19: Ensurepost.cookedis sanitized before storage.Using
rawbypasses escaping entirely, trusting the content completely. Verify thatcookedHTML is sanitized server-side when the post is created/updated to prevent stored XSS.spec/components/topic_retriever_spec.rb (2)
9-25: Tests comprehensively cover host validation edge cases.Good coverage of the various failure modes: nil host, mismatched host, and invalid URL. The test structure clearly separates concerns.
32-42: Throttling behavior well tested.The tests properly verify the
retrieved_recently?throttling logic with both true and false cases.app/controllers/embed_controller.rb (1)
8-10: Action logic is clean and well-structured.The
bestaction properly handles both cases: existing topic (renders view) and missing topic (enqueues job and renders loading state). Good separation of concerns.
| var comments = document.getElementById('discourse-comments'), | ||
| iframe = document.createElement('iframe'); | ||
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | ||
| iframe.id = 'discourse-embed-frame'; | ||
| iframe.width = "100%"; | ||
| iframe.frameBorder = "0"; | ||
| iframe.scrolling = "no"; | ||
| comments.appendChild(iframe); |
There was a problem hiding this comment.
Add null check for the container element.
If #discourse-comments element doesn't exist on the page, comments.appendChild(iframe) will throw an error.
💡 Proposed fix
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
+
+ if (!comments) { return; }
+
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| var comments = document.getElementById('discourse-comments'), | |
| iframe = document.createElement('iframe'); | |
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | |
| iframe.id = 'discourse-embed-frame'; | |
| iframe.width = "100%"; | |
| iframe.frameBorder = "0"; | |
| iframe.scrolling = "no"; | |
| comments.appendChild(iframe); | |
| var comments = document.getElementById('discourse-comments'), | |
| iframe = document.createElement('iframe'); | |
| if (!comments) { return; } | |
| iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl); | |
| iframe.id = 'discourse-embed-frame'; | |
| iframe.width = "100%"; | |
| iframe.frameBorder = "0"; | |
| iframe.scrolling = "no"; | |
| comments.appendChild(iframe); |
🤖 Prompt for AI Agents
In @app/assets/javascripts/embed.js around lines 5 - 12, The code assumes
document.getElementById('discourse-comments') returns an element and will throw
if null; modify the logic around the comments variable (result of
document.getElementById('discourse-comments')) to check for null before
creating/appending the iframe (id 'discourse-embed-frame'), and only build and
append the iframe if comments exists—optionally log or return early when
comments is null.
| function postMessageReceived(e) { | ||
| if (!e) { return; } | ||
| if (discourseUrl.indexOf(e.origin) === -1) { return; } | ||
|
|
||
| if (e.data) { | ||
| if (e.data.type === 'discourse-resize' && e.data.height) { | ||
| iframe.height = e.data.height + "px"; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Security: Origin check using indexOf is bypassable.
The check discourseUrl.indexOf(e.origin) === -1 can be bypassed. If discourseUrl is https://discourse.example.com, an attacker origin like https://discourse.example.com.evil.com would pass this check since indexOf finds a substring match.
🔒 Proposed fix with strict origin validation
function postMessageReceived(e) {
if (!e) { return; }
- if (discourseUrl.indexOf(e.origin) === -1) { return; }
+ // Extract origin from discourseUrl for strict comparison
+ var defined;
+ try {
+ defined = new URL(discourseUrl).origin;
+ } catch (err) {
+ return;
+ }
+ if (e.origin !== defined) { return; }
if (e.data) {
if (e.data.type === 'discourse-resize' && e.data.height) {
iframe.height = e.data.height + "px";
}
}
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| function postMessageReceived(e) { | |
| if (!e) { return; } | |
| if (discourseUrl.indexOf(e.origin) === -1) { return; } | |
| if (e.data) { | |
| if (e.data.type === 'discourse-resize' && e.data.height) { | |
| iframe.height = e.data.height + "px"; | |
| } | |
| } | |
| } | |
| function postMessageReceived(e) { | |
| if (!e) { return; } | |
| // Extract origin from discourseUrl for strict comparison | |
| var defined; | |
| try { | |
| defined = new URL(discourseUrl).origin; | |
| } catch (err) { | |
| return; | |
| } | |
| if (e.origin !== defined) { return; } | |
| if (e.data) { | |
| if (e.data.type === 'discourse-resize' && e.data.height) { | |
| iframe.height = e.data.height + "px"; | |
| } | |
| } | |
| } |
🤖 Prompt for AI Agents
In @app/assets/javascripts/embed.js around lines 15 - 24, The origin check in
postMessageReceived uses discourseUrl.indexOf(e.origin) which is vulnerable to
substring matches; replace it with a strict origin validation by comparing
e.origin exactly to the expected origin(s) (e.g. e.origin === discourseUrl) or
by parsing both with the URL constructor and comparing protocol + host (or
maintaining an allowlist array and checking membership), and then only proceed
to set iframe.height when the strict check passes; update references in
postMessageReceived, discourseUrl and the iframe.height assignment to use the
new exact-origin logic.
| skip_before_filter :check_xhr | ||
| skip_before_filter :preload_json | ||
| before_filter :ensure_embeddable |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
before_filter skip_before_filter Rails deprecation removal timeline
💡 Result:
Short answer: Rails deprecated the *_filter methods (including before_filter and skip_before_filter) in Rails 4.2 and removed them in Rails 5.1 (use before_action / skip_before_action instead). (guides.rubyonrails.org)
Citations:
🏁 Script executed:
# Check Rails version from Gemfile or gemspec
fd -e Gemfile -e "*.gemspec" | head -5 | xargs -I {} cat {} | grep -i "^gem.*rails" | head -5Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 67
🏁 Script executed:
# Check if there are other uses of skip_before_filter or before_filter in the codebase
rg "skip_before_filter|before_filter" --type rb | head -20Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 111
🏁 Script executed:
# Search for Gemfile more broadly
find . -name "Gemfile" -o -name "*.gemspec" | xargs cat 2>/dev/null | grep -i "rails"Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 1871
🏁 Script executed:
# Search for deprecated filter methods without type specification
rg "skip_before_filter|before_filter" --max-count 30Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 50398
🏁 Script executed:
# Also check the specific file mentioned in the review
cat -n app/controllers/embed_controller.rb | head -20Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 758
Replace deprecated filter methods with Rails 5+ compatible versions.
skip_before_filter and before_filter were deprecated in Rails 4.2 and removed in Rails 5.1. Update to skip_before_action and before_action:
Suggested change
- skip_before_filter :check_xhr
- skip_before_filter :preload_json
- before_filter :ensure_embeddable
+ skip_before_action :check_xhr
+ skip_before_action :preload_json
+ before_action :ensure_embeddable📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| skip_before_filter :check_xhr | |
| skip_before_filter :preload_json | |
| before_filter :ensure_embeddable | |
| skip_before_action :check_xhr | |
| skip_before_action :preload_json | |
| before_action :ensure_embeddable |
🤖 Prompt for AI Agents
In @app/controllers/embed_controller.rb around lines 2 - 4, Replace deprecated
filter methods: change uses of skip_before_filter and before_filter in this
controller to the Rails 5+ equivalents skip_before_action and before_action
(e.g., update the lines referencing skip_before_filter :check_xhr,
skip_before_filter :preload_json, and before_filter :ensure_embeddable to use
skip_before_action and before_action respectively) so the controller uses
ensure_embeddable with the modern callbacks API.
| def feed_key | ||
| @feed_key ||= "feed-modified:#{Digest::SHA1.hexdigest(SiteSetting.feed_polling_url)}" | ||
| end |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
rg -n "feed_key" --type rubyRepository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 263
Remove the feed_key method or document its intent with a TODO comment.
The method is defined but never used anywhere in the codebase. If this is part of a planned feature, add a TODO comment explaining the intention; otherwise, remove it.
🤖 Prompt for AI Agents
In @app/jobs/scheduled/poll_feed.rb around lines 20 - 22, The method feed_key is
defined but unused; either remove the feed_key method entirely or, if it's
intended for a future feature, add a concise TODO above it explaining its
purpose and expected usage (e.g., why it computes
"feed-modified:#{Digest::SHA1.hexdigest(SiteSetting.feed_polling_url)}" and when
it will be referenced). Ensure the change targets the feed_key method and
mentions the dependency on SiteSetting.feed_polling_url in the TODO so future
readers know the context.
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) |
There was a problem hiding this comment.
Security: Use URI.open with explicit require for URL fetching.
The Brakeman warning is valid. Using open() with a URL, even from admin settings, should use URI.open explicitly (after require 'open-uri') to make the intent clear and avoid potential issues with Kernel#open.
🔒 Proposed fix
require 'simple-rss'
- rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
+ rss = SimpleRSS.parse URI.open(SiteSetting.feed_polling_url)🧰 Tools
🪛 Brakeman (7.1.1)
[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/
(File Access)
🤖 Prompt for AI Agents
In @app/jobs/scheduled/poll_feed.rb at line 29, The call using Kernel#open in
the feed poll job should explicitly use open-uri: add require 'open-uri' to the
file and change the SimpleRSS.parse invocation to use
URI.open(SiteSetting.feed_polling_url) instead of open(...); update the line
referencing SimpleRSS.parse and SiteSetting.feed_polling_url so it calls
URI.open to satisfy Brakeman and make the URL fetch intent explicit.
| class AddCookMethodToPosts < ActiveRecord::Migration | ||
| def change | ||
| add_column :posts, :cook_method, :integer, default: 1, null: false | ||
| end | ||
| end |
There was a problem hiding this comment.
Fix default value to match :regular enum.
The default value 1 maps to :raw_html in the Post.cook_methods enum, which would cause all new posts to skip cooking by default. Change to default: 0 to ensure normal posts use regular cooking.
🐛 Proposed fix
class AddCookMethodToPosts < ActiveRecord::Migration
def change
- add_column :posts, :cook_method, :integer, default: 1, null: false
+ add_column :posts, :cook_method, :integer, default: 0, null: false
end
end📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class AddCookMethodToPosts < ActiveRecord::Migration | |
| def change | |
| add_column :posts, :cook_method, :integer, default: 1, null: false | |
| end | |
| end | |
| class AddCookMethodToPosts < ActiveRecord::Migration | |
| def change | |
| add_column :posts, :cook_method, :integer, default: 0, null: false | |
| end | |
| end |
🤖 Prompt for AI Agents
In @db/migrate/20131219203905_add_cook_method_to_posts.rb around lines 1 - 5,
The migration AddCookMethodToPosts sets the new posts.cook_method default to 1
which maps to :raw_html; update the add_column call in the change method to use
default: 0 so the enum value :regular is the default for new posts (modify the
add_column line in AddCookMethodToPosts accordingly and run/redo the migration
as needed).
| class CreateTopTopics < ActiveRecord::Migration | ||
| def change | ||
| create_table :top_topics do |t| | ||
| create_table :top_topics, force: true do |t| |
There was a problem hiding this comment.
Adding force: true is destructive and may cause data loss.
The force: true option will drop the existing top_topics table and recreate it, deleting all existing data. This is dangerous in production environments where this migration may have already been run.
If you need to modify an existing table, consider using a separate migration with add_column, remove_column, or change_column instead.
🤖 Prompt for AI Agents
In @db/migrate/20131223171005_create_top_topics.rb at line 3, Remove the
destructive `force: true` option from the `create_table :top_topics` call so the
migration does not drop/recreate the existing table; instead create a new,
non-destructive migration (e.g., a separate migration file with `add_column`,
`remove_column`, or `change_column`) to modify the `top_topics` schema if
needed, and ensure the original migration only creates the table when it does
not already exist.
| if $redis.setnx(retrieved_key, "1") | ||
| $redis.expire(retrieved_key, 60) | ||
| return false | ||
| end |
There was a problem hiding this comment.
Race condition between setnx and expire calls.
The setnx and expire are separate operations, creating a small window where if the process crashes between them, the key may persist indefinitely without an expiration. Use set with nx and ex options in a single atomic call.
🔧 Proposed fix using atomic SET with NX and EX
- if $redis.setnx(retrieved_key, "1")
- $redis.expire(retrieved_key, 60)
- return false
- end
+ # Atomic set with NX (only if not exists) and EX (expiration in seconds)
+ return false if $redis.set(retrieved_key, "1", nx: true, ex: 60)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if $redis.setnx(retrieved_key, "1") | |
| $redis.expire(retrieved_key, 60) | |
| return false | |
| end | |
| # Atomic set with NX (only if not exists) and EX (expiration in seconds) | |
| return false if $redis.set(retrieved_key, "1", nx: true, ex: 60) |
🤖 Prompt for AI Agents
In @lib/topic_retriever.rb around lines 27 - 30, Replace the two-step
$redis.setnx(...) followed by $redis.expire(...) with a single atomic SET call
using NX and EX options: call $redis.set(retrieved_key, "1", nx: true, ex: 60)
and treat a truthy/"OK" return as the key being newly set (preserve the original
behavior of returning false in that case), otherwise proceed as when the key
already existed; this eliminates the race window between setnx and expire.
| if SiteSetting.feed_polling_enabled? | ||
| Jobs::PollFeed.new.execute({}) | ||
| return if TopicEmbed.where(embed_url: @embed_url).exists? | ||
| end |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how PollFeed is typically invoked elsewhere in the codebase
rg -n "PollFeed" --type ruby -C 3Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 1315
🏁 Script executed:
#!/bin/bash
# Find the Jobs::Scheduled base class definition
fd -e rb | xargs grep -l "class Scheduled" | head -5
# Also check for typical job enqueue patterns in the codebase
rg -n "\.enqueue" --type ruby | grep -i "job" | head -10
# And check for other Jobs invocations in the topic_retriever context
rg -n "Jobs::" lib/topic_retriever.rb -C 2Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 1227
🏁 Script executed:
#!/bin/bash
# Read the base.rb file to understand the job system
cat app/jobs/base.rb | head -100Repository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 2294
🏁 Script executed:
#!/bin/bash
# Read the PollFeed job implementation
cat app/jobs/scheduled/poll_feed.rbRepository: AI-Code-Review-Evals/coderabbit-discourse
Length of output: 1156
Synchronous job execution blocks the retrieval flow.
Jobs::PollFeed.new.execute({}) runs the entire feed polling synchronously within this request path. The poll_feed method performs blocking I/O (HTTP request to fetch the RSS feed, parsing, and database operations for each item). If the feed is large or the network is slow, this will block. Consider whether this is intentional or if it should be enqueued asynchronously.
🤖 Prompt for AI Agents
In @lib/topic_retriever.rb around lines 40 - 43, The code currently calls
Jobs::PollFeed.new.execute({}) synchronously inside the request flow, which
blocks on I/O in poll_feed; instead enqueue the poll job asynchronously (use the
job's enqueue API such as Jobs.enqueue(:poll_feed, ...) or
Jobs::PollFeed.enqueue(...)) and remove the direct new.execute call so the
request does not wait for feed fetching; keep the existing guard that checks
TopicEmbed.where(embed_url: @embed_url).exists? and ensure
SiteSetting.feed_polling_enabled? still gates enqueuing.
| it "creates a topic view when a topic_id is found" do | ||
| TopicEmbed.expects(:topic_id_for_embed).returns(123) | ||
| TopicView.expects(:new).with(123, nil, {best: 5}) | ||
| get :best, embed_url: embed_url | ||
| end |
There was a problem hiding this comment.
Mock should return an object for TopicView.new.
The controller assigns @topic_view = TopicView.new(...). If the mock doesn't return anything, @topic_view will be nil, potentially causing the view to fail or behave unexpectedly. Return a mock object.
🐛 Proposed fix
it "creates a topic view when a topic_id is found" do
TopicEmbed.expects(:topic_id_for_embed).returns(123)
- TopicView.expects(:new).with(123, nil, {best: 5})
+ topic_view = mock('topic_view', posts: [], topic: mock('topic', url: 'http://example.com'))
+ TopicView.expects(:new).with(123, nil, {best: 5}).returns(topic_view)
get :best, embed_url: embed_url
end🤖 Prompt for AI Agents
In @spec/controllers/embed_controller_spec.rb around lines 48 - 52, The test
currently stubs TopicEmbed.topic_id_for_embed but calls TopicView.new without
returning an instance, so @topic_view becomes nil; update the test to return a
mock TopicView instance by creating a mock (e.g., mock_topic_view) and have
TopicView.expects(:new).with(123, nil, {best: 5}).returns(mock_topic_view) so
the controller assigns a real (mock) object to @topic_view; ensure the mock
responds to any methods the view expects.
Test 4
Summary by CodeRabbit
New Features
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.
Replicated from ai-code-review-evaluation/discourse-coderabbit#4