Enhance embed URL handling and validation system#4
Conversation
There was a problem hiding this comment.
Greptile Summary
This PR implements a comprehensive embedding system for Discourse that allows external websites to embed forum discussions and automatically import content from RSS/ATOM feeds. The changes introduce several key components:
Core Embedding Infrastructure:
- New
TopicEmbedmodel handles importing external content via URLs using the ruby-readability gem for content parsing and Nokogiri for URL absolutization EmbedControllerprovides the main endpoint (/embed/best) for external sites to retrieve embedded content with host validation and iframe supportTopicRetrieverclass manages URL validation, throttling (60-second cooldown), and delegates to TopicEmbed for content fetching- New database tables (
topic_embeds) track relationships between external URLs and Discourse topics
Content Processing Enhancements:
- Extended
Postmodel with acook_methodsenum supporting both regular markdown and raw HTML processing - Modified
PostCreatorandPostRevisorto handle custom cook methods and optional validation bypassing - New scheduled job
PollFeedautomatically imports RSS/ATOM feeds hourly using the simple-rss gem
Frontend Integration:
- JavaScript embed widget (
embed.js) for third-party sites to load Discourse content in iframes - Responsive embed layout with postMessage communication for dynamic resizing
- Loading states and "best posts" view for embedded content display
- Dedicated CSS styling optimized for embedded contexts
Configuration & Localization:
- New site settings section for embedding configuration (embeddable hosts, feed polling, embed user)
- Comprehensive localization strings for both client and server-side embed functionality
- Updated Disqus import task to leverage the new embed system
The system follows a security model where only configured hosts can embed content, with referer validation and throttling to prevent abuse. Background jobs handle potentially slow content retrieval operations asynchronously.
Confidence score: 1/5
- This PR contains multiple critical security vulnerabilities that make it unsafe to merge in its current state
- The implementation uses deprecated and unsafe methods like
open()for HTTP requests, has weak origin validation in JavaScript, potential XSS vulnerabilities from unescaped content, and SSRF attack vectors - Files requiring immediate attention:
app/models/topic_embed.rb,app/assets/javascripts/embed.js,app/controllers/embed_controller.rb,app/jobs/scheduled/poll_feed.rb,app/views/layouts/embed.html.erb
27 files reviewed, 28 comments
| class CreateTopTopics < ActiveRecord::Migration | ||
| def change | ||
| create_table :top_topics do |t| | ||
| create_table :top_topics, force: true do |t| |
There was a problem hiding this comment.
logic: Adding force: true makes this migration destructive - it will drop and recreate the table if it exists, potentially causing data loss in production
| create_table :top_topics, force: true do |t| | |
| create_table :top_topics do |t| |
| <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- else %> | ||
| <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %> | ||
| <%- end if %> |
There was a problem hiding this comment.
syntax: Unusual end if syntax - should be just end
| <%- end if %> | |
| <%- end %> |
| require 'ruby-readability' | ||
|
|
||
| opts = opts || {} | ||
| doc = Readability::Document.new(open(url).read, |
There was a problem hiding this comment.
logic: Using open() is deprecated and creates security vulnerabilities (command injection, SSRF). Use Net::HTTP, HTTParty, or similar safe HTTP libraries instead.
| doc = Readability::Document.new(open(url).read, | |
| doc = Readability::Document.new(Net::HTTP.get_response(URI(url)).body, |
| @@ -0,0 +1,40 @@ | |||
| require 'spec_helper' | |||
| require_dependency 'jobs/regular/process_post' | |||
There was a problem hiding this comment.
logic: Incorrect require_dependency path - should be 'jobs/scheduled/poll_feed' not 'jobs/regular/process_post'
|
|
||
| # Import an article from a source (RSS/Atom/Other) | ||
| def self.import(user, url, title, contents) | ||
| return unless url =~ /^https?\:\/\// |
There was a problem hiding this comment.
logic: URL validation is insufficient - allows access to internal services (localhost, 192.168.x.x, etc). Add proper SSRF protection.
|
|
||
| def ensure_embeddable | ||
| raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank? | ||
| raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host |
There was a problem hiding this comment.
logic: Host validation can be bypassed if request.referer is nil - the || '' fallback makes URI('').host return nil, which could equal SiteSetting.embeddable_host if it's also nil
|
|
||
| function postMessageReceived(e) { | ||
| if (!e) { return; } | ||
| if (discourseUrl.indexOf(e.origin) === -1) { return; } |
There was a problem hiding this comment.
logic: Security flaw - indexOf allows subdomain attacks. Use exact origin matching: e.origin !== discourseUrl.replace(/\/$/, '')
| if (discourseUrl.indexOf(e.origin) === -1) { return; } | |
| var expectedOrigin = discourseUrl.replace(/\/$/, '').replace(/^(https?:\/\/[^/]+).*/, '$1'); | |
| if (e.origin !== expectedOrigin) { return; } |
| raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank? | ||
| raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host | ||
|
|
||
| response.headers['X-Frame-Options'] = "ALLOWALL" |
There was a problem hiding this comment.
logic: Setting X-Frame-Options to 'ALLOWALL' permits embedding from any domain, potentially enabling clickjacking attacks from malicious sites
| return if user.blank? | ||
|
|
||
| require 'simple-rss' | ||
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) |
There was a problem hiding this comment.
logic: Security vulnerability: using open without URL validation allows potential SSRF attacks and local file access
| rss = SimpleRSS.parse open(SiteSetting.feed_polling_url) | |
| require 'net/http' | |
| require 'uri' | |
| uri = URI.parse(SiteSetting.feed_polling_url) | |
| raise ArgumentError, "Invalid URL scheme" unless %w[http https].include?(uri.scheme) | |
| response = Net::HTTP.get_response(uri) | |
| raise "HTTP error: #{response.code}" unless response.is_a?(Net::HTTPSuccess) | |
| rss = SimpleRSS.parse response.body |
|
|
||
| user = nil | ||
| if args[:user_id] | ||
| user = User.where(id: args[:user_id]).first |
There was a problem hiding this comment.
style: use find_by(id: args[:user_id]) instead of where().first for cleaner code
Test 4