Skip to content

Enhance embed URL handling and validation system#4

Open
everettbu wants to merge 1 commit into
embed-url-handling-prefrom
embed-url-handling-post
Open

Enhance embed URL handling and validation system#4
everettbu wants to merge 1 commit into
embed-url-handling-prefrom
embed-url-handling-post

Conversation

@everettbu

Copy link
Copy Markdown
Contributor

Test 4

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR implements a comprehensive embedding system for Discourse that allows external websites to embed forum discussions and automatically import content from RSS/ATOM feeds. The changes introduce several key components:

Core Embedding Infrastructure:

  • New TopicEmbed model handles importing external content via URLs using the ruby-readability gem for content parsing and Nokogiri for URL absolutization
  • EmbedController provides the main endpoint (/embed/best) for external sites to retrieve embedded content with host validation and iframe support
  • TopicRetriever class manages URL validation, throttling (60-second cooldown), and delegates to TopicEmbed for content fetching
  • New database tables (topic_embeds) track relationships between external URLs and Discourse topics

Content Processing Enhancements:

  • Extended Post model with a cook_methods enum supporting both regular markdown and raw HTML processing
  • Modified PostCreator and PostRevisor to handle custom cook methods and optional validation bypassing
  • New scheduled job PollFeed automatically imports RSS/ATOM feeds hourly using the simple-rss gem

Frontend Integration:

  • JavaScript embed widget (embed.js) for third-party sites to load Discourse content in iframes
  • Responsive embed layout with postMessage communication for dynamic resizing
  • Loading states and "best posts" view for embedded content display
  • Dedicated CSS styling optimized for embedded contexts

Configuration & Localization:

  • New site settings section for embedding configuration (embeddable hosts, feed polling, embed user)
  • Comprehensive localization strings for both client and server-side embed functionality
  • Updated Disqus import task to leverage the new embed system

The system follows a security model where only configured hosts can embed content, with referer validation and throttling to prevent abuse. Background jobs handle potentially slow content retrieval operations asynchronously.

Confidence score: 1/5

  • This PR contains multiple critical security vulnerabilities that make it unsafe to merge in its current state
  • The implementation uses deprecated and unsafe methods like open() for HTTP requests, has weak origin validation in JavaScript, potential XSS vulnerabilities from unescaped content, and SSRF attack vectors
  • Files requiring immediate attention: app/models/topic_embed.rb, app/assets/javascripts/embed.js, app/controllers/embed_controller.rb, app/jobs/scheduled/poll_feed.rb, app/views/layouts/embed.html.erb

27 files reviewed, 28 comments

Edit Code Review Bot Settings | Greptile

class CreateTopTopics < ActiveRecord::Migration
def change
create_table :top_topics do |t|
create_table :top_topics, force: true do |t|

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Adding force: true makes this migration destructive - it will drop and recreate the table if it exists, potentially causing data loss in production

Suggested change
create_table :top_topics, force: true do |t|
create_table :top_topics do |t|

<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

syntax: Unusual end if syntax - should be just end

Suggested change
<%- end if %>
<%- end %>

Comment thread app/models/topic_embed.rb
require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Using open() is deprecated and creates security vulnerabilities (command injection, SSRF). Use Net::HTTP, HTTParty, or similar safe HTTP libraries instead.

Suggested change
doc = Readability::Document.new(open(url).read,
doc = Readability::Document.new(Net::HTTP.get_response(URI(url)).body,

@@ -0,0 +1,40 @@
require 'spec_helper'
require_dependency 'jobs/regular/process_post'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Incorrect require_dependency path - should be 'jobs/scheduled/poll_feed' not 'jobs/regular/process_post'

Comment thread app/models/topic_embed.rb

# Import an article from a source (RSS/Atom/Other)
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: URL validation is insufficient - allows access to internal services (localhost, 192.168.x.x, etc). Add proper SSRF protection.


def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Host validation can be bypassed if request.referer is nil - the || '' fallback makes URI('').host return nil, which could equal SiteSetting.embeddable_host if it's also nil


function postMessageReceived(e) {
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Security flaw - indexOf allows subdomain attacks. Use exact origin matching: e.origin !== discourseUrl.replace(/\/$/, '')

Suggested change
if (discourseUrl.indexOf(e.origin) === -1) { return; }
var expectedOrigin = discourseUrl.replace(/\/$/, '').replace(/^(https?:\/\/[^/]+).*/, '$1');
if (e.origin !== expectedOrigin) { return; }

raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

response.headers['X-Frame-Options'] = "ALLOWALL"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Setting X-Frame-Options to 'ALLOWALL' permits embedding from any domain, potentially enabling clickjacking attacks from malicious sites

return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Security vulnerability: using open without URL validation allows potential SSRF attacks and local file access

Suggested change
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
require 'net/http'
require 'uri'
uri = URI.parse(SiteSetting.feed_polling_url)
raise ArgumentError, "Invalid URL scheme" unless %w[http https].include?(uri.scheme)
response = Net::HTTP.get_response(uri)
raise "HTTP error: #{response.code}" unless response.is_a?(Net::HTTPSuccess)
rss = SimpleRSS.parse response.body


user = nil
if args[:user_id]
user = User.where(id: args[:user_id]).first

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: use find_by(id: args[:user_id]) instead of where().first for cleaner code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants