Fixes: driver/gpu/drm/bridge adv7511_drv.c determines EDID read success when the DDC communication error #22

Open
ymh-ryutaro1-okada wants to merge 1 commit into nxp-imx:lf-6.6.y from ymh-ryutaro1-okada:fixes_adv7511drv_HPD_event_asserts_at_DDC_error

ymh-ryutaro1-okada commented Sep 2, 2024

Issue:

We found that the following two problems occur when an HPD event is asserted:
・The ADV7511/ADV7535 fails to read the EDID (all bytes are 0x00). (The failed read is harmless by itself, until the data is used.)
・DRM tries to update the list of supported resolutions using the invalid EDID (all 0x00).
→ This sometimes causes unnecessary dynamic resolution changes on the HDMI output of the ADV7511/ADV7535.

How to fix:

HDCP_ERROR_INT means that an HDCP/DDC controller error occurred; it does not mean that the EDID was read successfully. However, the current implementation treats both ADV7511_INT0_EDID_READY and ADV7511_INT1_DDC_ERROR as a successful EDID read. ADV7511_INT1_DDC_ERROR should instead be handled as an EDID read failure or an EDID read timeout. The EDID read timeout sequence is already implemented in adv7511_wait_for_edid(), so it is safe to ignore DDC_ERROR.

[Image: register map screenshot]
※ ADI has permitted the use of this register map screenshot.
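
The change in interrupt handling described above can be sketched in user-space C as follows. This is only an illustration, not the driver code: the `DEMO_*` masks and both helper functions are hypothetical names, and the bit positions are placeholders rather than values taken from the register map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder masks for this sketch only; the real masks live in the
 * driver's header and may use different bit positions. */
#define DEMO_INT0_EDID_READY	(1u << 2)
#define DEMO_INT1_DDC_ERROR	(1u << 7)

static_assert(DEMO_INT0_EDID_READY <= 0xffu && DEMO_INT1_DDC_ERROR <= 0xffu,
	      "masks must fit an 8-bit interrupt register");

/* Behaviour described as current: either interrupt is taken to mean
 * that the EDID has been read. */
static bool edid_read_done_before(uint8_t irq0, uint8_t irq1)
{
	return (irq0 & DEMO_INT0_EDID_READY) || (irq1 & DEMO_INT1_DDC_ERROR);
}

/* Proposed behaviour: only EDID_READY counts. A DDC error is ignored
 * here, so the waiter keeps waiting and eventually hits its timeout. */
static bool edid_read_done_after(uint8_t irq0, uint8_t irq1)
{
	(void)irq1;
	return (irq0 & DEMO_INT0_EDID_READY) != 0;
}
```

With only the DDC error bit set, `edid_read_done_before()` reports success while `edid_read_done_after()` does not, which is the difference this pull request is after.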

Root cause:

adv7511_drv.c treats the EDID read as successful when the ADV7511/ADV7535 INT1 bit (HDCP_ERROR_INT) is asserted.
(The DRM debug log follows; the @ annotations naming each IRQ were added by hand.)

DRM debug log

DRM tries to determine the new resolution from the invalid EDID (all 0x00).

[ 6937.533628] drm card1-HDMI-A-1: [drm:adv7511_irq_process] IRQ0:(80) @ADV7511_INT0_EDID_READY
[ 6937.533659] drm card1-HDMI-A-1: [drm:adv7511_irq_process] IRQ1:(00)
[ 6937.533982] drm card1-HDMI-A-1: [drm:adv7511_hpd_work] HDMI HPD event: connected
[ 6937.533992] [drm:drm_sysfs_hotplug_event] generating hotplug event
[ 6937.534058] imx-drm 32c00000.bus:display-subsystem: [drm:drm_client_dev_hotplug] fbdev: ret=0
[ 6937.535023] [drm:drm_ioctl] comm="weston" pid=658, dev=0xe201, auth=1, DRM_IOCTL_MODE_GETRESOURCES
[ 6937.535058] [drm:drm_ioctl] comm="weston" pid=658, dev=0xe201, auth=1, DRM_IOCTL_MODE_GETRESOURCES
[ 6937.535073] [drm:drm_ioctl] comm="weston" pid=658, dev=0xe201, auth=1, DRM_IOCTL_MODE_GETCONNECTOR
[ 6937.535087] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:35:HDMI-A-1]
[ 6937.538715] drm card1-HDMI-A-1: [drm:adv7511_irq_process] IRQ0:(00)
[ 6937.538742] drm card1-HDMI-A-1: [drm:adv7511_irq_process] IRQ1:(80) @ADV7511_INT1_DDC_ERROR
[ 6937.548077] imx-drm 32c00000.bus:display-subsystem: [drm:connector_bad_edid] HDMI-A-1: EDID is invalid:
[ 6937.548107] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548111] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548114] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548117] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548120] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548123] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548126] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.548129] [00] ZERO 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 6937.551071] [drm:drm_mode_debug_printmodeline] Modeline "640x480": 60 25175 640 656 752 800 480 490 492 525 0x40 0xa
[ 6937.551097] [drm:drm_mode_prune_invalid] Not using 640x480 mode: NOCLOCK
[ 6937.551104] [drm:drm_mode_debug_printmodeline] Modeline "800x600": 56 36000 800 824 896 1024 600 601 603 625 0x40 0x5
[ 6937.551116] [drm:drm_mode_prune_invalid] Not using 800x600 mode: NOCLOCK
[ 6937.551122] [drm:drm_mode_debug_printmodeline] Modeline "800x600": 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[ 6937.551136] [drm:drm_mode_prune_invalid] Not using 800x600 mode: BAD
[ 6937.551143] [drm:drm_mode_debug_printmodeline] Modeline "848x480": 60 33750 848 864 976 1088 480 486 494 517 0x40 0x5
[ 6937.551156] [drm:drm_mode_prune_invalid] Not using 848x480 mode: NOCLOCK
[ 6937.551164] [drm:drm_mode_debug_printmodeline] Modeline "1024x768": 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[ 6937.551178] [drm:drm_mode_prune_invalid] Not using 1024x768 mode: BAD
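
As a side note on why an all-zero block is rejected, here is a minimal sketch of two basic checks on a 128-byte EDID base block: the fixed 8-byte header and the checksum (all bytes sum to 0 modulo 256). The function names are hypothetical and this is not the DRM core's validation code. An all-zero block passes the checksum trivially, so it is the header check that catches it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EDID_BLOCK_SIZE 128

/* Fixed pattern that starts every EDID base block. */
static const uint8_t edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

static_assert(sizeof(edid_header) == 8, "EDID header is 8 bytes");

/* True if the block starts with the fixed EDID header. */
static bool edid_header_ok(const uint8_t *block)
{
	return memcmp(block, edid_header, sizeof(edid_header)) == 0;
}

/* True if all 128 bytes sum to 0 modulo 256. */
static bool edid_checksum_ok(const uint8_t *block)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		sum += block[i];
	return sum == 0;
}

/* True if every byte is 0x00, as in the dump above. */
static bool edid_all_zero(const uint8_t *block)
{
	for (size_t i = 0; i < EDID_BLOCK_SIZE; i++)
		if (block[i] != 0)
			return false;
	return true;
}
```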

Overdr0ne pushed a commit to Overdr0ne/linux-imx that referenced this pull request Jul 28, 2025
ossaleem pushed a commit to AirLinkOS/linux-imx that referenced this pull request Apr 29, 2026
[ Upstream commit 769e6a1 ]

ui_browser__show() is capturing the input title that is stack allocated
memory in hist_browser__run().

Avoid a use after return by strdup-ing the string.
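
The ownership pattern behind this fix can be sketched as below. It is a minimal illustration, not the perf code: `struct demo_browser` and the `demo_*` functions are hypothetical, and `demo_strdup()` is a local equivalent of POSIX `strdup()` so that the sketch builds as plain ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the browser object; not the real struct. */
struct demo_browser {
	char *title;	/* owned heap copy, never a pointer into a caller's stack */
};

/* Local equivalent of POSIX strdup(). */
static char *demo_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Store a private copy of the title. Returns 0 on success, -1 if the
 * allocation fails (the previous title is then left untouched). */
static int demo_browser_set_title(struct demo_browser *b, const char *title)
{
	char *copy;

	assert(b != NULL && title != NULL);
	copy = demo_strdup(title);
	if (!copy)
		return -1;
	free(b->title);
	b->title = copy;
	return 0;
}

/* Release the owned copy; without a call like this the copy leaks. */
static void demo_browser_exit(struct demo_browser *b)
{
	free(b->title);
	b->title = NULL;
}
```

Because the object holds its own copy, the caller's stack buffer can go out of scope or be overwritten without affecting later redraws. The cost is that someone has to free the copy, which is the leak trade-off mentioned in the committer notes.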

Committer notes:

Further explanation from Ian Rogers:

My command line using tui is:
$ sudo bash -c 'rm /tmp/asan.log*; export
ASAN_OPTIONS="log_path=/tmp/asan.log"; /tmp/perf/perf mem record -a
sleep 1; /tmp/perf/perf mem report'
I then go to the perf annotate view and quit. This triggers the asan
error (from the log file):
```
==1254591==ERROR: AddressSanitizer: stack-use-after-return on address
0x7f2813331920 at pc 0x7f2818065991 bp 0x7fff0a21c750 sp 0x7fff0a21bf10
READ of size 80 at 0x7f2813331920 thread T0
    #0 0x7f2818065990 in __interceptor_strlen
../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:461
    #1 0x7f2817698251 in SLsmg_write_wrapped_string
(/lib/x86_64-linux-gnu/libslang.so.2+0x98251)
    #2 0x7f28176984b9 in SLsmg_write_nstring
(/lib/x86_64-linux-gnu/libslang.so.2+0x984b9)
    #3 0x55c94045b365 in ui_browser__write_nstring ui/browser.c:60
    #4 0x55c94045c558 in __ui_browser__show_title ui/browser.c:266
    #5 0x55c94045c776 in ui_browser__show ui/browser.c:288
    #6 0x55c94045c06d in ui_browser__handle_resize ui/browser.c:206
    #7 0x55c94047979b in do_annotate ui/browsers/hists.c:2458
    #8 0x55c94047fb17 in evsel__hists_browse ui/browsers/hists.c:3412
    #9 0x55c940480a0c in perf_evsel_menu__run ui/browsers/hists.c:3527
    #10 0x55c940481108 in __evlist__tui_browse_hists ui/browsers/hists.c:3613
    #11 0x55c9404813f7 in evlist__tui_browse_hists ui/browsers/hists.c:3661
    #12 0x55c93ffa253f in report__browse_hists tools/perf/builtin-report.c:671
    #13 0x55c93ffa58ca in __cmd_report tools/perf/builtin-report.c:1141
    #14 0x55c93ffaf159 in cmd_report tools/perf/builtin-report.c:1805
    #15 0x55c94000c05c in report_events tools/perf/builtin-mem.c:374
    #16 0x55c94000d96d in cmd_mem tools/perf/builtin-mem.c:516
    #17 0x55c9400e44ee in run_builtin tools/perf/perf.c:350
    #18 0x55c9400e4a5a in handle_internal_command tools/perf/perf.c:403
    #19 0x55c9400e4e22 in run_argv tools/perf/perf.c:447
    #20 0x55c9400e53ad in main tools/perf/perf.c:561
    #21 0x7f28170456c9 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
    #22 0x7f2817045784 in __libc_start_main_impl ../csu/libc-start.c:360
    #23 0x55c93ff544c0 in _start (/tmp/perf/perf+0x19a4c0) (BuildId:
84899b0e8c7d3a3eaa67b2eb35e3d8b2f8cd4c93)

Address 0x7f2813331920 is located in stack of thread T0 at offset 32 in frame
    #0 0x55c94046e85e in hist_browser__run ui/browsers/hists.c:746

  This frame has 1 object(s):
    [32, 192) 'title' (line 747) <== Memory access at offset 32 is
inside this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism, swapcontext or vfork
```
hist_browser__run isn't on the stack so the asan error looks legit.
There's no clean init/exit on struct ui_browser so I may be trading a
use-after-return for a memory leak, but that seems like a good trade
anyway.

Fixes: 05e8b08 ("perf ui browser: Stop using 'self'")
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ben Gainey <ben.gainey@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Paran Lee <p4ranlee@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sun Haiyong <sunhaiyong@loongson.cn>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Yanteng Si <siyanteng@loongson.cn>
Cc: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240507183545.1236093-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
ossaleem pushed a commit to AirLinkOS/linux-imx that referenced this pull request Apr 29, 2026
…uddy pages

commit 8cf360b upstream.

When I ran memory failure tests recently, the panic below occurred:

page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
raw: 06fffe0000000000 dead000000000100 dead000000000122 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(!PageBuddy(page))
------------[ cut here ]------------
kernel BUG at include/linux/page-flags.h:1009!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:__del_page_from_free_list+0x151/0x180
RSP: 0018:ffffa49c90437998 EFLAGS: 00000046
RAX: 0000000000000035 RBX: 0000000000000009 RCX: ffff8dd8dfd1c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff8dd8dfd1c9c0
RBP: ffffd901233b8000 R08: ffffffffab5511f8 R09: 0000000000008c69
R10: 0000000000003c15 R11: ffffffffab5511f8 R12: ffff8dd8fffc0c80
R13: 0000000000000001 R14: ffff8dd8fffc0c80 R15: 0000000000000009
FS:  00007ff916304740(0000) GS:ffff8dd8dfd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055eae50124c8 CR3: 00000008479e0000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __rmqueue_pcplist+0x23b/0x520
 get_page_from_freelist+0x26b/0xe40
 __alloc_pages_noprof+0x113/0x1120
 __folio_alloc_noprof+0x11/0xb0
 alloc_buddy_hugetlb_folio.isra.0+0x5a/0x130
 __alloc_fresh_hugetlb_folio+0xe7/0x140
 alloc_pool_huge_folio+0x68/0x100
 set_max_huge_pages+0x13d/0x340
 hugetlb_sysctl_handler_common+0xe8/0x110
 proc_sys_call_handler+0x194/0x280
 vfs_write+0x387/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff916114887
RSP: 002b:00007ffec8a2fd78 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000055eae500e350 RCX: 00007ff916114887
RDX: 0000000000000004 RSI: 000055eae500e390 RDI: 0000000000000003
RBP: 000055eae50104c0 R08: 0000000000000000 R09: 000055eae50104c0
R10: 0000000000000077 R11: 0000000000000246 R12: 0000000000000004
R13: 0000000000000004 R14: 00007ff916216b80 R15: 00007ff916216a00
 </TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---

Before the panic, there was a warning about a bad page state:

BUG: Bad page state in process page-types  pfn:8cee00
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8cee00
flags: 0x6fffe0000000000(node=1|zone=2|lastcpupid=0x7fff)
page_type: 0xffffff7f(buddy)
raw: 06fffe0000000000 ffffd901241c0008 ffffd901240f8008 0000000000000000
raw: 0000000000000000 0000000000000009 00000000ffffff7f 0000000000000000
page dumped because: nonzero mapcount
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 154211 Comm: page-types Not tainted 6.9.0-rc4-00499-g5544ec3178e2-dirty #22
Call Trace:
 <TASK>
 dump_stack_lvl+0x83/0xa0
 bad_page+0x63/0xf0
 free_unref_page+0x36e/0x5c0
 unpoison_memory+0x50b/0x630
 simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
 debugfs_attr_write+0x42/0x60
 full_proxy_write+0x5b/0x80
 vfs_write+0xcd/0x550
 ksys_write+0x64/0xe0
 do_syscall_64+0xc2/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f189a514887
RSP: 002b:00007ffdcd899718 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f189a514887
RDX: 0000000000000009 RSI: 00007ffdcd899730 RDI: 0000000000000003
RBP: 00007ffdcd8997a0 R08: 0000000000000000 R09: 00007ffdcd8994b2
R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcda199a8
R13: 0000000000404af1 R14: 000000000040ad78 R15: 00007f189a7a5040
 </TASK>

The root cause appears to be the following race:

 memory_failure
  try_memory_failure_hugetlb
   me_huge_page
    __page_handle_poison
     dissolve_free_hugetlb_folio
     drain_all_pages -- Buddy page can be isolated e.g. for compaction.
     take_page_off_buddy -- Failed as page is not in the buddy list.
	     -- Page can be putback into buddy after compaction.
    page_ref_inc -- Leads to buddy page with refcnt = 1.

Then unpoison_memory() can unpoison the page and send the buddy page back
into buddy list again leading to the above bad page state warning.  And
bad_page() will call page_mapcount_reset() to remove PageBuddy from buddy
page leading to later VM_BUG_ON_PAGE(!PageBuddy(page)) when trying to
allocate this page.

Fix this issue by only treating __page_handle_poison() as successful when
it returns 1.
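
A minimal sketch of the tightened success check follows. It assumes a three-way result from `__page_handle_poison()`: negative when dissolving fails, 0 when the page could not be taken off the buddy list, and 1 on success. It also assumes the earlier check accepted any non-negative value. The `demo_*` names and `struct demo_page` are hypothetical and only model the reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed outcomes of the poison-handling step (see the lead-in). */
enum {
	DEMO_DISSOLVE_FAILED = -1,
	DEMO_NOT_OFF_BUDDY = 0,
	DEMO_POISON_OK = 1,
};

/* Hypothetical stand-in for a page; only the refcount is modelled. */
struct demo_page {
	int refcount;
};

/* Take the extra reference only when handling is considered successful.
 * With fixed == false, the assumed old check (ret >= 0) also accepts the
 * "not off buddy" case; with fixed == true, only a return of 1 counts. */
static bool demo_recover(struct demo_page *page, int ret, bool fixed)
{
	bool ok;

	assert(page != NULL);
	ok = fixed ? (ret == DEMO_POISON_OK) : (ret >= 0);
	if (ok)
		page->refcount++;
	return ok;
}
```

Under the old check, the `DEMO_NOT_OFF_BUDDY` case still bumps the reference count, which models the "buddy page with refcnt = 1" state in the race above.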

Link: https://lkml.kernel.org/r/20240523071217.1696196-1-linmiaohe@huawei.com
Fixes: ceaf8fb ("mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>