nanopb integration #118

Open
kevoreilly wants to merge 2 commits into capemon from nanopb

@kevoreilly
Owner

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @kevoreilly, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request initiates a significant migration of the logging data serialization mechanism from BSON to Nanopb (Protocol Buffers). The change aims to leverage Nanopb's efficiency for embedded systems, potentially leading to smaller log sizes and improved performance. It involves integrating the Nanopb library, defining a new Protobuf schema for logging events, and refactoring the core logging logic in log.c to utilize a new protobuf_wrapper layer. While the core serialization switch is implemented, some aspects like complex array handling and binary data logging are noted as requiring further development.

Highlights

  • Serialization Migration: Replaced the existing BSON data serialization library with Nanopb, a Protocol Buffers implementation optimized for constrained systems.
  • New Protobuf Schema: Introduced schema.proto to define the new data structures for logging events, including StrMessage and RegularCall types within a HookEvent oneof.
  • Protobuf Wrapper: Added protobuf_wrapper.c and protobuf_wrapper.h to provide a simplified API for interacting with Nanopb, mirroring some functionalities of the previous BSON library.
  • Logging Mechanism Refactor: Extensively updated log.c to switch from BSON-based data appending to the new Protobuf wrapper functions for various data types like integers, strings, and pointers.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request aims to replace "BSON" with "nanopb" (Protocol Buffers) for logging serialization. However, it introduces critical buffer over-read vulnerabilities in "log.c" within "log_string" and "log_wstring", where "strlen" can read past buffer boundaries due to ignored length parameters, potentially causing crashes or information disclosure. Beyond this, the implementation is incomplete, with significant logging functionality commented out or replaced, thread-safety mechanisms removed, and risks of uninitialized memory usage. The "protobuf_wrapper" also presents concerns regarding its fixed-size buffer and non-scalable field mapping.

Comment on lines +207 to +212
protobuf_context_t ctx;
//protobuf_init_message(&ctx, "debug");
protobuf_append_string(&ctx, "msg", msg);
protobuf_finish(&ctx);
log_raw_direct(protobuf_data(&ctx), protobuf_size(&ctx));
protobuf_destroy(&ctx);

critical

The protobuf_context_t ctx variable is used without being initialized. This will lead to undefined behavior when protobuf_append_string and other functions attempt to write to it. You should initialize it using protobuf_init(&ctx); at the beginning of the function.

void debug_message(const char *msg) {
	protobuf_context_t ctx;
	protobuf_init(&ctx);
	//protobuf_init_message(&ctx, "debug");
	protobuf_append_string(&ctx, "msg", msg);
	protobuf_finish(&ctx);
	log_raw_direct(protobuf_data(&ctx), protobuf_size(&ctx));
	protobuf_destroy(&ctx);
	log_flush();
}

Comment on lines +505 to +506
// if (!TryEnterCriticalSection(&g_mutex))
// goto exit;

critical

Commenting out the TryEnterCriticalSection call (and the corresponding LeaveCriticalSection at the end of the function) removes thread safety from the logging mechanism. This is a critical issue that can lead to race conditions and memory corruption if loq is called from multiple threads concurrently. The mutex protection must be restored.
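
A minimal sketch of the guarded pattern being asked for: try to take the lock, skip the entry if it is held, and release it on the path that acquired it. It uses a C11 atomic_flag as a portable stand-in for the Windows CRITICAL_SECTION, so log_try_enter, log_leave and log_entry_guarded are hypothetical names, not functions from log.c. Unlike a critical section, this stand-in is not re-entrant for the owning thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for g_mutex; the real code uses a Windows CRITICAL_SECTION. */
static atomic_flag g_log_lock = ATOMIC_FLAG_INIT;
static int g_entries_written;

/* Non-blocking acquire: returns true only if the lock was free. */
static bool log_try_enter(void)
{
    return !atomic_flag_test_and_set_explicit(&g_log_lock, memory_order_acquire);
}

static void log_leave(void)
{
    atomic_flag_clear_explicit(&g_log_lock, memory_order_release);
}

/* Hypothetical logging entry point showing the try-lock / single-exit shape. */
static bool log_entry_guarded(void)
{
    bool written = false;

    if (!log_try_enter())
        goto exit;               /* lock is held elsewhere: skip this entry */

    g_entries_written++;         /* stand-in for building and sending the message */
    written = true;

    log_leave();                 /* released only on the path that acquired it */
exit:
    return written;
}
```

The release sits before the exit label because the early goto is taken without holding the lock; any other early exit added inside the guarded region would need to release first.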

Comment on lines +667 to +686
//bson_init( g_bson );
//bson_append_int( g_bson, "I", index );
hookinfo = hook_info();
bson_append_ptr(g_bson, "C", hookinfo->return_address);
//bson_append_ptr(g_bson, "C", hookinfo->return_address);
// return location of malware callsite
bson_append_ptr(g_bson, "R", hookinfo->main_caller_retaddr);
//bson_append_ptr(g_bson, "R", hookinfo->main_caller_retaddr);
// return parent location of malware callsite
bson_append_ptr(g_bson, "P", hookinfo->parent_caller_retaddr);
bson_append_int(g_bson, "T", GetCurrentThreadId());
bson_append_int(g_bson, "t", raw_gettickcount() - g_starttick );
//bson_append_ptr(g_bson, "P", hookinfo->parent_caller_retaddr);
//bson_append_int(g_bson, "T", GetCurrentThreadId());
//bson_append_int(g_bson, "t", raw_gettickcount() - g_starttick );
// number of times this log was repeated -- we'll modify this
bson_append_int(g_bson, "r", 0);
//bson_append_int(g_bson, "r", 0);

compare_offset = (unsigned int)(g_bson->cur - bson_data(g_bson));
//compare_offset = (unsigned int)(g_bson->cur - bson_data(g_bson));
// the repeated value is encoded immediately before the stream we want to compare
repeat_offset = compare_offset - 4;

bson_append_start_array(g_bson, "args");
bson_append_int( g_bson, "0", is_success );
bson_append_ptr( g_bson, "1", return_value );
//bson_append_start_array(g_bson, "args");
//bson_append_int( g_bson, "0", is_success );
//bson_append_ptr( g_bson, "1", return_value );

critical

The core logic for logging API call details has been commented out. This effectively disables the primary functionality of the loq function. This appears to be a work-in-progress, but in its current state, no API call information will be recorded.

Comment on lines 272 to 279
static void log_string(const char *str, int length)
{
int ret;
char *utf8s;
int utf8len;

if (str == NULL) {
bson_append_string_n( g_bson, g_istr, "", 0 );
protobuf_append_string(g_protobuf, g_istr, "");
return;
}
utf8s = utf8_string(str, length);
utf8len = * (int *) utf8s;
ret = bson_append_binary( g_bson, g_istr, BSON_BIN_BINARY, utf8s+4, utf8len );
if (ret == BSON_ERROR) {
bson_append_string_n(g_bson, g_istr, "", 0);
}
free(utf8s);
protobuf_append_string(g_protobuf, g_istr, str);
}

security-high high

This refactoring introduces a critical buffer over-read vulnerability in "log_string". The "length" parameter is ignored, and "protobuf_append_string" relies on "strlen", which will read past buffer boundaries if a non-null-terminated string is passed (e.g., via "loq"'s 'S' format specifier). This can lead to crashes or information disclosure. The same issue applies to "log_wstring". The previous "BSON" implementation correctly handled string lengths.
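
One way to keep the length is to hand the wrapper a pointer plus byte count instead of a bare char pointer. The sketch below is an illustration only: bounded_str_t and make_bounded_str are hypothetical names, and it assumes a negative length means "NUL-terminated", as the log_wstring(argv[i], -1) call elsewhere in log.c suggests.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying argument that an encode callback could receive. */
typedef struct {
    const char *data;  /* not required to be NUL-terminated */
    size_t len;        /* number of valid bytes at data */
} bounded_str_t;

/* Resolve the caller's length convention once, up front.
 * Assumption: length < 0 means the string is NUL-terminated;
 * any other value is an explicit byte count and strlen is never called. */
static bounded_str_t make_bounded_str(const char *str, int length)
{
    bounded_str_t out = { "", 0 };

    if (str == NULL)
        return out;              /* NULL is logged as an empty string */

    out.data = str;
    out.len = (length < 0) ? strlen(str) : (size_t)length;
    return out;
}
```

An encode callback that receives a bounded_str_t can then write exactly len bytes and never needs to scan for a terminator.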

Comment on lines 281 to 295
static void log_wstring(const wchar_t *str, int length)
{
int ret;
char *utf8s;
int utf8len;

if (str == NULL) {
bson_append_string_n( g_bson, g_istr, "", 0 );
protobuf_append_string(g_protobuf, g_istr, "");
return;
}
utf8s = utf8_wstring(str, length);
utf8len = * (int *) utf8s;
ret = bson_append_binary( g_bson, g_istr, BSON_BIN_BINARY, utf8s+4, utf8len );
if (ret == BSON_ERROR) {
bson_append_string_n(g_bson, g_istr, "", 0);
// Convert wide string to UTF-8 or handle appropriately
char *utf8s = utf8_wstring(str, length);
if (utf8s) {
int utf8len = *(int*)utf8s;
// This needs adaptation for binary data in protobuf
protobuf_append_string(g_protobuf, g_istr, utf8s + 4);
free(utf8s);
}
free(utf8s);
}

security-high high

Similar to the vulnerability in log_string, this function is also vulnerable to a buffer over-read. The result of utf8_wstring is not guaranteed to be null-terminated. Passing this buffer (utf8s + 4) to protobuf_append_string will cause strlen to be called on a non-null-terminated string, leading to a buffer over-read. The old implementation correctly used the explicit length of the converted string.
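
The quoted code reads the converted length from the first four bytes of the utf8_wstring result and treats utf8s + 4 as the payload. A sketch of carrying that length forward, assuming that layout (a 4-byte native-endian length followed by the bytes); byte_span_t and span_from_prefixed are hypothetical helpers, not part of the PR:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer-plus-length view of the converted string. */
typedef struct {
    const uint8_t *data;
    size_t len;
} byte_span_t;

/* Split a buffer laid out as [4-byte length][payload] into a span.
 * Assumption: the length prefix is a native-endian 32-bit integer. */
static byte_span_t span_from_prefixed(const void *prefixed)
{
    byte_span_t out = { NULL, 0 };
    int32_t len;

    if (prefixed == NULL)
        return out;

    memcpy(&len, prefixed, sizeof len);   /* avoids an unaligned or aliasing read */
    if (len < 0)
        return out;                       /* treat a corrupt prefix as empty */

    out.data = (const uint8_t *)prefixed + sizeof len;
    out.len = (size_t)len;
    return out;
}
```

The span can be passed to a length-aware append, so the payload is never handed to strlen.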

Comment on lines 434 to 441
static void log_wargv(int argc, const wchar_t ** argv) {
int i;

bson_append_start_array( g_bson, g_istr );

for (i = 0; i < argc; i++) {
num_to_string(g_istr, 4, i);
log_wstring(argv[i], -1);
}

bson_append_finish_array( g_bson );
}

high

This implementation of log_wargv is incorrect. Since bson_append_start_array and bson_append_finish_array have been removed, the loop will repeatedly call log_wstring, which in turn calls protobuf_append_string. This will overwrite the same field in the protobuf message on each iteration. To correctly log an array of strings, the schema needs to be updated to use a repeated field, and the logging logic must be adapted to add multiple entries to it.
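
To show what a repeated field means on the wire, the sketch below hand-encodes a list of strings in the Protocol Buffers format: each element is written as its own tag, length and payload, so later elements append rather than overwrite. It is standalone C for illustration and does not use nanopb; put_varint and encode_repeated_strings are hypothetical names, and the field number is an assumption about a schema that does not exist yet.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as a base-128 varint; returns the number of bytes written (at most 10). */
static size_t put_varint(uint8_t *out, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80);   /* low 7 bits plus continuation bit */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;
    return n;
}

/* Encode items as a repeated string field (wire type 2, length-delimited).
 * Returns 1 and sets *written on success, 0 if the output buffer is too small. */
static int encode_repeated_strings(uint8_t *out, size_t cap, uint32_t field_number,
                                   const char *const *items, size_t count,
                                   size_t *written)
{
    size_t pos = 0;

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        uint8_t hdr[20];
        size_t h = put_varint(hdr, ((uint64_t)field_number << 3) | 2);
        h += put_varint(hdr + h, len);

        if (h > cap - pos || len > cap - pos - h)
            return 0;                     /* refuse to write past the buffer */

        memcpy(out + pos, hdr, h);
        pos += h;
        memcpy(out + pos, items[i], len);
        pos += len;
    }

    *written = pos;
    return 1;
}
```

For field number 1 and the items "a" and "bc", this produces the bytes 0A 01 61 0A 02 62 63: the tag 0A appears once per element.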

Comment on lines +450 to +455
// You'll need to add protobuf_append_binary to your wrapper
// For now, we'll skip binary data or log as string
if (trunclength > 0) {
// This is a placeholder - you'll need proper binary handling
protobuf_append_string(g_protobuf, g_istr, "[binary data]");
}

high

Logging a placeholder string "[binary data]" instead of the actual buffer content is a critical loss of functionality. The comment indicates that a binary append function is missing from the wrapper. This needs to be implemented in protobuf_wrapper.c and used here to restore the ability to log binary data.

Comment on lines +54 to +96
int protobuf_append_string(protobuf_context_t* ctx, const char* name, const char* value) {
if (!value) return 0;

if (ctx->event.which_message_type == HookEvent_str_tag) {
StrMessage* str_msg = &ctx->event.message_type.str;

if (strcmp(name, "name") == 0) {
str_msg->name.funcs.encode = encode_string_callback;
str_msg->name.arg = (void*)value;
} else if (strcmp(name, "type") == 0) {
str_msg->type.funcs.encode = encode_string_callback;
str_msg->type.arg = (void*)value;
} else if (strcmp(name, "category") == 0) {
str_msg->category.funcs.encode = encode_string_callback;
str_msg->category.arg = (void*)value;
} else if (strcmp(name, "api_name") == 0) {
str_msg->api_name.funcs.encode = encode_string_callback;
str_msg->api_name.arg = (void*)value;
}
}

return 1;
}

int protobuf_append_int(protobuf_context_t* ctx, const char* name, int32_t value) {
if (ctx->event.which_message_type == HookEvent_regular_call_tag) {
RegularCall* call = &ctx->event.message_type.regular_call;

if (strcmp(name, "I") == 0 || strcmp(name, "i") == 0) {
call->i = value;
} else if (strcmp(name, "T") == 0 || strcmp(name, "t") == 0) {
call->t = value;
}
} else if (ctx->event.which_message_type == HookEvent_str_tag) {
StrMessage* str_msg = &ctx->event.message_type.str;

if (strcmp(name, "I") == 0 || strcmp(name, "i") == 0) {
str_msg->i = value;
}
}

return 1;
}

high

Using strcmp to map string names to fields is inefficient and brittle. It makes the code hard to maintain, as any change in field names in the .proto file requires manual updates here. A more robust approach would be to pass field descriptors or use a more direct mapping mechanism to set the values, avoiding string comparisons at runtime.
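
A sketch of one such direct mapping, using an enum of field IDs and a switch instead of string comparison. The struct and enum here are hypothetical stand-ins, not the types generated from schema.proto. In the quoted wrapper, "T" and "t" both resolve to call->t, while the earlier BSON code in log.c used "T" for the thread ID and "t" for the tick count; distinct IDs make that kind of collision visible at compile time.

```c
#include <stdint.h>

/* Hypothetical stand-in for the generated RegularCall struct. */
typedef struct {
    int32_t index;
    int32_t thread_id;
    int32_t tick;
    int32_t repeated;
} call_fields_t;

/* One ID per destination field; names are illustrative only. */
typedef enum {
    FIELD_INDEX,
    FIELD_THREAD_ID,
    FIELD_TICK,
    FIELD_REPEATED,
    FIELD_COUNT
} call_field_id_t;

/* Set an integer field by ID. Returns 1 on success, 0 for an unknown ID,
 * so a missing mapping is reported instead of being silently dropped. */
static int call_set_int(call_fields_t *call, call_field_id_t id, int32_t value)
{
    switch (id) {
    case FIELD_INDEX:     call->index = value;     return 1;
    case FIELD_THREAD_ID: call->thread_id = value; return 1;
    case FIELD_TICK:      call->tick = value;      return 1;
    case FIELD_REPEATED:  call->repeated = value;  return 1;
    default:              return 0;
    }
}
```

Call sites would then pass FIELD_TICK rather than "t", and a compiler with switch-coverage warnings can flag an enum value that has no case.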

Comment on lines +11 to +15
typedef struct {
HookEvent event;
uint8_t buffer[4096];
size_t encoded_size;
} protobuf_context_t;

high

Using a fixed-size buffer of 4096 bytes for encoding protobuf messages introduces a risk of buffer overflows if a message exceeds this size. The previous BSON implementation appeared to handle dynamic resizing. Consider either increasing the buffer size significantly or, preferably, implementing dynamic resizing or at least adding checks in protobuf_finish to prevent writing past the end of the buffer.
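
A rough sketch of the dynamic-resizing option: a heap buffer that doubles its capacity when an append would not fit, and reports failure instead of writing out of bounds. grow_buf_t and its functions are hypothetical and independent of nanopb; wiring this into the wrapper's output path is left open.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; zero-initialize before first use. */
typedef struct {
    uint8_t *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} grow_buf_t;

/* Append n bytes, growing the allocation as needed.
 * Returns 1 on success, 0 on size overflow or allocation failure
 * (the existing contents stay valid in both failure cases). */
static int grow_buf_append(grow_buf_t *b, const void *src, size_t n)
{
    if (n > SIZE_MAX - b->len)
        return 0;

    size_t need = b->len + n;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 64;
        while (new_cap < need) {
            if (new_cap > SIZE_MAX / 2) { new_cap = need; break; }
            new_cap *= 2;
        }
        uint8_t *p = realloc(b->data, new_cap);
        if (p == NULL)
            return 0;
        b->data = p;
        b->cap = new_cap;
    }

    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}

static void grow_buf_free(grow_buf_t *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Doubling keeps the number of reallocations logarithmic in the final message size, at the cost of up to twice the memory of the largest message.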

Comment on lines +17 to +43
void protobuf_init(protobuf_context_t* ctx) {
// Zero the entire structure first
memset(ctx, 0, sizeof(protobuf_context_t));

// Initialize nanopb structs manually for MSVC compatibility
ctx->event.which_message_type = HookEvent_regular_call_tag;

// Manually initialize the regular_call message
ctx->event.message_type.regular_call.i = 0;
ctx->event.message_type.regular_call.t = 0;
ctx->event.message_type.regular_call.r = 0;
ctx->event.message_type.regular_call.p = 0;

// Initialize string callbacks to NULL
ctx->event.message_type.regular_call.c.funcs.encode = NULL;
ctx->event.message_type.regular_call.c.arg = NULL;
ctx->event.message_type.regular_call.args.funcs.encode = NULL;
ctx->event.message_type.regular_call.args.arg = NULL;
ctx->event.message_type.regular_call.index.funcs.encode = NULL;
ctx->event.message_type.regular_call.index.arg = NULL;
ctx->event.message_type.regular_call.aux.funcs.encode = NULL;
ctx->event.message_type.regular_call.aux.arg = NULL;
ctx->event.message_type.regular_call.data.funcs.encode = NULL;
ctx->event.message_type.regular_call.data.arg = NULL;

ctx->encoded_size = 0;
}

medium

Instead of manually zeroing out fields, it's safer and more maintainable to use the initializer macros generated by nanopb. This ensures all fields, including any future additions, are correctly initialized to their default state.

For example, you can use HookEvent_init_zero.

void protobuf_init(protobuf_context_t* ctx) {
    // Use the nanopb-generated default initializer.
    ctx->event = HookEvent_init_zero;
    ctx->encoded_size = 0;
}
