{"pages":[{"title":"About me","text":"An undergraduate student at Zhejiang University, Hangzhou, China, currently majoring in Information Security. Research interests: LLM4code, software engineering, system and software security, machine learning.","link":"/blog/about/index.html"}],"posts":[{"title":"[Review] Testing Database Engines via Pivoted Query Synthesis","text":"Link here This paper aims to detect logic bugs in DBMSs. Here, logic bugs are defined as bugs that cause a query to return an incorrect result without crashing the DBMS. The approach randomly selects a row from a table (called the pivot row), synthesizes a query whose result should contain the selected row, and then sends the query to the DBMS. If the pivot row is missing from the result, a logic bug has been found. Motivation: Logic bugs in DBMSs are hard to find. The earlier logic-bug detector RAGS applies differential testing across DBMSs, but this runs into problems because of DBMS dialects^{1} and common bugs^{2}. So SQLancer is proposed to tackle this problem. [1]: Different DBMSs implement the same SQL query differently and have their own unique grammar. [2]: Different DBMSs may share the same bug, which defeats differential testing. Implementation: Randomly generate tables and rows. Randomly select a row from each table (the pivot row may span several tables). Randomly generate an AST based on the database’s schema (the column names and types). Rectify the generated AST according to its result on the pivot row (keep it as is if it evaluates to TRUE; wrap it in NOT if it evaluates to FALSE). Transform the AST into an SQL query. Send the query to the DBMS under test (the DBMS’s INTERSECT or IN can help check whether the pivot row is contained). Evaluation: Setup: A laptop with a 6-core Intel i7-8850H CPU at 2.60 GHz and 32 GB of memory running Ubuntu 19.04. Interestingly, the authors say there are no other tools to compare with except RAGS, which was proposed more than 20 years ago and has low efficiency. 
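The Implementation steps above can be sketched as a toy pivoted query synthesis loop in Python, with the built-in sqlite3 module as the DBMS under test. This is a minimal sketch under explicit assumptions. It is not SQLancer, which is written in Java. It uses a single hypothetical table t0 with two integer columns and no NULLs, and a hypothetical generator that emits one comparison per query instead of a full random AST. The point it illustrates is that the tester evaluates the predicate on the pivot row itself, then rectifies it with NOT, so the pivot row must appear in the result.

```python
import operator
import random
import sqlite3

# Hypothetical mini expression language: (column, operator, constant).
OPS = {'=': operator.eq, '<': operator.lt, '>': operator.gt,
       '<=': operator.le, '>=': operator.ge}
COLUMNS = ['c0', 'c1']


def random_expr(rng):
    # Stand-in for random AST generation over the schema.
    return rng.choice(COLUMNS), rng.choice(list(OPS)), rng.randint(-5, 5)


def rectify(expr, pivot):
    # Evaluate the predicate on the pivot row in Python (not in the DBMS):
    # keep it if it is TRUE, wrap it in NOT if it is FALSE.
    col, op, const = expr
    sql = f'{col} {op} {const}'
    return sql if OPS[op](pivot[col], const) else f'NOT ({sql})'


def run_pqs(rounds=100, seed=0):
    # Returns how many rounds lost the pivot row, i.e. suspected logic bugs.
    rng = random.Random(seed)
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE t0 (c0 INTEGER, c1 INTEGER)')
    rows = [(rng.randint(-5, 5), rng.randint(-5, 5)) for _ in range(10)]
    conn.executemany('INSERT INTO t0 VALUES (?, ?)', rows)
    suspected = 0
    for _ in range(rounds):
        pivot_row = rng.choice(rows)
        pivot = dict(zip(COLUMNS, pivot_row))
        where = rectify(random_expr(rng), pivot)
        result = conn.execute(f'SELECT c0, c1 FROM t0 WHERE {where}').fetchall()
        if pivot_row not in result:
            suspected += 1  # a correct DBMS must always return the pivot row
    conn.close()
    return suspected


print(run_pqs())  # prints 0 against a correct DBMS
```

Here the containment check is a plain Python membership test. It could also be pushed into the query itself with INTERSECT or IN.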
Lacking a comparable baseline, the evaluation only reports the implementation effort and the coverage, and introduces the bugs found during testing. Implementation effort: Compares the LOC (lines of code) of SQLancer with those of the DBMSs (quite strange!). Coverage: The coverage is low, because only data-centric SQL statements were tested. Future work: Avoid the duplicate problem (several queries triggering the same bug). Check the correct insertion or deletion of records; detect concurrency bugs, bugs related to transactions, or bugs in the access control layer of DBMSs. Test multiple rows. Solve the dialect problem across different DBMSs.","link":"/blog/2023/10/25/11_paper_review_2/"},{"title":"[Review] MINER: A Hybrid Data-Driven Approach for REST API Fuzzing","text":"Link here This paper proposes a new approach for REST^{1} API fuzzing, which: Focuses on long request sequences. Introduces a customized attention model to support the fuzzing process. Implements a new data-driven security rule checker to capture a new kind of error caused by undefined parameters. [1]: REST APIs, usually using the GET, POST, PUT, and DELETE methods. Motivation: Cloud service testing is important, but earlier work (like RESTler) fails to generate long request sequences, so it cannot detect deep errors hidden in hard-to-reach states of cloud services. MINER applies length-oriented mechanisms to generate long request sequences, and applies an attention model to help requests pass semantic checking. Furthermore, it applies a data-driven security rule checker to capture a new kind of error caused by undefined parameters. Implementation: 5 main components: Sequence Template Selection, Generation Module, Fuzzing Module, Collection Module, and Training Module. Sequence Template Selection first generates the sequence templates (the skeletons of request sequences). Generation Module fills the templates with parameters. Fuzzing Module fuzzes the cloud service. 
Collection Module collects the related data (valid request sequences, parameter-value pairs). Training Module is periodically invoked to train an attention model, which helps the Generation Module work better. Responses: a 5xx status code means a bug is found, 2xx means the request is syntactically and semantically correct, and 4xx means a syntactic or semantic error. Evaluation: Construct two prototypes, without and with the Data-Driven Checker, to measure the checker’s error discovery performance. Comparison: Compares with the state-of-the-art open-source fuzzer RESTler. Benchmarks: GitLab, Bugzilla, and WordPress via 11 REST APIs; an open-source version of each cloud service is deployed on the authors’ own servers. Setup: Each run lasts 48 hours in a Docker container configured with 8 CPU cores, 20 GB RAM, Python 3.8.2, and Ubuntu 16.04 LTS. Evaluations run on 3 servers, each with two E5-2680 CPUs, 256 GB RAM, and an NVIDIA GTX 1080 Ti graphics card. Evaluate 1) the pass rate of syntax and semantic checking, 2) the number of types of generated requests that get responses in the 2xx range, and 3) the number of unique errors, which trigger responses in the 5xx range or violate the defined security rules. Apply length-oriented sequence construction, the attention model, and the other techniques to RESTler for further analysis. Also analyze performance on reproducing serious bugs, coverage, the schedule of the Training Module, and the execution distribution of requests. Future work: Apply the attention model to other areas. Improve the reproducibility of the bugs found. Keep improving the length-oriented request generation approach.","link":"/blog/2023/10/25/12_paper_review_3/"},{"title":"[Review] How IoT Re-using Threatens Your Sensitive Data: Exploring the User-Data Disposal in Used IoT Devices","text":"Link here This paper performs the first in-depth investigation into user-data disposal on used IoT devices, and finds that: Most users lack awareness of disposing of data on used IoT devices. 
IoT devices collect more sensitive data than users expect, and current data protections on used IoT devices are inadequate. The disposal methods for used IoT devices are often ineffective. Implementation: RQ1: Which kinds of sensitive data reside in used IoT devices? RQ2: Which methods can be used to dispose of sensitive data? RQ3: Are existing disposal methods effective in erasing the sensitive data? Conduct a user study to understand users’ awareness of sensitive data and of how to dispose of it. Design a system to detect sensitive data collection in IoT firmware images. Test real-world IoT devices to evaluate how effective data disposal is. Raise ethical considerations (always included in IoT-related papers). Evaluation: Evaluate the effectiveness of the designed system: compare its detection results with manual detection results, and compare it with other systems (SOTA sensitive information tracking systems). Result: Answer to RQ1: device management information, network setting information, third-party account information, and user portrait information. Answer to RQ2 (via manual testing): Overwrite or remove sensitive data through a user interface. Perform a soft reset by clicking the “reset to factory defaults” button on the configuration page of an IoT device. Perform a hard reset by pressing the RESET button on the device. Perform a firmware upgrade by clicking the “upgrade firmware” button on the configuration page of an IoT device. Log in to the terminal of an IoT device and overwrite/remove the files that store user data. Answer to RQ3: these methods are often ineffective. Future work: A better system to detect sensitive data collection. Automate the analysis that was previously done manually. Test on a larger set of IoT devices. 
Design a safer and more effective data disposal method for IoT devices.","link":"/blog/2023/10/26/13_paper_review_4/"},{"title":"[Review] Squirrel: Testing Database Management Systems with Language Validity and Coverage Feedback","text":"Link here This paper proposes a new approach to detecting DBMS crashes. It is difficult to ensure syntactic and semantic correctness when fuzzing DBMSs, and earlier methods (mutation-based fuzzers and generation-based fuzzers) fall short. Mutation-based fuzzers cannot ensure syntactic and semantic correctness, while generation-based fuzzers can guarantee the syntactic correctness of their inputs but do not utilize any feedback. Implementation: Convert the SQL query into an IR (an intermediate representation proposed in the paper) built on the AST (Abstract Syntax Tree). Mutate the IR in a way that keeps the query syntactically correct, and use a dependency graph to keep it semantically correct. Convert the mutated IR back into an SQL query. Send the mutated SQL queries to fuzz the DBMSs. Evaluation: Benchmarks: SQLite, PostgreSQL, MySQL, MariaDB. Setup: Ubuntu 16.04 on a machine with an Intel Xeon CPU E5-2690 (2.90 GHz), 16 cores, and 188 GB RAM. Comparison: Compares with five state-of-the-art fuzzers (AFL, SQLsmith, QSYM, Angora, GRIMOIRE). Criteria: unique crashes, unique bugs, new edges, syntax validity, semantic validity. Future work: Automatic fuzzing that works regardless of the DBMS. Detecting logic bugs in DBMSs. A new feedback mechanism beyond normal code coverage feedback.","link":"/blog/2023/10/24/10_paper_review_1/"},{"title":"[Review] A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware","text":"Link here This paper does not propose a fundamentally new technique. Instead, it builds a system called FirmSec that detects TPCs (third-party components) in firmware at the version level and then identifies the corresponding vulnerabilities. 
FirmSec takes IoT firmware images as input and outputs the vulnerabilities of the TPCs contained in each image. The work also creates a database of 34,136 firmware images. FirmSecDataset Implementation: Build the database by gathering various firmware images, both public and private. Build the database of various TPCs and their vulnerabilities. Take in a firmware image, extract its features, and determine the TPCs (at the version level) it contains. Generate the vulnerability report of the firmware. To implement version-level verification, syntactical features and CFG features are used to perform the version check. Evaluation: Evaluate the accuracy of FirmSec. Comparison: Compare with three state-of-the-art tools: Gemini, BAT, and OSSPolice. The work also reveals that GPL/AGPL license violations widely exist in firmware. Future work: A better approach to version-level verification. Adopt fuzzing mechanisms to automatically find vulnerabilities.","link":"/blog/2023/10/26/14_paper_review_5/"},{"title":"[Review] autofz: Automated Fuzzer Composition at Runtime","text":"Link here This paper proposes a new fuzzing mechanism that integrates several fuzzers into a single fuzzing process. For every workload, an optimal mixture of one or more fuzzers is employed. Unlike earlier work, autofz: Needs no presetting or human effort. Allocates fuzzers per workload rather than per program. Background: A large number of fuzzers have been created, which makes it difficult to choose a proper fuzzer for a specific fuzzing task. No universal fuzzer consistently outperforms the others, so choosing an optimal fuzzer is difficult. The efficiency of a fuzzer may not last for the whole fuzzing process. Fuzzing is a random process, so a fuzzer that is optimal at one point may not remain optimal. Implementation: Divide the fuzzing process into two phases: the preparation phase and the focus phase. 
In the preparation phase, autofz tests every fuzzer and finds one or several well-performing fuzzers. In the focus phase, autofz allocates resources among the fuzzers chosen in the preparation phase to perform fuzzing. A workload is composed of a preparation phase and a focus phase, and a fuzzing process is composed of several workloads. Evaluation: Setup: Ubuntu 20.04 on an AMD Ryzen 9 3900 with 24 cores and 32 GB memory. Baseline fuzzers: AFL, AFLFast, MOpt, FairFuzz, LearnAFL, QSYM, Angora, Redqueen, Radamsa, LAF-INTEL, and libFuzzer. Evaluate coverage, bugs found, and elasticity, and compare with collaborative fuzzing. Future work: A better approach to choosing the optimal fuzzers in the preparation phase. Automatically choosing the set of fuzzers to use.","link":"/blog/2023/10/31/16_paper_review_6/"},{"title":"Software Analysis Basics","text":"Background and Basics Test oracle: a mechanism for determining whether software executed correctly for a test. Differential testing: Provide the same input to similar applications, and observe output differences. Metamorphic testing: Provide manipulated inputs to the same application, and check whether the output differences are as expected. Program Analysis Basics Abstract syntax tree (AST): Represents the abstract syntactic structure of a language construct. Control flow graph (CFG): Divide the program into basic blocks. Basic block: A sequence of straight-line code that can be entered only at the beginning and exited only at the end. Connect basic blocks together to build the CFG. Control-flow-based code coverage: Statement coverage, Branch coverage, Path coverage. Path coverage strictly subsumes branch coverage, and branch coverage in turn strictly subsumes statement coverage (Path coverage > branch coverage > statement coverage). Data-flow analysis: Live Variables Analysis, Available Expressions Analysis, Very Busy Expressions Analysis. 
Data-flow-based code coverage: DU-pair, DU-path. Program analysis tools: Java: JavaParser: A lightweight source code analysis and manipulation framework. Eclipse JDT: A source-level code analysis and manipulation framework. ASM: A lightweight bytecode-level analysis and manipulation framework. Soot: An Intermediate Representation (IR) level analysis and manipulation framework. WALA: An IR-level analysis and manipulation (via Shrike) framework for Java and JavaScript. C++: LLVM: A highly customizable and modular compiler framework. Mutation Testing: Usually, detecting more real bugs means a more effective test suite, but real bugs are usually few in number, making it hard to: Evaluate test effectiveness comprehensively. Evaluate test effectiveness in detecting future bugs. So we can create artificial bugs to simulate real bugs when evaluating test effectiveness, which is called mutation testing. Mutation testing injects changes into program statements to generate artificial bugs: Apply artificial changes based on mutation operators (aka mutators) to generate mutants. Execute the test suite against each mutant. Compute the mutation score (e.g., the ratio of killed mutants). The higher, the better! 
Limitation: Mutation testing is extremely costly, since we need to run the test suite against each mutant. Mutation testing tools: Java: PIT: http://pitest.org/ MAJOR: http://mutation-testing.org/ Javalanche: https://github.com/david-schuler/javalanche/ MuJava: http://cs.gmu.edu/~offutt/mujava/ C: MILU: http://www0.cs.ucl.ac.uk/staff/y.jia/Milu/ Python: MutPy: https://pypi.python.org/pypi/MutPy/0.4.0 C#: NinjaTurtles: http://ninjaturtles.codeplex.com/ Formal Methods Basics: Boolean satisfiability problem (SAT). Satisfiability Modulo Theories (SMT). SMT tools: Z3: Supported theories: empty theory, linear arithmetic, nonlinear arithmetic, bitvectors, arrays, datatypes, quantifiers, strings. CVC4: Supported theories: rational and integer linear arithmetic, arrays, tuples, records, inductive data types, bitvectors, strings, and equality over uninterpreted function symbols. STP: Supported theories: bitvectors, arrays. Boolector: Supported theories: bitvectors, arrays, and uninterpreted functions. Automated Testing: Guided Unit Test Generation: For projects providing a number of public APIs for external use (e.g., the JDK library). Method-level test generation: consider various method invocation sequences to expose possible faults. [Review] Feedback-directed Random Test Generation [Review] Whole Test Suite Generation Cited from “CS527: Topics in Software Engineering” taught by Lingming Zhang.","link":"/blog/2023/10/27/15_software_basic/"},{"title":"[Review] Feedback-directed Random Test Generation","text":"Link here The paper presents a technique to improve random test generation by incorporating feedback obtained from executing test inputs as they are created. The paper aims at exposing potential faults in object-oriented code (e.g., Java classes) by generating sequences of method calls. Background: Random testing is inefficient and may generate useless and redundant test sequences. So RANDOOP is proposed to handle this problem. 
Implementation: Randomly select some previously generated method sequences that executed without error. Concatenate and extend them to form a new sequence. If the newly generated sequence is redundant, discard it. Check the sequence with user-defined contracts and filters. Evaluation: Measure the coverage that RANDOOP achieves on a collection of container data structures, and compare it with JPF. Use RANDOOP to generate test inputs that find API contract violations in 14 widely used libraries, and compare with JPF and other undirected random testing methods. Use RANDOOP-generated regression test cases to find regression errors in three industrial implementations of the Java JDK.","link":"/blog/2023/11/09/17_paper_review_7/"},{"title":"[Review] Whole Test Suite Generation","text":"Link here The paper presents a Genetic Algorithm (GA) in which whole test suites are evolved with the aim of covering all coverage goals at the same time. Whole test suite generation achieves higher coverage than single-branch test case generation. Whole test suite generation produces smaller test suites than single-branch test case generation. http://www.evosuite.org Background: Existing work targets only one coverage goal at a time. Engineers must manually write assertions for every test case, so test cases should be as short as possible (once the coverage requirement is satisfied). Implementation: The GA works as follows. The key to targeting different goals at the same time is the fitness function, which is used to score a test suite. You can design your own fitness function, incorporating more coverage goals to achieve multi-goal evolution. An individual in the population is a single test suite, so O_1, O_2, P_1, P_2 are all test suites. The most important step is to define the crossover and mutation operators. Refer to the paper for more implementation details. Evaluation: Test a total of 19 open source libraries and programs. 
Explain the parameter choices in this work. Test infeasible test goals (branches that are impossible to reach). Compare with other tools. Prove the convergence of the GA theoretically.","link":"/blog/2023/11/12/18_paper_review_8/"},{"title":"[Review] Titan: Efficient Multi-target Directed Greybox Fuzzing","text":"Link here The paper presents a multi-target fuzzing method that fuzzes different targets at the same time. Titan is proposed for this purpose. It enables the fuzzer to distinguish correlations between various targets in the program and, based on these correlations, to optimize input generation so that different targets are fuzzed efficiently and simultaneously. repo Introduction: In practice, more than 1000 potential targets may need verification, which is costly. Current directed fuzzing aims at only one target at a time, which lowers verification efficiency. Running multiple instances to fuzz multiple targets is also 3.6x slower than sequentially applying one instance at a time to one target. One of the root causes of this challenge is that existing approaches are unaware of the correlations between the targets. As a result, they can degenerate into undirected fuzzing as the number of targets grows, which is defined as the synergy ignorance problem. Under these circumstances, Titan was created. Implementation: Classify the correlations into overlapping, conflicting, and independent. Use a static analyzer to infer the correlations among multiple targets based on their path conditions. Design a synergy-aware fuzzer that effectively generates inputs for multiple targets. To make the synergy-aware fuzzer efficient, the correlations between input bytes are also inferred, so that mutating several input bytes simultaneously becomes possible. Evaluation: To answer four questions: RQ1: How efficiently can Titan reproduce the vulnerabilities compared with other fuzzers? 
RQ2: How effectively do the correlations inferred by Titan help reproduce the vulnerabilities? RQ3: How effectively can Titan help other directed fuzzers with multiple targets? RQ4: What is the runtime overhead introduced by Titan? Compare Titan with the following fuzzers: Benchmarks: Magma, detecting incomplete fixes. Future work: This paper focuses only on the reachability of the targets, without further exploration.","link":"/blog/2023/11/13/19_paper_review_9/"},{"title":"RE: Restarting Blog Writing","text":"Why a restart    I actually tried writing a blog back in high school, but because of academic pressure I had little time to spare. On top of that, I tended to be lazy, so the blog was more of a token record. I never went back to review it, and I did not expect anyone else to read it. I was just drifting along. In my second year of high school I retired from OI (Olympiad in Informatics) and left it behind, so the old blog was not worth much anyway. After the college entrance exam I switched computers and lost the original files, so I had no choice but to start over. Why write a blog    Mostly to keep a record, with a platform where I can write down what I have learned. It is not about attracting many readers. It is simply to push myself to review what I have learned. Reviewing that knowledge while writing is how this blog serves its purpose. Plans for this blog   1. Treat the blog like a notebook: write down what I learn so it is easy to review.   2. List my goals and plans.   3. No slacking off!","link":"/blog/2022/02/15/1_restart/"},{"title":"[Review] On the Naturalness of Software","text":"Link here A classic paper showing that software, like natural language, has its own naturalness, laying the groundwork for code prediction and completion. Natural languages are repetitive and predictable, so they can be processed by statistical approaches (NLP). Programming code is also very regular, even more so than natural language. Using standard cross-entropy and perplexity measures, the paper demonstrates that an n-gram language model captures the high-level statistical regularity that exists in software at the n-gram level (probabilistic chains of tokens). These regularities are specific to both projects and application domains. Implementation & Evaluation: Implement an n-gram language model as a plug-in and demonstrate its effectiveness. Compare it with Eclipse’s built-in completion facilities. Future work: More sophisticated language models for code prediction (LLMs). 
Beyond the naturalness of code, deeper properties of software may also exhibit naturalness.","link":"/blog/2023/11/14/20_paper_review_10/"},{"title":"[Review] Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models","text":"Link here The paper proposes a new approach that leverages LLMs to generate input programs for fuzzing DL libraries. More specifically, it applies LLMs (Codex & INCODER) to fuzz DL libraries (PyTorch & TensorFlow). Background: Previous work on fuzzing DL libraries mainly falls into two categories: API-level fuzzing and model-level fuzzing. Both still have limitations. Model-level fuzzers attempt to leverage complete DL models (which cover various sets of DL library APIs) as test inputs. But due to the input/output constraints of DL APIs, model-level mutation/generation is hard to perform, leading to a limited number of unique APIs covered. API-level fuzzing focuses on finding bugs within a single API at a time, so API-level fuzzers cannot detect any bug that arises from interactions within a complex API sequence. Implementation: The proposed tool is TitanFuzz. Use a generative LLM (Codex) with a step-by-step input prompt to produce the initial seed programs for fuzzing. Adopt an evolutionary strategy to produce new test programs by using LLMs (INCODER) to automatically mutate the seed programs. Collect the generated programs, and feed them to the target DL libraries. Test oracle: differential testing of the results on CPU and GPU. Since small differences between CPU and GPU results are reasonable, a threshold is set to decide when a difference indicates a bug. Some points: API sequences are especially valuable, as the combination of keywords in multiple API calls can lead to previously undiscovered bugs. It is common in DL libraries for related APIs to share the same input, and borrowing inputs from one API can help trigger bugs in its related APIs. 
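The CPU/GPU test oracle described above can be sketched as a tolerance-based differential check. This is a minimal sketch, not TitanFuzz’s actual code. It assumes that each backend’s output has already been flattened into a Python list of floats, and the function name and tolerance values are hypothetical choices.

```python
import math

# Hypothetical tolerances: small CPU/GPU numeric drift is expected,
# so only differences beyond these bounds are reported as suspected bugs.
REL_TOL = 1e-3
ABS_TOL = 1e-5


def is_suspected_bug(cpu_out, gpu_out, rel_tol=REL_TOL, abs_tol=ABS_TOL):
    '''Differential oracle: flag the program if the two backends disagree.'''
    # Outputs of different lengths are always inconsistent.
    if len(cpu_out) != len(gpu_out):
        return True
    for a, b in zip(cpu_out, gpu_out):
        # NaN on only one side is also an inconsistency.
        if math.isnan(a) or math.isnan(b):
            if math.isnan(a) != math.isnan(b):
                return True
            continue
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            return True
    return False


# Usage: tiny float drift passes, a large mismatch is flagged.
print(is_suspected_bug([1.0, 2.0], [1.0000001, 2.0]))  # False
print(is_suspected_bug([1.0, 2.0], [1.0, 2.5]))        # True
```

Choosing the threshold is the main design decision. If it is too tight, benign floating-point drift is reported as bugs. If it is too loose, real numerical bugs slip through.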
Evaluation: RQ1: How does TitanFuzz compare against existing DL library fuzzers? RQ2: How do the key components of TitanFuzz contribute to its effectiveness? RQ3: Is TitanFuzz able to detect real-world bugs? Setup: A 64-core workstation with 256 GB RAM running Ubuntu 20.04.5 LTS with 4 NVIDIA RTX A6000 GPUs. For RQ1: Compare with prior work. For RQ2: Conduct an ablation study. For RQ3: Analyze the detected real-world PyTorch and TensorFlow bugs. Future work: Apply LLMs to other areas. Improve the related algorithms (e.g., the evolutionary algorithm). Find other LLMs that achieve better performance.","link":"/blog/2023/11/15/21_paper_review_11/"},{"title":"[Review] Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm","text":"Link here The paper discusses prompt engineering, mainly focusing on GPT-3, and compiles several prompt engineering approaches. Background: The recent rise of massive self-supervised language models such as GPT-3 has raised interest in prompt engineering. For such models, 0-shot prompts may significantly outperform few-shot prompts, which again highlights the importance of prompt engineering. Some facts: 0-shot may outperform few-shot: instead of treating the examples as a categorical guide, the model infers that their semantic meaning is relevant to the task. GPT-3 resembles not a single human author but a superposition of authors. Methods for prompt engineering: Direct task specification: constructing the signifier. A signifier is a pattern that keys the intended behavior. It could be the name of the task, such as “translate”, or a compound description, such as “rephrase this paragraph so that a 2nd grader can understand it, emphasizing real-world applications”. Such a signifier explicitly or implicitly calls functions that it assumes the language model has already learned. Task specification by demonstration: Some tasks are most effectively communicated using examples. 
Examples can be an effective way to specify a task to GPT-3. Task specification by memetic proxy: GPT-3 demonstrates a nuanced understanding of analogies. GPT-3’s ability to create simulations of well-known figures and to draw on cultural information far exceeds the ability of most humans. Creating a narrative environment or staging a dialogue between a teacher and a student may be a good way to specify a task. Prompt programming as constraining behavior: Thanks to its abundant knowledge, GPT-3 can continue a prompt in the direction people want, but also in the opposite direction. So a contextually ambiguous prompt may be continued in mutually incoherent ways. GPT-3 will respond in many ways to a prompt if there are various ways to continue it, including all the ways unintended by the human operator. So what is needed is a prompt that is not merely consistent with the desired continuation, but inconsistent with undesired continuations. Serializing reasoning for closed-ended questions: For tasks that require reasoning, it is crucial that prompts direct a language model’s computation in truth-seeking patterns. It is reasonable to expect that some tasks may be too difficult to compute in a single pass but solvable if broken up into individually tractable sub-tasks. When extending reasoning, it is essential to discourage premature verdicts; otherwise, all subsequent computation serves only to rationalize the already-chosen verdict without improving the probability of the verdict’s accuracy. Metaprompt programming: Using a metaprompt to generate the whole prompt may be quite effective.","link":"/blog/2023/11/18/22_paper_review_12/"},{"title":"[Review] Automated Program Repair in the Era of Large Pre-trained Language Models","text":"Link here The paper presents the first extensive evaluation of recent LLMs for fixing real-world projects. It evaluates the effectiveness of Automated Program Repair (APR) in the era of LLMs. 
Several conclusions were drawn: As model size increases, the number of correct and plausible patches generated also increases. Successfully utilizing the code after the buggy lines is important for fixing bugs. While LLMs can perform fault localization and repair in one shot, for real-world software systems it is still more cost-effective to first use traditional fault localization techniques to pinpoint the precise bug locations and then leverage LLMs for more targeted patch generation. By directly applying LLMs to APR without any specific change or fine-tuning, we can already achieve the highest number of correct fixes compared to existing baselines. Entropy computed via LLMs can help distinguish correct patches from plausible patches. Sum entropy performs slightly better than mean entropy. Background: Current APR tools, both template-based and learning-based, are limited by their reliance on prior knowledge such as fix templates or historical bug fixes. Recent developments in building LLMs offer an alternative solution that can be applied to program repair without relying on historical bug fixes. Automated Program Repair (APR) tools are used to generate patched code given the original code and the corresponding buggy location. Evaluation: RQ1: How do different types of LLMs perform in different APR settings? RQ2: How does directly applying LLMs to APR compare against state-of-the-art APR tools? RQ3: Can LLMs be directly used for patch ranking and correctness checking? RQ4: Can we further improve the performance of LLMs? Select LLMs with different implementations and different parameter counts: GPT-Neo (125M, 1.3B, and 2.7B parameters), GPT-J (6.7B parameters), GPT-NeoX (20B parameters), Codex (12B parameters), CodeT5 (220M parameters), INCODER (1.3B and 6.7B parameters), and the Codex suffix version. Generation methods: Complete function generation: input the whole buggy function. 
Correct code infilling: given the bug location, generate the correct replacement code from the prefix and suffix of the buggy function. Single line generation: given the bug location, replace the single buggy line. Evaluate the relationship between entropy and patch correctness. Compare LLM-based methods with previous methods. Evaluate unique fixes for the same buggy segment (different from the official bug fix). Apply repair templates with LLMs to further improve performance. Future work: Apply LLMs to other areas. Combine other forms of guidance (for example, repair templates) with LLMs to achieve better performance. Improve the performance of LLMs themselves.","link":"/blog/2023/11/19/23_paper_review_13/"},{"title":"[Review] Examining Zero-Shot Vulnerability Repair with Large Language Models","text":"Link here The paper tests the performance of LLMs for vulnerability repair. It covers the same topic as Automated Program Repair in the Era of Large Pre-trained Language Models, but it focuses more on the details, and its program repair setting is much more complicated. Some conclusions were drawn: LLMs can generate fixes for bugs, but in real-world settings their performance is not yet sufficient. Background: Security bugs are significant. LLMs are popular and have outstanding performance. Implementation: RQ1: Can off-the-shelf LLMs generate safe and functional code to fix security vulnerabilities? RQ2: Does varying the amount of context in the comments of a prompt affect the LLM’s ability to suggest fixes? RQ3: What are the challenges when using LLMs to fix vulnerabilities in the real world? RQ4: How reliable are LLMs at generating repairs? Apply different LLMs: code-cushman-001, code-davinci-001, code-davinci-002, j1-large, j1-jumbo, gpt2-csrc (self-trained), polycoder. Synthetic experimentation: Synthesize buggy programs. 
Manually write the beginning of each program, and use LLMs to complete the whole program. The generated program may turn out to be valid, compilable, vulnerable, functional, or safe. Test the influence of different sampling parameters (temperature and top_p). Apply LLMs to repair the generated programs that are vulnerable, and evaluate the performance. A more specific prompt does not always perform better, but it performs better on average. The OpenAI Codex models consistently outperform the other models in generating successful patches (which suggests Codex is a strong tool for code generation). Test repair in hardware description languages (e.g., Verilog): LLMs were less proficient at producing Verilog code than at producing C or Python. Real-world bugs: security patches tend to be more localized, involve fewer source code modifications, and affect fewer functions than non-security bug fixes (from A Large-Scale Empirical Study of Security Patches, Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security). So security fixes tend to be concentrated, and focusing on the code near the vulnerability can be sufficient. A whole real-world program is too long for the model to take in, so the input is trimmed before prompting. The testing process is almost the same as for the synthesized buggy programs.","link":"/blog/2023/11/21/24_paper_review_14/"},{"title":"[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models","text":"Link here This paper presents a new approach to replaying Android bugs. More specifically, it builds a tool called AdbGPT that automatically converts bug reports into reproductions. AdbGPT is able to reproduce 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and the ablation variants. 
Background: Bug reports often contain the steps to reproduce (S2Rs) the bug. These steps help developers replicate and fix the bug, but following them still takes considerable engineering effort. The steps are written in free-form natural language, which makes them difficult for earlier pre-trained models to extract. LLMs, given a few prior examples, are well suited to this task. Implementation: The approach is divided into two phases: S2R Entity Extraction phase: extract the S2R entities that define each step needed to reproduce the bug report. Provide examples: an S2R as input, a chain of thought as reasoning, and the final entities as output. Then input the bug reports to query for their S2R entities. Guided Replay phase: match the S2R entities with the GUI states to replay the reproduction steps. GUI encoding: encode the GUI into HTML. Provide examples for in-context learning, which help AdbGPT understand the meaning of the HTML tags and learn how to handle missing steps. Use the ChatGPT model. Use Genymotion for running and controlling the virtual Android device, Android UIAutomator for dumping the GUI view hierarchy, and Android Debug Bridge (ADB) for replaying the steps. Evaluation: RQ1: How accurate is our approach in extracting S2R entities? RQ2: How accurate is our approach in guiding bug replay? RQ3: How efficient is our approach in bug replay? RQ4: How useful is our approach for developers in real-world bug replay? For RQ1: test the performance of AdbGPT in extracting S2R entities and compare it with baseline tools (ReCDroid, ReCDroid+, MaCa). Ablation study: without the pre-inputted examples (i.e., zero-shot), and without the intermediate reasoning. For RQ2: the same evaluation method as RQ1. For RQ3: measure the average time a bug report takes to pass through each of the two phases, using a 2.6 GHz MacBook Pro with a 6-core Intel Core CPU and the official Android x86-64 emulator, with the same comparisons as RQ1. 
For RQ4: invite experienced Android developers to replay the bug reports and record their time cost. Then replay the same bug reports using AdbGPT and compare the times. Future work: LLMs for software engineering!!!","link":"/blog/2023/11/22/25_paper_review_15/"},{"title":"[Review] Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting","text":"Link here The paper explores the ability of ChatGPT (specifically ChatGPT, not LLMs in general) to find failure-inducing tests. It proposes a new method for this task called Differential Prompting. The method achieves a success rate of 75.0% on QuixBugs programs and 66.7% on Codeforces programs. The approach may only be useful for small programs (less than 100 LOC). Background: Failure-inducing tests are test cases that trigger bugs in a specific program. Finding such tests is a major objective in software engineering, but it is challenging in practice. Applying LLMs (e.g., ChatGPT) to software engineering has recently become popular. However, directly applying ChatGPT to this task performs poorly, because ChatGPT is insensitive to nuances (i.e., subtle differences between two similar sequences of tokens). It is therefore hard for ChatGPT to identify bugs, since a bug is essentially a nuance between a buggy program and its fixed version. This same insensitivity to nuances, however, helps ChatGPT infer a program’s intention. So, once we know the intention of a possibly buggy program, we can ask ChatGPT to generate reference programs that follow this intention. Each reference program is assumed to be likely bug-free, and several are generated to make this more reliable. Then, given a test case, we compare the output of the original program with the outputs of the reference programs and check whether they differ. Similar to differential testing, this approach is named Differential Prompting. 
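The differential-testing step can be sketched in C++. This is a minimal sketch for illustration only. It assumes each program maps one integer input to one integer output, and that the PUT (program under test) and the reference programs are plain callables. The names `Program`, `isFailureInducing`, and `findFailureInducing` are hypothetical. As an extra assumption of this sketch, a test input is only trusted when all reference programs agree on its output:

```cpp
#include <functional>
#include <vector>

// Hypothetical model: a program maps one integer input to one integer output.
using Program = std::function<long long(long long)>;

// Flags `input` as failure-inducing when all reference programs agree on an
// output and the program under test (PUT) produces a different one.
bool isFailureInducing(const Program& put, const std::vector<Program>& refs,
                       long long input) {
    if (refs.empty()) return false;               // nothing to compare against
    const long long expected = refs.front()(input);
    for (const Program& ref : refs) {
        if (ref(input) != expected) return false; // references disagree: no verdict
    }
    return put(input) != expected;
}

// Runs the check over candidate inputs and returns the flagged ones.
std::vector<long long> findFailureInducing(const Program& put,
                                           const std::vector<Program>& refs,
                                           const std::vector<long long>& candidates) {
    std::vector<long long> found;
    for (long long x : candidates) {
        if (isFailureInducing(put, refs, x)) found.push_back(x);
    }
    return found;
}
```

Consider a buggy absolute-value function that forgets to negate negative inputs, checked against two correct reference implementations. Only the negative candidate inputs are flagged, because that is where the PUT disagrees with the agreeing references.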
Implementation: The approach is divided into three sub-tasks: program intention inference, program generation, and differential testing. Program intention inference: given the original program, ask ChatGPT to infer its intention. Program generation: based on the inferred intention, generate several reference programs. They should be different implementations of the same intention. Differential testing: perform differential testing between the original program and the reference programs. Evaluation: RQ1: Can correct failure-inducing test cases for QuixBugs programs be effectively found? RQ2: Can program intention be effectively inferred? RQ3: Can reference versions be effectively generated? RQ4: Can correct failure-inducing test cases for recent Codeforces programs be effectively found? Equipment: AMD Ryzen 7 5800 8-Core Processor at 3.40 GHz and 16 GB RAM. For RQ1: there are two baselines. BASECHATGPT prompts ChatGPT directly for failure-inducing test cases. PYNGUIN is the state-of-the-art unit test generation tool for Python. Dataset: QuixBugs, consisting of 40 pairs of buggy and patched Python programs. Overall, Differential Prompting’s success rate is 75.0%, which is 2.6X that of BASECHATGPT (28.8%) and 10.0X that of PYNGUIN (7.5%). For RQ2: manually analyze the intention of the programs and compare the results with the intention generated by ChatGPT. Differential Prompting’s success rate in inferring intention is 91.0%. For RQ3: the baseline directly prompts ChatGPT to generate reference versions. It first asks ChatGPT whether a PUT (program under test) has bugs. Upon an affirmative response, it further asks ChatGPT to generate two bug-fixed implementations of the PUT. Differential Prompting’s success rate in generating good reference versions is 74.6%, outperforming the baseline (6.8%) by 11.0X. So Differential Prompting is effective in generating good reference versions. 
For RQ4: to ensure validity (i.e., that the programs were not seen during training), conduct an evaluation on Codeforces programs released after the cutoff date of ChatGPT’s training data, with the same baselines as in RQ1. Differential Prompting’s success rate on these Codeforces programs (66.7%) is comparable to its success rate on QuixBugs programs (75.0%). Future work: Differential Prompting can only be applied to small programs, so dividing a large program into small ones deserves further investigation. Incorporate coverage-oriented methods to better generate failure-inducing tests.","link":"/blog/2023/11/23/26_paper_review_16/"},{"title":"[Review] DynSQL: Stateful Fuzzing for Database Management Systems with Complex and Valid SQL Query Generation","text":"Link here The paper designs a stateful DBMS fuzzer called DynSQL. DynSQL adopts two new techniques: Dynamic Query Interaction and Error Feedback. Instead of generating all the SQL statements of a query before executing them, Dynamic Query Interaction lets the fuzzer test the DBMS “step-by-step”. That is, it dynamically determines the next statement after executing each prior statement. Error Feedback, in turn, guides seed generation toward more valid SQL statements and queries. Background: Earlier DBMS testing tools include SQLsmith, SQUIRREL, and SQLancer. Existing DBMS fuzzers are still limited in generating complex and valid queries to find deep bugs in DBMSs. SQLsmith generates only one statement in each query. SQUIRREL produces over 50% invalid queries and tends to generate simple statements. SQLancer aims at finding logic bugs in DBMSs rather than general bugs. Implementation: Select a seed from the seed pool, mutate it, and generate files from the seed. The Translator translates the files into SQL statements, using DBMS state information. The Scheduler feeds statements to the DBMS one at a time and updates the state information after every execution. This lets the Translator provide a state-aware statement to the Scheduler next. 
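The step-by-step idea behind Dynamic Query Interaction can be illustrated with a toy C++ sketch. It is not DynSQL’s implementation. It assumes a hypothetical DBMS whose only state is the row count of each live table. The statement kinds and the names `DbState`, `nextStatement`, `execute`, and `interact` are all invented for illustration. Each statement is chosen only after the previous one has run and the state has been observed, so no statement ever refers to a table that does not exist:

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Toy statement kinds (hypothetical, not DynSQL's statement model).
enum class Stmt { CreateTable, Insert, Select, DropTable };

// Toy DBMS state: rows[i] is the row count of live table i.
struct DbState {
    std::vector<int> rows;
};

// Chooses the next statement from the state observed after the previous one.
Stmt nextStatement(const DbState& s, std::mt19937& rng) {
    if (s.rows.empty()) return Stmt::CreateTable; // nothing else is valid yet
    std::uniform_int_distribution<int> pick(0, 3);
    return static_cast<Stmt>(pick(rng));
}

// Stand-in for sending one statement to the DBMS and updating its state.
void execute(Stmt st, DbState& s, std::size_t target) {
    switch (st) {
        case Stmt::CreateTable: s.rows.push_back(0); break;
        case Stmt::Insert: s.rows[target] += 1; break;
        case Stmt::Select: break;
        case Stmt::DropTable: s.rows.erase(s.rows.begin() + target); break;
    }
}

// Step-by-step interaction: generate, execute, observe, repeat.
std::vector<Stmt> interact(int steps, unsigned seed) {
    std::mt19937 rng(seed);
    DbState s;
    std::vector<Stmt> trace;
    for (int i = 0; i < steps; ++i) {
        Stmt st = nextStatement(s, rng);
        std::size_t target = 0;
        if (!s.rows.empty()) { // pick an existing table as the target
            std::uniform_int_distribution<std::size_t> t(0, s.rows.size() - 1);
            target = t(rng);
        }
        execute(st, s, target);
        trace.push_back(st);
    }
    return trace;
}
```

Generating the whole sequence up front would require predicting the state after every statement. Interleaving generation with execution avoids that, which is the intuition behind producing statements one at a time.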
Code instrumentor: compiles (using Clang) and instruments the code of the target DBMS, producing an executable program that receives and processes SQL queries. Query interactor: receives input files from the file fuzzer and performs dynamic query interaction to generate complex and valid queries. It also collects the runtime information of the target DBMS needed for dynamic analysis. Statement generator: uses an internal AST model to generate syntactically correct SQL statements. Runtime analyzer: analyzes the collected runtime information, identifies seeds according to error feedback, and selects a seed for the next round of fuzzing. File fuzzer: performs conventional file fuzzing (like AFL) to generate files based on the given seeds. Bug checker: detects bugs based on the collected runtime information and generates corresponding bug reports. Evaluation: RQ1: Can DynSQL find bugs in real-world DBMSs by generating complex and valid queries? RQ2: How about the security impact of the bugs found by DynSQL? RQ3: How do dynamic query interaction and error feedback contribute to DynSQL in DBMS fuzzing? RQ4: Can DynSQL outperform other state-of-the-art DBMS fuzzers? Test targets: SQLite, MySQL, MariaDB, PostgreSQL, MonetDB. For RQ1: count the generated queries and statements and analyze their validity, and count the bugs found. Also analyze the statements in bug-triggering queries, the number of statements in different queries, the distribution of SQL statement types, and the size of bug-triggering queries. For RQ2: analyze the types of the detected bugs, and perform 3 case studies to illustrate the importance of the bugs found. For RQ3: perform an ablation study comparing DynSQL, DynSQL without dynamic query interaction, DynSQL without error feedback, and DynSQL without both. For RQ4: compare with SQLsmith and SQUIRREL in terms of code coverage, bug detection, and query complexity. 
Future work: adopt Dynamic Query Interaction and Error Feedback for logic bug detection. Adopt Dynamic Query Interaction in other fuzzing areas, e.g., REST API fuzzing.","link":"/blog/2023/11/28/27_paper_review_17/"},{"title":"[Review] PyRTFuzz: Detecting Bugs in Python Runtimes via Two-Level Collaborative Fuzzing","text":"Link here The paper proposes a new approach to Python fuzzing, called PyRTFuzz. PyRTFuzz divides the fuzzing process into two levels. The generation-based level generates Python applications. The mutation-based level applies mutation-based fuzzing to test the generated applications. Background: Python fuzzing faces three problems. First, testing the Python runtime requires testing both the interpreter core and the language’s runtime libraries. Second, diverse and valid (syntactically and semantically correct) Python applications are needed. Third, Python is dynamically typed, so type information is not explicitly available, which makes type-aware input generation difficult. Implementation: Runtime API Description Extraction extracts API descriptions from Python’s official documentation. Static Extraction: use Python’s standard AST parser to extract API descriptions. Dynamic Refinement: given the untyped description of a runtime API, run the unit tests to refine it into a typed API description. Level-1 Fuzzing: generation-based fuzzing that, for a single API, generates Python applications for testing, producing increasingly diverse applications for this API. Level-2 Fuzzing: given an application generated in level 1, perform mutation-based fuzzing, mutating the input data according to its data type. Evaluation: Note that PyRTFuzz only generates Python APPs that each use a single API, without considering potential dependencies among APIs. RQ1: How effective is PyRTFuzz on fuzzing Python runtime? RQ2: How scalable is Python APP generation in PyRTFuzz? RQ3: What are the factors affecting PyRTFuzz’s effectiveness? 
Targets: Python 3.9.15, Python 3.8.15, and Python 3.7.15. For RQ1: report the coverage and show the bug-triggering ability. For RQ2: show the impact of APP specification sizes on time costs. Increasing the APP specification size can generally help generate more complex Python APPs. For RQ3: evaluate how three dimensions influence effectiveness: APP specification size, Level-2 time budget, and typed versus untyped API descriptions. Two case studies introduce the bugs triggered. Future work: apply combined fuzzing (both generation-based and mutation-based) to other areas. Introduce a method that can test multiple APIs together.","link":"/blog/2023/12/04/28_paper_review_18/"},{"title":"[Review] Automatic Detection of Java Cryptographic API Misuses: Are We There Yet?","text":"Link here A large study of Java cryptographic API misuse, with two main contributions. First, it evaluates the effectiveness of existing cryptographic API misuse detection tools. Second, it conducts a study with developers to measure how useful the detectors are in practice. Introduction: JCA (Java Cryptography Architecture), JSSE (Java Secure Socket Extension). Java cryptographic API misuses are common and may cause extensive security problems. 13 Java types appear frequently in the API-misuse patterns. Implementation: 6 existing tools: CogniCrypt, CryptoGuard, CryptoTutor, FindSecBugs, SonarQube, and Xanitizer. 3 benchmarks: CryptoBench, MUBench, and OWASP Benchmark. Evaluation criteria: precision, recall, F-score, and runtime overhead. RQ1: How are current tools designed to detect cryptographic API misuses? RQ2: How effectively do current tools work to locate cryptographic API misuses? RQ3: How do developers perceive the usefulness of tools’ outputs? For RQ1: existing tools differ in their availability, input formats, pattern representations, pattern-matching strategies, and outputs. 
Most tools represent patterns as built-in rules, conduct inter-procedural analysis, and report detected API misuses as outputs. For RQ2: the measured time costs imply that, given hundreds of programs to scan, the tools usually respond within 18,000 seconds (five hours). No tool consistently worked best. However, CogniCrypt, CryptoGuard, and Xanitizer always outperformed SonarQube, probably due to their sophisticated inter-procedural analysis and larger pattern sets. For RQ3: developers are usually negative about the reported security-API misuses, for two reasons. Incomplete fixing suggestions: the fixing guidance offered by the tools is incomplete, so extra fixing effort is needed. Complex repair procedures: some repairs are challenging and time-consuming, and developers get almost no tool assistance for configuring non-code artifacts. Developers need tools to provide more detailed suggestions on repair edits and non-code artifact configuration. Some PRs (pull requests) were rejected, mainly for two reasons. No exploit demo: developers require actual security attacks that demonstrate the exploit. False positives without actual security impact: some misuses cause no damage to the system. Developers need tools that demonstrate security exploits of vulnerabilities and that skip issues located in test cases, archived code, and security-irrelevant contexts. Future work: new detection methods beyond traditional static analysis. Combine different existing tools to achieve better performance. 
Extend the tools with other functions, like demonstrating security exploits of vulnerabilities.","link":"/blog/2023/12/09/29_paper_review_19/"},{"title":"Computer Systems: A Programmer’s Perspective (CSAPP): Preface","text":"Although winter break is almost over, I have started a new series for myself on a whim. While chatting with a senior schoolmate, I heard about CSAPP. It is said to be a good book with an accompanying course, so I decided to give it a try. Partly, I want to make good use of the last days of the holiday and keep myself from slacking off. I am studying from the book, but there is also a corresponding CMU course, and recorded lectures of it are available on Bilibili (link below). I recommend learning from the book and using the videos for review. [Proofread Chinese/English subtitles] 2015 CMU 15-213 CSAPP Computer Systems: A Programmer’s Perspective course videos","link":"/blog/2022/02/15/2_csapp_pre/"},{"title":"[Review] CryptoGuard: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Java Projects","text":"Link here The paper designs a new architecture called CryptoGuard to detect cryptographic API misuse. It uses 16 rules to find misuses and 5 refinement methods to avoid false positives, resulting in a precision of 98.61%. It also creates a benchmark named CryptoApi-Bench with 112 unit test cases. The benchmark covers basic intraprocedural instances, inter-procedural cases, field-sensitive cases, false positive tests, and correct API uses. Introduction: For cryptographic API misuse detection, both static and dynamic analyses have their respective pros and cons. Static methods do not require executing the programs. They scale up to a large number of programs, cover a wide range of security rules, and are unlikely to have false negatives. Dynamic methods require one to trigger and detect specific misuse symptoms at runtime. They tend to produce fewer false positives than static analysis. API misuses mainly fall into the following categories: Vulnerabilities due to predictable secrets. Vulnerabilities from MitM attacks on SSL/TLS. Vulnerabilities from predictable PRNGs (pseudorandom number generators). Vulnerabilities from CPA (chosen-plaintext attacks). Vulnerabilities from feasible brute-force attacks. Implementation: Apply static def-use analysis and forward and backward program slicing to detect Java cryptographic API misuses. Apply refinements: RI-I: Removal of state indicators. 
Discard constants/predictable values that are used to describe the state of a variable during an orthogonal method invocation. RI-II: Removal of resource identifiers. Discard constants/predictable values that are used as the identifier of a value source during an orthogonal method invocation. RI-III: Removal of bookkeeping indices. Discard constants/predictable values that are used as the index or size of any data structure. Specifically, RI-III discards any influence on i) the size parameter of an array or collection instantiation, ii) array indices, and iii) collection indices. RI-IV: Removal of contextually incompatible constants. Discard constants/predictable values whose types are incompatible with the analysis context. For example, a boolean variable cannot be used as a key, IV, or salt. RI-V: Removal of constants in infeasible paths. Some constant initializations are reassigned along the path to the slicing criterion. Initializations that have no valid path of influence to the criterion must therefore be discarded. Evaluation: RQ1: What are the security findings in Apache projects? Do Apache projects have any high-risk vulnerabilities such as hardcoded secrets or MitM vulnerabilities? RQ2: What are the security findings in Android apps? Do third-party libraries have any high-risk vulnerabilities? RQ3: How does CryptoGuard compare with CrySL, SpotBugs, and the free trial version of Coverity on benchmarks or real-world projects? Future work: the refinement methods may cause false negatives while reducing false positives. Generate reports that show how to exploit the vulnerabilities.","link":"/blog/2023/12/09/30_paper_review_20/"},{"title":"[Review] Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks","text":"Link here The paper assesses the performance of current static vulnerability detection tools in the area of Java cryptographic API misuse. 
Main contributions: It provides two benchmarks, CryptoAPI-Bench and ApacheCryptoAPI-Bench. CryptoAPI-Bench consists of 181 test cases covering 16 types of cryptographic and SSL/TLS API misuse vulnerabilities, at a basic level and an advanced level. ApacheCryptoAPI-Bench documents the API misuse vulnerabilities in 10 real-world Apache projects. This benchmark checks the scalability of detection tools, i.e., their ability to analyze large code bases with low computational overhead. It evaluates four static analysis tools on the two benchmarks: specialized tools (CryptoGuard, CrySL) and general-purpose tools (SpotBugs, Coverity). Background: categorizes the types of cryptographic API misuse and briefly introduces the 4 static analysis tools. Implementation: CryptoAPI-Bench: Basic Cases: simple misuse examples. Advanced Cases: more complex examples. Interprocedural Cases: the API misuse spans different functions (procedures). Field-Sensitive Cases: the API misuse spans different fields of the same object. Combined Cases: combine Interprocedural Cases and Field-Sensitive Cases. Path-Sensitive Cases: whether the misuse executes depends on a path condition. Miscellaneous Cases: test whether tools can distinguish irrelevant constraints or other interfaces. Multiple Class Cases: the API misuse spans different classes. ApacheCryptoAPI-Bench: includes early versions of 10 large real-world Apache projects to check the scalability of the tools. It lists 121 test cases: 79 basic and 42 advanced. The ground-truth API misuses are filtered by checking the official documentation of the Apache projects. Evaluation: Evaluation criteria: true positives, false positives, false negatives. Main findings: tools specialized for detecting cryptographic misuses cover more rules and achieve higher recall than general-purpose tools. None of the existing tools is path-sensitive. Future work: focus on path-sensitive API misuse detection. 
Focus on API misuse in other areas and other programming languages.","link":"/blog/2023/12/15/31_paper_review_21/"},{"title":"[Review] Python Crypto Misuses in the Wild","text":"Link here The paper conducts a study of Python crypto API misuses. It also implements a tool called LICMA to detect them. Several conclusions: 52.26% of the Python projects using crypto APIs contain at least one potential misuse. Only 14.81% of the projects directly contain a misuse of a crypto API. The rest are introduced through third-party code. Overall, Python applications appear more secure than C or Java ones with respect to crypto misuse, and the distribution of concrete misuse types differs considerably. Background: There are tools that detect crypto API misuses in C and Java, but none for Python. There are some user studies of API misuses in Python, but no empirical analysis. Implementation: Use Babelfish to create a Universal Abstract Syntax Tree (UAST). With the defined rules, filter the UAST with XPath and perform backward analysis. Five crypto modules are covered: cryptography, M2Crypto, PyCrypto, PyNaCl, and ucryptolib. Future work: apply crypto API misuse detection to other programming languages, like Rust and Go. Further develop useful tools for AST abstraction and backward analysis.","link":"/blog/2023/12/22/32_paper_review_22/"},{"title":"[LeetCode] 4. Median of Two Sorted Arrays","text":"Link here An excellent binary search problem. It’s easy to solve with brute force or a merge, but the problem requires a time complexity of $O(log(m+n))$. The “log” naturally suggests binary search, but constructing such an algorithm is the hard part. 
Here are three ways to solve the problem. Method 1: O(log(m)*log(n)). Finding the median of two sorted arrays means finding the k-th (or the k-th and (k+1)-th) smallest element of their union. So we can use binary search to find the rank of a number. For example, after picking a number from array A, we can binary-search array B to count how many numbers in B are smaller than it. Its rank then follows directly, in $O(log(n))$ or $O(log(m))$ time. Since we need the median, a second binary search is needed. We binary-search array A, compute the rank of the selected number in each step, and move the boundary left or right after comparing that rank with the median rank (i.e., $(n+m+1)/2$). In this way, we can solve the problem in $O(log(m)*log(n))$. Method 2: O(log(m*n)). The key operations are specified in the following picture. In every step of the binary search, at least half of one array can be discarded, shrinking the search space. Method 3: O(log(min(m, n))). We can apply binary search to a single array only, and we choose the shorter one. After we fix the partition point of array A ($partitionA$), the partition point in array B is $(n+m+1)/2-partitionA$, as shown in the picture below. Now we compare the edge elements. maxLeftA and minRightA are the elements just left and right of the cut in A (A[partitionA-1] and A[partitionA]). maxLeftB and minRightB are defined the same way for B. When a side of a cut is empty, -∞ or +∞ is used instead. Comparing these four elements gives the following conclusions: maxLeftA<=minRightA and maxLeftB<=minRightB, because the arrays are sorted. If maxLeftA<=minRightB and maxLeftB<=minRightA, we have found the answer. If maxLeftA<=minRightB and maxLeftB>minRightA, we move the binary search boundary to the right. If maxLeftA>minRightB and maxLeftB<=minRightA, we move the boundary to the left. It’s impossible that both maxLeftA>minRightB and maxLeftB>minRightA. 
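The case analysis of Method 3 can be written as a small helper. This is a hypothetical sketch for illustration; the `Move` enum and the `classify` function are not part of the original solution. INT_MIN and INT_MAX stand in for -∞ and +∞ when one side of a cut is empty:

```cpp
#include <climits> // INT_MIN / INT_MAX serve as the -inf / +inf sentinels

// Outcome of comparing the four edge elements around a pair of cuts.
enum class Move { Found, GoRight, GoLeft };

// maxLeftA/minRightA: elements just left/right of the cut in the shorter array A.
// maxLeftB/minRightB: the same for array B. Use INT_MIN/INT_MAX for empty sides.
Move classify(int maxLeftA, int minRightA, int maxLeftB, int minRightB) {
    if (maxLeftA <= minRightB && maxLeftB <= minRightA) return Move::Found;
    // Both conditions can never fail together, so exactly one failed here.
    if (maxLeftB > minRightA) return Move::GoRight; // cut in A is too far left
    return Move::GoLeft;                            // maxLeftA > minRightB: too far right
}
```

Take A = [2] and B = [1, 3] as an example. Cutting A after its only element gives maxLeftA = 2, minRightA = INT_MAX, maxLeftB = 1, and minRightB = 3. classify then returns Found, and the median is max(2, 1) = 2. Cutting A before its first element instead gives maxLeftB = 3 > minRightA = 2, so the search moves right.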
So we can solve the problem in $O(log(min(m, n)))$. The code is shown below. class Solution {public: double findMedianSortedArrays(vector<int>& nums1, vector<int>& nums2) { int m = nums1.size(), n = nums2.size(); /* make nums1 the shorter array */ if (m > n) return findMedianSortedArrays(nums2, nums1); int l = 0, r = m, mid1, mid2; while (l <= r) { /* mid1, mid2: cut positions in nums1 and nums2 */ mid1 = (l + r) >> 1; mid2 = (m + n + 1) / 2 - mid1; int l1 = (mid1 == 0) ? INT_MIN : nums1[mid1 - 1]; int r1 = (mid1 == m) ? INT_MAX : nums1[mid1]; int l2 = (mid2 == 0) ? INT_MIN : nums2[mid2 - 1]; int r2 = (mid2 == n) ? INT_MAX : nums2[mid2]; if (l1 <= r2 && l2 <= r1) { if ((n + m) % 2 == 1) return max(l1, l2); else return ((double)max(l1, l2) + min(r1, r2)) / 2; } else if (l1 <= r2 && l2 > r1) { l = mid1 + 1; } else if (l1 > r2 && l2 <= r1) { r = mid1 - 1; } } return 0.0; /* unreachable for valid input */ }};","link":"/blog/2023/12/27/33_leetcode_1/"},{"title":"[LeetCode] 1531. String Compression II","text":"Link here I thought it should be greedy at first, but got stuck on Example 2. Then I considered DP, but found it hard to figure out how to build the larger solution from the subproblems, so once again I looked up the solution. We use dp[i][j] to represent the minimum compressed length of the first i letters after j deletions. If we delete the i-th letter, then dp[i][j] = dp[i-1][j-1]. If we keep it, we scan backward from the i-th letter, keeping the letters equal to it and deleting the others. The kept letters then “combine” into one run ending at position i, and each choice of the run’s start gives a candidate value for dp[i][j]. 
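The cost of keeping a run depends only on its length, which is what the four-way branch on lasting_length in the solution encodes. A hypothetical helper makes the rule explicit (`runCost` is not in the original code). It assumes the problem’s constraint of at most 100 letters, so a count never exceeds three digits:

```cpp
// Compressed length of a run of `len` identical letters (1 <= len <= 100):
// a single letter is written as-is, longer runs as the letter plus its count.
int runCost(int len) {
    if (len == 1) return 1;  // a
    if (len <= 9) return 2;  // a2 ... a9
    if (len <= 99) return 3; // a10 ... a99
    return 4;                // a100
}
```

With this helper, keeping the run that starts at position h and ends at i updates dp[i][j] with dp[h-1][j-del] + runCost(lasting_length).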
Time complexity: $O(n^2 k)$ class Solution {public: int getLengthOfOptimalCompression(string s, int k) { int n = s.length(); vector<vector<int>> dp(n + 5, vector<int>(k + 5, 2e9)); dp[0][0] = 0; for (int i = 1; i <= n; ++i) { for (int j = 0; j <= min(n, k); ++j) { /* option 1: delete s[i-1] */ if (j > 0) dp[i][j] = dp[i - 1][j - 1]; /* option 2: keep s[i-1] as the end of a run starting at h */ int del = 0, lasting_length = 0; for (int h = i; h >= 1; --h) { if (s[h - 1] == s[i - 1]) { lasting_length++; } else { del++; } if (del > j) break; if (lasting_length == 1) dp[i][j] = min(dp[i][j], dp[h - 1][j - del] + 1); else if (lasting_length <= 9) dp[i][j] = min(dp[i][j], dp[h - 1][j - del] + 2); else if (lasting_length <= 99) dp[i][j] = min(dp[i][j], dp[h - 1][j - del] + 3); else dp[i][j] = min(dp[i][j], dp[h - 1][j - del] + 4); } } } return dp[n][k]; }};","link":"/blog/2023/12/28/34_leetcode_2/"},{"title":"[Review] Towards Precise Reporting of Cryptographic Misuses","text":"Link here The paper presents an investigation into Java cryptographic misuse. In brief, it studies current misuse detection techniques and analyzes the false positive and true positive cases they report. It then identifies the root causes of the high false positive rate and of invalid true positive cases. Introduction: Many cryptographic misuse detection techniques have been proposed, but they have high false positive rates. Additionally, many of the misuse alarms might not be very actionable for developers, and previous works might have overestimated the number of misuses and vulnerabilities. Implementation: Three main detection tools: CRYPTOGUARD, CogniCryptSAST, CRYPTOREX. The investigation has two parts: false positives and invalid true positives. Manually inspect the false positive cases and analyze their root causes. Results: Some detectors’ implementations may contain mistakes. The detection rules of static detectors should be updated. 
The same crypto API may be unsafe in some Java versions but safe in later ones. A static seed for a random number generator is not always risky, because depending on the implementation, a constant seed can still yield proper randomness. Whitelists require careful curation to capture common legitimate programming patterns. AES-ECB, http://, non-CSPRNGs, and collision-prone hash functions have legitimate usages where they provide sufficient guarantees and desirable performance. Developers are sometimes bound by standard mandates to use certain algorithms and constants. As a partial refinement, one can extract class/method names known to implement such standards and incorporate them into a misuse alarm filter. Future work: take real-world usage contexts into account instead of relying only on fixed misuse patterns.","link":"/blog/2024/03/24/36_paper_review_24/"},{"title":"[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications","text":"Link here The paper is an evaluation study that assesses current runtime verification (RV) technology, mainly the effectiveness of existing API specifications. Three conclusions: Current RV technology has matured enough, with tolerable runtime overhead. Existing API specifications can find many bugs that developers are willing to fix. The false alarm rates are quite high because many specifications are ineffective. Introduction: A specification is a way to use an API as asserted by the developer or analyst. It encodes information about the behavior of a program when the API is used. In RV, the execution of a software system is dynamically checked against formal specifications: the program is monitored, and any violations are captured. Experimental setup: 199 specs (182 manually written, 17 automatically mined) and 200 open-source projects. Each project uses Maven, has at least one test, and has all tests pass both without monitoring and when monitoring with JavaMOP. 
Environment: Intel i7-3770K CPU @ 3.50GHz with 32GB of RAM, running Ubuntu 14.04.4 LTS and Java 7 or 8. Manually written specs: written by Lee et al. or selected from elsewhere. Automatically mined specs: paper search -> paper filtering -> emailing the authors => 17 related papers. Test generation: Randoop. Violations are divided into two groups: dynamic violations (DVs) and static violations (SVs). Because the same violation may be triggered many times, multiple DVs can be grouped into a single SV. Violations are classified as: TrueBug: a potential bug, to be confirmed by reporting it to the developers or by checking whether it was already fixed. FalseAlarm: the violation does not indicate a bug in the code but effectively a bug/imprecision in the spec. HardToInspect: the violation is hard to classify as a TrueBug or a FalseAlarm, because source code is missing or is particularly hard to reason about. The false alarm rate (FAR) is then calculated. Results The FARs are high, exceeding 80%. The FARs are similar across all the dimensions examined, which suggests that they are mostly due to the inherent (in)effectiveness of the specs rather than to specific code-related factors. Violations in libraries are somewhat more likely to be false alarms, as one would expect libraries to be better tested and to have fewer bugs than project code. Existing specs are rather ineffective for finding bugs because they raise too many false alarms. Future works Better specification generation techniques (maybe try LLMs). Better automatic specification mining techniques. Automated filtering of specs and false alarms. 
Or try to decrease false alarms from the code side (e.g., a plugin that warns about code patterns likely to trigger false alarms).","link":"/blog/2024/02/06/35_paper_review_23/"},{"title":"[Review] One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels","text":"Link The paper focuses on reference counting (refcounting) bugs in the Linux kernel. It analyzes the history of 1,033 refcounting bugs in 753 versions of the Linux kernel from 2005 to 2022 and summarizes 9 critical rules for checking refcounting bugs. It then designs a new tool that applies these 9 rules, detecting 351 new bugs, of which 240 are confirmed. Introduction Reference counting: a reference count records the number of references to an object (similar to smart pointers in C++), and refcounting bugs are misuses of it. Potential risks: memory leaks and use-after-free (UAF). Implementation Analyzes the bug history in detail. Findings: A majority (741/1033, about 71.7%) of the studied refcounting bugs can lead to memory leaks, and more than two-thirds (694/1033, about 67.2%) of all bugs are caused by missing-decrease problems. More than half (590/1033, about 57.1%) of the bugs can be detected by searching for unpaired operations within the same function. Refcounting bugs in the Linux kernel follow a long-tailed distribution: about 82.4% (851/1033) of them are found in the “drivers”, “net”, and “fs” subsystems, and more than half (588/1033, about 56.9%) of all bugs occurred in “drivers”. Interestingly, when explaining the root cause of the hidden-refcounting bug type, the authors show a table in which word vectors are used to show why this kind of bug is so easy to introduce. 
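Over half of the bugs show up as unpaired operations within one function, which suggests how simple a first-cut check can be. Below is a minimal C++ sketch, not the authors' tool. It assumes each execution path of a function has already been reduced to the sequence of get/put operations it performs on one object; the Op and Verdict types are hypothetical. It also assumes the function should leave the count unchanged on every path, which does not hold for functions that intentionally hand a reference to their caller.

```cpp
#include <vector>

// Hypothetical abstraction: one execution path of a function, reduced to the
// refcount operations it performs on a single object.
enum class Op { Get, Put };

enum class Verdict { Balanced, MissingPut, ExtraPut };

// Flags unpaired operations on one path, assuming the function should leave
// the count unchanged (false for functions that deliberately return a reference).
Verdict checkPath(const std::vector<Op>& path) {
    int net = 0;
    for (Op op : path) {
        net += (op == Op::Get) ? 1 : -1;
        if (net < 0) return Verdict::ExtraPut;  // put with no matching get: possible use-after-free
    }
    return net > 0 ? Verdict::MissingPut       // get never released: possible leak
                   : Verdict::Balanced;
}

// Indices of the paths through the function that are not balanced.
std::vector<int> unpairedPaths(const std::vector<std::vector<Op>>& paths) {
    std::vector<int> bad;
    for (int i = 0; i < static_cast<int>(paths.size()); ++i)
        if (checkPath(paths[i]) != Verdict::Balanced) bad.push_back(i);
    return bad;
}
```

A typical missing-decrease bug is an error path that returns early. Its path reduces to a lone get, which this checker reports as MissingPut (a potential leak).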
The authors implement a new static analysis tool based on the 9 rules.","link":"/blog/2024/04/25/38_paper_review_26/"},{"title":"[Review] HEALER: Relation Learning Guided Kernel Fuzzing","text":"Link The paper proposes relation learning, a technique that infers the relations between system calls to guide kernel fuzzing. Relation learning constructs a relation graph, stored as a two-dimensional table in which each cell R_{ij} records the dependency between two system calls. The graph is built through static and dynamic learning: static learning infers dependencies by analyzing the parameters and return value of each system call, while dynamic learning determines dependencies by analyzing generated, minimized system call sequences. Introduction Kernel fuzzing is critical, but current works like Syzkaller and MoonShine cannot infer the relations between system calls, so they can miss problems hidden deep in the kernel. HEALER is designed to address this with relation learning. HEALER achieves 28% and 21% higher coverage than Syzkaller and MoonShine on average, respectively. Furthermore, HEALER reaches the same coverage as Syzkaller and MoonShine with speed-ups of 2.2× and 1.8×, respectively. Implementation System call descriptions (written in Syzlang) are reused. Static learning: infers dependencies from the parameters and return value of each system call. Dynamic learning: infers dependencies from generated, minimized system call sequences. The relation graph then guides mutation and parameter synthesis during fuzzing. Evaluation RQ1: How well does HEALER perform compared to Syzkaller and Moonshine? RQ2: How effective is relation learning in assisting test case generation and mutation? RQ3: How does HEALER perform in vulnerability detection? The evaluation includes an ablation study. 
A variant, HEALER- (HEALER without relation learning), is built for comparison. Compared fuzzers: HEALER, HEALER-, Syzkaller, and MoonShine. Measured dimensions: branch coverage, efficiency, vulnerabilities detected, system call sequence length, and a case study. Future work Automatic generation of Syzlang descriptions. Better relation inference techniques to produce longer system call sequences.","link":"/blog/2024/04/01/37_paper_review_25/"},{"title":"[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation","text":"Link The paper proposes trace distillation: distilling the key system calls out of the original system call traces without lowering coverage, and using the distilled sequences as seeds for mutation during fuzzing. Dependencies between system calls are inferred to guide the distillation, so the real source of the speed-up is the dependency inference. Seed distillation uses static analysis to infer both explicit and implicit dependencies between system calls. MoonShine improved Syzkaller’s test coverage for the Linux kernel by 13% and discovered 17 new, previously undisclosed vulnerabilities in the Linux kernel. Introduction Kernel fuzzing is a long-standing topic. Challenges: dependencies between system calls, and reaching the kernel states needed to trigger specific bugs. Existing hand-coded rules are neither scalable nor effective. Implementation Collects system call traces from the Linux Testing Project, Linux kernel selftests (kselftests), Open Posix Tests, and the Glibc Testsuite. The traces are refined by explicit and implicit dependency inference. Explicit dependency inference: connects the arguments and return values of two system calls (e.g., a file descriptor returned by one call and passed to another). Implicit dependency inference: if one system call affects another by changing kernel state (e.g., modifying a global variable), there is an implicit dependency between them. 
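The two kinds of dependencies can be sketched in C++ as follows. The sketch also shows one plausible reading of how they drive distillation: keep the calls that contribute coverage, then pull in every earlier call they depend on. It assumes each traced call has already been summarized into resource sets and kernel-global sets. The Call struct, its fields, and distill are hypothetical and far simpler than MoonShine's real analysis of kernel source.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified summary of one traced system call.
struct Call {
    std::string name;
    std::set<std::string> produces;  // resources it returns, e.g. a file descriptor
    std::set<std::string> consumes;  // resources it takes as arguments
    std::set<std::string> writes;    // kernel globals assigned by its handler
    std::set<std::string> reads;     // kernel globals tested in its conditionals
};

static bool intersects(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const auto& x : a)
        if (b.count(x)) return true;
    return false;
}

// Explicit: an earlier call produces a resource that a later call consumes.
bool explicitDep(const Call& earlier, const Call& later) {
    return intersects(earlier.produces, later.consumes);
}

// Implicit: an earlier call writes kernel state that a later call branches on.
bool implicitDep(const Call& earlier, const Call& later) {
    return intersects(earlier.writes, later.reads);
}

// Keeps the calls that contributed coverage plus, transitively, every earlier
// call they depend on; returns the kept indices in trace order.
std::vector<int> distill(const std::vector<Call>& trace, const std::set<int>& covering) {
    std::set<int> keep(covering.begin(), covering.end());
    for (int j = static_cast<int>(trace.size()) - 1; j >= 0; --j) {
        if (!keep.count(j)) continue;
        for (int i = 0; i < j; ++i)
            if (explicitDep(trace[i], trace[j]) || implicitDep(trace[i], trace[j]))
                keep.insert(i);  // i < j, so it is processed later in this loop
    }
    return std::vector<int>(keep.begin(), keep.end());
}
```

For example, take a hypothetical trace open -> getpid -> ioctl -> read in which only read adds new coverage. Distillation keeps read, pulls in open through the file descriptor (explicit) and ioctl through a kernel flag it sets (implicit), and drops getpid.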
MoonShine detects implicit dependencies with static analysis of the kernel source code (checking assignment statements and conditional statements). Evaluation RQ1: Can MoonShine discover new vulnerabilities? RQ2: Can MoonShine improve code coverage? RQ3: How effectively can MoonShine track dependencies? RQ4: How efficient is MoonShine? RQ5: Is distillation useful? Baselines: MoonShine (implicit + explicit), MoonShine (explicit only), RANDOM (randomly chooses system calls when distilling), and default Syzkaller. Results: MoonShine found 17 new vulnerabilities that default Syzkaller cannot find, 10 of which can only be found using implicit dependency distillation. MoonShine achieves 13% higher edge coverage than default Syzkaller. MoonShine distills 3220 traces consisting of 2.9 million calls into seeds totaling 16,442 calls that preserve 86% of trace coverage. MoonShine collects and distills 110 gigabytes of raw program traces in under 80 minutes. Running Syzkaller with undistilled seeds slows the mutation rate by 53%, whereas running it on distilled seeds only reduces the mutation rate to 88.4% of what default Syzkaller achieves. Future work Dependency inference (still an important topic). Thread-related dependency inference: some dependencies occur between different threads, and traditional methods find these hard to infer.","link":"/blog/2024/04/28/39_paper_review_27/"},{"title":"Introduction to Algorithms (ITA): Preface","text":"I go back to school the day after tomorrow, so winter break is really ending, and here I am starting another new series… I had been following an online CSAPP course, but found that the lectures were not detailed enough: to really learn it, you have to read the book. So I closed the course and picked up the book. But reading is really dull, much more tedious than watching videos, so I feel it is better to write something along the way (even though I still have two 3,000-word reflection reports to write). I bought this book, Introduction to Algorithms, a long time ago. Having done OI (competitive programming) in high school, I still have a soft spot for algorithms. Although I have not studied algorithms for almost two years, learning something is always good. Finally, I hope I don’t become someone who only ever writes prefaces…","link":"/blog/2022/02/16/3_ita_pre/"},{"title":"[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis","text":"Link The paper introduces GPTScan, which detects logic bugs in smart contracts by combining an LLM (GPT) with traditional static analysis. 
GPTScan depends only lightly on the LLM, which merely determines whether a target function has a bug. Moreover, the criteria for deciding whether there is a bug are hand-written, so the LLM makes up only a small part of the tool. GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, with a recall of over 70% for detecting ground-truth logic vulnerabilities. Implementation GPT alone may overlook low-level information, potentially leading to low recall and many false positives. Pipeline: multidimensional filtering + static reachability analysis -> candidate functions; candidate functions -> GPT -> YES/NO; if YES -> GPT -> key variables and key statements -> static analysis. Some techniques: “mimic-in-the-background” prompting: add “You can mimic answering them in the background five times and provide me with the most frequently appearing answer.” to the prompt. Set the temperature to 0 to make the output as deterministic as possible. Multidimensional filtering: filter out libraries and test files. Evaluation RQ1: What is the false positive rate of GPTScan when analyzing a dataset of non-vulnerable top contracts? RQ2: How accurate is GPTScan in analyzing real-world datasets with logic vulnerabilities, and how effective is it compared to existing tools? RQ3: How effective is GPTScan’s static confirmation in improving the accuracy of GPTScan? RQ4: What are the running performance and financial costs of GPTScan? RQ5: Can GPTScan discover new vulnerabilities that were previously missed by human auditors? Datasets: Top200: smart contracts with the top 200 market capitalizations. Web3Bugs: collected from the recent Web3Bugs dataset. DefiHacks: sourced from the well-known DeFi Hacks dataset, which contains vulnerable contracts that have experienced past attack incidents. 
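Baselines: Slither and MetaScan’s online static scanning service.","link":"/blog/2024/06/04/40_paper_review_28/"},{"title":"Computer Systems: A Programmer’s Perspective (CSAPP), Part 1","text":"A Tour of Computer Systems. Information is bits + context: 8 bits make a byte. Files consisting exclusively of ASCII characters are called text files; all other files are called binary files. All information in a system is represented as a bunch of bits, and the only thing that distinguishes different data objects is the context in which we view them. Programs are translated by other programs into different forms. Preprocessing phase: the preprocessor modifies the original C program according to directives that begin with #. hello.c$\\rightarrow$hello.i. Compilation phase: the compiler translates hello.i into an assembly-language program. hello.i$\\rightarrow$hello.s. Assembly phase: the assembler translates hello.s into machine-language instructions, packages them into a relocatable object program, and stores the result in hello.o. hello.s$\\rightarrow$hello.o. Linking phase: separately precompiled object files are merged with hello.o to produce the executable file hello. hello.o$\\rightarrow$hello. It pays to understand how compilation systems work: it helps to optimize program performance, understand link-time errors, and avoid security holes. Processors read and interpret instructions stored in memory: at this point, hello.c has been translated by the compilation system into the executable file hello, which is stored on disk. To run it on a Unix system, we type its name into an application program called a shell. Hardware organization of a system. Buses: a collection of electrical conduits running throughout the system that carry bytes of information back and forth between the components. I/O devices: the system’s connection to the external world. Each I/O device is connected to the I/O bus by either a controller or an adapter; both transfer information between the I/O bus and the I/O device. Main memory: a temporary storage device that holds both a program and the data it manipulates while the processor is executing the program. Physically, it consists of a collection of dynamic random access memory (DRAM) chips; logically, it is a linear array of bytes, each with its own unique address (array index) starting at zero. Central processing unit (CPU), or processor: executes instructions stored in main memory. At its core is a word-size register called the program counter (PC), which points at a machine-language instruction (like CS:IP on 16-bit x86). Arithmetic/logic unit (ALU). Some operations the CPU might carry out: Load: copy a byte or a word from main memory into a register. Store: copy a byte or a word from a register to main memory. Operate: copy the contents of two registers to the ALU, perform an arithmetic operation on the two words, and store the result in a register. Jump: change the value of the PC. Distinguish a processor’s instruction set architecture from its microarchitecture: the instruction set architecture describes the effect of each machine-code instruction, while the microarchitecture describes how the processor is actually implemented. Running the hello program. Caches matter: the example above shows that a system spends a lot of time moving information from one place to another. Because of physical laws, larger storage devices are slower than smaller ones. Cache memories hold information that the processor is likely to need in the near future; the L1 and L2 caches are implemented with a hardware technology known as static random access memory (SRAM). Storage devices form a hierarchy, in which each level serves as a cache for the level below it. The operating system manages the hardware: any attempt by an application program to manipulate the hardware must go through the operating system. The operating system has two purposes: to protect the hardware from misuse by runaway applications, and to provide applications with simple and uniform mechanisms for manipulating complicated and often wildly different low-level hardware devices. It achieves both through three abstractions: processes, virtual memory, and files. Files are abstractions for I/O devices; virtual memory is an abstraction for both main memory and disk I/O devices; processes are abstractions for the processor, main memory, and I/O devices. Processes: a process creates the illusion that the program is running alone on the system. Multiple processes can run concurrently on the same system, and each appears to have exclusive use of the hardware. Running concurrently means that the instructions of one process are interleaved with those of another. Threads: a process can actually consist of multiple execution units, called threads, each running in the context of the process. It is easier to share data between multiple threads than between multiple processes, and threads are typically more efficient than processes. Virtual memory: gives each process the illusion that it has exclusive use of main memory. Each process has the same uniform view of memory, known as its virtual address space. The topmost region of the address space is reserved for code and data in the operating system, and the lower part holds the code and data defined by the user’s process. In the figure, addresses increase from bottom to top. The virtual address space seen by each process consists of a number of well-defined areas, each with a specific purpose; starting with the lowest addresses and working upward: Program code and data: for all processes, code begins at the same fixed address, followed by data locations that correspond to global C variables. The code and data areas are initialized directly from the contents of an executable object file, in our case the executable hello. Heap: the code and data areas are followed immediately by the run-time heap. Unlike the code and data areas, which are fixed in size once the process begins running, the heap expands and contracts dynamically at run time as a result of calls to C standard library routines such as malloc and free. Shared libraries: near the middle of the address space is an area that holds the code and data for shared libraries such as the C standard library and the math library. Stack: 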
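At the top of the user’s virtual address space is the user stack, which the compiler uses to implement function calls. The user stack expands and contracts dynamically during program execution: each time a function is called, the stack grows; each time a function returns, it shrinks. Kernel virtual memory: the region at the top of the address space is reserved for the kernel. Application programs are not allowed to read or write the contents of this area or to directly call functions defined in the kernel code; they must invoke the kernel to perform these operations. Files: a file is just a sequence of bytes. Every I/O device can be viewed as a file, which gives applications a uniform view of all the varied I/O devices a system may contain. Systems communicate with other systems using networks: modern systems are linked to other systems by networks, and a network can be viewed as just another I/O device. Important themes: a system is a collection of intertwined hardware and software. Amdahl’s law: when we speed up one part of a system, the effect on overall system performance depends on both how significant this part was and how much it sped up. Suppose executing some application requires time $T_{old}$, some part of the system takes a fraction α of this time, and we improve that part’s performance by a factor of k. That part originally required time $αT_{old}$ and now requires $(αT_{old})/k$. The overall execution time is $T_{new} = (1 - α)T_{old} + (αT_{old})/k = T_{old}[(1 - α) + α/k]$, so the speedup $S = T_{old} / T_{new}$ is $S=\\frac{1}{(1-α)+α/k}$. To significantly speed up the entire system, we must improve the speed of a very large fraction of it: as k tends to ∞, $S=\\frac{1}{1-α}$. Concurrency and parallelism. Concurrency: a system with multiple, simultaneous activities. Parallelism: the use of concurrency to make a system run faster. 1. Thread-level concurrency: a system with multiple programs running at the same time. A uniprocessor system switches among multiple tasks; a system with multiple processors under the control of a single operating system is a multiprocessor system. Multi-core processors integrate multiple CPUs (cores) onto a single integrated-circuit chip. Hyperthreading. 2. Instruction-level parallelism: the ability of modern processors to execute multiple instructions at once. A processor that can sustain execution rates faster than one instruction per cycle is called a superscalar processor. 3. Single-instruction, multiple-data (SIMD) parallelism: a single instruction causes multiple operations to be performed in parallel. The importance of abstractions in computer systems: they reduce complexity. A virtual machine is an abstraction of the entire computer. Summary","link":"/blog/2022/02/16/4_csapp_1/"},{"title":"My thoughts on this blog","text":"It has been quite a long time since I last wrote on this blog. The heavy schoolwork left me little room to breathe, so some series (like CSAPP and ITA) were suspended. I may not pick them up again, because they are not my focus right now. After nearly two years, I have learned the basic algorithms and data structures, and computer systems were also covered in my courses. This semester, I started doing research. Unlike daily study, I think it is better to take notes after reading every paper. I also think it is worth recording my process of getting into research, which may be helpful to people who come after me. I started research almost entirely on my own, so my experience may be instructive to those in the same position. Anyway, this blog is mainly a record. I don’t expect many people to come and learn a lot here (it is mainly for myself). 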
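(Please forgive my poor English writing skills.)","link":"/blog/2023/10/24/8_thoughts/"},{"title":"x86 Assembly Study Notes","text":"cbw: extends AL into AX with sign extension (negative numbers are taken into account). cwd: extends AX into DX:AX with sign extension. cdq: extends EAX into EDX:EAX with sign extension. These are typically used to widen the dividend before a division. movsx: sign extension. movzx: zero extension.
movsx ax, al ; sign-extend al into ax
movzx ax, al ; zero-extend al into ax
rol: rotate left. Useful, for example, when printing a 16-bit integer in hexadecimal:
mov ah, 9Ah
rol ah, 1 ; 9A = 1001 1010 -> 0011 0101
32-bit registers can be used (in a 16-bit program) as follows:
.386
data segment use16
data ends
code segment use16
code ends
A segment's starting address (a 5-digit hex number) must end in 0.
0000:0000~9000:FFFF: memory used by DOS and user code, 640 KB in total.
A000:0000~F000:FFFF: reserved for the video card and ROM.
The video card has a text mode and a graphics mode; text mode is 80*25.
With assume ds:data, the assembler replaces data with ds:; with assume es:data, it replaces data with es: instead. When the same segment is associated with several registers, the priority is ds > ss > es > cs; for example, with assume ds:data, es:data, data is replaced with ds:.
cs:ip and ss:sp are initialized by the operating system. Of the four segment registers, only cs cannot be changed with mov. ds and es are also initialized by the operating system: ds = es = (segment address of the first segment) - 10h. (first segment address - 10h):0000 points to a 100h-byte memory block called the PSP (program segment prefix); the operating system allocates it automatically for the running program, and it stores information such as the command-line arguments.
Graphics-mode programming
Use int 10h to switch the video card into graphics mode. For example, to switch to the 320*200, 256-color graphics mode:
mov ah, 0h
mov al, 13h
int 10h
Drawing a pixel, for example:
mov ax, 0a000h
mov es, ax
mov bx, 0
mov byte ptr es:[bx], 4
Each pixel takes exactly one byte; pixel (x, y) is at offset y*320+x in segment A000h.
mov ah, 0
int 16h ; reads a key, including the arrow keys, without echoing it
mov ah, 1
int 16h ; checks whether the keyboard buffer holds a key press: zf=0 if so, zf=1 if not
jz nokey
mov ah, 0
int 16h
…
nokey: ; keep refreshing the game screen
Defining a stack segment:
stk segment stack
db 200h dup(0)
stk ends
and add ss:stk to the assume directive. If the source code does not define a stack segment, one is generated automatically, with ss = the segment address of the first segment and sp = 0.
The FLAGS register FL has 16 bits, of which only 9 are used: 6 status flags and 3 control flags.
bit:  15 14 13 12 11 10 9  8  7  6  5  4  3  2  1  0
flag: X  X  X  X  OF DF IF TF SF ZF X  AF X  PF X  CF
mov does not affect the flags.
CF: carry flag. Shift instructions also affect CF: the last bit shifted out is saved in CF. jc jumps if there is a carry; jnc jumps if there is none. adc: add with carry. clc: CF=0. stc: CF=1.
ZF: zero flag. ZF=1 when the result is 0. jz == je, jnz == jne; after cmp ax, ax, jz (or je) next jumps to next.
SF: the most significant bit of the result.
mov ah, 7fh
add ah, 1 ; AH=80h=10000000B, SF=1
sub ah, 1 ; AH=7Fh=01111111B, SF=0
js and jns.
OF: overflow flag. Under two's complement, OF=1 when adding two positive numbers gives a negative result, or adding two negative numbers gives a positive result; subtraction is analogous. jo and jno.
PF: parity flag. Count the 1 bits in the low 8 bits of the result: PF=1 if the count is even, 0 if odd. jp/jpe and jnp/jpo.
AF: auxiliary carry flag. Set when the low 4 bits carry into (or borrow from) the high 4 bits; AF is used for BCD arithmetic.
mov al, 29h
add al, 8
daa ; decimal adjust after addition: because AF=1, it performs AL=AL+6, giving AL=37h
mov al, 29h
add al, 1
daa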
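; AL=AL+6=2Ah+6=30h (the low nibble Ah is greater than 9, so 6 is added)
DF: direction flag. It controls the direction of string operations: DF=0 means forward, DF=1 means backward. cld sets DF=0; std sets DF=1. When the source start address is lower than the destination start address, copy backward; when it is higher, copy forward. The direction only matters when the source and destination blocks partially overlap.
IF: interrupt flag. IF=1 enables hardware interrupts; IF=0 disables them. cli sets IF=0; sti sets IF=1. Software interrupt: code calls a subroutine of some function set with int n. Hardware interrupt: triggered by a hardware event; the CPU automatically inserts and executes an implicit int n that calls the corresponding interrupt service routine.
TF: trace/trap flag. When TF=1, the CPU enters single-step mode: after each instruction it automatically inserts and executes an int 1h before the next instruction. That is, if TF=1 before an instruction executes, the int 1h single-step interrupt is raised after it executes. Setting TF:
pushf ; push FL
pop ax ; AX=FL
or ax, 100000000B ; i.e. or ax, 100h: set bit 8 (TF)
; and ax, not 100h ; i.e. and ax, 0FEFFh: would clear bit 8 of AX instead
push ax
popf ; TF=1
jg, jl, jge, jle are the jumps for signed comparisons:
jg (jump if greater): SF == OF and ZF == 0
jge: SF == OF
jl: SF != OF (ZF does not matter)
jle: SF != OF || (SF == OF && ZF == 1)
jcxz: jump if cx == 0
Ports: port addresses range over [0000h, 0FFFFh], 65536 ports in total. Ports are accessed with in and out; the CPU communicates with the keyboard through port 60h.
in al, 60h ; a port number <= FFh can be given directly as a constant
mov dx, 60h ; or put the port number in dx
in al, dx
out imm8, al ; imm8: a constant <= FFh
out dx, al ; or use dx
32-bit indirect addressing: 32-bit mode adds the form [register + register*n + constant], where n = 2, 4, or 8, e.g. mov eax, [ebx+esi*4]. In 16-bit mode only four registers may appear inside []: [bx] [bp] [si] [di]; in 32-bit mode there are almost no restrictions on the two registers inside [].
xchg swaps the values of two registers, or of a register and a memory location:
mov ax, 1
mov bx, 2
xchg ax, bx ; now ax=2, bx=1
xchg ax, ds:[bx]
Multiplication: mul
8-bit: the multiplicand is AL and the product is AX; mul bh means AX=AL*BH. The operand of mul must be an 8-bit register or an 8-bit variable, not a constant.
16-bit: the multiplicand is AX and the product is DX:AX; mul bx means DX:AX=AX*BX.
32-bit: the multiplicand is EAX and the product is EDX:EAX; mul ebx means EDX:EAX=EAX*EBX.
Signed multiplication is imul. Its first form is the same as mul. Only imul (not mul) also has a second form with 2 or 3 operands:
imul eax, ebx ; ① eax=eax*ebx
imul eax, ebx, 3 ; ② eax=ebx*3
In ① and ② the second operand can be a register or a variable; the third operand in ② must be a constant.
Division: div
16-bit / 8-bit gives 8-bit results: ax is the dividend, al the quotient, ah the remainder (ax / divisor = AL remainder AH).
32-bit / 16-bit gives 16-bit results: dx:ax / divisor = ax remainder dx.
64-bit / 32-bit gives 32-bit results: edx:eax / divisor = eax remainder edx.
idiv: signed division.
Address-transfer instructions: lea, lds, les
lea loads the offset address of a variable:
lea dx, ds:[bx+si+3] ; dx=bx+si+3
mov dx, bx+si+3 ; syntax error!
lea eax, [eax+eax*4] ; eax=eax*5, using lea for multiplication
The scale after * inside [] must be a power of 2 (1, 2, 4, or 8).
Far pointers. Near pointer: the offset address of a variable. Far pointer: the segment address plus the offset address of a variable. Pointers in C are near pointers (in the default small memory model of 16-bit compilers).
les di, dword ptr ds:[bx] ; loads the 32 bits at ds:[bx]: high 16 bits into es, low 16 bits into di
lds si, dword ptr ds:[bx] ; like les, but the high 16 bits go into ds and the low 16 bits into si
fword ptr refers specifically to a 48-bit variable.
Translate instruction xlat: before xlat, ds:bx must point to the table and al must hold the index; after xlat, al=ds:[bx+al].
inc and dec do not change CF. adc: add with carry. sbb: subtract with borrow.
neg ax ; ax=-ax
Floating-point arithmetic: fadd, fsub, fmul, fdiv
Floating-point data:
pi dd 3.14 ;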
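32-bit floating-point number, like float
r dq 3.14159 ; 64-bit floating-point number, like double
s dt 3.1415926 ; 80-bit floating-point number, like long double
The FPU has 8 floating-point registers, st(0) st(1) st(2) … st(7), where st(0) can be abbreviated as st. All eight are 80 bits wide. They work as a stack: after a first value such as 3.14 is loaded into st(0), loading a second value such as 2.0 does not put it into st(1); instead 3.14 is moved down into st(1) and 2.0 is loaded into st(0).
fld [a]: load the value at [a] into st.
fstp [a]: store st(0) into [a], then pop st(0) (the contents of st(1) move into st(0)).
fstp st(0) discards the contents of st(0).
A wait instruction is automatically inserted before every floating-point instruction.
fild [a]: load the integer at [a] into st as a floating-point value.
Logical instructions: AND, OR, XOR, NOT, TEST
test ax, 8000h ; computes ax & 8000h without saving the result; only the flags change
Shift instructions: shl, shr, sal, sar, rol, ror, rcl, rcr. The bit shifted out always goes into CF. sal is an arithmetic left shift, sar an arithmetic right shift, shl a logical left shift, shr a logical right shift. Arithmetic shifts (sal, sar) are for signed numbers; logical shifts (shl, shr) are for unsigned numbers. rcl: rotate left through carry. rcr: rotate right through carry.
mov ah, 0b6h
stc ; CF=1
rcl ah, 1 ; before: CF=1, ah = 1011 0110
          ; after:  CF=1, ah = 0110 1101
          ; i.e. CF and ah rotate together as 9 bits
mov ah, 0b6h
stc ; CF=1
rcr ah, 1 ; before: AH = 1011 0110, CF=1
          ; after:  AH = 1101 1011, CF=0
String copy instruction movsb. rep movsb works as follows:
again:
if (cx == 0) goto done;
byte ptr es:[di] = byte ptr ds:[si]
if (df == 0) {si ++; di ++;}
else {si --; di --;}
cx --;
goto again
done:
A single movsb does the following:
byte ptr es:[di] = byte ptr ds:[si]
if (df == 0) {si ++; di ++;}
else {si --; di --;}
movsw. rep movsw works as follows:
again:
if (cx == 0) goto done;
word ptr es:[di] = word ptr ds:[si]
if (df == 0) {si += 2; di += 2;}
else {si -= 2; di -= 2;}
cx --;
goto again
done:
A single movsw is analogous to movsb.
movsd. rep movsd works as follows:
again:
if (cx == 0) goto done;
dword ptr es:[di] = dword ptr ds:[si]
if (df == 0) {si += 4; di += 4;}
else {si -= 4; di -= 4;}
cx --;
goto again
done:
A single movsd is analogous to movsb.
32-bit systems: in 32-bit mode these instructions use ds:esi and es:edi.
String comparison instructions: cmpsb, cmpsw, cmpsd
cmpsb compares byte ptr ds:[si] with byte ptr es:[di]; then si++, di++ if df=0, or si--, di-- if df=1.
rep cmpsb: compare repeatedly.
repe cmpsb: if the current pair is equal, go on to the next one:
again:
if (cx == 0) goto done;
compare byte ptr ds:[si] with byte ptr es:[di] (sets the flags)
if (df == 0) {si ++; di ++;}
else {si --; di --;}
cx --;
if (zf == 1) goto again; (the current pair was equal)
done:
repne cmpsb: if the current pair is not equal, go on to the next one.
After repe cmpsb, zf=1 means the two strings are identical; after repne cmpsb, zf=0 means no corresponding elements are equal.
cmpsw and cmpsd work the same way.
String scan instructions: scasb, scasw, scasd
scasb:
cmp al, es:[di]
di++ ; di-- when df=1
repne scasb:
next:
if (cx == 0) goto done;
cmp al, es:[di]
di++;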
(di-- when df=1)
cx--;
je done
goto next
done:
repe scasb: keep scanning while equal.
String store/load instructions: stosb, lodsb
stosb:
es:[di] = al
di ++;
rep stosb: repeat stosb cx times. stosw stores ax; stosd stores eax.
lodsb:
al = ds:[si]
si ++;
lodsb is normally used without a rep prefix. lodsw loads ax; lodsd loads eax.
Control transfer instructions: jmp, call, int
Size specifiers: byte ptr, word ptr, dword ptr, fword ptr, qword ptr, tbyte ptr
Short jump: the machine code is 2 bytes; the first byte is EBh and the second is the target address minus the offset of the next instruction. All conditional jumps are near jumps.
call (near): pushes only ip; return with ret.
call dword ptr (far): push cs, push ip; return with retf.
int (far): pushf, push cs, push ip; return with iret.
Three ways to pass parameters in assembly
Passing in registers:
f:
add ax, ax
ret
main:
mov ax, 3
call f
Passing in variables:
f:
mov ax, var
add ax, ax
ret
main:
mov var, 3
call f
Register passing works with multithreading because the operating system saves each thread's registers when switching threads; variable passing does not support multithreading.
Passing on the stack (building a stack frame):
f:
push bp
mov bp, sp
mov ax, [bp + 4]
add ax, ax
pop bp
ret
main:
mov ax, 3
push ax
call f
back:
add sp, 2
Local (automatic) variables: translate the following C function into assembly:
int f(int a, int b) {
    int c;
    c = a + b;
    return c;
}
f:
push bp
mov bp, sp
sub sp, 2
mov ax, [bp + 4]
add ax, [bp + 6]
mov [bp - 2], ax
mov ax, [bp - 2]
mov sp, bp
pop bp
ret
main:
mov ax, 20
push ax
mov ax, 10
push ax
call f
back:
add sp, 4
A C function must preserve bp, bx, si, di (ebp, ebx, esi, edi in 32-bit code):
f:
push bp
mov bp, sp
sub sp, n ; n is a constant: the space reserved for local variables
push bx
push si
push di
...
pop di
pop si
pop bx
mov sp, bp
pop bp
ret
CPUs of 32 bits and above allow [esp+n] or [esp-n] to reference the contents of the stack.
Recursion
int f(int n) {
    if (n == 1) return 1;
    return n + f(n - 1);
}
The recursive C function above can be translated into the following assembly:
f:
push bp
mov bp, sp
mov ax, [bp + 4]
cmp ax, 1
je done
dec ax
push ax
call f
there:
add sp, 2
add ax, [bp + 4]
done:
pop bp
ret
main:
mov ax, 3
push ax
call f
here:
add sp, 2
Terminate the program but keep its memory block resident:
mov dx, <length of the memory block, in paragraphs>
mov ah, 31h
int 21h
PSP: 100h bytes placed before the code segment, allocated automatically by the operating system. When calling int 21h/ah=31h, the space occupied by the PSP must be counted in the length of the program's memory block:
dx = ((100h + length of the code segment) + 0fh) / 10h
The division by 10h is because the length is measured in paragraphs (1 paragraph = 10h bytes); adding 0fh rounds up when the code segment length is not a multiple of 10h.
final label byte ; final behaves like a db variable, but no memory is allocated
After this, offset final gives the offset address of the end of the program.
int, iret: the interrupt instruction format is int n, where n ranges over [0, 0FFh].","link":"/blog/2022/07/28/7_x86masm/"},{"title":"<Pinned> How to Read a Paper","text":"“How to read a paper?” is a good question. After reading some papers, I suddenly realized that I don’t actually know how to read a paper effectively and efficiently. So I’m collecting some resources on this topic here: “How to Read a Research Paper”, by Michael Mitzenmacher; “How to Read an Engineering Research Paper”, by William Griswold. You can refer to the links above to get some general ideas (these links were copied from Lingming Zhang’s slides for cs527-s23). According to William Griswold, try to answer the questions below: What are the motivations for this work? For a research paper, there is an expectation that a problem has been solved that no one else has published in the literature. This problem intrinsically has two parts. The first is often unstated, what I call the people problem. The people problem is the benefits that are desired in the world at large; for example some issue of quality of life, such as saved time or increased safety. The second part is the technical problem, which is: why doesn’t the people problem have a trivial solution? There is also an implication that previous solutions to the problem are inadequate. 
What are the previous solutions and why are they inadequate? Finally, the motivation and statement of the problem are distilled into a research question, the question that the paper sets out to answer. This might be more focused than the problem stated at the outset. Oftentimes, one or more of these elements are not explicitly stated, making your job more difficult. What is the proposed solution? This is also called the hypothesis or idea. This is the proposed answer to the research question. There should also be an answer to the question why is it believed that this solution will work, and be better than previous solutions? There should also be a discussion about how the solution is achieved (designed and implemented) or is at least achievable. What is the work’s evaluation of the proposed solution? An idea alone is usually not adequate for publication of a research paper. This is the concrete engagement of the research question. What argument, implementation, and/or experiment makes the case for the value of the ideas? What benefits or problems are identified? What is your analysis of the identified problem, idea and evaluation? Is this a good idea? What flaws do you perceive in the work? What are the most interesting points made? What are the most controversial ideas or points made? For work that has practical implications, you also want to ask: Is this really going to work, who would want it, what it will take to give it to them, and when might it become a reality? What are the contributions? The contributions in a paper may be many and varied. Beyond the insights on the research question, a few additional possibilities include: ideas, software, experimental techniques, or an area survey. What are future directions for this research? Not only what future directions do the authors identify, but what ideas did you come up with while reading the paper? Sometimes these may be identified as shortcomings or other critiques in the current work. 
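What questions are you left with? What questions would you like to raise in an open discussion of the work? What do you find confusing or difficult to understand? By taking the time to list several, you will be forced to think more deeply about the work. What is your take-away message from this paper? Sum up the main implication of the paper from your perspective. This is useful for very quick review and refreshing your memory. It also forces you to try to identify the essence of the work.","link":"/blog/2023/10/24/9_how_to_read_paper/"},{"title":"Introduction to Algorithms (ITA) (1): Merge Sort","text":"Time complexity: O(n log n). Merge sort closely follows the divide-and-conquer paradigm. Intuitively, it operates as follows. Divide: divide the n-element sequence to be sorted into two subsequences of n/2 elements each. Conquer: sort the two subsequences recursively using merge sort. Combine: merge the two sorted subsequences to produce the sorted answer. When a subsequence has only one element, return directly. The code is as follows:
```c
#include <stdio.h>
#include <stdlib.h>

// a: array to sort; tmp: temporary buffer of the same size
void merge_sort(int *a, int l, int r, int *tmp) {
    if (l >= r) return;  // zero or one element: already sorted
    int mid = (l + r) >> 1;
    // recursively sort the left and right halves
    merge_sort(a, l, mid, tmp);
    merge_sort(a, mid + 1, r, tmp);
    // merge the two sorted halves into tmp
    int i = l, j = mid + 1, k = l;
    while (i <= mid || j <= r) {
        // one half is exhausted: take from the other
        if (i > mid) { tmp[k ++] = a[j]; j ++; continue; }
        if (j > r) { tmp[k ++] = a[i]; i ++; continue; }
        // both halves are non-empty: take the smaller head (<= keeps the sort stable)
        if (a[i] <= a[j]) { tmp[k ++] = a[i]; i ++; }
        else { tmp[k ++] = a[j]; j ++; }
    }
    // copy the merged data from tmp back into a
    for (-- k; k >= l; -- k) a[k] = tmp[k];
}

int main() {
    int i, n;
    if (scanf("%d", &n) != 1 || n <= 0) return 0;
    int *a = malloc(sizeof(int) * n);
    int *tmp = malloc(sizeof(int) * n);
    if (a == NULL || tmp == NULL) { free(a); free(tmp); return 0; }
    for (i = 0; i < n; ++ i) scanf("%d", &a[i]);
    merge_sort(a, 0, n - 1, tmp);
    for (i = 0; i < n; ++ i) printf("%d ", a[i]);
    free(a);
    free(tmp);
    return 0;
}
```
","link":"/blog/2022/02/27/6_ita_1/"},{"title":"Computer Systems: A Programmer’s Perspective (CSAPP), Part 2","text":"Representing and Manipulating Information. Unsigned encodings represent numbers greater than or equal to 0. Two’s-complement encodings represent signed integers. Floating-point encodings are a base-2 version of scientific notation for representing real numbers. Overflow: computers represent results with a finite number of bits, so a result that is too large to represent overflows, e.g. with 32-bit int, $200 \\times 300 \\times 400 \\times 500 = -884901888$. Even so, computer integer arithmetic satisfies many properties of true integer arithmetic; for example, multiplication is associative and commutative. Floating-point arithmetic: overflow produces the special value +∞, and the product of a set of positive numbers is always positive, but addition is not associative: 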
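$3.14 + (10^{20} - 10^{20}) = 3.14$, but $(3.14 + 10^{20}) - 10^{20} = 0$. Integer encodings can represent only a relatively small range of values, but exactly; floating-point encodings can represent a much larger range, but only approximately. Information storage. Hexadecimal notation: values are written in hexadecimal (hex), using the characters ‘0’ $\\sim$ ‘9’ and ‘A’ $\\sim$ ‘F’ to represent 16 values. In C, hex constants start with 0x or 0X. Get comfortable converting between hexadecimal, decimal, and binary. Data sizes: every computer has a word size, which specifies the nominal size of pointer data. Virtual addresses are encoded with such a word $\\rightarrow$ the most important system parameter determined by the word size is the maximum size of the virtual address space. For a machine with a w-bit word size, virtual addresses range from $0$ to $2^{w}-1$, so a program can access at most $2^{w}$ bytes. Most 64-bit machines can also run programs compiled for 32-bit machines, a form of backward compatibility. If prog.c is compiled with linux> gcc -m32 prog.c, the program runs correctly on either a 32-bit or a 64-bit machine; compiled with linux> gcc -m64 prog.c, it runs only on a 64-bit machine. So when we call a program a “32-bit program” or a “64-bit program”, the distinction lies in how it was compiled, not in the type of machine it runs on. Computers and compilers support multiple data formats that encode numbers in different ways. The C standard sets lower bounds on the numeric ranges of the data types, but no upper bounds. Addressing and byte ordering: most Intel-compatible machines operate exclusively in little-endian mode. Bi-endian machines can be configured as either big- or little-endian, but once a particular operating system is chosen, the byte ordering is fixed. When writing out a byte sequence, we put the lowest-addressed byte on the left and the highest-addressed byte on the right. In C, casts or unions allow an object to be referenced with a data type different from the one it was created with. If the same number is represented as an int and as a float, and the hex representations are expanded to binary and shifted appropriately, we find a sequence of 13 matching bits. The figure below shows this for 12345 represented as an int and as a float. This is no coincidence; we will study why later. Representing strings: a string in C is encoded as an array of characters terminated by the null character. Text data is more platform-independent than binary data. Representing code: when int sum(int x, int y) { return x + y; } is compiled on different machines, the generated machine code is shown below, and the instruction encodings differ. Different machine types use different and incompatible instructions and encodings, so their binary code is not compatible. Introduction to Boolean algebra. Bit-level operations in C: C supports bitwise Boolean operations, and bit operations can implement masking. Logical operations in C: the logical operators are ||, &&, and !. If the value of an expression can be determined by evaluating the first argument, the logical operators do not evaluate the second one: a && 5 / a never causes a division by zero, and p && *p++ never dereferences a null pointer. Do not confuse logical operations with bitwise operations. Shift operations in C: the left shift x << k fills the right end with k zeros. For the right shift x >> k, machines support two forms: a logical right shift fills the left end with k zeros, while an arithmetic right shift fills it with k copies of the most significant bit. Almost all C compilers use arithmetic right shifts for signed numbers and logical right shifts for unsigned numbers. Watch out for the precedence of shift operators!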
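The shift behavior described above can be seen in a small C++ sketch. The notes are about C, but these cases behave the same way in C++ on mainstream compilers, and the function names are made up for illustration. Take the 8-bit pattern 1001 0110 and shift it right by 4. Treated as signed (arithmetic shift), it becomes 1111 1001; treated as unsigned (logical shift), it becomes 0000 1001. The last two functions show the precedence trap: + binds tighter than <<, so 1 << 2 + 3 is 1 << 5 = 32, not 7.

```cpp
#include <cstdint>

// Same 8-bit pattern 1001 0110 (0x96) shifted right by k.
// Signed operand: arithmetic shift, copies of the sign bit fill in from the left
// (guaranteed since C++20; implementation-defined before, arithmetic on mainstream compilers).
std::int8_t arithmeticShift(std::int8_t x, int k) {
    return static_cast<std::int8_t>(x >> k);
}

// Unsigned operand: logical shift, zeros fill in from the left.
std::uint8_t logicalShift(std::uint8_t x, int k) {
    return static_cast<std::uint8_t>(x >> k);
}

// Precedence trap: + binds tighter than <<, so this is 1 << (2 + 3) == 32.
int withoutParens() { return 1 << 2 + 3; }

// Parenthesized as intended: (1 << 2) + 3 == 7.
int withParens() { return (1 << 2) + 3; }
```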
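Integer representations. Related mathematical terminology. Integral data types: the ranges are asymmetric, since the negative range extends one further than the positive range. The C standard defines the minimum range each data type must be able to represent. Unsigned encodings: for a bit vector $\\vec{x}=[x_{w-1},x_{w-2},…,x_{0}]$, $B2U_{w}(\\vec{x})=\\sum_{i=0}^{w-1}x_i2^i$. For example, $B2U_4([0101])=0\\cdot2^3+1\\cdot2^2+0\\cdot2^1+1\\cdot2^0=0+4+0+1=5$. The range is $0 \\sim 2^w-1$, and the encoding is unique. Two’s-complement encoding: the most significant bit is interpreted as having negative weight. For a bit vector $\\vec{x}=[x_{w-1},x_{w-2},…,x_{0}]$, $B2T_w(\\vec{x})=-x_{w-1}2^{w-1}+\\sum_{i=0}^{w-2}x_i2^i$. The most significant bit $x_{w-1}$ is also called the sign bit: when it is 1 the value is negative, and when it is 0 the value is nonnegative. For example, $B2T_4([0101])=-0\\cdot2^3+1\\cdot2^2+0\\cdot2^1+1\\cdot2^0=0+4+0+1=5$ and $B2T_4([1011])=-1\\cdot2^3+0\\cdot2^2+1\\cdot2^1+1\\cdot2^0=-8+0+2+1=-5$. The minimum value is represented by [10…0], with value $-2^{w-1}$; the maximum by [01…1], with value $2^{w-1}-1$. The range is $-2^{w-1} \\sim 2^{w-1}-1$, and the encoding is unique. Notes: the two’s-complement range is asymmetric, |TMin| = |TMax| + 1; the maximum unsigned value is exactly one more than twice the maximum two’s-complement value, $UMax_w=2TMax_w+1$. Almost all modern machines use two’s complement. Other representations of signed numbers. Ones’ complement: the same as two’s complement, except that the most significant bit has weight $-(2^{w-1}-1)$ rather than $-2^{w-1}$: $B2O_w(\\vec{x})=-x_{w-1}(2^{w-1}-1)+\\sum_{i=0}^{w-2}x_i2^i$. Sign-magnitude: the most significant bit is a sign bit that determines whether the remaining bits get negative or positive weight: $B2S_w(\\vec{x})=(-1)^{x_{w-1}}\\cdot(\\sum_{i=0}^{w-2}x_i2^i)$. Both have an odd property: [00…0] is interpreted as +0, while -0 is [11…1] in ones’ complement and [10…0] in sign-magnitude. Conversions between signed and unsigned: C allows casts between different numeric data types. A cast between signed and unsigned types of the same size keeps the bit values unchanged and only changes how the bits are interpreted. Signed vs. unsigned in C: to create an unsigned constant, add the suffix ‘U’ or ‘u’, e.g. 12345U or 0x5A11u. When an operation mixes signed and unsigned operands, C implicitly casts the signed operand to unsigned, which can lead to nonintuitive results (for example, -1 < 0U is false). Expanding the bit representation of a number. Zero extension of unsigned numbers: define a bit vector $\\vec{u}=[u_{w-1},u_{w-2},…,u_0]$ of width $w$ and $\\vec{u}'=[0,…,0,u_{w-1},u_{w-2},…,u_0]$ of width $w'$, where $w'>w$. Then $B2U_w(\\vec{u})=B2U_{w'}(\\vec{u}')$. Sign extension of two’s-complement numbers: define $\\vec{x}=[x_{w-1},x_{w-2},…,x_0]$ of width $w$ and $\\vec{x}'=[x_{w-1},…,x_{w-1},x_{w-1},x_{w-2},…,x_0]$ of width $w'$, where $w'>w$. Then $B2T_w(\\vec{x})=B2T_{w'}(\\vec{x}')$. When a short is converted to unsigned, the size is changed first, and then the signed-to-unsigned conversion is done. Truncating numbers: truncation occurs when the number of bits representing a number is reduced; the high-order bits are dropped. Integer arithmetic. Unsigned addition: if the sum does not overflow, the result is the ordinary sum; if it overflows, the high-order bit is truncated. For x and y satisfying $0≤x,y<2^w$: $x+^{u}_{w}y=\\left\\{\\begin{matrix}x+y, & x+y<2^w\\\\x+y-2^w, & 2^w\\le x+y<2^{w+1}\\end{matrix}\\right.$","link":"/blog/2022/02/19/5_csapp_2/"},{"title":"[Review] Detecting Missed Security Operations Through Differential Checking of Object-based Similar Paths","text":"Link Problem: missing security operations, such as a bounds check. Traditional method: cross-checking. 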
It locates potential bugs by collecting a large number of similar code snippets and comparing their patterns. The paper proposes a new approach to locating bugs that does not need a large number of cases: only two code snippets are required. Specifically, it constructs object-based similar-path pairs. Background Large-scale programs usually enforce various kinds of security operations (e.g., security checks, locks, and reference counting) to ensure safety. Missing security operations are the cause of 61% of the vulnerabilities in the National Vulnerability Database (NVD). Cross-checking: Collects a substantial number of functionally or semantically similar code pieces. Checks the behaviors of security operations across these code pieces. If the majority of the code pieces enforce a security operation, it assumes the majority is correct and reports the minority cases that miss the security operation as bugs. Problems: Many code pieces are unique, so there may not be enough similar cases to enable cross-checking. The granularity of code slicing is hard to control. The assumption that the majority is correct does not always hold. Implementation Designs IPPO (Inconsistent Path Pairs as a bug Oracle), which requires only one pair of similar code paths to determine whether a path misses a security operation. Constructs object-based similar-path pairs (OSPPs). Four rules for extracting them: The two paths start at the same block and end at the same block in the CFG. The object has the same state on both paths. The two paths have the same SO-influential operations (SO: security operation). The two paths have the same sets of pre- and post-conditions on the object.","link":"/blog/2024/07/04/41_paper_review_29/"},{"title":"[Review] Assisting Static Analysis with Large Language Models: A ChatGPT Experiment","text":"Link The paper demonstrates the effectiveness of LLMs in static analysis. 
The most important lessons of this paper are the task division and the workflow design. First, we need to figure out what the LLM is good at and assign such tasks to it. Moreover, we need to pay attention to the design of the workflow, which can significantly affect the final result. Background: Traditional static analysis tools have some shortcomings, and embedding an LLM into the toolchain can assist the analysis. The paper uses Use-Before-Initialization (UBI) bugs as the example. UBITect, a tool for detecting UBI bugs, has limitations and may discard some cases; an LLM can help determine whether these cases are true bugs. Implementation: Chain-of-Thought: add “think step by step” to the prompt (not obviously used in the paper). Task Decomposition: break the problem down into multiple steps and instruct the LLM to complete the smaller ones. When a structured output is needed, always start a new request at the end of the conversation and separately prompt the LLM to summarize its conclusion in JSON format. Progressive Prompt: iterative prompting, e.g.
“If you experience uncertainty due to insufficient function definitions, please indicate the required functions.”","link":"/blog/2024/07/12/42_paper_review_30/"}],"tags":[{"name":"计算机系统","slug":"计算机系统","link":"/blog/tags/%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%B3%BB%E7%BB%9F/"},{"name":"自学笔记","slug":"自学笔记","link":"/blog/tags/%E8%87%AA%E5%AD%A6%E7%AC%94%E8%AE%B0/"},{"name":"OSDI","slug":"OSDI","link":"/blog/tags/OSDI/"},{"name":"2020","slug":"2020","link":"/blog/tags/2020/"},{"name":"OSDI 2020","slug":"OSDI-2020","link":"/blog/tags/OSDI-2020/"},{"name":"Paper Review","slug":"Paper-Review","link":"/blog/tags/Paper-Review/"},{"name":"USENIX Security","slug":"USENIX-Security","link":"/blog/tags/USENIX-Security/"},{"name":"2023","slug":"2023","link":"/blog/tags/2023/"},{"name":"USENIX Security 2023","slug":"USENIX-Security-2023","link":"/blog/tags/USENIX-Security-2023/"},{"name":"IEEE S&P","slug":"IEEE-S-P","link":"/blog/tags/IEEE-S-P/"},{"name":"IEEE S&P 2023","slug":"IEEE-S-P-2023","link":"/blog/tags/IEEE-S-P-2023/"},{"name":"CCS","slug":"CCS","link":"/blog/tags/CCS/"},{"name":"CCS 2020","slug":"CCS-2020","link":"/blog/tags/CCS-2020/"},{"name":"ISSTA","slug":"ISSTA","link":"/blog/tags/ISSTA/"},{"name":"2022","slug":"2022","link":"/blog/tags/2022/"},{"name":"ISSTA 2022","slug":"ISSTA-2022","link":"/blog/tags/ISSTA-2022/"},{"name":"ICSE","slug":"ICSE","link":"/blog/tags/ICSE/"},{"name":"2007","slug":"2007","link":"/blog/tags/2007/"},{"name":"ICSE 2007","slug":"ICSE-2007","link":"/blog/tags/ICSE-2007/"},{"name":"TSE","slug":"TSE","link":"/blog/tags/TSE/"},{"name":"2013","slug":"2013","link":"/blog/tags/2013/"},{"name":"TSE 2013","slug":"TSE-2013","link":"/blog/tags/TSE-2013/"},{"name":"2024","slug":"2024","link":"/blog/tags/2024/"},{"name":"IEEE S&P 2024","slug":"IEEE-S-P-2024","link":"/blog/tags/IEEE-S-P-2024/"},{"name":"Jottings","slug":"Jottings","link":"/blog/tags/Jottings/"},{"name":"2012","slug":"2012","link":"/blog/tags/2012/"},{"name":"ICSE 
2012","slug":"ICSE-2012","link":"/blog/tags/ICSE-2012/"},{"name":"ISSTA 2023","slug":"ISSTA-2023","link":"/blog/tags/ISSTA-2023/"},{"name":"CHI","slug":"CHI","link":"/blog/tags/CHI/"},{"name":"2021","slug":"2021","link":"/blog/tags/2021/"},{"name":"CHI 2021","slug":"CHI-2021","link":"/blog/tags/CHI-2021/"},{"name":"ICSE 2023","slug":"ICSE-2023","link":"/blog/tags/ICSE-2023/"},{"name":"ICSE 2024","slug":"ICSE-2024","link":"/blog/tags/ICSE-2024/"},{"name":"ASE","slug":"ASE","link":"/blog/tags/ASE/"},{"name":"ASE 2024","slug":"ASE-2024","link":"/blog/tags/ASE-2024/"},{"name":"CCS 2023","slug":"CCS-2023","link":"/blog/tags/CCS-2023/"},{"name":"TSE 2022","slug":"TSE-2022","link":"/blog/tags/TSE-2022/"},{"name":"2019","slug":"2019","link":"/blog/tags/2019/"},{"name":"CCS 2019","slug":"CCS-2019","link":"/blog/tags/CCS-2019/"},{"name":"ESEM","slug":"ESEM","link":"/blog/tags/ESEM/"},{"name":"ESEM 2021","slug":"ESEM-2021","link":"/blog/tags/ESEM-2021/"},{"name":"LeetCode","slug":"LeetCode","link":"/blog/tags/LeetCode/"},{"name":"Binary Search","slug":"Binary-Search","link":"/blog/tags/Binary-Search/"},{"name":"Dynamic Programming","slug":"Dynamic-Programming","link":"/blog/tags/Dynamic-Programming/"},{"name":"NDSS","slug":"NDSS","link":"/blog/tags/NDSS/"},{"name":"NDSS 2024","slug":"NDSS-2024","link":"/blog/tags/NDSS-2024/"},{"name":"2016","slug":"2016","link":"/blog/tags/2016/"},{"name":"ASE 2016","slug":"ASE-2016","link":"/blog/tags/ASE-2016/"},{"name":"SOSP","slug":"SOSP","link":"/blog/tags/SOSP/"},{"name":"SOSP 2023","slug":"SOSP-2023","link":"/blog/tags/SOSP-2023/"},{"name":"SOSP 2021","slug":"SOSP-2021","link":"/blog/tags/SOSP-2021/"},{"name":"2018","slug":"2018","link":"/blog/tags/2018/"},{"name":"USENIX Security 2018","slug":"USENIX-Security-2018","link":"/blog/tags/USENIX-Security-2018/"},{"name":"算法","slug":"算法","link":"/blog/tags/%E7%AE%97%E6%B3%95/"},{"name":"汇编","slug":"汇编","link":"/blog/tags/%E6%B1%87%E7%BC%96/"},{"name":"CCS 
2021","slug":"CCS-2021","link":"/blog/tags/CCS-2021/"},{"name":"ESEC/FSE","slug":"ESEC-FSE","link":"/blog/tags/ESEC-FSE/"},{"name":"ESEC/FSE 2023","slug":"ESEC-FSE-2023","link":"/blog/tags/ESEC-FSE-2023/"}],"categories":[{"name":"《深入理解计算机系统》CSAPP","slug":"《深入理解计算机系统》CSAPP","link":"/blog/categories/%E3%80%8A%E6%B7%B1%E5%85%A5%E7%90%86%E8%A7%A3%E8%AE%A1%E7%AE%97%E6%9C%BA%E7%B3%BB%E7%BB%9F%E3%80%8BCSAPP/"},{"name":"Research","slug":"Research","link":"/blog/categories/Research/"},{"name":"Notes","slug":"Notes","link":"/blog/categories/Notes/"},{"name":"Jottings","slug":"Jottings","link":"/blog/categories/Jottings/"},{"name":"LeetCode","slug":"LeetCode","link":"/blog/categories/LeetCode/"},{"name":"《算法导论》ITA","slug":"《算法导论》ITA","link":"/blog/categories/%E3%80%8A%E7%AE%97%E6%B3%95%E5%AF%BC%E8%AE%BA%E3%80%8BITA/"},{"name":"汇编","slug":"汇编","link":"/blog/categories/%E6%B1%87%E7%BC%96/"}]}