Merged
Original file line number Diff line number Diff line change
@@ -70,7 +70,7 @@ Step 6: Add a new action method named `ExtractData` in HomeController.cs and inc
// Open the input PDF file as a stream.
using (FileStream stream = new FileStream(Path.GetFullPath("Input.pdf"), FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Extract form data as JSON.
string data = extractor.ExtractDataAsJson(stream);
@@ -87,10 +87,12 @@ using (FileStream stream = new FileStream(Path.GetFullPath("Input.pdf"), FileMod
{% endhighlight %}

Step 7: Build the project.
Click on Build > Build Solution or press Ctrl + Shift + B to build the project.

Click on **Build** → **Build Solution** or press <kbd>Ctrl</kbd>+<kbd>Shift</kbd>+<kbd>B</kbd> to build the project.

Step 8: Run the project.
Click the Start button (green arrow) or press F5 to run the app.

Click the Start button (green arrow) or press <kbd>F5</kbd> to run the application.

{% endtabcontent %}

@@ -153,7 +155,7 @@ Step 7: Add a new action method named `ExportToJson` in HomeController.cs and in
// Open the input PDF file as a stream.
using (FileStream stream = new FileStream(Path.GetFullPath("Input.pdf"), FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Extract form data as JSON.
string data = extractor.ExtractDataAsJson(stream);
Original file line number Diff line number Diff line change
@@ -51,7 +51,7 @@ Step 5: Add a new button in the Index.cshtml as shown below.

{% endhighlight %}

Step 6: Add a new action method named ExtractData in `HomeController.cs` and include the following code example to extract data from a PDF document using the [ExtractDataAsJson](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsJson_System_IO_Stream_) method in the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class.
Step 6: Add a new action method named `ExtractData` in HomeController.cs and include the following code example to extract data from a PDF document using the [ExtractDataAsJson](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsJson_System_IO_Stream_) method in the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class.

{% highlight c# tabtitle="C#" %}

@@ -61,7 +61,7 @@ string inputPath = Server.MapPath("~/App_Data/Input.pdf");
// Open the input PDF file as a stream.
using (FileStream stream = new FileStream(inputPath, FileMode.Open, FileAccess.ReadWrite))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Extract form data as JSON.
string data = extractor.ExtractDataAsJson(stream);
@@ -75,10 +75,10 @@ using (FileStream stream = new FileStream(inputPath, FileMode.Open, FileAccess.R

{% endhighlight %}

A complete working sample can be downloaded from [GitHub](https://github.com/SyncfusionExamples/PDF-Examples/tree/master/Data-Extraction/Getting-Started/ASP.NETMVC/Extract_Data).

By executing the program, you will get the JSON file as follows.
![ASP.NET MVC output JSON document](GettingStarted_images/JSON_Output.png)

A complete working sample can be downloaded from [GitHub](https://github.com/SyncfusionExamples/PDF-Examples/tree/master/Data-Extraction/Getting-Started/ASP.NETMVC/Extract_Data).

Click [here](https://www.syncfusion.com/document-sdk/net-pdf-data-extraction) to explore the rich set of Syncfusion<sup>&reg;</sup> Data Extraction library features.
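
The controller sample above ends with the extracted content in a `string` returned by `ExtractDataAsJson`. As a minimal sketch of one way to inspect that string before saving it, the following helper re-indents it with `System.Text.Json`. The class name `ExtractedJsonInspector`, the method `PrettyPrint`, and the sample JSON literal are hypothetical and are not part of the Syncfusion API. On .NET Framework projects such as ASP.NET MVC, this assumes the `System.Text.Json` NuGet package is installed.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper; not part of the Syncfusion library.
public static class ExtractedJsonInspector
{
    // Re-indents a JSON string. JsonDocument.Parse throws a JsonException
    // when the input is not valid JSON, so this also acts as a sanity check.
    public static string PrettyPrint(string json)
    {
        using (JsonDocument document = JsonDocument.Parse(json))
        {
            JsonSerializerOptions options = new JsonSerializerOptions { WriteIndented = true };
            return JsonSerializer.Serialize(document.RootElement, options);
        }
    }

    public static void Main()
    {
        // Assumption: stand-in for the string returned by extractor.ExtractDataAsJson(stream).
        // The real output schema is defined by the library and is not reproduced here.
        string data = "{\"example\":\"value\"}";
        Console.WriteLine(PrettyPrint(data));
    }
}
```

In the action method, `PrettyPrint(data)` could be called on the extracted string before it is written to the response or to disk. Whether the library already returns indented JSON is not stated in the source, so treat this step as optional.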

Original file line number Diff line number Diff line change
@@ -66,14 +66,11 @@ Include the following code snippet to add a button in your Blazor application th
{% tabs %}
{% highlight CSHTML %}
<h1>Run Extraction</h1>

<button @onclick="RunExtraction" class="btn btn-primary">
Run Extractor
</button>

<p>@message</p>


{% endhighlight %}
{% endtabs %}

@@ -85,12 +82,10 @@ Add the following code snippet to extract data from a PDF and download the file
{% highlight c# tabtitle="C#" %}
@code {
string message = "Waiting...";

async Task RunExtraction()
{
message = "Processing...";
StateHasChanged(); // force UI update immediately

message = await extractor.RunExtraction();
}
}
@@ -122,10 +117,8 @@ using (FileStream stream = new FileStream(@"wwwroot/Input.pdf", FileMode.Open, F
{
// Initialize the Smart Data Extractor
DataExtractor extractor = new DataExtractor();

// Extract data as JSON string
string data = extractor.ExtractDataAsJson(stream);

// Return the JSON string
return data;
}
12 changes: 6 additions & 6 deletions Document-Processing/Data-Extraction/NET/Extract-Data-in-MAUI.md
Original file line number Diff line number Diff line change
@@ -48,24 +48,24 @@ Step 4: Add a new button to the **MainPage.xaml** as shown below.
xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
x:Class="Extract_Data_MAUI.MainPage">

<ScrollView>
<VerticalStackLayout
<ScrollView>
<VerticalStackLayout
Padding="30,0"
Spacing="25">

<Label
<Label
Text="Smart Data Extractor Demo"
Style="{StaticResource Headline}"
SemanticProperties.HeadingLevel="Level1" />

<Button
<Button
Text="Extract Data from PDF"
SemanticProperties.Hint="Extract structured data from PDF"
Clicked="OnExtractDataClicked"
HorizontalOptions="Fill" />

</VerticalStackLayout>
</ScrollView>
</VerticalStackLayout>
</ScrollView>
</ContentPage>

{% endhighlight %}
Original file line number Diff line number Diff line change
@@ -57,7 +57,7 @@ Step 5: Add the following code in `ExtractButton_Click` to extract data from a P
// Open the input PDF file as a stream.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Extract form data as JSON.
string data = extractor.ExtractDataAsJson(stream);
@@ -67,9 +67,9 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess

{% endhighlight %}

A complete working sample can be downloaded from [GitHub](https://github.com/SyncfusionExamples/PDF-Examples/tree/master/Data-Extraction/Getting-Started/WPF/Extract_Data).

By executing the program, you will get the JSON file as follows.
![WPF output JSON document](GettingStarted_images/JSON_Output.png)

A complete working sample can be downloaded from [GitHub](https://github.com/SyncfusionExamples/PDF-Examples/tree/master/Data-Extraction/Getting-Started/WPF/Extract_Data).


Click [here](https://www.syncfusion.com/document-sdk/net-pdf-data-extraction) to explore the rich set of Syncfusion<sup>&reg;</sup> Data Extraction library features.
Original file line number Diff line number Diff line change
@@ -15,12 +15,12 @@ JavaScript Object Notation (JSON) is a lightweight data‑interchange format tha

Refer to the following links for the assemblies and NuGet packages required on different platforms to extract data as a JSON file using the Smart Data Extractor library.

* [Assemblies required for PDF to JSON Extraction](https://help.syncfusion.com/document-processing/data-extraction/smart-data-extractor/net/assemblies-required)
* [NuGet packages required for PDF to JSON Extraction](https://help.syncfusion.com/document-processing/data-extraction/smart-data-extractor/net/nuget-packages-required)
* [Assemblies required for PDF to JSON Extraction](/document-processing/data-extraction/net/Assemblies-required)
* [NuGet packages required for PDF to JSON Extraction](/document-processing/data-extraction/net/Nuget-packages-required)

## Extract Data as JSON from PDF or Image

To extract form fields across a PDF document using the **ExtractDataAsJson** method of the **DataExtractor** class, refer to the following code example:
To extract form fields across a PDF document using the [ExtractDataAsJson](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsJson_System_IO_Stream_) method of the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class, refer to the following code example:

{% tabs %}

@@ -73,7 +73,7 @@ You can download a complete working sample from [GitHub](https://github.com/Sync

## Extract Data from a Customized Page Range

To extract data from a specific range of pages in a PDF document using the ExtractDataAsJson method of the DataExtractor class, refer to the following code example:
To extract data from a specific range of pages in a PDF document using the [ExtractDataAsJson](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsJson_System_IO_Stream_) method of the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class, refer to the following code example:

{% tabs %}

Original file line number Diff line number Diff line change
@@ -15,12 +15,12 @@ Markdown is a lightweight markup language that adds formatting elements to plain

Refer to the following links for the assemblies and NuGet packages required on different platforms to extract data as a Markdown file using the .NET Word Library (DocIO).

* [PDF to Markdown Extraction assemblies](https://help.syncfusion.com/document-processing/data-extraction/smart-data-extractor/net/assemblies-required)
* [PDF to Markdown Extraction NuGet packages](https://help.syncfusion.com/document-processing/data-extraction/smart-data-extractor/net/nuget-packages-required)
* [PDF to Markdown Extraction assemblies](/document-processing/data-extraction/net/Assemblies-required)
* [PDF to Markdown Extraction NuGet packages](/document-processing/data-extraction/net/Nuget-packages-required)

## Extract Data as Markdown from PDF or Image

To extract form fields across a PDF document using the **ExtractDataAsMarkdown** method of the **DataExtractor** class, refer to the following code example:
To extract form fields across a PDF document using the [ExtractDataAsMarkdown](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsMarkdown_System_IO_Stream_) method of the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class, refer to the following code example:

{% tabs %}

@@ -70,7 +70,7 @@ You can download a complete working sample from [GitHub](https://github.com/Sync

## Extract a specific page to Markdown

The following code demonstrates how to use the **ExtractDataAsMarkdown** method of the **DataExtractor** class to extract content from a selected page in a PDF and save it as a Markdown file by specifying its page index.
The following code demonstrates how to use the [ExtractDataAsMarkdown](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsMarkdown_System_IO_Stream_) method of the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class to extract content from a selected page in a PDF and save it as a Markdown file by specifying its page index.

{% tabs %}

@@ -119,10 +119,9 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess

{% endtabs %}


## Extract a range of pages to Markdown

The following code demonstrates how to use the **ExtractDataAsMarkdown** method of the **DataExtractor** class to extract content from a range of pages in a PDF and save it as a Markdown file by specifying the page range.
The following code demonstrates how to use the [ExtractDataAsMarkdown](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html#Syncfusion_SmartDataExtractor_DataExtractor_ExtractDataAsMarkdown_System_IO_Stream_) method of the [DataExtractor](https://help.syncfusion.com/cr/document-processing/Syncfusion.SmartDataExtractor.DataExtractor.html) class to extract content from a range of pages in a PDF and save it as a Markdown file by specifying the page range.

{% tabs %}

Original file line number Diff line number Diff line change
@@ -23,4 +23,5 @@ The Syncfusion<sup>&reg;</sup> Data Extraction Add-On can be downloaded from the

4. The Syncfusion Data Extraction Add-On is provided in ZIP format. After downloading, extract the file to access assemblies and demos for PDF and image data extraction.
![License and downloads of Syncfusion SmartDataExtractor](images/start-trial-download-offline-installer.png)

N> The Syncfusion Data Extraction Add‑On is available in ZIP format for Windows, Linux, and Mac. Extract the file to access the assemblies and demos for PDF or image data extraction.
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
---
title: Working with Table Extraction | Syncfusion
title: Working with Data Extraction | Syncfusion
description: Syncfusion® Smart Data Extractor is a .NET library that extracts text, tables, forms, and images from PDF and image files with structured outputs.
platform: document-processing
control: SmartDataExtractor
@@ -185,7 +185,6 @@ using (FileStream stream = new FileStream("Input.png", FileMode.Open, FileAccess
File.WriteAllText("Output.md", data, Encoding.UTF8);
}


{% endhighlight %}

{% highlight c# tabtitle="C# [Windows-specific]" %}
@@ -392,7 +391,7 @@ using Syncfusion.SmartDataExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Disable table detection.
//By default - true
@@ -416,7 +415,7 @@ using Syncfusion.SmartDataExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Disable table detection.
//By default - true
@@ -425,7 +424,7 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess
PdfLoadedDocument pdf = extractor.ExtractDataAsJson(stream);
// Save the extracted output as a new json file.
pdf.Save("Output.json");
// Close the document to release resources.
// Close the document.
pdf.Close(true);
}

@@ -452,10 +451,8 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess
{
//Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();

//Enable form detection in the document to identify form fields.
extractor.EnableFormDetection = true;

//Configure form recognition options for advanced detection.
FormRecognizeOptions formOptions = new FormRecognizeOptions();
//Recognize forms across pages 1 to 5 in the document.
@@ -472,10 +469,8 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess
formOptions.DetectRadioButtons = true;
//Assign the configured form recognition options to the extractor.
extractor.FormRecognizeOptions = formOptions;

//Extract form data and return as a loaded json file.
PdfLoadedDocument pdf = extractor.ExtractDataAsJson(stream);

//Save the extracted output as a new json file.
pdf.Save("Output.json");
//Close the document.
@@ -495,10 +490,8 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess
{
//Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();

//Enable form detection in the document to identify form fields.
extractor.EnableFormDetection = true;

//Configure form recognition options for advanced detection.
FormRecognizeOptions formOptions = new FormRecognizeOptions();
//Recognize forms across pages 1 to 5 in the document.
@@ -515,10 +508,8 @@ using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess
formOptions.DetectRadioButtons = true;
//Assign the configured form recognition options to the extractor.
extractor.FormRecognizeOptions = formOptions;

//Extract form data and return as a loaded json document.
PdfLoadedDocument pdf = extractor.ExtractDataAsJson(stream);

//Save the extracted output as a new json file.
pdf.Save("Output.json");
//Close the document.
@@ -546,12 +537,10 @@ using Syncfusion.SmartTableExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();

// Enable table detection and set confidence threshold.
extractor.EnableTableDetection = true;

// Configure table extraction options.
TableExtractionOptions tableOptions = new TableExtractionOptions();
// Extract tables across pages 1 to 5.
@@ -581,7 +570,7 @@ using Syncfusion.SmartTableExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Enable table detection and set confidence threshold.
extractor.EnableTableDetection = true;
@@ -623,7 +612,7 @@ using Syncfusion.SmartDataExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Apply confidence threshold to extract the data.
// Only elements with confidence >= 0.75 will be included in the results.
@@ -647,7 +636,7 @@ using Syncfusion.SmartDataExtractor;
// Load the input PDF file.
using (FileStream stream = new FileStream("Input.pdf", FileMode.Open, FileAccess.Read))
{
// Initialize the Smart Data Extractor.
// Initialize the Data Extractor.
DataExtractor extractor = new DataExtractor();
// Apply confidence threshold to extract the data.
// Only elements with confidence >= 0.75 will be included in the results.