PdfInvalidFormatException occurs for some PDF files when using PDFtoImage 5.2.0 on Ubuntu 22.04 #175

@vsolominov

PDFtoImage version

5.2.0

OS

Linux

OS version

Ubuntu 22.04

Architecture

x64

Framework

.NET (Core)

App framework

Console App

Detailed bug report

Problem

I am using the PDFtoImage NuGet package version 5.2.0 to extract images from PDF files on Ubuntu 22.04. Some PDFs are processed correctly, but on certain PDFs I get the following exception:

PDFtoImage.Exceptions.PdfInvalidFormatException: File not in PDF format or corrupted.

Interestingly, if I limit the application to process only one or two PDFs per run, all files are processed correctly. The issue consistently occurs when processing the third file in the same run.

Code to Reproduce

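using System.IO;
using System.Threading.Tasks;
using PDFtoImage;

internal class Program
{
    private static async Task Main(string[] args)
    {
        // Convert the first page of every PDF in the working directory.
        foreach (var file in Directory.EnumerateFiles(Directory.GetCurrentDirectory(), "*.pdf"))
        {
            await using var input = new FileStream(file, FileMode.Open, FileAccess.Read);
            // leaveOpen: false lets the library close the stream once it is done with it.
            using var bitmap = Conversion.ToImage(input, page: 0, leaveOpen: false);
        }
    }
}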
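
To help narrow down whether the failure follows a particular file or the position in the run, a diagnostic variant of the loop above might look like the sketch below. It uses the same Conversion.ToImage call as the repro. The sorted enumeration, the per-file try/catch, and the log format are additions made only for illustration and are not part of the original repro.

```csharp
using System;
using System.IO;
using System.Linq;
using PDFtoImage;
using PDFtoImage.Exceptions;

internal class Program
{
    private static void Main(string[] args)
    {
        // Sort the paths so the processing order is the same on every run
        // (Directory.EnumerateFiles does not guarantee any order).
        var files = Directory
            .EnumerateFiles(Directory.GetCurrentDirectory(), "*.pdf")
            .OrderBy(path => path, StringComparer.Ordinal)
            .ToList();

        for (var i = 0; i < files.Count; i++)
        {
            var name = Path.GetFileName(files[i]);
            // Read the size up front: the stream is closed by the library afterwards.
            var size = new FileInfo(files[i]).Length;

            try
            {
                using var input = new FileStream(files[i], FileMode.Open, FileAccess.Read);
                using var bitmap = Conversion.ToImage(input, page: 0, leaveOpen: false);
                Console.WriteLine($"[{i + 1}] OK     {name} ({size} bytes)");
            }
            catch (PdfInvalidFormatException ex)
            {
                // Keep going so the log shows every position that fails, not only the first.
                Console.WriteLine($"[{i + 1}] FAILED {name} ({size} bytes): {ex.Message}");
            }
        }
    }
}
```

If the failure always lands on the same position regardless of which file sits there (for example after renaming the files to change the sort order), that would point at state carried over between conversions rather than at the contents of a specific PDF.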

Environment

OS: Ubuntu 22.04
PDFtoImage: 5.2.0
.NET version: 10

Additional

On Windows, this issue does not occur.
Tried manually forcing garbage collection, but it did not help.
Tried other methods from the Conversion class, but the issue persists.


test_1.pdf
test_2.pdf
test_3.pdf
test_4.pdf
test_5.pdf
test_6.pdf

It seems like there might be a resource leak or an issue when multiple PDFs are processed sequentially.

Labels: bug