The Way to Programming
Scientific article PDFs share a common structure, e.g. paper title, abstract, introduction, proposed scheme, experiments, conclusion, references, and so on. Each of these parts has its own sections and subsections.
How can I take a PDF document as input and extract its physical structure along with the data?
For example, if the PDF document looks like this:
Article Title
Abstract
  sentences
Introduction
  section-1
    subsection-1
    subsection-2
Proposed scheme
  section-1
    subsection-1
and so on…
I am coding in C#.
I want the logical structure of the document. Please remember that all scientific articles have a similar logical structure.
I have heard of some heuristic approaches, but due to time constraints I am looking for code. Can anyone please help?
I am not allowed to use any publicly available libraries such as parsCit. This is for my project work.
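Here is a minimal heuristic sketch of the kind of approach described above. It assumes the text has already been pulled out of the PDF as plain lines (the extraction step itself is not shown, since it depends on your own PDF parser). A line counts as a heading if it is short and either matches a list of common section names or starts with a number such as `2` or `3.1`; the numbering depth gives the nesting level. The class names, the heading list, and the 80-character limit are hypothetical choices for illustration, not a standard algorithm.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical node type for the logical-structure tree.
public sealed class SectionNode
{
    public string Title { get; }
    public int Level { get; }   // 0 = document root, 1 = section, 2 = subsection, ...
    public List<string> Body { get; } = new List<string>();
    public List<SectionNode> Children { get; } = new List<SectionNode>();

    public SectionNode(string title, int level) { Title = title; Level = level; }
}

public static class StructureSketch
{
    // Assumed list of common top-level headings; extend it for your corpus.
    static readonly HashSet<string> KnownHeadings = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
    {
        "Abstract", "Introduction", "Related Work", "Proposed Scheme", "Method", "Methods",
        "Experiments", "Results", "Discussion", "Conclusion", "Conclusions", "References"
    };

    // Numbered headings such as "2 Method", "2. Method" or "3.1 Dataset".
    static readonly Regex Numbered = new Regex(@"^(\d+(?:\.\d+)*)\.?\s+[A-Z]");

    // Returns 0 for body text, otherwise the heading depth.
    static int HeadingLevel(string line)
    {
        if (line.Length > 80) return 0;   // assumed: headings are short lines
        if (KnownHeadings.Contains(line.TrimEnd('.', ':'))) return 1;
        var m = Numbered.Match(line);
        if (m.Success) return m.Groups[1].Value.Split('.').Length;   // "3.1" -> 2
        return 0;
    }

    // Builds the tree with a stack of currently open sections.
    public static SectionNode Parse(IEnumerable<string> lines)
    {
        var root = new SectionNode("(document)", 0);
        var stack = new Stack<SectionNode>();
        stack.Push(root);
        foreach (var raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0) continue;
            int level = HeadingLevel(line);
            if (level == 0) { stack.Peek().Body.Add(line); continue; }
            // Close sections at the same or a deeper level; the root (level 0) is never popped.
            while (stack.Peek().Level >= level) stack.Pop();
            var node = new SectionNode(line, level);
            stack.Peek().Children.Add(node);
            stack.Push(node);
        }
        return root;
    }

    public static void Print(SectionNode node, string indent = "")
    {
        foreach (var child in node.Children)
        {
            Console.WriteLine(indent + child.Title);
            Print(child, indent + "  ");
        }
    }

    public static void Main()
    {
        var lines = new[]
        {
            "A Sample Article Title", "Abstract", "We study something.",
            "1 Introduction", "Some text.", "1.1 Motivation", "More text.",
            "2 Proposed Scheme", "2.1 Model", "Details."
        };
        Print(Parse(lines));
    }
}
```

Running `Main` prints the headings as an indented tree, with each numbered subsection nested under its section and the title line left in the root's body. Plain text loses font size and boldness, which are usually strong heading cues, so a fuller version would likely combine these text rules with layout information from your PDF extractor.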