Get The Logical Structure Of An Academic Article Without Using Libraries - Code With C

This topic has 2 replies, 3 voices, and was last updated 9 years, 4 months ago by Amit.


ShikhaTan Member October 29, 2015 at 6:18 pm

Scientific article PDFs share a common structure: paper title, abstract, introduction, proposed scheme, experiments, conclusion, references, and so on. Each of these parts can contain sections and subsections.

How can I accept a PDF document as input and extract its physical structure? I want to get the structure of the PDF along with the data it contains.

E.g., if the PDF document looks like this:

Article Title
Abstract
    sentences
Introduction
    section-1
        subsection-1
        subsection-2
Proposed scheme
    section-1
        subsection-1

and so on.

I am coding in C#.

I want the logical structure of the document. Please remember that all scientific articles have a similar logical structure.

I have heard of some heuristic approaches, but because of time constraints I am looking for working code. Can anyone please help?

I am not allowed to use any publicly available libraries such as ParsCit. This is for my project work.

Amit Member October 29, 2015 at 6:19 pm

One thing to note is that most of these articles are written in LaTeX using a specific template, so the headings follow a predictable pattern. For example, have a look at this template for IEEE articles:

https://www.sharelatex.com/templates/journals/ieee-journal
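
As a rough illustration of why the template helps: IEEE-style papers typically number top-level sections with Roman numerals ("I. INTRODUCTION") and subsections with capital letters ("A. ..."), so once the text has been pulled out of the PDF, a few pattern checks can rebuild the outline. The sketch below is in Python rather than C# purely for brevity, and it uses only the standard library. It assumes the text extraction is already done and that each heading sits on its own line. The names `build_outline`, `outline`, and the `UNNUMBERED` list are made up for this example and would need tuning for other templates.

```python
import re

# Assumed heading shape: a short uppercase label, a dot, then the title.
HEADING_RE = re.compile(r"^([A-Z]+)\.\s+(\S.*)$")
ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}
# Hypothetical list of headings that usually carry no number.
UNNUMBERED = {"abstract", "references", "acknowledgment"}


def roman_to_int(label):
    """Convert a Roman numeral made of I, V, X, L, C to an integer."""
    total = 0
    for ch, nxt in zip(label, label[1:] + " "):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (IV = 4).
        total += -value if nxt in ROMAN and ROMAN[nxt] > value else value
    return total


def new_node(title):
    return {"title": title, "text": [], "children": []}


def build_outline(lines):
    """Group extracted text lines into sections and subsections."""
    root = new_node("ROOT")
    section = None   # current top-level section
    current = root   # node that receives body text
    numbered = 0     # how many numbered sections have been seen
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = HEADING_RE.match(line)
        label = m.group(1) if m else ""
        is_roman = bool(label) and all(ch in ROMAN for ch in label)
        if line.lower() in UNNUMBERED:
            section = current = new_node(line)
            root["children"].append(section)
        elif is_roman and roman_to_int(label) == numbered + 1:
            # Accept a Roman label only if it is the next section in sequence;
            # this keeps "C." (subsection) apart from "C." (section 100).
            numbered += 1
            section = current = new_node(m.group(2))
            root["children"].append(section)
        elif m and section is not None and label == chr(ord("A") + len(section["children"])):
            # Accept a letter label only if it is the next subsection letter.
            current = new_node(m.group(2))
            section["children"].append(current)
        else:
            current["text"].append(line)
    return root


def outline(node, depth=0):
    """Return the heading tree as a list of indented titles."""
    rows = []
    for child in node["children"]:
        rows.append("  " * depth + child["title"])
        rows.extend(outline(child, depth + 1))
    return rows


sample = [
    "Article Title", "Abstract", "Some sentences.",
    "I. INTRODUCTION", "Intro text.", "A. Motivation", "Why it matters.",
    "B. Contributions", "II. PROPOSED SCHEME", "A. Overview",
]
print("\n".join(outline(build_outline(sample))))
```

Checking that each label is the next one in sequence is what keeps an ordinary sentence such as "C. Smith showed ..." from being taken as a heading in most cases. It is still a heuristic and will misfire on unusual layouts, and in this sketch any text before the first heading (such as the article title) simply ends up in the root node's `text` list.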
