Excel Translations

Excellence for Global Growth

Offices and Operation Centers
in North America, Europe, and South America

How to Translate PDF Documents into Several Languages – Part 2

In Part 2 of this post, we continue covering aspects of PDF conversion and discuss the state of the art in accomplishing the translation task.


What’s the big deal with PDF conversion?
Why not simply cut and paste? It’s easy to do and doesn’t cost anything.

Extracting textual information from PDFs, though time-consuming, can seem relatively easy at first glance. You can copy and paste, take screenshots, or even retype the information by hand. However, it becomes nearly impossible when the PDF does not allow copying, or when the pasted text comes out in a form that cannot be used. As with an iceberg, what you see is not the whole story: the visual part of a PDF document, its look and feel, is only the tip, and much of what matters for conversion lies beneath the surface.
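Text that "cannot be used" after pasting is often a matter of how it is stored rather than what it says: pasted text typically carries a hard line break at the end of every visual line, words hyphenated across lines, and typographic ligatures such as "ﬁ" kept as a single character. The Python sketch below illustrates the kind of cleanup this forces. It uses only the standard library; the function name `clean_pasted_pdf_text` and the rules it applies are illustrative assumptions, not a complete recipe for any particular tool.

```python
import re
import unicodedata


def clean_pasted_pdf_text(raw: str) -> str:
    """Tidy text copied out of a PDF so it reads as normal paragraphs (illustrative rules)."""
    # NFKC folds compatibility characters, e.g. the ligature "ﬁ" (U+FB01) becomes "fi".
    text = unicodedata.normalize("NFKC", raw)
    # Rejoin words split by a hyphen at the end of a line: "trans-\nlation" -> "translation".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A blank line marks a real paragraph break; protect it with a placeholder character.
    text = re.sub(r"\n\s*\n", "\x00", text)
    # Any remaining single newline is a hard wrap from the page layout: turn it into a space.
    text = re.sub(r"[ \t]*\n[ \t]*", " ", text)
    # Restore the paragraph breaks and collapse runs of spaces or tabs.
    text = text.replace("\x00", "\n\n")
    return re.sub(r"[ \t]{2,}", " ", text).strip()


raw = "The \ufb01rst trans-\nlation was\nnot usable.\n\nSecond para-\ngraph here."
print(clean_pasted_pdf_text(raw))
```

The hyphen rule is a trade-off: a genuine compound such as "state-of-the-art" that happens to break after "of-" comes out as "state-ofthe-art", and NFKC also rewrites characters like superscript digits. Rules like these reduce the cleanup but do not remove the need for a human review.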

Are all PDFs created equal?

Every PDF has its own shape and features; no two are the same. There are several flavors of PDF, but they can all be reduced to two basic types: Distilled PDF and Scanned PDF. You get a Distilled PDF when you produce the document from a publishing tool (via Acrobat Distiller or another PDF writer). Other flavors supported by Adobe Acrobat contain a raster image of each page of the document, with or without some text in the background to allow text searching. These are referred to as Scanned PDFs, and you get them when you scan paper documents (via Acrobat Exchange or some other method).
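A rough first check of which type you are dealing with is to see whether each page has a text layer at all. The sketch below is a heuristic under stated assumptions, not a definitive test: `classify_pages` and its 25-character cut-off are hypothetical choices, and the helper `page_texts_from_pdf` relies on the third-party pypdf package (`PdfReader` and `page.extract_text()`), which is imported only when that helper is called.

```python
from typing import List


def classify_pages(page_texts: List[str], min_chars: int = 25) -> List[str]:
    """Label each page 'distilled' if it has a text layer, else 'scanned' (heuristic)."""
    labels = []
    for text in page_texts:
        # Count visible characters only; a whitespace-only page has no usable text.
        visible = sum(1 for ch in text if not ch.isspace())
        # Caveat: a scanned page with a hidden OCR text layer is also labelled 'distilled'.
        labels.append("distilled" if visible >= min_chars else "scanned")
    return labels


def page_texts_from_pdf(path: str) -> List[str]:
    """Extract the text layer of every page (requires the third-party pypdf package)."""
    from pypdf import PdfReader  # lazy import so classify_pages works without pypdf

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


# Usage, assuming pypdf is installed and "manual.pdf" is a hypothetical input file:
# print(classify_pages(page_texts_from_pdf("manual.pdf")))
```

Because a Scanned PDF may carry searchable text in the background, a page labelled "distilled" here only means that a text layer exists; whether that text is accurate still has to be checked by eye.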

Does the type of PDF created matter?

Yes, it does. When it comes to converting PDFs into an editable format, the nature of the PDF matters. Extracting text from a Scanned PDF is not that simple: it requires good OCR software and at least some tailoring to the problem at hand. Complications arise when, for instance, the image is noisy or the text pixels cannot be clearly distinguished from the background. Because OCR accuracy depends on the quality of the PDF provided, the process does not run as smoothly in these cases, and the converted documents usually require a lot of cleanup.
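To see why noisy scans are hard, it helps to look at one preprocessing step that OCR pipelines commonly rely on: separating dark text pixels from the lighter background with a threshold. The sketch below implements Otsu's method, a standard way of choosing a global threshold from a grayscale histogram, in plain Python. Representing a page as a flat list of gray values from 0 to 255 is a simplifying assumption, and production OCR engines use their own, more robust binarization.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level that best separates dark (ink) from light (paper) pixels."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    weight_dark, sum_dark = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]  # number of pixels at or below gray level t
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: largest when the two groups are far apart.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: List[int], threshold: int) -> List[int]:
    """Map pixels at or below the threshold to black (0) and the rest to white (255)."""
    return [0 if p <= threshold else 255 for p in pixels]


page = [20] * 50 + [230] * 50  # toy "scan": dark ink pixels and light paper pixels
print(otsu_threshold(page))
```

On a clean scan the histogram has two well-separated peaks (ink and paper) and the threshold falls neatly between them. On a noisy or low-contrast scan the peaks overlap, so any single threshold misclassifies some pixels, and those errors surface later as misread characters that need cleanup.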

What types of documents will convert easily?

It is important to note that a perfectly optimized process is unattainable when it comes to translating PDFs, but as a general rule, the simpler the layout of the source documents, the better the converted documents will be. For instance, novels typically have very little layout, so you can expect a lot of success (and hence very little cleanup) when converting them to an editable format. If, on the other hand, you have complex pages such as scanned scientific journal pages, which are likely to contain multiple columns, complex tables, math, footnotes and bibliographies, you should expect to do a fair amount of cleanup on the converted documents.

Is there anything happening to make PDF conversion easier in the future?

Many tools have been designed and developed to work with PDF documents. Besides the familiar Adobe products, third-party developers offer a wide range of software and APIs, either under license or as freeware. Most of them can extract the textual content, but their practical use is limited because the text’s reading order is not necessarily preserved, especially in multi-column documents or complex layouts.
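The reading-order problem is easy to picture. Extraction tools generally see a page as positioned text fragments, and if those fragments are emitted strictly top to bottom, the lines of a two-column page interleave. The sketch below assumes each fragment has already been reduced to a hypothetical `Block(x, y, text)`, with `y` growing down the page, and that the page has exactly two columns split at a known x position; real layouts with tables, footnotes or headings spanning both columns need considerably more than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    x: float  # left edge of the text fragment
    y: float  # top edge; larger values are further down the page
    text: str


def naive_order(blocks: List[Block]) -> List[str]:
    """Top to bottom, then left to right: interleaves the two columns line by line."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def two_column_order(blocks: List[Block], split_x: float) -> List[str]:
    """Read the whole left column first, then the right column, each top to bottom."""
    # False (left of split_x) sorts before True (right column); then by vertical position.
    return [b.text for b in sorted(blocks, key=lambda b: (b.x >= split_x, b.y))]


page = [
    Block(50, 100, "left 1"), Block(300, 100, "right 1"),
    Block(50, 120, "left 2"), Block(300, 120, "right 2"),
]
print(naive_order(page))
print(two_column_order(page, split_x=200))
```

Here `naive_order` yields "left 1", "right 1", "left 2", "right 2", which is exactly the scrambled reading order that makes raw extraction output hard to translate, while `two_column_order` restores "left 1", "left 2", "right 1", "right 2".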

Adobe Acrobat X Pro [ https://acrobat.adobe.com/us/en/acrobat/acrobat-pro.html ] does a startlingly good job of exporting PDF files to editable Word or Excel documents. It isn’t perfect (it didn’t select the correct fonts when exporting my test documents), but it did a far better job of preserving the original formatting than any third-party software I’ve seen. The export function worked best with Distilled PDFs rather than scanned images. Scanned PDFs, in contrast, contain only a picture of the original text, so Acrobat can extract the text only by using its built-in Optical Character Recognition (OCR) software. Acrobat X has more accurate OCR than previous versions did, but it still lags far behind the best third-party OCR software, such as ABBYY FineReader 10 Professional Edition [ https://www.abbyy.com/finereader/ ].

Our experience is that you need to experiment with various options to see which ones best fit your needs and work best with your PDF documents. Our approach is to constantly re-evaluate the available tools, methods and techniques and to incorporate the best of what’s out there into what we do.
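One way to make such experiments comparable (offered as an illustrative assumption, not a description of any particular tool's features) is to convert a short sample page with each candidate tool, correct one output by hand to serve as a reference, and score every tool by its character error rate (CER): the number of single-character edits needed to match the reference, divided by the reference length. The hypothetical helpers below compute this with a plain Levenshtein distance in standard-library Python.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, output: str) -> float:
    """Single-character edits between output and reference, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, output) / len(reference)


def rank_tools(reference: str, outputs: Dict[str, str]) -> List[str]:
    """Sort tool names from lowest (best) to highest character error rate."""
    return sorted(outputs, key=lambda name: character_error_rate(reference, outputs[name]))


# Hypothetical outputs from three converters for the same sample line.
outputs = {"tool_a": "Disti11ed PDF", "tool_b": "Distilled PDF", "tool_c": "Dlstilled PDE"}
print(rank_tools("Distilled PDF", outputs))
```

CER measures only the text itself. It says nothing about whether fonts, tables or columns survived the conversion, so a visual check of the layout is still needed alongside it.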

The fact is that all PDF files used as a source for translation need reworking before they are translated into several languages. By making the native source documents available to your translation partner, you avoid rework and unnecessary preparation before translation can start. It also allows us to perform a full analysis and lets you stay in control of your budget and schedule, without surprises down the road. PDFs serve a purpose, but when it comes to translation, there is nothing better than the real thing: native source documents (such as FrameMaker, InDesign, QuarkXPress, etc.).



Excel Translations Overview

About Excel Translations

Excel Translations is a full-service, US-based corporate and technical translation agency serving the medical, technical, and corporate translation needs of companies and organizations worldwide.

With offices and operations in the US, Europe, and South America, we are your local and global source for translation services performed under a certified quality management system. If you have questions for us, please visit our contact page.


© 2025 · Excel Translations
