Creating Multiple Language PDFs using Apache FOP
by Balaji Loganathan
This article will explain how to create PDFs in multiple languages using Apache FOP with XML and XSL.
This article assumes that the reader is familiar with the basics of Apache FOP, XML and XSL.
The process has five steps.
Step 1: Locate a font.
For English you don't need any extra fonts unless you want more styles.
For other languages, you need to find a TrueType or Type 1 font that contains the characters of that language.
For example, for Arabic you can use the TrueType font Iqraa.ttf, which can be downloaded from
http://www7.bev.net/civic/icb/ICB_Arabic.html
Save the font file somewhere on your hard disk, for example in C:\.
Note: you have to tell fo:block explicitly which font to use for text in other languages;
otherwise ? or # symbols will appear in the generated PDF instead of the characters.
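Before hunting for fonts, it can help to know which scripts your texts actually use. The Java sketch below is a hypothetical helper (the class name ScriptCheck is made up, and FOP does not do this for you). It lists the Unicode scripts in a string using the JDK's Character.UnicodeScript. The working assumption is that any script other than LATIN means you need a TrueType or Type 1 font that covers it.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper (not part of FOP): reports which Unicode scripts a text uses,
// so you know which fonts to look for. JDK only, Java 8+.
class ScriptCheck {

    static Set<Character.UnicodeScript> scriptsIn(String text) {
        Set<Character.UnicodeScript> scripts = EnumSet.noneOf(Character.UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            // COMMON covers spaces, digits and most punctuation; INHERITED covers
            // combining marks. Neither says anything about the language.
            if (script != Character.UnicodeScript.COMMON
                    && script != Character.UnicodeScript.INHERITED) {
                scripts.add(script);
            }
        });
        return scripts;
    }

    public static void main(String[] args) {
        // Texts from Lang.xml, written as escapes so the source file stays ASCII.
        String[] texts = {
            "Consignee!",
            "\u0627\u0644\u0645\u0631\u0633\u0644 \u0625\u0644\u064A\u0647", // Arabic
            "\u8377\u53D7\u4EBA!"                                               // Japanese
        };
        for (String text : texts) {
            System.out.println(scriptsIn(text)); // prints [LATIN], then [ARABIC], then [HAN]
        }
    }
}
```

Kanji such as the Japanese text are reported as HAN, the same script as Chinese. That is one reason a single CJK font can sometimes serve both languages.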
Step 2: Create a language resource XML file.
This XML file contains the text values in
various languages, plus a special element called "fontname" that
tells FOP which font to use for each language. For example,
Chinese text cannot be displayed with fonts like Helvetica or Arial, so each
language is assigned its own font name.
Sample XML structure (let's call it Lang.xml):
<?xml version="1.0" encoding="utf-8" ?>
<Lang>
<en><!-- for English -->
<fontname>Arial</fontname>
<text1>Consignee!</text1>
</en>
<fr><!-- for French -->
<fontname>Arial</fontname>
<text1>Destinataire!</text1>
</fr>
<ar><!-- for Arabic -->
<fontname>Iqraa</fontname>
<text1>المرسل إليه</text1>
</ar>
<jp><!-- for Japanese -->
<fontname>MSGothic</fontname>
<text1>荷受人!</text1>
</jp>
<ch/> <!-- Chinese -->
</Lang>
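Every language element needs both a fontname and a text1. A quick sanity check of Lang.xml can therefore save a confusing FOP run. The sketch below is a hypothetical helper (LangCheck is not part of FOP) that uses only the JDK's DOM parser. It lists the language elements that are missing either value.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sanity check for Lang.xml (not part of FOP): every child of <Lang>
// should carry a non-empty <fontname> and <text1>.
class LangCheck {

    static List<String> incompleteLanguages(InputSource langXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(langXml).getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Lang.xml could not be parsed", e);
        }
        List<String> incomplete = new ArrayList<>();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments between language elements
            }
            Element lang = (Element) n;
            if (isBlank(lang, "fontname") || isBlank(lang, "text1")) {
                incomplete.add(lang.getTagName());
            }
        }
        return incomplete;
    }

    // True if the child element is absent or contains only whitespace.
    private static boolean isBlank(Element lang, String childName) {
        Node child = lang.getElementsByTagName(childName).item(0);
        return child == null || child.getTextContent().trim().isEmpty();
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "Lang.xml";
        // Passing a file URI lets the parser honour the encoding declaration itself.
        InputSource source = new InputSource(new java.io.File(path).toURI().toString());
        System.out.println("Incomplete languages: " + incompleteLanguages(source));
    }
}
```

For the sample Lang.xml this reports [ch] until the Chinese text and font name are filled in.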
Step 3: Configure userconfig.xml
Now read the document at
http://xml.apache.org/fop/fonts.html
carefully. It
explains how to add and embed a new TrueType or Type 1 font so that FOP can render the input
characters in that font.
For example:
To use the Arabic font C:\Iqraa.ttf, you first have to generate a font metrics file with FOP's
TTFReader class (fop.jar and the libraries FOP depends on must be on the classpath), like this:
>java org.apache.fop.fonts.apps.TTFReader C:\Iqraa.ttf Iqraa.xml
Then register the font by adding an entry like this inside the <fonts> element of userconfig.xml:
<font metrics-file="Iqraa.xml" kerning="yes" embed-file="C:\Iqraa.ttf">
  <font-triplet name="Iqraa" style="normal" weight="normal"/>
</font>
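This tells FOP how to render Arabic text in the Iqraa font. The font-triplet name is the name that font-family (and therefore the fontname element in Lang.xml) must use. Register every other font named in Lang.xml that FOP does not already know, such as MSGothic for Japanese, in the same way.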
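It is easy to end up with fontname values in Lang.xml that do not match the font-triplet names in userconfig.xml. The sketch below is a hypothetical FontRegistrationCheck class that uses only the JDK's XPath support. It lists the fontname values that no font-triplet registers. It assumes the userconfig.xml layout shown above, with a name attribute on each font-triplet. Adjust the XPath expressions if your FOP version's configuration file differs.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical consistency check (not part of FOP): which fonts named in Lang.xml
// have no <font-triplet name="..."> entry in userconfig.xml?
class FontRegistrationCheck {

    static Set<String> unregisteredFonts(InputSource langXml, InputSource userConfig) {
        Set<String> missing = values(langXml, "//fontname");
        missing.removeAll(values(userConfig, "//font-triplet/@name"));
        return missing;
    }

    // Collects the trimmed text of every node (element or attribute) the expression matches.
    private static Set<String> values(InputSource source, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(source);
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            Set<String> result = new TreeSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        InputSource lang = new InputSource(new java.io.File("Lang.xml").toURI().toString());
        InputSource config = new InputSource(new java.io.File("userconfig.xml").toURI().toString());
        System.out.println("Not registered in userconfig.xml: " + unregisteredFonts(lang, config));
    }
}
```

Any name it reports (Arial and MSGothic in the sample, for instance) is not registered. Arial is not one of FOP's built-in fonts either. FOP typically substitutes its default font for it and logs a warning, which is usually acceptable for English and French. A name like MSGothic, however, needs its own entry.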
Step 4: Configure the style sheet
Configure the XSL that converts the XML into XSL-FO, which FOP then renders as PDF.
In the XSL file, read the data for a particular language and store it in XSL variables.
For example, the code below stores the value of fr/text1 in the variable "message" and
the font name in the variable "font".
<xsl:variable name="message" select="document('Lang.xml')/Lang/fr/text1"/>
<xsl:variable name="font" select="document('Lang.xml')/Lang/fr/fontname"/>
Step 5: Use it
Now use these variables in an fo:block like this:
<fo:block font-family="{$font}"><xsl:value-of select="$message"/></fo:block>
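To see what Steps 4 and 5 produce before involving FOP, you can run the stylesheet through the JDK's built-in XSLT processor. The sketch below is an illustration, not part of FOP. The class name, the temporary directory and the minimal fo:root wrapper are assumptions. It writes a cut-down Lang.xml next to the stylesheet, because document('Lang.xml') is resolved relative to the stylesheet's own location. It then returns the XSL-FO text that FOP would turn into a PDF.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Illustration only: runs the Step 4/5 stylesheet with the JDK's XSLT processor
// and returns the resulting XSL-FO, without calling FOP.
class MultiLangPreview {

    // The two variables and the fo:block from Steps 4 and 5, inside a minimal fo:root.
    static final String XSL = String.join("\n",
        "<xsl:stylesheet version=\"1.0\"",
        "    xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\"",
        "    xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">",
        "  <xsl:variable name=\"message\" select=\"document('Lang.xml')/Lang/fr/text1\"/>",
        "  <xsl:variable name=\"font\" select=\"document('Lang.xml')/Lang/fr/fontname\"/>",
        "  <xsl:template match=\"/\">",
        "    <fo:root>",
        "      <fo:layout-master-set>",
        "        <fo:simple-page-master master-name=\"page\"><fo:region-body/></fo:simple-page-master>",
        "      </fo:layout-master-set>",
        "      <fo:page-sequence master-reference=\"page\">",
        "        <fo:flow flow-name=\"xsl-region-body\">",
        "          <fo:block font-family=\"{$font}\"><xsl:value-of select=\"$message\"/></fo:block>",
        "        </fo:flow>",
        "      </fo:page-sequence>",
        "    </fo:root>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Cut-down Lang.xml with ASCII-only entries.
    static final String LANG_XML = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>",
        "<Lang>",
        "  <en><fontname>Arial</fontname><text1>Consignee!</text1></en>",
        "  <fr><fontname>Arial</fontname><text1>Destinataire!</text1></fr>",
        "</Lang>");

    static String renderFo() {
        try {
            Path dir = Files.createTempDirectory("multilang");
            Path xsl = dir.resolve("MultiLang.xsl");
            // Lang.xml must sit next to the stylesheet, because document('Lang.xml')
            // is resolved against the stylesheet's URI.
            Files.write(dir.resolve("Lang.xml"), LANG_XML.getBytes(StandardCharsets.UTF_8));
            Files.write(xsl, XSL.getBytes(StandardCharsets.UTF_8));

            // Loading the stylesheet from a file gives it a base URI for document().
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(xsl.toFile()));
            StringWriter fo = new StringWriter();
            // The input document is irrelevant here; the template only reads Lang.xml.
            transformer.transform(new StreamSource(new StringReader("<InputXML/>")),
                    new StreamResult(fo));
            return fo.toString();
        } catch (IOException | TransformerException e) {
            throw new IllegalStateException("XSLT preview failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(renderFo());
    }
}
```

The output should contain an fo:block with font-family="Arial" and the text Destinataire!. The temporary directory is not deleted, so you can inspect the generated files afterwards.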
Make sure that fop.bat (or fop.sh) can locate userconfig.xml, Iqraa.xml, Iqraa.ttf and Lang.xml (Lang.xml is looked up relative to the XSL file).
Also pass the option "-c userconfig.xml" when running FOP.
For example:
>fop -c userconfig.xml -xml InputXML.xml -xsl MultiLang.xsl -pdf MultiLang.pdf
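Most failures at this point are simply missing files, so a small pre-flight check can help. The sketch below is a hypothetical FopPreflight class. It checks that the files this step mentions exist in one working directory and then prints the fop command line. Treat the single-directory layout as an assumption: userconfig.xml may point at fonts elsewhere (C:\Iqraa.ttf, for instance), so adjust the list to your setup. The sketch prints the command rather than launching fop.bat or fop.sh, because the way a batch or shell script is started differs between platforms.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check before running FOP; file names follow this article.
class FopPreflight {

    // Returns the names from `required` that are not regular files under baseDir.
    static List<String> missingFiles(Path baseDir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!Files.isRegularFile(baseDir.resolve(name))) {
                missing.add(name);
            }
        }
        return missing;
    }

    // Builds the arguments of the fop command used in this article.
    static List<String> fopCommand(String config, String xml, String xsl, String pdf) {
        return Arrays.asList("fop", "-c", config, "-xml", xml, "-xsl", xsl, "-pdf", pdf);
    }

    public static void main(String[] args) {
        Path baseDir = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = missingFiles(baseDir, Arrays.asList(
                "userconfig.xml", "Iqraa.xml", "Iqraa.ttf", "Lang.xml",
                "InputXML.xml", "MultiLang.xsl"));
        if (!missing.isEmpty()) {
            System.err.println("Missing in " + baseDir.toAbsolutePath() + ": " + missing);
            System.exit(1);
        }
        System.out.println(String.join(" ",
                fopCommand("userconfig.xml", "InputXML.xml", "MultiLang.xsl", "MultiLang.pdf")));
    }
}
```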
That's it.
With some XSL tricks you can make everything dynamic without hard-coding any part.
For example, Arabic text runs from right to left, so its alignment can be made dynamic by adding extra language-specific elements to Lang.xml instead of hard-coding it in the XSL.
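One way to fill such a language-specific element without typing it by hand is to derive it from the text itself. The sketch below uses the JDK's java.text.Bidi to find the base direction from the first strong directional character. This is the rule the Unicode Bidirectional Algorithm uses by default. The sketch then maps the result to an XSL-FO text-align value. The class name is hypothetical, and so is the idea of storing the result in, say, an align element of Lang.xml that the XSL reads into text-align. The sketch only chooses the alignment.

```java
import java.text.Bidi;

// Hypothetical helper: choose an XSL-FO text-align value from the text's base direction.
class TextAlign {

    static String textAlignFor(String text) {
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT: the base direction comes from the first
        // strong directional character; text without one is treated as left-to-right.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return bidi.baseIsLeftToRight() ? "left" : "right";
    }

    public static void main(String[] args) {
        System.out.println(textAlignFor("Destinataire!")); // left
        // Arabic "consignee" from Lang.xml, written as escapes.
        System.out.println(textAlignFor(
                "\u0627\u0644\u0645\u0631\u0633\u0644 \u0625\u0644\u064A\u0647")); // right
    }
}
```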
The sample files MultiLang.xsl and Lang.xml can be used for local testing, but you still need to complete the configuration steps above for the text to display properly in the PDF. You can also look at the generated PDF, MultiLang.pdf.
That leaves the question of Unicode. Use XMLSPY, Visual Studio or an equivalent editor to edit your Lang.xml file.
For example, to display "Consignee" in Chinese, go to http://www.babylon.com/, copy the
Chinese word and paste it into Lang.xml using the XML editor; editors like XMLSPY take care of saving it as UTF-8.
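An editor or a copy-and-paste step might silently save Lang.xml in a legacy code page. In that case the XML parser will typically reject the file or read the wrong characters. A quick way to catch this is to check that the file's bytes are well-formed UTF-8. The sketch below is a hypothetical Utf8Check class. It uses a JDK CharsetDecoder configured to report bad bytes instead of replacing them. It checks only the encoding, not whether the XML is otherwise correct.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: does a file (e.g. Lang.xml) contain well-formed UTF-8?
class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String file = args.length > 0 ? args[0] : "Lang.xml";
        byte[] bytes = Files.readAllBytes(Paths.get(file));
        System.out.println(file + (isValidUtf8(bytes) ? " is valid UTF-8" : " is NOT valid UTF-8"));
    }
}
```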