Custom Text Import


This feature is helpful when you need to import a text, RTF or Word document containing subtitles in a layout that doesn't match any of the file formats natively supported by EZConvert.

This option can be accessed from the File menu. Choose "Advanced Import..." and then "Custom Text Format".

After you select the file, the window shown below appears. There you tell the program how to interpret the file so that the timecode, text and formatting information is extracted correctly:

Custom Text Import Settings

Basic principles

Before explaining the options in the dialog above, it is worth understanding the principles behind the Custom Text Import module and how it works.

The whole process can be split into three parts:

1. Find the subtitles segment

First, using the Subtitles begin and Subtitles end fields, the program locates the part of the file that contains the subtitles.

These fields contain search patterns (described below) that locate the subtitle section of the text, so that any file header or footer is skipped.

You can define both fields or only one of them. EZConvert locates the file's header and/or footer, and every line after the header and before the footer is treated as the part that contains the subtitles.

If you leave the Subtitles begin and Subtitles end fields empty, EZConvert assumes there is no header or footer and that the subtitles start on the very first line of the file.
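Step 1 can be sketched in Python as follows. This is an illustration under assumptions, not EZConvert's code: the two fields are modelled as ordinary regular expressions, the function name find_subtitle_section is hypothetical, and both excluding the marker lines themselves and returning nothing when the begin pattern never matches are guesses about the program's behaviour.

```python
import re

def find_subtitle_section(lines, begin=None, end=None):
    """Return the lines between the Subtitles begin and Subtitles end markers.

    begin/end are regular expressions standing in for the two fields;
    pass None for a field that is left empty. Excluding the marker lines
    themselves is an assumption made for this sketch.
    """
    start = 0
    if begin is not None:
        for i, line in enumerate(lines):
            if re.search(begin, line):
                start = i + 1          # subtitles start after the file header
                break
        else:
            return []                  # begin pattern never matched
    stop = len(lines)
    if end is not None:
        for i in range(start, len(lines)):
            if re.search(end, lines[i]):
                stop = i               # subtitles stop before the file footer
                break
    return lines[start:stop]


lines = ["Title: Demo", "BEGIN", "1", "Hello", "END", "Page footer"]
print(find_subtitle_section(lines, r"^BEGIN$", r"^END$"))  # ['1', 'Hello']
print(find_subtitle_section(lines) == lines)               # True
```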

2. Locate the individual subtitles

The Subtitle header and Subtitle footer fields locate the individual subtitles by marking the start and the end of each one, respectively.

All the text between them is treated as subtitle text to be imported.

At least one of these parameters must be specified. If the Subtitle footer is empty, the text between the current header and the next one is treated as the subtitle body.

You can also specify only the Subtitle footer; in this case the text between two consecutive footers is treated as the subtitle body.

Once a subtitle's header and footer have been located, they are also searched for the in- and out-cue timecodes.
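A minimal Python sketch of step 2 for the case where only the Subtitle header is given. The header is written as a plain regular expression in which a named group stands in for the <in_cue> tag; split_by_header is a hypothetical helper used only for illustration.

```python
import re

def split_by_header(section, header):
    """Pair each Subtitle header match with the body text that follows it.

    With the Subtitle footer left empty, a body runs from the end of one
    header to the start of the next header (or to the end of the text).
    """
    matches = list(re.finditer(header, section))
    pairs = []
    for i, m in enumerate(matches):
        body_end = matches[i + 1].start() if i + 1 < len(matches) else len(section)
        pairs.append((m, section[m.end():body_end]))
    return pairs


section = "#1 00:01\nHello\n#2 00:05\nWorld\n"
for m, body in split_by_header(section, r"#\d+ (?P<in_cue>\d\d:\d\d)\n"):
    print(m.group("in_cue"), repr(body))
# 00:01 'Hello\n'
# 00:05 'World\n'
```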

3. Extract the subtitle text

After the subtitles have been located, EZConvert splits each subtitle into individual text lines using the Row delimiter.

Use the Italic start and Italic stop options if you want to keep the italic formatting of the text.

If the Italic stop field is empty, each occurrence of the Italic start pattern toggles the italic attribute on or off.

Preview

Whenever you change the settings in the Custom Import dialog, you can press the Preview button to check quickly whether the text and timecodes will be imported correctly.

Note that, due to limitations of the preview window, italic text is shown surrounded by "<" and ">" marks.

If nothing is displayed in the preview, the import settings don't match the layout of the file and need to be adjusted.

Load/Save templates

Once you have tuned all the parameters, you can save them as a template, so you don't have to enter the same settings again for other files of the same type.

Parameters

Code Page

Specifies the text file encoding. For more on selecting the proper code page, see the Plain ASCII/Unicode Text Files topic.

The code page field is not available if you are importing RTF or .doc files.

Force subtitle text bottom

Use this option when the imported text has empty rows below the last text line. EZConvert will then always place the subtitle text on the bottom row.

Regular expression syntax

When this option is switched on, the pattern parameters are interpreted as regular expressions; otherwise they are searched for as a direct match. The regular expression syntax is described later in this topic.

Comment line

Defines the pattern that marks a comment line in the file. Comments are imported and saved to the output as long as the selected output format supports comments.

Subtitles begin
Subtitles end

These identify where the subtitles begin and end when the file has a header or footer section. Leave them empty if there is no header or footer.

Subtitle header
Subtitle footer

The Subtitle header and Subtitle footer are used to locate the individual subtitles. At least one of them must be specified.

The subtitle header or footer usually contains the timecodes, so remember to include the <in_cue> and <out_cue> tags in your pattern.

Row delimiter

The row delimiter divides the text between the subtitle header and footer into individual lines. It is important to define it correctly; otherwise the text won't be wrapped properly in EZConvert.

Italic start
Italic stop

The Italic start/stop options locate italic-formatted text so that its formatting can be preserved.

If you leave the Italic start field empty, EZConvert won't import any italic formatting.

If the Italic stop field is empty, each occurrence of the Italic start pattern toggles the italic attribute on or off.

Reset italic on new line

When this option is switched on, each text line starts with italics turned off.

Reset italic on new subtitle

When this option is switched on, each subtitle starts with italics turned off.
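To illustrate how the italic options interact, here is a hypothetical Python helper, apply_italics. It treats the Italic start and Italic stop values as literal markers (an assumption; regular-expression markers are not modelled) and returns the italic state at the end of the row. Passing italic=False for every row corresponds to "Reset italic on new line"; passing the returned state on to the next row corresponds to leaving that option off.

```python
def apply_italics(row, start, stop=None, italic=False):
    """Split one text row into (text, is_italic) runs.

    start/stop stand in for the Italic start and Italic stop fields and are
    treated as literal text. An empty stop means every start marker flips
    the italic state. italic is the state carried in from the previous row;
    the state at the end of the row is returned alongside the runs.
    """
    if not start:                      # Italic start empty: no italics
        return ([(row, False)] if row else []), False
    runs, pos = [], 0
    while True:
        # Look for the start marker unless we are inside italics and a stop is set.
        marker = stop if (stop and italic) else start
        idx = row.find(marker, pos)
        if idx == -1:
            break
        if idx > pos:
            runs.append((row[pos:idx], italic))
        italic = not italic
        pos = idx + len(marker)
    if pos < len(row):
        runs.append((row[pos:], italic))
    return runs, italic


# "Reset italic on new line" off: the state is carried from row to row.
state = False
for row in ["*Carried", "over* plain"]:
    runs, state = apply_italics(row, "*", italic=state)
    print(runs)
# [('Carried', True)]
# [('over', True), (' plain', False)]
```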

Subtitle layout

The available options are "In- and Out-cues" and "In-cues only".

With the first layout, as its name suggests, the program looks for both In- and Out-cues in the subtitle header and footer.

The second option is mostly used for closed captions. The subtitles are chained and there are no Out-cues: each subtitle stays on screen until the next In-cue is reached, at which point the next subtitle pops up.

To leave the screen blank for a period of time, the file contains an entry without text. Here is an example of such a layout:

 

10:00:00:00 The first subtitle.
10:00:02:00 and the next one pops-up now.
10:00:04:00
10:00:24:00 This one appears after 20 s gap.
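The In-cues only layout can be sketched in Python as follows, with times in seconds counted from 10:00:00:00. The helper name chain_in_cues and the default duration given to the last subtitle (which has no following In-cue) are assumptions for illustration.

```python
def chain_in_cues(entries, last_duration=2.0):
    """Turn "In-cues only" entries into (in, out, text) subtitles.

    entries is a list of (in_cue_seconds, text) in file order. Each subtitle
    ends where the next entry begins; entries without text only close the
    previous subtitle (a gap). last_duration is a hypothetical default for
    the final entry, which has no following In-cue.
    """
    subtitles = []
    for i, (start, text) in enumerate(entries):
        end = entries[i + 1][0] if i + 1 < len(entries) else start + last_duration
        if text.strip():
            subtitles.append((start, end, text))
    return subtitles


# The example above, with times in seconds counted from 10:00:00:00
entries = [(0, "The first subtitle."), (2, "and the next one pops-up now."),
           (4, ""), (24, "This one appears after 20 s gap.")]
for sub in chain_in_cues(entries):
    print(sub)
# (0, 2, 'The first subtitle.')
# (2, 4, 'and the next one pops-up now.')
# (24, 26.0, 'This one appears after 20 s gap.')
```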

Pattern syntax and Tags

The parameters in the Custom Import setup are search patterns. If the Regular expression syntax option is not selected, the text of these patterns is searched for as a direct match in the file. The exceptions are tags, which must be enclosed in "<" and ">" and instruct the program to search for the following attributes:

<in_cue>, <out_cue>

The subtitle's in- and out-cues. By default the timecode format is HH:MM:SS:FF. If the timecode in your file is formatted differently, add a format pattern inside the tag, for example <in_cue hh:mm:ss.ff> (here the frames are separated by "."). Timecode format patterns are explained later in this topic.

<space>

The <space> tag tells EZConvert that the next character is either a space or a tab.

<dur>

The duration of the subtitle. By default the duration is formatted as SS:FF, but you can change this by specifying another pattern.

For example, to separate the seconds and frames with ".", specify <dur ss.ff>. Timecode format patterns are explained later in this topic.

<num>

The subtitle's sequential number. In fact, it can be used to match any number in the subtitle header, footer, etc.

<sttl_num>

The subtitle number as used in EZConvert: a number with an optional letter, for example 3 or 3a.
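A rough regular-expression approximation of what <sttl_num> accepts, written in Python. Allowing upper-case letters and exactly one letter suffix are assumptions of this sketch; the manual only gives 3 and 3a as examples, and parse_sttl_num is a hypothetical name.

```python
import re

# Approximation of <sttl_num>: a number with one optional letter suffix.
# Accepting upper-case letters as well is an assumption of this sketch.
STTL_NUM = re.compile(r"(\d+)([A-Za-z]?)")

def parse_sttl_num(text):
    m = STTL_NUM.fullmatch(text)
    if m is None:
        raise ValueError(f"not a subtitle number: {text!r}")
    return int(m.group(1)), m.group(2)


print(parse_sttl_num("3"))    # (3, '')
print(parse_sttl_num("3a"))   # (3, 'a')
```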

<skip_eol>

Skips all characters up to the end of the current line. Note that you end up at the end of that line, not at the start of the next one.

To skip everything up to the beginning of the next line, use "<skip_eol><new_line>".
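The difference between <skip_eol> and <skip_eol><new_line> can be shown with plain Python regular expressions. Treating <skip_eol> as [^\r\n]* and <new_line> as any of \r\n, \n or \r is an assumption made for this sketch, not a description of EZConvert's internals.

```python
import re

text = "1  extra stuff\nNext line"
# <skip_eol> alone: consume up to, but not including, the line break.
skip_eol = re.match(r"1[^\r\n]*", text)
# <skip_eol><new_line>: also consume the line break itself.
skip_eol_new_line = re.match(r"1[^\r\n]*(?:\r\n|\n|\r)", text)
print(repr(text[skip_eol.end():]))            # '\nNext line'
print(repr(text[skip_eol_new_line.end():]))   # 'Next line'
```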

<new_line>

New line.

<tab>

Tab character is expected.

<cr>

Carriage return character (ASCII code 13).

<lf>

Linefeed character (ASCII code 10).

Timecode format patterns

The representation of in-cues, out-cues and durations can be customized with timecode format patterns.

Here are some examples: "hh:mm:ss:ff", "hh:mm:ss.ff", "hh:mm:ss.nn", "hh:mm:ss.nnn", "feet+ff", "feet.ff", "ms", "frames", etc., where:

hh - hours
mm - minutes
ss - seconds
ff - frames
nnn - milliseconds
nn - hundredths of a second (units of 10 milliseconds)
ms - the whole time is given in milliseconds
frames - the whole time is given in frames
feet - used for 35mm feet/frames timecode; must be followed by a separator ("+", "." ...) and the frames. 1 foot = 16 frames.
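The following hypothetical Python function converts a timecode to seconds according to the patterns listed above. The frame rate is not part of the pattern, so fps=25 is an assumption; adjust it to your material. The function name is illustrative only.

```python
import re

FIELDS = ("nnn", "nn", "hh", "mm", "ss", "ff")

def timecode_to_seconds(value, pattern, fps=25.0):
    """Convert a timecode string to seconds using a timecode format pattern.

    fps is an assumption (25 by default); the patterns carry no frame rate.
    """
    if pattern == "ms":                       # whole time in milliseconds
        return int(value) / 1000
    if pattern == "frames":                   # whole time in frames
        return int(value) / fps
    if pattern.startswith("feet"):            # e.g. "feet+ff" or "feet.ff"
        sep = pattern[len("feet")]
        feet, frames = value.split(sep)
        return (int(feet) * 16 + int(frames)) / fps   # 1 foot = 16 frames
    # Field names become digit groups; every other character is a literal separator.
    tokens = re.findall(r"nnn|nn|hh|mm|ss|ff|.", pattern)
    regex = "".join(f"(?P<{t}>\\d+)" if t in FIELDS else re.escape(t) for t in tokens)
    m = re.fullmatch(regex, value)
    if m is None:
        raise ValueError(f"{value!r} does not match {pattern!r}")
    f = {name: int(digits) for name, digits in m.groupdict().items()}
    return (f.get("hh", 0) * 3600 + f.get("mm", 0) * 60 + f.get("ss", 0)
            + f.get("ff", 0) / fps + f.get("nnn", 0) / 1000 + f.get("nn", 0) / 100)


print(timecode_to_seconds("10:00:02:00", "hh:mm:ss:ff"))  # 36002.0
print(timecode_to_seconds("345", "frames"))               # 13.8
print(timecode_to_seconds("10+08", "feet+ff"))            # 6.72
print(timecode_to_seconds("1500", "ms"))                  # 1.5
```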

Regular expressions

When the Regular expression syntax option in the Custom Import dialog is switched on, regular expressions give you a powerful way to make the patterns precise and to perform comprehensive searches.

In addition to the standard syntax, you can use the tags described above. For example, the pattern:

^<num>\s+<in_cue hh:mm:ss,nnn>\s+<out_cue hh:mm:ss,nnn>.*(<new_line>)?

will match any line in the following form:

433    00:43:23,270 00:43:25,439   2,169 L         
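For comparison, here is roughly the same pattern written in Python's re syntax, with the tags replaced by hand-written groups (an assumption about what each tag matches). Note that Python's \s also matches line breaks, whereas the symbol list that follows describes \s as space or tab only; within a single line this makes no difference.

```python
import re

TIME = r"\d{2}:\d{2}:\d{2},\d{3}"      # stands in for hh:mm:ss,nnn
# <num> -> (?P<num>\d+), <new_line> -> \r?\n (assumed equivalents)
PATTERN = re.compile(
    rf"^(?P<num>\d+)\s+(?P<in_cue>{TIME})\s+(?P<out_cue>{TIME}).*(?:\r?\n)?"
)

line = "433    00:43:23,270 00:43:25,439   2,169 L"
m = PATTERN.match(line)
print(m.group("num"), m.group("in_cue"), m.group("out_cue"))
# 433 00:43:23,270 00:43:25,439
```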

EZConvert uses a fairly standard regular expression syntax, so any general regular expression guide can serve as a reference. Here is a brief list of the most common symbols:

.

Any character except new line.

\s

Any whitespace character - space or tab.

\S

Any non-whitespace character.

\d

Any digit.

\D

Any non-digit.

+

One or more instances of the previous symbol.

*

Zero or more instances of the previous symbol.

?

Zero or one instance of the previous symbol.

{n}

Exactly n instances of the previous symbol.

{n,m}

From n to m instances.

{n,}

At least n instances.

^

Start of line.

$

End of line.

\

Escape character. If for example you want to search for "$" you have to specify "\$".

More on how regular expressions work can be found at: http://en.wikipedia.org/wiki/Regular_expression

Examples

Here are two examples of using Custom Text Import with two different types of file.

Example I.

If you are sent a file with subtitles listed as:

1
00:00:39,254 --> 00:00:44,384
ПОРТОКАЛ С
ЧАСОВНИКОВ МЕХАНИЗЪМ

2
00:01:40,982 --> 00:01:42,692
<i>Това съм аз.</i>

3
00:01:43,151 --> 00:01:45,862
<i>Алекс и моите трима друзя.</i>

4
00:01:46,112 --> 00:01:49,783
Пит, Джорджи и Дим.

The first row of each subtitle is the subtitle number, the second row contains the in- and out-cue timecodes, and the following rows contain the subtitle text. In subtitles 2 and 3 the whole row is in italics, marked with <i> and </i>.

From the File menu, choose "Advanced Import..." and then "Custom Text Format", and select the file. When the import window appears, set it up as follows.

First choose the appropriate Code Page. In this example it is 1251 ANSI Cyrillic, because the subtitle text is in Cyrillic.

Looking at the file, you can see that plain patterns and tags are enough to describe it, so leave the Regular expression syntax checkbox unchecked.

There are no comments in the file, and there is no file header or footer, so leave the Comment line, Subtitles begin and Subtitles end fields empty.

What you do need to fill in is the Subtitle header field. Enter:

<num><new_line><in_cue hh:mm:ss,nnn> --> <out_cue hh:mm:ss,nnn><skip_eol><new_line>

 

<num> matches the subtitle number on the first row of each subtitle.

<new_line> is needed because the timecode is on the row below the subtitle number.

<in_cue hh:mm:ss,nnn> --> <out_cue hh:mm:ss,nnn> reads the in- and out-cue timecodes of the current subtitle. Note the separator between the in- and out-cues: here it is "-->", but it could be almost any text, so check the original file and enter the separator exactly as it appears, including any spaces before and after it.

<skip_eol> then skips any remaining characters up to the end of the line.

The final <new_line> is needed because the subtitle text starts on the row after the header.

After setting the Subtitle header, move on to the Row delimiter field. Here it should be <new_line>, because the individual text rows are separated by line breaks.

Because some rows in this file are in italics, you can keep that formatting as well by entering <i> in the Italic start field and </i> in the Italic stop field.

In the Subtitle footer field, enter <new_line><new_line>, because in the original file an empty line always separates consecutive subtitles.

Then check your settings by clicking the Preview button.

After a few seconds EZConvert will display the imported subtitles exactly as they will appear within the program.
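As a rough model of what the Example I settings do, here they are translated by hand into a single Python regular expression. This is an illustration under assumptions, not EZConvert's implementation: parse_example_one is a hypothetical name, the footer also accepts the end of the file, and a row is counted as italic simply when it starts with <i>.

```python
import re

# Hand translation of the Example I settings (an illustration, not EZConvert code):
#   Subtitle header: <num><new_line><in_cue hh:mm:ss,nnn> --> <out_cue hh:mm:ss,nnn><skip_eol><new_line>
#   Subtitle footer: <new_line><new_line>
#   Row delimiter:   <new_line>      Italic start/stop: <i> and </i>
TIME = r"\d{2}:\d{2}:\d{2},\d{3}"
HEADER = rf"(\d+)\n({TIME}) --> ({TIME})[^\n]*\n"
FOOTER = r"(?:\n\n|\n?\Z)"        # blank line, or the end of the file
SUBTITLE = re.compile(HEADER + r"(.*?)" + FOOTER, re.DOTALL)

def parse_example_one(text):
    text = text.replace("\r\n", "\n")
    subtitles = []
    for num, in_cue, out_cue, body in SUBTITLE.findall(text):
        rows = body.split("\n")                          # Row delimiter
        italic = [row.startswith("<i>") for row in rows]
        rows = [re.sub(r"</?i>", "", row) for row in rows]
        subtitles.append((int(num), in_cue, out_cue, rows, italic))
    return subtitles


# Reading the real file would need the 1251 code page, e.g.
# open(path, encoding="cp1251").read(), where path is your file.
sample = ("1\n00:00:39,254 --> 00:00:44,384\nПОРТОКАЛ С\nЧАСОВНИКОВ МЕХАНИЗЪМ\n\n"
          "2\n00:01:40,982 --> 00:01:42,692\n<i>Това съм аз.</i>\n")
for num, in_cue, out_cue, rows, italic in parse_example_one(sample):
    print(num, in_cue, out_cue, len(rows), italic)
# 1 00:00:39,254 00:00:44,384 2 [False, False]
# 2 00:01:40,982 00:01:42,692 1 [True]
```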

Example II.

If you are given a file with subtitles listed as:

c 3 00000002 00000065 'Life and how to live it' 'Episode 1' 0
c 1 00000345 00000415 'In fact every end is a beginning.' 'As well as every beginning is an end in fact.' 0
c 1 00000417 00000489 'Hello Jonathan.' '-Hello Jane. How are you?' 0
c 1 00000493 00000579 'Doing time, you know. Same thing different day. How about you?' 0

Here the in- and out-cues are given not as hours, minutes, seconds and so on, but as frame counts. In addition, each subtitle line is enclosed in single quote (') characters.

Here is how to import this file into EZConvert.

Open the Custom Text Import window as in the previous example.

This time, switch on the Regular expression syntax option.

Then, so that EZConvert imports the subtitles correctly, enter the following in the Subtitle header field:

\D\s\d\s<in_cue frames>\s<out_cue frames>\s'

\D matches the non-digit character at the beginning of each line in the original file; in this case it is "c".

\s matches the space after that first character.

\d matches the single digit that follows ("3" or "1" here).

Another \s matches the next space.

<in_cue frames> reads the in-cue, which the original file gives in frames.

The next \s matches the space between the in- and out-cues, and <out_cue frames> reads the out-cue. Finally, \s' matches the following space and the opening quote character.

Next, specify the Row delimiter. In this case it should be '\s' (a closing quote, a space and an opening quote), as seen in the file.

Then specify the Subtitle footer. In this example it should be:

'\s\d<new_line>

' matches the closing quote of the last text line, \s matches the following space, and \d skips the "0" character. Finally, <new_line> marks the end of the current subtitle; the next one begins on the following row of the original file.

The last step is to preview the imported subtitles.
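For reference, here is a Python sketch of the Example II settings, using Python's re module for the header, row delimiter and footer patterns. It is a hand translation under assumptions: the helper name parse_example_two is hypothetical, the footer also accepts the end of the text (so a last line without a line break still matches), and converting frames to seconds assumes 25 fps.

```python
import re

# Hand translation of the Example II settings (assumptions for illustration):
#   Subtitle header: \D\s\d\s<in_cue frames>\s<out_cue frames>\s'
#   Row delimiter:   '\s'
#   Subtitle footer: '\s\d<new_line>
HEADER = r"\D\s\d\s(?P<in_cue>\d+)\s(?P<out_cue>\d+)\s'"
FOOTER = r"'\s\d(?:\r?\n|\Z)"
SUBTITLE = re.compile(HEADER + r"(?P<body>.*?)" + FOOTER)
ROW_DELIMITER = re.compile(r"'\s'")

def parse_example_two(text, fps=25.0):
    subtitles = []
    for m in SUBTITLE.finditer(text):
        rows = ROW_DELIMITER.split(m.group("body"))
        start = int(m.group("in_cue")) / fps      # frames -> seconds
        end = int(m.group("out_cue")) / fps
        subtitles.append((start, end, rows))
    return subtitles


sample = ("c 3 00000002 00000065 'Life and how to live it' 'Episode 1' 0\n"
          "c 1 00000417 00000489 'Hello Jonathan.' '-Hello Jane. How are you?' 0\n")
for sub in parse_example_two(sample):
    print(sub)
# (0.08, 2.6, ['Life and how to live it', 'Episode 1'])
# (16.68, 19.56, ['Hello Jonathan.', '-Hello Jane. How are you?'])
```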