Create an import filter


Before starting to define an import filter, first copy and paste a sample record into the "Example" field so that you can easily check format details while working on the import filter. For each import filter, there are some properties that are common to all the sub-filters such as the record separator, the way to parse author names, the date format, etc.

 

import_filter_edit

About

Import Filter: When creating a new import filter, give it a name using the edit box at the top.
Based On: If the new import filter is based on an existing one, you should enter the original import filter name. If you click the BaseOn button in the Import Filters window, the "Based On" field is automatically filled.
Category: You can tag an import filter with any categorizing scheme. For example: Life Sciences; Chemistry; etc. If you add more than one category, separate them with semicolons.
Provider: Enter the database provider name. Commercial citation databases are owned by a few companies. Enter the company name here if the import filter is made for such a commercial database. If your import filter is made for a university library catalog, the university name should be entered here.
Database: Enter the database name. Each commercial citation database usually has a well-known name such as PsycINFO. The database may be sold from one provider to another, but the database name usually does not change. The same database could also be available from more than one provider. For example, the Medline database is produced by the National Library of Medicine, but it is also available from several commercial providers.
Favorite: Check the "Favorite" box if you want to add it to the Favorite list. In the import filters window, you can easily display Favorite import filters by clicking the Favorite button.
Last Update: Enter the date of your last update as an eight-digit integer in the form YYYYMMDD. For the date January 22, 2009, 20090122 should be entered.
Comments: Enter your notes about this import filter in the Comments field. For example, how data should be prepared for a smooth import, which data sources this import filter can be used with, etc.
Example: Always paste an example first so you can easily find out what the data looks like when designing an import filter.

import_filter_record

Record

Each record in the import file has to be separated by a unique marker. This can be a blank line, special text indicating the beginning of a record, or special text indicating the end of a record.

Blank Line: Each record is separated by a blank line. Biblioscape will create a new record once it reads a blank line. You should not select "Blank Line" until you are sure all records are separated by blank lines and blank lines are not present in a single record. For example, if there is a blank line inside the Abstract field, Biblioscape will treat part of the abstract as a separate record. This will cause only part of a record to be imported. If there are multiple blank lines between each record, you can still select this option.
Sep. Text: If records in your file are separated by special characters or text, enter it here. For example, each record may be separated by "--------" in a line, or there may be repeated text before every record, like "Produced by Science Data Corporation ...". If such text appears at the beginning or end of each record, you can use it as the separator.
First Tag: Each field in the import file should be tagged. If there is a tag field consistently placed at the top of each record, you can use this tag as a record separator. For example, Title or Authors fields are usually used as the first tag. In the screenshot shown above, if the tag "%0" is used as the first tag for every record, you can use it as the separator. Using a tag as a separator does not affect the text to be imported into a data field.
Last Tag: If a tagged field appears consistently at the end of each record, you can use it as the record separator. For example, some commercial database providers put the database provider name as the last tag for each record. You can then use it as the separator and can still map this tag to a database field.
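
The difference between the "Blank Line" and "First Tag" options can be illustrated with a short Python sketch. This is not how Biblioscape is implemented; the function names are hypothetical, and the sample records use "%0" from the text above plus "%T" and "%X" as assumed title and abstract tags. It only shows why a blank line inside a field breaks blank-line splitting while first-tag splitting is unaffected.

```python
import re


def split_on_blank_lines(text):
    """Split text into records at every run of one or more blank lines."""
    chunks = re.split(r"\n\s*\n", text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def split_on_first_tag(text, first_tag):
    """Start a new record at every line that begins with first_tag."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(first_tag) and current:
            records.append("\n".join(current))
            current = []
        if line.strip():  # blank lines never split a record here
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


# Two records separated by several blank lines.
clean = "%0 Journal Article\n%T First title\n\n\n%0 Book\n%T Second title\n"
# One record with a blank line inside a field (a two-paragraph abstract).
broken = "%0 Journal Article\n%X First paragraph.\n\nSecond paragraph.\n"

print(len(split_on_blank_lines(clean)))       # 2
print(len(split_on_first_tag(clean, "%0")))   # 2
print(len(split_on_blank_lines(broken)))      # 2, the abstract is cut in half
print(len(split_on_first_tag(broken, "%0")))  # 1
```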

Note: When you use text to match a tag field, a regular expression can be used. Put the regular expression inside "RE(...)RE" so Biblioscape knows that the regular expression engine should be used instead of simple word matching. For example, NLM PubMed uses the tag "PMID - " as the first tag, but some sites put one space before "-" and other sites put two spaces before "-". To make your import filter work for both sources, you can use the regular expression RE(PMID *- )RE as the First Tag separator. The star in a regular expression means that the preceding character, here a space, may occur zero or more times, so the first tag is found in both cases.
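
The pattern from the note can be tried out in Python, whose `re` module is used here only as a stand-in; Biblioscape's own regular expression engine may differ in details. The helper `compile_tag` is hypothetical and simply mimics the "RE(...)RE" convention: wrapped text is treated as a regular expression, anything else as literal text.

```python
import re


def compile_tag(tag_text):
    """Treat RE(...)RE as a regular expression, anything else as literal text."""
    if tag_text.startswith("RE(") and tag_text.endswith(")RE"):
        return re.compile(tag_text[3:-3])
    return re.compile(re.escape(tag_text))


first_tag = compile_tag("RE(PMID *- )RE")
literal_tag = compile_tag("PMID - ")

lines = ["PMID - 12345678", "PMID  - 12345678", "TI  - Some title"]

# The regular expression accepts any number of spaces before the hyphen.
print([bool(first_tag.match(line)) for line in lines])    # [True, True, False]
# The literal tag only matches the one-space variant.
print([bool(literal_tag.match(line)) for line in lines])  # [True, False, False]
```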

 

import_filter_date

Date and Others

How to parse Date: Specify the format of dates in your import file, then enter the separator text that appears between day, month, and year. We recommend selecting "Smart Parsing" as the Date Format unless you are sure all dates in the file use the same format. With "Smart Parsing", Biblioscape can import dates correctly across several formats. If "Smart Parsing" is selected, there is no need to specify "First Date Separator" and "Second Date Separator".

 

Identifiers for imports that require multiple filters: Some information providers, like Ovid Technologies, may include records from several databases with different formats in one file. In such cases, identifiers are needed to tell Biblioscape which import filter to use. Each Biblioscape import filter that supports such multi-format files needs vendor and database identifiers that match the ones used in the file from the data provider. This is how it works: when Biblioscape finds a tag that has been entered as "Vendor Identifier", it uses the "Database Identifier" to find the import filter that matches both tags, and applies that import filter to the record. If the next record carries a different database identifier, another matching import filter is used for that record. Here is an example of the first two lines of a file from Ovid.

 

VN - Ovid Technologies

DB - MEDLINE

.......

 

Vendor identification: The tag that identifies the data provider should be put into the Tag box. For example: "VN - ". The data provider's name should be put into the Text box. For example: "Ovid Technologies".

Database identification: The tag that identifies the database should be put into the Tag box. For example: "DB - ". The database name should be put into the Text box. For example: "MEDLINE".
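
The matching described above can be pictured as a lookup keyed on the vendor and database text. The Python sketch below is only an illustration under assumptions: the filter names in `FILTERS` and the function `pick_filter` are hypothetical, and the tags are the "VN - " and "DB - " examples from the text.

```python
VENDOR_TAG = "VN - "
DATABASE_TAG = "DB - "

# Hypothetical import filters, keyed by (vendor text, database text).
FILTERS = {
    ("Ovid Technologies", "MEDLINE"): "Ovid MEDLINE filter",
    ("Ovid Technologies", "PsycINFO"): "Ovid PsycINFO filter",
}


def pick_filter(record_lines, filters=FILTERS):
    """Return the filter whose vendor and database identifiers both match."""
    vendor = database = None
    for line in record_lines:
        if line.startswith(VENDOR_TAG):
            vendor = line[len(VENDOR_TAG):].strip()
        elif line.startswith(DATABASE_TAG):
            database = line[len(DATABASE_TAG):].strip()
    return filters.get((vendor, database))  # None when nothing matches


record = ["VN - Ovid Technologies", "DB - MEDLINE"]
print(pick_filter(record))  # Ovid MEDLINE filter
```

Because the lookup runs once per record, consecutive records in the same file can resolve to different filters.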

 

import_filter_authors

Authors and Keywords

Author name format: The author name could be formatted differently in different sources, so make sure you select the "Author name format" that matches the ones in your file. If there are variations in author name format, you can pick “Smart Parsing” which will work in most cases.
What separates each author: The author field in most reference files has more than one author, so you need to specify a separator between each author in the box "What separates each author". The most popular separator used is "; ". For example: Smith, K.; Bowen, P.
What separates each keyword: If there is more than one keyword on a line, enter the symbol that separates the keywords in the box "What separates each keyword". The most common separators are "; " and ", ". For example: Nucleoside; Cancer; ...
Parenthetical data in "Authors" field: Some citation database providers put author-related data inside parentheses next to the author name. The most common data are the author's role, date of birth, etc. For example: Smith, K. (1945-). You can choose to discard the text, import it into another data field, or keep it as is. If you decide to keep the text along with the parentheses, there is no need to worry about this messing up your formatted citations and bibliography. Biblioscape ignores text inside parentheses when generating formatted authors.
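
The author settings above can be sketched in Python as follows. This is an assumption-laden illustration, not Biblioscape's parser: `split_authors` and `handle_parenthetical` are hypothetical names, and the three modes stand for the discard, import into another field, and keep choices.

```python
import re

PAREN = re.compile(r"\s*\(([^)]*)\)")


def split_authors(field_text, separator="; "):
    """Split an author field into individual author names."""
    return [name.strip() for name in field_text.split(separator) if name.strip()]


def handle_parenthetical(author, mode="discard"):
    """Return (name, extra) according to the chosen mode.

    discard: drop the parenthetical text
    move:    return it separately so it can go into another field
    keep:    leave the author text unchanged
    """
    found = PAREN.search(author)
    if not found or mode == "keep":
        return author, None
    name = PAREN.sub("", author).strip()
    extra = found.group(1) if mode == "move" else None
    return name, extra


print(split_authors("Smith, K.; Bowen, P."))              # ['Smith, K.', 'Bowen, P.']
print(handle_parenthetical("Smith, K. (1945-)"))          # ('Smith, K.', None)
print(handle_parenthetical("Smith, K. (1945-)", "move"))  # ('Smith, K.', '1945-')
```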

import_filter_replace

Replace or Remove

The import file may contain some characters you want to remove or replace. For example, DIALOG search results put “|” at the end of each data field. To remove this unwanted character, type the field tag into the “Limit changes to Field or Tag” box, then enter “|” in the “Find what” box and leave the “Replace with” box blank.

 

You can also change the case of imported text to “Title case”, “Sentence case”, “Lower case”, or “Upper case”.

You can limit the “Replace or Remove” operation to a tag field or to a data field. A tag field is the text after a tag in the import file. For example: "AU: Smith, K.; Bowen, P." is a tag field. To limit the operation to a tag field, type the tag into the edit box. A data field is the text of a data field in your database. To limit the operation to a data field, select the data field under "Data Field" on the right and click the left arrow button.

If you limit the changes to a tag field, the “Replace or Remove” operation is applied before the data is parsed. If you limit the changes to a data field, the operation is applied after the data is parsed by the import filter. When importing one tag field into multiple data fields, it is better to limit “Replace or Remove” to a data field instead of a tag field, such as when importing the following text:

 

JN: Journal-of-Public-Policy; 1985, 5, 133-153.

 

Each “-” in the journal name has to be replaced by a space “ ”. If you limit the “Replace or Remove” operation to tag field “JN:”, the hyphen “-” between the start page and end page will be replaced as well. It is better to limit the operation to data field “Journal, Book, etc.”, which can be selected from the combo box on the right. Click the arrow button to move the selected data field into the “Limit changes…” box.
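
The effect of the two scopes on the “JN:” line can be reproduced with a few lines of Python. The parsing step is a simplified assumption about how the tag field is divided into journal, year, volume, and pages; only the order of "replace" and "parse" matters for the illustration.

```python
line = "JN: Journal-of-Public-Policy; 1985, 5, 133-153."
tag_text = line[len("JN: "):]

# Tag-field scope: the replacement runs on the raw text before parsing,
# so the hyphen in the page range is lost too.
tag_scoped = tag_text.replace("-", " ")
print(tag_scoped)  # Journal of Public Policy; 1985, 5, 133 153.

# Data-field scope: parse first, then replace only inside the journal field.
journal, rest = tag_text.split("; ", 1)
year, volume, pages = [part.strip() for part in rest.rstrip(".").split(",")]
journal = journal.replace("-", " ")
print(journal)  # Journal of Public Policy
print(pages)    # 133-153
```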

 

If you want to insert text into a field during import, leave the “Find what:” edit box blank and type the words you want to insert into the “Replace with:” edit box. The text will be inserted at the end of the field. If you put “^” at the beginning of the text in the “Replace with:” edit box, the text will be inserted at the beginning of the field. For example, you may want to insert the words “Found in library” at the end of field “Custom 1” for all the records you import. To do so, limit the changes to the "Custom 1" data field, leave the "Find what:" box blank, and enter "Found in library" in the "Replace with:" box.
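
The insertion rule can be summarized in a small Python sketch. The function `insert_text` is hypothetical, and the sketch assumes that no extra space or punctuation is added between the existing field text and the inserted text.

```python
def insert_text(field_value, replace_with):
    """Mimic an empty "Find what:" box: append, or prepend when the text starts with "^"."""
    if replace_with.startswith("^"):
        return replace_with[1:] + field_value
    return field_value + replace_with


print(insert_text("", "Found in library"))                  # Found in library
print(insert_text("Shelf 3. ", "Found in library"))         # Shelf 3. Found in library
print(insert_text("Shelf 3", "^Found in library: "))        # Found in library: Shelf 3
```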

 

Note

To find the text that should be replaced or removed, a regular expression can be used. Put the regular expression inside "RE(...)RE" so Biblioscape knows that the regular expression engine should be used instead of simple word matching. The real power of "Replace or Remove" lies in regular expressions. If you know regular expressions, you can use them to pre-process a tag field with powerful pattern matching and make some seemingly impossible things happen.

 

Multiple Lines

If the text of a tagged field takes more than one line, Biblioscape will combine all the lines according to the following rule: If the tagged field is “Authors” or “Keywords”, the lines will first be trimmed then joined by “; ”. For other fields, the lines will first be trimmed then joined by “ ”.

 

When making an import filter, you have to take this trimming and joining of multiple lines into consideration. Suppose you are designing an import filter for the following tagged field:

 

AU Kurita Y. Masuda H. Suzuki K. Fujita K.

Kawabe K.

 

The second line will be trimmed automatically by Biblioscape and combined with the first one by “; ”, so the final text will look like:

 

AU Kurita Y. Masuda H. Suzuki K. Fujita K.; Kawabe K.
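
The joining rule can be expressed as a short Python sketch. The function name `join_lines` is hypothetical; the rule itself (trim each line, then join with “; ” for Authors and Keywords and with a single space for other fields) is taken from the text above.

```python
def join_lines(field_name, lines):
    """Trim each line, then join with "; " for Authors/Keywords, else with " "."""
    joiner = "; " if field_name in ("Authors", "Keywords") else " "
    return joiner.join(line.strip() for line in lines)


author_lines = ["Kurita Y. Masuda H. Suzuki K. Fujita K.", "   Kawabe K."]
print(join_lines("Authors", author_lines))
# Kurita Y. Masuda H. Suzuki K. Fujita K.; Kawabe K.

print(join_lines("Abstract", ["first line ", "  second line"]))
# first line second line
```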

 

You should use “Find & Replace” to replace “ ” with “; ” in tag field “AU ” and use “; ” to separate individual authors.