Create an import filter
Before starting to define an import filter, first copy and paste a sample record into the "Example" field so that you can easily check format details while working on the import filter. For each import filter, there are some properties that are common to all the sub-filters such as the record separator, the way to parse author names, the date format, etc.
About
Record
Each record in the import file has to be separated by a unique marker. This can be a blank line, special text indicating the beginning of a record, or special text indicating the end of a record.
Note: When you use text to match a tag field, a regular expression can be used. To do so, put the regular expression inside "RE(...)RE" so Biblioscape knows that the regular expression engine should be used instead of simple word matching. For example, NLM PubMed uses the tag "PMID - " as the first tag, but some sites put one space before "-" and other sites put two spaces before "-". To make your import filter work for both sources, you can use the regular expression RE(PMID *- )RE as the First Tag separator. The star in a regular expression means that the preceding character, here a space, may occur zero or more times, so the first tag is matched in both cases.
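The pattern in the note can be tried outside Biblioscape. The following is a minimal sketch that uses Python's `re` module as a stand-in for Biblioscape's regular expression engine, which may differ in details. The "RE(...)RE" wrapper is Biblioscape syntax and is left out here; the sample lines and the helper name `starts_record` are made up for illustration.

```python
import re

# The pattern from the note: "PMID", any number of spaces, then "- ".
FIRST_TAG = re.compile(r"PMID *- ")

# Made-up sample lines, one per spacing variant described in the note.
samples = [
    "PMID - 12345678",   # one space before the hyphen
    "PMID  - 12345678",  # two spaces before the hyphen
]

def starts_record(line):
    """Return True when the line begins with the first tag."""
    return FIRST_TAG.match(line) is not None

for line in samples:
    print(repr(line), starts_record(line))
```

Both sample lines are reported as matches, while a line that starts with a different tag is not.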
Date and Others
How to parse dates: Specify the format of the date in your import file, then enter the separator text that appears between day, month and year. We recommend you select "Smart Parsing" as the Date Format unless you are sure all dates are displayed in the same format. With "Smart Parsing", Biblioscape can import the date correctly in several cases. If "Smart Parsing" is selected, there is no need to specify "First Date Separator" and "Second Date Separator".
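To picture what a fixed date format with two separators means, here is a minimal sketch in Python. It is not Biblioscape's parser and does not attempt "Smart Parsing"; the function name `split_date`, its parameters, and the sample dates are assumptions made for illustration.

```python
def split_date(text, order, first_sep, second_sep):
    """Split a date string into three named parts.

    order      -- names of the three parts in the order they appear
    first_sep  -- text between the first and second part
    second_sep -- text between the second and third part
    """
    first, rest = text.split(first_sep, 1)
    second, third = rest.split(second_sep, 1)
    parts = [first.strip(), second.strip(), third.strip()]
    return dict(zip(order, parts))

# Day/month/year separated by slashes.
print(split_date("12/Mar/1998", ("day", "month", "year"), "/", "/"))

# Year, month, day with a space first and a comma second.
print(split_date("1998 Mar,12", ("year", "month", "day"), " ", ","))
```

A fixed format like this fails as soon as one record uses different separators, which is the situation where a more tolerant setting is the safer choice.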
Identifiers for imports that require multiple filters: Some information providers, like Ovid Technologies, may include records from several databases with different formats in one file. In such cases, identifiers are needed to tell Biblioscape which import filter to use. For Biblioscape import filters that support such multi-format files, vendor and database identifiers need to be added so that they match the ones used in the file from data providers like Ovid. This is how it works: when Biblioscape finds a tag that has been entered as the "Vendor Identifier", it uses the "Database Identifier" to find the import filter that matches both tags, and applies that filter to the record. If the next record carries a different database identifier, the import filter matching that identifier is used instead. Here is an example of the first two lines of a file from Ovid.
VN - Ovid Technologies
DB - MEDLINE
.......
Vendor identification: The tag that identifies the data provider should be put into the Tag box. For example: "VN - ". The data provider's name should be put into the Text box. For example: "Ovid Technologies".
Database identification: The tag that identifies the database should be put into the Tag box. For example: "DB - ". The database name should be put into the Text box. For example: "MEDLINE".
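The selection of a filter per record can be sketched as a lookup keyed by both identifiers. This is only an illustration in Python of the behaviour described above, not Biblioscape's implementation. The filter names, the second database entry ("EMBASE"), the "TI - " lines, and the function `pick_filters` are hypothetical.

```python
# Tag and text values as they would be entered in the Tag and Text boxes.
VENDOR_TAG = "VN - "
DATABASE_TAG = "DB - "

# Hypothetical registry: (vendor text, database text) -> filter name.
FILTERS = {
    ("Ovid Technologies", "MEDLINE"): "Ovid MEDLINE filter",
    ("Ovid Technologies", "EMBASE"): "Ovid EMBASE filter",  # illustrative only
}

def pick_filters(lines):
    """Return the filter chosen for each record, in file order."""
    chosen = []
    vendor = None
    for line in lines:
        if line.startswith(VENDOR_TAG):
            # A vendor tag was found; remember the vendor name.
            vendor = line[len(VENDOR_TAG):].strip()
        elif line.startswith(DATABASE_TAG) and vendor is not None:
            # The database tag completes the pair used for the lookup.
            database = line[len(DATABASE_TAG):].strip()
            chosen.append(FILTERS.get((vendor, database)))
    return chosen

# A made-up file with two records from different databases.
sample = [
    "VN - Ovid Technologies",
    "DB - MEDLINE",
    "TI - First title",
    "VN - Ovid Technologies",
    "DB - EMBASE",
    "TI - Second title",
]
print(pick_filters(sample))
```

A record whose identifier pair is not registered yields `None` in this sketch, which corresponds to no matching import filter being found.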
Authors and Keywords
Replace or Remove
The import file may contain some characters you want to remove or replace. For example, DIALOG search results will put “|” at the end of each data field. To remove this unwanted character, type the field tag into the “Limit changes to Field or Tag” box, then enter “|” in the “Find what” box and leave the “Replace with” box blank.
You can also change the case of imported text to “Title case”, “Sentence case”, “Lower case”, or “Upper case”.
You can limit the “Replace or Remove” operation to a tag field or to a data field. A tag field means the text after a tag in the import file. For example: "AU: Smith, K.; Bowen, P." is a tag field. To limit the "Replace or Remove" operation to a tag field, type the tag into the edit box. A data field means the text of a data field in your database. To limit the "Replace or Remove" operation to a data field, select the "Data Field" on the right and click the left arrow button.
If you limit the changes to a tag field, the "Replace or Remove" operation is applied before the data is parsed. If you limit the changes to a data field, the "Replace or Remove" operation is applied after the data is parsed by the import filter. When importing one tag field into multiple data fields, it is better to limit “Replace or Remove” to a data field instead of a tag field, such as when importing the following text:
JN: Journal-of-Public-Policy; 1985, 5, 133-153.
The “-” has to be replaced by a space “ ”. If you limit the “Replace or Remove” operation to the tag field “JN:”, the hyphen “-” between the start page and end page will be replaced as well. It is better to limit the “Replace or Remove” operation to the data field “Journal, Book, etc.”, which can be selected from the combo box on the right. Click the arrow button to move the selected data field into the “Limit changes…” box.
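The difference between the two limits can be reproduced with the "JN:" line above. The sketch below is in Python and assumes a simple, hypothetical parser (`parse_jn`) that splits the tag field into journal, year, volume and pages; the real import filter defines this split through its own settings.

```python
line = "JN: Journal-of-Public-Policy; 1985, 5, 133-153."

def parse_jn(text):
    """Hypothetical parser: split one JN tag field into data fields."""
    journal, rest = text.split(";", 1)
    year, volume, pages = [part.strip() for part in rest.split(",")]
    return {
        "journal": journal.strip(),
        "year": year,
        "volume": volume,
        "pages": pages.rstrip("."),
    }

tag_text = line[len("JN:"):].strip()

# Limited to the tag field: replace first, then parse.
# Every hyphen is replaced, including the one in the page range.
before = parse_jn(tag_text.replace("-", " "))

# Limited to the data field: parse first, then replace in the journal only.
after = parse_jn(tag_text)
after["journal"] = after["journal"].replace("-", " ")

print(before["journal"], "|", before["pages"])
print(after["journal"], "|", after["pages"])
```

In both cases the journal becomes "Journal of Public Policy", but only the second variant keeps the page range as "133-153".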
If you want to insert text into a field during import, leave the “Find what:” edit box blank and type the words you want to insert into the “Replace with:” edit box. The text will be inserted at the end of the field. If you put “^” at the beginning of the text in the “Replace with:” edit box, the text will be inserted at the beginning of the field. For example, you may want to insert the words “Found in library” at the end of the “Custom 1” field for all the records you import. To do so, limit the changes to the "Custom 1" data field, leave the "Find what:" box blank and enter "Found in library" in the "Replace with:" box.
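The insertion rule can be written down in a few lines. This Python sketch assumes plain concatenation with no separator added, which may not match exactly what Biblioscape does; the function name `insert_text` and the sample field value are made up.

```python
def insert_text(field_value, replace_with):
    """Mimic a blank "Find what:" box: add text to a field.

    A leading "^" in replace_with means "insert at the beginning";
    otherwise the text is appended to the end of the field.
    """
    if replace_with.startswith("^"):
        return replace_with[1:] + field_value
    return field_value + replace_with

custom_1 = "Shelf 12. "  # made-up existing content of "Custom 1"

print(insert_text(custom_1, "Found in library"))
print(insert_text(custom_1, "^Found in library. "))
```

The first call yields "Shelf 12. Found in library"; the second puts the inserted words in front of the existing content.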
Note: Regular expressions can also be used to find the text that should be replaced or removed. To use one, put the regular expression inside "RE(...)RE" so Biblioscape knows that the regular expression engine should be used instead of simple word matching. The real power lies in the use of regular expressions. If you know regular expressions, you can use them to pre-process a tag field with powerful pattern matching and make some seemingly impossible things happen.
Multiple Lines
If the text of a tagged field takes more than one line, Biblioscape will combine all the lines according to the following rule: If the tagged field is “Authors” or “Keywords”, the lines will first be trimmed then joined by “; ”. For other fields, the lines will first be trimmed then joined by “ ”.
When making an import filter, you have to take this trimming and joining of multiple lines into account. Consider an import filter for the following tagged field:
AU Kurita Y. Masuda H. Suzuki K. Fujita K.
   Kawabe K.
The second line will be trimmed automatically by Biblioscape and combined with the first one by “; ”, so the final text will look like:
AU Kurita Y. Masuda H. Suzuki K. Fujita K.; Kawabe K.
You should therefore use “Find & Replace” to replace “ ” with “; ” in the tag field “AU ”, and use “; ” as the separator between individual authors.
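The joining rule for multi-line fields can be sketched as follows. This is an illustration in Python of the rule as described, not Biblioscape's code; the function name `join_lines` and the title example are made up, and the author lines are the ones from the example above with the tag removed.

```python
# Fields whose lines are joined with "; " according to the rule above.
LIST_FIELDS = {"Authors", "Keywords"}

def join_lines(field_name, lines):
    """Trim each line, then join with "; " or " " depending on the field."""
    trimmed = [line.strip() for line in lines]
    separator = "; " if field_name in LIST_FIELDS else " "
    return separator.join(trimmed)

authors = join_lines(
    "Authors",
    ["Kurita Y. Masuda H. Suzuki K. Fujita K.", "   Kawabe K."],
)
print(authors)

# Made-up two-line title to show the rule for other fields.
title = join_lines("Title", ["A long title that", "   wraps onto a second line"])
print(title)
```

Only the line break turns into “; ” in the authors text; the names that share a line are still separated by whatever the source used, which is why the extra replace step is needed.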