TSEP (The Search Engine Project) has been in development since 2004. Girish started the project because a site of up to approximately 150 pages needed a search engine for its pages. He went through the Open Source search engine software available at the time, but was unable to find any that was easy for a webmaster to understand and set up. Since then he has built and improved this software and will continue to do so. Olaf joined with v0.9 beta and has been responsible for the main development and the documentation since then.
The primary objective of this software is 'ease of use'. If you still think that this software is difficult to set up and/or use, please let us know: we will help you personally, and your feedback makes us aware of the complexities. By submitting this software to the Open Source community we strive for it to become the most powerful personal site search engine in the world.
We have made every effort to make the copyright notice nice and small, so please do not remove it from your site, so that others too can discover what a great tool TSEP is.
We are very interested in where TSEP is being used. We would therefore really appreciate it if you could contact us with the web address where we can take a look (and grab a screenshot) or - if it's an intranet - send us a screenshot.
A word about versioning: we publish a new version whenever we think one is ready. A version number does not indicate the quantity or complexity of the changes applied since its previous version. We might add a 0.001 to a version number but still have made huge changes to TSEP. In brief, we recommend downloading every new version of TSEP.
Before you start, back up your files and your database!
Also, when upgrading from a previous version, follow the installation procedure completely.
If - and only if - you have installed version 0.917 or above, you can run the 'update 0923.sql' file in phpMyAdmin to insert the new config variable that came into use in version 0.923.
Continue with the next step: configuration.php
These files - when correctly executed - will create the database tables with some starting values. Below is a screenshot of the database model:
You have now integrated TSEP into your own page. Continue with Security.
For security you might want to protect the include and the admin directory using .htaccess!
Make sure you have set all the values in the configuration.php file in the admin directory to suit your needs before you continue!
This was introduced in v0.912. Open configuration.php directly after installation and set the correct values, especially for language and the TSEP path - and of course every other value.
To populate the database with the values the search engine will search, run the file 'indexer.php' (in the admin directory) with '?index' added to the end of the address in the web browser address bar, i.e. http://www.sitename.com/admin/indexer.php?index . Now enter the details the form asks for. After you submit the form with the appropriate values, the results of the indexing appear within a few seconds. The script provides detailed information on the number of pages, the title, the URL, the size and the indexed words found by the indexing script. The entries you made in the form are also saved to the database for later re-use / re-indexing.
To run a search, open your search page or the page we prepared for you called 'tsepsearch.php' in your browser and input the words to be searched. The search words are not case sensitive.
TSEP supports boolean search if your MySQL version is 4 or higher. Below are some of the boolean search features. Important: your tables must be MyISAM tables for the boolean search to work. (They should be MyISAM if they were created by our installation files.)
The minimum length of a search term is 4, see MySQL restrictions below for details. (User defined) stopwords are not marked in the results and not used in the database query.
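The length rule and the boolean search can be pictured with a small sketch. It is written in Python purely for illustration and is not TSEP's own PHP code: the table name `tsep_search`, the column name `indexed_words` and the function name are hypothetical, and dropping short terms before the query is an assumption made here (MySQL ignores them on its own). `MATCH ... AGAINST (... IN BOOLEAN MODE)` with the `+` and `-` operators is standard MySQL full-text syntax.

```python
MIN_TERM_LENGTH = 4  # MySQL's default minimum full-text word length


def build_boolean_query(search_input, table="tsep_search", column="indexed_words"):
    """Return (sql, params) for a boolean full-text query, or None if no term is usable.

    The table and column names are placeholders, not TSEP's real schema.
    """
    # Keep only terms that are long enough once the boolean operators are stripped.
    terms = [
        term for term in search_input.split()
        if len(term.lstrip("+-")) >= MIN_TERM_LENGTH
    ]
    if not terms:
        return None
    # The search string is passed as a parameter, never pasted into the SQL text.
    sql = f"SELECT * FROM {table} WHERE MATCH ({column}) AGAINST (%s IN BOOLEAN MODE)"
    return sql, (" ".join(terms),)
```

For the input "+apple -pie juice", the term "-pie" is dropped because "pie" has only three characters, and the remaining string "+apple juice" is handed to MySQL as the query parameter.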
You can add, update and delete your own stopwords.
Stopwords are words which will not be searched on your pages. This means that when using a stopword as a searchterm, it will not be marked as a searchterm in the results.
Stopwords are not case sensitive. This means that if you enter "Apple" in the stopwords section and a user searches for "apple", this word will be treated as a stopword.
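As a rough illustration of the case-insensitive comparison, the following Python sketch separates a user's input into terms that are searched and terms that are ignored as stopwords. It is not taken from TSEP's PHP source; the function name and the return shape are assumptions made for this example.

```python
def split_search_terms(search_input, stopwords):
    """Split user input into (terms to search and mark, terms ignored as stopwords)."""
    # Lower-case the stopword list once so that "Apple" also blocks "apple".
    blocked = {word.lower() for word in stopwords}
    kept, ignored = [], []
    for term in search_input.split():
        if term.lower() in blocked:
            ignored.append(term)
        else:
            kept.append(term)
    return kept, ignored
```

With the stopword list ["Apple"] and the input "apple Pie recipe", the term "apple" lands in the ignored list while "Pie" and "recipe" are kept.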
Please note that there are MySQL restrictions as well!
This was introduced in version 0.911. The administrator can define in the setup file whether and what search activity should be logged. All log entries are accompanied by a timestamp. The admin can decide to log the following: IP address, search term and clicks on the results.
The administrator may want to analyse what users are searching for on his site and make navigation to those points easier.
The administrator can also log the IP address of the person searching. Be aware that people might not like the idea of you "spying" on them. But we thought this might be a useful feature - maybe especially for Intranets. In those, if someone is totally lost the administrator can take him by his hand and help directly.
The administrator may want to notify the users if their actions are being logged, especially when logging their IP address.
For sorting the log entries by IP address, MySQL v3.23 or higher is required.
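The logging options described above can be sketched as follows. This is an illustrative Python example, not TSEP's implementation: the setting names `log_ip`, `log_search_term` and `log_clicks` and the field names are hypothetical. It only shows the idea that every entry carries a timestamp and that each other field is stored only when the administrator has switched it on.

```python
from datetime import datetime, timezone


def build_log_entry(settings, ip_address, search_term, clicked_url=None, now=None):
    """Return a dict holding the timestamp plus only the fields that are enabled."""
    # Every entry is timestamped; "now" can be injected to make testing easier.
    entry = {"timestamp": (now or datetime.now(timezone.utc)).isoformat()}
    if settings.get("log_ip"):
        entry["ip_address"] = ip_address
    if settings.get("log_search_term"):
        entry["search_term"] = search_term
    if settings.get("log_clicks") and clicked_url is not None:
        entry["clicked_url"] = clicked_url
    return entry
```

Leaving `log_ip` switched off means the IP address never reaches the log at all, which is the simplest way to address the privacy concern mentioned above.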
Quote:
Any word that is too short is ignored. The default minimum length of words that will be found by full-text searches is four characters.
Quote:
Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so common that it is considered to have zero semantic value. There is a built-in stopword list.
For more details, see the source page of these quotes: 13.6 Full-Text Search Functions
The restrictions are covered in 13.6.3 Full-Text Restrictions
People with access to the MySQL server can, however, fine-tune their MySQL to overcome these restrictions. You can find information about this in 13.6.4 Fine-Tuning MySQL Full-Text Search
You will find more on the built-in MySQL stopwords by searching the MySQL site for "stopword list".
Personally I do not see the big problem about the built-in stopwords because they are so general that probably no one really trying to find something will enter "you" as a search word. Searching is nothing new to people so that they will enter words which they think match what they need best. This also comes down to that they will enter words which are probably long enough not to fall under the length restriction. Also those are English words and TSEP is now ready for other languages as well. (Olaf)
The version of TSEP is included in the 'title' tag of the copyright notice. This means that you can move your cursor over the copyright notice (on the bottom of the search page for example) and after a little while your browser should display the version number.
The version number is read from a textfile in the include directory named tsepversion.txt. There is no need to change anything in this file: it is maintained by the programmers.
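The mechanism can be sketched like this: read the version from the text file and place it in the 'title' attribute of the notice. The Python below is only an illustration, not TSEP's PHP code; the `span` element, the function name and the notice text "Powered by TSEP" are placeholders chosen for the example.

```python
from html import escape
from pathlib import Path


def copyright_notice(version_file, notice_text="Powered by TSEP"):
    """Build a notice whose title attribute (the browser tooltip) shows the version."""
    # The file is expected to contain nothing but the version string.
    version = Path(version_file).read_text(encoding="utf-8").strip()
    return f'<span title="TSEP {escape(version, quote=True)}">{escape(notice_text)}</span>'
```

Because browsers show the 'title' attribute as a tooltip, hovering over the notice reveals the version without taking up any space on the page.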
If you decide to create a new language please mail us the language.php file which you created, so that we can add it to the next version.
Language files define the PHP variables which are being used in the TSEP files. Place the language.php into a subdirectory of the language directory. Let's say you are creating a Spanish version:
Some people asked how they can delete a word from the index or correct a word. In version 0.910 we introduced the possibility to do this right from TSEP. Follow the link to the "Index Editing" on the indexer.php page.
There are also 2 other possibilities (old, but they still work):
At this time you will have to change the code. We are planning to put this as a configuration possibility into the config.php file.
For now follow these steps to add more filetypes to TSEP to index. Please test your changes. You will not be able to index any binary data of course!
Rank means that all pages are shown ordered by the number of hits they received from all search words together. Example: you get 2 results after a search; on the page with rank 1 the search words were found more often than on the page with rank 2 - simple but very useful if you have many pages on your site and the user might face lots of results.
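A minimal sketch of this ordering, written in Python for illustration only (TSEP itself is PHP, and the function names here are made up): count how often the search words occur on each page, then sort the pages by that count and number them starting with rank 1.

```python
def count_hits(page_text, search_words):
    """Count how many times any of the search words occurs in the page text."""
    words = page_text.lower().split()
    return sum(words.count(word.lower()) for word in search_words)


def rank_pages(hits_per_page):
    """Order pages by total hits, highest first, and number them from rank 1."""
    ordered = sorted(hits_per_page.items(), key=lambda item: item[1], reverse=True)
    return [(rank, url, hits) for rank, (url, hits) in enumerate(ordered, start=1)]
```

For example, a page with 7 hits is placed at rank 1 ahead of a page with 3 hits, regardless of the order in which the pages were found.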
This is simple but takes a little while. To make things as easy as we can, we will go through the result page step by step. The formatting we show you here is from version 0.911. It might change in the future but will stay pretty much the same.
Please note that there are additional div blocks in the search page. Those are only shown when errors occur (a stopword was searched, the MySQL version is too low...). We therefore leave it up to you for now to look more deeply into this formatting and, for the general user's sake, stick with what most people will see.
If you have done some nice formatting, we would appreciate it if you could contact us and send us your CSS file so that we can include it in a new TSEP version.
All of TSEP, on all TSEP pages, is wrapped in the following div container, which provides a global area for TSEP.
With this knowledge alone you can already change the look considerably, for example by setting the .tsepProject class in the tsep.css file to another font. This will change all fonts in the TSEP area to whatever you define.
Now that you know the header, let's look at the next part of the search page: the .SearchBlock, which contains the search form fields and the help - which, as you can see, has its own div container, .SearchHintsHelp .
This SearchBlock is followed by another .SearchBlock which provides status information. This whole block is repeated at the bottom of all search results. If you know a little about CSS you should be able to format this block to fit your needs.
This first container of this type is followed by our search results. Here we use the following classes:
.SearchResultAllPagesBlock - this is the block of all the results.
.SearchResultOnePageBlock - this is a block of one resulting page.
.SearchResultOnePageTitle - this is the title of the webpage we found in the database.
.resultnumber - this is the rank of the page. (details: rank).
.SearchResultPageRank - displays how many times the page had a hit from the searchwords.
.SearchResultOutput - these are the words which we indexed - until we encounter the first "explode" character (a . (dot) right now).
.foundSearchWord - this is one of the words the user has searched for. It can be styled distinctively so that the user spots it faster.
.SearchResultOutputMore - these are the little dots which show the user there is more on the page.
.SearchResultURL - is the URL of the page we have found, extended by the size of the page (as written in the database).
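How the .SearchResultOutput, .foundSearchWord and .SearchResultOutputMore classes fit together can be sketched as follows. This is a Python illustration under stated assumptions, not TSEP's PHP output code: the class names come from the list above, while the use of `span` elements, the literal "..." for the dots and the function name are guesses made for the example.

```python
import re
from html import escape


def render_excerpt(indexed_text, search_words, explode_char="."):
    """Render the text up to the first explode character, marking the searched words."""
    head, separator, rest = indexed_text.partition(explode_char)
    words = [word for word in search_words if word]
    parts = []
    if words:
        # One pass over the raw text, so inserted markup is never matched again.
        pattern = re.compile("|".join(re.escape(word) for word in words), re.IGNORECASE)
        position = 0
        for match in pattern.finditer(head):
            parts.append(escape(head[position:match.start()]))
            parts.append(f'<span class="foundSearchWord">{escape(match.group(0))}</span>')
            position = match.end()
        parts.append(escape(head[position:]))
    else:
        parts.append(escape(head))
    output = f'<span class="SearchResultOutput">{"".join(parts)}</span>'
    # Show the dots only when text follows the explode character.
    if separator and rest.strip():
        output += '<span class="SearchResultOutputMore">...</span>'
    return output
```

A rule for .foundSearchWord in tsep.css (bold text or a background colour, for instance) would then make the matched words stand out in every result.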
"Warning: set_time_limit(): Cannot set time limit in safe mode in /..../tsep/admin/indexer.php on line 110"
This is nothing really important. It shows only in the admin area. The error occurs when safe mode is switched on on the server. No other problems concerning safe mode are known at this time.
You might run into problems with MySQL v3.23 or lower. If you are running such a version we would be happy to hear whether TSEP is working for you and what kind of problems you have encountered.
It seems that with MySQL 5 alpha there are problems concerning indexer.php. We will assume for now that this is an issue of the new MySQL version.
Try populating the $db_table_prefix.config table by hand (using phpMyAdmin for example) with the values you find in the SQL dump.
Warning: array_multisort(): Array sizes are inconsistent in /srv/www/htdocs/blabla/php/tsepsearch/search.php on line 410
You will notice that the results are not sorted correctly.
Some entries you made in the indexer (when creating a new index) are wrong. Please check and maybe index your site again. You can also look for indexed pages with zero (0) words in the index.
This software has been tested on Windows and Linux systems with Apache as web server running PHP v4.2 or greater and MySQL (v4 or greater for boolean capabilities). 'allow_url_fopen' option should be enabled for PHP.
Please mail us any suggestions or questions you may have or post them to the Sourceforge forums. We welcome any response. If you need help or "something does not work", please include the version number of TSEP you are using.
Software by: Olaf Noehring (main development since 0.9beta (excluding)) and Girish R (main development until 0.9beta (including))
Version: TSEP 0.9nnn
This file has been last modified on:
2004-09-01 9:38 AM
by Olaf Noehring
Copyright (c) 2002-2004, Girish R & Olaf Noehring. All Rights Reserved.
Support & Info (Summary on Sourceforge): http://sourceforge.net/projects/tsep/
Contact: Olaf Noehring (email on website: http://www.team-noehring.de) or Girish R at:
girishr at gmail.com with your comments, suggestions, enquiries or requirements.
This file is part of TSEP (The Search Engine Project)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA