TSEP (The Search Engine Project) has been in development for several months now. Girish started the project because he needed a search engine for a site of approximately 150 pages. He went through the open source search engines available at the time but could not find one that was easy for a webmaster to understand and set up. Since then he has built and improved this software and will continue to do so. Olaf joined at version 0.9 beta and has done the (main) development and documentation since then.
The primary objective of this software is 'ease of use'.
By submitting this software to the open source community we can all hope that this software will one day become the most powerful personal site search engine in the world.
If you still find this software difficult to set up or use, please let us know. We will personally help you, and your feedback will make us aware of where the complexity lies.
We think that we have made the copyright notice nice and small enough - even for your site. Please do not remove it and leave it visible at all times, so that hopefully others will discover what a great tool TSEP is.
We are very interested in where TSEP is being used. We would really appreciate it if you would contact us with the web address so that we can take a look (and grab a screenshot) or, if it is an intranet, send us a screenshot.
A word about our versioning: we publish a new version when we think it is ready. A version number does not tell you much about the changes made since the previous version. We might add only 0.001 to a version number and still have made huge changes to TSEP. In one sentence: we think every new version is worth downloading, because a change in the version number indicates that something has happened to TSEP.
Dear user,
Thank you for downloading TSEP (The Search Engine Project). We hope this manual will help you install TSEP on your site in minutes! We know how frustrating it is to go through a lengthy manual and end up understanding nothing, so we have kept everything minimal and simple. If you experience problems during installation, don't hesitate to contact us. We would be pleased to help you.
General:
There have been a whole lot of changes, but you do not need to worry - it should be simple.
Follow the install instructions completely.
Continue with the next step: configuration.php
The queries will create the required tables with some starting values. Below you see a screenshot of the database model which will be created.
You are DONE now, you have integrated TSEP into your own page. Read Security now.
For security you might want to protect the include and the admin directory using .htaccess!
Make sure you have set all the values to your need in the configuration.php file in the admin directory before you continue!
This was introduced in 0.912.
As mentioned before, you must open configuration.php directly after installing everything. You must set the correct values, especially for the language and the TSEP path, and of course for every other value as well.
To populate the database with the values the search engine will search, run the file 'indexer.php' (in the admin directory) with '?index' added to the end of the address in your web browser, e.g. http://www.sitename.com/admin/indexer.php?index . Now enter the details the form asks for. After you submit the form with the appropriate values, you will see the results of the indexing within a few seconds. The script provides detailed information on the number of pages, the title, the URL, the size and the indexed words found by the indexing script. The entries you made in the form are also saved to the database for later re-use / re-indexing.
Now it's time to run a search. Open your search page, or the page we prepared for you called 'tsepsearch.php', in your browser and enter the words to be searched. The search words are not case sensitive.
TSEP supports boolean search if your MySQL version is 4 or higher. Below are some of the boolean search features:
The minimum length of a search word is 4 characters; see the MySQL restrictions below for details. (User-defined) stopwords are not marked in the results and are not used in the database query.
You can add, update and delete your own stopwords.
Stopwords are words which will not be searched for on your pages. If a user searches for a stopword, it is ignored in the search and flagged as a stopword in the area that displays the words the user searched for.
Stopwords are not case sensitive. If you enter "Apple" in the stopwords section and a user searches for "apple", the word is recognized as a valid stopword, so it is not searched for and not marked in the results.
Please note that there are MySQL restrictions as well!
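The rules above (case-insensitive stopwords plus a minimum word length of 4) can be illustrated with a small sketch. This is not TSEP's own PHP code; it is a Python illustration, and the function name split_query and its return layout are assumptions made for this example only.

```python
MIN_WORD_LENGTH = 4  # default full-text minimum quoted in this manual


def split_query(query, stopwords, min_length=MIN_WORD_LENGTH):
    """Split a query into (searchable, ignored_stopwords, too_short).

    Hypothetical helper: stopwords are compared case-insensitively,
    and words shorter than min_length are set aside.
    """
    stop = {word.lower() for word in stopwords}
    searchable, ignored, too_short = [], [], []
    for word in query.split():
        key = word.lower()
        if key in stop:
            ignored.append(word)      # would be flagged as a stopword
        elif len(key) < min_length:
            too_short.append(word)    # would be ignored by the full-text search
        else:
            searchable.append(key)    # would go into the database query
    return searchable, ignored, too_short


if __name__ == "__main__":
    print(split_query("apple pie recipes", ["Apple"]))
```

Here "apple" is dropped because it matches the stopword "Apple" regardless of case, and "pie" is dropped because it is shorter than four characters.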
Logging was first introduced in version 0.911. The admin can define in the setup file whether anything should be logged at all and, if so, what. Whenever something is written to the log, the time is written as well. The admin can choose to log the following: IP address, search term and clicks on the results.
We thought that an administrator who knows what people are searching for on his site might make navigation to those pages easier, or put even more effort into designing and updating them... We could probably find many more good reasons.
You can also log the IP address of the person searching. Be aware that people might not like the idea of you "spying" on them. But we thought this might be a useful feature - maybe especially for Intranets. In those, if someone is totally lost the administrator can take him by his hand and help directly.
You might want to notify your users if you are logging their actions, especially if you are logging their IP address.
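As a rough illustration of the logging options described above, the following Python sketch builds one log record. It is not TSEP's implementation: the function name, the flag names log_ip and log_term, and the record layout are all assumptions for this example. The only behaviour taken from the manual is that the time is always written when anything is logged, and that the IP address and search term can be switched on or off.

```python
from datetime import datetime


def build_log_entry(search_term, ip_address, log_ip=False, log_term=True, now=None):
    """Return a dict for one log record, or None if logging is switched off.

    Hypothetical helper: the timestamp is always included when anything
    is logged; the IP address and search term are optional.
    """
    if not (log_ip or log_term):
        return None
    entry = {"time": (now or datetime.now()).isoformat(timespec="seconds")}
    if log_term:
        entry["term"] = search_term
    if log_ip:
        entry["ip"] = ip_address
    return entry


if __name__ == "__main__":
    # 192.0.2.1 is a documentation-only example address.
    print(build_log_entry("tsep", "192.0.2.1", log_ip=True))
```

Keeping the IP address behind its own switch makes it easy to log search terms without recording who searched for them.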
For sorting the log by IP addresses, MySQL >3.23 is needed!
Quote:
Any word that is too short is ignored. The default minimum length of words that will be found by full-text searches is four characters.
Quote:
Words in the stopword list are ignored. A stopword is a word such as "the" or "some" that is so common that it is considered to have zero semantic value. There is a built-in stopword list.
For more details, see the source page of these quotes: 13.6 Full-Text Search Functions
The restrictions are covered in 13.6.3 Full-Text Restrictions
People with access to the MySQL server, however, can fine-tune MySQL to overcome these restrictions. You will find information about this in 13.6.4 Fine-Tuning MySQL Full-Text Search
To learn more about the built-in MySQL stopwords, search the MySQL page for "stopword list".
Personally, I do not see a big problem with the built-in stopwords, because they are so general that probably no one who is really trying to find something will enter "you" as a search word. Searching is nothing new to people, so they will enter the words they think best match what they need. Those words are also likely to be long enough not to fall under the length restriction. In addition, the built-in stopwords are English words, and TSEP is now ready for other languages as well. (Olaf)
The version you are running is written into the 'title' tag of the copyright notice. (Please remember that we ask you to leave this notice visible.) This means that you can move your mouse cursor over the copyright notice (at the bottom of the search page, for example) and after a little while your browser should display the text we provide in the 'title' tag.
This information (the version number) is read from a simple textfile in the include directory named tsepversion.txt. There is no need for you to change anything in this file yourself. It is frequently updated by the programmers.
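Reading the version from that text file can be pictured roughly as follows. This Python sketch is only an illustration and not TSEP's code: the function name read_version is hypothetical, and the assumption that the file holds nothing but the version string is ours.

```python
from pathlib import Path


def read_version(include_dir):
    """Return the version string stored in tsepversion.txt.

    Hypothetical helper: assumes the file contains only the version,
    possibly followed by a newline.
    """
    path = Path(include_dir) / "tsepversion.txt"
    return path.read_text(encoding="utf-8").strip()


if __name__ == "__main__":
    # Assumes the script is run from the TSEP directory.
    print(read_version("include"))
```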
If you decide to create a new language please mail us the language.php file which you created, so that we can add it to the next version.
Language files are quite simple. They define PHP variables which are being used in the TSEP files. Place the language.php into a subdirectory of the language directory. Let's say you are creating a Spanish version:
Some people asked how they can delete a word from the index or correct a word. In version 0.910 we introduced the possibility to do this right from TSEP. Follow the "Index Editing" link on the indexer.php page.
There are also 2 other possibilities (old, but they work):
At this time you will have to change the code. We are planning to make this a configuration option in the config.php file.
For now, follow the steps below to add more file types for TSEP to index. Please test that your changes work. You will not be able to index any binary data, of course!
Rank means that all pages are shown ordered by the total number of hits they received from all search words. Example: you get 2 results after a search; on the page with rank 1 the search words were found more often than on the page with rank 2. This is simple but very useful if you have many pages on your site and the user might face lots of results.
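The ranking idea can be sketched in a few lines. The Python below is an illustration under assumptions, not TSEP's actual query or code: the function names are hypothetical, hits are counted with a naive whitespace split, and pages with equal hit counts simply keep their input order.

```python
def count_hits(page_text, search_words):
    """Count how often the search words occur in a page (case-insensitive)."""
    tokens = page_text.lower().split()
    return sum(tokens.count(word.lower()) for word in search_words)


def rank_pages(hits_by_url):
    """Return (rank, url, hits) tuples, highest hit count first."""
    ordered = sorted(hits_by_url.items(), key=lambda item: item[1], reverse=True)
    return [(rank, url, hits) for rank, (url, hits) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    pages = {
        "/a.html": "search engine for small sites",
        "/b.html": "search tips and search tricks for every search",
    }
    hits = {url: count_hits(text, ["search"]) for url, text in pages.items()}
    for rank, url, n in rank_pages(hits):
        print(rank, url, n)
```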
This is simple but takes a little while. To make things as easy as we can for you, we will look at the result page step by step. The formatting we show you here is from version 0.911. It might change in the future but should stay pretty much the same.
Please note that there are additional div blocks in the search page. Those are only shown when errors occur (a stopword was searched, the MySQL version is too low...). For now we leave it up to you to look more deeply into that formatting; for the general user's sake we stick with what most people will see.
If you have done some nice formatting, we would appreciate it if you could contact us and send us your CSS file so that we could include it in a new TSEP version.
On every TSEP page, all TSEP output is placed in the following div container, which provides a global area for TSEP.
With this knowledge alone you can already change the look considerably, for example by setting the .tsepProject class in the tsep.css file to another font. This will change all fonts in the TSEP area to whatever you define.
OK, now that you know the header, let's look at the next part of the search page: the .SearchBlock, which contains the search form fields and the help. As you can see, the help has its own div container, .SearchHintsHelp .
This .SearchBlock is followed by another .SearchBlock which provides status information. This whole block is repeated at the bottom of all search results. If you know a little about CSS you should be able to format this block to fit your needs.
This first container of this type is followed by our search results. Here we use the following classes:
.SearchResultAllPagesBlock - this is the block of all the results
.SearchResultOnePageBlock - this is a block of one resulting page
.SearchResultOnePageTitle - this is the title of the webpage we found in the database
.resultnumber - this is the rank of the page. (details: rank)
.SearchResultPageRank - displays how many times the page had a hit from the searchwords.
.SearchResultOutput - these are the words which we indexed - until we encounter the first "explode" character (a . (dot) right now)
.foundSearchWord - this is one of the words the user has searched. We can mark it special so that the user sees it faster.
.SearchResultOutputMore - these are the little dots which show the user there is more on the page
.SearchResultURL - is the URL of the page we have found, extended by the size of the page (as written in the database).
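To make the roles of .SearchResultOutput, .SearchResultOutputMore and .foundSearchWord more concrete, here is a Python sketch of how such output could be produced. It is an assumption-laden illustration, not TSEP's PHP code: the function names are hypothetical, the use of a span element for .foundSearchWord is a guess, and we assume the "explode" character itself is not part of the excerpt.

```python
import html
import re


def excerpt(indexed_text, delimiter="."):
    """Return (text before the first delimiter, whether more text follows)."""
    head, _sep, rest = indexed_text.partition(delimiter)
    return head.strip(), bool(rest.strip())


def highlight(text, search_words):
    """Escape text for HTML and wrap each search word in a .foundSearchWord span."""
    escaped = html.escape(text)
    words = [re.escape(html.escape(word)) for word in search_words if word]
    if not words:
        return escaped
    # One combined pattern, so inserted markup is never matched again.
    pattern = re.compile("|".join(words), re.IGNORECASE)
    return pattern.sub(
        lambda match: '<span class="foundSearchWord">' + match.group(0) + "</span>",
        escaped,
    )


if __name__ == "__main__":
    text, has_more = excerpt("TSEP is a small search engine. It is easy to set up.")
    print(highlight(text, ["search"]) + (" ..." if has_more else ""))
```

With markup like this, a rule for .foundSearchWord in tsep.css (a background colour, for example) is enough to make the searched words stand out.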
You might run into problems with old (<3.23) versions of MySQL. If any of you is running such an old version, we would be happy to hear whether TSEP works for you and, if not, what kind of problems you encounter.
It seems that there are problems with indexer.php under MySQL 5 alpha. We will assume for now that this is an issue with the new MySQL version.
Try feeding the $db_table_prefix.config table by hand (using phpMyAdmin for example) with the values you find in the SQL dump.
This software has been tested on Windows and Linux systems with Apache as the server, running PHP 4.2 or greater and MySQL (4 or greater for boolean capabilities). The 'allow_url_fopen' option should be enabled for PHP.
Software by : Olaf Noehring (main development since 0.9beta (excluding)) and Girish R (main development until 0.9beta (including))
Version : TSEP 0.9nnn
This file was last modified on 2004-07-23 9:35 AM by Olaf Noehring
Copyright (c) 2002-2004, Girish R & Olaf Noehring. All Rights Reserved.
Support & Info (Summary on Sourceforge): http://sourceforge.net/projects/tsep/
Contact: Olaf Noehring (email on website: http://www.team-noehring.de) or Girish R (girishr at gmail.com) with your comments, suggestions, enquiries or requirements.
This file is part of TSEP (The Search Engine Project)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA