Author: | Enrico Zini |
---|---|
Contact: | |
Revision: | $Revision: 1.0 $ |
Date: | $Date: 2005/05/31 23:02:42 $ |
Copyright: | GNU GPL v2 or later. |
Abstract: | The Debian archive is getting larger and larger, and the software more and more diverse and complex. Organising software in the archive is difficult, and the existing section system, designed to cope with a much smaller number of packages, is no longer sufficient. The goal of Debtags is to provide a working alternative for categorising software that can cope with our numbers. The core idea of Debtags is to adapt the technique of Faceted Classification to be used for our packages. Faceted Classification is a 70-years-old library science technique which is being rediscovered and loved by modern Information Architects. Debags attaches categories (we call them tags) to packages, creating a new set of useful structured metadata that can be used to implement more advanced ways of presenting, searching, maintaining and navigating the package archive. Example uses of Debtags include searching for software, browsing the archive, and filtering out unwanted groups of packages. The Debtags effort needs to face three major problems:
All three issues are being actively addressed with good results:
This paper gives a broad technical overview of the Debtags project, its theoretical foundations, and the tools available for it now. The paper also offers some practical tutorials on how to do all sort of nice Debtags tricks. |
Contents
As of May 24, 2005, my Debian unstable system counts 16769 different binary packages, and I feel very powerful.
I can access at least three complete office suites, various painting programs (of which at least one is designed for kids and one for world-famous digital effect movie studios), software for massively parallel quantum chemistry, GIS geographical tools, dental office management software, and, my favourite, a tool to have a cute cow read my presentations:
________________________________ / a tool to have a cute cow read \ \ my presentations: / -------------------------------- \ ^__^ \ (oo)\_______ (__)\ )\/\ ||----w | || ||
As a Debian user, I am very powerful. However, there is a simple question that I'm having a hard time answering:
What do you want to install in your computer? [ ] 3dchess 3D chess for X11 [ ] 3ddesktop "Three-dimensional" desktop switcher [ ] 44bsd-rdist 4.4BSD rdist. [ ] 6tunnel TCP proxy for non-IPv6 applications [... 16765 more options follow ...]
And another one:
What do you want to remove from your system to make up some space? [1563 options follow]
If I search the package archive, I get:
Now, I profoundly believe that having choice is really good: what we need is not to reduce the size of the archive, but to create some way to make it easier for people to find what they need.
What does it mean, "make it easier"? Here is a nice measure: "in front of a list with more than about 7 plus or minus 2 items, our brain goes banana" [ZEN] [MILLER].
The way we are currently organizing packages is by grouping them in sections: Debian main has about 16197 binary packages divided in 33 different sections. However, this means an average of 490 packages per section, which is too many.
In fact the problem is even worse than this, because some sections are more lightly subscribed than others. After all, the more packages a section has, the more likely that the one package you want is found therein. By this metric, the Debtags team calculates that the typical Debian package is found in a section of 785 binaries. Such a number approaches the size of the entire archive at the time when the section system was first introduced.
Increasing the number of sections would not help too much: to have an average of 20 packages per section we would need 800 different sections: but then, how does one choose among 800 sections?
Besides the numbers, current software is getting more and more complex: for example, which section should a full-featured web browser such as Mozilla be put in? net? web? mail?
Sections were fine when Debian was smaller and simpler. Now there is a bug (#144046: Sections are not finely grained) asking for something better.
That bug has been assigned to Debtags. And Debtags is going to close it.
The original idea of Debtags' was just to allow more than one section per package. That would have allowed us to categorize Mozilla under net, web and mail simultaneously, allowing the user to find it under whichever classification they had in mind.
That is how we started, and that is how we got stuck. What do we mean with net? Some possibilities are:
All of these are probably valid interpretations. And then, why shouldn't all of Gnome belonging to net, since the 'N' in GNOME means 'Network'?
Classifying packages is a bigger problem than it seems at first sight. Luckily I suspected that someone has been thinking about it already, and I did some research.
It turned out that there is a whole new science calling "Information Architecture" whose goal is to sort out the Information Mess of the Information Age.
It also turned out that our problem had been solved somewhere in 1933, but they forgot to leave me a note.
Shiyali Ramamrita Ranganathan
Once upon a time in India, a mathematician and librarian called Shiyali Ramamrita Ranganathan started a secret project to innovate software categorization for Debian. It was about 1931, and he did not want to use the words "Debian" and "software" just yet, so he disguised all of his work under words such as "library", "books" and "library science" [STECKEL].
Ranganathan is famous for his five laws of library science:
If we do a simple word substitution, we obtain the perfect rules for classifying Debian packages:
These rules look simple, yet they are important and deep: rule number 1 tells us that software is there to be used, which means that people need to know about it and find it. This is the main reason for organizing packages: to allow people to use them.
Rules number 2 and 3 tell us that not every software is appropriate for every user, and users need to be able not only to find software, but also to find the software which is appropriate for them.
Rule number 4 tells us that the research must be quick and efficient, and the user should not spend a lot of time trying to search through long lists of packages, or learning a complicated search system.
Rule number 5 says that Debian keeps growing and evolving, and any system we design will have to cope with that.
This is exactly the purpose of Debtags. Codified by Ranganathan in 1931.
Ranganathan has been working all his life to solve the problem of classifying books in big libraries. To do so, he designed a system that he called Faceted Classification or Colon Classification:
A faceted classification [is a way of classification that] uses clearly defined, mutually exclusive, and collectively exhaustive aspects, properties, or characteristics of a class or specific subject.
Wynar, Bohdan S. ''Introduction to cataloging and classification''. 8th edition. p. 320
In other words, with faceted classification we have more than one set of categories: one for each aspect of our packages. Every different aspect of an object is described using a separate set of categories, called "facet".
Ranganathan has also given suggestions on how to identify the facets to categorise. He suggested that there are 5 fundamental kinds of facets, which are sometimes addressed as "PMEST" because of their initials:
- Personality
- what the object is primarily about. This is considered the "main facet."
- Matter
- the material of the object.
- Energy
- the processes or activities that take place in relation to the object.
- Space
- where the object happens or exists.
- Time
- when the object occurs.
If at first sight this seems unrelated to software, remember that Ranganathan was talking about books, and we need to do some mapping. For example, pixelcharmer has been mapping the PMEST to blog entries:
Let's follow her steps and map the PMEST to Debian packages:
- Personality
- What the package is primarily about. Answers the question what is it?
- Matter
- The "material" that constitutes the package. Answers the question what is it made of?
- Energy
- The processes or activities that take place in relation to the object. Answers the question what do I do with it?
- Space
- Where the object happens or exists. Answers the question where do I use this?
- Time
- When the object occurs. I still haven't found a suitable mapping for this: it might just be that Debian packages exist a bit outside of time, and that would explain how come Sarge took so much to release.
In Debtags theoretical foundations I gave an example of many different ways to interpret the section net. We can try to look at some of them again using the PMEST:
The PMEST is useful in that it gives us more mental structure to understand the properties of an object and to resolve the ambiguities in its categorisation.
Let's see how this could help, with three examples:
Now, let's look better at what they want:
User A will probably start searching through Gnome software. User B will probably start searching among software implemented in C. User C will probably start searching for commandline tools. And they will probably all look for software to manage an audio CD collection.
These three people are looking at the package archive from different angles.
Luckily, Faceted Classification does just that: it categorizes the items from different aspects. If we do things right, there will be something like a 'Desktop' facet, an 'Implemented-in' facet, a 'Kind-of-interface' facet and a 'Purpose' facet, each of these facets categorised with their sets of tags; and everyone will be able to find what they are looking for.
The key concepts of Debtags are Facets, Tags, the Vocabulary and the Tag Database. Let's consider them one by one.
Facets are the aspects of our packages that we choose to categorise. Example of facets in Debtags are implemented-in, culture, interface, media.
Each facet defines an aspect of packages we look at. For each facet there is a set of categories that we use to describe what we see when we look at that aspect of a package.
Faceted categorization is like saying, "If I look at what is the use of this package, I see editing". "If I look at what is its user interface, I see a text-mode user interface".
Facets are "If I look at X"; tags are "I see X".
The list of facets currently categorised by Debtags is available in The Vocabulary.
The tag is a Debtags category. Debtags uses different sets of categories, that we call tags, to categorise different facets of software. We use the word tags instead of categories because it's shorter, and we talk about tags a lot.
In Debtags, we represent tags with a short English name, prefixed by the name of the facet it categorises. For example, when we look at what media types a package can work with and we categorise "The Gimp", we use the tag media::rasterimage.
The Vocabulary is where we store the list of facets and tags that we use for Debian packages.
Having a common vocabulary is important so that we all refer to the same property with the same name. Other benefits of the common vocabulary include the abilities to present selections of facets and tags in a user interface, to store meta-information about them (such as a short and long description, possibly in many languages), and to validate classification information.
The Debtags vocabulary is the file that delineates the facets and lists the respective tags that can be used to categorize Debian packages. The vocabulary file contains whatever needs to be stored about facets and tags. At the moment this means:
This information is stored in a format similar to the usual Debian package records. For example:
Facet: implemented-in Description: Language that the package is implemented in Nature: matter Status: complete Tag: implemented-in::c++ Description: C++
This defines a facet called implemented-in and a tag c++ inside it.
The main location of the vocabulary is in the subversion repository at svn://svn.debian.org/debtags/vocabulary/debian-packages; from there, it makes its way to the default Debtags tag source and then, by means of debtags update, it is stored locally in /var/lib/debtags/vocabulary. It is also indexed by debtags update to allow applications to quickly lookup entries inside it.
There are currently 32 facets defined, containing a total of 440 tags. To see the facets along with their descriptions and other data, you can run debtags update and then use grep-dctrl:
grep-dctrl -FFacet -r . /var/lib/debtags/vocabulary
A tutorial on sending patches to the vocabulary can be found at http://debtags.alioth.debian.org/vocabulary.html
A tag is assigned to a package by entering the appropriate information in the Tag Database.
Designing this database is one of the most tricky and important implementation details of Debtags, as it is the way to share the tags and make them useful.
An obvious place for storing the tags is in the control file of the packages, together with the other package metadata. This possibility has been discarded, however, for a couple of good reasons:
We chose instead to store tag data in a central tag repository, located at http://debian.vitavonni.de/packagebrowser, from which it can be downloaded in the user's computer by means of the debtags update command, just like the package database can be downloaded by means of apt-get update.
This makes it possible to store and maintain the categorization data without causing an added burden to the Debian infrastructure. It also allows to have different tag sources merged together. Check this out because it's cool!
Let's make an example with a fictitious Debian-Circus CDD (Custom Debian Distribution) that chooses to categorise packages also from the aspect ("facet") of their possible use in a circus. To do so, they want to have in Debtags a circus facet, categorised with tags such as travel-info, installation-procedures, inventory, artist-management, show-management, propaganda, accounting.
To realise this, Debian-Circus can create their own piece of Vocabulary and Tag Database and make it available to be downloaded as a tag source. debtags is able to download data from various tag sources (listed in /etc/debtags/sources.list) and merge them together.
Different tag sources can be merged easily when facets differ, and they complete each other's view of packages by shedding light on more aspects of them.
Now, Debian-Circus provides categorization along the circus facet, and Debian provides categorization for the suite and media facets. Any circus sysadmin can now merge this information together and look for circus::propaganda software for suite::gnome that can work with media::vectorimage.
Isn't it exciting?
The main location of the database is inside Erich Schubert's packagebrowser at http://debian.vitavonni.de/packagebrowser. From there, it makes its way to the default Debtags tag source and then, by means of debtags update, it is merged with other tag sources and stored locally in /var/lib/debtags/package-tags.
The database is stored locally in an easily parsable format that can also be understood by tools such as tagcoll and tagcolledit. It is also indexed by debtags update to allow applications to look entries up quickly in it.
The individual user can modify the Debtags database locally. Such local modifications are stored as tag patches in the file ~/.debtags/patch in the user's home directory. A tag patch is a special patch format that can represent a change to a tag database, and can also be applied to future versions of the database, so that your changes are preserved across debtags update invocations.
Another useful feature of the tag patches is that you can send them to the central database, which will integrate them with the rest of the data. You can do this by using debtags-edit (it has a "File/Mail changes to central database" feature) or by using the command debtags send.
Debtags is not only categorization data, but also a suite of tools that allow to work with the data:
The suite is quite large and diverse, and it allows to do many different kinds of analyses and manipulations of the tagged data.
If you are new to Debtags and you are looking for some function, chances are that it is already implemented somewhere: if you cannot find it, don't hesitate to ask in the debtags-devel mailinglist.
These are some example uses of debtags:
Shows all packages that allow to edit raster images. You can use full arbitrarily complex boolean expressions such as (use::playing && ! use::recording) && media::audio && (interface::commandline || interface::text-mode).
An interesting one is this: !culture::* || culture::italian: it shows all packages that are either not locale-specific or that are specific to my locale. This can be used in package managers to filter out packages for locales not used by the system.
Screenshot of the main window of debtags-edit.
debtags-edit is a graphical interface for searching and tagging packages.
Since I'm always busy experimenting with new features, the interface is quite rough and I'll have to explain it a moment.
Try running debtags-edit. If it does not work, it might be because you have not yet run debtags update to download the Debtags data: do it now and run debtags-edit again.
On the left-hand side of the application window there is a Filter area which allows you to look for packages: you play with the controls, and a list of matching packages appears in the bottom left part of the screen. A notable part of the filter area is the Tags area, which lets you add tags to the filter.
Try pushing the Add button of the tags area and add something like "Use/Editing": the list will display all packages which can edit something.
Now push "Add" again: the list of facets and tags has narrowed down to only the available possibilities, and if you look inside the "Media" facet, you'll see a list of the possible kinds of media that can be edited using Debian packages.
Note
If this example does not work for you, check what is in the "Maintainer:" field: there might be your e-mail there. That is because debtags-edit wants to kindly suggest you to tag the packages you maintain [1]. For the purpose of this example, you can clear the "Maintainer:" field and try again. After I tell you how to tag packages, please put your address back there ;)
Using the filter can search packages by name, do a full-text search à-la apt-cache search, search by maintainer, installed status or tags. This makes debtags-edit an interesting tool already, but let's see more.
[1] | If debtags-edit cannot find the DEBEMAIL environment variable, it will instead leave the "Maintainer:" field empty and preselect "Status: Installed", kindly suggesting you to check if the packages you know and use everyday are tagged well enough. |
You can now click on a package name in the bottom-left part of the application: that will display the package in the main display on the right side. At the bottom part you will see all the package data, very much like the output of apt-cache show, while in the top part you will see the tags of the package. If the package you chose has no tags, try looking for debtags: that will be nicely tagged.
Now try adding a new tag to debtags: try adding "Games/Toys". Push the other "Add" button below the tag list in the top-right part of the window, and navigate the complete facet list until you find the "Games" facet: inside it you will find "Toys". Select it: the tag has been added, and you made your first step into Debtags tagging!
Now try doing "File/Save", then go to a terminal and type debtags show debtags: you will notice that your new games::toys tag is already there. Cool, isn't it?
What happened is that debtags-edit saved your modifications in ~/.debtags/patch, and all Debtags-aware applications read that file at startup. Have a look at the file: it's just a patch with the change you made, and its format is such that it can be applied to any future version of the tag database. Isn't it smart? I'm so proud of it!
Now go back to debtags-edit and look in the "File" menu: there's a "Mail changes to central archive" option. If you click on it, it will mail the contents of your ~/.debtags/patch to the central archive, and the next day your contribute will be available to everyone when they will do debtags update.
What else do you need to contribute to the categorization? Only one more thing maybe: subscribe to the debtags-devel mailing list.
Screenshot of the main window of packagesearch.
packagesearch is a package search program written by Benjamin Mesing. Among its many features, it is able to make use of Debtags data.
To use packagesearch, you set search parameters in the top part of the application, and it displays the results in the bottom. The Debtags search is on the right side, on a tab named "Debtags".
Try running packagesearch, then select the Debtags tag "Use/Editing": you will see all the packages that can edit something. Then select "Media/Raster Image" and you will see all raster image editors in Debian that are known to Debtags.
Compared to debtags-edit, packagesearch does not allow to change the categorization, but it features more search functionalities, and it also integrates apt-file searches.
Lastly, the central tag archive has a web interface (the packagebrowser, written by Erich Schubert) that can be used to navigate the Debtags data and edit them online. You can find it at http://debian.vitavonni.de/packagebrowser/.
One way of working with the packagebrowser is to help with the not yet tagged packages, following the link at the top of the main page.
Another way is to work on packages is from the Debtags TODO report, in which all package names directly link to their edit page.
Note
The packagebrowser edit page lists all available tags and is very long. However, modern browsers have very quick and convenient search functions such as the "/" function in Firefox. Using those functions it becomes quite fast and easy to find what you need in the list of tags to add.
In the words of its developer Thaddeus H. Black, Debram was a primitive attempt to treat the same Debian package problem Debtags more properly solves. Today, Thaddeus has enthusiastically joined Debtags development and is now in the process of merging the large body of Debram data into Debtags. Unable to attend Debconf5, Thaddeus has asked me to recommend to Debram users that they migrate directly to Debtags at their convenience. Post-sarge, Debian Maintainers need not tag their packages in Debram at all; standard Debtags tagging suffices.
Tag your packages, you will
(Master Yoda)
These are some examples on how to use debtags-edit to do some more serious categorization work. Most of them are ways of identifying packages that need some work: once you find them in debtags-edit, you can directly work on them
Fire up debtags-edit and set the filter to display all installed packages with the tag special::not-yet-tagged. Now look at the results, pick something you know and add tags to it. After you are satisfied with its categorization, remove the not-yet-tagged tags and submit.
Click on "Edit/TODO dialog": a dialog will open with various TODO options:
- Empty tagset
lists all packages with no tags at all. Those are usually not reported, because the central repository should automatically add special::not-yet-tagged tags to them; however this is sometimes not the case, especially for local packages or packages that have recently been added in the archive, so this option is available to spot such packages.
- Uitoolkit and not interface
lists all packages which have uitoolkit tags but not interface tags. This might be a normal situation, especially for software libraries, but for normal applications it is probably an anomaly worth checking.
- Uitoolkit and not implemented-in
lists all packages which have uitoolkit tags but not implemented-in tags. This might be normal for documentation about user interface toolkits, however it is an anomaly for applications and libraries, and it's worth checking as well.
- Missing Role
Ideally every package should have a kind of role in the distribution. However, there are lots of packages which do not yet have a role::* tag. Since the tags in the Role facet are still under discussion, this function is good to bring up food for ideas.
- Specials
The options starting with "Specials" allow to see packages whose tag sets are special cases compared to all the other packages with similar tags. How this is computed is a bit complex, but this is proving to be very useful to highlight some corner cases that lead to extra reasoning and improvements of the vocabulary.
Click on "Edit/Facet dialog" to activate an experimental new feature of debtags-edit to find even more corner cases.
This will pop up a dialog with two columns: the Has column and the And not column. Both columns list facets.
If you click on a facet on the "Has" column, the "And not" column will list all the other facets, together with a number. The number represents how many packages have the "Has" facet, but do not have that "And not" facet. The "And not" facets are sorted by increasing values of this number.
Sometimes all the "And not" facets have similar numbers, but sometimes there are facets with numbers that are definitely smaller than the others. Those facets are very likely to have something in common with the current facet in the "Has" column, since a lot of packages have tags from both facets.
For instance, it happens at the time of writing that, when selecting the "Game" facet on the left side, the corresponding "Use" facet reports a count which is one order of magnitude less than the others. This is because of the existence of use::gaming, which is normally found in most, if not all, game::* packages.
What are then those packages that have game::* but not use::gaming? Let's see them: click on "Use" on the right column, then click ok. That will set game:: && !use:: as the base for the package list, and further filtering will be based on them. You can now see what are those packages, and if you want you can check which of them are installed in your computer.
With game::*, it turns out that there is a reason: there are games (many of which are game::toys) that are not useful for gaming. Some examples are cappuccino, cowsay, polygen and fortune. Howewer, one could realise that these toys have some other use before gaming.
Maybe a new "use::something" tag is needed?
The Facet Dialog has helped finding out.
debtags can be useful for tagging in two ways: both to give you hints on what needs to be worked on, and as a low-level tool you can use if you want to automate tagging tasks.
There are three features of debtags you can use to get hints on what to do:
this prints a text report about packages that need some work to do. It includes a range of simple to very complex checks to find out problematic areas of the categorization, and the report that it outputs contains explanations about the reasons behind every group of packages. The report also includes packages that are not installed.
The output of debtags todoreport is periodically processed with rst2html and published at http://debtags.alioth.debian.org/todoreport.html
debtags related shows you what packages are similar (tagging-wise) to a given one. Try running debtags related cdcd: you will see a list of other CD players. You can use the -d switch to increase the distance of the search, that is, to include more packages that are "a bit less similar".
You can use this function to check if the list of related packages is what you would expect: if not, you can check the tags of the missing or extra packages to see if there is anything wrong.
You can also use debtags as a low-level backend if you want to experiment with scripted or automated tagging ideas:
These commands are intended to be run from scripts, and they list the tags attached to a package and allow you to add or remove tags to it. Try this:
debtags tag ls debtags debtags tag add debtags game::toys debtags tag ls debtags debtags tag rm debtags game::toys debtags tag ls debtags
tagcolledit is a generic tool to do mass-editing of tags. It is not specific for Debtags, so it does not know about packages, but compared to debtags-edit it can make it easy to do large-scale operations like renaming of tags.
Let's run tagcolledit:
# Get a version of the tag database including the local patch debtags cat > /tmp/tags tagcolledit /tmp/tags
The interface of tagcolledit is unfortunately very rough, just like debtags-edit, and it needs some explanation. It is divided in two panels, like Midnight Commander, and every panel works in the same way: it has a filter, and a list of matching items (that are packages in our case).
Now try doing this: in the left panel, select the uitookit::qt tag. In the right panel, select the implemented-in::c++ and the interface::x11. Now, QT is a C++ library, so one would expect most of the QT applications to have an X11 interface and to be implemented in C++. With tagcolledit it's very easy to have a look at the left panel, select groups of packages that need fixing (the list supports multiple selection), then right click on the selection and choose "Copy to other panel". This will add the tags implemented-in::c++ and interface::x11 to the selected packages, and they will show up in the other panel as well.
You can do many other tricks with tagcolledit: merge or intersect tag sets, add a tag to a group of items or even to all of them (right click on the tag in the filter: you will see the "Add to all" and "Remove from all" functions).
It is however easy to get carried away and make mistakes, such as adding the use::gaming tag to all the games::toys, to find out later that packages such as fortune or cowsay cannot really be used to play games.
Once you are finished with tagcolledit, you can save the tag database and quit. If you are happy with your changes you will want to integrate them in the main tag database: first create a patch:
debtags mkpatch /tmp/tags > /tmp/tags.patch
This will create a tag patch with the changes you have made with tagcolledit. You can then add your patch to the local tag patch:
cat /tmp/tags.patch >> ~/.debtags/patch
And see your changes in all the Debtags-aware applications.
Then you can submit it, with either debtags-edit or debtags submit.
You can even submit it directly:
debtags submit /tmp/tags.patch
While generally there is a need for a smart brain to do good categorization, there are some tagging tasks that can be automated. For example, it's good enough to assume that all packages that depend on libgtk2.0 will need a uitoolkit::gtk tag, while all packages that depend on libstdc++6 will probably need an implemented-in::c++ tag.
These ideas have brought to creating autodebtag, which is a Perl script that does all kinds of automated reasoning that I and other people in the debtags-devel list conceived so far. Then it prints out a tag patch that can be evaluated, tested and submitted to the central tag database.
autodebtag has also some experimental code to try and bring categorization data from the debram database into the Debtags world.
Benjamin Mesing is also experimenting with setting up a smart Bayesian engine to infer new tags based on what tags are there now. This has very promising and exciting possibilities: there is a prototype available in the autodebtag subversion repository, and it needs more people to play with it.
If you regularly try out all image manipulation software, if you are a Gnome expert, if you are addicted to role-playing games, you might consider adopting a tag.
Adopting a tag means that you take responsibility for checking from time to time that all packages that should have your tag actually have it. It is a very important task, because it introduces a level of reliability and guaranteed accurateness into some parts of the tag database.
For example, Stefano Zacchiroli is an Ocaml guru, and is taking care of the implemented-in::ocaml tag. You will notice that all Ocaml applications have the implemented-in::ocaml tag, and thanks to Stefano you can rely on that tag to be accurate. We need more people to do what he is doing.
To adopt a tag, just send an e-mail to the debtags-devel list, introduce yourself and tell what you want to adopt. If it's already taken, you can team up with the others and share the brainwork.
The vocabulary is another area needing help. Debian spans a very wide and diverse range of areas, and knowing about all of them is nearly impossible.
There are two ways of having a good vocabulary to cover all of Debian's interests:
So far, we have been working mainly on 1. This was because there was poor documentation on Debtags and its technical foundations. Now there is this paper, and it's time to switch to the second strategy.
If you feel like you could help (and you do not need to be a Debian Developer, but a Debian user with a lot of experience in some field), you can read the tutorial at http://debtags.alioth.debian.org/vocabulary.html and join the debtags-devel mailing list.
You can also make local experiments with the vocabulary: the Debtags FAQ has an entry with a small tutorial on how to do it. Someone is trying to do it to provide specific sets of tags for their Custom Debian Distribution, but it's also a good way to experiment with ideas for new facets or tags.
If you are a package maintainer, please run debtags-edit and have a look at your packages. You are probably the best person in Debian to assign tags such as implemented-in::*, interface::* and uitoolkit::*, and it's going to be very easy for you to do it.
Also remember that you can use more than one tag from the same facet, as your package may contain more than one program, or it may contain programs with multiple behaviours: for example, mutt or aptitude have uses both as interface::text-mode and interface::command-line programs.
Now that you know everything about Debtags, you may want to make use of the Debtags data in your applications, or play with Debtags using code instead of existing tools.
If that is the case, there is a selection of libraries and modules available for you.
If you look inside the code of debtags and debtags-edit, you will notice that most of what they do related to tags is calling functions from the libdebtags1 or libtagcoll1 libraries. libtagcoll1 provides generic manipulation functions for tagged stuff, while libdebtags1 uses libtagcoll1 to implement all Debtags functionality.
If you like programming in C++, install libdebtags1-dev; from there, you can either look at debtags and debtags-edit source code for examples (debtags is particularly straightforward) and use /usr/share/doc/libdebtags1-dev/html/index.html for the doxygen documentation.
The documentation still has missing parts. If you have questions, no matter how silly they could seem, please ask them freely in debtags-devel list: I have promised to myself that I will turn every answer I give into more documentation of libdebtags1-dev.
If you instead like coding in Perl or Python rather than C++, you can install either libdebtags-perl or python-debtags: they are swig-generated bindings to libdebtags1.
Again, for whatever documentation or examples that are missing, ask in debtags-devel, and every answer will also contribute to improve the documentation and examples in the packages.
This is an example Python code that plays with Debtags a bit (please teach me how to use iterators properly):
import Debtags # Instantiate the simple Debtags class dt = Debtags.DebtagsSimple(0) # Get some information from the vocabulary voc = dt.vocabulary() print "Facets:" for a in voc.getFacets().getIterable(): print "%s: %s %s" % (a.name(), a.field("Status"), a.field("Nature")) # List the accessibility tags fac = voc.getFacet("accessibility"); print "Tags in 'accessibility::':" for a in fac.tags().getIterable(): print a.name() # Print the tags of ``debtags`` tags = dt.getTags("debtags") for a in tags.getIterable(): print a.name() # List the image editors in Debian tags = Debtags.TagSet() tags.insert(voc.getTag("use::editing")) tags.insert(voc.getTag("media::rasterimage")) editors = dt.getPackages(tags) print "Raster image editors:" for a in editors.getIterable(): print a.name()
This is the same example, written in Perl:
#!/usr/bin/perl -w use strict; use warnings; use Debtags; my $dt = Debtags::DebtagsSimple->new(0); my $voc = $dt->vocabulary(); print "Facets: ", join(', ', map { $_->name() } @{$voc->getFacets()->getIterable()}), "\n"; my $fac = $voc->getFacet("accessibility"); print "Tags in 'accessibility::': ", join(', ', map { $_->name() } @{$fac->tags()->getIterable()}), "\n"; print "Tags for 'debtags': ", join(', ', map { $_->fullname() } @{$dt->getTags('debtags')->getIterable()}), "\n"; my $tags = Debtags::TagSet->new(); $tags->insert($voc->getTag('use::editing')); $tags->insert($voc->getTag('media::rasterimage')); my $editors = $dt->getPackages($tags); print "Raster image editors: ", join(', ', map { $_->name() } @{$editors->getIterable()} ), "\n";
libapt-front is the next-generation wrapper for libapt. The goal of the project is having a library to base all package managers on, and to have it able to access different sources of metadata besides libapt. Including Debtags.
libapt-front is still under heavy development, and Debtags support is still not implemented. However, I have commit access to the libapt-front repository and I've recently been able to take libdebtags1 where I wanted it to be in order to support libapt-front: expect exciting news from this corner of Alioth!
Once we have the data and the functions to access it, we can improve the way we navigate our package archive.
One area where more research is needed is finding good ways of navigating the package archive using Debtags. As I mentioned before, in front of a list with more than about 7 plus or minus 2 items, our brain goes banana [ZEN] [MILLER]. This sets a first goal for Debtags: to allow people to navigate the package archive in such a way that no list of unrelated choices should have more than 7 plus or minus 2 items.
Do you know of any existing and very good interfaces to navigate such a large archive using faceted categorization? I don't. I know of very good examples on a much smaller scale, but we seem to have a much more large, complex and heterogeneous dataset.
We need mockups and prototypes of interfaces offering an intuitive and efficient way of navigating through tags, keeping in mind those 7+/-2 and EVN requirements.
As you can see running debtags-edit and tagcolledit, I still haven't done much progress in this field, and I'm desperately in need of good ideas.
This paper is the first single comprehensive source of information about the Debtags project. It has covered the theoretical foundations, the tools existing at the moment, how to take advantage of Debtags and various ways of getting involved.
Since we would really like to get help, I'll repeat a short summary on how to get involved:
If you are interested in helping, or just curious to see what's happening, join the debtags-devel mailing list: that's where people hang out, discussions happens and announcements are posted.
[STECKEL] | Mike Steckel, "Ranganathan for IAs", 2002, http://www.boxesandarrows.com/archives/ranganathan_for_ias.php |
[ZEN] | (1, 2) Enrico Zini, "Zen and the art of Free Software: know your user, know yourself", 2004, http://people.debian.org/~enrico/talks/2004linuxtag/ |
[MILLER] | (1, 2) George A. Miller, "The magical number seven, plus or minus two: Some limits on our capacity for processing information", 1956, Psychological Review n.63 pp. 81-91, http://www.well.com/user/smalin/miller.html |
[FURNAS] | George W. Furnas, "Effective View Navigation", 1997, Proceedings of CHI'97 pp. 367--374, ACM Conference on Human Factors in Computing Systems, Atlanta, http://www.si.umich.edu/~furnas/Papers/CHI97-EVN.2.pdf |
[TAGCOLL] | Enrico Zini, "The tagged collection: an alternative way of organizing a collection of bookmark-like items and its integration with existing web browsers", 2001, http://svn.debian.org/wsvn/debtags/tagcoll/trunk/doc/ |