Automatic music rating

The problem

I like listening to music while I’m programming. However, I have lots of music on my computer and I don’t really like all of it (or find it appropriate as background music for when I’m programming).

I guess I could spend a few evenings going through my music collection, deleting anything I don’t like (which isn’t always such an easy decision), creating a playlist or whatever. This isn’t a permanent solution, though, since with the time my collection will continue growing, I may get sick of hearing some songs, my taste may change, etc. Also, I prefer spending my time programming, reading or doing anything else more interesting than sorting my music collection.

Unrelated to the problem itself, let me also mention that I use different media players. Currently I’m mostly using Banshee, but I’ve been intermittently toggling between it and Rhythmbox those last months, and also used a command-line player for some time. I’m saying this because I expect a optimal solution to be media player-independent. The bigger problem of each media player having its own database is also something I’d like to see addressed, maybe with some technology like Tracker, but that is another topic.

The status quo

My current way to approach this problem is basically setting my media player to choose songs randomly, and using the big “Next >>|” button on my keyboard whenever it chooses some song which annoys me.

(Banshee’s playback list interface has an option to automatically fill itself sorting songs by popularity, but it doesn’t seem to work good at all here; I also recall Amarok having some automatic playlist generation options, but I’m not using any KDE applications anymore and in any case this is outside the scope of an application-neutral solution).

The solution

It has recently occurred to me that a neat solution for this would be to gather information from a generic event log and to translate that into a numerical punctuation for each song. I think you may already guess where I’m heading, but if not: Zeitgeist!

With the appropriate data-sources installed, Zeitgeist holds information on which songs started playing automatically (because they are on a playlist or because you have your media player set to random), which were started manually, and in both cases which you listened to completely and which ones you skipped (and how long you resisted listening to them). It shouldn’t be too difficult to write a script which will periodically request this information from Zeitgeist and give songs positive points for every time you listened to them completely (extra if you chose them yourself manually) and negative points to the ones you skipped (but giving less negative points if you resisted half of the song than if you skipped it right after you recognized it).

With this punctuation information, music players can avoid playing songs you don’t like and give you only those you like or new ones for which there isn’t any information yet (if it followed the punctuation strictly this would end with the same songs being played all the time and new songs with punctuation around 0 being ignored). The importance of the play/skip actions would decay over time (it’s more important to consider whether you listened to the song yesterday than if you did six months ago), etc., etc.

If we wanted to create something really fancy we could even look at generating different ratings for separate circumstances, eg. in case you like listening to a different sort of music during the morning than during the evening, or to differentiate between what you like to hear while you are coding and while the computer is idle (maybe because you are doing paper homework and only using the computer to get some background music). The information for all this is there in Zeitgeist, so it’s only a matter of writing a good algorithm.

The implementation

I’ve already explained most of how this should work in the previous section, but here’s a bit of an overview of what’s needed for this:

  • Data-sources inserting the music reproduction information. We already have a data-source for Rhythmbox implemented as an extension, but Banshee and any other players are missing.
  • The actual algorithm, probably implemented as a periodically run script leaving the aggregated information at some accessible place, although this may vary depending on the degree of fanciness you choose.
  • The interface, ie. plugins for Rhythmbox and other media players which take that information and use it to provide an option for semi-randomly choosing music excluding stuff you don’t like.

If I’ve got you interested on this, I’m willing to mentor someone on this, so get in touch! Feel free to jump into #zeitgeist on irc.freenode.net or drop me a mail.

Sudoku solving with Python and SAT

At Logic class last week we saw how to solve a Sudoku using SAT and for fun I decided to actually try this out using Python. It turned out to be pretty trivial to implement and I thought I’d share the experience.

First of all let’s see how the Sudoku problem was described at class: we have a table with 9 rows and 9 columns;

  • 1. Each field [i, j] (where i=1..9 and j=1..9) has at least one value (between 1 and 9).
  • 2. Each field [i, j] (where i=1..9 and j=1..9) doesn’t have more than one value.
  • 3. There isn’t any repeated value in any row, column or 3×3 group.
  • 4. Some of the fields have a predefined value.

Now to implement this in code, first of all I needed a Python module implementing SAT solving. A quick search in Debian’s repositories gave me python-logilab-constraint, which I’ve found to be quite nice to use, even though it could definitely take some speed improvements.

Conditions 1 and 2 aren’t a problem at all, as logilab.constraint can be used quite naturally [0]. We just define a variable for each field (eg., x11 to x99, where the first number is the row and the second number is the column) and the domain in which they operate (integer value from 1 to 9):

values = range(1, 10) # [1..9]
variables = ["x%d%d" % (i, j) for j in values for i in values]
domains = {}

for variable in variables:
	domains[variable] = fd.FiniteDomain(values)

The 4th rule is also straightforward, we just need to hardcode the values. If we have a bidimensional list sudoku containing the initial numbers and None in all empty fields, we add each of them as a constraint:

constraints = []
for i, row in enumerate(sudoku):
	for j, field in enumerate(row):
		if field is None:
			continue
		variable = "x%d%d" % (i+1, j+1)
		constraints.append(fd.make_expression((variable,),
			"%s == %d" % (variable, field)))

Now only rule 3 remains; here we basically have to set up three more groups of constraints: one for rows, one for columns and one for the 3×3 groups. My initial implementation checked each row/column/group at once; for example, for the first row «x11 != x12 != x13 != … != x19», for the first column «x11 != x21 != … != x91», etc. However, this proved to be extremely slow, and after checking the «Performance considerations» section of Logilab Constraint’s documentation I split up the row and column conditions [1] to lots of smaller conditions, as in: «x11 != x12», «x11 != x13», «x11 != x14», etc. I also moved the constraints for the initial numbers to the top (I had them at the end of the constraints list before), as they are the simplest ones. With those changes resolution time changed from several minutes to some tenths of a second.

And this is it. After all constraints have been added, we just need to run the solver:

repository = Repository(variables, domains, constraints)
solutions = Solver().solve(repository)

The complete code is available via Bazaar at lp:~rainct/+junk/sudoku-sat. Being completely new to the logilab.constraints module, or implementing any such stuff at all, it took me around half an hour to write this, which shows how SAT makes such sort of problems really straightforward.


[0] Using logilab.constraint it’s possible to assign arbitrary Python data to variables (here we just give each an integer, but variables could also take tuples or whatever else). When this problem was presented at class using pure propositional logic it was a bit more cumbersome, as we couldn’t just say “there’s a variable x11 with domain [1..9]“. For instance, rule 1 was «(p111 | p112 | p113 | … | p119) and (p121 | … | p129) …», where “p111″ would be True if field [1,1] is supposed to contain a one, “p112″ is True if it’s supposed to contain a two, etc.
[1] I didn’t bother also splitting up he 3×3 group constraints since the other two changes already gave me enough of a speedup; changing that may squeeze a few msecs more out of it.
P.S.: If you’d like a more formal explanation of this, a search on Google found this paper: A SAT-based Sudoku Solver.

You no longer have an excuse not to look at Python Snippets!

You’ve probably already heard of Jono Bacon’s effort to create a collection of Python snippets and of Acire, the application to browse through them. If you’re interested in developing something with Python you should really take a look at those 115 (and couting!) awesome snippets, ranging from the most basic stuff to more advanced examples on a wide range of topics.

Last Thursday there was a talk on Acire and a snippet creation party (as part of the Ubuntu Opportunistic Developer Week) and someone mentioned it would be nice to have a web interface. So, yesterday I took a couple hours and hacked together a quick script to convert the snippets to HTML, resulting in this snippets directory which you can use to look up stuff when you haven’t Acire at hand or to give links to snippets to other people!

The new web interface to python-snippets!

And now that you’ve read this post, maybe you would like to delve into Clutter, update your application to work with Application Indicator, see how easy it is to get data from Zeitgeist or refresh the knowledge on lists you gained at Rick Spencer’s Python talk?

The code for the snippets generator is available here. Thanks go to Mako Templates, Pygments and Chardet for having made it dead easy to create this page! And, of course, patches are welcome.

Zeitgeist Data-Source Registry

This post is about an upcoming feature in Zeitgeist 0.3.3 and is so far only available in checkouts from trunk or our PPA. It is expected to be released within the next weeks.

Some weeks ago I implemented a new feature in Zeitgeist and I figured I’d drop some lines about it. I’m talking about a data-source registry.

You may be wondering, «what the hell is that even, a data-source?». So Zeitgeist is a database, a global event log, but it doesn’t do any magic indexing or monitoring by itself;  the information it logs needs to come from somewhere – be it applications sending it to Zeitgeist themselves, daemons, plugins for applications, etc. Any such entity is called a «data-source».

Having a register of all data-sources interacting with Zeitgeist provides some benefits, like for example:

Screenshot of tools/gtk/zeitgeist-data-sources-gtk.py

Prototype of a data-source management user-interface

  • Having a list of them. Lists are nice :).
  • Being able to disable data-sources from a centralized place instead of requiring each to write its own preferences dialogue.
  • Being able to set up a blacklist rule considering which data-source the information comes from as a condition (not implemented yet).

More details

When they register, data-sources will need to provide the following information: an unique ID, a name (may be localized), a description (may be localized) and optionally a template specifying what sort of events it logs. Additionally, the last timestamp the data-source was seen by Zeitgeist, whether it is running right now and whether it is enabled will also be available.

It is important to note that registration is not compulsory; while it is highly encouraged for data-sources to use it, is it still possible to anonymously insert information into Zeitgeist (for example from a shell script). The event template is also only informational, and will not be enforced by Zeitgeist at this time.

Avoiding duplicates from GTK Recently Used

You may know that zeitgeist-datahub provides basic support for applications which have no direct Zeitgeist support but do use GTK’s RecentManager, which is not as detailed as we would like, but it is better than nothing. However, until now we had a problem: when applications had support for both, GTK’s Recently Used and Zeitgeist (be it native or as an extension), it was possible for duplicate events to be inserted or other sorts of conflicts between both data-sources. Now that we have the information from the registry available, we’ve been able to solve this modifying our Recently Used data-source to ignore any events concerning applications which already have another data-source logging the same types of events.
 


In case you don’t care at all about what I’m talking here and you just want to see fancy interfaces, go check this out.

Confession

My dear Ubuntu,

I think the time has come that I make a confession. You may be wondering why I haven’t spent much time with you those last months, and you have the right to know. The case is, I have another one. She’s called Debian.

No, it’s not because of you. It’s also not because of your parents; while they aren’t perfect, they are great in many ways and I certainly don’t want to run away from then. Just wait and let me explain this. You surely remember that I spent a week with her some time ago? Well, now that I want to join Debian’s family, I decided that after all that wasn’t much time to get to know here, so I gave her another chance, and with this second experience I fell in love with her unstable character. Ubuntu, you are really extraordinary, and I’m still telling anyone who wants to listen about your wonders, but you just can’t appease my desire for trying out new things as well as Debian does.

Now, this doesn’t mean we won’t see each other anymore. I’ll still take you with me whenever I travel, and I’ll continue helping you with the household. In fact, all this wouldn’t signify a big change at all, if it wasn’t that at the same time I also decided to refocus most of my efforts on raising children, which is what really reduced the time I spent helping you.

So, while currently I’m not seeing you as much as before, I hope you all the best, and I’ll still try to help you advance, be it indirectly (1, 2) or, while probably to a lesser degree, continuing with direct contributions.

By the way, hello Planet Debian!

 
web development