FreeDBToMySQL

This is mainly for developers or people who like to play around with SQL. It's also useful for people who like to clean up the freedb entries.
Also have a look at the excellent jmbase.

Introduction

FreeDBToMySQL is a Perl script which imports files from the freedb archive into a MySQL database. Freedb is a free alternative to the CDDB service, which lets you download information about your audio CDs (like artist and title) directly into the CD player application on your computer, saving you from typing in the information yourself. You should support freedb if you can, because the originators of CDDB did a rather nasty thing: the information about the CDs is supplied by ordinary users. Whenever a new CD comes out, someone types in the information and sends it to the CDDB server. After enough information had been supplied, CDDB went commercial and started pressuring developers to include support only for CDDB in their player applications. This is why freedb was created. The information supplied to freedb is and stays free.

I wrote this because it is the first step towards an SQL-based cddb server which could be used locally, without the internet. You can already use cddbd from the freedb site if you want to run a server yourself. However, since freedb stores the information on a one-file-per-album basis, your file system will get rather cluttered: you will have over 500 000 files on your partition. (Moreover, on Mac OS X you are probably using HFS+, which really does not perform well with that many files.) The logical step is to import all data into an SQL database and modify cddbd to fetch the entries from there. I found that I can save about 5/6 of the space needed by the files alone, not even speaking of the overhead the file system needs to manage the files. Instead of 500 000+ files, all data will be contained in 24 files (not counting the files MySQL itself needs). And I haven't even used myisampack yet, which would compress the tables even further (at the cost of making them read-only from then on).

The following table should give you an idea of the task. For every directory, a new image file was made in order to unclutter the HFS+ B*-tree. All tables were empty before each import. All imports were made on an iBook/600 with a 20 GB drive; the freedb archives consisted of freedb-complete-20020327.tar.bz2, freedb-update-20020327-20020501.tar.bz2 and freedb-update-20020501-20020601.tar.bz2. Note that the version of FreeDBToMySQL.pl used did not import the EXT fields.

Directory    # of files   HFS+ MB (1)   MySQL MB (2)   MySQL incr. MB (3)   Unpack s (4)   Import s (4)   Import incr. s (5)
jazz              41895         186,2           28,3                 28,3            886           2430                2430
data               3465          15,7            3,1                 30,6            533            153                 147
blues             24898         110,5           19,7                 49,6            720           1145                1695
reggae             7212          32,3            5,9                 55,0            488            335                 513
soundtrack        19091          84,8           18,0                 72,4            665           1078                1574
folk              33782         149,8           27,3                 98,8            717           1820                2510
country           16074          71,6           13,7                111,7            575            759                1077
misc             182879         814,5          143,8                254,1           2802          14823               17675
classical         55371         245,6           50,0                302,3           1020           2890                6596
rock            ~170000             -              -                    -              -              -                   -
newage           ~21000             -              -                    -              -              -                   -

  1. Determined via df -k in the shell. The size of the image was chosen to be as small as possible.
  2. Determined via du -sk /sw/var/mysql/freedb in the shell.
  3. For this column, the tables were not emptied. For each row, consider the previous rows already in the database. Here you can see how space is saved as the tables grow.
  4. Determined via time in the shell. The times are real times.
  5. For this column, the tables were not emptied. For each row, consider the previous rows already in the database. Here you can see how inserting gets slower as the tables grow.

Installing on Mac OS X

Before you can use this, you need a few things. Also, you should be able to use the terminal fairly well. You will need to download about 130-140 MB of files, and the database needs (after everything is imported) about 500 MB of hard disk space. You will temporarily need about 1 GB of additional scratch space. Also, importing all the files can literally take days. Be patient!

After installation, you will have MySQL and the CD database on your hard disk. At the moment, the only thing you can do with this is train your SQL skills and run queries from the command line. You can't use it for automatic CD lookup yet.

What follows are detailed installation instructions for Mac OS X. Users of other operating systems may skip steps as needed. Feel free to mail me installation instructions for other systems.

Installing the software

  1. Install mysql. I recommend using fink for this, so install that first. Follow the installation instructions carefully, then open a terminal and type:
    fink install mysql
    fink install dbd-mysql-pm
    fink install bzip2

    one at a time, while you are online. This will download all needed sources, then compile and install them. bzip2 is a compression tool which you will need to decompress the freedb files.
  2. Make sure you secure your mysql installation (substitute new_password):
    shell> mysql -u root mysql
    mysql> SET PASSWORD FOR root=PASSWORD('new_password');
    mysql> FLUSH PRIVILEGES;

    See the chapter "Setting Up the Initial MySQL Privileges" in the mysql manual.
  3. Configure some stuff. From the shell (substitute the name as appropriate):
    shell> mysql -u root mysql
    mysql> create database freedb;
    mysql> grant select,insert,update,delete,create,drop,alter on freedb.* to lars@localhost;
    mysql> FLUSH PRIVILEGES;

    User lars is now able to do stuff with the freedb database.
  4. Get the freedb archive from the freedb web site, along with the latest updates of the archive.
  5. Get FreeDBToMySQL.pl, and make sure it is executable:
    chmod 755 FreeDBToMySQL.pl
  6. Start the mysql daemon:
    sudo /sw/share/mysql/mysql.server start
    You have to do this after each restart.

Importing the freedb files

Now we will create a disk image for temporarily storing the files from freedb. The files are distributed across a few subdirectories in the archive. We will unpack and import one subdirectory at a time. (If you have a lot of space, you can of course unpack everything in one go.) I'm assuming there's a subdirectory called "freedbstuff" in your home directory containing all relevant files.

  1. Initialize the database with
    mysql < ~/freedbstuff/inittdb.mysql
  2. Create a new disk image with Disk Copy. Set the volume name to "freedb", the size to 1.3 GB. Mount the image, and create a folder named "freedb" in the volume.
  3. Go to the shell, go to /Volumes/freedb/freedb/, and type
    bzcat ~/freedbstuff/freedb-complete-20020327.tar.bz2 | tar xvf - data/\*
    This will unpack all files in the data subdirectory. If you have any updates, unpack them as well:
    bzcat ~/freedbstuff/freedb-update-20020327-20020501.tar.bz2 | tar xvf - data/\*
  4. Now run the import script:
    ~/freedbstuff/FreeDBToMySQL.pl
    and wait. The script looks for the cddb-directories in the current directory and imports all files it finds.
  5. Remove the directory:
    rm -rf data
    and repeat steps 2-4 for the other directories: blues, classical, country, folk, jazz, misc, newage, reggae, rock, soundtrack
  6. Unmount the image and delete the image file.
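Each of the files imported in step 4 is a plain-text record in xmcd format. As a rough illustration of what the import script has to parse, here is a minimal sketch in Python (the actual import is done by FreeDBToMySQL.pl; continuation lines, EXT fields and the offset comments are ignored in this sketch):

```python
# Minimal sketch of parsing a freedb record in xmcd format.
# Illustration only; the real import is done by FreeDBToMySQL.pl.
def parse_xmcd(text):
    record = {"tracks": []}
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue  # skip comments (track offsets, disc length, revision)
        key, _, value = line.partition("=")
        if key == "DISCID":
            record["discids"] = value.split(",")  # may list several disc IDs
        elif key == "DTITLE":
            # Convention: "Artist / Album title"
            artist, _, title = value.partition(" / ")
            record["artist"], record["title"] = artist, title
        elif key.startswith("TTITLE"):
            record["tracks"].append(value)
    return record

sample = """# xmcd
DISCID=640c8a09
DTITLE=Dire Straits / Brothers In Arms
TTITLE0=So Far Away
TTITLE1=Money For Nothing"""
rec = parse_xmcd(sample)  # → artist "Dire Straits", two track titles
```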

That's it. You can now try some queries. Go to the shell, start the client with
mysql freedb, and try something like
SELECT album.id, album.title FROM album, artist
WHERE album.artist_id = artist.id AND artist.name = "Dire Straits";
All this should work now. I hope someone starts modifying cddbd so that it queries the SQL server instead. Volunteers?

Things to consider

Don't worry if SELECTs are slow when you play around while an import is running. Only the indexes which are really needed during the import are kept up to date. For example, for the table song only the primary index is updated, so SELECTs on artist_id in this table are really slow. The remaining indexes are built at the end of the import.

MySQL is slow at importing data (at least the way this script does it), so don't get frustrated about the time it takes to import everything. Lookups are very fast.

Developers

In a lot of ways, the script is incomplete.

Feel free to help. :)

Table design

Table album:

Field name    Type                   Comments, size, etc.                               Example (partly fictional)
id            mediumint(8) unsigned  0-16 777 215, primary key, auto-increment          59021
artist_id     mediumint(8) unsigned  0-16 777 215, foreign key to artist                9076
title         blob                   length <= 2^16, case sensitive                     Brothers In Arms
filedir_id    tinyint(3) unsigned    0-255                                              7
filename      char(8)                length 8, case insensitive                         640c8a09
discids       varchar(255)           length <= 255, case insensitive, comma separated   640c8a09,8d0cf209
length        smallint(5) unsigned   0-65535, in seconds                                3212
trackoffsets  text                   length <= 2^16, case insensitive, comma separated  182,23052,59222,77422,105872,143630,174417,195240,210842
year          smallint(5) unsigned   0-65535                                            1995
genre_id      mediumint(8) unsigned  0-16 777 215, foreign key to genre                 26
revision      smallint(5) unsigned   0-65535                                            2
firstline_id  mediumint(8) unsigned  0-16 777 215, foreign key to firstline             2
processed_id  mediumint(8) unsigned  0-16 777 215, foreign key to processedbyline       4
submitted_id  mediumint(8) unsigned  0-16 777 215, foreign key to submittedvialine      164
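The trackoffsets field holds the CD frame offsets of the tracks (75 frames per second) and length the total playing time in seconds. Per-track durations are not stored but can be derived from these two fields; a small Python sketch for illustration (not part of the import script):

```python
# Derive per-track durations in seconds from an album row's trackoffsets
# (CD frame offsets, 75 frames per second) and length (total seconds).
def track_lengths(trackoffsets, disc_length):
    starts = [int(o) / 75.0 for o in trackoffsets.split(",")]
    # Each track runs until the next one starts; the last until disc end.
    ends = starts[1:] + [float(disc_length)]
    return [round(e - s) for s, e in zip(starts, ends)]

# Using the example values from the album table:
lengths = track_lengths(
    "182,23052,59222,77422,105872,143630,174417,195240,210842", 3212)
# → nine durations; the first track is 305 seconds long
```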

Table artist:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  9076
name        blob                   length <= 2^16, case sensitive             Dire Straits

Table dirname:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  7
name        char(10)               length = 10 characters                     misc

Table firstline:

Field name     Type                   Comments, size, etc.                       Example
id             mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  2
firstlinetext  varchar(255) binary    length <= 255, case sensitive              # xmcd

Table genre:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  26
genretext   varchar(255) binary    length <= 255, case sensitive              Rock

Table processedbyline:

Field name           Type                   Comments, size, etc.                       Example
id                   mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  4
processedbylinetext  varchar(255) binary    length <= 255, case sensitive              # Processed by: cddbd v1.4b41PL3 Copyright(c) Steve Scherf et al.

Table song:

Field name  Type                   Comments, size, etc.                          Example
id          int(10) unsigned       0-4 294 967 295, primary key, auto-increment  730507
artist_id   mediumint(8) unsigned  0-16 777 215, foreign key to artist           9076
album_id    mediumint(8) unsigned  0-16 777 215, foreign key to album            59021
title       blob                   length <= 2^16, case sensitive                Money For Nothing
song_num    tinyint(3) unsigned    0-255, starting from 0                        1

Table submittedvialine:

Field name            Type                   Comments, size, etc.                       Example
id                    mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  164
submittedvialinetext  varchar(255) binary    length <= 255, case sensitive              # Submitted via: AltoMP3 Maker 3.0

Table design considerations

Album.title and Artist.name are BLOBs because some entries were longer than 255 characters; BLOB was chosen over TEXT for case sensitivity.
Some varchars are declared binary to get case sensitivity. The same goes for the blobs.
Album.year: since MySQL does all kinds of conversions on a YEAR field, it was decided to leave the field a smallint(5).
Table dirname: the index on name only covers the first two characters. Keep this in mind if another directory is ever added.
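The 8-digit hex filename and the discids values are CDDB disc IDs, computed from the track offsets and the disc length. For reference, here is a Python sketch of the classic CDDB disc-ID algorithm; applied to the example album row above, it reproduces the filename 640c8a09 (the authoritative definition is the one in the freedb specification):

```python
# Classic CDDB disc ID: a checksum byte, the disc length in seconds and
# the track count packed into 32 bits (see the freedb specification).
def cddb_discid(offsets, total_seconds):
    def digit_sum(n):
        return sum(int(d) for d in str(n))
    checksum = sum(digit_sum(off // 75) for off in offsets)  # 75 frames/s
    length = total_seconds - offsets[0] // 75  # first track to lead-out
    return ((checksum % 255) << 24) | (length << 8) | len(offsets)

# Example values from the album table above:
offsets = [182, 23052, 59222, 77422, 105872, 143630, 174417, 195240, 210842]
print("%08x" % cddb_discid(offsets, 3212))  # → 640c8a09
```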

Last modified: 18.07.2002
© 2002 Lars Bohn