FreeDBToMySQL

This is mainly for developers or people who like to play around with SQL. It's also useful for people who like to clean up the freedb entries.
Also have a look at the excellent jmbase.

Introduction

FreeDBToMySQL is a Perl script which imports files from the freedb archive into a MySQL database. Freedb is a free alternative to the CDDB service, which lets you download information about your audio CDs (like artist and title) directly into the CD player application on your computer, saving you from typing in the information yourself. You should support freedb if you can, because the originators of CDDB did a rather nasty thing: the information about the CDs is supplied by ordinary users. Whenever a new CD comes out, someone types in the information and sends it to the CDDB server. After enough information had been supplied, CDDB went commercial and started pressuring developers to include support only for CDDB in their player applications. This is why freedb was created. The information supplied to freedb is and stays free.

I wrote this because it is the first step towards an SQL-based cddb server which could be used locally, without the internet. You can already use cddbd from the freedb site if you want to run a server yourself. However, since freedb stores the information on a one-file-per-album basis, your file system will get rather cluttered: you will have over 500 000 files on your partition. (Moreover, on Mac OS X you are probably using HFS+, which really does not perform well with that many files.) The logical step is to import all data into an SQL database and modify cddbd to fetch the entries from there. I found that I can save about 5/6 of the space needed by the files alone, not even speaking of the overhead the file system needs to manage the files. Instead of 500 000+ files, all data will be contained in 24 files (not counting the files MySQL itself needs). And I haven't even used myisampack yet, which would compress the tables even further (at the cost of making them read-only from then on).

The following table should give you an idea of the task. For every directory, a new image file was made in order to unclutter the HFS+ B*-tree. All tables were empty before each import. All imports were made on an iBook/600 with a 20 GB drive; the freedb archives consisted of freedb-complete-20020327.tar.bz2, freedb-update-20020327-20020501.tar.bz2 and freedb-update-20020501-20020601.tar.bz2. Note that the version of FreeDBToMySQL.pl used did not import the EXT fields.

Directory    # of files   HFS+ MB (1)   MySQL MB (2)   MySQL incr. MB (3)   Unpack s (4)   Import s (4)   Import incr. s (5)
jazz              41895         186,2           28,3                 28,3            886           2430                2430
data               3465          15,7            3,1                 30,6            533            153                 147
blues             24898         110,5           19,7                 49,6            720           1145                1695
reggae             7212          32,3            5,9                 55,0            488            335                 513
soundtrack        19091          84,8           18,0                 72,4            665           1078                1574
folk              33782         149,8           27,3                 98,8            717           1820                2510
country           16074          71,6           13,7                111,7            575            759                1077
misc             182879         814,5          143,8                254,1           2802          14823               17675
classical         55371         245,6           50,0                302,3           1020           2890                6596
rock            ~170000             -              -                    -              -              -                   -
newage           ~21000             -              -                    -              -              -                   -

  1. Determined via df -k in the shell. The size of the image was chosen to be as small as possible.
  2. Determined via du -sk /sw/var/mysql/freedb in the shell.
  3. For this column, the tables were not emptied. For each row, consider the previous rows already in the database. Here you can see how space is saved as the tables grow.
  4. Determined via time in the shell. The times are real times.
  5. For this column, the tables were not emptied. For each row, consider the previous rows already in the database. Here you can see how inserting gets slower as the tables grow.

Installing on Mac OS X

Before you can use this, you need a few things. Also, you should be able to use the terminal fairly well. You will need to download about 130-140 MB of files, and the database needs (after everything is imported) about 500 MB of hard disk space. You will temporarily need about 1 GB of additional scratch space. Also, importing all the files can literally take days. Be patient!

After installation, you will have MySQL and the CD database on your hard disk. At the moment, the only thing you can do with this is train your SQL skills and run queries from the command line. You can't use it for automatic CD lookup yet.

What follows are detailed installation instructions for Mac OS X. Users of other operating systems may skip steps as needed. Feel free to mail me installation instructions for other systems.

Installing the software

  1. Install mysql. I recommend using fink for this, so install that first. Follow the installation instructions carefully, then open a terminal and type:
    fink install mysql
    fink install dbd-mysql-pm
    fink install bzip2

    one at a time, while you are online. This will download all needed sources, then compile and install them. bzip2 is a compression tool which you will need to decompress the freedb files.
  2. Make sure you secure your mysql installation (substitute new_password):
    shell> mysql -u root mysql
    mysql> SET PASSWORD FOR root=PASSWORD('new_password');
    mysql> FLUSH PRIVILEGES;

    See the chapter "Setting Up the Initial MySQL Privileges" in the mysql manual.
  3. Configure some stuff. From the shell (substitute the name as appropriate):
    shell> mysql -u root mysql
    mysql> create database freedb;
    mysql> grant select,insert,update,delete,create,drop,alter on freedb.* to lars@localhost;
    mysql> FLUSH PRIVILEGES;

    User lars is now able to do stuff with the freedb database.
  4. Get the freedb archive from the freedb web site, along with the latest updates of the archive.
  5. Get FreeDBToMySQL.pl, and make sure it is executable:
    chmod 755 FreeDBToMySQL.pl
  6. Start the mysql daemon:
    sudo /sw/share/mysql/mysql.server start
    You have to do this after each restart.

Importing the freedb files

Now we will create a disk image for temporarily storing the files from freedb. The files are distributed across a few subdirectories in the archive. We will unpack and import one subdirectory at a time. (If you have a lot of space, you can of course unpack everything in one go.) I'm assuming there's a subdirectory called "freedbstuff" in your home directory containing all relevant files.

  1. Initialize the database with
    mysql < ~/freedbstuff/inittdb.mysql
  2. Create a new disk image with Disk Copy. Set the volume name to "freedb", the size to 1.3 GB. Mount the image, and create a folder named "freedb" in the volume.
  3. Go to the shell, go to /Volumes/freedb/freedb/, and type
    bzcat ~/freedbstuff/freedb-complete-20020327.tar.bz2 | tar xvf - data/\*
    This will unpack all files in the data subdirectory. If you have any updates, unpack them as well:
    bzcat ~/freedbstuff/freedb-update-20020327-20020501.tar.bz2 | tar xvf - data/\*
  4. Now run the import script:
    ~/freedbstuff/FreeDBToMySQL.pl
    and wait. The script looks for the cddb-directories in the current directory and imports all files it finds.
  5. Remove the directory:
    rm -rf data
    and repeat steps 2-4 for the other directories: blues, classical, country, folk, jazz, misc, newage, reggae, rock, soundtrack
  6. Unmount the image and delete the image file.
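Each of the files imported in step 4 is a plain-text record in xmcd format. As a rough illustration of what the import script has to parse, here is a minimal sketch in Python (the actual import is done by FreeDBToMySQL.pl; continuation lines, EXT fields and the offset comments are ignored in this sketch):

```python
# Minimal sketch of parsing a freedb record in xmcd format.
# Illustration only; the real import is done by FreeDBToMySQL.pl.
def parse_xmcd(text):
    record = {"tracks": []}
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue  # skip comments (track offsets, disc length, revision)
        key, _, value = line.partition("=")
        if key == "DISCID":
            record["discids"] = value.split(",")  # may list several disc IDs
        elif key == "DTITLE":
            # Convention: "Artist / Album title"
            artist, _, title = value.partition(" / ")
            record["artist"], record["title"] = artist, title
        elif key.startswith("TTITLE"):
            record["tracks"].append(value)
    return record

sample = """# xmcd
DISCID=640c8a09
DTITLE=Dire Straits / Brothers In Arms
TTITLE0=So Far Away
TTITLE1=Money For Nothing"""
rec = parse_xmcd(sample)  # → artist "Dire Straits", two track titles
```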

That's it. You can now try some queries. Go to the shell, start the client with
mysql freedb, and try something like
SELECT album.id, album.title FROM album, artist
WHERE album.artist_id = artist.id AND artist.name = "Dire Straits";
All this should work now. I hope someone starts modifying cddbd so that it queries the SQL server instead. Volunteers?

Things to consider

Don't worry if SELECTs are slow when you play around while an import is running. Only the indexes which are really needed during the import are kept up to date. For example, for the table song only the primary index is updated, so SELECTs on artist_id in this table are really slow. The remaining indexes are built at the end of the import.

MySQL is slow at importing data (at least the way this script does it), so don't get frustrated about the time it takes to import everything. Lookups are very fast.

Developers

In a lot of ways, the script is incomplete.

Feel free to help. :)

Table design

Table album:

Field name    Type                   Comments, size, etc.                               Example (partly fictional)
id            mediumint(8) unsigned  0-16 777 215, primary key, auto-increment          59021
artist_id     mediumint(8) unsigned  0-16 777 215, foreign key to artist                9076
title         blob                   length <= 2^16, case sensitive                     Brothers In Arms
filedir_id    tinyint(3) unsigned    0-255                                              7
filename      char(8)                length 8, case insensitive                         640c8a09
discids       varchar(255)           length <= 255, case insensitive, comma separated   640c8a09,8d0cf209
length        smallint(5) unsigned   0-65535, in seconds                                3212
trackoffsets  text                   length <= 2^16, case insensitive, comma separated  182,23052,59222,77422,105872,143630,174417,195240,210842
year          smallint(5) unsigned   0-65535                                            1995
genre_id      mediumint(8) unsigned  0-16 777 215, foreign key to genre                 26
revision      smallint(5) unsigned   0-65535                                            2
firstline_id  mediumint(8) unsigned  0-16 777 215, foreign key to firstline             2
processed_id  mediumint(8) unsigned  0-16 777 215, foreign key to processedbyline       4
submitted_id  mediumint(8) unsigned  0-16 777 215, foreign key to submittedvialine      164
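The trackoffsets field holds the CD frame offsets of the tracks (75 frames per second) and length the total playing time in seconds. Per-track durations are not stored but can be derived from these two fields; a small Python sketch for illustration (not part of the import script):

```python
# Derive per-track durations in seconds from an album row's trackoffsets
# (CD frame offsets, 75 frames per second) and length (total seconds).
def track_lengths(trackoffsets, disc_length):
    starts = [int(o) / 75.0 for o in trackoffsets.split(",")]
    # Each track runs until the next one starts; the last until disc end.
    ends = starts[1:] + [float(disc_length)]
    return [round(e - s) for s, e in zip(starts, ends)]

# Using the example values from the album table:
lengths = track_lengths(
    "182,23052,59222,77422,105872,143630,174417,195240,210842", 3212)
# → nine durations; the first track is 305 seconds long
```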

Table artist:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  9076
name        blob                   length <= 2^16, case sensitive             Dire Straits

Table dirname:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  7
name        char(10)               length = 10 characters                     misc

Table firstline:

Field name     Type                   Comments, size, etc.                       Example
id             mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  2
firstlinetext  varchar(255) binary    length <= 255, case sensitive              # xmcd

Table genre:

Field name  Type                   Comments, size, etc.                       Example
id          mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  26
genretext   varchar(255) binary    length <= 255, case sensitive              Rock

Table processedbyline:

Field name           Type                   Comments, size, etc.                       Example
id                   mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  4
processedbylinetext  varchar(255) binary    length <= 255, case sensitive              # Processed by: cddbd v1.4b41PL3 Copyright(c) Steve Scherf et al.

Table song:

Field name  Type                   Comments, size, etc.                          Example
id          int(10) unsigned       0-4 294 967 295, primary key, auto-increment  730507
artist_id   mediumint(8) unsigned  0-16 777 215, foreign key to artist           9076
album_id    mediumint(8) unsigned  0-16 777 215, foreign key to album            59021
title       blob                   length <= 2^16, case sensitive                Money For Nothing
song_num    tinyint(3) unsigned    0-255, starting from 0                        1

Table submittedvialine:

Field name            Type                   Comments, size, etc.                       Example
id                    mediumint(8) unsigned  0-16 777 215, primary key, auto-increment  164
submittedvialinetext  varchar(255) binary    length <= 255, case sensitive              # Submitted via: AltoMP3 Maker 3.0

Table design considerations

Album.title and Artist.name are BLOBs because some entries were longer than 255 characters; BLOB was chosen over TEXT for case sensitivity.
Some varchars are declared binary to get case sensitivity. The same goes for the blobs.
Album.year: since MySQL does all kinds of conversions on a YEAR field, it was decided to leave the field a smallint(5).
Table dirname: the index on name only covers the first two characters. Keep this in mind if another directory is ever added.
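The 8-digit hex filename and the discids values are CDDB disc IDs, computed from the track offsets and the disc length. For reference, here is a Python sketch of the classic CDDB disc-ID algorithm; applied to the example album row above, it reproduces the filename 640c8a09 (the authoritative definition is the one in the freedb specification):

```python
# Classic CDDB disc ID: a checksum byte, the disc length in seconds and
# the track count packed into 32 bits (see the freedb specification).
def cddb_discid(offsets, total_seconds):
    def digit_sum(n):
        return sum(int(d) for d in str(n))
    checksum = sum(digit_sum(off // 75) for off in offsets)  # 75 frames/s
    length = total_seconds - offsets[0] // 75  # first track to lead-out
    return ((checksum % 255) << 24) | (length << 8) | len(offsets)

# Example values from the album table above:
offsets = [182, 23052, 59222, 77422, 105872, 143630, 174417, 195240, 210842]
print("%08x" % cddb_discid(offsets, 3212))  # → 640c8a09
```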

Last modified: 18.07.2002
© 2002 Lars Bohn