[Gpe-list] Bad database design in Contacts
Florent de Dinechin
Florent.de.Dinechin at ens-lyon.fr
Wed May 30 13:05:33 CEST 2007
I had several problems with Contacts on a N800.
Some contacts don't show up in the proper place in the alphabetical
view. I imported most contacts as VCARDs from a Psion 5, and this may
be the initial cause. Anyway some contacts added manually before this
mass import still show improperly sorted at the end of the list.
So I used sqlite to look at the database, and it is uglily
redondant. Ex:
3|FAMILY_NAME|Dinechin
3|GIVEN_NAME|Florent
3|NAME|Florent de Dinechin
3|MIDDLE_NAME|de
the name shouldn't be systematically the concatenation of
three other fields of the same table. It should be constructed
on the fly when needed, or it should be a sql VIEW.
Similarly, the import of vcards with notes resulted in two
identical fields, one with tag NOTES and the other one with COMMENTS.
Worse, the contacts_urn table contains only information that is already
in the contacts table, and is also redundant with itself.
All I remember from my database courses is that such semantic
redundancies are the source of all evil,
as they allow the database to become inconsistent with itself,
leading to bugs which the programmmers will not have expected.
I suspect it is the case for my problem.
For example I managed to have some entries with proper
GIVEN_NAME and FAMILY_NAME but empty NAME.
Maybe (probably) some of my VCARDs were faulty in the first place. The
point is, they shouldn't be able to put the database into an
inconsistent state.
All the early research in database design was to define a hierarchy
of normal forms that forbids any such redundancy.
The contacts database definitely violates several of them.
This is a problem that is going to plague GPE Contacts forever
until it is solved. Solving it is easy, but unfortunately rather
fundamental, as it will break backward compatibility. Basically it would
consist in removing contacts_urn altogether from the persistant
database: It can be constructed out of contacts when needed by a SQL
query, or it can be a read-only SQL VIEW -- something like the following
which works under sqlite
create view contacts_urn_view as
select urn, value as family_name, NULL as company
from contacts
where tag='FAMILY_NAME'
UNION
select urn, NULL as family_name, value as company
from contacts
where tag='COMPANY' ;
(I haven't had a look in the source what contacts_urn is used for, so
the good view may be a different one. I'm just making the point of
redundancy here. If you want a view that defines a unique name for
sorting for each card, it would be for instance
select urn, family_name as sort_name
from contacts_urn_view
where family_name NOTNULL
UNION
select urn, company as sort_name
from contacts_urn_view
where family_name ISNULL
order by sort_name;
enough SQL now.)
Similarly, NAME and other redundant informations should be removed from
the contacts table.
I probably overlooked some constraints that lead to this bad design,
like computing power required or whatever. I'm sure there are solutions
that don't introduce redundancy in the tables. I'm ready to offer more
details if somebody wants them, and of course I'm ready to code for it.
Please point me to posts that discussed this issue if it has arised
already. Meanwhile I have done a .dump in sqlite, and I am going to
rebuild a consistent database of my contacts out of the mess produced by
"import vcard" :)
A related problem is the interface which has only one name field,
and tries to infer the three subfields.
It gets it wrong for all the non-american cases
(in John von Neumann or Louis de Broglie : von and de are counted as
middle names. Now can you tell where Ricardo P. da Silva Sa Pinto will
be sorted?).
Besides the rules are obscure. If you enter one von Neumann
from the interface it will be sorted at N, if you import one from
a correct VCARD it will be sorted as V. This has also happened
to my poor database.
It will be much less dangerous to have input fields for first name,
middle name and family name, plus a field for compagny/organization name
which is used for sorting if the family name is absent as in the query
above.
Having an explicit Family Name entry will at least
give us the option to enter da Silva Sa Pinto as "Silva Sa Pinto, da"
which will be sorted properly.
Cheers,
Florent
More information about the Gpe-list
mailing list