[Gpe-list] Bad database design in Contacts

Florent de Dinechin Florent.de.Dinechin at ens-lyon.fr
Wed May 30 13:05:33 CEST 2007


I had several problems with Contacts on a N800.
Some contacts don't show up in the proper place in the alphabetical 
view. I imported most  contacts as VCARDs  from a Psion 5, and this may 
be the initial cause. Anyway some contacts added manually before this 
mass import still show improperly sorted at the end of the list.

So I used sqlite to look at the database, and it is uglily
redondant. Ex:
3|FAMILY_NAME|Dinechin
3|GIVEN_NAME|Florent
3|NAME|Florent de Dinechin
3|MIDDLE_NAME|de
the name shouldn't be systematically the concatenation of
three other fields of the same table. It should be constructed
on the fly when needed, or it should be a sql VIEW.
Similarly, the import of vcards with notes resulted in two
identical fields, one with tag NOTES and the other one with COMMENTS.
Worse, the contacts_urn table contains only information that is already
in the contacts table, and is also redundant with itself.

All I remember from my database courses is that such semantic 
redundancies are the source of all evil,
as they allow the database to become inconsistent with itself,
leading to bugs which the programmmers will not have expected.
I suspect it is the case for my problem.
For example I managed to have some entries with proper
GIVEN_NAME and FAMILY_NAME but empty NAME.
Maybe (probably) some of my VCARDs were faulty in the first place. The 
point is, they shouldn't be able to put the database into an 
inconsistent state.

All the early research in database design was to define a hierarchy
of normal forms that forbids any such redundancy.
The contacts database definitely violates several of them.

This is a problem that is going to plague GPE Contacts forever
until it is solved. Solving it is easy, but unfortunately rather 
fundamental, as it will break backward compatibility. Basically it would 
consist in removing contacts_urn altogether from the persistant 
database: It can be constructed out of contacts when needed by a SQL 
query, or it can be a read-only SQL VIEW -- something like the following 
which works under sqlite
create view contacts_urn_view as
    select urn, value as family_name, NULL as company
      from contacts
      where tag='FAMILY_NAME'
    UNION
    select urn, NULL as family_name, value as company
      from contacts
      where tag='COMPANY'  ;

(I haven't had a look in the source what contacts_urn is used for, so 
the good view may be a different one. I'm just making the point of 
redundancy here. If you want a view that defines a unique name for 
sorting for each card, it would be for instance

select urn, family_name as sort_name
   from contacts_urn_view
   where family_name NOTNULL
UNION
select urn, company as sort_name
   from contacts_urn_view
   where family_name ISNULL
order by sort_name;

enough SQL now.)


Similarly, NAME and other redundant informations should be removed from 
the contacts table.



I probably overlooked some constraints that lead to this bad design, 
like computing power required or whatever. I'm sure there are solutions 
that don't introduce redundancy in the tables. I'm ready to offer more 
details if somebody wants them, and of course I'm ready to code for it.

Please point me to posts that discussed this issue if it has arised 
already. Meanwhile I have done a .dump in sqlite, and I am going to 
rebuild a consistent database of my contacts out of the mess produced by 
"import vcard" :)




A related problem is the interface which has only one name field,
and  tries to infer the three subfields.
It gets it wrong for all the non-american cases
(in John von Neumann or Louis de Broglie : von and de are counted as 
middle names. Now can you tell where Ricardo P. da Silva Sa Pinto will 
be sorted?).
Besides the rules are obscure. If you enter one von Neumann
from the interface it will be sorted at N, if you import one from
a correct VCARD it will be sorted as V. This has also happened
to my poor database.

It will be much less dangerous to have input fields for first name, 
middle name and family name, plus a field for compagny/organization name
which is used for sorting if the family name is absent as in the query 
above.
Having an explicit Family Name entry will at least
give us the option to enter da Silva Sa Pinto as "Silva Sa Pinto, da"
which will be sorted properly.

Cheers,

	Florent




More information about the Gpe-list mailing list