Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
kristian
Starting Member
21 Posts |
Posted - 2002-06-30 : 10:14:42
|
| Hi everyoneIm designing the database structure for a link directory and i have some questions regarding performance and would value your opinion on the preferred method.I have a table called links (I better post the script before i get flamed)CREATE TABLE [dbo].[Links] ( [ID] [int] IDENTITY (1, 1) NOT NULL , [UID] [uniqueidentifier] NOT NULL , [FTTimeStamp] [timestamp] NOT NULL , [Email] [nvarchar] (150) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [UserName] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Password] [nvarchar] (20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [ContactName] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [DateRegistered] [smalldatetime] NOT NULL , [LastEdited] [smalldatetime] NULL , [LastLoggedIn] [smalldatetime] NOT NULL , [LogInCount] [int] NOT NULL , [Title] [nvarchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [URL] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Description] [nvarchar] (300) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [Keywords] [nvarchar] (100) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Country] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [City] [nvarchar] (40) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Cat1] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SubCat1] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Cat2] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SubCat2] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [AltCat] [nvarchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Gen1] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SubGen1] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [Gen2] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [SubGen2] [char] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [HitsFrom] [int] NULL , [HitsTo] [int] NULL , [Validated] [bit] NULL ) ON [PRIMARY]GOMy questions regards the last field (Validated) A bit field indicating if the link has been validated and is visible in the directory. When someone adds a link it defaults to False then when i validate the link it becomes True.Should i be using a second table to keep unvalidated links that i then insert into the links table upon validation? The reason is ask is because.. a) Im assuming it will be more efficient for my queries to not all have "where validated=1" attached,b) Im using full text searching, and as yet havnt worked out how to exclude rows based on criteria.Which leads me to my second question.Should i be using a view? I could create a view of all validated links, and i could also aggregrate the hitsfrom and hitsto columns because i will be pulling these 2 as a combined total. And i could then create my full text index from the view, (I think, aint tried yet).Performance is paramount, Im at the design stage now so want to get it right before its too late to change it. Im assuming using a view adds some overhead. I will be validating the links only weekly, so not much data change there, but the values of hitsfrom and hitto will be updating frequently. I assume the fastest method would be 2 tables, one for unvalidated and one for validated links. The problem with this method is i wanted link owners to be able to start increasing there hitsfrom and hitsto stats before i have validated the link. The script that does that will have to have logic for knowing which table to update, and its essential that script is very simple and high speed.What would you guys recommend, a view, 2 tables or a bit field.Thanks in advanceKristian |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-07-01 : 07:28:46
|
I would leave the bit field as it is. A link is a link. Whether or not it is 'validated' (I'm assuming that means it passed some certifaction standards) is an attribute of the entity. An un-validated link is not essentially different than a validated link. You certainlyl want to careful when indexing to include the bit field where is will help performance. I think having a pair of views (ValidLinks and UnBvalidLinks: essentially select where valid equals/does not equal 1) is a fine idea. There will be minimal overhead, but certainly no performance gain other than you won't have to constantly remember to include the Validated = something condition in your where clause.I can't help myself. I'm gonna throw out a couple other things I think you should think about.- URL is (probably) the natural key for this table. While I can certainly understand deciding to implement a surrogate key in place of the cumbersome varchar(100), I don't think there is any valid reason to have TWO surrogate keys (ID and UID).
- Speaking of primary keys, you haven't defined one. I'm assuming you will go with ID or UID otherwise there is no reason for them in the table.
- Is there a reason you are using NVARCHARs. Can you have Unicode in a URL? (I don't know.)
- You've got some normalization issues here. First, your Cat and Gen columns. I don't know what Cat and Gen mean, but generally you don't want a column1, column2 design. It makes is very difficult to add Cat3 or Gen3 later. Plus, lets say Cat means category, you have know way of ensuring data integrity, i.e. cat1 could equal 'Product Link', 'Product', 'Prod link', 'Linkco de Producto'. You should have a cat table and gen table with a many to many linker-upper in between (linkcat and linkgen).
- Another normalization issue is that you seem to have some transitive dependencies. How can a link have a Username and Password? How can a link 'Log in'? It seems you are capturing Link info and User info in the same table. This should be split up.
- I think you can have URLs that are longer than 100 characters.
- I would make HitsTo and HitsFrom not null with a default of zero. It will probably makes some queries down the road a bit easier to write.
That's all I got right now. Hope this was helpful.<O> |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 08:54:18
|
| Thanks a lot for the reply.I hve been considering a lot of what you mentioned and implemeneted a couple of points anyway yesterday and ill be looking over the others today.(TWO surrogate keys (ID and UID))I dont expect this to stay, but im forced to use a uid for full text indexing it seems.(Speaking of primary keys, you haven't defined one. )Yeah still deciding best indexes etc.Is there a reason you are using NVARCHARs. I allways use nvarchar for text that could be inputted by a user.Can you have Unicode in a URL? (I don't know.) Good question?(You've got some normalization issues here. )Ive been considering this a lot, and was gonna spend today testing different options, originally it was fully normalized, and then it got denormalized. Im basically gonna build it both ways and test the query performance. My main reason for including the category n genre columns in the link like i did is because i want them to be searched on with the rest of the columns. But i might be able to achieve this with a view that performs a join anyway. mmm.(Another normalization issue is that you seem to have some transitive dependencies. How can a link have a Username and Password? How can a link 'Log in'? )I went for this option for simplicity, multiple links will require multiple logins to be accessed. I appreciate this aint properley normalized, on most my sites its fully normalized, but this one ive kept simple.(I think you can have URLs that are longer than 100 characters.)You can but i took the longest length from 3000 links and it was only about 80. If its longer than that its proberbly some crappy page hosted in a sub durectory of a subdirectory on some site. ie. if the links longer than 100 it aint worth increasing page size for a site that cant even get it self a resaonable domain nameI would make HitsTo and HitsFrom not null with a default of zero. It will probably makes some queries down the road a bit easier to write. (Yeah this was something that happenened yesterday)And yeah, im stupid, i didnt consider passing the bit field too the full text index then excluding it from there. For some reason my brain was trying to work out how to exclude records from the index.But, youve said theres min overhead with views and im thinking this would be the easiest solution so im gonna head down that path i reackon.ThanksEdited by - Kristian on 07/01/2002 08:57:16 |
 |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-07-01 : 09:10:35
|
quote: (TWO surrogate keys (ID and UID))I dont expect this to stay, but im forced to use a uid for full text indexing it seems.
If you must have a UID, then you can just forget about the IDENTITY.quote: (Speaking of primary keys, you haven't defined one. )Yeah still deciding best indexes etc.
Just keep in mind that the Primary Key constraint has very little to do with indexing. There is nothing to say that your PK must be clustered. You could clustere URL, for example and define your PK constraint as non-clustered.quote: (You've got some normalization issues here. )Ive been considering this a lot, and was gonna spend today testing different options, originally it was fully normalized, and then it got denormalized. Im basically gonna build it both ways and test the query performance. My main reason for including the category n genre columns in the link like i did is because i want them to be searched on with the rest of the columns. But i might be able to achieve this with a view that performs a join anyway. mmm.
quote: (Another normalization issue is that you seem to have some transitive dependencies. How can a link have a Username and Password? How can a link 'Log in'? )I went for this option for simplicity, multiple links will require multiple logins to be accessed. I appreciate this aint properley normalized, on most my sites its fully normalized, but this one ive kept simple.
Many of the academics in the field will subscribe to the idea that at the end of the day denormalization will not improve performance. The cycles you spend attempting to maintaining data integrity in a de-normalized schema outweight the percieved perf benefit of the reduced number of joins. I think its best practice to design to a normalized schema first, doing all you can with it, then if performance is still not up to snuff try denormalizing. Pascal will ask, "Which is easier to find in a library? 5 books or 10?" The answer, of course, is "It depends where the books are located, not how many you need".<O> |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 10:26:25
|
| OK, so im going down the normalization route regarding the categorys and genres. Ive come up with thisCREATE TABLE [dbo].[MemberOf] ( [LinkID] [int] NOT NULL , [IsCat] [bit] NOT NULL , [Item] [varchar] (30) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY]GOLinkId will relate to the correspondings links identity, IsCat will indicate true if item is a category, and false if item is a genre.Is this not normal, should i be splitting genre and category into 2 tables, or is this acceptable. And split or not, how would you apply the indexes. There will be lots of select where item = something, but itll always be joined by the linkID. Im guessing, (and im new to using indexes for optimizing queries) that there should be a clustered primary index on link id (because ill be wanting all the items relating to an id), and a non clustered index on item, is that right?Thanks |
 |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-07-01 : 10:47:49
|
I still don't fully understand Category and Genre and Item. Tell me which is true...- A link can have one or zero category/genre.
- A link can have one or zero categories and a link can have one or zero genres.
- A link can have many categories and a link can have many genres. Many links can have the same category and many links can have the same genres.
<O> |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 12:43:58
|
| A link..must have a 1+ categories.can have 0+ genres.will not have a duplicated genre or category. |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 13:06:07
|
| Im not using the member table the way i listed above now, let me struggle on for a little bit and ill see how i get on. |
 |
|
|
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2002-07-01 : 15:24:03
|
quote: (TWO surrogate keys (ID and UID))I dont expect this to stay, but im forced to use a uid for full text indexing it seems.
I don't know of any reason that this would be true. I have a few tables with Full-Text Indexing, and I never use GUIDs. What led you to this conclusion? Perhaps you are confusing a GUID with a TIMESTAMP which Full-Text Indexing can use to do incremental updates to the catalog rather than full repopulation? |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 16:47:42
|
| Im not sure where i got that from actually, ive read so many sites the last few days, i hope im wrong, ive read more sites saying avoid uniqueidentifier if you can help it. Ill try creating the index without it.So i presume my primary clustered index will be the userid, seeing as hits etc are recorded by selecting that field. |
 |
|
|
kristian
Starting Member
21 Posts |
Posted - 2002-07-01 : 18:24:33
|
| Your right about not needing the UID, thats gone now. But is there anyway of creating a full text catalogue on just some of the records in the table. Id like to keep non validated links (links with validated bit field set to 0) out of the full text searhces.A workaround would be to only repopulate the catalogues after i have validated all the links. But this isnt very elegent and i know from experience ill validate half the links then save the rest for another day. |
 |
|
|
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2002-07-01 : 19:09:25
|
| AFAIK there's no other way than what you have already discussed to keep them out of the index. Of course, my question is... WHY do you want to keep them out of the index? You can still put your "WHERE Validated=1" clause in your SELECT statement that queries the Full-Text Index. You don't have to use just one or the other. |
 |
|
|
|
|
|
|
|