Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Data Modeling - De-Normalization & Indexes

Author  Topic 

imadiamond2016
Starting Member

29 Posts

Posted - 2011-08-25 : 13:25:45
I am preparing my data model and need to decide about de-normalization and indexes on one table. For that, I am providing the scripts for DB schema which you can find at the end of my reply:

Points to consider while analyzing are:

1. Trust is an organization which runs the hospitals. And there will be many trusts in the system.
2. Though, I have shown only 1 hospital under a trust, but in actual scenario there will be many hospitals under a trust (That’s why we have TrustID 1:N in table MS_Hospitals) and for each hospital we will have a different set of values for Units, Departments and Amounts.
3. There will be around 2-3 units under each hospital. That’s why we have HospitalID 1:N in table MS_HospitalUnits.
4. There will be around 20-30 deparments under each unit. That’s why we have UnitID 1:N in table MS_HospitalDepartments.
5. There will be around 20K-30K patient ids in each department. That’s why we have DepartmentID 1:N in table MS_HospitalPatients.

AND THE IMPORT IS DONE FOR EACH DEPARTMENT, HENCE THERE WILL BE ONLY ONE ENTRY IN TABLE MS_HospitalDepartments AND 20K-30K ENTRIES IN MS_HospitalPatients TABLE FOR EACH IMPORT.

This is going to be the core data which we will receive in XML and will be imported in the system. So, we will be importing patient’s data for each Department which belongs to a Unit which further belongs to a Hospital & which further belongs to a Trust.
6. DataID column in table MS_HospitalPatients is a surrogate key and it can be removed.

Questions for you

The database schema prepared by me is a normalized one. Now, as I have explained the business logic and provided the schema, please help in fixing my next two challenges i.e. De-Normlazation and Index

1. De-Normlization:
Data from MS_HospitalPatients table would be accessed on the basis of HospitalID and UnitID mostly and DepartmentID sometimes. So, should the table MS_HospitalPatients be de-normalized by including fields HospitalID, UnitID into it to avoid joins with table MS_HospitalDepartments and MS_HospitalUnits which contains the UnitID and HospitalID.

So, I strongly feel that HospitalID and UnitID should be included. This de-normalization is a very-2 important point here in terms of performance, it could save a couple of joins in my core screen which could result in a better user experience then. Please emphasize on that.

2. Indexes:
a. What should be the clustered index for table MS_HospitalPatients table and why?
b. What non-clustered indexes be added for table MS_HospitalPatients table and why?

Once I am clear about these two, I would be able to look into “Table Partioning” and some other features which can further help improve the performance.

Scripts for Database Schema


/****** Object: Table [dbo].[MS_HospitalTrusts] Script Date: 08/25/2011 00:56:10 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalTrusts]') AND type in (N'U'))
DROP TABLE [dbo].[MS_HospitalTrusts]
GO

/****** Object: Table [dbo].[MS_HospitalTrusts] Script Date: 08/25/2011 00:56:10 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[MS_HospitalTrusts](
[TrustID] [int] IDENTITY(1,1) NOT NULL,
[TrustName] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_MS_Trusts] PRIMARY KEY CLUSTERED
(
[TrustID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO



IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_Hospitals_MS_Trusts]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_Hospitals]'))
ALTER TABLE [dbo].[MS_Hospitals] DROP CONSTRAINT [FK_MS_Hospitals_MS_Trusts]
GO


/****** Object: Table [dbo].[MS_Hospitals] Script Date: 08/25/2011 00:56:31 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_Hospitals]') AND type in (N'U'))
DROP TABLE [dbo].[MS_Hospitals]
GO

/****** Object: Table [dbo].[MS_Hospitals] Script Date: 08/25/2011 00:56:31 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[MS_Hospitals](
[HospitalID] [int] IDENTITY(1,1) NOT NULL,
[HospitalName] [varchar](400) NOT NULL,
[TrustID] [int] NOT NULL,
CONSTRAINT [PK_MS_Hospitals] PRIMARY KEY CLUSTERED
(
[HospitalID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[MS_Hospitals] WITH CHECK ADD CONSTRAINT [FK_MS_Hospitals_MS_Trusts] FOREIGN KEY([TrustID])
REFERENCES [dbo].[MS_HospitalTrusts] ([TrustID])
GO

ALTER TABLE [dbo].[MS_Hospitals] CHECK CONSTRAINT [FK_MS_Hospitals_MS_Trusts]
GO


IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalUnits_MS_Hospitals]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalUnits]'))
ALTER TABLE [dbo].[MS_HospitalUnits] DROP CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals]
GO


/****** Object: Table [dbo].[MS_HospitalUnits] Script Date: 08/25/2011 00:56:57 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalUnits]') AND type in (N'U'))
DROP TABLE [dbo].[MS_HospitalUnits]
GO


/****** Object: Table [dbo].[MS_HospitalUnits] Script Date: 08/25/2011 00:56:57 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[MS_HospitalUnits](
[UnitID] [int] IDENTITY(1,1) NOT NULL,
[HospitalID] [int] NOT NULL,
[UnitName] [nvarchar](100) NOT NULL,
CONSTRAINT [PK_MS_HospitalUnits] PRIMARY KEY CLUSTERED
(
[UnitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[MS_HospitalUnits] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals] FOREIGN KEY([HospitalID])
REFERENCES [dbo].[MS_Hospitals] ([HospitalID])
GO

ALTER TABLE [dbo].[MS_HospitalUnits] CHECK CONSTRAINT [FK_MS_HospitalUnits_MS_Hospitals]
GO


IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalDepartments_MS_HospitalUnits]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalDepartments]'))
ALTER TABLE [dbo].[MS_HospitalDepartments] DROP CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits]
GO


/****** Object: Table [dbo].[MS_HospitalDepartments] Script Date: 08/25/2011 00:57:15 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalDepartments]') AND type in (N'U'))
DROP TABLE [dbo].[MS_HospitalDepartments]
GO

/****** Object: Table [dbo].[MS_HospitalDepartments] Script Date: 08/25/2011 00:57:15 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[MS_HospitalDepartments](
[DepartmentID] [int] IDENTITY(1,1) NOT NULL,
[DepartmentName] [varchar](100) NULL,
[UnitID] [int] NOT NULL,
CONSTRAINT [PK_MS_HospitalDepartments] PRIMARY KEY CLUSTERED
(
[DepartmentID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

ALTER TABLE [dbo].[MS_HospitalDepartments] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits] FOREIGN KEY([UnitID])
REFERENCES [dbo].[MS_HospitalUnits] ([UnitID])
GO

ALTER TABLE [dbo].[MS_HospitalDepartments] CHECK CONSTRAINT [FK_MS_HospitalDepartments_MS_HospitalUnits]
GO

IF EXISTS (SELECT * FROM sys.foreign_keys WHERE object_id = OBJECT_ID(N'[dbo].[FK_MS_HospitalPatients_MS_Sources]') AND parent_object_id = OBJECT_ID(N'[dbo].[MS_HospitalPatients]'))
ALTER TABLE [dbo].[MS_HospitalPatients] DROP CONSTRAINT [FK_MS_HospitalPatients_MS_Sources]
GO


/****** Object: Table [dbo].[MS_HospitalPatients] Script Date: 08/25/2011 00:57:32 ******/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[MS_HospitalPatients]') AND type in (N'U'))
DROP TABLE [dbo].[MS_HospitalPatients]
GO


/****** Object: Table [dbo].[MS_HospitalPatients] Script Date: 08/25/2011 00:57:32 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[MS_HospitalPatients](
[DataID] [int] IDENTITY(1,1) NOT NULL,
[DepartmentID] [int] NOT NULL,
[PatientID] [int] NOT NULL,
[Amount] [bigint] NOT NULL,
CONSTRAINT [PK_MS_HospitalPatients] PRIMARY KEY CLUSTERED
(
[DataID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[MS_HospitalPatients] WITH CHECK ADD CONSTRAINT [FK_MS_HospitalPatients_MS_Sources] FOREIGN KEY([DepartmentID])
REFERENCES [dbo].[MS_HospitalDepartments] ([DepartmentID])
GO

ALTER TABLE [dbo].[MS_HospitalPatients] CHECK CONSTRAINT [FK_MS_HospitalPatients_MS_Sources]
GO






denis_the_thief
Aged Yak Warrior

596 Posts

Posted - 2011-08-25 : 13:41:43
quote:
Originally posted by imadiamond2016


1. De-Normlization:


I would hold off on denormalization. We have some de-normalization and it was causing confusion - ie. Are these columns the same? Are they different? Do they appear twice for a business reason? If the denormailization means the user waits 1.04 seconds instead of 1.05 seconds, you haven't gained anything. Perhaps a denormalized Database in more of a Datawarehouse approach might work.

quote:

2. Indexes:
a. What should be the clustered index for table MS_HospitalPatients table and why?


A good approach is to start by indexing all foreign keys.
Go to Top of Page

imadiamond2016
Starting Member

29 Posts

Posted - 2011-08-25 : 13:49:29
Are these columns the same?
Ans: Yes, it's going to be the same column.

Are they different? Do they appear twice for a business reason?
Ans: As there will be loads of data in table MS_HospitalPatients and it would be accessed by UnitID and HospitalID most of the times. And to do that, I'll have to put 2 joins. I can save these 2 joins if use de-normalization here.
Go to Top of Page

denis_the_thief
Aged Yak Warrior

596 Posts

Posted - 2011-08-25 : 14:08:33
quote:
Originally posted by imadiamond2016

Are these columns the same?
Ans: Yes, it's going to be the same column.



You know that but someone else looking at it 2 years down the road won't.

quote:

Are they different? Do they appear twice for a business reason?
Ans: As there will be loads of data in table MS_HospitalPatients and it would be accessed by UnitID and HospitalID most of the times. And to do that, I'll have to put 2 joins. I can save these 2 joins if use de-normalization here.



I agree you'll save joins but does the benefit outwiegh the costs? For one, you'll need to write more code to update 2 columns rather than 1. Your example suggests you could save 1 join by having the column in 2 places - could you use the same argument to put the Hospital ID in every table and then save 2, 3 or 4 joins?
Go to Top of Page

imadiamond2016
Starting Member

29 Posts

Posted - 2011-08-25 : 14:38:57
I am considering to have HospitalID and UnitID in table MS_HospitalPatients because this table is going to have lacs of records, hence the retrieval for that should should be optimized.

Secondly, while inserting a row in table MS_HospitalPatients, I will not be doing any calculations for HospitalID and UnitID, they are just the values from parent table that need to be inserted in child table to optimize its performance.

Also, once a row is inserted in table MS_HospitalPatients, then the HospitalID and UnitID are NEVER going to be changed. Hence, there is no cost in UPDATE.

So, I think the quick retreival with de-normalization will outweigh the cost in my case. What you say?
Go to Top of Page

denis_the_thief
Aged Yak Warrior

596 Posts

Posted - 2011-08-25 : 15:03:11
quote:
Originally posted by imadiamond2016

So, I think the quick retreival with de-normalization will outweigh the cost in my case. What you say?



I guess I haven't gotten in to Denormalization so hard to say. You could try Normalized, time your query when you have data, then time it Denormalized and see what the time-saving is.
Go to Top of Page
   

- Advertisement -