It will probably be logging about 2-3 rows in this table per second maximum so there will be some rows ... My question to you experts now is how the design of this table is done in the best way ( performance wise) in terms of "primary keys" , " clustered index" , "non clustered index"...
The only type of query that will retrieve data from this table is a standard "SELECT FROM" where " client_id " is of a certain value and " DATE_TIME " is between a specific date range.
SELECT Count ( client_id ) AS totantal FROM tbl_statistics WHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 '
Or is this better?
SELECT Count ( p_key ) AS totantal FROM tbl_statistics WHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 '
Do the sequence client_datetime values pretty much coincide with identity sequence? In other words the value is NOW for each insert? If so then I would make the clustered index on client_datetime. That way inserts are very efficient with rows added naturally to the "bottom" of the table. And your queries by client_datetime range will be very efficient.
However, I now see that your example date ranges are for a full year. That is a lot of data to return at 2-3 rows per second. If that is a typical query then I would probably change my answer and make the clustered index on p_key and a non clustered on (Client_id, client_datetime).
You are correct. The "client_datetime" is always "NOW". In the example query i selected a range for a full year. This will not be the most common question. I guess the most common range would be 1-30 days.
So would that mean you recommend a clustered index on client_datetime and nothing on the other columns (p_key, client_id) ?
Best practice is to have some sort of unique constraint on the table so that you can avoid duplicates and have a reliable row identifier. If there is no logical key - in other words duplicates are OK - then a non clustered primary key on p_key an obvious choice. I would only consider an index on client_id to resolve a performance problem. I could see that if do end up many rows in your date range or no date range at all.
EDIT: one more thing to consider. because you only have the three columns perhaps a clustered primary key on all of them could be best. That would essentially mean the PK IS the table. (Client_Datetime, Client_id, p_key). no other indexes.
The p_key would be last and just be a tie breaker if a couple rows got in there with the exact same date.
If all your queries include a date range then that would probably be best performing for both the inserts and the selects.