Author |
Topic |
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-02 : 08:45:48
|
Hi all I have a table by name table1 it has following columnsFirstName LastName AmountAtmuri Neelima 1000Munagala Sreenivas 2000Koppuravuri Satya 3000Maradugu Jahnavi 4000Kadem Rajesh 5000Jonnala Omkar 6000Mallela Subramanyam 7000My Requirement is I wanto shuffle the records which are in FirstName(column) Following is the scenario which I have followed1.I have taken a temp table and inserted the shuffled records into it SELECT t2.FirstName into #temptable1 FROM (SELECT FirstName,ROW_NUMBER() OVER (ORDER BY NEWID()) AS recid FROM table1) AS t1 INNER JOIN (SELECT FirstName,ROW_NUMBER() OVER (ORDER BY NEWID()) AS recid FROM table1) AS t2 ON t2.recID = t1.recID The above query generates the FirstNames in a shuffled manner.2.Now I wanto update the records into Table1 exactly as it got shuffled in the above query. To achieve the same I have written the below queryUPDATE Table1 SET FirstName = t2.FirstName FROM Table1 t1 INNER JOIN # temptable1 t2 ON t1. FirstName = t2. FirstNameBut this is not getting worked out.Finally my goal is to shuffle the records which are in FirstName(Column). Note : The above table consists of 75 lakhs of records in it. Please suggest me a best approachKindly Help |
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 09:04:08
|
If you mean you want to control the physical order of the records, you have to build a clustered index. That index might be the shuffling order you're talking about. Note that, without a clustered index, the table is stored as a hash table on disk; the physical order is then determined by SQL according to its best judgement. You cannot influence or alter the physical order of a hash table in any way.Abstractly, SQL works with sets. Mathematically, the elements of a set are not ordered in any way. |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-02 : 09:36:15
|
i dont wanto keep the actual data . i wanto modify the FirstName(column) where the actual data is modified as shuffled. I wanto modify only the FirstName(column) but not the entire record. Please suggest me the answer in such a way where i wanto shuffle only the FirstName(Column) randomly. |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 10:25:09
|
This seemed to do it for me:with t1 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t ), t2 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t ) select t2.FirstName, t1.LastName, t1.Amount from t2, t1where t1.rn = t2.rnorder by t1.lastname asc the trick is the ORDER BY NEWID() clause, which causes a random ordering for the computed column "rn" |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-02 : 11:18:18
|
I think the OP knows the random trick as they used it in the original post. Also, I think the question is how to update the real table with these randomized results.this should work:update t1 set firstName = (select top 1 firstname from table1 where firstname != t1.firstname order by newid())from table1 t1 Be One with the OptimizerTG |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 11:35:15
|
quote: Originally posted by TG I think the OP knows the random trick as they used it in the original post. Also, I think the question is how to update the real table with these randomized results.this should work:update t1 set firstName = (select top 1 firstname from table1 where firstname != t1.firstname order by newid())from table1 t1 Be One with the OptimizerTG
Well...not quite. You can get duplicate firstnames in the result with that. e.g. when I first tried it:select firstName = (select top 1 firstname from #t where firstname != t1.firstname order by newid())from #t t1 I got: quote: firstNameJonnalaMallelaMunagalaMunagalaAtmuriMallelaJonnala
with repeated names. |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2014-05-02 : 12:05:59
|
that will happen if you want to randomize it - plus with "75 lakhs" rows i'm sure there are folks with the same first name anyway . How much is a lakh anyway?Be One with the OptimizerTG |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 12:17:03
|
Remember, the OP wants to shuffle, not randomize! |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2014-05-02 : 12:24:18
|
quote: Originally posted by gbritton If you mean you want to control the physical order of the records, you have to build a clustered index.
Sorry to get pedantic on you, but a clustered index doesn't control the physical order, rather, it controls the logical order.That may be a distinction that isn't irrelevant in the case, but there is a difference. :) |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 12:34:51
|
well...as I understand it the rows in each cluster should be physically ordered. the clusters, not so much. so you can "control" the physical order (as opposed to hash table), but not completely determine it.Microsoft emphasizes the stored order:The only time the data rows in a table are stored in sorted order is when the table contains a clustered indexhttp://technet.microsoft.com/en-us/library/ms190457.aspxbut that may not result in a total physical ordering from row 1 to row n because of a variety of factors, including filegroups, RAID levels, fragmentation etc. |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2014-05-02 : 12:40:22
|
Rows may not even be physically ordered on a page. yes the pages are linked in "physical" order. But you are right that given sql just hands off the disk request to the OS, there really is no telling where, physically, the data lives.Here is a quote I ripped from Gail Shaw. I don't have a link to the original post (as it seems to have died), but maybe this will help explain:quote: Clustered index determines the logical order of the data, not the physical. Rows can be stored physically on a page in an order differing from the clustered index order and, if there is any fragmentation, the physical order and logical order of the leaf pages will not match (the definition of fragmentation).Let's take an example of a clustered index on an int column (not an identity) and let's say there are 100 rows in the index at 10 rows per page and those int values are contiguousThe index will have rows 1..10 on page 1, 11..20 on page 2 and so on with 91..100 on page 10. That the clustered index guarantees. Within the physical page however, there's no guarantee that the rows will physically be written on the page in that order. Let's say, for some odd reason, first all the odd numbered rows were inserted and then all the even. If you went and read the page physically, you'd see this:Page HeaderRow 1Row 3Row 5Row 7Row 9Row 2Row 4Row 6Row 8Slot indexPage footerBecause, when SQL goes to insert row 2, it doesn't want the cost of shifting all the rows down to make space, same as with page splits it doesn't want have to shift pages around to maintain order.The slot index, at the bottom of the page, maintains pointers into the page that are in the order of the clustering key, so that the correct rows can be retrieved. So logically, the slot index maintains the 'order' of rows but physically the rows may not be stored in order.You could argue that I'm been pedantic about it, but the prevalent myth that clustered index order = physical order on disk gives rise to lots of misunderstandings about indexes and ordering.
|
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 12:47:08
|
I believe Gail may be wrong about this. Other commentators agree with my position. Of course the only way to be sure would be to provide a sector dump as a counterexample. I wish I had the tools to do that at my fingertips! |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-02 : 13:08:08
|
quote: Originally posted by gbritton I believe Gail may be wrong about this. Other commentators agree with my position. Of course the only way to be sure would be to provide a sector dump as a counterexample. I wish I had the tools to do that at my fingertips!
Here's a great detailed description from the source:http://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspxCheck out this section:"The pages in the data chain and the rows in them are ordered on the value of the clustered index key. "Though the sentence omits the word "physically", in context that is the only possible meaning. |
|
|
Lamprey
Master Smack Fu Yak Hacker
4614 Posts |
Posted - 2014-05-02 : 13:53:12
|
You are certainly free to believe what you like. :) |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-03 : 06:26:16
|
I have used select firstName = (select top 1 firstname from #t where firstname != t1.firstname order by newid())from #t t1but its taking lot of time . Currently it has been 3 hours and still the update is running. Kindly suggest me with an alternative. However can i use any kind of batch wise updates ie save points are some thing to go ahead |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-05 : 03:19:48
|
can any body help me in improving the performance of the udpate |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-05 : 03:30:01
|
I have used the below query as well, Its still runningupdate t1 set firstName = (select top 1 firstname from table1 where firstname != t1.firstname order by newid())from table1 t1 |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-05 : 08:26:30
|
quote: Originally posted by sreeni12345 I have used the below query as well, Its still runningupdate t1 set firstName = (select top 1 firstname from table1 where firstname != t1.firstname order by newid())from table1 t1
This is basically an O(n**2) algorithm. So if your table is large enough, yes it will run a while!Did you try my earlier suggestion? I framed it as a query, but wrapping it in an update should be pretty simple.with t1 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t ), t2 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t ) select t2.FirstName, t1.LastName, t1.Amount from t2, t1where t1.rn = t2.rnorder by t1.lastname asc |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-05 : 08:33:49
|
yaa i have tried it but can u let me know how to wrap it in an update , so sorry i am very much new to sql |
|
|
gbritton
Master Smack Fu Yak Hacker
2780 Posts |
Posted - 2014-05-05 : 09:20:14
|
This worked for me:;with t1 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t ), t2 as ( select FirstName, LastName, Amount, rn = row_number() over(order by newid()) from #t )update t1set t1.Firstname = t2.Firstnamefrom t1, t2where t1.rn = t2.rnselect * from #t |
|
|
sreeni12345
Starting Member
14 Posts |
Posted - 2014-05-05 : 10:51:45
|
Actually i have worked out the query given by you . But its updating the same value in all the records. However i have used the below query and currently it worked outUPDATE table1 SET table.FirstName = #temptable1.FirstName FROM table INNER JOIN #temptable1 ON table.tablekey = #temptable1.tablekey WHERE table.tablekey = #temptable1.tablekey or (#temptable1.tablekey is not null and #temptable1.tablekey is null) or (#temptable1.tablekey is not null and #temptable1.tablekey is null)Correct me if i am wrong . |
|
|
Next Page
|