Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Replace part of string with capitalised equivalent

Author  Topic 

chrissyppppp
Starting Member

6 Posts

Posted - 2012-12-19 : 05:35:16
We have an issue with instrument names loaded into our data warehouse where they either loaded with the string all in upper case or in normal case (capital at the beginning of each word). We have a function that deals with changing all of the upper case strings into normal case and an example of some of these are below:

Merck & Co Inc Common Stock Usd.5
Newmont Mining Corp Common Stock Usd Inc 1.6

However there are parts of this which we would like to capitalise when it comes to reporting ie in the examples above 'Usd' should become 'USD', 'Inc' should become 'INC', 'Corp' should become 'CORP'.

These are all held in a cross reference table. I've decided to create a function which an example of which is below:


CREATE TABLE capital ([InternalValue] VARCHAR(255), [ExternalValue] VARCHAR(255));

CREATE TABLE instrument ([Instrument] VARCHAR(255));

INSERT INTO capital ([InternalValue], [ExternalValue])
VALUES
('Usd', 'USD'),
('Ltd', 'LTD'),
('Corp', 'CORP'),
('Inc', 'INC')

INSERT INTO instrument ([Instrument])
VALUES
('Merck & Co Inc Common Stock Usd.5'),
('Newmont Mining Corp Common Stock Usd Inc 1.6');

CREATE FUNCTION [dbo].[fn_replacementsecurityname_test] (@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN

WITH ReplaceWord(InternalWord, ExternalWord) AS
(
SELECT InternalValue, ExternalValue
FROM capital
)
SELECT @string = REPLACE(@string, r.InternalWord, r.ExternalWord)
FROM ReplaceWord r
WHERE CHARINDEX(r.InternalWord, @string) > 0

RETURN @string
END

SELECT dbo.replacement(i.Instrument) NewValue
FROM instrument AS i


So this works fine, however how can I ensure that i'm replacing what i want to replace. ie in the above example say we wanted all "Co" to be "CO" but also ensuring that "Common" didn't become "COmmon"? Is there any way we could look at the length of the replacement string and that it only replaces words of that length?

Any help or examples would be greatly appreciated.

bandi
Master Smack Fu Yak Hacker

2242 Posts

Posted - 2012-12-19 : 07:06:26
See this example..
If you have identity column in [instrument] table... Then you can very easily adopt the following query

Just run this script once
IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL
DROP TABLE #mytable
IF OBJECT_ID('TempDB..#replacements','U') IS NOT NULL
DROP TABLE #replacements

--===== Create the test table with
CREATE TABLE #mytable
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
LongString nvarchar(max)
)

--===== Create table with replacements
CREATE TABLE #replacements
(
code nvarchar(55)
,replacement nvarchar(255)
)


--===== Add sample data
INSERT INTO #mytable (LongString)
select 'code01 something else code02' union all
select 'code02 code02 aacode01'

INSERT INTO #replacements
select 'code01', 'XXXX' union all
select 'code02', 'YYYY'


-- solution
;WITH Tally AS (SELECT top 100 n = ROW_NUMBER() OVER(ORDER BY [name]) from master.dbo.syscolumns),
NormalisedData AS (
SELECT m.ID,
WordID = ROW_NUMBER() OVER(PARTITION BY m.ID ORDER BY t.n),
m.longstring,
word = SUBSTRING(m.longstring, t.n,
(ISNULL(NULLIF(CHARINDEX(' ', SUBSTRING(m.longstring, t.n, LEN(m.LongString)-t.n)), 0),
LEN(SUBSTRING(m.longstring, t.n, LEN(m.LongString)-t.n))+2)) - 1)
FROM #mytable m
INNER JOIN Tally t ON SUBSTRING(' ' + m.LongString, t.n, 1) = ' '
WHERE t.n <= LEN(m.LongString)
),
ReplacedData AS (
SELECT d.ID, d.WordID, d.LongString, NewWord = ISNULL(r.replacement, d.word)
FROM NormalisedData d
LEFT JOIN #replacements r ON r.code = d.word
)
SELECT d.ID, OldString = d.LongString,
NewString = (
SELECT ' ' + NewWord
FROM ReplacedData r
WHERE r.ID = d.ID
ORDER BY ID, WordID
FOR XML PATH(''))
FROM ReplacedData d
GROUP BY d.ID, d.LongString


--
Chandu
Go to Top of Page

chrissyppppp
Starting Member

6 Posts

Posted - 2012-12-19 : 07:23:07
Thanks for the response. Is there anyway you could convert this into a function? This is what i'm truely after so that i can supply it with a string and it do a lookup to be able to replace the words that need replacing from the cross reference table.

Thanks
Go to Top of Page

bandi
Master Smack Fu Yak Hacker

2242 Posts

Posted - 2012-12-19 : 08:23:55
I just converted above query into a function.. You can do better than this..
For time being use following function:

IF OBJECT_ID('mytable','U') IS NOT NULL
DROP TABLE mytable
IF OBJECT_ID('replacements','U') IS NOT NULL
DROP TABLE replacements

--===== Create the test table with
CREATE TABLE mytable
(
ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
LongString nvarchar(max)
)

--===== Create table with replacements
CREATE TABLE replacements
(
code nvarchar(55)
,replacement nvarchar(255)
)


--===== Add sample data
INSERT INTO mytable (LongString)
select 'code01 something else code02' union all
select 'code02 code02 aacode01'

INSERT INTO replacements
select 'code01', 'XXXX' union all
select 'code02', 'YYYY'
GO
CREATE FUNCTION MyOwnReplace (@str varchar(1000))
RETURNs VARCHAR(2000)
AS
BEGIN
DECLARE @NewString varchar(2000) = ''
;WITH Tally AS
(SELECT top 100 n = ROW_NUMBER() OVER(ORDER BY [name]) from master.dbo.syscolumns)
,NormalisedData AS (
SELECT m.ID,
WordID = ROW_NUMBER() OVER(PARTITION BY m.ID ORDER BY t.n),
m.longstring,
word = SUBSTRING(m.longstring, t.n,
(ISNULL(NULLIF(CHARINDEX(' ', SUBSTRING(m.longstring, t.n, LEN(m.LongString)-t.n)), 0),
LEN(SUBSTRING(m.longstring, t.n, LEN(m.LongString)-t.n))+2)) - 1)
FROM mytable m
INNER JOIN Tally t ON SUBSTRING(' ' + m.LongString, t.n, 1) = ' ' AND m.LongString = @str
WHERE t.n <= LEN(m.LongString)
),
ReplacedData AS (
SELECT d.ID, d.WordID, d.LongString, NewWord = ISNULL(r.replacement, d.word)
FROM NormalisedData d
LEFT JOIN replacements r ON r.code = d.word
)
SELECT @NewString = (
SELECT ' ' + NewWord
FROM ReplacedData r
WHERE r.ID = d.ID
ORDER BY ID, WordID
FOR XML PATH(''))
FROM ReplacedData d
GROUP BY d.ID, d.LongString
RETURN @NewString
END
--Testing
SELECT dbo.MyOwnReplace(LongString), LongString
FROM mytable


--
Chandu
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-12-19 : 08:47:35
quote:
Originally posted by chrissyppppp

Thanks for the response. Is there anyway you could convert this into a function? This is what i'm truely after so that i can supply it with a string and it do a lookup to be able to replace the words that need replacing from the cross reference table.

Thanks


sorry why do you need a function if you already have string to be replaced as values inside another table?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

chrissyppppp
Starting Member

6 Posts

Posted - 2012-12-19 : 08:50:21
I simply had the temporary table above as an example. I would prefer to use a function due to performance
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-12-19 : 08:56:52
quote:
Originally posted by chrissyppppp

I simply had the temporary table above as an example. I would prefer to use a function due to performance


so do you mean you will be calling function for replacing each of the words?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

chrissyppppp
Starting Member

6 Posts

Posted - 2012-12-19 : 09:00:39
Yeah so i'll have a stored proc that will call this function and replace any of the words that need capitilising in instrument name column so that the Instrument names look correct in the data extract i'm developing
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-12-19 : 09:04:46
that means multiple invocations of same function and will be performance hit especially when in case of large datasets.Why not populate a table with all values and do a set based replace in one shot?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

bandi
Master Smack Fu Yak Hacker

2242 Posts

Posted - 2012-12-19 : 09:11:43
quote:
So this works fine, however how can I ensure that i'm replacing what i want to replace. ie in the above example say we wanted all "Co" to be "CO" but also ensuring that "Common" didn't become "COmmon"? Is there any way we could look at the length of the replacement string and that it only replaces words of that length?

Hi simple solution for above problem is:
In your first function, do this modification

SELECT @string = REPLACE(@string, ' '+ r.InternalWord + ' ', ' '+r.ExternalWord+ ' ')
FROM ReplaceWord

--
Chandu
Go to Top of Page

chrissyppppp
Starting Member

6 Posts

Posted - 2012-12-19 : 10:05:12
quote:
Originally posted by bandi

quote:
So this works fine, however how can I ensure that i'm replacing what i want to replace. ie in the above example say we wanted all "Co" to be "CO" but also ensuring that "Common" didn't become "COmmon"? Is there any way we could look at the length of the replacement string and that it only replaces words of that length?

Hi simple solution for above problem is:
In your first function, do this modification

SELECT @string = REPLACE(@string, ' '+ r.InternalWord + ' ', ' '+r.ExternalWord+ ' ')
FROM ReplaceWord

--
Chandu



This is what i was thinking as a simple solution, but wanted to see if there was a more robust solution
Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2012-12-19 : 16:51:05
I think using mixed case could just cause headaches for searches.

Even though it takes additional storage, I would materialize a column that was all upper case for doing searches, even if you didn't display that column but instead displayed the mixed-case version of the column.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-12-19 : 22:07:53
case difference would cause search issues only when database collation used is case sensitive. By default sql server collation is case insensitive so case difference doesnt make any issues.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

ScottPletcher
Aged Yak Warrior

550 Posts

Posted - 2012-12-21 : 17:24:39
quote:
Originally posted by visakh16

case difference would cause search issues only when database collation used is case sensitive. By default sql server collation is case insensitive so case difference doesnt make any issues.

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/




Perhaps. But each db, and even each column, can have a specific collation specified Or if the column value is used in XML
and the XML is searched, case issues will occur again.
Go to Top of Page
   

- Advertisement -